V800R009C10SPC200
Feature Description
Issue 01
Date 2018-05-04
NE20E-S2
Contents
1 Feature Description
1.1 Using the Packet Format Query Tool
1.2 VRPv8 Overview
1.2.1 About This Document
1.2.2 VRP8 Overview
1.2.2.1 Introduction
1.2.2.1.1 Introduction of VRP8
1.2.2.1.2 Development of the VRP
1.2.2.2 Architecture
1.2.2.2.1 VRP8 Componentization
1.2.2.2.2 VRP8 Virtualized Hierarchy
1.2.2.2.3 VRP8 High Extensibility
1.2.2.2.4 VRP8 Carrier-Class Management and Maintenance
1.2.2.2.5 Advantages of the VRP8 Architecture
1.3 Basic Configurations
1.3.1 About This Document
1.3.2 TTY
1.3.2.1 Introduction
1.3.2.2 Principles
1.3.2.2.1 TTY
1.3.3 Telnet
1.3.3.1 Introduction
1.3.3.2 Principles
1.3.3.2.1 Telnet
1.3.3.3 Applications
1.3.3.3.1 Telnet
1.3.4 SSH
1.3.4.1 Introduction
1.3.4.2 Principles
1.3.4.2.1 SSH
1.3.4.3 Applications
1.3.4.3.1 Supporting STelnet
1.3.4.3.2 Supporting SFTP
1.5.4.3.1 MPLS OAM Application in the IP RAN Layer 2 to Edge Scenario
1.5.4.3.2 Application of MPLS OAM in VPLS Networking
1.5.4.4 Terms and Abbreviations
1.5.5 MPLS-TP OAM
1.5.5.1 Introduction
1.5.5.2 Principles
1.5.5.2.1 Basic Concepts
1.5.5.2.2 Continuity Check and Connectivity Verification
1.5.5.2.3 Packet Loss Measurement
1.5.5.2.4 Frame Delay Measurement
1.5.5.2.5 Remote Defect Indication
1.5.5.2.6 Loopback
1.5.5.3 Applications
1.5.5.3.1 MPLS-TP OAM Application in the IP RAN Layer 2 to Edge Scenario
1.5.5.3.2 Application of MPLS-TP OAM in VPLS Networking
1.5.5.4 Terms and Abbreviations
1.5.6 VRRP
1.5.6.1 Introduction
1.5.6.2 Principles
1.5.6.2.1 Basic VRRP Concepts
1.5.6.2.2 VRRP Packets
1.5.6.2.3 VRRP Operating Principles
1.5.6.2.4 Basic VRRP Functions
1.5.6.2.5 mVRRP
1.5.6.2.6 Association Between VRRP and a VRRP-disabled Interface
1.5.6.2.7 BFD for VRRP
1.5.6.2.8 VRRP Tracking EFM
1.5.6.2.9 VRRP Tracking CFM
1.5.6.2.10 VRRP Association with NQA
1.5.6.2.11 Association Between a VRRP Backup Group and a Route
1.5.6.2.12 Association Between Direct Routes and a VRRP Backup Group
1.5.6.2.13 Traffic Forwarding by a Backup Device
1.5.6.2.14 Rapid VRRP Switchback
1.5.6.3 Applications
1.5.6.3.1 IPRAN Gateway Protection Solution
1.5.6.4 Terms, Acronyms, and Abbreviations
1.5.7 Ethernet OAM
1.5.7.1 Introduction
1.5.7.2 EFM Principles
1.5.7.2.1 Basic Concepts
1.5.7.2.2 Background
1.5.7.2.3 Basic Functions
1.14.8.2.1 Centralized Management of IP Hard-Pipe-based Leased Line Services on the NMS
1.14.8.2.2 Interface-based Hard Pipe Bandwidth Reservation
1.14.8.2.3 AC Interface Service Bandwidth Limitation
1.14.8.2.4 Hard-Pipe-based TE LSP
1.14.8.2.5 Hard-Pipe-based VLL/PWE3
1.14.8.2.6 Hard Pipe Reliability
1.14.8.2.7 Hard Pipe Service Quality Monitoring
1.14.8.3 Applications
1.14.8.3.1 Hard-Pipe-based Enterprise Leased Line Application
1.14.8.3.2 Hard-Pipe-based Enterprise Leased Line Protection
1.14.8.3.3 Hard-Pipe-based Leased Line Services Implemented by Huawei and Non-Huawei Devices
1.14.8.4 Terms, Acronyms, and Abbreviations
1.14.9 VPLS
1.14.9.1 Introduction
1.14.9.2 Principles
1.14.9.2.1 VPLS Description
1.14.9.2.2 VPLS Functions
1.14.9.2.3 LDP VPLS
1.14.9.2.4 BGP VPLS
1.14.9.2.5 HVPLS
1.14.9.2.6 BGP AD VPLS
1.14.9.2.7 Inter-AS VPLS
1.14.9.2.8 VPLS PW Redundancy
1.14.9.2.9 Multicast VPLS
1.14.9.2.10 VPLS Multi-homing
1.14.9.3 Applications
1.14.9.3.1 Application of VPLS in Residential Services
1.14.9.3.2 Application of VPLS in Enterprise Services
1.14.9.3.3 VPLS PW Redundancy for Protecting Multicast Services
1.14.9.3.4 VPLS PW Redundancy for Protecting Unicast Services
1.14.9.3.5 Application of Multicast VPLS
1.14.9.3.6 VPWS Accessing VPLS
1.14.9.3.7 VPLS Multi-homing Application
1.14.10 L2VPN Accessing L3VPN
1.14.10.1 Introduction
1.14.10.2 Principles
1.14.10.2.1 Basic Concepts and Implementation
1.14.10.2.2 Classification of L2VPN Accessing L3VPN
1.14.10.3 Applications
1.14.10.3.1 VPWS Accessing L3VPN
1.14.10.3.2 VPLS Accessing L3VPN
1.14.10.4 Terms, Acronyms, and Abbreviations
1 Feature Description
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
Network planning engineers
Commissioning engineers
Data configuration engineers
System maintenance engineers
Security Declaration
Encryption algorithm declaration
The encryption algorithms DES/3DES/SKIPJACK/RC2/RSA (RSA-1024 or
lower)/MD2/MD4/MD5 (in digital signature scenarios and password encryption)/SHA1
(in digital signature scenarios) provide low security and may introduce security risks.
If the protocols allow, using more secure encryption algorithms, such as AES/RSA
(RSA-2048 or higher)/SHA2/HMAC-SHA2, is recommended.
Password configuration declaration
− Do not set both the start and end characters of a password to "%^%#". This causes
the password to be displayed directly in the configuration file.
− To further improve device security, periodically change the password.
Personal data declaration
Your purchased products, services, or features may use some of users' personal data
during service operation or fault locating. You must define user privacy policies in
compliance with local laws and take proper measures to fully protect personal data.
Feature declaration
Special Declaration
This document serves only as a guide. The content is written based on device
information gathered under lab conditions and is intended as general guidance; it does
not cover all scenarios. The content of this document may differ from the information
on user device interfaces due to factors such as version upgrades and differences in
device models, board restrictions, and configuration files. The actual user device
information takes precedence over the content provided by this document. The
preceding differences are beyond the scope of this document.
The maximum values provided in this document are obtained in specific lab
environments (for example, only a certain type of board or protocol is configured on a
tested device). The maximum values actually obtained may differ from those provided
in this document due to factors such as differences in hardware configurations and
carried services.
Interface numbers used in this document are examples. Use the existing interface
numbers on devices for configuration.
The pictures of hardware in this document are for reference only.
Symbol Conventions
The symbols that may be found in this document are defined as follows.

Symbol Description
DANGER: Indicates an imminently hazardous situation which, if not avoided, will result in death or serious injury.
WARNING: Indicates a potentially hazardous situation which, if not avoided, could result in death or serious injury.
Change History
Updates between document issues are cumulative. Therefore, the latest document issue
contains all updates made in previous issues.
Changes in Issue 03 (2017-09-20)
This issue is the third official release. The software version of this issue is
V800R009C10SPC200.
Changes in Issue 02 (2017-07-30)
This issue is the second official release. The software version of this issue is
V800R009C10SPC100.
Changes in Issue 01 (2017-05-30)
This issue is the first official release. The software version of this issue is
V800R009C10.
− TCP
− IPv4/IPv6 dual stack
− Diverse user link access techniques
− Unicast routing protocols
− Multiprotocol Label Switching (MPLS) protocols, including MPLS Label
Distribution Protocol (LDP) and MPLS traffic engineering (TE)
Value-added services include:
− User access control
− Security
− Firewall
− L3VPN (Layer 3 Virtual Private Network)
The network devices running the VRP are configured and managed on the following universal
management interfaces:
Command-line interface (CLI)
SNMP
NETCONF
As a large-scale IP routing software package, the VRP has been developed based on industry
standards and passes rigorous tests before each release. Major features and specifications of
the VRP comply with industry standards, including those defined by the Internet Engineering
Task Force (IETF) and the International Telecommunication Union-Telecommunication
Standardization Sector (ITU-T). The VRP software platform has also been verified by the
market: so far, the VRP has been installed on more than 2,000,000 network devices. As IP
technologies and hardware develop, new VRP versions are released to provide higher
performance, extensibility, and reliability, as well as more value-added services.
The VRP5 is a distributed network operating system featuring high extensibility, reliability,
and performance. Currently, network devices running VRP5 serve more than 50 carriers
worldwide. The VRP5 provides a rich feature set, and its stability has withstood the test of
the market.
The VRP8 is a new-generation network operating system with a distributed, multi-process,
component-based architecture. The VRP8 supports distributed applications and virtualization
techniques. It builds on hardware development trends and is designed to meet carriers'
rapidly growing service requirements over the next five to ten years.
1.2.2.2 Architecture
1.2.2.2.1 VRP8 Componentization
Componentization refers to the method of encapsulating associated functions and data into a
software module, which is instantiated to function as a basic unit of communication
scheduling. The VRP8 architecture design is component-based. The entire system is divided
into multiple independent components that communicate through interfaces. One component
provides services for another through an interface, and the served component does not need
to know how the serving component provides its services.
The component-based architecture design has the following advantages:
Components are replaceable.
A component can be replaced by another component if the substitute provides the same
functions and services as those of the replaced component. The new component can even
use a different programming language. This enables a user to upgrade or add VRP8
components.
Components are reusable.
High-quality software components can serve for a long time and are stored in the
software database. This allows the VRP8 software to be customized for a product
architecture that is quite different from its original hardware platform.
Components are distributable.
VRP8 components can be deployed in a distributed manner: two related components can
run on different nodes and communicate with each other across networks. Component
distribution is implemented without modifying the components themselves; only the
data of the related deployment policies needs to be modified.
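The replaceability property described above can be illustrated with a small sketch (Python is used purely for illustration; VRP8 is not written in Python, and every name below is hypothetical). A consumer depends only on a service interface, so any component implementing that interface can be substituted without the consumer knowing:

```python
from abc import ABC, abstractmethod

class RouteService(ABC):
    """Hypothetical service interface: consumers depend only on this contract."""

    @abstractmethod
    def lookup(self, prefix: str) -> str:
        """Return the next hop for a destination prefix."""

class StaticRouteComponent(RouteService):
    """One interchangeable implementation, backed by a static table."""

    def __init__(self, table):
        self.table = table

    def lookup(self, prefix):
        return self.table.get(prefix, "drop")

class DefaultRouteComponent(RouteService):
    """A substitute implementation: same interface, different behavior."""

    def lookup(self, prefix):
        return "10.0.0.1"  # always forward to a default gateway

def forward(service: RouteService, prefix: str) -> str:
    # The served code never needs to know which component serves it.
    return service.lookup(prefix)
```

Either component can be handed to `forward()`; swapping one for the other requires no change to the consumer, which is the point of interface-based componentization.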
A physical system (PS) can be divided into multiple virtual systems (VSs). Each VS can be
separately configured with services. VSs share a line card's physical interfaces, hardware
forwarding resources, and the processing capability of the control plane.
Virtualization techniques provide the following functions:
Virtualized networks
VSs of a device can be leased to enterprise users and other service providers (SPs), which
reduces CAPEX and OPEX and provides a higher level of security and reliability.
Flat networks
VSs allow one device to have the functionalities of multiple devices, such as provider (P)
and provider edge (PE) devices, which simplifies the network architecture and flattens
the network.
Multi-service over a network
A variety of services are separately deployed on various VSs. These VSs form a logical
multi-service network. VSs allow services to be independent of each other, which
improves security and reliability and reduces CAPEX and OPEX.
New service verification
After a device is divided into VSs, new services such as IPv6 services or video services
can be verified separately without affecting existing services. VSs carrying various
services form a logical network, which improves security and reliability.
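The partitioning model above (one physical system, several independently configured virtual systems sharing its interfaces) can be sketched roughly as follows. This is an illustrative Python model, not VRP8 code; the class names and interface names are invented for the example:

```python
class PhysicalSystem:
    """Hypothetical model: a PS owns physical interfaces shared out to VSs."""

    def __init__(self, interfaces):
        self.free_interfaces = set(interfaces)
        self.virtual_systems = {}

    def create_vs(self, name, wanted):
        """Carve out a VS; an interface can belong to only one VS at a time."""
        wanted = set(wanted)
        if not wanted <= self.free_interfaces:
            raise ValueError("interface already allocated to another VS")
        self.free_interfaces -= wanted
        # Each VS holds its own service configuration independently.
        self.virtual_systems[name] = {"interfaces": wanted, "services": []}

    def configure_service(self, name, service):
        """Configure a service on one VS without touching the others."""
        self.virtual_systems[name]["services"].append(service)
```

Deploying a trial service (say, IPv6) on one VS then leaves the configuration of every other VS untouched, which is why new-service verification does not disturb existing services.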
Figure 1-4 Improving performance and capacity extensibility through VRP8 distribution
On the VRP8, the data plane adopts a model-based forwarding technique. A new function can
be implemented, or an existing function changed, on the forwarding plane merely by changing
the forwarding model rather than the code, enabling quick responses to carriers' demands.
To support various network interfaces, an IP network device usually supports various line card
types. The problem is that these cards historically needed to be replaced as technology
progressed and chips were replaced or updated. To help carriers maximize the return on
investment and avoid large-scale line card replacement, the software needs to support forward
and backward compatibility of line cards. The VRP8 implements forward and backward
compatibility of line cards over the standard driver framework with the help of software and
hardware decoupling techniques.
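The decoupling idea can be sketched as a stable driver contract that both old and new card generations implement, so the control software never changes when hardware does. This is an invented Python illustration, not the actual driver framework:

```python
class LineCardDriver:
    """Hypothetical stable driver contract that the OS programs against."""

    def program_route(self, prefix, next_hop):
        raise NotImplementedError

class GenOneCard(LineCardDriver):
    """Older-generation card: writes entries directly into its table."""

    def __init__(self):
        self.tcam = {}

    def program_route(self, prefix, next_hop):
        self.tcam[prefix] = next_hop

class GenTwoCard(LineCardDriver):
    """Newer-generation card: same contract, different chip mechanics."""

    def __init__(self):
        self.batch = []

    def program_route(self, prefix, next_hop):
        self.batch.append((prefix, next_hop))

def install_routes(cards, routes):
    # The control software is identical across card generations.
    for card in cards:
        for prefix, next_hop in routes:
            card.program_route(prefix, next_hop)
```

Because `install_routes()` depends only on the contract, a chassis can mix card generations, which is the forward/backward compatibility the text describes.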
Configuration Management
As shown in Figure 1-7, the VRP8 management plane adopts a hierarchical architecture,
consisting of the following elements:
Configuration tools
Configuration information model
Configuration data
The VRP8 management plane provides the following functions:
Support for various existing configuration tools and more
Implementation of model-based configuration
Data verification and configuration rollback
Database-assisted configuration data recovery
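The data-verification and rollback behavior listed above can be sketched as follows. This is illustrative Python only; the function names and the validation rule are assumptions for the example, not VRP8 APIs:

```python
import copy

def commit(running_config, candidate, validate):
    """Apply a candidate configuration; roll back if validation fails.

    validate -- callable returning (ok, reason) for a proposed configuration.
    """
    checkpoint = copy.deepcopy(running_config)   # rollback point
    running_config.update(candidate)
    ok, reason = validate(running_config)
    if not ok:
        # Verification failed: restore the checkpoint untouched.
        running_config.clear()
        running_config.update(checkpoint)
        return False, reason
    return True, "committed"
```

The key property is that a rejected candidate leaves the running configuration exactly as it was, so an operator can attempt a change without risking the device's current state.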
Fault Management
As shown in Figure 1-8, the VRP8 implements fault management based on service objects.
The VRP8 creates a service object relationship model to analyze the correlation between
alarms, filter out invalid alarms, and report root alarms, speeding up fault identification.
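The correlation step described above (use object relationships to suppress derivative alarms and report only root alarms) can be sketched with a small model. Python is used for illustration; the real VRP8 service-object relationship model is far richer:

```python
def root_alarms(alarms, depends_on):
    """Report only root alarms: suppress any alarm whose cause is also alarming.

    alarms     -- set of service objects currently raising alarms
    depends_on -- maps a service object to the objects it depends on
    """
    roots = set()
    for obj in alarms:
        causes = depends_on.get(obj, [])
        # If none of the objects this one depends on is alarming,
        # this alarm has no upstream cause and is a root alarm.
        if not any(cause in alarms for cause in causes):
            roots.add(obj)
    return roots
```

For example, if a VPN depends on a BGP peer which depends on an interface, an interface failure raises all three alarms, but only the interface alarm is reported as the root cause, which is what speeds up fault identification.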
Performance Management
As shown in Figure 1-9, the VRP8 provides a flexible performance management mechanism.
Information about an object to be monitored, including a description of the object and a
monitoring threshold, can be manually defined on a configuration interface. The configuration
data can then be delivered by the central database. The APP component collects statistics
about the configured object and sends them to a Perf Management server through a
performance management (PM) agent. After receiving the statistics, the Perf Management
server generates information about a fault based on the pre-defined object and monitoring
threshold and then sends this fault information to the network management system (NMS)
through the fault management center. Performance information can be viewed by running a
command or through the NMS.
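The threshold-crossing step in the flow above (user-defined object and threshold, collected statistics, fault generated on crossing) can be sketched as follows. This is an illustrative Python fragment; the data layout and names are invented, not the PM agent's actual format:

```python
def check_thresholds(monitored, samples):
    """Compare collected statistics against user-defined thresholds.

    monitored -- {object_name: {"description": ..., "threshold": ...}},
                 as defined on the configuration interface
    samples   -- {object_name: measured value}, as collected by the APP component
    Returns a fault record for every object whose sample exceeds its threshold.
    """
    faults = []
    for name, spec in monitored.items():
        value = samples.get(name)
        if value is not None and value > spec["threshold"]:
            faults.append({
                "object": name,
                "description": spec["description"],
                "value": value,
                "threshold": spec["threshold"],
            })
    return faults
```

A fault record like this is what the server would forward to the NMS through the fault management center; objects within their thresholds produce nothing.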
Plug-and-Play
As shown in Figure 1-10, VRP8 plug-and-play allows a large number of devices to be
deployed at a site in one batch and then managed and maintained remotely, reducing
OPEX.
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
Network planning engineers
Commissioning engineers
Data configuration engineers
Security Declaration
Encryption algorithm declaration
The encryption algorithms DES, 3DES, SKIPJACK, RC2, RSA (RSA-1024 or
lower), MD2, MD4, MD5 (in digital signature scenarios and password encryption), and SHA1
(in digital signature scenarios) provide low security and may pose security risks. If the
protocols allow, using more secure encryption algorithms, such as AES, RSA
(RSA-2048 or higher), SHA2, and HMAC-SHA2, is recommended.
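To illustrate the recommendation above, the following sketch contrasts a deprecated MD5 digest with SHA-256 and HMAC-SHA256 from Python's standard library (the sample secret is hypothetical):

```python
import hashlib
import hmac
import os

# Hypothetical sample secret; never hard-code real passwords.
password = b"example-password"

# Deprecated: MD5 (and SHA1) are considered weak for signatures and passwords.
weak = hashlib.md5(password).hexdigest()       # 32 hex characters; avoid

# Recommended: SHA2 family (SHA-256 shown) or HMAC-SHA2 with a random key.
strong = hashlib.sha256(password).hexdigest()  # 64 hex characters

key = os.urandom(32)
mac = hmac.new(key, password, hashlib.sha256).hexdigest()

print(len(weak), len(strong), len(mac))
```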
Password configuration declaration
− Do not set both the start and end characters of a password to "%^%#". Otherwise,
the password is displayed directly in the configuration file.
− To further improve device security, periodically change the password.
Personal data declaration
Your purchased products, services, or features may use some of users' personal data
during service operation or fault locating. You must define user privacy policies in
compliance with local laws and take proper measures to fully protect personal data.
Feature declaration
− The NetStream feature may be used to analyze the communication information of
terminal customers for network traffic statistics and management purposes. Before
enabling the NetStream feature, ensure that its use falls within the boundaries
permitted by applicable laws and regulations. Effective measures must be taken to
ensure that the information is securely protected.
− The mirroring feature may be used to analyze the communication information of
terminal customers for maintenance purposes. Before enabling the mirroring
function, ensure that its use falls within the boundaries permitted by applicable
laws and regulations. Effective measures must be taken to ensure that the
information is securely protected.
− The packet header obtaining feature may be used to collect or store some
communication information about specific customers for transmission fault and
error detection purposes. Huawei cannot offer services to collect or store this
information unilaterally. Before enabling the function, ensure that its use falls
within the boundaries permitted by applicable laws and regulations. Effective
measures must be taken to ensure that the information is securely protected.
Reliability design declaration
Network planning and site design must comply with reliability design principles and
provide device-level and solution-level protection. Device-level protection includes
dual-network and inter-board dual-link planning principles to avoid single points of
failure on a node or link. Solution-level protection refers to fast convergence
mechanisms, such as FRR and VRRP.
Special Declaration
This document serves only as a guide. The content is written based on device
information gathered under lab conditions. The content provided by this document is
intended to be taken as general guidance, and does not cover all scenarios. The content
provided by this document may be different from the information on user device
interfaces due to factors such as version upgrades and differences in device models,
board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are
beyond the scope of this document.
The maximum values provided in this document are obtained in specific lab
environments (for example, only a certain type of board or protocol is configured on a
tested device). The actually obtained maximum values may be different from the
maximum values provided in this document due to factors such as differences in
hardware configurations and carried services.
Interface numbers used in this document are examples. Use the existing interface
numbers on devices for configuration.
The pictures of hardware in this document are for reference only.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
Indicates an imminently hazardous situation which, if not
avoided, will result in death or serious injury.
Change History
Updates between document issues are cumulative. Therefore, the latest document issue
contains all updates made in previous issues.
Changes in Issue 03 (2017-09-20)
This issue is the third official release. The software version of this issue is
V800R009C10SPC200.
Changes in Issue 02 (2017-07-30)
This issue is the second official release. The software version of this issue is
V800R009C10SPC100.
1.3.2 TTY
1.3.2.1 Introduction
Terminal type (TTY), also called terminal service, provides access interfaces and
human-machine interfaces (HMIs) for you to configure routers. TTY supports the following ports:
Console port
Virtual type terminal (VTY) port
Routers support user login over console or VTY ports. You can use a console port to set user
interface parameters, such as the speed, data bits, stop bits, and parity. You can also initiate a
Telnet or Secure Shell (SSH) session to log in to a VTY port.
1.3.2.2 Principles
1.3.2.2.1 TTY
User Management
You can configure, monitor, and maintain local or remote network devices only after
configuring user interfaces, user management, and terminal services. User interfaces provide
login venues, user management ensures login security, and terminal services provide login
protocols. Routers support user login over console ports.
User Interface
A user interface is presented in the form of a user interface view for you to log in to a router.
You can use user interfaces to set parameters on all physical and logical interfaces that work
in asynchronous and interactive modes, and manage, authenticate, and authorize login users.
Routers allow users to access user interfaces through console ports.
A console port is provided by the IPU of a router. The IPU provides one console port that
conforms to the EIA/TIA-232 standard. The console port is a data circuit-terminating
equipment (DCE) interface. The serial port of a user terminal can be directly connected to a
router's console port for local configuration.
User Login
When a router starts for the first time, no user name or password is configured on it. However,
the router prompts you to configure a password during the first login. After you configure a
password for a router, you must enter the configured password before logging in to the router
through the console port.
When a router is powered on for the first time, you must log in to the router through the console port,
which is a prerequisite for other login modes as well. For example, you can use Telnet to log in to a
router only after you use the console port to log in to the router and configure an IP address.
1.3.3 Telnet
1.3.3.1 Introduction
The Telecommunication Network Protocol (Telnet) originated on ARPANET, which was
launched in 1969. It is one of the earliest Internet applications.
A Telnet connection is a Transmission Control Protocol (TCP) connection used to transmit
data with interspersed Telnet control information. Telnet uses the client/server model to
present an interactive interface that enables a terminal to remotely log in to a server. A user
can log in to one host and then use Telnet to remotely log in to and configure and manage
multiple hosts without having to connect each one to a terminal. Figure 1-11 shows the Telnet
client/server model.
In Figure 1-11:
Telnet uses TCP for transmission.
All Telnet echo information is displayed on the terminal.
The server directly interacts with the pseudo terminal.
The server and client transmit commands and data over the TCP connection.
The client logs in to the server.
1.3.3.2 Principles
1.3.3.2.1 Telnet
Telnet applies to any host or terminal. The client's operating system maps a terminal to a
network virtual terminal (NVT) regardless of the terminal's type. Then, the server maps the
NVT into a supported terminal type. This mapping masks client and terminal types.
Communicating ends are assumed to be connected to the NVTs.
Telnet uses the symmetric client/server mode. Therefore, each end of a Telnet connection must have an
NVT.
Two communicating ends negotiate options by sending WILL, WONT, DO, and DONT
requests. The options are used to determine the content of the Telnet service and include the
echo information, command change character set, and line mode.
Requests in Telnet
Either communicating end can initiate a request. Table 1-1 describes requests in Telnet.
When the sender sends a WONT or DONT request, the receiver must grant the request.
When the sender sends a WILL or DO request, the receiver can grant or reject the
request.
− If the receiver grants the request, the option immediately takes effect.
− If the receiver rejects the request, the option does not take effect. The sender can
still retain the NVT function.
Option Negotiation
Option negotiation requires the following three items:
IAC
A WILL, DO, WONT, or DONT request
Option ID
The following is an example of option negotiation.
The server wants to enable option 33 "remote flow control," and the client grants the request.
The exchanged commands are as follows:
On the server: <IAC, WILL, 33>
On the client: <IAC, DO, 33>
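The exchange above can be written out as raw Telnet command bytes. A minimal sketch (the command constants follow RFC 854/855; option 33 is used only to mirror the example above):

```python
# Telnet command bytes (RFC 854): IAC = 255, WILL = 251, WONT = 252,
# DO = 253, DONT = 254.
IAC, WILL, WONT, DO, DONT = 255, 251, 252, 253, 254

def offer(option: int) -> bytes:
    """Sender: <IAC, WILL, option> asks to enable an option on its side."""
    return bytes([IAC, WILL, option])

def accept(option: int) -> bytes:
    """Receiver: <IAC, DO, option> grants the offer."""
    return bytes([IAC, DO, option])

# The server wants to enable option 33; the client grants the request.
server_msg = offer(33)   # b'\xff\xfb!'
client_msg = accept(33)  # b'\xff\xfd!'
```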
Suboption Negotiation
In addition to an option ID, other information may be required. For example, if the receiver is
required to specify a terminal type, the receiver must respond with an American Standard
Code for Information Interchange (ASCII) string to identify the terminal type.
The format of suboption negotiation is as follows:
<IAC, SB, option ID, suboption content, IAC, SE>
A complete suboption negotiation process is as follows:
1. The sender asks to enable the option by sending a DO/WILL request carrying the option
ID.
2. The receiver grants the request by sending a WILL/DO request carrying the option ID.
Through the preceding steps, both communicating ends agree to enable the option.
3. Either communicating end sends the request carrying the suboption ID through the
suboption-begin (SB) command and ends the suboption negotiation through the
suboption-end (SE) command.
4. The other end responds to the suboption negotiation through the SB command, suboption
codes, and related negotiation information, and then ends the suboption.
5. The receiver responds with a DO/WILL message to grant the request.
If there is no other suboption to be negotiated, the current negotiation is complete.
Assume, for demonstration purposes, that the receiver grants the request from the sender.
In practice, the receiver can reject the request from the server at any time.
The sender can request the terminal type only when the option negotiation type is DO.
The sender can relay the actual terminal type only when the option negotiation type is WILL.
The terminal type cannot be sent automatically. It is sent only for responding to the request, that is, it
is sent in request-response mode.
The terminal type information is a case-insensitive NVT ASCII string.
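The request-response exchange for the terminal type described in the notes above can be sketched in code. This assumes the TERMINAL-TYPE option (option 24, RFC 1091), in which subcode SEND asks the peer for its terminal type and subcode IS carries the NVT ASCII answer:

```python
# Telnet suboption framing (RFC 854/1091).
IAC, SB, SE = 255, 250, 240
TERMINAL_TYPE = 24  # option ID for TERMINAL-TYPE
SEND, IS = 1, 0     # suboption subcodes

def send_request() -> bytes:
    """<IAC, SB, 24, SEND, IAC, SE>: ask the peer for its terminal type."""
    return bytes([IAC, SB, TERMINAL_TYPE, SEND, IAC, SE])

def is_response(term: str) -> bytes:
    """<IAC, SB, 24, IS, 'VT100', IAC, SE>: answer with a terminal type."""
    return (bytes([IAC, SB, TERMINAL_TYPE, IS])
            + term.encode("ascii")
            + bytes([IAC, SE]))

def parse_terminal_type(frame: bytes) -> str:
    """Extract the NVT ASCII terminal-type string from an IS response."""
    assert frame[:4] == bytes([IAC, SB, TERMINAL_TYPE, IS])
    assert frame[-2:] == bytes([IAC, SE])
    return frame[4:-2].decode("ascii")

print(parse_terminal_type(is_response("VT100")))  # VT100
```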
Operating Modes
Telnet supports the following operating modes:
Half-duplex
One character at a time
One line at a time
Line mode
Symmetry
Symmetry in the negotiation syntax allows either the client or server to request a particular
option as required, which optimizes the services provided by the other party. A terminal
protocol allows a terminal to interact with an application process on a host. It also allows
process-process and terminal-terminal interactions.
1.3.3.3 Applications
1.3.3.3.1 Telnet
Telnet applies to remote login. You can use Telnet to configure, monitor, and maintain remote
or local devices.
As shown in Figure 1-13, you can use Telnet on Device A to remotely log in to Device B.
1.3.4 SSH
1.3.4.1 Introduction
Telnet access is not secure because there is no authentication method and the data transmitted
across Transmission Control Protocol (TCP) connections is in plaintext. As a result, the
system is vulnerable to denial of service (DoS), IP address spoofing, and route spoofing
attacks.
As network security becomes increasingly critical, using Telnet and File Transfer Protocol
(FTP) to transmit passwords and data in plaintext proves increasingly vulnerable.
Secure Shell (SSH) resolves this issue. SSH encrypts the transmitted data to provide networks
with security services and therefore ensures security during remote login.
SSH exchanges data using TCP. It builds a secure channel over TCP. In addition to the
standard port (port 22), SSH supports access from other service ports to prevent unauthorized
access.
SSH has three versions: SSH1.0, SSH1.5, and SSH2.0. The NE20E implements SSH2.0, which is
backward compatible.
Unless specified otherwise, SSH in this document refers to SSH2.0.
1.3.4.2 Principles
1.3.4.2.1 SSH
SSH Client
The SSH client function allows you to establish SSH connections with a router that can
function as an SSH server or with a UNIX host. Figure 1-14 and Figure 1-15 show the setup
of SSH channels for a local area network (LAN) and a wide area network (WAN),
respectively.
SFTP
SFTP is short for SSH FTP, a secure file transfer protocol based on SSH. It allows users to
log in to a remote device securely for file management and transfer, enhancing the security
of data transmission. In addition, a device functioning as an SFTP client can log in to a
remote SSH server.
STelnet
STelnet is a secure Telnet protocol based on SSH2.0. Unlike Telnet, SSH authenticates
clients and encrypts data in both directions to guarantee secure transmission over a
conventional insecure network.
SCP
Secure Copy (SCP) is based on SSH2.0. It guarantees secure file transfer in a traditionally
insecure network environment by authenticating the client and encrypting the transmitted data
over SSH.
SCP uses Secure Shell (SSH) for data transfer and uses the same mechanisms for
authentication, thereby ensuring the authenticity and confidentiality of the data in transit. A
client can send (upload) files to a server and can also request (download) files or directories
from a server. SCP runs over TCP port 22 by default.
Unlike SFTP, SCP allows file uploading or downloading without user authentication and
public key assignment, and also supports file uploading or downloading in batches.
Only authorized clients can set up socket connections with the SSH server through the
non-standard port. The clients and server then negotiate an SSH version, algorithms, and
session keys. User authentication, session requests, and interactive sessions are performed
subsequently.
SSH can be applied on switched or edge devices across the network to implement secure user
access and management on the devices.
Supports data encryption algorithms, such as Data Encryption Standard (DES), 3DES,
and Advanced Encryption Standard (AES).
Encrypts the data exchanged between the SSH client and the server, including the user
name and password. This encryption prevents the password from being intercepted.
SM2 elliptic curve cryptography (ECC) algorithm
The SM2 algorithm is based on ECC. Both the SM2 and RSA algorithms belong to the
asymmetric cryptography system. The differences between the ECC and RSA
algorithms are as follows:
− The RSA algorithm is based on large-number factorization, which requires long
keys. Long keys slow down computation and complicate key storage and
management.
− The ECC algorithm is based on the discrete logarithm problem, which is difficult
to crack, making ECC more secure.
Compared with the RSA algorithm, the ECC algorithm secures encryption with
shorter keys while ensuring the same security, which speeds up encryption. The ECC
algorithm has the following advantages over the RSA algorithm:
− Provides the same security with a shorter key length.
− Features a shorter computing process and higher processing speed.
− Requires less storage space.
− Requires lower bandwidth.
To ensure high security, do not use the DES or 3DES algorithm, or an RSA key shorter than
2048 bits, for SSH user authentication or data encryption. You are advised to use the more
secure ECC authentication algorithm.
Supporting ACL
The SSH server can use access control lists (ACLs) to limit SSH users' incoming and
outgoing call rights. ACLs prevent unauthorized users from setting up TCP connections
and entering the SSH negotiation phase, which improves SSH server access security.
1.3.4.3 Applications
1.3.4.3.1 Supporting STelnet
STelnet is based on SSH. The client and server set up a secure connection through negotiation.
The client can then log in to the server through the secure Telnet service.
Supports the NETCONF file transfer process and provides acknowledgment of file
transfer success or failure.
Figure 1-19 shows an SFTP application.
For example, a directory contains multiple files and sub-directories. SCP can be used to
transfer all files in the directory in a batch without changing the hierarchical directory
structure.
Definition
The command line interface (CLI) is an interface through which you can interact with a router.
The system provides a series of commands that allow you to configure and manage the router.
Purpose
The CLI is a traditional configuration tool available on most data communication
products. However, with the wider application of data communication products worldwide,
customers require a more usable, flexible, and user-friendly CLI.
Carrier-class devices have strict requirements for system security. Users must pass the
Authentication, Authorization and Accounting (AAA) authentication before logging in to a
CLI or before running commands, which ensures that users can view and use only the
commands that match their rights.
1.3.5.2 Principles
1.3.5.2.1 Principles of the Command Line Interface
The CLI is a key configuration tool. After you log in to a router, a prompt is displayed,
indicating that you have accessed the CLI and can enter a command.
The CLI parses commands and packets carrying configuration information. You can use the
CLI to configure and manage routers. The CLI also provides an online help function.
− When you enter a command followed by a space and a question mark (?), the value
range and function of the parameter are listed if the position of the question mark (?)
is for a parameter.
To provide full help in command mode, the CLI undergoes the following phases:
a. Command receiving phase
The CLI receives and displays all characters you have entered. When you enter a
question mark (?), the CLI starts online help. If full help is required, the system
starts full help.
b. Command matching phase
The system compares the received command with commands in the current
command mode to search for a matching command.
If a matching command exists, the system matches commands with your rights
and displays all commands you can use.
If a matching command does not exist, the system informs you that the
command is invalid and waits for a new command.
c. Command help phase
The system searches the configurable commands for possible elements in the
question mark (?) position.
If the entered command is complete, cr is displayed.
If the entered command is incomplete, possible command elements and their
description are displayed.
Partial help
− When you enter a string followed by a question mark (?), the system lists all
keywords that start with the string.
− When you enter a command followed by a question mark (?):
If the position of the question mark (?) is for a keyword, all keywords in the
command starting with the string are listed.
If the position of the question mark (?) is for a parameter and the parameter is
valid, information about all the parameters starting with the string is listed,
including the value range.
If the position of the question mark (?) is for a parameter but the parameter is
invalid, the CLI informs you that the input is incorrect.
To provide partial help in specific command mode, the CLI undergoes the following
phases:
a. Command receiving phase
The CLI receives and displays all characters you have entered. When you enter a
question mark (?), the CLI starts online help. If partial help is required, the system
starts partial help.
b. Command matching phase
The system compares the received command with commands in the current
command mode to search for a matching command.
If a matching command exists, the system matches commands with your rights
and displays all commands you can use.
If a matching command does not exist, the system informs you that the
command is invalid and waits for a new command.
c. Command help phase
The system searches configurable commands for possible command elements in the
position of a question mark (?) and displays possible command elements.
Tab help
Tab help is an application of partial help, which provides help only for keywords. The
system does not display the description of a keyword.
You can enter the first letters of a keyword in a command and press Tab.
− If what you have entered identifies a unique keyword, the complete keyword is
displayed.
− If what you have entered does not identify a unique keyword, you can press Tab
repeatedly to view the matching keywords and select the desired one.
− If what you have entered does not match any command element, the system does
not modify the input and just displays what you have entered.
− If what you have entered is not a keyword in the command, the system does not
modify the input and just displays what you have entered.
The CLI also provides dynamic help for querying the database and script. If parameters
in a command support dynamic help and you enter the first letters of a parameter in the
command and press Tab, the following situations occur:
− If what you have entered identifies a unique parameter, the complete parameter is
displayed.
− If what you have entered does not identify a unique parameter, you can press Tab
repeatedly to view the matching parameters and select the desired one.
Different terminal software defines shortcut keys differently. Therefore, the shortcut keys on your
terminal may be different from those listed here.
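The partial-help and Tab-help behavior described above reduces to prefix matching over the keywords valid in the current command mode. A minimal sketch (the keyword set is hypothetical, not an actual NE20E command set):

```python
def match_keywords(prefix: str, keywords: list[str]) -> list[str]:
    """Partial help: list all keywords in the current mode starting with prefix."""
    return [kw for kw in keywords if kw.startswith(prefix)]

def tab_complete(prefix: str, keywords: list[str]) -> str:
    """Tab help: complete only when the prefix identifies a unique keyword;
    otherwise leave the input unchanged (the user presses Tab to cycle)."""
    hits = match_keywords(prefix, keywords)
    return hits[0] if len(hits) == 1 else prefix

# Hypothetical keywords for illustration only.
mode_keywords = ["display", "dir", "debugging"]

print(match_keywords("d", mode_keywords))  # all three keywords
print(tab_complete("dis", mode_keywords))  # unique match: completed
print(tab_complete("d", mode_keywords))    # ambiguous: input unchanged
```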
1.3.5.3 Applications
None
1.3.6 SSL
1.3.6.1 Introduction
Definition
The Secure Sockets Layer (SSL) protocol is a cryptographic protocol that provides
communication security over the Internet. It allows a client and a server to communicate in a
way designed to prevent eavesdropping by authenticating the server or the client.
Purpose
SSL and application layer protocols work independently. Connections of application layer
protocols, such as Syslog, can be established based on SSL handshakes. Before a client and a
server use an application layer protocol to communicate, SSL is used to determine
cryptography, negotiate a key, and authenticate the server. Data that is then transmitted using
the application layer protocol between the client and the server will be encrypted, thereby
protecting privacy.
Benefits
SSL offers the following benefits:
Provides secure network transmission. SSL uses data encryption, authentication, and
message integrity check to ensure secure data transmission over the network.
Supports various application layer protocols. SSL is originally designed for securing
World Wide Web traffic. As SSL functions between the application and transport layers,
it secures data transmission for any application layer protocol based on TCP connections.
Achieves easy deployment.
1.3.6.2 Principles
1.3.6.2.1 SSL
Working Process
SSL protocol structure
As shown in Figure 1-25, SSL functions between the application and transport layers. It
secures data transmission for any application layer protocol based on TCP connections.
SSL is divided into two layers: lower layer with the SSL record protocol and upper layer
with the SSL handshake protocol, SSL change cipher spec protocol, and SSL alert
protocol.
− SSL record protocol: divides upper-layer information blocks into records, computes
and adds message authentication codes (MACs), encrypts records, and sends them
to the receiver.
− SSL handshake protocol: negotiates a cipher suite including a symmetric encryption
algorithm, a key exchange algorithm, and a MAC algorithm, exchanges a shared
key securely between a server and a client, and authenticates the server and client.
The client and server establish a session using the SSL handshake protocol to
negotiate session parameters including the session identifier, peer certificate, cipher
suite, and master secret.
− SSL change cipher spec protocol: used by the client and server to send a
ChangeCipherSpec message to notify the receiver that subsequent records will be
protected under the newly negotiated cipher suite and key.
− SSL alert protocol: allows one end to report alerts to the other. An alert message
conveys the alert severity and description.
SSL handshake process
The client and server negotiate session parameters during the SSL handshake process to
establish a session. Session parameters mainly include the session identifier, peer
certificate, cipher suite, and master secret. The master secret and cipher suite are used to
compute a MAC and encrypt data to be transmitted in this session.
The SSL handshake process varies according to the real-world situations. Handshake
processes in three situations are described as follows:
− SSL handshake process in which only the server is authenticated
Figure 1-26 SSL handshake process in which only the server is authenticated
As shown in Figure 1-26, only the SSL server, not the SSL client, needs to be
authenticated. The SSL handshake process is as follows:
i. The SSL client sends a ClientHello message specifying the supported SSL
protocol version and cipher suite to the SSL server.
ii. The server responds with a ServerHello message containing the protocol
version and cipher suite chosen from the choices offered by the client. If the
server allows the client to reuse this session in the future, the server sends a
ServerHello message carrying a session ID to the client.
iii. The server sends a Certificate message carrying its digital certificate with its
public key to the client.
iv. The server sends a ServerHelloDone message, indicating that the SSL protocol
version and cipher suite negotiation finishes and key information exchange
starts.
v. After verifying the digital certificate of the server, the client responds with a
ClientKeyExchange message carrying a randomly generated key (called the
master secret), which is encrypted using the public key of the server certificate.
vi. The client sends a ChangeCipherSpec message to notify the server that every
subsequent message will be encrypted and a MAC will be computed based on
the negotiated key and cipher suite.
vii. The client computes a hash for all the previous handshake messages except the
ChangeCipherSpec message, uses the negotiated key and cipher suite to
process the hash, and sends a Finished message containing the hash and MAC
to the server. The server computes a hash in the same way, decrypts the
received Finished message, and verifies the hash and MAC. If the verification
succeeds, the key and cipher suite negotiation is successful.
viii. The server sends a ChangeCipherSpec message to notify the client that
subsequent messages will be encrypted and a MAC will be computed based on
the negotiated key and cipher suite.
ix. The server computes a hash for all the previous handshake messages, uses the
negotiated key and cipher suite to process the hash, and sends a Finished
message containing the hash and MAC to the client. The client computes a
hash in the same way, decrypts the received Finished message, and verifies the
hash and MAC. If the verification succeeds, the key and cipher suite
negotiation is successful.
After receiving the Finished message from the server, if the client successfully
decrypts the message, the client checks whether the server is the owner of the
digital certificate. Only the SSL server that has a specified private key can decrypt
the ClientKeyExchange message to obtain the master secret. In this process, the
client authenticates the server.
The ChangeCipherSpec message is based on the SSL change cipher spec protocol, and other
messages exchanged in the handshake process are based on the SSL handshake protocol.
Computing a hash means that a hash algorithm (MD5 or SHA) is used to convert an arbitrary-length
message into a fixed-length message.
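On the client side, an SSL/TLS library carries out the handshake steps above; the application only supplies trust anchors and requests server verification. A minimal client-context sketch using Python's standard ssl module (no connection is made here):

```python
import ssl

# Build a client-side context: the client authenticates the server by
# verifying its certificate chain and host name, as in the handshake above.
ctx = ssl.create_default_context()           # loads system CA certificates
assert ctx.verify_mode == ssl.CERT_REQUIRED  # server certificate is mandatory
assert ctx.check_hostname                    # server name is matched as well

# Wrapping a TCP socket with ctx.wrap_socket(sock, server_hostname=...)
# would run the ClientHello/ServerHello exchange, certificate verification,
# and key exchange automatically before any application data is sent.
```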
Whether to authenticate the SSL client is determined by the SSL server. As shown
by blue arrows in Figure 1-27, if the server needs to authenticate the client, the
following operations are required in addition to the SSL handshake process in
which the client authenticates the server:
i. The server sends a CertificateRequest message to request the client to send its
certificate to the server.
ii. The client sends a Certificate message carrying its certificate and public key to
the server. After receiving the message, the server verifies the validity of the
certificate.
iii. The client computes a hash for the master secret over handshake messages,
encrypts the hash using its private key, and then sends a CertificateVerify
message to the server.
iv. The server computes a hash for the master secret over handshake messages,
decrypts the received CertificateVerify message using the public key in the
client's certificate, and compares the decrypted result with the computed hash.
If the two values are the same, client authentication succeeds.
− SSL handshake process for resuming a session
Security Mechanism
Connection privacy
SSL uses symmetric cryptography to encrypt data to be transmitted and uses the Rivest-Shamir-Adleman (RSA) key exchange algorithm, an asymmetric algorithm, to encrypt the key used by the symmetric cryptography.
To ensure high security, do not use an RSA key pair whose length is less than 2048 bits.
Identity authentication
Digitally signed certificates are used to authenticate a server and a client that attempt to
communicate with each other. Authenticating the client identity is optional. The SSL
server and client use the mechanism provided by the Public Key Infrastructure (PKI) to
apply to a CA for a certificate.
Message integrity
A keyed MAC is used to verify message integrity during transmission.
A MAC algorithm takes a key and arbitrary-length data as input and outputs a fixed-length MAC.
− A message sender uses a MAC algorithm and a key to compute a MAC and adds it
to the end of the message before sending the message to the receiver.
− The receiver uses the same key and MAC algorithm to compute a MAC and
compares the computed MAC with the MAC in the received message.
If the two MACs are the same, the message has not been tampered with during transmission.
If the two MACs are different, the message has been tampered with during transmission, and the receiver will discard this message.
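The keyed-MAC check described above can be sketched as follows, using Python's standard hmac module with SHA-256. The function names are illustrative, not device commands:

```python
import hmac
import hashlib

MAC_LEN = 32  # length of an HMAC-SHA256 tag in bytes

def attach_mac(message: bytes, key: bytes) -> bytes:
    # Sender: compute a keyed MAC over the message and append it.
    mac = hmac.new(key, message, hashlib.sha256).digest()
    return message + mac

def verify_mac(packet: bytes, key: bytes) -> bytes:
    # Receiver: recompute the MAC with the same key and compare it
    # with the MAC carried at the end of the received message.
    message, received_mac = packet[:-MAC_LEN], packet[-MAC_LEN:]
    expected = hmac.new(key, message, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, received_mac):
        # MACs differ: the message was tampered with; discard it.
        raise ValueError("message integrity check failed")
    return message

key = b"shared-secret"
packet = attach_mac(b"hello", key)
assert verify_mac(packet, key) == b"hello"
```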
1.3.6.3 Applications
Currently, only DCN and Syslog support SSL-based encryption.
1.3.6.3.1 SSL
SSL authenticates the client and server and encrypts data transmitted between the two parties,
which improves network security.
Some traditional protocols do not have a security mechanism. As a result, data is transmitted in plaintext. To improve security, configure SSL on the clients and server. SSL's data encryption, identity authentication, and message integrity check mechanisms ensure the security of data transmission.
On the DCN network shown in Figure 1-29, an SSL policy is configured on, and a trusted-CA file is loaded to, the GNE and NMS to verify the identity of the certificate owner, sign a digital certificate to prevent eavesdropping and tampering, and manage the certificate and key. After the GNE and NMS authenticate each other and establish a connection, data transmitted between them can be encrypted.
1.3.7 VFM
1.3.7.1 Introduction
Definition
Virtual File Management (VFM) is an interface the system provides for you to manage files.
Purpose
VFM can manage storage devices, directories, and files.
Directory management: VFM allows you to save files in logical hierarchies, query the current
working directory, change a directory, view directory or file information, and create or delete
a directory.
File management: VFM allows you to query, copy, rename, move, delete, and restore files.
1.3.8.1 FTP
1.3.8.1.1 Introduction
When two hosts run different operating systems and use different file structures and character
sets, you can use File Transfer Protocol (FTP) to copy files from one host to the other.
1.3.8.1.3 Principles
1.3.8.1.3.1 FTP
The File Transfer Protocol (FTP), a file transfer standard on the Internet, runs at the
application layer in the TCP/IP protocol suite. FTP is used to transfer files between local and
remote hosts, typically for version upgrades, log downloads, file transfers, and configuration saving. FTP is implemented based on the file system.
FTP uses the client/server architecture, as shown in Figure 1-31.
FTP provides common file operation commands to help you manage the file system, including
file transfer between hosts. You can use an FTP client program outside a router to upload or
download files and access directories on the router. You can also run an FTP client program
on a router to transfer files to other devices or to the FTP server on the router.
FTP Connections
FTP is a standard application protocol based on the TCP/IP protocol suite. It is used to
transfer files between local clients and remote servers. FTP uses two TCP connections to copy
a file from one system to another. The TCP connections are usually established in
client-server mode, one for control (the server port number is 21) and the other for data
transmission (the server port number is 20).
Control connection
A control connection is set up between the FTP client and FTP server.
The control connection always waits for communication between the client and server.
Commands are sent from the client to the server over this connection. The server
responds to the client after receiving the commands.
Data connection
The server uses port 20 to provide a data connection. The server can either set up or
terminate a data connection. When the client sends files in streams to the server, only the
client can terminate the data connection.
FTP supports file transfer in stream mode. The end of each file is indicated by end of file
(EOF). Therefore, new data connections must be set up for each file transfer or directory
list. When a file is transferred between the client and server, a data connection is set up.
Figure 1-32 shows the process of FTP file transfer.
1. The server passively enables port 21 to wait to set up a control connection to the client.
2. The client actively enables a temporary port to send a request for setting up a connection
to the server.
3. After the server receives the request, a control connection is set up between the
temporary port on the client and port 21 on the server.
4. The client sends a command for setting up a data connection to the server.
5. The client chooses a temporary port for the data connection and uses the PORT command to send the port number to the server over the control connection.
6. The server actively enables port 20 to send a request for setting up a data connection.
7. After the client receives the request, a data connection is set up between the temporary
port on the client and port 20 on the server.
Figure 1-33 shows the FTP connection establishment process. In this example, the FTP client
uses temporary port 2345 to establish a control connection and temporary port 2346 to
establish a data connection. The two ports are connected to ports 21 and 20 of the FTP server,
respectively.
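Step 5 above can be sketched as follows. In the PORT command (RFC 959), the client's IP address and data port are encoded as six comma-separated decimal fields, with the port split into a high byte and a low byte. This is a minimal sketch; the function names are illustrative:

```python
def make_port_command(ip: str, port: int) -> str:
    # Encode the client IP address and data port as the six
    # comma-separated decimal fields of an FTP PORT command.
    h1, h2, h3, h4 = ip.split(".")
    p1, p2 = port // 256, port % 256  # high byte, low byte
    return f"PORT {h1},{h2},{h3},{h4},{p1},{p2}"

def parse_port_command(cmd: str) -> tuple:
    # Server side: recover the IP address and port number.
    fields = cmd.split(" ", 1)[1].split(",")
    ip = ".".join(fields[:4])
    port = int(fields[4]) * 256 + int(fields[5])
    return ip, port

# Temporary data port 2346, as in the example above.
cmd = make_port_command("192.168.1.10", 2346)
assert cmd == "PORT 192,168,1,10,9,42"   # 2346 = 9*256 + 42
assert parse_port_command(cmd) == ("192.168.1.10", 2346)
```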
1.3.8.1.4 Applications
FTP applications are as follows:
A Device functions as an FTP client.
A Device functions as an FTP server.
1.3.8.2 TFTP
This section describes basic concepts and principles of the Trivial File Transfer Protocol
(TFTP) and its applications on Huawei devices.
1.3.8.2.1 Introduction
The Trivial File Transfer Protocol (TFTP) runs over the User Datagram Protocol (UDP) and uses port 69.
In TFTP, the client sends a request to the server to read a file, write a file, or establish a connection. A file is transmitted in fixed-length blocks of 512 bytes; a data packet shorter than 512 bytes signals the end of the file transfer. Each data packet carries one numbered data block, which enables the sender to retransmit a data packet that is lost during transmission.
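The block rule just described can be sketched briefly. A file is cut into 512-byte blocks, and a final short block (possibly empty, when the file length is an exact multiple of 512) terminates the transfer:

```python
def split_into_blocks(data: bytes, block_size: int = 512):
    # Split a file into fixed-length 512-byte blocks. A final block
    # shorter than 512 bytes signals the end of the transfer.
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if not blocks or len(blocks[-1]) == block_size:
        # File length is an exact multiple of 512: an empty block
        # must still be sent to terminate the transfer.
        blocks.append(b"")
    return blocks

blocks = split_into_blocks(b"A" * 1300)
assert [len(b) for b in blocks] == [512, 512, 276]
# An exact multiple of 512 ends with an empty terminating block.
assert [len(b) for b in split_into_blocks(b"B" * 1024)] == [512, 512, 0]
```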
Compared with FTP, TFTP does not require complex port interactions or access or
authentication control. TFTP applies when no complex interactions exist between the client
and server. For example, you can use TFTP to obtain the system memory image when the
system starts.
1.3.8.2.2 Principles
1.3.8.2.2.1 TFTP
Message Types
A TFTP packet header contains a two-byte operation code, with values defined as follows:
Read request (RRQ): indicates a read request.
Write request (WRQ): indicates a write request.
Data (DATA): indicates data packets.
Acknowledgment (ACK): indicates a positive reply packet.
Error (ERROR): indicates error packets.
Figure 1-36 shows a TFTP packet header.
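A hedged sketch of how such packets are laid out, using the opcode values listed above (RFC 1350; the "octet" mode here corresponds to binary transfer mode):

```python
import struct

# TFTP opcodes: the 2-byte operation code at the start of every packet.
RRQ, WRQ, DATA, ACK, ERROR = 1, 2, 3, 4, 5

def build_rrq(filename: str, mode: str = "octet") -> bytes:
    # Read request: 2-byte opcode, then NUL-terminated filename and mode.
    return (struct.pack("!H", RRQ)
            + filename.encode() + b"\x00"
            + mode.encode() + b"\x00")

def build_data(block_no: int, payload: bytes) -> bytes:
    # DATA packet: opcode 3, 2-byte block number, up to 512 bytes of data.
    return struct.pack("!HH", DATA, block_no) + payload

pkt = build_rrq("config.cfg")
assert struct.unpack("!H", pkt[:2])[0] == RRQ

data_pkt = build_data(1, b"x" * 100)
opcode, block_no = struct.unpack("!HH", data_pkt[:4])
assert (opcode, block_no) == (DATA, 1)
```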
Transfer Modes
TFTP supports the following transfer modes:
Binary mode: used for program file transfers
ASCII mode: used for text file transfers
HUAWEI NE20E-S2 can function only as a TFTP client and transmit files in binary mode.
1.3.8.2.3 Applications
Definition
Configuration: a series of command operations performed on the system to meet service
requirements. These operations still take effect after the system restarts.
Configuration file: a file used to save configurations. You can use a configuration file to
view configuration information. You can also upload a device's configuration file to
other devices for batch management.
A configuration file saves command lines in a text format. (Non-default values of the
command parameters are saved in the file.) Commands can be organized into a basic
command view framework. The commands in the same command view can form a
section. Empty or comment lines can be used to separate different sections. The line
beginning with "#" is a comment line.
Configuration management: a function for managing configurations and configuration
files using a series of commands.
A storage medium can save multiple configuration files. If the location of a device on the
network changes, its configurations need to be modified. To avoid reconfiguring the
device, specify a configuration file for the next startup. The device restarts with new
configurations to adapt to its new environment.
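The configuration file layout described above (command lines grouped into sections, with "#" comment lines as separators) can be parsed with a short sketch. The function name and sample content are illustrative:

```python
def parse_config_sections(text: str):
    # Split a configuration file into sections. Lines beginning with
    # "#" are comment lines and act as section separators; commands
    # in the same command view form one section.
    sections, current = [], []
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            if current:
                sections.append(current)
                current = []
        else:
            current.append(line.strip())
    if current:
        sections.append(current)
    return sections

sample = ("#\n"
          "sysname NE20E\n"
          "#\n"
          "interface GigabitEthernet0/1/0\n"
          " ip address 10.1.1.1 255.255.255.0\n"
          "#\n")
sections = parse_config_sections(sample)
assert sections[0] == ["sysname NE20E"]
assert sections[1][0] == "interface GigabitEthernet0/1/0"
```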
Purpose
Configuration management allows you to lock, preview, and discard configurations, save the
configuration file used at the current startup, and set the configuration file to be loaded at the
next startup of the system.
Benefits
Configuration management offers the following benefits:
Improved efficiency by configuring services in batches
Improved reliability by correcting incorrect configurations
Improved security by minimizing the configuration impact on services
1.3.9.2 Principles
1.3.9.2.1 Two-Phase Validation Mode
Basic Principles
In two-phase validation mode, the system configuration process is divided into two phases.
The actual configuration takes effect after the two phases are complete. Figure 1-39 shows the
two phases of the system configuration process.
1. In the first phase, a user enters configuration commands. The system checks the data
type, user level, and object to be configured, and checks whether there are repeated
configurations. If syntax or semantic errors are found in the command line, the system
displays a message on the terminal to inform the user of the error and cause.
2. In the second phase, the user commits the configuration. The system then enters the
configuration commitment phase and commits the configuration in the candidate
database to the running database.
− If the configuration takes effect, the system adds it to the running database.
− If the configuration fails, the system informs the user that the configuration is
incorrect. The user can enter the command line again or change the configuration.
In two-phase validation mode, if a configuration has not been committed, the symbol "*" is displayed in the corresponding view (except the user view). If all configurations have been committed, the symbol "~" is displayed in the corresponding view (except the user view).
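The two phases above can be sketched as a candidate database feeding a running database on commit. This is a minimal, assumption-laden model, not the device implementation; class and method names are illustrative:

```python
class TwoPhaseConfig:
    # Sketch of two-phase validation: edits accumulate in a per-user
    # candidate database and reach the running database only on commit.
    def __init__(self):
        self.running = {}    # configurations that have taken effect
        self.candidate = {}  # uncommitted configurations

    def configure(self, key, value):
        # Phase 1: syntax/semantic checks would run here; on success
        # the command is stored in the candidate database only.
        self.candidate[key] = value

    def uncommitted(self):
        # While this is non-empty, the "*" prompt would be displayed.
        return {k: v for k, v in self.candidate.items()
                if self.running.get(k) != v}

    def commit(self):
        # Phase 2: commit the candidate configuration to the running
        # database; entries identical to the running database are
        # skipped as repeated configuration.
        self.running.update(self.uncommitted())
        self.candidate.clear()

cfg = TwoPhaseConfig()
cfg.configure("sysname", "NE20E")
assert cfg.running == {}                     # not yet effective
cfg.commit()
assert cfg.running == {"sysname": "NE20E"}   # now effective
```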
Validity Check
After users enter the system view, the system assigns each user a candidate database. Users
perform configuration operations in their candidate databases, and the system checks the
validity of each user's configurations.
In two-phase validation mode, the system checks configuration validity and displays error
messages. The system checks the validity of the following configuration items:
Repeated configuration
The system checks whether configurations in the candidate databases are identical to
those in the running database.
− If configurations in the candidate databases are identical to those in the running
database, the system does not commit the configuration to the running database and
displays repeated configuration commands.
− If configurations in the candidate databases are different from those in the running
database, the system commits the configuration to the running database.
Data type
Commands available for each user level
Existence of the object to be configured
Benefits
The two-phase validation mode offers the following benefits:
Allows several service configurations to take effect as a whole.
Allows users to preview configurations in the candidate database.
Clears configurations that do not take effect if an error occurs or the configuration does
not meet expectations.
Minimizes the impact of configuration procedures on the existing services.
Basic Concepts
Configuration: a set of specifications and parameters about services or physical resources.
These specifications and parameters are visible to and can be modified by users.
Configuration operation: a series of actions taken to meet service requirements, such as
adding, deleting, or modifying the system configurations.
Configuration rollback point: Once a user commits a configuration, the system
automatically generates a configuration rollback point and saves the difference between
the current configuration and the historical configuration at this configuration rollback
point.
Principles
As shown in Figure 1-41, a user committed configurations N times. Rollback point N
indicates the most recent configuration the user committed. The configuration rollback
procedure is as follows:
1. The user determines to roll the system configuration back to rollback point X based on
the comparison between the historical and current configurations.
2. After the user performs the configuration rollback operation, the system rolls back to the
historical state at rollback point X and generates a new rollback point N+1, which is
specially marked.
Configurations at rollback points N+1 and X are identical.
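The rollback procedure above can be sketched as a list of configuration snapshots, where rolling back to point X appends a new point N+1 identical to X. This is an illustrative model only:

```python
class RollbackPoints:
    # Sketch: each commit records a rollback point; rolling back to
    # point X restores that configuration and generates a new,
    # specially marked point N+1 with identical contents.
    def __init__(self):
        self.points = []  # snapshots; index = rollback point number

    def commit(self, config: dict):
        self.points.append(dict(config))

    def rollback_to(self, x: int) -> dict:
        restored = dict(self.points[x])
        self.points.append(restored)  # new rollback point N+1
        return restored

rp = RollbackPoints()
for cfg in ({"a": 1}, {"a": 1, "b": 2}, {"a": 3, "b": 2}):
    rp.commit(cfg)
restored = rp.rollback_to(1)
assert restored == {"a": 1, "b": 2}
assert rp.points[-1] == rp.points[1]  # points N+1 and X are identical
```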
Usage Scenario
Users can check the system running state after committing system configurations. If a fault or
an unexpected result (such as service overload, service conflict, or insufficient memory
resources) derived from misoperations is detected during the check, the system configurations
must roll back to a previous version. Without configuration rollback, the system allows users to delete or modify system configurations only one by one.
Configuration rollback addresses this issue by allowing users to restore the original
configurations in batches.
The system automatically records configuration changes each time a change is made.
Users can specify the historical state to which the system configurations are expected to
roll back based on the configuration change history.
For example, a user has committed four configurations and four consecutive rollback points
(A, B, C, and D) are generated. If an error is found in configurations committed at rollback
point B, configuration rollback allows the system to roll back to the configurations at rollback
point A.
Configuration rollback significantly improves maintenance efficiency, reduces maintenance
costs, and minimizes error risks when configurations are manually modified one by one.
Benefits
Configuration rollback brings significant benefits for users in terms of configuration security
and system maintenance.
Minimizes impact of mistakes caused by misoperations. For example, if a user
mistakenly runs the undo bgp command, Border Gateway Protocol (BGP)-related
configurations (such as peer configurations) are deleted. Configuration rollback allows
the system to roll back configurations to what they were before the user ran the undo
bgp command.
Facilitates feature testing: When a user is testing a feature, the system generates only one
rollback point if all the feature-related configurations are committed at the same time.
Before the user tests another feature, configuration rollback allows the system to roll
back configurations to what they were before the previous feature was tested, ruling out
the possibility that the previous feature affects the one to be tested.
Functions properly regardless of whether the device restarts. A configuration rollback
point remains after a device restarts. If any change is made after the restart, the system
automatically generates a non-user-triggered configuration rollback point and saves it.
Users can determine whether to roll system configurations back to what they were before
the device restarts.
Usage Scenario
Deploying unverified new services directly on live network devices may affect the current
services or even disconnect devices from the network management system (NMS). To address
this problem, you can deploy configuration trial run. Configuration trial run will roll back the
system to the latest rollback point by discarding the new service configuration if the new
services threaten system security or disconnect devices from the NMS. This function
improves system security and reliability.
Principles
Configuration trial run takes effect only in two-phase configuration validation mode.
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
Network planning engineers
Commissioning engineers
Data configuration engineers
System maintenance engineers
Security Declaration
Encryption algorithm declaration
The encryption algorithms DES/3DES/SKIPJACK/RC2/RSA (RSA-1024 or lower)/MD2/MD4/MD5 (in digital signature scenarios and password encryption)/SHA1 (in digital signature scenarios) provide low security and may bring security risks. If the protocols allow, using more secure encryption algorithms, such as AES/RSA (RSA-2048 or higher)/SHA2/HMAC-SHA2, is recommended.
Password configuration declaration
− Do not set both the start and end characters of a password to "%^%#". Otherwise, the password will be displayed directly in the configuration file.
− To further improve device security, periodically change the password.
Personal data declaration
Your purchased products, services, or features may use some of users' personal data during service operation or fault locating. You must define user privacy policies in compliance with local laws and take proper measures to fully protect personal data.
Feature declaration
− The NetStream feature may be used to analyze the communication information of
terminal customers for network traffic statistics and management purposes. Before
enabling the NetStream feature, ensure that it is performed within the boundaries
Special Declaration
This document serves only as a guide. The content is written based on device
information gathered under lab conditions. The content provided by this document is
intended to be taken as general guidance, and does not cover all scenarios. The content
provided by this document may be different from the information on user device
interfaces due to factors such as version upgrades and differences in device models,
board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are
beyond the scope of this document.
The maximum values provided in this document are obtained in specific lab
environments (for example, only a certain type of board or protocol is configured on a
tested device). The actually obtained maximum values may be different from the
maximum values provided in this document due to factors such as differences in
hardware configurations and carried services.
Interface numbers used in this document are examples. Use the existing interface
numbers on devices for configuration.
The pictures of hardware in this document are for reference only.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
Indicates an imminently hazardous situation which, if not
avoided, will result in death or serious injury.
Symbol Description
Indicates a potentially hazardous situation which, if not
avoided, may result in minor or moderate injury.
Change History
Updates between document issues are cumulative. Therefore, the latest document issue
contains all updates made in previous issues.
Changes in Issue 03 (2017-09-20)
This issue is the third official release. The software version of this issue is
V800R009C10SPC200.
Changes in Issue 02 (2017-07-30)
This issue is the second official release. The software version of this issue is
V800R009C10SPC100.
Changes in Issue 01 (2017-05-30)
This issue is the first official release. The software version of this issue is
V800R009C10.
1.4.2 VS
1.4.2.1 Introduction
Definition
A network administrator divides a physical system (PS) into several virtual systems (VSs)
using hardware and software simulation. Each VS performs independent routing tasks. VSs
share software and hardware resources, including main boards (MBs) and line cards (LCs),
but each interface works only for one VS.
Background
As demand for various types of network services grows, network management becomes more complex, and requirements for service isolation, system security, and reliability are steadily increasing. The virtual private network (VPN) technique can be used to isolate services on a PS. However, if a module failure occurs on the PS, all services configured on the PS will be interrupted.
To prevent service interruptions, the VS technique is used to partition a PS into several VSs.
Each VS functions as an independent network element and uses separate physical resources to
isolate services.
Further development of distributed routing and switching systems allows the VS technique to fully utilize the service processing capability of a single PS. The VS technique helps simplify network deployment and management and strengthens system security and reliability.
Benefits
This feature offers the following benefits to carriers:
Service integrity: Each VS has all the functions of a common router to carry services.
Each VS has an independent control plane, which allows rapid response to future
network services and makes network services more configurable and manageable.
Service isolation: A VS is a virtual router on both the software and hardware planes. A
software or hardware fault in a VS does not affect other VSs. The VS technique ensures
network security and stability.
Expenditure reduction: As an important feature of new-generation IP bearer devices, VSs
play an active role in centralized operation of service provider (SP) services, reducing capital expenditure (CAPEX) and operational expenditure (OPEX).
1.4.2.2 Principles
1.4.2.2.1 Basic VS Concepts
Concepts
Admin-VS and common VS
A virtual system (VS) is classified as an Admin-VS or a common VS.
Common VS: The network administrator uses hardware-level and software-level emulation to
partition a physical system (PS) into VSs. Each interface works only for one VS, and each VS
runs individual routing tasks. VSs share software and hardware resources, including main
boards (MBs) and line cards (LCs).
Admin-VS: Each PS has a default VS named Admin-VS. All unallocated interfaces belong to
this VS. The Admin-VS can process services in the same way as a common VS. In addition,
the PS administrator can use the Admin-VS to manage VSs.
Services are isolated between different VSs, but configuration and log files are not. This mode applies to scenarios with low requirements for VS independence and security.
After you create a VS, allocate hardware resources to the VS.
In Figure 1-43, a PS is partitioned into VSs. Each VS functions as an individual router to
process services.
Implementation
VSs share all resources except interfaces.
VSs can use the same system software to carry various services. In Figure 1-43, VS1 carries
voice services, VS2 carries data services, and VS3 carries video services. Each type of service
is transmitted through a separate VS, and these services are isolated from one another.
VSs can be configured on both physical and logical interfaces, and an interface can be assigned to only a single VS. Logical interfaces configured on a physical interface work for the same VS in the PS to which the LC belongs.
VS partitioning does not require additional hardware; however, a PS must have sufficient interfaces on which VSs can be configured.
VS Authority Management
Table 1-3 shows VS authority management.
PS administrator √ √
VS administrator - -
A VS administrator can perform operations only on the managed VS, including starting and stopping
the allocated services, configuring routes, forwarding service data, and maintaining and managing
the VS.
On the NE20E, physical interfaces can be directly connected so that different VSs on the same
Physical System (PS) can communicate.
1.4.2.3 Applications
Virtual system (VS) applications are as follows:
Different routing instances are isolated, which is more secure and reliable than route
isolation implemented using VPN.
Physical resources of a device can be fully utilized. For example, without the VS technique, on a device with 16 interfaces, if only 4 interfaces are needed to transmit services, the other 12 interfaces remain idle, wasting resources.
Devices of different roles are integrated to simplify network tests, deployment,
management, and maintenance.
Links between devices are simplified into internal buses that are of higher reliability,
higher performance, and lower cost.
In Figure 1-45, a physical device can serve as both aggregation and core nodes (such as the
BRAS, PE, and P), which simplifies network topology and network management and
maintenance.
Definition
Information management classifies output information, effectively filters out information, and
outputs information to a local device or a remote server.
Purpose
The information management function helps users:
Locate faults effectively.
Classify and filter out information.
Send information to a network management station (workstation) to help a network
administrator monitor routers and locate faults.
1.4.3.2 Principles
1.4.3.2.1 Information Classification
Table 1-4 describes information that can be classified as logs, traps, or debugging information
based on contents, users, and usage scenarios.
Type Description
Logs: Logs are records of events and unexpected activities of managed objects. Logging is an important method of maintaining operations and identifying faults. Logs provide information for fault analysis and help an administrator trace user activities, manage system security, and maintain a system.
Some logs are used by technical support personnel for troubleshooting only. Because such logs have no practical instruction significance to users, users are not notified when the logs are generated. Logs are classified as user logs or diagnostic logs.
User logs: During device running, the log module in the host software records all running information in logs. The logs are saved in the log buffer, sent to the Syslog server, reported to the NMS, and displayed on the screen. Such logs are user logs. Users can view the compressed log files and content.
Diagnostic logs: The logs recorded after the device is started but
before the logserver component starts are diagnostic logs. Such logs are recorded in the process-side black box; they are not saved in the log buffer, sent to the Syslog server, reported to the NMS, or displayed on the screen. Users can view the compressed log files and content.
NOTE
The information recorded in diagnostic logs is used for troubleshooting only and does not contain any sensitive information.
Traps: Traps are sent to a workstation to report urgent and important events, such as the restart of a managed device. In general, the system also generates a log with the same content after generating a trap, except that the trap contains an additional OID.
Debugging information: Debugging information shows the device's running status, such as the sending or receiving of data packets. A device generates debugging information only after debugging services are enabled.
Overview
Identifying fault information is difficult if there is a large amount of information. Setting
information levels allows users to rapidly identify information.
Information Levels
Table 1-6 describes eight information severities. The lower the severity value, the higher the
severity.
Logs can be output or filtered based on a specified severity value. A device can output logs
with severity values less than or equal to the specified value. For example, if the log severity
value is set to 6, the device only outputs logs with severity values 0 to 6.
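The filtering rule just stated can be sketched briefly. The eight severity names below follow the standard syslog convention and are an assumption here, since Table 1-6 itself is authoritative:

```python
# Severity values 0-7: the lower the value, the higher the severity.
SEVERITIES = ["emergency", "alert", "critical", "error",
              "warning", "notice", "informational", "debugging"]

def filter_logs(logs, max_severity: int):
    # Output only logs whose severity value is less than or equal
    # to the configured value.
    return [msg for sev, msg in logs if sev <= max_severity]

logs = [(3, "link down"), (6, "user login"), (7, "packet trace")]
# With the severity value set to 6, severity-7 logs are dropped.
assert filter_logs(logs, 6) == ["link down", "user login"]
```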
1.4.3.3 Applications
1.4.3.3.1 Monitoring Network Operations Using Collected Information
Information can be collected and used to monitor network operations. The collected
information includes active trap messages, historical trap messages, key events, operation
information, and historical performance data.
Command lines can be used to query collected information on a device. The information can
also be sent to a specified terminal or syslog server using the Syslog protocol.
This feature can be configured only on physical systems (PSs) but takes effect on all virtual systems (VSs).
1.4.4.1 Introduction
Definition
The fault management function is one of five functions (performance management,
configuration management, security management, fault management, and charging
management) that make up a telecommunications management network. The primary
purposes of this function are to monitor the operating anomalies and problems of devices and
networks in real time and to monitor, report, and store data on faults and device running
conditions. Fault management also provides alarms, helping users isolate or rectify faults so
that affected services can be restored.
Purpose
With the popularity of networks, complexity of application environments, and expansion of
network scales, our goal must be to make network management more intelligent and effective.
Improving and optimizing fault management will help us meet this goal. Improved fault
management can achieve the following:
Reduction in the volume of alarms generated
Alarm masking, alarm correlation analysis and suppression, and alarm continuity
analysis functions are supported to provide users with the most direct and valid fault
alarm information and to lighten the load on the fault management system. Such support
for efficient fault location and diagnosis enhances the ability of the network element (NE)
management system to manage same-network NEs and cross-network NEs.
Guaranteed alarm reporting
Use of the active alarm table and internal reliability guarantee mechanism allows alarms
to be displayed immediately so that faults can be rapidly and correctly located and
analyzed.
1.4.4.2 Principles
Alarms are reported if a fault is detected. Classifying, associating, and processing received
alarms help keep you informed of the running status of devices and helps you locate and
analyze faults rapidly.
Table 1-9 lists the alarm functions supported by Huawei devices.
Function: Alarm masking
Description: Maintenance engineers can configure alarm masking on terminals so that the terminals detect only alarms that are not masked. This function helps users ignore alarms that do not need to be displayed.
Function: Alarm suppression
Description: Alarm suppression is classified as jitter suppression or correlation suppression.
Jitter suppression: uses alarm continuity analysis so that the device does not report an alarm if a fault lasts only a short period of time, and displays a stable alarm if a fault flaps.
Correlation suppression: uses alarm correlation rules to reduce the number of reported alarms, reducing the network load and facilitating fault locating.
Alarm continuity analysis aims to differentiate events that require analysis and attention from
those that do not and to filter out unstable events.
Continuity analysis starts a timer when an event, such as fault occurrence or fault rectification, occurs. If the event persists for a specified period of time, an alarm is sent. If the event is cleared within that period, it is filtered out and no alarm is sent. Consequently, a fault that lasts only a short period of time is filtered out and not reported, and only stable fault information is displayed when a fault flaps.
Figure 1-53 shows the alarm generated if a fault flaps.
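The continuity analysis described above behaves like a debounce filter. The following is a minimal sketch; the hold time and event model are illustrative assumptions, not the device's actual parameters:

```python
# Sketch: report an alarm only if the fault state persists for `hold` seconds.
# Events are (timestamp, state) pairs; state True means the fault is present.

def continuity_filter(events, hold, end_time):
    """Return the times at which alarms are reported after jitter suppression."""
    reports = []
    fault_since = None
    reported = False
    for t, fault in events + [(end_time, None)]:
        # Before processing this transition, check whether the pending fault
        # has been stable long enough to report.
        if fault_since is not None and not reported and t - fault_since >= hold:
            reports.append(fault_since + hold)
            reported = True
        if fault is True and fault_since is None:
            fault_since = t
        elif fault is False:
            fault_since = None  # fault cleared quickly: filter it out
            reported = False
    return reports

# A flapping fault: the 1-second blips are filtered; only the stable fault
# that starts at t=10 produces an alarm (reported once the hold time elapses).
flapping = [(0, True), (1, False), (2, True), (3, False), (10, True)]
alarms = continuity_filter(flapping, hold=5, end_time=20)
```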
alarm, a correlative alarm, or an independent alarm. If the alarm needs to be sent to a Simple Network Management Protocol (SNMP) agent and forwarded to the network management system (NMS), the system determines whether NMS-based correlative alarm suppression is configured.
If NMS-based correlative alarm suppression is configured, the system filters out
correlative alarms and reports only root alarms and independent alarms to the NMS.
If NMS-based correlative alarm suppression is not configured, the system reports root
alarms, correlative alarms and independent alarms to the NMS.
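The reporting rule above amounts to a filter on the alarm class. A minimal sketch (the class names and alarm records are illustrative):

```python
# Sketch: with NMS-based correlative alarm suppression configured, only root
# and independent alarms are reported; correlative alarms are filtered out.

def alarms_to_report(alarms, suppression_configured):
    if suppression_configured:
        return [a for a in alarms if a["class"] in ("root", "independent")]
    # Without suppression, root, correlative, and independent alarms are all reported.
    return list(alarms)

alarms = [
    {"id": 1, "class": "root"},
    {"id": 2, "class": "correlative"},
    {"id": 3, "class": "independent"},
]
```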
1.4.5.1 Introduction
Definition
The performance management feature periodically collects performance statistics on a device
to monitor the performance and operating status of the device. This feature allows you to
evaluate, analyze, and predict device performance with current and historical performance
statistics.
Purpose
The performance management feature is essential to device operation and maintenance. This
feature provides current and historical statistics about performance indicators, helping you to
determine the device operating status and providing a reference for you to locate faults and
perform configurations.
Analysis on performance statistics helps you to predict the device performance trend. For
example, by analyzing the peak and valley values of user traffic during a day, you can predict
the network traffic growth trend and speed in the next 30 days or longer.
Performance statistics provide a reference for you to optimize network configuration and
make network capacity expansion decisions.
1.4.5.2 Principles
The performance management feature is implemented using the statistics collection function. It allows you to configure the statistics period, statistics instances, performance indicators, and the interval at which statistics files are generated for a performance statistics task. After a performance statistics task is run, the device collects performance indicator values within the specified statistics period and calculates statistical values at the end of each statistics period. The device saves performance statistics as files at the specified intervals.
After a performance statistics task is configured, the performance management module starts
to periodically collect performance statistics specified in the task.
The statistics include interface-based or service-based traffic statistics. The statistics items are
as follows:
Traffic volume collected during a statistics period
Traffic rate calculated by dividing the traffic volume collected during a statistics period
by the length of the period
Bandwidth usage of statistics objects
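The statistics items above can be sketched numerically. The counter values and link capacity below are made-up figures for illustration:

```python
# Sketch: derive the traffic rate and bandwidth usage from a traffic volume
# collected over one statistics period.

def traffic_rate(volume_bytes, period_seconds):
    """Average traffic rate in bits per second over the statistics period."""
    return volume_bytes * 8 / period_seconds

def bandwidth_usage(volume_bytes, period_seconds, capacity_bps):
    """Bandwidth usage as a fraction of the link capacity."""
    return traffic_rate(volume_bytes, period_seconds) / capacity_bps

# 900 MB collected in a 15-minute (900 s) period on a 100 Mbit/s link:
rate = traffic_rate(900_000_000, 900)                    # bits per second
usage = bandwidth_usage(900_000_000, 900, 100_000_000)   # fraction of capacity
```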
The statistics can be the peak, valley, or average values collected during a statistics period, or
the snapshot values collected at the end of a statistics period. The maximum, minimum,
average, and current values of the ambient temperature are examples of such statistics.
The statistics collection function supports many types of statistics tasks. A statistics task can
be bound to a statistics period and multiple statistics instances.
You can query current and historical performance statistics or clear current performance
statistics using either commands or a network management system (NMS).
1.4.5.3 Applications
If a network device supports the performance management feature, the network management
system (NMS) can deliver performance management tasks to collect and analyze the
performance statistics of the device, as shown in Figure 1-54.
The NMS can convert received performance statistics files to files recognizable to a
third-party NMS and transfer these files to the third-party NMS for processing.
1.4.6.1 Introduction
Definition
If the performance of the current system software does not meet requirements, you can
upgrade the system software package or maintain the device to enhance the system
performance. Specific operations involve:
System software upgrade and patch installation
GTL file update
Purpose
You can select an appropriate operation to upgrade or maintain the device based on the actual situation. Application scenarios of these operations are as follows:
Upgrade
− System software upgrade
System software upgrade can optimize device performance, add new features, and
upgrade the current software version.
− Patch installation
Patches are a type of software compatible with system software. They are used to
fix urgent bugs in system software. You can upgrade the system by installing
patches, without having to upgrade the system software.
GTL file update
A GTL file controls all resource and function items that can be used by a device. All
service features that have been configured on devices can be enabled only when a GTL
file is obtained from Huawei. GTL file update does not require software upgrade or
affect existing services.
Benefits
To add new features to a device or optimize device performance, or if the current resource
files (including the system software and GTL file) do not meet requirements, you can choose
to upgrade software, install patches, or update the GTL file as needed.
1.4.6.2 Principles
1.4.6.2.1 Software Management
Background
Software management is a basic feature on a device. It involves various operations, such as
software installation, software upgrade, software version rollback, and patch operations.
Basic Concepts
Obtain the system software of the latest version and its matching documentation from Huawei.
Before uploading the system software onto a device, ensure that sufficient storage space is available on
the master and slave control boards.
Install or upgrade the system software by following the procedure described in an installation or upgrade
guide released by Huawei.
When you install or upgrade the system software, enable the log and alarm management functions to
record installation or upgrade operations on a device. The recorded information helps diagnose faults if
installation or an upgrade fails.
Software installation
A device can load software onto all main control boards simultaneously, which
minimizes the loading time.
Software upgrade
Software can be upgraded to satisfy network and service requirements on a live network.
Software version rollback
If the target software fails to satisfy service requirements or transmit services, software
can be rolled back to the source version.
Patch operations
Installing the latest patch optimizes software capabilities and fixes software bugs.
Installing the latest patch also dynamically upgrades software on a running device, which
minimizes negative impact on services and improves communication quality.
Digital signature for a software package
The digital signature mechanism checks validity and integrity of software packages to
ensure that the software installed on a device is secure and reliable.
After a software package is released, it has security risks in the transfer, download,
storage and installation phases, such as components being replaced or tampered with. A
digital signature is packed into a software package before it is released and validated
before the software package is loaded to a device. The software package is considered
complete and reliable for further installation and use only after the verification on the
digital signature succeeds.
Digital signatures are verified when you set the next-startup patch or system software
package, or load a patch.
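The validation step above can be illustrated with a simplified integrity check. The real mechanism verifies a cryptographic digital signature over the package, which additionally proves who produced it; the pass/fail flow, however, is the same. The package contents and digests below are invented for the sketch:

```python
import hashlib

# Simplified sketch: refuse to load a software package whose digest does not
# match the value recorded when the package was released. A real digital
# signature also authenticates the publisher, not just the contents.

def package_is_trusted(package_bytes, expected_digest):
    actual = hashlib.sha256(package_bytes).hexdigest()
    return actual == expected_digest

released = b"system-software-package-contents"
digest_at_release = hashlib.sha256(released).hexdigest()

# A package tampered with during transfer, download, or storage fails the check.
tampered = released + b"-malicious"
```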
Background
Communication devices consist of multiple embedded computer systems, whose software may be vulnerable to viruses, tampering by attackers, Trojan horses, and unauthorized programs. Once a system is attacked, the attacker may modify configurations or intercept packets to tamper with or steal data.
The trusted computing function allows the system to discover trusted status issues in time,
thereby improving system security and reliability.
Related Concepts
Trusted system: A trusted system indicates that system hardware and software are running
properly as designed. The prerequisite for a trusted system is that the system software
integrity is good without being intruded or tampered with.
Basic Principles
Trusted computing uses the chip and initial startup code of the trusted platform module (TPM)
to establish a trust root for the trusted computing platform.
During startup, the system establishes a complete trust chain from the trust root, BIOS,
bootloader, OS Kernel, to applications, with each level measuring the boot phase of the next
level. The measurement results are irrevocably saved to the TPM chip. This implementation
ensures:
Setup and transmission of the trust chain.
Recording of the system's trusted status.
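The measure-then-extend pattern of the trust chain can be sketched as follows. The hash chain mimics how a TPM extends a platform configuration register (PCR); the stage names and images are illustrative:

```python
import hashlib

# Sketch: each boot stage measures (hashes) the next stage and irreversibly
# folds the measurement into an accumulator, as a TPM extends a PCR:
#     pcr = SHA-256(pcr || SHA-256(stage_image))

def extend(pcr, stage_image):
    measurement = hashlib.sha256(stage_image).digest()
    return hashlib.sha256(pcr + measurement).digest()

def boot_measurements(stages):
    pcr = b"\x00" * 32  # initial PCR value
    for image in stages:
        pcr = extend(pcr, image)
    return pcr

good_chain = [b"bios", b"bootloader", b"kernel", b"applications"]
reference = boot_measurements(good_chain)

# Tampering with any stage changes the final value, so the intrusion is
# detectable by comparing against the recorded reference.
tampered = boot_measurements([b"bios", b"evil-bootloader", b"kernel", b"applications"])
```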
Benefit
This feature offers the following security benefits:
Trusted start
Provides software integrity measurement, setup and transmission of trusted links, and
records the trusted status of the system.
Trusted status query
Provides query of the trusted status of the system.
Trusted status alarm
Generates an alarm if the trusted status of the system is abnormal.
Software Upgrade
At present, the NE20E supports the following type of software upgrade: software upgrade that takes effect at the next startup.
Software upgrade that takes effect at the next startup
A new name is specified for the system software of the target version. After the device is
restarted, the system automatically uses the new system software. In this manner, the
device is upgraded.
Background
The system software of a running device may need to be upgraded to correct existing errors or
add new functions to meet service requirements. The traditional way is to disconnect the
device from the network and upgrade the system software in offline mode. This method is
service-affecting.
Patches are specifically designed to upgrade the system software of a running device with
minimum or no impact on services.
Basic Concepts
A patch is an independent software unit used to upgrade system software.
Patches are classified as follows based on loading modes:
Incremental patch: A device can have multiple incremental patches installed. The latest
incremental patch contains all the information of previous incremental patches.
Non-incremental patch: A device can have only one non-incremental patch installed. If
you want to install an additional patch for a device on which a non-incremental patch
exists, uninstall the non-incremental patch first.
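The loading-mode rules above can be expressed as a small check. This is a sketch; the device's actual patch manager enforces additional conditions:

```python
# Sketch: a device may hold many incremental patches but at most one
# non-incremental patch, which must be uninstalled before any new patch
# can be installed.

def can_install_new_patch(installed):
    """installed: list of {'name', 'incremental'} dicts already on the device."""
    # If a non-incremental patch exists, it must be uninstalled first.
    return not any(not p["incremental"] for p in installed)

incremental_only = [{"name": "P1", "incremental": True},
                    {"name": "P2", "incremental": True}]
with_non_incremental = [{"name": "NP", "incremental": False}]
```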
Patches are classified as follows according to how they take effect:
Hot patch: The patch takes effect immediately after it is installed. Installing a hot patch
does not affect services.
Cold patch: The patch does not take effect immediately after it is installed. You must
reset the corresponding board or subcard or perform a master/slave main control board
switchover for the patch to take effect. Installing a cold patch affects services.
Principles
Patches have the following functions:
Correct errors in the source version without interrupting services running on a device.
Add new functions, which requires one or more existing functions in the current system
to be replaced.
Patches are a type of software compatible with the router system software. They are used to
fix urgent bugs in the router system software.
Table 1-11 shows the patch statuses supported by a device.
Status: None
Description: The patch has been saved to the storage medium of the device but is not loaded to the patch area in the memory.
Transition: When a patch is loaded to the patch area in the memory, the patch status is set to Running.
Status: Running
Description: The patch is loaded to the patch area and enabled permanently.
Transition: A patch in the running state can be uninstalled and deleted from the patch area.
Figure 1-55 shows the relationships between the tasks related to patch installation.
In previous versions, after a cold patch is installed, the system instructs users to perform
operations for the patch to take effect. To facilitate patch installation, the system is configured
to automatically perform the operation that needs to be performed for an installed cold patch
to take effect. Before the system performs the operation, the system asks for your
confirmation.
The implementation principles are as follows:
1. When a cold patch is released, its type and impact range are specified in the patch
description.
2. After a cold patch is installed, the system determines which operation to perform based
on the patch description. For example, the system determines whether to reset a board or
subcard based on the impact range of the cold patch. Then, the system displays a
message asking you to confirm whether to perform the operation for the cold patch to
take effect. The system automatically executes corresponding operations based on users'
choices.
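The confirmation flow above can be sketched as follows. The patch-description fields and operation names are illustrative assumptions, not the device's actual data model:

```python
# Sketch: pick the operation a cold patch needs from its description,
# then run it only after the user confirms.

OPERATIONS = {
    "board": "reset affected boards",
    "subcard": "reset affected subcards",
    "system": "perform a master/slave main control board switchover",
}

def activate_cold_patch(description, confirm):
    """description: {'impact_range': ...}; confirm: callable asked before acting."""
    operation = OPERATIONS[description["impact_range"]]
    if confirm(operation):
        return f"executed: {operation}"
    return "operation deferred by user"

# The user confirms, so the system resets the affected boards automatically.
result = activate_cold_patch({"impact_range": "board"}, confirm=lambda op: True)
```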
Benefits
Patches allow you to optimize the system performance of a device with minimum or no
impact on services.
GTL License
NE20E software is subject to a license which specifies features, versions, capacities, and
expiration for the product. It also grants the right of software usage rather than the ownership
of software. After purchasing a license, a customer has specified rights and receives a license
certificate from the vendor.
A GTL helps carriers reduce OPEX and speed up service deployment: all features are deployed on each device, and a license needs to be purchased only for the required features. If the carriers' needs change, they can purchase additional licenses, which makes licensing flexible for current and future business solutions. Another benefit is that the rights to use features can be obtained without upgrading the device software or affecting running services.
A license file is encrypted using the device's serial number as the key. A license can be obtained from the license management server.
1.4.6.3 Applications
1.4.6.3.1 Upgrade Software
Software Upgrade
If the performance of the current system software does not meet requirements, you can update
the system software package to enhance system performance.
There are two methods to obtain a system software package: remote download or local
download. For details on how to obtain a system software package, refer to the configuration
guide of the corresponding product.
Patch Upgrade
During device operation, the system software may need to be modified due to system bugs or
new function requirements. The traditional way is to upgrade the system software after
powering off the device. This, however, interrupts services and affects QoS.
Loading a patch into the system software achieves system software upgrade without
interrupting services on the device and improves QoS.
Terms
None
1.4.7 SNMP
1.4.7.1 Introduction
Definition
Simple Network Management Protocol (SNMP) is a network management standard widely used on TCP/IP networks. With SNMP, a core device, such as a network management station (workstation) running network management software, manages network elements (NEs), such as routers.
SNMP provides the following functions:
A workstation uses Get, Get-Next, and Get-Bulk operations to obtain network resource information.
A workstation uses a Set operation to set Management Information Base (MIB) objects.
A management agent proactively reports traps and informs to notify the workstation of the network status, allowing network administrators to take real-time measures as needed.
Purpose
SNMP is primarily used to manage networks.
There are two types of network management methods:
Network management issues related to software, including application management,
simultaneous file access by users, and read/write access permissions. This guide does not
describe software management in detail.
Management of NEs that make up a network, such as workstations, servers, network
interface cards (NICs), routers, bridges, and hubs. Many of these devices are located far
from the central network site where the network administrator is located. Ideally, a
network administrator should be automatically notified of faults anywhere on the
network. Unlike users, however, routers cannot pick up the phone and call the network
administrator when there is a fault.
To address this problem, some manufacturers produce devices with integrated network
management functions. The workstation can remotely query the device status, and the devices
can use alarms to inform the workstation of events.
Network management involves the following items:
Managed objects: devices, also called NEs, to be monitored
Agent: special software or firmware used to trace the status of managed objects
Workstation: a core device used to communicate with agents about managed objects and
to display the status of these agents
Network management protocol: a protocol run on the workstation and agents to
exchange information
Feature: Access control
Description: This function restricts a user's device administration rights. It gives a user the rights to manage specific objects on devices and therefore provides refined management.
Feature: Authentication and encryption
Description: This function authenticates and encrypts packets transmitted between an NMS and a managed device. It prevents data packets from being intercepted or tampered with, improving data transmission security.
Feature: Error code
Description: Error codes help a network administrator identify and resolve device faults. A wide range of error codes makes it easier for a network administrator to manage devices.
Feature: Trap
Description: Traps are sent from a managed device to an NMS to notify a network administrator of device faults. A managed device does not require an acknowledgement from the NMS after it sends a trap.
Feature: Inform
Description: Informs are sent from a managed device to an NMS to notify a network administrator of device faults. A managed device requires an acknowledgement from the NMS after it sends an inform. If a managed device does not receive an acknowledgement after it sends an inform, the managed device performs the following operations:
Resends the inform to the NMS.
Stores the inform in the inform buffer, which consumes a lot of system resources.
Generates a log.
NOTE
After an NMS restarts, it learns of the informs sent during the restart process.
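The trap/inform difference above can be sketched in a few lines. The retry count, buffer model, and log text are illustrative assumptions:

```python
# Sketch: a trap is fire-and-forget, but an inform is buffered and resent
# until the NMS acknowledges it (or the retries are exhausted).

def send_inform(transmit, max_retries):
    """transmit() sends the inform and returns True if the NMS acknowledged it."""
    buffer = ["inform"]  # consumes system resources while unacknowledged
    logs = []
    for attempt in range(max_retries):
        if transmit():
            buffer.clear()  # acknowledged: release the buffered inform
            return True, buffer, logs
        logs.append(f"inform not acknowledged (attempt {attempt + 1})")
    return False, buffer, logs

# The NMS acknowledges on the second attempt:
acks = iter([False, True])
ok, buf, logs = send_inform(lambda: next(acks), max_retries=3)
```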
Encryption modes:
Data Encryption Standard-56 (DES-56)
3DES168
Advanced Encryption Standard-128 (AES128)
AES192
AES256
NOTE
To ensure high security, do not use the DES-56 or 3DES168 algorithm as the SNMPv3 encryption algorithm.
1.4.7.2 Principles
1.4.7.2.1 SNMP Principles
Figure 1-56 shows a typical Simple Network Management Protocol (SNMP) management
system. The entire system must have a network management station (workstation) that
functions as a network management center for the network and runs management processes.
Each managed object must have an agent process. Management processes and agent processes
use User Datagram Protocol (UDP) to transmit SNMP messages for communication.
A workstation running SNMP cannot directly manage NEs (managed objects) that run a network management protocol other than SNMP. In this situation, the workstation must use proxy agents for management. A proxy agent provides functions such as protocol translation and filtering operations. Figure 1-57 shows how a proxy agent works.
By default, an agent uses port 161 to receive Get, Get-Next, and Set messages, and the workstation uses
port 162 to receive traps.
An SNMP message consists of a common SNMP header, a Get/Set header or a trap header, and variable bindings.
Get/Set Header
The Get or Set header contains the following fields:
Request ID
Error index
When a noSuchName, badValue, or readOnly error occurs, the agent sets an integer in the Response message to specify an offset value for the faulty variable in the list. By default, the offset value in Get-Request messages is 0.
Variable binding (variable-bindings)
A variable binding specifies the variable name and corresponding value, which is empty
in Get or Get-Next messages.
Trap Header
Enterprise
This field is an object identifier of a network device that sends traps. The object
identifier resides in the sub-tree of the enterprise object {1.3.6.1.4.1} in the object
naming tree.
Generic trap type
Table 1-16 lists the generic trap types that can be received by SNMP.
To send a type 2, 3, or 5 trap, you must use the first variable in the trap's variable binding field
to identify the interface responding to the trap.
Specific-code
If an agent sends a type 6 trap, the value in the Specific-code field specifies an event
defined by the agent. If the trap type is not 6, this field value is 0.
Timestamp
This field specifies the duration from when an agent initializes to when an event reported by a trap occurs. The value is expressed in units of 10 ms. For example, a timestamp of 1908 means that the event occurred 19080 ms after initialization of the agent.
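The 10 ms unit can be verified with a one-line conversion:

```python
# Sketch: SNMP trap timestamps (TimeTicks) count hundredths of a second.

def timeticks_to_ms(ticks):
    return ticks * 10

# A timestamp of 1908 means the event occurred 19080 ms (19.08 s) after
# the agent initialized.
elapsed_ms = timeticks_to_ms(1908)
```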
Variable Binding
Variable binding specifies the name and value of one or more variables. In Get or Get-Next
messages, this field is null.
In 1996, the Internet Engineering Task Force (IETF) issued a series of SNMP-associated
standards. These documents defined SNMPv2c and abandoned the security standard in
SNMPv2.
SNMPv2c enhances the following aspects of SNMPv1:
Structure of management information (SMI)
Communication between workstations
Protocol control
SNMPv2c Security
SNMPv2c abandons SNMPv2 security improvements and inherits the message mechanism
and community concepts in SNMPv1.
the security parameters in the PDU header and then send the unpacked PDU to the dispatcher
for processing.
1.4.7.2.6 MIB
A Management Information Base (MIB) specifies variables (MIB object identifiers or OIDs)
maintained by NEs. These variables can be queried and set in the management process. A
MIB provides a structure that contains data on all NEs that may be managed on the network.
The SNMP MIB uses a hierarchical tree structure similar to the Domain Name System (DNS),
beginning with a nameless root at the top. Figure 1-60 shows an object naming tree, one part
of the MIB.
The three objects at the top of the object naming tree are iso, itu-t (formerly ccitt), and the joint ISO/ITU-T node. There are four objects under iso; of these, the number 3 identifies an identified organization. A Department of Defense (DoD) sub-tree, marked dod (6), is under the identified organization (3). Under dod (6) is internet (1). If the only objects being considered are Internet objects, you may begin drawing the sub-tree below the internet object (the square frames in dotted lines with shadow marks in the following diagram) and place the identifier {1.3.6.1} next to the internet object.
One of the objects under the internet object is mgmt (2). The object under mgmt (2) is mib-2
(1) (renamed in MIB-II, defined in 1991). mib-2 is identified by an
OID, {1.3.6.1.2.1} or {internet(1).2.1}.
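The naming-tree walk above can be sketched as a lookup. The tree fragment below hard-codes only the nodes discussed in the text:

```python
# Sketch: resolve a named path in the object naming tree to a numeric OID.
# Each node is (number, children); only the branch discussed above is modeled.

TREE = {
    "iso": (1, {"org": (3, {"dod": (6, {"internet": (1, {
        "mgmt": (2, {"mib-2": (1, {})}),
    })})})}),
}

def resolve(path):
    oid, node = [], TREE
    for name in path:
        number, node = node[name]
        oid.append(number)
    return ".".join(str(n) for n in oid)

internet_oid = resolve(["iso", "org", "dod", "internet"])               # 1.3.6.1
mib2_oid = resolve(["iso", "org", "dod", "internet", "mgmt", "mib-2"])  # 1.3.6.1.2.1
```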
1.4.7.2.7 SMI
Structure of Management Information (SMI) is a set of rules used to name and define
managed objects. It can define the ID, type, access level, and status of managed objects. At
present, there are two SMI versions: SMIv1 and SMIv2.
The following standard data types are defined in SMI:
INTEGER
OCTET STRING
DisplayString
OBJECT IDENTIFIER
NULL
IpAddress
PhysAddress
Counter
Gauge
TimeTicks
SEQUENCE
SEQUENCE OF
1.4.7.2.8 Trap
A managed device sends unsolicited trap messages to notify a network management system
(NMS) that an urgent and significant event has occurred on the managed device. For example,
the managed device restarts. Figure 1-61 shows the process of transmitting a trap message.
If the trap triggering conditions defined for the agent's module are met, the agent sends a trap
message to notify the NMS that a significant event has occurred. Network administrators can
promptly handle the event.
The NMS uses port 162 to receive trap messages from the agent. Trap messages are carried over the User Datagram Protocol (UDP). After the NMS receives trap messages, it does not need to acknowledge them.
Background
The Simple Network Management Protocol (SNMP) communicates management information
between a network management station (NMS) and a device, such as a router, so that the
NMS can manage the device. If the NMS and device use different SNMP versions, the NMS
cannot manage the device.
To resolve this problem, configure SNMP proxy on a device between the NMS and device to
be managed, as shown in Figure 1-62. In the following description, the device on which
SNMP proxy needs to be configured is referred to as a middle-point device.
The NMS manages the middle-point device and managed device as an independent network
element, reducing the number of managed network elements and management costs.
Principles
In Figure 1-63, the middle-point device allows you to manage the network access,
configurations, and system software version of the managed device. The network element
management information base (MIB) files loaded to the NMS include the MIB tables of both
the middle-point device and managed device. After you configure SNMP proxy on the
middle-point device, the middle-point device automatically forwards SNMP requests from the
NMS to the managed device and forwards SNMP responses from the managed device to the
NMS.
The process in which an NMS uses a middle-point device to query the MIB information
of a managed device is as follows:
a. The NMS sends an SNMP request that contains the MIB object ID of the managed
device to the middle-point device.
The engine ID carried in an SNMPv3 request must be the same as the engine
ID of the SNMP agent on the managed device.
If the SNMP request is an SNMPv1 or SNMPv2c packet, a proxy community name must be configured on the middle-point device, with the engine ID of the SNMP agent on the managed device specified. The community name carried in the SNMP request packet must match the community name configured on the managed device.
b. Upon receipt, the middle-point device searches its proxy table for a forwarding
entry based on the engine ID.
If a matching forwarding entry exists, the middle-point device caches the
request and encapsulates the request based on forwarding rules.
If no matching forwarding entry exists, the middle-point device drops the
request.
c. The middle-point device forwards the encapsulated request to the managed device
and waits for a response.
d. After the middle-point device receives a response from the managed device, the
middle-point device forwards the response to the NMS.
If the middle-point device fails to receive a response within a specified period, the
middle-point device drops the SNMP request.
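Steps a to d above reduce to a table lookup keyed by engine ID. The proxy-table contents and field names below are illustrative:

```python
# Sketch: the middle-point device forwards an SNMP request only if its
# proxy table holds a forwarding entry matching the request's engine ID;
# otherwise it drops the request.

PROXY_TABLE = {
    "800007DB01": {"target": "managed-device-1", "community": "proxy-public"},
}

def handle_request(engine_id, request):
    entry = PROXY_TABLE.get(engine_id)
    if entry is None:
        return ("dropped", None)  # no matching forwarding entry
    # Cache and re-encapsulate the request per the forwarding rules.
    forwarded = {"to": entry["target"], "payload": request}
    return ("forwarded", forwarded)

status, fwd = handle_request("800007DB01", {"oid": "1.3.6.1.2.1.1.1.0"})
```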
The process in which a managed device uses a middle-point device to send a notification
to an NMS is as follows:
a. The managed device generates a notification due to causes such as overheating and
sends the notification to the middle-point device.
b. Upon receipt, the middle-point device searches its proxy table for a forwarding
entry based on the engine ID.
Background
AAA is an authentication, authorization, and accounting technique. AAA local users can be
configured to log in to a device through FTP, Telnet, or SSH. However, SNMPv3 supports
only SNMP users, which can be an inconvenience in unified network device management.
To resolve this issue, configure SNMP to support AAA users. AAA users can then access the
NMS, and MIB node operation authorization can be performed based on tasks. The NMS
does not distinguish AAA users and SNMP users.
Figure 1-65 shows the process of an AAA user logging in to the NMS through SNMP.
Figure 1-65 Process of an AAA user logging in to the NMS through SNMP
Principles
Figure 1-66 shows the principles of SNMP's support for AAA users.
1. Create a local AAA user.
If the AAA user needs to log in through SNMP, the user name must contain fewer than 32
characters.
2. Configure the AAA user to log in through SNMP.
3. SNMP synchronizes the AAA user data and updates the SNMP user list. Configure a
mode to authenticate the AAA user and a mode to encrypt the AAA user's data.
The AAA user's authentication and encryption modes are SNMP. An authentication
password is not used.
After the preceding operations are performed, the AAA user can log in to the NMS in the
same way as an SNMP user.
1.4.7.3 Applications
1.4.7.3.1 Monitoring an Outdoor Cabinet Using SNMP Proxy
As shown in Figure 1-68, a Simple Network Management Protocol (SNMP) proxy and the
cabinet control unit (CCU) of a managed device are placed in an outdoor cabinet. The SNMP
proxy enables communication between the network management station (NMS) and managed
device and allows you to manage the configurations and system software version of the
managed device.
Figure 1-68 Networking diagram for monitoring an outdoor cabinet using SNMP proxy
The SNMP proxy is deployed on the main device. The NMS manages each cabinet as a virtual
unit that consists of the main device and monitoring device. This significantly reduces the
number of NEs managed by the NMS, lowering network management costs, facilitating
real-time device performance monitoring, and improving service quality.
1.4.8 NETCONF
1.4.8.1 Introduction
Definition
The Network Configuration Protocol (NETCONF) is an extensible markup language (XML)
based network configuration and management protocol. NETCONF uses a simple remote
procedure call (RPC) mechanism to implement communication between a client and a server.
NETCONF provides a method for a network management system (NMS) to remotely manage
and monitor devices.
Purpose
As networks grow in scale and complexity, the Simple Network Management Protocol
(SNMP) can no longer meet carriers' network management requirements, especially
configuration management requirements. XML-based NETCONF was developed to overcome
this limitation.
Table 1-19 lists the differences between SNMP and NETCONF.
Benefits
NETCONF offers the following benefits:
Facilitates configuration data management and interoperability between different
vendors' devices using XML encoding to define messages and the RPC mechanism to
modify configuration data.
Reduces network faults caused by manual configuration errors.
Improves the efficiency of system software upgrade performed using a configuration
tool.
Provides high extensibility, allowing different vendors to define additional NETCONF
operations.
Improves data security using authentication and authorization mechanisms.
1.4.8.2 Principles
1.4.8.2.1 NETCONF Protocol Framework
Like the open systems interconnection (OSI) model, the NETCONF protocol framework also
uses a hierarchical structure. A lower layer provides services for the upper layer.
The hierarchical structure enables each layer to focus only on a single aspect of NETCONF
and reduces the dependencies between different layers.
Table 1-20 describes the layers of the NETCONF protocol framework.
NOTE
Currently, the NE20E can use only SSH as the transport protocol for
NETCONF.
Layer 2: remote procedure call (RPC)
Example elements: <rpc> and <rpc-reply>
The RPC layer provides a simple and transport-independent framing mechanism for
encoding RPCs. The NETCONF manager uses the <rpc> element to encapsulate RPC
request information and sends the RPC request information to the NETCONF agent
over a secure and connection-oriented session. The NETCONF agent uses the
<rpc-reply> element to encapsulate RPC response information (content at the
operations and content layers) and sends the RPC response information to the
NETCONF manager.
In normal cases, the <rpc-reply> element encapsulates the data required by the
NETCONF manager or information about a configuration success. If the NETCONF
manager sends an incorrect request or the NETCONF agent fails to process a
request from the NETCONF manager, the NETCONF agent encapsulates the
<rpc-error> element containing detailed error information in the <rpc-reply>
element and sends the <rpc-reply> element to the NETCONF manager.
Layer 3: operations
Example elements: <get-config>,
The operation layer defines a series of basic operations used in
RPC. The basic operations constitute basic NETCONF
Although Schema and YANG use different model description syntax, the XML messages
exchanged between the device and the NMS are almost identical when the same model
is used.
Related Concepts
XML encoding
XML is a NETCONF encoding format, allowing complex hierarchical data to be
expressed in a text format that can be read, saved, and manipulated with both traditional
text tools and tools specific to XML. All NETCONF protocol elements are defined in
namespace urn:ietf:params:xml:ns:netconf:base:1.0. Document type declarations must
not be contained in NETCONF content.
XML-based network management uses XML to describe managed data and management
operations, so that devices can parse management information.
XML-based network management has the following advantages:
− Strong data presentation capabilities
− Easy, efficient, and secure large-scale data transmission
− Improved network configuration management
Remote procedure call (RPC) model
NETCONF uses an RPC-based communication model. NETCONF uses XML-encoded
<rpc> and <rpc-reply> elements to provide transport-protocol-independent framing of
NETCONF requests and responses. Table 1-21 describes some commonly used RPC
elements. If a module supports YANG, its capability must provide information about the
YANG model. An example message is as follows:
<capability>http://www.huawei.com/netconf/vrp/huawei-ifm?module=huawei-ifm&revision=2013-01-01</capability>
Elements Description
<rpc> Encapsulates a request that the client sends to the server.
<rpc-reply> Encapsulates a response message for an <rpc> request message. The
server returns a response message, which is encapsulated in the
<rpc-reply>element, for each <rpc> request message.
<rpc-error> Notifies a client of an error occurring during <rpc> request processing.
The server encapsulates the <rpc-error> element in the <rpc-reply>
element and sends the <rpc-reply> element to the client.
<ok> Notifies a client of no errors occurring during <rpc> request processing.
The server encapsulates the <ok> element in the <rpc-reply> element and
sends the <rpc-reply> element to the client.
NETCONF capability
A NETCONF capability is a set of functionalities that supplements the base NETCONF
specification. Capabilities augment the base operations of a device, describing both
additional operations and the content allowed inside operations.
The client can discover the server's capabilities and use any additional operations,
parameters, and content defined by those capabilities.
A capability is identified by a uniform resource identifier (URI).
urn:ietf:params:xml:ns:netconf:capability:{name}:1.0
In addition to the capabilities defined by NETCONF, a vendor can define additional
capabilities to extend management functions.
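The capability URI formats above (a base identifier, optionally followed by module and revision parameters in YANG-announcing capabilities) can be split apart with a short sketch like the following; the helper name is an assumption for illustration:

```python
from urllib.parse import parse_qs

def parse_capability(uri):
    """Split a capability URI into its base identifier and optional
    module/revision parameters carried after '?' (as in YANG-announcing
    capabilities). Illustrative helper, not part of any NETCONF library."""
    base, _, query = uri.partition("?")
    params = parse_qs(query) if query else {}
    return base, {k: v[0] for k, v in params.items()}

base, params = parse_capability(
    "http://www.huawei.com/netconf/vrp/huawei-ifm"
    "?module=huawei-ifm&revision=2013-01-01")
```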
Configuration databases
A configuration database is a complete collection of configuration parameters for a
device. Table 1-22 describes NETCONF-defined configuration databases.
Configuration Database    Description
<running/> Stores the configuration that is currently running on a device. To support
modification of the <running/> configuration database, the device must have the
writable-running capability.
<candidate Stores various configuration parameters that are about to run on a device.
/> An administrator can perform operations on the <candidate/> configuration
database. Modifications to this configuration database do not take effect
before data in the <running/> configuration database is replaced with data in
this configuration database.
To support the <candidate/> configuration database, the device must have the
candidate capability.
NOTE
The <candidate/> configuration databases supported by Huawei devices do not allow
inter-session data sharing. Therefore, the configuration of a <candidate/> configuration
database does not require additional locking operations.
<startup/> Stores a configuration data set loaded during device startup. The
configuration data set is similar to a configuration file.
To support the <startup/> configuration database, the device must have the
distinct startup capability.
Subtree filtering
Subtree filtering allows an application to include particular XML subtrees in the
<rpc-reply> elements for a <get> or <get-config> operation.
Subtree filtering provides a small set of filters for inclusion, simple content exact-match,
and selection. The server does not need to use any data-model-specific semantics during
processing, allowing for simple and centralized implementation policies.
Table 1-23 describes subtree filter components.
Component    Description
Namespace selection    If namespaces are used, the filter output will include only
elements from the specified namespace.
Containment node    A containment node is a node that contains child elements within a
subtree filter. For each containment node specified in a subtree filter, all data model
instances that exactly match the specified namespaces and element hierarchy are
included in the filter output.
Content match node    A content match node is a leaf node that contains simple content
within a subtree filter. A content match node is used to select some or all of its sibling
nodes for filter output and represents an exact-match filter of the leaf node element
content.
Selection node    A selection node is an empty leaf node within a subtree filter. A
selection node represents an explicit selection filter of the underlying data model. The
presence of any selection nodes within a set of sibling nodes causes the filter to select
the specified subtrees and suppress automatic selection of the entire set of sibling
nodes in the underlying data model.
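The three node types above can be illustrated with a simplified, namespace-unaware sketch of subtree-filter matching. The function name and single-element scope are assumptions; a real server implements the full RFC 6241 semantics, including namespace selection:

```python
import xml.etree.ElementTree as ET

def matches(data_elem, filter_elem):
    """Simplified subtree-filter check: a content match node (leaf with text)
    must equal the data leaf's text; a selection node (empty leaf) matches any
    element of that name; a containment node (node with children) recurses."""
    for f_child in filter_elem:
        d_children = data_elem.findall(f_child.tag)
        if len(f_child) == 0 and (f_child.text or "").strip():
            # content match node: exact-match on leaf content
            if not any((d.text or "").strip() == f_child.text.strip()
                       for d in d_children):
                return False
        elif len(f_child) == 0:
            # selection node: selects the subtree if the element is present
            if not d_children:
                return False
        else:
            # containment node: at least one child must match recursively
            if not any(matches(d, f_child) for d in d_children):
                return False
    return True

data = ET.fromstring(
    "<interface><ifName>GE0/0/0</ifName><mtu>1500</mtu></interface>")
flt = ET.fromstring("<interface><ifName>GE0/0/0</ifName></interface>")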
Overview
The NETCONF authorization mechanism regulates user permissions to perform NETCONF
operations and access NETCONF resources.
NETCONF authorization includes:
NETCONF operation authorization: authorizes user information by specific NETCONF
operations, such as <edit-config>, <get>, <sync-full>, <sync-inc>, and <commit>.
Module authorization: authorizes users for specific feature modules, such as Telnet-client,
Layer 3 virtual private network (L3VPN), Open Shortest Path First (OSPF), Fault-MGR,
Device-MGR, and Intermediate System-to-Intermediate System (IS-IS).
Data node authorization: authorizes users for specific data nodes, such as
/ifm/interfaces/interface/ifAdminStatus and /devm/globalPara/maxChassisNum.
The authorization rules for NETCONF operations and data nodes can be configured.
Principles
The NETCONF authorization mechanism is similar to the task authorization mechanism used
to regulate command authorization. NETCONF authorization is also modeled based on
NETCONF access control model (ACM).
1.19.2.21.36 User Group-based and Task Group-based User Management defines tasks, task
groups, and user groups. The task authorization mechanism uses a three-layer permission
control model. This model organizes commands into tasks, tasks into task groups, and task
groups into user groups.
The NETCONF authorization mechanism is based on the task authorization mechanism. The
NETCONF authorization mechanism subscribes to required authorization information from
the task authorization mechanism and stores the obtained information in its local data
structures.
NETCONF authorization rules include operation rules and data node rules. Rule
permissions can be either permit or deny, as specified in the user-group view.
Only permit is allowed in the task-group view. For a schema path, access permission can be
set to read, write, or execute.
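A minimal sketch of the permit/deny evaluation under this model, assuming an illustrative per-user-group rule table (not the device's internal structure):

```python
# Hedged sketch: each user group maps to an ordered list of
# (operation, action) rules; the first matching rule decides, and
# access is denied by default when no rule matches.

def authorize(rules, user_group, operation):
    """Return True only if the first matching rule permits the operation."""
    for rule_op, action in rules.get(user_group, []):
        if rule_op == operation:
            return action == "permit"
    return False

rules = {
    "operators": [("get", "permit"), ("edit-config", "deny")],
}
```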
Benefits
NETCONF authorization is a mechanism to restrict access for particular users to a
pre-configured subset of all available NETCONF protocol operations and contents.
Capabilities Exchange
When a NETCONF session is opened, each NETCONF peer sends a <hello> element
containing a list of its capabilities. If both peers support a capability, they can implement
special management functions based on this capability.
Each NETCONF peer sends its <hello> element simultaneously as soon as the connection is
opened. A NETCONF peer does not wait to receive the capabilities from the other side before
sending its own capabilities.
After a server exchanges <hello> elements with a client, the server waits for <rpc> elements
from the client. The server returns an <rpc-reply> element in response to each <rpc> element.
Figure 1-72 shows the interaction between a server and a client.
<capability>urn:ietf:params:netconf:capability:writable-running:1.0</capability>
<capability>urn:ietf:params:netconf:capability:candidate:1.0</capability>
<capability>urn:ietf:params:netconf:capability:startup:1.0</capability>
<capability>urn:ietf:params:netconf:capability:rollback-on-error:1.0</capability>
<capability>http://www.huawei.com/netconf/capability/sync/1.0</capability>
<capability>http://www.huawei.com/netconf/capability/rollback/1.0</capability>
<capability>http://www.huawei.com/netconf/capability/exchange/1.0</capability>
<capability>http://www.huawei.com/netconf/capability/active/1.0</capability>
<capability>http://www.huawei.com/netconf/capability/action/1.0</capability>
<capability>http://www.huawei.com/netconf/capability/update/1.0</capability>
</capabilities>
</hello>
Example of a <hello> element sent by a server
− Example of a <hello> message sent by a YANG-supported module
<?xml version="1.0" encoding="UTF-8"?>
<hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<capabilities>
<capability>urn:ietf:params:netconf:base:1.0</capability>
<capability>urn:ietf:params:netconf:capability:writable-running:1.0</capability>
<capability>urn:ietf:params:netconf:capability:candidate:1.0</capability>
<capability>urn:ietf:params:netconf:capability:validate:1.0</capability>
<capability>urn:ietf:params:netconf:capability:startup:1.0</capability>
<capability>urn:ietf:params:netconf:capability:rollback-on-error:1.0</capability>
<capability>urn:huawei:netconf:capability:sync:1.0</capability>
<capability>urn:ietf:params:netconf:capability:confirmed-commit:1.0</capability>
<capability>http://www.huawei.com/netconf/capability/sync/1.0</capability>
<capability>http://www.huawei.com/netconf/capability/discard-commit/1.0</capability>
<capability>http://www.huawei.com/netconf/capability/rollback/1.0</capability>
<capability>http://www.huawei.com/netconf/capability/exchange/1.0</capability>
<capability>http://www.huawei.com/netconf/capability/action/1.0</capability>
<capability>http://www.huawei.com/netconf/capability/update/1.0</capability>
</capabilities>
<session-id>1149</session-id>
</hello>
− Example of capabilities in a <hello> message sent by a YANG-supported
module, each carrying YANG model information (module name and revision)
<capability>http://www.huawei.com/netconf/vrp/huawei-ifm?module=huawei-ifm&revision=2013-01-01</capability>
<capability>http://www.huawei.com/netconf/vrp/huawei-pub-types?module=huawei-pub-types&revision=2013-01-01</capability>
Examples of invalid <hello> elements sent by a client
− Example of a <hello> element that does not contain base capability information
<?xml version="1.0"?>
<hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<capabilities>
</capabilities>
</hello>
− Example of an incorrect <hello> element
<?xml version="1.0">
<capabilities>
<capabilities>urn:ietf:params:netconf:base:1.0</capability>
<capabilities>urn:ietf:params:netconf:capability:candidate:1.0</capability
>
</capabilities>
</hello>
The base NETCONF capabilities support the <running/> configuration database. The
following describes the operations defined in base capabilities:
<get-config>
The <get-config> operation retrieves all or part of a specified configuration from the
<running/>, <candidate/>, or <startup/> configuration database. The <get-config>
operation can also retrieve the configuration from a file, for example, <url>huawei.cfg</url>.
If the <get-config> operation is successful, the server sends an <rpc-reply> element
containing a <data> element with the results of the query. Otherwise, the server sends an
<rpc-reply> element containing an <rpc-error> element.
− <rpc> element
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<get-config>
<source>
<running/>
</source>
<filter type="subtree">
<ifm xmlns="http://www.huawei.com/netconf/vrp"
content-version="1.0" format-version="1.0">
<interfaces>
<interface/>
</interfaces>
</ifm>
</filter>
</get-config>
</rpc>
− <rpc-reply> element
<rpc-reply message-id="101"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<data>
<ifm xmlns="http://www.huawei.com/netconf/vrp" format-version="1.0"
content-version="1.0">
<interfaces>
<interface>
<ifIndex>2</ifIndex>
<ifName>Virtual-Template0</ifName>
<ifPhyType>Virtual-Template</ifPhyType>
<ifPosition>
</ifPosition>
<ifParentIfName>
</ifParentIfName>
<ifNumber>0</ifNumber>
<!-- additional <interface> elements appear here... -->
</interface>
</interfaces>
</ifm>
</data>
</rpc-reply>
<get>
The <get> operation retrieves configuration and state data from the <running/>
configuration database.
If the <get> operation is successful, the server sends an <rpc-reply> element containing a
<data> element with the results of the query. Otherwise, the server sends an <rpc-reply>
element containing an <rpc-error> element.
The <get-config> operation can retrieve data from the <running/>, <candidate/>, and <startup/>
configuration databases, whereas the <get> operation can retrieve data only from the <running/>
configuration database. The <source> parameter does not need to be specified in the <rpc> element
for a <get> operation.
The <get-config> operation can retrieve only configuration data, whereas the <get> operation can
retrieve both configuration and state data.
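Assembling a <get-config> request such as the one below can be sketched with a few lines of standard-library XML construction; the helper name is an assumption for illustration:

```python
import xml.etree.ElementTree as ET

BASE_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"

def build_get_config(message_id, source="running"):
    """Assemble a minimal <get-config> RPC targeting the given source
    configuration database. Illustrative helper, not a device API."""
    rpc = ET.Element("rpc", {"message-id": str(message_id), "xmlns": BASE_NS})
    get_config = ET.SubElement(rpc, "get-config")
    src = ET.SubElement(get_config, "source")
    ET.SubElement(src, source)          # e.g. <running/>
    return ET.tostring(rpc, encoding="unicode")

msg = build_get_config(101)
```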
− <rpc> element
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<get>
<filter type="subtree">
<ifm xmlns="http://www.huawei.com/netconf/vrp" content-version="1.0"
format-version="1.0">
<interfaces>
<interface>
<ifName>GigabitEthernet0/0/0</ifName>
</interface>
</interfaces>
</ifm>
</filter>
</get>
</rpc>
− <rpc-reply> element
<rpc-reply message-id="101"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<data>
<ifm xmlns="http://www.huawei.com/netconf/vrp" format-version="1.0"
content-version="1.0">
<interfaces>
<interface>
<ifIndex>5</ifIndex>
<ifName>GigabitEthernet0/0/0</ifName>
<ifPhyType>MEth</ifPhyType>
<!-- additional <interface> elements appear here... -->
</interface>
</interfaces>
</ifm>
</data>
</rpc-reply>
<edit-config>
The <edit-config> operation creates, modifies, or deletes configuration data.
If the <edit-config> operation is successful, the server sends an <rpc-reply> element
containing an <ok> element. Otherwise, the server sends an <rpc-reply> element
containing an <rpc-error> element.
− <rpc> element
<rpc message-id="60" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<edit-config>
<target>
<running/>
</target>
<default-operation>merge</default-operation>
<error-option>rollback-on-error</error-option>
<config>
<ifm xmlns="http://www.huawei.com/netconf/vrp" content-version="1.0"
format-version="1.0">
<interfaces>
<interface>
<ifName>GigabitEthernet0/0/0</ifName>
<ifDescr>chenyuqiao</ifDescr>
</interface>
</interfaces>
</ifm>
</config>
</edit-config>
</rpc>
− <rpc-reply> element
<rpc-reply message-id="60" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
flow-id="104">
<ok />
</rpc-reply>
RPC request:
<rpc message-id="2415" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<get>
<filter type="subtree">
<arp xmlns="http://www.huawei.com/netconf/vrp/huawei-arp"/>
</filter>
</get>
</rpc>
RPC reply:
<?xml version="1.0" encoding="UTF-8"?>
<rpc-reply message-id="2415" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<data>
<arp xmlns="http://www.huawei.com/netconf/vrp/huawei-arp">
<arpSystemInfo>
<learnStrictEnable>false</learnStrictEnable>
<l2TopoDetectEnable>false</l2TopoDetectEnable>
<rateTrapInterval>0</rateTrapInterval>
<arpPassiveLearnEnable>false</arpPassiveLearnEnable>
<arpTopoDetectDisable>false</arpTopoDetectDisable>
</arpSystemInfo>
</arp>
</data>
</rpc-reply>
<copy-config>
The <copy-config> operation creates or replaces an entire configuration database with
the content of another complete configuration database. If the target database exists, it is
overwritten. Otherwise, a new one is created, if allowed.
If the <copy-config> operation is successful, the server sends an <rpc-reply> element
containing an <ok> element. Otherwise, the server sends an <rpc-reply> element
containing an <rpc-error> element.
− <rpc> element
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<copy-config>
<target>
<url>eee.cfg</url>
</target>
<source>
<running/>
</source>
</copy-config>
</rpc>
− <rpc-reply> element
<rpc-reply message-id="101"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
<delete-config>
The <delete-config> operation deletes a configuration database, not the <running/>
configuration database.
If the <delete-config> operation is successful, the server sends an <rpc-reply> element
containing an <ok> element. Otherwise, the server sends an <rpc-reply> element
containing an <rpc-error> element.
− <rpc> element
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<delete-config>
<target>
<startup/>
</target>
</delete-config>
</rpc>
− <rpc-reply> element
<rpc-reply message-id="101"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
<lock>
The <lock> operation locks a configuration database of a device. A locked
configuration database cannot be modified by other clients. Locking eliminates errors
caused by simultaneous database modifications by other clients or by Simple
Network Management Protocol (SNMP) or command-line interface (CLI) scripts.
If the <lock> operation is successful, the server sends an <rpc-reply> element containing
an <ok> element. Otherwise, the server sends an <rpc-reply> element containing an
<rpc-error> element.
If the specified configuration database is already locked by a client, the <error-tag>
element will be "lock-denied" and the <error-info> element will include the <session-id>
of the lock owner.
If the <lock> operation is successful:
− <rpc> element
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<lock>
<target>
<running/>
</target>
</lock>
</rpc>
− <rpc-reply> element
<rpc-reply message-id="101"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/> <!-- lock succeeded -->
</rpc-reply>
If the <lock> operation is unsuccessful:
− <rpc> element
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<lock>
<target>
<running/>
</target>
</lock>
</rpc>
− <rpc-reply> element
<rpc-reply message-id="101"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<rpc-error>
<error-type>protocol</error-type>
<error-tag>lock-denied</error-tag>
<error-severity>error</error-severity>
<error-app-tag>43</error-app-tag>
<error-message>The configuration is locked by other user. [Session ID =
629]</error-message>
<error-info>
<session-id>629</session-id>
<error-paras>
<error-para>629</error-para>
</error-paras>
</error-info>
</rpc-error>
</rpc-reply>
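A client can extract the lock owner from such an <rpc-error> reply with standard-library XML parsing; the element paths below follow the reply shown above:

```python
import xml.etree.ElementTree as ET

# Condensed version of the lock-denied reply shown above.
reply = """<rpc-reply message-id="101"
  xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <rpc-error>
    <error-tag>lock-denied</error-tag>
    <error-info><session-id>629</session-id></error-info>
  </rpc-error>
</rpc-reply>"""

NS = "{urn:ietf:params:xml:ns:netconf:base:1.0}"
root = ET.fromstring(reply)
error = root.find(NS + "rpc-error")
tag = error.findtext(NS + "error-tag")
# The <error-info> of a lock-denied error carries the lock owner's session ID.
owner = error.findtext(NS + "error-info/" + NS + "session-id")
```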
<unlock>
The <unlock> operation releases a configuration lock previously obtained with the
<lock> operation. A client cannot unlock a configuration database that it did not lock.
If the <unlock> operation is successful, the server sends an <rpc-reply> element
containing an <ok> element. Otherwise, the server sends an <rpc-reply> element
containing an <rpc-error> element.
− <rpc> element
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<unlock>
<target>
<running/>
</target>
</unlock>
</rpc>
− <rpc-reply> element
<rpc-reply message-id="101"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
<close-session>
The <close-session> operation gracefully terminates a NETCONF session.
If the <close-session> operation is successful, the server sends an <rpc-reply> element
containing an <ok> element. Otherwise, the server sends an <rpc-reply> element
containing an <rpc-error> element.
− <rpc> element
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<close-session/>
</rpc>
− <rpc-reply> element
<rpc-reply message-id="101"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
<kill-session>
The <kill-session> operation forcibly terminates a NETCONF session. Only an
administrator is authorized to perform this operation.
If the <kill-session> operation is successful, the server sends an <rpc-reply> element
containing an <ok> element. Otherwise, the server sends an <rpc-reply> element
containing an <rpc-error> element.
− <rpc> element
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<kill-session>
<session-id>4</session-id>
</kill-session>
</rpc>
− <rpc-reply> element
<rpc-reply message-id="101"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
If a confirming commit is issued within the specified interval, the committed
configuration data takes effect and becomes the running configuration on the device.
If no confirming commit is issued before the interval elapses, the committed
configuration data is rolled back to the original configuration. The interval can be
configured using the <confirm-timeout> parameter.
<capability> urn:ietf:params:netconf:capability:confirmed-commit:1.0
</capability>
− RPC request
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<commit>
<confirmed/>
<confirm-timeout>120</confirm-timeout>
</commit>
</rpc>
− RPC reply
<rpc-reply message-id="101"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
Rollback on error
This capability indicates that the device can perform a rollback when an error occurs. If
an error occurs and the <rpc-error> element is generated, the server stops performing the
<edit-config> operation and restores the specified configuration to the status before the
<edit-config> operation is performed.
<capability> urn:ietf:params:netconf:capability:rollback-on-error:1.0
</capability>
Distinct startup
This capability indicates that the device can perform a distinct startup. The server checks
parameter availability and consistency during the distinct startup of a device.
The server supports the <startup/> configuration database and can distinguish the <running/>
configuration database from the <startup/> configuration database. To permanently save configuration
data in the <running/> configuration database, perform the <copy-config> operation to copy the
configuration data from the <running/> configuration database to the <startup/> configuration database.
<target>
<flow-id>10</flow-id>
</target>
<source>
<flow-id>1</flow-id>
</source>
</sync-increment>
</rpc>
<rpc-reply> element
<rpc-reply message-id="102"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" hwcontext="vr=1">
<data>
<ifm>
<interfaces>
<interface difference="create">
<interfaceName>ethernet 1/1/1.1</interfaceName>
<mtu>15000</mtu>
<adminStatus>down</adminStatus>
</interface>
<interface difference="delete">
<interfaceName>ethernet 1/1/2.1</interfaceName>
</interface>
<interface difference="modify">
<interfaceName>ethernet 1/1/3</interfaceName>
<mtu>15000</mtu>
<adminStatus>up</adminStatus>
</interface>
<interface difference="modify">
<interfaceName>ethernet 1/1/4</interfaceName>
<ifAm4s>
<ifAm4 difference="create">
<ipAddress>10.164.11.10</ipAddress>
<netMask>255.255.255.0</netMask>
<addressType></addressType>
</ifAm4>
</ifAm4s>
</interface>
</interfaces>
</ifm>
</data>
</rpc-reply>
Active notification
This capability indicates that the device can inform its peer that it is active.
<capability> http://www.huawei.com/netconf/capability/active/1.0 </capability>
For operations that take a long time, such as the <commit> and <copy-config>
operations, the client may consider the request to have timed out and cancel the
operation.
The active notification capability enables the server to periodically send an <active>
element to the client when processing a time-consuming <rpc> element, informing the
client that the <rpc> element is under processing.
Only a device with the active notification capability supports <active> elements.
<active message-id="101"
xmlns="http://www.huawei.com/netconf/capability/base/1.0"> </active>
Action
This capability indicates that the device can perform maintenance operations.
In addition to basic operations, such as additions, deletions, modifications, and queries, a
device also needs to perform some maintenance operations.
The action capability enables a device to perform maintenance operations using <rpc>
elements. The operation results are returned in <rpc-reply> elements. Maintenance
operations usually do not involve configuration data modification or service data
obtaining. A maintenance operation may be deleting a packet counter or resetting a
board.
<capability> http://www.huawei.com/netconf/capability/action/1.0 </capability>
Only a device with the action capability supports the <execute-action> operation.
The <execute-action> operation requests the server to run a maintenance command
(excluding query and basic configuration commands).
If the <execute-action> operation is successful, the server sends an <rpc-reply> element
containing an <ok> element. Otherwise, the server sends an <rpc-reply> element
containing an <rpc-error> element.
− <rpc> element
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="631">
<execute-action
xmlns="http://www.huawei.com/netconf/capability/base/1.0">
<action>
<acl xmlns="http://www.huawei.com/netconf/vrp" content-version="1.0"
format-version="1.0">
<aclResetCount>
<aclNumOrName>2111</aclNumOrName>
</aclResetCount>
</acl>
</action>
</execute-action>
</rpc>
− <rpc-reply> element
<rpc-reply message-id="631"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
Execute CLI
This capability indicates that the device can execute command lines delivered
through NETCONF.
<capability> http://www.huawei.com/netconf/capability/execute-cli/1.0
</capability>
Only a device with the execute-cli capability supports the <execute-cli> operation.
The client performs the <execute-cli> operation to execute CLI commands through
NETCONF. A maximum of 60 commands are allowed in a single <rpc> request, and
each command string can contain a maximum of 512 bytes. The <execute-cli>
operation stops on error: if any command fails to execute, the remaining commands
in the <rpc> request are not executed.
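The limits stated above (at most 60 commands per <rpc> request, 512 bytes per command string) can be checked client-side before a request is sent. The helper below is an illustrative sketch, not part of the device or any NETCONF library:

```python
MAX_COMMANDS = 60       # commands allowed in a single <rpc> request
MAX_COMMAND_LEN = 512   # bytes allowed per command string

def validate_cli_batch(commands):
    """Return (ok, reason) for a batch of CLI commands against the
    documented <execute-cli> limits. Illustrative helper."""
    if len(commands) > MAX_COMMANDS:
        return False, "too many commands"
    for cmd in commands:
        if len(cmd.encode("utf-8")) > MAX_COMMAND_LEN:
            return False, "command too long"
    return True, ""

ok, _ = validate_cli_batch(["display interface brief"])
bad, _ = validate_cli_batch(["x" * 513])
```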
Update
This capability indicates that the device can update configuration data.
<capability> http://www.huawei.com/netconf/capability/update/1.0 </capability>
The <update> operation updates the configuration data in the <candidate/> configuration
database with the latest configuration data in the <running/> configuration database
when a conflict occurs during data commitment.
If the <update> operation is successful, the server sends an <rpc-reply> element
containing an <ok> element. Otherwise, the server sends an <rpc-reply> element
containing an <rpc-error> element.
Only a device with the update capability supports the <update> operation.
− <rpc> element
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<update xmlns="http://www.huawei.com/netconf/capability/base/1.0"/>
</rpc>
− <rpc-reply> element
<rpc-reply message-id="101"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
Exchange
This capability indicates that the device can interact with the request sender in request
processing.
<capability> http://www.huawei.com/netconf/capability/exchange/1.0 </capability>
<capability> http://www.huawei.com/netconf/capability/exchange/1.2 </capability>
Only a device with the exchange capability supports the <get-next> operation.
The client performs the <get-next> operation to interact with the server.
For example, if the returned result of a <get> or <get-config> operation involves a large
amount of data, the server has to return the data using multiple <rpc-reply> elements.
After the client receives the first <rpc-reply> element, the client interacts with the server
to request the next <rpc-reply> element or to cancel the data query.
If the <get-next> operation for obtaining the next <rpc-reply> element is successful, the
server sends an <rpc-reply> element containing a <data> element. Otherwise, the server
sends an <rpc-reply> element containing an <rpc-error> element.
− <rpc> element
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<get>
<filter type="subtree">
<ifm xmlns="http://www.huawei.com/netconf/vrp" content-version="1.0"
format-version="1.0">
<interfaces>
<interface>
</interface>
</interfaces>
</ifm>
</filter>
</get>
</rpc>
− <rpc-reply> element
<?xml version="1.0" encoding="UTF-8"?>
<rpc-reply message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
set-id="989">
<data>
<ifm xmlns="http://www.huawei.com/netconf/vrp" format-version="1.0"
content-version="1.0">
<interfaces>
<interface>
<ifIndex>2</ifIndex>
<ifName>Virtual-Template0</ifName>
<ifPhyType>Virtual-Template</ifPhyType>
<!-- additional <user> elements appear here... -->
</interface>
</interfaces>
</ifm>
</data>
</rpc-reply>
If the <get-next> operation for canceling the data query is successful, the server sends an
<rpc-reply> element containing an <ok> element. Otherwise, the server sends an
<rpc-reply> element containing an <rpc-error> element.
− <rpc> element
<rpc message-id="103" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<get-next xmlns="http://www.huawei.com/netconf/capability/base/1.0"
set-id="1">
<discard/>
</get-next>
</rpc>
− <rpc-reply> element
<rpc-reply message-id="103"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
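The interaction above amounts to a client-side loop: issue a <get>, read the set-id from the first <rpc-reply>, then either keep issuing <get-next> with that set-id or send a <discard/> to cancel. A schematic sketch of building the <get-next> request (the transport that actually sends the RPC is not shown, and the element layout follows the examples above):

```python
import xml.etree.ElementTree as ET

HW_BASE = "http://www.huawei.com/netconf/capability/base/1.0"

def build_get_next(message_id, set_id, discard=False):
    """Build a <get-next> RPC that continues (or cancels) a query whose
    first <rpc-reply> carried the given set-id."""
    rpc = ET.Element("rpc", {
        "message-id": str(message_id),
        "xmlns": "urn:ietf:params:xml:ns:netconf:base:1.0",
    })
    get_next = ET.SubElement(rpc, "get-next",
                             {"xmlns": HW_BASE, "set-id": str(set_id)})
    if discard:
        ET.SubElement(get_next, "discard")  # cancel the data query
    return ET.tostring(rpc, encoding="unicode")

# Continue the query started by the <get> whose reply carried set-id="989".
print(build_get_next(message_id=102, set_id=989))
# Cancel a query instead:
print(build_get_next(message_id=103, set_id=1, discard=True))
```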
− Commit-description
This capability indicates that a description of the committed data can be configured
when the data in the <candidate/> configuration database is committed to the
<running/> configuration database.
<capability>
http://www.huawei.com/netconf/capability/commit-description/1.0
</capability>
− Discard Commit
This capability indicates that the device can cancel an ongoing confirmed <commit>
operation, for example, before the confirmed-commit timeout expires or before the
<commit> operation is confirmed.
<capability> http://www.huawei.com/netconf/capability/discard-commit/1.0
</capability>
1.4.8.3 Applications
1.4.8.3.1 NETCONF-based Configuration and Management
Devices on a network are usually located in various regions, as shown in Figure 1-73.
Configuring and managing these devices at each site is difficult. In addition, if these devices
are manufactured by various vendors, and each vendor provides a unique set of device
management methods, configuring and managing these devices using traditional methods is
costly and inefficient. To resolve these issues, use NETCONF to remotely
configure, manage, and monitor devices.
You can use the Simple Network Management Protocol (SNMP) as an alternative to remotely configure,
manage, and monitor devices on a simple network.
1.4.9 DCN
1.4.9.1 Introduction
Definition
The data communication network (DCN) refers to the network on which network elements
(NEs) exchange Operation, Administration and Maintenance (OAM) information with the
network management system (NMS). It is constructed for communication between managing
and managed devices.
A DCN can be an external or internal DCN. In Figure 1-74, an external DCN is between the
NMS and an access point, and an internal DCN allows NEs to exchange OAM information
within it. In this document, internal DCNs are described.
Gateway network elements (GNEs) are connected to the NMS using protocols, for example,
the Simple Network Management Protocol (SNMP). GNEs are able to forward data at the
network or application layer. An NMS directly communicates with a GNE and uses the GNE
to deliver management information to non-GNEs.
Purpose
When constructing a large network, hardware engineers must install devices on site, and
software commissioning engineers must also configure the devices on site. This network
construction method requires significant human and material resources, causing high capital
expenditure (CAPEX) and operational expenditure (OPEX). If a new NE is deployed but the
NMS cannot detect the NE, the network administrator cannot manage or control the NE.
Plug-and-play can be used so that the NMS can automatically detect new NEs and remotely
commission the NEs to reduce CAPEX and OPEX.
The DCN technique offers a mechanism to implement plug-and-play. After an NE is installed
and started, an IP address (NEIP address) mapped to the NEID of the NE is automatically
generated. Each NE adds its NEID and NEIP address to a link state advertisement (LSA).
Then, Open Shortest Path First (OSPF) advertises all Type-10 LSAs to construct a core
routing table that contains mappings between NEIP addresses and NEIDs on each NE. After
detecting a new NE, the GNE reports the NE to the NMS. The NMS accesses the NE using
the IP address of the GNE and ID of the NE. To commission NEs, the NMS can use the GNE
to remotely manage the NEs on the network.
To improve the system security, it is recommended that the NEIP address be changed to the planned one.
Benefits
The NMS is able to manage NEs using service channels provided by the managed NEs. No
additional devices are required, reducing CAPEX and OPEX.
1.4.9.2 Principles
1.4.9.2.1 Basic Concepts
The devices on a data communication network (DCN) communicate with each other using the
Point-to-Point Protocol (PPP) through single-hop logical channels. Therefore, packets
transmitted on the DCN are encapsulated into PPP frames and forwarded through service
ports at the data link layer.
As shown in Figure 1-75, the NMS uses a GNE to manage non-GNEs in the following
process:
1. After the DCN function is enabled, a PPP channel and an OSPF neighbor relationship
are established between devices.
2. OSPF LSAs are sent between OSPF neighbors to learn host routes carrying NEIP
addresses to obtain mappings between NEIP addresses and NEIDs.
3. The GNE sends the mappings to the NMS. The NMS then accesses non-GNEs.
A core routing table is generated in the following process:
1. After PPP Network Control Protocol (NCP) negotiation is complete, a point-to-point
route is generated without network segment restrictions.
2. The OSPF neighbor relationship is established, and the OSPF route is generated for the
entire network.
3. NEIDs are advertised using OSPF LSAs, triggering the generation of a core routing
table.
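The core routing table generated by the steps above is, in effect, a set of NEID-to-NEIP mappings that the NMS consults before accessing a non-GNE through a GNE. A minimal sketch of that lookup (the NEIDs and addresses below are made up for illustration, not real defaults):

```python
# Sketch of a DCN core routing table: mappings between NEIDs and NEIP
# addresses, as learned from OSPF Type-10 LSAs. Values are illustrative.

core_routing_table = {
    0x090001: "128.9.0.1",   # hypothetical GNE
    0x090002: "128.9.0.2",   # hypothetical non-GNE
}

def neip_for_neid(neid):
    """Return the NEIP address for a NEID, as the NMS would resolve it
    before accessing a non-GNE through a GNE."""
    try:
        return core_routing_table[neid]
    except KeyError:
        raise LookupError(f"NEID 0x{neid:06X} not in core routing table")

print(neip_for_neid(0x090002))  # -> 128.9.0.2
```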
1.4.9.3 Applications
DCN Application
During network deployment, every network element (NE) must be configured with software
and commissioned after hardware installation to ensure that all NEs can communicate with
each other. As a large number of NEs are deployed, on-site deployment for each NE requires
significant manpower and is time-consuming. To reduce on-site deployment time and the
cost of operation and maintenance, a DCN can be deployed.
In Figure 1-76, to improve reliability, active and standby GNEs can be deployed. If the active
GNE fails, the NMS can gracefully switch this function to the standby GNE.
1. A DCN VLAN group is configured on the GNE, and the VLAN ID of the Dot1q
termination subinterface is the same as the DCN VLAN ID of the main interface.
2. The GNE sends DCN negotiation packets to VLANs in the DCN VLAN group.
3. The DCN negotiation packets are sent to different leaf nodes through VLLs.
4. NEs learn the DCN VLAN ID sent by the GNE and establish DCN connections with the
GNE.
Terms
Term Description
GNE Gateway network elements (GNEs) are able to forward
data at the network or application layer. The NMS can
use GNEs to manage remote NEs connected through
optical fibers.
Core routing table A core routing table consists of mappings between NEID
and NEIP addresses of NEs on a data communication
network (DCN). Before accessing a non-GNE through a
GNE, the NMS must search the core routing table for the
NEIP address of the non-GNE based on the destination
NEID.
1.4.10 LAD
1.4.10.1 Introduction
Definition
Link Automatic Discovery (LAD) is a Huawei proprietary protocol that discovers neighbors
at the link layer. LAD allows a device to issue link discovery requests as triggered by the
NMS or command lines. After the device receives link discovery replies, the device generates
neighbor information and saves it in the local MIB. The NMS can then query neighbor
information in the MIB and generate the topology of the entire network.
Purpose
Large-scale networks demand increased NMS capabilities, such as obtaining the topology
status of connected devices automatically and detecting configuration conflicts between
devices. Currently, most NMSs use an automated discovery function to trace changes in the
network topology but can only analyze the network-layer topology. Network-layer topology
information notifies you of basic events like the addition or deletion of devices, but gives you
no information about the interfaces used by one device to connect to other devices or the
location or network operation mode of a device.
LAD is developed to resolve these problems. LAD can identify the interfaces on a network
device and provide detailed information about connections between devices. LAD can also
display paths between clients, switches, routers, application servers, and network servers. The
detailed information provided by LAD can help efficiently locate network faults.
Benefits
LAD helps network administrators promptly obtain detailed network topology and changes in
the topology and monitor the network status in real time, improving security and stability for
network communication.
1.4.10.2 Principles
1.4.10.2.1 Basic Concepts
When Ethernet sub-interfaces are used on links, LAD packets are encapsulated into
Ethernet frames. Figure 1-79 shows the LAD packet format on Ethernet sub-interfaces.
Field Length Description
Tag 4 bytes Contains a 2-byte Ethernet Type field and a 2-byte VLAN field.
Type 2 bytes Packet type, fixed at 0x0806.
Field 6 bytes Four fields included, such as Hardware Type, fixed at 0xFF-FF.
When low-speed interfaces are used on links, LAD packets are encapsulated into PPP
frames. Figure 1-80 shows the LAD packet format on low-speed interfaces.
The Information field is the same in all three LAD packet formats, meaning that the LAD
data units are irrelevant to the link type. Figure 1-81 shows the format of the LAD data unit.
Link Reply packets: link discovery replies in response to the Link Detect packets sent by
remote devices. Link Reply packets carry the Send Link Info SubTLV (the same as that
in the received Link Detect packets) and Recv Link Info SubTLV. Figure 1-83 shows the
format of the Link Reply packet data unit.
1.4.10.2.2 Implementation
Background
To monitor the network status in real-time and to obtain detailed network topology and
changes in the topology, network administrators usually deploy the Link Layer Discovery
Protocol (LLDP) on live networks. LLDP, however, has limited applications due to the
following characteristics:
LLDP uniquely identifies a device by its IP address. IP addresses are expressed in dotted
decimal notation and therefore are not easy to maintain or manage, when compared with
NE IDs that are expressed in decimal integers.
LLDP is not supported on Ethernet sub-interfaces, Eth-Trunk interfaces, or low-speed
interfaces, and therefore cannot discover neighbors for these types of interfaces.
LLDP-enabled devices periodically broadcast LLDP packets, consuming many system
resources and even affecting the transmission of user services.
Link Automatic Discovery (LAD) addresses the preceding problems and is more flexible:
LAD uniquely identifies a device by an NE ID in decimal integers, which are easier to
maintain and manage.
LAD can discover neighbors for various types of interfaces and is therefore more widely
applicable than LLDP.
LAD is triggered by an NMS or command lines and therefore can be implemented as
you need.
Implementation
The following example uses the networking in Figure 1-84 to illustrate how LAD is
implemented.
Local and remote devices exchange LAD packets to learn each other's NE ID, slot ID, subcard ID,
interface number, and even each other's VLAN ID if sub-interfaces are used.
4. The NMS exchanges NETCONF packets with DeviceA to obtain DeviceA's local and
neighbor information and then generates the topology of the entire network.
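The neighbor information that LAD stores in the local MIB can be pictured as a set of records keyed by local interface. A minimal sketch, with hypothetical interface names and field values:

```python
# Sketch of LAD neighbor records as generated from link discovery
# replies and stored in the local MIB. Field names and values are
# illustrative, not the actual MIB object names.
from dataclasses import dataclass
from typing import Optional

@dataclass
class LadNeighbor:
    ne_id: int                      # decimal NE ID identifying the neighbor
    slot_id: int
    subcard_id: int
    interface_number: int
    vlan_id: Optional[int] = None   # learned only when sub-interfaces are used

# Neighbor table keyed by local interface name (hypothetical names).
neighbor_mib = {
    "GigabitEthernet0/1/0": LadNeighbor(ne_id=2, slot_id=1,
                                        subcard_id=0, interface_number=3),
    "GigabitEthernet0/1/1.100": LadNeighbor(ne_id=3, slot_id=2,
                                            subcard_id=0,
                                            interface_number=1, vlan_id=100),
}

# The NMS would query records like these over NETCONF to build the
# network topology.
for ifname, nbr in neighbor_mib.items():
    print(ifname, "->", nbr.ne_id)
```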
Benefits
After network administrators deploy LAD on devices, they can obtain information about all
links connected to the devices. LAD helps extend the network management scale. Network
administrators can obtain detailed network topology information and topology changes.
1.4.10.3 Applications
1.4.10.3.1 LAD Application in Single-Neighbor Networking
Networking Description
In single-neighbor networking, devices are directly connected, and each device interface
connects only to one neighbor. In Figure 1-85, DeviceA and DeviceB are directly connected,
and each interface on DeviceA and DeviceB connects only to one neighbor.
Feature Deployment
After enabling Link Automatic Discovery (LAD) on DeviceA, administrators can use the
NMS to obtain Layer 2 configurations of DeviceA and DeviceB, get a detailed network
topology, and determine whether a configuration conflict exists. LAD helps improve security
and stability for network communication.
Networking Description
In multi-neighbor networking, devices are connected over an unknown network, and each
device interface connects to one or more neighbors. In Figure 1-86, DeviceA, DeviceB, and
DeviceC are connected over a Layer 2 virtual private network (L2VPN). Devices on the
L2VPN may have Link Automatic Discovery (LAD) disabled or may not need to be managed
by the NMS, but they can still transparently transmit LAD packets. DeviceA has two
neighbors, DeviceB and DeviceC.
Feature Deployment
After enabling Link Automatic Discovery (LAD) on DeviceA, administrators can use the
NMS to obtain Layer 2 configurations of DeviceA, DeviceB, and DeviceC, get a detailed
network topology, and determine whether a configuration conflict exists. LAD helps ensure
security and stability for network communication.
Networking Description
On the network shown in Figure 1-87, an Eth-Trunk that comprises aggregated links exists
between DeviceA and DeviceB. Each aggregated link interface connects directly to only one
neighbor, as if it were connected in single-neighbor networking.
Feature Deployment
After enabling Link Automatic Discovery (LAD) on DeviceA, administrators can use the
NMS to obtain Layer 2 configurations of DeviceA and DeviceB, get a detailed network
topology, and determine whether a configuration conflict exists. LAD helps ensure security
and stability for network communication.
Terms
Term Definition
LAD A Huawei proprietary protocol that discovers neighbors at the link layer.
LAD allows a device to issue link discovery requests as triggered by the
NMS or command lines. After the device receives link discovery
replies, the device generates neighbor information and saves it in the
local MIB. The NMS can then query neighbor information in the MIB
and generate the topology of the entire network.
LLDP A Layer 2 discovery protocol defined in IEEE 802.1ab. LLDP provides
a standard link-layer discovery mode to encapsulate information about
the capabilities, management address, device ID, and interface ID of a
local device into LLDP packets and send the packets to neighbors. The
neighbors save the information received in a standard MIB to help the
NMS query and determine the communication status of links.
1.4.11 LLDP
1.4.11.1 Introduction
Definition
The Link Layer Discovery Protocol (LLDP), a Layer 2 discovery protocol defined in IEEE
802.1ab, provides a standard link-layer discovery method that encapsulates information about
the capabilities, management address, device ID, and interface ID of a local device into LLDP
packets and sends the packets to neighboring devices. These neighboring devices save the
information received in a standard management information base (MIB) to help the network
management system (NMS) query and determine the link communication status.
Purpose
Diversified network devices are deployed on a network, and configurations of these devices
are complicated. Therefore, NMSs must be able to meet increasing requirements for network
management capabilities, such as the capability to automatically obtain the topology status of
connected devices and the capability to detect configuration conflicts between devices. A
majority of NMSs use an automated discovery function to trace changes in the network
topology, but most can only analyze the network layer topology. Network layer topology
information notifies you of basic events, such as the addition or deletion of devices, but gives
you no information about the interfaces used to connect a device to other devices. The NMSs
can identify neither the device location nor the network operation mode.
LLDP is developed to resolve these problems. LLDP can identify interfaces on a network
device and provide detailed information about connections between devices. LLDP can also
display information about paths between clients, switches, routers, application servers, and
network servers, which helps you efficiently locate network faults.
Benefits
Deploying LLDP improves NMS capabilities. LLDP supplies the NMS with detailed
information about network topology and topology changes, and it detects inappropriate
configurations existing on the network. The information provided by LLDP helps
administrators monitor network status in real time to keep the network secure and stable.
1.4.11.2 Principles
1.4.11.2.1 Basic LLDP Concepts
LLDP Packets
LLDP packets are Ethernet packets that encapsulate LLDP data units (LLDPDUs). LLDP
supports two encapsulation modes: Ethernet II and Subnetwork Access Protocol (SNAP). The
NE20E supports the Ethernet II encapsulation mode.
Figure 1-88 shows the format of an Ethernet II LLDP packet.
Field Description
Destination MAC address A fixed multicast MAC address 0x0180-C200-000E.
Source MAC address A MAC address of an interface or a bridge MAC address of a device.
Type Packet type, fixed at 0x88CC.
LLDPDU Main body of an LLDP packet.
FCS Frame check sequence.
LLDPDU
An LLDPDU is a data unit encapsulated in the data field in an LLDP packet.
A device encapsulates local device information in type-length-value (TLV) format and
combines several TLVs in an LLDPDU for transmission. You can combine various TLVs to
form an LLDPDU as required. TLVs allow a device to advertise its own status and learn the
status of neighboring devices.
Figure 1-89 shows the LLDPDU format.
Each LLDPDU carries a maximum of 28 types of TLVs. Each LLDPDU starts with the
Chassis ID TLV, Port ID TLV, and Time to Live TLV, and ends with the End of LLDPDU
TLV.
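The mandatory ordering above can be illustrated by assembling a minimal LLDPDU. Each TLV header packs a 7-bit type and a 9-bit length; the type codes and subtypes below (Chassis ID type 1 with MAC-address subtype, Port ID type 2 with interface-name subtype) follow IEEE 802.1ab, while the MAC address and interface name are made-up example values:

```python
import struct

def tlv(tlv_type, value):
    """Pack one LLDP TLV: 7-bit type and 9-bit length in a 2-byte header."""
    header = (tlv_type << 9) | len(value)
    return struct.pack("!H", header) + value

# Mandatory TLVs in the required order.
chassis_id = tlv(1, b"\x04" + bytes.fromhex("0025a1000001"))  # subtype 4: MAC address
port_id    = tlv(2, b"\x05" + b"GigabitEthernet0/1/0")        # subtype 5: interface name
ttl        = tlv(3, struct.pack("!H", 120))                   # time to live: 120 seconds
end        = tlv(0, b"")                                      # End of LLDPDU TLV

lldpdu = chassis_id + port_id + ttl + end

# In the Ethernet II encapsulation described above, this LLDPDU follows
# the fixed multicast destination MAC 0180-C200-000E and Type 0x88CC.
print(len(lldpdu))  # -> 38
```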
TLV
A TLV is the smallest unit of an LLDPDU. It gives type, length, and other information for a
device object. For example, a device ID is carried in the Chassis ID TLV, an interface ID in
the Port ID TLV, and a network management address in the Management Address TLV.
LLDPDUs can encapsulate basic TLVs, TLVs defined by IEEE 802.1 working groups, TLVs
defined by IEEE 802.3 working groups, and Data Center Bridging Capabilities Exchange
Protocol (DCBX) TLVs.
Basic TLVs: are the basis for network device management.
Organizationally specific TLVs: include TLVs defined by IEEE 802.1 and those defined
by IEEE 802.3. They are used to enhance network device management. Use these TLVs
as needed.
a. TLVs defined by IEEE 802.1
The device searches the IP address list for loopback interfaces, management network interface, and
VLANIF interfaces in sequence and automatically selects the smallest IP address of the same interface
type as a management IP address.
Implementation
LLDP must be used together with MIBs. LLDP requires that each device interface be
provided with four MIBs. An LLDP local system MIB that stores status information of a local
device and an LLDP remote system MIB that stores status information of neighboring devices
are the most important. The status information includes the device ID, interface ID, system
name, system description, interface description, device capability, and network management
address.
LLDP requires that each device interface be provided with an LLDP agent to manage LLDP
operations. The LLDP agent performs the following functions:
Maintains information in the LLDP local system MIB.
Sends LLDP packets to notify neighboring devices of local device status.
Identifies and processes LLDP packets sent by neighboring devices and maintains
information in the LLDP remote system MIB.
Sends LLDP alarms to the NMS when detecting changes in information stored in the
LLDP local and remote system MIBs.
Working Mechanism
LLDP working modes
LLDP works in one of the following modes:
Tx mode: enables a device only to send LLDP packets.
Rx mode: enables a device only to receive LLDP packets.
Tx/Rx mode: enables a device to send and receive LLDP packets. The default working
mode is Tx/Rx.
Disabled mode: disables a device from sending or receiving LLDP packets.
When the LLDP working mode changes on an interface, the interface initializes the LLDP state
machines. To prevent repeated initializations caused by frequent working mode changes, the NE20E
supports an initial delay on the interface. When the working mode changes on the interface, the interface
initializes the LLDP state machines only after a configured delay interval elapses.
1.4.11.3 Applications
1.4.11.3.1 LLDP Applications in Single Neighbor Networking
Networking Description
In single neighbor networking, interfaces between devices, or between devices and media
endpoints (MEs), are directly connected without intermediate devices. Each device
interface is connected to only one remote neighboring device. In the single neighbor
networking shown in Figure 1-92, Device B is directly connected to Device A and the ME,
and each interface of Device A and Device B is connected only to a single remote neighboring
device.
Feature Deployment
After LLDP is configured on Device A and Device B, an administrator can use the NMS to
obtain Layer 2 configuration information about these devices, collect detailed network
topology information, and determine whether a configuration conflict exists. LLDP helps
make network communications more secure and stable.
Networking Description
In multi-neighbor networking, each interface is connected to multiple remote neighboring
devices. In the multi-neighbor networking shown in Figure 1-93, the network connected to
Device A, Device B, and Device C is unknown. Devices on this unknown network may have
LLDP disabled or may not need to be managed by the NMS, but they can still transparently
transmit LLDP packets. Interfaces on Device A, Device B, and Device C are connected to
multiple remote neighboring devices.
Feature Deployment
After LLDP is configured on Device A, Device B, and Device C, an administrator can use the
NMS to obtain Layer 2 configuration information about these devices, collect detailed
network topology information, and determine whether a configuration conflict exists. LLDP
helps make network communications more secure and stable.
Networking Description
In Figure 1-94, aggregated links exist between interfaces on Device A and Device B. Each
aggregated link interface is connected directly to another aggregated link interface, in the
same way as in single neighbor networking.
Feature Deployment
After LLDP is configured on Device A and Device B, an administrator can use the NMS to
obtain Layer 2 configuration information about these devices, collect detailed network
topology information, and determine whether a configuration conflict exists. LLDP helps
make network communications more secure and stable.
Terms
Term Description
1.4.12.1 Introduction
Definition
Synchronization is classified into the following types:
Clock synchronization, also called frequency synchronization
Clock synchronization maintains a strict relationship between signal frequencies or
between signal phases. Signals are transmitted at the same average rate within the valid
time. In this manner, all devices on a network run at the same rate.
On a digital communication network, a sender places a pulse signal in a specific timeslot
for transmission. A receiver needs to extract this pulse signal from this specific timeslot
to ensure that the sender and receiver communicate properly. The clocks on the sender
and receiver must also be synchronized to ensure smooth communication. Clock
synchronization enables the clocks on the sender and receiver to be synchronized.
Time synchronization, also called phase synchronization
Generally, the word "time" indicates either a moment or a time interval. A moment is a
transient in a period, whereas a time interval is the interval between two transients. Time
synchronization adjusts the internal clocks and moments of devices based on a received
time. The working principle of time synchronization is similar to that of clock
synchronization. When a time is adjusted, both the frequency and phase of a clock are
adjusted. The phase of this clock is represented by a moment in the form of year, month,
day, hour, minute, second, millisecond, microsecond, and nanosecond. Time
synchronization enables devices to receive discontinuous time reference information and
to adjust their times to synchronize times. Clock synchronization enables devices to trace
a clock source to synchronize frequencies.
Figure 1-95 Comparison between time synchronization and clock synchronization
The figure shows the difference between time synchronization and clock synchronization. In
time synchronization, watches A and B always keep the same time. In clock synchronization,
watches A and B keep different times, but the time difference between the two watches is a
constant value, for example, 6 hours.
Purpose
Clock synchronization aims to limit the clock frequency or phase difference between network
elements (NEs) on a digital communication network within an allowable range. Information is
coded into digital pulse signals using pulse code modulation (PCM) and transmitted on a
digital communication network. If the clock frequencies of two digital switching devices are
different, or digital bit streams are corrupted due to interference during transmission, phase
drift or jitter occurs. Consequently, the buffer of the digital switching system experiences data
loss or duplication, resulting in incorrect transmission of bit streams. If the clock frequency or
phase difference exceeds an allowable range, bit errors or jitter may occur. As a result,
network transmission performance deteriorates.
1.4.12.2 Principles
1.4.12.2.1 Basic Concepts
Clock Source
The device that provides clock signals for another device is called a clock source. A device
can have multiple clock sources. They are classified into the following types:
External clock source
An external clock source traces a high-level clock through the clock interface provided
by a clock board.
Line clock source
You are advised to configure the automatic clock source selection mode. In this mode, the NE20E
dynamically selects an optimal clock source based on clock source quality.
SSM
The International Telecommunication Union-Telecommunication Standardization Sector
(ITU-T) defined a synchronous status message (SSM) to identify the quality level of a
synchronization source on synchronous digital hierarchy (SDH) networks. As stipulated by
the ITU-T, the four spare bits in one of the five Sa bytes in a 2 Mbit/s bit stream are used to
transmit the SSM value. The use of the SSM value in clock source selection brings the
following benefits:
Improves synchronization network performance.
Prevents timing loops.
Achieves synchronization on networks with different structures.
Enhances synchronization network reliability.
Extended SSM
The extended SSM function enables clock IDs to participate in automatic clock source
selection. This function prevents clock loops.
When the extended SSM function is enabled, the NE20E does not allow clock IDs to
participate in automatic clock source selection in either of the following cases:
The clock ID of a clock source is the same as the clock ID configured on the NE20E.
The clock ID of a clock source is 0.
Pseudo Synchronization
In pseudo synchronization mode, each switching site has its own clock with very high
accuracy and stability, and clock synchronization is not carried out among the switching sites.
There is a small difference in the clock frequency or phase among the clocks of the switching
sites, which does not affect service transmission and can therefore be ignored. This is the
reason why the mode is called pseudo synchronization.
Pseudo synchronization applies to digital networks between countries. Caesium clocks are
usually used on digital networks inside a country.
Master/Slave Synchronization
In master/slave synchronization mode, a master clock of high accuracy is set on a network
and traced by every site. Every network element traces the higher-level clock in the same site
or sub-site. Every sub-site traces the higher-level clock in its own site.
Master/Slave synchronization is classified into two types: direct master/slave synchronization
and hierarchical master/slave synchronization.
In direct master/slave synchronization mode shown in Figure 1-96, all slave clocks
synchronize with the master clock. Direct master/slave synchronization applies to simple
networks.
In hierarchical master/slave synchronization mode shown in Figure 1-97, the devices are
classified into three levels. The level-2 slave clock synchronizes with the level-1 reference
clock, and the level-3 slave clock synchronizes with the Level-2 slave clock. Hierarchical
master/slave synchronization applies to large complex networks.
Hold-in state
When all reference clocks are lost, the slave clock enters the hold-in state and uses the
last frequency stored before the reference clocks were lost. In addition, the slave clock
provides clock signals that conform to the source reference clock to ensure that there is a
small difference between the frequency of the provided clock signals and that of the
reference clock.
Free running state
After losing all external reference clocks, the slave clock loses the clock reference
memory or retains the hold-in state for a long time. As a result, the oscillator inside the
slave clock works in the free running state.
The accuracy of the clock in the hold-in state cannot be maintained for a long time because of
the drift of the inherent oscillation frequency. Therefore, the accuracy of the clock in the
hold-in state is inferior to that of the clock in the trace state.
As shown in Figure 1-99, Router A traces the BITS clock. Routers A and B are connected
through an Ethernet link. Routers B and C are also connected through an Ethernet link, and
Router C traces Router B's clock. The clocks of the three routers synchronize with the BITS
clock.
Owing to the long transmission distance of optical fibers, synchronizing clock signals through
synchronous Ethernet links has become the most common networking mode for clock
synchronization.
As shown in Figure 1-100, on Device A that serves as the master clock, the active clock
board is configured to trace, Device B is configured to trace the clock of Device A, and
Device C is configured to trace the clock of Device B.
When all devices on the entire network trace Router A's clock, there is no reference clock
on the network if Router A fails. As a result, none of the routers has an accurate
reference clock. The routers may trace a reference clock, but the reference clock
accuracy cannot meet synchronization requirements.
When the clock board is powered on, the default SSM levels of all reference sources are
Unknown. The sequence of the SSM levels from high to low is PRC, SSUA, SSUB, SEC,
UNKNOWN, and DNU. If the SSM level of a clock source is DNU and the SSM level
participates in the selection of a clock source, the clock source is not selected during
protection switching.
The SSM level of output signals is determined by the traced clock source. When the
clock works in the trace state, the SSM level of output signals and that of the traced
clock source are the same. When the clock does not work in the trace state, the SSM
level of output signals is SEC.
For a line clock source, the SSM can be extracted from an interface board and reported
to the IPU. The IPU then sends the SSM to the clock board. The IPU can also forcibly
set the SSM of the line clock source.
For the BITS clock source of the clock module:
− If the signal is 2.048 Mbit/s, the clock module can extract the SSM from the signal.
− If the signal is 2.048 MHz, the SSM level can be set manually.
The router can only select an SSM value listed in Table 1-32. For values not listed, the router processes
them as DNU.
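The SSM-based selection rules above can be illustrated with a minimal Python sketch. This is a hypothetical illustration only; the function and source names are invented and do not reflect the NE20E implementation.

```python
# Hypothetical sketch of SSM-based clock source selection.
# Quality order, from high to low: PRC, SSUA, SSUB, SEC, UNKNOWN, DNU.
SSM_RANK = {"PRC": 0, "SSUA": 1, "SSUB": 2, "SEC": 3, "UNKNOWN": 4, "DNU": 5}

def normalize(ssm_level):
    """SSM values not listed in the table are processed as DNU."""
    return ssm_level if ssm_level in SSM_RANK else "DNU"

def select_clock_source(sources):
    """Pick the highest-quality source; DNU sources are never selected.
    sources: list of (name, ssm_level) tuples."""
    usable = [(name, normalize(level)) for name, level in sources]
    usable = [(name, level) for name, level in usable if level != "DNU"]
    if not usable:
        return None  # no usable reference: clock enters hold-in or free running
    return min(usable, key=lambda item: SSM_RANK[item[1]])[0]

# A PRC-level line source beats a BITS source at SSUA level; DNU is excluded.
print(select_clock_source([("bits0", "SSUA"), ("line1", "PRC"), ("line2", "DNU")]))  # line1
```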
1.4.12.2.5 Impact
Definition
The 1588 adaptive clock recovery (ACR) algorithm is used to carry out clock (frequency)
synchronization between the NE20E and clock servers by exchanging 1588v2 messages over
a clock link that is set up by sending Layer 3 unicast packets.
Unlike 1588v2 that achieves frequency synchronization only when all devices on a network
support 1588v2, 1588 ACR is capable of implementing frequency synchronization on a
network with both 1588v2-aware devices and 1588v2-unaware devices.
After 1588 ACR is enabled on a server, the server provides 1588 ACR frequency
synchronization services for clients.
1588 ACR records PDV performance statistics on the CF card. These statistics indicate the
delay and jitter of packets, not the content of the packets.
Purpose
All-IP has become the trend for future networks and services. Therefore, traditional networks
based on the Synchronous Digital Hierarchy (SDH) have to overcome various constraints
before migrating to IP packet-switched networks. Transmitting Time Division Multiplexing
(TDM) services over IP networks presents a major technological challenge. TDM services are
classified into two types: voice services and clock synchronization services. With the
development of VoIP, technologies of transmitting voice services over an IP network have
become mature and have been extensively used. However, development of technologies of
transmitting clock synchronization services over an IP network is still under way.
1588v2 is a software-based technology that carries out time and frequency synchronization.
To achieve higher accuracy, 1588v2 requires that all devices on a network support 1588v2; if
not, frequency synchronization cannot be achieved.
Derived from 1588v2, 1588 ACR implements frequency synchronization with clock servers
on a network with both 1588v2-aware devices and 1588v2-unaware devices. Therefore, in the
situation where only frequency synchronization is required, 1588 ACR is more applicable
than 1588v2.
Benefits
This feature brings the following benefits to operators:
Frequency synchronization can be achieved on networks with both 1588v2-aware and
1588v2-unaware devices, reducing the costs of network construction.
Operators can provide more services that can meet subscribers' requirements for
frequency synchronization.
This feature brings the following benefits to users:
N/A.
1.4.13.2 Principles
1.4.13.2.1 Basic Principles of 1588 ACR
1588 ACR aims to synchronize the frequencies of routers (clients) with those of clock servers
(servers).
1588 ACR sends Layer 3 unicast packets to establish a clock link between a client and a
server to exchange 1588v2 messages. 1588 ACR obtains a clock offset by comparing
timestamps carried in the 1588v2 messages, which enables the client to synchronize
frequencies with the server.
One-way mode
a. The server sends the client 1588v2 messages at t1 and t1' and time-stamps the
messages with t1 and t1'.
b. The client receives the 1588v2 messages at t2 and t2' and time-stamps the messages
with t2 and t2'.
t1 and t1' are the clock time of the server, and t2 and t2' are the clock time of the client.
By comparing the sending time on the server and the receiving time on the client, 1588
ACR calculates a frequency offset between the server and client and then implements
frequency synchronization. For example, if the result of the formula (t2 - t1)/(t2' - t1') is
1, frequencies on the server and client are the same; if not, the frequency of the client
needs to be adjusted so that it is the same as the frequency of the server.
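The one-way comparison above can be worked through numerically. The following minimal Python sketch uses hypothetical timestamp values.

```python
# One-way frequency check per the formula above: (t2 - t1)/(t2' - t1').
# A ratio of 1 means the client frequency matches the server frequency.
def frequency_ratio(t1, t1p, t2, t2p):
    return (t2 - t1) / (t2p - t1p)

# Server sends at t1 = 100.0 and t1' = 200.0 (server time);
# client receives at t2 = 105.0 and t2' = 205.0 (client time).
# The offset stays constant (5.0), so the frequencies are the same.
print(frequency_ratio(100.0, 200.0, 105.0, 205.0))  # 1.0
```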
Two-way mode
a. The server clock sends a 1588 Sync packet carrying the timestamp t1 to the client
clock at t1.
b. The client clock receives the Sync packet from the server clock at t2.
c. The client clock sends a 1588 Delay_Req packet to the server clock at t3.
d. The server clock receives the Delay_Req packet from the client clock at t4, and
sends a Delay_Resp packet to the client clock.
The same calculation method is used in the two-way and one-way modes. The t1/t2 timestamps
are compared with the t3/t4 timestamps, and the group of data with less jitter is used for the
calculation. Under the same network conditions, tracing the clock signals in the direction with
less jitter is more precise than always tracing a single fixed direction. The two-way mode
therefore offers better frequency recovery accuracy and higher reliability than the one-way
mode. If adequate bandwidth is available, clock synchronization in two-way mode is
recommended for frequency synchronization when deploying 1588 ACR.
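The choice between the two directions can be sketched as follows. The sample values are hypothetical, and the variance test is only one plausible way to quantify jitter.

```python
import statistics

# Two-way mode: compare the jitter of the forward offsets (t2 - t1) with
# that of the reverse offsets (t4 - t3) and trace the steadier direction.
def pick_direction(forward_offsets, reverse_offsets):
    if statistics.pvariance(forward_offsets) <= statistics.pvariance(reverse_offsets):
        return "forward"  # master-to-slave samples have less jitter
    return "reverse"      # slave-to-master samples have less jitter

print(pick_direction([5.0, 5.1, 5.0, 5.1], [7.0, 9.5, 6.2, 8.8]))  # forward
```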
Duration Mechanism
On a 1588 ACR client, you can configure a duration for Announce, Sync, and Delay_Resp
packets. The duration value is carried in the TLV field of a signaling negotiation packet
and sent to the server.
Generally, the client sends a packet to renegotiate with the server before the duration times
out so that the server can continue to provide the client with synchronization services.
If the link connected to the client goes Down or fails, the client cannot renegotiate with the
server. When the duration times out, the server stops sending Sync packets to the client.
1.4.13.3 Applications
On the preceding network, CSGs support 1588 ACR and function as clients to initiate requests
for Layer 3 unicast connections to the upstream IPCLK server. The CSGs then exchange
1588v2 messages with the IPCLK server over the connections, achieving frequency recovery.
BITS1 and BITS2 are configured as clock servers for the CSGs to provide protection.
One CSG sends line clock signals carrying frequency information to NodeB1 along an E1 link.
The other CSG transmits NodeB2 frequency information either along a synchronous Ethernet
link or by sending 1588v2 messages. In this manner, both NodeBs connected to the CSGs can
achieve frequency synchronization.
Terms
Term: Synchronization
On a modern communications network, in most cases, the proper functioning of
telecommunications services requires network clock synchronization, meaning that the
frequency offset or time difference between devices must be kept in an acceptable range.
Network clock synchronization includes frequency synchronization and time
synchronization.
Term: Time synchronization
Time synchronization, also called phase synchronization, refers to the consistency of both
frequencies and phases between signals. This means that the phase offset between signals
is always 0.
Term: Frequency synchronization
Frequency synchronization, also called clock synchronization, refers to a strict
relationship between signals based on a constant frequency offset or a constant phase
offset, in which signals are sent or received at the same average rate in a valid instance.
In this manner, all devices on the communications network operate at the same rate; that
is, the phase difference between signals remains a fixed value.
Term: IEEE 1588v2 PTP
1588v2, defined by the Institute of Electrical and Electronics Engineers (IEEE), is the
standard for the Precision Clock Synchronization Protocol for Networked Measurement
and Control Systems, called the Precision Time Protocol (PTP) for short.
Abbreviations
Abbreviation Full Spelling
PTP (1588v2) Precision Time Protocol
BITS Building Integrated Timing Supply
BMC Best Master Clock
ACR Adaptive Clock Recovery
Definition
Circuit emulation service (CES) adaptive clock recovery (ACR) clock synchronization
implements adaptive clock frequency synchronization. CES ACR clock synchronization uses
special circuit emulation headers to encapsulate time division multiplexing (TDM) service
packets that carry clock frequency information and transmits these packets over a packet
switched network (PSN).
Purpose
If a clock frequency is out of the allowed error range, problems such as bit errors and jitter
occur. As a result, network transmission performance deteriorates. CES ACR uses the
adaptive clock recovery algorithm to synchronize clock frequencies and confines the clock
frequencies of all network elements (NEs) on a digital network to the allowed error range,
enhancing network transmission stability.
CES ACR applies when the intermediate PSN does not support clock synchronization at the
physical layer and clock frequency information needs to be transmitted using TDM services.
1.4.14.2 Principles
1.4.14.2.1 Basic Concepts
CES
The CES technology originated from the asynchronous transfer mode (ATM) network. CES
uses emulated circuits to encapsulate circuit service data into ATM cells and transmits these
cells over the ATM network. Later, circuit emulation was used on the Metro Ethernet to
transparently transmit TDM and other circuit switched services.
CES uses special circuit emulation headers to encapsulate TDM service packets that carry
clock frequency information and transmits these packets over the PSN.
CES ACR
The CES technology generally uses the adaptive clock recovery algorithm to synchronize
clock frequencies. If an Ethernet transmits TDM services over emulated circuits, the Ethernet
uses the adaptive clock recovery algorithm to extract clock synchronization information from
data packets.
1.4.14.3 Applications
CES ACR applies to scenarios in which TDM services traverse a packet switched network
(PSN) that does not support clock synchronization and the transmit TDM service clock must
be used to restore TDM services at the receive end.
On the network shown in Figure 1-105, the clock source sends clock frequency information to
CE1. CE1 encapsulates the clock frequency information into TDM services and transmits the
services over the intermediate PSN through routers. Upon receipt, the router connected to the
slave clock uses CES ACR to recover the clock frequency. In actual applications, multiple
E1/T1 interfaces can belong to the same clock recovery domain. The system uses the PW
source selection algorithm to select a PW as the primary PW and uses the primary PW to
recover clocks. If the primary PW fails, the system automatically selects the next available
PW as the primary PW to recover clocks. If multiple PWs are configured to belong to the
same clock domain, the TDM services carried over these PWs must also have the same clock
source. Otherwise, packet loss or frequency deviation adjustment may occur.
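The primary-PW failover behavior described above can be sketched as follows. This is a hypothetical illustration; the actual PW source selection algorithm is not specified in this document.

```python
# Select the PW used for clock recovery: use the current primary PW if it
# is up, otherwise fall back to the next available PW in the clock domain.
def select_primary_pw(pws):
    """pws: ordered list of (name, is_up); returns the first usable PW."""
    for name, is_up in pws:
        if is_up:
            return name
    return None  # no PW available: clock recovery stops

print(select_primary_pw([("pw1", False), ("pw2", True), ("pw3", True)]))  # pw2
```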
Abbreviations
Abbreviation Full Spelling
1.4.15.1 Introduction
Definition
Synchronization
This is the process of ensuring that the frequency offset or time difference between
devices is kept within a reasonable range. In a modern communications network, most
telecommunications services require network clock synchronization in order to function
properly.
Figure 1-106 shows the differences between time synchronization and frequency
synchronization. If Watch A and Watch B always have the same time, they are in time
synchronization. If Watch A and Watch B have different time, but the time offset remains
constant, for example, 6 hours, they are in frequency synchronization.
IEEE 1588
IEEE 1588 is defined by the Institute of Electrical and Electronics Engineers (IEEE) as
Precision Clock Synchronization Protocol (PTP) for networked measurement and control
systems. It is called the Precision Time Protocol (PTP) for short.
IEEE 1588v1, released in 2002, applies to the industrial automation and test and
measurement fields. With the development of IP networks and the popularization of 3G
Purpose
Data communications networks do not require time or frequency synchronization and,
therefore, routers on such networks do not need to support time or frequency synchronization.
On IP radio access networks (RANs), time or frequency needs to be synchronized among base
transceiver stations (BTSs). Therefore, routers on IP RANs are required to support time or
frequency synchronization.
Frequency synchronization between BTSs on an IP RAN requires that frequencies between
BTSs be synchronized to a certain level of accuracy; otherwise, calls may be dropped during
mobile handoffs. Some wireless standards require both frequency and time synchronization.
Table 1-33 shows the requirements of wireless standards for time synchronization and
frequency accuracy.
Table 1-33 Requirements of wireless standards for time synchronization and frequency accuracy
Benefits
This feature brings the following benefits to operators:
Construction and maintenance costs for time synchronization on wireless networks are
reduced.
Time synchronization and frequency synchronization on wireless networks are
independent of GPS, providing a higher level of strategic security.
High-accuracy NQA-based unidirectional delay measurement is supported.
Y.1731 and IPFPM are supported.
Concepts of G.8275.1
ITU-T G.8275.1 defines the precision time protocol telecom profile for phase/time
synchronization with full timing support from the network. G.8275.1 is defined as a time
synchronization protocol.
A physical network can be logically divided into multiple clock domains. Each clock domain
has its own independent synchronous time, with which clocks in the same domain
synchronize.
A node on a time synchronization network is called a clock. G.8275.1 defines three types of
clocks:
A Telecom grandmaster (T-GM) can only be the master clock that provides time
synchronization.
A Telecom-boundary clock (T-BC) has more than one G.8275.1 interface. One interface
of the T-BC synchronizes time signals with an upstream clock, and the other interfaces
distribute the time signals to downstream clocks.
A Telecom time slave clock (T-TSC) can only be a slave clock that synchronizes with
the time of an upstream device.
1.4.15.2 Principles
1.4.15.2.1 Basic Concepts
Clock Domain
Logically, a physical network can be divided into multiple clock domains. Each clock domain
has a reference time with which all devices in the domain are synchronized. Each clock
domain has its own reference time and these times are independent of one another.
A device can transparently transmit time signals from multiple clock domains over a bearer
network to provide specific reference times for multiple mobile operator networks. The device,
however, can join only one clock domain and can synchronize only with the synchronization
time of that clock domain.
Clock Node
Each node on a time synchronization network is a clock. The 1588v2 protocol defines the
following types of clocks:
Ordinary clock
An ordinary clock (OC) has only one 1588v2 clock interface (a clock interface enabled
with 1588v2) through which the OC synchronizes with an upstream node or distributes
time signals to downstream nodes.
Boundary clock
A boundary clock (BC) has multiple 1588v2 clock interfaces, one of which is used to
synchronize with an upstream node. The other interfaces are used to distribute time
signals to downstream nodes.
The following is an example of a special case: If a device obtains the standard time from
a BITS through an external time interface (which is not enabled with 1588v2) and then
distributes time signals through two 1588v2 enabled clock interfaces to downstream
nodes, this device is a BC node, as it has more than one 1588v2 clock interface.
Transparent clock
A transparent clock (TC) does not synchronize the time with other devices (unlike BCs
and OCs) but has multiple 1588v2 clock interfaces through which it transmits 1588v2
messages and corrects message transmission delays.
TCs are classified into end-to-end (E2E) TCs and peer-to-peer (P2P) TCs.
TC+OC
A TC+OC is a special TC that has the functions of both a TC and an OC. On interfaces
having TC attributes, the TC+OC can transparently transmit 1588v2 messages and
correct message transmission delays.
Figure 1-107 Location of the TC, OC, and TC+OC on a time synchronization network
Clock nodes exchange information such as clock quality and the distance to the GM. After
this information has been gathered, one of the clock nodes is selected to be the GM, the
interface to be used for transmitting clock signals issued by the GM is selected, and the
master and slave relationships between nodes are specified. A loop-free, GM-rooted
spanning tree that covers the entire network is established after completion of the process.
If a master-slave relationship has been set up between two nodes, the master node periodically
sends Announce messages to the slave node. If the slave node does not receive an Announce
message from the master node within a specified period of time, it terminates the current
master-slave relationship and finds another interface with which to establish a new
master-slave relationship.
Grandmaster
A time synchronization network is like a GM-rooted spanning tree. All other nodes
synchronize with the GM.
Master/Slave
When a pair of nodes perform time synchronization, the upstream node distributing the
reference time signals is the master node and the downstream node receiving the reference
time signals is the slave node.
In practice, the delay and jitter on the network need to be taken into account, and the sending
and receiving delays are not always identical. Therefore, message-based time synchronization,
namely, 1588v2 and NTP, cannot guarantee high synchronization accuracy. For example, NTP
can only provide the synchronization accuracy of 10 to 100 ms.
1588v2 and NTP differ in implementation.
NTP runs at the application layer, for example, on the IPU of the NE20E. The delay measured
by NTP, in addition to the link delay, includes various internal processing delays, such as the
internal congestion queuing, software scheduling, and software processing delays. These
make the message transmission delay unstable, causing message transmission delays in two
directions to be asymmetric. As a result, the accuracy of NTP-based time synchronization is
low.
1588v2 presumes that the link delay is constant or changes so slowly that the change between
two synchronization processes can be ignored, and the message transmission delays in two
directions on a link are identical. Messages are time-stamped for delay measurement at the
physical layer of the interface board. This ensures that time synchronization based on the
obtained link delay is extremely accurate.
1588v2 defines two modes for the delay measurement and time synchronization mechanisms,
namely, Delay and Peer Delay (PDelay).
Delay Mode
The Delay mode is applied to end-to-end (E2E) delay measurement. Figure 1-108 shows the
delay measurement in Delay mode.
As shown in Figure 1-108, t-sm and t-ms represent the sending and receiving delays respectively and are
presumed to be identical. If they are different, they should be made identical through asymmetric delay
correction. For details about asymmetric delay correction, see the following part of this section.
Follow_Up messages are used in two-step mode. Only the one-step mode is described in this part, so
Follow_Up messages are not mentioned. For details about the two-step mode, see the following part of
this section.
A master node periodically sends a Sync message carrying the sending timestamp t1 to the
slave node. When the slave node receives the Sync message, it time-stamps t2 to the message.
The slave node periodically sends the Delay_Req message carrying the sending timestamp t3
to the master node. When the master node receives the Delay_Req message, it time-stamps t4
to the message and returns a Delay_Resp message to the slave node.
The slave node receives a set of timestamps, including t1, t2, t3, and t4. Other elements
affecting the link delay are ignored.
The sum of the message transmission delays in the two directions of the link between the
master and slave nodes equals (t4 - t1) - (t3 - t2). If the message transmission delays in both
directions are identical, the message transmission delay in one direction equals
[(t4 - t1) - (t3 - t2)]/2.
The time offset between the slave and master nodes equals [(t2 - t1) - (t4 - t3)]/2.
Based on the time offset, the slave node synchronizes with the master node.
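The Delay-mode arithmetic can be checked with a short Python sketch that applies the standard PTP relations, one-way delay = [(t4 - t1) - (t3 - t2)]/2 and offset = [(t2 - t1) - (t4 - t3)]/2, to hypothetical timestamps.

```python
# Delay-mode calculation on a symmetric link (hypothetical timestamps).
def delay_mode(t1, t2, t3, t4):
    one_way_delay = ((t4 - t1) - (t3 - t2)) / 2   # half the round trip
    offset = ((t2 - t1) - (t4 - t3)) / 2          # slave time minus master time
    return one_way_delay, offset

# Link delay 2.0, slave clock ahead of the master by 1.0:
# Sync sent at t1 = 10.0, received at t2 = 13.0;
# Delay_Req sent at t3 = 20.0, received at t4 = 21.0.
print(delay_mode(10.0, 13.0, 20.0, 21.0))  # (2.0, 1.0)
```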
As shown in Figure 1-109, time synchronization is repeatedly performed to ensure constant
synchronization between the master and slave nodes.
The BC and OC can be directly connected as shown in Figure 1-109. Alternatively, they can
be connected through other devices, but these devices must be TCs to ensure the accuracy of
time synchronization. The TC only transparently transmits 1588v2 messages and corrects the
message transmission delay (which requires that the TC identify these 1588v2 messages).
To ensure the high accuracy of 1588v2 time synchronization, it is required that the message
transmission delays in two directions between master and slave nodes be stable. Usually, the
link delay is stable but the transmission delay on devices is unstable. Therefore, if two nodes
performing time synchronization are connected through forwarding devices, the time
synchronization accuracy cannot be guaranteed. The solution to the problem is to perform the
transmission delay correction on these forwarding devices, which requires that the forwarding
devices be TCs.
Figure 1-110 shows how the transmission delay correction is performed on a TC.
The TC performs the transmission delay correction by adding the time it takes to transmit the
message to the Correction field of a 1588v2 message. This means that the TC deducts the
receiving timestamp of the 1588v2 message on its inbound interface and adds the sending
timestamp to the 1588v2 message on its outbound interface.
In this manner, the 1588v2 messages exchanged between the master and slave nodes, when
passing through multiple TCs, carry the residence delays of all TCs in the Correction field.
When the value of the Correction field is deducted, the remaining value is the link delay,
ensuring high-accuracy time synchronization.
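The Correction-field mechanism can be sketched as follows. The values and class names are hypothetical; in real 1588v2, the correction travels in the message header.

```python
# E2E TC residence-time correction: each TC adds the time a message spends
# inside it to the Correction field, so the slave can subtract it later.
class SyncMessage:
    def __init__(self, t1):
        self.t1 = t1           # master's sending timestamp
        self.correction = 0.0  # accumulated TC residence time

def tc_forward(msg, t_in, t_out):
    """A TC stamps the message on ingress/egress and accumulates the difference."""
    msg.correction += t_out - t_in
    return msg

msg = SyncMessage(t1=100.0)
msg = tc_forward(msg, t_in=100.4, t_out=100.9)  # 0.5 residence on TC1
msg = tc_forward(msg, t_in=101.1, t_out=101.3)  # 0.2 residence on TC2
t2 = 101.5
remaining = (t2 - msg.t1) - msg.correction  # link delay (plus any clock offset)
print(round(msg.correction, 1), round(remaining, 1))  # 0.7 0.8
```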
A TC that records the transmission delay from end to end as described above is the E2E TC.
Time synchronization in Delay mode can be applied only to E2E TCs. Figure 1-111 shows
how the BC, OC, and E2E TC are connected and how 1588v2 operates.
Figure 1-111 Networking diagram of the BC, OC, and E2E TC and the 1588v2 operation
PDelay Mode
When performing time synchronization in PDelay mode, the slave node deducts both the
message transmission delay and upstream link delay. This requires that adjacent devices
perform the delay measurement in PDelay mode to enable each device on the link to know its
upstream link delay. Figure 1-112 shows the delay measurement in PDelay mode.
As shown in Figure 1-112, t-sm and t-ms represent the sending and receiving delays respectively and are
presumed to be identical. If they are different, they should be made identical through asymmetric delay
correction. For details about asymmetric delay correction, see the following part of this section.
Follow_Up messages are used in two-step mode. Only the one-step mode is described in this part, so
Follow_Up messages are not mentioned. For details about the two-step mode, see the following part of
this section.
Node 1 periodically sends a PDelay_Req message carrying the sending timestamp t1 to node
2. When the PDelay_Req message is received, node 2 time-stamps t2 to the PDelay_Req
message. Then, node 2 sends a PDelay_Resp message carrying the sending timestamp t3 to
node 1. When the PDelay_Resp message is received, node 1 time-stamps t4 to the
PDelay_Resp message.
Node 1 obtains a set of timestamps, including t1, t2, t3, and t4. Other elements affecting the
link delay are ignored.
The message transmission delays in two directions on the link between node 1 and node 2
equal (t4 - t1) - (t3 - t2).
If the message transmission delays in the two directions on the link between node 1 and node
2 are identical, the message transmission delay in one direction equals [(t4 - t1) - (t3 - t2)]/2.
The delay measurement in PDelay mode does not differentiate between the master and slave
nodes. All nodes send PDelay messages to their adjacent nodes to calculate adjacent link delay.
This calculation process repeats and the message transmission delay in one direction is
updated accordingly.
The delay measurement in PDelay mode does not trigger time synchronization. To implement
time synchronization, the master node needs to periodically send Sync messages to the slave
node and the slave node receives the t1 and t2 timestamps. The slave node then deducts the
message transmission delay on the link from the master node to the slave node. The obtained
t2-t1-CorrectionField is the time offset between the slave and master nodes. The slave node
uses the time offset to synchronize with the master node. Figure 1-113 shows how time
synchronization is implemented in PDelay mode in the scenario where the BC and OC are
directly connected.
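A minimal numeric sketch of the PDelay-mode steps above follows; the timestamps are hypothetical and the function names are invented.

```python
# Each node measures its upstream link delay with PDelay messages,
# assuming symmetric delays in the two directions.
def pdelay_link_delay(t1, t2, t3, t4):
    return ((t4 - t1) - (t3 - t2)) / 2

# The slave then corrects the Sync timestamps with the measured link delay
# and the accumulated Correction field to obtain its offset from the master.
def pdelay_offset(t1_sync, t2_sync, link_delay, correction):
    return (t2_sync - t1_sync) - link_delay - correction

d = pdelay_link_delay(0, 12, 50, 62)       # 12.0 time units each way
print(d, pdelay_offset(100, 117, d, 0.0))  # 12.0 5.0
```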
Figure 1-115 shows how the BC, OC, and P2P TC are connected and how 1588v2 operates.
Figure 1-115 Schematic diagram of transmission delay correction in PDelay mode on a P2P TC
One-Step/Two-Step
In one-step mode, both the Sync messages for time synchronization in Delay mode and
PDelay_Resp messages for time synchronization in PDelay mode are stamped with a sending
time.
In two-step mode, Sync messages for time synchronization in Delay mode and PDelay_Resp
messages for time synchronization in PDelay mode are not stamped with a sending time. The
sending time is carried in Follow_Up and PDelay_Resp_Follow_Up messages.
Asymmetric Correction
Theoretically, 1588v2 requires the message transmission delays in two directions on a link to
be symmetrical. Otherwise, the algorithms of 1588v2 time synchronization cannot be
implemented. In practice, however, the message transmission delays in the two directions of a
link may be asymmetric due to the attributes of the link or a device; for example, the delays
between receiving a message and time-stamping it may differ in the two directions. For such
cases, 1588v2 provides an asymmetric delay correction mechanism, as shown in Figure 1-116.
Usually, t-ms is identical to t-sm. If they are different, the user can set a delay offset
between them, provided that the offset is constant and can be obtained by a measurement
device. 1588v2 then performs the time synchronization calculation according to the
asymmetric correction value.
In this manner, a high level of time synchronization accuracy can be achieved on an
asymmetric-delay link.
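The correction can be expressed with the IEEE 1588 delayAsymmetry convention, where t-ms = meanPathDelay + delayAsymmetry and t-sm = meanPathDelay - delayAsymmetry. The values below are hypothetical.

```python
# Delay-mode offset with asymmetric correction: the constant, externally
# measured delayAsymmetry is subtracted from the naive symmetric result.
def offset_with_asymmetry(t1, t2, t3, t4, delay_asymmetry):
    return ((t2 - t1) - (t4 - t3)) / 2 - delay_asymmetry

# Mean path delay 2.0, asymmetry 0.5 (t-ms = 2.5, t-sm = 1.5), true offset 1.0:
print(offset_with_asymmetry(10.0, 13.5, 20.0, 20.5, 0.5))  # 1.0
```

Without the correction, the same timestamps would yield an offset of 1.5, an error equal to the asymmetry.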
Packet Encapsulation
1588v2 defines the following packet encapsulation modes:
Layer 2 multicast encapsulation through a multicast MAC address
The EtherType field is 0x88F7, and the multicast MAC address is 01-80-C2-00-00-0E
(in PDelay messages) or 01-1B-19-00-00-00 (in non-PDelay messages).
1588v2 recommends that the Layer 2 multicast encapsulation mode be used. The NE20E
supports Layer 2 multicast encapsulation with tags. Figure 1-117 shows the Layer 2
multicast encapsulation without tags.
BITS Interface
1588v2 enables clock nodes to synchronize with each other, but cannot enable them to
synchronize with Greenwich Mean Time (GMT). If the clock nodes need to synchronize with
GMT, an external time source is required. That is, the GM needs to be connected to an
external time source to obtain the reference time in non-1588v2 mode.
Currently, the external time sources come from satellite systems, such as GPS from the
United States, Galileo from Europe, GLONASS from Russia, and BeiDou from China.
Figure 1-121 shows how the GM and an external time source are connected.
Clock Synchronization
In addition to time synchronization, 1588v2 can be used for clock synchronization, that is,
frequency recovery can be achieved through 1588v2 messages.
1588v2 time synchronization in Delay or PDelay mode requires the device to periodically
send Sync messages to its peer.
Each Sync message carries a sending timestamp. After receiving the Sync message, the
peer adds a receiving timestamp to it. When the link delay is stable, the two timestamps
advance at the same pace. If the receiving timestamps advance faster or slower than the
sending timestamps, the clock of the receiving device runs faster or slower than the clock
of the sending device and needs to be adjusted. Once adjusted, the frequencies of the two
devices are synchronized.
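This syntonization idea can be sketched numerically. The timestamps are hypothetical, and the ppm computation is illustrative, not the NE20E algorithm.

```python
# Compare how fast receive timestamps advance relative to send timestamps
# between two Sync messages; 0 ppm means the clocks run at the same rate.
def frequency_error_ppm(t1_a, t1_b, t2_a, t2_b):
    """t1_*: master send times; t2_*: slave receive times (stable link delay)."""
    return ((t2_b - t2_a) / (t1_b - t1_a) - 1.0) * 1e6

# Master interval 1.000000 s, slave measures 1.000010 s: slave fast by ~10 ppm.
print(round(frequency_error_ppm(0.0, 1.0, 5.0, 6.00001), 3))
```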
The frequency restored through 1588v2 messages has a lower accuracy than the frequency
restored through synchronous Ethernet. Therefore, it is recommended to perform frequency
synchronization through synchronous Ethernet and time synchronization through 1588v2.
1588v2 restores the frequency in the following modes:
Hop-by-hop
In hop-by-hop mode, all devices on a link are required to support 1588v2. The frequency
recovery in this mode is highly accurate. In the case of a small number of hops, the
frequency recovery accuracy can meet the requirement of ITU-T G.813 (stratum 3
standard).
End-to-end (Delay and jitter may occur on the transit network.)
In end-to-end mode, the forwarding devices do not need to support 1588v2, and the
delay of the forwarding path is only required to meet a specified level, for example, less
than 20 ms. The frequency recovery accuracy in this mode is low, and can meet only the
requirements of the G.8261 and base stations (50 ppb) rather than that of the stratum 3
clock standard.
To achieve high frequency recovery accuracy, 1588v2 requires Sync messages to be sent at a
high rate of at least 100 packets/s.
The NE20E meets the following clock standards:
G.813 and G.823 for external clock synchronization
G.813 and G.823/G.824 for E1 clocks
G.8261 and G.8262 for synchronous Ethernet clocks
G.8261 and G.823/G.824 for frequency recovery through 1588v2 messages
At present, the NE20E supports frequency recovery through 1588v2 messages in
hop-by-hop mode, but not in end-to-end mode or across networks with packet delay
variation (PDV). The NE20E is not committed to G.813 or G.8262 compliance for this feature.
Because a master clock serves multiple slave clocks, it is recommended that a BITS or IP
clock server be used as the master clock. Using an ordinary router as the master clock is not
recommended because its CPU may become overloaded.
As shown in Figure 1-122, clock servers and NodeBs exchange TOP-encapsulated 1588
messages over a QoS-enabled bearer network with the jitter being less than 20 ms.
Scenario description:
NodeBs only need frequency synchronization.
The bearer network does not support 1588v2 or frequency recovery in synchronous
Ethernet mode.
Solution description:
The bearer network is connected to a wireless IP clock server and adopts 1588v2 clock
synchronization and frequency recovery in E2E mode.
The clock server sends 1588v2 timing messages, which are transparently transmitted
over the bearer network to NodeBs. Upon receiving the timing messages, NodeBs
perform frequency recovery.
1588v2 timing messages need to be transparently transmitted by priority over the bearer
network; the E2E jitter on the bearer network must be less than 20 ms.
Advantage of the solution: Devices on the bearer network are not required to support
1588v2, and are therefore easily deployed.
Disadvantage of the solution: Only frequency synchronization rather than time
synchronization is performed. In practice, an E2E jitter of less than 20 ms is not ensured.
As shown in Figure 1-123, the clock source can send clock signals to NodeBs through the
1588v2 clock, WAN clock, synchronous Ethernet clock, or any combination of clocks.
Scenario description:
NodeBs only need frequency synchronization.
GE links on the bearer network support the 1588v2 clock rather than the synchronous
Ethernet clock.
Solution description:
The Synchronous Digital Hierarchy (SDH) or synchronous Ethernet clock sends stratum
3 clock signals through physical links. On the GE links that do not support the
synchronous Ethernet clock, stratum 3 clock signals are transmitted through 1588v2.
Advantage of the solution: The solution is simple and flexible.
Disadvantage of the solution: Only frequency synchronization rather than time
synchronization is performed.
Figure 1-124 Networking diagram of the bearer and wireless networks in the same clock domain
Scenario description:
NodeBs need to synchronize time with each other.
The bearer and wireless networks are in the same clock domain.
Solution description:
The core node supports GPS or BITS clock interfaces.
All nodes on the bearer network function as BC nodes, which support the link delay
measurement mechanism to handle fast link switching.
Links or devices that do not support 1588v2 can be connected to devices with GPS or
BITS clock interfaces to perform time synchronization.
Advantage of the solution: The time of all nodes is synchronous on the entire network.
Disadvantage of the solution: All nodes on the entire network must support 1588v2.
Figure 1-125 Networking diagram of the bearer and wireless networks in different clock domains
Scenario description:
NodeBs need to synchronize time with one another.
The bearer and wireless networks are in different clock domains.
Solution description:
Core nodes support GPS/BITS interfaces.
Network-wide time synchronization is achieved from the core node in T-BC mode. All
T-BC nodes support path delay measurement to adapt to fast link switching.
Network-wide synchronization can be traced to two grand masters.
The advantage of the solution is that the network-wide time is synchronized to ensure the
optimal tracing path.
The disadvantage of the solution is that all nodes on the network need to support 1588v2
and G.8275.1.
Terms

Synchronization: On a modern communications network, in most cases, the proper functioning
of telecommunications services requires network clock synchronization, meaning that the
frequency offset or time difference between devices must be kept in an acceptable range.
Network clock synchronization includes time synchronization and frequency synchronization.
− Time synchronization, also called phase synchronization, refers to the consistency of
both frequencies and phases between signals. This means that the phase offset between
signals is always 0.
− Frequency synchronization, also called clock synchronization, refers to a strict
relationship between signals based on a constant frequency offset or a constant phase
offset, in which signals are sent or received at the same average rate over any given
interval. In this manner, all devices on the communications network operate at the same
rate; that is, the phase difference between signals remains a fixed value.

IEEE 1588v2/PTP: IEEE 1588v2, defined by the Institute of Electrical and Electronics
Engineers (IEEE), is the standard entitled Precision Clock Synchronization Protocol for
Networked Measurement and Control Systems, called the Precision Time Protocol (PTP)
for short.

Clock domain: Logically, a physical network can be divided into multiple clock domains.
Each clock domain has a reference time, with which all devices in the domain are
synchronized. Different clock domains have their own reference time, independent of
each other.

Clock node: Each node on a time synchronization network is a clock. The 1588v2 protocol
defines three types of clocks: OC, BC, and TC.

Clock reference source: Clock reference source selection is a method of selecting reference
clocks based on the clock selection algorithm.

One-step mode: In one-step mode, Sync messages in Delay mode and PDelay_Resp messages
in PDelay mode are stamped with the time when the messages are sent.

Two-step mode: In two-step mode, Sync messages in Delay mode and PDelay_Resp messages
in PDelay mode only record the time when the messages are sent but carry no timestamps;
the timestamps are carried in subsequent messages, such as Follow_Up and
PDelay_Resp_Follow_Up messages.
Abbreviations
Abbreviation Full Spelling
1588v2 Precision Time Protocol
Definition
1588 Adaptive Time Recovery (ATR) is a PTP-based technology that allows routers to
establish clock links and implement time synchronization over a third-party network using
PTP packets in Layer 3 unicast mode.
1588 ATR is an advancement compared to 1588v2, the latter of which requires 1588v2
support on all network devices.
1588 ATR is a client/server protocol through which servers communicate with clients to
achieve time synchronization.
Purpose
1588v2 is a software-based technology used to achieve frequency and time synchronization
and can support hardware timestamping to provide greater accuracy. However, 1588v2
requires support from all devices on the live network.
To address this disadvantage, 1588 ATR is introduced to allow time synchronization over a
third-party network that includes 1588v2-incapable devices. On the live network, 1588v2 is
preferred for 1588v2-capable devices, and 1588 ATR is used when 1588v2-incapable devices
exist.
Benefits
This feature offers the following benefits to carriers:
Does not require 1588v2 to be supported by all network devices, reducing network
construction costs.
Fits for more network applications that meet time synchronization requirements.
Features Supported
The 1588 ATR features supported by NE20Es are as follows:
An NE20E that functions as a 1588 ATR server can synchronize time information with
upstream devices using the BITS source and transmit time information to downstream
devices.
An NE20E that functions as a 1588 ATR server can synchronize time information with
upstream devices using 1588v2/G.8275.1 and transmit time information to downstream
devices.
An NE20E can function only as the 1588 ATR server. The following restrictions apply to network
deployment:
When 1588 ATR is used to implement time synchronization over a third-party network, reduce the
packet delay variation (PDV) and the number of devices on the third-party network as much as
possible in order to ensure time synchronization performance on clients. For details, see
performance specifications for clients.
The server and client communicate with each other through PTP packets, which can be either
Layer 3 IP packets or single-VLAN-tagged packets. The PTP packets cannot carry two VLAN
tags or an MPLS label.
The interface used to send PTP packets on the server must support 1588v2.
1.4.16.2 Principles
1.4.16.2.1 Principles of 1588 ATR
1588 ATR is used to deliver time synchronization between clients and servers (routers).
After clock links are established through negotiation between clients and servers, 1588 ATR
uses PTP packets in Layer 3 unicast mode to obtain the clock difference between clients and
servers and then implement time synchronization based on the difference.
Synchronization Process
After negotiation is complete, 1588 ATR servers exchange PTP packets with clients to
implement time synchronization.
1588 ATR works in one-way or two-way mode.
One-way mode
a. The server sends PTP packets that carry timestamps t1 and t1' to the client.
b. The client receives PTP packets at timepoints t2 and t2'. Timestamps t1 and t1'
indicate the server-side clock information, and timestamps t2 and t2' indicate the
client-side clock information. The server-side and client-side timestamps are
compared to obtain the frequency offset between the server and client, which is
used for frequency synchronization. For example, if (t2-t1)/(t2'-t1') is 1, the
frequency on the server is the same as that on the client. Otherwise, the client
frequency needs to be synchronized with the server.
Two-way mode
a. The server sends a Sync packet carrying timestamp t1 to the client.
b. The client receives the Sync packet at timepoint t2.
c. The client sends a Delay_Req packet carrying timestamp t3 to the server.
d. The server receives the Delay_Req packet at timepoint t4, generates a Delay_Resp
packet carrying t4, and sends it to the client. The client then uses timestamps t1
through t4 to calculate its time offset from the server.
A 1588 ATR server supports both the one-way and two-way modes by default. A 1588 ATR
client supports either the one-way or two-way mode.
If bandwidth permits, the two-way mode is recommended for 1588 ATR deployment.
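The offset math behind the two modes can be sketched as follows (a minimal illustration; the timestamp values and function names are hypothetical, not part of the 1588 ATR specification):

```python
# One-way mode: the client compares server send times (t1, t1') with its
# own receive times (t2, t2'). If (t2 - t1) / (t2' - t1') equals 1, the
# server and client frequencies match.
def frequency_ratio(t1, t1p, t2, t2p):
    return (t2 - t1) / (t2p - t1p)

# Two-way mode: with all four timestamps known and a symmetric path
# delay assumed, the client's offset from the server is
# ((t2 - t1) - (t4 - t3)) / 2.
def time_offset(t1, t2, t3, t4):
    return ((t2 - t1) - (t4 - t3)) / 2

# Hypothetical timestamps, in seconds
print(frequency_ratio(0.0, 1.0, 5.0, 6.0))  # 1.0: frequencies match
print(time_offset(10.0, 12.0, 13.0, 11.0))  # 2.0: client is 2 s ahead
```

Note that the two-way estimate holds only when the forward and reverse path delays are symmetric, which is one reason low PDV matters for client accuracy.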
Duration Mechanism
A 1588 ATR client specifies the duration for which it requests Announce, Sync, and
Delay_Resp packets. The duration is carried in the TLV field of Signaling packets sent to
the server.
In normal situations, a client initiates renegotiation with a server before the duration expires
so that the server can continue providing synchronization to the client.
If a client goes Down, it cannot initiate renegotiation. After the duration expires, the
server no longer sends synchronization packets to the client.
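The duration mechanism behaves like a lease between client and server. The following toy model illustrates it (class and method names are hypothetical; real 1588 ATR carries the duration in a Signaling-packet TLV, as described above):

```python
class AtrServer:
    """Toy model of the duration (lease) mechanism: the server serves a
    client only while the negotiated duration has not expired."""

    def __init__(self):
        self.leases = {}  # client -> lease expiry time

    def negotiate(self, client, now, duration):
        # (Re)negotiation extends the lease by the requested duration.
        self.leases[client] = now + duration

    def should_sync(self, client, now):
        # Synchronization packets are sent only while the lease is alive.
        return self.leases.get(client, 0) > now

server = AtrServer()
server.negotiate("client-1", now=0, duration=300)
print(server.should_sync("client-1", now=100))  # True: lease alive
# A client that goes Down cannot renegotiate, so its lease lapses:
print(server.should_sync("client-1", now=400))  # False: lease expired
```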
Per-hop BC + Server
1588 ATR servers can synchronize time with upstream devices and send the time source
information to clients.
1.4.16.3 Applications
On the IP RAN shown in the following figure, time synchronization needs to be performed
between NodeBs, but the third-party network (such as a microwave or switch network) does
not support 1588v2. In this case, 1588 ATR can be configured to allow time synchronization
over the third-party network. Routers enabled with 1588 ATR can function as a BC to
synchronize time information with upstream devices and as a 1588 ATR server to synchronize
time information with NodeBs.
Terms

Synchronization: Most telecommunication services running on a modern communications
network require network-wide synchronization. Synchronization means that the frequency
offset or time difference between devices must remain in a specified range. Clock
synchronization is categorized as frequency synchronization or time synchronization.

Time synchronization: Time synchronization, also known as phase synchronization, refers to
the consistency of both frequencies and phases between signals. That is, the phase offset
between signals is always 0.

Frequency synchronization: Frequency synchronization, also known as clock synchronization,
refers to the strict relationship between signals based on a constant frequency or phase
offset, in which signals are sent or received at the same average rate over any given interval.
In this manner, all devices on the communications network operate at the same rate; that is,
the phase difference between signals remains a constant value.

IEEE 1588v2/PTP: A standard entitled Precision Clock Synchronization Protocol for
Networked Measurement and Control Systems, defined by the Institute of Electrical and
Electronics Engineers (IEEE). It is also called the Precision Time Protocol (PTP).
Background
As the commercialization of LTE-TDD and LTE-A accelerates, there is a growing need for
time synchronization on base stations. Traditionally, the GPS and PTP solutions were used on
base stations to implement time synchronization.
The GPS solution requires a GPS antenna to be deployed on each base station, leading to high
TCO. The PTP solution requires 1588v2 support on network-wide devices, resulting in huge
network reconstruction costs for carriers.
Furthermore, GPS antennas can properly receive data from GPS satellites only when they are
placed outdoors and meet installation angle requirements. For indoor deployment, long
feeders must penetrate walls, and site selection requires careful consideration because of
demanding lightning protection requirements. These disadvantages lead to high TCO and
make GPS antenna deployment challenging for indoor devices. Another weakness is that most
indoor equipment rooms are leased, which imposes strict requirements on coaxial cables
penetrating walls and involves complex application procedures. For example, for security
reasons, laws and regulations in Japan prohibit deploying radio frequency (RF) cables into
rooms through walls.
To address the preceding challenges, the Atom GPS timing system is introduced to NE20Es.
Specifically, an Atom GPS module which is comparable to a lightweight BITS device is
inserted to an NE20E to provide GPS access to the bearer network. Upon receipt of GPS
clock signals, the Atom GPS module converts them into SyncE signals and then sends the
SyncE signals to NE20Es. Upon receipt of GPS time signals, the Atom GPS module converts
them into 1588v2 signals and then sends the 1588v2 signals to base stations. This mechanism
greatly reduces the TCO for carriers.
Benefits
This feature offers the following benefits to carriers:
For newly created time synchronization networks, the Atom GPS timing system reduces
the deployment costs by 80% compared to traditional time synchronization solutions.
For the expanded time synchronization networks, the Atom GPS timing system can reuse
the legacy network to protect investment.
1.4.17.2 Principles
1.4.17.2.1 Modules
The Atom GPS timing system includes two types of modules: Atom GPS modules and
clock/time processing modules on routers.
1.4.17.3 Applications
On the network shown in the following figure, the Atom GPS timing feature is mainly used in
three synchronization solutions:
SyncE frequency synchronization + Atom GPS time synchronization
On networks that do not support time synchronization, this solution allows time
synchronization with an Atom GPS module inserted into a router.
Atom GPS frequency synchronization + 1588v2 time synchronization
On networks that do not support frequency synchronization, this solution allows
frequency synchronization with an Atom GPS module inserted into a router.
Atom GPS frequency synchronization + Atom GPS time synchronization
On networks that cannot be reconstructed, this solution allows time and frequency
synchronization with an Atom GPS module inserted into a router.
Terms

Synchronization: Most telecommunication services running on a modern communications
network require network-wide synchronization. Synchronization means that the frequency
offset or time difference between devices must remain in a specified range. Clock
synchronization is categorized as frequency synchronization or time synchronization.

Time synchronization: Time synchronization, also known as phase synchronization, refers to
the consistency of both frequencies and phases between signals. That is, the phase offset
between signals is always 0.

Frequency synchronization: Frequency synchronization, also known as clock synchronization,
refers to the strict relationship between signals based on a constant frequency or phase
offset, in which signals are sent or received at the same average rate over any given interval.
In this manner, all devices on the communications network operate at the same rate; that is,
the phase difference between signals remains a constant value.

IEEE 1588v2/PTP: A standard entitled Precision Clock Synchronization Protocol for
Networked Measurement and Control Systems, defined by the Institute of Electrical and
Electronics Engineers (IEEE). It is also called the Precision Time Protocol (PTP).
1.4.18 NTP
The Network Time Protocol (NTP) is supported only by a physical system (PS).
1.4.18.1 Introduction
Definition
The Network Time Protocol (NTP) is an application layer protocol in the TCP/IP protocol
suite. NTP synchronizes the time among a set of distributed time servers and clients. NTP is
built on the Internet Protocol (IP) and User Datagram Protocol (UDP). NTP messages are
transmitted over UDP, using port 123.
NTP evolved from the Time Protocol and the ICMP Timestamp message but is specifically
designed to maintain time accuracy and robustness.
Purpose
In the NTP model, a number of primary reference sources, synchronized to national standards
by wire or radio, are connected to widely accessible resources, such as backbone gateways.
These gateways act as primary time servers. The purpose of NTP is to convey timekeeping
information from these primary time servers to other time servers (secondary time servers).
Secondary time servers are synchronized to the primary time servers. The servers are
connected in a logical hierarchy called a synchronization subnet. Each level of the
synchronization subnet is called a stratum. For example, the primary time servers are stratum
1, and the secondary time servers are stratum 2. Servers with larger stratum numbers are more
likely to have less accurate clocks than those with smaller stratum numbers.
When multiple time servers exist on a network, a clock selection algorithm can be used to
select a server based on the stratums and time offsets of the time servers. This helps improve
local clock precision.
Implementation
Figure 1-132 illustrates the process of implementing NTP. Device A and Device B are
connected through a wide area network (WAN). They both have independent system clocks
that are synchronized through NTP.
In the following example:
Before Device A synchronizes its system clock to Device B, the clock of Device A is
10:00:00 am and the clock of Device B is 11:00:00 am.
Device B functions as an NTP server, and Device A must synchronize its clock signals
with Device B.
It takes 1 second to transmit an NTP packet between Device A and Device B.
It takes 1 second for Device A and Device B to process an NTP packet.
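Under these assumptions, the standard NTP offset and delay formulas give the following result (a worked sketch; clock readings are expressed as seconds since midnight for readability):

```python
def hms(h, m, s):
    """Convert a clock reading to seconds since midnight."""
    return h * 3600 + m * 60 + s

t1 = hms(10, 0, 0)  # Device A sends its request (A's clock)
t2 = hms(11, 0, 1)  # Device B receives it (B's clock)
t3 = hms(11, 0, 2)  # Device B sends its reply (B's clock)
t4 = hms(10, 0, 3)  # Device A receives the reply (A's clock)

# Standard NTP formulas:
offset = ((t2 - t1) + (t3 - t4)) / 2  # B's clock relative to A's
delay = (t4 - t1) - (t3 - t2)         # round-trip network delay

print(offset)  # 3600.0: Device B is 1 hour ahead of Device A
print(delay)   # 2: 1 s in each direction
```

Device A therefore advances its clock by the computed offset, so at the moment the reply arrives its clock reads 11:00:03 am, matching Device B.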
1.4.18.2 Principles
1.4.18.2.1 NTP Implementation Model
Using the NTP implementation model, a client creates the following processes with each peer:
Transmit process
Receive process
Update process
These processes share a database and are interconnected through a message-transfer system.
When the client has multiple peers, its database is divided into several parts, with each part
dedicated to a peer.
Figure 1-133 shows the NTP implementation model.
Transmit Process
The transmit process, controlled by each timer for peers, collects information in the database
and sends NTP messages to the peers.
Each NTP message contains a local timestamp marking when the message is sent or received
and other information necessary to determine a clock stratum and manage the association. The
rate at which messages are sent is determined by the precision required by the local clock and
its peers.
Receive Process
The receive process receives messages, including NTP messages and other protocol messages,
as well as information sent by directly connected radio clocks.
When receiving an NTP message, the receive process calculates the offset between the peer
and local clocks and incorporates it into the database along with other information that is
useful for locating errors and selecting peers.
Update Process
The update process handles the offset of each peer after receiving NTP response messages and
selects the most precise peer using a specific selection algorithm.
This process may involve either many observations of few peers or a few observations of
many peers, depending on the accuracy.
The functions of the primary and secondary time servers are as follows:
A primary time server is directly synchronized to a primary reference source, usually a
radio clock or global positioning system (GPS).
A secondary time server is synchronized to another secondary time server or a primary
time server. Secondary time servers use NTP to send time information to other hosts in a
Local Area Network (LAN).
When there is no fault, primary and secondary servers in the synchronization subnet assume a
hierarchical master-slave structure, with the primary servers at the root and secondary servers
at successive stratums toward the leaf nodes. The larger the stratum number, the less precise
the clock (stratum 1 being the most precise).
As the stratum increases from one, the clock sample accuracy gradually decreases, depending
on the network paths and local-clock stabilities. To prevent tedious calculations necessary to
estimate errors in each specific configuration, it is useful to calculate proportionate errors.
Proportionate errors are approximate and based on the delay and dispersion relative to the root
of the synchronization subnet.
This design helps the synchronization subnet in automatically reconfiguring the hierarchical
master-slave structure to produce the most accurate and reliable time, even when one or more
primary or secondary servers or the network paths in the subnet fail. If all primary servers fail,
one or more backup primary servers continue operations. If all primary servers over the
subnet fail, the remaining secondary servers then synchronize among themselves. In this case,
distances reach upwards to a pre-selected maximum "infinity".
Upon reaching the maximum distance to all paths, a server drops off the subnet and runs
freely based on its previously calculated time and frequency. Because these computations are
expected to be very precise, especially in terms of frequency, the timekeeping errors of a
device with a stabilized oscillator are no more than a few milliseconds per day.
In the case of multiple primary servers, a specific selection algorithm is used to select the
server at a minimum synchronization distance. When these servers are at approximately the
same synchronization distance, they may be selected randomly.
Random selection does not decrease accuracy when the offset between the primary
servers is less than the synchronization distance.
When the offset between the primary servers is greater than the synchronization distance,
filtering and selection algorithms are used to select the best available servers and discard
the others.
Peer Mode
In peer mode, the active and passive ends can be synchronized. The end with a lower stratum
(larger stratum number) is synchronized to the end with a higher stratum (smaller stratum
number).
Symmetric active: A host operating in this mode periodically sends messages regardless
of the reachability or stratum of its peer. The host announces its willingness to
synchronize and be synchronized by its peer.
The symmetric active end is a time server close to the leaf node in the synchronization
subnet. It has a low stratum (large stratum number). In this mode, time synchronization
is reliable. A peer is configured on the same stratum and two peers are configured on the
stratum one level higher (one stratum number smaller). In this case, synchronization poll
frequency is not important. Even when error packets are returned because of connection
failures, the local clocks are not significantly affected.
Symmetric passive: A host operating in this mode receives packets and responds to its
peer. The host announces its willingness to synchronize and be synchronized by its peer.
The prerequisites of being a symmetric passive host are as follows:
− The host receives messages from a peer operating in the symmetric active mode.
− The peer is reachable.
− The peer operates at a stratum lower than or equal to the host.
The host operating in the symmetric passive mode is at a low stratum in the synchronization
subnet. It does not need prior knowledge of its peers: a connection between peers is set up,
and status variables are updated, only when the symmetric passive end receives NTP
messages from a peer.
In NTP peer mode, the active end functions as a client and the passive end functions as a server.
Client/Server Mode
Client: A host operating in this mode periodically sends messages regardless of the
reachability or stratum of the server. The host synchronizes its clock with that on the
server but does not alter the clock on the server.
Server: A host operating in this mode receives packets and responds to the client. The
host provides synchronization information for all its clients but does not alter its own
clock.
A host operating in the client mode periodically sends NTP messages to a server during and
after its restart. The server does not need to retain state information when the client sends the
request. The client freely manages the interval for sending packets according to actual
conditions.
Kiss-o'-Death (KOD) packets provide useful information to a client and are used for status
reporting and access control. When KOD is enabled on the server, the server can send packets
with kiss codes DENY and RATE to the client.
After the client receives a packet with kiss code DENY, the client demobilizes any
associations with that server and stops sending packets to that server.
After the client receives a packet with kiss code RATE, the client immediately reduces
its polling interval to that of the server and continues to reduce it each time it receives a
RATE kiss code.
Broadcast Mode
A host operating in broadcast mode periodically sends clock-synchronization packets to
the broadcast IPv4 address regardless of the reachability or stratum of the clients. The
host provides synchronization information for all its clients but does not alter its own
clock.
A client listens to the broadcast packets sent by the server. When receiving the first
broadcast packet, the client temporarily starts in the client/server mode to exchange
packets with the server. This allows the client to estimate the network delay. The client
then reverts to the broadcast mode, continues to listen to the broadcast packets, and
re-synchronizes the local clock based on the received broadcast packets.
The broadcast mode is intended for high-speed LANs with many workstations where the
highest accuracy is not required. In a typical scenario, one or more time servers in a LAN
periodically send broadcast packets to the workstations. The LAN packet transmission delay
is only milliseconds.
If multiple time servers are available to enhance reliability, a clock selection algorithm is
useful.
Multicast Mode
A host operating in the multicast mode periodically sends clock-synchronization packets
to a multicast IPv4/IPv6 address. The host is usually a time server using high-speed
multicast media in a LAN. The host provides synchronization information for all its
peers but does not alter its own clock.
A client listens to multicast packets sent by the server. After receiving the first multicast
packet, the client temporarily starts in the client/server mode to exchange packets with
the server. This allows the client to estimate the network delay. The client then reverts to
the multicast mode, continues to listen to the multicast packets, and re-synchronizes the
local clock based on the received multicast packets.
Manycast Mode
A client operating in manycast mode sends periodic request packets to a designated IPv4
or IPv6 multicast address in order to search for a minimum number of associations. It
starts with a time to live (TTL) value of one and increments it by one until the minimum
number of associations is made or the TTL reaches a maximum value. If the TTL reaches its
maximum value and not enough associations have been mobilized, the client stops
transmission for a timeout period to clear all associations, and
then repeats the search process. If a minimum number of associations have been
mobilized, then the client starts transmitting one packet per timeout period to maintain
the associations.
A designated manycast server within range of the TTL field in the packet header listens
for packets with that address. If a server is suitable for synchronization, it returns an
ordinary server (mode 4) packet using the client's unicast address.
Manycast mode is applied to a small set of servers scattered over a network. Clients can
discover and synchronize to the closest manycast server. Manycast can especially be used
where the identity of the server is not fixed and a change of server does not require
reconfiguration of all the clients on the network.
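The expanding-TTL search that a manycast client performs can be sketched as follows (a simplified simulation; the server names and the mapping of TTL values to reachable servers are hypothetical):

```python
def manycast_search(servers_at_ttl, min_assoc=3, max_ttl=7):
    """Start with TTL 1 and widen the multicast scope one hop at a time
    until enough associations are mobilized or the TTL ceiling is hit."""
    associations = set()
    ttl = 1
    while ttl <= max_ttl:
        # Servers within this TTL scope answer with unicast mode-4 packets.
        associations |= servers_at_ttl.get(ttl, set())
        if len(associations) >= min_assoc:
            return associations, ttl
        ttl += 1
    # Not enough associations: the client would time out, clear all
    # associations, and repeat the search.
    return associations, max_ttl

found, ttl = manycast_search({1: {"s1"}, 2: {"s2"}, 3: {"s3", "s4"}})
print(sorted(found), ttl)  # ['s1', 's2', 's3', 's4'] 3
```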
NTP Operation
A host operating in an active mode (symmetric active, client or broadcast mode) must be
configured.
Its peer operating in a passive mode (symmetric passive or server mode) requires no
pre-configuration.
An error occurs when the host and its peer operate in the same mode. In such a case, one
ignores messages sent by the other, and their associations are then dissolved.
Transmit Process
In all modes (except the client mode with a broadcast server and the server mode), the
transmit process starts when the peer timer expires. In the client mode with a broadcast server,
messages are never sent. In the server mode, messages are sent only in response to received
messages. This process is also invoked by the receive process when the received NTP
message does not result in a local persistent association. To ensure a valid response, the
transmit timestamp must be added to packets to be sent. Therefore, the values of variables
carried in the response packet must be accurately saved.
Broadcast and multicast servers that are not synchronized will start the transmit process when
the peer timer expires.
Receive Process
The receive process starts when an NTP message arrives. First, it checks the mode field in the
packet. Value 0 indicates that the peer runs an earlier NTP version. If the version number in
the packet matches the current version, the receive process continues with the following steps.
If the version numbers do not match, the packet is discarded, and the association (if not
pre-configured) is dissolved. The receive process then varies according to the result of
combining the local and remote clock modes:
If both the local and remote hosts are operating in client mode, an error occurs, and the
packet is discarded.
If the result is recv, the packet is processed, and the association is marked reachable if
the received packet contains a valid header. In addition, if the received packet contains
valid data, the clock-update process is called to update the local clock. If the association
was not previously configured, it is dissolved.
If the result is xmit, the packet is processed, and an immediate response packet is sent.
The association is then dissolved if it is not pre-configured.
If the result is pkt, the packet is processed, and the association is marked reachable if the
received packet contains a valid header. In addition, if the received packet contains valid
data, the clock-update process is called to update the local clock. If the association was
not pre-configured, an immediate reply is sent, and the association is dissolved.
Packet Process
The packet process checks message validity, calculates delay/offset samples, and invokes
other processes to filter data and select a reference source. First, the transmit timestamp must
be different from the transmit timestamp in the last message. If the transmit timestamps are
the same, the message may be an outdated duplicate.
Second, the originate timestamp must match the last message sent to the same peer. If a
mismatch occurs, the message may be out of order, forged, or defective.
Lastly, the packet process uses a clock selection algorithm to select the best clock sample
from the specified clocks or clock groups at different stratums. The delay (peer delay), offset
(peer offset), and dispersion (peer dispersion) for the peer are all determined.
Clock-Update Process
After the offset, delay, and dispersion of the valid clock are determined by the clock-filter
process, the clock-selection process invokes the clock-update process. The result of the
clock-selection and clock-combining processes is the final clock correction value, which the
local-clock process uses to update the local clock. If no reference source is found after these
processes, the clock-update process performs no further operation.
The clock-selection process is then invoked. It uses two algorithms: intersection and clustering.
The intersection algorithm generates a list of candidate peers suitable to be the reference
source and calculates a confidence interval for each peer. It discards falsetickers using a
technique adopted from Marzullo and Owicki [MAR85].
The clustering algorithm orders the list of remaining candidates based on their stratums
and synchronization distances. It repeatedly discards outlier peers based on the
dispersion until only the most accurate, precise, and stable candidates remain.
If the offset, delay, and dispersion of the candidate peers are almost identical, the candidates
are combined to analyze the clock situation comprehensively, and the resulting parameters are
provided to the local end to update the local clock.
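The falseticker-discarding technique adopted from Marzullo and Owicki [MAR85] can be illustrated with a simplified interval-intersection sketch. This is an assumption-laden illustration of the idea only; the production NTP selection algorithm differs in detail.

```python
# Illustrative sketch of the interval-intersection idea from [MAR85].

def marzullo(intervals):
    """intervals: (lo, hi) confidence intervals, one per candidate peer.
    Returns (count, (lo, hi)): the sub-interval covered by the largest
    number of peers. Peers whose intervals miss it are falsetickers."""
    events = []
    for lo, hi in intervals:
        events.append((lo, -1))   # -1 marks an interval start
        events.append((hi, +1))   # +1 marks an interval end
    events.sort()                 # starts sort before ends at equal offsets
    best = cnt = 0
    best_range = None
    for i, (offset, kind) in enumerate(events):
        cnt -= kind               # start: cnt += 1; end: cnt -= 1
        if cnt > best:
            best = cnt
            best_range = (offset, events[i + 1][0])
    return best, best_range

# Peers at 10+/-2 and 12+/-1 agree on (11, 12); the peer at 14.5+/-0.5
# falls outside the majority interval and is discarded as a falseticker.
assert marzullo([(8, 12), (11, 13), (14, 15)]) == (2, (11, 12))
```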
Static Associations
Static associations are set up using commands.
Dynamic Associations
Dynamic associations are set up when an NTP packet is received by the client or peer.
Access Control
NTP is designed to withstand accidental or malicious data modification or destruction, so that
such problems typically do not cause timekeeping errors on other time servers in the
synchronization subnet. This design, however, relies on redundant time servers and diverse
network paths, and assumes that data modification or destruction does not occur
simultaneously on many time servers across the synchronization subnet. To reduce subnet
vulnerability, select trusted time servers and allow only them to serve as clock sources.
1.4.18.3 Applications
Applicable Environment
The synchronization of clocks over the network is increasingly important as the network
topology becomes increasingly complex. NTP was developed to implement the
synchronization of system clocks over the network.
Application Instances
As shown in Figure 1-137, the time server B in the LAN is synchronized to the time server A
on the Internet, and the hosts in the LAN are synchronized to the time server B in the LAN. In
this way, the hosts are synchronized to the time server on the Internet.
1.4.19 OPS
1.4.19.1 Overview
Definition
The Open Programmability System (OPS) is an open platform that provides Representational
State Transfer (RESTful) Application Programming Interfaces (APIs) to achieve
programmability, allowing third-party applications to run on the platform.
The OPS also supports embedded third-party applications and an event subscription
mechanism. Using the OPS, users can deploy supplementary functions that facilitate service
extension and intelligent management of devices, reducing their operation and maintenance
costs.
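In principle, a third-party application interacts with such RESTful APIs through standard HTTP operations. The following sketch shows the general shape of an authenticated RESTful GET request; the host, path, and credential handling are hypothetical placeholders, not documented OPS endpoints.

```python
# Hedged sketch of a generic RESTful API call (hypothetical endpoint,
# not a documented OPS interface).
import base64
import http.client

def ops_get(host, path, user, password):
    """Issue an authenticated HTTPS GET and return (status, body)."""
    conn = http.client.HTTPSConnection(host, 443, timeout=10)
    token = base64.b64encode(f'{user}:{password}'.encode()).decode()
    conn.request('GET', path, headers={
        'Authorization': 'Basic ' + token,  # only authorized users may call
        'Accept': 'application/xml',
    })
    resp = conn.getresponse()
    return resp.status, resp.read()

# usage (hypothetical URI):
# status, body = ops_get('device.example.com', '/restconf/data/system', 'user', '***')
```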
Purpose
Conventional network devices provide only limited functions and predefined services. As the
network develops, the static and inflexible service provisioning mode cannot meet the
requirements for diversified and differentiated services. Some customers require devices with
specific openness, on which they can develop their own functions and deploy proprietary
management policies to implement automatic management and maintenance, lowering
management costs.
To meet the preceding customer requirements, Huawei offers the OPS, an open platform with
programmability. The OPS allows users or third-party developers to develop and deploy
network management policies using open RESTful APIs. With this programmability, the OPS
implements rapid service expansion, automatic function deployment, and intelligent device
management, helping reduce network operation and maintenance costs and simplify network
operations.
Benefits
The OPS offers the following benefits:
Supports user-defined configurations and applications, implementing flexible service
deployment and simplifying network device management.
Uses various third-party programs, improving network utilization.
Allows users to develop private services.
Facilitates application deployment.
Security
The OPS provides the following security measures:
API security
Only authorized users can operate the OPS.
Operation security
Resources are isolated in modules in the OPS and their usage can be monitored.
Program security
Third-party resources are used to manage programs.
Important information security
OPS APIs use secure communication protocols to prevent information leakage during
transmission. In addition, users must ensure local security when operating on and
saving important information.
1.4.19.2 Principles
1.4.19.2.1 OPS Architecture
Figure 1-138 shows the OPS architecture. The OPS is developed based on the Huawei
proprietary Versatile Routing Platform (VRP) and allows customized applications to
interwork with the modules on the management, control, and resource planes on the VRP
through open application programming interfaces (APIs).
1.4.19.3 Application
1.4.19.3.1 Maintenance Assistant Applications
Figure 1-139 Device automatically collects health information and sends it to a TFTP/FTP server
Terms
Term Definition
OPS Open Programmability System. A system that
provides APIs for users or third-party developers
to program self-defined applications, which
facilitate service extension and automatic
management and maintenance and enable users
to improve the utilization of their network
resources.
API Application Programming Interface. An interface
that specifies how applications interact with each
other.
REST Representational State Transfer. An architectural
style defined by Roy Fielding in his doctoral
dissertation Architectural Styles and the
Design of Network-based Software
Architectures (2000). The REST standard
defines:
Addressability: Unlike OOP objects, each
resource in REST has its unique
corresponding URI.
Interface uniformity: Unlike SOAP, REST
requires that all request methods be mapped
to HTTP methods (GET, PUT, DELETE, and
POST). Therefore, no service description
languages, such as WSDL, are required.
Statelessness: No client-specific information
is ever stored on the server. This makes it
much easier to horizontally scale applications
and makes the server more reliable.
Representational: The customers
communicate with the representations of
resources instead of the resources. A resource
can have multiple representations. Any
customer that is allocated the representation
of a resource has sufficient information for
processing underlying resources.
Connectedness: Any REST-based system can
predict the resources that customers need to
access and return the representations carrying
these resources. For example, the system can
return the RESTful operations in a hyperlink
to customers.
1.4.20 SAID
1.4.20.1 Introduction
Definition
System of active immunization and diagnosis (SAID) is an intelligent fault diagnosis system
that automatically diagnoses and rectifies severe device or service faults by simulating human
operations in troubleshooting.
Purpose
A network is prone to severe problems if it fails to recover from a service interruption. At
present, device reliability is implemented through various detection functions. Once a device
fault occurs, the device reports an alarm or requires a reset for fault recovery. However, this
mechanism is intended for fault detection of a single module. When a service interruption
occurs, the network may fail to promptly recover from the fault, adversely affecting services.
In addition, after receiving a reported fault, maintenance engineers may face a difficulty in
collecting fault information, preventing problem locating and adversely affecting device
maintenance.
The SAID is introduced to address the preceding issues. The SAID achieves automated device
fault diagnosis, fault information collection, and service recovery, comprehensively
improving the self-healing capability and maintainability of devices.
Benefits
The SAID can automatically detect, diagnose, and rectify device faults, greatly improving
network maintainability and reducing maintenance costs.
1.4.20.2 Principles
1.4.20.2.1 Basic SAID Functions
Basic Concepts
SAID node: detects, diagnoses, and rectifies faults on a device's modules in the SAID.
SAID nodes are classified into the following types:
− Module-level SAID node: defends against, detects, diagnoses, and rectifies faults on
a module.
− SAID-level SAID node: detects, diagnoses, and rectifies faults on multiple modules.
SAID node state machine: state triggered when a SAID node detects, diagnoses, and
rectifies faults. A SAID node involves seven states: initial, detecting, diagnosing,
invalid-diagnose, recovering, judging, and service exception states.
SAID tracing: The SAID collects and stores information generated when a SAID node
detects, diagnoses, and rectifies faults. The information can be used to locate the root
cause of a fault.
SAID
Fault locating in the SAID involves the fault detection, diagnosis, and recovery phases. The
SAID has multiple SAID nodes. Each time valid diagnosis is triggered (that is, the recovery
process has been triggered), the SAID records the diagnosis process information for fault
tracing. The SAID's main processes are described as follows:
1. Defense startup phase: After the system runs, it instructs modules to deploy fault defense
(for example, periodic logic re-loading and entry synchronization), starting the entire
device's fault defense.
2. Detection phase: A SAID node detects faults and finds prerequisites for problem
occurrence. Fault detection is classified as periodic detection (for example, periodic
traffic decrease detection) or triggered detection (for example, IS-IS Down detection).
3. Diagnosis phase: Once a SAID node detects a fault, the SAID node diagnoses the fault
and collects various fault entries to locate fault causes (only causes based on which
recovery measures can be taken need to be located).
4. Recovery phase: After recording information, the SAID node starts to rectify the fault by
level. After the recovery action is completed at each level, the SAID node determines
whether services recover (by determining whether the fault symptom disappears). If the
fault persists, the SAID node continues to perform the recovery action at the next level
until the fault is rectified. The recovery action is gradually performed from a lightweight
level to a heavyweight level.
5. Tracing phase: If the SAID determines the fault and its cause, this fault diagnosis is a
valid diagnosis. The SAID then records the diagnosis process. After entering the
recovery phase, the SAID records the recovery process for subsequent analysis.
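The escalating recovery logic of step 4 can be sketched as follows. The action names here are illustrative only; the predicate stands in for the check that the fault symptom has disappeared.

```python
# Sketch of the light-to-heavyweight recovery escalation (illustrative).

def recover(fault_cleared, actions):
    """actions: recovery callables ordered from lightweight to heavyweight.
    fault_cleared: predicate checked after each action (service recovery test).
    Returns the name of the action that cleared the fault, or None."""
    for action in actions:
        action()                      # perform the recovery action at this level
        if fault_cleared():
            return action.__name__    # fault symptom gone: stop escalating
    return None                       # service exception: keep checking periodically

# Toy usage: the lightweight subcard reset fails, the board reset clears the fault.
state = {'fault': True}
def reset_subcard(): pass                     # lightweight, no effect here
def reset_board():  state['fault'] = False    # heavyweight, clears the fault
assert recover(lambda: not state['fault'], [reset_subcard, reset_board]) == 'reset_board'
```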
13. If the service does not recover in the judging state and a secondary recovery action exists,
the SAID node enters the recovering state.
14. If the service does not recover in the judging state and no secondary recovery action
exists, the SAID node enters the service exception state.
15. In the service exception state, the SAID node periodically checks whether the service
recovers.
16. If the service recovers in the judging state, the SAID node enters the initial state.
Fault Cause
The failure to ping through a directly connected device often occurs on the network, causing
services to be interrupted for a long time and fail to automatically recover. The ping process
involves various IP forwarding phases. A ping failure may be caused by a hardware entry error,
board fault, or subcard fault on the local device or a fault on an intermediate device or the
peer device. Therefore, it is difficult to locate or demarcate the specific fault.
Definition
The ping service node is a specific SAID service node. This node performs link-heartbeat
loopback detection to detect service faults, diagnoses each ping forwarding phase to locate or
demarcate the fault, and takes corresponding recovery actions.
Principles
For details about the SAID framework and principles, see 1.4.20 SAID. The ping service node
undergoes four phases (fault detection, fault diagnosis, fault recovery, and service recovery
determination) to implement automatic device diagnosis, fault information collection, and
service recovery.
Fault detection
The ping service node performs link-heartbeat loopback detection to detect service faults.
Link-heartbeat loopback detection is classified as packet modification detection or
packet loss detection.
− Packet modification detection is to check whether the content of received heartbeat
packets is the same as the content of sent heartbeat packets.
− Packet loss detection is to check whether the difference between the number of
received heartbeat packets and the number of sent heartbeat packets is within the
permitted range.
After detecting packet modification or loss, the SAID triggers a message and sends it to
instruct the ping service node to diagnose the fault.
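The two link-heartbeat checks can be sketched as follows; the data structures and threshold parameter are illustrative, not the device's internal interfaces.

```python
# Sketch of link-heartbeat loopback detection (illustrative).

def detect(sent_pkts, recv_pkts, max_lost):
    """Classify a link-heartbeat loopback round.

    sent_pkts / recv_pkts: payloads of sent and looped-back heartbeat packets.
    max_lost: permitted difference between sent and received packet counts."""
    # Packet loss detection: compare counters against the permitted range.
    if len(sent_pkts) - len(recv_pkts) > max_lost:
        return 'packet_loss'
    # Packet modification detection: looped-back content must match what was sent.
    for got, want in zip(recv_pkts, sent_pkts):
        if got != want:
            return 'packet_modification'
    return 'ok'   # no trigger; the node stays in the detection phase

assert detect([b'hb1', b'hb2'], [b'hb1', b'hb2'], max_lost=0) == 'ok'
assert detect([b'hb1', b'hb2'], [b'hb1'], max_lost=0) == 'packet_loss'
assert detect([b'hb1'], [b'hbX'], max_lost=1) == 'packet_modification'
```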
Fault diagnosis
After receiving the triggered message in the fault detection state, the ping service node
enters the diagnosis state for fault diagnosis.
In the fault diagnosis state, the ping service node performs interface loopback detection
to determine whether the local device is faulty. If yes, the node enters the fault recovery
state. If not, the node generates an alarm (only a packet modification alarm) and returns
to the fault detection state.
Fault recovery
If a loopback detection fault occurs, the ping service node determines whether a counting
error occurs on the associated subcard.
− If a counting error occurs, the ping service node resets the subcard for service
recovery. Then, the node enters the service recovery determination state and
performs link-heartbeat loopback detection to determine whether services recover.
If services recover, the node returns to the fault detection state. If services do not
recover, the node returns to the fault recovery state and takes a secondary recovery
action. (For a subcard reset, the secondary recovery action is board reset.)
− If no counting error occurs, the ping service node resets the involved board for
service recovery. After the board starts, the node enters the service recovery
determination state and performs link-heartbeat loopback detection to determine
whether services recover. If services recover, the node returns to the fault detection
state. If services do not recover, the node remains in the service recovery
determination state and periodically performs link-heartbeat loopback detection
until services recover.
Service recovery determination
After fault recovery is complete, the ping service node uses the fault packet template to
send diagnostic packets. If a fault still exists, the node generates an alarm. If no fault
exists, the node instructs the link heartbeat to return to the initial state, and the node
itself returns to the fault detection state.
Terms
None.
Abbreviation
Abbreviation Full Spelling
SAID System of Active Immunization and Diagnosis
1.4.21 KPI
1.4.21.1 Introduction
Definition
Key performance indicators (KPIs) indicate the performance of a running device at a specific
time. A KPI may be obtained by aggregating multiple levels of KPIs. The KPI data collected
by the master MPU and LPUs is saved as an xxx.dat file and stored on the CF card of the
master MPU. The KPI parsing tool parses the file according to a predefined parsing format
and converts it into an Excel file. The Excel file provides relevant fault and service
impairment information, facilitating fault locating.
Purpose
The KPI system records key device KPIs in real time, provides service impairment
information (for example, the fault generation time, service impairment scope/type, relevant
operation, and possible fault cause/location), and supports fast fault locating.
Benefits
The KPI system helps carriers quickly learn service impairment information and locate faults,
so that they can effectively improve network maintainability and reduce maintenance costs.
1.4.21.2 Principles
KPI System
Key performance indicators (KPIs) are periodically collected at a specified time, which
slightly increases memory and CPU usage. If a large number of KPIs are collected, however,
services may be seriously affected. Therefore, when memory or CPU usage exceeds 70%, the
system collects KPIs of only the CP-CAR traffic, message-queue CurLen, Memory Usage,
and CPU Usage objects, which do not increase memory or CPU usage.
The KPI system checks whether the receiving buffer area has data every 30 minutes. If the
receiving buffer area has data, the system writes the data into a data file and checks whether
the data file size is greater than or equal to 4 MB. If it is, the system compresses the file
into a package named in the yyyy-mm-dd.hh-mm-ss.dat.zip format. After the compression is
complete, the system deletes the data file.
The KPI system obtains information about the size of the remaining CF card space each time
a file is generated.
If the remaining CF card space is less than or equal to 50 MB, the KPI system deletes the
oldest packages compressed from data files.
If the remaining CF card space is greater than 50 MB, the KPI system obtains data files
from the cfcard2:/KPISTAT path and computes the total space used by all the
packages compressed from data files. If the space usage is greater than or equal to 110
MB, the KPI system deletes the oldest packages.
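The housekeeping rules above (4 MB compression threshold, 50 MB minimum free CF card space, 110 MB cap on package space) can be sketched as a decision function. The helper is illustrative; it is not the device's real interface.

```python
# Sketch of the KPI file housekeeping rules described above (illustrative).

COMPRESS_THRESHOLD = 4 * 1024 * 1024      # 4 MB data file triggers compression
MIN_FREE_SPACE     = 50 * 1024 * 1024     # 50 MB minimum remaining CF card space
MAX_PACKAGE_SPACE  = 110 * 1024 * 1024    # 110 MB cap on total package space

def housekeeping(data_file_size, free_space, package_total):
    """Return the list of actions the KPI system would take."""
    actions = []
    if data_file_size >= COMPRESS_THRESHOLD:
        # Compress as yyyy-mm-dd.hh-mm-ss.dat.zip, then delete the data file.
        actions.append('compress_and_delete_data_file')
    if free_space <= MIN_FREE_SPACE:
        actions.append('delete_oldest_packages')
    elif package_total >= MAX_PACKAGE_SPACE:
        actions.append('delete_oldest_packages')
    return actions

assert housekeeping(5 * 1024**2, 200 * 1024**2, 10 * 1024**2) == ['compress_and_delete_data_file']
assert housekeeping(0, 40 * 1024**2, 0) == ['delete_oldest_packages']
```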
1. The KPI system provides a registration mechanism for service modules. After the
modules register, the system collects service data at the specific collection time through
periodic collection and storage interfaces.
2. When the collection period of a service module expires, the KPI system invokes the
module to collect data. The module converts the collected data into a desired KPI packet
format and saves the data on the MPU through the interface provided by the KPI system.
3. The KPI parsing tool parses the file based on a predefined format and converts the file
into an Excel one.
KPI Categories
KPIs are categorized as access service, traffic monitoring, system, unexpected packet loss,
resource, or value-added service KPIs. The monitoring period can be 1, 5, 10, 15, or 30
minutes. At present, chips (for example, NP and TM chips), services (for example, QoS), and
boards (for example, MPUs, LPUs, and subcards) support KPI collection.
Table 1-35 provides KPI examples.
Each example record in the table contains the device name, loopback IP address, file type,
collection timestamp, software version (V800R009C10SPC200), date, chassis/slot/module
location, KPI object and name (for example, CPU Usage or Memory Usage of the System
object), KPI type (Total), collection interval (30 minutes), record mode (Always), threshold
(NA), and KPI value with its unit (for example, 60% CPU usage and 16% memory usage).
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
Network planning engineers
Commissioning engineers
Data configuration engineers
System maintenance engineers
Security Declaration
Encryption algorithm declaration
The encryption algorithms DES/3DES/SKIPJACK/RC2/RSA (RSA-1024 or
lower)/MD2/MD4/MD5 (in digital signature scenarios and password encryption)/SHA1
(in digital signature scenarios) have low security and may bring security risks. If the
protocols allow, using more secure encryption algorithms, such as AES/RSA
(RSA-2048 or higher)/SHA2/HMAC-SHA2, is recommended.
Password configuration declaration
− Do not set both the start and end characters of a password to "%^%#". This causes
the password to be displayed directly in the configuration file.
− To further improve device security, periodically change the password.
Personal data declaration
Your purchased products, services, or features may use some of users' personal data during
service operation or fault locating. You must define user privacy policies in compliance
with local laws and take proper measures to fully protect personal data.
Feature declaration
− The NetStream feature may be used to analyze the communication information of
terminal customers for network traffic statistics and management purposes. Before
enabling the NetStream feature, ensure that it is performed within the boundaries
permitted by applicable laws and regulations.
Special Declaration
This document serves only as a guide. The content is written based on device
information gathered under lab conditions. The content provided by this document is
intended to be taken as general guidance, and does not cover all scenarios. The content
provided by this document may be different from the information on user device
interfaces due to factors such as version upgrades and differences in device models,
board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are
beyond the scope of this document.
The maximum values provided in this document are obtained in specific lab
environments (for example, only a certain type of board or protocol is configured on a
tested device). The actually obtained maximum values may be different from the
maximum values provided in this document due to factors such as differences in
hardware configurations and carried services.
Interface numbers used in this document are examples. Use the existing interface
numbers on devices for configuration.
The pictures of hardware in this document are for reference only.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
Indicates an imminently hazardous situation which, if not
avoided, will result in death or serious injury.
Symbol Description
Indicates a potentially hazardous situation which, if not
avoided, may result in minor or moderate injury.
Change History
Updates between document issues are cumulative. Therefore, the latest document issue
contains all updates made in previous issues.
Changes in Issue 03 (2017-09-20)
This issue is the third official release. The software version of this issue is
V800R009C10SPC200.
Changes in Issue 02 (2017-07-30)
This issue is the second official release. The software version of this issue is
V800R009C10SPC100.
Changes in Issue 01 (2017-05-30)
This issue is the first official release. The software version of this issue is
V800R009C10.
1.5.2 BFD
1.5.2.1 Introduction
Definition
Bidirectional Forwarding Detection (BFD) is a fault detection protocol that can quickly
determine a communication failure between devices and notify upper-layer applications.
Purpose
To minimize the impact of device faults on services and improve network availability, a
network device must be able to quickly detect faults in communication with adjacent devices.
Measures can then be taken to promptly rectify the faults to ensure service continuity.
On a live network, link faults can be detected using either of the following mechanisms:
Hardware detection: For example, the Synchronous Digital Hierarchy (SDH) alarm
function can be used to quickly detect link faults.
Hello detection: If hardware detection is unavailable, Hello detection can be used to
detect link faults.
However, the two mechanisms have the following issues:
Only certain media support hardware detection.
Hello detection takes more than 1 second to detect a fault. When traffic is transmitted at
gigabit rates, such slow detection causes packet loss.
On a Layer 3 network, the Hello packet detection mechanism cannot detect faults for all
routes, such as static routes.
BFD resolves these issues by providing:
A low-overhead, short-duration method to detect faults on the path between adjacent
forwarding engines. The faults can be interface, data link, and even forwarding engine
faults.
A single, unified mechanism to monitor any media and protocol layers in real time.
Benefits
BFD offers the following benefits:
Improved network performance and reliability
Improved user experience
1.5.2.2 Principles
1.5.2.2.1 Basic Concepts
Bidirectional Forwarding Detection (BFD) detects communication faults between forwarding
engines. Specifically, BFD checks the continuity of a data protocol on the path between
systems. The path can be a physical or logical link or a tunnel.
BFD interacts with upper-layer applications in the following manner:
An upper-layer application provides BFD with parameters, such as the detection address
and time.
BFD creates, deletes, or modifies sessions based on these parameters and notifies the
upper-layer application of the session status.
BFD has the following characteristics:
Provides a low-overhead, short-duration method to detect faults on the path between
adjacent forwarding engines.
Provides a single, unified mechanism to monitor any media and protocol layers in real
time.
The following sections describe the basic principles of BFD, including the BFD detection
mechanism, detected link types, session establishment modes, and session management.
Static mode: BFD session parameters, such as the local and remote discriminators, are
manually configured and delivered for BFD session establishment.
NOTE
In static mode, configure unique local and remote discriminators for each
BFD session. This mode prevents incorrect discriminators from affecting
BFD sessions that have correct discriminators and prevents BFD sessions
from alternating between Up and Down.
Figure 1-142 shows the status change process of the state machine during the establishment of
a BFD session.
1. BFD configured on both Device A and Device B independently starts state machines.
The initial status of BFD state machines is Down. Device A and Device B send BFD
control packets with the State field set to Down. If BFD sessions are established in static
mode, the value of Your Discriminator in BFD control packets is manually specified. If
BFD sessions are established in dynamic mode, the value of Your Discriminator is set to
0.
2. After receiving a BFD control packet with the State field set to Down, Device B switches
the session status to Init and sends a BFD control packet with the State field set to Init.
After the local BFD session status of Device B changes to Init, Device B no longer processes the
received BFD control packets with the State field set to Down.
3. The BFD session status change of Device A is the same as that of Device B.
4. After receiving a BFD control packet with the State field set to Init, Device B changes
the local session status to Up.
5. The BFD session status change of Device A is the same as that of Device B.
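The Down-Init-Up handshake in steps 1 through 5 can be sketched as a minimal state machine. States and transitions follow the text above; this is an illustration, not device code.

```python
# Minimal sketch of the BFD session state machine handshake (illustrative).

class BfdSession:
    def __init__(self):
        self.state = 'Down'          # initial state of the state machine

    def receive(self, remote_state):
        """Process the State field of a received BFD control packet."""
        if self.state == 'Down' and remote_state == 'Down':
            self.state = 'Init'      # step 2: answer a Down packet with Init
        elif self.state == 'Down' and remote_state == 'Init':
            self.state = 'Up'
        elif self.state == 'Init' and remote_state == 'Init':
            self.state = 'Up'        # step 4: Init received, session goes Up
        elif self.state == 'Init' and remote_state == 'Down':
            pass                     # per the note: Down packets are ignored in Init
        return self.state

# Both ends start independently, exchange Down then Init, and come Up.
a, b = BfdSession(), BfdSession()
a.receive('Down')    # A receives B's Down packet -> Init
b.receive('Down')    # B receives A's Down packet -> Init
a.receive('Init')    # A receives B's Init packet -> Up
b.receive('Init')    # B receives A's Init packet -> Up
assert a.state == b.state == 'Up'
```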
Typical application 2:
As shown in Figure 1-144, BFD monitors the multi-hop IPv4 path between Device A and
Device C, and BFD sessions are bound only to peer IP addresses.
Typical application 4:
As shown in Figure 1-146, BFD monitors the multi-hop IPv6 path between Device A and
Device C, and BFD sessions are bound only to peer IP addresses.
In BFD for IP scenarios, BFD for PST is configured on a device. If a link fault occurs, BFD
detects the fault and triggers the PST to go Down. If the device restarts and the link fault
persists, BFD is in the AdminDown state and does not notify the PST of BFD Down. As a
result, the PST is not triggered to go Down and the interface bound to BFD is still Up.
to the BFD module. In this manner, the BFD module detects that the link is normal. If
multicast BFD packets are sent over a trunk member link, they are delivered to the data link
layer for link continuity check. The remote IP address used in a multicast BFD session is the
default known multicast IP address (224.0.0.107 to 224.0.0.250). Any packet with the default
known multicast IP address is sent to the BFD module for IP forwarding.
Usage Scenario
As shown in Figure 1-147, multicast BFD is configured on both Device A and Device B. BFD
sessions are bound to the outbound interface If1, and the default multicast address is used.
After the configuration is complete, multicast BFD quickly checks the continuity of the link
between interfaces.
Usage Scenario
In Figure 1-148, a BFD session is established between Device A and Device B, and the
default multicast address is used to check the continuity of the single-hop link connected to
the interface If1. After BFD for PIS is configured and BFD detects a link fault, BFD
immediately sends a message indicating the Down state to the associated interface. The
interface then enters the BFD Down state.
On the network shown in Figure 1-149, a BFD for link-bundle session consists of one main
session and multiple sub-sessions.
Each sub-session independently monitors an Eth-Trunk member interface and reports the
monitoring results to the main session. Each sub-session uses the same monitoring
parameters as the main session.
The main session creates a BFD sub-session for each Eth-Trunk member interface,
summarizes the sub-session monitoring results, and determines the status of the
Eth-Trunk.
− The main session is Up so long as a sub-session is Up.
− If no sub-session is available, the main session goes Down and the Unknown state
is reported to applications. The status of the Eth-Trunk port is not changed.
− If the Eth-Trunk has only one member interface and the corresponding sub-session
is Up, the main session goes Down when the member interface exits the Eth-Trunk.
The status of the Eth-Trunk is Up.
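The aggregation rules above can be sketched as a summary function: the main session is Up as long as any sub-session is Up, and Unknown is reported when no sub-session is available. This is an illustrative sketch, not the device implementation.

```python
# Sketch of how the main session summarizes sub-session monitoring results.

def main_session_state(sub_states):
    """sub_states: per-member sub-session states ('Up'/'Down').
    Returns the state reported to applications."""
    if not sub_states:
        return 'Unknown'      # no sub-session available: report Unknown
    if any(s == 'Up' for s in sub_states):
        return 'Up'           # main session is Up so long as one sub-session is Up
    return 'Down'

assert main_session_state(['Down', 'Up', 'Down']) == 'Up'
assert main_session_state(['Down', 'Down']) == 'Down'
assert main_session_state([]) == 'Unknown'
```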
The main session's local discriminator is allocated from the range from 0x00100000 to
0x00103fff without occupying the original BFD session discriminator range. The main
session does not learn the remote discriminator because it does not send or receive packets. A
sub-session's local discriminator is allocated from the original dynamic BFD session
discriminator range using the same algorithm as a dynamic BFD session.
Only sub-sessions consume BFD session resources per board. A sub-session must select the
board on which the physical member interface bound to this sub-session resides as a state
machine board. If no BFD session resources are available on the board, board selection fails.
In this situation, the sub-session's status is not used to determine the main session's status.
The process of establishing a passive BFD echo session as shown in Figure 1-150 is as
follows:
1. Device B functions as a BFD session initiator and sends an asynchronous BFD packet to
Device A. The Required Min Echo RX Interval field carried in the packet is a nonzero
value, which specifies that Device A must support BFD echo.
2. After receiving the packet, Device A finds that the value of the Required Min Echo RX
Interval field carried in the packet is a nonzero value. If Device A has passive BFD echo
enabled, it checks whether any ACL that restricts passive BFD echo is referenced. If an
ACL is referenced, only BFD sessions that match specific ACL rules can enter the
asynchronous echo mode. If no ACL is referenced, BFD sessions immediately enter the
asynchronous echo mode.
3. Device B periodically sends BFD echo packets, and Device A sends BFD echo packets
(the source and destination IP addresses are the local IP address, and the destination
physical address is Device B's physical address) at the interval specified by the Required
Min RX Interval field. Both Device A and Device B start a receive timer, with a receive
interval that is the same as the interval at which they each send BFD echo packets.
4. After Device A and Device B receive BFD echo packets from each other, they
immediately loop back the packets at the forwarding layer. Device A and Device B also
send asynchronous BFD packets to each other at an interval that is much less than that
for sending echo packets.
Similarities and Differences Between Passive BFD Echo and One-Arm BFD Echo
To ensure that passive BFD echo or one-arm BFD echo can take effect, disable strict URPF
on devices that send BFD echo packets.
Strict URPF prevents attacks that use spoofed source IP addresses. If strict URPF is enabled
on a device, the device obtains the source IP address and inbound interface of a packet and
searches the forwarding table for an entry with the destination IP address set to the source IP
address of the packet. The device then checks whether the outbound interface for the entry
matches the inbound interface. If they do not match, the device considers the source IP
address invalid and discards the packet. After a device enabled with strict URPF receives a
BFD echo packet that is looped back, it checks the source IP address of the packet. As the
source IP address of the echo packet is a local IP address of the device, the packet is sent to
the platform without being forwarded at the lower layer. As a result, the device considers the
packet invalid and discards it.
Table 1-42 Differences between BFD echo sessions and common static single-hop sessions

Common static single-hop session:
− Supported IP: IPv4 and IPv6
− Session type: Static single-hop session
− Discriminator: Both the local discriminator (MD) and remote discriminator (YD) must be configured.
− Negotiation prerequisite: A matching session must be established on the peer.
− IP header: The source and destination IP addresses are different.

Passive BFD echo session:
− Supported IP: IPv4 and IPv6
− Session type: Dynamic single-hop session
− Discriminator: Neither MD nor YD needs to be configured.
− Negotiation prerequisite: A matching session must be established, and echo must be enabled on the peer.
− IP header: Both the source and destination IP addresses are a local IP address of the device.

One-arm BFD echo session:
− Supported IP: IPv4
− Session type: Static single-hop session
− Discriminator: Only MD needs to be configured (MD and YD are the same).
− Negotiation prerequisite: A matching session does not need to be established on the peer.
− IP header: Both the source and destination IP addresses are a local IP address of the device.
− If a single-hop BFD session is established and the session is bound to a board that is
BFD-incapable in hardware but BFD-capable in software, the BFD session can be
processed by this board.
Integrated mode
If single-hop BFD sessions are established and the sessions are bound to boards that are
BFD-incapable in hardware but BFD-capable in software, the sessions will be distributed
to the two load-balancing integrated boards. The load-balancing integrated board with
more available BFD resources will be preferentially selected.
Boards that are BFD-incapable in hardware but BFD-capable in software are selected under the following
conditions:
If boards that are BFD-incapable in hardware but BFD-capable in software are already selected and the
integrated mode is configured, sessions will enter the AdminDown state and then be bound to an
integrated board.
Table 1-43 describes the board selection rules for BFD sessions.
the event. As a result, the upper-layer protocol frequently flaps. BFD dampening prevents link
flapping detected by BFD from causing the frequent flapping of the upper-layer protocol.
BFD dampening enables the BFD session's next negotiation to be delayed if the number of
times that a BFD session flaps reaches a threshold. However, IGP and MPLS negotiation is
not affected. Specifically, if a BFD session that is always flapping goes Down, its next
negotiation is delayed, reducing the number of times that the BFD session flaps.
1.5.2.3 Applications
1.5.2.3.1 BFD for Static Routes
Different from dynamic routing protocols, static routes do not have a detection mechanism. If
a fault occurs on a network, an administrator must manually address it. Bidirectional
Forwarding Detection (BFD) for static routes is introduced to associate a static route with a
BFD session so that the BFD session can detect the status of the link that the static route
passes through.
After BFD for static routes is configured, each static route can be associated with a BFD
session. In addition to route selection rules, whether a static route can be selected as the
optimal route is subject to BFD session status.
If a BFD session associated with a static route detects a link failure when the BFD
session is Down, the BFD session reports the link failure to the system. The system then
deletes the static route from the IP routing table.
If a BFD session associated with a static route detects that a faulty link recovers when
the BFD session is Up, the BFD session reports the fault recovery to the system. The
system then adds the static route to the IP routing table again.
By default, a static route can still be selected even though the BFD session associated
with it is AdminDown (triggered by the shutdown command run either locally or
remotely). If a device is restarted, the BFD session needs to be re-negotiated. In this case,
whether the static route associated with the BFD session can be selected as the optimal
route is subject to the re-negotiated BFD session status.
BFD for static routes has two detection modes:
Single-hop detection
In single-hop detection mode, the configured outbound interface and next hop address
are the information about the directly connected next hop. The outbound interface
associated with the BFD session is the outbound interface of the static route, and the peer
address is the next hop address of the static route.
Multi-hop detection
In multi-hop detection mode, only the next hop address is configured. Therefore, the
static route must be iterated to the directly connected next hop and outbound interface.
The peer address of the BFD session is the original next hop address of the static route,
and the outbound interface is not specified. In most cases, the original next hop to be
iterated is an indirect next hop. Multi-hop detection is performed on the static routes that
support route iteration.
For details about BFD, see the HUAWEI NE20E-S2 Universal Service Router Feature Description -
Reliability.
Background
Routing Information Protocol (RIP)-capable devices monitor the neighbor status by
exchanging Update packets periodically. During the time it takes local devices to detect link failures,
carriers or users may lose a large number of packets. Bidirectional forwarding detection (BFD)
for RIP can speed up fault detection and route convergence, which improves network
reliability.
After BFD for RIP is configured on the router, BFD can detect a fault (if any) within
milliseconds and notify the RIP module of the fault. The router then deletes the route that
passes through the faulty link and switches traffic to a backup link. This process speeds up
RIP convergence.
Table 1-44 describes the differences before and after BFD for RIP is configured.
Table 1-44 Differences before and after BFD for RIP is configured
Related Concepts
The BFD mechanism bidirectionally monitors data protocol connectivity over the link
between two routers. After BFD is associated with a routing protocol, BFD can rapidly detect
a fault (if any) and notify the protocol module of the fault, which speeds up route convergence
and minimizes traffic loss.
BFD is classified into the following modes:
Static BFD
In static BFD mode, BFD session parameters (including local and remote discriminators)
must be configured, and requests must be delivered manually to establish BFD sessions.
Static BFD is applicable to networks on which only a few links require high reliability.
Dynamic BFD
In dynamic BFD mode, the establishment of BFD sessions is triggered by routing
protocols, and the local discriminator is dynamically allocated, while the remote
discriminator is obtained from BFD packets sent by the neighbor.
When a new neighbor relationship is set up, a BFD session is established based on the
neighbor and detection parameters, including source and destination IP addresses. When
a fault occurs on the link, the routing protocol associated with BFD can detect the BFD
session Down event. Traffic is switched to the backup link immediately, which
minimizes data loss.
Dynamic BFD is applicable to networks that require high reliability.
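The dynamic mode described above (local discriminator allocated, remote discriminator learned from the neighbor's packets) drives a session state machine. The sketch below assumes RFC 5880 semantics for the Down/Init/Up transitions, which the document does not spell out; class and method names are illustrative.

```python
# Sketch of how a dynamic BFD session learns its remote discriminator and
# moves through the Down -> Init -> Up state machine (RFC 5880 semantics
# assumed; AdminDown omitted for brevity).
DOWN, INIT, UP = "Down", "Init", "Up"

class DynamicBfdSession:
    def __init__(self, local_disc):
        self.local_disc = local_disc   # dynamically allocated locally
        self.remote_disc = 0           # learned from the neighbor's packets
        self.state = DOWN

    def on_packet(self, sender_disc, sender_state):
        self.remote_disc = sender_disc            # dynamic learning
        if self.state == DOWN:
            if sender_state == DOWN:
                self.state = INIT
            elif sender_state == INIT:
                self.state = UP
        elif self.state == INIT:
            if sender_state in (INIT, UP):
                self.state = UP
        elif self.state == UP:
            if sender_state == DOWN:
                self.state = DOWN
```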
Implementation
For details about BFD implementation, see "BFD" in Universal Service Router Feature
Description - Reliability. Figure 1-151 shows a typical network topology for BFD for RIP.
Dynamic BFD for RIP implementation:
a. RIP neighbor relationships are established among Device A, Device B, and Device
C and between Device B and Device D.
b. BFD for RIP is enabled on Device A and Device B.
c. Device A calculates routes, and the next hop along the route from Device A to
Device D is Device B.
d. If a fault occurs on the link between Device A and Device B, BFD will rapidly
detect the fault and report it to Device A. Device A then deletes the route whose
next hop is Device B from the routing table.
e. Device A recalculates routes and selects a new path Device C → Device B →
Device D.
f. After the link between Device A and Device B recovers, a new BFD session is
established between the two routers. Device A then reselects an optimal link to
forward packets.
Static BFD for RIP implementation:
a. RIP neighbor relationships are established among Device A, Device B, and Device
C and between Device B and Device D.
b. Static BFD is configured on the interface that connects Device A to Device B.
c. If a fault occurs on the link between Device A and Device B, BFD will rapidly
detect the fault and report it to Device A. Device A then deletes the route whose
next hop is Device B from the routing table.
d. After the link between Device A and Device B recovers, a new BFD session is
established between the two routers. Device A then reselects an optimal link to
forward packets.
Usage Scenario
BFD for RIP is applicable to networks that require high reliability.
Benefits
BFD for RIP improves network reliability and enables devices to rapidly detect link faults,
which speeds up route convergence on RIP networks.
Definition
Bidirectional Forwarding Detection (BFD) is a mechanism to detect communication faults
between forwarding engines.
To be specific, BFD detects the connectivity of a data protocol along a path between two
systems. The path can be a physical link, a logical link, or a tunnel.
In BFD for OSPF, a BFD session is associated with OSPF. The BFD session quickly detects a
link fault and then notifies OSPF of the fault, which speeds up OSPF's response to network
topology changes.
Purpose
A link fault or a topology change causes routers to recalculate routes. Routing protocol
convergence must be as quick as possible to improve network availability. Link faults are
inevitable, and therefore a solution must be provided to quickly detect faults and notify
routing protocols.
BFD for Open Shortest Path First (OSPF) associates BFD sessions with OSPF. After BFD for
OSPF is configured, BFD quickly detects link faults and notifies OSPF of the faults. BFD for
OSPF accelerates OSPF response to network topology changes.
Table 1-45 describes OSPF convergence speeds before and after BFD for OSPF is configured.
Table 1-45 OSPF convergence speeds before and after BFD for OSPF is configured
Principles
Figure 1-152 shows a typical network topology with BFD for OSPF configured. The
principles of BFD for OSPF are described as follows:
1. OSPF neighbor relationships are established among the three routers.
2. After a neighbor relationship becomes Full, a BFD session is established.
3. The outbound interface on Device A connected to Device B is interface 1. If the link
between Device A and Device B fails, BFD detects the fault and then notifies Device A
of the fault.
4. Device A processes the event that a neighbor relationship has become Down and
recalculates routes. The new route passes through Device C and reaches Device B, with
interface 2 as the outbound interface.
Definition
Bidirectional Forwarding Detection (BFD) is a mechanism to detect communication faults
between forwarding engines.
To be specific, BFD detects the connectivity of a data protocol along a path between two
systems. The path can be a physical link, a logical link, or a tunnel.
In BFD for OSPFv3, a BFD session is associated with OSPFv3. The BFD session quickly
detects a link fault and then notifies OSPFv3 of the fault, which speeds up OSPFv3's response
to network topology changes.
Purpose
A link fault or a topology change causes routers to recalculate routes. Routing protocol
convergence must be as quick as possible to improve network availability. Link faults are
inevitable, and therefore a solution must be provided to quickly detect faults and notify
routing protocols.
BFD for Open Shortest Path First version 3 (OSPFv3) associates BFD sessions with OSPFv3.
After BFD for OSPFv3 is configured, BFD quickly detects link faults and notifies OSPFv3 of
the faults. BFD for OSPFv3 accelerates OSPFv3 response to network topology changes.
Table 1-46 describes OSPFv3 convergence speeds before and after BFD for OSPFv3 is
configured.
Table 1-46 OSPFv3 convergence speeds before and after BFD for OSPFv3 is configured
Principles
Figure 1-153 shows a typical network topology with BFD for OSPFv3 configured. The
principles of BFD for OSPFv3 are described as follows:
1. OSPFv3 neighbor relationships are established among the three routers.
2. After a neighbor relationship becomes Full, a BFD session is established.
3. The outbound interface on Device A connected to Device B is interface 1. If the link
between Device A and Device B fails, BFD detects the fault and then notifies Device A
of the fault.
4. Device A processes the event that a neighbor relationship has become Down and
recalculates routes. The new route passes through Device C and reaches Device B, with
interface 2 as the outbound interface.
A device can detect neighbor faults only at the level of seconds. As a result, link faults on a
high-speed network may cause a large number of packets to be discarded.
BFD, which can be used to detect link faults on lightly loaded networks at the millisecond
level, is introduced to resolve the preceding issue. With BFD, two systems periodically send
BFD packets to each other. If a system does not receive BFD packets from the other end
within a specified period, the system considers the bidirectional link between them Down.
BFD is classified into the following modes:
Static BFD
In static BFD mode, BFD session parameters (including local and remote discriminators)
are set using commands, and requests must be delivered manually to establish BFD
sessions.
Dynamic BFD
In dynamic BFD mode, the establishment of BFD sessions is triggered by routing
protocols.
BFD for IS-IS enables BFD sessions to be dynamically established. After detecting a fault,
BFD notifies IS-IS of the fault. IS-IS sets the neighbor status to Down, quickly updates link
state protocol data units (LSPs), and performs the partial route calculation (PRC). BFD for
IS-IS implements fast IS-IS route convergence.
Instead of replacing the Hello mechanism of IS-IS, BFD works with IS-IS to rapidly detect the faults
that occur on neighboring devices or links.
If a Level-1-2 neighbor relationship is set up between the devices on both ends of a link,
the following situations occur:
− On a broadcast network, IS-IS sets up a Level-1 BFD session and a Level-2 BFD
session.
− On a P2P network, IS-IS sets up only one BFD session.
Process of tearing down a BFD session
− P2P network
If the neighbor relationship established between P2P IS-IS interfaces is not Up,
IS-IS tears down the BFD session.
− Broadcast network
If the neighbor relationship established between broadcast IS-IS interfaces is not Up
or the DIS is reelected on the broadcast network, IS-IS tears down the BFD session.
If the configurations of dynamic BFD sessions are deleted or BFD for IS-IS is disabled
from an interface, all Up BFD sessions established between the interface and its
neighbors are deleted. If the interface is a DIS and the DIS is Up, all BFD sessions
established between the interface and its neighbors are deleted.
If BFD is disabled from an IS-IS process, BFD sessions are deleted from the process.
BFD detects only the one-hop link between IS-IS neighbors because IS-IS establishes only one-hop
neighbor relationships.
Usage Scenario
Dynamic BFD needs to be configured based on the actual network. If the time parameters are
not configured correctly, network flapping may occur.
BFD for IS-IS speeds up route convergence through rapid link failure detection. The
following is a networking example for BFD for IS-IS.
Networking
As shown in Figure 1-155, Device A and Device B belong to ASs 100 and 200, respectively.
The two routers are directly connected and establish an External Border Gateway Protocol
(EBGP) peer relationship.
BFD is enabled to detect the EBGP peer relationship between Device A and Device B. If the
link between Device A and Device B fails, BFD can quickly detect the fault and notify BGP.
Background
If a node or link along an LDP LSP that is transmitting traffic fails, traffic switches to a
backup LSP. The path switchover speed depends on the detection duration and traffic
switchover duration. A delayed path switchover causes traffic loss. LDP fast reroute (FRR)
can be used to speed up the traffic switchover, but not the detection process.
As shown in Figure 1-156, a local label switching router (LSR) periodically sends Hello
messages to notify each peer LSR of the local LSR's presence and establish a Hello adjacency
with each peer LSR. The local LSR constructs a Hello hold timer to maintain the Hello
adjacency with each peer. Each time the local LSR receives a Hello message, it updates the
Hello hold timer. If the Hello hold timer expires before a Hello message arrives, the LSR
considers the Hello adjacency disconnected. The Hello mechanism cannot rapidly detect link
faults, especially when a Layer 2 device is deployed between the local LSR and its peer.
The rapid, light-load BFD mechanism is used to quickly detect faults and trigger a
primary/backup LSP switchover, which minimizes data loss and improves service reliability.
BFD for LDP LSP is implemented by establishing a BFD session between two nodes on both
ends of an LSP and binding the session to the LSP. BFD rapidly detects LSP faults and
triggers a traffic switchover. When BFD monitors a unidirectional LDP LSP, the reverse path
of the LDP LSP can be an IP link, an LDP LSP, or a traffic engineering (TE) tunnel.
A BFD session that monitors LDP LSPs is negotiated in either static or dynamic mode:
Static configuration: The negotiation of a BFD session is performed using the local and
remote discriminators that are manually configured for the BFD session to be established.
On a local LSR, you can bind an LSP with a specified next-hop IP address to a BFD
session with a specified peer IP address.
Dynamic establishment: The negotiation of a BFD session is performed using the BFD
discriminator type-length-value (TLV) in an LSP ping packet. You must specify a policy
for establishing BFD sessions on a local LSR. The LSR automatically establishes BFD
sessions with its peers and binds the BFD sessions to LSPs using either of the following
policies:
− Host address-based policy: The local LSR uses all host addresses to establish BFD
sessions. You can specify a next-hop IP address and an outbound interface name of
LSPs and establish BFD sessions to monitor the specified LSPs.
− Forwarding equivalence class (FEC)-based policy: The local LSR uses host
addresses listed in a configured FEC list to automatically establish BFD sessions.
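The two session-establishment policies above amount to a filter over candidate destinations. A minimal sketch, with an assumed function name and data shapes:

```python
# Illustrative sketch of the two dynamic-establishment policies described
# above: decide which LSP destination addresses get automatically created
# BFD sessions.
def select_bfd_targets(host_addresses, fec_list=None):
    """Host address-based policy when fec_list is None; FEC-based otherwise."""
    if fec_list is None:
        return set(host_addresses)                 # all host addresses
    return set(host_addresses) & set(fec_list)     # only addresses in the FEC list
```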
BFD uses the asynchronous mode to check LSP continuity. That is, the ingress and egress
periodically send BFD packets to each other. If one end does not receive BFD packets from
the other end within a detection period, BFD considers the LSP Down and sends an LSP
Down message to the LSP management (LSPM) module.
Although BFD for LDP is enabled on a proxy egress, a BFD session cannot be established for the
reverse path of a proxy egress LSP on the proxy egress.
Usage Scenarios
BFD for LDP LSP can be used in the following scenarios:
Primary and bypass LDP FRR LSPs are established.
Primary and bypass virtual private network (VPN) FRR LSPs are established.
Benefits
BFD for LDP LSP provides a rapid, light-load fault detection mechanism for LDP LSPs,
which improves network reliability.
Benefits
No tunnel protection is provided in the NG-MVPN over P2MP TE function or VPLS over
P2MP TE function. If a tunnel fails, traffic can only be switched using route change-induced
hard convergence, which renders low performance. This function provides dual-root 1+1
protection for the NG-MVPN over P2MP TE function and VPLS over P2MP TE function. If a
P2MP TE tunnel fails, BFD for P2MP TE rapidly detects the fault and switches traffic, which
improves fault convergence performance and reduces traffic loss.
Principles
In Figure 1-157, BFD is enabled on the root PE1 and the backup root PE2. Leaf nodes UPE1
to UPE4 are enabled to passively create BFD sessions. Both PE1 and PE2 send BFD packets
to all leaf nodes along P2MP TE tunnels. The leaf nodes receive the BFD packets transmitted
only on the primary tunnel. If a leaf node receives detection packets within a specified
interval, the link between the root node and leaf node is working properly. If a leaf node fails
to receive BFD packets within a specified interval, the link between the root node and leaf
node fails. The leaf node then rapidly switches traffic to a protection tunnel, which reduces
traffic loss.
On the network shown in Figure 1-158, BFD is disabled. If LSRE fails, LSRA or LSRF
cannot promptly detect the fault because a Layer 2 switch exists between them. Although the
Hello mechanism detects the fault, detection lasts for a long time.
After BFD is configured, if LSRE fails, LSRA and LSRF detect the fault rapidly, and traffic switches to the path LSRA
-> LSRB -> LSRD -> LSRF.
BFD for TE detects faults in a CR-LSP. After detecting a fault in a CR-LSP, BFD for TE
immediately notifies the forwarding plane of the fault to rapidly trigger a traffic switchover.
BFD for TE is usually used together with a hot-standby CR-LSP.
The concepts associated with BFD are as follows:
Static BFD session: established by manually setting the local and remote discriminators.
The local discriminator on a local node must match the remote discriminator on a remote
node. The minimum intervals at which BFD packets are sent and received are
changeable after a static BFD session is established.
Dynamic BFD session: established without a local or remote discriminator specified.
After a routing protocol neighbor is established between the local and remote nodes, the
RM delivers parameters to instruct the BFD module to establish a BFD session. The two
nodes negotiate the local discriminator, remote discriminator, minimum interval at which
BFD packets are sent, and minimum interval at which BFD packets are received.
Detection period: an interval at which the system checks the BFD session status. If no
packet is received from the remote end within a detection period, the BFD session is
considered Down.
A BFD session is bound to a CR-LSP. A BFD session is set up between the ingress and egress.
A BFD packet is sent by the ingress to the egress along a CR-LSP. Upon receipt, the egress
responds to the BFD packet. The ingress can rapidly monitor the status of links through which
the CR-LSP passes based on whether a reply packet is received.
If a link fault is detected, BFD notifies the forwarding module of the fault. The forwarding
module searches for a backup CR-LSP and switches traffic to the backup CR-LSP. In addition,
the forwarding module reports the fault to the control plane. If dynamic BFD for TE CR-LSP
is used, the control plane proactively creates a BFD session to detect faults in the backup
CR-LSP. If static BFD for TE CR-LSP is used, a BFD session is created manually to detect
faults in the backup CR-LSP if necessary.
On the network shown in Figure 1-159, a BFD session is set up to detect faults in the link
through which the primary CR-LSP passes. If a link fault occurs, the BFD session on the
ingress immediately notifies the forwarding plane of the fault. The ingress switches traffic to
the bypass CR-LSP and sets up a new BFD session to detect faults in the bypass CR-LSP.
On the network shown in Figure 1-160, a primary CR-LSP is established along the path LSRA
-> LSRB, and a hot-standby CR-LSP is configured. A BFD session is set up between LSRA
and LSRB to detect faults in the primary CR-LSP. If a fault occurs on the primary CR-LSP,
the BFD session rapidly notifies LSRA of the fault. After receiving the fault information,
LSRA rapidly switches traffic to the hot-standby CR-LSP to ensure traffic continuity.
Background
When a Layer 2 device is deployed on a link between two RSVP nodes, an RSVP node can
only use the Hello mechanism to detect a link fault. For example, on the network shown in
Figure 1-161, a switch exists between P1 and P2. If a fault occurs on the link between the
switch and P2, P1 keeps sending Hello packets and detects the fault after it fails to receive
replies to the Hello packets. The fault detection latency causes seconds of traffic loss. To
minimize packet loss, BFD for RSVP can be configured. BFD rapidly detects a fault and
triggers TE FRR switching, which improves network reliability.
Implementation
BFD for RSVP monitors RSVP neighbor relationships.
Unlike BFD for CR-LSP and BFD for TE that support multi-hop BFD sessions, BFD for
RSVP establishes only single-hop BFD sessions between RSVP nodes to monitor the network
layer.
BFD for RSVP, BFD for OSPF, BFD for IS-IS, and BFD for BGP can share a BFD session.
When protocol-specific BFD parameters are set for a BFD session shared by RSVP and other
protocols, the smallest values take effect. The parameters include the minimum intervals at
which BFD packets are sent, minimum intervals at which BFD packets are received, and local
detection multipliers.
Usage Scenario
BFD for RSVP applies to a network on which a Layer 2 device exists between the TE FRR
point of local repair (PLR) on a bypass CR-LSP and an RSVP node on the primary CR-LSP.
Benefits
BFD for RSVP improves reliability on MPLS TE networks with Layer 2 devices.
Background
Devices in a VRRP backup group exchange VRRP Advertisement packets to negotiate the
master/backup status and implement backup. If the link between devices in a VRRP backup
group fails, VRRP Advertisement packets cannot be exchanged to negotiate the
master/backup status. A backup device attempts to preempt the Master state after a period
three times as long as the time interval at which VRRP Advertisement packets are broadcast.
During this period, user traffic is still forwarded to the master device, which results in user
traffic loss.
Bidirectional Forwarding Detection (BFD) can rapidly detect faults in links or IP routes. BFD
for VRRP enables a master/backup VRRP switchover to be completed within 1 second,
preventing user traffic loss. A BFD session is established between the master and backup
devices in a VRRP backup group and is bound to the VRRP backup group. BFD immediately
detects communication faults in the VRRP backup group and instructs the VRRP backup
group to perform a master/backup switchover, minimizing service interruptions.
As shown in Figure 1-162, a BFD session is established between Device A (master) and
Device B (backup) and is bound to a VRRP backup group. If BFD detects a fault on the link
between Device B and Device A, the BFD module notifies the VRRP module of the status
change. After receiving the notification, the VRRP module performs a master/backup VRRP
switchover.
Figure 1-162 Association between a VRRP backup group and a common BFD session
4. Device A in the Master state forwards user traffic, and Device B remains in the Backup
state.
The preceding process shows that BFD for VRRP is different from VRRP. After BFD for
VRRP is deployed and a fault occurs, a backup device immediately preempts the Master state
without waiting for a period three times as long as the time interval at which VRRP
Advertisement packets are broadcast. A master/backup VRRP switchover can be implemented
in milliseconds.
Association Between a VRRP Backup Group and Link and Peer BFD Sessions
As shown in Figure 1-163, the master and backup devices monitor the status of link and peer
BFD sessions to identify local or remote faults.
Device A and Device B run VRRP. A peer BFD session is established between Device A and
Device B to detect link and device failures. Link BFD sessions are established between
Device A and Device E and between Device B and Device E to detect link and device failures.
After Device B detects that the peer BFD session goes Down and the Link2 BFD session is Up,
Device B's VRRP status changes from Backup to Master, and Device B takes over.
Figure 1-163 Association between a VRRP backup group and link and peer BFD sessions
− Link1 or Device E fails. Link1 BFD session and the peer BFD session go Down.
Link2 BFD session is Up.
Device A's VRRP status directly becomes Initialize.
Device B's VRRP status directly becomes Master.
− Device A fails. Link1 BFD session and the peer BFD session go Down. Link2 BFD
session is Up. Device B's VRRP status becomes Master.
3. After the fault is rectified, the BFD sessions go Up, and Device A and Device B restore
their VRRP status.
A Link2 fault does not affect Device A's VRRP status, and Device A continues to forward upstream
traffic. However, Device B's VRRP status becomes Master if both the peer BFD session and Link2 BFD
session go Down, and Device B detects the peer BFD session status change before detecting the Link2
BFD session status change. After Device B detects the Link2 BFD session status change, Device B's
VRRP status becomes Initialize.
Figure 1-164 shows the state machine for the association between a VRRP backup group and
link and peer BFD sessions.
Figure 1-164 State machine for the association between a VRRP backup group and link and peer
BFD sessions
The preceding process shows that, after link and peer BFD for VRRP is deployed, the backup
device immediately preempts the Master state if a fault occurs. Link and peer BFD for VRRP
implements a millisecond-level master/backup VRRP switchover.
Benefits
BFD for VRRP speeds up master/backup VRRP switchovers if faults occur.
Service Overview
Bidirectional Forwarding Detection (BFD) for pseudo wire (PW) monitors PW connectivity
on a Layer 2 virtual private network (L2VPN) and informs the L2VPN of any detected faults.
Upon receiving a fault notification from BFD, the L2VPN performs a primary/secondary PW
switchover to protect services.
Static BFD for PW has two modes: time to live (TTL) and non-TTL.
The two static BFD for PW modes are described as follows:
Static BFD for PW in TTL mode: The TTL of BFD packets is automatically calculated or
manually configured. BFD packets are encapsulated with PW labels and transmitted over
PWs. A PW can either have the control word enabled or not. The usage scenarios of
static BFD for PW in TTL mode are as follows:
− Static BFD for single-segment PW (SS-PW): Two BFD-enabled nodes negotiate a
BFD session based on the configured peer address and TTL (the TTL for SS-PWs is
1) and exchange BFD packets to monitor PW connectivity.
− Static BFD for multi-segment PW (MS-PW): The remote peer address of the
MS-PW to be detected must be specified. BFD packets can pass through multiple
superstratum provider edge devices (SPEs) to reach the destination, regardless of
whether the control word is enabled for the PW.
Static BFD for PW in non-TTL mode: The TTL of BFD packets is fixed at 255. BFD
packets are encapsulated with PW labels and transmitted over PWs. A PW must have the
control word enabled and differentiate control packets from data packets by checking
whether these packets carry the control word. Static BFD for PW in non-TTL mode can
detect only end-to-end (E2E) SS-PWs.
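The TTL rules for the two static BFD for PW modes can be summarized in a short sketch (illustrative Python, not device software; the helper name and parameters are hypothetical):

```python
def bfd_pw_ttl(mode, hop_count=1):
    """Return the TTL to set in BFD packets for a PW, following the
    two static BFD for PW modes described above. Illustrative only."""
    if mode == "non-ttl":
        # Non-TTL mode: TTL is fixed at 255; the control word is mandatory,
        # and only E2E SS-PWs can be detected.
        return 255
    if mode == "ttl":
        # TTL mode: the TTL is automatically calculated or manually
        # configured. For an SS-PW it is 1; for an MS-PW it reflects the
        # number of segments (SPEs) the packet must traverse.
        return hop_count
    raise ValueError("mode must be 'ttl' or 'non-ttl'")
```

For example, an SS-PW in TTL mode uses the default hop count of 1, while non-TTL mode always yields 255.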
Networking Description
Figure 1-165 shows an IP radio access network (RAN) that consists of the following device
roles:
Cell site gateway (CSG): CSGs form the access network. On the IP RAN, CSGs function
as user-end provider edge devices (UPEs) to provide access services for NodeBs.
Aggregation site gateway (ASG): On the IP RAN, ASGs function as SPEs to provide
access services for UPEs.
Radio service gateway (RSG): ASGs and RSGs form the aggregation network. On the IP
RAN, RSGs function as network provider edge devices (NPEs) to connect to the radio
network controller (RNC).
The primary PW is along CSG1–ASG3–RSG5, and the secondary PW is along
CSG1–CSG2–ASG4–RSG6. If the primary PW fails, traffic switches to the secondary PW.
Feature Deployment
Configure static BFD for PW on the IP RAN as follows:
1. On CSG1, configure static BFD for the primary and secondary PWs.
2. On RSG5, configure static BFD for the primary PW.
3. On RSG6, configure static BFD for the secondary PW.
When you configure static BFD for PW, note the following points:
When you configure static BFD for the primary PW, ensure that the local discriminator on CSG1 is
the remote discriminator on RSG5 and that the remote discriminator on CSG1 is the local
discriminator on RSG5.
When you configure static BFD for the secondary PW, ensure that the local discriminator on CSG1
is the remote discriminator on RSG6 and that the remote discriminator on CSG1 is the local
discriminator on RSG6.
After you configure static BFD for PW on CSG1 and primary/secondary RSGs, services can
quickly switch to the secondary PW if the primary PW fails.
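The discriminator matching rule above (local on one end equals remote on the other, and vice versa) can be expressed as a small validation sketch (illustrative Python; the function and tuple layout are assumptions, not device configuration):

```python
def discriminators_consistent(csg, rsg):
    """Check the static BFD for PW discriminator rule: the local
    discriminator on one PW endpoint must equal the remote
    discriminator on the other endpoint, and vice versa.
    Each endpoint is a (local_discriminator, remote_discriminator)
    tuple. Illustrative only."""
    csg_local, csg_remote = csg
    rsg_local, rsg_remote = rsg
    return csg_local == rsg_remote and csg_remote == rsg_local
```

A mirrored pair such as (10, 20) on CSG1 and (20, 10) on RSG5 passes the check; identical pairs on both ends do not.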
Service Overview
IP/MPLS backbone networks carry an increasing number of multicast services, such as IPTV,
video conferences, and massively multiplayer online role-playing games (MMORPGs), which
all require bandwidth assurance, QoS guarantee, and high network reliability. To provide
better multicast services, the IETF proposed the multicast VPLS solution. On a multicast
VPLS network, the ingress transmits multicast traffic to multiple egresses over a P2MP MPLS
tunnel. This solution eliminates the need to deploy PIM and HVPLS on the transit nodes,
simplifying network deployment.
On a multicast VPLS network, multicast traffic can be carried over either P2MP TE tunnels or
P2MP mLDP tunnels. When P2MP TE tunnels are used, P2MP TE FRR must be deployed. If
a link fault occurs, FRR allows traffic to be rapidly switched to a normal link. If a node fails,
however, traffic is not switched until the root node detects the fault and recalculates links to
set up a Source to Leaf (S2L) sub-LSP. Topology convergence takes a long time in this
situation, affecting service reliability.
To meet the reliability requirements of multicast services, configure BFD for multicast VPLS
to monitor multicast VPLS links. When a link or node fails, BFD on the leaf nodes can
rapidly detect the fault and trigger protection switching so that the leaf nodes receive traffic
from the backup multicast tunnel.
Networking Description
Figure 1-166 shows a dual-root 1+1 protection scenario in which PE-AGG1 is the master root
node and PE-AGG2 is the backup root node. Each root node sets up a complete MPLS
multicast tree to the UPEs (leaf nodes). The two MPLS multicast trees do not have
overlapping paths. After multicast flows reach PE-AGG1 and PE-AGG2, PE-AGG1 and
PE-AGG2 send the multicast flows along their respective P2MP tunnels to UPEs. Each UPE
receives two copies of multicast flows and selects one to send to users.
BFD for multicast VPLS sessions support only one-way detection. The BFD session of the
MultiPointHead type on a root node only sends packets, whereas the BFD session of the MultiPointTail
type on a leaf node only receives packets.
On the network shown in Figure 1-166, if link 1 (an AC) fails, BFD on the master root node
detects that the AC interface is Down and stops sending BFD detection packets. The leaf
nodes cannot receive BFD detection packets, and therefore report the Down event, which
triggers protection switching. The leaf nodes then receive multicast flows from the backup
multicast tunnel. Similarly, if node 2, link 3, node 4, or link 5 fails, the leaf nodes also receive
multicast flows from the backup multicast tunnel. After the fault is rectified, BFD sessions are
reestablished. The leaf nodes then receive multicast flows from the master multicast tunnel
again.
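The one-way detection behavior above, where a leaf-side session only receives packets and triggers protection switching when they stop arriving, can be sketched as follows (illustrative Python; class and method names are assumptions):

```python
class MultiPointTail:
    """Sketch of a leaf-side (MultiPointTail) BFD session for
    multicast VPLS: it only receives packets and reports a Down
    event when none arrive within the detection time, which
    triggers switching to the backup multicast tunnel.
    Times are plain numbers in seconds. Illustrative only."""

    def __init__(self, detect_time):
        self.detect_time = detect_time
        self.last_rx = None
        self.up = False

    def on_packet(self, now):
        # The root-side (MultiPointHead) session only sends packets.
        self.last_rx = now
        self.up = True

    def poll(self, now):
        """Return 'switch-to-backup' when the master tree times out."""
        if self.up and now - self.last_rx > self.detect_time:
            self.up = False
            return "switch-to-backup"
        return None
```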
Currently, BFD for PIM can be used on both IPv4 PIM-SM/Source-Specific Multicast (SSM) and IPv6
PIM-SM/SSM networks.
As shown in Figure 1-167, on the shared network segment where user hosts reside, a PIM
BFD session is set up between the downstream interface Port 2 of Device B and the
downstream interface Port 1 of Device C. Both ports send BFD packets to detect the status of
the link between them.
Port 2 of Device B is elected as a DR for forwarding multicast data to the receiver. If Port 2
fails, BFD immediately notifies the RM module of the session status and the RM module then
notifies the PIM module. The PIM module triggers a new DR election. Port 1 of Device C is
then elected as a new DR to forward multicast data to the receiver.
1.5.3 LMSP
1.5.3.1 Introduction
Definition
Linear multiplex section protection (LMSP) is an SDH interface-based protection technique
that uses an SDH interface to protect services on another SDH interface. If a link failure
occurs, LMSP enables a device to send a protection switching request over K bytes to its peer
device. The peer device then returns a switching bridge reply.
Purpose
Large numbers of low-speed links still exist on the user side. These links may be unstable due
to aging. These links have a small capacity and may fail to work properly due to congestion in
traffic burst scenarios. Therefore, a protection technique is required to provide reliability and
stability for these low-speed links.
LMSP is an inherent feature of an SDH network. When a mobile bearer network is deployed,
a router must be connected to an add/drop multiplexer (ADM) or RNC, both of which support
LMSP. As the original protection function of the router cannot properly protect the
communication channel between the router and ADM or RNC, LMSP is introduced to resolve
this issue.
Benefits
LMSP offers the following benefits:
Improves the reliability and security of low-speed links and enhances product
credibility and market competitiveness by reducing labor costs (automatic switching)
and decreasing network interruption time (rapid switching).
Improves user experience by increasing user access success rates.
1.5.3.2 Principles
LMSP is a redundancy protection mechanism that uses a backup channel to protect services
on a channel. LMSP is defined in ITU-T G.783 and G.841 and used to protect multiplex
section (MS) layers in linear networking mode. LMSP applies to point-to-point physical
networks.
LMSP can protect services against disconnection of the optical fiber on which the working MS resides,
regenerator failures, and MS performance deterioration. It does not protect against node failures.
Devices on an SDH network communicate with each other by multiplexing services into SDH payloads and transmitting the
payloads over optical fibers. An LMSP-enabled router can protect traffic on a link to an ADM
on an SDH network that has LMSP functions. Two LMSP-enabled routers can also interwork
to protect traffic on the direct link between them.
Linear MS Mode
Linear MS modes are classified as 1+1 or 1:N protection modes by protection structure (only
1:1 protection is implemented).
In 1+1 protection mode, each working link has a dedicated protection link as its backup.
In a process called bridging, a transmit end transmits data on both the working and
protection links simultaneously. In normal circumstances, a receive end receives data
from the working link. If the working link fails and the receive end detects the failure,
the receive end receives data from the protection link instead. Generally, only the
receive end performs the switching action (single-ended protection), so K1 and K2
bytes are not required for LMSP negotiation.
The 1+1 protection mode has advantages such as rapid traffic switching and high
reliability. However, this mode has a low channel usage (about 50%). Figure 1-168
shows the 1+1 protection mode.
In 1:N protection mode, a protection link provides traffic protection for N working links
(1 ≤ N ≤ 14). In normal circumstances, a transmit end transmits data on a working link.
The protection link can transmit low-priority data or it may not transmit any data. If the
working link fails, the transmit end bridges data onto the protection link. The receive end
then receives data from the protection link. If the transmit end is transmitting
low-priority data on the protection link, it will stop the data transmission and start
transmitting high-priority protected data. Figure 1-169 shows the 1:N protection mode.
If several working links fail at the same time, only data on the working link with the
highest priority can be switched to the protection link. Data on other faulty working links
is lost.
When N is 1, the 1:N protection mode becomes the 1:1 protection mode.
The 1:N protection mode requires both a transmit end and a receive end to perform
switching. Therefore, K1 and K2 bytes are required for negotiation. The 1:N protection
mode has a high channel usage but poorer reliability than the 1+1 protection mode.
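The 1:N contention rule described above (when several working links fail at once, only the highest-priority one is switched to the protection link) can be sketched as follows (illustrative Python; the mapping format is an assumption):

```python
def select_protected_link(faulty_links):
    """In 1:N LMSP, the single protection link can carry only one
    working link's traffic. If several working links fail at the
    same time, only the highest-priority faulty link is switched;
    data on the other faulty links is lost.
    faulty_links maps link number -> priority (larger = higher).
    Illustrative only."""
    if not faulty_links:
        return None
    return max(faulty_links, key=faulty_links.get)
```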
Linear MS K Bytes
LMSP uses APS to control bridging, switching, and recovery actions. APS information is
transmitted over the K1 and K2 bytes in the MS overhead in an SDH frame structure. Table
1-48 lists the bit layout of the K1 and K2 bytes.
Both K1 and K2 are single bytes; the bits of each byte are numbered 7 (most significant) through 0.
Bits 3, 2, 1, and 0 of the K1 byte: switching request channel numbers. The value 0
indicates a protection channel. The values 1 to 14 indicate working channels (the value
can be only 1 in 1+1 protection mode). The value 15 indicates an extra service channel
(the value can be 15 only in 1:N protection mode).
Bits 7, 6, 5, and 4 of the K2 byte: bridging/switching channel numbers. The value
meanings of a bridging channel number are the same as those of a switching request
channel number.
Bit 3 of the K2 byte: protection mode. The value 0 indicates 1+1 protection, and the
value 1 indicates 1:1 protection.
Bits 2, 1, and 0 of the K2 byte: MS status code. The values are as follows:
− 000: idle state
− 111: multiplex section alarm indication signal (MS-AIS)
− 110: multiplex section remote defect indication (MS-RDI)
− 101: dual-ended
− 100: single-ended (not defined by standards)
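The K1/K2 field layout above can be decoded with simple bit operations (an illustrative Python sketch covering only the fields described in this section; the function name is an assumption):

```python
def decode_k_bytes(k1, k2):
    """Decode the APS fields carried in the K1 and K2 bytes as
    listed above (bit 7 = most significant bit). Illustrative only."""
    status_names = {0b000: "idle", 0b111: "MS-AIS", 0b110: "MS-RDI",
                    0b101: "dual-ended", 0b100: "single-ended"}
    return {
        "request_channel": k1 & 0x0F,          # K1 bits 3..0
        "bridge_channel": (k2 >> 4) & 0x0F,    # K2 bits 7..4
        "mode": "1:1" if k2 & 0x08 else "1+1", # K2 bit 3
        "status": status_names.get(k2 & 0x07, "reserved"),  # K2 bits 2..0
    }
```

For example, K2 = 0x1D carries bridge channel 1, 1:1 mode, and the dual-ended status code (101).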
PGP
MC-LMSP is implemented between main control boards over PGP. The connection mode is
UDP. Figure 1-172 shows the communication process.
1. The interface board of the master device sends a message to the main control board
through the IPC.
2. The main control board of the master device constructs a PGP packet and invokes the
socket to send the packet.
3. The master device sends the packet from the main control board to the interface board
over the VP.
4. The master device sends the packet through an interface.
5. The backup device receives the packet on an interface board.
6. The backup device sends the packet to the main control board over the VP.
7. The backup device performs APS PGP processing.
8. The main control board of the backup device sends a message to the interface board
through the IPC.
9. The APS module on the backup device's interface board performs processing.
1. The interfaces on TPE2 and TPE3 form an MC-LMSP group. TPE2 and TPE3 are
configured as the working and protection NEs, respectively. The LMSP state machine
runs on TPE3.
2. PW1 and PW2 form an inter-device PW APS group.
3. A DNI-PW is deployed between TPE2 and TPE3 for traffic switching.
4. An ICB channel is deployed to synchronize the status between TPE2 and TPE3.
1.5.3.3 Applications
1.5.3.3.1 Application of Single-chassis LMSP on a Mobile Bearer Network
On the network shown in Figure 1-174, single-chassis LMSP is deployed on the access and
network sides of the router.
On the access side, a NodeB/BTS is connected to the router over an E1 or SDH link, and
a microwave or SDH device is connected to the router over an optical fiber.
Single-chassis LMSP is configured for the STM-1 link between the router and
microwave or SDH device.
On the network side, the router is connected to PEs. Single-chassis LMSP is configured
on POS or CPOS interfaces.
Access Side
Scenario 1: On the network shown in Figure 1-175, a base station is connected to the router
through the microwave devices and then over the IMA/TDM link (CPOS interface) that has
LMSP configured. The RNC is connected to the device over the IMA/TDM link (CPOS
interface). After base station data reaches the router, the base station can interwork with the
RNC over the PW between the router and device.
Scenario 2: On the network shown in Figure 1-176, a base station is connected to the router
through the microwave devices and then over the IMA link (CPOS interface) that has LMSP
configured. The RNC is connected to the device over the ATM link. After base station data
reaches the router, the base station can interwork with the RNC over the PW between the
router and device.
Network Side
Scenario 1: On the network shown in Figure 1-177, the router's network-side interface is a
CPOS interface on which a global MP group is configured. Single-chassis LMSP is
configured on the CPOS interface. The router is connected to another device to carry
PW/L3VPN/MPLS/DCN services.
Scenario 2: On the network shown in Figure 1-178, the router's network-side interface is a
POS interface. Single-chassis LMSP is configured on the POS interface. The router is
connected to another device to carry PW/VPLS/L3VPN/MPLS/DCN services.
Figure 1-180 Network with MC-LMSP 1+1 protection+two bypass PWs deployed
Definition
As a key technology used on scalable next generation networks, Multiprotocol Label
Switching (MPLS) provides multiple services with quality of service (QoS) guarantee. MPLS,
however, introduces a unique network layer on which faults can occur. Therefore, MPLS networks
must provide operation, administration and maintenance (OAM) capabilities.
OAM is an important means to reduce network maintenance costs. The MPLS OAM
mechanism manages operation and maintenance of MPLS networks.
For details about the MPLS OAM background, see ITU-T Recommendation Y.1710. For
details about the MPLS OAM implementation mechanism, see ITU-T Recommendation
Y.1711.
Purpose
The server-layer protocols, such as Synchronous Optical Network (SONET)/Synchronous
Digital Hierarchy (SDH), are below the MPLS layer; the client-layer protocols, such as IP, FR,
and ATM, are above the MPLS layer. These protocols have their own OAM mechanisms, but
failures at the MPLS layer cannot be completely rectified through the OAM mechanisms of
other layers. In addition, the layered network architecture requires MPLS to have its own
independent OAM mechanism to reduce the dependency between layers.
The MPLS OAM mechanism can detect, identify, and locate a defect at the MPLS layer
effectively. Then, the MPLS OAM mechanism reports and handles the defect. In addition, if a
failure occurs, the MPLS OAM mechanism triggers protection switching.
MPLS offers an OAM mechanism totally independent of any upper or lower layer. The
following OAM features are enabled on the MPLS user plane:
Monitors link connectivity.
Evaluates network usage and performance.
Performs a traffic switchover if a fault occurs so that services meet service level
agreements (SLAs).
Benefit
MPLS OAM can rapidly detect link faults or monitor the connectivity of links, which
helps measure network performance and minimizes OPEX.
If a link fault occurs, MPLS OAM rapidly switches traffic to the standby link to restore
services, which shortens the defect duration and improves network reliability.
Reverse Tunnel
A reverse tunnel is bound to an LSP that is monitored using MPLS OAM. The reverse tunnel
can transmit BDI packets to notify the ingress of an LSP defect.
A reverse tunnel and the LSP to which the reverse tunnel is bound must have the same
endpoints.
The reverse tunnel transmitting BDI packets can be either of the following types:
Private reverse LSP
Shared reverse LSP
The NE20E implements the OAM auto protocol to resolve these drawbacks.
The OAM auto protocol is configured on the egress. With this protocol, the egress can
automatically start OAM functions after receiving the first OAM packet. In addition, the
egress can dynamically stop running the OAM state machine after receiving an FDI packet
sent by the ingress.
1.5.4.2 Principles
1.5.4.2.1 Basic Detection
Background
The Multiprotocol Label Switching (MPLS) operation, administration and maintenance
(OAM) mechanism effectively detects and locates MPLS link faults. The MPLS OAM
mechanism also triggers a protection switchover after detecting a fault.
Related Concepts
MPLS OAM packets
Table 1-50 describes MPLS OAM packets.
Backward defect indication (BDI) packet Sent by the egress to notify the ingress of
an LSP defect.
Channel defects
Table 1-51 describes channel defects that MPLS OAM can detect.
All of the defects listed below are MPLS layer defects.
dLOCV: a connectivity verification loss defect. A dLOCV defect occurs if no CV or
FFD packets are received after three consecutive intervals at which CV or FFD packets
are sent elapse.
dTTSI_Mismatch: a trail termination source identifier (TTSI) mismatch defect. A
dTTSI_Mismatch defect occurs if no CV or FFD packets with correct TTSIs are received
after three consecutive intervals at which CV or FFD packets are sent elapse.
dTTSI_Mismerge: a TTSI mis-merging defect. A dTTSI_Mismerge defect occurs if CV
or FFD packets with both correct and incorrect TTSIs are received within three
consecutive intervals at which CV or FFD packets are sent.
dExcess: an excessive rate at which connectivity detection packets are received. A
dExcess defect occurs if five or more correct CV or FFD packets are received within
three consecutive intervals at which CV or FFD packets are sent.
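The defect definitions above can be condensed into a simple classifier over the packet counts observed in the last three sending intervals (a simplified illustrative Python sketch; real detection runs per interval with more state):

```python
def classify_defect(correct, incorrect):
    """Classify an MPLS OAM channel defect from the numbers of CV/FFD
    packets with correct and incorrect TTSIs received over the last
    three sending intervals, per the defect table above.
    Illustrative model only."""
    if correct == 0 and incorrect == 0:
        return "dLOCV"            # no CV/FFD packets at all
    if correct == 0 and incorrect > 0:
        return "dTTSI_Mismatch"   # only wrong-TTSI packets
    if correct > 0 and incorrect > 0:
        return "dTTSI_Mismerge"   # mix of correct and incorrect TTSIs
    if correct >= 5:
        return "dExcess"          # too many correct packets
    return None                   # no defect
```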
Reverse tunnel
A reverse tunnel is bound to an LSP that is monitored using MPLS OAM. The reverse
tunnel can transmit BDI packets to notify the ingress of an LSP defect. A reverse tunnel
and the LSP to which the reverse tunnel is bound must have the same endpoints, and they
transmit traffic in opposite directions. The reverse tunnels transmitting BDI packets
include private or shared LSPs. Table 1-52 lists the two types of reverse tunnel.
Private reverse LSP: bound to only one LSP. The binding between the private reverse
LSP and its forward LSP is stable but may waste LSP resources.
Shared reverse LSP: bound to many LSPs. A TTSI carried in a BDI packet identifies the
specific forward LSP bound to the reverse LSP. The binding between a shared reverse
LSP and multiple forward LSPs minimizes LSP resource waste. However, if defects
occur on multiple LSPs bound to the shared reverse LSP, the reverse LSP may be
congested with traffic.
Implementation
MPLS OAM periodically sends CV or FFD packets to monitor TE LSPs, PWs, or ring
networks.
MPLS OAM for TE LSPs
MPLS OAM monitors TE LSPs. If MPLS OAM detects a fault in a TE LSP, it triggers a
traffic switchover to minimize traffic loss.
Figure 1-185 illustrates a network on which MPLS OAM monitors TE LSP connectivity.
The process of using MPLS OAM to monitor TE LSP connectivity is as follows:
a. The ingress sends a CV or FFD packet along a TE LSP to be monitored. The packet
passes through the TE LSP and arrives at the egress.
b. The egress compares the packet type, frequency, and TTSI in the received packet
with the locally configured values to verify the packet. In addition, the egress
collects the number of correct and incorrect packets within a detection interval.
c. If the egress detects an LSP defect, the egress analyzes the defect type and sends a
BDI packet carrying defect information to the ingress along a reverse tunnel. The
ingress can then be notified of the defect. If a protection group is configured, the
ingress switches traffic to a backup LSP.
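Step b above, in which the egress verifies each received CV or FFD packet against locally configured values, can be sketched as follows (illustrative Python; the field names are assumptions, not the actual packet format):

```python
def verify_oam_packet(pkt, expected):
    """Sketch of the egress-side check: compare the packet type,
    sending frequency, and TTSI carried in a received CV/FFD packet
    with the locally configured values. Packets are plain dicts
    here; illustrative only."""
    return all(pkt.get(field) == expected[field]
               for field in ("type", "frequency", "ttsi"))
```

The egress counts packets that pass and fail this check within each detection interval; the failure pattern determines the defect type (dLOCV, dTTSI_Mismatch, and so on) reported in the BDI packet.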
MPLS OAM for PWs
MPLS OAM periodically sends CV or FFD packets to monitor PW connectivity. If
MPLS OAM detects a PW defect, it sends BDI packets carrying the defect type along a
reverse tunnel and instructs a client-layer application to switch traffic from the active
link to the standby link.
The dLOCV defect also occurs when OAM is disabled. OAM must be disabled on the
ingress and egress before the OAM detection packet type or the interval at which
detection packets are sent can be changed.
OAM parameters, including a detection packet type and an interval at which detection
packets are sent must be set on both the ingress and egress. This is likely to cause a
parameter inconsistency.
The OAM auto protocol enabled on the egress provides the following functions:
Triggers OAM
− If the sink node does not support OAM CC and CC parameters (including the
detection packet type and interval at which packets are sent), upon the receipt of the
first CV or FFD packet, the sink node automatically records the packet type and
interval at which the packet is sent and uses these parameters in CC detection that
starts.
− If the OAM function-enabled sink node does not receive CV or FFD packets within
a specified period of time, the sink node generates a BDI packet and notifies the
NMS of the BDI defect.
Dynamically stops the OAM state machine. If the detection packet type or the interval at
which detection packets are sent is to be changed on the source node, the source node
sends an FDI packet to instruct the sink node to stop the OAM state machine. The source
node does the same if an OAM function is to be disabled.
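The sink-node behavior of the OAM auto protocol described above can be sketched as a small state machine (illustrative Python; class, method, and field names are assumptions, not the device implementation):

```python
class OamAutoSink:
    """Sketch of the OAM auto protocol on the egress (sink node):
    the first CV/FFD packet starts CC detection using the learned
    packet type and sending interval, and an FDI packet from the
    ingress stops the state machine. Illustrative only."""

    def __init__(self):
        self.running = False
        self.pkt_type = None
        self.interval = None

    def receive(self, packet):
        if packet["kind"] == "FDI":
            # The source node is changing OAM parameters or disabling
            # OAM: stop the state machine and forget learned values.
            self.running = False
            self.pkt_type = self.interval = None
        elif not self.running:
            # First CV/FFD packet: record its type and interval and
            # start CC detection with these parameters.
            self.pkt_type = packet["kind"]
            self.interval = packet["interval"]
            self.running = True
```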
1.5.4.3 Applications
1.5.4.3.1 MPLS OAM Application in the IP RAN Layer 2 to Edge Scenario
MPLS OAM is deployed on PEs to maintain and operate MPLS networks. Working at the
MPLS client and server layers, MPLS OAM can effectively detect, identify, and locate client
layer faults and quickly switch traffic if links or nodes become faulty, reducing network
maintenance cost.
Figure 1-187 illustrates an IP RAN in the Layer 2 to edge scenario. The MPLS OAM
implementation is as follows:
The BTS, NodeB, BSC, and RNC can be directly connected to an MPLS network.
A TE tunnel between PE1 and PE4 is established. PWs are established over the TE
tunnel to transmit various services.
MPLS OAM is enabled on PE1 and PE4, and OAM parameters are configured on PE1
and PE4 at both ends of the PW. These PEs are enabled to send and receive OAM
detection packets, which allows OAM to monitor the PW between PE1 and PE4 and
obtain basic PW information. If OAM detects a defect, PE4 sends a BDI packet to PE1
over a reverse tunnel. The PEs notify the user-side BTS, NodeB, RNC, and BSC of fault
information so that the user-side devices can use the information to maintain their networks.
Service Overview
The operation and maintenance of virtual leased line (VLL) and virtual private LAN service
(VPLS) services require an operation, administration and maintenance (OAM) mechanism.
MultiProtocol Label Switching Transport Profile (MPLS-TP) OAM provides a mechanism to
rapidly detect and locate faults, which facilitates network operation and maintenance and
reduces network maintenance costs.
Networking Description
As shown in Figure 1-188, a user-end provider edge (UPE) on the access network is
dual-homed to SPE1 and SPE2 on the aggregation network. A VLL supporting access links of
various types is deployed on the access network. A VPLS is deployed on the aggregation
network to form a point-to-multipoint leased line network. Additionally, Fast Protection
Switching (FPS) is configured on the UPE; MPLS tunnel automatic protection switching
(APS) is configured on SPE1 and SPE2 to protect the links between the virtual switching
instances (VSIs) created on the two superstratum provider edges (SPEs).
Feature Deployment
To deploy MPLS OAM to monitor link connectivity of VLL and VPLS pseudo wires (PWs),
configure maintenance entity groups (MEGs) and maintenance entities (MEs) on the UPE,
SPE1, and SPE2 and then enable one or more of the continuity check (CC), loss measurement
(LM), and delay measurement (DM) functions. The UPE monitors link connectivity and
performance of the primary and secondary PWs.
MPLS-TP OAM is implemented as follows:
When SPE1 detects a link fault on the primary PW, SPE1 sends a Remote Defect
Indication (RDI) packet to the UPE, instructing the UPE to switch traffic from the
primary PW to the secondary PW. Meanwhile, the UPE sends a MAC Withdraw packet,
in which the value of the PE-ID field is SPE1's ID, to SPE2. After receiving the MAC
Withdraw packet, SPE2 transparently forwards the packet to the NPE and the NPE
deletes the MAC address it has learned from SPE1. After that, the NPE learns a new
MAC address from the secondary PW.
After the primary PW recovers, the UPE switches traffic from the secondary PW back to
the primary PW. Meanwhile, the UPE sends a MAC Withdraw packet, in which the value
of the PE-ID field is SPE2's ID, to SPE1. After receiving the MAC Withdraw packet,
SPE1 transparently forwards the packet to the NPE and the NPE deletes the MAC
address it has learned from SPE2. After that, the NPE learns a new MAC address from
the new primary PW.
Terms
reverse: the direction opposite to the direction in which traffic flows along the
monitored service link.
forward: the direction in which traffic flows along the monitored service link.
path merge LSR: an LSR that receives the traffic transmitted on the protection path in
MPLS OAM protection switching. If the path merge LSR is not the traffic destination, it
merges the traffic transmitted on the protection path back onto the working path. If the
path merge LSR is the traffic destination, it sends the traffic to the upper-layer protocol
for handling.
path switch LSR: an LSR that switches or replicates traffic between the primary service
link and the bypass service link.
user plane: a set of traffic forwarding components through which a traffic flow passes.
An OAM CV or FFD packet is periodically inserted into this traffic flow to monitor the
forwarding component status. In IETF drafts, the user plane is also called the data plane.
ingress: an LSR from which the forward LSP originates and at which the reverse LSP
terminates.
egress: an LSR at which the forward LSP terminates and from which the reverse LSP
originates.
Definition
Multiprotocol Label Switching Transport Profile (MPLS-TP) is a transport technique
that integrates MPLS packet switching with traditional transport network features. MPLS-TP
networks are poised to replace traditional transport networks in the future. MPLS-TP
Operation, Administration, and Maintenance (MPLS-TP OAM) works on the MPLS-TP client
layer. It can effectively detect, identify, and locate faults in the client layer and quickly switch
traffic when links or nodes become defective. OAM is an important part of any plan to reduce
network maintenance expenditures.
Purpose
Both networks and services are part of an ongoing process of transformation and integration.
New services like triple play services, Next Generation Network (NGN) services, carrier
Ethernet services, and Fiber-to-the-x (FTTx) services are constantly emerging from this
process. Such services demand more investment and have higher OAM costs. They require
state of the art QoS, full service access, and high levels of expansibility, reliability, and
manageability of transport networks. Traditional transport network technologies such as
Multi-Service Transfer Platform (MSTP), Synchronous Digital Hierarchy (SDH), or
Wavelength Division Multiplexing (WDM) cannot meet these requirements because they lack
a control plane. Unlike traditional technologies, MPLS-TP does meet these requirements
because it can be used on next-generation transport networks that can process data packets, as
well as on traditional transport networks.
Because traditional transport networks and Optical Transport Network (OTN) networks have high
reliability and maintenance benchmarks, MPLS-TP must provide powerful OAM capabilities.
MPLS-TP OAM provides the following functions:
Fault management
Performance monitoring
Triggering protection switching
Benefits
MPLS-TP OAM can rapidly detect link faults or monitor the connectivity of links, which
helps measure network performance and minimizes OPEX.
If a link fault occurs, MPLS-TP OAM rapidly switches traffic to the standby link to
restore services, which shortens the defect duration and improves network reliability.
MEG
A maintenance entity group (MEG) comprises one or more MEs that are created for a
transport link. If the transport link is a point-to-point bidirectional path, such as a
bidirectional co-routed LSP or pseudo wire (PW), a MEG comprises only one ME.
MEP
A MEP is the source or sink node in a MEG. Figure 1-190 shows ME node deployment.
− For a bidirectional LSP, only the ingress label edge router (LER) and egress LER
can function as MEPs, as shown in Figure 1-190.
− For a PW, only user-end provider edges (UPEs) can function as MEPs.
MEPs trigger and control MPLS-TP OAM operations. OAM packets can be generated or
terminated on MEPs.
Fault Management
Table 1-53 lists the MPLS-TP OAM fault management functions supported by the NE20E.
Continuity check (CC): checks link connectivity periodically.
Connectivity verification (CV): detects forwarding faults continuously.
Loopback (LB): performs loopback tests.
Remote defect indication (RDI): notifies the remote end of defects.
Performance Monitoring
Table 1-54 lists the MPLS-TP OAM performance monitoring functions supported by the
NE20E.
Loss measurement (LM): collects statistics about lost frames. LM includes single-ended
and dual-ended frame loss measurement.
Delay measurement (DM): collects statistics about delays and delay variations (jitter).
DM includes one-way and two-way frame delay measurement.
1.5.5.2 Principles
1.5.5.2.1 Basic Concepts
An MPLS-TP network consists of the section, LSP, and PW layers in bottom-up order. A
lower layer is a server layer, and an upper layer is a client layer. For example, the section
layer is the LSP layer's server layer, and the LSP layer is the section layer's client layer.
On the MPLS-TP network shown in Figure 1-191, MPLS-TP OAM detects and locates faults
in the section, LSP, and PW layers. Table 1-55 describes MPLS-TP OAM components.
MEG end point (MEP): a MEP is the source or sink node in a MEG.
Section layer: each LSR can function as a MEP.
LSP layer: only an LER can function as a MEP. LSRs A, D, E, and G are LERs
functioning as MEPs.
PW layer: only PW terminating provider edge (T-PE) LSRs can function as MEPs.
LSRs A and G are T-PEs functioning as MEPs.
Usage Scenario
MPLS-TP OAM monitors the following types of links:
Static bidirectional co-routed CR-LSPs
Static VLL PWs and VPLS PWs
CC
CC is a proactive OAM operation. It detects LOC faults between any two MEPs in a MEG. A
MEP sends CC messages (CCMs) to a remote MEP (RMEP) at specified intervals. If the RMEP does
not receive a CCM for a period 3.5 times as long as the specified interval, it considers the
connection between the two MEPs faulty. This causes the RMEP to report an alarm and enter
the Down state, and the RMEP triggers automatic protection switching (APS) on both MEPs.
After receiving a CCM from the MEP, the RMEP will clear the alarm and exit the Down state.
CV
CV is also a proactive OAM operation. It enables a MEP to report alarms when unexpected or
error packets are received. For example, if a CV-enabled MEP receives a packet from an LSP
and finds that this packet has been transmitted in error along an LSP, the MEP will report an
alarm indicating a forwarding error.
After receiving CCMs carrying packet count information, both MEPs use the following
formulas to measure near- and far-end packet loss values:
Near-end packet loss value = |TxFCf[tc] - TxFCf[tp]| - |RxFCl[tc] - RxFCl[tp]|
Far-end packet loss value = |TxFCb[tc] - TxFCb[tp]| - |RxFCb[tc] - RxFCb[tp]|
TxFCf[tc], RxFCb[tc], and TxFCb[tc] are the TxFCf, RxFCb, and TxFCb values,
respectively, which are carried in the most recently received CCM. RxFCl[tc] is the local
RxFCl value recorded when the local MEP received the CCM.
TxFCf[tp], RxFCb[tp], and TxFCb[tp] are the TxFCf, RxFCb, and TxFCb values,
respectively, which are carried in the previously received CCM. RxFCl[tp] is the local
RxFCl value recorded when the local MEP received the previous CCM.
tc is the time a current CCM was received.
tp is the time the previous CCM was received.
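The dual-ended calculation above can be sketched as follows. This is an illustrative model in the style of ITU-T Y.1731 dual-ended LM (function and field names are not the product's), assuming each MEP keeps the counter snapshot from the previous CCM:

```python
def dual_ended_loss(prev, curr):
    """Near-/far-end loss from two consecutive CCM counter snapshots.

    prev/curr hold TxFCf, TxFCb, RxFCb (carried in the CCM) and RxFCl
    (the local receive counter recorded when the CCM arrived).
    """
    near = abs(curr["TxFCf"] - prev["TxFCf"]) - abs(curr["RxFCl"] - prev["RxFCl"])
    far = abs(curr["TxFCb"] - prev["TxFCb"]) - abs(curr["RxFCb"] - prev["RxFCb"])
    return near, far

# Peer sent 100 frames toward us, we received 98 of them: 2 lost near end.
prev = {"TxFCf": 100, "TxFCb": 90, "RxFCb": 88, "RxFCl": 97}
curr = {"TxFCf": 200, "TxFCb": 185, "RxFCb": 180, "RxFCl": 195}
```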
To start single-ended loss measurement, the local MEP sends the RMEP a loss measurement message (LMM) carrying TxFCf: the local TxFCl value recorded when the LMM was sent.
After receiving an LMM, the RMEP responds to the local MEP with loss measurement replies
(LMRs) carrying the following information:
TxFCf: equal to the TxFCf value carried in the LMM.
RxFCf: the local RxFCl value recorded when the LMM was received.
TxFCb: the local TxFCl value recorded when the LMR was sent.
Figure 1-193 illustrates proactive single-ended packet loss measurement.
After receiving an LMR, the local MEP uses the following formulas to calculate near- and
far-end packet loss values:
Near-end packet loss value = |TxFCb[tc] - TxFCb[tp]| - |RxFCl[tc] - RxFCl[tp]|
Far-end packet loss value = |TxFCf[tc] - TxFCf[tp]| - |RxFCf[tc] - RxFCf[tp]|
TxFCf[tc], RxFCf[tc], and TxFCb[tc] are the TxFCf, RxFCf, and TxFCb values,
respectively, which are carried in the most recently received LMR. RxFCl[tc] is the local
RxFCl value recorded when the most recent LMR arrives at the local MEP.
TxFCf[tp], RxFCf[tp], and TxFCb[tp] are the TxFCf, RxFCf, and TxFCb values,
respectively, which are carried in the previously received LMR. RxFCl[tp] is the local
RxFCl value recorded when the previous LMR arrived at the local MEP.
tc is the time a current LMR was received.
tp is the time the previous LMR was received.
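The single-ended exchange reduces to the same counter arithmetic. The sketch below (illustrative names, not a product API) takes the counters from two consecutive LMRs received by the local MEP:

```python
def single_ended_loss(prev, curr):
    """Near-/far-end loss from two consecutive LMRs.

    TxFCf, RxFCf, and TxFCb come from the LMR; RxFCl is the local
    receive counter recorded when the LMR arrived.
    """
    near = abs(curr["TxFCb"] - prev["TxFCb"]) - abs(curr["RxFCl"] - prev["RxFCl"])
    far = abs(curr["TxFCf"] - prev["TxFCf"]) - abs(curr["RxFCf"] - prev["RxFCf"])
    return near, far
```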
After the RMEP receives a 1DM, it subtracts the TxTimeStampf value from the RxTimef
value to calculate the delay time:
Frame delay time = RxTimef - TxTimeStampf
The frame delay value can be used to measure the delay variation, which is the absolute difference between two delay time values.
One-way frame delay measurement can only be performed when the two MEPs on both ends
of a link have synchronous time. If these MEPs have asynchronous time, they can only
measure the delay variation.
Upon receipt of the DMR, the local MEP calculates the two-way frame delay time using the
following formula:
Frame delay = RxTimeb (the time the DMR was received) - TxTimeStampf
To obtain a more accurate result, RxTimeStampf and TxTimeStampb are used. RxTimeStampf
indicates the time a DMM is received, and TxTimeStampb indicates the time a DMR is sent.
After the local MEP receives the DMR, it calculates the frame delay time using the following
formula:
Frame delay = (RxTimeb - TxTimeStampf) - (TxTimeStampb - RxTimeStampf)
Two-way frame delay measurement supports both delay and delay variation measurement
even if these MEPs do not have synchronous time. The frame delay time is the round-trip
delay time. If both MEPs have synchronous time, the round-trip delay time can be calculated
by combining the two delay values using the following formulas:
MEP-to-RMEP delay time = RxTimeStampf - TxTimeStampf
RMEP-to-MEP delay time = RxTimeb - TxTimeStampb
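The timestamp arithmetic above can be expressed as a short sketch (function and parameter names are illustrative):

```python
def two_way_delay(tx_stamp_f, rx_stamp_f, tx_stamp_b, rx_time_b):
    """Round-trip frame delay with the RMEP's processing time removed.

    tx_stamp_f: TxTimeStampf, time the DMM was sent by the local MEP
    rx_stamp_f: RxTimeStampf, time the DMM was received by the RMEP
    tx_stamp_b: TxTimeStampb, time the DMR was sent by the RMEP
    rx_time_b:  RxTimeb, time the DMR was received by the local MEP
    """
    return (rx_time_b - tx_stamp_f) - (tx_stamp_b - rx_stamp_f)

def one_way_delays(tx_stamp_f, rx_stamp_f, tx_stamp_b, rx_time_b):
    """MEP-to-RMEP and RMEP-to-MEP delays; meaningful only if both
    MEPs have synchronous time."""
    return rx_stamp_f - tx_stamp_f, rx_time_b - tx_stamp_b
```

Note that the two one-way delays sum to the two-way delay, which is why the round-trip time can be obtained by combining them when the clocks are synchronized.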
After the fault is rectified, the local MEP sets the RDI flag to 0 in CCMs and sends them
to inform the RMEP that the fault is rectified.
The RDI function is associated with the proactive continuity check function and takes effect only
after the continuity check function is enabled.
The RDI function applies only to bidirectional links. In the case of a unidirectional LSP, before RDI
can be used, a reverse path must be bound to the LSP.
1.5.5.2.6 Loopback
Background
On a multiprotocol label switching transport profile (MPLS-TP) network, a virtual circuit may traverse multiple switching devices (nodes), including maintenance association end points (MEPs) and maintenance association intermediate points (MIPs). A fault on any node or link along a virtual circuit may render the entire virtual circuit unavailable, and the fault cannot be located. Loopback (LB) can be configured on a source device (MEP) to detect or locate faults in links between the MEP and a MIP or between MEPs.
Related Concepts
LB and continuity check (CC) are both connectivity monitoring tools on an MPLS-TP network. Table 1-58 describes the differences between CC and LB.
Implementation
The loopback function monitors the connectivity of bidirectional links between a MEP and a
MIP and between MEPs.
A loopback test is initiated on a MEP, and the destination can be set to an RMEP or a MIP.
The loopback test process is as follows:
1. The source MEP sends a loopback message (LBM) to a destination. If a MIP is used as the destination, the TTL in the LBM must be equal to the number of hops from the source to the destination so that the LBM expires at the target MIP. If a MEP is used as the destination, the TTL must be greater than or equal to the number of hops to the destination. The TTL setting prevents the LBM from being discarded before reaching the destination.
2. After the destination receives the LBM, it checks whether the target MIP ID or MEP ID
matches the local MIP ID or MEP ID. If they do not match, the destination discards the
LBM. If they match, the destination responds with a loopback reply (LBR).
3. If the source MEP receives the LBR within a specified period of time, it considers the
destination reachable and the loopback test successful. If the source MEP does not
receive the LBR after the specified period of time elapses, it records a loopback test
timeout and log information that is used to analyze the connectivity failure.
Figure 1-196 illustrates a loopback test. LSRA initiates a loopback test to LSRC on an LSP.
The loopback test process is as follows:
1. LSRA sends LSRC an LBM carrying a specified TTL and a MIP ID. LSRB
transparently transmits the LBM to LSRC.
2. Upon receipt, LSRC determines that the TTL carried in the LBM has expired and checks whether the target MIP ID carried in the LBM matches the local MIP ID. If they do not
match, LSRC discards the LBM. If they match, LSRC responds with an LBR.
3. If LSRA receives the LBR within a specified period of time, it considers LSRC
reachable. If LSRA fails to receive the LBR after a specified period of time elapses,
LSRA considers LSRC unreachable and records log information that is used to analyze
the connectivity failure.
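The TTL-based targeting in the test above can be modeled with a minimal sketch (hypothetical node IDs, not a product interface):

```python
def loopback_test(path_ids, target_id, ttl):
    """Simulate an LBM sent along an LSP.

    path_ids lists the node IDs downstream of the source MEP, in order.
    The LBM is processed by the node where the TTL expires (a terminating
    MEP also processes it when the TTL exceeds the path length); that
    node returns an LBR only if its ID matches the target ID in the LBM.
    """
    receiver = path_ids[min(ttl, len(path_ids)) - 1]
    return receiver == target_id  # True: LBR returned, test succeeds

# LSRA -> LSRB -> LSRC: a TTL of 2 delivers the LBM to LSRC.
```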
1.5.5.3 Applications
1.5.5.3.1 MPLS-TP OAM Application in the IP RAN Layer 2 to Edge Scenario
MPLS-TP OAM is deployed on PEs to maintain and operate MPLS networks. Working at the
MPLS client and server layers, MPLS-TP OAM can effectively detect, identify, and locate
client layer faults and quickly switch traffic if links or nodes become faulty, reducing network
maintenance cost.
Figure 1-197 illustrates an IP RAN in the Layer 2 to edge scenario. The MPLS-TP OAM implementation is as follows:
The BTS, NodeB, BSC, and RNC can be directly connected to an MPLS-TP network.
A TE tunnel between PE1 and PE4 is established. PWs are established over the TE
tunnel to transmit various services.
MPLS-TP OAM is enabled on PE1 and PE4, and OAM parameters are configured on PE1 and PE4 at both ends of a PW. These PEs are enabled to send and receive OAM detection packets, which allows OAM to monitor the PW between PE1 and PE4 and obtain basic PW information. If OAM detects a defect, PE4 sends an RDI packet to PE1 over a reverse tunnel. The PEs notify the user-side BTS, NodeB, RNC, and BSC of fault information so that the user-side devices can use the information to maintain networks.
Service Overview
The operation and maintenance of virtual leased line (VLL) and virtual private LAN service
(VPLS) services require an operation, administration and maintenance (OAM) mechanism.
MultiProtocol Label Switching Transport Profile (MPLS-TP) OAM provides a mechanism to
rapidly detect and locate faults, which facilitates network operation and maintenance and
reduces the network maintenance costs.
Networking Description
As shown in Figure 1-198, a user-end provider edge (UPE) on the access network is
dual-homed to SPE1 and SPE2 on the aggregation network. A VLL supporting access links of
various types is deployed on the access network. A VPLS is deployed on the aggregation
network to form a point-to-multipoint leased line network. Additionally, Fast Protection
Switching (FPS) is configured on the UPE; MPLS tunnel automatic protection switching
(APS) is configured on SPE1 and SPE2 to protect the links between the virtual switching
instances (VSIs) created on the two superstratum provider edges (SPEs).
Feature Deployment
To deploy MPLS-TP OAM to monitor link connectivity of VLL and VPLS pseudo wires
(PWs), configure maintenance entity groups (MEGs) and maintenance entities (MEs) on the
UPE, SPE1, and SPE2, and then enable the continuity check (CC) and loopback (LB) functions as needed. The UPE monitors link connectivity and performance of the primary
and secondary PWs.
MPLS-TP OAM is implemented as follows:
When SPE1 detects a link fault on the primary PW, SPE1 sends a Remote Defect
Indication (RDI) packet to the UPE, instructing the UPE to switch traffic from the
primary PW to the secondary PW. Meanwhile, the UPE sends a MAC Withdraw packet,
in which the value of the PE-ID field is SPE1's ID, to SPE2. After receiving the MAC
Withdraw packet, SPE2 transparently forwards the packet to the NPE and the NPE
deletes the MAC address it has learned from SPE1. After that, the NPE learns a new
MAC address from the secondary PW.
After the primary PW recovers, the UPE switches traffic from the secondary PW back to
the primary PW. Meanwhile, the UPE sends a MAC Withdraw packet, in which the value
of the PE-ID field is SPE2's ID, to SPE1. After receiving the MAC Withdraw packet,
SPE1 transparently forwards the packet to the NPE and the NPE deletes the MAC
address it has learned from SPE2. After that, the NPE learns a new MAC address from
the new primary PW.
Terms
None
Abbreviations
Abbreviation Full Name
AIS Alarm Indication Signal
CC Continuity Check
1.5.6 VRRP
1.5.6.1 Introduction
Definition
The Virtual Router Redundancy Protocol (VRRP) is a fault-tolerant protocol that groups
several routers into a virtual router. If the next hop of a host fails, VRRP switches traffic to
another router, which ensures communication continuity and reliability.
In this document, if a VRRP function supports both IPv4 and IPv6, the implementation of this VRRP
function is the same for IPv4 and IPv6 unless otherwise specified.
VRRP is a fault-tolerant protocol defined in relevant standards. VRRP allows logical devices
to work separately from physical devices and implements route selection among multiple
egress gateways.
On the network shown in Figure 1-199, VRRP is enabled on two routers. One is the master
and the other is the backup. The two routers form a virtual router and this virtual router is
assigned a virtual IP address and a virtual MAC address. Hosts monitor only the presence of
the virtual router. The hosts communicate with devices on other network segments through
the virtual router.
A virtual router consists of a master router and one or more backup routers. Only the master
router forwards packets. If the master router fails, a backup router is elected as the master
router and takes over.
On a multicast or broadcast LAN (for example, an Ethernet), VRRP uses a logical VRRP
gateway to ensure reliability for key links. VRRP prevents service interruptions if a physical
VRRP gateway fails, providing high reliability. VRRP configuration is simple and takes effect without requiring modifications to existing configurations, such as routing protocol configurations.
Purpose
As networks rapidly develop and applications become diversified, various value-added
services, such as Internet Protocol television (IPTV) and video conferencing, have become
widespread. Demands for network infrastructure reliability are increasing, especially in
nonstop network transmission.
Generally, hosts use one default gateway to communicate with external networks. If the
default gateway fails, communication between the hosts and external networks is interrupted.
System reliability can be improved using dynamic routing protocols (such as RIP and OSPF)
or ICMP Router Discovery Protocol (IRDP). However, this method requires complex
configurations and each host must support dynamic routing protocols.
VRRP resolves this issue by enabling several routers to be grouped into a virtual router, also
called a VRRP backup group. In normal circumstances, the master router in the VRRP backup
group functions as a default gateway and provides access services for users. If the master
router fails, VRRP elects a backup router from the VRRP backup group to provide access
services for users.
Hosts on a local area network (LAN) are usually connected to an external network through a
default gateway. When the hosts send packets destined for addresses out of the local network
segment, these packets follow a default route to an egress gateway. A provider edge (PE)
functions as an egress gateway on the network shown in Figure 1-200. The PE forwards
packets to the external network so that the hosts can communicate with the external network.
If the PE fails, the hosts connected to it cannot communicate with the external network. The
communication failure persists even if another router is added to the LAN. This is because most hosts on a LAN support only a single default gateway, which forwards all data packets destined for devices outside the local network segment. Hosts send packets only through the default gateway even when they are connected to multiple routers.
Configuring multiple egress gateways is a common method to prevent communication
interruptions. This method is available only if one of the routes to these egress gateways can be selected. Another method is to use dynamic routing protocols, such as the Routing Information Protocol (RIP), Open Shortest Path First (OSPF), and ICMP Router Discovery Protocol (IRDP). This method is available only if every host runs a dynamic routing protocol and there are no problems in management, security, or operating system support for these protocols.
VRRP prevents communication failures in a better way than the preceding two methods.
VRRP is configured only on routers to implement gateway backup, without any networking
changes or burden on hosts.
Benefits
VRRP offers the following benefits to carriers:
Reliable transmission: A logical VRRP gateway on a multicast or broadcast local area
network (LAN), such as an Ethernet network, ensures reliable transmission over key
links. VRRP helps prevent service interruptions if a link to a physical VRRP gateway
fails.
Flexible applications: A VRRP header is encapsulated into an IP packet. This
implementation allows the association between VRRP and various upper-layer protocols.
Low network overheads: VRRP uses only VRRP Advertisement packets.
VRRP load balancing is classified as multi-gateway or single-gateway load balancing. For details about
VRRP load balancing, see the chapter "VRRP" in HUAWEI NE20E-S2 Universal Service Router
Feature Description - Network Reliability.
1.5.6.2 Principles
1.5.6.2.1 Basic VRRP Concepts
As shown in Figure 1-203, two gateways are grouped to form a virtual gateway, and the user
host uses the virtual gateway's IP address as the default gateway IP address to communicate
with the external network. If the default gateway fails, VRRP elects a new gateway to provide
access services for the user.
Virtual router: also called a VRRP backup group, consists of a master router and one or
more backup routers. A virtual router is a default gateway used by hosts within a shared
local area network (LAN). A virtual router ID and one or more virtual IP addresses
together identify a virtual router.
− Virtual router ID (VRID): ID of a virtual router. Routers with the same VRID form
a virtual router.
− Virtual IP address: IP address of a virtual router. A virtual router can have one or
more virtual IP addresses, which are manually assigned.
− Virtual MAC address: MAC address generated by a virtual router based on a VRID.
A virtual router has one virtual MAC address, in the format of
00-00-5E-00-01-{VRID} (VRRP for IPv4) or 00-00-5E-00-02-{VRID} (VRRP for
IPv6). After a virtual router receives an ARP (VRRP for IPv4) or NS (VRRP for
IPv6) request, it responds to the request with the virtual MAC address rather than
the actual MAC address.
IP address owner: VRRP router that uses the virtual IP address as its interface IP address.
If an IP address owner is available, it functions as the master router.
Primary IP address: IP address selected from actual interface IP addresses, which is
usually the first IP address that is configured. The primary IP address is used as the
source IP address in a VRRP Advertisement packet.
VRRP router: device running VRRP. A VRRP router can join one or more VRRP backup
groups. A VRRP backup group consists of the following VRRP routers:
− Master router: forwards packets and responds to ARP requests.
− Backup router: does not forward packets when the master router is working
properly, but can be elected as the new master router if the master router fails.
Priority: priority of a router in a VRRP backup group. A VRRP backup group elects the
master and backup routers based on router priorities.
VRRP working modes:
− Preemption mode: A backup router with a higher priority than the master router
preempts the Master state.
− Non-preemption mode: When the master router is working properly, a backup
router does not preempt the Master state even if it has a priority higher than the
master router.
VRRP timers:
− Adver_Interval timer: The master router sends a VRRP Advertisement packet each
time the Adver_Interval timer expires. The default timer value is 1 second.
− Master_Down timer: A backup router preempts the Master state after the
Master_Down timer expires. The Master_Down timer value (in seconds) is
calculated using the following equation:
Master_Down timer value = (3 x Adver_Interval timer value) + Skew_Time
where
Skew_Time = (256 - Backup router's priority)/256
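The virtual MAC format and the timer equation above can be expressed as a small sketch (function names are illustrative, not a product API):

```python
def virtual_mac(vrid, ipv6=False):
    """00-00-5E-00-01-{VRID} for VRRP for IPv4, 00-00-5E-00-02-{VRID} for IPv6."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in the range 1-255")
    return "00-00-5E-00-0%d-%02X" % (2 if ipv6 else 1, vrid)

def master_down_interval(adver_interval, priority):
    """Master_Down = 3 x Adver_Interval + Skew_Time,
    where Skew_Time = (256 - priority) / 256 seconds."""
    return 3 * adver_interval + (256 - priority) / 256.0

# A higher-priority backup router has a shorter Master_Down timer, so it
# preempts the Master state sooner than lower-priority backup routers.
```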
VRRP versions include VRRPv2 and VRRPv3. VRRPv2 applies only to IPv4 networks, and
VRRPv3 applies to both IPv4 and IPv6 networks. VRRP is classified as VRRP for IPv4
(VRRP4) or VRRP for IPv6 (VRRP6) by network type. VRRP4 supports both VRRPv2 and
VRRPv3, and VRRP6 supports only VRRPv3.
For an IPv4 network, VRRP packets are encapsulated into IPv4 packets and sent to an IPv4
multicast address assigned to a VRRP4 backup group. In an IPv4 packet header:
The source address is the primary IPv4 address of the interface that sends the packet.
The destination address is 224.0.0.18.
The time to live (TTL) value is 255.
The protocol number is 112.
For an IPv6 network, VRRP packets are encapsulated into IPv6 packets and sent to an IPv6
multicast address assigned to a VRRP6 backup group. In an IPv6 packet header:
The source address is the link-local address of the interface that sends the packet.
The destination address is FF02::12.
The hop count is 255.
The protocol number is 112.
The NE20E allows you to manually switch a VRRP version. VRRP packets refer to VRRPv2 packets, unless otherwise specified in this document.
Field Description
Version: Version number of the VRRP protocol. The value is 2.
Type: Type of the VRRPv2 packet. The value is 1, indicating that the packet is an advertisement packet.
Virtual Rtr ID: Virtual router identifier.
Priority: Priority of the master router in a VRRP backup group.
Count IPv4 Addrs: Number of virtual IPv4 addresses configured for a VRRP backup group.
Auth Type: VRRPv2 packet authentication type. VRRPv2 defines the following authentication types:
0: Non Authentication, indicating that authentication is not performed.
1: Simple Text Password, indicating that simple authentication is performed.
2: IP Authentication Header, indicating that MD5 authentication is performed.
Adver Int: Interval at which VRRPv2 packets are sent, in seconds.
Checksum: 16-bit checksum, used to check the data integrity of the VRRPv2 packet.
IPv4 Address: Virtual IPv4 address configured for a VRRP backup group.
Authentication Data: Authentication key in the VRRPv2 packet. This field applies only when simple or MD5 authentication is used. For other authentication types, this field is fixed to 0.
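The Checksum field holds the standard 16-bit one's-complement Internet checksum, computed over the VRRPv2 message with the Checksum field set to zero. A minimal sketch (illustrative, not the product implementation):

```python
def vrrp_checksum(message):
    """Internet checksum over a VRRPv2 message (Checksum field zeroed)."""
    if len(message) % 2:          # pad to a 16-bit boundary
        message += b"\x00"
    total = 0
    for i in range(0, len(message), 2):
        total += (message[i] << 8) | message[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return (~total) & 0xFFFF

# Version 2/Type 1, VRID 1, Priority 100, one virtual IP (192.0.2.1),
# no authentication, Adver Int 1 second, Checksum bytes zeroed:
msg = bytes([0x21, 0x01, 0x64, 0x01, 0x00, 0x01, 0x00, 0x00,
             0xC0, 0x00, 0x02, 0x01])
```

Verifying a received message with its checksum in place yields 0, which is how a receiver checks data integrity.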
As shown in Figure 1-204 and Figure 1-205, the main differences between VRRPv2 and
VRRPv3 are as follows:
VRRPv2 supports authentication, whereas VRRPv3 does not.
VRRPv2 supports a second-level interval between sending VRRP Advertisement packets,
whereas VRRPv3 supports a centisecond-level interval.
Master: A router in the Master state provides the following functions:
− Sends a VRRP Advertisement packet each time the Adver_Interval timer expires.
− Responds to an ARP request with an ARP reply carrying the virtual MAC address.
− Forwards IP packets sent to the virtual MAC address.
The master router changes its status as follows:
− Changes from Master to Backup if the VRRP priority in a received VRRP Advertisement packet is higher than the local VRRP priority.
− Remains in the Master state if the VRRP priority in a received VRRP Advertisement packet is lower than the local VRRP priority.
NOTE
If devices in a VRRP backup group are in the Master state and a device receives a VRRP Advertisement packet with the same priority as the local VRRP priority, the device compares the IP address in the packet with the local IP address. If the IP address in the packet is greater than the local IP address, the device switches to the Backup state. If the IP address in the packet is less than or equal to the local IP address, the device remains in the Master state.
Backup: A router in the Backup state provides the following functions:
− Receives VRRP Advertisement packets from the master router and checks whether the master router is working properly based on information in the packets.
− Does not respond to an ARP request carrying a virtual IP address.
− Discards IP packets sent to the virtual MAC address.
− Discards IP packets sent to virtual IP addresses.
A backup router changes its status as follows:
− Changes from Backup to Master after it receives a Master_Down timer timeout event.
− Changes from Backup to Initialize after it receives a Shutdown event, indicating that the VRRP-enabled interface has been shut down.
− If, in preemption mode, it receives a VRRP Advertisement packet carrying a VRRP priority lower than the local VRRP priority, it preempts the Master state after a specified preemption delay.
− If, in non-preemption mode, it receives a VRRP Advertisement packet carrying a VRRP priority lower than the local VRRP priority, it remains in the Backup state.
− Resets the Master_Down timer but does not compare IP addresses if it receives a VRRP Advertisement packet carrying a VRRP priority higher than or equal to the local VRRP priority.
If multiple VRRP routers enter the Master state at the same time, they exchange VRRP
Advertisement packets to determine the master or backup role. The VRRP router with the highest
priority remains in the Master state, and VRRP routers with lower priorities switch to the Backup
state. If these routers have the same priority, the router whose VRRP-enabled interface has the largest primary IP address becomes the master router.
If a VRRP router is the IP address owner, it immediately switches to the Master state after receiving
a Startup event.
If network congestion occurs, a backup router may not receive VRRP Advertisement packets from the
master router. If this situation occurs, the backup router proactively switches to the Master state. If the
new master router receives a VRRP Advertisement packet from the original master router, the new
master router will switch back to the Backup state. As a result, the routers in the VRRP backup group
frequently switch between Master and Backup. You can configure a preemption delay to resolve this
issue. After the configuration is complete, the backup router with the highest priority switches to the
Master state only when all of the following conditions are met:
VRRP Authentication
VRRP supports different authentication modes and keys in VRRP Advertisement packets that
meet various network security requirements.
On secure networks, you can use the non-authentication mode. In this mode, a device does not add authentication information to VRRP Advertisement packets before sending them. After a peer device receives VRRP Advertisement packets, it does not authenticate them either; it considers them authentic and valid.
On insecure networks, you can use the simple or message digest algorithm 5 (MD5)
authentication mode.
Master/Backup Mode
A VRRP backup group comprises a master router and one or more backup routers. As shown
in Figure 1-207, Device A is the master router and forwards packets, and Device B and
Device C are backup routers and monitor Device A's status. If Device A fails, Device B or
Device C is elected as a new master router and takes over services from Device A.
3. After Device A recovers, it enters the Backup state (its priority remains 120). After
receiving a VRRP Advertisement packet from Device B, the current master, Device A
finds that its priority is higher than that of Device B. Therefore, Device A preempts the
Master state after the preemption delay elapses, and sends VRRP Advertisement packets
and gratuitous ARP packets.
After receiving a VRRP Advertisement packet from Device A, Device B finds that its
priority is lower than that of Device A and changes from the Master state to the Backup
state. User traffic then switches to the original path Device E -> Device A -> Device D.
As shown in Figure 1-208, VRRP backup groups 1 and 2 are deployed on the network.
− VRRP backup group 1: Device A is the master router, and Device B is the backup
router.
− VRRP backup group 2: Device B is the master router, and Device A is the backup
router.
VRRP backup groups 1 and 2 back up each other and serve as gateways for different
users, therefore load-balancing service traffic.
As shown in Figure 1-209, VRRP backup groups 1 and 2 are deployed on the network.
− VRRP backup group 1: an LBRG. Device A is the master router, and Device B is the
backup router.
− VRRP backup group 2: an LBRG member group. Device B is the master router, and
Device A is the backup router.
VRRP backup group 1 serves as a gateway for all users. After receiving an ARP request
packet from a user, VRRP backup group 1 returns an ARP response packet and
encapsulates its virtual MAC address or VRRP backup group 2's virtual MAC address in
the response.
1.5.6.2.5 mVRRP
Principles
A switch is dual-homed to two routers at the aggregation layer on a metropolitan area network
(MAN). Multiple VRRP backup groups can be configured on the two routers to transmit
various types of services. Because each VRRP backup group must maintain its own state
machine, a large number of VRRP Advertisement packets are transmitted between the routers.
To help reduce bandwidth and CPU resource consumption during VRRP packet transmission,
a VRRP backup group can be configured as a management Virtual Router Redundancy
Protocol (mVRRP) backup group. Other VRRP backup groups are bound to the mVRRP
backup group and become service VRRP backup groups. Only the mVRRP backup group
sends VRRP packets to negotiate the master/backup status. The mVRRP backup group
determines the master/backup status of service VRRP backup groups.
As shown in Figure 1-210, an mVRRP backup group can be deployed on the same side as
service VRRP backup groups or on the interfaces that directly connect Device A and Device
B.
Related Concepts
mVRRP backup group: has all functions of a common VRRP backup group. Different from a
common VRRP backup group, an mVRRP backup group can be tracked by service VRRP
backup groups and determine their statuses. An mVRRP backup group provides the following
functions:
When the mVRRP backup group functions as a gateway, it determines the master/backup
status of devices and transmits services. In this situation, a common VRRP backup group
with the same ID as the mVRRP backup group must be created and assigned a virtual IP
address. The mVRRP backup group's virtual IP address is a gateway IP address set by
users.
When the mVRRP backup group does not function as a gateway, it determines the
master/backup status of devices but does not transmit services. In this situation, the
mVRRP backup group does not require a virtual IP address. You can create an mVRRP
backup group directly on interfaces to simplify maintenance.
Service VRRP backup group: After common VRRP backup groups are bound to an mVRRP
backup group, they become service VRRP backup groups. Service VRRP backup groups do
not need to send VRRP packets to determine their states. The mVRRP backup group sends
VRRP packets to determine its state and the states of all its bound service VRRP backup
groups. A service VRRP backup group can be bound to an mVRRP backup group in either of
the following modes:
Flowdown: The flowdown mode applies to networks on which both upstream and
downstream packets are transmitted over the same path. If the master device in an
mVRRP backup group enters the Backup or Initialize state, the VRRP module instructs
all service VRRP backup groups that are bound to the mVRRP backup group in
flowdown mode to enter the Initialize state.
Unflowdown: The unflowdown mode applies to networks on which upstream and
downstream packets can be transmitted over different paths. If the mVRRP backup group
enters the Backup or Initialize state, the VRRP module instructs all service VRRP
backup groups that are bound to the mVRRP backup group in unflowdown mode to enter
the same state.
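The two binding modes can be summarized in a sketch (state names as strings; illustrative only, not a product interface):

```python
def service_group_state(mvrrp_state, mode):
    """State a bound service VRRP backup group takes on, given the mVRRP
    backup group's state and the binding mode ('flowdown'/'unflowdown')."""
    if mvrrp_state in ("Backup", "Initialize"):
        # flowdown forces Initialize; unflowdown mirrors the mVRRP state
        return "Initialize" if mode == "flowdown" else mvrrp_state
    return mvrrp_state  # Master: service groups are Master in both modes
```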
Multiple service VRRP backup groups can be bound to an mVRRP backup group. However, an mVRRP backup group cannot serve as a service VRRP backup group bound to another mVRRP backup group.
If a physical interface on which a service VRRP backup group is configured goes Down, the status of the
service VRRP backup group becomes Initialize, irrespective of the status of the mVRRP backup group.
Benefits
mVRRP offers the following benefits:
Simplified management. An mVRRP backup group determines the master/backup status
of service VRRP backup groups.
Reduced CPU and bandwidth resource consumption. Service VRRP backup groups do
not need to send VRRP packets.
Background
Virtual Router Redundancy Protocol (VRRP) can monitor the status change only in the
VRRP-enabled interface on the master device. If a VRRP-disabled interface on the master
device or the uplink connecting the interface to a network fails, VRRP cannot detect the fault,
which causes traffic interruptions.
To resolve this issue, configure VRRP to monitor the VRRP-disabled interface status. If a
VRRP-disabled interface on the master device or the uplink connecting the interface to a
network fails, VRRP instructs the master device to reduce its priority to trigger a
master/backup VRRP switchover.
Related Concepts
If a VRRP-disabled interface of a VRRP device goes Down, the VRRP device changes its
VRRP priority in either of the following modes:
Increased mode: The VRRP device increases its VRRP priority by a specified value.
Reduced mode: The VRRP device reduces its VRRP priority by a specified value.
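The two adjustment modes can be sketched as follows. This is an illustration, not device firmware; the function name and the clamping to the configurable VRRP priority range 1-254 (0 and 255 are reserved values) are assumptions for the sketch:

```python
def adjusted_priority(base_priority: int, delta: int, mode: str) -> int:
    """Return the new VRRP priority after a tracked interface goes Down.

    mode: "increased" raises the priority by delta; "reduced" lowers it.
    The result is clamped to the configurable VRRP range 1-254.
    """
    if mode == "increased":
        new = base_priority + delta
    elif mode == "reduced":
        new = base_priority - delta
    else:
        raise ValueError("mode must be 'increased' or 'reduced'")
    return max(1, min(254, new))
```

For example, a device with priority 120 that reduces its priority by 40 ends up at priority 80, which may then be lower than a peer's priority and trigger a switchover.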
Implementation
As shown in Figure 1-211, a VRRP backup group is configured on Device A and Device B.
Device A is the master device, and Device B is the backup device.
Device A is configured to monitor interface 1. If interface 1 fails, Device A reduces its VRRP
priority and sends a VRRP Advertisement packet carrying a reduced priority. After Device B
receives the packet, it checks that its VRRP priority is higher than the received priority and
preempts the Master state.
After interface 1 goes Up, Device A restores the VRRP priority. After Device A receives a
VRRP Advertisement packet carrying Device B's priority in preemption mode, Device A
checks that its VRRP priority is higher than the received priority and preempts the Master
state.
Benefits
The association between VRRP and a VRRP-disabled interface helps trigger a master/backup
VRRP switchover if the VRRP-disabled interface fails or the uplink connecting the interface
to a network fails.
Background
Devices in a VRRP backup group exchange VRRP Advertisement packets to negotiate the
master/backup status and implement backup. If the link between devices in a VRRP backup
group fails, VRRP Advertisement packets cannot be exchanged to negotiate the
master/backup status. A backup device attempts to preempt the Master state only after a period
three times the interval at which VRRP Advertisement packets are sent.
During this period, user traffic is still forwarded to the master device, which results in user
traffic loss.
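The waiting period described above can be sketched using the timer defined in RFC 3768, which additionally adds a priority-dependent skew time so that higher-priority backups wait slightly less; the exact timer on a given device may differ, so treat this as an illustration:

```python
def master_down_interval(adv_interval_s: float, backup_priority: int) -> float:
    """Seconds a backup waits before declaring the master down.

    Per RFC 3768: Skew_Time = (256 - priority) / 256, and
    Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time.
    """
    skew_time = (256 - backup_priority) / 256.0
    return 3 * adv_interval_s + skew_time
```

With a 1-second advertisement interval and a backup priority of 100, the backup waits roughly 3.6 seconds, which is why BFD-based detection is needed for sub-second switchovers.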
Bidirectional Forwarding Detection (BFD) can rapidly detect faults in links or IP routes. BFD
for VRRP enables a master/backup VRRP switchover to be completed within 1 second,
preventing user traffic loss. A BFD session is established between the master and backup
devices in a VRRP backup group and is bound to the VRRP backup group. BFD immediately
detects communication faults in the VRRP backup group and instructs the VRRP backup
group to perform a master/backup switchover, minimizing service interruptions.
Figure 1-212 Association between a VRRP backup group and a common BFD session
Association Between a VRRP Backup Group and Link and Peer BFD Sessions
As shown in Figure 1-213, the master and backup devices monitor the status of link and peer
BFD sessions to identify local or remote faults.
Device A and Device B run VRRP. A peer BFD session is established between Device A and
Device B to detect link and device failures. Link BFD sessions are established between
Device A and Device E and between Device B and Device E to detect link and device failures.
After Device B detects that the peer BFD session goes Down while the Link2 BFD session
remains Up, Device B's VRRP status changes from Backup to Master, and Device B takes over.
Figure 1-213 Association between a VRRP backup group and link and peer BFD sessions
A Link2 fault does not affect Device A's VRRP status, and Device A continues to forward upstream
traffic. However, Device B's VRRP status becomes Master if both the peer BFD session and Link2 BFD
session go Down, and Device B detects the peer BFD session status change before detecting the Link2
BFD session status change. After Device B detects the Link2 BFD session status change, Device B's
VRRP status becomes Initialize.
Figure 1-214 shows the state machine for the association between a VRRP backup group and
link and peer BFD sessions.
Figure 1-214 State machine for the association between a VRRP backup group and link and peer
BFD sessions
The preceding process shows that, after link and peer BFD for VRRP is deployed, the backup
device immediately preempts the Master state if a fault occurs. Link and peer BFD for VRRP
implements a millisecond-level master/backup VRRP switchover.
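The decision logic described above can be summarized in a short sketch; the function and state names are illustrative assumptions based on the behavior described, not the device's actual state machine:

```python
def vrrp_state_on_bfd_change(peer_bfd_up: bool, link_bfd_up: bool,
                             is_master: bool) -> str:
    """Illustrative decision for a device tracking link and peer BFD sessions.

    - Own link BFD session Down: the device cannot forward, so it enters
      Initialize regardless of its previous role.
    - Peer BFD session Down but own link Up: the backup immediately
      preempts the Master state.
    - Otherwise the device keeps its current role.
    """
    if not link_bfd_up:
        return "Initialize"
    if not peer_bfd_up:
        return "Master"  # backup takes over without waiting for timers
    return "Master" if is_master else "Backup"
```

This mirrors the note above: a backup whose own link BFD session is still Up becomes Master when the peer session fails, but enters Initialize once its own link session also goes Down.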
Benefits
BFD for VRRP speeds up master/backup VRRP switchovers if faults occur.
Principles
Metro Ethernet solutions use Virtual Router Redundancy Protocol (VRRP) tracking
Bidirectional Forwarding Detection (BFD) to detect link faults and protect links between the
master and backup network provider edges (NPEs) and between NPEs and user-end provider
edges (UPEs). If UPEs do not support BFD, Metro Ethernet solutions cannot use VRRP
tracking BFD. If UPEs support 802.3ah, Metro Ethernet solutions can use 802.3ah as a
substitute for BFD to detect link faults and protect links between NPEs and UPEs. Ethernet
operation, administration and maintenance (OAM) technologies, such as Ethernet in the First
Mile (EFM) OAM defined in IEEE 802.3ah, provide functions, such as link connectivity
detection, link failure monitoring, remote failure notification, and remote loopback for links
between directly connected devices.
Implementation
EFM can detect only local link failures. If the link between the UPE and NPE1 fails, NPE2 cannot detect
the failure. NPE2 has to wait three VRRP Advertisement packet transmission intervals before it switches
to the Master state. During this period, upstream service traffic is interrupted. To speed up
master/backup VRRP switchovers and minimize the service interruption time, also configure
VRRP to track the peer BFD session.
Figure 1-215 shows a network on which VRRP tracking EFM is configured. NPE1 and NPE2
are configured to belong to a VRRP backup group. A peer BFD session is configured to detect
the faults on the two NPEs and on the link between the two NPEs. An EFM session is
configured between the UPE and NPE1 and between the UPE and NPE2 to detect the faults
on the UPE and NPEs and on the links between the UPE and NPEs. The VRRP backup group
determines the VRRP status of NPEs based on the link status reported by EFM and the peer
BFD session.
In normal circumstances, if the link between the UPE and NPE2 fails, NPE1 remains in the Master state
and continues to forward upstream traffic. However, NPE2's VRRP status changes to Master if NPE2
detects the Down state of the peer BFD session before it detects the Discovery state of the link between
itself and the UPE. After NPE2 detects the Discovery state of the link between itself and the UPE,
NPE2's VRRP status changes from Master to Initialize.
Figure 1-216 shows the state machine for VRRP tracking EFM.
Benefits
VRRP tracking EFM facilitates master/backup VRRP switchovers on a network on which
UPEs do not support BFD but support 802.3ah.
Principles
Virtual Router Redundancy Protocol (VRRP) tracking Ethernet in the First Mile (EFM)
effectively facilitates link fault detection on a network on which UPEs do not support
Bidirectional Forwarding Detection (BFD). However, EFM can detect faults only on
single-hop links. As shown in Figure 1-217, EFM cannot detect faults on the link between
UPE2 and NPE1 or between UPE2 and NPE2.
Implementation
CFM can detect only local link failures. If the link between UPE2 and NPE1 fails, NPE2 cannot detect
the failure. NPE2 has to wait three VRRP Advertisement packet transmission intervals before it switches
to the Master state. During this period, upstream service traffic is interrupted. To speed up
master/backup VRRP switchovers and minimize the service interruption time, also configure
VRRP to track the peer BFD session.
Figure 1-218 shows a network on which VRRP tracks CFM and the peer BFD session.
In normal circumstances, if the link between UPE2 and NPE2 fails, NPE1 remains in the Master state
and continues to forward upstream traffic. However, NPE2's VRRP status changes to Master if NPE2
detects the Down state of the peer BFD session before it detects the Down state of the link between itself
and UPE2. After NPE2 detects the Down state of the link between itself and UPE2, NPE2's VRRP status
changes from Master to Initialize.
Figure 1-219 shows the state machine for VRRP tracking CFM.
Benefits
VRRP tracking CFM prevents service interruptions caused by dual master devices in a VRRP
backup group and facilitates master/backup VRRP switchovers.
Background
To improve network reliability, VRRP can be configured on a device to track the following
objects:
Interface
EFM session
BFD session
Failure of a tracked object can trigger a rapid master/backup VRRP switchover to ensure
service continuity.
In Figure 1-220, however, if Interface 2 on Device C goes Down and its IP address (20.1.1.1)
becomes unreachable, VRRP is unable to detect the fault. As a result, user traffic is dropped.
To resolve the preceding issue, you can associate VRRP with network quality analysis (NQA).
Using test instances, NQA sends probe packets to check the reachability of destination IP
addresses. After VRRP is associated with an NQA test instance, VRRP tracks the NQA test
instance to implement rapid master/backup VRRP switchovers. For the example shown in the
preceding figure, you can configure an NQA test instance on Device A to check whether the
IP address 20.1.1.1 of Interface 2 on Device C is reachable.
VRRP association with an NQA test instance is required on only the local device (Device A).
Implementation
You can configure VRRP association with an NQA test instance to track a gateway router's
uplink, which is a cross-device link. If the uplink fails, NQA instructs VRRP to reduce the
gateway router's priority by a specified value. Reducing the priority enables another gateway
router in the VRRP backup group to take over services and become the master, thereby
ensuring communication continuity between hosts on the LAN served by the gateway and the
external network. After the uplink recovers, NQA instructs VRRP to restore the gateway
router's priority.
Figure 1-221 illustrates VRRP association with an NQA test instance.
Benefits
VRRP association with NQA implements a rapid master/backup VRRP switchover if a
cross-device uplink fails.
Background
To improve device reliability, two user gateways working in master/backup mode are
connected to a network, and VRRP is enabled on these gateways to determine their
master/backup status. If a VRRP backup group has been configured and an uplink route to a
network becomes unreachable, access-side users still use the VRRP backup group to forward
traffic along the uplink route, which causes user traffic loss.
Association between a VRRP backup group and a route can prevent user traffic loss. A VRRP
backup group can be configured to track the uplink route to a network. If the route is
withdrawn or becomes inactive, the route management (RM) module notifies the VRRP
backup group of the change. After receiving the notification, the VRRP backup group changes
its master device's VRRP priority and performs a master/backup switchover. This process
ensures that user traffic can be forwarded along a properly functioning link.
Implementation
A VRRP backup group can be associated with an uplink route to a network to determine
whether the route is reachable. If the uplink route is withdrawn or becomes inactive after the
uplink goes Down or the network topology changes, hosts on a local area network (LAN) fail
to access the external network through gateways. The RM module notifies the VRRP backup
group of the route status change. The VRRP priority of the master device decreases by a
specified value. A backup device with a priority higher than others preempts the Master state
and takes over traffic. This process ensures communication continuity between these hosts
and the external network. After the uplink recovers, the RM module instructs the VRRP
backup group to restore the master device's VRRP priority.
As shown in Figure 1-222, a VRRP backup group is configured on Device A (master) and
Device B (backup), with Device A forwarding user traffic. The VRRP backup group on
Device A is associated with the route 100.1.2.0/24.
When the uplink from Device A to Device C goes Down, the route 100.1.2.0/24 becomes
unreachable and Device A's VRRP priority decreases. Because Device A's reduced VRRP
priority is lower than Device B's VRRP priority, Device B preempts the Master state and takes
over, which prevents user traffic loss.
3. After the uplink recovers, the route 100.1.2.0/24 becomes reachable again, and Device A
restores its VRRP priority and preempts the Master state. After Device B receives VRRP
Advertisement packets and determines that its priority is lower than that of Device A,
Device B returns to the Backup state.
4. Device A in the Master state forwards user traffic, and Device B remains in the Backup
state.
The preceding process shows that the VRRP backup group performs a master/backup
switchover if the uplink route is unreachable.
Benefits
Association between a VRRP backup group and a route helps implement a master/backup
VRRP switchover when an uplink route to a network is unreachable. The association also
ensures that the VRRP backup group performs a traffic switchback and minimizes traffic
downtime.
Background
A VRRP backup group is configured on Device1 and Device2 on the network shown in Figure
1-223. Device1 is a master device, whereas Device2 is a backup device. The VRRP backup
group serves as a gateway for users. User-to-network traffic travels through Device1.
However, network-to-user traffic may travel through Device1, Device2, or both of them over
a path determined by a dynamic routing protocol. Therefore, user-to-network traffic and
network-to-user traffic may travel along different paths, which interrupts services if firewalls
are attached to devices in the VRRP backup group, complicates traffic monitoring or statistics
collection, and increases costs.
To address the preceding problems, the routing protocol is expected to select a route passing
through the master device so that the user-to-network and network-to-user traffic travels along
the same path. Association between direct routes and a VRRP backup group can meet
expectations by allowing the dynamic routing protocol to select a route based on the VRRP
status.
Figure 1-223 Association between direct routes and a VRRP backup group
Related Concepts
Direct route: a 32-bit host route or a network segment route that is generated after a device
interface is assigned an IP address and its protocol status is Up. A device automatically
generates direct routes without using a routing algorithm.
Implementation
Association between direct routes and a VRRP backup group allows VRRP interfaces to
adjust the costs of direct network segment routes based on the VRRP status. The direct route
with the master device as the next hop has the lowest cost. A dynamic routing protocol
imports the direct routes and selects the direct route with the lowest cost. For example, VRRP
interfaces on Device1 and Device2 on the network shown in Figure 1-223 are configured with
association between direct routes and the VRRP backup group. The implementation is as
follows:
Device1 in the Master state sets the cost of its route to the directly connected virtual IP
network segment to 0 (default value).
Device2 in the Backup state increases the cost of its route to the directly connected
virtual IP network segment.
A dynamic routing protocol selects the route with Device1 as the next hop because this route
costs less than the other route. Therefore, both user-to-network and network-to-user traffic
travels through Device1.
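The cost adjustment above can be sketched as follows; the backup's cost increase value is an arbitrary illustration, not a device default:

```python
def direct_route_cost(vrrp_state: str, backup_cost_increase: int = 10) -> int:
    """Cost a VRRP interface assigns to its direct route to the virtual IP
    network segment.

    The master keeps the default cost of 0; a backup raises the cost so
    that a dynamic routing protocol importing both routes prefers the
    route through the master.
    """
    return 0 if vrrp_state == "Master" else backup_cost_increase

# A dynamic routing protocol then simply picks the lowest-cost next hop:
routes = {"Device1": direct_route_cost("Master"),
          "Device2": direct_route_cost("Backup")}
best_next_hop = min(routes, key=routes.get)
```

Because the route through Device1 (Master) has the lower cost, both traffic directions converge on Device1.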
Usage Scenario
In data center scenarios, firewalls are attached to devices in a VRRP backup group to
improve network security. Network-to-user traffic cannot pass through a firewall if it travels
over a path different than the one used by user-to-network traffic.
On an IP radio access network (RAN), VRRP is configured to set the
master/backup status of aggregation site gateways (ASGs) and radio service gateways (RSGs).
Network-to-user and user-to-network traffic may pass through different paths, complicating
network operation and management.
Association between direct routes and a VRRP backup group can address the preceding
problems by ensuring the user-to-network and network-to-user traffic travels along the same
path.
Principles
As shown in Figure 1-224, the base station attached to the cell site gateway (CSG) on a
mobile bearer network accesses aggregation nodes PE1 and PE2 over primary and secondary
pseudo wires (PWs) and accesses PE3 and PE4 over primary and secondary links. PE3 and
PE4 are configured to belong to a Virtual Router Redundancy Protocol (VRRP) backup group.
If PE1 fails, traffic switches from the primary link to the secondary link. Before a
master/backup VRRP switchover is complete, service traffic is temporarily interrupted.
To meet carrier-class reliability requirements, configure devices in the VRRP backup group to
forward traffic even when they are in the Backup state. This configuration can prevent traffic
interruptions in the preceding scenario.
Implementation
As shown in Figure 1-224, upstream traffic travels along the path CSG -> PE1 -> PE3 ->
RNC1/RNC2 in normal circumstances. PE3 is in the Master state, and PE4 in the Backup
state.
If PE1 fails, traffic switches from the primary link between PE1 and PE3 to the secondary link
between PE2 and PE4. Because a primary/secondary link switchover is faster than a
master/backup VRRP switchover:
If PE4 cannot forward traffic, service traffic is temporarily interrupted before the
master/backup VRRP switchover is complete.
If PE4 can forward traffic, PE4 takes over service traffic forwarding even if the
master/backup VRRP switchover is not complete.
Benefits
Traffic forwarding by a backup device improves master/backup VRRP switchover
performance and reduces the service interruption time.
Principles
On the network shown in Figure 1-225, VRRP-enabled NPEs are connected to user-side PEs
through active and standby links. User traffic travels over the active link to the master NPE1,
and NPE1 forwards user traffic to the Internet. If NPE1 is working properly, user traffic
travels over the path UPE -> PE1 -> NPE1. If the active link or NPE1's interface 1 tracked by
the VRRP backup group fails, an active/standby link switchover and a master/backup VRRP
switchover are implemented. After the switchovers, user traffic switches to the path UPE ->
PE1 -> PE2 -> NPE2. After the fault is rectified, an active/standby link switchback and a
master/backup VRRP switchback are implemented. If the active link becomes active before
the original master device restores the Master state, user traffic is interrupted.
To prevent user traffic interruptions, the rapid VRRP switchback function is used to allow the
original master device to switch from the Backup state to the Master state immediately after
the fault is rectified.
Related Concept
A VRRP switchback is a process during which the original master device switches its status
from Backup to Master after a fault is rectified.
Implementation
Rapid VRRP switchback allows the original master device to switch its status from Backup to
Master without using VRRP Advertisement packets to negotiate the status. For example, on
the network shown in Figure 1-225, device configurations are as follows:
A common VRRP backup group is configured on NPE1 and NPE2 that run VRRP. An
mVRRP backup group is configured on directly connected interfaces of NPE1 and NPE2.
The common VRRP backup group is bound to the mVRRP backup group and becomes a
service VRRP backup group. The mVRRP backup group determines the master/backup
status of the service VRRP backup group.
NPE1 has a VRRP priority of 120 and works in the Master state in the mVRRP backup
group.
NPE2 has a VRRP priority of 100 and works in the Backup state in the mVRRP backup
group.
NPE1 tracks interface 1 and reduces its priority by 40 if interface 1 goes Down.
The rapid VRRP switchback process is as follows:
1. If NPE1 is working properly, NPE1 periodically sends VRRP Advertisement packets to
notify NPE2 of the Master state. NPE1 tracks interface 1 connected to the active link.
2. If the active link or interface 1 fails, interface 1 goes Down. The service VRRP backup
group on NPE1 is in the Initialize state. NPE1 reduces its mVRRP priority to 80 (120 -
40). As a result, the mVRRP priority of NPE2 is higher than that of NPE1, and NPE2
immediately preempts the Master state. NPE2 then sends a VRRP Advertisement packet
carrying a higher priority than that of NPE1. After receiving the packet, the mVRRP
backup group on NPE1 stops sending VRRP Advertisement packets and enters the
Backup state. The status of the service VRRP backup group is the same as that of the
mVRRP backup group on NPE2. User traffic switches to the path UPE -> PE1 -> PE2 ->
NPE2.
3. After the fault is rectified, interface 1 goes Up and NPE1 increases its VRRP priority to
120 (80 + 40). NPE1 immediately preempts the Master state and sends VRRP
Advertisement packets to NPE2. User traffic switches back to the path UPE -> PE1 ->
NPE1.
If rapid VRRP switchback is not configured and NPE1 restores its priority to 120, NPE1 has to wait
until it receives VRRP Advertisement packets carrying a lower priority than its own priority from NPE2
before preempting the Master state.
4. NPE1 then sends VRRP Advertisement packets carrying a higher priority than NPE2's
priority. After receiving the VRRP Advertisement packets, NPE2 enters the Backup state.
Both NPE1 and NPE2 restore their previous status.
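The priority arithmetic in the steps above can be traced with a short sketch using the example's values (priority 120, reduction 40, peer priority 100); the class is an illustration, not device code:

```python
class TrackedVrrpDevice:
    """Illustrative model of a VRRP device that tracks an interface."""

    def __init__(self, priority: int, reduce_by: int):
        self.priority = priority
        self.reduce_by = reduce_by

    def interface_down(self) -> int:
        """Tracked interface fails: reduce the priority."""
        self.priority -= self.reduce_by
        return self.priority

    def interface_up(self) -> int:
        """Tracked interface recovers: restore the priority."""
        self.priority += self.reduce_by
        return self.priority

npe1 = TrackedVrrpDevice(priority=120, reduce_by=40)
npe2_priority = 100

npe1.interface_down()                                  # 120 - 40 = 80
npe2_preempts = npe1.priority < npe2_priority          # NPE2 becomes Master

npe1.interface_up()                                    # 80 + 40 = 120
npe1_preempts_back = npe1.priority > npe2_priority     # NPE1 switches back
```

With rapid switchback, NPE1 preempts the Master state as soon as its priority again exceeds NPE2's, without first waiting for an Advertisement packet from NPE2.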
Usage Scenario
Rapid VRRP switchback applies to a specific network with all of the following
characteristics:
The master device in an mVRRP backup group tracks a VRRP-disabled interface or
feature and reduces its VRRP priority if the interface or feature status becomes Down.
Devices in a VRRP backup group are connected to user-side devices over the active and
standby links.
Benefits
Rapid VRRP switchback speeds up a VRRP switchback after a fault is rectified.
1.5.6.3 Applications
1.5.6.3.1 IPRAN Gateway Protection Solution
Service Overview
NodeBs and radio network controllers (RNCs) on an IP radio access network (IPRAN) do not
have dynamic routing capabilities. Static routes must be configured to allow NodeBs to
communicate with access aggregation gateways (AGGs) and allow RNCs to communicate
with radio service gateways (RSGs) at the aggregation level. To ensure that various
value-added services, such as voice, video, and cloud computing, are not interrupted on
mobile bearer networks, a VRRP backup group can be deployed to implement gateway
redundancy. When the master device in a VRRP backup group goes Down, a backup device
takes over, ensuring normal service transmission and enhancing device reliability at the
aggregation layer.
Networking Description
Figure 1-226 shows the network for the IPRAN gateway protection solution. A NodeB is
connected to AGGs over an access ring or is dual-homed to two AGGs. The cell site gateways
(CSGs) and AGGs are connected using the pseudo wire emulation edge-to-edge (PWE3)
technology, which ensures connection reliability. Two VRRP backup groups can be
configured on the AGGs and RSGs to implement gateway backup for the NodeB and RNC,
respectively.
Feature Deployment
Table 1-63 describes VRRP-based gateway protection applications on an IPRAN.
When AGG1 recovers, it becomes the master device after a specified preemption delay
elapses. AGG2 then becomes the backup device. Traffic sent from the NodeB goes through
the CSGs to AGG1 over the previous primary PW. AGG1 sends the traffic to RSG1 through
the P device. RSG1 then sends the traffic to the RNC. The path for user-to-network traffic is
CSG -> AGG1 -> P -> RSG1 -> RNC, and the path for network-to-user traffic is RNC ->
RSG1 -> P -> AGG1 -> CSG.
Benefits
P2P EFM, E2E CFM, E2E Y.1731, and their combinations are used to provide a complete
Ethernet OAM solution, which brings the following benefits:
Ethernet is deployed near user premises using remote terminals and roadside cabinets at
remote central offices or in unattended areas. Ethernet OAM allows engineers to run
detection, diagnosis, and monitoring protocols and techniques from remote locations to
maintain Ethernet networks. Remote maintenance eliminates the need for onsite
maintenance and helps reduce maintenance and operation expenditures.
Ethernet OAM supports various performance monitoring tools that are used to monitor
network operation and assess service quality based on SLAs. If a device using the tools
detects faults, the device sends traps to a network management system (NMS). Carriers
use statistics and trap information on NMSs to adjust services. The tools help ensure
proper transmission of voice and data services.
OAMPDUs
EFM works at the data link layer and uses protocol packets called OAM protocol data units
(OAMPDUs). EFM devices periodically exchange OAMPDUs to report link status, helping
network administrators effectively manage networks. Figure 1-230 shows the format and
common types of OAMPDUs. Table 1-65 lists and describes fields in an OAMPDU.
Field Description
Dest addr: Destination MAC address, which is the slow-protocol multicast address
0180-C200-0002. Network bridges cannot forward slow-protocol packets, so EFM
OAMPDUs cannot be forwarded across multiple devices, even if OAM is supported or
enabled on the devices.
Source addr: Source address, which is the unicast MAC address of a port on the
transmit end. If no port MAC address is specified on the transmit end, the bridge
MAC address of the transmit end is used.
Type: Slow protocol type, which has a fixed value of 0x8809.
Subtype: Subtype of the slow protocol. The value 0x03 indicates that the slow
sub-protocol is EFM.
Flags: Status of an EFM entity. The possible flags are Remote Stable, Remote
Evaluating, Local Stable, Local Evaluating, Critical Event, and Link Fault.
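The fixed leading fields can be illustrated by packing an OAMPDU header; the field widths for the flags (2 bytes) and code (1 byte) follow IEEE 802.3ah, and the function name is an assumption of this sketch:

```python
import struct

SLOW_PROTOCOL_MULTICAST = bytes.fromhex("0180c2000002")  # fixed dest address
SLOW_PROTOCOL_ETHERTYPE = 0x8809                          # slow protocols
EFM_SUBTYPE = 0x03                                        # EFM sub-protocol

def build_oampdu_header(src_mac: bytes, flags: int, code: int) -> bytes:
    """Pack the fixed leading fields of an EFM OAMPDU.

    Layout, per the field table above:
    dest MAC (6) | src MAC (6) | EtherType (2) | subtype (1) |
    flags (2) | code (1).
    """
    return (SLOW_PROTOCOL_MULTICAST + src_mac
            + struct.pack("!HBHB", SLOW_PROTOCOL_ETHERTYPE,
                          EFM_SUBTYPE, flags, code))
```

Because the destination is a slow-protocol multicast address, any frame built this way is consumed by the directly connected peer and never forwarded by a bridge.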
Connection Modes
EFM supports two connection modes: active and passive. Table 1-67 describes capabilities of
processing OAMPDUs in the two modes.
An EFM connection can be initiated only by an OAM entity working in active mode. An OAM
entity working in passive mode waits to receive a connection request from its peer entity. Two OAM
entities that both work in passive mode cannot establish an EFM connection between them.
An OAM entity that is to initiate a loopback request must work in active mode.
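The connection rule in the note above reduces to a one-line check; a minimal sketch with illustrative naming:

```python
def can_establish_efm_connection(local_mode: str, peer_mode: str) -> bool:
    """An EFM connection requires at least one OAM entity in active mode.

    Two entities that both work in passive mode cannot establish a
    connection, because neither may initiate one.
    """
    return "active" in (local_mode, peer_mode)
```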
1.5.7.2.2 Background
As telecommunication technologies develop quickly and the demand for service diversity is
increasing, various user-oriented teleservices are being provided over digital and intelligent
media through broadband paths. Backbone network technologies, such as synchronous digital
hierarchy (SDH), asynchronous transfer mode (ATM), passive optical network (PON), and
dense wavelength division multiplexing (DWDM), grow mature and popular. The
technologies allow the voice, data, and video services to be transmitted over a single path to
every home. Telecommunication experts and carriers focus on using existing network
resources to support new types of services and improve the service quality. The key point is to
provide a solution to the last-mile link to a user network.
A "last mile" reliability solution also needs to be provided. High-end clients, such as banks
and financial companies, demand high reliability. They expect carriers to monitor both carrier
networks and last-mile links that connect users to those carrier networks. EFM can be used to
satisfy these demands.
On the network shown in Figure 1-231, EFM is an OAM mechanism that applies to the
last-mile Ethernet access links to users. Carriers use EFM to monitor link status in real time,
rapidly locate failed links, and identify fault types if faults occur. OAM entities exchange
various OAMPDUs to monitor link connectivity and locate link faults.
OAM Discovery
During the discovery phase, a local EFM entity discovers and establishes a stable EFM
connection with a remote EFM entity. Figure 1-233 shows the discovery process.
Link Monitoring
Monitoring Ethernet links is difficult if network performance deteriorates while traffic is
being transmitted over physical links. To resolve this issue, the EFM link monitoring function
can be used. This function can detect data link layer faults in various environments. EFM
entities that are enabled with link monitoring exchange Event Notification OAMPDUs to
monitor links.
If an EFM entity receives a link event listed in Table 1-68, it sends an Event Notification
OAMPDU to notify the remote EFM entity of the event and also sends a trap to an NMS.
After receiving the trap on the NMS, an administrator can determine the network status and
take remedial measures as needed.
Errored symbol period event: If the number of symbol errors that occur on a
device's interface during a specified period of time reaches a specified upper
limit, the device generates an errored symbol period event, advertises the event
to the remote device, and sends a trap to the NMS. This event helps the device
detect code errors during data transmission at the physical layer.
Errored frame event: If the number of frame errors that occur on a device's
interface during a specified period of time reaches a specified upper limit, the
device generates an errored frame event, advertises the event to the remote
device, and sends a trap to the NMS. This event helps the device detect frame
errors that occur during data transmission at the MAC sublayer.
Errored frame seconds summary event: An errored frame second is a one-second
interval wherein at least one frame error is detected. If the number of errored
frame seconds that occur during a specified period of time reaches a specified
upper limit on a device's interface, the device generates an errored frame
seconds summary event, advertises the event to the remote device, and sends a
trap to the NMS. This event helps the device detect errored frame seconds that
occur during data transmission at the MAC sublayer.
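All of these link events follow the same count-over-window pattern; a minimal sketch with illustrative names and thresholds (not the device's actual monitoring code):

```python
class LinkEventMonitor:
    """Illustrative count-over-window detector for an EFM link event."""

    def __init__(self, threshold: int):
        self.threshold = threshold  # configured upper limit per window
        self._count = 0             # errors seen in the current window

    def record_error(self) -> None:
        """Count one symbol/frame error observed in the current window."""
        self._count += 1

    def end_of_window(self) -> bool:
        """Close the window. True means: generate the event, send an
        Event Notification OAMPDU to the peer, and send a trap to the NMS."""
        triggered = self._count >= self.threshold
        self._count = 0
        return triggered
```

For example, with a threshold of 3 errors per window, two errors in a window generate no event, while three or more do.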
Fault Notification
After the OAM discovery phase finishes, two EFM entities at both ends of an EFM
connection exchange Information OAMPDUs to monitor link connectivity. If traffic is
interrupted due to a remote device failure, the remote EFM entity sends an Information
OAMPDU carrying an event listed in Table 1-69 to the local EFM entity. After receiving the
notification, the local EFM entity sends a trap to the NMS. An administrator can view the trap
on the NMS to determine link status and take measures to rectify the fault.
Remote Loopback
Figure 1-234 demonstrates the principles of remote loopback. When a local interface sends
non-OAMPDUs to a remote interface, the remote interface loops the non-OAMPDUs back to
the local interface, not to the destination addresses of the non-OAMPDUs. This process is
called remote loopback. An EFM connection must be established to implement remote
loopback.
A device enabled with remote loopback discards all data frames except OAMPDUs, causing a
service interruption. To prevent impact on services, use remote loopback to check link
connectivity and quality before a new network is used or after a link fault is rectified.
The local device calculates communication quality parameters such as the packet loss ratio on
the current link based on the numbers of sent and received packets. Figure 1-235 shows the
remote loopback process.
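The loss-ratio calculation mentioned above is straightforward; a minimal sketch (the function name is illustrative):

```python
def loopback_loss_ratio(frames_sent: int, frames_received: int) -> float:
    """Packet loss ratio measured during remote loopback: the fraction of
    looped-back test frames that never returned to the local device."""
    if frames_sent == 0:
        return 0.0
    return (frames_sent - frames_received) / frames_sent
```

For example, 1000 frames sent with 990 looped back gives a loss ratio of 1%.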
If the local device attempts to stop remote loopback, it sends a message to instruct the remote
device to disable remote loopback. After receiving the message, the remote device disables
remote loopback.
If remote loopback is left enabled, the remote device keeps looping back service data, causing
a service interruption. To prevent this issue, a capability can be configured to disable remote
loopback automatically after a specified timeout period. After the timeout period expires, the
local device automatically sends a message to instruct the remote device to disable remote
loopback.
Maintenance Domain
MDs are discrete areas within which connectivity fault detection is enabled. The boundary of
an MD is determined by MEPs configured on interfaces. An MD is identified by an MD
name.
To help locate faults, MDs are divided into levels 0 through 7. A larger value indicates a
higher level, and a higher level MD covers a larger area. One MD can be tangential to another MD.
Tangential MDs share a single device and this device has one interface in each of the MDs. A
lower level MD can be nested in a higher level MD. An MD must be fully nested in another
MD, and the two MDs cannot overlap. A higher level MD cannot be nested in a lower level
MD.
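The nesting constraints above can be expressed as a small check. In this sketch an MD's coverage is modeled as an abstract interval, which is a simplification for illustration, not how the device represents MDs:

```python
def md_nesting_valid(outer_level: int, inner_level: int,
                     outer_span: tuple, inner_span: tuple) -> bool:
    """An MD nested in another must have a lower level and must be fully
    contained in the outer MD's coverage (no partial overlap)."""
    for lvl in (outer_level, inner_level):
        if not 0 <= lvl <= 7:
            raise ValueError("MD levels range from 0 to 7")
    fully_nested = (outer_span[0] <= inner_span[0]
                    and inner_span[1] <= outer_span[1])
    return inner_level < outer_level and fully_nested
```

A level 3 MD fully inside a level 6 MD is valid; a level 6 MD cannot be nested in a level 3 MD, and partial overlap is never allowed.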
Classifying MDs based on levels facilitates fault diagnosis. MD2 is nested in MD1 on the
network shown in Figure 1-236. If a fault occurs in MD1, PE2 through PE6 and all the links
between the PEs are checked. If no fault is detected in MD2, PE2, PE3, and PE4 are working
properly. This means that the fault is on PE5, PE6, or PE7 or on a link between these PEs.
In actual network scenarios, a nested MD can monitor the connectivity of the higher level MD
in which it is nested. Level settings allow 802.1ag packets to transparently travel through a
nested MD. For example, on the network shown in Figure 1-236, MD2 with the level set to 3
is nested in MD1 with the level set to 6. 802.1ag packets must transparently pass through
MD2 to monitor the connectivity of MD1. The level setting allows 802.1ag packets to pass
through MD2 to monitor the connectivity of MD1 but prevents 802.1ag packets that monitor
MD2 connectivity from passing through MD1. Setting levels for MDs helps locate faults.
802.1ag packets are exchanged and CFM functions are implemented based on MDs. Properly
planned MDs help a network administrator locate faults.
Default MD
A single default MD with the highest priority can be configured for each device according to
IEEE Std 802.1ag-2007.
On the network shown in Figure 1-237, if default MDs with the same level as the higher level
MDs are configured on devices in lower level MDs, MIPs are generated based on the default
MDs to reply to requests sent by devices in higher level MDs. CFM detects topology changes
and monitors the connectivity of both higher and lower level MDs.
The default MD must have a higher level than all MDs to which MEPs configured on the
local device belong. The default MD must also be of the same level as a higher level MD. The
default MD transmits high level request messages and generates MIPs to send responses.
Standard 802.1ag-2007 states that one default MD can be configured on each device and
associated with multiple virtual local area networks (VLANs). VLAN interfaces can
automatically generate MIPs based on the default MDs and a creation rule.
Maintenance Association
Multiple MAs can be configured in an MD as needed. Each MA contains MEPs. An MA is
uniquely identified by an MD name and an MA name.
An MA serves a specific service such as VLAN. A MEP in an MA sends packets carrying tags
of the specific service and receives packets sent by other MEPs in the MA.
Inward-facing MEP: sends packets through interfaces on the device other than the
interface on which the MEP is configured.
Outward-facing MEP: sends packets out of the interface on which the MEP is
configured.
Figure 1-238 shows inward- and outward-facing MEPs.
MIPs are separately calculated in each service instance, such as a VLAN. Within a single
service instance, the MAs belong to MDs of different levels but share the same VLAN ID.
For each service instance of each interface, the device attempts to calculate a MIP from the
lowest level MEP based on the rules listed in Table 1-70 and the following conditions:
Each MD on a single interface has a specific level and is associated with multiple
creation rules. The creation rule with the highest priority applies. An explicit rule has a
higher priority than a default rule, and a default rule takes precedence over a none rule.
The level of a MIP must be higher than any MEP on the same interface.
An explicit rule applies to an interface only when MEPs are configured on the interface.
A single MIP can be generated on a single interface. If multiple rules for generating
MIPs with different levels can be used, a MIP with the lowest level is generated.
MIP creation rules help detect and locate faults by level.
For example, CCMs are sent to detect a fault in a level 7 MD on the network shown in Figure
1-239. Loopback or linktrace is used to locate the fault in the link between MIPs that are in a
level 5 MD. This process is repeated until the faulty link or device is located.
The following example illustrates how to create a MIP based on a default rule defined in IEEE
Std 802.1ag-2007.
On the network shown in Figure 1-240, MD1 through MD5 are nested in MD7, and MD2
through MD5 are nested in MD1. MD7 has a higher level than MD1 through MD5, and MD1
has a higher level than MD2 through MD5. Multiple MEPs are configured on Device A in
MD1, and the MEPs belong to MDs with different levels.
A default rule is configured on Device A to create a MIP in MD1. The procedure for creating
the MIP is as follows:
1. Device A compares MEP levels and finds the MEP at level 5, the highest level. The
MEP level is determined by the level of the MD to which the MEP belongs.
2. Device A selects the MD at level 6, which is higher than the MEP of level 5.
3. Device A generates a MIP at level 6.
If MDs at level 6 or higher do not exist, no MIP is generated.
If MIPs at level 1 already exist on Device A, MIPs at level 6 cannot be generated.
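The default-rule procedure above can be sketched as follows, assuming MEP and MD levels are given as plain integers (an illustration, not product code):

```python
def mip_level(mep_levels, md_levels):
    """Return the level of the MIP generated by the default rule, or None.

    The MIP level must be higher than every MEP level on the interface,
    and when several MD levels qualify, the lowest one is chosen because
    only a single MIP can be generated on a single interface."""
    highest_mep = max(mep_levels) if mep_levels else -1
    candidates = [lvl for lvl in md_levels if lvl > highest_mep]
    return min(candidates) if candidates else None
```

With MEPs at levels 1, 3, and 5 and MDs at levels 6 and 7, the result is 6, matching the Device A example; if no MD is at a level above 5, no MIP is generated.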
Hierarchical MP Maintenance
MEPs and MIPs are maintenance points (MPs). MPs are configured on interfaces and belong
to specific MAs, as shown in Figure 1-241.
The scope of maintenance performed and the types of maintenance services depend on the
need of the organizations that use carrier-class Ethernet services. These organizations include
leased line users, service providers, and network carriers. Users purchase Ethernet services
from service providers, and service providers use their networks or carrier networks to
provide E2E Ethernet services. Carriers provide transport services.
Figure 1-242 shows locations of MEPs and MIPs and maintenance domains for users, service
providers, and carriers.
Operator 1, operator 2, the service provider, and the customer use MDs with levels 3, 4, 5, and
6, respectively. A higher MD level indicates a larger MD.
Field Description
0x01 Continuity check message (CCM): used for monitoring E2E link connectivity.
1.5.7.3.2 Background
IP-layer mechanisms, such as Simple Network Management Protocol (SNMP), IP ping, and
IP traceroute, are used to manage network-wide services, detect faults, and monitor
performance on traditional Ethernet networks. These mechanisms are unsuitable for
client-layer E2E Ethernet operation and management.
CFM supports service management, fault detection, and performance monitoring on the E2E
Ethernet network. In Figure 1-244:
A network is logically divided into maintenance domains (MDs). For example, network
devices that a single Internet service provider (ISP) manages are in a single MD to
distinguish between ISP and user networks.
Two maintenance association end points (MEPs) are configured on both ends of a
management network segment to be maintained to determine the boundary of an MD.
Maintenance association intermediate points (MIPs) can be configured as needed. A
MEP initiates a test request, and the remote MEP (RMEP) or MIP responds to the request.
This process provides information about the management network segment to help detect
faults.
CFM supports level-specific MD management. An MD at a given level can manage MDs at
lower levels but cannot manage an MD at a higher level than its own. Level-specific MD
management is used to maintain a service flow based on level-specific MDs and different
types of service flows in an MD.
Continuity Check
CC monitors the connectivity of links between MEPs. A MEP periodically sends multicast
continuity check messages (CCMs) to an RMEP in the same MA. If an RMEP does not
receive a CCM within a period 3.5 times the interval at which CCMs are sent, the RMEP
considers the path between itself and the MEP faulty.
Figure 1-245 CC
After receiving a CCM from a lower level MD, a MEP does not forward this CCM. This
process prevents a lower level CCM from being sent to a higher level MD.
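The 3.5-interval detection rule described above can be sketched as a simple timeout check (the names are illustrative):

```python
def rmep_path_faulty(last_ccm_time: float, now: float,
                     ccm_interval: float) -> bool:
    """An RMEP declares the path to its MEP faulty if no CCM arrives
    within 3.5 times the interval at which CCMs are sent."""
    return (now - last_ccm_time) > 3.5 * ccm_interval
```

With a 1-second CCM interval, the RMEP declares a fault once more than 3.5 seconds pass without a CCM.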
Loopback
Loopback is also called 802.1ag MAC ping. Similar to IP ping, loopback monitors the
connectivity of a path between a local MEP and an RMEP.
A MEP initiates an 802.1ag MAC ping test to monitor the reachability of an RMEP or MIP
destination address. The MEP, MIP, and RMEP have the same level and they can share an MA
or be in different MAs. The MEP sends Loopback messages (LBMs) to the RMEP or MIP.
After receiving the messages, the RMEP or MIP replies with loopback replies (LBRs).
Loopback helps locate a faulty node because a faulty node cannot send an LBR in response to
an LBM. LBMs and LBRs are unicast packets.
The following example illustrates the implementation of loopback on the network shown in
Figure 1-246.
CFM is configured to monitor a path between PE1 (MEP1) and PE4 (MEP2). The MD level
of these MEPs is 6. A MIP with a level of 6 is configured on PE2 and PE3. If a fault is
detected in a link between PE1 and PE4, loopback can be used to locate the fault. Figure
1-247 illustrates the loopback process.
MEP1 can measure the network delay based on 802.1ag MAC ping results or the frame loss
ratio based on the difference between the number of LBMs and the number of LBRs.
Linktrace
Linktrace is also called 802.1ag MAC trace. Similar to IP traceroute, linktrace identifies a
path between two MEPs.
A MEP initiates an 802.1ag MAC trace test to monitor a path to an RMEP or MIP destination
address. The MEP, MIP, and RMEP have the same level and they can share an MA or be in
different MAs. A source MEP constructs and sends a Linktrace message (LTM) to a
destination MEP. After receiving this message, each MIP forwards it and replies with a
linktrace reply (LTR). Upon receipt, the destination MEP replies with an LTR and does not
forward the LTM. The source MEP obtains topology information about each hop on the path
based on the LTRs. LTMs are multicast packets and LTRs are unicast packets.
The following example illustrates the implementation of linktrace on the network shown in
Figure 1-248.
1. MEP1 sends MEP2 an LTM carrying a time to live (TTL) value and the MAC address of
the destination MEP2.
2. After the LTM arrives at MIP1, MIP1 reduces the TTL value in the LTM by 1 and
forwards the LTM if the TTL is not zero. MIP1 then replies with an LTR to MEP1. The
LTR carries forwarding information and the TTL value carried by the LTM when MIP1
received it.
3. After the LTM reaches MIP2 and MEP2, the process described above for MIP1 is
repeated for MIP2 and MEP2. In addition, MEP2 finds that its MAC address is the
destination address carried in the LTM and therefore does not forward the LTM.
4. The LTRs from MIP1, MIP2, and MEP2 provide MEP1 with information about the
forwarding path between MEP1 and MEP2.
If a fault occurs on the path between MEP1 and MEP2, MEP2 or a MIP cannot receive
the LTM or reply with an LTR. MEP1 can locate the faulty node based on such a
response failure. For example, if the link between MEP1 and MIP2 works properly but
the link between MIP2 and MEP2 fails, MEP1 can receive LTRs from MIP1 and MIP2
but fails to receive a reply from MEP2. MEP1 then considers the path between MIP2 and
MEP2 faulty.
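The LTM forwarding and fault-location behavior described above can be simulated in a short sketch. The path and node names are hypothetical, and a faulty link is modeled by truncating the set of reachable nodes:

```python
def linktrace(path, src, dst, ttl=64, reachable=None):
    """Simulate LTM forwarding: each MIP decrements the TTL, replies
    with an LTR, and forwards the LTM; the destination MEP replies but
    does not forward. Returns the (responder, ttl) pairs the source
    MEP collects, which reveal the forwarding path."""
    reachable = set(path if reachable is None else reachable)
    ltrs = []
    for node in path[path.index(src) + 1:]:
        if node not in reachable or ttl == 0:
            break                      # faulty link or TTL exhausted: no reply
        ttl -= 1
        ltrs.append((node, ttl))       # LTR identifies the responding node
        if node == dst:
            break                      # destination MEP does not forward
    return ltrs
```

On the healthy path MEP1-MIP1-MIP2-MEP2, LTRs arrive from all three downstream nodes; if the MIP2-MEP2 link fails, only MIP1 and MIP2 reply, so MEP1 locates the fault between MIP2 and MEP2.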
Alarm Types
If CFM detects a fault in an E2E link, it triggers an alarm and sends the alarm to the network
management system (NMS). A network administrator uses the information to troubleshoot.
Table 1-74 describes alarms supported by CFM.
Alarm Anti-jitter
Multiple alarms and clear alarms may be generated on an unstable network enabled with CC.
These alarms consume system resources and deteriorate system performance. An RMEP
activation time can be set to prevent false alarms, and an alarm anti-jitter time can be set to
limit the number of alarms generated.
Function Description
RMEP activation time: Prevents false alarms. A local MEP with the ability to receive CCMs
can accept CCMs only after the RMEP activation time elapses.
Alarm anti-jitter time: If a MEP detects a connectivity fault, it sends an alarm to the NMS
after the anti-jitter time elapses. It does not send an alarm if the fault is rectified before the
anti-jitter time elapses.
Alarm Suppression
If different types of faults trigger more than one alarm, CFM alarm suppression allows the
alarm with the highest level to be sent to the NMS. If alarms persist after the alarm with the
highest level is cleared, the alarm with the second highest level is sent to the NMS. The
process repeats until all alarms are cleared.
The principles of CFM alarm suppression are as follows:
Alarms with high levels require immediate troubleshooting.
A single fault may trigger alarms with different levels. After the alarm with the highest
level is cleared, alarms with lower levels may also be cleared.
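The suppression rule above reduces to selecting the highest-level active alarm. In this sketch, alarms are (name, level) pairs and a higher level number is assumed to mean a more severe alarm; the encoding is illustrative:

```python
def alarm_to_report(active_alarms):
    """CFM alarm suppression: only the highest-level active alarm is sent
    to the NMS. When it is cleared, the next call reports the alarm with
    the second highest level, and so on until all alarms are cleared."""
    if not active_alarms:
        return None
    return max(active_alarms, key=lambda alarm: alarm[1])
```

Each time the reported alarm is cleared and removed from the active set, calling the function again yields the next alarm to send.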
Figure 1-249 shows typical Y.1731 networking. Y.1731 performance monitoring tools can be
used to assess the quality of the purchased Ethernet tunnel services or help a carrier conduct
regular service level agreement (SLA) monitoring.
Function Overview
Y.1731 can manage fault information and monitor performance.
Fault management functions include continuity check (CC), loopback (LB), and linktrace
(LT). The principles of Y.1731 fault management are the same as those of CFM fault
management.
Performance monitoring functions include single- and dual-ended frame loss
measurement, one- and two-way frame delay measurement, alarm indication signal
(AIS), Ethernet test function (ETH-Test), Single-ended Synthetic Loss Measurement
(SLM), Ethernet lock signal function (ETH-LCK), ETH-BN on virtual private LAN
service (VPLS) networks, virtual leased line (VLL) networks, and virtual local area
networks (VLANs). Kompella VPLS and VLL scenarios support only AIS.
One-way Frame Delay Measurement: Measures the network delay on a unidirectional link
between MEPs.
Two-way Frame Delay Measurement: Measures the network delay on a bidirectional link
between MEPs.
To measure the link delay, select either one- or two-way frame delay measurement:
One-way frame delay measurement can be used to measure the delay on a unidirectional
link between a MEP and its RMEP. The MEP must synchronize its time with its RMEP.
Two-way frame delay measurement can be used to measure the delay on a bidirectional
link between a MEP and its RMEP. The MEP does not need to synchronize its time with
its RMEP.
ETH-LCK: Informs the server-layer (sub-layer) MEP of administrative locking and the
interruption of traffic destined for the MEP in the inner maintenance domain (MD). The
ETH-LCK function must work with the ETH-Test function.
Single-ended Synthetic Loss Measurement (SLM): Collects frame loss statistics on
point-to-multipoint or E-Trunk links to monitor link quality. Single-ended synthetic frame
LM is used to collect accurate frame loss statistics on point-to-multipoint links.
ETH-LM
Ethernet frame loss measurement (ETH-LM) enables a local MEP and its RMEP to exchange
ETH-LM frames to collect frame loss statistics on E2E links. ETH-LM modes are classified
as near- or far-end ETH-LM.
Near-end ETH-LM applies to an inbound interface, and far-end ETH-LM applies to an
outbound interface on a MEP. ETH-LM counts the number of errored frame seconds to
determine the duration during which a link is unavailable.
ETH-LM supports the following methods:
Single-ended frame loss measurement
This method measures frame loss proactively or on demand.
− On-demand measurement collects single-ended frame loss statistics at a time or a
specific number of times for diagnosis.
− Proactive measurement collects single-ended frame loss statistics periodically.
A local MEP sends a loss measurement message (LMM) carrying an ETH-LM request to
its RMEP. After receiving the request, the RMEP responds with a loss measurement
reply (LMR) carrying an ETH-LM response. Figure 1-250 illustrates the process for
single-ended frame loss measurement.
After single-ended frame loss measurement is enabled, a MEP on PE1 sends an RMEP
on PE2 an ETH-LMM carrying an ETH-LM request. The MEP then receives an
ETH-LMR message carrying an ETH-LM response from the RMEP on PE2. The
ETH-LMM carries TxFCf, the value of the local transmit counter TxFCl at the time when
the message was sent by the local MEP. After receiving the ETH-LMM,
PE2 replies with an ETH-LMR message, which carries the following information:
− TxFCf: copied from the ETH-LMM
− RxFCf: value of the local counter RxFCl at the time of ETH-LMM reception
− TxFCb: value of the local counter TxFCl at the time of ETH-LMM transmission
After receiving the ETH-LMR message, PE1 measures near- and far-end frame loss
based on the following values:
− TxFCf, RxFCf, and TxFCb values from the received ETH-LMR message, and the value
of the local counter RxFCl at the time when this ETH-LMR message was received. These
values are represented as TxFCf[tc], RxFCf[tc], TxFCb[tc], and RxFCl[tc].
tc is the time when this ETH-LMR message was received.
− TxFCf, RxFCf, and TxFCb values from the previously received ETH-LMR message,
and the value of the local counter RxFCl at the time when that ETH-LMR message was
received. These values are represented as TxFCf[tp], RxFCf[tp], TxFCb[tp], and
RxFCl[tp].
tp is the time when the previous ETH-LMR message was received.
Far-end frame loss = |TxFCf[tc] - TxFCf[tp]| - |RxFCf[tc] - RxFCf[tp]|
Near-end frame loss = |TxFCb[tc] - TxFCb[tp]| - |RxFCl[tc] - RxFCl[tp]|
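The two formulas above can be checked with a short calculation over two consecutive LMR snapshots; representing each snapshot as a dict of counter values is an assumption of this sketch:

```python
def single_ended_frame_loss(cur, prev):
    """Far- and near-end frame loss from two consecutive ETH-LMR
    snapshots. Each snapshot maps the counter names TxFCf, RxFCf,
    TxFCb, and RxFCl to their values at LMR reception time."""
    far = abs(cur["TxFCf"] - prev["TxFCf"]) - abs(cur["RxFCf"] - prev["RxFCf"])
    near = abs(cur["TxFCb"] - prev["TxFCb"]) - abs(cur["RxFCl"] - prev["RxFCl"])
    return far, near
```

For example, if 100 frames were sent toward the RMEP but only 95 arrived, and 100 were sent back but only 98 arrived, the result is a far-end loss of 5 and a near-end loss of 2.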
Service packets are prioritized based on 802.1p priorities and are transmitted using
different policies. Traffic passing through the P device on the network shown in Figure
1-251 carries 802.1p priorities of 1 and 2.
Single-ended frame loss measurement is enabled on PE1 to send traffic with a priority of
1 to measure frame loss on a link between PE1 and PE2. Traffic with a priority of 2 is
also sent. After receiving traffic with priorities of 1 and 2, the P device forwards traffic
with a higher priority, delaying the arrival of traffic with a priority of 1 at PE2. As a
result, the frame loss ratio is inaccurate.
Dual-ended frame loss measurement
After dual-ended frame loss measurement is configured, each MEP periodically sends a
CCM carrying a request to its RMEP. After receiving the CCM, the RMEP collects near-
and far-end frame loss statistics but does not forward the message. The CCM carries the
following information:
− TxFCf: value of the local counter TxFCl at the time of CCM transmission
− RxFCb: value of the local counter RxFCl at the time of the reception of the last
CCM
− TxFCb: value of TxFCf in the last received CCM
PE1 uses received information to measure near- and far-end frame loss based on the
following values:
− TxFCf, RxFCb, and TxFCb values from the received CCM, and the value of the local
counter RxFCl at the time when this CCM was received. These values are represented as
TxFCf[tc], RxFCb[tc], TxFCb[tc], and RxFCl[tc].
tc is the time when this CCM was received.
− TxFCf, RxFCb, and TxFCb values from the previously received CCM, and the value of
the local counter RxFCl at the time when that CCM was received. These values are
represented as TxFCf[tp], RxFCb[tp], TxFCb[tp], and RxFCl[tp].
tp is the time when the previous CCM was received.
Far-end frame loss = |TxFCb[tc] - TxFCb[tp]| - |RxFCb[tc] - RxFCb[tp]|
Near-end frame loss = |TxFCf[tc] - TxFCf[tp]| - |RxFCl[tc] - RxFCl[tp]|
ETH-DM
Delay measurement (DM) measures the delay and its variation. A MEP sends its RMEP a
message carrying ETH-DM information and receives a response message carrying ETH-DM
information from its RMEP.
ETH-DM supports the following modes:
One-way frame delay measurement
A MEP sends its RMEP a 1DM message carrying one-way ETH-DM information. After
receiving this message, the RMEP measures the one-way frame delay and its variation.
If a MEP synchronizes its time with its RMEP, both the one-way frame delay and its
variation can be measured. If the time is not synchronized, only the one-way delay
variation can be measured.
One-way frame delay measurement can be implemented in either of the following
modes:
− On-demand measurement: calculates the one-way frame delay at a time or a
specific number of times for diagnosis.
− Proactive measurement: calculates the one-way frame delay periodically.
Figure 1-253 illustrates the process for one-way frame delay measurement.
One-way frame delay measurement is implemented on an E2E link between a local MEP
and its RMEP. The local MEP sends 1DMs to the RMEP and then receives replies from
the RMEP. After one-way frame delay measurement is configured, a MEP periodically
sends 1DMs carrying TxTimeStampf (the time when the 1DM was sent). After receiving
the 1DM, the RMEP parses TxTimeStampf and compares this value with RxTimef (the
time when the DM frame was received). The RMEP calculates the one-way frame delay
based on these values using the following equation:
Frame delay = RxTimef - TxTimeStampf
The frame delay can be used to measure the delay variation.
A delay variation is an absolute difference between two delays.
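The delay equation and the variation rule above amount to two one-line calculations, sketched here with illustrative names:

```python
def one_way_delay(tx_timestamp_f: float, rx_time_f: float) -> float:
    """Frame delay = RxTimef - TxTimeStampf. Meaningful only when the
    MEP and RMEP clocks are synchronized."""
    return rx_time_f - tx_timestamp_f

def delay_variation(delay_a: float, delay_b: float) -> float:
    """The delay variation is the absolute difference between two
    measured delays; it is meaningful even without synchronized clocks,
    because a constant clock offset cancels in the difference."""
    return abs(delay_a - delay_b)
```

If two 1DMs yield delays of 4 ms and 6.5 ms, the variation is 2.5 ms.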
802.1p priorities carried in service packets are used to prioritize services. Traffic passing
through the P device on the network shown in Figure 1-254 carries 802.1p priorities of 1
and 2.
One-way frame delay measurement is enabled on PE1 to send traffic with a priority of 1
to measure the frame delay on a link between PE1 and PE2. Traffic with a priority of 2 is
also sent. After receiving traffic with priorities of 1 and 2, the P device forwards traffic
with a higher priority, delaying the arrival of traffic with a priority of 1 at PE2. As a
result, the frame delay calculated on PE2 is inaccurate.
802.1p priority-based one-way frame delay measurement can be enabled to obtain
accurate results.
Two-way frame delay measurement
After two-way frame delay measurement is configured, a MEP periodically sends
DMMs carrying TxTimeStampf (the time when the DMM was sent). After receiving the
DMM, the RMEP replies with a DMR message. This message carries RxTimeStampf
(the time when the DMM was received) and TxTimeStampb (the time when the DMR
was sent). The value in every field of the DMM is copied to the DMR, with the
exception that the source and destination MAC addresses are interchanged. Upon
receipt of the DMR message, the MEP calculates the two-way frame delay using the
following equation:
Frame delay = (RxTimeb - TxTimeStampf) - (TxTimeStampb - RxTimeStampf)
The frame delay can be used to measure the delay variation.
A delay variation is an absolute difference between two delays.
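The two-way equation above subtracts the RMEP's processing time from the round-trip time, which is why no clock synchronization is needed. A minimal sketch (parameter names mirror the timestamps in the equation):

```python
def two_way_delay(tx_timestamp_f: float, rx_timestamp_f: float,
                  tx_timestamp_b: float, rx_time_b: float) -> float:
    """Frame delay = (RxTimeb - TxTimeStampf) - (TxTimeStampb - RxTimeStampf).

    RxTimeStampf and TxTimeStampb are both taken on the RMEP's clock,
    so any constant offset between the two clocks cancels out."""
    return (rx_time_b - tx_timestamp_f) - (tx_timestamp_b - rx_timestamp_f)
```

Even if the RMEP's clock runs 100 seconds ahead, the computed delay is unaffected, because the offset appears in both RMEP timestamps and cancels.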
802.1p priorities carried in service packets are used to prioritize services. Traffic passing
through the P device on the network shown in Figure 1-256 carries 802.1p priorities of 1
and 2.
Two-way frame delay measurement is enabled on PE1 to send traffic with a priority of 1
to measure the frame delay on a link between PE1 and PE2. Traffic with a priority of 2 is
also sent. After receiving traffic with priorities of 1 and 2, the P device forwards traffic
with a higher priority, delaying the arrival of traffic with a priority of 1 at PE2. As a
result, the frame delay calculated on PE2 is inaccurate.
802.1p priority-based two-way frame delay measurement can be enabled to obtain
accurate results.
AIS
AIS is a protocol used to transmit fault information.
A MEP is configured in MD1 with a level of 6 on each of CE1 and CE2 access interfaces on
the user network shown in Figure 1-257. A MEP is configured in MD2 with a level of 3 on
each of PE1 and PE2 access interfaces on a carrier network.
If CFM detects a fault in the link between AIS-enabled PEs, CFM sends AIS packet data
units (PDUs) to CEs. After receiving the AIS PDUs, the CEs suppress alarms,
minimizing the impact of a large number of alarms on a network management system
(NMS).
After the link between the PEs recovers, the PEs stop sending AIS PDUs. CEs do not
receive AIS PDUs during a period of 3.5 times the interval at which AIS PDUs are sent.
Therefore, the CEs cancel the alarm suppression function.
ETH-Test
ETH-Test is used to perform one-way on-demand in-service or out-of-service diagnostic tests
on the throughput, frame loss, and bit errors.
The implementation of these tests is as follows:
Verifying throughput and frame loss: Throughput means the maximum bandwidth of a
link without packet loss. When you use ETH-Test to verify the throughput, a MEP sends
frames with ETH-Test information at a preconfigured traffic rate and collects frame loss
statistics for a specified period. If the statistical results show that the number of sent
frames is greater than the number of received frames, frame loss occurs. The MEP sends
frames at a lower rate until no frame loss occurs. The traffic rate measured at the time
when no packet loss occurs is the throughput of this link.
Verifying bit errors: ETH-Test is implemented by verifying the cyclic redundancy code
(CRC) of the Test TLV field carried in ETH-Test frames. For the ETH-Test
implementation, four types of test patterns can be specified in the test TLV field: Null
signal without CRC-32, Null signal with CRC-32, PRBS 2^31-1 without CRC-32, and
PRBS 2^31-1 with CRC-32. A Null signal is an all-0s signal. A PRBS (pseudo-random
binary sequence) is used to simulate white noise. A MEP sends ETH-Test frames
carrying the calculated CRC value to the RMEP. After receiving the ETH-Test frames,
the RMEP recalculates the CRC value. If the recalculated CRC value is different from
the CRC value carried in the sent ETH-Test frames, bit errors occur.
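The two verifications above can be sketched together: a rate step-down for throughput and a CRC-32 recomputation for bit errors. The function names are hypothetical, and zlib's CRC-32 stands in for the Test TLV checksum:

```python
import zlib

def throughput(max_rate: float, lossless_at, step: float = 10.0) -> float:
    """Lower the ETH-Test frame rate from max_rate until no frame loss
    occurs; the first lossless rate is taken as the link throughput.
    lossless_at is a probe callback: rate -> True if no frames were lost."""
    rate = max_rate
    while rate > 0:
        if lossless_at(rate):
            return rate
        rate -= step
    return 0.0

def make_test_payload(pattern: bytes) -> bytes:
    """Append a CRC-32 over the test pattern ('with CRC-32' modes)."""
    return pattern + zlib.crc32(pattern).to_bytes(4, "big")

def has_bit_errors(payload: bytes) -> bool:
    """The RMEP recomputes the CRC-32 over the received pattern; a
    mismatch with the carried CRC indicates bit errors."""
    pattern, crc = payload[:-4], payload[-4:]
    return zlib.crc32(pattern).to_bytes(4, "big") != crc
```

For the Null-signal mode, the pattern is all zero bytes; flipping any single bit in transit changes the recomputed CRC and is therefore detected.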
ETH-Test provides two types of test modes: out-of-service ETH-Test and in-service
ETH-Test:
Out-of-service ETH-Test mode: Client data traffic is interrupted in the diagnosed entity.
To resolve this issue, the out-of-service ETH-Test function must be used together with
the ETH-LCK function.
In-service ETH-Test mode: Client data traffic is not interrupted, and the frames with the
ETH-Test information are transmitted using part of bandwidths.
ETH-LCK
ETH-LCK is used for administrative locking on the MEP in the outer MD with a higher level
than the inner MD, that is, preventing CC alarms from being generated in the outer MD.
When implementing ETH-LCK, a MEP in the inner MD sends frames with the ETH-LCK
information to the MEP in the outer MD. After receiving the frames with the ETH-LCK
information, the MEP in the outer MD can differentiate the alarm suppression caused by
administrative locking from the alarm suppression caused by a fault in the inner MD (the AIS
function).
To suppress CC alarms from being generated in the outer MD, ETH-LCK is implemented
with out-of-service ETH-Test. A MEP in the inner MD with a lower level initiates ETH-Test
by sending an ETH-LCK frame to a MEP in the outer MD. Upon receipt of the ETH-LCK
frame, the MEP in the outer MD suppresses all CC alarms immediately and reports an
ETH-LCK alarm indicating administrative locking. Before out-of-service ETH-Test is
complete, the MEP in the inner MD sends ETH-LCK frames to the MEP in the outer MD.
After out-of-service ETH-Test is complete, the MEP in the inner MD stops sending
ETH-LCK frames. If the MEP in the outer MD does not receive ETH-LCK frames for a
period 3.5 times as long as the specified interval, it releases the alarm suppression and reports
a clear ETH-LCK alarm.
As shown in Figure 1-258, MD2 with the level of 3 is configured on PE1 and PE2; MD1 with
the level of 6 is configured on CE1 and CE2. When PE1's MEP1 sends out-of-service
ETH-Test frames to PE2's MEP2, MEP1 also sends ETH-LCK frames to CE1's MEP11 and
CE2's MEP22 separately to suppress MEP11 and MEP22 from generating CC alarms. When
MEP1 stops sending out-of-service ETH-Test frames, it also stops sending ETH-LCK frames.
If MEP11 and MEP22 do not receive ETH-LCK frames for a period 3.5 times as long as the
specified interval, they release the alarm suppression.
Single-ended ETH-SLM
SLM measures frame loss using synthetic frames instead of data traffic. When implementing
SLM, the local MEP exchanges frames containing ETH-SLM information with one or more
RMEPs.
Figure 1-259 demonstrates the process of single-ended SLM:
1. The local MEP sends ETH-SLM request frames to the RMEPs.
2. After receiving the ETH-SLM request frames, the RMEPs send ETH-SLM reply frames
to the local MEP.
A frame with the single-ended ETH-SLM request information is called an SLM, and a frame
with the single-ended ETH-SLM reply information is called an SLR. SLM frames carry SLM
protocol data units (PDUs), and SLR frames carry SLR PDUs.
Single-ended SLM and single-ended frame LM are differentiated as follows: On the
point-to-multipoint network shown in Figure 1-259, inward MEPs are configured on PE1's
and PE3's interfaces, and single-ended frame LM is performed on the PE1-PE3 link. Traffic
coming through PE1's interface is destined for both PE2 and PE3, and single-ended frame LM
will collect frame loss statistics for all traffic, including the PE1-to-PE2 traffic. As a result, the
collected statistics are not accurate. Unlike single-ended frame LM, single-ended SLM
collects frame loss statistics only for the PE1-to-PE3 traffic, which is more accurate.
When implementing single-ended SLM, PE1 sends SLM frames to PE3 and receives SLR
frames from PE3. SLM frames contain TxFCf, the value of TxFCl (frame transmission
counter), indicating the frame count at the transmit time. SLR frames contain the following
information:
TxFCf: value of TxFCl (frame transmission counter) indicating the frame count on PE1
upon the SLM transmission
TxFCb: value of RxFCl (frame receive counter) indicating the frame count on PE3 upon
the SLR transmission
After receiving the last SLR frame during a measurement period, a MEP on PE1 measures the
near-end and far-end frame loss based on the following values:
Last received SLR's TxFCf and TxFCb, and value of RxFCl (frame receive counter)
indicating the frame count on PE1 upon the SLR reception. These values are represented
as TxFCf[tc], TxFCb[tc], and RxFCl[tc].
tc indicates the time when the last SLR frame was received during the measurement
period.
Previously received SLR's TxFCf and TxFCb, and value of RxFCl (frame receive
counter) indicating the frame count on PE1 upon the SLR reception. These values are
represented as TxFCf[tp], TxFCb[tp], and RxFCl[tp].
tp indicates the time when the last SLR frame was received during the previous
measurement period.
Far-end frame loss = |TxFCf[tc] – TxFCf[tp]| – |TxFCb[tc] – TxFCb[tp]|
Near-end frame loss = |TxFCb[tc] – TxFCb[tp]| – |RxFCl[tc] – RxFCl[tp]|
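The SLM loss calculation can be checked numerically. Representing the SLR snapshots as dicts is an assumption of this sketch; RxFCl is PE1's receive counter, as defined above:

```python
def slm_frame_loss(cur, prev):
    """Far- and near-end frame loss from two SLR snapshots taken at the
    ends of consecutive measurement periods (counters TxFCf, TxFCb,
    and RxFCl, valued at SLR reception time on PE1)."""
    far = abs(cur["TxFCf"] - prev["TxFCf"]) - abs(cur["TxFCb"] - prev["TxFCb"])
    near = abs(cur["TxFCb"] - prev["TxFCb"]) - abs(cur["RxFCl"] - prev["RxFCl"])
    return far, near
```

If PE1 sent 100 SLMs in a period, PE3 returned 95 SLRs, and PE1 received 93 of them, the far-end loss is 5 and the near-end loss is 2.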
On a network, each packet carries the IEEE 802.1p field, indicating its priority. According to
packet priority, different QoS policies will be applied. On the network shown in Figure 1-260,
the PE1-to-PE3 traffic has two priorities: 1 and 2, as indicated by the IEEE 802.1p field.
When implementing single-ended SLM for traffic over the PE1-PE3 link, PE1 sends SLM
frames with varied priorities and checks the frame loss. Based on the check result, the
network administrator can adjust the QoS policy for the link.
ETH-BN
Ethernet bandwidth notification (ETH-BN) enables server-layer MEPs to notify client-layer
MEPs of the server layer's connection bandwidth when routing devices connect to microwave
devices. The server-layer devices are microwave devices, which dynamically adjust the
bandwidth according to the prevailing atmospheric conditions. The client-layer devices are
routing devices. Routing devices can only function as ETH-BN packets' receive ends and
must work with microwave devices to implement this function.
As shown in Figure 1-261, server-layer MEPs are configured on the server-layer devices, and
the ETH-BN sending function is enabled. The levels of client-layer MEPs must be specified
for the server-layer MEPs when the ETH-BN sending function is enabled. Client-layer MEPs
are configured on the client-layer devices, and the ETH-BN receiving function is enabled. The
levels of the client-layer MEPs are the same as those specified for the server-layer MEPs.
If the ETH-BN function has been enabled on the server-layer devices Device2 and
Device3 and the bandwidth of the server-layer devices' microwave links decreases, the
server-layer devices send ETH-BN packets to the client-layer devices (Device1 and
Device4). After receiving the ETH-BN packets, the client-layer MEPs can use bandwidth
information in the packets to adjust service policies, for example, to reduce the rate of
traffic sent to the degraded links.
When the server-layer devices' microwave links work properly, whether to send
ETH-BN packets is determined by the configuration of the server-layer devices. When
the server-layer microwave devices stop sending ETH-BN packets, the client-layer
devices do not receive any ETH-BN packets. The ETH-BN data on the client-layer
devices is aged after 3.5 times the interval at which ETH-BN packets are sent.
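The 3.5-times aging rule described above can be sketched as follows. This is a minimal illustration of the rule, not device behavior: the class and method names, and the idea of tracking arrival timestamps explicitly, are assumptions.

```python
# Illustrative sketch of ETH-BN data aging on a client-layer device:
# bandwidth information is discarded if no ETH-BN packet arrives within
# 3.5 times the sending interval. All names are assumptions.

class EthBnState:
    AGING_MULTIPLIER = 3.5  # from the rule described above

    def __init__(self, send_interval_s):
        self.send_interval_s = send_interval_s
        self.bandwidth = None
        self.last_rx_time = None

    def on_packet(self, now_s, bandwidth):
        # A received ETH-BN packet refreshes the bandwidth data and its timer.
        self.bandwidth = bandwidth
        self.last_rx_time = now_s

    def current_bandwidth(self, now_s):
        """Return the advertised bandwidth, or None once the data has aged out."""
        if self.last_rx_time is None:
            return None
        if now_s - self.last_rx_time > self.AGING_MULTIPLIER * self.send_interval_s:
            return None  # aged out: 3.5 send intervals passed with no packet
        return self.bandwidth

state = EthBnState(send_interval_s=1.0)
state.on_packet(now_s=0.0, bandwidth=100)
print(state.current_bandwidth(3.0))  # 100 (within 3.5 intervals)
print(state.current_bandwidth(4.0))  # None (aged out)
```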
When planning ETH-BN, ensure that the service burst traffic matches the device's buffer
capability.
Usage Scenario
Y.1731 supports performance statistics collection on both end-to-end and end-to-multi-end
links.
End-to-end performance statistics collection
On the network shown in Figure 1-262, Y.1731 collects statistics about the end-to-end link
performance between the CE and PE1, between PE1 and PE2, or between the CE and PE3.
End-to-multi-end performance statistics collection
On the network shown in Figure 1-263, user-to-network traffic from different users traverses
CE1 and CE2 and is converged on CE3. CE3 forwards the converged traffic to the UPE.
Network-to-user traffic traverses CE3, and CE3 forwards the traffic to CE1 and CE2.
When Y.1731 is used to collect statistics about the link performance between the CE and the
UPE, end-to-end performance statistics collection cannot be implemented. This is because
only one inbound interface (on the UPE) sends packets but two outbound interfaces (on CE1
and CE2) receive the packets. In this case, statistics on the outbound interfaces fail to be
collected. To resolve this issue, end-to-multi-end performance statistics collection can be
implemented.
The packets carry the MAC address of CE1 or CE2. The UPE identifies the outbound
interface based on the destination MAC address carried in the packets and collects end-to-end
performance statistics.
Both end-to-multi-end and end-to-end performance statistics collection apply to VLL, VPLS,
and VLAN scenarios, and both use the same statistics collection principles.
Figure 1-264 Fault information advertisement between EFM and detection modules
The following example illustrates fault information advertisement between EFM and
detection modules over the path CE5 -> CE4 -> CE1 -> PE2 -> PE4 on the network shown in
Figure 1-264.
Table 1-77 Fault information advertisement between EFM and detection modules

Function Deployment: EFM is used to monitor the direct link between CE1 and PE2, and
CFM is used to monitor the link between PE2 and PE6.

Issue to Be Resolved: Although EFM detects a fault, EFM cannot notify PE6 of the fault. As
a result, PE6 still forwards traffic to PE2, causing a traffic interruption. Similarly, although
CFM detects a fault, CFM cannot notify CE1 of the fault.

Solution: The EFM module can be associated with the CFM module. If the EFM module
detects a fault, it instructs the OAMMGR module to notify the CFM module of the fault. If
the CFM module detects a fault, it instructs the OAMMGR module to notify the EFM
module of the fault.
Figure 1-265 Fault information advertisement between EFM and application modules
Table 1-78 describes fault information advertisement between EFM and VRRP modules.
Table 1-78 Fault information advertisement between EFM and VRRP modules

Function Deployment: A VRRP backup group is configured to determine the master/backup
status of provider edge-aggregations (PE-AGGs). EFM is used to monitor links between the
UPE and PE-AGGs.

Issue to Be Resolved: If links connected to a VRRP backup group fail, VRRP packets cannot
be sent to negotiate the master/backup status. A backup VRRP device preempts the Master
state after a period of three times the interval at which VRRP packets are sent. As a result,
data loss occurs.

Solution: To help prevent data loss, the VRRP module can be associated with the EFM
module. If a fault occurs, the EFM module notifies the VRRP module of the fault. After
receiving the notification, the VRRP module triggers a master/backup VRRP switchover.
Figure 1-266 Networking for fault information advertisement between CFM and detection
modules
The following example illustrates fault information advertisement between CFM and
detection modules over the path UPE1 -> PE2 -> PE4 -> PE6 -> PE8 on the network shown in
Figure 1-266.
Table 1-79 Fault information advertisement between CFM and detection modules
Table 1-80 describes fault information advertisement between CFM and VRRP modules.
Table 1-80 Fault information advertisement between CFM and VRRP modules
1.5.7.6 Applications
1.5.7.6.1 Ethernet OAM Applications on a MAN
EFM, CFM, and Y.1731 can be combined to provide E2E Ethernet OAM solutions,
implementing E2E Ethernet service management.
Figure 1-269 shows a typical MAN network. The following example illustrates Ethernet
OAM applications on a MAN.
EFM is used to monitor P2P direct links between a digital subscriber line access
multiplexer (DSLAM) and a user-end provider edge (UPE) or between a LAN switch
(LSW) and a UPE. If EFM detects errored frames, codes, or frame seconds, it sends
alarms to the network management system (NMS) to provide information for a network
administrator. EFM uses the loopback function to assess link quality.
CFM is used to monitor E2E links between a UPE and an NPE or between a UPE and a
provider edge-aggregation (PE-AGG). A network planning engineer groups the devices
of each Internet service provider (ISP) into an MD and maps a type of service to an MA.
A network maintenance engineer enables maintenance points to exchange CCMs to
monitor link connectivity.
A mobile backhaul network shown in Figure 1-270 consists of a transport network between a
cell site gateway (CSG) and remote service gateways (RSGs) and a wireless network between
NodeBs/eNodeBs and the CSG. Carriers operate the transport and wireless networks
separately. Therefore, traffic transmitted on the transport network of one carrier is invisible to
devices on the wireless network of another carrier.
Ethernet OAM can be used on the transport and wireless networks to identify and locate
faults.
EFM monitors Layer 2 links between a NodeB/eNodeB and CSG1.
− EFM is used to monitor the connectivity of links between a NodeB/eNodeB and
CSG1 or between RNCs and RSGs.
− EFM detects errored codes, frames, and frame seconds on links between a
NodeB/eNodeB and CSG1 and between RNCs and RSGs. If the number of errored
codes, frames, or frame seconds exceeds a configured threshold, an alarm is sent to
the NMS. A network administrator is notified of link quality deterioration and can
assess the risk of adverse impact on voice traffic.
− Loopback is used to monitor the quality of voice links between a NodeB/eNodeB
and CSG1 or between RNCs and RSGs.
CFM is used to locate faulty links over which E2E services are transmitted.
− CFM periodically monitors links between CSG1 and the RSGs. If CFM detects a fault,
it sends an alarm to the NMS. A network
administrator analyzes alarm information and takes measures to rectify the fault.
− Loopback and linktrace are enabled on links between CSG1 and the RSGs to facilitate
link fault diagnosis.
Y.1731 is used together with CFM to monitor link performance and voice and data traffic
quality.
Definition
Dual-device backup is a feature that ensures service traffic continuity in scenarios in which a
master/backup status negotiation protocol (for example, VRRP or E-Trunk) is deployed.
Dual-device backup enables the master device to back up service control data to the backup
device in real time. When the master device or the link directly connected to the master device
fails, service traffic quickly switches to the backup device. When the master device or the link
directly connected to the master device recovers, service traffic switches back to the master
device. Therefore, dual-device backup improves service and network reliability.
Related Concepts
If VRRP is used as a master/backup status negotiation protocol, dual-device backup involves
the following concepts:
VRRP
VRRP is a fault-tolerant protocol that groups several routers into a virtual router. If the
next hop of a host is faulty, VRRP switches traffic to another router, which ensures
communication continuity and reliability.
For details about VRRP, see the chapter "VRRP" in NE20E Feature Description -
Network Reliability.
RUI
RUI is a Huawei-specific redundancy protocol that is used to back up user information
between devices. RUI, which is carried over the Transmission Control Protocol (TCP),
specifies which user information can be transmitted between devices and the format and
amount of user information to be transmitted.
RBS
The remote backup service (RBS) is an RUI module used for inter-device backup. A
service module uses the RBS to synchronize service control data from the master device
to the backup device. When a master/backup VRRP switchover occurs, service traffic
quickly switches to a new master device.
RBP
The remote backup profile (RBP) is a configuration template that provides a unified user
interface for dual-device backup configurations.
If E-Trunk is used as a master/backup status negotiation protocol, dual-device backup
involves the following concept:
E-Trunk
E-Trunk implements inter-device link aggregation, providing device-level reliability.
E-Trunk aggregates data links of multiple devices to form a link aggregation group
(LAG). If a link or device fails, services are automatically switched to the other available
links or devices in the E-Trunk, improving link and device-level reliability.
For details about E-Trunk, see "E-Trunk" in NE20E Feature Description - LAN Access
and MAN Access.
Purpose
In traditional service scenarios, all users use a single device to access a network. Once the
device or the link directly connected to the device fails, all user services are interrupted, and
the service recovery time is uncertain. To resolve this issue, deploy dual-device backup to
enable the master device to back up service control data to the backup device in real time.
The NE20E supports only dual-device hot backup for Address Resolution Protocol (ARP)
services, also called dual-device ARP hot backup.
Dual-device ARP hot backup enables the master device to back up the ARP entries at the
control and forwarding layers to the backup device in real time. When the backup device
switches to a master device, it uses the backup ARP entries to generate host routing
information without needing to relearn ARP entries, ensuring downlink traffic continuity.
− Manually triggered dual-device ARP hot backup: You must manually establish a
backup platform and backup channel for the master and backup devices. In addition,
you must manually trigger ARP entry backup from the master device to the backup
device. This backup mode has complex configurations.
− Automatically enabled dual-device ARP hot backup: You need to establish only a
backup channel between the master and backup devices, and the system
automatically implements ARP entry backup. This backup mode has simple
configurations.
Dual-device IGMP snooping hot backup enables the master device to back up IGMP
snooping entries to the backup device in a master/backup E-Trunk scenario. If the master
device or the link between the master device and user fails, the backup device switches
to a master device and takes over, ensuring multicast service continuity.
Benefits
Benefits to users
− Improved user experience
Benefits to operators
− Improved network reliability from the perspective of service reliability
1.5.8.2 Applications
1.5.8.2.1 Dual-Device ARP Hot Backup
Networking Description
Dual-device ARP hot backup enables the master device to back up the ARP entries at the
control and forwarding layers to the backup device in real time. When the backup device
switches to a master device, it uses the backup ARP entries to generate host routing
information. After you deploy dual-device ARP hot backup, the new master device forwards
downlink traffic without needing to relearn ARP entries. Dual-device ARP hot backup ensures
downlink traffic continuity.
Dual-device ARP hot backup applies in both Virtual Router Redundancy Protocol (VRRP) and enhanced
trunk (E-Trunk) scenarios. This section describes the implementation of dual-device ARP hot backup in
VRRP scenarios.
Figure 1-273 shows a typical network topology in which a Virtual Router Redundancy
Protocol (VRRP) backup group is deployed. In the topology, Device A is a master device, and
Device B is a backup device. In normal circumstances, Device A forwards both uplink and
downlink traffic. If Device A or the link between Device A and the switch fails, a
master/backup VRRP switchover is triggered to switch Device B to the Master state. Device B
needs to advertise a network segment route to a device on the network side to direct downlink
traffic from the network side to Device B. If Device B has not learned ARP entries from a
device on the user side, the downlink traffic is interrupted. Device B forwards the downlink
traffic only after it learns ARP entries from a device on the user side.
Feature Deployment
To prevent downlink traffic from being interrupted because Device B does not learn ARP
entries from a device on the user side, deploy dual-device ARP hot backup on Device A and
Device B, as shown in Figure 1-274.
After the deployment, Device B backs up the ARP entries on Device A in real time. If a
master/backup VRRP switchover occurs, Device B forwards downlink traffic based on the
backup ARP entries without needing to relearn ARP entries from a device on the user side.
Networking Description
Dual-device IGMP snooping hot backup enables the master and backup devices to generate
multicast entries synchronously in real time. IGMP protocol packets are synchronized from
the master device to the backup device so that the same multicast forwarding entries can be
generated on the backup device. After you deploy dual-device IGMP snooping hot backup,
the new master device forwards downlink traffic without needing to relearn multicast
forwarding entries through IGMP snooping. Dual-device IGMP snooping hot backup ensures
downlink traffic continuity.
Figure 1-275 shows a typical network topology in which an Eth-Trunk group is deployed. In
the topology, Device A is a master device, and Device B is a backup device. In normal
circumstances, Device A forwards both uplink and downlink traffic. If Device A or the link
between Device A and the switch fails, a master/backup Eth-Trunk link switchover is
triggered to switch Device B to the Master state. Device B needs to advertise a network
segment route to a device on the network side to direct downlink traffic from the network side
to Device B. If Device B has not generated multicast forwarding entries directing traffic to the
user side, the downlink traffic is interrupted. Device B forwards the downlink traffic only
after it generates forwarding entries directing traffic to the user side.
Feature Deployment
To prevent downlink traffic from being interrupted because Device B does not generate
multicast forwarding entries directing traffic to the user side, deploy dual-device IGMP
snooping hot backup on Device A and Device B, as shown in Figure 1-276.
After the deployment, Device A and Device B generate the same multicast forwarding entries
at the same time. If a master/backup Eth-Trunk link switchover occurs, Device B forwards
downlink traffic based on the generated multicast forwarding entries without needing to
generate the entries directing traffic to the user side.
Terms

Dual-device backup: A feature in which one device functions as a master device and the
other functions as a backup device. In normal circumstances, the master device provides
service access and the backup device monitors the running status of the master device. When
the master device fails, the backup device switches to a master device and provides service
access, ensuring service traffic continuity.

Remote Backup Profile (RBP): A configuration template that provides a unified user
interface for dual-system backup configurations.

Remote Backup Service (RBS): An inter-device backup channel, used to synchronize data
between two devices so that user services can smoothly switch from a faulty device to
another device during a master/backup device switchover.

Redundancy User Information (RUI): A Huawei-proprietary protocol used by devices to
back up user information between each other over TCP connections.
Definition
A bit error refers to the deviation between a bit that is sent and the bit that is received. Cyclic
redundancy checks (CRCs) are commonly used to detect bit errors. Bit errors caused by line
faults can be corrected by rectifying the associated link faults. Random bit errors caused by
optical fiber aging or optical signal jitter, however, are more difficult to correct.
Bit-error-triggered protection switching is a reliability mechanism that triggers protection
switching based on bit error events (bit error occurrence event or correction event) to
minimize bit error impact.
Purpose
The demand for network bandwidth is rapidly increasing as mobile services evolve from
narrowband voice services to integrated broadband services, including voice and streaming
media. Meeting the demand for bandwidth with traditional bearer networks dramatically
raises carriers' operation costs. To tackle the challenges posed by this rapid
broadband-oriented development, carriers urgently need mobile bearer networks that are
flexible, low-cost, and highly efficient. IP-based mobile bearer networks are an ideal choice.
IP radio access networks (IPRANs), a type of IP-based mobile bearer network, are being
increasingly widely used.
Traditional bearer networks minimize bit error impact by using retransmission or a
mechanism in which the receiving end accepts only one of the multiple packet copies sent by
the other end. IPRANs have higher reliability requirements than traditional bearer networks
when carrying broadband services, and traditional fault detection mechanisms cannot trigger
protection switching based on random bit errors. As a result, bit errors may degrade or even
interrupt services on an IPRAN.
To solve this problem, configure bit-error-triggered protection switching.
To prevent impacts on services, check whether protection links have sufficient bandwidth resources
before deploying bit-error-triggered protection switching.
Benefits
Bit-error-triggered protection switching offers the following benefits:
Protects traffic against random bit errors, meeting high reliability requirements and
improving service quality.
Enables devices to record bit error events. These records help carriers locate the nodes or
lines that have bit errors and take corrective measures accordingly.
1.5.9.2 Principles
1.5.9.2.1 Bit Error Detection
Background
Bit-error-triggered protection switching enables link bit errors to trigger protection switching
on network applications, minimizing the impact of bit errors on services. To implement
bit-error-triggered protection switching, establish an effective bit error detection mechanism
to ensure that network applications promptly detect bit errors.
Related Concepts
Bit error detection involves the following concepts:
Bit error: deviation between a bit that is sent and the bit that is received.
BER: number of bit errors divided by the total number of transferred bits during a certain
period. The BER can be considered as an approximate estimate of the probability of a bit
error occurring on any particular bit.
LSP BER: calculation result based on the BER of each node on an LSP.
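The BER definition above is a simple ratio, sketched below. The function name and sample numbers are illustrative only.

```python
# Minimal sketch of the BER definition: errored bits divided by the total
# number of bits transferred during a period.

def bit_error_rate(errored_bits, total_bits):
    if total_bits == 0:
        raise ValueError("no bits transferred in the period")
    return errored_bits / total_bits

# 3 errored bits out of 10^9 transferred bits gives a BER of 3e-9.
print(bit_error_rate(3, 10**9))  # 3e-09
```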
If a transit node detects bit errors on a static CR-LSP or PW, the transit node uses AIS packets
to advertise the bit error status to the egress, triggering a traffic switchover on the static
CR-LSP or PW. On the network shown in Figure 1-278, a static CR-LSP is deployed from
PE1 to PE2. If the transit node P detects bit errors:
1. The P node uses AIS packets to notify PE2 of the bit error event.
2. After receiving the AIS packets, PE2 reports an AIS alarm to trigger local protection
switching. PE2 then sends CRC-AIS packets to PE1 and uses the APS protocol to
complete protection switching through negotiation with PE1.
3. After receiving the CRC-AIS packets, PE1 reports a CRC-AIS alarm.
Background
If bit errors occur on an interface, deploy bit-error-triggered section switching to trigger an
upper-layer application associated with the interface for a service switchover.
Implementation Principles
Trigger-section bit error detection must be enabled on an interface. After detecting bit errors
on an inbound interface, a device notifies the interface management module of the bit errors.
The link-layer protocol status of the interface then changes to bit-error-detection Down,
triggering an upper-layer application associated with the interface for a service switchover.
After the bit errors are cleared, the link-layer protocol status of the interface changes to Up,
triggering an upper-layer application associated with the interface for a service switchback.
The device also notifies the BFD module of the bit error status, and then uses BFD packets to
advertise the bit error status to the peer device.
If bit-error-triggered section switching also has been deployed on the peer device, the bit
error status is advertised to the interface management module of the peer device. The
link-layer protocol status of the interface then changes to bit-error-detection Down or Up,
triggering an upper-layer application associated with the interface for a service
switchover or switchback.
If bit-error-triggered section switching is not deployed on the peer device, the peer
device cannot detect the bit error status of the interface's link. In this case, the peer
device can only depend on an upper-layer application (for example, IGP) for link fault
detection.
For example, on the network shown in Figure 1-279, trigger-section bit error detection is
enabled on each interface, and nodes communicate through IS-IS routes. In normal cases,
IS-IS routes on PE1 and PE2 are preferentially transmitted over the primary link. Therefore,
traffic in both directions is forwarded over the primary link. If PE2 detects bit errors on the
interface to PE1:
The link-layer protocol status of the interface changes to bit-error-detection Down,
triggering IS-IS routes to be switched to the secondary link. Traffic from PE2 to PE1 is
then forwarded over the secondary link. PE2 uses a BFD packet to notify PE1 of the bit
errors.
After receiving the BFD packet, PE1 sets the link-layer protocol status of the
corresponding interface to bit-error-detection Down, triggering IS-IS routes to be
switched to the secondary link. Traffic from PE1 to PE2 is then forwarded over the
secondary link.
If trigger-section bit error detection is not supported or enabled on PE1's interface to PE2,
PE1 can only use IS-IS to detect that the primary link is unavailable, and then performs an
IS-IS route switchover.
Usage Scenario
If LDP LSPs are used, deploy bit-error-triggered section switching to cope with link bit errors
on the LDP LSPs.
After bit-error-triggered section switching is deployed, if bit errors occur on both the primary and
secondary links on an LDP LSP, the interface status changes to bit-error-detection Down on both the
primary and secondary links. As a result, services are interrupted. Therefore, it is recommended that you
deploy bit-error-triggered IGP route switching.
Background
Bit-error-triggered section switching can cope with link bit errors. If bit errors occur on both
the primary and secondary links, bit-error-triggered section switching changes the interface
status on both the primary and secondary links to bit-error-detection Down. As a result,
services are interrupted because no link is available. To resolve the preceding issue, deploy
bit-error-triggered IGP route switching. After the deployment is complete, link bit errors
trigger IGP route costs to be adjusted, preventing upper-layer applications from transmitting
service traffic to links with bit errors. Bit-error-triggered IGP route switching ensures normal
running of upper-layer applications and minimizes the impact of bit errors on services.
Implementation Principles
Link-quality bit error detection must be enabled on an interface. After detecting bit errors on
an inbound interface, a device notifies the interface management module of the bit errors. The
link quality level of the interface then changes to Low, triggering an IGP (OSPF or IS-IS) to
increase the cost of the interface's link. In this case, IGP routes do not preferentially select the
link with bit errors. After the bit errors are cleared, the link quality level of the interface
changes to Good, triggering the IGP to restore the original cost for the interface's link. In this
case, IGP routes preferentially select the link again. The device also notifies the BFD module
of the bit error status, and then uses BFD packets to advertise the bit error status to the peer
device.
If bit-error-triggered IGP route switching also has been deployed on the peer device, the
bit error status is advertised to the interface management module of the peer device. The
link quality level of the interface then changes to Low or Good, triggering the IGP to
increase the cost of the interface's link or restore the original cost for the link. IGP routes
on the peer device then do not preferentially select the link with bit errors or
preferentially select the link again.
If bit-error-triggered IGP route switching is not deployed on the peer device, the peer
device cannot detect the bit error status of the interface's link. Therefore, the IGP does
not adjust the cost of the link. Traffic from the peer device may still pass through the link
with bit errors. As a result, bidirectional IGP routes pass through different links. The
local device can receive traffic properly, and services are not interrupted. However, the
impact of bit errors on services cannot be eliminated.
For example, on the network shown in Figure 1-280, link-quality bit error detection is enabled
on each interface, and nodes communicate through IS-IS routes. In normal cases, IS-IS routes
on PE1 and PE2 are preferentially transmitted over the primary link. Therefore, traffic in both
directions is forwarded over the primary link. If PE2 detects bit errors on interface 1:
PE2 adjusts the link quality level of interface 1 to Low, triggering IS-IS to increase the
cost of the interface's link to a value (for example, 40). PE2 uses a BFD packet to
advertise the bit errors to PE1.
After receiving the BFD packet, PE1 also adjusts the link quality level of interface 1 to
Low, triggering IS-IS to increase the cost of the interface's link to a value (for example,
40).
IS-IS routes on both PE1 and PE2 preferentially select the secondary link, because the cost
(20) of the secondary link is less than the cost (40) of the primary link. Traffic in both
directions is then switched to the secondary link.
If bit-error-triggered IGP route switching is not supported or enabled on PE1, PE1 cannot
detect the bit errors. In this case, PE1 still sends traffic to PE2 through the primary link. PE2
can receive traffic properly, but services are affected by the bit errors.
If PE2 detects bit errors on both interface 1 and interface 2, PE2 adjusts the link quality levels
of the interfaces to Low, triggering the costs of the interfaces' links to be increased to 40.
IS-IS routes on PE2 still preferentially select the primary link to ensure service continuity,
because the cost (40) of the primary link is less than the cost (50) of the secondary link. To
eliminate the impact of bit errors on services, you must manually restore the link quality.
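The cost comparison in this example can be sketched as follows. The cost values, and the policy of raising a Low-quality link's cost to a fixed value of 40, are taken from the example above; the function names and data layout are assumptions.

```python
# Sketch of IGP route selection under bit-error-triggered cost adjustment:
# a link whose quality level is Low gets its cost raised, and routes then
# prefer whichever link has the lower effective cost.

LOW_QUALITY_COST = 40  # example cost applied to a link whose quality is Low

def effective_cost(base_cost, quality):
    return LOW_QUALITY_COST if quality == "Low" else base_cost

def preferred_link(links):
    """links: dict of name -> (base_cost, quality). Lowest effective cost wins."""
    return min(links, key=lambda name: effective_cost(*links[name]))

# Bit errors only on the primary link: the secondary link (cost 20) wins.
print(preferred_link({"primary": (10, "Low"), "secondary": (20, "Good")}))

# If the secondary path is more expensive (cost 50), the primary link is
# still preferred even with bit errors (40 < 50), so the impact of the bit
# errors on services is not eliminated.
print(preferred_link({"primary": (10, "Low"), "secondary": (50, "Good")}))
```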
Bit-error-triggered section switching and bit-error-triggered IGP route switching are mutually exclusive.
Usage Scenario
If LDP LSPs are used, deploy bit-error-triggered IGP route switching to cope with link bit
errors on the LDP LSPs. Bit-error-triggered IGP route switching ensures service continuity
even if bit errors occur on both the primary and secondary links on an LDP LSP. Therefore, it
is recommended that you deploy bit-error-triggered IGP route switching.
Background
If a trunk interface is used to increase bandwidth, improve reliability, and implement load
balancing, deploy bit-error-triggered trunk update to cope with bit errors detected on trunk
member interfaces.
Implementation Principles
According to the type of protection switching triggered, bit-error-triggered trunk update is
classified into modes such as trunk-bit-error-triggered IGP route switching, in which
link-quality bit error detection must be deployed on the trunk interface. After detecting bit
errors on a trunk interface's member
interface, a device advertises the bit errors to the trunk interface, triggering the trunk interface
to delete the member interface from the forwarding plane. The trunk interface then does not
select the member interface to forward traffic. After the bit errors are cleared from the
member interface, the trunk interface re-adds the member interface to the forwarding plane.
The trunk interface can then select the member interface to forward traffic. If bit errors occur
on all trunk member interfaces or the number of member interfaces without bit errors is lower
than the lower threshold for the trunk interface's Up links, the trunk interface ignores the bit
errors on the member interfaces and remains Up. However, the link quality level of the trunk
interface becomes Low, triggering an IGP (OSPF or IS-IS) to increase the cost of the trunk
interface's link. IGP routes then do not preferentially select the link. If the number of member
interfaces without bit errors reaches the lower threshold for the trunk interface's Up links, the
link quality level of the trunk interface changes to Good, triggering the IGP to restore the
original cost for the trunk interface's link. In this case, IGP routes preferentially select the link
again.
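The member-interface handling described above can be sketched as a small decision function. This is an illustrative sketch under stated assumptions, not device behavior: the function name, data layout, and return structure are all hypothetical.

```python
# Sketch of trunk update on bit errors: members with bit errors are removed
# from the forwarding plane, unless that would leave fewer healthy members
# than the trunk's lower threshold for Up links; in that case the bit errors
# are ignored, the trunk stays Up, and its link quality level drops to Low
# so that the IGP raises the cost of the trunk link.

def trunk_update(members, errored, min_up_links):
    """members: list of member names; errored: set of members with bit errors."""
    healthy = [m for m in members if m not in errored]
    if len(healthy) >= min_up_links:
        # Forward only over healthy members; trunk quality stays Good.
        return {"forwarding_members": healthy, "link_quality": "Good"}
    # Too few healthy members: keep all members in the forwarding plane and
    # signal Low quality instead of removing members.
    return {"forwarding_members": list(members), "link_quality": "Low"}

print(trunk_update(["m1", "m2", "m3"], {"m1"}, min_up_links=2))
# {'forwarding_members': ['m2', 'm3'], 'link_quality': 'Good'}
print(trunk_update(["m1", "m2", "m3"], {"m1", "m2"}, min_up_links=2))
# {'forwarding_members': ['m1', 'm2', 'm3'], 'link_quality': 'Low'}
```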
The device also notifies the BFD module of the bit error status, and then uses BFD packets to
advertise the bit error status to the peer device connected to the trunk interface.
If trunk-bit-error-triggered IGP route switching also has been deployed on the peer
device, the bit error status is advertised to the trunk interface of the peer device. The
trunk interface is then triggered to delete or re-add the member interface from or to the
forwarding plane. The link quality level of the trunk interface is also triggered to change
to Low or Good. In this case, the cost of IGP routes is adjusted, implementing
switchover or switchback synchronization with the device.
If trunk-bit-error-triggered IGP route switching is not deployed on the peer device, the
peer device cannot detect the bit error status of the interface's link. If the trunk interface
of the device has deleted the member interface with bit errors from the forwarding plane,
the trunk interface of the peer device may still select the member interface to forward
traffic. Similarly, if the link quality level of the trunk interface on the device has changed
to Low, the IGP is triggered to increase the cost of the trunk interface's link. In this case,
IGP routes do not preferentially select the link. However, IGP on the peer device does
not adjust the cost of the link. Traffic from the peer device may still pass through the link
with bit errors. As a result, bidirectional IGP routes pass through different links. To
ensure normal running of services, the device can receive traffic from the member
interface with bit errors. However, bit errors may affect service quality.
Layer 2 trunk interfaces do not support an IGP. Therefore, bit-error-triggered IGP route switching cannot
be deployed on Layer 2 trunk interfaces. If bit errors occur on all Layer 2 trunk member interfaces or the
number of member interfaces without bit errors is lower than the lower threshold for the trunk interface's
Up links, the trunk interface remains in the Up state. As a result, protection switching cannot be
triggered. To eliminate the impact of bit errors on services, you must manually restore the link quality.
Usage Scenario
If a trunk interface is deployed, deploy bit-error-triggered trunk update to cope with bit errors
detected on trunk member interfaces. Trunk-bit-error-triggered IGP route switching is
recommended.
Background
To cope with link bit errors along an RSVP-TE tunnel and reduce the impact of bit errors on
services, deploy bit-error-triggered RSVP-TE tunnel switching. After the deployment is
complete, service traffic is switched from the primary CR-LSP to the backup CR-LSP if bit
errors occur.
Implementation Principles
On the network shown in Figure 1-283, trigger-LSP bit error detection must be enabled on
each node's interfaces on the RSVP-TE tunnels. To implement dual-ended switching,
configure the RSVP-TE tunnels in both directions as bidirectional associated CR-LSPs. If a
node on a CR-LSP detects bit errors in a direction, the ingress of the tunnel obtains the BER
of the CR-LSP after BER calculation and advertisement. For details, see 1.5.9.2.1 Bit Error
Detection.
The ingress then determines the bit error status of the CR-LSP based on the BER threshold
configured for the RSVP-TE tunnel. For rules for determining the bit error status of the
CR-LSP, see Figure 1-284.
If the BER of the CR-LSP is greater than or equal to the switchover threshold of the
RSVP-TE tunnel, the CR-LSP enters the excessive BER state.
If the BER of the CR-LSP falls below the switchback threshold, the CR-LSP changes to
the normalized BER state.
Figure 1-284 Rules for determining the bit error status of the CR-LSP
After the bit error statuses of the primary and backup CR-LSPs are determined, the RSVP-TE
tunnel determines whether to perform a primary/backup CR-LSP switchover based on the
following rules:
If the primary CR-LSP is in the excessive BER state, the RSVP-TE tunnel attempts to
switch traffic to the backup CR-LSP.
If the primary CR-LSP changes to the normalized BER state or the backup CR-LSP is in
the excessive BER state, traffic is switched back to the primary CR-LSP.
The RSVP-TE tunnel in the opposite direction also performs the same switchover, so that
traffic in the upstream and downstream directions is not transmitted over the CR-LSP with bit
errors.
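The hysteresis and switchover rules above can be sketched as follows. The threshold values are illustrative assumptions, not device defaults; the state and selection logic follows the rules described in this section.

```python
# Hedged sketch of the BER hysteresis and primary/backup CR-LSP switchover
# rules. SWITCHOVER_BER and SWITCHBACK_BER are assumed example thresholds.

SWITCHOVER_BER = 1e-5   # assumed switchover threshold of the RSVP-TE tunnel
SWITCHBACK_BER = 1e-7   # assumed switchback threshold (lower than switchover)

def ber_state(ber, current_state):
    """Return 'excessive' or 'normal' using the hysteresis rules."""
    if ber >= SWITCHOVER_BER:
        return "excessive"
    if ber < SWITCHBACK_BER:
        return "normal"
    return current_state    # between the thresholds: state is unchanged

def active_lsp(primary_state, backup_state):
    """Select the CR-LSP that should carry traffic."""
    if primary_state == "excessive" and backup_state != "excessive":
        return "backup"
    return "primary"        # primary normal, or both CR-LSPs degraded

# Example: the primary CR-LSP degrades, then recovers.
state = ber_state(2e-5, "normal")
assert active_lsp(state, "normal") == "backup"
state = ber_state(1e-8, state)
assert active_lsp(state, "normal") == "primary"
```

Note that a BER between the two thresholds leaves the state unchanged, which prevents flapping between the primary and backup CR-LSPs.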
Usage Scenario
If RSVP-TE tunnels are used as public network tunnels, deploy bit-error-triggered RSVP-TE
tunnel switching to cope with link bit errors along the tunnels.
Background
When PW redundancy is configured for L2VPN services, bit-error-triggered switching can be
configured. With this function, if bit errors occur, services can switch between the primary
and secondary PWs.
Principles
Trigger-LSP bit error detection must be enabled on each node's interfaces. PW redundancy
can be configured in either a single segment or multi-segment scenario.
Single-segment PW redundancy scenario
In Figure 1-285, PE1 establishes a primary PW to PE2 and a secondary PW to PE3,
which implements PW redundancy. If PE2 detects bit errors, the processing is as follows:
− PE2 switches traffic destined for PE1 to the path bypass PW -> PE3 -> secondary
PW -> PE1 and sends a BFD packet to notify PE1 of the bit errors.
− Upon receipt of the BFD packet, PE1 switches traffic destined for PE2 to the path
secondary PW -> PE3 -> bypass PW -> PE2.
Traffic between PE1 and PE2 can travel along bit-error-free links.
After traffic switches to the secondary PW, and bit errors are removed from the primary PW,
traffic switches back to the primary PW based on a configured switchback policy.
If an RSVP-TE tunnel is established for PWs, and bit-error-triggered RSVP-TE tunnel switching is
configured, a switchover is preferentially performed between the primary and hot-standby CR-LSPs in
the RSVP-TE tunnel. A primary/secondary PW switchover can be triggered only if the
primary/hot-standby CR-LSP switchover fails to remove bit errors (for example, if bit errors occur on
both the primary and hot-standby CR-LSPs).
Usage Scenario
If L2VPN is used to carry user services and PW redundancy is deployed to ensure reliability,
deploy bit-error-triggered switching for PW to minimize the impact of bit errors on user
services and improve service quality.
Background
On an FRR-enabled HVPN, bit-error-triggered switching can be configured for VPN routes.
With this function, if bit errors occur on the HVPN, VPN routes re-converge so that traffic
switches to a bit-error-free link.
Principles
Trigger-LSP bit error detection must be enabled on each node's interfaces. In Figure 1-287, an
HVPN is configured on an IP/MPLS backbone network. VPN FRR is configured on a UPE. If
SPE1 detects bit errors, the processing is as follows:
SPE1 reduces the Local Preference attribute value or increases the Multi-Exit
Discriminator (MED) attribute value. Then, the preference value of a VPN route that
SPE1 advertises to an NPE is reduced. As a result, the NPE selects the VPN route to
SPE2, not the VPN route to SPE1. Traffic switches to the standby link. In addition, SPE1
sends a BFD packet to notify the UPE of bit errors.
Upon receipt of the BFD packet, the UPE switches traffic to the standby link over the
VPN route destined for SPE2.
If the bit errors on the active link are removed, the UPE re-selects the VPN routes destined for
SPE1, and SPE1 restores the preference value of the VPN route to be advertised to the NPE.
Then the NPE also re-selects the VPN route destined for SPE1.
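The route re-selection described above follows BGP's preference for a higher Local_Pref (and, on a tie, a lower MED). The following sketch illustrates the effect; all attribute values are assumptions for illustration.

```python
# Hedged sketch: lowering the Local_Pref advertised for the route via SPE1
# makes the NPE prefer the route via SPE2. Values are illustrative.

def best_route(routes):
    # BGP prefers the highest Local_Pref; on a tie, the lowest MED.
    return max(routes, key=lambda r: (r["local_pref"], -r["med"]))

normal = [
    {"next_hop": "SPE1", "local_pref": 200, "med": 0},
    {"next_hop": "SPE2", "local_pref": 100, "med": 0},
]
# SPE1 detects bit errors and reduces its advertised Local_Pref.
degraded = [
    {"next_hop": "SPE1", "local_pref": 50, "med": 0},
    {"next_hop": "SPE2", "local_pref": 100, "med": 0},
]

assert best_route(normal)["next_hop"] == "SPE1"
assert best_route(degraded)["next_hop"] == "SPE2"
```

When the bit errors are cleared and SPE1 restores the original attribute values, the same comparison makes the NPE re-select the route via SPE1.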
If an RSVP-TE tunnel is established for an L3VPN, and bit-error-triggered RSVP-TE tunnel switching
is configured, a traffic switchover between the primary and hot-standby CR-LSPs in the RSVP-TE
tunnel is preferentially performed. An active/standby L3VPN route switchover can be triggered only if
the primary/hot-standby CR-LSP switchover fails to remove bit errors (for example, if bit errors occur
on both the primary and hot-standby CR-LSPs).
Usage Scenario
If L3VPN is used to carry user services and VPN FRR is deployed to ensure reliability,
deploy bit-error-triggered L3VPN switching to minimize the impact of bit errors on user
services and improve service quality.
Background
In PW/E-PW over static CR-LSP scenarios, if primary and secondary PWs are configured,
deploy bit-error-triggered protection switching. If bit errors occur, service traffic is switched
from the primary PW to the secondary PW.
Implementation Principles
The MAC-layer SD alarm function (Trigger-LSP type) must be enabled on interfaces, and
then MPLS-TP OAM must be deployed to monitor CR-LSPs/PWs. Static PWs/E-PWs are
classified as SS-PWs or MS-PWs.
In an SS-PW networking scenario (see Figure 1-288), the bit error generation and clearing
process is as follows:
Bit error generation:
If the BER on an inbound interface of the P node reaches a specified threshold, the CRC
module detects the bit error status of the inbound interface, notifies all static CR-LSP
modules, and constructs and sends AIS packets to PE2.
Upon receipt of the AIS packets, PE2 notifies static PWs established over the CR-LSPs
of the bit errors and instructs the TP OAM module to perform APS. APS triggers a
primary/backup CR-LSP switchover, and a PW established over the new primary
CR-LSP takes over traffic.
Bit error clearing: After bit errors are cleared, the CRC module no longer detects the bit error
status on the inbound interface. The CRC module informs the TP OAM module that the bit
errors have been cleared. Upon receipt of the notification, the TP OAM module stops sending
AIS packets to PE2 functioning as the egress. PE2 does not receive AIS packets after a
specified period and determines that the bit errors have been cleared. PE2 then generates an
AIS clear alarm and instructs the TP OAM to perform APS. APS triggers a primary/backup
CR-LSP switchover, and services are switched back to the PW over the primary CR-LSP.
In an MS-PW networking scenario (see Figure 1-289), the bit error generation and clearing
process is as follows:
Bit error generation:
The CRC module of an inbound interface on the SPE detects bit errors and determines to
send either an SF or SD alarm based on a specified BER threshold. The CRC module
then notifies the TP OAM module of the bit errors. The TP OAM module notifies the bit
error status, sends RDI packets, and performs APS. The APS module instructs the peer
node to perform a traffic switchover, which triggers a primary/backup CR-LSP
switchover. The PW established over the bit-error-free CR-LSP takes over traffic.
If the BER on an inbound interface of the SPE reaches a specified threshold, the CRC
module detects the bit error status of the inbound interface, sets all static CR-LSP
modules to the bit error status, and constructs and sends AIS packets to PE2.
Upon receipt of the AIS packets, PE2 notifies the TP OAM module. The TP OAM
module then performs APS, which triggers a primary/backup CR-LSP switchover. The
PW established over the bit-error-free CR-LSP takes over traffic.
Bit error clearing: After bit errors are cleared, the CRC module no longer detects the bit error
status on the inbound interface. The CRC module informs the TP OAM module that the bit
errors have been cleared. Upon receipt of the notification, the TP OAM module stops sending
AIS packets to PE2 functioning as the egress. PE2 does not receive AIS packets after a
specified period and determines that the bit errors have been cleared. PE2 then generates an
AIS clear alarm and instructs the TP OAM to perform APS. APS triggers a primary/backup
CR-LSP switchover, and services are switched back to the PW over the primary CR-LSP.
If a tunnel protection group has been deployed for static CR-LSPs carrying PWs/E-PWs, bit errors
preferentially trigger static CR-LSP protection switching. Bit-error-triggered PW protection switching is
performed only when bit-error-triggered static CR-LSP protection switching fails to protect services
against bit errors (for example, bit errors occur on both the primary and backup CR-LSPs).
Usage Scenario
If static CR-LSPs/PWs/E-PWs are used to carry user services and MPLS-TP OAM is
deployed to ensure reliability, deploy bit-error-triggered APS to minimize the impact of bit
errors on user services and improve service quality.
1.5.9.3 Applications
1.5.9.3.1 Application of Bit-Error-Triggered Protection Switching in a Scenario in Which
TE Tunnels Carry an IP RAN
Networking Description
Figure 1-290 shows typical L2VPN+L3VPN networking in an IP RAN application. A VPWS
based on an RSVP-TE tunnel is deployed at the access layer, an L3VPN based on an
RSVP-TE tunnel is deployed at the aggregation layer, and L2VPN access to L3VPN is
configured on the AGGs. To ensure reliability, deploy PW redundancy for the VPWS,
configure VPN FRR protection for the L3VPN, and configure hot-standby protection for the
RSVP-TE tunnels.
Feature Deployment
To prevent the impact of bit errors on services, deploy bit-error-triggered RSVP-TE tunnel
switching, bit-error-triggered PW switching, and bit-error-triggered L3VPN route switching in
the scenario shown in Figure 1-290. The deployment process is as follows:
Enable trigger-LSP bit error detection on each interface.
Bit-error-triggered RSVP-TE tunnel switching: Enable bit-error-triggered protection
switching on the RSVP-TE tunnel interfaces of the CSG and AGG1, and configure
thresholds for bit-error-triggered RSVP-TE tunnel switching.
Bit-error-triggered PW switching: Enable bit-error-triggered PW switching on the
interfaces that connect the CSG and AGG1 and the interfaces that connect the CSG and
AGG2.
Bit-error-triggered L3VPN route switching: Configure bit-error-triggered L3VPN route
switching in the VPNv4 view of AGG1.
Scenario 2
On the network shown in Figure 1-292, if bit errors occur on both locations 1 and 2, both the
primary and secondary links of the RSVP-TE tunnel between the CSG and AGG1 detect the
bit errors. In this case, bit-error-triggered RSVP-TE tunnel switching cannot protect services
against bit errors. The bit errors further trigger PW and L3VPN route switching.
After detecting the bit errors, the CSG performs a primary/secondary PW switchover and
switches upstream traffic to AGG2.
After detecting the bit errors, AGG1 reduces the priority of VPNv4 routes advertised to
RSG1, so that RSG1 preferentially selects VPNv4 routes advertised by AGG2.
Downstream traffic is then switched to AGG2.
Networking Description
Figure 1-293 shows typical L2VPN+L3VPN networking in an IP RAN application. A VPWS
based on an LDP LSP is deployed at the access layer, an L3VPN based on an LDP LSP is
deployed at the aggregation layer, and L2VPN access to L3VPN is configured on the AGGs.
To ensure reliability, deploy LDP and IGP synchronization for the LDP LSPs, and configure
Eth-Trunk interfaces on key links.
Feature Deployment
To prevent the impact of bit errors on services, deploy bit-error-triggered IGP route switching
in the scenario shown in Figure 1-293. Deploy trunk-bit-error-triggered IGP route switching
on the Eth-Trunk interfaces. The deployment process is as follows:
Enable link-quality bit error detection on each physical interface and Eth-Trunk member
interface.
Enable bit-error-triggered IGP route switching on each physical interface and Eth-Trunk
interface.
Scenario 2
On the network shown in Figure 1-295, if bit errors occur on location 2 (Eth-Trunk member
interface), AGG1 detects the bit errors.
If the number of member interfaces without bit errors is still higher than the lower
threshold for the Eth-Trunk interface's Up links, the Eth-Trunk interface deletes the
member interface with bit errors from the forwarding plane. In this case, service traffic is
still forwarded over a normal path.
If the number of member interfaces without bit errors is lower than the lower threshold
for the Eth-Trunk interface's Up links, the Eth-Trunk interface ignores the bit errors on
the Eth-Trunk member interface and remains Up. However, the link quality level of the
Eth-Trunk interface becomes Low, triggering an IGP (OSPF or IS-IS) to increase the cost
of the Eth-Trunk interface's link. IGP routes then do not preferentially select the link.
AGG1 also uses a BFD packet to advertise the bit errors to the peer device, so that the
peer device also performs the same processing. Both upstream and downstream traffic is
then switched to the paths without bit errors.
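The per-member decision described above can be sketched as follows. The strict greater-than comparison against the lower threshold is an assumption for illustration.

```python
# Hedged sketch of the Eth-Trunk reaction to a member with bit errors:
# remove the faulty member only while enough error-free members remain;
# otherwise keep it, and let the Low link quality raise the IGP cost.

def trunk_reaction(ok_members, min_up_links):
    """ok_members: number of member interfaces without bit errors.
    min_up_links: lower threshold for the trunk interface's Up links."""
    if ok_members > min_up_links:
        return "remove faulty member"    # traffic stays on healthy members
    return "keep member, quality Low"    # trunk stays Up; IGP raises link cost

assert trunk_reaction(ok_members=5, min_up_links=2) == "remove faulty member"
assert trunk_reaction(ok_members=2, min_up_links=2) == "keep member, quality Low"
```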
Networking Description
Figure 1-296 shows a typical IP RAN. L2VPN services are carried on static CR-LSPs.
CR-LSP APS is configured to provide tunnel-level protection. Additionally, PW APS/E-PW
APS is configured for L2VPN services to provide service-level protection.
Feature Deployment
To meet high reliability requirements of the IP RAN and protect services against bit errors,
configure bit-error-triggered protection switching for the CR-LSPs/PWs. To do so, enable bit
error detection on the interfaces along the CR-LSPs/PWs, configure the switching type as
trigger-LSP, and configure bit error alarm generation and clearing thresholds. If the BER
reaches the bit error alarm threshold configured on an interface of a device along a static
CR-LSP or PW, the device determines that a bit error occurrence event has occurred and
notifies the MPLS-TP OAM module of the event. The MPLS-TP OAM module uses AIS
packets to advertise the bit error status to the egress, and then APS is used to trigger a traffic
switchover.
Terms
Term Definition
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
Network planning engineers
Commissioning engineers
Data configuration engineers
System maintenance engineers
Security Declaration
Encryption algorithm declaration
The encryption algorithms DES/3DES/SKIPJACK/RC2/RSA (RSA-1024 or
lower)/MD2/MD4/MD5 (in digital signature scenarios and password encryption)/SHA1
(in digital signature scenarios) provide low security and may bring security risks. If the
protocols allow it, using more secure encryption algorithms, such as AES/RSA
(RSA-2048 or higher)/SHA2/HMAC-SHA2, is recommended.
Password configuration declaration
− Do not set both the start and end characters of a password to "%^%#". This causes
the password to be displayed directly in the configuration file.
− To further improve device security, periodically change the password.
Personal data declaration
Your purchased products, services, or features may use some of users' personal data during
service operation or fault locating. You must define user privacy policies in compliance
with local laws and take proper measures to fully protect personal data.
Feature declaration
− The NetStream feature may be used to analyze the communication information of
terminal customers for network traffic statistics and management purposes. Before
enabling the NetStream feature, ensure that it is performed within the boundaries
permitted by applicable laws and regulations. Effective measures must be taken to
ensure that information is securely protected.
− The mirroring feature may be used to analyze the communication information of
terminal customers for a maintenance purpose. Before enabling the mirroring
function, ensure that it is performed within the boundaries permitted by applicable
laws and regulations. Effective measures must be taken to ensure that information is
securely protected.
− The packet header obtaining feature may be used to collect or store some
communication information about specific customers for transmission fault and
error detection purposes. Huawei does not unilaterally collect or store this
information. Before enabling the function, ensure that it is performed
within the boundaries permitted by applicable laws and regulations. Effective
measures must be taken to ensure that information is securely protected.
Reliability design declaration
Network planning and site design must comply with reliability design principles and
provide device-level and solution-level protection. Device-level protection includes
dual-network and inter-board dual-link planning principles to avoid a single point of
failure or a single link failure. Solution-level protection refers to fast convergence
mechanisms, such as FRR and VRRP.
Special Declaration
This document serves only as a guide. The content is written based on device
information gathered under lab conditions. The content provided by this document is
intended to be taken as general guidance, and does not cover all scenarios. The content
provided by this document may be different from the information on user device
interfaces due to factors such as version upgrades and differences in device models,
board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are
beyond the scope of this document.
The maximum values provided in this document are obtained in specific lab
environments (for example, only a certain type of board or protocol is configured on a
tested device). The maximum values actually obtained may differ from the
maximum values provided in this document due to factors such as differences in
hardware configurations and carried services.
Interface numbers used in this document are examples. Use the existing interface
numbers on devices for configuration.
The pictures of hardware in this document are for reference only.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
Indicates an imminently hazardous situation which, if not
avoided, will result in death or serious injury.
Change History
Updates between document issues are cumulative. Therefore, the latest document issue
contains all updates made in previous issues.
Changes in Issue 03 (2017-09-20)
This issue is the third official release. The software version of this issue is
V800R009C10SPC200.
Changes in Issue 02 (2017-07-30)
This issue is the second official release. The software version of this issue is
V800R009C10SPC100.
Changes in Issue 01 (2017-05-30)
This issue is the first official release. The software version of this issue is
V800R009C10.
1.6.2 Introduction
Definition
An interface is a point of interaction between devices on a network. Interfaces are classified
into physical and logical interfaces.
Physical interfaces physically exist on a device.
Logical interfaces are manually configured interfaces that do not exist physically.
Logical interfaces can be used to exchange data.
Purpose
A physical interface connects a device to another device using a transmission medium (for
example, a cable). The physical interface and transmission medium together form a
transmission channel that transmits data between the devices. Before data reaches a device, it
must pass through the transmission channel. In addition, sufficient bandwidth must be
provided to reduce channel congestion.
A logical interface does not require additional hardware, thereby reducing costs.
Benefits
Data can be transmitted properly over the transmission channel formed by a physical
interface and a transmission medium, thereby enabling communication between users.
Data communication can be implemented using logical interfaces, without additional
hardware requirements.
1.6.3 Principles
1.6.3.1 Basic Concepts
Interface Types
The router exchanges data and interacts with other devices on a network through interfaces.
Interfaces are classified into physical and logical interfaces.
Physical Interfaces
Physical interfaces physically exist on boards. They are divided into the following types:
− LAN interfaces: interfaces through which the router can exchange data with the
devices on a LAN.
− WAN interfaces: interfaces through which the router can exchange data with remote
devices on a WAN.
Logical Interfaces
Logical interfaces are manually configured interfaces that do not exist physically.
Logical interfaces can be used to exchange data.
Table 1-81 Command views and prompts of physical interfaces supported by the NE20E
MTU
The maximum transmission unit (MTU) is the size (in bytes) of the longest packet that can be
transmitted on a physical network. The MTU is very important for interworking between two
devices on a network. If the size of a packet exceeds the MTU supported by a transit node or a
receiver, the transit node or receiver may fragment the packet before forwarding it or may
even discard it, increasing the network transmission load. MTU values must be correctly
negotiated between devices to ensure that packets reach the receiver.
If fragmentation is disallowed, packet loss may occur during data transmission at the IP
layer. To ensure that long packets are not discarded during transmission, configure
forcible fragmentation for long packets.
When an interface with a small MTU receives long packets, the packets have to be
fragmented. Consequently, when the quality of service (QoS) queue becomes full, some
packets may be discarded.
If an interface has a large MTU, packets may be transmitted at a low speed.
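As a rough illustration of the fragmentation behavior described above, the following sketch counts the IPv4 fragments needed for a given MTU, assuming a 20-byte header without options. It is a simplified model, not device behavior.

```python
# Hedged sketch: an IPv4 packet larger than the link MTU must be fragmented
# (fragment payloads are multiples of 8 bytes), or dropped if the DF bit
# forbids fragmentation.

import math

IPV4_HEADER = 20  # bytes, assuming no IP options

def fragment_count(packet_len, mtu, df=False):
    if packet_len <= mtu:
        return 1
    if df:
        return 0  # dropped: fragmentation needed but disallowed
    payload_per_frag = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil((packet_len - IPV4_HEADER) / payload_per_frag)

assert fragment_count(1400, 1500) == 1
assert fragment_count(4000, 1500) == 3   # 3980 bytes of payload / 1480
assert fragment_count(4000, 1500, df=True) == 0
```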
Control-Flap
The status of an interface on a device may alternate between Up and Down for various
reasons, including physical signal interference and incorrect link layer configurations. The
changing status causes Multiprotocol Label Switching (MPLS) and routing protocols to flap.
As a result, the device may break down, causing network interruption. Control-flap controls
the frequency of interface status alternations between Up and Down to minimize the impact
on device and network stability.
The following two control modes are available.
control-flap
Concepts of control-flap:
− Penalty value and threshold
An interface is suppressed or freed from suppression based on the penalty value.
Penalty value: This value is calculated based on the status of the interface
using the suppression algorithm. The penalty value increases each time the
interface status changes and decreases by half every half life.
Suppression threshold (suppress): The interface is suppressed when the penalty
value is greater than the suppression threshold.
Reuse threshold (reuse): The interface is no longer suppressed when the
penalty value is smaller than the reuse threshold.
Ceiling threshold (ceiling): The penalty value no longer increases when the
penalty value reaches the ceiling threshold.
The parameter configuration complies with the following rule: reuse threshold
(reuse) < suppression threshold (suppress) < maximum penalty value (ceiling).
− Half life
When an interface goes Down for the first time, the half life starts. The device
applies the half life that matches the current interface status. If a specific half
life is reached, the penalty value decreases by half. Once a half life ends, another
half life starts.
Half life when an interface is Up (decay-ok): When the interface is Up, if the
period since the end of the previous half life reaches the current half life, the
penalty value decreases by half.
Half life when an interface is Down (decay-ng): When the interface is Down,
if the period since the end of the previous half life reaches the current half life,
the penalty value decreases by half.
− Maximum suppression time: The maximum suppression time of an interface is 30
minutes. When the period during which an interface is suppressed reaches the
maximum suppression time, the interface is automatically freed from suppression.
− Penalty value: This value is calculated based on the status of the interface using the
suppression algorithm. The core of the suppression algorithm is that the penalty
value increases each time the interface status changes and decreases
exponentially over time.
− Suppression threshold: The interface is suppressed when the penalty value is greater
than the suppression threshold. The suppression threshold must be greater than the
reuse threshold and smaller than the ceiling threshold.
− Reuse threshold: The interface is no longer suppressed when the penalty value is
smaller than the reuse threshold. The reuse threshold must be smaller than the
suppression threshold.
− Ceiling threshold: The penalty value no longer increases when the penalty value
reaches the ceiling threshold. The ceiling threshold must be greater than the
suppression threshold.
You can set the preceding parameters on the NE20E to restrict the frequency at which an
interface can alternate between Up and Down.
Principles of interface flapping control:
In Figure 1-297, the default penalty value of an interface is 0. The penalty value
increases by 400 each time the interface goes Down. When an interface goes Down for
the first time, the half life starts. The system checks whether the specific half life expires
at an interval of 1s. If the specific half life expires, the penalty value decreases by half.
Once a half life ends, another half life starts.
− If the penalty value reaches suppress, the interface is suppressed. When the
interface is suppressed, the outputs of the display interface, display interface brief,
and display ip interface brief commands show that the protocol status of the
interface remains DOWN(dampening suppressed) and does not change with the
physical status.
− If the penalty value falls below reuse, the interface is freed from suppression. When
the interface is freed from suppression, the protocol status of the interface is in
compliance with the actual status and does not remain Down (dampening
suppressed).
− If the penalty value reaches ceiling, the penalty value no longer increases.
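The control-flap behavior above can be sketched as follows. The per-Down increment of 400 follows the example in this section; the suppress, reuse, and ceiling values are illustrative assumptions.

```python
# Hedged sketch of control-flap: the penalty increases by 400 on each Down
# event (capped at ceiling) and halves after each half life. An interface is
# suppressed above suppress and freed from suppression below reuse.

SUPPRESS, REUSE, CEILING = 1000, 500, 6000   # illustrative thresholds
INCREMENT = 400                              # per-Down increment (example value)

def on_down(penalty):
    return min(penalty + INCREMENT, CEILING)

def after_half_life(penalty):
    return penalty / 2

penalty = 0
for _ in range(4):                 # four flaps in quick succession
    penalty = on_down(penalty)
suppressed = penalty > SUPPRESS    # protocol status: DOWN(dampening suppressed)
assert suppressed and penalty == 1600

while penalty >= REUSE:            # penalty decays half life by half life
    penalty = after_half_life(penalty)
suppressed = False                 # freed from suppression below reuse
assert penalty < REUSE
```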
damp-interface
Related concepts:
− penalty value: a value calculated by a suppression algorithm based on an interface's
flapping. The suppression algorithm increases the penalty value by a specific value
each time an interface goes Down and decreases the penalty value exponentially
each time the interface goes Up.
− suppress: An interface is suppressed if the interface's penalty value is greater than
the suppress value.
− reuse: An interface is no longer suppressed if the interface's penalty value is less
than the reuse value.
− ceiling: the maximum penalty value, calculated using the formula ceiling = reuse x
2^(max-suppress-time/half-life-period). An interface's penalty value no longer increases when
it reaches ceiling.
− half-life-period: period that the penalty value takes to decrease to half. A
half-life-period begins to elapse when an interface goes Down for the first time. If
a half-life-period elapses, the penalty value decreases to half, and another
half-life-period begins.
− max-suppress-time: maximum period during which an interface's status is
suppressed. After max-suppress-time elapses, the interface's actual status is
reported to upper layer services.
Figure 1-298 shows the relationship between the preceding parameters. To facilitate
understanding, figures in Figure 1-298 are all multiplied by 1000.
At t1, an interface goes Down, and its penalty value increases by 1000. Then, the
interface goes Up, and its penalty value decreases exponentially based on the half-life
rule. At t2, the interface goes Down again, and its penalty value increases by 1000,
reaching 1600, which has exceeded the suppress value 1500. At this time if the interface
goes Up again, its status is suppressed. As the interface keeps flapping, its penalty value
keeps increasing until it reaches the ceiling value 10000 at tA. As time goes by, the
penalty value decreases and reaches the reuse value 750 at tB. The interface status is
then no longer suppressed.
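The parameter relationship above can be checked numerically. The reuse value is taken from the example; the half-life-period and max-suppress-time are assumed values for illustration.

```python
# Hedged sketch of the damp-interface ceiling formula and decay behavior.
# reuse = 750 comes from the example above; the timers are assumptions.

REUSE = 750
HALF_LIFE = 15.0          # assumed half-life-period, in seconds
MAX_SUPPRESS = 60.0       # assumed max-suppress-time, in seconds

# ceiling = reuse x 2^(max-suppress-time / half-life-period)
ceiling = REUSE * 2 ** (MAX_SUPPRESS / HALF_LIFE)
assert ceiling == 12000   # 750 * 2^4

# A penalty at the ceiling decays back below reuse within roughly
# max-suppress-time, which is why suppression is bounded in time.
penalty, elapsed = ceiling, 0.0
while penalty >= REUSE:
    penalty /= 2
    elapsed += HALF_LIFE
assert elapsed <= MAX_SUPPRESS + HALF_LIFE
```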
Loopback interfaces, Layer 2 interfaces converted from Layer 3 interfaces using the portswitch
command, and NULL interfaces do not support setting the maximum transmission unit (MTU) or
deploying control-flap.
An interface monitoring group monitors the status of all its binding interfaces. When a specific
proportion of binding interfaces go Down, the track interface associated with the interface monitoring
group goes Down, which causes traffic to be switched from the master link to the backup link.
When the number of Down binding interfaces falls below a specific threshold, the track
interface goes Up, and traffic is switched back to the master link.
In the example network shown in Figure 1-299, ten binding interfaces are located on the
network side, and two track interfaces are located on the user side. You can set a Down weight
for each binding interface and a Down weight threshold for each track interface. For example,
the Down weight of each binding interface is set to 10, and the Down weight thresholds of
track interfaces A and B are set to 20 and 80, respectively. When the number of Down binding
interfaces in the interface monitoring group increases to 2, the system automatically instructs
track interface A to go Down. When the number of Down binding interfaces in the interface
monitoring group increases to 8, the system automatically instructs track interface B to go
Down. When the number of Down binding interfaces in the interface monitoring group falls
below 8, track interface B automatically goes Up. When the number of Down binding
interfaces in the interface monitoring group falls below 2, track interface A automatically goes
Up.
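The example above maps directly to a small sketch, with the weights and thresholds taken from the example.

```python
# Hedged sketch of the interface monitoring group example: ten binding
# interfaces each with Down weight 10; track interfaces A and B have Down
# weight thresholds 20 and 80.

BINDING_WEIGHT = 10
THRESHOLDS = {"A": 20, "B": 80}

def track_states(down_binding_count):
    weight = down_binding_count * BINDING_WEIGHT
    return {name: ("Down" if weight >= thresh else "Up")
            for name, thresh in THRESHOLDS.items()}

assert track_states(1) == {"A": "Up", "B": "Up"}
assert track_states(2) == {"A": "Down", "B": "Up"}    # 2 members Down -> A Down
assert track_states(8) == {"A": "Down", "B": "Down"}  # 8 members Down -> B Down
assert track_states(7) == {"A": "Down", "B": "Up"}    # below 8 again -> B Up
```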
1.6.4 Applications
1.6.4.1 Sub-interface
In the network shown in Figure 1-300, multiple sub-interfaces are configured on the physical
interface of Device. Like a physical interface, each sub-interface can be configured with one
IP address. The IP address of a sub-interface must be on the same network segment as the IP
address of a remote network, and the IP address of each sub-interface must be on a unique
network segment.
1.6.4.2 Eth-Trunk
In the network shown in Figure 1-301, an Eth-Trunk that bundles two full-duplex 1000 Mbit/s
interfaces is established between Device A and Device B. The maximum bandwidth of the
trunk link is 2000 Mbit/s.
Backup is enabled within the Eth-Trunk. If a link fails, traffic is switched to the other link to
ensure link reliability.
In addition, network congestion can be avoided because traffic between Device A and Device
B is balanced between the two member links.
The application and networking diagram of IP-Trunk are similar to those of Eth-Trunk.
Improving Reliability
IP address unnumbered
When an interface needs an IP address only for a short period, it can borrow an IP
address from another interface to save IP address resources. The interface is usually
configured to borrow a loopback interface address because loopback interfaces are stable.
Router ID
Some dynamic routing protocols require that routers have IDs. A router ID uniquely
identifies a router in an autonomous system (AS).
If no router ID is configured for OSPF or BGP, the system selects the largest IP address among the local interface IP addresses as the router ID. If the IP address of a physical interface is selected and that interface goes Down, the system does not reselect a router ID until the selected IP address is deleted.
Because the loopback interface is stable and always Up, it is recommended as the router
ID of a router.
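As a sketch of the selection behavior described above (the function and data layout are hypothetical illustrations, not VRP code):

```python
# Hypothetical sketch of router ID selection: the largest local interface
# IP address is chosen, and loopback addresses are preferred because a
# loopback interface is always Up. Illustrative only.
import ipaddress

def select_router_id(interfaces):
    """interfaces: list of (interface_name, ip_string) tuples."""
    loopbacks = [ip for name, ip in interfaces if name.startswith("Loopback")]
    candidates = loopbacks or [ip for _, ip in interfaces]
    # Compare addresses numerically, not lexically.
    return str(max(candidates, key=lambda ip: int(ipaddress.IPv4Address(ip))))

ifs = [("GigabitEthernet0/1/0", "10.1.1.1"),
       ("GigabitEthernet0/1/1", "192.168.1.1"),
       ("Loopback0", "10.255.0.1")]
print(select_router_id(ifs))  # 10.255.0.1 (loopback preferred)
```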
BGP
To prevent BGP sessions from being affected by physical interface faults, you can
configure a loopback interface as the source interface that sends BGP packets.
When a loopback interface is used as the source interface of BGP packets, note the
following:
− The loopback interface address of the BGP peer must be reachable.
− In the case of an EBGP connection, EBGP is allowed to establish neighbor
relationships through indirectly connected interfaces.
MPLS LDP
In MPLS LDP, a loopback interface address is often used as the transmission address to
ensure network stability. This IP address could be a public network address.
Classifying information
SNMP
To ensure server security, a loopback interface address, rather than the outbound interface address, is used as the source IP address of SNMP trap messages. The system then allows only packets from the loopback interface address to access the SNMP port, which filters packets and protects the SNMP management system. This also facilitates reading and writing trap messages.
NTP
The Network Time Protocol (NTP) synchronizes the time of all devices. NTP specifies a
loopback interface address as the source address of the NTP packets sent from the local
router.
To ensure the security of NTP, NTP specifies a loopback interface address rather than the
outbound interface address as the source address. In this situation, the system allows
only the packets from the loopback interface address to access the NTP port. In this
manner, packets are filtered to protect the NTP system.
Information recording
During the display of network traffic records, a loopback interface address can be
specified as the source IP address of the network traffic to be output.
In this manner, packets are filtered to facilitate network traffic collection. This is because
only the packets from the loopback interface address can access the specified port.
Security
Identifying the source IP address of logs on the user log server helps to locate the source
of the logs rapidly. It is recommended that you configure a loopback address as the
source IP address of log messages.
HWTACACS
Application Scenario
The Null0 interface does not forward packets. All packets sent to this interface are discarded.
The Null0 interface is applied in two situations:
Loop prevention
The Null0 interface is typically used to prevent routing loops. For example, during route
aggregation, a route to the Null0 interface is always created.
In the example network shown in Figure 1-302, Device A provides access services for
multiple remote nodes.
Device A is the gateway of the local network that uses the Class B network segment
address 172.16.1.1/16. Device A connects to three subnets through Device B, Device C,
and Device D respectively.
Figure 1-302 Example for using the Null0 interface to prevent routing loops
Therefore, configuring a static route on Device A whose outgoing interface is the Null0
interface can prevent routing loops.
Traffic filtering
The Null0 interface provides an optional method for filtering traffic. Unnecessary
packets are sent to the Null0 interface to avoid using an Access Control List (ACL).
Both the Null0 interface and ACL can be used to filter traffic as follows.
− Before the ACL can be used, ACL rules must be configured and then applied to an
interface. When a router receives a packet, it searches the ACL.
If the action is permit, the router searches the forwarding table and then
determines whether to forward or discard the packet.
If the action is deny, the router discards the packet.
− The Null0 interface must be specified as the outbound interface of unnecessary
packets. When a router receives a packet, it searches the forwarding table. If the
router finds that the outbound interface of the packet is the Null0 interface, it
discards the packet.
Using the Null0 interface to filter traffic is simpler and faster: filtering with the Null0 interface requires only a route, whereas filtering with an ACL requires an ACL rule to be configured and then applied to the corresponding interface on a router.
The Null0 interface, however, can filter only route-based traffic, whereas an ACL can filter both route-based and interface-based traffic.
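The route-based discard behavior can be illustrated with a minimal longest-prefix-match sketch; the route table contents and function names are assumptions for illustration only:

```python
# Illustrative forwarding sketch: a Null0 route silently discards packets.
import ipaddress

ROUTES = {  # prefix -> outbound interface; "Null0" means discard
    ipaddress.ip_network("172.16.0.0/16"): "GigabitEthernet0/1/0",
    ipaddress.ip_network("192.168.100.0/24"): "Null0",
}

def forward_by_route(dst_ip):
    """Longest-prefix match on ROUTES; Null0 routes drop the packet."""
    matches = [(net, iface) for net, iface in ROUTES.items()
               if ipaddress.ip_address(dst_ip) in net]
    if not matches:
        return "discard (no route)"
    _, iface = max(matches, key=lambda m: m[0].prefixlen)
    return "discard (Null0)" if iface == "Null0" else f"forward via {iface}"

print(forward_by_route("172.16.1.5"))     # forward via GigabitEthernet0/1/0
print(forward_by_route("192.168.100.9"))  # discard (Null0)
```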
To resolve this problem, you can configure an interface monitoring group and add multiple
network-side interfaces on the PEs to the interface monitoring group. When a link failure
occurs on the network side and the interface monitoring group detects that the status of a
certain proportion of network-side interfaces changes, the system instructs the user-side
interfaces associated with the interface monitoring group to change their status accordingly
and allows traffic to be switched between the master and backup links. Therefore, the
interface monitoring group can be used to prevent traffic overloads or interruptions.
1.7.2 Ethernet
1.7.2.1 Introduction
Overview
Ethernet technology originated from an experimental network on which multiple PCs were
connected at 3 Mbit/s. In general, Ethernet refers to a standard connection for 10 Mbit/s
Ethernet networks. The Digital Equipment Corporation (DEC), Intel, and Xerox joined efforts
to develop and then issue Ethernet technology in 1982. The IEEE 802.3 standard is based on
and compatible with the Ethernet standard.
In this document, Ethernet_II indicates Ethernet frames in Ethernet_II format, and IEEE 802.3 indicates Ethernet frames in IEEE 802.3 format.
Purpose
Ethernet and token ring networks are typical local area networks (LANs).
Ethernet has become the most important LAN networking technology because it is flexible,
simple, and easy to implement.
Shared Ethernet
Initially, Ethernet networks were shared networks with 10M Ethernet technology.
Ethernet networks were constructed with coaxial cables, and computers and terminals were connected through intricate connectors. This structure was complex and suitable only for communications in half-duplex mode because only one shared line existed.
In 1990, 10BASE-T Ethernet based on twisted pair cables emerged. In this technology,
terminals are connected to a hub through twisted pair cables and communicate through a
shared bus in the hub. The structure is physically a star topology. CSMA/CD is still used
because inside the hub, all terminals are connected to a shared bus.
All the hosts are connected to a coaxial cable in a similar manner. When a large number
of hosts exist, the following problems arise:
− Reliability of the media is low.
− Media access conflicts are severe.
− Packets are not properly broadcast.
− Security is not ensured.
100M Ethernet
100M Ethernet works at a higher rate (10 times the rate of 10M Ethernet) and differs
from 10M Ethernet in the following ways:
− Network type: 10M Ethernet supports only a shared Ethernet, while 100M Ethernet
is a 10M/100M auto-sensing Ethernet and can work in half-duplex or full-duplex
mode.
− Negotiation mechanism: 10M Ethernet uses Normal Link Pulses (NLPs) to detect
the link connection status, while 100M Ethernet uses auto-negotiation between two
link ends.
Gigabit Ethernet (GE) and 10GE
With the advancement of computer technology, applications such as large-scale
distributed databases and high-speed transmission of video images emerged. Those
applications require high bandwidth, and traditional 100M Fast Ethernet (FE) cannot
meet the requirements. GE was introduced to provide higher bandwidth.
GE inherits the data link layer of traditional Ethernet, which protects earlier investments in traditional Ethernet. GE and traditional Ethernet, however, have different physical layers: to transmit data at 1000 Mbit/s, GE uses optical fiber channels.
As computer science develops, the 10GE technology becomes mature and is widely used
on Datacom backbone networks. This technology is also used to connect high-end
database servers.
1.7.2.2 Principles
1.7.2.2.1 Ethernet Physical Layer
In these cabling standards, 10, 100, and 1000 represent the transmission rate (in Mbit/s), and
BASE represents baseband.
10M Ethernet cable standard
Table 1-85 lists the 10M Ethernet cabling standard specifications defined in IEEE 802.3.
The greatest limitation of coaxial cable is that devices on the cable are connected in
series, so a single point of failure (SPOF) may cause a breakdown of the entire network.
As a result, the physical standards of coaxial cables, 10BASE-2 and 10BASE-5, have
fallen into disuse.
100M Ethernet cable standard
100M Ethernet is also called Fast Ethernet (FE). Compared with 10M Ethernet, 100M
Ethernet has a faster transmission rate at the physical layer, but has the same rate at the
data link layer.
Table 1-86 lists the 100M Ethernet cable standard specifications.
10BASE-T and 100BASE-TX have different transmission rates, but both apply to Category 5 twisted pair cables. 10BASE-T transmits data at 10 Mbit/s, while 100BASE-TX transmits data at 100 Mbit/s.
Using Gigabit Ethernet technology, you can upgrade an existing Fast Ethernet network
from 100 Mbit/s to 1000 Mbit/s.
The physical layer of Gigabit Ethernet uses 8B/10B coding. In traditional Ethernet technology, the data link layer delivers 8-bit data groups to the physical layer for transmission.
This process is different on Gigabit Ethernet over optical fibers, where the physical layer maps each 8-bit data group to a 10-bit code group before transmitting it on the line.
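A quick calculation shows the consequence of the 8B/10B mapping, assuming the standard 10-bits-per-byte expansion:

```python
# 8B/10B transmits 10 code bits for every 8 data bits, so the optical line
# must signal 25% faster than the data rate.
data_rate_mbps = 1000                      # Gigabit Ethernet data rate
line_rate_mbaud = data_rate_mbps * 10 / 8  # 10 code bits per 8 data bits
print(line_rate_mbaud)                     # 1250.0
```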
10GE cable standards
IEEE 802.3ae is the 10GE cable standard. For 10GE, the cables are all optical fiber in
full-duplex mode.
The development of 10GE is well under way, and 10GE will be widely deployed in the future.
CSMA/CD
Concept of CSMA/CD
Ethernet was originally designed to connect stations, such as computers and peripherals,
on a shared physical line. However, the stations can only access the shared line in
half-duplex mode. Therefore, a mechanism of collision detection and avoidance is
required to enable multiple devices to share the same line in a way that gives each device
fair access. Carrier Sense Multiple Access with Collision Detection (CSMA/CD) was
therefore introduced.
The concept of CSMA/CD is as follows:
− CS: carrier sense
Before transmitting data, a station checks to see if the line is idle. In this manner,
chances of collision are decreased.
− MA: multiple access
The data sent by a station can be received by other stations.
− CD: collision detection
If two stations transmit electrical signals at the same time, the signals are
superimposed, doubling the normal voltage amplitude. This situation results in
collision.
The stations stop transmitting after sensing the conflict, and then resume
transmission after a random delay time.
Working process of CSMA/CD
CSMA/CD works as follows:
a. A station continuously checks whether the shared line is idle.
If the line is idle, the station sends data.
If the line is in use, the station waits until the line is idle.
b. If two stations send data at the same time, a conflict occurs on the line, and the
signal becomes unstable.
c. After detecting an instability, the station immediately stops sending data.
d. The station sends a series of pulses.
The pulses inform other stations that a conflict has occurred on the line.
After detecting a conflict, the station waits for a random period of time, and then
resumes the data transmission.
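The steps above can be sketched in simplified form; the station model, timings, and backoff limits are illustrative assumptions rather than the exact IEEE 802.3 algorithm:

```python
# Simplified CSMA/CD sketch following steps a-d above. The shared line is
# modeled by two callbacks; this is an illustration, not the IEEE 802.3 spec.
import random

def csma_cd_send(line_busy, collision_detected, max_attempts=16):
    """Return a log of the actions a station takes to send one frame."""
    log = []
    for attempt in range(max_attempts):
        if line_busy():                       # a. carrier sense: wait if busy
            log.append("wait: line busy")
            continue
        log.append("transmit")
        if not collision_detected():          # no conflict: frame sent
            log.append("success")
            return log
        log.append("jam signal")              # c/d. stop, then send pulses
        slots = random.randint(0, 2 ** min(attempt + 1, 10) - 1)
        log.append(f"back off {slots} slot times")  # random delay, retry
    log.append("give up")
    return log

# An idle, collision-free line: transmission succeeds immediately.
print(csma_cd_send(lambda: False, lambda: False))  # ['transmit', 'success']
```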
Ethernet Auto-Negotiation
Purpose of auto-negotiation
The earlier Ethernet used a 10 Mbit/s half-duplex mode that required CSMA/CD to
ensure access by all stations. The introduction of full-duplex mode and 100M Ethernet
created a need to achieve compatibility between the earlier and newer Ethernet
technologies.
Auto-negotiation technology achieves this compatibility by enabling the device on either
end of a link to choose the operation parameters. By exchanging information, the devices
negotiate parameters including half- or full-duplex mode, transmission speed, and flow
control. After the negotiation, the devices operate in the negotiated mode and rate.
Auto-negotiation is defined in the following standards:
− 100M Ethernet standard: IEEE 802.3u
In IEEE 802.3u, auto-negotiation is defined as an optional function.
− Gigabit Ethernet standard: IEEE 802.3z
In IEEE 802.3z, auto-negotiation is defined as a mandatory function.
Principle of auto-negotiation
The auto-negotiation mechanism applies to twisted pair links only.
When no data is transmitted over a twisted pair link, the link is not idle because the
devices on the link transmit pulse signals at low frequency. Each device can identify
these Fast Link Pulses (FLPs) and use them to transmit small amounts of data to
implement auto-negotiation, as shown in Figure 1-304.
Auto-negotiation priorities of the Ethernet duplex link are listed as follows in descending
order:
− 1000M full-duplex
− 1000M half-duplex
− 100M full-duplex
− 100M half-duplex
− 10M full-duplex
− 10M half-duplex
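The outcome of a successful negotiation can be sketched as selecting the highest-priority mode common to both ends; the mode labels and function are illustrative:

```python
# Sketch of priority-based auto-negotiation: both ends advertise their
# capabilities and the highest mode common to both is chosen. The list
# order mirrors the descending priorities above.
PRIORITY = ["1000M-full", "1000M-half", "100M-full",
            "100M-half", "10M-full", "10M-half"]

def negotiate(local_caps, remote_caps):
    """Return the highest-priority mode both ends support, or None."""
    common = set(local_caps) & set(remote_caps)
    for mode in PRIORITY:                # scan from highest priority down
        if mode in common:
            return mode
    return None                          # negotiation fails; link unusable

print(negotiate({"1000M-full", "100M-full"},
                {"100M-full", "100M-half"}))  # 100M-full
```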
If auto-negotiation succeeds, the Ethernet card activates the link. Then, data can be
transmitted over it. If auto-negotiation fails, the link is inaccessible.
Auto-negotiation is implemented at the physical layer and does not require any data
packets or have impact on upper-layer protocols.
Auto-negotiation rules for interfaces
Two connected interfaces can communicate with each other only when they are in the
same working mode.
− If both interfaces work in the same non-auto-negotiation mode, the interfaces can
communicate.
− If both interfaces work in auto-negotiation mode, the interfaces can communicate
through negotiation. The negotiated working mode depends on the interface with
lower capability. Specifically, if one interface works in full-duplex mode and the
other interface works in half-duplex mode, the negotiated working mode is
half-duplex. The auto-negotiation function also allows the interfaces to negotiate
the use of the traffic control function.
− If a local interface works in auto-negotiation mode and the remote interface works
in a non-auto-negotiation mode, the negotiated working mode of the local interface
depends on the working mode of the remote interface.
Table 1-88 describes the auto-negotiation rules for interfaces of the same type.
Table 1-88 Auto-negotiation rules for interfaces of the same type (local interface working in
auto-negotiation mode)
Table 1-89 describes the auto-negotiation rules for interfaces of different types.
According to the auto-negotiation rules described in Table 1-88 and Table 1-89, if
an interface works in auto-negotiation mode and the connected interface works in a
non-auto-negotiation mode, packets may be dropped or auto-negotiation may fail. It
is recommended that you configure two connected interfaces to work in the same
mode to ensure that they can communicate properly.
FE and higher-rate optical interfaces only support full-duplex mode.
Auto-negotiation is enabled on GE interfaces for the negotiation of traffic control.
When devices are directly connected using GE optical interfaces, auto-negotiation
is enabled on the optical interfaces to detect unidirectional optical fiber faults. If
one of two optical fibers is faulty, the fault information is synchronized on both
ends through auto-negotiation. As a result, interfaces on both ends go Down. After
the fault is rectified, the interfaces go Up again through auto-negotiation.
Hub
Hub principle
When terminals are connected through twisted pair cables, a convergence device called a
hub is required. Hubs operate at the physical layer. Figure 1-305 shows a hub operation
model.
A hub is a box with multiple interfaces, each of which can connect to a terminal. Multiple devices can therefore be connected through a hub to form a star topology.
Note that although the physical topology is a star, the hub uses bus and CSMA/CD
technologies.
MAC Sub-layer
Functions of the MAC sub-layer
The MAC sub-layer is responsible for the following:
− Accessing physical links
− Identifying stations at the data link layer
The MAC sub-layer reserves a unique MAC address to identify each station.
− Transmitting data over the data link layer. After receiving data from the LLC
sub-layer, the MAC sub-layer adds the MAC address and control information to the
data, and then transfers the data to the physical link. During this process, the MAC
sub-layer provides other functions, such as the check function.
Accessing physical links
The MAC sub-layer is associated with the physical layer so that different MAC
sub-layers provide access to different physical layers.
Ethernet has two types of MAC sub-layers:
− Half-duplex MAC: provides access to the physical layer in half-duplex mode.
− Full-duplex MAC: provides access to the physical layer in full-duplex mode.
The two types of MAC are integrated in a network interface card. After the network
interface card is initialized, auto-negotiation is performed to choose an operation mode,
and then a MAC is chosen according to the operation mode.
Identifying stations at the data link layer
The MAC sub-layer uses a MAC address to uniquely identify a station.
MAC addresses are managed by the Institute of Electrical and Electronics Engineers
(IEEE) and allocated in blocks. An organization, generally a vendor, obtains a unique
address block from the IEEE. The address block is called the Organizationally Unique
Identifier (OUI), and can be used by the organization to allocate addresses to 16,777,216
devices.
A MAC address consists of 48 bits, generally represented in dotted hexadecimal notation. For example, the 48-bit MAC address 000000001110000011111100001110011000000000110100 is generally represented as 00e0.fc39.8034.
The first 24 bits stand for the OUI; the last 24 bits are allocated by the vendor. For
example, in 00e0.fc39.8034, 00e0.fc is the OUI allocated by the IEEE to Huawei;
39.8034 is the address number allocated by Huawei.
The second bit of a MAC address indicates whether the address is globally or locally
unique. The Ethernet uses globally unique MAC addresses.
Ethernet uses the following types of MAC addresses:
− Physical MAC address
A physical MAC address is permanently stored in network interface hardware (such
as a network interface card) and is used to uniquely identify a terminal on an
Ethernet.
− Broadcast MAC address
A broadcast MAC address indicates all the terminals on a network.
The 48 bits of a broadcast MAC address are all 1s. In hexadecimal notation, this
address is ffff.ffff.ffff.
− Multicast MAC address
A multicast MAC address indicates a group of terminals on a network.
The eighth bit of a multicast MAC address is 1, such as
000000011011101100111010101110101011111010101000.
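The address classes above can be recognized from the bits of the first octet, as in this illustrative sketch (the helper names are assumptions):

```python
# Sketch of how the MAC address classes above are distinguished. The
# multicast check tests the lowest-order bit of the first octet, which
# corresponds to the "eighth bit" in the notation used above.
def classify_mac(mac):
    """mac: 'xxxx.xxxx.xxxx', 'xx:xx:...' or 'xx-xx-...' notation."""
    octets = bytes.fromhex(mac.replace(":", "").replace(".", "").replace("-", ""))
    if octets == b"\xff" * 6:
        return "broadcast"          # all 48 bits are 1s
    if octets[0] & 0x01:
        return "multicast"          # group address bit set
    return "unicast"

def oui(mac):
    """Return the 24-bit OUI (first three octets) as a hex string."""
    octets = bytes.fromhex(mac.replace(":", "").replace(".", "").replace("-", ""))
    return octets[:3].hex()

print(classify_mac("ffff.ffff.ffff"))  # broadcast
print(classify_mac("00e0.fc39.8034"))  # unicast
print(oui("00e0.fc39.8034"))           # 00e0fc (the OUI allocated to Huawei)
```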
Transmitting data at the data link layer
Data transmission at the data link layer is as follows:
a. The upper layer delivers data to the MAC sub-layer.
b. The MAC sub-layer stores the data in a buffer.
c. The MAC sub-layer adds the destination and source MAC addresses to the data,
calculates the length of the data frame, and forms Ethernet frames.
d. The Ethernet frame is sent to the peer according to the destination MAC address.
e. The peer compares the destination MAC address with entries in the MAC address
table.
If there is a matching entry, the frame is accepted.
If there is no matching entry, the frame is discarded.
The preceding describes frame transmission in unicast mode. After an upper-layer
application is added to a multicast group, the data link layer generates a multicast MAC
address according to the application, and then adds the multicast MAC address to the
MAC address table. The MAC sub-layer then receives frames with the multicast MAC
address and transmits the frames to the upper layer.
As shown in Figure 1-309, the format of an IEEE 802.3 frame is similar to that of an
Ethernet_II frame. In an IEEE 802.3 frame, however, the Type field is changed to the
Length field, and the LLC field and Sub-Network Access Protocol (SNAP) field occupy
8 bytes of the Data field.
− Length
The Length field specifies the number of bytes of the Data field.
− LLC
The LLC field consists of three sub-fields: Destination Service Access Point
(DSAP), Source Service Access Point (SSAP), and Control.
− SNAP
The SNAP field consists of the Org Code field and Type field. Three bytes of the
Org Code field are all 0s. The Type field functions the same as that in Ethernet_II
frames.
For descriptions of other fields, see the description of Ethernet_II frames.
Based on the values of DSAP and SSAP, IEEE 802.3 networks use the following types of
frames:
− If DSAP and SSAP are both 0xff, the IEEE 802.3 frame becomes a
NetWare-Ethernet frame bearing NetWare data.
− If DSAP and SSAP are both 0xaa, the IEEE 802.3 frame becomes an
Ethernet_SNAP frame.
Ethernet_SNAP frames can encapsulate the data of multiple protocols. The SNAP
can be considered as an extension of the Ethernet protocol. SNAP allows vendors to
invent their own Ethernet transmission protocols.
The Ethernet_SNAP standard is defined by IEEE 802.1 to help ensure compatibility between IEEE 802.3 LANs and Ethernet networks.
− Other values of DSAP and SSAP indicate IEEE 802.3 frames.
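The frame-type decisions described above can be sketched as follows; the field offsets are the standard ones, while the function itself is an illustrative assumption:

```python
# Sketch of frame classification: the 2-byte value after the MAC addresses
# distinguishes Ethernet_II (Type >= 0x0600) from IEEE 802.3 (Length), and
# DSAP/SSAP further classify IEEE 802.3 frames as described above.
def classify_frame(frame: bytes):
    type_or_length = int.from_bytes(frame[12:14], "big")
    if type_or_length >= 0x0600:          # Type field -> Ethernet_II
        return "Ethernet_II"
    dsap, ssap = frame[14], frame[15]     # first bytes of the LLC header
    if dsap == 0xFF and ssap == 0xFF:
        return "NetWare-Ethernet"
    if dsap == 0xAA and ssap == 0xAA:
        return "Ethernet_SNAP"
    return "IEEE 802.3 (LLC)"

header = b"\x00\xe0\xfc\x39\x80\x34" * 2             # dummy dst + src MAC
print(classify_frame(header + b"\x08\x00"))          # Ethernet_II
print(classify_frame(header + b"\x00\x40\xaa\xaa"))  # Ethernet_SNAP
```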
Jumbo frames
Jumbo frames are Ethernet frames of greater length complying with vendor standards.
Such frames are dedicated to Gigabit Ethernet.
Generally, an Ethernet frame carries a maximum of 1518 bytes. To transmit larger datagrams at the IP layer, the datagrams must be fragmented so that the data fits within standard Ethernet frames, and a frame header and a frame trailer are added to each frame during transmission. Jumbo frames carry more than 1518 bytes each; they were introduced to reduce this per-frame overhead and thereby improve network usage and transmission rate.
The two Ethernet interfaces that need to communicate must both support jumbo frames
so that NE20Es can merge several standard-sized Ethernet frames into a jumbo frame to
improve transmission efficiency.
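A rough per-frame overhead comparison shows why jumbo frames improve efficiency; the 38-byte overhead figure (header, FCS, preamble, and inter-frame gap) is a common approximation, not a value from this document:

```python
# Per-frame overhead: 18 bytes of header/FCS plus 20 bytes of preamble,
# SFD, and inter-frame gap (an approximation for illustration).
OVERHEAD = 18 + 20

def efficiency(payload_bytes):
    """Fraction of line capacity that carries payload."""
    return payload_bytes / (payload_bytes + OVERHEAD)

print(round(efficiency(1500), 4))  # standard maximum payload
print(round(efficiency(9000), 4))  # typical jumbo payload: less overhead
```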
LLC Sub-layer
As described, the MAC sub-layer supports IEEE 802.3 frames and Ethernet_II frames. In an
Ethernet_II frame, the Type field identifies the upper layer protocol. Therefore, on a device,
the LLC sub-layer is not needed and only the MAC sub-layer is required.
In an IEEE 802.3 frame, useful features are defined at the LLC sub-layer in addition to the
traditional services of the data link layer. These features are specified by the sub-fields of
DSAP, SSAP, and Control.
Networks can support the following types of point-to-point services:
Connection-less service
Currently, the Ethernet implements this service.
Connection-oriented service
The connection is set up before data is transmitted. The reliability of the data
transmission is ensured.
Connection-less data transmission with acknowledgement
The connection is not required before data transmission. The acknowledgement
mechanism is adopted to improve reliability.
The following is an example describing the application of SSAP and DSAP with terminals A
and B that use connection-oriented services. Data is transmitted using the following process:
1. A sends a frame to B to request a connection with B.
2. After receiving the frame, if B has enough resources, B returns an acknowledgement
message that contains a Service Access Point (SAP). The SAP identifies the connection
required by A.
3. After receiving the acknowledgement message, A knows that B has set up a local
connection between them. After creating a SAP, A sends a message containing the SAP
to B. The connection is set up.
4. The LLC sub-layer of A encapsulates the data into a frame. The DSAP field is filled in with the SAP sent by B; the SSAP field is filled in with the SAP created by A. Then the LLC sub-layer of A transfers the data to its MAC sub-layer.
5. The MAC sub-layer of A adds the MAC address and Length field to the frame, and then transfers the frame to the physical layer for transmission.
6. After the frame is received at the MAC sub-layer of B, the frame is transferred to the LLC sub-layer. The LLC sub-layer identifies the connection that the frame belongs to according to the DSAP field.
7. After checking and acknowledging the frame based on the connection type, the LLC sub-layer of B transfers the frame to the upper layer.
8. After the frame reaches its destination, A sends B a frame instructing B to release the
connection. At this time, the communications end.
1.7.2.3 Applications
1.7.2.3.1 Computer Interconnection
Computer interconnection is the principal object and the major application of Ethernet
technology.
In early Ethernet LANs, computers were connected through coaxial cables to access shared
directories or a file server. All the computers, whether they are servers or hosts, are equal on
this network.
However, because most traffic flows between clients and servers, the early traffic model led to
bottlenecks on servers.
After the introduction of full-duplex Ethernet technology and Ethernet switches, servers can
connect to high-speed interfaces (100 Mbit/s) on Ethernet switches. Clients can use
lower-speed interfaces. This approach reduces traffic bottlenecks. Modern operating systems provide distributed services and database services, and allow servers to communicate with clients and other servers for data synchronization. 100M FE cannot meet the resulting bandwidth requirements; therefore, 1000M Ethernet technology was introduced to meet them.
1.7.3 Trunk
1.7.3.1 Introduction
Definition
Trunk is a technology that bundles multiple physical interfaces into a single logical interface.
This logical interface is called a trunk interface, and each bundled physical interface is called
a member interface.
Trunk technology helps increase bandwidth, enhance reliability, and carry out load balancing.
Purpose
Without trunk technology, the transmission rate between two network devices connected by a
100 Mbit/s Ethernet twisted pair cable can only reach 100 Mbit/s. To obtain a higher
transmission rate, you must change the transmission media or upgrade the network to a
Gigabit Ethernet, which is costly for small- and medium-sized enterprises and schools.
Trunk technology provides an economical solution. For example, a trunk interface with three
100 Mbit/s member interfaces working in full-duplex mode can provide a maximum
bandwidth of 300 Mbit/s.
Both Ethernet interfaces and Packet over SONET/SDH (POS) interfaces can be bundled into a
trunk interface. These two types of interfaces, however, cannot be member interfaces of the
same trunk interface. The reasons are as follows:
Ethernet interfaces apply to a broadcast network where packets are sent to all devices on
the network.
POS interfaces apply to a P2P network, because the link layer protocol of POS interfaces
is High-level Data Link Control (HDLC), which is a point-to-point (P2P) protocol.
Benefits
This feature offers the following benefits:
Increased bandwidth
Improved link reliability through traffic load balancing
1.7.3.2 Principles
1.7.3.2.1 Basic Trunk Principles
The member links of a trunk link can be configured with different weights to carry out load
balancing, which helps ensure connection reliability and greater bandwidth.
Users can configure trunk interfaces to support various routing protocols and services.
Figure 1-310 shows a simple Eth-Trunk example in which two routers are directly connected
through three interfaces. These three interfaces are bundled into an Eth-Trunk interface at
both ends of the trunk link. In this way, the bandwidth is increased, and reliability is
improved.
A trunk link can be considered a point-to-point link. The devices at the two ends of the link
can both be routers, both be switches, or be a router on one end and a switch on the other.
A trunk has the following advantages:
Greater bandwidth
The total bandwidth of a trunk interface equals the sum of the bandwidth of all its
member interfaces. In this manner, the interface bandwidth is multiplied.
Higher reliability
If a member interface fails, traffic on the faulty link is then switched to an available
member link. This ensures higher reliability for the entire trunk link.
Load balancing
Load balancing can be carried out on a trunk interface, which distributes traffic among
its member interfaces and then transmits the traffic through the member links to the same
destination. This prevents network congestion that occurs when all traffic is transmitted
over one link.
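The per-flow distribution described above can be sketched as follows. This is a minimal illustration, not the NE20E hashing algorithm; the hash fields and interface names are assumptions.

```python
# Illustrative per-flow load balancing over trunk member links: packets of
# the same flow always hash to the same member link, preserving packet order.
import zlib

def select_member(src_mac: str, dst_mac: str, members: list) -> str:
    """Map a flow (identified here by its MAC pair) to one member link."""
    flow_key = f"{src_mac}-{dst_mac}".encode()
    index = zlib.crc32(flow_key) % len(members)
    return members[index]

members = ["GE0/1/0", "GE0/1/1", "GE0/1/2"]
# The same flow always selects the same member interface.
link = select_member("00e0-fc12-3456", "00e0-fc65-4321", members)
assert link == select_member("00e0-fc12-3456", "00e0-fc65-4321", members)
```

Because different flows hash to different member links, traffic is spread across the trunk instead of congesting a single link.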
MAC address
Each station or server connected to an Ethernet interface of a device has its own MAC address.
The MAC address table on the device records information about the MAC addresses of
connected devices.
When a Layer 3 router is connected to a Layer 2 switch through two Eth-Trunk links for
different services, if both Eth-Trunk interfaces on the router adopt the default system MAC
address, the system MAC address is learned by the switch and alternates between the two
Eth-Trunk interfaces. In this case, a loop may occur between the two devices. To prevent
loops, you can change the MAC address of an Eth-Trunk interface by using the mac-address
command. By configuring the source and destination MAC addresses for two Eth-Trunk links,
you can guarantee the normal transmission of service data flows and improve the network
reliability.
After the MAC address of an Eth-Trunk interface is changed, the device sends gratuitous ARP
packets to update the mapping relationship between MAC addresses and ports.
MTU
Generally, the IP layer controls the maximum length of frames that are sent each time. Any
time the IP layer receives an IP packet to be sent, it checks which local interface the packet
needs to be sent to and queries the MTU of the interface. Then, the IP layer compares the
MTU with the packet length to be sent. If the packet length is greater than the MTU, the IP
layer fragments the packet to ensure that the length of each fragment is smaller than or equal to the
MTU.
If forcible unfragmentation is configured, certain packets are lost during data transmission at
the IP layer. To ensure that jumbo packets are not dropped during transmission, configure
forcible fragmentation.
Generally, it is recommended that you adopt the default MTU value of 1500 bytes. If you
need to change the MTU of an Eth-Trunk interface, you need to change the MTU of the peer
Eth-Trunk interface to ensure that the MTUs of both interfaces are the same. Otherwise,
services may be interrupted.
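The MTU check and fragmentation step described above can be sketched as follows. This is a simplified illustration of the length comparison only; real IP fragmentation also aligns fragment offsets to 8-byte multiples and accounts for header overhead.

```python
def fragment(packet_len: int, mtu: int) -> list:
    """Split a payload of packet_len bytes into pieces no larger than the
    interface MTU. Simplified: ignores the 8-byte offset alignment and the
    IP header overhead of real fragmentation."""
    if packet_len <= mtu:
        return [packet_len]          # fits: send unfragmented
    fragments = []
    remaining = packet_len
    while remaining > 0:
        size = min(mtu, remaining)
        fragments.append(size)
        remaining -= size
    return fragments

# A 3000-byte packet over the default 1500-byte MTU yields two fragments.
print(fragment(3000, 1500))
```

This is why mismatched MTUs on the two ends of an Eth-Trunk link can interrupt services: one end may emit frames the other end considers oversized.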
Basic Concepts
Link aggregation
Link aggregation is a method of bundling several physical interfaces into a logical
interface to increase bandwidth and reliability.
Link aggregation group
A link aggregation group (LAG) or a trunk link is a logical link that aggregates several
physical links.
If all these aggregated links are Ethernet links, the LAG is called an Ethernet link
aggregation group, or an Eth-Trunk for short, and the interface at each end of the
Eth-Trunk link is called an Eth-Trunk interface.
Each interface that is added to the Eth-Trunk interface is called a member interface.
An Eth-Trunk interface can be considered a single Ethernet interface. The only
difference is that an Eth-Trunk interface needs to select one or more member Ethernet
interfaces before forwarding data. You can configure features on an Eth-Trunk interface
the same way as on a single Ethernet interface, except for some features that take effect
only on physical Ethernet interfaces.
The active interfaces selected by devices must be consistent at both ends; otherwise, the
LAG cannot be set up. To ensure the consistency of the active interfaces selected at both
ends, you can set a higher priority for one end. Then the other end can select the active
interfaces accordingly.
If neither of the devices at the two ends of an Eth-Trunk link is configured with the
system priority, the devices adopt the default value 32768. In this case, the Actor is
selected according to the system ID. That is, the device with the smaller system ID
becomes the Actor.
Interface LACP priority
An interface LACP priority is set to specify the priority of an interface to be selected as
an active interface. Interfaces with higher priorities are selected as active interfaces.
A smaller interface LACP priority value indicates a higher interface LACP priority.
M:N backup of member interfaces
Link aggregation in static LACP mode uses LACPDUs to negotiate active link selection.
This mode is also called M:N mode where M indicates the number of active links and N
indicates the number of backup links. This mode improves link reliability and
implements load balancing among the M active links.
On the network shown in Figure 1-311, M+N links with the same attributes (in the same
LAG) are set up between two devices. When data is transmitted over the aggregation link,
traffic is distributed among the active (M) links. No data is transmitted over the backup
(N) links. Therefore, the actual bandwidth of the aggregation link is the sum of the
bandwidth of the M links, and the maximum bandwidth that can be provided is the sum
of the bandwidth of M + N links.
If one of the M links fails, LACP selects one available backup link from the N links to
replace the faulty link. In this situation, the actual bandwidth of the aggregation link
remains the sum of the bandwidth of M links, but the maximum bandwidth that can be
provided is the sum of the bandwidth of M + N - 1 links.
M:N backup applies to the scenario where bandwidth of M links needs to be provided
and link redundancy is required. If an active link fails, an LACP-enabled device can
automatically select the backup link with the highest priority and add it to the LAG.
If no backup link is available and the number of Up member links is less than the lower
threshold for the number of Up links, the device shuts down the trunk interface.
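The backup-link replacement behavior described above can be sketched as follows. This is an illustrative model, not the device implementation; the interface names, priority values, and threshold are assumptions.

```python
def on_link_failure(failed, active, backup, min_up_links):
    """M:N backup: remove the failed link, promote the available backup
    link with the highest LACP priority (smallest value), and shut the
    trunk down if the number of Up links falls below the threshold."""
    active.remove(failed)
    if backup:
        best = min(backup)           # smallest value = highest priority
        backup.remove(best)
        active.append(best)
    if len(active) < min_up_links:
        return None                  # trunk interface is shut down
    return active

active = [(1, "GE0/1/0"), (2, "GE0/1/1")]
backup = [(3, "GE0/1/2")]
# GE0/1/0 fails; GE0/1/2 replaces it, so two links remain active.
new_active = on_link_failure((1, "GE0/1/0"), active, backup, min_up_links=2)
```

If `backup` were empty and only one link remained, the function would return `None`, modeling the trunk shutdown described above.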
In 1:1 master/backup mode, an LAG contains only two member interfaces. One interface
is the primary interface and the other is the backup interface. In normal situations, only
the master interface forwards traffic.
In manual mode, you must manually set up an Eth-Trunk and add an interface to the
Eth-Trunk. You must also manually configure member interfaces to be in the active state.
The manual 1:1 master/backup mode is used when the peer device does not support
LACP.
Manual load balancing mode
In this mode, you must manually create an Eth-Trunk interface and add member
interfaces to it. The LACP protocol is not required.
All member interfaces forward data and perform load balancing.
In manual load balancing mode, traffic can be evenly distributed among all member
interfaces. Alternatively, you can set different weights for member interfaces to
implement uneven load balancing. The interfaces set with greater weights transmit more
traffic.
If an active link of the LAG fails, traffic load balancing is implemented among the
remaining active links.
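The weighted (uneven) load balancing described above can be sketched as follows. This is an illustrative model; the weights and interface names are assumptions, not defaults.

```python
# Uneven load balancing: members with greater weights receive
# proportionally more flows.
import zlib

def build_table(weights: dict) -> list:
    """Expand per-member weights into a flat distribution table."""
    table = []
    for member, weight in weights.items():
        table.extend([member] * weight)
    return table

def pick(flow_key: str, table: list) -> str:
    """Hash a flow onto the weighted table."""
    return table[zlib.crc32(flow_key.encode()) % len(table)]

# GE0/1/0 has weight 2 and GE0/1/1 has weight 1, so roughly two thirds
# of flows map to GE0/1/0.
table = build_table({"GE0/1/0": 2, "GE0/1/1": 1})
```

Setting equal weights for all members degenerates to the even distribution case.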
LACP mode
In LACP mode, you also manually create a trunk interface and add member interfaces to
it. Compared with link aggregation in manual load balancing mode, active interfaces in
LACP mode are selected through the transmission of Link Aggregation Control Protocol
Data Units (LACPDUs). This means that when a group of interfaces are added to a trunk
interface, the status of each member interface (active or inactive) depends on the LACP
negotiation.
Table 1-90 shows the similarities and differences between the manual load balancing
mode and LACP mode.
In manual load balancing mode, all member interfaces participate in load balancing. This
mode applies when a large amount of link bandwidth is required between two directly
connected devices and one of them does not support LACP. As shown in Figure 1-312,
Device A supports LACP, while Device B does not.
In this mode, load balancing is carried out among all member interfaces. The NE20E supports
two types of load balancing:
Per-flow load balancing
Per-packet load balancing
2. Devices at both ends determine the Actor according to the system LACP priority and
system ID.
As shown in Figure 1-315, devices at both ends receive LACPDUs from each other.
When Device B receives LACPDUs from Device A, Device B checks and records
information about Device A and compares their system priorities. If the system priority
of Device A is higher than that of Device B, Device A functions as the Actor and Device
B selects active interfaces according to the interface priority of Device A. In this manner,
devices on both ends select the same active interfaces.
3. Devices at both ends determine active interfaces according to the LACP priorities and
interface IDs of the Actor.
On the network shown in Figure 1-316, after the devices at both ends determine the
Actor, both devices select active interfaces according to the interface priorities on the
Actor.
After the active interfaces are selected, the links to be included in the LAG are determined,
and load balancing is implemented among these active links.
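The two selection steps described above can be sketched as follows. This is an illustrative model; the IDs and priority values are assumptions, except for the 32768 default system priority stated earlier.

```python
def elect_actor(local, peer):
    """Each device is a (system_priority, system_id) pair; a smaller
    value wins. With equal priorities, the smaller system ID wins, and
    the Actor's interface priorities decide the active links."""
    return min(local, peer)

def select_active(interfaces, max_active):
    """interfaces: (lacp_priority, interface_id) pairs on the Actor.
    A smaller priority value means a higher priority; ties are broken
    by interface ID."""
    return sorted(interfaces)[:max_active]

# Both devices use the default system priority 32768, so the device
# with the smaller system ID becomes the Actor.
actor = elect_actor((32768, "00e0-fc00-0001"), (32768, "00e0-fc00-0002"))
# The Actor then selects the two highest-priority interfaces as active.
active = select_active([(100, 3), (90, 1), (100, 2)], max_active=2)
```

Because both ends apply the Actor's priorities, they arrive at the same set of active interfaces, which is the consistency requirement stated earlier.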
1.7.3.2.5 E-Trunk
Definition
Enhanced Trunk (E-Trunk) implements inter-device link aggregation, providing device-level
reliability.
Background
Eth-Trunk implements link reliability between single devices. However, if a device fails,
Eth-Trunk fails to take effect.
To improve network reliability, carriers introduced device redundancy with master and backup
devices. If the master device or primary link fails, the backup device can take over user
services. In this situation, another device must be dual-homed to the master and backup
devices, and inter-device link reliability must be ensured.
E-Trunk was introduced to meet the requirements. E-Trunk aggregates data links of multiple
devices to form a link aggregation group (LAG). If a link or device fails, services are
automatically switched to the other available links or devices in the E-Trunk, improving link
and device-level reliability.
Basic Concepts
The LACP E-Trunk system priority is used for the E-Trunk to which Eth-Trunk interfaces in static
LACP mode are added.
The LACP system priority is used for Eth-Trunk interfaces in static LACP mode.
The LACP E-Trunk system priority and LACP system priority can be changed. If both priorities are
configured, after an Eth-Trunk interface working in static LACP mode is added to an E-Trunk, only
the LACP E-Trunk system priority takes effect for the Eth-Trunk interface.
The LACP E-Trunk system ID is used for the E-Trunk to which Eth-Trunk interfaces in static LACP
mode are added.
The LACP system ID is used for Eth-Trunk interfaces in static LACP mode.
To change the LACP E-Trunk system ID, run the lacp e-trunk system-id command. The LACP
system ID can only be the MAC address of an Ethernet interface on IPU and cannot be changed.
E-Trunk priority
E-Trunk priorities determine the master/backup status of the devices in an aggregation
group. As shown in Figure 1-318, the smaller the E-Trunk priority value, the higher the
E-Trunk priority. PE1 has a higher E-Trunk priority than PE2, and therefore PE1 is the
master device while PE2 is the backup device.
E-Trunk ID
An E-Trunk ID is an integer that uniquely identifies an E-Trunk.
Working mode
The working mode is subject to the working mode of the Eth-Trunk interface added to
the E-Trunk group. The Eth-Trunk interface works in one of the following modes:
Automatic, Forcible master, and Forcible backup.
Timeout period
Normally, the master and backup devices in an E-Trunk periodically send Hello
messages to each other. If the backup device does not receive any Hello message within
the timeout period, it becomes the master device.
The timeout period is obtained through the formula: Timeout period = Sending period x
Multiplier.
If the multiplier is 3, the backup device becomes the master device if it does not receive
any Hello message within three consecutive sending periods.
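The E-Trunk master election and timeout calculation described above can be sketched as follows. This is an illustrative model; the priority values and sending period are assumptions.

```python
def elect_master(local_priority, local_id, peer_priority, peer_id):
    """A smaller E-Trunk priority value means a higher priority; the
    system ID breaks ties. The winner becomes the master device."""
    if (local_priority, local_id) < (peer_priority, peer_id):
        return "master"
    return "backup"

def timeout_period(send_period_s, multiplier):
    """Timeout period = sending period x multiplier."""
    return send_period_s * multiplier

# PE1 (priority 10) beats PE2 (priority 20), so PE1 is the master.
role = elect_master(10, "PE1", 20, "PE2")
# With a 10-second sending period and a multiplier of 3, the backup
# takes over after 30 seconds without a Hello message.
timeout = timeout_period(10, 3)
```

If the backup device receives no Hello message for `timeout` seconds, it promotes itself to master, as described above.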
The Eth-Trunk interfaces can work in either static LACP mode or manual load balancing mode. The
Eth-Trunk and E-Trunk configurations on PE1 and PE2 must be the same.
− CE end
Adding Eth-Trunk interfaces in static LACP mode to an E-Trunk: Create an
Eth-Trunk interface in static LACP mode on the CE, and add the CE interfaces
connecting to the PEs to the Eth-Trunk interface. This ensures link reliability.
Adding Eth-Trunk interfaces in manual load balancing mode to an E-Trunk: Create
an Eth-Trunk interface in manual load balancing mode on the CE, and add the CE
interfaces connecting to the PEs to the Eth-Trunk interface. Then, configure
Ethernet operation, administration and maintenance (OAM) on the CE and PEs,
ensuring link reliability.
The E-Trunk group is invisible to the CE.
When you configure IP addresses for Eth-Trunk interfaces connecting the CE and PEs to transmit Layer
3 services, the PE's Eth-Trunk interface configurations must meet the following requirements:
There are few scenarios in which IP addresses are configured for Eth-Trunk interfaces that connect the
CE and PEs to transmit Layer 3 services and that are added to an E-Trunk on the PEs. In most cases,
Eth-Trunk interfaces work as Layer 2 interfaces.
Table 1-91 Master/backup status of an E-Trunk and its member Eth-Trunk interfaces
Status of the Local E-Trunk | Working Mode of the Local Eth-Trunk Interface | Status of the Peer Eth-Trunk Interface | Status of the Local Eth-Trunk Interface
In normal situations:
− If PE1 functions as the master, Eth-Trunk 10 of PE1 functions as the master, and its
link status is Up.
− If PE2 functions as the backup, Eth-Trunk 10 of PE2 functions as the backup, and
its link status is Down.
If the link between the CE and PE1 fails, the following situations occur:
a. PE1 sends an E-Trunk packet containing information about the faulty Eth-Trunk 10
of PE1 to PE2.
b. After receiving the E-Trunk packet, PE2 finds that Eth-Trunk 10 on the peer is
faulty. Then, the status of Eth-Trunk 10 on PE2 becomes master. Through the
LACP negotiation, the status of Eth-Trunk 10 on PE2 becomes Up.
The Eth-Trunk status on PE2 becomes Up, and traffic of the CE is forwarded
through PE2. In this way, traffic destined for the peer CE is protected.
If PE1 is faulty, the following situations occur:
a. If the PEs are configured with BFD, PE2 detects that the BFD session status
becomes Down. PE2 then functions as the master, and Eth-Trunk 10 of PE2
functions as the master.
b. If the PEs are not configured with BFD, PE2 will not receive any E-Trunk packet
from PE1 before its timeout period runs out, after which PE2 will function as the
master and Eth-Trunk 10 of PE2 will function as the master.
Through the LACP negotiation, the status of Eth-Trunk 10 on PE2 becomes Up.
The traffic of the CE is forwarded through PE2. In this way, traffic destined for the
peer CE is protected.
BFD fast detection
A device cannot quickly detect a fault on its peer based on the timeout period of received
packets. In this case, BFD can be configured on the device. The peer end needs to be
configured with an IP address. After a BFD session is established to detect whether the
route to the peer is reachable, the E-Trunk can sense any fault detected by BFD.
Switchback mechanism
The local device is in master state. In such a situation, if the physical status of the
Eth-Trunk interface on the local device goes Down or the local device fails, the peer
device becomes the master and the physical status of the member Eth-Trunk interface
becomes Up.
When the local end recovers, the local end needs to function as the master. Therefore, the
local Eth-Trunk interface enters the LACP negotiation state. After being informed by
LACP that the negotiation ability is Up, the local device starts the switchback delay timer.
After the switchback delay timer times out, the local Eth-Trunk interface becomes the
master. After LACP negotiation, the Eth-Trunk interface becomes Up.
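The switchback sequence described above can be sketched as follows. This is an illustrative model; the delay value is an assumption.

```python
import time

def switch_back(negotiation_up: bool, delay_s: float) -> str:
    """After the local end recovers, it waits for the switchback delay
    timer before preempting the master role, which avoids flapping if
    the recovery is unstable."""
    if not negotiation_up:
        return "backup"            # LACP negotiation ability not yet Up
    time.sleep(delay_s)            # switchback delay timer
    return "master"                # preempt after the timer expires

# Once LACP reports the negotiation ability is Up and the delay timer
# expires, the local Eth-Trunk interface becomes the master again.
role = switch_back(True, 0.01)
```

The delay timer is the key design point: without it, a link that recovers and immediately fails again would cause repeated master/backup switchovers.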
E-Trunk Restrictions
To improve the reliability of CE and PE links, and to ensure that traffic can be automatically
switched between these links, the configurations on both ends of the E-Trunk link must be
consistent. Use the networking in Figure 1-319 as an example.
The Eth-Trunk link directly connecting PE1 to the CE and the Eth-Trunk link directly
connecting PE2 to the CE must be configured with the same working rate and duplex
mode. This ensures that both Eth-Trunk interfaces have the same key and join the same
E-Trunk group.
Peer IP addresses must be specified for the PEs to ensure Layer 3 connectivity. The
address of the local PE is the peer address of the peer PE, and the address of the peer PE
is the peer address of the local PE. Here, it is recommended that the addresses of the PEs
are configured as loopback interface addresses.
The E-Trunk group must be bound to a BFD session.
The two PEs must be configured with the same security key (if necessary).
1.7.3.3 Applications
1.7.3.3.1 Application of Eth-Trunk
Service Overview
As the volume of services deployed on networks increases, the bandwidth provided by a
single P2P physical link working in full-duplex mode cannot meet the requirements of service
traffic.
To increase bandwidth, existing interface boards can be replaced with interface boards of
higher bandwidth capacity. However, this would waste existing device resources and increase
upgrade expenditure. If more links are used to interconnect devices, each Layer 3 interface
must be configured with an IP address, wasting IP addresses.
To increase bandwidth without replacing the existing interface boards or wasting IP address
resources, bundle physical interfaces into a logical interface using Eth-Trunk to provide
higher bandwidth.
Networking Description
As shown in Figure 1-320, traffic of different services is sent to the core network through the
user-end provider edge (UPE) and provider edge-access aggregation gateway (PE-AGG).
Different services are assigned different priorities. To ensure the bandwidth and reliability of
the link between the UPE and the PE-AGG, a link aggregation group, Eth-Trunk 1, is
established.
Feature Deployment
In Figure 1-320, Eth-Trunk interfaces are created on the UPE and PE-AGG, and the physical
interfaces that directly connect the UPE and PE-AGG are added to the Eth-Trunk interfaces.
Eth-Trunk offers the following benefits:
Improved link bandwidth. The maximum bandwidth of the Eth-Trunk link is three times
that of each physical link.
Improved link reliability. If one physical link fails, traffic is switched to another physical
link of the Eth-Trunk link.
Network congestion prevention. Traffic between the UPE and PE-AGG is load-balanced
on the three physical links of the Eth-Trunk link.
Prompt transmission of high-priority packets, with quality of service (QoS) policies
applied to Eth-Trunk interfaces.
You can select the operation mode for the Eth-Trunk as follows:
If devices at both ends of the Eth-Trunk link support the Link Aggregation Control
Protocol (LACP), Eth-Trunk interfaces in static LACP mode are recommended.
If the device at either end of the Eth-Trunk does not support LACP, Eth-Trunk interfaces
in manual load balancing mode are recommended.
Service Overview
Eth-Trunk implements link reliability between single devices. However, if a device fails,
Eth-Trunk does not take effect.
To improve network reliability, carriers introduced the device redundancy method that
requires master and backup devices. If the master device or primary link fails, the backup
device can take over user services. However, in this situation, the master and backup devices
must be dual-homed by a downstream device, and inter-device link reliability must be
ensured.
In dual-homing networking, Virtual Router Redundancy Protocol (VRRP) can be used to
ensure device-level reliability, and Eth-Trunk can be used to ensure link reliability. In some
cases, however, traffic cannot be switched to the backup device and secondary link
simultaneously if the master device or primary link fails. As a result, traffic is interrupted. To
address this issue, use Enhanced Trunk (E-Trunk) to implement both device- and link-level
reliability.
Networking Description
In Figure 1-321, the customer edge (CE) is dual-homed to the virtual private LAN service
(VPLS) network, and Eth-Trunk is deployed on the CE and provider edges (PEs) to
implement link reliability.
In normal situations, the CE communicates with remote devices on the VPLS network
through PE1. If PE1 or the link between the CE and PE1 fails, the CE cannot communicate
with PE1. To ensure that services are not interrupted, deploy an E-Trunk on PE1 and PE2. If
PE1 or the link between the CE and PE1 fails, traffic is switched to PE2. The CE then
continues to communicate with remote devices on the VPLS network through PE2. If PE1 or
the link between the CE and PE1 recovers, traffic is switched back to PE1. An E-Trunk
provides backup between Eth-Trunk links of the PEs, improving device-level reliability.
Feature Deployment
Use an E-Trunk comprised of Eth-Trunk interfaces in LACP mode as an example. Figure
1-321 shows how the Eth-Trunk and E-Trunk are deployed.
Deploy Eth-Trunk interfaces in LACP mode on the CE and PEs and add the interfaces
that directly connect the CE and PEs to the Eth-Trunk interfaces to implement link
reliability.
Deploy an E-Trunk on the PEs and add the Eth-Trunk interfaces in LACP mode to the
E-Trunk to implement device-level reliability.
Definition
Layer 2 protocol tunneling allows Layer 2 devices to use Layer 2 tunneling technology to
transparently transmit Layer 2 protocol data units (PDUs) across a Layer 2 network. Layer 2
protocol tunneling supports standard protocols, such as Spanning Tree Protocol (STP), Link
Aggregation Control Protocol (LACP), as well as user-defined protocols.
Purpose
Layer 2 protocol tunneling ensures transparent transmission of private Layer 2 PDUs over a
public network. The ingress device replaces the multicast destination MAC address in the
received Layer 2 PDUs with a specified multicast MAC address before transmitting them onto
the public network. The egress device restores the original multicast destination MAC address
and then forwards the Layer 2 PDUs to their destinations.
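The MAC rewrite at the tunnel edges described above can be sketched as follows. This is an illustrative model; the replacement multicast MAC address shown is an assumption (the configured address must not be one used by a well-known protocol).

```python
STP_MAC = "0180-c200-0000"       # well-known BPDU destination MAC
TUNNEL_MAC = "010f-e200-0003"    # assumed configured replacement address
MAC_MAP = {STP_MAC: TUNNEL_MAC}

def ingress_rewrite(dst_mac: str) -> str:
    """Ingress edge: replace the protocol MAC so core devices forward
    the PDU instead of sending it to their CPUs for protocol processing."""
    return MAC_MAP.get(dst_mac, dst_mac)

def egress_restore(dst_mac: str) -> str:
    """Egress edge: restore the original protocol MAC before delivering
    the PDU to the user network."""
    reverse = {v: k for k, v in MAC_MAP.items()}
    return reverse.get(dst_mac, dst_mac)

# A BPDU crosses the backbone with the replacement MAC and arrives at
# the user network with its original destination MAC restored.
assert egress_restore(ingress_rewrite(STP_MAC)) == STP_MAC
```

The mapping must be configured identically on the ingress and egress edge devices; otherwise the egress device cannot restore the original destination MAC address.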
1.7.4.2 Principles
1.7.4.2.1 Basic Concepts
Background
Layer 2 protocols running between user networks, such as Spanning Tree Protocol (STP) and
Link Aggregation Control Protocol (LACP), must traverse a backbone network to perform
Layer 2 protocol calculation.
On the network shown in Figure 1-322, User Network 1 and User Network 2 both run a Layer
2 protocol, Multiple Spanning Tree Protocol (MSTP). Layer 2 protocol data units (PDUs) on
User Network 1 must traverse a backbone network to reach User Network 2 to build a
spanning tree. Generally, the destination MAC addresses in Layer 2 PDUs of the same Layer
2 protocol are the same. For example, the MSTP PDUs are BPDUs with the destination MAC
address 0180-C200-0000. Therefore, when a Layer 2 PDU reaches an edge device on a
backbone network, the edge device cannot identify whether the PDU comes from a user
network or the backbone network and sends the PDU to the CPU to calculate a spanning tree.
In Figure 1-322, CE1 on User Network 1 builds a spanning tree together with PE1 but not
with CE2 on User Network 2. As a result, the Layer 2 PDUs on User Network 1 cannot
traverse the backbone network to reach User Network 2.
To resolve the preceding problem, use Layer 2 protocol tunneling. The NE20E supports
tunneling for the following Layer 2 protocols:
Cisco Discovery Protocol (CDP)
Ethernet Local Management Interface (E-LMI)
Ethernet in the First Mile OAM (EOAM3AH)
Device link detection protocol (DLDP)
Dynamic Trunking Protocol (DTP)
Ethernet in the First Mile (EFM)
GARP Multicast Registration Protocol (GMRP)
GARP VLAN Registration Protocol (GVRP)
Huawei Group Management Protocol (HGMP)
Link Aggregation Control Protocol (LACP)
BPDU
Bridge protocol data units (BPDUs) are most commonly used by Layer 2 protocols, such as
STP and MSTP. BPDUs are protocol packets multicast between Layer 2 switches. BPDUs of
different protocols have different destination MAC addresses and are encapsulated in
compliance with IEEE 802.3. Figure 1-323 shows the BPDU format.
The specified multicast MAC address cannot be a multicast MAC address used by well-known
protocols.
b. The ingress device then determines whether to add an outer VLAN tag to the Layer
2 PDUs with a specified multicast MAC address based on the configured Layer 2
protocol tunneling type.
When Layer 2 PDUs leave the backbone network,
a. The egress device restores the original multicast destination MAC address in the
Layer 2 PDUs based on the configured mapping between the multicast destination
MAC address and the specified multicast MAC address.
b. The egress device then determines whether to remove the outer VLAN tag from the
Layer 2 PDUs with the original multicast destination MAC address based on the
configured Layer 2 protocol tunneling type.
Layer 2 PDUs can be tunneled across a backbone network if all of the following conditions
are met:
All sites of a user network can receive Layer 2 PDUs from one another.
Layer 2 PDUs of a user network are not processed by the CPUs of backbone network
devices.
Layer 2 PDUs of different user networks must be isolated and not affect each other.
Table 1-92 describes the Layer 2 protocol tunneling types that Huawei devices support.
On the network shown in Figure 1-324, each PE interface connects to one user network, and
each user network belongs to either LAN-A or LAN-B. Layer 2 PDUs from user networks to
PEs on the backbone network do not carry VLAN tags. The PEs, however, must identify
which LAN the Layer 2 PDUs come from. Layer 2 PDUs from a user network in LAN-A
must be sent to the other user networks in LAN-A, but not to the user networks in LAN-B. In
addition, Layer 2 PDUs cannot be processed by PEs. To meet the preceding requirements,
configure interface-based Layer 2 protocol tunneling on backbone network edge devices.
1. The ingress device on the backbone network identifies the protocol type of the received
Layer 2 PDUs and tags them with the default VLAN ID of the interface that has received
them.
2. The ingress device replaces the multicast destination MAC address in the Layer 2 PDUs
with a specified multicast MAC address based on the configured mapping between the
multicast destination MAC address and the specified multicast MAC address.
3. The internal devices on the backbone network forward the Layer 2 PDUs with a
specified multicast MAC address to the egress devices.
4. The egress devices restore the original destination MAC address in the Layer 2 PDUs
based on the configured mapping between the multicast destination MAC address and
the specified multicast address and send the Layer 2 PDUs to the user networks.
If VLAN-based Layer 2 protocol tunneling is used when many user networks connect to a
backbone network, a large number of VLAN IDs of the backbone network are required. This
may result in insufficient VLAN resources. To reduce the consumption of VLAN resources,
configure QinQ on the backbone network to forward Layer 2 PDUs.
For details about QinQ, see 1.7.6 QinQ in NE20E Feature Description - LAN and MAN Access.
On the network shown in Figure 1-326, after QinQ is configured, a PE adds an outer VLAN
ID of 20 to the received Layer 2 PDUs that carry VLAN IDs in the range 100 to 199 and an
outer VLAN ID of 30 to the received Layer 2 PDUs that carry VLAN IDs in the range 200 to
299 before transmitting these Layer 2 PDUs across the backbone network. To tunnel Layer 2
PDUs from the user networks across the backbone network, configure QinQ-based Layer 2
protocol tunneling on PEs' aggregation interfaces.
1. The ingress device on the backbone network adds a different outer VLAN tag (public
VLAN ID) to the received Layer 2 PDUs based on the inner VLAN IDs (user VLAN IDs)
carried in the PDUs.
2. The ingress device replaces the multicast destination MAC address in the Layer 2 PDUs
with a specified multicast MAC address based on the configured mapping between the
multicast destination MAC address and the specified multicast MAC address.
3. The ingress device transmits the Layer 2 PDUs with a specified multicast MAC address
through different Layer 2 tunnels based on the outer VLAN IDs.
4. The internal devices on the backbone network forward the Layer 2 PDUs with a
specified multicast MAC address to the egress devices.
5. The egress devices restore the original destination MAC address in the Layer 2 PDUs
based on the configured mapping between the multicast destination MAC address and
the specified multicast address, remove the outer VLAN tags, and send the Layer 2
PDUs to the user networks based on the inner VLAN IDs.
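Step 1 above, in which the ingress device derives the outer (public) VLAN tag from the inner (user) VLAN ID, can be sketched as follows using the ranges from the example.

```python
def outer_vlan(inner_vlan: int) -> int:
    """Map an inner (user) VLAN ID to the outer (public) VLAN tag,
    using the example ranges: 100-199 -> 20, 200-299 -> 30."""
    if 100 <= inner_vlan <= 199:
        return 20
    if 200 <= inner_vlan <= 299:
        return 30
    raise ValueError(f"inner VLAN {inner_vlan} has no QinQ mapping")

# PDUs tagged 150 and 250 get outer tags 20 and 30 respectively, so
# they travel through different Layer 2 tunnels across the backbone.
print(outer_vlan(150), outer_vlan(250))
```

Because many user VLANs share one outer VLAN, this approach consumes far fewer backbone VLAN IDs than VLAN-based tunneling.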
On the network shown in Figure 1-327, PE1, PE2, and PE3 constitute a backbone network.
LAN-A and LAN-C belong to VLAN 3; LAN-B and LAN-D belong to VLAN 2. All LANs
send tagged Layer 2 PDUs. CE1 can forward Layer 2 PDUs carrying VLAN 2 and VLAN 3.
CE2 can forward Layer 2 PDUs carrying VLAN 3. CE3 can forward Layer 2 PDUs carrying
VLAN 2. CE1, CE2, and CE3 also run an untagged Layer 2 protocol, such as LLDP.
PEs therefore receive both tagged and untagged Layer 2 PDUs. To transparently transmit both
tagged and untagged Layer 2 PDUs, configure hybrid VLAN-based Layer 2 protocol
tunneling on backbone network edge devices.
1.7.4.3 Applications
1.7.4.3.1 Untagged Layer 2 Protocol Tunneling Application
When each edge device interface on a backbone network connects to only one user network
and Layer 2 protocol data units (PDUs) from the user networks do not carry VLAN tags,
configure untagged Layer 2 protocol tunneling to allow the Layer 2 PDUs from the user
networks to be tunneled across the backbone network. Layer 2 PDUs from the user networks
then travel through different Layer 2 tunnels to reach the destinations to perform Layer 2
protocol calculation.
In Figure 1-328, PEs on the backbone network edge must tunnel Layer 2 PDUs from the user
networks across the backbone network.
PE1, PE2, and PE3 constitute a backbone network and use different interfaces to connect to
LAN-A and LAN-B. Layer 2 PDUs from user networks to PEs on the backbone network do
not carry VLAN tags. The PEs, however, must identify which LAN the Layer 2 PDUs come
from. Layer 2 PDUs from a user network in LAN-A must be sent to the other user networks in
LAN-A, but not to the user networks in LAN-B. In addition, Layer 2 PDUs cannot be
processed by PEs. To meet the preceding requirements, configure interface-based Layer 2
protocol tunneling on backbone network edge devices. Multiple Spanning Tree Protocol
(MSTP) runs on the LANs.
To tunnel Layer 2 PDUs from the user network across the backbone network, configure
untagged Layer 2 protocol tunneling on user-side interfaces on PE1, PE2, and PE3.
The Layer 2 protocol tunneling process is as follows:
1. PE1 identifies the protocol type of the Layer 2 PDUs and tags the Layer 2 PDUs with the
default VLAN ID of the interface that has received the Layer 2 PDUs.
2. PE1 replaces the multicast destination MAC address in the Layer 2 PDUs with a
specified multicast MAC address based on the configured mapping between the
multicast destination MAC address and the specified multicast MAC address.
3. The internal devices on the backbone network forward the Layer 2 PDUs with a
specified multicast MAC address to the egress devices.
4. The egress devices PE2 and PE3 restore the original destination MAC address in the
Layer 2 PDUs based on the configured mapping between the multicast destination MAC
address and the specified multicast address and send the Layer 2 PDUs to the user
networks. The Layer 2 PDUs are transparently transmitted.
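The MAC address substitution performed in steps 2 and 4 can be sketched as follows. The addresses are illustrative: the well-known protocol address shown is STP's, and the tunnel multicast address stands for whatever mapping is configured on the PEs.

```python
# Sketch of the Layer 2 protocol tunneling MAC substitution.
# The tunnel multicast MAC below is an assumed configured value.
STP_MAC = "01:80:c2:00:00:00"      # well-known STP destination MAC
TUNNEL_MAC = "01:00:5e:00:01:99"   # assumed configured tunnel multicast MAC

MAPPING = {STP_MAC: TUNNEL_MAC}
REVERSE = {v: k for k, v in MAPPING.items()}

def ingress_pe(frame, pvid):
    """Steps 1-2: tag the PDU with the receiving interface's default
    VLAN ID and replace the protocol destination MAC with the
    configured tunnel multicast MAC."""
    frame = dict(frame, vlan=pvid)
    if frame["dst"] in MAPPING:
        frame = dict(frame, dst=MAPPING[frame["dst"]])
    return frame

def egress_pe(frame):
    """Step 4: restore the original protocol destination MAC before
    sending the PDU back to the user network."""
    if frame["dst"] in REVERSE:
        frame = dict(frame, dst=REVERSE[frame["dst"]])
    return frame

pdu = {"dst": STP_MAC, "vlan": None}
tunneled = ingress_pe(pdu, pvid=10)   # what PE1 sends onto the backbone
restored = egress_pe(tunneled)        # what PE2/PE3 send to the users
```

The internal backbone devices only ever see the tunnel multicast MAC, so they forward the PDUs as ordinary multicast traffic without processing the Layer 2 protocol.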
When Layer 2 PDUs from the user networks carry VLAN tags, configure VLAN-based Layer
2 protocol tunneling to allow the Layer 2 PDUs from the user networks to be tunneled across
the backbone network. Layer 2 PDUs from the user networks then travel through different
Layer 2 tunnels to reach the destinations to perform Layer 2 protocol calculation.
In Figure 1-329, PEs on the backbone network edge must tunnel tagged Layer 2 PDUs from
VLAN 100 and VLAN 200 across the backbone network.
In most circumstances, PEs serve as aggregation devices on a backbone network. PE1, PE2,
and PE3 constitute a backbone network, and the aggregation interfaces on PE1 and PE2
receive Layer 2 PDUs from both LAN-A and LAN-B. To differentiate the Layer 2 PDUs from
the two LANs, the PEs must identify tagged Layer 2 PDUs from CEs, with Layer 2 PDUs
from LAN-A carrying VLAN 200 and those from LAN-B carrying VLAN 100.
To tunnel Layer 2 PDUs from the user network across the backbone network, configure
VLAN-based Layer 2 protocol tunneling on user-side interfaces on PE1, PE2, and PE3.
The Layer 2 protocol tunneling process is as follows:
1. CE1 sends Layer 2 PDUs with a specified VLAN tag to the backbone network.
2. Configure Layer 2 forwarding on the aggregation device PE1 to allow BPDUs that
carry specific VLAN tags to pass through.
3. PE1 receives Layer 2 PDUs from the user networks and identifies that the Layer 2 PDUs
carry a single VLAN tag. PE1 then replaces the multicast destination MAC address in
the Layer 2 PDUs with a specified multicast MAC address and sends the PDUs onto the
backbone network.
4. The internal devices on the backbone network forward the Layer 2 PDUs with a
specified multicast MAC address to the egress devices.
5. The egress devices PE2 and PE3 restore the original destination MAC address in the
Layer 2 PDUs based on the configured mapping between the multicast destination MAC
address and the specified multicast address and send the Layer 2 PDUs to the user
networks. The Layer 2 PDUs are transparently transmitted.
PE1 and PE2 constitute a backbone network and use only VLAN 20 and VLAN 30 for Layer
2 forwarding. CEs send Layer 2 PDUs carrying VLAN 100 and VLAN 200 to the PEs. After
QinQ is configured, a PE adds an outer VLAN ID of 20 to the received Layer 2 PDUs
carrying VLAN 100 and an outer VLAN ID of 30 to the received Layer 2 PDUs carrying
VLAN 200 before transmitting these Layer 2 PDUs across the backbone network. To tunnel
Layer 2 PDUs from the user networks across the backbone network, configure QinQ-based
Layer 2 protocol tunneling on PEs' aggregation interfaces.
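Assuming the inner-to-outer VLAN mapping described above (VLAN 100 to outer VLAN 20, VLAN 200 to outer VLAN 30), the QinQ tag push at the ingress PE and the corresponding pop at the egress PE can be sketched as:

```python
# Sketch of QinQ-based tunneling on a PE aggregation interface: an
# outer (public) VLAN tag is pushed based on the inner (user) VLAN ID.
# The mapping below mirrors the example in the text.
OUTER_FOR_INNER = {100: 20, 200: 30}

def push_outer_tag(frame):
    """Ingress PE: select the outer VLAN from the inner VLAN ID."""
    outer = OUTER_FOR_INNER.get(frame["vlan"])
    if outer is None:
        return None          # inner VLAN not permitted on this interface
    return dict(frame, outer_vlan=outer)

def pop_outer_tag(frame):
    """Egress PE: strip the outer tag before delivery to the user side."""
    f = dict(frame)
    f.pop("outer_vlan", None)
    return f
```

The backbone devices forward only on the outer tag (VLAN 20 or 30), so the user VLAN IDs never need to be provisioned inside the carrier network.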
PE1, PE2, and PE3 constitute a backbone network. LAN-A and LAN-C belong to VLAN 3;
LAN-B and LAN-D belong to VLAN 2. All LANs send tagged Layer 2 PDUs. CE1 can
forward Layer 2 PDUs carrying VLAN 2 and VLAN 3. CE2 can forward Layer 2 PDUs
carrying VLAN 3. CE3 can forward Layer 2 PDUs carrying VLAN 2. CE1, CE2, and CE3
also run an untagged Layer 2 protocol, such as LLDP.
To tunnel both tagged and untagged Layer 2 PDUs from a large number of VLAN users
across the backbone network, configure hybrid tagged and hybrid untagged attributes and
enable both interface-based and VLAN-based Layer 2 protocol tunneling on the user-side
interfaces of PE1, PE2, and PE3.
The Layer 2 protocol tunneling process is as follows:
1. PE1 receives tagged and untagged Layer 2 PDUs and adds the default VLAN ID of the
interface that has received the untagged Layer 2 PDUs to these PDUs.
2. PE1 replaces the multicast destination MAC address in the Layer 2 PDUs with a
specified multicast MAC address based on the configured mapping between the
multicast destination MAC address and the specified multicast MAC address.
3. The internal devices on the backbone network forward the Layer 2 PDUs with a
specified multicast MAC address to the egress devices.
4. The egress devices PE2 and PE3 restore the original destination MAC address in the
Layer 2 PDUs based on the configured mapping between the multicast destination MAC
address and the specified multicast address. They also remove the outer VLAN tags and
send the Layer 2 PDUs to the user networks.
1.7.5 VLAN
1.7.5.1 Introduction
Definition
The Virtual Local Area Network (VLAN) technology logically divides a physical LAN into
multiple VLANs, each of which is a broadcast domain. Each VLAN contains a group of PCs
that have the same requirements. A VLAN has the same attributes as a LAN, but the PCs of a
VLAN can be located on different LAN segments. Hosts within the same VLAN can
communicate with each other, whereas hosts in different VLANs cannot. If two PCs are
located on the same LAN segment but belong to different VLANs, they do not broadcast
packets to each other. In this manner, network security is enhanced.
Purpose
The traditional LAN technology based on the bus structure has the following defects:
Conflicts are inevitable if multiple nodes send messages simultaneously.
Messages are broadcast to all nodes.
Networks have security risks as all the hosts in a LAN share the same transmission
channel.
Such a network forms a single collision domain: the more computers on the network, the
more conflicts occur and the lower the network efficiency becomes. The network is also a
single broadcast domain: when many computers send data, broadcast traffic consumes much
of the bandwidth. Traditional networks therefore face both collision domain and broadcast
domain issues and cannot ensure information security.
To offset these defects, bridges and Layer 2 switches were introduced to improve the
traditional LAN.
Bridges and Layer 2 switches can forward data from the inbound interface to outbound
interface in switching mode. This properly solves the access conflict problem on the shared
media, and limits the collision domain to the port level. Nevertheless, the bridge or Layer 2
switch networking can only solve the problem of the collision domain, but not the problems
of broadcast domain and network security.
In this document, the Layer 2 switch is referred to as the switch for short.
To reduce broadcast traffic, broadcasts should be confined to the hosts that need to
communicate with each other, isolating the hosts that do not. A router can select routes based
on IP addresses and effectively suppress broadcast traffic between two connected network
segments. The router solution, however, is costly. Therefore, multiple logical LANs, namely
VLANs, were developed on the physical LAN.
In this manner, a physical LAN is divided into multiple broadcast domains, that is, multiple
VLANs. The intra-VLAN communication is not restricted, while the inter-VLAN
communication is restricted. As a result, network security is enhanced.
For example, if different companies in the same building build their LANs separately, it is
costly; if these companies share the same LAN in the building, there may be security
problems.
Benefits
The VLAN technology offers the following benefits:
Saves network bandwidth resources by isolating broadcast domains.
Improves communication security and facilitates service deployment.
1.7.5.2 Principles
1.7.5.2.1 Basic Concepts
Each frame sent by an 802.1Q-capable switch carries a VLAN ID. On a VLAN, Ethernet
frames are classified into the following types:
− Tagged frames: frames with 4-byte 802.1Q tags.
− Untagged frames: frames without 4-byte 802.1Q tags.
Link Types
VLAN links can be divided into the following types:
Access link: a link connecting a host and a switch. Generally, a PC does not know which
VLAN it belongs to, and PC hardware cannot distinguish frames with VLAN tags.
Therefore, PCs send and receive only untagged frames. In Figure 1-334, links between
PCs and the switches are access links.
Trunk link: a link connecting switches. Data of different VLANs are transmitted along a
trunk link. The two ends of a trunk link must be able to distinguish frames with VLAN
tags. Therefore, only tagged frames are transmitted along trunk links. In Figure 1-334,
links between switches are trunk links. Frames transmitted over trunk links carry VLAN
tags.
Port Types
Some ports of a device can identify VLAN frames defined by IEEE 802.1Q, whereas others
cannot. Ports can be divided into four types based on whether they can identify VLAN
frames:
Access port
An access port connects a switch to a host over an access link, as shown in Figure 1-334.
An access port has the following features:
− Allows only frames tagged with the port default VLAN ID (PVID) to pass.
− Adds a PVID to its received untagged frame.
QinQ port
An 802.1Q-in-802.1Q (QinQ) port is a QinQ-enabled port. A QinQ port adds an outer tag
to a single-tagged frame so that the number of available VLANs can meet network
requirements.
Figure 1-336 shows the format of a QinQ frame. The outer tag is a public network tag for
carrying a public network VLAN ID. The inner tag is a private network tag for carrying a
private network VLAN ID.
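The 4-byte tag layout can be illustrated with a short sketch: each 802.1Q tag consists of a 2-byte TPID followed by a 2-byte TCI (3-bit priority, 1-bit CFI, 12-bit VLAN ID). TPID 0x8100 is the standard 802.1Q value; note that some carriers use a different TPID for the outer tag.

```python
import struct

def dot1q_tag(vlan_id, pri=0, tpid=0x8100):
    """Build one 4-byte 802.1Q tag: 2-byte TPID + 2-byte TCI.
    TCI = 3-bit PRI | 1-bit CFI (0 here) | 12-bit VLAN ID."""
    tci = (pri << 13) | (vlan_id & 0x0FFF)
    return struct.pack("!HH", tpid, tci)

# A QinQ frame carries two such tags after the source MAC address:
outer = dot1q_tag(20)     # public network (outer) tag, VLAN 20
inner = dot1q_tag(100)    # private network (inner) tag, VLAN 100
qinq_tags = outer + inner # 8 bytes inserted into the frame header
```

This doubles the tag stack from 4 to 8 bytes, which is why QinQ-aware devices must accept frames slightly larger than the classic Ethernet maximum.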
VLAN Classification
VLANs are classified based on port numbers. The network administrator configures a port
default VLAN ID (PVID) for each port on the switch. When a data frame reaches a port
configured with a PVID, the frame is tagged with the PVID if the frame carries no VLAN tag.
If the data frame already carries a VLAN tag, the switching device does not add another
VLAN tag even if the port is configured with a PVID. Different types of ports process VLAN
frames in different manners.
Basic Principles
To improve frame processing efficiency, frames arriving at a switch must carry a VLAN tag
for uniform processing. If an untagged frame enters a switch port that has a PVID configured,
the port adds a VLAN tag whose VID is the same as the PVID to the frame. If a tagged frame
enters a switch port that has a PVID configured, the port does not add any tag to the frame.
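This PVID rule can be summarized in a few lines (a sketch; frames are modeled as simple dictionaries with an optional VLAN field):

```python
def tag_on_ingress(frame, pvid):
    """An untagged frame entering a port is tagged with the port's PVID;
    a frame that already carries a VLAN tag is left unchanged."""
    if frame.get("vlan") is None:
        return dict(frame, vlan=pvid)
    return frame

untagged = tag_on_ingress({"dst": "host-b"}, pvid=10)   # gets VLAN 10
tagged = tag_on_ingress({"dst": "host-b", "vlan": 5}, pvid=10)  # keeps VLAN 5
```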
The switch processes frames in different ways according to the port type. The following
table describes how each type of port processes a frame.
Hybrid port
− Receiving an untagged frame:
If only the port default vlan command is run on the hybrid port, the port receives the
frame and adds the default VLAN tag to the frame.
If only the port trunk allow-pass command is run, the port discards the frame.
If both the port default vlan and port trunk allow-pass commands are run, the port
receives the frame and adds the VLAN tag with the default VLAN ID specified in the
port default vlan command to the frame.
− Receiving a tagged frame:
If only the port default vlan command is run:
The port accepts the frame if the frame's VLAN ID is the same as the default VLAN ID
of the port.
The port discards the frame if the frame's VLAN ID is different from the default VLAN
ID of the port.
If only the port trunk allow-pass command is run, the port accepts the frame if the
frame's VLAN ID is in the permitted VLAN range of the port.
− Sending a frame:
If only the port default vlan command is run and the frame's VLAN ID is the same as
the default VLAN ID, the port removes the VLAN tag and forwards the frame;
otherwise, the port discards the frame.
If only the port trunk allow-pass command is run, the port forwards the frame if the
frame's VLAN ID is in the permitted VLAN range of the port.
− Usage: A hybrid port can be added to multiple VLANs to send and receive frames for
these VLANs. A hybrid port can connect a switch to a PC or connect a network device
to another network device.
On the network shown in Figure 1-337, the trunk link between Device A and Device B must
support both the intra-VLAN 2 communication and the intra-VLAN 3 communication.
Therefore, the ports at both ends of the trunk link must be configured to be bound to VLAN 2
and VLAN 3. That is, Port 2 on Device A and Port 1 on Device B must belong to both VLAN
2 and VLAN 3.
Host A sends a frame to Host B in the following process:
1. The frame is first sent to Port 4 on Device A.
2. Port 4 adds a tag to the frame. The VID field of the tag is set to 2, the ID of the VLAN
to which Port 4 belongs.
3. Device A checks whether its MAC address table contains an entry for the destination
MAC address of Host B.
− If so, Device A sends the frame to the outbound interface Port 2.
− If not, Device A sends the frame to all interfaces bound to VLAN 2 except for Port
4.
4. Upon receipt of the frame, Port 2 sends the frame to Device B.
5. After receiving the frame, Device B checks whether its MAC address table contains an
entry for the destination MAC address of Host B.
− If so, Device B sends the frame to the outbound interface Port 3.
− If not, Device B sends the frame to all interfaces bound to VLAN 2 except for Port
1.
6. Upon receipt of the frame, Port 3 sends the frame to Host B.
The intra-VLAN 3 communication is similar, and is omitted here.
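The forwarding decision in steps 3 and 5 (known unicast versus flooding within the VLAN) can be sketched as follows; the MAC table entries and port names mirror the example and are illustrative:

```python
def forward(frame, in_port, mac_table, vlan_ports):
    """Return the list of output ports for a tagged frame.
    A known (MAC, VLAN) entry yields its learned port; an unknown
    destination is flooded to every port bound to the same VLAN,
    except the port the frame arrived on."""
    vlan = frame["vlan"]
    out = mac_table.get((frame["dst"], vlan))
    if out is not None:
        return [out]
    return [p for p in vlan_ports[vlan] if p != in_port]

# Device A in the example: Port 2 leads toward Device B.
mac_table = {("HostB-MAC", 2): "Port2"}
vlan_ports = {2: ["Port2", "Port4", "Port5"]}
```

For instance, a frame for Host B goes out only on Port 2, while a frame for an unlearned address is flooded on every VLAN 2 port except the ingress one.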
Layer 3 switching combines both routing and switching techniques to implement routing
on a switch, improving the overall performance of the network. After sending the first
data flow based on a routing table, a Layer 3 switch generates a mapping table, in which
the mapping between the MAC address and the IP address about this data flow is
recorded. If the switch needs to send the same data flow again, it directly sends the data
flow at Layer 2 but not Layer 3 based on the mapping table. In this manner, delays on the
network caused by route selection are eliminated, and data forwarding efficiency is
improved.
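This "route once, switch many times" behavior can be sketched as a flow cache in front of the routing table lookup (a simplified model; real hardware keys the cache on more than the destination IP):

```python
class L3Switch:
    """Sketch of flow-based Layer 3 switching: the first packet of a
    flow is routed via the routing table (slow path); the result is
    cached so later packets bypass route selection (fast path)."""

    def __init__(self, route_lookup):
        self.route_lookup = route_lookup   # slow path: routing table
        self.flow_cache = {}               # dst IP -> (next-hop MAC, port)

    def forward(self, dst_ip):
        if dst_ip in self.flow_cache:
            return self.flow_cache[dst_ip], "fast"   # switched at Layer 2
        result = self.route_lookup(dst_ip)           # routed at Layer 3
        self.flow_cache[dst_ip] = result
        return result, "slow"

sw = L3Switch(lambda ip: ("next-hop-mac", "VLANIF10"))
first = sw.forward("1.1.3.2")    # triggers a route lookup
second = sw.forward("1.1.3.2")   # served from the flow cache
```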
To allow the first data flow to be correctly forwarded based on the routing table, the
routing table must contain correct routing entries. Therefore, configuring a Layer 3
interface and a routing protocol on the Layer 3 switch is required. VLANIF interfaces are
therefore introduced.
A VLANIF interface is a Layer 3 logical interface, which can be configured on either a
Layer 3 switch or a router.
As shown in Figure 1-339, VLAN 2 and VLAN 3 are configured on the switch. You can
then create two VLANIF interfaces on the switch and assign IP addresses to and
configure routes for them. In this manner, VLAN 2 can communicate with VLAN 3.
Layer 3 switching overcomes the defects of the "Layer 2 switch + router" scheme and
implements faster traffic forwarding at a lower cost. Nevertheless, Layer 3 switching has
the following limitations:
− It is applicable only to networks whose interfaces are almost all Ethernet
interfaces.
− It is applicable only to networks with stable routes and few changes in the network
topology.
A PC does not need to know the VLAN to which it belongs. It sends only untagged frames.
After receiving an untagged frame from a PC, a switching device determines the VLAN to which the
frame belongs based on the configured VLAN classification method, such as port-based classification,
and then processes the frame accordingly.
If the frame needs to be forwarded to another switching device, the frame must be transparently
transmitted along a trunk link. Frames transmitted along trunk links must carry VLAN tags to allow
other switching devices to properly forward the frame based on the VLAN information.
Before sending the frame to the destination PC, the switching device connected to the destination PC
removes the VLAN tag from the frame to ensure that the PC receives an untagged frame.
Generally, only tagged frames are transmitted on trunk links; only untagged frames are transmitted on
access links. In this manner, switching devices on the network can properly process VLAN information
and PCs are not concerned about VLAN information.
Background
A VLAN is widely used on switching networks because of its flexible control of broadcast
domains and convenient deployment. On a Layer 3 switch, the interconnection between the
broadcast domains is implemented by using one VLAN with a logical Layer 3 interface.
However, this wastes IP addresses.
Following is an example that shows how IP addresses are wasted.
On the network shown in Table 1-94, VLAN 2 requires 10 host addresses. A subnet address
1.1.1.0/28 with a mask length of 28 bits is assigned to VLAN 2. 1.1.1.0 is the subnet number,
and 1.1.1.15 is the directed broadcast address. Neither of these two addresses can serve as a
host address. In addition, 1.1.1.1, the default gateway address of the subnet, cannot be used
as a host address. The remaining 13 addresses ranging from 1.1.1.2 to 1.1.1.14 can be used
by hosts. In this way, although VLAN 2 needs only 10 addresses, 13 addresses are assigned
to it according to the subnet division.
VLAN 3 requires five host addresses. A subnet address 1.1.1.16/29 with a mask length of 29
bits is assigned to VLAN 3. VLAN 4 requires only one address. A subnet address 1.1.1.24/30
with a mask length of 30 bits is assigned to VLAN 4.
VLAN ID   Subnet Address   Gateway Address   Addresses Excluding Subnet Number and Broadcast Address   Available Host Addresses   Required Host Addresses
2   1.1.1.0/28   1.1.1.1   14   13   10
3   1.1.1.16/29   1.1.1.17   6   5   5
4   1.1.1.24/30   1.1.1.25   2   1   1
The preceding VLANs require a total of 16 (10 + 5 + 1) addresses. However, at least 28 (16 +
8 + 4) addresses are occupied by the common VLANs. In this way, nearly half of the
addresses are wasted. In addition, if only three hosts, instead of 10, are later bound to VLAN
2, the extra addresses cannot be used by other VLANs and are therefore wasted.
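The address arithmetic above can be reproduced with a short script; the per-VLAN assignable count excludes the subnet number, the directed broadcast address, and the gateway address:

```python
import ipaddress

# Subnets and required host counts from Table 1-94.
subnets = {"VLAN2": ("1.1.1.0/28", 10),
           "VLAN3": ("1.1.1.16/29", 5),
           "VLAN4": ("1.1.1.24/30", 1)}

assignable = {}
for vlan, (prefix, needed) in subnets.items():
    net = ipaddress.ip_network(prefix)
    # subtract subnet number, directed broadcast, and gateway
    assignable[vlan] = net.num_addresses - 3

total_occupied = sum(ipaddress.ip_network(p).num_addresses
                     for p, _ in subnets.values())   # 16 + 8 + 4 = 28
total_needed = sum(n for _, n in subnets.values())   # 10 + 5 + 1 = 16
```

Running this confirms that 28 addresses are consumed to satisfy a demand of only 16.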
Meanwhile, this division is inconvenient for later network upgrades and expansion. For
example, suppose you want to add two more hosts to VLAN 4 without changing the IP
addresses already assigned to VLAN 4, but the addresses after 1.1.1.24 have been assigned to
others. In this case, a new subnet with a mask length of 29 bits and a new VLAN must be
assigned to the new hosts. VLAN 4 then has only three hosts, but these hosts are spread
across two subnets, and an additional VLAN is required. This is inconvenient for network
management.
As described above, many IP addresses are used as subnet numbers, directed broadcast
addresses, and default gateway addresses of subnets and therefore cannot be used as host
addresses in VLANs. This reduces addressing flexibility and wastes many addresses. To
solve this problem, VLAN aggregation is used.
Principles
The VLAN aggregation technology, also known as the super VLAN, provides a mechanism
that partitions the broadcast domain by using multiple VLANs in a physical network so that
different VLANs can belong to the same subnet. In VLAN aggregation, two concepts are
involved, namely, super VLAN and sub VLAN.
Super VLAN: Different from a common VLAN, a super VLAN contains only a Layer 3
interface and no physical ports. A super VLAN can be viewed as a logical Layer 3
concept; it is a collection of multiple sub VLANs.
Sub VLAN: A sub VLAN is used to isolate broadcast domains. A sub VLAN contains
only physical ports, and no Layer 3 VLANIF interface can be created for it. A sub VLAN
implements Layer 3 switching through the Layer 3 interface of its super VLAN.
A super VLAN can contain one or more sub VLANs that identify different broadcast domains.
The sub VLAN does not occupy an independent subnet segment. In the same super VLAN, IP
addresses of hosts belong to the subnet segment of the super VLAN, regardless of the
mapping between hosts and sub VLANs.
Therefore, all sub VLANs share the same Layer 3 interface. This conserves the subnet
numbers, default gateway addresses, and directed broadcast addresses that separate subnets
would otherwise consume, and allows different broadcast domains to use addresses in the
same subnet segment. As a result, subnet boundaries are eliminated, addressing becomes
flexible, and the number of idle addresses is reduced.
For example, on the network shown in Table 1-94, VLAN 2 requires 10 host addresses,
VLAN 3 requires 5 host addresses, and VLAN 4 requires 1 host address.
To implement VLAN aggregation, create VLAN 10 and configure VLAN 10 as a super
VLAN. Then assign a subnet address 1.1.1.0/24 with the mask length of 24 to VLAN 10;
1.1.1.0 is the subnet number, and 1.1.1.1 is the gateway address of the subnet, as shown in
Figure 1-341. Address assignment of sub VLANs (VLAN 2, VLAN 3, and VLAN 4) is shown
in Table 1-95.
Table 1-95 Example for assigning Host addresses in VLAN aggregation mode
In VLAN aggregation implementation, sub VLANs are not divided according to the previous
subnet border. Instead, their addresses are flexibly assigned in the subnet corresponding to the
super VLAN according to the required host number.
As Table 1-95 shows, VLAN 2, VLAN 3, and VLAN 4 share one subnet (1.1.1.0/24), one
default gateway address (1.1.1.1), and one directed broadcast address (1.1.1.255). In this
manner, the former subnet numbers (1.1.1.16 and 1.1.1.24), default gateway addresses
(1.1.1.17 and 1.1.1.25), and directed broadcast addresses (1.1.1.15, 1.1.1.23, and 1.1.1.27)
can be used as IP addresses of hosts.
In total, 16 (10 + 5 + 1) addresses are required for the three VLANs. In practice, in this
subnet, 16 addresses (1.1.1.2 to 1.1.1.17) are assigned to the three VLANs. A total of 19 IP
addresses are used: the 16 host addresses plus the subnet number (1.1.1.0), the default
gateway address (1.1.1.1), and the directed broadcast address (1.1.1.255). In the network
segment, the remaining 237 addresses (256 - 19 = 237) are available and can be used by any
host in any sub VLAN.
Inter-VLAN Communication
Introduction
VLAN aggregation ensures that different VLANs use the IP addresses in the same subnet
segment. This, however, leads to the problem of Layer 3 forwarding between sub
VLANs.
In common VLAN mode, hosts in different VLANs communicate with each other
through Layer 3 forwarding via their respective gateways. In VLAN aggregation mode,
the hosts in a super VLAN use IP addresses on the same network segment and share the
same gateway address. Hosts in different sub VLANs belong to the same subnet, so they
expect to communicate through Layer 2 forwarding rather than Layer 3 forwarding
through a gateway. In practice, however, hosts in different sub VLANs are isolated at
Layer 2. As a result, sub VLANs fail to communicate with each other.
To solve the preceding problem, you can use proxy ARP.
For details of proxy ARP, see the chapter "ARP" in the NE20E Feature Description - IP Services.
Layer 3 communication between different sub VLANs
As shown in Figure 1-342, super VLAN VLAN 10 contains sub VLAN 2 and sub VLAN
3.
Figure 1-342 Layer 3 communication between different sub VLANs based on ARP proxy
In this scenario, Host A in VLAN 2 wants to communicate with Host B in VLAN 3,
Host A has no ARP entry for Host B in its ARP table, and the gateway (L3 Switch) has
proxy ARP enabled. The communication process is as follows:
a. After comparing the IP address of Host B (1.1.1.3) with its own IP address, Host A
finds that both IP addresses are on the same network segment 1.1.1.0/24 but that its
ARP table has no ARP entry for Host B.
b. Host A broadcasts an ARP request to ask for the MAC address of Host B.
c. Host B is not in the broadcast domain of VLAN 2, and cannot receive the ARP
request.
d. The proxy-ARP enabled gateway between the sub VLANs receives the ARP
request from Host A and finds that the IP address of Host B 1.1.1.3 is the IP address
of a directly connected interface. Then the gateway broadcasts an ARP request to
all the other sub VLAN interfaces to ask for the MAC address of Host B.
e. After receiving the ARP request, Host B sends an ARP response.
f. After receiving the ARP response from Host B, the gateway replies with its MAC
address to Host A.
g. Both the gateway and Host A have the ARP entry of Host B.
h. Host A sends packets to the gateway, and the gateway forwards the packets from
Host A to Host B at Layer 3. In this way, Host A and Host B can communicate
with each other.
The process in which Host B sends packets to Host A is similar and is not described here.
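The gateway-side decision in steps c through f can be sketched as follows (the gateway MAC address is an assumed placeholder):

```python
GATEWAY_MAC = "00:e0:fc:00:00:01"   # assumed MAC of the super VLAN's VLANIF

def proxy_arp(request, host_location, resolve):
    """Sketch of the proxy ARP decision on the super VLAN gateway.
    request: (sender's sub VLAN, target IP)
    host_location: maps IP address -> sub VLAN of the host
    resolve: stands for the gateway's own ARP request into other sub VLANs
    Returns the MAC to answer with, or None if the request is not proxied."""
    vlan_in, target_ip = request
    target_vlan = host_location.get(target_ip)
    if target_vlan is None or target_vlan == vlan_in:
        return None                      # same broadcast domain: no proxying
    resolve(target_ip, target_vlan)      # gateway ARPs in the other sub VLAN
    return GATEWAY_MAC                   # replies to the requester with its own MAC
```

Host A thus learns the gateway's MAC for Host B's IP address, so its traffic to Host B is handed to the gateway and forwarded at Layer 3.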
Layer 2 communication between a sub VLAN and an external network
As shown in Figure 1-343, in the Layer 2 VLAN communications based on ports, the
received or sent frames are not tagged with the super VLAN ID.
Figure 1-343 Layer 2 communication between a sub VLAN and an external network
Host A sends a frame to Switch 1 through Port 1. Upon receipt, Switch 1 adds a VLAN
tag with VLAN ID 2 to the frame. The VLAN ID 2 is not changed to VLAN 10 on
Switch 1 even though VLAN 2 is a sub VLAN of VLAN 10. When the frame is sent out
of trunk Port 3, it still carries the ID of VLAN 2.
That is, Switch 1 itself never sends frames tagged with VLAN 10. If Switch 1 receives
frames tagged with VLAN 10, it discards them because no physical port belongs to
VLAN 10.
A super VLAN has no physical port. This restriction is enforced as follows:
− If you configure the super VLAN and then the trunk interface, the frames of a super
VLAN are filtered automatically according to the allowed VLAN range set on the
trunk interface.
On the network shown in Figure 1-343, no frame of super VLAN 10 passes through
Port 3 on Switch 1, even though the interface allows frames from all VLANs to
pass through.
− If you configure the trunk interface first and allow all VLAN packets to pass
through, you still cannot configure the super VLAN on Switch 1, because a VLAN
that contains physical ports cannot be configured as a super VLAN.
As for Switch 1, the valid VLANs are just VLAN 2 and VLAN 3, and all frames from
these VLANs are forwarded.
Layer 3 communication between a sub VLAN and an external network
Figure 1-344 Layer 3 communication between a sub VLAN and an external network
As shown in Figure 1-344, Switch 1 is configured with super VLAN 4, sub VLAN 2, sub
VLAN 3, and a common VLAN 10. Switch 2 is configured with two common VLANs,
namely, VLAN 10 and VLAN 20. Suppose that Switch 1 is configured with the route to
the network segment 1.1.3.0/24, and Switch 2 is configured with the route to the network
segment 1.1.1.0/24. Then Host A in sub VLAN 2 that belongs to the super VLAN 4
needs to communicate with Host C in Switch 2.
a. After comparing the IP address of Host C (1.1.3.2) with its own IP address, Host A
finds that the two IP addresses are not on the same network segment.
b. Host A broadcasts an ARP request to ask for the MAC address of the gateway
(Switch 1).
c. After receiving the ARP request, Switch 1 finds the ARP request packet is from sub
VLAN 2 and replies with an ARP response to Host A through sub VLAN 2. The
source MAC address in the ARP response packet is the MAC address of VLANIF 4
for super VLAN 4.
d. Host A learns the MAC address of the gateway.
e. Host A sends the packet to the gateway, with the destination MAC address as the
MAC address of VLANIF 4 for super VLAN 4, and the destination IP address as
1.1.3.2.
f. After receiving the packet, Switch 1 performs Layer 3 forwarding and sends the
packet to Switch 2, with the next hop address 1.1.2.2 and the outbound interface
VLANIF 10.
g. After receiving the packet, Switch 2 performs the Layer 3 forwarding and sends the
packet to Host C through the directly connected interface VLANIF 20.
h. The response packet from Host C reaches Switch 1 after the Layer 3 forwarding on
Switch 2.
i. After receiving the packet, Switch 1 performs the Layer 3 forwarding and sends the
packet to Host A through the super VLAN.
If devices in two VLANs need to communicate using VLAN mapping, the IP addresses of
these devices must be on the same network segment. Otherwise, devices in the two VLANs
must communicate through routes, and VLAN mapping does not take effect.
The NE20E supports only 1 to 1 VLAN mapping. When a VLAN mapping-enabled interface
receives a single-tagged frame, the interface replaces the VLAN ID in the frame with a
specified VLAN ID.
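The 1 to 1 mapping behavior can be sketched in a few lines (frames modeled as dictionaries; the mapping table stands for the VLAN IDs configured on the interface):

```python
def vlan_map(frame, mapping):
    """1 to 1 VLAN mapping: the single VLAN tag of an incoming frame is
    replaced with the configured VLAN ID. Frames whose VLAN ID has no
    configured mapping are left unchanged in this sketch."""
    new_id = mapping.get(frame["vlan"])
    return dict(frame, vlan=new_id) if new_id is not None else frame

# Example: the interface rewrites VLAN 100 to VLAN 200.
mapped = vlan_map({"vlan": 100}, {100: 200})
unmapped = vlan_map({"vlan": 300}, {100: 200})
```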
If a user runs a command to bring a VLAN Down, VLAN damping does not need to be
configured.
Background
On an ME network, users and services are differentiated based on a single VLAN tag or
double VLAN tags carried in packets and then access different Virtual Private Networks
(VPNs) through sub-interfaces. In some special scenarios where the access device does not
support QinQ or a single VLAN tag is used in different services, different services cannot be
distributed to different Virtual Switching Instances (VSIs) or VPN instances.
As shown in Figure 1-346, the High Speed Internet (HSI), Voice over IP (VoIP), and Internet
Protocol Television (IPTV) services all belong to VLAN 10 and are aggregated to the UPE
through a switch. The UPE is connected to the SR and BRAS through Layer 2 virtual private
networks (L2VPNs).
If the UPE does not support QinQ, it cannot differentiate the received HSI, VoIP, and IPTV
services for transmitting them over different Pseudo Wires (PWs). In this case, you can
configure the UPE to resolve the 802.1p priorities, DiffServ Code Point (DSCP) values, or
EthType values of packets. Then, the UPE can transmit different packets over different PWs
based on the 802.1p priorities, DSCP values, or EthType values of the packets.
In a similar manner, if the UPE is connected to the SR and BRAS through L3VPNs, the UPE
can transmit different services through different VPN instances based on the 802.1p priorities
or DSCP values of the packets.
Basic Concepts
As shown in Figure 1-346, sub-interfaces of different types are configured at the attachment
circuit (AC) side of the UPE to transmit packets with different 802.1p priorities, DSCP values,
or EthTypes through different PWs or VPN instances. This implements flexible service access.
Flexible service access through sub-interfaces is a technology that differentiates L2VPN
access based on the VLAN IDs and 802.1p priorities/DSCP values/EthType values in packets.
The sub-interfaces are classified in Table 1-96 based on service identification policies
configured on them.
As shown in Figure 1-347, the 802.1p priority is represented by a 3-bit PRI (priority)
field in a VLAN frame defined in IEEE 802.1Q. The value ranges from 0 to 7. The
greater the value, the higher the priority. When the switching device is congested, the
switching device preferentially sends packets with higher priorities. In flexible service
access, this field is used to identify service types so that different services can access
different L2VPNs/L3VPNs.
The EthType is represented by the 2-byte LEN/ETYPE field, as shown in Figure 1-347. In
flexible service access, this field is used to identify service types based on EthType
values (PPPoE or IPoE) so that different services can access different L2VPNs.
DSCP
As shown in Figure 1-348, the DSCP is represented by the first 6 bits of the Type of
Service (ToS) field in an IPv4 packet header, as defined in relevant standards. The DSCP
guarantees QoS on IP networks. Traffic control on the gateway depends on the DSCP
field.
In flexible service access, this field is used to identify service types so that different
services can access different L2VPNs/L3VPNs.
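How these classification fields are located in a frame can be sketched as follows: the 802.1p priority occupies the top 3 bits of the 16-bit TCI in the VLAN tag, and the DSCP occupies the top 6 bits of the ToS byte in the IPv4 header:

```python
import struct

def dot1p_priority(tci):
    """Extract the 3-bit PRI field from a 16-bit 802.1Q TCI."""
    return (tci >> 13) & 0x7

def dscp(tos_byte):
    """Extract the 6-bit DSCP from the IPv4 ToS byte."""
    return (tos_byte >> 2) & 0x3F

# Example TCI bytes: priority 5, VLAN 100.
tci = struct.unpack("!H", b"\xa0\x64")[0]
```

A PE performing flexible service access reads exactly these bits to decide which PW or VPN instance a packet belongs to.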
Huawei high-end routers can function as PEs. In this scenario, only the configurations of PEs are
mentioned. For detailed configurations of other devices, see the related configuration guides.
You can configure the 802.1p priorities on the CSG through commands.
For details on L2VPNs, see the chapters "VPWS" and "VPLS" in the NE20E Feature
Description - VPN.
Huawei high-end routers can function as PEs. In this scenario, only the configurations of PEs are
mentioned. For detailed configurations of other devices, see the related configuration guides.
You can configure the DSCP values on the CSG through commands.
For details on L2VPNs, see the chapters "VPWS" and "VPLS" in the NE20E Feature Description -
VPN.
Huawei high-end routers can function as PEs. In this scenario, only the configurations of PEs are
mentioned. For detailed configurations of other devices, see the related configuration guides.
You can configure the DSCP values on the CSG through commands.
For details on L3VPNs, see the chapter "BGP/MPLS IP VPN" in the NE20E Feature Description -
VPN.
Huawei high-end routers can function as PEs. In this scenario, only the configurations of PEs are
mentioned. For detailed configurations of other devices, see the related configuration guides.
You can configure the 802.1p priorities on the CSG through commands.
For details on L2VPNs, see the chapter "BGP/MPLS IP VPN" in the NE20E Feature Description -
VPN.
1.7.5.3 Applications
1.7.5.3.1 Port-based VLAN Classification
On the network shown in Figure 1-354, different companies residing in the same business
premises need to isolate service data. The ports used by each company are bound to a
company-specific VLAN. This ensures that each company has its own "virtual switch" or
"virtual workstation".
The Layer 3 device shown in Figure 1-356 can be a router or a Layer 3 switch.
Multiple VLANs belong to different Layer 3 devices.
On the network shown in Figure 1-357, VLAN 2, VLAN 3, and VLAN 4 span different
switches. In this situation, you can configure a VLANIF interface on Device A and
Device B for each VLAN, and then configure static routes or a routing protocol on
Device A and Device B, so that Device A and Device B can communicate over a Layer 3
route.
The Layer 3 device shown in Figure 1-357 can be a router or a Layer 3 switch.
After proxy ARP is configured on the router, the sub VLANs in each super VLAN can
communicate with each other.
Terms
None
1.7.6 QinQ
1.7.6.1 Introduction
Definition
802.1Q-in-802.1Q (QinQ) is a technology that adds another layer of IEEE 802.1Q tag to the
802.1Q tagged packets entering the network. This technology expands the VLAN space by
allowing two 802.1Q tags to be carried in one packet.
Purpose
During intercommunication between Layer 2 LANs based on the traditional IEEE 802.1Q
protocol, when two user networks access each other through a carrier network, the carrier
must assign VLAN IDs to users of different VLANs, as shown in Figure 1-359. User
Network1 and User Network2 access the backbone network through PE1 and PE2 of a carrier
network respectively.
Figure 1-359 Intercommunication between Layer 2 LANs using the traditional IEEE 802.1Q
protocol
To connect VLAN 100 - VLAN 200 on User Network1 to VLAN 100 - VLAN 200 on User
Network2, interfaces connecting CE1, PE1, the P, PE2, and CE2 can be configured to function
as trunk interfaces and to allow packets from VLAN 100 - VLAN 200 to pass through.
This configuration, however, makes user VLANs visible on the backbone network and wastes
the carrier's VLAN ID resources (4094 VLAN IDs are used). In addition, the carrier has to
manage user VLAN IDs, and users do not have the right to plan their own VLANs.
The 12-bit VLAN ID defined in IEEE 802.1Q identifies a maximum of only 4096 VLANs,
which is insufficient to isolate and identify the massive numbers of users on the growing
metro Ethernet (ME) network.
QinQ is therefore developed to expand the VLAN space by adding another 802.1Q tag to an
802.1Q tagged packet. In this way, the number of VLANs increases to 4096 x 4096.
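The space expansion can be sketched in Python (a simplified frame model, not device code): pushing a second 802.1Q tag in front of the customer tag yields 4096 x 4096 possible tag combinations.

```python
TPID_DOT1Q = 0x8100

def push_outer_tag(frame: dict, s_vid: int) -> dict:
    """Add a carrier (outer) 802.1Q tag in front of the existing customer tag."""
    assert 0 <= s_vid < 4096
    tagged = dict(frame)
    tagged["outer"] = {"tpid": TPID_DOT1Q, "vid": s_vid}
    return tagged

frame = {"inner": {"tpid": TPID_DOT1Q, "vid": 100}}  # customer VLAN 100
qinq = push_outer_tag(frame, 2000)                   # carrier VLAN 2000
assert (qinq["outer"]["vid"], qinq["inner"]["vid"]) == (2000, 100)
assert 4096 * 4096 == 16777216                       # double-tag ID space
```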
In addition to expanding VLAN space, QinQ is applied in other scenarios with the
development of the ME network and carriers' requirements on refined operation. The outer
and inner VLAN tags can be used to differentiate users from services. For example, the inner
tag represents a user, while the outer tag represents a service. Moreover, QinQ functions as a
simple and practical VPN technology by transparently transmitting private VLAN services
over a public network. It extends services of a core MPLS VPN to the ME network and
implements an end-to-end VPN.
Because QinQ is easy to use, it has been widely applied on ISP networks, for example, to
carry multiple services on the metro Ethernet. The introduction of selective QinQ has made
QinQ even more popular among ISPs. As the metro Ethernet develops, vendors propose their
own metro Ethernet solutions, and QinQ, with its simplicity and flexibility, plays an
important role in these solutions.
Benefits
QinQ offers the following benefits:
Extends VLANs to isolate and identify more users.
Facilitates service deployment by allowing the inner and outer tags to represent different
information. For example, use the inner tag to identify a user and the outer tag to identify
a service.
Allows ISPs to implement refined operation by providing diversified encapsulation and
termination modes.
QinQ packets carry two VLAN tags when they are transmitted across a carrier network. The
meanings of the two tags are described as follows:
Inner VLAN tag: private VLAN tag that identifies the VLAN to which a user belongs.
Outer VLAN tag: public VLAN tag that is assigned by a carrier to a user.
QinQ Encapsulation
QinQ encapsulation is to add another 802.1Q tag to a single-tagged packet. QinQ
encapsulation is usually performed on UPE interfaces connecting to users.
Dot1q and QinQ VLAN tag termination sub-interfaces do not support transparent transmission of
packets that do not contain a VLAN tag, and discard received packets that do not contain a VLAN tag.
Applications of VLAN tag termination
− Inter-VLAN communication
The VLAN technology is widely used because it allows Layer 2 packets of different
users to be transmitted separately. With the VLAN technology, a physical LAN is
divided into multiple logical broadcast domains (VLANs). Hosts in the same
VLAN can communicate with each other at Layer 2, but hosts in different VLANs
cannot. The Layer 3 routing technology is required for communication between
hosts in different VLANs. The following interfaces can be used to implement
inter-VLAN communication:
VLANIF interfaces on Layer 3 switches
To allow branches to communicate within Company 1 or Company 2 but not between the two
companies, configure QinQ tunneling on PE1 and PE2. The configuration roadmap is as
follows:
On PE1, user packets entering Port 1 and Port 3 are encapsulated with an outer VLAN
tag 10, and user packets entering Port 2 are encapsulated with an outer VLAN tag 20.
On PE2, user packets entering Port 1 and Port 2 are encapsulated with an outer VLAN
tag 20.
Port 4 on PE1 and Port 3 on PE2 allow the packets tagged with VLAN 20 to pass.
Table 1-97 shows planning of outer VLAN tags of Company 1 and Company 2.
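The tagging roadmap above can be expressed as a simple port-to-outer-tag mapping. The following is a Python sketch only; the device and port names follow the description above, and the frame model is an assumption.

```python
# Outer VLAN tag pushed per access port, as planned above.
OUTER_TAG = {
    ("PE1", "Port1"): 10, ("PE1", "Port3"): 10,   # Company 1
    ("PE1", "Port2"): 20,                          # Company 2
    ("PE2", "Port1"): 20, ("PE2", "Port2"): 20,
}

def encapsulate(pe: str, port: str, frame: dict) -> dict:
    """Push the planned outer tag onto a user frame entering (pe, port)."""
    return {"outer_vid": OUTER_TAG[(pe, port)], "inner": frame}

pkt = encapsulate("PE1", "Port3", {"vid": 100})
assert pkt["outer_vid"] == 10
```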
To allow branches to communicate within Company 1 or Company 2 but not between the two
companies, configure Layer 2 selective QinQ on PE1 and PE2.
Table 1-98 shows the planning of outer VLAN tags in the packets entering different
interfaces on PE1 and PE2.
Interface 3 on PE1 or PE2 allows the packets tagged with VLAN 20 to pass.
In Figure 1-364, Device A is a non-Huawei device that uses 0x9100 as the EtherType value,
and Device B is a Huawei device that uses 0x8100 as the EtherType value. To implement
interworking between the Huawei and non-Huawei devices, configure 0x9100 as the
EtherType value in the outer VLAN tag of QinQ packets sent by the Huawei device.
Principles
QinQ mapping maps VLAN tags in user packets to specified tags before the user packets are
transmitted across the public network.
Before sending local VLAN frames, a sub-interface replaces the tags in the local frames
with external VLAN tags.
After receiving frames from external VLANs, a sub-interface replaces the external VLAN
tags with local VLAN tags.
QinQ mapping allows a device to map a user VLAN tag to a carrier VLAN tag, shielding
different user VLAN IDs in packets.
QinQ mapping is deployed on edge devices of a Metro Ethernet. It is applied in but not
limited to the following scenarios:
VLAN IDs deployed at new sites and old sites conflict, but new sites need to
communicate with old sites.
VLAN IDs planned by each site on the public network conflict. These sites do not need
to communicate.
VLAN IDs on both ends of the public network are asymmetric.
Currently, only 1:1 QinQ mapping is supported. When a QinQ mapping-enabled
sub-interface receives a single-tagged packet, the sub-interface replaces the VLAN ID in the
packet with a specified VLAN ID.
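The replacement step can be sketched as follows (a minimal Python illustration; the mapped VLAN IDs are assumed example values, not a recommended plan):

```python
def qinq_map_1to1(frame: dict, vlan_map: dict) -> dict:
    """Replace a single-tagged frame's VLAN ID per the configured 1:1 mapping."""
    mapped = dict(frame)
    mapped["vid"] = vlan_map[frame["vid"]]
    return mapped

# Assumed mapping: user VLAN -> carrier VLAN.
vlan_map = {100: 1100, 200: 1200}
assert qinq_map_1to1({"vid": 100, "payload": "data"}, vlan_map)["vid"] == 1100
```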
The sub-interface for Dot1q VLAN tag termination first identifies the outer VLAN tag and
then generates an ARP entry containing the IP address, MAC address, and outer VLAN tag.
For the upstream traffic, the termination sub-interface strips the Ethernet frame header
(including MAC address) and the outer VLAN tag, and searches the routing table to
perform Layer 3 forwarding based on the destination IP address.
For the downstream traffic, the termination sub-interface encapsulates IP packets with
the Ethernet frame header (including MAC address) and outer VLAN tag according to
ARP entries and then sends IP packets to the target user.
The sub-interface for QinQ VLAN tag termination first identifies double VLAN tags and then
generates an ARP entry containing the IP address, MAC address, and double VLAN tags.
For the upstream traffic, the termination sub-interface strips the Ethernet frame header
(including MAC address) and double VLAN tags, and searches the routing table to
perform Layer 3 forwarding based on the destination IP address.
For the downstream traffic, the termination sub-interface encapsulates IP packets with
the Ethernet frame header (including MAC address) and double VLAN tags according to
ARP entries and then sends IP packets to the target user.
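The upstream/downstream handling above can be reduced to a Python sketch. The ARP entry binds an IP address to a MAC address plus the tags learned when the entry was generated; all addresses and IDs here are assumed example values.

```python
# Assumed ARP entry for a user behind the QinQ termination sub-interface.
arp_table = {"10.1.1.2": {"mac": "aa-bb-cc-00-00-02", "s_vid": 1000, "c_vid": 100}}

def upstream(frame: dict) -> dict:
    """Strip the Ethernet header and both tags; routing sees only the IP packet."""
    return {"dst_ip": frame["dst_ip"]}

def downstream(ip_packet: dict) -> dict:
    """Re-encapsulate from the ARP entry for the destination IP."""
    entry = arp_table[ip_packet["dst_ip"]]
    return {"dst_mac": entry["mac"], "s_vid": entry["s_vid"],
            "c_vid": entry["c_vid"], "dst_ip": ip_packet["dst_ip"]}

out = downstream({"dst_ip": "10.1.1.2"})
assert (out["s_vid"], out["c_vid"]) == (1000, 100)
```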
When PC1 and PC3 want to communicate with each other, PC1 sends an ARP request to PC3
to obtain PC3's MAC address. However, as PC1 and PC3 are in different VLANs, PC3 fails to
receive the ARP request from PC1.
To solve this problem, configure proxy ARP on the sub-interface for Dot1q VLAN tag
termination. The detailed communication process is as follows:
1. PC1 sends an ARP Request message to request PC3's MAC address.
2. After receiving the ARP Request message, the PE checks the destination IP address of
the message and finds that the destination IP address is not the IP address of its
sub-interface for Dot1q VLAN tag termination. Then, the PE searches its ARP table for
the PC3's ARP entry.
− If the PE finds this ARP entry, the PE checks whether inter-VLAN proxy ARP is
enabled.
If inter-VLAN proxy ARP is enabled, the PE sends the MAC address of its
sub-interface for Dot1q VLAN tag termination to PC1.
If inter-VLAN proxy ARP is not enabled, the PE discards the ARP Request
message.
− If the PE does not find this ARP entry, the PE discards the ARP Request message
sent by PC1 and checks whether inter-VLAN proxy ARP is enabled.
If inter-VLAN proxy ARP is enabled, the PE sends an ARP Request message
to PC3. After the PE receives an ARP Reply message from PC3, an ARP entry
of PC3 is generated in the PE's ARP table.
If inter-VLAN proxy ARP is not enabled, the PE does not perform any
operations.
3. After learning the MAC address of the sub-interface for Dot1q VLAN tag termination,
PC1 sends IP packets to the PE based on this MAC address.
After receiving the IP packets, the PE forwards them to PC3.
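The decision logic in steps 2 and 3 can be summarized in a Python sketch, with the PE's behavior reduced to a pure function. The function name and example addresses are assumptions for illustration.

```python
def handle_arp_request(target_ip, arp_table, proxy_enabled, subif_mac):
    """Return the action the PE takes for an ARP request not addressed to it."""
    if target_ip in arp_table:
        # Entry exists: reply with the sub-interface's own MAC, or discard.
        return ("reply", subif_mac) if proxy_enabled else ("discard", None)
    # No entry: discard the request; probe the target only if proxying is on.
    return ("probe_target", None) if proxy_enabled else ("discard", None)

table = {"10.1.3.2": "aa-bb-cc-00-00-03"}   # assumed existing ARP entry
assert handle_arp_request("10.1.3.2", table, True, "PE-MAC") == ("reply", "PE-MAC")
assert handle_arp_request("10.1.3.2", {}, True, "PE-MAC") == ("probe_target", None)
```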
Figure 1-369 Proxy ARP on a sub-interface for Dot1q VLAN tag termination
Figure 1-370 Proxy ARP on a sub-interface for QinQ VLAN tag termination
Figure 1-371 DHCP server on a sub-interface for Dot1q VLAN tag termination
On the network shown in Figure 1-371, the user packet received by the DHCP server carries a
single tag. To enable the sub-interface for Dot1q VLAN tag termination on the DHCP server
to assign an IP address to a DHCP client, configure the DHCP server function on the
sub-interface for Dot1q VLAN tag termination.
Figure 1-372 DHCP server on a sub-interface for QinQ VLAN tag termination
On the network shown in Figure 1-372, the switch has selective QinQ configured, and the
user packet received by the DHCP server carries double tags. To enable the sub-interface for
QinQ VLAN tag termination on the DHCP server to assign an IP address to a DHCP client,
configure the DHCP server function on the sub-interface for QinQ VLAN tag termination.
1. When receiving a DHCP request message, the DHCP relay adds user tag information
into the Option 82 field in the message.
2. When receiving a DHCP reply message (ACK message) from the DHCP server, the
DHCP relay analyzes the DHCP reply and generates a binding table.
3. The DHCP relay checks user packets based on the user tag information.
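Step 1 above can be sketched in Python (the Option 82 field layout is simplified to a dictionary; real relays encode circuit-ID/remote-ID sub-options):

```python
def add_option82(dhcp_request: dict, s_vid: int, c_vid: int) -> dict:
    """Return a copy of the request with the user's tags recorded in Option 82."""
    options = dict(dhcp_request.get("options", {}))
    options[82] = {"s_vid": s_vid, "c_vid": c_vid}
    return {**dhcp_request, "options": options}

req = add_option82({"op": "DISCOVER"}, s_vid=1000, c_vid=100)
assert req["options"][82] == {"s_vid": 1000, "c_vid": 100}
```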
Figure 1-373 DHCP relay on a sub-interface for Dot1q VLAN tag termination
Figure 1-374 DHCP relay on a sub-interface for QinQ VLAN tag termination
On the network shown in Figure 1-375, sub-interfaces for Dot1q VLAN tag termination
specify an outer tag, such as tag 100, to configure a VRRP backup group.
Maintaining the master/backup status of the VRRP backup group
Responding to ARP request messages of users
The PE responds to ARP requests of users regardless of whether their packets contain the
tag specified during the VRRP configuration.
Updating the MAC address entries of the Layer 2 switch
Gratuitous ARP messages are sent periodically to update the MAC entries of the switch
and are copied for all the VLAN tags specified on the sub-interfaces for Dot1q VLAN
tag termination. In this way, the VLANs on the switch can learn virtual MAC addresses.
To improve system performance, the frequency of sending gratuitous ARP messages is
increased only when a master/backup switchover is performed. During stable operation
of VRRP, the frequency of sending gratuitous ARP messages is lowered, and the interval
at which gratuitous ARP packets are sent must be less than the aging time of MAC
entries on the switch.
The preceding working mechanism has the following advantages:
Only one VRRP instance needs to be created for users on the same network segment,
even if they carry different VLAN tags.
VRRP resources are saved.
Hardware resources are saved.
IP addresses are saved.
On the network shown in Figure 1-376, sub-interfaces for QinQ VLAN tag termination
specify double tags, such as inner tag 100 and outer tag 1000, to configure a VRRP backup
group.
Maintaining the master/backup status of the VRRP backup group
Responding to ARP request messages of users
The PE responds to ARP requests of users regardless of whether their packets contain the
tags specified during the VRRP configuration.
Updating the MAC address entries of the Layer 2 switch
Gratuitous ARP messages are sent periodically to update the MAC entries of the switch
and are copied for all the VLAN tags specified on the sub-interfaces for QinQ VLAN tag
termination. In this way, the VLANs on the switch can learn virtual MAC addresses. To
improve system performance, the frequency of sending gratuitous ARP messages is
increased only when a master/backup switchover is performed. During stable operation
of VRRP, the frequency of sending gratuitous ARP messages is lowered, and the interval
at which gratuitous ARP packets are sent must be less than the aging time of MAC
entries on the switch.
The preceding working mechanism has the following advantages:
Only one VRRP instance needs to be created for users on the same network segment,
even if they carry different VLAN tags.
Figure 1-377 L3VPN access through a sub-interface for Dot1q VLAN tag termination
Figure 1-378 L3VPN access through a sub-interface for QinQ VLAN tag termination
Figure 1-379 VPWS access through a sub-interface for QinQ VLAN tag termination
Figure 1-380 VPLS access through a sub-interface for Dot1q VLAN tag termination
VPLS supports point-to-multipoint (P2MP) connections and forwards data by learning MAC
addresses. In this case, VPLS access through a sub-interface for Dot1q VLAN tag termination
can be performed by MAC address learning on the basis of a single VLAN tag. Note that
there are no restrictions on VLAN tags for VPLS access.
Figure 1-381 VPLS access through a sub-interface for QinQ VLAN tag termination
VPLS supports P2MP connections and forwards data by learning MAC addresses. In this
case, VPLS access through a sub-interface for QinQ VLAN tag termination can be performed
by MAC address learning on the basis of double VLAN tags. Note that there are no
restrictions on VLAN tags for VPLS access.
On the network shown in Figure 1-382, when the DSLAM forwards double-tagged multicast
packets to the UPE, the UPE processes the packets as follows based on double-tag contents:
1. When the double-tagged packets carrying an outer S-VLAN tag and an inner C-VLAN
tag are transmitted to the UPE to access the Virtual Switching Instances (VSIs), the UPE
terminates the double tags and binds the packets to the multicast VSIs through Pseudo
Wires (PWs). Then, the PE-AGG terminates PWs and adds multicast VLAN tags to the
packets. Finally, the packets are transmitted to the multicast source. For example, IPTV
packets with S-VLAN 3 and C-VLANs ranging from 1 to 1000 are terminated on the
UPE and then access a PW. The PE-AGG terminates the PW and adds multicast VLAN
8 to the packets. IGMP snooping sets up forwarding entries based on the interface
number, S-VLAN tag, and C-VLAN tag and supports multicast packets with different
C-VLAN tags. Each PW then forwards the multicast packets based on their S-VLAN IDs
and C-VLAN IDs.
2. When the double-tagged packets carrying an outer C-VLAN tag and an inner S-VLAN
tag are transmitted to the UPE, the UPE enabled with VLAN swapping swaps the outer
C-VLAN tag and inner S-VLAN tag. If multicast packets access Layer 2 VLANs, the
packets are processed in mode 1; if multicast packets access VSIs, the packets are
processed in mode 2.
In the traditional model, users access a virtual private network (VPN) through a main
interface. Such a configuration is not flexible because multiple users cannot access the VPN
through the same physical interface. To allow multiple users to access the VPN through the
same physical interface, you can use the QinQ stacking function on different sub-interfaces.
This requires that CE-VLANs on PE1 and PE2 be the same.
On the network shown in Figure 1-383, a QinQ stacking sub-interface on PE1 adds an outer
VLAN tag of the ISP network to its received user packets that carry a VLAN tag ranging from
1 to 200 on sub-interfaces. Then, PE1 sends these packets to the VPWS network.
To solve this problem, the 802.1p value in the inner VLAN tag must be processed on a QinQ
sub-interface. The following three methods are available on a QinQ interface:
Ignore the 802.1p value in the inner VLAN tag and reset the 802.1p value in the outer
VLAN tag.
Automatically map the 802.1p value in the inner VLAN tag to an 802.1p value in the
outer VLAN tag.
Set the 802.1p value in the outer VLAN tag according to the 802.1p value in the inner
VLAN tag.
In Figure 1-386, QinQ supports 802.1p in the following modes:
Pipe mode: A specified 802.1p value is set.
Uniform mode: The 802.1p value in the inner VLAN tag is used.
Maps the 802.1p value in the inner VLAN tag to an 802.1p value in the outer VLAN tag.
Multiple 802.1p values in the inner VLAN tag can be mapped to an 802.1p value in the
outer VLAN tag, but one 802.1p value in the inner VLAN tag cannot be mapped to
multiple 802.1p values in the outer VLAN tag.
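The mapping restriction above can be sketched with a Python dictionary, whose unique keys naturally enforce the rule: several inner values may share one outer value, but one inner value cannot map to two outer values. The mapping plan itself is an assumed example.

```python
# Assumed mapping plan: several inner priorities may share one outer priority.
INNER_TO_OUTER_8021P = {0: 0, 1: 0, 2: 2, 3: 2, 4: 4, 5: 5, 6: 5, 7: 7}

def outer_priority(inner_p: int) -> int:
    """Map the inner-tag 802.1p value to the outer-tag 802.1p value."""
    return INNER_TO_OUTER_8021P[inner_p]

assert outer_priority(1) == 0   # inner 0 and 1 both map to outer 0
assert outer_priority(6) == 5   # inner 5 and 6 both map to outer 5
```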
1.7.6.3 Applications
1.7.6.3.1 User Services on a Metro Ethernet
On the network shown in Figure 1-387, DSLAMs support multiple permanent virtual channel
(PVC) access. A user uses multiple services, such as HSI, IPTV and VoIP.
PVCs are used to carry services that are assigned with different VLAN ID ranges. The
following table lists the VLAN ID ranges for each service.
If a user needs to use the VoIP service, user VoIP packets are sent to a DSLAM over a
specified PVC and assigned with VLAN ID 301. When the packets reach the UPE, an outer
VLAN ID (for example, 2000) is added to the packets. The inner VLAN ID (301) represents
the user, and the outer VLAN ID (2000) represents the VoIP service (the DSLAM location can
also be marked if you add different VLAN tags to packets received by different DSLAMs).
The UPE then sends the VoIP packets to the NPE where the double VLAN tags are terminated.
Then, the NPE sends the packets to an IP core network or a VPN.
HSI and IPTV services are processed in the same way. The difference is that QinQ
termination of HSI services is implemented on the BRAS.
The NPE can generate a Dynamic Host Configuration Protocol (DHCP) binding table to
prevent network attacks. In addition, the NPE can implement DHCP authentication based on
the double tags and has the Virtual Router Redundancy Protocol (VRRP) enabled to ensure
reliable service access.
A carrier deploys the VPLS technology on the IP/MPLS core network and QinQ on the ME
network. Three VLANs are assigned for each site to identify the finance, marketing and other
departments, and the VLAN ID for finance is 100, for marketing is 200, and for others is 300.
An outer VLAN 1000 is encapsulated on a UPE (Packets can be added with different VLAN
tags on different UPEs). The sub-interface bound to a VSI on the NPE connected to the UPE
is in symmetry mode. In this way, users belonging to the same VLAN in different sites can
communicate with each other.
Terms
Term Definition
QinQ interface An interface that can process VLAN frames with a single tag (Dot1q termination) or with double tags (QinQ termination).
Sub-interface for VLAN tag termination An interface that identifies the single or double tags in a packet and removes the single or double tags before sending the packets.
1.7.7 EVC
1.7.7.1 Introduction
Definition
An Ethernet virtual connection (EVC) defines a unified Layer 2 Ethernet transmission and
configuration model. An EVC is defined by the Metro Ethernet Forum (MEF) as an
association between two or more user network interfaces within an Internet service provider
(ISP) network. In the EVC model, a bridge domain functions as a local broadcast domain that
can isolate user networks.
An EVC is a model, rather than a specific service or technique.
Purpose
Figure 1-389 shows the traditional service model supported by the NE20E.
The NE20E's traditional service model has limitations, which are described in Table 1-103.
To address these limitations, the EVC model was introduced, as shown in Figure 1-390.
Table 1-103 provides a comparison between the traditional service model and the EVC model
of the NE20E.
Table 1-103 Comparison between the traditional service model and the EVC model of the NE20E
Benefits
EVC provides an Ethernet service model and a configuration model. EVC simplifies
configuration management, improves operation and maintenance efficiency, and enhances
service expansibility.
1.7.7.2 Principles
1.7.7.2.1 EVC Service Bearing
Table 1-104 lists EVC types defined by the MEF.
Related Concepts
EVC Layer 2 sub-interface
An EVC Layer 2 sub-interface is connected to a BD and a VPWS network but cannot be
directly connected to a Layer 3 network.
BD
A BD is a broadcast domain. VLAN tags are transparent within a BD, and MAC address
learning is based on BDs.
An EVC Layer 2 sub-interface belongs to only one BD. Each EVC Layer 2 sub-interface
functioning as a service access point is added to a specific bridge domain and transmits a
specific type of service, which implements service isolation.
BDIF
A BDIF interface is a Layer 3 logical interface that terminates Layer 2 services and
provides Layer 3 access.
Each BD has only one BDIF interface.
Figure 1-391 shows a diagram of EVC service bearing, involving EFPs, broadcast domains,
and Layer 3 access.
An EVC Layer 2 sub-interface is used as an EVC service access point, on which traffic
encapsulation types and behaviors can be flexibly combined. A traffic encapsulation type and
behavior are grouped into a traffic policy. Traffic policies help implement flexible Ethernet
traffic access.
Traffic encapsulation
A Layer 2 Ethernet network can transmit untagged, single-tagged, and double-tagged
packets. To enable a specific EVC Layer 2 sub-interface to transmit a specific type of
packet, specify an encapsulation type on the EVC Layer 2 sub-interface. Table 1-105
lists traffic encapsulation types supported by Layer 2 sub-interfaces.
On a physical interface, if only one EVC Layer 2 sub-interface is created and the
encapsulation type is Default, all traffic is forwarded through the EVC Layer 2
sub-interface.
If a physical interface has both a Default EVC sub-interface and EVC sub-interfaces of
other traffic encapsulation types (such as Dot1q and QinQ), and all the non-Default EVC
sub-interfaces are Down, traffic precisely matching these non-Default EVC
sub-interfaces will not be forwarded through the Default EVC sub-interface.
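The matching rule above can be illustrated with a Python sketch: traffic that precisely matches a non-Default sub-interface is never handed to the Default one, even when the matching sub-interface is Down. Interface names and VLAN IDs are assumed.

```python
def select_subif(frame_vid, subifs):
    """subifs: list of (name, encap_vid_or_None, is_up); None means Default."""
    for name, vid, up in subifs:
        if vid == frame_vid:                    # precise (Dot1q) match wins
            return name if up else None         # matched but Down: not forwarded
    for name, vid, up in subifs:
        if vid is None and up:                  # fall back to the Default sub-interface
            return name
    return None

subifs = [("ge0/1/0.1", 100, False), ("ge0/1/0.99", None, True)]
assert select_subif(100, subifs) is None        # not diverted to Default
assert select_subif(200, subifs) == "ge0/1/0.99"
```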
Traffic behaviors
Table 1-106 lists traffic behaviors supported by Layer 2 sub-interfaces.
Traffic policies
A traffic policy is a combination of a traffic encapsulation type and a traffic behavior. In
the following example, a traffic policy is used. On the network shown in Figure 1-397,
users accessing PE1 need to communicate with users on other PEs at Layer 2. The
following steps are performed:
− Create a bridge domain on PE1, create an EVC Layer 2 sub-interface on the PE1
interface that users access, configure an encapsulation type on the EVC Layer 2
sub-interface and add the EVC Layer 2 sub-interface to the bridge domain.
− Create a bridge domain with the same ID as that on PE1 on each of the other PEs,
configure EVC Layer 2 sub-interfaces on the PE interfaces that users access, specify
various encapsulation types and behaviors, and add all EVC Layer 2 sub-interfaces
to the bridge domain.
− Create EVC Layer 2 sub-interfaces connecting all PEs except PE1 and add these
sub-interfaces to the same bridge domain.
All user devices must be on the same network segment to help users on PE1 and other PEs successfully
communicate.
PE3 port4 Default push vid 10 Adds a tag with VLAN ID 10 to each received untagged
packet. Removes the tag with VLAN ID 10 from each received single-tagged packet.
Traffic encapsulation types and behaviors can be combined flexibly in policies. Table
1-107 describes traffic policies for transmitting traffic.
Quality of service (QoS) policies can be deployed on Layer 2 sub-interfaces to differentiate services and
properly allocate resources for the services.
Traffic forwarding
Figure 1-398 shows traffic forwarding based on an EVC model when Layer 2
sub-interfaces receive packets carrying two VLAN tags.
Layer 2 sub-interfaces are created on the PE1 and PE2 interfaces connecting to the CEs.
A traffic policy is deployed on each EVC Layer 2 sub-interface, and the sub-interfaces
are added to BD1.
− Packet transmission from CE1 to CE2
When receiving double-tagged packets from CE1, the EVC Layer 2 sub-interface of
port 1 on PE1 matches the packets against its traffic encapsulation and receives only
the packets with the outer VLAN ID 100 and inner VLAN ID 10. The EVC Layer 2
sub-interface removes both VLAN tags from the packets based on its traffic
behavior and then forwards the packets to PE2.
Before the EVC Layer 2 sub-interface of port 1 on PE2 forwards the packets to CE2,
the sub-interface adds the outer VLAN ID 200 and inner VLAN ID 20 to the
packets based on its traffic encapsulation and traffic behavior.
− Packet transmission from CE2 to CE1
When receiving double-tagged packets from CE2, the EVC Layer 2 sub-interface of
port 1 on PE2 matches the packets against its traffic encapsulation and receives only
the packets with the outer VLAN ID 200 and inner VLAN ID 20. The EVC Layer 2
sub-interface removes both VLAN tags from the packets based on its traffic
behavior and then forwards the packets to PE1.
Before the EVC Layer 2 sub-interface of port 1 on PE1 forwards the packets to CE1,
the sub-interface adds the outer VLAN ID 100 and inner VLAN ID 10 to the
packets based on its traffic encapsulation and traffic behavior.
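The CE1-to-CE2 direction above can be sketched in Python (a simplified frame model, not device behavior): PE1 pops the (100, 10) tag pair on ingress, and PE2 pushes (200, 20) on egress.

```python
def pe1_ingress(frame: dict):
    """Match (outer 100, inner 10) and strip both tags; drop anything else."""
    if (frame["outer"], frame["inner"]) != (100, 10):
        return None
    return {"payload": frame["payload"]}

def pe2_egress(frame: dict) -> dict:
    """Push (outer 200, inner 20) before forwarding to CE2."""
    return {"outer": 200, "inner": 20, "payload": frame["payload"]}

out = pe2_egress(pe1_ingress({"outer": 100, "inner": 10, "payload": "data"}))
assert (out["outer"], out["inner"]) == (200, 20)
```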
Broadcast Domain
EVC has a unified broadcast domain model, as shown in Figure 1-399.
Layer 3 Access
A BDIF interface is created for a BD in the EVC model. A BDIF interface terminates Layer 2
services and provides Layer 3 access. Figure 1-400 shows how a BDIF interface forwards
packets between Layer 2 and Layer 3.
A BD is created on the PE and implements Layer 2 forwarding of packets from the user
network. Layer 2 sub-interfaces are created on the user side and bound to the same BD and
are each configured with a traffic policy.
A BDIF interface, which is a virtual interface that implements Layer 3 packet forwarding, is
created based on the BD and assigned an IP address.
When forwarding packets, the BDIF interface matches only the destination MAC address in
each packet.
Layer 2 to Layer 3: When receiving user packets, Layer 2 sub-interfaces process the
packets based on the traffic policies and then forward the packets to the BD. If the
destination MAC address of a user packet is the MAC address of the BDIF interface, the
device removes the Layer 2 header of the packet and performs Layer 3 packet
forwarding based on the routing tables. For all other user packets, the device directly
performs Layer 2 forwarding for them based on the MAC address table.
Layer 3 to Layer 2: When receiving packets, the device searches its routing table for the
outbound BDIF interface and then sends the packets to this interface. The BDIF interface
encapsulates the packets based on the ARP entries. The device then searches its MAC
address table for the outbound interfaces and performs Layer 2 forwarding for the
packets.
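The forwarding decision described above reduces to one test on the destination MAC address. The following Python sketch illustrates it; the MAC addresses and port names are assumed.

```python
BDIF_MAC = "00-e0-fc-00-00-01"   # assumed virtual MAC of the BDIF interface

def forward(frame: dict, mac_table: dict):
    """Route frames addressed to the BDIF MAC; bridge everything else."""
    if frame["dst_mac"] == BDIF_MAC:
        return ("route", frame["dst_ip"])        # strip L2 header, use routing table
    return ("bridge", mac_table.get(frame["dst_mac"], "flood"))

mac_table = {"aa-bb-cc-00-00-02": "port2"}
assert forward({"dst_mac": BDIF_MAC, "dst_ip": "10.1.1.2"}, mac_table) == ("route", "10.1.1.2")
assert forward({"dst_mac": "aa-bb-cc-00-00-02", "dst_ip": None}, mac_table) == ("bridge", "port2")
```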
1.7.7.3 Applications
1.7.7.3.1 Application of EVC Bearing VPLS Services
Service Overview
As enterprises widen their global reach and establish more branches in different regions,
applications such as instant messaging and teleconferencing are becoming more common.
This imposes high requirements on end-to-end (E2E) Datacom technologies. A network
capable of providing point-to-multipoint (P2MP) and multipoint-to-multipoint (MP2MP)
services is paramount to Datacom function implementation. To ensure the security of
enterprise data, secure, reliable, and transparent data channels must be provided for multipoint
transmission.
Generally, enterprises lease virtual switching instances (VSIs) on a carrier network to carry
services between branches.
Networking Description
In Figure 1-401, Branch 1 and Branch 3 belong to one department (the Procurement
department, for example), and Branch 2 and Branch 4 belong to another department (the R&D
department, for example). Services must be isolated between these departments, but each
department can plan their VLANs independently (for example, different service development
teams belong to different VLANs). The enterprise plans to dynamically adjust the departments
but does not want to lease multiple VSIs on the carrier network because of the associated
costs.
Feature Deployment
In the traditional service model supported by the NE20E shown in Figure 1-401, common
sub-interfaces (VLAN type), sub-interfaces for dot1q VLAN tag termination, or
sub-interfaces for QinQ VLAN tag termination are created on the user-side interfaces of the
PEs. These sub-interfaces are bound to different VSIs on the carrier network to isolate
services in different departments. If the enterprise sets up another department, the enterprise
must lease another VSI from the carrier to isolate the departments, increasing costs.
To allow the enterprise to dynamically adjust its departments and reduce costs, the EVC
model can be deployed on the PEs. In the EVC model, multiple BDs are connected to the
same VSI, and the BDs are isolated from each other.
When a packet travels from a BD to a PW, the PE adds the BD ID to the packet as the
outer tag (P-Tag).
When a packet travels from a PW to a BD, the PE searches for the VSI instance based on
the VC label and the BD based on the P-Tag.
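The two directions above can be sketched in Python: the BD ID rides as the outer P-Tag on the PW, so the remote PE can demultiplex frames back to the right BD within the shared VSI. The BD ID and VC label values are assumed.

```python
def bd_to_pw(payload, bd_id: int, vc_label: int) -> dict:
    """Entering the PW: the PE adds the BD ID as the outer P-Tag."""
    return {"vc_label": vc_label, "p_tag": bd_id, "payload": payload}

def pw_to_bd(pw_frame: dict):
    """Leaving the PW: the P-Tag selects the destination BD."""
    return pw_frame["p_tag"], pw_frame["payload"]

pkt = bd_to_pw("data", bd_id=30, vc_label=1025)
assert pw_to_bd(pkt) == (30, "data")
```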
The NE20E also supports the exclusive VSI service mode. This mode is similar to a
traditional service mode in which sub-interfaces are bound to different VSIs to connect to the
VPLS network. Figure 1-403 shows a diagram of the exclusive VSI service mode.
In the exclusive VSI service mode, each VSI is connected to only one BD, and the BD
occupies the VSI resource exclusively.
Service Description
As globalization gains momentum, more and more enterprises set up branches in foreign
countries and requirements for office flexibility are increasing. An urgent demand for carriers
is to provide Layer 2 links for enterprises to set up their own enterprise networks, so that
enterprise employees can conveniently visit enterprise intranets outside their offices.
By combining previous access modes with the current IP backbone network, VPWS prevents
duplicate network construction and saves operation costs.
Networking Description
In the traditional service model supported by the NE20E, common sub-interfaces (VLAN
type), Dot1q VLAN tag termination sub-interfaces, or QinQ VLAN tag termination
sub-interfaces are created on the user-side interfaces of PEs. These sub-interfaces are bound to
different VSIs on the carrier network. If Layer 2 devices use different access modes on a
network, service management and configuration are complicated and difficult. To resolve this
issue, configure an EVC to carry Layer 2 services. This implementation facilitates network
planning and management, driving down enterprise costs.
On the VPWS network shown in Figure 1-404, VPN1 services use the EVC VPWS model.
The traffic encapsulation type and behavior are configured on the PE to ensure service
connectivity within the same VPN instance.
Feature Deployment
1. Create a Layer 2 EVC sub-interface on the PE and specify the traffic encapsulation type
and behavior on the Layer 2 sub-interface.
2. Configure VPWS on the EVC Layer 2 sub-interface.
Terms
Terms Definition
EVC Ethernet Virtual Connection. A model for carrying Ethernet services
over a metropolitan area network (MAN). It is defined by the Metro
Ethernet Forum (MEF). An EVC is a model, rather than a specific
service or technique.
BD bridge domain
1.7.8 STP/RSTP/MSTP
1.7.8.1 Introduction
Definition
Generally, redundant links are used on an Ethernet switching network to provide link backup
and enhance network reliability. The use of redundant links, however, may produce loops,
causing broadcast storms and MAC address table instability. As a result, the communication
quality deteriorates, and the communication service may even be interrupted. The Spanning
Tree Protocol (STP) is introduced to resolve this problem.
The term STP has both a narrow sense and a broad sense.
In the narrow sense, STP refers only to the protocol defined in IEEE 802.1D.
In the broad sense, STP refers to the protocol defined in IEEE 802.1D as well as the Rapid Spanning Tree Protocol (RSTP) defined in IEEE 802.1W and the Multiple Spanning Tree Protocol (MSTP) defined in IEEE 802.1S.
Currently, the following spanning tree protocols are supported:
STP
STP, a management protocol at the data link layer, is used to detect and prevent loops on
a Layer 2 network. STP blocks redundant links on a Layer 2 network and trims a
network into a loop-free tree topology.
The STP topology, however, converges at a slow speed. A port cannot be changed to the
Forwarding state until twice the time specified by the Forward Delay timer elapses.
RSTP
RSTP, as an enhancement of STP, converges a network topology at a faster speed.
In both RSTP and STP, all VLANs share one spanning tree. Packets of different VLANs cannot be load balanced, and packets of some VLANs cannot be forwarded along the spanning tree.
RSTP is backward compatible with STP and can be used together with STP on a
network.
MSTP
MSTP defines a VLAN mapping table in which VLANs are associated with multiple
spanning tree instances (MSTIs). In addition, MSTP divides a switching network into
multiple regions, each of which has multiple independent MSTIs. In this manner, the
entire network is trimmed into a loop-free tree topology, and replication and circular
propagation of packets and broadcast storms are prevented on the network. In addition,
MSTP provides multiple redundant paths to balance VLAN traffic.
MSTP is compatible with STP and RSTP. Table 1-108 shows a comparison between STP,
RSTP, and MSTP.
Purpose
After a spanning tree protocol is configured on an Ethernet switching network, it calculates
the network topology and implements the following functions to remove network loops:
Loop prevention: The potential loops on the network are cut off after redundant links are
blocked.
Link redundancy: When an active path becomes faulty, a redundant link can be activated
to ensure network connectivity.
Benefits
This feature offers the following benefits to carriers:
Compared with dual-homing networking, ring networking requires fewer fibers and transmission resources, reducing resource consumption.
STP prevents broadcast storms. This implements real-time communication and improves
communication reliability.
On the network shown in Figure 1-405, the following situations may occur:
Broadcast storms exhaust network resources.
It is known that loops lead to broadcast storms. In Figure 1-405, STP is not enabled on Device A or Device B. If Host A broadcasts a request, both Device A and Device B receive the request on port 1 and forward it through port 2. Port 2 on Device A then receives the request from port 2 on Device B and forwards it through port 1 on Device A. Similarly, port 2 on Device B receives the request from port 2 on Device A and forwards it through port 1 on Device B. As this transmission repeats, resources on the entire network are exhausted, rendering the network unable to work.
Flapping of MAC address tables damages MAC address entries.
In Figure 1-405, the update of MAC address entries upon the receipt of unicast packets damages the MAC address table.
Assume that no broadcast storm occurs on the network. Host A unicasts a packet to Host
B. If Host B is temporarily removed from the network at this time, the MAC address
entry of Host B on Device A and Device B is deleted. The packet unicast by Host A to
Host B is received by port 1 on Device A. Device A, however, does not have the MAC
address entry of Host B. Therefore, the unicast packet is forwarded to port 2. Then, port
2 on Device B receives the unicast packet from port 2 on Device A and sends it out
through port 1. As such transmission repeats, port 1 and port 2 on Device A and Device
B continuously receive unicast packets from Host A. Therefore, Device A and Device B
update their MAC address entries continuously, causing the MAC address tables to flap.
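The flapping described above can be modeled in a few lines. A minimal sketch, assuming the looped unicast frame arrives alternately on ports 1 and 2 of Device A; the function is illustrative, not device logic:

```python
# Hedged sketch of MAC-table flapping: two bridges in a loop keep receiving
# Host A's unicast frame on alternating ports, so the learned port for
# Host A's MAC address keeps changing.

def learn(mac_table: dict, src_mac: str, in_port: int) -> bool:
    """Update the MAC table; return True if the entry changed (a 'flap')."""
    changed = mac_table.get(src_mac) != in_port
    mac_table[src_mac] = in_port
    return changed

table_a = {}
flaps = 0
# The looped frame arrives alternately on port 1 and port 2 of Device A.
for in_port in [1, 2, 1, 2, 1, 2]:
    if learn(table_a, "HostA", in_port):
        flaps += 1
print(flaps)  # every arrival moves the entry: continuous flapping
```

Every arrival rewrites the learned port, which is exactly the instability STP removes by blocking one link of the loop.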
Basic Design
STP runs at the data link layer. The routers running STP discover loops on the network by
exchanging information with each other and trim the ring topology into a loop-free tree
topology by blocking a certain interface. In this manner, replication and circular propagation
of packets are prevented on the network. In addition, STP prevents the processing
performance of routers from deteriorating.
The routers running STP communicate with each other by exchanging Bridge Protocol Data Units (BPDUs). BPDUs are classified into two types:
Configuration BPDU: used to calculate a spanning tree and maintain the spanning tree
topology.
Topology Change Notification (TCN) BPDU: used to inform associated routers of a
topology change.
Configuration BPDUs contain the following information for routers to calculate the spanning tree.
Root bridge ID: is composed of a root bridge priority and the root bridge's MAC address. Each STP
network has only one root bridge.
Cost of the root path: indicates the cost of the shortest path to the root bridge.
Designated bridge ID: is composed of a bridge priority and a MAC address.
Designated port ID: is composed of a port priority and a port name.
Message Age: specifies the lifetime of a BPDU on the network.
Max Age: specifies the maximum time a BPDU is saved.
Hello Time: specifies the interval at which BPDUs are sent.
Forward Delay: specifies the delay for an interface status transition.
The port priority affects the role of a port in a specified spanning tree instance. For details, see 1.7.8.2.4
STP Topology Calculation.
Path cost
The path cost is a port variable and is used to select a link. STP calculates the path cost
to select a robust link and blocks redundant links to trim the network into a loop-free tree
topology.
On an STP-enabled network, the accumulative cost of the path from a certain port to the
root bridge is the sum of the costs of all the segment paths into which the path is
separated by the ports on the transit bridges.
Table 1-109 shows the path costs defined in IEEE 802.1t. Different router manufacturers
use different path cost standards.
The rate of an aggregated link is the sum of the rates of all Up member links in the aggregated group.
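The accumulation rule above can be sketched directly: the root path cost seen at a port is the sum of the segment costs along the path back to the root bridge. The cost values below follow IEEE 802.1t (for example, 200,000 for a 100 Mbit/s link and 20,000 for a 1 Gbit/s link); the topology is illustrative.

```python
# Hedged sketch of root path cost accumulation: the cost of the path from a
# port to the root bridge is the sum of the costs of the segment paths.

def root_path_cost(segment_costs: list[int]) -> int:
    """Sum the per-segment path costs along the path to the root bridge."""
    return sum(segment_costs)

# Root -- 1 Gbit/s --> bridge B -- 100 Mbit/s --> bridge C
# (IEEE 802.1t costs: 1 Gbit/s = 20000, 100 Mbit/s = 200000)
print(root_path_cost([20000, 200000]))  # 220000
```

Because different manufacturers may use different cost tables, the same physical path can yield different accumulated costs on mixed-vendor networks.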
Three Elements
There are generally three elements used when a ring topology is to be trimmed into a tree
topology: root bridge, root port, and designated port. Figure 1-406 shows the three elements.
Root bridge
The root bridge is the bridge with the smallest BID. The smallest BID is determined by
exchanging configuration BPDUs.
Root port
The root port is the port with the smallest path cost to the root bridge. To be specific, the root port is determined based on the path cost. Among all STP-enabled ports on a network bridge, the port with the smallest root path cost is the root port. Each STP-enabled router has only one root port, but the root bridge has no root port.
Designated port
For description of a designated bridge and designated port, see Table 1-110.
As shown in Figure 1-407, AP1 and AP2 reside on Device A; BP1 and BP2 reside on
Device B; CP1 and CP2 reside on Device C.
− Device A sends configuration BPDUs to Device B through AP1. Device A is the
designated bridge of Device B, and AP1 on Device A is the designated port.
− Two routers, Device B and Device C, are connected to the LAN. If Device B is
responsible for forwarding configuration BPDUs to the LAN, Device B is the
designated bridge of the LAN and BP2 on Device B is the designated port.
Figure 1-407 Networking diagram of the designated bridge and designated port
After the root bridge, root port, and designated port are selected successfully, the entire tree
topology is set up. When the topology is stable, only the root port and the designated port
forward traffic. All the other ports are in the Blocking state and receive only STP protocol
packets instead of forwarding user traffic.
After a router on the STP-enabled network receives configuration BPDUs, it compares its own fields, shown in Table 1-111, with those of the received configuration BPDUs. The four comparison principles are as follows:
During the STP calculation, the smaller the value, the higher the priority.
Smallest BID: used to select the root bridge. Devices running STP select the smallest
BID as the root BID shown in Table 1-111.
Smallest root path cost: used to select the root port on a non-root bridge. On the root
bridge, the path cost of each port is 0.
Smallest sender BID: used to select the root port when a router running STP selects the
root port between two ports that have the same path cost. The port with a smaller BID is
selected as the root port in STP calculation. Assume that the BID of Device B is smaller
than that of Device C in Figure 1-406. If the path costs in the BPDUs received by port A
and port B on Device D are the same, port B becomes the root port.
Smallest PID: used to block the port with a greater PID but not the port with a smaller
PID when the ports have the same path cost. The PIDs are compared in the scenario
shown in Figure 1-408. The PID of port A on Device A is smaller than that of port B. In
the BPDUs that are received on port A and port B, the path costs and BIDs of the sending
routers are the same. Therefore, port B with a greater PID is blocked to cut off loops.
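The four comparison principles above amount to comparing BPDUs as an ordered vector (root BID, root path cost, sender BID, sender PID), where the smaller vector wins at the first differing field. A hedged sketch with illustrative field values; Python tuple ordering matches this field-by-field rule:

```python
# Hedged sketch of STP BPDU comparison: (root BID, root path cost,
# sender BID, sender PID), smallest value wins, compared field by field.

def better_bpdu(a: tuple, b: tuple) -> tuple:
    """Return the superior BPDU; tuple ordering applies the four
    comparison principles in order."""
    return min(a, b)

# Same root BID and root path cost; the smaller sender BID breaks the tie,
# so the BPDU from Device B (BID 4097) beats the one from Device C (BID 8193).
bpdu_from_b = (4096, 200000, 4097, 0x8001)
bpdu_from_c = (4096, 200000, 8193, 0x8001)
assert better_bpdu(bpdu_from_b, bpdu_from_c) == bpdu_from_b
```

The same comparison also covers the PID tiebreak: when even the sender BIDs match, the smaller PID wins and the other port is blocked.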
Port States
Table 1-112 shows the port status of an STP-enabled router.
A Huawei datacom router uses MSTP by default. Port states supported by MSTP are the
same as those supported by STP/RSTP.
The following parameters affect the STP-enabled port states and convergence.
Hello time
The Hello timer specifies the interval at which an STP-enabled router sends
configuration BPDUs and Hello packets to detect link faults.
Modification of the Hello timer takes effect only when it is performed on the root bridge. The root bridge adds certain fields to BPDUs to inform non-root bridges of the change in the interval. After a topology change, TCN BPDUs are sent; this interval, however, is irrelevant to the transmission of TCN BPDUs.
Forward Delay time
The Forward Delay timer specifies the delay for interface status transition. When a link
fault occurs, STP recalculation is performed, causing the structure of the spanning tree to
change. The configuration BPDUs generated during STP recalculation cannot be
immediately transmitted over the entire network. If the root port and designated port
forward data immediately after being selected, transient loops may occur. Therefore, an
interface status transition mechanism is introduced by STP. The newly selected root port
and designated port do not forward data until an amount of time equal to twice the
forward delay has passed. In this manner, the newly generated BPDUs can be transmitted
over the network before the newly selected root port and designated port forward data,
which prevents transient loops.
The Forward Delay timer specifies the duration a port spends in each of the Listening and Learning states.
The port in the Listening or Learning state is blocked, which is key to preventing transient loops.
If the configuration BPDU is sent from the root bridge, the value of Message Age is 0. Otherwise, the
value of Message Age indicates the total time during which a BPDU is sent from the root bridge to the
local bridge, including the delay in transmission. In real world situations, each time a configuration
BPDU passes through a bridge, the value of Message Age increases by 1.
Configuration BPDU
Configuration BPDUs are most commonly used.
During initialization, each bridge actively sends configuration BPDUs. After the network
topology becomes stable, only the root bridge actively sends configuration BPDUs. Other
bridges send configuration BPDUs only after receiving configuration BPDUs from upstream
routers. A configuration BPDU is at least 35 bytes long and includes parameters such as the BID, path cost, and PID. A BPDU is discarded if both its sender BID and Port ID field values are the same as those of the local port; otherwise, the BPDU is processed. In this manner, BPDUs that carry the same information as the local port are not processed.
Table 1-113 shows the format of a BPDU.
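The 35-byte configuration BPDU layout can be decoded with the standard library. A hedged sketch following the IEEE 802.1D field order (Protocol ID, Version, Type, Flags, Root ID, Root Path Cost, Bridge ID, Port ID, and four timer fields carried in units of 1/256 second); the field values below are illustrative:

```python
# Hedged sketch: parsing the fixed 35-byte IEEE 802.1D configuration BPDU.
import struct

CONFIG_BPDU = struct.Struct("!HBBB8sI8sHHHHH")  # 35 bytes, network byte order

def parse_config_bpdu(data: bytes) -> dict:
    (proto, version, bpdu_type, flags, root_id, root_path_cost,
     bridge_id, port_id, msg_age, max_age, hello, fwd_delay) = \
        CONFIG_BPDU.unpack(data[:CONFIG_BPDU.size])
    return {
        "type": bpdu_type,             # 0x00 = configuration BPDU
        "root_id": root_id.hex(),      # root priority + root bridge MAC
        "root_path_cost": root_path_cost,
        "bridge_id": bridge_id.hex(),  # designated bridge ID
        "port_id": port_id,            # designated port ID (priority + number)
        "hello_time_s": hello / 256,   # timers are in 1/256-second units
    }

# Illustrative BPDU: Max Age 20 s, Hello Time 2 s, Forward Delay 15 s.
raw = CONFIG_BPDU.pack(0, 0, 0, 0, bytes(8), 0, bytes(8), 0x8001,
                       0, 20 * 256, 2 * 256, 15 * 256)
assert parse_config_bpdu(raw)["hello_time_s"] == 2.0
```

The 1/256-second timer encoding explains why timer fields are two bytes wide even though the configured values are whole seconds.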
Figure 1-411 shows the Flags field. Only the leftmost and rightmost bits are used in STP.
When a root port receives configuration BPDUs, the router where the root port resides
sends a copy of the configuration BPDUs to the specified ports on itself.
When receiving a configuration BPDU with a lower priority, a designated port
immediately sends its own configuration BPDUs to the downstream router.
TCN BPDU
The contents of TCN BPDUs are simple, including only three fields: Protocol ID, Version, and Type, as shown in Table 1-113. The value of the Type field is 0x80, and a TCN BPDU is only four bytes long.
TCN BPDUs are transmitted by each router to its upstream router to notify the upstream
router of changes in the downstream topology, until they reach the root bridge. A TCN BPDU
is generated in one of the following scenarios:
A port on the router enters the Forwarding state, and at least one designated port resides on the router.
A designated port receives TCN BPDUs and sends a copy toward the root bridge.
As each bridge considers itself the root bridge, the value of the root BID field in the BPDU sent by each
port is recorded as its BID; the value of the Root Path Cost field is the cumulative cost of all links to the
root bridge; the sender BID is the ID of the local bridge; the Port ID is the Port ID (PID) of the local
bridge port that sends the BPDU.
Once a port receives a BPDU with a higher priority than its own, the port extracts certain information from the BPDU and updates its own information accordingly. After saving the updated BPDU, the port stops sending its own BPDUs.
When sending a BPDU, each router fills in the Sender BID field with its own BID. When
a router considers itself the root bridge, the router fills in the Root BID field with its own
BID. As shown in Figure 1-412, Port B on Device B receives a BPDU with a higher
priority from Device A, and therefore considers Device A the root bridge. When another
port on Device B sends a BPDU, the port fills in its Root BID field with DeviceA_BID.
The preceding intercommunication is repeatedly performed between two routers until all
routers consider the same router as the root bridge. This indicates that the root bridge is
selected. Figure 1-413 shows the root bridge selection.
In the Root Path Cost algorithm, after a port receives a BPDU, the port extracts the value of the Root Path Cost field and adds the obtained value to the path cost of the port itself to obtain the root path cost.
The path cost on the port covers only directly-connected path costs. The cost can be manually
configured on a port. If the root path costs on two or more ports are the same, the port that sends a
BPDU with the smallest sender BID value is selected as the root port.
1. After the network topology changes, a downstream router continuously sends Topology
Change Notification (TCN) BPDUs to an upstream router.
2. After the upstream router receives TCN BPDUs from the downstream router, only the
designated port processes them. The other ports may receive TCN BPDUs but do not
process them.
3. The upstream router sets the TCA bit of the Flags field in the configuration BPDUs to 1
and returns the configuration BPDUs to instruct the downstream router to stop sending
TCN BPDUs.
4. The upstream router sends a copy of the TCN BPDUs to the root bridge.
5. Steps 1, 2, 3, and 4 are repeated until the root bridge receives the TCN BPDUs.
6. The root bridge sets the TC bit of the Flags field in the configuration BPDUs to 1 to
instruct the downstream router to delete MAC address entries.
TCN BPDUs are used to inform the upstream router and root bridge of topology changes.
Configuration BPDUs with the Topology Change Acknowledgement (TCA) bit being set to 1 are
used by the upstream router to inform the downstream router that the topology changes are known
and instruct the downstream router to stop sending TCN BPDUs.
Configuration BPDUs with the Topology Change (TC) bit being set to 1 are used by the upstream
router to inform the downstream router of topology changes and instruct the downstream router to
delete MAC address entries. In this manner, fast network convergence is achieved.
Figure 1-415 is used as an example to show how the network topology converges when the
root bridge or designated port of the root bridge becomes faulty.
The root bridge becomes faulty.
Figure 1-417 Diagram of topology changes in the case of a faulty root bridge
As shown in Figure 1-417, if the root bridge becomes faulty, Device B and Device C reselect the root bridge by exchanging configuration BPDUs.
The designated port of the root bridge becomes faulty.
Figure 1-418 Diagram of topology changes in the case of a faulty designated port on the root
bridge
As shown in Figure 1-418, the designated port of the root bridge, port 1, becomes faulty. Port 6 is then selected as the root port through the exchange of configuration BPDUs between Device B and Device C.
In addition, port 6 sends TCN BPDUs after entering the Forwarding state. Once the root bridge receives the TCN BPDUs, it sends BPDUs with the TC bit set to instruct the downstream routers to delete MAC address entries.
Disadvantages of STP
STP ensures a loop-free network but has a slow network topology convergence speed, leading
to service deterioration. If the network topology changes frequently, the connections on the
STP-enabled network are frequently torn down, causing frequent service interruption. Users
can hardly tolerate such a situation.
Disadvantages of STP are as follows:
Port states and port roles are not finely distinguished, which makes STP difficult for beginners to learn and deploy.
A network protocol that precisely defines and distinguishes different situations is likely to outperform the others.
− Ports in the Listening, Learning, and Blocking states do not forward user traffic and appear identical to users.
− From the perspective of use and configuration, the essential differences between ports lie in their roles, not their states.
The root port and designated port can both be in the Listening state or in the Forwarding state.
The STP algorithm determines topology changes after the time set by the timer expires,
which slows down network convergence.
The STP algorithm requires a stable network topology. After the root bridge sends
configuration Bridge Protocol Data Units (BPDUs), other routers forward them until all
bridges on the network receive the configuration BPDUs.
This also slows down topology convergence.
As shown in Figure 1-419, RSTP defines four port roles: root port, designated port,
alternate port, and backup port.
The functions of the root port and designated port are the same as those defined in STP.
The alternate port and backup port are described as follows:
− From the perspective of configuration BPDU transmission:
An alternate port is blocked after learning the configuration BPDUs sent by
other bridges.
A backup port is blocked after learning the configuration BPDUs sent by itself.
− From the perspective of user traffic
An alternate port backs up the root port and provides an alternate path from the
designated bridge to the root bridge.
A backup port backs up the designated port and provides an alternate path
from the root bridge to the related network segment.
Port states and port roles are not necessarily related. Table 1-114 lists states of ports with different roles.
Table 1-114 Comparison between states of STP ports and RSTP ports with different roles
Configuration BPDUs in RSTP are defined differently from those in STP. Port roles are described by using the Flags field.
Compared with STP, RSTP slightly redefines the format of configuration BPDUs.
− The value of the Type field is set to 2 rather than 0. Therefore, an RSTP-enabled router always discards the configuration BPDUs sent by an STP-enabled router.
− The six bits in the middle of the original Flags field, which are reserved in STP, are now used. Such a configuration BPDU is called an RST BPDU, as shown in Figure 1-420.
If the root port fails, the most superior alternate port on the network becomes the
root port and enters the Forwarding state. This is because there must be a path from
the root bridge to a designated port on the network segment connecting to the
alternate port.
When the port role changes, the network topology will change accordingly. For
details, see 1.7.8.2.6 RSTP Implementation.
− Edge ports
In RSTP, a designated port on the network edge is called an edge port. An edge port
directly connects to a terminal and does not connect to any other routers.
An edge port does not receive configuration BPDUs, and therefore does not
participate in the RSTP calculation. It can directly change from the Disabled state to
the Forwarding state without any delay, just like an STP-incapable port. If an edge
port receives bogus BPDUs from attackers, it is deprived of the edge port attributes
and becomes a common STP port. The STP calculation is implemented again,
causing network flapping.
Protection functions
Table 1-115 shows protection functions provided by RSTP.
Root protection: Due to incorrect configurations or malicious attacks on the network, the root bridge may change. If a designated port is enabled with the root protection function, its port role cannot be changed.
P/A Mechanism
To allow a Huawei device to communicate with a non-Huawei device, a proper rapid
transition mechanism needs to be configured on the Huawei device based on the
Proposal/Agreement (P/A) mechanism on the non-Huawei device.
The P/A mechanism helps a designated port to enter the Forwarding state as soon as possible.
As shown in Figure 1-421, the P/A negotiation is performed based on the following port
variables:
1. proposing: When a port is in the Discarding or Learning state, this variable is set to 1.
Additionally, a Rapid Spanning Tree (RST) BPDU with the Proposal field being 1 is sent
to the downstream router.
2. proposed: After a port receives an RST BPDU with the Proposal field being 1 from the
designated port on the peer router, this variable is set to 1, urging the designated port on
this network segment to enter the Forwarding state.
3. sync: After the proposed variable is set to 1, the root port receiving the proposal sets the
sync variable to 1 for the other ports on the same router; a non-edge port receiving the
proposal enters the Discarding state.
4. synced: After a port enters the Discarding state, it sets its synced variable to 1 in the
following manner: If this port is the alternate, backup, or edge port, it will immediately
set its synced variable to 1. If this port is the root port, it will monitor the synced
variables of the other ports. After the synced variables of all the other ports are set to 1,
the root port sets its synced variable to 1, and sends an RST BPDU with the Agreement
field being 1.
5. agreed: After the designated port receives an RST BPDU with the Agreement field being
1 and the port role field indicating the root port, this variable is set to 1. Once the agreed
variable is set to 1, this designated port immediately enters the Forwarding state.
As shown in Figure 1-422, a new link is established between the root bridges Device A and
Device B. On Device B, p2 is an alternate port; p3 is a designated port in the Forwarding state;
p4 is an edge port. The P/A mechanism works in the following process:
1. p0 and p1 become designated ports and send RST BPDUs.
2. After receiving an RST BPDU with a higher priority, p1 realizes that it will become a
root port but not a designated port, and therefore it stops sending RST BPDUs.
3. p0 enters the Discarding state, and sends RST BPDUs with the Proposal field being 1.
4. After receiving an RST BPDU with the Proposal field being 1, Device B sets the sync
variable to 1 for all its ports.
5. As p2 is already blocked, its status remains unchanged; p4 is an edge port and does not participate in the calculation. Therefore, only the non-edge designated port p3 needs to be blocked.
6. After p2, p3, and p4 enter the Discarding state, their synced variables are set to 1. The
synced variable of the root port p1 is then set to 1, and p1 sends an RST BPDU with the
Agreement field being 1 to Device A. Except for the Agreement field, which is set to 1, and the Proposal field, which is set to 0, the RST BPDU is the same as the one received.
7. After receiving this RST BPDU, Device A identifies it as a reply to the proposal that it
just sent, and therefore p0 immediately enters the Forwarding state.
This P/A negotiation process finishes, and Device B continues to perform the P/A negotiation
with its downstream router.
Theoretically, STP can quickly select a designated port. To prevent loops, STP has to wait for
a period of time long enough to determine the status of all ports on the network. All ports can
enter the Forwarding state at least one forward delay later. RSTP is developed to eliminate
this bottleneck by blocking non-root ports to prevent loops. By using the P/A mechanism, the
upstream port can rapidly enter the Forwarding state.
To use the P/A mechanism, ensure that the link between the two routers is a point-to-point (P2P) link in full-duplex mode. If the P/A negotiation fails, a designated port can forward traffic only after twice the forward delay elapses. This delay is the same as that in STP.
On the network shown in Figure 1-423, STP or RSTP is enabled. The broken line shows the
spanning tree. Device F is the root router. The links between Device A and Device D and
between Device B and Device E are blocked. VLAN packets are transmitted by using the
corresponding links marked with "VLAN2" or "VLAN3."
Host A and Host B belong to VLAN 2 but they cannot communicate with each other because
the link between Device B and Device E is blocked and the link between Device C and
Device F denies packets from VLAN 2.
MSTP divides a switching network into multiple regions, each of which has multiple spanning
trees that are independent of each other. Each spanning tree is called a Multiple Spanning Tree
Instance (MSTI) and each region is called a Multiple Spanning Tree (MST) region.
As shown in Figure 1-424, MSTP maps VLANs to MSTIs in the VLAN mapping table. Each
VLAN can be mapped to only one MSTI. This means that traffic of a VLAN can be
transmitted in only one MSTI. An MSTI, however, can correspond to multiple VLANs.
Two spanning trees are calculated:
MSTI 1 uses Device D as the root router to forward packets of VLAN 2.
MSTI 2 uses Device F as the root router to forward packets of VLAN 3.
In this manner, routers within the same VLAN can communicate with each other; packets of
different VLANs are load balanced along different paths.
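The VLAN mapping table described above enforces a one-to-one direction and a one-to-many direction: each VLAN maps to exactly one MSTI, while one MSTI may carry several VLANs. A minimal sketch using the VLANs from the example; a plain dict enforces the one-MSTI-per-VLAN rule by construction:

```python
# Hedged sketch of the MSTP VLAN mapping table: VLAN -> MSTI is a function
# (each VLAN maps to exactly one MSTI), but MSTI -> VLANs is one-to-many.

vlan_to_msti = {2: 1, 3: 2, 4: 1}   # VLANs 2 and 4 share MSTI 1

def vlans_of(msti: int) -> list[int]:
    """Return all VLANs carried by the given MSTI."""
    return sorted(v for v, m in vlan_to_msti.items() if m == msti)

assert vlans_of(1) == [2, 4]   # one MSTI, multiple VLANs
assert vlan_to_msti[3] == 2    # one VLAN, exactly one MSTI
```

With this mapping, VLAN 2 traffic follows MSTI 1 (rooted at Device D) and VLAN 3 traffic follows MSTI 2 (rooted at Device F), which is how the load balancing in the example is achieved.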
MST Region
An MST region contains multiple routers and network segments between them. The routers of
one MST region have the following characteristics:
MSTP-enabled
Same region name
Same VLAN-MSTI mappings
Same MSTP revision level
A LAN can comprise several MST regions that are directly or indirectly connected. Multiple
routers can be grouped into an MST region by using MSTP configuration commands.
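The membership criteria above can be checked mechanically: two MSTP-enabled bridges belong to the same MST region only when their region name, revision level, and VLAN-to-MSTI mappings all match. A hedged sketch (real bridges compare a digest of the mapping table rather than the table itself; a direct comparison is equivalent for illustration):

```python
# Hedged sketch of MST region membership: same region name, same revision
# level, and same VLAN-to-MSTI mapping are all required.

def same_region(a: dict, b: dict) -> bool:
    """Return True if two bridge configurations place them in one MST region."""
    keys = ("region_name", "revision", "vlan_to_msti")
    return all(a[k] == b[k] for k in keys)

bridge1 = {"region_name": "D0", "revision": 1, "vlan_to_msti": {2: 1, 3: 2}}
bridge2 = {"region_name": "D0", "revision": 1, "vlan_to_msti": {2: 1, 3: 2}}
bridge3 = {"region_name": "D0", "revision": 2, "vlan_to_msti": {2: 1, 3: 2}}
assert same_region(bridge1, bridge2)
assert not same_region(bridge1, bridge3)  # revision mismatch splits regions
```

A mismatch in any one of the three values silently splits a planned region in two, which is a common cause of unexpected CST boundaries.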
As shown in Figure 1-426, the MST region D0 contains Device A, Device B, Device C, and
Device D, and has three MSTIs.
Regional Root
Regional roots are classified as Internal Spanning Tree (IST) and MSTI regional roots.
In the region B0, C0, and D0 on the network shown in Figure 1-428, the routers closest to the
Common and Internal Spanning Tree (CIST) root are IST regional roots.
An MST region can contain multiple spanning trees, each called an MSTI. An MSTI regional
root is the root of the MSTI. On the network shown in Figure 1-427, each MSTI has its own
regional root.
MSTIs are independent of each other. An MSTI can correspond to one or more VLANs, but a
VLAN can be mapped to only one MSTI.
Master Bridge
The master bridge is the IST master, which is the router closest to the CIST root in a region,
for example, Device A shown in Figure 1-426.
If the CIST root is in an MST region, the CIST root is the master bridge of the region.
CIST Root
On the network shown in Figure 1-428, the CIST root is the root bridge of the CIST. The
CIST root is a router in A0.
CST
A Common Spanning Tree (CST) connects all the MST regions on a switching network.
If each MST region is considered a node, the CST is calculated by using STP or RSTP based
on all the nodes.
As shown in Figure 1-428, the MST regions are connected to form a CST.
IST
An IST resides within an MST region.
An IST is a special MSTI with the MSTI ID being 0, called MSTI 0.
An IST is a segment of the CIST in an MST region.
As shown in Figure 1-428, the routers in an MST region are connected to form an IST.
CIST
A CIST, calculated by using STP or RSTP, connects all the routers on a switching network.
As shown in Figure 1-428, the ISTs and the CST form a complete spanning tree, the CIST.
SST
A Single Spanning Tree (SST) is formed in either of the following situations:
A router running STP or RSTP belongs to only one spanning tree.
An MST region has only one router.
As shown in Figure 1-428, the router in B0 forms an SST.
Port Role
Based on RSTP, MSTP adds two port roles: the master port and the regional edge port. MSTP ports can therefore be root ports, designated ports, alternate ports, backup ports, edge ports, master ports, and regional edge ports.
The functions of root ports, designated ports, alternate ports, and backup ports have been
defined in RSTP. Table 1-116 lists all port roles in MSTP.
Port Role Description
Root port: A root port is the non-root bridge port closest to the root bridge. Root bridges do not have root ports. Root ports are responsible for sending data to root bridges. As shown in Figure 1-429, Device A is the root; CP1 is the root port on Device C; BP1 is the root port on Device B.
Designated port: The designated port on a router forwards BPDUs to the downstream router. As shown in Figure 1-429, AP2 and AP3 are designated ports on Device A; CP2 is a designated port on Device C.
Alternate port: From the perspective of sending BPDUs, an alternate port is blocked after a BPDU sent by another bridge is received. From the perspective of user traffic, an alternate port provides an alternate path to the root bridge. This path is different from the path through the root port.
Backup port: From the perspective of user traffic, a backup port provides a backup/redundant path to a segment to which a designated port already connects.
Figure 1-429 Root port, designated port, alternate port, and backup port
A port's status is not necessarily tied to its role. Table 1-118 lists the relationships
between port roles and port statuses.
The first 36 bytes of an intra-region or inter-region MST BPDU are the same as those of an
RST BPDU.
Fields from the 37th byte of an MST BPDU are MSTP-specific. The field MSTI
Configuration Messages consists of configuration messages of multiple MSTIs.
Table 1-120 lists the major information carried in an MST BPDU.
Figure 1-431 shows the sub-fields in the MST Configuration Identifier field.
Table 1-121 describes the sub-fields in the MST Configuration Identifier field.
Figure 1-432 shows the sub-fields in the MST Configuration Messages field.
Table 1-122 describes the sub-fields in the MSTI Configuration Messages field.
MSTP Principle
In Multiple Spanning Tree Protocol (MSTP), the entire Layer 2 network is divided into
multiple MST regions, which are interconnected by a single Common Spanning Tree (CST).
In a Multiple Spanning Tree (MST) region, multiple spanning trees are calculated, each of
which is called a Multiple Spanning Tree Instance (MSTI). Among these MSTIs, MSTI 0 is
also known as the internal spanning tree (IST). Like STP, MSTP uses configuration messages
to calculate spanning trees, but the configuration messages are MSTP-specific.
Vectors
Both MSTIs and the CIST are calculated based on vectors, which are carried in Multiple
Spanning Tree Bridge Protocol Data Units (MST BPDUs). Therefore, switching devices
exchange MST BPDUs to calculate MSTIs and the Common and Internal Spanning Tree
(CIST).
Vectors are described as follows:
− The following vectors participate in the CIST calculation:
{root ID, external root path cost, region root ID, internal root path cost, designated
switching device ID, designated port ID, receiving port ID}
− The following vectors participate in the MSTI calculation:
{regional root ID, internal root path cost, designated switching device ID,
designated port ID, receiving port ID}
The priorities of vectors in braces are in descending order from left to right.
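The left-to-right, descending-priority comparison described above is exactly a lexicographic comparison, which the following sketch illustrates. The IDs and costs are invented for the example; in spanning tree calculation the numerically lower vector is superior.

```python
# Illustrative sketch: CIST priority vectors are compared field by field
# from left to right, and the numerically lower vector is superior.

def better_vector(a, b):
    """Python tuples compare lexicographically, matching the vector order."""
    return a if a < b else b

# (root_id, ext_root_path_cost, regional_root_id,
#  int_root_path_cost, designated_bridge_id, designated_port_id, recv_port_id)
v1 = (0x1000, 200, 0x2000, 10, 0x3000, 1, 2)
v2 = (0x1000, 100, 0x9000, 99, 0x9000, 9, 9)

# v2 wins on the second field (lower external root path cost), even though
# every later field is larger: earlier fields have higher priority.
print(better_vector(v1, v2) == v2)  # prints True
```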
Table 1-123 describes the vectors.
Root ID: Identifies the root switching device for the CIST. The root identifier consists
of the priority value (16 bits) and MAC address (48 bits).
External root path cost (ERPC): Indicates the path cost from a CIST regional root to the
root. ERPCs saved on all switching devices in an MST region are the same. If the
CIST Calculation
After completing the configuration message comparison, the switching device with the
highest priority on the entire network is selected as the CIST root. MSTP calculates an IST for
each MST region, and computes a CST to interconnect MST regions. On the CST, each MST
region is considered a switching device. The CST and ISTs constitute a CIST for the entire
network.
MSTI Calculation
In an MST region, MSTP calculates an MSTI for each VLAN based on mappings between
VLANs and MSTIs. Each MSTI is calculated independently. The calculation process is
similar to the process for STP to calculate a spanning tree. For details, see 1.7.8.2.4 STP
Topology Calculation.
MSTIs have the following characteristics:
The spanning tree is calculated independently for each MSTI, and spanning trees of
MSTIs are independent of each other.
MSTP calculates the spanning tree for an MSTI in the manner similar to STP.
Spanning trees of MSTIs can have different roots and topologies.
Each MSTI sends BPDUs in its spanning tree.
The topology of each MSTI is configured by using commands.
A port can be configured with different parameters for different MSTIs.
A port can play different roles or have different status in different MSTIs.
On an MSTP-aware network, a VLAN packet is forwarded along the following paths:
MSTI in an MST region
CST among MST regions
Background
On the network shown in Figure 1-434:
UPEs are deployed at the aggregation layer, running MSTP.
UPE1 and UPE2 are connected by a Layer 2 link.
Multiple rings are connected to UPE1 and UPE2 through different ports.
The routers on the rings reside at the access layer, running STP or RSTP. In addition,
UPE1 and UPE2 work for different carriers, and therefore they need to reside on
different spanning trees whose topology changes do not affect each other.
On the network shown in Figure 1-434, routers and UPEs construct multiple Layer 2 rings.
STP must be enabled on these rings to prevent loops. UPE1 and UPE2 are connected to
multiple access rings that are independent of each other. The spanning tree protocol cannot
calculate a single spanning tree for all routers. Instead, the spanning tree protocol must be
enabled on each ring to calculate a separate spanning tree.
MSTP supports MSTIs, but these MSTIs must belong to one MST region and routers in the
region must have the same configurations. If the routers belong to different regions, MSTP
calculates the spanning tree based on only one instance. Assume that routers on the network
belong to different regions, and only one spanning tree is calculated in one instance. In this
case, the status change of any router on the network affects the stability of the entire network.
On the network shown in Figure 1-434, the routers connected to UPEs support only STP or
RSTP but not MSTP. When MSTP-enabled UPEs receive RST BPDUs from the routers, the
UPEs consider that they and routers belong to different regions. As a result, only one spanning
tree is calculated for the rings composed of UPEs and routers, and the rings affect each other.
To prevent this problem, MSTP multi-process is introduced. MSTP multi-process is an
enhancement to MSTP. The MSTP multi-process mechanism allows ports on routers to be
bound to different processes. MSTP calculation is performed based on processes. In this
manner, only ports that are bound to a process participate in the MSTP calculation for this
process. With the MSTP multi-process mechanism, spanning trees of different processes are
calculated independently and do not affect each other. The network shown in Figure 1-434
can be divided into multiple MSTP processes by using MSTP multi-process. Each process
takes charge of a ring composed of routers. The MSTP processes have the same functions and
support MSTIs. The MSTP calculation for one process does not affect the MSTP calculation
for another process.
Purpose
On the network shown in Figure 1-434, MSTP multi-process is configured to implement the
following:
Greatly improves applicability of STP to different networking conditions.
To help a network running different spanning tree protocols run properly, you can bind
the routers running different spanning tree protocols to different processes. In this
manner, every process calculates a separate spanning tree.
Improves the networking reliability. For a network composed of many Layer 2 access
devices, using MSTP multi-process reduces the adverse effect of a single node failure on
the entire network.
The topology is calculated for each process. If a device fails, only the topology
corresponding to the process to which the device belongs changes.
Reduces the network administrator workload during network expansion, facilitating
operation and maintenance.
To expand a network, you only need to configure new processes, connect the processes
to the existing network, and keep the existing MSTP processes unchanged. If device
expansion is performed in a process, only this process needs to be modified.
Implements separate Layer 2 port management.
An MSTP process manages some of the ports on a router. Layer 2 ports on a router
can be separately managed by multiple MSTP processes.
Principles
Public link status
As shown in Figure 1-434, the public link between UPE1 and UPE2 is a Layer 2 link
running MSTP. The public link between UPE1 and UPE2 is different from the links
connecting routers to UPEs. The ports on the public link need to participate in the
calculation for multiple access rings and MSTP processes. Therefore, the UPEs must
identify the process from which MST BPDUs are sent.
In addition, a port on the public link participates in the calculation for multiple MSTP
processes, and obtains different status. As a result, the port cannot determine its status.
To prevent this situation, it is defined that a port on a public link always adopts its status
in MSTP process 0 when participating in the calculation for multiple MSTP processes.
After a router starts normally, MSTP process 0 exists by default, and MSTP configurations in the
system view and interface view belong to this process.
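The rule that a public-link port bound to several processes always adopts its status in MSTP process 0 can be sketched as follows. This is a hypothetical model, not vendor code; the class and status names are invented for illustration.

```python
# Hypothetical model of MSTP multi-process port binding: a public-link
# port participating in several processes adopts the status calculated
# by MSTP process 0.

class MultiProcessPort:
    def __init__(self, name):
        self.name = name
        self.status_by_process = {}  # process ID -> "forwarding"/"discarding"

    def set_status(self, process_id, status):
        self.status_by_process[process_id] = status

    def effective_status(self):
        # Public-link rule: the status from process 0 takes precedence.
        if 0 in self.status_by_process:
            return self.status_by_process[0]
        # An access-ring port bound to a single process uses that result.
        return next(iter(self.status_by_process.values()), "discarding")


port = MultiProcessPort("UPE1-to-UPE2")
port.set_status(1, "discarding")  # result from access ring 1's process
port.set_status(2, "forwarding")  # result from access ring 2's process
port.set_status(0, "forwarding")  # result from process 0
print(port.effective_status())    # prints forwarding
```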
Reliability
On the network shown in Figure 1-435, after the topology of a ring changes, the MSTP
multi-process mechanism helps UPEs flood a TC packet to all routers on the ring and
prevent the TC packet from being flooded to routers on the other ring. UPE1 and UPE2
update MAC and ARP entries on the ports corresponding to the changed spanning tree.
On the network shown in Figure 1-436, if the public link between UPE1 and UPE2 fails,
multiple routers that are connected to the UPEs will unblock their blocked ports.
Assume that UPE1 is configured with the highest priority, UPE2 with the second highest
priority, and routers with default or lower priorities. After the link between UPE1 and
UPE2 fails, the blocked ports (replacing the root ports) on routers no longer receive
packets with higher priorities and re-perform state machine calculation. If the
calculation changes the blocked ports to designated ports, a permanent loop occurs, as
shown in Figure 1-437.
Solutions
To prevent a loop between access rings, use either of the following solutions:
− Configure root protection between UPE1 and UPE2.
If all physical links between UPE1 and UPE2 fail, configuring an inter-board
Eth-Trunk link cannot prevent the loop. Root protection can be configured to
prevent the loop shown in Figure 1-437.
Use the blue ring shown in Figure 1-438 as an example. UPE1 is configured with
the highest priority, UPE2 with the second highest priority, and routers on the blue
ring with default or lower priorities. In addition, root protection is enabled on
UPE2.
Assume that a port on S1 is blocked. When the public link between UPE1 and
UPE2 fails, the blocked port on S1 begins to calculate the state machine because it
no longer receives BPDUs of higher priorities. After the calculation, the blocked
port becomes the designated port and performs P/A negotiation with the
downstream router.
After S1, which is directly connected to UPE2, sends BPDUs of higher priorities to
the UPE2 port enabled with root protection, the port is blocked. From then on, the
port remains blocked because it continues receiving BPDUs of higher priorities. In
this manner, no loop will occur.
Unless otherwise specified, STP in this document includes STP defined in IEEE 802.1D, RSTP defined
in IEEE 802.1W, and MSTP defined in IEEE 802.1S.
On the network shown in Figure 1-439, users access the VPLS network through a ring
network that is comprised of CE1, CE2, PE1, and PE2. The PEs are fully connected on
the VPLS network. The packet forwarding process is as follows (using the forwarding of
broadcast or unknown unicast packets from CE1 as an example):
a. After CE1 receives a broadcast or unknown unicast packet, it forwards the packet to
both PE1 and CE2.
b. After PE1 (CE2) receives the packet, it cannot find the outbound interface based on
the destination MAC address of the packet, and therefore broadcasts the packet.
c. After PE2 receives the packet, it also broadcasts the packet. Because PEs do not
forward data received from a PW back to the PW, PE2 (PE1) sends the packet to a
CE and the remote PE.
As a result, a loop occurs on the path CE1 -> CE2 -> PE2 -> PE1 -> CE1 or the path
CE1 -> PE1 -> PE2 -> CE2 -> CE1. The CEs and PEs all receive duplicate traffic.
Solution
To address this problem, enable STP on CE1, CE2, PE1, and PE2; deploy an mPW
between PE1 and PE2, deploy a service PW between PE1 and the PE and between PE2
and the PE, and associate service PWs with the mPW; enable MSTP for the mPW and
AC interfaces so that the mPW can participate in STP calculation and block a CE
interface to prevent duplicate traffic. In addition, configure PE1 and PE2 as the root
bridge and secondary root bridge so that the blocked port resides on the link between the
CEs.
As shown in Figure 1-440, STP is enabled globally on PE1, PE2, CE1, and CE2; an
mPW is deployed between PE1 and PE2; STP is enabled on GE 1/0/1 on PE1 and PE2
and on GE 1/0/1 and GE 1/0/2 on CE1 and CE2. PE2 is configured as the primary root
bridge and PE1 is configured as the secondary root bridge (determined by the bridge
priority) to block the port connecting CE2 to CE1. After STP calculation and association
between the mPW and service PWs are implemented, remote devices no longer receive
duplicate traffic.
Reliability
On the network shown in Figure 1-441, the mPW does not detect a fault on the link
between the PE and PE2, because the PE can still reach PE1 and a new service PW can
be created. In addition, the STP topology remains unchanged; therefore, the blocked
port is unchanged and STP recalculation is not required.
If the STP topology changes, each node sends a TCN BPDU to trigger the updating of
local MAC address entries. In addition, the TCN BPDU triggers the PW to send MAC
Withdraw packets to instruct the remote device to update the learned MAC address
entries locally. In this manner, traffic is switched to an available link.
As shown in Figure 1-442, if the mPW between PE1 and PE2 fails, the ring network
topology is recalculated, and the blocked port on CE2 is unblocked and enters the
Forwarding state. In this situation, the remote PE receives permanent duplicate packets.
To resolve this problem, configure root protection on the secondary root bridge PE1's GE
1/0/1 connecting to CE1. As shown in Figure 1-443, if the mPW between PE1 and PE2
fails, PE1's GE 1/0/1 is blocked because it receives BPDUs with higher priorities. As the
link along the path PE1 -> CE1 -> CE2 -> PE2 is working properly, PE1's blocked port
keeps receiving BPDUs with higher priorities, and therefore this port remains in the
blocked state. This prevents the remote PE from receiving duplicate traffic.
Load balancing
As shown in Figure 1-444, MSTP is enabled for ports connecting PEs and CEs, for the
mPW between PE1 and PE2, and for ports connecting CE1 and CE2. MSTP is globally
enabled on PE1, PE2, CE1, and CE2. After PE1 is configured as the primary root bridge
and PE2 is configured as the backup root bridge (determined by bridge priority), MSTP
calculation is performed to block the port connecting CE1 and CE2. A mapping is
configured between VLANs and MSTIs to implement load balancing.
Option A problem
In inter-AS VPLS Option A mode, redundant connections are established between ASs,
and broadcast and unknown unicast packets may be forwarded in a loop. As shown in
Figure 1-445, VPLS#AS1 and VPLS#AS2 are connected by two links to improve
reliability. After Option A is adopted, fully connected PWs between PEs and ASBRs in
an AS are configured with split horizon to prevent loops, but broadcast and unknown
unicast packets are looped between ASBRs. PEs receive duplicate packets even if
ASBRs in a VPLS AS are not connected.
Dual protection of Option A
To resolve inter-AS loops, configure STP on ASBRs between ASs to break off the loops,
as shown in Figure 1-446. STP is running on Layer 2 ports, so Layer 2 links are required.
If Layer 2 links do not exist between ASBRs, PWs or Layer 3 ports must be added. STP
blocks a link on the inter-AS ring network to prevent broadcast and unknown unicast
packets from being forwarded in a loop and the remote PE from receiving duplicate
traffic.
− As shown in Figure 1-449, PWs between ASBRs are fully connected. By using the
MSTP multi-process feature, E-STP associates mPWs with MSTP processes.
Processes are independent of each other, and therefore the mPWs are independent
of each other. Multiple service PWs are associated with an mPW. After the mPW is
blocked, the associated service PWs are also blocked. This helps break off the loop
between VPLS ASs and perform load balancing by blocking an interface as
required.
1.7.8.5 Applications
1.7.8.5.1 STP Application
On a complex network, loops are inevitable. To provide network redundancy, network
designers tend to deploy multiple physical links between two devices, one of which is the
master and the others are backups. Loops are likely or even bound to occur in such a
situation. Loops can cause flapping of MAC address tables and therefore damage MAC
address entries.
On the network shown in Figure 1-450, after CE and PE running STP discover loops on the
network by exchanging information with each other, they trim the ring topology into a
loop-free tree topology by blocking a certain port. In this manner, replication and circular
propagation of packets are prevented on the network and the switching devices are released
from processing duplicated packets, thereby improving their processing performance.
Terms
Term Definition
STP Spanning Tree Protocol. A protocol used in the local area network (LAN)
to eliminate loops. Devices running STP discover loops in the network by
exchanging information with each other, and block certain interfaces to
eliminate loops.
RSTP Rapid Spanning Tree Protocol. A protocol described in detail in IEEE
802.1w. RSTP modifies and supplements STP and therefore implements
faster convergence than STP.
MSTP Multiple Spanning Tree Protocol. A spanning tree protocol defined in
IEEE 802.1s that introduces the concepts of region and instance. To meet
different requirements, MSTP divides a large network into regions in which
multiple spanning tree instances (MSTIs) are created. These MSTIs are
mapped to virtual LANs (VLANs), and bridge protocol data units (BPDUs)
carrying information about regions and instances are transmitted between
network bridges. A network bridge can therefore determine, from the
BPDU information, the region to which it belongs.
Multi-instance RSTP runs within regions, whereas RSTP-compatible
protocols run between regions.
VLAN Virtual Local Area Network. A switched network and an end-to-end
logical network that is constructed by using the network management
software across different network segments and networks. A VLAN forms
a logical subnet, that is, a logical broadcast domain. One VLAN can
include multiple network devices.
Definition
Ethernet Ring Protection Switching (ERPS) is a protocol defined by the International
Telecommunication Union - Telecommunication Standardization Sector (ITU-T) to prevent
loops at Layer 2. As the standard number is ITU-T G.8032/Y.1344, ERPS is also called G.8032.
ERPS defines Ring Auto Protection Switching (RAPS) Protocol Data Units (PDUs) and
protection switching mechanisms. It can be used for communication between Huawei and
non-Huawei devices on a ring network.
ERPSv1 and ERPSv2 are currently available. ERPSv1 was released by the ITU-T in June
2008, and ERPSv2 was released by the ITU-T in August 2010. ERPSv2, fully compatible
with ERPSv1, extends ERPSv1 functions. Table 1-124 compares ERPSv1 and ERPSv2.
Ring type: ERPSv1 supports single rings only. ERPSv2 supports single rings and
multi-rings; a multi-ring topology comprises major rings and sub-rings.
Port role configuration: ERPSv1 supports the RPL owner port and ordinary ports.
ERPSv2 supports the RPL owner port, RPL neighbor port, and ordinary ports.
Topology change notification: Not supported in ERPSv1. Supported in ERPSv2.
R-APS PDU transmission modes on sub-rings: Not supported in ERPSv1. Supported in
ERPSv2.
Revertive and non-revertive switching: ERPSv1 supports revertive switching by default
and does not support non-revertive switching or switching mode configuration. ERPSv2
supports both.
Manual port blocking: Not supported in ERPSv1. ERPSv2 supports forced switch (FS)
and manual switch (MS).
As ERPSv2 is fully compatible with ERPSv1, configuring ERPSv2 is recommended if all devices on an
ERPS ring support both ERPSv1 and ERPSv2.
Purpose
Generally, redundant links are used on an Ethernet switching network to provide link backup
and enhance network reliability. The use of redundant links, however, may produce loops,
causing broadcast storms and rendering the MAC address table unstable. As a result, the
communication quality deteriorates, and communication services may even be interrupted. To
resolve these problems, ERPS can be used for loop avoidance purposes.
ERPS blocks the ring protection link (RPL) owner port to remove loops and unblocks it if a
link fault occurs to promptly restore communication.
Table 1-125 compares various ring network protocols.
Benefits
This feature offers the following benefits:
Protects services and prevents broadcast storms on ring networks.
Meets carrier-class reliability requirements for network convergence.
Allows communication between Huawei and non-Huawei devices on ring networks.
1.7.9.2 Principles
1.7.9.2.1 Basic Concepts
Introduction
Ethernet Ring Protection Switching (ERPS) is a protocol used to block specified ports to
prevent loops at the link layer of an Ethernet network.
On the network shown in Figure 1-454, Device A through Device D constitute a ring and are
dual-homed to an upstream IP/MPLS network. This access mode will cause a loop on the
entire network. To eliminate redundant links and ensure link connectivity, ERPS is used to
prevent loops.
Figure 1-454 shows a typical ERPS single-ring network. The following describes ERPS based
on this networking:
ERPS Ring
An ERPS ring consists of interconnected switches that have the same control VLAN. A ring is
a basic ERPS unit.
ERPS rings are classified as major rings (closed) or sub-rings (open). On the network shown
in Figure 1-455, Device A through Device D constitute a major ring, and Device C through
Device F constitute a sub-ring.
Only ERPSv2 supports sub-rings.
Node
A node refers to a switch added to an ERPS ring. A node can have a maximum of two ports
added to the same ERPS ring. Device A through Device D in Figure 1-454 are nodes on an
ERPS major ring.
Port Role
ERPS defines three port roles: ring protection link (RPL) owner port, RPL neighbor port (only
in ERPSv2), and ordinary port.
RPL owner port
An RPL owner port is a ring port responsible for blocking traffic over the RPL to prevent
loops. An ERPS ring has only one RPL owner port.
When the node on which the RPL owner port resides receives an R-APS PDU indicating
that a link or node on the ring fails, it unblocks the RPL owner port to allow the port to
send and receive traffic. This process ensures that traffic is not interrupted.
RPL neighbor port
An RPL neighbor port is a ring port directly connected to an RPL owner port and is used
to reduce the number of times that filtering database (FDB) entries are refreshed.
RPL owner and neighbor ports are both blocked under normal conditions to prevent
loops.
If an ERPS ring fails, both RPL owner and neighbor ports are unblocked.
Ordinary port
Ordinary ports are ring ports other than the RPL owner and neighbor ports.
An ordinary port monitors the status of the directly connected ERPS link and sends
R-APS PDUs to inform the other ports if the link status changes.
Port Status
On an ERPS ring, an ERPS-enabled port can be in either of the following states:
Forwarding: The port forwards user traffic and sends and receives R-APS PDUs.
Discarding: The port does not forward user traffic but can receive and send ERPS R-APS
PDUs.
Control VLAN
A control VLAN is configured for an ERPS ring to transmit R-APS PDUs. Each ERPS ring
must be configured with a control VLAN. After a port is added to an ERPS ring that has a
control VLAN configured, the port is added to the control VLAN automatically. Different
ERPS rings cannot be configured with the same control VLAN ID.
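The control VLAN constraints above (one control VLAN per ring, no VLAN ID shared by two rings) can be sketched as a simple validation. This is an illustrative model with hypothetical names, not a device configuration interface.

```python
# Illustrative sketch: every ERPS ring needs its own control VLAN, and
# two rings must not share a control VLAN ID.

class ErpsDomain:
    def __init__(self):
        self.control_vlans = {}  # control VLAN ID -> ring ID

    def create_ring(self, ring_id, control_vlan):
        owner = self.control_vlans.get(control_vlan)
        if owner is not None:
            raise ValueError(
                f"control VLAN {control_vlan} already used by ring {owner}")
        self.control_vlans[control_vlan] = ring_id


domain = ErpsDomain()
domain.create_ring(1, control_vlan=100)
try:
    domain.create_ring(2, control_vlan=100)  # rejected: duplicate control VLAN
except ValueError as err:
    print(err)  # prints: control VLAN 100 already used by ring 1
```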
Unlike control VLANs, data VLANs are used to transmit data packets.
ERP Instance
On a router running ERPS, the VLAN in which R-APS PDUs and data packets are
transmitted must be mapped to an Ethernet Ring Protection (ERP) instance so that ERPS
forwards or blocks the VLAN packets based on blocking rules. Otherwise, VLAN packets
will probably cause broadcast storms on the ring network and render the network unavailable.
Timer
ERPS defines four timers: guard timer, WTR timer, hold-off timer, and WTB timer (only in
ERPSv2).
Guard timer
After a faulty link or node recovers or a clear operation is executed, the nodes on the two
ends of the link (or the recovered node) send R-APS No Request (NR) messages to
inform the other nodes of the link or node recovery and start a guard timer. Before the
timer expires, each involved node does not process any R-APS PDUs to avoid receiving
out-of-date R-APS (SF) messages. After the timer expires, if the involved node still
receives an R-APS (SF) message, the local port enters the Forwarding state.
WTR timer
If the RPL owner port is unblocked due to a link or node failure, the involved port may
not go Up immediately after the link or node recovers. To prevent the RPL owner port
from alternating between Up and Down, the node where the RPL owner port resides
starts a WTR timer after receiving an R-APS (NR) message. If the node receives an
R-APS Signal Fail (SF) message before the timer expires, it terminates the WTR timer
(R-APS SF message: a message sent by a node to other nodes after the node in an ERPS
ring detects that one of its ring ports becomes Down). If the node does not receive any
R-APS (SF) message before the timer expires, it blocks the RPL owner port when the
timer expires and sends an R-APS (NR, RB) message. After receiving this R-APS (NR,
RB) message, the nodes set their recovered ports on the ring to the Forwarding state.
Hold-off timer
Protection switching sequence requirements vary for Layer 2 networks running ERPS.
For example, in a multi-layer service application, a certain period of time is required for
a server to recover should it fail. (During this period, no protection switching is
performed, and the client does not detect the failure.) A hold-off timer can be set to
ensure that the server is given adequate time to recover. If a fault occurs, the fault is not
immediately reported to ERPS. Instead, the hold-off timer starts. If the fault persists after
the timer expires, the fault will be reported to ERPS.
WTB timer
The WTB timer starts after an FS or MS operation is performed. When multiple nodes
on an ERPS ring are in the FS or MS state, the clear operation takes effect only after the
WTB timer expires. This ensures that the RPL owner port will not be blocked
immediately.
The WTB timer value cannot be configured. Its value is the guard timer value plus 5 seconds.
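The fixed relationship stated above can be expressed directly; the sketch below assumes both timers are measured in seconds.

```python
# Sketch of the fixed timer relationship: the WTB value is not
# configurable and is derived from the guard timer value plus 5 seconds.

def wtb_timer_seconds(guard_timer_seconds):
    return guard_timer_seconds + 5

# A 2-second guard timer (example value, not a documented default)
# yields a 7-second WTB timer.
print(wtb_timer_seconds(2))  # prints 7
```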
By default, sub-rings use NVCs to transmit R-APS PDUs, except for the scenario shown in
Figure 1-457.
When sub-ring links are not contiguous, VCs must be used. On the network shown in Figure 1-457,
links b and d belong to major rings 1 and 2, respectively; links a and c belong to the sub-ring. Because
links a and c are not contiguous, they cannot detect the status change between each other. Therefore,
VCs must be used for R-APS PDU transmission.
Table 1-126 lists the advantages and disadvantages of R-APS PDU transmission modes on
sub-rings with VCs or NVCs.
Table 1-126 Comparison between R-APS PDU transmission modes on sub-rings with VCs or
NVCs
MEL (3 bits): Identifies the maintenance entity group (MEG) level of the R-APS PDU.
Version (5 bits): 0x00 is used in ERPSv1; 0x01 is used in ERPSv2.
OpCode (8 bits): Indicates an R-APS PDU. The value of this field is 0x28.
Flags (8 bits): Reserved. The value of this field is fixed at 0x00.
TLV Offset (8 bits): Indicates that the TLV starts after an offset of 32 bytes. The value
of this field is fixed at 0x20.
R-APS Specific Information (32 x 8 bits): Carries R-APS ring information and is the
core of an R-APS PDU. Some of its sub-fields have different meanings in ERPSv1 and
ERPSv2. Figure 1-459 shows the R-APS Specific Information field format in ERPSv1.
Figure 1-460 shows the field format in ERPSv2.
TLV (not limited): Describes information to be loaded. The End TLV value is 0x00.
Node ID (6 x 8 bits): Identifies the MAC address of a node on the ERPS ring. It is
informational and does not affect protection switching on the ERPS ring.
Reserved 2 (24 x 8 bits): Reserved for future extension and should be ignored upon
reception. Currently, this sub-field should be encoded as all 0s in transmission.
A Link Fails
As shown in Figure 1-462, if the link between Device D and Device E fails, the ERPS
protection switching mechanism is triggered. The ports on both ends of the faulty link are
blocked, and the RPL owner port and RPL neighbor port are unblocked to send and receive
traffic. This mechanism ensures that traffic is not interrupted. The process is as follows:
1. After Device D and Device E detect the link fault, they block their ports on the faulty
link and perform a Filtering Database (FDB) flush.
2. Device D and Device E send three consecutive R-APS Signal Fail (SF) messages to the
other LSWs and then send one R-APS (SF) message every 5 seconds thereafter.
3. After receiving an R-APS (SF) message, the other LSWs perform an FDB flush. Device
C on which the RPL owner port resides and Device B on which the RPL neighbor port
resides unblock the respective RPL owner port and RPL neighbor port, and perform an
FDB flush.
Figure 1-462 ERPS single ring networking (unblocking the RPL owner port and RPL neighbor
port if a link fails)
3. After receiving an R-APS (NR, RB) message, Device D and Device E unblock the ports
at the two ends of the link that has recovered, stop sending R-APS (NR) messages, and
perform an FDB flush. The other LSWs also perform an FDB flush after receiving an
R-APS (NR, RB) message.
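The failure-handling steps above can be reduced to a toy simulation: the nodes adjacent to the fault block their ports and flush the FDB, every node flushes its FDB on receiving R-APS (SF), and the RPL owner and neighbor unblock their RPL ports. This is an illustrative sketch of the sequence, not vendor behavior; the node names match the example topology.

```python
# Toy simulation of single-ring ERPS failure handling: on R-APS (SF),
# every node flushes its FDB, and the nodes holding the RPL owner and
# RPL neighbor ports unblock them.

class RingNode:
    def __init__(self, name, rpl_role=None):
        self.name = name
        self.rpl_role = rpl_role              # "owner", "neighbor", or None
        self.rpl_blocked = rpl_role is not None
        self.fdb_flushes = 0

    def receive_sf(self):
        self.fdb_flushes += 1                 # FDB flush on R-APS (SF)
        if self.rpl_role:
            self.rpl_blocked = False          # unblock the RPL port


ring = [RingNode("A"), RingNode("B", "neighbor"), RingNode("C", "owner"),
        RingNode("D"), RingNode("E")]

# The D-E link fails: D and E block their faulty-link ports, then every
# node receives an R-APS (SF) message.
for node in ring:
    node.receive_sf()

print(all(n.fdb_flushes == 1 for n in ring))      # prints True
print(ring[1].rpl_blocked, ring[2].rpl_blocked)   # prints False False
```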
Protection Switching
Forced switch
On the network shown in Figure 1-463, Device A through Device E on the ERPS ring
can communicate with each other. A forced switch (FS) operation is performed on the
Device E's port that connects to Device D, and the Device E's port is blocked. The RPL
owner port and RPL neighbor port are then unblocked to send and receive traffic. This
ensures that traffic is not interrupted. The process is as follows:
a. After the Device E's port that connects to Device D is forcibly blocked, Device E
performs an FDB flush.
b. Device E sends three consecutive R-APS (SF) messages to the other LSWs and then
sends one R-APS (SF) message every 5 seconds thereafter.
c. After receiving an R-APS (SF) message, the other LSWs perform an FDB flush.
Device C on which the RPL owner port resides and Device B on which the RPL
neighbor port resides unblock the respective RPL owner port and RPL neighbor
port, and perform an FDB flush.
Clear
After a clear operation is performed on Device E, the port that is forcibly blocked by FS
sends R-APS (NR) messages to all other ports on the ERPS ring.
− If the ERPS ring uses revertive switching, the RPL owner port starts the WTB timer
after receiving an R-APS (NR) message. After the WTB timer expires, the FS
operation is cleared. The RPL owner port is then blocked, and the blocked port on
Device E is unblocked. If you perform a clear operation on Device C on which the
RPL owner port resides before the WTB timer expires, the RPL owner port is
immediately blocked, and the blocked port on Device E is unblocked.
− If the ERPS ring uses non-revertive switching and you want to block the RPL
owner port, perform a clear operation on Device C on which the RPL owner port
resides.
Manual switch
Compared with an FS operation, a manual switch (MS) operation triggers protection
switching in a similar way, except that an MS operation does not take effect while an
FS, MS, or link failure condition exists.
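The precedence rule above can be sketched as a simple priority comparison. This is an illustrative simplification of the ITU-T G.8032 request priority scheme; the numeric values and the reduced set of request types are assumptions for illustration, not the full protocol state machine.

```python
# Illustrative sketch of ERPS request precedence (simplified from the
# ITU-T G.8032 priority ordering; values are assumptions for illustration).
# A new request takes effect only if its priority exceeds that of the
# currently active request, so an MS request is ignored while an FS,
# SF (link failure), or another MS condition is active.
PRIORITY = {
    "FS": 3,  # forced switch: highest operator request
    "SF": 2,  # signal fail (link failure)
    "MS": 1,  # manual switch
    "NR": 0,  # no request
}

def request_takes_effect(current: str, new: str) -> bool:
    """Return True if the new request preempts the active one."""
    return PRIORITY[new] > PRIORITY[current]

print(request_takes_effect("FS", "MS"))  # False: MS ignored during FS
print(request_takes_effect("NR", "MS"))  # True: MS takes effect on an idle ring
```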
A Link Fails
As shown in Figure 1-465, if the link between Device D and Device G fails, the ERPS
protection switching mechanism is triggered. The ports on both ends of the faulty link are
blocked, and the RPL owner port on sub-ring 2 is unblocked to send and receive traffic. In this
situation, traffic from PC1 still travels along the original path. Device C and Device D inform
the other nodes on the major ring of the topology change so that traffic from PC2 is also not
interrupted. Traffic between PC2 and the upper-layer network travels along the path PC2 <->
Device G <-> Device C <-> Device B <-> Device A <-> Device E <-> PE2. The process is as
follows:
1. After Device D and Device G detect the link fault, they block their ports on the faulty
link and perform a Filtering Database (FDB) flush.
2. Device G sends three consecutive R-APS (SF) messages to the other LSWs and then
sends one R-APS (SF) message at an interval of 5s afterwards.
3. Device G then unblocks the RPL owner port and performs an FDB flush.
4. After the interconnection node Device C receives an R-APS (SF) message, it performs
an FDB flush. Device C and Device D then send R-APS Event messages within the
major ring to notify the topology change in sub-ring 2.
5. After receiving an R-APS Event message, the other LSWs on the major ring perform an
FDB flush.
Then traffic from PC2 is switched to a normal link.
Figure 1-465 ERPS multi-ring networking (unblocking the RPL owner port if a link fails)
If the ERPS ring uses non-revertive switching, the RPL remains unblocked, and the link
that has recovered remains blocked.
The following example uses revertive switching to describe the process after the link
recovers.
1. After the link between Device D and Device G recovers, Device D and Device G start a
guard timer to avoid processing out-of-date R-APS PDUs. The two devices do not process
any R-APS PDUs before the timer expires. After the timer expires, Device D and Device G
send R-APS (NR) messages within sub-ring 2.
2. Device G on which the RPL owner port resides starts the WTR timer. After the WTR
timer expires, Device G blocks the RPL owner port and unblocks its port on the link that
has recovered and then sends R-APS (NR, RB) messages within sub-ring 2.
3. After receiving an R-APS (NR, RB) message from Device G, Device D unblocks its port
on the recovered link, stops sending R-APS (NR) messages, and performs an FDB flush.
Device C also performs an FDB flush.
4. Device C and Device D, the interconnection nodes, then send R-APS Event messages
within the major ring to notify the link recovery of sub-ring 2.
5. After receiving an R-APS Event message, the other LSWs on the major ring perform an
FDB flush.
Then traffic changes to the normal state, as shown in Figure 1-464.
On the network shown in Figure 1-467, Device A, Device B, and Device C form an ERPS
ring. Three relay nodes exist between Device A and Device C. Ethernet CFM is configured on
Device A and Device C. Interface 1 on Device A is associated with Interface 1 on Relay 1, and
Interface 1 on Device C is associated with Interface 1 on Relay 3.
In normal situations, the RPL owner port sends R-APS (NR) messages to all other nodes on
the ring at an interval of 5s, indicating that ERPS links are normal.
Figure 1-467 ERPS ring over transmission links (links are normal)
If Relay 2 fails, Device A and Device C detect the Ethernet CFM failure, block their Interface
1, send R-APS (SF) messages through their respective interfaces connected to Device B, and
then perform a Filtering Database (FDB) flush.
After receiving an R-APS (SF) message, Device B unblocks the RPL owner port and
performs an FDB flush. Figure 1-468 shows the networking after Relay 2 fails.
After Relay 2 recovers, Device B, which uses revertive switching, re-blocks the RPL owner
port and sends R-APS (NR, RB) messages.
After Device A and Device C receive an R-APS (NR, RB) message, they unblock their
blocked Interface 1 and perform an FDB flush so that traffic reverts to the normal state,
as shown in Figure 1-467.
1.7.9.3 Applications
1.7.9.3.1 ERPS Layer 2 Protocol Tunneling Application
Redundant links are used on an Ethernet switching network to provide link backup and
enhance network reliability. The use of redundant links, however, may produce loops, causing
broadcast storms and rendering the MAC address table unstable. As a result, the
communication quality deteriorates, and communication services may be interrupted.
To prevent loops caused by redundant links, enable ERPS on the nodes of the ring network.
ERPS is a Layer 2 loop-breaking protocol defined by the ITU-T. It provides fast
convergence, within 50 ms.
On the network shown in Figure 1-469, Device A through Device E constitute a major ring;
Device A, Device C, and Device F constitute a sub-ring; Device C, Device D, and Device G
constitute another sub-ring. The ERPS ring network resides at the aggregation layer, and
therefore is an aggregation ring. The aggregation ring aggregates Layer 2 services to the
upstream Layer 3 network, providing Layer 2 protection switching. VLANIF interfaces are
configured on Device A and Device E for Layer 3 access. VRRP is configured on the
VLANIF interfaces to implement the virtual gateway function, and peer BFD is enabled for
fast fault detection and accordingly fast VRRP switching.
If ERPS multi-instances are configured, ERPS is implemented in the same manner as that in
Figure 1-469, except that two logical ERPS rings are configured on the physical ring in Figure
1-469, and each logical ERPS ring has its switches, port roles, and control VLANs
independently configured.
Terms
Term Description
FDB Filtering database. A collection of entries for guiding data forwarding. There are
Layer 2 FDBs and Layer 3 FDBs. The Layer 2 FDB refers to the MAC table, which
provides information about MAC addresses and outbound interfaces and guides
Layer 2 forwarding. The Layer 3 FDB refers to the ARP table, which provides
information about IP addresses and outbound interfaces and guides Layer 3
forwarding.
MSTP Multiple Spanning Tree Protocol. A spanning tree protocol defined in IEEE
802.1s. MSTP uses the concepts of region and instance. Based on different
requirements, MSTP divides a large network into regions where instances are
created. These instances are mapped to VLANs. BPDUs carrying region and
instance information are transmitted between bridges, and a bridge determines
which region it belongs to based on the information carried in BPDUs.
RSTP Rapid Spanning Tree Protocol. A protocol defined in IEEE 802.1w, which was
released in 2001. RSTP amends and supplements STP, implementing rapid
convergence.
STP Spanning Tree Protocol. A protocol defined in IEEE 802.1d, which was released
in 1998. This protocol is used to eliminate loops on a LAN. The routers running
STP detect loops on the network by exchanging information with each other and
block specified interfaces to eliminate loops.
Definition
MAC flapping-based loop detection is a method for detecting Ethernet loops based on the
frequency of MAC address entry flapping.
Purpose
Generally, redundant links are used on an Ethernet network to provide link backup and
enhance network reliability. Redundant links, however, may produce loops and cause
broadcast storms and MAC address entry flapping. As a result, the communication quality
deteriorates, and communication services may even be interrupted. To eliminate loops on the
network, the spanning tree protocols and Layer 2 loop detection technology were introduced.
To apply a spanning tree protocol, every user network device must support the protocol and
have it configured. To apply the Layer 2 loop detection technology, user network devices
must allow Layer 2 loop detection packets to pass. Therefore, neither the spanning tree
protocols nor the Layer 2 loop detection technology can eliminate loops on user networks
with unknown connections or on user networks that do not support these technologies.
MAC flapping-based loop detection is introduced to address this problem. It does not require
protocol packet negotiation between devices. A device independently checks whether a loop
occurs on the network based on MAC address entry flapping.
Devices can block redundant links based on the frequency of MAC address entry flapping to
eliminate loops on the network.
Benefits
This feature offers the following benefits to carriers:
Eliminates loops on a network of any topology.
Prevents broadcast storms and provides timely and reliable communication.
1.7.10.2 Principles
1.7.10.2.1 Principles of MAC Flapping-based Loop Detection
MAC flapping-based loop detection is a method for detecting Ethernet loops based on the
frequency of MAC address entry flapping. It eliminates loops on networks by blocking
redundant links. On a virtual private LAN service (VPLS) network, MAC flapping-based loop
detection can be applied to block attachment circuit (AC) interfaces and pseudo wires (PWs).
This section describes AC interface blocking.
On the network shown in Figure 1-470, the consumer edge (CE) is dual-homed to the
provider edges (PEs) of the Ethernet network. To avoid loops and broadcast storms, deploy
MAC flapping-based loop detection on PE1, PE2, and the CE. For example, when receiving
user packets from the CE, PE1 records in its MAC address table the CE MAC address as the
source MAC address and port1 as the outbound interface. When PE1 receives packets
forwarded by PE2 from the CE, the source MAC address of the packets remains unchanged,
but the outbound interface changes. In this case, PE1 updates the CE's MAC address entry in
its MAC address table. Because PE1 repeatedly receives user packets with the same source
MAC address through different interfaces, PE1 constantly updates the MAC address entry. In
this situation, with MAC flapping-based loop detection, PE1 detects the MAC address
flapping and concludes that a loop has occurred. PE1 then blocks its port1 and generates an
alarm, or it just generates an alarm, depending on user configurations.
After MAC flapping-based loop detection is configured on a device and the device receives
packets with fake source MAC addresses from attackers, the device may mistakenly conclude
that a loop has occurred and block an interface based on the configured blocking policy.
Therefore, key user traffic may be blocked. It is recommended that you disable MAC
flapping-based loop detection on properly running devices. If you have to use MAC
flapping-based loop detection to detect whether links operate properly during site deployment,
be sure to disable this function after this stage.
The basic concepts for MAC flapping-based loop detection are as follows:
Detection cycle
If a device detects a specified number of MAC address entry flaps within a detection
cycle, the device concludes that a loop has occurred. The detection cycle is configurable.
Temporary blocking
If a device concludes that a loop has occurred, it blocks an interface or PW for a
specified period of time.
Permanent blocking
After an interface or a PW is blocked and then unblocked, if the total number of times
that loops occur exceeds the configured maximum number, the interface or PW is
permanently blocked.
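The three concepts above (detection cycle, temporary blocking, permanent blocking) can be combined into a minimal detector sketch. All thresholds, timer defaults, and names here are hypothetical illustrations, not the device's actual defaults or implementation.

```python
import time

class MacFlapDetector:
    """Sketch of MAC flapping-based loop detection. Illustrative only:
    thresholds, timer values, and method names are assumptions."""

    def __init__(self, flap_threshold=10, cycle_s=10.0,
                 block_time_s=30.0, max_loop_count=3):
        self.flap_threshold = flap_threshold  # flaps per detection cycle
        self.cycle_s = cycle_s                # detection cycle length
        self.block_time_s = block_time_s      # temporary blocking duration
        self.max_loop_count = max_loop_count  # loops before permanent blocking
        self.mac_table = {}                   # MAC address -> interface
        self.flaps = []                       # timestamps of recent flaps
        self.loop_count = 0
        self.blocked_until = 0.0              # float("inf") = permanent

    def learn(self, mac, interface, now=None):
        """Record a source-MAC learning event; a changed outbound
        interface for a known MAC counts as one flap."""
        now = time.monotonic() if now is None else now
        old = self.mac_table.get(mac)
        self.mac_table[mac] = interface
        if old is not None and old != interface:
            # Keep only flaps inside the current detection cycle.
            self.flaps = [t for t in self.flaps if now - t < self.cycle_s]
            self.flaps.append(now)
            if len(self.flaps) >= self.flap_threshold:
                self._loop_detected(now)

    def _loop_detected(self, now):
        self.loop_count += 1
        self.flaps.clear()
        if self.loop_count > self.max_loop_count:
            self.blocked_until = float("inf")             # permanent blocking
        else:
            self.blocked_until = now + self.block_time_s  # temporary blocking

    def is_blocked(self, now=None):
        now = time.monotonic() if now is None else now
        return now < self.blocked_until
```

Feeding the detector the same source MAC address alternating between two interfaces crosses the threshold within one cycle, first blocking temporarily and, after repeated loops, permanently.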
1.7.10.3 Applications
1.7.10.3.1 MAC Flapping-based Loop Detection for VPLS Networks
On the virtual private LAN service (VPLS) network shown in Figure 1-471, pseudo wires
(PWs) are established over Multiprotocol Label Switching (MPLS) tunnels between virtual
private network (VPN) sites to transparently transmit Layer 2 packets. When forwarding
packets, the provider edges (PEs) learn the source MAC addresses of the packets, create MAC
address entries, and establish mapping between the MAC addresses and AC interfaces and
mapping between the MAC addresses and PWs.
Figure 1-471 VPLS network with MAC flapping-based loop detection enabled
On the network shown in Figure 1-471, CE2 and CE3 are connected to PE1 to provide
redundant links. This deployment may generate loops because the connections on the user
network of CE2 and CE3 are unknown. Specifically, if CE2 and CE3 are connected, PE1
interfaces connected to CE2 and CE3 may receive user packets with the same source MAC
address, causing MAC address entry flapping or even damaging MAC address entries. In this
situation, you can deploy MAC flapping-based loop detection on PE1 and configure a
blocking policy for AC interfaces to prevent such loops. The blocking policy can be either of
the following:
Blocking interfaces based on their blocking priorities: If a device detects a loop, it blocks
the interface with a lower blocking priority.
Blocking interfaces based on their trusted or untrusted states: If a device detects a loop, it
blocks the untrusted interface.
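The two blocking policies above can be sketched as a selection function; the dictionary layout, field names, and the convention that a larger number means a higher blocking priority are assumptions for illustration.

```python
def pick_interface_to_block(if_a, if_b, policy="priority"):
    """Sketch of the two AC-interface blocking policies (illustrative:
    field names and numeric conventions are assumptions). Each interface
    is a dict like {"name": str, "priority": int, "trusted": bool}."""
    if policy == "priority":
        # Block the interface with the lower blocking priority.
        return if_a if if_a["priority"] < if_b["priority"] else if_b
    # policy == "trusted": block the untrusted interface.
    return if_a if not if_a["trusted"] else if_b

a = {"name": "port1", "priority": 10, "trusted": True}
b = {"name": "port2", "priority": 20, "trusted": False}
print(pick_interface_to_block(a, b, "priority")["name"])  # port1
print(pick_interface_to_block(a, b, "trusted")["name"])   # port2
```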
MAC flapping-based loop detection can also detect PW-side loops. The principles of blocking
PWs are similar to those of blocking AC interfaces.
In addition, MAC flapping-based loop detection can associate an interface with its
sub-interfaces bound with virtual switching instances (VSIs). If a loop occurs in the VSI
bound to a sub-interface, the sub-interface is blocked. However, a loop may also exist in a
VSI bound to another sub-interface. If the loop is not eliminated in time, it will cause traffic
congestion or even a network breakdown. To inform the network administrator of loops,
enable MAC flapping-based loop detection association on the interface of the sub-interfaces
bound with VSIs. In this situation, if a sub-interface bound with a VSI is blocked due to a
loop, the interface on which the sub-interface is configured is also blocked and an alarm is
generated. After that, all the other sub-interfaces bound with VSIs are blocked.
Terms
None
Abbreviations
AC attachment circuit
MAC Media Access Control
PW pseudo wire
STP Spanning Tree Protocol
VPLS virtual private LAN service
VSI virtual switching instance
1.7.11 VXLAN
1.7.11.1 Introduction
Definition
Virtual extensible local area network (VXLAN) is a Network Virtualization over Layer 3
(NVO3) technology that uses MAC-in-UDP encapsulation.
Purpose
As a widely deployed core cloud computing technology, server virtualization greatly reduces
IT and O&M costs and improves service deployment flexibility.
On the network shown in Figure 1-472, a server is virtualized into multiple virtual machines
(VMs), each of which functions as a host. A great increase in the number of hosts causes the
following problems:
VM scale is limited by the network specification.
On a legacy large Layer 2 network, data packets are forwarded at Layer 2 based on MAC
entries. However, there is a limit on the MAC table capacity, which subsequently limits
the number of VMs.
Network isolation capabilities are limited.
Most networks currently use VLANs to implement network isolation. However, the
deployment of VLANs on large-scale virtualized networks has the following limitations:
− The VLAN tag field defined in IEEE 802.1Q has only 12 bits and can support only
a maximum of 4094 VLANs, which cannot meet user identification requirements of
large Layer 2 networks.
− VLANs on legacy Layer 2 networks cannot adapt to dynamic network adjustment.
VM migration scope is limited by the network architecture.
After a VM is started, it may need to be migrated to a new server due to resource issues
on the original server, for example, when the CPU usage is too high or memory
resources are inadequate. To ensure uninterrupted services during VM migration, the IP
and MAC addresses of the VM must remain unchanged. To carry this out, the service
network must be a Layer 2 network and also provide multipathing redundancy backup
and reliability.
VXLAN addresses the preceding problems on large Layer 2 networks.
Eliminates VM scale limitations imposed by network specifications.
VXLAN encapsulates data packets sent from VMs into UDP packets and encapsulates IP
and MAC addresses used on the physical network into the outer headers. Then the
network is only aware of the encapsulated parameters and not the inner data. This greatly
reduces the MAC address specification requirements of large Layer 2 networks.
Provides greater network isolation capabilities.
VXLAN uses a 24-bit network segment ID, called the VXLAN network identifier (VNI), to
identify users. The VNI is similar to a VLAN ID and supports a maximum of 16M (2^24, or
16 x 1024^2) VXLAN segments.
Eliminates VM migration scope limitations imposed by network architecture.
VXLAN uses MAC-in-UDP encapsulation to extend Layer 2 networks. It encapsulates
Ethernet packets into IP packets so that they can be transmitted over routes, without the
transport network being aware of VMs' MAC addresses. Because this places no limitation
on the Layer 3 network architecture, Layer 3 networks are scalable and have strong
automatic fault rectification and load balancing capabilities. This allows for VM
migration irrespective of the network architecture.
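The 16M figure quoted for 24-bit VNIs can be checked directly:

```python
# A 24-bit VNI field yields 2^24 identifiers, i.e. 16M in binary units.
vni_bits = 24
total = 2 ** vni_bits
print(total)               # 16777216
print(total // 1024 ** 2)  # 16 (that is, 16M)
```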
Benefits
As server virtualization is being rapidly deployed on data centers based on physical network
infrastructure, VXLAN offers the following benefits:
A maximum of 16M VXLAN segments are supported using 24-bit VNIs, which allows a
data center to accommodate multiple tenants.
Non-VXLAN network edge devices do not need to identify the VM's MAC address,
which reduces the number of MAC addresses that have to be learned and enhances
network performance.
MAC-in-UDP encapsulation extends Layer 2 networks, decoupling physical and virtual
networks. Tenants can plan their own virtual networks without being limited by the
physical network IP addresses or broadcast domains. This greatly simplifies network
management.
1.7.11.2 Principles
1.7.11.2.1 Basic Concepts
Virtual extensible local area network (VXLAN) is an NVO3 network virtualization
technology that encapsulates data packets sent from virtual machines (VMs) into UDP packets
and encapsulates IP and MAC addresses used on the physical network in outer headers before
sending the packets over an IP network. The egress tunnel endpoint then decapsulates the
packets and sends the packets to the destination VM.
VXLAN allows a virtual network to provide access services to a large number of tenants. In
addition, tenants are able to plan their own virtual networks, not limited by the physical
network IP addresses or broadcast domains. This greatly simplifies network management.
Table 1-129 describes VXLAN concepts.
Concept Description
Underlay and overlay networks VXLAN allows virtual Layer 2 or Layer 3 networks
(overlay networks) to be built over existing physical networks (underlay networks).
Overlay networks use encapsulation technologies to transmit tenant packets between
sites over Layer 3 forwarding paths provided by underlay networks. Tenants are aware
of only overlay networks.
Network virtualization edge (NVE) A network entity that is deployed at the network
edge and implements network virtualization functions.
NOTE
vSwitches on devices and servers can function as NVEs.
VXLAN tunnel endpoint (VTEP) A VXLAN tunnel endpoint that encapsulates and
decapsulates VXLAN packets. It is represented by an NVE.
A VTEP connects to a physical network and is assigned a physical network IP address.
This IP address is irrelevant to virtual networks.
In VXLAN packets, the source IP address is the local node's VTEP address, and the
destination IP address is the remote node's VTEP address. This pair of VTEP addresses
corresponds to a VXLAN tunnel.
VXLAN network identifier (VNI) A VXLAN segment identifier similar to a VLAN ID.
VMs on different VXLAN segments cannot communicate directly at Layer 2.
A VNI identifies only one tenant. Even if multiple terminal users belong to the same
VNI, they are considered one tenant. A VNI consists of 24 bits and supports a maximum
of 16M (2^24) tenants.
Bridge domain (BD) A Layer 2 broadcast domain through which VXLAN data packets
are forwarded.
VNIs identifying virtual networks must be mapped to BDs so that a BD can function as
a VXLAN network entity to transmit VXLAN traffic.
VBDIF interface A Layer 3 logical interface created for a BD. Configuring IP addresses
for VBDIF interfaces allows communication between VXLANs on different network
segments and between VXLANs and non-VXLANs, and implements Layer 2 network
access to a Layer 3 network.
Virtual access point (VAP) A Layer 2 sub-interface used to transmit data packets.
Layer 2 sub-interfaces can have different encapsulation types configured to transmit
various types of data packets.
Gateway A device that ensures communication between VXLANs identified by
different VNIs and between VXLANs and non-VXLANs.
A VXLAN gateway can be a Layer 2 or Layer 3 gateway.
Layer 2 gateway: allows tenants to access VXLANs and implements intra-segment
communication on a VXLAN.
Layer 3 gateway: allows inter-segment VXLAN communication and access to external
networks.
Field Description
VXLAN header VXLAN Flags (8 bits): The value is 00001000.
VNI (24 bits): VXLAN segment ID or VXLAN network identifier, used to identify a
VXLAN segment.
Reserved fields (24 bits and 8 bits): must be set to 0.
Outer UDP header DestPort: destination port number, which is 4789 for VXLAN.
Source Port: source port number, which is calculated by hashing the inner Ethernet
frame headers.
Outer Ethernet header MAC DA: destination MAC address, which is the MAC address
mapped to the next-hop IP address (looked up, based on the destination VTEP address,
in the routing table of the VTEP on which the VM that sends packets resides).
MAC SA: source MAC address, which is the MAC address of the VTEP on which the
VM that sends packets resides.
802.1Q Tag: VLAN tag carried in packets. This field is optional.
Ethernet Type: Ethernet packet type.
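The 8-byte VXLAN header layout described above (a flags byte of 0x08, 24 reserved bits, the 24-bit VNI, and 8 more reserved bits, in network byte order) can be sketched as follows. This is an illustration of the field layout, not the device's actual encapsulation code.

```python
import struct

def build_vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte VXLAN header: Flags (0x08), 24 reserved bits,
    VNI (24 bits), 8 reserved bits; all reserved bits are 0."""
    assert 0 <= vni < 2 ** 24, "VNI is a 24-bit value"
    flags_and_reserved = 0x08 << 24  # flags byte followed by 24 zero bits
    vni_and_reserved = vni << 8      # 24-bit VNI followed by 8 zero bits
    return struct.pack("!II", flags_and_reserved, vni_and_reserved)

# Example: VNI 5000 (0x001388); the outer UDP header would carry
# destination port 4789, which is outside this sketch.
hdr = build_vxlan_header(5000)
print(hdr.hex())  # 0800000000138800
```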
Software mode: On the network shown in Figure 1-476, all NVEs are deployed on
vSwitches, which perform VXLAN encapsulation and decapsulation.
Hybrid mode: On the network shown in Figure 1-477, some NVEs are deployed on
vSwitches, and others on NVE-capable devices. Both vSwitches and NVE-capable
devices may perform VXLAN encapsulation and decapsulation.
c. Upon receipt of the EVPN routes, a gateway matches the export VPN target carried
in the route against the import VPN target of its local EVPN instance. If the two
VPN targets match, the gateway accepts the route and stores the VTEP IP address
and VNI carried in the route for later packet transmission over the VXLAN tunnel.
If the two VPN targets do not match, the gateway drops the route.
In this example, the import VPN target of one EVPN instance must match the export VPN target of the
other EVPN instance. Otherwise, the VXLAN tunnel cannot be established. If only one end can
successfully accept the IRB or IP prefix route, this end can establish a VXLAN tunnel to the other end,
but cannot exchange data packets with the other end. The other end drops packets after confirming that
there is no VXLAN tunnel to the end that has sent these packets.
Figure 1-479 VXLAN tunnel establishment using EVPN in centralized gateway scenarios
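The VPN-target check described above can be sketched as a set intersection, with the bidirectional requirement shown explicitly. The route-target strings and dictionary layout are made up for illustration.

```python
def accept_route(route_export_rts, local_import_rts):
    """Sketch of the VPN-target check (illustrative): a gateway accepts
    an EVPN route only if at least one export VPN target carried in the
    route matches an import VPN target of its local EVPN instance."""
    return bool(set(route_export_rts) & set(local_import_rts))

# VXLAN tunnel establishment with data exchange requires the match in
# both directions: each end must accept the other end's routes.
gw1 = {"import": {"100:1"}, "export": {"200:1"}}
gw2 = {"import": {"200:1"}, "export": {"100:1"}}
both_directions = (accept_route(gw1["export"], gw2["import"])
                   and accept_route(gw2["export"], gw1["import"]))
print(both_directions)  # True
```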
The routing protocol running on each leaf node can be either Multiprotocol Extensions for
BGP (MP-BGP) or BGP Ethernet Virtual Private Network (EVPN).
BGP EVPN
BGP EVPN defines a new type of BGP network layer reachability information (NLRI), called
EVPN NLRI. In a distributed VXLAN gateway scenario, EVPN serves as the VXLAN
control plane. It uses MAC advertisement route prefixes and IP prefix route prefixes carried in
EVPN NLRI as well as extended community attributes to transmit information required for
VXLAN tunnel establishment. Figure 1-482 illustrates the formats of a MAC advertisement
route prefix, an IP prefix route prefix, and an extended community attribute. The MAC
advertisement route prefix, IP prefix route prefix, and extended community attribute can form
different types of routes. Table 1-131 compares the two types of routes used in a distributed
VXLAN gateway scenario.
A host route or host network segment route is stored in the IP Address and IP Address Length
fields of a MAC advertisement route prefix or IP prefix route prefix.
Figure 1-482 Formats of the MAC advertisement route, IP prefix route, and extended community
attribute
Figure 1-483 illustrates the process of automatically establishing a VXLAN tunnel between
two distributed VXLAN gateways.
1. Create an EVPN instance and a VPN instance on each VXLAN gateway (Device 1 and
Device 2 in this example) and establish a BGP EVPN peer relationship between the two
devices.
2. Device 1 and Device 2 use BGP EVPN to exchange IRB or IP prefix routes.
− Upon receipt of an ARP request from a terminal, a gateway obtains the host's ARP
entry and generates a MAC advertisement route from this entry. Then, the gateway
advertises this route as an IRB route to the other gateway.
− A gateway imports the routes destined for the host address or host network segment
address to its local VPN instance. Then, the gateway imports these routes from the
local VPN instance to the local EVPN instance and advertises these routes to the
other gateway using an IP prefix route.
3. Upon receipt of the IRB or IP prefix route, a gateway matches the export VPN target
carried in the route against the import VPN target of its local EVPN instance. If the two
VPN targets match, the gateway accepts the route and stores the VTEP IP address and
VNI carried in the route for later packet transmission over the VXLAN tunnel. If the two
VPN targets do not match, the gateway drops the route.
In this example, the import VPN target of one EVPN instance must match the export VPN target of the
other EVPN instance. Otherwise, the VXLAN tunnel cannot be established. If only one end can
successfully accept the IRB or IP prefix route, this end can establish a VXLAN tunnel to the other end,
but cannot exchange data packets with the other end. The other end drops packets after confirming that
there is no VXLAN tunnel to the end that has sent these packets.
Figure 1-483 Establishing a VXLAN tunnel between two distributed VXLAN gateways using the
EVPN control plane
1. When VM1 communicates with VM2 for the first time, VM1 sends an ARP request with
an all-Fs destination MAC address to request Device 3's MAC address.
2. After the ARP request arrives at Device 1, Device 1 updates the locally saved MAC
address table and broadcasts the ARP request onto the local network segment.
3. After the ARP request arrives at Device 3, Device 3 updates the locally saved ARP table
and responds to Device 1 with an ARP reply with the source MAC address being MAC3.
4. Upon receipt, Device 1 updates the locally saved MAC address table.
5. After VM1 receives the ARP reply, VM1 updates the locally saved ARP table and sends
data packets to the Layer 3 gateway NVE3.
6. After the data packets arrive at Device 1, Device 1 searches the locally saved MAC
address table and finds that the outbound interface points to NVE3. Device 1
encapsulates the data packets into VXLAN packets as shown in Figure 1-485 and sends
the VXLAN packets to Device 3.
7. Upon receipt, Device 3 decapsulates the VXLAN packets to obtain and route the data
packets.
8. Upon receipt, NVE3 searches the locally saved ARP table for an entry containing the
mapping between VM2's IP and MAC addresses but fails. Therefore, NVE3 sends an
ARP request to NVE2.
9. After the ARP request arrives at Device 2, Device 2 updates the locally saved MAC
address table and broadcasts the ARP request onto the local network segment.
10. After the ARP request arrives at VM2, VM2 updates the locally saved ARP table and
responds to Device 2 with an ARP reply.
11. Upon receipt, Device 2 updates the locally saved MAC address table and responds to
Device 3 with an ARP reply.
12. Upon receipt, Device 3 updates the locally saved ARP table.
13. Before NVE3 sends data packets to VM2, NVE3 searches the locally saved MAC
address table and finds that the outbound interface points to NVE2. Therefore, NVE3
encapsulates the data packets into VXLAN packets as shown in Figure 1-486.
14. NVE3 then forwards the VXLAN packets to Device 2 based on the outer routing table
shown in Figure 1-486. Upon receipt, Device 2 decapsulates the VXLAN packets,
searches the MAC address table, and forwards the data packets to VM2.
The VM2 -> VM1 communication process is similar to the VM1 -> VM2 communication
process. In VM1 -> VM2 communication, the Layer 3 gateway, NVE1, and NVE2 have
updated ARP and MAC address tables, and therefore ARP request transmission is not needed
during VM2 -> VM1 communication. VM2 communicates with VM1 in unicast mode.
VM1 and VM2 on different network segments can now communicate through VXLAN Layer
3 gateways.
1. When VM1 communicates with VM2 for the first time, VM1 sends an ARP request with
the destination MAC address being all Fs.
2. After the ARP request arrives at Device 1, Device 1 updates the locally saved MAC
address table (adds a VNI ID to the MAC address table).
3. Device 1 broadcasts the ARP request onto the local network segment, generates an
EVPN route that carries the MAC advertisement route prefix based on the ARP request,
and advertises the route to its BGP EVPN peer, Device 3.
4. Upon receipt, Device 3 updates the locally saved MAC address table.
5. After the ARP request arrives at Device 3, Device 3 updates the locally saved ARP table
and responds to Device 1 with an ARP reply with the source MAC address being MAC3.
6. Device 3 generates an EVPN route that carries the MAC advertisement route prefix
based on the ARP reply and advertises the route to its BGP EVPN peer, Device 1.
7. Upon receipt, Device 1 updates the locally saved MAC address table.
Device 2 obtains Device 3's MAC address entry in the same way as Device 1.
Ingress replication: After an NVE receives broadcast, unknown unicast, and multicast (BUM) packets,
the local VTEP obtains a list of VTEPs on the same VXLAN segment as itself through the control plane
and sends a copy of the BUM packets to every VTEP in the list.
Ingress replication allows BUM packets to be transmitted in broadcast mode, independent of multicast
routing protocols.
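Ingress replication as described above can be sketched under these assumptions (the data layout and function name are hypothetical): the local VTEP looks up the per-VNI replication list obtained through the control plane and emits one encapsulated copy per remote VTEP, with no underlay multicast involved.

```python
def ingress_replicate(bum_packet: bytes, vni: int, replication_list: dict):
    """Sketch of ingress replication (illustrative): return one copy of
    the BUM packet per remote VTEP in the VNI's replication list. The
    (dest_vtep, vni, packet) tuple stands in for VXLAN encapsulation
    with that outer destination address and VNI."""
    copies = []
    for remote_vtep in replication_list.get(vni, []):
        copies.append((remote_vtep, vni, bum_packet))
    return copies

# Hypothetical replication list: VNI 5000 maps to two remote VTEPs.
repl_list = {5000: ["10.1.1.2", "10.1.1.3"]}
for dst, vni, pkt in ingress_replicate(b"payload", 5000, repl_list):
    print(dst, vni, len(pkt))
```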
a. After Device 1 receives packets from Terminal A, Device 1 determines the Layer 2
broadcast domain of the packets based on the access interface and VLAN
information carried in the packets and checks whether the destination MAC address
is a BUM address.
If the destination MAC address is a BUM address, Device 1 broadcasts the
packets in the Layer 2 broadcast domain and goes to b.
If the destination MAC address is not a BUM address, Device 1 follows the
unicast packet forwarding process.
b. Device 1's VTEP obtains the ingress replication list for the VNI, replicates packets
based on the list, and performs VXLAN tunnel encapsulation by adding outer
headers. Device 1 then forwards the packets through the outbound interface.
c. Upon receipt of the VXLAN packets, the VTEP on Device 2 or Device 3 verifies
the VXLAN packets based on the UDP destination port numbers, source and
destination IP addresses, and VNI. The VTEP obtains the Layer 2 broadcast domain
based on the VNI and removes the outer headers to obtain the inner Layer 2 packets.
It then determines whether the destination MAC address is a BUM address.
If the destination MAC address is a BUM address, the VTEP broadcasts the
packets to the user side in the Layer 2 broadcast domain.
If the destination MAC address is not a BUM address, the VTEP further
checks whether it is a local MAC address.
If it is a local MAC address, the VTEP sends the packets to the device.
If it is not a local MAC address, the VTEP searches for the outbound
interface and encapsulation information in the Layer 2 broadcast domain
and proceeds to d.
d. Device 2 or Device 3 adds VLAN tags to the packets based on the outbound
interface and encapsulation information and forwards the packets to Terminal B or
Terminal C.
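The replication performed in step b above can be sketched as follows. The function and table names are illustrative, not device APIs: for a BUM frame, the local VTEP makes one copy per remote VTEP in the VNI's ingress replication list and adds outer headers (outer IP addresses, UDP destination port 4789, and the VNI).

```python
# Hypothetical sketch of ingress replication; names are illustrative only.

def ingress_replicate(frame, vni, local_vtep, replication_list):
    """Return one (outer headers, frame) copy per remote VTEP for this VNI."""
    copies = []
    for remote_vtep in replication_list.get(vni, []):
        # Outer headers added during VXLAN tunnel encapsulation.
        outer = {"src_ip": local_vtep, "dst_ip": remote_vtep,
                 "udp_dst_port": 4789, "vni": vni}
        copies.append((outer, frame))
    return copies

# Usage: a BUM frame in VNI 10 is replicated to two remote VTEPs.
copies = ingress_replicate(b"bum-frame", 10, "10.1.1.1",
                           {10: ["10.2.2.2", "10.3.3.3"]})
```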
Terminal B or Terminal C responds to Terminal A following the unicast packet forwarding process.
a. After Device 1 receives packets from Terminal A, Device 1 determines the Layer 2
broadcast domain of the packets based on the access interface and VLAN
information carried in the packets and checks whether the destination MAC address
is a unicast address.
In a centralized VXLAN gateway scenario shown in Figure 1-489, the inter-segment packet
forwarding process is as follows:
1. After Device 3 receives VXLAN packets, it decapsulates the packets and checks whether
the destination MAC address in the inner packets is the MAC address of the Layer 3
gateway interface BDIF10.
− If the destination MAC address is a local MAC address, Device 3 forwards the
packets to the Layer 3 gateway on the destination network segment and goes to 2.
− If the destination MAC address is not a local MAC address, Device 3 searches for
the outbound interface and encapsulation information in the Layer 2 broadcast
domain.
2. Device 3 removes the Ethernet headers of the inner packets and parses the destination IP
address. Device 3 searches the routing table for the next-hop IP address based on the
destination IP address and the ARP entries based on the next-hop IP address. Device 3
uses the ARP entries to identify the destination MAC address, VXLAN tunnel's outbound
interface, and VNI.
− If the VXLAN tunnel's outbound interface and VNI cannot be found, Device 3
performs Layer 3 forwarding.
− If the VXLAN tunnel's outbound interface and VNI can be found, Device 3 proceeds
to 3.
3. Device 3 encapsulates the packets into VXLAN packets again, with the source MAC
address in the Ethernet header of the inner packets set to the MAC address of the Layer 3
gateway interface BDIF20.
For details on communication between Device 3 and other devices, see Layer 2 gateway principles.
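The inter-segment forwarding in step 2 above (route lookup, then ARP lookup, then re-encapsulation) can be sketched as follows. The table contents are assumptions for illustration, not real device state.

```python
# Illustrative sketch of the centralized Layer 3 gateway forwarding decision.

ROUTES = {"10.1.20.9": "10.1.20.1"}  # destination IP -> next-hop IP
# next-hop IP -> (destination MAC, VXLAN tunnel outbound interface, VNI)
ARP = {"10.1.20.1": ("aa:bb:cc:dd:ee:02", "Nve1", 20)}

def l3_gateway_forward(dst_ip):
    next_hop = ROUTES.get(dst_ip)
    if next_hop is None:
        return None                      # no route: drop the packet
    entry = ARP.get(next_hop)
    if entry is None:
        return ("l3-forward", None)      # no tunnel info: plain Layer 3 forwarding
    dst_mac, out_intf, vni = entry
    # Tunnel info found: re-encapsulate the packet with the new VNI.
    return ("vxlan-encap", {"dst_mac": dst_mac, "interface": out_intf, "vni": vni})
```

Here the ARP entry supplies everything needed for the second VXLAN encapsulation: the inner destination MAC address, the tunnel's outbound interface, and the VNI of the destination segment.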
Basic Concepts
In the scenario where an enterprise site and data center are interconnected, the VPN GWs
(PE1 and PE2) and the enterprise Site (CPE) are connected through VXLAN tunnels to
exchange L2/L3 services between the enterprise site (CPE) and data center. The data center
gateway (CE1) is dual-homed to PE1 and PE2 to access the VXLAN network, which
enhances network access reliability. When one PE fails, services can be rapidly switched to
the other PE, minimizing the impact on services.
As shown in Figure 1-490, PE1 and PE2 use a virtual address as an NVE interface address at
the network side, namely, the Anycast VTEP address. In this way, the CPE is aware of only
one remote NVE interface and establishes a VXLAN tunnel with the virtual address. The
packets from the CPE can reach CE1 through either PE1 or PE2. However, single-homed CEs
may exist, such as CE2 and CE3. As a result, after reaching a PE, the packets from the CPE
may need to be forwarded by the other PE to a single-homed CE. Therefore, a bypass
VXLAN tunnel needs to be established between PE1 and PE2. An EVPN peer relationship is
established between PE1 and PE2. Different addresses, namely, bypass VTEP addresses, are
configured for PE1 and PE2 so that they can establish a bypass VXLAN tunnel.
Control Plane
PE2 sends a multicast route to PE1. The source address of the route is the Anycast VTEP
address shared by PE1 and PE2. The route carries the bypass VXLAN extended
community attribute, including the bypass VTEP address of PE1.
After receiving the multicast route from PE2, PE1 determines that an anycast
relationship has been established with PE2. This is because the source address
(Anycast VTEP address) of the route is the same as the local virtual address of PE1
and the route carries the bypass VXLAN extended community attribute. Based on this
attribute, PE1 establishes a bypass VXLAN tunnel to PE2.
PE1 learns the MAC addresses of the CEs through upstream packets at the AC side and
advertises the routes to PE2 through BGP EVPN. The routes carry the ESIs of the links
through which the CEs access PE1, information about the VLANs that the CEs access,
and the bypass VXLAN extended community attribute.
PE1 learns the MAC address of the CPE through downstream packets at the network side,
specifies that the next-hop address of the MAC route can be iterated to a static VXLAN
tunnel, and advertises the route to PE2. The next-hop address of the MAC route cannot
be changed.
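The anycast check described above can be sketched as follows; the field names are assumptions for illustration. PE1 treats the sender as an anycast peer when the received route's source address equals PE1's own Anycast VTEP address and the route carries the bypass VXLAN extended community attribute.

```python
# Hedged sketch of the bypass-tunnel decision; field names are illustrative.

LOCAL_ANYCAST_VTEP = "10.0.0.100"   # virtual address shared by PE1 and PE2

def should_build_bypass_tunnel(route):
    # Anycast peer: same source as the local Anycast VTEP address, and the
    # bypass VXLAN extended community attribute is present.
    return (route.get("source") == LOCAL_ANYCAST_VTEP
            and "bypass_vtep" in route)

route_from_pe2 = {"source": "10.0.0.100", "bypass_vtep": "10.0.0.2"}
# When the check passes, PE1 builds a bypass VXLAN tunnel to the address in
# route_from_pe2["bypass_vtep"].
```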
− Downlink
As shown in Figure 1-492:
After receiving a Layer 2 unicast packet sent by the CPE to CE1, PE1 performs
VXLAN decapsulation on the packet, searches the local MAC address table for the
destination MAC address, obtains the outbound interface, and forwards the packet
to CE1.
After receiving a Layer 2 unicast packet sent by the CPE to CE2, PE1 performs
VXLAN decapsulation on the packet, searches the local MAC address table for the
destination MAC address, obtains the outbound interface, and forwards the packet
to CE2.
After receiving a Layer 2 unicast packet sent by the CPE to CE3, PE1 performs
VXLAN decapsulation on the packet, searches the local MAC address table for the
destination MAC address, and forwards it to PE2 over the bypass VXLAN tunnel.
After the packet reaches PE2, PE2 searches its MAC address table for the destination
MAC address, obtains the outbound interface, and forwards the packet to CE3.
The process for PE2 to forward packets from the CPE is the same as that for PE1 to
forward packets from the CPE.
− As shown in Figure 1-494, after a BUM packet from CE2 reaches PE1, PE1 sends a
copy of the packet to CE1 and the CPE. In addition, PE1 sends a copy of the packet
to PE2 through the bypass VXLAN tunnel between PE1 and PE2. After the copy
of the packet reaches PE2, PE2 sends it to CE3, not to the CPE or CE1.
− As shown in Figure 1-495, after a BUM packet from CE1 reaches PE1, PE1 sends a
copy of the packet to CE2 and the CPE. In addition, PE1 sends a copy of the packet
to PE2 through the bypass VXLAN tunnel between PE1 and PE2. After the copy
of the packet reaches PE2, PE2 sends it to CE3, not to the CPE or CE1.
− Uplink
As shown in Figure 1-491:
Because the CPE is on a different network segment from PE1 and PE2, the
destination MAC address of a Layer 3 unicast packet sent from CE1, CE2, or CE3
to the CPE is the MAC address of the BDIF interface on the Layer 3 gateway of
PE1 or PE2. After receiving the packet, PE1 or PE2 removes the Layer 2 tag from
the packet, searches for a matching Layer 3 routing entry, and obtains the outbound
interface that is the BDIF interface connecting the CPE to the Layer 3 gateway. The
BDIF interface searches the ARP table, obtains the destination MAC address,
encapsulates the packet into a VXLAN packet, and sends it to the CPE through the
VXLAN tunnel.
After receiving the Layer 3 packet from PE1 or PE2, the CPE removes the Layer 2
tag from the packet because the destination MAC address is the MAC address of
the BDIF interface on the CPE. Then the CPE searches the Layer 3 routing table to
obtain a next-hop address to forward the packet.
− Downlink
As shown in Figure 1-492:
Before sending a Layer 3 unicast packet to CE1 across subnets, the CPE searches its
Layer 3 routing table and obtains the outbound interface that is the BDIF interface
on the Layer 3 gateway connecting to PE1. The BDIF interface searches the ARP
table to obtain the destination MAC address, encapsulates the packet into a VXLAN
packet, and forwards it to PE1 over the VXLAN tunnel.
After receiving the packet from the CPE, PE1 removes the Layer 2 tag from the
packet because the destination address of the packet is the MAC address of PE1's
BDIF interface. Then PE1 searches the Layer 3 routing table and obtains the
outbound interface that is the BDIF interface connecting PE1 to its attached CE.
The BDIF interface searches its ARP table and obtains the destination address,
performs Layer-2 encapsulation for the packet, and sends it to CE1.
The process for PE2 to forward packets from the CPE is the same as that for PE1 to
forward packets from the CPE.
1.7.11.3 Applications
1.7.11.3.1 Application for Communication Between Terminal Users on a VXLAN
Service Description
Currently, data centers are expanding on a large scale for enterprises and carriers, with
increasing deployment of virtualization and cloud computing. In addition, to accommodate
more services while reducing maintenance costs, data centers are employing large Layer 2 and
virtualization technologies.
As server virtualization is implemented in the physical network infrastructure for data centers,
VXLAN, an NVO3 technology, has adapted to the trend by providing virtualization solutions
for data centers.
Networking Description
On the network shown in Figure 1-496, an enterprise has VMs deployed in different data
centers. Different network segments run different services. The VMs running the same service
or different services in different data centers need to communicate with each other. For
example, VMs of the financial department residing on the same network segment need to
communicate, and VMs of the financial and engineering departments residing on different
network segments also need to communicate.
Feature Deployment
As shown in Figure 1-496:
Deploy Device 1 and Device 2 as Layer 2 VXLAN gateways and establish a VXLAN
tunnel between Device 1 and Device 2 to allow communication between terminal users
on the same network segment.
Deploy Device 3 as a Layer 3 VXLAN gateway and establish a VXLAN tunnel between
Device 1 and Device 3 and between Device 2 and Device 3 to allow communication
between terminal users on different network segments.
Configure VXLAN on devices to trigger VXLAN tunnel establishment and dynamic learning
of ARP and MAC address entries. Terminal users on the same network segment and on
different network segments can then communicate through the Layer 2 and Layer 3
VXLAN gateways based on ARP and routing entries.
Service Description
Currently, data centers are expanding on a large scale for enterprises and carriers, with
increasing deployment of virtualization and cloud computing. In addition, to accommodate
more services while reducing maintenance costs, data centers are employing large Layer 2 and
virtualization technologies.
As server virtualization is implemented in the physical network infrastructure for data centers,
VXLAN, an NVO3 technology, has adapted to the trend by providing virtualization solutions
for data centers, allowing intra-VXLAN communication and communication between
VXLANs and legacy networks.
Networking Description
On the network shown in Figure 1-497, an enterprise has VMs deployed for the finance and
engineering departments and a legacy network for the human resource department. The
finance and engineering departments need to communicate with the human resource
department.
Figure 1-497 Communication between terminal users on a VXLAN and legacy network
Feature Deployment
As shown in Figure 1-497:
Deploy Device 1 and Device 2 as Layer 2 VXLAN gateways and Device 3 as a Layer 3
VXLAN gateway. The VXLAN gateways are VXLANs' edge devices connecting to legacy
networks and are responsible for VXLAN encapsulation and decapsulation. Establish a
VXLAN tunnel between Device 1 and Device 3 and between Device 2 and Device 3 for
VXLAN packet transmission.
When the human resource department sends a packet to VM1 of the financial department, the
process is as follows:
1. Device 1 receives the packet and encapsulates it into a VXLAN packet before sending it
to Device 3.
2. Upon receipt, Device 3 decapsulates the VXLAN packet and removes the Ethernet
header in the inner packet, parses the destination IP address, and searches the routing
table for a next hop address. Then, Device 3 searches the ARP table based on the next
hop address to determine the destination MAC address, VXLAN tunnel's outbound
interface, and VNI.
3. Device 3 encapsulates the VXLAN tunnel's outbound interface and VNI into the packet
and sends the VXLAN packet to Device 2.
4. Upon receipt, Device 2 decapsulates the VXLAN packet, finds the outbound interface
based on the destination MAC address, and forwards the packet to VM1.
Service Description
In legacy networking, a centralized Layer 3 gateway is deployed on an aggregation or spine
node. Packets across different networks must be forwarded through the centralized Layer 3
gateway, resulting in the following problems:
Forwarding paths are not optimal. Layer 3 traffic of data centers in different locations
must be transmitted to the centralized Layer 3 gateway for forwarding.
The ARP entry specification is a bottleneck. ARP entries must be generated for tenants
on the centralized Layer 3 gateway. However, the centralized Layer 3 gateway can only
have a limited number of ARP entries configured, which does not facilitate data center
network expansion.
Distributed VXLAN gateways can be configured to address these problems. In distributed
VXLAN gateway networking, leaf nodes, which can function as Layer 3 VXLAN gateways,
are used as VTEPs to establish VXLAN tunnels. Spine nodes are unaware of the VXLAN
tunnels and only forward VXLAN packets.
Networking Description
On the network shown in Figure 1-498, Server1 and Server2 on different network segments
both connect to Leaf1. Configure Leaf1 as a Layer 3 VXLAN gateway. When Server1 and
Server2 communicate, traffic is forwarded through only Leaf1, not any spine node.
Server1 and Server3 on different network segments connect to Leaf1 and Leaf2, respectively.
Configure both Leaf1 and Leaf2 as Layer 3 VXLAN gateways. When Server1 and Server3
communicate, traffic is forwarded through the VXLAN tunnel established between Leaf1 and
Leaf2. Spine nodes are unaware of the VXLAN tunnel and only forward VXLAN packets.
Feature Deployment
Deploy both Layer 2 and Layer 3 VXLAN gateways on leaf nodes.
Layer 2 gateway: allows tenant access to VXLANs and intra-subnet VXLAN
communication on the same network segment.
Layer 3 gateway: allows inter-subnet VXLAN communication and access to external
networks.
Note the following when deploying distributed VXLAN gateways:
When ARP broadcast suppression is enabled on a leaf node functioning as a Layer 2
VXLAN gateway, the leaf node can determine whether to broadcast ARP request
messages sent from tenants or servers. This function suppresses ARP broadcast traffic,
which improves network performance.
When advertisement of host routes generated based on ARP entries is enabled on a leaf
node functioning as a Layer 3 VXLAN gateway, the leaf node learns ARP entries of
tenants, generates host routes based on the ARP entries, and uses BGP to advertise the
host routes to BGP peers.
When traffic is transmitted across leaf nodes at Layer 3, tenants must be bound to VPN
instances. VXLAN tunnels can then be established through BGP VPN peers.
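The second deployment point above, generating host routes from ARP entries, can be sketched as follows. This is a simplified illustration, not the device implementation: a leaf node derives /32 host routes from the tenant ARP entries it has learned and would then advertise them to its BGP peers.

```python
# Simplified illustration of host-route generation from learned ARP entries.

def host_routes_from_arp(arp_table):
    """Map each learned tenant IP address to a /32 host route."""
    return [f"{ip}/32" for ip in arp_table]

# ARP table learned from tenants (IP address -> MAC address).
routes = host_routes_from_arp({"192.168.1.10": "aa:bb:cc:00:00:01",
                               "192.168.1.11": "aa:bb:cc:00:00:02"})
# routes now holds the host routes the leaf would advertise via BGP.
```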
On the network shown in Figure 1-499, the device deployed at the network edge establishes a
VXLAN tunnel with a virtual BRAS for user access.
1. After a user terminal starts or an IPoE, PPPoE, or L2TP user dials up, the terminal sends
an access message, which is relayed to the edge device through an optical line terminal
(OLT).
2. The edge device encapsulates the access message with a VXLAN header to form a
VXLAN packet and transparently transmits it to the BRAS through a VXLAN tunnel.
3. The BRAS removes the VXLAN header of the received VXLAN packet and processes
the access message.
On the network shown in Figure 1-500, a VPLS network and a VXLAN network intersect at
Device 2 and Device 4 for BRAS access. The user access implementation is as follows:
1. After a user terminal starts or an IPoE, PPPoE, or L2TP user dials up, the terminal sends
an access message, which is relayed to edge devices (Device 1 and Device 3) through
OLTs.
2. Device 1 and Device 3 create a VSI for each OLT so that each OLT is identified by a
VSI. Device 1 through Device 4 internetwork using VPLS.
3. Device 2 and Device 4 back up each other, with Device 2 the master and Device 4 the
backup. VSIs are mapped to VXLAN VNIs in 1:1 mode. Device 2 and Device 4 have the
same VTEP IP address configured to exchange packets between the PW and VXLAN
tunnel.
4. Device 2 and Device 4 have a VRRP backup group configured to implement link
protection in case the link between Device 2 and BRAS 1 fails.
5. VRRP is associated with the virtual VTEP's route priority on the Device 2 and Device 4
interfaces connecting to the BRASs, and the route priority of the virtual VTEP on the
master device is higher than that of the virtual VTEP on the backup device. Downstream
VXLAN traffic of the BRASs is transmitted through the master device. After
downstream traffic is transmitted to Device 1 and Device 3, their MAC address entries
are updated for guiding upstream traffic to the master device.
6. VRRP is associated with PWs on the Device 2 and Device 4 interfaces connecting to the
BRASs so that the PW interface of the backup device does not receive or forward
VXLAN traffic. User access packets are broadcast to both Device 2 and Device 4 in the
VSI. Because PW packets are blocked on the backup device, only the master device
forwards the user access packets.
7. VRRP is associated with the Device 2 and Device 4 interfaces connecting to the VPLS
network. If link S on the VPLS network fails, protection switching is performed.
8. Device 2 and Device 4 establish VXLAN tunnels with the BRASs for VXLAN packet
encapsulation and decapsulation, implementing BRAS access.
Service Description
In the scenario where an enterprise site and data center are interconnected, the data center
gateway (CE1) is dual-homed to PE1 and PE2 to access the VXLAN network to interconnect
with the remote enterprise site (CPE), which brings the following benefits:
Benefits to carriers: The network access reliability is enhanced because the CE is
dual-homed to PEs.
Benefits to users: Users are not aware of service failures because services can quickly
recover from a failure.
Networking Description
As shown in Figure 1-501, PE1 and PE2 use a virtual address as an NVE interface address at
the network side, namely, the Anycast VTEP address. In this way, the CPE is aware of only
one remote NVE interface and establishes a VXLAN tunnel with the virtual address. The
packets from the CPE can reach CE1 through either PE1 or PE2. An EVPN peer relationship
is established between PE1 and PE2. Different addresses, namely, bypass VTEP addresses, are
configured for PE1 and PE2 so that they can establish a bypass VXLAN tunnel.
Feature Deployment
As shown in Figure 1-501:
At the AC side, a CE is dual-homed to PEs.
Terms
Term Description
NVO3 Network Virtualization over L3. A network virtualization technology
implemented at Layer 3 to provide traffic isolation and IP address
independence between data center tenants so that independent Layer 2
subnets can be provided for each tenant. In addition, NVO3 supports VM
deployment and migration on tenants' Layer 2 subnets.
VXLAN Virtual extensible local area network. An NVO3 network virtualization
technology that encapsulates data packets sent from VMs into UDP
packets and encapsulates IP and MAC addresses used on the physical
network in the outer headers before sending the packets over an IP
network. The egress tunnel endpoint then decapsulates the packets and
sends the packets to the destination VM.
BD bridge domain
BUM broadcast, unknown unicast, and multicast
VNI VXLAN Network Identifier
VTEP VXLAN Tunnel Endpoints
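To make the VXLAN definition above concrete, the following sketch builds the 8-byte VXLAN header defined in RFC 7348: a flags byte with the I bit set (meaning the VNI is valid), reserved fields, and a 24-bit VNI. This header precedes the original Ethernet frame inside the outer UDP packet (destination port 4789).

```python
# Build the 8-byte VXLAN header from RFC 7348.
import struct

def vxlan_header(vni):
    flags = 0x08 << 24                 # I flag set; other flag/reserved bits zero
    # The 24-bit VNI occupies the upper three bytes of the second word;
    # the lowest byte is reserved.
    return struct.pack("!II", flags, vni << 8)

hdr = vxlan_header(5000)               # 8 bytes: 08 00 00 00 | 00 13 88 00
```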
Definition
Data Center Interconnection (DCI) provides solutions to interconnect data centers.
Using Virtual extensible local area network (VXLAN), Ethernet virtual private network
(EVPN), and BGP/MPLS IP VPN technologies, DCI solutions allow packets that are
exchanged between data centers to be transmitted securely and reliably over carrier networks,
allowing VMs in different data centers to communicate with each other.
Purpose
To meet the requirements of cross-region operation, user access, and inter-city disaster
recovery that arise during enterprise development, an increasing number of enterprises have
deployed data centers in multiple regions and across carrier networks. Currently, leased fibers
or leased lines are commonly used to interconnect cross-region data centers, which has the
following disadvantages:
For enterprises, leased fibers or leased lines are costly.
For carriers, service exploration is difficult, and resource utilization is low.
To cope with these disadvantages, a DCI network that is characterized by high security and
reliability and flexible scheduling needs to be constructed and operated. DCI solutions can be
deployed on the carrier network to allow packets to be transmitted securely and reliably
between data centers and maximize resource utilization.
Benefits
DCI solutions provide the following benefits:
A data center interconnection network that is characterized by high security and
reliability and flexible scheduling for cross-region data center operation
Tenant-based differentiated services, which help implement flexible resource scheduling
and reduce costs
1.7.12.2 Principles
1.7.12.2.1 Control Plane
DCI solutions are responsible for Layer 3 route advertisement and Layer 2 route
advertisement on the control plane. DCI-related concepts are described as follows.
Concept Description
Overlay network An overlay network is a logical network
deployed over a physical network and can be
regarded as a network connected through
virtual or logical links.
An overlay network has its own control plane
and forwarding plane.
An overlay network is a step forward for a
physical network towards cloud and
virtualization. An overlay network is critical
for cloud network convergence because it
frees cloud resource pool capabilities from
various restrictions of the physical network.
Table 1-133 Routes on the data center and carrier network sides
In DCI solutions, a carrier network can carry Layer 3 traffic, in both integrated and separated
deployment scenarios. In the two scenarios, Layer 3 route advertisement processes on the DCI
backbone network are the same. This section describes the Layer 3 route advertisement
process only in the integrated deployment scenario. Figure 1-502 shows the networking of the
integrated deployment scenario.
Figure 1-503 illustrates the Layer 3 route advertisement process. The detailed process is
described as follows:
1. After receiving a VM host route from Device 1, DCI-PE1-GW1 parses the route,
regardless of whether it is an IRB or IP prefix route.
2. Based on the RT of the VM host route, DCI-PE1-GW1 crosses the EVPN route to a
local VPN instance.
3. DCI-PE1-GW1 changes the next hop of the EVPN route to the IP address used to
establish the VPNv4 peer relationship, performs re-encapsulation, and replaces the RD
and RT of the EVPN route with the RD and RT of the L3VPN instance, respectively. In
addition, DCI-PE1-GW1 applies for an MPLS label and sends the VPNv4 route to
DCI-PE2-GW2.
4. Based on the RT of the VPNv4 route, DCI-PE2-GW2 crosses the VPNv4 route to a local
VPN instance.
5. DCI-PE2-GW2 changes the next hop of the VPNv4 route to the local VTEP address,
performs re-encapsulation, and replaces the RD and RT of the VPNv4 route with the RD
and RT of the L3VPN instance, respectively. In addition, DCI-PE2-GW2 adds the L3VNI
and sends the EVPN route to Device 2.
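Steps 3 and 5 above both rewrite route attributes on re-advertisement. A minimal sketch, with illustrative dictionary fields: the DCI-PE rewrites the next hop, replaces the RD and RT with those of the local instance, and attaches a new label (or VNI).

```python
# Minimal sketch of the attribute rewrite performed on re-advertisement.

def readvertise(route, next_hop, rd, rt, label):
    new_route = dict(route)            # keep the original route intact
    new_route.update({"next_hop": next_hop, "rd": rd, "rt": rt,
                      "label": label})
    return new_route

# Hypothetical EVPN route received from the data center side.
evpn = {"prefix": "10.1.1.10/32", "next_hop": "1.1.1.1",
        "rd": "100:1", "rt": "100:1", "label": 5010}
# Step 3: the route is re-advertised as a VPNv4 route with the L3VPN
# instance's RD/RT, a rewritten next hop, and a newly applied MPLS label.
vpnv4 = readvertise(evpn, "2.2.2.2", "200:1", "200:1", 20010)
```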
Table 1-134 Routes on the data center and carrier network sides
In DCI solutions, a carrier network can carry Layer 2 traffic only in the integrated deployment
scenario.
Figure 1-504 illustrates the Layer 2 route advertisement process. The detailed process is
described as follows:
1. After receiving a VM host MAC route from Device 1, DCI-PE1-GW1 parses and learns
the route.
2. Based on the RT of the VM host MAC route, DCI-PE1-GW1 crosses the EVPN route to
a local EVPN instance.
3. DCI-PE1-GW1 changes the next hop of the EVPN route to the IP address used to
establish the EVPN peer relationship, performs re-encapsulation, and replaces the RD
and RT of the VXLAN-encapsulated EVPN route with the RD and RT of the EVPN
instance, respectively. In addition, DCI-PE1-GW1 applies for an MPLS label and sends
the EVPN route to DCI-PE2-GW2.
4. Based on the RT of the EVPN route, DCI-PE2-GW2 crosses the EVPN route to a local
EVPN instance.
5. DCI-PE2-GW2 changes the next hop of the EVPN route to the local VTEP IP address,
performs re-encapsulation, and replaces the RD and RT of the EVPN route with the RD
and RT of the EVPN instance, respectively. In addition, DCI-PE2-GW2 adds the L2VNI
and sends the EVPN route to Device 2.
On the network shown in Figure 1-505, Layer 3 traffic forwarding on the data plane is
described as follows:
1. After receiving a VXLAN packet destined for a remote VM host from Device 1 in data
center A, DCI-PE1-GW1 parses the packet and obtains the corresponding VPN instance
according to the VNI carried in the packet. In addition, DCI-PE1-GW1 searches the VPN
instance for the outbound interface and encapsulation information based on the
destination IP address of the VM host. Because the outbound interface is an MPLS
tunnel interface, DCI-PE1-GW1 encapsulates the inner Layer 3 packet using MPLS and
sends the MPLS packet through the MPLS tunnel over the backbone network.
2. After DCI-PE2-GW2 receives the double-tagged MPLS packet, it parses the packet
using MPLS, removes the outer MPLS public network label, and obtains the
corresponding VPN instance based on the VPN label. Then, DCI-PE2-GW2 searches the
VPN forwarding table based on the destination IP address of the VM host. Because the
next hop is a VXLAN tunnel interface and the VTEP of the VXLAN tunnel is Device 2
in data center B, DCI-PE2-GW2 encapsulates the original data packets and attributes
such as the L3VNI and Router-MAC into a VXLAN packet and sends it to Device 2.
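The stitching decision in the two steps above can be sketched as follows, with assumed table contents: the VNI selects the VPN instance, and the forwarding entry found there determines whether the packet leaves with MPLS or VXLAN encapsulation.

```python
# Sketch of the MPLS/VXLAN stitching lookup on a DCI-PE; tables are assumed.

VNI_TO_VPN = {5010: "vpn-a"}                     # L3VNI -> VPN instance
FIB = {("vpn-a", "10.2.1.9"): ("mpls", 20010),   # toward the backbone: MPLS label
       ("vpn-a", "10.1.1.9"): ("vxlan", 5010)}   # toward the data center: VNI

def stitch(vni, dst_ip):
    vpn = VNI_TO_VPN[vni]                        # VNI identifies the VPN instance
    return FIB[(vpn, dst_ip)]                    # entry decides the encapsulation
```

On DCI-PE1-GW1 the outbound interface is an MPLS tunnel interface, so the packet is re-encapsulated with MPLS; on DCI-PE2-GW2 it is a VXLAN tunnel interface, so the packet is re-encapsulated with VXLAN.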
On the network shown in Figure 1-505, Layer 2 traffic forwarding on the data plane is
described as follows:
1. After receiving a VXLAN packet carrying a VM MAC route from Device 1 in data
center A, DCI-PE1-GW1 parses the packet and obtains the corresponding Layer 2
broadcast domain according to the VNI carried in the packet. In addition, DCI-PE1-GW1
searches the Layer 2 broadcast domain for the outbound interface and encapsulation
information based on the destination MAC address of the VM host. Because the
outbound interface is an MPLS tunnel interface, DCI-PE1-GW1 encapsulates the inner
Layer 2 packet using MPLS and sends the MPLS packet through the MPLS tunnel over
the backbone network.
2. After DCI-PE2-GW2 receives the MPLS packet, it parses the packet using MPLS,
removes the outer MPLS public network label, and obtains the Layer 2 broadcast domain
based on the EVPN label and BD ID. Then, DCI-PE2-GW2 searches the Layer 2
broadcast domain based on the destination MAC address of the VM host. Because the
outbound interface is a VXLAN tunnel interface and the VTEP of the VXLAN tunnel is
Device 2 in data center B, DCI-PE2-GW2 performs VXLAN encapsulation based on the
VXLAN tunnel information and sends the VXLAN packet to Device 2.
1.7.12.3 Applications
1.7.12.3.1 Application of an End-to-End Overlay VXLAN Tunnel
Service Description
GWs and DCI-PEs are separately deployed. DCI-PEs function as edge devices on the
underlay network and ensure VTEPs in data centers are reachable through routes, without
saving data center tenant and host information.
Networking Description
In Figure 1-506, data center gateways GW1 and GW2 are connected to the backbone network.
BGP/MPLS IP VPN functions are deployed on the DCI backbone network to transmit VTEP
IP information between GW1 and GW2. A VXLAN tunnel is established between GW1 and
GW2 for inter-data center E2E VXLAN packet encapsulation and VM communication.
Feature Deployment
IP and an IGP are deployed on the carrier network to ensure reachability between
DCI-PEs at the network layer.
MPLS is deployed on the carrier network, and an LDP LSP or TE LSP is established
between the DCI-PEs.
BGP/MPLS IP VPN functions are deployed on DCI-PEs.
Service Description
The solution of Option A VLAN Layer 3 access to DCI applies to the scenario where data
centers that do not support VXLAN are interconnected through a DCI network. This solution
has low requirements on GWs, but only a maximum of 4096 VLANs are available to this
solution.
GWs and DCI-PEs are separately deployed. Each DCI-PE considers the GW of a data center
as a CE, uses a Layer 3 VPN routing protocol to receive VM host routes from the data center,
and saves and maintains the routes.
Networking Description
In Figure 1-507, VXLAN tunnels are established within data centers to allow intra-DC VM
communication. To allow inter-data center VM communication, BGP/MPLS IP VPN
functions are deployed on the DCI backbone network, and a Layer 3 Ethernet sub-interface is
configured on each DCI-PE, added to the same VLAN, and bound to the VPN instance of
each DCI-PE.
Feature Deployment
IP and an IGP are deployed on the carrier network to ensure reachability between
DCI-PEs at the network layer.
MPLS is deployed on the carrier network, and an LDP LSP or TE LSP is established
between the DCI-PEs.
BGP/MPLS IP VPN functions are deployed on DCI-PEs.
A Layer 3 Ethernet sub-interface is created on each DCI-PE and is associated with a
VLAN.
Service Description
GWs and DCI-PEs are separately deployed. EVPN is used as a control plane protocol to
dynamically establish VXLAN tunnels. VPNv4 is used to send received host IP routes to the
peer DCI-PE, and packets of VM hosts can be forwarded at Layer 3.
Networking Description
In Figure 1-508, data center gateway devices GW1 and GW2 are connected to the DCI
backbone network. To allow inter-data center VM communication, BGP/MPLS IP VPN
functions are deployed on the DCI backbone network. In addition, EVPN and a VXLAN
tunnel are deployed between the GW and DCI-PE to transmit VM host routes so that VMs in
different data centers can communicate with each other.
Feature Deployment
IP and an IGP are deployed on the carrier network to ensure reachability between
DCI-PEs at the network layer.
MPLS is deployed on the carrier network, and an LDP LSP or TE LSP is established
between the DCI-PEs.
BGP/MPLS IP VPN functions are deployed on DCI-PEs.
EVPN is deployed between the GW and DCI to transmit routes and establish a VXLAN
tunnel.
Service Description
Each DCI-PE-GW functions not only as a data center gateway but also as a PE on the carrier
network. Each DCI-PE-GW learns VM host routes or MAC routes from a data center through
EVPN and forwards packets of VM hosts at Layer 3 or Layer 2.
Networking Description
In Figure 1-509, DCI-PE-GWs function as both data center gateways and MPLS PEs.
DCI-PE-GWs directly connect to data center devices and the Ps on the DCI backbone
network. To allow intra-data center VM communication, a VXLAN tunnel must be
established within each DC. To allow inter-data center VM communication, BGP/MPLS IP
VPN or EVPN functions must be deployed on the DCI backbone network, and EVPN and
VXLAN tunnels must be deployed between DCI-PE-GWs and data center devices to transmit
VM host IP routes or MAC routes.
Feature Deployment
IP and an IGP are deployed on the carrier network to ensure reachability between
DCI-PE-GWs at the network layer.
MPLS is deployed on the carrier network, and an LDP LSP or TE LSP is established
between the DCI-PE-GWs.
On DCI-PE-GWs, BGP/MPLS IP VPN functions are enabled to bear VM host routes, or
EVPN functions are enabled to bear VM MAC routes.
EVPN is deployed on each DCI-PE-GW as a control plane protocol to dynamically
establish a VXLAN tunnel to the connected device in the data center.
Terms
Term Definition
EVC Ethernet Virtual Connection, a model defined by the
Metro Ethernet Forum (MEF) and used to transmit
Ethernet services on metropolitan transport networks.
An EVC is a model, rather than a specific service or
technique.
1.7.13.1 Introduction
Definition
Proactive loop detection detects and eliminates Layer 2 network loops. When a device's
Ethernet or Eth-Trunk interface physically goes Up or an interface is bound to a VSI, the
device proactively detects and eliminates loops, if any.
Purpose
If a device's Ethernet or Eth-Trunk interface goes Up due to a misoperation, or an interface is
bound to a VSI, and the interface incurs a loop, services on the device may be interrupted or
the device may even escape NMS control. To resolve this loop problem and ensure normal
device operation, Huawei developed proactive loop detection upon interface Up. This feature
allows a device's interface to proactively send loop detection packets. If the interface detects a
loop, the device blocks the interface.
1.7.13.2 Principles
1.7.13.2.1 Proactive Loop Detection
Triggering Condition
Interface going Up
If an Ethernet interface, Ethernet trunk interface, or a specified Ethernet trunk member
interface physically goes Up, the proactive loop detection function is triggered to detect
whether the Ethernet interface, all members of the Ethernet trunk interface, or the
specified Ethernet trunk member has a loop. If they have a loop, this function sets them
to Down. Note that if an Ethernet interface goes Down, its associated sub-interfaces also
go Down.
Interface bound to a VSI
If an Ethernet interface, Ethernet sub-interface, Ethernet trunk interface, or Ethernet
trunk sub-interface is bound to a VSI, proactive loop detection is triggered on the
interface to detect whether it has a loop.
Detection Principles
When a device's Ethernet or Eth-Trunk interface goes Up or an interface is bound to a VSI,
the interface proactively sends a loop detection packet. If the device receives the loop
detection packet sent through a VPLS domain within the configured period, a loop occurs on
the network. In this case, the device blocks the interface sending the loop detection packet and
reports an alarm.
On the network shown in Figure 1-510, AC1 sends a loop detection packet. If AC2 receives
this packet within a loop detection period, a loop occurs on the network.
Proactive loop detection applies only to VPLS scenarios, not VLAN scenarios.
It is recommended that you disable this function on properly running devices. If you have to
use this function to detect whether links operate properly during site deployment, be sure to
disable this function after this stage.
In Figure 1-511, t1 = t2 = t3 = 3s, and t4 = t5 = 10s. AC2 processes only the most recently
sent loop detection packet. For example, if AC1 sends a loop detection packet at second a,
AC2 checks, upon receiving a packet, whether that packet is the one AC1 sent at second a.
If so, a loop occurs. The device then sets the link layer protocol of AC1 to Down and
reports an alarm to the NMS.
If not, the device ignores the packet and waits for the next loop detection packet.
A loop detection packet carries at most two VLAN tags (for example, in a QinQ VLAN tag termination
scenario).
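The timestamp-matching check above can be sketched as follows. This is a minimal illustration, not device firmware; the class name, fields, and the single-object modeling of both AC endpoints are assumptions.

```python
# Hypothetical sketch of the proactive loop detection check: only the most
# recently sent probe counts as evidence of a loop.

DETECT_PERIOD = 3  # seconds between loop detection packets (t1 = t2 = t3 = 3s)

class LoopDetector:
    def __init__(self):
        self.last_sent_at = None   # send time of the most recent probe from AC1
        self.blocked = False

    def send_probe(self, now):
        """AC1 side: send a loop detection packet stamped with the send time."""
        self.last_sent_at = now
        return {"type": "loop-detect", "sent_at": now}

    def on_receive(self, packet):
        """AC2 side: a loop exists only if the received probe is the latest one."""
        if packet["type"] != "loop-detect":
            return False
        if packet["sent_at"] == self.last_sent_at:
            # The probe came back through the VPLS domain: loop detected.
            self.blocked = True      # set the AC's link layer protocol to Down
            return True              # and report an alarm to the NMS
        return False                 # stale probe: keep waiting

detector = LoopDetector()
probe = detector.send_probe(now=0)
# feeding the probe back models AC2 receiving it through the VPLS domain
assert detector.on_receive(probe) is True and detector.blocked
```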
1.7.13.3 Application
1.7.13.3.1 AC Interface Receiving a Loop Detection Packet
In Figure 1-513, PE1's AC1 is an Ethernet interface. After AC1 goes physically Up, it
proactively sends a loop detection packet. If AC2 receives this packet, a loop occurs on the
network. PE1 then sets the link layer protocol of AC1 to Down and reports an alarm to the
NMS. This mechanism prevents AC1 from sending or receiving any packets.
In Figure 1-514, PE1's AC interface is an Ethernet interface. After the AC interface goes
physically Up, it proactively sends a loop detection packet. If this packet loops back to PE1
through Switch, PE2, and the PW between PE1 and PE2, a loop occurs on the network. PE1
then sets the link layer protocol of the AC interface to Down and reports an alarm to the NMS.
This mechanism prevents the AC interface from sending or receiving any packets.
1.7.14 RRPP
1.7.14.1 Principles
1.7.14.1.1 Basic Concepts
Basic Concepts
Ethernet devices can be configured as nodes with different roles on an RRPP ring. RRPP ring
nodes exchange and process RRPP packets to detect the status of the ring network and
communicate any topology changes throughout the network. The master node on the ring
blocks or unblocks its secondary port depending on the status of the ring network. If a device
or link on the ring network fails, the backup link is activated immediately to ensure
uninterrupted services.
RRPP ring
An RRPP ring consists of interconnected devices configured with the same control
VLAN. An RRPP network has a major ring and sub-rings. Sub-ring protocol packets are
transmitted through the major ring as data packets; major ring protocol packets are
transmitted only within the major ring.
Control VLAN
The control VLAN is a concept relative to the data VLAN. In an RRPP ring, a control
VLAN is used to transmit only RRPP packets, whereas a data VLAN is used to transmit
data packets.
Node type
Master node: The master node determines how to handle topology changes. Each RRPP
ring has exactly one master node. Any device on the Ethernet ring can serve as the
master node.
Transit node: On an RRPP ring, all nodes except the master node are transit nodes. Each
transit node monitors the status of its directly connected RRPP link and notifies the
master node of any changes in link status.
Edge node and assistant edge node: A device can serve as an edge node or assistant
edge node on a sub-ring while serving as a transit node on the major ring. On an RRPP
sub-ring, either of the two nodes at which the sub-ring intersects the major ring can be
specified as the edge node; the other node then becomes the assistant edge node. Each
sub-ring has exactly one edge node and one assistant edge node.
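The master node's block/unblock behavior described above can be sketched as a small state machine. The class, state names, and the boolean notification interface are illustrative assumptions, not an actual RRPP implementation.

```python
# Illustrative sketch of the RRPP master node: block the secondary port while
# the ring is complete (to break the loop), unblock it when the ring fails
# (to activate the backup link).

class MasterNode:
    def __init__(self):
        self.ring_state = "complete"       # ring health as seen by the master
        self.secondary_port_blocked = True  # loop broken on a healthy ring

    def on_link_change(self, ring_ok):
        """Transit nodes notify the master of changes in link status."""
        if ring_ok:
            # Ring restored: block the secondary port again to eliminate the loop.
            self.ring_state = "complete"
            self.secondary_port_blocked = True
        else:
            # Ring failed: unblock the secondary port so the backup link
            # carries traffic.
            self.ring_state = "failed"
            self.secondary_port_blocked = False

master = MasterNode()
master.on_link_change(ring_ok=False)  # a device or link on the ring fails
# the secondary port is now unblocked, activating the backup path
```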
RRPP Packets
− MAJOR-FAULT = 0x0b
SYSTEM_MAC_ADDR: indicates the bridge MAC address from which the packet is
sent. This field occupies 48 bits.
RRPP snooping is enabled on the sub-interface or VLANIF interface of NPE D and associated
with other VSIs on the local device. When the RRPP ring fails, NPE D on the VPLS network
clears the forwarding entries of the VSIs (including the associated VSIs) on the local node and
the forwarding entries of the remote NPE B to re-learn forwarding entries. This ensures that
traffic can be switched to a normal path and downstream traffic is forwarded normally.
As shown in Figure 1-517, when the link between NPE D and UPE A fails, the RRPP master
node UPE A sends a COMMON-FLUSH-FDB packet to notify the transit nodes on the RRPP
ring to clear their MAC address tables.
Figure 1-517 Association between RRPP and VPLS (RRPP ring fault)
Because NPE D cannot process the COMMON-FLUSH-FDB packet, its original MAC
address table is not cleared. If a downstream data packet destined for UPE A arrives, NPE D
sends it to UPE A along the original path, interrupting downstream traffic between NPE D
and UPE A. After UPE B clears its MAC address table, an upstream data packet sent by UPE
A is regarded as an unknown unicast packet and is forwarded to the VPLS network along the
path UPE A->UPE B->NPE D. After re-learning the MAC address, NPE D can correctly
forward the downstream traffic destined for UPE A.
When the RRPP ring recovers from the fault, UPE A, the master node, sends a
COMPLETE-FLUSH-FDB packet to notify the transit nodes to clear their MAC address
tables. The downstream traffic between NPE D and UPE A is interrupted because NPE D
cannot process the COMPLETE-FLUSH-FDB packet.
As shown in Figure 1-518, after RRPP snooping is enabled on sub-interfaces GE 1/0/0.100
and GE 2/0/0.100 of NPE D, NPE D can process the COMMON-FLUSH-FDB or
COMPLETE-FLUSH-FDB packet.
Figure 1-518 Association between RRPP and VPLS (enabling the RRPP snooping)
When the RRPP ring topology changes and NPE D receives the COMMON-FLUSH-FDB or
COMPLETE-FLUSH-FDB packet from the master node UPE A, NPE D clears the MAC
address table of the VSI associated with sub-interfaces GE 1/0/0.100 and GE 2/0/0.100 and
then notifies other NPEs in this VSI to clear their MAC address tables also.
If a downstream data packet destined for UPE A arrives, NPE D cannot find a matching MAC
address entry, so the packet is regarded as an unknown unicast packet, broadcast in the VSI,
and sent to UPE A along the path NPE D->UPE B->UPE A. This ensures continuity of
downstream traffic.
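The flush handling above can be sketched as follows. The data structures (`vsi_mac_tables`, the associated-VSI list, and the remote-notification callback) are hypothetical illustrations of the behavior, not device internals.

```python
# Hedged sketch of RRPP snooping on an NPE: on a flush packet from the RRPP
# master, clear the MAC tables of the snooping VSI and its associated VSIs,
# then tell remote NPEs in the VSI to clear theirs as well.

FLUSH_PACKETS = {"COMMON-FLUSH-FDB", "COMPLETE-FLUSH-FDB"}

def handle_rrpp_packet(packet_type, vsi_mac_tables, snooping_vsi,
                       associated_vsis, notify_remote):
    """Return True if the packet triggered a MAC table flush."""
    if packet_type not in FLUSH_PACKETS:
        return False
    for vsi in [snooping_vsi, *associated_vsis]:
        vsi_mac_tables[vsi].clear()       # clear local forwarding entries
    notify_remote(snooping_vsi)           # e.g. trigger remote MAC withdrawal
    return True

tables = {"vsi1": {"aa:bb": "pw1"}, "vsi2": {"cc:dd": "pw2"}}
notified = []
handle_rrpp_packet("COMMON-FLUSH-FDB", tables, "vsi1", ["vsi2"], notified.append)
# tables for vsi1 and vsi2 are now empty; remote NPEs in vsi1 were notified
```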
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
Network planning engineers
Commissioning engineers
Data configuration engineers
System maintenance engineers
Security Declaration
Encryption algorithm declaration
The encryption algorithms DES/3DES/SKIPJACK/RC2/RSA (RSA-1024 or
lower)/MD2/MD4/MD5 (in digital signature scenarios and password encryption)/SHA1
(in digital signature scenarios) have low security and may bring security risks. If the
protocols allow, using more secure encryption algorithms, such as AES/RSA
(RSA-2048 or higher)/SHA2/HMAC-SHA2, is recommended.
Password configuration declaration
− Do not set both the start and end characters of a password to "%^%#". Otherwise,
the password is displayed directly in the configuration file.
− To further improve device security, periodically change the password.
Personal data declaration
Your purchased products, services, or features may use some of users' personal data during
service operation or fault locating. You must define user privacy policies in compliance
with local laws and take proper measures to fully protect personal data.
Feature declaration
− The NetStream feature may be used to analyze the communication information of
terminal customers for network traffic statistics and management purposes. Before
enabling the NetStream feature, ensure that it is performed within the boundaries
permitted by applicable laws and regulations. Effective measures must be taken to
ensure that information is securely protected.
− The mirroring feature may be used to analyze the communication information of
terminal customers for a maintenance purpose. Before enabling the mirroring
function, ensure that it is performed within the boundaries permitted by applicable
laws and regulations. Effective measures must be taken to ensure that information is
securely protected.
− The packet header obtaining feature may be used to collect or store some
communication information about specific customers for transmission fault and
error detection purposes. Huawei cannot offer services to collect or store this
information unilaterally. Before enabling the function, ensure that it is performed
within the boundaries permitted by applicable laws and regulations.
Special Declaration
This document serves only as a guide. The content is written based on device
information gathered under lab conditions. The content provided by this document is
intended to be taken as general guidance, and does not cover all scenarios. The content
provided by this document may be different from the information on user device
interfaces due to factors such as version upgrades and differences in device models,
board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are
beyond the scope of this document.
The maximum values provided in this document are obtained in specific lab
environments (for example, only a certain type of board or protocol is configured on a
tested device). The actually obtained maximum values may be different from the
maximum values provided in this document due to factors such as differences in
hardware configurations and carried services.
Interface numbers used in this document are examples. Use the existing interface
numbers on devices for configuration.
The pictures of hardware in this document are for reference only.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
Indicates an imminently hazardous situation which, if not
avoided, will result in death or serious injury.
Indicates a potentially hazardous situation which, if not avoided, may result in minor or
moderate personal injury, equipment damage, and environment deterioration.
Change History
Updates between document issues are cumulative. Therefore, the latest document issue
contains all updates made in previous issues.
Changes in Issue 03 (2017-09-20)
This issue is the third official release. The software version of this issue is
V800R009C10SPC200.
Changes in Issue 02 (2017-07-30)
This issue is the second official release. The software version of this issue is
V800R009C10SPC100.
Changes in Issue 01 (2017-05-30)
This issue is the first official release. The software version of this issue is
V800R009C10.
Definition
IMA is the acronym for Inverse Multiplexing for ATM. The general idea of IMA is that the
sender schedules and distributes a high-speed ATM cell stream across multiple low-speed
physical links for transmission, and the receiver then reassembles the stream fragments
into one cell stream and submits it to the ATM layer. In this manner, bandwidth is
multiplexed flexibly, improving the efficiency of bandwidth usage.
Purpose
When users access an ATM network at a rate between T1 and T3 or between E1 and E3, using
T3 or E3 lines is cost-ineffective for carriers. Using multiple T1 or E1 lines is more flexible
and efficient. IMA allows a network designer and administrator to use multiple T1 or E1 lines,
not the expensive T3 or E3 lines, to implement ATM access.
Benefits
IMA has the following advantages:
Provides a rate that is lower than the T3/E3 rate but higher than the T1/E1 rate.
Maintains the order of cells, which facilitates ATM management.
IMA provides the following benefits for carriers:
Network construction and maintenance cost less.
Networks can be expanded flexibly, and bandwidth usage is more efficient.
1.8.2.2 Principles
1.8.2.2.1 Basic IMA Principles
IMA performs inverse multiplexing of an ATM cell flow over multiple physical links and
restores the original cell flow from these physical links at the remote end. The ATM cell flow
is multiplexed onto the physical links on a per-cell basis. To understand the IMA feature, you
need to learn the following basic concepts.
Basic Concepts
IMA group
An IMA group can be considered a logical link that aggregates several low-speed
physical links (member links) to provide higher bandwidth. The rate of the logical link is
approximately the sum of the rates of the member links in the IMA group.
Minimum number of active links
This is the minimum number of active links required for the IMA group to enter the
Operational state. Link faults may cause the number of active links of an Operational
IMA group to fall below the configured minimum, in which case the IMA group status
changes and IMA may go Down. The two communicating devices can be configured
with different minimum numbers of active links, but each device must have at least its
configured minimum number of active links to properly send ATM cells.
ICP cell
ICP is short for IMA Control Protocol. ICP cells are a type of IMA negotiation cells,
used mainly to synchronize frames and transmit control information (such as the IMA
version, IMA frame length, and peer mode) between communicating devices. The offset
of ICP cells in IMA frames on a link is fixed. Like common cells, ICP cells consist of a
5-byte header and 48-byte payload.
Filler cell
In the ATM model without an IMA sub-layer, decoupling of cell rates is implemented by
Idle cells at the Transmission Convergence (TC) sub-layer. After the IMA sub-layer is
adopted, decoupling of cell rates can no longer be implemented at the TC sub-layer due
to frame synchronization. Therefore, Filler cells are defined at the IMA sub-layer to
implement decoupling of cell rates. If there is no ATM cell to be sent, the sender sends
Filler cells so that the physical layer transmits cells at a fixed rate. These filler cells are
discarded at the IMA receiving end.
Differential delay
Links in an IMA group may have different delays and jitters. If the delay difference
between the fastest and slowest links in an IMA group exceeds the configured
differential delay, the IMA group removes the link with the longest delay from the
cyclical sending queue and informs the peer that the link is unavailable by sending
ICP cells. After negotiation between the two ends of the link, the link becomes active
again and rejoins the cyclical sending queue of the IMA group.
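The minimum-active-links and differential-delay rules above can be sketched together as follows. The function names, example delay values, and threshold handling are illustrative assumptions, not the actual IMA group state machine.

```python
# Sketch of two IMA group checks: (1) the group stays Operational only while
# enough member links are active; (2) links whose delay spread exceeds the
# configured differential delay are removed from the cyclical sending queue.

def group_operational(active_links, min_active_links):
    """Operational-state check against the configured minimum."""
    return len(active_links) >= min_active_links

def enforce_differential_delay(link_delays_ms, max_diff_ms):
    """Remove the slowest link while the delay spread exceeds the limit."""
    links = dict(link_delays_ms)
    while links and max(links.values()) - min(links.values()) > max_diff_ms:
        slowest = max(links, key=links.get)
        del links[slowest]          # peer is informed the link is unavailable
    return links                    # links remaining in the sending queue

delays = {"e1-0": 2, "e1-1": 3, "e1-2": 40}   # ms, hypothetical member links
remaining = enforce_differential_delay(delays, max_diff_ms=25)
# "e1-2" is removed; the group stays Operational if min_active_links <= 2
```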
Table 1-136 Features supported by ATM IMA and their usage scenarios
Feature: IMA
Description: IMA divides one higher-speed transmission channel into two or more
lower-speed channels and transports an ATM cell stream across these lower-speed
channels. At the far end, IMA groups these lower-speed channels and reassembles the
cells to recover the original ATM cell stream. An IMA group can be considered a
logical link that aggregates several physical low-speed links (member links) to provide
higher bandwidth. The rate of the logical link is approximately the sum of the rates of
the member links in the IMA group.
Usage Scenario: When users access an ATM network at a rate between T1 and T3 or
between E1 and E3, using T3 or E3 lines is cost-ineffective for carriers. In this scenario,
IMA can be used. IMA transports ATM traffic over bundled low-speed T1 or E1 lines.
It allows a network designer and administrator to use these T1 or E1 lines, not the
expensive T3 or E3 lines, to implement ATM access.
Principles
Figure 1-519 shows inverse multiplexing and de-multiplexing of ATM cells in an IMA group.
The sending end: In the sending direction, IMA receives ATM cells from the ATM layer
and places them in circular order onto member links of the IMA group.
The receiving end: After reaching the receiving end, these cells are reassembled into the
original cell flow and transmitted onto the ATM layer. The IMA process is transparent to
the ATM layer.
Figure 1-519 Inverse multiplexing and de-multiplexing of ATM cells in an IMA group
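The per-cell round-robin scheme above can be sketched in a few lines. This is a minimal illustration of the circular placement and reassembly; real IMA framing (ICP cells, filler cells, frame synchronization) is omitted.

```python
# Minimal sketch of per-cell inverse multiplexing: the sender places cells onto
# member links in circular order; the receiver reads them back in the same
# order to rebuild the original cell flow.

def distribute(cells, num_links):
    """Sending end: round-robin cells onto member links of the IMA group."""
    links = [[] for _ in range(num_links)]
    for i, cell in enumerate(cells):
        links[i % num_links].append(cell)
    return links

def reassemble(links):
    """Receiving end: read links in the same circular order."""
    cells = []
    longest = max(len(link) for link in links)
    for i in range(longest):
        for link in links:
            if i < len(link):
                cells.append(link[i])
    return cells

cells = [f"cell{i}" for i in range(7)]
links = distribute(cells, 3)
assert reassemble(links) == cells   # the IMA process is transparent to the ATM layer
```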
1.8.2.3 Applications
1.8.2.3.1 ATM IMA Applications on an L2VPN
As shown in Figure 1-521, after ATM services from NodeA are converged at the E1 or T1
interface on PE1, ATM cells are encapsulated into PSN packets that can be transmitted over
PSNs. After arriving at the downstream PE2, the PSN packets are decapsulated into the
original ATM cells, which are then sent to the 3G radio network controller (RNC). In this
solution, services of multiple types are converged at a PE on a PSN. This improves the
usage efficiency of existing network resources, reduces Plesiochronous Digital Hierarchy
(PDH) VLLs, and facilitates the deployment of new sites as well as the maintenance and
management of multiple services.
Term
None.
Acronym/Abbreviation
Acronym/Abbreviation Full Spelling
FMC Fixed-Mobile Convergence
IMA Inverse Multiplexing for ATM
AN Access Node
PSN Packet Switched Network
IP RAN IP Radio Access Network
PWE3 Pseudo-Wire Emulation Edge-to-Edge
PW Pseudo Wire
QoS Quality of Service
1.8.3 ATM
This chapter describes the basic concepts, principles, and applications of Asynchronous
Transfer Mode (ATM) interface and protocol.
1.8.3.1 Introduction
Definition
ATM was designated as the transmission and switching mode for Broadband Integrated
Services Digital Networks (B-ISDN) by the ITU-T in June 1992. Due to its high flexibility
and support for multimedia services, ATM was considered the key to realizing broadband
communications.
Defined by the ITU-T, ATM implements transmission, multiplexing, and switching of data
based on cells. ATM is a cell-based and connection-oriented multiplexing and switching
technology.
An ATM cell has a fixed length of 53 bytes. As defined by the ITU-T, ATM transmits,
multiplexes, and switches data based on cells. For example, voice, video, and data messages
are all transmitted in cells of this fixed length, which ensures fast data transmission.
Purpose
ATM provides the network with a versatile and connection-oriented transfer mode that applies
to different services.
Before the advent of Gigabit Ethernet, ATM backbone switches were widely used on
backbone networks to ensure high bandwidth. ATM dominated among network technologies
because it could provide good QoS and transmit voice, data, and video with high bandwidth.
Nevertheless, the initial roadmap for ATM, which aimed to solve all network communication
issues, was too ambitious and idealistic. As a result, the ATM implementation became complicated.
The perfection ATM aimed for and the complexity of its architecture made the ATM system
difficult to develop, configure, manage, and troubleshoot. In addition, ATM network devices
are expensive, so ATM networks remained unaffordable for most users and ATM's
performance advantages went largely unrecognized.
In the late 1990s, the Internet and IP technology overshadowed ATM thanks to their
simplicity and flexibility, and they developed rapidly in practical applications. This dealt a
severe blow to the B-ISDN plan.
ATM is, however, still regarded as the best transmission technology for B-ISDN because of
its advantages in transporting integrated services. Therefore, IP technology was integrated
with ATM, ushering in a new era of constructing broadband networks through the
integration of the IP and ATM technologies.
1.8.3.3 Principles
1.8.3.3.1 ATM Protocol Architecture
AAL: The ATM adaptation layer (AAL) works with the ATM layer and is similar to the
data link layer of the OSI reference model. AAL is mainly responsible for isolating the
upper-layer protocols from the ATM layer. It prepares data for conversion into cells and
divides the data into 48-byte cell payloads.
Upper layer: It receives data, divides it into packets, and transmits it to AAL for
processing.
Each layer is further divided into several sub-layers.
The comparison between the ATM protocol architecture and the OSI reference model is
shown in Figure 1-523.
Figure 1-523 Comparison between the ATM protocol architecture and the OSI reference model
Table 1-137 Functions of layers and sub-layers in the ATM reference model
The detailed functions of layers and sub-layers in the ATM reference model are described in
the following sections.
Table 1-138 Comparison between the common transmission rates of SONET and SDH
− The user layer lies at the top of the SONET physical layer.
− The transmission channel layer, digital line layer, and segment regeneration layer
are three sub-layer entities of the SONET physical layer.
The transmission channel layer is mainly responsible for assembling and
disassembling cells for SONET frame signals.
The digital line layer adds the packet header (such as system overhead) and
performs multiplexing.
The segment regeneration layer includes the segment layer and photon layer.
After data arrives at the segment regeneration layer, the segment layer appends
a segment header, encapsulates the data in a frame, and transmits this frame to
the photon layer. The photon layer then sends the frame after converting the
electrical signals into optical signals.
The frame format of the STS-3c that bears ATM cells is shown in Figure 1-525.
The direct mapping mode is more efficient than the PLCP mode and can support up
to 44.21 Mbit/s.
Similar to DS-3, E3 adopts two technologies: PLCP and direct mapping into E3.
Compared with DS-3 PLCP, E3 PLCP has the following differences:
− It adopts the G.751 format, and inserts the tail used to synchronize E3 after every
nine cells.
− Its tail length ranges from 18 to 20 bytes, and that of DS-3 PLCP ranges from 6.5 to
7 bytes.
ATM cells are directly mapped into E3 frames in the G.832 standard.
ATM cells are directly mapped into a 530-byte payload, with the system overhead
occupying 7 bytes.
ATM IMA
1.8.2 ATM IMA describes the principles of ATM IMA.
ATM Bundling
ATM bundling is an extended ATM PWE3 application and is applicable to IP RAN networks.
On the network shown in Figure 1-528, NodeBs are connected to a cell site gateway (CSG)
over ATM links. Each NodeB may transmit both voice and data services. Configuring a
PWE3 PW for each service on every NodeB connected to a radio network controller (RNC)
would place a heavy burden on the CSG. Bundling physical links into one PW to transmit the
same type of service from different NodeBs to the RNC relieves the burden on the CSG and
provides service scalability.
ATM bundling is an ATM PWE3 extension that provides logical ATM bundle interfaces.
PWE3 PWs are established on ATM bundle interfaces, and PVCs are configured on serial
sub-interfaces (with ATM specified as the link layer protocol). After the serial sub-interfaces
join an ATM bundle interface, the PVCs on these sub-interfaces are mapped to specified PWs.
This reduces the number of PWs and the system burden. ATM bundle interfaces forward
traffic as follows:
1. After receiving user traffic through a PVC of an ATM bundle member interface on a
CSG, the CSG forwards user traffic to a PW to which the PVC is mapped.
2. After receiving traffic from an RNC, the CSG maps traffic to specific ATM bundle
member interfaces based on PVCs and these ATM bundle member interfaces forward
traffic to specific nodeBs.
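The two forwarding directions above can be sketched as a pair of lookups. The interface names, VPI/VCI values, and PW labels are hypothetical examples, not device configuration.

```python
# Illustrative sketch of ATM bundling on a CSG: PVCs on bundle member (serial)
# sub-interfaces are mapped to PWs on the ATM bundle interface, and the
# reverse mapping restores the member interface for downstream traffic.

# (member sub-interface, (vpi, vci)) -> PW; built when sub-interfaces join the bundle
pvc_to_pw = {("serial1/0/0.1", (1, 100)): "pw-voice",
             ("serial1/0/0.2", (1, 200)): "pw-data"}
pw_to_pvc = {pw: pvc for pvc, pw in pvc_to_pw.items()}

def upstream(member_if, vpi_vci):
    """NodeB -> CSG: forward traffic from a member PVC onto its mapped PW."""
    return pvc_to_pw[(member_if, vpi_vci)]

def downstream(pw):
    """RNC -> CSG: map PW traffic back to the member interface and PVC."""
    return pw_to_pvc[pw]

assert upstream("serial1/0/0.1", (1, 100)) == "pw-voice"
assert downstream("pw-data") == ("serial1/0/0.2", (1, 200))
```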
User-to-Network Interface
The UNI defines the interfaces between the peripheral devices and ATM switches.
Depending on whether the switches are owned by clients or operators, UNIs can be
divided into public UNIs and private UNIs.
A private UNI connects an ATM peripheral device to a switch on a private ATM network
and is used inside the private ATM network. A public UNI connects an ATM peripheral
device or a private ATM switch to a public ATM switch.
Network-to-Network Interface
The NNI refers to the interfaces between ATM switches.
Depending on whether the switches are owned by clients or operators, NNIs can be
divided into two types: public NNIs and private NNIs.
A private NNI connects two switches on the same private ATM network and is used
inside that network. A public NNI connects two ATM switches of the same public
network carrier and is used within one ATM service provider's network.
B-ISDN Inter Carrier Interface
A B-ISDN Inter Carrier Interface (B-ICI) is connected to the public switches of different
network carriers and provides internal connections to multiple ATM network carriers.
B-ICIs are directly connected to NNIs.
Figure 1-529 shows the connections between various ATM network interfaces.
Figure 1-529 ATM network interfaces of the private and public networks
VPs are used to adapt to high-speed networks in which network control costs are increasing.
The VP technology reduces control costs by bundling connections that share the same path
across the network into a single unit. Network management then needs to process only a
smaller number of bundled connections rather than a large number of independent connections.
In ATM communication, an ATM switch transmits received cells to the output interface
according to the VPI/VCI of the incoming cells and the forwarding table generated during
connection setup. At the same time, the ATM switch rewrites the VPI/VCI of each cell to
that of the outgoing interface, completing VP switching or VC switching.
ATM VCs are of the following types: permanent virtual circuit (PVC), switching virtual
circuit (SVC), and soft virtual circuit (soft VC).
The PVC is statically configured by the administrator. Once set up, it remains in place
until it is manually removed. PVCs apply to relatively fixed connections.
The SVC is set up through the signaling protocol and can be established and removed
dynamically.
When a node receives a connection request from another node, it returns a connection
response if the configuration requirements are satisfied. After the connection to this
node is set up, the connection request is sent on to the next target node.
The teardown process is similar to the setup process.
Soft VC indicates that the ATM network is based on SVC, but peripheral devices access
the ATM network in PVC mode.
The setup of soft VCs is similar to that of SVCs. The only difference is that PVCs
must be manually configured between ATM switch interfaces and peripheral devices.
The advantage of this mode is that users connected through PVCs are easy to manage,
while SVCs ensure efficient usage of the links.
In the ATM switching table shown in Figure 1-532, the first line shows that for cells arriving
at the switch with VPI/VCI 4/55, the switch rewrites the cell header VPI/VCI to 8/62 and
sends the cells out through port 3.
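The table lookup and header rewrite can be sketched as follows, using the 4/55 to 8/62 example. The incoming port number (1) and dictionary structure are assumptions for illustration; the document's figure keys the entry on the incoming VPI/VCI.

```python
# Sketch of the VP/VC switching step: look up the incoming (port, VPI, VCI),
# rewrite the cell header, and emit the cell on the output port. The table is
# built during connection setup.

# (in_port, in_vpi, in_vci) -> (out_port, out_vpi, out_vci)
switching_table = {(1, 4, 55): (3, 8, 62)}

def switch_cell(in_port, cell):
    out_port, out_vpi, out_vci = switching_table[(in_port, cell["vpi"], cell["vci"])]
    cell["vpi"], cell["vci"] = out_vpi, out_vci   # rewrite the cell header
    return out_port, cell

port, cell = switch_cell(1, {"vpi": 4, "vci": 55, "payload": b"..."})
assert port == 3 and (cell["vpi"], cell["vci"]) == (8, 62)
```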
The UNI cell header is used for communication between the ATM terminal and switching
nodes on an ATM network.
Figure 1-533 shows the format of a UNI cell header.
The NNI cell header is used for communication between two switching nodes.
Figure 1-534 shows the NNI cell header format.
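The UNI header layout (4-bit GFC, 8-bit VPI, 16-bit VCI, 3-bit PTI, 1-bit CLP, 8-bit HEC, totaling 5 bytes) can be sketched as a pack/parse pair. The HEC is left as zero here; computing the real CRC-8 header checksum is out of scope for this sketch.

```python
# Sketch packing and parsing the 5-byte UNI cell header. Bit positions follow
# the standard field order: GFC | VPI | VCI | PTI | CLP | HEC.

def pack_uni_header(gfc, vpi, vci, pti, clp, hec=0):
    bits = (gfc << 36) | (vpi << 28) | (vci << 12) | (pti << 9) | (clp << 8) | hec
    return bits.to_bytes(5, "big")

def parse_uni_header(header):
    bits = int.from_bytes(header, "big")
    return {"gfc": (bits >> 36) & 0xF, "vpi": (bits >> 28) & 0xFF,
            "vci": (bits >> 12) & 0xFFFF, "pti": (bits >> 9) & 0x7,
            "clp": (bits >> 8) & 0x1, "hec": bits & 0xFF}

hdr = pack_uni_header(gfc=0, vpi=4, vci=55, pti=0, clp=0)
assert parse_uni_header(hdr)["vci"] == 55
```

For the NNI header, the GFC field is replaced by four extra VPI bits (12-bit VPI), so an analogous pair of functions would shift the VPI up to bit 28 with a 12-bit mask.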
Unassigned cell: Its VPI is 0, VCI is 0, PTI can be any value, and CLP is 1.
OAM cell: For the VP sub-layer, its VCI is 3 and it is used for the VP link. When VCI is
4, it is used for the VP connection. For the VC sub-layer, it is used for the VC link when
PTI is 4. When PTI is 5, it is used for the VC connection.
Signaling cell: It is divided into the following types:
− Component signaling cell: Its VPI can be any value, and VCI is 1.
− General broadcast signaling cell: Its VPI can be any value, and VCI is 2.
− Point-to-point (P2P) signaling cell: Its VPI can be any value, and VCI is 5.
Payload type: Its length is 3 bits. It is used to identify the information field, that is, the
payload type. The following lists the PT values and corresponding meanings defined by
the ITU-T I.361.
− PT = 000: indicates that the data cell does not experience congestion and ATM user
to user (AUU) is 0.
− PT = 001: indicates that the data cell does not experience congestion and AUU is 1.
− PT = 010: indicates that the data cell experiences congestion and AUU is 0.
− PT = 011: indicates that the data cell experiences congestion and AUU is 1.
− PT = 100: indicates the cells related to the OAM F5 segment.
− PT = 101: indicates the OAM F5 end-to-end cells.
− PT = 110: indicates the resource management cells.
− PT = 111: This PT is for future use.
When cells are used to carry data:
The first bit of PT is 0.
The second bit identifies whether the cell has experienced congestion and can be set by a
network node when congestion occurs.
The third bit is an AUU indicator. AUU = 0 indicates that the corresponding SAR-PDU
is the beginning segment or intermediate segment. AUU = 1 indicates that SAR-PDU is
the ending segment.
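The bit meanings above for data cells can be sketched as a small decoder. The function name and return convention are illustrative.

```python
# Sketch decoding the 3-bit PT field of a data cell: the first bit is 0 for
# data, the second bit is the congestion indication, and the third bit is the
# AUU indicator (1 = ending segment of the SAR-PDU).

def decode_data_pt(pt):
    assert pt & 0b100 == 0, "not a data cell"
    congested = bool(pt & 0b010)      # set by a network node under congestion
    last_segment = bool(pt & 0b001)   # AUU = 1: ending segment
    return congested, last_segment

assert decode_data_pt(0b000) == (False, False)   # PT = 000
assert decode_data_pt(0b011) == (True, True)     # PT = 011: congested, AUU = 1
```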
ATM OAM
Overview of OAM
According to different protocols, OAM has two different definitions.
− OAM: Operation And Maintenance (ITU-T I.610 02/99)
− OAM: Operation Administration and Maintenance (LUCENT APC User Manual,
03/99)
OAM offers a mechanism to detect and locate faults and to verify network performance
without interrupting services. By inserting OAM cells with a standard structure into the
user cell flow, specific information can be provided.
ATM OAM Supported by NE20E
Currently, on Huawei NE20Es, OAM mainly checks the connectivity of PVCs.
The OAM process is as follows:
a. The two ends simultaneously send OAM cells to their peers at a specified interval.
b. If the peer replies after receiving an OAM cell, the link is normal. If the local
timer detects that the OAM cell has timed out, the local port considers the link
faulty.
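The timer-driven check above can be sketched as follows. The class name, interval, and timeout values are illustrative assumptions, not NE20E configuration defaults.

```python
# Hedged sketch of the OAM PVC connectivity check: both ends send OAM cells at
# a fixed interval; a reply refreshes the timer, and a timeout marks the link
# as failed.

SEND_INTERVAL = 1      # seconds between OAM cells (assumed)
TIMEOUT = 5            # seconds without a reply before declaring failure (assumed)

class OamMonitor:
    def __init__(self):
        self.last_reply_at = 0.0
        self.link_up = True

    def on_reply(self, now):
        """Peer replied to our OAM cell: the PVC is reachable."""
        self.last_reply_at = now
        self.link_up = True

    def tick(self, now):
        """Called each send interval; declare the link down on timeout."""
        if now - self.last_reply_at > TIMEOUT:
            self.link_up = False
        return self.link_up

mon = OamMonitor()
mon.on_reply(now=2.0)
assert mon.tick(now=4.0) is True     # reply within the timeout window
assert mon.tick(now=9.0) is False    # no reply for > 5s: link considered failed
```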
OAM functions can vary with different chips. Main OAM functions are as follows.
error detection. Frames are padded so that their length becomes an integer multiple of the
48-byte payload.
Segmentation and Reassembly
When peripheral devices send data, segmentation and reassembly (SAR) is used to
divide aggregation frames into 48-byte payloads. When peripheral devices receive data,
SAR is used to reassemble 48-byte payloads into aggregation frames.
AAL Type
Currently, there are four types of AAL: AAL1, AAL2, AAL3/4, and AAL5. Each type
supports certain specified services on the ATM network. Most ATM equipment
manufacturers adopt AAL5 to support data communication services.
AAL1
AAL1 is used for constant bit rate (CBR), sending data at a fixed interval.
AAL1 uses part of the 48-byte payload to carry additional information, such as the
sequence number (SN) and sequence number protection (SNP). The SN field contains a
1-bit convergence sublayer indication (CSI) and a 3-bit sequence count (SC). The CSI bit
is also used for timing.
AAL2
Compared with AAL1, AAL2 can transmit compressed voice and realize common
channel signaling (CCS) inside ISDN.
Details on AAL2 are defined in ITU-T I.363.2.
AAL2 supports the processing of compressed voice at rates of up to 5.3 kbit/s. This
enables silence detection, suppression, and elimination, as well as CCS. In addition,
higher bandwidth utilization is achieved because voice segments can be encapsulated
into one or multiple ATM cells.
The convergence sublayer (CS) of AAL2 is divided into a common part convergence
sublayer (CPCS) and a service-specific convergence sublayer (SSCS), with the SSCS on
top of the CPCS. The CPCS recognizes the basic structure of AAL2 user data and
performs error checking, data encapsulation, and payload breakdown.
AAL2 allows payloads of variable length to exist in one or multiple ATM cells.
AAL3/4
As the first technology attempting to realize cell relay, AAL3/4 stipulates
connection-oriented and connectionless data transmission.
CPCS is used to detect and process errors, identify the CPCS-service data unit (SDU) to
be transmitted, and determine the length of the CPCS-packet data unit (PDU).
AAL5
AAL5 can also process connection-oriented and connectionless data. AAL5 is called the
simple and efficient adaptation layer. It uses all 48 bytes to carry payload information
and adds no per-cell overhead: the SAR sublayer contains no sequence number and
performs no error detection of its own.
AAL5 SAR sublayer is simple. It divides CPCS-PDUs into 48-byte SAR-PDUs without
any overhead and realizes the reverse function when receiving data.
The CPCS-PDU format of AAL5 CPCS is shown in Figure 1-535.
The length of the CPCS-PDU payload is variable and ranges from 1 to 65535 bytes.
As shown in Figure 1-535, no CPCS-PDU header exists. A CPCS-PDU tail, however,
occupies eight bytes. The meaning of each field in Figure 1-535 is as follows:
− PAD: padding bytes that make the CPCS-PDU length an integer multiple of 48
bytes.
− UU: used for transparent transmission of CPCS user information.
− CPI: used to align the CPCS-PDU trailer to 8 bytes.
− L: indicates the payload length of the CPCS-PDU.
− CRC: protects the CPCS-PDU against errors.
The SSCS of the AAL5 CS is similar to that of AAL3/4, and the CPCS is likewise shared
by upper layers. The CPCS performs error detection, processes errors, pads data to form
48-byte payloads, and discards incomplete received CPCS-PDUs.
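The PAD and trailer rules above can be illustrated with a short sketch that pads a payload, builds the 8-byte trailer, and splits the resulting CPCS-PDU into 48-byte SAR-PDUs. This is an illustrative sketch only: Python's standard CRC-32 is used as a stand-in for the AAL5 CRC-32 (which uses the same polynomial but different bit-ordering conventions per ITU-T I.363.5), and the UU and CPI fields are simply set to 0.

```python
import binascii
import struct

def build_aal5_cpcs_pdu(payload: bytes) -> bytes:
    """Build an AAL5 CPCS-PDU: payload + PAD + 8-byte trailer (UU, CPI, Length, CRC)."""
    # PAD makes (payload + 8-byte trailer) an integer multiple of 48 bytes.
    pad_len = (-(len(payload) + 8)) % 48
    body = payload + b"\x00" * pad_len
    trailer_wo_crc = bytes([0, 0]) + struct.pack(">H", len(payload))  # UU, CPI, Length
    # Stand-in CRC: Python's CRC-32, not the exact I.363.5 computation.
    crc = binascii.crc32(body + trailer_wo_crc) & 0xFFFFFFFF
    return body + trailer_wo_crc + struct.pack(">I", crc)

def segment(cpcs_pdu: bytes):
    """SAR sublayer: split the CPCS-PDU into 48-byte cell payloads with no overhead."""
    return [cpcs_pdu[i:i + 48] for i in range(0, len(cpcs_pdu), 48)]
```

A 5-byte payload, for instance, receives 35 PAD bytes so that payload, PAD, and trailer together fill exactly one 48-byte cell.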
LLC/SNAP Encapsulation
LLC encapsulation is needed when several protocols are carried over the same VC. To ensure
that the receiver properly processes the received AAL5 CPCS-PDU packets, the payload field
must contain information necessary to identify the protocol of the routed or bridged PDU. In
LLC encapsulation, this information is encoded in an LLC header placed in front of the
carried PDU.
There are two types of LLC:
LLC type 1: Unacknowledged connectionless mode
A routed ISO protocol is identified by a 1-byte Network Layer Protocol Identifier
(NLPID) field that is part of the protocol data. NLPID values are administered by ISO
and ITU-T.
An NLPID value of 0x00 is defined in ISO/IEC TR 9577 as the null network layer or
inactive set. Because it has no significance within the context of this encapsulation
scheme, an NLPID value of 0x00 is invalid.
Although IP is not an ISO protocol, IP has an NLPID value of 0xCC. However, this
encapsulation format is rarely used for IP.
The LLC header value 0xAA-AA-03 identifies a SNAP header as defined in IEEE 802.1a.
Figure 1-538 shows the format of a SNAP header.
In the detailed format of an IPv4 PDU, the Ethernet type value is 0x08-00. Figure 1-540
shows the format of the IP PDU.
The AAL5 CPCS-PDU Payload field carrying a bridged PDU must have one of the
following formats.
It is required to add padding after the PID field to align the user information field of the
Ethernet, 802.3, 802.4, 802.5, FDDI, and 802.6 PDUs.
The sequence of a MAC address must be the same as that in the LAN or MAN.
Padding is added to ensure that the length of a frame on the Ethernet/802.3 physical layer
reaches the minimum value. Padding must be added when bridged Ethernet/802.3 PDU
encapsulation with the LAN FCS is used. Otherwise, you do not need to add padding.
When frames without the LAN FCS are received, the bridge must add some padding to
the frames before forwarding the frames to an Ethernet/802.3 subnet.
The common PDU header and trailer are conveyed in sequence at the egress bridge to an
802.6 subnet. Specifically, the common PDU header contains the BAsize field, which
contains the length of the PDU.
If this field is not available to the egress 802.6 bridge, that bridge cannot begin to
transmit the segmented PDU until it has received the entire PDU, calculated the length,
and inserted the length into the BAsize field.
If the field is available, the egress 802.6 bridge can extract the length from the BAsize
field of the Common PDU header, insert it into the corresponding field of the first
segment, and immediately transmit the segment onto the 802.6 subnet.
For the egress 802.6 bridge, you can set the length of the AAL5 CPCS-PDU to 0 to
ignore AAL5 CPCS-PDUs.
VC Multiplexing
In VC-based multiplexing, the VC between two ATM sites implicitly identifies the carried
internetworking protocol. That is, each protocol must be carried over a separate VC.
Therefore, no additional multiplexing information is contained in the payload of each AAL5
CPCS-PDU. This saves bandwidth and reduces processing cost.
VC Multiplexing for Routed Protocols
In VC multiplexing for routed protocols, the Payload field of an AAL5 CPCS-PDU
contains only the routed PDU packet. The format of the PDU packet is shown in Figure
1-546.
1.8.3.4 Application
1.8.3.4.1 IPoA
IP over AAL5 (IPoA) means that AAL5 bears IP packets. That is, IP packets are encapsulated
in ATM cells and transmitted on the ATM network.
Realization
As shown in Figure 1-550, on Device A, PVC 0/40 can reach Device B, and PVC 0/41 can
reach Device C. If IP packets sent to Device B need to be sent from PVC 0/40, the IP address
of Device B must be mapped on PVC 0/40. After address mapping is set up, Device A sets up
a route that reaches the IP address of Device B. The outgoing interface is the interface where
ATM PVC 0/40 resides.
1.8.3.5 Impact
1.8.3.5.1 On System Performance
Terms
Term Description
ATM Recommendation ITU-R F.1499 defines the Asynchronous
Transfer Mode (ATM) as a protocol for the transmission of a
variety of digital signals using uniform 53 byte cells.
Recommendation ITU-R M.1224 defines ATM as a transfer mode
in which information is organized into cells. It is asynchronous in
the sense that the recurrence of cells depends on the required or
instantaneous bit rate. Statistical and deterministic values may also
be used to qualify the transfer mode.
Cell ATM organizes digital data into 53-byte cells and then transmits,
multiplexes, or switches them. An ATM cell consists of 53 bytes.
The first 5 bytes are the cell header, which contains routing and
priority information. The remaining 48 bytes are the payload.
Multi-network PVC A multi-network PVC travels multiple networks. It consists of PVC
segments on different networks.
Sub-interface Sub-interfaces enable one physical interface to provide multiple
logical interfaces. Configuring sub-interfaces on a physical
interface associates these logical interfaces with the physical
interface.
CC Continuity Check
CCITT International Telegraph and Telephone Consultative Committee
CHAP Challenge Handshake Authentication Protocol
CLP Cell Loss Priority
CPCS Common Part Convergence Sublayer
CS Convergence Sublayer
FDDI Fiber Distributed Data Interface
GFC Generic Flow Control
HEC Header Error Control
IPoA Internet Protocol over ATM
ITU-T International Telecommunication Union - Telecommunication Standardization Sector
LLC Logical Link Control
MMF Multi-mode Fiber
NNI Network-to-Network Interface
OAM Operation, Administration and Maintenance
OSI Open System Interconnection
PAP Password Authentication Protocol
PLCP Physical Layer Convergence Protocol
PM Performance Monitoring
PPP Point-to-Point Protocol
PT Payload Type
PTI Payload Type Indicator
PVC Permanent Virtual Circuit
QoS Quality of Service
RDI Remote Defect Indication
SAR Segmentation and Reassembly
SAR-PDU Segmentation and Reassembly-Protocol Data Unit
SDH Synchronous Digital Hierarchy
Definition
Frame Relay (FR) is a Layer 2 packet-switched technology that allows devices to use virtual
circuits (VCs) to communicate on wide area networks (WANs).
Purpose
During the 1990s, rapid network expansion gave rise to the following requirements on
networks:
1. High transmission rate and low delay
2. Bandwidth reservation for traffic bursts
3. Accommodation for diversified intelligent user devices
The traditional methods used to meet the preceding requirements are circuit switching (leased
lines) and X.25 packet switching. However, these two methods have the following
disadvantages:
Circuit switching: Service deployment is costly, link usage efficiency is low, and
transmission of traffic bursts is unsatisfactory.
X.25 packet switching: Switches and service deployment are costly, and because the
X.25 protocol is complicated, the transmission rate is low and the latency high.
FR was therefore introduced to meet such requirements. Unlike circuit switching and X.25
packet switching, FR is highly efficient, cost-effective, reliable, and flexible. With these
advantages, FR became popular in WAN deployment in the 1990s. Table 1-140 compares
circuit switching, X.25 packet switching, and FR.
Table 1-140 Comparison among circuit switching, X.25 packet switching, and FR
Performance Indicator | Circuit Switching | X.25 Packet Switching | FR
Time Division Multiplexing (TDM) | Supported | Not supported | Not supported
VC multiplexing | Not supported | Supported | Supported
Port sharing | Not supported | Supported | Supported
Transparent transmission | Supported | Not supported | Supported
Traffic burst processing | Not supported | Supported | Supported
High throughput | Supported | Not supported | Supported
Transmission rate | Low | Low | Low
Delay | Very short | Long | Short
Cost | High | Medium | Low
Function Description
FR operates at the physical and data link layers of the Open System Interconnection (OSI)
reference model and is independent of upper layer protocols. This simplifies FR service
deployment. Characterized by a short network delay, low deployment costs, and high
bandwidth usage efficiency, FR became a popular communication technology in the early
1990s for WAN applications. FR has the following features:
Transmits data in variable-size units called frames.
Uses VCs instead of physical links to transmit data. Multiple VCs can be multiplexed
over one physical link, which improves bandwidth usage.
Is a streamlined version of X.25 and retains only the core functionality of the link layer,
thereby improving data processing efficiency.
Performs statistical multiplexing, frame transparent transmission, and error check at the
link layer. If FR detects an error, it drops the error frame; FR does not correct the errors.
In this way, FR does not involve frame sequencing, flow control, response, or monitoring
mechanism, and therefore reduces switch deployment costs, improves network
throughput, and shortens communication delay. The access rate of FR users ranges from
64 kbit/s to 2 Mbit/s.
Supports a frame size of at least 1600 bytes, suitable for LAN data encapsulation.
Provides several effective mechanisms for bandwidth management and congestion
control. Besides reserving committed bandwidth resources for users, FR also allows
traffic bursts to occupy available bandwidth, which improves bandwidth usage.
Is a connection-oriented packet-switched technology. It supports two types of circuits:
permanent virtual circuits (PVCs) and switched virtual circuits (SVCs). Currently, only
PVC services are deployed on FR networks.
Benefits
FR offers the following benefits:
Easy deployment. FR can be deployed on X.25 devices after upgrading the device
software; existing applications and hardware require no modification.
Flexible accounting mode. FR is suitable for traffic bursts and requires lower user
communication expenditure.
Dynamic allocation of idle network resources. FR increases carrier returns on
existing investments by utilizing idle network resources.
1.8.4.2 Principles
1.8.4.2.1 Introduction
On an FR network, devices connect to each other over VCs. A VC is a logical connection that
is identified by a data-link connection identifier (DLCI). Multiple VCs form a PVC.
The following describes several concepts involved in FR.
DLCI
DLCIs are used to identify VCs.
A DLCI is valid only on the local interface and its directly connected remote interface, and
enables the remote interface to know to which VC a frame belongs. Because FR VCs are
connection-oriented, the local DLCIs can be considered as FR addresses provided by local
devices.
A user interface on an FR network supports a maximum of 1024 VCs, and the number of
available DLCIs ranges from 16 to 1007.
A PVC is established between two DTEs that are connected through NNIs. VCs are
differentiated by their DLCIs.
VC
A VC is a virtual circuit established between two devices on a packet-switched network. VCs
can be classified as either PVCs or SVCs.
PVCs are manually configured.
SVCs are automatically created and deleted through protocol negotiation.
PVCs are more prevalent on FR networks because few manufacturers of frame relay DCEs support SVC
connections.
1.8.4.2.2 LMI
Introduction
Both a DCE and its connected DTE need to know the PVC status. Local Management
Interface (LMI) is a protocol that uses status enquiry messages and state messages to maintain
link and PVC status, including adding PVC status information, deleting information about
disconnected PVCs, monitoring PVC status changes, and checking link integrity. There are
three standards for LMI:
ITU-T Q.933 Appendix A
ANSI T1.617 Appendix D
Vendor-specific implementation
This section describes LMI defined in ITU-T Q.933 Appendix A, which specifies the
information units and LMI implementation.
LMI Messages
There are two types of LMI messages:
Status enquiry messages: sent from a DTE to a DCE to request the PVC status or detect
the link integrity.
Status messages: sent from a DCE to a DTE to respond to status enquiry messages. The
status messages carry the PVC status or link integrity information.
LMI Reports
There are three types of LMI reports:
Link integrity verification only report: verifies the link integrity.
Full status report: verifies the link integrity and transmits link integrity information and
PVC status.
Single PVC asynchronous status report: notifies a DTE of a PVC status change.
On a UNI that connects a DTE to a DCE, the PVC status of the DTE is determined by the
DCE. To request the PVC status, the DTE sends a status enquiry message to the DCE. Upon
receipt of the message, the DCE replies with a status message that carries the requested status
information. The PVC status of the DCE is determined by other devices connected to the
DCE.
On an NNI that connects DCEs of a network, the DCEs periodically exchange PVC status.
1. A DTE sends a status enquiry message to its connected DCE, and at the same time, the
link integrity verification polling timer (T391) and the DTE counter (V391) start. The
value of T391 specifies the interval at which status enquiry messages are sent. The value
of the full status polling counter (N391), which includes the status of all PVCs, specifies
the interval at which full status reports are sent. You can specify the values of T391 and
N391 or use the default values.
− If the value of V391 is less than that of N391, the status enquiry message sent by
the DTE requests only link integrity information.
− If the value of V391 is equal to that of N391, V391 is reset to 0, and the status enquiry
message sent by the DTE requests link integrity and PVC status information.
2. After receiving the enquiry message, the DCE responds with a status message, and at the
same time, the polling confirm timer (T392) of the DCE starts. If the DCE does not
receive a subsequent status enquiry message before T392 expires, the DCE records an
event and increases the value of the monitored events counter (N393) by 1.
3. The DTE checks the status message from the DCE. In addition to responding to every
enquiry that the DTE sends, the DCE automatically informs the DTE of the PVC status
when the PVC status changes or a PVC is added or deleted. This mechanism enables the
DTE to learn the PVC status in real time and maintain up-to-date records.
4. If the DTE does not receive a status message before T391 expires, the DTE records an
event and increases the value of N393 by 1.
5. N393 is an error threshold and records the number of events that have occurred. If the
value of N393 is greater than that of N392, the DTE or DCE considers the physical link
and all VCs unavailable. You can specify the values of N392 and N393 or use the default
values.
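The polling logic in steps 1 and 2 can be sketched as follows. This is an illustrative model only, not the NE20E implementation; the class and method names are invented, but the counter behavior mirrors the V391/N391 rule described above.

```python
class LmiDte:
    """Sketch of the DTE side of LMI polling: every N391th enquiry is a full status request."""

    def __init__(self, n391: int = 6):
        self.n391 = n391  # full status polling counter
        self.v391 = 0     # polling cycle counter, compared against N391

    def next_enquiry(self) -> str:
        """Called each time T391 expires; returns the type of status enquiry to send."""
        self.v391 += 1
        if self.v391 >= self.n391:
            self.v391 = 0
            return "full status"          # requests link integrity + all PVC status
        return "link integrity only"      # requests link integrity information only
```

With N391 = 6, five link integrity only enquiries are sent for every full status enquiry, matching the (N391 - 1)/1 ratio given in Table 1-141.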
Table 1-141 lists the parameters required for LMI packet exchange. These parameters can be
configured to optimize device performance.
Device | Parameter | Name | Description
DTE | N391 | Full status polling counter | The DTE sends a full status report or a link integrity verification only report at an interval specified by T391. The numbers of reports of each type are determined using the following formula: Number of link integrity verification only reports/Number of full status reports = (N391 - 1)/1.
DTE | N392 | Error threshold | Specifies the threshold number of errors.
DTE | N393 | Monitored event counter | Specifies the total number of monitored events.
DTE | T391 | Polling timer at the user side | Specifies the interval at which the DTE sends status enquiry messages.
FR Frame Encapsulation
FR encapsulates a network layer protocol (IP or IPX) in the Data field of a frame and sends
the frame to the physical layer for transmission. Figure 1-553 shows FR frame encapsulation.
Upon receipt of a Protocol Data Unit (PDU) from a network layer protocol (IP for example),
FR places the PDU between the Address field and frame check sequence (FCS). FR then adds
Flags to delimit the beginning or end of the frame. The value of the Flags field is always
01111110. After the encapsulation, FR sends the frame to the physical layer for transmission.
Figure 1-554 shows the basic format of an FR frame. In the format, the Flags field indicates
the beginning or end of the FR frame, and key information about the frame is carried in
Address, Data, and FCS. The 2-byte Address field comprises a 10-bit data-link
connection identifier (DLCI) and 6 bits used for control and congestion management.
A maximum of 1024 VCs can be configured on a user interface of an FR device, but the number of
available DLCIs ranges from 16 to 1007. The values 0 and 1023 are reserved for LMI.
− C/R: follows the DLCI in the Address field. The C/R bit is currently not defined.
− Extended Address (EA): indicates whether the current byte is the last byte of the
Address field. If the EA value is 1, the current byte is the last DLCI byte.
Although a 2-byte Address field is generally used in FR, the EA mechanism
supports longer DLCIs. The eighth bit of each byte of the Address field is the EA bit.
− Congestion control: consists of three bits, which are forward-explicit congestion
notification (FECN), backward-explicit congestion notification (BECN), and
discard eligibility (DE).
Data: contains encapsulated upper-layer data. Each frame in this variable-length field
includes a user data or payload field of a maximum of 16000 bytes.
FCS: is used to check the integrity of frames. A source device computes an FCS value
and adds it to a frame before sending the frame to a receiver. Upon receipt of the frame,
the receiver computes an FCS value and compares the two FCS values. If the two values
are the same, the receiver processes the frame; if the two values are different, the
receiver discards the frame. If the frame is discarded, FR does not send a notification to
the source device. Error control is implemented by the upper layers of the OSI model.
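The 2-byte Address field described above can be decoded with a short sketch. The bit positions follow the standard Q.922 layout for the default 2-byte format; the function name and returned keys are illustrative, not from the NE20E software.

```python
def parse_fr_address(b1: int, b2: int) -> dict:
    """Decode a default 2-byte Frame Relay Address field (Q.922 layout)."""
    dlci = ((b1 >> 2) << 4) | (b2 >> 4)   # 6 high DLCI bits + 4 low DLCI bits = 10 bits
    return {
        "dlci": dlci,
        "cr":   (b1 >> 1) & 1,   # C/R bit (currently not defined by FR)
        "fecn": (b2 >> 3) & 1,   # forward-explicit congestion notification
        "becn": (b2 >> 2) & 1,   # backward-explicit congestion notification
        "de":   (b2 >> 1) & 1,   # discard eligibility
        "ea":   b2 & 1,          # extended address: 1 marks the last address byte
    }
```

For example, the byte pair 0x18, 0x41 decodes to DLCI 100 with EA = 1 in the second byte.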
FR Frame Forwarding
On the network shown in Figure 1-555, the source device and receiver are connected through
a PVC passing through Device A, Device B, and Device C. Each router maintains an address
mapping table that records the mapping between the inbound and outbound interfaces. FR
frames are received from the inbound interface and sent by the outbound interface to the next
router. Transit devices can be configured and connected through VCs on the FR network.
Two devices across an FR network can be connected through a PVC consisting of multiple
VCs, (each VC is identified by a DLCI). Figure 1-555 shows how an FR frame is forwarded
along a PVC:
1. The source device sends an FR frame from port 1 along the VC specified by DLCI 1.
2. After receiving the FR frame from port 1, Device A sends it through port 2 along the VC
specified by DLCI 2.
3. After receiving the FR frame from port 0, Device B sends it through port 1 along the VC
specified by DLCI 3.
4. After receiving the FR frame from port 1, Device C sends it to the receiver through port
0 along the VC specified by DLCI 4.
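The per-hop DLCI rewriting in the steps above can be modeled as a mapping table per device. This is an illustrative sketch: the table entries mirror the ports and DLCIs described for Figure 1-555, and the data structure is invented for clarity.

```python
# Each device maps (inbound port, inbound DLCI) -> (outbound port, outbound DLCI).
SWITCHING_TABLES = {
    "DeviceA": {(1, 1): (2, 2)},
    "DeviceB": {(0, 2): (1, 3)},
    "DeviceC": {(1, 3): (0, 4)},
}

def forward(device: str, in_port: int, in_dlci: int):
    """Look up the outbound interface and the rewritten DLCI for the next hop."""
    return SWITCHING_TABLES[device][(in_port, in_dlci)]
```

Chaining the lookups reproduces the path: a frame entering Device A on port 1 with DLCI 1 leaves Device C on port 0 with DLCI 4.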
1.8.4.2.4 FR Sub-interfaces
Background
An FR sub-interface is a logical interface configured on a physical interface. FR
sub-interfaces reduce the number of physical interfaces and deployment costs as well as the
impact of split horizon.
An FR network interconnects networks in different geographical locations using a star,
full-mesh, or partial-mesh network topology.
The star topology requires the least number of PVCs and is the most cost-effective. In the star
topology, PVCs are configured on an interface of the central node for communication with
different branch nodes. The star topology is an ideal option when a headquarters and its
branch offices need to be interconnected. The disadvantage of the star topology is that packets
exchanged between branch nodes have to pass through the central node.
In a full-mesh topology, each two nodes are connected using PVCs and exchange packets
directly. This topology ensures high transmission reliability because packets can be switched
to other PVCs if the direct PVC between two nodes fails. However, the full-mesh topology
suffers from the "N square" problem and requires a large number of PVCs.
In a partial-mesh topology, only some nodes have PVCs to other nodes. An FR network is of
the non-broadcast multiple access (NBMA) type by default; unlike an Ethernet network, an
FR network does not support broadcast. A node on the FR network must therefore duplicate
a received route and send a copy to each of the other nodes over a separate PVC.
To avoid loops, split horizon is deployed to prevent an interface from sending received
routing information.
On the network shown in Figure 1-556, Device B sends a route to a POS interface of Device
A. Due to split horizon, Device A cannot send the route to Device C or Device D through the
POS interface. To resolve this problem, any of the following solutions can be used:
Use multiple physical interfaces to connect two neighboring devices. This solution is not
cost-efficient because each device needs to provide multiple physical interfaces.
Configure multiple sub-interfaces on a physical interface. Then assign a network address
to each sub-interface so that they can function as multiple physical interfaces.
Disable split horizon. This solution increases the possibility of routing loops.
Implementation
FR can be deployed on interfaces or sub-interfaces, and multiple sub-interfaces can be
configured on one interface. Although sub-interfaces are logical, they function like
physical interfaces at the network layer. Protocol addresses and VCs can be configured
on the sub-interfaces for communication with other devices.
On the network shown in Figure 1-557, three sub-interfaces (POS 3/0/1.1, POS 3/0/1.2, and
POS 3/0/1.3) are configured on a POS interface of Device A. Each sub-interface is connected
to a remote device through a VC. POS 3/0/1.1 is connected to Device B, POS 3/0/1.2 is
connected to Device C, and POS 3/0/1.3 is connected to Device D.
With the preceding configurations, the FR network is partially meshed. Devices can therefore
exchange update messages with each other, overcoming the limitations of split horizon.
Benefits
FR sub-interfaces reduce deployment costs.
1.8.4.3 Applications
1.8.4.3.1 FR Access
A typical FR application is FR access. FR access allows upper-layer packets to be transmitted
over an FR network.
An FR network allows user devices, such as routers and hosts, to exchange data.
1.8.4.4 Impact
1.8.4.4.1 On System Performance
None
Terms
Term Definition
X.25 A data link layer protocol that defines how to maintain connections
between DTE and DCE devices for remote terminal access and PC
communication on a PDN.
Sub-interface A logical interface configured on a physical interface to facilitate
service deployment.
Definition
As a bit-oriented link layer protocol, HDLC transparently transmits bit flows of any type
without requiring the data to be a set of characters.
Trunk technology aggregates multiple physical interfaces into an aggregation group to
load-balance received and sent data among these interfaces and to provide highly
reliable connections.
HDLC
Compared with other data link layer protocols, HDLC has the following features:
Full-duplex communication: data can be sent continuously without waiting for
acknowledgement, giving high data transmission efficiency.
All frames are protected by a cyclic redundancy check (CRC), and information frames
are numbered. This prevents information frames from being lost or received repeatedly,
improving transmission reliability.
The transmission control function is separated from the processing function, so HDLC
has high flexibility and excellent control capabilities.
HDLC does not depend on any character set and can transmit data transparently.
Zero-bit insertion, which is used for transparent transmission, is easy to implement
in hardware.
1.8.5.2 Principles
1.8.5.2.1 HDLC Principles
Background
Synchronous data link protocols include character-oriented, bit-oriented, and byte-oriented
protocols.
IBM put forward the first character-oriented synchronous protocol, called Binary
Synchronous Communication (BISYNC or BSC).
Later, ISO put forward related standards. The ISO standard is ISO 1745:1975 Information
processing - Basic mode control procedures for data communication systems.
In the early 1970s, IBM introduced the bit-oriented Synchronous Data Link Control (SDLC)
protocol.
Later, ANSI and ISO adopted and developed SDLC, and then later put forward their own
standards. ANSI introduced the Advanced Data Communications Control Protocol (ADCCP),
and ISO introduced HDLC.
HDLC Features
HDLC is a bit-oriented code-transparent synchronous data link layer protocol. It provides the
following features:
HDLC works in full-duplex mode and can transmit data continuously without waiting for
acknowledgement. Therefore, HDLC features high data link transmission efficiency.
HDLC uses cyclic redundancy check (CRC) for all frames and numbers them. This helps
you know which frames are dropped and which frames are repeatedly transmitted.
HDLC ensures high transmission reliability.
HDLC separates the transmission control function from the processing function and
features high flexibility and perfect control capabilities.
HDLC is independent of any character encoding set and transparently transmits data.
Zero-bit insertion, which is used for transparent data transmission, is easy to implement
on hardware.
HDLC is especially suitable for transmitting data that is segmented into physical blocks or
packages. These blocks or packages are called frames, each of which is identified by a start
flag and an end flag. In HDLC, all bit-oriented data link control protocols use a unified frame
format, and both data and control information are transmitted in frames. Each frame begins
and ends with a frame delimiter, a unique bit sequence of 01111110. The frame delimiter
marks the start or end of a frame or is used for synchronization. The delimiter sequence must
not appear inside a frame, to avoid confusion.
Zero-bit insertion ensures that the bit sequence used for the flag does not appear in normal
data. On the transmit end, zero-bit insertion monitors all fields except the flag and inserts a
0 after five consecutive 1s. On the receive end, zero-bit insertion also monitors all fields
except the flag. After five consecutive 1s are found, if the following bit is a 0, the 0 is
automatically deleted to restore the original bit flow. If the following bit is a 1, either an
error has occurred or an end delimiter has been received. In this case, the frame receive
procedure is generally restarted or aborted.
Introduction
Nodes on a network running HDLC are called stations. HDLC specifies three types of stations:
primary, secondary, and combined.
A primary station is the controlling station on a link. It controls the secondary stations on the
link and manages data flow and error recovery.
A secondary station is present on a link where there is a primary station. The secondary
station is controlled by the primary station, and has no direct responsibility for controlling the
link. Under normal circumstances, a secondary station will transfer frames only when
requested to do so by the primary station, and will respond only to the primary station.
A combined station is a combination of primary and secondary stations.
Frames transferred by a primary station to a secondary station are called commands, and
frames transferred by a secondary station to a primary station are called responses.
On a point to multipoint (P2MP) link, there is a primary station and several secondary stations.
The primary station polls the secondary stations to determine whether they have data to
transmit, and then selects one to transmit its data. On a point to point (P2P) link, both ends
can be combined stations. If a node is connected to multiple links, the node can be the primary
station for some links and a secondary station for the other links.
A complete HDLC frame consists of several fields, such as the Flag field, Address field,
Control field, Information field, and Frame check sequence (FCS) field. Figure 1-559 shows
the format of a complete HDLC frame.
1.8.5.2.5 IP-Trunk
A trunk can aggregate many interfaces into an aggregation group to implement load balancing
on member interfaces. Therefore, link connectivity is of higher reliability. Trunk interfaces are
classified as Eth-Trunk interfaces and IP-Trunk interfaces. An IP-Trunk can only be composed
of POS links. It has the following characteristics:
Increased bandwidth: An IP-Trunk obtains the sum of bandwidths of all member
interfaces.
Improved reliability: When a link fails, traffic is automatically switched to other links,
which improves connection reliability.
Member interfaces of an IP-Trunk interface must be encapsulated with HDLC. IP-Trunk and
Eth-Trunk technologies have similar principles. For details, see the chapter about trunk in the
NE20E Feature Description - LAN Access and MAN Access.
Background
Due to unstable signals on physical links or incorrect configurations at the data link layer on
live networks, an interface on which High-Level Data Link Control (HDLC) is enabled may
frequently experience HDLC negotiation, and the HDLC protocol status of the interface may
alternate between Up and Down, causing routing protocol or MPLS flapping. As a result,
devices and networks are severely affected. Worse still, devices may be paralyzed and networks
may become unavailable.
HDLC flapping suppression restricts the frequency at which the HDLC protocol status of an
interface alternates between Up and Down. This restriction minimizes the impact of flapping
on devices and networks.
Implementation Principles
HDLC flapping suppression involves the following concepts:
Penalty value: This value is calculated based on the HDLC protocol status of the
interface using the suppression algorithm. The core of the suppression algorithm is that
the penalty value increases each time the interface status changes and decreases
exponentially over time.
Suppression threshold: The HDLC protocol status of an interface remains Down when
the penalty value is greater than the suppression threshold.
Reuse threshold: The HDLC protocol status of an interface is no longer suppressed when
the penalty value is smaller than the reuse threshold.
Ceiling threshold: The penalty value no longer increases when the penalty value reaches
the ceiling threshold, preventing the HDLC protocol status of an interface from being
suppressed for a long time. The ceiling value can be calculated using the following
formula: ceiling = reuse × 2^(MaxSuppressTime/HalfLifeTime).
Half-life-period: period in which the penalty value decreases by half. A half-life-period
begins to elapse when the HDLC protocol status of an interface goes Down for the first
time. Each time a half-life-period elapses, the penalty value decreases by half, and
another half-life-period begins.
Max-suppress-time: maximum period during which the HDLC protocol status of an
interface is suppressed. After a max-suppress-time elapses, the HDLC protocol status of
the interface is renegotiated and reported.
Figure 1-560 shows the relationships between these parameters.
At t1, the HDLC protocol status of an interface goes Down, and its penalty value increases by
1000. Then, the interface goes Up, and its penalty value decreases exponentially based on the
half-life rule. At t2, the HDLC protocol status of the interface goes Down again, and its
penalty value increases by 1000, reaching 1600, which has exceeded the suppression
threshold of 1500. The HDLC protocol status of the interface is therefore suppressed. As the
interface keeps flapping, its penalty value keeps increasing until it reaches the ceiling
threshold of 10000 at tA. As time goes by, the penalty value decreases and reaches the reuse
value of 750 at tB. The HDLC protocol status of the interface is then no longer suppressed.
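The parameter interplay described above can be modeled in a short sketch (a minimal model only; the 1000-point penalty per Down event and the threshold values follow the example figures in the text and are not device defaults):

```python
def decayed(penalty: float, elapsed: float, half_life: float) -> float:
    """Exponential decay: the penalty halves every half_life seconds."""
    return penalty * 0.5 ** (elapsed / half_life)

class FlapSuppressor:
    """Sketch of HDLC flapping suppression with assumed example values."""
    def __init__(self, reuse=750.0, suppress=1500.0,
                 half_life=15.0, max_suppress_time=60.0):
        self.reuse, self.suppress, self.half_life = reuse, suppress, half_life
        # ceiling = reuse * 2^(MaxSuppressTime / HalfLifeTime)
        self.ceiling = reuse * 2 ** (max_suppress_time / half_life)
        self.penalty, self.last, self.suppressed = 0.0, 0.0, False

    def on_down(self, now: float) -> None:
        """Each Down event adds 1000 to the decayed penalty, capped at ceiling."""
        self.penalty = min(self.ceiling,
                           decayed(self.penalty, now - self.last,
                                   self.half_life) + 1000.0)
        self.last = now
        if self.penalty > self.suppress:
            self.suppressed = True

    def is_suppressed(self, now: float) -> bool:
        """Suppression ends once the penalty decays below the reuse threshold."""
        p = decayed(self.penalty, now - self.last, self.half_life)
        if self.suppressed and p < self.reuse:
            self.suppressed = False
        return self.suppressed
```

A first Down event (penalty 1000) does not cross the suppression threshold; a second one soon after does, and the interface stays suppressed until the penalty decays below the reuse value.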
1.8.5.3 Applications
HDLC
IP-Trunk
For an IP-Trunk interface, you can configure weights for member interfaces to implement
load balancing among member interfaces. There are two load balancing modes, namely,
per-destination and per-packet load balancing.
Per-destination load balancing: packets with the same source and destination IP
addresses are transmitted over one member link.
Per-packet load balancing: packets are transmitted over different member links.
As shown in Figure 1-562, two routers are connected through POS interfaces that are bundled
into an IP-Trunk interface to transmit IPv4, IPv6, and MPLS packets.
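The two load balancing modes above can be illustrated with a minimal sketch (the hash function and names are our assumptions; real hardware uses its own hash inputs and member selection logic):

```python
import zlib
from itertools import count

def per_destination_link(src_ip: str, dst_ip: str, num_links: int) -> int:
    """Per-destination mode: hash the (source, destination) pair so packets
    with the same addresses always leave on the same member link."""
    return zlib.crc32(f"{src_ip}->{dst_ip}".encode()) % num_links

_rr = count()  # illustrative round-robin counter

def per_packet_link(num_links: int) -> int:
    """Per-packet mode: spread successive packets across the member links."""
    return next(_rr) % num_links
```

Per-destination balancing preserves packet order within a flow; per-packet balancing spreads load more evenly but may reorder packets of one flow.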
Terms
Term Definition
Aggregation: Two or more interfaces are bundled together so that they function as a single interface for load balancing and link protection.
Inter-board aggregation: Interfaces on different boards are bundled together to form a link aggregation group to improve the reliability of the link aggregation group.
Bundling: Two boards can be bundled together and considered as one board.
Load balancing: Member interfaces in a link aggregation group are selected as outbound interfaces for packets based on the packets' source and destination MAC addresses.
Definition
The Point-to-Point Protocol (PPP) is a link-layer protocol used to transmit point-to-point (P2P)
data over full-duplex synchronous and asynchronous links.
PPP negotiation involves the following items:
Data encapsulation mode: defines how to encapsulate multi-protocol data packets.
Link Control Protocol (LCP): used to set up, monitor, and tear down data links.
Network Control Protocol (NCP): used to negotiate options for a network layer protocol
running atop PPP and the format and type of the data to be transmitted over data links.
PPP uses the Password Authentication Protocol (PAP) and Challenge Handshake
Authentication Protocol (CHAP) to secure network communication.
If carriers have high bandwidth requirements, bundle multiple PPP links into an MP link to
increase link bandwidth and improve link reliability.
Purpose
PPP, which works at the second layer (data link layer) of the open systems interconnection
(OSI) model, is mainly used on links that support full-duplex to transmit data. PPP is widely
used because it provides user authentication, supports synchronous and asynchronous
communication, and is easy to extend.
PPP was developed based on the Serial Line Internet Protocol (SLIP) and overcomes the
shortcomings of SLIP, which transmits only IP packets and does not support
negotiation. Compared with other link-layer protocols, PPP has the following advantages:
PPP supports both synchronous and asynchronous links, whereas SLIP supports only
asynchronous links, and other link-layer protocols, such as X.25, support only
synchronous links.
PPP is highly extensible.
PPP uses a Link Control Protocol (LCP) to negotiate link-layer parameters.
PPP uses a Network Control Protocol (NCP), such as the IP Control Protocol (IPCP) or
Internetwork Packet Exchange Control Protocol (IPXCP), to negotiate network-layer
parameters.
PPP supports Password Authentication Protocol (PAP) and Challenge Handshake
Authentication Protocol (CHAP) which improve network security.
PPP does not have a retransmission mechanism, which reduces network costs and speeds
up packet transmission.
1.8.6.2 Principles
1.8.6.2.1 PPP Basic Concepts
PPP Architecture
PPP works at the network access layer of the Transmission Control Protocol (TCP)/IP suite
for point-to-point (P2P) data transmission over full-duplex synchronous and asynchronous
links.
Information field
The Information field contains the data. The maximum length of the Information field,
including the Padding content, is equivalent to the maximum receive unit (MRU) length.
The MRU defaults to 1500 bytes and can be negotiated.
In the Information field, the Padding content is optional. If data is padded, the
communicating devices can communicate only when they can identify the padding
information as well as the payload to be transmitted.
Frame check sequence (FCS) field
The FCS field checks whether PPP packets contain errors.
Some mechanisms used to ensure proper data transmission increase the transmission cost
and cause delay in data exchange at the application layer.
Identifier field
The Identifier field is 1 byte long. It is used to match requests and replies. If a packet
with an invalid Identifier field is received, the packet is discarded.
The sequence number of a Configure-Request packet usually starts at 0x01 and increases
by 1 each time the Configure-Request packet is sent. After a receiver receives a
Configure-Request packet, it must send a reply packet with the same sequence number as
the received Configure-Request packet.
Length field
The Length field specifies the length of a negotiation packet, including the length of the
Code, Identifier, Length, and Data fields.
The Length field value cannot exceed the MRU of the link. Bytes outside the range of
the Length field are treated as padding and are ignored after they are received.
Data field
The Data field contains the contents of a negotiation packet and includes the following
fields:
− Type field: specifies the negotiation option type.
− Length field: specifies the total length of the Data field.
− Data field: contains the contents of the negotiation option.
0x01 Maximum-Receive-Unit
0x02 Async-Control-Character-Map
0x03 Authentication-Protocol
0x04 Quality-Protocol
0x05 Magic-Number
0x06 RESERVED
0x07 Protocol-Field-Compression
0x08 Address-and-Control-Field-Compression
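Each negotiation option in the Data field is a Type/Length/Data triple whose Length covers the Type and Length octets as well. As a hedged sketch (the function name is ours), an encoder for the option types listed above:

```python
def lcp_option(opt_type: int, data: bytes) -> bytes:
    """Encode one LCP option as Type/Length/Data; per RFC 1661 the Length
    octet counts the Type and Length octets plus the Data."""
    return bytes([opt_type, 2 + len(data)]) + data

# Maximum-Receive-Unit (type 0x01) carrying MRU = 1500 (0x05DC)
mru_option = lcp_option(0x01, (1500).to_bytes(2, "big"))
```

Several such options are concatenated into the Data field of a Configure-Request packet.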
1. Two devices enter the Establish phase if one of them sends a PPP connection request to
the other.
2. In the Establish phase, the two devices perform an LCP negotiation to negotiate the
working mode, maximum receive unit (MRU), authentication mode, and magic number.
The working mode can be either Single-Link PPP (SP) or Multilink PPP (MP). If the
LCP negotiation succeeds, LCP enters the Opened state, which indicates that a
lower-layer link has been established.
3. If authentication is configured, the two devices enter the Authentication phase and
perform Password Authentication Protocol (PAP) or Challenge Handshake
Authentication Protocol (CHAP) authentication. If no authentication is configured, the
two devices enter the Network phase.
4. In the Authentication phase, if PAP or CHAP authentication fails, the two devices enter
the Terminate phase. The link is torn down and LCP enters the Down state. If PAP or
CHAP authentication succeeds, the two devices enter the Network phase, and LCP
remains in the Opened state.
5. In the Network phase, the two devices perform an NCP negotiation to select a
network-layer protocol and to negotiate network-layer parameters. After the two devices
succeed in negotiating a network-layer protocol, packets can be sent over this PPP link
using the network-layer protocol.
Various control protocols, such as IP Control Protocol (IPCP) and Multiprotocol Label
Switching Control Protocol (MPLSCP), can be used in NCP negotiation. IPCP mainly
negotiates the IP addresses of the two devices.
6. If the PPP connection is interrupted during PPP operation, for example, if the physical
link is disconnected, the authentication fails, the negotiation timer expires, or the
connection is torn down by the network administrator, the two devices enter the
Termination phase.
7. In the Termination phase, the two devices release all resources and enter the Dead phase.
The two devices remain in the Dead phase until a new PPP connection is established
between them.
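The phase walk-through above can be summarized as a small state table (the event names are our own labels for the conditions described in steps 1-7, not protocol messages):

```python
# Assumed simplification of the PPP phase transitions described above.
TRANSITIONS = {
    ("Dead", "link_up"): "Establish",
    ("Establish", "lcp_opened_auth"): "Authenticate",
    ("Establish", "lcp_opened_no_auth"): "Network",
    ("Establish", "lcp_failed"): "Dead",
    ("Authenticate", "auth_ok"): "Network",
    ("Authenticate", "auth_fail"): "Terminate",
    ("Network", "close"): "Terminate",
    ("Terminate", "done"): "Dead",
}

def next_phase(phase: str, event: str) -> str:
    """Return the next phase; unknown events leave the phase unchanged."""
    return TRANSITIONS.get((phase, event), phase)
```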
Dead Phase
The physical layer is unavailable during the Dead phase. A PPP link begins and ends with this
phase.
When two devices detect that the physical link between them has been activated, for example,
when carrier signals are detected on the physical link, the two devices move from the Dead
phase to the Establish phase.
After the PPP link is terminated, the two devices enter the Dead phase.
Establish Phase
In the Establish phase, the two devices perform an LCP negotiation to negotiate the working
mode (SP or MP), MRU, authentication mode, and magic number. After the LCP negotiation
is complete, the two devices enter the next phase.
In the Establish phase, the LCP status changes as follows:
If the link is unavailable (in the Dead phase), LCP is in the Initial or Starting state. When
the physical layer detects that the link is available, the physical layer sends an Up event
to the link layer. Upon receipt, the link layer changes the LCP status to Request-Sent.
Then, the devices at both ends send Configure-Request packets to each other to
configure a data link.
If the local device first receives a Configure-Ack packet from the peer, the LCP status
changes from Request-Sent to Ack-Received. After the local device sends a
Configure-Ack packet to the peer, the LCP status changes from Ack-Received to Open.
If the local device first sends a Configure-Ack packet to the peer, the LCP status changes
from Request-Sent to Ack-Sent. After the local device receives a Configure-Ack packet
from the peer, the LCP status changes from Ack-Sent to Open.
After LCP enters the Open state, the next phase starts.
The next phase is the Authentication or Network phase, depending on whether authentication
is required.
Authentication Phase
The Authentication phase is optional. By default, PPP does not perform authentication during
PPP link establishment. If authentication is required, the authentication protocol must be
specified in the Establish phase.
PPP provides two password authentication modes: PAP authentication and CHAP
authentication.
Two authentication methods are available: unidirectional authentication and bidirectional authentication.
In unidirectional authentication, the device on one end functions as the authenticating device, and the
device on the other end functions as the authenticated device. In bidirectional authentication, each
device functions as both the authenticating and authenticated device. In practice, unidirectional
authentication is typically used.
1. The authenticated device sends the local user name and password to the authenticating
device.
2. The authenticating device checks whether the received user name is in the local user list.
− If the received user name is in the local user list, the authenticating device checks
whether the received password is correct.
If the password is correct, the authentication succeeds.
If the password is incorrect, the authentication fails.
− If the received user name is not in the local user list, the authentication fails.
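The PAP check in the steps above amounts to a lookup and a comparison; a minimal sketch, assuming a hypothetical local user table:

```python
# Hypothetical local user list; real devices consult their AAA configuration.
LOCAL_USERS = {"user1": "secret123"}

def pap_authenticate(username: str, password: str) -> bool:
    """PAP check as described above: the username must exist in the local
    user list and the password must match."""
    if username not in LOCAL_USERS:
        return False  # username not in the local user list
    return LOCAL_USERS[username] == password
```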
PAP Packet Format
A PAP packet is encapsulated into the Information field of a PPP packet with the Protocol
field value 0xC023. Figure 1-567 shows the PAP packet format.
In PAP authentication, passwords are sent over links in simple text. After a PPP link is established,
the authenticated device repeatedly sends the user name and password until authentication finishes.
PAP authentication is used on networks that do not require high security.
CHAP is a three-way handshake authentication protocol. In CHAP authentication, the authenticated
device sends only a user name to the authenticating device. Compared with PAP, CHAP features
higher security because passwords are not transmitted. CHAP authentication is used on networks
that require high security.
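CHAP avoids transmitting the password by exchanging a digest instead. Per RFC 1994, the response is an MD5 hash over the packet Identifier, the shared secret, and the challenge; a sketch:

```python
import hashlib

def chap_response(identifier: int, secret: bytes, challenge: bytes) -> bytes:
    """CHAP response (RFC 1994): MD5(Identifier || secret || challenge).
    The secret itself never crosses the link; only this digest does."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()

def chap_verify(identifier: int, secret: bytes, challenge: bytes,
                response: bytes) -> bool:
    """The authenticating device recomputes the digest with its own copy
    of the secret and compares it with the received response."""
    return chap_response(identifier, secret, challenge) == response
```

Because the challenge changes on every authentication, a captured response cannot be replayed later.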
Network Phase
In the Network phase, NCP negotiation is performed to select a network-layer protocol and to
negotiate network-layer parameters. An NCP can enter the Open or Closed state at any time.
After an NCP enters the Open state, network-layer data can be transmitted over the PPP link.
Termination Phase
PPP can terminate a link at any time. A link can be terminated manually by an administrator
or be terminated due to carrier loss, an authentication failure, or other causes.
Background
When two devices are connected through interfaces over an intermediate transmission
device, the connection may be adjusted if it is found to be incorrect during traffic
transmission. However, the interfaces cannot detect the adjustment because they do not go
Down, and therefore LCP renegotiation is not triggered. In addition, PPP
allows the interfaces to learn 32-bit host routes from each other only during LCP
negotiation. As a result, the interfaces continue to transmit traffic using the host routes
learned over the original connection even after the connection changes, and traffic is
transmitted incorrectly.
To address this issue, deploy PPP magic number check on these devices. Even if the interfaces
do not detect the connection change, PPP magic number check can trigger LCP renegotiation.
The interfaces then re-learn the host routes from each other.
Principles
Magic numbers are generated independently by communication devices. To prevent devices
from generating identical magic numbers, each device randomly generates a unique magic
number based on its serial number, hardware address, or clock.
Devices negotiate their magic numbers during LCP negotiation and send Echo packets
carrying their negotiated magic numbers to their peers after the LCP negotiation.
In Figure 1-570, Device A and Device B are connected over a transmission device, and
Device C and Device D are also connected over this transmission device. PPP connections
have been established, and LCP negotiation is complete between Device A and Device B and
between Device C and Device D. If the connections are found incorrect, an adjustment is
required to establish a PPP connection between Device A and Device C. In this situation, PPP
magic number check can be used to trigger the LCP renegotiation as follows:
1. Device A sends to Device C an Echo-Request packet carrying Device A's negotiated
magic number.
2. When receiving the Echo-Request packet, Device C compares the magic number carried
in the packet with its peer's negotiated magic number (Device D's). The magic numbers
are different, and the error counter on Device C increases by one.
3. Device C replies to Device A with an Echo-Reply packet carrying Device D's negotiated
magic number.
4. When receiving the Echo-Reply packet, Device A compares the magic number carried in
the packet with the local magic number. The magic numbers are different. Device A then
compares the magic number in the packet with its peer's negotiated magic number
(Device B's). The magic numbers are also different, and the error counter on Device A
increases by one.
5. The preceding steps are repeated. If the error counter reaches a specified value, LCP
goes Down, and LCP renegotiation is triggered.
Figure 1-570 shows the connection status before LCP renegotiation. Device A and Device C still use the
local and peer's magic numbers that are negotiated previously. These magic numbers are not updated
until the LCP renegotiation.
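The per-Echo check in steps 1-5 can be sketched as one update step (the threshold value is an assumption; the text does not specify it):

```python
def check_echo_magic(received_magic: int, peer_magic: int,
                     error_count: int, threshold: int = 5):
    """Compare the magic number carried in an Echo packet with the peer's
    negotiated magic number. A mismatch bumps the error counter; reaching
    the (assumed) threshold triggers LCP renegotiation."""
    if received_magic != peer_magic:
        error_count += 1
    renegotiate = error_count >= threshold
    return error_count, renegotiate
```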
Background
Due to unstable signals on physical links or incorrect configurations at the data link layer on
live networks, PPP-capable interfaces may frequently experience PPP negotiation, and the
PPP protocol status of these interfaces may alternate between Up and Down, causing routing
protocol or MPLS flapping. As a result, devices and networks are severely affected. Worse
still, devices may be paralyzed and networks may become unavailable.
PPP flapping suppression restricts the frequency at which the PPP protocol status of an
interface alternates between Up and Down. This restriction minimizes the impact of flapping
on devices and networks.
Implementation Principles
PPP flapping suppression involves the following concepts:
Penalty value: This value is calculated based on the PPP protocol status of the interface
using the suppression algorithm. The core of the suppression algorithm is that the penalty
value increases each time the interface status changes and decreases exponentially over
time.
Suppression threshold: The PPP protocol status of an interface is suppressed and remains
Down when the penalty value is greater than the suppression threshold.
Reuse threshold: The PPP protocol status of an interface is no longer suppressed when
the penalty value is smaller than the reuse threshold.
Ceiling threshold: The penalty value no longer increases when the penalty value reaches
the ceiling threshold, preventing the PPP protocol status of an interface from being
suppressed for a long time. The ceiling value can be calculated using the following
formula: ceiling = reuse × 2^(MaxSuppressTime/HalfLifeTime).
Half-life-period: period that the penalty value takes to decrease to half. A half-life-period
begins to elapse when the PPP protocol status of an interface goes Down for the first
time. If a half-life-period elapses, the penalty value decreases to half, and another
half-life-period begins.
Max-suppress-time: maximum period during which the PPP protocol status of an
interface is suppressed. After a max-suppress-time elapses, the PPP protocol status of the
interface is renegotiated and reported.
Figure 1-571 shows the relationships between these parameters.
At t1, the PPP protocol status of an interface goes Down, and its penalty value increases by
1000. Then, the interface goes Up, and its penalty value decreases exponentially based on the
half-life rule. At t2, the PPP protocol status of the interface goes Down again, and its penalty
value increases by 1000, reaching 1600, which has exceeded the suppression threshold of
1500. The PPP protocol status of the interface is therefore suppressed. As the interface keeps
flapping, its penalty value keeps increasing until it reaches the ceiling threshold of 10000 at
tA. As time goes by, the penalty value decreases and reaches the reuse value of 750 at tB. The
PPP protocol status of the interface is then no longer suppressed.
1.8.6.2.5 MP Principles
Principles
The Multilink protocol bundles multiple PPP links into an MP link to increase link bandwidth
and reliability. MP fragments packets exceeding the maximum transmission unit (MTU) and
sends these fragments to the PPP peer over the PPP links in the MP-group. The PPP peer then
reassembles these fragments into packets and forwards these packets to the network layer. For
packets that do not exceed the MTU, MP directly sends these packets over the PPP links in
the MP-group to the PPP peer, which in turn forwards these packets to the network layer.
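The fragmentation step above can be sketched as follows (illustrative only; real MP prepends an MP header with sequence numbers to each fragment so the peer can reassemble them in order):

```python
def mp_fragment(packet: bytes, mtu: int) -> list:
    """Split a packet into MTU-sized pieces for the member links;
    packets that fit within the MTU are sent whole."""
    if len(packet) <= mtu:
        return [packet]
    return [packet[i:i + mtu] for i in range(0, len(packet), mtu)]
```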
Implementation
An MP-group interface is dedicated to MP applications. MP is implemented by adding
multiple interfaces to an MP-group interface.
MP negotiation involves:
LCP negotiation: Devices on both ends negotiate LCP parameters and check whether
they both work in MP mode. If they work in different working modes, LCP negotiation
fails.
Network Control Protocol (NCP) negotiation: Devices on both ends perform NCP
negotiation by using only NCP parameters (such as IP addresses) of the MP-group
interfaces but not using the NCP parameters of physical interfaces.
If NCP negotiation succeeds, an MP link is established.
Benefits
MP provides the following benefits:
Increased bandwidth
Load balancing
Link backup
Reduced delay through packet fragmentation
1.8.6.3 Applications
1.8.6.3.1 MP Applications
A single PPP link can provide only limited bandwidth. To increase link bandwidth and
reliability, bundle multiple PPP links into an MP link.
As shown in Figure 1-572, there are two PPP links between Device A and Device B. The two
PPP links are bundled into an MP link by creating an MP-group interface. The MP link
provides higher bandwidth than a single PPP link. If one PPP link in the MP group fails,
communication over the other PPP link is not affected.
Terms
None
Definition
Carrier-class networks require high reliability for IP devices. IP devices are required to rapidly
detect faults.
When the fast detection function is enabled on an interface, alarm reporting becomes faster.
This may cause the physical status of the interface to switch between Up and Down. As a
result, the network flaps frequently.
Therefore, alarms must be filtered and suppressed to prevent frequent network flapping.
Transmission alarm suppression can efficiently filter and suppress alarm signals to prevent
interfaces from frequently flapping. In addition, transmission alarm customization can control
the impact of alarms on the interface status.
Transmission alarm customization and suppression provide the following functions:
The Transmission alarm customization function allows you to specify alarms that can
cause the physical status of an interface to change. This function helps filter out
unwanted alarms.
The Transmission alarm suppression function allows you to suppress network flapping
by setting a series of thresholds.
Purpose
Transmission alarm customization allows you to filter unwanted alarms, and transmission
alarm suppression enables you to set thresholds on customized alarms, allowing devices to
ignore burrs generated during transmission link protection and preventing frequent network
flapping.
On a backbone network or a MAN, IP devices are connected to transmission devices, such as
synchronous digital hierarchy (SDH) and Synchronous Optical Network (SONET) devices.
When transmission devices become faulty, IP devices will receive alarms. Then, faulty
transmission devices perform link switchovers and the alarms disappear. After an alarm is
generated, a link switchover lasts 50 ms to 200 ms. In the log information on IP devices, the
transmission alarms are displayed as burrs that last 50 ms to 200 ms. These burrs will cause
the interface status of IP devices to switch frequently. IP devices will perform route
calculation frequently. As a result, routes flap frequently, affecting the performance of IP
devices.
From the perspective of the entire network, IP devices are expected to ignore such burrs. That
is, IP devices must customize and suppress the alarms that are generated during transmission
device maintenance or link switchovers. This can prevent route flapping. Transmission alarm
customization can control the impact of transmission alarms on the physical status of
interfaces. Transmission alarm suppression can efficiently filter and suppress specific alarm
signals to avoid frequent interface flapping.
1.8.7.2 Principles
1.8.7.2.1 Basic Concepts
Network Flapping
Network flapping occurs when the physical status of interfaces on a network frequently
alternates between Up and Down.
Alarm Burrs
An alarm burr is a process in which alarm generation and alarm clearance signals are received
in a short period (The period varies with specific usage scenarios, devices, or service types).
For example, if a loss of signal (LOS) alarm is cleared 50 ms after it is generated, the process
from the alarm generation to clearance is an alarm burr.
Alarm Flapping
Alarm flapping is a process in which an alarm is repeatedly generated and cleared in a short
period (The period varies with specific usage scenarios, devices, or service types).
For example, if an LOS alarm is generated and cleared 10 times in 1s, alarm flapping occurs.
suppress: alarm suppression threshold. When the figure of merit value exceeds this
threshold, alarms are suppressed. This value must be smaller than the ceiling value and
greater than the reuse value.
ceiling: maximum value of figure of merit. When an alarm is repeatedly generated and
cleared in a short period, figure of merit significantly increases and, therefore, takes a
long time to return to reuse. To avoid long delays returning to reuse, a ceiling value can
be set to limit the maximum value of figure of merit. figure of merit does not increase
when it reaches the ceiling value.
reuse: alarm reuse threshold. When figure of merit falls below this value, alarms are no
longer suppressed. This value must be smaller than the suppress value.
half-time: time used by figure of merit of suppressed alarms to decrease to half.
decay-ok: time used by figure of merit to decrease to half when an alarm clearance
signal is received.
decay-ng: time used by figure of merit to decrease to half when an alarm generation
signal is received.
Figure 1-573 shows how figure of merit increases and decreases as a transmission device
sends alarm generation signals.
1. At t1 and t2, figure of merit is smaller than suppress. Therefore, alarm signals
generated at t1 and t2 affect the physical status of the interface, and the physical status of
the interface changes to Down.
2. At t3, figure of merit exceeds suppress, and the alarm is suppressed. The physical
status of the interface is not affected, even if new alarm signals arrive.
3. At t4, figure of merit reaches ceiling. If new alarm signals arrive, figure of merit is
recalculated but does not exceed ceiling.
4. At t5, figure of merit falls below reuse, and the alarm is free from suppression.
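The figure-of-merit bookkeeping described above can be sketched as one update step (the numeric defaults follow the example figures in the text and are assumptions; half_time stands in for decay-ok or decay-ng depending on the last signal seen):

```python
def update_fom(fom: float, elapsed: float, half_time: float,
               alarm_signal: bool, penalty: float = 1000.0,
               ceiling: float = 10000.0) -> float:
    """Decay figure of merit over 'elapsed' seconds (it halves every
    half_time), then add a penalty if an alarm generation signal arrived,
    capping the result at ceiling."""
    fom *= 0.5 ** (elapsed / half_time)
    if alarm_signal:
        fom = min(ceiling, fom + penalty)
    return fom
```

Comparing the running value against suppress and reuse then decides whether alarm signals are allowed to change the physical status of the interface.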
Terms
None
Definition
The circuit emulation service (CES) technology carries traditional TDM data over a packet
switched network (PSN) and provides end-to-end PDH and SDH data transmission in the
PWE3 architecture.
The pseudo random binary sequence (PRBS) is used to generate random data.
CES service connectivity tests use the PRBS technique to generate a PRBS stream,
encapsulate the PRBS stream into CES packets, send and receive the CES packets over CES
service channels, and calculate the proportion of error bits to the total number of bits to obtain
the bit error rate (BER) of CES service channels for measuring service connectivity.
Purpose
When routers and access devices, such as ATN devices, are connected over a public network,
transmission quality affects service deployment and cutover. To address this problem, use
the NMS to deliver a service connectivity test command after CES services are deployed on
PWs. After the test is complete, the device returns the test result to the NMS. This shortens
service deployment.
Benefits
CES service connectivity tests offer the following benefits to carriers:
Monitors link quality during network cutover and helps identify potential risks,
improving the cutover success ratio and minimizing user complaints about operator
network issues.
Helps speed up service deployment and cutover on a network, shortening the service
launch period.
1.8.8.2 Principles
1.8.8.2.1 Basic Principles
PRBS Stream
CES service connectivity tests use the PRBS technique to generate a PRBS stream,
encapsulate the PRBS stream into CES packets, send and receive the CES packets over CES
service channels, and calculate the proportion of error bits to the total number of bits to obtain
the BER of CES service channels for measuring service connectivity.
A PRBS stream is a pseudo random binary sequence of bits.
1. PRBS stream generation: A PRBS stream is generated by a shift register using a
generator polynomial. The polynomial varies according to the length of the sequence.
2. PRBS stream measurement: Figure 1-574 shows how PRBS stream measurement is
implemented. After the PRBS module of PE1 generates a PRBS stream, the PRBS
stream is encapsulated to CES packets, which are then sent by the network-side
high-speed TX interface to PE2 over a PW. Upon receipt, PE2's line-side E1 interface
performs a local loopback and sends the CES packets through the network-side interface
to PE1's RX interface. After PE1 receives the packets, it compares the sent and received
data and counts the error bits.
3. Bit error insertion during tests: During the tests, bit errors can be inserted to the PRBS
stream. PE1 generates a PRBS stream and inserts bit errors. After the PRBS receive unit
receives bit errors, PE1 can determine the test validity.
4. Test termination by PRBS streams: If a CES service connectivity test lasts for a long
time, you can stop sending and receiving the PRBS stream to terminate the test.
CES service connectivity tests are offline detections and interrupt services. Therefore, this function
applies to site deployment and fault detection after a service interruption.
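As a hedged illustration of step 1, a PRBS can be produced by a linear feedback shift register; the sketch below uses the common PRBS-7 polynomial x^7 + x^6 + 1 (the polynomial actually used depends on the configured sequence length):

```python
def prbs7(nbits: int, seed: int = 0x7F) -> list:
    """Generate a PRBS-7 bit stream with a 7-bit LFSR (taps at bits 7 and 6,
    i.e. polynomial x^7 + x^6 + 1); the sequence repeats every 127 bits."""
    state = seed & 0x7F
    out = []
    for _ in range(nbits):
        newbit = ((state >> 6) ^ (state >> 5)) & 1  # feedback bit
        state = ((state << 1) | newbit) & 0x7F      # shift it into the register
        out.append(newbit)
    return out
```

The receiver runs the same generator and compares bit by bit: any disagreement is counted as a bit error.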
BER Calculation
The BER is calculated using the following equation:
BER = Number of error bits/(Interface rate x Test period)
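The equation above can be evaluated directly; for example (the figures below are illustrative assumptions, using the 2.048 Mbit/s rate of an E1 link):

```python
def bit_error_rate(error_bits: int, interface_rate_bps: float,
                   test_period_s: float) -> float:
    """BER = number of error bits / total bits sent during the test period."""
    return error_bits / (interface_rate_bps * test_period_s)

# 4 error bits over a 10-second test on a 2.048 Mbit/s E1 link
ber = bit_error_rate(4, 2.048e6, 10)  # about 2e-7
```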
1.8.8.3 Applications
1.8.9 CES
1.8.9.1 Introduction
Definition
TDM
Time Division Multiplexing (TDM) divides a channel by time: voice signals are sampled,
and each sampled signal occupies a fixed interval, called a timeslot, in time sequence.
In this way, TDM combines multiple signals into one high-rate composite digital signal
(group signal) with a defined structure, and each signal is transmitted independently.
TDMoPSN
TDM Circuits over Packet Switching Networks (TDMoPSN) is a type of PWE3 service
emulation. TDMoPSN emulates TDM services over a PSN, such as an MPLS or Ethernet
network, thereby transparently transmitting TDM services over the PSN. TDMoPSN is
mainly implemented by means of two protocols: Structure-Agnostic TDM over Packet
(SAToP) and Structure-Aware TDM Circuit Emulation Service over Packet Switched
Network (CESoPSN).
IP RAN
IP RAN is a technology used by mobile carriers to carry wireless services over an IP
network. IP RAN scenarios are complex because different base stations (BSs), interface
technologies, and access and convergence scenarios are involved.
− 2G/2.5G/3G/LTE, traditional BSs/IP BSs, GSM/CDMA, TDM/ATM/IP (interface
technologies) are involved.
− Depending on the BS type, distribution model, network environment, and evolution process, the convergence modes include microwave, MSTP, DSL, PON, and fiber. Services on BSs can be converged directly to the MAN UPE or through convergence gateways (which provide BS convergence, compression optimization, packet gateway, and offload functions).
− Reliability, security, QoS, and operation and maintenance (OM) must be considered in IP RAN scenarios. In some scenarios, transmission efficiency is also a concern.
CEP
Circuit Emulation over Packet (CEP) emulates Synchronous Optical Network
(SONET)/Synchronous Digital Hierarchy (SDH) circuits and services over MPLS. The
emulation signals include:
Purpose
TDMoPSN is a mature solution for accessing and carrying TDM services on a PSN. It is mainly used in IP RAN scenarios to carry wireless services, and to carry fixed-network services between MSAN devices.
Benefits
The TDMoPSN feature offers the following benefits to carriers:
Saves rent for expensive TDM leased lines.
Facilitates smooth evolution of the network.
Simplifies network operations and reduces maintenance cost.
Binds only the useful time slots into packets to improve the resource utilization.
The TDMoPSN feature offers the following benefits to users:
Frees enterprises from paying expensive leased-line rent to fixed-network operators when they access the network for voice services.
1.8.9.2 Principles
1.8.9.2.1 Basic Concepts
TDMoPSN
A TDMoPSN packet, as defined in RFC 4553 (Structure-Agnostic Time Division Multiplexing over Packet), consists of the Ethernet header, the TDMoPSN packet (CESoPSN or SAToP packet), and the FCS.
The Framer on a CPOS interface divides the CPOS signal into 63 E1 signals, which are then encapsulated according to the protocol. Packets received on the network side are decapsulated into E1 signals, multiplexed into CPOS signals by the Framer, and then sent to the CPOS line. Therefore, the implementation of SDH data services in TDMoPSN is similar to that of PDH data services in TDMoPSN.
SAToP
The Structure-Agnostic TDM over Packet (SAToP) function emulates low-rate PDH circuit services.
SAToP carries E1/T1/E3 services in unframed (unstructured) mode. It segments and encapsulates the serial data streams of TDM services and then transmits the encapsulated packets over a PW. SAToP is the simplest method of transparently transmitting low-rate PDH services among TDM circuit emulation schemes.
Clock synchronization
TDMoPSN service packets are transmitted at a constant rate. The local and remote
devices must have synchronized clocks before exchanging TDMoPSN service packets.
Traditional TDM services can synchronize clocks through a physical link but TDMoPSN
services are carried on a PSN. TDM services lose synchronization clock signals when
reaching a downstream PE.
A downstream PE uses either of the following methods to synchronize clocks:
− Obtains clock signals from an external BITS clock.
− Recovers clock signals from packets.
Downstream PEs can extract clock signals from received PWE3 packets by using a clock recovery algorithm. Clock recovery is classified as adaptive clock recovery (ACR) or differential clock recovery (DCR), depending on the implementation.
QoS processing
TDM services require low delay and jitter and fixed bandwidth. A high QoS priority
must be specified for TDM services.
CES implementation
CESoPSN services are encapsulated through MPLS, with the structure defined in draft-ietf-pwe3-cesopsn-07, as shown in Figure 1-581.
MPLS Label
The specified PSN header includes data required for forwarding packets from the PSN
border gateway to the TDM border gateway.
PWs are distinguished by PW labels carried at the specified layer of the PSN. Because TDM is bidirectional, two PWs in opposite directions must be associated.
PW Control Word
The structure of the CESoPSN control word is defined in draft-ietf-pwe3-cesopsn-07, as shown in Figure 1-582.
− Length (6 bits): length of a TDMoPSN packet (control word and payload) when padding is used to meet the minimum transmission unit requirement of the PSN. When the TDMoPSN packet is longer than 64 bytes, this field is set to all 0s.
− Sequence number (16 bits): used for PW sequencing, enabling the detection of lost and mis-ordered packets. The sequence number occupies a 16-bit unsigned circular space, and its initial value is random.
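The control word fields described above can be sketched as a bit-packing routine. The field positions follow RFC 5086, which standardized draft-ietf-pwe3-cesopsn; the function name is illustrative, not a product API:

```python
import struct

def pack_cesopsn_control_word(l_bit, r_bit, m_bits, frg, length, seq):
    """Pack a 32-bit CESoPSN control word (layout per RFC 5086):
    0000 | L | R | M(2 bits) | FRG(2 bits) | LEN(6 bits) | sequence number(16 bits)."""
    word = ((l_bit & 1) << 27 | (r_bit & 1) << 26 | (m_bits & 0x3) << 24
            | (frg & 0x3) << 22 | (length & 0x3F) << 16 | (seq & 0xFFFF))
    return struct.pack("!I", word)  # network byte order

# A control word with no alarms, no padding, and sequence number 0x1234.
cw = pack_cesopsn_control_word(0, 0, 0, 0, 0, 0x1234)
```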
Optional RTP
An RTP header can carry timestamp information to the remote device to support clock recovery from packets, such as DCR. Clock recovery from packets is not discussed in this document. In addition, packets transmitted between some devices must include the RTP header. To save bandwidth, the RTP header is not recommended in other situations.
The RTP header is not configured by default. You can add it to packets. Configurations
of PEs on both sides must be the same; otherwise, two PEs cannot communicate with
each other.
On the NE20E, the RTP header is padded by keeping the sequence number (16 bits) consistent with that in the PW control word and setting the other bits to 0s.
TDM Payload
The length of the TDM payload (in bytes) is the number of encapsulated frames multiplied by the number of timeslots bound to the PW. When the whole PW packet is shorter than 64 bytes, fixed bit fields are padded to meet Ethernet transmission requirements.
SAToP implementation
SAToP services are encapsulated through MPLS, with the structure defined in RFC 4553 (Structure-Agnostic Time Division Multiplexing over Packet), as shown in Figure 1-584.
MPLS Label
The MPLS label for SAToP is the same as the MPLS label for CESoPSN.
PW Control Word
The structure of the SAToP control word is defined in RFC 4553 (Structure-Agnostic Time Division Multiplexing over Packet), as shown in Figure 1-585.
The optional RTP for SAToP is the same as the optional RTP for CESoPSN.
TDM Payload
The length of the TDM payload (in bytes) is the number of encapsulated frames multiplied by 32. When the whole PW packet is shorter than 64 bytes, fixed bits are padded to meet Ethernet transmission requirements.
Implementation Procedures
E1 frames are transmitted at 8000 frames/second, and each frame is 32 bytes. An E1 frame consists of 32 timeslots, each corresponding to one of the 32 bytes. In CESoPSN mode, timeslot 0 (byte 0) serves as the frame header; it cannot carry data and is processed specially. The other 31 timeslots correspond to bytes 1 to 31 of each E1 frame. In SAToP mode, no frame header is used, and an E1 frame consists of 32 payload bytes.
As shown in Figure 1-586, the implementation procedure goes from CE1 through PE1 and PE2 to CE2. In the TDM transparent transmission direction from CE1 to PE1, in CESoPSN mode, PE1 encapsulates bytes 1 to 31 (the payload) of each E1 frame received from CE1 into a PW packet. In SAToP mode, PE1 encapsulates 256 bits (32 x 8 bits) from the bit stream as payload into a PW packet. Because the E1 frame frequency is fixed, PE1 receives data (31 bytes or 256 bits) at a fixed frequency from CE1 and encapsulates it into the PW packet continuously. When the number of encapsulated frames reaches the pre-configured number, the whole PW packet is sent to the PSN.
In the encapsulation structure of a PW packet, the control word is mandatory. Note the L bit, R bit, and sequence number fields. The L and R bits carry alarm information. They are used when the TDM transparent transmission process transmits E1 frame data received by PE1 over a PW to an E1 interface of PE2, and PE1 needs to transmit alarm information (such as AIS and RDI) from CE1 to the remote device. PE1 reports received alarm information (AIS/RDI) to the control plane. The control plane modifies the L and R bits in the control word of the PW packet, which is then sent with the E1 frame data to PE2.
The sequence number is used to prevent PW packets from being discarded or disordered
during forwarding on the PSN. Every time a PW packet is sent by PE1, the sequence number
increases by 1.
The downstream traffic goes from PE2 to CE2. After receiving a PW packet from the PSN, PE2 caches the packet in one of several buffers, selected by masking the sequence number. For example, if the sequence number is 16 bits and 256 buffers are configured, the lowest 8 bits of the sequence number are used as the buffer address. When the sequence numbers of received PW packets are consecutive and the configured jitter buffer for the PW reaches its threshold, the PW packets are unpacked and sent. For example, if 8 frames are encapsulated in each packet, then at 8000 frames/second each packet takes 1 ms to accumulate; if the jitter buffer is configured to 3 ms, PW packets are not sent until three packets have been buffered.
If the PW packet corresponding to a sequence number is not received, an idle code (its
payload is configurable) is sent.
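The buffer-selection and jitter-buffer arithmetic described above can be sketched as follows (the names and sizes are assumptions drawn from the example in the text):

```python
NUM_BUFFERS = 256          # buffers indexed by the low 8 bits of the sequence number
FRAMES_PER_PACKET = 8      # 8 frames/packet at 8000 frames/s -> 1 ms per packet
JITTER_BUFFER_MS = 3       # packets are held until 3 ms of data is buffered

def buffer_index(seq16: int) -> int:
    """Select a cache buffer by masking the 16-bit sequence number."""
    return seq16 & (NUM_BUFFERS - 1)

def packets_to_hold() -> int:
    """Number of packets buffered before playout starts."""
    ms_per_packet = FRAMES_PER_PACKET * 1000 // 8000
    return JITTER_BUFFER_MS // ms_per_packet
```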
The L and R bits must be processed before the PW packet is parsed and the sequence number is processed. The L and R bits that carry alarm information are sent to PE2. After the payload is extracted, it is sent to CE2 at the same frequency as CE1 transmits, with 31 bytes or 256 bits per frame; otherwise, PE2 overruns or underruns. Therefore, clock synchronization (frequency synchronization) is required between the CE1 clock and the PE2 clock in TDM transparent transmission.
The recommended mode for frequency synchronization in TDM transparent transmission is ACR/DCR: PE2 calculates the sending clock frequency of CE1 according to the frequency of received PW packets and then uses this recovered clock frequency on the AC side to send E1 frame data.
As shown in Figure 1-586, it is assumed that data is transmitted from CE2 to CE1. Alarm
transparent transmission is the process of transmitting E1/T1 alarms on PE1 to downstream
PE2 through the PW control word, restoring E1/T1 alarms, and then transmitting them to CE2,
and vice versa.
The types of alarms that can be transparently transmitted are AIS and RDI. Involved PW
control words are the L bit, R bit, and M bit.
Other Features
The TDM interface can be created on either a common E1 interface or an E1 interface that is
channelized from CPOS.
Both the non-slotted TDM interface (SAToP transparent transmission) and the slotted TDM
interface (CES transparent transmission) can be created.
The serial port supports encapsulation of packets through multiple protocols such as TDM,
ATM, PPP, and HDLC.
The dynamic or static PW protocol is supported.
MPLS Label
The specified PSN header includes data required to forward packets from a PSN border
gateway to a TDM border gateway.
PWs are distinguished by MPLS labels that are carried on a specified PSN layer. To
transmit bidirectional TDM services, two PWs that transmit in opposite directions are
associated.
CEP Header
Figure 1-588 shows the CEP header format.
The sequence number (16 bits) in the RTP header is padded in the same way as that in
the CEP header. The other bits in the RTP header are 0s.
TDM Payload
The TDM packet payload can only be 783 bytes.
Implementation
Each STM-1 frame consists of 9 rows and 270 columns. VC-4 occupies 9 rows and 261
columns, a total of 2349 bytes. As a CEP payload is 783-bytes long, one VC-4 can be broken
into three CEP packets.
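The fragmentation arithmetic above can be checked with a short sketch (constant names are illustrative):

```python
STM1_ROWS, STM1_COLS = 9, 270   # an STM-1 frame is 9 rows x 270 columns
VC4_COLS = 261                  # VC-4 occupies 9 rows x 261 columns
CEP_PAYLOAD = 783               # fixed CEP payload size in bytes

vc4_bytes = STM1_ROWS * VC4_COLS        # total VC-4 bytes per STM-1 frame
packets_per_vc4 = vc4_bytes // CEP_PAYLOAD
```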
In the following example, CEP packets are transmitted along the path CE1 -> PE1 -> PE2 -> CE2. On the uplink of TDM transparent transmission from CE1 to PE1, PE1 fragments the VC-4 contained in an SDH frame sent by CE1 into 783-byte payloads and encapsulates each payload into a PW packet. Because the SDH frame frequency is fixed, PE1 receives data at a fixed frequency from CE1 and encapsulates it into PW packets continuously. When the number of encapsulated frames reaches the pre-configured number, the whole PW packet is sent to the PSN.
In the encapsulation structure of a PW packet, the CEP header is mandatory. The L bit and R
bit are used to carry alarm information. PE1 transmits its received SDH frame data to an SDH
interface of PE2 over a PW on the PSN and transmits alarm information (such as AIS and
RDI) received from CE1 to a remote device. PE1 reports received alarm information
(LOS/LOF/AUAIS/MSAIS/AULOP) to the control plane. The control plane modifies the L
bit and R bit in the control word of the PW packet and then sends them with SDH frame data
to PE2.
The sequence number is used to detect PW packets that are forwarded out of order (and therefore discarded) on the PSN. Each time PE1 sends a PW packet, the sequence number increases by 1.
On the downlink of TDM transparent transmission from PE2 to CE2, upon receipt of a PW packet from the PSN, PE2 caches the packet in one of several buffers, selected by masking the sequence number. For example, if the sequence number is 16 bits and 256 buffers are configured, the lowest 8 bits of the sequence number are used as the buffer address. When the sequence numbers of received PW packets are consecutive and the configured jitter buffer for the PW reaches its threshold, the PW packets are unpacked and sent.
If the PW packet corresponding to a sequence number is not received, an idle code (its
payload is configurable) is sent.
The L and R bits need to be processed before the PW packet is parsed and the sequence
number is processed. The L and R bits that carry alarm information are sent to PE2. After
being extracted from the PW packet, these payloads are assembled into a VC-4 and integrated
into an SDH frame. The SDH frame is then sent to CE2 at the same frequency as that when
the SDH frame is sent by CE1. Otherwise, PE2 overruns or underruns. Therefore, clock
synchronization (frequency synchronization) is required between the CE1 clock and PE2
clock in TDM transparent transmission.
The types of alarms that can be transparently transmitted are LOS, LOF, AUAIS, MSAIS, and
AULOP.
Involved PW control words are the L bit and R bit.
1.8.9.3 Applications
Applicable Scenario 1
Scenario description
After TDM services from 2G base stations are converged on the E1 or T1 interface on PE1,
TDM packets are encapsulated into PSN packets that can be transmitted on PSNs. After
reaching downstream PE2, PSN packets are decapsulated to original TDM packets and then
the TDM packets are sent to the 2G convergence device.
Advantages of the solution
In the solution, multiple types of services are converged at a PE on the PSN. The solution effectively saves original network resources, uses fewer PDH VLLs, and facilitates site deployment and the maintenance and administration of multiple services.
Applicable Scenario 2
Scenario description
TDM services of different office areas, residential areas, schools, enterprises, and institutions
can be accessed by a local PE through E1/T1 links. Heavy TDM services can be carried
through CPOS interfaces.
Advantages of the solution
The solution saves VLL rent because TDM services for enterprises are accessed by a local PE. In addition, the solution allows flexible selection of access types and proper network planning.
Applicable Scenario 3
Scenario description
In this solution, a network can carry 2G, 3G, and fixed-network services concurrently. The solution physically integrates the transmission of different types of services while keeping their management independent. Therefore, it provides different service bearer solutions for different operators on the same network.
Advantages of the solution
In the solution, different services can be carried on the same network and therefore the
resource utilization is improved and maintenance cost is reduced.
Applicable Scenario 4
Scenario description
Services in different timeslots at different sites can access the PSN through local E1 links. The PE on the convergence side binds different timeslots of different E1s into one E1, encapsulates the bound timeslots and other CE1/E1 services as SDH data, and finally sends the encapsulated packets to the base station controller (BSC) through a CPOS interface.
Advantages of the solution
The solution channelizes E1 services, transparently transmits E1 services, multiplexes
timeslots of multiple E1s to one E1, and manages services of multiple E1s/CE1s through the
same CPOS interface.
Terms
None
1.9 IP Services
1.9.1 About This Document
Purpose
This document describes the IP services feature in terms of its overview, principles, and
applications.
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
Network planning engineers
Commissioning engineers
Security Declaration
Encryption algorithm declaration
The encryption algorithms DES/3DES/SKIPJACK/RC2/RSA (RSA-1024 or lower)/MD2/MD4/MD5 (in digital signature scenarios and password encryption)/SHA1 (in digital signature scenarios) have low security and may introduce security risks. If the protocols allow, using more secure algorithms, such as AES/RSA (RSA-2048 or higher)/SHA2/HMAC-SHA2, is recommended.
Password configuration declaration
− Do not set both the start and end characters of a password to "%^%#". Otherwise, the password is displayed directly in the configuration file.
− To further improve device security, periodically change the password.
Personal data declaration
Your purchased products, services, or features may use some of users' personal data during service operation or fault locating. You must define user privacy policies in compliance with local laws and take proper measures to fully protect personal data.
Feature declaration
− The NetStream feature may be used to analyze the communication information of
terminal customers for network traffic statistics and management purposes. Before
enabling the NetStream feature, ensure that it is performed within the boundaries
permitted by applicable laws and regulations. Effective measures must be taken to
ensure that information is securely protected.
− The mirroring feature may be used to analyze the communication information of
terminal customers for a maintenance purpose. Before enabling the mirroring
function, ensure that it is performed within the boundaries permitted by applicable
laws and regulations. Effective measures must be taken to ensure that information is
securely protected.
− The packet header obtaining feature may be used to collect or store some
communication information about specific customers for transmission fault and
error detection purposes. Huawei cannot offer services to collect or store this
information unilaterally. Before enabling the function, ensure that it is performed
within the boundaries permitted by applicable laws and regulations. Effective
measures must be taken to ensure that information is securely protected.
Reliability design declaration
Network planning and site design must comply with reliability design principles and
provide device- and solution-level protection. Device-level protection includes planning
principles of dual-network and inter-board dual-link to avoid single point or single link
of failure. Solution-level protection refers to a fast convergence mechanism, such as FRR
and VRRP.
Special Declaration
This document serves only as a guide. The content is written based on device
information gathered under lab conditions. The content provided by this document is
intended to be taken as general guidance, and does not cover all scenarios. The content
provided by this document may be different from the information on user device
interfaces due to factors such as version upgrades and differences in device models,
board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are
beyond the scope of this document.
The maximum values provided in this document are obtained in specific lab
environments (for example, only a certain type of board or protocol is configured on a
tested device). The actually obtained maximum values may be different from the
maximum values provided in this document due to factors such as differences in
hardware configurations and carried services.
Interface numbers used in this document are examples. Use the existing interface
numbers on devices for configuration.
The pictures of hardware in this document are for reference only.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
Indicates an imminently hazardous situation which, if not
avoided, will result in death or serious injury.
Change History
Updates between document issues are cumulative. Therefore, the latest document issue
contains all updates made in previous issues.
Changes in Issue 03 (2017-09-20)
This issue is the third official release. The software version of this issue is
V800R009C10SPC200.
Changes in Issue 02 (2017-07-30)
This issue is the second official release. The software version of this issue is
V800R009C10SPC100.
1.9.2 ARP
1.9.2.1 Introduction
Definition
The Address Resolution Protocol (ARP) is an Internet protocol used to map IP addresses to
MAC addresses.
Purpose
If two hosts need to communicate, the sender must know the network-layer IP address of the
receiver. IP datagrams, however, must be encapsulated with MAC addresses before they can
be transmitted over the physical network. Therefore, ARP is needed to map IP addresses to
MAC addresses to ensure the transmission of datagrams.
Function Overview
Table 1-147 lists ARP features.
Benefits
ARP ensures communication by mapping IP addresses at the network layer to MAC addresses
at the link layer on Ethernet networks.
1.9.2.2 Principles
1.9.2.2.1 Basic Principles
Related Concepts
ARP involves the following concepts:
Address Resolution Protocol (ARP) messages
An ARP message can be an ARP request or reply message. Figure 1-595 shows the ARP
message format.
The Ethernet Address of destination field contains a total of 48 bits. Ethernet Address of destination
(0-31) indicates the first 32 bits of the Ethernet Address of destination field, and Ethernet Address of
destination (32-47) indicates the last 16 bits of the Ethernet Address of destination field.
An ARP message consists of 42 bytes. The first 14 bytes indicate the Ethernet frame
header, and the last 28 bytes are the ARP request or reply message content. Table 1-148
describes the fields in an ARP message.
Ethernet address of sender (48 bits): source MAC address. The value of this field is the same as the Ethernet source MAC address in the Ethernet frame header.
IP address of sender (32 bits): source IP address.
Ethernet address of destination (48 bits): destination MAC address. The value of this field in an ARP request message is 0x00-00-00-00-00-00.
IP address of destination (32 bits): destination IP address.
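The 42-byte ARP request described above (14-byte Ethernet header plus 28-byte ARP message) can be sketched as follows; the function name and sample addresses are illustrative:

```python
import struct

def build_arp_request(src_mac: bytes, src_ip: bytes, dst_ip: bytes) -> bytes:
    """Build a 42-byte Ethernet-framed ARP request:
    14-byte Ethernet header + 28-byte ARP message."""
    broadcast = b"\xff" * 6
    eth_header = broadcast + src_mac + struct.pack("!H", 0x0806)  # EtherType ARP
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,             # hardware type: Ethernet
        0x0800,        # protocol type: IPv4
        6, 4,          # hardware/protocol address lengths
        1,             # operation: 1 = ARP request
        src_mac, src_ip,
        b"\x00" * 6,   # Ethernet address of destination: all 0s in a request
        dst_ip,
    )
    return eth_header + arp

pkt = build_arp_request(b"\x02\x00\x00\x00\x00\x01",
                        bytes([192, 168, 1, 1]), bytes([192, 168, 1, 2]))
```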
ARP table
An ARP table contains the latest mapping between IP and MAC addresses. If a host
always broadcasts an ARP request message for a MAC address before it sends an IP
datagram, network communication traffic will greatly increase. Furthermore, all other
hosts on the network have to receive and process the ARP request messages, which
lowers network efficiency. To solve this problem, an ARP table is maintained on each
host to ensure efficient ARP operations. The mapping between an IP address and a MAC
address is called an ARP entry.
ARP entries can be classified as dynamic or static.
− Dynamic ARP entries are automatically generated and maintained by using ARP
messages. Dynamic ARP entries can be aged and overwritten by static ARP entries.
− Static ARP entries are manually configured and maintained by a network
administrator. Static ARP entries can neither be aged nor be overwritten by dynamic
ARP entries.
Before sending IP datagrams, a host searches the ARP table for the MAC address
corresponding to the destination IP address.
− If the ARP table contains the corresponding MAC address, the host directly sends
the IP datagrams to the MAC address instead of sending an ARP request message.
− If the ARP table does not contain the corresponding MAC address, the host
broadcasts an ARP request message to request the MAC address of the destination
host.
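The lookup logic above can be sketched as follows (a dictionary stands in for the ARP table; the entries are illustrative):

```python
# Illustrative ARP table: IP address -> MAC address.
arp_table = {"192.168.1.2": "02-00-00-00-00-02"}

def resolve(dest_ip: str):
    """Return the cached MAC address, or None to signal that an
    ARP request must be broadcast for this destination."""
    mac = arp_table.get(dest_ip)
    if mac is None:
        # Not cached: the host would broadcast an ARP request here.
        return None
    return mac  # Cached: send IP datagrams directly to this MAC.
```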
Reverse Address Resolution Protocol (RARP)
If only the MAC address of a host is available, the host can send and receive RARP
messages to obtain its IP address.
To do so, the network administrator must establish the mapping between MAC addresses
and IP addresses on a gateway. When a new host is configured, its RARP client requests
the host's IP address from the RARP server on the gateway.
Implementation
ARP implementation within a network segment
Figure 1-596 illustrates how ARP is implemented within a network segment, by using IP
datagram transmission from Host A to Host B as an example.
Figure 1-596 ARP implementation between Host A and Host B on the same network
segment
a. Host A searches its ARP table and does not find the mapping between the IP and
MAC addresses of Host B. Host A then sends an ARP request message for the
MAC address of Host B. In this ARP request message, the source IP and MAC
addresses are respectively the IP and MAC addresses of Host A, the destination IP
and MAC addresses are respectively the IP address of Host B and
00-00-00-00-00-00, and the Ethernet source MAC address and Ethernet destination
MAC address are respectively the MAC address of Host A and the broadcast MAC
address.
b. After CE1 receives the ARP request message, CE1 broadcasts it on the network
segment.
c. After Host B receives the ARP request message, Host B adds the MAC address of
Host A to its ARP table and sends an ARP reply message to Host A. In this ARP
reply message, the source IP and MAC addresses are respectively the IP and MAC
addresses of Host B, the destination IP and MAC addresses are respectively the IP
and MAC addresses of Host A, and the Ethernet source and destination MAC
addresses are respectively the MAC addresses of Host B and Host A.
The PE also receives the ARP request message but discards it because the destination IP address in the
ARP request message is not its own IP address.
d. CE1 receives the ARP reply message and forwards it to Host A.
e. After Host A receives the ARP reply message, Host A adds the MAC address of
Host B to its ARP table and sends the IP datagrams to Host B.
ARP implementation between different network segments
ARP messages are Layer 2 messages. Therefore, ARP is applicable only to devices on
the same network segment. If two hosts on different network segments need to
communicate, the source host sends IP datagrams to the default gateway, which in turns
forwards the IP datagrams to the destination host. ARP implementation between different
network segments involves separate ARP implementation within network segments. In
this manner, hosts on different network segments can communicate.
The following examples show how ARP is implemented between different network
segments, by using IP datagram transmission from Host A to Host C as an example.
Figure 1-597 illustrates how ARP is implemented between Host A and the PE on the
same network segment.
a. Host A searches its ARP table and does not find the mapping between the IP and
MAC addresses of Interface 1 on the default gateway PE that connects to Host C.
Host A then sends an ARP request message for the MAC address of the PE's
Interface 1. In this ARP request message, the source IP and MAC addresses are
respectively the IP and MAC addresses of Host A, the destination IP and MAC
addresses are respectively the IP address of the PE's Interface 1 and
00-00-00-00-00-00, and the Ethernet source and destination MAC addresses are
respectively the MAC address of Host A and the broadcast MAC address.
b. After CE1 receives the ARP request message, CE1 broadcasts it on the network
segment.
c. After the PE receives the ARP request message, the PE adds the MAC address of
Host A to its ARP table and sends an ARP reply message to Host A. In this ARP
reply message, the source IP and MAC addresses are respectively the IP and MAC
addresses of the PE's Interface 1, the destination IP and MAC addresses are
respectively the IP and MAC addresses of Host A, and the Ethernet source and
destination MAC addresses are respectively the MAC address of the PE's Interface
1 and the MAC address of Host A.
Host B also receives the ARP request message but discards it because the destination IP address in the
ARP request message is not its own IP address.
d. CE1 receives the ARP reply message and forwards it to Host A.
e. After Host A receives the ARP reply message, Host A adds the MAC address of the
PE's Interface 1 to its ARP table and sends the IP datagrams to the PE.
Figure 1-598 illustrates ARP implementation between the PE and Host C on the same
network segment.
The PE searches its routing table and sends the IP datagrams from Interface 1 to
Interface 2.
a. The PE searches its ARP table and does not find the mapping between the IP
address and MAC address of Host C. Then, the PE sends an ARP request message
for the MAC address of Host C. In this ARP request message, the source IP and
MAC addresses are respectively the IP and MAC addresses of the PE's Interface 2,
the destination IP and MAC addresses are respectively the Host C's IP address and
00-00-00-00-00-00, and the Ethernet source and destination MAC address are
respectively the MAC address of Interface 2 on PE and the broadcast MAC address.
b. After CE2 receives the ARP request message, CE2 broadcasts it on the network
segment.
c. After Host C receives the ARP request message, Host C adds the MAC address of
the PE's Interface 2 to its ARP table and sends an ARP reply message to the PE. In
this ARP reply message, the source IP and MAC addresses are respectively the IP
and MAC addresses of Host C, the destination IP and MAC addresses are
respectively the IP and MAC addresses of the PE's Interface 2, and the Ethernet
source and destination MAC addresses are respectively the MAC address of Host C
and the MAC address of Interface 2 on PE.
Host D also receives the ARP request message but discards it because the destination IP address in the
ARP request message is not its own IP address.
d. CE2 receives the ARP reply message and forwards it to the PE.
e. After the PE receives the ARP reply message, the PE adds the MAC address of
Host C to its ARP table and sends the IP datagrams to Host C.
So far, the IP datagram transmission from Host A to Host C is complete.
1. ARP request messages are broadcast, whereas ARP reply messages are unicast.
2. In ARP implementation, the switches CE1 and CE2 transparently forward IP datagrams and do not
modify them.
Definition
Dynamic ARP allows devices to dynamically learn and update the mapping between IP and
MAC addresses using ARP messages. You do not need to manually configure the mapping.
Related Concepts
Dynamic ARP uses the dynamic ARP aging mechanism.
The dynamic ARP aging mechanism enables an ARP entry that is not used over a specified
period to be automatically deleted. This mechanism helps reduce storage space of ARP tables
and speed up ARP table queries.
Table 1-149 describes concepts related to the dynamic ARP aging mechanism.
Implementation
Dynamic ARP entries can be created, updated, and aged.
Creating and updating dynamic ARP entries
If a device receives an ARP message that meets either of the following conditions, the
device automatically creates or updates an ARP entry:
− The source IP address of the ARP message is on the same network segment as the
IP address of the inbound interface. The destination IP address of the ARP message
is the IP address of the inbound interface.
− The source IP address of the ARP message is on the same network segment as the
IP address of the inbound interface. The destination IP address of the ARP message
is the virtual IP address of the VRRP backup group configured on the interface on
the device.
Aging dynamic ARP entries
After the aging timer of a dynamic ARP entry on a device expires, the device sends ARP
aging probe messages to the peer device. If the device receives no ARP reply message
after a specified number of aging probe attempts, it deletes the dynamic ARP entry. A
shutdown operation on an interface triggers deletion of the ARP entries on that interface,
and a shutdown operation on a VS triggers deletion of the ARP entries in that VS.
This feature limits the rate at which ARP probe messages are sent to prevent ARP
probing from consuming too many system resources. In high-specification scenarios, a
long time may therefore elapse between the start of ARP probing and the completion of
ARP entry aging.
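The entry creation conditions and aging behavior described above can be sketched as follows. This is a minimal simulation, not device code; the aging time, probe count, and table layout are illustrative assumptions.

```python
# Hypothetical sketch of dynamic ARP entry creation and aging; the constants
# below are illustrative, not the device's actual defaults.
import ipaddress
import time

AGING_TIME = 1200      # assumed aging time in seconds
PROBE_ATTEMPTS = 3     # assumed number of aging probe attempts

class ArpTable:
    def __init__(self):
        self.entries = {}  # ip -> {"mac": ..., "updated": ...}

    def on_arp_message(self, src_ip, src_mac, dst_ip, if_ip, if_prefix, vrrp_vip=None):
        """Create or update an entry only if the conditions above are met."""
        net = ipaddress.ip_network(f"{if_ip}/{if_prefix}", strict=False)
        same_segment = ipaddress.ip_address(src_ip) in net
        targets_us = dst_ip == if_ip or (vrrp_vip is not None and dst_ip == vrrp_vip)
        if same_segment and targets_us:
            self.entries[src_ip] = {"mac": src_mac, "updated": time.time()}
            return True
        return False

    def age(self, now, send_probe):
        """After the aging timer expires, probe; delete after repeated failures."""
        for ip in list(self.entries):
            if now - self.entries[ip]["updated"] < AGING_TIME:
                continue
            if not any(send_probe(ip) for _ in range(PROBE_ATTEMPTS)):
                del self.entries[ip]  # entry is aged out
```

A message from another network segment, or one not addressed to the interface (or VRRP virtual) IP address, creates no entry, which mirrors the two conditions listed above.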
Enhanced Functions
Dynamic ARP has an enhanced Layer 2 topology probe function. This function enables a
device to set the aging time to 0 for all ARP entries corresponding to a VLAN to which a
Layer 2 interface belongs when the Layer 2 interface becomes Up. The device then resends
ARP probe messages to update all ARP entries.
If a non-Huawei device that connects to a Huawei device receives an ARP aging probe
message with the destination MAC address as the broadcast address and the ARP table of the
non-Huawei device contains the mapping between the IP address and MAC address of the
Huawei device, the non-Huawei device does not respond to the broadcast ARP aging probe
message. Therefore, the Huawei device considers the link to the non-Huawei device Down
and deletes the mapping between the IP address and MAC address of the non-Huawei device.
To prevent this problem, configure Layer 2 topology change so that the Huawei device
unicasts ARP aging probe messages to the non-Huawei device.
Usage Scenario
Dynamic ARP applies to a network with a complex topology, insufficient bandwidth resources,
and a high real-time communication requirement.
Benefits
Dynamic ARP entries are dynamically created and updated using ARP messages. They do not
need to be manually maintained, greatly reducing maintenance workload.
Definition
Static ARP allows a network administrator to create the mapping between IP and MAC
addresses.
Principles
Static ARP implements the following functions:
Binds IP addresses to the MAC address of a specified gateway so that IP datagrams
destined for these IP addresses must be forwarded by this gateway.
Binds the destination IP addresses of IP datagrams sent by a specified host to a
nonexistent MAC address, helping filter out unwanted IP datagrams.
To ensure the stability and security of network communication, deploy static ARP based on
actual requirements and network resources.
Related Concepts
Static ARP entries are classified as short or long entries.
Short static ARP entries
Short static ARP entries contain only IP and MAC addresses. A device still has to send
ARP request messages. If the source IP and MAC addresses of the received reply
messages are the same as the configured IP and MAC addresses in a short static ARP
entry, the device adds the interface that receives the ARP reply messages to the short
static ARP entry. The device can use this interface to forward subsequent messages
directly. Short static ARP entries cannot be directly used to forward messages.
Configuring short static ARP entries enables a host and a device to communicate using
fixed IP and MAC addresses.
In Network Load Balancing (NLB) scenarios, you must configure both MAC entries with multiple
outbound interfaces and short static ARP entries for the gateway. These MAC entries and short static
ARP entries must have the same MAC address. In NLB scenarios, short static ARP entries are also
called ARP entries with multiple outbound interfaces and cannot be updated manually.
Long static ARP entries
Long static ARP entries contain IP and MAC addresses as well as the VLAN and
outbound interface through which devices send packets. Long static ARP entries are
directly used to forward messages.
Configuring long static ARP entries enables a host and a device to communicate through
a specified interface in a VLAN.
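The difference between short and long static ARP entries can be sketched with a simple data structure. The field names here are assumptions for illustration; only the distinction (a long entry also fixes the VLAN and outbound interface, while a short entry learns its interface from an ARP reply) comes from the text above.

```python
# Illustrative sketch of short vs. long static ARP entries; field names are
# hypothetical, not the device's internal representation.
from dataclasses import dataclass
from typing import Optional

@dataclass
class StaticArpEntry:
    ip: str
    mac: str
    vlan: Optional[int] = None            # set only for long entries
    out_interface: Optional[str] = None

    def is_long(self):
        # A long entry also fixes the VLAN and outbound interface,
        # so it can be used for forwarding directly.
        return self.vlan is not None and self.out_interface is not None

    def resolve_short(self, reply_src_ip, reply_src_mac, reply_interface):
        """A short entry learns its interface from a matching ARP reply."""
        if not self.is_long() and reply_src_ip == self.ip and reply_src_mac == self.mac:
            self.out_interface = reply_interface
            return True
        return False
```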
Usage Scenario
Static ARP applies to the following scenarios:
Networks with a simple topology and high stability
Networks on which information security is of high priority, such as a government or
military network
Short static ARP entries mainly apply to scenarios in which network administrators want
to bind hosts' IP and MAC addresses but hosts' access interfaces can change.
Benefits
Static ARP ensures communication security. If a static ARP entry is configured on a device,
the device can communicate with the peer device using only the specified MAC address.
Network attackers cannot modify the mapping between the IP and MAC addresses using ARP
messages, ensuring communication between the two devices.
Principles
Gratuitous ARP allows a device to broadcast gratuitous ARP messages that carry the local IP
address as both the source and destination IP addresses to notify the other devices on the same
network segment of its address information. Gratuitous ARP is used in the following scenarios
to ensure the stability and reliability of network communication:
You need to check whether the IP address of a device conflicts with the IP address of
another device on the same network segment. The IP address of each device must be
unique to ensure the stability of network communication.
After the MAC address of a host changes after its network adapter is replaced, the host
must quickly notify other devices on the same network segment of the MAC address
change before the ARP entry is aged. This ensures the reliability of network
communication.
When a master/backup switchover occurs in a VRRP backup group, the new master
device must notify other devices on the same network segment of its status change.
Related Concepts
Gratuitous ARP uses gratuitous ARP messages. A gratuitous ARP message is a special ARP
message that carries the sender's IP address as both the source and destination IP addresses.
Implementation
Gratuitous ARP is implemented as follows:
If a device finds that the source IP address in a received gratuitous ARP message is the
same as its own IP address, the device sends a gratuitous ARP message to notify the
sender of the address conflict.
If a device finds that the source IP address in a received gratuitous ARP message is
different from its own IP address, the device updates the corresponding ARP entry with
the sender's IP and MAC addresses carried in the gratuitous ARP message.
Figure 1-599 illustrates how gratuitous ARP is implemented.
As shown in Figure 1-599, the IP address of Interface 1 on PE1 is 10.1.1.1, and the IP address
of Interface 2 on PE2 is 10.1.1.1.
1. Interface 1 broadcasts an ARP request message. Interface 2 receives the ARP request
message and finds that the source IP address in the message conflicts with its own IP
address. Interface 2 then performs the following operations:
a. Sends a gratuitous ARP message to notify Interface 1 of its IP address.
b. Generates a conflict node on its conflict link and then sends gratuitous ARP
messages to Interface 1 at an interval of 5 seconds.
2. Interface 1 receives the gratuitous ARP messages from Interface 2 and finds that the
source IP address in the messages conflicts with its own IP address. Interface 1 then
performs the following operations:
a. Sends a gratuitous ARP message to notify Interface 2 of its IP address.
b. Generates a conflict node on its conflict link and then sends gratuitous ARP
messages to Interface 2 at an interval of 5 seconds.
Interface 1 and Interface 2 send gratuitous ARP messages to each other at an interval of 5
seconds until the address conflict is rectified.
If one interface does not receive a gratuitous ARP message from the other interface within 8
seconds, the interface considers the address conflict rectified. The interface deletes the
conflict node on its conflict link and stops sending gratuitous ARP messages to the other
interface.
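The conflict timers above (a 5-second resend interval and an 8-second silence window) can be sketched as a small state machine. The interval and timeout values come from the text; the method names are assumptions.

```python
# Sketch of the gratuitous-ARP conflict node described above; the 5 s resend
# interval and 8 s resolve window are taken from the text.
SEND_INTERVAL = 5
RESOLVE_TIMEOUT = 8

class ConflictNode:
    def __init__(self, now):
        self.last_sent = now
        self.last_heard = now

    def tick(self, now, peer_garp_seen):
        """Return the action for this tick: send another message, resolve, or wait."""
        if peer_garp_seen:
            self.last_heard = now
        if now - self.last_heard >= RESOLVE_TIMEOUT:
            return "resolved"          # delete the conflict node, stop sending
        if now - self.last_sent >= SEND_INTERVAL:
            self.last_sent = now
            return "send_garp"         # notify the peer of the conflict again
        return "wait"
```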
Functions
Gratuitous ARP has the following functions:
Checks for IP address conflicts. If a device receives a gratuitous ARP message from
another device, the IP addresses of the two devices conflict.
Notifies MAC address changes. When the MAC address of a host changes after its
network adapter is replaced, the host sends a gratuitous ARP message to notify other
devices of the MAC address change before the ARP entry is aged. This ensures the
reliability of network communication. After receiving the gratuitous ARP message, other
devices maintain the corresponding ARP entry in their ARP tables based on the address
information carried in the message.
Notifies status changes. When a master/backup switchover occurs in a VRRP backup
group, the new master device sends a gratuitous ARP message to notify other devices on
the network of its status change.
Benefits
Gratuitous ARP reveals address conflict on a network so that ARP tables of devices can be
quickly updated. This feature ensures the stability and reliability of network communication.
Principles
ARP is applicable only to devices on the same physical network. When a device on a physical
network needs to send IP datagrams to another physical network, the gateway is used to query
the routing table to implement communication between the two networks. However, routing
table query consumes system resources and can affect other services. To resolve this problem,
deploy proxy ARP on an intermediate device. Proxy ARP enables devices that reside on
different physical network segments but on the same IP network to resolve IP addresses to
MAC addresses. This feature helps reduce system resource consumption caused by routing
table queries and improves the efficiency of system processing.
Implementation
Routed proxy ARP
A large company network is usually divided into multiple subnets to facilitate
management. The routing information of a host in a subnet can be modified so that IP
datagrams sent from this host to another subnet are first sent to the gateway and then to
another subnet. However, this solution makes it hard to manage and maintain devices.
Deploying proxy ARP on the gateway effectively resolves management and maintenance
problems caused by network division.
Figure 1-600 illustrates how routed proxy ARP is implemented between Host A and Host
B.
a. Host A sends an ARP request message for the MAC address of Host B.
b. After the PE receives the ARP request message, the PE checks the destination IP
address of the message and finds that it is not its own IP address and determines
that the requested MAC address is not its MAC address. The PE then checks
whether there are routes to Host B.
If a route to Host B is available, the PE checks whether routed proxy ARP is
enabled.
If routed proxy ARP is enabled on the PE, the PE sends the MAC address
of its Interface 1 to Host A.
If routed proxy ARP is not enabled on the PE, the PE discards the ARP
request message sent by Host A.
If no route to Host B is available, the PE discards the ARP request message
sent by Host A.
c. After Host A learns the MAC address of the PE's Interface 1, Host A sends IP
datagrams to the PE using this MAC address.
The PE receives the IP datagrams and forwards them to Host B.
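The routed proxy ARP branches above reduce to a short decision function. This is a sketch only; the routing-table lookup is stubbed as a set membership test, which is an assumption of this example.

```python
# Hypothetical decision logic for routed proxy ARP on the PE, mirroring the
# branches above; the routing table is stubbed as a set of reachable IPs.
def handle_arp_request(dst_ip, my_ip, my_mac, routes, routed_proxy_enabled):
    """Return the MAC address to answer with, or None to discard the request."""
    if dst_ip == my_ip:
        return my_mac                  # ordinary ARP reply for our own address
    if dst_ip in routes and routed_proxy_enabled:
        return my_mac                  # proxy reply: send our interface MAC
    return None                        # no route, or proxy ARP disabled: discard
```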
Intra-VLAN proxy ARP
Figure 1-601 illustrates how intra-VLAN proxy ARP is implemented between Host A
and Host C.
Host A, Host B, and Host C belong to the same VLAN, but Host A and Host C cannot
communicate at Layer 2 because port isolation is enabled on the CE. To allow Host A
and Host C to communicate, configure interface 1 (a Layer 3 interface) on the CE and
enable intra-VLAN proxy ARP.
a. Host A sends an ARP request message for the MAC address of Host C.
b. After the CE receives the ARP request message, the CE checks the destination IP
address of the message and finds that it is not its own IP address and determines
that the requested MAC address is not the MAC address of its Interface 1. The CE
then searches its ARP table for the ARP entry indicating the mapping between the
IP and MAC addresses of Host C.
If the CE finds this ARP entry in its ARP table, the CE checks whether
intra-VLAN proxy ARP is enabled.
If intra-VLAN proxy ARP is enabled on the CE, the CE sends the MAC
address of its interface 1 to Host A.
If intra-VLAN proxy ARP is not enabled on the CE, the CE discards the
ARP request message sent by Host A.
If the CE does not find this ARP entry in its ARP table, the CE discards the
ARP request message sent by Host A and checks whether intra-VLAN proxy
ARP is enabled.
If intra-VLAN proxy ARP is enabled on the CE, the CE broadcasts the
ARP request message with the IP address of Host C as the destination IP
address within VLAN 4. After the CE receives an ARP reply message
from Host C, the CE generates an ARP entry indicating the mapping
between the IP and MAC addresses of Host C.
If intra-VLAN proxy ARP is not enabled on the CE, the CE does not
perform any operations.
c. After Host A learns the MAC address of interface 1, Host A sends IP datagrams to
the CE using this MAC address.
The CE receives the IP datagrams and forwards them to Host C.
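The intra-VLAN branches above add one step to the routed case: when no ARP entry exists, the CE re-broadcasts the request within the VLAN and learns the reply. In this sketch the broadcast-and-learn step is simulated by a callback, which is an assumption of the example.

```python
# Sketch of the intra-VLAN proxy ARP branches above; broadcast_in_vlan is a
# hypothetical callback that returns the target's MAC, or None if no reply.
def intra_vlan_proxy(dst_ip, if_mac, arp_table, proxy_enabled, broadcast_in_vlan):
    """Return the MAC to reply with now, or None (possibly after learning)."""
    if not proxy_enabled:
        return None                       # discard the request
    if dst_ip in arp_table:
        return if_mac                     # reply with interface 1's MAC
    reply = broadcast_in_vlan(dst_ip)     # re-broadcast within the VLAN
    if reply is not None:
        arp_table[dst_ip] = reply         # learn the target's mapping for next time
    return None
```

The inter-VLAN and local proxy ARP flows follow the same structure, differing only in the scope of the re-broadcast (another VLAN or a bridge domain).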
Inter-VLAN proxy ARP
Figure 1-602 illustrates how inter-VLAN proxy ARP is implemented between Host A
and Host B.
a. Host A sends an ARP request message for the MAC address of Host B.
b. After the PE receives the ARP request message, the PE checks the destination IP
address of the message and finds that it is not its own IP address and determines
that the requested MAC address is not the MAC address of its interface 1. The PE
then searches its ARP table for the ARP entry indicating the mapping between the
IP and MAC addresses of Host B.
If the PE finds this ARP entry in its ARP table, the PE checks whether
inter-VLAN proxy ARP is enabled.
If inter-VLAN proxy ARP is enabled on the PE, the PE sends the MAC
address of its interface 1 to Host A.
If inter-VLAN proxy ARP is not enabled on the PE, the PE discards the
ARP request message sent by Host A.
If the PE does not find this ARP entry in its ARP table, the PE discards the
ARP request message sent by Host A and checks whether inter-VLAN proxy
ARP is enabled.
If inter-VLAN proxy ARP is enabled on the PE, the PE broadcasts the
ARP request message with the IP address of Host B as the destination IP
address within VLAN 2. After the PE receives an ARP reply message
from Host B, the PE generates an ARP entry indicating the mapping
between the IP and MAC addresses of Host B.
If inter-VLAN proxy ARP is not enabled on the PE, the PE does not
perform any operations.
c. After Host A learns the MAC address of interface 1, Host A sends IP datagrams to
the PE using this MAC address.
The PE receives the IP datagrams and forwards them to Host B.
Local proxy ARP
Figure 1-603 illustrates how local proxy ARP is implemented between Host A and Host
B.
Host A and Host B belong to the same bridge domain (BD) but cannot communicate at
Layer 2 because port isolation is enabled on the CE. To enable Host A and Host B to
communicate, a VBDIF interface (VBDIF2) is configured on the CE to implement local
proxy ARP.
a. Host A sends an ARP request message for the MAC address of Host B.
b. After the CE receives the ARP request message, the CE checks the destination IP
address of the message and finds that it is not its own IP address and determines
that the requested MAC address is not the MAC address of VBDIF2. The CE then
searches its ARP table for the ARP entry indicating the mapping between the IP and
MAC addresses of Host B.
If the CE finds this ARP entry in its ARP table, the CE checks whether local
proxy ARP is enabled.
If local proxy ARP is enabled on the CE, the CE sends the MAC address
of VBDIF2 to Host A.
If local proxy ARP is not enabled on the CE, the CE discards the ARP
request message.
If the CE does not find this ARP entry in its ARP table, the CE discards the
ARP request message and checks whether local proxy ARP is enabled.
If local proxy ARP is enabled on the CE, the CE broadcasts an ARP
request message to request Host B's MAC address. After receiving an
ARP reply message from Host B, the CE generates an ARP entry for Host
B.
If local proxy ARP is not enabled on the CE, the CE does not perform any
operations.
c. After Host A learns the MAC address of VBDIF2, Host A sends IP datagrams to
the CE using this MAC address.
The CE receives the IP datagrams and forwards them to Host B.
Usage Scenario
Table 1-150 describes the usage scenarios for proxy ARP.
Local proxy ARP: In an EVC model, two hosts that need to communicate belong to the
same network segment and the same BD in which user isolation is configured.
Benefits
Proxy ARP offers the following benefits:
Proxy ARP enables a host on a network to consider that the destination host is on the
same network segment. Therefore, the hosts do not need to know the physical network
details or be aware of how the network is divided into subnets.
All processing related to proxy ARP is performed on a gateway, with no configuration
needed on the hosts connecting to it. In addition, proxy ARP affects only the ARP tables
on hosts and does not affect the ARP table and routing table on a gateway.
Proxy ARP can be used when no default gateway is configured for a host or a host
cannot route messages.
1.9.2.2.6 ARP-Ping
Principles
ARP-Ping is classified as ARP-Ping IP or ARP-Ping MAC and is used to maintain a network
on which Layer 2 features are deployed. ARP-Ping uses ARP messages to detect whether an
IP or MAC address to be configured for a device is in use.
ARP-Ping IP
Before configuring an IP address for a device, check whether the IP address is being used
by another device. Generally, the ping operation can be used to check whether an IP
address is being used. However, if a firewall is configured for the device using the IP
address and the firewall is configured not to respond to ping messages, the IP address
may be mistakenly considered available. To resolve this problem, use the ARP-Ping IP
feature. ARP messages are Layer 2 protocol messages and, in most cases, can pass
through a firewall configured not to respond to ping messages.
ARP-Ping MAC
The host's MAC address is the fixed address of the network adapter on the host. It does
not normally need to be configured manually; however, there are exceptions. For
example, if a device has multiple interfaces and the manufacturer does not specify MAC
addresses for these interfaces, the MAC addresses must be configured, or a virtual MAC
address must be configured for a VRRP backup group. Before configuring a MAC
address, use the ARP-Ping MAC feature to check whether the MAC address is being
used by another device.
Related Concepts
ARP-Ping IP
A device obtains the specified IP address and outbound interface number from the
configuration management plane, saves them to the buffer, constructs an ARP request
message, and broadcasts the message on the outbound interface. If the device does not
receive an ARP reply message within a specified period, the device displays a message
indicating that the IP address is not being used by another device. If the device receives
an ARP reply message, the device compares the source IP address in the ARP reply
message with the IP address stored in the buffer. If the two IP addresses are the same, the
device displays the source MAC address in the ARP reply message and displays a
message indicating that the IP address is being used by another device.
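The ARP-Ping IP check above can be sketched as follows. The ARP exchange itself is abstracted away: the sketch only receives the (source IP, source MAC) pairs of replies that arrived before the timeout, which is an assumption of this example.

```python
# Minimal sketch of the ARP-Ping IP check described above; replies is a list
# of (src_ip, src_mac) pairs received before the timeout expires.
def arp_ping_ip(target_ip, replies):
    """Return the MAC address using target_ip, or None if the address appears unused."""
    for src_ip, src_mac in replies:
        if src_ip == target_ip:        # compare with the buffered target address
            return src_mac             # the IP address is already in use
    return None                        # no matching reply: address appears free
```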
ARP-Ping MAC
The ARP-Ping MAC process is similar to the ping process but ARP-Ping MAC is
applicable only to directly connected Ethernet LANs or Layer 2 Ethernet virtual private
networks (VPNs). A device obtains the specified MAC address and outbound interface
number (optional) from the configuration management plane, constructs an Internet
Control Message Protocol (ICMP) Echo Request message, and broadcasts the message
on all outbound interfaces. If the device does not receive an ICMP Echo Reply message
within a specified period, the device displays a message indicating that the MAC address
is not being used by another device. If the device receives an ICMP Echo Reply message
within a specified period, the device compares the source MAC address in the message
with the MAC address stored on the device. If the two MAC addresses are the same, the
device displays the source IP address in the ICMP Echo Reply message and displays a
message indicating that the MAC address is being used by another device.
Implementation
ARP-Ping IP implementation
The arp-ping ip command cannot be used to ping the device's own IP address, whereas the ping
command supports this.
ARP-Ping MAC implementation
As shown in Figure 1-605, Device A uses ARP-Ping MAC to check whether MAC
address 0013-46E7-2EF5 is being used by another host. After receiving ICMP Echo
Reply messages from all hosts on the network, Device A displays the IP address of the
host with the MAC address 0013-46E7-2EF5 and displays a message indicating that the
MAC address is being used by another host.
The ARP-Ping MAC implementation process is as follows:
a. After MAC address 0013-46E7-2EF5 is specified using a command line on Device
A, Device A broadcasts an ICMP Echo Request message and starts a timer for
ICMP Echo Reply messages.
b. After receiving the ICMP Echo Request message, all the other hosts on the same
LAN send ICMP Echo Reply messages to Device A.
c. After Device A receives an ICMP Echo Reply message from a host, Device A
compares the source MAC address in the message with the MAC address specified
in the command line.
If the two MAC addresses are the same, Device A displays the source IP
address in the ICMP Echo Reply message and displays a message indicating
that the MAC address is being used by another host. Meanwhile, Device A
stops the timer for ICMP Echo Reply messages.
If the two MAC addresses are different, Device A discards the ICMP Echo
Reply message and displays a message indicating that the MAC address is not
being used by another host.
If Device A does not receive any ICMP Echo Reply messages before the ICMP
Echo Reply message timer expires, it displays a message indicating that the MAC
address is not being used by another host.
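The ARP-Ping MAC matching step above can be sketched similarly. Here the ICMP Echo Reply messages collected before the timer expires are simulated as (source IP, source MAC) pairs; that representation is an assumption of this example.

```python
# Sketch of the ARP-Ping MAC comparison above; echo_replies simulates the
# (src_ip, src_mac) pairs received before the reply timer expires.
def arp_ping_mac(target_mac, echo_replies):
    """Return the IP address of the host using target_mac, or None if no host matched."""
    for src_ip, src_mac in echo_replies:
        if src_mac.lower() == target_mac.lower():
            return src_ip              # MAC in use: report the host's IP address
        # replies with a different source MAC are discarded
    return None                        # timer expired with no match
```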
Usage Scenario
ARP-Ping applies to directly connected Ethernet LANs or Layer 2 Ethernet VPNs.
Benefits
ARP-Ping checks whether an IP or MAC address to be configured is being used by another
device, preventing address conflict.
Background
Figure 1-606 shows a typical network topology with a VRRP backup group deployed. In the
topology, Device A is a master device, and Device B is a backup device. In normal
circumstances, Device A forwards uplink and downlink traffic. If Device A or the link
between Device A and the Switch becomes faulty, a master/backup VRRP switchover is
triggered to switch Device B to the Master state. Device B needs to advertise a network
segment route to a device on the network side to direct downlink traffic to Device B. If
Device B has not learned ARP entries from a device on the user side, the downlink traffic is
interrupted.
Dual-device ARP hot backup applies in both Virtual Router Redundancy Protocol (VRRP) and enhanced
trunk (E-Trunk) scenarios. This section describes the implementation of dual-device ARP hot backup in
VRRP scenarios.
Implementation
After you deploy dual-device ARP hot backup, the new master device forwards the downlink
traffic without learning ARP entries again. Dual-device ARP hot backup ensures downlink
traffic continuity.
As shown in Figure 1-607, a VRRP backup group is configured on Device A and Device B.
Device A is a master device, and Device B is a backup device. Device A forwards uplink and
downlink traffic.
If Device A or the link between Device A and the Switch becomes faulty, a master/backup
VRRP switchover is triggered to switch Device B to the Master state. Device B needs to
advertise a network segment route to a device on the network side to direct downlink traffic to
Device B.
Before you deploy dual-device ARP hot backup, Device B does not learn ARP entries
from a device on the user side and therefore a large number of ARP Miss messages are
transmitted. As a result, system resources are consumed and downlink traffic is
interrupted.
After you deploy dual-device ARP hot backup, Device B backs up ARP information on
Device A in real time. When Device B receives downlink traffic, it forwards the
downlink traffic based on the backup ARP information.
Related Concepts
VRRP is a fault-tolerant protocol. VRRP groups several routers into a virtual router. If the
next hop of a router is faulty, VRRP switches traffic to another router for transmission,
ensuring the continuity and reliability of communication.
For details about VRRP, see the chapter "VRRP" in the NE20E Feature Description - Network
Reliability.
E-Trunk is an inter-device link aggregation mechanism, which aggregates links between
devices to provide device-level reliability. For details about E-Trunk, see the chapter
"E-Trunk" in the NE20E Feature Description - LAN and MAN Access.
Usage Scenario
Dual-device ARP hot backup applies when VRRP or E-Trunk is deployed to implement a
master/backup device switchover.
To ensure that ARP entries are completely backed up, set the VRRP or E-Trunk switchback delay to a
value greater than the number of ARP entries that need to be backed up divided by the slowest backup
speed.
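The delay rule above is a simple division. In this worked example the entry count and backup speed are made-up numbers for illustration only.

```python
# Worked example of the switchback-delay rule above; the figures are
# hypothetical, chosen only to illustrate the arithmetic.
def min_switchback_delay(arp_entries, slowest_backup_speed):
    """The delay must exceed entries / speed (entries per second)."""
    return arp_entries / slowest_backup_speed

# e.g. 120000 entries backed up at 1000 entries/s need a delay greater than 120 s
delay = min_switchback_delay(120000, 1000)
```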
Benefits
Dual-device ARP hot backup prevents downlink traffic from being interrupted because the
backup device does not learn ARP entries of a device on the user side during a master/backup
device switchover, which improves network reliability.
1.9.2.3 Applications
1.9.2.3.1 Intra-VLAN Proxy ARP Application
Networking Description
As shown in Figure 1-608, to facilitate management, communication isolation is
implemented for various departments on the intranet of a company. For example, although
Host A of the president's office, Host B of the R&D department, and Host C of the financial
department belong to the same VLAN, they cannot communicate at Layer 2. However, the
business requires that the president's office communicate with the financial department. To
permit this, enable intra-VLAN proxy ARP on the CE so that Host A can communicate with
Host C.
Before intra-VLAN proxy ARP is enabled, if Host A sends an ARP request message for
the MAC address of Host C, the message cannot be broadcast to hosts of the R&D
department and financial department because port isolation is configured on the CE.
Therefore, Host A can never learn the MAC address of Host C and cannot communicate
with Host C.
After intra-VLAN proxy ARP is enabled, the CE does not discard the ARP request
message sent from Host A even if the destination IP address in the message is not its own
IP address. Instead, the CE sends the MAC address of its VLANIF4 to Host A. Host A
then sends IP datagrams to this MAC address.
Feature Deployment
Configure interface 1, which is a Layer 3 interface, on the CE, and enable intra-VLAN proxy
ARP. After the deployment, the CE sends the MAC address of its interface 1 to Host A when
receiving a request for the MAC address of Host C from Host A. Host A then sends IP
datagrams to the CE, which forwards the IP datagrams to Host C. Consequently, the
communication between Host A and Host C is implemented.
Networking Description
As shown in Figure 1-609, the intranet of an organization communicates with the Internet
through the gateway PE. To prevent network attackers from obtaining private information by
modifying ARP entries on the PE, deploy static ARP.
Before static ARP is deployed, the PE dynamically learns and updates ARP entries using
ARP messages. However, dynamic ARP entries can be aged and overwritten by new
dynamic ARP entries. Therefore, network attackers can send fake ARP messages to
modify ARP entries on the PE to obtain the private information of the organization.
After static ARP is deployed, ARP entries on the PE are manually configured and
maintained by a network administrator. Static ARP entries are neither aged nor
overwritten by dynamic ARP entries. Therefore, deploying static ARP can prevent
network attackers from sending fake ARP messages to modify ARP entries on the PE,
and information security is ensured.
Feature Deployment
Deploy static ARP on the PE to set up fixed mapping between IP and MAC addresses of hosts
on the intranet. This can prevent network attackers from sending fake ARP messages to
modify ARP entries on the PE, ensuring the stability and security of network communication
and minimizing the risk of private information being stolen.
Terms
Term Definition
ARP Address Resolution Protocol. An Internet protocol used to map IP
addresses to MAC addresses.
1.9.3 ACL
1.9.3.1 Introduction
Definition
As its name indicates, an Access Control List (ACL) is a list of matching clauses. These
clauses are matching rules that tell the device whether to perform an action on a packet.
Purpose
ACLs are used to ensure reliable data transmission between devices on a network by
performing the following:
Defend the network against various attacks, such as attacks by using IP, Transmission
Control Protocol (TCP), or Internet Control Message Protocol (ICMP) packets.
Control network access. For example, ACLs can be used to control enterprise network
user access to external networks, to specify the specific network resources accessible to
users, and to define the time ranges in which users can access networks.
Limit network traffic and improve network performance. For example, ACLs can be
used to limit the bandwidth for upstream and downstream traffic and to apply charging
rules to user requested bandwidth, therefore achieving efficient utilization of network
resources.
Benefits
ACL rules are used to classify packets. After ACL rules are applied to a router, the router
permits or denies packets based on them. The use of ACL rules therefore greatly improves
network security.
An ACL is a set of rules. It identifies a type of packet but does not filter packets. Other ACL-associated
functions are used to filter identified packets.
1.9.3.2 Principles
1.9.3.2.1 Basic ACL Concepts
ACL type
ACLs can be classified as ACL4 or ACL6 based on whether they support IPv4 or IPv6.
The following table outlines ACL4 classification based on functions.
For easy memorization, ACLs can be defined using names instead of numbers, much as
domain names replace IP addresses. ACLs of this type are called named ACLs, whereas the
ACLs described above are called numbered ACLs.
The only difference between named and numbered ACLs is that named ACLs are easier to
recognize owing to their descriptive names.
When naming an ACL, you can specify a number for it. If no number is specified, the system
will allocate one automatically.
One name is only for one ACL. Multiple ACLs cannot have the same name, even if they are of different
types.
ACL step
An ACL step is the difference between two adjacent ACL rule numbers that are automatically
allocated. For example, if the step is set to 5, the rule numbers are multiples of 5, such as 5, 10,
15, and 20.
If an ACL step is changed, rules in the ACL are automatically renumbered. For example,
if the ACL step is changed from 5 to 2, the original rule numbers 5, 10, 15, and 20 will
be renumbered as 2, 4, 6, and 8.
If the default step 5 is restored for an ACL, the system immediately renumbers the rules
in the ACL based on the default step. For example, if the step of ACL 3001 is 2, rules in
ACL 3001 are numbered 0, 2, 4, and 6. If the default step 5 is restored, the rules will be
renumbered as 5, 10, 15, and 20.
The ACL step facilitates rule maintenance and makes it convenient to add new ACL rules.
If a user has created four rules numbered 0, 5, 10, and 15 in an ACL, the user can add a
rule (for example, rule 1) between rules 0 and 5.
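The renumbering behavior described above can be sketched as follows; representing an ACL as an ordered mapping from rule ID to rule body is an assumption of this example.

```python
# Sketch of ACL step renumbering as described above; an ACL is modeled as an
# ordered dict of rule ID -> rule body (a simplifying assumption).
def renumber(rules, step):
    """Renumber rules, in their current order, as step, 2*step, 3*step, ..."""
    return {step * (i + 1): body for i, body in enumerate(rules.values())}

acl = {5: "permit A", 10: "permit B", 15: "deny C", 20: "deny D"}
# Changing the step from 5 to 2 renumbers the four rules as 2, 4, 6, and 8;
# restoring the default step 5 renumbers them back to 5, 10, 15, and 20.
```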
What is "Matched"
Matched: the ACL exists and contains a rule to which the packet conforms, regardless of
whether the rule action is permit or deny.
Mismatched: the ACL does not exist, the ACL contains no rules, or the packet does not
conform to any rule in the ACL.
A rule is identified by a rule ID, which is either configured by a user or generated by the system
according to the ACL step. All rules in an ACL are arranged in ascending order of rule IDs.
If rule IDs are automatically allocated, there is a certain space between two adjacent rule IDs, whose
size depends on the ACL step. For example, if the ACL step is set to 5, the difference between two
adjacent rule IDs is 5, and the allocated IDs are 5, 10, 15, and so on. If the ACL step is 2, the rule IDs
generated automatically by the system start from 2. In this manner, a user can add a rule before the
first rule.
In the configuration file, rules are displayed in ascending order of rule IDs, not in the order in
which they were configured.
Rules can be arranged in two modes: configuration mode and auto mode. The default mode is
configuration mode.
If the configuration mode is used, users can either set rule IDs or allow the device to
automatically allocate rule IDs based on the step.
If rule IDs are specified when rules are configured, the rules are inserted at the positions
specified by the rule IDs. For example, three rules with IDs 5, 10, and 15 exist on a
device. If a new rule with ID 3 is configured, the rules are displayed in ascending order:
3, 5, 10, and 15. This is equivalent to inserting a rule before rule 5. If users do not set rule
IDs, the device automatically allocates rule IDs based on the step. For example, if the
ACL step is set to 5, the interval between two adjacent rule IDs is 5, and the allocated
IDs are 5, 10, 15, and so on.
If the ACL step is set to 2, the device allocates rule IDs starting from 2. The step allows
users to insert new rules, facilitating rule maintenance. For example, the ACL step is 5
by default. If a user does not configure a rule ID, the system automatically generates
rule ID 5 for the first rule. If the user intends to add a new rule before rule 5, the user only
needs to specify a rule ID smaller than 5. After automatic realignment, the new rule
becomes the first rule.
In the configuration mode, the system matches packets against rules in ascending order of
rule IDs. As a result, a rule configured later may be matched earlier.
If the auto mode is used, the system automatically allocates rule IDs and places the most
precise rule at the front of the ACL based on the depth-first principle. This is
implemented by comparing address wildcards: the smaller the wildcard, the narrower
the specified range.
For example, 129.102.1.1 0.0.0.0 specifies the single host with the IP address 129.102.1.1,
whereas 129.102.1.1 0.0.0.255 specifies the network segment ranging from 129.102.1.0
to 129.102.1.255. The former specifies a narrower range and is therefore placed before
the latter.
The detailed operations are as follows:
− For basic ACL rules, the source address wildcards are compared. If the source
address wildcards are the same, the system matches packets against the ACL rules
based on the configuration order.
− For advanced ACL rules, the protocol ranges and then the source address wildcards
are compared. If both the protocol ranges and the source wildcards are the same, the
destination address wildcards are then compared. If the destination address
wildcards are also the same, the ranges of source port numbers are compared with
the smaller range being allocated a higher precedence. If the ranges of source port
numbers are still the same, the ranges of destination port numbers are compared
with the smaller range being allocated a higher precedence. If the ranges of
destination port numbers are still the same, the system matches packets against ACL
rules based on the configuration order of rules.
For example, suppose a wide range of packets is specified for packet filtering, and it is later required
that packets matching a specific feature within that range be allowed to pass. If the auto mode is
configured, the administrator only needs to define a more specific rule and does not need to re-order
the rules, because a narrower range is allocated a higher precedence in the auto mode.
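The wildcard comparison for basic ACL rules can be sketched in Python. This is an illustrative model that orders rules by source-wildcard size only, with ties kept in configuration order; the function names are not device terminology:

```python
import ipaddress

def wildcard_size(wildcard):
    """Number of addresses covered by a wildcard mask: each wildcard bit
    set to 1 doubles the range, i.e. 2 ** popcount(wildcard)."""
    return 2 ** bin(int(ipaddress.IPv4Address(wildcard))).count("1")

def auto_order(rules):
    """Order basic ACL rules depth-first: narrower source wildcard first.
    Python's sort is stable, so equal wildcards keep configuration order,
    matching the tie-breaking rule described above."""
    return sorted(rules, key=lambda r: wildcard_size(r[1]))

rules = [("129.102.1.1", "0.0.0.255", "permit"),   # network segment
         ("129.102.1.1", "0.0.0.0",   "deny")]     # single host
# The host rule (narrower range) is placed before the segment rule.
print(auto_order(rules)[0])   # ('129.102.1.1', '0.0.0.0', 'deny')
```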
Table 1-153 describes the depth-first principle for matching ACL rules.
Table 1-154 Default matching results of application modules in the mismatched case
Example
The following commands are configured one after another:
rule deny ip dscp 30 destination 1.1.0.0 0.0.255.255
rule permit ip dscp 30 destination 1.1.1.0 0.0.0.255
If the config mode is used, the rules in the ACL are displayed as follows:
acl 3000
rule 5 deny ip dscp 30 destination 1.1.0.0 0.0.255.255
rule 10 permit ip dscp 30 destination 1.1.1.0 0.0.0.255
If the auto mode is used, the rules in the ACL are displayed as follows:
acl 3000
rule 1 permit ip dscp 30 destination 1.1.1.0 0.0.0.255
rule 2 deny ip dscp 30 destination 1.1.0.0 0.0.255.255
If the device receives a packet with DSCP value 30 and destination IP address 1.1.1.1, the
packet is dropped when the config mode is used, but the packet is allowed to pass when the
auto mode is used.
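The differing results can be reproduced with a first-match sketch in Python. This is an illustrative model, not device code; rules are tried in their displayed order and the first matching rule decides:

```python
import ipaddress

def matches(packet_ip, rule_net, rule_wildcard):
    """An address matches a rule when all bits outside the wildcard are equal."""
    ip = int(ipaddress.IPv4Address(packet_ip))
    net = int(ipaddress.IPv4Address(rule_net))
    wc = int(ipaddress.IPv4Address(rule_wildcard))
    return (ip & ~wc) == (net & ~wc)

def first_match(dest_ip, rules):
    """Return the action of the first rule the destination address matches."""
    for action, net, wildcard in rules:
        if matches(dest_ip, net, wildcard):
            return action
    return None  # mismatched

# Rule order as displayed in config mode vs. auto mode for ACL 3000 above.
config_mode = [("deny", "1.1.0.0", "0.0.255.255"),
               ("permit", "1.1.1.0", "0.0.0.255")]
auto_mode = [("permit", "1.1.1.0", "0.0.0.255"),
             ("deny", "1.1.0.0", "0.0.255.255")]
print(first_match("1.1.1.1", config_mode))  # deny -> packet dropped
print(first_match("1.1.1.1", auto_mode))    # permit -> packet passes
```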
1.9.3.3 Applications
1.9.3.3.1 ACL Applied to Telnet (VTY), SNMP, FTP, and TFTP
Filtering Principle
When an ACL is applied to Telnet, SNMP, FTP, or TFTP:
If the source IP address of a user matches a permit rule, the user is allowed to log in.
If the source IP address of a user matches a deny rule, the user is prohibited from
logging in.
If the source IP address of a user does not match any rule in the ACL, the user is
prohibited from logging in.
If there is no rule in the ACL, or the ACL does not exist, all users are allowed to log in.
The default behavior is deny if the source IP address of the user does not match any rule in the ACL
applied to FTP.
When an ACL is applied to SNMP and the device receives a packet whose community name field is
null, the device directly discards the packet without filtering it against the ACL rules, and generates a
log about the community name error. ACL filtering is triggered only when the community name is not
null.
If the NMS server belongs to a VPN, the VPN instance must be specified in the ACL rule.
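The login-filtering principle above can be sketched in Python. This is a simplified model with exact-address matching; the function name and the rule representation are illustrative:

```python
def login_allowed(src_ip, acl):
    """Decide whether a user may log in, per the Telnet/SNMP/FTP/TFTP
    filtering principle:
      - matches a permit rule -> allowed
      - matches a deny rule   -> prohibited
      - matches no rule       -> prohibited
      - ACL absent or empty   -> allowed
    `acl` is a list of (action, ip) pairs, or None if the ACL does not exist.
    """
    if not acl:                      # ACL missing, or contains no rules
        return True
    for action, ip in acl:
        if ip == src_ip:             # simplified exact-address match
            return action == "permit"
    return False                     # mismatched -> prohibited

acl = [("permit", "10.1.1.1"), ("deny", "10.1.1.2")]
print(login_allowed("10.1.1.1", acl))  # True  (permit rule)
print(login_allowed("10.1.1.2", acl))  # False (deny rule)
print(login_allowed("10.1.1.3", acl))  # False (mismatched)
print(login_allowed("10.1.1.3", None)) # True  (ACL does not exist)
```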
Figure 1-611 Relationships between an interface, traffic policy, traffic behavior, traffic classifier,
and ACL
By default, the precedences of classifiers A, B, and C are 1, 2, and 3, which is the same as the
configuration order. To move classifier A to the end, run the following command:
classifier A behavior A precedence 4
Precedence 1 is then unused, so you can add a classifier (named D) before classifier B with the
following command:
classifier D behavior D precedence 1
You can also add classifier D without specifying a precedence by using the following command:
classifier D behavior D
As shown in Figure 1-612, for each classifier, if the logic between if-match clauses is OR,
a packet is matched against the if-match clauses in the order in which they were configured.
Once the packet matches an if-match clause:
If no ACL is applied to the matched if-match clause, the related behavior is
executed.
If an ACL is applied to the matched if-match clause and the packet matches a
permit rule, the related behavior is executed.
If an ACL is applied to the matched if-match clause and the packet matches a
deny rule, the packet is discarded directly and the related behavior is not executed.
If the packet matches no if-match clause, the related behavior is not executed, and the
next classifier is processed for the packet.
If an ACL is applied to one of the if-match clauses, each rule of the ACL is combined with
all of the other if-match clauses.
Note: The rules of the ACL are not combined with each other. Therefore, the order of the
if-match clauses in AND logic does not affect the final matching result, but the order of the
rules in the ACL still affects the final result.
For example, in the following configuration,
#
acl 3000
rule 5 permit ip source 1.1.1.1 0
rule 10 deny ip source 2.2.2.2 0
#
traffic classifier example operator and
if-match acl 3000
if-match dscp af11
#
The device combines all the if-match clauses. The combination result is the same as the
following configuration.
#
acl 3000
rule 5 permit ip source 1.1.1.1 0 dscp af11
rule 10 deny ip source 2.2.2.2 0 dscp af11
#
traffic classifier example operator or
if-match acl 3000
#
traffic behavior example
remark dscp af22
#
traffic policy example
share-mode
classifier example behavior example
#
interface GigabitEthernet2/0/0
traffic-policy example inbound
#
Then, the device processes the combined if-match clauses according to the OR logic
procedure. The result is as follows: the DSCP of a packet is re-marked as AF22 if the packet is
received from GE2/0/0, its DSCP is AF11, and its source IP address is 1.1.1.1/32; a packet is
discarded if it is received from GE2/0/0, its DSCP is AF11, and its source IP address is
2.2.2.2/32; other packets are forwarded directly because they do not match any rule.
With the default license, AND logic permits only one if-match clause with an ACL applied, whereas
OR logic permits multiple if-match clauses with ACLs applied.
If the license is modified so that multiple if-match clauses with ACLs applied are permitted in AND
logic, the combination principle is as follows:
For traffic behavior mirroring or sampling, even if a packet matches a rule that defines a deny action, the
traffic behavior takes effect for the packet.
A permit or deny action can be specified in an ACL for a traffic classifier to work with
specific traffic behaviors as follows:
If the deny action is specified in an ACL, the packet that matches the ACL is denied,
regardless of what the traffic behavior defines.
If the permit action is specified in an ACL, the traffic behavior applies to the packet that
matches the ACL.
For example, the following configuration leads to this result: packets whose source IP
addresses match 50.0.0.0 0.255.255.255 have their IP precedence re-marked as 7; packets
whose source IP addresses match 60.0.0.0 0.255.255.255 are dropped; packets with other
source IP addresses, such as 70.0.0.1, are forwarded with the IP precedence unchanged.
acl 3999
rule 5 permit ip source 50.0.0.0 0.255.255.255
rule 10 deny ip source 60.0.0.0 0.255.255.255
traffic classifier acl
if-match acl 3999
traffic behavior test
remark ip-pre 7
traffic policy test
classifier acl behavior test
interface GigabitEthernet1/0/1
traffic-policy test inbound
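The interaction described above between the ACL action and the traffic behavior can be sketched in Python. This is an illustrative model; the `ignores_deny` flag stands in for behaviors such as mirroring or sampling, which take effect even for denied packets:

```python
def apply_policy(acl_action, behavior, ignores_deny=False):
    """Combine the ACL action with a traffic behavior.

    - deny in the ACL: the packet is dropped regardless of the behavior,
      except for behaviors such as mirroring or sampling.
    - permit in the ACL: the behavior applies to the packet.
    - None (mismatched): the behavior does not apply; forwarded as is.
    """
    if acl_action == "deny" and not ignores_deny:
        return "drop"
    if acl_action == "permit" or ignores_deny:
        return behavior
    return "forward unchanged"

print(apply_policy("permit", "remark ip-pre 7"))  # remark ip-pre 7
print(apply_policy("deny", "remark ip-pre 7"))    # drop
print(apply_policy(None, "remark ip-pre 7"))      # forward unchanged
print(apply_policy("deny", "mirror", ignores_deny=True))  # mirror
```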
Node permit, rule permit, route matches the rule: The route is considered to match the
if-match clause, and the device continues to process the rest of the if-match clauses in the
same node. If the route matches all if-match clauses, the apply clause is executed and the
device does not match the route against the rest of the nodes. If the route does not match
all if-match clauses, the apply clause is not executed, and the device continues to process
the rest of the nodes for the route. If there is no rest node, the route is "deny".
Node permit, rule permit, route does not match the rule: The route is considered not to
match the if-match clause, and the apply clause is not executed. The device continues to
process the rest of the nodes for the route. If there is no rest node, the mismatched route is
"deny".
Node permit, rule deny, route matches or does not match the rule: The node does not take
effect, and the device continues to process the rest of the nodes for the route. If there is no
rest node, the route is "deny".
Node deny, rule permit, route matches the rule: The route is "deny", and the apply clause
is not executed. The device does not continue to process the rest of the nodes for the route.
Node deny, rule permit, route does not match the rule: The route does not match the
if-match clause, and the apply clause is not executed. The device continues to process the
rest of the nodes for the route. If there is no rest node, the route is "deny".
Node deny, rule deny, route matches or does not match the rule: The node does not take
effect, and the device continues to process the rest of the nodes for the route. If there is no
rest node, the route is "deny".
The device continues to process the rest of the nodes if the route is denied by the ACL.
The device continues to process the rest of the nodes if the route does not match any rule in the ACL.
It is recommended that you configure deny rules with smaller numbers to filter out unwanted routes,
and then configure permit rules with larger numbers in the same ACL to receive or advertise the
other routes.
Alternatively, configure permit rules with smaller numbers to permit the routes to be received or
advertised by the device, and then configure deny rules with larger numbers in the same ACL to
filter out unwanted routes.
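The node/rule matching matrix above can be condensed into a small Python function. This is an illustrative model of the table; the return values name the outcome of evaluating a single node, and "deny if no rest node" applies whenever 'next-node' is returned and no further node exists:

```python
def node_result(node_mode, rule_action, route_matches_rule):
    """Outcome of evaluating one route-policy node whose if-match
    references an ACL, per the matching matrix:
      'apply'     - clause matched; the apply clause may be executed
      'next-node' - node does not take effect; try the rest of the nodes
      'deny-stop' - route denied; no further nodes are processed
    """
    if rule_action == "deny":
        return "next-node"            # a deny rule: the node has no effect
    if not route_matches_rule:
        return "next-node"            # mismatched: try the next node
    # permit rule and the route matches it:
    return "apply" if node_mode == "permit" else "deny-stop"

print(node_result("permit", "permit", True))   # apply
print(node_result("deny", "permit", True))     # deny-stop
print(node_result("permit", "deny", True))     # next-node
print(node_result("deny", "permit", False))    # next-node
```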
Example 2
In the following configuration, only the static route 20.1.0.0/24 can be imported to BGP, and
its local-preference is modified. The "destination 10.1.0.0 0.0.0.255" parameter does not take
effect.
acl name example number 42768
rule 5 permit ip source 20.1.0.0 0.0.0.255 destination 10.1.0.0 0.0.0.255
#
route-policy policy1 permit node 10
if-match acl example
apply local-preference 1300
#
bgp 100
import-route static route-policy policy1
#
Example 3
In the following configuration, routes to 10.1.0.0/24 cannot be advertised to BGP VPNv4
peer 1.1.1.1, regardless of which L3VPN the denied routes belong to. The "vpn-instance
vpnb" parameter does not take effect.
acl example number 2000
rule 5 deny ip source 10.1.0.0 0.0.0.255 vpn-instance vpnb
rule 10 permit
#
route-policy policy1 permit node 10
if-match acl example
#
bgp 100
peer 1.1.1.1 as-number 100
peer 1.1.1.1 connect-interface LoopBack1
#
ipv4-family vpnv4
policy vpn-target
peer 1.1.1.1 enable
peer 1.1.1.1 route-policy policy1 export
#
ACL rule. The route 10.1.1.0/16 is considered to mismatch the ACL rule because it is outside
the segment range of 10.1.1.0/24.
acl number 2000
rule 1 permit source 10.1.1.0 0.0.0.255
rule 99 deny any
10.1.1.0/24 matches the deny rule in node 10, so 10.1.1.0/24 is denied, the apply clause
in node 10 is not executed for 10.1.1.0/24, and the device continues to process node 20.
As a result, 10.1.1.0/24 is imported to BGP and its local-preference is not changed.
10.1.2.0/24 does not match any rule in node 10, so the apply clause in node 10 is not
executed, and the device continues to process node 20 for 10.1.2.0/24. As a result,
10.1.2.0/24 is imported to BGP.
As a result, both static routes are imported to BGP, and the local-preference of neither route
is modified.
Node Is Permit, Rule Is Permit.
Configuration example:
acl number 2000
rule 1 permit source 10.1.1.0 0.0.0.255
#
route-policy policy1 deny node 10
if-match acl 2000
apply local-preference 1300
#
route-policy policy1 permit node 20
#
bgp 100
import-route static route-policy policy1
#
If you don't want to advertise the routes to 10.1.1.0/24 and 10.1.2.0/24 on RTB, you can
configure the following commands.
[RTB] acl 2000
[RTB-acl2000] rule 5 deny source 10.1.1.0 0.0.0.255
[RTB-acl2000] rule 10 deny source 10.1.2.0 0.0.0.255
[RTB-acl2000] rule 15 permit source any
[RTB] ospf 100
[RTB-ospf-100] filter-policy acl 2000 export
Filter-policy affects only the routes advertised to or received from neighbors, not the routes
imported from one routing protocol to another. To import routes learned by other routing protocols,
run the import-route command in the OSPF view.
If an unsupported matching option is configured for filter-policy, the matching result of that
option is "permit".
Example 1
In the following configuration, all static routes are advertised to the BGP peer.
acl name example number 42768
rule 5 deny ip destination 10.1.0.0 0.0.0.255
#
bgp 100
ipv4-family unicast
filter-policy acl-name example export
#
Example 2
In the following configuration, only the static route 20.1.0.0/24 can be advertised to the BGP
peer. The "destination 10.1.0.0 0.0.0.255" parameter does not take effect.
acl name example number 42768
rule 5 permit ip source 20.1.0.0 0.0.0.255 destination 10.1.0.0 0.0.0.255
#
bgp 100
ipv4-family unicast
filter-policy acl-name example export
#
Example 3
In the following configuration, routes to 10.1.0.0/24 cannot be advertised to any BGP VPNv4
peer, regardless of which L3VPN the denied routes belong to. The "vpn-instance vpnb"
parameter does not take effect.
acl number 2000
rule 5 deny ip source 10.1.0.0 0.0.0.255 vpn-instance vpnb
rule 10 permit
#
route-policy policy1 permit node 10
if-match acl example
#
bgp 100
ipv4-family vpnv4
filter-policy 2000 export
#
Table 1-157 The default matching result of mismatched routes in multicast policy
An advanced ACL applied to a multicast policy supports only two or three parameters:
− Most multicast policies support only the source, destination, and time-range parameters.
− A few multicast policies support only the source and time-range parameters.
− Other multicast policies support only the destination and time-range parameters.
Only advanced ACLs can be applied to multicast policies as named ACLs. If the named ACL number
is out of the range, the ACL does not take effect.
Module: TCP/IP attack defense
Function: Directly discards TCP/IP attack packets. The TCP/IP attack defense function is
enabled by default.
The whitelist, blacklist, and user-defined flows use ACLs to define the characteristics of the flows.
Each CPU defend policy can be configured with one whitelist, one blacklist, and one or more
user-defined flows, as shown in the following figure.
cpu-defend policy 4
whitelist acl 2001
blacklist acl 2002
user-defined-flow 1 acl 2003
user-defined-flow 2 acl 2003
user-defined-flow 3 acl 2004
#
cpu-defend policy 5
whitelist acl 2005
By default, a packet sent to the CPU is matched in the order whitelist --> blacklist --> user-defined flow.
This order can be modified using commands.
1. Performs URPF, TCP/IP attack defense, and GTSM checks. Packets that pass the checks
proceed to the next step; packets that fail the checks are discarded.
2. Matches packets against the whitelist. Performs CAR and goes to step 5 for packets that
match a permit rule. Discards packets that match a deny rule. Proceeds to the next step
for mismatched packets.
3. Matches packets against the blacklist. Performs CAR and goes to step 5 for packets that
match a permit rule. Discards packets that match a deny rule. Proceeds to the next step
for mismatched packets.
4. Matches packets against the user-defined flows. Performs CAR and goes to step 5 for
packets that match a permit rule. Discards packets that match a deny rule. Proceeds to
the next step for mismatched packets.
5. Checks all packets based on application layer association. Sends only the packets
belonging to enabled protocols; packets belonging to disabled protocols are discarded.
In steps 2, 3, and 4, "mismatched" includes the following cases:
The packet matches no rule in the ACL.
The ACL does not exist.
The ACL exists but contains no rules.
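The match order in steps 2 through 4 can be sketched as follows. This is an illustrative Python model, not device code; `match()` returns 'permit', 'deny', or None for the mismatched cases listed above:

```python
def match(packet, acl):
    """Return 'permit'/'deny' from the first matching rule, or None when
    the ACL is missing, empty, or the packet matches no rule (the three
    "mismatched" cases)."""
    if not acl:
        return None
    for action, src in acl:
        if src == packet:            # simplified source-address match
            return action
    return None

def cpu_defend(packet, whitelist, blacklist, user_flows):
    """Default order: whitelist -> blacklist -> user-defined flows.
    permit -> CAR, then step 5; deny -> discard; mismatched -> next list;
    nothing matched -> straight to application-layer association."""
    for name, acl in (("whitelist", whitelist),
                      ("blacklist", blacklist),
                      ("user-defined-flow", user_flows)):
        action = match(packet, acl)
        if action == "permit":
            return "CAR (" + name + ")"
        if action == "deny":
            return "discard"
    return "application-layer association"

whitelist = [("permit", "10.1.1.1")]
blacklist = [("deny", "10.9.9.9")]
print(cpu_defend("10.1.1.1", whitelist, blacklist, None))  # CAR (whitelist)
print(cpu_defend("10.9.9.9", whitelist, blacklist, None))  # discard
```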
Directly drops the management packets received from the non-management interfaces.
The BFD echo packet is looped back through ICMP redirect at the remote end. In the IP
packet that encapsulates the BFD echo packet, the destination address and the source address
are the IP address of the outgoing interface of the local end. Therefore, in the ACL rule, both
the source addresses of the remote end and the local end must be permitted.
BFD passive echo supports only basic ACLs, not advanced ACLs.
If the ACL applied to an established BFD session is modified, or a new ACL is applied to an established
BFD session, the ACL takes effect only after the session is re-established or the session parameters
are modified.
Table 1-160 Matching Principle of the ACL Applied to BFD Passive Echo
Terms
Term Definition
Interface-based ACL A list of rules for packet filtering based on the inbound
interfaces of packets.
Basic ACL A list of rules for packet filtering based on the source IP
addresses of packets.
Advanced ACL A list of rules for packet filtering based on the source or
destination IP addresses of packets and protocol types. It filters
packets based on protocol information, such as TCP source and
destination port numbers and the ICMP type and code.
Layer 2 ACL A list of rules for packet filtering based on the Ethernet frame
header information, such as source or destination Media
Access Control (MAC) addresses, protocol types of Ethernet
frames, or 802.1p priorities.
User ACL A list of rules for packet filtering based on the
source/destination IP address, source/destination service group,
source/destination user group, source/destination port number,
and protocol type.
MPLS-based ACL A list of rules for packet filtering based on the EXP values,
Label values, or TTL values of MPLS packets.
1.9.4 DHCP
1.9.4.1 Introduction
Definition
The Dynamic Host Configuration Protocol (DHCP) dynamically assigns IP addresses to hosts
and centrally manages host configurations. DHCP uses the client/server model. A client
applies to the server for configuration parameters, such as an IP address, subnet mask, and
default gateway address; the server replies with the requested configuration parameters.
DHCPv4 and DHCPv6 are available for dynamic address allocation on IPv4 and IPv6
networks, respectively. Though DHCPv4 and DHCPv6 both use the client/server model, they
are built based on different principles and operate differently.
Purpose
A host can send packets to or receive packets from the Internet after it obtains an IP address,
as well as the router address, subnet mask, and DNS address.
The Bootstrap Protocol (BOOTP) was originally designed for diskless workstations to
discover their own IP addresses, the server address, the name of a file to be loaded into
memory, and the gateway IP address. BOOTP applies to a static scenario in which all hosts
are allocated permanent IP addresses.
However, as the increasing network scale and network complexity complicate network
configuration, the proliferation of portable computers and wireless networks brings about host
mobility, and the increasing number of hosts causes IP address exhaustion, BOOTP is no
longer applicable. To allow hosts to rapidly go online or offline, as well as to improve IP
address usage and support diskless workstations, an automatic address allocation mechanism
is needed based on the original BOOTP architecture.
DHCP was developed to implement automatic address allocation. DHCP extends BOOTP in
the following aspects:
Allows a host to exchange messages with a server to obtain all requested configuration
parameters.
Allows a host to rapidly and dynamically obtain an IP address.
Benefits
DHCP rapidly and dynamically allocates IP addresses, which improves IP address usage and
prevents the waste of IP addresses.
DHCPv4 Architecture
Figure 1-615 shows the DHCPv4 architecture.
DHCPv4 relay agents are not mandatory in the DHCPv4 architecture. A DHCPv4 relay agent is required
only when the server and client are located on different network segments.
DHCPv4 server
A DHCPv4 server processes address allocation, lease extension, and address release
requests originating from a DHCPv4 client or forwarded by a DHCPv4 relay agent and
assigns IP addresses and other configuration parameters to the client.
To protect a DHCP server against network attacks, such as man-in-the-middle attacks, starvation attacks,
and DoS attacks by changing the CHADDR value, configure DHCP snooping on the intermediate device
directly connecting to a DHCP client to provide DHCP security services.
yiaddr (4 bytes): Client IP address assigned by the DHCP server. The DHCP server fills in
this field in a DHCP Reply message.
siaddr (4 bytes): Server IP address from which a DHCP client obtains the startup
configuration file.
giaddr (4 bytes): Gateway IP address, which is the IP address of the first DHCP relay agent.
If the DHCP server and client are located on different network segments, the first DHCP
relay agent fills its own IP address into this field of the DHCP Request message sent by the
client. The relay agent forwards the message to the DHCP server, which uses this field to
determine the network segment where the client resides. The DHCP server then assigns an
IP address on this network segment from an address pool.
The DHCP server also returns a DHCP Reply message to the first DHCP relay agent. The
DHCP relay agent then forwards the DHCP Reply message to the client.
NOTE
If the DHCP Request message passes through multiple DHCP relay agents before reaching the DHCP
server, the value of this field remains the IP address of the first DHCP relay agent. However, the
value of the Hops field increases by 1 each time the message passes through a relay agent.
chaddr (16 bytes): Client hardware address. This field must be consistent with the hardware
type and hardware length fields. When sending a DHCP Request message, the client fills its
hardware address into this field. For Ethernet, a 6-byte Ethernet MAC address must be filled
in this field, with the hardware type and hardware length fields set to 1 and 6, respectively.
sname (64 bytes): Server host name. This field is optional and contains the name of the
server from which a client obtains configuration parameters. The field is filled in by the
DHCP server and must contain a character string that ends with 0.
file (128 bytes): Boot file name specified by the DHCP server for a DHCP client. This field
is optional and is delivered to the client when the IP address is assigned to the client. The
field is filled in by the DHCP server and must contain a character string that ends with 0.
options (variable): Optional parameters field. The length of this field must be at least
312 bytes. This field contains the DHCP message type and configuration parameters
assigned by a server to a client, including the gateway IP address, DNS server IP address,
and IP address lease.
DHCPv4 Options
In the DHCPv4 options field, the first four bytes are decimal numbers 99, 130, 83 and 99,
respectively. This is the same as the magic cookie defined in standard protocols. The
remaining bytes identify several options as defined in standard protocols. One particular
option, the DHCP Message Type option (Option 53), must be included in every DHCP
message. Option 53 defines DHCP message types, including the DHCPDISCOVER,
DHCPOFFER, DHCPREQUEST, DHCPACK, DHCPNAK, DHCPDECLINE,
DHCPRELEASE, and DHCPINFORM messages.
DHCPv4 message types
Table 1-162 lists the DHCPv4 message types.
Type Description
DHCPDISCOVER: A DHCP Discover message is broadcast by a DHCP client to locate a
DHCP server when the client attempts to access a network for the first time.
DHCPOFFER: A DHCP Offer message is sent by a DHCP server in response to a DHCP
Discover message. A DHCP Offer message carries various configuration parameters.
DHCPREQUEST: A DHCP Request message is sent in the following conditions:
After a DHCP client is initialized, it broadcasts a DHCP Request message in response to
the DHCP Offer message sent by a DHCP server.
After a DHCP client restarts, it broadcasts a DHCP Request message to confirm the
configuration, including the assigned IP address.
After a DHCP client obtains an IP address, it unicasts or broadcasts a DHCP Request
message to update the IP address lease.
DHCPACK: A DHCP ACK message is sent by a DHCP server to acknowledge the DHCP
Request message from a DHCP client. After receiving a DHCP ACK message, the DHCP
client obtains the configuration parameters, including the IP address.
DHCPNAK: A DHCP NAK message is sent by a DHCP server to reject the DHCP Request
message from a DHCP client. For example, if a DHCP server cannot find matching lease
records after receiving a DHCP Request message, it sends a DHCP NAK message
indicating that no IP address is available for the DHCP client.
DHCPDECLINE: A DHCP Decline message is sent by a DHCP client to notify the DHCP
server that the assigned IP address conflicts with another IP address. The DHCP client then
applies to the DHCP server for another IP address.
DHCPRELEASE: A DHCP Release message is sent by a DHCP client to release its IP
address. After receiving a DHCP Release message, the DHCP server can assign this IP
address to another DHCP client.
DHCPINFORM: A DHCP Inform message is sent by a DHCP client to obtain other network
configuration parameters, such as the gateway address and DNS server address, after the
DHCP client has obtained an IP address.
DHCPv4 options
The options field in a DHCP message carries control information and parameters that are
not defined in common protocols. When a DHCP client requests an IP address from a
DHCP server that has been configured to encapsulate the options field, the server returns
a DHCP Reply packet containing the options field. Figure 1-617 shows the options field
format.
The options field consists of the sub-fields Type, Length, and Value. Table 1-163 describes
these sub-fields.
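The Type/Length/Value layout can be illustrated with a short Python parser. This is a sketch, not the device implementation; it also checks the magic cookie 99.130.83.99 described above and stops at the End option (255):

```python
def parse_options(data):
    """Parse a DHCPv4 options field: a 4-byte magic cookie followed by
    Type(1)/Length(1)/Value(Length) entries; 0 is Pad, 255 is End."""
    if data[:4] != bytes([99, 130, 83, 99]):
        raise ValueError("bad magic cookie")
    options, i = {}, 4
    while i < len(data):
        opt = data[i]
        if opt == 0:          # Pad option: a single byte, no length field
            i += 1
            continue
        if opt == 255:        # End option: stop parsing
            break
        length = data[i + 1]
        options[opt] = data[i + 2:i + 2 + length]
        i += 2 + length
    return options

# Option 53 (DHCP message type) = 1 (DHCPDISCOVER), then End.
raw = bytes([99, 130, 83, 99, 53, 1, 1, 255])
print(parse_options(raw))   # {53: b'\x01'}
```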
The type value of the options field ranges from 1 to 255. Table 1-164 lists common
DHCPv4 options.
Options ID Description
1 Subnet mask
3 Gateway address
6 DNS address
15 Domain name
33 Group of classful static routes
After a DHCP client receives DHCP messages with this
option, it adds the classful static routes contained in the
option to its routing table. In classful routes, masks of
destination addresses are natural masks and cannot be used to
divide subnets. If Option 121 exists, Option 33 is ignored.
44 NetBIOS name
46 NetBIOS object type
50 Requested IP address
51 IP address lease
52 Additional option
53 DHCP message type
54 Server identifier
55 Parameter request list
The DHCP client uses this option to request specified
configuration parameters
58 Lease renewal time (Time1), which is 50% of the lease time
59 Lease rebinding time (Time2), which is 87.5% of the lease time
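The relationship between the lease and the two timers can be checked with a few lines of Python (illustrative; the function name is not a device term):

```python
def renewal_timers(lease_seconds):
    """Return (Time1, Time2): Time1 (Option 58) = 50% of the lease,
    Time2 (Option 59) = 87.5% of the lease."""
    return lease_seconds * 50 // 100, lease_seconds * 875 // 1000

# A one-day lease (86400 s): renew at 43200 s, rebind at 75600 s.
print(renewal_timers(86400))   # (43200, 75600)
```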
The Option 82 field is called the DHCP relay agent information field. It records the
location of a DHCP client. A DHCP relay agent or a DHCP snooping-enabled
device appends the Option 82 field to a DHCP Request message sent from a DHCP
client and forwards the message to a DHCP server.
Servers use the Option 82 field to learn the location of DHCP clients, implement
client security and accounting, and make parameter assignment policies, allowing
for more flexible address allocation.
The Option 82 field contains a maximum of 255 sub-options. If the Option 82 field
is defined, at least one sub-option must be defined. Currently, the device supports
only two sub-options: sub-option 1 (circuit ID) and sub-option 2 (remote ID).
The content of the Option 82 field is not uniformly defined, and vendors fill in the
Option 82 field as needed.
The device supports the following Option 82 field formats:
Type1: This is the Telecom format of Option 82.
Type2: This is the NMS format of Option 82.
Cn-telecom: This is the Option 82 format defined by China Telecom.
Self-define: This is the user-defined format of DHCP Option 82.
Three Modes for the Interaction Between the DHCP Client and Server
To obtain a valid dynamic IP address, a DHCP client exchanges different information with a
server at different stages. Generally, the DHCP client and server interact in the following
modes (defined in standard protocols):
A DHCP client accesses a network for the first time.
When a DHCP client accesses a network for the first time, the DHCP client undergoes
the following stages to set up a connection with a DHCP server:
− Discovering stage: At this stage, the DHCP client searches for a DHCP server. The
client broadcasts a DHCP Discover message and only DHCP servers respond to the
message.
− Offering stage: At this stage, each DHCP server offers an IP address to the DHCP
client. After receiving the DHCP Discover message from the client, each DHCP
server selects an unassigned IP address from the IP address pool and sends a DHCP
Offer message with the leased IP address and other settings to the client.
− Selecting stage: At this stage, the DHCP client selects an IP address. If multiple
DHCP servers send DHCP Offer messages to the client, the client accepts the first
received DHCP Offer message and broadcasts a DHCP Request message carrying
the selected IP address.
− Acknowledging stage: At this stage, the DHCP server confirms the IP address that
is offered. After receiving the DHCP Request message, the DHCP server sends a
DHCP ACK message to the client. The DHCP ACK message contains the offered IP
address and other settings. The DHCP client then binds its TCP/IP protocol suite to
the network interface card.
The IP addresses offered by the DHCP servers that the client did not select remain
available to other clients.
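The four stages above can be sketched as a toy exchange (illustrative Python only; class and message names are hypothetical, and real DHCP uses UDP broadcasts and the packet formats defined in RFC 2131):

```python
# Minimal sketch of the DHCP Discover/Offer/Request/Ack exchange.
# All names are illustrative; dicts and tuples stand in for real packets.

class DhcpServer:
    def __init__(self, pool):
        self.pool = list(pool)   # unassigned IP addresses
        self.offered = {}        # client id -> offered address

    def on_discover(self, client_id):
        """Offering stage: select an unassigned address and offer it."""
        addr = self.pool.pop(0)
        self.offered[client_id] = addr
        return ("OFFER", addr)

    def on_request(self, client_id, addr):
        """Acknowledging stage: confirm the address the client selected."""
        if self.offered.get(client_id) == addr:
            return ("ACK", addr)
        # The client selected another server's offer; our offer is
        # returned to the pool and stays available to other clients.
        self.pool.insert(0, self.offered.pop(client_id))
        return ("NONE", None)

def first_time_access(client_id, servers):
    """Discovering/Selecting stages: broadcast, accept the first offer."""
    offers = [(s, s.on_discover(client_id)[1]) for s in servers]
    chosen_server, chosen_addr = offers[0]          # first Offer wins
    replies = [s.on_request(client_id, chosen_addr) for s, _ in offers]
    acked = [a for t, a in replies if t == "ACK"]
    return acked[0] if acked else None

s1 = DhcpServer(["10.0.0.10", "10.0.0.11"])
s2 = DhcpServer(["10.0.1.10"])
addr = first_time_access("client-1", [s1, s2])
```

Running the sketch, the client keeps the first offered address, and the unselected server's address goes back to its pool.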
A DHCP client accesses a network for the second time.
When a DHCP client accesses a network for the second time, the DHCP client undergoes
the following stages to set up a connection with the DHCP server:
− If the client has previously accessed the network correctly, it does not broadcast a
DHCP Discover message. Instead, it broadcasts a DHCP Request message that
carries the previously-assigned IP address.
− After receiving the DHCP Request message, the DHCP server responds with a
DHCP ACK message if the IP address is not assigned, notifying the client that it can
continue to use the original IP address.
− If the IP address cannot be assigned to the client (for example, it has been assigned
to another client), the DHCP server responds with a DHCP NAK message to the
client. After receiving the DHCP NAK message, the client sends a DHCP Discover
message to apply for an IP address.
A DHCP client extends the IP address lease.
The IP address dynamically assigned to a client has a validity period. The server
withdraws the IP address after the validity period expires. If the client intends to continue
to use this IP address, it must extend the IP address lease.
In real-world implementations, the DHCP client sends a DHCP Request message to the
server automatically to update the IP address lease when the DHCP client is started or
half of the lease has passed. If the IP address is valid, the server replies with a DHCP
ACK message to inform the client of the new IP address lease.
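The renewal points described above (and shown in Figure 1-620 and Figure 1-621) can be computed directly; the 50% and 87.5% percentages follow the RFC 2131 defaults for the renewal (T1) and rebinding (T2) timers:

```python
# Sketch of the DHCP lease timers: the client unicasts a renewal Request
# to the original server at 50% of the lease (T1) and broadcasts a
# rebinding Request at 87.5% (T2). Values are in seconds.

def lease_timers(lease_seconds):
    t1_renewal = int(lease_seconds * 0.5)      # renew with original server
    t2_rebinding = int(lease_seconds * 0.875)  # broadcast to any server
    return t1_renewal, t2_rebinding

t1, t2 = lease_timers(86400)  # a one-day lease
```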
IP Address Reservation
DHCP supports IP address reservation for clients. The reserved IP addresses must belong to
the address pool. If an address in the address pool is reserved, it is no longer assignable.
Addresses are usually reserved for specific clients, such as DNS and WWW servers.
DHCP Client Requesting an IP Address Through a DHCP Relay Agent for the
First Time
Figure 1-619 shows the process of a DHCP client requesting an IP address through a DHCP
relay agent for the first time.
Figure 1-619 DHCP client requesting an IP address through a DHCP relay agent for the first time
1. When a DHCP client starts and initializes DHCP, it broadcasts the configuration request
packets (DHCP Discover messages) onto a local network.
After a DHCP relay agent connecting to the local network receives the broadcast packets,
it processes and forwards the packets to the specified DHCP server on another network.
2. After receiving the packets, the DHCP server sends the requested configuration
parameters in DHCP Offer messages to the DHCP client through the DHCP relay agent.
3. The DHCP client replies to the DHCP Offer message by broadcasting DHCP Request
messages.
Upon receipt, the DHCP relay agent sends the DHCP Request messages in unicast mode
to the DHCP server.
4. The DHCP server responds with a unicast DHCP ACK or DHCP NAK message through
the DHCP relay agent.
DHCP Client Extending the IP Address Lease Through the DHCP Relay Agent
An IP address dynamically assigned to a DHCP client usually has a validity period. The
DHCP server withdraws the IP address after the validity period expires. To continue using the
IP address, the DHCP client must renew the IP address lease.
The DHCP client enters the binding state after obtaining an IP address. The DHCP client has
three timers to control lease renewal, rebinding, and lease expiration. When assigning an IP
address to the DHCP client, the DHCP server can specify timer values. If the DHCP server
does not specify timer values, the default values are used. Table 1-165 describes the three
timers.
Figure 1-620 DHCP client extending the IP address lease by 50% through the DHCP relay agent
Figure 1-621 DHCP client extending the IP address lease by 87.5% through the DHCP relay agent
routes. If a DHCP server and a DHCP client reside on different VPNs, the DHCP relay agent
can transmit a DHCP Request message to the VPN where the DHCP server resides and
transmit a DHCP Reply message to the VPN where the DHCP client resides.
A DHCP relay agent can be deployed in CE1-PE1-PE2-CE2 networking, where the DHCP
server connects to one CE and the DHCP client connects to the other CE. Both CE1 and CE2
can belong to the same VPN or different VPNs.
DHCP Relay Agent Sending DHCP Release Messages to the DHCP Server
A DHCP relay agent can send a DHCP Release message, carrying an IP address to be released,
to the DHCP server.
When a DHCP client cannot send requests to the DHCP server to release its IP address, you
can configure the DHCP relay agent to release the IP address assigned by the DHCP server to
the client.
DHCP Relay Agent Setting the Priority of a DHCP Reply Message and TTL
Value of a DHCP Relay Message
A DHCP relay agent can set the priority of DHCP Reply messages. The priority of
low-priority DHCP Reply messages can be raised so that they will not be discarded on
access devices.
A DHCP relay agent can set the TTL value of DHCP Relay messages. The TTL value of
DHCP Relay messages can be increased to prevent the messages from being discarded
due to TTL becoming 0.
addresses, such as the DNS server address, NIS server address, and SNTP server
address.
− DHCPv6 Prefix Delegation (PD). IPv6 prefixes do not need to be manually
configured for the downstream routers. The DHCPv6 prefix delegation mechanism
allows a downstream router to send DHCPv6 messages carrying the IA_PD option
to an upstream router to apply for IPv6 prefixes. After the upstream router assigns a
prefix that has less than 64 bits to the downstream router, the downstream router
automatically subnets the delegated prefix into /64 prefixes and assigns the /64
prefixes to the links attached to IPv6 hosts through RA messages. This mechanism
implements automatic configuration of IPv6 addresses for IPv6 hosts and
hierarchical IPv6 prefix delegation.
DHCPv6 Architecture
Figure 1-622 shows the DHCPv6 architecture.
DHCPv6 server: processes address allocation, lease extension, and address release
requests originating from a DHCPv6 client or forwarded by a DHCPv6 relay agent and
assigns IPv6 addresses/prefixes and other configuration parameters to the client.
DHCPv6 messages share an identical fixed format header and a variable format area for
options.
Introduction
DHCPv6 Options
Introduction
DHCPv6 message types
Unlike DHCPv4 messages for which the message type is specified in the Message Type
option, DHCPv6 messages use the msg-type field in the header to identify the message
type. Table 1-166 lists the DHCPv6 message types.
msg-type: 1 byte. DHCP message type. The value ranges from 1 to 11. The available
message types are listed in Table 1-166.
transaction-id: 3 bytes. Transaction ID for this message exchange, indicating one
exchange of DHCPv6 messages.
options: Variable length. Options carried in this message.
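The fixed header layout above (a 1-byte msg-type followed by a 3-byte transaction-id, then the variable options area) can be parsed with a few lines of illustrative Python:

```python
# Parse the fixed DHCPv6 client/server message header: 1-byte msg-type,
# 3-byte transaction-id, then the variable-length options area.

def parse_dhcpv6_header(packet: bytes):
    msg_type = packet[0]
    transaction_id = int.from_bytes(packet[1:4], "big")
    options = packet[4:]
    return msg_type, transaction_id, options

# msg-type 1 (Solicit), transaction-id 0x0A0B0C, no options
msg_type, tid, opts = parse_dhcpv6_header(bytes([1, 0x0A, 0x0B, 0x0C]))
```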
Only Relay-forward and Relay-reply messages are exchanged between DHCPv6 relay
agents and servers. Figure 1-625 lists the fields of a DHCPv6 relay agent/server message.
DHCPv6 Options
DHCPv6 options format
Figure 1-625 shows the DHCPv6 options format.
implement client security and accounting, and make parameter assignment policies,
allowing for more flexible address allocation.
Table 1-170 lists the DHCPv6 relay options.
Overview
DHCPv6 relay agents relay DHCPv6 messages between DHCPv6 clients and servers that
reside on different network segments to facilitate dynamic address allocation. This function
enables a single DHCPv6 server to serve DHCPv6 clients on different network segments,
which reduces costs and facilitates centralized management.
A DHCPv6 relay agent relays both messages from clients and Relay-forward messages
from other relay agents. When a relay agent receives a valid message to be relayed, it
constructs a new Relay-forward message. The relay agent copies the received DHCP
message (excluding IP or UDP headers) into the Relay Message option in the new
message. If other options are configured on the relay agent, it also adds them to the
Relay-forward message. Table 1-171 lists the fields that a DHCPv6 relay agent can
encapsulate into a Relay-forward message.
Table 1-171 Fields that a DHCPv6 relay agent can encapsulate into a Relay-forward message
A DHCPv6 relay agent relays a Relay-reply message from a server. The relay agent
extracts the Relay Message option from a Relay-reply message and relays it to the
address contained in the peer-address field of the Relay-reply message. Table 1-172 lists
the fields that a DHCPv6 relay agent can encapsulate into a Relay-reply message.
Table 1-172 Fields that a DHCPv6 relay agent can encapsulate into a Relay-reply message
If a server does not have an address it can use to send a Reconfigure message directly to a client, the
server encapsulates the Reconfigure message into the Relay Message option of a Relay-Reply message
to be relayed by the relay agent to the client.
The Relay-Reply message must be relayed through the same relay agents as the original client message.
The server must be able to obtain the addresses of the client and all relay agents on the return path so it
can construct the appropriate Relay-reply message carrying the response.
DHCPv6 Client Applying for an IP Address Through a DHCPv6 Relay Agent for
the First Time
Figure 1-626 illustrates how a DHCPv6 client applies to a DHCPv6 server for an IP address
through a DHCPv6 relay agent for the first time.
Figure 1-626 DHCPv6 client applying to a DHCPv6 server for an IP address through a DHCPv6
relay agent for the first time
1. The DHCPv6 client sends a Solicit message to discover servers. The DHCPv6 relay
agent that receives the Solicit message constructs a Relay-forward message with the
Solicit message in the Relay Message option and sends the Relay-forward message to the
DHCPv6 server.
2. After the DHCPv6 server receives the Relay-forward message, it parses the Solicit
message and constructs a Relay-reply message with the Advertise message in the Relay
Message option. The DHCPv6 server then sends the Relay-reply message to the
DHCPv6 relay agent. The DHCPv6 relay agent parses the Relay Message option in the
Relay-reply message and sends the Advertise message to the DHCPv6 client.
3. The DHCPv6 client then sends a Request message to request IP addresses and other
configuration parameters. The DHCPv6 relay agent constructs a Relay-forward message
with the Request message in the Relay Message option and sends the Relay-forward
message to the DHCPv6 server.
4. After the DHCPv6 server receives the Relay-forward message, it parses the Request
message and constructs a Relay-reply message with the Reply message in the Relay
Message option. The Reply message contains the assigned IPv6 address and other
configuration parameters. The DHCPv6 server then sends the Relay-reply message to the
DHCPv6 relay agent. The DHCPv6 relay agent parses the Relay Message option in the
Relay-reply message and sends the Reply message to the DHCPv6 client.
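The Relay-forward/Relay-reply encapsulation used in steps 1 to 4 can be sketched as follows (illustrative Python; dicts stand in for real DHCPv6 packets, and all field names except the option code are hypothetical):

```python
# Sketch of DHCPv6 relay encapsulation: the relay agent wraps the client
# message in the Relay Message option of a Relay-forward message, and the
# server answers with a Relay-reply whose Relay Message option carries
# the response to be delivered to the client.

OPTION_RELAY_MSG = 9  # Relay Message option code (RFC 8415)

def relay_forward(client_msg, link_address, peer_address):
    """Build a Relay-forward message carrying the client message."""
    return {
        "msg_type": "RELAY-FORW",
        "link-address": link_address,  # identifies the client's link
        "peer-address": peer_address,  # address the client sent from
        "options": {OPTION_RELAY_MSG: client_msg},
    }

def relay_deliver(relay_reply):
    """Extract the Relay Message option and the address to send it to."""
    inner = relay_reply["options"][OPTION_RELAY_MSG]
    return relay_reply["peer-address"], inner

solicit = {"msg_type": "SOLICIT", "transaction_id": 0x123456}
fwd = relay_forward(solicit, "2001:db8:1::1", "fe80::abcd")

reply = {"msg_type": "RELAY-REPL", "peer-address": "fe80::abcd",
         "options": {OPTION_RELAY_MSG: {"msg_type": "ADVERTISE"}}}
dest, advertise = relay_deliver(reply)
```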
On the network shown in Figure 1-627, IPv6 prefixes do not need to be manually configured
for the CPEs. The DHCPv6 prefix delegation mechanism allows a CPE to apply for IPv6
prefixes by sending DHCPv6 messages carrying the IA_PD option to the DHCPv6 server.
After the DHCPv6 server assigns a prefix that has less than 64 bits to the CPE, the CPE
automatically subnets the delegated prefix into /64 prefixes and assigns the /64 prefixes to the
user network through RA messages. This mechanism implements automatic configuration of
IPv6 addresses for IPv6 hosts and hierarchical IPv6 prefix delegation.
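The subnetting step described above (a delegated prefix shorter than /64 split into /64 prefixes for the attached links) can be sketched with the Python standard library; the /56 prefix below is an example value:

```python
# Sketch of DHCPv6-PD subnetting: the CPE splits the delegated prefix
# into /64 prefixes and advertises them on attached links via RA messages.
import ipaddress

def subnet_delegated_prefix(prefix, count):
    """Return the first `count` /64 subnets of a delegated prefix."""
    delegated = ipaddress.ip_network(prefix)
    assert delegated.prefixlen < 64, "prefix must be shorter than /64"
    subnets = delegated.subnets(new_prefix=64)
    return [str(next(subnets)) for _ in range(count)]

links = subnet_delegated_prefix("2001:db8:abcd::/56", 3)
```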
If a DHCPv6 relay agent is deployed to forward DHCPv6 messages between CPEs (DHCPv6
clients) and the DHCPv6 server, the DHCPv6 relay agent must set up routes to the network
segments on which the clients reside and advertises these network segments after the
DHCPv6 server assigns PD prefixes to the clients. Otherwise, core network devices cannot
learn the routes destined for the CPEs, and IPv6 hosts cannot access the network. If a client
sends a Release message to the server to return a delegated prefix, or the lease of a delegated
prefix is not extended after expiration, the DHCPv6 relay agent deletes the network segment
of the client.
1.9.4.4 Applications
1.9.4.4.1 DHCPv4 Server Application
Service Overview
A DHCP server is used to assign IP addresses in the following scenarios:
Manual configuration takes a long time and complicates centralized management
on a large network.
Hosts on the network outnumber the available IP addresses. Therefore, not every host
can have a fixed IP address assigned. For example, if service providers (SPs) limit the
number of concurrent network access users, many hosts must dynamically obtain IP
addresses from the DHCP server.
Only a few hosts on the network require fixed IP addresses.
Networking Description
On a typical DHCP network, a DHCP server and multiple DHCP clients exist, such as PCs
and portable computers. DHCP uses the client/server model. A client applies to the server for
configuration parameters, such as an IP address, subnet mask, and default gateway address;
the server replies with the requested configuration parameters. Figure 1-628 shows typical
DHCP networking.
If a DHCP client and a DHCP server reside on different network segments, the client can obtain an IP
address and other configuration parameters from the server through a DHCP relay agent. For details
about DHCP relay, see 1.9.4.2.4 DHCPv4 Relay.
DHCPv4 and DHCPv6 relay applications are the same. The DHCP relay application described in this
section covers both DHCPv4 and DHCPv6 relay. However, DHCPv4 and DHCPv6 relay cannot be used
in the current version at the same time.
1.9.5 DNS
1.9.5.1 Introduction
Definition
Domain Name System (DNS) is a distributed database for TCP/IP applications that provides
conversion between domain names and IP addresses.
Purpose
DNS uses a hierarchical naming method to specify a meaningful name for each device on the
network and uses a resolver to establish mappings between IP addresses and domain names.
DNS allows users to use meaningful and easy-to-memorize domain names instead of IP
addresses to identify devices.
Benefits
When you check the continuity of a service, you can directly enter the domain name used to
access the service instead of the IP address. Even if the IP address used to access the service
has changed, you can still check continuity using the domain name, so long as the DNS server
has obtained the new IP address.
1.9.5.2 Principles
There are two complementary DNS methods: static and dynamic DNS. In domain name
resolution, static DNS is used first. If this method fails, dynamic DNS is used.
Related Concepts
Static DNS is implemented based on the static domain name resolution table. The mappings
between domain names and IP addresses recorded in the table are manually configured. You
can add common domain names to the table to improve resolution efficiency.
Implementation
A DNS client establishes the static domain name resolution table based on configured static
DNS data. The DNS client can then automatically convert entered domain names to IP
addresses, if the entered domain names can be found in the static domain name resolution
table. Statically configured DNS data does not age.
Usage Scenario
If no DNS server exists on a network or the required DNS entries are not stored on the DNS
server, use static DNS to resolve domain names.
Benefits
If there are not many hosts accessed by Telnet applications and the hosts do not change
frequently, using static DNS improves resolution efficiency.
Related Concepts
Dynamic DNS allows client programs, such as ping and tracert, to use the resolver of a DNS
client to access a DNS server.
Resolver: a component that provides the mapping between domain names and IP addresses
and handles user requests for domain name resolution.
Recursive resolution: If a DNS server cannot find the IP address corresponding to a
domain name, the DNS server turns to other DNS servers for help and sends the resolved
IP address to the DNS client.
Query type
− Class-A query: a query used to request the IPv4 address corresponding to a domain
name. This type of query is most commonly used in DNS resolution.
− Class-AAAA query: a query used to request the IPv6 address corresponding to a
domain name.
− PTR query: a query used to request the domain name corresponding to an IPv4
address.
Implementation
Dynamic DNS is implemented using the DNS server.
Figure 1-630 shows the relationship between the client program, resolver, DNS server, and
cache.
The DNS client is composed of the resolver and cache and is responsible for accepting and
responding to DNS queries from client programs. Generally, the client program, cache, and
resolver are on the same device, whereas the DNS server is on another device.
The implementation process is as follows:
1. A client program sends a request to the DNS client.
2. After receiving the request, the DNS client searches the local database or the cache. If
the required DNS entry is not found, the DNS client sends a query packet to the DNS
server. Currently, devices support the Class-A query, Class-AAAA query and PTR
query.
3. The DNS server searches its local database for the IP address corresponding to the
domain name carried in the query packet. If the corresponding IP address cannot be
found, the DNS server forwards the query packet to the upper-level DNS server for help.
The upper-level DNS server resolves the domain name in recursive resolution mode, as
specified in the query packet, and returns the resolution result to the DNS server. The
DNS server then sends the result to the DNS client.
4. After receiving the response packet from the DNS server, the DNS client sends the
resolution result to the client program.
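The resolution order described above (static table first, then the local cache, then a query to the DNS server) can be sketched as follows (illustrative Python; a plain dict stands in for a real recursive DNS server, and all addresses are example values):

```python
# Sketch of the DNS client resolution order: static DNS first, then the
# cache, then a query to the DNS server; server responses are cached.

STATIC_TABLE = {"router.example.com": "192.0.2.1"}  # manually configured

class DnsClient:
    def __init__(self, server):
        self.cache = {}
        self.server = server  # simulated DNS server database

    def resolve(self, name):
        if name in STATIC_TABLE:          # 1. static DNS first
            return STATIC_TABLE[name]
        if name in self.cache:            # 2. then the local cache
            return self.cache[name]
        addr = self.server.get(name)      # 3. then query the server
        if addr is not None:
            self.cache[name] = addr       # cache the response
        return addr

server_db = {"www.example.com": "198.51.100.7"}
client = DnsClient(server_db)
a1 = client.resolve("router.example.com")  # static hit
a2 = client.resolve("www.example.com")     # server query, then cached
```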
Dynamic DNS allows you to define a domain name suffix list by pre-configuring domain name
suffixes. After you enter a partial domain name, the device automatically combines it with each
suffix to form complete domain names for resolution.
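The suffix-completion behavior can be sketched as follows (illustrative only; the exact order in which candidates are tried is implementation specific, and the suffixes are example values):

```python
# Sketch of domain name suffix completion: a partial name is combined
# with each configured suffix to form candidate fully qualified names.

def complete_names(partial, suffix_list):
    """Yield the candidate names tried for a partial domain name."""
    if "." in partial:          # looks complete; try it as-is first
        yield partial
    for suffix in suffix_list:
        yield f"{partial}.{suffix}"

names = list(complete_names("host1", ["example.com", "lab.example.com"]))
```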
Usage Scenario
Dynamic DNS is used in scenarios in which a large number of mappings between domain
names and IP addresses exist and these mappings change frequently.
Benefits
If a large number of mappings between domain names and IP addresses exist, manually
configuring DNS entries on each DNS server is laborious. To solve this problem, use dynamic
DNS instead. Dynamic DNS effectively improves configuration efficiency and facilitates
DNS management.
1.9.5.3 Applications
If you want to use domain names to visit other devices, configure DNS. DNS entries record
the mappings between domain names and IP addresses. In Figure 1-631, client programs and
the DNS client are on the same device.
If you seldom use domain names to visit other devices or no DNS server is available,
configure static DNS on the DNS client. To configure static DNS, you must know the
mapping between domain names and IP addresses. If a mapping changes, manually
modify the DNS entry on the DNS client.
If you want to use domain names to visit many devices and DNS servers are available,
configure dynamic DNS. Dynamic DNS requires DNS servers.
1.9.6 MTU
1.9.6.1 What is MTU
Maximum transmission unit (MTU) defines the largest size of packets that an interface can
send without the need to fragment them. IP packets larger than the MTU are fragmented before
they are sent out of an interface.
MTU is used to limit frame lengths on the link layer. In fact, devices of different vendors and
even different product models of the same vendor have different MTU definitions.
Take the Ethernet as an example.
In some devices, the MTU configured on an Ethernet interface indicates the largest size
of the IP datagram in the Ethernet frame; that is, the MTU is a Layer 3 definition, known
as the IP MTU.
In some devices, the MTU = Data payload + Destination MAC + Source MAC + Length;
that is, the MTU = IP MTU + 14 bytes.
In other devices, the MTU = Data payload + Destination MAC + Source MAC + Length
+ CRC; that is, the MTU = IP MTU + 18 bytes.
In NE20E, the MTU is a Layer 3 definition. As shown in Figure 1, the MTU indicates the
largest size of the IP header + IP payload. If the MTU configured on an Ethernet interface
is 1500 bytes, a packet is not fragmented if the total length of its IP header and IP
payload is not larger than 1500 bytes.
The interfaces of NE20E support an MTU between 46 and 9600 bytes. Each interface
supports a default MTU.
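Under this Layer 3 definition, the fragmentation arithmetic can be sketched as follows (illustrative Python; a 20-byte IPv4 header without options is assumed, and fragment payloads other than the last must be multiples of 8 bytes):

```python
# Sketch of IPv4 fragmentation arithmetic: the MTU bounds IP header +
# IP payload, each fragment carries its own header, and every fragment
# payload except the last is aligned to 8 bytes.

def fragment_sizes(total_len, mtu, header_len=20):
    """Return the sizes (header + payload) of the fragments sent."""
    if total_len <= mtu:
        return [total_len]                     # no fragmentation needed
    max_payload = (mtu - header_len) // 8 * 8  # 8-byte alignment
    payload = total_len - header_len
    sizes = []
    while payload > 0:
        chunk = min(payload, max_payload)
        sizes.append(header_len + chunk)
        payload -= chunk
    return sizes

sizes = fragment_sizes(3000, 1500)  # a 3000-byte packet on a 1500 MTU link
```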
NOTE
Generally, only the source and destination nodes need to analyze IPv6 extension headers. Therefore,
fragmentation occurs only on the source node, which is different from IPv4.
Force-fragment
By default, when an IPv4 packet is longer than the interface MTU:
If DF=0, the packet is fragmented.
If DF=1, the packet is not permitted to be fragmented; the device drops the packet and
returns a Packet-too-big message.
NE20E supports the force-fragment function. If force-fragment is enabled, the board ignores
the DF bit: all large IPv4 packets (size > MTU) are fragmented, and the fragments are
forwarded with DF=0.
To enable the force-fragment function, run the ipv4 force-fragment enable command.
The force-fragment function takes effect only for IPv4 packets, not for other types of packets.
By default, the force-fragment function is not enabled.
If the size (including the IP header and payload) of a non-MPLS packet sent from the control
plane is greater than the MTU value configured on the outbound interface:
If the DF field is set to 0 in the packet, the packet is fragmented. The size of each fragment
is less than or equal to the interface MTU.
If the DF field is set to 1 in the packet, the packet is discarded.
If the DF field is set to 1 in the packet and the outbound interface is enabled with
forcible fragmentation, the packet is fragmented, and each fragment is forwarded with DF=0.
(By default, forcible fragmentation is not enabled for the control plane. To enable
forcible fragmentation for the control plane, run the clear ip df command on the outbound
interface.)
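The DF-bit decision rules above can be summarized in a short decision function (illustrative Python; return values are labels, not real forwarding actions):

```python
# Sketch of the outbound DF-bit decision: fragment when DF=0, drop when
# DF=1, but fragment anyway (clearing DF) when forcible fragmentation is
# enabled on the outbound interface.

def handle_packet(size, mtu, df, force_fragment=False):
    if size <= mtu:
        return "forward"         # fits; no fragmentation needed
    if df == 0:
        return "fragment"        # fragmented normally
    if force_fragment:
        return "fragment-df0"    # fragments forwarded with DF=0
    return "drop"                # DF=1 and no forcible fragmentation

actions = [
    handle_packet(3000, 1500, df=0),
    handle_packet(3000, 1500, df=1),
    handle_packet(3000, 1500, df=1, force_fragment=True),
]
```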
For information about fragmentation of MPLS packets, see chapter 1.9.6.3 MPLS
MTU Fragmentation.
Protocol packets are usually allowed to be fragmented (DF=0); that is, protocol packets
are usually not discarded on the originating device even when they exceed the MTU. Protocol
packets are not allowed to be fragmented (DF=1) only when:
the device is implementing PMTU discovery, such as IPv6 PMTU discovery or
LDP/RSVP-TE PMTU negotiation, or
the ping -f command is run on the local device.
Scenarios Parameters that may affect MPLS MTU value selection ("Y" indicates that the
parameter affects the selection, "N" indicates that it does not; the smallest value among
the affecting parameters is selected as the MPLS MTU)
LDP LSP Y Y Y N N
MPLS-TE Y Y N Y N
LDP over TE Y Y Y N Y
NOTE
In the LDP over TE scenario, the interface MTU of the tunnel interface affects MPLS MTU value
selection because the LDP LSP is established over a TE tunnel and the TE tunnel interface is an
outbound interface of the LDP LSP.
According to the preceding rules, the MPLS MTU selected on NE20E cannot be larger than
the physical interface MTU. Therefore, the size of an MPLS-labeled packet is less than or
equal to the physical interface MTU, and the packet will not be discarded by the local device
if DF=0.
An LSR compares the MTU values advertised by downstream LSRs as well as the MTU of the
outbound interface mapped to the local forwarding equivalence class (FEC) before advertising
the selected MTU value to the upstream LSR.
The default LDP MTU values vary according to types of LSRs along an LSP as follows:
The egress LSR uses the default MTU value of 65535.
The penultimate LSR assigned an implicit-null label uses the default LDP MTU equal to
the MTU of the local outbound interface mapped to the FEC.
Except the preceding LSRs, each LSR selects a smaller value as the local LDP MTU.
This value ranges between the MTU of the local outbound interface mapped to the FEC
and the MTU advertised by a downstream LSR. If an LSR receives no MTU from any
downstream LSR, the LSR uses the default LDP MTU value of 65535.
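The per-LSR selection rule above reduces to taking a minimum (illustrative Python; function and parameter names are hypothetical):

```python
# Sketch of the LDP MTU selection rule: an LSR takes the smaller of the
# MTU of its local outbound interface mapped to the FEC and the MTU
# advertised by the downstream LSR; 65535 is used when no downstream
# MTU has been received.

DEFAULT_LDP_MTU = 65535

def local_ldp_mtu(outbound_if_mtu, downstream_mtu=None):
    if downstream_mtu is None:          # no MTU received from downstream
        downstream_mtu = DEFAULT_LDP_MTU
    return min(outbound_if_mtu, downstream_mtu)

# Transit LSR: outbound interface MTU 9000, downstream advertised 1500.
mtu = local_ldp_mtu(9000, 1500)
```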
A downstream LSR adds the calculated LDP MTU value to the MTU type-length-value (TLV)
in a Label Mapping message and sends the Label Mapping message upstream.
If an MTU value changes (such as when the local outbound interface or its configuration is
changed), an LSR recalculates an MTU value and sends a Label Mapping message carrying
the new MTU value upstream. The comparison process repeats to update MTUs along the
LSP.
If an LSR receives a Label Mapping message that carries an unknown MTU TLV, the LSR
forwards this message to upstream LDP peers.
NE20E devices exchange Label Mapping messages to negotiate MPLS MTU values before
they establish LDP LSPs. Each message carries either of the following two MTU TLVs:
Huawei proprietary MTU TLV: sent by Huawei routers by default. If an LDP peer cannot
recognize this Huawei proprietary MTU TLV, the LDP peer forwards the message with
this TLV so that an LDP peer relationship can still be established between the Huawei
router and its peer.
Relevant standards-compliant MTU TLV: specified by commands on NE20E. NE20E
uses this MTU TLV to negotiate with non-Huawei devices.
1. The ingress sends a Path message with the ADSPEC object that carries an MTU value.
The smaller MTU value between the MTU configured on the physical outbound
interface and the configured MPLS MTU is selected.
2. Upon receipt of the Path message, a transit LSR selects the smallest MTU among the
received MTU value, the MTU configured on the physical outbound interface, and the
configured MPLS MTU. The transit LSR then sends a Path message with the ADSPEC
object that carries the smallest MTU value to the downstream LSR. This process repeats
until a Path message reaches the egress.
3. The egress uses the MTU value carried in the received Path message as the PMTU. The
egress then sends a Resv message that carries the PMTU value upstream to the ingress.
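The hop-by-hop shrinking of the MTU in the ADSPEC object across steps 1 to 3 can be sketched as follows (illustrative Python; MTU values are example numbers):

```python
# Sketch of RSVP-TE PMTU negotiation: each hop shrinks the MTU carried
# in the Path message's ADSPEC object to the smallest of the received
# value, its physical outbound interface MTU, and its configured MPLS
# MTU; the value reaching the egress becomes the PMTU.

def path_mtu(ingress_if_mtu, ingress_mpls_mtu, transit_hops):
    """transit_hops: list of (interface_mtu, mpls_mtu) per transit LSR."""
    mtu = min(ingress_if_mtu, ingress_mpls_mtu)   # step 1: ingress
    for if_mtu, mpls_mtu in transit_hops:         # step 2: transit LSRs
        mtu = min(mtu, if_mtu, mpls_mtu)
    return mtu                                    # step 3: egress PMTU

pmtu = path_mtu(1600, 1500, [(9000, 1508), (1400, 1500)])
```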
By default, Huawei routers implement MTU negotiation for VCs or PWs. Two nodes must
use the same MTU to ensure that a VC or PW is established successfully. L2VPN MTUs are
only used to establish VCs and PWs and do not affect packet forwarding.
To communicate with non-Huawei devices that do not verify L2VPN MTU consistency,
L2VPN MTU consistency verification can be disabled on NE20E. This allows NE20E to
establish VCs and PWs with the non-Huawei devices.
Definition
Load balancing distributes traffic among multiple links available to the same destination.
Purpose
After load balancing is deployed, traffic is distributed across different links. When one link
used in load balancing fails, traffic can still be forwarded over the other links.
Benefits
Load balancing offers the following benefits to carriers:
Maximized network resource usage
Increased link reliability
If the Forwarding Information Base (FIB) of a device has multiple entries with the same
destination address and mask but different next hops, outbound interfaces, or tunnel IDs, route
load balancing can be implemented.
Solution 1: Configure multiple equal-cost routes with the same destination network segment but
different next hops, and configure the maximum number of equal-cost routes for load balancing.
This solution is mostly used among links that directly connect two devices. However, as trunk
technology develops, this solution is being replaced by trunks. Compared with this solution,
trunk technology saves IP addresses and facilitates management by bundling links into a trunk.
Solution 2: Separate destination IP addresses into several groups and allocate one link for each group.
This solution improves the utilization of bandwidth resources. However, if you use this solution to
implement load balancing, you must observe and analyze traffic and know the distribution and
trends of traffic of various types.
By default, traffic on both the control plane and the forwarding plane is load-balanced per flow.
You can run a command to change the mode; see Configuring Load Balancing Mode.
ECMP
ECMP evenly load-balances traffic over multiple equal-cost paths to a destination,
irrespective of bandwidth. Equal-cost paths have the same cost to the destination.
When the bandwidths of these paths differ greatly, bandwidth usage is low. On the network
shown in Figure 1-645, traffic is load-balanced over three paths with bandwidths of 10 Mbit/s,
20 Mbit/s, and 30 Mbit/s, respectively. If ECMP is used, the total throughput can reach only
30 Mbit/s (three times the 10 Mbit/s of the narrowest path), so the maximum bandwidth usage
is only 50%.
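The arithmetic behind this example can be checked directly: with an even split, each path can carry at most the bandwidth of the narrowest path.

```python
# Sketch of the ECMP bandwidth-usage arithmetic: an even split caps each
# path's share at the bandwidth of the narrowest path.

def ecmp_usage(bandwidths_mbit):
    narrowest = min(bandwidths_mbit)
    carried = narrowest * len(bandwidths_mbit)  # total achievable throughput
    total = sum(bandwidths_mbit)
    return carried, carried / total

carried, usage = ecmp_usage([10, 20, 30])  # the Figure 1-645 example
```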
UCMP
UCMP load-balances traffic over multiple equal-cost paths to a destination based on
bandwidth ratios. All paths carry traffic in proportion to their bandwidths, as shown in
Figure 1-646. This increases bandwidth usage.
Trunk load balancing does not use ECMP or UCMP but provides similar functions. For example, if
interfaces of different rates, such as GE and FE interfaces, are bundled into a trunk interface, and
weights are assigned to the trunk member interfaces, traffic can be load-balanced over trunk member
links based on the link weights. This is similar to UCMP. By default, all trunk member interfaces
have the same weight of 1. This default behavior is similar to ECMP, but each member interface can
forward only as much traffic as the member with the lowest forwarding capability.
On the network shown in Figure 1-647, OSPF is used as the routing protocol.
− OSPF is configured on Device A, Device B, Device C, Device D, and Device E.
OSPF learns three different routes.
− Packets entering Device A through Port 1 and heading for Device E are sent to the
destination according to specific load balancing modes by the three routes,
implementing load balancing.
Unequal-cost load balancing
When equal-cost load balancing is performed, traffic is load-balanced over paths,
irrespective of the difference between link bandwidths. In this situation, low-bandwidth
links may be congested, whereas high-bandwidth links may be idle. Unequal-cost load
balancing can solve this problem by balancing traffic based on the bandwidths of the
outbound interfaces.
Load balancing modes and algorithms of equal-cost and unequal-cost load balancing are
the same.
The working mechanisms of equal-cost load balancing and unequal-cost load balancing
are similar. The difference is that unequal-cost load balancing carries bandwidth
information to the FIB and generates an NHP table according to the bandwidth ratio so
that load balancing can be performed based on the bandwidth ratio.
In Figure 1-647, after unequal-cost load balancing is enabled on Device A, traffic is
load-balanced based on the bandwidth ratio of the three outbound interfaces on Device A.
For example, if the bandwidths of the three outbound interfaces are 0.5 Gbit/s, 1 Gbit/s,
and 2.5 Gbit/s, respectively, traffic is load-balanced by these interfaces at the ratio of
1:2:5.
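The bandwidth-ratio split in this example can be computed as follows (illustrative Python; bandwidths are given in Mbit/s):

```python
# Sketch of the UCMP split: traffic is divided in proportion to outbound
# interface bandwidths (0.5, 1, and 2.5 Gbit/s -> ratio 1:2:5).
from fractions import Fraction

def ucmp_shares(bandwidths):
    """Fraction of traffic carried by each outbound interface."""
    total = sum(bandwidths)
    return [Fraction(b, total) for b in bandwidths]

shares = ucmp_shares([500, 1000, 2500])  # Mbit/s
```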
For details about unequal-cost load balancing, see 1.9.8 UCMP.
MPLS load balancing
When MPLS load balancing is performed, the NP checks the load balancing table and
then hashes packets to different load balancing items.
In Figure 1-648, two equal-cost LSPs exist between Device B and Device C so that
MPLS load balancing can be performed.
Multicast load balancing
Multicast load balancing can be configured based on the multicast source, multicast
group, or multicast priority.
Two-Level Hash
When the links connecting to next hops are trunk links, traffic that has been hashed by
protocol-based load balancing is hashed again based on the trunk forwarding table. This is
called two-level hashing.
In Figure 1-651, traffic is load balanced between Device A and Device B, and between Device
B and Device C. If the two load balancing processes use the same algorithm to calculate the
hash key, the same flow is always distributed to the same link. In this case, the forwarding of
the traffic is unbalanced.
Two-level load balancing works as follows:
A random number is introduced to the hash algorithm on each device. Random numbers vary
depending on devices, which ensures different hash results.
1.9.7.4.1.1 Overview
Huawei NE20E can implement load balancing using static routes and a variety of routing
protocols, including the Routing Information Protocol (RIP), RIP next generation (RIPng),
Open Shortest Path First (OSPF), OSPFv3, Intermediate System-to-Intermediate System
(IS-IS), and Border Gateway Protocol (BGP).
When multiple dynamic routes participate in load balancing, these routes must have the same metric. Because metrics can be compared only among routes of the same protocol, only routes of the same protocol can load-balance traffic.
Conditions
When the maximum number of static routes that load-balance traffic and the maximum
number of routes of all types that load-balance traffic are both greater than 1, the following
rules apply:
If N active static routes with the same prefix are available and N is less than or equal to
the maximum number of static routes that can be used to load-balance traffic, traffic is
load-balanced among the N static routes, regardless of whether they have the same cost.
If a static route is active and has N iterative next hops, traffic is load-balanced among N
routes, which is called iterative load balancing.
In Figure 1-652, R1 learns two OSPF routes to 172.1.1.2/32, both with the cost 2. The
outbound interface and next hop of one route are GE 1/0/0 and 172.1.1.34, and the outbound
interface and next hop of the other route are GE 2/0/0 and 172.1.1.38.
Conditions
If the maximum number of OSPF routes that can be used to load-balance traffic and the
maximum number of routes of all types that can be used to load-balance traffic are both
greater than 1 and multiple OSPF routes with the same prefix exist, these routes participate in
load balancing only when the following conditions are met:
These routes are of the same type (intra-area, inter-area, Type-1 external, or Type-2
external).
These routes have different direct next hops.
These routes have the same cost.
If these routes are Type-2 external routes, the costs of the links to the ASBR or
forwarding address are the same.
If OSPF route selection specified in relevant standards is implemented, these routes have
the same area ID.
The OSPF route selection rules specified in RFC 2328 are different from those specified in RFC 1583. By default, the Huawei NE20E performs OSPF route selection based on the rules specified in RFC 1583. To implement OSPF route selection based on the rules specified in RFC 2328, run the undo rfc1583 compatible command.
For details about configuring the maximum number of OSPF routes for load balancing, see Configuring Unicast Route Load Balancing.
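The conditions above can be expressed as a simple eligibility check. The field names below are assumptions for illustration, not the device's data model, and the area-ID and Type-2 ASBR-cost conditions are omitted for brevity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OspfRoute:
    route_type: str  # "intra", "inter", "type1-ext", or "type2-ext"
    cost: int
    next_hop: str

def eligible_for_load_balancing(routes):
    first = routes[0]
    # All routes must be of the same type and have the same cost.
    same_type_and_cost = all(
        r.route_type == first.route_type and r.cost == first.cost
        for r in routes)
    # All routes must have different direct next hops.
    distinct_next_hops = len({r.next_hop for r in routes}) == len(routes)
    return same_type_and_cost and distinct_next_hops

# The two routes from the Figure 1-652 example: same type and cost,
# different direct next hops, so they may share load.
r1 = OspfRoute("intra", 2, "172.1.1.34")
r2 = OspfRoute("intra", 2, "172.1.1.38")
```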
Principles
If the number of OSPF routes available for load balancing is greater than the configured
maximum number of OSPF routes that can be used to load-balance traffic, OSPF selects
routes for load balancing in the following order:
1. Routes whose next hops have smaller weights
Weight indicates the route preference, and the weight of a next hop can be changed using the nexthop command (in the OSPF view). Routing protocols and their default preferences are as follows:
DIRECT: 0
STATIC: 60
IS-IS: 15
OSPF: 10
OSPF ASE: 150
OSPF NSSA: 150
RIP: 100
IBGP: 255
EBGP: 255
2. Routes whose outbound interfaces have smaller indexes
Each interface has an index, which can be viewed using the display interface interface-name command in any view.
3. Routes whose next hop IP addresses are larger.
Conditions
If the maximum number of IS-IS routes that can be used to load-balance traffic and the
maximum number of routes of all types that can be used to load-balance traffic are both
greater than 1 and multiple IS-IS routes with the same prefix exist, these routes can participate
in load balancing only when the following conditions are met:
These routes are of the same level (Level-1, Level-2, or Level-1-2).
These routes are of the same type (internal or external).
These routes have the same cost.
These routes have different direct next hops.
For details about configuring the maximum number of IS-IS routes for load balancing, see Configuring Unicast Route Load Balancing.
Principles
If the number of IS-IS routes available for load balancing is greater than the configured
maximum number of IS-IS routes that can be used to load-balance traffic, IS-IS selects routes
for load balancing in the following order:
1. Routes whose next hops have smaller weights
Weight indicates the route preference, and the weight of a next hop can be changed using the nexthop command (in the IS-IS view). Routing protocols and their default preferences are as follows:
DIRECT: 0
STATIC: 60
IS-IS: 15
OSPF: 10
OSPF ASE: 150
OSPF NSSA: 150
RIP: 100
IBGP: 255
EBGP: 255
Each interface has an index, which can be viewed using the display interface interface-name command in any view.
6. Routes carrying IPv4, IPv6, and OSI next hop addresses, in descending order
7. Routes whose next hops have smaller IP addresses
8. If all the preceding items are the same, IS-IS selects the routes that are first calculated for
load balancing.
Conditions
Unlike an Interior Gateway Protocol (IGP), BGP imports routes from other routing protocols,
controls route advertisement, and selects optimal routes, rather than maintaining network
topologies or calculating routes by itself.
If the maximum number of BGP routes that can be used to load-balance traffic and the
maximum number of routes of all types that can be used to load-balance traffic are both
greater than 1, load balancing can be performed in either of the following modes:
Static routes or equal-cost IGP routes are used for BGP route iteration, and then traffic is
load-balanced among BGP routes.
BGP route attributes are modified to carry out load balancing.
In versions that support BGP independent route selection, BGP routes can be used to
load-balance traffic only when the following conditions are met:
− The PrefVal attributes of the BGP routes are the same.
For details about configuring the maximum number of BGP routes for load balancing, see Configuring Unicast Route Load Balancing.
Principles
If the number of BGP routes available for load balancing is greater than the configured
maximum number of BGP routes that can be used to load-balance traffic, BGP selects routes
for load balancing in the following order:
Routes with the shortest Cluster_List
Routes advertised by the routers with smaller router IDs. If the BGP routes carry
Originator_ID attributes, BGP selects the routes with smaller Originator_ID attributes
without comparing the router IDs.
Routes that are learned from BGP peers with smaller addresses
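The tie-breaking order above can be modeled as a sort key, as sketched below. The attribute names are illustrative assumptions, not BGP's wire format or the device's data model.

```python
import ipaddress
from dataclasses import dataclass

@dataclass
class BgpRoute:
    cluster_list: tuple
    router_id: str
    peer_addr: str
    originator_id: str = ""

def selection_key(route):
    # When an Originator_ID is carried, it is compared instead of the
    # advertising router's ID.
    rid = route.originator_id or route.router_id
    return (len(route.cluster_list),                # shortest Cluster_List first
            ipaddress.ip_address(rid),              # smaller (Originator_)ID first
            ipaddress.ip_address(route.peer_addr))  # smaller peer address first

routes = [
    BgpRoute(("1.1.1.1",), "10.0.0.1", "192.168.1.1"),
    BgpRoute((), "10.0.0.2", "192.168.1.2"),
]
preferred = sorted(routes, key=selection_key)
```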
After a multicast load balancing policy is configured, a multicast router selects equal-cost
routes in each routing table on the device, such as the unicast, MBGP, MIGP, and multicast
static routing tables. Based on the mask length and priority of each type of equal-cost routes,
the router selects a routing table on which multicast routing depends. Then, the router
implements load balancing among equal-cost routes in the selected routing table.
Load balancing can be implemented only between or among the same type of equal-cost routes. For
example, load balancing can be implemented between two MBGP routes but cannot be implemented
between an MBGP route and an MIGP route.
For details about configuring the maximum number of multicast routes for load balancing, see Configuring Multicast Route Load Balancing.
After the tunnel policy is applied to a VPN, the VPN selects tunnels based on the following
rules:
If two or more CR-LSPs are available, the VPN selects any two of them at random.
If fewer than two CR-LSPs are available, the VPN selects all CR-LSPs and also selects LSPs as substitutes to ensure that two tunnels are available for load balancing.
If two tunnels have been selected, one CR-LSP and the other LSP, and a CR-LSP is
added or a CR-LSP goes Up from the Down state, the VPN selects the CR-LSP to
replace the LSP.
If the number of existing tunnels for load balancing is smaller than the configured
number and a CR-LSP or LSP in the Up state is added, the newly added tunnel is also
used for load balancing.
If one or more tunnels used for load balancing go Down, the tunnel policy is triggered to
re-select tunnels. The VPN selects LSPs as substitutes to ensure that the configured
number of tunnels are used for load balancing.
The number of tunnels used for load balancing depends on the number of eligible tunnels.
For example, if there are only one CR-LSP and one LSP in the Up state, load balancing
is performed between the two tunnels. The tunnels of other types are not selected even if
they are Up.
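The selection rules above amount to preferring CR-LSPs and using LSPs only as substitutes, which can be sketched as follows. The tunnel names and the load-balancing number of two are illustrative.

```python
def select_tunnels(up_cr_lsps, up_lsps, count=2):
    # Prefer CR-LSPs; fill any remaining slots with LSP substitutes.
    chosen = list(up_cr_lsps[:count])
    chosen += up_lsps[:count - len(chosen)]
    return chosen

# Only one CR-LSP is up: an LSP substitutes for the second tunnel.
step1 = select_tunnels(["cr-lsp1"], ["lsp1", "lsp2"])
# A second CR-LSP comes up: re-selection replaces the LSP with it.
step2 = select_tunnels(["cr-lsp1", "cr-lsp2"], ["lsp1", "lsp2"])
```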
Figure 1-656 Tunnels used for load balancing do not necessarily have the same cost
Routes used for load balancing must go over different paths, whereas tunnels used for
load balancing can go over the same path.
On the network shown in Figure 1-657, if two routes are available from PE1 to PE2 for
load balancing, these two routes must go over different paths. If two tunnels are available
from PE1 to PE2 for load balancing, these tunnels can go over the same path.
Figure 1-657 Tunnels used for load balancing are allowed to go over the same path
Routes used for load balancing must have the same type, whereas tunnels used for load
balancing can have different types.
For example, between the two routes used for load balancing, if one is a static route, the
other cannot be an OSPF route. However, between the two tunnels used for load
balancing, if one is a CR-LSP, the other can be an LSP.
For details about configuring the maximum number of tunnels for load balancing, see Configuring Tunnel Load Balancing.
If a link in M links fails, LACP selects one from the N backup links to replace the faulty
one to retain M:N backup. The actual link bandwidth is still the sum of the bandwidths
of the M primary links.
If a link cannot be found in the backup links to replace the faulty link and the number of
member links in the Up state falls below the configured lower threshold of active links,
the Eth-Trunk interface goes Down. Then all member interfaces in the Eth-Trunk
interface no longer forward data.
An Eth-Trunk interface working in static LACP mode can contain member interfaces at
different rates, in different duplex modes, and on different boards. Eth-Trunk member
interfaces at different rates cannot forward data at the same time. Member interfaces in
half-duplex mode cannot forward data.
Manual load balancing mode: In this mode, you must manually create an Eth-Trunk
interface, add interfaces to the Eth-Trunk interface, and specify active member interfaces.
LACP is not involved. All active member interfaces forward data and perform load
balancing.
Traffic can be evenly load-balanced among all member interfaces. Alternatively, you can
set the weight for each member interface to implement uneven load balancing; in this
manner, the interface that has a larger weight transmits a larger volume of traffic. If an
active link in a link aggregation group fails, traffic is balanced among the remaining
active links evenly or based on weights, as shown in Figure 1-659.
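The weighted distribution and the re-balancing after a link failure can be sketched as follows; the member-link names and weights are hypothetical.

```python
def traffic_shares(weights):
    # Each member link carries traffic in proportion to its weight.
    total = sum(weights.values())
    return {link: w / total for link, w in weights.items()}

links = {"member1": 2, "member2": 1, "member3": 1}
before = traffic_shares(links)   # member1 carries half the traffic

del links["member2"]             # an active link fails
after = traffic_shares(links)    # traffic re-balances among the rest by weight
```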
An Eth-Trunk interface working in manual load balancing mode can contain member
interfaces at different rates, in different duplex modes, and on different boards.
For details about configuring the maximum number of Eth-Trunk member links for load balancing, see Configuring Eth-Trunk Load Balancing.
Hash Algorithm
The hash algorithm uses a hash function to map a binary value of any length to a smaller
binary value of a fixed length. The smaller binary value is the hash value. The device then
uses an algorithm to map the hash value to an outbound interface and sends packets out from
this outbound interface.
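A minimal sketch of this mapping, using a general-purpose hash in place of the device's algorithm (interface names are illustrative):

```python
import hashlib

def pick_interface(packet_bytes: bytes, interfaces):
    # Map input of any length to a fixed-length hash value...
    digest = hashlib.sha256(packet_bytes).digest()
    hash_value = int.from_bytes(digest[:4], "big")
    # ...then map the hash value to one outbound interface.
    return interfaces[hash_value % len(interfaces)]

ifaces = ["GE1/0/0", "GE2/0/0", "GE3/0/0"]
# Per-flow behavior: the same input always selects the same interface.
```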
Hash Factor
Traffic is hashed based on traffic characteristics, which are called hash factors.
Traffic characteristics that can be used as hash factors include but are not limited to the
following:
Ethernet frame header: source and destination MAC addresses
IP header: source IP address, destination IP address, and protocol number
TCP/UDP header: source and destination port numbers
MPLS header: MPLS label and some bits in the MPLS payload
L2TP packets: tunnel ID and session ID
For details about configuring load balancing hash factors, see Adjusting the Algorithm of Load Balancing. For the default hash factors of the hash algorithm in typical load balancing scenarios, see 1.9.7.6 Appendix: Default Hash Factors.
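The choice of hash factors per traffic type can be illustrated as follows. The packet fields are assumed to be already parsed, and the field names are illustrative.

```python
def hash_factors(pkt: dict) -> tuple:
    if pkt["type"] == "ip":        # IP header plus TCP/UDP ports: the 5-tuple
        return (pkt["src_ip"], pkt["dst_ip"], pkt["protocol"],
                pkt["src_port"], pkt["dst_port"])
    if pkt["type"] == "ethernet":  # Ethernet frame header: the MAC 2-tuple
        return (pkt["src_mac"], pkt["dst_mac"])
    if pkt["type"] == "l2tp":      # L2TP packets: tunnel ID and session ID
        return (pkt["tunnel_id"], pkt["session_id"])
    raise ValueError("no hash factors defined for this packet type")

factors = hash_factors({"type": "ip", "src_ip": "10.0.0.1",
                        "dst_ip": "10.1.0.1", "protocol": 6,
                        "src_port": 1024, "dst_port": 80})
```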
PE (Provider Edge): an edge device on the provider network, which is directly connected to the CE. The PE receives traffic from the CE, encapsulates the traffic with an MPLS header, and sends the traffic to the P. The PE also receives traffic from the P, removes the MPLS header, and sends the traffic to the CE.
P (Provider): a backbone device on the provider network, which is not directly connected
to the CE. Ps perform basic MPLS forwarding.
CE (Customer Edge): an edge device on the private network.
The hash algorithm is performed based on the packet format of the inbound traffic from the AC
interface. The hash factors can be IP 5-tuple or IP 2-tuple. The result of the load balancing
depends on the discreteness of the private IP addresses or TCP/UDP ports of the packets.
The hash algorithm on P node is performed based on the packet format of the inbound MPLS
traffic.
If the number of MPLS labels in the packet is less than four, the hash factors can be IP
5-tuple or IP 2-tuple. The result of the load balancing depends on the discreteness of the
private IP addresses or TCP/UDP ports of the packets.
In complex scenarios such as inter-AS VPN, FRR, and LDP over TE, the number of labels in the packet may be four or more. In these scenarios, the hash factors are the fourth or fifth label from the top. The result of the load balancing depends on the discreteness of the fourth or fifth labels of the packets.
The hash algorithm for load balancing on the egress PE is the same as in Scenario 2 if Penultimate Hop Popping (PHP) is disabled, and the same as in Scenario 1 if PHP is enabled.
Figure 1-668 Load Balancing Among the L3 Outbound Interfaces in the Access of L2VPN to
L3VPN Scenarios
In access of L2VPN to L3VPN scenarios, the hash algorithm is the same as Scenario 1.
PE (Provider Edge): an edge device on the provider network, which is directly connected to the CE. The PE receives Ethernet traffic from the CE, encapsulates the traffic with an MPLS header, and sends the traffic to the P. The PE also receives traffic from the P, removes the MPLS header, and sends the traffic to the CE.
P (Provider): a backbone device on the provider network, which is not directly connected
to the CE. Ps perform basic MPLS forwarding.
CE (Customer Edge): an edge device on the private network. CEs perform
Ethernet/VLAN layer2 forwarding.
The hash algorithm is performed based on the packet format of the inbound traffic from the AC
interface.
IP traffic: the hash factors can be IP 5-tuple or IP 2-tuple. The result of the load
balancing depends on the discreteness of the private IP addresses or TCP/UDP ports of
the packets.
Ethernet frames carrying non-IP traffic: the hash factors can be the MAC 2-tuple. The result of the load balancing depends on the discreteness of the MAC addresses of the packets. Some boards support the 3-tuple <source MAC, destination MAC, VC label> if the inbound AC traffic is MPLS traffic and the AC interface is not a QinQ sub-interface.
The hash algorithm on P node is performed based on the packet format of the inbound MPLS
traffic.
If the number of labels in the packet is less than four, the hash factors can be IP 5-tuple
or IP 2-tuple. The result of the load balancing depends on the discreteness of the private
IP addresses or TCP/UDP ports of the packets.
In complex scenarios such as inter-AS VPN, FRR, and LDP over TE, the number of labels may be four or more. In these scenarios, the hash factors are the fourth or fifth label from the top. The result of the load balancing depends on the discreteness of the fourth or fifth labels of the packets.
If the traffic is from MPLS to AC, the hash factors can be the IP 5-tuple, IP 2-tuple, or MAC 2-tuple. The default hash factors may differ between board types. Some boards support only the MAC 2-tuple.
If the traffic is from AC to AC, the hash algorithm is the same as Scenario 1.
Figure 1-676 Load Balancing Among the L2 Outbound Interfaces in the Access of L2VPN to
L3VPN Scenarios
In access of L2VPN to L3VPN scenarios, the hash algorithm is the same as Scenario 1.
PE (Provider Edge): an edge device on the provider network, which is directly connected to the CE. The PE receives traffic from the CE, encapsulates the traffic with an MPLS header, and sends the traffic to the P. The PE also receives traffic from the P, removes the MPLS header, and sends the traffic to the CE.
P (Provider): a backbone device on the provider network, which is not directly connected
to the CE. Ps perform basic MPLS forwarding.
CE (Customer Edge): an edge device on the private network.
The hash algorithm is performed based on the packet format of the inbound traffic from the AC
interface.
IP traffic: the hash factors can be IP 5-tuple or IP 2-tuple. The result of the load
balancing depends on the discreteness of the private IP addresses or TCP/UDP ports of
the packets.
Ethernet frames carrying non-IP traffic: the hash factors can be the MAC 2-tuple. The result of the load balancing depends on the discreteness of the MAC addresses of the packets.
Non-Ethernet traffic: the hash factor is the VC label on most boards.
The hash algorithm on P node is performed based on the packet format of the inbound MPLS
traffic.
If the number of labels in the packet is less than four, the hash factors can be IP 5-tuple
or IP 2-tuple. The result of the load balancing depends on the discreteness of the private
IP addresses or TCP/UDP ports of the packets.
In the complex scenarios such as inter-AS VPN, FRR and LDP over TE, the number of
the labels may be four or more. In these scenarios, the hash factors are the fourth or fifth
label from the top. The result of the load balancing depends on the discreteness of the
fourth or fifth label from the top.
The egress PE of VLL/PWE3 supports only trunk load balancing because the virtual circuit (VC) of VLL/PWE3 is P2P.
If the traffic is from AC to AC, the hash algorithm is the same as Scenario 1.
If the traffic is from MPLS to AC, the hash factors can be IP 5-tuple, IP 2-tuple or VC
label. The hash factors may be different in different board-types.
Figure 1-684 Load Balancing Among the L2 Outbound Interfaces in the Access of L2VPN to
L3VPN Scenarios
In access of L2VPN to L3VPN scenarios, the hash algorithm is the same as Scenario 1.
If the transit nodes of an L2TP tunnel use per-packet load balancing, the L2TP control messages may arrive out of order, which may cause L2TP tunnel establishment to fail.
Data message: used to transmit PPP frames over the L2TP tunnel. Data messages are not retransmitted if lost. The format of an L2TP data message is shown in Figure 1-687.
belongs to the same flow. The load balancing result depends on the number of L2TP tunnels carrying the traffic. The more L2TP tunnels, the better the load balancing result.
Figure 1-688 Transmitting data of multi-protocol local networks through the single-protocol
backbone network
In the scenarios stated above, the source and destination IP addresses of all packets in the GRE tunnel are the source and destination addresses of the GRE tunnel. Therefore, on any transit node and on the egress node of the GRE tunnel, the hash factors in the outer IP headers of the GRE packets are the same. If a flow is carried by only one GRE tunnel and the load balancing mode is per-flow, load balancing is not available. Creating multiple GRE tunnels to carry the flow is recommended.
The default hash factors of IP unicast traffic depend on the type of the inbound board.
Balance-Preferred
Based on this policy, a multicast router evenly distributes (*, G) and (S, G) entries among their corresponding equal-cost routes. This policy implements automatic load balancing adjustment in any of the following conditions: equal-cost routes are added, deleted, or modified; multicast routing entries are added or deleted; or the weights of equal-cost routes are changed.
This policy applies to a network on which multicast users frequently join or leave multicast
groups.
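A simplified sketch of the even distribution this policy aims for (not the router's actual algorithm): entries are spread over the equal-cost routes, and adding an equal-cost route triggers re-adjustment.

```python
from collections import Counter

def distribute(entries, routes):
    # Spread multicast entries evenly over the equal-cost routes, round-robin.
    return {entry: routes[i % len(routes)] for i, entry in enumerate(entries)}

entries = [("*", "G1"), ("S1", "G1"), ("*", "G2"), ("S2", "G2")]
two_routes = distribute(entries, ["route1", "route2"])
# An equal-cost route is added: the policy re-adjusts automatically.
three_routes = distribute(entries, ["route1", "route2", "route3"])
```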
Stable-Preferred
Based on this policy, a multicast router distributes (*, G) and (S, G) entries among their corresponding equal-cost routes. In this respect, the stable-preferred policy is similar to the balance-preferred policy. This policy implements automatic load balancing adjustment when equal-cost routes are deleted. However, dynamic load balancing adjustment is not performed when multicast routing entries are deleted or when the weights of load balancing routes change.
This policy applies to a network that has stable multicast services.
Based on the balance-preferred policy, a multicast router takes load balancing as the
most important issue, so that the router rapidly responds to a change in unicast routes,
multicast routes, and weights of equal-cost routes.
Based on the stable-preferred policy, a multicast router prevents unnecessary link switchovers to ensure stable services: the router rapidly responds to unicast route deletions but does not re-adjust load. After the unicast route flapping problem is resolved, the router selects optimal routes for subsequent services to gradually resolve the imbalance problem. Therefore, the stable-preferred policy provides both stable and load-balanced services.
Terms
None
1.9.8 UCMP
1.9.8.1 Introduction
Definition
Unequal cost multipath (UCMP) allows traffic to be distributed according to the bandwidth ratio of multiple unequal-cost paths that point to the same destination with the same precedence. All paths carry traffic in proportion to their bandwidths to achieve optimal load balancing.
Purpose
When equal-cost routes have multiple outbound interfaces that connect to both high-speed
links and low-speed links, equal cost multipath (ECMP) evenly distributes traffic among links
to a destination, regardless of the difference between link bandwidths. When the link
bandwidths differ greatly, low-bandwidth links may be congested, whereas high-bandwidth
links may be idle. To fully utilize bandwidths of different links, traffic must be balanced
according to the bandwidth ratio of these links.
1.9.8.2 Principles
1.9.8.2.1 Basic Principles
If multiple equal-cost routes reach the destination through multiple outbound interfaces,
bottom-layer hardware applies for resources according to the bandwidth ratio of these
interfaces so that the traffic ratio equals or approaches the bandwidth ratio on these interfaces.
When the bandwidth of an interface changes, traffic is automatically load-balanced according to the new bandwidth ratio.
Precautions
If interface-based UCMP is enabled, global UCMP cannot be enabled. Similarly, if
global UCMP is enabled, interface-based UCMP cannot be enabled.
The bandwidth accuracy for the interface board is Mbit/s, which supports high-speed
links.
You must run the shutdown and undo shutdown commands in sequence after enabling UCMP on an interface. As a result, traffic is interrupted. Global UCMP avoids this situation and provides more functions.
Precautions
If global UCMP is enabled, interface-based UCMP cannot be enabled. Similarly, if
interface-based UCMP is enabled, global UCMP cannot be enabled.
1.9.8.3 Applications
1.9.8.3.1 Interface-based UCMP Application
Device A has three physical outbound interfaces: Port 1, Port 2, and Port 3. The bandwidths of
the three interfaces are 10 Gbit/s, 1 Gbit/s, and 1 Gbit/s, respectively. Three IPv4 equal-cost
routes are available between Device A and Device B.
When UCMP is not enabled on the three interfaces, their traffic ratio is 1:1:1.
After UCMP is enabled on the three interfaces, the traffic ratio of the three interfaces
approaches the bandwidth ratio 10:1:1.
When UCMP is not enabled on the three interfaces, their traffic ratio is 1:1:1, irrespective of
the bandwidth ratio.
After global UCMP is enabled, traffic from Device A to Device B is load-balanced on the
three outbound interfaces, and the traffic ratio approaches the bandwidth ratio 3:1:1.
When a member interface of Eth-Trunk 1 is shut down, the bandwidth of Eth-Trunk 1 changes
to 2 Gbit/s and accordingly the bandwidth ratio of the three outbound interfaces is 2:1:1 for
load balancing.
When interfaces support UCMP, the bandwidths of equal-cost routes are displayed in the FIB
table. By calculating the bandwidth ratio of interfaces, you can see whether the bandwidth
ratio approaches the traffic ratio. In this way, you can learn whether UCMP functions
normally.
Terms
None
1.9.9 IPv4
1.9.9.1 Introduction
Definition
Internet Protocol version 4 (IPv4) is the core of the TCP/IP protocol suite and works at the
Internet layer in the TCP/IP model. This layer corresponds to the network layer in the OSI
model. At the IP layer, information is divided into data units, and address and control
information is added to allow datagrams to be routed.
IP provides unreliable and connectionless data transmission services. Unreliable transmission
means that IP does not ensure that IP datagrams successfully arrive at the destination. IP only
provides best-effort delivery. If an error occurs, for example, when a router exhausts its buffer, IP discards the excess datagrams and sends ICMP messages to the source. The upper-layer
protocols, such as TCP, are responsible for resolving reliability issues. Connectionless
transmission means that IP does not maintain status information for subsequent datagrams.
Every datagram is processed independently, meaning that IP datagrams may not be received
in the same order they are sent. If a source sends two consecutive datagrams A and B in
sequence to the same destination, each datagram is possibly routed over a different path to the
destination, and therefore B may arrive ahead of A.
On an IP network, each host must have an IP address for communication. An IP address is 32 bits long and consists of two parts: a network ID and a host ID.
The network ID uniquely identifies a network segment or the summarized network
segment of multiple network segments.
The host ID uniquely identifies a specific device on a network segment.
If multiple devices on the same network segment have the same network ID, they belong to
the same network, regardless of their physical locations.
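As a worked example of this split (the addresses and the 24-bit network ID are chosen for illustration), Python's ipaddress module can separate the two parts:

```python
import ipaddress

iface = ipaddress.ip_interface("192.168.10.7/24")
network_id = iface.network.network_address   # 192.168.10.0 identifies the segment
# The host ID is whatever the network mask does not cover.
host_id = int(iface.ip) & ~int(iface.network.netmask) & 0xFFFFFFFF

# A device with the same network ID belongs to the same network.
same_network = ipaddress.ip_address("192.168.10.200") in iface.network
```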
Purpose
IPv4 shields link layer protocol differences and provides a uniform standard for transmission
at the network layer.
1.9.9.2 Principles
1.9.9.2.1 ICMP
The Internet Control Message Protocol (ICMP) is an error-reporting mechanism and is used
by IP or an upper-layer protocol (TCP or UDP). An ICMP message is encapsulated as a part
of an IP datagram and transmitted through the Internet.
An IP datagram contains information about only the source and destination, not about all
nodes along the entire path through which the IP datagram passes. The IP datagram can record
information about all nodes along the path only when route record options are set in the IP
datagram. Therefore, if a device detects an error, it reports the error to the source and not to
intermediate devices.
When an error occurs during the IP datagram forwarding, ICMP reports the error to the source
of the IP datagram, but does not rectify the error or notify the intermediate devices of the error.
A majority of errors generally occur on the source. When an error occurs on an intermediate
device, however, the source cannot locate the device on which the error occurs even after
receiving the error report.
1.9.9.2.2 TCP
The Transmission Control Protocol (TCP) defined in standard protocols ensures
high-reliability transmission between hosts. TCP provides reliable, connection-oriented, and
full-duplex services for user processes. TCP transmits data through sequenced and
nonstructural byte streams.
TCP is an end-to-end, connection-oriented, and reliable protocol. TCP supports multiple
network applications. In addition, TCP assumes that the lower layer provides only unreliable
datagram services, and it can run over a network of different hardware structures.
Figure 1-698 shows the position of TCP in a layered protocol architecture, where TCP is
above IP. TCP can transmit variable-length data through IP encapsulation. IP then performs
data fragmentation and assembly and transmits the data over multiple networks.
TCP works below applications and above IP. Its upper-layer interface consists of a series of
calls similar to the interrupt call of an operating system.
TCP can asynchronously transmit data of upper-layer applications. The lower-layer interfaces
are assumed as IP interfaces. To implement connection-oriented and reliable data transmission
over unreliable networks, TCP must provide the following:
Reliability and flow control functions
Multiple interfaces for upper-layer applications
Data for multiple applications
Connection assurance
Communication security assurance
Figure 1-699 shows the process of setting up and tearing down a TCP connection.
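The setup and teardown shown in Figure 1-699 can be observed with an ordinary TCP socket pair on the loopback interface. This sketch only illustrates the connection-oriented byte-stream service, not the handshake internals.

```python
import socket
import threading

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))          # any free port
server.listen(1)
port = server.getsockname()[1]

def serve():
    conn, _ = server.accept()          # connection setup completes here
    conn.sendall(conn.recv(1024))      # reliable byte-stream echo
    conn.close()                       # begins connection teardown

t = threading.Thread(target=serve)
t.start()

client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"hello")
reply = client.recv(1024)
client.close()
t.join()
server.close()
```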
1.9.9.2.3 UDP
The User Datagram Protocol (UDP) is a computer communication protocol that provides
packet switching services on the Internet. By default, UDP uses IP as the lower-layer protocol.
UDP provides the simplest protocol mechanism that sends information to a user application.
UDP is transaction-oriented; delivery and duplicate protection are not guaranteed. Applications that require reliable data transmission must use TCP instead. Figure 1-700 shows the
format of a UDP datagram.
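UDP's connectionless, datagram-oriented behavior is visible with two loopback sockets: no connection is set up, and each receive call returns exactly one datagram.

```python
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"datagram-1", addr)     # sent without any handshake
data, _ = receiver.recvfrom(1024)      # datagram boundary is preserved
sender.close()
receiver.close()
```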
1.9.9.2.4 RawIP
RawIP only fills in certain fields of an IP header and allows an application to provide its own
IP header. Similar to UDP, RawIP is unreliable. No control mechanism is available to verify
whether a RawIP datagram is received. RawIP is connectionless, and it transmits data between hosts without establishing a virtual circuit of any type. Unlike UDP, RawIP allows application data to be
directly processed at the IP layer through a socket. This is helpful to the applications that need
to directly communicate with the IP layer.
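What "allows an application to provide its own IP header" means in practice can be sketched by building a 20-byte IPv4 header by hand, as a raw-socket application would before handing it to the IP layer. The field values are examples, and actually sending the header would require a privileged raw socket.

```python
import struct

def ipv4_header(src: str, dst: str, protocol: int, payload_len: int) -> bytes:
    version_ihl = (4 << 4) | 5                 # IPv4, 5 x 32-bit header words
    header = struct.pack("!BBHHHBBH4s4s",
                         version_ihl, 0, 20 + payload_len,
                         0, 0,                 # identification, flags/fragment
                         64, protocol, 0,      # TTL, protocol, checksum = 0
                         bytes(map(int, src.split("."))),
                         bytes(map(int, dst.split("."))))
    # Internet checksum: ones'-complement sum of the header's 16-bit words.
    csum = sum(struct.unpack("!10H", header))
    while csum >> 16:
        csum = (csum & 0xFFFF) + (csum >> 16)
    csum = ~csum & 0xFFFF
    return header[:10] + struct.pack("!H", csum) + header[12:]

hdr = ipv4_header("10.0.0.1", "10.0.0.2", 17, 8)   # protocol 17 = UDP
```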
1.9.9.2.5 Socket
A socket consists of a set of application programming interfaces (APIs) working between the
transport layer and application layer. The socket shields differences of transport layer
protocols and provides the uniform programming interfaces for the application layer. In this
manner, the application layer, being exempt from the detailed process of the TCP/IP protocol
suite, can transmit data over IP networks by calling socket functions. Figure 1-701 shows the
position of the socket in the TCP/IP protocol stack.
Figure 1-701 Schematic diagram of the socket in the TCP/IP protocol stack
The following types of sockets are supported by different protocols at the transport layer:
TCP-based socket: provides reliable byte-stream communication services for the
application layer.
UDP-based socket: supports connectionless and unreliable data transmission for the
application layer and preserves datagram boundaries.
RawIP socket: also called raw socket. Similar to the UDP-based socket, the RawIP
socket supports connectionless and unreliable data transmission and preserves datagram
boundaries. The RawIP socket is unique in that it can be used by applications to directly
access the network layer.
Link layer-based socket: used by Intermediate System to Intermediate System (IS-IS) to
directly access the link layer.
1.9.9.3 Applications
1.9.10 IPv6
1.9.10.1 Introduction
Definition
Internet Protocol version 6 (IPv6), also called IP Next Generation (IPng), is the
second-generation standard protocol of network layer protocols. As a set of specifications
defined by the Internet Engineering Task Force (IETF), IPv6 is the upgraded version of
Internet Protocol version 4 (IPv4).
The most significant difference between IPv6 and IPv4 is that IP addresses are lengthened
from 32 bits to 128 bits. Featuring a simplified header format, sufficient address space,
hierarchical address structure, flexible extended header, and an enhanced neighbor discovery
(ND) mechanism, IPv6 has a competitive future in the market.
Purpose
IP technology has become widely applied due to the great success of the IPv4 Internet. As the
Internet develops, however, IPv4 weaknesses have become increasingly obvious in the
following aspects:
The IPv4 address space is insufficient.
An IPv4 address is identified using 32 bits. In theory, a maximum of 4.3 billion
addresses can be provided. In actual applications, less than 4.3 billion addresses are
available because of address allocation. In addition, IPv4 address resources are allocated
unevenly. The USA occupies almost half of the global address space, Europe uses fewer
IPv4 addresses, while the Asian-Pacific region uses an even smaller quantity. The
shortage of IPv4 addresses limits further development of mobile IP and bandwidth
technologies that require an increasing number of IP addresses.
There are several solutions to IPv4 address exhaustion. Classless Inter-domain Routing
(CIDR) is one of them. CIDR, however, has its own disadvantages, which helped
encourage the development of IPv6.
The backbone router maintains too many routing entries.
In the initial IPv4 allocation planning, many discontinuous IPv4 addresses were allocated,
and therefore routes cannot be aggregated effectively. The constantly growing routing
table consumes significant memory, affecting forwarding efficiency. Subsequently,
device manufacturers have to upgrade routers to improve route addressing and
forwarding performance.
Address autoconfiguration and readdressing cannot be performed easily.
An IPv4 address only has 32 bits, and IP addresses are allocated unevenly. Consequently,
IP addresses must be reallocated during network expansion or replanning. Address
autoconfiguration and readdressing are required to simplify maintenance.
Security cannot be guaranteed.
As the Internet develops, security issues have become more serious. Security was not
fully considered in designing IPv4. Therefore, the original framework cannot implement
end-to-end security. An IPv6 packet contains a standard extension header related to IP
security (IPsec), which allows IPv6 to provide end-to-end security.
IPv6 solves the problem of IP address shortage and has the following advantages:
Easy to deploy.
Compatible with various applications.
Smooth transition from IPv4 networks to IPv6 networks.
With so many obvious advantages over IPv4, IPv6 has developed rapidly.
1.9.10.2 Principles
IPv6 basic functions include IPv6 neighbor discovery and path MTU (PMTU) discovery.
Neighbor discovery and PMTU discovery are implemented using Internet Control Message
Protocol for IPv6 (ICMPv6) messages.
X:X:X:X:X:X:X:X
− IPv6 addresses in this format are written as eight groups of four hexadecimal digits
(0 to 9, A to F), each group separated by a colon (:). Every "X" represents a group
of hexadecimal digits. For example, 2031:0000:130F:0000:0000:09C0:876A:130B
is a valid IPv6 address.
For convenience, any zeros at the beginning of a group can be omitted; therefore,
the given example becomes 2031:0:130F:0:0:9C0:876A:130B.
− Any number of consecutive groups of 0s can be replaced with two colons (::).
Therefore, the given example can be written as 2031:0:130F::9C0:876A:130B.
This double-colon substitution can only be used once in an address; multiple
occurrences would be ambiguous.
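These compression rules can be checked with Python's standard `ipaddress` module, which always emits the canonical compressed form (leading zeros dropped, the single longest run of zero groups replaced by "::", and lowercase hexadecimal digits):

```python
import ipaddress

# Full form with leading zeros and two runs of zero groups.
full = "2031:0000:130F:0000:0000:09C0:876A:130B"

# str() drops leading zeros and compresses the longest run of zero groups.
compressed = str(ipaddress.IPv6Address(full))
print(compressed)  # 2031:0:130f::9c0:876a:130b
```

Note that only the longer run (the two consecutive zero groups) is replaced by "::"; the single zero group stays as "0", which avoids the ambiguity described above.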
X:X:X:X:X:X:d.d.d.d
IPv4-mapped IPv6 address: The format of an IPv4-mapped IPv6 address is
0:0:0:0:0:FFFF:IPv4-address. IPv4-mapped IPv6 addresses are used to represent IPv4
node addresses as IPv6 addresses.
"X:X:X:X:X:X" represents the high-order six groups of digits, each "X" standing for 16
bits represented by hexadecimal digits. "d.d.d.d" represents the low-order four groups of
digits, each "d" standing for 8 bits represented by decimal digits. "d.d.d.d" is a standard
IPv4 address.
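The same module can parse an IPv4-mapped IPv6 address in the 0:0:0:0:0:FFFF:d.d.d.d format and recover the embedded IPv4 address:

```python
import ipaddress

# An IPv4-mapped IPv6 address in the ::FFFF:d.d.d.d form (example address).
mapped = ipaddress.IPv6Address("::ffff:192.0.2.1")

# The low-order 32 bits are recoverable as a plain IPv4 address.
print(mapped.ipv4_mapped)  # 192.0.2.1
```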
Application scenario: When a mobile host communicates with the mobile agent on the
home subnet, it uses the anycast address of the subnet router.
Address specifications: Anycast addresses do not have independent address space.
They can use the format of any unicast address. A syntax is therefore required to
differentiate an anycast address from a unicast address.
Multicast address: assigned to a set of interfaces that belong to different nodes and is
similar to an IPv4 multicast address. A packet that is sent to a multicast address is
delivered to all the interfaces identified by that address.
IPv6 addresses do not include broadcast addresses. In IPv6, multicast addresses can
provide the functions of broadcast addresses.
Unicast addresses can be classified into four types, as shown in Table 1-176.
Global unicast address: equivalent to an IPv4 public network address. Global unicast
addresses are used on links that can be aggregated, and are provided to the Internet
Service Provider (ISP). The structure of this type of address enables route-prefix
aggregation to solve the problem of a limited number of global routing entries. A global
unicast address consists of a 48-bit route prefix managed by operators, a 16-bit subnet ID
managed by local nodes, and a 64-bit interface ID. Unless otherwise specified, global
unicast addresses include site-local unicast addresses.
When network administrators need to specify or plan source and destination addresses of
packets, they can define a group of address selection rules. An address selection policy
table can be created based on these rules. Similar to a routing table, this table is queried
based on the longest matching rule. The address is selected based on the source and
destination addresses.
Select a source address using the following rules in descending order of priority:
a. Prefer a source address that is the same as the destination address.
b. Prefer an address in an appropriate address range.
c. Avoid selecting a deprecated address.
d. Prefer a home address.
e. Prefer an address of the outbound interface.
f. Prefer an address whose label value is the same as that of the destination address.
g. Use the longest matching rule.
The candidate address can be the unicast address that is configured on the specified outbound interface.
If a source address that has the same label value and is in the same address range with the destination
address is not found on the outbound interface, you can select such a source address from another
interface.
Select a destination address using the following rules in descending order of priority.
a. Avoid selecting an unavailable destination address.
b. Prefer an address in an appropriate address range.
c. Avoid selecting a deprecated address.
d. Prefer a home address.
e. Prefer an address whose label value is the same as that of the source address.
f. Prefer an address with a higher precedence value.
g. Prefer native transport to the 6over4 or 6to4 tunnel.
h. Prefer an address in a smaller address range.
i. Use the longest matching rule.
j. Leave the order of address priorities unchanged.
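Rule-ordered selection of this kind can be expressed as a tuple sort key, where earlier rules dominate later ones. The following sketch implements only a subset of the source-selection rules (a, c, e, f, and g); the candidate attributes (`deprecated`, `label`, `on_outbound_if`, `prefix_match`) are an illustrative data model, not the device's actual one.

```python
def select_source(candidates, dest):
    """Pick a source address by a subset of the selection rules above.

    Each candidate is a dict with illustrative fields: addr, deprecated,
    label, on_outbound_if, prefix_match (leading bits shared with dest).
    Tuples compare element by element, so earlier rules dominate later
    ones; False sorts before True, i.e. preferred conditions come first.
    """
    def key(c):
        return (
            c["addr"] != dest["addr"],    # a. prefer address equal to destination
            c["deprecated"],              # c. avoid deprecated addresses
            not c["on_outbound_if"],      # e. prefer the outbound interface
            c["label"] != dest["label"],  # f. prefer a matching label value
            -c["prefix_match"],           # g. longest matching rule
        )
    return min(candidates, key=key)
```

For example, given two candidates that tie on every rule except rule c, the non-deprecated address wins regardless of how they compare on the later rules.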
QoS
In an IPv6 header, the new Flow Label field specifies how to identify and process traffic.
The Flow Label field identifies a flow and allows a router to recognize packets in the
flow and to provide special processing.
QoS is guaranteed even for the packets encrypted with IPsec because the IPv6 header
can identify different types of flows.
Built-in security
An IPv6 packet contains a standard extension header related to IPsec, and therefore IPv6
can provide end-to-end security. This provides network security specifications and
improves interoperability between different IPv6 applications.
Fixed basic header
A fixed basic header helps improve forwarding efficiency.
Flexible extension header
An IPv4 header only supports the 40-byte Options field, whereas the size of the IPv6
extension header is limited only by the IPv6 packet size.
In IPv6, multiple extension headers are introduced to replace the Options field of the
IPv4 header. This improves packet processing efficiency, enhances IPv6 flexibility, and
provides better scalability for the IP protocol. Figure 1-704 shows an IPv6 extension
header.
When multiple extension headers are used in the same packet, the headers must be listed in
the following order:
IPv6 basic header
Hop-by-hop extension header
Destination options extension header
Routing extension header
Fragment extension header
Authentication extension header
Encapsulation security extension header
Destination options extension header (options to be processed at the destination)
Upper layer extension header
Not all extension headers must be examined and processed by routers. When a router
forwards packets, it determines whether or not to process the extension headers based on the
Next Header value in the IPv6 basic header.
The destination options extension header can appear twice in a packet: once before the
routing extension header and once before the upper-layer header. All other extension headers
appear only once.
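The Next Header chaining can be illustrated with a small parser. The protocol numbers below are the standard IANA values; the length handling follows the generic extension-header layout, in which the second byte gives the header length in 8-octet units, not counting the first 8 octets. The sample payload is constructed for illustration.

```python
# Standard IANA protocol numbers for IPv6 extension and upper-layer headers.
EXT_HEADERS = {0: "Hop-by-Hop Options", 43: "Routing", 44: "Fragment",
               50: "ESP", 51: "AH", 60: "Destination Options"}
UPPER_LAYER = {6: "TCP", 17: "UDP", 58: "ICMPv6"}

def walk_headers(first_next_header, payload):
    """Return the chain of headers reached by following Next Header values."""
    chain, nh, off = [], first_next_header, 0
    while nh in EXT_HEADERS:
        chain.append(EXT_HEADERS[nh])
        if nh == 50:                         # ESP: the rest is encrypted, stop
            return chain
        next_nh = payload[off]               # byte 0: next header
        off += (payload[off + 1] + 1) * 8    # byte 1: length in 8-octet units
        nh = next_nh
    chain.append(UPPER_LAYER.get(nh, "Unknown ({})".format(nh)))
    return chain

# A Hop-by-Hop header (8 bytes) followed by a Destination Options header
# (8 bytes) and then ICMPv6, matching the ordering rules above.
payload = bytes([60, 0] + [0] * 6 + [58, 0] + [0] * 6)
chain = walk_headers(0, payload)
print(chain)  # ['Hop-by-Hop Options', 'Destination Options', 'ICMPv6']
```

This also illustrates why a forwarding router can skip most extension headers: it only needs to read each Next Header byte to decide whether the header concerns it.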
1.9.10.2.3 ICMPv6
Internet Control Message Protocol for IPv6 (ICMPv6) is a basic IPv6 protocol that uses error
or informational messages to report errors and information generated during packet
processing. Figure 1-705 shows the ICMPv6 message format.
Neighbor Discovery
Similar to ARP in IPv4, IPv6 ND parses the neighbor addresses and detects the availability of
neighbors based on NS and NA messages.
When a node needs to obtain the link-layer address of another node on the same local link, it
sends an ICMPv6 type 135 NS message. An NS message is similar to an ARP request
message in IPv4, but it is destined for a solicited-node multicast address rather than a
broadcast address. Only a node whose address has the same last 24 bits as those encoded in
the multicast address receives the NS message, which reduces the possibility of broadcast
storms. The destination node fills its link-layer address in the NA message.
An NS message is also used to detect the availability of a neighbor when the link-layer
address of the neighbor is known. An NA message is the response to an NS message. After
receiving an NS message, a destination node responds with an ICMPv6 type 136 NA message
on the local link. After receiving the NA message, the source node can communicate with the
destination node. When the link-layer address of a node on the local link changes, the node
actively sends an NA message.
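The solicited-node multicast address targeted by an NS message is formed by appending the last 24 bits of the target address to the well-known prefix FF02::1:FF00:0/104. The following sketch computes it for an illustrative address:

```python
import ipaddress

SOLICITED_NODE_PREFIX = int(ipaddress.IPv6Address("ff02::1:ff00:0"))

def solicited_node(address):
    """Build the solicited-node multicast address for a unicast address."""
    low24 = int(ipaddress.IPv6Address(address)) & 0xFFFFFF  # last 24 bits
    return ipaddress.IPv6Address(SOLICITED_NODE_PREFIX | low24)

print(solicited_node("2001:db8::4567:89ab"))  # ff02::1:ff67:89ab
```

Because only the last 24 bits of the target address select the multicast group, only nodes sharing those bits process the NS message, which is the filtering effect described above.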
Router Discovery
Router discovery is used to locate a neighboring router and learn the address prefix and
configuration parameters related to address autoconfiguration. IPv6 router discovery is
implemented based on the following messages:
RS message
When a host is not configured with a unicast address, for example, when the system has
just started, it sends an RS message. An RS message helps the host rapidly perform
address autoconfiguration without waiting for the RA message that is periodically sent
by an IPv6 device. An RS message is of the ICMPv6 type 133.
RA message
Interfaces on each IPv6 device periodically send RA messages only when they are
enabled to do so. After a router receives an RS message from an IPv6 device on the local
link, the router responds with an RA message. An RA message is sent to the all-nodes
multicast address (FF02::1) or to the IPv6 unicast address of the node that sent the RS
message. An RA message is of the ICMPv6 type 134 and contains the following
information:
− Whether or not to use address autoconfiguration
− Supported autoconfiguration type: stateless or stateful
− One or more on-link prefixes (On-link nodes can perform address autoconfiguration
using these address prefixes.)
− Lifetime of the advertised on-link prefixes
− Whether the router sending the RA message can be used as a default router (If so,
the lifetime of the default router is also included, expressed in seconds.)
− Other information about the host, such as the hop limit and the MTU that specifies
the maximum size of the packet initiated by a host
After an IPv6 host on the local link receives an RA message, it extracts the preceding
information to obtain the updated default router list, prefix list, and other configurations.
Address Autoconfiguration
A router can notify hosts of how to perform address autoconfiguration using RA messages and
prefix flags. For example, the router can specify stateful or stateless address autoconfiguration
for the hosts.
When stateless address autoconfiguration is employed, a host uses the prefix information in a
received RA message and local interface ID to automatically form an IPv6 address, and sets
the default router according to the default router information in the message.
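With stateless autoconfiguration, a host commonly derives its 64-bit interface ID from the interface MAC address using the modified EUI-64 method (split the MAC in half, insert FF FE in the middle, and flip the universal/local bit), then combines it with the advertised /64 prefix. The prefix and MAC address below are illustrative:

```python
import ipaddress

def eui64_interface_id(mac):
    """Modified EUI-64: insert FF FE into the middle of the 48-bit MAC and
    flip the universal/local bit (0x02) of the first byte."""
    b = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    eui = bytes([b[0] ^ 0x02]) + b[1:3] + b"\xff\xfe" + b[3:6]
    return int.from_bytes(eui, "big")

def slaac_address(prefix, mac):
    """Combine an advertised /64 on-link prefix with the interface ID."""
    network = ipaddress.IPv6Network(prefix)
    return ipaddress.IPv6Address(int(network.network_address)
                                 | eui64_interface_id(mac))

print(slaac_address("2001:db8::/64", "00:1b:44:11:3a:b7"))
# 2001:db8::21b:44ff:fe11:3ab7
```

Note that real hosts may instead use randomized interface IDs for privacy; the EUI-64 derivation shown here is only one stateless option.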
To counter these threats, standard protocols specify security mechanisms to extend ND.
Standard protocols define Cryptographically Generated Addresses (CGAs), CGA option, and
Rivest Shamir Adleman (RSA) Signature option, which are used to ensure that the sender of
an ND message is the owner of the message's source address. Standard protocols also define
Timestamp and Nonce options to prevent replay attacks.
CGA: contains an IPv6 interface identifier that is generated from a one-way hash of the
public key and associated parameters.
CGA option: contains information used to verify the sender's CGA, including the public
key of the sender. CGA is used to authenticate the validity of source IP addresses carried
in ND messages.
RSA option: contains the hash value of the sender's public key and contains the digital
signature generated from the sender's private key and ND messages. RSA is used to
authenticate the completeness of ND messages and the identity of the ND message
sender.
For an attacker to use an address that belongs to an authorized node, the attacker must use the public key
of the authorized node for encryption. Otherwise, the receiver can detect the attempted attack after
checking the CGA option. Even if the attacker obtains the public key of the authorized node, the receiver
can still detect the attempted attack after checking the digital signature, which is generated from the
sender's private key.
Timestamp option: a 64-bit unsigned integer field containing a timestamp. The value
indicates the number of seconds since 00:00 UTC on January 1, 1970. This option protects
unsolicited advertisement messages and Redirect messages and ensures that the timestamp
of a recently received message is the latest.
Nonce option: contains a random number selected by the sender of a solicitation message.
This option prevents replay attacks during message exchange. For example, a sender
sends an NS message carrying the Nonce option and receives an NA message as a
response that also carries the Nonce option; the sender verifies the NA message based on
the Nonce option.
To reject insecure ND messages, an interface can have the IPv6 SEND function configured.
An ND message that meets any of the following conditions is insecure:
The received ND message does not carry the CGA or RSA option, which indicates that
the interface sending this message is not configured with a CGA.
The key length of the received ND message exceeds the length limit that the interface
supports.
The rate at which ND messages are received exceeds the system rate limit.
The time difference between the sent and received ND messages exceeds the time
difference allowed by the interface.
Because the router implementation complies with standard protocols, the key-hash field in the RSA
Signature option of ND packets is generated using the SHA-1 algorithm. SHA-1 has been proven
insufficiently secure.
IPv6 packets, which reduces transmission efficiency. If the source node uses the minimum
IPv6 MTU of 1280 bytes as the maximum fragment length, in most cases, the PMTU is
greater than the minimum IPv6 MTU of the link, and the fragments sent by a node are always
smaller than the PMTU. As a result, network resources are wasted. To resolve this problem,
the PMTU discovery mechanism is introduced.
PMTU Principles
PMTU discovery is the process of determining the minimum IPv6 MTU on the path between the
source and the destination. The PMTU discovery mechanism dynamically discovers
the PMTU of a path. When an IPv6 node has a large amount of data to send to another node,
the data is transmitted in a series of IPv6 fragments. When these fragments are of the
maximum length allowed in successful transmission from the source node to the destination
node, the fragment length is considered optimal and called PMTU.
A source node assumes that the PMTU of a path is the known IPv6 MTU of the first hop on
the path. If any packets sent on that path are too large to be forwarded, the transit node
discards these packets and returns an ICMPv6 Packet Too Big message to the source node.
The source node sets the PMTU of the path based on the IPv6 MTU carried in the received message.
When the PMTU learned by the source node is less than or equal to the actual PMTU, the
PMTU discovery process is complete. Before the process is complete,
ICMPv6 Packet Too Big messages may be repeatedly sent and received because there may
be links with smaller MTUs further along the path.
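The iterative behavior described above can be sketched as a simulation. The list of per-link MTUs is illustrative; each iteration models one round trip in which a transit node rejects an oversized packet and the source lowers its PMTU estimate.

```python
IPV6_MIN_MTU = 1280  # IPv6 requires every link to support at least 1280 bytes

def discover_pmtu(link_mtus):
    """Simulate PMTU discovery along a path of per-link IPv6 MTUs."""
    pmtu = link_mtus[0]                        # initial guess: first-hop MTU
    while True:
        for mtu in link_mtus:
            if pmtu > mtu:                     # too big for this link: the
                pmtu = max(mtu, IPV6_MIN_MTU)  # transit node reports its MTU
                break                          # source retries with smaller packets
        else:
            return pmtu                        # packets traversed the whole path

print(discover_pmtu([1500, 1400, 1300]))  # 1300
```

With the MTU sequence above, discovery takes two rounds: the estimate drops from 1500 to 1400, then from 1400 to 1300, matching the "repeatedly sent or received" behavior described in the text.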
Multiple applications, such as DNS, FTP, and Telnet, support the dual stack. The upper
layer applications, such as the DNS, use TCP or UDP as the transmission layer protocol
and prefer the IPv6 protocol stack rather than the IPv4 protocol stack as the network
layer protocol.
1. On the border router, IPv4/IPv6 dual stack is enabled, and an IPv6 over IPv4 tunnel is
configured.
2. After the border router receives a packet from the IPv6 network, if the destination
address of the packet is not the border router and the outbound interface of the next hop
is a tunnel interface, the border router appends an IPv4 header to the IPv6 packet to
encapsulate it as an IPv4 packet.
3. On the IPv4 network, the encapsulated packet is transmitted to the remote border router.
4. The remote border router receives the packet, removes the IPv4 header, and then sends
the decapsulated IPv6 packet to the remote IPv6 network.
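Step 2 can be sketched as prepending a minimal IPv4 header whose Protocol field is 41 (the standard value for IPv6 encapsulated in IPv4). The checksum is left as a zero placeholder, and the addresses are illustrative; a real border router computes the checksum and manages fragmentation and TTL.

```python
import socket
import struct

def encapsulate_6in4(ipv6_packet, src_v4, dst_v4):
    """Prepend a minimal IPv4 header (protocol 41 = IPv6-in-IPv4)."""
    total_len = 20 + len(ipv6_packet)
    hdr = struct.pack("!BBHHHBBH4s4s",
                      0x45, 0, total_len,   # version/IHL, TOS, total length
                      0, 0,                 # identification, flags/frag offset
                      64, 41, 0,            # TTL, protocol 41, checksum placeholder
                      socket.inet_aton(src_v4), socket.inet_aton(dst_v4))
    return hdr + ipv6_packet

# A minimal 40-byte IPv6 header (version nibble 6) as the inner packet.
pkt = encapsulate_6in4(b"\x60" + b"\x00" * 39, "192.0.2.1", "198.51.100.2")
print(len(pkt))  # 60
```

The remote border router in step 4 performs the inverse operation: it checks that the Protocol field is 41, strips the first 20 bytes, and forwards the inner IPv6 packet.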
IPv6 over IPv4 tunnels are classified into IPv6 over IPv4 manual tunnels and
IPv6-to-IPv4 (6to4) tunnels in different application scenarios.
The following describes the characteristics and applications of each.
IPv6-to-IPv4 Tunnel
A 6to4 tunnel can connect multiple isolated IPv6 sites through an IPv4 network. A 6to4 tunnel
can be a P2MP connection, whereas a manual tunnel is a P2P connection. Therefore, routers
on both ends of a 6to4 tunnel do not need to be configured in pairs.
A 6to4 tunnel uses a special IPv6 address, a 6to4 address in the format of 2002:IPv4
address:subnet ID:interface ID. A 6to4 address has a 48-bit prefix composed of 2002:IPv4
address. The IPv4 address is the globally unique IPv4 address applied for by an isolated IPv6 site.
This IPv4 address must be configured on the physical interfaces connecting the border routers
between IPv6 and IPv4 networks to the IPv4 network. The IPv6 address has a 16-bit subnet
ID and a 64-bit interface ID, which are assigned by users in the isolated IPv6 site.
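Deriving the 48-bit 6to4 route prefix from an IPv4 address is a simple calculation, shown in the following sketch. The well-known 6to4 relay anycast IPv4 address 192.88.99.1 yields exactly the anycast prefix discussed below.

```python
import ipaddress

def to_6to4_prefix(ipv4):
    """Derive the 2002:<IPv4>::/48 route prefix from an IPv4 address."""
    v4 = int(ipaddress.IPv4Address(ipv4))
    return ipaddress.IPv6Network(
        "2002:{:04x}:{:04x}::/48".format(v4 >> 16, v4 & 0xFFFF))

print(to_6to4_prefix("192.88.99.1"))  # 2002:c058:6301::/48
```

The 16-bit subnet ID and 64-bit interface ID assigned within the site then fill the remaining 80 bits of each 6to4 address.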
When a 6to4 tunnel is used for communication between a 6to4 network and a native
IPv6 network, you can configure an anycast address with the prefix 2002:c058:6301::/48 on
the tunnel interface of the 6to4 relay device.
The difference between a 6to4 address and an anycast address is as follows:
If a 6to4 address is used, you must configure different addresses for tunnel interfaces of
all devices.
If an anycast address is used, you must configure the same address for the tunnel
interfaces of all devices, effectively reducing the number of addresses.
A 6to4 network refers to a network on which all nodes are configured with 6to4 addresses. A
native IPv6 network refers to a network on which nodes do not need to be configured with
6to4 addresses. A 6to4 relay is required for communication between 6to4 networks and native
IPv6 networks.
1.9.10.2.8 TCP6
Transmission Control Protocol version 6 (TCP6) provides a mechanism to establish virtual
circuits between processes on two endpoints. A TCP6 virtual circuit is similar to a
full-duplex circuit that transmits data between systems. TCP6 provides reliable data
transmission between processes and is therefore known as a reliable protocol. TCP6 also provides a
mechanism to optimize transmission performance according to the network status. When all
data can be received and acknowledged, the transmission rate increases gradually. Delay
causes the sending host to reduce the sending rate before it receives Acknowledgement
packets.
TCP6 is generally used in interactive applications, such as the web application. Certain errors
in data receiving affect the normal operation of devices. TCP6 establishes virtual circuits
using the three-way handshake mechanism, and all virtual circuits are deleted through the
four-way handshake. TCP6 connections provide multiple checksums and reliability functions,
but increase cost. As a result, TCP6 has lower efficiency than User Datagram Protocol version
6 (UDP6).
Figure 1-709 shows the establishment and teardown of a TCP6 connection.
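A minimal loopback exchange illustrates the lifecycle: the three-way handshake happens inside `connect()`/`accept()`, and the teardown inside `close()`. This sketch assumes an IPv6 loopback (::1) is available on the host.

```python
import socket
import threading

def echo_once(server):
    # Accept one connection and echo the first message back.
    conn, _ = server.accept()
    conn.sendall(conn.recv(64))
    conn.close()

# AF_INET6 selects the IPv6 protocol stack.
server = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
server.bind(("::1", 0))            # IPv6 loopback, ephemeral port
server.listen(1)
worker = threading.Thread(target=echo_once, args=(server,))
worker.start()

client = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
client.connect(("::1", server.getsockname()[1]))  # three-way handshake
client.sendall(b"hello over TCP6")
reply = client.recv(64)
client.close()                     # connection teardown
worker.join()
server.close()
print(reply)  # b'hello over TCP6'
```

The reliability and flow-control mechanisms described above operate transparently underneath these calls; the application only sees an ordered byte stream.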
1.9.10.2.9 UDP6
User Datagram Protocol version 6 (UDP6) is a computer communications protocol used to
exchange packets on a network. UDP6 has the following characteristics:
UDP6 only uses source and destination information and is mainly used in simple
request/response exchanges.
UDP6 is unreliable because no control mechanism is available to ensure that
UDP6 datagrams reach their destinations.
UDP6 is connectionless, meaning that no virtual circuits are required during data
transmission between hosts.
The connectionless feature of UDP6 enables it to send data to multicast addresses. This is
different from TCP6, which requires specific source and destination addresses.
1.9.10.2.10 RawIP6
RawIP6 fills only a limited number of fields in the IPv6 header, and allows application
programs to provide their own IPv6 headers.
RawIP6 is similar to UDP6 in the following aspects:
1.10 IP Routing
1.10.1 About This Document
Purpose
This document describes the IP Routing feature in terms of its overview, principles, and
applications.
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
Network planning engineers
Commissioning engineers
Data configuration engineers
System maintenance engineers
Security Declaration
Encryption algorithm declaration
The encryption algorithms DES/3DES/SKIPJACK/RC2/RSA (RSA-1024 or
lower)/MD2/MD4/MD5 (in digital signature scenarios and password encryption)/SHA1
(in digital signature scenarios) provide low security and may bring security risks. If the
protocols allow it, using more secure encryption algorithms, such as AES/RSA
(RSA-2048 or higher)/SHA2/HMAC-SHA2, is recommended.
Password configuration declaration
− Do not set both the start and end characters of a password to "%^%#". This causes
the password to be displayed directly in the configuration file.
− To further improve device security, periodically change the password.
Special Declaration
This document serves only as a guide. The content is written based on device
information gathered under lab conditions. The content provided by this document is
intended to be taken as general guidance, and does not cover all scenarios. The content
provided by this document may be different from the information on user device
interfaces due to factors such as version upgrades and differences in device models,
board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are
beyond the scope of this document.
The maximum values provided in this document are obtained in specific lab
environments (for example, only a certain type of board or protocol is configured on a
tested device). The actually obtained maximum values may be different from the
maximum values provided in this document due to factors such as differences in
hardware configurations and carried services.
Interface numbers used in this document are examples. Use the existing interface
numbers on devices for configuration.
The pictures of hardware in this document are for reference only.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
Indicates an imminently hazardous situation which, if not
avoided, will result in death or serious injury.
Change History
Updates between document issues are cumulative. Therefore, the latest document issue
contains all updates made in previous issues.
Changes in Issue 03 (2017-09-20)
This issue is the third official release. The software version of this issue is
V800R009C10SPC200.
Changes in Issue 02 (2017-07-30)
This issue is the second official release. The software version of this issue is
V800R009C10SPC100.
Changes in Issue 01 (2017-05-30)
This issue is the first official release. The software version of this issue is
V800R009C10.
Definition
As a basic concept on data communication networks, routing is the process of relaying or
forwarding packets, and it provides route information for packet forwarding.
Purpose
During data forwarding, routers, routing tables, and routing protocols are indispensable.
Routing protocols are used to discover routes and contribute to the generation of routing
tables. Routing tables store the routes discovered by various routing protocols, and routers
select routes and implement data forwarding.
1.10.2.2 Principles
1.10.2.2.1 Routers
On the Internet, network connection devices control network traffic and ensure data
transmission quality on networks. Common network connection devices include hubs, bridges,
switches, and routers.
As a standard network connection device, a router is used to select routes and forward packets.
Based on the destination address in the received packet, a router selects a path to send the
packet to the next router. The last router is responsible for sending the packet to the
destination host. In addition, a router can select an optimal path for data transmission.
For example, in Figure 1-710, traffic from Host A to Host C needs to pass through three
networks and two routers. The hop count from a router to its directly connected network is
zero. The hop count from a router to a network that the router can reach through another
router is one, and so on. If a router is connected to another router
through a network, a network segment exists between the two routers, and they are considered
adjacent on the Internet. In Figure 1-710, the bold arrows indicate network segments. The
routers do not need to know about the physical link composition of each network segment.
Network sizes may vary greatly, and the actual lengths of network segments vary as well.
Therefore, you can set a weighted coefficient for the network segments of each network and
then measure the cost of a route based on the number of network segments.
A route with the minimal network segments is not necessarily optimal. For example, a route
passing through three high-speed Local Area Network (LAN) network segments may be a
better choice than one passing through two low-speed Wide Area Network (WAN) network
segments.
Each router that supports Layer 3 Virtual Private Network (L3VPN) maintains a management routing
table (local core routing table) for each VPN instance.
The Preference is used during the selection of routes discovered by different routing protocols, whereas
the Cost is used during the selection of routes discovered by the same routing protocol.
Flags:
Route flag:
− R: indicates an iterated route.
− D: indicates a route that is downloaded to the FIB.
− T: indicates a route whose next hop belongs to a VPN instance.
− B: indicates a black-hole route.
Next hop: indicates the IP address of the next router through which an IP packet passes.
Interface: indicates the outbound interface that forwards an IP packet.
Based on the destination addresses, routes can be classified into the following types:
Network segment route: The destination is a network segment.
Host route: The destination is a host.
In addition, based on whether the destination is directly connected to the router, route types
are as follows:
Route Priority
Dynamic routes of different routing protocols and static routes may have the same destination,
but not all these routes are optimal. At a certain moment, only one routing protocol determines
the current route to a certain destination. To select the optimal route, each routing protocol
and static route are configured with priorities, and the route with the highest priority becomes
the optimal route. Table 1-178 lists routing protocols and their default priorities.
In Table 1-178, 0 indicates a direct route, and 255 indicates any route learned from unreliable
sources. The smaller the value, the higher the priority.
Routing Protocol    Default Priority
Direct              0
OSPF                10
IS-IS               15
Static              60
RIP                 100
OSPF ASE            150
OSPF NSSA           150
IBGP                255
EBGP                255
The priorities of routing protocols can be configured, except for direct routes. In addition, the
priority of each static route can be different.
The NE20E defines external and internal priorities. The external priority refers to the priority
set by users for each routing protocol. Table 1-178 lists the default external priorities.
When different routing protocols are configured with the same priority, the system selects the
optimal route based on the internal priority. For the internal priority of each routing protocol,
see Table 1-179.
Routing Protocol    Internal Priority
Direct              0
OSPF                10
IS-IS Level-1       15
IS-IS Level-2       15
Static              60
RIP                 100
OSPF ASE            150
For example, two routes, an OSPF route and a static route, can reach 10.1.1.0/24, and the
priorities of the two routes are set to 5. In this case, the NE20E selects the optimal route based
on the internal priorities listed in Table 1-179. The internal priority of OSPF (10) is higher
than that of the static route (60). Therefore, the system selects the route discovered by OSPF
as the optimal route.
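The two-level comparison in this example can be sketched as a sort over (external priority, internal priority) pairs, where a smaller value means a higher priority. The default priority values come from the tables above; the candidate data model is illustrative.

```python
# Default external and internal priorities (smaller value = higher priority).
EXTERNAL_DEFAULT = {"Direct": 0, "OSPF": 10, "IS-IS": 15, "Static": 60,
                    "RIP": 100, "OSPF ASE": 150, "OSPF NSSA": 150,
                    "IBGP": 255, "EBGP": 255}
INTERNAL = {"Direct": 0, "OSPF": 10, "IS-IS": 15, "Static": 60,
            "RIP": 100, "OSPF ASE": 150}

def select_route(candidates):
    """Pick the optimal route among (protocol, external_priority) pairs.

    A None external priority means the user did not configure one, so the
    protocol's default applies. Ties on the external priority fall back to
    the internal priority.
    """
    def key(c):
        protocol, external = c
        if external is None:
            external = EXTERNAL_DEFAULT[protocol]
        return (external, INTERNAL.get(protocol, 255))
    return min(candidates, key=key)

# Both routes to 10.1.1.0/24 are configured with external priority 5;
# OSPF wins on the internal priority (10 < 60).
print(select_route([("OSPF", 5), ("Static", 5)]))  # ('OSPF', 5)
```

With default priorities, a static route (60) would instead beat a RIP route (100) on the external comparison alone, without consulting the internal table.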
Definition
Priority-based route convergence is an important technology that improves network reliability.
It provides faster route convergence for key services. For example, to minimize the
interruption of key services in case of network faults, real-time multicast services require that
the routes to the multicast source quickly converge, and the Multiprotocol Label Switching
(MPLS) VPN bearer network requires that routes between PEs also quickly converge.
Convergence priorities provide references for the system to converge routes for service
forwarding. Different routes can be assigned different convergence priorities, which are,
in descending order, critical, high, medium, and low.
Purpose
With the integration of network services, requirements on service differentiation increase.
Carriers require that the routes for key services, such as Voice over IP (VoIP) and video
conferencing services, converge faster than those for common services. Therefore, routes need
to converge based on their convergence priorities to improve network reliability.
For VPN route priorities, only 32-bit host routes of OSPF and IS-IS are identified as medium, and the
other routes are identified as low.
Applications
Figure 1-712 shows networking for multicast services. An IGP runs on the network; Device A
is the receiver, and Device B is the multicast source server with IP address 10.10.10.10/32.
The route to the multicast source server is required to converge faster than other routes, such
as 12.10.10.0/24. In this case, you can set a higher convergence priority for 10.10.10.10/32
than that of 12.10.10.0/24. Then, when routes converge on the network, the route to the
multicast source server 10.10.10.10/32 converges first, ensuring the transmission of multicast
services.
Load Balancing
The NE20E supports the multi-route model (multiple routes with the same destination and
priority). Routes discovered by one routing protocol with the same destination and cost can
load-balance traffic. In each routing protocol view, you can run the maximum
load-balancing number command to perform load balancing. Load balancing can work
per-destination or per-packet.
The number of equal-cost routes for load balancing varies with products.
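Per-destination load balancing can be sketched as follows. This is a conceptual illustration, not NE20E forwarding code; the hash function and path cap are assumptions (the cap corresponds to the value set with the maximum load-balancing command):

```python
# Sketch of per-destination load balancing over equal-cost routes:
# a hash of the destination picks one of the next hops, so all
# packets to one destination take the same path.
import zlib

def pick_next_hop(dst_ip, next_hops, max_paths=4):
    paths = next_hops[:max_paths]   # cap set by 'maximum load-balancing'
    idx = zlib.crc32(dst_ip.encode()) % len(paths)
    return paths[idx]

hops = ["192.0.2.1", "192.0.2.2"]
# Same destination -> same next hop every time (per-destination mode).
assert pick_next_hop("10.10.10.10", hops) == pick_next_hop("10.10.10.10", hops)
```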
Route Backup
The NE20E supports route backup to improve network reliability. You can configure multiple
routes to the same destination as required. The route with the highest priority functions as the
primary route, and the other routes with lower priorities function as backup routes.
In most cases, the NE20E uses the primary route to forward packets. If the link used by the
primary route fails, the primary route becomes inactive, and the NE20E selects the backup
route with the highest priority to forward packets; a primary-to-backup switchover occurs.
When the original primary route recovers, the NE20E reselects the optimal route. Because the
original primary route has the highest priority, the NE20E selects it again to send packets, and
traffic switches back from the backup route to the primary route.
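The primary/backup selection and switchback behavior can be sketched as follows (a minimal model, not NE20E code; the priority values are illustrative):

```python
# Sketch of route backup: the active route is the highest-priority
# route whose link is up; when the primary fails the backup takes
# over, and traffic switches back on recovery.
def active_route(routes):
    """routes: dicts with 'priority' (smaller = higher) and 'up'."""
    usable = [r for r in routes if r["up"]]
    return min(usable, key=lambda r: r["priority"]) if usable else None

primary = {"name": "primary", "priority": 10, "up": True}
backup = {"name": "backup", "priority": 60, "up": True}

assert active_route([primary, backup])["name"] == "primary"
primary["up"] = False                       # primary link fails
assert active_route([primary, backup])["name"] == "backup"
primary["up"] = True                        # link recovers: switchback
assert active_route([primary, backup])["name"] == "primary"
```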
Overview
Fast Reroute (FRR) functions when the lower layer (physical layer or data link layer) detects a
fault. The lower layer reports the fault to the upper layer routing system and immediately
forwards packets through a backup link.
If a link fails, FRR helps reduce the impact of the link failure on services transmitted on the
link.
Background
On traditional IP networks, when a fault occurs at the lower layer of the forwarding link, the
physical interface on the router goes Down. After the router detects the fault, it instructs the
upper layer routing system to recalculate routes and then update routing information. The
routing system takes several seconds to reselect an available route.
For services that are sensitive to packet loss and delay, a convergence time of several seconds
is intolerable because it may lead to service interruptions. For example, the maximum
convergence time tolerable for Voice over IP (VoIP) services is measured in milliseconds. IP FRR
enables the forwarding system to detect a fault and then to take measures to restore services as
soon as possible.
The static routes that are imported between public and private networks do not support IP FRR.
If the forwarding engine detects that the primary link is unavailable after IP FRR between
different protocols is enabled, the system can use the backup link to forward traffic before the
routes converge on the control plane.
Feature Description
IP FRR: Implements FRR through a backup route. IP FRR is applicable to networks where a
primary link and a backup link exist and load balancing is not configured.
Load balancing: Implements fast route switching through equal-cost routes and applies to
multi-link networking with load balancing.
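The FRR idea, that the forwarding plane holds a precomputed backup next hop and switches before the control plane reconverges, can be sketched as follows (a conceptual model with illustrative names, not NE20E internals):

```python
# Sketch of FRR: each FIB entry carries a precomputed backup next
# hop, so a link-down event switches traffic immediately, without
# waiting for route recomputation on the control plane.
fib = {"10.10.10.10/32": {"primary": "linkA", "backup": "linkB"}}
link_up = {"linkA": True, "linkB": True}

def forward(prefix):
    entry = fib[prefix]
    return entry["primary"] if link_up[entry["primary"]] else entry["backup"]

assert forward("10.10.10.10/32") == "linkA"
link_up["linkA"] = False                     # lower layer detects the fault
assert forward("10.10.10.10/32") == "linkB"  # immediate switch to backup
```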
Definition
Indirect next hop is a technique used to speed up route convergence. This technique can
change the direct association between route prefixes and next hop information into an indirect
association. Indirect next hop allows next hop information to be refreshed independently of
the prefixes of the same next hop, which speeds up route convergence.
Purpose
In the scenario requiring route iteration, when IGP routes or tunnels are switched, forwarding
entries are rapidly refreshed, which implements fast route convergence and reduces the impact
of route or tunnel switching on services.
Iteration Policy
An iteration policy is used to control the iteration result of the next hop to meet requirements
of different scenarios. In route iteration, behaviors do not need to be controlled by the
iteration policy. Instead, iteration behaviors only need to comply with the longest match rule.
In addition, the iteration policy needs to be applied only when VPN routes are iterated to
tunnels.
By default, the system selects Label Switched Paths (LSPs) for VPNs without performing
load balancing. If load balancing or other types of tunnels are required, configure a tunnel
policy and bind it to a tunnel. After the tunnel policy is applied, the system uses the tunnel
bound to the tunnel policy or selects a tunnel based on the priorities specified in the tunnel
policy during next hop iteration.
As shown in Figure 1-714, without indirect next hop, prefixes are totally independent, each
corresponding to its next hop and forwarding information. When a dependent route changes,
the next hop corresponding to each prefix is iterated and forwarding information is updated
based on the prefix. In this case, the convergence time is decided by the number of prefixes.
Note that prefixes of a BGP peer have the same next hop, forwarding information, and
refreshed forwarding information.
As shown in Figure 1-715, with indirect next hop, prefixes of routes from the same BGP peer
share the same next hop. When a dependent route changes, only the shared next hop is
iterated and forwarding information is updated based on the next hop. In this case, routes of
all prefixes can converge at a time. Therefore, the convergence time is irrelevant to the
number of prefixes.
In Figure 1-716, an IBGP peer relationship is established between Device A and Device D.
The IBGP peer relationship is established between two loopback interfaces on the routers, but
the next hop cannot be used to guide packet forwarding, because it is not directly reachable.
Therefore, to refresh the forwarding table and guide packet forwarding, the system needs to
search for the actual outbound interface and directly connected next hop based on the original
IBGP next hop.
Device D receives 100,000 routes from Device A. These routes have the same original BGP
next hop. After being iterated, these routes eventually follow the same IGP path (A->B->D).
If the IGP path (A->B->D) fails, these IBGP routes do not need to be iterated separately, and
the relevant forwarding entries do not need to be refreshed one by one. Note that only the
shared next hop needs to be iterated and refreshed. Consequently, these IBGP routes converge
to the path (A->C->D) on the forwarding plane. Therefore, the convergence time depends on
only the number of next hops, not the number of prefixes.
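The benefit of the shared next-hop object can be sketched as follows (an illustration of the data structure, not device code; addresses and counts are illustrative):

```python
# Sketch of indirect next hop: many prefixes reference one shared
# next-hop object, so a path change updates one object instead of
# refreshing every per-prefix forwarding entry.
shared_nhop = {"path": "A->B->D"}            # single shared object
fib = {f"10.0.{i}.0/24": shared_nhop for i in range(256)}  # all point to it

shared_nhop["path"] = "A->C->D"              # IGP path fails: one update
# Every prefix now forwards over the new path without per-prefix work.
assert all(entry["path"] == "A->C->D" for entry in fib.values())
```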
If Device A and Device D establish a multi-hop EBGP peer relationship, the convergence
procedure is the same as the preceding one. Indirect next hop also applies to the iteration of a
multi-hop EBGP route.
In Figure 1-717, a neighbor relationship is established between PE1 and PE2, and PE2
receives 100,000 VPN routes from PE1. These routes have the same original BGP next hop.
After being iterated, these VPN routes eventually follow the same public network tunnel
(tunnel 1). If tunnel 1 fails, these routes do not need to be iterated separately, and the relevant
forwarding entries do not need to be refreshed one by one. Note that only the shared next hop
needs to be iterated, and the relevant forwarding entries need to be refreshed. Consequently,
these VPN routes converge to tunnel 2 on the forwarding plane. In this manner, the
convergence time depends on only the number of next hops, not the number of prefixes.
1.10.2.2.14 Multi-Topology
Multi-Topology Overview
On a traditional IP network, only one unicast topology exists, and only one unicast forwarding
table is available on the forwarding plane. Services transmitted from one router to the same
destination address are therefore forced to share the same next hop, and various end-to-end
services, such as voice and data services, must share the same physical links. As a result, some
links may become heavily congested while others remain relatively idle. To address this
problem, configure multi-topology to divide a physical network into different logical
topologies for different services.
By default, the base topology is created on the public network. The class-specific topology
can be added or deleted in the public network address family view. Each topology contains its
own routing table. The class-specific topology supports the addition, deletion, and import of
protocol routes.
The base topology cannot be deleted.
Background
A VRRP backup group is configured on Device1 and Device2 on the network shown in Figure
1-718. Device1 is a master device, whereas Device2 is a backup device. The VRRP backup
group serves as a gateway for users. User-to-network traffic travels through Device1.
However, network-to-user traffic may travel through Device1, Device2, or both of them over
a path determined by a dynamic routing protocol. Therefore, user-to-network traffic and
network-to-user traffic may travel along different paths, which interrupts services if firewalls
are attached to devices in the VRRP backup group, complicates traffic monitoring or statistics
collection, and increases costs.
To address the preceding problems, the routing protocol is expected to select a route passing
through the master device so that the user-to-network and network-to-user traffic travels along
the same path. Association between direct routes and a VRRP backup group can meet
expectations by allowing the dynamic routing protocol to select a route based on the VRRP
status.
Figure 1-718 Association between direct routes and a VRRP backup group
Related Concepts
VRRP is a widely used fault-tolerant protocol that groups multiple routing devices into a
backup group, improving network reliability. A VRRP backup group consists of a master
device and one or more backup devices. If the master device fails, the VRRP backup group
switches services to a backup device to ensure communication continuity and reliability.
A device in a VRRP backup group operates in one of three states:
Master: If a network is working correctly, the master device transmits all services.
Backup: If the master device fails, the VRRP backup group selects a backup device as
the new master device to take over traffic and ensure uninterrupted service transmissions.
Initialize: A device in the Initialize state is waiting for an interface Startup message to
switch its status to Master or Backup.
For details about VRRP, see HUAWEI NE20E-S2 Universal Service Router Feature Description -
Network Reliability - VRRP.
Implementation
Association between direct routes and a VRRP backup group allows VRRP interfaces to
adjust the costs of direct network segment routes based on the VRRP status. The direct route
with the master device as the next hop has the lowest cost. A dynamic routing protocol
imports the direct routes and selects the direct route with the lowest cost. For example, VRRP
interfaces on Device1 and Device2 on the network shown in Figure 1-718 are configured with
association between direct routes and the VRRP backup group. The implementation is as
follows:
Device1 in the Master state sets the cost of its route to the directly connected virtual IP
network segment to 0 (default value).
Device2 in the Backup state increases the cost of its route to the directly connected
virtual IP network segment.
A dynamic routing protocol selects the route with Device1 as the next hop because this route
costs less than the other route. Therefore, both user-to-network and network-to-user traffic
travels through Device1.
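The cost adjustment can be sketched as follows (a minimal model of the behavior described above; the increased cost value is an illustrative assumption):

```python
# Sketch of associating direct routes with VRRP status: the VRRP
# interface raises the cost of the direct network-segment route when
# the device is Backup, so the IGP prefers the Master's route.
def direct_route_cost(vrrp_state, increased_cost=100):
    return 0 if vrrp_state == "Master" else increased_cost

costs = {"Device1": direct_route_cost("Master"),
         "Device2": direct_route_cost("Backup")}
best = min(costs, key=costs.get)   # the route the IGP selects
assert best == "Device1"           # both traffic directions go via the master
```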
Usage Scenario
When a data center is used, firewalls are attached to devices in a VRRP backup group to
improve network security. Network-to-user traffic cannot pass through a firewall if it travels
over a path different from the one used by user-to-network traffic.
When an IP radio access network (RAN) is configured, VRRP is configured to set the
master/backup status of aggregation site gateways (ASGs) and radio service gateways (RSGs).
Network-to-user and user-to-network traffic may pass through different paths, complicating
network operation and management.
Association between direct routes and a VRRP backup group can address the preceding
problems by ensuring the user-to-network and network-to-user traffic travels along the same
path.
1.10.2.2.16 Direct Routes Responding to L3VE Interface Status Changes After a Delay
Background
In Figure 1-719, a Layer 2 virtual private network (VPN) connection is set up between each
AGG and the CSG through L2 virtual Ethernet (VE) interfaces, and BGP VPNv4 peer
relationships are set up between the AGGs and RSGs on an L3VPN. L3VE interfaces are
configured on the AGGs, and VPN instances are bound to the L3VE interfaces so that the
CSG can access the L3VPN. BGP is configured on the AGGs to import direct routes between
the CSG and AGGs. The AGGs convert these direct routes to BGP VPNv4 routes before
advertising them to the RSGs.
AGG1 functions as the master device in Figure 1-719. In most cases, the RSGs select routes
advertised by AGG1, and traffic travels along Link A. If AGG1 or the CSG-AGG1 link fails,
traffic switches over to Link B. After AGG1 or the CSG-AGG1 link recovers, the L3VE
interface on AGG1 goes from Down to Up, and AGG1 immediately generates a direct route
destined for the CSG and advertises the route to the RSGs. Downstream traffic then switches
over to Link A. However, AGG1 has not learned the MAC address of the NodeB yet. As a
result, downstream traffic is lost.
To address this problem, configure the direct route to respond to L3VE interface status
changes after a delay. After you configure the delay, the RSG preferentially selects routes
advertised by AGG1 only after AGG1 learns the MAC address of the NodeB.
Figure 1-719 Networking for the direct route responding to L3VE interface status changes after a
delay
Implementation
After you configure the direct route to respond to L3VE interface status changes after a delay,
the cost of the direct route between the CSG and AGG1 is modified to the configured cost
(greater than 0) when the L3VE interface on AGG1 goes from Down to Up. After the
configured delay expires, the cost of the direct route to the CSG restores to the default value 0.
Because BGP has imported the direct route and has advertised it to RSGs, the cost value
determines whether RSGs preferentially select the direct route.
RSGs preferentially transmit traffic over Link B before AGG1 has learned the MAC address
of the NodeB, which reduces traffic loss.
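The delayed restoration can be sketched as follows (a conceptual model; the delay and cost values are illustrative, and on the device both are configured explicitly):

```python
# Sketch of delayed cost restoration: after the L3VE interface goes
# Up, the direct route keeps a raised cost until the delay expires,
# by which time AGG1 has learned the NodeB's MAC address.
def direct_route_cost(seconds_since_up, delay=10, raised_cost=100):
    return raised_cost if seconds_since_up < delay else 0

assert direct_route_cost(seconds_since_up=2) == 100   # RSGs still prefer Link B
assert direct_route_cost(seconds_since_up=15) == 0    # traffic returns to Link A
```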
Usage Scenario
This feature applies to IP radio access networks (RANs) on which an L2VPN accesses an
L3VPN.
Background
In Figure 1-720, PWs are set up between the AGGs and the CSG. BGP virtual private network
version 4 (VPNv4) peer relationships are set up between the AGGs and RSGs. Layer 3 virtual
Ethernet (L3VE) interfaces are configured on the AGGs, and VPN instances are bound to the
L3VE interfaces so that the CSG can access the L3VPN. BGP is configured on the AGGs to
import direct routes between the CSG and AGGs. The AGGs convert these direct routes to
BGP VPNv4 routes before advertising them to the RSGs.
AGG1 functions as the master device in Figure 1-720. In most cases, the RSGs select routes
advertised by AGG1, and traffic travels along Link A. If AGG1 or the CSG-AGG1 link fails,
traffic switches over to Link B. After AGG1 or the CSG-AGG1 link recovers, the L3VE
interface on AGG1 goes from Down to Up, and AGG1 immediately generates a direct route
destined for the CSG and advertises the route to the RSGs. Downstream traffic then switches
over to Link A. However, PW1 is on standby. As a result, downstream traffic is lost.
To address this problem, associate the direct route and PW status. After the association is
configured, the RSG preferentially selects the direct route only after PW1 becomes active.
Figure 1-720 Networking for the association between the direct route and PW status
Implementation
Configuring the association between the direct route and PW status allows a VE interface to
adjust the cost value of the direct route based on PW status. The cost value determines
whether the RSGs preferentially select the direct route because BGP has imported the direct
route and has advertised it to RSGs. For example, if you associate the direct route and PW
status on the network shown in Figure 1-720, the implementation is as follows:
When PW1 becomes active, the cost value of the direct route between the CSG and
AGG1 restores to the default value 0. RSGs preferentially transmit traffic over Link A.
When PW1 is on standby, the cost value of the direct route between the CSG and AGG1
is modified to a configured value (greater than 0). RSGs preferentially transmit traffic
over Link B, which reduces traffic loss.
Usage Scenario
This feature applies to IP radio access networks (RANs) on which primary/secondary PWs are
configured between the CSG and AGGs.
Background
By default, IPv4 Address Resolution Protocol (ARP) Vlink direct routes or IPv6 Neighbor
Discovery Protocol (NDP) Vlink direct routes are only used for packet forwarding in the same
VLAN and cannot be imported to dynamic routing protocols. This is because importing Vlink
direct routes to dynamic routing protocols will increase the number of routing entries and
affect routing table stability. In some cases, some operations need to be performed based on
Vlink direct routes of VLAN users. For example, different VLAN users use different route
exporting policies to guide traffic from the remote device. In this scenario, ARP or NDP Vlink
direct routes need to be imported by a dynamic routing protocol and advertised to the
remote device. After advertisement of ARP or NDP Vlink direct routes is enabled, these direct
routes can be imported by a dynamic routing protocol (IGP or BGP) and advertised to the
remote device.
Related Concepts
ARP Vlink direct routes: routing entries that record the physical interfaces of VLAN users and
are used to forward IP packets. These physical interfaces are learned using ARP. On networks with
VLANs, IP packets can be forwarded only by physical interfaces rather than VLANIF
interfaces because VLANIF interfaces are logical interfaces that consist of multiple physical
interfaces.
NDP Vlink direct routes: routing entries carrying IPv6 addresses of VLAN users' physical
interfaces. These IPv6 addresses are learned and resolved using NDP.
Implementation
On the network shown in Figure 1-721, Device A, Device B, and Device C are connected to
the VLANIF interface of Device D which is a Border Gateway Protocol (BGP) peer of Device
E. However, Device E needs to communicate only with Device B rather than Device A and
Device C. In this scenario, Vlink direct route advertisement must be enabled on Device D.
Then Device D obtains each physical interface of Device A, Device B, and Device C, uses a
routing policy to filter out network segment routes and routes destined for Device A and
Device C, and advertises the route destined for Device B to Device E.
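The export behavior on Device D can be sketched as follows (an illustration of the routing-policy effect; the addresses are hypothetical):

```python
# Sketch of Device D's export policy: advertise only the ARP Vlink
# host route for Device B, filtering out the VLAN network segment
# route and the host routes for Device A and Device C.
vlink_routes = {
    "DeviceA": "192.168.1.1/32",
    "DeviceB": "192.168.1.2/32",
    "DeviceC": "192.168.1.3/32",
}
segment_route = "192.168.1.0/24"

def export_policy(route):
    """Permit only the host route destined for Device B."""
    return route == vlink_routes["DeviceB"]

candidates = list(vlink_routes.values()) + [segment_route]
advertised = [r for r in candidates if export_policy(r)]
assert advertised == ["192.168.1.2/32"]   # only Device B's route reaches Device E
```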
Usage Scenario
Vlink direct route advertisement is applicable to networks in which a device needs to add
Vlink direct routes with physical interfaces of VLAN users to the routing table of a dynamic
routing protocol before advertising the routes to remote ends.
Advantages
With Vlink direct route advertisement, a device can add Vlink direct routes to the routing
table of a dynamic routing protocol (such as an Interior Gateway Protocol or BGP) and then
use different export policies to advertise routes required by remote ends.
1.10.2.3 Applications
1.10.2.3.1 Typical Application of IP FRR
In Figure 1-722, CE1 is dual-homed to PE1 and PE2. CE1 is configured with two outbound
interfaces and two next hops. Link B functions as the backup of link A. If link A fails, traffic
can be rapidly switched to link B.
1.10.2.3.2 Data Center Applications of Association Between Direct Routes and a VRRP
Backup Group
Service Overview
A data center, used for service access and transmission, consists of many servers, disk arrays,
security devices, and network devices that store and process a great number of services and
applications. Firewalls are used to improve data security, and VRRP backup groups are
configured to improve communication reliability. VRRP may cause user-to-network traffic
and network-to-user traffic to travel along different paths, and as a result, the firewall may
discard the network-to-user traffic because of path inconsistency. To address this problem,
association between direct routes and a VRRP backup group must be configured.
Networking Description
Figure 1-723 shows a data center network. A server functions as a core service module in the
data center. A VRRP backup group protects data exchanged between the server and core
devices, improving service security. Firewalls are attached to devices in the VRRP backup
group to improve network security.
Feature Deployment
The master device transmits server traffic to a core device. When the core device attempts to
send traffic to the server, the traffic can only pass through a firewall attached to the master
device. On the network shown in Figure 1-723, the server sends data destined for the core
device through the master device, and the core device sends data destined for the server along
a path that an Interior Gateway Protocol (IGP) selects. The association between the direct
routes and a VRRP backup group can be configured on switch A and switch B so that the IGP
selects a route based on VRRP status. The IGP forwards core-device-to-server traffic over the
same path as the one over which server-to-core-device traffic is transmitted, which prevents
the firewall from discarding traffic.
Service Overview
NodeBs and radio network controllers (RNCs) on an IP radio access network (RAN) do not
have dynamic routing capabilities. Therefore, static routes must be configured to allow
NodeBs to communicate with aggregation site gateways (ASGs) and allow RNCs to
communicate with remote service gateways (RSGs) that are at the aggregation layer. VRRP is
configured to provide ASG and RSG redundancy, improving device reliability and ensuring
non-stop transmission of value-added services, such as voice, video, and cloud computing
services over mobile bearer networks.
Networking Description
Figure 1-724 shows VRRP-based gateway protection applications on an IPRAN. A NodeB is
dual-homed to VRRP-enabled ASGs to communicate with the aggregation network. The
NodeB sends traffic destined for the RNC through the master ASG, while the RNC sends
traffic destined for the NodeB through either the master or backup ASG over a path selected
by a dynamic routing protocol. As a result, traffic in opposite directions may travel along
different paths. Similarly, the RNC is dual-homed to VRRP-enabled RSGs. Path inconsistency
may also occur.
Feature Deployment
On the IPRAN shown in Figure 1-724, both ASGs and RSGs may send and receive traffic
over different paths. For example, user-to-network traffic enters the aggregation network
through the master ASG, while network-to-user traffic flows out of the aggregation network
from the backup ASG. Path inconsistency complicates traffic monitoring or statistics
collection and increases the cost. In addition, when the master ASG is working properly, the
backup ASG also transmits services, which is counterproductive to VRRP redundancy backup
implementation. Association between direct routes and the VRRP backup group can be
configured to ensure path consistency.
On the NodeB side, the direct network segment routes of ASG VRRP interfaces can be
associated with VRRP status. The route with the master ASG as the next hop has a lower cost
than the route with the backup ASG as the next hop. The dynamic routing protocol imports
the direct routes and selects the route with a lower cost, ensuring path consistency.
Implementation on the RNC side is similar to that on the NodeB side.
Protocol UDP Port TCP Port
DHCP 67 -
DNS 53 53
FTP - 20/21
HTTP - 80
IMAP - 993
NetBIOS 137/138 137/139
POP3 - 995
SMB 445 445
SMTP 25 25
SNMP 161 -
TELNET - 23
TFTP 69 -
Note that "-" indicates that the related transport layer protocol is not used.
Terms
ARP Vlink direct routes: IP packets are forwarded through a specified physical interface. IP
packets cannot be forwarded through a VLANIF interface, because a VLANIF interface is a
logical interface with several physical interfaces as its member interfaces. If an IPv4 packet
reaches a VLANIF interface, the device obtains information about the physical interface using
ARP and generates the relevant routing entry. The route recorded in the routing entry is called
an ARP Vlink direct route.
FRR: FRR is applicable to services that are very sensitive to packet loss and delay. When a
fault is detected at the lower layer, the lower layer informs the upper layer routing system of
the fault. Then, the routing system forwards packets through a backup link. In this manner, the
impact of the link fault on services is minimized.
NDP Vlink direct routes: IP packets are forwarded through a specified physical interface. IP
packets cannot be forwarded through a VLANIF interface, because a VLANIF interface is a
logical interface with several physical interfaces as its member interfaces. If an IPv6 packet
reaches a VLANIF interface, the device obtains information about the physical interface using
the Neighbor Discovery Protocol (NDP) and generates the relevant routing entry. The route
recorded in the routing entry is called an NDP Vlink direct route.
UNR: When a user goes online through a Layer 2 device, such as a switch, but there is no
available Layer 3 interface and the user is assigned an IP address, no dynamic routing
protocol can be used. To enable devices to use IP routes to forward the traffic of this user,
Huawei User Network Route (UNR) technology assigns a route to forward the traffic of the
user.
Definition
Static routes are special routes that are configured by network administrators.
Purpose
On a simple network, only static routes can ensure that the network runs properly. If a router
cannot run dynamic routing protocols or cannot generate routes to a destination network, you
can configure static routes on the router.
Route selection can be controlled using static routes. Properly configuring and using static
routes can improve network performance and guarantee the required bandwidth for important
applications. When a network fault occurs or the network topology changes, however, static
routes must be changed manually by the administrator.
1.10.3.2 Principles
1.10.3.2.1 Components
On the NE20E, you can run the ip route-static command to configure a static route, which
consists of the following components:
Destination address and mask
Outbound interface and next hop address
When forwarding a packet, the device selects a route based on the longest match rule. The
device can find the associated link layer address to forward the packet only when the next hop
address of the packet is available.
When specifying an outbound interface, note the following:
For a Point-to-Point (P2P) interface, if the outbound interface is specified, the next hop
address is the address of the remote interface connected to the outbound interface. For
example, when a GE interface is encapsulated with Point-to-Point Protocol (PPP) and
obtains the remote IP address through PPP negotiation, you need to specify only the
outbound interface rather than the next hop address.
Non-Broadcast Multiple-Access (NBMA) interfaces are applicable to
Point-to-Multipoint networks. Therefore, IP routes and the mappings between IP
addresses and link layer addresses are required. In this case, you need to configure next
hop addresses.
An Ethernet interface is a broadcast interface and a virtual-template (VT) interface can
be associated with multiple virtual access (VA) interfaces. If the Ethernet interface or the
VT interface is specified as the outbound interface of a static route, the next hop cannot
be determined because multiple next hops exist. Therefore, do not specify an Ethernet
interface or a VT interface as the outbound interface unless necessary. If you need to
specify a broadcast interface (such as an Ethernet interface) or a VT interface as the
outbound interface, specify the associated next hop address.
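The interface-type rules above can be sketched as a simple check (an illustration only; the interface type names are generic, not device keywords):

```python
# Sketch of the rule above: whether a static route requires an
# explicit next hop depends on the outbound interface type.
def needs_next_hop(iface_type):
    # P2P: the remote address is implied by the link itself.
    # Broadcast (Ethernet), NBMA, and VT interfaces have multiple
    # possible next hops, so one must be specified.
    return iface_type in ("ethernet", "nbma", "vt")

assert not needs_next_hop("p2p")
assert needs_next_hop("ethernet")
assert needs_next_hop("nbma")
```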
1.10.3.2.2 Applications
In Figure 1-725, the network topology is simple, and network communication can be
implemented through static routes. You need to specify an address for each physical network,
identify indirectly connected physical networks for each router, and configure static routes for
indirectly connected physical networks.
In Figure 1-725, static routes to networks 3, 4, and 5 need to be configured on Device A; static
routes to networks 1 and 5 need to be configured on Device B; static routes to networks 1, 2,
and 3 need to be configured on Device C.
If the destination address of a packet does not match any entry in the routing table, the packet is discarded. An
Internet Control Message Protocol (ICMP) packet is then sent, informing the originating host
that the destination host or network is unreachable.
The static route with the destination address and mask 0s (0.0.0.0 0.0.0.0) configured using
the ip route-static command is a default route intended to simplify network configuration.
In Figure 1-725, because the next hop of the packets from Device A to networks 3, 4, and 5 is
Device B, a default route can be configured on Device A to replace the three static routes
destined for networks 3, 4, and 5. Similarly, only a default route from Device C to Device B
needs to be configured to replace the three static routes destined for networks 1, 2, and 3.
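The longest match rule with a default-route fallback can be sketched as follows (a minimal model; the prefixes and labels are illustrative):

```python
# Sketch of longest-prefix-match lookup with a default route: the
# most specific matching route wins, and 0.0.0.0/0 catches anything
# no other entry covers.
import ipaddress

table = {
    "10.1.0.0/16": "to network 1",
    "0.0.0.0/0": "default via Device B",
}

def lookup(dst):
    addr = ipaddress.ip_address(dst)
    matches = [p for p in table if addr in ipaddress.ip_network(p)]
    best = max(matches, key=lambda p: ipaddress.ip_network(p).prefixlen)
    return table[best]

assert lookup("10.1.2.3") == "to network 1"            # specific route wins
assert lookup("172.16.0.1") == "default via Device B"  # default route fallback
```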
1.10.3.2.3 Functions
BFD session so that the BFD session can detect the status of the link that the static route
passes through.
After BFD for static routes is configured, each static route can be associated with a BFD
session. In addition to route selection rules, whether a static route can be selected as the
optimal route is subject to BFD session status.
If the BFD session associated with a static route detects a link failure, the session goes
Down and reports the failure to the system. The system then deletes the static route from
the IP routing table.
If the BFD session associated with a static route detects that the faulty link has recovered,
the session goes Up and reports the recovery to the system. The system then adds the
static route to the IP routing table again.
By default, a static route can still be selected even though the BFD session associated
with it is AdminDown (triggered by the shutdown command run either locally or
remotely). If a device is restarted, the BFD session needs to be re-negotiated. In this case,
whether the static route associated with the BFD session can be selected as the optimal
route is subject to the re-negotiated BFD session status.
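The selection rule can be sketched as a predicate (a minimal model of the default behavior described above):

```python
# Sketch of BFD-gated selection: a static route stays selectable
# unless its associated BFD session is Down; AdminDown does not
# disqualify it (the default behavior), and a route with no bound
# session (None) is unaffected by BFD.
def selectable(bfd_state):
    return bfd_state in ("Up", "AdminDown", None)

assert selectable("Up")
assert selectable("AdminDown")   # shutdown run locally or remotely
assert selectable(None)          # no BFD session bound
assert not selectable("Down")    # link failure detected
```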
BFD for static routes has two detection modes:
Single-hop detection
In single-hop detection mode, the configured outbound interface and next hop address
are the information about the directly connected next hop. The outbound interface
associated with the BFD session is the outbound interface of the static route, and the peer
address is the next hop address of the static route.
Multi-hop detection
In multi-hop detection mode, only the next hop address is configured. Therefore, the
static route must be iterated to the directly connected next hop and outbound interface.
The peer address of the BFD session is the original next hop address of the static route,
and the outbound interface is not specified. In most cases, the original next hop to be
iterated is an indirect next hop. Multi-hop detection is performed on the static routes that
support route iteration.
For details about BFD, see the HUAWEI NE20E-S2 Universal Service Router Feature Description -
Reliability.
Background
Static routes do not have a dedicated detection mechanism. If a link fails, a network
administrator must manually delete the corresponding static route from the IP routing table.
This process delays link switchovers and can cause lengthy service interruptions.
Bidirectional Forwarding Detection (BFD) for static routes can use BFD sessions to monitor
the link status of a static route. However, both ends of the link must support BFD. Network
quality analysis (NQA) for static routes, in contrast, can monitor the link status of a static
route even if only one end supports NQA.
Table 1-185 compares BFD and NQA for static routes.
Table 1-185 Comparison between BFD and NQA for static routes
Related Concepts
NQA monitors network quality of service (QoS) in real time. If a network fails, NQA can be
used to diagnose the fault.
NQA relies on a test instance to monitor link status. The two ends of an NQA test are called
the NQA client and the NQA server. The NQA client initiates an NQA test that can return any
of the following results:
success: The test is successful. NQA instructs the routing management module to set the
static route to active and add the static route to the routing table.
failed: The test fails. NQA instructs the routing management module to set the static
route to inactive and delete the static route from the routing table.
no result: The test is running and no result has been obtained, which does not change the
status of the static route.
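The three test results above map directly onto routing-management actions. The following Python sketch models that mapping with hypothetical names (the routing table is simplified to a set of prefixes); it is not the device implementation:

```python
def apply_nqa_result(result: str, routing_table: set, route: str) -> str:
    """Map an NQA test result to a routing-management action (sketch).

    'success'   -> route set to active and added to the routing table
    'failed'    -> route set to inactive and deleted from the table
    'no result' -> test still running; route status unchanged
    """
    if result == "success":
        routing_table.add(route)
        return "active"
    if result == "failed":
        routing_table.discard(route)
        return "inactive"
    return "unchanged"
```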
For NQA details, see the chapter "System Monitor" in the HUAWEI NE20E-S2 Universal Service
Router Feature Description.
Implementation
NQA for static routes associates an NQA test instance with a static route and uses the NQA
test instance to monitor the link. The routing management module determines whether a static
route is active or inactive based on the test result. If the static route is inactive, the routing
management module deletes it from the IP routing table and selects a backup link for data
forwarding, which prevents lengthy service interruptions.
In Figure 1-728, each access switch is connected to 10 clients, and a total of 100 clients are
connected. Because no dynamic routing protocol can be deployed between Device B and the
clients, static routes to the clients must be configured on Device B, and backup static routes to
the clients can be configured on Device C.
Device A, Device B, and Device C run a dynamic routing protocol and learn routes from one
another. Device B and Device C are configured to import static routes to the routing table of
the dynamic routing protocol, and different costs are set for the static routes. Device A can
contact Device B and Device C using the dynamic routing protocol to learn routes to the
clients. Device A selects one primary link and one backup link based on link costs.
NQA for static routes, configured on Device B, uses NQA test instances to monitor the status
of the primary link. If the primary link fails, the corresponding static route is deleted and
network-to-client traffic switches to the backup link. When both the primary and backup links
are running properly, network-to-client traffic is preferentially transmitted along the primary
link.
NQA test instances support both IPv4 and IPv6 static routes. The mechanisms for monitoring IPv4 and
IPv6 static routes are the same.
Each static route can be associated with only one NQA test instance.
Usage Scenario
NQA for static routes applies to a network on which BFD for static routes cannot be deployed
due to device connectivity limitations. For example, switches, optical line terminals (OLTs),
digital subscriber line access multiplexers (DSLAMs), multiservice access nodes (MSANs),
or x digital subscriber lines (xDSLs) exist on the network.
Benefits
NQA for static routes can monitor the link status of static routes and implement rapid
primary/backup link switchovers, preventing lengthy service interruptions.
Background
When the link over which a static route runs fails, the static route will be deleted from the IP
routing table to trigger a route re-selection. After a new route is selected, traffic is switched to
the new route. Some carriers, however, may require that specific traffic always travel along a
fixed link, regardless of the link status. Static route permanent advertisement is introduced to
meet this service need.
Implementation
With static route permanent advertisement, a static route can still be advertised and added to
the IP routing table for route selection even when the link over which the static route runs
fails. After static route permanent advertisement is configured, the static route can be
advertised and added to the IP routing table in both of the following scenarios:
An outbound interface is configured for the static route, and the outbound interface has
an IP address. Static route permanent advertisement takes effect regardless of whether
the outbound interface is Up.
No outbound interface is configured for the static route. Static route permanent
advertisement takes effect regardless of whether the static route can obtain an outbound
interface through route iteration.
After static route permanent advertisement is enabled, a static route always remains in the IP routing
table regardless of route reachability. If the destination of the route becomes unreachable, traffic
interruption occurs.
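The selection rule above reduces to a single condition, sketched here in Python with hypothetical names: without permanent advertisement, the route tracks the link state; with it, the route is always selectable, at the cost of possible traffic black-holing when the link fails.

```python
def route_selectable(link_up: bool, permanent_advertisement: bool) -> bool:
    """Whether a static route stays in the IP routing table (sketch).

    Without permanent advertisement, the route follows the link state.
    With it, the route is always advertised and selectable, even if the
    link has failed (which can cause a traffic interruption).
    """
    return True if permanent_advertisement else link_up
```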
Typical Networking
On the network shown in Figure 1-729, BR1, BR2, and BR3 belong to ISP1, ISP2, and ISP3
respectively. Two links (Link A and Link B) exist between BR1 and BR2, but ISP1 expects its
service traffic destined for ISP2 to be always transmitted over Link A.
A direct EBGP peer relationship is established between BR1 and BR2. A static route is created
on BR1, with 10.1.1.2/24 (IP address of BR2) as the destination address and the local
interface connected to BR2 as the outbound interface.
Without static route permanent advertisement, Link A is used to transmit traffic. If Link A
fails, BGP will switch the traffic to Link B.
With static route permanent advertisement, Link A is used to transmit traffic regardless of
whether the destination is reachable through Link A. If Link A fails, no link switchover is
performed, causing traffic interruption. To check whether the destination is reachable through
the static route, ping the destination address of the static route to which static route permanent
advertisement is applied.
1.10.4 RIP
1.10.4.1 Introduction
Definition
Routing Information Protocol (RIP) is a simple Interior Gateway Protocol (IGP). RIP is used
in small-scale networks, such as campus networks and simple regional networks.
As a distance-vector routing protocol, RIP exchanges routing information through User
Datagram Protocol (UDP) packets with port number 520.
RIP employs the hop count as the metric to measure the distance to the destination. In RIP, by
default, the number of hops from the router to its directly connected network is 0; the number
of hops from the router to a network that is reachable through another router is 1, and so on.
The hop count (the metric) equals the number of routers along the path from the local network
to the destination network. To speed up route convergence, RIP defines the hop count as an
integer that ranges from 0 to 15. A hop count that is equal to or greater than 16 is classified as
infinite, indicating that the destination network or host is unreachable. Due to the hop limit,
RIP is not applicable to large-scale networks.
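The hop-count metric and the value 16 as "infinity" can be expressed in a few lines of Python. This is an illustrative sketch of the rule above, not protocol code:

```python
RIP_INFINITY = 16  # a hop count of 16 or more means "unreachable"

def rip_metric(hops_from_router: int) -> int:
    """Clamp a hop count to the RIP metric range (sketch).

    0 = directly connected network; each router crossed adds 1;
    anything at or beyond 16 is treated as infinite.
    """
    return min(hops_from_router, RIP_INFINITY)

def reachable(metric: int) -> bool:
    return metric < RIP_INFINITY
```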
RIP has two versions: RIP version 1 (RIP-1) and RIP version 2 (RIP-2).
Purpose
As the earliest IGP, RIP is used in small and medium-sized networks. Its implementation is
simple, and the configuration and maintenance of RIP are easier than those of Open Shortest
Path First (OSPF) and Intermediate System-to-Intermediate System (IS-IS). Therefore, RIP is
widely used on live networks.
1.10.4.2 Principles
RIP is a distance-vector routing protocol. It exchanges packets through UDP and uses timers to
control the advertisement, update, and aging of routing information. However, design defects
in RIP may cause routing loops. Therefore, split horizon, poison reverse, and triggered update
were introduced into RIP to prevent routing loops.
In addition, RIP periodically advertises its routing table to neighbors, and route
summarization was introduced to reduce the size of the routing table.
1.10.4.2.1 RIP-1
RIP version 1 (RIP-1) is a classful routing protocol, which supports only the broadcast of
protocol packets. Figure 1-730 shows the format of a RIP-1 packet. A RIP packet can carry a
maximum of 25 routing entries. RIP is based on UDP, and a RIP-1 packet cannot be longer
than 512 bytes. RIP-1 packets do not carry any mask information, and RIP-1 can identify only
the routes to natural network segments, such as Class A, Class B, and Class C. Therefore,
RIP-1 does not support route summarization or discontinuous subnets.
1.10.4.2.2 RIP-2
RIP version 2 (RIP-2) is a classless routing protocol. Figure 1-731 shows the format of a
RIP-2 packet.
1.10.4.2.3 Timers
RIP uses the following timers:
Update timer: The Update timer periodically triggers Update packet transmission. By
default, the interval at which Update packets are sent is 30s.
Age timer: If a RIP device does not receive any packets from its neighbor to update a
route before the route expires, the RIP device considers the route unreachable. By default,
the age timer interval is 180s.
Garbage-collect timer: If a route becomes invalid after the age timer expires or a route
unreachable message is received, the route is placed into a garbage queue instead of
being immediately deleted from the RIP routing table. The garbage-collect timer
monitors the garbage queue and deletes expired routes. If an Update packet of a route is
received before the garbage-collect timer expires, the route is placed back into the age
queue. The garbage-collect timer is set to avoid route flapping. By default, the garbage
collect timer interval is 120s.
Hold-down timer: If a RIP device receives an Update packet carrying a route with a cost
of 16 from a neighbor, the route enters the holddown state, and the hold-down timer is
started. To prevent route flapping, the RIP device does not accept any update for the
route until the hold-down timer expires, even if the advertised cost is less than 16,
except in the following scenarios:
a. The cost carried in the Update packet is less than or equal to that carried in the last
Update packet.
b. The hold-down timer expires, and the corresponding route enters the Garbage state.
The relationship between RIP routes and the four timers is as follows:
The advertisement of RIP routing updates is triggered by the update timer, whose default
value is 30 seconds.
Each routing entry is associated with two timers: the age timer and garbage-collect timer.
a. Each time a route is learned and added to the routing table, the age timer is started.
b. If no Update packet is received from the neighbor within 180 seconds after the age
timer is started, the metric of the corresponding route is set to 16, and the
garbage-collect timer is started.
If no Update packet is received within 120 seconds after the garbage-collect timer is
started, the corresponding routing entry is deleted from the routing table after the
garbage-collect timer expires.
By default, the hold-down timer is disabled. If you configure a hold-down timer, it starts
after the system receives a route with a cost of 16 or greater from its neighbor.
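The age timer and garbage-collect timer relationship above can be sketched as a small state function in Python (a simplified model with hypothetical names; real implementations restart the age timer on every received Update packet):

```python
def route_state(seconds_since_update: int,
                age_timeout: int = 180,
                garbage_timeout: int = 120) -> str:
    """State of a RIP routing entry as its timers expire (sketch).

    While Update packets keep arriving, the age timer restarts and the
    route stays valid. 180 s after the last update, the metric is set to
    16 and the garbage-collect timer starts; 120 s later, the entry is
    deleted from the routing table.
    """
    if seconds_since_update < age_timeout:
        return "valid"
    if seconds_since_update < age_timeout + garbage_timeout:
        return "garbage (metric 16, awaiting deletion)"
    return "deleted"
```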
In Figure 1-732, Device A sends Device B a route to 10.0.0.0/8. If split horizon is not
configured, Device B will send this route back to Device A after learning it from Device A. As
a result, Device A learns the following routes to 10.0.0.0/8:
A direct route with zero hops
A route with Device B as the next hop and a total of two hops
Only direct routes, however, are active in the RIP routing table of Device A.
If the route from Device A to 10.0.0.0/8 becomes unreachable and Device B is not notified,
Device B still considers the route to 10.0.0.0/8 reachable and continues sending this route to
Device A. Then, Device A receives incorrect routing information and considers the route to
10.0.0.0/8 reachable through Device B; Device B considers the route to 10.0.0.0/8 reachable
through Device A. As a result, a loop occurs on the network.
After split horizon is configured, Device B no longer sends the route back after learning it,
which prevents such a loop. With split horizon, an interface records the neighbor a route was
learned from, and the interface will not send the routes back to the neighbor it learned them
from.
In Figure 1-733, Device A sends the route to 10.0.0.0/8 that it learns from Device B only to
Device C.
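The split-horizon rule can be sketched as an advertisement filter in Python. This is an illustrative model with hypothetical names (the routing table is simplified to prefix/interface pairs), not device code:

```python
def routes_to_advertise(routing_table, out_interface):
    """Split horizon (sketch): do not advertise a route through the
    interface it was learned from.

    routing_table: list of (prefix, learned_interface) tuples;
    learned_interface is None for local (for example, direct) routes,
    which are advertised through every interface.
    """
    return [prefix for prefix, learned_if in routing_table
            if learned_if != out_interface]
```

In the Figure 1-733 scenario, a route learned on the interface toward Device B would be filtered from updates sent back toward Device B but still advertised toward Device C.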
In Figure 1-734, Device A sends Device B a route to 10.0.0.0/8. If poison reverse is not
configured, Device B will send this route back to Device A after learning it from Device A. As
a result, Device A learns the following routes to 10.0.0.0/8:
A direct route with zero hops
A route with Device B as the next hop and a total of two hops
Only direct routes, however, are active in the RIP routing table of Device A.
If the route from Device A to 10.0.0.0/8 becomes unreachable and Device B is not notified,
Device B still considers the route to 10.0.0.0/8 reachable and continues sending this route to
Device A. Then, Device A receives incorrect routing information and considers the route to
10.0.0.0/8 reachable through Device B; Device B considers the route to 10.0.0.0/8 reachable
through Device A. As a result, a loop occurs on the network.
With poison reverse, after Device B receives the route from Device A, Device B sends a route
unreachable message to Device A with cost 16. Device A then no longer learns the reachable
route from Device B, which prevents routing loops.
If both split horizon and poison reverse are configured, only poison reverse takes effect.
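Unlike split horizon, poison reverse still advertises the route back through the interface it was learned from, but with the cost forced to 16. A minimal Python sketch of that rule, with hypothetical names:

```python
RIP_INFINITY = 16

def advertised_metric(route_metric, learned_interface, out_interface):
    """Poison reverse (sketch): a route is advertised back through the
    interface it was learned from with its metric set to 16 (unreachable),
    so the neighbor can never learn the route back and form a loop."""
    if learned_interface == out_interface:
        return RIP_INFINITY
    return route_metric
```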
In Figure 1-735, if the route to 11.4.0.0 becomes unreachable, Device C learns the
information first. By default, a RIP-enabled device sends routing updates to its neighbors
every 30s. If Device C receives an Update packet from Device B within 30s while Device C is
still waiting to send Update packets, Device C learns the incorrect route to 11.4.0.0. In this
case, the next hops of the routes from Device B and Device C to network 11.4.0.0 are Device C
and Device B, respectively, which results in routing loops. If Device C sends an Update packet
to Device B immediately after it detects a network fault, Device B can rapidly update its
routing table, which prevents routing loops.
In addition, if the next hop of a route becomes unavailable due to a link failure, the local
device sets the cost of the route to 16 and then advertises the route immediately to its
neighbors. This process is called route poisoning.
In RIP-2, route summarization can reduce the size of the routing table and improve the
extensibility and efficiency of a large-scale network.
Route summarization has two modes:
Process-based classful summarization
Summarized routes are advertised with natural masks. If split horizon or poison reverse
is configured, classful summarization becomes invalid because split horizon or poison
reverse suppresses some routes from being advertised. In addition, when classful
summarization is configured, routes learned from different interfaces may be
summarized into a single route. As a result, a conflict occurs in the advertisement of the
summarized route.
For example, a RIP process summarizes the route 10.1.1.0/24 with metric 2 and the route
10.2.2.0/24 with metric 3 into the route 10.0.0.0/8 with metric 2.
Interface-based summarization
Users can specify a summary address.
For example, users can configure a RIP-enabled interface to summarize the route
10.1.1.0/24 with metric 2 and route 10.2.2.0/24 with metric 3 into the route 10.1.0.0/16
with metric 2.
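The metric rule in both examples is the same: the summarized route takes the smallest metric among its specific routes. A Python sketch of interface-based summarization (hypothetical names; the routing table is simplified to a prefix-to-metric mapping):

```python
import ipaddress

def summarize(routes, summary_prefix):
    """Interface-based RIP-2 route summarization (sketch).

    routes: dict mapping prefix string -> metric. All specific routes
    falling within summary_prefix collapse into a single route whose
    metric is the smallest metric among them.
    """
    summary = ipaddress.ip_network(summary_prefix)
    metrics = [m for p, m in routes.items()
               if ipaddress.ip_network(p).subnet_of(summary)]
    return (summary_prefix, min(metrics)) if metrics else None
```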
Background
Routing Information Protocol (RIP)-capable devices monitor the neighbor status by
periodically exchanging Update packets. During the time it takes a local device to detect a
link failure, a large number of packets may be lost. Bidirectional forwarding detection (BFD)
for RIP can speed up fault detection and route convergence, which improves network
reliability.
After BFD for RIP is configured on the router, BFD can detect a fault (if any) within
milliseconds and notify the RIP module of the fault. The router then deletes the route that
passes through the faulty link and switches traffic to a backup link. This process speeds up
RIP convergence.
Table 1-186 describes the differences before and after BFD for RIP is configured.
Table 1-186 Differences before and after BFD for RIP is configured
Related Concepts
The BFD mechanism bidirectionally monitors data protocol connectivity over the link
between two routers. After BFD is associated with a routing protocol, BFD can rapidly detect
a fault (if any) and notify the protocol module of the fault, which speeds up route convergence
and minimizes traffic loss.
BFD is classified into the following modes:
Static BFD
In static BFD mode, BFD session parameters (including local and remote discriminators)
must be configured, and requests must be delivered manually to establish BFD sessions.
Static BFD is applicable to networks on which only a few links require high reliability.
Dynamic BFD
In dynamic BFD mode, the establishment of BFD sessions is triggered by routing
protocols, and the local discriminator is dynamically allocated, while the remote
discriminator is obtained from BFD packets sent by the neighbor.
When a new neighbor relationship is set up, a BFD session is established based on the
neighbor and detection parameters, including source and destination IP addresses. When
a fault occurs on the link, the routing protocol associated with BFD can detect the BFD
session Down event. Traffic is switched to the backup link immediately, which
minimizes data loss.
Dynamic BFD is applicable to networks that require high reliability.
Implementation
For details about BFD implementation, see "BFD" in Universal Service Router Feature
Description - Reliability. Figure 1-736 shows a typical network topology for BFD for RIP.
Dynamic BFD for RIP implementation:
a. RIP neighbor relationships are established among Device A, Device B, and Device
C and between Device B and Device D.
b. BFD for RIP is enabled on Device A and Device B.
c. Device A calculates routes, and the next hop along the route from Device A to
Device D is Device B.
d. If a fault occurs on the link between Device A and Device B, BFD will rapidly
detect the fault and report it to Device A. Device A then deletes the route whose
next hop is Device B from the routing table.
e. Device A recalculates routes and selects a new path Device C → Device B →
Device D.
f. After the link between Device A and Device B recovers, a new BFD session is
established between the two routers. Device A then reselects an optimal link to
forward packets.
Static BFD for RIP implementation:
a. RIP neighbor relationships are established among Device A, Device B, and Device
C and between Device B and Device D.
b. Static BFD is configured on the interface that connects Device A to Device B.
c. If a fault occurs on the link between Device A and Device B, BFD will rapidly
detect the fault and report it to Device A. Device A then deletes the route whose
next hop is Device B from the routing table.
d. After the link between Device A and Device B recovers, a new BFD session is
established between the two routers. Device A then reselects an optimal link to
forward packets.
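Step c of the static procedure (and step d of the dynamic one) boils down to deleting every route whose next hop is the neighbor behind the failed BFD session. A Python sketch of that reaction, with hypothetical names and the routing table simplified to a prefix-to-next-hop mapping:

```python
def on_bfd_session_down(routing_table, failed_neighbor):
    """BFD for RIP reaction (sketch): when BFD reports the session to a
    neighbor as Down, delete every route whose next hop is that neighbor
    so that traffic can switch to a backup path after recalculation.

    routing_table: dict mapping prefix -> next hop.
    Returns the list of deleted prefixes.
    """
    deleted = [p for p, nh in routing_table.items() if nh == failed_neighbor]
    for p in deleted:
        del routing_table[p]
    return deleted
```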
Usage Scenario
BFD for RIP is applicable to networks that require high reliability.
Benefits
BFD for RIP improves network reliability and enables devices to rapidly detect link faults,
which speeds up route convergence on RIP networks.
Simple authentication: The authenticated party adds the configured password directly to
packets for authentication. This authentication mode provides the lowest password
security.
MD5 authentication: The authenticated party uses the Message Digest 5 (MD5)
algorithm to generate a ciphertext password and adds it to packets for authentication.
This authentication mode improves password security.
Keychain authentication: The authenticated party configures a keychain that changes
over time. This authentication mode further improves password security.
Keychain authentication improves RIP security by periodically changing the password
and the encryption algorithms. For details about Keychain, see "Keychain" in NE20E
Feature Description - Security.
HMAC-SHA256 authentication: The authenticated party uses the HMAC-SHA256
algorithm to generate a ciphertext password and adds it to packets for authentication.
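The HMAC-SHA256 mode above can be illustrated with Python's standard `hmac` module. This sketch shows only the digest computation and verification; the on-wire RIP authentication-entry format is not modeled here:

```python
import hashlib
import hmac

def authenticate(packet: bytes, key: bytes) -> bytes:
    """Compute an HMAC-SHA256 digest over a RIP packet (sketch).

    The sender attaches the digest to the packet; the receiver recomputes
    it with the locally configured key.
    """
    return hmac.new(key, packet, hashlib.sha256).digest()

def verify(packet: bytes, digest: bytes, key: bytes) -> bool:
    """Receiver side: discard the packet if the digests do not match.
    compare_digest() is a constant-time comparison, which avoids leaking
    information through timing differences."""
    return hmac.compare_digest(authenticate(packet, key), digest)
```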
RIP authentication ensures network security by adding an authentication field to a packet
before the packet is sent. After receiving a RIP packet from a remote router, the local router
discards the packet if the authentication password in the packet does not match the local
authentication password. This mechanism protects the local router.
On IP networks of carriers, RIP authentication ensures the secure transmission of packets,
improves the system security, and provides secure network services for carriers.
1.10.5 RIPng
1.10.5.1 Introduction
Definition
RIP next generation (RIPng) is an extension to RIP version 2 (RIP-2) on IPv6 networks. Most
RIP concepts apply to RIPng.
RIPng is a distance-vector routing protocol, which measures the distance (metric or cost) to
the destination host by the hop count. In RIPng, the hop count from a device to its directly
connected network is 0, and the hop count from a device to a network that is reachable
through another device is 1. When the hop count is equal to or exceeds 16, the destination
network or host is defined as unreachable.
To be applied on IPv6 networks, RIPng makes the following changes to RIP:
UDP port number: RIPng uses UDP port number 521 to send and receive routing
information.
Multicast address: RIPng uses FF02::9 as the link-local multicast address of a RIPng
device.
Prefix length: RIPng uses a 128-bit (the mask length) prefix in the destination address.
Next hop address: RIPng uses a 128-bit IPv6 address.
Source address: RIPng uses a link-local address (FE80::/10) as the source address of
RIPng Update packets.
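The addressing changes in the list above can be checked with Python's standard `ipaddress` module. The constants mirror the values given above; `valid_ripng_source` is a hypothetical helper name:

```python
import ipaddress

# RIPng transport and addressing constants from the list above.
RIPNG_UDP_PORT = 521
RIPNG_MULTICAST_GROUP = ipaddress.ip_address("FF02::9")
LINK_LOCAL_PREFIX = ipaddress.ip_network("FE80::/10")

def valid_ripng_source(addr: str) -> bool:
    """A RIPng Update packet must be sourced from a link-local
    (FE80::/10) address (sketch)."""
    return ipaddress.ip_address(addr) in LINK_LOCAL_PREFIX
```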
Purpose
RIPng is an extension to RIP for support of IPv6.
1.10.5.2 Principles
RIPng is an extension to RIP-2 on IPv6 networks and uses the same timers as RIP-2. RIPng
supports split horizon, poison reverse, and triggered update, which prevent routing loops.
1.10.5.2.2 Timers
RIPng uses the following timers:
Update timer: This timer periodically triggers Update packet transmission. By default,
the interval at which Update packets are sent is 30s. This timer is used to synchronize
RIPng routes on the network.
Age timer: If a RIPng device does not receive any Update packet from its neighbor
before a route expires, the RIPng device considers the route to its neighbor unreachable.
Garbage-collect timer: If no packet is received to update an unreachable route after the
Age timer expires, this route is deleted from the RIPng routing table.
Hold-down timer: If a RIPng device receives an updated route with cost 16 from a
neighbor, the route enters the holddown state, and the hold-down timer is started.
The following describes the relationship among these timers:
The advertisement of RIPng routing updates is periodically triggered by the update timer,
whose default value is 30 seconds. Each routing entry is associated with two timers: the Age timer and
garbage-collect timer. Each time a route is learned and added to the routing table, the Age
timer is started. If no Update packet is received from the neighbor within 180 seconds, the
metric of the route is set to 16, and the garbage-collect timer is started. If no Update packet is
received within 120 seconds, the route is deleted after the garbage-collect timer expires.
By default, the hold-down timer is disabled. If you configure a hold-down timer, it starts after
the system receives a route with a cost of 16 or greater from its neighbor.
On the network shown in Figure 1-740, after Device B sends a route to network 123::45 to
Device A, Device A does not send the route back to Device B.
In Figure 1-741, if poison reverse is not configured, Device B sends Device A a route learned
from Device A. The cost of the route from Device A to network 123::0/64 is 1. If the route
from Device A to network 123::0/64 becomes unreachable and Device B does not receive an
Update packet from Device A and keeps sending Device A the route from Device A to
network 123::0/64, a routing loop occurs.
With poison reverse, after Device B receives the route from Device A, Device B sends a route
unreachable message to Device A with cost 16. Device A then no longer learns the reachable
route from Device B, which prevents routing loops.
If both poison reverse and split horizon are configured, only poison reverse takes effect.
In Figure 1-742, if network 123::0 is unreachable, Device C learns the information first. By
default, a RIPng-enabled device sends Update packets to its neighbors every 30 seconds. If
Device C receives an Update packet from Device B within 30s when Device C is still waiting
to send Update packets, Device C learns the incorrect route to 123::0. In this case, the next
hops of the routes from Device B and Device C to 123::0 are Device C and Device B,
respectively, which results in routing loops. If Device C sends an Update packet to Device B
immediately after it detects a network fault, Device B can rapidly update its routing table,
which prevents routing loops.
In addition, if the next hop of a route becomes unavailable due to a link failure, the local
device sets the cost of the route to 16 and then advertises the route immediately to its
neighbors. This process is called route poisoning.
Background
On large networks, the RIPng routing table of each device contains a large number of routes,
which consumes lots of system resources. In addition, if a specific link connected to a device
within an IP address range frequently alternates between Up and Down, route flapping occurs.
To address these problems, RIPng route summarization was introduced. With RIPng route
summarization, a device summarizes routes destined for different subnets of a network
segment into one route destined for one network segment and then advertises the summarized
route to other network segments. RIPng route summarization reduces the number of routes in
the routing table, minimizes system resource consumption, and prevents route flapping.
Implementation
RIPng route summarization is interface-based. After RIPng route summarization is enabled on
an interface, the interface summarizes routes based on the longest matching rule and then
advertises the summarized route. The smallest metric among the specific routes for the
summarization is used as the metric of the summarized route.
For example, an interface has two routes: 11:11:11::24 with metric 2 and 11:11:12::34 with
metric 3. After RIPng route summarization is enabled on the interface, the interface
summarizes the two routes into the route 11::0/16 with metric 2 and then advertises it.
Background
As networks develop, network security has become an increasing concern. Internet Protocol
Security (IPsec) authentication can be used to authenticate RIPng packets. The packets that
fail to be authenticated are discarded, which prevents data transmitted based on TCP/IP from
being intercepted, tampered with, or attacked.
Implementation
IPsec has an open standard architecture and ensures secure packet transmission on the Internet
by encrypting packets. RIPng IPsec provides a complete set of security protection
mechanisms to authenticate RIPng packets, which prevents devices from being attacked by
forged RIPng packets.
IPsec includes a set of protocols that are used at the network layer to ensure data security,
such as Internet Key Exchange (IKE), Authentication Header (AH), and Encapsulating
Security Payload (ESP). The three protocols are described as follows:
IKE: A protocol that negotiates security associations (SAs) and exchanges the keys used
by AH and ESP.
AH: A protocol that provides data origin authentication, data integrity check, and
anti-replay protection. AH does not encrypt packets to be protected.
ESP: A protocol that provides IP packet encryption and authentication mechanisms
besides the functions provided by AH. The encryption and authentication mechanisms
can be used together or independently.
Benefits
RIPng IPsec offers the following benefits:
1.10.6 OSPF
1.10.6.1 Introduction
Definition
Open Shortest Path First (OSPF) is a link-state Interior Gateway Protocol (IGP) developed by
the Internet Engineering Task Force (IETF).
OSPF version 2 (OSPFv2) is intended for IPv4. OSPF version 3 (OSPFv3) is intended for
IPv6.
Purpose
Before the emergence of OSPF, the Routing Information Protocol (RIP) was widely used as
an IGP on networks. RIP is a distance-vector routing protocol. Due to its slow convergence,
routing loops, and poor scalability, RIP is gradually being replaced with OSPF.
Typical IGPs include RIP, OSPF, and Intermediate System to Intermediate System (IS-IS).
Table 1-187 describes differences among the three typical IGPs.
Benefits
OSPF offers the following benefits:
Wide application scope: OSPF applies to medium-sized networks with several hundred
routers, such as enterprise networks.
Network masks: OSPF packets can carry masks, and therefore the packet length is not
limited by natural IP masks. OSPF can process variable length subnet masks (VLSMs).
Fast convergence: When the network topology changes, OSPF immediately sends link
state update (LSU) packets to synchronize the changes to the link state databases
(LSDBs) of all routers in the same autonomous system (AS).
Loop-free routing: OSPF uses the SPF algorithm to calculate loop-free routes based on
the collected link status.
Area partitioning: OSPF allows an AS to be partitioned into areas, which simplifies
management. Routing information transmitted between areas is summarized, which
reduces network bandwidth consumption.
Equal-cost routes: OSPF supports multiple equal-cost routes to the same destination.
Hierarchical routing: OSPF uses intra-area routes, inter-area routes, Type 1 external
routes, and Type 2 external routes, which are listed in descending order of priority.
Authentication: OSPF supports area-based and interface-based packet authentication,
which ensures packet exchange security.
Multicast: OSPF uses multicast addresses to send packets on certain types of links,
which minimizes the impact on other devices.
1.10.6.2 Principles
1.10.6.2.1 Basic Concepts
This section describes the basic Open Shortest Path First (OSPF) concepts.
Router ID
A router ID is a 32-bit unsigned integer, which identifies a router in an autonomous system
(AS). A router ID must exist before the router runs OSPF.
A router ID can be manually configured or automatically obtained.
If no router ID has been configured, the router automatically obtains a router ID using the
following methods in descending order of priority.
1. The router preferentially selects the largest IP address from its loopback interface
addresses as the router ID.
2. If no loopback interface has been configured, the router selects the largest IP address
from its interface IP addresses as the router ID.
A router can obtain a router ID again only after a router ID is reconfigured for the router or an
OSPF router ID is reconfigured and the OSPF process restarts.
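The automatic selection rules above can be sketched in Python with hypothetical names: prefer the largest loopback address, then fall back to the largest of the other interface addresses.

```python
import ipaddress

def select_router_id(loopback_addrs, interface_addrs):
    """Automatic OSPF router ID selection (sketch).

    Prefer the largest IP address among loopback interface addresses; if
    no loopback interface is configured, use the largest of the other
    interface IP addresses.
    """
    pool = loopback_addrs or interface_addrs
    if not pool:
        return None  # no address available; no router ID can be obtained
    return str(max(ipaddress.ip_address(a) for a in pool))
```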
Area
When a large number of routers run OSPF, link state databases (LSDBs) become very large
and require a large amount of storage space. Large LSDBs also complicate shortest path first
(SPF) computation and overload the routers. As the network grows, the network topology
changes, which results in route flapping and frequent OSPF packet transmission. When a
large number of OSPF packets are transmitted, bandwidth usage efficiency decreases, and
each router on a network has to recalculate routes in case of any topology change.
OSPF resolves this problem by partitioning an AS into different areas. An area is regarded as
a logical group, and each group is identified by an area ID. A router, not a link, resides at the
border of an area. A network segment or link can belong only to one area. An area must be
specified for each OSPF interface.
OSPF areas include common areas, stub areas, and not-so-stubby areas (NSSAs). Table 1-188
describes these OSPF areas.
Router Type
Routers are classified as internal routers, ABRs, backbone routers, or ASBRs by location in an
AS. Figure 1-743 shows the four router types.
LSA
OSPF encapsulates routing information into LSAs for transmission. Table 1-190 describes
LSAs and their functions.
Packet Type
OSPF packets are classified as Hello, Database Description (DD), Link State Request (LSR),
Link State Update (LSU), or Link State Acknowledgment (LSAck) packets. Table 1-191
describes OSPF packets and their functions.
Route Type
Route types are classified as intra-area, inter-area, Type 1 external, or Type 2 external routes.
Intra-area and inter-area routes describe the network structure of an AS. Type 1 or Type 2 AS
external routes describe how to select routes to destinations outside an AS.
Table 1-192 describes OSPF routes in descending order of priority.
Network Type
Networks are classified as broadcast, non-broadcast multiple access (NBMA),
point-to-multipoint (P2MP), or point-to-point (P2P) networks by link layer protocol. Table
1-193 describes the network types.
DR and BDR
On broadcast or NBMA networks, any two routers need to exchange routing information. As
shown in Figure 1-744, if n routers are deployed on the network, n x (n - 1)/2 adjacencies must
be established. Any route change on a router is transmitted to other routers, which wastes
bandwidth resources. OSPF resolves this problem by defining a DR and a backup designated
router (BDR). After a DR is elected, all routers send routing information only to the DR. Then
the DR broadcasts LSAs. Routers other than the DR and BDR are called DR others. The DR
others establish adjacencies only with the DR and BDR, not with each other. This process
reduces the number of adjacencies established between routers on broadcast or NBMA
networks.
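The adjacency reduction can be checked with simple arithmetic (the counting functions are illustrative, not part of the protocol):

```python
# Without a DR, n routers on a broadcast segment need n*(n-1)/2 adjacencies.
# With a DR and BDR, only the DR and BDR peer with everyone; the DR-BDR
# adjacency is shared between them and counted once.
def full_mesh_adjacencies(n):
    return n * (n - 1) // 2

def dr_bdr_adjacencies(n):
    # DR: n-1 adjacencies; BDR: n-1 adjacencies; minus the DR-BDR
    # adjacency counted twice.
    return 2 * (n - 1) - 1 if n >= 2 else 0

for n in (5, 10):
    print(n, full_mesh_adjacencies(n), dr_bdr_adjacencies(n))
# 5 routers: 10 vs 7 adjacencies; 10 routers: 45 vs 17 adjacencies.
```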
If the original DR fails, the routers must reelect a DR, and all routers except the new DR must
synchronize routing information with the new DR. This process is lengthy, which may cause
incorrect route calculations. A BDR is used to shorten the process. The BDR is a backup for a
DR. A BDR is elected together with a DR. The BDR establishes adjacencies with all routers
on the network segment and exchanges routing information with them. When the DR fails, the
BDR immediately becomes a new DR. The routers need to reelect a new BDR, but this
process does not affect route calculations.
The DR priority of a router interface determines its eligibility for DR and BDR elections.
Router interfaces with a DR priority greater than 0 are eligible. Each router adds the
elected DR to a Hello packet and sends it to the other routers on the network segment. When
two router interfaces on the same network segment both declare themselves the DR, the
interface with the higher DR priority is elected as the DR. If the two interfaces have the same
DR priority, the interface with the larger router ID is elected as the DR.
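These tie-breaking rules can be sketched as follows. The data structure is illustrative; the real election also considers the DR/BDR declared in received Hello packets:

```python
# Simplified DR election: priority 0 interfaces are ineligible; higher
# DR priority wins; the larger router ID breaks ties.
import ipaddress

def elect_dr(interfaces):
    """interfaces: list of (router_id, dr_priority) tuples. Returns the
    router ID of the elected DR, or None if no interface is eligible."""
    eligible = [i for i in interfaces if i[1] > 0]
    if not eligible:
        return None
    return max(eligible,
               key=lambda i: (i[1], int(ipaddress.IPv4Address(i[0]))))[0]

# Equal priorities: the larger router ID wins; priority 0 is excluded.
print(elect_dr([("1.1.1.1", 1), ("2.2.2.2", 1), ("3.3.3.3", 0)]))  # 2.2.2.2
```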
OSPF Multi-Process
OSPF multi-process allows multiple OSPF processes to independently run on the same router.
Route exchange between different OSPF processes is similar to that between different routing
protocols. A router's interface can belong only to one OSPF process.
OSPF multi-process is typically used on virtual private networks (VPNs) on which OSPF is
deployed between provider edges (PEs) and customer edges (CEs). The OSPF processes on
the PEs are independent of each other.
A router in an area can advertise LSAs carrying a default route only when the router has
an interface connected to a device outside the area.
If a router has advertised LSAs carrying a default route, the router no longer learns the
same type of LSA advertised by other routers, which carry a default route. That is, the
router uses only the LSAs advertised by itself to calculate routes. The LSAs advertised
by other routers are still saved in the LSDB.
If a router must use a route to advertise LSAs carrying an external default route, the
route cannot be a route learned by the local OSPF process. A router in an area uses an
external default route to forward packets outside the area. If the next hops of routes in
the area are routers in the area, packets cannot be forwarded outside the area.
Before a router advertises a default route, it checks whether a neighbor in the full state is
present in area 0. The router advertises a default route only when a neighbor in the full
state is present in area 0. If no such a neighbor exists, the backbone area cannot forward
packets and advertising a default route is meaningless. For the concept of the Full State,
see OSPF Neighbor States.
Table 1-194 describes the principles for advertising default routes in different areas.
Down This is the initial state of a neighbor conversation. This state indicates that a
router has not received any Hello packets from its neighbors within a dead
interval.
Attempt In the Attempt state, a router periodically sends Hello packets to manually
configured neighbors.
NOTE
This state applies only to non-broadcast multiple access (NBMA) interfaces.
Init This state indicates that a router has received Hello packets from its neighbors
but the neighbors did not receive Hello packets from the router.
2-way This state indicates that a router has received Hello packets from its neighbors
and neighbor relationships have been established between the routers.
If no adjacency needs to be established, the neighbors remain in the 2-way
state. If adjacencies need to be established, the neighbors enter the Exstart
state.
Exstart In the Exstart state, routers establish a master/slave relationship to ensure that
DD packets are sequentially exchanged.
Exchange In the Exchange state, routers exchange DD packets. A router uses a DD
packet to describe its own LSDB and sends the packet to its neighbors.
Loading In the Loading state, a router sends Link State Request (LSR) packets to its
neighbors to request their LSAs for LSDB synchronization.
Full In the Full state, a router establishes adjacencies with its neighbors and all
LSDBs have been synchronized.
The neighbor state of the local router may be different from that of the remote router. For example, the
neighbor state of the local router is Full, but the neighbor state of the remote router is Loading.
Adjacency Establishment
Adjacencies can be established in either of the following situations:
Two routers have established a neighbor relationship and communicate for the first time.
The designated router (DR) or backup designated router (BDR) on a network segment
changes.
The adjacency establishment process is different on different networks.
Adjacency establishment on a broadcast network
On a broadcast network, the DR and BDR establish adjacencies with every other router on the
same network segment, whereas DR others establish only neighbor relationships with each other.
Figure 1-746 shows the adjacency establishment process on a broadcast network.
Seen field of 1.1.1.1 (Router A's router ID). Router A has been discovered but its
router ID is less than that of Router B, and therefore Router B regards itself as a DR.
Then Router B's state changes to Init.
c. After Router A receives the packet, Router A's state changes to 2-way.
The following procedures are not performed for DR others on a broadcast network.
2. Master/Slave negotiation and DD packet exchange
a. Router A sends a DD packet to Router B. The packet carries the following fields:
Seq field: The value x indicates the sequence number is x.
I field: The value 1 indicates that the packet is the first DD packet, which is
used to negotiate a master/slave relationship and does not carry LSA
summaries.
M field: The value 1 indicates that the packet is not the last DD packet.
MS field: The value 1 indicates that Router A declares itself a master.
To improve transmission efficiency, Router A and Router B determine which LSAs
in each other's LSDB need to be updated. If one party determines that an LSA of the
other party is already in its own LSDB, it does not send an LSR packet for updating
the LSA to the other party. To achieve the preceding purpose, Router A and Router
B first send DD packets, which carry summaries of LSAs in their own LSDBs.
Each summary identifies an LSA. To ensure packet transmission reliability, a
master/slave relationship must be determined during DD packet exchange. One
party serving as a master uses the Seq field to define a sequence number. The
master increases the sequence number by one each time it sends a DD packet. When
the other party serving as a slave sends a DD packet, it adds the sequence number
carried in the last DD packet received from the master to the Seq field of the packet.
b. After Router B receives the DD packet, Router B's state changes to Exstart and
Router B returns a DD packet to Router A. The returned packet does not carry LSA
summaries. Because Router B's router ID is greater than Router A's router ID,
Router B declares itself a master and sets the Seq field to y.
c. After Router A receives the DD packet, it agrees that Router B is a master and
Router A's state changes to Exchange. Then Router A sends a DD packet to Router
B to transmit LSA summaries. The packet carries the Seq field of y and the MS
field of 0. The value 0 indicates that Router A declares itself a slave.
d. After Router B receives the packet, Router B's state changes to Exchange and
Router B sends a new DD packet containing its own LSA summaries to Router A.
The value of the Seq field carried in the new DD packet is changed to y + 1.
Router A uses the same sequence number as Router B to confirm that it has received DD
packets from Router B. Router B uses the sequence number plus one to confirm that it
has received DD packets from Router A. When Router B sends the last DD packet, it sets
the M field of the packet to 0.
3. LSDB synchronization
a. After Router A receives the last DD packet, it finds that many LSAs in Router B's
LSDB do not exist in its own LSDB, so Router A's state changes to Loading. After
Router B receives the last DD packet from Router A, Router B's state directly
changes to Full, because Router B's LSDB already contains all LSAs of Router A.
b. Router A sends an LSR packet for updating LSAs to Router B. Router B returns an
LSU packet to Router A. After Router A receives the packet, it sends an LSAck
packet for acknowledgement.
The preceding procedures continue until the LSAs in Router A's LSDB are the same as
those in Router B's LSDB. Router A's state changes to Full. After Router A and Router B
exchange DD packets and update all LSAs, they establish an adjacency.
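The Seq field rules used during the DD exchange above can be modeled as a toy trace. The structure and names are illustrative only:

```python
# Toy model of DD sequence numbering: the master increments its sequence
# number with each DD packet it sends; the slave echoes the sequence
# number carried in the last DD packet received from the master.
def dd_exchange(master_start_seq, rounds):
    trace = []
    seq = master_start_seq
    for _ in range(rounds):
        trace.append(("master", seq))   # master sends DD with Seq = seq
        trace.append(("slave", seq))    # slave replies, echoing that Seq
        seq += 1                        # master increments for the next DD
    return trace

print(dd_exchange(100, 2))
# [('master', 100), ('slave', 100), ('master', 101), ('slave', 101)]
```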
Adjacency establishment on an NBMA network
The adjacency establishment process on an NBMA network is similar to that on a broadcast
network. The blue part shown in Figure 1-747 highlights the differences from a broadcast
network.
On an NBMA network, all routers establish adjacencies only with the DR and BDR.
The following procedures are not performed for DR others on an NBMA network.
2. Master/Slave relationship negotiation and DD packet exchange
The procedures for negotiating a master/slave relationship and exchanging DD packets
on an NBMA network are the same as those on a broadcast network.
3. LSDB synchronization
The procedure for synchronizing LSDBs on an NBMA network is the same as that on a
broadcast network.
Adjacency establishment on a point-to-point (P2P)/Point-to-multipoint (P2MP) network
The adjacency establishment process on a P2P/P2MP network is similar to that on a broadcast
network. On a P2P/P2MP network, however, no DR or BDR needs to be elected and DD
packets are transmitted in multicast mode.
Route Calculation
OSPF uses an LSA to describe the network topology. A Type 1 LSA describes the attributes of
a link between routers. A router transforms its LSDB into a weighted, directed graph, which
reflects the topology of the entire AS. All routers in the same area have the same graph. Figure
1-748 shows a weighted, directed graph.
Based on the graph, each router uses the SPF algorithm to calculate an SPT with itself as the
root. The SPT shows routes to nodes in the AS. Figure 1-749 shows an SPT.
When a router's LSDB changes, the router recalculates a shortest path. Frequent SPF
calculations consume a large amount of resources and affect router efficiency. Changing the
interval between SPF calculations can prevent resource consumption caused by frequent
LSDB changes. The default interval between SPF calculations is 5 seconds.
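The SPF calculation over the weighted, directed graph can be sketched with a minimal Dijkstra implementation. The example topology is hypothetical:

```python
# Minimal Dijkstra sketch: compute shortest-path-tree costs from a root
# over a weighted, directed graph, as in the SPF calculation.
import heapq

def spf(graph, root):
    """graph: {node: [(neighbor, cost), ...]}; returns {node: cost}."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for nbr, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist

graph = {
    "A": [("B", 10), ("C", 2)],
    "C": [("B", 3), ("D", 8)],
    "B": [("D", 1)],
    "D": [],
}
print(spf(graph, "A"))  # costs: A=0, B=5 (via C), C=2, D=6 (via C and B)
```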
The route calculation process is as follows:
1. A router calculates intra-area routes.
The router uses the SPF algorithm to calculate shortest paths to other routers in an area.
Type 1 and Type 2 LSAs accurately describe the network topology in an area. Based on
the network topology described by a Type 1 LSA, the router calculates paths to other
routers in the area.
If multiple equal-cost routes are produced during route calculation, the SPF algorithm retains all these
routes in the LSDB.
2. The router calculates inter-area routes.
For the routers in an area, the network segment of the routes in an adjacent area is
directly connected to the area border router (ABR). Because the shortest path to the ABR
has been calculated in the preceding step, the routers can directly check a Type 3 LSA to
obtain the shortest path to the network segment. The autonomous system boundary
router (ASBR) can also be considered connected to the ABR. Therefore, the shortest path
to the ASBR can also be calculated in this phase.
If the router performing an SPF calculation is an ABR, the router needs to check only Type 3 LSAs in
the backbone area.
3. The router calculates AS external routes.
AS external routes can be considered to be directly connected to the ASBR. Because the
shortest path to the ASBR has been calculated in the preceding phase, the router can
check Type 5 LSAs to obtain the shortest paths to other ASs.
Route Summarization
When a large OSPF network is deployed, an OSPF routing table includes a large number of
routing entries. To accelerate route lookup and simplify management, configure route
summarization to reduce the size of the OSPF routing table. If a summarized link frequently
alternates between Up and Down, devices outside the summarized network segment are not
affected by the flapping. This prevents route flapping and improves network stability.
Route summarization can be carried out on an ABR or ASBR.
ABR summarization
When an ABR transmits routing information to other areas, it generates Type 3 LSAs for
each network segment. If consecutive network segments exist in this area, you can
summarize these network segments into a single network segment. The ABR generates
one LSA for the summarized network segment and advertises only that LSA.
ASBR summarization
If route summarization has been configured and the local router is an ASBR, the local
router summarizes imported Type 5 LSAs within the summarized address range. If an
NSSA has been configured, the local router also summarizes imported Type 7 LSAs
within the summarized address range.
If the local router is both an ASBR and an ABR, it summarizes Type 5 LSAs translated
from Type 7 LSAs.
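Summarizing consecutive network segments into one advertisement, as an ABR does above, can be illustrated with Python's standard ipaddress module. This shows the idea only, not the device's actual algorithm:

```python
# Collapse four consecutive /24 segments into a single summary prefix.
import ipaddress

segments = [
    ipaddress.ip_network("10.1.0.0/24"),
    ipaddress.ip_network("10.1.1.0/24"),
    ipaddress.ip_network("10.1.2.0/24"),
    ipaddress.ip_network("10.1.3.0/24"),
]
# collapse_addresses merges adjacent and overlapping networks.
summary = list(ipaddress.collapse_addresses(segments))
print(summary)  # [IPv4Network('10.1.0.0/22')]
```

With summarization, the ABR would advertise one Type 3 LSA for 10.1.0.0/22 instead of four LSAs.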
Route Filtering
OSPF routing policies include access control lists (ACLs), IP prefix lists, and route-policies.
For details about these policies, see the section "Routing Policy" in the NE20E Feature
Description - IP Routing.
OSPF route filtering applies in the following aspects:
Route import
OSPF can import the routes learned by other routing protocols. A router uses a
configured routing policy to filter routes and imports only the routes matching the
routing policy. Only an ASBR can import routes, and therefore a routing policy for
importing routes must be configured on the ASBR.
Advertising of imported routes
A router advertises imported routes to its neighbors. Only an ASBR can import routes,
and therefore a routing policy for the advertising of imported routes must be configured
on the ASBR.
If OSPF imports a large number of external routes and advertises them to a device with a
smaller routing table capacity, the device may restart unexpectedly. To address this
problem, configure a limit on the number of LSAs generated when an OSPF process
imports external routes.
Route learning
A router uses a routing policy to filter received intra-area, inter-area, and AS external
routes. The router adds only the routes matching the routing policy to its routing table.
This filtering affects only the local routing table; all routes can still be advertised from
the OSPF routing table. Because the router filters only routes calculated based on LSAs,
the learned LSAs remain complete.
Inter-area LSA learning
An ABR in an area can be configured to filter Type 3 LSAs advertised to the area. The
ABR can advertise only Type 3 LSAs, and therefore a routing policy for inter-area LSA
learning must be configured on the ABR.
During inter-area LSA learning, the ABR directly filters Type 3 LSAs advertised to the
area.
Inter-area LSA advertising
An ABR in an area can be configured to filter Type 3 LSAs advertised to other areas. The
ABR can advertise only Type 3 LSAs, and therefore a routing policy for inter-area LSA
advertising must be configured on the ABR.
The maximum number of external routes configured for all devices in the OSPF AS must be the same.
When the number of external routes in the LSDB reaches the maximum number, the device
enters the overflow state and starts the overflow timer at the same time. The device
automatically exits from the overflow state after the overflow timer expires. Table 1-196
describes the operations performed by the device after it enters or exits from the overflow
state.
Table 1-196 Operations performed by the device after it enters or exits from the overflow state
Background
All non-backbone areas must be connected to the backbone area during OSPF deployment to
ensure that all areas are reachable.
In Figure 1-750, area 2 is not connected to area 0 (backbone area), and Device B is not an
ABR. Therefore, Device B does not generate routing information about network 1 in area 0,
and Device C does not have a route to network 1.
Some non-backbone areas may not be connected to the backbone area. You can configure an
OSPF virtual link to resolve this issue.
Related Concepts
A virtual link refers to a logical channel established between two ABRs over a non-backbone
area.
A virtual link must be configured at both ends of the link.
The non-backbone area involved is called a transit area.
A virtual link is similar to a point-to-point (P2P) connection established between two ABRs.
You can configure interface parameters, such as the interval at which Hello packets are sent,
at both ends of the virtual link as you do on physical interfaces.
Principles
In Figure 1-751, two ABRs use a virtual link to directly transmit OSPF packets. The device
between the two ABRs only forwards packets. Because the destination of OSPF packets is not
the device, the device transparently transmits the OSPF packets as common IP packets.
1.10.6.2.5 OSPF TE
OSPF Traffic Engineering (TE) is developed based on OSPF to support Multiprotocol Label
Switching (MPLS) TE and establish and maintain TE LSPs. In the MPLS TE architecture
described in "MPLS Feature Description", OSPF functions as the information advertising
component, responsible for collecting and advertising MPLS TE information.
In addition to the network topology, TE needs to know network constraints, such as the
bandwidth, TE metric, administrative group, and affinity attribute. However, current OSPF
functions cannot meet these requirements. Therefore, OSPF introduces a new type of LSAs to
advertise network constraints. Based on the network constraints, the Constraint Shortest Path
First (CSPF) algorithm can calculate the path subject to specified constraints.
TE-LSA
OSPF uses a new type of LSA (Type 10 opaque LSA) to collect and advertise TE information.
Type 10 opaque LSAs contain the link status information required by TE, including the
maximum link bandwidth, maximum reservable bandwidth, current reserved bandwidth, and
link color. Based on the OSPF flooding mechanism, Type 10 opaque LSAs synchronize link
status information among devices in an area to form a uniform TEDB for route calculation.
OSPF SRLG
OSPF supports the applications of the Shared Risk Link Group (SRLG) in MPLS by
obtaining information about the TE SRLG flooded among devices in an area. For details, refer
to the chapter "MPLS" in this manual.
Definition
As an extension of OSPF, OSPF VPN enables Provider Edges (PEs) and Customer Edges
(CEs) in VPNs to run OSPF for interworking and use OSPF to learn and advertise routes.
Purpose
As a widely used IGP, in most cases, OSPF runs in VPNs. If OSPF runs between PEs and CEs,
and PEs use OSPF to advertise VPN routes to CEs, no other routing protocols need to be
configured on CEs for interworking with PEs, which simplifies management and
configuration of CEs.
Figure 1-753 Networking with OSPF running between PEs and CEs
The routes that PE1 receives from CE1 are advertised to CE3 and CE4 as follows:
1. PE1 imports OSPF routes of CE1 into BGP and converts them to BGP VPNv4 routes.
2. PE1 uses MP-BGP to advertise the BGP VPNv4 routes to PE2.
3. PE2 imports the BGP VPNv4 routes into OSPF and then advertises these routes to CE3
and CE4.
The process of advertising routes of CE4 or CE3 to CE1 is the same as the preceding process.
A non-backbone area (Area 1) is configured between PE1 and CE1, and a backbone area
(Area 0) is configured in Site 1. The backbone area in Site 1 is separated from the VPN
backbone area. To ensure that the backbone areas are contiguous, a virtual link is configured
between PE1 and CE1.
OSPF Domain ID
If inter-area routes are advertised between local and remote OSPF areas, these areas are
considered to be in the same OSPF domain.
Domain IDs identify domains.
Each OSPF domain has one or more domain IDs. If more than one domain ID is
available, one of the domain IDs is a primary ID, and the others are secondary IDs.
If an OSPF instance does not have a specific domain ID, its domain ID is considered null.
Before advertising the remote routes sent by BGP to CEs, PEs need to determine the type of
OSPF routes (Type 3, Type 5, or Type 7) to be advertised to CEs based on domain IDs.
If local domain IDs are the same as or compatible with remote domain IDs in BGP
routes, PEs advertise Type 3 routes.
If local domain IDs are different from or incompatible with remote domain IDs in BGP
routes, PEs advertise Type 5 or Type 7 routes.
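The decision described above can be sketched as follows, assuming each domain is represented by its set of domain IDs (primary plus secondaries); the function and representation are illustrative:

```python
# Choose the LSA type a PE uses when advertising a remote BGP route to a CE,
# based on whether the local and remote domain IDs are compatible.
def lsa_type_for_remote_route(local_ids, remote_ids, nssa=False):
    if not local_ids and not remote_ids:
        compatible = True                       # both domain IDs are null
    else:
        compatible = bool(set(local_ids) & set(remote_ids))
    if compatible:
        return 3                                # inter-area (Type 3) route
    return 7 if nssa else 5                     # external (Type 5/7) route

print(lsa_type_for_remote_route({"0:1"}, {"0:1", "0:2"}))      # 3
print(lsa_type_for_remote_route({"0:1"}, {"0:9"}, nssa=True))  # 7
```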
In Figure 1-755, on PE1, OSPF imports a BGP route destined for 10.1.1.1/32 and then
generates and advertises a Type 5 or Type 7 LSA to CE1. Then, CE1 learns an OSPF route
with 10.1.1.1/32 as the destination address and PE1 as the next hop and advertises the route to
PE2. Therefore, PE2 learns an OSPF route with 10.1.1.1/32 as the destination address and
CE1 as the next hop.
Similarly, CE1 also learns an OSPF route with 10.1.1.1/32 as the destination address and PE2
as the next hop. PE1 learns an OSPF route with 10.1.1.1/32 as the destination address and
CE1 as the next hop.
As a result, CE1 has two equal-cost routes with PE1 and PE2 as next hops respectively, and
the next hop of the routes from PE1 and PE2 to 10.1.1.1/32 is CE1, which leads to a routing
loop.
In addition, the priority of an OSPF route is higher than that of a BGP route. Therefore, on
PE1 and PE2, BGP routes to 10.1.1.1/32 are replaced with the OSPF route, and the OSPF
route with 10.1.1.1/32 as the destination address and CE1 as the next hop is active in the
routing tables of PE1 and PE2.
The BGP route is inactive, and therefore, the LSA generated when this route is imported by
OSPF is deleted, which causes the OSPF route to be withdrawn. As a result, no OSPF route
exists in the routing table, and the BGP route becomes active again. This cycle causes route
flapping.
OSPF VPN provides a few solutions to routing loops, as described in Table 1-198.
Exercise caution when disabling routing loop prevention because it may cause routing loops.
During BGP or OSPF route exchanges, routing loop prevention prevents OSPF routing loops
in VPN sites.
In the inter-AS VPN Option A scenario, if OSPF runs between ASBRs to transmit VPN routes,
the remote ASBR may fail to learn the OSPF routes sent by the local ASBR due to the routing
loop prevention mechanism.
In Figure 1-756, inter-AS VPN Option A is deployed with OSPF running between PE1 and
CE1. CE1 sends VPN routes to CE2.
1. PE1 learns routes to CE1 using the OSPF process in a VPN instance, imports these
routes into MP-BGP, and sends the MP-BGP routes to ASBR1.
2. After receiving the MP-BGP routes, ASBR1 imports the routes into the OSPF process in
a VPN instance and generates Type 3, Type 5, or Type 7 LSAs carrying DN bit 1.
3. ASBR2 uses OSPF to learn these LSAs and checks the DN bit of each LSA. After
learning that the DN bit in each LSA is 1, ASBR2 does not add the routes carried in
these LSAs to its routing table.
The routing loop prevention mechanism prevents ASBR2 from learning the OSPF routes sent
from ASBR1. As a result, CE1 cannot communicate with CE3.
To address the preceding problem, use either of the following methods:
Disable the device from setting the DN bit to 1 in LSAs when importing BGP routes
into OSPF. For example, ASBR1 does not set the DN bit to 1 when importing
MP-BGP routes into OSPF. After ASBR2 receives these routes and finds that the DN bit
in the LSAs carrying these routes is 0, ASBR2 adds the routes to its routing table.
Disable the device from checking the DN bit after receiving LSAs. For example, ASBR1
sets the DN bit to 1 in LSAs when importing MP-BGP routes into OSPF. ASBR2,
however, does not check the DN bit after receiving these LSAs.
The preceding methods can be used based on specific types of LSAs. You can configure a
sender to determine whether to set the DN bit to 1 or configure a receiver to determine
whether to check the DN bit in the Type 3 LSAs based on the router ID of the device that
generates the Type 3 LSAs.
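The two workarounds can be sketched as sender- and receiver-side switches. The flags and structures are hypothetical, for illustration only:

```python
# Sender side: build an LSA when importing an MP-BGP route into OSPF.
def import_bgp_route(set_dn_bit=True):
    return {"dn_bit": 1 if set_dn_bit else 0}

# Receiver side: decide whether the carried route enters the routing table.
def accept_lsa(lsa, check_dn_bit=True):
    if check_dn_bit and lsa["dn_bit"] == 1:
        return False  # loop prevention: the route is not installed
    return True

# Default behavior drops the route; either workaround lets it through.
print(accept_lsa(import_bgp_route()))                      # False
print(accept_lsa(import_bgp_route(set_dn_bit=False)))      # True
print(accept_lsa(import_bgp_route(), check_dn_bit=False))  # True
```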
In the inter-AS VPN Option A scenario shown in Figure 1-757, the four ASBRs are fully
meshed and run OSPF. ASBR2 may receive the Type 3, Type 5, or Type 7 LSAs generated on
ASBR4. ASBR2 denies the Type 5 or Type 7 LSAs, because the VPN route tags carried in the
LSAs are the same as the default VPN route tag of the OSPF process on ASBR2. If ASBR2 is
disabled from checking the DN bit in the LSAs, ASBR2 accepts the Type 3 LSAs, and routing
loops may occur.
To address the routing loop problem caused by Type 3 LSAs, ASBR2 can be disabled from
checking the DN bit in the Type 3 LSAs generated by devices with router ID 1.1.1.1 and
router ID 3.3.3.3. After the configuration is complete, if ASBR2 receives Type 3 LSAs sent by
ASBR4 with router ID 4.4.4.4, ASBR2 checks the DN bit and denies these Type 3 LSAs
because the DN bit is set to 1.
Figure 1-757 Networking for fully meshed ASBRs in the inter-AS VPN Option A scenario
Multi-VPN-Instance CE
OSPF multi-instance generally runs on PEs. Devices that run OSPF multi-instance within user
LANs are called Multi-VPN-Instance CEs (MCEs).
Compared with OSPF multi-instance running on PEs, MCEs have the following
characteristics:
MCEs do not need to support OSPF-BGP association.
MCEs establish one OSPF instance for each service. Different virtual CEs transmit
different services, which ensures LAN security at a low cost.
MCEs implement different OSPF instances on a CE. The key to implementing MCEs is
to disable loop detection and calculate routes directly. MCEs also need to use the
received LSAs with the DN bit set to 1 for route calculation.
Background
As defined in OSPF, stub areas cannot import external routes. This mechanism prevents
external routes from consuming the bandwidth and storage resources of routers in stub areas.
If you need to both import external routes and prevent resource consumption caused by
external routes, you can configure not-so-stubby areas (NSSAs).
There are many similarities between NSSAs and stub areas. However, different from stub
areas, NSSAs can import AS external routes into the OSPF AS and advertise the imported
routes in the OSPF AS without learning external routes from other areas on the OSPF
network.
Related Concepts
N-bit
A router uses the N-bit carried in a Hello packet to identify the area type that it supports.
The same area type must be configured for all routers in an area. If routers have different
area types, they cannot establish OSPF neighbor relationships. Some vendors' devices do
not comply with standard protocols and also set the N-bit in OSPF Database
Description (DD) packets. You can manually set the N-bit on a router to interwork with
such devices.
Type 7 LSA
Type 7 LSAs, which describe imported external routes, are introduced to support NSSAs.
Type 7 LSAs are generated by an ASBR in an NSSA and advertised only within the
NSSA. After an ABR in an NSSA receives Type 7 LSAs, it selectively translates Type 7
LSAs into Type 5 LSAs to advertise external routes to other areas on an OSPF network.
Principles
To advertise external routes imported by an NSSA to other areas, a translator must translate
Type 7 LSAs into Type 5 LSAs. Notes for an NSSA are as follows:
By default, the translator is the ABR with the largest router ID in the NSSA.
The propagate bit (P-bit) is used to notify a translator whether Type 7 LSAs need to be
translated.
Only Type 7 LSAs with the P-bit set and a non-zero forwarding address (FA) can be
translated into Type 5 LSAs. An FA indicates that packets to a destination address will be
forwarded to the address specified by the FA.
The loopback interface address in an area is preferentially selected as the FA. If no loopback interface
exists, the address of the interface that is Up and has the largest logical index in the area is selected as
the FA.
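The translation condition can be sketched as a simple eligibility check. The field names are illustrative:

```python
# A Type 7 LSA is translated into a Type 5 LSA only when its P-bit is set
# and its forwarding address (FA) is non-zero.
def translatable(lsa):
    return lsa.get("p_bit") == 1 and lsa.get("fa", "0.0.0.0") != "0.0.0.0"

print(translatable({"p_bit": 1, "fa": "10.1.1.1"}))  # True
print(translatable({"p_bit": 1, "fa": "0.0.0.0"}))   # False: zero FA
print(translatable({"p_bit": 0, "fa": "10.1.1.1"}))  # False: P-bit clear
```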
Advantages
Multiple ABRs may be deployed in an NSSA. To prevent routing loops caused by default
routes, ABRs do not calculate the default routes advertised by each other.
Background
When multicast and an IGP Shortcut-enabled MPLS TE tunnel are configured on a network,
the outbound interface of the route calculated by an IGP may not be a physical interface but a
TE tunnel interface. The TE tunnel interface on the device sends multicast Join packets over a
unicast route to the multicast source address. The multicast Join packets are transparent to the
devices through which the TE tunnel passes. As a result, these transit devices
cannot generate multicast forwarding entries.
To resolve the problem, configure OSPF local multicast topology (MT) to create a multicast
routing table for multicast packet forwarding.
Principles
On the network shown in Figure 1-759, multicast and an IGP Shortcut-enabled MPLS TE
tunnel are configured, and the TE tunnel passes through Device B. As a result, Device B
cannot generate multicast forwarding entries.
Since the TE tunnel is unidirectional, multicast data packets from the multicast source are sent
to a physical interface of Device B. Device B discards these packets, because Device B has no
multicast forwarding entry. As a result, services are interrupted.
After local MT is enabled, if the outbound interface of a calculated route is an IGP
Shortcut-enabled TE tunnel interface, the route management (RM) module creates an
independent Multicast IGP (MIGP) routing table for the multicast protocol, calculates a
physical outbound interface for the route, and adds the route to the MIGP routing table.
Multicast packets are then forwarded along this route.
After receiving multicast Join packets from Client, interface 1 on Device A forwards these
packets to Device B. With local MT enabled, Device B can generate multicast forwarding
entries.
Definition
Bidirectional Forwarding Detection (BFD) is a mechanism to detect communication faults
between forwarding engines.
To be specific, BFD detects the connectivity of a data protocol along a path between two
systems. The path can be a physical link, a logical link, or a tunnel.
In BFD for OSPF, a BFD session is associated with OSPF. The BFD session quickly detects a
link fault and then notifies OSPF of the fault, which speeds up OSPF's response to network
topology changes.
Purpose
A link fault or a topology change causes routers to recalculate routes. Routing protocol
convergence must be as quick as possible to improve network availability. Link faults are
inevitable, and therefore a solution must be provided to quickly detect faults and notify
routing protocols.
BFD for Open Shortest Path First (OSPF) associates BFD sessions with OSPF. After BFD for
OSPF is configured, BFD quickly detects link faults and notifies OSPF of the faults. BFD for
OSPF accelerates OSPF response to network topology changes.
Table 1-199 describes OSPF convergence speeds before and after BFD for OSPF is
configured.
Table 1-199 OSPF convergence speeds before and after BFD for OSPF is configured
Principles
Figure 1-760 shows a typical network topology with BFD for OSPF configured. The
principles of BFD for OSPF are described as follows:
1. OSPF neighbor relationships are established among the three routers.
2. After a neighbor relationship becomes Full, a BFD session is established.
3. The outbound interface on Device A connected to Device B is interface 1. If the link
between Device A and Device B fails, BFD detects the fault and then notifies Device A
of the fault.
4. Device A processes the neighbor Down event and recalculates routes. The new route to
Device B passes through Device C, with interface 2 as the outbound interface.
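The interaction in steps 3 and 4 can be sketched as follows. The classes and method names are illustrative assumptions, not a device API; the point is only that a BFD Down notification makes OSPF tear down the neighbor and rerun route calculation immediately instead of waiting for the dead timer.

```python
# Minimal sketch of BFD for OSPF: when the BFD session bound to an OSPF
# neighbor goes Down, OSPF marks the neighbor Down at once and triggers
# route recalculation. Names here are illustrative.

class OspfNeighbor:
    def __init__(self, router_id):
        self.router_id = router_id
        self.state = "Full"

class OspfProcess:
    def __init__(self):
        self.neighbors = {}
        self.recalculations = 0

    def add_neighbor(self, router_id):
        self.neighbors[router_id] = OspfNeighbor(router_id)

    def on_bfd_down(self, router_id):
        # BFD detects the link fault in milliseconds and notifies OSPF.
        self.neighbors[router_id].state = "Down"
        self.recalculations += 1  # SPF rerun excludes the failed link

ospf = OspfProcess()
ospf.add_neighbor("10.0.0.2")
ospf.on_bfd_down("10.0.0.2")
print(ospf.neighbors["10.0.0.2"].state)  # Down
```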
Definition
Generalized TTL security mechanism (GTSM) is a mechanism that protects services over the
IP layer by checking whether the TTL value in an IP packet header is within a pre-defined
range.
Purpose
On networks, attackers may simulate OSPF packets and keep sending them to a device. After
receiving these packets, the device directly sends them to the control plane for processing
without checking their validity if the packets are destined for the device. As a result, the
control plane is busy processing these packets, resulting in high CPU usage.
GTSM is used to protect the TCP/IP-based control plane against CPU-utilization attacks, such
as CPU-overload attacks.
Principles
GTSM-enabled devices check the TTL value in each received packet against a configured
policy. Packets that fail the check are discarded or, depending on the configured default
action, sent to the control plane, which protects the devices against CPU-utilization attacks. A
GTSM policy involves the
following items:
following items:
Source address of the IP packet sent to the device
VPN instance to which the packet belongs
Protocol number of the IP packet (89 for OSPF, and 6 for BGP)
Source port number and destination port number of protocols above TCP/UDP
Valid TTL range
GTSM is implemented as follows:
For directly connected OSPF neighbors, the TTL value of the unicast protocol packets to
be sent is set to 255.
For multi-hop neighbors, a reasonable TTL range is defined.
The applicability of GTSM is as follows:
GTSM takes effect on unicast packets rather than multicast packets. This is because the
TTL value of multicast packets can only be 255, and therefore GTSM is not needed to
protect against multicast packets.
GTSM does not support tunnel-based neighbors.
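The policy match and TTL check described above can be sketched as follows. The field names and the policy representation are assumptions for illustration; only the logic (match the policy tuple, then verify the TTL range) reflects the text.

```python
# Hedged sketch of a GTSM policy check: match a packet against the policy
# tuple (source address, VPN instance, protocol number, ports) and verify
# that its TTL falls in the valid range. Field names are illustrative.

def gtsm_check(packet, policy):
    keys = ("src", "vpn", "proto", "sport", "dport")
    if any(policy[k] is not None and packet[k] != policy[k] for k in keys):
        return "not-matched"          # policy does not apply to this packet
    lo, hi = policy["ttl_range"]
    return "accept" if lo <= packet["ttl"] <= hi else "drop"

# OSPF (protocol 89) from a directly connected neighbor must arrive with
# TTL 255, since the sender sets TTL 255 and the packet travels one hop.
policy = {"src": "10.1.1.2", "vpn": None, "proto": 89,
          "sport": None, "dport": None, "ttl_range": (255, 255)}
ok = {"src": "10.1.1.2", "vpn": None, "proto": 89,
      "sport": None, "dport": None, "ttl": 255}
spoofed = dict(ok, ttl=250)  # a remote attacker cannot restore a decremented TTL
print(gtsm_check(ok, policy))       # accept
print(gtsm_check(spoofed, policy))  # drop
```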
Definition
Routers periodically send Hello packets through OSPF interfaces. By exchanging Hello
packets, the routers establish and maintain the neighbor relationship, and elect the DR and the
Backup Designated Router (BDR) on the multiple-access network (broadcast or NBMA
network). OSPF uses a Hello timer to control the interval at which Hello packets are sent. A
router can send Hello packets again only after the Hello timer expires. Neighbors keep
waiting to receive Hello packets until the Hello timer expires. This process delays the
establishment of OSPF neighbor relationships or election of the DR and the BDR.
Enabling Smart-discover can solve the preceding problem.
Without Smart-discover:
Hello packets are sent only after the Hello timer expires, at the Hello interval.
Neighbors keep waiting to receive Hello packets within the Dead interval.
With Smart-discover:
Hello packets are sent immediately, regardless of whether the Hello timer expires.
Neighbors can receive packets and change the neighbor state immediately.
Principles
In the following situations, Smart-discover-enabled interfaces can send Hello packets to
neighbors regardless of whether the Hello timer expires:
On broadcast or NBMA networks, neighbor relationships can be established and a DR
and a BDR can be elected rapidly.
− The neighbor status becomes 2-way for the first time or returns to Init from 2-way
or a higher state.
− The interface status of the DR or BDR on a multiple-access network changes.
On P2P or P2MP networks, neighbor relationships can be established rapidly. The
establishment of neighbor relationships on a P2P or P2MP network is the same as that on
a broadcast or NBMA network.
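The triggering conditions above can be condensed into a small decision sketch. The event names are invented labels for the listed conditions, not configuration keywords.

```python
# Tiny sketch of the Smart-discover decision: normally a Hello is sent only
# when the Hello timer fires, but with Smart-discover enabled the interface
# also sends one immediately on the triggering events listed above.
# Event names are illustrative.

TRIGGER_EVENTS = {
    "neighbor-2way-first",      # neighbor reaches 2-way for the first time
    "neighbor-fell-from-2way",  # neighbor returns to Init from 2-way or higher
    "dr-bdr-change",            # DR/BDR interface status change on the segment
}

def should_send_hello(timer_expired, smart_discover, event=None):
    if timer_expired:
        return True  # ordinary periodic Hello
    return smart_discover and event in TRIGGER_EVENTS

print(should_send_hello(False, True, "dr-bdr-change"))   # True
print(should_send_hello(False, False, "dr-bdr-change"))  # False
```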
Background
When a new device is deployed on a network or a device is restarted, network traffic may be
lost during BGP route convergence because IGP routes converge more quickly than BGP
routes.
OSPF-BGP synchronization can address this problem.
Purpose
If a backup link exists, BGP traffic may be lost during traffic switchback because BGP routes
converge more slowly than OSPF routes do.
In Figure 1-761, Device A, Device B, Device C, and Device D run OSPF and establish IBGP
connections. Device C functions as the backup of Device B. When the network is stable, BGP
and OSPF routes converge completely on the router.
In most cases, traffic from Device A to 10.3.1.0/30 passes through Device B. If Device B fails,
traffic is switched to Device C. After Device B recovers, traffic is switched back to Device B.
Because OSPF routes converge faster than BGP routes, OSPF route convergence is complete
while BGP route convergence is still in progress. As a result, Device B does not yet have the
BGP route to 10.3.1.0/30.
When packets from Device A to 10.3.1.0/30 reach Device B, Device B discards them because
it has no route to 10.3.1.0/30, causing packet loss during the switchback.
Principles
If OSPF-BGP synchronization is configured on a device, the device remains as a stub router
during the set synchronization period. During this period, the link metric in the LSA
advertised by the device is set to the maximum value (65535), instructing other OSPF routers
not to use it as a transit router for data forwarding.
In Figure 1-761, OSPF-BGP synchronization is enabled on Device B. In this situation, before
BGP route convergence is complete, Device A keeps forwarding data through Device C rather
than Device B. Traffic switches back only after BGP route convergence on Device B is complete.
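The stub-router behavior during the synchronization period can be sketched as follows. The function signature and the timer model are assumptions for illustration; the constant 65535 is the maximum link metric named in the text.

```python
# Illustrative sketch of OSPF-BGP synchronization: while the synchronization
# period runs and BGP has not converged, the router advertises the maximum
# link metric (65535) in its LSAs so that other OSPF routers do not use it
# as a transit node.

MAX_METRIC = 65535

def advertised_metric(real_metric, sync_period_running, bgp_converged):
    if sync_period_running and not bgp_converged:
        return MAX_METRIC  # remain a stub router, repel transit traffic
    return real_metric     # normal metric once BGP has converged

print(advertised_metric(10, sync_period_running=True, bgp_converged=False))   # 65535
print(advertised_metric(10, sync_period_running=False, bgp_converged=True))   # 10
```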
Background
LDP-IGP synchronization enables the LDP status and the IGP status to go Up simultaneously,
which helps minimize traffic interruption time if a fault occurs.
A network provides active and standby links for redundancy. If the active link fails, both an
IGP route and an LDP LSP switch from the active link to the standby link. After the active
link recovers, the IGP route switches back to the active link earlier than the LDP LSP. Traffic
therefore switches to the IGP route over the active link but is dropped because the LSP is
unreachable over the new active link. To prevent traffic loss, LDP-IGP synchronization can be
configured.
On a network enabled with LDP-IGP synchronization, the IGP keeps advertising the maximum
cost of the route over the new active link to delay IGP route convergence until LDP
converges. Traffic therefore remains on the standby link during this period. The
backup LSP is torn down only after the LSP on the active link is established.
LDP-IGP synchronization involves the following timers:
Hold-max-cost timer
Delay timer
Implementation
In Figure 1-762, a network has both an active and standby link. When the active link
recovers from any fault, traffic is switched from the standby link to the active link.
During the traffic switchback, the backup LSP is torn down, but a new LSP may not yet be
established over the active link when IGP route convergence is complete. This causes a traffic
interruption for a short period of time. To help prevent this problem, LDP-IGP
synchronization can be configured to delay the IGP route switchback until LDP
converges. The backup LSP is not deleted and continues forwarding traffic until an LSP
over the active link is established. The process of LDP-IGP synchronization is as
follows:
a. A link recovers from a fault.
b. An LDP session is set up between LSR2 and LSR3. The IGP advertises the
maximum cost of the active link to delay the IGP route switchback.
c. Traffic is still forwarded along the backup LSP.
d. Once set up, the LDP session transmits Label Mapping messages, and LDP notifies
the IGP to start synchronization.
e. The IGP advertises the normal cost of the active link, and its routes converge on
the original forwarding path. The LSP is reestablished and delivers entries to the
forwarding table.
The whole process usually takes milliseconds.
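Steps a through e reduce to a simple cost-advertisement decision, sketched below. The state model is a deliberate simplification of the real LDP-IGP synchronization state machine, and the names are illustrative.

```python
# Rough sketch of the LDP-IGP synchronization decision on the recovered
# active link: the IGP advertises the maximum cost until LDP reports that
# the session is up and label mappings are exchanged, or until the
# Hold-max-cost timer expires.

MAX_COST = 65535

def igp_cost(real_cost, ldp_synced, hold_max_cost_expired):
    if ldp_synced or hold_max_cost_expired:
        return real_cost  # IGP converges back to the active link
    return MAX_COST       # keep traffic on the backup LSP

print(igp_cost(10, ldp_synced=False, hold_max_cost_expired=False))  # 65535
print(igp_cost(10, ldp_synced=True, hold_max_cost_expired=False))   # 10
```

Setting the Hold-max-cost timer to advertise the maximum cost permanently corresponds to `hold_max_cost_expired` never becoming true until LDP synchronizes.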
If an LDP session between two nodes on an active link fails, the LSP on the active link is
torn down, but the IGP route for the active link is reachable. In this case, traffic fails to
switch from the primary LSP to a backup LSP and is discarded. To prevent this problem,
LDP-IGP synchronization can be configured so that after an LDP session fails, LDP
notifies an IGP of the failure. The IGP advertises the maximum cost of the failed link,
which enables the route to switch from the active link to the standby link in step with the
LSP switchover from the primary LSP to the backup LSP. The process of LDP-IGP
synchronization is as follows:
a. An LDP session between two nodes on an active link fails.
b. LDP notifies an IGP of failure in the LDP session over which the primary LSP is
established. The IGP then advertises the maximum cost along the active link.
c. The IGP route switches to the standby link.
d. A backup LSP is set up over the standby link and then forwarding entries are
delivered.
If the LDP session cannot be reestablished promptly, you can set the Hold-max-cost timer to
permanently advertise the maximum cost, which keeps traffic on the backup link until the
LDP session on the active link is reestablished.
LDP-IGP synchronization state machine
After LDP-IGP synchronization is enabled on an interface, the LDP-IGP synchronization
state machine operates based on the flowchart shown in Figure 1-763.
When OSPF is used, the status transits based on the flowchart shown in Figure 1-763.
When IS-IS is used, the Hold-normal-cost state does not exist. After the Hold-max-cost timer expires,
IS-IS advertises the actual link cost; however, the displayed state is still Hold-max-cost even though
this state no longer applies.
Usage Scenario
Figure 1-764 shows an LDP-IGP synchronization scenario.
On the network shown in Figure 1-764, an active link and a standby link are established.
LDP-IGP synchronization and LDP FRR are deployed.
Benefits
Packet loss is reduced during an active/standby link switchover, improving network
reliability.
Partial Route Calculation (PRC): calculates only the changed routes when the routes on
the network change.
An OSPF intelligent timer: dynamically adjusts its value, such as the route calculation
interval, based on the user's configuration and the frequency at which an event is
triggered, which ensures rapid and stable network operation.
The OSPF intelligent timer uses exponential backoff, so its value can be accurate to
milliseconds.
PRC
When a node changes on the network, the SPF algorithm conventionally recalculates all
routes. This full calculation takes a long time and consumes excessive CPU resources, which
affects the convergence speed. PRC instead recalculates only the routes affected by the change.
In route calculation, a leaf represents a route, and a node represents a router. Either an SPT or
a leaf change causes a route change. The SPT change is irrelevant to the leaf change. PRC
processes routing information as follows:
If the SPT changes, PRC processes the routing information of all leaves on a changed
node.
If the SPT remains unchanged, PRC does not process the routing information on any
node.
If a leaf changes, PRC processes the routing information on the leaf only.
If a leaf remains unchanged, PRC does not process the routing information on any leaf.
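The four rules above can be sketched as a small selection function. The data structures are illustrative: a mapping from nodes (routers) to the leaves (routes) attached to them stands in for the SPT.

```python
# Sketch of the PRC rules: reprocess all leaves on nodes whose SPT
# attachment changed, plus any individually changed leaves; everything
# else is skipped, which is what makes PRC cheaper than a full SPF run.

def prc_affected_routes(changed_nodes, changed_leaves, leaves_by_node):
    affected = set()
    for node in changed_nodes:        # SPT change: all leaves on that node
        affected.update(leaves_by_node.get(node, ()))
    affected.update(changed_leaves)   # leaf change: only that leaf
    return affected

leaves = {"B": {"10.1.0.0/16", "10.2.0.0/16"}, "C": {"10.3.0.0/16"}}
print(sorted(prc_affected_routes({"B"}, {"10.9.0.0/16"}, leaves)))
# ['10.1.0.0/16', '10.2.0.0/16', '10.9.0.0/16']
```

Note that Device C's leaf 10.3.0.0/16 is never touched, because neither its node nor the leaf itself changed.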
Background
If an interface carrying OSPF services alternates between Up and Down, OSPF neighbor
relationship flapping occurs on the interface. During the flapping, OSPF frequently sends
Hello packets to reestablish the neighbor relationship, synchronizes LSDBs, and recalculates
routes. In this process, a large number of packets are exchanged, adversely affecting neighbor
relationship stability, OSPF services, and other OSPF-dependent services, such as LDP and
BGP. OSPF neighbor relationship flapping suppression can address this problem by delaying
OSPF neighbor relationship reestablishment or preventing service traffic from passing
through flapping links.
Related Concepts
Flapping_event: reported when the status of a neighbor relationship on an interface
changes from Full to a non-Full state. The flapping_event triggers flapping detection.
Flapping-count: number of times flapping has occurred.
Detecting-interval: detection interval. The interval is used to determine whether to trigger a
valid flapping_event.
Threshold: flapping suppression threshold. When the flapping_count reaches or exceeds
threshold, flapping suppression takes effect.
Resume-interval: interval for exiting from OSPF neighbor relationship flapping suppression.
If the interval between two successive valid flapping_events is longer than resume-interval,
the flapping_count is reset.
Implementation
Flapping detection
Each OSPF interface on which OSPF neighbor relationship flapping suppression is enabled
starts a flapping counter. If the interval between two successive neighbor status changes from
Full to a non-Full state is shorter than detecting-interval, a valid flapping_event is recorded,
and the flapping_count increases by 1. When the flapping_count reaches or exceeds
threshold, flapping suppression takes effect. If the interval between two successive neighbor
status changes from Full to a non-Full state is longer than resume-interval, the
flapping_count is reset.
The detecting-interval, threshold, and resume-interval are configurable.
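The flapping counter just described can be modeled as follows. The class is an illustrative simplification: the parameter names mirror detecting-interval, resume-interval, and threshold from the text, but the exact bookkeeping on a device may differ.

```python
# Minimal model of OSPF neighbor relationship flapping detection: each
# Full -> non-Full transition is a candidate flapping event. Events closer
# together than detecting-interval increment the counter; a gap longer
# than resume-interval resets it; suppression starts at the threshold.

class FlapDetector:
    def __init__(self, detecting_interval, resume_interval, threshold):
        self.detecting_interval = detecting_interval
        self.resume_interval = resume_interval
        self.threshold = threshold
        self.count = 0
        self.last_event = None

    def on_full_to_non_full(self, now):
        if self.last_event is not None:
            gap = now - self.last_event
            if gap < self.detecting_interval:
                self.count += 1        # valid flapping_event recorded
            elif gap > self.resume_interval:
                self.count = 0         # neighbor has stabilized
        self.last_event = now
        return self.count >= self.threshold  # True -> suppression takes effect

d = FlapDetector(detecting_interval=60, resume_interval=120, threshold=3)
print([d.on_full_to_non_full(t) for t in (0, 10, 20, 30)])
# [False, False, False, True]
```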
Flapping suppression
Flapping suppression works in either Hold-down or Hold-max-cost mode.
Hold-down mode: In the case of frequent flooding and topology changes during neighbor
relationship establishment, interfaces prevent neighbor relationship reestablishment
during Hold-down suppression, which minimizes LSDB synchronization attempts and
packet exchanges.
Hold-max-cost mode: If the traffic forwarding path changes frequently, interfaces use
65535 as the cost of the flapping link during Hold-max-cost suppression, which prevents
traffic from passing through the flapping link.
Flapping suppression can also work first in Hold-down mode and then in Hold-max-cost
mode.
By default, the Hold-max-cost mode takes effect. The mode and suppression duration can be
changed manually.
If an attack causes frequent neighbor relationship flapping, Hold-down mode can minimize
the impact of the attack.
When an interface enters the flapping suppression state, all neighbor relationships on the interface enter
the state accordingly.
Typical Scenarios
Basic scenario
In Figure 1-765, the traffic forwarding path is Device A -> Device B -> Device C -> Device E
before a link failure occurs. After the link between Device B and Device C fails, the
forwarding path switches to Device A -> Device B -> Device D -> Device E. If the neighbor
relationship between Device B and Device C frequently flaps at the early stage of the path
switchover, the forwarding path will be switched frequently, causing traffic loss and affecting
network stability. If the neighbor relationship flapping meets suppression conditions, flapping
suppression takes effect.
If flapping suppression works in Hold-down mode, the neighbor relationship between
Device B and Device C is prevented from being reestablished during the suppression
period, in which traffic is forwarded along the path Device A -> Device B -> Device D
-> Device E.
If flapping suppression works in Hold-max-cost mode, 65535 is used as the cost of the
link between Device B and Device C during the suppression period, and traffic is
forwarded along the path Device A -> Device B -> Device D -> Device E.
Broadcast scenario
In Figure 1-767, four devices are deployed on the same broadcast network using switches, and
the devices are broadcast network neighbors. If Device C flaps due to a link failure, and
Device A and Device B were deployed at different times (for example, Device A was deployed
earlier) or the flapping suppression parameters on Device A and Device B are different,
Device A first detects the flapping and suppresses Device C. Consequently, the Hello packets
sent by Device A do not carry Device C's router ID. However, Device B has not detected the
flapping yet and still considers Device C a valid node. As a result, the DR candidates
identified by Device A are Device B and Device D, whereas the DR candidates identified by
Device B are Device A, Device C, and Device D. Different DR candidates result in a different
DR election result, which may lead to route calculation errors. To prevent this problem in
scenarios where an interface has multiple neighbors, such as on a broadcast, P2MP, or NBMA
network, all neighbors on the interface are suppressed when the status of a neighbor
relationship last changes to ExStart or Down. Specifically, if Device C flaps, Device A,
Device B, and Device D on the broadcast network are all suppressed. After the network
stabilizes and the suppression timer expires, Device A, Device B, and Device D are restored
to normal status.
Multi-area scenario
In Figure 1-768, Device A, Device B, Device C, Device E, and Device F are connected in area
1, and Device B, Device D, and Device E are connected in backbone area 0. Traffic from
Device A to Device F is preferentially forwarded along an intra-area route, and the forwarding
path is Device A -> Device B -> Device C -> Device E -> Device F. When the neighbor
relationship between Device B and Device C flaps and the flapping meets suppression
conditions, flapping suppression takes effect in the default mode (Hold-max-cost).
Consequently, 65535 is used as the cost of the link between Device B and Device C. However,
the forwarding path remains unchanged because intra-area routes take precedence over
inter-area routes during route selection according to OSPF route selection rules. To prevent
traffic loss in multi-area scenarios, configure Hold-down mode to prevent the neighbor
relationship between Device B and Device C from being reestablished during the suppression
period. During this period, traffic is forwarded along the path Device A -> Device B ->
Device D -> Device E -> Device F.
By default, the Hold-max-cost mode takes effect. The mode can be changed to Hold-down manually.
Scenario with both LDP-IGP synchronization and OSPF neighbor relationship flapping
suppression configured
In Figure 1-769, if the link between PE1 and P1 fails, an LDP LSP switchover is implemented
immediately, causing the original LDP LSP to be deleted before a new LDP LSP is established.
To prevent traffic loss, LDP-IGP synchronization needs to be configured. With LDP-IGP
synchronization, 65535 is used as the cost of the link over which a new LSP is to be
established. After the new LSP is established, the original cost takes effect. The original LSP
is then deleted, and LDP traffic is forwarded along the new LSP.
LDP-IGP synchronization and OSPF neighbor relationship flapping suppression work in
either Hold-down or Hold-max-cost mode. If both functions are configured, Hold-down mode
takes precedence over Hold-max-cost mode, followed by the configured link cost. Table
1-201 lists the suppression modes that take effect in different situations.
Table 1-201 Principles for selecting the suppression modes that take effect in different situations
The table compares, for each OSPF neighbor relationship flapping suppression mode (rows),
the mode that takes effect in each LDP-IGP synchronization state (columns): LDP-IGP
synchronization Hold-down mode, LDP-IGP synchronization Hold-max-cost mode, and
exited from LDP-IGP synchronization suppression.
For example, the link between PE1 and P1 frequently flaps in Figure 1-769, and both
LDP-IGP synchronization and OSPF neighbor relationship flapping suppression are
configured. In this case, the suppression mode is selected based on the preceding principles.
No matter which mode (Hold-down or Hold-max-cost) is selected, the forwarding path is PE1
-> P4 -> P3 -> PE2.
Figure 1-769 Scenario with both LDP-IGP synchronization and OSPF neighbor relationship
flapping suppression configured
Background
In OSPF, intra-area links take precedence over inter-area links during route selection even
when the inter-area links are shorter than the intra-area links. Each OSPF interface belongs to
only one area. As a result, even when a high-speed link exists in an area, traffic of another
area cannot be forwarded along the link. A common method used to solve this problem is to
configure multiple sub-interfaces and add them to different areas. However, this method has a
defect that an independent IP address needs to be configured for each sub-interface and then is
advertised, which increases the total number of routes. In this situation, OSPF multi-area
adjacency is introduced.
OSPF multi-area adjacency allows an OSPF interface to be multiplexed by multiple areas so
that a link can be shared by the areas.
Figure 1-770 Traffic forwarding paths before and after OSPF multi-area adjacency is enabled
In Figure 1-770, the link between Device A and Device B in area 1 is a high-speed link.
In Figure 1-770 a, OSPF multi-area adjacency is disabled on Device A and Device B, and
traffic from Device A to Device B in area 2 is forwarded along the low-speed link of Device A
-> Device C -> Device D -> Device B.
In Figure 1-770 b, OSPF multi-area adjacency is enabled on Device A and Device B, and their
multi-area adjacency interfaces belong to area 2. In this case, traffic from Device A to Device
B in area 2 is forwarded along the high-speed link of Device A -> Device B.
OSPF multi-area adjacency has the following advantages:
Allows interface multiplexing, which reduces OSPF interface resource usage in
multi-area scenarios.
Allows link multiplexing, which prevents a traffic detour to low-speed links and
optimizes the OSPF network.
Related Concepts
Multi-area adjacency interface: indicates the OSPF logical interface created when OSPF
multi-area adjacency is enabled on an OSPF-capable interface (main OSPF interface). The
multi-area adjacency interface is also referred to as a secondary OSPF interface. The
multi-area adjacency interface has the following characteristics:
The multi-area adjacency interface and the main OSPF interface belong to different
OSPF areas.
The network type of the multi-area adjacency interface must be P2P. The multi-area
adjacency interface runs an independent interface state machine and neighbor state
machine.
The multi-area adjacency interface and the main OSPF interface share the same interface
index and packet transmission channel. Whether the multi-area adjacency interface or the
main OSPF interface is selected to forward an OSPF packet is determined by the area ID
carried in the packet header and related configuration.
If the interface is P2P, its multi-area adjacency interface sends packets through multicast.
If the interface is not P2P, its multi-area adjacency interface sends packets through
unicast.
Principles
In Figure 1-771, the link between Device A and Device B in area 1 is a high-speed link. In
area 2, traffic from Device A to Device B is forwarded along the low-speed link of Device A
-> Device C -> Device D -> Device B. If you want the traffic from Device A to Device B in
area 2 to be forwarded along the high-speed link of Device A -> Device B, deploy OSPF
multi-area adjacency.
Specifically, configure OSPF multi-area adjacency on the main interfaces of Device A and
Device B to create multi-area adjacency interfaces. The multi-area adjacency interfaces
belong to area 2.
1. An OSPF adjacency is established between Device A and Device B. For details about the
establishment process, see 1.10.6.2.2 Basic Principles.
2. Route calculation is implemented. For details about the calculation process, see
1.10.6.2.2 Basic Principles.
The optimal path in area 2 obtained by OSPF through calculation is the high-speed link of
Device A -> Device B. In this case, the high-speed link is shared by area 1 and area 2.
Background
As networks develop, voice over IP (VoIP) and online video services pose higher
requirements for real-time transmission. Nevertheless, if a primary link fails, OSPF-enabled
devices need to perform multiple operations, including detecting the fault, updating the
link-state advertisement (LSA), flooding the LSA, calculating routes, and delivering
forwarding information base (FIB) entries before switching traffic to a new link. This process
takes much longer than the minimum delay to which users are sensitive, so the
requirements for real-time transmission cannot be met. OSPF IP FRR can solve this problem.
OSPF IP FRR conforms to dynamic IP FRR defined by standard protocols. With OSPF IP
FRR, devices can switch traffic from a faulty primary link to a backup link, protecting against
a link or node failure.
Major Auto FRR techniques include loop-free alternate (LFA), U-turn, Not-Via, Remote LFA,
and MRT, among which OSPF supports only LFA and Remote LFA.
Related Concepts
OSPF IP FRR
OSPF IP FRR refers to a mechanism in which a device uses the loop-free alternate (LFA)
algorithm to compute the next hop of a backup link and stores the next hop together with the
primary link in the forwarding table. If the primary link fails, the device switches the traffic to
the backup link before routes are converged on the control plane. This mechanism minimizes
the traffic interruption duration and its impact on services.
OSPF IP FRR policy
An OSPF IP FRR policy can be configured to filter alternate next hops. Only the alternate
next hops that match the filtering rules of the policy can be added to the IP routing table.
LFA algorithm
A device uses the shortest path first (SPF) algorithm to calculate the shortest path from each
neighbor with a backup link to the destination node. The device then uses the inequalities
defined in standard protocols and the LFA algorithm to calculate the next hop of the loop-free
backup link that has the smallest cost of the available shortest paths.
Remote LFA
LFA Auto FRR cannot be used to calculate alternate links on large-scale networks, especially
on ring networks. Remote LFA Auto FRR addresses this problem by calculating a PQ node
and establishing a tunnel between the source node of a primary link and the PQ node. If the
primary link fails, traffic can be automatically switched to the tunnel, which improves
network reliability.
P space
P space consists of the nodes through which the shortest path trees (SPTs) with the source
node of a primary link as the root are reachable without passing through the primary link.
Extended P space
Extended P space consists of the nodes through which the SPTs with neighbors of a primary
link's source node as the root are reachable without passing through the primary link.
Q space
Q space consists of the nodes through which the SPTs with the destination node of a primary
link as the root are reachable without passing through the primary link.
PQ node
A PQ node exists both in the extended P space and Q space and is used by Remote LFA as the
destination of a protection tunnel.
Node-and-link protection
Node-and-link protection takes effect when the following requirements are met.
In Figure 1-773, traffic flows from Device S to Device D. The primary link is Device
S->Device E->Device D, and the backup link is Device S->Device N->Device D. The
node-and-link protection inequalities are met. With OSPF IP FRR, Device S switches the traffic to the
backup link if the primary link fails, minimizing the traffic interruption duration.
The traffic to be protected flows along a specified link and node and the following conditions
are met:
The link costs meet the inequality: Distance_opt(N, D) < Distance_opt(N, S) +
Distance_opt(S, D).
The interface costs meet the inequality: Distance_opt(N, D) < Distance_opt(N, E) +
Distance_opt(E, D).
Distance_opt(X, Y) indicates the shortest link from X to Y. S stands for a source node, E for the faulty
node, N for a node along a backup link, and D for a destination node.
On the network shown in Figure 1-774, Remote LFA calculates the PQ node as follows:
1. Calculates the SPTs with all neighbors of P1 as roots. The nodes through which the SPTs
are reachable without passing through the primary link form an extended P space. The
extended P space in this example is {PE1, P1, P3, P4}.
2. Calculates the SPTs with P2 as the root and obtains the Q space {PE2, P4}.
3. Selects the PQ node (P4) that exists both in the extended P space and Q space.
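The three steps above can be reproduced with ordinary SPF runs on the topology with and without the protected link: a node belongs to a root's space exactly when its shortest-path distance is unchanged after the P1-P2 link is removed. The link costs below are assumptions chosen so the result matches the example (a node's own space trivially contains the root itself).

```python
import heapq

def dijkstra(adj, root):
    """Shortest-path distances from root over a weighted adjacency map."""
    dist, pq = {root: 0}, [(0, root)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

def adjacency(links, skip=None):
    adj = {}
    for a, b, w in links:
        if skip and {a, b} == set(skip):
            continue  # drop the protected primary link
        adj.setdefault(a, []).append((b, w))
        adj.setdefault(b, []).append((a, w))
    return adj

# Illustrative bidirectional link costs: P3-P4 costs 2, all others 1.
links = [("PE1", "P1", 1), ("P1", "P2", 1), ("P1", "P3", 1),
         ("P3", "P4", 2), ("P4", "P2", 1), ("P2", "PE2", 1)]
full = adjacency(links)
cut = adjacency(links, skip=("P1", "P2"))  # topology without the primary link

def space(root):
    """Nodes whose shortest path from root does not need the P1-P2 link."""
    d_full, d_cut = dijkstra(full, root), dijkstra(cut, root)
    return {n for n in d_cut if d_cut[n] == d_full.get(n)}

ext_p = space("PE1") | space("P3")  # roots: neighbors of P1
q = space("P2")                     # root: far end of the protected link
print(sorted(ext_p & q))            # the PQ node: ['P4']
```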
OSPF anti-microloop
In Figure 1-774, OSPF remote LFA FRR is enabled, the primary link is PE1 -> P1 -> P2 ->
PE2, and the backup link is PE1 -> P1 -> P3 -> P4 -> P2 -> PE2, and the link P1 -> P3 -> P4
is an LDP tunnel. If the primary link fails, traffic is switched to the backup link, and then
another round of the new primary link calculation begins. Specifically, after P1 completes
route convergence, its next hop becomes P3. However, the route convergence on P3 is slower
than that on P1, and P3's next hop is still P1. As a result, a temporary loop occurs between P1
and P3. OSPF anti-microloop can address this problem by delaying P1 from switching its next
hop until the next hop of P3 becomes P4. Then traffic is switched to the new primary link
(PE1 -> P1 -> P3 -> P4 -> P2 -> PE2), and on the link P1 -> P3 -> P4, traffic is forwarded
based on IP routes.
In a multi-source routing scenario, OSPF FRR is implemented by calculating the Type 3 LSAs
advertised by ABRs of an area for intra-area, inter-area, ASE, or NSSA routing. Inter-area routing is
used as an example to describe how OSPF FRR in a multi-source routing scenario works.
In Figure 1-775, Device B and Device C function as ABRs to forward area 0 and area 1 routes.
Device E advertises an intra-area route. Upon receipt of the route, Device B and Device C
translate it to a Type 3 LSA and flood the LSA to area 0. After OSPF FRR is enabled on
Device A, Device A considers Device B and Device C as its neighbors. Without a fixed
neighbor as the root node, Device A fails to calculate FRR backup next hop. To address this
problem, a virtual node is simulated between Device B and Device C and used as the root
node of Device A, and Device A uses the LFA or remote LFA algorithm to calculate the
backup next hop. This solution converts multi-source routing into single-source routing.
For example, both Device B and Device C advertise the 100.1.1.0/24 route. After Device A
receives the route, it fails to calculate a backup next hop for the route due to a lack of a fixed
root node. To address this problem, a virtual node is simulated between Device B and Device
C and used as the root node of Device A. The cost of the Device B-virtual node link is 0, and
the cost of the Device C-virtual node link is 5. The cost of the virtual node-Device B or
Device C link is the maximum value (65535). If the virtual node advertises the 100.1.1.0/24
route, it will use the smaller cost of the routes advertised by Device B and Device C as the
cost of the route. Device A is configured to consider Device B and Device C as invalid
sources of the 100.1.1.0/24 route and use the LFA or remote LFA algorithm to calculate the
backup next hop for the route, with the virtual node as the root node.
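The virtual node's cost selection described above can be sketched as follows (illustrative
Python, not device code; the cost values are assumptions for the example):

```python
# Sketch of the virtual node's route-cost selection in multi-source OSPF FRR.
# Assumption: route costs are plain integers, as in the example above.

def virtual_node_route_cost(advertised_costs):
    """The virtual node advertises the route with the smaller of the costs
    advertised by the real ABRs (Device B and Device C in the example)."""
    return min(advertised_costs)

# Device B and Device C both advertise 100.1.1.0/24 with different costs;
# the virtual node re-advertises the route with the smaller cost.
print(virtual_node_route_cost([10, 15]))  # 10
```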
In a multi-source routing scenario, OSPF FRR can use the LFA or remote LFA algorithm.
When OSPF FRR uses the remote LFA algorithm, PQ node selection has the following
restrictions:
An LDP LSP will be established between a faulty node and a PQ node, and a virtual
node in a multi-source routing scenario cannot transmit traffic through LDP LSPs. As a
result, the virtual node cannot be selected as a PQ node.
The destination node is not used as a PQ node. After a virtual node is added to a
multi-source routing scenario, the destination node becomes the virtual node. As a result,
the nodes directly connected to the virtual node cannot be selected as PQ nodes.
Derivative Functions
If you bind a Bidirectional Forwarding Detection (BFD) session with OSPF IP FRR, the BFD
session goes Down if BFD detects a link fault. If the BFD session goes Down, OSPF IP FRR
is triggered to switch traffic from the faulty link to the backup link, which minimizes the loss
of traffic.
AuType field values:
1: simple authentication
2: ciphertext authentication
MD5 authentication data is added to an OSPF packet and is not included in the Authentication
field.
Hello Packet
Hello packets are commonly used packets, which are periodically sent by OSPF interfaces to
establish and maintain neighbor relationships. A Hello packet includes information about the
designated router (DR), backup designated router (BDR), timers, and known neighbors.
Figure 1-778 shows the format of a Hello packet.
RouterDeadInterval (32 bits): Dead interval. If a router does not receive any Hello packets
from its neighbors within the dead interval, the neighbors are considered Down.
Designated Router (32 bits): Interface address of the DR.
Backup Designated Router (32 bits): Interface address of the BDR.
Neighbor (32 bits): Router ID of the neighbor.
Table 1-204 lists the address types, interval types, and default intervals used when Hello
packets are transmitted on different networks.
To establish neighbor relationships between routers on the same network segment, set the same
HelloInterval, PollInterval, and RouterDeadInterval values for the routers. PollInterval applies only to
NBMA networks.
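The consistency requirement in the note can be expressed as a simple compatibility check
(illustrative Python sketch; the dictionary field names are assumptions, not the actual
packet-field names):

```python
# Sketch: two routers on the same segment can become neighbors only if their
# Hello timers match. PollInterval is compared only on NBMA networks.

def hello_timers_compatible(local, remote, nbma=False):
    """Return True if the Hello timers allow a neighbor relationship."""
    if local["hello_interval"] != remote["hello_interval"]:
        return False
    if local["router_dead_interval"] != remote["router_dead_interval"]:
        return False
    if nbma and local["poll_interval"] != remote["poll_interval"]:
        return False
    return True

a = {"hello_interval": 10, "router_dead_interval": 40, "poll_interval": 120}
b = {"hello_interval": 10, "router_dead_interval": 40, "poll_interval": 120}
print(hello_timers_compatible(a, b, nbma=True))  # True
```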
DD Packet
During an adjacency initialization, two routers use DD packets to describe their own link state
databases (LSDBs) for LSDB synchronization. A DD packet contains the header of each LSA
in an LSDB. An LSA header uniquely identifies an LSA. The LSA header occupies only a
small portion of the LSA, which reduces the amount of traffic transmitted between routers. A
neighbor can use the LSA header to check whether it already has the LSA. When two routers
exchange DD packets, one functions as the master, and the other functions as the slave. The
master defines a start sequence number and increases the sequence number by one each time
it sends a DD packet. After the slave receives a DD packet, it uses the sequence number
carried in the DD packet for acknowledgement.
Figure 1-779 shows the format of a DD packet.
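The master/slave sequence-number exchange can be sketched as follows (a minimal Python
illustration; real DD packets also carry flags and options not modeled here):

```python
# Sketch: the master defines a start sequence number and increases it by one
# per DD packet sent; the slave echoes the received sequence number back as
# an implicit acknowledgment.

class DDMaster:
    def __init__(self, start_seq=1000):
        self.seq = start_seq

    def next_dd(self, lsa_headers):
        pkt = {"seq": self.seq, "lsa_headers": lsa_headers}
        self.seq += 1                      # incremented per DD packet sent
        return pkt

class DDSlave:
    def ack(self, dd_packet):
        # Acknowledge by echoing the sequence number just received.
        return {"seq": dd_packet["seq"]}

master, slave = DDMaster(), DDSlave()
dd = master.next_dd(["header-of-lsa-1"])
print(slave.ack(dd)["seq"])  # 1000
```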
LSR Packet
After two routers exchange DD packets, they send LSR packets to request each other's LSAs.
The LSR packets contain the summaries of the requested LSAs. Figure 1-780 shows the
format of an LSR packet.
The LS type, Link State ID, and Advertising Router fields can uniquely identify an LSA. If two LSAs
have the same LS type, Link State ID, and Advertising Router fields, a router uses the LS sequence
number, LS checksum, and LS age fields to obtain a required LSA.
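The identification and tie-breaking described in the note can be sketched as follows
(simplified Python; the full comparison rules, including MaxAge handling and sequence
wraparound, are defined in the OSPF specification):

```python
# Sketch: LSA identity and a simplified "which instance is newer" comparison.

def lsa_key(lsa):
    """(LS type, Link State ID, Advertising Router) uniquely identifies an LSA."""
    return (lsa["ls_type"], lsa["link_state_id"], lsa["adv_router"])

def is_newer(a, b):
    """Pick between two instances of the same LSA (simplified rules)."""
    assert lsa_key(a) == lsa_key(b)
    if a["seq"] != b["seq"]:
        return a["seq"] > b["seq"]       # higher LS sequence number wins
    if a["checksum"] != b["checksum"]:
        return a["checksum"] > b["checksum"]
    return a["age"] < b["age"]           # otherwise, the younger instance wins

old = {"ls_type": 1, "link_state_id": "10.0.0.1", "adv_router": "1.1.1.1",
       "seq": 1, "checksum": 700, "age": 600}
new = dict(old, seq=2, age=10)
print(is_newer(new, old))  # True: higher LS sequence number
```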
LSU Packet
A router uses an LSU packet to transmit LSAs requested by its neighbors or to flood its own
updated LSAs. The LSU packet contains a set of LSAs. For multicast and broadcast networks,
LSU packets are multicast to flood LSAs. To ensure reliable LSA flooding, a router uses an
LSAck packet to acknowledge the LSAs contained in an LSU packet that is received from a
neighbor. If an LSA fails to be acknowledged, the router retransmits the LSA to the neighbor.
Figure 1-781 shows the format of an LSU packet.
LSAck Packet
A router uses an LSAck packet to acknowledge the LSAs contained in a received LSU packet.
The LSAs can be acknowledged using LSA headers. LSAck packets can be transmitted over
different links in unicast or multicast mode. Figure 1-782 shows the format of an LSAck
packet.
Router-LSA
A router-LSA describes the link status and cost of a router. Router-LSAs are generated by a
router and advertised within the area to which the router belongs. Figure 1-784 shows the
format of a router-LSA.
Network-LSA
A network-LSA describes the link status of all routers on the local network segment.
Network-LSAs are generated by a DR on a broadcast or non-broadcast multiple access
(NBMA) network and advertised within the area to which the DR belongs. Figure 1-785
shows the format of a network-LSA.
Summary-LSA
A network-summary-LSA describes routes on a network segment in an area. The routes are
advertised to other areas.
An ASBR-summary-LSA describes routes to the ASBR in an area. The routes are advertised
to all areas except the area to which the ASBR belongs.
The two types of summary-LSAs have the same format and are generated by an ABR. Figure
1-786 shows the format of a summary-LSA.
When a default route is advertised, both the Link State ID and Network Mask fields are set to 0.0.0.0.
AS-External-LSA
An AS-external-LSA describes AS external routes. AS-external-LSAs are generated by an
ASBR. Among the five types of LSAs, only AS-external-LSAs can be advertised to all areas
except stub areas and not-so-stubby areas (NSSAs). Figure 1-787 shows the format of an
AS-external-LSA.
When AS-external-LSAs are used to advertise default routes, both the Link State ID and Network Mask
fields are set to 0.0.0.0.
1.10.6.3 Applications
1.10.6.3.1 OSPF GTSM
In Figure 1-788, OSPF runs between the routers, and GTSM is enabled on Device C. The
following are the valid TTL ranges of the packets from all other routers on the network to
Device C:
Device A and Device E are the neighbors of Device C, and the valid TTL range of
packets from Device A and Device E is [255 - hop count + 1, 255].
The valid TTL ranges of the packets from Device B, Device D, and Device F to Device
C are [254, 255], [253, 255], and [252, 255], respectively.
For detailed description of OSPF GTSM, refer to the HUAWEI NE20E-S2 Feature Description -
Security.
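The TTL ranges above follow a single rule, sketched here (illustrative Python; assumes
every router initializes the TTL of its OSPF packets to 255):

```python
# Sketch: a packet originated hop_count hops away from the GTSM-enabled
# device arrives with a TTL in [255 - hop_count + 1, 255].

def gtsm_valid_range(hop_count):
    return (255 - hop_count + 1, 255)

def gtsm_accept(ttl, hop_count):
    lo, hi = gtsm_valid_range(hop_count)
    return lo <= ttl <= hi

print(gtsm_valid_range(2))   # (254, 255): e.g. Device B to Device C
print(gtsm_accept(252, 4))   # True: e.g. Device F to Device C
```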
1.10.7 OSPFv3
1.10.7.1 Introduction
Definition
Open Shortest Path First (OSPF), developed by the Internet Engineering Task Force (IETF), is
a link-state Interior Gateway Protocol (IGP).
At present, OSPF Version 2 (OSPFv2) is used for IPv4, while OSPF Version 3 (OSPFv3),
developed on the basis of OSPFv2, is used for IPv6.
Purpose
The primary purpose of OSPFv3 is to develop a routing protocol independent of any specific
network layer. The internal OSPFv3 router information is redesigned to achieve this purpose.
1.10.7.2 Principles
1.10.7.2.1 OSPFv3 Fundamentals
Running on IPv6, OSPFv3 is an independent routing protocol that is developed on the basis of
OSPFv2.
OSPFv3 and OSPFv2 are the same in terms of the working principles of the Hello packet,
state machine, link-state database (LSDB), flooding, and route calculation.
OSPFv3 packets are encapsulated into IPv6 packets and can be transmitted in unicast or
multicast mode.
LSA Types
Router-LSA (Type 1): Describes the link status and link cost of a router. Router-LSAs are
generated by every router and advertised in the area to which the router belongs.
Network-LSA (Type 2): Describes the link status of all routers on the local network segment.
Network-LSAs are generated by a designated router (DR) and advertised in the area to which
the DR belongs.
Inter-Area-Prefix-LSA (Type 3): Describes routes to a specific network segment in an area.
Inter-Area-Prefix-LSAs are generated by an Area Border Router (ABR) and sent to related
areas.
Inter-Area-Router-LSA (Type 4): Describes routes to an Autonomous System Boundary
Router (ASBR). Inter-Area-Router-LSAs are generated by an ABR and advertised to all
related areas except the area to which the ASBR belongs.
AS-external-LSA (Type 5): Describes routes to a destination outside the AS.
AS-external-LSAs are generated by an ASBR and advertised to all areas except stub areas.
Link-LSA (Type 8): Describes the link-local address and IPv6 address prefixes associated
with the link, as well as the options to be set in the network-LSA originated for the link.
Router Types
Area
When a large number of routers run OSPFv3, LSDBs become very large and require a large
amount of storage space. Large LSDBs also complicate shortest path first (SPF) computation
and are computationally intensive for the routers. Network expansion causes the network
topology to change, which results in route flapping and frequent OSPFv3 packet transmission.
When a large number of OSPFv3 packets are transmitted on the network, bandwidth usage
efficiency decreases. Each change in the network topology causes all routers on the network
to recalculate routes.
OSPFv3 resolves this problem by partitioning an AS into different areas. An area is regarded
as a logical group, and each group is identified by an area ID. A router, not a link, resides at
the border of an area. A network segment or link can belong only to one area. An area must be
specified for each OSPFv3 interface.
OSPFv3 areas include common areas and stub areas, as described in Table 1-217.
Stub Area
Stub areas are specific areas where ABRs do not flood received AS external routes. In stub
areas, routers maintain fewer routing entries and less routing information than the routers in
other areas.
Configuring a stub area is optional. Not every area can be configured as a stub area, because a
stub area is usually a non-backbone area with only one ABR and is located at the AS border.
To ensure the reachability of the routes to destinations outside an AS, the ABR in the stub area
generates a default route and advertises the route to the non-ABRs in the same stub area.
Note the following points when configuring a stub area:
The backbone area cannot be configured as a stub area.
Configure stub area attributes on all routers in the area to be configured as a stub area.
No ASBRs are allowed in the area to be configured as a stub area because AS external
routes cannot be transmitted in the stub area.
OSPFv3 Multi-process
OSPFv3 supports multi-process. Multiple OSPFv3 processes can independently run on the
same router. Route exchange between different OSPFv3 processes is similar to that between
different routing protocols.
Definition
Bidirectional Forwarding Detection (BFD) is a mechanism to detect communication faults
between forwarding engines.
To be specific, BFD detects the connectivity of a data protocol along a path between two
systems. The path can be a physical link, a logical link, or a tunnel.
In BFD for OSPFv3, a BFD session is associated with OSPFv3. The BFD session quickly
detects a link fault and then notifies OSPFv3 of the fault, which speeds up OSPFv3's response
to network topology changes.
Purpose
A link fault or a topology change causes routers to recalculate routes. Routing protocol
convergence must be as quick as possible to improve network availability. Link faults are
inevitable, and therefore a solution must be provided to quickly detect faults and notify
routing protocols.
BFD for Open Shortest Path First version 3 (OSPFv3) associates BFD sessions with OSPFv3.
After BFD for OSPFv3 is configured, BFD quickly detects link faults and notifies OSPFv3 of
the faults. BFD for OSPFv3 accelerates OSPFv3 response to network topology changes.
Table 1-219 describes OSPFv3 convergence speeds before and after BFD for OSPFv3 is
configured.
Table 1-219 OSPFv3 convergence speeds before and after BFD for OSPFv3 is configured
Principles
Figure 1-790 shows a typical network topology with BFD for OSPFv3 configured. The
principles of BFD for OSPFv3 are described as follows:
1. OSPFv3 neighbor relationships are established between these three routers.
2. After a neighbor relationship becomes Full, a BFD session is established.
3. The outbound interface on Device A connected to Device B is interface 1. If the link
between Device A and Device B fails, BFD detects the fault and then notifies Device A
of the fault.
4. Device A processes the event that a neighbor relationship has become Down and
recalculates routes. The new route passes through Device C and reaches Device B, with
interface 2 as the outbound interface.
Background
As networks develop, voice over IP (VoIP) and online video services pose higher
requirements for real-time transmission. Nevertheless, if a primary link fails, OSPFv3-enabled
devices need to perform multiple operations, including detecting the fault, updating the
link-state advertisement (LSA), flooding the LSA, calculating routes, and delivering
forwarding information base (FIB) entries before switching traffic to a new link. This process
takes much longer than the minimum delay to which users are sensitive. As a result, the
requirements for real-time transmission cannot be met.
Principles
With OSPFv3 IP FRR, a device uses the loop-free alternate (LFA) algorithm to compute the
next hop of a backup link and stores the next hop together with the primary link in the
forwarding table. If the primary link fails, the device switches the traffic to the backup link
before routes are converged on the control plane. This mechanism shortens the traffic
interruption duration and minimizes the impact on services. The NE20E supports OSPFv3
Auto FRR.
A device uses the shortest path first (SPF) algorithm to calculate the shortest path from each
neighbor that can provide a backup link to the destination node. The device then uses the
inequalities defined in standard protocols and the LFA algorithm to calculate the next hop of
the loop-free backup link that has the smallest cost of the available shortest paths.
An OSPFv3 Auto FRR policy is used to filter alternate next hops. Only the alternate next hops
that match the filtering rules of the policy can be added to the IP routing table. Users can
configure a desired OSPFv3 FRR policy to filter alternate next hops.
If a Bidirectional Forwarding Detection (BFD) session is bound to OSPFv3 Auto FRR, the
BFD session goes Down if BFD detects a link fault. If the BFD session goes Down, OSPFv3
Auto FRR is triggered on the interface to switch traffic from the faulty link to the backup link,
which minimizes the loss of traffic.
Usage Scenario
OSPFv3 Auto FRR guarantees protection against either a link failure or a node-and-link
failure. Distance_opt (X,Y) indicates the shortest path between node X and node Y.
Link protection: Link protection takes effect when the traffic to be protected flows
along a specified link and the link costs meet the inequality: Distance_opt (N, D) <
Distance_opt (N, S) + Distance_opt (S, D).
− S: source node
− N: node along a backup link
− D: destination node
On the network shown in Figure 1-791, traffic flows from Device S to Device D. The
link cost satisfies the link protection inequality. If the primary link (Device S -> Device
E -> Device D) fails, Device S switches the traffic to the backup link (Device S ->
Device N -> Device E -> Device D), minimizing the traffic interruption duration.
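The link-protection condition can be checked directly from the inequality (illustrative
Python; the distance table values are hypothetical):

```python
# Sketch: the LFA link-protection check. distance_opt(x, y) returns the cost
# of the shortest path between nodes x and y (precomputed table here).

DIST = {
    ("N", "D"): 15,
    ("N", "S"): 10,
    ("S", "D"): 10,
}

def distance_opt(x, y):
    return DIST[(x, y)]

def provides_link_protection(s, n, d):
    """Neighbor N is a loop-free alternate for source S toward destination D
    if Distance_opt(N, D) < Distance_opt(N, S) + Distance_opt(S, D)."""
    return distance_opt(n, d) < distance_opt(n, s) + distance_opt(s, d)

print(provides_link_protection("S", "N", "D"))  # True: 15 < 10 + 10
```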
In a multi-source routing scenario, OSPFv3 FRR is implemented by calculating the Type 3 LSAs
advertised by ABRs of an area for intra-area, inter-area, and ASE routing. Inter-area routing is used
as an example to describe how OSPFv3 FRR in a multi-source routing scenario works.
In Figure 1-793, Device B and Device C function as ABRs to forward area 0 and area 1 routes.
Device E advertises an intra-area route. Upon receipt of the route, Device B and Device C
translate it to a Type 3 LSA and flood the LSA to area 0. After OSPFv3 FRR is enabled on
Device A, Device A considers Device B and Device C as its neighbors. Without a fixed
neighbor as the root node, Device A fails to calculate an FRR backup next hop. To address this
problem, a virtual node is simulated between Device B and Device C and used as the root
node of Device A, and Device A uses the LFA algorithm to calculate the backup next hop.
This solution converts multi-source routing into single-source routing.
For example, both Device B and Device C advertise the 2001:DB8:1::1/64 route. After Device
A receives the route, it fails to calculate a backup next hop for the route due to a lack of a
fixed root node. To address this problem, a virtual node is simulated between Device B and
Device C and used as the root node of Device A. The cost of the Device B-virtual node link is
0, and the cost of the Device C-virtual node link is 5. The cost of the virtual node-Device B or
Device C link is the maximum value (65535). If the virtual node advertises the
2001:DB8:1::1/64 route, it will use the smaller cost of the routes advertised by Device B and
Device C as the cost of the route. Device A is configured to consider Device B and Device C
as invalid sources of the 2001:DB8:1::1/64 route and use the LFA algorithm to calculate the
backup next hop for the route, with the virtual node as the root node.
1.10.7.2.6 OSPFv3 GR
Graceful restart (GR) is a technology used to ensure proper traffic forwarding, especially the
forwarding of key services, during the restart of routing protocols.
Without GR, the master/slave main control board switchover due to various reasons leads to
transient service interruption, and as a result, route flapping occurs on the whole network.
Such route flapping and service interruption are unacceptable on large-scale networks,
especially carrier networks.
Table 1-220 Comparison between master/slave main control board switchovers with and without
GR
Master/slave main control board switchovers without GR:
OSPFv3 neighbor relationships are reestablished.
Routes are recalculated.
FIB entries change.
The entire network detects route changes, and route flapping occurs for a short period of time.
Packets are lost during forwarding, and services are interrupted.
Master/slave main control board switchovers with GR:
OSPFv3 neighbor relationships are reestablished.
Routes are recalculated.
FIB entries remain unchanged.
Except the neighbors of the router on which a master/slave main control board switchover
occurs, other routers do not detect route changes.
No packets are lost during forwarding, and services are not affected.
Definition
As an extension to OSPFv3, OSPFv3 VPN multi-instance enables Provider Edges (PEs) and
Customer Edges (CEs) in VPN networks to run OSPFv3 for interworking and use OSPFv3 to
learn and advertise routes.
Purpose
As a widely used IGP, in most cases, OSPFv3 runs in VPNs. If OSPFv3 runs between PEs
and CEs, and PEs use OSPFv3 to advertise VPN routes to CEs, no other routing protocols
need to be configured on CEs for interworking with PEs, which simplifies management and
configuration of CEs.
Running OSPFv3 between PEs and CEs features the following benefits:
OSPFv3 is used in a site to learn routes. Running OSPFv3 between PEs and CEs can
reduce the number of protocol types supported by CEs.
Similarly, running OSPFv3 both in a site and between PEs and CEs simplifies the work
of network administrators and reduces the number of protocols that network
administrators must be familiar with.
When a network that originally runs OSPFv3 without VPN on the backbone begins to use
BGP/MPLS VPN, running OSPFv3 between PEs and CEs facilitates the transition.
In Figure 1-794, CE1, CE3, and CE4 belong to VPN 1, and the numbers following OSPFv3
indicate the process IDs of the multiple OSPFv3 instances running on PEs.
OSPFv3 Domain ID
If inter-area routes are advertised between local and remote OSPFv3 areas, these areas are
considered to be in the same OSPFv3 domain.
Domain IDs identify domains.
Each OSPFv3 domain has one or more domain IDs. If more than one domain ID is
available, one of the domain IDs is a primary ID, and the others are secondary IDs.
If an OSPFv3 instance does not have a specific domain ID, its domain ID is considered null.
Before advertising the remote routes sent by BGP to CEs, PEs need to determine the type of
OSPFv3 routes (Type 3 or Type 5) to be advertised to CEs based on the domain IDs.
If local domain IDs are the same as or compatible with remote domain IDs in BGP
routes, PEs advertise Type 3 routes.
If local domain IDs are different from or incompatible with remote domain IDs in BGP
routes, PEs advertise Type 5 or Type 7 routes.
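The route-type decision above can be sketched as follows (illustrative Python; "compatible"
is simplified here to shared membership in a domain-ID set, whereas the actual comparison
rules are defined by the OSPFv3 VPN implementation):

```python
# Sketch: which LSA type a PE uses when advertising a remote BGP route to a CE,
# based on whether the local and remote domain IDs match.

def route_type_to_advertise(local_domain_ids, remote_domain_ids):
    """Return the route type a PE advertises to a CE (simplified)."""
    if local_domain_ids & remote_domain_ids:
        return "Type 3"          # same/compatible domains: inter-area route
    return "Type 5 or Type 7"    # different domains: external route

print(route_type_to_advertise({"0.0.0.1"}, {"0.0.0.1", "0.0.0.2"}))  # Type 3
print(route_type_to_advertise({"0.0.0.1"}, {"0.0.0.9"}))  # Type 5 or Type 7
```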
In Figure 1-795, on PE1, OSPFv3 imports a BGP route destined for 2001:db8:1::1/64 and
then generates and advertises a Type 5 or Type 7 LSA to CE1. Then, CE1 learns an OSPFv3
route with 2001:db8:1::1/64 as the destination address and PE1 as the next hop and advertises
the route to PE2. Therefore, PE2 learns an OSPFv3 route with 2001:db8:1::1/64 as the
destination address and CE1 as the next hop.
Similarly, CE1 also learns an OSPFv3 route with 2001:db8:1::1/64 as the destination address
and PE2 as the next hop. PE1 learns an OSPFv3 route with 2001:db8:1::1/64 as the
destination address and CE1 as the next hop.
As a result, CE1 has two equal-cost routes with PE1 and PE2 as next hops respectively, and
the next hops of the routes from PE1 and PE2 to 2001:db8:1::1/64 are CE1, which leads to a
routing loop.
In addition, the priority of an OSPFv3 route is higher than that of a BGP route. Therefore, on
PE1 and PE2, BGP routes to 2001:db8:1::1/64 are replaced with the OSPFv3 route, and the
OSPFv3 route with 2001:db8:1::1/64 as the destination address and CE1 as the next hop is
active in the routing tables of PE1 and PE2.
The BGP route is inactive, and therefore, the LSA generated when this route is imported by
OSPFv3 is deleted, which causes the OSPFv3 route to be withdrawn. As a result, no OSPFv3
route exists in the routing table, and the BGP route becomes active again. This cycle causes
route flapping.
OSPFv3 VPN provides several solutions to routing loops, as described in Table 1-222.
Multi-VPN-Instance CE
OSPFv3 multi-instance generally runs on PEs. Devices that run OSPFv3 multi-instance
within user LANs are called Multi-VPN-Instance CEs (MCEs).
Compared with OSPFv3 multi-instance running on PEs, MCEs have the following
characteristics:
MCEs do not need to support OSPFv3-BGP association.
MCEs establish one OSPFv3 instance for each service. Different virtual CEs transmit
different services, which ensures LAN security at a low cost.
MCEs implement different OSPFv3 instances on a CE. The key to implementing MCEs
is to disable loop detection and calculate routes directly. MCEs also use received LSAs with
the DN bit set to 1 for route calculation.
If Device C fails, traffic is switched to Device B after rerouting. Packets are lost when Device
C recovers.
Because OSPFv3 route convergence is faster than BGP route convergence, OSPFv3
convergence is complete while BGP route convergence is still going on when Device C
recovers. The next hop of the route from Device A to Device D is Device C, which, however,
does not know the route to Device D since BGP convergence on Device C is not complete.
Therefore, Device C discards the packets destined for Device D after receiving them from
Device A, as shown in Figure 1-797.
Figure 1-797 Packet loss during a device restart without OSPFv3-BGP association
IPsec uses two security protocols: Authentication Header (AH) and Encapsulating Security
Payload (ESP).
AH: A protocol that provides data origin authentication, data integrity check, and
anti-replay protection. AH does not encrypt packets to be protected.
AH authentication covers the following IP header fields:
− IP version
− Header length
− Packet length
− Identification
− Protocol
− Source and destination addresses
− Options
ESP: A protocol that provides IP packet encryption and authentication mechanisms
besides the functions provided by AH. The encryption and authentication mechanisms
can be used together or independently.
Background
If the status of an interface carrying OSPFv3 services alternates between Up and Down,
OSPFv3 neighbor relationship flapping occurs on the interface. During the flapping, OSPFv3
frequently sends Hello packets to reestablish the neighbor relationship, synchronizes LSDBs,
and recalculates routes. In this process, a large number of packets are exchanged, adversely
affecting neighbor relationship stability, OSPFv3 services, and other OSPFv3-dependent
services, such as LDP and BGP. OSPFv3 neighbor relationship flapping suppression can
address this problem by delaying OSPFv3 neighbor relationship reestablishment or preventing
service traffic from passing through flapping links.
Related Concepts
Flapping_event: reported when the status of a neighbor relationship on an interface last
changes from Full to a non-Full state. The flapping_event triggers flapping detection.
Flapping_count: number of times flapping has occurred.
Detect-interval: detection interval. The interval is used to determine whether to trigger a
valid flapping_event.
Threshold: flapping suppression threshold. When the flapping_count reaches or exceeds
threshold, flapping suppression takes effect.
Resume-interval: interval for exiting from OSPFv3 neighbor relationship flapping
suppression. If the interval between two successive valid flapping_events is longer than
resume-interval, the flapping_count is reset.
Implementation
Flapping detection
Each OSPFv3 interface on which OSPFv3 neighbor relationship flapping suppression is
enabled starts a flapping counter. If the interval between two successive neighbor status
changes from Full to a non-Full state is shorter than detect-interval, a valid flapping_event is
recorded, and the flapping_count increases by 1. When the flapping_count reaches or exceeds
threshold, flapping suppression takes effect. If the interval between two successive neighbor
status changes from Full to a non-Full state is longer than resume-interval, the flapping_count
is reset.
The detect-interval, threshold, and resume-interval values are configurable.
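The counter behavior can be sketched as follows (illustrative Python; the interval and
threshold defaults shown are assumptions, not the device's actual defaults):

```python
# Sketch of the flapping counter: count status changes from Full to non-Full
# that occur within detect-interval of each other; suppress once the count
# reaches threshold; reset the count after a quiet period of resume-interval.

class FlappingDetector:
    def __init__(self, detect_interval=60, threshold=5, resume_interval=120):
        self.detect_interval = detect_interval    # seconds (assumed default)
        self.threshold = threshold
        self.resume_interval = resume_interval    # seconds (assumed default)
        self.flapping_count = 0
        self.last_event_time = None

    def neighbor_left_full(self, now):
        """Called when a neighbor's state changes from Full to non-Full.
        Returns True if flapping suppression is in effect."""
        if self.last_event_time is not None:
            gap = now - self.last_event_time
            if gap < self.detect_interval:
                self.flapping_count += 1          # valid flapping_event
            elif gap > self.resume_interval:
                self.flapping_count = 0           # quiet period: reset
        self.last_event_time = now
        return self.flapping_count >= self.threshold
```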
Flapping suppression
Flapping suppression works in either Hold-down or Hold-max-cost mode.
Hold-down mode: In the case of frequent flooding and topology changes during neighbor
relationship establishment, interfaces prevent neighbor relationships from being
reestablished during the suppression period, which minimizes LSDB synchronization
attempts and packet exchanges.
Hold-max-cost mode: If the traffic forwarding path changes frequently, interfaces use
65535 as the cost of the flapping link during Hold-max-cost suppression, which prevents
traffic from passing through the flapping link.
Flapping suppression can also work first in Hold-down mode and then in Hold-max-cost
mode.
By default, the Hold-max-cost mode takes effect. The mode and suppression duration can be
changed manually.
If an attack causes frequent neighbor relationship flapping, Hold-down mode can minimize
the impact of the attack.
When an interface enters the flapping suppression state, all neighbor relationships on the interface enter
the state accordingly.
Typical Scenarios
Basic scenario
In Figure 1-799, the traffic forwarding path is Device A -> Device B -> Device C -> Device E
before a link failure occurs. After the link between Device B and Device C fails, the
forwarding path switches to Device A -> Device B -> Device D -> Device E. If the neighbor
relationship between Device B and Device C frequently flaps at the early stage of the path
switchover, the forwarding path will be switched frequently, causing traffic loss and affecting
network stability. If the neighbor relationship flapping meets suppression conditions, flapping
suppression takes effect.
If flapping suppression works in Hold-down mode, the neighbor relationship between
Device B and Device C is prevented from being reestablished during the suppression
period, in which traffic is forwarded along the path Device A -> Device B -> Device D
-> Device E.
If flapping suppression works in Hold-max-cost mode, 65535 is used as the cost of the
link between Device B and Device C during the suppression period, and traffic is
forwarded along the path Device A -> Device B -> Device D -> Device E.
Broadcast scenario
In Figure 1-801, four devices are deployed on the same broadcast network using switches, and
the devices are broadcast network neighbors. If Device C flaps due to a link failure, and
Device A and Device B were deployed at different times (Device A was deployed earlier, for
example) or the flapping suppression parameters on Device A and Device B are different,
Device A first detects the flapping and suppresses Device C. Consequently, the Hello packets
sent by Device A do not carry Device C's router ID. However, Device B has not detected the
flapping yet and still considers Device C a valid node. As a result, the DR candidates
identified by Device A are Device B and Device D, whereas the DR candidates identified by
Device B are Device A, Device C, and Device D. Different DR candidates result in a different
DR election result, which may lead to route calculation errors. To prevent this problem in
scenarios where an interface has multiple neighbors, such as on a broadcast, P2MP, or NBMA
network, all neighbors on the interface are suppressed when the status of a neighbor
relationship last changes to ExStart or Down. Specifically, if Device C flaps, Device A,
Device B, and Device D on the broadcast network are all suppressed. After the network
stabilizes and the suppression timer expires, Device A, Device B, and Device D are restored
to normal status.
Multi-area scenario
In Figure 1-802, Device A, Device B, Device C, Device E, and Device F are connected in area
1, and Device B, Device D, and Device E are connected in backbone area 0. Traffic from
Device A to Device F is preferentially forwarded along an intra-area route, and the forwarding
path is Device A -> Device B -> Device C -> Device E -> Device F. When the neighbor
relationship between Device B and Device C flaps and the flapping meets suppression
conditions, flapping suppression takes effect.
By default, the Hold-max-cost mode takes effect. The mode can be changed to Hold-down manually.
Background
If network-wide OSPFv3 LSAs are flushed, network stability will be adversely affected. In
this case, source tracing must be implemented to locate the root cause of the fault immediately
to minimize the impact. However, OSPFv3 itself does not support source tracing. A
conventional solution is to isolate nodes one by one until the faulty node is located. This solution is
complex and time-consuming. Therefore, a fast source tracing method is required. In this case,
OSPFv3 flush LSA source tracing is introduced, which allows maintenance personnel to
locate the faulty source on any device on the network.
Related Concepts
OSPFv3 flush LSA source tracing
A mechanism that helps locate the device that flushes LSAs. The feature has the following
characteristics:
Uses a new UDP port. Source tracing packets are carried by UDP packets, and the UDP
packets also carry the OSPFv3 LSAs flushed by the current device and are flooded hop
by hop based on the OSPFv3 topology.
Forwards packets along UDP channels which are independent of the channels used to
transmit OSPFv3 packets. Therefore, OSPFv3 flush LSA source tracing supports
incremental deployment. In addition, source tracing does not affect the devices with the
related UDP port disabled.
Supports query of the node that flushed LSAs on any of the devices after OSPFv3 flush
LSA source tracing packets are flooded on the network, which speeds up fault locating
and faulty node isolation.
Is Huawei proprietary.
Flush
Network-wide OSPFv3 LSAs are deleted.
PS-Hello packets
Packets used to negotiate the OSPFv3 flush LSA source tracing capability between OSPFv3
neighbors.
PS-LSA
When a device flushes an OSPFv3 LSA, it generates a PS-LSA carrying information about the
device and brief information about the OSPFv3 LSA.
PS-LSU packets
OSPFv3 flush LSA source tracing packets that carry PS-LSAs.
PS-LSU ACK packets
Acknowledgment packets used to improve the reliability of OSPFv3 flush LSA source tracing packet transmission.
OSPFv3 flush LSA source tracing port
ID of the UDP port that receives and sends OSPFv3 flush LSA source tracing packets. The
default port ID is 50133, which is configurable.
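The transport model can be sketched in a few lines of Python: source tracing packets are ordinary UDP datagrams sent to a dedicated, configurable port, independent of the OSPFv3 protocol channel. (The payload below is a placeholder, not the real PS-LSU format, and IPv4 loopback is used only to make the demo runnable; OSPFv3 source tracing itself runs over the IPv6 topology.)

```python
import socket

# Default OSPFv3 flush LSA source tracing UDP port (configurable on the device).
SOURCE_TRACING_PORT = 50133

# Receiver side: a device with source tracing enabled listens on the dedicated
# port. A device with the port disabled never opens it, so source tracing
# traffic does not affect that device.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", SOURCE_TRACING_PORT))

# Sender side: a source tracing packet is a plain UDP datagram addressed to
# the neighbor's source tracing port.
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"PS-LSU payload (placeholder)", ("127.0.0.1", SOURCE_TRACING_PORT))

data, addr = rx.recvfrom(4096)
tx.close()
rx.close()
```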
Implementation
The implementation of OSPFv3 flush LSA source tracing is as follows:
1. Source tracing capability negotiation
After an OSPFv3 neighbor relationship is established between two devices, they need to
negotiate the source tracing capability through PS-Hello packets.
2. PS-LSA generation and flooding
When a device flushes an OSPFv3 LSA, it generates a PS-LSA carrying information
about the device and brief information about the OSPFv3 LSA, adds the PS-LSA to a
PS-LSU packet, and floods the PS-LSU packet to source tracing-capable neighbors. The
PS-LSU packet is used to locate the faulty source.
When a device receives a PS-LSU packet from a neighbor, the device records the
sequence number of the packet and replies with a PS-LSU ACK packet.
If the device receives a PS-LSU packet whose sequence number is the same as that
recorded for the neighbor, the device discards the packet.
After the device parses a PS-LSU packet, it adds the PS-LSA in the packet to the LSDB.
The device also checks whether the PS-LSA is newer than the corresponding PS-LSA in
the LSDB.
− If the PS-LSA is newer, the device floods it to other neighbors.
− If the PS-LSA is the same as the corresponding PS-LSA in the LSDB, the device
does not process the received PS-LSA.
− If the PS-LSA is older, the device floods the corresponding PS-LSA in the LSDB to
the neighbor.
If the device receives a PS-LSU packet from a neighbor that is currently marked as
source tracing-incapable, the device changes the neighbor status to source tracing-capable.
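The receive-side rules above can be sketched as follows. This is a minimal illustration: the dict-based LSDB, the integer LSA sequence field used for newness comparison, and the flood/ack callbacks are assumptions for the sketch, not Huawei's implementation.

```python
def handle_ps_lsu(lsdb, nbr_seq, neighbor, pkt_seq, ps_lsa, flood, send_ack):
    """Sketch of PS-LSU handling on a source tracing-capable device."""
    send_ack(neighbor, pkt_seq)               # reply with a PS-LSU ACK
    if nbr_seq.get(neighbor) == pkt_seq:      # same sequence number as recorded
        return                                # -> duplicate packet, discard
    nbr_seq[neighbor] = pkt_seq               # record the packet sequence number

    held = lsdb.get(ps_lsa["id"])
    if held is None or ps_lsa["seq"] > held["seq"]:
        lsdb[ps_lsa["id"]] = ps_lsa           # newer: install and flood onward
        flood(ps_lsa, exclude=neighbor)
    elif ps_lsa["seq"] < held["seq"]:
        flood(held, only=neighbor)            # older: return our newer copy
    # equal: the same PS-LSA is already held, so it is not processed further
```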
Typical Scenarios
Scenario where all nodes support source tracing
All nodes on the network support source tracing, and node A is the faulty source. Figure 1-805
shows the networking.
When device A flushes an OSPFv3 LSA, it generates a PS-LSA that carries device A
information and brief information about the flush LSA. Then the PS-LSA is flooded on the
network hop by hop. After the fault occurs, maintenance personnel can log in to any node on
the network to locate device A that keeps sending flush LSAs and isolate device A from the
network.
Scenario where source tracing-incapable nodes are not isolated from source
tracing-capable nodes
All nodes on the network except device C support source tracing, and device A is the faulty
source. In this case, the PS-LSA can be flooded on the entire network. Figure 1-806 shows the
networking.
Figure 1-806 Scenario where source tracing-incapable nodes are not isolated from source
tracing-capable nodes
When device A flushes an OSPFv3 LSA, it generates a PS-LSA that carries device A
information and brief information about the flush LSA. Then the PS-LSA is flooded on the
network hop by hop. When devices B and E negotiate the source tracing capability with
device C, they find that device C does not support source tracing. Therefore, after device B
receives the PS-LSA from device A, device B sends the PS-LSA to device D, but not to device
C. After receiving the flush LSA from device C, device E generates a PS-LSA which carries
information about the advertisement source (device E), flush source (device C), and the flush
LSA, and floods the PS-LSA on the network.
After the fault occurs, maintenance personnel can log in to any device on the network except
device C to locate the faulty node. Two possible faulty nodes can be located in this case: node
A and node C, and they both send the same flush LSA. In this case, device A takes
precedence over device C when the maintenance personnel determine the most possible faulty
source. After device A is isolated, the network recovers.
Scenario where source tracing-incapable nodes are isolated from source tracing-capable
nodes
All nodes on the network except devices C and D support source tracing, and node A is the
faulty source. In this case, the PS-LSA cannot be flooded on the entire network. Figure 1-807
shows the networking.
Figure 1-807 Scenario where source tracing-incapable nodes are isolated from source
tracing-capable nodes
When device A flushes an OSPFv3 LSA, it generates a PS-LSA that carries device A
information and brief information about the flush LSA. However, the PS-LSA can reach only
device B because devices C and D do not support source tracing.
During source tracing capability negotiation, device E finds that device C does not support
source tracing, and device F finds that device D does not support source tracing. After device
E receives the flush LSA from device C, device E helps device C generate and flood a
PS-LSA. Similarly, after device F receives the flush LSA from device D, device F helps
device D generate and flood a PS-LSA.
After the fault occurs:
If maintenance personnel log in to device A or B, the personnel can locate the faulty
source (device A) directly. After device A is isolated, the network recovers.
If the maintenance personnel log in to device E, F, G, or H, the personnel will find that
device E claims device C to be the faulty source and device F claims device D to be the
faulty source.
If the personnel log in to device C or D, they will find that the flush LSA was sent by
device B, not generated by device C or D.
If the personnel log in to device B, they can determine that device A is the faulty source
and isolate device A. After device A is isolated, the network recovers.
Hello packet
Database Description (DD) packet
Link State Request (LSR) packet
Link State Update (LSU) packet
Link State Acknowledgment (LSAck) packet
Hello Packet
Hello packets are commonly used packets, which are periodically sent on OSPFv3 interfaces
to establish and maintain neighbor relationships. A Hello packet includes information about
the designated router (DR), backup designated router (BDR), timers, and known neighbors.
Figure 1-809 shows the format of a Hello packet.
Table 1-227 lists the address types, interval types, and default intervals used when Hello
packets are transmitted on different networks.
To establish neighbor relationships between routers on the same network segment, you must set the
same HelloInterval, PollInterval, and RouterDeadInterval values for the routers. PollInterval applies
only to NBMA networks.
DD Packet
During an adjacency initialization, two routers use DD packets to describe their own link state
databases (LSDBs) for LSDB synchronization. A DD packet contains the header of each LSA
in an LSDB. An LSA header uniquely identifies an LSA. The LSA header occupies only a
small portion of the LSA, which reduces the amount of traffic transmitted between routers. A
neighbor can use the LSA header to check whether it already has the LSA. When two routers
exchange DD packets, one functions as the master and the other functions as the slave. The
master defines a start sequence number. The master increases the sequence number by one
each time it sends a DD packet. After the slave receives a DD packet, it uses the sequence
number carried in the DD packet for acknowledgement.
Figure 1-810 shows the format of a DD packet.
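The master/slave sequence-number rule can be modeled in a few lines. This is a toy illustration of the acknowledgement logic only, not the DD wire format:

```python
class DDMaster:
    """The master defines a start sequence number and increments it by one
    each time it sends a DD packet."""

    def __init__(self, start_seq):
        self.seq = start_seq

    def send_dd(self, lsa_headers):
        pkt = {"seq": self.seq, "headers": lsa_headers}
        self.seq += 1                 # incremented for the next DD packet
        return pkt


class DDSlave:
    """The slave acknowledges by echoing the sequence number it received."""

    def ack(self, dd_packet):
        return {"seq": dd_packet["seq"]}
```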
LSR Packet
After two routers exchange DD packets, they send LSR packets to request each other's LSAs.
The LSR packets contain the summaries of the requested LSAs. Figure 1-811 shows the
format of an LSR packet.
The LS type, Link State ID, and Advertising Router fields can uniquely identify an LSA. If two LSAs
have the same LS type, Link State ID, and Advertising Router fields, a router uses the LS sequence
number, LS checksum, and LS age fields to obtain a required LSA.
LSU Packet
A router uses an LSU packet to transmit LSAs requested by its neighbors or to flood its own
updated LSAs. The LSU packet contains a set of LSAs. For multicast and broadcast networks,
LSU packets are multicast to flood LSAs. To ensure reliable LSA flooding, a router uses an
LSAck packet to acknowledge the LSAs contained in an LSU packet that is received from a
neighbor. If an LSA fails to be acknowledged, the router retransmits the LSA to the neighbor.
Figure 1-812 shows the format of an LSU packet.
LSAck Packet
A router uses an LSAck packet to acknowledge the LSAs contained in a received LSU packet.
The LSAs can be acknowledged using LSA headers. LSAck packets can be transmitted over
different links in unicast or multicast mode. Figure 1-813 shows the format of an LSAck
packet.
Router-LSA
A router-LSA describes the link status and cost of a router. Router-LSAs are generated by a
router and advertised within the area to which the router belongs. Figure 1-815 shows the
format of a router-LSA.
Network-LSA
A network-LSA describes the link status of all routers on the local network segment.
Network-LSAs are generated by a DR on a broadcast or non-broadcast multiple access
(NBMA) network and advertised within the area to which the DR belongs. Figure 1-816
shows the format of a network-LSA.
Inter-Area-Prefix-LSA
An inter-area-prefix-LSA describes routes to a network segment in an area. It is generated
by an area border router (ABR), and the routes are advertised to other areas.
Figure 1-817 shows the format of an inter-area-prefix-LSA.
Inter-Area-Router-LSA
An inter-area-router-LSA describes routes to an ASBR in another area. It is generated by an
ABR, and the routes are advertised to all related areas except the area to which the ASBR belongs.
Figure 1-818 shows the format of an inter-area-router-LSA.
AS-External-LSAs
An AS-external-LSA describes routes to destinations outside the AS. It is originated by an AS boundary router (ASBR).
Figure 1-819 shows the format of an as-external-LSA.
Link-LSAs
Each router generates a link-LSA for each attached link. A link-LSA describes the link-local
address and the IPv6 address prefixes associated with the link, as well as the options to be set
in the network-LSA originated for the link. It is flooded only on the link with which it is associated.
Figure 1-820 shows the format of a Link-LSA.
Intra-Area-Prefix-LSAs
Each router or DR generates one or more intra-area-prefix-LSAs and transmits them in the local
area.
An LSA generated on a router describes the IPv6 address prefix associated with the
router LSA.
An LSA generated on a DR describes the IPv6 address prefix associated with the
network LSA.
Figure 1-821 shows the format of an intra-area-prefix-LSA.
1.10.8 IS-IS
1.10.8.1 Introduction
Definition
Intermediate System to Intermediate System (IS-IS) is a dynamic routing protocol initially
designed by the International Organization for Standardization (ISO) for its Connectionless
Network Protocol (CLNP).
To support IP routing, the Internet Engineering Task Force (IETF) extends and modifies IS-IS
in relevant standards, which enables IS-IS to be applied to both TCP/IP and Open System
Interconnection (OSI) environments. This type of IS-IS is called Integrated IS-IS or Dual
IS-IS.
In this document, IS-IS refers to Integrated IS-IS, unless otherwise stated.
If IS-IS IPv4 and IS-IS IPv6 implement a feature in the same way, details are not provided in this
chapter. For details about the implementation differences, see the 1.10.8.4 Appendixes.
Purpose
As an Interior Gateway Protocol (IGP), IS-IS is used in Autonomous Systems (ASs). IS-IS is
a link state protocol, and it uses the Shortest Path First (SPF) algorithm to calculate routes.
1.10.8.2 Principles
1.10.8.2.1 Basic Concepts of IS-IS
IS-IS Areas
To support large-scale routing networks, IS-IS adopts a two-level structure in a routing
domain. A large domain is divided into areas. Figure 1-822 shows an IS-IS network. The
entire backbone area covers all Level-2 devices in area 1 and Level-1-2 devices in other areas.
Three types of devices on the IS-IS network are described as follows:
Level-1 device
A Level-1 device manages intra-area routing. It establishes neighbor relationships with
only the Level-1 and Level-1-2 devices in the same area and maintains a Level-1 LSDB.
The LSDB contains routing information in the local area. A packet to a destination
beyond this area is forwarded to the nearest Level-1-2 device.
Level-2 device
A Level-2 device manages inter-area routing. It can establish neighbor relationships with
all Level-2 devices and Level-1-2 devices, and maintains a Level-2 LSDB which
contains inter-area routing information.
All Level-2 devices form the backbone network of the routing domain. Level-2 neighbor
relationships are set up between them. They are responsible for communications between
areas. The Level-2 devices in the routing domain must be contiguous to ensure the
continuity of the backbone network. Only Level-2 devices can exchange data packets or
routing information with the devices beyond the routing domain.
Level-1-2 device
A device, which can establish neighbor relationships with both Level-1 devices and
Level-2 devices, is called a Level-1-2 device. A Level-1-2 device can establish Level-1
neighbor relationships with Level-1 devices and Level-1-2 devices in the same area. It
can also establish Level-2 neighbor relationships with Level-2 devices and Level-1-2
devices in other areas. Level-1 devices can be connected to other areas only through
Level-1-2 devices.
A Level-1-2 device maintains two LSDBs: a Level-1 LSDB and a Level-2 LSDB. The
Level-1 LSDB is used for intra-area routing, while the Level-2 LSDB is used for
inter-area routing.
Level-1 devices in different areas cannot establish neighbor relationships. Level-2 devices can establish
neighbor relationships with each other, regardless of the areas to which the Level-2 devices belong.
In general, Level-1 devices are located within an area, Level-2 devices are located between
areas, and Level-1-2 devices are located between Level-1 devices and Level-2 devices.
Interface level
A Level-1-2 device may need to establish only a Level-1 adjacency with one neighbor and
establish only a Level-2 adjacency with another neighbor. In this case, you can set the level of
an interface to control the setting of adjacencies on the interface. Specifically, only Level-1
adjacencies can be established on a Level-1 interface, and only Level-2 adjacencies can be
established on a Level-2 interface.
Area address
The IDP and the High Order DSP (HODSP) of the DSP together identify a routing
domain and the areas in the routing domain; therefore, the combination of the IDP and
the HODSP is referred to as an area address, which is equivalent to an area ID in OSPF.
An area address uniquely identifies an area in a routing domain. The area addresses of
routers in the same Level-1 area must be the same, while the area addresses of routers in
the Level-2 area can be different.
In general, a router can be configured with only one area address. The area address of all
nodes in an area must be the same. In the implementation of a device, an IS-IS process
can be configured with a maximum of three area addresses to support seamless
combination, division, and transformation of areas.
System ID
A system ID uniquely identifies a host or a router in an area. In the device, the length of
the system ID is 48 bits (6 bytes).
A router ID corresponds to a system ID. If a device uses the IP address of Loopback 0
(168.10.1.1) as its router ID, its system ID used in IS-IS can be obtained by performing
the following steps:
− Extend each part of the IP address 168.10.1.1 to 3 digits and add 0 or 0s to the front
of the part that is shorter than 3 digits.
− Divide the extended address 168.010.001.001 into three parts, with each part
consisting of 4 decimal digits.
− The reconstructed 1680.1000.1001 is the system ID.
There are many ways to specify a system ID. Whichever you choose, ensure that the
system ID uniquely identifies a host or a device.
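The three derivation steps above can be expressed as a short routine (a sketch for illustration; the function name is hypothetical):

```python
def ip_to_system_id(router_id: str) -> str:
    """Derive an IS-IS system ID from a dotted-decimal router ID.

    Each octet is zero-padded to three digits, the twelve digits are
    concatenated, and the result is regrouped into three 4-digit parts.
    """
    digits = "".join(f"{int(octet):03d}" for octet in router_id.split("."))
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))
```

For example, the router ID 168.10.1.1 yields the system ID 1680.1000.1001, matching the steps above.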
If the same system ID is configured for more than one device on the same network, network flapping
may occur. To address this problem, IS-IS provides the automatic recovery function. With the function,
if the system detects an IS-IS system ID conflict, it automatically changes the local system ID to resolve
the conflict. The first two bytes of the system ID automatically changed by the system are Fs, and the
last four bytes are randomly generated. For example, FFFF.1234.5678 is such a system ID. If the
conflict persists after the system automatically changes three system IDs, the system no longer resolves
this conflict.
SEL
The role of an SEL (also referred to as NSAP Selector or N-SEL) is similar to that of the
"protocol identifier" field in IP. Each transport protocol corresponds to an SEL value. For IP, the SEL is "00".
NET
A Network Entity Title (NET) indicates the network layer information of an IS itself and
consists of an area ID and a system ID. It does not contain the transport layer
information (SEL = 0). A NET can be regarded as a special NSAP. The length of the
NET field is the same as that of an NSAP, varying from 8 bytes to 20 bytes. For example,
in NET ab.cdef.1234.5678.9abc.00, the area is ab.cdef, the system ID is
1234.5678.9abc, and the SEL is 00.
In general, an IS-IS process is configured with only one NET. When areas need to be
redefined, for example, areas need to be combined or an area needs to be divided into
sub-areas, you can configure multiple NETs.
A maximum of three area addresses can be configured in an IS-IS process, and therefore, you can
configure only a maximum of three NETs. When you configure multiple NETs, ensure that their system
IDs are the same.
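The fixed tail structure of a NET (1-byte SEL preceded by a 6-byte system ID) means its fields can be recovered by counting from the rear, as this illustrative helper shows:

```python
def parse_net(net: str):
    """Split a dotted NET string into (area address, system ID, SEL).

    The last field is the 1-byte SEL, the preceding three 4-digit fields
    form the 6-byte system ID, and everything before that is the area
    address (which may itself contain dots).
    """
    parts = net.split(".")
    sel = parts[-1]
    system_id = ".".join(parts[-4:-1])
    area = ".".join(parts[:-4])
    return area, system_id, sel
```

Applied to the NET ab.cdef.1234.5678.9abc.00 above, it returns the area ab.cdef, the system ID 1234.5678.9abc, and the SEL 00.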
Related Concepts
DIS and Pseudo Node
A Designated Intermediate System (DIS) is an intermediate router elected in IS-IS
communication. A pseudo node simulates a virtual node on a broadcast network and is not a
real router. In IS-IS, a pseudo node is identified by the system ID and 1-byte circuit ID (a
non-zero value) of a DIS.
The DIS is used to create and update pseudo nodes and generate the link state protocol data
units (LSPs) of pseudo nodes. The routers advertise a single link to a pseudo node and obtain
routing information about the entire network through the pseudo node. The router does not
need to exchange packets with all the other routers on the network. Using the DIS and pseudo
nodes simplifies network topology and reduces the length of LSPs generated by routers.
When the network changes, fewer LSPs are generated. Therefore, fewer resources are
consumed.
SPF Algorithm
The SPF algorithm, also named Dijkstra's algorithm, is used in a link-state routing protocol to
calculate the shortest paths to other nodes on a network. In the SPF algorithm, a local router
takes itself as the root and generates a shortest path tree (SPT) based on the network topology
to calculate the shortest path to every destination node on a network. In IS-IS, the SPF
algorithm runs separately in Level-1 and Level-2 databases.
Implementation
All routers on the IS-IS network communicate through the following steps:
Establishment of IS-IS Neighbor Relationships
LSDB Synchronization
Route Calculation
Establishment of IS-IS Neighbor Relationships
On different types of networks, the modes for establishing IS-IS neighbor relationships are
different.
Establishment of a neighbor relationship on a broadcast link
− Device A sends a Level-2 LAN IIH to Device B. After Device B receives the IIH,
Device B detects that the neighbor field in the IIH contains its MAC address, and
sets its neighbor status with Device A to Up.
DIS Election
On a broadcast network, any two routers exchange information. If n routers are available
on the network, n x (n - 1)/2 adjacencies must be established. Each status change of a
router is transmitted to other routers, which wastes bandwidth resources. IS-IS resolves
this problem by introducing the DIS. All routers send information to the DIS, which then
broadcasts the network link status. Using the DIS and pseudo nodes simplifies network
topology and reduces the length of LSPs generated by routers. When the network
changes, fewer LSPs are generated. Therefore, fewer resources are consumed.
A DIS is elected after a neighbor relationship is established. Level-1 and Level-2 DISs
are elected separately. You can configure different priorities for DISs at different levels.
In DIS election, a Level-1 priority and a Level-2 priority are specified for every interface
on every router. A router uses every interface to send IIHs and advertises its priorities in
the IIHs to neighboring routers. The higher the priority, the higher the probability of
being elected as the DIS. If there are multiple routers with the same highest priority on a
broadcast network, the one with the largest MAC address is elected. The DISs at
different levels can be the same router or different routers.
In the DIS election procedure, IS-IS is different from Open Shortest Path First (OSPF).
In IS-IS, DIS election rules are as follows:
− The router with the priority of 0 also takes part in the DIS election.
− When a new router that meets the requirements of being a DIS is added to the
broadcast network, the router is selected as the new DIS, which triggers a new
round of LSP flooding.
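The election rules above reduce to a one-line comparison: highest priority wins, with the largest MAC address as the tie-breaker. A minimal sketch (the tuple data model is an assumption for illustration):

```python
def elect_dis(candidates):
    """Pick the DIS from (priority, mac) tuples.

    Highest priority wins; ties break to the largest MAC address (compared
    here as equal-format hex strings). Unlike OSPF's DR election, a priority
    of 0 still participates, and a better new candidate preempts the DIS.
    """
    return max(candidates, key=lambda c: (c[0], c[1]))
```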
Establishment of a neighbor relationship on a P2P link
The establishment of a neighbor relationship on a P2P link is different from that on a
broadcast link. A neighbor relationship on a P2P link can be established in 2-way or
3-way mode, as shown in Table 1-240. By default, the 3-way handshake mechanism is
used to establish a neighbor relationship on a P2P link.
LSDB Synchronization
IS-IS is a link-state protocol. An IS-IS router obtains first-hand information from other routers
running link-state protocols. Every router generates information about itself, directly
connected networks, and links between itself and directly connected networks. The router
then sends the generated information to other routers through adjacent routers. Every router
saves link state information without modifying it. Finally, every router has the same network
interworking information, and LSDB synchronization is complete. The process of
synchronizing LSDBs is called LSP flooding. In LSP flooding, a router sends an LSP to its
neighbors and the neighbors send the received LSP to their neighbors except the router that
first sends the LSP. The LSP is flooded among the routers at the same level. This
implementation allows each router at the same level to have the same LSP information and
keep a synchronized LSDB.
All routers in the IS-IS routing domain can generate LSPs. A new LSP is generated in any of
the following situations:
A neighbor goes Up or Down.
A related interface goes Up or Down.
Route Calculation
When LSDB synchronization is complete and network convergence is implemented, IS-IS
performs SPF calculation by using LSDB information to obtain the SPT. IS-IS uses the SPT to
create a forwarding database (a routing table).
In IS-IS, link costs are used to calculate shortest paths. The default cost for an interface on a
Huawei router is 10. The cost is configurable. The cost of a route is the sum of the cost of
every outbound interface along the route. There may be multiple routes to a destination,
among which the route with the smallest cost is the optimal route.
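The cost-summing route calculation described above can be sketched with a standard Dijkstra SPF run (an illustrative model, with costs chosen arbitrarily for the demo):

```python
import heapq

def spf(links, root):
    """Dijkstra SPF sketch: links maps node -> {neighbor: outbound cost}.

    The cost of a route is the sum of the costs of the outbound interfaces
    along it; the route with the smallest total cost is the optimal route.
    """
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue                      # stale heap entry, skip
        for nbr, cost in links.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist
```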
Level-1 routers can also calculate the shortest path to Level-2 routers to implement inter-area
route selection. When a Level-1-2 router is connected to other areas, the router sets the value
of the attachment (ATT) bit in its LSP to 1 and sends the LSP to neighboring routers. In the
route calculation process, a Level-1 router selects the nearest Level-1-2 router as an
intermediate router between the Level-1 and Level-2 areas.
Route Leaking
When Level-1 and Level-2 areas both exist on an IS-IS network, Level-2 routers do not
advertise the learned routing information about a Level-1 area and the backbone area to any
other Level-1 area by default. Therefore, Level-1 routers do not know the routing information
beyond the local area. As a result, the Level-1 routers cannot select the optimal routes to the
destination beyond the local area.
With route leaking, Level-1-2 routers can select routes using routing policies, or tags and
advertise the selected routes of other Level-1 areas and the backbone area to the Level-1 area.
Figure 1-827 shows the typical networking for route leaking.
Device A, Device B, Device C, and Device D belong to area 10. Device A and Device B
are Level-1 routers. Device C and Device D are Level-1-2 routers.
Device E and Device F belong to area 20 and are Level-2 routers.
If Device A sends a packet to Device F, the selected optimal route should be Device A ->
Device B -> Device D -> Device E -> Device F because its cost is 40 (10 + 10 + 10 + 10 = 40)
which is less than that of Device A -> Device C -> Device E -> Device F (10 + 50 + 10 = 70).
However, if you check routes on Device A, you can find that the selected route is Device A ->
Device C -> Device E -> Device F, which is not the optimal route from Device A to Device F.
This is because Device A does not know the routes beyond the local area, and therefore, the
packets sent by Device A to other network segments are sent through the default route
generated by the nearest Level-1-2 device.
In this case, you can enable route leaking on the Level-1-2 devices (Device C and Device D).
Then, check the route and you can find that the selected route is Device A -> Device B ->
Device D -> Device E -> Device F.
Route Summarization
On a large-scale IS-IS network, links connected to devices within an IP address range may
alternate between Up and Down. With route summarization, multiple routes with the same IP
prefix are summarized into one route, which prevents route flapping, reduces routing entries
and system resource consumption, and facilitates route management. Figure 1-828 shows the
typical networking for route summarization.
Device A, Device B, and Device C use IS-IS to communicate with each other.
Device A belongs to area 20, and Device B and Device C belong to area 10.
Device A is a Level-2 router. Device B is a Level-1-2 router. Device C is a Level-1
router.
Device B maintains Level-1 and Level-2 LSDBs and leaks the routes to three network
segments (172.1.1.0/24, 172.1.2.0/24, and 172.1.3.0/24) from the Level-1 area to the
Level-2 area. If a link fault causes the Device C interface with IP address 172.1.1.1/24 to
frequently alternate between Up and Down, the status change is advertised to the Level-2
area, triggering frequent LSP flooding and SPF calculation on Device A. As a result, the
CPU usage on Device A increases, and even network flapping occurs.
On Device B, routes to the three network segments in the Level-1 area are summarized
to one route to 172.1.0.0/16, which reduces the number of routing entries on Device B
and minimizes the impact of route flapping in the Level-1 area on route convergence in
the Level-2 area.
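The prerequisite for this summarization, namely that the configured summary prefix covers every specific Level-1 prefix, can be checked with the standard library (an illustrative check, not device logic):

```python
import ipaddress

# Configured summary route and the specific Level-1 prefixes from the example.
summary = ipaddress.ip_network("172.1.0.0/16")
specifics = ["172.1.1.0/24", "172.1.2.0/24", "172.1.3.0/24"]

# Only if every specific prefix falls within the summary can the Level-1-2
# router advertise the single summary route into the Level-2 area.
covered = all(ipaddress.ip_network(p).subnet_of(summary) for p in specifics)
```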
Load Balancing
If multiple equal-cost routes are available on a network, they can load-balance traffic, which
improves link usage and prevents network congestion caused by link overload. Figure 1-829
shows the typical networking for load balancing.
Administrative Tag
Administrative tags carry administrative information about IP address prefixes. When the cost
type is wide, wide-compatible, or compatible and the prefix of the reachable IP address to be
advertised by IS-IS has this cost type, IS-IS adds the administrative tag to the reachability
type-length-value (TLV) in the prefix. In this manner, the administrative tag is advertised
throughout the entire IS-IS area so that routes can be imported or filtered based on the
administrative tag.
synchronization of the LSDBs in the entire network segment using the CSNP and PSNP
mechanisms.
Link Group
In Figure 1-830, router A is dual-homed to the IS-IS network through router B and router C.
The path router A -> router B is primary and the path router A -> router C is backup. The
bandwidth of each link is 100 Gbit/s, and the traffic from Client is transmitted at 150 Gbit/s.
In this situation, both links in the path router A -> router B or the path router A -> router C
need to carry the traffic. If Link-a fails, Link-b takes over the traffic. However, traffic loss
occurs because the bandwidth of Link-b is not sufficient to carry the traffic.
To address this problem, configure link groups. You can add multiple links to a link group. If
one of the links fails and the bandwidth of the remaining links in the group is not sufficient to
carry the traffic, the link group automatically increases the costs of the other links to a
configured value so that this link group is not selected. Then, traffic is switched to another
link group.
In Figure 1-830, Link-a and Link-b belong to link group 1, and Link-c and Link-d belong to
link group 2.
If Link-a fails, link group 1 automatically increases the cost of Link-b so that the traffic
is switched to link group 2.
If both Link-a and Link-c fail, the link groups increase the costs of Link-b and Link-d (to
the same value) so that Link-b and Link-d load-balance the traffic.
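The link group rule can be sketched as follows; the dict-based link records and the function name are assumptions for illustration, not the device's data model:

```python
def adjust_link_group(links, demand, penalty_cost):
    """If a member link is down and the remaining bandwidth of the group
    cannot carry the traffic demand, raise the cost of the surviving links
    to a configured value so this link group is no longer selected."""
    up_links = [l for l in links if l["up"]]
    if len(up_links) < len(links) and sum(l["bandwidth"] for l in up_links) < demand:
        for l in up_links:
            l["cost"] = penalty_cost      # configured high cost
    return links
```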
Background
If the status of an interface carrying IS-IS services alternates between Up and Down, IS-IS
neighbor relationship flapping occurs on the interface. During the flapping, IS-IS frequently
sends Hello packets to reestablish the neighbor relationship, synchronizes LSDBs, and
recalculates routes. In this process, a large number of packets are exchanged, adversely
affecting neighbor relationship stability, IS-IS services, and other IS-IS-dependent services,
such as LDP and BGP. IS-IS neighbor relationship flapping suppression can address this
problem by delaying IS-IS neighbor relationship reestablishment or preventing service traffic
from passing through flapping links.
Related Concepts
Flapping_event: reported when the status of a neighbor relationship on an interface last
changes from Up to Init or Down. The flapping_event triggers flapping detection.
Flapping_count: number of times flapping has occurred.
Detect-interval: interval at which flapping is detected. The interval is used to determine
whether to trigger a valid flapping_event.
Threshold: flapping suppression threshold. When the flapping_count exceeds the threshold,
flapping suppression takes effect.
Resume-interval: interval used to determine whether flapping suppression exits. If the
interval between two valid flapping_events is longer than the resume-interval, flapping
suppression exits.
Implementation
Flapping detection
IS-IS interfaces start a flapping counter. If the interval between two flapping_events is shorter
than the detect-interval, a valid flapping_event is recorded, and the flapping_count increases
by 1. When the flapping_count exceeds the threshold, the system determines that flapping
occurs, and therefore triggers flapping suppression, and sets the flapping_count to 0. If the
interval between two valid flapping_events is longer than the resume-interval before the
flapping_count reaches the threshold again, the system sets the flapping_count to 0 again.
Interfaces start the suppression timer when the status of a neighbor relationship last changes
to Init or Down.
The detect-interval, threshold, and resume-interval are configurable.
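The counter logic above can be sketched as a small state machine; the class and method names are illustrative, and the parameters mirror detect-interval, threshold, and resume-interval:

```python
class FlapDetector:
    """Sketch of the IS-IS flapping counter described above."""

    def __init__(self, detect_interval, threshold, resume_interval):
        self.detect_interval = detect_interval
        self.threshold = threshold
        self.resume_interval = resume_interval
        self.count = 0            # flapping_count
        self.last_event = None    # time of the last flapping_event
        self.last_valid = None    # time of the last valid flapping_event

    def event(self, now):
        """Record a flapping_event at time `now`; return True if suppression triggers."""
        triggered = False
        if self.last_event is not None and now - self.last_event < self.detect_interval:
            # Interval shorter than detect-interval: a valid flapping_event.
            if self.last_valid is not None and now - self.last_valid > self.resume_interval:
                self.count = 0    # quiet period exceeded resume-interval: restart count
            self.count += 1
            self.last_valid = now
            if self.count > self.threshold:
                triggered = True  # flapping suppression takes effect
                self.count = 0    # counter reset after suppression triggers
        self.last_event = now
        return triggered
```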
Flapping suppression
Flapping suppression works in either Hold-down or Hold-max-cost mode.
Hold-down mode: In the case of frequent flooding and topology changes during neighbor
relationship establishment, interfaces prevent neighbor relationships from being
reestablished during the suppression period, which minimizes LSDB synchronization
attempts and packet exchanges.
Hold-max-cost mode: If the traffic forwarding path changes frequently, interfaces use the
maximum cost of the flapping link during the suppression period, which prevents traffic
from passing through the flapping link.
Flapping suppression can also work first in Hold-down mode and then in Hold-max-cost
mode.
By default, the Hold-max-cost mode takes effect. The mode and suppression period can be
changed manually.
When an interface enters the flapping suppression state, all neighbor relationships on the interface enter
the state accordingly.
Basic scenario
In Figure 1-831, the traffic forwarding path is Device A -> Device B -> Device C -> Device E
before a link failure occurs. After the link between Device B and Device C fails, the
forwarding path switches to Device A -> Device B -> Device D -> Device E. If the neighbor
relationship between Device B and Device C frequently flaps at the early stage of the path
switchover, the forwarding path will be switched frequently, causing traffic loss and affecting
network stability. If the neighbor relationship flapping meets suppression conditions, flapping
suppression takes effect.
If flapping suppression works in Hold-down mode, the neighbor relationship between
Device B and Device C is prevented from being reestablished during the suppression
period, in which traffic is forwarded along the path Device A -> Device B -> Device D
-> Device E.
If flapping suppression works in Hold-max-cost mode, the maximum cost is used as the
cost of the link between Device B and Device C during the suppression period, and
traffic is forwarded along the path Device A -> Device B -> Device D -> Device E.
When only one forwarding path exists on the network, the flapping of the neighbor
relationship between any two devices on the path will interrupt traffic forwarding. In Figure
1-832, the traffic forwarding path is Device A -> Device B -> Device C -> Device E. If the
neighbor relationship between Device B and Device C flaps, and the flapping meets
suppression conditions, flapping suppression takes effect. However, if the neighbor
relationship between Device B and Device C is prevented from being reestablished, the whole
network will be divided. Therefore, Hold-max-cost mode (rather than Hold-down mode) is
recommended. If flapping suppression works in Hold-max-cost mode, the maximum cost is
used as the cost of the link between Device B and Device C during the suppression period.
After the network stabilizes and the suppression timer expires, the link is restored.
Broadcast scenario
In Figure 1-833, four devices are deployed on the same broadcast network using switches, and
the devices are broadcast network neighbors. If Device C flaps due to a link failure, and
Device A and Device B were deployed at different times (for example, Device A was deployed
earlier) or the flapping suppression parameters on Device A and Device B are different,
Device A first detects the flapping and suppresses Device C. Consequently, the Hello packets
sent by Device A do not carry Device C's router ID. However, Device B has not detected the
flapping yet and still considers Device C a valid node. As a result, the DR candidates
identified by Device A are Device B and Device D, whereas the DR candidates identified by
Device B are Device A, Device C, and Device D. Different DR candidates result in a different
DR election result, which may lead to route calculation errors. To prevent this problem in
scenarios where an interface has multiple neighbors, such as on a broadcast, P2MP, or NBMA
network, all neighbors on the interface are suppressed when the status of a neighbor
relationship last changes to ExStart or Down. Specifically, if Device C flaps, Device A,
Device B, and Device D on the broadcast network are all suppressed. After the network
stabilizes and the suppression timer expires, Device A, Device B, and Device D are restored
to normal status.
By default, the Hold-max-cost mode takes effect. The mode can be changed to Hold-down manually.
Scenario with both LDP-IGP synchronization and IS-IS neighbor relationship flapping
suppression configured
In Figure 1-835, if the link between PE1 and P1 fails, an LDP LSP switchover is implemented
immediately, causing the original LDP LSP to be deleted before a new LDP LSP is established.
To prevent traffic loss, LDP-IGP synchronization needs to be configured. With LDP-IGP
synchronization, the maximum cost is used as the cost of the new LSP to be established. After
the new LSP is established, the original cost takes effect. Consequently, the original LSP is
deleted, and LDP traffic is forwarded along the new LSP.
LDP-IGP synchronization and IS-IS neighbor relationship flapping suppression work in either
Hold-down or Hold-max-cost mode. If both functions are configured, Hold-down mode takes
precedence over Hold-max-cost mode, followed by the configured link cost. Table 1-241 lists
the suppression modes that take effect in different situations.
Table 1-241 Principles for selecting the suppression modes that take effect in different situations
| IS-IS Neighbor Relationship Flapping Suppression Mode | LDP-IGP Synchronization Hold-down Mode | LDP-IGP Synchronization Hold-max-cost Mode | Exited from LDP-IGP Synchronization Suppression |
| --- | --- | --- | --- |
| IS-IS Neighbor Relationship Flapping Suppression Hold-down Mode | Hold-down | Hold-down | Hold-down |
| IS-IS Neighbor Relationship Flapping Suppression Hold-max-cost Mode | Hold-down | Hold-max-cost | Hold-max-cost |
| Exited from IS-IS Neighbor Relationship Flapping Suppression | Hold-down | Hold-max-cost | Exited from LDP-IGP synchronization and IS-IS neighbor relationship flapping suppression |
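The precedence rule behind the table (Hold-down takes precedence over Hold-max-cost, followed by the configured link cost once suppression exits) can be expressed as a small helper. The mode names and the function are illustrative, not part of the product:

```python
# Precedence: Hold-down > Hold-max-cost > exited (configured link cost applies).
PRECEDENCE = {"hold-down": 2, "hold-max-cost": 1, "exited": 0}

def effective_mode(ldp_igp_sync_mode, isis_flap_mode):
    """Return the suppression mode that takes effect when both LDP-IGP
    synchronization and IS-IS flapping suppression are configured."""
    return max(ldp_igp_sync_mode, isis_flap_mode, key=PRECEDENCE.get)
```

For example, if LDP-IGP synchronization is in Hold-down mode while IS-IS flapping suppression is in Hold-max-cost mode, Hold-down takes effect, matching the table.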
For example, the link between PE1 and P1 frequently flaps in Figure 1-835, and both
LDP-IGP synchronization and IS-IS neighbor relationship flapping suppression are
configured. In this case, the suppression mode is selected based on the preceding principles.
No matter which mode (Hold-down or Hold-max-cost) is selected, the forwarding path is PE1
-> P4 -> P3 -> PE2.
Figure 1-835 Scenario with both LDP-IGP synchronization and IS-IS neighbor relationship
flapping suppression configured
Figure 1-836 Scenario with both Link-bundle and IS-IS neighbor relationship flapping
suppression configured
Partial route calculation (PRC) calculates only those routes which have changed when
the network topology changes.
Link State PDUs (LSP) fast flooding
LSP fast flooding speeds up LSP flooding.
Intelligent timer
The first timeout period of the timer is fixed. If an event that triggers the timer occurs
before the set timer expires, the next timeout period of the timer increases.
The intelligent timer applies to LSP generation and SPF calculation.
I-SPF
In ISO 10589, the Dijkstra algorithm was adopted to calculate routes. When a node changes
on the network, the algorithm recalculates all routes. The calculation requires a long time to
complete and consumes a significant amount of CPU resources, reducing convergence speed.
I-SPF improves on this algorithm: except for the first run, only the nodes that have changed,
rather than all nodes on the network, are involved in the calculation. The SPT
generated using I-SPF is the same as that generated using the previous algorithm. This
significantly decreases CPU usage and speeds up network convergence.
PRC
Similar to I-SPF, PRC calculates only routes that have changed. PRC, however, does not
calculate the shortest path. It updates routes based on the SPT calculated by I-SPF.
In route calculation, a leaf represents a route, and a node represents a device. If the SPT
changes after I-SPF calculation, PRC calculates all the leaves only on the changed node. If the
SPT remains unchanged, PRC calculates only the changed leaves.
For example, if IS-IS is enabled on an interface of a node, the SPT calculated by I-SPF
remains unchanged. In this case, PRC updates only the routes of this interface, which
consumes less CPU resources.
PRC working with I-SPF further improves network convergence performance and replaces
the original SPF algorithm.
On the NE20E, only I-SPF and PRC are used to calculate IS-IS routes.
Intelligent Timer
Although the route calculation algorithm is improved, the long interval for triggering route
calculation also affects the convergence speed. A millisecond-level timer can shorten the
interval. Frequent network changes, however, also consume too much CPU resources. The
SPF intelligent timer addresses these problems.
In most cases, an IS-IS network running normally is stable. The frequent changes on a
network are rather rare, and IS-IS does not calculate routes frequently. Therefore, a short
period (within milliseconds) can be configured as the first interval for route calculation. If the
network topology changes frequently, the interval set by the intelligent timer increases with
the calculation times to reduce CPU consumption.
The LSP generation intelligent timer is similar to the SPF intelligent timer. When the LSP
generation intelligent timer expires, the system generates a new LSP based on the current
topology. The original mechanism uses a timer with fixed intervals, which results in slow
convergence and high CPU consumption. Therefore, the LSP generation timer is designed as
an intelligent timer to respond to emergencies (for example, the interface goes Up or Down)
quickly and speed up network convergence. In addition, when the network changes frequently,
the interval for the intelligent timer becomes longer to reduce CPU consumption.
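The backoff behavior of the intelligent timer can be sketched as follows. The doubling policy, cap, and parameter names are assumptions for illustration; the actual VRP increment is not specified here:

```python
def intelligent_timer_intervals(initial_ms, max_ms, events):
    """Sketch of the intelligent timer: the first timeout is a short,
    fixed initial interval; each retriggering before expiry increases
    the next interval (doubled here, illustratively) up to a cap."""
    intervals = []
    current = initial_ms
    for _ in range(events):
        intervals.append(current)
        current = min(current * 2, max_ms)
    return intervals
```

A first interval of 50 ms responds quickly to a single change, while sustained flapping stretches later intervals toward the cap, reducing CPU consumption.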
Terms
Originating system
The originating system is a device that runs the IS-IS protocol. A single IS-IS process
can advertise LSPs in the role of multiple virtual devices, but the originating system
refers to the real IS-IS process.
Normal system ID
The normal system ID is the system ID of the originating system.
Additional system ID
The additional system ID, assigned by the network administrator, is used to generate
additional or extended LSP fragments. A maximum of 256 additional or extended LSP
fragments can be generated. Like a normal system ID, an additional system ID must be
unique in a routing domain.
Virtual system
The virtual system, identified by an additional system ID, is used to generate extended
LSP fragments. These fragments carry additional system IDs in their LSP IDs.
Principles
IS-IS LSP fragments are identified by the LSP Number field in their LSP IDs. The LSP
Number field is 1 byte. Therefore, an IS-IS process can generate a maximum of 256
fragments. With fragment extension, more information can be carried.
Each additional system ID represents a virtual system, and each virtual system can generate
256 LSP fragments. In addition, multiple virtual systems can be configured. Therefore, an
IS-IS process can generate many more LSP fragments.
After a virtual system and fragment extension are configured, an IS-IS device adds the
contents that cannot be contained in its LSPs to the LSPs of the virtual system and notifies
other devices of the relationship between the virtual system and itself through a special TLV
in the LSPs.
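The fragment capacity described above follows directly from the 1-byte LSP Number field; a minimal sketch (the function name is ours):

```python
def max_lsp_fragments(virtual_systems):
    """The LSP Number field is 1 byte, so each system ID (the normal
    system ID plus each additional system ID) can generate 2**8 = 256
    LSP fragments."""
    FRAGMENTS_PER_SYSTEM = 2 ** 8  # 1-byte LSP Number field
    return FRAGMENTS_PER_SYSTEM * (1 + virtual_systems)
```

Without fragment extension an IS-IS process is limited to 256 fragments; with two virtual systems configured, it can generate up to 768.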
IS Alias ID TLV
Standard protocols define a special type-length-value (TLV): the IS Alias ID TLV.
LSPs with fragment number 0 sent by the originating system and the virtual systems carry
IS Alias ID TLVs that identify the originating system.
Operation Modes
IS-IS devices can use the LSP fragment extension feature in the following modes:
Mode-1
Mode-1 is used when some devices on the network do not support LSP fragment
extension.
In this mode, virtual systems participate in SPF calculation. The originating system
advertises LSPs containing information about links to each virtual system and each
virtual system advertises LSPs containing information about links to the originating
system. In this manner, the virtual systems function the same as the actual devices
connected to the originating system on the network.
Mode-1 is a transitional mode for earlier versions that do not support LSP fragment
extension. In the earlier versions, IS-IS cannot identify Alias ID TLVs. Therefore, the
LSP sent by a virtual system must look like a common IS-IS LSP.
The LSP sent by a virtual system contains the same area address and overload bit as
those in the common LSP. If the LSPs sent by a virtual system contain TLVs specified in
other features, the TLVs must be the same as those in common LSPs.
LSPs sent by a virtual system carry information of the neighbor (the originating system),
and the carried cost is the maximum value minus 1. LSPs sent by the originating system
carry information of the neighbor (the virtual system), and the carried cost is 0. This
mechanism ensures that the virtual system is a node downstream of the originating
system when other devices calculate routes.
In Figure 1-837, Device B does not support LSP fragment extension; Device A supports
LSP fragment extension in mode-1; Device A1 and Device A2 are virtual systems of
Device A. Device A1 and Device A2 send LSPs carrying partial routing information of
Device A. After receiving LSPs from Device A, Device A1, and Device A2, Device B
considers there to be three devices at the peer end and calculates routes normally.
Because the cost of the route from Device A to Device A1 or Device A2 is 0, the cost of
the route from Device B to Device A is equal to that from Device B to Device A1.
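The mode-1 cost rule can be checked with a tiny worked example. The link costs below are illustrative, with 63 assumed as the narrow-metric maximum:

```python
# Mode-1 cost rule: the originating system (Device A) advertises the link
# to each virtual system with cost 0, and the virtual system advertises
# the reverse link with the maximum cost minus 1 (63 - 1 = 62 here, an
# assumed narrow-metric maximum). The B-A link cost of 10 is illustrative.
MAX_COST = 63

links = {
    ("B", "A"): 10,              # real link
    ("A", "A1"): 0,              # originating system -> virtual system
    ("A1", "A"): MAX_COST - 1,   # virtual system -> originating system
}

cost_b_to_a = links[("B", "A")]
cost_b_to_a1 = links[("B", "A")] + links[("A", "A1")]
```

Because the A-to-A1 cost is 0, Device B computes the same cost to Device A1 as to Device A, and the large reverse cost keeps the virtual system strictly downstream.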
Mode-2
Mode-2 is used when all the devices on the network support LSP fragment extension. In
this mode, virtual systems do not participate in SPF calculation. All the devices on the
network know that the LSPs generated by the virtual systems actually belong to the
originating system.
IS-IS working in mode-2 identifies IS Alias ID TLVs, which are used to calculate the
SPT and routes.
In Figure 1-837, Device B supports LSP fragment extension, and Device A supports LSP
fragment extension in mode-2; Device A1 and Device A2 send LSPs carrying some
routing information of Device A. After receiving LSPs from Device A1 and Device A2,
Device B obtains IS Alias ID TLV and learns that the originating system of Device A1
and Device A2 is Device A. Device B then considers information advertised by Device
A1 and Device A2 to be about Device A.
A device that supports LSP fragment extension can resolve LSPs regardless of the mode.
A device that does not support LSP fragment extension can resolve only LSPs sent in mode-1.
Process
After LSP fragment extension is configured, if information is lost because LSPs overflow, the
system restarts the IS-IS process. After being restarted, the originating system loads as much
routing information as possible. Any excessive information beyond the forwarding capability
of the system is added to the LSPs of the virtual systems for transmission. In addition, if a
virtual system with routing information is deleted, the system automatically restarts the IS-IS
process.
Usage Scenario
If there are non-Huawei devices on the network, LSP fragment extension must be set to mode-1.
Otherwise, these devices cannot identify LSPs.
Configuring LSP fragment extension and virtual systems before setting up IS-IS neighbors or
importing routes is recommended. If IS-IS neighbors are set up or routes are imported first
and the information to be carried exceeds the forwarding capability of 256 fragments before
LSP fragment extension and virtual systems are configured, you have to restart the IS-IS
process for the configurations to take effect.
In addition, the 3-way handshake mechanism uses a 32-bit Extended Local Circuit ID field,
which extends the original 8-bit Local Circuit ID field and removes the limit of 255 P2P
links.
1.10.8.2.10 IS-IS TE
IS-IS TE supports MPLS establishment and maintenance of Constraint-based Routed Label
Switched Paths (CR-LSPs).
To establish CR-LSPs, MPLS needs to learn the traffic attributes of all the links in the local
area. MPLS can acquire the TE information of the links through IS-IS.
Traditional routers select the shortest path as the primary route regardless of other factors,
such as bandwidth, even when the path is congested.
On the network shown in Figure 1-838, all the links have the same cost (10). The shortest path
from Device A/Device H to Device E is Device A/Device H → Device B → Device C →
Device D → Device E. Data is forwarded along this shortest path. Therefore, the path Device
A (Device H) → Device B → Device C → Device D → Device E may be congested while the
path Device A/Device H → Device B → Device F → Device G → Device D → Device E is
idle.
To resolve the preceding problem, the cost of the path from Device B to Device C can be set
to 30 so that the traffic is switched to the path Device A/Device H → Device B → Device F
→ Device G → Device D → Device E.
This method eliminates the congestion on the link Device A/Device H → Device B → Device
C → Device D → Device E; however, the other link Device A (Device H) → Device B →
Device F → Device G → Device D → Device E may be congested. In addition, on networks
with complicated topologies, changing the cost of one link may affect multiple routes.
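The path switch described above can be reproduced with a plain Dijkstra run over the Figure 1-838 topology. The code is an illustrative sketch, not router logic; links are listed one-way for brevity:

```python
import heapq

def shortest_path(graph, src, dst):
    """Plain Dijkstra over a {(u, v): cost} dict; returns (cost, path)."""
    pq = [(0, src, [src])]
    seen = set()
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node in seen:
            continue
        seen.add(node)
        if node == dst:
            return cost, path
        for (u, v), c in graph.items():
            if u == node and v not in seen:
                heapq.heappush(pq, (cost + c, v, path + [v]))
    return None

# Topology of Figure 1-838: every link has cost 10 initially.
graph = {("A", "B"): 10, ("B", "C"): 10, ("C", "D"): 10, ("D", "E"): 10,
         ("B", "F"): 10, ("F", "G"): 10, ("G", "D"): 10}

cost1, path1 = shortest_path(graph, "A", "E")  # via C
graph[("B", "C")] = 30                         # raise the B->C cost to 30
cost2, path2 = shortest_path(graph, "A", "E")  # via F and G
```

Before the change the shortest path runs through Device C at cost 40; after the B-to-C cost is raised to 30, traffic switches to the path through Device F and Device G at cost 50, just as described above.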
As an overlay model, MPLS can set up a virtual topology over the physical network topology
and map traffic to the virtual topology, effectively combining MPLS and TE technology into
MPLS TE.
MPLS TE can resolve network congestion problems by allowing carriers to precisely control
the path through which traffic passes and to prevent traffic from passing through congested
nodes. Meanwhile, MPLS TE can reserve resources during the establishment of LSPs to
ensure service quality.
To ensure continuity of services, MPLS TE provides the CR-LSP backup and fast reroute
(FRR) mechanisms. If a link fault occurs, traffic can be switched immediately. Through
MPLS TE, service providers (SPs) can fully utilize the current network resources to provide
diverse services, optimize network resources, and methodically manage the network.
To accomplish the preceding tasks, MPLS TE needs to learn TE information about all devices
on the network. However, MPLS TE itself lacks a mechanism for each device to flood its TE
information throughout the entire network for TE information synchronization, whereas IS-IS
does provide such a mechanism. Therefore, MPLS TE can advertise and synchronize TE
information with the help of IS-IS. To support MPLS TE, IS-IS needs to be extended.
In brief, IS-IS TE collects TE information on IS-IS networks and then transmits the TE
information to the Constrained Shortest Path First (CSPF) module.
Basic Principles
IS-IS TE is an extension of IS-IS intended to support MPLS TE. As defined in standard
protocols, IS-IS TE introduces new TLVs in IS-IS LSPs to carry TE information to help MPLS
implement the flooding, synchronization, and resolution of TE information. Then, IS-IS TE
transmits the resolved TE information to the CSPF module. In MPLS TE, IS-IS TE plays the
role of a porter. Figure 1-839 illustrates the relationships between IS-IS TE, MPLS TE, and
CSPF.
Figure 1-839 Outline of relationships between MPLS TE, CSPF, and IS-IS TE
To carry TE information in LSPs, IS-IS TE defines the following TLVs in standard protocols:
Extended IS reachability TLV
The Extended IS reachability TLV replaces the IS reachability TLV and extends the TLV
format using sub-TLVs. The implementation of sub-TLVs in TLVs is the same as that of
TLVs in LSPs. Sub-TLVs are used to carry TE information configured on physical
interfaces.
Usage Scenario
IS-IS TE helps MPLS TE set up TE tunnels. In Figure 1-840, a TE tunnel is set up between
Device A and Device C.
The metric style can be set to narrow, narrow-compatible, compatible, wide-compatible, or wide mode.
Table 1-245 shows which metric styles are carried in received and sent packets. A device can calculate
routes only when it can receive, send, and process corresponding TLVs. Therefore, to ensure correct data
forwarding on a network, the proper metric style must be configured for each device on the network.
Table 1-245 Metric styles carried in received and sent packets under different metric style configurations
When the metric style is set to compatible, IS-IS sends the information both in narrow and
wide modes.
Process
If the metric style carried in sent packets is changed from narrow to wide:
The information previously carried by TLV type 128, TLV type 130, and TLV type 2 is
now carried by TLV type 135 and TLV type 22.
If the metric style carried in sent packets is changed from wide to narrow:
The information previously carried by TLV type 135 and TLV type 22 is now carried by
TLV type 128, TLV type 130, and TLV type 2.
If the metric style carried in sent packets is changed from narrow or wide to narrow and
wide:
The information previously carried in narrow or wide mode is now carried by TLV type
128, TLV type 130, TLV type 2, TLV type 135, and TLV type 22.
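The TLV mapping in the process above can be summarized in a small lookup; the function is illustrative:

```python
# Reachability TLV types used for each metric style, as listed above.
NARROW_TLVS = {2, 128, 130}  # IS reach, IP internal reach, IP external reach
WIDE_TLVS = {22, 135}        # Extended IS reach, Extended IP reach

def advertised_tlvs(metric_style):
    """Return the reachability TLV types sent for a metric style."""
    if metric_style == "narrow":
        return NARROW_TLVS
    if metric_style == "wide":
        return WIDE_TLVS
    if metric_style == "compatible":
        # Compatible mode sends the information in both narrow and wide form.
        return NARROW_TLVS | WIDE_TLVS
    raise ValueError(metric_style)
```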
Usage Scenario
IS-IS wide metric is used to support IS-IS TE, and the metric style needs to be set to wide,
compatible, or wide-compatible.
Instead of replacing the Hello mechanism of IS-IS, BFD works with IS-IS to rapidly detect the faults
that occur on neighboring devices or links.
On broadcast networks, BFD sessions are established only between the DIS and each
device. No BFD sessions are established between non-DISs.
On broadcast networks, devices (including non-DIS devices) of the same level on a
network segment can establish adjacencies. In BFD for IS-IS, however, BFD sessions are
established only between the DIS and non-DISs. On P2P networks, BFD sessions are
directly established between neighbors.
If a Level-1-2 neighbor relationship is set up between the devices on both ends of a link,
the following situations occur:
− On a broadcast network, IS-IS sets up a Level-1 BFD session and a Level-2 BFD
session.
− On a P2P network, IS-IS sets up only one BFD session.
Process of tearing down a BFD session
− P2P network
If the neighbor relationship established between P2P IS-IS interfaces is not Up,
IS-IS tears down the BFD session.
− Broadcast network
If the neighbor relationship established between broadcast IS-IS interfaces is not Up
or the DIS is reelected on the broadcast network, IS-IS tears down the BFD session.
If the configurations of dynamic BFD sessions are deleted or BFD for IS-IS is disabled
from an interface, all Up BFD sessions established between the interface and its
neighbors are deleted. If the interface is a DIS and the DIS is Up, all BFD sessions
established between the interface and its neighbors are deleted.
If BFD is disabled from an IS-IS process, BFD sessions are deleted from the process.
BFD detects only the one-hop link between IS-IS neighbors because IS-IS establishes only one-hop
neighbor relationships.
Usage Scenario
Dynamic BFD needs to be configured based on the actual network. If the time parameters are
not configured correctly, network flapping may occur.
BFD for IS-IS speeds up route convergence through rapid link failure detection. The
following is a networking example for BFD for IS-IS.
Background
IS-IS Auto fast reroute (FRR) is a dynamic IP FRR technology. An IGP pre-computes an
alternate link based on the LSDBs of the entire network and stores it in the FIB; if a link or
adjacent-node failure is detected, traffic is immediately switched to the alternate link,
minimizing traffic loss. Because IP FRR protects traffic without waiting for route
convergence to complete, it is becoming increasingly popular with carriers.
Major Auto FRR techniques include loop-free alternate (LFA), U-turn, Not-Via, Remote LFA,
and MRT, among which IS-IS supports only LFA and Remote LFA.
Related Concepts
LFA
LFA is an IP FRR technology that calculates the shortest path from the neighbor that can
provide an alternate link to the destination node based on the Shortest Path First (SPF)
algorithm. Then, LFA calculates a loop-free alternate link with the smallest cost based on the
inequality: Distance_opt (N, D) < Distance_opt (N, S) + Distance_opt (S, D).
In the preceding inequality, S, D, and N indicate the source node, destination node, and a
neighbor of S, respectively, and Distance_opt (X,Y) indicates the shortest distance from node
X to node Y.
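The LFA inequality can be written directly as a predicate. `dist` is an illustrative mapping of precomputed shortest distances; the sample values are ours:

```python
def is_loop_free_alternate(dist, n, s, d):
    """LFA inequality from above:
    Distance_opt(N, D) < Distance_opt(N, S) + Distance_opt(S, D).
    `dist` maps (x, y) to the shortest distance from node x to node y."""
    return dist[(n, d)] < dist[(n, s)] + dist[(s, d)]

# Illustrative distances: neighbor N reaches D in 15 without looping back
# through the source S (5 + 20 = 25), so N is a loop-free alternate.
dist = {("N", "D"): 15, ("N", "S"): 5, ("S", "D"): 20}
```

If the neighbor's own distance to the destination were not smaller than the detour through S (for example 30 versus 25), the inequality would fail and the neighbor could not be used as an alternate.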
Remote LFA
LFA Auto FRR cannot be used to calculate alternate links on large-scale networks, especially
on ring networks. Remote LFA Auto FRR addresses this problem by calculating a PQ node
and establishing a tunnel between the source node of a primary link and the PQ node. If the
primary link fails, traffic can be automatically switched to the tunnel, which improves
network reliability.
P space
P space consists of the nodes through which the shortest path trees (SPTs) with the source
node of a primary link as the root are reachable without passing through the primary link.
Extended P space
Extended P space consists of the nodes through which the SPTs with neighbors of a primary
link's source node as the root are reachable without passing through the primary link.
Q space
Q space consists of the nodes through which the SPTs with the destination node of a primary
link as the root are reachable without passing through the primary link.
PQ node
A PQ node exists both in the extended P space and Q space and is used by Remote LFA as the
destination of a protection tunnel.
Figure 1-842 Networking for IS-IS LFA Auto FRR link protection
b. The interface cost of the device satisfies the inequality: Distance_opt (N, D) <
Distance_opt (N, E) + Distance_opt (E, D).
Figure 1-843 Networking for IS-IS LFA Auto FRR node-and-link protection
On the network shown in Figure 1-844, Remote LFA calculates the PQ node as follows:
1. Calculates the SPTs with all neighbors of P1 as roots. The nodes through which the SPTs
are reachable without passing through the primary link form an extended P space. The
extended P space in this example is {PE1, P1, P3, P4}.
2. Calculates the SPTs with P2 as the root and obtains the Q space {PE2, P4}.
3. Selects the PQ node (P4) that exists both in the extended P space and Q space.
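The PQ-node selection in the steps above is a set intersection; a minimal sketch using the spaces from the example:

```python
def pq_nodes(extended_p_space, q_space):
    """PQ candidates are the nodes present in both the extended P space
    and the Q space."""
    return extended_p_space & q_space

# Spaces calculated in the Figure 1-844 example above.
extended_p = {"PE1", "P1", "P3", "P4"}
q = {"PE2", "P4"}
```

Intersecting the two spaces leaves only P4, which Remote LFA uses as the destination of the protection tunnel.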
Background
As the Internet develops, more data, voice, and video information are exchanged over the
Internet. New services, such as e-commerce, online conferencing and auctions, video on
demand, and distance learning, emerge gradually. The new services have high requirements
for network security. Carriers need to prevent data packets from being intercepted or modified
by attackers or unauthorized users. IS-IS authentication applies to the area or interface where
packets need to be protected. Using IS-IS authentication enhances system security and helps
carriers provide safe network services.
Related Concepts
Authentication Classification
Based on packet types, the authentication is classified as follows:
Interface authentication: is configured in the interface view to authenticate Level-1 and
Level-2 IS-to-IS Hello PDUs (IIHs).
Area authentication: is configured in the IS-IS process view to authenticate Level-1
CSNPs, PSNPs, and LSPs.
Routing domain authentication: is configured in the IS-IS process view to authenticate
Level-2 CSNPs, PSNPs, and LSPs.
Based on the authentication modes of packets, the authentication is classified into the
following types:
Simple authentication: The authenticated party directly adds the configured password to
packets for authentication. This authentication mode provides the lowest password
security.
MD5 authentication: uses the MD5 algorithm to encrypt a password before adding the
password to the packet, which improves password security.
Keychain authentication: further improves network security with a configurable key
chain that changes with time.
HMAC-SHA256 authentication: uses the HMAC-SHA256 algorithm to encrypt a
password before adding the password to the packet, which improves password security.
Implementation
IS-IS authentication encrypts IS-IS packets by adding the authentication field to packets to
ensure network security. After receiving IS-IS packets from a remote router, a local router
discards the packets if the authentication passwords in the packets are different from the
locally configured one. This mechanism protects the local router.
IS-IS provides a type-length-value (TLV) to carry authentication information. The TLV
components are as follows:
Type: indicates the TLV type and is 1 byte long. The value defined by ISO is 10,
while the value defined by IP is 133.
Length: indicates the length of the authentication TLV, which is 1 byte.
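The TLV layout above (1-byte Type, 1-byte Length, then the value) can be sketched as a small encoder. Prefixing the value with a one-byte authentication-type octet (1 for simple authentication) follows common IS-IS practice and is an assumption beyond the text above:

```python
def build_auth_tlv(tlv_type, auth_value):
    """Encode an authentication TLV: 1-byte Type, 1-byte Length, then the
    value. Type 10 is the ISO-defined value mentioned above."""
    if len(auth_value) > 255:
        raise ValueError("TLV value exceeds the 1-byte Length field")
    return bytes([tlv_type, len(auth_value)]) + auth_value

# Illustrative: simple authentication (assumed auth-type octet 0x01)
# followed by the password "secret".
tlv = build_auth_tlv(10, b"\x01secret")
```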
Background
If network-wide IS-IS LSPs are deleted, purge LSPs are flooded, which adversely affects
network stability. In this case, source tracing must be implemented to locate the root cause of
the fault immediately to minimize the impact. However, IS-IS itself does not support source
tracing. A conventional solution is to isolate nodes one by one until the faulty node is located.
This solution is complex and time-consuming. Therefore, a fast source tracing method is required.
Related Concepts
PS-PDU: packets that carry information about the node that floods IS-IS purge LSPs.
CAP-PDU: packets used to negotiate the IS-IS purge LSP source tracing capability
between IS-IS neighbors.
IS-IS purge LSP source tracing port: ID of the UDP port that receives and sends IS-IS
purge LSP source tracing packets. The default port ID is 50121, which is configurable.
Implementation
IS-IS purge LSPs do not carry source information. If a device fails on the network, a large
number of purge LSPs are flooded. Without a source tracing mechanism, nodes are isolated
one by one until the faulty node is located, which is labor-intensive and time-consuming.
IS-IS purge LSPs may trigger route flapping on the network or even render routes
unavailable. In this case, the device that floods the purge LSPs must be located and isolated
immediately.
A solution that can address the following issues is required:
Information about the source that floods IS-IS purge LSPs can be obtained when
network routes are unreachable.
The method used to obtain source information must apply to all devices on the network
and support incremental deployment, without compromising routing capabilities.
IS-IS purge LSP source tracing helps locate the device that floods purge LSPs. IS-IS purge
LSP source tracing uses a new UDP port. Source tracing packets are carried by UDP packets,
and the UDP packets also carry the IS-IS purge LSPs sent by the current device and are
flooded hop by hop based on the IS-IS topology.
IS-IS purge LSP source tracing forwards packets along UDP channels which are independent
of the channels used to transmit IS-IS packets. Therefore, IS-IS purge LSP source tracing
supports incremental deployment. In addition, source tracing does not affect the devices with
the related UDP port disabled.
After IS-IS purge LSP source tracing packets are flooded to devices on the network,
information about the node that floods purge LSPs can be queried on any of the devices,
which speeds up fault locating and faulty node isolation.
Capability Negotiation
IS-IS purge LSP source tracing is Huawei proprietary. It uses UDP to carry packets and listens
to the UDP port which is used to receive and send source tracing packets. If a source
tracing-capable Huawei device sends source tracing packets to a source tracing-incapable
Huawei device or non-Huawei device, the source tracing-capable Huawei device may be
incorrectly identified as an attacker. Therefore, the source tracing capability needs to be
negotiated between the devices. In addition, the source tracing-capable device needs to help
the source tracing-incapable device send source tracing information, which also requires
negotiation.
Source tracing capability negotiation depends on IS-IS neighbor relationships. Specifically,
after an IS-IS neighbor relationship is established, the local device initiates source tracing
capability negotiation based on the IP address of the neighbor.
PS-PDU Generation
If a device needs to purge an LSP, it generates and floods a PS-PDU to all its source tracing
neighbors.
If a device receives a purge LSP from a source tracing-incapable neighbor, the device
generates and floods a PS-PDU to all its neighbors. If a device receives the same purge LSP
(with the same LSP ID and sequence number) from more than one source tracing-incapable
neighbor, the device generates only one PS-PDU.
PS-PDU flooding is similar to IS-IS LSP flooding.
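The PS-PDU generation rule above can be sketched in Python as follows. The device generates one PS-PDU per unique purge LSP, keyed by LSP ID and sequence number, even if the same purge LSP arrives from several source tracing-incapable neighbors. Names and values are illustrative and are not part of the device implementation.

```python
def generate_ps_pdus(received_purges):
    """received_purges: list of (lsp_id, seq_num, from_neighbor) tuples
    for purge LSPs received from source tracing-incapable neighbors."""
    seen = set()
    ps_pdus = []
    for lsp_id, seq_num, neighbor in received_purges:
        key = (lsp_id, seq_num)
        if key in seen:          # same LSP ID and sequence number: no extra PS-PDU
            continue
        seen.add(key)
        ps_pdus.append({"lsp_id": lsp_id, "seq": seq_num, "purge_source": neighbor})
    return ps_pdus

purges = [("00.1111.1111.1111.00-00", 7, "C"),
          ("00.1111.1111.1111.00-00", 7, "D"),   # duplicate purge LSP
          ("00.2222.2222.2222.00-00", 3, "C")]
print(len(generate_ps_pdus(purges)))  # 2
```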
Security Concern
IS-IS purge LSP source tracing uses a UDP port to receive and send source tracing packets.
Therefore, the security of the port must be taken into consideration.
IS-IS purge LSP source tracing inevitably increases packet receiving and sending workload
and intensifies bandwidth pressure. To minimize the impact on IS-IS, the number of source
tracing packets must be controlled.
Authentication
Source tracing is embedded in IS-IS and uses IS-IS authentication parameters to
authenticate packets.
Generalized TTL security mechanism (GTSM)
GTSM is a security mechanism that checks whether the time to live (TTL) value in each
received IP packet header is within a pre-defined range.
IS-IS purge LSP source tracing packets can be flooded as far as one hop. If the TTL of a
packet is 255 when it is sent and not 254 when it is received, the packet will be
discarded.
CPU-CAR
The NP chip on interface boards can check the packets to be sent to the CPU for
processing and prevent the main control board from being overloaded by a large number
of packets that are sent to the CPU.
IS-IS purge LSP source tracing needs to apply for an independent CAR channel and has
small committed information rate (CIR) and committed burst size (CBS) values
configured.
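The GTSM check described above can be sketched as follows: source tracing packets are sent with a TTL of 255 and flooded at most one hop, so a packet whose received TTL is not 254 is discarded. The constants follow the text; the function name is illustrative.

```python
SENT_TTL = 255
EXPECTED_RECV_TTL = 254  # exactly one hop away from the sender

def gtsm_accept(received_ttl):
    """Accept a source tracing packet only if its TTL shows it
    traveled exactly one hop (255 at sending, 254 on receipt)."""
    return received_ttl == EXPECTED_RECV_TTL

print(gtsm_accept(254))  # True: accepted
print(gtsm_accept(253))  # False: traveled more than one hop, discarded
```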
Typical Scenarios
Scenario where all nodes support source tracing
All nodes on the network support source tracing, and Device A is the faulty source. Figure
1-845 shows the networking.
When Device A purges an IS-IS LSP, it floods a source tracing packet that carries Device A
information and brief information about the LSP. Then the packet is flooded on the network
hop by hop. After the fault occurs, maintenance personnel can log in to any node on the
network to locate Device A that keeps sending purge LSPs and isolate Device A from the
network.
Scenario where source tracing-incapable nodes are not isolated from source
tracing-capable nodes
All nodes on the network except Device C support source tracing, and Device A is the faulty
source. In this case, the PS-PDU can be flooded on the entire network. Figure 1-846 shows the
networking.
Figure 1-846 Scenario where source tracing-incapable nodes are not isolated from source
tracing-capable nodes
When Device A purges an IS-IS LSP, it floods a source tracing packet that carries Device A
information and brief information about the LSP. Then the packet is flooded on the network
hop by hop. When Device B and Device E negotiate the source tracing capability with Device
C, they find that Device C does not support source tracing. Therefore, after Device B receives
the source tracing packet from Device A, Device B sends the packet to Device D, but not to
Device C. After receiving the purge LSP from Device C, Device E generates a source tracing
packet which carries information about the advertisement source (Device E), purge source
(Device C), and the purge LSP, and floods the packet on the network.
After the fault occurs, maintenance personnel can log in to any node on the network except
Device C to locate the faulty node. Two possible faulty nodes can be located in this case:
Device A and Device C, and they both send the same purge LSP. In this case, Device A takes
precedence over Device C when the maintenance personnel determine the most probable
faulty source. After Device A is isolated, the network recovers. Then the possibility that
Device C is the faulty node is ruled out.
Scenario where source tracing-incapable nodes are isolated from source tracing-capable
nodes
All nodes on the network except Device C and Device D support source tracing, and Device A is
the faulty source. In this case, the PS-PDU cannot be flooded on the entire network. Figure 1-847
shows the networking.
Figure 1-847 Scenario where source tracing-incapable nodes are isolated from source
tracing-capable nodes
When Device A purges an IS-IS LSP, it floods a source tracing packet that carries Device A
information and brief information about the LSP. However, the source tracing packet can
reach only Device B because Device C and Device D do not support IS-IS purge LSP source
tracing.
During source tracing capability negotiation, Device E finds that Device C does not support
source tracing, and Device F finds that Device D does not support source tracing. After
Device E receives the purge LSP from Device C, Device E helps Device C generate and flood
a source tracing packet. Similarly, after Device F receives the purge LSP from Device D,
Device F helps Device D generate and flood a source tracing packet.
After the fault occurs, if maintenance personnel log in to Device A or Device B, the personnel
can locate the faulty source (Device A) directly. After Device A is isolated, the network
recovers. If the maintenance personnel log in to Device E, Device F, Device G, or Device H,
the personnel will find that Device E claims Device C to be the faulty source and Device F
claims Device D to be the faulty source. Then the personnel log in to Device C and Device D
and find that the purge LSP was sent by Device B, not generated by Device C or Device D.
Then the personnel log in to Device B, determine that Device A is the faulty node, and isolate
Device A. After Device A is isolated, the network recovers.
1.10.8.2.16 IS-IS MT
With IS-IS multi-topology (MT), IPv6, multicast, and advanced topologies can have their own
routing tables. This feature prevents packet loss if an integrated topology and the IPv4/IPv6
dual stack are deployed, isolates multicast services from unicast routes, improves network
resource usage, and reduces network construction cost.
Introduction
On a traditional IP network, IPv4 and IPv6 share the same integrated topology, and only one
unicast topology exists, which causes the following problems:
Packet loss if the IPv4/IPv6 dual stack is deployed: If some routers and links in an
IPv4/IPv6 topology do not support IPv4 or IPv6, they cannot receive IPv4 or IPv6
packets sent from the router that supports the IPv4/IPv6 dual stack. As a result, these
packets are discarded.
Multicast services highly depending on unicast routes: Only one unicast forwarding table
is available on the forwarding plane because only one unicast topology exists, which
forces services transmitted from one router to the same destination address to share the
same next hop, and various end-to-end services, such as voice and data services, to share
the same physical links. As a result, some links may be heavily congested while others
remain relatively idle. In addition, the multicast reverse path forwarding (RPF) check
depends on the unicast routing table. If the default unicast routing table is used when
transmitting multicast services, multicast services depend heavily on unicast routes, a
multicast distribution tree cannot be planned independently of unicast routes, and unicast
route changes affect multicast distribution tree establishment.
Deploying multiple topologies for different services on a physical network can address these
problems. IS-IS MT transmits multicast information by defining new TLVs in IS-IS packets.
Users can deploy multiple logical topologies on a physical network based on IP protocols or
service types supported by links so that separate SPF calculation operations are performed in
different topologies, which improves network usage.
If an IPv4 or IPv6 BFD session is Down in a topology on a network enabled with MT, neighbors of the
IPv4 or IPv6 address family will be affected.
Related Concepts
IS-IS MT allows multiple route selection subsets to be deployed on a versatile network
infrastructure and divides a physical network into multiple logical topologies, where each
topology performs its own SPF calculations.
IS-IS MT, an extension of IS-IS, allows multiple topologies to be applied to IS-IS. IS-IS MT
complies with standard protocols and transmits multicast information using new TLVs in
IS-IS packets. Users can deploy multiple logical topologies on a physical network. Each
topology performs its own SPF calculations and maintains its own routing table. Traffic of
different services, including the traffic transmitted in different IP topologies, has its own
optimal forwarding path.
The MT ID configured on an interface identifies the topology bound to the interface. One or
more MT IDs can be configured on a single interface.
With RPF check, upon receiving a packet, a router searches multicast static, unicast,
Multiprotocol Border Gateway Protocol (MBGP), and Multicast Interior Gateway Protocol
(MIGP) routing tables for an optimal route and sets it as the RPF route to the source IP
address of the packet. The packet can be transmitted only when it is destined for the RPF
interface.
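The RPF check described above can be sketched in Python as follows: the router looks up the RPF route to the packet's source address and accepts the packet only if it arrived on that route's RPF interface. The table contents and names are hypothetical, and the best-route selection among the four routing tables is assumed to have been done already.

```python
def rpf_check(source_ip, in_interface, rpf_table):
    """rpf_table maps a source IP address to the RPF interface of the
    optimal route toward that source (already selected from the multicast
    static, unicast, MBGP, and MIGP routing tables)."""
    rpf_interface = rpf_table.get(source_ip)
    return rpf_interface is not None and rpf_interface == in_interface

rpf_table = {"10.1.1.1": "Interface1"}
print(rpf_check("10.1.1.1", "Interface1", rpf_table))  # True: passes the RPF check
print(rpf_check("10.1.1.1", "Interface2", rpf_table))  # False: discarded
```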
Implementation
In IS-IS MT, the MT ID varies with the topology. Each Hello packet or LSP sent by a router
carries one or more MT TLVs of the topologies to which the source interface belongs. If the
router receives from a neighbor a Hello packet or LSP that carries only some of the local MT
TLVs, the router assumes that the neighbor belongs to only the default IPv4 topology. On a
point-to-point (P2P) link, an adjacency cannot be established between two neighbors that
share no common MT ID, whereas on a broadcast link, an adjacency can still be established
in this case.
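The adjacency rule above can be sketched as follows: on a P2P link, the two neighbors must share at least one MT ID; on a broadcast link, the adjacency forms even without a common MT ID. The function name and MT ID values are illustrative.

```python
def can_form_adjacency(link_type, local_mt_ids, neighbor_mt_ids):
    """Return whether an IS-IS adjacency can be established under IS-IS MT."""
    if link_type == "broadcast":
        return True                              # broadcast: no common MT ID required
    # P2P: at least one topology must be shared
    return bool(set(local_mt_ids) & set(neighbor_mt_ids))

print(can_form_adjacency("p2p", {0, 2}, {2}))    # True: MT ID 2 in common
print(can_form_adjacency("p2p", {2}, {0}))       # False: no common MT ID
print(can_form_adjacency("broadcast", {2}, {0})) # True
```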
Figure 1-848 shows the MT TLV format.
The following section describes separation of the IPv4 topology from the IPv6 topology, and
the multicast topology from the unicast topology.
Figure 1-849 shows the networking for separation of the IPv4 topology from the IPv6
topology. The values in the networking diagram are link costs. Device A, Device C, and
Device D support the IPv4/IPv6 dual stack; Device B supports IPv4 only and cannot
forward IPv6 packets.
Figure 1-849 Separation of the IPv4 topology from the IPv6 topology
Without IS-IS MT, Device A, Device B, Device C, and Device D use the IPv4/IPv6
topology for the SPF calculation. In this case, the shortest path from Device A to Device
D is Device A -> Device B -> Device D. IPv6 packets cannot reach Device D through
Device B because Device B does not support IPv6.
If a separate IPv6 topology is set up using IS-IS MT, Device A chooses only IPv6 links
to forward IPv6 packets. In this case, the shortest path from Device A to Device D is
Device A -> Device C -> Device D.
Figure 1-850 shows the networking for separation of the unicast and multicast
topologies using IS-IS MT.
Figure 1-850 Separation of the multicast topology from the unicast topology
On the network shown in Figure 1-850, all routers are interconnected using IS-IS. A TE
tunnel is set up between Device A (ingress) and Device E (egress). The outbound
interface of the route calculated by IS-IS may not be a physical interface but a TE tunnel
interface. In this case, Device C through which the TE tunnel passes cannot set up
multicast forwarding entries. As a result, multicast services cannot be transmitted.
IS-IS MT addresses this problem by establishing separate unicast and multicast
topologies. TE tunnels are excluded from a multicast topology. Therefore, multicast
services are unaffected by TE tunnels.
Background
When multicast and an IGP Shortcut-enabled MPLS TE tunnel are configured on a network,
the outbound interface of the route calculated by IS-IS may not be a physical interface but a
TE tunnel interface. Multicast Join packets are transparent to routers through which the TE
tunnel passes. Therefore, these routers cannot generate multicast forwarding entries and
discard received multicast packets from the multicast source. Figure 1-851 shows the conflict
between multicast and a TE tunnel.
2. After Device B receives the Join packet, it selects TE-Tunnel 1/0/0 as the reverse path
forwarding (RPF) interface and adds an MPLS label to the packet before forwarding it
from Interface2 to Device C.
3. As the penultimate hop of the TE tunnel, Device C removes the MPLS label from the
Join packet before forwarding it from Interface2 to Device D. Because the forwarding is
based on MPLS, Device C does not generate any multicast forwarding entries.
4. After Device D receives the Join packet, it generates a multicast forwarding entry in
which the upstream and downstream interfaces are Interface1 and Interface2,
respectively. Device D then sends the Join packet to Device E, which has already
established the shortest path tree.
5. Multicast packets flow from Server to Device C through Device E and Device D. Device
C discards these packets because no multicast forwarding entry is available. As a result,
multicast services are interrupted.
IS-IS local multicast topology (MT) can address this problem.
Related Concepts
IS-IS local MT is a mechanism that enables the routing management (RM) module to create a
separate multicast topology on the local device so that protocol packets exchanged between
devices are not erroneously discarded. When the outbound interface of the route calculated by
IS-IS is an IGP Shortcut-enabled TE tunnel interface, IS-IS local MT calculates a physical
outbound interface for the route. This mechanism resolves the conflict between multicast and
a TE tunnel.
Implementation
Figure 1-852 shows how multicast packets are processed after IS-IS local MT is enabled.
1. Establishment of a multicast IGP (MIGP) routing table
Device B creates an independent MIGP routing table, records the TE tunnel interface,
and generates multicast routing entries for multicast packet forwarding. If the outbound
interface of a calculated route is a TE tunnel interface, IS-IS calculates a physical
outbound interface for the route and adds the route to the MIGP routing table.
2. Multicast packet forwarding
When forwarding multicast packets, a router searches the unicast routing table for a route.
If the next hop of the route is a tunnel interface, the router searches the MIGP routing
table for the physical outbound interface to forward multicast packets. In this example,
the original outbound interface of the route is TE tunnel 1/0/0. IS-IS re-calculates a
physical outbound interface (Interface2) for the route and adds the route to the MIGP
routing table. Device B then forwards multicast packets through Interface2 based on the
MIGP routing table and generates a multicast routing entry in the multicast routing table.
Therefore, multicast services are properly forwarded.
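The two-step lookup above can be sketched in Python: multicast forwarding first consults the unicast routing table, and if the outbound interface is a TE tunnel, the MIGP routing table supplies the physical outbound interface instead. The table entries and interface names are hypothetical.

```python
def multicast_out_interface(dest_prefix, unicast_table, migp_table):
    """Return the interface used to forward multicast traffic for dest_prefix."""
    out = unicast_table[dest_prefix]
    if out.startswith("Tunnel"):         # IGP Shortcut-enabled TE tunnel interface
        return migp_table[dest_prefix]   # physical interface calculated by local MT
    return out

unicast = {"10.2.0.0/16": "Tunnel1/0/0"}
migp = {"10.2.0.0/16": "Interface2"}
print(multicast_out_interface("10.2.0.0/16", unicast, migp))  # Interface2
```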
Usage Scenario
IS-IS local MT prevents multicast services from being interrupted on networks that allow
multicasting and have an IGP Shortcut-enabled TE tunnel.
Benefits
Local MT resolves the conflict between multicast and a TE tunnel and improves multicast
service reliability.
The first eight bytes are common to all IS-IS PDUs. Figure 1-853 shows the IS-IS PDU format.
Maximum Area Address: maximum number of area addresses supported by an IS-IS area.
The value 0 indicates that a maximum of three area addresses are supported by this IS-IS
area.
Type/Length/Value (TLV): encoding type that features high efficiency and expansibility.
Each type of PDU contains a different TLV. Table 1-247 shows the mapping between
TLV codes and PDU types.
P2P IIHs: Figure 1-855 shows the format of IIHs on a P2P network.
As shown in Figure 1-855, most fields in a P2P IIH are the same as those in a LAN IIH.
The P2P IIH does not have the priority and LAN ID fields but has a local circuit ID field.
The local circuit ID indicates the local link ID.
LSP Format
LSPs are used to exchange link-state information. There are two types of LSPs: Level-1 and
Level-2. Level-1 IS-IS transmits Level-1 LSPs. Level-2 IS-IS transmits Level-2 LSPs.
Level-1-2 IS-IS can transmit both Level-1 and Level-2 LSPs.
Level-1 and Level-2 LSPs have the same format, as shown in Figure 1-856.
SNP Format
SNPs describe the LSPs in all or some of the databases and are used to synchronize and
maintain all LSDBs. SNPs consist of complete SNPs (CSNPs) and partial SNPs (PSNPs).
CSNPs carry summaries of all LSPs in LSDBs, which ensures LSDB synchronization
between neighboring routers. On a broadcast network, the designated intermediate
system (DIS) sends CSNPs at an interval. The default interval is 10 seconds. On a P2P
link, neighboring devices send CSNPs only when a neighbor relationship is established
for the first time.
Figure 1-857 shows the CSNP format.
1.10.8.3 Applications
1.10.8.3.1 IS-IS MT
Figure 1-859 shows the use of IS-IS MT to separate an IPv4 topology from an IPv6 topology.
Device A, Device C, and Device D support IPv4/IPv6 dual-stack; Device B supports IPv4
only and cannot forward IPv6 packets.
Figure 1-859 Separation of the IPv4 topology from the IPv6 topology
If IS-IS MT is not used, Device A, Device B, Device C, and Device D consider the IPv4 and
IPv6 topologies the same when using the SPF algorithm for route calculation. The shortest
path from Device A to Device D is Device A -> Device B -> Device D. Device B does not
support IPv6 and cannot forward IPv6 packets to Device D.
If IS-IS MT is used to establish a separate IPv6 topology, Device A chooses only IPv6 links to
forward IPv6 packets. The shortest path from Device A to Device D changes to Device A ->
Device C -> Device D. IPv6 packets are then forwarded.
Figure 1-860 shows the use of IS-IS MT to separate unicast and multicast topologies.
Figure 1-860 Separation of the multicast topology from the unicast topology
All routers in Figure 1-860 are interconnected using IS-IS. A TE tunnel is set up between
Device A (ingress) and Device E (egress). The outbound interface of the route calculated by
IS-IS may not be a physical interface but a TE tunnel interface. The routers between which
the TE tunnel is established cannot set up multicast forwarding entries. As a result, multicast
services cannot run properly.
IS-IS MT is configured to solve this problem by establishing separate unicast and multicast
topologies. TE tunnels are excluded from a multicast topology; therefore, multicast services
can run properly, without being affected by TE tunnels.
1.10.8.4 Appendixes
Feature     Supported by IPv4    Supported by IPv6    Differences
IS-IS TE    Yes                  No                   -
1.10.9 BGP
1.10.9.1 Introduction
BGP Definition
Border Gateway Protocol (BGP) is a dynamic routing protocol used between Autonomous
Systems (ASs). BGP is widely used by Internet Service Providers (ISPs).
BGP-1, BGP-2, and BGP-3 are three earlier versions of BGP. They are used to exchange
reachable inter-AS routes, establish inter-AS paths, avoid routing loops, and apply routing
policies between ASs.
Currently, BGP-4 is used.
BGP has the following characteristics:
Unlike an Interior Gateway Protocol (IGP), such as Open Shortest Path First (OSPF) and
Routing Information Protocol (RIP), BGP is an Exterior Gateway Protocol (EGP) which
controls route advertisement and selects optimal routes between ASs rather than
discovering or calculating routes.
BGP uses the Transmission Control Protocol (TCP) as the transport layer protocol, which
enhances BGP reliability.
− BGP selects inter-AS routes, which poses high requirements on stability. Therefore,
using TCP enhances BGP's stability.
− BGP peers must be logically connected through TCP. The destination port number
is 179 and the local port number is a random value.
BGP supports Classless Inter-Domain Routing (CIDR).
When routes are updated, BGP transmits only the updated routes, which reduces
bandwidth consumption during BGP route distribution. Therefore, BGP is applicable to
the Internet where a large number of routes are transmitted.
BGP is a distance-vector routing protocol.
BGP is designed to prevent loops.
− Between ASs: BGP routes carry information about the ASs along the path. The
routes that carry the local AS number are discarded to prevent inter-AS loops.
− Within an AS: BGP does not advertise routes learned in an AS to BGP peers in the
AS to prevent intra-AS loops.
BGP provides many routing policies to flexibly select and filter routes.
BGP provides a mechanism that prevents route flapping, which effectively enhances
Internet stability.
BGP can be easily extended.
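The inter-AS loop prevention rule listed above can be sketched as follows: a route received from an EBGP peer whose AS_Path already contains the local AS number is discarded. The AS numbers are illustrative.

```python
LOCAL_AS = 65001  # hypothetical local AS number

def accept_ebgp_route(as_path, local_as=LOCAL_AS):
    """Discard routes whose AS_Path already carries the local AS number,
    which indicates an inter-AS routing loop."""
    return local_as not in as_path

print(accept_ebgp_route([65002, 65003]))         # True: accepted
print(accept_ebgp_route([65002, 65001, 65003]))  # False: loop detected, discarded
```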
BGP4+ Definition
As a dynamic routing protocol used between ASs, BGP4+ is an extension of BGP.
Traditional BGP4 manages IPv4 routing information but does not support the inter-AS
transmission of packets encapsulated by other network layer protocols (such as IPv6).
To support IPv6, BGP4 must have the additional ability to associate an IPv6 protocol with the
next hop information and network layer reachability information (NLRI).
Purpose
BGP transmits route information between ASs. However, it is not required in all scenarios.
1.10.9.2 Principles
1.10.9.2.1 Basic Principle
BGP Messages
BGP runs by sending five types of messages: Open, Update, Notification, Keepalive, and
Route-refresh.
Open: The first message sent after a TCP connection is set up is an Open message, which
is used to set up BGP peer relationships. After a peer receives an Open message and the
peer negotiation is successful, the peer sends a Keepalive message to confirm and
maintain the peer relationship. Then, peers can exchange Update, Notification, Keepalive,
and Route-refresh messages.
Update: This type of message is used to exchange routes between BGP peers.
− An Update message can advertise multiple reachable routes with the same attributes.
These route attributes are applicable to all destination addresses (expressed by IP
prefixes) in the Network Layer Reachability Information (NLRI) field of the Update
message.
− An Update message can be used to delete multiple unreachable routes. Each route is
identified by its destination address (using the IP prefix), which identifies the routes
previously advertised between BGP speakers.
− An Update message can be used only to delete routes. In this case, it does not need
to carry the route attributes or NLRI. In addition, an Update message can be used
only to advertise reachable routes. In this case, it does not need to carry information
about the deleted routes.
Notification: When BGP detects an error, it sends a Notification message to its peer. The
BGP connection is then torn down immediately.
Keepalive: BGP periodically sends Keepalive messages to peers to maintain peer
relationships.
Route-refresh: This type of message is used to request that the peer resend all reachable
routes.
If all BGP routers are enabled with the Route-refresh capability and the import policy of
BGP changes, the local BGP router sends a Route-refresh message to its peers. After
receiving the Route-refresh message, the peers resend their routing information to the
local BGP router. In this manner, BGP routing tables are dynamically refreshed and new
routing policies are used without tearing down BGP connections.
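All five message types above share the common BGP message header defined in RFC 4271: a 16-byte marker of all ones, a 2-byte total length, and a 1-byte type (Open = 1, Update = 2, Notification = 3, Keepalive = 4, Route-refresh = 5 per RFC 2918). A Keepalive message is just the 19-byte header. The following sketch encodes that header; the function name is illustrative.

```python
import struct

TYPES = {"Open": 1, "Update": 2, "Notification": 3,
         "Keepalive": 4, "Route-refresh": 5}

def bgp_message(msg_type, body=b""):
    """Build a BGP message: 16-byte all-ones marker, 2-byte length, 1-byte type."""
    marker = b"\xff" * 16
    length = 19 + len(body)              # header is always 19 bytes
    return marker + struct.pack("!HB", length, TYPES[msg_type]) + body

keepalive = bgp_message("Keepalive")
print(len(keepalive))   # 19
```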
BGP Processing
BGP adopts TCP as its transport layer protocol. Therefore, a TCP connection must be
available between the peers. BGP peers negotiate parameters by exchanging Open
messages to establish a BGP peer relationship.
After the peer relationship is established, BGP peers exchange BGP routing tables. BGP
does not periodically update a routing table. When BGP routes change, BGP updates the
changed BGP routes in the BGP routing table by sending Update messages.
BGP sends Keepalive messages to maintain the BGP connection between peers.
After detecting an error on a network, BGP sends a Notification message to report the
error and the BGP connection is torn down.
BGP Attributes
BGP route attributes are a set of parameters that describe specific BGP routes. With BGP
route attributes, BGP can filter and select routes. BGP route attributes are classified into the
following types:
Well-known mandatory: This type of attribute can be identified by all BGP routers and
must be carried in Update messages. Without this attribute, errors occur in the routing
information.
Well-known discretionary: This type of attribute can be identified by all BGP routers.
This type of attribute is optional and, therefore, is not necessarily carried in Update
messages.
Optional transitive: This indicates the transitive attribute between ASs. A BGP router
may not recognize this attribute, but the router still receives it and advertises it to other
peers.
Optional non-transitive: If a BGP router does not recognize this type of attribute, the
router does not advertise it to other peers.
The most common BGP route attributes are as follows:
Origin
The Origin attribute defines the origin of a route. The Origin attribute is classified into
the following types:
− Interior Gateway Protocol (IGP): This attribute type has the highest priority. IGP is
the Origin attribute for routes obtained through an IGP in the AS from which the
routes originate. For example, the Origin attribute of the routes imported to the BGP
routing table using the network command is IGP.
− Exterior Gateway Protocol (EGP): This attribute type has the second highest
priority. The Origin attribute of the routes obtained through EGP is EGP.
− Incomplete: This attribute type has the lowest priority. Incomplete is the Origin
attribute type of all routes that do not have the IGP or EGP Origin attribute. For
example, the Origin attribute of the routes imported using the import-route
command is Incomplete.
AS_Path
The AS-Path attribute records all ASs through which a route passes from the local end to
the destination in distance-vector (DV) order.
When a BGP speaker advertises a local route:
− When advertising the route beyond the local AS, the BGP speaker adds the local AS
number to the AS_Path list and then advertises it to the neighboring routers through
Update messages.
− When advertising the route within the local AS, the BGP speaker creates an empty
AS_Path list in an Update message.
When a BGP speaker advertises a route learned from the Update messages of another
BGP speaker:
− When advertising the route beyond the local AS, the BGP speaker adds the local AS
number to the left of the AS_Path list. From the AS_Path attribute, the BGP router
that receives the route learns the ASs through which the route passes to the
destination. The number of the AS that is nearest to the local AS is placed on the left
of the list, while other AS numbers are listed in sequence.
− When advertising the route within the local AS, the BGP speaker does not change
the AS_Path attribute.
The AS_Path attribute has four types:
− AS_Sequence: records in reverse order all the ASs through which a route passes
from the local device to the destination.
− AS_Set: records without an order all the ASs through which a route passes
from the local device to the destination. The AS_Set attribute is used in route
summarization scenarios. After route summarization, the device records the
unsequenced AS numbers because it cannot sequence the numbers of ASs through
which specific routes pass. No matter how many AS numbers an AS_Set contains,
BGP regards the AS_Set as one AS number when calculating routes.
− AS_Confed_Sequence: records in reverse order all the sub-ASs within a BGP
confederation through which a route passes from the local device to the destination.
− AS_Confed_Set: records without an order all the sub-ASs within a BGP
confederation through which a route passes from the local device to the destination.
The AS_Confed_Set attribute is used in route summarization scenarios in a
confederation.
The AS_Confed_Sequence and AS_Confed_Set attributes are used to prevent routing
loops and to select routes among the various sub-ASs in a confederation.
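The AS_Path update rules above can be sketched as follows: when a route is advertised beyond the local AS, the local AS number is prepended on the left of the AS_Path list; when it is advertised within the local AS, the attribute is unchanged. The AS numbers are illustrative; confederation handling is omitted.

```python
LOCAL_AS = 65001  # hypothetical local AS number

def as_path_on_advertise(as_path, to_ebgp_peer):
    """Return the AS_Path carried in the Update message sent to a peer."""
    if to_ebgp_peer:
        return [LOCAL_AS] + list(as_path)  # prepend local AS on the left
    return list(as_path)                   # IBGP peer: AS_Path unchanged

print(as_path_on_advertise([65002, 65003], to_ebgp_peer=True))   # [65001, 65002, 65003]
print(as_path_on_advertise([65002, 65003], to_ebgp_peer=False))  # [65002, 65003]
```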
Next_Hop
Different from the Next_Hop attribute in an IGP, the Next_Hop attribute in BGP is not
necessarily the IP address of a neighboring router. In most cases, the Next_Hop attribute
in BGP complies with the following rules:
− When advertising a route to an EBGP peer, a BGP speaker sets the Next_Hop of the
route to the address of the local interface through which the BGP peer relationship
is established.
− When advertising a locally generated route to an IBGP peer, a BGP speaker sets the
Next_Hop of the route to the address of the local interface through which the BGP
peer relationship is established.
− When advertising a route learned from an EBGP peer to an IBGP peer, the BGP
speaker does not change the Next_Hop of the route.
MED
The Multi-Exit-Discriminator (MED) is transmitted only between two neighboring ASs.
The AS that receives the MED does not advertise it to a third AS.
Similar to the cost used by an IGP, the MED is used to determine the optimal route when
traffic enters an AS. When a BGP peer learns multiple routes that have the same
destination address but different next hops from EBGP peers, the route with the smallest
MED value is selected as the optimal route if all other attributes are the same.
Local_Pref
The Local_Pref attribute indicates the BGP priority of a route. It is available only to
IBGP peers and is not advertised to other ASs.
The Local_Pref attribute is used to determine the optimal route when traffic leaves an AS.
When a BGP router obtains multiple routes to the same destination address but with
different next hops through IBGP peers, the route with the largest Local_Pref value is
selected.
For details about route import, see Route Import; for details about BGP route selection rules,
see BGP Route Selection; for details about route summarization, see Route Summarization;
for details about advertising routes to BGP peers, see BGP Route Advertisement.
For details about import or export policies, see "Routing Policies" in NE20E Feature
Description — IP Routing.
Route Import
BGP itself cannot discover routes. Therefore, it needs to import other protocol routes, such as
IGP routes or static routes, to the BGP routing table. Imported routes can be transmitted
within an AS or between ASs.
With load balancing, if the preceding conditions are equal and multiple external routes with the same
AS_Path are available, load balancing is performed among them. The number of routes load-balancing
traffic must be less than or equal to the configured number. After the load-balancing as-path-ignore
command is run, the routes with different AS_Path values can load-balance traffic.
8. Prefers the route with the Origin type as IGP, EGP, and Incomplete in descending order.
9. Prefers the route with the smallest MED value.
If the bestroute med-plus-igp command is run, BGP preferentially selects the route with the smallest
sum of MED multiplied by a MED multiplier and IGP cost multiplied by an IGP cost multiplier.
− BGP compares the MEDs of only routes from the same AS (excluding
confederation sub-ASs). MEDs of two routes are compared only when the first AS
number in the AS_Sequence (excluding AS_Confed_Sequence) of one route is the
same as its counterpart in the other route.
− If a route does not carry MED, BGP considers its MED as the default value (0)
during route selection. If the bestroute med-none-as-maximum command is run,
BGP considers its MED as the largest MED value (4294967295).
− If the compare-different-as-med command is run, BGP compares MEDs of routes
even when the routes are received from peers in different ASs. Do not run this
command unless the ASs use the same IGP and route selection mode. Otherwise, a
loop may occur.
− If the deterministic-med command is run, routes are no longer selected in the
sequence in which they are received.
10. Prefers local VPN routes, LocalCross routes, and RemoteCross routes in descending
order.
LocalCross routes indicate local VPN cross routes or routes imported between public network and VPN
instances.
If the ERT of a VPNv4 route in the routing table of a VPN instance on a PE matches the
IRT of another VPN instance on the PE, the VPNv4 route is added to the routing table of
the second VPN instance. This route is called a LocalCross route. If the ERT of a VPNv4
route learned from a remote PE matches the IRT of a VPN instance on the local PE, the
VPNv4 route is added to the routing table of that VPN instance. This route is called a
RemoteCross route.
11. Prefers EBGP routes to IBGP routes.
12. Prefers the route that is iterated to an IGP route with the smallest cost.
If the bestroute igp-metric-ignore command is run, BGP no longer compares the IGP
cost.
13. Prefers the route with the shortest Cluster_List length.
By default, Cluster_List takes precedence over Router ID during BGP route selection. To enable Router
ID to take precedence over Cluster_List during BGP route selection, run the bestroute
routerid-prior-clusterlist command.
14. Prefers the route advertised by the router with the smallest router ID.
If the bestroute router-id-ignore command is run, router IDs do not determine which
route is selected for BGP.
If each route carries an Originator_ID, the originator IDs rather than router IDs are compared during
route selection. The route with the smallest Originator_ID is preferred.
15. Prefers the route learned from the peer with the smallest IP address.
16. If BGP Flow Specification routes are configured locally, the first configured BGP Flow
Specification route is preferentially selected.
17. Prefers the locally imported route in the RM routing table.
If direct, static, and IGP routes are imported, BGP prefers the direct route, then the static
route, and then the IGP route, in descending order.
18. Prefers the Add-Path route with the smallest received path ID.
19. Prefers the RemoteCross route with the largest RD.
20. Prefers locally received routes to the routes imported between VPN and public network
instances.
21. Prefers the route that was learned the earliest.
For details about BGP route attributes, see 1.10.9.2.1 Basic Principle.
For details about the BGP route selection process, see Figure 1-864.
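A few of the tie-break steps above can be sketched as an ordered comparison. This is a hedged illustration only, covering steps 8, 9, 11, 12, and 14 with simplified, hypothetical route records; it is not the device's internal implementation, and real selection applies many more steps and conditions.

```python
# Illustrative tie-break cascade: Origin, MED, EBGP-vs-IBGP, IGP cost,
# router ID. Field names are assumptions for this sketch.

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}  # lower is better

def select_best(routes):
    """Return the preferred route by applying tie-breakers in order."""
    def key(r):
        return (
            ORIGIN_RANK[r["origin"]],   # step 8: Origin type
            r.get("med", 0),            # step 9: smallest MED (default 0)
            0 if r["ebgp"] else 1,      # step 11: EBGP over IBGP
            r["igp_cost"],              # step 12: smallest IGP cost
            r["router_id"],             # step 14: smallest router ID
        )
    return min(routes, key=key)

routes = [
    {"origin": "igp", "med": 100, "ebgp": False, "igp_cost": 20, "router_id": "1.1.1.1"},
    {"origin": "igp", "med": 100, "ebgp": True,  "igp_cost": 30, "router_id": "2.2.2.2"},
]
best = select_best(routes)  # the EBGP route wins at step 11
```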
Route Summarization
On a large-scale network, the BGP routing table can be very large. Route summarization can
reduce the size of the routing table.
Route summarization is the process of summarizing specific routes with the same IP prefix
into a summarized route. After route summarization, BGP advertises only the summarized
route rather than all specific routes to BGP peers.
BGP supports automatic and manual route summarization.
Automatic route summarization: takes effect on the routes imported by BGP. With
automatic route summarization, the specific routes for the summarization are suppressed,
and BGP summarizes routes based on the natural network segment and sends only the
summarized route to BGP peers. For example, 10.1.1.1/24 and 10.2.1.1/24 are
summarized into 10.0.0.0/8, which is a Class A address.
Manual route summarization: takes effect on routes in the local BGP routing table. With
manual route summarization, users can control the attributes of the summarized route
and determine whether to advertise the specific routes.
IPv4 supports both automatic and manual route summarization, while IPv6 supports only
manual route summarization.
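The classful behavior of automatic summarization described above can be sketched as follows. This is an illustrative model of collapsing a prefix to its natural (Class A/B/C) network, matching the 10.x example; it is not the router's implementation.

```python
# Sketch of automatic route summarization: each imported route is
# collapsed to its natural classful network, and only the classful
# summary would be advertised to peers.
import ipaddress

def classful_summary(prefix):
    """Return the natural (classful) network for an IPv4 prefix."""
    net = ipaddress.ip_network(prefix, strict=False)
    first_octet = int(str(net.network_address).split(".")[0])
    if first_octet < 128:
        bits = 8    # Class A
    elif first_octet < 192:
        bits = 16   # Class B
    else:
        bits = 24   # Class C
    return str(net.supernet(new_prefix=bits))
```

For example, both 10.1.1.0/24 and 10.2.1.0/24 collapse to the single summary 10.0.0.0/8.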
1.10.9.2.3 AIGP
Background
The Accumulated Interior Gateway Protocol Metric (AIGP) attribute is an optional
non-transitive Border Gateway Protocol (BGP) path attribute. The attribute type code
assigned by the Internet Assigned Numbers Authority (IANA) for the AIGP attribute is 26.
Routing protocols, such as IGPs that have been designed to run within a single administrative
domain, generally assign a metric to each link, and then choose the path with the smallest
metric as the optimal path between two nodes. BGP, designed to provide routing over a large
number of independent administrative domains, does not select paths based on metrics. If a
single administrative domain runs several contiguous BGP networks, it is desirable for BGP
to select paths based on metrics, just as an IGP does. The AIGP attribute enables BGP to
select paths based on metrics.
Related Concepts
An AIGP administrative domain is a set of autonomous systems (ASs) in a common
administrative domain. The AIGP attribute takes effect only in an AIGP administrative
domain. Figure 1-865 shows the networking diagram of AIGP application.
Implementation
AIGP Attribute Origination
The AIGP attribute can be added to a route only through a route-policy. You can configure a
route-policy to add an AIGP value when routes are imported, received, or advertised. If no AIGP
value is configured, BGP routes do not carry the AIGP attribute.
AIGP Attribute Delivery
BGP does not allow the AIGP attribute to leak out of an AIGP administrative domain
boundary onto the Internet. If the AIGP attribute of a route changes, BGP sends Update
packets for BGP peers to update information about this route. In a scenario in which A, a BGP
speaker, sends a route that carries the AIGP attribute to B, its BGP peer:
If B does not support the AIGP attribute or does not have the AIGP capability enabled for
a peer, B ignores the AIGP attribute and does not transmit the AIGP attribute to other
BGP peers.
If B supports the AIGP attribute and has the AIGP capability enabled for a peer, B can
modify the AIGP attribute of the route only after B has set itself to be the next hop of the
route. To modify the AIGP attribute of the route, B complies with the following rules:
− If the BGP peer relationship between A and B is established over an IGP route, or a
static route that does not require recursive next hop resolution, B uses the IGP or
static route metric value plus the received AIGP attribute value as the new AIGP
attribute value of the received route and sends the new AIGP attribute along with
the route to other BGP peers.
− If the BGP peer relationship between A and B is established over a BGP route, or a
static route that requires recursive next hop resolution, route iteration occurs when
B sends data to A. Each route iteration requires a pre-existing route. B uses the sum
of metric values for iterated routes along the path from B to A plus the received
AIGP attribute value as the new AIGP attribute value of the received route and
sends the new AIGP attribute along with the route to other BGP peers.
Role of the AIGP Attribute in BGP Route Selection
If multiple active routes exist between two nodes, BGP will make a route selection decision.
If BGP cannot determine the optimal route based on PrefVal, Local_Pref, and Route-type,
BGP compares the AIGP attributes of these routes. BGP route selection rules are as follows:
If BGP cannot determine the optimal route based on Route-type, BGP compares the
AIGP attributes. If this method still cannot determine the optimal route, BGP proceeds to
compare the AS_Path attributes.
The priority of a route that carries the AIGP attribute is higher than the priority of a route
that does not carry the AIGP attribute.
If all routes carry the AIGP attribute, the route with the smallest AIGP attribute value
plus the IGP metric value of the iterated next hop is preferred over the other routes.
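The two AIGP behaviors described above can be sketched in a few lines: a speaker that sets itself as the next hop adds the metric toward the previous hop to the received AIGP value, and route selection prefers the smallest sum of AIGP value and the IGP metric of the iterated next hop. Names and structures here are assumptions for illustration, not the device's data model.

```python
# Hedged sketch of AIGP accumulation and AIGP-based preference.

def updated_aigp(received_aigp, metric_to_peer):
    """New AIGP value advertised after the speaker becomes the next hop."""
    return received_aigp + metric_to_peer

def prefer_by_aigp(routes):
    """Prefer AIGP-carrying routes; among them, pick the smallest
    AIGP value plus the IGP metric of the iterated next hop."""
    with_aigp = [r for r in routes if r.get("aigp") is not None]
    if not with_aigp:
        return None  # fall through to later tie-breakers (AS_Path, ...)
    return min(with_aigp, key=lambda r: r["aigp"] + r["nh_igp_metric"])
```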
Usage Scenario
The AIGP attribute is used to select the optimal route in an AIGP administrative domain.
Benefits
After the AIGP attribute is configured in an AIGP administrative domain, BGP selects paths
based on metrics, just as an IGP. Consequently, all devices in the AIGP administrative domain
use the optimal routes to forward data.
If route flapping occurs, a router sends an Update packet to its peers. After the peers receive
the Update packet, they recalculate routes and update their routing tables. Frequent route
flapping consumes lots of bandwidth and CPU resources and can even affect network
operations.
Route dampening can address this problem. In most cases, BGP is deployed on complex
networks where routes change frequently. To reduce the impact of frequent route flapping,
BGP adopts route dampening to suppress unstable routes.
BGP dampening measures route stability using a penalty value. The greater the penalty value,
the less stable a route. Each time route flapping occurs (a device receives a Withdraw or an
Update packet), BGP adds a penalty value to the route carried in the packet. If a route changes
from active to inactive, the penalty value increases by 1000. If a route is updated when it is
active, the penalty value increases by 500. When the penalty value of a route exceeds the
Suppress value, the route is suppressed. As a result, BGP does not add the route to the routing
table or advertise any Update message to BGP peers.
The penalty value of a suppressed route reduces by half after a half-life period. When the
penalty value decreases to the Reuse value, the route becomes reusable, and BGP adds the
route to the IP routing table and advertises an Update packet carrying the route to BGP peers.
The penalty value, suppression threshold, and half-life are configurable. Figure 1-866 shows
the process of BGP route dampening.
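The dampening arithmetic above can be modeled directly: each flap adds a penalty (1000 for an active-to-inactive transition, 500 for an update while active), the penalty halves every half-life period, and the Suppress and Reuse thresholds determine the route's state. The threshold and half-life values below are examples only, since these parameters are configurable.

```python
# Illustrative model of route dampening penalty decay and thresholds.
import math

SUPPRESS = 2000
REUSE = 750
HALF_LIFE = 900.0  # seconds (example value; configurable in practice)

def decayed(penalty, elapsed):
    """Penalty after exponential half-life decay."""
    return penalty * math.pow(0.5, elapsed / HALF_LIFE)

penalty = 0.0
for _ in range(3):                  # three withdraw/re-advertise flaps, 60 s apart
    penalty = decayed(penalty, 60) + 1000
suppressed = penalty > SUPPRESS     # the route is now suppressed

# After two half-life periods, the penalty falls below Reuse and the
# route becomes reusable again.
reusable_later = decayed(penalty, 2 * HALF_LIFE) < REUSE
```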
Route dampening applies only to EBGP routes and VPNv4 IBGP routes. IBGP routes (except
VPNv4 IBGP routes) cannot be dampened because IBGP routing tables contain the routes
from the local AS, which require that the forwarding entries be the same on IBGP peers in the
AS. If IBGP routes are dampened, the forwarding entries may be inconsistent because
dampening parameters may vary among these IBGP peers.
Well-known Community
Table 1-248 lists well-known communities of BGP routes.
Usage Scenario
On the network shown in Figure 1-867, EBGP connections are established between Device A
and Device B, and between Device B and Device C. If the community attribute of No_Export
is configured on Device A and Device A sends a route with the community attribute to Device
B, Device B does not advertise the route to other ASs after receiving it.
In Figure 1-868, there are multiple BGP routers in AS 200. To reduce the number of IBGP
connections, AS 200 is divided into three sub-ASs: AS 65001, AS 65002, and AS 65003. In
AS 65001, fully meshed IBGP connections are established between the three routers.
BGP speakers outside a confederation, such as Router F in AS 100, do not know the existence
of the sub-ASs (AS 65001, AS 65002, and AS 65003) in the confederation. The confederation
ID is the AS number that is used to identify the entire confederation. For example, AS 200 in
Figure 1-868 is the confederation ID.
Applications
After an RR receives routes from its peers, it selects the optimal route based on BGP route
selection policies and performs one of the following operations:
If the optimal route is from a non-client IBGP peer, the RR advertises the route to all
clients.
If the optimal route is from a client, the RR advertises the route to all non-clients and
clients.
If the optimal route is from an EBGP peer, the RR advertises the route to all clients and
non-clients.
An RR is easy to deploy because it needs to be configured only on the RR itself; clients
do not need to know that they are clients.
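The reflection rules listed above can be summarized as a simple decision on the source of the best route. This is a minimal sketch with simplified peer-category labels, not device configuration or behavior in every corner case.

```python
# Sketch of RR advertisement rules by best-route source.

def reflect_targets(source, clients, non_clients):
    """Return the peers an RR advertises a best route to, by source type."""
    if source == "non_client_ibgp":
        return list(clients)                      # to all clients only
    if source in ("client", "ebgp"):
        return list(clients) + list(non_clients)  # to clients and non-clients
    raise ValueError("unknown source")
```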
On some networks, if fully meshed connections have already been established among clients
of an RR, they can exchange routing information directly. In this case, route reflection among
the clients through the RR is unnecessary and occupies bandwidth. For example, on the
NE20E, route reflection can be disabled, but the routes between clients and non-clients are
still exchanged. By default, route reflection between clients through the RR is enabled.
On the NE20E, an RR can change various attributes of BGP routes, such as the AS_Path,
MED, Local_Pref, and community attributes.
Originator_ID
Originator_ID and Cluster_List are used to detect and prevent routing loops.
The Originator_ID attribute is four bytes long and is generated by an RR. It carries the router
ID of the route originator in the local AS.
When a route is reflected by an RR for the first time, the RR adds the Originator_ID to
this route. If a route already carries the Originator_ID attribute, the RR does not create a
new one.
After receiving the route, a BGP speaker checks whether the Originator_ID is the same
as its router ID. If Originator_ID is the same as its router ID, the BGP speaker discards
this route.
Cluster_List
To prevent routing loops between ASs, a BGP router uses the AS_Path attribute to record the
ASs through which a route passes. Routes with the local AS number are discarded by the
router. To prevent routing loops within an AS, IBGP peers do not advertise routes learned
from the local AS.
With RR, IBGP peers can advertise routes learned from the local AS to each other. However,
the Cluster_List attribute must be deployed to prevent routing loops within the AS.
An RR and its clients form a cluster. In an AS, each RR is uniquely identified by a
Cluster_ID.
Similar to an AS_Path, a Cluster_List is composed of a series of Cluster_IDs and is generated
by an RR. The Cluster_List records all the RRs through which a route passes.
Before an RR reflects a route between its clients or between its clients and non-clients,
the RR adds the local Cluster_ID to the head of the Cluster_List. If a route does not carry
any Cluster_List, the RR creates one for the route.
After the RR receives an updated route, it checks the Cluster_List of the route. If the RR
finds that its cluster ID is included in the Cluster_List, the RR discards the route. If its
cluster ID is not included in the Cluster_List, the RR adds its cluster ID to the
Cluster_List and then reflects the route.
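The two loop checks described above can be sketched together: an RR prepends its Cluster_ID before reflecting and discards routes whose Cluster_List already contains it, while any BGP speaker discards routes whose Originator_ID equals its own router ID. The route structure here is an illustrative assumption.

```python
# Sketch of Originator_ID and Cluster_List loop prevention.

def rr_reflect(route, cluster_id):
    """Return the route to reflect, or None if a cluster loop is detected."""
    cluster_list = route.get("cluster_list", [])
    if cluster_id in cluster_list:
        return None                    # local Cluster_ID already present: discard
    reflected = dict(route)
    reflected["cluster_list"] = [cluster_id] + cluster_list
    # Add Originator_ID only on first reflection; keep an existing one.
    reflected.setdefault("originator_id", route["from_router_id"])
    return reflected

def accept(route, my_router_id):
    """A BGP speaker discards routes that originated from itself."""
    return route.get("originator_id") != my_router_id
```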
Backup RR
To enhance network reliability and prevent single points of failure, more than one route
reflector needs to be configured in a cluster. The route reflectors in the same cluster must
share the same Cluster_ID to prevent routing loops.
With backup RRs, clients can receive multiple routes to the same destination from different
RRs. The clients then apply route selection policies to choose the optimal route.
On the network shown in Figure 1-870, RR1 and RR2 are in the same cluster. RR1 and RR2
establish an IBGP connection so that each RR is a non-client of the other RR.
If Client 1 receives an updated route from an external peer, Client 1 advertises the route
to RR1 and RR2 through IBGP.
After receiving the updated route, RR1 reflects the route to other clients (Client 2 and
Client 3) and the non-client (RR2) and adds the local Cluster_ID to the head of the
Cluster_List.
After receiving the reflected route, RR2 checks the Cluster_List. RR2 finds that its
Cluster_ID is contained in the Cluster_List; therefore, it discards the updated route.
If RR1 and RR2 are configured with different Cluster_IDs, each RR receives both the route
from Client 1 and the updated route reflected from the other RR. Therefore, configuring the
same Cluster_ID for RR1 and RR2 reduces the number of routes that each RR receives and
memory consumption.
The application of Cluster_List prevents routing loops among RRs in the same AS.
Multiple Clusters in an AS
Multiple clusters may exist in an AS. RRs are IBGP peers of each other. An RR can be
configured as a client or non-client of another RR.
For example, the backbone network shown in Figure 1-871 is divided into multiple clusters.
Each RR is configured as a non-client of the other RRs, and these RRs are fully meshed. Each
client establishes IBGP connections with only the RR in the same cluster. In this manner, all
BGP peers in the AS can receive reflected routes.
Hierarchical Reflector
Hierarchical reflectors are deployed on live networks. On the network shown in Figure 1-872,
the ISP provides Internet routes for AS 100. Two EBGP connections are established between
the ISP and AS 100. AS 100 is divided into two clusters. The four routers in Cluster 1 are core
routers.
Two Level-1 RRs (RR-1s) are deployed in Cluster 1, which ensures the reliability of the
core layer of AS 100. The other two routers in the core layer are clients of RR-1s.
One Level-2 RR (RR-2) is deployed in Cluster 2. RR-2 is a client of RR-1.
In Figure 1-874, PEs have the same VPN instance (vpna) and RTs (including the ERT and
IRT). The RD configured for PE2 and PE3 is 2:2, and the RD configured for PE4 is 3:3. Site 2
has a route destined for 10.1.1.0/24. The route is sent to PE2, PE3, and PE4, which convert this
route to multiple BGP VPNv4 routes and send them to PE1. On receipt of the BGP VPNv4
routes, PE1 implements route crossing as shown in Figure 1-875. The detailed process is as
follows:
1. After receiving the BGP VPNv4 routes from PE2, PE3, and PE4, PE1 adds them to the
BGP VPNv4 routing table.
2. PE1 converts the BGP VPNv4 routes to BGP VPN routes by removing their RDs, adds
the BGP VPN routes to the routing table of the VPN instance, selects an optimal route
from the BGP VPN routes based on BGP route selection policies, and adds the optimal
BGP VPN route to the IP VPN instance routing table.
1.10.9.2.10 MP-BGP
Conventional BGP-4 manages only IPv4 unicast routing information, and inter-AS
transmission of packets of other network layer protocols, such as IPv6 and multicast, is
limited.
To support multiple network layer protocols, the Internet Engineering Task Force (IETF)
extends BGP-4 to Multiprotocol Extensions for BGP-4 (MP-BGP). MP-BGP is forward
compatible. Specifically, routers supporting MP-BGP can communicate with the routers that
do not support MP-BGP.
Extended Attributes
BGP-4 Update packets carry three IPv4-related attributes: NLRI (Network Layer Reachability
Information), Next_Hop, and Aggregator. Aggregator contains the IP address of the BGP
speaker that performs route aggregation.
To carry information about multiple network layer protocols in NLRI and Next_Hop,
MP-BGP introduces the following route attributes:
Address Family
The Address Family Information field consists of a 2-byte Address Family Identifier (AFI)
and a 1-byte Subsequent Address Family Identifier (SAFI).
BGP uses address families to distinguish different network layer protocols. For the values of
address families, see relevant standards. The NE20E supports multiple MP-BGP extension
applications, such as VPN extension and IPv6 extension, which are configured in their
respective address family views.
For details about the BGP VPNv4 address family and BGP VPN instance address family, see
the HUAWEI NE20E-S2 Universal Service Router Feature Description - VPN.
BGP Authentication
BGP can work properly only after BGP peer relationships are established. Authenticating BGP
peers can improve BGP security. BGP supports the following authentication modes:
MD5 authentication
BGP uses TCP as the transport layer protocol. Message Digest 5 (MD5) authentication
can be used when establishing TCP connections to improve BGP security. MD5
authentication sets the MD5 authentication password for the TCP connection, and TCP
performs the authentication. If the authentication fails, the TCP connection cannot be
established.
Keychain authentication
Keychain authentication is performed on the application layer. It ensures smooth service
transmission and improves security by periodically changing the password and
encryption algorithms. When keychain authentication is configured for BGP peer
relationships over TCP connections, BGP packets as well as the establishment process of
a TCP connection can be authenticated. For details about keychain, see "Keychain" in
HUAWEI NE20E-S2 Feature Description - Security.
GTSM
During network attacks, attackers may simulate BGP packets and continuously send them to a
router. If the packets are destined for the router, the router directly sends them to the control
plane for processing without validating them. As a result, the increased processing workload
on the control plane leads to high CPU usage.
The Generalized TTL Security Mechanism (GTSM) defends against attacks by checking
whether the time to live (TTL) value in each IP packet header is within a pre-defined range.
TTL refers to the maximum number of routers through which a packet can pass.
In actual networking, the GTSM either allows packets whose TTL values are outside the
specified range to pass or discards them. To configure the GTSM to discard such packets, set
an appropriate TTL value range according to the network topology. Packets whose TTL values
are outside the specified range are then discarded, which protects the local device against
potential attacks.
You can also enable the log function to record discarded packets for further fault location.
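The GTSM check above reduces to a range test on the TTL. As a hedged sketch: an EBGP peer a known number of hops away should deliver packets whose TTL falls in [255 - valid hops + 1, 255]; the valid-hop count is an assumed configuration parameter for this illustration.

```python
# Minimal sketch of the GTSM TTL range check.

MAX_TTL = 255

def gtsm_pass(ttl, valid_hops):
    """True if the packet's TTL is within the trusted range."""
    return (MAX_TTL - valid_hops + 1) <= ttl <= MAX_TTL
```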
RPKI
Resource Public Key Infrastructure (RPKI) improves BGP security by validating the origin
ASs of BGP routes.
Attackers can steal user data by advertising routes that are more specific than those advertised
by carriers. For example, if a carrier has advertised a route destined for 10.10.0.0/16, an
attacker can advertise a route destined for 10.10.153.0/24, which is more specific than
10.10.0.0/16. According to the longest match rule, 10.10.153.0/24 is preferentially selected for
traffic forwarding. As a result, the attacker succeeds in intercepting user data.
To address this issue, establish an RPKI session between a router and an RPKI server. The
router will then query Route Origin Authorizations (ROAs) from the RPKI server through the
RPKI session and match the origin AS of each received BGP route against the ROAs. This
mechanism ensures that only the routes that originate from the trusted ASs are accepted. The
validation result can also be applied to BGP route selection to ensure that hosts in the local AS
can communicate with hosts in other ASs.
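The ROA matching described above can be sketched as a three-state validation: a route is Valid if a covering ROA authorizes its origin AS within the ROA's maximum prefix length, Invalid if covering ROAs exist but none match, and NotFound if no ROA covers the prefix. The ROA tuple layout here is an assumption for illustration.

```python
# Hedged sketch of RPKI origin validation against a ROA list.
import ipaddress

def validate_origin(prefix, origin_as, roas):
    """roas: list of (roa_prefix, max_length, authorized_as) tuples."""
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_len, asn in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        if net.subnet_of(roa_net):
            covered = True
            if net.prefixlen <= max_len and origin_as == asn:
                return "valid"
    return "invalid" if covered else "not_found"
```

With the carrier's ROA for 10.10.0.0/16 (max length 16), the attacker's more-specific 10.10.153.0/24 announcement validates as Invalid and can be rejected.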
1.10.9.2.12 BGP GR
Graceful restart (GR) is one of the high availability (HA) technologies, which comprise a
series of comprehensive technologies such as fault-tolerant redundancy, link protection, faulty
node recovery, and traffic engineering. As a fault-tolerant redundancy technology, GR ensures
normal forwarding of data during the restart of routing protocols to prevent interruption of
key services. Currently, GR has been widely applied to the master/slave switchover and
system upgrade.
GR is usually used when the active route processor (RP) fails because of a software or
hardware error, or used by an administrator to perform the master/slave switchover.
Related Concepts
The concepts related to GR are as follows:
GR Restarter: indicates a device that performs master/slave switchover triggered by the
administrator or a failure. A GR Restarter must support GR.
GR Helper: indicates the neighbor of a GR Restarter. A GR Helper must support GR.
GR session: indicates a session, through which a GR Restarter and a GR Helper can
negotiate GR capabilities.
GR time: indicates the period during which the GR Helper, after detecting that the GR
Restarter is Down, retains the topology information or routes obtained from the GR Restarter.
End-of-RIB (EOR): indicates a marker that notifies a BGP peer that the initial route
update after session negotiation is complete.
EOR timer: indicates the maximum time that a local device waits for the EOR information
from a peer. If the local device does not receive the EOR information within this period,
it selects an optimal route from the routes received so far.
Principles
Principles of BGP GR are as follows:
1. During BGP peer relationship establishment, devices negotiate GR capabilities by
sending supported GR capabilities to each other.
2. When detecting the master/slave switchover of the GR Restarter, a GR Helper does not
delete the routing information and forwarding entries related to the GR Restarter within
the GR time, but waits to re-establish a BGP connection with the GR Restarter.
3. After the master/slave switchover, the GR Restarter receives routes from all the
negotiated peers with GR capabilities before the switchover, and starts the EOR timer.
The GR Restarter selects a route when either of the following conditions is met:
− The GR Restarter receives the EOR information from all peers, and the EOR timer is
deleted.
− The EOR timer expires before the GR Restarter receives the EOR information from
all peers.
4. The GR Restarter sends the optimal route to the GR Helper and the GR Helper starts the
EOR timer. The GR Helper quits GR when either of the following conditions is met:
− The GR Helper receives the EOR information from the GR Restarter and the EOR
timer is deleted.
− The EOR timer times out and the GR Helper receives no EOR information from the
GR Restarter.
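The EOR-based decision in steps 3 and 4 above reduces to a simple condition: route selection (or GR exit) proceeds once EOR has arrived from every peer, or once the EOR timer expires, whichever comes first. The following is a toy sketch of that condition, with timing abstracted away.

```python
# Sketch of the EOR decision used by both the GR Restarter and Helper.

def should_select_routes(peers_with_eor, all_peers, timer_expired):
    """Proceed when EOR has arrived from all peers or the timer expired."""
    return set(all_peers) <= set(peers_with_eor) or timer_expired
```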
GR Reset
Currently, BGP does not support dynamic capability negotiation. Therefore, each time a new
BGP capability (such as the IPv4, IPv6, VPNv4, and VPNv6 capabilities) is enabled on a BGP
speaker, the BGP speaker tears down existing sessions with its peer and renegotiates BGP
capabilities. This process will interrupt ongoing services.
To prevent the service interruptions, the NE20E provides the GR reset function that enables
the NE20E to reset a BGP session in GR mode. With the GR reset function configured, when
you enable a new BGP capability on the BGP speaker, the BGP speaker enters the GR state,
resets the BGP session, and renegotiates BGP capabilities with the peer. In the whole process,
the BGP speaker re-establishes the existing sessions but does not delete the routing entries for
the existing sessions, so that the existing services are not interrupted.
Benefits
With BGP GR, forwarding is not interrupted. In addition, BGP flapping occurs only on the
neighbors of the GR Restarter rather than in the entire routing domain. This is important for
BGP, which needs to process a large number of routes.
Networking
As shown in Figure 1-876, Device A and Device B belong to ASs 100 and 200, respectively.
The two routers are directly connected and establish an External Border Gateway Protocol
(EBGP) peer relationship.
BFD is enabled to detect the EBGP peer relationship between Device A and Device B. If the
link between Device A and Device B fails, BFD can quickly detect the fault and notify BGP.
Background
As IPv6 technology becomes more popular, an increasing number of separate IPv6 networks
take shape. IPv6 provider edge (6PE), a technology designed to provide IPv6 services over
IPv4 networks, allows service providers to provide IPv6 services without constructing IPv6
backbone networks. The 6PE solution connects separate IPv6 networks using multiprotocol
label switching (MPLS) tunnels. The 6PE solution implements IPv4/IPv6 dual stack on the
provider edge devices (PEs) of Internet service providers and uses the Multi-protocol
Extensions for Border Gateway Protocol (MP-BGP) to assign labels to IPv6 routes. In this
manner, the 6PE solution connects separate IPv6 networks over IPv4 tunnels between PEs.
Related Concepts
In practical application, different metropolitan area networks (MANs) of a service provider or
collaborative backbone networks of different service providers often span multiple
autonomous systems (ASs). The 6PE solution can be intra-AS 6PE or inter-AS 6PE,
depending on whether the separate IPv6 networks connect to the same AS. Standard protocols
define three inter-AS 6PE modes: inter-AS 6PE OptionB, inter-AS 6PE OptionB with
autonomous system boundary routers (ASBRs) as PEs, and inter-AS 6PE OptionC. This
section describes the following 6PE modes:
Intra-AS 6PE: Separate IPv6 networks are connected by the same AS. PEs in the AS
exchange IPv6 routes by establishing MP-IBGP peer relationships.
Inter-AS 6PE OptionB: ASBRs in different ASs exchange labeled IPv6 routes by
establishing MP-EBGP peer relationships.
Inter-AS 6PE OptionB (with ASBRs as PEs): ASBRs in different ASs exchange IPv6
routes using MP-EBGP.
Inter-AS 6PE OptionC: PEs in different ASs exchange labeled IPv6 routes over
multi-hop MP-EBGP peer sessions.
Intra-AS 6PE
Figure 1-877 shows intra-AS 6PE networking. 6PE runs on the edge of a service provider
network. PEs that connect to IPv6 networks are IPv4/IPv6 dual-stack devices. PEs and
customer edge devices (CEs) exchange IPv6 routes using the IPv6 Interior Gateway Protocol
(IGP), or IPv6 External Border Gateway Protocol (EBGP). PEs exchange IPv4 routes with
each other or with provider devices (Ps) using an IPv4 routing protocol. PEs must establish
tunnels to transparently transmit IPv6 packets. PEs often use MPLS label switched paths
(LSPs) and MPLS Local IFNET tunnels. By default, a PE uses an MPLS LSP to transmit IPv6
packets. If no MPLS LSP is available, a PE uses an MPLS Local IFNET tunnel to transmit
IPv6 packets.
Figure 1-878 shows route and packet transmission in an intra-AS 6PE scenario. I-L indicates
an inner label, and O-L indicates an outer label. The outer label directs the packet to the BGP
next hop, and the inner label identifies the outbound interface or CE to which the packet
should be forwarded.
The route transmission process is as follows:
1. CE2 sends an IPv6 route to PE2, its EBGP peer.
2. Upon receipt, PE2 changes the next hop of the IPv6 route to itself and assigns a label to
the IPv6 route. Then, PE2 sends the labeled IPv6 route to PE1, its IBGP peer.
3. Upon receipt, PE1 relays the labeled IPv6 route to a tunnel and adds information about
the route to the local forwarding table. Then, PE1 changes the next hop of the route to
itself, removes the label of the route, and sends the route to CE1, its EBGP peer.
The IPv6 route transmission from CE2 to CE1 is complete.
The packet transmission process is as follows:
1. CE1 sends an ordinary IPv6 packet to PE1 over an IPv6 link on the public network.
2. Upon receipt, PE1 searches its local forwarding table for the forwarding entry based on
the destination address of the packet and encapsulates the packet with inner and outer
labels. Then, PE1 sends the IPv6 packet to PE2 over a public network tunnel.
3. Upon receipt, PE2 removes the inner and outer labels and forwards the IPv6 packet to
CE2 over an IPv6 link.
As a result, the IPv6 packet is transmitted from CE1 to CE2.
The route and packet transmission processes show that whether the public network is an IPv4
or IPv6 network does not matter to the CEs.
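The label operations in the packet flow above can be illustrated with a toy model: the ingress PE pushes an inner label (I-L) and an outer label (O-L) onto the IPv6 packet, and the egress PE pops both before forwarding the plain packet. Labels and payloads below are placeholders, not real MPLS encodings.

```python
# Toy illustration of 6PE label push/pop at the PEs.

def pe1_encapsulate(ipv6_packet, inner_label, outer_label):
    """Ingress PE: push I-L, then O-L (outer label on top of the stack)."""
    return {"labels": [outer_label, inner_label], "payload": ipv6_packet}

def pe2_decapsulate(mpls_packet):
    """Egress PE: pop both labels and recover the IPv6 packet."""
    assert len(mpls_packet["labels"]) == 2
    return mpls_packet["payload"]
```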
Inter-AS 6PE
Inter-AS 6PE OptionB (with ASBRs as PEs)
Figure 1-879 shows inter-AS 6PE OptionB (with ASBRs as PEs) networking. Inter-AS
6PE OptionB (with ASBRs as PEs) is similar to intra-AS 6PE. The only difference is that
in an inter-AS 6PE OptionB scenario in which ASBRs also function as PEs, ASBRs
establish EBGP peer relationships between each other. The route and packet transmission
processes in an inter-AS 6PE OptionB scenario in which ASBRs also function as PEs are
similar to those in an intra-AS 6PE scenario.
Figure 1-879 Networking diagram for inter-AS 6PE OptionB (with ASBRs as PEs)
Figure 1-881 shows route and packet transmission in an inter-AS 6PE OptionB scenario.
I-L indicates an inner label, and O-L indicates an outer label.
Figure 1-881 Route and packet transmission in an inter-AS 6PE OptionB scenario
Two inter-AS 6PE OptionC solutions are available, depending on the establishment methods of
end-to-end LSPs. In an inter-AS 6PE OptionC scenario, PEs establish multi-hop MP-EBGP peer
relationships to exchange labeled IPv6 routes and establish end-to-end BGP LSPs to transmit IPv6
packets. The way in which an end-to-end BGP LSP is established does not matter much to inter-AS 6PE
OptionC and therefore is not described here.
Figure 1-883 shows route and packet transmission in an inter-AS 6PE OptionC scenario.
I-L indicates an inner label, B-L indicates a BGP LSP label, and O-L indicates an outer
label.
Figure 1-883 Route and packet transmission in an inter-AS 6PE OptionC scenario
Usage Scenarios
Each 6PE mode has its advantages and usage scenarios. The intra-AS 6PE mode is best suited
for scenarios in which separate IPv6 networks connect to the same AS. Inter-AS 6PE modes
are best suited for scenarios in which separate IPv6 networks connect to different ASs. Table
1-249 lists the usage scenarios for inter-AS 6PE modes.
Benefits
6PE offers the following benefits:
Easy maintenance: All configurations are performed on PEs and network maintenance is
simple. IPv6 services are carried over IPv4 networks, but the users on IPv6 networks are
unaware of IPv4 networks.
Low network construction costs: Service providers can provide IPv6 services over
existing MPLS networks without upgrading the networks. 6PE devices can provide
multiple types of services, such as IPv6 VPN and IPv4 VPN.
Applications
On the network shown in Figure 1-884, Device A and Device B are directly connected, and
prefix-based ORF is enabled on both of them. After negotiating the prefix-based ORF capability
with Device B, Device A adds its local prefix-based inbound policy to a Route-Refresh packet
and sends the packet to Device B. Device B uses the information in the packet to work out an
outbound policy for advertising routes to Device A.
As shown in Figure 1-885, Device A and Device B are clients of the RR in the domain, and
prefix-based ORF is enabled on all three NEs. After negotiating the prefix-based ORF capability
with the RR, Device A and Device B add their local prefix-based inbound policies to
Route-Refresh packets and send the packets to the RR. The RR uses the information in the
Route-Refresh packets to work out the outbound policies for reflecting routes to Device A and
Device B.
Usage Scenario
On the network shown in Figure 1-886, Device Y advertises a learned BGP route to Device
X2 and Device X3 in AS 100; Device X2 and Device X3 then advertise the BGP route to
Device X1 through RR. Therefore, Device X1 receives two routes whose next hops are
Device X2 and Device X3 respectively. Then, Device X1 selects a route based on a
configured routing policy. Assume that the route sent by Device X2 (Link A) is preferred. The
route sent by Device X3 (Link B) then functions as a backup link.
If a node along Link A fails or Link A itself becomes faulty, the next hop of the route from
Device X1 to Device X2 becomes unavailable. If BGP Auto FRR is enabled on Device X1, the
forwarding plane quickly switches the traffic from Device X1 to Device Y to Link B, which
ensures uninterrupted traffic transmission. In addition, Device X1 reselects the route sent by
Device X3 based on the forwarding prefixes and then updates the FIB table.
Usage Scenario
The BGP dynamic update peer-groups feature is applicable to the following scenarios:
Scenario with an international gateway
Scenario with an RR
Scenario where routes received from EBGP peers need to be sent to all IBGP peers
The preceding scenarios have in common that a router needs to send routes to a large number
of BGP peers, most of which share the same configuration. This situation is most evident in
the networking shown in Figure 1-888.
For example, an RR has 100 clients and needs to reflect 100,000 routes to them. If the RR
groups the routes for each peer before sending the routes to 100 clients, the total number of
times that all routes are grouped is 100,000 x 100. After the dynamic update peer-groups
feature is applied, the total number of times that all routes are grouped changes to 100,000 x 1.
The efficiency is 100 times higher than before.
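The grouping arithmetic above can be sketched as a simple cost model. This is an illustration only; the function name and structure are assumptions, not device code:

```python
# Illustrative cost model: compare the number of per-route grouping
# operations an RR performs with and without the dynamic update
# peer-groups feature. Without the feature, routes are effectively
# grouped once per peer; with it, once for the whole update group.

def grouping_operations(num_routes: int, num_peers: int,
                        dynamic_peer_groups: bool) -> int:
    groups = 1 if dynamic_peer_groups else num_peers
    return num_routes * groups

without_feature = grouping_operations(100_000, 100, dynamic_peer_groups=False)
with_feature = grouping_operations(100_000, 100, dynamic_peer_groups=True)
print(without_feature)                  # 10,000,000 (100,000 x 100)
print(with_feature)                     # 100,000 (100,000 x 1)
print(without_feature // with_feature)  # 100-fold reduction
```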
1.10.9.2.18 Active-Route-Advertise
Background
Active-route-advertise allows a device to advertise only optimal routes in an IP routing table.
Active-route-advertise prevents changes to data forwarding paths that may occur after
independent BGP route selection is deployed. Independent BGP route selection enables a
device to advertise an optimal BGP route in a BGP routing table to BGP peers, regardless of
whether this optimal BGP route is optimal in an IP routing table. Therefore, if a device is
upgraded to support independent BGP route selection, previously selected data forwarding
paths may change. If you do not expect the changes, configure active-route-advertise on the
device.
Related Concepts
IP routing table: stores routes that are optimal in each available protocol routing table, selects
optimal routes from the stored routes, and delivers the selected optimal routes to the
forwarding information base (FIB) table.
Independent BGP route selection: enables a device to advertise an optimal BGP route in a
BGP routing table to BGP peers, regardless of whether this optimal BGP route is optimal in
an IP routing table.
Implementation
As shown in Figure 1-890, an RR is deployed between Device A and Device B, an Open
Shortest Path First (OSPF) neighbor relationship is established between Device A and the RR,
an External BGP (EBGP) peer relationship is established between Device B and Device C,
and Device A and Device C are not directly connected.
The route 100.0.0.0/8 is imported to the BGP and OSPF routing tables on Device A, and the
RR learns both the BGP route 100.0.0.0/8 and OSPF route 100.0.0.0/8. By default, an OSPF
route has a higher priority than a BGP route. Therefore, the OSPF route 100.0.0.0/8 is an
optimal route in the IP routing table, and the BGP route 100.0.0.0/8 is inactive in the IP
routing table. Table 1-250 describes the changes after the RR is upgraded to support
independent BGP route selection, including the changes in the optimal route in the BGP
routing table, route advertisement, and data forwarding path.
Optimal route in the BGP routing table | The BGP route 100.0.0.0/8 is an optimal route in the BGP routing table, but not optimal in the IP routing table.
As described in Table 1-250, active-route-advertise ensures that the data forwarding path Link
B is still used after independent BGP route selection is deployed.
Before you configure active-route-advertise, check whether BGP routes are optimal routes in the IP
routing table. If all BGP routes are optimal routes in the IP routing table, data forwarding paths do not
change after active-route-advertise is configured. If only some BGP routes are optimal routes in the IP
routing table, analyze the impacts of active-route-advertise and determine whether to configure this
feature.
Advantages
Active-route-advertise prevents changes to data forwarding paths that may occur after
independent BGP route selection is deployed.
Purpose
2-byte autonomous system (AS) numbers used on networks range from 1 to 65535, and the
available AS numbers are close to exhaustion as networks expand. Therefore, the AS number
range needs to be extended. 4-byte AS numbers ranging from 1 to 4294967295 can address
this problem. New speakers that support 4-byte AS numbers can co-exist with old speakers
that support only 2-byte AS numbers.
Definition
4-byte AS numbers are extended from 2-byte AS numbers. Border Gateway Protocol (BGP)
peers use a new capability code and optional transitive attributes to negotiate the 4-byte AS
number capability and transmit 4-byte AS numbers. This mechanism enables communication
between new speakers and between old speakers and new speakers.
Open capability code (0x41), defined by standard protocols, indicates that the local end
supports 4-byte capability extension.
The following new optional transitive attributes, defined by standard protocols, are used to
transmit 4-byte AS numbers in old sessions:
AS4_Path, coded 0x11
AS4_Aggregator, coded 0x12
Related Concepts
New speaker: a peer that supports 4-byte AS numbers
Old speaker: a peer that does not support 4-byte AS numbers
New session: a BGP connection established between new speakers
Old session: a BGP connection established between a new speaker and an old speaker, or
between old speakers
Principles
BGP speakers negotiate capabilities by exchanging Open messages. Figure 1-891 shows the
format of Open messages exchanged between new speakers. The header of a BGP Open
message is fixed, and its My AS Number field is expected to carry the local AS number.
However, this field can carry only a 2-byte AS number. Therefore, a new speaker writes 23456
into My AS Number and carries its actual local AS number in the Optional parameters field
before it sends an Open message to a peer. After the peer receives the message, it can
determine whether the sender supports 4-byte AS numbers by checking the Optional
parameters field in the message.
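The rule above can be sketched with two hypothetical helpers: one computes the value placed in the fixed 2-byte My AS Number field, and one converts asdot notation (such as 2.2) to a plain 4-byte number. The names are illustrative, and the sketch assumes, per the standard, that 23456 (AS_TRANS) is substituted only when the local AS number does not fit in 2 bytes:

```python
AS_TRANS = 23456  # reserved 2-byte value that stands in for a 4-byte AS number

def my_as_number_field(local_as: int) -> int:
    """Value for the fixed 2-byte My AS Number field of an Open message.
    The real 4-byte AS number travels in the optional capability (0x41)."""
    return local_as if local_as <= 65535 else AS_TRANS

def asdot_to_asplain(asdot: str) -> int:
    """Convert x.y notation (e.g. '2.2') to a plain 4-byte AS number."""
    high, low = (int(part) for part in asdot.split("."))
    return high * 65536 + low

print(asdot_to_asplain("2.2"))                       # 131074
print(my_as_number_field(asdot_to_asplain("2.2")))   # 23456 (131074 > 65535)
print(my_as_number_field(65001))                     # 65001 (fits in 2 bytes)
```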
Figure 1-892 shows how peer relationships are established between new speakers, and
between an old speaker and a new speaker. BGP speakers notify each other of whether they
support 4-byte AS numbers by exchanging Open messages. After the capability negotiation,
new sessions are established between new speakers, and old sessions are established between
a new speaker and an old speaker.
AS_Path and Aggregator in Update messages exchanged between new speakers carry 4-byte
AS numbers, while AS_Path and Aggregator in Update messages sent by an old speaker
carry 2-byte AS numbers.
When a new speaker sends an Update message carrying an AS number greater than
65535 to an old speaker, the new speaker uses AS4_Path and AS4_Aggregator to assist
AS_Path and AS_Aggregator in transferring 4-byte AS numbers. AS4_Path and
AS4_Aggregator are transparent to the old speaker. In the networking shown in Figure
1-893, before the new speaker in AS 2.2 sends an Update message to the old speaker in
AS 65002, the new speaker replaces each 4-byte AS number (2.2, 1.1) with 23456 in
AS_Path. Therefore, the AS_Path carried in the Update message is (23456, 23456,
65001), and the carried AS4_Path is (2.2, 1.1). After the old speaker in AS 65002
receives the Update message, it transparently transmits the message to other ASs.
When the new speaker receives an Update message carrying AS_Path, AS4_Path,
AS_Aggregator, and AS4_Aggregator from the old speaker, the new speaker uses the
reconstruction algorithm to reconstruct the actual AS_Path and AS_Aggregator. In the
networking shown in Figure 1-893, after the new speaker in AS 65003 receives an
Update message carrying AS_Path (65002, 23456, 23456, 65001) and AS4_Path (2.2,
1.1) from the old speaker in AS 65002, the new speaker reconstructs the actual AS_Path
(65002, 2.2, 1.1, 65001).
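The substitution in this example can be sketched as follows. This is a simplified illustration of the reconstruction described above; a real implementation also handles AS_Path segment types and length mismatches as defined by the standard:

```python
AS_TRANS = 23456  # 2-byte placeholder written into AS_Path by new speakers

def reconstruct_as_path(as_path, as4_path):
    """Substitute each AS_TRANS placeholder in AS_Path with the next
    real 4-byte AS number taken, in order, from AS4_Path."""
    real = iter(as4_path)
    return [next(real) if asn == AS_TRANS else asn for asn in as_path]

# In asplain notation, 2.2 is 131074 and 1.1 is 65537.
received_as_path = [65002, AS_TRANS, AS_TRANS, 65001]
received_as4_path = [131074, 65537]
print(reconstruct_as_path(received_as_path, received_as4_path))
# [65002, 131074, 65537, 65001], i.e. (65002, 2.2, 1.1, 65001)
```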
Adjusting the display format of 4-byte AS numbers affects the matching results of AS_Path
regular expressions and extended community filters. If you adjust the display format of
4-byte AS numbers on a system that uses an AS_Path regular expression or extended
community filter as the export or import policy, reconfigure the AS_Path regular expression
or extended community filter. If you do not reconfigure the AS_Path regular expression or
extended community filter, routes cannot match the export or import policy, and a network
error may occur.
Benefits
4-byte AS numbers alleviate AS number exhaustion and therefore are beneficial to carriers
who need to expand the network scale.
1.10.9.2.20 BMP
Background
The BGP Monitoring Protocol (BMP) is designed to monitor BGP running status, such as
BGP peer relationship establishment and termination and route updates.
Without BMP, manual query is required if you want to know about BGP running status.
With BMP, a router can be connected to a monitoring server and configured to report BGP
running statistics to the server for monitoring, which improves the network monitoring
efficiency.
BMP Messages
Routers send BMP packets carrying Initiation, Peer Up Notification (PU), Route Monitoring
(RM), Peer Down Notification (PD), Status Report (SR), or Termination messages to the
monitoring server to report BGP running statistics. The functions of these messages are listed
as follows:
Initiation message: Reports to the monitoring server such information as the router
vendor and its software version.
PU message: Notifies the monitoring server that a BGP peer relationship has been
established.
RM message: Sends to the monitoring server all routes received from BGP peers and
notifies the server of route addition or deletion in real time.
PD message: Notifies the monitoring server that a BGP peer has been disconnected.
SR message: Reports router running statistics to the monitoring server.
Termination message: Reports to the monitoring server the cause of BMP session
termination.
BMP sessions are unidirectional. Routers send messages to the monitoring server but ignore messages
replied by the server.
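Every message listed above is preceded by a common BMP header carrying the protocol version, message length, and message type. The following is a minimal parsing sketch (a hypothetical helper, not device code), with type codes as defined by the BMP standard:

```python
import struct

# BMP message type codes from the BMP standard.
BMP_MSG_TYPES = {
    0: "Route Monitoring",
    1: "Statistics Report",
    2: "Peer Down Notification",
    3: "Peer Up Notification",
    4: "Initiation",
    5: "Termination",
}

def parse_bmp_common_header(data: bytes):
    """Parse the 6-byte common header: version (1 byte),
    message length (4 bytes), message type (1 byte)."""
    version, length, msg_type = struct.unpack("!BIB", data[:6])
    return version, length, BMP_MSG_TYPES.get(msg_type, "Unknown")

# A BMPv3 header announcing a 32-byte Initiation message:
header = struct.pack("!BIB", 3, 32, 4)
print(parse_bmp_common_header(header))  # (3, 32, 'Initiation')
```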
Implementation
In Figure 1-894, a TCP connection is established between the monitoring server and PE1 and
between the monitoring server and PE2. PE1 and PE2 send unsolicited BMP packets to the
monitoring server to report BGP running statistics. After receiving these BMP packets, the
monitoring server parses them and displays the BGP running status in the monitoring view.
The BMP packets carry headers. By analyzing the headers, the monitoring server can
determine which BGP peers advertised the routes carried in these packets.
When establishing a connection between a router and a monitoring server, note the following
rules:
You can specify a port for the TCP connection between the router and the monitoring
server.
One router can connect to multiple monitoring servers, and one monitoring server can
also connect to multiple routers.
In each BMP instance, one router can connect to only one monitoring server.
The monitoring server monitors all BGP peers. Specifying the BGP peer to be monitored
is not supported.
Benefits
BMP facilitates the monitoring of BGP running status and reports security threats in real time
so that preventive measures can be taken promptly.
Background
If multiple routes to the same destination are available, a BGP device selects one optimal
route based on BGP route selection policies and advertises the route to its BGP peers.
For details about BGP route selection policies, see BGP Principles.
However, in scenarios with master and backup provider edges (PEs) or route reflectors (RRs),
if routes are selected based on the preceding policies and the primary link fails, the BGP route
convergence takes a long time because no backup route is available. To address this problem,
the BGP Best External feature was introduced.
Related Concepts
BGP Best External: A mechanism that enables a backup device to select a sub-optimal route
and send the route to its BGP peers if the route preferentially selected based on BGP route
selection policies is an Internal Border Gateway Protocol (IBGP) route advertised by the
master device. Therefore, BGP Best External speeds up BGP route convergence if the primary
link fails.
Best External route: The sub-optimal route selected after BGP Best External is enabled.
BGP Best External can be enabled on PE2 to address this problem. With BGP Best External,
PE2 selects the EBGP route from CE1 and advertises it to PE3. Therefore, PE3 has two routes
to 1.1.1.1/32, in which the route CE1 -> PE2 -> PE3 backs up CE1 -> PE1 -> PE3. Table
1-251 lists the differences with and without BGP Best External.
BGP Best External | Optimal Route | Route Available on PE3 | If the Link Between CE1 and PE1 Fails
Not enabled | CE1 -> PE1 -> PE3 | CE1 -> PE1 -> PE3 | A new route must be selected to take over traffic after routes are converged.
Enabled | CE1 -> PE1 -> PE3 | CE1 -> PE1 -> PE3; CE1 -> PE2 -> PE3 | Traffic is switched to CE1 -> PE2 -> PE3 immediately.
BGP Best External can be enabled on RR2 to address this problem. With BGP Best External,
RR2 selects the EBGP route from Device B and advertises it to Device C. Therefore, Device
C has two routes to 1.1.1.1/32, in which the route Device A -> Device B -> RR2 -> Device C
backs up Device A -> Device B -> RR1 -> Device C. Table 1-252 lists the differences with
and without BGP Best External.
BGP Best External | Optimal Route | Route Available on Device C | If the Link Between Device B and RR1 Fails
Not enabled | Device A -> Device B -> RR1 -> Device C | Device A -> Device B -> RR1 -> Device C | A new route must be selected to take over traffic after routes are converged.
Enabled | Device A -> Device B -> RR1 -> Device C | Device A -> Device B -> RR1 -> Device C; Device A -> Device B -> RR2 -> Device C | Traffic is switched to Device A -> Device B -> RR2 -> Device C immediately.
Usage Scenario
The BGP Best External feature applies to scenarios in which master and backup PEs or RRs
are deployed and the backup PE or RR needs to advertise the sub-optimal route (Best External
route) to its BGP peers to speed up BGP route convergence.
Advantages
As networks develop, services, such as voice over IP (VoIP), online video, and financial
services, pose higher requirements for real-time transmission. With BGP Best External, the
backup device selects the sub-optimal route and advertises the route to its BGP peers, which
speeds up BGP route convergence and minimizes service interruptions.
Background
In a scenario with a route reflector (RR) and clients, if the RR has multiple routes to the same
destination (with the same prefix), the RR selects an optimal route from these routes and then
sends only the optimal route to its clients. Therefore, the clients have only one route to the
destination. If a link along this route fails, route convergence takes a long time, which cannot
meet the requirements on high reliability.
To address this issue, deploy the BGP Add-Path feature on the RR. With BGP Add-Path, the
RR can send two or more routes with the same prefix to its clients. After reaching the clients,
these routes can back up each other or load-balance traffic, which ensures high reliability in
data transmission.
For details about route selection and advertisement policies, see 1.10.9.2.1 Basic Principle.
BGP Add-Path is deployed on RRs in most cases although it can be configured on any router.
With BGP Add-Path, you can configure the maximum number of routes with the same prefix that an
RR can send to its clients. The actual number of routes with the same prefix that an RR can send to
its clients is the smaller value between the configured maximum number and the number of
available routes with the same prefix.
Related Concepts
Add-Path routes: routes selected by BGP after BGP Add-Path is configured.
Typical Networking
On the network shown in Figure 1-897, Device A, Device B, and Device C are clients of the
RR, and Device D is an EBGP peer of Device B and Device C.
Each of Device B and Device C receives a route to 1.1.1.1/32 from Device D, with 9.1.1.1/24
and 9.1.2.1/24 as the next hops, respectively. Then, each of Device B and Device C sends the
received route to the RR. After receiving the two routes, the RR selects an optimal route from
them and sends it to Device A. Therefore, Device A has only one route to 1.1.1.1/32.
BGP Add-Path can be configured to allow the RR to send more than one route with the same
prefix to Device A. Suppose that the configured maximum number of routes with the same
prefix that the RR can send to Device A is 2 and that the optimal route selected by the RR is
the one from Device B. Table 1-253 lists the differences with and without BGP Add-Path.
Usage Scenario
The BGP Add-Path feature applies to scenarios in which an RR and clients are deployed and
the RR needs to send more than one route with the same prefix to its clients to ensure high
reliability in data transmission.
BGP Add-Path is used in traffic optimization scenarios and allows multiple routes to be sent
to the controller.
Benefits
Deploying BGP Add-Path can improve network reliability.
Background
In some scenarios, if a large number of routes are iterated to the same next hop that flaps
frequently, the system will be busy processing reselection and re-advertisement of these routes,
which consumes excessive resources and leads to high CPU usage. BGP iteration suppression
in case of next hop flapping can address this problem.
Principles
After this function is enabled, BGP maintains a penalty value that starts from 0 and, each time
the next hop flaps, adjusts it by comparing the flapping interval with the configured intervals.
When the penalty value exceeds 10, BGP suppresses route iteration to the corresponding next hop.
For example, if the intervals for increasing, retaining, and clearing the penalty value are T1,
T2, and T3, respectively, BGP calculates the penalty value as follows:
Increases the penalty value by 1 if the flapping interval is less than T1.
Retains the penalty value if the flapping interval is greater than or equal to T1, but less
than T2.
Reduces the penalty value by 1 if the flapping interval is greater than or equal to T2, but
less than T3.
Clears the penalty value if the flapping interval is greater than or equal to T3.
When the penalty value exceeds 10, the system processes reselection and re-advertisement of
the routes that are iterated to a flapping next hop much slower.
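The T1/T2/T3 rules above can be sketched as follows. This is an illustrative model; the function and variable names are assumptions, not device code:

```python
def update_penalty(penalty: int, flap_interval: float,
                   t1: float, t2: float, t3: float) -> int:
    """One penalty update per next-hop flap, following the rules above
    (T1 < T2 < T3). The penalty starts at 0 and never goes below 0;
    iteration to the next hop is suppressed once it exceeds 10."""
    if flap_interval < t1:
        return penalty + 1          # increase
    if flap_interval < t2:
        return penalty              # retain
    if flap_interval < t3:
        return max(penalty - 1, 0)  # reduce
    return 0                        # clear

# Three rapid flaps, each shorter than T1 = 60s, raise the penalty to 3.
penalty = 0
for interval in [1, 1, 1]:
    penalty = update_penalty(penalty, interval, t1=60, t2=120, t3=300)
print(penalty)  # 3
```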
Benefits
BGP iteration suppression in case of next hop flapping prevents the system from frequently
processing reselection and re-advertisement of a large number of routes that are iterated to a
flapping next hop, which reduces system resource consumption and CPU usage.
1.10.9.2.24 BGP-LS
BGP-link state (LS) enables BGP to report topology information collected by IGPs to the
controller.
Background
BGP-LS is a new method of collecting topology information.
Without BGP-LS, the router uses an IGP (OSPF or IS-IS) to collect topology information of
each AS, and the IGP reports the information to the controller. This topology information
collection method has the following disadvantages:
The controller must have high computing capabilities and support the IGP and its
algorithm.
The controller cannot gain the complete inter-AS topology information and therefore is
unable to calculate optimal E2E paths.
Different IGPs report topology information separately to the controller, which
complicates the controller's analysis and processing.
For details on how OSPF collects topology information, see NE20E Feature Description -
OSPF.
For details on how IS-IS collects topology information, see NE20E Feature Description -
IS-IS.
With powerful routing capabilities of BGP, BGP-LS has the following advantages:
Reduces the controller's computing capability requirements and eliminates the need for
the controller to run IGPs.
Facilitates route selection and calculation on the controller by using BGP to summarize
the topology information of each IGP process or AS and report the complete information
to the controller.
Requires only one routing protocol (BGP) to report topology information to the
controller.
Related Concepts
BGP-LS provides a simple and efficient method of collecting topology information.
BGP-LS routes carry topology information and are classified into three types, which carry
node, link, and route prefix information, respectively. These route types jointly describe
the topology.
Item Description
NODE Field indicating that the BGP-LS route is a
node route.
ISIS-LEVEL-1 Protocol that collects topology information.
The protocol is IS-IS in this example.
IDENTIFIER0 Identifier of the protocol that collects
topology information.
LOCAL Field indicating information of a local node.
as BGP-LS domain AS number.
bgp-ls-identifier BGP-LS domain ID.
ospf-area-id OSPF area ID.
igp-router-id IGP router ID, generated by the IGP that
collects topology information. The router ID
is obtained from the NET of an IS-IS
process in this example.
Item Description
LINK Field indicating that the BGP-LS route is a
link route.
ISIS-LEVEL-1 Protocol that collects topology information.
The protocol is IS-IS in this example.
IDENTIFIER0 Identifier of the protocol that collects
topology information.
LOCAL Field indicating information of a local node.
as BGP-LS domain AS number.
bgp-ls-identifier BGP-LS domain ID.
ospf-area-id OSPF area ID.
igp-router-id IGP router ID, generated by the IGP that
collects topology information. The router ID
is obtained from the NET of an IS-IS
process in this example.
REMOTE Field indicating information of a remote
node.
if-address IP address of the local interface.
peer-address IP address of the remote interface.
mt-id ID of the topology.
Item Description
mt-id ID of the topology.
ospf-route-type OSPF route type:
1: Intra-Area
2: Inter-Area
3: External 1
4: External 2
5: NSSA 1
6: NSSA 2
Typical Networking
Networking in which topology information is collected within an IGP area
In Figure 1-898, Device A, Device B, Device C, and Device D use IS-IS to communicate with
each other at the network layer. They are all Level-2 devices in the same area (area 10). Only
one of the four devices needs to have BGP-LS deployed and establish a BGP-LS peer
relationship with the controller so that BGP-LS can collect and report topology information to
the controller. To improve reliability, deploying BGP-LS on two or more devices and
establishing a BGP-LS peer relationship between each BGP-LS device and the controller are
recommended. The BGP-LS devices collect the same topology information, and they back up
each other in case one of them fails.
Figure 1-898 Networking in which topology information is collected within an IGP area
In Figure 1-899, Device A, Device B, Device C, and Device D use IS-IS to communicate with
each other at the network layer. Device A, Device B, and Device C reside in area 10, whereas
Device D resides in area 20. Device A and Device B are Level-1 devices, Device C is a
Level-1-2 device, and Device D is a Level-2 device. Only one of the four devices needs to
have BGP-LS deployed and establish a BGP-LS peer relationship with the controller so that
BGP-LS can collect and report topology information to the controller. To improve reliability,
deploying BGP-LS on two or more devices and establishing a BGP-LS peer relationship
between each BGP-LS device and the controller are recommended. The BGP-LS devices
collect the same topology information, and they back up each other in case one of them fails.
Figure 1-899 Networking in which topology information is collected between IGP areas
Figure 1-900 Networking 1 in which topology information is collected between BGP ASs
If two controllers are deployed and are connected to different ASs, for example in Figure
1-901, a BGP-LS peer relationship must be established between the two controllers or
between Device B and Device C so that both controllers can obtain topology information on
the whole network.
Figure 1-901 Networking 2 in which topology information is collected between BGP ASs
To reduce the number of connections to the controller, deploy one or more BGP-LS RRs and establish
BGP-LS peer relationships between each RR and the devices that require BGP-LS peer relationships
with the controller.
Usage Scenario
The router functions as a forwarder and reports topology information to the controller for
topology monitoring and traffic control.
Benefits
BGP-LS offers the following benefits:
Reduces computing capability requirements of the controller.
Allows the controller to gain the complete inter-AS topology information.
Requires only one routing protocol (BGP) to report topology information to the
controller.
Definition
Routing policies are used to filter routes and control how routes are received and advertised.
If route attributes, such as reachability, are changed, the path along which network traffic
passes changes accordingly.
Purpose
When advertising, receiving, and importing routes, the router implements certain routing
policies based on actual networking requirements to filter routes and change the route
attributes. Routing policies serve the following purposes:
Control route advertising
Only routes that match the rules specified in a policy are advertised.
Control route receiving
Only the required and valid routes are received, which reduces the routing table size and
improves network security.
Filter and control imported routes
A routing protocol may import routes discovered by other routing protocols. Only routes
that satisfy certain conditions are imported to meet the requirements of the protocol.
Modify attributes of specified routes
To enrich routing information, a routing protocol may import routing information
discovered by other routing protocols. Only the routing information that satisfies the
conditions is imported. Some attributes of the imported routing information are changed
to meet the requirements of the routing protocol.
Benefits
Routing policies have the following benefits:
Control the routing table size, saving system resources.
Control route receiving and advertising, improving network security.
Modify attributes of routes for proper traffic planning, improving network performance.
Policy-based routing (PBR) is implemented through user-defined routing policies. PBR selects
routes based on the user-defined routing policies, with reference to the source IP addresses
and lengths of incoming packets. PBR can be used to improve security and implement load
balancing.
A routing policy and PBR have different mechanisms. Table 1-257 shows the differences
between them.
1.10.10.2 Principles
Implementation
Routing policies are implemented in the following steps:
1. Define rules. Characteristics of routing information to which routing policies are applied
need to be defined. Specifically, you need to define a set of matching rules regarding
different attributes of routing information, such as the destination address and AS
number.
2. Apply rules. Matching rules are applied to advertise, receive, or import routes.
Filter
A filter is the core of a routing policy and is defined using a set of matching rules. The NE20E
provides the filters listed in Table 1-258.
The ACL, IP prefix list, AS_Path, community, Extended community, and RD filters can be
used to filter routes but cannot modify route attributes. A route-policy is a comprehensive
filter and can use the matching rules of the ACL, IP prefix list, AS_Path, community,
Extended community, and RD filters to filter routes and change route attributes.
ACL
An ACL is a set of sequential filtering rules. Users can define rules based on packet
information, such as inbound interfaces, source or destination IP addresses, protocol types, or
source or destination port numbers and specify an action to deny or permit packets. After an
ACL is configured, the system classifies received packets based on the rules defined in the
ACL and denies or permits the packets accordingly.
An ACL only classifies packets based on defined rules and filters packets only after it is
applied to a routing policy.
ACLs can be configured for both IPv4 packets and IPv6 packets. Based on the usage, ACLs
are classified as interface-based ACLs, basic ACLs, or advanced ACLs. Users can specify the
IP address and subnet address range in an ACL to match the source IP address, destination
network segment address, or the next hop address of a route.
ACLs can be configured on access or core devices to:
Protect the devices against IP, TCP, and Internet Control Message Protocol (ICMP)
packet attacks.
Control network access. For example, ACLs can be used to control the access of
enterprise network users to external networks, the specific network resources that users
can access, and the period for which users can access networks.
Limit network traffic and improve network performance. For example, ACLs can be
used to limit bandwidth for upstream and downstream traffic, charge for the bandwidth
that users have applied for, and fully use high-bandwidth network resources.
For details about ACL features, see 1.9.3 ACL.
IP Prefix List
An IP prefix list contains a group of route filtering rules. Users can specify the prefix and
mask length range to match the destination network segment address or the next hop address
of a route. An IP prefix list is used to filter routes that are advertised and received by various
dynamic routing protocols.
An IP prefix list is easier and more flexible than an ACL. However, if a large number of
routes with different prefixes need to be filtered, configuring an IP prefix list to filter the
routes is complex.
IP prefix lists can be configured for both IPv4 routes and IPv6 routes, and they share the same
implementation process. An IP prefix list filters routes based on the mask length or mask
length range.
Mask length: An IP prefix list filters routes based on IP address prefixes. An IP address
prefix is defined by an IP address and the mask length. For example, for route
10.1.1.1/16, the mask length is 16 bits, and the valid prefix is 16 bits (10.1.0.0).
Mask length range: Routes with the IP address prefix and mask length within the range
defined in the IP prefix list meet the matching rules.
0.0.0.0 is a wildcard address. If the IP prefix is 0.0.0.0, specify either a mask or a mask length range,
with the following results:
If a mask is specified, all routes with the mask are permitted or denied.
If a mask length range is specified, all routes with the mask length in the range are permitted or
denied.
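The mask-length and mask-length-range matching described above can be sketched with a hypothetical helper built on Python's ipaddress module. The function name and parameters are illustrative, not device configuration:

```python
import ipaddress

def matches_prefix_rule(route: str, prefix: str, ge=None, le=None) -> bool:
    """Return True if the route matches one prefix-list rule: its first
    prefix-length bits must equal the rule's prefix, and its own mask
    length must fall in the configured range (exactly the rule's mask
    length when no greater-equal/less-equal range is given)."""
    r = ipaddress.ip_network(route, strict=False)
    p = ipaddress.ip_network(prefix, strict=False)
    lo = ge if ge is not None else p.prefixlen
    hi = le if le is not None else lo
    if not (lo <= r.prefixlen <= hi):
        return False
    return r.network_address in p  # prefix bits of the route match the rule

# Rule "10.1.0.0/16 greater-equal 16 less-equal 24":
print(matches_prefix_rule("10.1.1.0/24", "10.1.0.0/16", ge=16, le=24))  # True
print(matches_prefix_rule("10.2.0.0/16", "10.1.0.0/16", ge=16, le=24))  # False
print(matches_prefix_rule("10.1.1.0/25", "10.1.0.0/16"))                # False
```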
AS_Path
An AS_Path filter is used to filter BGP routes based on the AS_Path attributes that they carry.
The AS_Path attribute records, in vector order, the numbers of all the ASs through which a
BGP route passes from the local end to the destination. Therefore, AS_Path attributes can be
used to filter BGP routes.
The matching condition of an AS_Path is specified using a regular expression. For example,
^30 indicates that only the AS_Path attribute starting with 30 is matched. Using a regular
expression can simplify configurations. For details about regular expressions, see
Configuration Guide - Basic Configurations.
The AS_Path attribute is a private attribute of BGP and is therefore used to filter BGP routes only. For
details about the AS_Path attribute, see 1.10.9.2.1 Basic Principle.
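As a rough illustration of how a regular expression such as ^30 selects AS_Path strings, consider the following sketch. The AS numbers and paths are hypothetical, and the word boundary (\b) prevents 300 or 3000 from matching:

```python
import re

# Hypothetical AS_Path strings as recorded in BGP routes; the left-most
# AS number is the most recently traversed neighbor AS.
as_paths = ["30 200 100", "65001 30", "300 40"]

# A "^30" style filter: match AS_Paths whose first AS number is 30.
pattern = re.compile(r"^30\b")
matched = [p for p in as_paths if pattern.search(p)]
```

Here only "30 200 100" is selected: "65001 30" does not begin with 30, and "300 40" begins with the different AS number 300.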
Community
A community filter is used to filter BGP routes based on the community attributes contained
in BGP routes. The community attribute identifies a set of destination addresses with the
same characteristics. Therefore, community attributes can be used to filter BGP routes.
In addition to the well-known community attributes, users can define community attributes
using digits. The matching condition of a community filter can be specified using a
community ID or a regular expression.
Like AS_Path filters, community filters are used to filter only BGP routes because the community
attribute is also a private attribute of BGP. For details about the community attribute, see 1.10.9.2.6
Community Attribute.
Extended Community
An extended community filter is used to filter BGP routes based on extended community attributes.
BGP extended community attributes are classified into two types:
VPN target: A VPN target controls route learning between VPN instances, isolating
routes of VPN instances from each other. A VPN target may be either an import or export
VPN target. Before advertising a VPNv4 or VPNv6 route to a remote MP-BGP peer, a
PE adds an export VPN target to the route. After receiving a VPNv4 or VPNv6 route, the
remote MP-BGP peer compares the received export VPN target with the local import
VPN target. If they are the same, the remote MP-BGP peer adds the route to the routing
table of the local VPN instance.
Site of Origin (SoO): Several CEs at a VPN site may be connected to different PEs.
The VPN routes advertised from the CEs to the PEs may be re-advertised to the VPN site
where the CEs reside after the routes have traversed the backbone network, causing
routing loops at the VPN site. In this situation, configure an SoO attribute for VPN
routes. With the SoO attribute, routes advertised from different VPN sites can be
distinguished and will not be advertised to the source VPN site, preventing routing loops.
The formats of a VPN target attribute and an SoO attribute are the same. The matching
condition of an extended community can be specified using an extended community ID or a
regular expression.
An extended community is used to filter only BGP routes because the extended community attribute is
also a private attribute of BGP. For details about the extended community attribute, see 1.14.6.2.1 Basic
BGP/MPLS IP VPN.
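The import/export VPN target comparison described above can be modeled as a simple set intersection. This is a sketch with invented names, not the actual MP-BGP implementation:

```python
def accept_vpn_route(route_export_rts, local_import_rts):
    """A received VPNv4/VPNv6 route is installed into a local VPN instance
    if any export VPN target carried by the route matches any import VPN
    target configured for that instance."""
    return bool(set(route_export_rts) & set(local_import_rts))
```

For example, a route carrying export target 100:1 is accepted by an instance whose import targets include 100:1, and rejected by an instance configured only with 200:1.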
RD
An RD filter is used to filter BGP routes based on the RDs carried in VPN routes. RDs are
used to distinguish IPv4 and IPv6 prefixes in the same address segment in different VPN
instances. An RD filter specifies matching rules regarding RD attributes.
For details about how to configure an RD, see HUAWEI NE20E-S2 Universal Service
Router Configuration Guide – VPN.
Route-Policy
A route-policy is a comprehensive filter. It is used to match attributes of specified routes and
change route attributes when specific conditions are met. A route-policy can use the preceding
six filters to define its matching rules.
Composition of a Route-Policy
As shown in Figure 1-902, a route-policy consists of node IDs, matching mode, if-match
clauses, and apply clauses.
− Node ID
A route-policy consists of one or more nodes, each identified by a node ID. In a
route-policy, routes are filtered based on the following rules:
Sequential matching: The system checks entries based on node IDs in
ascending order. Therefore, specifying the node IDs in the required sequence is
recommended.
One-time matching: The relationship between the nodes of a route-policy is
"OR". If a route matches one node, the route matches the route-policy and will
not be matched against the next node.
− Matching mode
Either of the following matching modes can be used:
permit: specifies the permit mode of a node. If a route matches the if-match
clauses of a node, all the actions defined by apply clauses are performed, and
the matching is complete. If a route does not match the if-match clauses of the
node, the route continues to match against the next node.
deny: specifies the deny mode of a node. In deny mode, the apply clauses are
not used. If a route matches all the if-match clauses of the node, the route is
denied by the node and no longer matches against the next node. If the route
does not match any of the if-match clauses, the route continues to match
against the next node.
To allow the remaining routes to pass, a node in permit mode that contains no if-match or apply clause
needs to be configured after the nodes in deny mode.
− if-match clause
The if-match clause defines the matching rules.
Each node of a route-policy can comprise multiple if-match clauses or no if-match
clause at all. If no if-match clause is configured for a node in permit mode, all
IPv4 and IPv6 routes match the node. If an if-match clause is configured to match
only IPv4 routes for a node in permit mode, matching IPv4 routes and all IPv6
routes match the node. If an if-match clause is configured to match only IPv6
routes for a node in permit mode, matching IPv6 routes and all IPv4 routes match
the node.
− apply clause
The apply clauses specify actions. When a route matches a route-policy, the system
sets some attributes for the route based on the apply clause.
Each node of a route-policy can comprise multiple apply clauses or no apply clause
at all. The apply clause is not used when routes need to be filtered but attributes of
the routes do not need to be changed.
Matching results of a route-policy
The matching results of a route-policy are obtained based on the following aspects:
− Matching mode of the node, either permit or deny
− Matching rules (either permit or deny) contained in the if-match clause (such as
ACLs or IP prefix lists)
The matching results are listed in Table 1-259.
On the HUAWEI NE20E-S2, all routes that fail to match a route-policy are denied by the route-policy
by default. If more than one node is defined in a route-policy, at least one of them must be in permit
mode. The reason is as follows:
If a route fails to match any of the nodes, the route is denied by the route-policy.
If all the nodes in the route-policy are set in deny mode, all the routes to be filtered are denied by the
route-policy.
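The node-by-node evaluation rules above (sequential matching, one-time matching, permit/deny modes, and the default deny for unmatched routes) can be sketched as follows. The data structures and names are invented for illustration and do not reflect the device's implementation:

```python
def evaluate_route_policy(route, nodes):
    """Evaluate a route against route-policy nodes in ascending node-ID order.

    Each node is a tuple (node_id, mode, if_match_fns, apply_fns):
      - mode is "permit" or "deny"
      - a node matches when ALL of its if-match functions return True;
        a node with no if-match clauses matches every route
    Returns (permitted, route); routes matching no node are denied by default.
    """
    for node_id, mode, if_matches, applies in sorted(nodes, key=lambda n: n[0]):
        if all(check(route) for check in if_matches):
            if mode == "permit":
                for apply_fn in applies:      # set route attributes
                    apply_fn(route)
                return True, route            # matched: stop at this node
            return False, route               # denied by a deny-mode node
    return False, route                       # default: deny unmatched routes
```

For example, a deny node matching 10.0.0.0/8 routes followed by an empty permit node denies the 10.0.0.0/8 routes while permitting (and optionally modifying) all others, mirroring the recommendation above.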
Other Functions
In addition to the preceding functions, routing policies provide an enhanced feature: BGP to IGP.
In some scenarios, when an IGP uses a routing policy to import BGP routes, route attributes
(for example, the cost) need to be set based on private BGP attributes, such as the community
attribute. However, without the BGP to IGP feature, the IGP cannot identify private attributes,
such as community attributes, and the BGP routes are therefore denied. As a result, apply
clauses used to set route attributes do not take effect.
With the BGP to IGP feature, route attributes can be set based on private attributes, such as
the community, extended community, and AS_Path attributes in BGP routes. The BGP to IGP
implementation process is as follows:
When an IGP imports BGP routes through a route-policy, route attributes can be set
based on private attributes, such as the community attribute in BGP routes.
If BGP routes carry private attributes, such as community attributes, the system filters
the BGP routes based on the private attributes. If the BGP routes meet the matching rules,
the routes match the route-policy, and apply clauses take effect.
If BGP routes do not carry private attributes, such as community attributes, the BGP
routes fail to match the route-policy and are denied, and apply clauses do not take effect.
1.10.10.3 Applications
There are multiple approaches to meet the preceding requirements, and the following two
approaches are used in this example:
Use IP prefix lists
− Configure an IP prefix list for Device A and configure the IP prefix list as an export
policy on Device A for OSPF.
− Configure another IP prefix list for Device C and configure the IP prefix list as an
import policy on Device C for OSPF.
Use route-policies
− Configure a route-policy (the matching rules can be the IP prefix list, cost, or route
tag) for Device A and configure the route-policy as an export policy on Device A for
OSPF.
− Configure another route-policy on Device C and configure the route-policy as an
import policy on Device C for OSPF.
Compared with an IP prefix list, a route-policy can change route attributes and control
routes more flexibly, but it is more complex to configure.
To meet the preceding requirements, configure a route-policy for Device A to set a tag for the
imported IS-IS routes. Device D identifies the IS-IS routes from OSPF routes based on the
tag.
To establish an inter-AS label switched path (LSP) between PE1 and PE2, route-policies need
to be configured for autonomous system boundary routers (ASBRs).
When an ASBR advertises the routes received from a PE in the same AS to the peer
ASBR, the ASBR allocates MPLS labels to the routes using a route-policy.
When an ASBR advertises labeled IPv4 routes to a PE in the same AS, the ASBR
reallocates MPLS labels to the routes using another route-policy.
In addition, to control route transmission between different VPN instances on a PE, configure
a route-policy for the PE and configure the route-policy as an import or export policy for the
VPN instances.
To enable devices on the MAN to access the backbone network, Device C and Device D need
to import routes. When OSPF imports BGP routes, a routing policy can be configured to
control the number of imported routes based on private attributes (such as the community) of
the imported BGP routes or modify the cost of the imported routes to control the MAN egress
traffic.
1.11 IP Multicast
1.11.1 About This Document
Purpose
This document describes the IP multicast feature in terms of its overview, principles, and
applications.
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
Network planning engineers
Commissioning engineers
Data configuration engineers
System maintenance engineers
Security Declaration
Encryption algorithm declaration
The encryption algorithms DES/3DES/SKIPJACK/RC2/RSA (RSA-1024 or
lower)/MD2/MD4/MD5 (in digital signature scenarios and password encryption)/SHA1
(in digital signature scenarios) have low security and may bring security risks. If
protocols allow, using more secure encryption algorithms, such as AES/RSA
(RSA-2048 or higher)/SHA2/HMAC-SHA2, is recommended.
Password configuration declaration
− Do not set both the start and end characters of a password to "%^%#". Otherwise,
the password will be displayed directly in the configuration file.
− To further improve device security, periodically change the password.
Personal data declaration
Your purchased products, services, or features may use some of users' personal data during
service operation or fault locating. You must define user privacy policies in compliance
with local laws and take proper measures to fully protect personal data.
Feature declaration
− The NetStream feature may be used to analyze the communication information of
terminal customers for network traffic statistics and management purposes. Before
enabling the NetStream feature, ensure that it is performed within the boundaries
permitted by applicable laws and regulations. Effective measures must be taken to
ensure that information is securely protected.
− The mirroring feature may be used to analyze the communication information of
terminal customers for a maintenance purpose. Before enabling the mirroring
function, ensure that it is performed within the boundaries permitted by applicable
laws and regulations. Effective measures must be taken to ensure that information is
securely protected.
− The packet header obtaining feature may be used to collect or store some
communication information about specific customers for transmission fault and
error detection purposes. Huawei cannot offer services to collect or store this
information unilaterally. Before enabling the function, ensure that it is performed
within the boundaries permitted by applicable laws and regulations. Effective
measures must be taken to ensure that information is securely protected.
Reliability design declaration
Network planning and site design must comply with reliability design principles and
provide device- and solution-level protection. Device-level protection includes planning
principles of dual-network and inter-board dual-link to avoid single point or single link
of failure. Solution-level protection refers to a fast convergence mechanism, such as FRR
and VRRP.
Special Declaration
This document serves only as a guide. The content is written based on device
information gathered under lab conditions. The content provided by this document is
intended to be taken as general guidance, and does not cover all scenarios. The content
provided by this document may be different from the information on user device
interfaces due to factors such as version upgrades and differences in device models,
board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are
beyond the scope of this document.
The maximum values provided in this document are obtained in specific lab
environments (for example, only a certain type of board or protocol is configured on a
tested device). The actually obtained maximum values may be different from the
maximum values provided in this document due to factors such as differences in
hardware configurations and carried services.
Interface numbers used in this document are examples. Use the existing interface
numbers on devices for configuration.
The pictures of hardware in this document are for reference only.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
Danger: Indicates an imminently hazardous situation which, if not
avoided, will result in death or serious injury.
Change History
Updates between document issues are cumulative. Therefore, the latest document issue
contains all updates made in previous issues.
Changes in Issue 03 (2017-09-20)
This issue is the third official release. The software version of this issue is
V800R009C10SPC200.
Changes in Issue 02 (2017-07-30)
This issue is the second official release. The software version of this issue is
V800R009C10SPC100.
Changes in Issue 01 (2017-05-30)
This issue is the first official release. The software version of this issue is
V800R009C10.
IP Transmission Modes
Based on the IP address types, networks can transmit packets in the following modes:
IP unicast mode
IP broadcast mode
IP multicast mode
Any of these modes can be used for P2MP data transmission.
Unicast transmission
− Features: A unicast packet uses a unicast address as the destination address. If
multiple receivers require the same packet from a source, the source sends an
individual unicast packet to each receiver.
− Disadvantages: This mode consumes unnecessary bandwidth and processor
resources when sending the same packet to a large number of receivers.
Additionally, the unicast transmission mode does not guarantee transmission quality
when a large number of hosts exist.
Broadcast transmission
− Features: A broadcast packet uses a broadcast address as the destination address. In
this mode, a source sends only one copy of each packet to all hosts on the network
segment, irrespective of whether a host requires the packet.
− Disadvantages: This mode requires that the source and receivers reside on the same
network segment. Because all hosts on the network segment receive packets sent by
the source, this mode cannot guarantee information security or charging of services.
Multicast transmission
As shown in Figure 1-907, a source exists on the network. User A and User C require
information from the source, while User B does not. The transmission mode is multicast.
− Advantages: In multicast mode, a single information flow is sent to users along the
distribution tree, and a maximum of one copy of the data flow exists on each link.
Users who do not require the packet do not receive the packet, providing the basis
for information security. Compared with unicast, multicast does not increase the
network load when the number of users increases in the same multicast group. This
advantage prevents the server and CPU from being overloaded. Compared with
broadcast, multicast can transmit information across network segments and across
long distances.
Multicast technologies therefore provide the ideal solution when one source must
address multiple receivers with efficient P2MP data transmission.
− Multicast applications: Multicast applies to all P2MP applications, such as
multimedia presentations, streaming media, communications for training and
tele-learning, highly reliable data storage, and finance (stock-trading) applications.
IP multicast is being widely used in Internet services, such as online broadcast,
network TV, distance learning, remote medicine, network TV broadcast, and
real-time video and audio conferencing.
1.11.2.2 Principles
1.11.2.2.1 Basic Concepts
Multicast Group
A multicast group consists of a group of receivers that require the same data stream. A
multicast group uses an IP multicast address identifier. A host that joins a multicast group
becomes a member of the group and can identify and receive IP packets that have the IP
multicast address as the destination address.
Multicast Source
A multicast source sends IP packets that carry multicast destination addresses.
A multicast source can simultaneously send data to multiple multicast groups.
Multiple multicast sources can simultaneously send data to a same multicast group.
Multicast Router
A router that supports the multicast feature is called a multicast router.
A multicast router implements the following functions:
Manages group members on the leaf segment networks that connect to users.
Routes and forwards multicast packets.
IP multicast is an end-to-end service. Figure 1-908 shows the four IP multicast functions from
the lower protocol layer to the upper protocol layer.
A permanent multicast group address, also known as a reserved multicast group address,
identifies all devices in a multicast group that may contain any number (including 0) of
members. For details, see Table 1-262.
A temporary multicast group address, also known as a common group address, is an IPv4
address that is assigned to a multicast group temporarily. If there is no user in this group,
this address is reclaimed.
Scope Description
FF0x::/32 Well-known multicast addresses defined by the IANA
For details, see Table 1-264.
FF1x::/32 (x cannot be 1 or 2) ASM addresses valid on the entire network
FF2x::/32 (x cannot be 1 or 2)
FF3x::/32 (x cannot be 1 or 2) SSM addresses
This is the default SSM group address scope and is
valid on the entire network.
Figure 1-910 Mapping relationships between multicast IPv4 addresses and multicast MAC
addresses
The 25 most significant bits of an IPv4 multicast MAC address are fixed (0x01005E followed
by a 0 bit). The first four bits of an IPv4 multicast address are always 1110; of the remaining
28 bits, only the low-order 23 bits are mapped into the MAC address, resulting in the loss of
5 bits. Therefore, 32 IPv4 multicast addresses are mapped to the same MAC address.
The IANA defines that the higher-order 16 bits of an IPv6 MAC address are 0x3333, and the
low-order 32 bits of an IPv6 MAC address are the same as those of a multicast IPv6 address.
Figure 1-911 shows the mapping relationship between the multicast IPv6 address and
multicast IPv6 MAC address.
Figure 1-911 Mapping relationships between multicast IPv6 addresses and multicast MAC
addresses
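The IPv4 and IPv6 mappings described above can be computed directly. The following sketch also demonstrates the 32-to-1 overlap of IPv4 multicast addresses on a single MAC address:

```python
import ipaddress

def ipv4_multicast_mac(addr):
    """Map an IPv4 multicast address to its MAC address:
    fixed 25 bits (01-00-5E + 0 bit) + low-order 23 bits of the address."""
    low23 = int(ipaddress.IPv4Address(addr)) & 0x7FFFFF
    mac = 0x01005E000000 | low23
    return "-".join(f"{(mac >> s) & 0xFF:02x}" for s in range(40, -1, -8))

def ipv6_multicast_mac(addr):
    """Map an IPv6 multicast address to its MAC address:
    fixed 16 bits (33-33) + low-order 32 bits of the address."""
    low32 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFFFF
    mac = 0x333300000000 | low32
    return "-".join(f"{(mac >> s) & 0xFF:02x}" for s in range(40, -1, -8))
```

For example, 224.1.1.1 and 225.1.1.1 differ only in the 5 lost bits and therefore map to the same MAC address, 01-00-5e-01-01-01.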
This document focuses on IP multicast technology and device operation. Multicast in the document
refers to IP multicast, unless otherwise specified.
The NE20E supports various multicast routing protocols to implement different applications.
Table 1-265 describes commonly used multicast routing protocols.
Multicast protocols have two main types of functions: managing member relationships;
establishing and maintaining multicast routes.
ASM Model
In the any-source multicast (ASM) model, any sender can act as a multicast source and send
information to a multicast group address. Receivers cannot know the multicast source location
before they join a multicast group.
SFM Model
From the sender's point of view, the source-filtered multicast (SFM) model works the same as
the ASM model. That is, any sender can act as a multicast source and send information to a
multicast group address.
Compared with the ASM model, the SFM model extends the following function: The upper
layer software checks the source addresses of received multicast packets, permitting or
denying packets of multicast sources as configured.
Compared with ASM, SFM adds multicast source filtering policies. The basic principles and
configurations of ASM and SFM are the same. In this document, information about ASM also applies to
SFM.
SSM Model
In real-world situations, users may not require all data sent by multicast sources. The
source-specific multicast (SSM) model allows users to specify multicast data sources.
Compared with receivers in the ASM model, receivers in the SSM model know the multicast
source location before they join a multicast group. The SSM model uses a different address
scope from the ASM model and sets up a dedicated forwarding path between a source and
receivers.
1.11.2.3 Applications
On this network:
P belongs to the public network. Each customer edge (CE) device belongs to a VPN.
Each router is dedicated to a network and maintains only one forwarding mechanism.
PEs are connected to both the public network and one or more VPNs. The
network information must be completely separated, and a separate set of forwarding
mechanisms needs to be maintained for each network. The set of software and hardware
resources that serves the same network on a PE is called an instance. A PE supports
multiple instances, and one instance can reside on multiple PEs.
For details of the multi-instance multicast technique, see the HUAWEI NE20E-S2 Universal Service
Router Feature Description - VPN.
1.11.3 IGMP
1.11.3.1 Introduction
Definition
In the TCP/IP protocol suite, the Internet Group Management Protocol (IGMP) manages IPv4
multicast members, and sets up and maintains multicast member relationships between IP
hosts and their directly connected multicast routers.
After IGMP is configured on hosts and their directly connected multicast routers, the hosts
can dynamically join multicast groups, and the multicast routers can manage multicast group
members on the local network.
IGMP implements the following functions on the host side and router side:
On the host side, IGMP allows hosts to dynamically join and leave multicast groups
anytime and anywhere. IGMP does not limit the number of hosts that can join or leave a
multicast group.
A host's operating system (OS) determines the IGMP version that the host supports.
On the router side, IGMP enables a router to determine whether multicast receivers of a
specific group exist. Each host stores information about only the multicast groups it
joins.
IGMP has three versions, as listed in Table 1-266:
Purpose
IGMP allows receivers to access IP multicast networks, join multicast groups, and receive
multicast data from multicast sources. IGMP manages multicast group members by
exchanging IGMP messages between hosts and routers. IGMP records host join and leave
information on interfaces, ensuring correct multicast data forwarding on the interfaces.
1.11.3.2 Principles
1.11.3.2.1 Principles of IGMP
IGMP Messages
IGMPv2 and IGMPv3 support leave messages, but IGMPv1 does not.
IGMPv1 does not support querier election; an IGMPv1 querier is designated by the upper-layer protocol,
such as PIM. In IGMPv2 and IGMPv3, querier election can be implemented only among multicast devices
that run the same IGMP version on a network segment.
IGMP Implementation
IGMP enables a multicast router to identify receivers by sending IGMP Query messages to
hosts and receiving IGMP Report messages and Leave messages from hosts. A multicast
router forwards multicast data to a network segment only if the network segment has
multicast group members. Hosts can decide whether to join or leave a multicast group.
As shown in Figure 1-916, IGMP-enabled Device A functions as a querier to periodically send
IGMP Query messages. All hosts (Host A, Host B, and Host C) on the same network segment
of Device A can receive these IGMP Query messages.
When a host (for example, Host A) receives an IGMP Query message of a multicast
group G, the processing flow is as follows:
− If Host A is already a member of group G, Host A replies with an IGMP Report
message of group G at a random time within the response period specified by
Device A.
After receiving the IGMP Report message, Device A records information about
group G and forwards the multicast data to the network segment of the host
interface that is directly connected to Device A. Meanwhile, Device A starts a timer
for group G or resets the timer if it has been started. If no members of group G
respond to Device A within the interval specified by the timer, Device A stops
forwarding the multicast data of group G.
− If Host A is not a member of any multicast group, Host A does not respond to the
IGMP Query message from Device A.
When a host (for example, Host A) joins a multicast group G, the processing flow is as
follows:
Host A sends an IGMP Report message of group G to Device A, instructing Device A to
update its multicast group information. Subsequent IGMP Report messages of group G
are triggered by IGMP Query messages sent by Device A.
When a host (for example, Host A) leaves a multicast group G, the processing flow is as
follows:
Host A sends an IGMP Leave message of group G to Device A. After receiving the
IGMP Leave message, Device A triggers a query to check whether group G has other
receivers. If Device A does not receive IGMP Report messages of group G within the
period specified by the query message, Device A deletes the information about group G
and stops forwarding multicast traffic of group G.
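The query/report/leave flow above can be modeled as a minimal per-group timer on the querier. Timer values and message handling are greatly simplified here; this is not an RFC-complete IGMP implementation, and the class and constant names are invented:

```python
class IgmpRouterState:
    """Minimal sketch of per-group state on an IGMP querier."""

    GROUP_MEMBERSHIP_TIMEOUT = 260  # seconds, illustrative value only

    def __init__(self):
        self.groups = {}  # group address -> remaining lifetime in seconds

    def on_report(self, group):
        # A Report (re)starts the group timer; the router forwards traffic
        # for this group onto the attached segment while the timer runs.
        self.groups[group] = self.GROUP_MEMBERSHIP_TIMEOUT

    def on_timer_tick(self, elapsed):
        # If no member responds to queries before the timer expires,
        # the router stops forwarding the group's traffic.
        for g in list(self.groups):
            self.groups[g] -= elapsed
            if self.groups[g] <= 0:
                del self.groups[g]

    def forwards(self, group):
        return group in self.groups
```

A Leave message would trigger an immediate group-specific query and shorten the timer rather than delete the state outright; that refinement is omitted for brevity.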
IGMP Characteristic
Version
segments. In IGMPv3, source information in multicast group records can be
filtered in either include mode or exclude mode:
− In include mode:
− If a source is included in a group record and the source is active, the
router forwards the multicast data of the source.
− If a source is included in a group record but the source is inactive, the
router deletes the source information and does not forward the multicast
data of the source.
− In exclude mode:
− If a source is active, the router forwards the multicast data of the source,
because there are hosts that require the multicast data of the source.
− If a source is inactive, the router does not forward the multicast data of
the source.
− If a source is excluded in a group record, the router forwards the
multicast data of the source.
IGMPv3 does not have the Report message suppression mechanism.
Therefore, all hosts joining a multicast group must reply with IGMP Report
messages when receiving IGMP Query messages.
In IGMPv3, multicast sources can be selected. Therefore, besides the
common query and multicast group query, an IGMPv3-enabled device adds
the designated multicast source and group query, enabling the router to find
whether receivers require data from a specified multicast source.
Advantages of IGMPv2 over IGMPv1:
IGMPv2 provides IGMP Leave messages, and thus IGMPv2 can manage
members of multicast groups effectively.
The multicast group can be selected directly, and thus the selection is more
precise.
Advantages of IGMPv3 over IGMPv2:
IGMPv3 allows hosts to select multicast sources, while IGMPv2 does not.
An IGMPv3 message contains records of multiple multicast groups, and
thus the number of IGMP messages is reduced on the network segment.
Group-policy
Group-policy is configured on router interfaces to allow the router to set restrictions on
specific multicast groups, so that entries will not be created for the restricted multicast
groups. This improves IGMP security.
IGMP-Limit
When a large number of multicast users request multiple programs simultaneously, bandwidth
resources may be exhausted and the router's performance degraded, deteriorating the
multicast service quality.
To prevent this problem, configure IGMP-limit on a router interface to limit the maximum
number of IGMP entries on the interface. When receiving an IGMP Join message from a user,
the router interface first checks whether the configured maximum number of IGMP entries is
reached. If the maximum number is reached, the router interface discards the IGMP Join
message and rejects the user. If the maximum number is not reached, the router interface sets
up an IGMP membership and forwards data flows of the requested multicast group to the user.
This mechanism enables users who have successfully joined multicast groups to enjoy
smoother multicast services.
For example, on the network shown in Figure 1-917, if the maximum number of IGMP entries
is set to 1 on Interface 1 of router A, Interface 1 allows only one host to join a multicast group
and creates an IGMP entry only for the permitted host.
The working principles of IGMP-limit are as follows:
IGMP-limit allows you to configure a maximum number of IGMP entries on a router
interface. After receiving an IGMP Join message, a router interface determines whether
to create an entry by checking whether the number of IGMP entries has reached the
upper limit on the interface.
IGMP-limit allows you to configure an ACL on a router interface, so that the interface
permits IGMP Join messages containing a group address, including a source-group
address, in the range specified in the ACL, irrespective of whether the configured
maximum number of IGMP entries is reached. An IGMP entry that contains a group
address in the range specified in the ACL is not counted as one entry on an interface.
The principles of counting the number of IGMP entries are as follows:
Each (*, G) entry is counted as one entry on an interface, and each (S, G) is counted as
one entry on an interface.
Source-specific multicast (SSM) mapping (*, G) entries are not counted as entries on an
interface, and each (S, G) entry mapped using the SSM-mapping mechanism is counted
as one entry on an interface.
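The admission and counting rules above can be sketched as follows. The function and parameter names are invented; the exempt_fn argument models the optional ACL whose matching entries are admitted without being counted:

```python
def admit_igmp_join(entry, current_count, limit, exempt_fn=None):
    """Decide whether an interface accepts a new IGMP join.

    entry is a ("*", G) or (S, G) tuple; each counted entry adds one to
    the interface's entry count. Returns (admitted, new_count)."""
    if exempt_fn and exempt_fn(entry):
        return True, current_count        # ACL-exempt: admitted, not counted
    if current_count >= limit:
        return False, current_count       # limit reached: discard the join
    return True, current_count + 1        # admitted and counted
```

With a limit of 1, the first join is admitted and counted, a second join is rejected, and an ACL-exempt join is still admitted without increasing the count.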
Figure 1-918 Source address-based filtering for IGMP Report or Leave messages
On the network shown in Figure 1-919, Device A is a querier that receives IGMP Report or
Leave messages from hosts. If Device B constructs bogus IGMP Query messages that contain
a source address lower than Device A's address, such as 10.0.0.1/24, Device A will become a
non-querier and fail to respond to IGMP Leave messages from hosts, so Device A continues to
forward multicast traffic to user hosts who have left, which wastes network resources. To
resolve this problem, you can configure an ACL rule on Device A to drop IGMP Query
messages with the source address 10.0.0.1/24.
Group-Policy
Group-policy is a filtering policy configured on router interfaces. For example, on the
network shown in Figure 1-920, Host A and Host C request to join the multicast group
225.1.1.1. Host B and Host D request to join the multicast group 226.1.1.1. Group-policy is
configured on router A to permit join requests only for the multicast group 225.1.1.1. Then,
router A creates entries for Host A and Host C, but not for Host B or Host D.
To improve network security and facilitate network management, you can use group-policy to
disable a router interface from receiving IGMP Report messages from or forwarding multicast
data to specific multicast groups.
Group-policy is implemented through access control list (ACL) configurations.
Background
IGMPv3 supports source-specific multicast (SSM) but IGMPv1 and IGMPv2 do not.
Although the majority of the latest multicast devices support IGMPv3, most legacy multicast
terminals support only IGMPv1 or IGMPv2. SSM mapping is a transition solution that
provides SSM services for such legacy multicast terminals.
Using rules that specify the mapping from a particular multicast group G to a source-specific
group, SSM mapping can convert IGMPv1 or IGMPv2 packets whose group addresses are
within the SSM range to IGMPv3 packets. This mechanism allows hosts running IGMPv1 or
IGMPv2 to access SSM services. SSM mapping allows IGMPv1 or IGMPv2 terminals to
access only specific sources, thus minimizing the risks of attacks on multicast sources.
For multicast groups in the SSM address range, a multicast device processes only (S, G) join
requests and does not process (*, G) requests. For details about SSM, see 1.11.4.2.2 PIM-SSM.
Implementation
As shown in Figure 1-922, on the user network segment of the SSM network, Host A runs
IGMPv3, Host B runs IGMPv2, and Host C runs IGMPv1. To enable the SSM network to
provide SSM services for all of the hosts without upgrading the IGMP versions to IGMPv3,
configure SSM mapping on the multicast device.
If Device A has SSM mapping enabled and is configured with mappings between group
addresses and source addresses, it will perform the following actions after receiving a (*, G)
message from Host B or Host C:
If the multicast group address contained in the message is within the any-source
multicast (ASM) range, Device A processes the request as described in 1.11.3.2.1
Principles of IGMP.
If the multicast group address contained in the message is within the SSM range, Device
A maps a (*, G) join message to multiple (S, G) join messages based on mapping rules.
With this processing, hosts running IGMPv1 or IGMPv2 can access multicast services
available only in the SSM range.
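The conversion described above can be sketched as follows (the mapping rules, addresses, and function names are illustrative examples, not product configuration):

```python
import ipaddress

# Illustrative SSM mapping rules: group range -> mapped source addresses.
SSM_RANGE = ipaddress.ip_network("232.0.0.0/8")
MAPPING_RULES = {
    ipaddress.ip_network("232.1.1.0/24"): ["10.1.1.1", "10.1.1.2"],
}

def map_star_g(group):
    """Convert a (*, G) join from an IGMPv1/v2 host into (S, G) joins.

    A group outside the SSM range is returned unchanged as ("*", G),
    to be handled by the normal ASM processing described earlier; a
    group inside the SSM range is expanded using the mapping rules.
    """
    g = ipaddress.ip_address(group)
    if g not in SSM_RANGE:
        return [("*", group)]
    joins = []
    for net, sources in MAPPING_RULES.items():
        if g in net:
            joins.extend((s, group) for s in sources)
    return joins
```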
Background
After IGMP is configured on hosts and the hosts' directly connected multicast device, the
hosts can dynamically join multicast groups, and the multicast device can manage multicast
group members on the local network.
In some cases, however, the device directly connected to a multicast device may not be a host
but an IGMP proxy-capable access device to which hosts are connected. If you configure only
IGMP on the multicast device, access device, and hosts, the multicast and access devices need
to exchange a large number of packets.
To resolve this problem, enable IGMP on-demand on the multicast device. The multicast
device sends only one general query message to the access device. After receiving the general
query message, the access device sends the collected Join and Leave status of multicast
groups to the multicast device. The multicast device uses the Join and Leave status of the
multicast groups to maintain multicast group memberships on the local network segment.
Benefits
IGMP on-demand reduces packet exchanges between a multicast device and its connected
access device and reduces the loads on these devices.
Related Concepts
IGMP on-demand enables a multicast device to send only one IGMP general query message
to its connected access device (IGMP proxy-capable) and to use Join/Leave status of multicast
groups reported by its connected access device to maintain IGMP group memberships.
Implementation
When a multicast device is directly connected to hosts, the multicast device sends IGMP
Query messages to and receives IGMP Report and Leave messages from the hosts to identify
the multicast groups that have receivers. The device directly connected to the multicast device,
however, may not be a host but an IGMP proxy-capable access device, as shown in Figure
1-923. After IGMP on-demand is enabled, the access device (CE) sends IGMP messages to the
multicast device (PE) only when the Join or Leave status of a group changes. To be specific,
the CE sends an IGMP Report message for a multicast group to the PE only when the first user
joins the multicast group and sends a Leave message only when the last user leaves the
multicast group.
After you enable IGMP on-demand on a multicast device connected to an IGMP proxy-capable access
device, the multicast device implements IGMP differently from standard IGMP in the following
aspects:
The multicast device interface connected to the access device sends only one IGMP general query
message to the access device.
The records about dynamically joined IGMP groups on the multicast device interface connected to
the access device do not time out.
The multicast device interface connected to the access device directly deletes the entry for a group
only after the multicast device interface receives an IGMP Leave message for the group.
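The first-join/last-leave reporting rule that IGMP on-demand relies on can be sketched as follows (a minimal illustration; the class and method names are hypothetical):

```python
class ProxyGroupState:
    """Sketch of the on-demand reporting rule described above: the access
    device reports a multicast group only when its first user joins and
    sends a Leave only when its last user leaves, so the multicast device
    sees one state change instead of per-host messages."""

    def __init__(self):
        self.users = set()

    def join(self, user):
        first = not self.users
        self.users.add(user)
        return "report" if first else None    # Report only for the first user

    def leave(self, user):
        self.users.discard(user)
        return "leave" if not self.users else None  # Leave only for the last user
```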
IGMP IPsec
IGMP IPsec is used to authenticate IGMP packets to prevent bogus IGMP protocol packet
attacks, improving multicast service security. IGMP IPsec applies to multicast devices
connected to user hosts.
IGMP IPsec uses security associations (SAs) to authenticate sent and received IGMP
packets. The IGMP IPsec implementation process is as follows:
Before an interface sends out an IGMP protocol packet, IPsec adds an AH header to the
packet.
After an interface receives an IGMP protocol packet, IPsec uses an SA to authenticate the
AH header in the packet. If the AH header is authenticated, the interface forwards the
packet. Otherwise, the interface discards the packet.
NOTE
For IPsec feature description, see 1.16.11 IPsec.
1.11.4 PIM
1.11.4.1 Introduction
Purpose
A multicast network requires multicast protocols to replicate and forward multicast data. PIM
is a widely used intra-domain multicast protocol that builds MDTs to transmit multicast data
between routers in the same domain.
PIM can create multicast routing entries on demand, forward packets based on multicast
routing entries, and dynamically respond to network topology changes.
Definition
If IPv4 PIM and IPv6 PIM implement a feature in the same way, details are not provided in this chapter.
For details about implementation differences, see 1.11.4.4 Appendix.
PIM is a multicast routing protocol that uses unicast routing protocols to forward data, but
PIM is independent of any specific unicast routing protocols.
PIM has two implementation modes: PIM-SM and PIM-SSM. These modes apply to both
IPv4 and IPv6 networks.
Benefits
PIM works together with other multicast protocols to implement applications, such as:
Multimedia and media streaming applications
Training and tele-learning communication
Data storage and financial management applications
IP multicast is being widely used in Internet services, such as online broadcasts, network TV,
e-learning, telemedicine, network TV stations, and real-time video/voice conferencing
services.
1.11.4.2 Principles
1.11.4.2.1 PIM-SM
PIM-SM implements P2MP data transmission on large-scale networks on which multicast
data receivers are sparsely distributed. PIM-SM forwards multicast data only to network
segments with active receivers that have requested the data.
PIM-SM assumes that no host wants to receive multicast data, so PIM-SM sets up an MDT
only after a host requests multicast data and then sends the data to the host along the MDT.
Concepts
This section provides basic PIM-SM concepts. Figure 1-925 shows a typical PIM-SM
network.
PIM device
A router that runs PIM is called a PIM device. A router interface on which PIM is
enabled is called a PIM interface.
PIM domain
A network constructed by PIM devices is called a PIM network.
A PIM-SM network can be divided into multiple PIM-SM domains by configuring BSR
boundaries on router interfaces to restrict BSR message transmission. PIM-SM domains
isolate multicast traffic between domains and facilitate network management.
Designated router
A designated router (DR) can be a multicast source's DR or a receiver's DR.
− A multicast source's DR is a PIM device directly connected to a multicast source
and is responsible for sending Register messages to an RP.
− A receiver's DR is a PIM device directly connected to receiver's hosts and is
responsible for sending Join messages to an RP and forwarding multicast data to
receiver's hosts.
RP
An RP is the forwarding core in a PIM-SM domain, used to process join requests of the
receiver's DR and registration requests of the multicast source's DR. An RP constructs an
MDT with the RP at the root and creates (S, G) entries to transmit multicast data to hosts.
All routers in the PIM-SM domain need to know the RP's location. The following table
lists the types of RPs.
BSR
A BSR on a PIM-SM network collects RP information, summarizes that information into
an RP-Set (group-RP mapping database), and advertises the RP-Set to the entire
PIM-SM network.
A network can have only one BSR but can have multiple C-BSRs. If a BSR fails, a new
BSR is elected from the C-BSRs.
RPT
An RPT is an MDT with an RP at the root and group members at the leaves.
SPT
An SPT is an MDT with the multicast source at the root and group members at the
leaves.
Implementation
The multicast data forwarding process in a PIM-SM domain is as follows:
1. Neighbor discovery
Each PIM device in a PIM-SM domain periodically sends Hello messages to all other
PIM devices in the domain to discover PIM neighbors and maintain PIM neighbor
relationships.
By default, a PIM device permits other PIM control messages or multicast messages from a neighbor,
irrespective of whether the PIM device has received Hello messages from the neighbor. However, if a
PIM device has the neighbor check function, the PIM device permits other PIM control messages or
multicast messages from a neighbor only after the PIM device has received Hello messages from the
neighbor.
2. DR election
PIM devices exchange Hello messages to elect a DR on a shared network segment. The
receiver's DR is the only multicast data forwarder on a shared network segment. The
source's DR is responsible for forwarding multicast data received from the multicast
source along an MDT.
3. RP discovery
An RP is the forwarding core in a PIM-SM domain. A dynamic or static RP forwards
multicast data over the entire network.
4. RPT setup
PIM-SM assumes that no hosts want to receive multicast data, so PIM-SM sets up an
RPT only after a host requests multicast data, and then sends the data from the RP to the
host along the RPT.
5. SPT switchover
A multicast group in a PIM-SM domain is associated with only one RP and one RPT. All
multicast data packets are forwarded by the RP. The path along which the RP forwards
multicast data may not be the shortest path from the multicast source to receivers. The
load of the RP increases when the multicast traffic volume increases. If the multicast data
forwarding rate exceeds a configured threshold, an RPT-to-SPT switchover can be
implemented to reduce the burden on the RP.
If a network problem occurs, the Assert mechanism or a DR switchover delay can be used to
guarantee that multicast data is transmitted properly.
Assert
If multiple multicast data forwarders exist on a network segment, each multicast packet
is repeatedly sent across the network segment, generating redundant multicast data. To
resolve this issue, the Assert mechanism can be used to select a unique multicast data
forwarder on a network segment.
DR switchover delay
If the role of an interface on a PIM device is changed from DR to non-DR, the PIM
device immediately stops using this interface to forward data. If multicast data sent from
a new DR does not arrive, multicast data traffic is temporarily interrupted. If a DR
switchover delay is configured, the interface continues to forward multicast data until the
delay expires. Setting a DR switchover delay prevents multicast data traffic from being
interrupted.
The detailed PIM-SM implementation process is as follows:
Neighbor Discovery
Each PIM-enabled interface on a PIM device sends Hello messages. A multicast packet that
carries a Hello message has the following features:
The destination address is 224.0.0.13, indicating that this packet is destined for all PIM
devices on the same network segment as the interface that sends this packet.
The source address is an interface address.
The TTL value is 1, indicating that the packet is sent only to neighbor interfaces.
Hello messages are used to discover neighbors, adjust protocol parameters, and maintain
neighbor relationships.
Discovering PIM neighbors
All PIM devices on the same network segment must receive multicast packets with the
destination address 224.0.0.13. Directly connected multicast routers can then learn
neighbor information from the received Hello messages.
A router can receive PIM control messages or multicast packets from a neighbor only
after the router receives a Hello message from the neighbor. PIM control messages and
multicast packets are used for creating multicast routing entries and maintaining MDTs.
Adjusting protocol parameters
A Hello message carries the following protocol parameters:
− DR_Priority: priority used by each router to elect a DR. The higher a router's
priority is, the higher the probability that the router will be elected as the DR.
− Holdtime: timeout period during which the neighbor remains in the reachable state.
− LAN_Delay: delay for transmitting a Prune message on the shared network
segment.
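The Hello parameters above are carried as TLV-encoded options (option types per RFC 4601: 1 for Holdtime, 2 for LAN Prune Delay, 19 for DR Priority). A minimal encoding sketch with example values, not an exact reproduction of any device's packet builder:

```python
import struct

def hello_options(holdtime=105, dr_priority=1, lan_delay_ms=500, override_ms=2500):
    """Encode the three Hello options discussed above as TLVs.

    Each option is type (2 bytes) | length (2 bytes) | value, in network
    byte order. The default values here are illustrative examples.
    """
    opts = struct.pack("!HHH", 1, 2, holdtime)                     # Holdtime
    opts += struct.pack("!HHHH", 2, 4, lan_delay_ms, override_ms)  # LAN Prune Delay
    opts += struct.pack("!HHI", 19, 4, dr_priority)                # DR Priority
    return opts
```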
DR Election
The network segment on which a multicast source or group members reside is usually
connected to multiple PIM devices, as shown in Figure 1-926. The PIM devices exchange
Hello messages to set up PIM neighbor relationships. A Hello message carries the DR priority
and the address of the interface that connects the PIM device to this network segment. The
router compares the local information with the information carried in the Hello messages sent
by other PIM devices to elect a DR. This process is a DR election. The election rules are as
follows:
The PIM router with the highest DR priority wins.
If PIM devices have the same DR priority or PIM devices that do not support Hello
messages carrying DR priorities exist on the network segment, the PIM device with the
highest IP address wins.
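The election rules above can be sketched as a single comparison (illustrative only; `None` stands for a device whose Hello messages do not carry a DR priority):

```python
import ipaddress

def elect_dr(neighbors):
    """Elect the DR on a shared segment per the rules above.

    `neighbors` is a list of (dr_priority, ip_address) tuples. If any
    device omits the priority, only IP addresses are compared; otherwise
    the highest priority wins, with the highest IP address as tiebreaker.
    """
    if any(prio is None for prio, _ in neighbors):
        return max(neighbors, key=lambda n: int(ipaddress.ip_address(n[1])))
    return max(neighbors, key=lambda n: (n[0], int(ipaddress.ip_address(n[1]))))
```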
RP Discovery
Static RP
c. The BSR collects the received information as an RP-Set, encapsulates the RP-Set
information in a Bootstrap message, and advertises the Bootstrap message to all
PIM-SM devices.
d. Each router uses the RP-Set information to perform the same calculations and
comparisons to elect an RP from multiple C-RPs. The election rules are as follows:
i. The C-RP whose served group address range has the longest mask matching the group address that users join wins.
ii. If group addresses that users join and are served by C-RPs have the same mask
length, the priorities of the C-RPs are compared. The C-RP with the highest
priority wins (the greater the priority value, the lower the priority).
iii. If the C-RPs have the same priority, the hash function is used. The C-RP with
the greatest calculated hash value wins.
iv. If none of the above criteria can determine a winner, the C-RP with the highest
address wins.
e. Because all routers use the same RP-Set and the same election rules, the
relationship between the multicast group and the RP is the same for all routers. The
routers save this relationship to guide subsequent multicast operations.
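The election rules in step d can be sketched as follows, using the RP hash function defined in RFC 4601 (the function names and tuple layout are illustrative; a lower priority value is preferred, as noted above):

```python
import ipaddress

def rp_hash(group, mask_len, c_rp):
    """RP hash function from RFC 4601 section 4.7.2 (IPv4)."""
    m = (0xFFFFFFFF << (32 - mask_len)) & 0xFFFFFFFF
    g = int(ipaddress.IPv4Address(group)) & m
    c = int(ipaddress.IPv4Address(c_rp))
    return (1103515245 * ((1103515245 * g + 12345) ^ c) + 12345) % (1 << 31)

def elect_rp(group, c_rps, hash_mask_len=30):
    """Pick the RP for `group` from the RP-Set per the rules above.

    Each C-RP is (address, served_prefix, priority).
    """
    g = ipaddress.IPv4Address(group)
    matches = [c for c in c_rps if g in ipaddress.ip_network(c[1])]
    return max(
        matches,
        key=lambda c: (
            ipaddress.ip_network(c[1]).prefixlen,   # longest mask wins
            -c[2],                                  # lowest priority value wins
            rp_hash(group, hash_mask_len, c[0]),    # greatest hash value wins
            int(ipaddress.IPv4Address(c[0])),       # highest address wins
        ),
    )
```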
If a router needs to interwork with an auto-RP-capable device, enable auto-RP listening.
After auto-RP listening is enabled, the router can receive auto-RP announcement and
discovery messages, parse the messages to obtain source addresses, and perform RPF
checks based on the source addresses.
− If an RPF check fails, the router discards the auto-RP message.
− If an RPF check succeeds, the router forwards the auto-RP message to PIM
neighbors. The auto-RP message carries the multicast group address range served
by the RP to guide subsequent multicast operations.
Auto-RP listening is supported only in IPv4 scenarios.
Embedded-RP
Embedded-RP is a mode used by the router in the ASM model to obtain an RP address.
This mode applies only within an IPv6 PIM-SM domain or between IPv6 PIM-SM
domains. To ensure consistent RP election results, an RP obtained in embedded-RP mode
takes precedence over RPs elected using other mechanisms. The address of an RP
obtained in embedded-RP mode must be embedded in an IPv6 multicast group address,
which must meet both of the following conditions:
− The address must be in the range of IPv6 multicast addresses.
− The address must not be within the SSM group address range.
After a router calculates the RP address from the IPv6 multicast group address, the router
uses the RP address to discover a route for forwarding multicast packets. The process for
calculating the RP address is as follows:
a. The router copies the first N bits of the network prefix in the IPv6 multicast group
address. Here, N is specified by the plen field.
b. The router replaces the last four bits with the contents of the RIID field. An RP
address is then obtained.
Figure 1-928 shows the mapping between the IPv6 multicast group address and RP
address.
Figure 1-928 Mapping between the IPv6 multicast group address and RP address
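The two-step calculation above can be sketched as follows (field layout per RFC 3956; the example group address in the test is illustrative):

```python
import ipaddress

def embedded_rp(group):
    """Derive the RP address embedded in an IPv6 multicast group address.

    Per the embedded-RP layout (RFC 3956), the group address carries a
    4-bit RIID field, an 8-bit plen field, and a 64-bit network prefix.
    The RP address keeps the first plen bits of the prefix and sets the
    last four bits to RIID.
    """
    g = int(ipaddress.IPv6Address(group))
    riid = (g >> 104) & 0xF                  # 4-bit RP interface ID
    plen = (g >> 96) & 0xFF                  # prefix length in bits
    prefix = (g >> 32) & ((1 << 64) - 1)     # 64-bit network prefix field
    kept = (prefix >> (64 - plen)) << (64 - plen)  # keep first plen bits
    rp = (kept << 64) | riid                 # low 4 bits become RIID
    return str(ipaddress.IPv6Address(rp))
```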
Anycast-RP
In a traditional PIM-SM domain, each multicast group is mapped to only one RP. When
the network is overloaded or traffic is heavy, many network problems can occur. For
example, if the RP is overloaded, routes will converge slowly, or the multicast
forwarding path will not be optimal.
Anycast-RP can be used to address these problems. Currently, Anycast-RP can be
implemented through MSDP or PIM:
− Through MSDP: Multiple RPs with the same address are configured in a PIM-SM
domain and MSDP peer relationships are set up between the RPs to share multicast
data sources.
This mode is only for use on IPv4 networks. For details about the implementation
principles, see 1.11.5.2.3 Anycast-RP in MSDP.
− Multiple RPs with the same address are configured in a PIM-SM domain and the
device where an RP resides is configured with a unique local address to identify the
RP. These local addresses are used to set up connectionless peer relationships
between the devices. The peers share multicast source information by exchanging
Register messages.
This mode is for use on both IPv4 and IPv6 networks.
These two modes cannot be both configured on the same device in a PIM-SM domain. If Anycast-RP is
implemented through PIM, you can also configure the device to advertise the source information
obtained from MSDP peers in another domain to peers in the local domain.
Receivers and the multicast source each select the RPs closest to their own location to create
RPTs. After receiving multicast data, the receiver's DR determines whether to trigger an SPT
switchover. Using Anycast-RP is an implementation strategy that facilitates optimal RPTs and
load sharing. The following section covers the principles of Anycast-RP in PIM.
In the PIM-SM domain shown in Figure 1-929, multicast sources S1 and S2 send multicast
data to multicast group G, and U1 and U2 are members of group G. Perform the following
operations to apply the PIM protocol to implement Anycast-RP in the PIM-SM domain:
Configure RP1 and RP2 and assign both the same IP address (address of a loopback
interface). Assume that the IP address is 10.10.10.10.
Set up a connectionless peer relationship between RP1 and RP2 using unique IP
addresses. Assume that the IP address of RP1 is 1.1.1.1 and the IP address of RP2 is
2.2.2.2.
The implementation of Anycast-RP in PIM is as follows:
1. The receiver sends a Join message to the closest RP and builds an RPT.
− U1 joins the RPT with RP1 at the root, and RP1 creates a (*, G) entry.
− U2 joins the RPT with RP2 at the root, and RP2 creates a (*, G) entry.
2. The multicast source sends a Register message to the closest RP.
− DR1 sends a Register message to RP1 and RP1 creates an (S1, G) entry. Multicast
data from S1 reaches U1 along the RPT.
− DR2 sends a Register message to RP2 and RP2 creates an (S2, G) entry. Multicast
data from S2 reaches U2 along the RPT.
3. After receiving Register messages from the source's DRs, RPs re-encapsulate the
Register messages and forward them to peers to share multicast source information.
− After receiving the (S1, G) Register message from DR1, RP1 replaces the source
and destination addresses with 1.1.1.1 and 2.2.2.2, respectively, and re-encapsulates
the message and sends it to RP2. Upon receiving the specially encapsulated
Register message from peer 1.1.1.1, RP2 processes this Register message without
forwarding it to other peers.
− After receiving the (S2, G) Register message from DR2, RP2 replaces the source
and destination addresses with 2.2.2.2 and 1.1.1.1, respectively, and re-encapsulates
the message and sends it to RP1. Upon receiving the specially encapsulated
Register message from peer 2.2.2.2, RP1 processes this Register message without
forwarding it to other peers.
4. The RP joins an SPT with the source's DR as the root to obtain multicast data.
− RP1 sends a Join message to S2. Multicast data from S2 first reaches RP1 along the
SPT and then reaches U1 along the RPT.
− RP2 sends a Join message to S1. Multicast data from S1 reaches RP2 first through
the SPT and then reaches U2 through the RPT.
5. After receiving multicast data, the receiver's DR determines whether to trigger an SPT
switchover.
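The Register handling in step 3 can be sketched as follows (a minimal illustration with hypothetical names):

```python
def handle_register(msg, local_rp_addr, peers):
    """Sketch of Anycast-RP Register handling (step 3 above).

    A Register received from a source's DR is re-encapsulated and
    forwarded to every configured peer, with the local unique address as
    the new source. A Register received from a peer is processed locally
    but never forwarded again, which prevents forwarding loops.

    `msg` is (source_address, inner_register); returns the list of
    (src, dst, inner_register) messages to send.
    """
    src, inner = msg
    if src in peers:
        return []                     # from a peer: consume, do not forward
    return [(local_rp_addr, peer, inner) for peer in peers]
```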
RPT Setup
Figure 1-930 shows the RPT setup and data forwarding processes.
To reduce the RPT forwarding loads and improve multicast data forwarding efficiency, PIM-SM
supports SPT switchovers, allowing a multicast network to set up an SPT with the multicast source at the
root. Then, the multicast source can send multicast data directly to receivers along the SPT.
SPT Switchover
In a PIM-SM domain, a multicast group interacts with only one RP, and only one RPT is set
up. If SPT switchover is not enabled, all multicast packets must be encapsulated in Register
messages and then sent to the RP. After receiving the packets, the RP de-encapsulates them
and forwards them along the RPT.
Since all multicast packets forwarded along the RPT are transferred by the RP, the RP may be
overloaded when multicast traffic is heavy. To resolve this problem, PIM-SM allows the RP or
the receiver's DR to trigger an SPT switchover.
Assert
Either of the following conditions indicates other multicast forwarders are present on the
network segment:
A multicast packet fails the RPF check.
The interface that receives the multicast packet is a downstream interface in the (S, G)
entry on the local router.
If other multicast forwarders are present on the network segment, the router starts the Assert
mechanism.
The router sends an Assert message through the downstream interface. The downstream
interface also receives an Assert message from a different multicast forwarder on the network
segment. The destination address of the multicast packet in which the Assert message is
encapsulated is 224.0.0.13. The source address of the packet is the downstream interface
address. The TTL value of the packet is 1. The Assert message carries the route cost from the
PIM device to the source or RP, priority of the used unicast routing protocol, and the group
address.
The router compares its information with the information contained in the message sent by its
neighbor. This is called Assert election. The election rules are as follows:
1. The router that runs a higher priority unicast routing protocol wins.
2. If the routers have the same unicast routing protocol priority, the router with the smaller
route cost to the source wins.
3. If the routers have the same priority and route cost, the router with the highest IP address
for the downstream interface wins.
The router performs the following operations based on the Assert election result:
If the router wins the election, the downstream interface of the router is responsible for
forwarding multicast packets on the network segment. The downstream interface is
called an Assert winner.
If the router does not win the election, the downstream interface is prohibited from
forwarding multicast packets and is deleted from the downstream interface list of the (S,
G) entry. The downstream interface is called an Assert loser.
After Assert election is complete, only one upstream router that has a downstream interface
exists on the network segment, and the downstream interface transmits only one copy of each
multicast packet. The Assert winner then periodically sends Assert messages to maintain its
status as the Assert winner. If the Assert loser does not receive any Assert messages from the
Assert winner after the timer of the Assert loser expires, the loser re-adds downstream
interfaces for multicast data forwarding.
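The three Assert election rules can be sketched as a single comparison (illustrative; each candidate is modeled as a tuple, following the rule ordering stated above):

```python
import ipaddress

def assert_winner(candidates):
    """Assert election per the rules above: higher unicast routing
    protocol priority wins, then lower route cost to the source, then
    higher downstream interface address.

    Each candidate is (protocol_priority, route_cost, interface_ip).
    """
    return max(
        candidates,
        key=lambda c: (c[0], -c[1], int(ipaddress.ip_address(c[2]))),
    )
```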
DR Switchover Delay
If an existing DR fails, the PIM neighbor relationship times out, and a new DR election is
triggered.
By default, when an interface changes from a DR to a non-DR, the router immediately stops
using the interface to forward data. If multicast data sent from a new DR has not yet arrived at
the interface, multicast data streams are temporarily interrupted.
When a PIM-SM interface that has a PIM DR switchover delay configured receives Hello
messages from a new neighbor and changes from a DR to a non-DR, the interface continues
to function as a DR and to forward multicast packets until the delay times out.
If the router that has a DR switchover delay configured receives packets from a new DR
before the delay expires, the router immediately stops forwarding packets. When a new IGMP
Report message is received on the shared network segment, the new DR (instead of the old
DR configured with a DR switchover delay) sends a PIM Join message to the upstream
device.
If the new DR receives multicast data from the original DR before the DR switchover delay expires, an
Assert election is triggered.
Each BSR administrative domain provides services to the multicast group within a
specific address range. The multicast groups that different BSR administrative domains
serve can overlap. However, a multicast group address that a BSR administrative domain
serves is valid only in its BSR administrative domain because a multicast address is a
private group address. As shown in Figure 1-933, the group address range of BSR1
overlaps with that of BSR3.
A multicast group that does not belong to any BSR administrative domain belongs to the
global domain. That is, the group address range of the global domain is G excluding G1 and G2.
Multicast function
As shown in Figure 1-932, the global domain and each BSR administrative domain have
their respective C-RP and BSR devices. Devices only function in the domain to which
they are assigned. Each BSR administrative domain has a BSR mechanism and RP
elections that are independent of other domains.
Each BSR administrative domain has a border. Multicast information for this domain,
such as the C-RP Advertisement messages and BSR Bootstrap message, can be
transmitted only within the domain. Multicast information for the global domain can be
transmitted throughout the entire global domain and can traverse any BSR administrative
domain.
1.11.4.2.2 PIM-SSM
Protocol Independent Multicast-Source-Specific Multicast (PIM-SSM) enables a user host to
rapidly join a multicast group if the user knows a multicast source address. PIM-SSM sets up
a shortest path tree (SPT) from a multicast source to a multicast group, while PIM-SM uses
rendezvous points (RPs) to set up rendezvous point trees (RPTs). Therefore, PIM-SSM
implements a faster join process than PIM-SM.
Different from the any-source multicast (ASM) model, the SSM model does not need to
maintain an RP, construct an RPT, or register a multicast source.
The SSM model is based on PIM-SM and IGMPv3/Multicast Listener Discovery version 2
(MLDv2). The procedure for setting up a multicast forwarding tree on a PIM-SSM network is
similar to the procedure for setting up an SPT on a PIM-SM network. The receiver's DR,
which knows the multicast source address, sends Join messages directly to the source so that
multicast data streams can be sent to the receiver's designated router (DR).
In SSM mode, multicast traffic forwarding is based on (S, G) channels. To receive the multicast traffic
of a channel, a multicast user must join the channel. A multicast user can join or leave a multicast
channel by subscribing to or unsubscribing from the channel. Currently, only IGMPv3 can be used for
channel subscription or unsubscription.
Related Concepts
PIM-SSM implementation is based on PIM-SM. For details about basic concepts, see Concepts
in 1.11.4.2.1 PIM-SM.
Implementation
The process for forwarding multicast data in a PIM-SSM domain is as follows:
1. Neighbor Discovery
Each PIM device in a PIM-SSM domain periodically sends Hello messages to all other
PIM devices in the domain to discover PIM neighbors and maintain PIM neighbor
relationships.
By default, a PIM device permits other PIM control messages or multicast messages from a neighbor,
irrespective of whether the PIM device has received Hello messages from the neighbor. However, if a
PIM device has the neighbor check function, the PIM device permits other PIM control messages or
multicast messages from a neighbor only after the PIM device has received Hello messages from the
neighbor.
2. DR Election
PIM devices exchange Hello messages to elect a DR on a shared network segment. The
receiver's DR is the only multicast data forwarder on the segment.
3. SPT setup
Users on a PIM-SSM network can know the multicast source address and can, therefore,
specify the source when joining a multicast group. After receiving a Report message
from a user, the receiver's DR sends a Join message towards the multicast source to
establish an SPT between the source and the user. Multicast data is then sent by the
multicast source to the user along the SPT.
SPT establishment can be triggered by user join requests (both dynamic and static) and
SSM-mapping.
The DR in an SSM scenario is valid only in the shared network segment connected to group
members. The DR on the group member side sends Join messages to the multicast source, creates
the (S, G) entry hop by hop, and then sets up an SPT.
PIM-SSM supports PIM silent, BFD for PIM, and a PIM DR switchover delay.
Currently, BFD for PIM can be used on both IPv4 PIM-SM/Source-Specific Multicast (SSM) and IPv6
PIM-SM/SSM networks.
As shown in Figure 1-934, on the shared network segment where user hosts reside, a PIM
BFD session is set up between the downstream interface Port 2 of Device B and the
downstream interface Port 1 of Device C. Both ports send BFD packets to detect the status of
the link between them.
Port 2 of Device B is elected as a DR for forwarding multicast data to the receiver. If Port 2
fails, BFD immediately notifies the RM module of the session status and the RM module then
notifies the PIM module. The PIM module triggers a new DR election. Port 1 of Device C is
then elected as a new DR to forward multicast data to the receiver.
PIM IPsec can authenticate the following types of PIM packets:
PIM multicast protocol packets, such as Hello and Join/Prune packets.
PIM unicast protocol packets, such as Register and Register-Stop packets.
NOTE
For IPsec feature description, see 1.16.11 IPsec.
Background
SPT setup relies on unicast routes. If a link or node failure occurs, a new SPT can be set up
only after unicast routes are converged. This process is time-consuming and may cause severe
multicast traffic loss.
PIM FRR resolves these issues. It allows a device to search for a backup FRR route based on
unicast routing information and send the PIM Join message of a multicast receiver along both
the primary and backup routes, setting both primary and backup SPTs. The cross node of the
primary and backup links can receive one copy of a multicast flow from each of the links.
Each device's forwarding plane permits the multicast traffic on the primary link and discards
that on the backup link. However, the forwarding plane starts permitting multicast traffic on
the backup link as soon as the primary link fails, thus minimizing traffic loss.
PIM FRR supports fast SPT switchovers only in IPv4 PIM-SSM or PIM-SM. In extranet scenarios, PIM
FRR supports only source VPN, not receiver VPN entries.
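The permit/discard rule described above can be sketched as a simple forwarding-plane check. This is an illustrative model, not NE20E software; the interface names are assumptions:

```python
# Illustrative sketch of the PIM FRR forwarding-plane rule: traffic arriving on
# the primary inbound interface is accepted; traffic on the backup inbound
# interface is discarded until the primary link fails.
class FrrEntry:
    def __init__(self, primary_iif, backup_iif):
        self.primary_iif = primary_iif
        self.backup_iif = backup_iif
        self.primary_up = True

    def accept(self, iif):
        if self.primary_up:
            return iif == self.primary_iif
        return iif == self.backup_iif

entry = FrrEntry("GE0/1", "GE0/2")
assert entry.accept("GE0/1") and not entry.accept("GE0/2")
entry.primary_up = False        # primary link failure detected
assert entry.accept("GE0/2")    # the backup flow is now permitted
```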
Implementation
PIM FRR implementation involves three steps:
1. Setup of primary and backup SPTs for a multicast receiver
Each PIM-SM/PIM-SSM device adds the inbound interface information to the (S, G)
entry of the receiver, and then searches for a backup FRR route based on unicast routing
information. After a backup FRR route is discovered, each device adds the backup
route's inbound interface information to the (S, G) entry so that two routes become
available from the source to the multicast group requested by the receiver. Each device
then sends a PIM Join message along both the primary and backup routes to set up two
SPTs. Figure 1-935 shows the process of setting up two SPTs for a multicast receiver.
Figure 1-935 Setup of primary and backup SPTs for a multicast receiver
Table 1-271 PIM FRR implementation before and after a link or node failure occurs
Remote primary In Figure 1-940, Device A permits the In Figure 1-941, Device
link multicast traffic on the primary link and A starts permitting
discards that on the backup link. multicast traffic on the
backup link (Device C ->
Figure 1-940 PIM FRR implementation Device D -> Device A)
before a remote primary link failure as soon as Device A
occurs detects the remote
primary link failure.
3. Traffic switchback
After the link or node failure is resolved, PIM detects a route change at the protocol layer,
starts route switchback, and then smoothly switches traffic back to the primary link.
PIM FRR in Scenarios Where IGP FRR Cannot Fulfill Backup Root Computation
Independently
PIM FRR relies on IGP FRR to compute both primary and backup routes. However, on a live
network, backup route computation may fail on some nodes as the number of network nodes
increases. Therefore, if IGP FRR cannot fulfill route computation independently on a
network, deploy IP FRR to work jointly with IGP FRR. The following example uses a ring
network.
On the ring network shown in Figure 1-942, Device C connects to a multicast receiver. The
primary multicast traffic link for this receiver is Device C -> Device B -> Device A. To
compute a backup route for the link Device D -> Device C, IGP FRR requires that the cost of
link Device D -> Device A be less than the cost of link Device C -> Device A plus the cost of
link Device D -> Device C. That is, the cost of link Device D -> Device E -> Device F ->
Device A must be less than the cost of link Device C -> Device A plus the cost of link Device
D -> Device C. This ring network does not meet this requirement; therefore, IGP FRR cannot
compute a backup route for link Device D -> Device C.
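The loop-free condition above can be checked numerically. The costs below are assumed for illustration (they are not given in the figure):

```python
# Illustrative check of the IGP FRR (LFA) condition for backing up the link
# Device D -> Device C: the cost from D to A via E and F must be less than
# the cost from C to A plus the cost of the D -> C link.
def lfa_backup_possible(cost_d_to_a, cost_c_to_a, cost_d_to_c):
    return cost_d_to_a < cost_c_to_a + cost_d_to_c

# Assumed ring costs of 10 per hop: D->E->F->A costs 30, C->B->A costs 20,
# and D->C costs 10. Since 30 < 20 + 10 is false, IGP FRR computes no backup.
print(lfa_backup_possible(30, 20, 10))  # False
```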
To resolve this issue, manually specify a backup route to the multicast source. Configure a
static route whose destination is the multicast source, next hop is Device D, and preference is
lower than that of the IGP route, as follows.
The IGP route is Device C -> Device B -> Device A, which has a higher preference and
functions as the primary link.
The static route is Device C -> Device D -> Device E -> Device F -> Device A, which
has a lower preference and functions as the backup link.
Before a link or node failure occurs, Device C permits the multicast traffic on the primary link
and discards that on the backup link. After a link or node failure occurs, Device C starts
permitting the multicast traffic on the backup link as soon as it detects the failure.
Benefits
PIM FRR helps improve the reliability of multicast services and minimize service loss for
users.
Field Description
Reserved Reserved
Checksum Checksum
Hello Messages
PIM devices periodically send Hello messages through all PIM interfaces to discover
neighbors and maintain neighbor relationships.
In an IP packet that carries a Hello message, the source address is a local interface's address,
the destination address is 224.0.0.13, and the TTL value is 1. The IP packet is transmitted in
multicast mode.
Field Description
Register Messages
When a multicast source becomes active on a PIM-SM network, the source's DR sends a
Register message to register with the rendezvous point (RP).
In an IP packet that carries a Register message, the source address is the address of the
source's DR, and the destination address is the RP's address. The message is transmitted in
unicast mode.
Field Description
Type Message type
The value is 1.
Reserved The field is set to 0 when the message is sent and is ignored
when the message is received.
Checksum Checksum
B Border bit
N Null-Register bit
Reserved2 Reserved
The field is set to 0 when the message is sent and this field is
ignored when the message is received.
Multicast data packet The source's DR encapsulates the received multicast data in a
Register message and sends the message to the RP. After
decapsulating the message, the RP learns the (S, G)
information of the multicast data packet.
A multicast source can send data to multiple groups, and therefore a source's DR must send
Register messages to the RP of each target multicast group. A Register message encapsulates
only one multicast data packet, so it carries only one copy of (S, G) information.
In the register suppression period, a source's DR sends Null-Register messages to notify the
RP of the source's active state. A Null-Register message contains only an IP header, including
the source address and group address. After the register suppression times out, the source's
DR encapsulates a Register message into a multicast data packet again.
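The register suppression behavior above can be sketched as a small state machine. This is an illustrative model of the described behavior, not the product implementation:

```python
# Illustrative sketch of the source DR's register state: after a Register-Stop,
# the DR suppresses data-encapsulated Register messages and sends Null-Register
# messages to keep the RP informed that the source is active; when the
# suppression period times out, it resumes data-encapsulated Registers.
class SourceDr:
    def __init__(self):
        self.suppressed = False

    def message_to_send(self):
        return "Null-Register" if self.suppressed else "Register(data)"

    def on_register_stop(self):
        self.suppressed = True       # enter register suppression

    def on_suppression_timeout(self):
        self.suppressed = False      # resume data-encapsulated Registers

dr = SourceDr()
assert dr.message_to_send() == "Register(data)"
dr.on_register_stop()
assert dr.message_to_send() == "Null-Register"
dr.on_suppression_timeout()
assert dr.message_to_send() == "Register(data)"
```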
Register-Stop Messages
Field Description
Type Message type
The value is 2.
Group Address Multicast group address
Source Address Multicast source address
An RP can serve multiple groups, and a group can receive data from multiple sources.
Therefore, an RP may simultaneously perform multiple (S, G) registrations.
A Register-Stop message carries only one copy of the (S, G) information. When an RP sends a
Register-Stop message to a source's DR, the RP can terminate only one (S, G) registration.
After receiving the Register-Stop message carrying the (S, G) information, the source's DR
stops encapsulating (S, G) packets. The DR still uses Register messages to encapsulate
packets destined for other groups and sends them to those groups' RPs.
Join/Prune Messages
A Join/Prune message can contain both Join messages and Prune messages. A Join/Prune
message that contains only a Join message is called a Join message. A Join/Prune message
that contains only a Prune message is called a Prune message.
When a PIM device is not required to send data to its downstream interfaces, the PIM
device sends Prune messages through its upstream interfaces to instruct upstream devices
to stop forwarding packets to the network segment on which the PIM device resides.
When a receiver starts to require data from a PIM-SM network, the receiver's DR sends a
Join message through the reverse path forwarding (RPF) interface towards the RP to
instruct the upstream neighbor to forward packets to the receiver. The Join message is
sent in the upstream direction hop by hop to set up an RPT.
When an RP triggers an SPT switchover, the RP sends a Join message through the RPF
interface connected to the source to instruct the upstream neighbor to forward packets to
the network segment. The Join message is sent in the upstream direction hop by hop to
set up an SPT.
When a receiver's DR triggers an SPT switchover, the DR sends a Join message through
the RPF interface connected to the source to instruct the upstream neighbor to forward
packets to the network segment. The Join message is sent in the upstream direction hop
by hop to set up an SPT.
A PIM network segment may be connected to a downstream interface and multiple
upstream interfaces. After an upstream interface sends a Prune message, if other
upstream interfaces still require multicast packets, these interfaces must send Join
messages within the override-interval. Otherwise, the downstream interfaces responsible
for forwarding packets on the network segment do not perform the prune action.
If PIM is enabled on the interfaces of user-side routers, a receiver's DR is elected, and outbound
interfaces are added to the PIM DR's outbound interface list. The PIM DR then sends Join messages
to the RP.
In an IP packet that carries a Join/Prune message, the source address is a local interface's
address, the destination address is 224.0.0.13, and the TTL value is 1. The message is
transmitted in multicast mode.
Field Description
Type Message type
The value is 3.
Upstream Neighbor Address: Upstream neighbor's address, that is, the address of the downstream interface that receives the Join/Prune message and performs the Join or Prune action
Number of Groups: Number of groups contained in the message
Holdtime: Duration (in seconds) that an interface remains in the Join or Prune state
Group Address: Group address
Number of Joined Sources: Number of sources that the router joins
Number of Pruned Sources: Number of sources that the router prunes
Field Description
Joined Source Address Address of the source that the router joins
Pruned Source Address Address of the source that the router prunes
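The logical layout of these fields can be modeled as a simple structure. This sketch mirrors only the fields listed above; real PIM encodes them as packed binary fields on the wire:

```python
# Illustrative model of a Join/Prune message based on the fields listed above.
# This is a logical sketch, not the on-wire encoding.
from dataclasses import dataclass, field
from typing import List

@dataclass
class GroupRecord:
    group_address: str
    joined_sources: List[str] = field(default_factory=list)
    pruned_sources: List[str] = field(default_factory=list)

@dataclass
class JoinPruneMessage:
    msg_type: int          # always 3 for Join/Prune
    upstream_neighbor: str # downstream interface that performs the action
    holdtime: int          # seconds to remain in the Join or Prune state
    groups: List[GroupRecord] = field(default_factory=list)

# Assumed example values: one group with one joined source.
msg = JoinPruneMessage(3, "10.1.1.1", 210,
                       [GroupRecord("225.1.1.1", joined_sources=["192.168.0.10"])])
assert len(msg.groups) == 1
```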
Bootstrap Messages
Field Description
Type Message type
The value is 4.
Fragment Tag Random number used to identify fragments belonging to the same Bootstrap message
Hash Mask length Length of the hash mask of the C-BSR
BSR-priority C-BSR priority
BSR-Address C-BSR address
Group Address Group address
RP-Count Total number of candidate-rendezvous points (C-RPs) that
serve the group
Frag RP-Cnt Number of C-RP addresses included in this fragment of the
Bootstrap message for the corresponding group range.
This field facilitates parsing of the RP-Set for a given group
range, when carried over more than one fragment.
RP-address C-RP address
RP-holdtime Aging time of the advertisement message sent by the C-RP
RP-Priority C-RP priority
The BSR boundary of a PIM interface can be set by using the pim bsr-boundary command
on the interface. Multiple BSR boundary interfaces divide the network into different PIM-SM
domains. Bootstrap messages cannot pass through the BSR boundary.
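When several C-RPs serve the same group range, the hash mask carried in the Bootstrap message selects one RP per group. The PIM-SM specification (RFC 7761) defines the hash as Value(G, M, C) = (1103515245 * ((1103515245 * (G & M) + 12345) XOR C) + 12345) mod 2^31; the C-RP with the highest value (highest address on ties) serves the group. A sketch of that computation:

```python
# Sketch of the group-to-RP hash from the PIM-SM specification (RFC 7761).
# The C-RP with the highest hash value serves the group; ties go to the
# C-RP with the highest address.
import ipaddress

def hash_value(group, mask_len, c_rp):
    g = int(ipaddress.ip_address(group))
    c = int(ipaddress.ip_address(c_rp))
    m = (0xFFFFFFFF << (32 - mask_len)) & 0xFFFFFFFF
    return (1103515245 * ((1103515245 * (g & m) + 12345) ^ c) + 12345) % 2**31

def select_rp(group, mask_len, c_rps):
    return max(c_rps, key=lambda rp: (hash_value(group, mask_len, rp),
                                      int(ipaddress.ip_address(rp))))
```

A longer hash mask maps fewer groups to each value, spreading consecutive group addresses across the C-RPs.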
Assert Messages
On a shared network segment, if a PIM router receives an (S, G) packet from the downstream
interface of the (S, G) or (*, G) entry, it indicates that other forwarders exist on the network
segment. The PIM router then sends an Assert message through the downstream interface to
participate in the forwarder election. The router that fails in the forwarder election stops
forwarding multicast packets through the downstream interface.
In an IP packet that carries an Assert message, the source address is a local interface's address,
the destination address is 224.0.0.13, and the TTL value is 1. The packet is transmitted in
multicast mode.
Field Description
Type Message type
The value is 5.
Group Address Group address
Source address This field is a multicast source address if a unique forwarder is
elected for (S, G) entries, and this field is 0 if a unique forwarder
is elected for (*, G) entries.
R RPT bit
This field is 0 if a unique forwarder is elected for (S, G) entries,
and this field is 1 if a unique forwarder is elected for (*, G)
entries.
Metric Preference Priority of the unicast path to the source address
If the R field is 1, this field indicates the priority of the unicast
path to the RP.
Metric Cost of the unicast route to the source address
If the R field is 1, this field indicates the cost of the unicast path
to the RP.
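The Assert winner is chosen by comparing these fields: a lower metric preference wins, then a lower metric, and a higher interface address breaks ties. An illustrative sketch:

```python
# Illustrative sketch of PIM Assert forwarder election: lower metric preference
# wins, then lower metric; the higher interface IP address breaks ties.
import ipaddress

def assert_winner(a, b):
    """a, b: (metric_preference, metric, ip_string) tuples; returns the winner."""
    def key(x):
        # Negate the address so that a higher address sorts as a smaller key.
        return (x[0], x[1], -int(ipaddress.ip_address(x[2])))
    return a if key(a) < key(b) else b

# Equal preference and metric: the router with the higher address forwards.
print(assert_winner((0, 10, "10.1.1.1"), (0, 10, "10.1.1.2")))  # (0, 10, '10.1.1.2')
```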
When a dynamic RP is used, C-RPs periodically send Advertisement messages to notify the
BSR of the range of groups they want to serve.
In an IP packet that carries an Advertisement message, the source address is the C-RP's
address, and the destination address is the BSR's address. The packet is transmitted in unicast
mode.
Field Description
Background
IP and MPLS are generally used to forward packets on traditional core and backbone
networks. Deployment of multicast services, such as IPTV, multimedia conferences, and
real-time online games continues to increase on IP/MPLS networks. These services require
sufficient bandwidth, assured QoS, and high reliability on the bearer network. Currently, the
following multicast solutions are used to run multicast services, but these solutions cannot
meet the requirements of multicast services and network carriers:
IP multicast technology: It can be deployed on point-to-point (P2P) networks to run
multicast services, reducing network upgrade and maintenance costs. Similar to IP
unicast, IP multicast does not support QoS or traffic planning and has low reliability.
Multicast applications place high demands on real-time transmission and reliability, and
IP multicast technology cannot meet these requirements.
Establishing a dedicated multicast network: A dedicated multicast network is usually
constructed over Synchronous Optical Network (SONET)/Synchronous Digital
Hierarchy (SDH). SONET/SDH has high reliability and provides a high transmission
rate. However, such a network is expensive to construct, incurs significant OPEX, and
must be maintained separately.
IP/MPLS backbone network carriers require a multicast solution with high TE capabilities to
run multicast services on existing IP/MPLS backbone network devices.
Multicast over P2MP TE tunnels can meet the carriers' requirements by establishing tree
tunnels to transmit multicast data. It has the advantages of high IP multicast packet
transmission efficiency and assured MPLS TE end-to-end (E2E) QoS.
Benefits
Deploying P2MP TE on an IP/MPLS backbone network brings the following benefits:
Improves network bandwidth utilization.
Provides sufficient bandwidth for multicast services.
Simplifies network deployment using multicast protocols by not requiring PIM and
IGMP to be deployed on core devices on the network.
Related Concepts
P2MP TE data forwarding is similar to IP multicast data forwarding. A branch node copies
MPLS packets, swaps existing labels with outgoing labels in the MPLS packets, and sends
each separate copy of the MPLS packets over every sub-LSP. Because packets are replicated
only at branch nodes, this process improves network bandwidth resource usage.
For details on P2MP TE concepts, see Related Concepts in the HUAWEI NE20E-S2 Feature
Description - MPLS.
Ingresses
The P2MP tunnel interfaces of the ingresses (PE1 and PE2) direct multicast data to the
P2MP TE tunnel.
Egresses
The egresses (PE3, PE4, PE5, and PE6) must be configured to ignore the Unicast
Reverse Path Forwarding (URPF) check. Whether to configure multicast source proxy
on the egresses is based on the location of the rendezvous point (RP).
1.11.4.3 Applications
1.11.4.3.1 PIM Intra-domain
Continuing development of the Internet has led to considerable growth in the types of data,
voice, and video information exchanged online. New services, such as VoD and BTV, have
emerged and continue to develop. Multicast plays an increasingly important role in the
transmission of these services. This section describes Protocol Independent Multicast-Sparse
Mode (PIM-SM) intra-domain networking.
Figure 1-957 shows a large-scale network with multicast services deployed. An IGP has been
deployed, and each network segment route is reachable. Group members are distributed
sparsely. Users on the network require VoD services, but network bandwidth resources are
limited.
Implementation Solution
As shown in Figure 1-957, Host A and Host B are multicast information receivers, each
located on a different leaf network. The hosts receive VoD information in multicast mode.
PIM-SM is used in the entire PIM domain. Device B is connected to multicast source S1.
Device A is connected to multicast source S2. Device C is connected to Host A. Devices E
and F are connected to Host B.
Network configuration details are as follows:
PIM-SM is enabled on all router interfaces.
As shown in Figure 1-957, multicast sources are densely distributed.
Candidate-Rendezvous Points (C-RPs) can be deployed on devices close to the multicast
sources. Loopback 0 interfaces on Devices A and D are configured as
candidate-bootstrap routers (C-BSRs) and C-RPs. A BSR is elected among the C-BSRs.
An RP is elected among the C-RPs.
The RP deployment guidelines are as follows:
− Static RPs are recommended on small-/medium-sized networks because a
small-/medium-sized network is stable and has low forwarding requirements for an
RP.
If there is only one multicast source on the network, setting the device directly
connected to the multicast source as a static RP is recommended. The source's
designated router (DR) also functions as the RP and does not need to register with
the RP.
When a static RP is used, all routers, including the RP, must have the same
information about the RP and the multicast groups that the RP serves.
− Dynamic RPs are recommended on large-scale networks because dynamic RPs are
easy to maintain and provide high reliability.
Dynamic RP
To ensure RP information consistency, do not configure static RPs on some routers but dynamic RPs on
other routers in the same PIM domain.
IGMP is run between Device C and Host A and between Device E, Device F, and Host B.
When configuring IGMP on router interfaces, ensure that interface parameters are
consistent. All routers connected to the same network must run the same IGMP version
(IGMPv2 is recommended) and be configured with the same parameter values, such as
the interval at which IGMP Query messages are sent and holdtime of memberships.
Otherwise, IGMP group memberships on different routers will be inconsistent.
Hosts A and B send Join messages to the RP to require information from the multicast
source.
Configuring interfaces on network edge devices to statically join all multicast groups is recommended to
speed up channel switching and to provide a stable viewing experience for users.
Implementation Solution
On the network shown in Figure 1-958, Hosts A and B are multicast information receivers,
each located on a different leaf network. The hosts receive VoD information in multicast mode.
PIM-SSM is used throughout the PIM domain. Device B is connected to multicast source S1.
Device A is connected to multicast source S2. Device C is connected to Host A. Devices E
and F are connected to Host B.
Network configuration details are as follows:
PIM-SSM is enabled on all router interfaces.
A receiver in a PIM-SSM scenario can send a Join message directly to a specific multicast source. A
shortest path tree (SPT) is established between the multicast source and receiver, without requiring
the network to maintain rendezvous points (RPs).
IGMP runs between Device C and Host A, between Device E and Host B, and between
Device F and Host B.
When configuring IGMP on router interfaces, ensure that interface parameters are
consistent. All routers connected to the same network must run the same IGMP version
(IGMPv2 is recommended) and be configured with the same interface parameter values,
such as the Query timer value and hold time of memberships. If the IGMP versions or
interface parameters are different, IGMP group memberships are inconsistent on
different routers.
Host A can send Join messages to S1. Host B can send Join messages to S2. Information
sent by these multicast sources can reach user hosts.
Configuring interfaces on network edge devices to statically join all multicast groups is recommended to
speed up channel switching and to provide a stable viewing experience for users.
Service Overview
There is an increasing diversity of multicast services, such as IPTV, multimedia conferences,
and massively multiplayer online role-playing games (MMORPGs). These services are
transmitted over a service bearer network with the following functions:
Forwards multicast traffic even during traffic congestion.
Rapidly detects network faults and switches traffic to a standby link.
Networking Description
Point-to-multipoint (P2MP) Traffic Engineering (TE), supported on NE20Es, is used on the
IP/MPLS backbone network shown in Figure 1-959. P2MP TE helps the network prevent
multicast traffic congestion and maintain reliability.
Feature Deployment
Figure 1-959 illustrates how P2MP TE tunnels are used to transmit IP multicast services. The
process consists of the following stages:
Import multicast services.
− An Internet Group Management Protocol (IGMP) static group is configured on a
network-side interface of each service router (SR). SR1 runs Protocol
Independent Multicast (PIM). Ingress PE1, functioning as a host, sends an IGMP
Join message to SR1. After receiving the message, SR1 generates a multicast
forwarding entry and forwards multicast traffic to a PE. A traffic policy is
Service Overview
There is an increasing diversity of multicast services, such as IPTV, multimedia conferences,
and massively multiplayer online role-playing games (MMORPGs). To bear these services,
the service providers' networks have to meet the following requirements:
Forwards multicast traffic even during traffic congestion.
Rapidly detects network faults and switches traffic to a standby link.
Networking Description
The PIM FRR function deployed on user-access devices helps the network prevent multicast
traffic congestion and maintain reliability. PIM FRR is used on the IPTV service network
shown in Figure 1-960.
Feature Deployment
PIM FRR is used to transmit and protect IP multicast services. The process consists of the
following stages:
Deploy IGP LFA FRR.
Deploy IS-IS LFA FRR or OSPF LFA FRR on the protection nodes, such as DeviceA, so
that the nodes can generate primary and backup unicast routes.
Configure PIM FRR.
PIM FRR is configured on the protection nodes, such as DeviceA. When a user joins, a
primary multicast forwarding entry and a backup multicast forwarding entry are generated.
If the network operates normally, the protection nodes receive the multicast traffic only
from the primary link and drop the traffic from the backup link. If the primary link fails,
the protection nodes rapidly switch to the backup link to protect the multicast traffic.
Service Overview
In a non-ECMP network, the IGP LFA FRR function may fail to calculate unicast routes. To
avoid multicast service failures, configure static primary and backup routes to establish
primary and backup links.
Networking Description
The PIM FRR function deployed on user-access devices helps the network prevent multicast
traffic congestion and maintain reliability. PIM FRR is used on the IPTV service network
shown in Figure 1-961.
Feature Deployment
PIM FRR is used to transmit and protect IP multicast services. The process consists of the
following stages:
Configure FRR based on multicast static routes.
Configure FRR based on multicast static routes on each node of the ring so that each
node can generate primary and backup unicast routes.
Configure PIM FRR.
PIM FRR is configured on each node of the ring. When a user joins, a primary multicast
forwarding entry and a backup multicast forwarding entry are generated. If the network
operates normally, the protection nodes receive the multicast traffic only from the
primary link and drop the traffic from the backup link. If the primary link fails, the
protection nodes rapidly switch to the backup link to protect the multicast traffic.
1.11.4.4 Appendix
Feature IPv4 PIM IPv6 PIM Implementation Difference
1.11.5 MSDP
1.11.5.1 Introduction
Definition
Multicast Source Discovery Protocol (MSDP) is an inter-domain multicast solution that
applies to interconnected multiple Protocol Independent Multicast-Sparse Mode (PIM-SM)
domains. Currently, MSDP applies only to IPv4.
Purpose
A network composed of PIM-SM devices is called a PIM-SM network. In real-world
situations, a large PIM-SM network may be maintained by multiple Internet service providers
(ISPs).
A PIM-SM network uses Rendezvous Points (RPs) to forward multicast data. A large
PIM-SM network can be divided into multiple PIM-SM domains. On a PIM-SM network, an
RP does not communicate with RPs in other domains. An RP knows only the local multicast
source's location and distributes data only to local domain users. A multicast source registers
only with the local domain RP, and hosts send Join messages only to the local domain RP.
Using this approach, PIM-SM domains implement load splitting among RPs, enhance
network stability, and facilitate network management.
After a large PIM-SM network is divided into multiple PIM-SM domains, a mechanism is
required to implement inter-domain multicast. MSDP provides this mechanism, enabling
hosts in the local PIM-SM domain to receive multicast data from sources in other PIM-SM
domains.
In this section, a PIM-SM domain refers to the service range of an RP. A PIM-SM domain can be a
domain defined by bootstrap router (BSR) boundaries or a domain formed after you configure static RPs
on the router.
1.11.5.2 Principles
1.11.5.2.1 Inter-Domain Multicast in MSDP
MSDP Peer
On a PIM-SM network, MSDP enables Rendezvous Points (RPs) in different domains to
interwork. MSDP also enables different PIM-SM domains to share multicast source
information by establishing MSDP peer relationships between RPs.
An MSDP peer relationship can be set up between two RPs in the following scenarios:
Two RPs belong to the same AS but different PIM-SM domains.
Two RPs belong to different autonomous systems (ASs).
To ensure successful reverse path forwarding (RPF) checks in an inter-AS scenario, a BGP or
a Multicast Border Gateway Protocol (MBGP) peer relationship must be established on the
same interfaces as the MSDP peer relationship.
Basic Principles
Setting up MSDP peer relationships between RPs in different PIM-SM domains enables
communication between these domains, thereby forming an MSDP-connected graph.
MSDP peers exchange Source-Active (SA) messages. An SA message carries (S, G)
information registered by the source's DR with the RP. Message exchange between MSDP
peers ensures that SA messages sent by any RP can be received by all the other RPs.
Figure 1-962 shows a PIM-SM network divided into four PIM-SM domains. The source in the
PIM-SM 1 domain sends data to multicast group G. The receiver in the PIM-SM 3 domain is a
member of group G. RP 3 and the receiver in the PIM-SM 3 domain maintain an RPT for group G.
As shown in Figure 1-962, the receiver in the PIM-SM 3 domain can receive data sent by the
source in the PIM-SM 1 domain after MSDP peer relationships are set up between RP 1, RP 2,
and RP 3. The data processing flow is as follows:
1. The source sends multicast data to group G. DR 1 encapsulates the data into a Register
message and sends the message to RP 1.
2. As the source's RP, RP 1 creates an SA message containing the IP addresses of the
source, group G, and RP 1. RP 1 sends the SA message to RP 2.
3. Upon receiving the SA message, RP 2 performs an RPF check on the message. If the
check succeeds, RP 2 forwards the message to RP 3.
4. Upon receiving the SA message, RP 3 performs an RPF check on the message. If the
check succeeds and (*, G) entries exist on RP 3 (indicating that the local domain
contains members of group G), RP 3 creates an (S, G) entry and sends a Join
message with the (S, G) information towards the source hop by hop. A multicast path
(routing tree) from the source to RP 3 is then set up.
5. After the multicast data reaches RP 3 along the routing tree, RP 3 forwards the data to
the receiver along the rendezvous point tree (RPT).
6. After receiving the multicast data, the receiver determines whether to initiate shortest
path tree (SPT) switchover.
Background
If multiple Multicast Source Discovery Protocol (MSDP) peers exist in the same or different
ASs, the following problems may easily occur:
Source active (SA) messages are flooded between peers. Especially when many MSDP
peers are configured in the same PIM-SM domain, reverse path forwarding (RPF) rules
cannot filter out useless SA messages effectively. Each MSDP peer must perform an
RPF check on every received SA message, which imposes a heavy workload on the system.
SA messages are discarded due to RPF check failures.
To resolve these problems, configure a mesh group.
Implementation Principle
A mesh group requires that every pair of MSDP peers in the group establish a peer relationship,
implementing full-mesh connections within the group. To implement the mesh group function, add
all MSDP peers, whether in the same AS or in different ASs, to the same mesh group on a multicast device.
When a member of the mesh group receives an SA message, it checks the source of the SA
message:
If the SA message is sent by a member of the mesh group, the member directly accepts
the message without performing the RPF check. In addition, it does not forward the
message to other members in the mesh group.
In real-world situations, adding all MSDP peers, whether in the same AS or in different ASs, to the same
mesh group is recommended to prevent SA messages from being discarded due to RPF check failures.
If the SA message is sent by an MSDP peer outside the mesh group, the member
performs the RPF check on the SA message. If the SA message passes the check, the
member forwards it to other members of the mesh group.
The mesh group mechanism greatly reduces SA messages to be exchanged among MSDP
peers, relieving the workload of the multicast device.
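The two rules above can be sketched as a single dispatch function. This is an illustrative model of the described mesh-group behavior, not the device implementation:

```python
# Illustrative sketch of MSDP mesh-group SA handling: SA messages from mesh
# members are accepted without an RPF check and are not re-flooded into the
# group; SA messages from outside are RPF-checked and, if valid, flooded to
# all mesh-group members.
def handle_sa(sender, mesh_members, rpf_check):
    """Returns (action, peers_to_forward_to)."""
    if sender in mesh_members:
        return ("accept", [])                 # no RPF check, no re-flood
    if rpf_check(sender):
        return ("accept", sorted(mesh_members))
    return ("drop", [])

mesh = {"RP1", "RP2", "RP3"}
# From a mesh member: accepted directly, not forwarded within the group.
assert handle_sa("RP2", mesh, lambda s: True) == ("accept", [])
# From outside the group: RPF-checked, then flooded to every member.
assert handle_sa("RP9", mesh, lambda s: True) == ("accept", ["RP1", "RP2", "RP3"])
```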
Usage Scenario
In a traditional PIM-SM domain, each multicast group is mapped to only one rendezvous
point (RP). When the network is overloaded or traffic is heavy, many network problems occur.
For example, the RP may be overloaded, routes may converge slowly if the RP fails, or the
multicast forwarding path may not be optimal.
To resolve those problems, Anycast-RP is used in MSDP. Anycast-RP allows you to configure
multiple loopback interfaces as RPs in a PIM-SM domain, assign the same IP address to each
of these loopback interfaces, and set up MSDP peer relationships between these RPs. These
configurations help select the optimal paths and RPs and implement load splitting among the
RPs.
Implementation Principle
As shown in Figure 1-963, in a PIM-SM domain, the multicast sources, S1 and S2, send
multicast data to the multicast group G. U1 and U2 are members of group G.
1.11.5.3 Applications
Inter-Domain Multicast
Figure 1-964 shows an inter-domain multicast application.
An MSDP peer relationship is set up between rendezvous points (RPs) in two different
PIM-SM domains. Multicast source information can then be shared between the two
domains.
After multicast data reaches RP 1 (the source's RP), RP 1 sends a source active (SA)
message that carries the multicast source information to RP 2.
RP 2 initiates a shortest path tree (SPT) setup request to the source.
RP 2 forwards the multicast data to the receiver in the local domain.
After Receiver receives the multicast data, it independently determines whether to
initiate an SPT switchover.
Anycast-RP
Figure 1-965 shows an Anycast-RP application.
Device 1 and Device 2 function as RPs and establish an MSDP peer relationship between
each other.
Intra-domain multicast is performed using this MSDP peer relationship. A receiver sends
a Join message to the nearest RP to set up a rendezvous point tree (RPT).
The multicast source registers with the nearest RP. RPs exchange SA messages to share
the multicast source information.
Each RP joins an SPT with the source's DR at the root.
After receiving the multicast data, the receiver decides whether to initiate an SPT
switchover.
Definition
A multicast forwarding table consists of groups of (S, G) entries. In an (S, G) entry, S
indicates the source information, and G indicates the group information. The multicast route
management module supports multiple multicast routing protocols. The multicast forwarding
table therefore collects multicast routing entries generated by various types of protocols.
Multicast route management includes the following functions:
Reverse path forwarding (RPF) check
Multicast load splitting
Longest-match multicast routing
Multicast multi-topology
Multicast Boundary
Purpose
RPF check
This function is used to find an optimal unicast route to the multicast source and build a
multicast forwarding tree. The outbound interface of the unicast route functions as the
inbound interface of the forwarding entry. Then, when the forwarding module receives a
multicast data packet, the module matches the packet with the forwarding entry and
checks whether the inbound interface of the packet is correct. If the inbound interface of
the packet is identical with the outbound interface of the unicast routing entry, the packet
passes the RPF check; otherwise, the packet fails the RPF check and is discarded. The
RPF check prevents traffic loops in multicast data forwarding.
Multicast load splitting
If a multicast load splitting policy is configured, different forwarding entries that specify
the same multicast source can select different equal-cost routes as RPF routes to guide
multicast data forwarding. The RPF routes of forwarding entries can be hashed to
different equal-cost routes, so that multicast traffic is distributed across these routes.
Longest-match multicast routing
During multicast routing, the router preferentially selects the route with the longest
matched mask length to implement accurate route matching.
Multicast multi-topology
The multicast multi-topology function helps you plan a multicast topology for multicast
services on a physical network. Then, when a multicast device performs the RPF check,
the device searches for routes and builds a multicast forwarding tree only in the multicast
topology. In this manner, the problem that multicast services heavily depend on unicast
routes is addressed.
Multicast Boundary
Multicast boundaries are used to control multicast information transmission by allowing
the multicast information of each multicast group to be transmitted only within a
designated scope. A multicast boundary can be configured on an interface to form a
closed multicast forwarding area. After a multicast boundary is configured for a specific
multicast group on an interface, the interface cannot receive or send multicast packets for
the multicast group.
1.11.6.2 Principles
1.11.6.2.1 RPF Check
Reverse path forwarding (RPF) check is a mechanism that determines whether a multicast
packet is valid. RPF check works as follows: After receiving a multicast packet, a router looks
up the packet source address in the unicast routing table, Multicast Border Gateway Protocol
(MBGP) routing table, Multicast Interior Gateway Protocol (MIGP) routing table, and
multicast static routing table to select an optimal route as an RPF route for the packet. If the
interface on which the packet has arrived is an RPF interface, the RPF check succeeds, and
the packet is forwarded. Otherwise, the RPF check fails, and the packet is dropped.
If the MIGP, MBGP, and MSR routing tables all have candidate routes for the RPF route, the
system selects one optimal route from each of the routing tables. If the routes selected from
each table are Rt_urt (migp), Rt_mbgp, and Rt_msr, the system selects the RPF route based
on the following rules:
By default, the system selects the RPF route based on the route priority.
a. The system compares the priorities of Rt_urt (migp), Rt_mbgp, and Rt_msr. The
route with the smallest priority value is preferentially selected as the RPF route.
b. If Rt_urt (migp), Rt_mbgp, and Rt_msr have the same priority, the system selects
the RPF route in descending order of Rt_msr, Rt_mbgp, and Rt_urt (migp).
If the multicast longest-match command is run to control route selection based on the
route mask:
− The system compares the mask lengths of Rt_urt (migp), Rt_mbgp, and Rt_msr.
The route with the longest mask is preferentially selected as the RPF route.
− If Rt_urt (migp), Rt_mbgp, and Rt_msr have the same mask length, the system
compares their priorities. The route with the smallest priority value is preferentially
selected as the RPF route.
− If Rt_urt (migp), Rt_mbgp, and Rt_msr have the same mask length and priority, the
system selects the RPF route in descending order of Rt_msr, Rt_mbgp, and Rt_urt
(migp).
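The two selection policies above can be sketched in a few lines. The route attributes below are invented example values; only the comparison order reflects the rules in this section:

```python
# Hedged sketch of RPF route selection among Rt_urt (migp), Rt_mbgp, and
# Rt_msr. Tie-break order when other attributes are equal: msr > mbgp > urt.

TYPE_RANK = {"msr": 3, "mbgp": 2, "urt": 1}

def select_rpf(routes, longest_match=False):
    """routes: list of dicts with 'type', 'priority' (smaller wins), 'mask'."""
    if longest_match:
        # multicast longest-match: longest mask first, then smallest
        # priority value, then route type rank.
        key = lambda r: (-r["mask"], r["priority"], -TYPE_RANK[r["type"]])
    else:
        # Default: smallest priority value first, then route type rank.
        key = lambda r: (r["priority"], -TYPE_RANK[r["type"]])
    return min(routes, key=key)

routes = [
    {"type": "urt",  "priority": 10, "mask": 24},
    {"type": "mbgp", "priority": 10, "mask": 16},
    {"type": "msr",  "priority": 10, "mask": 8},
]
# Default policy: all priorities equal, so the static multicast route (msr)
# wins by type rank. With longest-match enabled, the /24 urt route wins.
```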
For example, on the network shown in Figure 1-966, Device C receives packets on both Port
1 and Port 2 from the same source. The routing table on Device C shows that the RPF
interface for this source is Port 2. Therefore, the RPF check fails for the packet on Port 1 but
succeeds for the packet on Port 2. Device C therefore drops the packet received on Port 1 and
forwards the packet received on Port 2.
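The check performed by Device C can be sketched as follows. The route and interface names mirror the figure but are otherwise illustrative:

```python
# Minimal RPF-check sketch: a multicast packet passes only if it arrived on
# the interface that the unicast route back to the source uses.

# Unicast routing table: source prefix -> outbound interface toward source.
UNICAST_ROUTES = {"192.168.1.0/24": "Port2"}

def rpf_check(source_route: str, inbound_if: str) -> bool:
    """The packet passes the RPF check only when its inbound interface
    matches the outbound interface of the unicast route to the source."""
    return UNICAST_ROUTES.get(source_route) == inbound_if

# The copy arriving on Port1 fails the check and is dropped; the copy on
# Port2 passes and is forwarded, preventing multicast forwarding loops.
```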
Multicast group-based load splitting, multicast source-based load splitting, and multicast source- and
multicast group-based load splitting are all methods of hash mode load splitting.
Based on the hash algorithm, a multicast router can select a route among several equal-cost
routes for each multicast group. The routes are used for packet forwarding for the groups. As
a result, multicast traffic for different groups can be split into different forwarding paths.
Based on the hash algorithm, a multicast router can select a route among several equal-cost
routes for each multicast source. The routes are used for packet forwarding for the sources. As
a result, multicast traffic from different sources can be split into different forwarding paths.
Based on the hash algorithm, a multicast router can select a route among several equal-cost
routes for each source-specific multicast group. The routes are used for packet forwarding for
the source-specific multicast groups. As a result, multicast traffic for different source-specific
groups can be split into different forwarding paths.
entries for each equal-cost route. Then the router selects the route with the maximum
calculation result as the forwarding route for this new entry.
If an entry is deleted, the router does not adjust the load distribution of the remaining entries.
Therefore, load splitting may become unbalanced over time.
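The hash-mode splitting described above can be sketched briefly. The hash function and keys below are assumptions for illustration; the device uses its own algorithm:

```python
# Illustrative sketch of hash-mode load splitting: each (S, G) entry is
# hashed onto one of several equal-cost RPF routes.

import zlib

EQUAL_COST_ROUTES = ["PathA", "PathB", "PathC"]

def pick_route(source: str, group: str) -> str:
    """Source- and group-based splitting: hash (S, G) to an equal-cost route."""
    h = zlib.crc32(f"{source},{group}".encode())
    return EQUAL_COST_ROUTES[h % len(EQUAL_COST_ROUTES)]

# Different (S, G) pairs can map to different paths, distributing traffic,
# while a given entry always hashes to the same path, so its forwarding
# route stays stable.
```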
very busy. Network operators can set up another link Device E→Device D→Device A to
carry only multicast services and isolate multicast services from unicast services.
After a receiver sends a Join message to a multicast router, the multicast router performs
an RPF check based on the unicast route in the multicast topology and establishes an
MDT hop by hop. The multicast data then travels through the path Device E → Device D
→ Device A and reaches the receiver.
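The topology-restricted RPF lookup can be sketched as follows, under invented topology and interface names:

```python
# Sketch of multicast multi-topology RPF: the lookup is confined to the
# multicast topology's routing table, so the multicast tree follows the
# dedicated Device E -> Device D -> Device A link even though the unicast
# (base) topology prefers a different path.

TOPOLOGY_ROUTES = {
    "base":      {"10.1.0.0/16": "toDeviceB"},  # busy link carrying unicast
    "multicast": {"10.1.0.0/16": "toDeviceD"},  # dedicated multicast link
}

def rpf_interface(source_prefix: str, topology: str = "multicast") -> str:
    """With multi-topology enabled, the RPF check searches routes only in
    the multicast topology, decoupling multicast from unicast routes."""
    return TOPOLOGY_ROUTES[topology][source_prefix]
```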
Usage Scenario
Multicast boundaries are used to control multicast information transmission by allowing the
multicast information of each multicast group to be transmitted only within a designated
scope. A multicast boundary can be configured on an interface to form a closed multicast
forwarding area. After a multicast boundary is configured for a specific multicast group on an
interface, the interface cannot receive or send multicast packets for the multicast group.
Principles
As shown in Figure 1-971, Device A, Device B, and Device C form multicast domain 1.
Device D, Device E, and Device F form multicast domain 2. The two multicast domains
communicate through Device B and Device D.
To isolate the data for a multicast group G from the other multicast domain, configure a
multicast boundary on GE 1/0/0 or GE 2/0/0 for group G. Then, the interface no longer
forwards data to and receives data from group G.
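The per-interface boundary behavior can be modeled in a few lines. The interface names and group range below are invented examples:

```python
# Minimal sketch of a multicast boundary: once a boundary for a group range
# is configured on an interface, that interface neither sends nor receives
# packets for groups in the range.

from ipaddress import ip_address, ip_network

# Boundary configuration: interface -> bounded group ranges.
BOUNDARIES = {"GE1/0/0": [ip_network("239.1.1.0/24")]}

def allowed(interface: str, group: str) -> bool:
    """Whether the interface may send or receive packets for this group."""
    g = ip_address(group)
    return not any(g in net for net in BOUNDARIES.get(interface, []))

# Packets for 239.1.1.5 are blocked on GE1/0/0 in both directions, forming a
# closed forwarding area; other groups and interfaces are unaffected.
```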
Definition
Multicast VPN (MVPN) in Rosen Mode is based on the multicast domain (MD) scheme
defined in relevant standards. MVPN in Rosen Mode implements multicast service
transmission over MPLS/BGP VPNs.
Purpose
MVPN in Rosen Mode transmits multicast data and control messages of PIM instances in a
VPN over a public network to remote sites of the VPN.
With MVPN in Rosen Mode, a public network PIM instance (called a PIM P-instance) does
not need to know multicast data transmitted in a PIM VPN instance (called a PIM C-instance),
and a PIM C-instance does not need to know multicast data transmitted in a PIM P-instance.
Therefore, MVPN in Rosen Mode isolates multicast data between a PIM P-instance and a
PIM C-instance.
1.11.7.2 Principles
1.11.7.2.1 Basic Concepts
MD
A multicast domain (MD) is composed of VPN instances on PEs that can receive and
send multicast data between each other. A PE VPN instance can belong only to one MD.
Different VPN instances belong to different MDs. An MD serves a specific VPN. All
private multicast data transmitted in the VPN is transmitted in the MD.
Share-group
A share-group is a group that all PE VPN instances in the same MD should join. A VPN
instance can join a maximum of one share-group.
Share-MDT
A share-multicast distribution tree (share-MDT) transmits PIM protocol packets and data
packets between PEs in the same VPN instance. A share-MDT is built when PIM
C-instances join share-groups.
MTI
A multicast tunnel interface (MTI) is the outbound or inbound interface of a multicast
tunnel (MT) or an MD. MTIs are used to transmit VPN data between local and remote
PEs.
An MTI is regarded as a channel through which the public network instance and a VPN
instance communicate. An MTI connects a PE to an MT on a shared network segment
and sets up PIM neighbor relationships between PE VPN instances in the same MD.
Switch-group
A switch-group is a group that all PEs with attached VPN data receivers join.
Switch-groups are the basis of switch-MDT setup.
Switch-MDT
A switch-multicast distribution tree (switch-MDT) implements on-demand multicast data
transmission, so a switch-MDT transmits multicast data to only PEs that require the
multicast data. A switch-MDT can be built after a share-MDT is set up and VPN data
receivers' PEs join a switch-group.
The PIM C-instance on the local PE considers the MTI as a LAN interface and sets up a PIM
neighbor relationship with the remote PIM C-instance. The PIM C-instances then use the
MTIs to perform DR election, send Join/Prune messages, and transmit multicast data.
The PIM C-instances send PIM protocol packets or multicast data packets to the MTIs and the
MTIs encapsulate the received packets. The encapsulated packets are public network
multicast data packets that are forwarded by PIM P-instances. Therefore, an MT is actually a
multicast distribution tree on a public network.
VPNs use different MTs, and each MT uses a unique packet encapsulation mode, so
multicast data is isolated between VPNs.
PIM C-instances on PEs in the same VPN use the same MT and communicate through
this MT.
A VPN uniquely defines an MD, and an MD serves only one VPN. The VPN, MD, MTI, and
share-group are therefore all in a one-to-one relationship.
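The isolation provided by MTI encapsulation can be sketched conceptually. The field layout is simplified and the addresses are invented; real MD MVPN uses GRE encapsulation with the share-group as the outer destination:

```python
# Conceptual sketch: a PIM C-instance packet handed to the MTI is wrapped in
# a public-network multicast header addressed to the VPN's share-group, so
# P routers forward it as ordinary public multicast.

SHARE_GROUP = {"VPN_BLUE": "239.1.1.1", "VPN_GREEN": "239.2.2.2"}

def mti_encapsulate(vpn: str, pe_address: str, c_packet: bytes) -> dict:
    """Wrap a C-instance packet for transport across this VPN's MD."""
    return {
        "outer_src": pe_address,        # local PE address
        "outer_dst": SHARE_GROUP[vpn],  # share-group of this VPN's MD
        "payload": c_packet,            # original VPN multicast packet
    }

def mti_decapsulate(vpn: str, p_packet: dict):
    """A remote PE accepts only packets addressed to its own share-group."""
    if p_packet["outer_dst"] != SHARE_GROUP[vpn]:
        return None  # different MD: isolation between VPNs preserved
    return p_packet["payload"]
```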
239.1.1.1) entry. PE 2 and PE 3 also send Register messages to the RP. Then, three
independent RP-source trees that connect PEs to the RP are built in the MD.
On the PIM-SM network, an RPT (*, 239.1.1.1) and three independent RP-source trees
construct a share-MDT.
Background
According to the share-multicast distribution tree (share-MDT) establishment process
described in the previous section, the VPN instance bound to PE3 has no receivers, but PE3
still receives the VPN multicast data packets of the group (192.1.1.1, 225.1.1.1). This is a
defect of the multicast domain (MD) scheme: all PEs in the same MD receive multicast data
packets regardless of whether they have receivers, which wastes bandwidth and imposes an
extra burden on PEs.
To address this issue, MVPN provides an optimized solution, the switch-MDT, which enables
on-demand multicast transmission. Traffic is switched from the share-MDT to the
switch-MDT when the multicast traffic rate on a PE exceeds the specified threshold. Only the
PEs with attached receivers then receive multicast data from the switch-MDT, reducing the
load on PEs and bandwidth consumption.
Implementation
Figure 1-978 shows the switch-MDT implementation process based on the assumption that a
share-MDT has been successfully established.
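The switchover decision can be sketched as follows. The threshold and rate values are invented examples; the actual threshold is a configured value on the sender PE:

```python
# Hedged sketch of the share-MDT to switch-MDT switchover: when the rate of
# a VPN flow crosses a configured threshold, the sender PE moves the flow to
# a switch-MDT that only PEs with receivers join.

SWITCH_THRESHOLD_KBPS = 1000  # assumed configured threshold

def select_mdt(flow_rate_kbps: int) -> str:
    """Choose the distribution tree for a VPN (S, G) flow."""
    if flow_rate_kbps > SWITCH_THRESHOLD_KBPS:
        return "switch-MDT"  # delivered only to PEs with receivers
    return "share-MDT"       # delivered to every PE in the MD

# Low-rate flows stay on the share-MDT; a 5 Mbit/s flow is switched, sparing
# PEs without receivers from carrying it.
```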
Background
Rosen MVPN supports only intra-VPN multicast service distribution. To enable a service
provider on a VPN to provide multicast services for users on other VPNs, use MVPN
extranet.
Implementation
Table 1-281 describes the usage scenarios of MVPN extranet.
The address range of multicast groups using the MVPN extranet service cannot overlap that of
multicast groups using the intra-VPN service.
Only a static RP can be used in an MVPN extranet scenario, the same static RP address must be
configured on the source and receiver VPN sides, and the static RP address must belong to the
source VPN. If different RP addresses are configured, inconsistent multicast routing entries will be
created on the two instances, causing service forwarding failures.
To provide an SSM service using MVPN extranet, the same SSM group address must be configured
on the source and receiver VPN sides.
Remote Cross
Configure a source VPN instance on a receiver PE
On the network shown in Figure 1-979, VPN GREEN is configured on PE1; PE1
encapsulates packets with the share-group G1 address; CE1 connects to the multicast
source in VPN GREEN. VPN BLUE is configured on PE2; PE2 encapsulates packets
with the share-group G2 address; CE2 connects to the multicast source in VPN BLUE.
VPN BLUE is configured on PE3; PE3 encapsulates packets with the share-group G2
address; PE3 establishes a multicast distribution tree (MDT) with PE2 on the public
network. Users connected to CE3 need to receive multicast data from both VPN BLUE
and VPN GREEN.
Configure source VPN GREEN on PE3 and a multicast routing policy for receiver VPN
BLUE. Table 1-282 describes the implementation process.
Step 1 (CE3): CE3 receives an IGMP Report message from the receiver that requires data
from the multicast source in VPN GREEN and forwards the Join message to PE3 through
PIM.
Step 2 (PE3): After PE3 receives the PIM Join message from CE3 in VPN BLUE, it creates a
multicast routing entry. Through the RPF check, PE3 determines that the upstream interface
of the RPF route belongs to VPN GREEN. Then, PE3 adds the upstream interface (serving as
an extranet inbound interface) to the multicast routing table.
Step 3 (PE3): PE3 encapsulates the PIM Join message with the share-group G1 address of
VPN GREEN and sends the PIM Join message to PE1 in VPN GREEN over the public
network.
Step 4 (PE1): After PE1 receives the multicast data from the source in VPN GREEN, PE1
encapsulates the multicast data with the share-group G1 address of VPN GREEN and sends
the data to PE3 in VPN GREEN over the public network.
Step 5 (PE3): PE3 decapsulates and imports the received multicast data to receiver VPN
BLUE and sends the data to CE3. Then, CE3 forwards the data to the receiver in VPN BLUE.
Configure receiver VPN BLUE on PE1. No multicast routing policy is required. Table
1-283 describes the implementation process.
Step 1 (CE3): CE3 receives an IGMP Report message from the receiver that requires data
from the multicast source in VPN GREEN and forwards the Join message to PE3 through
PIM.
Step 2 (PE3): After PE3 receives the PIM Join message from CE3 in VPN BLUE, it
encapsulates the PIM Join message with the share-group G2 address of VPN BLUE and
sends the PIM Join message to PE1 in VPN BLUE over the public network.
Step 3 (PE1): PE1 imports the PIM Join message for VPN BLUE to VPN GREEN,
establishes a multicast routing entry in VPN GREEN, and adds the extranet outbound
interface and receiver VPN BLUE to the multicast routing entry.
Step 4 (PE1): PE1 imports the multicast data sent by the multicast source in VPN GREEN to
receiver VPN BLUE, encapsulates the multicast data with the share-group G2 address of
VPN BLUE, and sends the data to PE3 in VPN BLUE over the public network.
Step 5 (PE3): PE3 decapsulates and sends the received data to CE3. Then, CE3 forwards the
data to the receiver in VPN BLUE.
Local Cross
On the network shown in Figure 1-981, PE1 is the source PE of VPN BLUE. CE1 connects to
the multicast source in VPN BLUE. CE4 connects to the multicast source in VPN GREEN.
Both CE3 and CE4 reside on the same side of PE3. Users connected to CE3 need to receive
multicast data from both VPN BLUE and VPN GREEN.
Table 1-284 describes how MVPN extranet is implemented in the local crossing scenario.
Step 1 (CE3): CE3 receives an IGMP Report message from the receiver that requires data
from the multicast source in VPN GREEN and forwards the Join message to PE3 through
PIM.
Step 2 (PE3): After PE3 receives the PIM Join message, it creates a multicast routing entry of
VPN BLUE. Through the RPF check, PE3 determines that the upstream interface of the RPF
route belongs to VPN GREEN. PE3 then imports the PIM Join message to VPN GREEN.
Step 3 (PE3): PE3 creates a multicast routing entry in VPN GREEN, records receiver VPN
BLUE in the entry, and sends the PIM Join message to CE4 in VPN GREEN.
Step 4 (PE3): After CE4 receives the PIM Join message, it sends the multicast data from
VPN GREEN to PE3. PE3 then imports the multicast data to receiver VPN BLUE based on
the multicast routing entries of VPN GREEN.
Step 5 (PE3): PE3 sends the multicast data to CE3 based on the multicast routing entries of
VPN BLUE. Then, CE3 forwards the data to the receiver in VPN BLUE.
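The extranet RPF decision in the steps above can be modeled briefly. The prefixes and interface names are invented assumptions:

```python
# Simplified model of the extranet decision: when the RPF route for a Join
# received in the receiver VPN resolves to an interface owned by the source
# VPN, the Join is imported into the source VPN instance.

from ipaddress import ip_address, ip_network

# Per-VPN RPF routes on PE3: list of (prefix, upstream interface).
RPF_ROUTES = {
    "VPN_BLUE":  [(ip_network("10.2.0.0/16"), "IF_BLUE")],
    "VPN_GREEN": [(ip_network("10.1.0.0/16"), "IF_GREEN")],
}

def resolve_join(receiver_vpn: str, source_vpn: str, source: str):
    """Return (upstream interface, VPN owning it) for a Join's source."""
    s = ip_address(source)
    for vpn in (receiver_vpn, source_vpn):  # receiver VPN checked first
        for net, iface in RPF_ROUTES[vpn]:
            if s in net:
                return iface, vpn
    return None

# For a source in 10.1.0.0/16, a Join arriving in VPN BLUE resolves into
# VPN GREEN, so IF_GREEN becomes the extranet inbound interface.
```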
In MVPN extranet scenarios where the multicast source resides on a public network and the receiver
resides on a VPN, static routes to the multicast source and public network RP must be configured in the
receiver VPN instance. After the device where the receiver VPN instance resides imports the PIM join
message from the VPN instance to the public network instance and establishes a multicast routing entry,
the device can send multicast data from the public network instance to the VPN instance, and then to the
receivers. Multicast protocol and data packets can be directly forwarded to the receiver without the need
to be encapsulated and decapsulated by GRE.
Background
Multicast packets, including protocol packets and data packets, are transmitted from the
public network to a private network along a public network multicast distribution tree (MDT).
Public network MDTs are categorized into the following types:
PIM-SM MDT: an MDT established by sending PIM-SM Join messages to the
intermediate device RP. PIM-SM MDTs are used in scenarios in which the location of
the multicast source (multicast tunnel interface) is unknown.
PIM-SSM MDT: an MDT established by sending PIM-SSM Join messages to the
multicast source. PIM-SSM MDTs are used in scenarios in which the location of the
multicast source (multicast tunnel interface) is known.
Before the BGP A-D MVPN is introduced, MD MVPNs can establish only PIM-SM MDTs.
This is because PEs belonging to the same VPN cannot detect each other's peer information.
As a result, PEs belonging to the same VPN cannot detect the multicast source, and therefore
are unable to send PIM-SSM Join messages to the multicast source to establish a PIM-SSM
MDT.
After the BGP A-D MVPN is introduced, MD MVPNs can also establish PIM-SSM MDTs.
On a BGP A-D MVPN, PEs obtain and record peer information about a VPN by exchanging
BGP Update packets that carry A-D route information. Then, these PEs send PIM-SSM Join
messages directly to the multicast source to establish a PIM-SSM MDT. After the PIM-SSM
MDT is established, the BGP A-D MVPN transmits multicast services over a public network
tunnel based on the PIM-SSM MDT.
Related Concepts
The concepts related to BGP A-D MVPN are as follows:
MD MVPN: See 1.11.7.2.1 Basic Concepts.
Peer: a BGP speaker that exchanges messages with another BGP speaker.
BGP A-D: a mechanism in which PEs exchange BGP Update packets that carry A-D
route information to obtain and record peer information of a VPN.
Implementation
For multicast VPN in BGP A-D mode, only MDT-SAFI A-D, a new address family defined
by BGP, is supported. After a VPN instance is configured on a PE, the PE advertises the VPN
configuration, including the RD, share-group address, and IP address of the MTI, to all its
BGP peers. After a remote PE receives an MDT-SAFI
message advertised by BGP, the remote PE compares the Share-Group address in the message
with its Share-Group address. If the remote PE confirms that it is in the same VPN as the
sender of the MDT-SAFI message, the remote PE establishes the PIM-SSM MDT on the
public network to transmit multicast VPN services.
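The share-group comparison described above can be sketched as follows. The addresses are invented example values, and the MDT-SAFI update is reduced to two fields for illustration:

```python
# Hedged sketch of MDT-SAFI A-D processing: a PE compares the share-group
# address in a received MDT-SAFI update with its own; a match means the
# sender is in the same VPN, so a PIM-SSM (S, G) entry is created with the
# sender PE's address as the source.

LOCAL_SHARE_GROUP = "232.1.1.1"  # within the SSM group address range

def process_mdt_safi(update: dict):
    """Return the PIM-SSM (S, G) entry to create, or None for another VPN."""
    if update["share_group"] != LOCAL_SHARE_GROUP:
        return None
    # Source = advertising PE's address, group = the shared SSM group.
    return (update["pe_address"], update["share_group"])

# An update from PE1 for 232.1.1.1 yields the entry ("1.1.1.9", "232.1.1.1"),
# letting the local PE send a PIM-SSM Join directly toward PE1.
```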
As shown in Figure 1-982, PE1, PE2, and PE3 belong to VPN1, and join the share-group G1.
The address of G1 is within the SSM group address range. BGP MDT-SAFI A-D mode is
enabled on each PE. In addition, the BGP A-D function is enabled on VPN1. The site where
CE1 resides is connected to Source of VPN1, and CE2 and CE3 are connected to VPN1 users.
Based on the BGP A-D mechanism, every PE on the network obtains and records information
about all its BGP peers on the same VPN, and then directly establishes a PIM-SSM MDT on
the public network for transmitting multicast VPN services. In this manner, MVPN services
can be transmitted over a public network tunnel based on the PIM-SSM MDT.
The following uses PE3 as an example to describe service processing in MVPN in BGP A-D
mode:
1. After being configured with the BGP A-D function, PE1, PE2, and PE3 negotiate session
parameters, and confirm that both ends support the BGP A-D function. Then, the PEs
can establish BGP peer relationships. After receiving a BGP Update packet from PE1
and PE2, respectively, PE3 obtains and records the BGP peer addresses of PE1 and PE2.
The BGP Update packets carry the information about the PEs that send packets, such as
the PE address and supported tunnel type.
2. VPN1 is configured on PE3. PE3 joins the share-group G1. PE3 creates a PIM-SSM
entry with G1 being the group address and the address of PE1 being the source address
and another PIM-SSM entry with G1 being the group address and the address of PE2
being the source address. PE3 then directly sends PIM Join messages to PE1 and PE2 to
establish two PIM-SSM MDTs to PE1 and PE2, respectively.
3. CE3 sends a Join message to PE3. After receiving the Join message, PE3 encapsulates
the Join message with the PIM-SSM share-group address, and then sends it to PE1 over
the public network tunnel. PE1 then decapsulates the received Join message, and then
sends it to the multicast source.
4. After the multicast data sent by the multicast source reaches PE1, PE1 encapsulates the
multicast data with the share-group address, and then forwards it to PE3 over the public
network tunnel. PE3 then forwards the multicast data to CE3, and CE3 sends the
multicast data to the user.
The following example uses VPN BLUE to describe how multicast services are isolated
between VPNs.
1. After a share-multicast distribution tree (MDT) is established for the BLUE instances,
the two BLUE instances connected with CE 1 and CE 2 exchange multicast protocol
packets through a multicast tunnel (MT).
2. Multicast devices in the BLUE instances can then establish neighbor relationships, and
send Join, Prune, and BSR messages to each other. The protocol packets in the BLUE
instances are encapsulated and decapsulated only on the MTs of the PEs. The PEs are
unaware they are on VPN networks, so they process the multicast protocol packets and
forward multicast data packets like devices on a public network. Multicast data is
transmitted in the same MD, but isolated from VPN instances in other MDs.
Purpose
BGP/MPLS IP VPNs are widely deployed as they provide excellent reliability and security.
Meanwhile, IP multicast is gaining increasing popularity among service providers as it
provides highly efficient point-to-multipoint (P2MP) traffic transmission. Rapidly developing
multicast applications, such as IPTV, video conference, and distance education, impose
increasing requirements on network reliability, security, and efficiency. As a result, service
providers' demand for delivering multicast services over BGP/MPLS IP VPNs is also
increasing. In this context, the multicast virtual private network (MVPN) solution is
developed. The MVPN technology, when applied to a BGP/MPLS IP VPN, can transmit VPN
multicast traffic to remote VPN sites across the public network.
Rosen MVPNs establish multicast distribution trees (MDTs) using Protocol Independent
Multicast (PIM) to transmit VPN multicast protocol and data packets, and have the following
limitations:
VPN multicast protocol and data packets must be transmitted using the MDT, which
complicates network deployment because the multicast function must be enabled on the
public network.
The public network uses GRE for multicast packet encapsulation and cannot leverage the
MPLS advantages, such as high reliability, QoS guarantee, and TE bandwidth
reservation, of existing BGP/MPLS IP VPNs.
Next-generation (NG) MVPNs, which have made improvements over Rosen MVPNs, have
the following characteristics:
The public network uses BGP to transmit VPN multicast protocol packets and routing
information. Multicast protocols do not need to be deployed on the public network,
simplifying network deployment and maintenance.
The public network uses the mature label-based forwarding and tunnel protection
techniques of MPLS, improving multicast service quality and reliability.
Definition
The NG MVPN is a new framework designed to transmit IP multicast traffic across a
BGP/MPLS IP VPN. An NG MVPN uses BGP to transmit multicast protocol packets, and
uses PIM-SM, PIM-SSM, P2MP TE, or mLDP to transmit multicast data packets. The NG
MVPN enables unicast and multicast services to be delivered using the same VPN
architecture.
Figure 1-984 shows a typical NG MVPN networking scenario, and Table 1-285 lists the roles
of different entities on an NG MVPN.
Benefits
NG MVPNs, which implement hierarchical forwarding of multicast data and control packets
on BGP/MPLS IP VPNs, offer the following benefits:
Better security by transmitting VPN multicast data over BGP/MPLS IP VPNs.
Better network maintainability by reducing network deployment complexity.
Better service quality and reliability by using mature label-based forwarding and tunnel
protection techniques of MPLS.
PMSI Tunnel
Public tunnels (P-tunnels) are transport mechanisms used to forward VPN multicast traffic
across service provider networks. In NE20E, PMSI tunnels can be carried over RSVP-TE
P2MP or mLDP P2MP tunnels. Table 1-286 lists the differences between RSVP-TE P2MP
tunnels and mLDP P2MP tunnels.
Table 1-286 Differences between RSVP-TE P2MP tunnels and mLDP P2MP tunnels
Tunnel Type: RSVP-TE P2MP tunnel
Establishment: Established from the root node.
Characteristics: RSVP-TE P2MP tunnels support bandwidth reservation and can ensure
service quality during network congestion. Use RSVP-TE P2MP tunnels to carry PMSI
tunnels if high service quality is required.
Tunnel Type: mLDP P2MP tunnel
Establishment: Established from a leaf node.
Characteristics: mLDP P2MP tunnels do not support bandwidth reservation and cannot
ensure service quality during network congestion. Configuring an mLDP P2MP tunnel,
however, is easier than configuring an RSVP-TE P2MP tunnel. Use mLDP P2MP tunnels to
carry PMSI tunnels if high service quality is not required.
Theoretically, a P-tunnel can carry the traffic of one or multiple MVPNs. However, in NE20E,
a P-tunnel can carry the traffic of only one MVPN.
On an MVPN that uses BGP as the signaling protocol, a sender PE distributes information
about the P-tunnel in a new BGP attribute called PMSI. PMSI tunnels are the logical tunnels
used by the public network to transmit VPN multicast data, and P-tunnels are the actual
tunnels used by the public network to transmit VPN multicast data. A sender PE uses PMSI
tunnels to send specific VPN multicast data to receiver PEs. A receiver PE uses PMSI tunnel
information to determine which multicast data is sent by the multicast source on the same
MVPN as itself. There are two types of PMSI tunnels: I-PMSI tunnels and S-PMSI
tunnels. Table 1-287 lists the differences between I-PMSI and S-PMSI tunnels.
MVPN Targets
MVPN targets are used to control MVPN A-D route advertisement. MVPN targets function in
a similar way as VPN targets used on unicast VPNs and are also classified into two types:
Export MVPN target: A PE adds the export MVPN target to an MVPN A-D route before
advertising the route.
Import MVPN target: After receiving an MVPN A-D route from another PE, a PE
matches the export MVPN target of the route against the import MVPN targets of its
VPN instances. If the export MVPN target matches the import MVPN target of a VPN
instance, the PE accepts the MVPN A-D route and records the sender PE as an MVPN
member. If the export MVPN target does not match the import MVPN target of any VPN
instance, the PE drops the MVPN A-D route.
By default, if you do not configure MVPN targets for an MVPN, MVPN A-D routes carry the VPN
target communities that are attached to unicast VPN-IPv4 routes. If the unicast and multicast network
topologies are congruent, you do not need to configure MVPN targets for MVPN A-D routes. If they are
not congruent, configure MVPN targets for MVPN A-D routes.
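The matching behavior described above can be sketched briefly. The target values and instance names are invented examples, mirroring how unicast VPN targets work:

```python
# Sketch of MVPN A-D route filtering by MVPN targets: a receiving PE matches
# the route's export MVPN targets against the import MVPN targets of its VPN
# instances, accepting the route and recording the sender on a match.

VPN_INSTANCES = {
    "vpn1": {"import_targets": {"100:1"}},
    "vpn2": {"import_targets": {"200:1"}},
}

def accept_ad_route(export_targets: set, sender_pe: str):
    """Return the VPN instances that accept the route; the sender PE is
    recorded as an MVPN member. An empty list means the route is dropped."""
    members = []
    for name, inst in VPN_INSTANCES.items():
        if export_targets & inst["import_targets"]:
            members.append((name, sender_pe))
    return members

# A route exported with target 100:1 is accepted only by vpn1; a route with
# no matching target is dropped.
```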
On the network shown in Figure 1-985, PE1 and PE2 are both sender PEs, and PE3 is a
receiver PE. PE1 and PE2 connect to both vpn1 and vpn2. On PE1, the VRF Route
Import Extended Community is 1.1.1.9:1 for vpn1 and 1.1.1.9:2 for vpn2; on PE2, the
VRF Route Import Extended Community is 2.2.2.9:1 for vpn1 and 2.2.2.9:2 for vpn2.
After PE1 and PE2 both establish BGP MVPN peer relationships with PE3, PE1 and PE2
both send to PE3 a VPNv4 route destined for the multicast source 192.168.1.2. The VRF
Route Import Extended Community carried in the VPNv4 route sent by PE1 is 1.1.1.9:1
and that carried in the VPNv4 route sent by PE2 is 2.2.2.9:1. After PE3 receives the two
VPNv4 routes, PE3 adds the preferred route (VPNv4 route sent by PE1 in this example)
to the vpn1 routing table and stores the VRF Route Import Extended Community value
carried in the preferred route locally for later BGP C-multicast route generation.
Upon receipt of a PIM Join message from CE3, PE3 generates a BGP C-multicast route
with the RT-import attribute and sends this route to PE1 and PE2. The RT-import
attribute value of this route is the same as the locally stored VRF Route Import Extended
Community value, 1.1.1.9:1. Then,
− Upon receipt of the BGP C-multicast route, PE1 checks the RT-import attribute of
this route. After PE1 finds that the Administrator field value is 1.1.1.9, which is the
same as its local MVPN ID, PE1 accepts this route and adds it to the vpn1 routing
table based on the Local Administrator field value, 1.
− Upon receipt of the BGP C-multicast route, PE2 also checks the RT-import attribute
of this route. After PE2 finds that the Administrator field value is 1.1.1.9, a value
different from its local MVPN ID 2.2.2.9, PE2 drops this route.
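The RT-import check performed by PE1 and PE2 above can be sketched as follows. The attribute is reduced to an "Administrator:Local Administrator" string for illustration:

```python
# Sketch of the RT-import check: a sender PE accepts a BGP C-multicast route
# only when the Administrator field of its RT-import attribute equals the
# PE's own MVPN ID; the Local Administrator field then selects the VPN.

def handle_c_multicast(route_rt_import: str, local_mvpn_id: str):
    """Return the Local Administrator value (VPN selector), or None to drop."""
    administrator, local_admin = route_rt_import.split(":")
    if administrator != local_mvpn_id:
        return None              # route belongs to another PE: dropped
    return int(local_admin)      # e.g. 1 selects the vpn1 routing table

# PE1 (MVPN ID 1.1.1.9) accepts a route with RT-import 1.1.1.9:1 and installs
# it for vpn1; PE2 (MVPN ID 2.2.2.9) drops the same route.
```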
To transmit multicast traffic from multicast sources to multicast receivers, sender PEs must
establish BGP MVPN peer relationships with receiver PEs. On the network shown in Figure
1-986, PE1 serves as a sender PE, and PE2 and PE3 serve as receiver PEs. Therefore, PE1
establishes BGP MVPN peer relationships with PE2 and PE3.
PEs on an NG MVPN use BGP Update messages to exchange MVPN information. MVPN
information is carried in the network layer reachability information (NLRI) field of a BGP
Update message. The NLRI containing MVPN information is also called the MVPN NLRI.
For more information about the MVPN NLRI, see MVPN NLRI.
For a comparison between RSVP-TE and mLDP P2MP tunnels, see 1.11.8.2.1 Control Plane
Overview.
The following example uses the network shown in Figure 1-987 to describe how to establish
PMSI tunnels. Because RSVP-TE P2MP tunnels and mLDP P2MP tunnels are established
differently, the following uses two scenarios, RSVP-TE P2MP Tunnel and mLDP P2MP
Tunnel, to describe how to establish PMSI tunnels.
This example presumes that:
PE1 has established BGP MVPN peer relationships with PE2 and PE3, but no BGP
MVPN peer relationship is established between PE2 and PE3.
The network administrator has configured MVPN on PE1, PE2, and PE3 in turn.
Figure 1-988 Time sequence for establishing an I-PMSI tunnel with the P-tunnel type as
RSVP-TE P2MP LSP
Table 1-288 briefs the procedure for establishing an I-PMSI tunnel with the P-tunnel type as
RSVP-TE P2MP LSP.
Table 1-288 Procedure for establishing an I-PMSI tunnel with the P-tunnel type as RSVP-TE
P2MP LSP
PE3 joins the RSVP-TE P2MP tunnel rooted at PE1 in a similar way as PE2. After PE2 and
PE3 both join the RSVP-TE P2MP tunnel rooted at PE1, the I-PMSI tunnel is established and
the MVPN service becomes available.
Figure 1-989 Time sequence for establishing an I-PMSI tunnel with the P-tunnel type as mLDP
P2MP LSP
Table 1-289 briefs the procedure for establishing an I-PMSI tunnel with the P-tunnel type as
mLDP P2MP LSP.
Table 1-289 Procedure for establishing an I-PMSI tunnel with the P-tunnel type as mLDP P2MP
LSP
PE1
Prerequisites: BGP and MVPN have been configured on PE1. PE1 has been configured as a sender PE. The P-tunnel type for I-PMSI tunnel establishment has been specified as mLDP P2MP LSP.
Key action: As a sender PE, PE1 initiates the I-PMSI tunnel establishment process. The MPLS module on PE1 reserves resources (FEC information such as the opaque value and root node address) for the corresponding mLDP P2MP tunnel. Because PE1 does not know leaf information of the mLDP P2MP tunnel, the mLDP P2MP tunnel is not established in a real sense.
PE1
Prerequisites: BGP and MVPN have been configured on PE2. PE1 has established a BGP MVPN peer relationship with PE2.
Key action: PE1 sends a Type 1 BGP A-D route to PE2. This route carries the following information:
An MVPN target (see 1.11.8.2.1 Control Plane Overview): used to control A-D route advertisement. The Type 1 BGP A-D route carries the export MVPN target configured on PE1.
The PMSI Tunnel attribute (see 1.11.8.4 NG MVPN Control Messages): specifies the P-tunnel type (mLDP P2MP in this case) used for PMSI tunnel establishment. This attribute carries information about resources reserved by MPLS for the mLDP P2MP tunnel in the preceding step.
PE2 - After PE2 receives the BGP A-D route from PE1,
the MPLS module on PE2 sends a Label
Mapping message to PE1. This is because the
PMSI Tunnel attribute carried in the received
route specifies the P-tunnel type as mLDP,
meaning that the P2MP tunnel must be
established from leaves.
After PE2 receives the MPLS reply from PE1, PE2 becomes aware that the P2MP tunnel
has been established. For more information about
mLDP P2MP tunnel establishment, see "mLDP"
in NE20E Feature Description - MPLS.
PE2 - PE2 creates an mLDP P2MP tunnel rooted at
PE1.
PE2 - PE2 sends a BGP A-D route that carries the
export MVPN target to PE1. Because PE2 is not
a sender PE configured with PMSI tunnel
information, the BGP A-D route sent by PE2
does not carry the PMSI Tunnel attribute.
After PE1 receives the BGP A-D route from PE2, PE1 matches the export MVPN target of the route
against its local import MVPN target. If the two targets match, PE1 accepts the route;
otherwise, PE1 drops the route.
PE3 joins the mLDP P2MP tunnel and MVPN in a similar way as PE2. After PE2 and PE3
both join the mLDP P2MP tunnel rooted at PE1, the I-PMSI tunnel is established and the
MVPN service becomes available.
Background
An NG MVPN uses the I-PMSI tunnel to send multicast data to receivers. The I-PMSI tunnel
connects to all PEs on the MVPN and sends multicast data to these PEs regardless of whether
these PEs have receivers. If some PEs do not have receivers, this implementation will cause
redundant traffic, wasting bandwidth resources and increasing PEs' burdens.
To solve this problem, S-PMSI tunnels are introduced. An S-PMSI tunnel connects to the
sender and receiver PEs of specific multicast sources and groups on an NG MVPN. Compared
with the I-PMSI tunnel, an S-PMSI tunnel sends multicast data only to PEs interested in the
data, reducing bandwidth consumption and PEs' burdens.
For a comparison between I-PMSI and S-PMSI tunnels, see 1.11.8.2.1 Control Plane Overview.
Implementation
The following example uses the network shown in Figure 1-990 to describe switching
between I-PMSI and S-PMSI tunnels on an NG MVPN.
After multicast data is switched from the I-PMSI tunnel to an S-PMSI tunnel, if the S-PMSI tunnel
fails but the I-PMSI tunnel is still available, multicast data will be switched back to the I-PMSI
tunnel.
After multicast data is switched from the I-PMSI tunnel to an S-PMSI tunnel, if the multicast data
forwarding rate is consistently below the specified switching threshold but the I-PMSI tunnel is
unavailable, multicast data still travels along the S-PMSI tunnel.
Figure 1-991 Time sequence for switching from the I-PMSI tunnel to an RSVP-TE S-PMSI
tunnel
Table 1-291 Procedure for switching from the I-PMSI tunnel to an RSVP-TE S-PMSI tunnel
PE1 After PE1 detects that the multicast data forwarding rate exceeds the
specified switching threshold, PE1 initiates switching from the I-PMSI
tunnel to an S-PMSI tunnel by sending a BGP S-PMSI A-D route to its BGP peers.
After PE3 has downstream receivers, PE3 will send a BGP Leaf A-D route to PE1. Upon
receipt of the route, PE1 adds PE3 as a leaf node of the RSVP-TE S-PMSI tunnel. After
PE3 joins the tunnel, PE3's downstream receivers will also be able to receive multicast
data.
Switching from the I-PMSI Tunnel to an mLDP S-PMSI Tunnel
Figure 1-992 shows the time sequence for switching from the I-PMSI tunnel to an mLDP
S-PMSI tunnel. Table 1-292 describes the specific switching procedure.
Figure 1-992 Time sequence for switching from the I-PMSI tunnel to an mLDP S-PMSI
tunnel
Table 1-292 Procedure for switching from the I-PMSI tunnel to an mLDP S-PMSI tunnel
After PE3 has downstream receivers, PE3 will also directly join the mLDP S-PMSI
tunnel. Then, PE3's downstream receivers will also be able to receive multicast data.
PE1 starts a switch-delay timer upon the completion of S-PMSI tunnel establishment and determines
whether to switch multicast data to the S-PMSI tunnel as follows:
If the S-PMSI tunnel fails to be established, PE1 still uses the I-PMSI tunnel to send multicast data.
If the multicast data forwarding rate is consistently below the specified switching threshold
throughout the timer lifecycle, PE1 still uses the I-PMSI tunnel to transmit multicast data.
If the multicast data forwarding rate is consistently above the specified switching threshold
throughout the timer lifecycle, PE1 switches data to the S-PMSI tunnel for transmission.
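The switch-delay decision described above can be sketched as a simple predicate. This is an assumed simplified model (sampling the forwarding rate over the timer lifecycle), not the actual NE20E state machine:

```python
# Sketch: PE1 moves multicast data to the S-PMSI tunnel only if the
# tunnel was established AND the forwarding rate stayed above the
# switching threshold for the whole switch-delay timer lifecycle.
def select_tunnel(s_pmsi_established, rate_samples, threshold):
    if not s_pmsi_established:
        return "I-PMSI"                     # tunnel setup failed
    if all(rate > threshold for rate in rate_samples):
        return "S-PMSI"                     # rate consistently above threshold
    return "I-PMSI"                         # rate dipped during the timer

assert select_tunnel(False, [200, 300], 100) == "I-PMSI"
assert select_tunnel(True, [50, 60], 100) == "I-PMSI"
assert select_tunnel(True, [150, 200], 100) == "S-PMSI"
```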
Figure 1-993 Time sequence for switching from an S-PMSI tunnel to the I-PMSI tunnel
Table 1-293 Procedure for switching from an S-PMSI tunnel to the I-PMSI tunnel
Step Device Key Action
PE1 After PE1 detects that the multicast data forwarding rate is
consistently below the specified switching threshold, PE1 starts a
switchback hold timer:
If the multicast data forwarding rate is consistently above the
specified switching threshold throughout the timer lifecycle, PE1
still uses the S-PMSI tunnel to send traffic.
If the multicast data forwarding rate is consistently below the
specified switching threshold throughout the timer lifecycle, PE1
switches multicast data to the I-PMSI tunnel for transmission.
Meanwhile, PE1 sends a BGP Withdraw S-PMSI A-D route to PE2.
PE2 Upon receipt of the BGP Withdraw S-PMSI A-D route, PE2
withdraws the bindings between its multicast entries and the
S-PMSI tunnel. If PE2 has sent a BGP Leaf A-D route to PE1, PE2
will send a BGP Withdraw Leaf A-D route to PE1 in this step.
PE2 After PE2 detects that none of its multicast entries is bound to the
S-PMSI tunnel, PE2 leaves the S-PMSI tunnel.
PE1 PE1 deletes the S-PMSI tunnel after waiting for a specified period
of time.
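The switchback-hold decision in Table 1-293 can be sketched the same way; again a simplified assumption (rate samples over the timer lifecycle), not the real implementation:

```python
# Sketch: PE1 switches multicast data back to the I-PMSI tunnel only
# if the forwarding rate stayed below the switching threshold for the
# entire switchback hold timer lifecycle.
def switchback(rate_samples, threshold):
    if all(rate < threshold for rate in rate_samples):
        return "I-PMSI"   # switch back and withdraw the S-PMSI A-D route
    return "S-PMSI"       # rate recovered; keep using the S-PMSI tunnel

assert switchback([10, 20, 30], 100) == "I-PMSI"
assert switchback([10, 200, 30], 100) == "S-PMSI"
```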
In an RSVP-TE P2MP tunnel dual-root 1+1 protection scenario, S-PMSI tunnels must be carried over
RSVP-TE P2MP tunnels. The I-PMSI/S-PMSI switching processes in this scenario are similar to those
described above except that the leaf PEs need to start a tunnel status check delay timer:
Before the timer expires, leaf PEs delete tunnel protection groups to skip the status check of the
primary I-PMSI or S-PMSI tunnel. The leaf PEs select the multicast data received from the primary
tunnel and discard the multicast data received from the backup tunnel.
After the timer expires, leaf PEs start to check the primary I-PMSI or S-PMSI tunnel status again.
Leaf PEs select the multicast data received from the primary tunnel only if the primary tunnel is Up.
If the primary tunnel is Down, leaf PEs select the multicast data received from the backup tunnel.
Figure 1-994 shows the procedure for joining a multicast group, and Table 1-294 describes
this procedure.
PE1 After PE1 receives a unicast route destined for the multicast source from
CE1, PE1 converts this route to a VPNv4 route, adds the Source AS
Extended Community and VRF Route Import Extended Community to this
route, and advertises this route to PE2.
For more information about the Source AS Extended Community and VRF
Route Import Extended Community, see MVPN Extended Community
Attributes.
PE2 After PE2 receives the VPNv4 route from PE1, PE2 matches the export
VPN target of the route against its local import VPN target:
If the two targets match, PE2 accepts the VPNv4 route and stores the
Source AS Extended Community and VRF Route Import Extended
Community values carried in this route locally for later generation of the
BGP C-multicast route.
If the two targets do not match, PE2 drops the VPNv4 route.
CE2 After CE2 receives an IGMP join request, CE2 sends a PIM-SSM Join
message to PE2.
PE2 After PE2 receives the PIM-SSM Join message:
PE2 generates a multicast entry. In this entry, the downstream interface
is the interface that receives the PIM-SSM Join message, and the
upstream interface is the interface on the path to the multicast source.
CE1 After CE1 receives the PIM-SSM Join message, CE1 generates a multicast
entry. In this entry, the downstream interface is the interface that receives
the PIM-SSM Join message. After that, the multicast receiver successfully
joins the multicast group, and CE1 can send multicast traffic to CE2.
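The VPN-target check that PE2 performs on the received VPNv4 route, and the caching of the two MVPN extended communities, can be sketched as follows (an illustrative model; the dictionary layout and community string forms are assumptions):

```python
# Sketch: a receiver PE accepts a VPNv4 route only if the route's
# export VPN targets intersect the PE's local import VPN targets,
# then caches the MVPN extended communities for later use.
def match_vpn_targets(route_export_rts, local_import_rts):
    return bool(set(route_export_rts) & set(local_import_rts))

cache = {}
route = {"export_rts": ["100:1"], "source_as": "65001:0",
         "vrf_route_import": "1.1.1.9:1"}
if match_vpn_targets(route["export_rts"], ["100:1", "200:1"]):
    # Store the communities for later BGP C-multicast route generation.
    cache["source_as"] = route["source_as"]
    cache["vrf_route_import"] = route["vrf_route_import"]

assert cache["vrf_route_import"] == "1.1.1.9:1"
# A route whose targets do not match is dropped.
assert not match_vpn_targets(["300:1"], ["100:1", "200:1"])
```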
Figure 1-995 shows the procedure for leaving a multicast group, and Table 1-295 describes
this procedure.
CE2 CE2 detects that a multicast receiver attached to itself leaves the multicast
group.
PE2 PE2 deletes the corresponding multicast entry after this entry ages out.
Then, PE2 generates a BGP Withdraw message.
PE2 PE2 sends the BGP Withdraw message to PE1.
PE1 After PE1 receives the BGP Withdraw message, PE1 deletes the
corresponding multicast entry and generates a PIM-SSM Prune message.
PE1 PE1 sends the PIM-SSM Prune message to CE1.
CE1 After CE1 receives the PIM-SSM Prune message, CE1 stops sending
multicast traffic to CE2.
Table 1-296 Implementation modes of PIM (*, G) multicast joining and leaving
Not across the public network: PIM (*, G) entries are converted to PIM (S, G) entries
before being transmitted to remote PEs across the public network. Because PIM (*, G)
entries are not transmitted across the public network, the performance requirements for
PEs are lowered. The private network RP can be either a static RP or a dynamic RP, and
can be deployed on either a PE or a CE. If a CE serves as the private network RP, the CE
must establish an MSDP peer relationship with the corresponding PE.
The advertisement of VPNv4 routes during multicast joining and leaving in PIM (*, G) mode is similar
to that in PIM (S, G) mode. For more information, see Table 1-294.
PIM (*, G) multicast joining and leaving across the public network
On the network shown in Figure 1-996, CE3 serves as the RP. Figure 1-997 shows the
time sequence for establishing an RPT.
Figure 1-996 Networking for PIM (*, G) multicast joining and leaving
CE2 After CE2 receives an IGMP join request, CE2 sends a PIM (*, G) Join
message to PE2.
PE2 After PE2 receives the PIM (*, G) Join message: PE2 generates a PIM (*,
G) entry. In this entry, the downstream interface is the interface that
receives the PIM (*, G) Join message and the upstream interface is the
P2MP tunnel interface on the path to the RP. In this case, the upstream
interface is the interface used by PE3 to connect to PE2. PE2 generates a
BGP C-multicast route (Shared Tree Join route) based on the locally stored
Source AS Extended Community and VRF Route Import Extended
Community values. The RT-import attribute of this route is set to the
locally stored VRF Route Import Extended Community value. PE2 sends
the BGP C-multicast route to PE3, its BGP peer.
NOTE
For more information about BGP C-multicast route generation, see Table 1-294.
PE3 After PE3 receives the BGP C-multicast route (Shared Tree Join route):
1. PE3 checks the Administrator field and Local Administrator field values
in the RT-import attribute of the BGP C-multicast route. After PE3
confirms that the Administrator field value is the same as its local
MVPN ID, PE3 accepts the BGP C-multicast route.
2. PE3 determines the VPN instance routing table to which the BGP
C-multicast route should be added, based on the Local Administrator field
value in the RT-import attribute of the route.
3. PE3 adds the BGP C-multicast route to the corresponding VPN instance
routing table and creates a VPN multicast entry to guide multicast traffic
forwarding. In the multicast entry, the downstream interface is PE3's
P2MP tunnel interface.
4. PE3 converts the BGP C-multicast route to a PIM (*, G) Join message
and sends this message to CE3.
CE3 Upon receipt of the PIM (*, G) Join message, CE3 generates a PIM (*, G)
entry. In this entry, the downstream interface is the interface that receives
the PIM (*, G) Join message. Then, an RPT rooted at CE3 and with CE2 as
the leaf node is established.
CE1 After CE1 receives multicast traffic from the multicast source, CE1 sends a
PIM Register message to CE3.
CE3 Upon receipt of the PIM Register message, CE3 generates a PIM (S, G)
entry, which inherits the outbound interface of the previously generated
PIM (*, G) entry. Meanwhile, CE3 sends multicast traffic to PE3.
PE3 Upon receipt of the multicast traffic, PE3 generates a PIM (S, G) entry,
which inherits the outbound interface of the previously generated PIM (*,
G) entry. Because the outbound interface of the PIM (*, G) entry is a
P2MP tunnel interface, multicast traffic is imported to the I-PMSI tunnel.
PE2 Upon receipt of the multicast traffic, PE2 generates a PIM (S, G) entry,
which inherits the outbound interface of the previously generated PIM (*,
G) entry.
CE2 Upon receipt of the multicast traffic, CE2 sends the multicast traffic to
multicast receivers.
When the multicast traffic sent by the multicast source exceeds the threshold set on CE2,
CE2 initiates RPT-to-SPT switching. Figure 1-998 shows the time sequence for
switching an RPT to an SPT.
When the receiver PE receives multicast traffic transmitted along the RPT, the receiver PE immediately
initiates RPT-to-SPT switching. The RPT-to-SPT switching process on the receiver PE is similar to that
on CE2.
CE2 After the received multicast traffic exceeds the set threshold, CE2 initiates
RPT-to-SPT switching by sending a PIM (S, G) Join message to PE2.
PE2 Upon receipt of the PIM (S, G) Join message, PE2 updates the outbound
interface status in its PIM (S, G) entry, and switches the PIM (S, G) entry to
the SPT. Then, PE2 searches its multicast routing table for a route to the
multicast source. After PE2 finds that the upstream device on the path to
the multicast source is PE1, PE2 sends a BGP C-multicast route (Source
Tree Join route) to PE1.
PE1 Upon receipt of the BGP C-multicast route (Source Tree Join route), PE1
generates a PIM (S, G) entry, and sends a PIM (S, G) Join message to CE1.
CE1 Upon receipt of the PIM (S, G) Join message, CE1 generates a PIM (S, G)
entry. Then, the RPT-to-SPT switching is complete and CE1 can send
multicast traffic to PE1.
PE1 To prevent duplicate multicast traffic, PE1 carries the PIM (S, G) entry
information in a Source Active AD route and sends the route to all its BGP
peers.
PE3 Upon receipt of the Source Active AD route, PE3 records the route. After
RPT-to-SPT switching, PE3, the ingress of the P2MP tunnel for the RPT,
discards received multicast traffic, generates the (S, G, RPT) state, and sends
a PIM (S, G, RPT) Prune message to its upstream device. Meanwhile, PE3 updates its VPN
multicast routing entries and stops forwarding multicast traffic.
NOTE
To prevent packet loss during RPT-to-SPT switching, the PIM (S, G, RPT) Prune
operation is performed after a short delay.
PE2 Upon receipt of the Source Active AD route, PE2 records the route.
Because the Source Active AD route carries information about the PIM (S,
G) entry for the RPT, PE2 initiates RPT-to-SPT switching. After PE2 sends
a BGP C-multicast route (Source Tree Join route) to PE1, PE2 can receive
multicast traffic from PE1.
Figure 1-999 shows the time sequence for leaving a multicast group in PIM (*, G) mode.
Figure 1-999 Time sequence for leaving a multicast group in PIM (*, G) mode
Table 1-299 describes the procedure for leaving a multicast group in PIM (*, G) mode.
Table 1-299 Procedure for leaving a multicast group in PIM (*, G) mode
CE2 After CE2 detects that a multicast receiver attached to itself leaves the
multicast group, CE2 sends a PIM (*, G) Prune message to PE2. If CE2 has
switched to the SPT, CE2 also sends a PIM (S, G) Prune message to PE2.
PE2 Upon receipt of the PIM (*, G) Prune message, PE2 deletes the
corresponding PIM (*, G) entry. Upon receipt of the PIM (S, G) Prune
message, PE2 deletes the corresponding PIM (S, G) entry.
PE2 PE2 sends a BGP Withdraw message (Shared Tree Join route) to PE3 and a
BGP Withdraw message (Source Tree Join route) to PE1.
PE1 Upon receipt of the BGP Withdraw message (Source Tree Join route), PE1
deletes the previously recorded BGP C-multicast route (Source Tree Join
route) as well as the outbound interface in the PIM (S, G) entry.
PE3 Upon receipt of the BGP Withdraw message (Shared Tree Join route), PE3
deletes the previously recorded BGP C-multicast route (Shared Tree Join
route) as well as the outbound interface in the PIM (S, G) entry.
PIM (*, G) multicast joining and leaving not across the public network
On the network shown in Figure 1-996, each site of the MVPN is a PIM-SM BSR domain.
A PE serves as the RP. Figure 1-1000 shows the time sequence for joining a multicast
group when a PE serves as the RP.
Figure 1-1000 Time sequence for joining a multicast group when a PE serves as the RP
Table 1-300 describes the procedure for joining a multicast group when a PE serves as
the RP.
Table 1-300 Procedure for joining a multicast group when a PE serves as the RP
CE2 After CE2 receives an IGMP join request, CE2 sends a PIM (*, G) Join
message to PE2.
PE2 Upon receipt of the PIM (*, G) Join message, PE2 generates a PIM (*, G)
entry. Because PE2 is the RP, PE2 does not send the BGP C-multicast route
(Shared Tree Join route) to other devices. Then, an RPT rooted at PE2 and
with CE2 as the leaf node is established.
CE1 After CE1 receives multicast traffic from the multicast server, CE1 sends a
PIM Register message to PE1.
PE1 Upon receipt of the PIM Register message, PE1 generates a PIM (S, G)
entry.
PE1 PE1 sends a Source Active AD route to all its BGP peers.
PE2 Upon receipt of the Source Active AD route, PE2 generates a PIM (S, G)
entry, which inherits the outbound interface of the previously generated PIM
(*, G) entry.
PE2 PE2 initiates RPT-to-SPT switching and sends a BGP C-multicast route
(Source Tree Join route) to PE1.
PE1 Upon receipt of the BGP C-multicast route (Source Tree Join route), PE1
imports multicast traffic to the I-PMSI tunnel based on the corresponding
VPN multicast forwarding entry. Then, multicast traffic is transmitted over
the I-PMSI tunnel to CE2.
Figure 1-1001 shows the time sequence for leaving a multicast group when a PE serves
as the RP.
Figure 1-1001 Time sequence for leaving a multicast group when a PE serves as the RP
Table 1-301 describes the procedure for leaving a multicast group when a PE serves as
the RP.
Table 1-301 Procedure for leaving a multicast group when a PE serves as the RP
CE2 After CE2 detects that a multicast receiver attached to itself leaves the
multicast group, CE2 sends a PIM (*, G) Prune message to PE2.
PE2 Upon receipt of the PIM (*, G) Prune message, PE2 deletes the
corresponding PIM (*, G) entry.
CE2 CE2 sends a PIM (S, G) Prune message to PE2.
PE2 Upon receipt of the PIM (S, G) Prune message, PE2 deletes the
corresponding PIM (S, G) entry. PE2 sends a BGP Withdraw message
(Source Tree Join route) to PE1.
PE1 Upon receipt of the BGP Withdraw message (Source Tree Join route), PE1
deletes the previously recorded BGP C-multicast route (Source Tree Join
route) as well as the outbound interface in the PIM (S, G) entry.
Meanwhile, PE1 sends a PIM (S, G) Prune message to CE1.
CE1 Upon receipt of the PIM (S, G) Prune message, CE1 stops sending
multicast traffic to CE2.
On the network shown in Figure 1-996, each site of the MVPN is a PIM-SM BSR domain.
A CE serves as the RP. CE3 has established an MSDP peer relationship with PE3, and
PE2 has established an MSDP peer relationship with CE2. Figure 1-1002 shows the time
sequence for joining a multicast group when a CE serves as the RP.
Figure 1-1002 Time sequence for joining a multicast group when a CE serves as the RP
Table 1-302 describes the procedure for joining a multicast group when a CE serves as
the RP.
Table 1-302 Procedure for joining a multicast group when a CE serves as the RP
CE2 After CE2 receives an IGMP join request, CE2 generates a PIM (*, G) Join
message. Because CE2 is the RP, CE2 does not send the PIM (*, G) Join
message to its upstream.
CE1 After CE1 receives multicast traffic from the multicast server, CE1 sends a
PIM Register message to CE3.
CE3 Upon receipt of the PIM Register message, CE3 generates a PIM (S, G)
entry.
CE3 CE3 carries the PIM (S, G) entry information in an MSDP Source Active
(SA) message and sends the message to its MSDP peer, PE3.
PE3 Upon receipt of the MSDP SA message, PE3 generates a PIM (S, G) entry.
PE3 PE3 carries the PIM (S, G) entry information in a Source Active AD route
and sends the route to other PEs.
PE2 Upon receipt of the Source Active AD route, PE2 learns the PIM (S, G)
entry information carried in the route. Then, PE2 sends an MSDP SA
message to transmit the PIM (S, G) entry information to its MSDP peer,
CE2.
CE2 Upon receipt of the MSDP SA message, CE2 learns the PIM (S, G) entry
information carried in the message and generates a PIM (S, G) entry. Then,
CE2 initiates a PIM (S, G) join request to the multicast source. Finally, CE2
forwards the multicast traffic to multicast receivers.
Figure 1-1003 shows the time sequence for leaving a multicast group when a CE serves
as the RP.
Figure 1-1003 Time sequence for leaving a multicast group when a CE serves as the RP
Table 1-303 describes the procedure for leaving a multicast group when a CE serves as
the RP.
Table 1-303 Procedure for leaving a multicast group when a CE serves as the RP
PE2 Upon receipt of the PIM (S, G) Prune message, PE2 deletes the
corresponding PIM (S, G) entry. Then, PE2 sends a BGP Withdraw
message (Source Tree Join route) to PE1.
PE1 Upon receipt of the BGP Withdraw message (Source Tree Join route), PE1
deletes the previously recorded BGP C-multicast route (Source Tree Join
route) as well as the outbound interface in the PIM (S, G) entry.
Meanwhile, PE1 sends a PIM (S, G) Prune message to CE1.
CE1 Upon receipt of the PIM (S, G) Prune message, CE1 stops sending
multicast traffic to CE2.
MVPN NLRI
A PE that participates in an NG MVPN is required to send a BGP Update message containing
the MVPN NLRI. The SAFI of the MVPN NLRI is 5. Figure 1-1007 shows the MVPN NLRI
format.
Field Description
Route type Type of an MVPN route. Seven types of MVPN routes are
specified. For more information, see Table 1-306.
Length Length of the Route Type specific field of the MVPN
NLRI.
Route type specific MVPN route information. The value of this field depends
on the Route Type field. For more information, see Table
1-306.
Table 1-306 describes the types and functions of MVPN routes. Type 1-5 routes are called
MVPN A-D routes. These routes are used for MVPN membership autodiscovery and P2MP
tunnel establishment. Type 6 and Type 7 routes are called C-multicast routes (C is short for
Customer; C-multicast routes refer to multicast routes from the private network). These routes
are used for VPN multicast joining and VPN multicast traffic forwarding.
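The MVPN NLRI layout just described (a route type octet, a length octet, and a variable-length type-specific field) can be sketched with a minimal parser. This is an illustrative layout assumption based on the field descriptions above, not a full MVPN NLRI decoder:

```python
# Sketch: parse the MVPN NLRI header — Route Type (1 octet),
# Length (1 octet), then the Route Type specific bytes.
def parse_mvpn_nlri(data: bytes):
    route_type = data[0]
    length = data[1]                    # length of the type-specific field
    type_specific = data[2:2 + length]
    if len(type_specific) != length:
        raise ValueError("truncated MVPN NLRI")
    return route_type, type_specific

# A Type 1 (Intra-AS I-PMSI A-D) route with a 4-byte type-specific stub.
rtype, body = parse_mvpn_nlri(bytes([1, 4, 0xC0, 0xA8, 0x00, 0x01]))
assert rtype == 1 and body == bytes([0xC0, 0xA8, 0x00, 0x01])
```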
Table 1-307 Description of the fields for an Intra-AS I-PMSI A-D route
Field Description
RD Route distinguisher, an 8-byte field in a VPNv4 address. An
RD and a 4-byte IPv4 address prefix form a VPNv4 address,
which is used to differentiate IPv4 prefixes using the same
address space.
Multicast source length Length of the multicast source address. The value is 32 if
the multicast source address is an IPv4 address or 128 if the
multicast source address is an IPv6 address.
Multicast source Multicast source address.
Multicast group length Length of the multicast group address. The value is 32 if the
multicast group address is an IPv4 address or 128 if the
multicast group address is an IPv6 address.
Multicast group Multicast group address.
Originating router's IP address IP address of the router that originates the A-D route. In
NE20E, the value is equal to the MVPN ID of the router that originates the A-D route.
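The RD field described above can be illustrated with a short sketch of how an RD plus an IPv4 prefix form a VPNv4 address (a simplified string form for illustration, not the on-the-wire encoding):

```python
# Sketch: an RD prefixed to an IPv4 address yields a VPNv4 address,
# letting overlapping customer address spaces coexist.
def vpnv4(rd: str, prefix: str) -> str:
    return f"{rd}:{prefix}"

# Two VPNs can reuse 10.1.1.0 because their RDs differ.
assert vpnv4("100:1", "10.1.1.0") != vpnv4("100:2", "10.1.1.0")
assert vpnv4("100:1", "10.1.1.0") == "100:1:10.1.1.0"
```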
Field Description
Route key Set to the MVPN NLRI of the S-PMSI A-D route received.
Originating router's IP address IP address of the router that originates the A-D route. In
NE20E, the value is equal to the MVPN ID of the router that originates the A-D route.
Table 1-310 Description of the fields for a Source Active A-D route
Field Description
RD RD of the sender PE connected to the multicast source.
Multicast source length Length of the multicast source address. The value is 32 if
the multicast source address is an IPv4 address or 128 if the
multicast source address is an IPv6 address.
Multicast source Multicast source address.
Multicast group length Length of the multicast group address. The value is 32 if the
multicast group address is an IPv4 address or 128 if the
multicast group address is an IPv6 address.
Multicast group Multicast group address.
Table 1-311 Description of the fields for a Shared Tree Join route
Field Description
Route type MVPN route type. The value 6 indicates that the route is a
Type 6 route (Shared Tree Join route).
RT-import VRF Route Import Extended Community of the unicast
route to the multicast source. For more information about
the VRF Route Import Extended Community, see MVPN
Extended Communities.
The VRF Route Import Extended Community is used by
sender PEs to determine whether to process the BGP
C-multicast route sent by a receiver PE. This attribute also
helps a sender PE to determine to which VPN instance
routing table a BGP C-multicast route should be added.
Next hop Next hop address.
RD RD of the sender PE connected to the multicast source.
Source AS Source AS Extended Community of the unicast route to the
multicast source. For more information about the Source AS
Extended Community, see MVPN Extended Communities.
Multicast source length Length of the multicast source address. The value is 32 if
the multicast source address is an IPv4 address or 128 if the
multicast source address is an IPv6 address.
RP address Rendezvous point (RP) address.
Multicast group length Length of the multicast group address. The value is 32 if the
multicast group address is an IPv4 address or 128 if the
multicast group address is an IPv6 address.
Multicast group Multicast group address.
Table 1-312 Description of the fields for a Source Tree Join route
Field Description
RD RD of the sender PE connected to the multicast source.
Source AS Source AS Extended Community of the unicast route to the
multicast source. For more information about the Source AS
Extended Community, see MVPN Extended Communities.
Multicast source length Length of the multicast source address. The value is 32 if
the multicast source address is an IPv4 address or 128 if the
multicast source address is an IPv6 address.
Multicast source Multicast source address.
Multicast group length Length of the multicast group address. The value is 32 if the
multicast group address is an IPv4 address or 128 if the
multicast group address is an IPv6 address.
Multicast group Multicast group address.
Field Description
Flags Flags bits. Currently, only one flag indicating whether leaf
information is required is specified:
If the PMSI Tunnel attribute carried with a Type 3 route
has its Flags bit set to Leaf Information Not Required,
the receiver PE that receives the Type 3 route does not
need to respond.
If the PMSI Tunnel attribute carried with a Type 3 route
has its Flags bit set to Leaf Information Required, the
receiver PE that receives the Type 3 route needs to send
a Leaf A-D route in response.
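The two Flags cases above reduce to a single conditional on the receiver PE; the constant value and function name below are illustrative assumptions:

```python
# Sketch: a receiver PE's reaction to the Flags bit of the PMSI Tunnel
# attribute carried with a Type 3 (S-PMSI A-D) route.
LEAF_INFO_REQUIRED = 0x01  # illustrative bit position

def respond_to_type3(flags: int):
    if flags & LEAF_INFO_REQUIRED:
        return "send Leaf A-D route"   # Leaf Information Required
    return None                        # Leaf Information Not Required

assert respond_to_type3(LEAF_INFO_REQUIRED) == "send Leaf A-D route"
assert respond_to_type3(0) is None
```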
Field Description
Tunnel identifier For an mLDP P2MP LSP, the value is <Root node address, Opaque value>.
On an NG MVPN, the sender PE sets up the P-tunnel, and therefore is responsible for
originating the PMSI Tunnel attribute. The PMSI Tunnel attribute can be attached to Type 1-3
routes and sent to receiver PEs. Figure 1-1015 shows the format of an Intra-AS
I-PMSI A-D route carrying the PMSI Tunnel attribute.
Figure 1-1015 Intra-AS I-PMSI A-D route carrying the PMSI Tunnel attribute
Dual-root 1+1 protection
Protected objects: sender PEs (P-tunnels can also be protected after this solution is deployed).
Advantage: The network uses BFD to detect link faults, implementing fast route
convergence and high network reliability.
Disadvantages:
Redundant multicast traffic exists on the network, wasting bandwidth resources.
Only sender PEs and P-tunnels can be protected. Receiver PEs and CEs cannot be protected.
Table 1-315 Possible points of failure and corresponding network convergence processes
1 CE1 or link between PE1 and the multicast source
The network can rely only on unicast route convergence for recovery. The
handling process is as follows:
1. PE1 detects that the multicast source is unreachable.
2. PE1 sends to PE3 a BGP Withdraw message that carries information
about a VPNv4 route to the source.
3. After PE3 receives the message, PE3 preferentially selects the route
advertised by PE2 as the route to the multicast source. Then, PE3 sends
a BGP C-multicast route to PE2. Upon receipt, PE2 converts the route
to a PIM Join message and sends the message to CE2.
4. CE2 constructs an MDT and sends the multicast traffic received from
the multicast source to PE2. Upon receipt, PE2 sends the traffic to PE3
over the P2MP tunnel.
5. After PE3 receives the traffic, PE3 sends the traffic to CE3, which in
turn sends the traffic to the multicast receiver.
2 PE1 The network can rely only on unicast route convergence for recovery. The
handling process is as follows:
1. After PE3 uses BFD for BGP to detect that PE1 is unreachable, PE3
withdraws the route (to the multicast source) advertised by PE1 and
preferentially selects the route advertised by PE2 as the route to the
multicast source. Then, PE3 sends a BGP C-multicast route to PE2.
2. After PE2 receives the route, PE2 converts the route to a PIM Join
message and sends the message to CE2.
3. CE2 constructs an MDT and sends the multicast traffic received from
the multicast source to PE2. Upon receipt, PE2 sends the traffic to PE3
over the P2MP tunnel.
4. After PE3 receives the traffic, PE3 sends the traffic to CE3, which in
turn sends the traffic to the multicast receiver.
3. Public network link: If MPLS tunnel protection is configured, the network relies on MPLS tunnel protection for recovery. The MVPN is unaware of public network link changes. If MPLS tunnel protection is not configured, the network relies on unicast route convergence for recovery. In this situation, the handling process is similar to the process for handling PE1 failures.
4. PE3: The network can rely only on unicast route convergence for recovery. The handling process is as follows:
1. When CE3 detects that PE3 is unreachable, CE3 withdraws the unicast
route (to the multicast source) advertised by PE3 to trigger route
convergence. During route convergence, CE3 preferentially selects the
route advertised by PE4 as the route to the multicast source.
2. CE3 sends a PIM Join message to PE4.
3. After PE4 receives the message, PE4 converts the message to a BGP
C-multicast route and sends the route to PE1.
4. After PE1 receives the route, PE1 converts the route to a PIM Join
message and sends the message to CE1.
5. CE1 constructs an MDT and sends the multicast traffic received from the multicast source to PE1. Upon receipt, PE1 sends the traffic to PE4 over the P2MP tunnel.
In single-MVPN networking protection, if PE3 and PE4 both receive PIM Join messages but
their upstream peers are different (for example, the upstream peer is PE1 for PE3 and PE2 for
PE4), PE1 and PE2 both send multicast traffic to PE3 and PE4. In this situation, you must
ensure that PE3 accepts only the multicast traffic from PE1 and PE4 accepts only the
multicast traffic from PE2. To do so, you must create multiple P2MP tunnels (with each
I-PMSI tunnel corresponding to one P2MP tunnel) when configuring a receiver PE to join
multiple I-PMSI tunnels. Then, when the multicast traffic reaches the receiver PE over
multiple I-PMSI tunnels, the receiver PE can identify the P2MP tunnel corresponding to the
upstream neighbor in its VPN instance multicast routing table. The receiver PE permits traffic
only in the identified P2MP tunnel but discards traffic in all other tunnels.
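The acceptance rule described above can be illustrated with a minimal Python sketch. This is an illustrative model only, not the device's implementation; the class and all names in it are hypothetical.

```python
# Hypothetical model of a receiver PE that joins multiple I-PMSI tunnels,
# one P2MP tunnel per sender PE, and accepts (S, G) traffic only from the
# tunnel rooted at the upstream neighbor in its VPN instance routing table.

class ReceiverPE:
    def __init__(self):
        # VPN instance multicast routing table: (C-S, C-G) -> upstream PE
        self.upstream = {}
        # One P2MP tunnel per I-PMSI tunnel: root (sender PE) -> tunnel ID
        self.tunnel_of_root = {}

    def join(self, source, group, upstream_pe):
        """Record the upstream neighbor chosen for this (S, G)."""
        self.upstream[(source, group)] = upstream_pe

    def accept(self, source, group, arriving_tunnel):
        """Permit traffic only in the tunnel rooted at the chosen upstream;
        traffic in all other tunnels is discarded."""
        expected_root = self.upstream.get((source, group))
        if expected_root is None:
            return False  # no receivers for this (S, G)
        return self.tunnel_of_root.get(expected_root) == arriving_tunnel

# Example: PE3 chose PE1 as the upstream for (10.1.1.1, 232.1.1.1), so
# traffic arriving over PE2's tunnel for the same (S, G) is dropped.
pe3 = ReceiverPE()
pe3.tunnel_of_root = {"PE1": "tunnel-1", "PE2": "tunnel-2"}
pe3.join("10.1.1.1", "232.1.1.1", "PE1")
```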
CE3 serves as a DR. After CE3 receives an IGMP join request from a multicast receiver,
CE3 sends a PIM Join message to PE3. Upon receipt, PE3 converts the message to a
BGP C-multicast route and sends the route to PE1, its BGP MVPN peer. Upon receipt,
PE1 converts the BGP C-multicast route to a PIM Join message and sends the message to
CE1. Upon receipt, CE1 establishes an MDT. Then, multicast traffic can be transmitted
from the multicast source to the multicast receiver along the path CE1 -> PE1 -> P1 ->
PE3 -> CE3.
CE4 serves as a non-DR. After CE4 receives an IGMP join request from a multicast
receiver, CE4 does not send a PIM Join message to its upstream. To implement traffic
redundancy, configure static IGMP joining on CE4, so that CE4 can send a PIM Join
message to PE4. After PE4 receives the message, PE4 converts the message to a BGP
C-multicast route and sends the route to PE2. Upon receipt, PE2 converts the route to a
PIM Join message and sends the message to CE2. Upon receipt, CE2 establishes an
MDT. Then, multicast traffic can be transmitted along the path CE2 -> PE2 -> P2 -> PE4
-> CE4. The multicast traffic will not be forwarded to receivers because CE4 is a
non-DR.
Table 1-316 Possible points of failure and corresponding network convergence processes
No. Point of Failure Network Convergence Process
1. CE1 or link: The network relies on unicast route convergence for recovery.
1.11.8.6 Applications
1.11.8.6.1 Application of NG MVPN to IPTV Services
Overview
Multicast services, such as IPTV services, video conferences, and real-time multi-player
online games, are increasingly used in daily life. These services are transmitted over service
bearer networks that need to:
Forward multicast traffic smoothly even during traffic congestion.
Detect network faults in a timely manner and quickly switch traffic from faulty links to
normal links.
Ensure multicast traffic security in real time.
Networking Description
NG MVPN is deployed on the service provider's backbone network to solve multicast service
issues related to traffic congestion, transmission reliability, and data security. Figure 1-1019
shows the application of NG MVPN to IPTV services.
Feature Deployment
In this scenario, NG MVPN deployment consists of the following aspects:
On the control plane
− Configure a BGP/MPLS IP VPN on the service provider's backbone network and
ensure that this VPN runs properly.
− Configure MVPN on the service provider's backbone network, so that PEs
belonging to the same MVPN can use BGP to exchange BGP A-D and BGP
C-multicast routes.
− Configure P2MP tunnels on the service provider's backbone network.
Terms
Term Definition
BFD Bidirectional Forwarding Detection. A common fault detecting mechanism
that uses Hello packets to quickly detect a link status change and notify a
protocol of the change. The protocol then determines whether to establish or
tear down a peer relationship.
DR Designated router. A router that applies only to PIM-SM. On the network
segment that connects to a multicast source, a DR sends Register messages to
the RP. On the network segment that connects to multicast receivers, a DR
sends Join messages to the RP. In SSM mode, a DR at the group member side
directly sends Join messages to a multicast source.
IGMP Internet Group Management Protocol. A signaling mechanism that
implements communication between hosts and routers on IP multicast leaf
networks.
By periodically sending IGMP messages, a host joins or leaves a multicast
group, and a router identifies whether a multicast group contains members.
Join A type of message used on PIM-SM networks. When a host requests to join a
network segment, the DR of the network segment sends a Join message to the
RP hop by hop to generate a multicast route. When the RP starts an SPT
switchover, the RP sends a Join message to the source hop by hop to generate
a multicast route.
PIM Protocol Independent Multicast. A multicast routing protocol.
Reachable unicast routes are the basis of PIM forwarding. PIM uses the
existing unicast routing information to perform RPF check on multicast
packets to create multicast routing entries and set up an MDT.
Prune A type of message. If there are no multicast group members on a downstream
interface, a router sends a prune message to the upstream node. After
receiving the prune message, the upstream node removes the downstream
interface from the downstream interface list and stops forwarding data of the
specified group to the downstream interface.
P-tunnel A public network tunnel used to transmit VPN multicast traffic. A P-tunnel
can be established using GRE, MPLS, or other tunneling technologies.
PMSI A logical tunnel used by a public network to transmit VPN multicast traffic. A
sender PE transmits VPN multicast traffic to receiver PEs over a PMSI
tunnel. Receiver PEs determine whether to accept the VPN multicast traffic
based on PMSI tunnel information. PMSI tunnels are categorized as I-PMSI
or S-PMSI tunnels.
RD Route distinguisher. An 8-byte field in a VPN IPv4 address. An RD together
with a 4-byte IPv4 address prefix constructs a VPN IPv4 address to
differentiate the IPv4 prefixes using the same address space.
receiver A site where multicast receivers reside.
site
receiver A PE connected to a receiver site.
PE
sender site A site where a multicast source resides.
sender PE A PE connected to a sender site.
(S, G) A multicast routing entry. S indicates a multicast source, and G indicates a
multicast group. After a multicast packet with S as the source address and G
as the group address reaches a router, it is forwarded through the downstream
interfaces of the (S, G) entry. The packet is expressed as an (S, G) packet.
(*, G) A PIM routing entry. * indicates any source, and G indicates a multicast
group. The (*, G) entry applies to all multicast packets whose group address
is G. All multicast packets that are sent to G are forwarded through the
downstream interfaces of the (*, G) entry, regardless of which source sends
the packets.
tunnel ID A group of information, including the token, the slot number of an outgoing
interface, and the tunnel type.
VPN Virtual private network. A technology that implements a private network over
a public network.
VPN An entity that is set up and maintained by the PE devices for
instance directly-connected sites. Each site has its VPN instance on a PE device. A
VPN instance is also called a VPN routing and forwarding (VRF) table. A PE
device has multiple forwarding tables, including a public-network routing
table and one or multiple VRF tables.
VPN A BGP extended community attribute that is also called Route Target. In
Target BGP/MPLS IP VPN, VPN Target controls the advertisement of VPN routing
information. VPN Target defines which sites can receive a VPN-IPv4 route
and from which sites a PE device can receive routes.
MVPN Controls MVPN A-D route advertisement. MVPN Target functions in a similar
Target way as VPN Target on unicast VPNs.
A-D autodiscovery
AS autonomous system
BGP Border Gateway Protocol
CE customer edge
C-G customer multicast group address
C-S customer multicast source address
FRR fast reroute
LDP Label Distribution Protocol
mLDP Multipoint LDP
MPLS Multiprotocol Label Switching
MVPN multicast VPN
NG MVPN next-generation multicast VPN
NLRI network layer reachability information
P2MP point-to-multipoint
P provider (device)
PE provider edge
PIM Protocol Independent Multicast
PIM-SM Protocol Independent Multicast-Sparse Mode
RP rendezvous point
RPF reverse path forwarding
RSVP Resource Reservation Protocol
SSM source-specific multicast
TE traffic engineering
VPN virtual private network
1.11.9 MLD
1.11.9.1 Introduction
Definition
MLD manages IPv6 multicast members. MLD sets up and maintains member relationships
between IPv6 hosts and the multicast router to which the hosts are directly connected.
MLD has two versions: MLDv1 and MLDv2. Both MLD versions support the ASM model.
MLDv2 supports the SSM model independently, while MLDv1 needs to work with SSM
mapping to support the SSM model.
MLD applies to IPv6 and provides functions similar to those that IGMP provides for IPv4.
MLDv1 is similar to IGMPv2, and MLDv2 is similar to IGMPv3.
Some features of MLD and IGMP are implemented in the same manner. The following
common features of MLD and IGMP are not mentioned:
MLD Router-Alert option
MLD Prompt-Leave
MLD static-group
MLD group-policy
MLD SSM mapping
This section describes MLD principles and unique features of MLD, including the MLD
querier election mechanism and MLD group compatibility.
Configuring an ACL filtering rule is mandatory for source address-based MLD message
filtering, but optional for source address-based IGMP message filtering.
Purpose
MLD allows hosts to dynamically join IPv6 multicast groups and manages multicast group
members. MLD is configured on the multicast router to which hosts are directly connected.
1.11.9.2 Principles
1.11.9.2.1 MLDv1 and MLDv2
By sending Query messages to hosts and receiving Report messages and Done messages from
hosts, a multicast router can identify multicast groups that contain receivers on a network
segment. A multicast router forwards multicast data to a network segment only if the network
segment has receivers. Hosts can determine whether to join or leave a multicast group.
As shown in Figure 1-1020, MLD-enabled Device A functions as the querier to periodically
send Multicast Listener Query messages. All hosts (Host A, Host B, and Host C) on the same
network segment of Device A can receive these Multicast Listener Query messages.
When a host (for example, Host A) receives a Multicast Listener Query message of
group G, the processing flow is as follows:
If Host A is already a member of group G, Host A replies with a Multicast Listener
Report message of group G at a random time point within the response period specified
by Device A.
After receiving the Multicast Listener Report message, Device A records information
about group G and forwards the multicast data to the network segment of the host
interface that is directly connected to Device A. Meanwhile, Device A starts a timer for
group G or resets the timer if it has been started. If no members of group G respond to
Device A within the interval specified by the timer, Device A stops forwarding the
multicast data of group G.
If Host A is not a member of any multicast group, Host A does not respond to the
Multicast Listener Query message from Device A.
When a host (for example, Host A) joins a multicast group G, the processing flow is as
follows:
Host A sends a Multicast Listener Report message of group G to Device A, instructing
Device A to update its multicast group information. Subsequent Multicast Listener
Report messages of group G are triggered by Multicast Listener Query messages sent by
Device A.
When a host (for example, Host A) leaves a multicast group G, the processing flow is as
follows:
Host A sends a Multicast Listener Done message of group G to Device A. After receiving
the Multicast Listener Done message, Device A triggers a query on group G to check
whether group G has other receivers. If Device A does not receive Multicast Listener
Report messages of group G within the period specified by the query message, Device A
deletes the information about group G, and stops forwarding the multicast traffic of
group G.
MLDv1 supports Report message suppression to reduce repetitive Report messages. This
function works as follows:
After a host (for example, Host A) joins a multicast group G, Host A receives a Multicast
Listener Query message from a router and then randomly selects a value from 0 to Maximum
Response Delay (specified in the Multicast Listener Query message) as the timer value. When
the timer expires, Host A sends a Multicast Listener Report message of group G to the router.
If Host A receives a Multicast Listener Report message of group G from another host in group
G before the timer expires, Host A does not send the Multicast Listener Report message of
group G to the router.
When a host leaves group G, the host sends a Multicast Listener Done message of group G to
a router. Because of the Report message suppression mechanism in MLDv1, the router cannot
determine whether another host exists in group G. Therefore, the router triggers a query on
group G. If another host exists in group G, the host sends a Multicast Listener Report message
of group G to the router.
If a router sends the query on group G for a specified number of times, but does not receive a
Multicast Listener Report message for group G, the router deletes information about group G
and stops forwarding multicast data of group G.
Both MLD queriers and non-queriers can process Multicast Listener Report messages, but only
queriers send Multicast Listener Query messages. MLD non-queriers cannot process MLDv1
Multicast Listener Done messages.
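The MLDv1 Report suppression mechanism described above can be sketched in Python. This is an illustrative simulation (function and host names are hypothetical), assuming each member independently picks a random delay up to the Maximum Response Delay carried in the Query.

```python
import random

def schedule_reports(members, max_response_delay, rng=random.Random(1)):
    """Simulate MLDv1 Report suppression for one group after a Query.

    Each member picks a random timer value in [0, max_response_delay].
    The host whose timer expires first sends the Multicast Listener
    Report; every other host sees that Report before its own timer
    expires and suppresses (cancels) its Report.
    """
    delays = {host: rng.uniform(0, max_response_delay) for host in members}
    reporter = min(delays, key=delays.get)          # first timer to expire
    suppressed = [h for h in members if h != reporter]
    return reporter, suppressed
```

Only one Report per group reaches the router, which is why (as noted above) the router cannot tell how many members a group still has when one of them leaves.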
ALLOW_NEW_SOURCES: indicates that a host still wants to receive data from certain
multicast sources. If the current relationship is Include, certain sources are added to the
current source list. If the current relationship is Exclude, certain sources are deleted from
the current source list.
BLOCK_OLD_SOURCES: indicates that a host does not want to receive data from
certain multicast sources any longer. If the current relationship is Include, certain sources
are deleted from the current source list. If the current relationship is Exclude, certain
sources are added to the current source list.
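The two source-list record types above can be expressed as a small Python sketch. This is an illustrative model of the rules just listed; the function name and source labels are hypothetical.

```python
def apply_record(filter_mode, source_list, record_type, sources):
    """Apply an MLDv2 state-change record to a (filter_mode, source_list)
    pair, following the rules described above:

    ALLOW_NEW_SOURCES: Include -> add sources; Exclude -> delete sources.
    BLOCK_OLD_SOURCES: Include -> delete sources; Exclude -> add sources.
    """
    current = set(source_list)
    changed = set(sources)
    if record_type == "ALLOW_NEW_SOURCES":
        current = current | changed if filter_mode == "Include" else current - changed
    elif record_type == "BLOCK_OLD_SOURCES":
        current = current - changed if filter_mode == "Include" else current | changed
    else:
        raise ValueError("unknown record type: %s" % record_type)
    return sorted(current)
```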
On the router side, the querier sends Multicast Listener Query messages and receives
Multicast Listener Report messages. In this manner, the router can identify which multicast group on the
network segment contains receivers, and then forwards the multicast data to the network
segment accordingly. In MLDv2, records of multicast groups can be filtered in either Include
mode or Exclude mode.
In Include mode:
− The multicast source in the activated state requires the router to forward its data.
− The multicast source in the deactivated state is deleted by the router and data
forwarding for the multicast source is ceased.
In Exclude mode:
− The data of a multicast source in the activated state is forwarded. That is, the
router forwards the data of this source regardless of whether hosts on the same
network segment of the router interface require it.
− The multicast source in the deactivated state requires no data forwarding.
− Data of the multicast source that is not recorded in the multicast group should be
forwarded.
A non-querier only receives Multicast Listener Report messages from hosts to learn
which multicast group has receivers. Then, based on the querier's action, the non-querier
identifies which receivers leave multicast groups.
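The router-side forwarding decision under the two filter modes can be sketched as follows. This is a simplified illustrative model: it treats the Exclude-mode source list as the deactivated (filtered-out) sources, and the function name is hypothetical.

```python
def should_forward(filter_mode, source_list, source):
    """Decide whether the router forwards a group's data from `source`.

    Include mode: forward only data from sources in the list (the
    sources that hosts asked for).
    Exclude mode: forward data from any source except the listed
    (deactivated) ones, including sources not recorded at all.
    """
    if filter_mode == "Include":
        return source in source_list
    return source not in source_list  # Exclude mode
```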
Generally, a network segment has only one querier. Multicast devices follow the same
principle to select a querier. The process is as follows (using Device A, Device B, and Device
C as examples):
After MLD is enabled on Device A, Device A considers itself a querier in the startup
process by default and sends Multicast Listener Query messages on the network segment.
If Device A receives a Multicast Listener Query message from Device B that has a lower
link-local address, Device A changes from a querier to a non-querier. Device A starts the
another-querier-existing timer and records Device B as the querier of the network
segment.
If Device A is a non-querier and receives a Multicast Listener Query message from
Device B in the querier state, Device A updates another-querier-existing timer; if the
received Multicast Listener Query message is sent from Device C whose link-local
address is lower than that of Device B in the querier state, Device A records Device C as
the querier of the network segment and updates the another-querier-existing timer.
If Device A is a non-querier and the another-querier-existing timer expires, Device A
changes to a querier.
In this document version, querier election can be implemented only among multicast devices that run the
same MLD version on a network segment.
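The querier election steps above can be sketched in Python. This is an illustrative model (the class and its method names are hypothetical); timer handling is reduced to a single callback, and addresses are compared numerically as the election requires.

```python
from ipaddress import IPv6Address

class MldDevice:
    """Minimal model of MLD querier election on one network segment."""

    def __init__(self, link_local):
        self.addr = IPv6Address(link_local)
        # On startup a device considers itself the querier by default.
        self.role = "querier"
        self.current_querier = self.addr

    def on_query(self, sender_link_local):
        """Process a Multicast Listener Query heard on the segment.

        A device defers to any querier with a lower link-local address
        and records the lowest address it has heard as the querier.
        (A real device would also restart the another-querier-existing
        timer here.)
        """
        sender = IPv6Address(sender_link_local)
        if sender < self.current_querier:
            self.role = "non-querier"
            self.current_querier = sender

    def on_another_querier_timer_expired(self):
        """If the current querier falls silent, become the querier again."""
        self.role = "querier"
        self.current_querier = self.addr
```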
Background
When a multicast device is directly connected to user hosts, the multicast device sends MLD
Query messages to and receives MLD Report and Done messages from the user hosts to
identify the multicast groups that have attached receivers on the shared network segment.
The device directly connected to a multicast device, however, may not be a host but an MLD
proxy-capable access device to which hosts are connected. If you configure only MLD on the
multicast device, access device, and hosts, the multicast and access devices need to exchange
a large number of packets.
To resolve this problem, enable MLD on-demand on the multicast device. The multicast
device sends only one general query message to the access device. After receiving the general
query message, the access device sends the collected Join and Leave status of multicast
groups to the multicast device. The multicast device uses the Join and Leave status of the
multicast groups to maintain multicast group memberships on the local network segment.
Benefits
MLD on-demand reduces packet exchanges between a multicast device and its connected
access device and reduces the loads of these devices.
Related Concepts
MLD on-demand
MLD on-demand enables a multicast device to send only one MLD general query message to
its connected access device (MLD proxy-capable) and to use Join/Leave status of multicast
groups reported by its connected access device to maintain MLD group memberships.
Implementation
When a multicast device is directly connected to user hosts, the multicast device sends MLD
Query messages to and receives MLD Report and Done messages from the user hosts to
identify the multicast groups that have attached receivers on the shared network segment. The
device directly connected to the multicast device, however, may not be a host but an MLD
proxy-capable access device, as shown in Figure 1-1021.
The provider edge (PE) is a multicast device. The customer edge (CE) is an access device.
On network a shown in Figure 1-1021, if MLD on-demand is not enabled on the PE,
the PE sends a large number of MLD Query messages to the CE, and the CE sends a
large number of Report and Done messages to the PE. As a result, lots of PE and CE
resources are consumed.
On network b shown in Figure 1-1021, after MLD on-demand is enabled on the PE,
the PE sends only one general query message to the CE. After receiving the general
query message from the PE, the CE sends the collected Join and Leave status of MLD
groups to the PE. The CE sends a Report or Done message for a group to the PE only
when the Join or Leave status of the group changes. To be specific, the CE sends an
MLD Report message for a multicast group to the PE only when the first user joins the
multicast group and sends a Done message only when the last user leaves the multicast
group.
After you enable MLD on-demand on a multicast device connected to an MLD proxy-capable access
device, the multicast device implements MLD differently from standard MLD in the
following aspects:
The records on dynamically joined multicast groups on the multicast device interface connected to
the access device do not time out.
The multicast device interface connected to the access device sends only one MLD general query
message to the access device.
The multicast device interface connected to the access device directly deletes the entry for a group
after it receives an MLD Done message for the group.
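The edge-triggered reporting behavior of the access device under MLD on-demand can be sketched as follows. This is an illustrative model (the class and names are hypothetical): the CE sends a Report only when the first user joins a group and a Done only when the last user leaves it.

```python
class OnDemandCE:
    """Model of an MLD proxy-capable access device (CE) whose upstream
    PE has MLD on-demand enabled: it reports only membership changes."""

    def __init__(self):
        self.members = {}  # group address -> set of attached users

    def user_join(self, group, user):
        """Return the message sent upstream to the PE, if any."""
        first = not self.members.get(group)
        self.members.setdefault(group, set()).add(user)
        return "Report" if first else None  # only the first join is reported

    def user_leave(self, group, user):
        """Return the message sent upstream to the PE, if any."""
        self.members.get(group, set()).discard(user)
        if not self.members.get(group):
            return "Done"  # only the last leave is reported
        return None
```

Because only these edge events cross the PE-CE link, the two devices exchange far fewer packets than with standard MLD, which is the stated benefit of the feature.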
Definition
User-side multicast enables a BRAS to identify users of a multicast program.
In Figure 1-1023, when the set top box (STB) and phone users go online, they send Internet
Group Management Protocol (IGMP) Report messages of a multicast program to the BRAS.
After receiving the messages, the BRAS identifies the users and sends a Protocol Independent
Multicast (PIM) Join message to the network-side rendezvous point (RP) or the source's
designated router (DR). The RP or source's DR creates multicast forwarding entries for the
users and receives the required multicast traffic from the source. The BRAS finally sends the
multicast traffic to the STB and phone users based on their forwarding entries and replication
modes. The multicast replication in this example is based on sessions.
User-side multicast supports both IPv4 and IPv6. For IPv4 users, user-side multicast applies to both
private and public networks. For IPv6 users, user-side multicast applies only to public networks.
On Layer 2, user-side multicast supports the PPPoE and IPoE access modes for common users and the
IPoE access mode for E-Line users.
Purpose
Because conventional multicast does not provide a method to identify users, carriers cannot
effectively manage multicast users who access services such as Internet Protocol television
(IPTV). Such users can join multicast groups, without notification, by sending Internet Group
Management Protocol (IGMP) Report messages. To identify these users and allow for
improved management of them, Huawei provides the user-side multicast feature.
Benefits
User-side multicast can identify users and the programs they join or leave for carriers to better
manage and control online users.
1.11.10.2 Principles
1.11.10.2.1 Overview
Table 1-319 describes multicast service processes.
Related Concepts
Multicast program
A multicast program is an IPTV channel or program and is identified by a multicast source
address and a multicast group.
Access mode
In user-side multicast, only the Point-to-Point Protocol over Ethernet (PPPoE) access and IP
over Ethernet (IPoE) access modes are supported, and only session-based replication is
supported.
PPPoE access mode: allows a remote access device to provide access services for hosts
on Ethernet networks and to implement user access control and accounting. PPPoE is a
link layer protocol that transmits PPP datagrams through PPP sessions established over
point-to-point connections on Ethernet networks.
IPoE access mode: allows the BRAS to perform authentication and authorization on
users and user services based on the physical or logical user information, such as the
MAC address, VLAN ID, and Option 82, carried in IPoE packets. In IPv4 network
access where a user terminal connects to an Ethernet interface of a BRAS through a
Layer 2 device, the user IP packets are encapsulated into IPoE packets by the user
Ethernet interface before they are transmitted to the BRAS through the Layer 2 device.
Table 1-320 Differences between PPPoE access users and IPoE access users in user-side multicast
If all of the preceding multicast replication modes are configured, the priority is as follows in descending
order: replication by interface + VLAN, session-based replication, replication by multicast VLAN, and
replication by interface.
In addition to multicast data packet replication, IGMP Query messages are sent based on the
preceding multicast replication modes.
Session-based multicast replication is used in the following illustration of the multicast program join
process. Multicast program join processes of other multicast replication modes are similar to that of
session-based multicast replication.
Accessing the Internet through PPPoE or IPoE is a prerequisite for users to join multicast
programs. Figure 1-1025 illustrates the procedures of multicast program join, and Table 1-322
describes each procedure.
Session-based multicast replication is used in the following illustration of the multicast program leave
process. Multicast program leave processes of other multicast replication modes are similar to that of
session-based multicast replication.
BRAS: The BRAS stops sending to the STB the multicast traffic of the corresponding
multicast group it joined.
BRAS: If there is no member in the multicast group after the STB user leaves, the BRAS
sends a PIM Prune message to the RP or source's DR to stop the multicast traffic
replication to the group.
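The BRAS leave handling can be sketched as follows. This is an illustrative model (function and names are hypothetical), assuming standard PIM semantics in which a Prune message sent toward the RP or source's DR stops replication for a group with no remaining members.

```python
def handle_done(groups, group, user):
    """Process an IGMP leave for `user` on the BRAS.

    groups: dict mapping group address -> set of member sessions.
    Returns the action taken toward the RP/source's DR.
    """
    members = groups.get(group, set())
    members.discard(user)                  # stop sending traffic to this user
    if not members:
        groups.pop(group, None)            # last member left the group
        return "send PIM Prune"            # stop replication for the group
    return "keep forwarding"               # other members still need traffic
```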
Session-based multicast replication is used in the following illustration of how a user leaves all
multicast groups by going offline. The processes for other multicast replication modes are
similar to that of session-based multicast replication.
Figure 1-1027 Process of multicast program leave of all multicast groups by going offline
Table 1-324 Key actions in each step of multicast program leave of all multicast groups by going
offline
Overview
User-side call admission control (CAC) is a bandwidth management and control method used
to guarantee multicast service quality of online users.
A conventional quality-guarantee mechanism is to limit the maximum number of multicast
groups that users can join. With this mechanism, a BRAS checks whether the maximum
number of multicast groups has been exceeded after receiving a Join message from a user. If
the maximum number has been exceeded, the device drops the Join message and denies the
user request. This mechanism alone, however, is no longer sufficient because IPTV program
varieties keep increasing. A high upper limit allows the device to accept many join requests
but cannot prevent the device from dropping messages due to limited bandwidth resources on
interfaces.
User-side multicast CAC addresses these issues by enabling a BRAS to limit bandwidth for
users.
User-side multicast CAC enables a BRAS to check the bandwidth limit and deny user
requests if the limit has been exceeded.
User-side multicast CAC can be implemented for users in a specific domain and on a specific
interface. It works with the multicast group limit function to implement the following
functions:
User-level bandwidth limit: A bandwidth limit can be set for each user in a specific user
access domain, and new service requests of a user are denied when the bandwidth
consumed by the user exceeds the bandwidth limit.
Interface-level bandwidth limit: A bandwidth limit can be set for a user access interface,
and new service requests are denied when the consumed bandwidth exceeds the
bandwidth limit.
Principles
Figure 1-1028 shows the working principles of user-side multicast CAC in a process of going
online.
The STB and phone users go online.
The STB and phone users send IGMP Report messages to request for multicast services.
The BRAS receives the IGMP Report messages and checks the bandwidth limits
configured for the user access domain and interface.
− If the remaining bandwidth resources are sufficient for the users:
The BRAS sends a PIM Join message to the RP or source's DR. The RP or source's
DR creates a multicast forwarding entry, and sends the service flow received from
the source to the BRAS. The BRAS forwards the flow to the users based on the
multicast forwarding entry and multicast traffic replication mode (this example
uses session-based replication).
− If the remaining bandwidth resources are insufficient for the users, the BRAS
discards the IGMP Report message and denies the service requests.
User-side multicast CAC supports only PPPoE and IPoE access modes for Layer 2 common users.
The BRAS supports four multicast traffic replication modes: by session, by interface + VLAN, by
multicast VLAN, and by interface.
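The admission check described above can be sketched in Python. This is an illustrative model (the class, names, and kbit/s figures are hypothetical): a join is admitted only if both the per-user limit for the access domain and the per-interface limit still have room.

```python
class CacChecker:
    """Minimal model of user-side multicast CAC on a BRAS."""

    def __init__(self, user_limit_kbps, intf_limit_kbps):
        self.user_limit = user_limit_kbps    # limit per user in the domain
        self.intf_limit = intf_limit_kbps    # limit on the access interface
        self.user_used = {}                  # user -> bandwidth in use
        self.intf_used = 0                   # interface bandwidth in use

    def admit(self, user, channel_kbps):
        """Check both limits for a join request; deny it if either would
        be exceeded (the BRAS then discards the IGMP Report message)."""
        used = self.user_used.get(user, 0)
        if used + channel_kbps > self.user_limit:
            return False  # user-level bandwidth limit exceeded
        if self.intf_used + channel_kbps > self.intf_limit:
            return False  # interface-level bandwidth limit exceeded
        self.user_used[user] = used + channel_kbps
        self.intf_used += channel_kbps
        return True
```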
Purpose
Limiting the maximum number of multicast groups can no longer guarantee service quality
because of the increase in IPTV service varieties and the large bandwidth requirement differences
among multicast channels. Therefore, user-side multicast CAC was introduced to prevent
bandwidth resources from being exhausted, thus guaranteeing the IPTV service quality of
online users.
Benefits
User-side multicast CAC brings the following benefits:
Guarantees IPTV service quality for online users.
Allows new user requests to be denied when a large number of multicast channels are
requested and bandwidth resources are insufficient.
1.11.10.3 Applications
1.11.10.3.1 User-side Multicast for PPPoE Access Users
Service Description
Because conventional multicast does not provide a method to identify users, carriers cannot
effectively manage multicast users who access services such as Internet Protocol television
(IPTV). Such users can join multicast groups, without notification, by sending Internet Group
Management Protocol (IGMP) Report messages.
To identify these users and allow for improved management of them, Huawei provides the
user-side multicast feature.
Networking Description
In Figure 1-1029, a set top box (STB) user initiates a dial up connection through
Point-to-Point Protocol over Ethernet (PPPoE) to the broadband remote access server (BRAS).
The BRAS then assigns an IPv4 address to the user for Internet access. To join a multicast
program, the user sends an IGMP Report message to the BRAS, and the BRAS creates a
multicast forwarding entry for the user. In this entry, the downstream interface is the interface
that connects to the user. After the entry is created, the BRAS sends a Protocol Independent
Multicast (PIM) Join message to the network-side rendezvous point (RP) or the source's
designated router (DR). Upon receipt of this message, the RP or source's DR sends to the
BRAS the multicast traffic of the program that the user wants to join. The BRAS then
replicates and sends the multicast traffic to the user based on the multicast forwarding entry.
Feature Deployment
Deployment for the user-side multicast feature is as follows:
1. Configure an IPv4 address pool on the BRAS to assign IPv4 addresses to online users.
2. Configure Authentication, Authorization and Accounting (AAA) schemes.
3. Configure a domain for user management, such as AAA.
4. Configure the PPPoE access mode on the BRAS.
a. Configure a virtual template (VT) interface.
b. Bind a VT to an interface.
c. Bind the sub-interface to the virtual local area network (VLAN) if users are
connected to the sub-interface. (For users connected to the main interface, skip this
step.)
d. Configure a broadband access server (BAS) interface and specify a user access type
for the interface. (The BAS interface can be a main interface, a common
sub-interface, or a QinQ sub-interface.)
5. Configure basic multicast functions on the BRAS and on the RP or source's DR.
a. Enable multicast routing.
b. Enable Protocol Independent Multicast-Sparse Mode (PIM-SM) on BRAS
interfaces and on the RP or source's DR interfaces.
c. Enable IGMP on the BRAS interface connected to users.
Service Description
Because conventional multicast does not provide a method to identify users, carriers cannot
effectively manage multicast users who access services such as Internet Protocol television
(IPTV). Such users can join multicast groups, without notification, by sending Internet Group
Management Protocol (IGMP) Report messages.
To identify these users and allow for improved management of them, Huawei provides the
user-side multicast feature.
Networking Description
In Figure 1-1030, a set-top box (STB) user connects to the BRAS through IPoE. (With IPoE,
the user does not need to initiate a dial-up connection, and so no client software is required.)
The BRAS then assigns an IPv4 address to the user for Internet access. To join a multicast
program, the user sends an IGMP Report message to the BRAS. The BRAS then creates a
multicast forwarding entry and establishes an outbound interface for the user. After the entry
is created, the BRAS sends a PIM Join message to the network-side RP or the source's DR.
Upon receipt of this message, the RP or source's DR sends to the BRAS the multicast data of
the program that the user wants to join. The BRAS then replicates and sends the multicast
data to the user based on the multicast forwarding entry.
Feature Deployment
Deployment for the user-side multicast feature is as follows:
Configure an IPv4 address pool on the BRAS to assign IPv4 addresses to online users.
Service Overview
User-side multicast VPN enables a BRAS to identify users of a multicast program, which
allows for improved management of them.
Networking Description
As shown in Figure 1-1031, the STB user and the multicast source belong to the same VPN
instance, which is a prerequisite for users to join programs of the multicast source on the VPN
that they belong to. To join a multicast program after accessing the Layer 3 VPN, the STB
user sends an IGMP Report message to the BRAS. Upon receipt of the IGMP Report
message, the BRAS identifies the domain and private VPN instance of the STB user. Then the
BRAS creates the multicast entry for the STB user in the corresponding VPN instance and
sends the PIM Join message to the network-side multicast source or RP for the multicast
traffic. As the final step, the BRAS replicates the multicast traffic to the STB user based on
different multicast replication modes.
Feature Deployment
Deployment for the user-side multicast VPN is as follows:
1. Configure an IPv4 address pool on the BRAS to assign IPv4 addresses to online users.
2. Configure Authentication, Authorization and Accounting (AAA) schemes.
3. Configure a domain for user management, such as AAA.
4. Configure the PPPoE or IPoE access mode on the BRAS.
5. Configure basic multicast VPN functions.
6. Configure a multicast replication mode on a BAS interface. By default, multicast
replication by interface is configured. You can choose to configure one of the following
multicast replication modes:
− Session-based multicast replication
− Multicast replication by interface + VLAN
− Multicast replication by VLAN
7. Bind a VPN instance of the specified multicast service to the main interface on a BRAS.
8. Enable IGMP and PIM on the main interface of the BRAS.
Definition
Layer 2 multicast implements on-demand multicast data transmission on the data link layer.
Figure 1-1032 shows a typical Layer 2 multicast application where Device B functions as a
Layer 2 device. After Layer 2 multicast is deployed on Device B, it listens to Internet Group
Management Protocol (IGMP) packets exchanged between Device A (a Layer 3 device) and
hosts and creates a Layer 2 multicast forwarding table. Then, Device B forwards multicast
data only to users who have explicitly requested the data, instead of broadcasting the data.
Purpose
Layer 2 multicast is designed to reduce network bandwidth consumption. For example,
without Layer 2 multicast, Device B cannot know which interfaces are connected to multicast
receivers. Therefore, after receiving a multicast packet from Device A, Device B broadcasts
the packet in the packet's broadcast domain. As a result, all hosts in the broadcast domain
(including those who do not request the packet) will receive the packet, which wastes network
bandwidth and compromises network security.
With Layer 2 multicast, Device B can create a Layer 2 multicast forwarding table and record
the mapping between multicast group addresses and interfaces in the table. After receiving a
multicast packet, Device B searches the forwarding table for downstream interfaces that map
to the packet's group address, and forwards the packet only to these interfaces. A multicast
group address can be a multicast IP address or a mapped multicast MAC address.
Functions
Major Layer 2 multicast functions include:
IGMP snooping
Static Layer 2 multicast
Layer 2 SSM mapping
IGMP snooping proxy
Multicast VLAN
Layer 2 multicast entry limit
Layer 2 Multicast Instance
Multicast Listener Discovery Snooping
Benefits
Layer 2 multicast offers the following benefits:
Reduced network bandwidth consumption
Lower performance requirements on Layer 3 devices
Improved multicast data security
Improved user service quality
1.11.11.2 Principles
1.11.11.2.1 IGMP Snooping
Background
Layer 3 devices and hosts use IGMP to implement multicast data communication. IGMP
messages are encapsulated in IP packets. A Layer 2 device can neither process Layer 3
information nor learn multicast MAC addresses in link layer data frames because source
MAC addresses in data frames are not multicast MAC addresses. As a result, when a Layer 2
device receives a data frame in which the destination MAC address is a multicast MAC
address, the device cannot find a matching entry in its MAC address table. The Layer 2 device
then broadcasts the multicast packet, which wastes bandwidth resources and compromises
network security.
IGMP snooping addresses this problem by controlling multicast traffic forwarding at Layer 2.
IGMP snooping enables a Layer 2 device to listen to and analyze IGMP messages exchanged
between a Layer 3 device and hosts. Based on the learned IGMP message information, the
device creates a Layer 2 forwarding table and uses it to implement on-demand packet
forwarding.
Figure 1-1033 shows a network on which Device B is a Layer 2 device and users connected to
Port 1 and Port 2 require multicast data from a multicast group (for example, 225.0.0.1).
If Device B does not run IGMP snooping, Device B broadcasts all received multicast
data at the data link layer.
If Device B runs IGMP snooping and receives data for a multicast group, Device B
searches the Layer 2 multicast forwarding table for ports connected to the users who
require the data. In this example, Device B sends the data only to Port 1 and Port 2
because the user connected to Port 3 does not require the data.
Figure 1-1033 Multicast packet transmission before and after IGMP snooping is configured on a
Layer 2 device
Related Concepts
Figure 1-1034 illustrates IGMP snooping on a Layer 2 multicast network.
A router port (labeled with a blue circle in Figure 1-1034): It connects a Layer 2
multicast device to an upstream multicast router.
Router ports can be dynamically discovered by IGMP or manually configured.
A member port of a multicast group (labeled with a yellow square in Figure 1-1034): It
connects a Layer 2 multicast device to group member hosts and is used by a Layer 2
multicast device to send multicast packets to hosts.
Member ports can be dynamically discovered by IGMP or manually configured.
A Layer 2 multicast forwarding entry: It is stored in the multicast forwarding table and
used by a Layer 2 multicast device to determine the forwarding of a multicast packet sent
from an upstream device. Information in a Layer 2 multicast forwarding entry includes:
− VLAN ID or VSI name
− Multicast source address
− Multicast group address
− Member ports connected to hosts
Figure 1-1035 Mapping between an IP multicast address and a multicast MAC address
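The mapping shown in Figure 1-1035 builds the MAC address from the fixed multicast prefix 01:00:5e plus the low-order 23 bits of the IPv4 group address. The following Python sketch (an illustration, not device code) shows the rule:

```python
def ip_to_multicast_mac(ip: str) -> str:
    """Map an IPv4 multicast address to its Ethernet multicast MAC.

    The MAC is the fixed prefix 01:00:5e followed by the low-order
    23 bits of the IP address (the high bit of the second octet is
    dropped).
    """
    octets = [int(o) for o in ip.split(".")]
    assert 224 <= octets[0] <= 239, "not an IPv4 multicast address"
    return "01:00:5e:{:02x}:{:02x}:{:02x}".format(
        octets[1] & 0x7F,  # keep only the low 7 bits of the second octet
        octets[2],
        octets[3],
    )

print(ip_to_multicast_mac("225.0.0.1"))    # 01:00:5e:00:00:01
print(ip_to_multicast_mac("224.128.0.1"))  # 01:00:5e:00:00:01 (same MAC)
```

Because 9 high-order bits of the group address are discarded, 32 IPv4 group addresses share one multicast MAC address, which is why forwarding on IP group addresses is more precise than forwarding on mapped MAC addresses.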
Implementation
IGMP snooping is implemented as follows:
1. After IGMP snooping is deployed on a Layer 2 device, the device uses IGMP snooping
to analyze IGMP messages exchanged between hosts and a Layer 3 device and then
creates a Layer 2 multicast forwarding table based on the analysis. Information in
forwarding entries includes VLAN IDs or VSI names, multicast source addresses,
multicast group addresses, and numbers of ports connected to hosts.
− After receiving an IGMP Query message from an upstream device, the Layer 2
device sets a network-side port as a dynamic router port.
− After receiving a PIM Hello message from an upstream device, the Layer 2 device
sets a network-side port as a dynamic router port.
− After receiving an IGMP Report message from a downstream device or user, the
Layer 2 device sets a user-side port as a dynamic member port.
2. The IGMP snooping-capable Layer 2 device forwards a received packet based on the
Layer 2 multicast forwarding table.
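The table-building and forwarding steps above can be modeled with a toy Python sketch. The port names, VLAN ID, and group address are illustrative only, not the device implementation:

```python
from collections import defaultdict

class IgmpSnoopingTable:
    """Toy model of a Layer 2 multicast forwarding table keyed by
    (VLAN, group address)."""
    def __init__(self):
        self.entries = defaultdict(set)  # (vlan, group) -> member ports
        self.router_ports = set()

    def on_query_or_hello(self, port):
        # IGMP Query or PIM Hello from upstream: mark a dynamic router port
        self.router_ports.add(port)

    def on_report(self, vlan, group, port):
        # IGMP Report from a downstream host: mark a dynamic member port
        self.entries[(vlan, group)].add(port)

    def forward(self, vlan, group):
        # Forward only to ports that requested the group; an unknown
        # group yields no member ports instead of being broadcast.
        return sorted(self.entries.get((vlan, group), set()))

t = IgmpSnoopingTable()
t.on_report(10, "225.0.0.1", "Port1")
t.on_report(10, "225.0.0.1", "Port2")
print(t.forward(10, "225.0.0.1"))  # ['Port1', 'Port2']
print(t.forward(10, "225.0.0.9"))  # [] -> no receivers, data not sent
```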
Other Functions
IGMP snooping supports all IGMP versions.
IGMP has three versions: IGMPv1, IGMPv2, and IGMPv3. You can specify an IGMP
version for your device.
IGMP snooping enables a Layer 2 device to rapidly respond to Layer 2 network topology
changes.
Multiple Spanning Tree Protocol (MSTP) is usually used to connect Layer 2 devices to
implement rapid convergence. IGMP snooping adapts to this feature by enabling a Layer
2 device to immediately update port information and switch multicast data traffic over a
new forwarding path when the network topology changes, which minimizes multicast
service interruptions.
IGMP snooping allows you to configure a security policy for multicast groups.
This function can be used to limit the range and number of multicast groups that users
can join and to determine whether to receive multicast data packets containing a security
field. It provides refined control over multicast groups and improves network security.
Deployment Scenarios
IGMP snooping can be used on VLANs and virtual private LAN service (VPLS) networks.
Benefits
IGMP snooping deployed on a user-side router offers the following benefits:
Reduced bandwidth consumption
Independent accounting for individual hosts
Background
Multicast data can be transmitted to user terminals over an IP bearer network in either
dynamic or static multicast mode.
In dynamic multicast mode, a device starts to receive and deliver a multicast group's data
after receiving the first Report message for the group. The device stops receiving the
multicast group's data after receiving the last Leave message. The dynamic multicast
mode has both an advantage and a disadvantage:
− Advantage: It reduces bandwidth consumption by restricting multicast traffic.
− Disadvantage: It introduces a delay when a user switches a channel.
In static multicast mode, multicast forwarding entries are configured for each multicast
group on a device. A multicast group's data is delivered to a device, regardless of
whether users are requesting the data from this device. The static multicast mode has the
following advantages and disadvantages:
− Advantages:
Multicast routes are fixed, and multicast paths exist regardless of whether there
are multicast data receivers. Users can change channels without delays,
improving user experience.
Multicast source and group ranges are easy to manage because multicast paths
are stable.
The delay when data is first forwarded is minimal because static routes already
exist and do not need to be established the way dynamic multicast routes do.
− Disadvantages:
Each device on a multicast data transmission path must be manually
configured. The configuration workload is heavy.
Sub-optimal multicast forwarding paths may be generated because
downstream ports are manually specified on each device.
When a network topology or unicast routes change, static multicast paths may
need to be reconfigured. The configuration workload is heavy.
Multicast routes exist even when no multicast data needs to be forwarded. This
wastes network resources and creates high bandwidth requirements.
A Layer 2 multicast forwarding table can be dynamically built using IGMP snooping or be
manually configured. Choose the dynamic or static mode based on network quality
requirements and the types of services to be delivered.
If network bandwidth is sufficient and hosts require multicast data for specific multicast
groups from a router port for a long period of time, choose static Layer 2 multicast to
implement stable multicast data transmission on a metropolitan area network (MAN) or
bearer network. After static Layer 2 multicast is deployed on a device, multicast entries on the
device do not age and users attached to the device can stably receive multicast data for
specific multicast groups.
Related Concepts
Static router ports or member ports are used in static Layer 2 multicast.
Static router ports are used to receive multicast traffic.
Static member ports are used to send data for specific multicast groups.
Benefits
Static Layer 2 multicast offers the following benefits:
Simplified network management
Reduced network delays
Improved information security by preventing unregistered users from receiving multicast
packets
Background
IGMPv3 supports source-specific multicast (SSM), but IGMPv1 and IGMPv2 do not. The
majority of the latest multicast devices support IGMPv3, but most legacy multicast terminals
only support IGMPv1 or IGMPv2. SSM mapping is a transition solution that provides SSM
services for such legacy multicast terminals. Using rules that specify the mapping from a
particular multicast group to a source-specific group, SSM mapping can convert IGMPv1 or
IGMPv2 messages whose group addresses are within the SSM range to IGMPv3 messages.
This mechanism allows hosts running IGMPv1 or IGMPv2 to access SSM services. SSM
mapping allows IGMPv1 or IGMPv2 terminals to access only specific sources, thus
minimizing the risks of attacks on multicast sources.
Layer 2 SSM mapping is used to implement SSM mapping on Layer 2 networks. For example,
on the network shown in Figure 1-1036, the Layer 3 device runs IGMPv3 and directly
connects to a Layer 2 device. Host A runs IGMPv3, Host B runs IGMPv2, and Host C runs
IGMPv1 on the Layer 2 network. If the IGMP versions of Host B and Host C cannot be
upgraded to IGMPv3, Layer 2 SSM mapping needs to be configured on the Layer 2 device to
provide SSM services for all hosts on the network segment.
Implementation
If SSM mapping is configured on a multicast device and mapping between group addresses
and source addresses is configured, the multicast device will perform the following actions
after receiving a (*, G) message from a host running IGMPv1 or IGMPv2:
If the message's multicast group address is not in the SSM group address range, the
device processes the message in the same manner as it processes an IGMPv1 or IGMPv2
message.
If the message's multicast group address is in the SSM group address range, the device
maps the (*, G) message into (S, G) messages based on mapping rules.
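The two-branch conversion rule can be sketched in Python. The range 232.0.0.0/8 is the well-known IPv4 SSM address block; the mapping rules and source addresses below are hypothetical examples, not values taken from this product:

```python
SSM_RANGE = ("232.0.0.0", "232.255.255.255")  # well-known IPv4 SSM range
# Hypothetical configured mapping rules: group -> list of sources
MAPPING_RULES = {"232.1.1.1": ["10.1.1.1", "10.2.2.2"]}

def ip_tuple(ip):
    return tuple(int(o) for o in ip.split("."))

def ssm_map(group):
    """Convert a (*, G) membership from an IGMPv1/v2 host into
    (S, G) states when G falls in the SSM range."""
    if not (ip_tuple(SSM_RANGE[0]) <= ip_tuple(group) <= ip_tuple(SSM_RANGE[1])):
        return None                       # process as an ordinary IGMPv1/v2 message
    sources = MAPPING_RULES.get(group, [])
    return [(s, group) for s in sources]  # (S, G) states, as if from IGMPv3

print(ssm_map("225.0.0.1"))  # None: outside the SSM range
print(ssm_map("232.1.1.1"))  # [('10.1.1.1', '232.1.1.1'), ('10.2.2.2', '232.1.1.1')]
```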
Benefits
Layer 2 SSM mapping offers the following benefits:
Enables IGMPv1/v2 terminal users to enjoy SSM services.
Better protects multicast sources against attacks.
Background
Forwarding entries are generated when a Layer 3 device (PE on the network shown in Figure
1-1037) exchanges IGMP messages with user hosts. If there are many user hosts, excessive
IGMP messages will reduce the forwarding capability of the Layer 3 device.
To resolve this issue, deploy IGMP snooping proxy on a Layer 2 device (CE on the network
shown in Figure 1-1037) that connects the Layer 3 device and hosts. IGMP snooping proxy
enables a Layer 2 device to behave as both a Layer 3 device and a user host, so that the Layer
2 device can terminate IGMP messages to be transmitted between the Layer 3 device and user
host. IGMP snooping proxy enables a Layer 2 device to perform the following operations:
Periodically send Query messages to hosts and receive Report and Leave messages from
hosts.
Maintain group member relationships.
Send Report and Leave messages to a Layer 3 device.
Forward multicast traffic only to hosts who require it.
After IGMP snooping proxy is deployed on a Layer 2 device, the Layer 2 device is no longer
a transparent message forwarder between a Layer 3 device and user hosts. Furthermore,
the Layer 3 device only recognizes the Layer 2 device and is unaware of user hosts.
Implementation
A device that runs IGMP snooping proxy establishes and maintains a multicast forwarding
table and sends multicast data to users based on this table. IGMP snooping proxy implements
the following functions:
IGMP snooping proxy implements the querier function for upstream devices, enabling a
Layer 2 device to send Query messages on behalf of its interworking upstream device.
The querier function must be enabled on a Layer 2 device, either directly or by enabling
IGMP snooping proxy, if the interworking upstream device cannot send IGMP Query
messages or if static multicast groups are configured on the upstream device.
IGMP snooping proxy enables a Layer 2 device to suppress Report and Leave messages
if large numbers of users frequently join or leave multicast groups. This function reduces
message processing workload for upstream devices.
− After receiving the first Report message for a multicast group from a user host, the
device checks whether an entry has been created for this group. If an entry has not
been created, the device sends the Report message to its upstream device and
creates an entry for this group. If an entry has been created, the device adds the host
to the multicast group and does not send a Report message to its upstream device.
− After receiving a Leave message for a group from a user host, the device sends a
group-specific query message to check whether there are any members of this group.
If there are members of this group, the device deletes only the user from the group.
If there are no other members of this group, the device considers the user as the last
member of the group and sends a Leave message to its upstream device.
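The Report/Leave suppression described above can be modeled with a short Python sketch. It is a simplification: it omits the group-specific query and its timer, and treats the last Leave as immediately final:

```python
class SnoopingProxy:
    """Toy sketch of Report/Leave suppression on a Layer 2 device."""
    def __init__(self):
        self.members = {}   # group -> set of member hosts
        self.upstream = []  # messages actually forwarded upstream

    def on_report(self, group, host):
        first = group not in self.members     # first Report for this group?
        self.members.setdefault(group, set()).add(host)
        if first:                             # only the first Report goes up
            self.upstream.append(("Report", group))

    def on_leave(self, group, host):
        self.members.get(group, set()).discard(host)
        if not self.members.get(group):       # last member left the group
            self.members.pop(group, None)
            self.upstream.append(("Leave", group))

p = SnoopingProxy()
p.on_report("225.0.0.1", "HostA")
p.on_report("225.0.0.1", "HostB")   # suppressed: entry already exists
p.on_leave("225.0.0.1", "HostA")    # suppressed: HostB still a member
p.on_leave("225.0.0.1", "HostB")    # last member: Leave goes upstream
print(p.upstream)  # [('Report', '225.0.0.1'), ('Leave', '225.0.0.1')]
```

However many hosts join or leave, the upstream device sees at most one Report and one Leave per group, which is the workload reduction the feature provides.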
Deployment Scenarios
IGMP snooping proxy can be used on VLANs and VPLS networks.
Benefits
IGMP snooping proxy deployed on a user-side Layer 2 router offers the following benefits:
Reduced bandwidth consumption
Reduced workload on Layer 3 devices directly connected to the Layer 2 router
Background
In traditional multicast on-demand mode, if users in different VLANs require the same
multicast flow, an upstream device of a Layer 2 device must send a copy of the multicast flow
for each user. This mode wastes bandwidth and imposes additional burdens on the upstream
device.
The multicast VLAN function can be used to address this problem. With the help of IGMP
snooping, the multicast VLAN function moves the multicast replication point downstream to
an edge device, so that only one multicast flow is replicated on an upstream device for
different VLANs that require the same flow.
For example, on the network shown in Figure 1-1038, the CE is a Layer 2 device, and the PE
is an upstream device of the CE. Users in VLANs 11 and 12 require the same multicast flow
from the CE. After multicast VLAN 3 is configured on the CE, the PE sends only one copy of
the multicast flow to VLAN 3. The CE then sends a copy of the multicast flow to VLAN 11
and VLAN 12. The PE no longer needs to send identical multicast data flows downstream.
This mode saves network bandwidth and relieves the load on the PE.
Figure 1-1038 Multicast flow replication before and after multicast VLAN is configured
The following uses the network shown in Figure 1-1038 as an example to describe why
multicast VLAN requires IGMP snooping proxy to be enabled.
If IGMP snooping proxy is not enabled on VLAN 3 and users in different VLANs want
to join the same group, the CE forwards each user's IGMP Report message to the PE.
Similarly, if users in different VLANs leave the same group, the CE also needs to
forward each user's IGMP Leave message to the PE.
If IGMP snooping proxy is enabled on VLAN 3 and users in different VLANs want to
join the same group, the CE forwards only one IGMP Report message to the PE. If the
last member of the group leaves, the CE sends an IGMP Leave message to the PE. This
reduces network-side bandwidth consumption on the CE and performance pressure on
the PE.
Related Concepts
The following concepts are involved in the multicast VLAN function:
Multicast VLAN: is a VLAN to which the interface connected to a multicast source
belongs. A multicast VLAN is used to aggregate multicast flows.
User VLAN: is a VLAN to which a group member host belongs. A user VLAN is used to
receive multicast flows from a multicast VLAN.
Implementation
The multicast VLAN implementation process can be divided into two parts:
Protocol packet forwarding
− After the user VLAN tag in an IGMP Report message is replaced with a
corresponding multicast VLAN tag, the message is sent out through a router port of
the multicast VLAN.
− After the multicast VLAN tag in an IGMP Query message is replaced with a
corresponding user VLAN tag, the message is sent out through a member port of
the user VLAN.
− Entries learned through IGMP snooping in user VLANs are added to the table of the
multicast VLAN.
Multicast data forwarding
After receiving a multicast data packet from an upstream device, a Layer 2 device
searches its multicast forwarding table for a matching entry.
− If a matching forwarding entry exists, the Layer 2 device will identify the
downstream ports and their VLAN IDs, replicate the multicast data packet on each
downstream port, and send a copy of the packet to user VLANs.
− If no matching forwarding entry exists, the Layer 2 device will discard the multicast
data packet.
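The tag replacement and replication steps can be illustrated with a toy Python sketch using the VLAN numbers from Figure 1-1038 (VLAN 3 as the multicast VLAN, VLANs 11 and 12 as user VLANs); the port names are illustrative:

```python
MCAST_VLAN = 3          # multicast VLAN: aggregates flows from the source
USER_VLANS = {11, 12}   # user VLANs: receive flows from the multicast VLAN

def upstream_report(msg):
    """The user VLAN tag in an IGMP Report is replaced with the
    multicast VLAN tag before the message goes out a router port."""
    if msg["vlan"] in USER_VLANS:
        return {**msg, "vlan": MCAST_VLAN}
    return msg

def replicate_data(packet, member_ports):
    """One flow arrives on the multicast VLAN; a copy is made per
    downstream member port, re-tagged with that port's user VLAN."""
    return [{**packet, "vlan": vlan, "port": port}
            for port, vlan in member_ports]

print(upstream_report({"type": "Report", "vlan": 11})["vlan"])  # 3
copies = replicate_data({"group": "225.0.0.1", "vlan": MCAST_VLAN},
                        [("Port1", 11), ("Port2", 12)])
print([c["vlan"] for c in copies])  # [11, 12]
```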
Other Functions
A user VLAN allows you to configure the querier election function. The following uses the
network shown in Figure 1-1039 as an example to describe the querier election function.
On the network shown in Figure 1-1039:
A CE connects to Router A through both Router B and Router C, which improves the
reliability of data transmission. The querier function is enabled on Router B and Router
C.
Multicast VLAN is enabled on Router B and Router C. VLAN 11 is a multicast VLAN,
and VLAN 22 is a user VLAN.
Both Router B and Router C in VLAN 11 are connected to VLAN 22. As a result, VLAN 22
will receive two identical copies for the same requested multicast flow from Router B and
Router C, causing data redundancy.
To address this problem, configure querier election on Router B and Router C in the user
VLAN and specify one of them to send Query messages and forward multicast data flows. In
this manner, VLAN 22 receives only one copy of a multicast data flow from the upstream
Router A over VLAN 11.
A querier is elected as follows in a user VLAN (the network shown in Figure 1-1039 is used
as an example):
1. After receiving a Query message from Router A, Router B and Router C replace the
source IP address of the Query message with their own local source IP address (1.1.1.1
for Router B and 1.1.1.2 for Router C).
2. Router B and Router C exchange Query messages. Based on the querier election
algorithm, Router B with a smaller source IP address is elected as a querier.
3. As a querier, Router B generates a forwarding entry after receiving a Join message from
VLAN 22, while Router C does not generate a forwarding entry. Then, multicast data
flows from upstream devices are forwarded by Router B to VLAN 22.
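The election rule in step 2, where the candidate with the numerically smaller source IP address wins, can be sketched as follows (router names and addresses taken from the example above):

```python
import ipaddress

def elect_querier(candidates):
    """Elect the querier: the candidate whose source IP address is
    numerically smallest wins, per the IGMP querier election rule."""
    return min(candidates, key=lambda c: ipaddress.IPv4Address(c[1]))

routers = [("RouterB", "1.1.1.1"), ("RouterC", "1.1.1.2")]
print(elect_querier(routers))  # ('RouterB', '1.1.1.1')
```

Only the elected querier (Router B here) generates forwarding entries for Joins from the user VLAN, so VLAN 22 receives a single copy of each flow.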
Deployment Scenarios
The multicast VLAN function can be used on VLANs.
Benefits
The multicast VLAN function offers the following benefits:
Reduced bandwidth consumption
Reduced workloads for Layer 3 devices
Simplified management of multicast sources and multicast group members
Principles
With the growing popularity of IPTV applications, multicast services are more widely
deployed than ever. When multicast services are deployed on a Layer 2 network, a number of
problems may arise:
If users join a large number of multicast groups, sparsely distributed multicast groups
will increase performance pressure on network devices.
If network bandwidth is insufficient, the demand for bandwidth resources will exceed the
total network bandwidth, overloading aggregation layer devices and degrading user
experience.
If multicast packets are used to attack a network, network devices become busy
processing attack packets and cannot respond to normal network requests.
On the network shown in Figure 1-1040, Layer 2 multicast entry limit can be deployed on the
UPE and NPEs to address the problems described above. The Layer 2 multicast entry limit
function limits entries of multicast services on a Layer 2 network. This function implements
multicast service access restrictions and refined control on the aggregation network based on
the number of multicast groups. Layer 2 multicast entry limit also enables service providers to
refine content offerings and develop flexible subscriber-specific policies. This prevents the
demand for bandwidth resources from exceeding the total bandwidth of the aggregation
network and improves service quality for users.
Related Concepts
Entry limit: provides rules to limit the number of multicast groups, implementing control over
multicast entry learning.
Implementation
If IGMP snooping is enabled, Layer 2 multicast entry limit can be used to control multicast
services. Multicast entry limit constrains the generation of multicast forwarding entries. When
a specified threshold is reached, no more forwarding entries will be generated. This conserves
the processing capacity of devices and controls link bandwidth.
Layer 2 multicast entry limit can be classified by usage scenario as follows:
VLAN scenario:
− Layer 2 multicast entry limit in a VLAN
− Layer 2 multicast entry limit on an interface
− Layer 2 multicast entry limit in a VLAN on a specified interface
VPLS scenario:
− Layer 2 multicast entry limit in a VSI
− Layer 2 multicast entry limit on a sub-interface
− Layer 2 multicast entry limit on a PW
Layer 2 multicast entry limit can restrict the following items:
Number of multicast groups
The number of multicast groups allowed can be limited when a device creates Layer 2
multicast forwarding entries. This protects device and network performance by limiting
the number of groups available for users to join. After IGMP Report messages are
received from downstream user hosts, the device checks entry limit statistics to
determine whether the threshold for the number of multicast groups has been reached. If
the threshold has not been reached, a forwarding entry is generated and entry limit
statistics are updated to show the increase in groups. If the threshold has been reached,
no entry is generated. When IGMP Leave messages are received or entries age, the
entries are deleted and entry limit statistics are updated.
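The threshold check described above can be modeled with a minimal Python sketch; the limit value and group addresses are illustrative:

```python
class EntryLimiter:
    """Toy sketch: refuse new Layer 2 forwarding entries once a
    configured group-count threshold is reached."""
    def __init__(self, limit):
        self.limit = limit
        self.groups = set()   # groups with forwarding entries

    def on_report(self, group):
        if group in self.groups:
            return True                  # entry already exists: accept
        if len(self.groups) >= self.limit:
            return False                 # threshold reached: no new entry
        self.groups.add(group)           # create entry, update statistics
        return True

    def on_leave(self, group):
        self.groups.discard(group)       # entry deleted, statistics updated

lim = EntryLimiter(limit=2)
print([lim.on_report(g) for g in ("225.0.0.1", "225.0.0.2", "225.0.0.3")])
# [True, True, False]
lim.on_leave("225.0.0.1")
print(lim.on_report("225.0.0.3"))  # True: a slot was freed by the Leave
```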
Deployment Scenarios
Layer 2 multicast entry limit can be used on VLANs and VPLS networks.
Benefits
Layer 2 multicast entry limit offers the following benefits:
Prevents required bandwidth resources from exceeding the total bandwidth of the
aggregation network and improves service quality for users.
Improves multicast service security.
Background
With the growing popularity of IPTV applications, multicast services are more widely
deployed than ever. When multicast services are deployed on a Layer 2 network, a number of
problems may arise:
If users join a large number of multicast groups, sparsely distributed multicast groups
will increase performance pressure on network devices.
If network bandwidth is insufficient, the demand for bandwidth resources will exceed the
total bandwidth of the network, overloading aggregation layer devices and degrading
user experience.
If multicast packets are used to attack a network, network devices become busy
processing attack packets and cannot respond to normal network requests.
If static multicast group management policies are used, user requests for access to a
variety of different multicast services cannot be met. Service providers expect more
refined channel management. For example, they expect to limit the number and
bandwidth of multicast groups in channels.
On the network shown in Figure 1-1041, Layer 2 multicast CAC can be deployed on the UPE
and NPEs to address the problems described above. Layer 2 multicast CAC controls multicast
services on the aggregation network based on different criteria, including the multicast group
quantity and bandwidth limit for a channel or sub-interface. Layer 2 multicast CAC enables
service providers to refine content offerings and develop flexible subscriber-specific policies.
This prevents the demand for bandwidth resources from exceeding the total bandwidth of the
aggregation network and ensures service quality for users.
Related Concepts
The following concepts are involved in multicast CAC.
Call Admission Control (CAC): provides a series of rules for controlling multicast entry
learning, including the multicast group quantity and bandwidth limits for each multicast
group, as well as for each channel. Layer 2 multicast CAC is used to perform CAC
operations for multicast services on Layer 2 networks.
Channel: consists of a series of multicast groups, each of which can have its own
bandwidth attribute. For example, a TV channel may consist of two groups, TV-1 and TV-5,
with bandwidths of 4 Mbit/s and 18 Mbit/s, respectively.
Implementation
Layer 2 multicast CAC constrains the generation of multicast forwarding entries. When a
preset threshold is reached, no more forwarding entries can be generated. This ensures that
devices have adequate processing capabilities and controls link bandwidth.
Layer 2 multicast CAC can restrict the following items:
Restriction on the number and bandwidth of multicast groups
The number of multicast groups allowed can be limited when a device creates Layer 2
multicast forwarding entries. This protects device and network performance by limiting
the number of groups available for users to join. After IGMP Report messages are
received from downstream user hosts, the device checks CAC statistics to determine
whether the threshold for the number of multicast groups has been reached. If the
threshold has not been reached, a forwarding entry is generated and CAC statistics are
updated to show the increase in groups. If the threshold has been reached, no entry is
generated. When IGMP Leave messages are received or entries age, the entries are
deleted and CAC statistics are updated.
If the bandwidth of each multicast group is fixed and each group uses approximately the
same amount of bandwidth, the total bandwidth for multicast traffic is basically fixed.
For example, if there are 20 multicast groups and each multicast group has 4 kbit/s of
bandwidth, the total bandwidth for multicast traffic is 80 kbit/s. If there are 20 multicast
groups and the bandwidth values of the multicast groups are different, some being 4
kbit/s and the others being 18 kbit/s, the total bandwidth for multicast traffic cannot be
determined. In a case like this, setting a limit on the number of multicast groups is not
adequate to control bandwidth. Bandwidth usage must be limited.
Restriction on the number and bandwidth of multicast groups in a channel
If a network offers channels for different content providers, the number of multicast
groups and the amount of bandwidth must be limited based on channels.
Before a Layer 2 multicast entry is generated, the multicast group address must be
checked to determine which channel's address range to which this address belongs.
Whether CAC is configured for the address range needs to be checked also. If CAC is
configured for the address range and the number or bandwidth of member multicast
groups exceeds the upper threshold, the Layer 2 entry will not be generated. The Layer 2
entry will be generated only if the number or bandwidth of member multicast groups is
below the upper threshold.
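The channel lookup and per-channel check can be sketched as follows. The channel names, address ranges, and limits are purely illustrative assumptions.

```python
import ipaddress

# Hypothetical channel plan: each channel covers a group-address range
# and carries its own CAC limits.
CHANNELS = {
    "sports": {"range": ipaddress.ip_network("225.0.0.0/28"),
               "max_groups": 8, "max_bw_kbps": 64},
    "news":   {"range": ipaddress.ip_network("225.0.1.0/28"),
               "max_groups": 4, "max_bw_kbps": 16},
}

def find_channel(group_addr):
    """Return the channel whose address range contains the group, if any."""
    addr = ipaddress.ip_address(group_addr)
    for name, chan in CHANNELS.items():
        if addr in chan["range"]:
            return name
    return None

def admit(group_addr, current_groups, current_bw_kbps, group_bw_kbps):
    """Decide whether a Layer 2 entry may be generated for this group."""
    name = find_channel(group_addr)
    if name is None:
        return True  # no channel CAC configured for this address range
    chan = CHANNELS[name]
    return (current_groups + 1 <= chan["max_groups"] and
            current_bw_kbps + group_bw_kbps <= chan["max_bw_kbps"])
```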
Deployment Scenarios
Layer 2 multicast CAC applies to VPLS networks.
Benefits
The Layer 2 multicast CAC feature provides the following benefits:
For providers:
− Provides channel-based restrictions, allowing service providers to implement
refined multicast service management.
− Improves multicast service security.
For users:
− Prevents the required bandwidth from exceeding the total bandwidth of the
aggregation network and ensures service quality for users.
Principles
Multicast services have relatively high demands for real-time transmissions. To ensure
uninterrupted delivery of multicast services, master and backup links and devices are
deployed on a VPLS network with a UPE dual-homed to SPEs. In the networking shown in
Figure 1-1042, a UPE is connected to two SPEs through a VPLS network. The PWs between
the UPE and SPEs work in master/backup mode. Multicast services are delivered from a
multicast source to users attached to the UPE.
This networking allows unicast services to be transmitted properly, but there are problems
with the transmission of multicast services. Multicast protocol and data packets are blocked
on the backup PW and this prevents the backup SPE (SPE2) from learning multicast
forwarding entries. As a result, SPE2 has no forwarding entries, and, in the event of a
master/backup SPE switchover, it cannot begin forwarding multicast data traffic immediately.
The PE must first resend an IGMP Query message and users attached to the UPE must reply
with Report messages before SPE2 can learn multicast forwarding entries through the backup
PW and resume the forwarding of multicast data packets. As a result, services are interrupted
on the UPE for a long period of time, and network reliability is adversely affected.
If the primary and secondary PWs in this networking are hub PWs, split horizon still takes effect,
meaning that protocol and data packets are not transmitted from the primary PW to the secondary PW.
To address this problem, rapid multicast traffic forwarding is configured on the backup device,
SPE2. SPE2 sends an IGMP Query message to the UPE along the backup PW, and receives an
IGMP Report message from the UPE to create a Layer 2 multicast forwarding table. Although
the backup PW cannot be used to forward multicast data traffic, it can be used by SPE2 to
send an IGMP Query message. If there is a switchover and the backup PW becomes the
master, SPE2 has a Layer 2 multicast forwarding table ready to use and can begin forwarding
multicast data traffic immediately. This ensures uninterrupted delivery of multicast services.
Related Concepts
The following concepts are involved in rapid multicast data forwarding on a backup device:
Master and backup devices
Of the two devices to which a device directly connected to user hosts is dual-homed
through a VPLS network, the working device is the master, and the device that protects
the working device is the backup.
Primary and backup links
The physical link between the device directly connected to user hosts and the master
device is the primary link. The physical link between the device directly connected to
user hosts and backup device is the backup link.
Primary and backup PWs
The PW between the device directly connected to user hosts and the master device is the
primary PW. The PW between the device directly connected to user hosts and the backup
device is the backup PW.
Other Functions
If the upstream and downstream devices (PE and UPE) are not allowed to receive IGMP
messages that carry the same source MAC address but are sent from different interfaces, the
backup device needs to be configured to replace the source MAC addresses carried in IGMP
messages.
After rapid multicast traffic forwarding is configured, the UPE receives IGMP Query
messages from both SPE1 and SPE2. Both messages carry the same MAC address. If
MAC-flapping or MAC address authentication has been configured on the UPE, protocol
packets that are received by the UPE through different interfaces but carry the same
source MAC address will be filtered out. The backup SPE can be configured to change
the source MAC addresses of packets to its MAC address before sending IGMP Query
messages along the backup PW. This allows the UPE to learn two different router ports
and send IGMP Report and Leave messages from attached users to SPE1 and SPE2.
Similarly, if MAC-flapping or MAC address authentication has been configured on the
PE, the backup SPE needs to be configured to change the source MAC addresses of
received IGMP Report or Leave messages to its MAC address before sending them to
the PE.
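The source MAC replacement amounts to rewriting the source address field of the Ethernet header before the message is sent on the backup PW. The following sketch shows only that rewrite on a raw frame; it ignores FCS recomputation and any encapsulation, and the addresses used are hypothetical.

```python
def replace_source_mac(frame: bytes, new_src_mac: bytes) -> bytes:
    """Rewrite the source MAC (bytes 6..12 of the Ethernet header) so that
    protocol packets relayed by the backup SPE carry its own MAC address."""
    if len(frame) < 12 or len(new_src_mac) != 6:
        raise ValueError("malformed frame or MAC address")
    return frame[:6] + new_src_mac + frame[12:]

# Example: a truncated frame (destination MAC, source MAC, EtherType only).
frame = (bytes.fromhex("3333ff000001")      # destination MAC (multicast)
         + bytes.fromhex("00005e000101")    # original source MAC
         + b"\x86\xdd")                     # EtherType
rewritten = replace_source_mac(frame, bytes.fromhex("00005e000202"))
assert rewritten[6:12] == bytes.fromhex("00005e000202")
assert rewritten[:6] == frame[:6]  # destination MAC is unchanged
```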
Deployment Scenarios
Rapid multicast data forwarding on a backup device is used on VPLS networks that have a
device dual-homed to upstream devices through PWs.
Benefits
Rapid multicast data forwarding on a backup device provides the following benefit:
After a master/backup device switchover is performed, multicast data can be quickly
forwarded on the backup device. This ensures reliable multicast service transmission and
enhances user experience.
Background
In conventional multicast on-demand mode, if users of a Layer 2 multicast device in different
VLANs or VSIs request the same multicast group's data from the same source, the
connected upstream Layer 3 device has to send a copy of each multicast flow of this group for
each VLAN or VSI. Such implementation wastes bandwidth resources and burdens the
upstream device.
The Layer 2 multicast instance feature, which is an enhancement of multicast VLAN, resolves
these issues by allowing multicast data replication across VLANs and VSIs and supporting
multicast data transmission of the same multicast group across instances. These functions help
save bandwidth resources and simplify multicast group management. A Layer 2 network
supports multiple Layer 2 multicast instances. For example, on the network shown in Figure
1-1043, if users in VLAN 11 and VLAN 22 request multicast data from channels in the
range of 225.0.0.1 to 225.0.0.5, Layer 2 multicast instances can be deployed on the CE. Then,
the CE requests only a single copy of each multicast data flow through VLAN 3 from the
PE, replicates the multicast data flow, and sends a copy to each VLAN. This implementation
greatly reduces bandwidth consumption.
Layer 2 multicast instances allow devices to replicate multicast data flows across different
types of instances, such as flow replication from a VPLS to a VLAN or from a VLAN to a
VPLS.
Related Concepts
Multicast instance
An instance to which the interface connected to a multicast source belongs. A multicast
instance aggregates multicast flows.
User instance
An instance to which the interface connected to a multicast receiver belongs. A user
instance receives multicast flows from a multicast instance.
A multicast instance can be associated with multiple user instances.
Multicast channel
A multicast channel consists of one or more multicast groups. To facilitate service
management, multicast content providers generally operate different types of channels in
different Layer 2 multicast instances. Therefore, multicast channels need to be
configured for Layer 2 multicast instances.
Implementation
After receiving a multicast data packet from an upstream device, a Layer 2 device searches
for a matching entry in the multicast forwarding table based on the multicast instance ID and
the destination address (multicast group address) contained in the packet. If a matching
forwarding entry exists, the Layer 2 device obtains the downstream interfaces and the VLAN
IDs or VSI names, replicates the multicast data packet on each downstream interface, and
sends a copy of the packet to all involved user instances. If no matching forwarding entry exists, the
Layer 2 device broadcasts the multicast data packet in the local multicast VLAN or VSI. This
implementation is similar to multicast VLAN implementation.
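The lookup-and-replicate step can be sketched as a table keyed by multicast instance and group address. The instance names, interfaces, and addresses below are illustrative only.

```python
# (multicast instance, group address) -> list of (downstream interface, user instance)
fwd_table = {
    ("VLAN3", "225.0.0.1"): [("eth1", "VLAN11"), ("eth2", "VLAN22")],
}

def forward(instance, group, payload):
    """Replicate one upstream flow to every associated user instance,
    or flood in the local multicast VLAN/VSI if no entry matches."""
    key = (instance, group)
    if key in fwd_table:
        # One copy per downstream interface, retagged for each user instance.
        return [(iface, user, payload) for iface, user in fwd_table[key]]
    return [("flood", instance, payload)]  # no matching entry: broadcast
```

A single copy received in the multicast instance (VLAN 3 here) thus fans out to the user instances, which is how one upstream flow serves both VLAN 11 and VLAN 22.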
Usage Scenario
Layer 2 multicast instances apply to VLAN and VPLS networks.
Benefits
Layer 2 multicast instances bring the following benefits:
Reduced bandwidth consumption
Improved network security
Isolated unicast and multicast domains to prevent user traffic from affecting each other
Definition
Multicast Listener Discovery Snooping (MLD snooping) is an IPv6 Layer 2 multicast
protocol. The MLD snooping protocol maintains information about the outbound interfaces of
multicast packets by snooping multicast protocol packets exchanged between the Layer 3
multicast device and user hosts. MLD snooping manages and controls multicast packet
forwarding at the data link layer.
Purpose
Similar to an IPv4 multicast network, multicast data on an IPv6 multicast network (especially
on a LAN) has to pass through Layer 2 switching devices. As shown in Figure 1-1044, a
Layer 2 switch is located between multicast users and the Layer 3 multicast device, Router.
After receiving multicast packets from Router, Switch forwards the multicast packets to the
multicast receivers. The destination address of the multicast packets is a multicast group
address. Switch cannot learn multicast MAC address entries, so it broadcasts the multicast
packets in the broadcast domain. All hosts in the broadcast domain will receive the multicast
packets, regardless of whether they are members of the multicast group. This wastes network
bandwidth and threatens network security.
MLD snooping solves this problem. MLD snooping is a Layer 2 multicast protocol on the
IPv6 network. After MLD snooping is configured, Switch can snoop and analyze MLD
messages between multicast users and Router. The Layer 2 multicast device sets up Layer 2
multicast forwarding entries to control forwarding of multicast data. In this way, multicast
data is not broadcast on the Layer 2 network.
Principles
MLD snooping is a basic IPv6 Layer 2 multicast function that forwards and controls multicast
traffic at Layer 2. MLD snooping runs on a Layer 2 device and analyzes MLD messages
exchanged between a Layer 3 device and hosts to set up and maintain a Layer 2 multicast
forwarding table. The Layer 2 device forwards multicast packets based on the Layer 2
multicast forwarding table.
On an IPv6 multicast network shown in Figure 1-1045, after receiving multicast packets from
Router, Switch at the edge of the access layer forwards the multicast packets to receiver hosts.
If Switch does not run MLD snooping, it broadcasts multicast packets at Layer 2. After MLD
snooping is configured, Switch forwards multicast packets only to specified hosts.
With MLD snooping configured, Switch listens on MLD messages exchanged between
Router and hosts. It analyzes packet information (such as packet type, group address, and
receiving interface) to set up and maintain a Layer 2 multicast forwarding table, and forwards
multicast packets based on the Layer 2 multicast forwarding table.
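The table maintenance driven by snooped Report and Done messages can be sketched as follows; this is a simplified model (no timers or router-port handling), and the names are hypothetical.

```python
from collections import defaultdict

class MldSnooping:
    """Sketch: build a Layer 2 forwarding table by snooping MLD messages."""

    def __init__(self):
        self.table = defaultdict(set)  # (VLAN ID, group address) -> member ports

    def on_report(self, vlan, group, port):
        """A host joined: record the receiving port as a member port."""
        self.table[(vlan, group)].add(port)

    def on_done(self, vlan, group, port):
        """A host left: remove the member port; drop the entry when empty."""
        self.table[(vlan, group)].discard(port)
        if not self.table[(vlan, group)]:
            del self.table[(vlan, group)]

    def out_ports(self, vlan, group):
        """Outbound interfaces used when forwarding multicast data."""
        return self.table.get((vlan, group), set())
```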
Figure 1-1045 Multicast packet transmission before and after MLD snooping is configured on a
Layer 2 device
Concepts
As shown in Figure 1-1046, Router connects to the multicast source. MLD snooping is
configured on SwitchA and SwitchB. HostA, HostB, and HostC are receiver hosts.
Figure 1-1046 shows MLD snooping ports. The following table describes these ports.
The router port and member port are outbound interfaces in Layer 2 multicast forwarding
entries. A router port functions as an upstream interface, while a member port functions as a
downstream interface. Port information learned through protocol packets is saved as dynamic
entries, and port information manually configured is saved as static entries.
Besides the outbound interfaces, each entry includes multicast group addresses and VLAN
IDs.
Multicast group addresses can be multicast IP addresses or multicast MAC addresses
mapped from multicast IP addresses. In MAC address-based forwarding mode, multicast
data may be forwarded to hosts that do not require the data because multiple IP addresses
are mapped to the same MAC address. The IP address-based forwarding mode can
prevent this problem.
The VLAN ID specifies a Layer 2 broadcast domain. After multicast VLAN is
configured, the inbound VLAN ID is the multicast VLAN ID, and the outbound VLAN
ID is a user VLAN ID. If multicast VLAN is not configured, both the inbound and
outbound VLAN IDs are the ID of the VLAN to which a host belongs.
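The MAC-address ambiguity mentioned above follows from the mapping rule for IPv6 multicast MAC addresses (33-33 followed by the low 32 bits of the group address), so any two groups sharing those low 32 bits collide:

```python
import ipaddress

def ipv6_group_to_mac(group: str) -> str:
    """Map an IPv6 multicast address to its MAC address:
    the fixed prefix 33-33 plus the low 32 bits of the group address."""
    low32 = ipaddress.ip_address(group).packed[-4:]
    return "33-33-" + "-".join(f"{b:02x}" for b in low32)

# Two distinct groups with identical low 32 bits share one multicast MAC,
# which is why MAC-based forwarding can deliver traffic to hosts that did
# not request it; IP address-based forwarding avoids the collision.
assert ipv6_group_to_mac("ff05::1:3") == ipv6_group_to_mac("ff15::1:3")
```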
Implementation
After MLD snooping is configured, the Layer 2 multicast device processes the received MLD
protocol packets in different ways and sets up Layer 2 multicast forwarding entries.
NOTE
Aging time of a dynamic router port = Robustness variable × General query
interval + Maximum response time for General Query messages
A Multicast-Address-Specific Query or
Multicast-Address-and-Source-Specific Query message is forwarded to the
ports connected to members of the specific groups.
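As a worked example of the aging-time formula, assume common MLD defaults (these values are an assumption, not device defaults): robustness variable 2, general query interval 125 seconds, maximum response time 10 seconds.

```python
# Aging time of a dynamic router port, per the formula above.
robustness = 2         # robustness variable (assumed default)
query_interval = 125   # general query interval, in seconds (assumed default)
max_response = 10      # maximum response time for General Query messages, seconds

aging_time = robustness * query_interval + max_response
assert aging_time == 260  # seconds
```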
Upon receiving an IPv6 PIM Hello message, a Layer 2 device forwards the message to all
ports excluding the port that receives the Hello message. The Layer 2 device processes the
receiving port as follows:
If the port is included in the router port list, the device resets the aging timer of the router
port.
If the port is not in the router port list, the device adds it to the list and starts the aging
timer.
When the Layer 2 device receives an IPv6 PIM Hello message, it sets the aging time of the router port to
the Holdtime value in the Hello message.
If a static router port is configured, the Layer 2 device forwards received MLD Report and
Done messages to the static router port. If a static member port is configured for a multicast
group, the Layer 2 device adds the port to the outbound interface list for the multicast group.
After a Layer 2 multicast forwarding table is set up, the Layer 2 device searches the multicast
forwarding table for outbound interfaces of multicast data packets according to the VLAN IDs
and destination addresses (IPv6 group addresses) of the packets. If outbound interfaces are
found for a packet, the Layer 2 device forwards the packet to all the member ports of the
multicast group. If no outbound interface is found, the Layer 2 device drops the packet or
broadcasts the packet in the VLAN.
With MLD snooping proxy configured, Switch can terminate MLD Query messages sent from
Router and MLD Report/Done messages sent from downstream hosts. When receiving these messages,
Switch constructs new messages and sends them to Router.
After MLD snooping proxy is deployed on the Layer 2 device, the Layer 3 device considers
that it interacts with only one user. The Layer 2 device interacts with the upstream device and
downstream hosts. The MLD snooping proxy function conserves bandwidth by reducing
MLD message exchanges. In addition, MLD snooping proxy functions as a querier to process
protocol messages received from downstream hosts and maintain group memberships. This
reduces the load of the upstream Layer 3 device.
Implementation
A device that runs MLD snooping proxy sets up and maintains a Layer 2 multicast forwarding
table and sends multicast data to hosts based on the multicast forwarding table. Table 1-328
describes how the MLD snooping proxy device processes MLD messages.
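The aggregation behavior, in which the upstream router sees the proxy as a single host, can be sketched as follows: the proxy relays a Report only for the first member of a group and a Done only after the last member leaves. This is a simplified model; real proxies also run querier logic and timers.

```python
class MldSnoopingProxy:
    """Sketch: terminate downstream MLD messages and send the upstream
    router at most one Report/Done per group, aggregating membership."""

    def __init__(self, send_upstream):
        self.members = {}                  # group -> set of downstream ports
        self.send_upstream = send_upstream  # callback toward the Layer 3 device

    def on_report(self, group, port):
        first = group not in self.members
        self.members.setdefault(group, set()).add(port)
        if first:
            self.send_upstream(("Report", group))  # only the first join goes up

    def on_done(self, group, port):
        ports = self.members.get(group, set())
        ports.discard(port)
        if group in self.members and not ports:
            del self.members[group]
            self.send_upstream(("Done", group))    # only the last leave goes up
```

Every intermediate join or leave is absorbed locally, which is how the proxy reduces the number of MLD messages the upstream device must process.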
1.11.11.3 Applications
1.11.11.3.1 Application of Layer 2 Multicast for IPTV Services
Service Overview
IPTV services are video services provided for users through an IP network. IPTV services
pose high requirements for bandwidth, real-time transmission, and reliability on IP MANs.
Multiple users can receive the same IPTV service data simultaneously.
Given the characteristics of IPTV, multicast technologies can be used to bear IPTV services.
Compared with traditional unicast, multicast ensures that network bandwidth demands do not
increase with the number of users and reduces the workload of video servers and the bearer
network. If service providers want to deploy IPTV services in a rapid and economical way,
E2E multicast push is recommended.
Network Description
Currently, the IP MAN consists of a metro backbone network and broadband access network.
IPTV service traffic is pushed to user terminals through the metro backbone network and
broadband access network in sequence. Figure 1-1048 shows an E2E IPTV service push
model. The metro backbone network is mainly composed of network layer (Layer 3) devices.
PIM such as PIM-SM is used on each device on the metro backbone to connect to the
multicast source and IGMP is used on the devices directly connected to the broadband access
network to forward multicast packets to user terminals. The broadband access network is
mainly composed of data link layer (Layer 2) devices. Layer 2 multicast techniques such as
IGMP proxy or IGMP snooping can be used on Layer 2 devices to forward multicast packets
to terminal users.
The following section describes Layer 2 multicast features used on the broadband access
network.
Feature Deployment
The broadband access network is constructed using Layer 2 devices. Layer 2 devices
exchange or forward data frames by MAC address. They have weak IP packet parsing and
routing capabilities. As a result, the Layer 2 devices do not support Layer 3 multicast
protocols. Previously, Layer 2 devices broadcast IPTV multicast traffic to all interfaces, which
easily results in broadcast storms.
To solve the problem of multicast packet flooding, commonly used Layer 2 multicast
forwarding techniques, such as IGMP snooping, IGMP proxy, and multicast VLAN, can be
used.
Deploy IGMP snooping on all Layer 2 devices, so that they listen to IGMP messages
exchanged between Layer 3 devices and user terminals and maintain multicast group
memberships, implementing on-demand multicast traffic forwarding.
Deploy IGMP snooping proxy on CEs close to user terminals, so that the CEs listen to,
filter, and forward IGMP messages. This reduces the number of multicast protocol
packets directly exchanged between CEs and upstream devices, and reduces packet
processing pressure on upstream devices.
Deploy multicast VLAN on CEs close to user terminals to reduce the network
bandwidth required for transmissions between CEs and multicast sources.
The following features can also be deployed on Layer 2 devices:
VSI or VLAN-based Layer 2 multicast instance (a multicast VLAN enhancement) can be
deployed on CEs close to user terminals to reduce the network bandwidth required for
transmissions between CEs and multicast sources.
If the number of user terminals attached to a CE exceeds the number of IPTV channels,
static multicast groups can be configured on the CE to increase the channel change speed
and improve the QoS for IPTV services.
If user hosts support IGMPv1 and IGMPv2 only, SSM mapping can be deployed on the
CE connected to these user terminals so the user hosts can access SSM services.
Rapid multicast traffic forwarding can be deployed on a backup PE to improve the
reliability of links between the PE and CE.
This example uses an IPTV channel with a bandwidth of 2 Mbit/s.
If a Layer 2 device uses no Layer 2 multicast forwarding technology, the device forwards
multicast packets to all IPTV users. Broadcasting multicast packets for five IPTV
channels leads to network congestion. This is the case even if the bandwidth of the
interface connecting the Layer 2 device to users is 10 Mbit/s.
After Layer 2 multicast forwarding technologies are used on the Layer 2 device, the
Layer 2 device sends multicast packets only to users that require the multicast packets. If
each interface of the Layer 2 device is connected to at least one IPTV user terminal,
multicast packets (2 Mbit/s traffic) for at most one IPTV channel are forwarded to
corresponding interfaces. This ensures the availability of adequate network bandwidth
and the quality of user experience.
Networking Description
As shown in Figure 1-1049, a multicast source exists on an IPv6 PIM network and provides
multicast video services for users on the LAN. Some users such as HostA and HostC on the
LAN want to receive video data in multicast mode. To prevent multicast data from being
broadcast on the LAN, configure MLD snooping on Layer 2 multicast devices to accurately
forward multicast data on the Layer 2 network, which prevents bandwidth waste and network
information leakage.
Deployed Features
You can deploy the following features to accurately forward multicast data on the network
shown in Figure 1-1049:
IPv6 PIM and MLD on the Layer 3 multicast device Router to route multicast data to
user segments.
MLD snooping on the Layer 2 device Switch so that Switch can set up and maintain a
Layer 2 multicast forwarding table to forward multicast data to specified users.
MLD snooping proxy after configuring MLD snooping on Switch to release Router from
processing a large number of MLD messages.
Terms
Term Definition
(*, G) A multicast routing entry used in the ASM model. * indicates
any source, and G indicates a multicast group.
(*, G) applies to all multicast messages with the multicast
group address as G. That is, all the multicast messages sent to
G are forwarded through the downstream interface of the (*, G)
entry, regardless of which multicast sources send the multicast
messages.
(S, G) A multicast routing entry used in the SSM model. S indicates a
multicast source, and G indicates a multicast group.
After a multicast packet with S as the source address and G as
the group address reaches a router, it is forwarded through the
downstream interfaces of the (S, G) entry.
A multicast packet that contains a specified source address is
expressed as an (S, G) packet.
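The two entry types interact in forwarding lookups: an (S, G) entry, being more specific, is preferred over a (*, G) entry for the same group. A minimal sketch of that preference (entry contents are hypothetical):

```python
def lookup(routes, source, group):
    """Prefer the more specific (S, G) entry; fall back to (*, G)."""
    if (source, group) in routes:
        return routes[(source, group)]
    return routes.get(("*", group))

routes = {
    ("*", "225.1.1.1"): ["if1"],          # (*, G): any source sending to G
    ("10.0.0.9", "225.1.1.1"): ["if2"],   # (S, G): one specific source
}
assert lookup(routes, "10.0.0.9", "225.1.1.1") == ["if2"]  # (S, G) match wins
assert lookup(routes, "10.0.0.7", "225.1.1.1") == ["if1"]  # falls back to (*, G)
```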
1.12 MPLS
1.12.1 About This Document
Purpose
This document describes the MPLS feature in terms of its overview, principles, and
applications.
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
Network planning engineers
Commissioning engineers
Data configuration engineers
System maintenance engineers
Security Declaration
Encryption algorithm declaration
The encryption algorithms DES/3DES/SKIPJACK/RC2/RSA (RSA-1024 or
lower)/MD2/MD4/MD5 (in digital signature scenarios and password encryption)/SHA1
(in digital signature scenarios) have low security and may bring security risks. If the
protocols allow, using more secure encryption algorithms, such as AES/RSA
(RSA-2048 or higher)/SHA2/HMAC-SHA2, is recommended.
Password configuration declaration
− Do not set both the start and end characters of a password to "%^%#". This causes
the password to be displayed directly in the configuration file.
− To further improve device security, periodically change the password.
Personal data declaration
Your purchased products, services, or features may use some of users' personal data during
service operation or fault locating. You must define user privacy policies in compliance
with local laws and take proper measures to fully protect personal data.
Feature declaration
− The NetStream feature may be used to analyze the communication information of
terminal customers for network traffic statistics and management purposes. Before
enabling the NetStream feature, ensure that it is performed within the boundaries
permitted by applicable laws and regulations. Effective measures must be taken to
ensure that information is securely protected.
− The mirroring feature may be used to analyze the communication information of
terminal customers for a maintenance purpose. Before enabling the mirroring
function, ensure that it is performed within the boundaries permitted by applicable
laws and regulations. Effective measures must be taken to ensure that information is
securely protected.
− The packet header obtaining feature may be used to collect or store some
communication information about specific customers for transmission fault and
error detection purposes. Huawei cannot offer services to collect or store this
information unilaterally. Before enabling the function, ensure that it is performed
within the boundaries permitted by applicable laws and regulations. Effective
measures must be taken to ensure that information is securely protected.
Reliability design declaration
Network planning and site design must comply with reliability design principles and
provide device- and solution-level protection. Device-level protection includes planning
principles of dual-network and inter-board dual-link to avoid single point or single link
of failure. Solution-level protection refers to a fast convergence mechanism, such as FRR
and VRRP.
Special Declaration
This document serves only as a guide. The content is written based on device
information gathered under lab conditions. The content provided by this document is
intended to be taken as general guidance, and does not cover all scenarios. The content
provided by this document may be different from the information on user device
interfaces due to factors such as version upgrades and differences in device models,
board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are
beyond the scope of this document.
The maximum values provided in this document are obtained in specific lab
environments (for example, only a certain type of board or protocol is configured on a
tested device). The actually obtained maximum values may be different from the
maximum values provided in this document due to factors such as differences in
hardware configurations and carried services.
Interface numbers used in this document are examples. Use the existing interface
numbers on devices for configuration.
The pictures of hardware in this document are for reference only.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
Indicates an imminently hazardous situation which, if not
avoided, will result in death or serious injury.
Change History
Updates between document issues are cumulative. Therefore, the latest document issue
contains all updates made in previous issues.
Changes in Issue 03 (2017-09-20)
This issue is the third official release. The software version of this issue is
V800R009C10SPC200.
Changes in Issue 02 (2017-07-30)
This issue is the second official release. The software version of this issue is
V800R009C10SPC100.
Changes in Issue 01 (2017-05-30)
This issue is the first official release. The software version of this issue is
V800R009C10.
Background
The IP-based Internet prevailed in the mid-1990s. IP technology is simple and inexpensive
to deploy. However, IP forwarding, which relies on the longest-match algorithm, is not the
most efficient choice for forwarding packets.
In comparison, asynchronous transfer mode (ATM) is much more efficient at forwarding
packets. However, ATM technology is a complex protocol with a high deployment cost, which
has hindered its widespread popularity and growth.
Users wanted a technology that combines the best of what IP and ATM have to offer. MPLS
technology emerged to meet this need.
Multiprotocol Label Switching (MPLS) is designed to increase forwarding rates. Unlike IP
technology, MPLS analyzes packet headers on the edge of a network, not at each hop.
Therefore, packet processing time is shortened.
MPLS supports multi-layer labels, and its forwarding plane is connection-oriented. MPLS is
widely used in virtual private network (VPN), traffic engineering (TE), and quality of service
(QoS) scenarios.
Overview
MPLS operates between the data link layer and the network layer in the TCP/IP protocol stack.
MPLS supports label switching between multiple network protocols, as implied by its name.
MPLS can use any Layer 2 media to transfer packets, but is not limited by any specific
protocol on the data link layer.
MPLS is derived from the Internet Protocol version 4 (IPv4). The core MPLS technology can
be extended to multiple network protocols, such as the Internet Protocol version 6 (IPv6),
Internet Packet Exchange (IPX), Appletalk, DECnet, and Connectionless Network Protocol
(CLNP). Multiprotocol in MPLS means that the protocol supports multiple network protocols.
The MPLS technology supports multiple protocols and services and improves data
transmission security.
1.12.2.2 Principles
1.12.2.2.1 Concepts
All LSRs on the MPLS network forward data based on labels. When an IP packet enters an
MPLS network, an LER adds a label to it. Before the IP packet leaves the MPLS network,
another LER removes the label.
The path that MPLS packets take on an MPLS network is called a label switched path (LSP).
Label
A label is a 20-bit identifier that uniquely identifies the FEC to which a packet belongs. Upon
receiving an IP packet from a non-MPLS network, the ingress of an LSP creates an MPLS
header in the packet and inserts a specific label into this field, which turns the IP packets into
MPLS packets. A label is only meaningful to a local end. A FEC can be mapped to multiple
incoming labels to balance loads, but a label only represents a single FEC.
Figure 1-1052 illustrates the structure of an MPLS header.
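The 32-bit MPLS header can be packed and unpacked as follows. This sketch assumes the standard field layout: a 20-bit label, a 3-bit traffic class field (originally named EXP), a 1-bit bottom-of-stack flag (S), and an 8-bit TTL.

```python
def encode_mpls(label, tc, s, ttl):
    """Pack one 32-bit MPLS header: label(20) | TC(3) | S(1) | TTL(8)."""
    word = (label << 12) | (tc << 9) | (s << 8) | ttl
    return word.to_bytes(4, "big")

def decode_mpls(data):
    """Unpack the first 4 bytes of an MPLS label stack entry."""
    word = int.from_bytes(data[:4], "big")
    return {"label": word >> 12, "tc": (word >> 9) & 0x7,
            "s": (word >> 8) & 0x1, "ttl": word & 0xFF}

# Round trip with an arbitrary (hypothetical) label value.
hdr = encode_mpls(label=1025, tc=0, s=1, ttl=64)
assert decode_mpls(hdr) == {"label": 1025, "tc": 0, "s": 1, "ttl": 64}
```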
Label Space
Label space is the label value range. The NE20E supports the following label ranges:
special labels. For details about special labels, see Table 1-329.
label space shared by static LSPs and static constraint-based routed label switched paths
(CR-LSPs).
label space used by dynamic signaling protocols, such as Label Distribution Protocol
(LDP), Resource Reservation Protocol-Traffic Engineering (RSVP-TE), and
Multiprotocol Extensions for Border Gateway Protocol (MP-BGP).
Each dynamic signaling protocol uses independent and contiguous values.
0 IPv4 Explicit NULL Label: If the egress receives a packet carrying a
label with this value, the egress must remove the label from the packet.
The egress then forwards the packet using IPv4.
1 Router Alert Label: If a node receives a packet carrying a label with
this value, the node sends the packet to a software module, without
implementing hardware forwarding. The node forwards the packet based on
the next layer label. If the packet needs to be forwarded using
hardware, the node pushes the Router Alert Label back onto the top of
the label stack before forwarding the packet. This label takes effect
only when it is not at the bottom of a label stack.
2 IPv6 Explicit NULL Label: If the egress receives a packet carrying a
label with this value, the egress removes the label from the packet and
forwards the packet using IPv6.
3 Implicit NULL Label: If the penultimate LSR receives a packet carrying
a label with this value, the penultimate LSR removes the label and
forwards the packet (now, an IP or VPN packet) to the egress. The egress
then forwards the packet over IP or VPN routes.
4 to 13 Reserved: N/A
14 OAM Router Alert Label: If the ingress receives a packet carrying a
label with this value, the ingress considers it an Operation,
Administration and Maintenance (OAM) packet and
Label Stack
Labels in an MPLS packet can be stacked. The label next to the Layer 2 header is the top or
outer label. The label next to the Layer 3 header is the bottom or inner label. Theoretically,
there is no limit on the number of MPLS labels that can be stacked. Figure 1-1054
illustrates a label stack.
The labels are processed from the top of the stack based on the last in, first out principle.
Label Operations
The label forwarding table defines the following label operations:
Push: The ingress adds a label into an IP packet between the Layer 2 header and IP
header before forwarding the packet. Within an MPLS network, each LSR adds a new
label to the top of the label stack.
Swap: A transit node replaces a label on the top of the label stack in an MPLS packet
with another label, which is assigned by the next hop.
Pop: A transit LSR or the egress removes the top label from the label stack to decrease
the number of labels in the stack. Either the egress or the penultimate LSR removes a
label from the MPLS packet before the packet leaves an MPLS network.
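The three operations along an LSP can be sketched on a label stack modeled as a list (top of stack first); the label values are hypothetical.

```python
def ingress_push(stack, label):
    return [label] + stack          # Push: add a new label to the top of the stack

def transit_swap(stack, new_label):
    return [new_label] + stack[1:]  # Swap: replace the top label with the next hop's

def pop(stack):
    return stack[1:]                # Pop: remove the top label (e.g. penultimate hop)

# A packet crossing ingress -> transit -> penultimate LSR:
stack = ingress_push([], 1025)      # ingress assigns label 1025
stack = transit_swap(stack, 1027)   # transit swaps in the label assigned downstream
stack = pop(stack)                  # penultimate-hop popping leaves a plain IP packet
assert stack == []
```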
Label Distribution
An LSR records a mapping between a label and FEC and notifies upstream LSRs of the
mapping. This process is called label distribution.
On the network shown in Figure 1-1056, packets with the destination address 192.168.1.0/24
are assigned to a specific FEC. LSRB and LSRC allocate labels that represent the FEC and
advertise the mapping between labels and the FEC to upstream LSRs.
MPLS Architecture
As shown in Figure 1-1057, the MPLS architecture consists of a control plane and a
forwarding plane.
The control plane is connectionless and is used to distribute labels, create a label forwarding
table, and establish or tear down LSPs.
The forwarding plane, also known as the data plane, is connection oriented. It can use services and protocols supported by ATM and Ethernet. The forwarding plane adds labels to
IP packets and removes labels from MPLS packets. It forwards packets based on the label
forwarding table.
Procedure
MPLS assigns packets to a FEC, distributes labels that identify the FEC, and establishes an
LSP. Packets travel along the LSP.
On the network shown in Figure 1-1058, packets destined for 3.3.3.3 are assigned to a FEC. Each downstream LSR assigns a label to the FEC and uses a label advertisement protocol to inform its upstream LSR of the mapping between the label and the FEC. Each upstream LSR adds the mapping to its label forwarding table. An LSP is established using the label mapping information.
LSPs can be either static or dynamic. Static LSPs are established manually. Dynamic LSPs are
established using a routing protocol and a label distribution protocol.
− Bandwidth constraint
− Link colors
− Explicit paths
Multiprotocol Extensions for Border Gateway Protocol (MP-BGP)
MP-BGP is an extension to BGP. MP-BGP defines community attributes. MP-BGP
supports label distribution for packets transmitted over MPLS virtual private network
(VPN) routes and labeled inter-AS VPN routes.
Background
A network with increasing scale and complexity allows for devices of various specifications.
Without packet fragmentation enabled, an MPLS P node transparently transmits packets sent
by the ingress PE to the egress PE. If the MTU configured on the ingress PE is greater than
the MRU configured on the egress PE, the egress PE discards packets with sizes larger than
the MRU.
Principles
In Figure 1-1059, the ingress PE1 has MTU1 greater than MRU2 on the egress PE2. PE2 is
enabled to discard packets with sizes larger than MRU2. Without packet fragmentation
enabled, a P node transparently forwards a packet with a size of LENGTH (MTU1 >
LENGTH > MRU2) to PE2. Since the packet length is greater than MRU2, PE2 discards the
packet. After packet fragmentation is enabled on the P node, the P node fragments the same packet into a packet with the size of MTU2 (MTU2 < MRU2) and a packet with a specified size (LENGTH minus MTU2). If the LENGTH minus MTU2 value is still greater than MTU2, that fragment is fragmented again. After the fragments reach PE2, PE2 properly forwards them
fragment is also fragmented. After the fragments reach PE2, PE2 properly forwards them
because their lengths are less than MRU2.
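The fragmentation arithmetic above can be sketched as repeated splitting. A minimal sketch, assuming MTU2 < MRU2 as in the text; the byte values are illustrative:

```python
# Sketch of the P node's fragmentation: a packet of size `length` is split
# into fragments no larger than mtu2, repeating until every fragment fits.

def fragment(length, mtu2):
    """Return the sizes of the fragments produced for a packet of `length`."""
    fragments = []
    while length > mtu2:
        fragments.append(mtu2)
        length -= mtu2  # the remainder (LENGTH minus MTU2) may be split again
    fragments.append(length)
    return fragments

# A 4000-byte packet fragmented with MTU2 = 1500 bytes:
print(fragment(4000, 1500))  # [1500, 1500, 1000]
```

Every resulting fragment is at most MTU2 bytes, so each one is smaller than MRU2 and PE2 forwards it instead of discarding it.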
Background
A static CR-LSP is established using manually configured forwarding and resource
information. Signaling protocols and path calculation are not used during the setup of
CR-LSPs. Setting up a static CR-LSP consumes few resources because the two ends of the CR-LSP do not need to exchange MPLS control packets. However, a static CR-LSP cannot be adjusted dynamically when the network topology changes. A static CR-LSP configuration error may cause protocol packets of different NEs and statuses to interfere with one another, which adversely affects services. To address the preceding problem, a device can be enabled to check the source interfaces of static CR-LSPs. With this function configured, the device forwards packets only if both the labels and inbound interfaces are correct.
Principles
In Figure 1-1060, static CR-LSP1 is configured, with PE1 functioning as the ingress, the P as
a transit node, and PE2 as the egress. The P's inbound interface connected to PE1 is Interface1
and the incoming label is Label1. Static CR-LSP2 remains on PE3 that functions as the
ingress of CR-LSP2. The P's inbound interface connected to PE3 is Interface2 and the
incoming label is Label1. If PE3 sends traffic along CR-LSP2 and Interface2 on the P receives the traffic, the P checks the inbound interface information and finds that the traffic carries Label1 but arrives on Interface2 rather than Interface1. Consequently, the P discards the traffic.
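The source-interface check can be sketched as a lookup that must match on both fields. The table and names below mirror the example (Label1, Interface1, Interface2) but are otherwise hypothetical:

```python
# Sketch of the static CR-LSP source-interface check: the device forwards a
# packet only if both the incoming label and the inbound interface match the
# configured binding.

# P's configuration from static CR-LSP1: Label1 is bound to Interface1.
expected_in_interface = {"Label1": "Interface1"}

def accept(label, in_interface):
    """True only when the label is known and arrived on its configured interface."""
    return expected_in_interface.get(label) == in_interface

# Traffic from PE1 arrives on Interface1 carrying Label1: forwarded.
print(accept("Label1", "Interface1"))  # True
# Traffic from PE3 arrives on Interface2 carrying Label1: discarded.
print(accept("Label1", "Interface2"))  # False
```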
1.12.2.3 Applications
1.12.2.3.1 MPLS-based VPN
A traditional virtual private network (VPN) transmits private network data over a public
network using tunneling protocols, such as the Generic Routing Encapsulation (GRE), Layer
2 Tunneling Protocol (L2TP), and Point to Point Tunneling Protocol (PPTP).
An MPLS-based VPN, which is as secure as Frame Relay networks, does not encapsulate or
encrypt packets; therefore, IP Security (IPsec), GRE, or L2TP tunnels do not need to be
deployed. The MPLS-based VPN helps minimize the network delay time.
The MPLS-based VPN technology can establish LSPs to connect private network branches
within a single VPN and to connect VPNs.
Figure 1-1061 illustrates an MPLS-based VPN. The following devices are deployed on the
MPLS-based VPN:
Customer edge (CE): an edge device on a customer network. The CE can be a router,
switch, or host.
Provider edge (PE): an edge device on a service provider network.
Provider (P): a backbone device on the service provider network that does not connect to CEs directly. A P has basic MPLS forwarding capabilities but does not maintain VPN information.
You can also use PBR with LDP fast reroute (FRR) to divert some traffic to a backup LSP, balancing traffic between the primary and backup LSPs; otherwise, the backup LSP may remain relatively idle.
Definition
The Label Distribution Protocol (LDP) is a Multiprotocol Label Switching (MPLS) control
protocol. It classifies forwarding equivalence classes (FECs), distributes labels, and
establishes and maintains label switched paths (LSPs). LDP defines messages in the label
distribution process as well as procedures for processing these messages.
Purpose
On an MPLS network, LDP distributes label mappings and establishes LSPs. LDP sends
multicast Hello messages to discover local peers and sets up local peer relationships.
Alternatively, LDP sends unicast Hello messages to discover remote peers and sets up remote
peer relationships.
Two LDP peers establish a TCP connection, negotiate LDP parameters over the TCP
connection, and establish an LDP session. They exchange messages over the LDP session to
set up an LSP. LDP networking is simple to construct and configure, and LDP establishes
LSPs using routing information.
LDP applications are as follows:
LDP LSPs guide IP data across a full-mesh MPLS network, over which a Border
Gateway Protocol-free (BGP-free) core network can be built.
LDP works with BGP to establish end-to-end inter-autonomous system (inter-AS) or
inter-carrier tunnels to transmit Layer 3 virtual private network (L3VPN) services.
LDP over traffic engineering (TE) combines LDP and TE advantages to establish
end-to-end tunnels to transmit virtual private network (VPN) services.
1.12.3.2 Principles
1.12.3.2.1 Basic Concepts
The MPLS architecture consists of multiple label distribution protocols, in which LDP is
widely used.
LDP defines messages in the label distribution process and procedures for processing the
messages. Label switching routers (LSRs) obtain information about incoming labels, next-hop
nodes, and outgoing labels for specified FECs based on the local forwarding table. LSRs use
the information to establish LSPs.
For detailed information about LDP, see relevant standards (LDP Specification).
LDP Adjacency
When an LSR receives a Hello message from a peer, the LSR establishes an LDP adjacency with the peer. An LDP adjacency maintains a peer relationship between the two LSRs. There are two types of LDP adjacencies:
Local adjacency: established by exchanging Link Hello messages between two LSRs.
Remote adjacency: established by exchanging Targeted Hello messages between two LSRs.
LDP Peers
Two LDP peers set up LDP sessions and exchange Label Mapping messages over the session
so that they establish an LSP.
LDP Session
An LDP session between LSRs helps them exchange messages, such as Label Mapping
messages and Label Release messages. LDP sessions are classified into the following types:
Local LDP session: set up over a local adjacency. The two LSRs, one on each end of the
local LDP session, are directly connected.
Remote LDP session: set up over a remote adjacency. The two LSRs, one on each end of
the remote LDP session, can be either directly or indirectly connected.
LDP Messages
Two LSRs exchange the following messages:
Discovery message: used to notify or maintain the presence of an LSR on an MPLS
network.
Session message: used to establish, maintain, or terminate an LDP session between LDP
peers.
Advertisement message: used to create, modify, or delete a mapping between a specific
FEC and label.
Notification message: used to provide advisory information or error information.
LDP transmits Discovery messages using the User Datagram Protocol (UDP) and transmits
Session, Advertisement, and Notification messages using the Transmission Control Protocol
(TCP).
an LDP identifier and other information, such as the hello-hold time and transport
address. If an LSR receives a Targeted Hello message, the LSR has a potential LDP peer.
After both LSRA and LSRB have accepted each other's Keepalive messages, the LDP session
is successfully established.
If a DU LDP session is established between an LSR and its peer, a liberal LSP is established. This liberal LSP can function as a backup LSP after LDP FRR is enabled.
If a DoD LDP session is established between an LSR and its peer, the LSR sends a
Release message to tear down label-based bindings.
An LSP is established along the path LSRA -> LSRB -> LSRC. LSRC functions as a proxy egress and extends the LSP to LSRD. The extended LSP is a proxy egress LSP.
Background
If a direct link for a local LDP session fails, the LDP adjacency is torn down, and the session
and labels are deleted. After the direct link recovers, the local LDP session is reestablished
and distributes labels so that an LSP can be reestablished over the session. Before the LSP is
reestablished, however, LDP LSP traffic is dropped.
To speed up LDP LSP convergence and minimize packet loss, the NE20E implements LDP
session protection.
LDP session protection helps maintain an LDP session, eliminating the need to reestablish an
LDP session or re-distribute labels.
Principles
In Figure 1-1067, LDP session protection is configured on the nodes at both ends of a link.
The two nodes exchange Link Hello messages to establish a local LDP session and exchange
Targeted Hello messages to establish a remote LDP session, forming a backup relationship
between the remote LDP session and local LDP session.
In Figure 1-1067, if the direct link between LSRA and LSRB fails, the adjacency established
using Link Hello messages is torn down. Because the indirectly connected link is working
properly, the remote adjacency established using Targeted Hello messages remains. Therefore,
the LDP session is maintained by the remote adjacency, and the mapping between FECs and
labels for the session also remains. After the direct link recovers, the local LDP session can
rapidly restore LSP information. There is no need to reestablish the LDP session or
re-distribute labels, which minimizes the time required for LDP session convergence.
Background
On an MPLS network with both active and standby links, if an active link fails, IGP routes
re-converge, and the IGP route of the standby link becomes reachable. An LDP LSP over the
standby link is then established. During this process, some traffic is lost. To minimize traffic
loss, LDP Auto FRR is used.
On the network enabled with LDP Auto FRR, if an interface failure (detected by the interface
itself or by an associated BFD session) or a primary LSP failure (detected by an associated
BFD session) occurs, LDP FRR is notified of the failure and rapidly forwards traffic to a
backup LSP, protecting traffic on the primary LSP. The traffic switchover minimizes the
traffic interruption time.
Implementation
LDP LFA FRR
LDP LFA FRR is implemented based on IGP LFA FRR's LDP Auto FRR. LDP LFA FRR uses the liberal label retention mode to obtain a liberal label, applies for a forwarding entry associated with the label, and delivers the forwarding entry to the forwarding plane as a backup forwarding entry for the primary LSP. If an interface detects a fault itself, bidirectional forwarding detection (BFD) detects an interface fault, or BFD detects a primary LSP failure, LDP LFA FRR rapidly switches traffic to the backup LSP to protect traffic on the primary LSP.
Figure 1-1068 Typical usage scenario for LDP Auto FRR (triangle topology)
Figure 1-1068 shows a typical usage scenario for LDP Auto FRR. The preferred
LSRA-to-LSRB route is LSRA-LSRB and the second optimal route is LSRA-LSRC-LSRB. A
primary LSP between LSRA and LSRB is established on LSRA, and a backup LSP of
LSRA-LSRC-LSRB is established to protect the primary LSP. After receiving a label from
LSRC, LSRA compares the label with the LSRA-to-LSRB route. Because the next hop of the
LSRA-to-LSRB route is not LSRC, LSRA preserves the label as a liberal label.
If the backup route corresponding to the source of the liberal label exists and its destination meets the policy for LDP to create a backup LSP, LSRA can apply for a forwarding entry for the liberal label, establish a backup LSP as the backup forwarding entry of the primary LSP, and send the entries mapped to both the primary and backup LSPs to the forwarding plane. In this way, the primary LSP is associated with the backup LSP.
LDP Auto FRR is triggered when an interface detects a fault itself, BFD detects an interface fault, or BFD detects a primary LSP failure. After LSP FRR is complete, traffic is switched to the backup LSP based on the backup forwarding entry. The route then converges to LSRA-LSRC-LSRB: an LSP is established over the new path (the original backup LSP), the original primary LSP is torn down, and traffic is forwarded along the new LSP over the path LSRA-LSRC-LSRB.
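The backup next hop that IGP LFA FRR selects must be loop-free. The standard loop-free alternate condition (from RFC 5286, not stated in this document) can be sketched as follows; the costs model the triangle topology of Figure 1-1068 and are hypothetical:

```python
# Sketch of the loop-free alternate (LFA) condition: a neighbor N of source S
# is a valid backup next hop toward destination D if
#     dist(N, D) < dist(N, S) + dist(S, D)
# i.e. N reaches D without looping back through S.

def is_lfa(dist_n_d, dist_n_s, dist_s_d):
    return dist_n_d < dist_n_s + dist_s_d

# S = LSRA, D = LSRB, candidate backup neighbor N = LSRC, all link costs 10:
# LSRC reaches LSRB directly (cost 10) rather than via LSRA (10 + 10).
print(is_lfa(10, 10, 10))  # True
# If LSRC's only path to LSRB cost 30, it would route back through LSRA:
print(is_lfa(30, 10, 10))  # False
```

When the condition holds, the liberal label learned from LSRC can safely back the primary forwarding entry, as described above.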
LDP Remote LFA FRR
LDP LFA FRR cannot calculate backup paths on large networks, especially ring networks,
which fails to meet reliability requirements. To address this issue, LDP Remote LFA FRR is
used. Remote LFA FRR is implemented based on IGP Remote LFA FRR's (1.10.8.2.13 IS-IS
Auto FRR) LDP Auto FRR. Figure 1-1069 illustrates the typical LDP Auto FRR usage
scenario. The primary LDP LSP is established over the path PE1 -> PE2. Remote LFA FRR
establishes a Remote LFA FRR LSP over the path PE1 -> P2 -> PE2 to protect the primary
LDP LSP.
Figure 1-1069 Typical LDP Auto FRR usage scenario - ring topology
Background
LDP-IGP synchronization enables the LDP status and the IGP status to go Up simultaneously,
which helps minimize traffic interruption time if a fault occurs.
A network provides active and standby links for redundancy. If the active link fails, both an
IGP route and an LDP LSP switch from the active link to the standby link. After the active
link recovers, the IGP route switches back to the active link earlier than the LDP LSP. Traffic
therefore switches to the IGP route over the active link but is dropped because the LSP is
unreachable over the new active link. To prevent traffic loss, LDP-IGP synchronization can be
configured.
On a network enabled with LDP-IGP synchronization, an IGP keeps advertising the maximum cost of the IGP route over the new active link to delay IGP route convergence until LDP converges. During this period, traffic keeps traveling along the standby link. The backup LSP is torn down only after the LSP over the active link is established.
LDP-IGP synchronization involves the following timers:
Hold-max-cost timer
Delay timer
Implementation
In Figure 1-1070, a network has both an active and standby link. When the active link
recovers from any fault, traffic is switched from the standby link to the active link.
During the traffic switchback, the backup LSP becomes unavailable, but a new LSP has not yet been set up over the active link when IGP route convergence is complete. This causes a brief traffic interruption. To help prevent this problem, LDP-IGP synchronization can be configured to delay the IGP route switchback until LDP converges. The backup LSP is not deleted and continues forwarding traffic until an LSP over the active link is established. The process of LDP-IGP synchronization is as follows:
a. A link recovers from a fault.
b. An LDP session is set up between LSR2 and LSR3. The IGP advertises the
maximum cost of the active link to delay the IGP route switchback.
c. Traffic is still forwarded along the backup LSP.
d. Once set up, the LDP session transmits Label Mapping messages and notifies the IGP to start synchronization.
e. The IGP advertises the normal cost of the active link, and its routes converge on
the original forwarding path. The LSP is reestablished and delivers entries to the
forwarding table.
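The sequence above boils down to a cost-advertisement rule: the IGP advertises the maximum cost until LDP reports that synchronization is complete. A minimal sketch; MAX_COST, the function name, and the cost values are illustrative, not real configuration:

```python
# Sketch of LDP-IGP synchronization: while LDP has not yet exchanged labels
# over the recovered active link, the IGP advertises the maximum cost so that
# routes (and traffic) stay on the standby link.

MAX_COST = 65535  # stands in for the IGP's maximum metric

def advertised_cost(normal_cost, ldp_synchronized):
    return normal_cost if ldp_synchronized else MAX_COST

# Steps b-c: the LDP session is coming up, so the active link looks expensive
# and traffic remains on the backup LSP.
print(advertised_cost(10, ldp_synchronized=False))  # 65535
# Step e: LDP has advertised its Label Mapping messages, so the normal cost
# is restored and routes converge back to the active link.
print(advertised_cost(10, ldp_synchronized=True))   # 10
```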
When OSPF is used, the status transits based on the flowchart shown in Figure 1-1071.
When IS-IS is used, the Hold-normal-cost state does not exist. After the Hold-max-cost timer expires,
IS-IS advertises the actual link cost, but the Hold-max-cost state is displayed even though this state
is nonexistent.
Usage Scenario
Figure 1-1072 shows an LDP-IGP synchronization scenario.
On the network shown in Figure 1-1072, an active link and a standby link are established.
LDP-IGP synchronization and LDP FRR are deployed.
Benefits
Packet loss is reduced during an active/standby link switchover, improving network
reliability.
1.12.3.2.9 LDP GR
LDP supports graceful restart (GR) that enables a Restarter, together with a Helper, to perform
a master/backup switchover or protocol restart, without interrupting traffic.
Background
If a node or link along an LDP LSP that is transmitting traffic fails, traffic switches to a
backup LSP. The path switchover speed depends on the detection duration and traffic
switchover duration. A delayed path switchover causes traffic loss. LDP fast reroute (FRR)
can be used to speed up the traffic switchover, but not the detection process.
As shown in Figure 1-1074, a local label switching router (LSR) periodically sends Hello
messages to notify each peer LSR of the local LSR's presence and establish a Hello adjacency
with each peer LSR. The local LSR constructs a Hello hold timer to maintain the Hello
adjacency with each peer. Each time the local LSR receives a Hello message, it updates the
Hello hold timer. If the Hello hold timer expires before a Hello message arrives, the LSR
considers the Hello adjacency disconnected. The Hello mechanism cannot rapidly detect link
faults, especially when a Layer 2 device is deployed between the local LSR and its peer.
The rapid, light-load BFD mechanism is used to quickly detect faults and trigger a
primary/backup LSP switchover, which minimizes data loss and improves service reliability.
BFD for LDP LSP is implemented by establishing a BFD session between two nodes on both
ends of an LSP and binding the session to the LSP. BFD rapidly detects LSP faults and
triggers a traffic switchover. When BFD monitors a unidirectional LDP LSP, the reverse path
of the LDP LSP can be an IP link, an LDP LSP, or a traffic engineering (TE) tunnel.
A BFD session that monitors LDP LSPs is negotiated in either static or dynamic mode:
Static configuration: The negotiation of a BFD session is performed using the local and
remote discriminators that are manually configured for the BFD session to be established.
On a local LSR, you can bind an LSP with a specified next-hop IP address to a BFD
session with a specified peer IP address.
Dynamic establishment: The negotiation of a BFD session is performed using the BFD
discriminator type-length-value (TLV) in an LSP ping packet. You must specify a policy
for establishing BFD sessions on a local LSR. The LSR automatically establishes BFD
sessions with its peers and binds the BFD sessions to LSPs using either of the following
policies:
− Host address-based policy: The local LSR uses all host addresses to establish BFD
sessions. You can specify a next-hop IP address and an outbound interface name of
LSPs and establish BFD sessions to monitor the specified LSPs.
− Forwarding equivalence class (FEC)-based policy: The local LSR uses host
addresses listed in a configured FEC list to automatically establish BFD sessions.
BFD uses the asynchronous mode to check LSP continuity. That is, the ingress and egress
periodically send BFD packets to each other. If one end does not receive BFD packets from
the other end within a detection period, BFD considers the LSP Down and sends an LSP
Down message to the LSP management (LSPM) module.
Although BFD for LDP is enabled on a proxy egress, a BFD session cannot be established for the
reverse path of a proxy egress LSP on the proxy egress.
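In asynchronous mode, the detection period described above follows the standard BFD rule (from RFC 5880, not specific to this product): a session is declared Down if no packet arrives within the detect multiplier times the agreed interval. A small sketch with hypothetical timer values:

```python
# Sketch of BFD asynchronous-mode fault detection: both ends periodically send
# BFD packets; if one end receives nothing for detect_multiplier consecutive
# intervals, it declares the LSP Down and notifies the LSPM module.

def detection_time_ms(detect_multiplier, rx_interval_ms):
    return detect_multiplier * rx_interval_ms

# With a multiplier of 3 and a 10 ms interval, an LSP fault is declared
# within 30 ms - far faster than LDP Hello hold timers allow.
print(detection_time_ms(3, 10))  # 30
```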
Usage Scenarios
BFD for LDP LSP can be used in the following scenarios:
Primary and bypass LDP FRR LSPs are established.
Primary and bypass virtual private network (VPN) FRR LSPs are established.
Benefits
BFD for LDP LSP provides a rapid, light-load fault detection mechanism for LDP LSPs,
which improves network reliability.
Background
As mobile services evolve from narrowband voice services to integrated broadband services,
providing rich voice, streaming media, and high speed downlink packet access (HSDPA)
services, the demand for network bandwidth is rapidly increasing. Meeting the bandwidth
demand on traditional bearer networks requires huge investments. Therefore, carriers are in
urgent need of an access mode that is low-cost, flexible, and highly efficient, and that can help them meet the challenges brought by the growth in broadband services. In this context, all-IP mobile bearer networks are an effective means of dealing with these issues. IP radio
access networks (RANs), a type of IP-based mobile bearer network, are increasingly widely
used.
IP RANs, however, have more complex reliability requirements than traditional bearer
networks when carrying broadband services. Traditional fault detection mechanisms cannot
trigger protection switching based on random bit errors. Therefore, bit errors may degrade or
even interrupt services on an IP RAN in extreme cases. Bit-error-triggered protection
switching can solve this problem.
Benefits
Bit-error-triggered LDP protection switching has the following benefits:
Protects traffic from random bit errors, improving service quality.
Enables devices to record bit error events, enabling carriers to quickly locate the nodes
or lines with bit errors and take corrective measures.
Related Concepts
LDP interface bit error rate
LDP interface bit error rate is the bit error rate detected by LDP on an interface. A node uses a
Link Hello message to report its LDP interface bit error rate to an upstream LDP peer.
LSP bit error rate
LSP bit error rate on a node = LSP bit error rate reported by the downstream LDP peer + the
LDP interface bit error rate reported by the downstream LDP peer.
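The formula above means each node simply accumulates the interface bit error rates reported from downstream, hop by hop, until the ingress holds the rate for the whole LSP. A minimal sketch; the rate values are illustrative, not measured:

```python
# Sketch of the LSP bit error rate calculation: at each hop,
#   local LSP rate = downstream LSP rate + downstream interface rate,
# so the ingress ends up with the sum of all interface rates along the LSP.

def lsp_bit_error_rate(hop_interface_rates):
    """hop_interface_rates lists the interface bit error rates reported hop by
    hop from the egress side toward the ingress; the result is the LSP bit
    error rate the ingress calculates."""
    lsp_rate = 0.0
    for interface_rate in hop_interface_rates:
        lsp_rate += interface_rate
    return lsp_rate

# Bit errors detected on interfaces if1 and if3, as in Figure 1-1075:
print(lsp_bit_error_rate([1e-6, 2e-6]))
```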
Implementation
The NE20E supports single-node and multi-node LDP bit error detection and calculation.
When LDP detects an interface bit error on a node along an LSP, the node sends a Link Hello
message to notify its upstream LDP peer of the interface bit error rate and a Label Mapping
message to notify its upstream LDP peer of the LSP bit error rate. Upon receipt of the
notifications, the upstream LDP peer uses the received interface bit error rate as the local LDP
interface bit error rate, adds the LDP interface bit error rate to the received LSP bit error rate
to obtain the local LSP bit error rate, and sends the interface bit error rate and local LSP bit
error rate to its upstream LDP peer. This process repeats until the ingress of the LSP calculates
its local LSP bit error rate. Figure 1-1075 illustrates the networking for bit-error-triggered
LDP protection switching.
In Figure 1-1075, an LSP is established between PE1 and PE2. If interfaces if1 and if3 both detect bit errors, the bit error rates are advertised hop by hop toward the ingress and accumulated along the LSP, as described in Figure 1-1075.
LDP only detects and advertises bit errors; service switching, such as PW switching or L3VPN route switching, occurs for the services carried over LDP.
spoofing. The MD5 message digest is a unique result calculated by an irreversible character
string conversion. If a message is modified during transmission, a different digest is generated.
After the message arrives at the receive end, the receive end can determine whether the packet
is modified by comparing the received digest with the pre-computed digest.
LDP MD5 authentication prevents LDP packets from being modified by generating a unique
digest for an information segment. This authentication mode is stricter than common
checksum verification for TCP connections.
Before an LDP message is sent over a TCP connection, LDP MD5 authentication is performed
by padding the TCP header with a unique digest. This digest is a result calculated by MD5
based on the TCP header, LDP message, and password set by the user.
When receiving this TCP packet, the receiver obtains the TCP header, digest, and LDP
message, and then uses MD5 to calculate a digest based on the received TCP header, received
LDP message, and locally stored password. The receiver compares the calculated digest with
the received one to check whether the packet is modified.
A password can be set in either ciphertext or simple text. The simple password is directly
recorded in the configuration file. The ciphertext password is recorded in the configuration
file after being encrypted using a special algorithm.
During the calculation of a digest, the character string originally entered by the user is used, regardless of whether the password is stored in simple text or ciphertext. That is, the encrypted form of a ciphertext password does not itself participate in the MD5 calculation.
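The send-and-verify flow above can be sketched with a hash over the TCP header, the LDP message, and the shared password, in the spirit of TCP MD5 signatures (RFC 2385). The field layout is deliberately simplified (a real digest also covers TCP pseudo-header fields), and the byte strings are hypothetical:

```python
# Sketch of LDP MD5 authentication: the sender pads the TCP header with a
# digest over (TCP header, LDP message, password); the receiver recomputes it
# with its locally stored password and compares.

import hashlib

def md5_digest(tcp_header: bytes, ldp_message: bytes, password: bytes) -> bytes:
    return hashlib.md5(tcp_header + ldp_message + password).digest()

def verify(tcp_header, ldp_message, received_digest, password):
    """True only if the recomputed digest matches the received one."""
    return md5_digest(tcp_header, ldp_message, password) == received_digest

sent = md5_digest(b"tcp-hdr", b"label-mapping", b"secret")
print(verify(b"tcp-hdr", b"label-mapping", sent, b"secret"))  # True
# A message modified in transit yields a different digest, so it is rejected.
print(verify(b"tcp-hdr", b"tampered-msg", sent, b"secret"))   # False
```

Because MD5 is irreversible, an attacker who modifies the message cannot produce a matching digest without knowing the password, which is the property the text relies on.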
Principles
LDP over TE establishes LDP LSPs across RSVP-TE areas. RSVP-TE is an MPLS tunnel
technique used to generate LSPs as tunnels for other protocols to transparently transmit
packets. LDP is another MPLS tunnel technique used to generate LDP LSPs. LDP over TE
allows an LDP LSP to span an RSVP-TE area so that a TE tunnel functions as a hop along an
LDP LSP.
After an RSVP-TE tunnel is established, an IGP (OSPF or IS-IS) locally computes routes or advertises link state advertisements (LSAs) or link state PDUs (LSPs) to select a TE tunnel interface as the outbound interface. In the following example, the originating router of the TE tunnel is directly connected to the tunnel's destination router through a logical interface. Packets are transparently transmitted along the TE tunnel.
In Figure 1-1076, P1, P2, and P3 belong to an RSVP-TE domain. PE1 and PE2 are located in
a VPN, and LDP sessions between PE1 and P1 and between P3 and PE2 are established. The
following example demonstrates the process of establishing an LDP LSP between PE1 and
PE2 over the RSVP-TE domain:
1. An RSVP-TE tunnel between P1 and P3 is set up. P3 assigns RSVP-Label-1 to P2, and
P2 assigns RSVP-Label-2 to P1.
2. PE2 initiates LDP to set up an LSP and sends a Label Mapping message carrying
LDP-Label-1 to P3.
3. Upon receipt, P3 sends a Label Mapping message carrying LDP-Label-2 to P1 over a
remote LDP session.
4. Upon receipt, P1 sends a Label Mapping message carrying LDP-Label-3 to PE1.
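With these mappings in place, the label stacks a packet carries along PE1 -> P1 -> P2 -> P3 -> PE2 can be sketched as below. The per-node operations are inferred from the label mappings in the text and are illustrative (for example, no penultimate-hop popping of the TE label is assumed):

```python
# Sketch of LDP over TE forwarding: inside the RSVP-TE domain the packet
# carries a two-level stack - an outer RSVP-TE label and an inner LDP label.
# The top of the stack is the head of the list.

def pe1(stack): return ["LDP-Label-3"] + stack                      # push LDP label
def p1(stack):  return ["RSVP-Label-2", "LDP-Label-2"] + stack[1:]  # swap LDP label, push TE label
def p2(stack):  return ["RSVP-Label-1"] + stack[1:]                 # swap TE label
def p3(stack):  return ["LDP-Label-1"] + stack[2:]                  # pop TE label, swap LDP label
def pe2(stack): return stack[1:]                                    # pop LDP label

stack = pe1([])
print(stack)   # ['LDP-Label-3']
stack = p1(stack)
print(stack)   # ['RSVP-Label-2', 'LDP-Label-2']
stack = p2(stack)
print(stack)   # ['RSVP-Label-1', 'LDP-Label-2']
stack = p3(stack)
print(stack)   # ['LDP-Label-1']
stack = pe2(stack)
print(stack)   # []
```

The inner LDP label is untouched between P1 and P3, which is exactly how the TE tunnel acts as a single hop of the LDP LSP.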
Usage Scenario
LDP over TE is used to transmit VPN services. Because carriers have difficulties in deploying
MPLS traffic engineering on an entire network, they use LDP over TE to plan a core TE area
and implement LDP outside this area. Figure 1-1077 illustrates an LDP over TE network.
The advantage of LDP over TE is that an LDP LSP is easier to operate and maintain than a TE
tunnel, and the resource consumption of LDP is lower than that of the RSVP soft state. On an
LDP over TE network, TE tunnels are deployed only in the core area, but not on all devices
including PEs. This simplifies deployment and maintenance on the entire network and
relieves burden from PEs. In addition, the core area can take full advantage of TE tunnels to
perform protection switchovers, path planning, and bandwidth protection.
Principles
LDP GTSM applies the Generalized TTL Security Mechanism (GTSM) to LDP.
To protect the router against attacks, GTSM checks the TTL in each packet to verify it. GTSM
for LDP verifies LDP packets exchanged between neighbor or adjacent (based on a fixed
number of hops) routers. The TTL range is configured on each router for packets from other
routers, and GTSM is enabled. If the TTL of an LDP packet received by a router configured
with LDP is out of the TTL range, the packet is considered invalid and discarded. Therefore,
the upper layer protocols are protected.
Usage Scenario
GTSM is used to protect the TCP/IP-based control plane against CPU usage attacks, for
example, CPU overload attacks. GTSM for LDP is used to verify all LDP packets to prevent
LDP from suffering CPU-based attacks when LDP receives and processes a large number of
forged packets.
In Figure 1-1078, LSR1 through LSR5 are core routers on the backbone network. When LSRA is connected to the backbone network through another device, LSRA may initiate an attack by forging LDP packets that are transmitted among LSR1 through LSR5.
Even after LSRA accesses the backbone network through another device and forges a packet, it cannot forge a valid TTL, because each intermediate device decrements the TTL of the packet.
A GTSM policy is configured on LSR1 through LSR5 separately and is used to verify packets
reaching possible neighbors. For example, on LSR5, the valid number of hops is set to 1 or 2,
and the valid TTL is set to 254 or 255 for packets sent from LSR2. The forged packet sent by
LSRA to LSR5 through multiple intermediate devices contains a TTL value that is out of the
preset TTL range. LSR5 discards the forged packet and prevents the attack.
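The check on LSR5 can be sketched as a TTL range test. The values mirror the example in the text (neighbors at most 2 hops away, valid TTL 254 or 255); the initial TTL of 255 is an assumption typical of GTSM deployments:

```python
# Sketch of the GTSM check: packets from a neighbor a known number of hops
# away must arrive with a TTL in a narrow valid range, because every
# intermediate router decrements the TTL by one.

def gtsm_accept(ttl, max_hops, initial_ttl=255):
    """Accept only packets whose TTL proves they crossed at most max_hops."""
    return initial_ttl - max_hops < ttl <= initial_ttl

# A genuine LDP packet from LSR2, at most two hops away (TTL 254 or 255):
print(gtsm_accept(254, max_hops=2))  # True
# LSRA's forged packet crossed many intermediate devices, so its TTL is low:
print(gtsm_accept(245, max_hops=2))  # False
```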
Principles
The local and remote LDP adjacencies can be connected to the same peer so that the peer is
maintained by both the local and remote LDP adjacencies.
On the network shown in Figure 1-1079, when the local LDP adjacency is deleted due to a
failure in the link to which the adjacency is connected, the peer's type may change without
affecting its presence or status. (The peer type is determined by the adjacency type. The types
of adjacencies can be local, remote, and coexistent local and remote.)
If the link becomes faulty or recovers from a fault, the peer type may change, and the type of the session associated with the peer changes accordingly. However, the session is neither deleted nor brought Down; it remains Up.
Usage Scenario
Figure 1-1079 Networking diagram for a coexistent local and remote LDP session
A coexistent local and remote LDP session typically applies to L2VPNs. On the network
shown in Figure 1-1079, L2VPN services are transmitted between PE1 and PE2. When the
directly connected link between PE1 and PE2 recovers from a disconnection, the processing
of a coexistent local and remote LDP session is as follows:
1. MPLS LDP is enabled on the directly connected PE1 and PE2, and a local LDP session
is set up between PE1 and PE2. PE1 and PE2 are configured as the remote peer of each
other, and a remote LDP session is set up between PE1 and PE2. Local and remote
adjacencies are then set up between PE1 and PE2. From then on, both local and remote LDP adjacencies exist between PE1 and PE2. L2VPN signaling messages are transmitted through the coexistent local and remote LDP session.
2. When the physical link between PE1 and PE2 becomes Down, the local LDP adjacency
also goes Down. The route between PE1 and PE2 is still reachable through the P, which
means that the remote LDP adjacency remains Up. The session changes to a remote
session so that it can remain Up. The L2VPN does not detect the change in session status
and does not delete the session. This prevents the L2VPN from having to disconnect and
recover services, and shortens service interruption time.
3. When the fault is rectified, the link between PE1 and PE2 and the local LDP
adjacency go Up again. The session changes back to a coexistent local and remote
LDP session and remains Up. Again, the L2VPN does not detect the change in session
status and does not delete the session, which reduces service interruption time.
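The session-type transitions in the steps above can be modeled as a function of which adjacencies are currently Up. This is a minimal sketch (not VRP code); the type names are assumptions for illustration.

```python
# Minimal sketch of how a coexistent local and remote LDP session's type
# follows the set of Up adjacencies without the session ever going Down,
# as long as at least one adjacency remains Up.

def session_type(adjacencies: set) -> str:
    """Return the session type implied by the Up adjacencies."""
    if not adjacencies:
        return "Down"                    # no adjacency left at all
    if adjacencies == {"local"}:
        return "local"
    if adjacencies == {"remote"}:
        return "remote"
    return "coexistent local and remote"

adj = {"local", "remote"}
assert session_type(adj) == "coexistent local and remote"

adj.discard("local")   # direct link fails; remote adjacency stays Up via the P
assert session_type(adj) == "remote"    # type changes, session remains Up

adj.add("local")       # link recovers
assert session_type(adj) == "coexistent local and remote"
```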
Figure 1-1080 Networking diagram for both upstream and downstream LSRs assigned labels by
LDP
In addition, split horizon can be configured to have Label Mapping messages only sent to
specified upstream LSRs.
1.12.3.2.18 mLDP
mLDP, the multipoint extensions for the Label Distribution Protocol, transmits
multicast services over IP or Multiprotocol Label Switching (MPLS) backbone
networks, which simplifies network deployment.
Background
Traditional core and backbone networks run IP and MPLS to flexibly transmit unicast packets
and provide high reliability and traffic engineering (TE) capabilities.
The proliferation of applications, such as IPTV, multimedia conference, and massively
multiplayer online role-playing games (MMORPGs), amplifies demands on multicast
transmission over IP/MPLS networks. The existing P2P MPLS technology requires a transmit
end to deliver the same data packet to each receive end, which wastes network bandwidth
resources.
The point-to-multipoint (P2MP) Label Distribution Protocol (LDP) technique defined in
mLDP can be used to address the preceding problem. P2MP LDP extends the MPLS LDP
protocol to meet P2MP transmission requirements and uses bandwidth resources much more
efficiently.
Figure 1-1081 shows the P2MP LDP LSP networking. A tree-shaped LSP originates at the
ingress PE1 and is destined for egresses PE3, PE4, and PE5. The ingress directs multicast
traffic into the LSP. The ingress sends a single packet along the trunk to the branch node P4.
P4 replicates the packet and forwards the packet to its connected egresses. This process
prevents duplicate packets from wasting trunk bandwidth.
Related Concepts
Table 1-330 describes the nodes used on the P2MP LDP network shown in Figure 1-1081.
Root node: An ingress on a P2MP LDP LSP. The root node initiates LSP calculation
and establishment and pushes a label into each multicast packet before forwarding
the packet along the established LSP. Example: PE1.
Transit node: An intermediate node that swaps an incoming label for an outgoing
label in each MPLS packet. A branch node may also function as a transit node.
Examples: P1 and P3.
Leaf node: A destination node on a P2MP LDP LSP. Examples: PE3, PE4, and PE5.
Bud node: An egress of one sub-LSP and a transit node of other sub-LSPs. A bud
node is connected to a customer edge (CE) device and also functions as an egress.
Example: PE2.
Branch node: A node from which LSP branches (sub-LSPs) start. A branch node
replicates packets and swaps an incoming label for an outgoing label in each
packet before forwarding the packet to each leaf node. Example: P4.
Implementation
The procedure for using mLDP to establish and maintain a P2MP LDP LSP is as follows:
Nodes negotiate the P2MP LDP capability with each other.
mLDP enables a node to negotiate the P2MP LDP capability with a peer node and
establish an mLDP session with the peer node.
A P2MP LDP LSP is established.
Each leaf and transit node sends a Label Mapping message upstream until the root
node receives Label Mapping messages from its downstream nodes. The root node
then establishes a P2MP LDP LSP with sub-LSPs destined for the leaf nodes.
A node deletes a P2MP LDP LSP.
A node of a specific type uses a specific rule to delete an LSP, which minimizes the
service interruptions.
The P2MP LDP LSP updates.
If the network topology or link cost changes, the P2MP LDP LSP updates automatically
based on a specified rule, which ensures uninterrupted service transmission.
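The upstream label-mapping step above can be sketched as a small simulation. The topology, label range, and function names below are assumptions for illustration, not a real LDP implementation.

```python
# Sketch of P2MP LDP LSP establishment: Label Mapping messages travel
# upstream from the leaves, and each node advertises one label toward its
# parent. Branches merge, so a parent propagates only the first mapping.

upstream = {            # child -> parent, rooted at PE1 (Figure 1-1081 style)
    "PE3": "P4", "PE4": "P4", "PE5": "P4",
    "P4": "P1", "P1": "PE1",
}

downstream_labels = {}  # node -> {child: label the child advertised}
next_label = iter(range(1024, 2048))

def send_label_mapping(node):
    """Node advertises a label upstream; the parent records the sub-LSP."""
    parent = upstream.get(node)
    if parent is None:
        return                        # root reached; tree is complete
    label = next(next_label)
    branches = downstream_labels.setdefault(parent, {})
    first_mapping = not branches      # parent propagates upstream only once
    branches[node] = label
    if first_mapping:
        send_label_mapping(parent)

for leaf in ("PE3", "PE4", "PE5"):
    send_label_mapping(leaf)

# P4 is a branch node: it holds one outgoing label per downstream leaf.
assert set(downstream_labels["P4"]) == {"PE3", "PE4", "PE5"}
# P1 propagated a single mapping toward the root PE1.
assert list(downstream_labels["P1"]) == ["P4"]
```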
As shown in Figure 1-1083, P2MP LDP-enabled label switching routers (LSRs) exchange
signaling messages to negotiate mLDP sessions. Two LSRs can successfully negotiate an
mLDP session only if both the LDP Initialization messages carry the P2MP Capability TLV.
After successful negotiation, an mLDP session is established. The mLDP session
establishment process is similar to the LDP session establishment process. The difference is
that the mLDP session establishment involves P2MP capability negotiation.
The P2MP LDP LSP establishment mode varies depending on the node type. A P2MP LDP
LSP contains the following nodes:
Leaf node: manually specified. When configuring a leaf node, you must also specify the
root node IP address and the opaque value.
Transit node: any node that can receive P2MP Label Mapping messages and whose LSR
ID is different from the LSR IDs of the root nodes.
Root node: a node whose host address is the same as the root node's IP address carried in
a P2MP LDP FEC.
The process for deleting a P2MP LDP LSP varies with the node type:
Leaf node
A leaf node sends a Label Withdraw message to its upstream node. After the
upstream node receives the message, it replies with a Label Release message to
instruct the leaf node to tear down the sub-LSP. If the leaf node is the upstream
node's only downstream node, the upstream node sends a Label Withdraw message to
its own upstream node. If the upstream node has another downstream node, it does
not send the Label Withdraw message.
Transit node
If a transit node or an LDP session between a transit node and its upstream node
fails, or a user manually deletes the transit node configuration, the upstream
node of the transit node deletes the sub-LSPs that pass through the transit node.
If the upstream node has another downstream node, it does not send a Label
Withdraw message. If the transit node is the upstream node's only downstream
node, the upstream node sends a Label Withdraw message to its own upstream node.
Root node
If a root node fails or a user manually deletes the LSP configuration on the root node, the
root node deletes the whole LSP.
Other Usage
mLDP P2MP LSPs can transmit services on next generation (NG) multicast VPN (MVPN)
and multicast VPLS networks. In the MVPN or multicast VPLS scenario, NG MVPN
signaling or multicast VPLS signaling triggers the establishment of mLDP P2MP LSPs. There
is no need to manually configure leaf nodes.
Usage Scenarios
mLDP can be used in the following scenarios:
IPTV services are transmitted over an IP/MPLS backbone network.
Multicast virtual private network (VPN) services are transmitted.
The virtual private LAN service (VPLS) is transmitted along a P2MP LDP LSP.
Benefits
mLDP used on an IP/MPLS backbone network offers the following benefits:
Core nodes on the IP/MPLS backbone network can transmit multicast services, without
Protocol Independent Multicast (PIM) configured, which simplifies network deployment.
Uniform MPLS control and forwarding planes are provided for the IP/MPLS backbone
network. The IP/MPLS backbone network can transmit both unicast and multicast VPN
traffic.
Implementation
LDP traffic statistics collection enables the ingress or a transit node to
collect statistics about outgoing LDP LSP traffic, but only for FECs with 32-bit
destination IP address masks (host routes).
In Figure 1-1087, each pair of adjacent devices establishes an LDP session and LDP LSP over
the session. Two LSPs originate from LSRA and are destined for LSRD along the paths LSRA
-> LSRB -> LSRD and LSRA -> LSRB -> LSRC -> LSRD. LSRB is used as an example.
LSRB functions as either a transit node to forward LSRA-to-LSRD traffic or the ingress to
forward LSRB-to-LSRD traffic. LSRB collects statistics about traffic sent by the outbound
interface connected to LSRD and outbound interface connected to LSRC. LSRA can only
function as the ingress, and therefore, collects statistics about traffic only sent by itself. LSRD
can only function as the egress, and therefore, does not collect traffic statistics.
Benefits
No tunnel protection is provided by default for the NG-MVPN over mLDP P2MP
function or the VPLS over mLDP P2MP function. If an LSP fails, traffic can be
restored only through slow, route-change-induced hard convergence. BFD for P2MP
tunnel provides a dual-root mLDP 1+1 protection mechanism for these functions:
primary and backup tunnels are established for VPN traffic. If the primary P2MP
tunnel fails, BFD for mLDP P2MP tunnel rapidly detects the fault and switches
traffic to the backup tunnel, which improves convergence performance and
minimizes traffic loss.
Principles
In Figure 1-1088, a root node uses BFD to send protocol packets to all leaf nodes
along a P2MP LDP LSP. If a leaf node fails to receive BFD packets within a
specified period, it determines that a fault has occurred.
In an NG-MVPN or VPLS scenario shown in Figure 1-1088, each of two roots establishes an
mLDP P2MP tree. PE-AGG1 is the master root, and PE-AGG2 is the backup root. The two
trees do not overlap. BFD for P2MP tunnel is configured on the roots and leaf nodes to
establish BFD sessions. If a BFD session detects a fault in the primary P2MP tunnel, a
forwarder rapidly detects the fault and switches NG-MVPN or VPLS traffic to the backup
P2MP tunnel.
Principles
In a large-scale network, multiple IGP areas usually need to be configured for flexible
network deployment and fast route convergence. When advertising routes between IGP areas,
to prevent a large number of routes from consuming too many resources, an area border router
(ABR) needs to aggregate the routes in the area and then advertise the aggregated route to the
neighbor IGP areas. The LDP extension for inter-area LSP function supports the longest
match rule for looking up routes so that LDP can use aggregated routes to establish inter-area
LDP LSPs.
Figure 1-1089 Networking topology for LDP extension for inter-area LSP
As shown in Figure 1-1089, there are two IGP areas: Area 10 and Area 20.
In the routing table of LSRD on the edge of Area 10, there are two host routes to LSRB and
LSRC. You can use IS-IS to aggregate the two routes to one route to 1.3.0.0/24 and send this
route to Area 20 in order to prevent a large number of routes from occupying too many
resources on the LSRD. Consequently, there is only one aggregated route (1.3.0.0/24) but not
32-bit host routes in LSRA's routing table. By default, when establishing LSPs,
LDP searches the routing table for the route that exactly matches the forwarding
equivalence class (FEC) in the received Label Mapping message. Table 1-332 shows
the routing entry information of LSRA and the routing information carried in the
FEC in the example shown in Figure 1-1089.
Table 1-332 Routing entry information of LSRA and routing information carried in the FEC
Routing entry of LSRA: 1.3.0.0/24
FECs in received Label Mapping messages: 1.3.0.1/32 and 1.3.0.2/32
LDP establishes liberal LSPs, not inter-area LDP LSPs, for aggregated routes. In this
situation, LDP cannot provide required backbone network tunnels for VPN services.
Therefore, in the situation shown in Figure 1-1089, configure LDP to search for routes based
on the longest match rule for establishing LSPs. There is already an aggregated route to
1.3.0.0/24 in the routing table of LSRA. When LSRA receives a Label Mapping
message (for example, one carrying the FEC 1.3.0.1/32) from Area 10, LSRA
searches for a route according to the longest match rule defined in relevant
standards. LSRA then finds the aggregated route to 1.3.0.0/24 and uses the
outbound interface and next hop of this route as those of the route to
1.3.0.1/32. LDP can therefore establish inter-area LDP LSPs.
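The longest match lookup described above can be sketched with the standard-library ipaddress module. The interface names and next-hop addresses below are hypothetical; the prefixes come from the example.

```python
# Longest prefix match: the 32-bit FEC 1.3.0.1/32 has no exact route on
# LSRA, but falls within the aggregated route 1.3.0.0/24, whose outbound
# interface and next hop are then reused for the FEC.
import ipaddress

routing_table = {
    ipaddress.ip_network("1.3.0.0/24"): ("GE0/1/0", "10.1.1.2"),  # aggregate
    ipaddress.ip_network("0.0.0.0/0"):  ("GE0/2/0", "10.2.2.2"),  # default
}

def longest_match(dest: str):
    """Return the outbound interface and next hop of the most specific route."""
    addr = ipaddress.ip_address(dest)
    candidates = [n for n in routing_table if addr in n]
    best = max(candidates, key=lambda n: n.prefixlen)
    return routing_table[best]

# The FEC 1.3.0.1/32 resolves through the /24 aggregate, so LDP can use
# that route's outbound interface and next hop for the inter-area LSP.
assert longest_match("1.3.0.1") == ("GE0/1/0", "10.1.1.2")
assert longest_match("8.8.8.8") == ("GE0/2/0", "10.2.2.2")
```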
In Figure 1-1090, no exact routes between LSRA and LSRC are configured; a default
route to 0.0.0.0 through LSRB is used instead. A remote LDP session in DoD mode
is established between LSRA and LSRC. Before an LSP is established between
the two LSRs, LSRA uses the longest match rule to query the next-hop IP address and sends a
Label Request packet to the downstream LSR. Upon receipt of the Label Request packet, the
transit LSRB checks whether an exact route to LSRC exists. If no exact route is configured
and the longest match function is enabled, LSRB uses the longest match function to find a
route and establish an LSP over the route.
A remote LDP session in DoD mode may also be established when LSRA does not find
an exact route to the remote IP address. In this situation, after the IP address
of a remote peer is specified on LSRA, LSRA uses the longest match function to
automatically send a Label Request packet to the remote peer to request a DoD
label.
1.12.3.3 Applications
1.12.3.3.1 mLDP Applications in an IPTV Scenario
Service Overview
The IP or Multiprotocol Label Switching (MPLS) technology has become a mainstream
bearer technology on backbone networks, and the demands for multicast services (for
example, IPTV) transmitted over bearer networks are evolving. Carriers draw on the existing
MPLS mLDP technique to provide the uniform MPLS control and forwarding planes for
multicast services transmitted over backbone networks.
Networking Description
mLDP is deployed on IP/MPLS backbone networks. Figure 1-1091 illustrates mLDP
applications in an IPTV scenario.
Feature Deployment
The procedure for deploying end-to-end (E2E) IP multicast services to be transmitted along
mLDP label switched paths (LSPs) is as follows:
Establish an mLDP LSP.
Perform the following steps:
a. Plan the root, transit, and leaf nodes on an mLDP LSP.
b. Configure leaf nodes to send requests to the root node to establish
point-to-multipoint (P2MP) LDP LSPs.
c. Configure a virtual tunnel interface and bind the LSP to it.
Import multicast services into the LSP.
Configure the quality of service (QoS) redirection function on the ingress PE1 to direct
data packets sent by a multicast source to the specified mLDP LSP.
Forward multicast services.
To enable the egresses (PE2 and PE3) to forward multicast services, perform the
following operations:
− Configure the egresses to run Protocol Independent Multicast (PIM) to generate
multicast forwarding entries.
− Enable the egresses to ignore the Unicast Reverse Path Forwarding (URPF) check.
This is because the URPF check fails as PIM does not need to be run on core nodes
on the P2MP LDP network.
− Enable multicast source proxy based on the location of the Rendezvous Point (RP).
After multicast data packets for a multicast group in an any-source multicast (ASM)
address range are directed to an egress, the egress checks the packets based on
unicast routes. Multicast source proxy is enabled or disabled based on the following
check results:
If the egress is indirectly connected to a multicast source and does not function
as the RP to which the group corresponds, the egress stops forwarding
multicast data packets. As a result, downstream hosts cannot receive these
multicast data packets. Multicast source proxy can be used to address this
problem. Multicast source proxy enables the egress to send a Register message
to the RP deployed on a source-side device (for example, SR1) in a PIM
domain. The RP adds the egress to a rendezvous point tree (RPT) to enable the
egress to forward multicast data packets to the downstream hosts.
If the egress is directly connected to a multicast source or functions as the RP
to which the group corresponds, the egress can forward multicast data packets,
without multicast source proxy enabled.
1.12.4 MPLS TE
1.12.4.1 Introduction
Multiprotocol Label Switching (MPLS) traffic engineering (TE) effectively schedules,
allocates, and uses existing network resources to provide sufficient bandwidth and support for
quality of service (QoS). MPLS TE helps carriers minimize expenditures without requiring
hardware upgrades. TE is implemented based on MPLS techniques and is easy to deploy and
maintain on live networks. MPLS TE supports a range of reliability techniques, which helps
backbone networks achieve carrier- and device-class reliability.
Purpose
Traffic engineering techniques are common for carriers operating IP/MPLS bearer networks.
These techniques are used to prevent traffic congestion and uneven resource allocation.
A node on a conventional IP network selects the shortest path as an optimal route, regardless
of other factors, for example, bandwidth. The shortest path may be congested with traffic,
whereas other available paths are idle.
Each link on the network shown in Figure 1-1092 has a bandwidth of 100 Mbit/s and
the same metric value. LSRA sends traffic to LSRJ at 40 Mbit/s, and LSRG sends
traffic to LSRJ at 80 Mbit/s. Traffic from both routers travels through the
shortest path LSRA (LSRG) → LSRB → LSRC → LSRD → LSRI → LSRJ calculated by an
Interior Gateway Protocol (IGP). As a result, this path may be congested because
of overload, while the path LSRA (LSRG) → LSRB → LSRE → LSRF → LSRH → LSRI →
LSRJ is idle.
Network congestion is a major cause of backbone network performance
deterioration. Congestion results either from insufficient resources or from
incorrect local resource allocation. In the former case, expanding network device
capacity prevents the problem. In the latter case, TE is used to divert some
traffic to idle links so that traffic allocation is improved.
TE dynamically monitors network traffic and loads on network elements and adjusts the
parameters for traffic management, routing, and resource constraints in real time, which
prevents network congestion induced by load imbalance.
Conventional TE solutions are as follows:
TE controls network traffic by adjusting the metric of a path. This method eliminates
congestion only on some links. Adjusting a metric is difficult on a complex network
because a link change affects multiple routes.
TE directs some traffic to virtual connections (VCs) based on an overlay model. The
current IGPs are topology driven and applicable to only static network connections,
regardless of dynamic factors, such as bandwidth and traffic attributes.
The overlay model, such as IP over asynchronous transfer mode (ATM), complements
IGP disadvantages. An overlay model provides a virtual topology over a physical
topology for a network. This helps properly adjust traffic and implement QoS features,
but has high costs and poor extensibility.
A scalable and simple solution is required to implement TE on a large-scale
network. MPLS, as an overlay model, allows a virtual topology to be established
over a physical topology and maps traffic onto the virtual topology. Because
MPLS integrates readily with TE, MPLS TE was introduced.
Definition
MPLS TE establishes label switched paths (LSPs) based on constraints and conducts traffic to
specific LSPs so that network traffic is transmitted along the specified path. The constraints
include controllable paths and sufficient link bandwidth reserved for services transmitted over
the LSPs. If resources are insufficient, higher-priority LSPs preempt resources, such as
bandwidth, of lower-priority LSPs so that higher-priority services' requirements can be
fulfilled preferentially. In addition, if an LSP fails or a node is congested, MPLS TE protects
network communication using a backup path and the fast reroute (FRR) function. Using
MPLS TE allows a network administrator to deploy LSPs to properly allocate network
resources, which prevents network congestion. If the number of LSPs increases, a specific
offline tool can be used to analyze traffic. MPLS TE can be used on the network shown in
Figure 1-1092 to address congestion. MPLS TE establishes an 80 Mbit/s LSP over the path
LSRG → LSRB → LSRC → LSRD → LSRI → LSRJ and a 40 Mbit/s LSP over the path
LSRA → LSRB → LSRE → LSRF → LSRH → LSRI → LSRJ. MPLS TE directs traffic to
the two LSPs, preventing congestion.
Function Description
Basic function: Includes basic MPLS TE settings and the tunnel establishment
capability.
Tunnel optimization: Allows existing tunnels to be reestablished over other
paths if the topology changes, or to be reestablished with updated bandwidth if
service bandwidth values change.
Reliability function: Supports path protection, local protection, and node
protection.
Security: Supports Resource Reservation Protocol (RSVP) authentication, which
improves signaling security over MPLS TE networks.
P2MP TE: Point-to-multipoint (P2MP) traffic engineering (TE) is a promising
solution to multicast service transmission. P2MP TE helps carriers provide high
TE capabilities and increased reliability on an IP/MPLS backbone network and
reduce network operational expenditure (OPEX).
Benefits
MPLS TE offers the following benefits:
Provides sufficient bandwidth and supports QoS capabilities for services.
Optimizes bandwidth allocation.
Establishes public network tunnels to isolate virtual private network (VPN) traffic.
Is implemented based on existing MPLS techniques and its deployment and maintenance
are simple.
Related Concepts
Concept Description
MPLS TE tunnel: Multiple LSPs are bound together to form an MPLS TE tunnel. An
MPLS TE tunnel is uniquely identified by the following parameters:
Tunnel interface: a P2P virtual interface that encapsulates packets. Similar to
a loopback interface, a tunnel interface is a logical interface. A tunnel
interface name is identified by an interface type and number. The interface type
is "tunnel." The interface number is expressed in the format of
SlotID/CardID/PortID.
Tunnel ID: a decimal number that identifies an MPLS TE tunnel and facilitates
tunnel planning and management. A tunnel ID must be specified before an MPLS TE
tunnel interface is configured.
(CR-LSPs).
Unlike Label Distribution Protocol (LDP) LSPs that are established using
routing information, CR-LSPs are established based on bandwidth and
path constraints in addition to routing information.
Traffic Forwarding Component (1.12.4.2.7): Directs traffic to a CR-LSP and
forwards the traffic along the CR-LSP. Although a CR-LSP can be established
using the preceding three components, the CR-LSP cannot automatically import
traffic. The traffic forwarding component can be used to direct traffic to the
CR-LSP.
A network administrator can configure link and tunnel attributes to enable MPLS TE to
automatically establish a CR-LSP. The network administrator can then direct traffic to the
CR-LSP and forward traffic over the CR-LSP.
Related Concepts
Information Advertisement Component involves the following concepts:
Concept Description
Total link bandwidth: Manually set for a physical link.
Maximum reservable bandwidth: The maximum bandwidth that a link can reserve for
MPLS TE tunnels to be established. The maximum reservable bandwidth must be
lower than or equal to the total link bandwidth and can be manually set.
TE metric: A TE metric is used in TE tunnel path calculation, allowing the
calculation process to be independent of IGP route-based path calculation. The
IGP metric is used for MPLS TE tunnels by default.
SRLG: A shared risk link group (SRLG) is a set of links that are likely to fail
concurrently because they share a physical resource (for example, an optical
fiber). Links in an SRLG share the same risk of faults: if one link fails, the
other links in the SRLG may also fail. An SRLG enhances CR-LSP reliability on an
MPLS TE network enabled with CR-LSP hot standby or TE FRR. For more information
about the SRLG, see 1.12.4.5.6 SRLG.
Link administrative group: A link administrative group is also called a link
color. It is a 32-bit vector, with each bit set to a specified value that is
associated with a desired meaning. For example, a link administrative group
attribute can be configured to describe link bandwidth, a performance parameter
(such as the delay time), or a management policy. The policy can be a traffic
type (multicast, for example) or a flag indicating that an MPLS TE tunnel passes
over the link. The link administrative group attribute is used together with
affinities to control the paths for tunnels.
Contents to Be Advertised
The network resource information to be advertised includes the following items:
Link status information: interface IP addresses, link types, and link metric values, which
are collected by an Interior Gateway Protocol (IGP)
Bandwidth information, such as maximum link bandwidth and maximum reservable
bandwidth
TE metric: TE link metric, which is the same as the IGP metric by default
Administrative group
SRLG
Advertisement Methods
Either of the following link status protocol extensions can be used to advertise TE
information:
1.10.8.2.10 IS-IS TE
1.10.6.2.5 OSPF TE
Open Shortest Path First (OSPF) TE and Intermediate System to Intermediate System (IS-IS)
TE automatically collect TE information and flood it to MPLS TE nodes.
Figure 1-1095 shows the proportion of the bandwidth reserved for each MPLS TE
tunnel to the available bandwidth in the TEDB.
Bandwidth flooding is not performed when tunnels 1 to 9 are created. After tunnel 10 is
created, the bandwidth information (10 Mbit/s in total) on tunnels 1 to 10 is flooded. The
available bandwidth is 90 Mbit/s. Similarly, no bandwidth information is flooded after
tunnels 11 to 18 are created. After tunnel 19 is created, bandwidth information on tunnels
11 to 19 is flooded. The process repeats until tunnel 100 is established.
Figure 1-1095 Proportion of the bandwidth reserved for each MPLS TE tunnel to the
available bandwidth in the TEDB
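The threshold-based flooding illustrated above can be sketched as a small simulation. The 10% threshold and 1 Mbit/s tunnel size are taken from the figure's example; the variable names are assumptions.

```python
# Sketch of threshold-based bandwidth flooding: reserved bandwidth is
# flooded only when the unflooded reservations reach 10% of the last
# advertised available bandwidth, avoiding a flood per tunnel.

advertised_available = 100.0    # Mbit/s, what the TEDB currently believes
unflooded = 0.0                 # reservations made since the last flood
flood_events = []

for tunnel_id in range(1, 101): # 100 tunnels of 1 Mbit/s each
    unflooded += 1.0
    if unflooded >= 0.10 * advertised_available:
        advertised_available -= unflooded   # flood the updated bandwidth
        unflooded = 0.0
        flood_events.append(tunnel_id)

# No flooding for tunnels 1-9; tunnel 10 triggers the first flood
# (available drops to 90 Mbit/s); with a threshold of 9 Mbit/s, the next
# flood happens at tunnel 19, matching the process described above.
assert flood_events[:2] == [10, 19]
```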
Related Concepts
Path Calculation Component involves the following concepts:
Concept Description
Bandwidth Bandwidth values are planned based on services that are to pass through a
tunnel. The configured bandwidth is reserved on each node through which a
tunnel passes.
Affinity An affinity is a 32-bit vector, configured on the ingress of a tunnel. It must be
attribute used together with a link administrative group attribute.
After a tunnel is configured with an affinity, a device compares the affinity
with the administrative group value during link selection to determine
whether a link with specified attributes is selected or not. The link selection
criteria are as follows:
The result of ANDing the IncludeAny affinity with the administrative group
value is not 0.
The result of ANDing the ExcludeAny affinity with the administrative group
value is 0.
IncludeAny equals the affinity attribute ANDed with the mask; ExcludeAny equals
the complement of the affinity attribute ANDed with the mask; the administrative
group value used in the comparison equals the administrative group value ANDed
with the mask.
The following rules apply:
If some bits in the mask are 1s, the corresponding bits are compared: among the
bits where the affinity is 1, at least one corresponding administrative group
bit must be 1; where the affinity bits are 0s, the corresponding administrative
group bits cannot be 1.
For example, an affinity is 0x0000FFFF and its mask is 0xFFFFFFFF.
The higher-order 16 bits in the administrative group of available links are
0 and at least one of the lower-order 16 bits is 1. This means the
administrative group attribute ranges from 0x00000001 to 0x0000FFFF.
If some bits in a mask are 0s, the corresponding bits in the administrative
group are not compared with the affinity bits.
For example, an affinity is 0xFFFFFFFF and its mask is 0xFFFF0000. At
least one of the higher-order 16 bits in an administrative group attribute is
1 and the lower-order 16 bits can be 0s and 1s. This means that the
administrative group attribute ranges from 0x00010000 to 0xFFFFFFFF.
NOTE
Understand specific comparison rules before deploying devices of different vendors
because the comparison rules vary with the vendor.
A network administrator can use the link administrative group and affinities
to control the paths over which MPLS TE tunnels are established.
Explicit An explicit path used to establish a CR-LSP. Nodes to be included or
path excluded are specified on this path. Explicit paths are classified into the
following types:
Strict explicit path
A hop is directly connected to its next hop on a strict explicit path. By
specifying a strict explicit path, the most accurate path is provided for a
CR-LSP.
Loose explicit path
A hop and its next hop are not necessarily directly connected on a loose
explicit path.
For example, a CR-LSP is set up over a loose explicit path between LSRA
and LSRF on the network shown in Figure 1-1097. LSRA is the ingress,
and LSRF is the egress. "D loose" indicates that the CR-LSP must pass through
LSRD but that LSRD and LSRA are not necessarily directly connected; other LSRs
may exist between LSRD and LSRA.
Hop limit Hop limit is a condition for path selection during CR-LSP establishment.
Similar to the administrative group and affinity attributes, a hop limit defines
the number of hops that a CR-LSP allows.
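The affinity comparison rules above can be sketched in a few lines. This is a hedged illustration of the Huawei-style rules described in this section (the note above warns that other vendors compare differently); the function name is an assumption.

```python
# Sketch of the affinity/administrative group comparison: bits where the
# mask is 0 are ignored; among masked bits, at least one affinity-1 bit
# must match the administrative group, and no affinity-0 bit may be set.

MASK32 = 0xFFFFFFFF

def link_matches(affinity: int, mask: int, admin_group: int) -> bool:
    include_any = affinity & mask              # bits that must find a match
    exclude_any = ~affinity & mask & MASK32    # bits that must stay clear
    admin = admin_group & mask                 # unmasked bits are ignored
    if include_any and not (include_any & admin):
        return False                           # no required bit is set
    if exclude_any & admin:
        return False                           # a forbidden bit is set
    return True

# Affinity 0x0000FFFF, mask 0xFFFFFFFF: the administrative group must lie
# in 0x00000001..0x0000FFFF, as in the first example above.
assert link_matches(0x0000FFFF, 0xFFFFFFFF, 0x00000001)
assert not link_matches(0x0000FFFF, 0xFFFFFFFF, 0x00010000)

# Affinity 0xFFFFFFFF, mask 0xFFFF0000: at least one of the higher-order
# 16 bits must be 1, as in the second example above.
assert link_matches(0xFFFFFFFF, 0xFFFF0000, 0x00010000)
assert not link_matches(0xFFFFFFFF, 0xFFFF0000, 0x0000FFFF)
```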
CSPF Fundamentals
CSPF works based on the following parameters:
Tunnel attributes configured on an ingress to establish a CR-LSP
Traffic engineering database (TEDB)
A TEDB can be generated only after Interior Gateway Protocol (IGP) TE is configured. On an IGP
TE-incapable network, CR-LSPs are established based on IGP routes, but not CSPF calculation results.
CSPF attempts to use the OSPF TEDB to establish a path for a CR-LSP by default. If a path is
successfully calculated using OSPF TEDB information, CSPF completes calculation and does not use
the IS-IS TEDB to calculate a path. If path calculation fails, CSPF attempts to use IS-IS TEDB
information to calculate a path.
CSPF can be configured to use the IS-IS TEDB to calculate a CR-LSP path. If path
calculation fails, CSPF uses the OSPF TEDB to calculate a path.
CSPF calculates the shortest path to a destination. If there are several shortest paths with the
same metric, CSPF uses a tie-breaking policy to select one of them. The following
tie-breaking policies for selecting a path are available:
Most-fill: selects a link with the highest proportion of used bandwidth to the maximum
reservable bandwidth, efficiently using bandwidth resources.
Least-fill: selects a link with the lowest proportion of used bandwidth to the maximum
reservable bandwidth, evenly using bandwidth resources among links.
Random: selects links randomly, allowing LSPs to be established evenly over links,
regardless of bandwidth distribution.
When several links have the same proportion of used bandwidth to the maximum reservable
bandwidth (for example, the links do not use the reserved bandwidths or the same bandwidth
is used on every link), the link discovered first is selected, irrespective of whether most-fill or
least-fill is configured.
For example, CSPF removes links marked blue and links each with bandwidth of 50 Mbit/s
based on tunnel constraints and uses other links each with bandwidth of 100 Mbit/s to
calculate a path for an MPLS TE tunnel on the network shown in Figure 1-1098. The
constraints include the destination LSRE, bandwidth of 80 Mbit/s, and a transit node LSRH.
CSPF calculates a path shown in Figure 1-1099 in the same way SPF would calculate it.
RSVP-TE Messages
RSVP-TE messages are as follows:
Path message: used to request downstream nodes to distribute labels. A Path message
records path information on each node through which the message passes. The path
information is used to establish a path state block (PSB) on a node.
Resv message: used to reserve resources at each hop of a path. A Resv message carries
information about resources to be reserved. Each node that receives the Resv message
reserves resources based on reservation information carried in the message. The
reservation information is used to establish a reservation state block (RSB) and to record
information about distributed labels.
PathErr message: sent upstream by an RSVP node if an error occurs during the
processing of a Path message. A PathErr message is forwarded by every transit node and
arrives at the ingress.
ResvErr message: sent downstream by an RSVP node if an error occurs during the
processing of a Resv message. A ResvErr message is forwarded by every transit node
and arrives at the egress.
PathTear message: sent downstream by the ingress to delete information about the local
state created on every node of the path.
ResvTear message: sent upstream by the egress to delete the local reserved resources
assigned to a path. After receiving the ResvTear message, the ingress sends a PathTear
message to the egress.
Figure 1-1100 shows the process of establishing a CR-LSP. The process is as follows:
1. The ingress configured with RSVP-TE creates a PSB and sends a Path message to transit
nodes.
2. After receiving the Path message, the transit node processes and forwards this message,
and creates a PSB.
3. After receiving the Path message, the egress creates a PSB, uses bandwidth reservation
information in the Path message to generate a Resv message, and sends the Resv
message to the ingress.
4. After receiving the Resv message, the transit node processes and forwards the Resv
message and creates an RSB.
5. After receiving the Resv message, the ingress creates an RSB and confirms that the
resources are reserved successfully.
6. The ingress successfully establishes a CR-LSP to the egress.
If the CR-LSP is reestablished over a new path, the ingress sends a PathTear message
downstream to delete the soft states maintained on nodes of the previous path.
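The six-step exchange above can be sketched as a toy simulation; the dict-based state blocks and label values are illustrative assumptions, not an RSVP-TE implementation.

```python
def establish_cr_lsp(nodes):
    """nodes: ordered list of node names, ingress first, egress last."""
    psb = {}  # path state blocks, one per node
    rsb = {}  # reservation state blocks, one per node
    # The Path message travels downstream; each node creates a PSB that
    # records the path information accumulated so far.
    for node in nodes:
        psb[node] = {"recorded_route": nodes[: nodes.index(node) + 1]}
    # The Resv message travels upstream; each node creates an RSB and
    # records the label distributed toward its upstream neighbor.
    label = 100  # hypothetical starting label value
    for node in reversed(nodes):
        rsb[node] = {"label": label}
        label += 1
    return psb, rsb

psb, rsb = establish_cr_lsp(["ingress", "transit", "egress"])
print(psb["egress"]["recorded_route"])  # ['ingress', 'transit', 'egress']
print(rsb["egress"]["label"])           # 100
```

When the Resv message reaches the ingress and its RSB is created, resource reservation is confirmed and the CR-LSP is up, matching step 5 and step 6 above.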
Reservation Styles
A reservation style defines how a node reserves resources after receiving a request sent by an
upstream node. The NE20E supports the following reservation styles:
Fixed filter (FF): defines a distinct bandwidth reservation for data packets from a
particular transmit end.
Shared explicit (SE): defines a single reservation for a set of selected transmit ends.
These senders share one reservation but are assigned different labels by the receive end.
Background
RSVP Refresh messages are used to synchronize path state block (PSB) and reservation state
block (RSB) information between nodes. They can also be used to monitor the reachability
between RSVP neighbors and maintain RSVP neighbor relationships. Because Path and
Resv messages are large, sending many of them to maintain many CR-LSPs consumes
considerable network resources. RSVP Srefresh can be used to address this problem.
Implementation
RSVP Srefresh defines new objects based on the existing RSVP protocol:
Message_ID extension and retransmission extension
The Srefresh extension builds on the Message_ID extension. According to the
Message_ID extension mechanism defined in relevant standards, RSVP messages carry
extended objects, including Message_ID and Message_ID_ACK objects. The two
objects are used to confirm RSVP messages and support reliable RSVP message
delivery.
The Message_ID object can also be used to provide the RSVP retransmission mechanism.
For example, a node initializes a retransmission interval as Rf seconds after it sends an
RSVP message carrying the Message_ID object. If the node receives no ACK message
within Rf seconds, the node retransmits an RSVP message after (1 + Delta) x Rf seconds.
Delta determines how quickly the sender's retransmission interval grows.
The node keeps retransmitting the message until it receives an ACK message or the
number of retransmissions reaches the configured limit (called the retransmission increment value).
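The retransmission timing described above can be sketched as follows: the first retransmission waits Rf seconds, and each subsequent interval grows by a factor of (1 + Delta), up to a cap on the number of attempts.

```python
def retransmit_schedule(rf, delta, max_retries):
    """Return the successive wait intervals (in seconds) before each
    retransmission, assuming no ACK ever arrives."""
    intervals = []
    interval = rf
    for _ in range(max_retries):
        intervals.append(interval)
        interval *= (1 + delta)  # each interval grows by (1 + Delta)
    return intervals

# With Rf = 1 s and Delta = 1, the intervals double: 1, 2, 4 seconds.
print(retransmit_schedule(1.0, 1.0, 3))  # [1.0, 2.0, 4.0]
```

In practice an ACK stops the schedule early; the sketch only shows the worst case where every retransmission times out.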
Summary Refresh extension
The Summary Refresh extension supports Srefresh messages to update the RSVP status,
without the transmission of standard Path or Resv messages.
Each Srefresh message carries a Message_ID object. Each object contains multiple
message IDs, each of which identifies a Path or Resv state to be refreshed. If a CR-LSP
changes, its message ID value increases.
Only the state that was previously advertised by Path and Resv messages containing
Message_ID objects can be refreshed using the Srefresh extension.
After a node receives an Srefresh message, the node compares the Message_ID with that
saved in a local state block. If they match, the node does not change the state. If the
Message_ID is greater than that saved in the local state block, the node sends a NACK
message to the sender, refreshes the PSB or RSB based on the Path or Resv message, and
updates the Message_ID.
The sequence number carried in a Message_ID object enables this comparison: if a
CR-LSP changes, the associated sequence number increases. A received sequence number
equal to the one saved in the PSB or RSB therefore means the state is unchanged, and a
greater one means the state has been updated.
Background
RSVP Refresh messages are used to synchronize path state block (PSB) and reservation state
block (RSB) information between nodes. They can also be used to monitor the reachability
between RSVP neighbors and maintain RSVP neighbor relationships.
Using Path and Resv messages to monitor neighbor reachability delays a traffic switchover if
a link fault occurs and therefore is slow. The RSVP Hello extension can address this problem.
Related Concepts
RSVP Refresh messages: Although an MPLS TE tunnel is established using Path and
Resv messages, RSVP nodes still send Path and Resv messages over the established
tunnel to update the RSVP status. These Path and Resv messages are called RSVP
Refresh messages.
RSVP GR: ensures uninterrupted transmission on the forwarding plane while an
AMB/SMB switchover is performed on the control plane. A GR helper assists a GR
restarter in rapidly restoring the RSVP status.
TE FRR: a local protection mechanism for MPLS TE tunnels. If a fault occurs on a
tunnel, TE FRR rapidly switches traffic to a bypass tunnel.
Implementation
The principles of the RSVP Hello extension are as follows:
1. Hello handshake mechanism
LSRA and LSRB are directly connected on the network shown in Figure 1-1101.
− If RSVP Hello is enabled on LSRA, LSRA sends a Hello Request message to
LSRB.
− After LSRB receives the Hello Request message and is also enabled with RSVP
Hello, LSRB sends a Hello ACK message to LSRA.
− After receiving the Hello ACK message, LSRA considers LSRB reachable.
2. Detecting neighbor loss
After a successful Hello handshake, LSRA and LSRB exchange Hello
messages. If LSRB does not respond to three consecutive Hello Request messages sent
by LSRA, LSRA considers LSRB lost and re-initializes the RSVP Hello process.
3. Detecting neighbor restart
If LSRA and LSRB are enabled with RSVP GR, and the Hello extension detects that
LSRB is lost, LSRA waits for LSRB to send a Hello Request message carrying a GR
extension. After receiving the message, LSRA starts the GR process on LSRB and sends
a Hello ACK message to LSRB. After receiving the Hello ACK message, LSRB
performs the GR process and restores the RSVP soft state. LSRA and LSRB exchange
Hello messages to maintain the restored RSVP soft state.
If GR is disabled and FRR is enabled, FRR switches traffic to a bypass CR-LSP after the Hello
extension detects that the RSVP neighbor relationship is lost to ensure proper traffic transmission.
If GR is enabled, the GR process is performed.
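The loss-detection rule in step 2 above can be sketched as follows; the counter handling is an illustration, not the NE20E implementation.

```python
class HelloNeighbor:
    """Tracks an RSVP Hello neighbor; the neighbor is declared lost
    after three consecutive unanswered Hello Requests."""
    MISS_LIMIT = 3

    def __init__(self):
        self.missed = 0
        self.lost = False

    def request_timed_out(self):
        """Called when a Hello Request goes unanswered."""
        self.missed += 1
        if self.missed >= self.MISS_LIMIT:
            self.lost = True  # re-initialize the RSVP Hello process

    def ack_received(self):
        """A Hello ACK resets the miss counter."""
        self.missed = 0
        self.lost = False

nbr = HelloNeighbor()
for _ in range(3):
    nbr.request_timed_out()
print(nbr.lost)  # True
```

Once `lost` is set, the reaction depends on the configuration described above: FRR switches traffic to the bypass CR-LSP, or GR waits for the neighbor's Hello Request carrying the GR extension.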
Deployment Scenarios
The RSVP Hello extension applies to networks enabled with both RSVP GR and TE FRR.
Static Route
A static route is the simplest method of directing traffic to a CR-LSP in an MPLS TE tunnel.
A TE static route works in the same way as a common static route but has a TE tunnel
interface as its outbound interface.
Auto Route
An Interior Gateway Protocol (IGP) uses an auto route related to a CR-LSP in a TE tunnel
that functions as a logical link to calculate a path. The tunnel interface is used as an outbound
interface in the auto route. The TE tunnel is considered a P2P link with a specified metric
value. The following auto routes are supported:
IGP shortcut: A route related to a CR-LSP is not advertised to neighbor nodes,
preventing other nodes from using the CR-LSP.
Forwarding adjacency: A route related to a CR-LSP is advertised to neighbor nodes,
allowing these nodes to use the CR-LSP.
The forwarding adjacency advertises CR-LSP routes with neighbor IP addresses by
sending link-state advertisements (LSAs) or IS-IS link state packets (LSPs). Type 10
Opaque LSAs carry the neighbor IP addresses in the Remote IP Address
sub-type-length-value (sub-TLV), and LSPs carry the neighbor IP addresses in
intermediate system (IS) reachability TLV's Remote IP Address sub-TLV.
If the forwarding adjacency is used, nodes on both ends of a CR-LSP must be in the
same area.
The following example demonstrates the IGP shortcut and forwarding adjacency.
Figure 1-1102 Schematic diagram for IGP shortcut and forwarding adjacency
A CR-LSP over the path LSRG → LSRF → LSRB is established on the network shown in
Figure 1-1102, and the TE metric values are specified. Either of the following configurations
can be used:
The auto route is not used. LSRE uses LSRD as the next hop in a route to LSRA and a
route to LSRB; LSRG uses LSRF as the next hop in a route to LSRA and a route to
LSRB.
The auto route is used. Either IGP shortcut or forwarding adjacency can be configured:
− The IGP shortcut is used to advertise the route of Tunnel 1. LSRE uses LSRD as the
next hop in the route to LSRA and the route to LSRB; LSRG uses Tunnel 1 as the
next hop in the route to LSRA and the route to LSRB. LSRG, unlike LSRE, uses
Tunnel 1 in IGP path calculation.
− The forwarding adjacency is used to advertise the route of Tunnel 1. LSRE uses
LSRG as the next hop in the route to LSRA and the route to LSRB; LSRG uses
Tunnel 1 as the next hop in the route to LSRA and the route to LSRB. Both LSRE
and LSRG use Tunnel 1 in IGP path calculation.
Policy-based Routing
Policy-based routing (PBR) allows the system to select routes based on user-defined
policies, improving security and balancing traffic load. If PBR is enabled on an MPLS
network, IP packets are forwarded over specific CR-LSPs based on PBR rules.
Like IP unicast PBR, MPLS TE PBR is implemented based on a set of matching rules
and behaviors. The rules and behaviors are defined using an apply clause, in which the
outbound interface is a specific tunnel interface. Packets that do not match the PBR rules
are forwarded using ordinary IP routing; packets that match the rules are forwarded over
specific CR-LSPs.
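The match-then-forward decision can be sketched as follows; the rule representation and packet fields are hypothetical, not the NE20E apply-clause syntax.

```python
def pbr_next_hop(packet, rules):
    """rules: list of (match_fn, tunnel_interface) pairs, checked in
    order; returns the outbound tunnel interface for a matching packet,
    or None to fall back to ordinary IP forwarding."""
    for match, tunnel in rules:
        if match(packet):
            return tunnel
    return None  # no rule matched: forward using ordinary IP routing

# Hypothetical rule: send EF-marked traffic (DSCP 46) over Tunnel1.
rules = [(lambda p: p.get("dscp") == 46, "Tunnel1")]
print(pbr_next_hop({"dscp": 46}, rules))  # Tunnel1
print(pbr_next_hop({"dscp": 0}, rules))   # None
```

Returning None models the fallback described above: unmatched packets are properly forwarded using IP.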
Tunnel Policy
Tunnel policies applied to virtual private networks (VPNs) guide VPN traffic to tunnels in
either of the following modes:
Select-seq mode: The system selects tunnels for VPN traffic in the specified tunnel
selection sequence.
Tunnel binding mode: A CR-LSP is bound to a destination address in a tunnel policy.
This policy applies only to CR-LSPs.
If hard preemption is used, since Tunnel 1 has a higher priority than Tunnel 2, LSRF
sends an RSVP message to tear down Tunnel 2. As a result, some traffic on Tunnel 2 is
dropped if Tunnel 2 is transmitting traffic.
If soft preemption is used, LSRF sends LSRC a Resv message. After LSRC receives this
message, LSRC reestablishes Tunnel 2 over another path
LSRC→LSRE→LSRD→LSRB. LSRC switches traffic to the new path before tearing
down Tunnel 2 over the original path.
Background
A tunnel affinity and a link administrative group attribute are 32-bit hexadecimal numbers. An
IGP (IS-IS or OSPF) advertises the administrative group attribute to devices in the same IGP
area. RSVP-TE advertises the tunnel affinity to downstream devices. CSPF on the ingress
checks whether administrative group bits match affinity bits to determine whether a link can
be used to establish a CR-LSP.
Hexadecimal calculations are complex during network deployment, and maintaining and
querying tunnels established using hexadecimal calculations are difficult. Each bit in a
hexadecimal-number affinity can be assigned a name. In this example, colors are used to
name affinity bits. Naming affinity bits helps verify that tunnel affinity bits match link
administrative group bits, thereby facilitating network planning and deployment.
Implementation
An affinity name template can be configured to manage the mapping between affinity bits and
names. Configuring the same template on all nodes on an MPLS network is recommended.
Inconsistent configuration may cause a service provision failure. You can name each of 32
affinity bits differently. As shown in Figure 1-1104, the affinity bits are named using colors:
the second affinity bit is "red", the fourth bit is "blue", and the sixth bit is "brown".
Bits in a link administrative group must be assigned the same names as the affinity bits.
Once affinity bits are named, you can then determine which links a CR-LSP can include or
exclude on the ingress. Rules for selecting links are as follows:
include-any: CSPF includes a link when calculating a path, if at least one link
administrative group bit has the same name as an affinity bit.
exclude: CSPF excludes a link when calculating a path, if any link administrative group
bit has the same name as an affinity bit.
include-all: CSPF includes a link when calculating a path, only if each link
administrative group bit has the same name as each affinity bit.
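The three rules map naturally onto 32-bit bitmask operations; the bit positions chosen below for the color names are hypothetical.

```python
def include_any(admin_group, affinity):
    """Link is usable if at least one named bit is shared."""
    return (admin_group & affinity) != 0

def exclude(admin_group, affinity):
    """Link passes the exclude check only if no named bit is shared."""
    return (admin_group & affinity) == 0

def include_all(admin_group, affinity):
    """Link is usable only if every affinity bit is set in the
    link administrative group."""
    return (admin_group & affinity) == affinity

RED, BLUE = 0x2, 0x8  # hypothetical bit positions for the named bits
link = RED | BLUE     # link administrative group with "red" and "blue" set

print(include_any(link, RED))         # True: "red" is shared
print(include_all(link, RED | BLUE))  # True: both bits are present
print(exclude(link, 0x10))            # True: no shared bit, link kept
```

Because both sides compare names rather than raw hexadecimal values, the same bit positions must be configured consistently network-wide, as the template recommendation above states.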
Usage Scenarios
The affinity naming function is used when CSPF calculates paths over which RSVP-TE
establishes CR-LSPs.
Benefits
The affinity naming function allows you to easily and rapidly use affinity bits to control paths
over which CR-LSPs are established.
Background
MPLS TE tunnels are used to optimize traffic distribution over a network. An MPLS TE
tunnel is configured using static information, such as a bandwidth setting and a calculated
path. Without the optimization function, an MPLS TE tunnel cannot be automatically updated
after the service bandwidth or a tunnel management policy changes. This wastes network
resources. MPLS TE tunnels need to be optimized after being established.
Implementation
A specific event that occurs on the ingress can trigger optimization for a CR-LSP bound to an
MPLS TE tunnel. The optimization enables the CR-LSP to be reestablished over the optimal
path with the smallest metric.
The fixed filter (FF) reservation style and CR-LSP re-optimization cannot be configured together.
Although re-optimization can be successfully configured for a CR-LSP that is established over an
explicit path, the configuration does not take effect.
Background
MPLS TE tunnels are used to optimize traffic distribution over a network. Traffic that
frequently changes wastes MPLS TE tunnel bandwidth; therefore, automatic bandwidth
adjustment is used to prevent this waste. A bandwidth is initially set to meet the requirement
for the maximum volume of services to be transmitted over an MPLS TE tunnel, to ensure
uninterrupted transmission.
Related Concepts
Automatic bandwidth adjustment allows the ingress to dynamically detect bandwidth changes
and periodically attempt to reestablish a tunnel with the needed bandwidth.
Table 1-338 lists concepts and their descriptions.
Implementation
Automatic bandwidth adjustment is enabled on a tunnel interface of the ingress. The
automatic bandwidth adjustment procedure on the ingress is as follows:
1. Samples traffic.
The ingress starts a bandwidth adjustment timer (A) and samples traffic at a specific
interval (B seconds) to obtain the instantaneous bandwidth during each sampling period.
The ingress records the instantaneous bandwidths.
2. Calculates an average bandwidth.
After timer A expires, the ingress uses the records to calculate an average bandwidth (D)
to be used as a target bandwidth.
3. Calculates a path.
The ingress runs CSPF to calculate a path with bandwidth D and establishes a new
CR-LSP over that path.
4. Switches traffic to the new CR-LSP.
The ingress switches traffic to the new CR-LSP before tearing down the original
CR-LSP.
The preceding procedure repeats each time automatic bandwidth adjustment is triggered.
Bandwidth adjustment is not needed if traffic fluctuates below a specific threshold. After
the bandwidth adjustment timer expires, the ingress calculates the average bandwidth (D)
and compares it with the existing tunnel bandwidth (C). The ingress performs automatic
bandwidth adjustment only if the ratio of the difference between the two to the average
bandwidth exceeds a specific threshold. The following inequality applies:
[(D - C)/D] x 100% > Threshold
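The threshold check can be sketched as follows; the parameter names mirror the inequality above, and the units are illustrative.

```python
def needs_adjustment(avg_bw, current_bw, threshold_percent):
    """avg_bw (D) and current_bw (C) in Mbit/s; threshold in percent.
    Returns True when the relative change exceeds the threshold and
    automatic bandwidth adjustment should run."""
    if avg_bw == 0:
        return False  # no sampled traffic: nothing to adjust
    change = (avg_bw - current_bw) / avg_bw * 100  # [(D - C)/D] x 100%
    return change > threshold_percent

print(needs_adjustment(100, 80, 10))  # True: (100-80)/100 = 20% > 10%
print(needs_adjustment(100, 95, 10))  # False: 5% is below the threshold
```

Small fluctuations below the threshold leave the tunnel untouched, which is exactly the churn-avoidance behavior the procedure above describes.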
Other Usage
The following functions are supported based on automatic bandwidth adjustment:
The ingress only samples traffic on a tunnel interface, and does not perform bandwidth
adjustment.
The upper and lower limits can be set to define a range, within which the bandwidth can
fluctuate.
1.12.4.3.3 PCE+
The PCE+ solution is used for interconnection between Huawei forwarders and Huawei controllers.
Background
The ingress runs the constrained shortest path first (CSPF) algorithm and uses information
stored in the traffic engineering database (TEDB) to calculate MPLS TE tunnels. On an
inter-domain network, each ingress can only obtain topology information within a single
domain. Therefore, the ingress faces the following challenges when establishing inter-domain
tunnels:
Failure to calculate optimal E2E paths.
Failure to calculate different paths for primary and backup MPLS TE tunnels, with the
result that the paths of the primary and backup tunnels share a node on a domain border.
The PCE+ solution helps resolve the preceding issues on MPLS networks. This solution
involves two device roles:
PCE server: usually an SDN controller. A PCE server stores the path information of the
entire network and computes paths based on stored information to optimize
network-wide resource usage.
PCE client: usually an SDN forwarder serving as a tunnel ingress. A PCE client is the
initiator of path computation requests. After receiving the path computation results and
tunnel constraints from a PCE server, a PCE client sets up a TE tunnel as required.
Benefits
The PCE+ solution offers the following benefits:
A PCE calculates optimal E2E paths for MPLS TE tunnels within a PCE domain.
Stateful PCEs can be used to improve the efficiency of bandwidth resource use and
simplify network deployment and maintenance.
The PCE feature uniformly configures and manages TE topology information and tunnel
constraints, which streamlines network operation and maintenance.
PCE path calculation results can be better controlled.
Related Concepts
PCE server
Defined in relevant standards, a PCE server is an entity that can use network topology
information to calculate paths or constrained routes. A PCE server can be an operations
support system (OSS) application, a network node, or a server. A PCE server on an MPLS TE
network receives a calculation request sent by an ingress and uses TEDB information to
calculate an optimal constrained path for an MPLS TE tunnel.
PCC
A path computation client (PCC) sends a calculation request to a PCE. The ingress of an
MPLS TE tunnel can function as a PCC.
PCEP
The Path Computation Element Communication Protocol (PCEP), defined in relevant
standards, exchanges information between a PCC and a selected PCE and between PCEs in
different domains.
Domain
A domain can be an Interior Gateway Protocol (IGP) area or a Border Gateway Protocol
(BGP) autonomous system (AS). The NE20E supports IGP areas only.
LSP DB
After a PCC advertises LSP attributes of an MPLS network to all PCEs, each PCE stores
these attributes in the label switched path (LSP) databases (DBs).
Stateful PCE
Stateful PCEs construct LSP DBs to monitor LSP information, including the
assigned bandwidth and LSP establishment status, and use the LSP DB and TEDB
information to calculate optimal paths for LSPs on an MPLS network.
Implementation
The PCE feature first discovers PCEs. After members are discovered, PCCs and the PCE
server establish PCEP sessions to exchange information. Before the ingress functioning as a
PCC establishes an MPLS TE tunnel, the ingress sends a request to the selected PCE server to
calculate a path and waits for the calculation result. Unlike IETF PCE, the NE20E allows you
to verify and accept the calculated result or allows the PCE server to automatically confirm
and accept the calculated path. After the calculated path is confirmed, the PCE server replies
with this result to the PCC. Upon receiving the calculation result, the PCC establishes an LSP.
To improve network bandwidth usage efficiency and simplify network operation and
maintenance, the NE20E implements Stateful PCEs and Uniform TE Network Information
Configuration and Management.
PCE Discovery
An available PCE server must be discovered before a PCC sends it a path calculation
request. The PCE server, however, does not have to proactively discover a PCC. The NE20E
only supports manually configured PCE member relationships: the source IP address of a
PCE server is specified on the PCC, which then establishes a connection to that address.
You can specify multiple candidate PCE servers for the same PCC. The PCC
selects a server based on the priority and source IP address. If candidate PCE servers have the
same priority, the PCC selects a server with the smallest IP address. Other servers function as
backup servers. If the server that is selected to calculate paths fails, the PCC automatically
selects another server.
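The selection rule can be sketched as follows; the tuple representation is illustrative, and treating a larger numeric priority value as more preferred is an assumption of this sketch.

```python
import ipaddress

def select_pce_server(candidates):
    """candidates: list of (priority, source_ip) tuples. Picks the
    highest-priority server; among equal priorities, the one with the
    smallest source IP address wins (assumption: larger priority value
    means more preferred)."""
    return min(
        candidates,
        key=lambda c: (-c[0], ipaddress.ip_address(c[1])),
    )

servers = [(10, "192.0.2.9"), (10, "192.0.2.2"), (5, "192.0.2.1")]
print(select_pce_server(servers))  # (10, '192.0.2.2')
```

The two equal-priority servers are distinguished by address, so 192.0.2.2 is selected and 192.0.2.9 remains a backup, matching the failover behavior described above.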
PCEP Sessions
After a PCC discovers PCEs in different domains, the PCC establishes a PCEP session with a
selected PCE within a specific domain, and the PCEs in different domains establish PCEP
sessions with each other. The devices exchange information, including path calculation results,
over the sessions.
The process of establishing a PCEP session between two PCEs in different domains is similar to the
preceding process of establishing a PCEP session between the PCC and PCE.
The process is as follows:
1. The ingress is configured as a PCC and sends a request to a PCE to establish an
LSP on the network shown in Figure 1-1107.
2. The ingress sends the PCE server a PCEP Report message to calculate a path and
delegate the LSP.
3. Upon receipt, the PCE server obtains the ingress and egress addresses carried in
the message and uses TEDB information to calculate the optimal path between the
ingress and egress. After the PCE server receives the Report message, it saves
LSP information carried in the message to the LSP DB. The PCE server then uses
the TEDB information and the local policy to calculate paths or globally optimize
paths.
4. The PCE server sends an Update message to notify the ingress of the calculation
result.
5. The ingress uses RSVP signaling to establish an LSP over the calculated path.
Stateful PCEs
Stateful PCEs help establish optimal paths for both primary and backup TE LSPs. Although
MPLS TE is used to properly assign network resources and improve network bandwidth
usage, the TE LSP establishment mechanism is insufficient to serve these purposes. In Figure
1-1108, each link has 10 Gbit/s bandwidth. The LSP between nodes A and E needs 6 Gbit/s
bandwidth, the LSP between nodes C and D needs 8 Gbit/s bandwidth, and the LSP between
nodes C and G needs 4 Gbit/s bandwidth. The setup priority of the LSP between nodes A and
E is the highest. The C-to-D link has less than 12 Gbit/s bandwidth and a priority lower than
the A-to-E link. Without stateful PCEs, these three LSPs shown in Figure 1-1108 (a) will be
established. As a result, these established LSPs use all links on the network, which is an
extremely inefficient use of network bandwidth.
Alternatively, stateful PCEs can be used to improve network bandwidth usage. For example,
in Figure 1-1108 (b), stateful PCEs are used to establish the three LSPs over optimal paths.
The bandwidth of the links between A and B, B and C, and D and E remains available.
Stateful PCEs construct LSP DBs to monitor LSP information, including assigned bandwidth
and establishment status. After stateful PCE is enabled on each node, the PCC advertises LSP
attributes to the now stateful PCEs, and the stateful PCEs construct LSP DBs to store LSP
attributes. All nodes on the MPLS network then have LSP DBs that contain consistent
information. The stateful PCEs then use TEDB and LSP DB information to calculate paths for
LSPs. Stateful PCEs work in either of the following modes:
Active stateful PCE: Each PCE automatically updates the LSP status and parameters,
while calculating paths.
Passive stateful PCE: PCEs calculate paths, but do not update the LSP status or
parameters.
Background
MPLS TE provides various TE and reliability functions, and MPLS TE applications keep
increasing. The complexity of MPLS TE tunnel configurations, however, also increases.
Manually configuring full-meshed TE tunnels on a large network is laborious and time-consuming. To
address the issues, the HUAWEI NE20E-S2 implements the IP-prefix tunnel function. This
function uses an IP prefix list to automatically establish a number of tunnels to specified
destination IP addresses and applies a tunnel template that contains public attributes to these
tunnels. MPLS TE tunnels that meet expectations can be established in a batch.
Benefits
The IP-prefix tunnel function allows you to establish MPLS TE tunnels in a batch. This
function satisfies various configuration requirements, such as reliability requirements, and
reduces TE network deployment workload.
Implementation
The IP-prefix tunnel implementation is as follows:
1. Configure an IP prefix list that contains multiple destination IP addresses.
2. Configure a tunnel template to set public attributes.
3. Use the template to automatically establish MPLS TE tunnels to the specified destination
IP addresses.
The IP-prefix tunnel function uses the IP prefix list to filter LSR IDs in the traffic engineering
database (TEDB). Only the LSR IDs that match the IP prefix list can be used as destination IP
addresses of MPLS TE tunnels that are to be automatically established. After LSR IDs in the
TEDB are added or deleted, the IP-prefix tunnel function automatically creates or deletes
tunnels, respectively. The tunnel template that the IP-prefix tunnel function uses contains
various configured attributes, such as the bandwidth, priorities, affinities, TE FRR, CR-LSP
backup, and automatic bandwidth adjustment. The attributes are shared by MPLS TE tunnels
that are established in a batch.
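The filtering step can be sketched as follows; the prefix-list semantics are simplified to plain longest-prefix membership, and the data values are illustrative.

```python
import ipaddress

def tunnel_destinations(tedb_lsr_ids, prefix_list):
    """tedb_lsr_ids: iterable of LSR IDs as dotted strings.
    prefix_list: iterable of CIDR prefixes, e.g. '10.1.0.0/16'.
    Returns the LSR IDs that match the prefix list and therefore
    become destinations of automatically established tunnels."""
    nets = [ipaddress.ip_network(p) for p in prefix_list]
    return [
        lsr for lsr in tedb_lsr_ids
        if any(ipaddress.ip_address(lsr) in n for n in nets)
    ]

tedb = ["10.1.1.1", "10.2.1.1", "192.168.0.1"]
print(tunnel_destinations(tedb, ["10.1.0.0/16"]))  # ['10.1.1.1']
```

Re-running the filter after the TEDB changes models the behavior above: newly matching LSR IDs gain tunnels, and removed IDs have their tunnels deleted.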
Background
MPLS TE provides a set of tunnel update mechanisms, which prevents traffic loss during
tunnel updates. In real-world situations, an administrator can modify the bandwidth or explicit
path attributes of an established MPLS TE tunnel based on service requirements. An updated
topology allows for a path better than the existing one, over which an MPLS TE tunnel can be
established. Any change in bandwidth or path attributes causes a CR-LSP in an MPLS TE
tunnel to be reestablished using new attributes and causes traffic to switch from the previous
CR-LSP to the newly established CR-LSP. During the traffic switchover, the
make-before-break mechanism prevents the traffic loss that would occur if the original
path were torn down before traffic switches to the new path.
Principles
Make-before-break is a mechanism that allows a CR-LSP to be established using changed
bandwidth and path attributes over a new path before the original CR-LSP is torn down. It
helps minimize data loss and additional bandwidth consumption. The new CR-LSP is called a
modified CR-LSP. Make-before-break is implemented using the shared explicit (SE) resource
reservation style.
The new CR-LSP competes with the original CR-LSP on some shared links for bandwidth.
The new CR-LSP cannot be established if it fails the competition. The make-before-break
mechanism allows the system to reserve bandwidth used by the original CR-LSP for the new
CR-LSP, without calculating the bandwidth to be reserved. Additional bandwidth is used if
links on the new path do not overlap the links on the original path.
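The per-link bandwidth accounting under the SE reservation style can be sketched as follows; this is a simplification of the mechanism described above, with illustrative values.

```python
def extra_bandwidth_needed(old_bw, new_bw, shared):
    """Additional bandwidth (Mbit/s) the new CR-LSP must find on a
    link. On links shared with the original CR-LSP, the existing
    reservation is reused rather than counted twice."""
    return max(new_bw - old_bw, 0) if shared else new_bw

# A shared link carrying a 40 Mbit/s reservation needs no extra
# bandwidth for a 40 Mbit/s replacement CR-LSP.
print(extra_bandwidth_needed(40, 40, shared=True))   # 0
# Growing the tunnel to 40 Mbit/s over a shared 30 Mbit/s link
# needs only 10 Mbit/s more.
print(extra_bandwidth_needed(30, 40, shared=True))   # 10
# A link not on the original path must supply the full 40 Mbit/s.
print(extra_bandwidth_needed(0, 40, shared=False))   # 40
```

The second case corresponds to the bandwidth-increase example later in this section, where the shared LSRC-LSRD link releases its 30 Mbit/s to the new 40 Mbit/s CR-LSP.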
In this example, the maximum reservable bandwidth on each link is 60 Mbit/s on the network
shown in Figure 1-1109. A CR-LSP along the path LSRA → LSRB → LSRC → LSRD is
established, with the bandwidth of 40 Mbit/s.
The path is expected to change to LSRA → LSRE → LSRC → LSRD to forward data
because LSRE has a light load. The reservable bandwidth of the link between LSRC and
LSRD is just 20 Mbit/s. The total available bandwidth for the new path is less than 40 Mbit/s.
The make-before-break mechanism can be used in this situation.
The make-before-break mechanism allows the newly established CR-LSP over the path LSRA
→ LSRE → LSRC → LSRD to use the bandwidth of the original CR-LSP's link between
LSRC and LSRD. After the new CR-LSP is established over the path, traffic switches to the
new CR-LSP, and the original CR-LSP is torn down.
In addition to the preceding method, another method of increasing the tunnel bandwidth can
be used. If the reservable bandwidth of a shared link increases to a certain extent, a new
CR-LSP can be established.
In the example shown in Figure 1-1109, the maximum reservable bandwidth on each link is
60 Mbit/s. A CR-LSP along the path LSRA → LSRB → LSRC → LSRD is established, with
the bandwidth of 30 Mbit/s.
The path is expected to change to LSRA → LSRE → LSRC → LSRD to forward data
because LSRE has a light load, and the bandwidth is expected to increase to 40 Mbit/s. The
reservable bandwidth of the link between LSRC and LSRD is just 30 Mbit/s. The total
available bandwidth for the new path is less than 40 Mbit/s. The make-before-break
mechanism can be used in this situation.
The make-before-break mechanism allows the newly established CR-LSP over the path LSRA
→ LSRE → LSRC → LSRD to use the bandwidth of the original CR-LSP's link between
LSRC and LSRD. The bandwidth of the new CR-LSP is 40 Mbit/s, out of which 30 Mbit/s is
released by the link between LSRC and LSRD. After the new CR-LSP is established, traffic
switches to the new CR-LSP and the original CR-LSP is torn down.
1.12.4.5.2 TE FRR
Traffic engineering (TE) fast reroute (FRR) protects links and nodes on MPLS TE tunnels. If
a link or node fails, TE FRR rapidly switches traffic to a backup path, minimizing traffic loss.
Background
A link or node failure in an MPLS TE tunnel triggers a primary/backup CR-LSP switchover.
During the switchover, IGP routes converge to a backup CR-LSP, and CSPF recalculates a
path over which the primary CR-LSP is reestablished. Traffic is dropped during this process.
TE FRR can be used to minimize traffic loss. TE FRR establishes a backup path that excludes
faulty links or nodes. The backup path can rapidly take over traffic, minimizing traffic loss. In
addition, the ingress attempts to reestablish the primary CR-LSP.
Benefits
TE FRR provides carrier-class local protection capabilities for MPLS TE CR-LSPs to
improve network reliability.
Related Concepts
TE FRR works in either facility or one-to-one backup mode.
Facility backup
Figure 1-1110 illustrates facility backup networking.
TE FRR working in facility backup mode establishes a bypass tunnel for each link or
node that may fail on a primary tunnel. A bypass tunnel can protect traffic on multiple
primary tunnels. TE FRR in facility backup mode is configured to establish a single
bypass tunnel to protect primary tunnels. This mode is extensible, resource efficient, and
easy to implement. Bypass tunnels must be manually planned and configured, which is
time-consuming and laborious on a complex network.
One-to-one backup
Figure 1-1111 illustrates one-to-one backup networking.
Table 1-341 Nodes and paths that support facility and one-to-one backup
Table 1-342 describes TE FRR protection functions implemented in facility and one-to-one
backup modes.
Table 1-342 TE FRR protection functions implemented in facility and one-to-one backup modes
Classified by protected object:
Node protection: A PLR and an MP are indirectly connected. A bypass CR-LSP
protects a direct link to the PLR and nodes on the primary CR-LSP's path between
the PLR and MP. Both the bypass CR-LSP in Figure 1-1110 and detour LSP 1 in
Figure 1-1111 provide node protection.
Link protection: A PLR and an MP are directly connected. A bypass CR-LSP only
protects the direct link to the PLR. Detour LSP 2 in Figure 1-1111 provides link
protection.
Classified by bandwidth to be reserved:
Bandwidth protection: In facility backup mode, a bypass CR-LSP can provide
bandwidth protection for a primary CR-LSP only when the bypass CR-LSP has
bandwidth higher than or equal to that of the primary CR-LSP. In one-to-one backup
mode, a detour LSP by default has the same bandwidth as the protected primary
CR-LSP and provides bandwidth protection.
A bypass CR-LSP working in facility backup mode supports a combination of protection types. For
example, a bypass CR-LSP can implement manual, node, and bandwidth protection.
Implementation
Facility backup implementation
The process of implementing TE FRR in facility backup mode is as follows:
1. The ingress establishes a primary CR-LSP.
2. The PLR binds a bypass CR-LSP to the primary CR-LSP.
The process of searching for a suitable bypass CR-LSP is also called bypass CR-LSP
binding. Only a primary CR-LSP with the "local protection desired" flag can trigger a
binding process. The binding must be complete before a primary/bypass CR-LSP
switchover is performed. During the binding, the PLR must obtain information about the
outbound interface of the bypass CR-LSP, next hop label forwarding entry (NHLFE),
label switching router (LSR) ID of the MP, label allocated by the MP, and protection
type.
The PLR already obtains the next hop (NHOP) and next NHOP (NNHOP) of the primary
CR-LSP. The PLR establishes a bypass CR-LSP to provide a specific type of protection
based on the NHOP and NNHOP LSR IDs:
− Link protection can be provided if the egress LSR ID of the bypass CR-LSP is the
same as the NHOP LSR ID.
− Node protection can be provided if the egress LSR ID of the bypass CR-LSP is the
same as the NNHOP LSR ID.
For example, in Figure 1-1113, bypass CR-LSP 1 protects a link, and bypass CR-LSP 2
protects a node.
If multiple bypass CR-LSPs are established, the PLR selects one with the highest priority.
Protection types are prioritized in descending order: bandwidth protection, non-
bandwidth protection, node protection, link protection, manual protection, and automatic
protection. Both bypass CR-LSPs 1 and 2 shown in Figure 1-1113 are manually
configured to provide bandwidth protection. Bypass CR-LSP 1 that protects a link has a
lower priority than bypass CR-LSP 2 that protects a node. In this situation, only bypass
CR-LSP 2 can be bound to a primary CR-LSP. If bypass CR-LSP 1 protects bandwidth
and bypass CR-LSP 2 does not, only bypass CR-LSP 1 can be bound to the primary
CR-LSP.
After a bypass CR-LSP is successfully bound to the primary CR-LSP, the NHLFE of the
primary CR-LSP is recorded. The NHLFE contains the NHLFE index of the bypass
CR-LSP and the inner label assigned by the MP. The inner label is used to forward traffic
during FRR switching.
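The binding priority rules described above can be sketched as follows. The `BypassLsp` structure, its field names, and the tuple-based ordering are illustrative assumptions; they model the documented priority order (bandwidth over non-bandwidth, node over link, manual over automatic), not the device's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class BypassLsp:
    """Illustrative bypass CR-LSP attributes (hypothetical structure)."""
    name: str
    bandwidth_protection: bool  # reserves bandwidth >= primary CR-LSP
    node_protection: bool       # egress LSR ID == NNHOP LSR ID (else link protection)
    manual: bool                # manually configured (else automatically created)

def select_bypass(candidates):
    """Pick the highest-priority bypass CR-LSP for binding.

    The tuple key encodes the documented ordering, most significant
    first: bandwidth protection, then node protection, then manual
    configuration.
    """
    return max(candidates,
               key=lambda b: (b.bandwidth_protection, b.node_protection, b.manual))

# The example from the text: both bypass CR-LSPs provide bandwidth
# protection, so the node-protecting one (bypass 2) wins the binding.
bypass1 = BypassLsp("bypass-1", bandwidth_protection=True, node_protection=False, manual=True)
bypass2 = BypassLsp("bypass-2", bandwidth_protection=True, node_protection=True, manual=True)
assert select_bypass([bypass1, bypass2]).name == "bypass-2"

# If bypass 1 protects bandwidth and bypass 2 does not, bypass 1 wins.
bypass2.bandwidth_protection = False
assert select_bypass([bypass1, bypass2]).name == "bypass-1"
```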
3. The PLR detects faults.
− In link protection, a data link layer protocol is used to detect and advertise faults.
The speed of fault detection at the data link layer depends on link types.
− In node protection, a data link layer protocol is used to detect link faults. If no link
fault occurs, RSVP Hello detection or bidirectional forwarding detection (BFD) for
RSVP is used to detect faults in protected nodes.
If a link or node fault is detected, FRR switching is triggered immediately.
If node protection is enabled, only the link between the protected node and PLR is protected. The PLR
cannot detect faults in the link between the protected node and MP.
In Figure 1-1114, the bypass CR-LSP provides node protection. If the link between
LSRB and LSRC fails or LSRC fails, LSRB (PLR) swaps an inner label 1024 for an
inner label 1022, pushes an outer label 34 into a packet, and forwards the packet along
the bypass CR-LSP. After the packet arrives at LSRD, LSRD forwards the packet to
LSRE at the next hop. Figure 1-1115 illustrates the forwarding process after TE FRR
switching is complete.
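The PLR's forwarding action in this example can be sketched as a pair of label-stack operations: swap the inner label assigned by the MP, then push the bypass tunnel's outer label. The list-based stack model (top of stack at index 0) and the function name are illustrative; the label values follow the example above.

```python
def frr_switch_forward(label_stack, inner_in, inner_out, outer_bypass):
    """Sketch of the PLR's forwarding action after TE FRR switching:
    swap the incoming inner label for the one assigned by the MP,
    then push the bypass CR-LSP's outer label."""
    assert label_stack[0] == inner_in
    stack = [inner_out] + label_stack[1:]  # swap the inner label
    return [outer_bypass] + stack          # push the outer label

# The example from the text: LSRB (PLR) swaps inner label 1024 for 1022
# and pushes outer label 34 before forwarding along the bypass CR-LSP.
assert frr_switch_forward([1024], inner_in=1024, inner_out=1022, outer_bypass=34) == [34, 1022]
```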
Except the egress, each node on the primary CR-LSP attempts to establish a detour LSP
to protect a downstream link or node. Only qualified nodes can function as PLRs and
establish detour LSPs over paths calculated using CSPF.
Each PLR obtains NHOP information. A PLR establishes a detour LSP to provide a
specific type of protection:
− Link protection is provided if the MP LSR ID on a detour LSP is the same as the
NHOP LSR ID. Detour LSP 2 in Figure 1-1116 provides link protection.
− Node protection is provided if the MP LSR ID on a detour LSP differs from the
NHOP LSR ID when other nodes exist between the PLR and MP. Detour LSP 1 in
Figure 1-1116 provides node protection.
If a PLR can establish detour LSPs that provide both link and node protection, the PLR
only establishes a detour LSP that supports node protection.
3. A PLR detects faults.
− In link protection, a data link layer protocol is used to detect and advertise faults.
The speed of fault detection at the data link layer depends on link types.
− In node protection, a data link layer protocol is used to detect link faults. If no link
fault occurs, RSVP Hello detection or BFD is used to detect faults in a protected
node.
If a link or node fault is detected, FRR switching is triggered immediately.
If node protection is enabled, only the link between the protected node and PLR is protected. The PLR
cannot detect faults in the link between the protected node and MP.
detour LSP (named detour LSP 1, for example). LSRE swaps label 36 for label 37 and
sends the packet to LSRC. Detour LSP 1 overlaps the primary CR-LSP since LSRC.
Therefore, LSRC uses a label for the primary CR-LSP and sends the packet to the egress
LSRD.
5. The ingress on the primary CR-LSP performs a traffic switchback.
After performing a traffic switchover, the ingress on the primary CR-LSP attempts to
reestablish a modified CR-LSP using the make-before-break mechanism. The ingress
then switches service traffic and RSVP messages to the established modified CR-LSP
and tears down the original primary CR-LSP.
Other Usage
When TE FRR is in the FRR-in-use state, an interface sends RSVP messages without the
interface authentication TLV to the remote interface. Upon receipt of such a message, the
remote interface skips interface authentication. To enable authentication in this situation,
configure the neighbor authentication mode.
TE FRR can be used to implement board removal protection. Board removal protection
enables a PLR to retain information about the primary CR-LSP's outbound interface that
resides on an interface board of the PLR. If the interface board is removed, the PLR rapidly
switches MPLS TE traffic to a bypass CR-LSP or a detour LSP. After the interface board is
re-installed, the PLR switches MPLS TE traffic back to the primary CR-LSP through the
outbound interface. Board removal protection protects traffic on the primary CR-LSP's
outbound interface of the PLR.
Without board removal protection, after an interface board on which a tunnel interface resides
is removed from the PLR, CR-LSP information is lost on the PLR. To prevent CR-LSP
information loss, ensure that the interface board to be removed does not have the following
interfaces: primary CR-LSP's tunnel interface, bypass CR-LSP's tunnel interface, bypass
CR-LSP's outbound interface, or detour LSP's outbound interface.
Configuring a TE tunnel interface on the PLR's IPU is recommended. If the interface board on
which the primary CR-LSP's physical outbound interface resides is removed or fails, the PLR
sets the outbound interface to the Stale state. The PLR's main control board retains
information about each FRR-enabled primary CR-LSP that passes through the outbound
interface. After the interface board is re-installed, the outbound interface becomes available
again. Each primary CR-LSP is then automatically reestablished.
Ordinary backup: A backup CR-LSP is set up only after the primary CR-LSP fails and
then takes over traffic from the primary CR-LSP. If the primary CR-LSP recovers,
traffic switches back to the primary CR-LSP.
Table 1-343 lists differences between hot-standby and ordinary CR-LSPs.
Best-effort path
The hot standby function supports the establishment of best-effort paths. If both the
primary and hot-standby CR-LSPs fail, a best-effort path is established and takes over
traffic.
As shown in Figure 1-1117, the primary CR-LSP uses the path PE1 -> P1 -> PE2, and
the backup CR-LSP uses the path PE1 -> P2 -> PE2. If both the primary and backup
CR-LSPs fail, PE1 triggers the setup of a best-effort path PE1 -> P2 -> P1 -> PE2.
A best-effort path does not provide reserved bandwidth for traffic. The affinity attribute and hop limit are
configured as needed.
Path Overlapping
The path overlapping function can be configured for hot-standby CR-LSPs. This function
allows a hot-standby CR-LSP to use links of a primary CR-LSP. The hot-standby CR-LSP
protects traffic on the primary CR-LSP.
Background
Most live IP radio access networks (RANs) use ring topologies and have the access ring
separated from the aggregation ring. To improve the end-to-end and inter-ring LSP reliability,
many IP RAN carriers require isolated primary and hot-standby LSPs. The CSPF algorithm
does not meet this reliability requirement, because CSPF is a metric-based path computing
algorithm that may compute two intersecting LSPs. Specifying explicit paths can meet this
reliability requirement; this method, however, does not adapt to topology changes. Each time
a node is added to or deleted from the IP RAN, operators must configure new explicit paths,
which is time-consuming and laborious. To resolve these problems, you can configure isolated
LSP computation.
Figure 1-1118 illustrates an IP RAN on which an MPLS TE tunnel is established between a
cell site gateway (CSG) on the access ring and a radio service gateway (RSG) on the
aggregation ring. The MPLS TE tunnel implements the end-to-end virtual private network
(VPN) service. To improve the network reliability, this network requires the constraint-based
routed label switched path (CR-LSP) hot standby feature and isolated primary and
hot-standby LSPs.
Without the isolated LSP computation feature, CSPF on this network will compute CSG ->
ASG1 -> ASG2 -> RSG as the primary LSP. This LSP does not have an isolated hot-standby
LSP. However, two isolated LSPs exist on this network: CSG -> ASG1 -> RSG and CSG ->
ASG2 -> RSG. With the isolated LSP computation feature, the disjoint and CSPF algorithms
work simultaneously to get the two isolated LSPs.
Figure 1-1118 Application of isolated LSP computation on an end-to-end VPN bearer network
Implementation
Isolated LSP computation is implemented by both the disjoint and CSPF algorithms. This
feature computes primary and hot-standby LSPs simultaneously and cuts off overlapping
paths of the two LSPs to get two isolated LSPs. In the example shown in Figure 1-1119,
before isolated LSP computation is configured, CSPF computes LSRA -> LSRB -> LSRC ->
LSRD as the primary LSP and LSRA -> LSRC -> LSRD as the hot-standby LSP if path
overlapping is allowed. These two LSPs intersect, so that they do not meet the reliability
requirement.
After isolated LSP computation is configured, the disjoint and CSPF algorithms compute
LSRA -> LSRB -> LSRD as the primary LSP and LSRA -> LSRC -> LSRD as the
hot-standby LSP. These two LSPs do not intersect, so that they meet the reliability
requirement.
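Under stated assumptions, the behavior above can be approximated by a two-pass sketch: compute a primary path, remove its links, and recompute. The adjacency-dict topology, link costs, and node names are hypothetical, and the device's actual disjoint algorithm is not published here; a two-pass approach can also fail on topologies where a true disjoint algorithm succeeds, which is consistent with the feature being best-effort.

```python
from heapq import heappush, heappop

def shortest_path(graph, src, dst, banned_edges=frozenset()):
    """Dijkstra over an adjacency dict {node: {neighbor: cost}},
    skipping any edge listed in banned_edges (frozensets of endpoints)."""
    dist, prev, seen = {src: 0}, {}, set()
    heap = [(0, src)]
    while heap:
        d, u = heappop(heap)
        if u in seen:
            continue
        seen.add(u)
        if u == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        for v, c in graph[u].items():
            if frozenset((u, v)) in banned_edges:
                continue
            if d + c < dist.get(v, float("inf")):
                dist[v], prev[v] = d + c, u
                heappush(heap, (d + c, v))
    return None

def isolated_pair(graph, src, dst):
    """Two-pass sketch: compute a primary path, then recompute with the
    primary's links excluded to obtain a link-disjoint hot-standby path."""
    primary = shortest_path(graph, src, dst)
    used = {frozenset(e) for e in zip(primary, primary[1:])}
    return primary, shortest_path(graph, src, dst, banned_edges=used)

# Hypothetical square topology: two disjoint paths exist between LSRA and LSRD.
topo = {"LSRA": {"LSRB": 1, "LSRC": 2}, "LSRB": {"LSRA": 1, "LSRD": 1},
        "LSRC": {"LSRA": 2, "LSRD": 2}, "LSRD": {"LSRB": 1, "LSRC": 2}}
primary, backup = isolated_pair(topo, "LSRA", "LSRD")
assert primary == ["LSRA", "LSRB", "LSRD"]
assert backup == ["LSRA", "LSRC", "LSRD"]
```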
Isolated LSP computation is a best-effort technique. If the disjoint and CSPF algorithms cannot get
isolated primary and hot-standby LSPs or two isolated LSPs do not exist, the device uses the
primary and hot-standby LSPs computed by CSPF.
The disjoint algorithm cannot work together with the following features: explicit path, affinity
property, and hop limit. Therefore, before you configure isolated LSP computation, check that all
those features are disabled. Otherwise, the device does not allow you to configure isolated LSP
computation. After you configure isolated LSP computation, the device does not allow you to
configure any of those features, either.
After you configure isolated LSP computation, the shared risk link group (SRLG), if configured,
becomes ineffective.
Usage Scenario
Isolated LSP computation applies to networks on which Resource Reservation Protocol -
Traffic Engineering (RSVP-TE) tunnels and the hot standby feature are configured.
Benefits
Isolated LSP computation offers the following benefits to carriers:
Improves the network reliability.
Reduces the maintenance workload.
Background
If a device is unable to store new link state protocol data units (LSPs) or use LSPs to update
its link state database (LSDB), the device will calculate incorrect routes, causing
forwarding failures. The IS-IS overload function places such a device in the IS-IS overload
state to prevent these forwarding failures. The association between CR-LSP establishment
and the IS-IS overload function configures the ingress to establish a CR-LSP that excludes
the overloaded IS-IS device, helping the CR-LSP reliably transmit MPLS TE traffic.
Related Concepts
IS-IS overload state
When a device cannot store new LSPs or use LSPs to update its LSDB, the device will
incorrectly calculate IS-IS routes and enters the overload state. For example, an IS-IS
device becomes overloaded if its available memory drops to a specified threshold or if an
exception occurs on the device. A device can also be manually configured to enter the
IS-IS overload state.
Implementation
In Figure 1-1120, RT1 supports the association between CR-LSP establishment and the IS-IS
overload function. RT3 and RT4 support the IS-IS overload function.
In Figure 1-1120, devices RT1 to RT4 are in an IS-IS area. RT1 establishes a CR-LSP named
Tunnel1 destined for RT2 along the path RT1 -> RT3 -> RT2. Association between the
CR-LSP establishment and IS-IS overload is implemented as follows:
1. If RT3 enters the IS-IS overload state, IS-IS propagates packets carrying overload
information in the IS-IS area.
2. RT1 determines that RT3 is overloaded and re-calculates the CR-LSP destined for RT2.
3. RT1 calculates a new path RT1 -> RT4 -> RT2, which bypasses the overloaded IS-IS
node. Then RT1 establishes a new CR-LSP along this path.
4. After the new CR-LSP is established, RT1 switches traffic from the original CR-LSP to
the new CR-LSP, ensuring service transmission quality.
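The recalculation in steps 2 and 3 can be sketched as pruning the overloaded node from the topology before recomputing the path. The adjacency-dict topology and the minimal BFS standing in for CSPF are illustrative assumptions.

```python
from collections import deque

def exclude_overloaded(graph, overloaded):
    """Remove overloaded IS-IS nodes (and links to them) from the
    topology before path computation."""
    return {u: {v: c for v, c in nbrs.items() if v not in overloaded}
            for u, nbrs in graph.items() if u not in overloaded}

def any_path(graph, src, dst):
    """Minimal BFS path finder standing in for CSPF in this sketch."""
    q, prev = deque([src]), {src: None}
    while q:
        u = q.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in graph[u]:
            if v not in prev:
                prev[v] = u
                q.append(v)
    return None

# Topology modeled on Figure 1-1120: RT1 reaches RT2 via RT3 or RT4.
topo = {"RT1": {"RT3": 1, "RT4": 1}, "RT3": {"RT1": 1, "RT2": 1},
        "RT4": {"RT1": 1, "RT2": 1}, "RT2": {"RT3": 1, "RT4": 1}}
assert any_path(topo, "RT1", "RT2") == ["RT1", "RT3", "RT2"]

# After RT3 advertises the overload state, it is pruned and the new
# path bypasses it, as in step 3 of the text.
assert any_path(exclude_overloaded(topo, {"RT3"}), "RT1", "RT2") == ["RT1", "RT4", "RT2"]
```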
1.12.4.5.6 SRLG
The shared risk link group (SRLG) functions as a constraint that is used to calculate a backup
path in the scenario where CR-LSP hot standby or TE FRR is used. This constraint helps
prevent backup and primary paths from overlapping over links with the same risk level,
improving MPLS TE tunnel reliability as a consequence.
Background
Carriers use CR-LSP hot standby or TE FRR to improve MPLS TE tunnel reliability.
However, in real-world situations protection failures can occur, requiring the SRLG technique
to be configured as a preventative measure, as the following example demonstrates.
The primary tunnel is established over the path PE1 → P1 → P2 → PE2 on the network
shown in Figure 1-1121. The link between P1 and P2 is protected by a TE FRR bypass tunnel
established over the path P1 → P3 → P2.
In the lower part of Figure 1-1121, core nodes P1, P2, and P3 are connected using a transport
network device. They share some transport network links marked in yellow. If a fault occurs
on a shared link, both the primary and FRR bypass tunnels are affected, causing an FRR
protection failure. An SRLG can be configured to prevent the FRR bypass tunnel from sharing
a link with the primary tunnel, ensuring that FRR properly protects the primary tunnel.
Related Concepts
An SRLG is a set of links at the same risk of faults. If one link in an SRLG fails, the other
links in the group may also fail. If a link in this group is used by a hot-standby CR-LSP or
an FRR bypass tunnel, the hot-standby CR-LSP or FRR bypass tunnel cannot provide
reliable protection.
Implementation
The SRLG link attribute is a number; links configured with the same SRLG number belong
to the same SRLG.
Interior Gateway Protocol (IGP) TE advertises SRLG information to all nodes in a single
MPLS TE domain. The constraint shortest path first (CSPF) algorithm uses the SRLG
attribute together with other constraints, such as bandwidth, to calculate a path.
The MPLS TE SRLG works in either of the following modes:
Strict mode: The SRLG attribute is a necessary constraint used by CSPF to calculate a
path for a hot-standby CR-LSP or an FRR bypass tunnel.
Preferred mode: The SRLG attribute is an optional constraint used by CSPF to calculate
a path for a hot-standby CR-LSP or FRR bypass tunnel. For example, if CSPF fails to
calculate a path for a hot-standby CR-LSP based on the SRLG attribute, CSPF
recalculates the path, regardless of the SRLG attribute.
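The strict/preferred distinction can be sketched as follows; `compute_path`, the `Link` structure, and the SRLG numbers are illustrative stand-ins for CSPF and the link database.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Link:
    """Illustrative link record carrying its SRLG numbers."""
    name: str
    srlgs: frozenset

def cspf_with_srlg(compute_path, links, primary_srlgs, mode):
    """Sketch of strict vs. preferred SRLG handling.

    compute_path is a stand-in for CSPF: it receives the set of usable
    links and returns a path (or None if no path exists). In strict
    mode the SRLG constraint is mandatory; in preferred mode CSPF
    retries without the constraint when the first attempt fails.
    """
    usable = [l for l in links if not (l.srlgs & primary_srlgs)]
    path = compute_path(usable)
    if path is None and mode == "preferred":
        path = compute_path(links)  # fall back: ignore the SRLG attribute
    return path

# Hypothetical example: the only candidate link shares SRLG 10 with the
# primary path, so strict mode fails while preferred mode falls back.
pick_first = lambda ls: ls[0].name if ls else None
risky = Link("P1-P3", frozenset({10}))
assert cspf_with_srlg(pick_first, [risky], frozenset({10}), "strict") is None
assert cspf_with_srlg(pick_first, [risky], frozenset({10}), "preferred") == "P1-P3"
```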
Usage Scenario
The SRLG attribute is used in either the TE FRR or CR-LSP hot-standby scenario.
Benefits
The SRLG attribute limits the selection of a path for a hot-standby CR-LSP or an FRR bypass
tunnel, which prevents the primary and bypass tunnels from sharing links with the same risk
level.
Related Concepts
Concepts related to a tunnel protection group are as follows:
Working tunnel: a tunnel to be protected.
Protection tunnel: a tunnel that protects a working tunnel.
Protection switchover: switches traffic from a faulty working tunnel to a protection
tunnel in a tunnel protection group, which improves network reliability.
Figure 1-1122 illustrates a tunnel protection group.
The primary tunnel tunnel-1 and the protection tunnel tunnel-2 are established on the ingress
LSRA on the network shown in Figure 1-1122.
Tunnel-2 is configured as a protection tunnel for primary tunnel tunnel-1 on LSRA. If the
configured fault detection mechanism on the ingress detects a fault in tunnel-1, traffic
switches to tunnel-2. LSRA attempts to reestablish tunnel-1. If tunnel-1 is successfully
established, traffic switches back to the primary tunnel.
Implementation
An MPLS TE tunnel protection group uses a configured protection tunnel to protect traffic on
the working tunnel, improving tunnel reliability. To ensure effective protection, plan the
network so that the protection tunnel excludes the links and nodes through which the
working tunnel passes.
Table 1-344 describes the implementation procedure of a tunnel protection group.
1. Establishment: The working and protection tunnels must have the same ingress and
destination address. The protection tunnel is established in the same procedure as a
regular tunnel and can use attributes that differ from those of the working tunnel.
Ensure that the working and protection tunnels are established over different paths as
much as possible.
NOTE
A protection tunnel cannot itself be protected or enabled with TE FRR.
2. Binding between the working and protection tunnels: The protection tunnel is bound to
the tunnel ID of the working tunnel so that the two tunnels form a tunnel protection
group.
3. Fault detection: MPLS OAM/MPLS-TP OAM is used to detect faults in a tunnel
protection group to speed up protection switching.
4. Protection switching: The tunnel protection group supports either of the following
protection switching modes:
− Manual switching: Traffic is forcibly switched to the protection tunnel.
− Automatic switching: Traffic automatically switches to the protection tunnel if the
working tunnel fails. A time interval can be set for automatic switching.
An MPLS TE tunnel protection group supports only bidirectional switching. If a traffic
switchover is performed for traffic in one direction, a traffic switchover is also
performed for traffic in the opposite direction.
5. Switchback: After a traffic switchover is implemented, the ingress attempts to
reestablish the working tunnel. If the working tunnel is reestablished, the ingress can
switch traffic back to the working tunnel or continue to forward traffic over the
protection tunnel.
Table 1-345 Comparison between CR-LSP backup and a tunnel protection group
On the network shown in Figure 1-1123, BFD is disabled. If LSRE fails, LSRA or LSRF
cannot promptly detect the fault because a Layer 2 switch exists between them. Although the
Hello mechanism detects the fault, detection lasts for a long time.
After BFD is enabled, if LSRE fails, LSRA and LSRF detect the fault rapidly, and traffic
switches to the path LSRA -> LSRB -> LSRD -> LSRF.
BFD for TE detects faults in a CR-LSP. After detecting a fault in a CR-LSP, BFD for TE
immediately notifies the forwarding plane of the fault to rapidly trigger a traffic switchover.
BFD for TE is usually used together with a hot-standby CR-LSP.
The concepts associated with BFD are as follows:
Static BFD session: established by manually setting the local and remote discriminators.
The local discriminator on a local node must match the remote discriminator on a remote
node. The minimum intervals at which BFD packets are sent and received are
changeable after a static BFD session is established.
Dynamic BFD session: established without a local or remote discriminator specified.
After a routing protocol neighbor is established between the local and remote nodes, the
RM delivers parameters to instruct the BFD module to establish a BFD session. The two
nodes negotiate the local discriminator, remote discriminator, minimum interval at which
BFD packets are sent, and minimum interval at which BFD packets are received.
Detection period: an interval at which the system checks the BFD session status. If no
packet is received from the remote end within a detection period, the BFD session is
considered Down.
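The detection-period check can be sketched as below. The document only names the detection period; the multiplier-times-interval formula follows common BFD practice (RFC 5880) and is an assumption here, as are the parameter names and millisecond units.

```python
def bfd_session_state(last_rx_time, now, detect_multiplier, negotiated_interval):
    """Sketch of the detection-period check: the session is declared
    Down when no packet arrives within
    detect_multiplier * negotiated_interval (values in milliseconds)."""
    detection_period = detect_multiplier * negotiated_interval
    return "Down" if now - last_rx_time > detection_period else "Up"

# With a 3 x 10 ms detection period, a 25 ms gap keeps the session Up,
# while a 35 ms gap brings it Down and triggers the traffic switchover.
assert bfd_session_state(last_rx_time=0, now=25, detect_multiplier=3, negotiated_interval=10) == "Up"
assert bfd_session_state(last_rx_time=0, now=35, detect_multiplier=3, negotiated_interval=10) == "Down"
```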
A BFD session is bound to a CR-LSP. A BFD session is set up between the ingress and egress.
A BFD packet is sent by the ingress to the egress along a CR-LSP. Upon receipt, the egress
responds to the BFD packet. The ingress can rapidly monitor the status of links through which
the CR-LSP passes based on whether a reply packet is received.
If a link fault is detected, BFD notifies the forwarding module of the fault. The forwarding
module searches for a backup CR-LSP and switches traffic to the backup CR-LSP. In addition,
the forwarding module reports the fault to the control plane. If dynamic BFD for TE CR-LSP
is used, the control plane proactively creates a BFD session to detect faults in the backup
CR-LSP. If static BFD for TE CR-LSP is used, a BFD session is created manually to detect
faults in the backup CR-LSP if necessary.
On the network shown in Figure 1-1124, a BFD session is set up to detect faults in the link
through which the primary CR-LSP passes. If a link fault occurs, the BFD session on the
ingress immediately notifies the forwarding plane of the fault. The ingress switches traffic to
the bypass CR-LSP and sets up a new BFD session to detect faults in the bypass CR-LSP.
On the network shown in Figure 1-1125, a primary CR-LSP is established along the path
LSRA -> LSRB, and a hot-standby CR-LSP is configured. A BFD session is set up between
LSRA and LSRB to detect faults in the primary CR-LSP. If a fault occurs on the primary
CR-LSP, the BFD session rapidly notifies LSRA of the fault. After receiving the fault
information, LSRA rapidly switches traffic to the hot-standby CR-LSP to ensure traffic
continuity.
Benefits
No tunnel protection is provided by the NG-MVPN over P2MP TE function or the VPLS
over P2MP TE function alone. If a tunnel fails, traffic can only be restored through slow,
route change-induced hard convergence. Dual-root 1+1 protection addresses this for both
functions: if a P2MP TE tunnel fails, BFD for P2MP TE rapidly detects the fault and
switches traffic, which improves fault convergence performance and reduces traffic loss.
Principles
In Figure 1-1126, BFD is enabled on the root PE1 and the backup root PE2. Leaf nodes UPE1
to UPE4 are enabled to passively create BFD sessions. Both PE1 and PE2 send BFD packets
to all leaf nodes along P2MP TE tunnels. The leaf nodes receive only the BFD packets
transmitted on the primary tunnel. If a leaf node receives detection packets within a specified
interval, the link between the root node and leaf node is working properly. If a leaf node fails
to receive BFD packets within a specified interval, the link between the root node and leaf
node fails. The leaf node then rapidly switches traffic to a protection tunnel, which reduces
traffic loss.
Background
When a Layer 2 device is deployed on a link between two RSVP nodes, an RSVP node can
only use the Hello mechanism to detect a link fault. For example, on the network shown in
Figure 1-1127, a switch exists between P1 and P2. If a fault occurs on the link between the
switch and P2, P1 keeps sending Hello packets and detects the fault after it fails to receive
replies to the Hello packets. The fault detection latency causes seconds of traffic loss. To
minimize packet loss, BFD for RSVP can be configured. BFD rapidly detects a fault and
triggers TE FRR switching, which improves network reliability.
Implementation
BFD for RSVP monitors RSVP neighbor relationships.
Unlike BFD for CR-LSP and BFD for TE that support multi-hop BFD sessions, BFD for
RSVP establishes only single-hop BFD sessions between RSVP nodes to monitor the network
layer.
BFD for RSVP, BFD for OSPF, BFD for IS-IS, and BFD for BGP can share a BFD session.
When protocol-specific BFD parameters are set for a BFD session shared by RSVP and other
protocols, the smallest values take effect. The parameters include the minimum intervals at
which BFD packets are sent, minimum intervals at which BFD packets are received, and local
detection multipliers.
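The smallest-value rule for a shared session can be sketched as follows; the dictionary keys and example values are illustrative.

```python
def effective_bfd_params(per_protocol_params):
    """When RSVP, OSPF, IS-IS, and BGP share one BFD session, the
    smallest configured value of each parameter takes effect, as
    described in the text."""
    return {
        "min_tx_interval": min(p["min_tx_interval"] for p in per_protocol_params),
        "min_rx_interval": min(p["min_rx_interval"] for p in per_protocol_params),
        "detect_multiplier": min(p["detect_multiplier"] for p in per_protocol_params),
    }

# Hypothetical per-protocol settings: the shared session takes the
# minimum of each parameter across the protocols.
rsvp = {"min_tx_interval": 100, "min_rx_interval": 100, "detect_multiplier": 4}
isis = {"min_tx_interval": 50, "min_rx_interval": 200, "detect_multiplier": 3}
assert effective_bfd_params([rsvp, isis]) == {
    "min_tx_interval": 50, "min_rx_interval": 100, "detect_multiplier": 3}
```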
Usage Scenario
BFD for RSVP applies to a network on which a Layer 2 device exists between the TE FRR
point of local repair (PLR) on a bypass CR-LSP and an RSVP node on the primary CR-LSP.
Benefits
BFD for RSVP improves reliability on MPLS TE networks with Layer 2 devices.
1.12.4.5.12 RSVP GR
RSVP graceful restart (GR) is a status recovery mechanism supported by RSVP-TE.
RSVP GR is designed based on non-stop forwarding (NSF). If a fault occurs on the control
plane of a node, the upstream and downstream neighbor nodes send messages to restore
RSVP soft states, but the forwarding plane does not detect the fault and is not affected. This
helps stably and reliably transmit traffic.
RSVP GR uses the Hello extension to detect the neighboring nodes' GR status. For more
information about the Hello feature, see .
RSVP GR principles are as follows:
On the network shown in Figure 1-1128, if the restarter performs GR, it stops sending Hello
messages to its neighbors. If the GR-enabled helpers fail to receive three consecutive Hello
messages, the helpers consider that the restarter is performing GR and retain all forwarding
information. In addition, the interface board continues transmitting services and waits for the
restarter to restore the GR status.
After the restarter restarts, if it receives Hello Path messages from helpers, it replies with
Hello ACK messages. The types of the Hello messages returned by the upstream and
downstream nodes on a tunnel are different:
If an upstream helper receives a Hello message, it sends a GR Path message downstream
to the restarter.
If a downstream helper receives a Hello message, it sends a Recovery Path message
upstream to the restarter.
Figure 1-1128 Networking diagram for restoring the GR status by sending GR Path and Recovery
Path messages
If both the GR Path and Recovery Path messages are received, the restarter creates a new
path state block (PSB) for the CR-LSP, restoring information about the CR-LSP on the
control plane.
If no Recovery Path message is sent and only a GR Path message is received, the restarter
creates the PSB based on the GR Path message alone, which likewise restores the CR-LSP
information on the control plane.
The NE20E can only function as a GR Helper to help a neighbor node to complete RSVP GR.
Background
On a network with a static bidirectional co-routed CR-LSP used to transmit services, if a few
packets are dropped or bit errors occur on links, no alarms indicating link or LSP failures are
generated, which poses difficulties in locating the faults. To locate the faults, loopback
detection can be enabled for the static bidirectional co-routed CR-LSP.
Implementation
To implement loopback detection for a specified static bidirectional co-routed CR-LSP, a
transit node temporarily connects the forward CR-LSP to the reverse CR-LSP and generates a
forwarding entry for the loop so that the transit node can loop all traffic back to the ingress. A
professional monitoring device connected to the ingress monitors data packets that the ingress
sends and receives and checks whether a fault occurs on the link between the ingress and
transit node.
The dichotomy method is used to perform loopback detection by reducing the range of nodes
to be monitored before locating a faulty node. For example, in Figure 1-1129, loopback
detection is enabled for a static bidirectional co-routed CR-LSP established between PE1
(ingress) and PE2 (egress). The process of using loopback detection to locate a fault is as
follows:
1. Loopback is enabled on P1 to loop data packets back to the ingress. The ingress checks
whether the sent packets match the received ones.
− If the packets do not match, a fault occurs on the link between PE1 and P1.
Loopback detection can then be disabled on P1.
− If the packets match, the link between PE1 and P1 is working properly. The fault
location continues.
2. Loopback is disabled on P1 and enabled on P2 to loop data packets back to the ingress.
The ingress checks whether the sent packets match the received ones.
− If the packets do not match, a fault occurs on the link between P1 and P2. Loopback
detection can then be disabled on P2.
− If the packets match, a fault occurs on the link between P2 and PE2. Loopback
detection can then be disabled on P2.
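The dichotomy method above is a binary search over the transit nodes. The sketch below is illustrative: `link_ok` stands in for "loopback on this node returns packets that match what the ingress sent", and the node names are hypothetical.

```python
def locate_fault(transit_nodes, link_ok):
    """Binary-search sketch of the dichotomy method: enable loopback on
    a midpoint node, check whether the looped-back packets match what
    the ingress sent, and halve the search range accordingly.

    link_ok(i) means the path from the ingress up to transit_nodes[i]
    is healthy. Returns the index of the first node whose upstream
    segment is faulty; len(transit_nodes) means the fault lies between
    the last transit node and the egress.
    """
    lo, hi = 0, len(transit_nodes)
    while lo < hi:
        mid = (lo + hi) // 2
        if link_ok(mid):   # path ingress..transit_nodes[mid] is fine
            lo = mid + 1   # the fault lies further downstream
        else:
            hi = mid       # the fault lies at or before this segment
    return lo

# Hypothetical example: the fault sits on the link entering P3, so
# loopback succeeds on P1 and P2 but fails from P3 onward.
nodes = ["P1", "P2", "P3", "P4"]
assert locate_fault(nodes, link_ok=lambda i: i < 2) == 2
```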
Loopback detection information is not saved in a configuration file after loopback detection is enabled.
A loopback detection-enabled node loops traffic back to the ingress through a temporary loop. Loopback
alarms can then be generated to prompt users that loopback detection is performed. After loopback
detection finishes, it can be manually or automatically disabled. Loopback detection configuration takes
effect only on a main control board. After a master/slave main control board switchover is performed,
loopback detection is automatically disabled.
Benefits
Loopback detection for a static bidirectional co-routed CR-LSP helps rapidly locate faults,
such as minor packet loss or bit errors, and improves network operation and maintenance
efficiency.
Principles
RSVP messages are sent over Raw IP, which provides no security mechanism. These
messages are easy to modify, and a device receiving them is exposed to attacks.
RSVP authentication prevents the following situations and improves device security:
An unauthorized remote router sets up an RSVP neighbor relationship with the local
router.
A remote router constructs forged RSVP messages to set up an RSVP neighbor
relationship with the local router and initiates attacks (such as maliciously reserving a
large number of bandwidths) to the local router.
RSVP authentication parameters are as follows:
Key
The same key must be configured on two RSVP nodes before they perform RSVP
authentication. A node uses this key to compute a digest for a packet to be sent based on
the HMAC (Keyed-Hashing for Message Authentication)-Message Digest 5 (MD5)
algorithm or Secure Hash Algorithm (SHA). The packet carrying the digest as an
integrity object is sent to a remote node. After receiving the packet, the remote node uses
the same key and algorithm to compute a digest for the packet, and compares the
computed digest with the one carried in the packet. If they are the same, the packet is
accepted; if they are different, the packet is discarded.
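The digest computation and comparison can be sketched with Python's standard hmac module. The key and message bytes below are illustrative, and real RSVP INTEGRITY processing hashes the message with the digest field zeroed, which this sketch omits:

```python
import hmac
import hashlib

def compute_digest(key: bytes, message: bytes, algorithm: str = "md5") -> bytes:
    """Compute an HMAC digest over an RSVP message (illustrative sketch)."""
    return hmac.new(key, message, getattr(hashlib, algorithm)).digest()

def verify_digest(key: bytes, message: bytes, received: bytes,
                  algorithm: str = "md5") -> bool:
    """Recompute the digest with the shared key and compare in constant time."""
    expected = compute_digest(key, message, algorithm)
    return hmac.compare_digest(expected, received)

key = b"shared-rsvp-key"          # must be identical on both RSVP neighbors
msg = b"RSVP PATH message bytes"  # digest travels in an INTEGRITY object

digest = compute_digest(key, msg, "sha1")
assert verify_digest(key, msg, digest, "sha1")               # same key: packet accepted
assert not verify_digest(b"wrong-key", msg, digest, "sha1")  # key mismatch: packet discarded
```

The same flow applies with HMAC-MD5; only the hash function passed to `hmac.new` changes.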
Sequence number
Each packet is assigned a 64-bit monotonically increasing sequence number
before being sent, which prevents replay attacks. After receiving the packet, the remote
node checks whether or not the sequence number is in an allowable window. If the
sequence number in the packet is smaller than the lower limit defined in the window, the
receiver considers the packet as a replay packet and discards it.
RSVP authentication also introduces handshake messages. If a receiver receives the first
packet from a transmit end or packet mis-sequence occurs, handshake messages are used
to synchronize the sequence number windows between the RSVP neighboring nodes.
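The receiver-side window check can be sketched as follows; the window size and the exact lower-limit rule are assumptions for illustration (real RSVP implementations synchronize windows through the handshake described above and also detect duplicates within the window):

```python
class ReplayWindow:
    """Sketch of the receiver-side sequence-number check (assumed window logic)."""
    def __init__(self, window_size: int = 32):
        self.window_size = window_size
        self.highest = None  # highest sequence number accepted so far

    def accept(self, seq: int) -> bool:
        if self.highest is None:      # first packet: real RSVP syncs via handshake
            self.highest = seq
            return True
        lower_limit = self.highest - self.window_size + 1
        if seq < lower_limit:         # below the window: treated as a replay, discarded
            return False
        self.highest = max(self.highest, seq)
        return True

win = ReplayWindow(window_size=32)
assert win.accept(1000)      # first packet accepted
assert win.accept(1005)      # monotonically increasing: accepted
assert win.accept(990)       # still inside the allowable window: accepted
assert not win.accept(100)   # far below the lower limit: replay, discarded
```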
Authentication lifetime
Network flapping causes an RSVP neighbor relationship to be alternately deleted and
re-created. Each time the RSVP neighbor relationship is created, the handshake
process is performed, which delays the establishment of a CR-LSP. The RSVP
authentication lifetime is introduced to resolve the problem. If a network flaps, a
CR-LSP is deleted and created. During the deletion, the RSVP neighbor relationship
associated with the CR-LSP is retained until the RSVP authentication lifetime expires.
Background
Service packets exchanged by two nodes must travel through the same links and nodes on a
transport network without running a routing protocol. Co-routed bidirectional static CR-LSPs
can be used to meet this requirement.
Definition
A co-routed bidirectional static CR-LSP is a type of CR-LSP over which two flows are
transmitted in opposite directions over the same links. A co-routed bidirectional static
CR-LSP is established manually.
A co-routed bidirectional static CR-LSP differs from two LSPs that transmit traffic in opposite
directions. Two unidirectional CR-LSPs bound to a co-routed bidirectional static CR-LSP
function as a single CR-LSP. Two forwarding tables are used to forward traffic in opposite
directions. The co-routed bidirectional static CR-LSP can go Up only when the conditions for
forwarding traffic in opposite directions are met. If the conditions for forwarding traffic in one
direction are not met, the bidirectional CR-LSP is in the Down state. Even if no IP forwarding
capability is enabled on the bidirectional CR-LSP, any intermediate node on the
bidirectional LSP can reply with a packet along the original path. The co-routed bidirectional
static CR-LSP supports the consistent delay and jitter for packets transmitted in opposite
directions, which guarantees QoS for traffic transmitted in opposite directions.
Implementation
A bidirectional co-routed static CR-LSP is manually established. A user manually specifies
labels and forwarding entries mapped to two FECs for traffic transmitted in opposite
directions. The outgoing label of a local node (also known as an upstream node) is equal to
the incoming label of a downstream node of the local node.
A node on a co-routed bidirectional static CR-LSP only has information about the local LSP
and cannot obtain information about nodes on the other LSP. A co-routed bidirectional static
CR-LSP shown in Figure 1-1130 consists of a CR-LSP and a reverse CR-LSP. The CR-LSP
originates from the ingress and terminates on the egress. Its reverse CR-LSP originates from
the egress and terminates on the ingress.
On the ingress, configure a tunnel interface and enable MPLS TE on the outbound
interface of the ingress. If the outbound interface is Up and has available bandwidth
higher than the bandwidth to be reserved, the associated bidirectional static CR-LSP can
go Up, regardless of the existence of transit nodes or the egress node.
On each transit node, enable MPLS TE on the outbound interface of the bidirectional
CR-LSP. If the outbound interface is Up and has available bandwidth higher than the
bandwidth to be reserved for the forward and reverse CR-LSPs, the associated
bidirectional static CR-LSP can go Up, regardless of the existence of the ingress, other
transit nodes, or the egress node.
On the egress, enable MPLS TE on the inbound interface. If the inbound interface is Up
and has available bandwidth higher than the bandwidth to be reserved for the
bidirectional CR-LSP, the associated bidirectional static CR-LSP can go Up, regardless
of the existence of the ingress node or transit nodes.
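The per-node conditions above reduce to a local check: each node only verifies that its own MPLS TE interface is Up and has sufficient available bandwidth, regardless of the state of other nodes. A minimal sketch with hypothetical bandwidth values:

```python
def lsp_can_go_up(interfaces):
    """A node's local check for a static bidirectional CR-LSP: every relevant
    interface must be Up with available bandwidth covering the reservation."""
    return all(intf["up"] and intf["available_bw"] >= intf["reserved_bw"]
               for intf in interfaces)

# Ingress: one MPLS TE-enabled outbound interface (values illustrative)
ingress = [{"up": True, "available_bw": 100, "reserved_bw": 50}]
assert lsp_can_go_up(ingress)

# Transit node: forward and reverse directions are both checked
transit = [
    {"up": True, "available_bw": 100, "reserved_bw": 50},  # forward CR-LSP
    {"up": True, "available_bw": 40,  "reserved_bw": 50},  # reverse CR-LSP
]
assert not lsp_can_go_up(transit)  # reverse direction lacks bandwidth: LSP stays Down
```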
Background
MPLS networks face the following challenges:
Traffic congestion: RSVP-TE tunnels are unidirectional. The ingress forwards services to
the egress along an RSVP-TE tunnel. The egress forwards services to the ingress over IP
routes. As a result, the services may be congested because IP links do not reserve
bandwidth for these services.
Traffic interruptions: Two MPLS TE tunnels in opposite directions are established
between the ingress and egress. If a fault occurs on an MPLS TE tunnel, a traffic
switchover can only be performed for the faulty tunnel, but not for the reverse tunnel. As
a result, traffic is interrupted.
A forward CR-LSP and a reverse CR-LSP between two nodes are established. Each CR-LSP
is bound to the ingress of its reverse CR-LSP. The two CR-LSPs then form an associated
bidirectional CR-LSP. The associated bidirectional CR-LSP is mainly used to prevent traffic
congestion. If a fault occurs on one end, the other end is notified of the fault so that both ends
trigger traffic switchovers, ensuring uninterrupted traffic transmission.
Implementation
Figure 1-1131 illustrates an associated bidirectional CR-LSP that consists of Tunnel1 and
Tunnel2. The implementation of the associated bidirectional CR-LSP is as follows:
MPLS TE Tunnel1 and Tunnel2 are established using RSVP-TE signaling or manually.
The tunnel ID and ingress LSR ID of the reverse CR-LSP are specified on each tunnel
interface so that the forward and reverse CR-LSPs are bound to each other. For example,
in Figure 1-1131, set the reverse tunnel ID to 200 and ingress LSR ID to 4.4.4.4 on
Tunnel1 so the reverse tunnel is bound to Tunnel1.
The ingress LSR ID of the reverse CR-LSP is the same as the egress LSR ID of the forward CR-LSP.
The forward and reverse CR-LSPs can be established over the same path or over different paths.
Establishing the forward and reverse CR-LSPs over the same path is recommended to implement the
consistent delay time.
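The binding step can be sketched as associating tunnels by the (ingress LSR ID, tunnel ID) pair configured on each tunnel interface. The reverse tunnel ID 200 and ingress LSR ID 4.4.4.4 come from the Figure 1-1131 example; Tunnel1's own key is assumed for illustration:

```python
# Tunnels keyed by (ingress LSR ID, tunnel ID); Tunnel1's key is hypothetical
tunnels = {
    ("1.1.1.1", 100): {"name": "Tunnel1", "reverse": None},
    ("4.4.4.4", 200): {"name": "Tunnel2", "reverse": None},
}

def bind_reverse(forward_key, reverse_key):
    """Bind each CR-LSP to the ingress of its reverse CR-LSP (both directions),
    forming an associated bidirectional CR-LSP."""
    if reverse_key not in tunnels:
        raise ValueError("reverse tunnel not found")
    tunnels[forward_key]["reverse"] = reverse_key
    tunnels[reverse_key]["reverse"] = forward_key

# On Tunnel1, set the reverse tunnel ID to 200 and ingress LSR ID to 4.4.4.4
bind_reverse(("1.1.1.1", 100), ("4.4.4.4", 200))
assert tunnels[("1.1.1.1", 100)]["reverse"] == ("4.4.4.4", 200)
assert tunnels[("4.4.4.4", 200)]["reverse"] == ("1.1.1.1", 100)
```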
Usage Scenario
An associated bidirectional static CR-LSP transmits services and returned OAM PDUs
on MPLS networks.
An associated bidirectional dynamic CR-LSP is used on an RSVP-TE network when
bit-error-triggered switching is used.
1.12.4.9 CBTS
Class-of-service based tunnel selection (CBTS) is a method of selecting a TE tunnel. Unlike
the traditional method of load-balancing services on TE tunnels, CBTS selects tunnels based
on services' priorities so that high quality resources can be provided for services with higher
priority. In addition, FRR and HSB can be configured for TE tunnels selected by CBTS. For
more information about FRR and HSB, see the section Configuration - MPLS - MPLS TE
Configuration - Configuring MPLS TE Manual FRR and Configuration - MPLS - MPLS TE
Configuration - Configuring CR-LSP Backup.
Background
Existing networks face a challenge that they may fail to provide exclusive high-quality
transmission resources for higher-priority services. This is because the policy for selecting TE
tunnels is based on public network routes or VPN routes, which causes a node to select the
same tunnels for services with the same destination IP or VPN address but with different
priorities.
Traffic classification can be configured on CBTS-capable devices to match incoming services
on the ingress's inbound interface against a specific match rule and map matching services to
configured priorities. A rule can be enforced based on traffic characteristics. Alternatively, a
QoS Policy Propagation Through the Border Gateway Protocol (QPPB) rule can be used
based on BGP community attributes in BGP routes.
Service class attributes can be configured on a tunnel to which services are iterated so that the
tunnel can transmit services with one or more priorities. Services with specified priorities can
only be transmitted on such tunnels, not be load-balanced by all tunnels to which they may be
iterated. The service class attribute of a tunnel can also be set to "default" so that the tunnel
transmits mismatching services with other priorities that are not specified.
Implementation
Figure 1-1132 illustrates CBTS principles. TE tunnels between LSRA and LSRB balance
services, including high-priority voice services, medium-priority Ethernet data services, and
common ATM data services. The implementation of transmitting services of each priority on a
specific tunnel is as follows:
Service classes EF, AF1+AF2, and default are configured for the three TE tunnels,
respectively.
Multi-field classification is configured on the PE to map voice services to EF and map
Ethernet services to AF1 or AF2.
After the preceding configurations are complete, voice services are transmitted along the TE
tunnel that is assigned the EF service class, Ethernet services along the TE tunnel that is
assigned the AF1+AF2 service class, and other services along the TE tunnel that is assigned
the default service class.
The default service class is not a mandatory setting. If it is not configured, mismatching services will be
transmitted along a tunnel that is assigned no service class. If every tunnel is configured with a service
class, these services will be transmitted along a tunnel that is assigned a service class mapped to the
lowest priority. The following service classes are prioritized in ascending order: BE, AF1, AF2, AF3,
AF4, EF, CS6, and CS7.
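The selection rules above (explicit class match, then the "default" class, then a tunnel with no class, then the lowest-priority class) can be sketched as a small selection function; tunnel names and class sets are illustrative:

```python
# Service classes in ascending priority order, as stated in the note above
PRIORITY = ["BE", "AF1", "AF2", "AF3", "AF4", "EF", "CS6", "CS7"]

def select_tunnel(tunnels, service_class):
    """tunnels: list of {"name": str, "classes": set}; an empty set means
    no service class is configured on that tunnel."""
    # 1. A tunnel explicitly configured with this service class wins.
    for t in tunnels:
        if service_class in t["classes"]:
            return t["name"]
    # 2. Otherwise, a tunnel assigned the "default" service class.
    for t in tunnels:
        if "default" in t["classes"]:
            return t["name"]
    # 3. Otherwise, a tunnel with no service class configured.
    for t in tunnels:
        if not t["classes"]:
            return t["name"]
    # 4. Otherwise, the tunnel whose classes include the lowest priority.
    return min(tunnels,
               key=lambda t: min(PRIORITY.index(c) for c in t["classes"]))["name"]

tunnels = [
    {"name": "Tunnel1", "classes": {"EF"}},
    {"name": "Tunnel2", "classes": {"AF1", "AF2"}},
    {"name": "Tunnel3", "classes": {"default"}},
]
assert select_tunnel(tunnels, "EF") == "Tunnel1"   # voice services
assert select_tunnel(tunnels, "AF1") == "Tunnel2"  # Ethernet data services
assert select_tunnel(tunnels, "BE") == "Tunnel3"   # mismatching -> default tunnel
```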
Usage Scenarios
TE tunnels, or TE tunnels in an LDP over TE scenario, are configured on a PE to
load-balance services.
L3VPN, VLL and VPLS services are configured on a PE. Inter-AS VPN services are not
supported.
LDP over TE is configured, and TE tunnels are established to load-balance services on a
P.
1.12.4.10 P2MP TE
Point-to-multipoint (P2MP) traffic engineering (TE) is a promising solution to multicast
service transmission. P2MP TE helps carriers provide high TE capabilities and increased
reliability on an IP/MPLS backbone network and reduce network operational expenditure
(OPEX).
Background
The proliferation of applications, such as IPTV, multimedia conferencing, and massively
multiplayer online role-playing games (MMORPGs), amplifies demands on multicast
transmission over IP/MPLS networks. These services require sufficient network bandwidth,
quality of service (QoS) capabilities, and high reliability. The following multicast solutions
are used to run multicast services, but these solutions fall short of the requirements of
multicast services or network carriers:
IP multicast technology: deployed on a live network by upgrading software. This
solution reduces upgrade and maintenance costs. However, IP multicast, similar to IP
unicast, does not support QoS or TE capabilities and provides low reliability.
Dedicated multicast network: deployed using asynchronous transfer mode (ATM) or
synchronous optical network (SONET)/synchronous digital hierarchy (SDH)
technologies. This solution provides high reliability and transmission rates, but has high
construction costs and requires separate maintenance.
IP/MPLS backbone network carriers require a multicast solution that has high TE capabilities
and can be implemented by upgrading existing devices.
P2MP TE is such a solution. It combines advantages of efficient IP multicast forwarding and
E2E MPLS TE QoS capabilities. P2MP TE establishes a tree-shape tunnel that originates
from an ingress node and is destined for multiple egress nodes and reserves bandwidth for the
multicast packets along the tree path. This provides sufficient bandwidth and QoS capabilities
for multicast services over the tunnel. In addition, a P2MP TE tunnel supports fast reroute
(FRR), which provides high reliability for multicast services.
Benefits
The P2MP TE feature deployed on an IP/MPLS backbone network offers the following
benefits:
Optimizes network bandwidth resource utilization.
Provides bandwidth assurance required by multicast services.
Eliminates the need to use Protocol Independent Multicast (PIM) in the MPLS core.
Related Concepts
The ingress cannot establish a P2MP TE tunnel after detecting either a crossover or re-merge event. A
user can modify an explicit path for a sub-LSP to resolve a crossover or re-merge problem.
Establishing a tunnel
Standard protocols define an RSVP extension that is used to establish a P2MP
TE tunnel. Similar to a P2P TE tunnel, a P2MP TE tunnel is established using
Path and Resv messages that carry RSVP-TE signaling information. Path messages
originate from the ingress and travel along an explicit path to each leaf node. Leaf nodes
reply with Resv messages in the opposite direction of Path messages. After receiving a
Resv message, a node reserves bandwidth for a sub-LSP to be established. After
receiving all Resv messages, the ingress can properly establish a P2MP TE tunnel.
Figure 1-1135 demonstrates the process for establishing a P2MP TE tunnel.
A P2MP TE tunnel is to be established between the ingress PE1 and leaf nodes PE2 and PE3.
This tunnel consists of sub-LSPs over the path PE1 -> P -> PE2 and the path PE1 -> P -> PE3.
PE1 constructs a Path message for each leaf PE and sends the messages over an explicit path.
After receiving the Path message, every leaf PE replies with a Resv message carrying a
label assigned to its upstream node. The MPLS packets share the same incoming label on
the branch node, and the branch node builds a P2MP forwarding table. For example,
P is the branch node shown in Figure 1-1136. Table 1-347 illustrates how a P2MP TE
tunnel is established.
In a VPLS over P2MP scenario or an NG MVPN over P2MP scenario, each service is transmitted
exclusively along a P2MP tunnel.
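The Resv handling above can be sketched as building the branch node's forwarding entry: one incoming label fans out to one (outbound interface, outgoing label) pair per downstream Resv. Interface names and label values below are illustrative, not taken from the figures:

```python
def build_p2mp_entry(incoming_label, resv_messages):
    """Branch node installs one incoming label mapped to a list of
    (out_interface, out_label) branches, one per downstream Resv."""
    return {incoming_label: [(m["interface"], m["label"]) for m in resv_messages]}

# Hypothetical Resv messages arriving at branch node P from its two leaves
resvs = [
    {"interface": "GE0/1", "label": 22},  # label assigned by PE2 in its Resv
    {"interface": "GE0/2", "label": 32},  # label assigned by PE3 in its Resv
]
table = build_p2mp_entry(21, resvs)       # 21: label P assigned to its upstream
assert table == {21: [("GE0/1", 22), ("GE0/2", 32)]}
```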
A P2MP TE tunnel is established on the network shown in Figure 1-1137. P2 is a branch node,
and PE2 is a bud node. Table 1-348 demonstrates the process for forwarding multicast packets
on each node over the P2MP TE tunnel.
Node | Incoming Label | Outgoing Label | Action
PE1 | N/A | L11 | Pushes an outgoing label with the value of 11 into an IP multicast packet and forwards the packet to P1.
P1 | L11 | L21 | Swaps the incoming label with an outgoing label with the value of 21 and forwards the MPLS packet to P2.
P2 (branch node) | L21 | LE22, LE42 | Replicates the packet, swaps the incoming label with an outgoing label in each copy, and forwards each copy to its next hop through a specific outbound interface.
PE2 (bud node) | LE22 | None | Replicates the packet, removes the label from one copy, and forwards that copy to the CE.
PE2 (bud node) | LE22 | LE32 | Swaps the incoming label with outgoing label LE32 in the other copy before forwarding it to PE3.
PE3 | LE32 | None | Removes the label from the packet so that the MPLS packet becomes an IP multicast packet.
PE4 | LE42 | None | Removes the label from the packet so that the MPLS packet becomes an IP multicast packet.
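As a minimal sketch of the forwarding behavior described above (label values simplified from the table, CE names illustrative), replication at branch and bud nodes can be modeled as a recursive walk over per-node label entries, where an outgoing label of None means the label is popped and the packet is delivered as IP multicast:

```python
# Per-node P2MP forwarding entries; each incoming label maps to a list of
# (next_hop, out_label) branches. out_label None = pop and deliver.
FIB = {
    "P1":  {11: [("P2", 21)]},                     # swap 11 -> 21, toward P2
    "P2":  {21: [("PE2", 22), ("PE4", 42)]},       # branch node: replicate
    "PE2": {22: [("CE-PE2", None), ("PE3", 32)]},  # bud node: pop to CE, swap to PE3
    "PE3": {32: [("CE-PE3", None)]},               # egress: pop
    "PE4": {42: [("CE-PE4", None)]},               # egress: pop
}

def forward(fib, node, label, packet):
    """Recursively replicate a packet down the P2MP tree (sketch)."""
    delivered = []
    for next_hop, out_label in fib[node][label]:
        if out_label is None:          # pop: packet leaves MPLS as IP multicast
            delivered.append((next_hop, packet))
        else:                          # swap and continue down the tree
            delivered.extend(forward(fib, next_hop, out_label, packet))
    return delivered

# Ingress PE1 pushes label 11 and sends the packet toward P1
out = forward(FIB, "P1", 11, "ip-multicast-packet")
assert [dest for dest, _ in out] == ["CE-PE2", "CE-PE3", "CE-PE4"]
```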
P2MP TE FRR
Fast reroute (FRR) can protect P2MP and P2P TE tunnels. The NE20E supports FRR link
protection, not node protection, over P2MP TE tunnels. TE FRR establishes a bypass tunnel to
protect sub-LSPs. If a link fails, traffic switches to the bypass tunnel within 50 milliseconds.
The P2P TE bypass tunnel is established over the path P1 -> P5 -> P2 on the network shown
in Figure 1-1138. It protects traffic over the link between P1 and P2. If the link between P1
and P2 fails, P1 switches traffic to the bypass tunnel destined for P2.
An FRR bypass tunnel must be manually configured. An administrator can configure an
explicit path for a bypass tunnel and determine whether or not to plan bandwidth for the
bypass tunnel.
P2P and P2MP TE tunnels can share a bypass tunnel. FRR protection functions for P2P and P2MP TE
tunnels are as follows:
A bypass tunnel with planned bandwidth can be bound to a specific number of both P2P
and P2MP tunnels in configuration sequence. The total bandwidth of the bound P2P and P2MP
tunnels must be lower than or equal to the bandwidth of the bypass tunnel.
A bypass tunnel with no bandwidth can also be bound to both P2P and P2MP TE tunnels.
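The bandwidth rule above can be sketched as a simple admission check, binding protected tunnels in configuration order until the bypass tunnel's bandwidth would be exceeded (values in Mbit/s, illustrative):

```python
def can_bind(bypass_bw, bound_bws, new_bw):
    """A bypass tunnel with planned bandwidth accepts another protected tunnel
    only if the total bound bandwidth stays within the bypass bandwidth."""
    return sum(bound_bws) + new_bw <= bypass_bw

bound = []
for tunnel_bw in [30, 40, 20, 25]:   # P2P and P2MP tunnels, in configuration order
    if can_bind(100, bound, tunnel_bw):
        bound.append(tunnel_bw)

assert bound == [30, 40, 20]         # the 25 Mbit/s tunnel would exceed 100 Mbit/s
```

A bypass tunnel configured with no bandwidth skips this check entirely, matching the second bullet above.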
Supported Function | Description
1.12.4.3.1 Tunnel Re-optimization | Enables the ingress to reestablish a CR-LSP over a better path.
Tunnel re-optimization is implemented in either of the following
modes:
Periodic re-optimization
When the specified interval for optimizing a CR-LSP expires,
Constraint Shortest Path First (CSPF) is triggered to calculate the
path of the CR-LSP. If the path calculated by CSPF has a metric
smaller than that of the existing CR-LSP, a new CR-LSP is
established along the new path. If the CR-LSP is successfully
established, the system notifies the forwarding plane to switch
traffic and tear down the original CR-LSP. After the process,
re-optimization is complete. If the CR-LSP is not set up, the
traffic is still forwarded along the existing CR-LSP.
Manual re-optimization
A re-optimization command is run in the user view to trigger
re-optimization.
1.12.4.11 Applications
1.12.4.11.1 P2MP TE Applications for IPTV
Service Overview
There is an increasing diversity of multicast services, such as IPTV, multimedia conferences,
and massively multiplayer online role-playing games (MMORPGs). These services are
transmitted over a service bearer network that provides the following
functions:
Forwards multicast traffic even during traffic congestion.
Rapidly detects network faults and switches traffic to a standby link.
Networking Description
Point-to-multipoint (P2MP) traffic engineering (TE) supported on NE20Es is used on the
IP/MPLS backbone network shown in Figure 1-1139. P2MP TE helps the network prevent
multicast traffic congestion and maintain reliability.
Feature Deployment
Figure 1-1139 illustrates how P2MP TE tunnels are used to transmit IP multicast services. The
process consists of the following stages:
Import multicast services.
Definition
Seamless MPLS is a bearer technique that extends MPLS techniques to access networks.
Seamless MPLS establishes an E2E LSP across the access, aggregation, and core layers. All
services can be encapsulated using MPLS at the access layer and transmitted along the E2E
LSP across the three layers.
Purpose
MPLS is a mature and well-known technology and has been adopted by a growing number of
service providers in network construction. MPLS can integrate multiple networks on an
Ethernet-based infrastructure, making full use of the benefits of a uniform forwarding model
and reducing network construction costs. MPLS has been widely used on aggregation and
core networks.
With current trends moving towards a flat network structure, metropolitan area networks
(MANs) are steadily evolving into the Ethernet architecture, which calls for the application of
MPLS on the MAN and access networks. To meet this requirement, seamless MPLS was
developed. Seamless MPLS uses existing BGP, IGP, and MPLS techniques to establish an
E2E LSP across the access, aggregation, and core layers, allowing end-to-end traffic to be
encapsulated and forwarded using MPLS.
Benefits
Seamless MPLS offers the following benefits:
Integrates the access, aggregation, and core layers into one MPLS network, encapsulates
all services using MPLS, and transmits these services along an E2E LSP. Seamless
MPLS simplifies network provisioning, operation, and maintenance.
Supports high deployment flexibility and scalability. On a seamless MPLS network, an
LSP can be established between any two nodes to roll out services.
1.12.5.2 Principles
1.12.5.2.1 Basic Principles of Seamless MPLS
Usage Scenario
Seamless MPLS establishes a BGP LSP across the access, aggregation, and core layers and
transmits services along the E2E BGP LSP. Service traffic can be transmitted between any
two points on the LSP. The seamless MPLS network architecture maximizes service
scalability using the following functions:
Allows access nodes to signal all services to an LSP.
Uses the same transport layer convergence technique to rectify all network-side faults,
without affecting service transmission.
Seamless MPLS networking solutions are as follows:
Intra-AS seamless MPLS: The access, aggregation, and core layers are within a single
AS. Intra-AS seamless MPLS applies to mobile bearer networks.
Inter-AS seamless MPLS: The access and aggregation layers are within a single AS,
whereas the core layer is in another AS. Inter-AS seamless MPLS is mainly used to
transmit enterprise services.
Inter-AS seamless MPLS+HVPN: A cell site gateway (CSG) and an aggregation (AGG)
node establish an HVPN connection, and the AGG and a mobile aggregate service
gateway (MASG) establish a seamless MPLS LSP. The AGG provides hierarchical
L3VPN access services and routing management services. Seamless MPLS+HVPN
combines the advantages of both MPLS and HVPN. Seamless MPLS allows any two
nodes on an inter-AS LSP to transmit services at the access, aggregation, and core layers,
providing high service scalability. HVPN enables carriers to reduce network deployment
costs by deploying devices with layer-specific capacities to meet service requirements.
Network Deployment | Description
Control plane: Deploy routing protocols. | Figure 1-1140 Deploying routing protocols for the intra-AS seamless MPLS networking
As shown in Figure 1-1140, routing protocols are deployed on devices as follows:
An IGP (IS-IS or OSPF) is enabled on devices at each of the access, aggregation, and core layers to implement intra-AS connectivity.
The path CSG1 -> AGG1 -> core ABR1 -> MASG1 is used in the following example. An IBGP peer relationship is established between each of the following pairs of devices:
− CSG and AGG
− AGG and core ABR
− Core ABR and MASG
The AGG and core ABR are configured as route reflectors (RRs) so that the CSG and MASG can obtain routes destined for each other's loopback addresses.
The AGG and core ABR set the next hop addresses in BGP routes to their own addresses to prevent advertising unnecessary IGP area-specific public routes.
Deploy tunnels. | Figure 1-1141 Deploying tunnels for the intra-AS seamless MPLS networking
Forwarding plane | Figure 1-1142 Forwarding plane for the intra-AS seamless MPLS networking
Network Deployment | Description
Deploy routing protocols. | Figure 1-1143 Deploying routing protocols for the inter-AS seamless MPLS networking
Deploy tunnels. | Figure 1-1144 Deploying tunnels for the inter-AS seamless MPLS networking
Forwarding plane | Figure 1-1145 Forwarding plane for the inter-AS seamless MPLS networking with a BGP LSP established in the core area
The VPN packet transmission along the inter-AS seamless MPLS tunnel is complete.
Network Deployment | Description
Control plane: Deploy routing protocols. | Figure 1-1147 Deploying routing protocols for the inter-AS seamless MPLS+HVPN networking
Deploy tunnels. | Figure 1-1148 Deploying tunnels for the inter-AS seamless MPLS+HVPN networking
A public network tunnel is established using LDP or TE in
each IGP area.
The AGGs, AGG ASBRs, core ASBRs, and MASGs are
enabled to advertise labeled routes. They assign labels to
BGP routes that match a specified routing policy. After they
exchange BGP routes, a BGP LSP can be established
between each pair of an AGG and MASG.
Forwarding plane Figure 1-1149 illustrates the forwarding plane of the inter-AS
seamless MPLS+HVPN networking. Seamless MPLS is mainly
used to transmit VPN packets. The following example
demonstrates how VPN packets, including labels and data, are
transmitted from a CSG to an MASG along the path CSG2 ->
AGG1 -> AGG ASBR1 -> core ASBR1-> MASG1.
1. The CSG pushes an MPLS tunnel label into each VPN
packet and forwards the packets to the AGG.
2. The AGG removes the access-layer MPLS tunnel labels
from the packets and pushes a BGP LSP label. It then adds
aggregation-layer MPLS tunnel labels to the packets and
then proceeds to forward them to the AGG ASBR. If the PHP
function is enabled on the AGG, the CSG has removed the
MPLS tunnel labels from the packets, and therefore, the
AGG receives packets without MPLS tunnel labels.
3. The AGG ASBR then removes the MPLS tunnel labels from
packets and swaps the existing BGP LSP label for a new
label in each packet. It then forwards the packets to the core
ASBR. If the PHP function is enabled on the AGG ASBR,
the AGG has removed the MPLS tunnel labels from the
packets, and therefore, the AGG ASBR receives packets
without MPLS tunnel labels.
4. After the core ASBR receives the packets, it swaps a BGP
LSP label for a new label and adds a core-layer MPLS tunnel
label to each packet. It then forwards the packets to the
MASG.
5. The MASG removes MPLS tunnel labels, BGP LSP labels,
and VPN labels from the packets. If the PHP function is
enabled on the MASG, the core ASBR has removed the
MPLS tunnel labels from the packets, and therefore, the
MASG receives packets without MPLS tunnel labels.
The VPN packet transmission along the seamless MPLS
tunnel is complete.
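The five forwarding steps above can be sketched as label-stack operations per node. Label names are hypothetical, and PHP is assumed to be disabled so that every pop is explicit:

```python
# Top of the label stack is the first list element.
def push(stack, label): return [label] + stack
def pop(stack):         return stack[1:]
def swap(stack, label): return [label] + stack[1:]

pkt = push(["VPN"], "TUN-acc")  # 1. CSG: VPN label, then access-layer tunnel label

# 2. AGG: pop access tunnel label, push BGP LSP label, push agg tunnel label
pkt = push(push(pop(pkt), "BGP-1"), "TUN-agg")

# 3. AGG ASBR: pop agg tunnel label, swap the BGP LSP label for a new one
pkt = swap(pop(pkt), "BGP-2")

# 4. Core ASBR: swap the BGP LSP label, push the core-layer tunnel label
pkt = push(swap(pkt, "BGP-3"), "TUN-core")
assert pkt == ["TUN-core", "BGP-3", "VPN"]

# 5. MASG: pop the tunnel label, the BGP LSP label, and the VPN label
pkt = pop(pop(pop(pkt)))
assert pkt == []    # bare payload remains; transmission is complete
```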
Reliability
Seamless MPLS network reliability can be improved using a variety of functions. If a network
fault occurs, devices with reliability functions enabled immediately detect the fault and switch
traffic from active links to standby links.
The following examples demonstrate the reliability functions used on an inter-AS seamless
MPLS network.
A fault occurs on a link between a CSG and an AGG.
As shown in Figure 1-1150, the active link along the primary path between CSG1 and
AGG1 fails. After BFD for LDP or BFD for CR-LSP detects the fault, the BFD module
uses LDP FRR, TE Hot-standby or BGP FRR to switch traffic from the primary path to
the backup path.
Figure 1-1150 Traffic protection triggered by a fault in the link between the CSG and AGG
on the inter-AS seamless MPLS network
As shown in Figure 1-1151, BGP Auto FRR is configured on CSGs and AGG ASBRs to
protect traffic on the BGP LSP between CSG1 and MASG1. If BFD for LDP or BFD for
TE detects AGG1 faults, the BFD module switches traffic from the primary path to the
backup path.
Figure 1-1151 Traffic protection triggered by a fault in an AGG on the inter-AS seamless
MPLS network
Figure 1-1152 Traffic protection triggered by a fault in the link between an AGG and an
AGG ASBR on the inter-AS seamless MPLS network
Auto FRR switches both upstream and downstream traffic from the primary path to
backup paths.
Figure 1-1153 Traffic protection triggered by a fault in an AGG ASBR on the inter-AS
seamless MPLS network
A fault occurs on the link between an AGG ASBR and a core ASBR.
As shown in Figure 1-1154, BFD for interface is configured on AGG ASBR1 and core
ASBR1. If the BFD module detects a fault in the link between AGG ASBR1 and core
ASBR1, the BFD module triggers the BGP Auto FRR function. BGP Auto FRR switches
both upstream and downstream traffic from the primary path to backup paths.
Figure 1-1154 Traffic protection triggered by a fault in the link between an AGG ASBR and
a core ASBR on the inter-AS seamless MPLS network
As shown in Figure 1-1155, BFD for interface and BGP Auto FRR are configured on
AGG ASBR1. BGP Auto FRR and BFD for LDP (or BFD for TE) are configured on
MASGs to protect traffic on the BGP LSP between CSG1 and MASG1. If the BFD
module detects a fault in core ASBR1, it switches both upstream and downstream traffic
from the primary path to backup paths.
Figure 1-1155 Traffic protection triggered by a fault in a core ASBR on the inter-AS
seamless MPLS network
Figure 1-1156 Traffic protection triggered by a link fault in a core area on the inter-AS
seamless MPLS network
Figure 1-1157 Traffic protection triggered by a fault in an MASG on the inter-AS seamless
MPLS network
Background
The IP/MPLS network shown in Figure 1-1158 transmits VPN services. PEs, such as a CSG,
AGG, ASBR, and MASG, establish multi-segment MPLS tunnels between directly connected
devices. In this case, VPN service provision on PEs is complex, and the VPN service
scalability decreases. As PEs establish BGP peer relationships, a routing policy can be used to
assign MPLS labels for BGP routes so that an E2E BGP tunnel can be established. The BGP
tunnel consists of a primary BGP LSP and a backup BGP LSP. VPN services can travel along
the E2E BGP tunnel, which simplifies service provision and improves VPN service
scalability.
To rapidly detect faults in an E2E BGP tunnel, BFD for BGP tunnel is used. BFD for BGP
tunnel establishes a dynamic BFD session, also called a BGP BFD session, which is bound to
both the primary and backup BGP LSPs. If both BGP LSPs fail, the BGP BFD session detects
the faults and triggers VPN FRR switching.
Usage Scenarios
BFD for BGP tunnel is used in the following scenarios:
Inter-AS VPN Option C scenario
Intra- or inter-AS seamless MPLS scenario
Principles
Dynamic BGP BFD sessions are established using either of the following policies:
Host address-based policy: used when all host addresses are available to trigger the
creation of BGP BFD sessions.
IP address prefix list-based policy: used when only some host addresses can be used to
establish BFD sessions.
A BGP BFD session working in asynchronous mode monitors BGP LSPs over BGP tunnels.
In Figure 1-1159, the ingress (CSG) and egress (MASG) of E2E BGP LSPs exchange BFD
packets periodically. The forward path is a BGP LSP, and the reverse path is an IP route. If
either node receives no BFD packet after a specified detection period elapses, the node
considers the BGP LSP faulty. If both the primary and backup BGP LSPs fail, the BGP BFD
session triggers VPN FRR switching.
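The asynchronous-mode detection logic can be sketched as follows. The detection-time formula (multiplier times receive interval) follows the generic BFD model and the timer values are illustrative, not NE20E specifics:

```python
def bfd_state(last_rx_ms, now_ms, detect_multiplier, rx_interval_ms):
    """Asynchronous mode: the session goes down if no BFD packet arrives
    within the detection time (multiplier x negotiated receive interval)."""
    detection_time_ms = detect_multiplier * rx_interval_ms
    return "up" if (now_ms - last_rx_ms) <= detection_time_ms else "down"

def needs_vpn_frr(primary_state, backup_state):
    """VPN FRR switching is triggered only when both BGP LSPs have failed."""
    return primary_state == "down" and backup_state == "down"

primary = bfd_state(last_rx_ms=0,   now_ms=400, detect_multiplier=3, rx_interval_ms=100)
backup  = bfd_state(last_rx_ms=350, now_ms=400, detect_multiplier=3, rx_interval_ms=100)
assert primary == "down" and backup == "up"
assert not needs_vpn_frr(primary, backup)   # backup BGP LSP still carries traffic
```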
1.12.5.3 Applications
1.12.5.3.1 Seamless MPLS Applications in VPN Services
Service Overview
With the growth of third-generation mobile telecommunications (3G) and Long Term
Evolution (LTE) services, inter-AS leased line services have become key services. To carry
these services over VPNs, seamless MPLS establishes an E2E LSP between a cell site
gateway (CSG) and a mobile aggregate service gateway (MASG) to transmit virtual private
network (VPN) services, as well as helps carriers reduce costs of network construction,
operation, and maintenance. Seamless MPLS also allows carriers to uniformly operate and
maintain networks.
Networking Description
Figure 1-1160 illustrates an LTE network. The access and aggregation layers belong to one
AS, and the core layer belongs to another AS. To transmit VPN services, the inter-AS
seamless MPLS+HVPN networking can be used to establish an LSP between each pair of a
CSG and MASG. CSGs are connected to NodeBs that are Wideband Code Division Multiple
Access (WCDMA) 3G base stations and eNodeBs that are LTE base stations. MASGs are
connected to a mobility management entity (MME) or service gateway (SGW). VPN
instances can be configured between CSGs and MASGs to transmit various types of services.
An HVPN is deployed between each pair of a CSG and aggregation (AGG) node, and an
inter-AS LSP is established between each pair of an AGG and MASG using the seamless
MPLS technique. A NodeB or an eNodeB can then communicate with the MME or SGW.
Enterprise leased line services | Large-scale enterprise VPN services can be provisioned. Layer 2 and Layer 3 leased lines connected to CSGs are easily deployed.
Protection switching | The following protection switching functions can be configured:
TE hot standby or LDP FRR: monitors TE LSPs or LDP LSPs.
BGP FRR: monitors BGP LSPs.
VPN FRR: monitors VPN connections.
CSG performance requirements | CSGs that maintain only a few routes need to process packets, each carrying two labels.
The GMPLS UNI solution is only used for interconnection between Huawei forwarders and Huawei
controllers.
Purpose
In an era when IP technologies evolve quickly and data transmission becomes demanding, IP
services impose higher requirements on bandwidth of transport networks. Mainstream
bandwidth of transport networks has quickly changed from 155 Mbit/s and 622 Mbit/s to 2.5
Gbit/s and 10 Gbit/s, and now to 40 Gbit/s and 100 Gbit/s. The processing granularity
(VC4) of Synchronous Digital Hierarchy (SDH) networks, however, lags behind. In this case,
the Dense Wavelength Division Multiplexing (DWDM) technique has become one of the
options for constructing a transport network. To provide an end-to-end DWDM solution, the
communication issue between routers and DWDM devices must be addressed first.
To be specific, many User-Network Interfaces (UNIs) are statically configured between IP
networks and transport networks, but this configuration has many drawbacks:
Transmission channels between IP networks and transport networks need to be
configured manually, which is time consuming and increases carriers' network
construction cost.
When a fault occurs and both the primary and secondary paths fail, additional
configurations are needed to restore services, increasing carriers' network maintenance
cost.
Bandwidth cannot be dynamically adjusted because IP networks and transport networks
are interconnected based on static configurations. This defect wastes network resources
and leads to unnecessary capacity expansion.
The automatic UNI service deployment feature provided by Generalized Multi-Protocol Label
Switching (GMPLS) properly solves the preceding problems. GMPLS provides packet
switching, wavelength switching, time division switching, and spatial switching, supports
multiple interconnection models between transmission networks and IP networks, and truly
implements an end-to-end solution. GMPLS brings the following benefits:
Simplified network management, intelligent service provisioning, flexible transmission
channel setup, and lower operation and maintenance costs
Abundant protection levels, enhanced network robustness based on an effective
protection recovery mechanism, and lower operation and maintenance costs
Flexible resource allocation policies, improved network resource usage, and lower
pressure on capacity expansion
Definition
GMPLS is developed from MPLS and inherits nearly all MPLS features and protocols.
GMPLS also extends the definition of MPLS labels and can be considered an extension
of MPLS in transmission networks. GMPLS provides a unified control plane for the IP layer
and transport layer. In this manner, the network architecture is simplified, the network
management cost is reduced, and the network performance is optimized.
The GMPLS User-Network Interface (UNI) is defined by IETF as a network connection
interface. It is applicable to the overlay model in the GMPLS network structure and it meets
the trend in network development.
GMPLS UNI extends MPLS in the following aspects:
Supports multiple network interface types and supports switching of packets, timeslots,
wavelengths, and ports.
Supports explicit routes and explicit labels.
Supports bidirectional LSPs.
Separates the control plane from the data plane, supports outband signaling, and prevents
a failure in the control plane from affecting the data plane.
Enables fast fault detection in the control plane and supports end-to-end recovery and
protection.
Supports service security mechanisms and service policy authentication.
Supports LSP graceful deletion.
1.12.6.2 Principles
1.12.6.2.1 Basic Concepts
Generalized Multiprotocol Label Switching (GMPLS) extends the traditional MPLS
technology and applies to the transport layer. To seamlessly integrate the IP and transport
layers, GMPLS extends MPLS labels and uses labels to identify Time Division Multiplexing
(TDM) time divisions, wavelengths, and optical fibers, in addition to data packets. GMPLS
adds labels to packets during IP data switching, TDM electrical circuit switching (primarily
applying to Synchronous Digital Hierarchy [SDH]/Synchronous Optical Network [SONET]),
and spatial switching. GMPLS separates control and data channels and uses the Link
Management Protocol (LMP) to manage and maintain links. GMPLS supports multiple
models for interconnecting the IP and transport networks, meeting requirements for IP and
transport network convergence.
Peer model: Figure 1-1162 shows the peer model networking. IP devices and transport
devices are operating in a single GMPLS domain. IP and transport network topologies
are visible to each other. End-to-end (E2E) GMPLS tunnels can be established and they
originate from an IP network, pass through a transport network, and are destined for
another IP network.
Border peer model: Figure 1-1163 shows the border peer model networking. A transport
network and edge nodes that directly connect the IP networks to the transport network
are in the same GMPLS domain. The transport network topology is invisible to non-edge
nodes on the IP networks. A path for a GMPLS tunnel between the edge nodes across the
transport network can be calculated.
Peer model
Advantages: Both IP address space and signaling protocols can be planned for transport
devices and IP routers. The transport devices and IP routers can establish reliable
connections. This model allows rapid service rollout and planning of E2E optimal paths.
Disadvantages: Using the peer model is difficult because the entire live network must be
upgraded. Transport devices and IP routers need to use the same signaling protocols,
increasing the possibility of security risks.
Border peer model
Advantages: IP routers are isolated from transport devices, except for edge nodes. The
transport network topology is visible to the boundary routers on the IP network.
Disadvantages: The edge nodes must have high performance. Security deteriorates in this
model. This model does not support E2E optimal path planning.
Overlay model
Advantages: Transport and IP network devices must have clearly defined UNI information.
They do not need to learn about routing or topology information of each other or
exchange information. The overlay model provides high security and has low upgrade
requirements.
Disadvantages: Planning E2E optimal paths for GMPLS tunnels is difficult. UNI bandwidth
usage is lower in this model than in the other two models. The overlay model requires
UNI interface planning.
The NE20E only supports the overlay model, in compliance with the GMPLS UNI model
defined in relevant standards. The GMPLS UNI model is used in the following sections.
Figure 1-1164 shows the GMPLS UNI model networking. Edge nodes on overlay networks
running IP are directly connected to transport devices on a core transport network along TE
links. Only the edge nodes can initiate the establishment of a UNI tunnel to travel through the
core network. On the IP networks, only edge nodes need to support GMPLS UNI functionality.
The GMPLS UNI model involves the following concepts:
Ingress EN: refers to an edge node that directly connects an IP network to a transport
network. A GMPLS UNI tunnel originates from the ingress EN.
Ingress CN: refers to an edge node that directly connects a transport network to the
ingress EN.
Egress EN: refers to an edge node that directly connects an IP network to a transport
network. A GMPLS UNI tunnel is destined for the egress EN.
Egress CN: refers to an edge node that directly connects a transport network to the egress
EN.
UNI: sends requests for bandwidth used for connections to the transport network.
Network-network interface (NNI): connects nodes within the transport network.
does not affect the data channel, ensuring uninterrupted service forwarding. The data and
control channels are separated in either out-of-band or in-band mode. Out-of-band separation
means that the data and control channels' physical links are separate. For example, the two
channels use separate physical interfaces, time divisions, or wavelengths. In-band separation
means that the data and control channels use the same physical links but different protocol
overheads. For example, an Ethernet network uses OAM to carry control packets and an SDH
network uses the dial control center (DCC) byte overheads to carry control packets. The
NE20E only supports out-of-band Ethernet channels and in-band Ethernet OAM channels.
LMP
The Link Management Protocol (LMP) used in GMPLS manages links of the control
and data channels. Relevant standards describe the major functions of LMP, including:
Control channel management: Dynamic LMP automatically discovers neighbors and
creates, maintains, and manages a control channel.
Link attribute association: LMP bundles multiple data links between two directly
connected nodes into a TE link, and synchronizes TE link attributes such as switching
types and code types between the two directly connected nodes.
Link connectivity verification: LMP verifies the connectivity of a data channel separated
from a control channel. LMP can verify the connectivity of multiple data channels
simultaneously.
Fault management: LMP rapidly detects data link failures in unidirectional and
bidirectional LSPs, locates and isolates faults, and triggers appropriate protection and
recovery mechanisms. After a fault is removed, LMP sends a notification about link
recovery. Fault management is performed on links only between adjacent nodes.
LMP is classified into the following types:
Static LMP: LMP neighbors are manually configured and no LMP packet needs to be
sent between them.
Dynamic LMP: LMP neighbors, a control channel, a TE link, and data links are all
automatically discovered, minimizing configurations and speeding up network
construction.
The NE20E only supports static LMP. This means that LMP neighbors, control channels, and
data channels are manually configured.
NE1->NE2->NE3 (as indicated by the red line in the figure). In this manner, a direct link from
NE1 to NE3 is established on the transport network, and a GMPLS UNI tunnel with the path
Device1->NE1->NE2->NE3->Device2 is established over the transport network.
labels of both UNI LSPs are assigned to the egress EN. Then the egress EN creates a
Resv message and sends the message to the egress CN.
5. After the Resv message reaches the egress CN, a label of the forward UNI LSP is
assigned to the egress CN and the label is sent to the ingress CN in a Resv message
through the FA tunnel.
6. After the Resv message reaches the ingress CN, a label of the forward UNI LSP is
assigned to the ingress CN and the label is sent to the ingress EN in a Resv message.
7. After the Resv message reaches the ingress EN, a label of the forward UNI LSP is
assigned to the ingress EN.
In this manner, each node is informed of the forward/reverse UNI LSP label of the adjacent
node and a bidirectional UNI LSP is then successfully set up.
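The upstream label distribution in steps 4 through 7 can be sketched as follows. This is an illustrative model only (not NE20E software): the Resv message travels from the egress EN toward the ingress EN, leaving each node with the forward-LSP label assigned by its downstream neighbor. Node names and label values are hypothetical.

```python
def downstream_labels(path, assigned):
    """path: nodes from ingress EN to egress EN.
    assigned: the forward-LSP label each node assigns when the Resv reaches it.
    Returns the outgoing label each node learns from its downstream neighbor."""
    out_label = {}
    # The Resv flows upstream, so node i uses the label assigned by node i+1.
    for i in range(len(path) - 1):
        out_label[path[i]] = assigned[path[i + 1]]
    return out_label

path = ["ingress_EN", "ingress_CN", "egress_CN", "egress_EN"]
assigned = {"ingress_CN": 104, "egress_CN": 103, "egress_EN": 102}
out = downstream_labels(path, assigned)
# The ingress EN now holds the label assigned by the ingress CN, and so on.
```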
1.12.6.2.4 UNI Tunnel Calculation Using Both IP and Optical PCE Servers
Background
IP service provision on E2E backbone networks faces the following challenges:
Optical and IP layers cannot share topology information. Therefore, path planning can
only be manually performed at both the optical and IP layers. Optimizing path
calculation and network resources is difficult.
Inter-layer network deployment is performed by collaboration of IP and optical
departments, which delays service rollout.
To address the preceding challenges, the NE20E uses both the IP Path Computation Element
(PCE) and optical PCE functions to calculate paths for GMPLS UNI tunnels.
With this path calculation function, the IP and optical PCE servers automatically implement
path planning, which reduces manual workload and speeds up service rollout.
Principles
An ingress EN on a GMPLS UNI functions as a PCE client and requests an IP PCE server to
calculate paths. Upon receipt of the request, the IP PCE server works with an optical PCE
server to calculate a path and sends path information to the ingress EN. The ingress EN
automatically establishes a GMPLS UNI tunnel over the calculated path.
In the following example, the IP and optical PCE servers are used simultaneously to calculate
a path for a GMPLS UNI tunnel between the ingress EN and egress EN. The implementation
is as follows:
1. The ingress EN sends a delegate path request for a GMPLS UNI tunnel to an IP PCE
server.
2. Upon receipt of the request, the IP PCE server instructs the optical PCE server to
calculate a path within an optical network.
3. The optical PCE server sends path information to the IP PCE server.
4. The IP PCE server sends all path information to the ingress EN.
5. The ingress EN sends RSVP messages to the ingress CN and starts to establish a
GMPLS UNI tunnel. The GMPLS UNI tunnel establishment process is similar to the
common GMPLS UNI tunnel establishment process and is not described here.
Figure 1-1167 Flowchart for using both the IP and optical PCE servers to calculate paths
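The five-step delegation above can be modeled with plain functions. This is an illustrative sketch only, not a real PCEP implementation; the device and NE names reuse the earlier example path (Device1->NE1->NE2->NE3->Device2), and the optical hop list is hypothetical.

```python
def optical_pce_compute(src_cn, dst_cn):
    # Steps 2-3: the optical PCE computes the segment inside the optical network.
    return [src_cn, "NE2", dst_cn]

def ip_pce_compute(ingress_en, egress_en, ingress_cn, egress_cn):
    # Step 4: the IP PCE stitches the optical segment into the end-to-end path
    # that it returns to the ingress EN.
    return [ingress_en] + optical_pce_compute(ingress_cn, egress_cn) + [egress_en]

# Step 1: the ingress EN delegates path computation to the IP PCE server;
# step 5 would then signal the returned path with RSVP.
path = ip_pce_compute("Device1", "Device2", "NE1", "NE3")
```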
Benefits
Simultaneously using the IP and optical PCE servers to calculate a path for a GMPLS UNI
tunnel offers the following benefits:
Automates a large amount of site deployment planning, which reduces labor
costs.
Eliminates the need for collaboration between the IP and optical service departments,
which speeds up site deployment.
1.12.6.2.5 SRLG Sharing Between Optical and IP Layers Within a Transport Network
Background
Although the IP layer and optical layer are connected, they cannot exchange routing
information. The active and standby links at the IP layer can only be separated using statically
planned SRLGs within the optical network, which delays service rollout and increases
maintenance workload. To address these problems, the SRLG sharing function can be used.
RSVP signaling at the optical layer sends SRLG attributes of transport links to the IP layer.
The IP layer applies the SRLG attributes to IP links. This function helps select reliable paths
for high reliability services at the IP layer based on SRLG constraints.
Principles
When a GMPLS UNI tunnel is established using RSVP, the extended RSVP protocol carries
SRLG information on optical links to both ends of the GMPLS UNI tunnel. SRLG
information is processed as TE SRLG information that is used to bind the GMPLS UNI tunnel
to UNI links, which separates links for the primary and backup TE tunnels.
RSVP Path or Resv messages carry SRLG sub-objects to notify the IP layer of SRLG
information about paths on an optical network. The ingress CN and egress CN at the IP layer
flood the SRLG information to the other devices at the IP layer. Then all devices on the
network can establish the primary and backup TE tunnels on different links, preventing path
overlapping.
Figure 1-1168 SRLG sharing between optical and IP layers within a transport network
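Once the optical layer's SRLG attributes have been flooded to the IP layer, an ingress can reject any backup path that shares an SRLG with the primary path. The following is an illustrative sketch of that disjointness check; link names and SRLG numbers are hypothetical.

```python
link_srlgs = {
    "linkA": {10, 20},   # SRLGs learned from the optical layer via RSVP
    "linkB": {30},
    "linkC": {20, 40},   # shares SRLG 20 with linkA
}

def srlg_disjoint(primary_links, backup_links):
    """True if the primary and backup paths share no SRLG."""
    primary = set().union(*(link_srlgs[l] for l in primary_links))
    backup = set().union(*(link_srlgs[l] for l in backup_links))
    return primary.isdisjoint(backup)

# linkB can back up linkA; linkC cannot, because both belong to SRLG 20.
```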
A GMPLS UNI tunnel is bidirectional, so both ends of the tunnel need to be bound to
logical GMPLS UNI interfaces and both logical interfaces need to be advertised to their
respective IP networks. The statuses of the bound logical interfaces are associated with the
GMPLS UNI status. If the UNI LSP is established, the bound logical interfaces go Up. If no
UNI LSP is established, the bound logical interfaces go Down. In real-world situations, a
GMPLS UNI tunnel is configured on logical interfaces in a way similar to the configuration
of a routing protocol or the MPLS function on the logical interfaces. This makes the
configuration of a GMPLS UNI tunnel easier for users to accept.
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
Network planning engineers
Commissioning engineers
Data configuration engineers
System maintenance engineers
Security Declaration
Encryption algorithm declaration
The encryption algorithms DES/3DES/SKIPJACK/RC2/RSA (RSA-1024 or
lower)/MD2/MD4/MD5 (in digital signature scenarios and password encryption)/SHA1
(in digital signature scenarios) have low security and may bring security risks. If the
protocols allow, using more secure encryption algorithms, such as AES/RSA
(RSA-2048 or higher)/SHA2/HMAC-SHA2, is recommended.
Special Declaration
This document serves only as a guide. The content is written based on device
information gathered under lab conditions. The content provided by this document is
intended to be taken as general guidance, and does not cover all scenarios. The content
provided by this document may be different from the information on user device
interfaces due to factors such as version upgrades and differences in device models,
board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are
beyond the scope of this document.
The maximum values provided in this document are obtained in specific lab
environments (for example, only a certain type of board or protocol is configured on a
tested device). The actually obtained maximum values may be different from the
maximum values provided in this document due to factors such as differences in
hardware configurations and carried services.
Interface numbers used in this document are examples. Use the existing interface
numbers on devices for configuration.
The pictures of hardware in this document are for reference only.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
Indicates an imminently hazardous situation which, if not
avoided, will result in death or serious injury.
Change History
Updates between document issues are cumulative. Therefore, the latest document issue
contains all updates made in previous issues.
Changes in Issue 03 (2017-09-20)
This issue is the third official release. The software version of this issue is
V800R009C10SPC200.
Changes in Issue 02 (2017-07-30)
This issue is the second official release. The software version of this issue is
V800R009C10SPC100.
Changes in Issue 01 (2017-05-30)
This issue is the first official release. The software version of this issue is
V800R009C10.
Definition
Segment routing (SR) is a protocol designed to forward data packets on a network based on
source routes. Segment routing divides a network path into several segments and assigns a
segment ID to each segment and network forwarding node. The segments and nodes are
sequentially arranged (segment list) to form a forwarding path.
Segment routing encodes the segment list identifying a forwarding path into a data packet
header. The segment ID is transmitted along with the packet. After receiving the data packet,
the receive end parses the segment list. If the top segment ID in the segment list identifies the
local node, the node removes the segment ID and proceeds with the follow-up procedure. If
the top segment ID does not identify the local node, the node uses the Equal-Cost
Multipath (ECMP) algorithm to forward the packet to the next node.
Purpose
As times progress, more and more types of services pose a variety of network
requirements. For example, real-time UC&C applications prefer paths with low delay and low
jitter, and big data applications prefer high-bandwidth tunnels with a low packet loss rate.
In this situation, the conventional approach of adapting the network to service growth cannot
keep pace with rapid service development and even makes network deployment more
complex and difficult to maintain.
The solution is to allow services to drive network development and to define the network
architecture. Specifically, an application raises requirements (on the delay, bandwidth, and
packet loss rate). A controller collects information, such as network topology, bandwidth
usage, and delay information and computes an explicit path that satisfies the service
requirements.
Segment routing emerged in this context. Segment routing can define an explicit path simply;
nodes need to maintain only the segment routing information to adapt to rapid service
growth in real time. Segment routing has the following characteristics:
Extends existing protocols, such as IGP, to allow for smoother evolution of live
networks.
Supports both the controller's centralized control mode and the forwarder's
distributed control mode, providing a balance between centralized and distributed
control.
Uses the source routing technique to provide capabilities of rapid interaction between
networks and upper-layer applications.
Benefits
Segment routing offers the following benefits:
The control plane of the MPLS network is simplified.
A controller or an IGP is used to uniformly compute paths and distribute labels, without
using RSVP-TE or LDP. Segment Routing can be directly applied to the MPLS
architecture without any change in the forwarding plane.
Provides efficient topology-independent loop-free alternate (TI-LFA) FRR protection for
fast path failure recovery.
Based on the segment routing technology combined with the Remote Loop-Free
Alternate (RLFA) FRR algorithm, an efficient TI-LFA FRR algorithm is formed. TI-LFA
FRR supports node and link protection in any topology and overcomes drawbacks of
conventional tunnel protection.
Provides higher network capacity expansion capability.
1.13.2.3 Principles
1.13.2.3.1 Basic Principles
Basic Concepts
Segment routing involves the following concepts:
Segment routing domain: is a set of SR nodes.
Segment ID (SID): uniquely identifies a segment. A SID is mapped to an MPLS label on
the forwarding plane.
SRGB: A segment routing global block (SRGB) is a set of local labels reserved for
segment routing.
Segment Category
An example of Prefix SIDs, Adjacency SIDs, and Node SIDs is shown in Figure 1-1171.
In simple words, a prefix segment indicates a destination address, and an adjacency segment
indicates a link over which data packets travel. The prefix and adjacency segments are similar
to the destination IP address and outbound interface, respectively, in conventional IP
forwarding. In an IGP area, a network element (NE) sends extended IGP messages to flood its
own node SID and adjacency SID. Upon receipt of the message, any NE can obtain
information about the other NEs.
Combining prefix (node) SIDs and adjacency SIDs in sequence can construct any network
path. Every hop on a path identifies a next hop based on the segment information on the top of
the label stack. The segment information is stacked in sequence at the top of the data header.
If segment information at the stack top contains the identifier of another node, the
receive end forwards a data packet to a next hop using ECMP.
If segment information at the stack identifies the local node, the receive end removes the
top segment and proceeds with the follow-up procedure.
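The stack-top rule above can be sketched as a small function: pop when the top segment identifies the local node, otherwise forward toward the node the top segment identifies (ECMP next-hop selection is not modeled here).

```python
def process_segment_list(local_sid, segment_list):
    """Returns ('pop', remaining_list) or ('forward', top_sid)."""
    top = segment_list[0]
    if top == local_sid:
        # Top segment identifies this node: remove it and continue processing.
        return "pop", segment_list[1:]
    # Top segment identifies another node: forward toward it.
    return "forward", top
```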
In actual application, the prefix segment, adjacency segment, and node segment can be used
independently or in combinations. The following three main cases are involved.
Prefix Segment
A prefix segment-based forwarding path is computed by an IGP using the SPF algorithm. In
Figure 1-1172, node Z is a destination, and its prefix SID is 100. After an IGP floods the
prefix SID, all nodes in the IGP area learn the prefix SID of node Z. Each node runs SPF to
compute the shortest path to node Z. Such a path is the smallest-cost path.
If several paths have the same cost, they work in ECMP mode. If they have different costs,
they work in link backup mode. Prefix segment-based forwarding paths are not fixed, and
the ingress cannot control the entire forwarding path.
Adjacency Segment
In Figure 1-1173, an adjacency segment is assigned to each adjacency. The adjacency
segments are contained in a segment list defined on the ingress. The segment list is used to
strictly specify any explicit path. This mode can better implement SDN.
SR Forwarding Mechanism
SR can be used directly in the MPLS architecture, where the forwarding mechanism remains unchanged.
SIDs are encoded as MPLS labels. The segment list is encoded as a label stack. The segment
to be processed is at the stack top. Once a segment is processed, its label is removed from a
label stack.
1.13.2.3.2 SR LSP
An SR LSP is established using the segment routing technique and uses prefix or node
segments to guide data packet forwarding. Segment Routing Best Effort (SR-BE) uses an IGP
to run the shortest path algorithm to compute an optimal SR LSP.
The establishment and data forwarding of SR LSPs are similar to those of LDP LSPs. SR
LSPs have no tunnel interfaces.
Creating an SR LSP
Creating an SR LSP involves the following operations:
Devices report topology information to a controller (if the controller is used to create a
tunnel) and are assigned labels.
The devices compute paths.
SR LSPs are created primarily using prefix labels. A destination node runs an IGP to advertise
prefix SIDs, and forwarders parse them and compute label values based on local SRGBs.
Each node then runs an IGP to collect topology information, runs the SPF algorithm to
calculate a label forwarding path, and delivers the computed next hop and outgoing label
(OuterLabel) to the forwarding table to guide data packet forwarding.
Table 1-356 describes the process of using prefix labels to create an LSP shown in Figure
1-1175.
Step Device Operation
2 C IS-IS calculates an outgoing label based on the following formula:
OuterLabel = SRGB start value advertised by the next-hop device + Prefix
SID value = 16000 + 100 = 16100
Here, the next-hop device is device D, which advertises the SRGB
(16000 to 65535).
3 B The calculation process is similar to that of C:
Label = 26000 + 100 = 26100
OuterLabel = 36000 + 100 = 36100
4 A The calculation process is similar to that of C:
Label = 6000 + 100 = 6100
OuterLabel = 26000 + 100 = 26100
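The arithmetic in the table above can be reproduced in a few lines: each node derives its outgoing label from the SRGB start value advertised by its next hop plus the prefix SID. The SRGB start values and the path toward device D follow the table.

```python
PREFIX_SID = 100
srgb_start = {"A": 6000, "B": 26000, "C": 36000, "D": 16000}  # per the table
next_hop = {"A": "B", "B": "C", "C": "D"}                      # path toward D

def outer_label(node):
    # OuterLabel = SRGB start value advertised by the next hop + Prefix SID
    return srgb_start[next_hop[node]] + PREFIX_SID

labels = {node: outer_label(node) for node in next_hop}
# C: 16000 + 100 = 16100; B: 36000 + 100 = 36100; A: 26000 + 100 = 26100
```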
Data Forwarding
Similar to MPLS, SR-TE operates on labels by pushing, swapping, or popping them.
Push: After a packet enters an SR LSP, the ingress adds a label between the Layer 2
header and the IP header. Alternatively, the ingress adds a label stack above the existing
label stack.
Swap: When packets are forwarded in an SR domain, a node searches the label
forwarding table for a label assigned by the next hop and swaps the label on the top of
the label stack with the matching label in each SR packet.
Pop: After a packet leaves an SR-TE tunnel, a node finds the outbound interface
mapped to the label on the top of the label stack and removes the top label.
Table 1-357 describes the data forwarding process on the network shown in Figure 1-1176.
Step Device Operation
1 A Receives a data packet, adds label 26100 to the packet, and forwards the
packet.
2 B Receives the labeled packet, swaps label 26100 for label 36100, and forwards
the packet.
3 C Receives the labeled packet, swaps label 36100 for label 16100, and forwards
the packet.
4 D Removes label 16100 and forwards the packet along a matching route.
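The four forwarding steps in the table above can be walked through in code. This is an illustrative sketch whose label values mirror the forwarding entries computed for prefix SID 100.

```python
ingress_push = 26100
swap_table = {"B": {26100: 36100}, "C": {36100: 16100}}

def forward(label, node):
    if node == "A":                  # step 1: push at the ingress
        return ingress_push
    if node in swap_table:           # steps 2-3: swap at transit nodes
        return swap_table[node][label]
    return None                      # step 4: pop at the egress (D)

label = None
trace = []
for node in ["A", "B", "C", "D"]:
    label = forward(label, node)
    trace.append(label)
# trace: [26100, 36100, 16100, None]
```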
explicit-null: PHP is not supported. The egress assigns an explicit-null label. The IPv4
explicit-null label value is 0. The MPLS EXP field is reserved, so QoS is supported. The
MPLS TTL processing is normal. Label resources on the egress are saved. If E2E services
carry QoS attributes to be contained in the EXP field in a label, an explicit-null label can
be used.
implicit-null: PHP is supported. The egress assigns an implicit-null label. The implicit-null
label value is 3. There is no MPLS EXP field on the egress, so QoS is not supported. There
is no MPLS TTL field on the egress, so it cannot be copied to the IP TTL field. The
forwarding burden on the egress is reduced, and forwarding efficiency is improved.
The Prefix-SID sub-TLV carries IGP-Prefix-SID information. Figure 1-1177 shows the format
of the Prefix-SID sub-TLV.
SID/Index/Label (variable length): This field contains either of the following information
based on the V and L flags:
A 4-byte index/offset value within a SID/label range. In this case, the V and L flags are
not set.
A 3-byte local label whose rightmost 20 bits are a label value. In this case, the V and L
flags must be set.
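The V/L-flag rule described above can be sketched as a small decoder. This is a hedged illustration of the field semantics, not real NE20E parsing code.

```python
def decode_sid_field(v_flag, l_flag, raw_bytes):
    """Interpret a SID/Index/Label field according to the V and L flags."""
    value = int.from_bytes(raw_bytes, "big")
    if v_flag and l_flag:
        # 3-byte field; the rightmost 20 bits carry the label value.
        return "label", value & 0xFFFFF
    if not v_flag and not l_flag:
        # 4-byte index/offset into a SID/label range.
        return "index", value
    raise ValueError("unsupported V/L flag combination")

# decode_sid_field(True, True, bytes([0x01, 0x86, 0xA0])) -> ("label", 100000)
```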
Adj-SID Sub-TLV
An Adj-SID Sub-TLV is optional and carries IGP Adjacency SID information. Figure 1-1179
shows its format.
Weight (8 bits): Weight. The Adj-SID weight is used for load balancing.
SID/Index/Label (variable length): This field contains either of the following information
based on the V and L flags:
A 3-byte local label whose rightmost 20 bits are a label value. In this case, the V and L
flags must be set.
SID/Label Sub-TLV
A SID/Label Sub-TLV includes a SID or an MPLS label. The SID/Label Sub-TLV is a part of
the SR-Capabilities Sub-TLV and SR Local Block Sub-TLV.
Figure 1-1182 shows the format of the SID/Label Sub-TLV.
SID/Label Sub-TLV (variable length): See SID/Label Sub-TLV. The SRGB start value is
included. When multiple SRGBs are configured, ensure that the SRGB sequence is correct
and the SRGBs do not overlap.
SR-Algorithm Sub-TLV
NEs use different algorithms, for example, the SPF algorithm and various SPF variant
algorithms, to compute paths to the other nodes or prefixes. The newly defined SR-Algorithm
Sub-TLV enables an NE to advertise its own algorithm. The SR-Algorithm Sub-TLV is also
carried in the IS-IS Router Capability TLV-242 for transfer. The SR-Algorithm Sub-TLV can
be propagated within the same IS-IS level.
Figure 1-1185 shows the format of the SR-Algorithm Sub-TLV.
The SRLB TLV advertised by the NE may contain a label range that is out of the SRLB. Such
a label range is assigned locally and is not advertised in the SRLB. For example, an adjacency
SID is assigned a local label, not a label within the SRLB range.
In Figure 1-1187, devices run IS-IS. Segment routing is used and enables each device to
advertise the SR capability and supported SRGB. In addition, the advertising end advertises a
prefix SID offset within the SRGB range. The receive end computes an effective label value
to generate a forwarding entry.
Devices A through F are deployed in areas of the same level. All devices run IS-IS. An SR
tunnel originates from Device A and is terminated at Device D.
An SRGB is configured on Device D. A prefix SID is set on the loopback interface of Device
D. Device D encapsulates the SRGB and prefix SID into a link state protocol data unit (LSP)
(for example, IS-IS Router Capability TLV-242 containing SR-Capability Sub-TLV) and
floods the LSP across the network. After another device receives the SRGB and prefix SID, it
uses them to compute a forwarding label, uses the IS-IS topology information, and runs the
Dijkstra algorithm to calculate an LSP and LSP forwarding entries.
An inter-IGP area SR LSP is created
In Figure 1-1188, to establish an inter-area SR LSP, the prefix SID must be advertised across
areas by penetrating these areas. This overcomes the restriction on IS-IS's flooding scope
within each area.
Devices A through D are deployed in different areas, and all devices run IS-IS. An SR tunnel
originates from Device A and is terminated at Device D.
An SRGB is configured on Device D. A prefix SID is set on the loopback interface of Device
D. Device D generates and delivers forwarding entries. It encapsulates the SRGB and prefix
SID into an LSP (for example, IS-IS Router Capability TLV-242 containing SR-Capability
Sub-TLV) and floods the LSP across the network. Upon receipt of the LSP, Device C parses
the LSP to obtain the prefix SID, calculates and delivers forwarding entries, and penetrates the
prefix SID and prefix address to the Level-2 area. Device B parses the LSP to obtain the
prefix SID, calculates and delivers forwarding entries, and penetrates the prefix SID and
prefix address to the Level-1 area. Device A parses the LSP and obtains the prefix SID, uses
IS-IS to collect topology information, and runs the Dijkstra algorithm to compute a label
switched path and tunnel forwarding entries.
1.13.2.3.4 SR-TE
SR-Traffic Engineering (SR-TE) is a new Multiprotocol Label Switching (MPLS) Traffic
Engineering (TE) tunneling technique implemented based on an Interior Gateway Protocol
(IGP) extension. The controller calculates a path for an SR-TE tunnel and forwards a
computed label stack to the ingress configured on a forwarder. The ingress uses the label stack
to generate an LSP in the SR-TE tunnel. Therefore, the label stack is used to control the path
along which packets are transmitted on a network.
SR-TE Advantages
SR-TE tunnels are capable of meeting the rapid development requirements of
software-defined networking (SDN), which Resource Reservation Protocol-TE (RSVP-TE)
tunnels are unable to meet. Table 1-366 compares SR-TE with RSVP-TE.
Related Concepts
Label Stack
A label stack is a set of Adjacency Segment labels in the form of a stack stored in a packet
header. Each Adjacency SID label in the stack identifies an adjacency to a local node, and the
label stack describes all adjacencies along an SR-TE LSP. In packet forwarding, a node
searches for an adjacency mapped to each Adjacency Segment label in a packet, removes the
label, and forwards the packet. After all labels are removed from the label stack, the packet is
sent out of an SR-TE tunnel.
Stitching Label and Stitching Node
If a label stack depth exceeds that supported by a forwarder, the label stack cannot carry all
adjacency labels on a whole LSP. In this situation, the controller assigns multiple label stacks
to the forwarder. The controller delivers a label stack to an appropriate node and assigns a
special label to associate label stacks to implement segment-based forwarding. The special
label is a stitching label, and the appropriate node is a stitching node.
The controller assigns a stitching label at the bottom of a label stack to a stitching node. After
a packet arrives at the stitching node, the stitching node swaps a label stack associated with
the stitching label based on the label-stack mapping. The stitching node forwards the packet
based on the label stack for the next segment.
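The stitching operation can be sketched as follows; the label values match the example used later in this section and are otherwise illustrative.

```python
# Stitching label -> label stack for the next path segment.
stitching_table = {100: [1005, 1009, 1010]}

def stitch(label_stack, table):
    """At a stitching node, swap a stitching label at the top of the
    remaining stack for the label stack of the next segment."""
    if label_stack and label_stack[0] in table:
        return list(table[label_stack[0]])
    return label_stack

# A packet arriving at the stitching node with only label 100 left
# leaves with the next segment's full stack.
print(stitch([100], stitching_table))  # [1005, 1009, 1010]
```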
IS-IS SR is enabled on PE1, PE2, and P1 through P4 to establish IS-IS neighbor relationships
between each pair of directly connected nodes. In SR-capable IS-IS instances, each outbound
IS-IS interface is assigned an SR Adjacency Segment label. SR IS-IS advertises the
Adjacency Segment labels across a network. P3 is used as an example. In Figure 1-1189,
IS-IS-based label allocation is as follows:
1. P3 runs IS-IS to apply for a local dynamic label for an adjacency. For example, P3
assigns adjacency label 9002 to the P3-to-P4 adjacency.
2. P3 runs IS-IS to advertise the adjacency label and flood it across the network.
3. P3 uses the label to generate a label forwarding table.
4. After the other nodes on the network run IS-IS to learn the Adjacency Segment label
advertised by P3, the nodes do not generate local forwarding tables.
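Step 1 above can be sketched as follows; the starting label value matches the P3 example, and the function is an illustrative model of local dynamic label allocation, not device code.

```python
import itertools

def allocate_adjacency_labels(neighbors, first_label=9002):
    """Assign one local dynamic label per IS-IS adjacency and build the
    node's local label forwarding table. Other nodes learn these labels
    through flooding but, as noted above, install no entries for them."""
    labels = itertools.count(first_label)
    return {next(labels): neighbor for neighbor in neighbors}

# P3 assigns adjacency label 9002 to its P3-to-P4 adjacency.
print(allocate_adjacency_labels(["P4"]))  # {9002: 'P4'}
```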
PE1, P1, P2, P3, and P4 assign and advertise adjacency labels in the same way as P3 does.
The label forwarding table is then generated on each node. Each node establishes an IS-IS
neighbor relationship with the controller, generates topology information, including SR labels,
and reports the topology information to the controller.
SR-TE Tunnel
Segment Routing Traffic Engineering (SR-TE) runs the SR protocol and uses TE constraints
to create a tunnel.
In Figure 1-1190, a primary LSP is established along the path PE1->P1->P2->PE2, and a
backup LSP is established along the path PE1->P3->P4->PE2. The two LSPs share the same
SR-TE tunnel ID. Each LSP originates from the ingress, passes through transit nodes, and is
terminated at the egress.
SR-TE tunnel establishment involves configuring and then setting up the tunnel. Before
an SR-TE tunnel is created, IS-IS neighbor relationships must be established between
forwarders to implement network layer connectivity, assign labels, and collect network
topology information. Forwarders send the label and network topology information to the
controller, which uses the information to calculate paths.
Figure 1-1191 Networking for SR-TE tunnels established using configurations that the
controller delivers to forwarders over NETCONF
1. The controller uses SR-TE tunnel constraints and Path Computation Element (PCE) to
calculate paths and combines adjacency labels into a label stack that is the calculation
result.
If the label stack depth exceeds the upper limit supported by a forwarder, a single label
stack cannot carry all the labels for an entire path, and the controller must divide the
path's labels into multiple label stacks.
In Figure 1-1191, the controller calculates a path PE1->P3->P1->P2->P4->PE2 for an
SR-TE tunnel. The path is mapped to two label stacks {1003, 1006, 100} and {1005,
1009, 1010}. Label 100 is a stitching label, and the others are adjacency labels.
2. The controller runs NETCONF to deliver the label stacks to the forwarder.
In Figure 1-1191, the process of delivering label stacks on the controller is as follows:
a. The controller delivers label stack {1005, 1009, 1010} to P1 and assigns a stitching
label of value 100 associated with the label stack. Label 100 is the bottom label in
the label stack on PE1.
b. The controller delivers label stack {1003, 1006, 100} to the ingress PE1.
3. The forwarder uses the delivered label stacks to establish an LSP for an SR-TE tunnel.
An SR-TE tunnel does not support MTU negotiation. Therefore, the MTUs configured on nodes along
the SR-TE tunnel must be the same. If an SR-TE tunnel is created manually, set an MTU value on the
tunnel interface or use the default MTU of 1500 bytes. On the manual SR-TE tunnel, the smallest value
in the following values takes effect: MTU of the tunnel, MPLS MTU of the tunnel, MTU of the
outbound interface, and MPLS MTU of the outbound interface.
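The MTU selection rule in the note above reduces to taking the minimum of the four values; a sketch (the values are illustrative):

```python
def effective_tunnel_mtu(tunnel_mtu, tunnel_mpls_mtu,
                         out_if_mtu, out_if_mpls_mtu):
    """On a manual SR-TE tunnel, the smallest of the four configured
    values takes effect."""
    return min(tunnel_mtu, tunnel_mpls_mtu, out_if_mtu, out_if_mpls_mtu)

# Default tunnel MTU of 1500 bytes, but an outbound interface whose
# MPLS MTU is 1492: packets on the tunnel are limited to 1492 bytes.
print(effective_tunnel_mtu(1500, 1500, 1500, 1492))  # 1492
```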
In Figure 1-1192, the SR-TE path calculated by the controller is A -> B -> C -> D -> E -> F.
The path is mapped to two label stacks {1003, 1006, 100} and {1005, 1009, 1010}. The two
label stacks are delivered to ingress A and stitching node C, respectively. Label 100 is a
stitching label and is associated with label stack {1005, 1009, 1010}. The other labels are
adjacency labels. The process of forwarding data packets along the SR-TE tunnel is as
follows:
1. Ingress A adds the label stack {1003, 1006, 100} to the packet, matches the outer label
of 1003 against its adjacencies, and finds the A-to-B adjacency as the outbound interface.
Ingress A strips label 1003 from the label stack {1003, 1006, 100} and forwards the
packet downstream through the A-to-B outbound interface.
2. Node B matches the outer label of 1006 against its adjacencies and finds the B-to-C
adjacency as the outbound interface. Node B strips label 1006 from the label stack
{1006, 100}. The packet carrying the label stack {100} travels through the B-to-C
adjacency to the downstream node C.
3. After stitching node C receives the packet, it identifies stitching label 100 by querying
the stitching label entries and swaps the label for the associated label stack {1005, 1009,
1010}. Stitching node C uses the top label 1005 to find the outbound interface
connected to the C-to-D adjacency and removes label 1005. Stitching node C forwards
the packet carrying the label stack {1009, 1010} along the C-to-D adjacency to the
downstream node D. For more details about stitching labels and stitching nodes, see
1.13.2.3.4 SR-TE.
4. After nodes D and E receive the packet, they treat the packet in the same way as node B.
Node E removes the last label 1010 and forwards the data packet to node F.
5. Egress F receives the packet without a label and forwards the packet along a route that is
found in a routing table.
The preceding information shows that after adjacency labels are manually specified, devices
strictly forward the data packets hop by hop along the explicit path designated in the label
stack. This forwarding method is also called strict explicit-path SR-TE.
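The numbered per-hop steps above can be reproduced with a small simulation. The adjacency tables encode the illustrative labels from this example (1003 = A-to-B, 1006 = B-to-C, 1005 = C-to-D, 1009 = D-to-E, 1010 = E-to-F), with node C holding the stitching entry for label 100; this is a sketch of the forwarding logic, not device behavior.

```python
ADJACENCIES = {  # node -> {adjacency label: next hop}
    "A": {1003: "B"}, "B": {1006: "C"}, "C": {1005: "D"},
    "D": {1009: "E"}, "E": {1010: "F"},
}
STITCHING = {"C": {100: [1005, 1009, 1010]}}  # stitching node entries

def forward(node, stack):
    """Forward a packet hop by hop: at a stitching node, swap the
    stitching label for the next stack; otherwise pop the outer label
    to pick the outbound adjacency."""
    hops = [node]
    while stack:
        table = STITCHING.get(node, {})
        if stack[0] in table:                 # stitching: swap in next stack
            stack = list(table[stack[0]])
        label, stack = stack[0], stack[1:]    # pop the outer label
        node = ADJACENCIES[node][label]       # adjacency -> next hop
        hops.append(node)
    return hops

print(forward("A", [1003, 1006, 100]))  # ['A', 'B', 'C', 'D', 'E', 'F']
```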
On the network shown in Figure 1-1193, a node+adjacency mixed label stack is configured.
On the ingress node A, the mixed label stack is {1003, 1006, 1005, 101}. Labels 1003, 1006
and 1005 are adjacency labels, and label 101 is a node label.
1. Node A finds an A-B outbound interface based on label 1003 on the top of the label
stack. Node A removes label 1003 and forwards the packet to the next hop node B.
2. Similar to node A, node B finds the outbound interface mapped to label 1006 on the top
of the label stack. Node B removes label 1006 and forwards the packet to the next hop
node C.
3. Similar to node A, node C finds the outbound interface mapped to label 1005 on the top
of the label stack. Node C removes label 1005 and forwards the packet to the next hop
node D.
4. Node D processes node label 101 on the top of the label stack. This label is used for load
balancing. Node D replaces the label with label 201 or 301 and forwards the packet to
node E or G, respectively. Traffic is balanced across the links based on 5-tuple information.
5. After receiving the packets carrying node label 201 or 301, penultimate-hop nodes E and
G remove the labels and forward the packets to node F, completing end-to-end traffic
forwarding.
The preceding information shows that after adjacency and node labels are manually specified,
a device can forward the data packets along the shortest path or load-balance the data packets
over paths. The paths are not fixed, and therefore, this forwarding method is called loose
explicit-path SR-TE.
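The load balancing in step 4 above hinges on 5-tuple hashing. The following sketch shows how a node might pick one of the two downstream (next hop, label) pairs by hashing a flow's 5-tuple; the hash function and bucket mapping are assumptions, since real devices use their own hardware hash.

```python
import hashlib

# Hash bucket -> (next hop, node label), per the example above.
PATHS = {0: ("E", 201), 1: ("G", 301)}

def pick_path(src_ip, dst_ip, protocol, src_port, dst_port):
    """Choose a downstream path by hashing the flow's 5-tuple, so all
    packets of one flow follow the same path."""
    key = f"{src_ip}|{dst_ip}|{protocol}|{src_port}|{dst_port}".encode()
    bucket = hashlib.sha256(key).digest()[0] % len(PATHS)
    return PATHS[bucket]

# Packets of the same flow always map to the same (next hop, label).
flow = ("10.0.0.1", "10.0.0.2", 6, 1234, 80)
assert pick_path(*flow) == pick_path(*flow)
```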
Static Route
Static routes on an SR-TE tunnel work in the same way as common static routes. When
configuring a static route, set the outbound interface of a static route to an SR-TE tunnel
interface so that traffic transmitted over the route is directed to the SR-TE tunnel.
Tunnel Policy
By default, VPN traffic is forwarded through LDP LSPs, not SR LSPs or SR-TE tunnels. If
the default LDP LSPs cannot meet VPN traffic requirements, a tunnel policy is used to direct
VPN traffic to an SR LSP or an SR-TE tunnel.
The tunnel policy may be a tunnel type prioritizing policy or a tunnel binding policy. Select
either of the following policies as needed:
Select-seq mode: This policy changes the type of tunnel selected for VPN traffic. An SR
LSP or SR-TE tunnel is selected as a public tunnel for VPN traffic based on the
prioritized tunnel types. If no LDP LSPs are available, SR LSPs are selected by default.
Tunnel binding mode: This policy defines a specific destination IP address, and this
address is bound to an SR-TE tunnel for VPN traffic to guarantee QoS.
Auto Route
An IGP uses an auto route related to an SR-TE tunnel, which functions as a logical link, to
compute a path. The tunnel interface is used as the outbound interface in the auto route.
According to the network plan, a node determines whether an LSP link is advertised to a
neighbor node for packet forwarding. An auto route is configured using either of the following
methods:
Forwarding shortcut: The node does not advertise an SR-TE tunnel to its neighbor nodes.
The SR-TE tunnel can be involved only in local route calculation, but cannot be used by
the other nodes.
Forwarding adjacency: The node advertises an SR-TE tunnel to its neighbor nodes. The
SR-TE tunnel is involved in global route calculation and can be used by the other nodes.
Forwarding shortcut and forwarding adjacency are mutually exclusive, and cannot be used
simultaneously.
When the forwarding adjacency is used, a reverse tunnel must be configured for a routing protocol
to perform bidirectional check after a node advertises LSP links to the other nodes. The forwarding
adjacency must be enabled for both tunnels in opposite directions.
Policy-Based Routing
Policy-based routing (PBR) allows a device to select routes based on user-defined
policies, which improves traffic security and balances traffic. If PBR is enabled on an SR
network, IP packets are forwarded over specific LSPs based on PBR rules.
SR-TE PBR, the same as IP unicast PBR, is implemented by defining a set of matching rules
and behaviors. The rules and behaviors are defined using the apply clause with an SR-TE
tunnel interface used as an outbound interface. If packets do not match PBR rules, they are
properly forwarded using IP; if they match PBR rules, they are forwarded over specific
tunnels.
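The match-then-apply behavior described above can be sketched as a rule lookup; the rule fields and interface name are illustrative, not the device's actual configuration model.

```python
# Each rule pairs matching criteria with an apply clause naming the
# SR-TE tunnel interface to use as the outbound interface.
RULES = [
    {"dst_prefix": "192.168.1.", "apply_out_interface": "Tunnel1"},
]

def pbr_lookup(dst_ip):
    """Return the tunnel interface for a matching rule, or fall back to
    ordinary IP forwarding when no rule matches."""
    for rule in RULES:
        if dst_ip.startswith(rule["dst_prefix"]):
            return rule["apply_out_interface"]
    return "ip-forwarding"

print(pbr_lookup("192.168.1.7"))  # Tunnel1
print(pbr_lookup("172.16.0.9"))   # ip-forwarding
```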
The network shown in Figure 1-1196 consists of inconsecutive L3VPN subnets with a
backbone network in between. PEs establish an SR LSP to forward L3VPN packets. PEs run
BGP to learn VPN routes. The deployment is as follows:
An IS-IS neighbor relationship is established between each pair of directly connected
devices on the public network to implement route reachability.
A BGP peer relationship is established between the two PEs to learn peer VPN routes of
each other.
The PEs establish an IS-IS SR LSP to assign public network labels and compute a label
switched path.
BGP is used to assign a private network label, for example, label Z, to a VPN instance.
VPN routes are iterated to the SR LSP.
PE1 receives an IP packet, adds the private network label and SR public network label to
the packet, and forwards the packet along the label switched path.
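The encapsulation in the last step can be sketched as a two-label push; the label values are illustrative. The private (VPN) network label is pushed first, so the SR public label ends up outermost on the wire.

```python
def encapsulate_l3vpn(ip_packet, private_label, sr_public_label):
    """Push the BGP private (VPN) label, then the SR public label, so
    the public label is outermost when the packet leaves PE1."""
    return {"labels": [sr_public_label, private_label], "payload": ip_packet}

pkt = encapsulate_l3vpn("ip-payload", private_label=1027,
                        sr_public_label=16100)
print(pkt["labels"])  # [16100, 1027] -> public label outermost
```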
HVPN
On a growing network with increasing types of services, PEs encounter scalability problems,
such as insufficient access or routing capabilities, which reduces network performance and
scalability. In this situation, VPNs cannot be deployed in a large scale. In Figure 2, on a
hierarchical VPN (HVPN), PEs play different roles and provide various functions. These PEs
form a hierarchical architecture to provide functions that are provided by one PE on a
non-hierarchical VPN. HVPNs lower the performance requirements for PEs.
inner label L4 and an outer label Lv to the packet and sends the packet to the SPE over
the corresponding LSP. The label stack is L4/Lv.
3. After receiving the packet, the SPE replaces the outer label Lv with Lu and the inner
label L2 with L3. Then, the SPE sends the packet to the NPE over the same LSP.
4. After receiving the packet, the NPE removes the outer label Lu, searches for a VPN
instance corresponding to the packet based on the inner label L3, and removes the inner
label L3 after the VPN instance is found. Then, the NPE searches the VPN forwarding
table of this VPN instance for the outbound interface of the packet based on the
destination address of the packet. The NPE sends the packet through this outbound
interface to CE2. The packet sent by the NPE is a pure IP packet with no label.
VPN FRR
In Figure 1-1198, PE1 adds the optimal route advertised by PE3 and less optimal route
advertised by PE4 into a forwarding entry. The optimal route is used to guide traffic
forwarding, and the less optimal route is used as a backup route.
P1-to-P3 link failure:
PE1 does not support BFD for SR-BE and cannot detect an LSP Down event. As a
result, PE2 cannot perform VPN FRR switching to switch traffic to PE4 along
LSP3 over a path in Figure 1-1198.
After IS-IS FRR is configured, P1 performs FRR switching to switch traffic to
LSP2 over the path PE1->P1->P2->P4->P3->PE3, shown in Figure 1-1198.
After IS-IS FRR is configured, SR-BE LSP hard convergence is performed on the P
node. Traffic switches to LSP2 over the
In Figure 1-1200, after the PEs learn the MAC addresses of VPN sites and establish a public
network SR LSP, the PEs can transmit unicast packets to the other site. The packet
transmission process is as follows:
1. CE1 sends unicast packets based on Layer 2 forwarding to PE1.
2. After PE1 receives the packets, PE1 encapsulates a private network label carried in a
MAC entry and a public network SR label in sequence and sends the packets to PE2.
3. After PE2 receives the encapsulated unicast packets, PE2 performs decapsulation,
removes the private network label, and searches the private network MAC table for a
matching outbound interface.
the situation when the label stack is withdrawn. Therefore, BFD must be used to monitor