Sie sind auf Seite 1von 1
Blog Home | INE Home | Members | Contact Us | Subscribe Free Resources View
Blog Home | INE Home | Members | Contact Us | Subscribe Free Resources View
| INE Home | Members | Contact Us | Subscribe Free Resources View Archives All Access
Free Resources View Archives All Access Pass CCIE Bloggers
Free Resources
View Archives
All Access Pass
CCIE Bloggers
Nov 22
Nov
22

18 Comments

in BGP , BGP , CCDE , CCDP Introduction 18 Comments BGP (see [0] ) is

BGP (see [0]) is the de-facto protocol used for Inter-AS connectivity nowadays. Even though it is commonly

accepted that BGP protocol design is far from being ideal and there have been attempts to develop a better

replacement for BGP, none of them has been successful. To further add to BGP’s widespread adoption, MP-BGP

extension allows BGP transporting almost any kind of control-plane information, e.g. to providing auto-discovery

functions or control-plane interworking for MPLS/BGP VPNs. However, despite BGP’s success, the problems with

the protocol design did not disappear. One of them is slow convergence, which is a serious limiting factor for many

modern applications. In this publication, we are going to discuss some techniques that could be used to improve

BGP convergence for Intra-AS deployments.

BGP-Only Convergence Process

Tuning BGP Transport

BGP Fast Peering Session Deactivation

BGP and IGP Interaction

BGP PIC and Multiple-Path Propagation

Practical Scenario: BGP PIC + BGP NHT

Considerations for Implementing BGP PIC

Summary

Further Reading

Appendix: Practical Scenario Baseline Configuration

BGP-Only Convergence Process

BGP is a path-vector protocol – in other words, a distance-vector protocol featuring complex metric. In absence of

any policies, BGP operates like if routes have metric equal to the length of the AS_PATH attribute. BGP routing

polices may override this simple monotonous metric and potentially create divergence conditions in non-trivial BGP

topologies (see [7],[8],[9]). While this may be a serious problem at a large scale, we are not going to discuss these

pathological cases, but rather talk about convergence in general. Like any distance-vector protocol, BGP routing

process accepts multiple incoming routing updates, and advertises only the best routes to its peers. BGP does not

utilize periodic updates, and thus route invalidation is not based on expiring any sort of soft state information (e.g

prefix-related timers like in RIP). Instead, BGP uses explicit withdrawal section in the triggered UPDATE message

to signal neighbors of the loss of the particular path. In addition to the explicit withdrawals, BGP also support

implicit signaling, where newer information for the same prefix from the same peer replaces the previously learned

information.

Search

Search

Submit

Categories

Select Category

CCIE Bloggers

Routing & Sw itchingSelect Category CCIE Bloggers Brian Dennis, CCIEx5 #2210 Voice Security Service Provider ISP Dial Brian McGahan,

VoiceBloggers Brian Dennis, CCIEx5 #2210 Routing & Sw itching Security Service Provider ISP Dial Brian McGahan,

SecurityBrian Dennis, CCIEx5 #2210 Routing & Sw itching Voice Service Provider ISP Dial Brian McGahan, CCIEx4

Service ProviderDennis, CCIEx5 #2210 Routing & Sw itching Voice Security ISP Dial Brian McGahan, CCIEx4 #8593, CCDE

ISP DialRouting & Sw itching Voice Security Service Provider Brian McGahan, CCIEx4 #8593, CCDE #2013::13 Design Data

DesignISP Dial Brian McGahan, CCIEx4 #8593, CCDE #2013::13 Data Center Routing & Sw itching Security Service

Data CenterISP Dial Brian McGahan, CCIEx4 #8593, CCDE #2013::13 Design Routing & Sw itching Security Service Provider

Routing & Sw itchingMcGahan, CCIEx4 #8593, CCDE #2013::13 Design Data Center Security Service Provider Mark Snow , CCIEx4 #14073

SecurityCCDE #2013::13 Design Data Center Routing & Sw itching Service Provider Mark Snow , CCIEx4 #14073

Service ProviderDesign Data Center Routing & Sw itching Security Mark Snow , CCIEx4 #14073 Data Center Collaboration

Data Centeritching Security Service Provider Mark Snow , CCIEx4 #14073 Collaboration Security Voice Petr Lapukhov, CCIEx4 #16379,

CollaborationService Provider Mark Snow , CCIEx4 #14073 Data Center Security Voice Petr Lapukhov, CCIEx4 #16379, CCDE

SecurityProvider Mark Snow , CCIEx4 #14073 Data Center Collaboration Voice Petr Lapukhov, CCIEx4 #16379, CCDE #2010::7

VoiceMark Snow , CCIEx4 #14073 Data Center Collaboration Security Petr Lapukhov, CCIEx4 #16379, CCDE #2010::7 Design

DesignSecurity Voice Petr Lapukhov, CCIEx4 #16379, CCDE #2010::7 Routing & Sw itching Security Service Provider Voice

Routing & Sw itchingVoice Petr Lapukhov, CCIEx4 #16379, CCDE #2010::7 Design Security Service Provider Voice Popular Posts No posts

SecurityCCIEx4 #16379, CCDE #2010::7 Design Routing & Sw itching Service Provider Voice Popular Posts No posts

Service ProviderCCDE #2010::7 Design Routing & Sw itching Security Voice Popular Posts No posts to display Let’s

VoiceDesign Routing & Sw itching Security Service Provider Popular Posts No posts to display Let’s have

Popular Posts

No posts to display

Let’s have a look at BGP UPDATE message below. As you can see, the UPDATE message may contain both

withdrawn prefixes and new routing information. While withdrawn prefixes are listed simply as a collection of NLRIs,

new information is grouped around the set of BGP attributes, shared by the group of announced prefixes. In other

words, every BGP UPDATE message contains new information pertaining to a set of path attributes, at least

prefixes sharing the same AS_PATH attribute. Therefore, every new collection of attributes requires a separate

UPDATE message to be sent. This fact is important, as BGP process tries packing as many prefixes per update

message as possible, when replicating routing information.

message as possible, when replicating routing information. Look at the sample topology below. Let’s assume that

Look at the sample topology below. Let’s assume that R1′s session to R7 just came up and follow the way that

prefix 20.0.0.0/8 takes to propagate through AS 300. In the course of this discussion we skip the complexities

associated with BGP policy application and thus ignore the existence of BGP Adj-RIB-In space used for processing

the prefixes learned from a peer prior to running the best-path selection process.

a peer prior to running the best-path selection process. Upon session establishment and exchanging the BGP

Upon session establishment and exchanging the BGP OPEN messages, R1 enters the “BGP Read-Only Mode”. What this means, is that R1 will not start the BGP Best-Path selection process until it either receives all prefixes from R7 or reaches the BGP read-only mode timeout. The timeout is defined using the BGP process command bgp update-delay . The reason to hold the BGP best-path selection process is to ensure bgp update-delay. The reason to hold the BGP best-path selection process is to ensure that the peer has supplied us all routing information. This allows minimizing the number of best-path selection process runs, simplify update generation and ensure better prefix per message packing, thus improving transportation efficiency.

BGP process determines the end of UPDATE messages flow in either of two ways: receiving BGP KEEPALIVE message or receiving BGP End-of-RIB message. The last message is normally used for BGP graceful restart (see [13] ), but could also be used to explicitly signalize the end of BGP UPDATE [13]), but could also be used to explicitly signalize the end of BGP UPDATE exchange process. Even if BGP process does not support the End-of-RIB marker, Cisco’s BGP implementation always sends a KEEPALIVE message when it finishes sending updates to a peer. It is clear that the best- path selection delay would be longer in case when peers have to exchange larger routing tables, or the underlying TCP transport and router ingress queue settings make the exchange slower. To address this, we’ll briefly cover TCP transport optimization later.

When R1′s BGP process leaves read-only mode, it starts the best-path selection running the BGP Router process. This process walks over new information and compare it with the local BGP Router process. This process walks over new information and compare it with the local BGP RIB contents, selecting the best-path for every prefix. It takes time proportional to the amount of the new informational learned. Luckily, the computations are not very CPU-intensive, just like with any distance-vector protocol. As soon as the best-path process if finished, BGP has to upload all routes to the RIB, before advertising them to the peers. This is a requirement of distance-vector protocols – having the routing information active in the RIB before propagating it further. The RIB update will in turn trigger FIB information upload to the router’s line-cards, if the platform supports distributed forwarding. Both RIB and FIB updates are time- consuming and take the time proportional to the number of prefixes being updated.

After information has been committed to RIB, R1 needs to replicate the best-paths to every peer that should receive it. The replication process could be most memory and CPU intensive as BGP process has to perform a full BGP table walk for every peer and construct the output for the corresponding BGP Adj-RIB- Out. This may require additional transient memory in the course of the update batch calculation. However, the update generation process is highly optimized in Cisco’s BGP implementation by means of dynamic update groups . The essence of the dynamic update groups is that BGP process dynamic update groups. The essence of the dynamic update groups is that BGP process dynamically finds all neighbors sharing the same output policies, then elects a peer with the lowest IP address as the group leader and only generates the updates batch for the group leader. All other members of the same group receive the same updates. In our case, R1 has to generate two update sets: one for R5 and another for the pair of RR1 and RR2 route reflectors. The BGP update groups become very effective on route- reflectors that often have hundred of peers sharing the same policies. You may see the update groups using the command show ip bgp replication for IPv4 sessions.

R1 starts sending updates to R1 and RR1, RR2. This will take some time, depending on the BGP TCP transport settings and BGP table size. However, before R1 will ever start sending any updates to any peer/update group, it checks if Advertisement Interval timer is running for this peer. BGP speaker starts this timer on per-peer Advertisement Interval timer is running for this peer. BGP speaker starts this timer on per-peer basis every time its done sending the full batch of updates to the peer. If the subsequent batch is prepared to be sent and the timer is still running, the update will be delayed until the timer expires. This is a dampening mechanism to prevent unstable peers from flooding the network with updates. The command to define this timer is neighbor X.X.X.X advertisment-interval XX. The default values are 30 seconds for eBGP and 5 seconds for iBGP/eiBGP sessions (intra-AS). This timer really starts playing its role only for “Down-Up” or “Up-Down” convergence, as any rapid flapping changes are delayed for the amount of advertisement-interval seconds. This becomes especially important for inter-AS route propagation, where the default advertisement-interval there is 30 seconds.

The process repeats itself on RR1 and RR2, starting with the incoming UPDATE packet reception, best-path

selection and update generation. If for some reason the prefix 20.0.0.0/8 would vanish from AS 100 soon after it

has been advertised, it may take as long as “Number_of_Hops x Advertisement_Interval” to reach to R3 and R4,

as every hop may delay the fast subsequent update. As we can see, the main limiting factors of BGP convergence

are BGP table size, transport-level settings and advertisement delay. The best-path selection time is proportional

to the table size as well as time required for update batching.

Let’s look at a slightly different scenario to demonstrate how BGP multi-path may potentially improve convergence.

Firstly, observing the topology presented on FIG 1, we may notice that AS 300 has two connections to AS 100.

Thus, it may be expected to see two paths to every route from AS 100 on every router in AS 300. But this is not

always possible in situations where any topology other than BGP full mesh is used inside the AS. In our example,

R1 and R2 advertise routing information to the route-reflectors RR1 and RR2. Per the distance-vector behavior,

the reflectors will only re-advertise the best-path to AS 100 prefixes, and since both RRs elect paths consistently,

they will advertise the same path to R3, R4 and R2. Both R3 and R4 will receive the prefix 10.0.0.0/24 from each

of the RRs and use the path via R1. R2 will receive the best path via R1 as well but prefer using its eBGP

connection. On contrary, if R1, R2, R3 and R4 were connected in the full mesh, then every router would have

seen exits via R1 and R2 and be able to use BGP multi-path if configured. Let’s review what happens in the

topology on FIG1 when R1 loses connection to AS 100.

Depending on the failure detection mechanism, be it BGP keepalives or BFD, it will take some time for R1 to realize the connection is no longer valid. We’ll discuss the options for fast failure detection later in this publication.in the topology on FIG1 when R1 loses connection to AS 100. After realizing that R5

After realizing that R5 is gone, R1 deletes all paths via R7. Since RR1 and RR2 never advertised back to R1 the path via R2, R1 has no alternate paths to AS 100. Realizing this, R1 prepares a batch of UPDATE messages for RR1, RR2 and R7, containing the withdrawal messages for AS 100 prefixes. As soon as RR1 and RR2 are done receiving and processing the withdrawals, they elect the new best path via R2 and advertise withdrawals/updates to R1, R2, R3, R4.for fast failure detection later in this publication. R3 and R4 now have the new path

R3 and R4 now have the new path via R2, and R2 loses the “backup” path via R1 it knew about from the RRs. The main workhorses of the re-convergence process in this case are the route-reflectors. The convergence time is sum of the peering session failure detection, update advertisement and BGP best- path recalculations in the RRs.via R2 and advertise withdrawals/updates to R1, R2, R3, R4. If BGP speakers were able to

If BGP speakers were able to utilize multiple paths at the same time, then it could be possible to alleviate the

severity of a network failure. Indeed, if load-balancing is in use, then a failure of an exit point will only affect flows

going across this exit point (50% in our case) and only those flows will have to wait for re-convergence time. Even

better, it is theoretically possible to do “fast” re-route in the case where multiple equal-cost (equivalent and thus

loop–less) paths are available in BGP. Such switchover could be performed in the forwarding engine, as soon as

the failure is signaled. However, there are two major problems with the re-route mechanism of this type:

1. As we have seen, the use of route-reflectors (or confederations) has significant effect on redundancy by hiding alternate paths. Using full-mesh is not an option, so a mechanism needed allowing propagation of multiple alternate paths in RR/Confederation environment. It is interesting to point out that such mechanism is already available in BGP/MPLS VPN scenarios, where multiple point of attachments for CE sites could utilize different RD values to differentiate the same routes advertised from different connection points. However, a generic solution is required, allowing for advertising multiple alternate paths with IPv4 or any other address-family.

2. Failure detection and propagation by means of BGP mechanics is slow, and depends on the number of affected prefixes. Therefore, the more severe is the damage, the slower it is propagated in the BGP. Some other, non-BGP mechanism needs to be used to report network failures and trigger BGP re-convergence.

In the following sections we are going to review various technologies developed to accelerate BGP convergence,

enabling far better reaction times compared to “pure BGP based” failure detection and repair.

Tuning BGP Transport

Tuning BGP transport mechanism is a very important factor for improving BGP performance in the cases where

purely BGP-based re-convergence process is in use. TCP is the underlying transport used for propagating BGP

UPDATE messages, and optimizing TCP performance directly benefits BGP. If you take the full Internet routing

table, which is above 300k prefixes (Y2010), then simply transporting the prefixes alone will consume over 10

Megabytes, not to count the path attributes and other information. Tuning TCP transport performance includes the

following:

1. Enabling TCP Path MTU discovery for every neighbor, to allow the TCP selecting optimum MSS size. Notice that this requires that no firewall blocks the ICMP unreachable messages used during the discovery process

2. Tuning the router’s ingress queue size to allow for successful absorption of large amount of TCP ACK messages. When a router starts replicating BGP UPDATES to its peers, every peer responds with TCP ACK message to normally every second segment sent (TCP Delayed ACK). The more peers router has, the higher will be the pressure on the ingress queue.

Very detailed information on tuning BGP transport could be found in [10] Chapter 3. We, therefore, skip an in-

depth discussion of this topic here.

BGP Fast Peering Session Deactivation

When using BGP-only convergence mechanic, detecting a link failure is normally based on BGP KEEPALIVE

timers, which are 60/180 seconds by default. It could be noted that TCP keepalives could be used for the same

purpose, but since BGP already has similar mechanics these are not of any big help. It is possible to tune the BGP

keepalive timers to be as low as 1/3 seconds, but the risk of peering session flapping become significant with such

settings. Such instability is dangerous since there is no built-in session dampening mechanism in BGP session

establishment process. Therefore, some other mechanism should be preferred – either BFD or fast BGP peering

session deactivation. The last option is on by default for eBGP sessions, and tracks the outgoing interface

associated with the BGP session. As soon as the interface (or the next-hop for multihop eBGP) is reported as

down, the BGP session is deactivated. Interface flapping could be effectively dampened using IP Event Dampening

in Cisco IOS (see [14]) and hence is less dangerous than BGP peering session flapping. The command to disable

fast peering session deactivation is no bgp fast-external-fallover. Notice that this feature is by default off for

iBGP sessions, as those are supposed to be routed and restored using the underlying IGP mechanics.

Using BFD is the best option on multipoint interfaces, such as Ethernet, that do not support fast link down

detection e.g. by means of Ethernet OAM. BFD is especially attractive in the platforms that implement it in the

hardware. The command to activate BFD fallover is neighbor fall-over bfd. In the following sections, we’ll

discuss the use of IGP for fast reporting of link failures.

BGP and IGP Interaction

BGP prefixes typically rely on recursive next-hop resolution. That is, next-hops associated with BGP prefixes are

normally not directly connected, but rather resolved via IGP. The core of BGP and IGP interaction used to be

implemented in the BGP Scanner process. This process runs periodically and among other work performs full

BGP table walk and validates the BGP next-hop values. The validation consists of resolving the next-hop

recursively through the router’s RIB and possibly changing the forwarding information in response to IGP events.

For example, if R1 crashes on FIG1, it will take 180 seconds for the RRs to notice the failure based on BGP

KEEPALIVE message. However, the IGP will probably converge faster and report R1′s address as unreachable.

This event will be detected during the BGP Scanner process run and all paths via R1 will be invalidated by all

BGP speakers in AS 100. The default BGP Scanner run-time is 60 seconds, and could be changed using the

command bgp scan-time. Notice that setting this value too low may result in extra burden on router’s CPU if you

have large BGP tables, since the scanner process has to perform full table walk every time it executes.

The periodic behavior of BGP Scanner is still too slow to effectively respond to IGP events. IGP protocols could be

tuned to react to a network change within hundreds of milliseconds (see [6]) and it would be desirable to make

BGP aware of such changes as quickly as possible. This could be done with the help of BGP Next-Hop Tracking

(NHT) feature. The idea is to make the BGP process register the next-hop values with the RIB “watcher” process

and require a “call-back” every time information about the prefix corresponding to the next-hop changes. Typically,

the number of registered next-hop values equals the number of exits from the local AS, or the number of PEs in

MPLS/BGP VPN environment, so next-hop tracking does not impose heavy memory/CPU requirements. There are

normally two types of events: IGP prefix becoming unreachable and IGP prefix metric change. The first event is

more important and reported faster than metric change. Overall, IGP delays report of an event for the duration of

bgp nextop trigger delay XX interval which is 5 seconds by default. This allows for more consecutive events to

be processed and received from IGP and effectively implements event aggregation. This delay is helpful in various

“fate sharing” scenarios where a facility failure affects multiple links in the network, and BGP needs to ensure that

all IGP nodes have reported this failure and IGP has fully converged. Normally, you should set the NHT delay to be

slightly above the time it takes the IGP to fully converge upon a change in the network. In a fast-tuned IGP

network, you can set this delay to as low as 0 seconds, so that every IGP event is reported immediately, though

this requires careful underlying IGP tuning to avoid oscillations. See [6] for more information on tuning the IGP

protocol settings, but in short, you need to tune the SPF delay value in IGP to be conservative enough to capture

all changes that could be caused by a failure in the network. Setting SPF delay too low may result is excessive

BGP next-hop recalculations and massive best-path process runs.

As a reaction to an IGP next-hop change, the BGP process has to start BGP Router sub-process for re-

calculating the best paths. This will affect every prefix that has the next-hop changed as a result of IGP event, and

could take significant amount of time, based on number of prefixes associated with this nexthop. For example, if an

AS has two connections to the Internet and receives full BGP tables over both connections, then a single exit

failure will force full-table walk for over 300k prefixes. After this happens, BGP has to upload the new forwarding

information to RIB/FIB, with the overall delay being proportional to the table size. To put it in other words, BGP

convergence is non-deterministic in response to an IGP event, e.g. there is no well-defined finite time for the

process to complete. However, if the IGP change did not result in any effects to BGP next-hop, e.g. if IGP was able

to repair the path upon link failure and the path has the same cost, then BGP is not needed to be informed at all

and convergence is handled at IGP level.

The last, less visible contributor to faster convergence is Hierarchical FIB. Look at the figure below – it shows

how FIB could be organized as either “flat” or “hierarchical”. In the “flat” case, BGP prefixes have their forwarding

information directly associated – e.g. the outgoing interface, MAC rewrite, MPLS label information and so on. In

such case, any change to a BGP next-hop may require updating a lot of prefixes sharing the same next-hop, which

is a time consuming process. If the next-hop value remains the same, and only the output interface changes, the

FIB update process still needs walking over all BGP prefixes and reprogramming the forwarding information. In

case of “hierarchical” FIB, any IGP change that does not affect BGP prefixes, e.g. output interface change, only

requires walking over the IGP prefixes, which are not as numerous as BGP. Therefore, hierarchical FIB

organization significantly reduces FIB update latency in the cases where only IGP information needs to be

changed. The use of hierarchical FIB is automatic and does not require any special commands. All major

networking equipment vendors support this feature.

The last thing to discuss in relation to BGP NHT is IGP route summarization. Summarization hides detailed

information and may conceal changes occurring in the network. In such case, BGP process will not be notified of

the IGP event and will have to detect failure and re-converge using BGP-only mechanics. Look at the figure below

– because of summarization, R1 will not be notified or R2′s failure and the BGP process at R1 will have to wait till

BGP session times out. Aside from avoiding summarization for the prefixes used for iBGP peering, an alternate

solution could be using multi-hop BFD [15]. Additionally, there is some work in progress to allow the separation of

routing and reachability information natively in IGP protocols.

and reachability information natively in IGP protocols. You can see now how NHT may allow BGP

You can see now how NHT may allow BGP to react quickly to the events inside its own AS, provided that

underlying IGP is properly tuned for fast convergence. This fast convergence process effectively covers core link

and node failures, as well as edge link and node failures, provided that all these could be detected by IGP. You

may want to look at [1] for detailed convergence breakdowns. Pay special attention that edge link failure requires

special handling. If your edge BGP speaker is changing the next-hop value to self for the routes received from

another autonomous system, than IGP will only be able to detect failures for paths going to the BGP speaker’s own

IP address. However, if the edge link fails, the convergence will follow along the BGP path, using BGP withdrawal

message propagation through the AS. The best approach in this case is to leave the eBGP next-hop IP address

unmodified and advertise the edge link into IGP using the passive interface feature or redistribution. This will allow

the IGP to respond to link down condition by quickly propagating the new LSA and synchronously trigger BGP re-

convergence on all BGP speakers in the system by informing them of the failed next-hop. In topologies with large

BGP tables this takes significantly less time compared to BGP-based convergence process. And lastly, despite all

benefits that BGP NHT may provide for recovering from Intra-AS failures, the Inter-AS convergence is still purely

BGP driven, based on BGP’s distance-vector behavior.

BGP PIC and Multiple-Path Propagation

Even though BGP NHT enables fast reaction to IGP events, the convergence time is still not deterministic, because

it depends on the number of prefixes BGP needs to be processed for best-path selection. Previously, we

discussed how having multiple equal-cost BGP paths could be used for redundancy and fast failover at the

forwarding engine level, without involving any BGP best-path selection. What if the paths are unequal – is it

possible to use them for backup? In fact, since BGP treats the local AS as a single hop, all BGP speakers select

the same path consistently, and changing from one path to another synchronously among all speakers should not

create any permanent routing loops. Thus, even in scenarios where equal-cost BGP multi-path is not possible, the

secondary paths may still be used for fast failover, provided that a signaling mechanism to detect the primary path

failure exists. We already know that BGP NHT could be used to detect a failure and propagate this information

quickly to all BGP speakers, triggering local switchover. This switchover does not require any BGP table walks and

best-path re-election, but simply is a matter of changing the forwarding information – provided that hierarchical FIB

is in use. Therefore, this process does not depend on the number of BGP prefixes, and thus known as Prefix

Independent Convergence (PIC) process. You may think of this process as a BGP equivalent to IGP-based Fast

Re-Route, though in IGP failure deception is local to the router and in BGP failure detection is local to the AS. BGP

PIC could be used any time there are multiple paths to the destination prefix, such on R1 in the example below,

where target prefix is reachable via multiple paths:

We have already stated the problem with multiple paths – only one best path is advertised by BGP speakers and

the BGP speaker will only accept one path for a given prefix from a given peer. If a BGP speaker receives multiple

paths for the same prefix within the same session it simply uses the newest advertisement. A special extension to

BGP known as “Add Paths” (see [3] and [16]) allows BGP speaker to propagate and accept multiple paths for

the same prefix. The “Add Paths” capability allows peering BGP speakers to negotiate whether they support

advertising/receiving multiple paths per prefix and actually advertise such paths. A special 4-byte path-identifier is

added to NLRIs to differentiate multiple paths for the same prefix sent across a peering session. Notice that BGP

still considers all paths as comparable from the viewpoint of best-path selection process – all paths are stored in

the BGP RIB and only one is selected as the best-path. The additional NLRI identifier is only used when prefixes

are sent across a peering session to prevent implicit withdrawals by the receiving peer. These identifiers are

generated locally and independently for every peering session that supports such capability.

for every peering session that supports such capability. in addition to propagating backup paths, the “Add

in addition to propagating backup paths, the “Add Paths” capability could be used for other purposes, e.g.

overcoming BGP divergence problems described in [9]. Alternatively, if backup paths are required but “Add Path”

feature is not implemented, one of your options could be using full-mesh of BGP speakers, such as on the figure

below. In this case, multiple exit point information is preserved and allows for implementing BGP PIC functionality.

preserved and allows for implementing BGP PIC functionality. Pay attention to the fact that BGP PIC

Pay attention to the fact that BGP PIC is possible even without the “Add Paths” capability in RR scenarios,

provided that RRs propagate the alternate paths to the edge nodes. This may require IGP metric manipulation to

ensure different exit points are selected by the RRs or using other techniques, such as different RD values for

multi-homed site attachment points.

Practical Scenario: BGP PIC + BGP NHT

In this hands-on scenario we are going to illustrate the use of IGP tuning, BGP NHT configuration and BGP PIC

and demonstrate how they work together. First, look at the topology diagram: R9 is advertising a prefix, and R5,

R6 receive this prefix via the RRs. In normal BGP environment, provided that the RRs elect the same path, R5 and

R6 would have just one path for R9′s prefix. However, we tune the scenario, disabling the connections between R1

and R4 and R2 and R3, so R3 has better cost to exit via R1 and R4 has better cost via R2. This will make the RRs

elect different best-paths and propagate them to their clients.

different best-paths and propagate them to their clients. The following is the key piece of configuration

The following is the key piece of configuration for enabling the fast backup path failover to be applied to every

router in AS 100. As you can see, the SPF/LSA throttling timers are tuned very aggressively to allow for fastest

reaction to IGP events. BGP nexthop trigger delay is set to 0 seconds, thus fully relying on IGP to aggregate

underlying events. In any production environment, you should NOT use these values and pick up your own,

matching your IGP scale and convergence rate.

router ospf 100

timers throttle spf 1 100 5000

timers throttle lsa all 0 100 5000

timers lsa arrival 50

!

router bgp 100

bgp nexthop trigger delay 0

bgp additional-paths install

no bgp recursion host

The command bgp additional-paths install when executed in non BGP-multipath environment allows for

installing backup paths in additional to the best one elected by BGP. This, of course, requires that the additional

paths have been advertised by the BGP Route Reflectors. At the moment of writing, Cisco IOS does not support

the “Add Paths” capability, so you need to make sure BGP RRs elect different best-paths in order for the edge

routers to be able to use additional paths. The command no bgp recursion host requires special explanation on

its own. By default, when a BGP prefix loses next-hop, the CEF process will attempt to look-up the next longest-

matching prefix for the next-hop to provide fallback. When additional repair paths are present, this functionality is

not required and will, in fact, slower the convergence. This is why it’s automatically disabled when you type the

command bgp additional-paths install and thus typing it with the “no” prefix is not really required.

Now that we have our scenario set up, we are going to demonstrate the fact that at least in current implementation,

Cisco IOS BGP process does not exchange/detects the capabilities for “Add Path” feature. Here is a debugging

output from a peering session establishment process, which shows that no “Add Path Capability” (code 69, per the

RFC draft) is being exchanged during session establishment.

R5#debug ip bgp 10.0.3.3

BGP debugging is on for neighbor 10.0.3.3 for address family: IPv4 Unicast

R5#clear ip bgp 10.0.3.3

BGP: 10.0.3.3 active rcv OPEN, version 4, holdtime 180 seconds

BGP: 10.0.3.3 active rcv OPEN w/ OPTION parameter len: 29

BGP: 10.0.3.3 active rcvd OPEN w/ optional parameter type 2 (Capability) len 6

BGP: 10.0.3.3 active OPEN has CAPABILITY code: 1, length 4

BGP: 10.0.3.3 active OPEN has MP_EXT CAP for afi/safi: 1/1

BGP: 10.0.3.3 active rcvd OPEN w/ optional parameter type 2 (Capability) len 2

BGP: 10.0.3.3 active OPEN has CAPABILITY code: 128, length 0

BGP: 10.0.3.3 active OPEN has ROUTE-REFRESH capability(old) for all address-families

BGP: 10.0.3.3 active rcvd OPEN w/ optional parameter type 2 (Capability) len 2

BGP: 10.0.3.3 active OPEN has CAPABILITY code: 2, length 0

BGP: 10.0.3.3 active OPEN has ROUTE-REFRESH capability(new) for all address-families

BGP: 10.0.3.3 active rcvd OPEN w/ optional parameter type 2 (Capability) len 3

BGP: 10.0.3.3 active OPEN has CAPABILITY code: 131, length 1

BGP: 10.0.3.3 active OPEN has MULTISESSION capability, without grouping

BGP: 10.0.3.3 active rcvd OPEN w/ optional parameter type 2 (Capability) len 6

BGP: 10.0.3.3 active OPEN has CAPABILITY code: 65, length 4

BGP: 10.0.3.3 active OPEN has 4-byte ASN CAP for: 100

BGP: nbr global 10.0.3.3 neighbor does not have IPv4 MDT topology activated

BGP: 10.0.3.3 active rcvd OPEN w/ remote AS 100, 4-byte remote AS 100

BGP: 10.0.3.3 active went from OpenSent to OpenConfirm

BGP: 10.0.3.3 active went from OpenConfirm to Established

This means that we need to rely on the BGP RRs to advertise multiple different paths in order for the edge nodes

to leverage the backup path capability.

R5#debug ip bgp updates

BGP updates debugging is on for address family: IPv4 Unicast

R5#debug ip bgp addpath

BGP additional-path related events debugging is on

R5#clear ip bgp 10.0.3.3

BGP(0): 10.0.3.3 rcvd UPDATE w/ attr: nexthop 20.0.17.7, origin i, localpref 100, metric 0, originator

10.0.1.1, clusterlist 10.0.3.3, merged path 200, AS_PATH

BGP(0): 10.0.3.3 rcvd 20.0.99.0/24

BGP(0): 10.0.3.3 rcvd NEW PATH UPDATE (bp/be - Deny)w/ prefix: 20.0.99.0/24, label 1048577, bp=N, be=N

BGP(0): 10.0.3.3 rcvd UPDATE w/ prefix: 20.0.99.0/24, - DO BESTPATH

BGP(0): Calculating bestpath for 20.0.99.0/24

Here you can see that the RR with IP address 10.0.3.3 sends us an update that has better information than the

one we currently know. However, before you enable the bgp additional-paths install there would be just one

path installed for the prefix:

R5#show ip route repair-paths 20.0.99.0

Routing entry for 20.0.99.0/24

Known via "bgp 100", distance 200, metric 0

Tag 200, type internal

Last update from 20.0.17.7 00:02:31 ago

Routing Descriptor Blocks:

* 20.0.17.7, from 10.0.3.3, 00:02:31 ago

Route metric is 0, traffic share count is 1

AS Hops 1

Route tag 200

MPLS label: none

But as soon as the bgp additional-paths install option has been enabled, the output of the same command

looks different:

R5#show ip route repair-paths 20.0.99.0

Routing entry for 20.0.99.0/24

Known via "bgp 100", distance 200, metric 0

Tag 200, type internal

Last update from 20.0.17.7 00:00:03 ago

Routing Descriptor Blocks:

* 20.0.17.7, from 10.0.3.3, 00:00:03 ago

Route metric is 0, traffic share count is 1

AS Hops 1

Route tag 200

MPLS label: none

[RPR]20.0.28.8, from 10.0.4.4, 00:00:03 ago

Route metric is 0, traffic share count is 1

AS Hops 1

Route tag 200

MPLS label: none

You may also see the second path in the BGP table with the “b” (backup) flag:

R5#show ip bgp 20.0.99.0

BGP routing table entry for 20.0.99.0/24, version 39

Paths: (2 available, best #1, table default)

Additional-path

Not advertised to any peer

200

20.0.17.7 (metric 192) from 10.0.3.3 (10.0.3.3)

Origin IGP, metric 0, localpref 100, valid, internal, best

Originator: 10.0.1.1, Cluster list: 10.0.3.3

200

20.0.28.8 (metric 192) from 10.0.4.4 (10.0.4.4)

Origin IGP, metric 0, localpref 100, valid, internal, backup/repair

Originator: 10.0.2.2, Cluster list: 10.0.4.4

And if you check the CEF entry for this prefix, you will notice there are multiple next-hops and output interfaces

that could be used for primary/backup paths:

R5#show ip cef 20.0.99.0 detail

20.0.99.0/24, epoch 0, flags rib only nolabel, rib defined all labels

recursive via 20.0.17.7

recursive via 20.0.17.0/24

nexthop 10.0.35.3 Serial1/0

recursive via 20.0.28.8, repair

recursive via 20.0.28.0/24

nexthop 10.0.35.3 Serial1/0

nexthop 10.0.45.4 Serial1/2

Notice that in oder to use the PIC functionality, BGP multi-path should be turned off – otherwise, equal-cost paths

will be used for load-sharing, not for primary/backup behavior. You may opt to using equal-cost multipath if allowed

by the network topology, as it offers better resource utilization and CEF switching layer allows for fast path failover

in case of equal-cost load-balancing. Now for debugging the fast failover process. We want to shut down R1′s

connection to R7 and see fast backup path switchover at R5. There are few caveats here, because we have very

simplified topology. Firstly, we only have one prefix advertised into BGP on R9. Propagating this prefix through

BGP is almost instant, since BGP best-path selection is done quickly and advertisement delay does not apply to a

single event. Thus, if we shutdown R1′s connection to R7, which is used as primary path, then R1 will detect the

link failure and shutdown the session. Immediately after this BGP process will flood an UPDATE with prefix removal

and this message would reach R5 and R6 even before OSPF finishes SPF computations. The reason being, of

course, single prefix propagated via BGP and no advertisement-interval used to delay to a single event.

It may seems like that disabling BGP fast external fallover on R1 could help us to take BGP out of the equation.

However, we still have BGP NHT enabled in R1 – as soon as we shut down the link, the RIB process would report

to BGP of the next-hop failure and UPDATE message will be sent right away. Thus, we would also need to disable

NTH in R1, using the command no bgp nexthop trigger enable . If we think further, we’ll notice that we also

need to enable NHT in R3 and R4, just so that they cannot to generate their own UPDATEs to R5 ahead of OSPF

notification. Therefore, prior to running experiment we disable BGP NHT in R1, R3, R4 and disable fast external

fallover in R1. This will allow the event from R1 propagate via OSPF ahead of BGP UPDATE message and trigger

fast switchover on R5. The below is the output of the debugging commands enabled on R5 after we shut down

R1′s connection to R7.

R5#debug ip ospf spf

OSPF spf events debugging is on

OSPF spf intra events debugging is on

OSPF spf inter events debugging is on

OSPF spf external events debugging is on

R5#debug ip bgp addpath

BGP additional-path related events debugging is on

R5 receive the LSA at 26.223 then BGP starts the path switchover at 26.295 – It took 72ms to run SPF, update

RIB and inform BGP of the event and then change the paths.

14:00:26.223: OSPF: Detect change in topology Base with MTID-0, in LSA type 1, LSID 10.0.1.1 from

 

10.0.1.1 area 0

 

14:00:26.223: OSPF: Schedule SPF in area 0, topology Base with MTID 0

 

Change in LS ID 10.0.1.1, LSA type R, spf-type Full

 

….

14:00:26.295: BGP(0): Calculating bestpath for 20.0.99.0/24, New bestpath is 20.0.28.8 :path_count:-

2/0, best-path =20.0.28.8, bestpath runtime :- 4 ms(or 3847 usec) for net 20.0.99.0

 

14:00:26.299: BGP(0): Calculating backuppath::Backup-Path for 20.0.99.0/24:BUMP-VERSION-BACKUP-

 

DELETE:, backup path runtime :- 0 ms (or 193 usec)

 

14:00:32.439: BGP(0): 10.0.3.3 rcvd UPDATE w/ prefix: 20.0.99.0/24, - DO BESTPATH

 

14:00:32.443: BGP(0): Calculating bestpath for 20.0.99.0/24,

bestpath is 20.0.28.8 :path_count:-

 

2/0, best-path =20.0.28.8, bestpath runtime :- 0 ms(or 222 usec) for net 20.0.99.0

 

14:00:32.443: BGP(0): Calculating backuppath::Backup-Path for 20.0.99.0/24, backup path runtime :- 0

ms (or 133 usec)

In the debugging output above, you can see that the BGP process in R5 switched to backup path even before it

received the UPDATE message from R3, signaling the change of the best-path in the RR. Notice that the update

does not have any path identifiers in the NLRI, as the RR has only a single best-path. Let’s see how much time it

actually took to run SPF, as compared to overall detection/failover process:

R5#show ip ospf statistics

OSPF Router with ID (10.0.5.5) (Process ID 100)

Area 0: SPF algorithm executed 15 times

Summary OSPF SPF statistic

SPF calculation time

Delta T

Intra D-Intra Summ

D-Summ Ext

D-Ext

Total

Reason

00:28:00

44

0

0

4

0

4

56
56

R

As you can see, the total SPF runtime was 56ms. Therefore, the remaining 20ms were spent on updating RIB and

triggering the next-hop change event. Of course, all these numbers have only relative meaning, as we are using

Dynamips for this simulation, but you may use similar methodology when validating real-world designs.

Considerations for Implementing BGP Add Paths

Even though the Add Paths feature is not yet implemented, it is worth considering the drawbacks of this approach.

One drawback is that the amount information needed to be sent and stored is now multiplied by the number of

additional paths. Previously, the most stressed routers in BGP AS were route reflectors, that had to carry the

largest BGP tables. With the Add-Path functionality, every non-RR speaker now receives all information that RR

stores in its BGP table. This puts extra requirement on the edge speakers and should be accounted when

planning to use this feature. Furthermore, additional paths will utilize extra memory on the forwarding engines, as

now PIC-enabled prefixes have multiple alternate paths. However, since the number of prefixes remains the same,

TCAM fast lookup memory resources will not be wasted, and thus only dynamic RAM is being affected the most.

You may read more about scalability/performance trade-offs in [17]

Summary

Achieving fast BGP convergence is not easy, because BGP is a complicated routing protocol running overlay on

top of an IGP process. We found out that tuning purely BGP-based convergence requires the following general

steps:

Tuning BGP TCP Transport and router ingress queues to achieve faster routing information propagation.BGP-based convergence requires the following general steps: Proper organization of outbound policies for achieving

Proper organization of outbound policies for achieving optimum update group construction.queues to achieve faster routing information propagation. Tuning BGP Advertisement Interval if needed to respond to

Tuning BGP Advertisement Interval if needed to respond to fast “Down->Up” conditionspolicies for achieving optimum update group construction. Activating BGP fast external fallover and possible BFD for

Activating BGP fast external fallover and possible BFD for fast external peering session deactivation.if needed to respond to fast “Down->Up” conditions As we noticed previously, pure-BGP based convergence is

As we noticed previously, pure-BGP based convergence is the only thing available for Inter-AS scenarios.

However, for fastest convergence inside a single AS, understanding and tuning BGP and IGP interaction can make

BGP converge almost as fast as the underlying IGP. This allows for fast recovery in response to intra-AS link and

node failures, as well as to edge link failures. Optimizing BGP and IGP interaction requires the following:

Tuning the underlying IGP for fast convergence. It is possible to tune the IGP even for large network to converge under one second.Optimizing BGP and IGP interaction requires the following: Enabling BGP Next-Hop Tracking process for all BGP

Enabling BGP Next-Hop Tracking process for all BGP speakers and tuning the BGP NHT delay in accordance with IGP response time.the IGP even for large network to converge under one second. Applying IGP summarization carefully to

Applying IGP summarization carefully to avoid hiding BGP NHT information.the BGP NHT delay in accordance with IGP response time. Leveraging IGP for propagation of external

Leveraging IGP for propagation of external peering link failures, in addition to relying on BGP peering session deactivation.summarization carefully to avoid hiding BGP NHT information. Using the Add-Path Functionality in critical BGP speakers

Using the Add-Path Functionality in critical BGP speakers (e.g. RRs) to allow for propagation of redundant paths if supported by implementation.in addition to relying on BGP peering session deactivation. Use BGP PIC or fast backup switchover

Use BGP PIC or fast backup switchover in the environments that allow for multiple paths to be propagated – e.g. multihomed MPLS VPN sites using different RD values.of redundant paths if supported by implementation. We’ve also briefly covered some caveats resulting from the

We’ve also briefly covered some caveats resulting from the future use of “Add-Path” functionality, such as

excessive usage of memory resources on router-processor and line-cards and extra toll on BGP best-path

process due to the growth of alternate paths. There are few things that were left out of the scope of this paper. We

didn’t not concentrate on the detailed mechanics of BGP fast peering session deactivation e.g. for multihop

sessions and we did not cover the MP-BGP specific features. Some MP-BGP extensions such as the additional

import scan interval and edge control plane interworking have their effects on end-to-end convergence, but this is

a topic for another discussion.

Further Reading

[10]BGP Design and Implementation by Randy Zhang

Appendix: Practical Scenario Baseline Configuration

The below are the initial configurations for the Dynamips topology used to validate BGP PIC behavior.

====R1:====

hostname R1

!

ip tcp synwait-time 5

no ip domain-lookup

no service timestamps

!

line con 0

logging synch

exec-timeout 0 0

privilege level 15

!

ip routing

!

interface Serial 1/0

ip address 20.0.17.1 255.255.255.0

no shut

!

interface Serial 1/2

no shut

ip address 10.0.12.1 255.255.255.0

!

interface Serial 1/1

no shut

ip address 10.0.13.1 255.255.255.0

!

interface Serial 1/3

ip address 10.0.14.1 255.255.255.0

!

interface Loopback0

ip address 10.0.1.1 255.255.255.255

!

router ospf 100

router-id 10.0.1.1

network 0.0.0.0 0.0.0.0 area 0

passive-interface Serial 1/0

!

router bgp 100

neighbor 10.0.3.3 remote-as 100

neighbor 10.0.3.3 update-source Loopback0

neighbor 10.0.4.4 remote-as 100

neighbor 10.0.4.4 update-source Loopback0

neighbor 20.0.17.7 remote-as 200

====R2:====

hostname R2

!

ip tcp synwait-time 5

no ip domain-lookup

no service timestamps

!

line con 0

logging synch

exec-timeout 0 0

privilege level 15

!

ip routing

!

interface Serial 1/0

ip address 20.0.28.2 255.255.255.0

no shut

!

interface Serial 1/2

no shut

ip address 10.0.12.2 255.255.255.0

!

interface Serial 1/1

no shut

ip address 10.0.24.2 255.255.255.0

!

interface Serial 1/3

no shut

ip address 10.0.23.2 255.255.255.0

!

interface Loopback0

ip address 10.0.2.2 255.255.255.255

!

router ospf 100

router-id 10.0.2.2

network 0.0.0.0 0.0.0.0 area 0

passive-interface Serial 1/0

!

router bgp 100

neighbor 10.0.3.3 remote-as 100

neighbor 10.0.3.3 update-source Loopback0

neighbor 10.0.4.4 remote-as 100

neighbor 10.0.4.4 update-source Loopback0

neighbor 20.0.28.8 remote-as 200

====R3:====

hostname R3

!

ip tcp synwait-time 5

no ip domain-lookup

no service timestamps

!

line con 0

logging synch

exec-timeout 0 0

privilege level 15

!

ip routing

!

interface Serial 1/0

no shut

ip address 10.0.13.3 255.255.255.0

!

interface Serial 1/1

no shut

ip address 10.0.35.3 255.255.255.0

!

interface Serial 1/2

no shut

ip address 10.0.34.3 255.255.255.0

!

interface Serial 1/3

no shut

ip address 10.0.23.3 255.255.255.0

!

interface Serial 1/4

no shut

ip address 10.0.36.3 255.255.255.0

!

router ospf 100

router-id 10.0.3.3

network 0.0.0.0 0.0.0.0 area 0

!

interface Loopback0

ip address 10.0.3.3 255.255.255.255

!

router bgp 100

neighbor IBGP peer-group

neighbor IBGP remote-as 100

neighbor IBGP update-source Loopback0

neighbor IBGP route-reflector-client

neighbor 10.0.1.1 peer-group IBGP

neighbor 10.0.2.2 peer-group IBGP

neighbor 10.0.5.5 peer-group IBGP

neighbor 10.0.6.6 peer-group IBGP

neighbor 10.0.4.4 remote-as 100

neighbor 10.0.4.4 update-source Loopback0

====R4:====

hostname R4

!

ip tcp synwait-time 5

no ip domain-lookup

no service timestamps

!

line con 0

logging synch

exec-timeout 0 0

privilege level 15

!

ip routing

!

interface Serial 1/0

no shut

ip address 10.0.24.4 255.255.255.0

!

interface Serial 1/1

no shut

ip address 10.0.46.4 255.255.255.0

!

interface Serial 1/2

no shut

ip address 10.0.34.4 255.255.255.0

!

interface Serial 1/3

no shut

ip address 10.0.14.4 255.255.255.0

!

interface Serial 1/4

no shut

ip address 10.0.45.4 255.255.255.0

!

router ospf 100

router-id 10.0.4.4

network 0.0.0.0 0.0.0.0 area 0

!

interface Loopback0

ip address 10.0.4.4 255.255.255.255

!

router bgp 100

neighbor IBGP peer-group

neighbor IBGP remote-as 100

neighbor IBGP update-source Loopback0

neighbor IBGP route-reflector-client

neighbor 10.0.1.1 peer-group IBGP

neighbor 10.0.2.2 peer-group IBGP

neighbor 10.0.5.5 peer-group IBGP

neighbor 10.0.6.6 peer-group IBGP

neighbor 10.0.3.3 remote-as 100

neighbor 10.0.3.3 update-source Loopback0

====R5:====

hostname R5

!

ip tcp synwait-time 5

no ip domain-lookup

no service timestamps

!

line con 0

logging synch

exec-timeout 0 0

privilege level 15

!

ip routing

!

interface Serial 1/0

no shut

ip address 10.0.35.5 255.255.255.0

!

interface Serial 1/1

no shut

ip address 10.0.56.5 255.255.255.0

!

interface Serial 1/2

no shut

ip address 10.0.45.5 255.255.255.0

!

router ospf 100

router-id 10.0.5.5

network 0.0.0.0 0.0.0.0 area 0

!

interface Loopback0

ip address 10.0.5.5 255.255.255.0

!

router bgp 100

neighbor 10.0.3.3 remote-as 100

neighbor 10.0.3.3 update-source Loopback0

neighbor 10.0.4.4 remote-as 100

neighbor 10.0.4.4 update-source Loopback0

====R6:====

hostname R6

!

ip tcp synwait-time 5

no ip domain-lookup

no service timestamps

!

line con 0

logging synch

exec-timeout 0 0

privilege level 15

!

ip routing

!

interface Serial 1/0

no shut

ip address 10.0.46.6 255.255.255.0

!

interface Serial 1/1

no shut

ip address 10.0.56.6 255.255.255.0

!

interface Serial 1/2

no shut

ip address 10.0.36.6 255.255.255.0

!

router ospf 100

router-id 10.0.6.6

network 0.0.0.0 0.0.0.0 area 0

!

interface Loopback0

ip address 10.0.6.6 255.255.255.0

!

router bgp 100

neighbor 10.0.3.3 remote-as 100

neighbor 10.0.3.3 update-source Loopback0

neighbor 10.0.4.4 remote-as 100

neighbor 10.0.4.4 update-source Loopback0

====R7:====

hostname R7

!

ip tcp synwait-time 5

no ip domain-lookup

no service timestamps

!

line con 0

logging synch

exec-timeout 0 0

privilege level 15

!

ip routing

!

interface Serial 1/0

no shut

ip address 20.0.17.7 255.255.255.0

!

interface Serial 1/1

no shut

ip address 20.0.78.7 255.255.255.0

!

interface Serial 1/2

no shut

ip address 20.0.79.7 255.255.255.0

!

interface Loopback0

ip address 20.0.7.7 255.255.255.0

!

router ospf 1

router-id 20.0.7.7

network 0.0.0.0 0.0.0.0 area 0

passive-interface Serial 1/0

!

router bgp 200

neighbor 20.0.17.1 remote-as 100

neighbor 20.0.9.9 remote-as 200

neighbor 20.0.9.9 update-source Loopback0

neighbor 20.0.8.8 remote-as 200

neighbor 20.0.8.8 update-source Loopback0

====R8:====

hostname R8

!

ip tcp synwait-time 5

no ip domain-lookup

no service timestamps

!

line con 0

logging synch

exec-timeout 0 0

privilege level 15

!

ip routing

!

interface Serial 1/0

no shut

ip address 20.0.28.8 255.255.255.0

!

interface Serial 1/1

no shut

ip address 20.0.78.8 255.255.255.0

!

interface Serial 1/2

no shut

ip address 20.0.89.8 255.255.255.0

!

interface Loopback0

ip address 20.0.8.8 255.255.255.0

!

router ospf 1

router-id 20.0.8.8

network 0.0.0.0 0.0.0.0 area 0

passive-interface Serial 1/0

!

router bgp 200

neighbor 20.0.28.2 remote-as 100

neighbor 20.0.9.9 remote-as 200

neighbor 20.0.9.9 update-source Loopback0

neighbor 20.0.7.7 remote-as 200

neighbor 20.0.7.7 update-source Loopback0

====R9:====

hostname R9

!

ip tcp synwait-time 5

no ip domain-lookup

no service timestamps

!

line con 0

logging synch

exec-timeout 0 0

privilege level 15

!

ip routing

!

interface Serial 1/0

no shut

ip address 20.0.79.9 255.255.255.0

!

interface Serial 1/1

no shut

ip address 20.0.89.9 255.255.255.0

!

interface Loopback0

ip address 20.0.9.9 255.255.255.0

!

interface Loopback100

ip address 20.0.99.99 255.255.255.0

!

router ospf 1

router-id 20.0.9.9

network 0.0.0.0 0.0.0.0 area 0

!

router bgp 200

neighbor 20.0.8.8 remote-as 200

neighbor 20.0.8.8 update-source Loopback0

neighbor 20.0.7.7 remote-as 200

neighbor 20.0.7.7 update-source Loopback0

network 20.0.99.0 mask 255.255.255.0

About Petr Lapukhov, 4xCCIE/CCDE: Petr Lapukhov's career in IT begain in 1988 w ith a

About Petr Lapukhov, 4xCCIE/CCDE:

Petr Lapukhov's career in IT begain in 1988 w ith a focus on computer programming, and progressed into netw orking w ith his first exposure to Novell NetWare in 1991. Initially involved w ith Kazan State University's campus netw ork support and UNIX system administration, he w ent through the path of becoming a netw orking consultant, taking part in many netw ork deployment projects. Petr currently has over 12 years of experience w orking in the Cisco netw orking field, and is the only person in the w orld to have obtained four CCIEs in under tw o years, passing each on his first attempt. Petr is an exceptional case in that he has been w orking w ith all of the technologies covered in his four CCIE tracks (R&S, Security, SP, and Voice) on a daily basis for many years. When not actively teaching classes, developing self-paced products, studying for the CCDE Practical & the CCIE Storage Lab Exam, and completing his PhD in Applied Mathematics.

You can leave a response, or trackback from your own site.

18 Responses to “Understanding BGP Convergence”

Maher El Zein, CCIE No. 21032 November 22, 2010 at 8:26 am

Maher El Zein, CCIE No. 21032

November 22, 2010 at 8:26 am

Really Amazing … Thank you Petr.

  November 22, 2010 at 4:55 pm
 

November 22, 2010 at 4:55 pm

john

Thank you for the great post; how long does it take you to put these posts together? You are very detailed and yet make it understandable…thanks (again)…and we appreciate your effort!

  November 23, 2010 at 4:12 am
 

November 23, 2010 at 4:12 am

Amit

Wow, Cisco must be acknowledged for bringing the “Add-Path” feature to the IOS already while it is still in the draft phase.

Of course, thank u Petr for such a detailed post.

  November 23, 2010 at 4:45 am
 

November 23, 2010 at 4:45 am

Amit

BTW, how did you test it on dynamips? IOS 15.0(1)M4 doesn’t support the Add-Path feature on 7200 routers.

 
  Petr Lapukhov, 4xCCIE/CCDE November 23, 2010 at 9:13 am

November 23, 2010 at 9:13 am

@Amit,

I was using the 12.2(33)SRE train for testing

I was using the 12.2(33)SRE train for testing

The Add Paths capability is not in 15.x train yet as far as I know.

  November 23, 2010 at 6:19 pm
 

November 23, 2010 at 6:19 pm

Joe

Great post, what system on you using to run dynamips?

 
what system on you using to run dynamips?   Reply Petr Lapukhov, 4xCCIE/CCDE @Everyone, November 23,

@Everyone,

November 23, 2010 at 8:06 pm

My apologies for creating some confusion here. It appears at present moment Cisco IOS 12.2(33)SRE DOES NOT support the “Add Paths” feature, but only the fast BGP PIC. My scenario backup behavior was based on the fact that R3 and R4 (the RRs) were advertising different best-paths, for their closest exit points respectively. You still need to use MPLS VPN RDs to allow for multiple- path propagation in MPLS VPN environment.

  November 24, 2010 at 5:58 pm
 

November 24, 2010 at 5:58 pm

Dan Kirkland, 3xCCIE

Petr,

Another great post!

 

(FYI, looks like there’s a small typo in the link for reference #16.)

a small typo in the link for reference #16.) Reply Mustard Bayramov greate post. btw add-path

Mustard Bayramov

greate post. btw add-path is support in 4.0 XR

December 19, 2010 at 6:06 pm

January 27, 2011 at 12:54 pm

] [

Acessado em: [

Lapukhov, Understanding BGP Convergence. Disponível em: http://blog.ine.com/2010/11/22/understanding-bgp-convergence/.

]

. ] Reply Faraz May 20, 2011 at 10:47 am So just to clarify, BGP PIC

Faraz

May 20, 2011 at 10:47 am

So just to clarify, BGP PIC is a feature thats allows a router to install backup routes for a prefix for fast re-route capability.

Add Path is a feature that allows a peer to send/accept multiple paths for the same prefix.

  August 1, 2011 at 2:44 am
 

August 1, 2011 at 2:44 am

yognshunz

thanks for your great job.

 
Abdul Kaleem January 17, 2012 at 11:32 pm

Abdul Kaleem

January 17, 2012 at 11:32 pm

Thanks good document

 
  February 4, 2012 at 3:08 am
 

February 4, 2012 at 3:08 am

saurabh

Very detailed n informative post

thanks

Petr

  April 1, 2012 at 11:15 am
 

April 1, 2012 at 11:15 am

Mohit Bhalla

how did a ibgp able to understand or what kind of logical operation occur to stop sending the routes it learned from one ebgp to other ibgps

  July 24, 2012 at 11:10 am
 

July 24, 2012 at 11:10 am

fornarina

Incredible !!. What a great paper !!. You are the best. Thanks Petr.

 
paper !!. You are the best. Thanks Petr.   Reply Fulvio Allegretti September 19, 2013 at

Fulvio Allegretti

September 19, 2013 at 7:00 am

Great post, thanks. (Fig 6 is not completely right, the routers in as 200 should be R7 and R8, not 5 and 6 again)

February 7, 2014 at 11:36 am

Server, IDR Working Group draft Internet Exchange Route Server Operations, GROW Working Group draft Understanding BGP

] [

Convergence, Posted by Petr Lapukhov Share this:TwitterFacebookLike this:Like [

]

Leave a Reply

this:Like [ ] Reply Leave a Reply Name (required) Mail (will not be published) (required)
this:Like [ ] Reply Leave a Reply Name (required) Mail (will not be published) (required)

Name (required)

Mail (will not be published) (required)

this:Like [ ] Reply Leave a Reply Name (required) Mail (will not be published) (required) Submit

Submit Comment

© 2011 INE, Inc., All Rights Reserved

© 2011 INE, Inc., All Rights Reserved
© 2011 INE, Inc., All Rights Reserved
Mail (will not be published) (required) Submit Comment twitter.com/ine © 2011 INE, Inc., All Rights Reserved
Mail (will not be published) (required) Submit Comment twitter.com/ine © 2011 INE, Inc., All Rights Reserved
Mail (will not be published) (required) Submit Comment twitter.com/ine © 2011 INE, Inc., All Rights Reserved
Mail (will not be published) (required) Submit Comment twitter.com/ine © 2011 INE, Inc., All Rights Reserved