BRKRST-3320
Cisco Public
Introduction
Housekeeping
Cell Phones
Who am I?
Who are you?
Enterprise
Service Provider
Advanced Class
Introduction
Operating Systems
IOS vs. IOS-XR vs. NX-OS
Troubleshooting concepts are the same
Some variation in show command syntax and output
Will use all three in this presentation
Introduction
Agenda
Generic Troubleshooting Advice
Troubleshooting Peers
Bestpath Algorithm
Table Version
Initial Convergence
Periodic Convergence
High Utilization
Layer 3 VPNs
Looking Glasses
100k routes flapping?
Pick one route and focus on that one route
Forces you to talk through the problem
Different set of eyes may spot something
bgp log-neighbor-changes
Always configure this
Generates a syslog message when a peer goes up or down
OSPF, ISIS, and EIGRP all have log-neighbor-changes too
Cacti is a handy tool for polling and graphing data from various network devices
http://www.cacti.net/
IOS-XR
monitor session 1 source interface Te2/4 rx
Only supported on ASR-9000
Use ACLs to control what packets to SPAN
RSPAN
RSPAN has all the features of SPAN, plus support for source ports and destination ports that are distributed across multiple switches, allowing one to monitor any destination port located on the RSPAN VLAN. Hence, one can monitor the traffic on one switch using a device on another switch.
Very handy if a dedicated sniffer is not available
Available on IOS and NX-OS
Define which interface and direction to capture
Associate the buffer with the capture
Start/Stop the capture
You probably know this already but Wireshark is your best friend
It is free
You can get it here
http://www.wireshark.org/
Can do complex filters
If the filter is red, your syntax is busted
If the filter is green, your syntax is correct
Wireshark does a LOT
Enough for someone to write an 800 page book on how to use it
ISBN-13: 978-1893939998
service timestamps debug datetime msec localtime
service timestamps log datetime msec localtime
brain1(config)#access-list 100 permit ip host 1.1.1.1 host 2.2.2.2
brain1#reload in 10
brain1#debug ip packet 100
IP packet debugging is on for access list 100
Run your debug
A finite number of the most recent events are stored
Use show commands later to:
 Display an event in a debug-like format
 Merge events from various protocols
http://tinyurl.com/cisco-event-tracer
2012 Cisco and/or its affiliates. All rights reserved.
brain1(config)#monitor event-trace ?
  adjacency    Adjacency Events
  all-traces   Configure merged event traces
  atom         AToM Event Trace
  cef          CEF traces
[snip]
brain1(config)#monitor event-trace adjacency enable
brain1(config)#end
Don't be the person who has to drive 3 hours to console into a box
If you don't have out-of-band access for every router and/or switch in your network... get it... please
Troubleshooting Peers
Failed Peering
Configurations
Check
 AS numbers
 IP addresses for TCP
 eBGP multihop?

R1
interface Loop0
 ip address 1.1.1.1/32
!
router bgp 100
 neighbor 2.2.2.2 remote-as 100
 neighbor 2.2.2.2 update-source Loop0

R2
interface Loop0
 ip address 2.2.2.2/32
!
router bgp 100
 neighbor 1.1.1.1 remote-as 100
 neighbor 1.1.1.1 update-source Loop0

R1#sh tcp brief all
TCB       Local Address    (state)
64328548  *.179            LISTEN
R1#
Failed Peering
Connectivity
Check
 Extended ping between BGP peering addresses

R1
interface Loop0
 ip address 1.1.1.1/32
!
router bgp 100
 neighbor 2.2.2.2 remote-as 100
 neighbor 2.2.2.2 update-source Loop0

R2
interface Loop0
 ip address 2.2.2.2/32
!
router bgp 100
 neighbor 1.1.1.1 remote-as 100
 neighbor 1.1.1.1 update-source Loop0

R1#ping 2.2.2.2 source Loop0
Sending 5, 100-byte ICMP Echos to 2.2.2.2
Packet sent with a source address of 1.1.1.1
.....
Success rate is 0 percent (0/5)
R1#
Failed Peering
Connectivity
BGP runs on top of IP and can be affected by many things
No connectivity?
 IGP issues
 Access lists
 Rate limiting
TCP problems
 MTU issues: extended ping and sweep address ranges, DF bit, etc.
Failed Peering
Notifications
Failed Peering
Notifications
%BGP-3-NOTIFICATION: sent to neighbor 2.2.2.2 2/2 (peer in wrong AS) 2 bytes 00C8
FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF 002D 0104 00C8 00B4 0202 0202 1002 0601 0400 0100 0102 0280 0002 0202 00

Value  Name                        Reference
1      Message Header Error        RFC 4271
2      OPEN Message Error          RFC 4271
3      UPDATE Message Error        RFC 4271
4      Hold Timer Expired          RFC 4271
5      Finite State Machine Error  RFC 4271
6      Cease                       RFC 4271
Failed Peering
Notifications
Subcode #  Subcode Name                    Subcode Description
1          Unsupported BGP version         The version of BGP the peer is running isn't compatible with the local version of BGP
2          Bad Peer AS                     The AS this peer is locally configured for doesn't match the AS the peer is advertising
3          Bad BGP Identifier              The BGP router ID is the same as the local BGP router ID
4          Unsupported Optional Parameter  There is an option in the packet which the local BGP speaker doesn't recognize
6          Unacceptable Hold Time          The remote BGP peer has requested a BGP hold time which is not allowed (too low)
7          Unsupported Capability          The peer has asked for support for a feature which the local router does not support

OPEN Message subcodes are shown above
The second 2 in 2/2 is the Error Subcode... so Bad Peer AS
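The code/subcode lookup described above can be sketched in a few lines of Python. This is only an illustration: the function name `describe_notification` is made up for this example, and the dictionaries contain just the values listed on these slides.

```python
# Error codes and OPEN Message subcodes exactly as listed on the slides (RFC 4271).
ERROR_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}
OPEN_SUBCODES = {
    1: "Unsupported BGP version",
    2: "Bad Peer AS",
    3: "Bad BGP Identifier",
    4: "Unsupported Optional Parameter",
    6: "Unacceptable Hold Time",
    7: "Unsupported Capability",
}


def describe_notification(pair: str) -> str:
    """Translate a 'code/subcode' string from the syslog line into names."""
    code_text, subcode_text = pair.split("/")
    code, subcode = int(code_text), int(subcode_text)
    name = ERROR_CODES.get(code, "unknown error code")
    if code == 2:
        # Only the OPEN Message subcodes are tabulated in this deck.
        return f"{name}: {OPEN_SUBCODES.get(subcode, 'unknown subcode')}"
    return f"{name} (subcode {subcode})"


print(describe_notification("2/2"))
print(describe_notification("4/0"))
```

Subcodes for the other error codes are not covered in this deck, so the sketch simply echoes the number for them.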
Failed Peering
Notifications
R2# show log | include NOTIFICATION
%BGP-3-NOTIFICATION: sent to neighbor 10.1.2.1 2/2 (peer in wrong AS) 2 bytes 0064
FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF 002D 0104 0064 00B4 0101 0101 1002 0601 0400 0100 0102 0280 0002 0202 00
x0064 = data of NOTIFICATION
x0064 = decimal 100
R1 AS 100 (10.1.2.1)
R2 AS 200 (10.1.2.2)
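The hex dump in R2's log is the OPEN message that triggered the NOTIFICATION, so it can be decoded by hand or with a few lines of code. The sketch below assumes the standard RFC 4271 layout (16-byte marker, 2-byte length, 1-byte type, then version, AS, hold time and BGP identifier); the function name `parse_open` is hypothetical and the optional parameters are left undecoded.

```python
import struct

# Hex dump copied from the R2 log line above.
HEX_DUMP = (
    "FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF 002D 0104 0064 00B4 0101 0101 "
    "1002 0601 0400 0100 0102 0280 0002 0202 00"
)


def parse_open(hex_dump: str) -> dict:
    """Decode the fixed part of a BGP OPEN message from a spaced hex dump."""
    raw = bytes.fromhex("".join(hex_dump.split()))
    length, msg_type = struct.unpack("!HB", raw[16:19])
    if raw[:16] != b"\xff" * 16 or msg_type != 1:
        raise ValueError("not a BGP OPEN message")
    version, my_as, hold_time = struct.unpack("!BHH", raw[19:24])
    router_id = ".".join(str(octet) for octet in raw[24:28])
    return {
        "length": length,
        "version": version,
        "my_as": my_as,
        "hold_time": hold_time,
        "router_id": router_id,
    }


print(parse_open(HEX_DUMP))
```

The decoded AS of 100 is the same value as the 2 bytes of NOTIFICATION data (x0064), which is how R2 reports the AS that the peer actually advertised.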
Failed Peering
Notifications
Question: What did R1 see?
R1#sh log | include NOTIFICATION
%BGP-3-NOTIFICATION: received from neighbor 10.1.2.2 2/2 (peer in wrong AS) 2 bytes 0064

R1 AS 100 (10.1.2.1)
router bgp 100
 no synchronization
 bgp log-neighbor-changes
 neighbor 10.1.2.2 remote-as 200
 no auto-summary

R2 AS 200 (10.1.2.2)
router bgp 200
 no synchronization
 bgp log-neighbor-changes
 neighbor 10.1.2.1 remote-as 10
 no auto-summary
Failed Peering
Decoding Hex
What if a peer sends you a message that causes us to send a NOTIFICATION?
show ip bgp neighbor 1.1.1.1 | begin Last reset
Last reset 5d12h, due to BGP Notification sent, invalid or corrupt AS path
Message received that caused BGP to send a Notification:
FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
005C0200 00004140 01010040 0206065D
1CFC059F 400304D5 8C20F480 04040000
05054005 04000000 55C0081C 329C4844
329C6E28 329C6E29 58F50082 58F5EACE
58F5FA02 58F5FA6E 18D14E70
Failed Peering
Decoding Hex
You don't like reading hex?
Nice write-up here on converting hex output to a Wireshark .pcap file
http://tinyurl.com/bgp-hex-decode http://ccie-in-3-months.blogspot.com/2010/08/decoding-ripe-experiment.html
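As a rough illustration of the first step in such a conversion, the Python sketch below turns the "Message received" hex words from the previous slide into raw bytes, sanity-checks the BGP header, and prints an offset-prefixed byte dump. The helper names are made up for this example. The offset/byte layout is the general style that Wireshark's text2pcap tool reads; the exact options needed to wrap the message in dummy TCP/IP headers should be checked against your installed version.

```python
# Hex words copied from the "Message received that caused BGP to send a
# Notification" output shown earlier.
CISCO_HEX = """
FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 005C0200 00004140 01010040 0206065D
1CFC059F 400304D5 8C20F480 04040000 05054005 04000000 55C0081C 329C4844
329C6E28 329C6E29 58F50082 58F5EACE 58F5FA02 58F5FA6E 18D14E70
"""

MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}


def to_bytes(cisco_hex: str) -> bytes:
    """Strip all whitespace and convert the hex words to raw bytes."""
    return bytes.fromhex("".join(cisco_hex.split()))


def to_offset_dump(raw: bytes, width: int = 16) -> str:
    """Format bytes as 'offset byte byte ...' lines, one per `width` bytes."""
    lines = []
    for offset in range(0, len(raw), width):
        chunk = raw[offset:offset + width]
        lines.append(f"{offset:06x} " + " ".join(f"{b:02x}" for b in chunk))
    return "\n".join(lines)


raw = to_bytes(CISCO_HEX)
declared_length = int.from_bytes(raw[16:18], "big")   # length field after the marker
message_type = MESSAGE_TYPES.get(raw[18], "unknown")  # type field
print(f"{len(raw)} bytes captured, header says {declared_length}, type {message_type}")
print(to_offset_dump(raw))
```

If the captured byte count and the declared length disagree, the log output was probably truncated and a decode of the attributes should not be trusted.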
Failed Peering
Decoding Hex
Troubleshooting Peers
eBGP TTL
For eBGP peers that are more than 1 hop away a larger TTL must be used
neighbor x.x.x.x ebgp-multihop [2-255]
No longer verifies if NEXTHOP is directly connected
Configured TTL
AS65000 R1
Troubleshooting Peers
eBGP TTL
Loopback peering to directly connected eBGP peer
Typically used to load-balance over multiple links
Two options for configuring this
Use ebgp-multihop
 Change the TTL to 2
R1
R2
Troubleshooting Peers
eBGP TTL
Use disable-connected-check
R1
R2
Failed Peering
Notifications Hold Time Expired
R1 R2
NOTIFICATION
%BGP-5-ADJCHANGE: neighbor 2.2.2.2 Down BGP Notification sent
%BGP-3-NOTIFICATION: sent to neighbor 2.2.2.2 4/0 (hold time expired)
R1#show ip bgp neighbor 2.2.2.2 | include Last reset
Last reset 00:01:02, due to BGP Notification sent, hold time expired
Failed Peering
Notifications Hold Time Expired
First figure out if R2 is building keepalives
When did R2 last build a BGP message for R1? It should be within "keepalive interval" seconds.
 Last read 00:00:15, last write 00:00:44, hold time is 180, keepalive interval is 60 seconds
Output drops on the outbound interface towards R1?
Is R2 out of memory or CPU?
MsgSent is the number of packets TCP has removed from OutQ and transmitted for a peer
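The "last write should be within the keepalive interval" check is easy to automate when you are watching many peers. The Python sketch below parses the timer line quoted above; the function names are hypothetical, and the regular expression assumes the line looks exactly like the sample.

```python
import re


def hms_to_seconds(text: str) -> int:
    """Convert an hh:mm:ss string to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def keepalive_overdue(line: str) -> bool:
    """True if the last write is older than the keepalive interval."""
    match = re.search(
        r"last write (\d+:\d+:\d+).*keepalive interval is (\d+) seconds", line
    )
    if match is None:
        raise ValueError("line does not look like the BGP neighbor timer line")
    last_write = hms_to_seconds(match.group(1))
    keepalive = int(match.group(2))
    return last_write > keepalive


SAMPLE = ("Last read 00:00:15, last write 00:00:44, hold time is 180, "
          "keepalive interval is 60 seconds")
print(keepalive_overdue(SAMPLE))
```

A result of True only says that R2 has not built a message recently; the OutQ and interface checks are still needed to find out why.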
Failed Peering
Notifications Hold Time Expired
R2#show ip bgp sum | begin Neighbor
Neighbor  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd
1.1.1.1   53       284      10167   0    97    00:01:20  0

R2#show ip bgp sum | begin Neighbor
Neighbor  MsgRcvd  MsgSent  TblVer
1.1.1.1   53       284      10167

OutQ is incrementing due to keepalive generation
MsgSent is not incrementing
Something is stuck on the OutQ
The keepalives are not leaving R2!!
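The same comparison of two summary snapshots can be written as a small check. This Python sketch is illustrative only: the class and function names are invented, and the second snapshot's OutQ value of 98 is a hypothetical number, since the slide output does not show it.

```python
from dataclasses import dataclass


@dataclass
class PeerCounters:
    msg_sent: int  # MsgSent column from the BGP summary
    out_q: int     # OutQ column from the BGP summary


def keepalives_stuck(first: PeerCounters, second: PeerCounters) -> bool:
    """OutQ growing while MsgSent stays flat means nothing is leaving the box."""
    return second.out_q > first.out_q and second.msg_sent == first.msg_sent


before = PeerCounters(msg_sent=284, out_q=97)   # values from the first snapshot
later = PeerCounters(msg_sent=284, out_q=98)    # hypothetical second snapshot
print(keepalives_stuck(before, later))
```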
Failed Peering
Notifications Hold Time Expired
Do R1 and R2 still have IP connectivity?
Ping using peering addresses (loopback to loopback)
Ping with mss (max-segment-size) with df-bit set
 536 bytes by default
Datagrams (max data segment is 1460 bytes):
R1# ping 2.2.2.2 source loop0 size 1500 df-bit
Failed Peering
Notifications Hold Time Expired
MSS ping
BGP OPENs and Keepalives are small
UPDATEs can be much larger

R1#ping 2.2.2.2 source loop0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 16/21/24 ms
R1#ping 2.2.2.2 source loop0 size 1500 df-bit
Type escape sequence to abort.
Sending 5, 1500-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
Packet sent with the DF bit set
.....
Success rate is 0 percent (0/5)
Failed Peering
Notifications Hold Time Expired
Input drops on R1
Bestpath Algorithm
Best Path
Algorithm
Quick bestpath review
Remember BGP only advertises one path per prefix: the bestpath
Cannot advertise path from one iBGP peer to another
Bestpath selection process is a little lengthy
First eliminate paths that are ineligible for bestpath

#  Ineligible path       Notes
1  Not synchronized      Only happens if sync is configured AND the route isn't in your IGP
2  Inaccessible NEXTHOP  IGP does not have a route to the BGP NEXTHOP
3  Received-only paths   Happens if soft-reconfig inbound is applied. A path will be received-only if it was denied/modified by inbound policy.
Best Path
Algorithm
#   Step                   Rule           Notes
1   Weight                 Highest wins   Scope is router only
2   LOCAL_PREFERENCE       Highest wins   Scope is AS only
3   Locally Originated                    Redistribution or network statement favored over aggregate-address
4   AS_PATH                Shortest wins  Skipped if "bgp bestpath as-path ignore" configured; AS_SET counts as 1; CONFED parts do not count
5   ORIGIN                 Lowest wins    IGP < EGP < Incomplete
6   MED                    Lowest wins    MEDs are compared only if the first AS in the AS_SEQUENCE is the same
7   eBGP over iBGP
8   Metric to Next Hop     Lowest wins    IGP cost to the BGP NEXTHOP
9   Multiple Paths in RIB                 Flag path as multipath if max-paths is configured
10  Oldest External Wins                  Unless "bgp bestpath compare-routerid" configured
11  BGP Router ID          Lowest wins
12  CLUSTER_LIST           Shortest wins
13  Neighbor Address       Lowest wins
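To make the ordering of the steps concrete, here is a deliberately simplified Python sketch of a pairwise bestpath comparison. It is not how IOS implements the algorithm: the `Path` fields and function names are invented for illustration, and steps 9, 10 and 12 (multipath, oldest external, CLUSTER_LIST) as well as AS_SET and confederation handling are left out.

```python
from dataclasses import dataclass
from functools import reduce
from ipaddress import ip_address

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass
class Path:
    neighbor: str
    router_id: str
    as_path: tuple = ()
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    origin: str = "igp"
    med: int = 0
    ebgp: bool = True
    igp_metric: int = 0


def better(a: Path, b: Path) -> Path:
    """Return the preferred path, walking the steps in order until one differs."""
    checks = [
        (a.weight, b.weight, True),                              # 1: highest wins
        (a.local_pref, b.local_pref, True),                      # 2: highest wins
        (a.locally_originated, b.locally_originated, True),      # 3: local preferred
        (len(a.as_path), len(b.as_path), False),                 # 4: shortest wins
        (ORIGIN_RANK[a.origin], ORIGIN_RANK[b.origin], False),   # 5: lowest wins
    ]
    if a.as_path[:1] == b.as_path[:1]:                           # 6: same first AS only
        checks.append((a.med, b.med, False))
    checks += [
        (a.ebgp, b.ebgp, True),                                  # 7: eBGP over iBGP
        (a.igp_metric, b.igp_metric, False),                     # 8: lowest IGP cost
        (int(ip_address(a.router_id)), int(ip_address(b.router_id)), False),  # 11
        (int(ip_address(a.neighbor)), int(ip_address(b.neighbor)), False),    # 13
    ]
    for left, right, highest_wins in checks:
        if left != right:
            a_wins = left > right if highest_wins else left < right
            return a if a_wins else b
    return a


def bestpath(paths):
    """Fold the pairwise comparison over all candidate paths."""
    return reduce(better, paths)


p1 = Path(neighbor="10.1.1.1", router_id="1.1.1.1", as_path=(65001, 65010))
p2 = Path(neighbor="10.1.1.2", router_id="2.2.2.2", as_path=(65002,))
p3 = Path(neighbor="10.1.1.3", router_id="3.3.3.3",
          as_path=(65003, 65010, 65020), local_pref=200)
print(bestpath([p1, p2, p3]).neighbor)
```

The pairwise form is used instead of a single sort key because the MED step only applies when both paths come from the same neighboring AS.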
Best Path
Algorithm
show ip bgp x.x.x.x bestpath
Will show you only the bestpath for x.x.x.x
show ip bgp x.x.x.x multipaths
Same concept but will show you all of the multipaths for x.x.x.x
Best Path
Algorithm
IOS-XR has
Each prefix has a 32-bit number that is its table version
A prefix's table version is bumped for every bestpath change
"Bumped" means the table version changes from the current version to the next available version number.
Assume 10.0.0.0/8 has a table version of #27 and the highest table version used by any prefix is #30.
If 10.0.0.0/8 has a bestpath change, its table version will be bumped to #31.
If peer 1.1.1.1 has a table version of #60, this tells us we have informed 1.1.1.1 of all bestpath changes for prefixes with a table version of <= #60
If any prefix has a table version > #60 then we need to inform 1.1.1.1 of that prefix's bestpath
Once 1.1.1.1 has been updated, its table version will be updated accordingly
Same concept for the RIB and its table version
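The table version bookkeeping can be modelled in a few lines. The Python sketch below is a toy model, not router code: the class and method names are invented, and the two extra prefixes are hypothetical filler so that the highest version in use is #30 as in the example on the earlier slide.

```python
class BgpTable:
    """Toy model of per-prefix table versions."""

    def __init__(self):
        self.prefix_version = {}  # prefix -> table version
        self.highest = 0          # highest version handed out so far

    def bump(self, prefix):
        """Record a bestpath change: the prefix takes the next free version."""
        self.highest += 1
        self.prefix_version[prefix] = self.highest
        return self.highest

    def pending_for(self, peer_version):
        """Prefixes whose bestpath the peer (or the RIB) has not been told about."""
        return sorted(p for p, v in self.prefix_version.items() if v > peer_version)


table = BgpTable()
# 10.0.0.0/8 is at #27 and the highest version in use is #30.
table.prefix_version = {"10.0.0.0/8": 27, "172.16.0.0/12": 29, "192.168.0.0/16": 30}
table.highest = 30

new_version = table.bump("10.0.0.0/8")
print(new_version)
print(table.pending_for(peer_version=30))
```

A peer whose table version equals `table.highest` has nothing pending, which is the same condition used later to decide whether BGP has converged for that peer.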
Highest table version of any prefix = main routing table version
RIB is converged
1.1.1.1 is converged
Do RIB adds, deletes, and/or modifies
When complete, set the RIB table version to #15
Build updates and/or withdraws for each peer
When complete, set our peers' table versions to #15
Gives you a way to know who has been informed about what
Provides a way to tell how many bestpath changes your network is experiencing
You should monitor the table version in your network to determine what is normal for you
If the table version is increasing rapidly then that could explain why BGP Router and BGP IO are busy
You have 150k routes and see the table version increase by 150k every minute... something is wrong!!
You have 150k routes and see the table version increase by 300 every minute... sounds like normal network churn
Initial Convergence
BGP Convergence
Hey... who are you calling slow?
Two general convergence situations
 Initial startup
 Periodic route changes
Convergence
Initial Startup
A router boots
How long initial convergence takes is a factor of the amount of work to be done and the router/network's ability to do this quickly and efficiently
Convergence
Initial Startup
Initial convergence can be stressful... if you are approaching BGP scalability limits this is when you will see issues.
Convergence
Initial Startup
2) Calculate bestpaths
This is easy
Convergence
Key Variables
BGP Variables
 The number of routes
 The number of peers
 The ability to advertise routes to each update-group efficiently
Router Variables
 CPU horsepower
 Code version
 Outbound Interface Bandwidth
Convergence
UPDATE Packing
An UPDATE contains a set of Attributes and a list of prefixes (NLRI)
BGP starts an UPDATE by building an attribute set
BGP then packs as many destinations (NLRIs) as it can into the UPDATE
 NLRI = Network Layer Reachability Information
 Only NLRI with a matching attribute set can be placed in the UPDATE
 NLRI are added to the UPDATE until it is full (4096 bytes max)
UPDATE Packing refers to how efficiently an implementation packs NLRIs into UPDATEs
 Least efficient: BGP only puts one NLRI per UPDATE
 Most efficient: BGP puts all NLRI with a certain Attribute set in one UPDATE
[Figure: UPDATE packing example. Attribute set: MED 50, Origin IGP. NLRI: 10.1.1.0/24, 10.1.2.0/24]
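The difference between the least and most efficient packing can be shown with a small grouping exercise. In this Python sketch each distinct attribute set stands for one UPDATE; the function name is invented, the third prefix is a hypothetical addition to the two prefixes in the slide example, and the 4096-byte limit is ignored for simplicity.

```python
from collections import defaultdict


def pack_updates(routes):
    """Group prefixes by attribute set; each group models one UPDATE."""
    updates = defaultdict(list)
    for prefix, attributes in routes:
        updates[frozenset(attributes.items())].append(prefix)
    return dict(updates)


routes = [
    ("10.1.1.0/24", {"origin": "IGP", "med": 50}),
    ("10.1.2.0/24", {"origin": "IGP", "med": 50}),
    ("10.1.3.0/24", {"origin": "IGP", "med": 100}),  # hypothetical third prefix
]

packed = pack_updates(routes)
print(len(routes), "UPDATEs when sending one NLRI per UPDATE")
print(len(packed), "UPDATEs when packing by attribute set")
```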
Convergence
UPDATE Packing
The fewer attribute sets you have the better
 More NLRI will share an attribute set
 Fewer UPDATEs to converge
next-hop-self for all iBGP sessions
show ip bgp summary
 190844 network entries using 21565372 bytes of memory
 302705 path entries using 15740660 bytes of memory
 57469/31045 BGP path/bestpath attribute entries using 6206652 bytes of memory
Convergence
TCP MSS Max Segment Size
TCP MSS (max segment size) is also a factor in convergence times.
The larger the MSS, the fewer TCP packets it takes to transport the BGP updates.
Fewer packets means less overhead and faster convergence.
[Figure: Default MSS: the BGP UPDATE (Attribute plus NLRIs) is split into two TCP packets, each with its own IP header. Increased MSS: the same UPDATE fits in one packet (IP Header, TCP Header, Attribute, NLRIs)]
Convergence
TCP MSS Max Segment Size
MSS (Max Segment Size)
 Limit on packet size for a TCP socket
 536 bytes by default
Path MTU discovery
 Finds smallest MTU between R1 and R2
 Subtract 40 bytes for TCP/IP overhead
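The arithmetic behind those numbers is straightforward, and a short Python sketch makes the effect on packet counts visible. The function names and the list of path MTUs are assumptions for illustration; the 40-byte overhead, the 536-byte default and the 4096-byte UPDATE limit are the figures quoted in this deck.

```python
import math

TCP_IP_OVERHEAD = 40  # bytes subtracted from the MTU, as stated on the slide
DEFAULT_MSS = 536     # default MSS quoted on the slide
MAX_UPDATE = 4096     # maximum UPDATE size quoted earlier


def mss_from_path(mtus):
    """MSS is the smallest MTU on the path minus the TCP/IP overhead."""
    return min(mtus) - TCP_IP_OVERHEAD


def segments_needed(payload_bytes, mss):
    """Number of TCP segments needed to carry the payload."""
    return math.ceil(payload_bytes / mss)


path_mtus = [1500, 1500, 1500]  # hypothetical path between R1 and R2
mss = mss_from_path(path_mtus)
print(mss)
print(segments_needed(MAX_UPDATE, DEFAULT_MSS))
print(segments_needed(MAX_UPDATE, mss))
```

With a 1500-byte path the MSS works out to the 1460 bytes mentioned on the MSS ping slide, and a full-size UPDATE needs far fewer segments than with the default.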
Convergence
Update Groups
BGP must create updates based on the policies towards each peer
Peers with a common outbound policy are members of the same update-group
 Outbound route-map, prefix-lists, etc.
 iBGP vs. eBGP
UPDATEs are generated for one member of an update-group and then replicated to the other members
Less efficient: two peers in different update-groups (the Attribute and NLRI are built once per peer)
More efficient: two peers in the same update-group (the Attribute and NLRI are built once and replicated)
Convergence
Dropping TCP Acks
RR sends out tons of UPDATEs to RRCs
RRCs send TCP ACKs
RR core facing interface(s) receive huge wave of TCP ACKs
[Figure: the RR sends BGP UPDATEs to the RRCs, and the RRCs send TCP ACKs back to the RR]
Convergence
Dropping TCP Acks
Each time a TCP packet is dropped, the session goes into slow start
It takes a good deal of time for a TCP session to come out of slow start
Increase the interface input queue: hold-queue 1000 in
Convergence
How do You Know if BGP has Converged?
Watch the global table version
 Increases by 1 for every bestpath change
In the lab:
 Table version stabilizes
 Wait for all InQ and OutQs to be empty (0)
show ip bgp summ
If peer table version == global table version and InQ/OutQ empty, BGP has converged that peer
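That per-peer rule translates directly into a predicate. The Python sketch below is illustrative: the `PeerState` class and function name are made up, and the sample values reuse the table version and OutQ numbers from the summary output shown earlier.

```python
from dataclasses import dataclass


@dataclass
class PeerState:
    table_version: int
    in_q: int
    out_q: int


def peer_converged(global_version: int, peer: PeerState) -> bool:
    """Converged when the peer has caught up and both queues are empty."""
    return (
        peer.table_version == global_version
        and peer.in_q == 0
        and peer.out_q == 0
    )


print(peer_converged(10167, PeerState(table_version=10167, in_q=0, out_q=0)))
print(peer_converged(10167, PeerState(table_version=10167, in_q=0, out_q=97)))
```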
Convergence
Initial Convergence Summary
Initial convergence time is a factor of the amount of work that needs to be done and the router/network's ability to do this quickly and efficiently
Reduce the number of attribute sets in BGP
 Use next-hop-self, don't send communities you don't need, etc.
Reduce the number of unique outbound policies towards all peers
 The fewer update-groups the better
 Try to find a small set of common policies, rather than individualizing policies per peer
Efficient packaging of BGP messages in TCP
 MSS/PMTU
Increase interface input queues on RRs
Periodic Convergence
Convergence
Route Changes
There are 2 elements to route change convergence for BGP
Failure Detection
 How long does it take to see the failure? (t0 to t1)
Convergence
 How long does it take to process and propagate information about the failure? (t1 to t2)
[Figure: timeline from Failure (t0), through detection (t1) and process/propagate, to Recovery (t2)]
Convergence
Route Changes
Convergence
Address Tracking Filter
ATF is a middle man between the RIB and RIB clients
A client tells ATF which prefixes it is interested in
ATF tracks each prefix
 Notifies the client when the route to a registered prefix changes
Provides a scalable, event-driven model for dealing with RIB changes
Convergence
Nexthop Tracking
BGP nexthop tracking
Event driven convergence model
Relies on ATF
[Figure: BGP registers nexthops 10.1.1.3 and 10.1.1.5 with ATF]
ATF filters out changes for 10.1.1.1/32, 10.1.1.2/32, and 10.1.1.4/32
Changes to 10.1.1.3/32 and 10.1.1.5/32 are passed along to BGP
No need to wait for BGP Scanner
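The register-and-notify pattern behind ATF can be sketched as a simple callback registry. This Python example is a conceptual model only: the class, method names and the "metric changed" event string are invented, and it says nothing about how IOS implements ATF internally.

```python
class AddressTrackingFilter:
    """Toy register-and-notify model of the ATF idea."""

    def __init__(self):
        self.registered = {}  # prefix -> list of client callbacks

    def register(self, prefix, callback):
        """A client declares interest in a prefix."""
        self.registered.setdefault(prefix, []).append(callback)

    def rib_change(self, prefix, event):
        """Called for every RIB change; only registered prefixes reach clients."""
        for callback in self.registered.get(prefix, []):
            callback(prefix, event)


bgp_events = []
atf = AddressTrackingFilter()
for nexthop in ("10.1.1.3/32", "10.1.1.5/32"):
    atf.register(nexthop, lambda prefix, event: bgp_events.append((prefix, event)))

for prefix in ("10.1.1.1/32", "10.1.1.2/32", "10.1.1.3/32",
               "10.1.1.4/32", "10.1.1.5/32"):
    atf.rib_change(prefix, "metric changed")

print(bgp_events)
```

Only the two registered nexthops generate events for BGP, which is why BGP can react immediately instead of waiting for a periodic scan.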
Convergence
Nexthop Tracking
Enabled by default
 [no] bgp nexthop trigger enable
 bgp nexthop trigger delay <0-100>
 show ip bgp attr next-hop ribfilter
Debugs
 debug ip bgp events nexthop
 debug ip bgp rib-filter
Convergence
Peer Down Detection
BGP must learn that the peer is down
 Default keepalive/holdtime values are 60 seconds and 180 seconds
 My 2c... use 3 second KA with 9 second holdtime
 Tune your IGP to converge in under 9 seconds
 Use BFD (bidirectional forwarding detection) if you need to be more aggressive
bgp fast-external-fallover
 If the interface goes down so does the eBGP peer
 Reduce carrier-delay settings
  0 msec for down
  100 msec for up
eBGP multihop
 Relies on holdtime or BFD
Convergence
Peer Down Detection
iBGP peers
Your BFD dead timer must be greater than that amount
IGP should be tuned to converge quickly
Fast IGP + BGP Nexthop Tracking = BGP reacts quickly to nexthop changes
BGP can route around a change in the core prior to bringing down iBGP peer(s)
Convergence
Fast Session Deactivation
ATF informs BGP of routing changes to the peer
When we lose our route to the peer, bring the peer down.
Multihop eBGP
 #1 Link 1 fails
 #2 Link 2 fails
 #3 FSD takes down peer
Convergence
Fast Session Deactivation
neighbor x.x.x.x fall-over
Off by default
 IGP may not have a route to a peer for a split second
 FSD would tear down the BGP session
Imagine if you lose your IGP route to your RR (Route Reflector) for just 100ms
 Every RR to RRC session would flap
Convergence
FSD vs. BFD
Why do we have both?
FSD was developed first
 Goal was fast BGP neighbor detection without expense of fast keepalives
BFD came later
 Goal was fast neighbor detection for multiple protocols
 Fast keepalives not as much of a concern
  BFD KAs are generated by linecards
  CPUs are also much faster today
FSD
 Relies on control plane (absence of a route in the RIB) to tear down the peer
 We could have a route but not have connectivity
BFD
 Relies on forwarding plane to detect down peer
 If we lose connectivity, the peer comes down
Convergence
MRAI (Minimum Route Advertisement Interval)
How is the timer enforced for peer X?
 Timer starts when all routes have been advertised to X
 For the next MRAI (seconds) we will not propagate any bestpath changes to peer X
 Once X's MRAI timer expires, send it updates and withdraws
 Restart the timer and the process repeats
User may see a wave of updates and withdraws to peer X every MRAI seconds
User will NOT see a delay of MRAI between each individual update and/or withdraw
 BGP would never converge if this were the case
Convergence
MRAI
MRAI timeline for BGP peer w/ MRAI of 5 seconds
T0   The big bang
T7   Bestpath Change #1
     UPDATE sent immediately
     MRAI timer starts, will expire at T12
T10  Bestpath Change #2
T12  MRAI expires
     Bestpath Change #2 is Txed
     MRAI timer starts, will expire at T17
T17  MRAI expires
     No pending UPDATEs
[Figure: timeline from t0 to t25 marking the bestpath changes and MRAI expirations]
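The timeline can be reproduced with a small simulation. This Python sketch models a single peer under the behaviour described on these slides (send immediately when no timer is running, otherwise hold changes until the timer expires); the function name is invented and real implementations batch work in other ways as well.

```python
def mrai_schedule(change_times, mrai):
    """Return (send_time, [change times carried]) tuples for one peer."""
    sends = []
    pending = []
    expires = None  # expiry time of the running MRAI timer, None when idle
    for t in sorted(change_times):
        # Let the timer expire (possibly more than once) before time t.
        while expires is not None and expires <= t:
            if pending:
                sends.append((expires, list(pending)))
                pending.clear()
                expires += mrai   # updates went out, so the timer restarts
            else:
                expires = None    # nothing was waiting, the timer goes idle
        if expires is None:
            sends.append((t, [t]))  # idle: advertise immediately
            expires = t + mrai
        else:
            pending.append(t)       # timer running: hold the change
    if pending:
        sends.append((expires, list(pending)))
    return sends


# Bestpath changes at T7 and T10 with an MRAI of 5 seconds, as in the timeline.
print(mrai_schedule([7, 10], mrai=5))
```

Change #1 goes out at T7 and change #2 is held until the timer expires at T12, matching the slide.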
Convergence
MRAI
BGP is not a link state protocol, it is path vector
May take several rounds/cycles of exchanging updates and withdraws for the network to converge
MRAI must expire between each round!
The more fully meshed the network and the more tiers of ASes, the more rounds required for convergence
Think about how meshy peering can be in the Internet
Convergence
MRAI
Internet churn means we are constantly setting and waiting on MRAI timers
 Internet table sees roughly 6 bestpath changes per second
 One flapping prefix slows convergence for all prefixes
iBGP: neighbor x.x.x.x advertisement-interval 0
 Has been the default since 12.0(32)S
eBGP: default is 30 seconds
 Lowering to 0 may get you dampened
Convergence
MRAI
TCP, the operating system, and BGP code provide some batching
Calculate bestpaths based on received messages Format UPDATEs to advertise new bestpaths
High Utilization
Router#show process cpu
CPU utilization for five seconds: 100%/0%; one minute: 99%; five minutes: 81%
....
139 6795740 1020252 6660 88.34% 91.63% 74.01% 0 BGP Router

Define "High"
Know what normal CPU utilization is for the router in question
Is the CPU spiking due to "BGP Scanner" or is it constant?
Is BGP going through "Initial Convergence"?
Illegal recursive lookup or some other factor causes bestpath changes for the entire table
High Utilization
How to identify route churn?
Do sh ip bgp summary, note the table version
Wait 60 seconds
Do sh ip bgp summary, note the table version
Know how many bestpath changes you normally see per minute
You have 150k routes and see the table version increase by 300
 This is probably normal route churn
You have 150k routes and see the table version increase by 150k
 This is bad and is the cause of your high CPU
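The two-snapshot procedure lends itself to a small helper that turns table version readings into a churn rate. In this Python sketch the function names are invented, the table version readings are made-up sample values, and the 10 percent alarm threshold is an arbitrary illustrative figure, not a Cisco recommendation; your own baseline should decide what counts as normal.

```python
def churn_per_minute(version_before, version_after, seconds=60):
    """Bestpath changes per minute from two table version readings."""
    return (version_after - version_before) * 60 / seconds


def classify_churn(changes_per_minute, total_routes, alarm_fraction=0.10):
    """alarm_fraction is an arbitrary illustrative threshold, not a Cisco value."""
    if changes_per_minute >= total_routes * alarm_fraction:
        return "excessive"
    return "normal"


# 150k routes, table version up by 300 in a minute (hypothetical readings).
print(classify_churn(churn_per_minute(10167, 10467), 150_000))
# 150k routes, table version up by 150k in a minute (hypothetical readings).
print(classify_churn(churn_per_minute(10167, 160167), 150_000))
```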
High Utilization
What causes massive table version changes?
Flapping peers
 Hold-timer expiring?
 Corrupt UPDATE?
Route churn
 Identify one prefix that is churning and troubleshoot that one prefix
 Will likely fix the problem with the rest of the BGP table churn
High Utilization
Table Version Changing Rapidly: A Little Lab Fun
RP/0/RP0/CPU0:XR#sh route | include 00:00:
Wed Apr 27 13:53:40.201 EDT
O 1.0.0.0/30 [110/3] via 10.1.2.1, 00:00:00, GigabitEthernet0/0/0/1
O 1.0.0.4/30 [110/3] via 10.1.2.1, 00:00:00, GigabitEthernet0/0/0/1
O 1.0.0.8/30 [110/3] via 10.1.2.1, 00:00:00, GigabitEthernet0/0/0/1
O 1.0.0.12/30 [110/3] via 10.1.2.1, 00:00:00, GigabitEthernet0/0/0/1
...
RP/0/RP0/CPU0:XR#sh route | include 00:00:
Wed Apr 27 13:53:44.162 EDT    <-- 4 seconds later
B 1.0.0.0/30 [20/2] via 1.1.1.1, 00:00:01
B 1.0.0.4/30 [20/2] via 1.1.1.1, 00:00:01
B 1.0.0.8/30 [20/2] via 1.1.1.1, 00:00:01
B 1.0.0.12/30 [20/2] via 1.1.1.1, 00:00:01
...
High Utilization
Table Version Changing Rapidly: A Little Lab Fun
RP/0/RP0/CPU0:aggies#sh ip bgp 1.0.0.4
Wed Apr 27 14:00:36.066 EDT
...
Last Modified: Apr 27 14:00:35.387 for 00:00:00
Paths: (1 available, no best path)
...
  100
    1.1.1.1 (inaccessible) from 1.1.1.1 (1.1.1.1)
...
High Utilization
Something is wrong with NEXTHOP 1.1.1.1
Flip flops between inaccessible and accessible with an IGP cost of 2
Troubleshoot 1.1.1.1 and the churning will stop
Layer 3 VPNs
Layer 3 VPNs
#1 Check IGP
#2 Check LDP
[Figure: CE1 and CE2 attached to the provider network, with check points #1 and #2 marked]
Layer 3 VPNs
#3 PE to PE vrf connectivity
 Can PEs ping the vrf interface of the other PE?
 If not, double check your import/export Route Targets
#4 PE to CE connectivity
 Verify each PE can ping the CE connected to the other PE
#5 CE to CE connectivity
 At this point you should be able to ping CE to CE
[Figure: PE1, PE2, CE1 and CE2 with check points #3, #4 and #5 marked]
Looking Glasses
The Internet
BGP Looking Glasses
You are advertising your address space to your ISPs
Q: How can you verify they are receiving it?
Q: How can you verify the rest of the Internet is receiving it?
A: BGP Looking Glasses
BGP Looking Glass servers are computers on the Internet running one of a variety of publicly available Looking Glass software implementations. A Looking Glass server (or LG server) is accessed remotely for the purpose of viewing routing info. Essentially, the server acts as a limited, read-only portal to routers of whatever organization is running the Looking Glass server. Typically, publicly accessible looking glass servers are run by ISPs or NOCs.
http://www.bgp4.as/looking-glasses
The Internet
BGP Looking Glasses
https://www.sprint.net/lg/
The Internet
BGP Looking Glasses
http://whois.arin.net/ui
The Internet
BGP Looking Glasses
http://www.bgp4.as/looking-glasses
The Internet
BGP Looking Glasses
The Level3 looking glass will translate AS #s to company names
AS-PATH: 3549 6327
AS-PATH Translation: GBLX SHAWFIBER
The Internet
Whose AS is That Anyway?
Or lookup a specific AS
http://bgp.potaroo.net/cidr/autnums.html
http://whois.arin.net/rest/asn/AS1239/pft
The University's Route Views project was originally conceived as a tool for Internet operators to obtain real-time information about the global routing system from the perspectives of several different backbones and locations around the Internet. Although other tools handle related tasks, such as the various Looking Glass Collections (see e.g. NANOG, or the DTI NSPIXP-2 Looking Glass), they typically either provide only a constrained view of the routing system (e.g., either a single provider, or the route server) or they do not provide real-time access to routing data. While the Route Views project was originally motivated by interest on the part of operators in determining how the global routing system viewed their prefixes and/or AS space, there have been many other interesting uses of this Route Views data. For example, NLANR has used Route Views data for AS path visualization (see also NLANR), and to study IPv4 address space utilization (archive). Others have used Route Views data to map IP addresses to origin AS for various topological studies. CAIDA has used it in conjunction with the NetGeo database in generating geographic locations for hosts, functionality that both CoralReef and the Skitter project support.
Don't forget to activate your Cisco Live Virtual account for access to all session material, communities, and on-demand and live activities throughout the year. Activate your account at the Cisco booth in the World of Solutions or visit www.ciscolive.com.
Final Thoughts
Get hands-on experience with the Walk-in Labs located in World of Solutions, booth 1042
Come see demos of many key solutions and products in the main Cisco booth 2924
Visit www.ciscoLive365.com after the event for updated PDFs, on-demand session videos, networking, and more!
Follow Cisco Live! using social media:
LinkedIn Group: http://linkd.in/CiscoLI