A Router’s Job
• So I got a packet, where to next?
– Which “next hop”?
- Out-going router interface to use when forwarding traffic to the
destination. May also include the IP address of the next router (if
any) in the path towards the destination.
[Figure: a packet with destination address 83.125.35.131 (0101001101…1101111100011010) arrives at a router that must choose among several outgoing interfaces]
2
The Lookup Operation
• Two basic pieces
1. Forwarding table
2. Longest prefix match rule
• Forwarding table
– List of all the routes (destinations I know) and their associated next hops
• Longest prefix match
– For any destination address find the route that differs from the destination address at the furthest bit position
- More specific is better!
Example destination: 83.125.35.131
Route              Next Hop
155.23.0.0/16      3
155.23.14.0/24     4
162.26.0.0/16      1
112.0.0.0/8        2
183.30.0.0/16      7
83.0.0.0/8         3
83.125.0.0/16      11
83.125.35.0/24     9
83.125.35.128/25   12
193.2.0.0/16       1
0.0.0.0/0          2
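The lookup rule can be illustrated with a minimal sketch. It assumes the forwarding table from this slide, treats next hops as opaque numbers, and scans the table linearly; real routers use tries or hardware lookups, so the function name `lookup` and the linear scan are illustrative choices only.

```python
import ipaddress

# Forwarding table from the slide: prefix -> next hop (an opaque number here).
FORWARDING_TABLE = {
    "155.23.0.0/16": 3,
    "155.23.14.0/24": 4,
    "162.26.0.0/16": 1,
    "112.0.0.0/8": 2,
    "183.30.0.0/16": 7,
    "83.0.0.0/8": 3,
    "83.125.0.0/16": 11,
    "83.125.35.0/24": 9,
    "83.125.35.128/25": 12,
    "193.2.0.0/16": 1,
    "0.0.0.0/0": 2,
}


def lookup(destination, table=FORWARDING_TABLE):
    """Return (prefix, next_hop) for the longest prefix that contains destination."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        # Keep the matching route with the most specific (longest) prefix.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    if best is None:
        return None
    return str(best[0]), best[1]


print(lookup("83.125.35.131"))  # ('83.125.35.128/25', 12)
print(lookup("8.8.8.8"))        # ('0.0.0.0/0', 2): only the default route matches
```

Five routes match 83.125.35.131 (the /8, /16, /24, /25 and the default route); the /25 wins because it is the most specific.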
3
So Where Does Routing Fit in All Of This?
4
Key Problem
• How to make correct local decisions?
– each router/switch must know something about global state
• Global state
– inherently large
– dynamic
– hard to collect
• A routing protocol must intelligently acquire,
summarize, and maintain relevant information
– How do I find out about other routers and links?
– How do I use that information to generate routes?
– How do I maintain routes in the presence of changes?
5
Design Direction
• Designing a single solution that spans the entire
Internet is unlikely to work
– Computational cost, protocol and bandwidth overhead
• Heterogeneous environment
– Domain sizes and requirements, router capabilities
– Different constraints on internal and external connectivity
• Basic design approach
– Hierarchy of domains (reflects address hierarchy)
- Ensures scalability
– Independence of routing protocols in different domains
- Support for heterogeneity
– Gateways between domains for end-to-end solution
6
A Bird’s Eye View of the Internet
• Basic “two-level” hierarchy
– Federation of interconnected islands: Autonomous Systems (AS)
- Each island has its own internal rules.
- Islands collaborate to offer end-to-end connectivity.
• Some islands are bigger and more powerful than others.
– Willingness to carry traffic for others
- Peering or transit agreements
[Figure: interconnected autonomous systems AS 1, AS 2, AS 3, AS 121, AS 123, AS 168, AS 321, AS 376, AS 441, AS 524 and AS 3411]
7
Delivering Ubiquitous Connectivity
- It could change…
[Figure: interconnected autonomous systems, including AS 3411]
8
Routing Protocol Overview
• Routing protocols follow the two-level hierarchy of the Internet
– Interior Gateway Protocols (IGP) control routing within an AS/domain
– Exterior Gateway Protocols (EGP) control routing between AS’s
[Figure: interconnected autonomous systems AS 2, AS 168, AS 376, AS 441 and AS 3411]
10
Protocol Design Goals and Requirements
• Minimize routing table space
– Faster look-up (although forwarding table can be more compact)
– Less to exchange (lower processing and bandwidth overhead)
– Lower storage cost
• Minimize number and frequency of control messages
– Lower processing and bandwidth overhead
– But need to take responsiveness to changes into account
• Robustness
– Avoid black holes (unable to reach destination)
– Prevent or recover from loops (inconsistent decisions among routers)
– Limit instability (oscillation between possible routes)
• Optimize use of network resources and overall
performance
– Best possible and/or most efficient path
11
General Design Choices
• Where are routes computed?
– Centralized vs. distributed
- Centralized is simpler but may not scale and is more prone to failure
- Distributed requires collaboration between routers and has more
complex transient behavior
• How are routes computed?
– Distributed computations vs. distribution of information from
which routes can be computed
- Distance vector: routers exchange results of computations
– Routing table and costs
- Link state: routers exchange information on which computations
can be performed independently
– Topological and reachability information
• What criteria are used when computing routes?
– What metrics to optimize (hop count, bandwidth, delay, etc.)?
– Static vs. dynamic metrics?
12
IGPs Job Description
13
Link State (LS) Routing
14
OSPF Overview
• Two level hierarchy
– Backbone area (area 0) connects other areas (hub and spoke).
– Costs are assigned to internal links and external routes for fine
tuning of traffic distribution.
• Link state operation
– Routers broadcast inside their area knowledge of their local
neighborhood.
- Routers glue local neighborhood pieces to create an area map.
– Information about other areas is summarized and broadcast into an
area by Area Border Routers (ABRs).
– Autonomous System Boundary Routers (ASBRs) are responsible for
injecting information on how to reach external destinations.
• Routing table construction
– Routers rely on their domain map and summary information about
other areas and the outside world to compute consistent best paths
to known destinations.
– Multiple paths can exist for any given route.
15
A Day in the Life of an OSPF Router
1. Establish adjacency to neighbor routers
through HELLO protocol.
2. Build and maintain topology database.
a. Synchronize databases with neighbors.
b. Advertise router’s local “neighborhood” to others.
c. Process and forward advertisements received from neighbors (change notification).
3. Compute routing table.
a. Intra-area destinations (routers and transit networks and
then stub networks)
b. Inter-area destinations
c. External destinations
4. Begin forwarding packets.
16
OSPF HELLO Protocol (1)
17
Database Synchronization (2a)
• Each router maintains a database that is used to
store a “complete” map of the network (more on how
this map/database is actually constructed later on).
• When routers boot, or the link between routers
comes up, routers perform database synchronization.
– The goal is to quickly ensure a common view of the world.
- Synchronization process determines what each router
knows and does not know and which router has the most
recent information.
– Database entries are characterized by sequence number and
age.
- Only newer or unknown entries are exchanged.
• Neighbor routers end up with a common domain map.
19
Database Synchronization
• During partition, routers’ databases can become unsynchronized.
[Figure: partitioned network of Routers 1 through 8, with link costs]
20
Building the Topology Database (2b/c)
• OSPF router advertises itself and its view of its
neighborhood with Link State Advertisements (LSAs).
– Puzzle piece from which the full map can be built
- Router ID identifies the originator of each puzzle piece
– Several types of advertisements (in OSPF)
- Multiple LSA types keep LSA size small (40 bytes on
average for OSPF).
- Small granularity ensures updates of minimal size.
- Different types of LSAs can have different scopes of
distribution.
– Link local, area local, domain-wide
• Router forwards LSAs it originates or receives from its
neighbors (flooding process).
• Sum of all received LSAs makes up a router’s topology
database.
– Common network map shared by all routers in an area
21
LSA Distribution: Flooding
• New LSAs are forwarded on all eligible (type-dependent)
links except the link on which they were received (if
applicable).
– Dissemination of information is independent of routing.
– Each LSA is transmitted at most twice on each link.
• New LSAs replace previous ones in local database.
• LSAs are transmitted reliably (acknowledged).
• Flooding rate bounded through minimum gap between
LSAs
• Periodic (30 min.) flooding to refresh database (aging of
LSAs)
– Why is periodic refreshing necessary if transmission is reliable?
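The flooding rule above can be sketched as follows. This is a simplified model, not OSPF itself: it assumes a hypothetical four-router topology, delivers messages one at a time, and ignores acknowledgments, retransmissions and LSA types. The function name `flood` and the router names are illustrative.

```python
from collections import deque


def flood(links, origin):
    """Flood one new LSA from origin; return (routers reached, transmissions per link)."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    have_lsa = {origin}
    sent = {frozenset(link): 0 for link in links}
    # Each queue entry is (router, neighbor the LSA was received from).
    queue = deque([(origin, None)])
    while queue:
        router, received_from = queue.popleft()
        for peer in sorted(neighbors.get(router, ())):
            if peer == received_from:
                continue  # never send back on the link the LSA arrived on
            sent[frozenset((router, peer))] += 1
            if peer not in have_lsa:
                # The LSA is new to this peer, so it will forward it in turn.
                have_lsa.add(peer)
                queue.append((peer, router))
            # Otherwise the peer already holds the LSA and does not forward it again.
    return have_lsa, sent


# Hypothetical topology: a triangle R1-R2-R3 with R4 hanging off R3.
LINKS = [("R1", "R2"), ("R1", "R3"), ("R2", "R3"), ("R3", "R4")]
reached, per_link = flood(LINKS, "R1")
print(sorted(reached))
print({tuple(sorted(link)): count for link, count in per_link.items()})
```

Because each router forwards a given LSA only the first time it sees it, a link carries the LSA at most once in each direction, which is where the "at most twice on each link" bound comes from.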
23
Flooding Example
[Figure: two successive steps of an LSA being flooded across a small network; per-link detail is not recoverable from the extracted text]
25
LSA Sequence Numbers
26
OSPF Sequence Numbering
and LSA Refreshes
• 32-bit sequence number (signed integer)
– InitialSequenceNumber of -N+1 (with N = 2^31)
– Increment by 1 (up to N-1) for each new update
• Wrap-around handled by premature aging of entry
before flooding new value (when reaching N-1)
– Flood LSA with age of MaxAge (1 hour) to purge LSA
– Send new LSA with sequence number of -N+1
– Rare event (over 100,000 years with a 30 min. refresh period)
• Receiving a self-originated LSA with a larger sequence
number than last transmitted LSA
– Caused by presence of residual LSA originated prior to last
router restart
– Jump to one more than received number and flood again
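The "rare event" estimate and the restart rule can be checked with a short sketch. It assumes N = 2^31 for the 32-bit signed sequence space and one increment per 30-minute refresh; the function names are illustrative, and the purge of the old instance with a MaxAge LSA is only indicated by a comment.

```python
N = 2 ** 31
INITIAL_SEQUENCE_NUMBER = -N + 1
MAX_SEQUENCE_NUMBER = N - 1
REFRESH_MINUTES = 30


def years_until_wrap(refresh_minutes=REFRESH_MINUTES):
    """Years needed to walk the whole sequence space at one increment per refresh."""
    increments = MAX_SEQUENCE_NUMBER - INITIAL_SEQUENCE_NUMBER
    minutes = increments * refresh_minutes
    return minutes / (60 * 24 * 365.25)


def next_sequence_number(current, received_self_originated=None):
    """Pick the sequence number for the next instance of a self-originated LSA."""
    # A residual LSA from before a restart may carry a larger number: jump past it.
    if received_self_originated is not None and received_self_originated > current:
        current = received_self_originated
    if current >= MAX_SEQUENCE_NUMBER:
        # Wrap-around: the old instance is first purged by flooding it with MaxAge.
        return INITIAL_SEQUENCE_NUMBER
    return current + 1


print(round(years_until_wrap()))      # well over 100,000 years
print(next_sequence_number(5, 9))     # 10
```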
27
Aging of LSAs
• LSA has an age field.
– Incremented at each transmission and while stored
– MaxAge determines time-to-live
- Gets reset at each refresh
• Two uses of MaxAge: Flooding of MaxAge LSA
– Removal of timed-out or invalid LSAs
– Removal of current entry before counter wrap-around
• Impact of MaxAge on boot-up problem
– If router waits for MaxAge, old LSAs will be purged.
– Selection of MaxAge is a difficult choice.
- Small MaxAge minimizes boot-up latency.
- Small MaxAge imposes higher flooding overhead and may prevent
full LSA distribution in large networks.
• Initial database synchronization handles boot-up.
28
Securing LSA Databases
29
What Are Link State Advertisements?
• Five basic types of LSAs:
– Intra-area
- Router LSA: A router’s neighborhood
- Network LSA: Connectivity through a broadcast network
– Inter-area
- T3 Summary LSA: Reachability to a route in another area
- T4 Summary LSA: Reachability to an ASBR in another area
– External
- T5 LSA: AS-external LSA to a route in another AS
30
What Is a Router LSA?
• Router LSA advertises to other routers in my “area” what my
neighborhood looks like
– What kind of links, to which neighbors, at what cost?
- Point-to-point, point-to-multipoint, broadcast, etc.
– What set of local destinations can I reach?
- Stub networks and transit networks
– What kind of router am I?
- Internal, Area Border Router, AS Boundary Router
[Figure: a router attached to stub networks 19.2.4.0/24 and 19.2.5.0/24 and to transit network 19.2.0.0/16 (which has a Designated Router), with a cost on each link]
31
What Is a Network LSA?
• Network LSA identifies all the routers attached to
the same broadcast network.
– Advertised by Designated Router (DR) that is chosen
through an election process.
– Backup DR is also elected and takes over if the DR fails.
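– All routers on the network form adjacencies (and synchronize databases) with only the DR and the backup DR; HELLO packets themselves are still multicast to every attached router.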
[Figure: transit network 19.2.0.0/16 with its Designated Router and Backup DR]
32
Why A Network LSA?
• Multiple routers attached to broadcast LAN
– They can all reach each other.
[Figure: routers A through G attached to the same broadcast LAN]
33
Building an Area Topology Map
• Putting the puzzle pieces together
[Figure: sequence of slides assembling the area map piece by piece, starting from stub network 19.2.5.0/24 and transit network 19.2.0.0/16 with its Designated Router, then adding R3 and R4]
39
Computing the Intra-Area
Routing Table (3a)
• Basic setting
– Topology database has stabilized.
– Router computes shortest paths from itself to all routers and transit
networks and then stub networks.
• Path computation based on Dijkstra algorithm
– Maintain two sets of nodes
- Nodes to which the shortest path is known (set S)
- Nodes to which candidate shortest paths are known (set C)
- Initially only the origin router is in set S
– Iterate and at each iteration:
- Consider all neighbors of last node X added to S and add them to C if
they are not already in it.
- Update candidate paths of all neighbors of X if path through X is shorter
than their current path.
- If C is empty, the algorithm terminates.
- Otherwise add to S the node in C that is “closest” to the origin router (smallest total path cost) and iterate.
41
Dijkstra Shortest Path Computation
• What intra-area routing table does router R1 come up with, and how does it do it?
[Figure: area topology with routers R1, R2, R3 and R4, transit network T, and per-interface link costs]
42
Dijkstra’s Operation At R1
Each entry is (node, cost, next hop(s)).
S = {(R1,0,R1)};
C = {(R2,1,R2); (T,12,T); (R3,∞,*); (R4,∞,*)}
S = {(R1,0,R1); (R2,1,R2)};
C = {(T,12,T); (R3,∞,*); (R4,6,R2)}
S = {(R1,0,R1); (R2,1,R2); (R4,6,R2)};
C = {(T,12,T,R2); (R3,∞,*)}
S = {(R1,0,R1); (R2,1,R2); (R4,6,R2); (T,12,T,R2)};
C = {(R3,12,T,R2)}
S = {(R1,0,R1); (R2,1,R2); (R4,6,R2); (T,12,T,R2); (R3,12,T,R2)};
C = ∅
[Figure: the same topology as on the previous slide]
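The trace can be reproduced with a minimal sketch of Dijkstra that also tracks equal-cost next hops. The forward costs (R1 to R2 = 1, R2 to R4 = 5, R1 to T = 12, R4 to T = 6, and 0 out of the transit network) follow the trace and the stub-network arithmetic; the reverse-direction costs (R2 to R1, R4 to R2, R3 to T) are assumptions, since the figure did not survive extraction. The sketch does not implement OSPF's special handling of transit networks.

```python
import heapq

# Directed link costs. Forward costs follow the slide; costs marked "assumed" are guesses.
GRAPH = {
    "R1": {"R2": 1, "T": 12},
    "R2": {"R1": 1, "R4": 5},            # R2 -> R1 assumed
    "R4": {"R2": 4, "T": 6},             # R4 -> R2 assumed
    "R3": {"T": 5},                      # R3 -> T assumed
    "T": {"R1": 0, "R3": 0, "R4": 0},    # transit networks advertise outgoing cost 0
}


def shortest_paths(graph, origin):
    """Return (dist, next_hops): cost and set of first hops from origin to each node."""
    dist = {origin: 0}
    next_hops = {origin: {origin}}
    done = set()                         # set S: shortest path is final
    heap = [(0, origin)]                 # set C: candidates, ordered by cost from origin
    while heap:
        d, x = heapq.heappop(heap)
        if x in done:
            continue
        done.add(x)
        for y, cost in graph.get(x, {}).items():
            nd = d + cost
            # Direct neighbors of the origin are their own next hop.
            hops = {y} if x == origin else next_hops[x]
            if y not in dist or nd < dist[y]:
                dist[y] = nd
                next_hops[y] = set(hops)
                heapq.heappush(heap, (nd, y))
            elif nd == dist[y] and y not in done:
                next_hops[y] |= hops     # equal-cost path: keep both next hops
    return dist, next_hops


dist, next_hops = shortest_paths(GRAPH, "R1")
for node in ("R2", "R4", "T", "R3"):
    print(node, dist[node], sorted(next_hops[node]))
```

With these costs, T and R3 are both reached at cost 12 with next hops T and R2, matching the final entries (T,12,T,R2) and (R3,12,T,R2) in the trace.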
43
Adding Transit Networks First
• The order in which nodes are added to the labeled set can affect the number of paths discovered to some nodes. This is because once a node is added to the labeled set it is never revisited.
– If E is added first to set of labeled nodes, the path A-C-E of cost 2 is not discovered
– If C is added first to set of labeled nodes, the path A-B-E-C of cost 2 is not discovered
• In OSPF transit network nodes always have outgoing costs of 0, and therefore must be added first to the set of labeled nodes
[Figure: two example graphs on nodes A, B, C and E with link costs 0, 1 and 2]
44
Adding Stub Networks
• For each stub network:
– Identify all routers that advertise the stub network.
– Retrieve the shortest path to those routers.
– Add the cost of the shortest path to the router to the cost of the
stub network link advertised by each router in its Router LSA.
– Pick the router(s) that yield the smallest cost.
– Add the stub network to the routing table with the same next
hop(s) as the selected router(s).
• Four stub networks in previous example:
– 19.2.4.0/24 and 19.2.5.0/24 are directly connected to router R1.
– 19.2.6.0/24 is reachable from both R2 and R4, and R2 is the lower
cost option (total cost of 1+1=2 vs 1+5+2=8).
– 19.2.7.0/24 is reachable from both R3 and R4, and R4 is the lower
cost option (total cost of 1+5+1=7 vs 12+3=15).
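The stub-network step can be sketched directly from the numbers in the example. The router costs are the shortest-path costs from R1 (R2 = 1, R4 = 1+5 = 6, R3 = 12) and the stub link costs are those quoted above; the two stubs attached to R1 are omitted because their link costs are not given. The function name `best_stub_routes` is illustrative.

```python
# Shortest-path cost from R1 to each router, as used in the example above.
ROUTER_COST = {"R1": 0, "R2": 1, "R4": 6, "R3": 12}

# Stub link cost advertised by each router in its Router LSA.
STUB_ADVERTISEMENTS = {
    "19.2.6.0/24": {"R2": 1, "R4": 2},
    "19.2.7.0/24": {"R4": 1, "R3": 3},
}


def best_stub_routes(router_cost, stubs):
    """Return {prefix: (total cost, [advertising routers on a cheapest path])}."""
    routes = {}
    for prefix, adverts in stubs.items():
        # Cost to the advertising router plus that router's stub link cost.
        totals = {
            router: router_cost[router] + link_cost
            for router, link_cost in adverts.items()
            if router in router_cost  # ignore routers we cannot reach
        }
        if not totals:
            continue
        best = min(totals.values())
        routes[prefix] = (best, sorted(r for r, t in totals.items() if t == best))
    return routes


print(best_stub_routes(ROUTER_COST, STUB_ADVERTISEMENTS))
# {'19.2.6.0/24': (2, ['R2']), '19.2.7.0/24': (7, ['R4'])}
```

The stub then inherits the next hop(s) of the shortest path to the selected router.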
45
What Intra-Area Routing Table at R1?
46
From One To Multiple Areas
• Why can’t we keep increasing the number of routers in an area?
– Topology database size and flooding overhead increase.
– Most importantly, route computation can become very onerous.
- Cost and frequency of Dijkstra increase.
• Basic solution is to partition a domain into multiple areas.
– Two-level hierarchy:
- Backbone area as a hub to which other areas connect
- Area Border Routers (ABRs) interconnect areas.
• Full topology is maintained only within an area.
– Flooding of router and network LSAs is limited to within an area.
– Dijkstra computation is limited to one area.
• Domain-wide shortest path computed using a DV-like approach
– ABRs advertise their cost to remote destinations.
– Shortest path is computed by concatenating costs to and from ABR.
47
Router’s View of a Multi-Area Domain
[Figure: area containing R1, R2, R3 and R4 with stub networks 19.2.4.0/24, 19.2.5.0/24, 19.2.6.0/24 and 19.2.7.0/24 and transit network 19.2.0.0/16 (with its Designated Router); the ABR injects T3 summaries, each with a cost, for 19.1.0.0/16, 19.1.2.0/23, 19.1.5.0/24, 19.1.6.0/24 and 19.3.0.0/16]
49
Generating Summary LSAs
50
Computing Inter-Area Paths (3b)
• Path selection
– For all ABRs that advertise the target route (longest prefix
match), add the “cost to the ABR” to the “cost from that ABR
to the remote destination.”
– Pick the ABR(s) with the smallest total cost as the target exit
point(s) to reach remote destination.
– Set next hop(s) for remote destination to the next hop(s) of
the shortest path(s) to the selected ABR(s).
52
Example of Inter-Area Path Computation
[Figure: three-area topology (Area 1, Area 0, Area 2) with Rtr 1 through Rtr 19, per-link costs, and destination r]
• Step 1
– Router 14 advertises a T3 summary with cost of 3 for r into area 0.
• Step 2
– Router 4 advertises a T3 summary with cost of 13 (10+3) for r into area 1.
• Step 3
– Router 1 identifies router 6 as the best exit point to reach r (4+9 < 4+13).
– Router 1 identifies router 3 and router 5 as its next hops to reach r.
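Step 3 is the DV-like part of the computation and can be sketched as below. The costs are read off the example: router 1 is at cost 4 from both ABRs, router 4 advertises a summary cost of 13, and the summary cost of 9 for router 6 is inferred from the "4+9" in step 3. The function name `best_exit_points` is illustrative.

```python
def best_exit_points(cost_to_abr, summary_cost):
    """Return (total cost, [ABRs]) minimizing cost to the ABR plus its advertised cost."""
    totals = {
        abr: cost_to_abr[abr] + summary_cost[abr]
        for abr in summary_cost
        if abr in cost_to_abr  # only ABRs reachable inside the local area
    }
    best = min(totals.values())
    return best, sorted(abr for abr, total in totals.items() if total == best)


# Router 1's intra-area cost to each ABR of area 1.
cost_to_abr = {"Router 4": 4, "Router 6": 4}
# Cost to r advertised by each ABR in its T3 summary (router 6's value is inferred).
summary_cost = {"Router 4": 13, "Router 6": 9}

print(best_exit_points(cost_to_abr, summary_cost))  # (13, ['Router 6'])
```

The next hops for r are then the next hops of router 1's shortest path(s) to the selected ABR.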
53
What Intra & Inter-Area
Routing Table at R1?
[Figure: the multi-area topology seen from R1: stub networks 19.2.4.0/24, 19.2.5.0/24, 19.2.6.0/24 and 19.2.7.0/24, transit network 19.2.0.0/16, routers R1 to R4, and T3 summaries from the ABR for 19.1.0.0/16, 19.1.2.0/23, 19.1.5.0/24, 19.1.6.0/24 and 19.3.0.0/16]
54
Intra and Inter-Area Routing Table at R1
56
What Are Link State Advertisements?
57
A Router’s View of The “World”
[Figure: the multi-area topology seen from R1, extended with a T4 summary for ASBR R5 and a T5 external LSA for 0.0.0.0/0 with <cost type, cost> = <2,20>, alongside the T3 summaries for 19.1.0.0/16, 19.1.2.0/23, 19.1.5.0/24, 19.1.6.0/24 and 19.3.0.0/16]
58
Computing Paths to External Routes (3c)
• Two cost components (again like inter-area & stubs):
1. Cost from ASBR to external route
2. Cost from source to ASBR
• Path selection
– Smallest type 2 cost wins independent of internal cost
– If type 1 cost or equal type 2 cost
- Prefer non-backbone intra-area path to ASBR (or forwarding
address) and use cost to break ties
- If no such path, use cost to select path
– Cost computation
- If (equal) type 2 cost: Cost to ASBR
- If type 1 cost: Cost to ASBR + Cost from ASBR
- Cost to ASBR:
– Direct cost if within same area
– Cost to ABR plus T4 summary cost advertised by ABR (cost from
ABR to ASBR in remote area)
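The cost rules above can be sketched as a comparison key. This is a simplification under stated assumptions: it ranks type 1 ahead of type 2 (as in the packet forwarding preference order), compares type 2 paths by external cost first and cost to the ASBR second, and leaves out the preference for non-backbone intra-area paths. The class, the ASBR names and the costs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalPath:
    asbr: str
    cost_type: int       # 1 or 2
    external_cost: int   # cost from the ASBR to the external route
    cost_to_asbr: int    # cost from the source to the ASBR


def preference_key(path):
    """Smaller key means more preferred path."""
    if path.cost_type == 1:
        # Type 1: internal and external costs are added together.
        return (1, path.cost_to_asbr + path.external_cost, 0)
    # Type 2: external cost dominates; cost to the ASBR only breaks ties.
    return (2, path.external_cost, path.cost_to_asbr)


def best_external_path(paths):
    return min(paths, key=preference_key)


type2_paths = [
    ExternalPath("ASBR-A", 2, 20, 3),
    ExternalPath("ASBR-B", 2, 10, 50),
    ExternalPath("ASBR-C", 2, 10, 40),
]
print(best_external_path(type2_paths).asbr)  # ASBR-C: lowest type 2 cost, then closest
```

Note that ASBR-A loses even though it is by far the closest ASBR, because its type 2 cost is higher.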
60
Examples of External Path Computation
[Figure: three-area topology (Area 1, Area 0, Area 2) with Rtr 1 through Rtr 19 and per-link costs; external routes r and r’ are advertised with <cost type, cost> values <1,20> and <1,2>]
<x,y>: <cost type, cost> for external route
[Figure: R1’s view of the “world”, repeated from the earlier slide, with the T5 LSA <2,20>, the T4 summary for ASBR R5 and the T3 summaries from the ABR]
62
Routing Table at R1
Routes         Next Hop(s)
19.2.4.0/24    19.2.4.1 (IP address of local interface at R1)
[Figure: AS 123 with an ASBR, areas 0, 2, 3 and 4, and external route r; legend: intra-area route, inter-area route, external route]
64
Packet Forwarding (4)
• For each known route, paths (next hops) are installed
in the final routing table as follows:
– Intra-area paths are preferred.
– Inter-area paths are next.
– Type 1 external paths follow.
– Type 2 external paths are the least preferred.
– Cost is used as a last resort to break ties.
67
Example of Non-Shortest Path
[Figure: four multi-area topologies (Area 1, Area 0, Area 2) with per-link costs, illustrating cases where the selected path is not the shortest one; link-level detail is not recoverable from the extracted text]
72
Virtual Links & Transit Areas
73
Two Examples of The Use of Virtual Links
[Figure: two example topologies involving Area 0, Area 2 and Area 3, with link costs]
74
Another Virtual Link Example
• Or why virtual links can make understanding packet
forwarding more complicated
– When computing the shortest path to a destination, ABRs
attached to a transit area are allowed to consider summary
LSAs advertised in the transit area by other ABRs
[Figure: example topology around Area 0, with link costs]
75
Another Link State Protocol: IS-IS
• IS-IS = Intermediate System to Intermediate System
– Intermediate System = Router
– End System = Host
• Just like OSPF
– A link-state protocol that relies on flooding
- Each node advertises its (local) view of the world in a Link
State Packet (LSP)
– Entries are regularly refreshed
- Nodes maintain a topology database on which they run
Dijkstra to compute shortest paths
– Database entries are aged out if not refreshed
– Hello protocol for liveness and neighbor discovery
– Broadcast network represented as “pseudo-node” with an elected
Designated IS (DIS) responsible for impersonating it
– A two-level hierarchy for improved scalability
76
Just like OSPF – Maybe Not
• IS-IS runs directly on the link layer, i.e., below and
independent of IP
– No exposure to IP “insecurity”, but no benefit from IP functionality,
e.g., fragmentation (common MTU across the entire network)
• IS-IS relies on different area and level definitions
– Area = level 1 domain based on local sharing of area address
– Routers belong to one area (area boundary through links not
routers, i.e., no ABR but “connected” routers)
– Level 2 backbone defines logical connectivity based on link types
and not as a separate area (OSPF area 0)
• IS-IS has a different LSP structure (just one per router)
– LSP originator identified by System ID of router (IS)
- Not an IP address…
– Extensive use of Type/Length/Value (TLV) format to encode
everything a router knows in one long list
- Affects update process: One small change triggers update for
everything
- Affects extensibility: New information easily encoded in new
TLV without requiring new packet type
77
More on IS-IS Differences – (1)
• Areas and levels (level 1 and level 2)
– Routers form adjacency over link if and only if
- They both agree it is a level 2 link
- They both agree it is a level 1 link AND they share at least one
area address
– Area IDs are link local, which facilitates merging and splitting areas
• Level 2 and level 1 structure
– Links can be level 1, level 2 or BOTH level 1 and level 2
– Level 2 backbone is mostly a connectivity concept
- L1/L2 links are somewhat similar to OSPF virtual links
– Connectivity between level 1 areas is provided through “attached”
router
- Router with connectivity to L2 backbone as indicated in L1
LSP (Attached bit)
- Default behavior routes inter-area packets to closest attached
router (route leaking extensions)
78
Merging, Splitting, Renumbering Areas
• Merging area 7 and area 8
– Reconfigure link A-B to be L1/L2 link with, say, area 7 as area ID common to A & B
– L1 flooding domain now extends to routers A, B, C and D
– Add area 7 as area ID for link B-D on both routers B and D
– Reconfigure link A-B to be L1 only
– Remove area 8 ID from routers B and D for link B-D
• Similar (symmetric) processes can be followed for area splitting or renumbering
[Figure: three stages of the merge for routers A, B, C and D: link A-B as L2 between area 7 and area 8; link A-B as L1/L2 with area 7 shared; link A-B as L1 with all routers in area 7]
79
Level 2 Backbone
[Figure: level 2 backbone topology with link costs, including router A1 and prefix 19.1.1.0/24]
81
More on IS-IS Differences – (2)
• HELLO protocol
– Local hold timer for each link (carried in Hello messages)
– Used for MTU check through padding of Hello messages
– Three-way handshake to test for bidirectional connectivity
- Hello messages list the addresses of the ISs from which Hellos have been heard
• Broadcast network
– DIS election is preemptive (IS with highest priority wins)
– LAN represented by pseudonode LSP from DIS
- Identified by non-zero pseudonode ID
– No backup DIS (DIS can reduce its Hello hold timer – JUNOS)
• Database synchronization
– On LANs DIS sends Complete Sequence Numbers PDUs (CSNPs)
every 10s (unreliable transmissions – implicit ACKs)
– On P2P links initial CSNP sent only when adjacency comes up
- JUNOS implements periodic resending of CSNP on P2P links
– Other routers request specific LSPs using Partial Sequence Numbers
PDUs (PSNPs) or reflood missing/old LSPs
- ACKs (PSNP or CSNP) needed for received LSPs on P2P links
82
More on IS-IS Differences – (3)
• Fragmentation
– Remember that IP fragmentation is not available
– IS-IS does application level fragmentation (assumes minimum
MTU size of 1492 bytes – verified using Hello padding)
– CSNP fragmentation
- Based on using Start-LSP-ID and End-LSP-ID fields to
indicate beginning and end of synchronization
– LSP fragmentation
- Based on fragment ID byte (up to 256 fragments)
– Extension based on assigning additional IDs to IS
- Fragment zero is mandatory for the others to be considered
- Fragments are atomic (arrive independently), so information must be packaged carefully in fragments to avoid churning in the presence of changes
83
LSP Fragmentation
Preserving fragment structure ensures that only fragment 00 is re-advertised
[Figure: an LSP split into fragment 00 and fragment 01, each with a header and adjacency entries (Adj #1, #20, #21, #22), shown before and after a change]
84
More on IS-IS Differences – (4)
• LSP structure
– Link State Packet as a container
– Container content based on packing independent entities (TLVs)
that each provide different type of information
• Benefits
– Protocol machinery associated with the handling of the containers
is independent of its content (reusable)
– Carrying new information in container only requires definition of
new TLVs
– LSPs can still be parsed even if they contain unknown TLVs (just skip the number of bytes specified in the Length field)
• Drawbacks
– Container imposes rather coarse information granularity
- Whole container is resent in the presence of changes
- Need to exercise caution when information is spread over
multiple fragments
– Purge mechanism is required
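The container idea can be illustrated with a minimal TLV walker. It assumes the IS-IS encoding of one byte of Type, one byte of Length (counted in bytes) and then the Value; the payload bytes and the unknown type 250 are made up for the example, and the function name `parse_tlvs` is illustrative.

```python
def parse_tlvs(payload, known_types):
    """Split payload into (type, value) pairs; skip TLV types we do not understand."""
    parsed, skipped = [], []
    offset = 0
    while offset + 2 <= len(payload):
        tlv_type = payload[offset]
        length = payload[offset + 1]
        value = payload[offset + 2 : offset + 2 + length]
        if len(value) < length:
            raise ValueError("truncated TLV")
        if tlv_type in known_types:
            parsed.append((tlv_type, value))
        else:
            # Unknown TLV: the Length field is all we need to step over it.
            skipped.append(tlv_type)
        offset += 2 + length
    return parsed, skipped


# Made-up payload: TLV 2 (3 bytes), unknown TLV 250 (2 bytes), TLV 22 (1 byte).
payload = bytes([2, 3, 0xAA, 0xBB, 0xCC, 250, 2, 1, 2, 22, 1, 0xFF])
print(parse_tlvs(payload, known_types={2, 22}))
# ([(2, b'\xaa\xbb\xcc'), (22, b'\xff')], [250])
```

The parser never needs to know what TLV 250 means, which is why new information can be added without a new packet type.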
85
Purging Old Fragments
[Figure: fragment 00 and fragment 01 of an LSP]
86
Main IS-IS TLVs – (1)
87
Main IS-IS TLVs – (2)
TLV                        TLV #   Where Used?
IS Reachability            2       LSP
Extended IS Reachability   22      LSP
88
IS Reachability TLVs
89
Some IS-IS Sub-TLVs – (1)
Sub-TLV                     Sub-TLV #
Maximum Link Bandwidth      9
Reservable Link Bandwidth   10
Unreserved Bandwidth        11
90
IP Reachability TLVs
91
Multi-Topology TLVs
• SPF computations
– Explicitly structured as a two-phase process
• Phase 1
– Compute shortest paths from IS to IS (on a graph with routers/IS and “networks” as vertices and links/adjacencies as edges)
- One SPF per topology
– Independent of any underlying “reachability” information
• Phase 2
– Add shortest paths to all “leaves” – reachability information
• Phase 1 can be reused across protocols and is only triggered in the presence of connectivity changes
• Phase 2 is reachability specific
93
Routing Protocol Overview
• Routing protocols follow the two-level hierarchy of the Internet
– Interior Gateway Protocols (IGP) control routing within an AS/domain
– Exterior Gateway Protocol(s) (EGP) control routing between AS’s
• Different goals and constraints for each family of protocols
– IGP: Ability to fine tune internal operation and shielding from outside “noise”
– EGP: Scalability and ability to
[Figure: interconnected autonomous systems AS 1, AS 2, AS 3, AS 121, AS 123, AS 168, AS 321, AS 376, AS 441, AS 524 and AS 3411]
94
Growing Up
From One to Many Domains
• Goal
– Enable connectivity between domains (Internet-wide)
• Requirements
– Operational flexibility and scalability, and
95
BGP Routing Table Growth
From http://bgp.potaroo.net
96
Some Basic Remarks
Before Jumping Into BGP
• A link state type of approach would simply not work
– Requires building and maintaining a map of the entire Internet in
every router...
– The need for consistent information and decisions cannot be
satisfied as the network size grows
- Things are always changing somewhere in the Internet
• Distance vector protocols are the only realistic option
– Better scalability by limiting the level of topology information that
each router maintains
– Preserve ability to use different route selection criteria at each router
- No need for consistent metrics
- Seamless support for policies
– Control of what routing information is sent to whom
97
Border Gateway Protocol
• DV protocol for inter-domain routing
– Supports arbitrary topology (but no overlapping domains)
– Governs exchange of information between internal and external
border routers (BGP peers)
- Internal peers: within the same domain
- External peers: in two adjacent domains
- Each domain is characterized by a unique autonomous system number
• Major BGP characteristics
– Selection of “best” path (avoid stupid choices and support strong
administrative control)
- Multiple path attributes
– Loop avoidance (path vectors)
– Scalability through route aggregation
• BGP as a protocol is relatively simple (86 pages for the latest draft
vs 244 for RFC 2328), but its configuration can be complex and
errors can have far-reaching implications
98
BGP Operation Overview
Three major phases:
– Neighbor acquisition and reachability, exchange of routing
information, and path selection (steady state)
1. Neighbor acquisition and reachability
– Initiated through OPEN message and maintained by
KEEPALIVE messages
– Neighbor declared unreachable if no KEEPALIVE received
within Holding Time
2. Routing information exchanged through UPDATE
messages
– Incremental updates to advertise & withdraw routes
- Requires reliable transmission (uses TCP - port 179)
3. Path selection uses the information received in
UPDATE messages to select the best path for a route
and construct the routing table
99
The BGP State Machine
[Figure: “normal” sequence of states on the two peers: Idle, Connect or Active, OpenSent, OpenConfirm, Established]
100
BGP Information Flow and Sources
• Different peering sessions with internal (same AS) and
external (different AS) neighbors
– External BGP neighbors communicate via eBGP
– Internal BGP neighbors communicate via iBGP
- All BGP peers in an AS are typically connected in a full
mesh (more on this later)
[Figure: AS 1, AS 2 and AS 3, with iBGP sessions between routers inside an AS and eBGP sessions between routers in adjacent ASs]
101
BGP Processing Steps
Phase 1: Determines degree of preference
Phase 2: Select best routes to install in LocRIB
Phase 3: Determine which routes to advertise based on policies
[Figure: Router D2 keeps one RIB_In and one RIB_Out per peer (Rtr A2, Rtr B2, Rtr B3), reached over IBGP and EBGP sessions; routes flow from the RIB_In tables through Phases 1 to 3 into the RIB_Out tables]
102
BGP UPDATE Message
advertisement
– Can contain multiple routes that are being withdrawn
– Path Attributes describe a number of key properties of the advertised route that are used to select the best path
– NLRI is a list of IP address prefixes associated with a given BGP route (common set of Path Attributes)
Message fields shown on the slide:
Withdrawn Routes (variable)
Total Path Attribute Length (2 bytes)
Path Attributes (variable)
Network Layer Reachability Information (NLRI) (variable)
103
Path Attributes
General Characteristics
• Several categories of attributes
– Optional or well-known, mandatory or discretionary, transitive or not,
partial or not
• Well-known attributes must be recognized by all BGP
implementations
– Mandatory well-known attributes must be included in every UPDATE
message, while discretionary well-known attributes may or may not be
sent based on the content of the message
– Well-known attributes MUST be passed along (after updating) to other
BGP peers
• Optional attributes need not be recognized by all BGP
implementations
– Unrecognized transitive attributes SHOULD be passed to other BGP
peers with the partial bit set
– Unrecognized non-transitive attributes are ignored
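The categories above map onto the high-order bits of the attribute flags byte. The sketch below uses the bit layout from RFC 4271 (not shown on the slide); the function names are hypothetical.

```python
# Attribute flag bits as laid out in RFC 4271 (high-order bit first).
OPTIONAL = 0x80
TRANSITIVE = 0x40
PARTIAL = 0x20
EXTENDED_LENGTH = 0x10


def describe_flags(flags):
    """Decode the flags byte of a path attribute into named booleans."""
    return {
        "optional": bool(flags & OPTIONAL),
        "transitive": bool(flags & TRANSITIVE),
        "partial": bool(flags & PARTIAL),
        "extended_length": bool(flags & EXTENDED_LENGTH),
    }


def handle_unrecognized(flags):
    """Return the flags to forward with, or None if the attribute is dropped."""
    if not flags & OPTIONAL:
        # Well-known attributes must be recognized by every implementation.
        raise ValueError("unrecognized well-known attribute")
    if flags & TRANSITIVE:
        return flags | PARTIAL   # pass along with the partial bit set
    return None                  # optional non-transitive: ignore


if __name__ == "__main__":
    print(describe_flags(0xC0))
    print(handle_unrecognized(0xC0))
```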
104
Path Attributes (1)
• AS_PATH
– Well-known, mandatory
– Sequence of path segments of type AS_SET (1) or AS_SEQUENCE (2)
- AS_SET: Unordered list of autonomous systems traversed by the
route
- AS_SEQUENCE: Ordered list of autonomous systems traversed by
the route
– Updated by “pre-pending” own AS number when advertising to a BGP
speaker in another AS (used for loop prevention)
• NEXT_HOP
– Well-known, mandatory
– IP address of border router to be used as next hop towards destination
identified by the NLRI field
– Typically chosen to ensure that the “shortest” path is taken
• ORIGIN
– Well-known, mandatory
– Characterizes where the path first originated
- IGP: 0; EGP: 1; Other (INCOMPLETE): 2
– Should not be changed by other BGP speakers
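A minimal sketch of how AS_PATH prepending supports loop avoidance, assuming the path is a plain AS_SEQUENCE modeled as a Python list (function names are hypothetical):

```python
def advertise_ebgp(as_path, local_as):
    """Prepend the local AS number before advertising to a speaker in another AS."""
    return [local_as] + list(as_path)


def accept(as_path, local_as):
    """Reject a route whose AS_PATH already contains the local AS (a loop)."""
    return local_as not in as_path


if __name__ == "__main__":
    path = advertise_ebgp([], 1)       # AS 1 originates the route
    path = advertise_ebgp(path, 2)     # AS 2 passes it on
    print(path, accept(path, 3), accept(path, 1))
```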
105
Path Attributes (2)
• LOCAL_PREF
– Well-known, discretionary
– Advertisement to other BGP speakers in the same AS (iBGP) of the
degree of preference of a route by the advertising router (higher
value is preferred)
• MULTI_EXIT_DISC (MED)
– Optional, non-transitive
– Used to give some preference to different exit/entry points in a
neighboring AS (lower value is preferred)
• COMMUNITY
– Optional, transitive, used to simplify routing policies
- Common property used to determine which routes to accept,
prefer, and pass to BGP neighbors
– Some well-known communities:
- NO_EXPORT: do not advertise outside of the AS (or
confederation)
- NO_EXPORT_SUBCONFED: do not advertise to external peers
(including peers in other autonomous systems within a
confederation)
- NO_ADVERTISE: not advertised to any BGP peer
106
Path Attributes (3)
• AGGREGATOR
– Optional, transitive
– Contains IP address and AS number of the BGP speaker that
formed the aggregate route
• ATOMIC_AGGREGATE
– Well-known, discretionary (should be propagated)
– Informs other BGP speakers that the advertiser aggregated
several routes and may have removed some autonomous
system numbers from the AS_SET (loop free property must
be maintained, though)
- As a result, actual path may differ from AS_PATH
- Basically used to signal possible loss of information
– NLRI field must not be modified by adding a more specific
prefix, i.e., route must not be de-aggregated (loop prevention)
107
Path Attributes (4)
• ORIGINATOR_ID
– Optional, non-transitive
– Used by Route Reflectors (more on this later)
– Identifies the local router (within the local AS) that originally
advertised the route
• CLUSTER_LIST
– Optional, non-transitive
– Used by Route Reflectors to detect looping of routing
information in an AS because of misconfiguration
- Each Route Reflector prepends its CLUSTER_ID to the
CLUSTER_LIST
- Route Reflectors ignore advertisements that carry their
CLUSTER_ID in the CLUSTER_LIST
108
BGP Decision Process
• Three phase process
– Phase 1: Calculates a “degree of preference” for each route in a
given RIB_In (locks the associated RIB_In)
- If route is learned from an internal (iBGP) peer, the LOCAL_PREF
attribute is usually taken as the degree of preference.
- If route is learned from an external peer, the degree of
preference is computed based on local policy.
– The resulting value is used as LOCAL_PREF in any iBGP
re-advertisement.
– Phase 2: Selects the “best” route out of all those available for
distinct destinations (locks all RIB_In)
- Excludes routes with unresolvable NEXT_HOP or a loop in the
AS_PATH attribute
- Best routes are installed in the Local RIB.
– Phase 3: Decides, based on policies, which routes in Local RIB to
advertise to which peer (blocks execution of Phase 2).
- Route aggregation can be performed at this stage.
109
BGP Tie Breaking Rules
• BGP selects a SINGLE route.
– Remove all routes that don’t have the smallest number of AS
numbers in AS_PATH (each AS_SET counts only as one!)
– Remove all routes that don’t have the lowest ORIGIN value
– Among routes learned from the same neighboring AS, remove
routes with less desirable (higher) MED values.
– If at least one route was learned through eBGP, remove all
routes learned through iBGP.
– Remove all routes with a non-minimum IGP cost to NEXT_HOP.
– Remove all routes that were not advertised by the BGP speaker
with the lowest BGP identifier.
– Prefer the route received from the lowest peer address.
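The tie-breaking rules can be read as successive filters over the candidate set. The sketch below is an illustration only: the `Route` fields and helper names are assumptions, and it starts after the degree-of-preference comparison, as the slide does.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Route:
    name: str
    as_path: List[Tuple[str, List[int]]]  # segments: ("seq" or "set", AS numbers)
    origin: int
    med: int
    neighbor_as: int
    ebgp: bool
    igp_cost: int
    bgp_id: int
    peer_addr: int


def path_len(route):
    """Each AS in an AS_SEQUENCE counts; a whole AS_SET counts as one."""
    return sum(len(asns) if kind == "seq" else 1 for kind, asns in route.as_path)


def keep_min(routes, key):
    best = min(key(r) for r in routes)
    return [r for r in routes if key(r) == best]


def drop_worse_med(routes):
    """MED is compared only among routes learned from the same neighboring AS."""
    best = {}
    for r in routes:
        best[r.neighbor_as] = min(best.get(r.neighbor_as, r.med), r.med)
    return [r for r in routes if r.med == best[r.neighbor_as]]


def select(routes):
    c = keep_min(routes, path_len)
    c = keep_min(c, lambda r: r.origin)
    c = drop_worse_med(c)
    if any(r.ebgp for r in c):
        c = [r for r in c if r.ebgp]
    c = keep_min(c, lambda r: r.igp_cost)
    c = keep_min(c, lambda r: r.bgp_id)
    c = keep_min(c, lambda r: r.peer_addr)
    return c[0]
```

Because each step only narrows the set, a route eliminated early (for example on AS_PATH length) cannot win on a later criterion such as IGP cost.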
110
Using LOCAL_PREF to Pick an Exit
Point
• Choosing between a primary and a backup provider
– Used to influence internal decisions
Primary
LOCAL_PREF=100
LOCAL_PREF=20
Backup
111
AS_PATH Padding to Discourage the
Use of Certain Links - (A Hack!)
• Used externally to influence choice of inbound links
– Choosing between a primary and a backup link
– Tuning inbound traffic for load-balancing purposes
• Can be over-ridden by local decisions
(LOCAL_PREF)
1.2.0.0/16; <AS1>
1.3.0.0/16; <AS1,AS1>
1.2.0.0/16; <AS1,AS1>
1.3.0.0/16; <AS1>
112
Another Way to Influence Entry Points
• MED allows crude selection ability
– Avoid low speed internal links
• But not always taken into account
113
Ignoring MED Values
• Hot potato routing
– Basic rule between ISPs
– “I won’t carry your bits for you…”
114
Propagating Path Attributes (1)
• Let us follow UPDATEs for routes r and r’ located in AS 1.
• Router A1 originates updates for routes r and r’ and advertises them over
its eBGP session to Router A2.
– ORIGIN is set to 0 as routes r and r’ were learned through IGP.
[Figure: Routers B1, B2, C2, and B3 with eBGP and iBGP sessions; route r’; AS 2]
115
Propagating Path Attributes (2)
• Router A2 processes the updates it received from Router A1 for routes r
and r’ and decides to advertise them over its iBGP sessions to Routers
B2, C2 and D2.
– ORIGIN is kept unchanged.
[Figure: Routers B1, B2, C2, and B3 with eBGP and iBGP sessions; route r’; AS 2]
116
Propagating Path Attributes (3)
• Router D2 processes updates received from Routers A2 and B2 for routes r
and r’ and advertises a single UPDATE for aggregate route r* over its eBGP
sessions to Router A3.
– ORIGIN is kept unchanged.
– Router D2 generates new AS_PATH attributes for r and r’ by pre-pending
AS2 to the AS_PATH (value is now <AS2,AS1>) and because both
AS_PATH attributes are identical, the AS_PATH of r* is set to the same
value and type.
– Router D2 adds an AGGREGATOR attribute <AS 2;own IP address> but
no ATOMIC_AGGREGATE attribute as there was no information loss
– Router D2 sets NEXT_HOP to its own IP address.
[Figure: Routers B1, B2, C2, and B3 with eBGP and iBGP sessions; route r’; AS 2]
117
Decision Process Example (1)
• In AS 1, both routes r and r’ are learned from IGP
• In AS 2 routers hear about r and r’ from Router A2 and Router B2,
and both routes have the same AS_PATH count and ORIGIN value.
– For routes r and r’, Router A1 advertises MED values of 0 and 50, and
Router B1 advertises MED values of 50 and 0.
– If LOCAL_PREF values are equal, Routers C2 and D2 in AS 2 rely on MED
values and pick Router A2 as the NEXT_HOP for r and Router B2 as the
NEXT_HOP for r’ (Routers A2 and B2 pick Routers A1 and B1, respectively)
• In AS 3, Router A3 will pick Router D2 (eBGP from Router D2 vs
iBGP from Router B3); Router B3 will pick Router C2 (smaller BGP
ID); other BGP speakers pick Routers A3 or B3 based on IGP cost.
[Figure: Routers A1 and B1 (AS 1), A2, B2, C2, and D2 (AS 2), A3 and B3 (AS 3), with eBGP and iBGP sessions; routes r and r’]
118
Decision Process Example (1’)
• In AS 2 routers hear about r and r’ from Router A1 and
Router B1, and both routes have the same AS_PATH
count and ORIGIN value, but different MED values.
– For routes r and r’, Router A1 advertises MED values of 0 and 50, and
Router B1 advertises MED values of 50 and 0.
• For routes r and r’, Router A2 advertises LOCAL_PREF
values of 50 and 20, while Router B2 advertises 50 for both
– Routers C2 and D2 pick Router B2 for r’, and select either Router A2 or
Router B2 for r based on their IGP cost (MED is ignored)
[Figure: Routers A1 and B1 (AS 1), A2, B2, C2, and D2 (AS 2), A3 and B3 (AS 3), with eBGP and iBGP sessions; routes r and r’]
119
Another Aggregation Example
• Routes r and r’ are aggregated into route r* by Router R when
advertised into AS 8
– AS_PATH attribute type changed to AS_SET
– Unordered list of ASes <AS 1;AS 2;AS 3;AS 4;AS 5;AS 6;AS 7>
– May omit some AS numbers if there is no risk of loop, e.g., advertise
AS_SET <AS 1; AS 2; AS 3; AS 7>
- ATOMIC_AGGREGATE attribute is added
- Actual path need not follow AS_PATH
[Figure: AS 1 to AS 8, routes r and r’, Router R, and aggregate r* advertised toward AS 8]
120
De-Aggregation and Loops
r’/24
Route r’ < Route r
r/16
r’; <AS1,AS2,AS3,AS4>
r’; <AS1,AS2,AS3,AS4>
r; <AS5,AS6>
r’; <AS5,AS6>
Illegal de-aggregation
121
Policies – One Example
122
Controlling Route Advertisements
Through Policies
AS 1, AS 6 AS 1, AS 6
0.0.0.0/0
0.0.0.0/0
0.0.0.0/0 0.0.0.0/0
123
Controlling Route Advertisements
Through Communities
• COMMUNITY attribute
– First two bytes carry ASN and last two bytes carry community
values used for local policy routing.
– 444: I2 routes; 445: Univ. X; 446: UUNET; 447: Co. X Research
GigaPOP
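A sketch of the two-byte ASN plus two-byte value layout described above. The well-known community constants are the values defined in RFC 1997; ASN 65000 is a hypothetical private AS number used only for illustration.

```python
# Well-known communities (values from RFC 1997).
NO_EXPORT = 0xFFFFFF01
NO_ADVERTISE = 0xFFFFFF02
NO_EXPORT_SUBCONFED = 0xFFFFFF03


def encode_community(asn, value):
    """Pack a 2-byte ASN and a 2-byte local value into one 32-bit community."""
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError("asn and value must each fit in 16 bits")
    return (asn << 16) | value


def decode_community(community):
    """Split a 32-bit community into (asn, value)."""
    return community >> 16, community & 0xFFFF


if __name__ == "__main__":
    tagged = encode_community(65000, 444)   # 444: I2 routes in the slide's example
    print(hex(tagged), decode_community(tagged))
```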
124
Enhancing BGP Scalability
• What is wrong with this picture?
• The need for an iBGP mesh
creates many problems.
– N-1 TCP connections at every
router
– Every new router requires
configuration updates at all other
routers.
– Every router maintains N-1 RIB_In
and RIB_Out.
– Every change at one router needs
to be processed by all other
routers.
• Solutions
– Break it up in smaller pieces
- Route Reflectors
- Confederations
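To make the scaling problem concrete, the sketch below counts iBGP sessions in a full mesh of N routers and, for comparison, in a hypothetical layout with a single Route Reflector serving all other routers as clients (an assumption made for illustration).

```python
def full_mesh_sessions(n):
    """Every router peers with the N-1 others; each session is shared by two routers."""
    return n * (n - 1) // 2


def single_rr_sessions(n):
    """One Route Reflector, all other N-1 routers are its clients (illustrative layout)."""
    return n - 1


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, full_mesh_sessions(n), single_rr_sessions(n))
```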
125
Route Reflector
• Simple solution, compatible with current BGP operation, and
supports easy migration
– Some BGP speakers, Route Reflectors (RR), can redistribute to iBGP
peers routes learned from other iBGP peers.
• Route Reflectors have two types of iBGP peers:
– Client peers and non-client peers
- Non-client peers must be fully meshed; client peers need not be.
– RR and its clients form a cluster identified by a CLUSTER_ID.
- Multiple RRs are allowed in a cluster (redundancy).
• Two Attributes: ORIGINATOR_ID and CLUSTER_LIST
– RR sets ORIGINATOR_ID to be the ROUTER_ID of the router that
originated the route.
- Routers ignore routes with ORIGINATOR_ID equal to their
ROUTER_ID.
– RR prepends the local CLUSTER_ID to the CLUSTER_LIST when
reflecting a route.
- Used to detect looping of routing information
- Routes with local CLUSTER_ID in CLUSTER_LIST are ignored.
126
Route Reflector Operation
• When an RR receives a
route from an iBGP peer:
– Selects the best path based
on its path selection rule
– If the best path is from a
non-client peer, reflect to all
clients
– If the best path is from a
client peer, reflect to all
client and non-client peers
• Note that path selection
need not be identical to
that of a full iBGP mesh.
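The reflection rules and the two loop-detection attributes can be sketched as follows. Routes are modeled as plain dictionaries and all names are hypothetical; the sender is excluded from the reflection targets, and ORIGINATOR_ID is assumed to be the sending router's ID when the attribute is not yet present.

```python
def reflect_targets(source, clients, non_clients):
    """Which iBGP peers receive a best path learned from `source`."""
    if source in clients:
        return (set(clients) | set(non_clients)) - {source}
    if source in non_clients:
        return set(clients)
    raise ValueError("source is not an iBGP peer of this reflector")


def reflect(route, cluster_id, source_router_id):
    """Return the route as reflected, or None if a cluster loop is detected."""
    cluster_list = list(route.get("cluster_list", []))
    if cluster_id in cluster_list:
        return None                                   # already passed through this cluster
    out = dict(route)
    if out.get("originator_id") is None:
        out["originator_id"] = source_router_id       # assumption: sender originated it
    out["cluster_list"] = [cluster_id] + cluster_list  # prepend local CLUSTER_ID
    return out
```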
127
Confederations
• Basic principle
– Break-up one big autonomous system into smaller internal autonomous
systems
• But, this arrangement increases:
– Complexity of routing policy based on AS_PATH information
– External overhead when internal topology changes
• Autonomous system confederation
– Collection of autonomous systems advertised as a single autonomous
system to BGP speakers outside of the confederation
- Confederation is identified externally by a single autonomous system
confederation identifier
- Each member of the Confederation is given a member autonomous
system number that is used only inside the confederation
– Two additional AS_PATH segment types:
- AS_CONFED_SEQUENCE: Ordered set of member autonomous
system numbers that an UPDATE message has traversed inside the
Confederation
- AS_CONFED_SET: Unordered set of member autonomous system
numbers
128
Confederation Operation
129
From BGP to Packet
Forwarding Decisions
• Recursive lookup at Router 1.1.1.1
– BGP routing table identifies Router 1.1.5.1 as the
NEXT_HOP for route r.
– IGP routing table identifies interface 10.2.1.1 on Router
1.1.2.1 as the next hop towards Router 1.1.5.1.
– As a result, the forwarding table entry for route r points to 10.2.1.1 on
Router 1.1.2.1 as the next hop.
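The recursive lookup at Router 1.1.1.1 amounts to chaining two table lookups. A minimal sketch using the addresses from the slide, with the tables modeled as hypothetical dictionaries:

```python
# BGP routing table: route -> BGP NEXT_HOP (the exit router).
bgp_table = {"r": "1.1.5.1"}

# IGP routing table: destination router -> (next-hop interface, next-hop router).
igp_table = {"1.1.5.1": ("10.2.1.1", "1.1.2.1")}


def forwarding_entry(route):
    """Resolve the BGP NEXT_HOP through the IGP to get the actual next hop."""
    bgp_next_hop = bgp_table[route]
    return igp_table[bgp_next_hop]


if __name__ == "__main__":
    print(forwarding_entry("r"))
```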
[Figure: Routers 1.1.1.1, 1.1.2.1, 1.1.3.1, 1.1.4.1, and 1.1.5.1 in AS 1 (IGP, iBGP), interface 10.2.1.1, neighboring AS 2 and AS 3, route r]
130
End-to-End Connectivity
Gluing BGP and IGP Decisions Together
• Two cases
1. All routers are BGP speakers (BGP mesh, common in ISPs).
2. Some internal routers do not speak BGP.
• Case 1: BGP mesh
– Forwarding table can be constructed simply based on
recursive lookup.
- IGP provides connectivity between routers.
- BGP associates routes to routers.
• Case 2: Mix of BGP speakers and IGP-only routers
– BGP speakers participate in IGP.
– BGP speakers “export” routes into IGP.
- Example of OSPF ASBRs
131
From Routing Table
to Forwarding Table
• OK, we got to Router 1.1.2.1. Where to next?
– Case 1: BGP full or partial mesh
- Routers 1.1.2.1, 1.1.3.1, 1.1.4.1 also participate in iBGP.
– Partial mesh means that only those routers on the path
between 1.1.1.1 and 1.1.5.1 need to participate in BGP.
– Dangerous (why?) but not uncommon (why?)
- They all know that 1.1.5.1 is the desired exit point and
can forward packets.
[Figure: Routers 1.1.1.1, 1.1.2.1, 1.1.3.1, 1.1.4.1, and 1.1.5.1 in AS 1, interface 10.2.1.1, neighboring AS 2 and AS 3, route r]
132
From Routing Table
to Forwarding Table
• OK, we got to Router 1.1.2.1. Where to next?
– Case 2: BGP routes imported into IGP, e.g., OSPF
- Routers 1.1.1.1 and 1.1.5.1 are ASBRs.
- Router 1.1.5.1 advertises a type 1/2 external route r.
- Routers 1.1.2.1, 1.1.3.1 and 1.1.4.1 learn about r through
a type 5 External LSA advertised by 1.1.5.1.
- Router 1.1.1.1 learns about r through both BGP and
OSPF (consistency, precedence?)
[Figure: Routers 1.1.1.1, 1.1.2.1, 1.1.3.1, 1.1.4.1, and 1.1.5.1 in AS 1, interface 10.2.1.1, neighboring AS 2 and AS 3, route r, type 5 LSA T5: < r >]
133
Forwarding Table Challenges
134
Impact of Protocol Dependencies
[Figure: router A with ports 1 and 3 and IGP costs d1 and d2]
135
Impact of Protocol Dependencies
[Figure: router A reaches exit routers B and C (300k routes each) over ports 1, 2, and 3, with IGP costs d’1 and d2]
• BGP tells A that B and C can both reach the Internet
– IGP costs to B and C are the tie-breakers with d1<d2
– B is the selected exit point to reach the Internet through port #1 on A
• Internal link failure affects path from A to B
– Exits through port #2 with IGP cost d’1<d2
• A needs to step through full BGP table to determine that IGP change
did not affect BGP decision (d’1<d2)
• A needs to update all 300k entries in forwarding tables to point to new
forwarding next hop for B, now reachable over port #2
– A better option: Recursive lookup
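The benefit of recursive lookup can be illustrated by counting table writes after the IGP change. This is a sketch under simplifying assumptions: prefixes are opaque strings, and both class and function names are hypothetical.

```python
def build_flat(prefixes, port):
    """Flat table: every prefix points directly at an outgoing port."""
    return {p: port for p in prefixes}


def flat_igp_change(fib, new_port):
    """Every prefix entry must be rewritten; returns the number of writes."""
    writes = 0
    for prefix in fib:
        fib[prefix] = new_port
        writes += 1
    return writes


class RecursiveFib:
    """Prefixes point at an exit router; a second table maps exit router to port."""

    def __init__(self, prefixes, exit_router, port):
        self.prefix_to_exit = {p: exit_router for p in prefixes}
        self.exit_to_port = {exit_router: port}

    def lookup(self, prefix):
        return self.exit_to_port[self.prefix_to_exit[prefix]]

    def igp_change(self, exit_router, new_port):
        """Only the shared exit-router entry changes; returns the number of writes."""
        self.exit_to_port[exit_router] = new_port
        return 1
```

With 300k prefixes behind B, the flat table needs 300k writes when B moves from port #1 to port #2, while the two-level structure needs one.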
136
Recursive Forwarding Structure
[Figure: 300k prefixes pointing to 10’s of Next_Hops]
My Blue/Green Virtual Networks
[Figure: CE, PE, and P routers connecting sites R11, R12, R13, R14, R21, R22, R23, and R24]
140
VPN Definition and Scope
141
BGP/MPLS VPNs
142
VPN Terminology and Configurations
[Figure: CE, PE, and P routers connecting sites R11, R12, R13, R14, R21, R22, R23, and R24]
• Three types of routers
– Provider (P) or backbone only routers
– Provider Edge (PE) routers interface to customer sites
– Customer Edge (CE) routers attach to Service Provider routers
• P and PE routers form the Service Provider network
• CE routers belong to customer VPNs
– But do not peer directly with each other (they peer with PE routers)
• Sample VPN Configurations
– VPN1 and VPN2 intranets
- VPN1: R11, R12, and R13
- VPN2: R21, R22, and R23
– VPN3 extranet
- VPN2 sites connect to servers at R11 and R12 through firewall at R13
- VPN3: R21, R22, and R23 connect to R11 and R12 through R13
143
VPN Forwarding Overview
145
BGP Extensions in Support of VPNs
146
Multiprotocol Extensions to BGP
147
MP_REACH_NLRI
Reserved – 1 byte
Network Layer Reachability Information (NLRI) –
Variable
148
Specifying IPv4-VPN Routes
• AFI/SAFI field identifies the network layer protocol to which the NEXT_HOP address belongs, and specifies the NLRI semantic
• VPN-IPv4 address family
– AFI=1 (IP) and SAFI=128 for labeled VPN-IPv4 addresses
– Address format: 12-byte quantity
- 8-byte Route Distinguisher (RD) + 4-byte IPv4 address/prefix
• Route Distinguisher (2-byte type field, 6-byte value)
– Three defined types
- Type 0: 2-byte administrator subfield (AS number), 4-byte number field administered by AS owner
- Type 1: 4-byte administrator subfield (IP address), 2-byte number field administered by IP address owner
- Type 2: 4-byte administrator subfield (4-byte AS number), 2-byte number field administered by AS owner
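A sketch of packing the three Route Distinguisher types and a VPN-IPv4 address with Python's `struct` module. Each RD is 8 bytes (2-byte type plus 6-byte value), so Type 1 pairs a 4-byte IP address with a 2-byte number. Function names and sample values are hypothetical.

```python
import socket
import struct


def rd_type0(asn, number):
    """Type 0: 2-byte AS number + 4-byte assigned number."""
    return struct.pack("!HHI", 0, asn, number)


def rd_type1(ip, number):
    """Type 1: 4-byte IPv4 address + 2-byte assigned number."""
    return struct.pack("!H4sH", 1, socket.inet_aton(ip), number)


def rd_type2(asn4, number):
    """Type 2: 4-byte AS number + 2-byte assigned number."""
    return struct.pack("!HIH", 2, asn4, number)


def vpn_ipv4(rd, ipv4):
    """12-byte VPN-IPv4 address: 8-byte RD followed by the 4-byte IPv4 address."""
    return rd + socket.inet_aton(ipv4)


if __name__ == "__main__":
    addr = vpn_ipv4(rd_type0(65000, 1), "10.1.0.0")
    print(len(addr), addr.hex())
```

Two customers that both use 10.1.0.0/16 end up with distinct VPN-IPv4 addresses as long as their RDs differ.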
149
NLRI Encoding
150
Populating VRFs
151
Routing Information Flow
152
Forwarding Information Flow
153
Route Distribution Through Reflectors
154
Sample VPNs – Closed Mesh (1)
[Figure: CE1 to CE4 attached to PE1 to PE4 around a P core; site prefixes labeled 10.0.1.0/16, 10.0.2.0/16, 10.3.0.0/16, and 10.0.0.0/16]
• VPN1: 4 fully inter-connected sites
• Basic configuration at PEs
– RD1 value identifies VPN1
– RT value of T1 for all VRF1 export and import policies
• VRF construction at PE1 (VRF1)
– Learns route 10.1.0.0/16 from CE1, and installs in VRF1
– Exports <RD1,10.1.0.0/16;T1;L1;PE1> to BGP (Next_Hop self (PE1) and label L1)
– Advertises <RD1,10.1.0.0/16;T1;L1;PE1> to PE2, PE3 and PE4
– Receives the following and installs them in VRF1:
- <RD1,10.0.0.0/16;T1;L0;PE4> from PE4
- <RD1,10.3.0.0/16;T1;L3;PE3> from PE3
- <RD1,10.2.0.0/16;T1;L2;PE2> from PE2
155
Sample VPNs – Closed Mesh (2)
[Figure: CE1 to CE4 attached to PE1 to PE4 around a P core; site prefixes labeled 10.0.1.0/16, 10.0.2.0/16, 10.3.0.0/16, and 10.0.0.0/16]
• Forwarding of packet to 10.0.0.1 from PE1
– Packet received from CE1
– Lookup in VRF1 at PE1
- 10.0.0.0/16 as best route with Next_Hop of PE4
– Packet sent as MPLS packet with label stack of <L(PE4),L0>
– Packet to PE4 delivered through MPLS backbone based on label L(PE4)
– PE4 pops label stack to expose L0
- L0 identifies CE4 as packet destination
– PE4 forwards packet to CE4 as standard IP packet (removes L0)
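The forwarding steps can be sketched with labels modeled as strings. The VRF contents follow the routes PE1 receives in the closed-mesh example; the table and function names are hypothetical, and the MPLS backbone is reduced to removing the outer label.

```python
import ipaddress

# VRF1 at PE1: prefix -> (egress PE, VPN label), as learned through BGP.
VRF1_AT_PE1 = {
    "10.0.0.0/16": ("PE4", "L0"),
    "10.3.0.0/16": ("PE3", "L3"),
    "10.2.0.0/16": ("PE2", "L2"),
}

# Outer (tunnel) label used to reach each egress PE across the backbone.
TUNNEL_LABEL = {"PE2": "L(PE2)", "PE3": "L(PE3)", "PE4": "L(PE4)"}

# At PE4, the VPN label identifies the attached CE.
PE4_VPN_LABELS = {"L0": "CE4"}


def pe1_ingress(dest):
    """Longest-prefix match in VRF1, then push <tunnel label, VPN label>."""
    addr = ipaddress.ip_address(dest)
    matches = [
        (ipaddress.ip_network(prefix), entry)
        for prefix, entry in VRF1_AT_PE1.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    _, (egress_pe, vpn_label) = max(matches, key=lambda m: m[0].prefixlen)
    return {"stack": [TUNNEL_LABEL[egress_pe], vpn_label], "dest": dest}


def pe4_egress(packet):
    """Pop the outer label, use the exposed VPN label to pick the CE, send plain IP."""
    vpn_label = packet["stack"][1]
    return PE4_VPN_LABELS[vpn_label], packet["dest"]
```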
156
Sample VPNs – Hub and Spoke
[Figure: CE1 to CE4 attached to PE1 to PE4 around a P core; site prefixes labeled 10.0.1.0/16, 10.0.2.0/16, 10.3.0.0/16, and 10.0.0.0/16]
• VPN2: All connectivity through CE1
• Basic configuration at PEs
– RD1 value identifies VPN2
– Two route targets are defined: TH (hub) and TS (spoke)
– At the VRFs attached to the hub site (PE1), TH is the Export target and TS the Import target
– At the VRFs attached to the spoke sites (PE2, PE3, and PE4), TS is the Export target and TH the Import target
• VRFs construction
– PEs associated with spoke sites
- Receive routes from their CEs and export them to PE1 with target TS
- Receive routes from PE1 with target TH and import them in the VRF of their CEs
– PE1
- Receives routes from spoke PEs with target TS and installs them in CE1’s VRF
- Exports routes (back) to spoke PEs with target TH
157
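The hub-and-spoke behavior follows from matching export targets against import targets. A minimal sketch using the TH/TS configuration above; the `VRFS` table and `importers` function are hypothetical names.

```python
# Route target configuration per PE, as in the hub-and-spoke example.
VRFS = {
    "PE1": {"export": {"TH"}, "import": {"TS"}},   # hub
    "PE2": {"export": {"TS"}, "import": {"TH"}},   # spokes
    "PE3": {"export": {"TS"}, "import": {"TH"}},
    "PE4": {"export": {"TS"}, "import": {"TH"}},
}


def importers(exporting_pe):
    """PEs whose VRF imports at least one target attached by the exporting PE."""
    exported = VRFS[exporting_pe]["export"]
    return sorted(
        pe for pe, cfg in VRFS.items()
        if pe != exporting_pe and cfg["import"] & exported
    )


if __name__ == "__main__":
    for pe in sorted(VRFS):
        print(pe, "->", importers(pe))
```

Spoke routes are imported only at PE1, so spoke-to-spoke traffic has no direct route and must pass through the hub site. Giving every VRF the same import and export target (T1 in the closed-mesh example) yields full connectivity instead.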