
IP Routing

A Router’s Job
• So I got a packet, where to next?
– Which “next hop”?
- Out-going router interface to use when forwarding traffic to the
destination. May also include the IP address of the next router (if
any) in the path towards the destination.
[Figure: a packet with destination address 83.125.35.131 arrives at a router, which must choose among several outgoing interfaces]

The Lookup Operation
• Two basic pieces
1. Forwarding table
2. Longest prefix match rule
• Forwarding table
– List of all the routes (destinations I know) and their associated next hops
• Longest prefix match
– For any destination address, find the route whose prefix matches the destination address over the most leading bits
- More specific is better!

Example forwarding table, used to look up destination 83.125.35.131:

Route               Next Hop
155.23.0.0/16       3
155.23.14.0/24      4
162.26.0.0/16       1
112.0.0.0/8         2
183.30.0.0/16       7
83.0.0.0/8          3
83.125.0.0/16       11
83.125.35.0/24      9
83.125.35.128/25    12
193.2.0.0/16        1
0.0.0.0/0           2

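The longest prefix match rule can be sketched in a few lines of Python. This is only an illustration using a linear scan over the example table; real routers use tries or TCAMs, and the names `FORWARDING_TABLE` and `lookup` are made up for this sketch.

```python
import ipaddress

# Example forwarding table from the slide: prefix -> next hop (interface number).
FORWARDING_TABLE = {
    "155.23.0.0/16": 3,
    "155.23.14.0/24": 4,
    "162.26.0.0/16": 1,
    "112.0.0.0/8": 2,
    "183.30.0.0/16": 7,
    "83.0.0.0/8": 3,
    "83.125.0.0/16": 11,
    "83.125.35.0/24": 9,
    "83.125.35.128/25": 12,
    "193.2.0.0/16": 1,
    "0.0.0.0/0": 2,
}


def lookup(dest, table=FORWARDING_TABLE):
    """Return (matching prefix, next hop) for the longest matching prefix."""
    addr = ipaddress.ip_address(dest)
    best = None
    for prefix, hop in table.items():
        net = ipaddress.ip_network(prefix)
        # Keep the matching route with the largest prefix length.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return (str(best[0]), best[1]) if best else None


print(lookup("83.125.35.131"))  # the most specific of the five matching routes
```

Five routes match 83.125.35.131 (the /0, /8, /16, /24 and /25 entries); the /25 is the most specific, so its next hop is the one used.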
So Where Does Routing Fit in All Of This?

• Routing builds the forwarding table.
– Manual entries (static routes) take you only so far.
– You need a process that:
- Automates populating the routing table.
- Adapts to changes.
• Routing protocols are responsible for:
– Distributing known routes and the latest routing changes to a
router’s neighbors.
– Deciding the next hop(s) to be used for each route.
- What’s the best choice?
- Ensuring consistency between decisions at different
routers (no loops!)

Key Problem
• How to make correct local decisions?
– each router/switch must know something about global state
• Global state
– inherently large
– dynamic
– hard to collect
• A routing protocol must intelligently acquire,
summarize, and maintain relevant information
– How do I find out about other routers and links?
– How do I use that information to generate routes?
– How do I maintain routes in the presence of changes?

Design Direction
• Designing a single solution that spans the entire
Internet is unlikely to work
– Computational cost, protocol and bandwidth overhead
• Heterogeneous environment
– Domain sizes and requirements, router capabilities
– Different constraints on internal and external connectivity
• Basic design approach
– Hierarchy of domains (reflects address hierarchy)
- Ensures scalability
– Independence of routing protocols in different domains
- Support for heterogeneity
– Gateways between domains for end-to-end solution

A Bird’s Eye View of the Internet
• Basic “two-level” hierarchy
– Federation of interconnected islands: Autonomous Systems (AS)
- Each island has its own internal rules.
- Islands collaborate to offer end-to-end connectivity.
• Some islands are bigger and more powerful than others.
– Willingness to carry traffic for others
- Peering or transit agreements
[Figure: interconnected Autonomous Systems AS 1, AS 2, AS 3, AS 121, AS 123, AS 168, AS 321, AS 376, AS 441, AS 524, and AS 3411]

Delivering Ubiquitous Connectivity
• Basic principles
– No global knowledge
– Hop-by-hop decisions
• Map analogy
– Detailed map of your neighborhood
– Coarse knowledge of how to exit your neighborhood to reach remote destinations
– On-going validation of map each time you reach an intersection
- It could change…
[Figure: AS-level topology of interconnected Autonomous Systems]

Routing Protocol Overview
• Routing protocols follow the two-level hierarchy of the Internet
– Interior Gateway Protocols (IGP) control routing within an AS/domain
– Exterior Gateway Protocols (EGP) control routing between ASes
• Different goals and constraints for each family of protocols
– IGP: Ability to fine tune internal operation and shielding from outside “noise”
– EGP: Scalability and ability to accommodate a broad range of administrative policies
[Figure: AS-level topology of interconnected Autonomous Systems]

Interior Gateway Protocols
• Routing protocols follow the two-level hierarchy of the Internet
– Interior Gateway Protocols (IGPs) control routing within an AS/domain
– Exterior Gateway Protocol(s) (EGP) control routing between ASes
• Different goals and constraints for each family of protocols
– IGP: Ability to fine tune internal operation and shielding from outside “noise”
– EGP: Scalability and ability to accommodate a broad range of administrative policies
[Figure: AS-level topology of interconnected Autonomous Systems]

Protocol Design Goals and Requirements
• Minimize routing table space
– Faster look-up (although forwarding table can be more compact)
– Less to exchange (lower processing and bandwidth overhead)
– Lower storage cost
• Minimize number and frequency of control messages
– Lower processing and bandwidth overhead
– But need to take responsiveness to changes into account
• Robustness
– Avoid black holes (unable to reach destination)
– Prevent or recover from loops (inconsistent decisions among routers)
– Limit instability (oscillation between possible routes)
• Optimize use of network resources and overall
performance
– Best possible and/or most efficient path

General Design Choices
• Where are routes computed?
– Centralized vs. distributed
- Centralized is simpler but may not scale and more prone to failure
- Distributed requires collaboration between routers and has more
complex transient behavior
• How are routes computed?
– Distributed computations vs. distribution of information from
which routes can be computed
- Distance vector: routers exchange results of computations
– Routing table and costs
- Link state: routers exchange information on which computations
can be performed independently
– Topological and reachability information
• What criteria are used when computing routes?
– What metrics to optimize (hop count, bandwidth, delay, etc.)?
– Static vs. dynamic metrics?

IGPs’ Job Description
• Ensure consistent and efficient (optimized) forwarding for destinations within a routing domain.
• Enable selection of appropriate exit points for destinations outside the domain.
• Two major families of protocols
– Distance vector, examples:
– Routing Information Protocol version 2 (RIPv2)
– Enhanced Interior Gateway Routing Protocol (EIGRP), Cisco proprietary
- Distributed computation
– Link state, examples:
– Open Shortest Path First (OSPF)
– Intermediate System to Intermediate System (IS-IS)
- Distributed information

Link State (LS) Routing

• Common feature with DV protocols
– Relies on communications with neighbors
– Supports destination-based shortest path forwarding
• Everything else is different: Distributing information versus distributing computations
– Type of information exchanged between neighbors
- Topology and link costs vs route costs
- Used to build a common domain map in all routers
– Route computations
- Independently performed in each router vs distributed
• Two major components of LS protocols
– Topology dissemination and maintenance (flooding)
– Route computation algorithm executed in each router

OSPF Overview
• Two level hierarchy
– Backbone area (area 0) connects other areas (hub and spoke).
– Costs are assigned to internal links and external routes for fine
tuning of traffic distribution.
• Link state operation
– Routers broadcast inside their area knowledge of their local
neighborhood.
- Routers glue local neighborhood pieces to create an area map.
– Information about other areas is summarized and broadcast into an
area by Area Border Routers (ABRs).
– Autonomous System Boundary Routers (ASBRs) are responsible for
injecting information on how to reach external destinations.
• Routing table construction
– Routers rely on their domain map and summary information about
other areas and the outside world to compute consistent best paths
to known destinations.
– Multiple paths can exist for any given route.

A Day in the Life of an OSPF Router
1. Establish adjacency to neighbor routers through HELLO protocol.
2. Build and maintain topology database.
a. Synchronize databases with neighbors.
b. Advertise router’s local “neighborhood” to others.
c. Process and forward advertisements received from neighbors.
[Side label next to steps 2b and 2c: Change notification]
3. Compute routing table.
a. Intra-area destinations (routers and transit networks and then stub networks)
b. Inter-area destinations
c. External destinations
4. Begin forwarding packets.

OSPF HELLO Protocol (1)

• HELLO protocol is a liveness protocol between adjacent routers.
– HELLO packets are periodically exchanged.
• HELLO protocol serves multiple purposes:
– Dynamic discovery of neighbors
– Advertising of router attributes
– Identification of Designated Router (DR) and DR election (more on this later)
– Detection of link and router failures
- Frequency of advertising varies with network technology (from 10 sec to 30 sec).

Database Synchronization (2a)
• Each router maintains a database that is used to
store a “complete” map of the network (more on how
this map/database is actually constructed later on).
• When routers boot, or the link between routers
comes up, routers perform database synchronization.
– The goal is to quickly ensure a common view of the world.
- Synchronization process determines what each router
knows and does not know and which router has the most
recent information.
– Database entries are characterized by sequence number and
age.
- Only newer or unknown entries are exchanged.
• Neighbor routers end up with a common domain map.

Database Synchronization
• During partition, routers’ databases can become unsynchronized.
[Figure: eight routers in two groups (Routers 1 to 4 and Routers 5 to 8) joined through Routers 4 and 5, with three numbered link failures]
• After failure 1, network is partitioned.
– Routers 1 to 4 don’t know about failure 2.
– Routers 5 to 8 don’t know about failure 3.
• When routers 4 and 5 reconnect, database synchronization is
required to ensure consistent views.
• Databases are described by database descriptor records
exchanged between master and slave.
– Routers on each side of a newly restored link talk to each other to
update databases (determine missing and out-of-date pieces).

Building the Topology Database (2b/c)
• OSPF router advertises itself and its view of its
neighborhood with Link State Advertisements (LSAs).
– Puzzle piece from which the full map can be built
- Router ID identifies the originator of each puzzle piece
– Several types of advertisements (in OSPF)
- Multiple LSA types keep LSA size small (40 bytes on
average for OSPF).
- Small granularity ensures updates of minimal size.
- Different types of LSAs can have different scopes of
distribution.
– Link local, area local, domain-wide
• Router forwards LSAs it originates or receives from its
neighbors (flooding process).
• Sum of all received LSAs makes up a router’s topology
database.
– Common network map shared by all routers in an area

LSA Distribution: Flooding
• New LSAs are forwarded on all eligible (type-dependent)
links except the link on which they were received (if
applicable).
– Dissemination of information is independent of routing.
– Each LSA is transmitted at most twice on each link.
• New LSAs replace previous ones in local database.
• LSAs are transmitted reliably (acknowledged).
• Flooding rate bounded through minimum gap between
LSAs
• Periodic (30 min.) flooding to refresh database (aging of
LSAs)
– Why is periodic refreshing necessary if transmission is reliable?
• All routers end up with a “complete” domain map.

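The forwarding rule above (send a new LSA on every link except the one it arrived on) can be simulated with a small sketch. The topology and the function name `flood` are hypothetical, and acknowledgments, retransmissions, and LSA-type scoping are deliberately left out.

```python
from collections import deque


def flood(links, origin):
    """Simulate flooding of one new LSA; return (routers reached, sends per link)."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    have = {origin}                      # routers that already hold the LSA
    sent = {}                            # undirected link -> number of transmissions
    queue = deque((origin, n) for n in sorted(neighbors[origin]))
    while queue:
        src, dst = queue.popleft()
        link = frozenset((src, dst))
        sent[link] = sent.get(link, 0) + 1
        if dst in have:
            continue                     # not new to dst: it is not forwarded again
        have.add(dst)
        # Forward on all links except the one the LSA was received on.
        for n in sorted(neighbors[dst]):
            if n != src:
                queue.append((dst, n))
    return have, sent


# Hypothetical four-router square with one diagonal.
LINKS = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("A", "C")]
reached, per_link = flood(LINKS, "A")
print(sorted(reached), max(per_link.values()))
```

Because each router forwards a given LSA only once, each link carries it at most once per direction, which is where the "at most twice on each link" bound comes from.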
Flooding Example
[Figure: three successive snapshots of an LSA being flooded across the network]

LSA Sequence Numbers

• Used to identify most recent update.
– Greater sequence number is newer.
– Newer LSA replaces old one.
• Two problems to deal with:
– Wrap around of sequence number counter
- After wrap-around, a smaller sequence number is the newer one.
– Choice of number at boot-up time
- Need to be able to overwrite previous LSAs

OSPF Sequence Numbering
and LSA Refreshes
• 32-bit sequence number (signed integer)
– InitialSequenceNumber of -N+1
– Increment by 1 (up to N-1) for each new update
• Wrap-around handled by premature aging of entry
before flooding new value (when reaching N-1)
– Flood LSA with age of MaxAge (1 hour) to purge LSA
– Send new LSA with sequence number of -N+1
– Rare event (over 100,000 years with a 30 min. refresh period)
• Receiving a self-originated LSA with a larger sequence
number than last transmitted LSA
– Caused by presence of residual LSA originated prior to last
router restart
– Jump to one more than received number and flood again
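A minimal sketch of the sequence number rules above, assuming N = 2^31 (a signed 32-bit counter). The function names are illustrative, and the actual flooding of the MaxAge LSA is reduced to a boolean flag.

```python
N = 2**31
INITIAL_SEQ = -N + 1          # first sequence number used for an LSA
MAX_SEQ = N - 1               # last value before wrap-around


def next_seq(current):
    """Return (next sequence number, purge_first).

    purge_first is True when the current LSA must first be flushed by
    flooding it with age MaxAge, because the counter is about to wrap.
    """
    if current == MAX_SEQ:
        return INITIAL_SEQ, True
    return current + 1, False


def on_self_originated(received, last_sent):
    """React to a copy of our own LSA that arrives from a neighbor."""
    if received > last_sent:
        # Residual LSA from before the last restart: jump past it.
        return next_seq(received)
    return last_sent, False


# Time to exhaust the counter with one new LSA instance every 30 minutes.
YEARS_TO_WRAP = (MAX_SEQ - INITIAL_SEQ) * 0.5 / (24 * 365)
print(round(YEARS_TO_WRAP))
```

The computed time to wrap is well above the 100,000 years quoted on the slide, which is why wrap-around is treated as a rare event.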

Aging of LSAs
• LSA has an age field.
– Incremented at each transmission and while stored
– MaxAge determines time-to-live
- Gets reset at each refresh
• Two uses of MaxAge: Flooding of MaxAge LSA
– Removal of timed-out or invalid LSAs
– Removal of current entry before counter wrap-around
• Impact of MaxAge on boot-up problem
– If router waits for MaxAge, old LSAs will be purged.
– Selection of MaxAge is a difficult choice.
- Small MaxAge minimizes boot-up latency.
- Small MaxAge imposes higher flooding overhead and may prevent
full LSA distribution in large networks.
• Initial database synchronization handles boot-up.

Securing LSA Databases

• Consistency of LSA databases is critical to avoid routing loops.
• Need to protect against multiple error scenarios:
– Link errors
- LSA checksum and acknowledgment to ensure reliable transmissions
– Injection of spurious LSAs (errors or malice)
- Support for authentication capability in LSAs (OSPFv2)

What Are Link State Advertisements?
• Five basic types of LSAs:
– Router LSA (intra-area): A router’s neighborhood
– Network LSA (intra-area): Connectivity through a broadcast network
– T3 Summary LSA (inter-area): Reachability to a route in another area
– T4 Summary LSA (inter-area): Reachability to an ASBR in another area
– T5 LSA (external): AS-external LSA to a route in another AS
• Several additional “special” LSA types:
– T7 LSA: Support of “not-so-stubby area” (NSSA)
– Opaque LSAs: Support extensibility of functionality:
- Type 9: Link local
- Type 10: Area local
- Type 11: Throughout whole AS

What Is a Router LSA?
• Router LSA advertises to other routers in my “area” what my
neighborhood looks like
– What kind of links, to which neighbors, at what cost?
- Point-to-point, point-to-multipoint, broadcast, etc.
– What set of local destinations can I reach?
- Stub networks and transit networks
– What kind of router am I?
- Internal, Area Border Router, AS Boundary Router

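[Figure: a router with two stub networks (19.2.4.0/24 and 19.2.5.0/24), a point-to-point link, and an interface on transit network 19.2.0.0/16, which has a Designated Router; link costs are marked on each interface]
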
What Is a Network LSA?
• Network LSA identifies all the routers attached to
the same broadcast network.
– Advertised by Designated Router (DR) that is chosen
through an election process.
– Backup DR is also elected and takes over if the DR fails.
– All routers form full adjacencies (and synchronize their databases) with only the DR and the backup DR.
[Figure: transit network 19.2.0.0/16 with its Designated Router and Backup DR]

Why A Network LSA?
• Multiple routers attached to broadcast LAN
– They can all reach each other.
• Direct advertising of complete router connectivity is expensive.
– Every router specifies connectivity to N-1 routers.
– ~N^2 state overhead (bandwidth and storage)
• Network LSA provides more compact advertising
– Single copy of full router connectivity is sent by designated router.
– Backup DR minimizes sensitivity to DR stability.
[Figure: routers A to G on a broadcast LAN, drawn first as a full mesh of router-to-router links and then as a star in which every router attaches to a single transit network node T]

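The saving can be made concrete with a rough count of advertised link entries. This is a back-of-the-envelope sketch, assuming one entry per advertised neighbor and ignoring LSA headers; the function names are made up for illustration.

```python
def full_mesh_entries(n):
    """Every router lists its N-1 neighbors on the LAN."""
    return n * (n - 1)


def network_lsa_entries(n):
    """Every router lists one link to the transit network, and the DR's
    Network LSA lists the N attached routers."""
    return n + n


# Seven routers (A to G), as in the figure.
print(full_mesh_entries(7), network_lsa_entries(7))
```

The first count grows quadratically with the number of routers on the LAN, while the second grows linearly.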
Building an Area Topology Map
• Putting the puzzle pieces together
1. Router LSA from R1
– 2 stub interfaces (19.2.4.0/24 and 19.2.5.0/24)
– 1 transit interface (19.2.0.0/16)
– 1 pt-to-pt link (to R2)
2. Network LSA from DR R3
– R1, R3, R4: attached routers
3. Router LSA from R2
– 2 pt-to-pt links
– 1 stub interface (19.2.6.0/24)
4. Router LSA from R4
– 1 pt-to-pt link
– 1 transit interface
– 2 stub interfaces (19.2.6.0/24 and 19.2.7.0/24)
• Note: Stub network 19.2.6.0/24 is dual-homed, but packets won’t “transit” through it.
- No HELLO protocol packets sent by R2 and R4 on those interfaces (passive interface config.)
5. Router LSA from R3
– 1 transit interface
– 1 stub interface (19.2.7.0/24)
• Core Network Graph: routers R1 to R4 and transit network T
6. Final Network Graph
– Stub networks are added
[Figure: the area map grows with each LSA, ending with routers R1 to R4, transit network T (19.2.0.0/16, Designated Router R3), stub networks 19.2.4.0/24, 19.2.5.0/24, 19.2.6.0/24, and 19.2.7.0/24, and a cost on every link]

Computing the Intra-Area
Routing Table (3a)
• Basic setting
– Topology database has stabilized.
– Router computes shortest paths from itself to all routers and transit
networks and then stub networks.
• Path computation based on Dijkstra algorithm
– Maintain two sets of nodes
- Nodes to which the shortest path is known (set S)
- Nodes to which candidate shortest paths are known (set C)
- Initially only the origin router is in set S
– Iterate and at each iteration:
- Consider all neighbors of last node X added to S and add them to C if
they are not already in it.
- Update candidate paths of all neighbors of X if path through X is shorter
than their current path.
- If C is empty, the algorithm terminates.
- Otherwise add to S the node in C that is “closest” to the origin router (smallest candidate path cost) and iterate.

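The iteration above can be sketched with a priority queue standing in for the candidate set C. The graph below lists only the edges that matter for R1's computation in the running example; the R4-to-T cost of 6 is inferred from the worked trace rather than read directly, so treat it as an assumption. The `networks` argument and the tie-break that pops transit networks first are illustrative choices.

```python
import heapq


def shortest_paths(graph, source, networks=frozenset()):
    """Return (cost, next hops) to every reachable node, keeping equal-cost paths."""
    dist = {source: 0}
    next_hops = {source: set()}
    in_s = set()                               # set S: shortest path is final
    heap = [(0, 1, source)]                    # set C, ordered by (cost, kind, name)
    while heap:
        d, _, x = heapq.heappop(heap)
        if x in in_s:
            continue                           # stale entry
        in_s.add(x)
        for y, cost in graph.get(x, {}).items():
            if y in in_s:
                continue
            nd = d + cost
            # The first hop is the neighbor itself when leaving the source.
            hops = {y} if x == source else next_hops[x]
            if y not in dist or nd < dist[y]:
                dist[y] = nd
                next_hops[y] = set(hops)
                # Transit networks (kind 0) are taken before routers on ties.
                heapq.heappush(heap, (nd, 0 if y in networks else 1, y))
            elif nd == dist[y]:
                next_hops[y] |= hops           # equal-cost path: merge next hops
    return dist, next_hops


# Directed link costs, as seen from each advertising node (partly assumed).
GRAPH = {
    "R1": {"R2": 1, "T": 12},
    "R2": {"R1": 1, "R4": 5},
    "R4": {"T": 6},
    "T": {"R1": 0, "R3": 0, "R4": 0},
}
dist, hops = shortest_paths(GRAPH, "R1", networks={"T"})
print(dist, hops)
```

With these costs the transit network T is reachable at cost 12 both directly and through R2, so both next hops are kept.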
Dijkstra Shortest Path Computation
• What intra-area routing table does router R1 come up with, and how does it do it?
[Figure: core network graph with routers R1 to R4, transit network T, and link costs]

Dijkstra’s Operation At R1
Entries are (node, cost, next hop(s)).
• S = {(R1,0,R1)};
C = {(R2,1,R2); (T,12,T); (R3,∞,*); (R4,∞,*)}
• S = {(R1,0,R1); (R2,1,R2)};
C = {(T,12,T); (R3,∞,*); (R4,6,R2)}
• S = {(R1,0,R1); (R2,1,R2); (R4,6,R2)};
C = {(T,12,T,R2); (R3,∞,*)}
• S = {(R1,0,R1); (R2,1,R2); (R4,6,R2); (T,12,T,R2)};
C = {(R3,12,T,R2)}
• S = {(R1,0,R1); (R2,1,R2); (R4,6,R2); (T,12,T,R2); (R3,12,T,R2)};
C = ∅
[Figure: core network graph with routers R1 to R4, transit network T, and link costs]

Adding Transit Networks First
• The order in which nodes are added to the labeled set can affect the number of paths discovered to some nodes. This is because once a node is added to the labeled set it is never revisited.
– If E is added first to set of labeled nodes, the path A-C-E of cost 2 is not discovered
– If C is added first to set of labeled nodes, the path A-B-E-C of cost 2 is not discovered
• In OSPF transit network nodes always have outgoing costs of 0, and therefore must be added first to the set of labeled nodes
[Figure: two small graphs on nodes A, B, C, and E with link costs of 0, 1, and 2, illustrating the two orderings]

Adding Stub Networks
• For each stub network:
– Identify all routers that advertise the stub network.
– Retrieve the shortest path to those routers.
– Add the cost of the shortest path to the router to the cost of the
stub network link advertised by each router in its Router LSA.
– Pick the router(s) that yield the smallest cost.
– Add the stub network to the routing table with the same next
hop(s) as the selected router(s).
• Four stub networks in previous example:
– 19.2.4.0/24 and 19.2.5.0/24 are directly connected to router R1.
– 19.2.6.0/24 is reachable from both R2 and R4, and R2 is the lower
cost option (total cost of 1+1=2 vs 1+5+2=8).
– 19.2.7.0/24 is reachable from both R3 and R4, and R4 is the lower
cost option (total cost of 1+5+1=7 vs 12+3=15).
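The stub network step can be sketched as follows, reusing the router costs that R1 obtains from the shortest path computation (R2 at 1, R4 at 6, R3 at 12) and the stub link costs implied by the sums above. The function name and the data layout are illustrative only.

```python
def add_stub_networks(dist, stub_links):
    """Map each stub prefix to (total cost, advertising routers with that cost)."""
    best = {}
    for router, prefix, link_cost in stub_links:
        total = dist[router] + link_cost       # path to router + advertised stub cost
        cost, routers = best.get(prefix, (float("inf"), []))
        if total < cost:
            best[prefix] = (total, [router])
        elif total == cost:
            routers.append(router)             # equal cost: keep both routers
    return best


# Shortest path costs from R1 to the routers advertising the stubs.
DIST_FROM_R1 = {"R2": 1, "R4": 6, "R3": 12}

# (advertising router, stub prefix, stub link cost from its Router LSA)
STUB_LINKS = [
    ("R2", "19.2.6.0/24", 1),
    ("R4", "19.2.6.0/24", 2),
    ("R4", "19.2.7.0/24", 1),
    ("R3", "19.2.7.0/24", 3),
]

print(add_stub_networks(DIST_FROM_R1, STUB_LINKS))
```

Each stub network then inherits the next hop(s) of the selected router, which for both prefixes here is the point-to-point link from R1 to R2.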

What Intra-Area Routing Table at R1?

Routes          Next Hop(s)
19.2.4.0/24     19.2.4.1 (IP address of local interface at R1)
19.2.5.0/24     19.2.5.1 (IP address of local interface at R1)
19.2.6.0/24     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
19.2.7.0/24     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
19.2.0.0/16     19.2.1.1 (IP address of local interface at R1)
                0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)

From One To Multiple Areas
• Why can’t we keep increasing the number of routers in an area?
– Topology database size and flooding overhead increase.
– Most importantly, route computation can become very onerous.
- Cost and frequency of Dijkstra increase.
• Basic solution is to partition a domain into multiple areas.
– Two-level hierarchy:
- Backbone area as a hub to which other areas connect
- Area Border Routers (ABRs) interconnect areas.
• Full topology is maintained only within an area.
– Flooding of router and network LSAs is limited to within an area.
– Dijkstra computation is limited to one area.
• Domain-wide shortest path computed using a DV-like approach
– ABRs advertise their cost to remote destinations.
– Shortest path is computed by concatenating costs to and from ABR.

Router’s View of a Multi-Area Domain
[Figure: the area map from the previous example, extended with an ABR that injects T3 summary LSAs, each with a cost, for 19.1.5.0/24, 19.1.6.0/24, 19.1.2.0/23, 19.1.0.0/16, and 19.3.0.0/16]

Generating Summary LSAs
• Summary LSAs advertise cost to routes or routers (ASBRs) in other areas.
• Area Border Routers (ABRs) are responsible for generating summary LSAs.
– ABRs advertise to other routers the results of their own shortest path computations for remote destinations (but within the same AS).
- Essentially a “distance vector” type of approach
[Figure: an ABR whose shortest path to 12.3.4.0/24 has cost 12 advertises T3: [12.3.4.0/24; 12] into the other area]

Computing Inter-Area Paths (3b)

• Two cost components (similar to handling of stubs):
1. Cost from ABR to remote destination (as advertised in corresponding T3 summary LSA)
2. Cost from source to ABR in the local area

• Path selection
– For all ABRs that advertise the target route (longest prefix
match), add the “cost to the ABR” to the “cost from that ABR
to the remote destination.”
– Pick the ABR(s) with the smallest total cost as the target exit
point(s) to reach remote destination.
– Set next hop(s) for remote destination to the next hop(s) of
the shortest path(s) to the selected ABR(s).

Example of Inter-Area Path Computation
[Figure: Routers 1 to 19 spread across Area 1, Area 0, and Area 2, with link costs; route r is located in Area 2]
• Step 1
– Router 14 advertises a T3 summary with cost of 3 for r into area 0.
– Router 13 advertises a T3 summary with cost of 5 for r into area 0.
• Step 2
– Router 4 advertises a T3 summary with cost of 13 (10+3) for r in area 1.
– Router 6 advertises a T3 summary with cost of 9 (6+3) for r in area 1.
• Step 3
– Router 1 identifies router 6 as the best exit point to reach r (4+9 < 4+13).
– Router 1 identifies router 3 and router 5 as its next hops to reach r.

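Step 3 of the example reduces to adding two numbers per ABR and taking the minimum. A minimal sketch, using the costs quoted in the steps above (Router 1 is at cost 4 from both ABRs); the function name `best_exit` is made up for this illustration.

```python
def best_exit(cost_to_abr, summary_cost):
    """Return (total cost, ABRs achieving it) for one remote route."""
    totals = {
        abr: cost_to_abr[abr] + summary_cost[abr]
        for abr in summary_cost
        if abr in cost_to_abr              # only ABRs reachable inside the area
    }
    best = min(totals.values())
    return best, sorted(abr for abr, total in totals.items() if total == best)


# Intra-area cost from Router 1 to each ABR of area 1.
COST_TO_ABR = {"Rtr 4": 4, "Rtr 6": 4}
# Cost for route r advertised by each ABR in its T3 summary LSA.
SUMMARY_COST_R = {"Rtr 4": 13, "Rtr 6": 9}

print(best_exit(COST_TO_ABR, SUMMARY_COST_R))
```

The next hops for r are then copied from the shortest path(s) to the selected ABR.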
What Intra & Inter-Area Routing Table at R1?
[Figure: R1's view of the multi-area domain, with T3 summary LSAs for 19.1.5.0/24, 19.1.6.0/24, 19.1.2.0/23, 19.1.0.0/16, and 19.3.0.0/16 injected by the ABR]

Intra and Inter-Area Routing Table at R1

Routes          Next Hop(s)
19.2.4.0/24     19.2.4.1 (IP address of local interface at R1)
19.2.5.0/24     19.2.5.1 (IP address of local interface at R1)
19.2.6.0/24     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
19.2.7.0/24     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
19.2.0.0/16     19.2.1.1 (IP address of local interface at R1)
19.1.5.0/24     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
19.1.6.0/24     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
19.1.2.0/23     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
19.1.0.0/16     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
19.3.0.0/16     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)


From Multiple Areas To The “World”
• How to reach destinations in other domains/AS?
• AS Boundary Routers (ASBR) are the gateways to the
outside world.
– They “summarize” the world in pretty much the same way as
ABRs summarize other areas.
– ASBRs advertise external routes (T5 LSA) that indicate their
ability to reach remote destinations.
- T5 LSAs are flooded throughout the AS.
– Reachability to ASBRs located in other areas is advertised by
ABRs through T4 LSAs. (Why needed?)
• Cost of AS-external LSAs can be of two types:
– Type 1 cost is compatible with the link costs within the AS.
– Type 2 cost is incompatible with the link costs within the AS and
trumps any internal cost.

A Router’s View of The “World”
[Figure: the multi-area view extended with ASBR R5, which advertises a T5 LSA for external route 0.0.0.0/0 labeled <2,20>; a T4 summary LSA advertises reachability to the ASBR, alongside the T3 summary LSAs for 19.1.5.0/24, 19.1.6.0/24, 19.1.2.0/23, 19.1.0.0/16, and 19.3.0.0/16]

Computing Paths to External Routes (3c)
• Two cost components (again like inter-area & stubs):
1. Cost from ASBR to external route
2. Cost from source to ASBR
• Path selection
– Smallest type 2 cost wins independent of internal cost
– If type 1 cost or equal type 2 cost
- Prefer non-backbone intra-area path to ASBR (or forwarding
address) and use cost to break ties
- If no such path, use cost to select path
– Cost computation
- If (equal) type 2 cost: Cost to ASBR
- If type 1 cost: Cost to ASBR + Cost from ASBR
- Cost to ASBR:
– Direct cost if within same area
– Cost to ABR plus T4 summary cost advertised by ABR (cost from
ABR to ASBR in remote area)
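The selection rules above can be expressed as a sort key. This is a sketch of the rules as stated on the slides (including the preference for type 1 over type 2 external paths), not a complete implementation of the RFC procedure. The class and field names are hypothetical, and the split of the example totals into "cost to ASBR" plus "external cost" (2+20 and 8+2) is an assumed reading of the example figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalPath:
    asbr: str
    cost_type: int                  # 1 or 2
    external_cost: int              # cost from ASBR to the external route
    cost_to_asbr: int               # internal cost from this router to the ASBR
    intra_area_non_backbone: bool   # path to ASBR stays in a non-backbone area


def sort_key(p):
    if p.cost_type == 1:
        # Type 1: path preference first, then cost to ASBR + cost from ASBR.
        return (1, 0, not p.intra_area_non_backbone, p.cost_to_asbr + p.external_cost)
    # Type 2: smallest external cost wins, then path preference, then cost to ASBR.
    return (2, p.external_cost, not p.intra_area_non_backbone, p.cost_to_asbr)


def select(paths):
    """Pick the preferred path; smaller keys are better."""
    return min(paths, key=sort_key)


# Router 1 in area 1: a local ASBR at total cost 22 versus a remote one at 10.
candidates = [
    ExternalPath("Rtr 5", 1, 20, 2, True),
    ExternalPath("Rtr 10", 1, 2, 8, False),
]
print(select(candidates).asbr)
```

A backbone router has no non-backbone intra-area path to prefer, so for it the comparison falls through to cost.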

Examples of External Path Computation
[Figure: Routers 1 to 19 spread across Area 1, Area 0, and Area 2, with link costs; ASBRs advertise external routes r and r’ with labels such as <1,20> and <1,2>, where <x,y> means <cost type, cost> for the external route]
• Router 1 selects router 5 to reach external route r in spite of its higher cost (22 instead of 10 through router 10), as it is in area 1
• Router 8 selects router 17 to reach external route r’, as router 8 is in the backbone area and uses cost to identify the best path

What Routing Table at Router R1?
[Figure: R1's view of the “world”, with ASBR R5 advertising a T5 LSA for 0.0.0.0/0 labeled <2,20>, a T4 summary LSA for the ASBR, and T3 summary LSAs for 19.1.5.0/24, 19.1.6.0/24, 19.1.2.0/23, 19.1.0.0/16, and 19.3.0.0/16]

Routing Table at R1

Routes          Next Hop(s)
19.2.4.0/24     19.2.4.1 (IP address of local interface at R1)
19.2.5.0/24     19.2.5.1 (IP address of local interface at R1)
19.2.6.0/24     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
19.2.7.0/24     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
19.2.0.0/16     19.2.1.1 (IP address of local interface at R1)
19.1.5.0/24     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
19.1.6.0/24     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
19.1.2.0/23     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
19.1.0.0/16     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
19.3.0.0/16     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
0.0.0.0/0       0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)


Summarizing OSPF Operation
[Figure: AS 123 with backbone Area 0 connecting Areas 1 to 4 through ABRs, plus an ASBR; routes r1, r2, and r3 are marked, with a legend distinguishing intra-area, inter-area, and external routes]

Packet Forwarding (4)
• For each known route, paths (next hops) are installed
in the final routing table as follows:
– Intra-area paths are preferred.
– Inter-area paths are next.
– Type 1 external paths follow.
– Type 2 external paths are the least preferred.
– Cost is used as a last resort to break ties.
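The installation order above amounts to comparing candidates by path type first and cost second. A simplified sketch, assuming each candidate is a (path type, cost, next hop) tuple; it ignores the extra rules for external routes described earlier, and all names are illustrative.

```python
from enum import IntEnum


class PathType(IntEnum):
    # Lower value means more preferred.
    INTRA_AREA = 0
    INTER_AREA = 1
    EXTERNAL_TYPE1 = 2
    EXTERNAL_TYPE2 = 3


def install(candidates):
    """Return the next hops installed for one route."""
    best_type, best_cost = min((ptype, cost) for ptype, cost, _ in candidates)
    # Keep every next hop that ties on both path type and cost.
    return sorted(
        hop for ptype, cost, hop in candidates
        if (ptype, cost) == (best_type, best_cost)
    )


candidates = [
    (PathType.INTER_AREA, 5, "B"),
    (PathType.INTRA_AREA, 20, "A"),
    (PathType.EXTERNAL_TYPE1, 1, "C"),
]
print(install(candidates))
```

Here the intra-area path wins even though it has the highest cost, because cost is only consulted among paths of the same type.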

• Forwarding table is constructed from the routing table.
• Upon receipt of a packet, a longest prefix match search is performed on the forwarding table to determine where to send the packet next.

In Case I Have Made It Look Too Simple!
• Understanding the flow of traffic in OSPF networks is not trivial.
• Many “exceptions” to the basic rule of using shortest paths. Some
of them are clear, some are less obvious.
– Intra-area paths have preference over inter-area paths.
- If I happen to hit a router with an intra-area route, it will divert my
packet from the intended exit point.
– ABRs ignore each other’s advertisements within a given non-
backbone area.
- Once an exit point is reached, there is no turning back.
– Address summarization can hide the best path.
- Advertised cost for summary route is max cost across all
individual routes (RFC 2328, Sec. 3.5).
– Choice of path to external route is not always cost driven.
- Prefer path to local ASBR except if in backbone area, where
selection is based on cost (needed to avoid routing loops in some
cases – See RFC 2178 and 2328 for details).

67
Example of Non-Shortest Path

[Figure: Routers 1-19 spread over Area 1, Area 0, and Area 2. Most links have cost 2; one link has cost 20 and one has cost 6.]

• ABRs ignore T3 Summary LSAs in their own area (except backbone)
– Shortest path from Router 2 to Router 8 has cost 6. (Router
6 is the intended exit point)
– Actual path from Router 2 to Router 8 has cost 8. (Router 4
diverts packets over higher cost intra-area path to Router 8)
68
Another Example of Non-Shortest
Path
[Figure: Routers 1-19 spread over Area 1, Area 0, and Area 2. Most links have cost 2; several links have cost 20, and individual links have costs 4, 6, and 8.]

• Both Router 5 and Router 8 advertise reachability to
external route E with type 2 cost of 10.
– Router 5 is selected by Router 2 in spite of the higher
(internal) cost to reach it (22 vs 10), because of preference
for intra-area paths to ASBRs.
69
A Tricky Corner Case
[Figure: Routers 1-16 spread over Area 1, Area 0, and Area 2. Most links have cost 2, with a few higher-cost links.]

• What is (are) the path(s) between Router 2 and Router 16?
– Hint: Router 10 has interfaces in Areas 0, 1 and 2
• Remember basic rule of distance vector protocol
– Only advertise a route you are using
70
A Tricky Corner Case
[Figure: same topology as on the previous slide.]

• Two exit points from Area 1
– Router 10 advertises a cost of 24 to Router 16, even though there is a
better/cheaper path through Router 8
– Router 7 advertises a cost of 8 to Router 16
• Minimum cost through either Router 10 or Router 7 is 28
• BUT there are two paths of total cost 46!
71
A Few Other Extensions That Can
Further Complicate Things
• Virtual links and transit areas
– Useful to enhance backbone connectivity, but rarely used
and add complexity to understanding packet forwarding
decisions
• Stub areas
– No flooding of AS-external LSAs (T5)
– Default T3-summary route flooded in the area
• Not-so-stubby areas (NSSA)
– I don’t want to receive T5’s but want to originate T5’s
– Use new LSA type (T7) for flooding of AS-external LSAs
within the NSSA, plus “translation” of T7 to T5 at ABRs and
a few other tweaks

72
Virtual Links & Transit Areas

• Virtual links are meant to facilitate backbone connectivity
– Connecting remote areas
– Increasing backbone robustness to link failures
• The two end points of a virtual link are Area Border
Routers
– A virtual link can be configured between two ABRs that have an
interface to a common non-backbone area
– A virtual link is treated as an unnumbered pt-to-pt link
– The cost of a virtual link is the cost of the path connecting the two
ABRs that have been configured as ends of the virtual link
- Note that this implies that a virtual link only comes up after the
shortest path computation has completed
• An area that has one or more active virtual links is called
a transit area (stub areas cannot be transit areas)
– It can be used to carry transit traffic between other areas
– It can make understanding how packets are “actually” forwarded
pretty complicated...

73
Two Examples of The Use of Virtual Links

[Figure: two example topologies spanning Area 0, Area 2, and Area 3 that use virtual links; the numeric node and cost labels are not recoverable.]

74
Another Virtual Link Example
• Or why virtual links can make understanding packet
forwarding more complicated
– When computing the shortest path to a destination, ABRs
attached to a transit area are allowed to consider summary
LSAs advertised in the transit area by other ABRs

[Figure: example topology attached to Area 0, with numeric labels 1, 2, 3, 10, and 12.]

75
Another Link State Protocol: IS-IS
• IS-IS = Intermediate System to Intermediate System
– Intermediate System = Router
– End System = Host
• Just like OSPF
– A link-state protocol that relies on flooding
- Each node advertises its (local) view of the world in a Link
State Packet (LSP)
– Entries are regularly refreshed
- Nodes maintain a topology database on which they run
Dijkstra to compute shortest paths
– Database entries are aged out if not refreshed
– Hello protocol for liveness and neighbor discovery
– Broadcast network represented as “pseudo-node” with an elected
Designated IS (DIS) responsible for impersonating it
– A two-level hierarchy for improved scalability
76
Just like OSPF – Maybe Not
• IS-IS runs directly on the link layer, i.e., below and
independent of IP
– No exposure to IP “insecurity”, but no benefit from IP functionality,
e.g., fragmentation (common MTU across the entire network)
• IS-IS relies on different area and level definitions
– Area = level 1 domain based on local sharing of area address
– Routers belong to one area (area boundary through links not
routers, i.e., no ABR but “connected” routers)
– Level 2 backbone defines logical connectivity based on link types
and not as a separate area (OSPF area 0)
• IS-IS has a different LSP structure (just one per router)
– LSP originator identified by System ID of router (IS)
- Not an IP address…
– Extensive use of Type/Length/Value (TLV) format to encode
everything a router knows in one long list
- Affects update process: One small change triggers update for
everything
- Affects extensibility: New information easily encoded in new
TLV without requiring new packet type

77
More on IS-IS Differences – (1)
• Areas and levels (level 1 and level 2)
– Routers form adjacency over link if and only if
- They both agree it is a level 2 link
- They both agree it is a level 1 link AND they share at least one
area address
– Area IDs are link local, which facilitates merging and splitting areas
• Level 2 and level 1 structure
– Links can be level 1, level 2 or BOTH level 1 and level 2
– Level 2 backbone is mostly a connectivity concept
- L1/L2 links are somewhat similar to OSPF virtual links
– Connectivity between level 1 areas is provided through “attached”
router
- Router with connectivity to L2 backbone as indicated in L1
LSP (Attached bit)
- Default behavior routes inter-area packets to closest attached
router (route leaking extensions)

78
Merging, Splitting, Renumbering Areas
• Merging area 7 and area 8
– Reconfigure link A-B to be L1/L2 link with, say, area 7 as area ID
common to A & B
– L1 flooding domain now extends to routers A, B, C and D
– Add area 7 as area ID for link B-D on both routers B and D
– Reconfigure link A-B to be L1 only
– Remove area 8 ID from routers B and D for link B-D
• Similar (symmetric) processes can be followed for area splitting
or renumbering

[Figure: three stages with routers A, B, C, D. Initially A and C are in Area 7, B and D are in Area 8, and link A-B is L2. Then link A-B is L1/L2 and Area 7 extends to B. Finally link A-B is L1 and all four routers are in Area 7.]

79
Level 2 Backbone
• L2 Backbone forms logical connected topology based on L2
and L2/L1 links
– Can facilitate use of high-speed backbone links for intra-area traffic
(similar to OSPF virtual links but link rather than area specific)
• Inter-area traffic is sent to closest attached L2 router
– Closest exit point to backbone
- No need to deliver traffic to C1
– Scalable, but can result in sub-optimal routing
• Route-leaking extensions
– Similar to OSPF T3 summary routes approach
– Use of Up/Down bit to ensure that routes are not propagated back out

[Figure: routers A1 and A2 in Area 7, C1-C4 in Area 9, and B1-B4 and D in Area 10, connected by L1, L2, and L1/L2 links.]
80
The Penalty of Choosing
The Closest Exit Point

[Figure: topology with link costs (values 1, 2, 4, 5, 10, and 25) and destination 19.1.1.0/24.]

81
More on IS-IS Differences – (2)
• HELLO protocol
– Local hold timer for each link (carried in Hello messages)
– Used for MTU check through padding of Hello messages
– Three-way handshake to test for bidirectional connectivity
- Hello messages list the addresses of the ISs from which Hellos
have been heard
• Broadcast network
– DIS election is preemptive (IS with highest priority wins)
– LAN represented by pseudonode LSP from DIS
- Identified by non-zero pseudonode ID
– No backup DIS (DIS can reduce its Hello hold timer – JUNOS)
• Database synchronization
– On LANs DIS sends Complete Sequence Numbers PDUs (CSNPs)
every 10s (unreliable transmissions – implicit ACKs)
– On P2P links initial CSNP sent only when adjacency comes up
- JUNOS implements periodic resending of CSNP on P2P links
– Other routers request specific LSPs using Partial Sequence Numbers
PDUs (PSNPs) or reflood missing/old LSPs
- ACKs (PSNP or CSNP) needed for received LSPs on P2P links

82
More on IS-IS Differences – (3)
• Fragmentation
– Remember that IP fragmentation is not available
– IS-IS does application level fragmentation (assumes minimum
MTU size of 1492 bytes – verified using Hello padding)
– CSNP fragmentation
- Based on using Start-LSP-ID and End-LSP-ID fields to
indicate beginning and end of synchronization
– LSP fragmentation
- Based on fragment ID byte (up to 256 fragments)
– Extension based on assigning additional IDs to IS
- Fragment zero is mandatory for others to be considered
- Fragments are atomic (arrive independently) → Need to be
careful in packaging information in fragments to avoid
churning in the presence of changes

83
LSP Fragmentation
[Figure, initial state: Fragment 00 carries the header and Adj #1, Adj #2, ..., Adj #20; Fragment 01 carries Adj #21 and Adj #22.]

• Adjacency goes away in fragment 00 (Adj #2 in the figure)
– Repackaging of fragments requires re-advertising both
fragments 00 and 01
– Preserving fragment structure ensures that only fragment 00 is
re-advertised
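
The difference between the two strategies can be made concrete with a small Python sketch. It is a toy model, not IS-IS code: adjacencies are plain integers, the capacity of 20 adjacencies per fragment and the removal of adjacency #2 are assumptions chosen to mirror the figure, and all function names are hypothetical.

```python
def pack(adjacencies, per_fragment):
    """Pack adjacencies densely, per_fragment entries per fragment."""
    return [
        adjacencies[i:i + per_fragment]
        for i in range(0, len(adjacencies), per_fragment)
    ]

def remove_preserving(fragments, adjacency):
    """Drop one adjacency but leave every other fragment untouched."""
    return [[a for a in frag if a != adjacency] for frag in fragments]

def changed(before, after):
    """Indices of fragments whose content differs (must be re-advertised)."""
    def get(frags, i):
        return frags[i] if i < len(frags) else None
    count = max(len(before), len(after))
    return [i for i in range(count) if get(before, i) != get(after, i)]

# Fragment 00 holds adjacencies #1..#20, fragment 01 holds #21 and #22.
before = pack(list(range(1, 23)), 20)

# Strategy 1: repack everything densely after adjacency #2 goes away.
repacked = pack([a for a in range(1, 23) if a != 2], 20)

# Strategy 2: keep the fragment structure and only edit fragment 00.
preserved = remove_preserving(before, 2)

print(changed(before, repacked))   # both fragments differ
print(changed(before, preserved))  # only fragment 00 differs
```

Dense repacking shifts adjacency #21 into fragment 00, so both fragments change, whereas editing in place leaves fragment 01 byte-for-byte identical.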

84
More on IS-IS Differences – (4)
• LSP structure
– Link State Packet as a container
– Container content based on packing independent entities (TLVs)
that each provide different type of information
• Benefits
– Protocol machinery associated with the handling of the containers
is independent of its content (reusable)
– Carrying new information in container only requires definition of
new TLVs
– LSPs can still be parsed even if they contain unknown TLVs (just
skip the number of bytes specified in the Length field)
• Drawbacks
– Container imposes rather coarse information granularity
- Whole container is resent in the presence of changes
- Need to exercise caution when information is spread over
multiple fragments
– Purge mechanism is required
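
A minimal sketch of the container idea, assuming the usual one-byte Type and one-byte Length layout of IS-IS TLVs: the parser walks the list without knowing what any TLV means, and a consumer simply ignores types it does not recognize. The function names and the small `KNOWN` table are hypothetical; the two type codes in it are taken from the TLV tables later in the deck.

```python
def parse_tlvs(data: bytes):
    """Split a byte string into (type, value) pairs.

    Each TLV is: 1 byte type, 1 byte length, then `length` bytes of value.
    """
    tlvs = []
    i = 0
    while i < len(data):
        if i + 2 > len(data):
            raise ValueError("truncated TLV header")
        tlv_type, length = data[i], data[i + 1]
        value = data[i + 2:i + 2 + length]
        if len(value) < length:
            raise ValueError("truncated TLV value")
        tlvs.append((tlv_type, value))
        i += 2 + length  # skip ahead by Length bytes, known type or not
    return tlvs

# TLV types this (hypothetical) consumer understands.
KNOWN = {1: "Area Address", 137: "Dynamic Host Name"}

def known_only(data: bytes):
    """Keep the TLVs we understand and silently skip the rest."""
    return {KNOWN[t]: v for t, v in parse_tlvs(data) if t in KNOWN}

sample = (
    bytes([137, 2]) + b"r1"            # Dynamic Host Name = "r1"
    + bytes([250, 3, 9, 9, 9])         # unknown type 250, skipped
    + bytes([1, 1, 0x49])              # Area Address
)
print(known_only(sample))
```

Adding a new kind of information therefore only requires a new entry in a table like `KNOWN`; the walking loop never changes.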

85
Purging Old Fragments
[Figure: Fragment 00 carries the header, Adj #1 and Adj #20; Fragment 01 carries Adj #21 and Adj #22. The new Fragment 00 carries the header, Adj #1 and Adj #20.]

• Adjacencies #21 & #22 go away
– New fragment 00 is issued
– No more need for fragment 01
• How does router know fragment 01 is gone?
– Will persist for a period of “lifetime”
• Router can issue “purge” LSP
– Contains only the header with zeroed out lifetime and
checksum

86
Main IS-IS TLVs – (1)

TLV                        TLV #   Where Used?
Area Address               1       Hello, LSP
IS Neighbors               6       Hello (LAN)
Padding                    8       Hello
Authentication             10      Hello, LSP, CSNP, PSNP
Checksum                   12      Hello, CSNP, PSNP
Protocols Supported        129     Hello, LSP
IP Interface Address       132     Hello, LSP
Dynamic Host Name          137     LSP
Multi-Topology Supported   229     Hello, LSP

87
Main IS-IS TLVs – (2)
TLV                                TLV #   Where Used?
IS Reachability                    2       LSP
Extended IS Reachability           22      LSP
Multi-Topology IS Reachability     222     LSP
IP Internal Reachability           128     LSP
IP External Reachability           130     LSP
Extended IP Reachability           135     LSP
Multi-Topology IP Reachability     235     LSP
Multi-Topology IPv6 Reachability   237     LSP

88
IS Reachability TLVs

• IS Reachability (TLV #2) – Hardly used anymore
– Fixed length
– Identifies connectivity to neighbors (independent of IP)
– Multiple possible metrics (default, delay, expense, error), but
very limited range (only 6 bits, i.e., from 0 to 63)
• Extended IS Reachability (TLV #22)
– Motivated by need for bigger metrics and support (room) for
extensions (sub-TLVs)
- 24-bit metric field typically encodes value in reference to
Reference Bandwidth
– Reference bandwidth of 1 Terabits/sec, so that a 1Gbps
link has a value of 1000 (smallest granularity is 64kbps)
- Variable length TLV
– Ability to include a variety of sub-TLVs, e.g., for Traffic
Engineering extensions and MPLS
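
The reference-bandwidth arithmetic above can be checked with a short sketch. The function name and the clamping policy are assumptions made for illustration; only the 24-bit field width, the 1 Tbit/s reference, and the 1 Gbps example come from the slide.

```python
MAX_WIDE_METRIC = 2**24 - 1  # largest value that fits the 24-bit metric field

def wide_metric(link_bps, reference_bps=10**12):
    """Metric = reference bandwidth / link bandwidth, clamped to 1..2^24-1."""
    metric = reference_bps // link_bps
    return max(1, min(metric, MAX_WIDE_METRIC))

print(wide_metric(10**9))    # 1 Gbps link with a 1 Tbit/s reference
print(wide_metric(64_000))   # 64 kbps link still fits the 24-bit field
```

With this reference, links much slower than 64 kbps would exceed the field and all collapse onto the maximum value, which is why the slide quotes 64 kbps as the smallest granularity.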

89
Some IS-IS Sub-TLVs – (1)

Sub-TLV                      Sub-TLV #
Maximum Link Bandwidth       9
Reservable Link Bandwidth    10
Unreserved Bandwidth         11
Traffic Engineering Metric   18
Link Protection Type         20

90
IP Reachability TLVs

• IP Internal/External Reachability (TLV #128/130) –
Hardly used anymore
– Identifies IP routes directly connected to router (internal) or
learned from other protocol (external)
– Same metrics and metrics limitations as IS Reachability TLV
• Extended IP Reachability (TLV #135)
– Specifies IP routes reachable by router
- 6-bit for prefix length: from 0 to 32 (33 values)
- Subsumes both TLV #128 and #130
– 32-bit metric field (compatibility with other protocols)

91
Multi-Topology TLVs

• Enables specification of multiple logical topologies,
each with different routing
– Per topology metrics and SPF computations, e.g., VoIP
traffic routed differently from data traffic
• TLV #229 identifies topologies a router supports
– IPv4 Unicast (#0)
– In-Band Management (#1)
– IPv6 Unicast (#2)
– Multicast (#3)
• Multi-topology support validated during Hello
exchanges
• Multi-topology IPv4/6 Reachability TLVs “duplicate”
Extended IP Reachability and IPv6 Reachability
TLVs
92
More on IS-IS Differences – (5)

• SPF computations
– Explicitly structured as a two-phase process
• Phase 1
– Compute shortest paths from IS to IS (on a graph with
routers/IS and “networks” as vertices and links/adjacencies
as edges)
- One SPF per topology
– Independent of any underlying “reachability” information
• Phase 2
– Add shortest paths to all “leaves” – reachability information
• Phase 1 can be reused across protocols and is only
triggered in the presence of connectivity changes
• Phase 2 is reachability specific
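
The two phases can be sketched as two separate functions. This is a simplified illustration, not an IS-IS implementation: the graph and reachability dictionaries, the router names, and the function names are all hypothetical, and pseudonodes, levels, and equal-cost paths are ignored.

```python
import heapq

def spf(graph, source):
    """Phase 1: Dijkstra over routers only. graph: {node: {neighbor: cost}}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, cost in graph.get(node, {}).items():
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

def attach_leaves(dist, reachability):
    """Phase 2: hang prefixes off the tree. reachability: {router: {prefix: metric}}."""
    routes = {}
    for router, prefixes in reachability.items():
        if router not in dist:
            continue  # router unreachable, so its prefixes are too
        for prefix, metric in prefixes.items():
            total = dist[router] + metric
            if prefix not in routes or total < routes[prefix][0]:
                routes[prefix] = (total, router)
    return routes

graph = {
    "A": {"B": 2, "C": 5},
    "B": {"A": 2, "C": 2},
    "C": {"A": 5, "B": 2},
}
reachability = {
    "B": {"19.1.1.0/24": 10},
    "C": {"19.1.1.0/24": 1, "19.2.0.0/16": 0},
}
dist = spf(graph, "A")
routes = attach_leaves(dist, reachability)
```

When only a prefix changes, `attach_leaves` can be rerun against the existing `dist`; `spf` needs to run again only when the router graph itself changes.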
93
Routing Protocol Overview
• Routing protocols follow the two-level hierarchy of the
Internet
– Interior Gateway Protocols (IGP) control routing within
an AS/domain
– Exterior Gateway Protocol(s) (EGP) control routing
between AS’s
• Different goals and constraints for each family
of protocols
– IGP: Ability to fine tune internal operation and
shielding from outside “noise”
– EGP: Scalability and ability to

[Figure: interconnected autonomous systems AS 1, AS 2, AS 3, AS 121, AS 123, AS 168, AS 321, AS 376, AS 441, AS 524, and AS 3411.]
94
Growing Up
From One to Many Domains

• Goal
– Enable connectivity between domains (Internet-wide)
• Requirements
– Operational flexibility and scalability, and scalability, and scalability,…
- Autonomous systems are typically operated by different
administrative entities
- Cooperation but no “trust” between domains
• Border Gateway Protocol (BGP4) is the dominant
(only!) Exterior Gateway Protocol (EGP)

95
BGP Routing Table Growth

Telstra’s table (AS 1221)

From http://bgp.potaroo.net
96
Some Basic Remarks
Before Jumping Into BGP
• A link state type of approach would simply not work
– Requires building and maintaining a map of the entire Internet in
every router...
– The need for consistent information and decisions cannot be
satisfied as the network size grows
- Things are always changing somewhere in the Internet
• Distance vector protocols are the only realistic option
– Better scalability by limiting the level of topology information that
each router maintains
– Preserve ability to use different route selection criteria at each
router
- No need for consistent metrics
- Seamless support for policies
– Control of what routing information is sent to whom

97
Border Gateway Protocol
• DV protocol for inter-domain routing
– Supports arbitrary topology (but no overlapping domains)
– Governs exchange of information between internal and external
border routers (BGP peers)
- Internal peers: within the same domain
- External peers: in two adjacent domains
- Each domain is characterized by a unique autonomous system number
• Major BGP characteristics
– Selection of “best” path (avoid stupid choices and support strong
administrative control)
- Multiple path attributes
– Loop avoidance (path vectors)
– Scalability through route aggregation
• BGP as a protocol is relatively simple (86 pages for the latest draft
vs 244 for RFC 2328), but its configuration can be complex and
errors can have far-reaching implications

98
BGP Operation Overview
Three major phases:
– Neighbor acquisition and reachability, exchange of routing
information, and path selection (steady state)
1. Neighbor acquisition and reachability
– Initiated through OPEN message and maintained by
KEEPALIVE messages
– Neighbor declared unreachable if no KEEPALIVE received
within Holding Time
2. Routing information exchanged through UPDATE
messages
– Incremental updates to advertise & withdraw routes
- Requires reliable transmission (uses TCP - port 179)
3. Path selection uses the information received in
UPDATE messages to select the best path for a route
and construct the routing table

99
The BGP State Machine
“Normal” Sequence

[Figure: “normal” state sequence at the two peers: Idle, Connect (Active on the other side), OpenSent, OpenConfirm, Established.]

100
BGP Information Flow and Sources
• Different peering sessions with internal (same AS) and
external (different AS) neighbors
– External BGP neighbors communicate via eBGP
– Internal BGP neighbors communicate via iBGP
- All BGP peers in an AS are typically connected in a full
mesh (more on this later)

[Figure: AS 1 (Rtr A1, Rtr B1), AS 2 (Rtr A2, Rtr B2, Rtr C2, Rtr D2), and AS 3 (Rtr A3, Rtr B3). eBGP sessions run between routers in different ASes; iBGP sessions run between routers inside AS 2 and inside AS 3.]
101
BGP Processing Steps
• Phase 1: Determines degree of preference
• Phase 2: Select best routes to install in LocRIB
• Phase 3: Determine which routes to advertise based on policies

[Figure: Router D2 keeps one RIB_In and one RIB_Out per peer (IBGP: Rtr A2, Rtr B2, Rtr C2; EBGP: Rtr A3, Rtr B3) around a single Local RIB.]

102
BGP UPDATE Message

• UPDATE message is the basic unit of route advertisement
– Can contain multiple routes that are being withdrawn
– Path Attributes describe a number of key properties of
the advertised route that are used to select the best path
– NLRI is a list of IP address prefixes associated with a
given BGP route (common set of Path Attributes)
• Message fields, in order:
– Unfeasible Route Length (2 bytes)
– Withdrawn Routes (variable)
– Total Path Attribute Length (2 bytes)
– Path Attributes (variable)
– Network Layer Reachability Information (NLRI) (variable)
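
The field layout above can be walked with a short parser. This is a sketch under assumptions: it takes the UPDATE body with the common BGP message header already removed, handles IPv4 prefixes only, leaves the path attributes as raw bytes, and uses hypothetical function names. Prefixes are encoded as a one-byte length in bits followed by just enough bytes to hold that many bits.

```python
import struct

def parse_prefixes(data: bytes):
    """Decode a run of (length-in-bits, prefix bytes) entries into strings."""
    prefixes = []
    i = 0
    while i < len(data):
        bits = data[i]
        nbytes = (bits + 7) // 8           # bytes needed to hold `bits` bits
        raw = data[i + 1:i + 1 + nbytes]
        if len(raw) < nbytes:
            raise ValueError("truncated prefix")
        padded = raw + bytes(4 - nbytes)   # pad to a full IPv4 address
        prefixes.append(".".join(str(b) for b in padded) + "/" + str(bits))
        i += 1 + nbytes
    return prefixes

def parse_update(body: bytes):
    """Return (withdrawn prefixes, raw path attributes, NLRI prefixes)."""
    (withdrawn_len,) = struct.unpack_from("!H", body, 0)
    withdrawn = parse_prefixes(body[2:2 + withdrawn_len])
    (attrs_len,) = struct.unpack_from("!H", body, 2 + withdrawn_len)
    attrs_start = 4 + withdrawn_len
    attrs = body[attrs_start:attrs_start + attrs_len]
    nlri = parse_prefixes(body[attrs_start + attrs_len:])
    return withdrawn, attrs, nlri

# Withdraw 19.3.0.0/16, carry one 4-byte attribute, advertise 19.2.1.0/24.
origin_attr = bytes([0x40, 1, 1, 0])
body = (
    struct.pack("!H", 3) + bytes([16, 19, 3])
    + struct.pack("!H", len(origin_attr)) + origin_attr
    + bytes([24, 19, 2, 1])
)
print(parse_update(body))
```

The NLRI has no length field of its own: it is whatever remains after the path attributes, which is why the parser simply consumes the rest of the body.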

103
Path Attributes
General Characteristics
• Several categories of attributes
– Optional or well-known, mandatory or discretionary, transitive or not,
partial or not
• Well-known attributes must be recognized by all BGP
implementations
– Mandatory well-known attributes must be included in every UPDATE
message, while discretionary well-known attributes may or may not be
sent based on the content of the message
– Well-known attributes MUST be passed along (after updating) to other
BGP peers
• Optional attributes need not be recognized by all BGP
implementations
– Unrecognized transitive attributes SHOULD be passed to other BGP
peers with the partial bit set
– Unrecognized non-transitive attributes are ignored
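
These categories map onto the high-order bits of the attribute flags byte. The sketch below assumes the standard BGP-4 bit assignments (Optional, Transitive, Partial, Extended Length); the function names are hypothetical and only the handling of unrecognized attributes described on this slide is modeled.

```python
OPTIONAL = 0x80         # 0 = well-known, 1 = optional
TRANSITIVE = 0x40       # 1 = passed along to other peers
PARTIAL = 0x20          # set when some router on the path did not recognize it
EXTENDED_LENGTH = 0x10  # attribute length field is 2 bytes instead of 1

def describe(flags: int):
    """Decode the flags byte into named booleans."""
    return {
        "optional": bool(flags & OPTIONAL),
        "transitive": bool(flags & TRANSITIVE),
        "partial": bool(flags & PARTIAL),
        "extended_length": bool(flags & EXTENDED_LENGTH),
    }

def handle_unrecognized(flags: int):
    """Return the flags to forward the attribute with, or None to drop it."""
    if not flags & OPTIONAL:
        # Well-known attributes must be recognized by every implementation.
        raise ValueError("unrecognized well-known attribute")
    if flags & TRANSITIVE:
        return flags | PARTIAL  # pass along, marking it partial
    return None                 # optional non-transitive: quietly ignored

print(describe(0xC0))
print(handle_unrecognized(0xC0) == 0xE0)
```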

104
Path Attributes (1)
• AS_PATH
– Well-known, mandatory
– Sequence of path segments of type AS_SET (1) or AS_SEQUENCE (2)
- AS_SET: Unordered list of autonomous systems traversed by the
route
- AS_SEQUENCE: Ordered list of autonomous systems traversed by
the route
– Updated by “pre-pending” own AS number when advertising to a BGP
speaker in another AS → Loop prevention
• NEXT_HOP
– Well-known, mandatory
– IP address of border router to be used as next hop towards destination
identified by the NLRI field
– Typically chosen to ensure that the “shortest” path is taken
• ORIGIN
– Well-known, mandatory
– Characterizes where the path first originated
- IGP: 0; EGP: 1; INCOMPLETE (other): 2
– Should not be changed by other BGP speakers
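
A minimal sketch of how AS_PATH prepending gives loop prevention, assuming the path is a plain list of AS numbers (AS_SEQUENCE only). The function names and the `prepend_count` parameter are hypothetical.

```python
def advertise_ebgp(as_path, own_as, prepend_count=1):
    """Return the AS_PATH to send to a speaker in another AS."""
    return [own_as] * prepend_count + list(as_path)

def has_loop(as_path, own_as):
    """A received route whose AS_PATH already contains our AS has looped."""
    return own_as in as_path

path_from_as1 = advertise_ebgp([], own_as=1)            # originated in AS 1
path_from_as2 = advertise_ebgp(path_from_as1, own_as=2)  # forwarded by AS 2
print(path_from_as2)                # AS 2 first, then AS 1
print(has_loop(path_from_as2, 1))   # AS 1 would reject this route
```

A `prepend_count` above 1 repeats the local AS number, which makes the path look longer without changing the loop check.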
105
Path Attributes (2)
• LOCAL_PREF
– Well-known, discretionary
– Advertisement to other BGP speakers in the same AS (iBGP) of the
degree of preference of a route by the advertising router (higher
value is preferred)
• MULTI_EXIT_DISC (MED)
– Optional, non-transitive
– Used to give some preference to different exit/entry points in a
neighboring AS (lower value is preferred)
• COMMUNITY
– Optional, transitive, used to simplify routing policies
- Common property used to determine which routes to accept,
prefer, and pass to BGP neighbors
– Some well-known communities:
- NO_EXPORT: do not advertise outside of the AS (or
confederation)
- NO_EXPORT_SUBCONFED: do not advertise to external peers
(including peers in other autonomous systems within a
confederation)
- NO_ADVERTISE: not advertised to any BGP peer
106
Path Attributes (3)

• AGGREGATOR
– Optional, transitive
– Contains IP address and AS number of the BGP speaker that
formed the aggregate route
• ATOMIC_AGGREGATE
– Well-known, discretionary (should be propagated)
– Informs other BGP speakers that the advertiser aggregated
several routes and may have removed some autonomous
system numbers from the AS_SET (loop free property must
be maintained, though)
- As a result, actual path may differ from AS_PATH
- Basically used to signal possible loss of information
– NLRI field must not be modified by adding a more specific
prefix, i.e., route must not be de-aggregated (loop prevention)

107
Path Attributes (4)

• ORIGINATOR_ID
– Optional, non-transitive
– Used by Route Reflectors (more on this later)
– Identifies the local router (within the local AS) that originally
advertised the route
• CLUSTER_LIST
– Optional, non-transitive
– Used by Route Reflectors to detect looping of routing
information in an AS because of misconfiguration
- Each Route Reflector prepends its CLUSTER_ID to the
CLUSTER_LIST
- Route Reflectors ignore advertisements that carry their
CLUSTER_ID in the CLUSTER_LIST

108
BGP Decision Process
• Three phase process
– Phase 1: Calculates a “degree of preference” for each route in a
given RIB_In (locks the associated RIB_In)
- If route is learned from an internal (iBGP) peer, the LOCAL_PREF
attribute is usually taken as the degree of preference.
- If route is learned from an external peer, the degree of
preference is computed based on local policy.
– The resulting value is used as LOCAL_PREF in any iBGP
re-advertisement.
– Phase 2: Selects the “best” route out of all those available for
distinct destinations (locks all RIB_In)
- Excludes routes with unresolvable NEXT_HOP or a loop in the
AS_PATH attribute
- Best routes are installed in the Local RIB.
– Phase 3: Decides, based on policies, which routes in Local RIB to
advertise to which peer (blocks execution of Phase 2).
- Route aggregation can be performed at this stage.

109
BGP Tie Breaking Rules
• BGP selects a SINGLE route.
– Remove all routes that don’t have the smallest number of AS
numbers in AS_PATH (each AS_SET counts only as one!)
– Remove all routes that don’t have the lowest ORIGIN value
– Among routes learned from the same neighboring AS, remove
routes with less desirable (higher) MED values.
– If at least one route was learned through eBGP, remove all
routes learned through iBGP.
– Remove all routes with a non-minimum IGP cost to NEXT_HOP.
– Remove all routes that were not advertised by the BGP speaker
with the lowest BGP identifier.
– Prefer the route received from the lowest peer address.
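
The elimination steps above can be written as a chain of filters. This is a sketch of the slide's rules only, applied to routes that already tie on degree of preference; the `Route` fields, their integer encodings, and the function names are assumptions for illustration, and real implementations add vendor-specific steps.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    as_path_len: int   # number of ASes in AS_PATH (an AS_SET counts as one)
    origin: int        # ORIGIN value, lower is better
    neighbor_as: int   # AS the route was learned from
    med: int           # MULTI_EXIT_DISC, lower is better
    via_ebgp: bool     # learned over eBGP rather than iBGP
    igp_cost: int      # IGP cost to NEXT_HOP
    bgp_id: int        # BGP identifier of the advertising speaker
    peer_addr: int     # peer address, as an integer for easy comparison

def keep_min(routes, key):
    """Keep only the routes that achieve the minimum of `key`."""
    best = min(key(r) for r in routes)
    return [r for r in routes if key(r) == best]

def tie_break(routes):
    """Apply the slide's rules in order and return the single survivor."""
    if not routes:
        raise ValueError("no candidate routes")
    candidates = keep_min(list(routes), lambda r: r.as_path_len)
    candidates = keep_min(candidates, lambda r: r.origin)
    # MED is only compared among routes from the same neighboring AS.
    best_med = {}
    for r in candidates:
        best_med[r.neighbor_as] = min(r.med, best_med.get(r.neighbor_as, r.med))
    candidates = [r for r in candidates if r.med == best_med[r.neighbor_as]]
    if any(r.via_ebgp for r in candidates):
        candidates = [r for r in candidates if r.via_ebgp]
    candidates = keep_min(candidates, lambda r: r.igp_cost)
    candidates = keep_min(candidates, lambda r: r.bgp_id)
    candidates = keep_min(candidates, lambda r: r.peer_addr)
    return candidates[0]

a = Route(2, 0, neighbor_as=1, med=0, via_ebgp=False, igp_cost=5, bgp_id=20, peer_addr=2)
b = Route(2, 0, neighbor_as=1, med=50, via_ebgp=False, igp_cost=1, bgp_id=10, peer_addr=1)
c = Route(2, 0, neighbor_as=2, med=100, via_ebgp=True, igp_cost=9, bgp_id=30, peer_addr=3)
```

Because each filter only narrows the candidate list, the order of the steps matters: in the sample data, `a` beats `b` on MED even though `b` has the lower IGP cost.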

110
Using LOCAL_PREF to Pick an Exit
Point
• Choosing between a primary and a backup provider
– Used to influence internal decisions

Primary

LOCAL_PREF=100

LOCAL_PREF=20

Backup

111
AS_PATH Padding to Discourage the
Use of Certain Links - (A Hack!)
• Used externally to influence choice of inbound links
– Choosing between a primary and a backup link
– Tuning inbound traffic for load-balancing purposes
• Can be over-ridden by local decisions
(LOCAL_PREF)
1.2.0.0/16; <AS1>
1.3.0.0/16; <AS1,AS1>

1.2.0.0/16; <AS1,AS1>
1.3.0.0/16; <AS1>

112
Another Way to Influence Entry Points
• MED allows crude selection ability
– Avoid low speed internal links
• But not always taken into account

19.2.1.0/24; MED 100


19.2.1.0/24; MED 5
19.2.2.0/24, MED 5
19.2.2.0/24, MED 100

Low speed RF link

113
Ignoring MED Values
• Hot potato routing
– Basic rule between ISPs
– “I won’t carry your bits for you…”

114
Propagating Path Attributes (1)
• Let us follow UPDATEs for routes r and r’ located in AS 1.
• Router A1 originates updates for routes r and r’ and advertises them over
its eBGP session to Router A2.
– ORIGIN is set to 0 as routes r and r’ were learned through IGP.
– AS_PATH type set to AS_SEQUENCE and initialized with AS 1.
– Router A1 sets NEXT_HOP to be the IP address of its interface on
the link to Router A2.
– MED values of 0 and 50 for routes r and r’, respectively, as Router A1
is the desired entry point for r but not r’ (Router B1 will use MED
values of 50 and 0 when advertising routes r and r’ to Router B2).

[Figure: same three-AS topology as in “BGP Information Flow and Sources”, with routes r and r’ located in AS 1.]
115
Propagating Path Attributes (2)
• Router A2 processes the updates it received from Router A1 for routes r
and r’ and decides to advertise them over its iBGP sessions to Routers
B2, C2 and D2.
– ORIGIN is kept unchanged.
– AS_PATH is propagated unchanged.
– Router A2 has been configured with NEXT_HOP self, so it sets
NEXT_HOP to be its own IP address.
– MED values are propagated unchanged.
– Router A2 sets LOCAL_PREF for r and r’ to 50 and 20, respectively
(Router B2 advertises both as 50 – more on this later).

[Figure: same three-AS topology as in “BGP Information Flow and Sources”, with routes r and r’ located in AS 1.]
116
Propagating Path Attributes (3)
• Router D2 processes updates received from Routers A2 and B2 for routes r
and r’ and advertises a single UPDATE for aggregate route r* over its eBGP
session to Router A3.
– ORIGIN is kept unchanged.
– Router D2 generates new AS_PATH attributes for r and r’ by pre-pending
AS2 to the AS_PATH (value is now <AS2,AS1>) and because both
AS_PATH attributes are identical, the AS_PATH of r* is set to the same
value and type.
– Router D2 adds an AGGREGATOR attribute <AS 2;own IP address> but
no ATOMIC_AGGREGATE attribute as there was no information loss
– Router D2 sets NEXT_HOP to its own IP address.

[Figure: same three-AS topology as in “BGP Information Flow and Sources”, with routes r and r’ located in AS 1.]
117
Decision Process Example (1)
• In AS 1 both routes r and r’ are learned from IGP
• In AS 2 routers hear about r and r’ from Router A2 and Router B2,
and both routes have the same AS_PATH count and ORIGIN value.
– For routes r and r’, Router A1 advertises MED values of 0 and 50, and
Router B1 advertises MED values of 50 and 0.
– If LOCAL_PREF values are equal, Routers C2 and D2 in AS 2 rely on MED
values and pick Router A2 as the NEXT_HOP for r and Router B2 as the
NEXT_HOP for r’ (Routers A2 and B2 pick Routers A1 and B1, respectively)
• In AS 3, Router A3 will pick Router D2 (eBGP from Router D2 vs
iBGP from Router B3); Router B3 will pick Router C2 (smaller BGP
ID); other BGP speakers pick Routers A3 or B3 based on IGP cost.

[Figure: same three-AS topology as in “BGP Information Flow and Sources”, with routes r and r’ located in AS 1.]
118
Decision Process Example (1’)
• In AS 2 routers hear about r and r’ from Router A1 and
Router B1, and both routes have the same AS_PATH
count and ORIGIN value, but different MED values.
– For routes r and r’, Router A1 advertises MED values of 0 and 50, and
Router B1 advertises MED values of 50 and 0.
• For routes r and r’, Router A2 advertises LOCAL_PREF
values of 50 and 20, while Router B2 advertises 50 for both
– Router C2 and D2 pick Router B2 for r’, and select either Router A2 or
Router B2 for r based on their IGP cost (MED is ignored)

[Figure: same three-AS topology as in “BGP Information Flow and Sources”, with routes r and r’ located in AS 1.]
119
Another Aggregation Example
• Routes r and r’ are aggregated into route r* by Router R when
advertised into AS 8
– AS_PATH attribute type changed to AS_SET
– Unordered list of ASes <AS 1;AS 2;AS 3;AS 4;AS 5;AS 6;AS 7>
– May omit some AS numbers if there is no risk of loop, e.g., advertise
AS_SET <AS 1; AS 2; AS 3; AS 7>
- ATOMIC_AGGREGATE attribute is added
- Actual path need not follow AS_PATH

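[Figure: route r reaches AS 7 through AS 1, AS 2, and AS 3; route r’ reaches AS 7 through AS 4, AS 5, and AS 6. Router R in AS 7 advertises the aggregate r* to AS 8.]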
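
The two aggregation outcomes seen so far (identical paths keep their AS_SEQUENCE, differing paths collapse into an AS_SET) can be sketched as follows. This is a deliberately simplified model and not the full aggregation procedure of the BGP specification: the function name is hypothetical, paths are plain lists of AS numbers, and the exact AS ordering in the sample paths is assumed from the figure.

```python
def aggregate_as_path(paths):
    """Return (segment_type, value) for the aggregate of several AS_PATHs."""
    if not paths:
        raise ValueError("nothing to aggregate")
    if all(p == paths[0] for p in paths):
        # No information is lost, so the ordered sequence can be kept.
        return ("AS_SEQUENCE", list(paths[0]))
    # Otherwise keep every AS seen on any path, without ordering.
    return ("AS_SET", sorted({asn for p in paths for asn in p}))

# Earlier example: r and r' both carry <AS2, AS1>.
print(aggregate_as_path([[2, 1], [2, 1]]))

# This slide: r arrives via AS 1-3, r' via AS 4-6, aggregated in AS 7.
print(aggregate_as_path([[7, 3, 2, 1], [7, 6, 5, 4]]))
```

Omitting members from the resulting set, as the slide allows when there is no risk of a loop, is what would call for ATOMIC_AGGREGATE; the sketch always keeps the full set.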
120
De-Aggregation and Loops
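[Figure: route r’ (a /24) is more specific than route r (a /16). r’ is advertised with AS_PATH <AS1,AS2,AS3,AS4>, while the aggregate r is advertised with <AS5> and then <AS5,AS6>. An illegal de-aggregation advertises r’ with <AS5,AS6>, which creates a routing loop for packets destined for route r’.]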

121
Policies – One Example

• Transit (customer) vs. non-transit (peer) agreements
between providers (routing domains)
– In a transit agreement, I will accept traffic from you that is
intended for any destination.
– In a non-transit agreement, I will only accept traffic from you
that is destined to my customers.
• Associated routing policies
– I advertise to you all routes I can reach and for which I am
willing to carry your traffic.
– I only advertise to you routes to my own customers.

122
Controlling Route Advertisements
Through Policies

[Figure: network diagram with route advertisements labeled “AS 1, AS 6” and “0.0.0.0/0”.]

123
Controlling Route Advertisements
Through Communities
• COMMUNITY attribute
– First two bytes carry ASN and last two bytes carry community
values used for local policy routing.
– 444: I2 routes; 445: Univ. X; 446: UUNET; 447: Co. X Research

[Figure: GigaPOP with routes tagged with communities 444; 445; 446; 447.]

124
Enhancing BGP Scalability
• What is wrong with this picture?
• The need for an iBGP mesh
creates many problems.
– N-1 TCP connections at every
router
– Every new router requires
configuration updates at all other
routers.
– Every router maintains N-1 RIB_In
and RIB_Out.
– Every change at one router needs
to be processed by all other
routers.
• Solutions
– Break it up in smaller pieces
- Route Reflectors
- Confederations

125
Route Reflector
• Simple solution, compatible with current BGP operation, and
supports easy migration
– Some BGP speakers, Route Reflectors (RR), can redistribute to iBGP
peers routes learned from other iBGP peers.
• Route Reflectors have two types of iBGP peers:
– Client peers and non-client peers
- Non-client peers must be fully meshed but not client peers.
– RR and its clients form a cluster identified by a CLUSTER_ID.
- Multiple RRs are allowed in a cluster (redundancy).
• Two Attributes: ORIGINATOR_ID and CLUSTER_LIST
– RR sets ORIGINATOR_ID to be the ROUTER_ID of the router that
originated the route.
- Routers ignore routes with ORIGINATOR_ID equal to their
ROUTER-ID.
– RR prepends the local CLUSTER_ID to the CLUSTER_LIST when
reflecting a route.
- Used to detect looping of routing information
- Routes with local CLUSTER_ID in CLUSTER_LIST are ignored.

126
Route Reflector Operation
• When an RR receives a
route from an iBGP peer:
– Selects the best path based
on its path selection rule
– If the best path is from a
non-client peer, reflect to all
clients
– If the best path is from a
client peer, reflect to all
client and non-client peers
• Note that path selection
need not be identical to
that of a full iBGP mesh.
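
The reflection rules and the two loop-prevention attributes can be sketched together. All names here are hypothetical, routes are plain dictionaries, and excluding the peer a route was learned from is an assumption of this sketch rather than something the slide states.

```python
def reflect_targets(source_kind, clients, non_clients, source):
    """Peers a Route Reflector sends its best path to.

    source_kind is "client" or "non-client", describing the peer the
    best path was learned from.
    """
    if source_kind == "client":
        targets = set(clients) | set(non_clients)
    else:
        targets = set(clients)
    return targets - {source}  # assumption: never send back to the source

def reflect(route, cluster_id, originator_router_id):
    """Copy the route, stamping ORIGINATOR_ID and prepending CLUSTER_ID."""
    out = dict(route)
    out.setdefault("originator_id", originator_router_id)
    out["cluster_list"] = [cluster_id] + list(route.get("cluster_list", []))
    return out

def accept(route, router_id, cluster_id):
    """Reject routes that have looped back to this router or cluster."""
    if route.get("originator_id") == router_id:
        return False
    if cluster_id in route.get("cluster_list", []):
        return False
    return True

reflected = reflect({"prefix": "r"}, cluster_id="10.0.0.1",
                    originator_router_id="1.1.1.1")
print(reflect_targets("client", {"c1", "c2"}, {"n1"}, source="c1"))
print(accept(reflected, router_id="2.2.2.2", cluster_id="10.0.0.2"))
```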

Confederations
• Basic principle
– Break up one big autonomous system into smaller internal autonomous
systems
• But, this arrangement increases:
– Complexity of routing policy based on AS_PATH information
– External overhead when internal topology changes
• Autonomous system confederation
– Collection of autonomous systems advertised as a single autonomous
system to BGP speakers outside of the confederation
- Confederation is identified externally by a single autonomous system
confederation identifier
- Each member of the Confederation is given a member autonomous
system number that is used only inside the confederation
– Two additional AS_PATH segment types:
- AS_CONFED_SEQUENCE: Ordered set of member autonomous
system numbers that an UPDATE message has traversed inside the
Confederation
- AS_CONFED_SET: Unordered set of member autonomous system
numbers

Confederation Operation

• AS_PATH update rules:
– Different handling of speakers in AS inside and outside the
Confederation
– Basically hide Confederation structure when advertising AS_PATH to
the outside, and otherwise follow essentially the same update rules.
• Within a Confederation
– NEXT_HOP, MED and LOCAL_PREFERENCE can be advertised unchanged to
neighboring AS members.

[Figure: AS 1 and member autonomous systems AS 111, AS 112, AS 113, AS 114]
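
Hiding the confederation structure can be pictured as follows. This is a sketch under assumptions: the AS_PATH is modeled as a list of `(segment_type, [AS numbers])` tuples, the function name is hypothetical, and the caller supplies the confederation identifier.

```python
CONFED_SEGMENT_TYPES = {"AS_CONFED_SEQUENCE", "AS_CONFED_SET"}


def advertise_outside(as_path, confed_id):
    """Build the AS_PATH sent to a peer outside the confederation.

    Confederation segments are dropped and the confederation identifier
    is prepended, so outsiders see a single autonomous system.
    """
    visible = [(kind, list(asns)) for kind, asns in as_path
               if kind not in CONFED_SEGMENT_TYPES]
    if visible and visible[0][0] == "AS_SEQUENCE":
        visible[0] = ("AS_SEQUENCE", [confed_id] + visible[0][1])
    else:
        visible.insert(0, ("AS_SEQUENCE", [confed_id]))
    return visible
```

For example, a path that crossed member ASes 112 and 111 internally and ASes 7 and 8 externally leaves the confederation as a single AS_SEQUENCE starting with the confederation identifier (the AS numbers 7 and 8 are made up for illustration).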

From BGP to Packet
Forwarding Decisions
• Recursive lookup at Router 1.1.1.1
– BGP routing table identifies Router 1.1.5.1 as the
NEXT_HOP for route r.
– IGP routing table identifies interface 10.2.1.1 on Router
1.1.2.1 as the next hop towards Router 1.1.5.1.
– Result: the forwarding table entry for route r points to 10.2.1.1 on
Router 1.1.2.1 as the next hop.

[Figure: AS 1 with Routers 1.1.1.1, 1.1.2.1 (interface 10.2.1.1), 1.1.3.1, 1.1.4.1, and 1.1.5.1, running an IGP and iBGP; neighbors AS 2 and AS 3; route r]
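
The two-step resolution above can be written out as a pair of table lookups. This is a minimal sketch with hypothetical names; the tables are plain dictionaries holding only the single route from the example.

```python
# BGP table: route -> BGP NEXT_HOP (the exit router).
BGP_TABLE = {"r": "1.1.5.1"}

# IGP table: router -> (next-hop interface address, next-hop router).
IGP_TABLE = {"1.1.5.1": ("10.2.1.1", "1.1.2.1")}


def forwarding_entry(route, bgp_table=BGP_TABLE, igp_table=IGP_TABLE):
    """Resolve a BGP route to the directly connected next hop."""
    bgp_next_hop = bgp_table[route]   # step 1: who is the exit point?
    return igp_table[bgp_next_hop]    # step 2: how do I get to the exit point?
```

`forwarding_entry("r")` returns `("10.2.1.1", "1.1.2.1")`, matching the forwarding table entry described above.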

End-to-End Connectivity
Gluing BGP and IGP Decisions Together

• Two cases
1. All routers are BGP speakers (BGP mesh, common in ISPs).
2. Some internal routers do not speak BGP.
• Case 1: BGP mesh
– Forwarding table can be constructed simply based on
recursive lookup.
- IGP provides connectivity between routers.
- BGP associates routes to routers.
• Case 2: Mix of BGP speakers and IGP-only routers
– BGP speakers participate in IGP.
– BGP speakers “export” routes into IGP.
- Example of OSPF ASBRs

From Routing Table
to Forwarding Table
• OK, we got to Router 1.1.2.1. Where to next?
– Case 1: BGP full or partial mesh
- Routers 1.1.2.1, 1.1.3.1, 1.1.4.1 also participate in iBGP.
– Partial mesh means that only those routers on the path
between 1.1.1.1 and 1.1.5.1 need to participate in BGP.
– Dangerous (why?) but not uncommon (why?)
- They all know that 1.1.5.1 is the desired exit point and
can forward packets.

[Figure: AS 1 with Routers 1.1.1.1, 1.1.2.1 (interface 10.2.1.1), 1.1.3.1, 1.1.4.1, and 1.1.5.1; neighbors AS 2 and AS 3; route r]

From Routing Table
to Forwarding Table
• OK, we got to Router 1.1.2.1. Where to next?
– Case 2: BGP routes imported into IGP, e.g., OSPF
- Routers 1.1.1.1 and 1.1.5.1 are ASBRs.
- Router 1.1.5.1 advertises a type 1/2 external route r.
- Routers 1.1.2.1, 1.1.3.1 and 1.1.4.1 learn about r through
a type 5 External LSA advertised by 1.1.5.1.
- Router 1.1.1.1 learns about r through both BGP and
OSPF (consistency, precedence?)

[Figure: AS 1 with Routers 1.1.1.1, 1.1.2.1 (interface 10.2.1.1), 1.1.3.1, 1.1.4.1, and 1.1.5.1; neighbors AS 2 and AS 3; Router 1.1.5.1 advertises route r in a type 5 LSA, T5: < r >]

Forwarding Table Challenges

• With today’s CPUs, SPF (Phase 1) computations are no longer the
dominant challenge, even in large networks
– Less than 50 ms per run on a 400-router network
• Processing load of Phase 2 (route/stub updates) can be more
significant for full Internet routing tables (stepping through all
entries in routing table)
– But what are the odds that IS-IS or OSPF will carry a full Internet
routing table?
• Which brings us to the true challenge(s)
– Impact of dependencies across protocols, e.g., BGP and IS-IS
– Volume of data to be pushed/modified
- Full Internet routing table >200MB and ~300k prefixes
- Forwarding table size ~2MB

Impact of Protocol Dependencies

• BGP tells A that B and C can both reach the Internet
– IGP costs to B and C are the tie-breakers with d1<d2
– B is the selected exit point to reach the Internet through port #1
on A

[Figure: A reaches B over port #1 at IGP cost d1 and C over port #3 at IGP cost d2; B and C each advertise 300k routes]

Impact of Protocol Dependencies
• BGP tells A that B and C can both reach the Internet
– IGP costs to B and C are the tie-breakers with d1<d2
– B is the selected exit point to reach the Internet through port #1
on A
• Internal link failure affects path from A to B
– Exits through port #2 with IGP cost d’1<d2
• A needs to step through full BGP table to determine that IGP change
did not affect BGP decision (d’1<d2)
• A needs to update all 300k entries in forwarding tables to point to
new forwarding next hop for B now reachable over port #2
– A better option: Recursive lookup

[Figure: after the failure, A reaches B over port #2 at IGP cost d’1 and C over port #3 at IGP cost d2; B and C each advertise 300k routes]

Recursive Forwarding Structure

[Figure: 300k prefixes pointing to 10’s of Next_Hops]

• A change in forwarding decision for a Next_Hop does not require
modifications of individual prefix entries
– Only the Next_Hop forwarding information is updated
– One vs 300,000 updates!
• Unfortunately, this won’t help if Next_Hop itself changes
– Still need to update up to 300,000 entries in that case
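
The one-versus-many update cost can be seen in a small model of a two-level forwarding structure. The class below is a hypothetical sketch (not any router's actual data structure): prefixes point to a BGP Next_Hop, and each Next_Hop points to an outgoing port.

```python
class RecursiveFib:
    """Two-level table: prefix -> Next_Hop -> outgoing port."""

    def __init__(self):
        self.prefix_to_next_hop = {}
        self.next_hop_to_port = {}

    def lookup(self, prefix):
        """Follow the indirection from prefix to port."""
        return self.next_hop_to_port[self.prefix_to_next_hop[prefix]]

    def igp_change(self, next_hop, new_port):
        """IGP path to a Next_Hop changed: one entry is rewritten."""
        self.next_hop_to_port[next_hop] = new_port
        return 1  # number of entries touched

    def bgp_change(self, old_next_hop, new_next_hop):
        """The Next_Hop itself changed: every affected prefix is rewritten."""
        touched = 0
        for prefix, next_hop in self.prefix_to_next_hop.items():
            if next_hop == old_next_hop:
                self.prefix_to_next_hop[prefix] = new_next_hop
                touched += 1
        return touched
```

With all prefixes pointing at B, moving B from port #1 to port #2 touches one entry, while switching the exit point from B to C touches every prefix, as noted above.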
Impact of Protocol Dependencies

• BGP tells A that B and C can both reach the Internet
– IGP costs to B and C are the tie-breakers with d1<d2
– B is the selected exit point to reach the Internet through port #1
on A
• Internal link failure affects path from A to B
– We now have d’1>d2
• A needs to step through full BGP table to determine that IGP change
affects the BGP decision (d’1>d2)
• A needs to update all 300k entries in forwarding tables to point to
new BGP Next_Hop of C reachable over port #3

[Figure: after the failure, A reaches B over port #2 at IGP cost d’1 and C over port #3 at IGP cost d2; B and C each advertise 300k routes]

Dealing with Multiple Protocols

• Routers often learn routes from multiple protocols that use
different/incompatible metrics
– Which one to prefer?
• Administrative distance specifies the degree of preference of a
protocol
– Smaller is better
• Default administrative distance can be vendor specific, and changed…

Protocol                 Distance
Connected interface      0
Static route             1
EIGRP (summary route)    5
eBGP                     20
OSPF                     110
IS-IS                    115
RIP                      120
EGP                      140
iBGP                     200
Unknown                  255
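
Choosing between sources then amounts to picking the smallest distance. The sketch below copies the values from the table above; the function name and the candidate format are hypothetical, and real routers let operators override these defaults.

```python
# Default administrative distances, as listed in the table above.
ADMIN_DISTANCE = {
    "connected": 0,
    "static": 1,
    "eigrp-summary": 5,
    "ebgp": 20,
    "ospf": 110,
    "is-is": 115,
    "rip": 120,
    "egp": 140,
    "ibgp": 200,
}
UNKNOWN_DISTANCE = 255


def preferred(candidates):
    """Pick the (protocol, next_hop) candidate with the smallest distance."""
    return min(candidates,
               key=lambda c: ADMIN_DISTANCE.get(c[0], UNKNOWN_DISTANCE))
```

For instance, a route learned through both OSPF and iBGP is installed from OSPF, whereas the same route learned through eBGP and OSPF is installed from eBGP.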
Back to BGP: VPN Support

[Figure: sites R21 to R24 (MY BLUE NETWORK) and R11 to R14 (MY GREEN NETWORK) attached through CE and PE routers to a provider backbone (P) that carries MY BLUE/GREEN VIRTUAL NETWORKS]

VPN Definition and Scope

• A set of “Sites” is attached to a common backbone network
– Subsets of this set form VPNs
– A common backbone delivers IP connectivity to sites belonging to
the same VPN
• Many possible VPN types
– Intranet: All sites belong to the same enterprise
– Extranet: Sites belong to different enterprises
• Sites can
– Belong to multiple VPNs
- Intranet and several different extranets
– Span broad geographical areas
- Routers within a site communicate directly (not through the common
backbone network)
• Policies determine which VPNs a site belongs to and what
routes it learns and can use
• Supporting all these requirements in a scalable and efficient
manner is challenging
– BGP/MPLS defines mechanisms to effectively realize VPNs

BGP/MPLS VPNs

• Two main components
1. MPLS as the tunneling technology (implementing VPN connectivity)
– Label stacking (two levels) for ease of scalable backbone
forwarding and easy VPN association
- Outer label identifies the egress backbone router connecting to
customer site (stripped upon reception at egress router)
- Inner label points to the VPN Routing and Forwarding (VRF) table
for the customer site at the egress router
2. BGP as the route distribution and installation mechanism
(controlling connectivity)
– Several extensions to allow transport and selective installation
and use of VPN routes across provider routers
- Which route goes into which VRF?

VPN Terminology and Configurations

• Three types of routers
– Provider (P) or backbone-only routers
– Provider Edge (PE) routers interface to customer sites
– Customer Edge (CE) routers attach to Service Provider routers
• P and PE routers form the Service Provider network
• CE routers belong to customer VPNs
– But do not peer directly with each other (they peer with PE routers)
• Sample VPN Configurations
– VPN1 and VPN2 intranets
- VPN1: R11, R12, and R13
- VPN2: R21, R22, and R23
– VPN3 extranet
- VPN2 sites connect to servers at R11 and R12 through firewall at R13
- VPN3: R21, R22, and R23 connect to R11 and R12 through R13

[Figure: sites R11 to R14 and R21 to R24 attached through CE and PE routers to the provider backbone (P)]

VPN Forwarding Overview

• PEs maintain multiple forwarding tables
– Default forwarding table
– VPN Routing and Forwarding tables (VRFs)
- Each VRF contains a specific subset of VPN routes
• At ingress PE, each PE-CE connection is associated with a VRF
– Incoming (from CE) packets are forwarded by looking up the
destination address in the corresponding (ingress) VRF
– Local (attached to same PE) packets are forwarded directly
– Remote (to other VPN site) packets are forwarded as MPLS packets
- VPN route label is assigned based on VRF content
- Tunnel label is pushed on top of label stack to enable delivery
of packet to “next hop” (PE) across the backbone
• Backbone (P) routers forward packets based on outer label
• At egress PE, tunnel label is removed and route label is
used to access appropriate VRF
– Forwarding may or may not require an additional VRF lookup
VPN Route Distribution

• PEs learn VPN routes from attached CEs
– Can use static routes or a routing protocol (RIP, OSPF, or BGP)
– Routes are installed in the associated VRF
• PEs convert routes into VPN-IPv4 routes by prepending a
Route Distinguisher (RD) to each of them
– Distinguishes between addresses from different VPNs
• PEs redistribute VPN routes to other PEs using MP-BGP
– PEs use their own address as the “BGP next hop”
– PEs assign an MPLS label to each route
- Multiple options for assigning labels to routes from the same VRF
– Export policies determine the set of Route Targets (RT - BGP
attribute similar to Community) associated with each route
– Import policies specify the RTs of routes eligible to be installed in a
given VRF

BGP Extensions in Support of VPNs

• MP-BGP (RFC 4760)
– Multi-Protocol BGP (MP-BGP) as a generic extension to BGP to
support protocols other than IPv4, including multiple address families
– VPN routes are viewed as a separate address family
• Carrying (MPLS) labels in BGP updates (RFC 3107)
– Where and how to associate one or more labels with a prefix
in a BGP update
– Binds routes to tunnels
• BGP/MPLS VPNs (RFC 4364)
– Use of BGP as a prefix distribution mechanism in support of
multiple VPNs over a common MPLS network
– Full specification of VPN support with MPLS and MP-BGP

Multiprotocol Extensions to BGP

• BGP-4 specifications include only three pieces of information
that are tied to IPv4
– NEXT_HOP (IPv4 address)
– AGGREGATOR (IPv4 address)
– NLRI (IPv4 prefix)
• Extending BGP to handle multiple protocols is achieved by
introducing two new attributes (optional, non-transitive, i.e., can be
ignored)
– Multiprotocol Reachable NLRI (MP_REACH_NLRI)
- Carries set of reachable destinations together with NEXT_HOP
– Multiprotocol Unreachable NLRI (MP_UNREACH_NLRI)
- Carries set of unreachable destinations
• Multiprotocol support is specified in capability advertisement
– Capability code set to 1
– Followed by list of supported address families (protocols)

MP_REACH_NLRI

• Provides protocol specific reachability information
– Advertises a feasible route together with the (network layer)
address of the next hop router
• Encoding is as follows

Field                                            Size
Address Family Identifier (AFI)                  2 bytes
Subsequent Address Family Identifier (SAFI)      1 byte
Length of Next Hop Network Address               1 byte
Network Address of Next Hop                      Variable
Reserved                                         1 byte
Network Layer Reachability Information (NLRI)    Variable
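
The field layout above maps directly onto a byte string. The sketch below only builds the attribute value (not the surrounding attribute header); the function name is hypothetical, and the next hop and NLRI are passed in as already-encoded bytes.

```python
import struct


def encode_mp_reach_nlri(afi: int, safi: int, next_hop: bytes, nlri: bytes) -> bytes:
    """Concatenate the MP_REACH_NLRI fields in the order listed above."""
    if len(next_hop) > 255:
        raise ValueError("next hop length must fit in one byte")
    header = struct.pack("!HBB", afi, safi, len(next_hop))  # AFI, SAFI, length
    reserved = b"\x00"                                      # 1 reserved byte
    return header + next_hop + reserved + nlri
```

For a VPN-IPv4 route (AFI=1, SAFI=128), the next hop would be a 12-byte value, so the fixed part of the encoding is 4 + 12 + 1 = 17 bytes before the NLRI.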
Specifying IPv4-VPN Routes
• AFI/SAFI field identifies the network layer protocol to
which the NEXT_HOP address belongs, and specifies
the NLRI semantic
• VPN-IPv4 address family
– AFI=1 (IP) and SAFI=128 for labeled VPN-IPv4 addresses
– Address format: 12-byte quantity
- 8-byte Route Distinguisher (RD) + 4-byte IPv4 address/prefix
• Route Distinguisher (2-byte type field, 6-byte value)
– Three defined types
- Type 0: 2-byte administrator subfield (AS number),
4-byte number field administered by AS owner
- Type 1: 4-byte administrator subfield (IP address),
2-byte number field administered by IP address owner
- Type 2: 4-byte administrator subfield (4-byte AS number),
2-byte number field administered by AS owner
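
The three RD layouts can be sketched with fixed-width packing. Each type must total 8 bytes (2-byte type plus 6-byte value), so type 1 pairs its 4-byte IP address with a 2-byte number, as in RFC 4364. Function names and the sample values in the usage note are hypothetical.

```python
import ipaddress
import struct


def encode_rd(rd_type: int, admin, number: int) -> bytes:
    """Encode an 8-byte Route Distinguisher of type 0, 1, or 2."""
    if rd_type == 0:   # 2-byte AS number + 4-byte assigned number
        return struct.pack("!HHI", 0, admin, number)
    if rd_type == 1:   # 4-byte IPv4 address + 2-byte assigned number
        return struct.pack("!H4sH", 1, ipaddress.IPv4Address(admin).packed, number)
    if rd_type == 2:   # 4-byte AS number + 2-byte assigned number
        return struct.pack("!HIH", 2, admin, number)
    raise ValueError("unknown Route Distinguisher type")


def vpn_ipv4_address(rd: bytes, ipv4: str) -> bytes:
    """Prepend the RD to an IPv4 address to form the 12-byte VPN-IPv4 address."""
    return rd + ipaddress.IPv4Address(ipv4).packed
```

As a usage example, `vpn_ipv4_address(encode_rd(0, 64512, 1), "10.1.0.0")` yields a 12-byte value; the same IPv4 address under a different RD yields a different VPN-IPv4 address, which is what keeps overlapping customer address spaces apart.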

NLRI Encoding

• NLRI is encoded as one or more triplets of the form
<length;label(s);prefix>
– Length is 1 byte and gives number of bits for label(s)+prefix
– Each label is encoded as 3 bytes with high-order 20 bits for label
and low-order bit as “bottom of stack” indicator
– Prefix is followed by don’t care bits to align on byte boundary
- Consists of RD + IPv4 prefix
– Prefix length and start position “deduced” from length field and
number of labels
- Keep stepping through labels until reaching bottom of
stack indicator in the label
- Remainder is prefix + padding bits with length field
providing information on the number of padding bits
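
The triplet encoding and the "step through labels" decoding rule can be sketched as below. Names are hypothetical; the sketch assumes an 8-byte RD, leaves the three bits between the label and the bottom-of-stack bit at zero, and does no validation beyond what is shown.

```python
import ipaddress


def encode_label(label: int, bottom: bool) -> bytes:
    """3 bytes: label in the high-order 20 bits, bottom-of-stack in the low bit."""
    return ((label << 4) | (1 if bottom else 0)).to_bytes(3, "big")


def encode_vpn_nlri(labels, rd: bytes, prefix: str) -> bytes:
    """Encode <length; label(s); RD + IPv4 prefix> with byte-aligned padding."""
    net = ipaddress.IPv4Network(prefix)
    prefix_bytes = net.network_address.packed[: (net.prefixlen + 7) // 8]
    stack = b"".join(encode_label(l, i == len(labels) - 1)
                     for i, l in enumerate(labels))
    length_bits = 24 * len(labels) + 8 * len(rd) + net.prefixlen
    return bytes([length_bits]) + stack + rd + prefix_bytes


def decode_labels(nlri: bytes):
    """Step through 3-byte labels until the bottom-of-stack bit is set."""
    labels, pos = [], 1  # skip the length byte
    while True:
        raw = int.from_bytes(nlri[pos:pos + 3], "big")
        labels.append(raw >> 4)
        pos += 3
        if raw & 1:
            return labels, pos  # pos is where RD + prefix starts
```

For one label, an 8-byte RD, and a /16 prefix, the length field is 24 + 64 + 16 = 104 bits and no padding bits are needed.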

Populating VRFs

• Routes installed in a given VRF come from
– Routes “received” from local CE routers, e.g., through eBGP
- Corresponding VRF is determined from router interface
– Routes learned from remote PE routers over iBGP
- New attribute (Route Target – RT) determines in which VRFs routes
are installed based on local policies
• Policies and Route Targets for VRF construction
– Similar approach as in standard BGP in using Community
attributes to implement policies
- RT defined as an Extended Community Attribute
– For local (associated with CEs attached to PE) routes, export
policies determine value(s) of RTs
- Local route is converted into a VPN-IPv4 route and added to the
corresponding VRF with one or more RT attributes
– Remote VPN-IPv4 routes received through BGP are installed in
local VRFs if one of their RTs matches a local import policy

Routing Information Flow

• Egress PE learns route associated with given CE
– Corresponding VRF is identified
– Route is converted to VPN-IPv4 route and RD value is assigned based on
VRF configuration, e.g., each VRF has its own RD
– RT attributes are assigned to route based on local export policies
• Egress PE communicates VPN-IPv4 route to MP-BGP peer
– Sets NEXT_HOP to its own address encoded as VPN-IPv4 address with
RD value of 0
– Assigns a label to route
- One label per VRF, or per outgoing interface, or per route
- Note that PE can aggregate routes before distribution; the label
identifying an aggregate route then calls for L3 lookup in VRF
• Ingress PE receives VPN-IPv4 route over MP-BGP session
– Route is installed in VRFs based on matching RT values to import policies
of each VRF
- Note that two VPN-IPv4 routes with the same prefix but different RD values
can both be installed in a given VRF
- Note that unless it is a Route Reflector, a PE should discard all routes that
have no RT attributes matching the import target of at least one VRF
– Tunnel (MPLS or not) is identified for NEXT_HOP of route

Forwarding Information Flow

• Packet arrives at ingress PE over interface associated with a given CE
– Corresponding VRF is identified based on incoming interface
– If a match is found for destination address, “next hop” is retrieved
• If the next hop is on same PE, the packet is forwarded without pushing
any new label onto the packet’s label stack (if any)
– Note that if egress interface is associated with a different VRF, and the
matching route is an aggregate, an additional lookup in the egress VRF
may be required
• If the next hop is a remote BGP next hop
– The packet is converted into an MPLS packet with the corresponding VPN
route label
– The next hop “tunnel” information (MPLS label) is retrieved and pushed on
top of the packet’s label stack
– The packet is forwarded to the tunnel’s next hop
• At the next hop (egress PE) the packet treatment depends on the label
– The label can identify
- An egress interface together with the corresponding link layer header
- A VRF in which to look up the destination address
– The packet is ultimately forwarded on egress interface

Route Distribution Through Reflectors

• Use of Route Reflectors is again motivated by scalability
• However, RRs need to maintain routing information
for VPNs for which they have NO attachments
– In general, RRs accept ALL routes received from client PEs,
provided they carry RT attributes from a “given” set
- Set can be configured or learned
- Routes with RTs not in that set can be (inbound) filtered
– Main difference with BGP is that RRs are not really applying
a decision process to inbound routes and advertising to
clients the output of their decision process
- Closer to passive reflectors

Sample VPNs – Closed Mesh (1)
• VPN1: 4 fully inter-connected sites
• Basic configuration at PEs
– RD1 value identifies VPN1
– RT value of T1 for all VRF1 export and import policies
• VRF construction at PE1 (VRF1)
– Learns route 10.1.0.0/16 from CE1, and installs in VRF1
– Exports <RD1,10.1.0.0/16;T1;L1;PE1> to BGP (Next_Hop self (PE1)
and label L1)
– Advertises <RD1,10.1.0.0/16;T1;L1;PE1> to PE2, PE3 and PE4
– Receives the following routes and installs them in VRF1
- <RD1,10.0.0.0/16;T1;L0;PE4> from PE4
- <RD1,10.3.0.0/16;T1;L3;PE3> from PE3
- <RD1,10.2.0.0/16;T1;L2;PE2> from PE2

[Figure: CE1 (10.1.0.0/16) attached to PE1, CE2 (10.2.0.0/16) to PE2, CE3 (10.3.0.0/16) to PE3, and CE4 (10.0.0.0/16) to PE4; the PEs connect through backbone router P]

Sample VPNs – Closed Mesh (2)
• Forwarding of packet to 10.0.0.1 from PE1
– Packet received from CE1
– Lookup in VRF1 at PE1
- 10.0.0.0/16 as best route with Next_Hop of PE4
– Packet sent as MPLS packet with label stack of <L(PE4),L0>
– Packet to PE4 delivered through MPLS backbone based on label L(PE4)
– PE4 pops label stack to expose L0
- L0 identifies CE4 as packet destination
– PE4 forwards packet to CE4 as standard IP packet (removes L0)

[Figure: CE1 (10.1.0.0/16) attached to PE1, CE2 (10.2.0.0/16) to PE2, CE3 (10.3.0.0/16) to PE3, and CE4 (10.0.0.0/16) to PE4; the PEs connect through backbone router P]
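
The forwarding walk-through above can be traced in a few lines. This is a sketch with hypothetical names: VRF1 at PE1 is a dictionary holding the three remote routes from the example, labels are plain strings, and the backbone's label switching is not modeled.

```python
import ipaddress

# VRF1 at PE1: prefix -> (BGP next hop PE, VPN route label).
VRF1_AT_PE1 = {
    "10.0.0.0/16": ("PE4", "L0"),
    "10.2.0.0/16": ("PE2", "L2"),
    "10.3.0.0/16": ("PE3", "L3"),
}
TUNNEL_LABELS = {"PE2": "L(PE2)", "PE3": "L(PE3)", "PE4": "L(PE4)"}
ROUTE_LABEL_TO_CE_AT_PE4 = {"L0": "CE4"}


def ingress_label_stack(dst, vrf, tunnel_labels):
    """Longest-prefix match in the VRF, then push route and tunnel labels."""
    addr = ipaddress.ip_address(dst)
    matches = [ipaddress.ip_network(p) for p in vrf
               if addr in ipaddress.ip_network(p)]
    if not matches:
        raise LookupError(f"no route to {dst} in this VRF")
    best = max(matches, key=lambda net: net.prefixlen)
    next_hop, route_label = vrf[str(best)]
    return [tunnel_labels[next_hop], route_label]  # outer label first


def egress_deliver(stack, label_to_ce):
    """Pop the tunnel label; the exposed route label selects the CE."""
    route_label = stack[1]
    return label_to_ce[route_label]
```

A packet to 10.0.0.1 gets the stack `["L(PE4)", "L0"]` at PE1, and PE4 maps the exposed `L0` to CE4.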
Sample VPNs – Hub and Spoke
• VPN2: All connectivity through CE1
• Basic configuration at PEs
– RD1 value identifies VPN2
– Two route targets are defined: TH (hub) and TS (spoke)
– At the VRFs attached to the hub site (PE1), TH is the Export target
and TS the Import target
– At the VRFs attached to the spoke sites (PE2, PE3, and PE4), TS is
the Export target and TH the Import target
• VRFs construction
– PEs associated with spoke sites
- Receive routes from their CEs and export them to PE1 with target TS
- Receive routes from PE1 with target TH and import them in the VRF
of their CEs
– PE1
- Receives routes from spoke PEs with target TS and installs them in
CE1’s VRF
- Exports routes (back) to spoke PEs with target TH

[Figure: CE1 (10.1.0.0/16) attached to PE1, CE2 (10.2.0.0/16) to PE2, CE3 (10.3.0.0/16) to PE3, and CE4 (10.0.0.0/16) to PE4; the PEs connect through backbone router P]
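
The effect of the TH/TS assignment can be checked with a small model of import and export targets. The dictionary layout and function name are hypothetical; the policy values follow the configuration described above.

```python
# Export and import Route Targets per PE, as configured above.
VRF_POLICY = {
    "PE1": {"export": {"TH"}, "import": {"TS"}},  # hub
    "PE2": {"export": {"TS"}, "import": {"TH"}},  # spokes
    "PE3": {"export": {"TS"}, "import": {"TH"}},
    "PE4": {"export": {"TS"}, "import": {"TH"}},
}


def importers(exporting_pe, policy=VRF_POLICY):
    """List the PEs whose import targets match the exporter's route targets."""
    exported = policy[exporting_pe]["export"]
    return sorted(pe for pe, targets in policy.items()
                  if pe != exporting_pe and targets["import"] & exported)
```

Routes exported by a spoke such as PE2 are imported only at PE1, while routes exported by PE1 are imported at every spoke, so spoke-to-spoke traffic has to pass through the hub site.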
