
Chapter 4

Internetworking
4.1 Simple Internetworking (IP)
4.2 Routing
4.3 Global Internet
4.4 Multicast
4.5 Multiprotocol Label Switching (MPLS)

4.1 Simple Internetworking (IP)

Best Effort Service Model


Global Addressing Scheme
ARP (Address Resolution Protocol)
ICMP (Internet Control Message Protocol)

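IP Internet
Concatenation of Networks
[Figure: hosts H1-H8 and routers R1-R3 interconnecting Network 1 (Ethernet), Network 2 (Ethernet), Network 3 (FDDI), and Network 4 (point-to-point)]
Protocol Stack
[Figure: protocol layers on the path from H1 to H8]
H1: TCP | IP | ETH
R1: IP over ETH and FDDI
R2: IP over FDDI and PPP
R3: IP over PPP and ETH
H8: TCP | IP | ETH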

Service Model
Connectionless (datagram-based)
Best-effort delivery (unreliable service)
packets may be lost
packets may be delivered out of order
duplicate copies of a packet may be delivered
packets can be delayed for a long time
Datagram format (32-bit rows; bit positions 0, 16, 19, 31 marked)
Version | HLen | TOS | Length
Ident | Flags | Offset
TTL | Protocol | Checksum
SourceAddr
DestinationAddr
Options (variable) | Pad (variable)
Data
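
To make the header layout concrete, here is a minimal Python sketch that decodes the fixed 20-byte part of an IPv4 header. The function name, the dictionary keys, and the sample values are assumptions chosen for illustration; options and checksum verification are deliberately left out.

```python
import struct


def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (network byte order)."""
    (ver_hlen, tos, length, ident, flags_off,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_hlen >> 4,
        "hlen_words": ver_hlen & 0x0F,        # header length in 32-bit words
        "tos": tos,
        "length": length,                     # total datagram length in bytes
        "ident": ident,
        "more_fragments": bool(flags_off & 0x2000),
        "offset_units": flags_off & 0x1FFF,   # fragment offset in 8-byte units
        "ttl": ttl,
        "protocol": protocol,
        "checksum": checksum,
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
    }


# Hypothetical header values, for illustration only (checksum left at 0).
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 532, 7, 0x2000 | 64, 64, 6, 0,
                     bytes([128, 96, 34, 15]), bytes([128, 96, 33, 14]))
header = parse_ipv4_header(sample)
```

The first byte packs Version and HLen together, and the 16-bit word after Ident packs the three flag bits with the 13-bit Offset, which is why both are split with shifts and masks.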

Fragmentation and Reassembly


Each network has some MTU
Design decisions
fragment when necessary (MTU < Datagram)
try to avoid fragmentation at source host
re-fragmentation is possible
fragments are self-contained datagrams
use CS-PDU (not cells) for ATM
delay reassembly until destination host
do not recover from lost fragments

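Example
H1 sends a datagram carrying 1400 data bytes to H8 along the path H1, R1, R2, R3, H8
H1 to R1: ETH | IP (1400)
R1 to R2: FDDI | IP (1400)
R2 to R3: PPP | IP (512), PPP | IP (512), PPP | IP (376)
R3 to H8: ETH | IP (512), ETH | IP (512), ETH | IP (376)
(a) Unfragmented datagram
Start of header | Ident = x | M = 0 | Offset = 0 | Rest of header | 1400 data bytes
(b) Fragments
Start of header | Ident = x | M = 1 | Offset = 0 | Rest of header | 512 data bytes
Start of header | Ident = x | M = 1 | Offset = 64 | Rest of header | 512 data bytes
Start of header | Ident = x | M = 0 | Offset = 128 | Rest of header | 376 data bytes
M is the "more fragments" flag; Offset counts 8-byte units (64 x 8 = 512, 128 x 8 = 1024)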
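
The fragment sizes and offsets in this example can be reproduced with a short Python sketch. It assumes a 20-byte IP header with no options and an MTU of 532 bytes on the point-to-point link (512 data bytes plus the header); the function name and return shape are hypothetical.

```python
def fragment(data_len: int, mtu: int, header_len: int = 20):
    """Return one (offset_in_8_byte_units, data_bytes, more_flag) per fragment."""
    # Every fragment except the last must carry a multiple of 8 data bytes,
    # because the Offset field counts 8-byte units.
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    sent = 0
    while sent < data_len:
        size = min(max_data, data_len - sent)
        more = sent + size < data_len      # M flag: more fragments follow
        fragments.append((sent // 8, size, more))
        sent += size
    return fragments


# 1400 data bytes over a link with an assumed MTU of 532 bytes.
frags = fragment(1400, 532)
```

Each tuple corresponds to one fragment header: the Ident value is copied unchanged into every fragment, so only the offset, length, and M flag differ.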

Global Addresses
Properties
globally unique
hierarchical: network + host
Dot Notation
10.3.2.4
128.96.33.81
192.12.69.77

Address formats
(a) Class A: 7-bit network, 24-bit host
(b) Class B: 14-bit network, 16-bit host
(c) Class C: 21-bit network, 8-bit host

Datagram Forwarding
Strategy
every datagram contains the destination's address
if connected to destination network, then forward to host
if not directly connected, then forward to some router
forwarding table maps network number into next hop
each host has a default router
each router maintains a forwarding table
Example (R2)
Network Number   Next Hop
1                R3
2                R1
3                interface 1
4                interface 0

Address Translation
Map IP addresses into physical addresses
destination host
next hop router
Techniques
encode physical address in host part of IP address
table-based
ARP
table of IP to physical address bindings
broadcast request if IP address not in table
target machine responds with its physical address
table entries are discarded if not refreshed

ARP Details
Request Format
HardwareType: type of physical network (e.g., Ethernet)
ProtocolType: type of higher layer protocol (e.g., IP)
HLEN & PLEN: length of physical and protocol addresses
Operation: request or response
Source/Target-Physical/Protocol addresses
Notes
table entries time out in about 10 minutes
update table with the source's binding when you are the target
update table if you already have an entry for the source
do not refresh table entries upon reference

ARP Packet Format (32-bit rows; bit positions 0, 16, 31 marked)
Hardware type = 1 | ProtocolType = 0x0800
HLen = 48 | PLen = 32 | Operation
SourceHardwareAddr (bytes 0-3)
SourceHardwareAddr (bytes 4-5) | SourceProtocolAddr (bytes 0-1)
SourceProtocolAddr (bytes 2-3) | TargetHardwareAddr (bytes 0-1)
TargetHardwareAddr (bytes 2-5)
TargetProtocolAddr (bytes 0-3)

Internet Control Message Protocol (ICMP)
Echo (ping)
Redirect (from router to source host)
Destination unreachable (protocol, port, or host)
TTL exceeded (so datagrams don't cycle forever)
Checksum failed
Reassembly failed
Cannot fragment

Redirect
[Figure: hosts H1 and H2, gateways G1 and G2, and the networks connecting them; steps (1) and (2) marked]
G2 finds that H1 is directly connected and will inform H1 to redirect the IP datagrams to G2.

4.2 Routing
Forwarding vs Routing
forwarding: to select an output port based on
destination address and routing table
routing: process by which routing table is built
Network as a Graph
[Figure: weighted graph; surviving labels: nodes A, B, C, D and edge costs 3, 4, 9, 1]
Problem: Find lowest cost path between two nodes


Factors
static: topology
dynamic: load

Distance Vector
Each node maintains a set of triples
(Destination, Cost, NextHop)
Directly connected neighbors exchange updates
periodically (on the order of several seconds)
whenever table changes (called triggered update)
Each update is a list of pairs:
(Destination, Cost)
Update local table if a better route is received
smaller cost
came from the current next hop
Refresh existing routes; delete if they time out
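
The update rule can be sketched in Python as follows. This is a minimal illustration, not a full protocol: the table layout (destination mapped to a cost and next hop), the function name, the unit link cost, and the use of 16 as "infinity" are all assumptions, and route timeouts are omitted.

```python
INFINITY = 16  # assumed "unreachable" cost (the value RIP uses)


def merge_update(table, neighbor, update, link_cost=1):
    """Merge a neighbor's (Destination, Cost) pairs into our routing table.

    table maps destination -> (cost, next_hop). Returns True if it changed.
    """
    changed = False
    for dest, cost in update:
        new_cost = min(cost + link_cost, INFINITY)
        old = table.get(dest)
        better = new_cost < (old[0] if old else INFINITY)
        # A route advertised by our current next hop must be followed even
        # when it gets worse, because that is the path we are actually using.
        from_next_hop = (old is not None and old[1] == neighbor
                         and new_cost != old[0])
        if better or from_next_hop:
            table[dest] = (new_cost, neighbor)
            changed = True
    return changed


# Hypothetical table at a node B and an update received from neighbor C.
table_b = {"A": (1, "A"), "C": (1, "C"), "G": (3, "A")}
changed = merge_update(table_b, "C", [("G", 1), ("A", 5)])
```

A return value of True is the point at which a real implementation would send a triggered update to its own neighbors.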

Routing Table Example (Node B)
[Figure: example network; surviving node labels: A, B, C, D, E]
Destination   Cost   NextHop
A             1      A
C             1      C
D             2      C
E             2      A
F             2      A
G             3      A

Routing Loops
Example 1
F detects that link to G has failed
F sets distance to G to infinity and sends update to A
A sets distance to G to infinity since it uses F to reach G
A receives periodic update from C with 2-hop path to G
A sets distance to G to 3 and sends update to F
F decides it can reach G in 4 hops via A

Routing Loops
Example 2
link from A to E fails
A advertises distance of infinity to E
B and C advertise a distance of 2 to E
B decides it can reach E in 3 hops; advertises this to A
A decides it can reach E in 4 hops; advertises this to C
C decides that it can reach E in 5 hops

Distance Vector: link cost changes


Link cost changes:
node detects local link cost change
updates routing info, recalculates distance vector
if DV changes, notify neighbors
"good news travels fast"
[Figure: nodes x, y, z; surviving link cost label: 50]
At time t0, y detects the link-cost change, updates its DV, and informs its neighbors.
At time t1, z receives the update from y and updates its table. It computes a new least cost to x and sends its neighbors its DV.
At time t2, y receives z's update and updates its distance table. y's least costs do not change, and hence y does not send any message to z.

Distance Vector: link cost changes


[Figure: distance tables Dy and Dz over successive iterations; good news travels fast and the algorithm terminates]

Distance Vector: link cost changes


Link cost changes:
bad news travels slow: "count to infinity" problem!
44 iterations before algorithm stabilizes
z does not know that the least distance to x that y reports is the cost of the path y-z-y-x; likewise, y does not know that the distance z reports is the cost of the path z-y-x
[Figure: nodes x, y, z; surviving link cost labels: 60 and 50; the algorithm continues on]

Distance Vector: poisoned reverse


If Z routes through Y to get to X:
Z tells Y its (Z's) distance to X is infinite (so Y won't route to X via Z)
Will this completely solve the count-to-infinity problem?
No: loops involving three or more nodes cannot be resolved with this technique
[Figure: nodes X, Y, Z; surviving link cost labels: 60 and 50; the algorithm terminates]

RIP (Routing Information Protocol)
Distance vector algorithm
Included in BSD-UNIX Distribution in 1982
Distance metric: # of hops (max = 15 hops)
[Figure: example network; surviving labels: u, x, y]
Source node: A
destination   hops
u             1
v             2
w             2
x             3
y             3
z             2

RIP advertisements
Distance vectors: exchanged among neighbors every 30 sec via Response Message (also called advertisement)
Each advertisement: a list of up to 25 destination subnets within AS
RIP packet format (32-bit rows; bit positions 8, 16, 31 marked)
Command | Version | Must be zero
Family of net 1 | Address of net 1
Address of net 1
Distance to net 1
Family of net 2 | Address of net 2
Address of net 2
Distance to net 2

RIP: Example
[Figure: example network; surviving labels: z, C]
Routing table in D
Destination Network   Next Router   Num. of hops to dest.
w                     A             2
y                     B             2
z                     B             7
x                     --            1
....

RIP: Example
Advertisement from A to D (destinations w, x, z, ...): z is reachable via next router C in 4 hops
Routing table in D after the advertisement
Destination Network   Next Router      Num. of hops to dest.
w                     A                2
y                     B                2
z                     A (was B)        5 (was 7)
x                     --               1
....

RIP: Link Failure and Recovery


If no advertisement heard after 180 sec --> neighbor or link declared dead
routes via neighbor invalidated
new advertisements sent to neighbors
neighbors in turn send out new advertisements (if tables changed)
link failure info quickly propagates to entire net
poison reverse used to prevent ping-pong loops (infinite distance = 16 hops)

RIP Table processing


RIP routing tables managed by application-level process called route-d (daemon)
advertisements sent in UDP packets, periodically repeated
[Figure: two routers, each running routed above Transport (UDP), network (IP) with its forwarding table, link, and physical layers]

Link State
Strategy
send to all nodes (not just neighbors)
information about directly connected links (not
entire routing table)
Link State Packet (LSP)
id of the node that created the LSP
cost of link to each directly connected neighbor
sequence number (SEQNO)
time-to-live (TTL) for this packet


Link State (cont.)
Reliable flooding
store most recent LSP from each node
forward LSP to all neighbors except the one that sent it
generate new LSP periodically
increment SEQNO
start SEQNO at 0 on reboot
decrement TTL of each stored LSP
discard when TTL=0
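
The store-and-forward decision in reliable flooding can be sketched in Python as below. The dictionary-based LSP representation and the function name are assumptions for illustration; acknowledgments, retransmission, and TTL aging of stored LSPs are omitted.

```python
def receive_lsp(store, lsp, from_neighbor, neighbors):
    """Return the neighbors to which a received LSP should be forwarded.

    store maps node id -> most recent LSP seen from that node.
    lsp is a dict with keys "id", "seqno", "ttl", and "links".
    """
    if lsp["ttl"] <= 0:
        return []                       # expired: discard
    current = store.get(lsp["id"])
    if current is not None and lsp["seqno"] <= current["seqno"]:
        return []                       # old or duplicate copy: do not forward
    store[lsp["id"]] = lsp              # keep only the most recent LSP per node
    return [n for n in neighbors if n != from_neighbor]


# Hypothetical node with neighbors A, C, D receiving an LSP created by X via A.
store = {}
lsp_x = {"id": "X", "seqno": 1, "ttl": 10, "links": {"A": 1, "C": 2}}
first = receive_lsp(store, lsp_x, "A", ["A", "C", "D"])
second = receive_lsp(store, lsp_x, "C", ["A", "C", "D"])
```

The second call models the same LSP arriving again over another link; because its sequence number is not newer than the stored copy, the flood stops there.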

Reliable Flooding
[Figure: four stages (a)-(d) of flooding an LSP from node X through the network]

Route Calculation
Dijkstra's shortest path algorithm
Let
N denote the set of nodes in the graph
l(i, j) denote the non-negative cost (weight) for edge (i, j)
s denote this node
M denote the set of nodes incorporated so far
C(n) denote the cost of the path from s to node n

M = {s}
for each n in N - {s}
    C(n) = l(s, n)
while (N != M)
    M = M union {w} such that C(w) is the minimum for all w in (N - M)
    for each n in (N - M)
        C(n) = MIN(C(n), C(w) + l(w, n))

A Link-State Routing Algorithm


Dijkstra's algorithm
net topology, link costs known to all nodes
accomplished via "link state broadcast"
all nodes have same info
computes least cost paths from one node ("source") to all other nodes
gives forwarding table for that node
iterative: after k iterations, know least cost path to k destinations

Notation:
c(x,y): link cost from node x to y; = ∞ if not direct neighbors
D(v): current value of cost of path from source to destination v
p(v): predecessor node along path from source to v
N': set of nodes whose least cost path definitively known

Dijkstra's Algorithm
Initialization:
    u: source node
    N' = {u}
    for all nodes v
        if v adjacent to u
            then D(v) = c(u,v)
            else D(v) = ∞

Loop
    find w not in N' such that D(w) is a minimum
    add w to N'
    update D(v) for all v adjacent to w and not in N':
        D(v) = min( D(v), D(w) + c(w,v) )
    /* new cost to v is either old cost to v or known
       shortest path cost to w plus cost from w to v */
until all nodes in N'
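
The pseudocode above can be turned into a short runnable Python sketch. It uses a binary heap to find the minimum D(w), which is one of the more efficient implementations rather than the linear scan the pseudocode implies. The function name and the example link costs are assumptions for illustration only.

```python
import heapq


def dijkstra(graph, source):
    """Return (D, p): least cost to each node and its predecessor on that path."""
    D = {source: 0}
    p = {}
    done = set()                  # plays the role of N'
    heap = [(0, source)]
    while heap:
        cost, w = heapq.heappop(heap)
        if w in done:
            continue              # stale heap entry for an already-final node
        done.add(w)
        for v, c in graph[w].items():
            if v not in done and cost + c < D.get(v, float("inf")):
                D[v] = cost + c   # D(v) = min(D(v), D(w) + c(w,v))
                p[v] = w
                heapq.heappush(heap, (cost + c, v))
    return D, p


def undirected(edges):
    """Build an adjacency map from (node, node, cost) triples."""
    graph = {}
    for a, b, c in edges:
        graph.setdefault(a, {})[b] = c
        graph.setdefault(b, {})[a] = c
    return graph


# Assumed link costs for a six-node example network.
example = undirected([("u", "v", 2), ("u", "x", 1), ("u", "w", 5),
                      ("v", "x", 2), ("v", "w", 3), ("x", "w", 3),
                      ("x", "y", 1), ("w", "y", 1), ("w", "z", 5),
                      ("y", "z", 2)])
D, p = dijkstra(example, "u")
```

Following p back from any destination to the source yields the least-cost path, and the first hop on that path is the forwarding-table entry for the destination.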

Dijkstra's algorithm: example
Step   N'       D(v),p(v)   D(w),p(w)   D(x),p(x)   D(y),p(y)   D(z),p(z)
0      u        2,u         5,u         1,u         ∞           ∞
1      ux       2,u         4,x                     2,x         ∞
2      uxy      2,u         3,y                                 4,y
3      uxyv                 3,y                                 4,y
4      uxyvw                                                    4,y
5      uxyvwz
[Figure: six-node example graph (u, v, w, x, y, z) with link costs]


Dijkstra's algorithm, discussion
Algorithm complexity: n nodes
each iteration: need to check all nodes, w, not in N'
n(n+1)/2 comparisons: O(n^2)
more efficient implementations possible: O(n log n)
Oscillations possible:
e.g., link cost = amount of carried traffic
[Figure: four nodes A, B, C, D with traffic-dependent link costs (0, e, 1, 1+e, 2+e); panels: initially, recompute routing, recompute, recompute]

OSPF (Open Shortest Path First)


"open": publicly available, defined in RFC 2328
Uses Link State algorithm
Link-State packet dissemination
Topology map at each node
Route computation using Dijkstra's algorithm
OSPF advertisement carries one entry per neighbor router
Advertisements disseminated to entire AS (via flooding)
Carried in OSPF messages directly over IP (rather than TCP or UDP)

OSPF advanced features (not in RIP)


Security: all OSPF messages authenticated (to prevent malicious intrusion)
Load Balancing: Multiple same-cost paths allowed (only one path in RIP)
For each link, multiple cost metrics for different TOS (e.g., satellite link cost set "low" for best effort; high for real time)
Integrated uni- and multicast support:
Multicast OSPF (MOSPF) uses same topology database as OSPF
Hierarchical OSPF in large domains.

Hierarchical OSPF
An OSPF autonomous system (AS) can be configured into areas
Exactly one OSPF area in the AS is configured to be the backbone area
Each area runs its own OSPF link-state routing algorithm
Two-level hierarchy: local area, backbone.
Link-state advertisements only in area
each node has detailed area topology; it only knows the direction (shortest path) to nets in other areas.

Hierarchical OSPF

Four types of routers
Internal routers: perform only intra-AS routing
Area border routers: belong to both an area and the backbone
Backbone routers: run OSPF routing limited to backbone.
Boundary routers: connect to other ASs.

OSPF Advertisement Format


Header Format (32-bit rows; bit positions 0, 16, 31 marked)
Version | Type | Message length
SourceAddr
AreaId
Checksum | Authentication type
Authentication

Link-State Advertisement
LS Age | Options | Type=1
Link-state ID
Advertising router
LS sequence number
LS checksum | Length
Flags | Number of links
Link ID
Link data
Link type | Num_TOS | Metric
Optional TOS information
More links

Comparison of LS and DV algorithms


Message complexity
LS: with n nodes, E links, O(nE) messages sent
DV: exchange between neighbors only; convergence time varies

Speed of Convergence
LS: O(n^2) algorithm requires O(nE) messages; may have oscillations
DV: convergence time varies; may be routing loops; count-to-infinity problem

Robustness: what happens if router malfunctions?
LS: node can advertise incorrect link cost; each node computes only its own table
DV: DV node can advertise incorrect path cost; each node's table used by others; errors propagate through network

Metrics
Original ARPANET metric
measures number of packets queued on each link
took neither latency nor bandwidth into consideration
New ARPANET metric
stamp each incoming packet with its arrival time (AT)
record departure time (DT)
when link-level ACK arrives, compute
Delay = (DT - AT) + Transmit + Latency
if timeout, reset DT to departure time for retransmission
link cost = average delay over some time period

Metrics
Still has problems
Under light load, it worked well, since the two static factors of delay dominated the cost.
Under heavy load, a congested link would start to advertise a very high cost. This caused all the traffic to move off that link, leaving it idle, so it then advertised a low cost, attracting the traffic back.
The range of link values was much too large.
Fine Tuning
compressed dynamic range
replaced Delay with link utilization

Revised ARPANET routing metric versus link utilization
[Figure: link cost (axis ticks 30, 60, 75, 90, 140, 225) versus utilization (25%, 50%, 75%, 100%) for four link types: 9.6-Kbps satellite link, 9.6-Kbps terrestrial link, 56-Kbps satellite link, 56-Kbps terrestrial link]

Revised ARPANET routing metric versus link utilization
A highly loaded link never shows a cost of more than three times its cost when idle
The most expensive link is only seven times the cost of the least expensive
A high-speed satellite link is more attractive than a low-speed terrestrial link
Cost is a function of link utilization only at moderate to high loads.

4.3 Global Internet Structure


Tree Structure of the Internet in 1990
[Figure: NSFNET backbone connecting regional networks (BARRNET, Westnet, MidNet), which in turn connect end-user sites (Stanford, Berkeley, PARC, UNM, NCAR, ISU, UNL, KU, UA)]

Global Internet
One of the salient features of this topology is that it consists of end user sites (e.g., Stanford University) that connect to service provider networks (e.g., BARRNET)
Each provider and end user is likely to be an administratively independent entity: an Autonomous System (AS).
Scalability problems
Scalability of routing
Address utilization
Subnetting deals with address space utilization
Classless routing or supernetting tackles both address utilization and routing scalability

Subnetting
Inefficient use of Hierarchical Address Space
class C with 2 hosts (2/255 = 0.78% efficient)
class B with 256 hosts (256/65535 = 0.39% efficient)
Still Too Many Networks
routing tables do not scale
route propagation protocols do not scale
Subnetting provides an elegantly simple way to reduce the total number of networks that are assigned
The idea is to take a single IP network number and allocate the IP addresses with that network number to several physical networks (subnets).

Subnetting
Add another level to address/routing hierarchy: subnet
Subnet masks define variable partition of host part
A single network number can be shared among multiple physical networks; this involves configuring all the nodes on each subnet with a subnet mask.
Subnets visible only within site
Class B address: Network number | Host number
Subnet mask (255.255.255.0): 111111111111111111111111 | 00000000
Subnetted address: Network number | Subnet ID | Host ID

Subnet Example
Subnet number: 128.96.34.0, subnet mask: 255.255.255.128
    H1: 128.96.34.15
    R1: 128.96.34.1
Subnet number: 128.96.34.128, subnet mask: 255.255.255.128
    R1: 128.96.34.130
    H2: 128.96.34.139
    R2: 128.96.34.129
Subnet number: 128.96.33.0, subnet mask: 255.255.255.0
    R2: 128.96.33.1
    H3: 128.96.33.14
H1 sending to H2: 128.96.34.139 AND 255.255.255.128 = 128.96.34.128

Forwarding table at router R1
Subnet Number    Subnet Mask        Next Hop
128.96.34.0      255.255.255.128    interface 0
128.96.34.128    255.255.255.128    interface 1
128.96.33.0      255.255.255.0      R2

Forwarding Algorithm
D = destination IP address
for each entry (SubnetNum, SubnetMask, NextHop)
    D1 = SubnetMask & D
    if D1 = SubnetNum
        if NextHop is an interface
            deliver datagram directly to D
        else
            deliver datagram to NextHop

Use a default router if nothing matches
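
As a runnable illustration of this forwarding algorithm, the Python sketch below applies it to the three-entry forwarding table of router R1 from the subnet example. The function name and the "default router" placeholder are assumptions; a real router would return an interface or neighbor handle rather than a string.

```python
import ipaddress


def next_hop(dest, table, default="default router"):
    """Return the NextHop of the first entry whose masked address matches."""
    d = int(ipaddress.IPv4Address(dest))
    for subnet_num, subnet_mask, hop in table:
        mask = int(ipaddress.IPv4Address(subnet_mask))
        # D1 = SubnetMask & D; a match means D lies on that subnet.
        if d & mask == int(ipaddress.IPv4Address(subnet_num)):
            return hop
    return default


# Forwarding table at router R1: (SubnetNum, SubnetMask, NextHop).
R1_TABLE = [
    ("128.96.34.0", "255.255.255.128", "interface 0"),
    ("128.96.34.128", "255.255.255.128", "interface 1"),
    ("128.96.33.0", "255.255.255.0", "R2"),
]

hop_for_h2 = next_hop("128.96.34.139", R1_TABLE)
```

For 128.96.34.139 the first entry yields 128.96.34.128 after masking, which does not equal 128.96.34.0, so the search moves on to the second entry, which matches.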


Not necessary for all 1s in subnet mask to be contiguous
Can put multiple subnets on one physical network
Subnets not visible from the rest of the Internet

Classless Routing (CIDR) Supernetting


CIDR: Classless Inter-Domain Routing
A technique that addresses two scaling concerns:
the growth of backbone routing tables, and
the potential for the 32-bit IP address space to be exhausted well before the 4 billionth host is attached to the Internet.
Even though subnetting can help to assign addresses carefully, it does not get around the fact that any AS with more than 255 hosts wants a class B address, which leads to exhaustion of the IP address space.

Classless Routing (CIDR) Supernetting


CIDR tries to balance the desire to minimize the number of routes that a router needs to know against the need to hand out addresses efficiently
Assign block of contiguous network numbers to nearby networks
Represent blocks with a single pair (first_network_address, count)
Restrict block sizes to powers of 2
Use a bit mask (CIDR mask) to identify block size
All routers must understand CIDR addressing

Route aggregation with CIDR


[Figure: an ISP serving customers 128.112.128/24 through 128.112.135/24 and advertising the single prefix 128.112.128/21]
Since all of the customers are reachable through the same provider network, it can advertise a single route to all of them by just advertising the common 21-bit prefix they share

IP Forwarding Revisited
Find the network number in a packet and then look up that number in a forwarding table.
Reexamine this assumption with CIDR
Prefix lengths of 2-32 bits
Prefixes may overlap
Some addresses may match more than one prefix.
Longest Prefix Matching (LPM)
For example
171.69 (16-bit prefix)
171.69.10 (24-bit prefix)
171.69.10.5 matches both
171.69.20.5 only matches 171.69
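
Longest prefix matching on these two prefixes can be sketched with Python's standard ipaddress module. The linear scan below is only meant to show the rule; the function name is an assumption, and production routers use specialized structures such as tries instead.

```python
import ipaddress


def longest_prefix_match(dest, prefixes):
    """Return the most specific prefix containing dest, or None if none does."""
    addr = ipaddress.ip_address(dest)
    best = None
    for prefix in prefixes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best.prefixlen):
            best = net
    return best


# The two overlapping prefixes from the example, written in full CIDR form.
PREFIXES = ["171.69.0.0/16", "171.69.10.0/24"]

match_a = longest_prefix_match("171.69.10.5", PREFIXES)
match_b = longest_prefix_match("171.69.20.5", PREFIXES)
```

Both prefixes contain 171.69.10.5, so the 24-bit one wins; only the 16-bit prefix contains 171.69.20.5.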


Interdomain Routing (BGP)


AS = routing domain
Routing Policies
Two major interdomain routing protocols
-- Exterior Gateway Protocol (EGP)
-- Border Gateway Protocol (BGP-4)
[Figure: Autonomous system 1 and Autonomous system 2, containing routers R1-R6]

BGP-4: Border Gateway Protocol


AS Types
stub AS: has a single connection to one other AS
carries local traffic only
multihomed AS: has connections to more than one AS
refuses to carry transit traffic
transit AS: has connections to more than one AS
carries both transit and local traffic
Each AS has:
one or more border routers
one BGP speaker that advertises:
local networks
other reachable networks (transit AS only)
gives path information

Today's multibackbone Internet
[Figure: backbone service providers interconnected at peering points, with consumer ISPs, large corporations, and a small corporation attached]

BGP Example
[Figure: Backbone network (AS 1) connects Regional provider A (AS 2) and Regional provider B (AS 3). Customers of A: P (AS 4) with 128.96 and 192.4.153; Q (AS 5) with 192.4.32 and 192.4.3. Customers of B: R (AS 6) with 192.12.69; S (AS 7) with 192.4.54 and 192.4.23.]
Speaker for AS2 advertises reachability to P and Q
networks 128.96, 192.4.153, 192.4.32, and 192.4.3 can be reached directly from AS2
Speaker for backbone advertises
networks 128.96, 192.4.153, 192.4.32, and 192.4.3 can be reached along the path (AS1, AS2).
Speaker can cancel previously advertised paths

Internet inter-AS routing: BGP


BGP (Border Gateway Protocol): the de facto standard
BGP provides each AS a means to:
1. Obtain subnet reachability information from neighboring ASs.
2. Propagate the reachability information to all routers internal to the AS.
3. Determine "good" routes to subnets based on reachability information and policy.
Allows a subnet to advertise its existence to rest of the Internet: "I am here"

BGP basics

Pairs of routers (BGP peers) exchange routing information over semi-permanent TCP connections: BGP sessions
Note that BGP sessions do not correspond to physical links.
When AS2 advertises a prefix to AS1, AS2 is promising it will forward any datagrams destined to that prefix towards the prefix.
AS2 can aggregate prefixes in its advertisement
[Figure: AS1 (routers 1a, 1b, 1c, 1d), AS2 (2a, 2b, 2c), and AS3 (3a, 3b, 3c), with external BGP (eBGP) sessions between ASs and internal BGP (iBGP) sessions within each AS]

Aggregation of prefixes
138.16.64/24, 138.16.65/24, 138.16.66/24, and 138.16.67/24 aggregate to 138.16.64/22
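
The aggregation can be checked with Python's standard ipaddress module, whose collapse_addresses function merges adjacent networks into the smallest covering set. The variable names are illustrative; the second list assumes the eight contiguous customer /24 blocks 128.112.128 through 128.112.135 used in the CIDR route aggregation example.

```python
import ipaddress

# Four contiguous /24 blocks; the short form 138.16.64/24 is written out
# in full dotted notation because the module requires all four octets.
blocks = [ipaddress.ip_network(f"138.16.{i}.0/24") for i in range(64, 68)]
aggregate = list(ipaddress.collapse_addresses(blocks))

# Eight contiguous customer /24 blocks sharing a common 21-bit prefix.
customers = [ipaddress.ip_network(f"128.112.{i}.0/24") for i in range(128, 136)]
isp_route = list(ipaddress.collapse_addresses(customers))
```

Aggregation into a single prefix works here because each block count is a power of 2 and the first block is aligned on that boundary (64 is a multiple of 4, 128 is a multiple of 8).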


Distributing reachability info


With eBGP session between 3a and 1c, AS3 sends prefix reachability information to AS1.
1c can then use iBGP to distribute this new prefix reachability information to all routers in AS1
1b can then re-advertise the new reachability information to AS2 over the 1b-to-2a eBGP session
When a router learns about a new prefix, it creates an entry for the prefix in its forwarding table.
[Figure: AS1 (routers 1a, 1b, 1c, 1d), AS2 (2a, 2b, 2c), and AS3 (3a, 3b, 3c), with eBGP and iBGP sessions]

Path attributes & BGP routes


When advertising a prefix, advertisement includes BGP attributes.
prefix + attributes = "route"
Two important attributes:
AS-PATH: contains the ASs through which the advertisement for the prefix passed, e.g., AS 67, AS 17
used to detect and prevent looping advertisements
also used in choosing among multiple paths to the same prefix
NEXT-HOP: indicates the specific internal-AS router to the next-hop AS. (There may be multiple links from current AS to next-hop AS.)
When gateway router receives route advertisement, it uses import policy to accept/decline.

BGP route selection


Router may learn about more than 1 route to any one prefix. Router must select route.
Elimination rules invoked sequentially until one route remains:
1. Local preference value attribute: a policy decision made by the AS's network administrator
2. Shortest AS-PATH
3. Closest NEXT-HOP router: hot potato routing
4. Additional criteria
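
The first three elimination rules can be expressed as a single sort key, as in the Python sketch below. The Route fields, the convention that a higher local preference wins, and the use of an intra-AS cost to model "closest NEXT-HOP" are assumptions for illustration; real BGP implementations apply further tie-breakers under "additional criteria".

```python
from dataclasses import dataclass


@dataclass
class Route:
    prefix: str
    local_pref: int       # assumed: higher value is preferred
    as_path: tuple        # ASs the advertisement passed through
    next_hop_cost: int    # assumed intra-AS cost to reach the NEXT-HOP router


def select_route(routes):
    """Apply the elimination rules in order: local pref, AS-PATH, NEXT-HOP."""
    return min(routes,
               key=lambda r: (-r.local_pref, len(r.as_path), r.next_hop_cost))


# Hypothetical candidate routes to the same prefix.
candidates = [
    Route("138.16.64.0/22", 100, (67, 17), 5),
    Route("138.16.64.0/22", 100, (42, 99, 17), 1),
    Route("138.16.64.0/22", 100, (23, 17), 2),
]
best = select_route(candidates)
```

All three candidates tie on local preference, the three-AS path is eliminated next, and hot potato routing then picks the remaining route with the closer NEXT-HOP router.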


BGP messages
BGP messages exchanged using TCP.
BGP messages:
OPEN: opens TCP connection to peer and authenticates sender
UPDATE: advertises new path (or withdraws old)
KEEPALIVE: keeps connection alive in absence of UPDATES; also ACKs OPEN request
NOTIFICATION: reports errors in previous message; also used to close connection

BGP routing policy


[Figure 4.5-BGPnew: a simple BGP scenario; legend: provider network, customer network]
A, B, C are provider networks
X, W, Y are customers (of provider networks)
X is dual-homed: attached to two networks
X does not want to route from B via X to C
.. so X will not advertise to B a route to C

BGP routing policy (2)


[Figure 4.5-BGPnew: a simple BGP scenario; legend: provider network, customer network]
A advertises to B the path AW
B advertises to X the path BAW
Should B advertise to C the path BAW?
No way! B gets no "revenue" for routing CBAW since neither W nor C are B's customers
B wants to force C to route to W via A
B wants to route only to/from its customers!

Why different Intra- and Inter-AS routing ?


Policy:
Inter-AS: administrator wants control over how its traffic is routed and who routes through its net.
Intra-AS: single admin, so no policy decisions needed
Scale:
hierarchical routing saves table size, reduced update traffic
Performance:
Intra-AS: can focus on performance
Inter-AS: policy may dominate over performance

IP Version 6

Features
128-bit addresses (classless)
multicast
real-time service
authentication and security
autoconfiguration
end-to-end fragmentation
protocol extensions
Header
40-byte base header
extension headers (fixed order, mostly fixed length)
fragmentation
source routing
authentication and security
other options

4.4 Broadcast/Multicast routing


Broadcast routing: deliver a packet from a source node to all other nodes
Multicast routing: deliver a packet from a source node to a subset of other nodes

Source-duplication versus in-network duplication

[Figure: routers R1, R2, R3, R4 with the points of duplicate creation/transmission marked]
(a) source duplication, (b) in-network duplication

Broadcast routing algorithms


Uncontrolled flooding

Controlled flooding
Spanning-tree broadcast


Uncontrolled flooding
The source node sends a copy of the packet to all of its neighbors
When a node receives a broadcast packet, it duplicates the packet and forwards it to all of its neighbors (except the neighbor from which it received the packet)
Problems:
If the graph has cycles, then one or more copies of each broadcast packet will cycle indefinitely
Broadcast storm

Controlled flooding
Sequence-number-controlled flooding
Source node puts its address and a broadcast
sequence number into a broadcast packet
Each node maintains a list of the source
address and sequence number of each packet
it has received
When a node receives a broadcast packet
If the packet is in the list, the packet is dropped
Otherwise, the packet is duplicated and forwarded
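
Sequence-number-controlled flooding at a single node might look like the following Python sketch. The packet representation (a dict with "source" and "seq" keys) and the function name are assumptions for illustration; the list of received packets is kept as a set of (source, sequence number) pairs.

```python
def handle_broadcast(seen, packet, arrived_from, neighbors):
    """Return the neighbors that should receive a copy of the packet."""
    key = (packet["source"], packet["seq"])
    if key in seen:
        return []                 # already received: drop the packet
    seen.add(key)                 # remember it so later copies are dropped
    return [n for n in neighbors if n != arrived_from]


# Hypothetical node with neighbors A, B, C receiving the same packet twice.
seen = set()
pkt = {"source": "S", "seq": 7, "payload": b"hello"}
first_copy = handle_broadcast(seen, pkt, "A", ["A", "B", "C"])
second_copy = handle_broadcast(seen, pkt, "B", ["A", "B", "C"])
```

Because the second copy is recognized and dropped, the packet cannot circulate indefinitely even when the graph has cycles.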


Controlled flooding
Reverse path forwarding
When a router receives a broadcast packet, it
duplicates and forwards the packet only if the
packet arrives on the link that is on its own shortest
unicast path back to the source
[Figure: example network (surviving node labels: A, B, D, G); legend: packet will be forwarded; packet not forwarded beyond receiving router]

Controlled flooding
Drawback
Some of the nodes receive redundant packets
[Figure: example network (surviving node labels: A, B, D, G) with redundant packets marked]
Ideally, every node should receive only one copy of the broadcast packet.

Spanning-tree broadcast
Spanning tree: a tree that contains all nodes in a graph
Minimum spanning tree: a spanning tree whose cost is the minimum among all the spanning trees of a graph
Broadcast along a spanning tree
[Figure: (a) broadcast initiated at A; (b) broadcast initiated at D]

Construction of Spanning-tree
Many algorithms have been developed
Center-based approach
Select a center node (rendezvous or core)
Each node unicasts tree-join message to the center node
[Figure: (a) stepwise construction of spanning tree around a center node; (b) constructed spanning tree]

Multicast Routing: Problem Statement


Goal: find a tree (or trees) connecting routers having local multicast group members
tree: not all paths between routers used
source-based: different tree from each sender to receivers
shared-tree: same tree used by all group members
[Figure: source-based trees versus shared tree]

Approaches for building multicast trees


source-based tree: one tree per source
shortest path trees
reverse path forwarding
group-shared tree: group uses one tree
minimal spanning (Steiner)
center-based trees
we first look at basic approaches, then specific
protocols adopting these approaches

Shortest Path Tree


multicast forwarding tree: tree of shortest path routes from source to all receivers
Dijkstra's algorithm
[Figure: source S and routers R1-R7; legend: router with attached group member; router with no attached group member; link used for forwarding, where i indicates the order in which the link was added by the algorithm]

Reverse Path Forwarding


rely on router's knowledge of unicast shortest path from it to sender
each router has simple forwarding behavior:

if (multicast datagram received on incoming link on shortest path back to sender)
    then flood datagram onto all outgoing links
    else ignore datagram
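
The forwarding rule can be sketched in Python as follows. The unicast routing state is modeled as a plain dictionary from destination to the neighbor on the shortest path toward it; that representation, the function name, and the example router are assumptions for illustration.

```python
def rpf_forward(next_hop_to, source, arrived_from, neighbors):
    """Return the outgoing links for a multicast datagram under RPF.

    next_hop_to maps a destination to the neighbor on our shortest
    unicast path toward that destination.
    """
    if next_hop_to.get(source) != arrived_from:
        return []                 # not on the reverse shortest path: ignore
    # Flood onto all outgoing links, i.e. every link except the arrival link.
    return [n for n in neighbors if n != arrived_from]


# Hypothetical router whose shortest unicast path to source S runs via R1.
unicast = {"S": "R1"}
accepted = rpf_forward(unicast, "S", "R1", ["R1", "R2", "R3"])
ignored = rpf_forward(unicast, "S", "R2", ["R1", "R2", "R3"])
```

The router needs no per-packet state for this check, only its existing unicast routing table, which is why the scheme can behave poorly when links are asymmetric.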


Reverse Path Forwarding: example


[Figure: source S and routers R1-R7; legend: router with attached group member; router with no attached group member; datagram will be forwarded; datagram will not be forwarded]
result is a source-specific reverse SPT
may be a bad choice with asymmetric links

Reverse Path Forwarding: pruning


forwarding tree contains subtrees with no multicast group members
no need to forward datagrams down subtree
"prune" messages sent upstream by router with no downstream group members
[Figure: source S and routers R1-R7, with prune messages (P) sent upstream; legend: router with attached group member; router with no attached group member; prune message; links with multicast forwarding]

Shared-Tree: Steiner Tree


Steiner Tree: minimum cost tree connecting all routers with attached group members
problem is NP-complete
excellent heuristics exist
not used in practice:
computational complexity
information about entire network needed
monolithic: rerun whenever a router needs to join/leave

Center-based trees
single delivery tree shared by all
one router identified as center of tree
to join:
edge router sends unicast join-message addressed to
center router
join-message processed by intermediate routers and
forwarded towards center
join-message either hits existing tree branch for this
center, or arrives at center
path taken by join-message becomes new branch of tree
for this router

Center-based trees: an example


Suppose R6 chosen as center:
[Figure: routers R1-R7 with join paths toward R6; legend: router with attached group member; router with no attached group member; path order in which join messages generated]

Internet Multicasting Routing: DVMRP


DVMRP: distance vector multicast routing protocol, RFC 1075
flood and prune: source-based tree, reverse path forwarding
RPF tree based on DVMRP's own routing tables constructed by communicating DVMRP routers
no assumptions about underlying unicast
initial datagram to multicast group flooded everywhere via RPF
routers not wanting group: send upstream prune messages

DVMRP: continued
soft state: DVMRP router periodically (1 min.) "forgets" that branches are pruned:
multicast data again flows down unpruned branch
downstream router: reprune or else continue to receive data
routers can quickly regraft to tree
following IGMP join at leaf
odds and ends
commonly implemented in commercial routers
Mbone routing done using DVMRP

Tunneling
Q: How to connect "islands" of multicast routers in a "sea" of unicast routers?
[Figure: physical topology versus logical topology]
multicast datagram encapsulated inside "normal" (non-multicast-addressed) datagram
normal IP datagram sent through "tunnel" via regular IP unicast to receiving multicast router
receiving multicast router decapsulates to get multicast datagram

PIM: Protocol Independent Multicast


Not dependent on any specific underlying unicast routing algorithm (like RIP, OSPF; works with all)
Two different multicast distribution scenarios:
Dense:
group members densely packed, in "close" proximity.
Sparse:
# of routers with group members is small wrt total # of routers
group members "widely dispersed"

Consequences of Sparse-Dense Dichotomy:


Dense:
group membership by routers assumed until routers explicitly prune
data-driven construction of multicast tree (e.g., RPF)
bandwidth and non-group-router processing profligate

Sparse:
no membership until routers explicitly join
receiver-driven construction of multicast tree (e.g., center-based)
bandwidth and non-group-router processing conservative

PIM - Dense Mode

Flood-and-prune RPF, similar to DVMRP, but:
underlying unicast protocol provides RPF information for incoming datagram
less complicated (less efficient) downstream flood than DVMRP, which reduces reliance on underlying routing algorithm
has protocol mechanism for router to detect if it is a leaf-node router

PIM - Sparse Mode


Center-based approach
router sends join message to rendezvous point (RP)
intermediate routers update state and forward join
after joining via RP, router can switch to source-specific tree

[Figure: routers R1 to R7; join messages travel hop by hop toward the rendezvous point, and all data is multicast from the rendezvous point]
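
The join procedure can be sketched as a walk along the unicast path toward the RP, with each intermediate router recording which neighbor the join came from. The function name, the dictionaries, and the router names in the usage note are hypothetical; this is a simplification, not the PIM-SM message format.

```python
def send_join(receiver, group, next_hop_to_rp, rp, state):
    """Propagate a join hop by hop toward the rendezvous point (RP).

    next_hop_to_rp: dict mapping each router to its next hop on the
                    unicast path to the RP
    state:          dict mapping (router, group) to the set of downstream
                    neighbors that router must forward group data to
    Returns the list of routers the join visited, receiver first.
    """
    path = [receiver]
    current = receiver
    while current != rp:
        upstream = next_hop_to_rp[current]
        # The upstream router updates its state, then forwards the join.
        state.setdefault((upstream, group), set()).add(current)
        path.append(upstream)
        current = upstream
    return path
```

For example, with next_hop_to_rp = {"A": "B", "B": "RP"}, a join from A leaves state at B (forward to A) and at RP (forward to B), which is the RP-rooted tree that data later flows down.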


PIM - Sparse Mode


Sender(s):
unicast data to RP, which distributes down RP-rooted tree
RP can extend multicast tree upstream to source
RP can send stop message to the source if no attached receivers ("no one is listening!")

[Figure: routers R1 to R7; join messages travel toward the rendezvous point, and all data is multicast from the rendezvous point]

4.5 Multiprotocol Label Switching (MPLS)

Prior Work
MPLS Overview
MPLS Architecture

Prior Work
Tag Switching (Cisco)
Aggregate Route-Based IP Switching (ARIS, IBM)
IP Navigator
IFMP-IP Switching (Ipsilon)
Cell Switching Router (CSR, Toshiba)


Prior Work
Tag switching is based on the control-driven approach. The setup of LSPs (Label Switched Paths) closely follows control messages such as routing updates and RSVP messages.
Aggregate route-based IP switching (ARIS) is also based on the control-driven approach and is very similar to tag switching. ARIS introduces the concept of an egress identifier (FEC) to express the granularity of LSPs.
IP Navigator is again a control-driven protocol. OSPF is used as the internal routing protocol within a routing domain. Explicit routing is used to set up the VCs.

Prior Work
Ipsilon Flow Management Protocol (IFMP) is a traffic-driven protocol. When the number of packets from a flow exceeds a predetermined threshold, the controller uses IFMP to set up an LSP for the particular flow.
The Cell Switch Router (CSR) proposal is similar to IP switching. CSR is primarily designed as a device for interconnecting ATM clouds. Within an LIS (logical IP subnet), ATM Forum standards are used to connect hosts and switches together.
Multiple LISs are then interconnected with CSRs that are capable of running both IP forwarding and cell forwarding. The setup of LSPs is data-driven for best-effort traffic and RSVP-driven for flows that require resource reservation.
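
The traffic-driven trigger described for IFMP can be illustrated with a per-flow packet counter. FlowDetector, its threshold parameter, and the flow identifier are hypothetical names for this sketch; how a flow is identified and how the LSP is actually signaled are out of scope here.

```python
from collections import Counter


class FlowDetector:
    """Hypothetical counter that flags a flow once it exceeds a packet threshold."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.counts = Counter()
        self.switched = set()  # flows for which a label-switched path was requested

    def on_packet(self, flow_id) -> bool:
        """Count one packet; return True only at the moment the flow should get an LSP."""
        if flow_id in self.switched:
            return False  # already label-switched, nothing more to trigger
        self.counts[flow_id] += 1
        if self.counts[flow_id] > self.threshold:
            self.switched.add(flow_id)
            return True
        return False
```

Short-lived flows never cross the threshold and stay on the hop-by-hop path, while long-lived flows trigger setup once; the packets counted before the trigger are the source of the setup latency noted for traffic-driven schemes.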

MPLS Overview
RFC 3812
The goal of the IETF MPLS working group is to standardize a base technology that integrates the label-swapping forwarding paradigm with network layer routing.
Cisco is the major contributor to the MPLS working group.
Substituting "Label" for "Tag" in Tag Switching gives MPLS.

Core mechanisms of MPLS


Semantics assigned to a stream label
Labels are associated with specific streams of data.
Forwarding Methods
Forwarding is simplified by the use of short fixed-length labels to identify streams.
Forwarding may require simple functions such as looking up a label in a table, swapping labels, and possibly decrementing and checking a TTL.
Label Distribution Methods
Allow nodes to determine which labels to use for specific streams.

Native IP Forwarding
IP routing: both the packet forwarding and route determination processes in an IP network.
Native IP forwarding (NIF): hop-by-hop, destination-based packet forwarding.
Each packet's next hop and output port are determined by a longest-prefix-match forwarding table lookup.
Additional packet classification may also be performed to derive output port queuing and scheduling rules.
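
A longest-prefix-match lookup can be sketched with Python's standard ipaddress module. The table contents below (prefixes, next hops, port numbers) are hypothetical, and the linear scan is chosen for clarity; real routers use tries or hardware tables.

```python
import ipaddress


def longest_prefix_match(table, destination):
    """Return (next_hop, port) for the most specific matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, entry in table.items():
        net = ipaddress.ip_network(prefix)
        # Keep the matching prefix with the greatest prefix length.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, entry)
    return None if best is None else best[1]


# Hypothetical forwarding table: prefix -> (next hop, output port).
FORWARDING_TABLE = {
    "0.0.0.0/0": ("192.0.2.1", 1),
    "128.96.0.0/16": ("192.0.2.2", 2),
    "128.96.33.0/24": ("192.0.2.3", 3),
}
```

With this table, 128.96.33.81 matches all three prefixes and is forwarded by the /24 entry, while an address that matches nothing more specific falls through to the default route.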

A Simplified NIF forwarding engine


[Figure: a packet (IP header, IP payload) enters through the input ports; the IP header drives a longest prefix match lookup in the forwarding table, yielding next hop + port, and packet classification, yielding queuing and scheduling rules; the packet leaves through the output ports]

Packet classification keys: IP source and destination addresses, IP protocol type, DiffServ (DS) or TOS byte, and TCP/UDP port numbers.

Per-Hop Classification, Queuing, and Scheduling

[Figure: Classify and Queue stages between Port 1, Port N, and Port M]


A Simplified LSR forwarding engine

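[Figure: a packet (MPLS label, MPLS payload) enters through the input ports; the MPLS label indexes the switching table, yielding next hop + port and queuing and scheduling rules; the packet leaves through the output ports]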


Traffic Engineering
Conventional IP routing attempts to find the shortest path between a packet's current location and its intended destination.
Hot spots form, and packet loss rates, latency, and jitter increase, as the average load on a router rises.
Solutions: (1) faster routers, (2) alternate routes.
Routing policy may also require traffic engineering.
For example, the external link between R6 and A3 may have been funded solely by A2 and A3. Therefore, A1's traffic must not be allowed to traverse it.
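
One way to picture such a policy is a shortest-path computation that simply refuses to use forbidden links. The sketch below uses Dijkstra's algorithm; the graph, its link costs, and the `forbidden` parameter are hypothetical and are not taken from the slide's figure, apart from the idea of excluding the R6 to A3 link for A1's traffic.

```python
import heapq


def shortest_path(graph, src, dst, forbidden=frozenset()):
    """Dijkstra's algorithm that skips any link listed in `forbidden`.

    graph:     dict mapping node -> dict of neighbor -> link cost
    forbidden: set of (u, v) links this traffic may not traverse
    Returns (cost, path) or None if dst is unreachable.
    """
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if (node, neighbor) in forbidden or neighbor in visited:
                continue
            heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return None


# Hypothetical topology with unit link costs (not the figure's exact topology).
GRAPH = {
    "A1": {"R1": 1},
    "R1": {"R6": 1, "R2": 1},
    "R6": {"A3": 1},
    "R2": {"R3": 1},
    "R3": {"A3": 1},
    "A3": {},
}
```

Without the constraint, A1's traffic takes the path through R6; passing forbidden={("R6", "A3")} forces the longer path through R2 and R3, which is the kind of override that plain destination-based shortest-path routing cannot express per traffic source.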

Traffic Engineering
-- Override the shortest path route
[Figure: IP backbone with routers R1 to R6 connecting Access 1, Access 2, and Access 3 to Destination D. Legend: route from A2 to D; desired route from A1 to D; actual route from A1 to D.]

Signaling and Provisioning


Signaling: network (re)configuration can be requested by users at any time and achieved within milliseconds or seconds.
Provisioning: the reaction time for (re)configuration is measured in minutes or hours.
In either case, the (re)configuring action involves establishing (or modifying) information used by routers or switches to control their forwarding actions, including
forwarding (routing) information,
classification rules, and/or
queuing and scheduling parameters.

Core MPLS Components


The basic routing approach
Routing is accomplished through the use of standard
L3 routing protocols (e.g. OSPF and BGP).
The information maintained by the L3 routing
protocols is then used to distribute labels to
neighboring nodes that are used in the forwarding of
packets.
Labels
Label semantics, Label granularity, Label
assignment, Label stack and forwarding operations.

Label Semantics
The label is nothing more than a shorthand for an aggregate stream of user data.
The meaning of the label is a strictly local issue between two neighboring nodes.
MPLS could be employed between any two neighboring nodes, even if no other nodes in the network participate in MPLS.
When MPLS is used between more than two nodes, the operation between any two neighboring nodes could be interpreted as independent of the operation between any other pair of nodes.

Label Granularity
A device that uses the label to forward packets will forward all packets with the same label in the same way.
A Forwarding Equivalence Class (FEC) is a set of L3 packets which are all forwarded in the same manner by a particular Label Switching Router (LSR).
For unicast IP traffic, the granularity of a label allows various levels of aggregation in a Label Information Base (LIB).
For IP multicast, the natural binding of a label would be to a multicast tree.

Label assignment
Label assignment involves allocating a label and then binding the label to a route.
Label assignment can be driven by control traffic or data traffic (discussed later).
Label withdrawal is primarily a matter of garbage collection, that is, collecting unused labels so that they may be reassigned.
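
Allocation, binding, and withdrawal can be sketched as a small label pool. LabelAllocator and its methods are hypothetical; the default range starts at 16 on the assumption that lower label values are reserved, and ends at the largest 20-bit value.

```python
class LabelAllocator:
    """Hypothetical label pool with bind and withdraw operations."""

    def __init__(self, first: int = 16, last: int = 2**20 - 1) -> None:
        self._next = first      # next never-used label
        self._last = last       # largest label that may be handed out
        self._free = []         # withdrawn labels available for reuse
        self._bindings = {}     # route -> label

    def bind(self, route):
        """Allocate a label (reusing a withdrawn one if possible) and bind it to route."""
        if route in self._bindings:
            return self._bindings[route]
        if self._free:
            label = self._free.pop()
        elif self._next <= self._last:
            label = self._next
            self._next += 1
        else:
            raise RuntimeError("label space exhausted")
        self._bindings[route] = label
        return label

    def withdraw(self, route) -> None:
        """Unbind the route and return its label to the pool for reassignment."""
        label = self._bindings.pop(route, None)
        if label is not None:
            self._free.append(label)
```

Withdrawal here is exactly the garbage collection step: the label goes back on the free list and is handed out again by a later bind.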


Routing Aggregation
[Figure: routers R1 to R6 connecting Access 1, Access 2, and Access 3 to Destination D, with the numbers 1, 2, and 5 marked along the paths]

Forwarding Component
Label Stack and Forwarding Operations
label swap: looking up the incoming label to determine the outgoing label, encapsulation, port, and any additional information which may pertain to the stream, such as a particular queue or other QoS-related treatment.
label push: when a packet first enters an MPLS domain, the packet is associated with a label.
label pop: when a packet leaves an MPLS domain, the label is removed.
The label stack is useful within hierarchical routing domains.
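
The three operations act only on the top of the label stack, which a short sketch makes concrete. The function names and the list representation (top of stack is the last element) are choices made for this illustration.

```python
def push(stack, label):
    """Entering an MPLS domain: add a label on top (top is the last element)."""
    return stack + [label]


def swap(stack, table):
    """Inside the domain: replace the top label using an incoming -> outgoing table."""
    if not stack:
        raise ValueError("cannot swap on an empty label stack")
    return stack[:-1] + [table[stack[-1]]]


def pop(stack):
    """Leaving an MPLS domain: remove the top label."""
    if not stack:
        raise ValueError("cannot pop an empty label stack")
    return stack[:-1]
```

In a hierarchy, a packet that already carries a label can have a second label pushed when it enters an inner domain; swaps inside that domain touch only the top label, and the pop at the exit exposes the original label again.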

Encapsulation
Label-based forwarding makes use of various pieces of information, including a label or stack of labels, and possibly additional information such as a TTL field.
MPLS encapsulation: encapsulates the label information and information used for label-based forwarding.
An encapsulation scheme may make use of the following fields:
label, TTL, class of service, stack indicator, next header type indicator, and checksum

MPLS label stack encoding


[Figure: label stack entries from stack top to stack bottom, followed by the original packet. Each entry carries Label (20 bits), Exp (3 bits, COS), S (1 bit), and TTL (8 bits).]

MPLS frame delivered to link layer
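
The field widths above add up to 32 bits per stack entry, so one entry fits in a single word. The sketch below assumes the fields are packed in the order shown, label in the high-order bits, and written in network byte order; the function names are hypothetical.

```python
import struct


def encode_entry(label, exp, bottom, ttl):
    """Pack Label (20 bits), Exp (3 bits), S (1 bit), TTL (8 bits) into 4 bytes."""
    if not (0 <= label < 2**20 and 0 <= exp < 8 and 0 <= ttl < 256):
        raise ValueError("field out of range")
    word = (label << 12) | (exp << 9) | (int(bool(bottom)) << 8) | ttl
    return struct.pack("!I", word)


def decode_entry(data):
    """Unpack 4 bytes into (label, exp, bottom_of_stack, ttl)."""
    (word,) = struct.unpack("!I", data)
    return word >> 12, (word >> 9) & 0x7, bool((word >> 8) & 0x1), word & 0xFF


def encode_stack(entries):
    """Encode (label, exp, ttl) tuples listed top first; S is set only on the last."""
    last = len(entries) - 1
    return b"".join(
        encode_entry(label, exp, i == last, ttl)
        for i, (label, exp, ttl) in enumerate(entries)
    )
```

The S bit is what lets a receiver find where the label stack ends and the original packet begins, since the number of entries is not stored anywhere else.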



Label Assignment
Topology driven (Tag)
In response to normal processing of routing protocol control traffic
Labels are pre-assigned; no label setup latency at forwarding time.
Request driven (RSVP)
In response to normal processing of request-based control traffic
May require a large number of labels to be assigned.
Traffic driven (Ipsilon)
The arrival of data at an LSR triggers label assignment and distribution.
Label setup latency; potential for packet reordering.


Label Distribution
Explicit Label Distribution
Downstream label allocation
label allocation is done by the downstream LSR
most natural mechanism for unicast traffic
Upstream label allocation
label allocation is done by the upstream LSR
may be used for optimality for some multicast traffic
A unique label for an egress LSR within the MPLS domain
Any stream to a particular MPLS egress node could use the label of that node.



Label Distribution
Explicit Label Distribution Protocol (LDP)
Reliability: by transport protocol or as part of LDP.
Separate routing computation and label distribution.
Piggybacking on Other Control Messages
Use existing routing/control protocol for distributing routing/control and label information.
OSPF, BGP, RSVP, PIM
Combine routing and label distribution.
Label purge mechanisms
By time out
Exchange of MPLS control packets


Label Distribution Protocol


LDP Peer:
Two LSRs that exchange label/stream mapping information via LDP

LDP messages
Discovery messages (via UDP)
announce and maintain the presence of an LSR
Session messages
maintain the session between LDP peers
Advertisement messages
label operations (label distribution)
Notification messages
advisory information and signal error information
Error notification: signals fatal errors
Advisory notification: status of the LDP session or of some previous message received from the peer.

Label Swapping

Example: Forwarding a Labeled Packet

Labeled Packet
Map the incoming label to a next hop label, which determines where to forward the packet.
Encode the new label stack into the packet, and then forward it.

Unlabeled Packet
LSR analyzes the L3 header to determine the packet's stream.
Map the stream to a next hop, which determines where to forward the packet.
Encode the new label stack into the packet, and then forward it.

Incoming Label Map (ILM)
Input: Port 1, Label 4
Output: Port 2, Label 6

[Figure: Label Switching Router (LSR) with an IP router module; a packet is shown as L2 header (H2), label, L3 header (H3), and data]
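
The two forwarding cases can be sketched as two table lookups. The ILM entry below is the one from the slide (input port 1, label 4 maps to output port 2, label 6); the function names, the stream-to-next-hop table, and the stream key "D" are hypothetical.

```python
# Incoming Label Map from the slide: (input port, label) -> (output port, label).
ILM = {(1, 4): (2, 6)}


def forward_labeled(ilm, in_port, label_stack):
    """Swap the top label (last element) and pick the output port; None if no entry."""
    key = (in_port, label_stack[-1])
    if key not in ilm:
        return None
    out_port, out_label = ilm[key]
    return out_port, label_stack[:-1] + [out_label]


def forward_unlabeled(stream_map, stream):
    """Map an already-classified stream to (output port, new label stack)."""
    if stream not in stream_map:
        return None
    out_port, label = stream_map[stream]
    return out_port, [label]
```

The labeled case never inspects the L3 header; only the unlabeled case needs the classification step that determines the packet's stream, which is assumed to have happened before forward_unlabeled is called.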

Use of MPLS in a Hierarchy


[Figure: routers R1 to R6 across Domain 1 and Domain 2, annotated with OSPF and BGP; per-router IN/OUT label tables show labels L1 to L4 being pushed, swapped, and popped along the path]


Conclusion
MPLS improves the scalability of hop-by-hop routing and forwarding, and provides traffic engineering capabilities for better network provisioning.
It decouples forwarding from routing and allows multi-protocol support without requiring changes to the basic forwarding paradigm.
Generalized MPLS (GMPLS)
MPλS (optical wavelength-based)