
1

Transport Layer
2
Transport Layer: Goals & Overview
Our goals:
- understand principles behind transport layer services:
  - multiplexing/demultiplexing
  - reliable data transfer
  - flow control
  - congestion control
- instantiation and implementation in the Internet
Overview:
- transport layer services
- multiplexing/demultiplexing
- connectionless transport: UDP
- principles of reliable data transfer
- connection-oriented transport: TCP
  - reliable transfer
  - flow control
  - connection management
- principles of congestion control
- TCP congestion control

3
Transport services and protocols
- provide logical communication between app processes running on different hosts
- transport protocols run in end systems
- transport vs. network layer services:
  - network layer: data transfer between end systems
  - transport layer: data transfer between processes
    - relies on, and enhances, network layer services


[Figure: five-layer protocol stacks (application, transport, network, data
link, physical) at the two end systems; intermediate routers implement only
the network, data link, and physical layers.]
4
Transport-layer protocols
Internet transport services:
- reliable, in-order unicast delivery (TCP)
  - congestion control
  - flow control
  - connection setup
- unreliable ("best-effort"), unordered unicast or multicast delivery: UDP
- services not available:
  - real-time
  - bandwidth guarantees
  - reliable multicast
5
Multiplexing/demultiplexing
Recall: segment - unit of data exchanged between transport layer entities
- aka TPDU: transport protocol data unit
Demultiplexing: delivering received segments to the correct app-layer
processes
[Figure: receiver demultiplexes arriving segments (with headers Ht, Hn) to
application processes P1, P2, P3, P4; a segment = segment header +
application-layer data.]
6
Multiplexing/demultiplexing
Multiplexing:
- gathering data from multiple app processes, enveloping data with a header
  (later used for demultiplexing)
Demultiplexing:
- based on sender and receiver port numbers and IP addresses
  - source, dest port #s in each segment
  - recall: well-known port numbers for specific applications (0 to 1023)
  - unreserved port numbers for applications: 1024 to 65535
[Figure: TCP/UDP segment format - 32 bits wide: source port #, dest port #,
other header fields, application data (message).]
7
Multiplexing/demultiplexing: examples
Port use: simple telnet app
- host A -> server B: source port x, dest port 23
- server B -> host A: source port 23, dest port x
Port use: Web server
- Web client host A -> Web server B: source IP A, dest IP B, source port x,
  dest port 80
- Web client host C -> Web server B: source IP C, dest IP B, source port x,
  dest port 80
- Web client host C -> Web server B: source IP C, dest IP B, source port y,
  dest port 80
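The demultiplexing in the Web-server example above is exactly what the OS socket layer does with destination port numbers. A minimal Python sketch (ports are picked by the kernel here, not the well-known ports of the slide):

```python
import socket

# Two independent UDP "server" sockets on one host; the OS demultiplexes
# arriving datagrams to the right socket purely on the destination port.
srv_a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
srv_a.bind(("127.0.0.1", 0))            # port 0: kernel picks a free port
srv_a.settimeout(2)
srv_b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
srv_b.bind(("127.0.0.1", 0))
srv_b.settimeout(2)

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"to A", srv_a.getsockname())   # dest port selects srv_a
client.sendto(b"to B", srv_b.getsockname())   # dest port selects srv_b

msg_a, addr_a = srv_a.recvfrom(1024)    # addr_a = (client IP, client src port)
msg_b, addr_b = srv_b.recvfrom(1024)
```

The address tuple returned by `recvfrom()` carries the sender's IP and source port - exactly the information a server needs to address its reply.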
8
UDP: User Datagram Protocol [RFC 768]
- best-effort service; UDP segments may be:
  - lost
  - delivered out of order to the app
- connectionless:
  - no handshaking between UDP sender and receiver
  - each UDP segment handled independently of the others
Why is there a UDP?
- no connection establishment (which can add delay)
- simple: no connection state at sender, receiver
- small segment header
- no congestion control: UDP can blast away as fast as desired

9
UDP: more
- often used for streaming multimedia apps
  - loss tolerant
  - rate sensitive
- other UDP uses (why?):
  - DNS
  - SNMP
- reliable transfer over UDP: add reliability at the application layer
  - application-specific error recovery!
[Figure: UDP segment format - 32 bits wide: source port #, dest port #,
length, checksum, application data (message). Length = length in bytes of
the UDP segment, including header.]
10
UDP checksum
Goal: detect "errors" (e.g., flipped bits) in the transmitted segment
Sender:
- treat segment contents as a sequence of 16-bit integers
- checksum: addition (1s complement sum) of segment contents
- sender puts the checksum value into the UDP checksum field
Receiver:
- compute checksum of the received segment
- check if the computed checksum equals the checksum field value:
  - NO - error detected
  - YES - no error detected
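The sender and receiver steps above translate directly into code. A minimal Python sketch of the 16-bit 1s-complement sum (the segment bytes below are an arbitrary example, not a real UDP segment):

```python
def udp_checksum(data: bytes) -> int:
    """16-bit 1s-complement sum of the segment contents, complemented."""
    if len(data) % 2:
        data += b"\x00"                           # pad to whole 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # add next 16-bit integer
        total = (total & 0xFFFF) + (total >> 16)  # wrap carry (1s complement)
    return ~total & 0xFFFF                        # sender stores the complement

def receiver_ok(data: bytes, checksum_field: int) -> bool:
    """Receiver: recompute and compare with the checksum field value."""
    return udp_checksum(data) == checksum_field

segment = b"\x00\x01\xf2\x03\xf4\xf5\xf6\xf7"
cksum = udp_checksum(segment)
corrupted = b"\x00\x01\xf2\x03\xf4\xf5\xf6\xf6"   # one flipped bit
```

`receiver_ok(segment, cksum)` is True, while `receiver_ok(corrupted, cksum)` is False: the single flipped bit is detected.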

11
Principles of Reliable data transfer
- important in application, transport, and link layers
- one of the most important networking topics!
- characteristics of the unreliable channel determine the complexity of the
  reliable data transfer protocol (rdt)
12
Reliable data transfer: getting started
[Figure: rdt interfaces on the send side and receive side.]
- rdt_send(): called from above (e.g., by app); passes data to deliver to the
  receiver's upper layer
- udt_send(): called by rdt, to transfer a packet over the unreliable channel
  to the receiver
- rdt_rcv(): called when a packet arrives on the receive side of the channel
- deliver_data(): called by rdt to deliver data to the upper layer
13
Reliable data transfer: getting started
We'll:
- incrementally develop sender and receiver sides of a reliable data transfer
  protocol (rdt)
- consider only unidirectional data transfer
  - but control info will flow in both directions!
- use finite state machines (FSMs) to specify sender and receiver
  - state: when in this state, the next state is uniquely determined by the
    next event
  - transitions are labeled "event / actions": the event causing the state
    transition, and the actions taken on the transition
14
Rdt1.0: reliable transfer over a reliable channel
- underlying channel perfectly reliable
  - no bit errors
  - no loss of packets
- separate FSMs for sender, receiver:
  - sender sends data into the underlying channel
  - receiver reads data from the underlying channel
sender FSM: state "Wait for call from above"
  rdt_send(data): packet = make_pkt(data); udt_send(packet)
receiver FSM: state "Wait for call from below"
  rdt_rcv(packet): extract(packet, data); deliver_data(data)
15
Rdt2.0: channel with bit errors
- underlying channel may flip bits in a packet
  - recall: UDP checksum to detect bit errors
- the question: how to recover from errors:
  - acknowledgements (ACKs): receiver explicitly tells sender that pkt
    received OK
  - negative acknowledgements (NAKs): receiver explicitly tells sender that
    pkt had errors
  - sender retransmits pkt on receipt of NAK
  - human scenarios using ACKs, NAKs?
- new mechanisms in rdt2.0 (beyond rdt1.0):
  - error detection
  - receiver feedback: control msgs (ACK, NAK) rcvr -> sender
16
rdt2.0: FSM specification
sender:
- state "Wait for call from above"
  rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt)
  -> "Wait for ACK or NAK"
- state "Wait for ACK or NAK"
  rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt)   (stay)
  rdt_rcv(rcvpkt) && isACK(rcvpkt): (nothing)
  -> back to "Wait for call from above"
receiver:
- state "Wait for call from below"
  rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK)   (stay)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt):
      extract(rcvpkt, data); deliver_data(data); udt_send(ACK)
17
rdt2.0: operation with no errors
[Same FSM as the previous slide, with the error-free path highlighted:
rdt_send -> udt_send(sndpkt); the receiver delivers the data and returns an
ACK; the sender returns to "Wait for call from above".]
18
rdt2.0: error scenario
[Same FSM, with the error path highlighted: the receiver detects a corrupt
packet and returns a NAK; the sender retransmits sndpkt on receipt of the
NAK.]
19
rdt2.0 has a fatal flaw!
What happens if the ACK/NAK is corrupted?
- sender doesn't know what happened at the receiver!
- can't just retransmit: possible duplicate
What to do?
- sender ACKs/NAKs the receiver's ACK/NAK? What if the sender's ACK/NAK is
  lost?
- retransmit, but this might cause retransmission of a correctly received pkt!
Handling duplicates:
- sender adds a sequence number to each pkt
- sender retransmits the current pkt if the ACK/NAK is garbled
- receiver discards (doesn't deliver up) a duplicate pkt
Stop and wait: the sender sends one packet, then waits for the receiver's
response.
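The duplicate-handling idea can be sketched in a few lines of Python: a stop-and-wait sender retransmits when the ACK is garbled, and the receiver's alternating sequence number lets it discard the duplicate. (The garbled-ACK event is simulated deterministically here; this is a sketch of the mechanism, not a real protocol.)

```python
delivered = []
expected = 0            # receiver's expected seq # (alternates 0/1)

def receiver(seq, payload):
    """Deliver only new data; a duplicate is ACKed but not delivered up."""
    global expected
    if seq == expected:
        delivered.append(payload)
        expected ^= 1
    return ("ACK", seq)

garble_first_ack = True # simulate one corrupted ACK on the channel
seq = 0
for payload in ["hello", "world"]:
    while True:
        ack = receiver(seq, payload)   # send pkt, wait for response
        if garble_first_ack:
            garble_first_ack = False
            continue                   # ACK garbled: retransmit same pkt
        if ack == ("ACK", seq):
            break                      # ACK OK: move to next packet
    seq ^= 1                           # alternate the sequence number
```

Despite the retransmission caused by the garbled ACK, `delivered` ends up `["hello", "world"]` with no duplicate delivery.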
20
rdt2.1: sender, handles garbled ACK/NAKs
- state "Wait for call 0 from above"
  rdt_send(data): sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt)
  -> "Wait for ACK or NAK 0"
- state "Wait for ACK or NAK 0"
  rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isNAK(rcvpkt)): udt_send(sndpkt)
  (stay)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt):
  -> "Wait for call 1 from above"
- state "Wait for call 1 from above"
  rdt_send(data): sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt)
  -> "Wait for ACK or NAK 1"
- state "Wait for ACK or NAK 1"
  rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isNAK(rcvpkt)): udt_send(sndpkt)
  (stay)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt):
  -> "Wait for call 0 from above"
21
rdt2.1: receiver, handles garbled ACK/NAKs
- state "Wait for 0 from below"
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt):
      extract(rcvpkt, data); deliver_data(data);
      sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt)
      -> "Wait for 1 from below"
  rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || has_seq1(rcvpkt)):
      sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt)   (stay)
- state "Wait for 1 from below"
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt):
      extract(rcvpkt, data); deliver_data(data);
      sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt)
      -> "Wait for 0 from below"
  rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || has_seq0(rcvpkt)):
      sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt)   (stay)
22
rdt2.1: discussion
Sender:
- seq # added to pkt
- two seq. #s (0, 1) will suffice. Why?
- must check if received ACK/NAK corrupted
- twice as many states
  - state must remember whether the "current" pkt has seq # 0 or 1
Receiver:
- must check if the received packet is a duplicate
  - state indicates whether 0 or 1 is the expected pkt seq #
- note: receiver cannot know if its last ACK/NAK was received OK at the
  sender
23
rdt2.2: a NAK-free protocol
- same functionality as rdt2.1, using ACKs only
- instead of a NAK, the receiver sends an ACK for the last pkt received OK
  - receiver must explicitly include the seq # of the pkt being ACKed
- a duplicate ACK at the sender results in the same action as a NAK:
  retransmit the current pkt
24
rdt2.2: sender, receiver fragments
sender FSM fragment:
- state "Wait for call 0 from above"
  rdt_send(data): sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt)
  -> "Wait for ACK 0"
- state "Wait for ACK 0"
  rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,1)): udt_send(sndpkt)
  (stay)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0):
  -> (continue with seq 1)
receiver FSM fragment:
- transition into "Wait for 0 from below", on
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt):
      extract(rcvpkt, data); deliver_data(data);
      sndpkt = make_pkt(ACK1, chksum); udt_send(sndpkt)
- state "Wait for 0 from below", self-loop on
  rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || has_seq1(rcvpkt)):
      udt_send(sndpkt)   (re-send last ACK; stay)
25
rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
- checksum, seq. #s, ACKs, retransmissions will help, but are not enough
Q: how to deal with loss?
- sender waits until certain that data or ACK was lost, then retransmits
- yuck: drawbacks?
Approach: sender waits a "reasonable" amount of time for an ACK
- retransmits if no ACK received in this time
- if pkt (or ACK) just delayed (not lost):
  - retransmission will be a duplicate, but use of seq. #s already handles
    this
  - receiver must specify the seq # of the pkt being ACKed
- requires a countdown timer
26
rdt3.0 sender
- state "Wait for call 0 from above"
  rdt_rcv(rcvpkt): (ignore; stay)
  rdt_send(data): sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt);
  start_timer   -> "Wait for ACK0"
- state "Wait for ACK0"
  rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,1)): (ignore; stay)
  timeout: udt_send(sndpkt); start_timer   (stay)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0): stop_timer
  -> "Wait for call 1 from above"
- state "Wait for call 1 from above"
  rdt_rcv(rcvpkt): (ignore; stay)
  rdt_send(data): sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt);
  start_timer   -> "Wait for ACK1"
- state "Wait for ACK1"
  rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,0)): (ignore; stay)
  timeout: udt_send(sndpkt); start_timer   (stay)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1): stop_timer
  -> "Wait for call 0 from above"
27
rdt3.0 in action
28
rdt3.0 in action
29
Performance of rdt3.0
- rdt3.0 works, but performance is unimpressive
- example: 1 Gbps link, 15 ms end-to-end prop. delay, 1 KB packet:

    T_transmit = L / R = 8 kb/pkt / 10^9 b/sec = 8 microsec

- U_sender: utilization - fraction of time the sender is busy sending

    U_sender = (L/R) / (RTT + L/R) = .008 / 30.008 = 0.00027

- 1 KB pkt every 30 msec -> 33 kB/sec throughput over a 1 Gbps link
- the network protocol limits use of the physical resources!
30
rdt3.0: stop-and-wait operation
[Timeline figure: first packet bit transmitted at t = 0; last packet bit
transmitted at t = L/R; first packet bit arrives at the receiver; last packet
bit arrives and the receiver sends an ACK; the ACK arrives and the sender
sends the next packet at t = RTT + L/R.]

    U_sender = (L/R) / (RTT + L/R) = .008 / 30.008 = 0.00027
31
Pipelined protocols
Pipelining: sender allows multiple "in-flight", yet-to-be-acknowledged pkts
- range of sequence numbers must be increased
- buffering at sender and/or receiver
Two generic forms of pipelined protocols: Go-Back-N, selective repeat
32
Pipelining: increased utilization
[Timeline figure: sender transmits three packets back-to-back starting at
t = 0; the first ACK arrives at t = RTT + L/R; the last bits of the 2nd and
3rd packets arrive shortly after the 1st, each triggering an ACK.]

    U_sender = (3 * L/R) / (RTT + L/R) = .024 / 30.008 = 0.0008

Increase utilization by a factor of 3!
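Both utilization figures can be checked numerically. A short Python calculation with the slides' parameters (1 Gbps link, 30 ms RTT, 1 KB = 8000-bit packets):

```python
L = 8_000           # packet size in bits (1 KB)
R = 1e9             # link rate in bits/sec (1 Gbps)
RTT = 30e-3         # round-trip time in seconds

t_transmit = L / R                            # 8 microseconds
u_stop_and_wait = (L / R) / (RTT + L / R)     # one pkt per RTT + L/R
u_pipelined_3 = 3 * (L / R) / (RTT + L / R)   # three pkts per RTT + L/R
throughput = (L / 8) / (RTT + L / R)          # stop-and-wait, bytes/sec
```

The numbers match the slides: t_transmit is 8 microseconds, the stop-and-wait utilization rounds to 0.00027, the 3-packet pipelined utilization to 0.0008, and the stop-and-wait throughput is roughly 33 kB/sec.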
33
Go-Back-N
Sender:
- k-bit seq # in pkt header
- "window" of up to N consecutive unacked pkts allowed
- ACK(n): ACKs all pkts up to, and including, seq # n - "cumulative ACK"
  - may receive duplicate ACKs (see receiver)
- timer for each in-flight pkt
- timeout(n): retransmit pkt n and all higher seq # pkts in the window
34
GBN: sender extended FSM
initially: base = 0, nextseqnum = 0

rdt_send(data):
  if (nextseqnum < base + N) {
    sndpkt[nextseqnum] = make_pkt(nextseqnum, data, chksum)
    udt_send(sndpkt[nextseqnum])
    if (base == nextseqnum)
      start_timer
    nextseqnum++
  }
  else
    refuse_data(data)

timeout:
  start_timer
  udt_send(sndpkt[base])
  udt_send(sndpkt[base+1])
  ...
  udt_send(sndpkt[nextseqnum-1])

rdt_rcv(rcvpkt) && notcorrupt(rcvpkt):
  base = getacknum(rcvpkt) + 1
  if (base == nextseqnum)
    stop_timer
  else
    start_timer

rdt_rcv(rcvpkt) && corrupt(rcvpkt):
  (do nothing)
35
GBN: receiver extended FSM
- ACK-only: always send an ACK for a correctly-received pkt with the highest
  in-order seq #
  - may generate duplicate ACKs
  - need only remember expectedseqnum
- out-of-order pkt:
  - discard (don't buffer) -> no receiver buffering!
  - re-ACK pkt with the highest in-order seq #

initially: expectedseqnum = 0

rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
    && hasseqnum(rcvpkt, expectedseqnum):
  extract(rcvpkt, data)
  deliver_data(data)
  sndpkt = make_pkt(expectedseqnum, ACK, chksum)
  udt_send(sndpkt)
  expectedseqnum++

default:
  udt_send(sndpkt)
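The sender and receiver FSMs above can be exercised together in a small Python simulation. Packet 2 is dropped once (a hypothetical loss); the receiver discards the out-of-order packets and re-ACKs, and the sender's timeout resends the whole window:

```python
N = 4                                   # window size
data = ["a", "b", "c", "d", "e", "f"]
delivered, expected = [], 0             # receiver state: expectedseqnum

def receive(seq, payload):
    """GBN receiver: deliver only in-order; (re-)ACK highest in-order seq."""
    global expected
    if seq == expected:
        delivered.append(payload)
        expected += 1
    return expected - 1                 # cumulative ACK (-1: nothing yet)

lose_once = {2}                         # drop first transmission of pkt 2
base, nextseq, acked = 0, 0, -1
while base < len(data):
    while nextseq < min(base + N, len(data)):   # fill the window
        if nextseq in lose_once:
            lose_once.discard(nextseq)          # packet lost in the channel
        else:
            acked = max(acked, receive(nextseq, data[nextseq]))
        nextseq += 1
    if acked >= base:
        base = acked + 1                # slide window on cumulative ACK
    else:
        nextseq = base                  # timeout: go back N, resend window
```

The receiver never buffers out-of-order packets, yet `delivered` still comes out in order once the sender has gone back and resent the window.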
36
GBN in
action
37
Selective Repeat
- receiver individually acknowledges all correctly received pkts
  - buffers pkts, as needed, for eventual in-order delivery to the upper
    layer
- sender only resends pkts for which an ACK was not received
  - sender timer for each unACKed pkt
- sender window
  - N consecutive seq #s
  - again limits seq #s of sent, unACKed pkts
38
Selective repeat: sender, receiver windows
39
Selective repeat
sender:
- data from above:
  - if next available seq # in window, send pkt
- timeout(n):
  - resend pkt n, restart timer
- ACK(n) in [sendbase, sendbase+N-1]:
  - mark pkt n as received
  - if n is the smallest unACKed pkt, advance window base to the next
    unACKed seq #
receiver:
- pkt n in [rcvbase, rcvbase+N-1]:
  - send ACK(n)
  - out-of-order: buffer
  - in-order: deliver (also deliver buffered, in-order pkts), advance window
    to next not-yet-received pkt
- pkt n in [rcvbase-N, rcvbase-1]:
  - send ACK(n)
- otherwise:
  - ignore
40
Selective repeat in action
41
Selective repeat: dilemma
Example:
- seq #s: 0, 1, 2, 3
- window size = 3
- receiver sees no difference in the two scenarios!
- incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
- the seq # space must be >= 2 * window size
42
TCP: Overview  RFCs: 793, 1122, 1323, 2018, 2581
- point-to-point:
  - one sender, one receiver
- reliable, in-order byte stream:
  - no "message boundaries"
- pipelined:
  - TCP congestion and flow control set window size
- send & receive buffers
- full duplex data:
  - bi-directional data flow in the same connection
  - MSS: maximum segment size
- connection-oriented:
  - handshaking (exchange of control msgs) inits sender, receiver state
    before data exchange
- flow controlled:
  - sender will not overwhelm receiver
[Figure: the application writes data into the TCP send buffer at one socket;
segments carry it to the TCP receive buffer at the other socket, where the
application reads data.]
43
TCP segment structure
[Figure: TCP segment, 32 bits wide: source port #, dest port #, sequence
number, acknowledgement number, header length, unused bits, flags
(U A P R S F), rcvr window size, checksum, ptr to urgent data, options
(variable length), application data (variable length).]
- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection estab (setup, teardown commands)
- sequence/acknowledgement numbers: counting by bytes of data (not segments!)
- rcvr window size: # bytes rcvr willing to accept
- checksum: Internet checksum (as in UDP)
44
TCP seq. #s and ACKs
Seq. #s:
- byte stream "number" of the first byte in the segment's data
ACKs:
- seq # of the next byte expected from the other side
- cumulative ACK
- piggybacked
Q: how does the receiver handle out-of-order segments?
A: the TCP spec doesn't say - up to the implementer
[Figure: simple telnet scenario - user types 'C' at Host A; Host B ACKs
receipt of 'C' and echoes back 'C'; Host A ACKs receipt of the echoed 'C'.]
45
TCP: retransmission scenarios
[Figure, lost ACK scenario: Host A sends Seq=92; Host B's ACK is lost; A's
timer expires and A retransmits Seq=92.]
[Figure, premature timeout with cumulative ACKs: Host A sends Seq=92 and
Seq=100; the timeout fires before the ACKs arrive, so A retransmits Seq=92;
B's cumulative ACK covers both segments.]
TCP Flow Control
flow control: sender won't overrun the receiver's buffers by transmitting
too much, too fast
- receiver: explicitly informs the sender of the (dynamically changing)
  amount of free buffer space
  - RcvWindow field in TCP segment
- sender: keeps the amount of transmitted, unACKed data less than the most
  recently received RcvWindow
receiver buffering:
- RcvBuffer = size of TCP receive buffer
- RcvWindow = amount of spare room in buffer
47
TCP Round Trip Time and Timeout
Q: how to set the TCP timeout value?
- longer than RTT
  - note: RTT will vary
- too short: premature timeout
  - unnecessary retransmissions
- too long: slow reaction to segment loss
Q: how to estimate RTT?
- SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions, cumulatively ACKed segments
- SampleRTT will vary; want the estimated RTT "smoother"
  - use several recent measurements, not just the current SampleRTT
48
TCP Round Trip Time and Timeout

    EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT

- exponentially weighted moving average
- influence of a given sample decreases exponentially fast
- typical value of x: 0.125
Setting the timeout:
- EstimatedRTT plus a "safety margin"
- large variation in EstimatedRTT -> larger safety margin

    Deviation = (1-y)*Deviation + y*|SampleRTT - EstimatedRTT|
    Timeout = EstimatedRTT + 4*Deviation

- typical value of y: 0.25
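The two EWMA formulas translate directly into code. A small Python sketch (the update order - estimate first, then deviation - is one reasonable choice; the slides don't pin it down, and real implementations differ):

```python
class RttEstimator:
    def __init__(self, first_sample, x=0.125, y=0.25):
        self.x, self.y = x, y
        self.estimated = first_sample   # EstimatedRTT
        self.deviation = 0.0            # Deviation

    def update(self, sample):
        # EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT
        self.estimated = (1 - self.x) * self.estimated + self.x * sample
        # Deviation = (1-y)*Deviation + y*|SampleRTT - EstimatedRTT|
        self.deviation = (1 - self.y) * self.deviation + \
                         self.y * abs(sample - self.estimated)
        return self.timeout()

    def timeout(self):
        # Timeout = EstimatedRTT + 4*Deviation
        return self.estimated + 4 * self.deviation

est = RttEstimator(100.0)     # SampleRTTs in ms (made-up numbers)
steady = est.update(100.0)    # no variation: timeout stays at the RTT
spiky = est.update(140.0)     # a slow sample inflates both terms
```

With constant 100 ms samples the timeout stays at 100 ms (no deviation to pad); the single 140 ms sample moves EstimatedRTT to 105 ms and the larger deviation pushes the timeout up to 140 ms.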
49
Example RTT estimation:
[Figure: RTT (milliseconds, 100-350) versus time (seconds) for
gaia.cs.umass.edu to fantasia.eurecom.fr, plotting the fluctuating SampleRTT
and the smoother EstimatedRTT.]
50
Principles of Congestion Control
Congestion:
- informally: too many sources sending too much data too fast for the
  network to handle
- different from flow control!
- manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
- a top-10 problem!
Note: flow control is between the two end systems; congestion control
concerns the network (the intermediate nodes) between them.
51
Causes/costs of congestion: scenario 1
- two senders, two receivers
- one router, infinite buffers
- no retransmission
- large delays when congested
- maximum achievable throughput
[Figure: Hosts A and B send original data at rate lambda_in into a shared
router with unlimited output link buffers; lambda_out is the delivery rate.]
52
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
- no explicit feedback from the network
- congestion inferred from end-system observed loss, delay
- approach taken by TCP
Network-assisted congestion control:
- routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate the sender should send at
53
TCP Congestion Control
- end-end control (no network assistance)
- transmission rate limited by congestion window size, Congwin, over
  segments
- w segments, each with MSS bytes, sent in one RTT:

    throughput = (w * MSS) / RTT  bytes/sec
54
TCP congestion control: two phases
- slow start
- congestion avoidance (AIMD)
important variables:
- Congwin
- threshold: defines the boundary between the slow start phase and the
  congestion avoidance phase
"probing" for usable bandwidth:
- ideally: transmit as fast as possible (Congwin as large as possible)
  without loss
- increase Congwin until loss (congestion)
- loss: decrease Congwin, then begin probing (increasing) again
55
TCP Congestion Algorithms
- Additive Increase, Multiplicative Decrease (AIMD): increase the congestion
  window linearly; in case of congestion, cut it to half its current value.
- Slow Start: start from 1 MSS and increase exponentially; in case of
  congestion, start again from 1 MSS.
- Reaction to timeout events: timeouts are treated differently from triple
  duplicate ACKs. After a timeout, TCP enters slow start until CongWin
  reaches half its value from before the timeout event; from there CongWin
  grows linearly.
56
TCP Slowstart
- exponential increase (per RTT) in window size (not so slow!)
- loss event: timeout (Tahoe TCP) and/or three duplicate ACKs (Reno TCP)

Slowstart algorithm:
    initialize: Congwin = 1
    for (each segment ACKed)
        Congwin++
    until (loss event OR CongWin > threshold)

[Figure: timeline between Host A and Host B - one segment in the first RTT,
two in the second, four in the third, doubling each RTT.]
57
TCP Congestion Avoidance: Tahoe
    /* slowstart is over   */
    /* Congwin > threshold */
    Until (loss event) {
        every w segments ACKed:
            Congwin++
    }
    threshold = Congwin/2
    Congwin = 1
    perform slowstart
58
TCP Congestion Avoidance: Reno
    /* slowstart is over   */
    /* Congwin > threshold */
    Until (loss event) {
        every w segments ACKed:
            Congwin++
    }
    threshold = Congwin/2
    If (loss detected by timeout) {
        Congwin = 1
        perform slowstart
    }
    If (loss detected by triple duplicate ACK)
        Congwin = Congwin/2

three duplicate ACKs (Reno TCP):
- some segments are getting through correctly!
- don't overreact by decreasing the window to 1 as in Tahoe
- decrease window size by half
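The difference between Tahoe and Reno shows up clearly in a toy per-round simulation (the loss round and initial threshold are illustrative numbers, not a real trace):

```python
def evolve(rounds, loss_round, reno, cwnd=1, ssthresh=8):
    """Congestion window per transmission round: slow start doubles up to
    ssthresh, congestion avoidance adds 1, a loss halves ssthresh."""
    history = []
    for r in range(1, rounds + 1):
        history.append(cwnd)
        if r == loss_round:                   # loss event this round
            ssthresh = cwnd // 2
            cwnd = ssthresh if reno else 1    # Reno halves; Tahoe -> 1
        elif cwnd < ssthresh:
            cwnd = min(2 * cwnd, ssthresh)    # slow start: per-RTT doubling
        else:
            cwnd += 1                         # congestion avoidance: +1/RTT
    return history

tahoe = evolve(12, loss_round=8, reno=False)
reno = evolve(12, loss_round=8, reno=True)
```

Both variants grow identically until the loss at round 8 (window 12); afterwards Tahoe restarts slow start from 1, while Reno continues linearly from half the old window.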
59
Congestion Avoidance: Reno
- increase window by one per RTT if no loss: Congwin++
- decrease window by half on detection of loss by triple duplicate ACK:
  Congwin = Congwin/2 (W <- W/2)
[Figure: sender/receiver pipe diagrams illustrating the window W growing,
then halving.]
60
TCP Reno versus TCP Tahoe:
[Figure: Evolution of TCP's congestion window (Tahoe and Reno) - congestion
window size (segments, 0-14) versus transmission round (1-15), with the
threshold marked; after the loss event TCP Tahoe drops to 1 segment while
TCP Reno drops to half the window.]
61
TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link, each should
get 1/N of the link capacity
TCP congestion avoidance:
- AIMD: additive increase, multiplicative decrease
  - increase window by 1 per RTT
  - decrease window by factor of 2 on loss event
[Figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R.]
62
Why is TCP fair?
Two competing sessions:
- additive increase gives a slope of 1 as throughput increases
- multiplicative decrease decreases throughput proportionally
[Figure: Connection 1 throughput versus Connection 2 throughput, both axes
from 0 to R; the trajectory alternates between "congestion avoidance:
additive increase" and "loss: decrease window by factor of 2", converging
toward the equal bandwidth share line.]
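The convergence argument can be demonstrated numerically: additive increase leaves the throughput gap between the two flows unchanged, while each multiplicative decrease halves it. A toy Python simulation (link capacity and starting rates are made-up units):

```python
C = 20.0                      # bottleneck capacity (arbitrary units)
x1, x2 = 1.0, 10.0            # two flows starting far apart

for _ in range(60):           # one step = one RTT or one loss event
    if x1 + x2 <= C:
        x1 += 1.0             # additive increase: gap x2 - x1 unchanged
        x2 += 1.0
    else:
        x1 /= 2.0             # multiplicative decrease on shared loss:
        x2 /= 2.0             # the gap halves too

gap = abs(x1 - x2)
```

After a handful of increase/decrease cycles the gap, initially 9, drops below 1, and the two rates end up straddling the equal share C/2.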
63
TCP latency modeling: for STATIC window size
Q: How long does it take to receive an object from a Web server after
sending a request?
- TCP connection establishment
- data transfer delay
Notation, assumptions:
- assume one link between client and server, of rate R
- assume: fixed congestion window, W segments
- S: MSS (bits)
- O: object size (bits)
- no retransmissions (no loss, no corruption)
Two cases to consider:
- WS/R > RTT + S/R: ACK for first segment in window returns before window's
  worth of data sent
- WS/R < RTT + S/R: wait for ACK after sending window's worth of data
64
TCP latency Modeling
Case 1: latency = 2RTT + O/R
Case 2: latency = 2RTT + O/R + (K-1)*[S/R + RTT - WS/R]
        where K := O/(WS), the number of windows that cover the object
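Both cases plug straight into code. A small Python check with made-up parameters (S/R = 1 ms per segment, RTT = 10 ms, O = 20 segments, W = 4, chosen so case 2 applies):

```python
S_R = 1.0                 # time to transmit one segment, ms
RTT = 10.0                # round-trip time, ms
O_R = 20 * S_R            # object transmission time (20 segments), ms
W = 4                     # fixed window, segments

if W * S_R > RTT + S_R:                  # case 1: ACKs keep the pipe full
    K = None
    latency = 2 * RTT + O_R
else:                                    # case 2: sender stalls each window
    K = int(O_R / (W * S_R))             # number of windows covering object
    latency = 2 * RTT + O_R + (K - 1) * (S_R + RTT - W * S_R)
```

Here WS/R = 4 ms < RTT + S/R = 11 ms, so K = 5 and latency = 20 + 20 + 4*7 = 68 ms.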
65
TCP Latency Modeling: Slow Start - DYNAMIC window size
Now suppose the window grows according to slow start.
We will show that the latency of one object of size O is:

    Latency = 2*RTT + O/R + P*(RTT + S/R) - (2^P - 1)*S/R

where P is the number of times TCP stalls at the server:

    P = min{Q, K-1}

- where Q is the number of times the server would stall if the object were
  of infinite size
- and K is the number of windows that cover the object
66
TCP Latency Modeling: Slow Start (cont.)
[Figure: timeline between client and server - initiate TCP connection (RTT),
request object, first window = S/R, second window = 2S/R, third window =
4S/R, fourth window = 8S/R, complete transmission, object delivered.]
Example:
- O/S = 15 segments
- K = 4 windows
- Q = 2
- P = min{K-1, Q} = 2
- server stalls P = 2 times
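The example's K, Q, P and the latency formula can be verified with a short Python calculation (the unit values S/R = 1 and RTT = 2 are hypothetical, picked to reproduce Q = 2 from the figure):

```python
S_R, RTT = 1.0, 2.0       # segment transmit time and RTT (arbitrary units)
segments = 15             # O/S = 15 segments

# K: number of slow-start windows (1, 2, 4, ...) needed to cover the object
K, covered = 0, 0
while covered < segments:
    covered += 2 ** K     # the kth window carries 2^(k-1) segments
    K += 1

# Q: number of windows after which an infinite-size transfer would stall,
# i.e. the k for which S/R + RTT - 2^(k-1)*S/R > 0
Q = sum(1 for k in range(1, 64) if S_R + RTT - 2 ** (k - 1) * S_R > 0)

P = min(K - 1, Q)         # number of times the server actually stalls
O_R = segments * S_R
latency = 2 * RTT + O_R + P * (RTT + S_R) - (2 ** P - 1) * S_R
```

This reproduces K = 4, Q = 2, P = 2 from the example; the latency comes out as 4 + 15 + 6 - 3 = 22 time units.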




67
TCP Latency Modeling: Slow Start (cont.)

    time to transmit the kth window = (2^(k-1)) * S/R

    time from when the server starts to send a segment until it receives an
    acknowledgement = S/R + RTT

    stall time after the kth window = [S/R + RTT - (2^(k-1)) * S/R]+

    latency = 2*RTT + O/R + sum over p = 1..P of stallTime_p
            = 2*RTT + O/R + sum over k = 1..P of [S/R + RTT - (2^(k-1))*S/R]
            = 2*RTT + O/R + P*(RTT + S/R) - (2^P - 1)*S/R

[Figure: same client/server slow-start timeline as the previous slide.]
68
Summary
- principles behind transport layer services:
  - multiplexing/demultiplexing
  - reliable data transfer
  - flow control
  - congestion control
- instantiation and implementation in the Internet
  - UDP
  - TCP
Next:
- leaving the network "edge" (application, transport layers)
- into the network "core"