
TCP Congestion Control:

Algorithms and Analysis


Simon S. Lam
Department of Computer Sciences
The University of Texas at Austin

Little's Law

Average population = (average delay) x (throughput)

$$\text{average delay} = \frac{1}{N}\sum_{i=1}^{N} \text{delay}_i$$
where N is number of departures

$$\text{throughput} = N/T$$
where T is duration of observation

average population (to be defined)

Try homework problem at


http://www.cs.utexas.edu/users/lam/cs356/homework/hw2.html

TCP Congestion Control (Simon Lam)

[Figure: number in system n(t) plotted against time t]

$$\text{average population} = \frac{1}{T}\int_0^T n(t)\,dt$$
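As a quick numerical check of Little's Law, the sketch below integrates n(t) from a list of arrival and departure times and compares the result with (average delay) x (throughput). The function name and the sample times are made up for illustration, and the observation window is assumed to start at time 0 with an empty system and end at the last departure.

```python
def littles_law_check(arrivals, departures):
    """Return (time-averaged population, average delay * throughput).

    arrivals[i] and departures[i] are the entry and exit times of item i.
    Assumption: the system is empty at time 0 and at T = last departure.
    """
    n = len(arrivals)
    t_end = max(departures)                       # T, duration of observation
    avg_delay = sum(d - a for a, d in zip(arrivals, departures)) / n
    throughput = n / t_end                        # N / T

    # Integrate the step function n(t) over [0, T].
    events = sorted([(a, +1) for a in arrivals] + [(d, -1) for d in departures])
    area, population, last_t = 0.0, 0, 0.0
    for t, step in events:
        area += population * (t - last_t)
        population += step
        last_t = t
    return area / t_end, avg_delay * throughput


avg_pop, delay_times_rate = littles_law_check([0, 1, 2], [2, 4, 5])
print(avg_pop, delay_times_rate)
```

Both values agree because the area under n(t) is the sum of the individual delays when the system starts and ends empty.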

Sliding Window Protocol


Consider an infinite array, Source, at the sender, and an infinite array, Sink, at the receiver.

[Figure: Source array at sender P1, with positions 0 to a-1 acknowledged and positions a to s-1 unacknowledged inside the send window; Sink array at receiver P2, with positions below r delivered, r next expected, and the receive window spanning r to r + RW - 1, inside which some data units are already received]

RW: receive window size
SW: send window size (s - a <= SW)

Sliding Windows in Action


Data unit r has just been received by P2
Receive window slides forward
P2 sends cumulative ack with sequence number it expects to receive next (r+3)

[Figure: Source at sender P1 with acknowledged positions up to a-1 and unacknowledged positions a to s-1 in the send window; Sink at receiver P2 with the receive window advanced so that r+3 is next expected]

Sliding Windows in Action


P1 has just received cumulative ack with r+3 as next expected sequence number
Send window slides forward

[Figure: Source at sender P1 with the send window advanced past the newly acknowledged positions; Sink at receiver P2 with delivered positions, the next expected position, and the receive window ending at r + RW - 1]
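The two "in action" slides can be sketched in code as follows. This is a minimal illustration, not an implementation of TCP: the class and method names are invented here, sequence numbers count data units rather than bytes, and loss and retransmission are left out. The variables a, s, r, SW, and RW follow the slide notation.

```python
class Sender:
    def __init__(self, sw):
        self.a = 0       # lowest unacknowledged sequence number
        self.s = 0       # next sequence number to send
        self.sw = sw     # send window size; invariant: s - a <= SW

    def can_send(self):
        return self.s - self.a < self.sw

    def send(self):
        assert self.can_send()
        seq = self.s
        self.s += 1
        return seq

    def on_ack(self, next_expected):
        # Cumulative ack: everything below next_expected is acknowledged,
        # so the send window slides forward.
        self.a = max(self.a, next_expected)


class Receiver:
    def __init__(self, rw):
        self.r = 0               # next expected sequence number
        self.rw = rw             # receive window size
        self.received = set()    # units received inside the window

    def on_data(self, seq):
        if self.r <= seq < self.r + self.rw:
            self.received.add(seq)
        while self.r in self.received:   # receive window slides forward
            self.received.remove(self.r)
            self.r += 1
        return self.r                    # cumulative ack value


rx = Receiver(rw=4)
print(rx.on_data(1), rx.on_data(2), rx.on_data(0))  # last ack jumps to r+3
```

With r = 0, receiving units 1 and 2 first and then unit 0 reproduces the slide scenario in which the cumulative ack carries r+3.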

Window Flow Control


[Figure: Source and Destination timelines over one RTT; the source sends data packets 1, 2, ... and the destination returns the corresponding ACKs]

~ W packets per RTT when no loss
Lost packet detected by missing ACK (note: timeout value TO > RTT)

Throughput (send rate)


Limit the number of unacked transmitted packets in the network to window size W

$$\text{Throughput} = \frac{W}{RTT}\ \text{packets/sec} = \frac{W \times MSS}{RTT}\ \text{bytes/sec}$$

Where did we apply Little's Law?
Answer: Consider send buffer

Clarifications
Average number in the send buffer is typically less than W unless packet arrival rate to send buffer is infinite -> previous formula provides a throughput upper bound

If each packet may be lost with probability p, then the average delay is
$$(1 - p) \times RTT + p \times TO$$
Since TO > RTT, actual throughput is smaller.

With loss, goodput is
$$(1 - p) \times \text{throughput}$$

Note: in some papers and other contexts (e.g., random access protocols), goodput is called throughput. To avoid confusion, throughput is called send rate here.
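The formulas on the last two slides can be evaluated directly. The helper names and the sample numbers below are illustrative assumptions; the second function simply applies Little's Law with the loss-adjusted average delay (1 - p) x RTT + p x TO.

```python
def window_send_rate(w, rtt, mss=None):
    """Upper bound W/RTT in packets/sec, or W*MSS/RTT in bytes/sec if mss is given."""
    rate = w / rtt
    return rate if mss is None else rate * mss


def send_rate_with_loss(w, rtt, to, p):
    """W divided by the average delay (1 - p)*RTT + p*TO."""
    avg_delay = (1 - p) * rtt + p * to
    return w / avg_delay


def goodput(send_rate, p):
    """Fraction (1 - p) of the send rate that is not lost."""
    return (1 - p) * send_rate


# Hypothetical numbers: W = 10 packets, RTT = 100 ms, TO = 1 s, p = 0.1
rate = send_rate_with_loss(10, 0.1, 1.0, 0.1)
print(window_send_rate(10, 0.1), rate, goodput(rate, 0.1))
```

Because TO > RTT, the loss-adjusted send rate is always below W/RTT, which is the sense in which the earlier formula is an upper bound.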

Effect of Congestion
W too big for each of many flows -> congestion
Packet loss -> transmissions on links prior to packet loss are wasted
Congestion collapse due to too many retransmissions and too much waste
October 1986, Internet had its first congestion collapse

[Figure: goodput versus load]

TCP Window Control


Receiver flow control
  Avoid overloading receiver
  rwnd: receiver (advertised) window
  Receiver sends rwnd to sender
Network congestion control
  Sender tries to avoid overloading network
  It infers available network capacity from loss indications
  cwnd: congestion window
Sender sets W = min (cwnd, rwnd)

Receiver Flow Control


Receiver advertises rwnd with each packet it sends
Size of rwnd indicates available space in receive buffer
  decreased when data is received from IP layer and ACKed
  increased when data is consumed by application process

Network Congestion Control


Sender calculates cwnd from indications of network congestion
Congestion indications
  timeout (loss)
  dupACK (loss likely)
  queueing delay
  mark (needs ECN)
TCP algorithms to calculate cwnd
  Tahoe, Reno, Vegas,
Link algorithms:
  RED, REM

TCP & AQM


[Figure: feedback loop between congestion measures $p_l(t)$ and rates $x_i(t)$]

Congestion measures $p_l(t)$ for distributed feedback control of $x_i(t)$
  loss and dupACK (DropTail)
  queueing delay (Vegas)
  with the help of active queue management (AQM)
    queue length (RED)
    price (REM)

TCP Congestion Control


Tahoe (Jacobson 1988)
  Slow Start
  Congestion Avoidance
  Fast Retransmit
Reno (Jacobson 1990)
  Fast Recovery
  Its variants: NewReno, SACK
Vegas (Brakmo & Peterson 1994)
  New Congestion Avoidance
AQM
  RED (Floyd & Jacobson 1993)
    Probabilistic marking or dropping
  REM (Athuraliya & Low 2000)
    Clear buffer, match rate
  Others

Slow Start
Start with cwnd = 1
On each successful ACK, increment cwnd
  cwnd <- cwnd + 1
Exponential growth of cwnd
  each RTT: cwnd <- 2 x cwnd
Enter CA when cwnd >= ssthresh
For initial slow start, ssthresh is set to a very large value (e.g., 65 Kbytes)
Note: for clarity, cwnd, rwnd, and ssthresh are counted in packets (segments) rather than in bytes

Slow Start
[Figure: sender and receiver timelines exchanging data packets and ACKs, one RTT per round, with cwnd growing through 1, 2, 3, ..., 8]

cwnd <- cwnd + 1 (for each ACK)

Congestion Avoidance
[Figure: sender and receiver timelines exchanging data packets and ACKs, one RTT per round]

CA starts when cwnd >= ssthresh
On each successful ACK:
  cwnd <- cwnd + 1/cwnd
Linear growth of cwnd
  each RTT: cwnd <- cwnd + 1

Packet Loss
Assumption: loss indicates congestion
Packet loss detected by
  Retransmission timeout (RTO timer)
  Duplicate ACKs (at least 3)

[Figure: packets sent and acknowledgements returned]

Fast Retransmit
A timeout is quite long (> RTT)
Upon receiving 3 dupACKs, immediately retransmit without waiting for timeout
Adjusts ssthresh
  ssthresh <- max(flightsize/2, 2)
  where flightsize is number of outstanding packets, which may be less than W = min(rwnd, cwnd)
Enter Slow Start (cwnd = 1)

TCP Tahoe

(Jacobson 1988)

[Figure: cwnd versus time (in RTTs), alternating between SS and CA phases]

SS: Slow Start
CA: Congestion Avoidance

Successive Timeouts
When there is another timeout, double the timeout value
Keep doing so for each additional loss retransmission
Exponential backoff up to max timeout value equal to 64 times initial timeout value

Note: red line in figure denotes a loss indication
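The backoff rule can be written as a one-line schedule. The function name and the default cap parameter are assumptions for illustration; the factor of 64 is the limit stated on the slide.

```python
def backoff_timeouts(initial_to, count, max_factor=64):
    """Timeout value used for each of `count` successive timeouts.

    The value doubles after every timeout until it reaches
    max_factor times the initial timeout, then stays there.
    """
    return [initial_to * min(2 ** k, max_factor) for k in range(count)]


print(backoff_timeouts(1.0, 9))
```

Summing the first k entries gives the total time spent in k timeouts in a row: 63 initial timeouts for k = 6, and 64 more for each timeout after that.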

Summary: Tahoe
Basic ideas
  Probe network for spare capacity during SS and CA and increase send rate
  Drastically reduce rate on congestion indication
  Self-clocking
  Error recovery by retransmission
  Round trip time estimation (to get TO value)

for every ACK {
    if (W < ssthresh) then W++      (SS)
    else W += 1/W                   (CA)
}
for every loss indication {
    ssthresh = W/2
    W = 1
}
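The Tahoe pseudocode above translates almost line for line into a small class. The class name and default ssthresh are assumptions, and the floor of 2 on ssthresh is borrowed from the Fast Retransmit slide rather than from the summary pseudocode. W is counted in packets.

```python
class Tahoe:
    def __init__(self, ssthresh=64.0):
        self.w = 1.0               # congestion window in packets
        self.ssthresh = ssthresh   # assumed initial value, "very large"

    def on_ack(self):
        if self.w < self.ssthresh:
            self.w += 1            # SS
        else:
            self.w += 1 / self.w   # CA

    def on_loss(self):
        # Any loss indication (timeout or 3 dupACKs) restarts slow start.
        self.ssthresh = max(self.w / 2, 2)   # floor of 2 from the Fast Retransmit slide
        self.w = 1.0


t = Tahoe(ssthresh=8)
for _ in range(7):
    t.on_ack()
t.on_loss()
print(t.w, t.ssthresh)
```

After seven ACKs the window reaches 8; the loss indication then halves ssthresh to 4 and resets W to 1.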


TCP Reno

(Jacobson 1990)

[Figure: cwnd versus time, with SS and CA phases and fast retransmission/fast recovery after a loss]

SS: Slow Start
CA: Congestion Avoidance

TCP Reno (another scenario)


[Figure: cwnd versus time. After the initial slow start, 3 dupACKs cause cwnd to be halved; a timeout (TO) is followed by slow start until cwnd reaches ssthresh]

Fast recovery (in detail)


Idea: each dupACK represents a packet successfully received. Therefore, no need for very drastic action
Enter FR/FR after 3 dupACKs
  Set ssthresh <- max(flightsize/2, 2)
  Retransmit lost packet
  Set cwnd <- ssthresh + #dupACKs (window inflation)
  Wait till W = min(rwnd, cwnd) is large enough; transmit new packet(s)
  On non-dup ACK (1 RTT later), set cwnd <- ssthresh (window deflation)
Enter CA

Example: FR/FR
[Figure: timeline at sender S for packets 1 to 8 followed by new packets 9, 10, and 11; the returned ACKs all carry seq. no. 0; cwnd/ssthresh annotations read cwnd 8 initially, then 7/4, 9/4, and 11/4 during FR/FR, and 4/4 at exit from FR/FR]

Above scenario: Packet 1 is lost, packets 2, 3, and 4 are received; dupACKs with seq. no. 0 returned
Fast retransmit
  Retransmit on 3 dupACKs
Fast recovery
  Inflate window such that new packets 9, 10, and 11 can be sent while repairing loss
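The FR/FR example can be replayed with a few lines of code. This sketch uses an invented function name and assumes that packets 1 to flightsize are outstanding, packet 1 is lost, every other packet produces a dupACK, and rwnd is large enough that W = cwnd.

```python
def fast_recovery_trace(flightsize, dupacks):
    """Return (ssthresh, cwnd after each dupACK, new packets sent)."""
    cwnd = flightsize
    ssthresh = None
    next_seq = flightsize + 1          # first packet not yet transmitted
    trace, new_packets = [], []
    for n in range(1, dupacks + 1):
        if n == 3:
            ssthresh = max(flightsize // 2, 2)
            cwnd = ssthresh + 3        # retransmit packet 1, inflate by 3 dupACKs
        elif n > 3:
            cwnd += 1                  # further inflation, one per dupACK
        trace.append(cwnd)
        # Packet 1 is still unacked, so the window covers packets 1..cwnd.
        while n >= 3 and next_seq <= cwnd:
            new_packets.append(next_seq)
            next_seq += 1
    return ssthresh, trace, new_packets


print(fast_recovery_trace(flightsize=8, dupacks=7))
```

For 8 outstanding packets and 7 dupACKs, ssthresh becomes 4, cwnd inflates from 7 up to 11, and packets 9, 10, and 11 are released, matching the slide. On the next non-dup ACK cwnd would be deflated to ssthresh, which this sketch does not model.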

Summary: Reno
Basic ideas
Fast recovery avoids slow start
dupACKs: fast retransmit + fast recovery
Timeout: retransmit + slow start
[State diagram with states congestion avoidance, FR/FR, retransmit, and slow start; transitions are labeled dupACKs and timeout]

AIMD in steady state


additive increase: increase cwnd by 1 MSS every RTT in the absence of any loss event
multiplicative decrease: cut cwnd in half after 3 dupACKs

[Figure: saw-tooth congestion window of a long-lived TCP connection versus time, with axis marks at 8, 16, and 24 Kbytes]

TCP throughput (send rate)


We derived the approximate formula
$$\text{throughput} = \frac{W}{RTT}\ \text{packets/sec}$$
W changes with the arrival of each congestion indication
To calculate (average) send rate, we need the average value of W
Q: W is a function of what parameter?

First approximation

M. Mathis, et al., "The Macroscopic Behavior of the TCP Congestion Avoidance Algorithm," ACM Computer Communications Review, 27(3), 1997.

No slow-start, no timeout, long-lived TCP connection
Independent identically distributed periods
Each packet may be lost with probability p

Geometric Distribution

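Ave. no. of transmissions to get first loss

$$\bar{n} = \sum_{i=1}^{\infty} i\, b_i = \sum_{i=1}^{\infty} i\,(1-p)^{i-1} p = p \sum_{i=1}^{\infty} i\,(1-p)^{i-1}$$
$$= -p \sum_{i=1}^{\infty} \frac{d}{dp}(1-p)^{i} = -p\,\frac{d}{dp} \sum_{i=0}^{\infty} (1-p)^{i} = -p\,\frac{d}{dp}\,\frac{1}{1-(1-p)} = -p\left(-\frac{1}{p^2}\right) = 1/p$$

Similarly, ave. no. of transmissions to get first success is 1/(1-p)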
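The closed form 1/p can be checked numerically by truncating the series. The function name and the truncation length are assumptions; the tail beyond the default number of terms is negligible for the loss probabilities used here.

```python
def mean_transmissions_to_first_loss(p, terms=100_000):
    """Truncated sum of i * (1 - p)**(i - 1) * p for i = 1..terms."""
    return sum(i * (1 - p) ** (i - 1) * p for i in range(1, terms + 1))


for p in (0.25, 0.01):
    print(p, mean_transmissions_to_first_loss(p), 1 / p)
```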

First approximation (cont.)


Average number of packets delivered in one period (area under one saw-tooth)
$$\left(\frac{W}{2}\right)^2 + \frac{1}{2}\left(\frac{W}{2}\right)^2 = \frac{3}{8} W^2$$
Average number of packets sent per period (incl. loss at the end) is 1/p
Equating the two and solving for W, we get
$$W = \sqrt{\frac{8}{3p}}$$

send rate (in packets/sec)
$$= \frac{\text{no. of packets/period}}{\text{time per period}} = \frac{\frac{3}{8}W^2}{\frac{W}{2}\,RTT} = \frac{1/p}{RTT\sqrt{\frac{2}{3p}}} = \frac{1}{RTT}\sqrt{\frac{3}{2p}}$$

TCP ACK generation

[RFC 1122, RFC 2581]

Event at Receiver -> TCP Receiver action

Event: Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
Action: Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK

Event: Arrival of in-order segment with expected seq #. One other segment has ACK pending
Action: Immediately send single cumulative ACK, ACKing both in-order segments

Event: Arrival of out-of-order segment with higher-than-expected seq. #. Gap detected
Action: Immediately send duplicate ACK, indicating seq. # of next expected byte

Event: Arrival of segment that partially or completely fills gap
Action: Immediately send ACK, provided that segment starts at lower end of gap

Receiver implements Delayed ACKs


Receiver sends one ACK for every two packets received -> each saw-tooth is W x RTT wide -> area under a saw-tooth is
$$\frac{3}{4} W^2$$
Send rate is
$$\frac{1}{RTT}\sqrt{\frac{3}{4p}}$$
One ACK for every b packets received -> send rate is
$$\frac{1}{RTT}\sqrt{\frac{3}{2bp}}$$
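The saw-tooth result, with and without delayed ACKs, fits in two small functions. The names are illustrative; b is the number of packets covered by one ACK (b = 1 gives the first approximation, b = 2 the delayed-ACK case), and the peak window follows from equating the area 3bW^2/8 with 1/p.

```python
import math


def peak_window(p, b=1):
    """Peak window W of the saw-tooth, from (3*b/8) * W**2 = 1/p."""
    return math.sqrt(8 / (3 * b * p))


def sawtooth_send_rate(p, rtt, b=1):
    """Send rate (1/RTT) * sqrt(3 / (2*b*p)) in packets/sec."""
    return math.sqrt(3 / (2 * b * p)) / rtt


# Hypothetical values: p = 1%, RTT = 100 ms
for b in (1, 2):
    print(b, peak_window(0.01, b), sawtooth_send_rate(0.01, 0.1, b))
```

The rate equals the packets per period, 1/p, divided by the period length b x (W/2) x RTT, which is a convenient consistency check on the two formulas.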

Modeling TCP Throughput: A Simple Model and its Empirical Validation,
Proc. ACM SIGCOMM, 1998
Jitendra Padhye, Victor Firoiu, Don Towsley, and Jim Kurose
Motivation
Previous formulas not so accurate when loss rates are high
TCP traces show that there are more loss indications due to timeouts (TO) than due to triple dupACKs (TD)

Objectives
More accurate steady-state throughput formula as a function of loss rate and RTT by also accounting for TO behavior of a TCP connection
Formula applicable over a wider range of loss rates
Explicit statements of assumptions and approximations used in derivation of throughput formula
Formula to include the impact of a small rwnd

Many assumptions and approximations

A1. TCP sender is saturated, i.e., source application process always has a packet to send when send window has space available
  i.e., bulk transfer application
A2. Slow Start not modeled
A3. Time to send all packets in a window is smaller than RTT
  i.e., transmission rate is not too low

A3. Time to send W packets is less than RTT

ACK reception marks the end of current round and beginning of next round.
Approximation: For b > 1, ACK is not received immediately after one RTT, but it is so assumed in the analysis

[Figure: space-time diagram marking the start of round and end of round]

AIMD evolution of Window Size over time

Each TD period is ended by a TD loss indication.
$TDP_i$ period has duration $A_i$ rounds
A4. Duration of a round (RTT) is independent of window size
  approximation (poor for a slow line)
A5. No window inflation in Fast Recovery
  approximation

Markov regenerative assumption


For the i-th TD period, $W_i$ is window size at the end of the period, $Y_i$ is the number of packets sent in the period
A6. Assume $\{W_i\}$ to be a Markov regenerative process with rewards $\{Y_i\}$
Given A6, the steady-state TCP throughput is
$$B = \lim_{t\to\infty} B_t = \lim_{t\to\infty} \frac{N_t}{t} = \frac{E[Y_i]}{E[A_i]} = \frac{E[Y]}{E[A]}$$

Consider i-th TD period

One ACK after receiving b packets (b = 2 in above figure) -> linear increase has a slope of 1/b packet per RTT
Number of rounds is $X_i + 1$
$\alpha_i$ is the first packet lost in i-th TD period

[Figure annotation: when ACK of last packet is received]

Loss assumptions
A7. Losses in different rounds are independent
  approximation
A8. Losses within the same round are correlated as follows: If a packet is lost, all remaining packets transmitted until the end of that round are also lost
  approximation: bursty loss behavior but only within the same round
  all lost packets in the same round are counted as a single loss indication when estimating p

AIMD throughput derivation (1)

Notation
  $\alpha$: seq. no. of first loss
  r: round trip time
  Y: no. of packets sent
  W: window size
  X: no. of rounds
  A: time duration of a period

$$E[\alpha] = 1/p$$
$$E[r] = RTT$$
$$E[Y] = E[\alpha] + E[W] - 1 = \frac{1-p}{p} + E[W]$$
From $W_i = \frac{W_{i-1}}{2} + \frac{X_i}{b}$, we have
$$E[X] = \frac{b}{2}E[W]$$
$$E[A] = (E[X] + 1)\,E[r] = \left(\frac{b}{2}E[W] + 1\right) RTT$$
<- from A4 that round trip times are independent of $W_i$
$$\text{send rate}\quad B = \frac{E[Y]}{E[A]} = \frac{\frac{1-p}{p} + E[W]}{\left(\frac{b}{2}E[W] + 1\right) RTT}$$

AIMD throughput derivation (2)


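Another way to compute E[Y]
$$Y_i = \sum_{k=0}^{X_i/b - 1}\left(\frac{W_{i-1}}{2} + k\right) b + \beta_i = \frac{X_i W_{i-1}}{2} + \frac{X_i}{2}\left(\frac{X_i}{b} - 1\right) + \beta_i = \frac{X_i}{2}\left(W_i + \frac{W_{i-1}}{2} - 1\right) + \beta_i$$
Let $E[\beta]$ be $E[W]/2$ and we have
$$E[Y] = \frac{E[X]}{2}\left(E[W] + \frac{E[W]}{2} - 1\right) + E[\beta] = \frac{b\,E[W]}{4}\left(E[W] + \frac{E[W]}{2} - 1\right) + \frac{E[W]}{2}$$
<- A9. Assume that $\{X_i\}$ and $\{W_i\}$ are mutually independent i.i.d. sequences of random variables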

AIMD throughput (3)


Equate the two previous formulas for E[Y]. Solve the quadratic equation with E[W] as the only unknown
$$E[Y] = \frac{1-p}{p} + E[W] = \frac{b\,E[W]}{4}\left(E[W] + \frac{E[W]}{2} - 1\right) + \frac{E[W]}{2}$$
$$E[W] = \frac{2+b}{3b} + \sqrt{\frac{8(1-p)}{3bp} + \left(\frac{2+b}{3b}\right)^2}$$
$$\text{send rate}\quad B(p) = \frac{\frac{1-p}{p} + E[W]}{E[A]}$$
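As a sanity check on the algebra, the sketch below evaluates the closed form for E[W] and confirms that it balances the two expressions for E[Y]. Function names are assumptions; E[A] is taken as (b E[W]/2 + 1) RTT from the first derivation slide, and timeouts are ignored.

```python
import math


def expected_window(p, b=2):
    """E[W] from the quadratic: (2+b)/(3b) + sqrt(8(1-p)/(3bp) + ((2+b)/(3b))**2)."""
    c = (2 + b) / (3 * b)
    return c + math.sqrt(8 * (1 - p) / (3 * b * p) + c * c)


def td_only_send_rate(p, rtt, b=2):
    """B(p) = ((1-p)/p + E[W]) / E[A], with E[A] = (b*E[W]/2 + 1) * RTT."""
    w = expected_window(p, b)
    e_y = (1 - p) / p + w
    e_a = (b * w / 2 + 1) * rtt
    return e_y / e_a


p, b = 0.02, 2
w = expected_window(p, b)
lhs = (1 - p) / p + w                          # first formula for E[Y]
rhs = (b * w / 4) * (w + w / 2 - 1) + w / 2    # second formula for E[Y]
print(w, lhs, rhs, td_only_send_rate(p, rtt=0.1, b=b))
```

For small p the rate approaches the simpler expression (1/RTT) sqrt(3/(2bp)).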

AIMD throughput (4)


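To get a simple formula, collect terms that are o(1/sqrt(p))
$$E[W] = \sqrt{\frac{8}{3bp}} + o(1/\sqrt{p})$$
$$E[X] = \frac{b}{2}E[W] = \sqrt{\frac{2b}{3p}} + o(1/\sqrt{p})$$
$$\text{send rate}\quad B(p) = \frac{1/p + o(1/\sqrt{p})}{RTT\left(\sqrt{\frac{2b}{3p}} + o(1/\sqrt{p})\right)} = \frac{1}{RTT}\sqrt{\frac{3}{2bp}} + o(1/\sqrt{p})$$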

AIMD with TO

Let $n_i$ denote the number of TD periods within a cycle ending in i-th TO period, $R_i$ denote no. of retransmissions in i-th TO period
A10. $\{n_i\}$ form an i.i.d. sequence, independent of $\{Y_{ij}\}$ and $\{A_{ij}\}$

Throughput of AIMD with TO (1)


Assumption of Markov regenerative process again.
$$E[M] = E[n]\,E[Y] + E[R]$$
$$E[S] = E[n]\,E[A] + E[Z^{TO}]$$
$$\text{send rate}\quad B = \frac{E[M]}{E[S]} = \frac{E[n]\,E[Y] + E[R]}{E[n]\,E[A] + E[Z^{TO}]} = \frac{E[Y] + Q\,E[R]}{E[A] + Q\,E[Z^{TO}]}$$
where
$$Q = \frac{1}{E[n]}$$
<- Probability that a given loss indication is a TO
$$E[R] = \frac{1}{1-p}$$
with Q and $E[Z^{TO}]$ to be determined

Approximate solution for Q

The event that a given loss indication is a TO is the union of two events: two or fewer ACKed packets in the penultimate round, or two or fewer ACKed packets in the final round

Approximate solution for Q (cont.)


$$A(w,k) = \frac{(1-p)^k\,p}{1-(1-p)^w}$$
<- penultimate round of w packets, first k packets ACKed given there is a loss

$$C(k,m) = \begin{cases} (1-p)^m\,p, & m \le k-1 \\ (1-p)^m, & m = k \end{cases}$$
<- for last round, k packets sent, m packets ACKed in sequence

$$\hat{Q}(w) = \begin{cases} 1, & \text{if } w \le 3 \\ \sum_{k=0}^{2} A(w,k) + \sum_{k=3}^{w} A(w,k) \sum_{m=0}^{2} C(k,m), & \text{if } w \ge 4 \end{cases}$$
<- probability of fewer than 3 packets sent successfully in penultimate round or less than 3 acks in last round (at most 2 dupACKs)

Approximate solution for Q (cont.)

Q is $E[\hat{Q}(w)]$
But we don't know the probability distribution of $W_i$
Approximation
$$Q \approx \hat{Q}(E[W]) \approx \min\left(1, \frac{3}{E[W]}\right) \approx \min\left(1, 3\sqrt{\frac{3bp}{8}}\right)$$
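To see how close min(1, 3/w) is to the sum it replaces, the sketch below evaluates both. The function names are invented, the summation limits are taken as written on the previous slide (k from 3 to w), and the result is capped at 1; treat it as a rough numerical illustration rather than the authors' code.

```python
def q_hat_sum(w, p):
    """Evaluate the case formula for Q-hat(w) using A(w, k) and C(k, m)."""
    if w <= 3:
        return 1.0

    def a(k):
        return (1 - p) ** k * p / (1 - (1 - p) ** w)

    def c(k, m):
        return (1 - p) ** m * p if m <= k - 1 else (1 - p) ** m

    total = sum(a(k) for k in range(3))
    total += sum(a(k) * sum(c(k, m) for m in range(3)) for k in range(3, w + 1))
    return min(1.0, total)


def q_hat_approx(w):
    """The approximation min(1, 3/w)."""
    return min(1.0, 3.0 / w)


for w in (4, 10, 30):
    print(w, q_hat_sum(w, 0.001), q_hat_approx(w))
```

For small p the two values are close, which is what makes the 3/E[W] shortcut usable in the final formula.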

Throughput of AIMD with TO (2)


$$P[R = k] = p^{k-1}(1-p) \quad \text{for } k = 1, 2, \ldots$$
$$L_k = \begin{cases} (2^k - 1)\,T_0 & \text{for } k \le 6 \\ (63 + 64(k-6))\,T_0 & \text{for } k \ge 7 \end{cases}$$
<- duration of k TOs in a row
$$E[Z^{TO}] = T_0\,\frac{1 + p + 2p^2 + 4p^3 + 8p^4 + 16p^5 + 32p^6}{1-p} = T_0\,\frac{f(p)}{1-p} \approx T_0\,(1 + 32p^2)$$
<- approximation
$$\text{send rate}\quad B(p) = \frac{E[Y] + Q\,E[R]}{E[A] + Q\,E[Z^{TO}]} = \frac{\frac{1-p}{p} + E[W] + \hat{Q}(E[W])\,\frac{1}{1-p}}{RTT\,(E[X] + 1) + \hat{Q}(E[W])\,T_0\,\frac{f(p)}{1-p}}$$

Throughput of AIMD with TO (3)


$$B(p) = \frac{\frac{1-p}{p} + E[W] + \hat{Q}(E[W])\,\frac{1}{1-p}}{RTT\,(E[X] + 1) + \hat{Q}(E[W])\,T_0\,\frac{f(p)}{1-p}}$$
<- Eq. (27), more accurate version of throughput formula
$$\approx \frac{1/p}{RTT\sqrt{\frac{2b}{3p}} + \min\left(1, 3\sqrt{\frac{3bp}{8}}\right)(1 + 32p^2)\,T_0}$$
$$= \frac{1}{RTT\sqrt{\frac{2bp}{3}} + \min\left(1, 3\sqrt{\frac{3bp}{8}}\right) p\,(1 + 32p^2)\,T_0}$$
<- Eq. (29), most well-known version of throughput formula

Impact of receiver's rwnd limitation

Full model Eq. (31)
Compute E[W]. If $E[W] < W_{max}$, use Eq. (27):
$$B(p) = \frac{\frac{1-p}{p} + E[W] + \hat{Q}(E[W])\,\frac{1}{1-p}}{RTT\,(E[X] + 1) + \hat{Q}(E[W])\,T_0\,\frac{f(p)}{1-p}}$$
otherwise, use $W_{max}$ for E[W] and recompute E[X]
$$B(p) = \frac{\frac{1-p}{p} + W_{max} + \hat{Q}(W_{max})\,\frac{1}{1-p}}{RTT\left(\frac{b}{8}W_{max} + \frac{1-p}{p\,W_{max}} + 2\right) + \hat{Q}(W_{max})\,T_0\,\frac{f(p)}{1-p}}$$
(derivation omitted)

Impact of receiver's rwnd limitation: approximate model

Use the well-known Eq. (29) from before,
$$B(p) = \min\left(\frac{W_{max}}{RTT},\ \frac{1}{RTT\sqrt{\frac{2bp}{3}} + \min\left(1, 3\sqrt{\frac{3bp}{8}}\right) p\,(1 + 32p^2)\,T_0}\right)$$
which is referred to as Eq. (32)
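Eq. (32) is simple enough to evaluate directly. The function and parameter names below are assumptions, as are the sample inputs; the result is in packets per second, with rtt and t0 in seconds and wmax in packets.

```python
import math


def pftk_approx_send_rate(p, rtt, t0, wmax, b=2):
    """Approximate send rate of Eq. (32), capped by Wmax/RTT."""
    td_term = rtt * math.sqrt(2 * b * p / 3)
    to_term = t0 * min(1.0, 3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2)
    return min(wmax / rtt, 1 / (td_term + to_term))


# Hypothetical connection: RTT = 100 ms, T0 = 1 s, Wmax = 32 packets
for p in (1e-8, 0.001, 0.01, 0.1):
    print(p, pftk_approx_send_rate(p, rtt=0.1, t0=1.0, wmax=32))
```

At very low loss rates the window cap Wmax/RTT dominates; as p grows, the timeout term pulls the rate below the triple-dupACK-only prediction.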

Summary data from traces (1 hour)


Saturated TCP sender
p computed from dividing total no. of loss indications by total number of packets sent
RTT and TO values are averaged over entire 1-hour trace

Summary data from 100s traces

Each row represents results of 100 traces each of 100 seconds in duration for same S-D pair
Totals are cumulative over 100 traces
RTT and TO are average values over 100 traces for same S-D pair

Experimental comparison (1)

Each point represents number of packets in 100s interval of trace
T0 ~ single TO, T1 ~ at least 1 double TO in trace, etc.
"TD Only" is analytic model by Mathis et al.
Note: Wmax is only 6 in Figure 7

Experimental comparison (2)

[Figure panels: Wmax = 33 and Wmax = 44]

Experimental comparison (3)

[Figure panels: Wmax = 8 and Wmax = 48]

Accuracy of approximate model

Figure 18: manic to spiff, with predictions by both full and approximate models (Wmax = 32)

Average errors

$$\text{ave. error} = \frac{\sum_{\text{observations}} \frac{|N_{\text{predicted}} - N_{\text{observed}}|}{N_{\text{observed}}}}{\text{no. of observations}}$$

Conclusions
A much more rigorous analysis than the one by Mathis et al.
Numerous assumptions and approximations used but (almost) all of them are explicitly stated
Large amount of experimental measurements on the Internet to validate accuracy of the full model (less for the approximate model)
Throughput formula accounts for loss indications due to TO as well as rwnd restriction
  Using the formula requires accurate measurements of loss rate and RTT values (which could be tricky)
  For TCP Reno and drop-tail router
Accuracy (like beauty) is in the eye of the beholder. What do you think?

TCP Throughput limited by loss rate


TCP average throughput (approximate) in terms of loss rate, p:
$$\frac{1.22 \times MSS}{RTT\sqrt{p}}$$
Example: 1500-byte segments, 100ms RTT, to get 10 Gbps throughput, loss rate needs to be very low
$$p = 2 \times 10^{-10}$$
New version of TCP needed for connections with high bandwidth-delay product
  addressed in paper by Katabi et al.
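The example can be reproduced by inverting the throughput formula for p. The function name is an assumption; MSS is converted to bits so that the target rate can be given in bits per second.

```python
def required_loss_rate(target_bps, mss_bytes, rtt_s):
    """Solve 1.22 * MSS / (RTT * sqrt(p)) = target throughput for p."""
    return (1.22 * mss_bytes * 8 / (rtt_s * target_bps)) ** 2


# 1500-byte segments, 100 ms RTT, 10 Gbps target
print(required_loss_rate(10e9, 1500, 0.1))
```

The result is roughly 2 x 10^-10, i.e. about one loss in every five billion segments.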


The End
