Erich Nahum
Clusters
Client-side issues: DNS, HTML rendering
Proxies: some similarities, many differences
Dynamic Content: CGI, PHP, JSP, etc.
QoS for Web Servers
SSL/TLS and HTTPS
Content Distribution Networks (CDNs)
Security and Denial of Service
Called Methods:
GET: retrieve a file (95% of requests)
HEAD: just get meta-data (e.g., mod time)
POST: submitting a form to a server
PUT: store enclosed document as URI
DELETE: remove named resource
LINK/UNLINK: in 1.0, gone in 1.1
TRACE: http echo for debugging (added in 1.1)
CONNECT: used by proxies for tunneling (1.1)
OPTIONS: request for server/proxy options (1.1)
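As a rough illustration of the method list above, the sketch below parses a request line and checks the method against the version it arrived with. The two sets simply transcribe the list; the function name `parse_request_line` is hypothetical, and a real server would handle malformed input far more carefully.

```python
# Methods by protocol version, transcribed from the list above.
HTTP_1_0_METHODS = {"GET", "HEAD", "POST", "PUT", "DELETE", "LINK", "UNLINK"}
HTTP_1_1_METHODS = {"GET", "HEAD", "POST", "PUT", "DELETE",
                    "TRACE", "CONNECT", "OPTIONS"}


def parse_request_line(line):
    """Split 'METHOD URI HTTP/x.y' and check the method against the version."""
    method, uri, version = line.strip().split(" ")
    allowed = HTTP_1_1_METHODS if version == "HTTP/1.1" else HTTP_1_0_METHODS
    return method, uri, version, method in allowed
```

For example, `parse_request_line("LINK /a HTTP/1.1")` reports the method as not allowed, since LINK was dropped in 1.1.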
acceptex() is called
gets new socket, request, remote host IP address
string match in hash table is done to parse request
hash table entry contains relevant meta-data, including modification times, file
descriptors, permissions, etc.
sendfile() is called
pre-computed header, file descriptor, and close option
log written back asynchronously (buffered write()).
That's it!
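The fast path above can be sketched roughly as follows. This is only an illustration under assumptions: the header contents, the dictionary layout, and the names `build_cache` and `serve` are hypothetical, and Python's `socket.sendfile()` stands in for the `sendfile()` call on the slide. Logging and the close option are omitted.

```python
import os
import socket


def build_cache(paths):
    """Map each URI to pre-computed metadata, like the hash table entry."""
    cache = {}
    for uri, filename in paths.items():
        f = open(filename, "rb")          # kept open: wraps the file descriptor
        st = os.fstat(f.fileno())
        header = (
            "HTTP/1.0 200 OK\r\n"
            f"Content-Length: {st.st_size}\r\n"
            "\r\n"
        ).encode("ascii")                 # header is built once, not per request
        cache[uri] = {"file": f, "size": st.st_size,
                      "mtime": st.st_mtime, "header": header}
    return cache


def serve(conn, request_line, cache):
    """Parse the request by hash lookup, then send header and file body."""
    uri = request_line.split(" ")[1]
    entry = cache.get(uri)
    if entry is None:
        conn.sendall(b"HTTP/1.0 404 Not Found\r\n\r\n")
        return 404
    conn.sendall(entry["header"])
    # socket.sendfile() uses os.sendfile() where available, else plain send().
    conn.sendfile(entry["file"], offset=0, count=entry["size"])
    return 200
```

The point of the design is that nothing on the request path touches the file system metadata: the lookup is one dictionary access and the header bytes already exist.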
Advantages:
Most importantly, consistent with programmer's way of
thinking. Most programmers think in terms of linear series of
steps to accomplish task.
Processes are protected from one another; can't nuke data in
some other address space. Similarly, if one crashes, others
unaffected.
Disadvantages:
Slow. Forking is expensive; allocating a stack and VM data structures
for each process adds up and puts pressure on the memory
system.
Difficulty in sharing info across processes.
Have to use locking.
No control over scheduling decisions.
Disadvantages:
Less robust. Failure can halt whole server.
Pushes per-process resource limits (like file descriptors).
Not every OS has full asynchronous I/O, so can still block on a file
read. Flash uses helper processes to deal with this (AMPED
architecture).
[Figure: two protocol stacks side by side, each layered TCP / IP / ETH]
Some terminology/jargon:
Mean: average of samples
Median: half are bigger, half are smaller
Percentiles: dump samples into N bins
(median is 50th percentile number)
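A small sketch of these terms using Python's standard `statistics` module. The helper name `summarize` and the choice of 100 bins with the "inclusive" interpolation method are assumptions made for illustration.

```python
import statistics


def summarize(samples):
    """Return mean, median, and two percentiles of a list of samples."""
    # quantiles(n=100) returns 99 cut points; index 49 is the 50th percentile.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "p50": cuts[49],
        "p90": cuts[89],
    }
```

With samples such as `[1, 2, 3, 4, 100]`, a single large value pulls the mean well above the median, which is why both are usually reported for Web workloads.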
Heavy-tailed:
$\Pr[X > x] \sim c\,x^{-a}$ as $x \to \infty$
Normal:
(avg. mu, variance sigma^2)
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$
Lognormal:
(x > 0; sigma > 0)
$f(x) = \frac{1}{x\,\sigma\sqrt{2\pi}}\, e^{-(\ln(x)-\mu)^2/(2\sigma^2)}$
Exponential:
(x >= 0)
$f(x) = \lambda e^{-\lambda x}$
Pareto:
(x >= k, shape a, scale k)
$f(x) = a k^{a} / x^{a+1}$
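The densities above can be written out directly, which makes it easy to compare tails numerically. This is a minimal sketch; the function names are hypothetical, and `pareto_tail` uses the standard closed form Pr[X > x] = (k/x)^a for the Pareto distribution, which is the heavy-tail form with c = k^a.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def lognormal_pdf(x, mu, sigma):
    """Lognormal density, defined for x > 0."""
    return math.exp(-((math.log(x) - mu) ** 2) / (2 * sigma ** 2)) / (
        x * sigma * math.sqrt(2 * math.pi))


def exponential_pdf(x, lam):
    """Exponential density with rate lam, for x >= 0."""
    return lam * math.exp(-lam * x)


def pareto_pdf(x, a, k):
    """Pareto density with shape a and scale k, for x >= k."""
    return a * k ** a / x ** (a + 1)


def pareto_tail(x, a, k):
    """Pr[X > x] for a Pareto variable, for x >= k."""
    return (k / x) ** a
```

For instance, `pareto_tail(100, 1.2, 1)` is many orders of magnitude larger than the exponential tail `math.exp(-100)`, which is the practical meaning of "heavy-tailed".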
Request methods
GET, POST, HEAD, etc.
Response codes
success, failure, not-modified, etc.
Size of requested files
Size of transferred objects
Popularity of requested files
Numbers of embedded objects
Inter-arrival time between requests
Protocol support (1.0 vs. 1.1)
Traffic is variable:
Responses vary across multiple orders of magnitude
Traffic is bursty:
Peak loads much larger than average loads
Certain files more popular than others
Zipf-like distribution captures this well
Two-sided aspect of transfers:
Most responses are small (zero pretty common)
Most of the bytes are from large transfers
Controversy over Pareto/log-normal distribution
Non-trivial for workload generators to replicate
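To make the Zipf-like popularity point concrete, the sketch below assigns the file at popularity rank i a request probability proportional to 1/i^alpha. The exponent `alpha=1.0` and the helper names are assumptions for illustration; measured workloads generally report their own fitted exponents.

```python
def zipf_probabilities(n_files, alpha=1.0):
    """Request probability for each popularity rank 1..n_files."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, n_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def top_share(n_files, top, alpha=1.0):
    """Fraction of all requests that go to the `top` most popular files."""
    return sum(zipf_probabilities(n_files, alpha)[:top])
```

Under these assumptions, `top_share(1000, 100)` shows that the most popular tenth of the files draws well over half of the requests, which is why caching a small set of files pays off.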
Percentage   Size
35.00        0-1 KB
50.00        1-10 KB
14.00        10-100 KB
1.00         100 KB - 1 MB
Poisson distribution between each class
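A workload generator could pick the file class for each request with a weighted draw, as sketched below. Only the class percentages come from the table; the names `CLASSES` and `pick_class` are hypothetical, and the choice of a particular file inside a class is deliberately not modeled here.

```python
import random

# (probability, (min_bytes, max_bytes)) for each file class in the table.
CLASSES = [
    (0.35, (0, 1 * 1024)),
    (0.50, (1 * 1024, 10 * 1024)),
    (0.14, (10 * 1024, 100 * 1024)),
    (0.01, (100 * 1024, 1024 * 1024)),
]


def pick_class(rng):
    """Return the index of a file class, drawn with the table's percentages."""
    weights = [weight for weight, _ in CLASSES]
    return rng.choices(range(len(CLASSES)), weights=weights, k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` keeps a generated request stream reproducible between runs.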
www.spec.org/osg/web96
Notion of user-equivalent:
statistical model of a user
active off time (between URLs),
inactive off time (between pages)
Captures various levels of burstiness
Not validated; shows that the load generated
differs from SPECweb96 and is burstier
in terms of CPU and number of active
connections
www.cs.wisc.edu/~pb
www.cs.rice.edu/CS/Systems/Web-measurement
www.spec.org/osg/web99
www.tpc.org/tpcw
Loss occurs OR
CW > slow start threshold
Then switch to congestion avoidance
If we detect loss, cut CW in half
Exponential increase in window size per RTT
[Figure: slow start timeline with labels RTT, two segments, four segments, time]
Until (loss) {
    after CW packets ACKed:
        CW += 1;
}
ssthresh = CW/2;
Depending on loss type:
    SACK/Fast Retransmit:
        CW /= 2; continue;
    Coarse-grained timeout:
        CW = 1; go to slow start.
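The pseudocode above, combined with slow start, can be turned into a small round-by-round simulation. This is a simplified sketch, not a TCP implementation: it advances one RTT at a time, the caller names the rounds in which a loss is detected, and the floor of 2 segments on `ssthresh` is an assumption added to keep the numbers sensible.

```python
def simulate_cwnd(rounds, loss_rounds=(), timeout_rounds=(), ssthresh=64):
    """Return the congestion window (in segments) at the start of each RTT."""
    cw = 1
    history = []
    for rtt in range(rounds):
        history.append(cw)
        if rtt in timeout_rounds:
            ssthresh = max(cw // 2, 2)
            cw = 1                      # coarse-grained timeout: back to slow start
        elif rtt in loss_rounds:
            ssthresh = max(cw // 2, 2)
            cw = ssthresh               # fast retransmit: halve and continue
        elif cw < ssthresh:
            cw = min(cw * 2, ssthresh)  # slow start: double per RTT
        else:
            cw += 1                     # congestion avoidance: +1 per RTT
    return history
```

Comparing `simulate_cwnd(8, loss_rounds={4}, ssthresh=8)` with `simulate_cwnd(8, timeout_rounds={4}, ssthresh=8)` shows how much more window a timeout costs than a fast retransmit.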
RTO value is based on estimated round-trip time (RTT)
RTT is adjusted over time using exponential weighted moving average:
RTT = (1-x)*RTT + (x)*sample
(x is typically 0.1)
First done in TCP Tahoe
[Figure: lost ACK scenario with labels Seq=92, 8 bytes data; ACK=100; X (loss); timeout; ACK=100; time]
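The moving-average update for the RTT estimate is short enough to write out. The sketch below implements only the formula RTT = (1-x)*RTT + (x)*sample; how the RTO is then derived from the estimate is not covered, and the function names are hypothetical.

```python
def update_rtt(rtt, sample, x=0.1):
    """One exponential weighted moving average step for the RTT estimate."""
    return (1 - x) * rtt + x * sample


def smooth(samples, initial, x=0.1):
    """Fold a sequence of RTT samples into a running estimate."""
    rtt = initial
    for sample in samples:
        rtt = update_rtt(rtt, sample, x)
    return rtt
```

With x = 0.1 a single outlier moves the estimate by only a tenth of the difference, so the estimate reacts slowly to a sudden change and needs many samples to converge on a new RTT.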
a byte in sequence)
Client ACKs the FIN with
Client sends its own FIN when ready
Server ACKs client FIN as well with SN+1.
[Figure: connection teardown timeline with labels FIN(Y), timed wait, closed]
side closed, got ACK of our FIN
CLOSE-WAIT: other side sent
[Figure: TCP state diagram fragment with states ESTABLISHED, LAST_ACK, CLOSED and transitions "receive FIN / send ACK of FIN" and "receive ACK"]
set 200 ms. delayed ack timer
Short-term deadlock:
sender is waiting for ACK since it sent 1 segment
receiver is waiting for 2nd segment before ACKing
[Figure: timeline with labels RTT, 1st segment, 200 ms., ACK of 1st segment, time]
multiple objects per web page
IE does not do pipelining!
[Figure: timeline with labels RTT, 2nd segment, 3rd segment, ACK of 2nd + 3rd segments]
systems since they (incorrectly) counted the connection setup in congestion window calculation
Delayed ACK still happens, but now out of critical path of response time for download
[Figure: timeline with labels RTT, 1st segment, 2nd segment, ACK of 1st and 2nd, 3rd segment, 200 ms., ACK of 3rd, time]
1996 Olympic Web Server show over 50% of clients have receive window < 10K
Many suffer coarse-grained retransmission timeouts (RTOs)
Even SACK would not have helped!
[Figure: timeline with labels SEQ=3000, size=1000; ACK 3000, RWIN = 1000; (illegal for sender to send more); SEQ=4000; RTO timeout; time]
Nagle prevents second from being sent (since not full size, and now we have unacked data outstanding)
Sender waits for delayed ACK from receiver
Receiver is waiting for 2nd segment before sending ACK
Similar to IW=1 problem earlier
Result: Many disable Nagle via setsockopt() call.
[Figure: timeline with labels write(), 1st segment (full size), (Nagle forbids sender from sending more), 200 ms., ACK of 1st segment, 2nd segment (half size), RTT]
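Disabling Nagle is done with the `TCP_NODELAY` socket option. A minimal sketch in Python, whose `socket` module exposes the same `setsockopt()` call; the helper name `disable_nagle` is hypothetical.

```python
import socket


def disable_nagle(sock):
    """Turn off Nagle's algorithm on a TCP socket and report the new setting."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # A non-zero value means small writes are sent without waiting for ACKs.
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
```

The trade-off is that the application then becomes responsible for not emitting many tiny segments, for example by assembling the header and body into a single write.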
(2 * MSL)
MSL defined as 2 minutes in RFC 1122
than TIME-WAIT.
[Figure: timeline with labels SYN (X), ?, ACK, SYN (Z), X, reject!, time]
Can sort PCB chain such that
82% reduction)
results in more RAM available for disk cache, which leads to better performance
[Figure: PCB chain with entries 178.23.48.3: TIME_WAIT, 10.1.1.2: TIME_WAIT, 128.119.72.4, 10.1..1.2]
Varghese SOSP 1987:
use a hash-table-like structure called timing wheel
events are ordered by relative time in the future
given event in future time T, put in slot (T mod N)
list sorted by time (scheme 5)
[Figure: Timing Wheel with slots 0 through (N-1), a wheel pointer, and an entry labeled "Expire: 12"]
Each clock tick: