
Quality of Service

Les Cottrell, SLAC & Stanford U.
Presented at the NATO Advanced Networking Workshop, Tbilisi, Oct-99
Partially funded by DOE/MICS Field Work Proposal on Internet End-to-end Performance Monitoring (IEPM)

Overview
How do we measure QoS
  Overview of methodology
Problem areas:
  Generally
  How do E. Europe & Russia look
How does it affect applications
  Bulk data transfer, interactive applications
  Loss, RTT, jitter, availability
What can be done

Measurement mechanism

Uses existing ping infrastructure
Hierarchical vs. full mesh
Lightweight - low network impact, no special machines

[Diagram: monitoring sites ping remote hosts (1 monitor host / remote host pair shown); archive sites (SLAC, HEPNRC) collect the monitoring data over HTTP, with a cache; reports & data are made available on the WWW]

Deployment

23 monitoring sites in 12 countries
511 remote hosts monitored in 54 countries on 6 continents
~ 2000 pairs

Deployment 2/2

In U.S. 57% are .edu, 10% are .gov, 15% are .net, 10% are .com
20% are connected directly to ESnet, 39% are on Internet 2

Results: Top level view - Aug-99


Includes about 2000 pairs in 56 countries
[Table: % packet loss between regions, by monitoring region]
Legend: Good (0-1%), Acceptable (1-2.5%), Poor (2.5-5%), V. poor (5-12%), Bad (> 12%)

Within region (on diagonal) good to acceptable
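
The loss bands in the legend can be sketched as a small lookup. This is only an illustration: the function name `classify_loss` is hypothetical, and treating each upper bound as inclusive is an assumption, since the slide does not say which band a boundary value such as exactly 1% belongs to.

```python
def classify_loss(loss_pct: float) -> str:
    """Map a packet-loss percentage to the quality bands in the slide legend."""
    if loss_pct < 0 or loss_pct > 100:
        raise ValueError("loss must be a percentage between 0 and 100")
    # Assumption: upper bounds are inclusive (the slide does not specify).
    bands = [(1.0, "Good"), (2.5, "Acceptable"), (5.0, "Poor"), (12.0, "V. poor")]
    for upper, label in bands:
        if loss_pct <= upper:
            return label
    return "Bad"  # anything above 12%


if __name__ == "__main__":
    for loss in (0.3, 2.0, 4.0, 12.5):
        print(f"{loss}% loss -> {classify_loss(loss)}")
```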

Problem areas
Germany was bad with .ca & .edu yet good with ESnet
  DESY improved in Aug with dedicated 3.5Mbps PVC to US/Canada R&E
Russia (W) bad to .ca & .edu, good to ESnet, mixed to Europe, poor to .jp
  Dubna worse than others
  ITEP/IHEP better since new satellite
E. Europe generally poor to bad
China poor to very poor with most
S. America poor to very poor

E. Europe

Russia
[Figure: Packet loss from N. America to Russia, Jan-Aug 1999. Y-axis: packet loss (0 to 60); x-axis: Dec-98 to Aug-99. Series: Canada-ITEP, Edu-ITEP, Esnet-ITEP, Esnet-Dubna, Esnet-RSSI, Canada-NSK, Edu-NSK, Esnet-NSK, Esnet-IHEP]

ESnet NSK good, ESnet ITEP & IHEP improved with new satellite
Canada & Edu bad all over
DESY, CERN improved to acceptable to ITEP, IHEP, NSK with new satellite, Dubna still v. poor to bad, UK poor to ITEP & NSK
KEK good to NSK, v. poor to ITEP

European performance from U.S.


Impact on applications
Importance of loss/performance

Email
  fairly insensitive to quality, may be delayed but keeps retrying for days and eventually gets through
Web
  usually has human but expectations are low, performance often more limited by server, can retry
Bulk file transfer
  unattended, if > 10-12% loss connections can time out
Interactive telnet, voice
  very time & loss sensitive
  E.g. telnet/ssh loss of > 3% severely impacts typing ability

TCP bandwidth < (MSS/RTT)*(1/sqrt(loss))

Relates to Web performance (small files dominated by RTT)


Residual = GET - 2 * min (ping RTT)
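
A minimal sketch of the residual calculation, assuming the GET time and the ping RTTs are in the same unit (ms here). The function name `web_residual_ms` and the sample numbers are hypothetical. One plausible reading of the 2 * min(ping RTT) term, which the slide does not spell out, is that a small HTTP GET costs about two round trips (TCP connection setup, then request/response), so whatever remains is time not explained by the network path.

```python
def web_residual_ms(get_ms: float, ping_rtts_ms: list[float]) -> float:
    """Residual = GET - 2 * min(ping RTT), all values in milliseconds."""
    if not ping_rtts_ms:
        raise ValueError("need at least one ping RTT sample")
    return get_ms - 2 * min(ping_rtts_ms)


if __name__ == "__main__":
    # Hypothetical measurements: one GET time and a few ping RTTs.
    print(web_residual_ms(450.0, [160.0, 152.0, 171.0]))
```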

[Figure: Web response (ms) vs. ping response (ms)]

Bulk transfer - Performance Trends


Bandwidth TCP < 1460/(RTT * sqrt(loss))
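
As a rough illustration of the bound above, the sketch below evaluates 1460/(RTT * sqrt(loss)) and converts it to bits per second. It assumes RTT is in seconds and loss is a fraction (0.01 for 1%), as in the usual statement of this TCP throughput bound; the slide itself does not state the units. The function name `tcp_bw_bound_bps` and the example values are hypothetical.

```python
import math


def tcp_bw_bound_bps(rtt_s: float, loss_fraction: float, mss_bytes: int = 1460) -> float:
    """Upper bound on TCP throughput in bits/s: MSS / (RTT * sqrt(loss)).

    Assumptions: rtt_s in seconds, loss_fraction between 0 and 1 (exclusive of 0).
    """
    if rtt_s <= 0:
        raise ValueError("RTT must be positive")
    if not 0 < loss_fraction <= 1:
        raise ValueError("loss must be a fraction in (0, 1]")
    bytes_per_s = mss_bytes / (rtt_s * math.sqrt(loss_fraction))
    return 8 * bytes_per_s


if __name__ == "__main__":
    # Hypothetical transatlantic path: 150 ms RTT at 1% and at 4% loss.
    for loss in (0.01, 0.04):
        print(f"loss {loss:.0%}: < {tcp_bw_bound_bps(0.150, loss) / 1e3:.0f} kbps")
```

Because loss enters as a square root, quadrupling the loss only halves the bound, while halving the RTT doubles it.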


Interactive apps - Delay


Interactive apps - Packet loss


ITU threshold for good quality voice


Interactive apps - Jitter


SLAC<=>CERN two-way instantaneous packet delay variation
Average = -0.03 msec., Std dev = 35 msec., Median = 0 msec., IQR = 29 msec., Loss = 0.3%, 1000 samples

[Figure: histogram of ping inter packet delay difference in msec. (x-axis, -100 to 100) vs. frequency (y-axis, 0 to 90), overlaid with a Gaussian: Gaussian-prob = 79*exp(-x**2/(2*(IQR/2)**2))]

IPDD(i) = RTT(i) - RTT(i-1)
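
The IPDD definition and the IQR summary used on these slides can be sketched as follows. The function names are hypothetical, skipping pairs where either ping was lost is an assumption, and the quartile method (`inclusive`, i.e. linear interpolation over the sample) is one of several reasonable conventions; the slides do not say which was used.

```python
import statistics


def ipdd(rtts_ms):
    """IPDD(i) = RTT(i) - RTT(i-1) for consecutive pings.

    Lost pings are given as None; any pair involving a lost ping is skipped
    (an assumption, not stated on the slide).
    """
    return [cur - prev
            for prev, cur in zip(rtts_ms, rtts_ms[1:])
            if prev is not None and cur is not None]


def iqr(values):
    """Inter-quartile range (Q3 - Q1) of a list with at least two values."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


if __name__ == "__main__":
    rtts = [100, 110, 105, 130, 120]  # hypothetical RTT samples in ms
    diffs = ipdd(rtts)
    print(diffs, iqr(diffs))
```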


SLAC-CERN Jitter
IQR(ipdv) between CERN & SLAC from Surveyor measurements (12/15/98 & medians for Dec-98)
ITU/TIPHON delay jitter threshold (75 ms)

[Figure: IQR(IPDV) in msec. (y-axis, 0.1 to 100) vs. time since midnight (GMT) (x-axis, 0 to 25). Series: IQR(ipdv) CERN>SLAC, monthly IQR(ipdv) CERN>SLAC, IQR(ipdv) SLAC>CERN, monthly IQR(ipdv) SLAC>CERN]

Availability - Routing convergence


Availability - Outage probability


Surveyor probes randomly, 2/second
Measure time (outage length) during which consecutive probes don't get through

http://www-iepm.slac.stanford.edu/monitoring/surveyor/outage.html
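
A minimal sketch of extracting outage lengths from a probe log. It is not the Surveyor analysis code: the input layout (time-sorted pairs of send time and a received flag) and the function name `outage_lengths` are hypothetical, and an outage is taken here to run from the first lost probe to the next probe that gets through, which is one possible reading of the definition above.

```python
def outage_lengths(probes):
    """Return the length in seconds of each run of consecutive lost probes.

    probes: list of (send_time_s, received) tuples sorted by send time.
    Assumption: an outage lasts from the first lost probe until the next
    probe that is received. A run still open at the end of the log is
    ignored because its length is unknown.
    """
    outages = []
    start = None  # send time of the first lost probe in the current run
    for send_time, received in probes:
        if not received:
            if start is None:
                start = send_time
        elif start is not None:
            outages.append(send_time - start)
            start = None
    return outages


if __name__ == "__main__":
    log = [(0.0, True), (0.5, False), (1.0, False), (1.5, True),
           (2.0, False), (2.5, True)]
    print(outage_lengths(log))
```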


Error free seconds


Typical US phone company objectives are 99.6-99.99%
What do we see for the Internet using Surveyor measurements?

http://www-iepm.slac.stanford.edu/monitoring/surveyor/err-sec.html
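
One way to estimate an error-free-seconds percentage from probe losses is sketched below. This is an assumption-laden illustration rather than the method behind the linked page: a second counts as errored if at least one probe sent during it was lost, and with only about 2 probes per second a second with no observed loss is assumed to be error free. The function name `error_free_seconds_pct` is hypothetical.

```python
def error_free_seconds_pct(loss_times_s, duration_s: int) -> float:
    """Percentage of whole seconds in [0, duration_s) with no lost probe.

    loss_times_s: send times (in seconds from the start) of probes that were lost.
    """
    if duration_s <= 0:
        raise ValueError("duration must be a positive number of seconds")
    # Each lost probe marks the one-second bin it was sent in as errored.
    errored = {int(t) for t in loss_times_s if 0 <= t < duration_s}
    return 100.0 * (duration_s - len(errored)) / duration_s


if __name__ == "__main__":
    # Hypothetical: three lost probes falling in two distinct seconds out of ten.
    print(error_free_seconds_pct([0.2, 0.7, 5.1], 10))
```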


Improving QoS
More bandwidth
  Keep network load low (< 30%)
  Costs (at least in the West) are coming down dramatically
Reserved/managed bandwidth
  generally on ATM via PVCs today
Differentiated services

More bandwidth
Packet loss between ESnet & UK since 1995

[Figure: median monthly % ping packet loss (y-axis, 0 to 45) vs. date (x-axis, 1/1/95 to 1/1/99). Annotations: Doubled capacity (+2Mbps); Tripled capacity (+9Mbps); Upgraded to 155Mbps; Add 45 Mbps; Doubled to 90Mbps]

Holidays also have dips
Transatlantic bandwidth is quickly absorbed
Jan-95 had 2 Mbps, now at 2*OC3 so 150 times increase in bandwidth in 4.5 years
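
The growth figure can be checked with a few lines of arithmetic. Taking OC3 as 155 Mbps (the link speed quoted elsewhere in these slides), 2*OC3 over 2 Mbps gives a factor of 155, which the slide rounds to 150. The equivalent compound annual growth is an added illustration, not a number from the slides, and the function name `growth` is hypothetical.

```python
def growth(start_mbps: float, end_mbps: float, years: float):
    """Return (total growth factor, equivalent compound growth factor per year)."""
    factor = end_mbps / start_mbps
    per_year = factor ** (1 / years)
    return factor, per_year


if __name__ == "__main__":
    # 2 Mbps in Jan-95 to 2*OC3 (2 * 155 Mbps) about 4.5 years later.
    factor, per_year = growth(2.0, 2 * 155.0, 4.5)
    print(f"total: {factor:.0f}x, roughly {per_year:.1f}x per year")
```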


Reserved bandwidth
U.K. transatlantic link at 2*155Mbps, will reserve 2% for special projects both short & long term
CERN & Italy both have reserved bandwidth to US
DESY had reserved bandwidth to ESnet, but not to N. America in general, so:
  performance to Canada & .edu bad
  performance to ESnet good to acceptable

Reserved BW - DESY & .edu/.ca


DESY worked with DFN to provide 3.5Mbps (< 3% total) non-shared bandwidth (PVC) for DESY to major educational sites in N. America starting August 12, 1999
[Figure: DESY - TRIUMF (CA), RTT in ms, Aug 3-17 1999]

Rest of Germany still around 12% loss (vs 1-2%)

Differentiated Services 1/2


Provides improved performance for small fraction of traffic
Quite complex, requires policy, reservations, signaling, classification/marking, metering, policing (shaping & dropping), queuing/scheduling (congestion management), cross AS agreements
  still research & pilots

Differentiated services 2/2


SLAC & LBNL have a DS testbed with a 3.5Mbps ATM PVC carved out of 43Mbps
Make phone call
  No load: phone call is fine
Inject 4Mbps UDP load
  No WFQ: can't make call
  If make call then terrible quality
Apply WFQ & policing (via CAR)
  With WFQ call sounds fine
Next use ping to characterize:
  Mark ping TOS bits with CAR, & use WFQ in routers and see how it affects loss, RTT, jitter etc.

[Diagram labels: PBX 24kbps, VoIP, Edge, WFQ, Policing, ATM, Bottleneck 3.5Mbps, ESnet, Prod]
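
The testbed marks packets in the router with CAR. As a host-side illustration of the same idea (a swapped-in technique, not what the slides describe), the sketch below sets the IP TOS byte on a UDP socket so that routers configured for WFQ could classify the traffic. The helper names are hypothetical, the DSCP value 46 in the usage example is an illustrative choice not taken from the slides, and `socket.IP_TOS` is assumed to be available (it is on Unix-like systems).

```python
import socket


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP value in the upper six bits of the IP TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be between 0 and 63")
    return dscp << 2


def open_marked_udp_socket(tos: int) -> socket.socket:
    """Open a UDP socket whose outgoing packets carry the given TOS byte."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    return sock


if __name__ == "__main__":
    tos = dscp_to_tos(46)  # illustrative marking for delay-sensitive traffic
    with open_marked_udp_socket(tos) as sock:
        print(f"socket opened with TOS byte 0x{tos:02X}")
```

A real ping uses ICMP, which needs a raw socket and elevated privileges; UDP is used here only to keep the sketch runnable by an ordinary user.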

Conclusions
Performance is getting better
Within Western R&E networks things are good
  Good enough even for VoIP in terms of RTT, jitter, loss
But keeping pace takes constant upgrades
Transoceanic, needs special care
E. Europe, Russia, China, S. America performance is where N. America & W. Europe were 4 years ago
Peering is critical
Internet reliability, even in the West, has a way to go to meet phone company standards of 99.999%

More Information
IEPM/PingER home site
http://www-iepm.slac.stanford.edu/

Surveyor/IETF/IPPM project
http://www.advanced.org/csg-ippm/

ICFA-SCIC Homepage
http://www.hep.net/ICFA/index.html
