
Network Border Patrol: Preventing Congestion Collapse and Promoting Fairness in the Internet
Célio Albuquerque†, Brett J. Vickers‡ and Tatsuya Suda†
† Dept. of Information and Computer Science, University of California, Irvine
{celio,suda}@ics.uci.edu
Abstract: The end-to-end nature of Internet congestion control is an important factor in its scalability and robustness. However, end-to-end congestion control algorithms alone are incapable of preventing the congestion collapse and unfair bandwidth allocations created by applications that are unresponsive to network congestion. To address this flaw, we propose and investigate a novel congestion avoidance mechanism called Network Border Patrol (NBP). NBP relies on the exchange of feedback between routers at the borders of a network in order to detect and restrict unresponsive traffic flows before they enter the network. An enhanced core-stateless fair queueing mechanism is proposed in order to provide fair bandwidth allocations among competing flows. NBP is compliant with the Internet philosophy of pushing complexity toward the edges of the network whenever possible. Simulation results show that NBP effectively eliminates congestion collapse and that, when combined with fair queueing, NBP achieves approximately max-min fair bandwidth allocations for competing network flows.

Keywords: Internet, congestion control, congestion collapse, max-min fairness, end-to-end argument, core-stateless mechanisms, border control

‡ Dept. of Computer Science, Rutgers University
bvickers@cs.rutgers.edu

I. INTRODUCTION

The essential philosophy behind the Internet is expressed by the scalability argument: no protocol, algorithm or service should be introduced into the Internet if it does not scale well. A key corollary to

This research is supported by the National Science Foundation through grant NCR-9628109. It has also been supported by grants from the University of California MICRO program, Hitachi America, Hitachi, Standard Microsystem Corp., Canon Information Systems Inc., Nippon Telegraph and Telephone Corp. (NTT), Nippon Steel Information and Communication Systems Inc. (ENICOM), Tokyo Electric Power Co., Fujitsu, Novell, Matsushita Electric Industrial Co. and Fundacao CAPES/Brazil.

the scalability argument is the end-to-end argument: to maintain scalability, algorithmic complexity should be pushed to the edges of the network whenever possible. Perhaps the best example of the Internet philosophy is TCP congestion control, which is achieved primarily through algorithms implemented at end systems. Unfortunately, TCP congestion control also illustrates some of the shortcomings of the end-to-end argument. As a result of its strict adherence to end-to-end congestion control, the current Internet suffers from two maladies: congestion collapse from undelivered packets, and unfair allocations of bandwidth between competing traffic flows.

The first malady, congestion collapse from undelivered packets, arises when bandwidth is continually consumed by packets that are dropped before reaching their ultimate destinations [1]. John Nagle coined the term "congestion collapse" in 1984 to describe a network that remains in a stable congested state [2]. At that time, the primary cause of congestion collapse was the poor calculation of retransmission timers by TCP sources, which led to the unnecessary retransmission of delayed packets. This problem was corrected with more recent implementations of TCP [3]. Recently, however, a potentially more serious cause of congestion collapse has become increasingly common. Network applications are now frequently written to use transport protocols, such as UDP, which are oblivious to congestion and make no attempt to reduce packet transmission rates when packets are discarded by the network [4]. In fact, during periods of congestion some applications actually increase their transmission rates, just so that they will be less sensitive to packet losses [5]. Unfortunately, the Internet currently has no effective way to regulate such applications.

The second malady, unfair bandwidth allocation, arises in the Internet for a variety of reasons, one of which is the presence of applications that do not adapt to congestion. Adaptive applications (e.g., TCP-based applications) that respond to congestion by rapidly reducing their transmission rates are likely to receive unfairly small bandwidth allocations when competing with unresponsive or malicious applications. The Internet protocols themselves also introduce unfairness. The TCP algorithm, for instance, inherently causes each TCP flow to receive a bandwidth that is inversely proportional to its round trip time [6]. Hence, TCP connections with short round trip times may receive unfairly large allocations of network bandwidth when compared to connections with longer round trip times.
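As a rough illustration of this round-trip-time bias (an addition for context, not part of the original text), a commonly cited square-root approximation of steady-state TCP throughput, related to but simpler than the model of [6], makes the inverse dependence on the round trip time explicit. Here p is the packet loss probability and C is a constant of order one:

\[
  \text{throughput} \;\approx\; \frac{MSS}{RTT}\cdot\frac{C}{\sqrt{p}}, \qquad C \approx \sqrt{3/2}.
\]

Two competing connections that observe the same loss rate but different round trip times therefore receive bandwidth roughly in inverse proportion to their RTTs.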

To address these maladies, we introduce and investigate a new Internet traffic control protocol called Network Border Patrol. The basic principle of Network Border Patrol (NBP) is to compare, at the borders of a network, the rates at which packets from each application flow are entering and leaving the network. If a flow's packets are entering the network faster than they are leaving it, then the network is likely buffering or, worse yet, discarding the flow's packets. In other words, the network is receiving more packets than it can handle. NBP prevents this scenario by patrolling the network's borders, ensuring that each flow's packets do not enter the network at a rate greater than they are able to leave it. This patrolling prevents congestion collapse from undelivered packets, because an unresponsive flow's otherwise undeliverable packets never enter the network in the first place.

As an option, Network Border Patrol can also realize approximately max-min fair bandwidth allocations for competing packet flows if it is used in conjunction with an appropriate fair queueing mechanism. Weighted fair queueing (WFQ) [7], [8] is an example of one such mechanism. Unfortunately, WFQ imposes significant complexity on routers by requiring them to maintain per-flow state and perform per-flow scheduling of packets. In this paper we introduce an enhanced core-stateless fair queueing mechanism in order to achieve some of the advantages of WFQ without all of its complexity.

NBP's prevention of congestion collapse comes at the expense of some additional network complexity, since routers at the border of the network are expected to monitor and control the rates of individual flows. NBP also introduces an added communication overhead, since in order for an edge router to know the rate at which its packets are leaving the network, it must exchange feedback with other edge routers. However, unlike other proposed approaches to the problem of congestion collapse, NBP's added complexity is isolated to edge routers; routers within the core do not participate in the prevention of congestion collapse. Moreover, end systems operate in total ignorance of the fact that NBP is implemented in the network, so no changes to transport protocols are necessary.

The remainder of this paper is organized as follows. Section II describes why existing mechanisms are not effective in preventing congestion collapse or providing fairness in the presence of unresponsive flows. In Section III, we describe the architectural components of Network Border Patrol in further detail and present the feedback and rate control algorithms used by NBP edge routers to prevent congestion collapse. Section IV

explains the enhanced core-stateless fair queueing mechanism and illustrates the advantages of providing lower queueing delays to flows transmitting at lower rates. In Section V, we present the results of several simulations, which illustrate the ability of NBP to avoid congestion collapse and to provide fairness to competing network flows; the scalability of NBP is also examined. In Section VI, we discuss several implementation issues that must be addressed in order to make deployment of NBP feasible in the Internet. Finally, in Section VII we provide some concluding remarks.

II. RELATED WORK

The maladies of congestion collapse from undelivered packets and of unfair bandwidth allocations have not gone unrecognized. Some have argued that there are social incentives for multimedia applications to be friendly to the network, since an application would not want to be held responsible for throughput degradation in the Internet. However, malicious denial-of-service attacks using unresponsive UDP flows are becoming disturbingly frequent in the Internet, and they are an example that the Internet cannot rely solely on social incentives to control congestion or to operate fairly.

Some have argued that these maladies may be mitigated through the use of improved packet scheduling [12] or queue management [13] mechanisms in network routers. For instance, per-flow packet scheduling mechanisms like Weighted Fair Queueing (WFQ) [7], [8] attempt to offer fair allocations of bandwidth to flows contending for the same link. So do Core-Stateless Fair Queueing (CSFQ) [9], Rainbow Fair Queueing [10] and CHOKe [11], which are approximations of WFQ that do not require core routers to maintain per-flow state. Active queue management mechanisms like Fair Random Early Detection (FRED) [14] also attempt to limit malicious or unresponsive flows by preferentially discarding packets from flows that are using more than their fair share of a link's bandwidth. All of these mechanisms are more complex and expensive to implement than simple FIFO queueing, and although they reduce the causes of unfairness and congestion collapse in the Internet, they do not eradicate them.

For an illustration of this fact, consider the example shown in Figure 1. Two unresponsive flows compete for bandwidth in a network containing two bottleneck links arbitrated by a fair queueing mechanism. At the first bottleneck link (R1-R2), fair queueing ensures that each flow receives half of the link's available bandwidth (750 kbps). On the second bottleneck link (R2-S4), much of the traffic from flow B is discarded

[Figure 1 topology: flow A (S1 to S3) and flow B (S2 to S4) share the bottleneck link R1-R2 (1.5 Mbps); flow B also traverses the 128 kbps link R2-S4; the remaining links are 10 Mbps.]

Fig. 1. Example of a network which experiences congestion collapse

due to the link's limited capacity (128 kbps). Hence, flow A achieves a throughput of 750 kbps and flow B achieves a throughput of 128 kbps. Clearly, congestion collapse has occurred, because flow B packets, which are ultimately discarded on the second bottleneck link, unnecessarily limit the throughput of flow A across the first bottleneck link. Furthermore, while both flows receive equal bandwidth allocations on the first bottleneck link, their allocations are not globally max-min fair. An allocation of bandwidth is said to be globally max-min fair if, at every link, all active flows not bottlenecked at another link are allocated a maximum, equal share of the link's remaining bandwidth [15]. A globally max-min fair allocation of bandwidth would have been 1.372 Mbps for flow A and 128 kbps for flow B. This example, which is a variant of an example presented by Floyd and Fall [1], illustrates the inability of local scheduling mechanisms, such as WFQ, to eliminate congestion collapse and achieve global max-min fairness without the assistance of additional network mechanisms.

Jain et al. have proposed several rate control algorithms that are able to prevent congestion collapse and provide global max-min fairness to competing flows [16]. These algorithms (e.g., ERICA, ERICA+) are designed for the ATM Available Bit Rate (ABR) service and require all network switches to compute fair allocations of bandwidth among competing connections. However, these algorithms are not easily tailorable to the current Internet, because they violate the Internet design philosophy of keeping router implementations simple and pushing complexity to the edges of the network.

Rangarajan and Acharya proposed a network border-based approach, which aims to prevent congestion collapse through early regulation of unresponsive flows (ERUF) [17]. ERUF border routers rate control the input traffic, while core routers generate source quenches on packet drops to advise sources and border routers to reduce their sending rates. While this approach may prevent congestion collapse, it does so after packets have been dropped and the network is congested. It also lacks mechanisms to provide fair bandwidth allocations to flows that are responsive and unresponsive to congestion.
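To make the 1.372 Mbps figure above concrete, the sketch below (not from the paper; a standard progressive-filling computation under the link capacities of Figure 1) derives the global max-min allocation for the two flows. Flow and link names are the ones used in the example; everything else is illustrative.

from collections import defaultdict

def max_min_shares(flows, links):
    """flows: {flow: [links it crosses]}, links: {link: capacity in kbps}."""
    alloc = {f: 0.0 for f in flows}
    remaining = dict(links)
    active = set(flows)
    while active:
        # Largest equal increment every active flow can still receive
        # before some link it crosses becomes saturated.
        increment = min(
            remaining[l] / sum(1 for f in active if l in flows[f])
            for l in remaining
            if any(l in flows[f] for f in active)
        )
        for f in active:
            alloc[f] += increment
        for l in remaining:
            remaining[l] -= increment * sum(1 for f in active if l in flows[f])
        saturated = {l for l, c in remaining.items() if c <= 1e-9}
        # Flows crossing a saturated link are bottlenecked and stop growing.
        active = {f for f in active if not saturated & set(flows[f])}
    return alloc

flows = {"A": ["R1-R2"], "B": ["R1-R2", "R2-S4"]}
links = {"R1-R2": 1500.0, "R2-S4": 128.0}        # capacities in kbps
print(max_min_shares(flows, links))               # {'A': 1372.0, 'B': 128.0}

Flow B is frozen at 128 kbps by its downstream bottleneck, and the leftover 1.372 Mbps of R1-R2 goes entirely to flow A, matching the value quoted in the text.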

[Figure 2 layout: two domains (Domain 1 and Domain 2); end systems attach to edge routers at the borders of each domain, while core routers occupy the interior.]
Fig. 2. The core-stateless Internet architecture assumed by NBP

Floyd and Fall have approached the problem of congestion collapse by proposing low-complexity router mechanisms that promote the use of adaptive or TCP-friendly end-to-end congestion control [1]. Their suggested approach requires selected gateway routers to monitor high-bandwidth flows in order to determine whether they are responsive to congestion. Flows determined to be unresponsive to congestion are penalized by a higher packet discarding rate at the gateway router. A limitation of this approach is that the procedures currently available to identify unresponsive flows are not always successful [9].

III. NETWORK BORDER PATROL

Network Border Patrol is a network layer congestion avoidance protocol that is aligned with the core-stateless approach. The core-stateless approach, which has recently received a great deal of research attention [18], [9], allows routers on the borders (or edges) of a network to perform flow classification and maintain per-flow state but does not allow routers at the core of the network to do so. Figure 2 illustrates this architecture. As in other work on core-stateless approaches, we draw a further distinction between two types of edge routers. Depending on which flow it is operating on, an edge router may be viewed as an ingress or an egress router. An edge router operating on a flow passing into a network is called an ingress router, whereas an edge router operating on a flow passing out of a network is called an egress router. Note that a flow may pass through more than one egress (or ingress) router if the end-to-end path crosses multiple networks.

NBP prevents congestion collapse through a combination of per-flow rate monitoring at egress routers and

[Figure 3 layout: arriving packets pass through a flow classifier and per-flow rate monitors (flows 1 through n) before reaching the forwarding function and output ports; a feedback controller collects the monitored rates (rate 1 through rate n) and handles the forward and backward feedback exchanged with ingress routers.]
Fig. 3. An input port of an NBP egress router

per-flow rate control at ingress routers. Rate monitoring allows an egress router to determine how rapidly each flow's packets are leaving the network, whereas rate control allows an ingress router to police the rate at which each flow's packets enter the network. Linking these two functions together are the feedback packets exchanged between ingress and egress routers; ingress routers send egress routers forward feedback packets to inform them about the flows that are being rate controlled, and egress routers send ingress routers backward feedback packets to inform them about the rates at which each flow's packets are leaving the network.

This section describes three important aspects of the NBP mechanism: (1) the architectural components, namely the modified edge routers, which must be present in the network, (2) the feedback control algorithm, which determines how and when information is exchanged between edge routers, and (3) the rate control algorithm, which uses the information carried in feedback packets to regulate flow transmission rates and thereby prevent congestion collapse in the network.

A. Architectural Components

The only components of the network that require modification by NBP are edge routers; the input ports of egress routers must be modified to perform per-flow monitoring of bit rates, and the output ports of ingress routers must be modified to perform per-flow rate control. In addition, both the ingress and the egress routers must be modified to exchange and handle feedback.

Figure 3 illustrates the architecture of an egress router's input port. Data packets sent by ingress routers arrive at the input port of the egress router and are first classified by flow. In the case of IPv6, this is done by examining the packet header's flow label, whereas in the case of IPv4, it is done by examining the packet's source and destination addresses and port numbers. Each flow's bit rate is then monitored using a rate estimation algorithm such as the Time Sliding Window (TSW) algorithm [19]. These rates are collected by a feedback controller, which returns them in backward feedback packets to an ingress router whenever a forward feedback packet arrives from that ingress router.
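The paper cites the TSW estimator of [19] but does not reproduce it. As a rough sketch only, one common formulation of a Time Sliding Window rate monitor at an egress input port looks like the following; the window constant, the use of wall-clock time and the field names are assumptions made for illustration.

import time

class TSWRateMonitor:
    """Minimal Time Sliding Window (TSW) rate estimator sketch.

    win is the averaging window in seconds (Table 1 uses 10 msec);
    the estimate decays smoothly as the window slides forward.
    """
    def __init__(self, win=0.010):
        self.win = win              # window length in seconds (assumed parameter)
        self.rate = 0.0             # current estimate in bytes per second
        self.t_front = time.time()  # time of the most recent update

    def update(self, pkt_bytes, now=None):
        now = time.time() if now is None else now
        elapsed = max(now - self.t_front, 0.0)
        # Blend the volume credited to the sliding window with the new packet.
        self.rate = (self.rate * self.win + pkt_bytes) / (self.win + elapsed)
        self.t_front = now
        return self.rate

An egress router would keep one such monitor per classified flow and call update() on every arriving data packet belonging to that flow.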

[Figure 4 layout: outgoing packets pass through a flow classifier and per-flow traffic shapers (flows 1 through n) before reaching the output buffer and the network; a rate controller sets the shaper rates (rate 1 through rate n) using information delivered by a feedback controller, which handles the forward and backward feedback exchanged with egress routers.]
Fig. 4. An output port of an NBP ingress router

The output ports of ingress routers are also enhanced. Each contains a flow classifier, per-flow traffic shapers (e.g., leaky buckets), a feedback controller, and a rate controller. See Figure 4. The flow classifier classifies packets into flows, and the traffic shapers limit the rates at which packets from individual flows enter the network. The feedback controller receives backward feedback packets returning from egress routers and passes their contents to the rate controller. It also generates forward feedback packets, which it occasionally transmits to the network's egress routers. The rate controller adjusts traffic shaper parameters according to a TCP-like rate control algorithm, which is described later in this section.

B. The Feedback Control Algorithm

The feedback control algorithm determines how and when feedback packets are exchanged between edge routers. Feedback packets take the form of ICMP packets and are necessary in NBP for three reasons. First, they allow egress routers to discover which ingress routers are acting as sources for each of the flows they are monitoring. Second, they allow egress routers to communicate per-flow bit rates to ingress routers. Third, they allow ingress routers to detect incipient network congestion by monitoring edge-to-edge round trip times. The contents of feedback packets are shown in Figure 5.
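The exact on-the-wire encoding of these ICMP feedback messages is not specified in the paper. Purely as an illustration of the fields described below and in Figure 5, the two packet types can be modeled as:

from dataclasses import dataclass
from typing import List, Tuple

# FlowSpec: in IPv4, the (src addr, dst addr, src port, dst port) combination;
# in IPv6, the flow label. Represented here simply as an opaque string key.
FlowSpec = str

@dataclass
class ForwardFeedback:               # ingress -> egress
    timestamp: float                 # used to measure the edge-to-edge RTT
    flow_specs: List[FlowSpec]       # active flows originating at the ingress

@dataclass
class BackwardFeedback:              # egress -> ingress
    timestamp: float                                 # echoed from the FF packet
    hop_count: int                                   # routers between ingress and egress
    egress_rates: List[Tuple[FlowSpec, float]]       # monitored egress rate per flow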

[Figure 5 contents: a Forward Feedback (FF) packet carries IP/ICMP headers, a timestamp and flow specs 1 through n, and travels from the ingress router to the egress router; the Backward Feedback (BF) packet returned by the egress router carries IP/ICMP headers, the timestamp, a hop count and (flow spec, egress rate) pairs 1 through n.]
Fig. 5. Forward and backward feedback packets exchanged by edge routers

Contained within the forward feedback packet are a time stamp and a list of flow specifications for flows originating at the ingress router. The time stamp is used to calculate the round trip time between two edge routers, and the list of flow specifications indicates to an egress router the identities of active flows originating at the ingress router. A flow specification is a value uniquely identifying a flow. In IPv6 it is the flow's flow label; in IPv4, it is the combination of source address, destination address, source port number, and destination port number. An edge router adds a flow to its list of active flows whenever a packet from a new flow arrives; it removes a flow when the flow becomes inactive. In the event that the network's maximum transmission unit size is not sufficient to hold an entire list of flow specifications, multiple forward feedback packets are used.

When an egress router receives a forward feedback packet, it immediately generates a backward feedback packet and returns it to the ingress router. Contained within the backward feedback packet are the forward feedback packet's original time stamp, a router hop count, and a list of observed bit rates, called egress rates, collected by the egress router for each flow listed in the forward feedback packet. The router hop count, which is used by the ingress router's rate control algorithm, indicates how many routers are in the path between the ingress and the egress router. The egress router determines the hop count by examining the time to live (TTL) field of arriving forward feedback packets. When the backward feedback packet arrives at the ingress router, its contents are passed to the ingress router's rate controller, which uses them to adjust the parameters of each flow's traffic shaper.
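A minimal sketch of the egress-side reaction to a forward feedback packet, reusing the illustrative classes above. The initial TTL value (255) and the per-flow monitor table are assumptions made for the example; the paper only states that the hop count is derived from the TTL field of arriving FF packets.

FF_INITIAL_TTL = 255   # assumed TTL set by ingress routers on FF packets

def handle_forward_feedback(ff: ForwardFeedback, received_ttl: int,
                            monitors: dict) -> BackwardFeedback:
    """Build the BF packet returned immediately to the ingress router.

    monitors maps each FlowSpec to its TSWRateMonitor at this egress router.
    """
    hop_count = FF_INITIAL_TTL - received_ttl    # routers traversed edge to edge
    rates = [(spec, monitors[spec].rate)         # current egress rate of each flow
             for spec in ff.flow_specs if spec in monitors]
    return BackwardFeedback(timestamp=ff.timestamp,
                            hop_count=hop_count,
                            egress_rates=rates)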


on arrival of Backward Feedback packet p from egress router e
  currentRTT = currentTime - p.timestamp;
  if (currentRTT < e.baseRTT)
    e.baseRTT = currentRTT;
  deltaRTT = currentRTT - e.baseRTT;
  RTTsElapsed = (currentTime - e.lastFeedbackTime) / currentRTT;
  e.lastFeedbackTime = currentTime;
  for each flow f listed in p
    rateQuantum = min (MSS / currentRTT, f.egressRate / QF);
    if (f.phase == SLOW_START)
      if (deltaRTT * f.ingressRate < MSS * e.hopcount)
        f.ingressRate = f.ingressRate * 2 ^ RTTsElapsed;
      else
        f.phase = CONGESTION_AVOIDANCE;
    if (f.phase == CONGESTION_AVOIDANCE)
      if (deltaRTT * f.ingressRate < MSS * e.hopcount)
        f.ingressRate = f.ingressRate + rateQuantum * RTTsElapsed;
      else
        f.ingressRate = f.egressRate - rateQuantum;

Fig. 6. Pseudocode for ingress router rate control algorithm
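To make the congestion test of Figure 6 concrete, the short sketch below plugs illustrative numbers into the check; the values are assumptions chosen only for the example, and bits are used consistently for both rates and packet sizes.

MSS_BITS = 1500 * 8          # maximum segment size, expressed in bits

def is_congested(ingress_rate_bps, delta_rtt_s, hop_count):
    # Estimated volume of this flow buffered inside the network...
    buffered_bits = ingress_rate_bps * delta_rtt_s
    # ...compared against one maximum-size packet per router hop.
    return buffered_bits > hop_count * MSS_BITS

print(is_congested(2_000_000, 0.008, 4))  # 16,000 bits <= 48,000 -> False
print(is_congested(2_000_000, 0.030, 4))  # 60,000 bits >  48,000 -> True

In the first call the flow's buffered backlog is below one packet per hop, so its ingress rate may keep growing; in the second, the grown RTT implies too much buffering and the rate is reduced toward the observed egress rate.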

In order to determine how often to generate forward feedback packets, an ingress router keeps a byte transmission counter for each flow it processes. Whenever a flow's byte counter exceeds a threshold, denoted Tx, the ingress router generates and transmits a forward feedback packet to the flow's egress router. The forward feedback packet includes a list of flow specifications for all flows going to the same egress router, and the counters for all flows described in the feedback packet are reset. Using a byte counter for each flow ensures that feedback packets are generated more frequently when flows transmit at high rates, thereby allowing ingress routers to respond more quickly to impending congestion collapse. To maintain a frequent flow of feedback between edge routers even when data transmission rates are low, ingress routers also generate forward feedback packets whenever a time-out interval, denoted f, is exceeded.

C. The Rate Control Algorithm

The NBP rate control algorithm regulates the rate at which each flow enters the network. Its primary goal is to converge on a set of per-flow transmission rates (hereinafter called ingress rates) that prevents congestion collapse from undelivered packets. It also attempts to lead the network to a state of maximum link utilization and low router buffer occupancies, and it does this in a manner that is similar to TCP. In the NBP rate control algorithm, shown in Figure 6, a flow may be in one of two phases, slow start or congestion avoidance, which are similar to the phases of TCP congestion control. New flows enter the network in the

slow start phase and proceed to the congestion avoidance phase only after the flow has experienced congestion. The rate control algorithm is invoked whenever a backward feedback packet arrives at an ingress router. Recall that BF packets contain a list of flows arriving at the egress router from the ingress router as well as the monitored egress rates for each flow. Upon the arrival of a backward feedback packet, the algorithm calculates the current round trip time between the edge routers and updates the base round trip time, if necessary. The base round trip time reflects the best observed round trip time between the two edge routers. The algorithm then calculates deltaRTT, which is the difference between the current round trip time (currentRTT) and the base round trip time (e.baseRTT). A deltaRTT value greater than zero indicates that packets are requiring a longer time to traverse the network than they once did, and this can only be due to the buffering of packets within the network.

NBP's rate control algorithm decides that a flow is experiencing congestion whenever it estimates that the network has buffered the equivalent of more than one of the flow's packets at each router hop. To do this, the algorithm first computes the product of the flow's ingress rate and deltaRTT. This value provides an estimate of the amount of the flow's data that is buffered somewhere in the network. If the amount is greater than the number of router hops between the ingress and the egress router multiplied by the size of the largest possible packet, then the flow is considered to be experiencing congestion. The rationale for determining congestion in this manner is to maintain both high link utilization and low queueing delay. Ensuring there is always at least one packet buffered for transmission on a network link is the simplest way to achieve full utilization of the link, and deciding that congestion exists when more than one packet is buffered at the link keeps queueing delays low. A similar approach is used in the DECbit congestion avoidance mechanism [20].

When the rate control algorithm determines that a flow is not experiencing congestion, it increases the flow's ingress rate. If the flow is in the slow start phase, its ingress rate is doubled for each round trip time that has elapsed since the last backward feedback packet arrived. The estimated number of round trip times since the last feedback packet arrived is denoted RTTsElapsed. Doubling the ingress rate during slow start allows a new flow to rapidly capture available bandwidth when the network is underutilized. If, on the other hand, the flow is in the congestion avoidance phase, then its ingress rate is conservatively incremented by one rateQuantum value


for each round trip that has elapsed since the last backward feedback packet arrived. This is done to avoid the creation of congestion. The rate quantum is computed as the maximum segment size divided by the current round trip time between the edge routers. This results in rate growth behavior that is similar to TCP in its congestion avoidance phase. Furthermore, the rate quantum is not allowed to exceed the flow's current egress rate divided by a constant quantum factor (QF). This guarantees that rate increments are not excessively large when the round trip time is small.

When the rate control algorithm determines that a flow is experiencing congestion, it reduces the flow's ingress rate. If a flow is in the slow start phase, it enters the congestion avoidance phase. If a flow is already in the congestion avoidance phase, its ingress rate is reduced to the flow's egress rate decremented by MRC. In other words, an observation of congestion forces the ingress router to send the flow's packets into the network at a rate slightly lower than the rate at which they are leaving the network.

IV. ADDING FAIRNESS TO NETWORK BORDER PATROL

Although Network Border Patrol prevents congestion collapse, it does not guarantee that all flows are treated fairly when they compete for bottleneck links. To address this concern, we consider the interoperation of Network Border Patrol and various fair queueing mechanisms.

Fair bandwidth allocations can be achieved by using per-flow packet scheduling mechanisms such as Fair Queueing (FQ) [7], [8]. Fair Queueing attempts to emulate the behavior of a fluid flow system; each packet stream is treated as a fluid flow entering a pipe, and all flows receive an equal proportion of the pipe's capacity. Fair Queueing is effective; it fairly allocates bandwidth to packet flows competing for a single link. However, in order to provide this benefit, Fair Queueing requires each link to maintain queues and state for each flow. This complexity overhead impedes the scalability of Fair Queueing, making it impractical for wide area networks in which thousands of flows may be active at any one time.

Recognizing the scalability difficulties of Fair Queueing, several researchers have proposed more scalable core-stateless approximations of Fair Queueing, such as Core-Stateless Fair Queueing [9], Rainbow Fair Queueing [10] and CHOKe [11]. The general idea behind these mechanisms is that edge routers label packets entering the network with the state of their associated flows, and core routers use the state recorded in the packets to


decide whether to drop them or when to schedule them for transmission. This makes the core-stateless mechanisms more scalable than Fair Queueing, since they limit per-flow operations and state maintenance to routers on the edges of a network.

Although these core-stateless fair queueing mechanisms work well with most congestion control algorithms that rely on packet losses to indicate congestion, they do not work as well with congestion avoidance algorithms that prevent congestion before packet loss occurs. Examples of such congestion avoidance algorithms include TCP Vegas [22], [23], TCP with Explicit Congestion Notification [24] and Network Border Patrol. Two simulation experiments illustrate this phenomenon. In both experiments, two TCP flows and a 1 Mbps constant bit rate UDP flow share a single 1.5 Mbps bottleneck link. In the first experiment, the TCP sources use the TCP Reno implementation, which relies on observations of packet loss to indicate congestion. As Figure 7(a) shows, CSFQ works well with this kind of TCP source, providing approximately fair allocations of bottleneck link bandwidth to all three flows. In the second experiment, the TCP Reno sources are replaced by TCP Vegas sources, which rely on round trip time measurements to predict incipient congestion and keep buffer occupancies small. Here, as Figure 7(b) shows, CSFQ fails to provide fair allocations of bandwidth to the TCP flows.

CSFQ fails for congestion avoidance algorithms that prevent packet loss because it does not accurately approximate the delay characteristics of Fair Queueing. In Fair Queueing, flows transmitting at rates less than or equal to their fair share are guaranteed timely delivery of their packets, since they do not share the same buffer as packets from other flows. In the core-stateless approximations of Fair Queueing, this is not the case, since they aggregate packets from all flows into a single buffer and rely on packet discarding to balance the service of each flow. Hence, the existing core-stateless mechanisms are incompatible with congestion avoidance mechanisms that keep buffer occupancies small or rely on round trip time measurements to indicate incipient congestion.

We propose a slightly modified version of CSFQ, hereinafter referred to as Enhanced CSFQ (ECSFQ). ECSFQ not only achieves the scalability of CSFQ, but at the same time it is designed to work with preventive congestion avoidance mechanisms like TCP Vegas and Network Border Patrol. The basic idea of ECSFQ is to introduce an additional high priority buffer at each core router. The high priority buffer is used to hold packets from flows transmitting at rates less than their fair share, while the original buffer holds all other packets.
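A minimal sketch of the ECSFQ enqueue and dequeue decisions at a core router, assuming the CSFQ convention that each packet arrives labeled with its flow's estimated arrival rate and that the router maintains an estimate of the link's fair share [9]. The probabilistic drop rule on the low priority path is the standard CSFQ one, consistent with the statement that ECSFQ otherwise behaves identically to CSFQ; the queue objects and names are illustrative.

import random
from collections import deque

high_priority = deque()   # served first: flows transmitting below their fair share
low_priority = deque()    # original CSFQ buffer for everything else

def ecsfq_enqueue(packet, labeled_rate, fair_share):
    """Return True if the packet is queued, False if it is dropped."""
    if labeled_rate < fair_share:
        # Flows below their fair share get the low-delay, high-priority path.
        high_priority.append(packet)
        return True
    # At or above fair share: CSFQ's probabilistic drop; survivors use the normal buffer.
    if random.random() < 1.0 - fair_share / labeled_rate:
        return False
    low_priority.append(packet)
    return True

def ecsfq_dequeue():
    # The high priority buffer is always served first, which is what gives
    # below-fair-share flows their short queueing delays.
    if high_priority:
        return high_priority.popleft()
    if low_priority:
        return low_priority.popleft()
    return None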

[Figure 7 plots per-flow bandwidth (Mbps) versus time (sec) for the UDP, TCP-1 and TCP-2 flows.
(a) CSFQ achieves approximately fair bandwidth allocations when TCP Reno sources are used.
(b) CSFQ fails to achieve fair bandwidth allocations when TCP Vegas sources are used.]
Fig. 7. CSFQ does not achieve fair bandwidth allocations when used with some congestion avoidance mechanisms

Packets in the high priority buffer are served first and therefore experience short delays. Once a flow's rate meets or exceeds its fair share, the flow's packets enter the low priority buffer, and its packets experience the same delays as packets from other existing flows transmitting at or above the fair share. Apart from the addition of a high priority buffer, ECSFQ behaves identically to the original CSFQ algorithm.

The results in Figure 8 were obtained by repeating the previous experiment with ECSFQ and TCP Vegas. Due to the use of high priority buffers, TCP Vegas packets experience lower queueing delays than the UDP packets. Because of this, all three flows achieve approximately fair shares of the bottleneck link bandwidth.

One potential drawback of ECSFQ is that it makes possible the reordering of a flow's packets. However, we believe packet reordering will be rare, since it happens only when a flow's packets are queued in the high priority buffer after previously being queued in the low priority buffer. This can occur in two cases: (1) when

[Figure 8 plots per-flow bandwidth (Mbps) versus time (sec) for the UDP, TCP-1 and TCP-2 flows under ECSFQ with TCP Vegas sources.]

Fig. 8. Enhanced CSFQ restores fairness when used with TCP Vegas

Simulation parameter                      Value
Packet size                               1000 bytes
Router queue size                         100 packets
Maximum segment size (MSS)                1500 bytes
TCP implementation                        Reno [26]
TCP window size                           100 kbytes
NBP MRC factor (MF)                       10
NBP Tx                                    40000 bytes
NBP f                                     100 msec
TSW window size                           10 msec
End-system-to-edge propagation delay      100 μsec
End-system-to-edge link bandwidth         10 Mbps

Table 1. Default simulation parameters

a flow originally transmits at or above its fair share allocation but later decreases its transmission rate below the fair share, or (2) when bandwidth becomes available and the flow's fair share suddenly increases. Packet reordering in the first case is possible but unlikely, because by reducing its rate the flow reduces the load on the bottleneck, allowing the packets in the low priority buffer to be processed faster and making it improbable that packets from this flow remain in the low priority buffer. Packet reordering in the second case is also possible but unlikely, since the low priority buffer empties rapidly when new bandwidth becomes available.

V. SIMULATION EXPERIMENTS

In this section, we present the results of several simulation experiments, each of which is designed to test a different aspect of Network Border Patrol. The first set of experiments examines the ability of NBP to prevent congestion collapse, while the second set of experiments examines the ability of ECSFQ to provide fair bandwidth allocations to competing network flows. Together, NBP and ECSFQ prevent congestion collapse and

[Figure 9 topology: a TCP flow (S1 via ingress I1) and an unresponsive UDP flow (S2 via ingress I2) share the 1.5 Mbps link R1-R2; the UDP flow continues over a 128 kbps link to egress E2 and sink S4, while the TCP flow exits via egress E1 to sink S3; access links are 10 Mbps. Legend: I = ingress router, E = egress router, R = core router, S = end system.]

Fig. 9. A network with a single shared link

provide fair allocations of bandwidth to competing network flows. The third set of experiments assesses the scalability constraints of NBP. All simulations were run for 100 seconds using the UC Berkeley/LBNL/VINT ns-2 simulator [27]. The ns-2 code implementing NBP and the scripts to run these simulations are available at the UCI Network Research Group web site [28]. Default simulation parameters are shown in Table 1. They are set to values commonly used in the Internet and are used in all simulation experiments unless otherwise noted.

A. Preventing Congestion Collapse

The first set of simulation experiments explores NBP's ability to prevent congestion collapse from undelivered packets. In the first experiment, we study the scenario depicted in Figure 9. One flow is a TCP flow generated by an application which always has data to send, and the other flow is an unresponsive constant bit rate UDP flow. Both flows compete for access to a shared 1.5 Mbps bottleneck link (R1-R2), and only the UDP flow traverses a second bottleneck link (R2-E2), which has a limited capacity of 128 kbps.

Figure 10 shows the throughput achieved by the two flows as the UDP source's transmission rate is increased from 32 kbps to 2 Mbps. The combined throughput delivered by the network (i.e., the sum of both flow throughputs) is also shown. Several cases are examined under this scenario. The first is the benchmark case used for comparison: NBP is not used between edge routers, and all routers schedule the delivery of packets on a FIFO basis. As Figure 10(a) shows, the network experiences severe congestion collapse as the UDP flow's transmission rate increases, because the UDP flow fails to respond adaptively to the discarding of its packets on the second bottleneck link. When the UDP load increases to 1.5 Mbps, the TCP flow's throughput drops nearly to zero.

In the second case we show that fair queueing mechanisms alone cannot prevent congestion collapse. As

[Figure 10 plots the throughput (Mbps) of the TCP flow, the UDP flow and their combined total versus the UDP input traffic load (kbps).
(a) Severe congestion collapse using FIFO only.
(b) Moderate congestion collapse using ECSFQ only.
(c) No congestion collapse using NBP with FIFO.]

Fig. 10. Congestion collapse observed as the unresponsive traffic load increases. The solid line shows the combined throughput delivered by the network.

shown in Figure 10(b), better throughput is achieved for the TCP flow when compared to the FIFO-only case. However, as indicated by the combined throughput of both flows, congestion collapse still occurs as the UDP load increases. Although ECSFQ allocates about 750 kbps to each flow at the first bottleneck link, only 128 kbps of this bandwidth is successfully exploited by the UDP flow, which is even more severely bottlenecked by the second link. The remaining 622 kbps is wasted on undelivered packets. In the third case, as Figure 10(c) shows, NBP effectively eliminates congestion collapse; the TCP flow achieves a nearly optimal throughput of 1.37 Mbps, and the combined throughput remains very close to 1.5 Mbps.

[Figure 11 plots the throughput (Mbps) of the TCP flow, the UDP flow and their combined total versus the UDP input traffic load (kbps).
(a) Severe unfairness using FIFO only.
(b) Moderate unfairness using NBP with FIFO.
(c) Approximate fairness using ECSFQ.]

Fig. 11. Unfairness as the unresponsive traffic load increases

B. Achieving Fairness

Besides aiming to prevent congestion collapse, this paper also aims to improve the fairness of bandwidth allocations to competing network flows. In the first fairness experiment, we consider the scenario depicted in Figure 9 but replace the second bottleneck link (R2-E2) with a higher capacity 10 Mbps link. The TCP flow is generated by an application which always has data to send, and the UDP flow is generated by an unresponsive source which transmits packets at a constant bit rate.

Since there is only one 1.5 Mbps bottleneck link (R1-R2) in this scenario, the max-min fair allocation of bandwidth between the flows is 750 kbps (provided the UDP source exceeds a transmission rate of 750 kbps). However, as Figure 11(a) shows, fairness is clearly not achieved when only FIFO scheduling is used in routers. As the

[Figure 12 topology: a chain of core routers (R) connected by links of 50, 100, 50, 150, 150 and 50 Mbps with propagation delays of 20, 10, 5, 5, 5 and 10 ms, respectively; the 22 unresponsive UDP flows, in groups A through H, enter and leave the chain at various points.]

Fig. 12. The GFC-2 network


Flow    Ideal global          Throughput      Throughput using   Throughput using   Throughput using
group   max-min fair share    using WFQ only  NBP with FIFO      NBP with WFQ       NBP with ECSFQ
A       10                    8.32            10.96              10.00              10.40
B       5                     5.04            1.84               5.04               4.48
C       35                    27.12           31.28              34.23              31.52
D       35                    16.64           33.84              34.95              32.88
E       35                    16.64           37.76              34.87              33.36
F       10                    8.32            7.60               10.08              8.08
G       5                     4.96            1.04               4.96               5.28
H       52.5                  36.15           46.87              50.47              47.76

Table 2. Per-flow throughput in the GFC-2 network (in Mbps)

unresponsive UDP traffic load increases, the TCP flow experiences congestion and reduces its transmission rate, thereby granting an unfairly large amount of bandwidth to the unresponsive UDP flow. Thus, although there is no congestion collapse from undelivered packets, there is clearly unfairness. When NBP is deployed with FIFO scheduling, Figure 11(b) shows that the unfair allocation of bandwidth is only slightly reduced. Figure 11(c) shows the throughput of each flow when ECSFQ is introduced. Notice that ECSFQ is able to approximate fair bandwidth allocations.

In the second fairness experiment, we study whether NBP combined with ECSFQ can provide max-min fairness in a complex network. We consider the network model shown in Figure 12. This model is adapted from the second General Fairness Configuration (GFC-2), which is specifically designed to test the max-min fairness of traffic control algorithms [29]. It consists of 22 unresponsive UDP flows, each generated by a source transmitting at a constant bit rate of 100 Mbps. Flows belong to flow groups which are labeled from A to H, and the network is designed in such a way that members of each flow group receive the same max-min bandwidth allocations. Links connecting core routers serve as bottlenecks for at least one of the 22 flows, and all links have propagation delays of 5 msec and bandwidths of 150 Mbps unless otherwise shown in the figure.


The first column of values in Table 2 lists the global max-min fair share allocations for all flows shown in Figure 12. These values represent the ideal bandwidth allocations for any traffic control mechanism that attempts to provide global max-min fairness. The remaining columns list the equilibrium-state throughputs actually observed after 4.5 seconds of simulation for several scenarios. (Only the results for a single member of each flow group are shown.) In the first scenario, NBP is not used and all routers perform WFQ. As indicated by comparing the ideal shares with the WFQ-only throughputs, WFQ by itself is not able to achieve global max-min fairness for all flows. This is due to the fact that WFQ does not prevent congestion collapse. In the second scenario, NBP is introduced at edge routers and FIFO scheduling is assumed at all routers. The results for NBP with FIFO show that it also fails to achieve global max-min fairness in the GFC-2 network, largely because NBP has no mechanism to explicitly enforce fairness. In the third and fourth simulation scenarios, NBP is combined with WFQ and ECSFQ, respectively, and in both cases bandwidth allocations that are approximately max-min fair are achieved for all flows. NBP with WFQ achieves slightly better fairness than NBP with ECSFQ, since ECSFQ is only an approximation of WFQ, and its performance depends on the accuracy of its estimation of a flow's input rate and fair share.

Figures 13(a) and 13(b) show how rapidly the throughput of each flow converges to its max-min fair bandwidth allocation for the NBP with WFQ and the NBP with ECSFQ cases, respectively. Even in a complex network like the one simulated here, all flows converge to an approximately max-min fair bandwidth allocation within one second.

C. Scalability

Scalability is perhaps the most important performance measure of any traffic control mechanism. As we saw in the previous section, Network Border Patrol is a core-stateless traffic control mechanism that effectively prevents congestion collapse and provides approximate max-min fairness. However, NBP's scalability is highly dependent upon the per-flow management performed by edge routers. In a large scale network, the overheads of maintaining per-flow state, communicating per-flow feedback, and performing per-flow rate control and rate monitoring may become inordinately expensive. The number of border routers, the number of flows, and the load of the traffic

[Figure 13 plots the throughput (Mbps) of one flow from each group (A through H, with their ideal shares of 10, 5, 35, 35, 35, 10, 5 and 52.5 Mbps) versus time (sec): (a) using NBP with WFQ, (b) using NBP with ECSFQ.]

Fig. 13. Per-flow throughput in the GFC-2 network

are the determinant factors in NBP's scalability. The amount of feedback an ingress router generates depends on the number of egress routers it communicates with and on the load of the traffic. The length of the feedback packets, as well as the processing and the amount of state that border routers need to maintain, is determined mainly by the number of flows. In this section we assess NBP's scalability using the network shown in Figure 14. This is a very flexible topology, in which we can vary the number of border routers connected to each of the core routers as well as the number of end systems connected to each border router.¹ The network model shown in Figure 14 consists of four core routers, 4·B border routers and 4·B·S end systems, where B is the number of border routers per core router and S is the number of end systems per border router. Propagation delays are set to 5 msec between core routers, 1 msec between border and core routers, and 0.1 msec between end systems and border routers. Connections are set up in such a way that data packets travel in both the forward and backward directions on the links connecting the core routers.

¹ Since NBP is a core-stateless mechanism, the configuration of core routers does not seem to be very relevant for the study of NBP's scalability. A tandem configuration is used in order to enable scenarios with multiple congested links.

[Figure 14 topology: four core routers (C) in tandem; ingress routers (I) carrying TCP sources 1 and 2 and UDP sources 1 and 2, and egress routers (E) leading to the corresponding TCP and UDP sinks, attach to the core routers over 10 Mbps links, except for the core-to-egress links carrying UDP traffic, which are 5 Mbps.]
Fig. 14. Simulation model for evaluating scalability

TCP flows traverse all core routers, while UDP flows traverse only the interior core routers. The link speeds between core and egress routers traversed by UDP flows are set to 5 Mbps, and all remaining link speeds are set to 10 Mbps. Therefore, UDP flows are bottlenecked downstream at 5 Mbps. TCP flows traverse multiple congested links and compete for bandwidth with UDP flows and also among themselves. UDP flows are unresponsive and transmit at a constant rate of 5 Mbps.

In the first experiment we consider a moderately large network with 10 border routers, and we vary the number of end systems from 8 to 48 by varying the number of end systems per border router from 1 to 6. The amount of feedback information generated by NBP is shown in Figure 15. Interestingly, the amount of feedback generated by NBP is not very dependent on the number of end systems. This is because feedback packets are generated according to the amount of traffic admitted into the network. Since in nearly all scenarios tested the capacity of the network was fully utilized, the amount of feedback information generated by NBP does not increase with the number of end systems. The fraction of the bandwidth of links L2 and L3 consumed by feedback packets varied from 1.04% to 1.59% in this experiment.

One of the advantages of NBP is that it reduces losses within the network. Figure 16 shows the amount of data packets lost in the network with and without NBP. As expected, in both scenarios the total number of packet losses increases linearly with the number of end systems, according to the load of unresponsive traffic. The amount of packets lost inside the network provides a good measure of the degree of congestion collapse. Without

[Figure 15 plots, versus the number of end systems (5 to 50), the number of feedback packets (total and dropped) and the total amount of feedback information in bytes.]

Fig. 15. Amount of feedback packets as the number of end systems increases.

NBP (a), it can be seen that once the load of the input traffic becomes larger than the capacity of the links between border routers and core routers, data packets are discarded by the border routers, and the amount of data discarded in core routers remains constant, clearly demonstrating congestion collapse. With NBP (b), nearly zero losses are observed in the core of the network, showing that even in a large scale, complex network, congestion collapse can be effectively prevented.
[Figure 16 plots the number of dropped data packets (total and dropped in the core) versus the number of end systems (5 to 50): (a) without NBP, (b) with NBP.]

Fig. 16. Number of dropped packets as the number of end systems increases.

TCP flows, which traverse the entire network, compete for bandwidth with the unresponsive flows. Figure 17(a) shows that without NBP, the combined throughput of TCP flows drops to nearly zero as the load of the unresponsive traffic increases. However, as seen in Figure 16, NBP is able to prevent congestion collapse, and so the performance of TCP can be greatly improved: Figure 17(b) shows that TCP's throughput remains high as the number of end systems increases.

In the second experiment we vary the size of the network by varying the number of border routers from 10 to 48. We consider that only one end system is connected to each border router, so that links between ingress and core routers are never congested; only links connecting core routers and links connecting core routers to egress routers become congested in this experiment.

Figure 18 shows the amount of data packets lost in the network with and without NBP. As in the previous

[Figure 17 plots the combined TCP throughput (Mbps) versus the number of end systems (5 to 50), with and without NBP.]

Fig. 17. Combined throughput of TCP flows as the number of end systems increases.

experiment, in both scenarios the total number of packet losses increases linearly with the number of end systems, according to the load of unresponsive traffic. However, notice that with NBP a slightly higher amount of packets is discarded than in the scenario without NBP. Notice also that without NBP all packet losses occur in the core of the network, whereas when NBP is used, nearly no losses are observed in the core. In this scenario, ingress routers proved to be somewhat conservative and discarded a slightly larger amount of packets.
[Figure 18 plots the number of dropped data packets (total and dropped in the core) versus the number of border routers (5 to 50), without NBP and with NBP.]

Fig. 18. Number of dropped packets as the number of border routers increases.

The amount of feedback information generated by NBP is shown in Figure 19; it increases slightly as the number of border routers increases. The fraction of the bandwidth of links L2 and L3 consumed by feedback packets varied from 0.92% to 1.41% in this experiment.
[Figure 19 plots, versus the number of border routers (5 to 50), the number of feedback packets (total and dropped) and the total amount of feedback information in bytes.]
Fig. 19. Amount of feedback information as the number of border routers increases.

The throughput achieved by TCP flows is shown in Figure 20. Although the results are not optimal, TCP's throughput


with NBP is still much larger than in the scenario without NBP. However, a large drop in TCP's performance can be seen as the number of border routers increases. Several reasons could have contributed to this result. First, notice that in this experiment each TCP source is connected to a different ingress router, while in the previous experiment all TCP sources shared the same ingress router. So, in order to respond to congestion, a single feedback packet per ingress router was able to rate control all TCP and UDP sources in the previous experiment, whereas in this experiment one feedback packet per flow at each ingress router is necessary to rate control each flow. Second, feedback packets contain congestion information regarding all flows sharing the same pair of ingress and egress routers. Therefore, the information carried by feedback packets in the first experiment is relevant to all flows, while the information carried by feedback packets in the second experiment is exclusive to one single flow.
[Figure 20 plots the combined TCP throughput (Mbps) versus the number of border routers (5 to 50), with and without NBP.]

Fig. 20. Combined throughput of TCP flows as the number of border routers increases.

This last result suggests that flow aggregation may be beneficial to the throughput of small microflows. Coarse-grained flow aggregation significantly reduces the number of flows seen by NBP edge routers, thereby reducing the required amount of state and processing. Flows can be classified very coarsely at edge routers. Instead of classifying a flow using the packet's addresses and port numbers, the network's edge routers may aggregate many flows together by, for instance, classifying flows using only the packet's address fields. Alternatively, they might choose to classify flows even more coarsely using only the packet's destination network address. However, the drawback of flow aggregation is that adaptive flows aggregated with unresponsive flows may be indiscriminately punished by an ingress router. Ingress routers can attempt to prevent this unfair situation by performing per-flow scheduling. Still, flow aggregation may create a trade-off between scalability and per-flow fairness.
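To illustrate the classification granularities mentioned above (a sketch only; the prefix length and the packet attribute names are assumptions, not prescribed by the paper), an edge router's flow key could be derived at different levels of aggregation as follows:

import ipaddress

def flow_key(pkt, granularity="5-tuple"):
    """pkt is assumed to expose src, dst, sport and dport attributes."""
    if granularity == "5-tuple":        # finest: one key per transport connection
        return (pkt.src, pkt.dst, pkt.sport, pkt.dport)
    if granularity == "host-pair":      # aggregate by address fields only
        return (pkt.src, pkt.dst)
    if granularity == "dst-network":    # coarsest: destination network prefix only
        return ipaddress.ip_network(f"{pkt.dst}/24", strict=False)
    raise ValueError(granularity)

Coarser keys shrink the per-flow tables and feedback lists kept by edge routers, at the cost of letting an unresponsive flow drag down the adaptive flows that share its aggregate, which is exactly the trade-off noted above.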


VI. IMPLEMENTATION ISSUES

As we saw in the previous section, Network Border Patrol is a congestion avoidance mechanism that effectively prevents congestion collapse and provides approximate max-min fairness when used with a fair queueing mechanism. However, a number of important implementation issues must be addressed before NBP can be feasibly deployed in the Internet. Among these issues are the following:

1. Scalable inter-domain deployment. Another approach to improving the scalability of NBP, inspired by a suggestion in [9], is to develop trust relationships between domains that deploy NBP. The inter-domain router connecting two or more mutually trusting domains may then become a simple NBP core router, with no need to perform per-flow tasks or keep per-flow state. If a trust relationship cannot be established, border routers between the two domains may exchange congestion information, so that congestion collapse can be prevented not only within a domain but throughout multiple domains.

2. Incremental deployment. It is crucial that NBP be implemented in all edge routers of an NBP-capable network. If one ingress router fails to police arriving traffic or one egress router fails to monitor departing traffic, NBP will not operate correctly and congestion collapse will be possible. Nevertheless, it is not necessary for all networks in the Internet to deploy NBP in order for it to be effective. Any network that deploys NBP will enjoy the benefits of eliminated congestion collapse within the network. Hence, it is possible to incrementally deploy NBP into the Internet on a network-by-network basis.

3. Multicast. Multicast routing makes it possible for copies of a flow's packets to leave the network through more than one egress router. When this occurs, an NBP ingress router must examine backward feedback packets returning from each of the multicast flow's egress routers. To determine whether the multicast flow is experiencing congestion, the ingress router should execute its rate control algorithm using backward feedback packets from the most congested ingress-to-egress path (i.e., the one with the lowest flow egress rate). This has the effect of limiting the ingress rate of a multicast flow according to the most congested link in the flow's multicast tree.

4. Multi-path routing. Multi-path routing makes it possible for packets from a single flow to leave the network through different egress routers. In order to support this possibility, an NBP ingress router may need to examine backward feedback packets from more than one egress router in order to determine the combined egress rate for


a single flow. For a flow passing through more than one egress router, its combined egress rate is equal to the sum of the flow's egress rates reported in backward feedback packets from each egress router.

5. Integrated or differentiated services. NBP treats all flows identically, but integrated and differentiated services networks allow flows to receive different qualities of service. In such networks, NBP should be used to regulate best effort flows only. Flows using network services other than best effort are likely to be policed by separate traffic control mechanisms.

VII. CONCLUSION

In this paper, we have presented a novel congestion avoidance mechanism for the Internet called Network Border Patrol and an enhanced core-stateless fair queueing mechanism. Unlike existing Internet congestion control approaches, which rely solely on end-to-end control, NBP is able to prevent congestion collapse from undelivered packets. It does this by ensuring at the border of the network that each flow's packets do not enter the network faster than they are able to leave it. NBP requires no modifications to core routers or to end systems; only edge routers are enhanced so that they can perform the requisite per-flow monitoring, per-flow rate control and feedback exchange operations.

Extensive simulation results provided in this paper show that NBP successfully prevents congestion collapse from undelivered packets. They also show that, while NBP is unable to eliminate unfairness on its own, it achieves approximate global max-min fairness for competing network flows when combined with ECSFQ, and it does so in a completely core-stateless fashion.

REFERENCES
[1] S. Floyd and K. Fall, "Promoting the Use of End-to-End Congestion Control in the Internet," IEEE/ACM Transactions on Networking, August 1999, to appear.
[2] J. Nagle, "Congestion Control in IP/TCP Internetworks," Request for Comments 896, Internet Engineering Task Force, Jan. 1984.
[3] Van Jacobson, "Congestion Avoidance and Control," ACM Computer Communications Review, vol. 18, no. 4, pp. 314-329, Aug. 1988. Proceedings of the Sigcomm '88 Symposium, Stanford, CA, August 1988.
[4] "Real Broadcast Network White Paper," white paper, RealNetworks Inc., January 1999, http://www.real.com/solutions/rbn/whitepaper.html.
[5] "Real Video Technical White Paper," white paper, RealNetworks Inc., January 1999, http://www.real.com/devzone/library/whitepapers/overview.html.
[6] J. Padhye, V. Firoiu, D. Towsley, and J. Kurose, "Modeling TCP Throughput: A Simple Model and its Empirical Validation," in Proc. of ACM SIGCOMM, September 1998, pp. 303-314.
[7] A. Demers, S. Keshav, and S. Shenker, "Analysis and Simulation of a Fair Queueing Algorithm," in Proc. of ACM SIGCOMM, September 1989, pp. 1-12.
[8] A. Parekh and R. Gallager, "A Generalized Processor Sharing Approach to Flow Control: the Single Node Case," IEEE/ACM Transactions on Networking, vol. 1, no. 3, pp. 344-357, June 1993.
[9] I. Stoica, S. Shenker, and H. Zhang, "Core-Stateless Fair Queueing: Achieving Approximately Fair Bandwidth Allocations in High Speed Networks," in Proc. of ACM SIGCOMM, September 1998, pp. 118-130.


[10] Z. Cao, Z. Wang, and E. Zegura, "Rainbow Fair Queueing: Fair Bandwidth Sharing Without Per-Flow State," in Proc. of IEEE Infocom 2000, March 2000.
[11] R. Pan, B. Prabhakar, and K. Psounis, "CHOKe - A Stateless Active Queue Management Scheme for Approximating Fair Bandwidth Allocation," in Proc. of IEEE Infocom 2000, March 2000.
[12] B. Suter, T.V. Lakshman, D. Stiliadis, and A. Choudhury, "Design Considerations for Supporting TCP with Per-Flow Queueing," in Proc. of IEEE Infocom '98, March 1998, pp. 299-305.
[13] B. Braden et al., "Recommendations on Queue Management and Congestion Avoidance in the Internet," RFC 2309, IETF, April 1998.
[14] D. Lin and R. Morris, "Dynamics of Random Early Detection," in Proc. of ACM SIGCOMM, September 1997, pp. 127-137.
[15] D. Bertsekas and R. Gallager, Data Networks, second edition, Prentice Hall, 1987.
[16] R. Jain, S. Kalyanaraman, R. Goyal, S. Fahmy, and R. Viswanathan, "ERICA Switch Algorithm: A Complete Description," ATM Forum Document 96-1172, Traffic Management WG, August 1996.
[17] A. Rangarajan and A. Acharya, "ERUF: Early Regulation of Unresponsive Best-Effort Traffic," International Conference on Networks and Protocols, October 1999.
[18] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W. Weiss, "An Architecture for Differentiated Services," Request for Comments 2475, Internet Engineering Task Force, December 1998.
[19] D. Clark and W. Fang, "Explicit Allocation of Best-Effort Packet Delivery Service," IEEE/ACM Transactions on Networking, vol. 6, no. 4, pp. 362-373, August 1998.
[20] K.K. Ramakrishnan and R. Jain, "A Binary Feedback Scheme for Congestion Avoidance in Computer Networks with a Connectionless Network Layer," ACM Transactions on Computing Systems, vol. 8, no. 2, pp. 158-181, May 1990.
[21] L. Kalampoukas, Anujan Varma, and K.K. Ramakrishnan, "Explicit Window Adaptation: A Method to Enhance TCP Performance," in Proc. of IEEE Infocom, San Francisco, California, April 1998.
[22] L.S. Brakmo, S.W. O'Malley, and L.L. Peterson, "TCP Vegas: New Techniques for Congestion Detection and Avoidance," in Proc. of ACM SIGCOMM, London, United Kingdom, Aug. 1994, pp. 34-35.
[23] U. Hengartner, J. Bolliger, and Th. Gross, "TCP Vegas Revisited," in Proc. of IEEE Infocom 2000, March 2000.
[24] S. Floyd, "TCP and Explicit Congestion Notification," ACM Computer Communications Review, vol. 24, no. 5, pp. 8-23, Oct. 1994.
[25] C. Albuquerque, B.J. Vickers, and T. Suda, "Network Border Patrol," in Proc. of IEEE Infocom, March 2000.
[26] W. Stevens, "TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms," RFC 2001, IETF, January 1997.
[27] LBNL Network Research Group, UCB/LBNL/VINT Network Simulator - ns (version 2), http://www-mash.cs.berkeley.edu/ns/, September 1997.
[28] UCI Network Research Group, Network Border Patrol (NBP), http://netresearch.ics.uci.edu/nbp/, 1999.
[29] B. Vandalore, S. Fahmy, R. Jain, R. Goyal, and M. Goyal, "A Definition of Generalized Fairness and its Support in Switch Algorithms," ATM Forum Document 98-0151, Traffic Management WG, February 1998.
[30] W.K. Tsai and Y. Kim, "Re-Examining Maxmin Protocols: A Fundamental Study on Convergence, Complexity, Variations, and Performance," in Proc. of IEEE Infocom, April 1999, pp. 811-818.
