
An implementation of optimal dynamic load balancing based on multipath IP routing

Juan Pablo Saibene, Richard Lempert and Fernando Paganini
Facultad de Ingeniería, Universidad ORT, Montevideo, Uruguay

Research supported in part by AFOSR-US under grant FA9550-09-1-0504.
Abstract: We develop a protocol through which multipath-enabled IP routers collectively engage in dynamic traffic engineering, to optimize performance in concert with legacy TCP congestion control. We build on recent theory which shows a globally optimum resource allocation across the TCP/IP layers can be achieved through control of rates and multipath routing fractions following a consistent congestion signal. In this work we fully develop, in an ns2 simulation environment, the required multipath IP layer consistent with prevailing loss-based TCP protocols, with the additional requirement that individual TCP connections should be routed through single paths. The solution involves a generalization of distance vector protocols where routing metrics reflect loss probabilities. We demonstrate through simulations the stability and performance of the resulting protocol in combination with TCP, and we comment on the complexity of the implementation.

I. INTRODUCTION
Routing plays a clear role in the efficient use of network capacity, since routing choices can overload certain paths while under-utilizing others. Addressing this issue is the focus of Traffic Engineering [1], which concerns mostly methods to distribute a given traffic matrix of demands within the network capacity. Solutions involve the use of optimization together with practical techniques to implement the results via OSPF weights [4], [13], [15] or MPLS tunnels (see [1]). When external demands are unknown or time-varying, dynamic load balancing methods are required (e.g., [2]).
The impact of routing becomes more subtle when combined
with the TCP layer: congestion-controlled sources adapt rate
to whatever capacities are made available, effectively reducing
the demand on a congested path, so superficially the demand
appears to be served. It is only when the TCP/IP layers are
analyzed jointly that efficiency can be precisely defined, and
integrated solutions can be sought. In recent years, research on
Multipath TCP [6] has tackled this problem from the transport
layer side: here, TCP sources manage multiple congestion
windows for different fixed routes provided by IP. This approach has generated interest in the standards community [3].
Its impact is, however, constrained by the limited path diversity at the first hop seen by the source (e.g., a multi-homed server).
It is difficult to overcome this limitation without a stronger
breach of layer separation, namely source routing inside an
operator network. The latter would also not be scalable since
the number of internal paths is exponential.
Far more efficiency impact and scalability can be obtained if IP routers are endowed with a dynamic multipath function. Here TCP sources continue to manage only a single connection, and it is the network's job to distribute the load among internal paths.
among internal paths. In recent theoretical work [12], reviewed
in Section II, it was shown how a very general network
utility optimization over both layers can be formulated and
solved in this manner. [12] also provides a proof-of-concept
implementation on the ns2 simulator, with queueing delay as
a global congestion measure, controlling sources through a
variant of TCP-FAST [7], and routing through a generalized
multipath distance vector protocol.
In this paper we seek a router implementation that follows
the proposals of [12] without the need to change TCP from
the prevailing versions [8] which respond to packet loss as a
congestion measure. In Section III we describe the ns2-based
implementation of a multipath routing protocol that responds
to loss. We also address the challenge of keeping individual
TCP connections single-path (to avoid packet reordering),
while balancing the aggregate load. This is achieved through a
hashing technique. In Section IV we present simulation results
that validate the stability and performance of our protocol in
different scenarios, in particular with a realistic mix of TCP
connections. Brief conclusions are given in Section V.
II. JOINTLY OPTIMIZED CONGESTION CONTROL AND MULTIPATH ROUTING

In this section we review the framework from [12] that allows for a unified control of the network and transport layers to serve a common performance objective. This work builds on the Network Utility Maximization (NUM) models for TCP congestion control [9], [14]. Here each TCP flow $k$ is assigned a utility $U_k(x_k)$ as a function of its rate $x_k$, and the control seeks to maximize the aggregate network utility $\sum_k U_k(x_k)$, subject to the constraint that aggregate link rates $y_l$ do not exceed the corresponding link capacity $c_l$.
In the case of single-path IP routing, link rates $y_l$ are just the sum of the rates of the flows through the link. If more than one end-to-end path is defined for a source, the above model can be generalized [9], [6] by breaking the TCP rate $x_k$ into path components, which in turn determine the link rates. For a given capacity, the aggregate network utility can improve through the additional degrees of freedom provided by multiple paths. This motivates recent proposals for Multipath TCP [6], [3], where sources control these individual path rates. If all possible paths between source and destination were included in the model, the problem would become equivalent to the following optimization over the TCP/IP layers.

Problem 1 (NUM): Maximize $\sum_k U_k(x_k)$, subject to link capacity constraints $y_l \le c_l$, and flow balance constraints at each node.
To consider all paths would require controlling route choices
at all possible intermediate hops. Source-based Multipath TCP
cannot exploit this path diversity, so it cannot reach the
efficiency of Problem 1.
In [12] a method is proposed that is able to fully exploit path diversity by controlling the aggregate rate $x_k$ of each TCP flow (as is current practice), plus the split fractions $\alpha_{i,j}^d$ that specify multipath routing to destination $d$ at router $i$. Namely, if $x_i^d$ is the traffic rate reaching router $i$ destined to $d$, the router sends to neighbor $j$ the rate portion

$$y_{i,j}^d = \alpha_{i,j}^d\, x_i^d. \qquad (1)$$
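In this notation, the flow balance constraint at a node $i \neq d$ (our rendering of the standard conservation condition, following the model of [12]) states that transit traffic received plus locally generated traffic equals the traffic forwarded onward: $x_i^d = \sum_{j'} y_{j',i}^d + \sum_{k \in S_i^d} x_k$, with $\sum_j y_{i,j}^d = x_i^d$, where $S_i^d$ denotes the set of TCP flows entering the network at $i$ with destination $d$.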

Control laws are presented in [12] for $x_k$, $\alpha_{i,j}^d$ so that the equilibrium allocation of Problem 1 is achieved, or alternatively that of

Problem 2: Maximize $S := \sum_k U_k(x_k) - \sum_l \phi_l(y_l)$ subject to flow balance constraints.

This approximation amounts to replacing capacity constraints with a barrier function; it can also be seen as combining the utility maximization approach with the cost minimization used in Traffic Engineering [4], [13], [15].
Control is based on congestion feedback, in the form of a link congestion measure or price $p_l$. This variable can be generated as in standard congestion control (primal or dual versions, see [14]), which will lead to solutions of either Problem 1 or Problem 2. For the purposes of this paper it suffices to say that either the packet loss fraction or the queueing delay can serve as the congestion price, provided TCP responds to this quantity.
In the multipath setting, congestion prices must be averaged among paths to the destination. We introduce node prices $q_i^d$, representing the average price of sending packets from node $i$ to destination $d$, defined through the recursion

$$q_d^d = 0, \qquad q_i^d = \sum_j \alpha_{i,j}^d\,[p_{i,j} + q_j^d], \quad i \neq d. \qquad (2)$$

Here $p_{i,j} + q_j^d$ (also denoted $q_{i,j}^d$) is the mean price experienced from $i$ to $d$ when routing through next hop $j$. The overall mean node price $q_i^d$ is obtained by averaging these prices with their corresponding routing fractions.
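As an illustration, the recursion (2) is just a split-weighted average over next hops; a minimal sketch in C++ (hypothetical names, not the actual ns2 patch code) is:

    #include <cstddef>
    #include <vector>

    // Node price q_i^d at router i for one destination d, per equation (2).
    double node_price(const std::vector<double>& alpha,   // alpha[j]: split fraction toward neighbor j
                      const std::vector<double>& p,       // p[j]: measured price of link (i,j)
                      const std::vector<double>& q_nbr)   // q_nbr[j]: price to d announced by neighbor j
    {
        double q = 0.0;
        for (std::size_t j = 0; j < alpha.size(); ++j)
            q += alpha[j] * (p[j] + q_nbr[j]);            // weighted average of per-hop prices
        return q;                                          // this is q_i^d, to be announced upstream
    }

At the destination itself the price is simply zero, matching the base case of (2).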
The overall price $q_s^d$ from the source node $s$ to the destination serves as the congestion control signal for the TCP sources. For the overall network to solve the desired optimization, split fractions must also respond to the same congestion prices. One control law proposed in [12] for this purpose is

$$\dot{\alpha}_{i,j}^d = \beta_i\,\bigl(\bar{q}_i^d - q_{i,j}^d\bigr); \qquad (3)$$

here $\bar{q}_i^d$ is the average of the $q_{i,j}^d$ over $j$. So we reduce transmission on paths with higher than average price, transferring traffic to lower-price paths. In [12], a saturation is further imposed on the right-hand side of (3) so that traffic fractions remain nonnegative. It is shown in [12] that this algorithm, together with prices $p_l = \phi_l'(y_l)$, provides global convergence to the solution of Problem 2. To solve Problem 1, a different price generation method is used, together with a variation of (3) which involves an anticipatory term in the prices. From now on we focus on (3).
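As a simple numerical illustration (ours, not taken from [12]): suppose router $i$ has two next hops toward $d$ with $q_{i,1}^d = 0.03$ and $q_{i,2}^d = 0.01$, so that $\bar{q}_i^d = 0.02$. Then (3) gives $\dot{\alpha}_{i,1}^d = -0.01\,\beta_i$ and $\dot{\alpha}_{i,2}^d = +0.01\,\beta_i$: traffic is shifted from the more congested hop 1 toward the cheaper hop 2 until the prices equalize or hop 1 is abandoned.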
In [12] the theory was demonstrated through an ns2 implementation that used queueing delay as the congestion price. On the router side, delay was used to generate the $q_i^d$ prices, which were used as the metric for a generalized distance vector (RIP) protocol: multipath routers use this information to update the split fractions $\alpha_{i,j}^d$. On the transport side, a delay-based congestion control was developed based on TCP-FAST [7]. Two modifications were required: first, FAST was modified to become insensitive to packet reordering, modifying the ACKing scheme to compute RTT averages over all packets. Second, to estimate the BaseRTT parameter in FAST, which here represents the average propagation delay over all paths, explicit feedback from the routers was required at the time scale of RIP announcements. Results with the above implementation were successful, but the scheme is still far from a deployable proposal, because it requires modifications to both the TCP and IP portions of the network, and feedback between them.
In this paper we pursue a router-side only implementation of
the above control laws, compatible with legacy TCP protocols.
III. LOSS-BASED IMPLEMENTATION
Congestion control has for a long time been implemented with TCP-Reno [8], with its additive-increase multiplicative-decrease (AIMD) regulation of the congestion window based on loss events. In recent years, high-speed TCP variants have appeared that depart from AIMD, but are still predominantly loss-based. This motivates us to consider loss as a congestion price for multipath routing control as well.
If the price $p_l$ is the link loss probability, the recursion (2) will compute, to first order, the mean loss fractions experienced from node to destination, given the current routing splits $\alpha_{i,j}^d$. The usual first-order approximation (commonly made in congestion control) is that loss probabilities are additive over a path. Under this assumption, the terms $q_{i,j}^d = p_{i,j} + q_j^d$ become the conditional loss probabilities when routing to hop $j$, which weighted by the routing probabilities $\alpha_{i,j}^d$ give the correct $q_i^d$ through Bayes' rule. In particular, if $s$ is the source node, $q_s^d$ will be the end-to-end loss probability.
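For instance, over a two-link path with loss probabilities $p_1$ and $p_2$, the exact end-to-end loss probability is $1-(1-p_1)(1-p_2) = p_1 + p_2 - p_1 p_2 \approx p_1 + p_2$ when losses are small, which is the additivity invoked above.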
Returning now to TCP, for concreteness with AIMD, we recall the model $x = \frac{1}{RTT}\sqrt{\frac{1.5}{q}}$ from [11] for the TCP rate $x$ as a function of the loss probability $q$ experienced by the flow. Identifying this $q$ with $q_s^d$, we have modeled TCP and multipath IP in a coherent setting. Indeed, introducing the utility $U(x) = -\frac{1.5}{RTT^2\,x}$ associated with the AIMD demand curve, the framework of the previous section applies, without the need for explicit message passing between routers and sources, an advantage over the implementation in [12].
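To spell out the correspondence (a standard calculation, not specific to this paper): inverting the rate model gives $q = \frac{1.5}{RTT^2 x^2}$, and requiring $U'(x) = q$ yields $U(x) = -\frac{1.5}{RTT^2\,x}$ up to an additive constant, so an AIMD source behaves as if maximizing this utility against the loss price it observes.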
This model assumes, however, that each TCP flow is distributed across multiple routes, which has an undesirable effect: packet reordering at the receiver due to paths with different delays. For legacy TCP, out-of-order arrivals are taken as an indication of loss, triggering an unnecessary congestion response.

Is it impossible, then, to take advantage of the performance benefits of multipath in a router-side only implementation? If there is a single TCP connection between s and d, this would seem an insurmountable limitation. If many flows are present, however, rather than splitting each flow's packets across multiple routes, we can split the aggregate traffic according to (1) while keeping each connection single-path. A hashing method to achieve this is described below.
A question arises here: does per-flow routing give the same fairness as the full multipath case? Or would some flows see higher-than-average losses while others see lower-than-average ones, to the detriment of the former? The answer is that during a transient phase, unfairness can appear. However, a property of the routing dynamics (3) is that when it reaches equilibrium, all paths in use have the same price (loss). So there is still a fair equilibrium, with optimal overall utility as desired.
A. Multipath forwarding based on hashing
Given a desired routing split $(\alpha_{i,1}^d, \alpha_{i,2}^d, \ldots, \alpha_{i,J}^d)$, one natural forwarding method could be: generate for each packet a pseudo-random number, uniform in the interval $[0,1]$, and compare it with the thresholds $\alpha_{i,1}^d,\ (\alpha_{i,1}^d + \alpha_{i,2}^d),\ \ldots,\ (\alpha_{i,1}^d + \alpha_{i,2}^d + \cdots + \alpha_{i,J-1}^d)$; this comparison selects the forwarding link. We wish, however, to rig this random routing so that packets of individual TCP flows always fall in the same bin, while aggregate rates keep the desired proportions. This can be done by using a flow identifier (e.g., the 5-tuple in the packet header) as a seed for pseudo-random number generation. Specifically, consider the operation

$$\mathrm{hash} = \frac{(\mathrm{seed} \cdot K) \bmod k}{k}. \qquad (4)$$

Here $k$ is the resolution used for the uniform distribution, assumed prime, and $K$ is another large prime. The remainder modulo $k$ of $\mathrm{seed} \cdot K$ will tend to distribute uniformly in $\{0, 1, \ldots, k-1\}$, leading to the desired distribution in $[0,1]$ after dividing by $k$.
The above can be used successfully for multipath routing in a single node. In a more general network topology we have an additional requirement: routing decisions at each node should be independent. To see this, consider the following example: node 1 splits traffic between links (1,2) and (1,3), and downstream node 2 splits between (2,4) and (2,5). Assume both nodes use 50-50 splits. If the hashing seed only depends on the flow, node 2 will only use one outgoing link, because it will only receive packets whose hashes have fallen on one side of the split threshold.
Independent hashing can be obtained with a seed = flow_id * node_id, where the first factor represents the connection and the second the node. To avoid trivial results the latter should be coprime with $k$. In the ns2 version we used $k$ prime and node_id < $k$, and we validated in simulation that splitting across multiple hops takes place.
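The following sketch in C++ illustrates the resulting forwarding decision; the names are hypothetical and the constants are just one admissible choice of primes (the classifier code in our ns2 patch differs in detail):

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    const std::uint64_t K = 1000000007ULL;  // a large prime multiplier (assumed choice)
    const std::uint64_t k = 104729;         // resolution of the uniform distribution, a prime (assumed choice)

    int select_next_hop(std::uint64_t flow_id,            // e.g., derived from the 5-tuple
                        std::uint64_t node_id,            // router identifier, coprime with k
                        const std::vector<double>& alpha) // split fractions toward each neighbor, summing to one
    {
        std::uint64_t seed = flow_id * node_id;           // independent seed at each node
        double hash = double((seed * K) % k) / double(k); // equation (4), a value in [0, 1)
        double threshold = 0.0;
        for (std::size_t j = 0; j < alpha.size(); ++j) {
            threshold += alpha[j];                        // cumulative thresholds of Section III-A
            if (hash < threshold) return int(j);          // first bin containing the hash value
        }
        return int(alpha.size()) - 1;                     // guard against rounding at the last bin
    }

All packets of a given flow produce the same hash at a given node, so the flow stays on a single path, while the fraction of flows mapped to link $j$ approaches $\alpha_{i,j}^d$.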

B. ns2 implementation
We supplemented the standard ns2 distribution, which contains a module for TCP-Reno, with multipath router modules. Many features are common to the implementation in [12]; we highlight the differences and enhancements. We emphasize that despite some similarities, the present code is in a more final and documented form: it constitutes a patch that can be added to the ns2 distribution, and will be available at [10].
The routing agent at each node $i$ maintains an extended routing table, storing for each destination $d$ three vectors with the variables $q_j^d$, $q_{i,j}^d$, $\alpha_{i,j}^d$, indexed by neighbor $j$. In addition, a set of boolean flags is used to tag neighbors in improper state; this is part of a method for blocking transient loops, adapted from [5]; we refer to [12] for details.
Neighbor prices $q_j^d$ are updated upon receipt of the corresponding routing announcements from neighbors. A configurable time interval T_adv is used¹ to schedule announcements, of the form (destination, metric, improper flag), modifying the distance vector agent in ns2. The metrics $q_i^d$ for each $d$ are computed prior to announcement according to (2).
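As an illustration, the per-destination state just described could be laid out as follows (a sketch with hypothetical field names, not the layout of the actual patch):

    #include <vector>

    struct DestinationEntry {
        std::vector<double> q_nbr;     // q_j^d: latest price announced by each neighbor j
        std::vector<double> q_hop;     // q_{i,j}^d = p_{i,j} + q_j^d: per-next-hop price
        std::vector<double> alpha;     // alpha_{i,j}^d: split fraction toward neighbor j
        std::vector<bool>   improper;  // flags used to block transient loops (cf. [5], [12])
    };
    // One DestinationEntry per destination d, each indexed by neighbor j: O(NV) storage overall.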
Link prices $p_{i,j}$ (loss probabilities) are measured at a configurable periodicity T_prob. This is done through a queue monitor in ns2 that returns the counters p_drops and p_departures, from which the loss fraction is calculated. We included exponential smoothing to reduce noise in the loss estimate.
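A minimal sketch of this measurement step is given below; it assumes the loss fraction is computed as drops over drops plus departures in the last interval and smoothed with a fixed gain, both plausible readings since the exact formula is not spelled out above:

    // Called every T_prob with the cumulative queue-monitor counters (hypothetical names).
    double update_loss_price(long drops, long departures,           // counters read now
                             long prev_drops, long prev_departures, // counters read one period ago
                             double prev_price, double gamma)       // previous estimate, smoothing gain
    {
        long d  = drops - prev_drops;                 // packets dropped during the last interval
        long tx = departures - prev_departures;       // packets forwarded during the last interval
        double raw = (d + tx > 0) ? double(d) / double(d + tx) : 0.0;  // raw loss fraction
        return (1.0 - gamma) * prev_price + gamma * raw;                // exponential smoothing
    }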
A slight departure from the theory was to add a small constant $h_c > 0$ to the calculation of $q_{i,j}^d$. This only has impact in uncongested situations where the price would otherwise be zero, making routing indifferent. Our addition implies favoring shortest hop-count routes in this under-loaded case.
Split fractions are themselves updated with their own period T, following a discretization of (3). This update must respect blocking, and also the saturation constraints $\alpha_{i,j}^d \geq 0$; we refer to [12] for details. Updated split variables are transferred to the forwarding classifiers. A minimal value $\alpha_{\min}$ is also configured, below which the link is not used for forwarding.
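A minimal sketch of this update is given below; the names are hypothetical, and the treatment of blocked neighbors and of renormalization after saturation are our assumptions (see [12] for the exact rules):

    #include <cstddef>
    #include <numeric>
    #include <vector>

    void update_splits(std::vector<double>& alpha,       // alpha[j]: current split toward neighbor j
                       const std::vector<double>& q_hop, // q_hop[j] = p_{i,j} + q_j^d
                       double beta_T,                    // gain times update period, beta_i * T
                       double alpha_min)                 // below this value the link is not used
    {
        const std::size_t J = alpha.size();
        double q_bar = std::accumulate(q_hop.begin(), q_hop.end(), 0.0) / J;  // average price over next hops
        for (std::size_t j = 0; j < J; ++j) {
            alpha[j] += beta_T * (q_bar - q_hop[j]);     // discretized (3): shift traffic toward cheaper hops
            if (alpha[j] < alpha_min) alpha[j] = 0.0;    // saturation at zero and minimal-value cutoff
        }
        double s = std::accumulate(alpha.begin(), alpha.end(), 0.0);
        if (s > 0.0)
            for (std::size_t j = 0; j < J; ++j) alpha[j] /= s;  // keep the fractions summing to one
    }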
We modified the standard ns2 node agents to include two
packet classifiers: the first acts as usual, based on destination,
and determines if the packet is for the node itself or must be
forwarded; in the latter case it is sent to a multipath classifier,
one per destination. This is where split ratios are stored and
the hashing operation to resolve forwarding is carried out.
C. Complexity considerations
In regard to communication requirements between routers,
our multipath routing protocol involves minimal overhead
with respect to standard distance vector protocols (e.g., RIP): mainly, allowing a floating-point metric. A greater penalty appears in storage and computation. For a network with N destination nodes, a router with V neighbors must store O(NV) variables, and perform O(NV) multiplications every period T. Queue monitoring is O(V) and thus of moderate impact.
All in all, for an intradomain scenario with moderate N , this
complexity should be manageable by modern-day routers.
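As a rough illustration with numbers of our own choosing: for N = 100 destinations and V = 8 neighbors, this means on the order of a few thousand stored floating-point values (three per destination-neighbor pair) and some 800 multiplications per update period T, which is a matter of seconds.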
IV. SIMULATION RESULTS
We have carried out extensive simulations of our implementation, which validate the correct behavior of the protocol.
¹Random times in [0.9 T_adv, 1.1 T_adv] avoid synchronization.

The selection presented below highlights the main features and attempts to approximate some realistic scenarios.
A. Topology of parallel links
We begin with the simple topology of Figure 1. 100 sources are located at node 0, with destination at node 7. Multipath routing occurs at node 1, with asymmetric bottlenecks in its outgoing links. The parameters are T_adv = 2 sec, T_prob = 4 sec, T = 6 sec, with $\beta T = 1.5$ for (3), and $\alpha_{\min} = 0.005$.

Fig. 1. Parallel link topology.

Simulation results show convergence to the optimal resource allocation. In particular, the routing splits $\alpha_{1,j}$ in Figure 2 acquire the correct values to fill the overall 100 Mbps capacity. This can be compared to 30 Mbps achievable with single-path routing, or 25 Mbps with equal-split multipath. Figure 2 also shows the rates of individual TCP flows, all of which are single-path. These indeed achieve a fair allocation of bandwidth, around 1 Mbps each, irrespective of the path they are assigned. Routing based on hashing with split fractions $\alpha_{1,j}$ controls the number of connections per path to exploit the capacity fairly.

Fig. 2. Evolution of $\alpha_{1,j}$ (top) and individual TCP flow rates (bottom).

B. Multiple source topology, random traffic


The second scenario is the topology of Figure 3, with sources at nodes 0, 1, 2 and destination at node 3. Assume first we had only long (elephant) connections, all with the same utility², and half as many at node 2 with respect to nodes 0, 1. In that case we can compute the optimal allocation, which is asymmetric; the required splits $\alpha_{i,j}$ are indicated in the figure.

²For this to happen, RTTs must be the same; this is set up by making external delays predominant with respect to those inside this loop.

Fig. 3. Topology with multiple sources, indicating optimal splits.

To make things more realistic we include a random load pattern instead of permanent flows. This is done through an ns2 module that generates Poisson traffic, with $\lambda_i$ arrivals/sec and exponential file sizes of mean size 6 MB. We chose $\lambda_0 = \lambda_1 = 0.4$ and $\lambda_2 = 0.2$, creating a load of 19.2 Mbps at source 2, and twice as much at the others. We still leave a few elephant flows at each source as a probe to measure the resulting fair rates.

Fig. 4. Split fractions from node 1, elephant flow rates in Mbps.

Figure 4 depicts one trajectory of the traffic splits $\alpha_{1,j}$; while there is more time variation due to the random load, it settles around the optimal value. We also show the rates of the elephant flows, again equalized among routes as expected.
C. Stub domain with peering bottlenecks
The third scenario we consider is depicted in Figure 5. Here we have a full-mesh backbone from which destinations are reachable with good bandwidth and low delay, but there is scarce bandwidth and longer latency in the connections to the outside Internet. Such a situation is present in stub ISPs far removed from the Internet core, like those in the authors' home country. External routers Ext0 and Ext1, located near the core, are still managed by the ISP: the efficient use of their outgoing capacity has a high impact on network performance.

Fig. 5. Stub network with bottlenecks in external access.

We include 4 external source nodes 0, 1, 2, 3, which generate traffic to consistently labelled destinations. Each source node is fed, as before, by a mix of permanent TCP connections (10 flows each) and random TCP traffic sources (Poisson traffic with mean load 70 Mbps).

Fig. 6. Split fractions for Ext0. Top: to Dst0. Bottom: to Dst2.

Simulation plots for the split fractions in routers Ext0 and Ext1 are given in Figures 6 and 7. All settle initially in the 3-1 ratio consistent with the outgoing link capacities. At 3000 sec we introduce a fault, severing link Ext0-Bb1, later restoring it at 5000 sec. The routing splits at node Ext0 react to these changes quickly, rerouting traffic as needed, whereas those at Ext1 are unaffected. At 7000 sec the backbone link Bb0-Bb2 fails, causing Bb0 to reroute traffic. Since remaining backbone capacity is plentiful, the external rate suffers no degradation. Note, however, that split ratios at Ext0 and Ext1 move to a different equilibrium. This is consistent with theory since this problem has non-unique solutions: split fractions from Ext0 to different destinations can each differ from 3-1 and still result in an overall 3-1 rate partition.

Fig. 7. Split fractions for Ext1. Top: to Dst1. Bottom: to Dst3.
V. CONCLUSIONS
We have implemented a dynamic multipath routing protocol
which, combined with legacy TCP, can achieve the optimal
resource allocation promised by the theory in [12]. Given
its performance and moderate computational requirements, it
shows potential for Traffic Engineering practice. One open
question is whether the proposal can be made compatible with
link-state protocols, prevalent in intra-domain routing today.

REFERENCES
[1] D. Awduche, A. Chiu, A. Elwalid, I. Widjaja, X. Xiao, Overview and
Principles of Internet Traffic Engineering, RFC3272, IETF.
[2] A. Elwalid, C. Jin, S. Low, and I. Widjaja, MATE: MPLS Adaptive
Traffic Engineering, Proc. IEEE INFOCOM 2001.
[3] A. Ford, C. Raiciu, M. Handley, TCP Extensions for Multipath Operation with Multiple Addresses, Internet Draft, Oct. 2009.
[4] B. Fortz and M. Thorup, Internet Traffic Engineering by Optimizing
OSPF Weights, Proc. IEEE INFOCOM 2000.
[5] R. G. Gallager, A minimum delay routing algorithm using distributed
computation, IEEE Trans. on Comm., Vol Com-25 (1), pp. 73-85, 1977.
[6] H. Han, S. Shakkottai, C. Hollot, R. Srikant and D. Towsley, Multi-Path
TCP: A joint congestion and routing scheme to exploit path diversity in
the Internet, IEEE/ACM Trans. Netw. Vol. 14(6), pp. 1260-1271, 2006.
[7] C. Jin, D. X. Wei and S. H. Low, FAST TCP: motivation, architecture,
algorithms, performance; Proc. IEEE INFOCOM 2004.
[8] V. Jacobson, Congestion avoidance and control, Proc. ACM SIGCOMM 88.
[9] F. P. Kelly, A. Maulloo, and D. Tan, Rate control for communication
networks: Shadow prices, proportional fairness and stability, Jour. Oper.
Res. Society, vol. 49(3), pp 237-252, 1998.
[10] http://athenea.ort.edu.uy/publicaciones/mate/en/index.html.
[11] M. Mathis, J. Semke, J. Mahdavi, T. Ott, The Macroscopic Behavior of
the TCP Congestion Avoidance Algorithm, Computer Communication
Review, volume 27, number 3, July 1997.
[12] F. Paganini, E. Mallada, A unified approach to congestion control and
node-based multipath routing, IEEE/ACM Trans. on Networking, Vol.
17, no. 5, pp. 1413-1426, Oct. 2009.
[13] A. Sridharan, R. Guerin, C. Diot, Achieving Near-Optimal Traffic
Engineering Solutions for Current OSPF/ISIS Networks. IEEE/ACM
Transactions on Networking, March 2005.
[14] R. Srikant, The Mathematics of Internet Congestion Control, Birkhauser,
2004.
[15] D. Xu, M. Chiang, J. Rexford, Link-state routing with hop-by-hop
forwarding can achieve optimal traffic engineering, Proc. IEEE INFOCOM 2008.
