simplicity, we assume that a swarm can be considered "large" if it has at least x̃ participating peers, for some threshold x̃. "Large" swarms are likely to have high piece diversity and are more resilient to fluctuations in peer participation.

Apart from these two properties, one would like to minimize the effect of swarm balancing on the traffic load of the trackers by (iii) avoiding a significant increase in the number of peers for any individual tracker (load conservation), and by (iv) minimizing the number of peers that are shifted from one tracker to another (minimum overhead, especially shifts associated with torrents being re-balanced).

The distributed swarm management (DISM) algorithm we describe in the following was designed with these four properties in mind. Our algorithm allows swarms to be merged and to be split: swarms are merged whenever a swarm has fewer than x̃ participating peers, and a swarm is split (or re-balanced) over multiple trackers only if splitting the peers does not cause any swarm to drop below x̃ peers. The algorithm ensures that no tracker will see an increase of more than x̃ peers in total (across all torrents), and typically much less, and it does not balance swarms that have at least x̃ peers each.

The DISM algorithm is composed of two algorithms. It relies on a distributed negotiation algorithm to determine the order in which trackers should perform pairwise load balancing. On a pairwise basis, trackers then exchange information about the number of active peers associated with each torrent they have in common (e.g.,

sets T(r) are not necessarily pairwise disjoint. Finally, let us denote the number of peers tracked by tracker r for torrent t by x_{t,r}, the total number of peers tracked by tracker r by M_r, and the total number of peers associated with torrent t by x_t.

2.2 Distributed Negotiation

The distributed negotiation algorithm assumes that each tracker r knows the set of torrents T(r, r′) = T(r) ∩ T(r′) that it has in common with each other tracker r′ ∈ R whose torrents are not disjoint from its own (i.e., for which T(r, r′) ≠ ∅). Note that this information should be available through the torrent file, which should have been uploaded to the tracker when the torrent was registered with the tracker.

The algorithm works as follows. Tracker r invites for pairwise balancing the tracker r′ for which the overlap in tracked torrents, T(r, r′), is maximal among the trackers with which it has not yet performed the pairwise balancing. A tracker r′ accepts the invitation if its overlap with tracker r is maximal. Otherwise, tracker r′ asks tracker r to wait until their overlap becomes maximal for r′ as well. The distributed algorithm guarantees that all pairs of trackers with an overlap in torrents will perform a pairwise balancing once and only once during the execution of the algorithm. If tracker r′ accepts the invitation, tracker r queries x_{t,r′} for t ∈ T(r, r′) from r′, executes the pairwise balancing algorithm described below, and finally tells the results to tracker r′.
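The negotiation order of Section 2.2 amounts to a greedy pairing on torrent overlaps: a pair balances exactly when it is the currently preferred partner of both endpoints, and the pair with the globally largest remaining overlap always satisfies this, so every overlapping pair is processed exactly once. The sketch below is our own illustration of that ordering (hypothetical tracker names and overlap sizes), not the authors' implementation.

```python
# Sketch of the distributed negotiation order (hypothetical data).
# Each tracker prefers the partner with the largest remaining overlap
# |T(r, r')|; a pair balances only when it is preferred by both ends.

def negotiation_order(overlap):
    """overlap: dict mapping frozenset({r, r'}) -> |T(r, r')| (> 0).
    Returns the order in which pairwise balancings are performed."""
    remaining = dict(overlap)
    order = []
    while remaining:
        def best(r):
            # The partner tracker r currently prefers to balance with.
            pairs = [p for p in remaining if r in p]
            return max(pairs, key=lambda p: remaining[p])
        # The pair with the globally maximal overlap is always mutually
        # preferred, so each pass removes exactly one pair.
        for pair in sorted(remaining, key=remaining.get, reverse=True):
            r, r2 = tuple(pair)
            if best(r) == pair and best(r2) == pair:
                order.append(pair)  # pairwise balancing happens here
                del remaining[pair]
                break
    return order

overlaps = {frozenset({"A", "B"}): 40,
            frozenset({"B", "C"}): 25,
            frozenset({"A", "C"}): 10}
print([tuple(sorted(p)) for p in negotiation_order(overlaps)])
# → [('A', 'B'), ('B', 'C'), ('A', 'C')]
```

Note that tracker C must wait: its largest overlap is with B, but B first serves A, matching the "wait until the overlap becomes maximal" rule in the text.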
2.3 Pairwise Approximation Algorithm

Rather than applying optimization techniques, we propose a much simpler three-step greedy-style algorithm. First, peers are tentatively shifted based only on information local to each individual torrent. For all torrents t that require merging (i.e., for which x_{t,r} + x_{t,r′} < 2x̃), all peers are tentatively shifted to the tracker that already maintains information about more peers for that torrent. For all torrents that should be re-balanced (i.e., for which min[x_{t,r}, x_{t,r′}] < x̃ and x_{t,r} + x_{t,r′} ≥ 2x̃), the minimum number of peers (x̃ − min[x_{t,r}, x_{t,r′}]) needed to ensure that both trackers have at least x̃ peers is tentatively shifted to the tracker with fewer peers.

Second, towards achieving load conservation of M_r, the peer responsibility of some torrents may have to be adjusted. Using a greedy algorithm, the tentative allocations are shifted from the tracker that saw an increase in peer responsibility (if any) towards the tracker that saw a decrease in overall peer responsibility. To avoid increasing the number of peers associated with partial shifts, priority is given to shifting responsibilities of the torrents that are being merged. Among these torrents, the algorithm selects the torrent that results in the largest load adjustment and that does not cause the imbalance in overall load to shift to the other tracker. By sorting torrents based on their relative shift, this step can be completed in O(|T(r, r′)| log |T(r, r′)|) steps.

Finally, if overall load conservation is not yet fully achieved, additional load adjustments can be achieved by flipping the peer responsibilities of the pair of torrents that (if flipped) would result in the load split closest to achieving perfect load conservation (across all torrents), with ties broken in favor of choices that minimize the total shift of peers. Of course, considering all possible combinations can scale as O(|T(r, r′)|²). However, by noticing that only the torrents with the smallest shift for each load shift are candidate solutions, many combinations can be pruned. By sorting the torrents appropriately, our current implementation achieves O(|T(r, r′)| log |T(r, r′)|) whenever x̃ is finite.

We have also considered an alternative version in which priority (in step two) is given to allocations of the torrents that are being re-balanced. For these torrents, we (greedily) balance the load of the torrent with the largest imbalance first, until load conservation has been achieved or all such torrents have reached their balancing point. Results using this algorithm are very similar to those of our baseline algorithm and are hence omitted.

2.4 Implementation and Overhead

The proposed DISM algorithm could be implemented with a minor extension to the BitTorrent protocol. The only new protocol message required is a tracker redirect message that could be used by a tracker to signal to a peer that the peer should contact an alternative tracker for the torrent. The message would be used by a tracker r for a torrent t for which x_{t,r} decreases due to the execution of DISM. Peers that receive the message should contact another tracker they know about from the torrent file.

The communication overhead of DISM is dominated by the exchange of torrent popularity information between the trackers and by the redirection messages sent to the peers. Distributed negotiation involves one tracker scrape before every pairwise balancing, and the corresponding exchange of the results of the balancing. The amount of data exchanged between trackers r and r′ is hence O(|T(r, r′)|). The number of redirection messages is proportional to the number of peers shifted between swarms, and is bounded by ∑_{t∈T(r,r′)} x_t.

3 Protocol Validation

3.1 Empirical Data Set

We used two kinds of measurements to obtain our data set. First, we performed a screen-scrape of the torrent search engine www.mininova.org. In addition to claiming to be the largest torrent search engine, mininova was the most popular torrent search engine according to www.alexa.com during our measurement period (Alexa rank of 75, August 1, 2008). From the screen-scrapes we obtained the sizes of about 330,000 files shared using BitTorrent, and the addresses of 1,690 trackers.

Second, we scraped all the 1,690 trackers for peer and download information of all the torrents they maintain. (Apart from interacting with and helping peers, a tracker also answers scrape requests at its scrape URL.) For the tracker-scrapes we developed a Java application that scrapes the scrape URL of each tracker. By not specifying any infohash, the tracker returns the scrape information for all torrents that it tracks. This allowed us to efficiently obtain the number of leechers, seeds, and completed downloads as seen by all trackers that we determined via the screen-scrape of mininova.

We performed the tracker-scrapes daily from October 10, 2008, to October 17, 2008. All scrapes were performed at 8pm GMT. We removed redundant tracker information for trackers that share information about the same swarms of peers, and identified 721 independent trackers. Table 2 summarizes the data set obtained on October 10, 2008.

Figure 1 summarizes some of the main characteristics of the world of trackers and torrents captured by our measurements. Figure 1(a) shows the size of torrents and swarms against their rank. Similar to previous content popularity studies (e.g., [1, 3, 4, 5]), we find that
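The tentative-shift rules of step one of the pairwise approximation algorithm (Section 2.3) can be sketched per torrent as follows. The helper and its variable names are ours, the sign convention (positive means peers move from r to r′) is an assumption, and so is the tie-break towards r′ when merging equal-sized swarms.

```python
# Step one of the pairwise approximation algorithm (Section 2.3), per
# torrent t shared by trackers r and r'. Only the local counts x_{t,r},
# x_{t,r'} and the threshold x_tilde are used. Sign convention (ours):
# positive = peers shift r -> r', negative = peers shift r' -> r.

def tentative_shift(x_t_r, x_t_r2, x_tilde):
    if x_t_r + x_t_r2 < 2 * x_tilde:
        # Merge: move all peers to the tracker that already tracks more
        # peers for this torrent (ties broken towards r' -- an assumption).
        return x_t_r if x_t_r <= x_t_r2 else -x_t_r2
    if min(x_t_r, x_t_r2) < x_tilde:
        # Re-balance: move the minimum number of peers needed so that
        # both trackers end up with at least x_tilde peers.
        deficit = x_tilde - min(x_t_r, x_t_r2)
        return -deficit if x_t_r < x_t_r2 else deficit
    return 0  # both swarms already have at least x_tilde peers

print(tentative_shift(10, 20, 50))  # → 10  (merge: r's 10 peers go to r')
print(tentative_shift(30, 90, 50))  # → -20 (re-balance: 20 peers to r)
print(tentative_shift(60, 70, 50))  # → 0   (both swarms large enough)
```

Steps two and three then adjust these tentative allocations to restore load conservation of M_r, as described above.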
[Figure 1: Main characteristics of the measured trackers and torrents (three panels): (a) torrent size statistics (x_t) and swarm size statistics (x_{t,r}) versus torrent/swarm rank; (b) number of torrents versus number of swarms in the torrent; (c) normalized CoV(x_{t,r})/√(|R(t)| − 1) versus number of peers in the torrent (x_t).]

[Figure 2: The average coefficient of variation (CoV) as a function of (a) peers (x_t), (b) leechers (l_t), and (c) seeds (s_t). Each panel shows E[CoV/√(|R(t)| − 1)] for x̃ = 200, x̃ = 100, x̃ = 50, and the original (unbalanced) allocation.]
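The normalized coefficient of variation plotted in Figures 1-3, CoV(x_{t,r})/√(|R(t)| − 1), can be computed directly from the per-tracker swarm sizes of a torrent. The helper below is our own sketch (not the authors' code) and assumes the population standard deviation; under that assumption the metric is 1 when all peers sit in a single one of the |R(t)| swarms and 0 when the swarms are perfectly even.

```python
# Normalized CoV of per-tracker swarm sizes x_{t,r} for one torrent t:
# (std/mean) / sqrt(|R(t)| - 1). Population std assumed (our choice).
import math

def normalized_cov(swarm_sizes):
    n = len(swarm_sizes)
    if n < 2:
        return 0.0  # a single swarm has no imbalance to measure
    mean = sum(swarm_sizes) / n
    if mean == 0:
        return 0.0
    var = sum((x - mean) ** 2 for x in swarm_sizes) / n
    return (math.sqrt(var) / mean) / math.sqrt(n - 1)

print(normalized_cov([300, 0, 0]))     # ≈ 1.0 (fully concentrated)
print(normalized_cov([100, 100, 100])) # → 0.0 (perfectly balanced)
```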
[Figure 3: Normalized CoV after applying DISM (CoV(x_{t,r})/√(|R(t)| − 1) versus number of peers in torrent, x_t).]

torrents, Figure 3 shows the normalized CoV for all torrents. Comparing Figure 3 with Figure 1(c) we note that the distributed algorithm significantly reduces the number of torrents operating in the lower left corner. While there still are some torrents that do, we note that Figure 2 suggests that this number is very small.

3.4 Throughput Analysis

Thus far our analysis has been based on the assumption that small torrents are more likely to perform poorly. To validate this assumption, and to allow us to obtain estimates of the potential performance gains small torrents can obtain using our distributed algorithm, we need to estimate the throughput of different-sized swarms.

To estimate the throughput of any particular swarm we measured the number of seeds, leechers, and cumulative number of downloads for two consecutive days. Using these values we estimated the throughput per leecher as the file size L divided by the estimated download (service) time S, which using Little's law can be expressed as S = l/λ, where l is the average number of leechers currently being served in the system and λ the current arrival rate, which due to flow conservation can be estimated as the overall throughput (equal to the number of download completions D during that day, times the file size L, and divided by the time T between two consecutive measurements). To summarize, we have an estimated throughput of LD/(lT). Finally, throughput estimates for any particular swarm type were obtained by taking the average over all swarms of that particular size.

Figure 4 shows our throughput estimation results. Here, swarms are classified based on the total number of peers in the swarm (s + l) and the seed-to-leecher ratio s/l; 60 bins are used for the swarm sizes and three curves are used for the different seed-to-leecher ratios. We note that the results for swarms up to just over 1,000 peers are consistent with intuition and previous measurement results [9]. For larger swarm sizes we believe that the above estimation is less accurate. Estimation errors may be due to inaccurate estimation of the average number of leechers l over the measurement period, for example. Furthermore, we note that the popularity of these types of workloads typically follows a diurnal pattern and

Using these throughput estimations we can estimate the speedup achieved by the distributed algorithm. Figure 5 shows the relative throughput improvement for leechers as a function of the torrent size. The results show a good match with our objectives: the smallest torrents experience throughput gains up to 70 percent on average, while torrents above approximately 200 peers are not affected by our algorithm.

Figure 6 shows the relative estimated throughput improvement for leechers in torrents smaller than 300 peers over a week. The figure shows the estimated increase of the throughput per leecher averaged over the torrents for three different values of the threshold x̃. The curves marked with × show results weighted with the torrent size; the curves without markers show the non-weighted gain. The throughput gains are rather steady in both cases, and show that dynamic swarm management could improve peer throughput significantly.

4 Related Work

Much research has been done to understand BitTorrent and BitTorrent-like [6, 11] systems. While most of these efforts have been towards understanding the performance of single-torrent systems, there have been some recent papers that consider multi-torrent environments [7, 10]. Guo et al. [7] provide measurement results that show that more than 85% of torrent users simultaneously participate in multiple torrents. The authors also illustrate that the "life-time" of torrents can be extended if a node that acts as a downloader in one torrent also acts as a seed in another torrent. Yang et al. [10] propose incentives that motivate users to act as seeds in such systems. In particular, Yang et al. propose a cross-torrent tit-for-tat scheme in which unchoking is done based on the aggregate download rates of a peer across all torrents (instead of only based on the download rate for a single torrent). Rather than increasing seed capacity through cooperation among peers downloading different files, this paper focuses on how multiple swarms of the same file can be merged to improve the performance of small swarms. To the best of our knowledge, no prior work has considered the performance benefits from adaptively merging and/or re-distributing peer information among trackers.

Other related works have considered the effectiveness of BitTorrent's tit-for-tat and rarest-first mechanisms [2, 8]. BitTorrent-like systems have also been considered for streaming [11]. Finally, we note that long-tailed popularity distributions have been observed in many other contexts, including Web workloads [1, 3] and user generated content [4, 5].
[Figure 4: Throughput estimation for three classes of swarms (estimated swarm throughput per leecher [KB/s] versus number of peers in swarm, x_{t,r}; one curve per seed-to-leecher class, e.g., S/L < 1 and 1 ≤ S/L < 4).]

[Figure 5: Estimated speedup versus torrent size (relative increase of mean throughput versus number of peers in torrent; curves for x̃ = 100 and x̃ = 50).]

[Figure 6: Estimated speedup for torrents smaller than 300 peers versus time (torrent throughput gain versus days since 2008.10.10; curves for x̃ = 100 and x̃ = 50, both weighted by torrent size and unweighted).]
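The per-leecher throughput estimate of Section 3.4 reduces to LD/(lT), via Little's law S = l/λ with λ = D/T. The worked sketch below uses hypothetical scrape values (file size, completions, leecher count are made up for illustration); the function itself just restates the paper's formula.

```python
# Per-leecher throughput estimate of Section 3.4: service time S = l/lambda
# (Little's law), arrival rate lambda = D/T from the download-completion
# count D between two scrapes taken T seconds apart; estimate = L/S = LD/(lT).

def throughput_per_leecher(file_size_bytes, completions, avg_leechers, interval_s):
    """Estimated throughput per leecher, L * D / (l * T), in bytes/second."""
    arrival_rate = completions / interval_s     # lambda = D / T
    service_time = avg_leechers / arrival_rate  # S = l / lambda (Little's law)
    return file_size_bytes / service_time       # = L * D / (l * T)

# Hypothetical example: a 700 MiB file, 120 completed downloads between two
# daily scrapes, 35 leechers on average.
rate = throughput_per_leecher(700 * 1024 * 1024, 120, 35, 24 * 3600)
print(round(rate / 1024, 1), "KB/s")  # → 28.4 KB/s
```

Such a value falls in the range shown in Figure 4 for modest swarms, but the numbers here are illustrative only, not taken from the data set.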