Raimund K. Ege, Northern Illinois University, Department of Computer Science, ege@cs.niu.edu
Alberto Aguilar, Florida International University, College of Engineering and Computing, School of Computing and Information Sciences, alberto.aguilar@itesm.mx
Raime Bustos, Florida International University, College of Engineering and Computing, IT2, raime.bustos@itesm.mx
Kia Makki, Florida International University, College of Engineering and Computing, IT2, makkik@fiu.edu
Abstract
The number of computers and high-speed networks connected to the Internet in large-scale organizations has grown in the last two decades. As a consequence, a large number of software applications must be installed, and the quantity of data needed for their operation is considerable. This places a heavy burden on IT departments, which must distribute this content efficiently. We claim that P2P file sharing systems can be used to accomplish this, and we propose a solution that works even in environments that use NAT and that utilizes WAN links efficiently.
1. Introduction
The number of computers and high-speed networks connected to the Internet in large organizations has grown in the last two decades [12, 13]. As a consequence, a large number of software applications are needed, and the quantity of data generated with these applications is considerable. IT departments confront a big challenge: distributing applications and data among these computers. These data may include OS updates and patches, antivirus updates, general-purpose applications, image files (e.g., ImageCast files [16]), drivers and Java applets. The size of these files ranges from a few megabytes to several gigabytes in the case of image files. Moreover, the data must be distributed to many computers connected to a LAN, and since some organizations are large and geographically dispersed, several LANs (or, in some cases, a VLAN [25]) may be part of the organization's network.
Considering that these LANs may be in distinct locations, they are interconnected through the Internet with bandwidths much smaller than those typically found in a LAN. These connections to the Internet are provided by different carriers (e.g., AT&T, MCI), and NAT is widely used in some of these environments as a consequence of the shortage of public IP addresses [14]. In addition, to facilitate management and improve performance, these LANs are logically divided into networks and sub-networks [24], for example one Network ID per building. Hence, a LAN may contain several Network IDs, depending on its size. Finally, similar office hours in local and regional networks imply that clients are up and running at the same time. We claim that content distribution using P2P file sharing systems can be used to accomplish this, and we propose a solution that works even in environments that use NAT and that utilizes WAN links efficiently.
2. Related work
2.1. Centralized server
This approach has two main problems. First, as the number of hosts trying to download a file increases, the server becomes a bottleneck, which may lead to saturation of the site and, hence, denial of service (DoS). Although mirror-based solutions [15, 16, 23] might improve performance, the need for more mirrors grows in proportion to the number of hosts. Second, the available bandwidth between a LAN and the Internet is usually limited, and this channel may become a bottleneck for all hosts inside the LAN.
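Table: comparison of P2P file sharing systems. Columns: System, Look-up method, Share if partial, Fairness, Comment. Look-up methods listed: Flooding, Centralized, Tracker. The six Share if partial and Fairness entries read No, No, No, No, Yes, Yes; the system names and comments are not recoverable.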
BitTorrent distinguishes two types of peers: downloaders and seeds. Seeds have a complete copy of the file, while downloaders have some pieces of it and become seeds once they obtain the entire file. If a peer needs to download a file, it must first have the torrent file, which contains the information necessary to join a swarm. Currently, there are several clients [17, 18, 19] that can parse this torrent file to start the downloading process. Since the file is broken into pieces, peers can download missing pieces from others and upload pieces to them as well. A peer learns which peers to contact by querying the tracker (or tracker service, if distributed) for the currently connected downloaders and seeds. Once a peer obtains this random list from the tracker, it updates its neighbors list and then tries to establish TCP connections with a sufficient number of them. Also, each peer is responsible for keeping statistics of the upload and download rates of each peer with which it has been in contact. This is done to maximize its download rate by downloading from whoever it can. Then, the decision of which peers to upload to can be made using a tit-for-tat approach. In other words, if a peer does not provide pieces, it may be choked, since it is considered a leecher [1]. File pieces are not usually downloaded in order, and hence they need to be reassembled by the peer. Also, a peer may upload pieces to other peers even if it does not have the complete file.
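The rate bookkeeping and the tit-for-tat decision described above can be sketched as follows. This is a minimal illustration rather than the algorithm of any BitTorrent client: the names `RateStats` and `choose_unchoked`, the fixed number of upload slots, and the omission of optimistic unchoking are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class RateStats:
    """Bytes exchanged with each neighbor during one measurement window."""
    downloaded_from: dict = field(default_factory=dict)  # peer id -> bytes received
    uploaded_to: dict = field(default_factory=dict)      # peer id -> bytes sent

    def record_download(self, peer: str, nbytes: int) -> None:
        self.downloaded_from[peer] = self.downloaded_from.get(peer, 0) + nbytes

    def record_upload(self, peer: str, nbytes: int) -> None:
        self.uploaded_to[peer] = self.uploaded_to.get(peer, 0) + nbytes


def choose_unchoked(stats: RateStats, neighbors: list, slots: int = 4) -> set:
    """Tit-for-tat: upload only to the neighbors that gave us the most data.

    Neighbors that provided nothing are never selected, so they stay choked.
    The slot count is a hypothetical parameter, not a value from the paper.
    """
    contributors = [p for p in neighbors if stats.downloaded_from.get(p, 0) > 0]
    ranked = sorted(contributors, key=lambda p: stats.downloaded_from[p], reverse=True)
    return set(ranked[:slots])


# Hypothetical window: peer_a and peer_b provided pieces, peer_c provided none.
stats = RateStats()
stats.record_download("peer_a", 512_000)
stats.record_download("peer_b", 128_000)
neighbors = ["peer_a", "peer_b", "peer_c"]
unchoked = choose_unchoked(stats, neighbors, slots=2)
choked = set(neighbors) - unchoked
```

In this sketch `peer_c` ends up choked because it contributed no pieces during the window, which is the behavior the text attributes to leechers.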
Since a LAN may be connected to the Internet through a low-speed channel, it is important that a file can be shared by peers even if they have only some fragments of it. In this way, when one of the peers obtains the entire file, it is likely that the rest of the hosts in the LAN are missing just a few pieces. Fairness deals with the problem known as "free riding" [6], in which a peer downloads data but does not share it with others. Thus, fairness means that a peer receives data in the same amount that it shares. We conclude that BitTorrent is suitable to deal with the described problems.
3.2. Problem with BitTorrent and other P2P file sharing systems
Note in Fig. 1 that Peer 1 cannot establish an end-to-end connection with Peer 2 even though both belong to the same private network behind NAT. The problem is the following: the tracker identifies both Peer 1 and Peer 2 by the IP address 70.149.56.44. Thus, when the tracker provides Peer 1 and Peer 2 with a list of connected peers and seeds, this list may include 70.149.56.44, which is the router's IP address. This means that peers in the same private network, either LAN or VLAN, cannot cooperate with each other. As a consequence, the list of usable peers shrinks in an environment where NAT is used. Also, potential bandwidth goes unused, since speeds between peers in a LAN are typically 100 Mbps or more and the use of high-speed switches to connect them is common. Finally, note that once a seed is in the LAN there is no need to download data from peers outside the network. These problems are also present in Gnutella, CoopNet and similar approaches, since all of them use (or assume) public IP addresses to contact other peers.
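The effect of NAT on the tracker's view can be illustrated with a small sketch. Only the router address 70.149.56.44 and the names Peer 1 and Peer 2 come from the text; the private addresses, the third peer and its address (taken from a documentation range), and the helper `address_seen_by_tracker` are hypothetical.

```python
from collections import defaultdict

# (peer name, the peer's own address, public address of its NAT router or None)
# The private addresses and "Peer 3" are made up for this illustration.
peers = [
    ("Peer 1", "192.168.1.10", "70.149.56.44"),
    ("Peer 2", "192.168.1.11", "70.149.56.44"),
    ("Peer 3", "203.0.113.7", None),
]


def address_seen_by_tracker(own_ip: str, nat_public_ip):
    """A standard tracker only sees the source address of the request."""
    return nat_public_ip if nat_public_ip is not None else own_ip


# Group peers by the address under which the tracker knows them.
tracker_view = defaultdict(list)
for name, own_ip, nat_ip in peers:
    tracker_view[address_seen_by_tracker(own_ip, nat_ip)].append(name)

# Peer 1 and Peer 2 collapse into one entry, so neither can be told how to
# reach the other inside the LAN.
indistinguishable = tracker_view["70.149.56.44"]
```

Three peers are in the swarm, but the tracker holds only two distinct addresses, and the shared one points at the router rather than at either internal peer.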
Figure 2. Proposed peer list (diagram labels: Router, Firewall, Same Network, RandomList)
the same private network, either LAN or VLAN or, 3) P1 is behind NAT (having a private IP) and P2 has a public IP. Finally, there is at least one seed with a public IP address, and private IP ranges [20] are well defined; hence, given an IP address, it is easy to determine whether it is private or not. Now, we detail how to augment BitTorrent.
1. Modify BitTorrent Client. When a peer contacts the tracker, the client should send the following tracking information (in addition to the information required by the current implementation): 1) its IP address, which we will call IPpeer, which could be either a private or a public IP address; 2) its Network ID, which we will call Network-ID, and its Sub-network ID, referred to as Sub_net_ID. Note that the last two are extremely easy to compute given the subnet mask, since only an AND operation is needed: IPpeer AND sub-net-mask.
2. Augment Tracker's peers list. The tracker service should maintain a list of peers (in addition to the information it currently manages, such as port number), where each row is a tuple <IP_sender, IPpeer, Network-ID, Sub_net_ID>. IP_sender is the IP address of the device that made the request; it can be determined from the IP address from which the HTTP request came. If a peer is behind a router using NAT, it is the IP address of the router. It must be a public IP. IPpeer is the peer's IP address, which can be either public or private. Note that if IPpeer ≠ IP_sender then the peer is behind a proxy or NAT. Network-ID is the peer's Network ID. Sub_net_ID is the peer's Sub-network ID.
3. Filter Tracker's peers list when it receives a request for peers. The key idea is that since the tracker receives two IP addresses from each peer, IP_sender and IPpeer, it can filter peers that are behind NAT in the same network by selecting peers with the same IP_sender, that is, the same proxy or router.
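The tracking information in steps 1 and 2 can be sketched with Python's standard `ipaddress` module. The class name `TrackerRow`, the helper names, the example addresses, and the use of a separate network mask to obtain the Network ID are assumptions for this illustration; the text only states that an AND with the mask is needed.

```python
import ipaddress
from typing import NamedTuple


class TrackerRow(NamedTuple):
    ip_sender: str    # source address of the HTTP request, as seen by the tracker
    ip_peer: str      # address reported by the client (private or public)
    network_id: str   # ip_peer AND network mask (mask choice is an assumption)
    sub_net_id: str   # ip_peer AND sub-net mask


def and_mask(ip: str, mask: str) -> str:
    """Bitwise AND of a dotted-quad address with a dotted-quad mask."""
    masked = int(ipaddress.IPv4Address(ip)) & int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(masked))


def make_row(ip_sender: str, ip_peer: str, network_mask: str, sub_net_mask: str) -> TrackerRow:
    """Build the tuple <IP_sender, IPpeer, Network-ID, Sub_net_ID>."""
    return TrackerRow(
        ip_sender,
        ip_peer,
        and_mask(ip_peer, network_mask),
        and_mask(ip_peer, sub_net_mask),
    )


def behind_nat_or_proxy(row: TrackerRow) -> bool:
    """IPpeer differs from IP_sender when a router or proxy rewrote the source."""
    return row.ip_peer != row.ip_sender


# Hypothetical internal peer behind the router 70.149.56.44.
row = make_row("70.149.56.44", "192.168.10.77", "255.255.0.0", "255.255.255.0")
peer_is_private = ipaddress.IPv4Address(row.ip_peer).is_private
```

Here `row.sub_net_id` is `192.168.10.0` and `row.network_id` is `192.168.0.0`, and `is_private` provides the check against the private ranges of [20].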
Now, when a peer requests a list of peers, the tracker returns a list which we call FilteredPeerList, where FilteredPeerList = SameSubNet ∪ SameNet ∪ RandomList. SameSubNet contains peers in the same network and sub-network, while SameNet contains peers with the same Network-ID. Finally, RandomList is used for backward compatibility with current BitTorrent implementations. Observe that this list may contain 1) peers which have a public IP address or, 2) peers which have a private IP address and are in the same LAN/VLAN, and hence can be contacted to establish a TCP connection.
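A tracker-side sketch of this filtering is shown below. It is one possible reading of the description, under stated assumptions: `Row` and `filtered_peer_list` are hypothetical names, RandomList is drawn only from peers whose reported address equals the request's source address, and the list sizes and example addresses are made up.

```python
import random
from typing import NamedTuple


class Row(NamedTuple):
    ip_sender: str
    ip_peer: str
    network_id: str
    sub_net_id: str


def filtered_peer_list(requester: Row, rows: list, random_size: int = 25, rng=None) -> list:
    """Return SameSubNet ∪ SameNet ∪ RandomList as an ordered list of addresses."""
    rng = rng or random.Random()
    others = [r for r in rows if r != requester]
    # Only peers behind the same router or proxy can be reached by private address.
    local = [r for r in others if r.ip_sender == requester.ip_sender]
    same_sub_net = [r for r in local
                    if (r.network_id, r.sub_net_id) == (requester.network_id, requester.sub_net_id)]
    same_net = [r for r in local if r.network_id == requester.network_id]
    # Assumption: RandomList holds directly reachable (public) peers only.
    public = [r for r in others if r.ip_peer == r.ip_sender]
    random_list = rng.sample(public, min(random_size, len(public)))
    ordered = []
    for r in same_sub_net + same_net + random_list:
        if r not in ordered:          # union: drop duplicates, keep priority order
            ordered.append(r)
    return [r.ip_peer for r in ordered]


p1 = Row("70.149.56.44", "192.168.10.5", "192.168.0.0", "192.168.10.0")
p2 = Row("70.149.56.44", "192.168.10.6", "192.168.0.0", "192.168.10.0")
p3 = Row("70.149.56.44", "192.168.20.9", "192.168.0.0", "192.168.20.0")
p4 = Row("203.0.113.7", "203.0.113.7", "203.0.113.0", "203.0.113.0")
# Same private range but behind a different router: must not be offered to p1.
p5 = Row("198.51.100.1", "192.168.10.8", "192.168.0.0", "192.168.10.0")

peer_list_for_p1 = filtered_peer_list(p1, [p1, p2, p3, p4, p5])
```

Comparing `IP_sender` before comparing Network-ID matters: `p5` reports the same private sub-network as `p1`, yet it sits behind another router and would be unreachable by its private address.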
we are using is the use of broadcast and multicast for distribution. To define the group-id we use an opportunistic approach: if a peer wants to join the group, it sends a broadcast message asking for a group-id; if there is no answer, it computes the group-id using SHA1(ip+domain+filename). Then, if another internal peer needs to join the swarm, it receives a response from the first one. Also, a group-id can be distributed using DHCP; in that case, the DHCP server should have this group-id parameter defined. To obtain this parameter, peers send a DHCPINFORM message once the BitTorrent client has started. Then, the DHCP server responds with a DHCPACK message that contains this information.
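The opportunistic group-id computation can be sketched with `hashlib`. The exact encoding and concatenation of ip, domain and filename are not specified in the text, so plain string concatenation encoded as UTF-8 is an assumption, as are the function names and the callable `ask_by_broadcast`, which stands in for the broadcast query.

```python
import hashlib


def compute_group_id(ip: str, domain: str, filename: str) -> str:
    """SHA1 over the concatenation ip+domain+filename, as a hex string."""
    data = (ip + domain + filename).encode("utf-8")
    return hashlib.sha1(data).hexdigest()


def join_group(ask_by_broadcast, ip: str, domain: str, filename: str) -> str:
    """Reuse a group-id announced by an internal peer, otherwise compute one.

    `ask_by_broadcast` is a hypothetical callable returning the group-id
    answered by an existing internal peer, or None if nobody answers.
    """
    answer = ask_by_broadcast()
    return answer if answer is not None else compute_group_id(ip, domain, filename)


# The first peer gets no answer and computes the id; the second one reuses it.
first = join_group(lambda: None, "192.168.10.5", "example.org", "patch.bin")
second = join_group(lambda: first, "192.168.10.6", "example.org", "patch.bin")
```

Because the second peer adopts the answer instead of hashing its own address, both peers end up with the same group-id even though their IP addresses differ.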
5. Simulation
The time reference for our simulation is what we call tags. A tag is a cut in the time axis to obtain a picture of the simulation at a given time. The time between two consecutive tags is the time necessary to contact all peers in the neighbors list and request pieces. All the results shown use 1000 peers sharing 300 pieces of 256 KB to simulate a 75 MB patch being downloaded. We vary the number of peers inside a LAN from 10% to 90% (referred to in graphs as Internal Peers Percentage) to simulate several corporate scenarios. Also, we assume that all peers join the swarm practically at the same time (also known as a flash crowd) to simulate a software patch that has been announced by a software vendor; this may look like a favorable scenario, but it is the opposite: there are no peers with partial information (i.e., with some pieces already downloaded). We assume three seeds with public IP addresses, and we stop the simulation once all peers have the entire file in order to measure the number of tags required for each experiment.
We define two types of peers. Internal peers are hosts behind NAT with a 100 Mbps bandwidth in a switched environment (also known as symmetric switching); these peers share the same group-id, and the bandwidth of the WAN link that connects this LAN to the Internet is 8 Mbps. Peers are considered External when they are located outside the LAN and have public IP addresses. External peers have a 1536 kbps / 384 kbps download/upload rate.
Since internal peers are behind NAT, external peers cannot contact them. In our simulation, we restricted incoming communications for peers behind NAT. However, if a peer behind NAT contacts a peer with a public IP, then from that point on, and as long as the communication is alive, bi-directional communication is allowed. External peers receive a list containing fifty public IP addresses, while internal peers receive a list with twenty-five public IP addresses and twenty-five private IP addresses.
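The arithmetic behind these parameters can be checked with a short sketch. The unit conventions (1 KB = 1024 bytes, 1 kbps = 1000 bit/s) and the idealized transfer time with no protocol overhead are assumptions for this example, not part of the simulation described above.

```python
PIECES = 300
PIECE_KB = 256
PEERS = 1000

file_kb = PIECES * PIECE_KB      # total file size in KB
file_mb = file_kb / 1024         # total file size in MB


def seconds_to_transfer(size_kb: float, rate_kbps: float) -> float:
    """Ideal transfer time, ignoring overhead (1 KB = 1024 bytes, 1 kbps = 1000 bit/s)."""
    return size_kb * 1024 * 8 / (rate_kbps * 1000)


# One full copy at the external download rate and over the 8 Mbps WAN link.
external_download_s = seconds_to_transfer(file_kb, 1536)
wan_copy_s = seconds_to_transfer(file_kb, 8000)

# Number of internal peers for each Internal Peers Percentage used in the graphs.
internal_counts = {pct: PEERS * pct // 100 for pct in range(10, 100, 10)}
```

Under these assumptions a single copy of the patch occupies the WAN link for a little over a minute, so every additional copy that internal peers fetch across that link is costly when hundreds of them sit behind it.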
For all our experiments we consider three BitTorrent versions. First, a normal BitTorrent version, which we abbreviate BT, to simulate current releases of the protocol. ABT stands for Augmented BitTorrent, which implements our proposed solution. Finally, there is what we call ABT-K, which is a kind ABT version.
In the case of ABT, once an internal peer becomes a seed, it sends a message to all internal peers informing them of this; when they receive this message, they drop their connections with external peers to save WAN bandwidth. Note that internal peers are not considered leechers, since they upload pieces to external peers as long as no internal seed is available. This message is transmitted in the same way that the group-id is distributed.
In ABT-K, once an internal peer becomes a seed, it does not inform the other internal peers, and hence connections with external peers may continue. This is why we call it a kind ABT version: it keeps helping external peers after there are seeds in the internal network.
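The difference between ABT and ABT-K at the moment an internal peer becomes a seed can be sketched as follows. The class `InternalPeer`, the function name, and the representation of connections as a set of labels are hypothetical; the sketch only mirrors the rule described in the text.

```python
from dataclasses import dataclass, field


@dataclass
class InternalPeer:
    name: str
    external_connections: set = field(default_factory=set)

    def on_internal_seed_announced(self) -> None:
        # ABT: an internal seed exists, so WAN connections are dropped.
        self.external_connections.clear()


def internal_peer_became_seed(variant: str, other_internal_peers: list) -> None:
    """Apply the variant's rule when one internal peer completes the file."""
    if variant == "ABT":
        for peer in other_internal_peers:
            peer.on_internal_seed_announced()
    elif variant == "ABT-K":
        pass  # no announcement: connections with external peers may continue
    else:
        raise ValueError(f"unknown variant: {variant}")


abt_peer = InternalPeer("p1", {"ext1", "ext2"})
abtk_peer = InternalPeer("p2", {"ext1", "ext2"})
internal_peer_became_seed("ABT", [abt_peer])
internal_peer_became_seed("ABT-K", [abtk_peer])
```

After the calls, `abt_peer` has no external connections left, while `abtk_peer` keeps both and can go on exchanging pieces with external peers.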
Figure 4. Average File Copies needed to obtain one file copy for each peer (y-axis: Average Number of File Copies per Internal Peer through WAN link; x-axis ticks: 10 to 90; series: BT, ABT)
First, we compare the consumed WAN bandwidth for BT, ABT and ABT-K.
Observe in Fig. 4 that in BT the average number of file copies per host across the WAN link is at least one (the downloaded copy plus the sharing price), which means that the file was downloaded from the seeds; otherwise, it would be two, since BT uses a tit-for-tat approach. This occurs because an external peer sees all internal peers as a single peer (a consequence of NAT), and internal peers are choked almost immediately by external peers. In ABT and ABT-K this number is smaller, since pieces are shared internally: WAN link usage decreases while the capacity to share pieces increases. For instance, for 10% of internal peers, ABT uses half the BT bandwidth (Fig. 4) while providing three times more effective bandwidth to external peers (Fig. 5). Also, note that for 90% of internal peers, BT bandwidth is twelve times ABT bandwidth.
$$P_k = \binom{N}{k}\, p^k (1-p)^{N-k}, \qquad k = 0, 1, \ldots, N \tag{1}$$
$$P[X = k] = \binom{N}{k}\, \frac{\Gamma(\alpha+\beta)\,\Gamma(\alpha+k)\,\Gamma(N+\beta-k)}{\Gamma(\alpha)\,\Gamma(\beta)\,\Gamma(N+\alpha+\beta)} \tag{2}$$
The number of tracker messages from internal peers drops when using ABT, since the need to contact the tracker decreases if pieces can be shared internally. In BT, internal peers contact the tracker frequently: they are choked by external peers because they are not contributing pieces, so the neighbor list must be updated.
Also, from [27], we consider the estimation of the parameters using the moment method, where α and β are given, in terms of the sample mean $\bar{x}$ and the sample variance $s^2$, by:
$$\alpha = \frac{\bar{x}\,\left[\bar{x}(N-\bar{x}) - s^2\right]}{N s^2 - \bar{x}(N-\bar{x})} \tag{3}$$
$$\beta = \frac{\alpha\,(N-\bar{x})}{\bar{x}} \tag{4}$$
Similarly, note in Fig. 9 that for external peers the number of tracker requests is smaller in ABT. This is also a consequence of sharing pieces internally: since piece sharing is faster (and more effective) in a LAN, internal peers can help external hosts with pieces, and hence the number of tracker messages decreases as the choking rate is reduced.
2 A high quality list has a high success rate, where a success is considered if a peer in the list provides at least one piece.
[Equation (5), which defines N_0 in terms of Z, is not recoverable from the source.]
For a 5% significance level, Z = 1.645, which gives a value of 581 for N_0. Using chi-square tables to calculate χ² at a 5% significance level and 23 degrees of freedom, we obtain 35.1725. Given this, we found that a Beta Binomial PDF fits all experiments, while the Binomial and Negative Binomial fail.
The histograms for the experimental data, the Beta Binomial, the Binomial and the Negative Binomial PDFs can be compared in Fig. 10 for 50% of internal peers.
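A minimal sketch of such a fit is given below: a Beta Binomial PDF evaluated through log-gamma functions, moment estimates of α and β, and a chi-square statistic to compare against the tabulated value. The function names are hypothetical, and the closed form used in `moment_estimates` is derived here by matching the mean and variance of the Beta Binomial distribution; it is not taken from [27].

```python
import math


def beta_binomial_pmf(k: int, n: int, alpha: float, beta: float) -> float:
    """P[X = k] for a Beta Binomial with parameters n, alpha, beta."""
    log_p = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + math.lgamma(alpha + beta) + math.lgamma(alpha + k) + math.lgamma(n + beta - k)
        - math.lgamma(alpha) - math.lgamma(beta) - math.lgamma(n + alpha + beta)
    )
    return math.exp(log_p)


def moment_estimates(mean: float, var: float, n: int):
    """Match the sample mean and variance to the Beta Binomial moments."""
    denom = n * var - mean * (n - mean)
    if denom <= 0:
        raise ValueError("variance too small for a Beta Binomial fit")
    alpha = mean * (mean * (n - mean) - var) / denom
    beta = alpha * (n - mean) / mean
    return alpha, beta


def chi_square_statistic(observed, expected) -> float:
    """Pearson statistic; the fit is rejected if it exceeds the table value."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Self-check with exact moments of a Beta Binomial(n=10, alpha=2, beta=3):
# mean = 10*2/5 = 4 and variance = 10*2*3*15 / (25*6) = 6.
alpha_hat, beta_hat = moment_estimates(4.0, 6.0, 10)
total_probability = sum(beta_binomial_pmf(k, 10, alpha_hat, beta_hat) for k in range(11))
```

With real data, `observed` would hold the histogram counts and `expected` the sample size times `beta_binomial_pmf(k, ...)`; the resulting statistic would then be compared with 35.1725 for 23 degrees of freedom at the 5% level.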
5.4. Completion time
[Figure: series BT-10%, BT-90% and ABT-10% plotted against Tags; caption not recoverable.]
We compare the number of tags required to have one copy of the file for each internal peer in the swarm (i.e., all internal peers become seeds). Note in Fig. 11 that the BT best case (BT-10%) takes about 7 times longer than the ABT worst case.
model obtained to find a relationship between the success events and the number of tracker requests, the size of the internal neighbor list, and the number of tags needed to have one copy of the file for each internal peer. Also, we are investigating how to use this approach with other protocols such as Gnutella and CoopNet. Finally, we are working on a multicast version of BitTorrent for environments where it is suitable. Our research includes the creation of a multicast tree based on information provided by the tracker, considering the solution proposed in this paper.
References
[1] B. Cohen, Incentives Build Robustness in BitTorrent, May 22, 2003.
[2] D. Liben-Nowell, H. Balakrishnan, and D. Karger, Analysis of the evolution of peer-to-peer systems, in ACM Conf. on Principles of Distributed Computing (PODC), Monterey, CA, USA, July 2002.
[3] V. Padmanabhan, H. Wang, P. A. Chou, Distributing Streaming Media Content Using Cooperative Networking, Technical Report MSR-TR-2002-37, April 2002.
[4] The impact of file sharing on service provider networks, An Industry White Paper, Sandvine Inc.
[8] I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan, Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications, ACM SIGCOMM, August 2001.
[9] V. N. Padmanabhan and K. Sripanidkulchai, The Case for Cooperative Networking, IPTPS, March 2002.
[10] S. Saroiu, P. Khrisna, A Measurement Study of Peer-to-Peer File Sharing Systems, Dept. of Computer Science and Engineering, Univ. of Washington.
[11] BitTorrent protocol specification, http://www.bittorrent.com/protocol.html
[12] R. Christener, The Internet growth path, ProQuest ABI/INFORM Global, Chicago: Apr 12, 1999. Vol. 236, Iss. 15; pg. 46.
[13] M. Naldi, The internet's growth problems, ProQuest ABI/INFORM Global, Dedham: Jan 1998. Vol. 32, Iss. 1; pg. 55.
[14] ... Translation, ...
[15] H. Liu, X. Jia, D. Li and C.H. Lee, Optimal placement of mirrored web servers in ring networks, Communications, IEE Proceedings, Volume 151, Issue 2, 22 April 2004, Page(s): 170-178.
[16] A. Myers, P. Dinda, H. Zhang, Performance characteristics of mirror servers on the Internet, INFOCOM '99, Proceedings, IEEE, Volume 1, 21-25 March 1999, Page(s): 304-312.
[17] Azureus project, http://sourceforge.net/projects/azureus/
[18] BitComet BitTorrent client, http://www.bitcomet.com
[19] ABC client, http://pingpong...
[20] Address Allocation for Private Internets, http://www.ietf.org/rfc/rfc1918.txt
[21] US Secure Hash Algorithm 1 (SHA1), http://www.faqs.org/rfcs/rfc3174.html
[22] D. Tsoumakos and N. Roussopoulos, A Comparison of Peer to Peer Search Methods, in Sixth International Workshop on the Web and Databases (WebDB), 2003.
[23] Akamai, http://www.akamai.com
[24] Cisco, IP Addressing and Subnetting for New Users, http://www.cisco.com/warp/public/701/3.html
[25] S. Rooney, C. Hortnagl, J. Krause, Automatic VLAN creation based on on-line measurement, ACM SIGCOMM Computer Communication Review, Volume 29, Issue 3, July 1999.
[26] D. Qiu and R. Srikant, Modeling and performance analysis of BitTorrent-like peer-to-peer networks, in Proceedings of ACM SIGCOMM (Portland, OR, Aug 2004).
[27] R. Rodriguez-Dagnino and R. Bustos-Gardea, Beta-Binomial Video Traffic Modeling for the Knockout ATM Multicasting Switch, Proceedings of SPIE Vol. 3531, 1998.
[28] Server Security Patch Management at Microsoft, Technical White Paper, January 2004.
[29] Local