
4/2/2007

BitTorrent
CS514
Vivek Vishnumurthy, TA

Common Scenario
• Millions want to download the same popular huge files (for free)
  – ISOs
  – Media (the real example!)
• Client-server model fails
  – Single server fails
  – Can’t afford to deploy enough servers

IP Multicast?
• Recall: IP Multicast not a real option in general settings
  – Not scalable
  – Only used in private settings
• Alternatives
  – End-host based Multicast
  – BitTorrent
  – Other P2P file-sharing schemes (later in lecture)
[Figure: Source, Router, and “Interested” End-hosts]

Client-Server
[Figure: Source, Router, and “Interested” End-hosts; the Source is marked “Overloaded!”]


IP multicast
[Figure: Source, Router, and “Interested” End-hosts]

End-host based multicast
[Figure: Source, Router, and “Interested” End-hosts]

End-host based multicast
• “Single-uploader” → “Multiple-uploaders”
  – Lots of nodes want to download
  – Make use of their uploading abilities as well
  – Node that has downloaded (part of) file will then upload it to other nodes.
• Uploading costs amortized across all nodes

End-host based multicast
• Also called “Application-level Multicast”
• Many protocols proposed early this decade
  – Yoid (2000), Narada (2000), Overcast (2000), ALMI (2001)
• All use single trees
• Problem with single trees?

End-host multicast using single tree
[Figure: tree of end-hosts rooted at Source]


End-host multicast using single tree
[Figure: tree rooted at Source, annotated “Slow data transfer”]

End-host multicast using single tree
• Tree is “push-based”: node receives data, pushes data to children
• Failure of “interior”-node affects downloads in entire subtree rooted at node
• Slow interior node similarly affects entire subtree
• Also, leaf-nodes don’t do any sending!
• Though later multi-tree / multi-path protocols (Chunkyspread (2006), Chainsaw (2005), Bullet (2003)) mitigate some of these issues
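
The subtree problem above can be made concrete with a small sketch. The tree shape, the node names, and the helper `affected_by_failure` are all hypothetical, chosen only for illustration; none of the cited protocols is being reproduced here.

```python
from collections import deque

def affected_by_failure(children, failed):
    """Return every node below `failed`, i.e. the nodes that stop
    receiving data when `failed` stops pushing to its children."""
    cut_off = set()
    queue = deque(children.get(failed, []))
    while queue:
        node = queue.popleft()
        if node not in cut_off:
            cut_off.add(node)
            queue.extend(children.get(node, []))
    return cut_off

# Hypothetical push tree: source S feeds A and B; A feeds C and D; C feeds E.
tree = {"S": ["A", "B"], "A": ["C", "D"], "C": ["E"]}

# Nodes with no children never upload anything in a single tree.
all_nodes = {"S"} | {n for kids in tree.values() for n in kids}
leaves = {n for n in all_nodes if not tree.get(n)}
```

In this made-up tree, losing interior node A cuts off three of the five receivers, while B, D, and E are leaves and contribute no upload capacity at all.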

BitTorrent
• Written by Bram Cohen (in Python) in 2001
• “Pull-based” “swarming” approach
  – Each file split into smaller pieces
  – Nodes request desired pieces from neighbors
    • As opposed to parents pushing data that they receive
  – Pieces not downloaded in sequential order
  – Previous multicast schemes aimed to support “streaming”; BitTorrent does not
• Encourages contribution by all nodes

BitTorrent Swarm
• Swarm
  – Set of peers all downloading the same file
  – Organized as a random mesh
• Each node knows list of pieces downloaded by neighbors
• Node requests pieces it does not own from neighbors
  – Exact method explained later

How a node enters a swarm for file “popeye.mp4”
• File popeye.mp4.torrent hosted at a (well-known) webserver
• The .torrent has address of tracker for file
• The tracker, which runs on a webserver as well, keeps track of all peers downloading file
[Figure: step 1, connecting Peer and www.bittorrent.com]


How a node enters a swarm for file “popeye.mp4” (continued)
[Figure: steps 2 and 3, connecting Peer, www.bittorrent.com, Tracker, and Swarm]
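
A minimal sketch of the tracker's bookkeeping role described above: it remembers which peers are downloading each file and hands a newcomer a list of existing peers. The class name `Tracker`, the method `announce`, and the `max_peers` cap are assumptions made for this illustration; a real tracker answers HTTP requests, which is omitted here.

```python
import random
from collections import defaultdict

class Tracker:
    """Toy in-memory tracker: maps a torrent to the set of peers in its swarm."""

    def __init__(self, max_peers=50):
        self.swarms = defaultdict(set)   # torrent id -> set of (host, port)
        self.max_peers = max_peers       # assumed cap on the returned list

    def announce(self, torrent_id, peer):
        """Register `peer` for `torrent_id` and return other known peers."""
        swarm = self.swarms[torrent_id]
        others = sorted(swarm - {peer})
        swarm.add(peer)
        if len(others) > self.max_peers:
            # Return a random subset so the swarm forms a random mesh.
            others = random.sample(others, self.max_peers)
        return others

tracker = Tracker()
first = tracker.announce("popeye.mp4", ("10.0.0.1", 6881))
second = tracker.announce("popeye.mp4", ("10.0.0.2", 6881))
```

The first peer to announce gets an empty list back (nobody else is in the swarm yet); the second peer learns about the first and can connect to it directly.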

Contents of .torrent file
• URL of tracker
• Piece length – Usually 256 KB
• SHA-1 hashes of each piece in file
  – For reliability
• “files” – allows download of multiple files

Terminology
• Seed: peer with the entire file
  – Original Seed: The first seed
• Leech: peer that’s downloading the file
  – Fairer term might have been “downloader”
• Sub-piece: Further subdivision of a piece
  – The “unit for requests” is a subpiece
  – But a peer uploads only after assembling complete piece

Peer-peer transactions: Choosing pieces to request
• Rarest-first: Look at all pieces at all peers, and request piece that’s owned by fewest peers
  – Increases diversity in the pieces downloaded
    • avoids case where a node and each of its peers have exactly the same pieces; increases throughput
  – Increases likelihood all pieces still available even if original seed leaves before any one node has downloaded entire file

Choosing pieces to request
• Random First Piece:
  – When peer starts to download, request random piece.
    • So as to assemble first complete piece quickly
    • Then participate in uploads
  – When first complete piece assembled, switch to rarest-first
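
The two selection rules above can be sketched together as follows. This is an illustration under simplifying assumptions, not the reference client's code: `choose_piece`, its arguments, and the random tie-breaking among equally rare pieces are all choices made for this example.

```python
import random
from collections import Counter

def choose_piece(have, peer_pieces, rng=random):
    """Pick the next piece index to request, or None if nothing is available.

    have        -- set of piece indices this node has already completed
    peer_pieces -- dict mapping each neighbor to the set of pieces it owns
    """
    # Count how many neighbors own each piece we still need.
    counts = Counter()
    for pieces in peer_pieces.values():
        counts.update(pieces - have)
    if not counts:
        return None
    if not have:
        # Random first piece: no complete piece yet, so pick any available one.
        return rng.choice(sorted(counts))
    # Rarest-first: pick among the pieces owned by the fewest neighbors.
    fewest = min(counts.values())
    return rng.choice(sorted(p for p, c in counts.items() if c == fewest))

# Hypothetical neighborhood of three peers.
peers = {"p1": {0, 1, 2}, "p2": {1, 2}, "p3": {2, 3}}
```

With these neighbors, a node that already holds piece 2 would request piece 0 or piece 3 (each owned by a single neighbor) before piece 1 (owned by two).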


Choosing pieces to request
• End-game mode:
  – When requests sent for all sub-pieces, (re)send requests to all peers.
  – To speed up completion of download
  – Cancel request for downloaded sub-pieces

Tit-for-tat as incentive to upload
• Want to encourage all peers to contribute
• Peer A said to choke peer B if it (A) decides not to upload to B
• Each peer (say A) unchokes at most 4 interested peers at any time
  – The three with the largest upload rates to A
    • Where the tit-for-tat comes in
  – Another randomly chosen (Optimistic Unchoke)
    • To periodically look for better choices
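
The unchoking rule above (three regular slots plus one optimistic slot) might be sketched like this. The function `choose_unchoked` and its parameters are hypothetical, and how often the choice is re-evaluated is left out because the slides do not say.

```python
import random

def choose_unchoked(upload_rate_to_me, interested, regular_slots=3, rng=random):
    """Decide which interested peers peer A unchokes.

    upload_rate_to_me -- dict: peer -> rate at which that peer uploads to A
    interested        -- iterable of peers that want data from A
    Returns (regular, optimistic): the top uploaders, plus one randomly
    chosen peer from the rest (None if nobody is left over).
    """
    # Sort names first so ties in rate are broken deterministically.
    by_rate = sorted(sorted(interested),
                     key=lambda p: upload_rate_to_me.get(p, 0.0),
                     reverse=True)
    regular = by_rate[:regular_slots]        # tit-for-tat: reward best uploaders
    rest = by_rate[regular_slots:]
    optimistic = rng.choice(rest) if rest else None  # optimistic unchoke
    return regular, optimistic

# Hypothetical measured rates (KB/s) at which each peer uploads to A.
rates = {"a": 50, "b": 10, "c": 40, "d": 30, "e": 5}
regular, optimistic = choose_unchoked(rates, rates.keys())
```

The optimistic slot is what lets a newcomer with no upload history (rate 0 here) get a first piece, and what lets A discover a peer faster than its current top three.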

Anti-snubbing
• A peer is said to be snubbed if each of its peers chokes it
• To handle this, snubbed peer stops uploading to its peers
• Optimistic unchoking done more often
  – Hope is that will discover a new peer that will upload to us

Why BitTorrent took off
• Better performance through “pull-based” transfer
  – Slow nodes don’t bog down other nodes
• Allows uploading from hosts that have downloaded parts of a file
  – In common with other end-host based multicast schemes

Why BitTorrent took off
• Practical Reasons (perhaps more important!)
  – Working implementation (Bram Cohen) with simple well-defined interfaces for plugging in new content
  – Many recent competitors got sued / shut down
    • Napster, Kazaa
  – Doesn’t do “search” per se. Users use well-known, trusted sources to locate content
    • Avoids the pollution problem, where garbage is passed off as authentic content

Pros and cons of BitTorrent
• Pros
  – Proficient in utilizing partially downloaded files
  – Discourages “freeloading”
    • By rewarding fastest uploaders
  – Encourages diversity through “rarest-first”
    • Extends lifetime of swarm
• Works well for “hot content”


Pros and cons of BitTorrent
• Cons
  – Assumes all interested peers active at same time; performance deteriorates if swarm “cools off”
  – Even worse: no trackers for obscure content

Pros and cons of BitTorrent
• Dependence on centralized tracker: pro/con?
  – Single point of failure: New nodes can’t enter swarm if tracker goes down
  – Lack of a search feature
    • ☺ Prevents pollution attacks
    • Users need to resort to out-of-band search: well known torrent-hosting sites / plain old web-search

“Trackerless” BitTorrent
• To be more precise, “BitTorrent without a centralized-tracker”
• E.g.: Azureus
• Uses a Distributed Hash Table (Kademlia DHT)
• Tracker run by a normal end-host (not a web-server anymore)
  – The original seeder could itself be the tracker
  – Or have a node in the DHT randomly picked to act as the tracker

Why is (studying) BitTorrent important?
[Figure: (From CacheLogic, 2004)]

Why is (studying) BitTorrent important?
• BitTorrent consumes significant amount of internet traffic today
  – In 2004, BitTorrent accounted for 30% of all internet traffic (Total P2P was 60%), according to CacheLogic
  – Slightly lower share in 2005 (possibly because of legal action), but still significant
  – BT always used for legal software (linux iso) distribution too
  – Recently: legal media downloads (Fox)

Other file-sharing systems
• Prominent earlier: Napster, Kazaa, Gnutella
• Current popular file-sharing client: eMule
  – Connects to the ed2k and Kad networks
  – ed2k has a supernode-ish architecture (distinction between servers and normal clients)
  – Kad based on the Kademlia DHT


File-sharing systems…
• (Anecdotally) Better than BitTorrent in finding obscure items
• Vulnerable to:
  – Pollution attacks: Garbage data inserted with the same file name; hard to distinguish
  – Index-poisoning attacks (sneakier): Insert bogus entries pointing to non-existent files
  – Kazaa reportedly has more than 50% pollution + poisoning

References
• BitTorrent
  – “Incentives build robustness in BitTorrent”, Bram Cohen
  – BitTorrent Protocol Specification: http://www.bittorrent.org/protocol.html
• Poisoning/Pollution in DHT’s:
  – “Index Poisoning Attack in P2P file sharing systems”
  – “Pollution in P2P File Sharing Systems”
