
High Performance P2P Web Caching

Erik Garrison
Jared Friedman

CS264 Presentation
May 2, 2006
SETI@Home

Basic Idea: people donate computer time to look for
aliens

Delivered more than 9 million CPU-years

Listed in the Guinness Book of World Records as the largest computation ever

Many other successful projects (BOINC, Google
Compute)

The point: many people are willing to donate
computer resources for a good cause
Wikipedia

About 200 servers required to keep the site
live

Hosting & hardware costs over $1M per year

All revenue from donations

Hard to make ends meet

Other not-for-profit websites in similar
situation
HelpWikipedia@Home

What if people could donate idle computer
resources to help host not-for-profit
websites?

They probably would!

This is the goal of our project
Prior Work

This doesn't exist

But some things are similar
 Content Distribution Networks (Akamai)

Distributed web hosting for big companies
 CoralCDN/CoDeeN

P2P web caching, like our idea, but with a very different design

Both have some problems
Akamai, the opportunity

Internet traffic is 'bursty'

Expensive to build infrastructure to handle
flash crowds

International audience, local servers
 Sites run slowly in other countries
Akamai, how it works

Akamai put >10,000 servers around the
globe

Companies subscribe as Akamai clients

Client content (mostly images, other media)
is cached on Akamai's servers

Tricks with DNS make viewers download
content from nearby Akamai servers

Result: Website runs fast everywhere, no
worries about flash crowds

But VERY expensive!
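
The "tricks with DNS" above boil down to answering each DNS query with an edge server near the client. As a purely illustrative sketch (Akamai's real, proprietary system is far more sophisticated; the server list, IPs, coordinates, and geolocation step here are all invented), a geo-aware resolver might look like this:

```typescript
// Toy "nearest edge server" resolver. Illustrative only: the servers are
// invented, and real systems also weigh load, network distance, and link
// cost rather than raw geography.

interface EdgeServer {
  ip: string;
  lat: number;
  lon: number;
}

const edgeServers: EdgeServer[] = [
  { ip: "203.0.113.10", lat: 35.7, lon: 139.7 },  // "Tokyo"
  { ip: "198.51.100.20", lat: 52.5, lon: 13.4 },  // "Berlin"
  { ip: "192.0.2.30", lat: 40.7, lon: -74.0 },    // "New York"
];

// Great-circle distance in km (haversine formula).
function distanceKm(aLat: number, aLon: number, bLat: number, bLon: number): number {
  const toRad = (d: number) => (d * Math.PI) / 180;
  const dLat = toRad(bLat - aLat);
  const dLon = toRad(bLon - aLon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(aLat)) * Math.cos(toRad(bLat)) * Math.sin(dLon / 2) ** 2;
  return 2 * 6371 * Math.asin(Math.sqrt(h));
}

// The geo-aware DNS server answers each query with the edge server
// closest to the client's estimated location.
function resolveToNearestEdge(clientLat: number, clientLon: number): string {
  let bestIp = edgeServers[0].ip;
  let bestDist = Infinity;
  for (const s of edgeServers) {
    const d = distanceKm(clientLat, clientLon, s.lat, s.lon);
    if (d < bestDist) {
      bestIp = s.ip;
      bestDist = d;
    }
  }
  return bestIp;
}

// A client near Paris gets the "Berlin" server.
console.log(resolveToNearestEdge(48.9, 2.4)); // "198.51.100.20"
```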
CoralCDN

P2P web caching

Probably the closest system to our goal

Currently in late-stage testing on PlanetLab

Uses an overlay and a 'distributed sloppy
hash table'

Very easy to use – just append '.nyud.net' to
a URL and Coral handles it
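
For example, "Coralizing" a link is a one-line hostname rewrite. This is a minimal sketch of the usage pattern described above (Coral deployments have also used an explicit port on the rewritten hostname, omitted here to match the slide):

```typescript
// Rewrite a URL so it resolves through Coral's DNS, as described above.
function coralize(url: string): string {
  const u = new URL(url);
  u.hostname = `${u.hostname}.nyud.net`;
  return u.toString();
}

console.log(coralize("http://www.example.org/index.html"));
// -> "http://www.example.org.nyud.net/index.html"
```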

Unfortunately ...
Coral: Problems

Currently very slow
 This might improve in later versions
 Or it might be due to the overlay structure

Security: volunteer nodes can respond with
fake data

Any site can use Coral to help reduce load
 Just append .nyud.net to their internal links

Decentralization makes optimization hard
 more on this later
Our Design Goals

Fast: Akamai level performance

Secure: Pages served are always genuine

Fast updates possible

Must greatly reduce demands on main site
 But this cannot compromise the first three goals
Our Design

Node/Supernode structure
 Take advantage of extremely heterogeneous
performance characteristics

Custom DNS server redirects incoming
requests to a nearby supernode

Supernode forwards request to a nearby
ordinary node

Node replies to user
Our Design
(Diagram: request flow)
 User goes to wikipedia.org
 DNS server resolves wikipedia.org to a supernode
 Supernode forwards the request to an ordinary node that has the requested document
 Node retrieves the document and sends it to the user
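
A minimal sketch of that two-hop flow, with every type, field, and function name invented for illustration (the deck does not give an implementation):

```typescript
// Hypothetical data structures; none of these names come from the project.
interface VolunteerNode {
  address: string;
  documents: Set<string>;  // URLs this volunteer currently caches
  load: number;            // current requests per second
  capacity: number;        // estimated maximum requests per second
}

interface Supernode {
  region: string;
  nodes: VolunteerNode[];
}

// Hop 1 (custom DNS server): resolve wikipedia.org to a supernode near
// the client. Real geolocation is elided; regions are matched by name.
function pickSupernode(clientRegion: string, supernodes: Supernode[]): Supernode {
  return supernodes.find(s => s.region === clientRegion) ?? supernodes[0];
}

// Hop 2 (supernode): forward the request to an ordinary node that holds
// the document and is not overloaded, preferring the least-loaded one.
function pickNode(supernode: Supernode, url: string): VolunteerNode | undefined {
  const candidates = supernode.nodes.filter(
    n => n.documents.has(url) && n.load < n.capacity
  );
  candidates.sort((a, b) => a.load / a.capacity - b.load / b.capacity);
  return candidates[0]; // undefined means a cache miss: fall back to the origin
}

// The chosen node then serves the document directly to the user (not shown).
```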
Performance

Requests are answered in only 2 hops

DNS server resolves to a geographically
close supernode

Supernode avoids sending requests to slow
or overloaded nodes

All parts of a page (e.g., html and images)
should be served by a single node
Security

Have to check that nodes serve accurate content

First line of defense: encrypt local content

May delay attacks, but won't stop them
Security

More serious defense: let users check the
volunteer nodes!

Add a JavaScript wrapper to the website that
requests the pages using AJAX

With some probability, the AJAX script will
compute the MD5 of the page it got and send
it to a trusted central node

Central node kicks out nodes that frequently
return invalid MD5 sums

Offload processing not just to nodes, but to
users, with zero-install
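
A browser-side sketch of that spot check, with the sampling rate and reporting endpoint invented for illustration, might look like the following. One practical wrinkle: the browser's built-in SubtleCrypto API has no MD5, so this sketch hashes with SHA-256 (a real MD5 check would pull in a small JavaScript MD5 library):

```typescript
// Minimal sketch of the client-side audit described above. The sampling
// rate and audit URL are assumptions, not part of the original design.

const CHECK_PROBABILITY = 0.01;                                // assumed sampling rate
const AUDIT_ENDPOINT = "https://trusted.example.org/report";   // hypothetical trusted central node

async function fetchAndMaybeAudit(pageUrl: string): Promise<string> {
  // Fetch the page from whatever volunteer node DNS routed us to.
  const response = await fetch(pageUrl);
  const body = await response.text();

  // With small probability, hash the body and report it to the trusted
  // central node, which compares it against the known-good digest.
  if (Math.random() < CHECK_PROBABILITY) {
    const bytes = new TextEncoder().encode(body);
    const digest = await crypto.subtle.digest("SHA-256", bytes);
    const hex = Array.from(new Uint8Array(digest))
      .map(b => b.toString(16).padStart(2, "0"))
      .join("");
    // Fire-and-forget report; the central node kicks out nodes whose
    // reported digests repeatedly mismatch.
    fetch(AUDIT_ENDPOINT, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ url: pageUrl, digest: hex }),
    }).catch(() => { /* auditing is best-effort */ });
  }

  return body;
}
```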
A Tricky Part

Supernodes get requests and have to decide
which node should answer which request

Have to load-balance nodes – no
overloading

Popular documents should be replicated
across many nodes

But don't want to replicate unpopular
documents much – conserve storage space

Lots of conflicting goals!
On the plus side...

Unlike Coral & CoDeeN, supernodes know a
lot of nodes (maybe 100-1000?)

They can track performance characteristics of
each node

Make object placement decisions from a
central point

Lots of opportunity to make really intelligent
decisions
 Better use of resources
 Higher total system capacity
 Faster response times
Object Placement Problem

This kind of problem is known as an object
placement problem
 “What nodes do we put what files on?”

Also related to the request routing problem
 “Given the files currently on the nodes, what
node do we send this particular request to?”

These problems are basically unsolved for
our scenario

Analytical solutions have been derived for very
simplified, somewhat different cases

We suspect a useful analytic solution is
impossible here
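
The deck never spells out a placement algorithm, so the following is only one plausible heuristic, not the one the authors simulated: replicate each document roughly in proportion to its popularity, and put each replica on the node with the most spare capacity. All names and the perReplicaRps parameter are invented:

```typescript
// Illustrative popularity-proportional placement pass.
interface Doc {
  url: string;
  requestsPerSec: number;
}

interface StorageNode {
  id: string;
  capacityRps: number;   // how many requests/sec this node can serve
  assignedRps: number;   // load already assigned by the planner
  docs: string[];
}

function placeDocuments(docs: Doc[], nodes: StorageNode[], perReplicaRps = 5): void {
  // Most popular documents first, so they spread out before nodes fill up.
  const byPopularity = [...docs].sort((a, b) => b.requestsPerSec - a.requestsPerSec);

  for (const doc of byPopularity) {
    // Replica count ~ popularity; at least one copy, at most one per node.
    const replicas = Math.min(
      nodes.length,
      Math.max(1, Math.ceil(doc.requestsPerSec / perReplicaRps))
    );
    for (let i = 0; i < replicas; i++) {
      // Give each replica to the node with the most spare capacity
      // that does not already hold the document.
      const target = nodes
        .filter(n => !n.docs.includes(doc.url))
        .sort((a, b) =>
          (b.capacityRps - b.assignedRps) - (a.capacityRps - a.assignedRps))[0];
      if (!target) break;
      target.docs.push(doc.url);
      target.assignedRps += doc.requestsPerSec / replicas;
    }
  }
}
```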
Simulation

Too hard to solve analytically, so do a
simulation

Goal is to explore different object placement
algorithms under realistic scenarios

Also want to model the performance of the
whole system
 What cache hit ratios can we get?
 How does number/quality of peers affect cache
hit ratios?
 How is user latency affected?

Built a pretty involved simulation in Erlang
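
The simulator itself (written in Erlang) is not shown in the deck; the toy loop below is only meant to make the "cache hit ratio" metric concrete, with the document count, request count, and popularity skew all invented, and with cache capacity, eviction, and churn deliberately ignored:

```typescript
// Toy request-replay loop: what fraction of requests never touch the origin?
function simulateHitRatio(numDocs: number, numRequests: number): number {
  const cached = new Set<number>();   // docs already replicated onto volunteer nodes
  let originHits = 0;                 // requests the main site had to answer

  for (let i = 0; i < numRequests; i++) {
    // Crude popularity skew (not a true Zipf distribution): low document
    // ids are requested far more often than high ones.
    const doc = Math.floor(numDocs * Math.pow(Math.random(), 3));
    if (!cached.has(doc)) {
      originHits++;                   // first request goes to the origin...
      cached.add(doc);                // ...after which the P2P cache holds a copy
    }
  }
  return 1 - originHits / numRequests; // fraction served entirely by volunteers
}

console.log(simulateHitRatio(10_000, 1_000_000).toFixed(3));
```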
Simulation Results

So far, encouraging!

Main results were obtained using a heuristic
object placement algorithm

Can load-balance without creating hotspots
up to about 90% of theoretical capacity

Documents rarely requested more than once
from central server

Close to theoretical optimum
Next Steps

Add more detail to simulation
 Node churn
 Better internet topology

Explore update strategies

Obviously, an actual implementation would
be nice, but not likely to happen this week

What do you think?
