
Are Virtualized Overlay Networks Too Much of a Good Thing?

Pete Keleher, Bobby Bhattacharjee, Bujor Silaghi


Department of Computer Science
University of Maryland, College Park
keleher@cs.umd.edu

1 Introduction

Peer-to-peer (P2P) networks have recently become one of the hottest topics in OS research [3, 2, 10, 5, 9, 4, 8]. Starting with the explosion in popularity of Napster, researchers have become interested because of the unparalleled chance to do relevant research (people might use it!), and the brain-dead approach of many of the first P2P protocols.

The majority of recent high-profile work has described middleware that performs a single task: distributed lookup. This seemingly simple function can be used as the basic building block of more complex systems that perform a variety of sophisticated functions (file systems, event notification systems, etc.).

P2P networks differ from more conventional infrastructure in that the load (whether CPU cycles or packet routing/forwarding) is distributed across participating peers. This load should ideally be balanced, as the load is in some sense the "payment" for participating in the network. Overloading some peers while letting others off without performing any work towards the common good is clearly unfair.

The approach many recent systems [9, 10, 4, 5] have taken towards ensuring load balance is to virtualize data item keys by creating the keys from one-way hashes (SHA-1, etc.) of the item labels. The peer node IDs are similarly encoded, and data items are mapped to the "closest" nodes by comparing keys and hashed node IDs. We refer to this as virtualization of the namespace. By contrast, a "non-virtualized" system is one where data items are served by the same nodes that export them.

A virtualized approach helps load balance because data items from one very hot site will be served by different nodes; they will be distributed randomly among participating peers. Similarly, routing load is distributed because paths to items exported by the same site are usually quite different.

Just as importantly, virtualization of the namespace provides a clean, elegant abstraction of routing, with provable bounds on routing latency.

The contention of this position paper is that this virtualization comes at a significant cost, as described below:

1. Virtualization destroys locality: by virtualizing keys, data items from a single site are not usually co-located, meaning that opportunities for enhancing browsing, prefetching, and efficient searching are lost.

2. Virtualization discards useful application-specific information: the data used by many applications (file systems, auctions, resource discovery) is naturally described using hierarchies. A hierarchy exposes relationships between items near each other in the hierarchy; virtualization of the namespace discards this information.

The rest of this paper elaborates on these points and outlines an alternative approach.

To be clear, the environment assumed in this paper is that of a set of cooperating, widely-separated peers, running as user-level processes on ordinary PCs or workstations. Peers "export" data, and keys are "mapped" onto overlay servers. The set of peer nodes that export data is also the set of overlay servers. A "node" is a process participating in the system.
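To make the hash-based mapping concrete, the following sketch (hypothetical Python in the style of Chord's successor mapping, not any particular system's code) virtualizes item labels and node addresses into one flat identifier space and assigns each item to the node whose hashed ID most closely follows its key:

import hashlib

def virtualize(label: str) -> int:
    # A one-way hash (SHA-1) of the label yields a key in a flat
    # 160-bit identifier space; all structure in the name is lost.
    return int.from_bytes(hashlib.sha1(label.encode()).digest(), "big")

def closest_node(key: int, node_addrs: list[str]) -> str:
    # Node addresses are hashed into the same space; the item is
    # mapped to the node whose ID is the key's successor (mod 2^160).
    ids = {virtualize(addr): addr for addr in node_addrs}
    return ids[min(ids, key=lambda nid: (nid - key) % 2**160)]

peers = ["peer1.cs.umd.edu", "peer2.cs.umd.edu", "peer3.cs.umd.edu"]
print(closest_node(virtualize("/siteA/page1.html"), peers))
print(closest_node(virtualize("/siteA/page2.html"), peers))

Because SHA-1 output is effectively uniform, the two pages exported by the same site will usually map to different, unrelated nodes. That is precisely the load-balancing property described above, and precisely the loss of locality discussed next.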
2 Locality is a Good Thing

The first form of locality with which we are concerned is spatial locality. Users who access d_i are more likely to also access d_{i+1} than some arbitrary d_j. Consider web browsing: a given page might require many nearby items to be accessed, and the next page accessed by a user is likely to be on the same site as well.

Virtualization of this process loses several opportunities for performance improvement. First, the route to the exporting site only has to be discovered once in a non-virtualized system. Subsequent accesses to a second data item can follow the same route. In a virtualized system, there will likely be nothing in common between the two routes.

Second, an exporting site (or the access initiator) might choose to prefetch nearby data in a non-virtualized system. Prefetching, when it works, enables the system to hide the latency of remote accesses. While prefetching can be made to work with a virtualized namespace, it is much more difficult. For example, CFS [1], a cooperative file system built on top of Chord [9], can prefetch file blocks. A peer receiving a request for block i of a file can prefetch the next by locally reconstructing the virtualized name for block i+1 and sending a prefetch message to the site serving it. However, each such prefetch requires a network message. Worse, prefetching blocks is relatively easy only because the name of the nearby object (block i+1) is easy to predict. The names of nearby items on the exporting site are not easy to predict, and prefetching could probably only be accomplished via an application overlay that indexes exporting sites.
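To see both the cost and the limitation, here is a sketch of that prefetch (hypothetical Python; CFS's real interfaces and block naming differ, and the "overlay" handle with lookup/send primitives is assumed for illustration):

import hashlib

def block_key(file_id: str, index: int) -> bytes:
    # A CFS-like scheme derives each block's key from the file
    # identifier and block number, so the name of block i+1 is
    # trivially computable. (The naming scheme here is illustrative.)
    return hashlib.sha1(f"{file_id}:{index}".encode()).digest()

def serve_block(file_id: str, index: int, overlay) -> bytes:
    data = overlay.lookup(block_key(file_id, index))
    # The prefetch of block i+1 still costs a network message:
    # hashing has almost certainly placed it on a different node.
    overlay.send(block_key(file_id, index + 1), "PREFETCH")
    return data

The trick works only because the name "file:i+1" is computable from "file:i"; for arbitrary items exported by the same site, no such rule exists.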
Current approaches also fail to exploit temporal locality as much as they might. None of Chord, CAN [4], Pastry [5], or Tapestry [10] currently uses caching. Repeated accesses by one site to the same data item require repeated traversals through the overlay network. However, caching could easily be added to these systems, and is used in some applications built on top of them (e.g., PAST [6], CFS). Further, some systems (most notably Pastry) attempt to exploit locality in charting a route through the overlay network's nodes. The result is that the total IP hop count may be only a small constant higher than that of a native IP message.

Note that there is an implicit assumption of spatial locality in all of the virtualization work. Virtualization distributes load well only assuming that the only form of locality present is spatial, not temporal. Stated another way, virtualization cannot improve load balance in the presence of single-item hotspots; it can only distribute multiple (possibly related) data items across the network.

By contrast, current systems use replication both to provide high availability and to distribute load when single items become hot spots.

3 Searching

Most distributed object stores require some search capability in order to be used effectively. We distinguish two types of searching: indexing and keyword/attribute searching. "Indexing" refers to indexing entire documents, e.g., Google's index of the web. Indexing of documents served by a distributed overlay system requires local indexes to be created, combined, and then served, presumably through the same overlay system, although this could also be done at a higher level. The difficulty of indexing is not affected by whether the namespace is virtualized; it is a hard problem for any distributed system.

As an example of attribute searching, assume that we wish to search for "red cars", where 'red' and 'car' are encoded as attributes of certain documents. Virtualized namespaces do not encode attributes in the overlay structure, so this search could only be accomplished by visiting every node or by resorting to some higher-level protocol. We discuss how embedding attributes in the overlay structure of a non-virtualized system allows efficient attribute searching in Section 5.

4 Adding Information Back In

There are two approaches to adding application-specific information and support for locality back into a virtualized system: use of higher-level application layers, and eliminating virtualization entirely. We discuss the former here, and one approach to the latter in the next section.

Support for locality in the query stream can be added back at higher levels.
As an example, both PAST and CFS cache data along paths to highly popular items. The advantage of doing so at the file system layer rather than the routing layer is that the location information and the file's data itself are cached together. If address caching were performed at the routing level, file caching would be less effective.

File system prefetching can also be accomplished at this level. For example, one could prefetch the rest of a directory after one file is accessed by (1) deriving the file's directory name from the target file's name, (2) routing a message to the directory object, (3) reading the directory object to get the set of other files in the directory, and then (4) sending prefetches to each of those files, as sketched below.
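A sketch of those four steps (hypothetical Python; the "overlay" handle and its lookup/send primitives are assumed for illustration, not drawn from any real system):

import posixpath

def prefetch_directory(file_path: str, overlay) -> None:
    # (1) Derive the directory's name from the target file's name.
    dir_path = posixpath.dirname(file_path)
    # (2) Route a message to the directory object.
    dir_obj = overlay.lookup(dir_path)
    # (3) Read the directory object to get the set of other files.
    siblings = [f for f in dir_obj.entries() if f != file_path]
    # (4) Send a prefetch for each file; hashing scatters the files,
    # so every prefetch is a separate overlay operation.
    for sibling in siblings:
        overlay.send(sibling, "PREFETCH")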
However, not only is this inefficient, but it only makes use of information about accesses to a single file. Consider how a system with a virtualized namespace would support a policy that prefetches entire directories only if two or more files of the directory are accessed within a short time.

5 Eschewing Virtualization

Consider the types of applications that are being built on top of the overlay networks discussed so far: file systems, event notification systems, distributed auctions, and cooperative web proxies. All of these applications organize their data hierarchically, and any locality in these applications is locality within the framework of this hierarchy. Browsing the hierarchy is, therefore, the only way of extracting and exploiting locality. Yet this information is discarded by virtualized namespaces.

Another approach is to encode this hierarchy directly into the overlay layer. While this paper is not about TerraDir, we discuss it as an example approach that addresses some of the shortcomings discussed above. A TerraDir [8] is a non-virtualized overlay in the form of a rooted hierarchy that explicitly codifies the application hierarchy. By default, routing is performed via tree traversal, taking O(log n) hops in the worst case (assuming a relatively well-balanced tree). Availability, load balance, and latency are all addressed further by caching and replication. The degree to which a node is replicated depends on the node's level in the tree. This approach helps load balance and adds only a constant amount of overhead per node, regardless of the size of the system.

TerraDir provides performance comparable to the other systems (probably somewhat better latency because of the caching, probably a bit worse load distribution), but leaves application locality and hierarchical information intact.

Locality is retained because a given data item is mapped to the node that exports it, rather than to another, randomized host. Not only does this save a level of indirection, but co-located items are mapped to the same locations, meaning locality can be recognized and exploited without network communication. Note that replication is per-node: all items exported by a node are replicated together, so any replica can perform prefetching.

Caching addresses both spatial and temporal locality. Repeated accesses to the same remote object are serviced by a local cache if caching of data is turned on. Otherwise, the cache provides the network address of the exporting node, limiting routing to a single hop through the overlay network.

Accesses to items "near" each other in the application hierarchy are handled efficiently because they are also near each other, or co-located, in the overlay network. Hence, the number of hops in the overlay network is again minimized.

Data item keywords are explicitly encoded in the overlay hierarchy, so searching for keywords is handled efficiently. For example, consider searching for red cars in the hierarchy shown in Figure 1. The query would be of the form "/vehicles/cars/red/*", and would be routed to the smallest subtree on the left side of the figure. The wildcard then causes the query to be split and to flood that subtree. However, the rest of the hierarchy, aside from the path from the query initiator to the "cars/red" subtree, is untouched. By contrast, searching for "red cars" can only be accomplished efficiently via some higher-level service in a virtualized system.

[Figure 1: An example TerraDir. The root "vehicles" has children "cars", "planes", and "boats", each with color-named leaves (red, blue, red, blue, green, pink). Searching for "red cars" is more efficient than searching for "anything red".]
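The following sketch (hypothetical Python, loosely modeled on the hierarchy of Figure 1 rather than on TerraDir's actual implementation) shows why such a query stays inside one subtree: literal path components each select a single child, and only a trailing wildcard fans out:

class Node:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or {}   # label -> child Node

def route(node, components, hops=0):
    # Literal components route by tree traversal, one overlay hop
    # each; a trailing "*" floods only the subtree reached so far.
    if not components:
        return [node], hops
    head, rest = components[0], components[1:]
    if head == "*":
        matched = []
        for child in node.children.values():
            sub, hops = route(child, ["*"], hops + 1)
            matched += sub
        return matched or [node], hops
    return route(node.children[head], rest, hops + 1)

# Leaf placement is illustrative (see Figure 1):
root = Node("vehicles", {
    "cars":   Node("cars",   {"red": Node("red"), "blue": Node("blue")}),
    "planes": Node("planes", {"red": Node("red"), "blue": Node("blue")}),
    "boats":  Node("boats",  {"green": Node("green"), "pink": Node("pink")}),
})
hits, hops = route(root, ["cars", "red", "*"])  # "/vehicles/cars/red/*"

The query reaches the "cars/red" subtree in two hops and floods nothing else; "anything red" has no literal prefix, so it would degenerate to flooding from the root.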
To be fair, note that the query is handled efficiently only because the query structure matches the hierarchy's structure. Searching for "anything red" would cause all leaves to be visited. This problem is addressed by allowing "views" to be dynamically materialized. A client that expects to make multiple queries with a different structure (e.g., "all red things", then "all blue things", etc.) inserts a view query into the system. The view query specifies an ordering on the set of attributes (view queries can also name tag functions, which can be seen as dynamic attributes synthesized from the static attributes); this ordering is used to build a new overlay hierarchy. Building the new hierarchy requires the entire tree to be visited once; subsequent queries will be handled efficiently.
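As a sketch of the idea (hypothetical Python; TerraDir's actual view mechanism is not specified in this paper), a view is simply a new attribute ordering, and materializing it is one pass over the items:

def materialize_view(items, attribute_order):
    # items: attribute dicts, e.g. {"kind": "car", "color": "red"}.
    # One pass over all items builds a tree keyed by the new
    # ordering; afterwards, matching queries route efficiently.
    root = {}
    for item in items:
        level = root
        for attr in attribute_order:
            level = level.setdefault(item[attr], {})
        level.setdefault("_items", []).append(item)
    return root

items = [{"kind": "car", "color": "red"},
         {"kind": "plane", "color": "red"},
         {"kind": "boat", "color": "pink"}]
# The default hierarchy orders kind before color ("/cars/red/...");
# a client planning "all red things" queries materializes a view
# that orders color first:
by_color = materialize_view(items, ["color", "kind"])
assert set(by_color["red"]) == {"car", "plane"}

After materialization, "red" names exactly the subtree such queries need to flood.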
The TerraDir approach has at least two other important advantages. First, maintenance overhead is significantly lower: the virtualized approaches generally require log(n) operations to allow a node to leave or join the overlay, whereas these operations require only a constant number of operations under TerraDir. Finally, TerraDir maps the key of a node back to that same node, meaning that the data item and its mapping are not distributed across administrative boundaries.

6 Summary

Distributed lookup services using virtualized namespaces can be important building blocks for building sophisticated P2P applications. Namespace virtualization provides load balance and tight bounds on latency at low cost.

In doing so, however, it discards potentially useful information (application hierarchies) and relationships (proximity within the hierarchy). This is not always a problem: certain types of functionality are more efficiently provided at higher layers (this is merely the end-to-end argument [7]). However, many applications can benefit from increased functionality in the lookup layer.

We advocate encoding application hierarchies directly into the structure of the overlay network. This approach allows systems to exploit locality between objects and to provide searching without centralized indexing or flooding.

References

[1] Frank Dabek, M. Frans Kaashoek, David Karger, Robert Morris, and Ion Stoica. Wide-area cooperative storage with CFS. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP '01), Chateau Lake Louise, Banff, Canada, October 2001.

[2] John Kubiatowicz, David Bindel, Yan Chen, Steven Czerwinski, Patrick Eaton, Dennis Geels, Ramakrishna Gummadi, Sean Rhea, Hakim Weatherspoon, Westley Weimer, Chris Wells, and Ben Zhao. OceanStore: An architecture for global-scale persistent storage. In Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2000), 2000.

[3] Karin Petersen, Mike Spreitzer, Douglas B. Terry, Marvin Theimer, and Alan J. Demers. Flexible update propagation for weakly consistent replication. In Symposium on Operating Systems Principles, pages 288-301, 1997.

[4] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, and Scott Shenker. A scalable content-addressable network. In Proceedings of the ACM SIGCOMM 2001 Technical Conference, 2001.

[5] Antony Rowstron and Peter Druschel. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. In Proceedings of the 18th IFIP/ACM International Conference on Distributed Systems Platforms (Middleware 2001), 2001.
[6] Antony Rowstron and Peter Druschel. Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP '01), 2001.

[7] Jerome H. Saltzer, David P. Reed, and David D. Clark. End-to-end arguments in system design. ACM Transactions on Computer Systems, 2(4):277-288, 1984.

[8] Bujor Silaghi, Samrat Bhattacharjee, and Pete Keleher. Query routing in the TerraDir distributed directory. Submitted for publication, 2001.

[9] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of the ACM SIGCOMM '01 Conference, San Diego, California, August 2001.

[10] B. Zhao, J. Kubiatowicz, and A. Joseph. Tapestry: An infrastructure for fault-resilient wide-area location and routing. Technical Report UCB//CSD-01-1141, University of California at Berkeley, 2001.
