Resource Discovery Mechanisms For Peer-To-Peer Systems

Rozlina Mohamed
Computer Science Department
University Malaysia Pahang
Kuantan, MALAYSIA
E-mail: rozlina@ump.edu.my
Siti Zanariah Satari
Mathematical Department
University Malaysia Pahang
Kuantan, MALAYSIA
E-mail: zanariah@ump.edu.my
Abstract- One of the most important problems for peer-to-peer
(P2P) systems is searching for and retrieving the correct
information in response to a query. In this paper, we
discuss several existing resource discovery mechanisms that
have been widely implemented in P2P networks, and we
classify P2P systems according to those mechanisms. We then
present a taxonomy of P2P resource discovery as a hierarchy
based on our classification, which is intended to identify the
generalities among resource discovery mechanisms. We also
highlight the challenges of providing a resource discovery
mechanism in P2P networks. The paper concludes with
metrics for measuring the cost and benefit of choosing the
right resource discovery mechanism for a P2P system.
Keywords- peer-to-peer, super-peer network, searching
I. INTRODUCTION
Peer-to-peer (P2P) systems are designed for sharing
computer resources such as files, storage, memory and even
CPU cycles through direct exchange. In contrast to systems
that require the intermediation or support of a centralized
server, a P2P system is characterized by its ability to
discover participating peers autonomously. Fully
decentralized and unstructured P2P networks such as
Gnutella [1] are attractive because they do not require any
centralized directory. Furthermore, each peer in Gnutella is
fully autonomous: the system has no control over the
network topology or data placement. However, the query
algorithm used in Gnutella works by flooding a query message
through the network [1-4]. A peer node in a Gnutella-based
system is known as a servent, because each peer is both a
client and a server. Figure 1 illustrates how a peer locates
and then downloads a file in Gnutella. The figure is a
simplified illustration: it does not show the actual
download, and details concerning the establishment of TCP
connections are omitted.
Although the flooding-based query is simple and robust,
it wastes a great deal of bandwidth, because most of the
neighbors that receive the query often do not hold the
required data and therefore do not reply. As illustrated in
Figure 1, Peer7 and Peer8 are contacted twice, while Peer7
does not give any response. These unproductive query
messages contribute to flooding the network.
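The bandwidth waste described above can be made concrete with a minimal sketch of a flooding query. This is not the Gnutella wire protocol: the topology, the peer names, the `holders` set and the time-to-live value `ttl` are all assumptions chosen for illustration, and the function only counts how many query messages are sent and which peers actually hold the data.

```python
from collections import deque


def flood_query(graph, source, holders, ttl):
    """Flood a query from `source` and return (messages_sent, peers_with_data).

    graph   -- dict mapping each peer to the list of its neighbors (hypothetical topology)
    holders -- set of peers that hold the requested data (hypothetical)
    ttl     -- maximum number of hops a query copy may travel
    """
    seen = {source}                          # peers that already processed the query
    frontier = deque([(source, None, ttl)])  # (peer, peer it got the query from, hops left)
    messages = 0
    hits = set()
    while frontier:
        peer, sender, remaining = frontier.popleft()
        if remaining == 0:
            continue                         # TTL exhausted: this copy is not forwarded
        for neighbor in graph[peer]:
            if neighbor == sender:
                continue                     # do not send the query straight back
            messages += 1                    # every forwarded copy consumes bandwidth
            if neighbor in seen:
                continue                     # duplicate copy: dropped, but already paid for
            seen.add(neighbor)
            if neighbor in holders:
                hits.add(neighbor)
            frontier.append((neighbor, peer, remaining - 1))
    return messages, hits


# Hypothetical five-peer overlay; only P4 holds the requested file.
graph = {
    "P1": ["P2", "P3"],
    "P2": ["P1", "P4", "P5"],
    "P3": ["P1", "P5"],
    "P4": ["P2"],
    "P5": ["P2", "P3"],
}
print(flood_query(graph, "P1", {"P4"}, ttl=3))  # (6, {'P4'})
```

In this toy overlay six query messages are sent to find a single data source, and two of them are duplicates delivered to peers that had already seen the query.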
Several approaches have been introduced to solve the
above-mentioned problem [5-11]. In this paper, we provide
a classification of the P2P resource discovery mechanisms
that have been proposed by several researchers for solving
the flooding problem. This classification helps developers
decide which discovery mechanism suits the needs of a given
P2P system. To that end, we develop metrics for measuring
cost and benefit, based on our proposed classification. The
remainder of this paper is organized as follows. Section 2
reviews some resource discovery approaches, followed by a
discussion of our proposed classification taxonomy in
Section 3. Section 4 briefly explains some challenges in the
implementation of resource discovery and concludes the
paper.
II. RESOURCE DISCOVERY
Owing to the weaknesses of the flooding-based query, the
use of resource discovery mechanisms has grown together
with the size and diversity of P2P networks. An
implementation of the resource discovery mechanism known
as topology-aware routing has attracted significant
attention, since it has been shown to route query messages
efficiently. Instead of blindly sending the query to its
neighbors, a peer has to be aware of the query message
destination before sending the query. As P2P is a popular
platform for sharing huge volumes of data, its usability
depends heavily on effective techniques for finding and
retrieving the correct information. Therefore, discovering the
resources that hold the required information has a
significant impact on the success of searching for
and retrieving query results.
2009 Second International Conference on Computer and Electrical Engineering
978-0-7695-3925-6/09 $26.00 2009 IEEE
DOI 10.1109/ICCEE.2009.32
The term resource is defined in [12] and [13] as any
source of supply, which may consist of files, file systems,
memory, CPU capability, communication capability, etc.
We therefore define resource discovery as the process of
searching for prospective data locations. Searching in P2P
can be divided into two general categories: i) providing
a directory of data provider locations or of routes towards
data sources, and ii) replicating the data from the sources.
Both categories require some kind of data indexing. To
discover specific data, a peer has to know some identifiers
or keys in advance. In practice, the data identifier is
obtained from a structure in the resource discovery
mechanism called the index. This index, also known as the
routing index, is either centralized or distributed. Since the
search is based on the index, this kind of search is also
called structured search. A centralized index places the
index of shared data or files at a central server, whereas in
the distributed index approach the index is scattered across
several different servers.
A. Centralized Index
The centralized index was popularized by Napster and its
clients, where the index of data or file locations is kept on
a central server. Each peer maintains a connection to the
central server. Instead of being blindly broadcast as in
Gnutella, each query in Napster is sent to the central
server to discover the locations from which the query
result can be obtained. Figure 2 illustrates a Napster client
querying the directory server and downloading a file. The
process starts when the peer tries to connect to the Napster
directory server. Once the directory server accepts the
connection, the peer sends a query to the directory server
describing the music file wanted. Because the query message
is then sent only to the selected candidate peers, the number
of query messages in the entire network is greatly reduced
compared to Gnutella query flooding [14]. As shown in
Figure 2, Peer4 does not receive any query message from
Peer1, because Peer4 is not among the data source locations
in the message returned by the server.
The most important part of the Napster system is the
directory server. It contains a large database of available
music files. A data source peer connecting to the Napster
directory server adds descriptions of all its music files to
the database. Queries to the directory server can then be
processed efficiently. However, Napster-style systems are
vulnerable to censorship and malicious attack, since the
single server is a single point of failure. In addition, they
are not inherently scalable, because of limitations on the
size of the index directory and on the server's capacity to
respond to peers' queries. Because central directories are
not always up to date, they have to be refreshed periodically.
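The register-then-lookup interaction with a central directory can be sketched as follows. This is a minimal illustration, not Napster's actual protocol or data model: the class name `DirectoryServer`, its methods and the file names are hypothetical.

```python
from collections import defaultdict


class DirectoryServer:
    """Hypothetical central index: maps each file name to the peers sharing it."""

    def __init__(self):
        self._index = defaultdict(set)   # file name -> set of peer names

    def register(self, peer, files):
        """Called when a data source peer connects and describes its files."""
        for name in files:
            self._index[name].add(peer)

    def unregister(self, peer):
        """Remove a departing peer so the directory does not go stale."""
        for peers in self._index.values():
            peers.discard(peer)

    def lookup(self, name):
        """Return the candidate peers for a query; no other peer is contacted."""
        return sorted(self._index.get(name, ()))


server = DirectoryServer()
server.register("Peer2", ["song_a.mp3", "song_b.mp3"])
server.register("Peer3", ["song_a.mp3"])
server.register("Peer4", ["song_c.mp3"])
print(server.lookup("song_a.mp3"))   # ['Peer2', 'Peer3']
```

The querying peer then contacts only the returned peers, which is why a peer that does not share the file receives no query message at all. The sketch also shows the weakness noted above: every lookup depends on the one `DirectoryServer` instance.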
B. Distributed Index
The distributed index is also known as a structured P2P
overlay network. An overlay network is a computer network
that is built on top of a physical network. Nodes in the
overlay can be thought of as being connected by virtual or
logical links, each of which corresponds to a path, perhaps
through many physical links, in the underlying network.
Examples of such networks are CAN, Chord, Pastry,
Tapestry and HyperCup. A key or identifier is mapped
onto a data value in the distributed index. The distributed
infrastructure of this index provides hash-table-like
functionality at Internet scale. Here, the distributed
hash table (DHT) is used as the routing table. The DHT
keeps track of the topology of the participating peers in
order to keep the number of routing hops within a certain
limit.
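The key-to-peer mapping that gives a DHT its hash-table-like behaviour can be illustrated with a small consistent-hashing ring. This is only a sketch under simplifying assumptions: real systems such as Chord add per-node routing tables so that no node needs the full membership list, whereas the hypothetical `HashRing` below knows every peer and shows only which peer is responsible for a key. The 16-bit ring size and all names are chosen for illustration.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 16   # illustrative ring size; deployed DHTs use far larger identifier spaces


def ring_id(name):
    """Hash a peer name or data key onto the identifier ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


class HashRing:
    """Hypothetical global view of a DHT: each key belongs to its successor peer."""

    def __init__(self, peers):
        # Sort peers by their position on the ring.
        self._points = sorted((ring_id(p), p) for p in peers)

    def responsible_peer(self, key):
        """Return the first peer at or after the key's position, wrapping around."""
        ids = [pid for pid, _ in self._points]
        pos = bisect_left(ids, ring_id(key))
        return self._points[pos % len(self._points)][1]


ring = HashRing(["peerA", "peerB", "peerC", "peerD"])
owner = ring.responsible_peer("song_a.mp3")
print(owner)   # one of the four peers, always the same one for this key
```

Because every peer computes the same hash, any peer can work out where a key lives without flooding the network; the routing tables of an actual DHT serve to reach that peer in a bounded number of hops.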
A DHT-based routing table can be placed either at every
node in the peer-to-peer network or at selected nodes.
Placing the routing table at selected nodes yields a
super-peer network, in which selected peer nodes keep the
routing table on behalf of the other peers [15-18]. However,
a super-peer node is not a permanent server. As an
autonomous peer, the super-peer node is free to leave the
P2P network as well. In addition, the super-peer node only
keeps the routing index; it is not responsible for further
processing of peer queries.
Keeping the routing index on every participating peer
would enable each peer to determine the routing direction
precisely. However, keeping the routing index imposes
an additional burden that may not be suitable for all peers.
In practice, peers do not have equal performance. Thus,
assigning the routing index to selected nodes appears to be
a solution to the peer performance problem. Further
details on the super-peer network are discussed in the
following subsections.
C. Super-peer Network
Peers in a super-peer network are clustered into several
network subnets. The clusters are built up based on shared
data, usage history and peer interest [15, 19, 20]. Peers have
to register their participation before joining the P2P
network. During registration, they have to declare the
information required to allocate them to a cluster. A peer
may belong to more than one cluster. The respective
super-peer updates its routing index when a new peer joins
the network or when a peer updates its shared data. Further
details on super-peer index registration, update and
withdrawal can be found in [15]. Within a network cluster,
the super-peer is considered the hub for the other client
peers. We use the term client peers to distinguish the roles
of peers within a cluster. The super-peer network
is illustrated in Figure 3.
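The registration, update and withdrawal steps described above can be sketched as operations on a super-peer's routing index. This is a hypothetical illustration rather than the protocol of [15]: the class `SuperPeer`, the use of topic strings as the declared information, and all peer names are assumptions.

```python
class SuperPeer:
    """Hypothetical hub of one cluster: keeps a routing index for its client peers."""

    def __init__(self, name):
        self.name = name
        self.routing_index = {}   # client peer -> set of topics it declared

    def register(self, client, topics):
        """A client joins the cluster, or re-declares after updating its shared data."""
        self.routing_index[client] = set(topics)

    def withdraw(self, client):
        """A client leaves; its entries are dropped from the routing index."""
        self.routing_index.pop(client, None)

    def route(self, topic):
        """Return only the clients whose declared topics match the query."""
        return sorted(c for c, t in self.routing_index.items() if topic in t)


music = SuperPeer("SP-music")
papers = SuperPeer("SP-papers")

# A peer may belong to more than one cluster.
music.register("Peer1", ["jazz", "rock"])
papers.register("Peer1", ["p2p"])
music.register("Peer2", ["rock"])

print(music.route("rock"))    # ['Peer1', 'Peer2']
print(papers.route("p2p"))    # ['Peer1']
```

In line with the text, the super-peer here only answers the question of where to send a query; the query itself would still be processed by the client peers returned from `route`.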
Distributing the routing index alleviates the single point of
failure of the centralized index. Although super-peer
clusters are efficient, scalable and manageable, the
super-peer itself becomes a single point of failure for the
peer nodes within its cluster [14]. To avoid a single point of
failure for the queries of the clients in a cluster, some policy
of super-peer redundancy should be considered. When a
super-peer fails or is suddenly disconnected from the
network, such a policy should allow another peer to take
over the job of the super-peer.
D. Redundant Super-peer
Super-peer redundancy was introduced to increase the
reliability of the super-peer topology by replicating the
super-peer index on several peers within the same subnet
[21, 22]. Several peers are assigned as super-peers for the
same cluster of peers. However, super-peer redundancy
comes at a cost in query processing. This is because, with
several super-peers assigned to each cluster of peers, the
number of super-peer connections increases by a factor of
k^2, where k is the number of super-peers in a cluster. Hence,
the aggregate cost also increases. The costs involved are
bandwidth, processing and storage.
Although the workload of each super-peer is reduced to
about load/k, compared with a single super-peer serving the
same number of participating peers, there is a trade-off
between reliability and the increased cost of bandwidth,
workload and storage for the P2P network in aggregate.
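The trade-off can be put into numbers with a short sketch that takes the two figures from the text at face value: each of the k super-peers carries about load/k of the cluster's work, while the redundancy overhead grows by a factor of k^2. The function name and the sample load of 1200 queries are assumptions made for illustration; this is not a validated cost model.

```python
def redundancy_tradeoff(cluster_load, k):
    """Return the per-super-peer load and the overhead growth factor for k super-peers.

    cluster_load -- work a single super-peer would carry for the cluster (any unit)
    k            -- number of redundant super-peers in the cluster
    """
    if k < 1:
        raise ValueError("a cluster needs at least one super-peer")
    return {
        "per_super_peer_load": cluster_load / k,   # load is shared among the k partners
        "overhead_factor": k * k,                  # k^2 growth quoted in the text
    }


for k in (1, 2, 3):
    print(k, redundancy_tradeoff(1200, k))
# 1 {'per_super_peer_load': 1200.0, 'overhead_factor': 1}
# 2 {'per_super_peer_load': 600.0, 'overhead_factor': 4}
# 3 {'per_super_peer_load': 400.0, 'overhead_factor': 9}
```

The individual load falls only linearly in k while the overhead factor rises quadratically, which is why a small k is usually the interesting range to examine.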
E. Hierarchical super-peer
Another approach to reducing the probability of a single
point of failure in a super-peer network is to structure the
super-peer index hierarchically. Since the number of
peers involved in maintaining the index is increased, the
possibility of a single point of failure is reduced. The main
aim of the hierarchical schema structure approach is to
increase system scalability by reducing the number of
complete mappings between data source peers [8]. There are
two approaches to hierarchical schema structures: fixed
structure and flexible structure [23].
The XPeer system specifies three levels of schema
hierarchy, whereas in iXPeer the number of levels varies
and the hierarchy structure can be derived from any
abstraction. Although it reduces the probability of a
single point of failure in the super-peer topology, the
hierarchical schema index can still have a single point of
failure if only one peer is involved at each hierarchy level.
Therefore, implementing the abstraction of the super-peer
schema index only reduces the probability of a single point
of failure in a super-peer network; it does not solve
the entire problem.
F. Publish/Subscribe
Besides the DHT-based index, publish/subscribe can
be seen as an alternative model for indexing.
Publish/subscribe consists of three entities: the publisher,
which is the peer that sends the data; the subscriber, which
is the peer that consumes the data; and the index, which
mediates between publisher and subscriber. Instead of
routinely notifying peers of data updates, the super-peer in
the publish/subscribe model sends data only to those peers
within the same cluster whose subscriptions it matches [24].
In addition to the indexing capabilities of publish/subscribe,
multi-dimensional indexing is discussed in [25]. Peer
connections are kept by layer; thus, the dimensions of the
index are based on the connection layer. Therefore, the
index storage space is better utilized than with a
single-dimension index.
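The three roles can be sketched in a few lines. The `SubscriptionIndex` class below is a hypothetical mediator that matches on exact topic strings; the systems cited in the text use richer subscription languages, so this shows only the delivery principle of sending data to matching subscribers and to nobody else.

```python
from collections import defaultdict


class SubscriptionIndex:
    """Hypothetical mediator between publishers and subscribers in one cluster."""

    def __init__(self):
        self._subscribers = defaultdict(set)   # topic -> peers subscribed to it
        self.inbox = defaultdict(list)         # peer -> list of delivered items

    def subscribe(self, peer, topic):
        self._subscribers[topic].add(peer)

    def publish(self, publisher, topic, data):
        """Deliver data only to peers whose subscription matches the topic."""
        receivers = sorted(self._subscribers.get(topic, ()))
        for peer in receivers:
            self.inbox[peer].append((publisher, topic, data))
        return receivers


index = SubscriptionIndex()
index.subscribe("Peer2", "jazz")
index.subscribe("Peer3", "rock")
index.subscribe("Peer4", "jazz")

print(index.publish("Peer1", "jazz", "new_album.mp3"))   # ['Peer2', 'Peer4']
print(index.inbox["Peer3"])                              # []
```

Peer3 receives nothing because its subscription does not match, so no message is spent on it.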
Besides the indexing capabilities issue, the concept of
semantic P2P overlays based on publish/subscribe has been
proposed in [26]. With the proposed semantic feature, the
P2P network does not rely on dedicated content routers.
However, message routing may not be perfectly accurate, in
the sense that some peers may receive messages that do not
match their interests. This is due to the limitations of the
proximity metric that has been applied. Meanwhile, the
Carona System has introduced the concept of a pooling
content server. Thus, instead of having a temporarily fixed
content server provided by the respective super-peer, the
index is served by pooling the index content periodically,
while the content servers are interchangeable.
In contrast to [24-26], the use of publish/subscribe in
the absence of super-peer nodes is presented in [27]. Without
a super-peer, the publish/subscribe messages are replicated
on selected peers, which maintain the replicas assigned to
them.
III. METRICS
In the rest of the paper, we define some metrics for
assessing the efficiency of resource discovery mechanisms
in P2P systems. In addition, we propose a taxonomy of
resource discovery mechanisms as a basis for defining the
metrics. The taxonomy of resource discovery features is
illustrated in Figure 3. Our taxonomy draws on all of the
related research discussed in the previous sections.
A. Cost
When a querying peer sends a query, the query is
broadcast through the network, passing through several peer
nodes. A queried peer, that is, a contacted peer, may act as
an intermediate node that just re-routes the query, may
process the query, or may forward query response messages
to the sender; in each case it spends processing resources
(such as CPU cycles) on that query. In addition, each query
uses bandwidth resources for the messages passed between
peers. Thus, the query cost includes both processing and
bandwidth cost. Since the query can be broadcast to several
peers in the network, the cost is not limited to a single
peer's processing. The total processing cost consists of the
local processing cost at the querying peer plus the
processing cost at each of the queried peers. Likewise, the
total bandwidth cost consists of the total bandwidth used to
send messages related to the query between the peers
involved.
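The aggregation described above amounts to two sums over every peer that touched the query. The sketch below is one way to write it down; the `PeerWork` record, its field names and the units (abstract CPU units and bytes) are assumptions, since the text does not fix how processing or bandwidth is measured.

```python
from dataclasses import dataclass


@dataclass
class PeerWork:
    """Hypothetical record of what one peer spent on a single query."""
    peer: str
    cpu_units: float   # processing spent on the query (unit is an assumption)
    bytes_sent: int    # bandwidth used for messages related to the query


def total_query_cost(querying, queried):
    """Sum processing and bandwidth over the querying peer and all queried peers."""
    everyone = [querying, *queried]
    return {
        "processing": sum(w.cpu_units for w in everyone),
        "bandwidth": sum(w.bytes_sent for w in everyone),
    }


cost = total_query_cost(
    PeerWork("Peer1", cpu_units=2.0, bytes_sent=120),        # querying peer
    [
        PeerWork("Peer2", cpu_units=0.5, bytes_sent=120),    # only re-routes the query
        PeerWork("Peer3", cpu_units=1.5, bytes_sent=900),    # processes it and replies
    ],
)
print(cost)   # {'processing': 4.0, 'bandwidth': 1140}
```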
B. Result Quality
Besides the cost of the processing and bandwidth used to
process the query, another metric is the quality of the result
produced.
Measuring result quality is really a measurement of the
user's satisfaction with a particular query result. There are
several attributes for measuring user satisfaction. However,
collecting users' comments on their satisfaction with each
query result is almost impossible. Still, we can measure the
quality of the result in a number of ways, such as counting
the number of returned results and measuring the time
elapsed from when the query is first submitted by the
querying peer to when the query result is complete.
Quality determination based on the number of returned
results should take into account the query processing
mechanism applied by the P2P system. Some query
processors put a limit on the number of returned results. For
a query processor that does not have any such limit, a higher
number of returned results is associated with higher quality.
Otherwise, if the number of returned results reaches the
limit, the result is considered a quality result. In addition,
the proportion of result messages to the query messages
broadcast within the entire network can serve as a quality
measurement. A higher proportion of returned results can be
taken to indicate a higher quality result. In general, the
proportion of query and result messages is seen as a measure
of message broadcasting efficiency. However, we strongly
believe that a smaller number of broadcast query messages
indicates that the queried peers are qualified data sources
and, thus, that a quality result is returned.
The second quality attribute is the elapsed time the user
has to wait for the result to arrive. We believe that the
resource discovery mechanism helps to shorten the response
time, because embedding a resource discovery mechanism
decreases the number of message hops in P2P systems
[20, 21].
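The quality attributes discussed in this subsection (number of returned results, whether a result limit was reached, results relative to query messages, and elapsed time) can be collected in one small function. This is a sketch only: the function name, the dictionary keys and the use of seconds are assumptions, and the text does not prescribe how the attributes should be weighted against each other.

```python
def result_quality(num_results, query_messages, submitted_at, completed_at,
                   result_limit=None):
    """Collect the quality attributes of one query.

    num_results    -- number of results returned to the querying peer
    query_messages -- number of query messages broadcast in the network
    submitted_at   -- time the query was submitted (seconds, assumed unit)
    completed_at   -- time the query result was complete (seconds)
    result_limit   -- optional cap imposed by the query processor
    """
    if query_messages <= 0:
        raise ValueError("at least one query message must have been sent")
    return {
        "num_results": num_results,
        "limit_reached": result_limit is not None and num_results >= result_limit,
        "results_per_message": num_results / query_messages,
        "elapsed_seconds": completed_at - submitted_at,
    }


print(result_quality(8, 40, submitted_at=10.0, completed_at=12.5, result_limit=10))
# {'num_results': 8, 'limit_reached': False, 'results_per_message': 0.2, 'elapsed_seconds': 2.5}
```

A flooding query and an index-based query returning the same eight results would differ mainly in `results_per_message` and `elapsed_seconds`, which is where a resource discovery mechanism is expected to help.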
C. Resource Discovery Mechanism
Our third viewpoint on metrics is the resource discovery
mechanism (RDM). The RDM is considered because of the
additional cost of maintaining the mechanism, either locally
or across the entire network. The cost of the RDM is
separated from the cost of query processing because the two
costs are calculated differently. Basically, query processing
cost is counted per query. It would be unfair, however, to
judge the RDM cost on a per-query basis, since the
mechanism may be used for several thousand queries and
may live for a certain period of time. There are several
differences between query processing cost and RDM cost,
including the following:
• Storage allocated for holding the RDM. This includes
external data storage (for example, when the index
keeps XML data) as well as the cache of any
materialized data or queries.
• Additional processing resources required for
maintaining the RDM. These processes include
inserting new shared data sources, updating the RDM
when data sources are refreshed, and deleting the
elements associated with a data source whenever its
peer leaves the network.
• Additional processing for conducting query
processing on behalf of peers (this applies to only
some P2P networks).
One needs to deal with several metric elements in
addition to the complexity of the discovery algorithm. There
is no clear-cut boundary between them, and trade-offs have
to be made.
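One way to compare the per-query cost of processing with the longer-lived cost of the RDM is to spread the RDM cost over the queries it serves during its lifetime. The following sketch is an assumption about how such an amortization could be set up, not a formula given in the text; the function name, the parameters and the sample figures are all hypothetical, and all costs are taken to be in the same abstract unit.

```python
def amortized_cost_per_query(query_cost, rdm_setup, rdm_maintenance_per_period,
                             periods, queries_served):
    """Per-query cost once the RDM cost is spread over the queries it served.

    query_cost                 -- processing plus bandwidth cost of one query
    rdm_setup                  -- one-off cost of building the index (storage, processing)
    rdm_maintenance_per_period -- cost of inserts, updates and deletes per period
    periods                    -- number of periods the RDM stays alive
    queries_served             -- number of queries answered with the RDM in that time
    """
    if queries_served <= 0:
        raise ValueError("the RDM must have served at least one query")
    rdm_total = rdm_setup + rdm_maintenance_per_period * periods
    return query_cost + rdm_total / queries_served


# Hypothetical figures: an index that costs 1000 units to build and 50 units per
# period to maintain, kept for 10 periods and used by 5000 queries.
print(amortized_cost_per_query(2.0, 1000.0, 50.0, periods=10, queries_served=5000))
# 2.3
```

With these figures the RDM adds 0.3 units to each query; the same index serving only 50 queries would add 30 units per query, which illustrates why judging the RDM on a single query is misleading.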
IV. CHALLENGES FOR RESOURCE DISCOVERY IN
MAJOR P2P SYSTEMS AND CONCLUSION
This section discusses issues specific to resource
discovery challenges in P2P systems. Designing a
resource discovery mechanism for a P2P system is not a
trivial task. Besides dealing with several metrics, P2P
resource discovery needs to deal with several other issues
as well, such as load balancing between nodes, fault
tolerance, differing computational capabilities among
peers, and the ability to support more complex queries.
In any decentralized network, load balancing is one of
the unresolved tasks. Embedding the routing index in every
peer may overburden some of the less capable peers, but
assigning it to a few selected peers may unbalance the load
among peers. Moreover, the stability of the network
connection and the probability of node failure need to be
taken into account when selecting peers to host the routing
index.
A standard decentralized network supports either limited
keyword search (such as KaZaa [28]) or index-based
search (such as Bibster [18]). There have been a number of
attempts to adapt distributed resource discovery to more
complex queries and even to support incomplete queries
[29, 31].
V. ACKNOWLEDGMENT
We would like to thank UMP Research Management
Centre (RMC) for the support through UMP short-term
grant funding. We also acknowledge the previous support
from the PhD fund scholarship.
REFERENCES
[1] The Gnutella web site, [Online] 2003, http://gnutella.wego.com.
[2] S. Ciraci, I. Korpeoglu and O. Ulusoy, Reducing query overhead
through route learning in unstructured peer-to-peer network, Journal
of Network and Computer Applications. (2008).
[3] V. Kalogeraki, D. Gunopulos, D. Z. Yazti, A local search mechanism
for peer-to-peer networks, CIKM02 in ACM. (2002).
[4] Q. Lv, et al., Search and replication in unstructured peer-to-peer
networks, ICS02 in ACM. (2002).
[5] R. Mohamed, C. D. Buckingham, Pre-processing for improved query
routing in super-peer P2P systems, IEEE Region 5. (2008).
[6] L. Fegaras, W. He and G. Levine, XML query routing in structured
P2P systems, DBIS2P2P in VLDB. (2006).
[7] W. He, L. Fegaras and G. Levine, Indexing and searching XML
documents based on content and structure synopses, BNCOD in
LNCS. (2007).
[8] G. Kokkkinides and V. Christophides, Semantic query routing and
processing in P2P database systems, EDBT 2004 in LNCS. (2004).
[9] I. Brunkhorst, H. Dhraief, Semantic caching in schema-based P2P
network, in LNCS. (2007).
[10] B. Bobby, et al., Efficient peer-to-peer searches using result caching,
IPTPS in LNCS. (2003).
[11] Z. Christian, B. Srikanta and W. Gerhard, Standing on the shoulders
of peers: Caching in peer-to-peer information retrieval, in the 3rd
VLDB. (2007).
[12] K. Vanthournout, G. Deconinck, and R. Belmans, A taxonomy for
resource discovery, Personal Ubiquitous Computing, vol. 7, no. 2.
(2004).
[13] E. Meshkova, et al., A survey on resource discovery mechanisms,
peer-to-peer and service discovery frameworks, Computer Networks,
vol. 52, no. 11. (2008).
[14] B. Pourebrahimi, K. Bertels, and S. Vassiliadis, A survey of
peer-to-peer networks, 16th annual workshop on circuits, systems and
signal processing. (2005).
[15] W. Nejdl, et al., Edutella: A P2P networking infrastructure based on
RDF, in the ACM 11th WWW02. (2002).
[16] D. Calvanese, et al., Hyper: A framework for peer-to-peer data
integration on grids, in Springer LNCS, vol. 3226/2004. (2004).
[17] M. Boyd, et al., AutoMed: A BAV data integration system for
heterogeneous data sources, in Springer LNCS, vol. 3084/2004.
(2004).
[18] P. Haase, et al., Bibster: A semantic-based bibliographic peer-to-peer
system, in Springer ISWC2004. (2004).
[19] S. Johnstone, P. Sage, and P. Milligan, iXChange A self organizing
super-peer network model, in IEEE ISCC2005. (2005).
[20] A. Moderresi, et al., How community-based P2P social networks can
affect query routing?, in IEEE NCM 2008. (2008).
[21] B. Yang and H. Garcia-Molina, Designing a super-peer network, in
the 19th ICDE. (2003).
[22] Fiorano Software Inc., Super-peer architectures for distributed
computing (whitepaper), Technical report, [Online] 2007,
http://www.orano.com/whitepapers/superpeer.pdf
[23] Z. Bellahsene, C. Lazanitis, P. McBrien, and N. Roussopoulos,
iXpeer: Implementing layers of abstraction in P2P schema mapping
using AutoMed, in WWW2006. (2006).
[24] A. K. Datta, et al., Anonymous Publish/Subscribe in P2P Networks,
in ACM 17th IPDPS. (2003).
[25] V. Mathusamy, H.-A. Jacobsen, Small-scale Peer-to-peer
Publish/Subscribe, in P2P Workshop at MobiQuitous 2004. (2004).
[26] R. Chand, P. Felber, Semantic Peer-to-Peer Overlays for
Publish/Subscribe Networks, in Springer LNCS, vol. 3648/2005.
(2005).
[27] H. Huang, H. Shi, Routing Mechanism for Active Publish Subscribe
System Based on Event Space Partition, in IEEE ISISE, vol. 1. (2008).
[28] The KaZaa web site, [Online] 2009, http://www.kazaa.com.
[29] I. Brunkhorst, et al., Distributed queries and query optimization in
schema-based P2P systems, in Springer LNCS, vol. 2944/2004.
(2004).
[30] R. Mohamed, C. D. Buckingham, How Caching Queries at
Client-peers Affects the Loads of Super-peer P2P Systems, in IEEE
ICPCA 2008. (2008).
[31] M. Karnstedt, K. Hose, K. Sattler, Distributed Query Processing in
P2P Systems with Incomplete Schema Information, in DIWeb 2004:
34-45. (2004).