P2P Architectures
Mauceri Calogero
Academic year 2003/04, 16 May 2007

Outline
- Peer-to-peer architecture
- DHT
- Pastry
- Scribe

Publish/Subscribe Systems
[Figure: a publisher P and several subscribers S; a subscriber asks how to receive messages with topic X]

Client-Server Architecture
[Figure: publishers P1, P2 and subscribers S1, S2, S3 all connected to a central server]
- Simple
- Heavy traffic and computation load on the server
- Not scalable
- Single point of failure

Peer-to-peer Architecture
A distributed system architecture with:
- No centralized control
- Nodes that are symmetric in function
- A large number of heterogeneous and unreliable nodes

Lookup Problem
[Figure: publisher P wants to store messages with topic X, subscriber S wants to receive messages with topic X; which node R should both of them use?]

Centralized Approach
[Figure: P, S and R all contact a central coordinator node C]
- Single point of failure
- O(N) state in the coordinator node

Flooding Approach
[Figure: the request is flooded from node to node until it reaches R]
- O(N) messages per lookup

Distributed Hash Table
- Nodes are the hash buckets
- A key uniquely identifies a data item
- The DHT balances keys and data across nodes
- The DHT replicates, caches, routes lookups, etc.
[Figure: distributed applications call Insert(key, data) and Lookup(key), which returns data, on a distributed hash table spread over many nodes]

Distributed Hash Table Requirements
- Scalability
- Adaptivity
- Reliability
- Efficiency
- Decentralization
- Load balance

Distributed Hash Table Approach
[Figure: the lookup from S reaches R in a few hops]
- Usually O(log N) messages per lookup
- Routing tables must be built and maintained

Pastry
A peer-to-peer protocol example

Pastry Design
- NodeIds and keys are 128-bit numbers, treated as sequences of digits in base 2^b.
- Each node has a unique nodeId, assigned randomly when the node joins the system.
- Keys are stored at the node whose nodeId is numerically closest to the key.
- Prefix-based routing toward the numerically closest node.
- The expected number of routing steps is O(log N) (about ⌈log_{2^b} N⌉).

Pastry Routing Table
[Figure: state of the node with nodeId 10233102, with b = 2 and leaf set size l = 8]
- Leaf set: the nodes numerically closest to the present node
- Neighborhood set: the nodes physically closest to the present node
- Routing table: prefix-based entries, each written as common prefix with 10233102, next digit, rest of the nodeId

Routing
1. Check the leaf set: if the key falls within its range, route the message directly to the numerically closest node.
2. Otherwise consult the routing table and forward to a node whose identifier shares at least one more digit with the message key.
3. If that entry is empty, forward to a known node whose identifier shares at least as long a prefix with the key and is numerically closer to it.
This routing procedure always converges, because each step reaches a node that either shares a longer prefix with the key or shares an equally long prefix and is numerically closer to it.

Routing Example
[Figure: Route(d46a1c) issued at node 65a1fc passes through d13da3, d4213f and d462ba and is delivered to d467ca, the node numerically closest to the key; d471f1 is another nearby node]

Node Arrival
When a new node X arrives:
- X contacts a known, physically nearby Pastry node A and asks it to route a join message with X's nodeId as the key.
- The message is routed to the node Z whose nodeId is numerically closest to the key.
- X initializes its state from the nodes on the routing path from A to Z.
- X informs the affected nodes so that they can update their states.

Node Arrival Example
[Figure: new node d46a1c ("I want to join the Pastry network") contacts 65a1fc; its join message Route(d46a1c) passes through d13da3, d4213f and d462ba to d467ca]

Node Departure
On node departure or failure:
- Failure in the leaf set: repair it using the leaf set of the live node with the largest or smallest nodeId on the failed node's side.
- Failure in the routing table: obtain a replacement entry from other nodes in the same row or, failing that, in higher rows.
- Failure in the neighborhood set: detected by periodic checks and repaired using the neighborhood sets of other neighbors.
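The three-step routing procedure from the Routing slide can be sketched in Python. This is a minimal illustration under stated assumptions, not the Pastry reference implementation: nodeIds are equal-length hex strings (b = 4), wrap-around of the circular id space is ignored, every node's state is computed from a global membership list, the leaf set holds one neighbor on each side, and each routing-table slot keeps the numerically smallest candidate. All function names (prefix_len, build_state, next_hop, route) are hypothetical. Run on the nodes of the routing example, it reproduces the path to d467ca.

```python
# Illustrative sketch of Pastry-style prefix routing (not the reference
# implementation). Assumptions: nodeIds are equal-length hex strings (b = 4),
# the circular id space is treated as a line, state is computed from a global
# membership list, and each routing-table slot keeps the numerically smallest
# candidate instead of the physically closest one.

def value(node_id: str) -> int:
    """Numeric value of a hex nodeId."""
    return int(node_id, 16)


def prefix_len(a: str, b: str) -> int:
    """Number of leading hex digits shared by two equal-length ids."""
    n = 0
    while n < len(a) and a[n] == b[n]:
        n += 1
    return n


def dist(a: str, b: str) -> int:
    """Numeric distance between two ids (no wrap-around)."""
    return abs(value(a) - value(b))


def build_state(node, members, leaf_half=1):
    """Ideal leaf set and routing table of `node`, given all members."""
    ordered = sorted(members, key=value)
    i = ordered.index(node)
    leaf = ordered[max(0, i - leaf_half):i] + ordered[i + 1:i + 1 + leaf_half]
    table = {}  # (row, next digit) -> nodeId
    for m in ordered:  # ascending order, so setdefault keeps the smallest id
        if m != node:
            row = prefix_len(node, m)
            table.setdefault((row, m[row]), m)
    return leaf, table


def next_hop(node, key, leaf, table):
    """Next node for `key`; returns `node` itself when it is the destination."""
    ids = leaf + [node]
    # 1. Key within the leaf-set range: go to the numerically closest node.
    if min(map(value, ids)) <= value(key) <= max(map(value, ids)):
        return min(ids, key=lambda n: dist(n, key))
    # 2. Routing table: a node sharing at least one more digit with the key.
    row = prefix_len(key, node)
    if (row, key[row]) in table:
        return table[(row, key[row])]
    # 3. Rare case: a known node with an equally long prefix, numerically closer.
    known = set(leaf) | set(table.values())
    closer = [n for n in known
              if prefix_len(n, key) >= row and dist(n, key) < dist(node, key)]
    return min(closer, key=lambda n: dist(n, key)) if closer else node


def route(start, key, members):
    """Follow next_hop from `start` until some node keeps the message."""
    states = {m: build_state(m, members) for m in members}
    path = [start]
    for _ in range(len(members)):
        cur = path[-1]
        nxt = next_hop(cur, key, *states[cur])
        if nxt == cur:
            return path
        path.append(nxt)
    raise RuntimeError("routing did not converge")


NODES = ["65a1fc", "d13da3", "d4213f", "d462ba", "d467ca", "d471f1"]
print(route("65a1fc", "d46a1c", NODES))
# prints ['65a1fc', 'd13da3', 'd4213f', 'd462ba', 'd467ca']
```

Filling each slot with the numerically smallest candidate is only a deterministic stand-in chosen for the sketch; Pastry fills a slot with a physically nearby qualifying node, which is what the locality properties rely on.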
Locality (1/2)
- When X joins the Pastry network, it sends a join message to A, and the message eventually reaches Z.
- Because A is assumed to be physically near X, the entries in the 1st row of A's routing table, which are near A, are also near X (triangle inequality).
- B is the next node on the path. The nodes in the 2nd row of B's routing table are near B, but their distance from B is much larger than the distance between X and B.
- These nodes are therefore likely to be near X, although not necessarily the nearest.
- To compensate for this cascading error, X asks the nodes in its neighborhood set and routing table for their state and keeps the entries nearest to itself.

Locality (2/2)
[Figure: the join path X, A, B, C, Z; levels 0, 1 and 2 of X's routing table are taken from A, B and C respectively]

Locating the nearest among k nodes
Goal: among the k nodes numerically closest to a key, a message should tend to reach first a node near the client.
Problem: since Pastry routes primarily on nodeId prefixes, it may miss nearby nodes whose prefix differs from the key's.
Solution (a heuristic): by estimating the density of nodeIds, a node detects when a message is approaching the set of k nodes and then switches to routing by numerically nearest address to locate the nearest replica.

Scribe
How a peer-to-peer protocol can be used in a publish/subscribe system

Scribe
- Topic-based publish/subscribe system
- Peer-to-peer network of Pastry nodes
- Application-level multicast
- Best-effort dissemination of events

Scribe Architecture
- Each topic has a unique topicId.
- Rendezvous point: the Scribe node whose nodeId is numerically closest to the topicId; it is the root of the topic's multicast tree.
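The rendezvous-point rule above, together with the JOIN-based tree construction and dissemination shown on the following slides, can be sketched in Python. It is a minimal illustration under assumptions, not Scribe's code: the topicId is assumed to be a SHA-1 hash of the topic name truncated to 128 bits, distance is measured on the circular id space, and the JOIN routes in the usage lines are hypothetical paths rather than routes computed by Pastry. All function names are hypothetical; the 4-bit ids mirror the Subscription slide.

```python
import hashlib

# Illustrative sketch of Scribe's rendezvous point and multicast tree (not the
# reference implementation). Assumptions: topicId = leading bits of SHA-1 of
# the topic name; JOIN routes are supplied as lists of node ids.


def topic_id(name: str, bits: int = 128) -> int:
    """Hypothetical topicId: the leading `bits` bits of SHA-1(name)."""
    digest = int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")
    return digest >> (160 - bits)  # SHA-1 digests are 160 bits long


def ring_distance(a: int, b: int, bits: int = 128) -> int:
    """Distance between two ids on the circular id space of size 2**bits."""
    d = (a - b) % (1 << bits)
    return min(d, (1 << bits) - d)


def rendezvous_point(topic: int, node_ids, bits: int = 128) -> int:
    """The node whose nodeId is numerically closest to the topicId."""
    return min(node_ids, key=lambda n: ring_distance(n, topic, bits))


def join(tree, path):
    """Graft a JOIN route (subscriber first, rendezvous point last) onto the tree.

    `tree` maps each forwarder to the set of its children; the rendezvous
    point must already be a key (it becomes one when the topic is created).
    """
    for child, parent in zip(path, path[1:]):
        was_forwarder = parent in tree
        tree.setdefault(parent, set()).add(child)
        if was_forwarder:
            break  # the rest of the path is already part of the tree


def disseminate(tree, root):
    """Push an event from the root down the tree; return every node reached."""
    reached, stack = [], [root]
    while stack:
        node = stack.pop()
        reached.append(node)
        stack.extend(tree.get(node, ()))
    return reached


NODES = [0b0100, 0b0111, 0b1000, 0b1001, 0b1011, 0b1100, 0b1101, 0b1111]
root = rendezvous_point(0b1100, NODES, bits=4)  # node 1100 itself
tree = {root: set()}                            # state left by CREATE
join(tree, [0b0111, 0b1001, 0b1100])            # hypothetical JOIN routes
join(tree, [0b0100, 0b1001, 0b1100])
print(sorted(disseminate(tree, root)))          # prints [4, 7, 9, 12]
```

The second JOIN stops at node 1001 because that node is already a forwarder, so the tree grows only where new branches are needed.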
Create Topic
[Figure: Route(CREATE, topicId) is delivered to the rendezvous point, the node whose nodeId matches the topicId]

Subscription
[Figure: Route(JOIN, topicId = 1100) in a network of nodes 0100, 0111, 1000, 1001, 1011, 1100, 1101 and 1111; the resulting tree contains 0111, 1001, 0100, 1101, 1100 and 1011]
Each node on the JOIN route records the previous hop as a child and, unless it was already a forwarder for the topic, forwards the JOIN toward the rendezvous point; the union of these routes forms the multicast tree.

Event Dissemination
- Events are routed through the Pastry network using the topicId as the destination.
- They are then disseminated along the multicast tree, starting from the root.
[Figure: multicast tree over nodes 0111, 1001, 0100, 1101, 1100 and 1011]

Repairing the multicast tree
[Figure: the repaired multicast tree over nodes 0111, 1001, 0100, 1101, 1100 and 1111]
When a node detects that its parent has failed, it routes a new JOIN toward the topicId and is reattached to the tree through a different parent.

References
- Brad Karp. Distributed Hash Tables (DHTs): Design and Performance. Slides.
- Nitin Gupta. A Survey on Distributed Hash Tables. Web page.
- Antony Rowstron and Peter Druschel. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems.
- Miguel Castro, Peter Druschel, Anne-Marie Kermarrec, and Antony Rowstron. Scribe: A large-scale and decentralized application-level multicast infrastructure.