
Distributed Hashtable based Search Engine

Priyank Desai (priyank), Manthan Gandhi (manthan), Saumin Shah (saumin)

Introduction:
The project consisted of two parts.
1. In the first part, we simulated two intra-domain routing protocols: Link State (LS) and Distance Vector (DV). We later extended DV to a path vector protocol. We also considered bandwidth latency when computing the shortest routes.
2. In the second part, we extended the code above to build a distributed hash table (Chord) at the application layer and then used it to build a mini search engine. The search engine downloaded pages (whose names were given at the prompt) and generated keywords from the meta section of each page. The inverted list of documents was then distributed across the nodes based on the keywords, using the SHA1 hash function. Later, we emulated the functionality on a small cluster of six nodes.

Approach:
We used an extended version of NS3 (C++) for the project, which provided the basic skeleton code to start from. The network topology for the simulation was supplied through a text file.

Implementation: Below is a brief overview of the implementation.


LS Routing Protocol: In this implementation, each node periodically sends its LSP, which describes its neighbors, to its neighbors in the network. If a neighbor finds new information in the packet, it updates its stored LSP information and forwards the packet to its own neighbors. The packet is dropped in the following situations:
1. The packet reaches the source that generated it.
2. The information received is older than the information currently stored.
3. The TTL of the packet has expired.
Each entry stored about a node in the network has an associated TTL, which is decremented periodically. If the TTL expires before the next update from that node is received, the entry is deleted. Whenever the topology changes, the route tables are recalculated using Dijkstra's algorithm.

DV Protocol: We created an initial routing table from the neighbor table by adding the cost of the path to the latter.
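
As a rough illustration of the DV bookkeeping, the sketch below builds an initial table from neighbor costs, merges a distance vector advertised by a neighbor, and marks routes through a silent neighbor as unreachable with cost 16. All names (Route, RoutingTable, InitialTable, MergeVector, MarkUnreachable) are hypothetical. The code is standalone C++ rather than the project's NS3 code, and shows one plausible reading of the update rule, not necessarily the exact logic we used.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical names throughout; this is not the project's NS3 code.
constexpr uint32_t kInfinity = 16;  // cost that means "unreachable"

struct Route {
    std::string nextHop;
    uint32_t cost;
};

using RoutingTable = std::map<std::string, Route>;       // destination -> route
using DistanceVector = std::map<std::string, uint32_t>;  // destination -> cost

// Initial table: every neighbor is reached directly at its link cost.
RoutingTable InitialTable(const DistanceVector& neighborCosts) {
    RoutingTable table;
    for (const auto& entry : neighborCosts) {
        table[entry.first] = Route{entry.first, entry.second};
    }
    return table;
}

// Merge the vector advertised by `neighbor`, reached over a link of cost
// `linkCost`. Returns true if the table changed, which is the point at
// which a triggered update would be sent.
bool MergeVector(RoutingTable& table, const std::string& self,
                 const std::string& neighbor, uint32_t linkCost,
                 const DistanceVector& advertised) {
    bool changed = false;
    for (const auto& entry : advertised) {
        const std::string& dest = entry.first;
        if (dest == self) continue;  // no route to ourselves
        const uint32_t candidate =
            std::min<uint32_t>(kInfinity, linkCost + entry.second);
        auto it = table.find(dest);
        if (it == table.end()) {
            if (candidate < kInfinity) {  // learn a new destination
                table[dest] = Route{neighbor, candidate};
                changed = true;
            }
        } else if (it->second.nextHop == neighbor) {
            // Current route already uses this neighbor: follow its cost,
            // even if the cost went up.
            if (it->second.cost != candidate) {
                it->second.cost = candidate;
                changed = true;
            }
        } else if (candidate < it->second.cost) {  // strictly cheaper path
            it->second = Route{neighbor, candidate};
            changed = true;
        }
    }
    return changed;
}

// A neighbor stopped replying: every route through it becomes unreachable.
bool MarkUnreachable(RoutingTable& table, const std::string& neighbor) {
    bool changed = false;
    for (auto& entry : table) {
        Route& route = entry.second;
        if (route.nextHop == neighbor && route.cost != kInfinity) {
            route.cost = kInfinity;
            changed = true;
        }
    }
    return changed;
}
```

With a node A whose neighbors are B (cost 1) and C (cost 5), merging a vector from B that advertises C at cost 2 would replace the direct route to C with one through B at cost 3.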

Each node sends its distance vector periodically, as well as whenever it detects a change in its topology (or routing table). In this way the latest information is propagated to the neighbors. Based on the received information, each node updates its route table as required. The packet is serialized when it is sent and deserialized at the receiving end. The receiving node then checks every entry of the sender's routing table against its own routing table and makes the necessary changes according to the cost of the paths. In this way a node arrives at its final routing table. A node also detects when another node gets disconnected or a path between two nodes breaks: if it does not get a reply, it considers that node unreachable and raises the cost to infinity (16 in this case). Once the route table has been built for each node, packets can be exchanged.

Distributed Hash Table and Search: We implemented Chord as the underlying distributed hash table for search. The details of the implementation are similar to the MIT Chord implementation. A search layer was later built on top of it. We passed URLs at the prompt to be parsed into keywords, which were then distributed across the nodes using SHA1.

Screenshots of our implementation are below:

Initialization of network using scenario file.

Publishing of keys on nodes of the DHT.

Storage of the distributed keys at various nodes.

Searching a keyword in DHT based engine.
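
The keyword placement described under Distributed Hash Table and Search can be sketched as follows: a keyword is hashed to an identifier, and its inverted list is stored on the first node whose identifier is at or after that value on the ring, wrapping around at the end. This is a minimal sketch under several assumptions. The class ToyDht and its methods are hypothetical. Identifiers are 32-bit rather than SHA1's 160-bit output, and the hash is passed in as a function, so the one used in the usage note is a stand-in and not SHA1. There are no finger tables and no key migration, so all nodes must be added before anything is published.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical toy model of Chord-style key placement; not the project's code.
using KeyHash = std::function<uint32_t(const std::string&)>;

class ToyDht {
public:
    explicit ToyDht(KeyHash hash) : hash_(std::move(hash)) {}

    // Register a node on the ring with an empty local store.
    void AddNode(uint32_t nodeId) { store_[nodeId]; }

    // First node id >= keyId, wrapping to the smallest id past the end.
    uint32_t Successor(uint32_t keyId) const {
        if (store_.empty()) throw std::runtime_error("ring has no nodes");
        auto it = store_.lower_bound(keyId);
        return it == store_.end() ? store_.begin()->first : it->first;
    }

    // Add `url` to the inverted list for `keyword` on the responsible node.
    void Publish(const std::string& keyword, const std::string& url) {
        store_[Successor(hash_(keyword))][keyword].insert(url);
    }

    // Return the documents listed under `keyword` (empty if none).
    std::set<std::string> Search(const std::string& keyword) const {
        const auto& shard = store_.at(Successor(hash_(keyword)));
        auto it = shard.find(keyword);
        return it == shard.end() ? std::set<std::string>{} : it->second;
    }

private:
    KeyHash hash_;
    // node id -> (keyword -> set of document URLs)
    std::map<uint32_t, std::map<std::string, std::set<std::string>>> store_;
};
```

For example, with nodes 15, 40 and 90 and a stand-in hash that maps a keyword to ten times its length, the keyword "chord" hashes to 50 and is stored on node 90, while a key hashing to 95 wraps around to node 15.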
