

Hands on Script
Search Engine - I

Amit K Khairnar

31 August 2008

By the end of this article, you will know how a search engine works.


How Does a Search Engine Work?

What are search engines? How do search engines work? Find your answers here.

What is a Search Engine?


By definition, an Internet search engine is an information retrieval system that helps us find information on the World Wide Web. The World Wide Web is a universe of information accessible over the network, and it facilitates global sharing of information. But the WWW can be seen as an unstructured database, growing exponentially into an enormous store of information. Searching for information on the web is therefore a difficult task, and we need a tool to manage, filter, and retrieve this ocean of information. A search engine serves this purpose.


How Does a Search Engine Work?


1. Internet search engines are web search engines that search for and retrieve information on the web. Most of them use a crawler-indexer architecture and depend on their crawler modules. Crawlers, also referred to as spiders, are small programs that browse the web.

2. Crawlers are given an initial set of URLs whose pages they retrieve. They extract the URLs that appear on the crawled pages and pass this information to the crawler control module, which decides which pages to visit next and hands those URLs back to the crawlers (a minimal sketch of this loop follows the list).

3. The topics covered by different search engines vary according to the algorithms they use. Some search engines are programmed to search sites on a particular topic, while the crawlers in others may visit as many sites as possible.
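
To make steps 1-3 concrete, here is a minimal sketch of the crawler loop in Python. It is illustrative only: the function name crawl, the page limit, and the naive regex-based link extraction are assumptions made for this example, not how any production crawler actually works.

    from collections import deque
    from urllib.parse import urljoin
    from urllib.request import urlopen
    import re

    def crawl(seed_urls, max_pages=50):
        # The frontier plays the role of the crawler control module's
        # to-visit queue; FIFO order gives a breadth-first crawl.
        frontier = deque(seed_urls)
        visited = set()
        pages = {}  # url -> raw HTML (a stand-in for the page repository)
        while frontier and len(pages) < max_pages:
            url = frontier.popleft()
            if url in visited:
                continue
            visited.add(url)
            try:
                html = urlopen(url, timeout=5).read().decode("utf-8", "ignore")
            except Exception:
                continue  # unreachable or unreadable page: skip it
            pages[url] = html
            # Extract outgoing links and hand them back to the frontier,
            # as the crawler control module would.
            for link in re.findall(r'href="([^"]+)"', html):
                frontier.append(urljoin(url, link))
        return pages

A real crawl control module would also respect robots.txt, throttle requests per host, and prioritize URLs rather than visiting them in simple FIFO order.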

4. The crawl control module may use the link graph from a previous crawl, or observed usage patterns, to guide its crawling strategy.

5. The indexer module extracts the words from each page it visits and records their URLs. The result is a large lookup table that, for each word, lists the URLs of the pages where that word occurs; the table covers the pages visited during the crawling process (see the sketch below).

6. A collection analysis module is another important part of the search engine architecture. It creates a utility index, which may, for example, provide access to pages of a given length or pages containing a certain number of pictures.

7. During crawling and indexing, a search engine stores the pages it retrieves in a temporary page repository. Search engines maintain a cache of the pages they visit so that already-visited pages can be retrieved quickly.

8. The query module of a search engine receives search requests from users in the form of keywords, and the ranking module sorts the results (also shown in the sketch below).

9. The crawler-indexer architecture has many variants; one is the distributed architecture, which consists of gatherers and brokers. Gatherers collect indexing information from web servers, while brokers provide the indexing mechanism and the query interface. Brokers update their indices on the basis of information received from gatherers and other brokers, and they can filter information. Many of today's search engines use this type of architecture.
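
The lookup table in step 5 is commonly called an inverted index, and the query module in step 8 answers keyword searches against it. The following sketch continues from the crawl example above; build_index, search, the crude tag stripping, and the total-term-frequency ranking are all simplifying assumptions, far cruder than a real ranking module.

    import re
    from collections import defaultdict

    def build_index(pages):
        # Inverted index: word -> {url: number of occurrences}.
        index = defaultdict(dict)
        for url, html in pages.items():
            text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping
            for word in re.findall(r"[a-z0-9]+", text.lower()):
                index[word][url] = index[word].get(url, 0) + 1
        return index

    def search(index, query):
        # Query module: keep only pages containing every keyword,
        # then let the (trivial) ranking step sort them.
        words = query.lower().split()
        postings = [index.get(w, {}) for w in words]
        if not postings:
            return []
        urls = set(postings[0])
        for p in postings[1:]:
            urls &= set(p)  # a page must contain all keywords
        # Rank by total term frequency across the query words.
        return sorted(urls, key=lambda u: sum(p[u] for p in postings),
                      reverse=True)

For example, search(build_index(crawl(["http://example.com"])), "search engine") would return the crawled pages that contain both words, the most keyword-dense first.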


Best of luck!
For any question or suggestion, contact me at the email addresses given below:

By Amit K Khairnar
Email: amitkhairnar_2004@yahoo.co.in
hands_on_scripts@yahoo.co.in

The figure on the cover is an Intel Pentium microprocessor chip, taken from www.hardwarezone.com.
