The Anatomy of a Large-Scale Hypertextual
Web Search Engine
Sergey Brin and Lawrence Page
Computer Science Department,
Stanford University, Stanford, CA 94305, USA
sergey@cs.stanford.edu and page@cs.stanford.edu
Introduction
• This paper marks the birth of the Google search engine. It
describes the first Google prototype, designed by Sergey Brin and
Lawrence Page.
• The paper also covers the challenges of building a search
engine able to handle a dataset larger than any seen before and
to serve an even larger volume of queries against that
dataset.
• The main goal of Google was to improve the quality of web searches.
To do this, Google employed many methods, one of which was to make
use of the text of hyperlinks (anchor text).
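A minimal sketch of how hyperlink text can be put to use, assuming links have already been extracted as (source, target, text) triples. The URLs and the helper name `index_anchor_text` are hypothetical and not taken from the paper. The idea shown is that the words of a link are credited to the page the link points to, so a page can be found through the words others use to describe it:

```python
from collections import defaultdict

# Hypothetical (source_page, target_page, anchor_text) triples.
links = [
    ("news.example/a", "cs.example/home", "computer science department"),
    ("blog.example/b", "cs.example/home", "Stanford CS"),
    ("blog.example/b", "img.example/logo.png", "university logo"),
]

def index_anchor_text(link_triples):
    """Map each target page to the words used in links pointing at it."""
    index = defaultdict(set)
    for _source, target, text in link_triples:
        # Credit the link's words to the page being linked to.
        index[target].update(text.lower().split())
    return index

anchor_index = index_anchor_text(links)
```

Note that the image `img.example/logo.png` becomes searchable by the word "logo" even though it contains no text of its own.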
Before Google
Generally, people typed their queries into either:
• A search engine like Yahoo!, which kept high-quality, human-maintained
indices
• But these indices were error-prone, expensive, slow, and did not cover all
topics.
• Or one of the automated search engines available at that time.
• These search engines used keyword matching to return matches. The matches
returned were of low quality and prone to manipulation.
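To illustrate why pure keyword matching is easy to manipulate, here is a small sketch assuming a scorer that simply counts query-term occurrences. The function `keyword_score` and both sample documents are hypothetical; this is not the ranking function of any particular engine from that period:

```python
def keyword_score(query, document):
    """Count how many times the query terms occur in the document."""
    terms = query.lower().split()
    words = document.lower().split()
    return sum(words.count(term) for term in terms)

# Hypothetical documents: one genuine page, one stuffed with the keyword.
docs = {
    "honest": "stanford university computer science department home page",
    "stuffed": "stanford stanford stanford stanford buy cheap pills stanford",
}

# Rank documents by raw keyword count, highest first.
ranking = sorted(
    docs,
    key=lambda name: keyword_score("stanford", docs[name]),
    reverse=True,
)
```

Under this scorer the stuffed page outranks the genuine one, simply because it repeats the query term more often.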
A brief history of search engines (in numbers)
• In 1994, one of the first web search engines, the World Wide Web Worm
(WWWW), had an index of 110,000 web pages and web-accessible
documents and received an average of about 1500 queries per day.
• As of November, 1997, the top search engines claimed to index from 2
million to 100 million web documents. Altavista claimed it handled roughly
20 million queries per day.
• The authors predicted, rather correctly, that by the year 2000 a
comprehensive index of the Web would contain over a billion documents and
would handle hundreds of millions of queries per day.
• Thus, Google focused on being a search engine that excelled in both
quality and scalability.
Google: Scalability
• What needed to be done?
• Fast crawling technology
• Efficient storage
• Quick query retrieval
• How was it achieved?
• Parallelization of time consuming tasks
• Efficient use of storage with the help of various data structures
• Kept in mind the growth rate of hardware
Design Goals
• Improved search quality
• Very high precision
• Academic search engine research
• An architecture that supports research activities
System features
PageRank
Bringing order to the Web
Citation Graph
• A citation graph is a directed graph in which each vertex represents a
document and each edge represents a citation from one document to
another.
• The citation graph of the Web, or Webgraph, describes the
directed links between pages of the World Wide Web.
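The citation graph described above can be sketched as a pair of adjacency maps. This is a minimal illustration, assuming links are given as (source, destination) pairs; the page names and the helper `build_web_graph` are hypothetical:

```python
from collections import defaultdict

# Hypothetical pages; each edge (src, dst) means "src links to dst".
links = [
    ("a.html", "b.html"),
    ("a.html", "c.html"),
    ("b.html", "c.html"),
    ("d.html", "c.html"),
]

def build_web_graph(edges):
    """Return (out_links, in_links) adjacency maps for a directed graph."""
    out_links = defaultdict(set)  # page -> pages it links to
    in_links = defaultdict(set)   # page -> pages that link to it
    for src, dst in edges:
        out_links[src].add(dst)
        in_links[dst].add(src)
    return out_links, in_links

out_links, in_links = build_web_graph(links)

# Number of incoming links (citations) per page.
backlink_counts = {page: len(srcs) for page, srcs in in_links.items()}
```

Keeping both directions makes it cheap to ask either "what does this page cite?" or "who cites this page?", the latter being the question that link-based ranking starts from.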
PageRank Calculations