Tefko Saracevic
Definition
Search
COMPUTING (transitive verb) to examine a computer file, disk, database, or network for particular information
Engine
something that supplies the driving force or energy to a movement, system, or trend
Search Engine
a computer program that searches for particular keywords and returns a list of documents in which they were found, especially a commercial service that scans documents on the Internet
Brief History
The very first tool used for searching was Archie, created in 1990. Aliweb came next in 1993; it relied on index files submitted by site administrators rather than on a crawler. WebCrawler and Lycos followed in 1994.
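[Figure: how a search engine works. The crawler follows URLs (URL1 to URL4) across the Web and passes pages to the indexer. A query entered in the browser ("Eggs?") returns results ranked by relevance: "Eggs" 90%, "All About Eggo" 81%, "Your Eggs Ego" 40%, "Huh? -Am S. I." 10%.]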
Ways of Searching
Keyword searching
Refined searching
Relevancy rankings
Information on meta tags
Concept-based searching
Web Crawler
Creates a copy of all visited pages for later processing by a search engine.
Also used for automating maintenance tasks on a website, such as checking links or validating HTML code.
Can be used to gather specific types of information from Web pages, such as harvesting e-mail addresses (usually for spam).
For a number of reasons, crawlers cover only a fraction of the Web; they do not cover the invisible web.
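The crawling behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular engine is implemented: `crawl`, `LinkExtractor`, and `FAKE_WEB` are hypothetical names, and the "Web" is a small in-memory dictionary so the example runs without network access. It keeps a copy of every visited page and reports links that could not be fetched, which corresponds to the link-checking use mentioned above.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl starting from seed.

    fetch(url) returns the page HTML as a string, or None if the page
    cannot be retrieved. Returns (pages, broken): copies of the visited
    pages and the set of URLs that could not be fetched.
    """
    pages, broken = {}, set()
    queue, seen = deque([seed]), {seed}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            broken.add(url)
            continue
        pages[url] = html  # keep a copy for later processing by the indexer
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return pages, broken


# Hypothetical three-page site standing in for the Web (assumption).
FAKE_WEB = {
    "http://example.test/": '<a href="/a.html">A</a> <a href="/missing.html">?</a>',
    "http://example.test/a.html": '<a href="/">home</a> <a href="b.html">B</a>',
    "http://example.test/b.html": "<p>No links here.</p>",
}

pages, broken = crawl("http://example.test/", FAKE_WEB.get)
print(sorted(pages))   # URLs whose content was copied
print(sorted(broken))  # links that could not be fetched
```

A real crawler would replace `FAKE_WEB.get` with an HTTP fetch and would also need politeness rules (robots.txt, rate limits), which are left out here.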
Indexing
Search engine indexing collects, parses, and stores data to facilitate fast and accurate information retrieval. The purpose of storing an index is to optimize speed and performance in finding relevant documents for a search query. Without an index, the search engine would have to scan every document in the corpus for each query, which would require considerable time and computing power.
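One common way to build such an index is an inverted index, which maps each term to the documents that contain it. The sketch below is only an illustration under simple assumptions: the function names (`tokenize`, `build_index`, `search`) and the sample documents are made up for this example, and tokenization is a plain lowercase split on letters and digits.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample corpus (assumption).
DOCS = {
    1: "Eggs and how to cook them",
    2: "All about Eggo waffles",
    3: "Scrambled eggs for breakfast",
}

index = build_index(DOCS)
print(sorted(search(index, "eggs")))
print(sorted(search(index, "eggs breakfast")))
```

A query now costs one dictionary lookup per term plus a set intersection, instead of a scan over every document.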
Case of Google
Developed by Sergey Brin and Lawrence Page while they were students at Stanford
In the beginning, it ran on Stanford computers
The basic approach is described in their famous paper, "The Anatomy of a Large-Scale Hypertextual Web Search Engine"
The paper is well written, in simple language, and includes their pictures
In the acknowledgements they cite support from NSF's Digital Library Initiative, i.e. initially, Google came out of government-sponsored research
They describe their method, PageRank, which is based on ranking hyperlinks as in citation indexing
"We chose our system name, Google, because it is a common spelling of googol, or 10^100" (ten to the hundredth power)
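The idea of ranking pages by their incoming links can be illustrated with a short power-iteration sketch. This is not Google's implementation: it uses the commonly taught normalized variant in which ranks sum to 1, a damping factor of 0.85, and an even redistribution of rank from pages without outgoing links. The function name `pagerank`, its parameters, and the four-page link graph are assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict mapping each page to its rank; ranks sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {
            p: (1.0 - damping) / n + damping * dangling / n for p in pages
        }
        # Each page passes an equal share of its rank to every page it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph (assumption): URL3 receives the most links,
# URL4 receives none.
LINKS = {
    "URL1": ["URL2", "URL3"],
    "URL2": ["URL3"],
    "URL3": ["URL1"],
    "URL4": ["URL3"],
}

ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

As with citation indexing, a page ranks highly when many pages, or a few highly ranked pages, link to it.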
Coverage Differences
no engine covers more than a fraction of WWW
estimates: none more than 16%
hard (even impossible) to discern & compare coverage, but they differ substantially in what they cover
In addition:
many national search engines, each with its own coverage, orientation, and governance
many specialized or domain search engines
Limitations
The automated method of collecting information is rather crude.
Information may be out of context.
Results may include out-of-date sites.
Search engines are also frequent victims of spamdexing: the use of techniques that push a page's ranking higher than it deserves. Methods typically include both textual and link-based techniques.
Search Engine Optimization (SEO)
Thank you