
Project Presentation

Subject: Professional Practises-3


Faculty: Mudassir Mahadik Sir

Project Details:
Group Members:

1. Qasim Dadan
2. Rashid Shaikh
3. Saif Khan
4. Wahaj Shaikh

Project Topic: Search Engine and Web Crawler (Spider)

Search Engine Introduction


Search Engine Definition:

A web search engine is a software system that is designed to search for
information on the World Wide Web. The search results are generally presented in
a line of results often referred to as search engine results pages (SERPs). The
information may be a mix of web pages, images, and other types of files. Some
search engines also mine data available in databases or open directories. Unlike
web directories, which are maintained only by human editors, search engines also
maintain real-time information by running an algorithm on a web crawler.

Purpose of Search Engine


Helping people find what they're looking for:

Starts with an information need

Converts it to a query

Gets back results

Materials available on these Systems

Web pages

Other formats

Deep Web

A search engine operates in the following order:

1. Web Crawling:
A bot visits web pages for the purpose of indexing them into the
database.

2. Indexing:
It decides the rank (priority) of the indexed results.

3. Searching:
It is the process of looking up results in the search database by
firing a simple query.
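The three steps can be tied together in a short sketch. The page contents, URLs, and function names below are invented purely for illustration; a toy in-memory "web" stands in for real HTTP fetching.

```python
from collections import defaultdict, deque

# Toy "web": URL -> (page text, outgoing links). Purely illustrative data.
WEB = {
    "http://a.example": ("search engines crawl the web", ["http://b.example"]),
    "http://b.example": ("crawlers feed pages to the index", []),
}

def crawl(seed):
    """Step 1: visit pages starting from the seed, following links."""
    frontier, pages = deque([seed]), {}
    while frontier:
        url = frontier.popleft()
        if url in pages or url not in WEB:
            continue
        text, links = WEB[url]
        pages[url] = text
        frontier.extend(links)
    return pages

def index(pages):
    """Step 2: build an inverted index mapping each word to its pages."""
    inv = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            inv[word].add(url)
    return inv

def search(inv, query):
    """Step 3: return pages containing every keyword of the query."""
    sets = [inv.get(w, set()) for w in query.lower().split()]
    return set.intersection(*sets) if sets else set()

pages = crawl("http://a.example")
inv = index(pages)
print(search(inv, "crawl web"))   # {'http://a.example'}
```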

Working Diagram of Search Engine

Explanation of Working Diagram:


Indexing Process:
The search engine analyzes the contents of each page to determine
how it should be indexed (for example, words can be extracted from the
titles, page content, headings, or special fields called meta tags). Data
about web pages are stored in an index database for use in later queries. A
query from a user can be a single word. The index helps find information
relating to the query as quickly as possible.
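A minimal sketch of this indexing step, assuming pages arrive as HTML strings: words are pulled from the title, headings, body text, and meta keyword tags, and stored in an inverted index (here just a dictionary). The sample page and names are illustrative.

```python
from collections import defaultdict
from html.parser import HTMLParser

class WordExtractor(HTMLParser):
    """Collect the visible text of an HTML page plus its meta keywords."""
    def __init__(self):
        super().__init__()
        self.words = []
    def handle_starttag(self, tag, attrs):
        if tag == "meta" and dict(attrs).get("name") == "keywords":
            self.words += dict(attrs).get("content", "").lower().split(",")
    def handle_data(self, data):
        self.words += data.lower().split()

def index_page(index, url, html):
    """Extract words from one page and record them in the index database."""
    parser = WordExtractor()
    parser.feed(html)
    for word in parser.words:
        index[word.strip()].add(url)

index = defaultdict(set)
index_page(index, "http://example.com/a",
           "<html><head><title>Web search</title>"
           "<meta name='keywords' content='engine,index'></head>"
           "<body><h1>How indexing works</h1></body></html>")
print(index["indexing"])   # {'http://example.com/a'}
```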

Searching Process:
When a user enters a query into a search engine (typically by
using keywords), the engine examines its index and provides a listing of
best-matching web pages according to its criteria, usually with a short
summary containing the document's title and sometimes parts of the text.
The index is built from the information stored with the data and the
method by which the information is indexed.
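A minimal sketch of the searching step, assuming an inverted index that maps words to sets of URLs plus a separate store of page titles for the short result summary; the data and names are illustrative.

```python
def search(index, titles, query):
    """Return (url, title) pairs for pages containing every query keyword."""
    matches = None
    for word in query.lower().split():
        urls = index.get(word, set())
        matches = urls if matches is None else matches & urls
    return [(url, titles.get(url, "")) for url in (matches or set())]

index = {
    "web":     {"http://example.com/a", "http://example.com/b"},
    "crawler": {"http://example.com/b"},
}
titles = {
    "http://example.com/a": "Web search basics",
    "http://example.com/b": "How a Web crawler works",
}
print(search(index, titles, "web crawler"))
# [('http://example.com/b', 'How a Web crawler works')]
```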

Crawling Process:
A Web crawler starts with a list of URLs to visit, called the seeds. As the
crawler visits these URLs, it identifies all the hyperlinks in the page and
adds them to the list of URLs to visit, called the crawl frontier. URLs from
the frontier are recursively visited according to a set of policies. If the
crawler is performing archiving of websites, it copies and saves the
information as it goes.
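A minimal sketch of this crawl loop using only the Python standard library: it starts from seed URLs, fetches each page, extracts the hyperlinks, and pushes unseen links onto the crawl frontier. Politeness rules (robots.txt, crawl delays) and detailed error handling are deliberately omitted.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collect the href target of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def crawl(seeds, max_pages=10):
    frontier = deque(seeds)   # the crawl frontier, seeded with the start URLs
    seen = set(seeds)
    fetched = 0
    while frontier and fetched < max_pages:
        url = frontier.popleft()
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "ignore")
        except Exception:
            continue          # unreachable page: skip it
        fetched += 1
        extractor = LinkExtractor()
        extractor.feed(html)
        for link in extractor.links:
            absolute = urljoin(url, link)   # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return seen

print(crawl(["http://example.com/"]))
```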

Search is Mostly Invisible


Like an iceberg, roughly two-thirds of a search system sits below the water
line: the visible user interface rests on top of the underlying content and
search functionality.

Web Crawling Introduction

A Web crawler is an Internet bot which systematically browses the
World Wide Web, typically for the purpose of Web indexing. A Web crawler
may also be called a Web spider, an ant, an automatic indexer, or a Web
scutter. Web search engines and some other sites use Web crawling or
spidering software to update their web content or indexes of other sites' web
content. Web crawlers can copy all the pages they visit for later processing by
a search engine which indexes the downloaded pages so the users can search
much more efficiently.

Crawlers can validate hyperlinks and HTML code. They can also be used for
web scraping.
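As a small illustration of link validation, the sketch below sends a HEAD request to each URL and reports links whose servers answer with an error or cannot be reached; the URLs shown are placeholders.

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

def validate_links(urls):
    """Return (url, status) pairs for links that appear to be broken."""
    broken = []
    for url in urls:
        try:
            urlopen(Request(url, method="HEAD"), timeout=5)
        except HTTPError as err:     # server answered with a 4xx/5xx status
            broken.append((url, err.code))
        except URLError:             # DNS failure, refused connection, ...
            broken.append((url, None))
    return broken

print(validate_links(["http://example.com/", "http://example.com/missing"]))
```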

Utilities of a crawler

Web Crawling Definition:

A Web crawler is a computer program that browses the World Wide


Web in a methodical, automated manner. (Wikipedia)

Web Crawling Utilities:

Gather pages from the Web.

Support a search engine, perform data mining and so on.

Objects Processed by Crawler:

Text, video, image and so on.

Overview of Crawler

A Web crawler starts with a list of URLs to visit, called the seeds. As the
crawler visits these URLs, it identifies all the hyperlinks in the page and adds
them to the list of URLs to visit, called the crawl frontier. URLs from the
frontier are recursively visited according to a set of policies. If the crawler is
performing archiving of websites, it copies and saves the information as it
goes. The archives are usually stored in such a way they can be viewed, read
and navigated as they were on the live web, but are preserved as snapshots.

The large volume of the Web implies the crawler can only download a limited
number of pages within a given time, so it needs to prioritize its downloads.
The high rate of change implies that pages may already have been updated or
even deleted by the time the crawler revisits them.
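One simple way to prioritize downloads is a priority queue over the frontier, as sketched below; the numeric priority (for example link depth or an importance score) is an assumption for illustration, not something specified in the slides.

```python
import heapq

class Frontier:
    """Crawl frontier that hands out URLs in priority order (lowest first)."""
    def __init__(self):
        self._heap = []
        self._seen = set()

    def add(self, url, priority):
        if url not in self._seen:     # never enqueue the same URL twice
            self._seen.add(url)
            heapq.heappush(self._heap, (priority, url))

    def next_url(self):
        return heapq.heappop(self._heap)[1] if self._heap else None

frontier = Frontier()
frontier.add("http://example.com/", priority=0)         # seed: fetch first
frontier.add("http://example.com/archive", priority=2)  # less important page
print(frontier.next_url())   # 'http://example.com/'
```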

Thank You
