IC0102 Web-based Information Systems

Web Search Engines

A/Prof. Yang Zhonghua

Web Search Engine / 2 (59)

A/Prof. Yang, Zhonghua

Internet search engines

"The first law of e-commerce is that if users cannot find the product, they cannot buy it either." (Jakob Nielsen)

Internet search engines are special sites on the Web that are designed to help people find information stored on other sites. There are differences in the ways various search engines work, but they all perform three basic tasks:
1. They search the Internet, or select pieces of the Internet, based on important words.
2. They keep an index of the words they find, and where they find them.
3. They allow users to look for words or combinations of words found in that index.

Searches Per Day: Top 5 Engines


Google search is the world's most popular search engine
Searches   Per Day (Millions)   Per Month (Millions)
Google     91                   2,733
Yahoo      60                   1,792
MSN        28                   845
AOL        16                   486
Ask        13                   378
Others     6                    166
Total      213                  6,400

Searches in the United States in March 2006

Google search engine


Top U.S. Search Providers by Searches, May 2007

Provider            Searches (000)   Share of Total Searches (%)
Google              4,033,277        56.3
Yahoo               1,540,949        21.5
MSN/Windows Live    605,400          8.4
AOL                 381,961          5.3
Ask.com             142,418          2.0
My Web Search       61,784           0.9
Comcast             34,908           0.5
EarthLink           33,461           0.5
My Way              30,122           0.4
Dogpile.com         26,295           0.4
Other               275,365          3.8
All search          7,165,940        100.0

Source: Nielsen//NetRatings, 2007

Google has one of the largest databases of Web pages, including many other types of web documents: blog posts, wiki pages, group discussion threads, and document formats (e.g., PDFs, Word or Excel documents, PowerPoints). Despite the presence of all these formats, Google's popularity ranking often makes pages worth looking at rise near the top of search results.

Second Opinion in searching

Google alone is often not sufficient, however. Less than half the searchable Web is fully searchable in Google. Overlap studies show that about half of the pages in any search engine database exist only in that database. Getting a second opinion is therefore often worth your time.
For a second opinion, try Ask.com or Yahoo! Search.

Features in common

Things You CAN Do in Google, Yahoo!, and Ask.com
Phrase searching by enclosing terms in double quotes
OR searching with capitalized OR
- excludes, + requires exact form of word
Limit results by language in Advanced Search

Things NOT Supported in Google, Yahoo!, or Ask.com
Truncation: use OR searches for variants (airline OR airlines)
Case sensitivity: capitalization does not matter
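
The common query features listed above can be sketched as a small matcher. This is an illustration only, not how any of the three engines is implemented: the function names `normalise`, `parse_query` and `matches` are made up for the example, AND is assumed between terms, only a capitalized OR is treated as an operator, and since the sketch does no stemming, `+` is simply treated as "required".

```python
import re

def normalise(text):
    """Lower-case and keep only word characters, so matching ignores case."""
    return " ".join(re.findall(r"\w+", text.lower()))

def parse_query(query):
    """Return (groups, excluded): every group must match via any one of its terms."""
    groups, excluded = [], []
    join_next = False
    for token in re.findall(r'[-+]?"[^"]+"|\S+', query):
        if token == "OR":                 # only a capitalized OR is an operator
            join_next = True
            continue
        term = normalise(token)
        if not term:
            continue
        if token.startswith("-"):
            excluded.append(term)         # - excludes
        elif join_next and groups:
            groups[-1].append(term)       # alternative to the previous term
        else:
            groups.append([term])         # implicit AND between groups
        join_next = False
    return groups, excluded

def matches(page_text, query):
    """True if the page text satisfies the query under the rules above."""
    text = normalise(page_text)
    groups, excluded = parse_query(query)

    def has(term):
        return re.search(r"\b" + re.escape(term) + r"\b", text) is not None

    return all(any(has(t) for t in group) for group in groups) and not any(
        has(t) for t in excluded
    )

print(matches("Cheap flights from one airline", "airline OR airlines cheap"))
print(matches("Compare airlines and cheap hotel deals", "cheap -hotel"))
```

Because there is no truncation, the variant "airlines" has to be listed explicitly with OR, exactly as the slide recommends.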

Features in which the search engines differ

Search Engine: Google (www.google.com), Yahoo! Search (search.yahoo.com), Ask.com (www.ask.com)

Links to help
Google: Google help pages
Yahoo! Search: Yahoo! help pages
Ask.com: Ask help pages

Size, type
Google: HUGE. Size not disclosed in any way that allows comparison. Probably the biggest.
Yahoo! Search: HUGE. Claims over 20 billion total "web objects."
Ask.com: LARGE. Claims to have 2 billion fully indexed, searchable pages.

Noteworthy features and limitations
Google: Popularity ranking using PageRank. Indexes the first 101KB of a Web page, and 120KB of PDFs. ~ before a word finds synonyms sometimes (~help > FAQ, tutorial, etc.)
Yahoo! Search: Shortcuts give quick access to dictionary, synonyms, patents, traffic, stocks, encyclopedia, and more.
Ask.com: Subject-Specific Popularity ranking. Suggests broader and narrower terms.

Boolean logic
Google: Partial. AND assumed between words. Capitalize OR. - excludes. No ( ) or nesting. In Advanced Search, partial Boolean available in boxes.
Yahoo! Search: Accepts AND, OR, NOT or AND NOT, and ( ). Must be capitalized. You must enclose terms joined by OR in parentheses (classic Boolean).
Ask.com: Partial. AND assumed between words. Capitalize OR. - excludes. No ( ) or nesting.

+Requires / -Excludes
Google: - excludes. + will allow you to retrieve "stop words" (e.g., +in)
Yahoo! Search: - excludes. + will allow you to search common words: "+in truth"
Ask.com: - excludes. + will allow you to retrieve "stop words" (e.g., +in)

Sub-Searching
Google: Sort of. At bottom of results page, click "Search within results" and enter more terms. Adds terms.
Yahoo! Search: Add terms.
Ask.com: Sort of. Add terms.

Features in which the search engines differ

Search Engine: Google (www.google.com), Yahoo! Search (search.yahoo.com), Ask.com (www.ask.com)

Results Ranking
Google: Based on page popularity measured in links to it from other pages: high rank if a lot of other pages link to it. Fuzzy AND also invoked. Matching and ranking based on "cached" version of pages that may not be the most recent version.
Yahoo! Search: Automatic Fuzzy AND.
Ask.com: Based on Subject-Specific Popularity, links to a page by related pages.

Field limiting
Google: link: site: intitle: inurl: Offers U.S. Gov't Search and other special searches. Patent search.
Yahoo! Search: link: site: intitle: inurl: url: hostname:
Ask.com: intitle: inurl: site:

Truncation, Stemming
Google: No truncation. Stems some words. Search variant endings and synonyms separately, separating with OR (capitalized): airline OR airlines
Yahoo! Search: Neither. Search with OR as in Google.
Ask.com: Neither. Search with OR as in Google.

Features in which the search engines differ

Search Engine: Google (www.google.com), Yahoo! Search (search.yahoo.com), Ask.com (www.ask.com)

Language
Google: Yes. Major Romanized and non-Romanized languages in Advanced Search.
Yahoo! Search: Yes. Major Romanized and non-Romanized languages.
Ask.com: Yes. Major Romanized languages. Use Advanced Search to limit.

Translation
Google: Yes, in "Translate this page" link following some pages. To and sometimes from English and major European languages and Chinese, Japanese, Korean.
Yahoo! Search: Yes.
Ask.com: No.

Role of search engines for e-commerce
80% of traffic determined by search
60% would use search to research a purchase
67% would choose a natural search result
Examples (each month in the UK):
  500,000 search for shopping
  100,000 for clothes, shirts & shoes
  1,000,000 for mobile phone
  250,000 for furniture
  25,000 for bed linen

Why it matters financially

Shopper enters search. About 10,000 people looked for ski jackets in Dec 2004.
Natural/organic search results: over 79,000 pages in the UK.
Analysis suggests roughly 30% of searchers will click a top three result, another 20% on the rest of page one (top ten).
Paid-for search results at a cost of 0.62p per click for this keyword. Rough click-through rate of 5-10%.

The terms "natural" and "organic" search engine listing describe the "editorial" search results on any particular engine. These results are professed to be non-biased, meaning that the engine will not accept money to influence the rankings of any individual sites.

Why it matters financially

Search term       Number of searches   Click thrus   Visitors   Conv. ratio   Orders   Value p.m.   Value p.a.
DVD player        300,000              30%           90,000     0.05%         45       20,205       242,460
Sony DVD player   10,000               30%           3,000      1%            30       13,470       161,640
Sony RDR-GX7      1,000                30%           300        5%            15       6,735        80,820

Assumes a top three search result and a purchase price of 449.
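
The figures in the conversion table above can be reproduced with a few lines of arithmetic. The sketch below assumes the three search terms pair with the three rows in the order shown (most generic term, most searches) and uses the slide's 30% click-through and purchase price of 449; the function name `funnel` is made up for the example.

```python
PRICE = 449            # purchase price assumed on the slide
CLICK_THROUGH = 0.30   # share of searchers who click a top-three result

# (search term, searches per month, conversion ratio), as read from the table
ROWS = [
    ("DVD player", 300_000, 0.0005),
    ("Sony DVD player", 10_000, 0.01),
    ("Sony RDR-GX7", 1_000, 0.05),
]

def funnel(searches, conversion, click_through=CLICK_THROUGH, price=PRICE):
    """Return (visitors, orders, value per month, value per annum)."""
    visitors = round(searches * click_through)
    orders = round(visitors * conversion)
    monthly_value = orders * price
    return visitors, orders, monthly_value, monthly_value * 12

for term, searches, conversion in ROWS:
    print(term, funnel(searches, conversion))
```

The more specific the search term, the fewer the searches but the higher the conversion ratio, which is why the narrowest term still yields a third of the orders of the broadest one.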

What's the goal?

All products and categories appear in Google & other major search engines: completely & elegantly
Sites perform well for generic searches
[Pie chart: share of searches by engine (Google, Yahoo, MSN, AOL, Lycos, Altavista, Ask Jeeves, Others)]

How do you come top of Google?

1) Indexability
2) Relevance
3) Link popularity

Indexability
The site must be navigable by robots and spiders
Its content must be readable
Robots don't like frames
Robots don't like Flash
Robots can't read into product catalogues

What robots read and index
page titles
URLs
body copy
links
image names
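
As a rough illustration of what a robot can pull out of a page, the sketch below uses Python's standard `html.parser` to collect the page title, body copy, links and image names from a small, made-up page. The class name `RobotView` and the sample HTML are assumptions for the example; real crawlers are far more elaborate.

```python
from html.parser import HTMLParser

class RobotView(HTMLParser):
    """Collects the title, body words, link targets and image names of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.body_words = []
        self.links = []
        self.images = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()
        else:
            self.body_words.extend(data.split())

# Hypothetical page used only for this example.
page = """<html><head><title>Ski jackets</title></head>
<body><h1>Warm ski jackets</h1>
<a href="/jackets/red.html">Red jacket</a>
<img src="red-ski-jacket.jpg" alt="Red ski jacket">
</body></html>"""

view = RobotView()
view.feed(page)
print(view.title, view.links, view.images, view.body_words)
```

Content that only exists inside a Flash movie or behind a catalogue search form never reaches a parser like this, which is the practical meaning of "robots can't read into product catalogues".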

What robots read and index
description meta tag

Relevance
The content of your site must be relevant
It must reflect the keywords
Keywords are the words or phrases that web users use to search for information on the web
Where and how you place and present these keywords in your site is vital

Choosing keywords
Keywords reflect:
  Your core business/product/service offering
  Your unique sales/value proposition
  What your customers are looking for on the Internet

Other influencing factors
Popularity
Saturation
Relevance
Priorities (& quantity)

Where to put keywords
Page title (the single most important place)
Description meta tag (appears in listings)
Body headers (H1) and copy
Image/file names
Image alt tags
URLs
Keywords meta tag
Offsite descriptions (directories etc.)

What matters
Meta tags
  Description (pay attention to size)
  Page title (pay attention to size)
Category page (make it relevant)
Product page (make it relevant)
Offsite relevance (directories, links)

Popularity
Determined primarily by number of inbound & relevant links
Influenced by frequency and recency of updates
Visible in Google's PageRank

How do I increase popularity?
Get lots of people to link to your site (with the right keywords)
Common approaches:
  Get in the important directories
  Self-managed affiliate programmes
  Develop valuable content
  Research, surveys and quizzes
  Weblogs (blogs)
  Social bookmarks (del.icio.us)

The two most important links
Open Directory Project
Yahoo Directory

Searching a database

How do Search Engines Work?

Search Engines for the general web (like all those listed above) do not really search the World Wide Web directly. Each one searches a database of the full text of web pages selected from the billions of web pages out there residing on servers.
When you search the web using a search engine, you are always searching a somewhat stale copy of the real web page. When you click on links provided in a search engine's search results, you retrieve from the server the current version of the page.

Robots: Spider

Search engine databases are selected and built by computer robot programs called spiders (Web crawlers).
Although it is said they "crawl" the web in their hunt for pages to include, in truth they stay in one place. They find the pages for potential inclusion by following the links in the pages they already have in their database (i.e., already "know about"). They cannot think or type a URL or use judgment to "decide" to go look something up and see what's on the web about it.

Page Links submission

If a web page is never linked to in any other page, search engine spiders cannot find it. The only way a brand new page (one that no other page has ever linked to) can get into a search engine is for its URL to be sent by some human to the search engine companies as a request that the new page be included.
All search engine companies offer ways to do this.
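
The link-following behaviour described above can be sketched as a breadth-first traversal over a link graph. The graph `WEB` and the function `crawl` are hypothetical stand-ins: a real spider fetches pages over HTTP, whereas this example keeps everything in memory to show why an unlinked page stays invisible until someone submits its URL.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to.
WEB = {
    "home.html": ["products.html", "about.html"],
    "products.html": ["ski-jackets.html", "home.html"],
    "about.html": [],
    "ski-jackets.html": ["home.html"],
    "orphan.html": ["home.html"],  # links out, but no page links to it
}

def crawl(seeds, web=WEB):
    """Return every page reachable from the seed pages by following links."""
    known = list(seeds)      # pages the spider already "knows about", in order found
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in web.get(page, []):
            if target not in seen:
                seen.add(target)
                known.append(target)
                queue.append(target)
    return known

print(crawl(["home.html"]))                  # orphan.html is never discovered
print(crawl(["home.html", "orphan.html"]))   # found once its URL is submitted as a seed
```

Submitting a URL corresponds to adding it to the seed list: from then on the spider can reach it and everything it links to.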

Indexing

After spiders find pages, they pass them on to another computer program for "indexing."
This program identifies the text, links, and other content in the page and stores it in the search engine database's files so that the database can be searched by keyword and whatever more advanced approaches are offered, and the page will be found if your search matches its content.

"Spiders" take a Web page's content and create key search words that enable online users to find pages they're looking for.

What to look for

Meta Tags

When the Google spider looked at an HTML page, it took note of two things:
The words within the page
Where the words were found

Meta tags allow the owner of a page to specify key words and concepts under which the page will be indexed.
There is, however, a danger in over-reliance on meta tags, because a careless or unscrupulous page owner might add meta tags that fit very popular topics but have nothing to do with the actual contents of the page. To protect against this, spiders will correlate meta tags with page content, rejecting the meta tags that don't match the words on the page.
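
One simple way to picture this correlation step is to keep only those meta keywords whose words actually occur in the page text. This is a sketch under that assumption; the slides do not say how any particular spider performs the check, and the function name `accepted_meta_keywords` is made up for the example.

```python
import re

def accepted_meta_keywords(meta_keywords, body_text):
    """Keep only the meta keywords whose every word also occurs in the page body."""
    body_words = set(re.findall(r"[a-z0-9]+", body_text.lower()))
    accepted = []
    for keyword in meta_keywords:
        words = re.findall(r"[a-z0-9]+", keyword.lower())
        if words and all(word in body_words for word in words):
            accepted.append(keyword)
    return accepted

body = "A guide to stamp collecting: buying rare stamps."
print(accepted_meta_keywords(["stamp collecting", "free music", "stamps"], body))
```

Here the popular but unrelated keyword "free music" is rejected because neither word appears on the page.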

Words occurring in the title, subtitles, meta tags and other positions of relative importance were noted for special consideration during a subsequent user search.
The Google spider was built to index every significant word on a page, leaving out the articles "a," "an" and "the." Other spiders take different approaches.
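
The two things the spider notes, the words and where they were found, map naturally onto an index from each word to a list of locations. The sketch below is an assumption-laden illustration: the field names ("title", "body") and the function `index_page` are invented for the example, and only the three articles named above are skipped.

```python
import re
from collections import defaultdict

ARTICLES = {"a", "an", "the"}  # the only words left out, as described above

def index_page(url, fields, index=None):
    """Add one page to the index.

    fields maps a location name (e.g. "title", "body") to the text found there.
    Each index entry records (url, location, word position within that location).
    """
    index = defaultdict(list) if index is None else index
    for location, text in fields.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, word in enumerate(words):
            if word not in ARTICLES:
                index[word].append((url, location, position))
    return index

index = index_page("p1", {"title": "The Stamp Guide", "body": "A guide to collecting stamps"})
print(index["guide"])
```

Keeping the location alongside each word is what later lets a title hit count for more than a hit deep in the body copy.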

Meta tag (NTU)

The Meta Description Tag

The meta description tag allows you to influence the description of your page in the crawlers that support the tag.
But Google ignores the meta description tag and instead will automatically generate its own description for this page.

Meta Robots Tag

Indexing: weight

The robots tag lets you specify that a particular page should NOT be indexed by a search engine.
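
A crawler that honours the robots tag has to read it before indexing the page. The sketch below uses Python's standard `html.parser` to look for `<meta name="robots" content="noindex">`; the class `MetaReader` and the function `may_index` are names invented for this example.

```python
from html.parser import HTMLParser

class MetaReader(HTMLParser):
    """Collects <meta name="..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            name, content = attrs.get("name"), attrs.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content

def may_index(html):
    """Return False when the page asks not to be indexed via the robots meta tag."""
    reader = MetaReader()
    reader.feed(html)
    directives = [d.strip().lower() for d in reader.meta.get("robots", "").split(",")]
    return "noindex" not in directives

print(may_index('<head><meta name="robots" content="noindex, follow"></head>'))
print(may_index('<head><meta name="description" content="Stamp guide"></head>'))
```

The same reader also exposes the description meta tag as `reader.meta["description"]` for engines that choose to use it.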

To make for more useful results, most search engines store more than just the word and URL. An engine might store the number of times that the word appears on a page. The engine might assign a weight to each entry, with increasing values assigned to words as they appear near the top of the document, in sub-headings, in links, in the meta tags or in the title of the page.
Each commercial search engine has a different formula for assigning weight to the words in its index.
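
To make the idea of weighted index entries concrete, the sketch below stores, for one word, the number of occurrences on each page and a weight that depends on where the word appears. The weight values are purely hypothetical, since each engine keeps its own formula, and the names `WEIGHTS`, `entry_weight` and `build_entries` are invented for the example.

```python
# Hypothetical weights; each commercial engine uses its own undisclosed formula.
WEIGHTS = {"title": 5.0, "heading": 3.0, "link": 2.0, "meta": 2.0, "body": 1.0}

def entry_weight(occurrences):
    """Sum the weight of every location in which the word occurs on one page."""
    return sum(WEIGHTS.get(location, 1.0) for location in occurrences)

def build_entries(pages, word):
    """Return (url, count, weight) entries for a word, heaviest first."""
    entries = []
    for url, locations_by_word in pages.items():
        hits = locations_by_word.get(word, [])
        if hits:
            entries.append((url, len(hits), entry_weight(hits)))
    return sorted(entries, key=lambda entry: entry[2], reverse=True)

# page -> word -> list of locations where the word was found (made-up data)
pages = {
    "a.html": {"stamps": ["title", "body", "body"]},
    "b.html": {"stamps": ["body", "body", "body", "body"]},
    "c.html": {"coins": ["title"]},
}
print(build_entries(pages, "stamps"))
```

Page a.html mentions the word fewer times than b.html, yet it ranks first because one of its mentions is in the title.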

How Search Engines Rank Web Pages

How do crawler-based search engines go about determining relevancy? They follow a set of rules, known as an algorithm.
Exactly how a particular search engine's algorithm works is a closely-kept trade secret.
However, all major search engines follow the general rules below.

One of the main rules in a ranking algorithm involves the location and frequency of keywords on a web page. Call it the location / frequency method, for short.
Pages with the search terms appearing in the HTML title tag are often assumed to be more relevant than others to the topic. Search engines will also check to see if the search keywords appear near the top of a web page.
Frequency is the other major factor in how search engines determine relevancy. A search engine will analyze how often keywords appear in relation to other words in a web page.
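
The location / frequency method described above can be sketched as a scoring function. The bonus values, the "near the top" cut-off of 20 words and the function name `location_frequency_score` are all assumptions made for illustration, since real ranking algorithms are trade secrets. The sketch handles single-word keywords only.

```python
import re

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def location_frequency_score(title, body, keyword, top_words=20):
    """Score one page for one single-word keyword by location and frequency."""
    keyword = keyword.lower()
    body_words = tokenize(body)
    score = 0.0
    if keyword in tokenize(title):
        score += 2.0                       # location: keyword in the HTML title
    if keyword in body_words[:top_words]:
        score += 1.0                       # location: keyword near the top of the page
    if body_words:
        # frequency: occurrences relative to all words on the page
        score += body_words.count(keyword) / len(body_words)
    return score

page_a = location_frequency_score(
    "Stamp collecting guide",
    "Stamp collecting is a hobby. Every stamp tells a story.",
    "stamp",
)
page_b = location_frequency_score(
    "Hobbies", "Many hobbies exist. Some people like the stamp.", "stamp"
)
print(page_a > page_b)
```

Using a ratio rather than a raw count means a long page does not win simply by repeating the keyword more often.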

How Search Engines Rank Web Pages: "off the page" ranking criteria
Off the page factors are those that a webmaster cannot easily influence. Chief among these is link analysis.
By analyzing how pages link to each other, a search engine can determine both what a page is about and whether that page is deemed to be "important".

How Search Engines Rank Web Pages
In addition, sophisticated techniques are used to screen out attempts by webmasters to build "artificial" links designed to boost their rankings.
Another off the page factor is click-through measurement. A search engine may watch what results someone selects for a particular search, then eventually drop high-ranking pages that aren't attracting clicks, while promoting lower-ranking pages that do pull in visitors.
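
Link analysis of the kind described above is often illustrated with a simplified PageRank-style iteration, in which each page repeatedly passes a share of its score to the pages it links to. The sketch below is a textbook simplification, not Google's production algorithm; the damping value 0.85, the iteration count and the function name `link_scores` are assumptions for the example.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Simplified PageRank-style scores for a graph given as page -> linked pages."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # a page with no outgoing links spreads its score evenly
                for other in pages:
                    new_scores[other] += damping * scores[page] / n
        scores = new_scores
    return scores

# Hypothetical graph: three pages link to "hub", which links back to "a".
links = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
scores = link_scores(links)
print(sorted(scores, key=scores.get, reverse=True))
```

The page with the most inbound links scores highest, and a link from that page lifts "a" above "b" and "c", which is why links from well-linked pages are worth more.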

Placement Tips for most "relevant"

Pick Your Target Keywords
How do you think people will search for your web page? The words you imagine them typing into the search box are your target keywords.
Your target keywords should always be two or more words long.

Position Your Keywords
Make sure your target keywords appear in the crucial locations on your web pages. The page's HTML title tag is most important.
Build your titles around the top two or three phrases that you would like the page to be found for.

Create Relevant Content
Your keywords need to be reflected in the page content. Consider "expanding" your text references, where appropriate.
For example, a stamp collecting page might have references to "collectors" and "collecting." Expanding these references to "stamp collectors" and "stamp collecting" reinforces your strategic keywords in a legitimate and natural manner.

Build Inbound Links
Every major search engine uses link analysis as part of its ranking algorithm. By building links, you can help improve how well your pages perform in link analysis systems.
You want links from good web pages that are related to the topics you want to be found for.

Here's one simple means to find those good links.
Using a search engine, search for your target keywords. Look at the pages that appear in the top results. Now visit those pages and ask the site owners if they will link to you. Not everyone will, especially sites that are extremely competitive with yours.

Submit Your Key Pages

Most search engines will index the other pages from your web site by following links from a page you submit to them.
Submit the top two or three pages that best summarize your web site.

Verify and Maintain Your Listing

Invisible Web pages

Some types of pages and links are excluded from most search engines by policy. Others are excluded because search engine spiders cannot access them. Pages that are excluded are referred to as the "Invisible Web": what you don't see in search engine results.

Submitting To Directories: Yahoo & The Open Directory

The Open Directory Project (aka ODP or DMOZ) is a volunteer-built guide to the web.
It is provided as an option at many major search engines, including Google. Given this, being listed with the Open Directory can add value to any site. Submission is absolutely free.
The name dmoz comes from directory.mozilla.org, ODP's original domain name.

Yahoo maintains its own independent "directory" of Web sites.
Anyone can use Standard submission to submit for free to a non-commercial category.

Paid Search Advertising: Google AdWords, Yahoo Search Marketing & Microsoft adCenter

Every major search engine with significant market share accepts paid listings.
This unique form of search engine advertising guarantees that your site will appear in the top results for the keyword terms you target within a day or less. Paid search listings are also called sponsored listings and/or Pay Per Click (PPC) listings.

What Makes a Search Engine Good?

Parts of Search Engines: variables, and their implications for your searches

Database of web pages
Size of database: How many documents does the search engine claim it has? How much of the total web are you able to search?
Freshness ("up-to-dateness"): Search engine databases consist of copies of web pages and other documents that were made when their crawlers or spiders last visited each site. How often is the database refreshed to find new pages? How often do their crawlers update the copies of the web pages you are searching?
Completeness of documents' text: Is the database really "full" text, or only parts of the pages? Is every word indexed?
Types of documents offered: All search engines offer web pages. Do they also have extensive PDF, Word, Excel, PowerPoint, and other formats like WordPerfect? Are they full-text searchable?
Speed and consistency: How fast is it? How consistent is it? Do you get different results at different times?

What Makes a Search Engine Good?

Parts of Search Engines: variables, and their implications for your searches

The search engine's capabilities
All search engines let you enter some keywords and search on them. What happens inside? How reliably and easily can you limit in ways that will increase your chances of finding what you are looking for?
Basic Search options and limitations: Automatic default of AND assumed between words? Accepts " " to create phrases? Is there an easy way to allow for synonyms and equivalent terms (OR searching)? Can you OR phrases or just single words?
Advanced Search options and limitations: Can you require your search terms in specific fields, such as the document title? Can you require some words in certain fields and others anywhere? Can you restrict to documents only from a certain domain (org, edu, gov, etc.)? Limit to more than one or only one? Can you limit by type of document (pdf or excel, etc.)? More than one? Can you limit by language? Can you limit to date last updated?
General limitations and features: What do you have to do to make it search on common or stop words? Maximum limit on search terms or on search complexity? Ability to search within previous results? Can you count on consistent results from search to search and from day to day? Can you customize the search or display? Is there a "family" filter? Does it work well? Is it easy to turn on or off?

Results display
All search engines return a list of results they "think" are what you are looking for. How well does an engine "think" the way you expect it to?
Ranking: Are they ranked by popularity or relevancy or both? Do pages with your words juxtaposed (like a phrase) rank highest? Do you get pages with only some of your words, perhaps in addition to pages with them all?
Display: Are your keywords highlighted in context, showing excerpts from the web pages which caused the match? Some other excerpt from the page?
Collapse pages from the same site: If it shows only one or a few pages from a site, does it show the one(s) with your terms? How easy is it to see all from the site? Can this be changed and saved as your preferred search method?

Other features
Search engine designers try to come up with all kinds of features and services that they hope will lure you to their services.

Summary
Importance of search engines
How they work
How to use them in terms of ranking
Features
