Term Frequency,
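$\mathrm{idf}_i = \log \dfrac{|D|}{|\{d : t_i \in d\}|}$
Where,
• $|D|$ : total number of documents in the corpus
• $|\{d : t_i \in d\}|$ : number of documents where the term $t_i$ appears (that is, where the term's count in the document is nonzero).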
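As a minimal sketch, assuming idf is taken as the logarithm of the total number of documents divided by the number of documents containing the term, the definition can be written in Python as follows. The function name `inverse_document_frequency` and the toy corpus are illustrative assumptions, not part of the original slides.

```python
import math

def inverse_document_frequency(term, corpus):
    """Return ln(|D| / number of documents containing `term`).

    `corpus` is a list of documents, each given as a list of tokens
    (a hypothetical representation chosen for this sketch).
    """
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        # The ratio is undefined when no document contains the term.
        raise ValueError(f"term {term!r} does not appear in the corpus")
    return math.log(len(corpus) / containing)

# Toy corpus of four tokenized documents (illustrative data only).
corpus = [
    ["web", "search", "engine"],
    ["meta", "search", "engine"],
    ["term", "frequency"],
    ["document", "frequency"],
]

idf_search = inverse_document_frequency("search", corpus)  # ln(4 / 2)
idf_meta = inverse_document_frequency("meta", corpus)      # ln(4 / 1)
```

A term that occurs in fewer documents ("meta") receives a higher idf than one that occurs in more documents ("search").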
Inverse Document Frequency
There are many different formulas used to calculate tf–idf.
One way of calculating “document frequency”
(DF) is to determine how many documents contain the
word and divide it by the total number of documents in
the collection.
For example, if the word "computer" appears in 1,000
documents out of a total of 10,000,000, then the
document frequency is 0.0001 (1,000/10,000,000).
A common alternative is to take the logarithm of the
inverse of the document frequency, which gives the
inverse document frequency (idf). The natural
logarithm is commonly used. In this example we
would have
idf = ln(10,000,000 / 1,000) = ln(10,000) ≈ 9.21
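The arithmetic of this example can be checked with a short Python sketch. It assumes idf is the natural logarithm of the total number of documents divided by the number of documents containing the word; the variable names are illustrative.

```python
import math

total_documents = 10_000_000   # size of the collection in the example
documents_with_term = 1_000    # documents containing the word "computer"

# Document frequency: the fraction of documents that contain the word.
document_frequency = documents_with_term / total_documents

# Inverse document frequency: natural log of the inverted ratio.
idf = math.log(total_documents / documents_with_term)

print(f"DF  = {document_frequency:.4f}")
print(f"IDF = {idf:.2f}")
```

Note that ln(DF) itself is negative (about -9.21); inverting the ratio before taking the logarithm yields the positive value.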
Pros and Cons of Meta Search Engines
Pros:
• Easy to use.
• Able to search more web pages in less time.
• High probability of finding the desired page(s).
• Returns at least some results when no result was obtained with traditional search engines.
Cons:
• Metasearch engine results are less relevant, since the metasearch engine does not know the internal "alchemy" of the search engines it uses.
• Since only the top 10-50 hits are retrieved from each search engine, the total number of hits retrieved may be considerably less than found by doing a direct search.
• Advanced search features (such as Boolean operators, field limiting, use of " " and +/-, and a default AND between words) are not usually available.
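The top-hits limitation can be illustrated with a small Python sketch of how a metasearch engine might combine results. Everything here is an assumption made for illustration: the engine names, the URLs, and the merging rule (summing reciprocal ranks), which is one simple choice and not a method described in the slides.

```python
from collections import defaultdict

def merge_top_hits(results_by_engine, top_k=10):
    """Merge the top_k hits of each engine into one ranked list of URLs.

    Each URL scores 1/rank per engine that returns it within its top_k;
    hits ranked below top_k by every engine are never seen at all.
    """
    scores = defaultdict(float)
    for ranked_urls in results_by_engine.values():
        for rank, url in enumerate(ranked_urls[:top_k], start=1):
            scores[url] += 1.0 / rank
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical ranked results from two underlying engines.
results_by_engine = {
    "engine_a": ["a.example/1", "b.example/2", "c.example/3"],
    "engine_b": ["b.example/2", "d.example/4", "a.example/1"],
}

merged = merge_top_hits(results_by_engine, top_k=2)
```

With `top_k=2`, the page `c.example/3` never reaches the merged list even though one engine returned it, which mirrors the point that a direct search can surface more hits.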
Meta Search Engines (Cont.)
Meta-Search Engine | Primary Web Databases | Ad Databases | Special Features
Vivisimo | Ask, MSN, Gigablast, Looksmart, Open Directory, Wisenut | Google | Clusters results
Clusty | Ask, MSN, Gigablast, Looksmart, Open Directory, Wisenut | Google | Clusters results
Ixquick | AltaVista, EntireWeb, Gigablast, Go, Looksmart, Netscape, Open Directory, Wisenut, Yahoo | Yahoo | (none listed)
Dogpile | Ask, Google, MSN, Yahoo!, Teoma, Open Directory, more | Google, Yahoo | All top 4 engines
Mamma | About, Ask, Business.com, EntireWeb, Gigablast, Open Directory, Wisenut | Miva, Ask | Refine options
Kartoo | AlltheWeb, AltaVista, EntireWeb, Exalead, Hotbot, Looksmart, Lycos, MSN, Open Directory, Teoma, ToileQuebec, Voila, Wisenut, Yahoo | ?? | Visual results display
Meta Search Engines (MSEs) Come in Four Flavors