
Performance Evaluation of Information Retrieval Systems


Why System Evaluation?

There are many retrieval models/algorithms/systems; which one is the best?
What is the best component for:
  Ranking function (dot product, cosine, ...)
  Term selection (stopword removal, stemming)
  Term weighting (TF, TF-IDF, ...)
How far down the ranked list will a user need to look to find some/all relevant documents?



Relevance in IR
In information science and in IR, relevance
denotes how well a retrieved document or set of
documents meets the information need of the user.
Relevance may include concerns such as
timeliness, authority or novelty of the result.



Precision and Recall
Precision (also called positive predictive value)
  The ability to retrieve top-ranked documents that are mostly relevant. In other words, it is the fraction of retrieved documents that are relevant.
Recall (also called sensitivity)
  The ability of the search to find all of the relevant documents in the corpus. In other words, it is the fraction of relevant documents that are retrieved.



Precision and Recall

[Diagram: the entire document collection is divided along two axes, relevant vs. irrelevant and retrieved vs. not retrieved, giving four regions: retrieved & relevant, retrieved & irrelevant, not retrieved but relevant, not retrieved & irrelevant.]

recall = number of relevant documents retrieved / total number of relevant documents

precision = number of relevant documents retrieved / total number of documents retrieved
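
A minimal sketch of these two formulas in Python, treating the retrieved results and the relevance judgments as sets of document IDs (the function and variable names are illustrative, not taken from the slides):

    def precision_recall(retrieved, relevant):
        """Precision and recall for a set of retrieved document IDs."""
        retrieved, relevant = set(retrieved), set(relevant)
        hits = retrieved & relevant                       # retrieved & relevant
        precision = len(hits) / len(retrieved) if retrieved else 0.0
        recall = len(hits) / len(relevant) if relevant else 0.0
        return precision, recall

    # 10 documents retrieved, 5 of them among the 10 relevant ones
    p, r = precision_recall(range(10), range(5, 15))
    print(p, r)   # 0.5 0.5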



Trade-off between Recall and Precision
[Precision-recall trade-off plot: precision on the vertical axis (0 to 1), recall on the horizontal axis (0 to 1). The ideal system sits at the top-right corner. The high-precision/low-recall region returns relevant documents but misses many useful ones; the high-recall/low-precision region returns most relevant documents but includes lots of junk.]



Precision vs. Recall

In information retrieval, a perfect precision score of 1.0 means that every result retrieved by a search was relevant (but says nothing about whether all relevant documents were retrieved), whereas a perfect recall score of 1.0 means that all relevant documents were retrieved by the search (but says nothing about how many irrelevant documents were also retrieved).



Precision vs. Recall, cont.

Precision can be seen as a measure of exactness or quality, whereas recall is a measure of completeness or quantity. In simple terms, high precision means that an algorithm returned substantially more relevant results than irrelevant ones, while high recall means that an algorithm returned most of the relevant results.



Example 1

There are 20 documents in total in a corpus, of which 10 are relevant. If a search engine retrieves 10 documents, of which 5 are relevant, compute precision, recall, and F-measure.
Solution: Precision = 5/10 = 1/2 = 0.5
Recall = 5/10 = 0.5
F-measure = 2(0.5*0.5)/(0.5+0.5) = 0.5
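
The arithmetic above can be checked with a small helper for the F-measure (the harmonic mean of precision and recall, defined formally on a later slide); this is a sketch, not code from the course:

    def f_measure(precision, recall):
        """F-measure: harmonic mean of precision and recall."""
        if precision + recall == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)

    relevant_retrieved, retrieved, relevant_total = 5, 10, 10
    p = relevant_retrieved / retrieved          # 0.5
    r = relevant_retrieved / relevant_total     # 0.5
    print(p, r, f_measure(p, r))                # 0.5 0.5 0.5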



Computing Recall/Precision Points: Example 1

Let total # of relevant docs = 6. Check each new recall point:

n   doc #  relevant
1   588    x    R = 1/6 = 0.167; P = 1/1 = 1
2   589    x    R = 2/6 = 0.333; P = 2/2 = 1
3   576
4   590    x    R = 3/6 = 0.5;   P = 3/4 = 0.75
5   986
6   592    x    R = 4/6 = 0.667; P = 4/6 = 0.667
7   984
8   988
9   578
10  985
11  103
12  591
13  772    x    R = 5/6 = 0.833; P = 5/13 = 0.38
14  990

One relevant document is missing from the ranking, so 100% recall is never reached.
Computing Recall/Precision Points: Example 2

Let total # of relevant docs = 6. Check each new recall point:

n   doc #  relevant
1   588    x    R = 1/6 = 0.167; P = 1/1 = 1
2   576
3   589    x    R = 2/6 = 0.333; P = 2/3 = 0.667
4   342
5   590    x    R = 3/6 = 0.5;   P = 3/5 = 0.6
6   717
7   984
8   772    x    R = 4/6 = 0.667; P = 4/8 = 0.5
9   321    x    R = 5/6 = 0.833; P = 5/9 = 0.556
10  498
11  113
12  628
13  772
14  592    x    R = 6/6 = 1.0;   P = 6/14 = 0.429
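
These recall/precision points can be computed mechanically from a ranked list of relevance judgments. A sketch (illustrative names; the boolean list encodes Example 2, with True marking a relevant document at that rank):

    def recall_precision_points(ranking, total_relevant):
        """(recall, precision) at each rank where a relevant document appears."""
        points, relevant_seen = [], 0
        for rank, is_relevant in enumerate(ranking, start=1):
            if is_relevant:
                relevant_seen += 1
                points.append((relevant_seen / total_relevant, relevant_seen / rank))
        return points

    example2 = [True, False, True, False, True, False, False,
                True, True, False, False, False, False, True]
    for r, p in recall_precision_points(example2, total_relevant=6):
        print(f"R = {r:.3f}, P = {p:.3f}")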
R-Precision

Precision at the R-th position in the ranking of results for a query that has R relevant documents.

n   doc #  relevant
1   588    x
2   589    x         R = # of relevant docs = 6
3   576
4   590    x
5   986
6   592    x         R-Precision = 4/6 = 0.67
7   984
8   988
9   578
10  985
11  103
12  591
13  772    x
14  990
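
A sketch of R-precision over the same kind of ranked relevance list (illustrative names; the list encodes the ranking above, with relevant documents at ranks 1, 2, 4, 6 and 13):

    def r_precision(ranking, total_relevant):
        """Precision within the top R ranks, where R = total number of relevant docs."""
        return sum(ranking[:total_relevant]) / total_relevant

    ranking = [True, True, False, True, False, True, False,
               False, False, False, False, False, True, False]
    print(r_precision(ranking, total_relevant=6))   # 4/6 ≈ 0.67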
Mean Average Precision (MAP)

Average Precision: Average of the precision values at the points at which each relevant document is retrieved.
Ex1: (1 + 1 + 0.75 + 0.667 + 0.38 + 0)/6 = 0.633 (the relevant document that is never retrieved contributes 0)
Ex2: (1 + 0.667 + 0.6 + 0.5 + 0.556 + 0.429)/6 = 0.625

Mean Average Precision: Average of the average precision values over a set of queries.
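
A sketch of average precision and MAP following the definition above (illustrative names; relevant documents that are never retrieved contribute a precision of 0, as in Ex1):

    def average_precision(ranking, total_relevant):
        """Average of the precision values at each relevant document's rank;
        relevant documents never retrieved contribute 0."""
        precisions, relevant_seen = [], 0
        for rank, is_relevant in enumerate(ranking, start=1):
            if is_relevant:
                relevant_seen += 1
                precisions.append(relevant_seen / rank)
        return sum(precisions) / total_relevant

    def mean_average_precision(rankings, totals):
        """MAP: mean of the per-query average precision values."""
        return sum(average_precision(r, t) for r, t in zip(rankings, totals)) / len(rankings)

    ex1 = [True, True, False, True, False, True, False, False,
           False, False, False, False, True, False]             # Example 1 ranking
    ex2 = [True, False, True, False, True, False, False,
           True, True, False, False, False, False, True]        # Example 2 ranking
    print(average_precision(ex1, 6))                     # ≈ 0.633
    print(average_precision(ex2, 6))                     # ≈ 0.625
    print(mean_average_precision([ex1, ex2], [6, 6]))    # ≈ 0.629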




MAP (Example)



F-Measure

One measure of performance that takes into account both recall and precision.
Harmonic mean of recall and precision:

    F = 2PR / (P + R) = 2 / (1/R + 1/P)

Compared to the arithmetic mean, both precision and recall need to be high for the harmonic mean to be high.
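
A small numeric illustration of that point (the values are made up for contrast): with a very unbalanced P and R, the arithmetic mean still looks respectable, while the harmonic mean (F) stays low:

    p, r = 0.9, 0.1
    print((p + r) / 2)           # 0.5  (arithmetic mean)
    print(2 * p * r / (p + r))   # 0.18 (harmonic mean / F-measure)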



Answer the following

1. A search engine returns 40 pages, only 20 of which are relevant, while failing to return 50 additional relevant pages. Precision = ? Recall = ?
2. Suppose a program for finding duplicate pages identifies 7 duplicates in a test sample containing 9 duplicate pages and some spam pages. If 4 of the identifications are correct but 3 are actually spam pages, precision = ? recall = ?
END

