
Performance Evaluation of Information Retrieval Systems


Why System Evaluation?

There are many retrieval models/algorithms/systems; which one is the best?
What is the best component for:
  Ranking function (dot product, cosine, ...)
  Term selection (stopword removal, stemming)
  Term weighting (TF, TF-IDF, ...)
How far down the ranked list will a user need to look to find some/all relevant documents?



Relevance in IR
In information science and in IR, relevance
denotes how well a retrieved document or set of
documents meets the information need of the user.
Relevance may include concerns such as
timeliness, authority or novelty of the result.



Precision and Recall
Precision (also called positive predictive value)
  The ability to retrieve top-ranked documents that are mostly relevant. In other words, it is the fraction of retrieved documents that are relevant.
Recall (also called sensitivity)
  The ability of the search to find all of the relevant documents in the corpus. In other words, it is the fraction of relevant documents that are retrieved.



Precision and Recall

[Diagram: the entire document collection is divided along two axes, relevant vs. irrelevant and retrieved vs. not retrieved, giving four regions: retrieved & relevant, retrieved & irrelevant, not retrieved but relevant, not retrieved & irrelevant.]

recall = number of relevant documents retrieved / total number of relevant documents

precision = number of relevant documents retrieved / total number of documents retrieved
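
A minimal sketch of these two formulas in Python, treating the retrieved results and the relevance judgments as sets of document IDs (the function and variable names are illustrative, not taken from the slides):

    def precision_recall(retrieved, relevant):
        """Precision and recall for a set of retrieved document IDs."""
        retrieved, relevant = set(retrieved), set(relevant)
        hits = retrieved & relevant                       # retrieved & relevant
        precision = len(hits) / len(retrieved) if retrieved else 0.0
        recall = len(hits) / len(relevant) if relevant else 0.0
        return precision, recall

    # 10 documents retrieved, 5 of them among the 10 relevant ones
    p, r = precision_recall(range(10), range(5, 15))
    print(p, r)   # 0.5 0.5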



Trade-off between Recall and Precision
[Precision-recall trade-off plot: precision on the vertical axis (0 to 1), recall on the horizontal axis (0 to 1). The ideal system sits at the top-right corner. The high-precision/low-recall region returns relevant documents but misses many useful ones; the high-recall/low-precision region returns most relevant documents but includes lots of junk.]



Precision vs. Recall

In information retrieval, a perfect precision score of 1.0 means that every result retrieved by a search was relevant (but says nothing about whether all relevant documents were retrieved), whereas a perfect recall score of 1.0 means that all relevant documents were retrieved by the search (but says nothing about how many irrelevant documents were also retrieved).



Precision vs. Recall, cont.

Precision can be seen as a measure of exactness or quality, whereas recall is a measure of completeness or quantity. In simple terms, high precision means that an algorithm returned substantially more relevant results than irrelevant ones, while high recall means that an algorithm returned most of the relevant results.



Example 1

There are 20 documents in total in a corpus, of which 10 are relevant. If a search engine retrieves 10 documents, of which 5 are relevant, compute precision, recall, and F-measure.
Solution: Precision = 5/10 = 1/2 = 0.5
Recall = 5/10 = 0.5
F-measure = 2(0.5*0.5)/(0.5+0.5) = 0.5
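
The arithmetic above can be checked with a small helper for the F-measure (the harmonic mean of precision and recall, defined formally on a later slide); this is a sketch, not code from the course:

    def f_measure(precision, recall):
        """F-measure: harmonic mean of precision and recall."""
        if precision + recall == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)

    relevant_retrieved, retrieved, relevant_total = 5, 10, 10
    p = relevant_retrieved / retrieved          # 0.5
    r = relevant_retrieved / relevant_total     # 0.5
    print(p, r, f_measure(p, r))                # 0.5 0.5 0.5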



Computing Recall/Precision Points: Example 1

Let total # of relevant docs = 6. Check each new recall point:

n   doc #  relevant
1   588    x    R = 1/6 = 0.167; P = 1/1 = 1
2   589    x    R = 2/6 = 0.333; P = 2/2 = 1
3   576
4   590    x    R = 3/6 = 0.5;   P = 3/4 = 0.75
5   986
6   592    x    R = 4/6 = 0.667; P = 4/6 = 0.667
7   984
8   988
9   578
10  985
11  103
12  591
13  772    x    R = 5/6 = 0.833; P = 5/13 = 0.38
14  990

One relevant document is missing from the ranking, so 100% recall is never reached.
Computing Recall/Precision Points: Example 2

Let total # of relevant docs = 6. Check each new recall point:

n   doc #  relevant
1   588    x    R = 1/6 = 0.167; P = 1/1 = 1
2   576
3   589    x    R = 2/6 = 0.333; P = 2/3 = 0.667
4   342
5   590    x    R = 3/6 = 0.5;   P = 3/5 = 0.6
6   717
7   984
8   772    x    R = 4/6 = 0.667; P = 4/8 = 0.5
9   321    x    R = 5/6 = 0.833; P = 5/9 = 0.556
10  498
11  113
12  628
13  772
14  592    x    R = 6/6 = 1.0;   P = 6/14 = 0.429
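
These recall/precision points can be computed mechanically from a ranked list of relevance judgments. A sketch (illustrative names; the boolean list encodes Example 2, with True marking a relevant document at that rank):

    def recall_precision_points(ranking, total_relevant):
        """(recall, precision) at each rank where a relevant document appears."""
        points, relevant_seen = [], 0
        for rank, is_relevant in enumerate(ranking, start=1):
            if is_relevant:
                relevant_seen += 1
                points.append((relevant_seen / total_relevant, relevant_seen / rank))
        return points

    example2 = [True, False, True, False, True, False, False,
                True, True, False, False, False, False, True]
    for r, p in recall_precision_points(example2, total_relevant=6):
        print(f"R = {r:.3f}, P = {p:.3f}")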
R-Precision

Precision at the R-th position in the ranking of results for a query that has R relevant documents.

n   doc #  relevant
1   588    x
2   589    x         R = # of relevant docs = 6
3   576
4   590    x
5   986
6   592    x         R-Precision = 4/6 = 0.67
7   984
8   988
9   578
10  985
11  103
12  591
13  772    x
14  990
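
A sketch of R-precision over the same kind of ranked relevance list (illustrative names; the list encodes the ranking above, with relevant documents at ranks 1, 2, 4, 6 and 13):

    def r_precision(ranking, total_relevant):
        """Precision within the top R ranks, where R = total number of relevant docs."""
        return sum(ranking[:total_relevant]) / total_relevant

    ranking = [True, True, False, True, False, True, False,
               False, False, False, False, False, True, False]
    print(r_precision(ranking, total_relevant=6))   # 4/6 ≈ 0.67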
Mean Average Precision (MAP)

Average Precision: Average of the precision values at the points at which each relevant document is retrieved.
Ex1: (1 + 1 + 0.75 + 0.667 + 0.38 + 0)/6 = 0.633 (the relevant document that is never retrieved contributes 0)
Ex2: (1 + 0.667 + 0.6 + 0.5 + 0.556 + 0.429)/6 = 0.625

Mean Average Precision: Average of the average precision values over a set of queries.
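
A sketch of average precision and MAP following the definition above (illustrative names; relevant documents that are never retrieved contribute a precision of 0, as in Ex1):

    def average_precision(ranking, total_relevant):
        """Average of the precision values at each relevant document's rank;
        relevant documents never retrieved contribute 0."""
        precisions, relevant_seen = [], 0
        for rank, is_relevant in enumerate(ranking, start=1):
            if is_relevant:
                relevant_seen += 1
                precisions.append(relevant_seen / rank)
        return sum(precisions) / total_relevant

    def mean_average_precision(rankings, totals):
        """MAP: mean of the per-query average precision values."""
        return sum(average_precision(r, t) for r, t in zip(rankings, totals)) / len(rankings)

    ex1 = [True, True, False, True, False, True, False, False,
           False, False, False, False, True, False]             # Example 1 ranking
    ex2 = [True, False, True, False, True, False, False,
           True, True, False, False, False, False, True]        # Example 2 ranking
    print(average_precision(ex1, 6))                     # ≈ 0.633
    print(average_precision(ex2, 6))                     # ≈ 0.625
    print(mean_average_precision([ex1, ex2], [6, 6]))    # ≈ 0.629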




MAP (Example)



F-Measure

One measure of performance that takes into account both recall and precision.
Harmonic mean of recall and precision:

    F = 2PR / (P + R) = 2 / (1/R + 1/P)

Compared to the arithmetic mean, both precision and recall need to be high for the harmonic mean to be high.
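
A small numeric illustration of that point (the values are made up for contrast): with a very unbalanced P and R, the arithmetic mean still looks respectable, while the harmonic mean (F) stays low:

    p, r = 0.9, 0.1
    print((p + r) / 2)           # 0.5  (arithmetic mean)
    print(2 * p * r / (p + r))   # 0.18 (harmonic mean / F-measure)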



Answer the following

1. A search engine returns 40 pages, only 20 of which are relevant, while failing to return 50 additional relevant pages. Precision = ? Recall = ?
2. Suppose a program for finding duplicate pages identifies 7 duplicates in a test sample containing 9 duplicate pages and some spam pages. If 4 of the identifications are correct but 3 are actually spam pages, precision = ? recall = ?
END

