
Volume 3 Issue 12 December 2011

Editor's Desk
F-Score: A combined measure of Precision and Recall


We often say "a mistake is a mistake" to emphasise that, big or small, all mistakes should be condemned. But in real-life situations, all of us want to err on the positive side and give the benefit of the doubt while judging something or someone. In other words, errors are of two types; shall we call them the harmless, soft or positive error and the harmful, harsh or negative error? Take the example of announcing the results of an examination or a test. Student A has really passed, but the system has by mistake declared him as failed; Student B has really failed the test but is declared as passed. The former is the negative, harsh and more harmful error of the two.

                 Student A (really passed)    Student B (really failed)
Declared Pass    Correct                      Type II Error (β)
Declared Fail    Type I Error (α)             Correct

Yet another example is the usual statement about the judiciary that a hundred culprits may escape the clutches of the law, but one innocent should not be punished. In other words, our judicial system should be such that we commit the fewest errors in punishing innocents (Type I error), even if that amounts to some culprits escaping the clutches of the law (Type II error). Yes, you would have guessed rightly that, in hypothesis testing, the margin of error is determined in advance as the level of significance for a given sample size, which we may explore in Researchers Corner.
                 Relevant                 Not Relevant
Retrieved        TP (True Positive)       FP (False Positive)
Not Retrieved    FN (False Negative)      TN (True Negative)

It is interesting that if we try to decrease one type of error, we risk increasing the probability of committing the other type, and vice versa. A trade-off is required on the part of the decision maker, taking the costs and penalties of both types of errors into consideration. This is also the typical relation between Precision (P) and Recall (R) in an information retrieval system, i.e., we cannot simultaneously increase or decrease both. We may recollect that, traditionally, several precision-increasing and recall-increasing techniques have been used in indexing and retrieval. If we mark retrieved and relevant documents in a binary classification as in the table above (it is called a confusion matrix or table of confusion, and is a visualisation tool in AI), the True Positive Rate is Recall, i.e., R = TP / (TP + FN), and P = TP / (TP + FP). Other measures are the True Negative Rate, also called Specificity = TN / (TN + FP), and the proportion of correct predictions, called Accuracy = (TP + TN) / (TP + TN + FP + FN).
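To make these measures concrete, here is a minimal sketch in Python (illustrative only; the counts and variable names are assumptions, not figures from this article) computing Recall, Precision, Specificity and Accuracy from the four cells of the confusion matrix:

    # Illustrative sketch: the four cells of the retrieval confusion matrix.
    # The counts below are made-up sample figures, not data from the article.
    tp = 40   # relevant documents that were retrieved
    fp = 10   # non-relevant documents that were retrieved
    fn = 20   # relevant documents that were missed
    tn = 30   # non-relevant documents correctly left out

    recall = tp / (tp + fn)                       # True Positive Rate
    precision = tp / (tp + fp)                    # exactness of what was retrieved
    specificity = tn / (tn + fp)                  # True Negative Rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)    # proportion of correct predictions

    print(f"R = {recall:.2f}, P = {precision:.2f}")
    print(f"Specificity = {specificity:.2f}, Accuracy = {accuracy:.2f}")

With these sample counts, R = 0.67 while P = 0.80, a reminder that the two measures need not move together.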

In terms of probability, precision (a measure of quality or exactness) is the probability that a randomly selected retrieved document is relevant, and recall (a measure of quantity or completeness) is the probability that a randomly selected relevant document is retrieved. One practical difficulty in calculating recall is assessing the relevant documents not retrieved in a large system, and it is usually estimated. In the search-engine era, a combined Harmonic Mean (HM) of precision and recall, called the F-score, is worked out to test the accuracy of retrieval systems, i.e., F = 2PR / (P + R). For the benefit of those not familiar with HM, it is the reciprocal of the average of the reciprocals of the values. For example, the HM of 4, 5 and 10 = 3 / (1/4 + 1/5 + 1/10) = 60/11 = 5.45.

As HM considers reciprocals, it gives the largest weight to the smallest item and the smallest weight to the largest item. Hence HM is more intuitive than the Arithmetic Mean (AM) when fractional (ratio) values are involved. For example, a system with 1.0 as P and 0.2 as R (though it is useless to have a system with just 20% recall) will have 0.6 as AM but only 0.33 as HM. The F-score as combined P and R could be misleading without knowing the individual precision and recall ratios, because in the above example, if the values of P and R are interchanged, we still get the same F-score. The F-score is often used as the F2-score, which gives twice the weight to recall, emphasising recall over precision, and as the F0.5-score, which gives twice the weight to precision, emphasising precision over recall. The plain F-score is also called the F1-score, to mean that no extra weightage is given to either P or R. The general formula is Fβ = (β² + 1)PR / (β²P + R), where β is the parameter that controls the weighting balance between P and R.
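As a small sketch of the formulas above (illustrative Python; the function names are my own, not from the article), the following computes the harmonic mean and the general Fβ-score, and shows that interchanging P and R leaves the F-score unchanged:

    # Illustrative sketch of the harmonic mean and the F-beta score.

    def harmonic_mean(values):
        # Reciprocal of the average of the reciprocals of the values.
        return len(values) / sum(1.0 / v for v in values)

    def f_beta(p, r, beta=1.0):
        # General formula: F = (beta^2 + 1) * P * R / (beta^2 * P + R)
        b2 = beta ** 2
        return (b2 + 1) * p * r / (b2 * p + r)

    print(harmonic_mean([4, 5, 10]))    # 5.4545... = 60/11, the worked example above
    print(f_beta(1.0, 0.2))             # F1 = 0.33, the HM of 1.0 and 0.2
    print(f_beta(0.2, 1.0))             # also 0.33: the swap is invisible in the score
    print(f_beta(1.0, 0.2, beta=2.0))   # F2 = 0.24, dragged down by the low recall
    print(f_beta(1.0, 0.2, beta=0.5))   # F0.5 = 0.56, favouring the high precision

Note how F2 punishes this system for its poor recall while F0.5 rewards its perfect precision, which is exactly the weighting behaviour described above.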

M S Sridhar
sridhar@informindia.co.in
http://informindia.co.in/iil_newsletter_editors.asp
