Key Words
Diagnostic accuracy · Sensitivity · Specificity · Predictive values · Likelihood ratios · Diagnostic odds ratio · Receiver operating characteristic · Area under the receiver operating characteristic curve

Abstract
Background: An increasing number of diagnostic tests and biomarkers have been validated during the last decades, and this will still be a prominent field of research in the future because of the need for personalized medicine. Strict evaluation is needed whenever we aim at validating any potential diagnostic tool, and the first requirement a new testing procedure must fulfill is diagnostic accuracy. Summary: Diagnostic accuracy measures tell us about the ability of a test to discriminate between and/or predict disease and health. This discriminative and predictive potential can be quantified by measures of diagnostic accuracy such as sensitivity and specificity, predictive values, likelihood ratios, area under the receiver operating characteristic curve, overall accuracy and diagnostic odds ratio. Some measures are useful for discriminative purposes, while others serve as a predictive tool. Measures of diagnostic accuracy vary in the way they depend on the prevalence, spectrum and definition of the disease. In general, measures of diagnostic accuracy are extremely sensitive to the design of the study. Studies not meeting strict methodological standards usually over- or underestimate the indicators of test performance and limit the applicability of the results of the study. Key Messages: The testing procedure should be verified on a reasonable population, including people with mild and severe disease, thus providing a comparable spectrum. Sensitivities and specificities are not predictive measures. Predictive values depend on disease prevalence, and their conclusions can be transposed to other settings only for studies which are based on a suitable population (e.g. screening studies). Likelihood ratios should be an optimal choice for reporting diagnostic accuracy. Diagnostic accuracy measures must be reported with their confidence intervals. We always have to report paired measures (sensitivity and specificity, predictive values or likelihood ratios) for clinically meaningful thresholds. How much discriminative or predictive power we need depends on the clinical diagnostic pathway and on misclassification (false positives/negatives) costs. © 2013 S. Karger AG, Basel

E-Mail karger@karger.com
www.karger.com/ced
IT–06124 Perugia (Italy)
E-Mail paoloeusebi@gmail.com

Introduction

An increasing number of diagnostic tests and biomarkers [1] have become available during the last decades, and the need for personalized medicine will strengthen the impact of this phenomenon in the future. Consequently, we need a careful evaluation of any potential new testing procedure in order to limit the potentially negative consequences on both health and medical care expenditures [2].

Evaluating the diagnostic accuracy of any diagnostic procedure or test is not a trivial task. In general it is about answering several questions. Will the test be used in a clinical or screening setting? In which part of the clinical pathway will it be placed? Has the test the ability to discriminate between health and disease? How well does the test do that job? How much discriminative ability do we need for our clinical purposes?

Table 1. 2 × 2 table reporting cross-classification of subjects by index and reference test result

                Reference test
Index test      subjects with      subjects without      total
                the disease        the disease
Positive        TP                 FP                    TP + FP
Negative        FN                 TN                    FN + TN
Total           TP + FN            FP + TN               Total
In the following pages we will try and answer these questions. We will give an overview of diagnostic accuracy measures, accompanied by their definitions including their purpose, advantages and weak points. We will implement some of these measures by discussing a real example from the medical literature performing an evaluation of the use of velocity criteria applied to transcranial Doppler (TCD) signals in the detection of stenosis of the middle cerebral artery [3]. Then we will end the paper with a list of take-home messages.

Diagnostic Accuracy Measures

Overview
The discriminative ability of a test can be quantified by several measures of diagnostic accuracy:
– sensitivity and specificity;
– positive and negative predictive values (PPV, NPV);
– positive and negative likelihood ratios (LR+, LR–);
– the area under the receiver operating characteristic (ROC) curve (AUC);
– the diagnostic odds ratio (DOR);
– the overall diagnostic accuracy.
While these measures are often reported interchangeably in the literature, they have specific features and fit specific research questions. These measures are related to two main categories of issues:
– classification of people between those who are and those who are not diseased (discrimination);
– estimation of the posttest probability of a disease (prediction).
While discrimination purposes are mainly of concern in health policy decisions, predictive measures are most useful for predicting the probability of a disease in an individual once the test result is known. Thus, these measures of diagnostic accuracy cannot be used interchangeably. Some measures largely depend on disease prevalence, and all of them are sensitive to the spectrum of the disease in the population studied [4]. It is therefore of great importance to know how to interpret them, as well as when and under what circumstances to use them.

When we conduct a test, we have a cutoff value indicating whether an individual can be classified as positive (above/below the cutoff) or negative (below/above the cutoff), and a gold standard (or reference method) which will tell us whether the same individual is ill or healthy. Therefore, the cutoff divides the population of examined subjects with and without disease into 4 subgroups, which can be displayed in a 2 × 2 table:
– true positive (TP) = subjects with the disease with the value of a parameter of interest above/below the cutoff;
– false positive (FP) = subjects without the disease with the value of a parameter of interest above/below the cutoff;
– true negative (TN) = subjects without the disease with the value of a parameter of interest below/above the cutoff;
– false negative (FN) = subjects with the disease with the value of a parameter of interest below/above the cutoff (table 1).

Sensitivity and Specificity
Sensitivity [5] is generally expressed in percentage and defines the proportion of TP subjects with the disease in a total group of subjects with the disease: TP/(TP + FN). Sensitivity estimates the probability of getting a positive test result in subjects with the disease. Hence, it relates to the ability of a test to recognize the ill. Specificity [5], on the other hand, is defined as the proportion of subjects without the disease with a negative test result in a total group of subjects without the disease: TN/(TN + FP). In other words, specificity estimates the probability of getting a negative test result in a healthy subject. Therefore, it relates to the ability of a diagnostic procedure to recognize the healthy.
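The cross-classification by cutoff and the two formulas TP/(TP + FN) and TN/(TN + FP) can be sketched in a few lines of Python. This is only an illustration: the `TwoByTwo` class, the `cross_classify` helper and the measurements are hypothetical and not taken from the study, and the sketch assumes that values above the cutoff count as positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cells of the 2 x 2 table (layout of table 1)."""
    tp: int  # with the disease, index test positive
    fp: int  # without the disease, index test positive
    fn: int  # with the disease, index test negative
    tn: int  # without the disease, index test negative

    def sensitivity(self) -> float:
        # Proportion of diseased subjects with a positive test.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of non-diseased subjects with a negative test.
        return self.tn / (self.tn + self.fp)


def cross_classify(values, diseased, cutoff):
    """Count TP/FP/FN/TN, calling a value positive when it exceeds the cutoff.

    Assumption: higher values indicate disease. Flip the comparison for
    parameters where lower values do.
    """
    tp = fp = fn = tn = 0
    for value, ill in zip(values, diseased):
        positive = value > cutoff
        if positive and ill:
            tp += 1
        elif positive:
            fp += 1
        elif ill:
            fn += 1
        else:
            tn += 1
    return TwoByTwo(tp, fp, fn, tn)


# Hypothetical measurements and reference results, invented for illustration.
values = [95, 88, 70, 60, 85, 50, 40, 30]
diseased = [True, True, True, False, False, False, False, False]

table = cross_classify(values, diseased, cutoff=80)
print(table)
print(f"sensitivity = {table.sensitivity():.2f}")
print(f"specificity = {table.specificity():.2f}")
```

Raising or lowering `cutoff` moves subjects between the positive and negative rows, which is why sensitivity and specificity are always tied to a specific threshold.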
Fig. 1. ROC curve (x-axis: 1 – specificity; y-axis: sensitivity). From bottom-left to upper-right corner, we observe increasing sensitivity and decreasing specificity at lowering thresholds.

[...] sensitivity with low rates of FP and FN has a high DOR. With the same sensitivity of the test, the DOR increases with the increase in the test’s specificity. The DOR does not depend on disease prevalence; however, it depends on the criteria used to define the disease and its spectrum of pathological conditions of the population examined.

Example

TCD
Positive         9        7       16
Negative         3       80       83
Total           12       87       99
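The two properties of the DOR stated above can be checked numerically. The sketch below uses the standard definition DOR = (TP × TN)/(FP × FN) and reads the counts from the TCD cross-classification, assuming its columns follow the layout of table 1 (subjects with the disease, subjects without the disease, total). The function name and the two modified scenarios are illustrative and not part of the study.

```python
def diagnostic_odds_ratio(tp, fp, fn, tn):
    """DOR = (TP / FN) / (FP / TN): odds of a positive test in the diseased
    divided by the odds of a positive test in the non-diseased.

    Raises ZeroDivisionError when FP or FN is zero.
    """
    return (tp * tn) / (fp * fn)


# Counts read from the TCD table, assuming the table 1 column layout.
tp, fp, fn, tn = 9, 7, 3, 80
dor = diagnostic_odds_ratio(tp, fp, fn, tn)
print(f"DOR = {dor:.1f}")

# Prevalence: ten times as many diseased subjects, same sensitivity.
# The DOR is unchanged because the factor cancels in TP / FN.
dor_more_diseased = diagnostic_odds_ratio(10 * tp, fp, 10 * fn, tn)
print(f"DOR with 10x diseased subjects = {dor_more_diseased:.1f}")

# Specificity: reclassify 4 of the 7 false positives as true negatives,
# leaving sensitivity untouched. The DOR increases.
dor_more_specific = diagnostic_odds_ratio(tp, fp - 4, fn, tn + 4)
print(f"DOR with higher specificity = {dor_more_specific:.1f}")
```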
Fig. 2. Visual display of test results evaluated at an MV cutoff of >80 cm/s and an MV cutoff of >90 cm/s in an ROC space (x-axis: 1 – specificity).

Key Messages

Population
The testing procedure should be verified on a reasonable population; thus it needs to include those with mild and severe disease, aiming at providing a comparable spectrum. Disease prevalence affects predictive values, but the disease spectrum has an impact on all diagnostic accuracy measures.
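How strongly prevalence drives the predictive values can be seen by holding sensitivity and specificity fixed and applying Bayes' theorem at several prevalences. The sensitivity, specificity and prevalence values below are illustrative assumptions, not results from the study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from Bayes' theorem for a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Illustrative test characteristics (assumed, not taken from the study).
SENSITIVITY, SPECIFICITY = 0.75, 0.92

prevalences = [0.01, 0.10, 0.50]
results = [predictive_values(SENSITIVITY, SPECIFICITY, p) for p in prevalences]

for p, (ppv, npv) in zip(prevalences, results):
    print(f"prevalence = {p:.2f}  PPV = {ppv:.3f}  NPV = {npv:.3f}")
```

With the same test, the PPV rises and the NPV falls as prevalence increases, which is why predictive values from one setting cannot be carried over to a population with a different prevalence.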
Variability
As always, it is crucial to report variability/uncertainty
measures for diagnostic accuracy results (95% CI).
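As a minimal sketch of how such intervals can be obtained, the code below computes 95% Wilson score intervals for a sensitivity and a specificity. The Wilson method is one common choice for proportions and is an assumption here; the text does not state which interval method should be used. The counts are read from the TCD cross-classification, assuming its columns follow the layout of table 1.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Assumed reading of the TCD table: 9 of 12 diseased subjects test positive,
# 80 of 87 non-diseased subjects test negative.
sens_ci = wilson_interval(9, 12)
spec_ci = wilson_interval(80, 87)

print(f"sensitivity = {9 / 12:.3f}, 95% CI {sens_ci[0]:.3f} to {sens_ci[1]:.3f}")
print(f"specificity = {80 / 87:.3f}, 95% CI {spec_ci[0]:.3f} to {spec_ci[1]:.3f}")
```

The interval for sensitivity is much wider than the one for specificity because it rests on only 12 diseased subjects, which illustrates why point estimates alone can mislead.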
References
1 Whiteley W, Wardlaw J, Dennis M, Lowe G, Rumley A, Sattar N, Welsh P, Green A, Andrews M, Graham C, Sandercock P: Blood biomarkers for the diagnosis of acute cerebrovascular diseases: a prospective cohort study. Cerebrovasc Dis 2011;32:141–147.
2 Koffijberg H, van Zaane B, Moons KGM: From accuracy to patient outcome and cost-effectiveness evaluations of diagnostic tests and biomarkers: an exemplary modelling study. BMC Med Res Methodol 2013;13:12.
3 Tamura A, Yamamoto Y, Nagakane Y, Takezawa H, Koizumi T, Makita N, Makino M: The relationship between neurological worsening and lesion patterns in patients with acute middle cerebral artery stenosis. Cerebrovasc Dis 2013;35:268–275.
4 Montori VM, Wyer P, Newman TB, Keitz S, Guyatt G, Evidence-Based Medicine Teaching Tips Working Group: Tips for learners of evidence-based medicine. 5. The effect of spectrum of disease on the performance of diagnostic tests. CMAJ 2005;173:385–390.
5 Altman DG, Bland JM: Diagnostic tests. 1. Sensitivity and specificity. BMJ 1994;308:1552.
6 Altman DG, Bland JM: Diagnostic tests. 2. Predictive values. BMJ 1994;309:102.
7 Deeks JJ, Altman DG: Diagnostic tests. 4. Likelihood ratios. BMJ 2004;329:168.
8 Zou KH, O’Malley AJ, Mauri LL: Receiver-operating characteristic analysis for evaluating diagnostic tests and predictive models. Circulation 2007;115:654–657.
9 Glas AS, Lijmer JG, Prins MH, Bonsel GJ, Bossuyt PM: The diagnostic odds ratio: a single indicator of test performance. J Clin Epidemiol 2003;56:1129–1135.
10 Rorick MB, Nichols FT, Adams RJ: Transcranial Doppler correlation with angiography in detection of intracranial stenosis. Stroke 1994;25:1931–1934.
11 Navarro JC, Lao AY, Sharma VK, Tsivgoulis G, Alexandrov AV: The accuracy of transcranial Doppler in the diagnosis of middle cerebral artery stenosis. Cerebrovasc Dis 2007;23:325–330.