
Clinica Chimica Acta 333 (2003) 155 – 167

www.elsevier.com/locate/clinchim

Evidence-based laboratory medicine as a tool for continuous professional improvement
Tommaso Trenti*
Servizio di Patologia Clinica, Ospedale degli Infermi, Ausl di Modena, Via Martiri 51, Pavullo nel Frignano,
Pavullo nel Frignano, Modena I-41026, Italy

Abstract

Background: Evidence-based medicine (EBM), defined as “the conscientious, explicit, and judicious use of the current best evidence in making decisions about the care of patients,” seems a tool (a “new paradigm”) able to merge individual clinical experience with robust observations. EBM has been driven by the need to manage information overload, by cost control, and by public request for the best in diagnostics and treatment. Methods: The application of EBM in laboratory medicine, or evidence-based laboratory medicine (EBLM), aims to advance clinical diagnosis by researching and disseminating new knowledge, combining methods from clinical epidemiology, statistics, and social science with the traditional pathophysiological molecular approach. Results: EBLM, by evaluating the role of diagnostic investigations in the clinical decision-making process with emphasis on measurable outcome, can help both in improving the quality of new scientific findings and in translating the results of good-quality research into everyday practice. Conclusions: Since there is a need to integrate many educational tools to focus the strategy on promoting the implementation of best practices, the STARD proposal for robust diagnostic test primary studies, the presence of systematic reviews of high quality, and the development of valid guidelines based on the best scientific evidence may be useful to promote an evidence-based culture for appropriateness, efficiency, and effectiveness in laboratory medicine.
© 2003 Elsevier Science B.V. All rights reserved.

Keywords: Evidence-based laboratory medicine; Critical appraisal; Laboratory test utilization; Quality of evidence

1. Introduction

Several reasons and many factors have come together for the development of an “evidence-based culture” or evidence-based medicine (“EBM”) in medicine and in laboratory medicine [1]. Authors pointed to the number of papers published in scientific journals and the insufficient time given to updating knowledge with an ineffective educational process [1,2], as individual physicians have neither the time nor the ability to cope with >30,000 biomedical journals published annually and >17,000 new books each year. It is worth noting that systematic reviews of the scientific literature demonstrated that an impressive number of studies are ineffective and potentially misleading, as >95% of articles in medical journals do not meet adequate criteria of critical assessment [2]. A further crucial point is the possibly long delay between the recognition of clinically effective interventions and their subsequent application in routine practice [3]. By contrast, there are examples of new interventions that were implemented and afterwards considered useless (technology creep) [4].

* Tel.: +39-536-29315; fax: +39-536-29278.
E-mail address: t.trenti@ausl.mo.it (T. Trenti).

0009-8981/03/$ - see front matter © 2003 Elsevier Science B.V. All rights reserved.
doi:10.1016/S0009-8981(03)00180-3
The global trend of rising costs in health care is a stimulus for the critical appraisal of laboratory testing in terms of appropriateness. Accurate evaluation is a prerequisite for the introduction of a diagnostic tool or test into the healthcare system, in an attempt to make the best use of finite resources in an ongoing cost–benefit analysis. In addition, people wish for the best in diagnostics and therapies with no waiting time. Evidence-based medicine has been driven by the necessity to cope with information surplus and physician practice, by patient demand for the best in diagnostics and therapeutics, and by cost control. Physicians, nurses, and laboratory professionals have to understand, interpret, and integrate the findings from clinical examinations and the results of tests to achieve diagnosis, prognosis, or therapy in a decision-making process. The information that expresses the patient’s need must be turned into an answerable question, and the best evidence must be searched out and critically evaluated to reach the greatest available benefit for the patient. Knowledge of whether the action taken is advantageous or not requires an evaluation of the induced improvement in a process of lifelong and self-directed learning.

The evaluation of diagnostic processes is less advanced than that of drug treatments, and EBM appears to have had limited effect in the area of laboratory medicine; indeed, data suggest that the use of robust evidence on diagnostic tests is poor. Several elements point to the need for an EBM approach in laboratory medicine. A growing number of papers demonstrate significant differences in diagnostic strategies between different hospitals for the same clinical presentation [2,5], as well as striking data that show the inappropriate and/or ineffective use of laboratory tests [6,7]. The rising use of diagnostic investigations is due to an increasing demand for care induced by the aging of the population and the growing number of chronically ill people, and to the evidence that a test itself leads to the ordering of further laboratory investigations when an abnormal test result is found. The search for higher standards of care, guidelines recommending additional testing, and defensive behaviour are further causes leading to more investigations [8]. A great challenge to laboratory medicine is to demonstrate an evidence-based role of diagnostic tests in patient outcomes [9–11], as they are not perceived to have a primary role in providing added value to medical treatments. Whereas this is a misguided viewpoint, it indicates the degree of ignorance or misunderstanding that surrounds the value of diagnostic tests, and it poses a major challenge for today’s laboratory professionals [11,12]. Accurate and fast diagnosis is a key to accurate, fast, and cost-effective treatment. EBM in laboratories, or EBLM, aims to advance research in clinical diagnosis and to disseminate new knowledge by combining methods from clinical epidemiology, statistics, and social sciences with the traditional pathophysiological molecular approach to evaluate the role of diagnostic investigations in clinical decision-making processes, with emphasis on measurable outcomes.

2. EBM and EBLM definitions

The assessment of evidence in research in ancient Chinese medicine during the reign of the Qianlong Emperor, named “kaozheng,” the autopsy used by Morgagni in Padua in 1769 in the study of diseases, and the postrevolutionary French clinicians such as Pierre Louis in Paris, who rejected the pronouncements of authorities and sought the observation of patients, are claimed to be among the various origins of EBM. In 1992, the group led by Guyatt at McMaster University in Canada consolidated and named these ideas as EBM and, in 1996, Sackett et al. [13] defined what is and what is not EBM. Since then, the number of articles about evidence-based medicine, as well as international interest, has grown exponentially. Evidence-based medicine, defined as “the conscientious, explicit, and judicious use of the current best evidence in making decisions about the care of patients” [13], seems a tool (a “new paradigm”) able to merge individual clinical experience with robust observations in a competent physician. This definition underlines the fact that EBM is a practice based on a continuous, lifelong, problem-based learning process. If EBM is the integration of best research with clinical expertise and patients’ values, best research evidence means clinically relevant research, especially patient-centred research, into the accuracy and precision of a diagnostic test, the power of prognostic markers, and the efficacy and safety of therapeutic, rehabilitative, and preventive regimens. New evidence from clinical research both invalidates previously accepted diagnostic tests and treatments and replaces them with new ones that are more powerful, more accurate, more effective, and safer [1]. Clinical experience is intended as the capacity to use clinical skills and professional experience to recognize each patient’s unique health state and diagnosis, the individual risks, and the benefits of potential interventions. Evidence-based medicine is not a “cookbook” because it requires the integration of the best external evidence with individual professional expertise and patient choice; clinical evidence can inform but cannot substitute for individual clinical expertise. This knowledge is needed to evaluate whether external evidence may be applied to the individual subject in deciding how to improve the patient’s clinical outcomes. In the same way, the use of EBM as a “cost-cutting medicine” is a misunderstanding, as doctors practising EBM identify and apply the most effective interventions to reach the best in terms of quality of life for patients; this could raise, rather than lower, the cost of care [13]. Patients’ values are the unique preferences, concerns, and expectations each patient brings to a clinical encounter and which must be integrated into clinical decisions. Laboratory professionals need to combine the skill and judgement they have developed through clinical experience with the best existing clinical evidence derived from systematic research [14], which must be integrated into clinical decisions. Evidence-based laboratory medicine (EBLM) combines clinical epidemiology, statistics, and social sciences with more traditional molecular and biochemical pathology to evaluate the effectiveness of diagnostic tests in clinical decision making and patient-oriented outcomes. The systematic evaluation of diagnostic tests could improve patient care, and the quality of diagnostic information should decrease healthcare costs.

3. Practicing evidence-based laboratory medicine

Practicing evidence-based laboratory medicine has four main dimensions, comprising a continuous process: (1) the identification of the question, (2) the critical assessment of the best evidence available, (3) the implementation of the best practice, and (4) the maintenance of best practice [11]. If the use of a diagnostic test may be defined as an intervention [15], this intervention should be ordered only when a question is asked and when there is evidence that the result can support an adequate answer to the clinical question.

4. Where to find the best evidence

The increasing number of scientific publications and articles requires a search strategy to guarantee that relevant findings are included and data of low methodological quality are excluded when a search for evidence is prepared; strategies for this have been described [1,16]. In general, evidence derived from clinical studies can be found in electronic databases, which can be general databases of published articles such as MEDLINE and EMBASE, or more evidence-based oriented ones such as Evidence-Based Medicine Review (EBMR) from Ovid Technologies, the Cochrane Library, or Best Evidence. Laboratory professionals may be helped in their search by the growing number of scientific journals devoted to summarizing the best evidence derived from traditional scientific papers selected according to explicit high-quality criteria. The findings are often presented as structured abstracts accompanied by expert critical commentaries to illustrate the background of the study and its clinical applicability. The most popular new journals are ACP Journal Club (the first of these publications, in 1991), Evidence-Based Medicine, Evidence-Based Mental Health, Evidence-Based Nursing, Evidence-Based Cardiovascular Disease, and Bandolier. It is worth noting that the value of searching in a database such as MEDLINE may be improved by evidence-based quality filters and sensitive search strings [17]. The Internet is a very helpful tool in EBM practice because of the continuously expanding resources and materials available there.

5. The critical assessment of the best evidence and the critical appraisal of diagnostic tests

5.1. Technical and diagnostic performances

It is well accepted that the practice of medicine depends upon an accurate diagnosis based on the use of diagnostic tests. These may be laboratory-oriented,
including biochemistry, haematology, bacteriology, virology, immunology, and genetics; or imaging techniques, including echography, magnetic resonance, and others. Most diagnostic tests are evaluated using study designs subject to huge bias. Fryback and Thornbury [18] proposed a hierarchy of elements to be considered in making an “evidence” decision on diagnostic test performance. This hierarchy, in the case of laboratory tests, is reported in Table 1 as suggested by Price [11].

Table 1
Evidence against which new or existing tests should be judged in a decision-making process [11]

Technical performances         Preanalytical factors
                               Sample stability
                               Precision
                               Accuracy
                               Analytical range
                               Biological variation
Diagnostic performances        Sensitivity (people correctly identified to have the disease)
                               Specificity (people correctly identified to not have the disease)
                               Likelihood ratio in positive or negative
                               ROC curve
Clinical impact and benefits   Diagnostic strategy and performances
                               Therapeutic
                               Improved compliance
                               Reduced risk of toxicity
                               Reduced adverse effects
                               Health outcomes
Organizational impact          Reduced length of hospital stay
                               Reduced staff time utilization
                               Reduced risk
Cost effectiveness
Decision

Technical performances are the first step in the decisional process. Preanalytical factors, however, are not always documented, whereas technical performances are usually documented by laboratory experts [19].

5.2. Sensitivity, specificity, likelihood ratios, and receiver operating characteristic (ROC) curves

Diagnostic accuracy is the capacity of a test to recognize a condition of interest, and knowledge of the characteristics of a test is crucial to assessing its potential to improve outcomes in the clinical decision-making process. In this appraisal, the results of one or more tests are compared with a reference or gold standard in a group or groups of patients with or without the target disease to determine diagnostic accuracy. Accuracy thus refers to the amount of agreement between the studied test and the reference standard. Diagnostic performance is an assessment of the test to evaluate sensitivity, intended as the ability of a test to identify correctly people having the disease, and specificity, intended as the ability of a test to identify correctly people not having the disease. Diagnostic accuracy can be expressed in many ways, such as sensitivity and specificity, likelihood ratios, diagnostic odds ratio, and the area under a receiver operating characteristic curve.

The rates of correct identification of patients with and without the disease are known as test sensitivity and test specificity, respectively. A test useful for ruling out disease must have high sensitivity and a test useful for confirming a disease must have high specificity. The overall significance of a test result also depends on the prevalence of the disease in the target population. It has been proposed [20,21] that the likelihood ratio is a clear and easy tool to identify posttest probability after the administration of diagnostic tests compared to the pretest probability. Likelihood ratios can be helpful in stating how many times more likely particular test results are found in patients with disease than in those without disease. In this way, both components of evidence-based medicine, the personal clinical experience and the best external evidence, are taken into account. In fact, the individual assessment of diagnostic probabilities before doing the test (prior or pretest probabilities) and the ability of the test to distinguish patients with or without the target disorder are both present in the new and more robust concept of likelihood ratios, which incorporates both sensitivity and specificity. These fundamentals of evidence-based medicine can be used to estimate whether the patients have the target disorder (posterior or posttest probabilities) and to make the diagnosis. Likelihood ratios seem a strong instrument to validate the power of a test to differentiate patients who do and do not have a specific disease. The classic studies of Guyatt et al. [22] are of value in understanding the likelihood ratio as a powerful tool. A process aimed to evaluate the increasing confidence of the presence or
absence of disease requires a discriminative potential; the measures of discrimination are commonly derived from a 2 × 2 table relating test outcomes, as shown in Table 2.

Table 2
Discriminative power and measures of discrimination derived from a 2 × 2 table of test outcome related to a reference standard [1]

                                   Target disorder
Diagnostic test results            Present      Absent       Totals
Test positive                      a            b            a + b
Test negative                      c            d            c + d
Totals                             a + c        b + d        a + b + c + d

Sensitivity                        = a/(a + c)
Specificity                        = d/(b + d)
LH+                                = sensitivity/(1 − specificity) = [a/(a + c)]/[b/(b + d)]
LH−                                = (1 − sensitivity)/specificity = [c/(a + c)]/[d/(b + d)]
Positive predictive value          = a/(a + b)
Negative predictive value          = d/(c + d)
Pretest probability (prevalence)   = (a + c)/(a + b + c + d)
Pretest odds                       = prevalence/(1 − prevalence)
Posttest odds                      = pretest odds × likelihood ratio
Posttest probability               = posttest odds/(posttest odds + 1)

Table 3 reports the results of a systematic review of serum ferritin as a diagnostic test to investigate the presence of iron deficiency anaemia; the use of the 2 × 2 table allows the evaluation of the test performance in comparison to the gold standard (i.e., bone marrow stain for iron). It is possible to note that 90% of patients with iron deficiency anaemia have serum ferritin < 65 mmol/l. These patients (731/809) with the target disorder and a positive result define the test’s sensitivity. It can be seen that 85% of patients (1500/1770) who do not have iron deficiency anaemia presented ferritin levels ≥ 65 mmol/l. Patients free of the studied disorder with negative or normal test results represent the test’s specificity. Sensitivity = a/(a + c) = 731/809 = 90%; Specificity = d/(b + d) = 1500/1770 = 85%. From these data, it is possible to calculate the LH+ (likelihood ratio for positive results, six in this case) and LH− (likelihood ratio for negative results, 0.12) by means of the formulas reported in Table 2: LH+ = sensitivity/(1 − specificity) = [a/(a + c)]/[b/(b + d)] = 90%/15% = 6; LH− = (1 − sensitivity)/specificity = [c/(a + c)]/[d/(b + d)] = 10%/85% = 0.12. As reported in Table 3, other parameters such as positive predictive value, negative predictive value, prevalence, pretest odds, posttest odds, and posttest probability may be calculated.

Table 3
Results of a systematic review of serum ferritin as a diagnostic test for iron deficiency anaemia (application of the 2 × 2 table) [22]

                                   Target disorder: iron deficiency anaemia
Diagnostic test results            Present         Absent          Totals
Test positive (< 65 mmol/l)        a = 731         b = 270         a + b = 1001
Test negative (≥ 65 mmol/l)        c = 78          d = 1500        c + d = 1578
Totals                             a + c = 809     b + d = 1770    a + b + c + d = 2579

Sensitivity                        = a/(a + c) = 731/809 = 90%
Specificity                        = d/(b + d) = 1500/1770 = 85%
LH+                                = sensitivity/(1 − specificity) = [a/(a + c)]/[b/(b + d)] = 90%/15% = 6
LH−                                = (1 − sensitivity)/specificity = [c/(a + c)]/[d/(b + d)] = 10%/85% = 0.12
Positive predictive value          = a/(a + b) = 731/1001 = 73%
Negative predictive value          = d/(c + d) = 1500/1578 = 95%
Pretest probability (prevalence)   = (a + c)/(a + b + c + d) = 809/2579 = 31%
Pretest odds                       = prevalence/(1 − prevalence) = 31%/69% = 0.45
Posttest odds                      = pretest odds × likelihood ratio = 0.45 × 6 = 2.7
Posttest probability               = posttest odds/(posttest odds + 1) = 2.7/3.7 = 73%

When a test has very high sensitivity, a negative
result effectively rules out the diagnosis; thus, in this condition, it was suggested to apply the mnemonic SnNout (when a sign or a test has a high sensitivity, a negative result rules out the diagnosis). Similarly, the term SpPin is used when a test has a very high specificity and a positive result rules in the diagnosis (with high specificity, a positive result effectively rules in the diagnosis).

A list of SpPins and SnNouts may be found at the website: http://www.library.utoronto.ca/medicine/ebm/.

Most test results can be divided into several levels; in the case of the ferritin test, results were divided into five levels, as illustrated in Table 4. It is possible to see how likelihood ratios reveal a greater power when compared to an approach that restricts the results to two levels, positive or negative. A positive likelihood ratio above 10 and a negative likelihood ratio below 0.1 have been noted as providing convincing diagnostic evidence, whereas those above 5 and below 0.2 give strong diagnostic evidence.

When the performance of the diagnostic test is evaluated in studies of diagnostic accuracy at several diagnostic thresholds, the receiver operating characteristic curve is able to describe the pattern of sensitivities and specificities. The global performance of the test can be ascertained by the position of the receiver operating characteristic line. Poor tests have lines close to the rising diagonal, whereas the line for a perfect test would rise sharply and pass close to the top left-hand corner, where both the sensitivity and specificity are 1. As ROC analysis consists of a plot of sensitivity and specificity pairs for the test studied, from a sufficient number of pairs an appropriate cutoff value may be selected based on the global diagnostic strategy. A cutoff may be chosen with the highest sensitivity at the expense of specificity, or a point that generates the highest specificity at the expense of sensitivity. The National Committee for Clinical Laboratory Standards (NCCLS) has produced guidelines that describe the use of this statistical tool, considering that several software programs are available to perform ROC analysis. The guideline is entitled “NCCLS Document GP10-A, Assessment of the Clinical Accuracy of Laboratory Tests Using Receiver Operating Characteristic (ROC) Plots.” ROC analysis, combined with a reliable diagnostic strategy, can help to select the most appropriate cutoff point for the test in order to give the best information to help make a diagnosis that leads to more positive patient outcomes.

ROC plots are utilized in systematic reviews to display the results of a set of studies, the sensitivity and specificity from each study being plotted as separate points in the receiver operating characteristic space.

The diagnostic test outcome is compared with an independently established standard diagnosis to evaluate discriminatory power; however, “gold standards” providing full certainty are rare and the challenge is to find a standard as close as possible to the theoretical gold standard [23]. Moreover, if new diagnostic tests have to be developed, research into the accuracy of test procedures cannot be restricted to comparing tests with current standards, as new diagnostic procedures or tests would be ignored. In this context, the reference standard is the best existing method for confirming the presence or absence of the target condition.

Several years ago, an editorial in ACP Journal Club summarized some guides to help readers to critically appraise articles about diagnostic tests [24]

Table 4
Likelihood ratio at five levels of serum ferritin as a diagnostic test in iron deficiency anaemia

                       Serum ferritin   Target disorder present   Target disorder absent   LH value   Diagnostic
Results                (mmol/l)         Number      %             Number      %
Very positive          < 15             474         59            20          1.1          52         Rule in “SpPin”
Moderately positive    15–34            175         22            79          4.5          4.8
Neutral                35–64            82          10            171         10           1          Neutral
Moderately negative    65–94            30          3.7           168         9.5          0.39
Very negative          ≥ 95             48          5.9           1332        75           0.08       Rule out “SnNout”
Totals                                  809         100           1770        100
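
The likelihood ratios in Table 4 can be reproduced from the counts alone: the ratio for each ferritin level is the proportion of patients with the target disorder who fall in that level divided by the proportion of patients without the disorder who fall in it. The following is a minimal Python sketch under that assumption. The function names, the data layout, and the label "weaker" for ratios outside the thresholds quoted in the text are illustrative choices and do not come from the cited sources.

```python
# Counts transcribed from Table 4: (ferritin level, disorder present, disorder absent).
FERRITIN_LEVELS = [
    ("< 15", 474, 20),
    ("15-34", 175, 79),
    ("35-64", 82, 171),
    ("65-94", 30, 168),
    (">= 95", 48, 1332),
]


def stratum_likelihood_ratios(levels):
    """Return {level: likelihood ratio} for a multi-level test result.

    LR(level) = P(level | disorder present) / P(level | disorder absent).
    """
    total_present = sum(present for _, present, _ in levels)
    total_absent = sum(absent for _, _, absent in levels)
    return {
        label: (present / total_present) / (absent / total_absent)
        for label, present, absent in levels
    }


def evidence_strength(lr):
    """Grade a likelihood ratio using the thresholds quoted in the text.

    Above 10 or below 0.1: convincing; above 5 or below 0.2: strong.
    The fallback label "weaker" is an assumption made for this sketch.
    """
    if lr > 10 or lr < 0.1:
        return "convincing"
    if lr > 5 or lr < 0.2:
        return "strong"
    return "weaker"


if __name__ == "__main__":
    for level, lr in stratum_likelihood_ratios(FERRITIN_LEVELS).items():
        print(f"ferritin {level:>6}: LR = {lr:.2f} ({evidence_strength(lr)})")
```

Only the two extreme levels reach the "convincing" thresholds, which matches the "SpPin" and "SnNout" annotations in the last column of Table 4.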
and the discussion of the tests’ diagnostic properties is summarised in a recent textbook about practising and teaching evidence-based medicine [1].

There are some main questions about the validity and importance of a diagnostic test. Is there good evidence regarding the test accuracy? And can this evidence demonstrate the ability of the test to correctly diagnose a specific disease?

Four questions may be relevant to evaluate if the diagnostic test has a relation with target disorders, as illustrated by Sackett and Haynes [25] and reported in Table 5.

Do test results in affected patients differ from those in normal individuals (Phase I)? Are patients with certain test results more likely to have the target disorder (Phase II)? Do test results distinguish patients with and without the target disorder among those in whom it is clinically sensible to suspect the disorder (Phase III)? Do patients undergoing the diagnostic test fare better than similar untested patients (Phase IV)?

Table 5
The relation between a diagnostic test and a target disorder (the most relevant questions and phases of evaluation) [25]

Phase I     Do test results in affected patients differ from those in normal individuals?
Phase II    Are patients with certain test results more likely to have the target disorder?
Phase III   Do test results distinguish patients with and without the target disorder among those in whom it is clinically sensible to suspect the disorder?
Phase IV    Do patients undergoing the diagnostic test fare better than similar untested patients?

As an example, a four-phase evaluation is reported for a test, the plasma concentration of B-type natriuretic peptide (N-terminal pre-BNP), and a target disorder, left ventricular dysfunction. Phase I studies are conducted among a group of patients known to have the disease and a group known not to have it, rather than among patients merely suspected to have it. In this case, researchers measured the concentration of BNP in nonsystematic samples from normal controls and from patients suffering from various combinations of disease such as ventricular hypertrophy, left ventricular dysfunction, and hypertension. Significantly higher plasma BNP levels were found in patients with the disease in comparison to patients without the disease, and the conclusion was that BNP is a useful diagnostic aid for left ventricular disease [26]. In the opinion of the authors [25], studies in this phase of evaluation add to biological knowledge in terms of pathophysiological mechanism, but they cannot be translated into diagnostic action, as both specificity and sensitivity may change when the same diagnostic test is applied in a different context of care. The second phase tries to evaluate whether patients with certain test results are more likely to have the target disorder than patients with other test results. A different group of investigators [27] investigated the role of cardiac natriuretic peptides for diagnosis and risk stratification in heart failure by measuring the BNP concentration in normal controls and in three groups of patients with coronary artery disease and different degrees of left ventricular dysfunction. The results showed a high sensitivity of the test to rule out left ventricular disease (SnNout) and a high specificity to rule it in (SpPin); the likelihood ratio for an abnormal test result was 13 and the likelihood ratio for a normal test result was 0.03, the sensitivity was 98%, and the specificity was 92%. In this phase of evaluation, the test seems to be a good indicator of the severity and prognosis of congestive heart failure. In Phase III, it was evaluated if the test results were able to distinguish patients with and without the target disorder among those in whom it was clinically realistic to suspect the disorder: in the case of BNP, it was evaluated if the test result was useful among patients clinically suspected of having left ventricular disease. General practitioners were invited by a group of clinical investigators in the UK to refer patients with suspected left ventricular systolic dysfunction to a hospital, where the patients underwent blind and independent BNP testing and echocardiography in a cross-sectional study [28]. The sensitivity found was 88% and the specificity was 34%, the likelihood ratio for an abnormal test result was 1.3, and the likelihood ratio for a normal test result was 0.4. The conclusion was that the introduction of routine BNP measurement would be unlikely to improve the diagnosis of symptomatic left ventricular dysfunction in the community. As a broad statement, Sackett et al. called attention to the fact that clinicians who wish to apply the Bayesian properties of diagnostic tests require accurate estimates of the pretest probability of target disorders in their area and setting.
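
The dependence on pretest probability can be made concrete with the formulas of Table 2. The sketch below is an illustration in Python rather than part of any cited method: it computes the measures of discrimination from a 2 × 2 table and converts a pretest probability into a posttest probability through odds. The function names and the 20% pretest probability used for the BNP comparison are hypothetical values chosen for the example. With the unrounded ferritin counts of Table 3, the likelihood ratios come out at about 5.9 and 0.11; the values 6 and 0.12 quoted in the text result from working with rounded percentages.

```python
def two_by_two_metrics(a, b, c, d):
    """Measures of discrimination from a 2 x 2 table (notation of Table 2).

    a: test positive, disorder present    b: test positive, disorder absent
    c: test negative, disorder present    d: test negative, disorder absent
    """
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
        "ppv": a / (a + b),
        "npv": d / (c + d),
        "prevalence": (a + c) / (a + b + c + d),
    }


def posttest_probability(pretest_probability, likelihood_ratio):
    """Bayesian update: probability -> odds, multiply by LR, odds -> probability."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (posttest_odds + 1)


if __name__ == "__main__":
    # Serum ferritin counts from Table 3.
    ferritin = two_by_two_metrics(a=731, b=270, c=78, d=1500)
    for name, value in ferritin.items():
        print(f"{name}: {value:.3f}")

    # Positive-result likelihood ratios reported for BNP in Phase II (13) and
    # Phase III (1.3), applied to a hypothetical pretest probability of 20%.
    for phase, lr in (("Phase II", 13), ("Phase III", 1.3)):
        print(phase, f"{posttest_probability(0.20, lr):.2f}")
```

Applied to the prevalence of the study population itself, the posttest probability after a positive result equals the positive predictive value, which is why both appear as 73% in Table 3.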

This assessment can come from five sources: personal the causes of inconsistency are benefits gained from a
experience, population prevalence data, practice data- systematic review of available evidence [11,12].
bases, the publication that describes the test, or one of The results are presented in a graphic form as odds
a growing number of primary studies of pretest or risk ratios with a 95% confidence interval for
probability in different settings [1,25]. The definitive individual trials, an overall ratio, and a 95% confi-
value of a diagnostic test lies in the further diagnostic and therapeutic interventions and health outcomes induced. The question to answer is whether patients undergoing the diagnostic test fare better, in their ultimate health outcomes, than similar patients who are not tested. When the test is of benefit in the correct diagnosis of a severe disease with a life-saving therapy, there is a manifest value; often, as in the case of tests for the early detection of disease, only the follow-up of randomised patients can answer the Phase IV question.

A survey of US doctors on academic versus clinical judgements and physicians' use of quantitative measures of test accuracy showed that almost none of the contacted doctors used the terms sensitivity, specificity, positive predictive value, likelihood ratio, or receiver operating characteristic curve in any formal way. Nevertheless, 84% of doctors said they utilized sensitivity and specificity in ordering tests, but only 3% applied Bayesian methods, 1% ROC curves, and 1% likelihood ratios [29].

5.3. Systematic review and meta-analysis of evaluations of diagnostic and screening tests

The evaluation of data from several studies in a systematic review can be very helpful in the assessment of a diagnostic test. Systematic reviews are not only papers that summarize other papers; they are the summary of the best evidence from primary studies, using explicit and reproducible methods to find, critically review, and then synthesize the evidence for clinical decisions. When statistical methods are used to combine the results of multiple studies, the result is a meta-analysis (i.e., a statistical combination of the data from more than one study after they have been converted into a standard measurement scale) [30]. A systematic review enhances confidence in overall results and limits the effect of bias. Information is integrated by reducing data to a manageable level. The avoidance of an increase in studies, the reduction of delay between discovery and implementation, the identification of new research questions, and

dence interval for the pooled data from all the trials. No effect is given a relative risk of 1; a risk ratio < 1 indicates a decrease in the number of events in the considered or treated group compared to the control group and is in favor of treatment. When a risk ratio is > 1, the contrary holds. If the confidence interval of the data intersects the vertical line of no effect (relative risk of 1.0), there may be no significant difference between the intervention [12] and no treatment. The summary ROC curve can be used to combine studies to determine the potential level of significance with meta-analysis comparing published studies as, for instance, in the study of Olatidoye et al. [31] on the prognostic role of troponin T versus troponin I in unstable angina pectoris for cardiac events. A study identifying 45 meta-analyses in the field of clinical chemistry and hematology covering the period 1985–1998 showed that in only 23 meta-analyses did the authors complete an explicit and comprehensive search of the existing scientific literature. When these 23 papers were further analyzed using the criteria of the Cochrane Collaboration Methods Working Group on screening and diagnostic tests [32], none of them met all six guidelines. This exemplifies the difficulties in meta-analyses of diagnostic tests, mainly due to the lack of quality of primary studies. Reviews on carbohydrate-deficient transferrin rather than γ-glutamyltransferase in alcohol misuse, or on cardiac markers in the diagnosis of acute myocardial infarction [33,34], confirm the design flaws in primary studies and the difficulties of an EBM approach.

Although systematic reviews of evaluations of diagnostic tests aim to produce estimates of test performance and impact on care based on all available evidence, exactly as systematic reviews of drug treatments do, it is evident that systematic reviews of diagnostic accuracy do not have the same influence on practice as systematic reviews of therapeutic randomized controlled trials. Some summary points may be relevant to understand systematic reviews of evaluations of diagnostic and screening tests [35]. Studies of diagnostic accuracy compare test results between groups of patients with and without the studied disease: systematic reviews, to be reliable, should include primary studies of high quality and exclude studies that do not meet acceptable criteria. It is essential that the quality of the studies included in the review is assessed and reported. The relationships between test results and presence of disease should be evaluated using probabilistic measures such as sensitivity, specificity, likelihood ratios, diagnostic odds ratio, and receiver operating characteristic curves. The statistical method for pooling results depends on the summary statistic chosen and on the sources of heterogeneity. The summaries obtained by pooling likelihood ratios can be interpreted and applied to clinical practice more easily than those obtained by pooling sensitivities and specificities. The evaluation of diagnostic test accuracy is only one dimension in assessing the clinical value of a test, and this has to be assessed in a framework for the evaluation of clinical diagnostic technologies. Even the most accurate test can be clinically useless or do more harm than good, and when a test is assessed as being of average benefit to patients, the evaluation of diagnostic tests should include an understanding of its limitations [36,37].

The production of systematic reviews is a complex, expensive, and time-consuming process; nevertheless, the reduction or elimination of research bias is one of the goals, and the quality of study design is a crucial point, as few studies meet the high standards necessary for high-quality systematic reviews. The publication of weak primary studies may generate optimistic estimates of diagnostic accuracy. In 1995, Reid et al. [38] documented the poor quality of diagnostic accuracy research. They set out seven methodological standards to evaluate diagnostic tests, as reported in Table 6, and further assessed, in 112 studies of diagnostic tests published in Lancet, British Medical Journal, New England Journal of Medicine, and Journal of the American Medical Association, the adherence to standards of clinical epidemiology in the reporting of results over a period ranging from 1978 to 1993. Few of the standards were met reliably, ranging from 46% in avoiding workup bias down to 9% in reporting accuracy in subgroups. An improvement over time was observed, but even in the most recent period only 24% of studies met up to four standards and only 6% up to six, and none of the studies was shown to meet all seven study design and reporting standards. Important key information, such as the blinding of final diagnosis with or without knowledge of results and test reproducibility, was not presented. In 1999, Lijmer et al. [39] demonstrated that published studies with these faults were associated with an overestimation of diagnostic accuracy compared with studies without these faults. In this light, since reports do not include information about their validity, the inadequate reporting of studies does not allow reliable high-quality systematic reviews.

Table 6
Methodological standards for diagnostic tests [38]
Spectrum composition
    Age distribution
    Sex distribution
    Symptoms or stage of disease
    Eligibility criteria
Accuracy in subgroups
Avoidance of workup bias
Test accuracy precision
Indeterminate test results
Test reproducibility

A checklist was proposed in 1997 for the reporting of studies of diagnostic accuracy in an attempt to improve the quality of reporting of trials on diagnostic tests [40]. This list was submitted to clinical epidemiologists, statisticians, researchers, and editors, and comments from more than 50 individuals were incorporated in a checklist published in Clinical Chemistry in 2000 [41]. The Standards for Reporting of Diagnostic Accuracy (STARD) Initiative was undertaken with the aim "to make a statement with checklist for the reporting of accuracy of a diagnostic test and to update this statement in the future when necessary." The stated objective of the checklist is to improve the quality of reporting of studies, thus allowing the reader to judge both the internal validity of the study and its applicability in other settings [42]. The checklist and a suggested flow diagram to follow enrolled patients in the study are suitable for application in all kinds of diagnostic test research. Both are described and reported in the STARD statement, which was published in Clinical Chemistry with an accompanying paper that provides elaboration on and explanation of the checklist items along with examples [43,44]. The STARD statement, checklist, flowchart, and explanation should be useful to improve the reporting of diagnostic accuracy studies as well as to improve systematic reviews.
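As a minimal sketch of how the probabilistic measures discussed in Section 5.3 relate to each other, the following Python snippet derives likelihood ratios and the diagnostic odds ratio from sensitivity and specificity, and converts a pretest probability into a posttest probability through odds. The function names are illustrative assumptions, not part of any cited work. The example inputs (disease prevalence of 6%, positive likelihood ratio of 17, negative likelihood ratio of 0.04) are the rounded figures quoted in Section 6 for troponin testing [47].

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity (both strictly in (0, 1))."""
    for value in (sensitivity, specificity):
        if not 0.0 < value < 1.0:
            raise ValueError("sensitivity and specificity must be strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)   # true-positive rate / false-positive rate
    lr_negative = (1.0 - sensitivity) / specificity   # false-negative rate / true-negative rate
    return lr_positive, lr_negative


def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """Return the diagnostic odds ratio, defined here as LR+ divided by LR-."""
    lr_positive, lr_negative = likelihood_ratios(sensitivity, specificity)
    return lr_positive / lr_negative


def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Apply Bayes' rule in odds form: posttest odds = pretest odds * likelihood ratio."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest_probability must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


if __name__ == "__main__":
    prevalence = 0.06  # rounded prevalence quoted in Section 6
    print(f"Positive result: {posttest_probability(prevalence, 17.0):.1%}")
    print(f"Negative result: {posttest_probability(prevalence, 0.04):.2%}")
```

With these rounded inputs, the calculation yields roughly 52% after a positive result and roughly 0.25% after a negative one. Section 6 quotes about 45% and about 0.2%; the gap for the positive result presumably reflects rounding of the published inputs or the use of unrounded study data, which cannot be verified from the figures given here.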

6. Organizational impact and cost; clinical and economic benefit

Nevertheless, the fact that the estimated cost of laboratories and laboratory tests is about 4–5% of NHS expenditure in the UK and Italy makes the implications for costs significant, since in Italy, Europe, and elsewhere laboratory costs are rising. In considerations or evaluations regarding consumed resources, it is important to focus on the complete patient episode and, when possible, the health outcome reached, and not only on the test cost [45,46]. The clinical impact of new tests on diagnostic and therapeutic strategies, as well as inappropriate laboratory utilization, emphasizes the need for an outcomes research agenda, as the majority of the evidence in the literature is limited to technical and diagnostic test performance [9]. New tests, like drugs, are usually more expensive than older tests currently in practice, and resource constraints frequently hinder the introduction of new tests even when there is good evidence about their value. The best scenario is when there is evidence that the use of new tests can reduce overall costs, resulting in both clinical and economic benefits [47]. For example, in the study of Zarich et al., patients presenting to the emergency department were randomized to receive a clinical evaluation with electrocardiogram and CK-MB determination, or the same plus a serial troponin evaluation. Information about subsequent care, length of hospital stay, and charges was collected, and the results are reported in Table 7. Length of stay was significantly shorter for patients with acute or nonacute coronary syndrome who had troponin determinations, and the total hospital costs were lower in this group, with an impressive potential saving. On average, total charges for patients with troponin testing were about US$900 less, with an annual saving of about US$4 million. In an EBM-oriented evaluation, it is worth noting that the same study pointed out a positive likelihood ratio of 17 and a negative likelihood ratio of 0.04 for a diagnosis of myocardial infarction where the overall rate for the presence of disease was 6%, with a posttest probability of about 45% when the result was positive and of about 0.2% when the result was negative. In this case, there is an evident cost-effectiveness in terms of benefit and outcome; robust evidence supports the test implementation and the required investment because of the gain achieved. EBM practice can raise the "added" value of laboratory medicine by leading to the rational and cost-effective use of clinical investigations. Other economic outcomes may be the number of clinical visits, readmission rate, working days lost, and productive years gained. The use and introduction of a new test may have benefits in terms of organizational aspects and care strategy, since the test may produce operational benefits such as reduced staff time utilization, facilities, bed requirements, and risk associated with early discharge. The value for money of laboratory tests needs to be appreciated in a wider perspective than the confines of the laboratory services [11,12].

Table 7
Effect of troponin testing on hospital resource use for patients with nonacute and acute coronary syndromes [47]

Diagnosis                              Outcome         With troponin    Without troponin
                                                       measurements     measurements
Nonacute coronary syndrome (n = 654)   Stay (days)     1.2              1.6
                                       Charges (US$)   4487             6187
Acute coronary syndrome (n = 202)      Stay (days)     3.7              4.6
                                       Charges (US$)   15,004           19,202

7. The implementation of the best practice; the guidelines and audit: maintaining the best practice

Guidelines are "systematically developed statements to assist practitioner and patients [on] decision about appropriate healthcare for specific clinical circumstances" and they are a tool to help physicians to practice the best medicine. Clinical guidelines are not intended as cookbook medicine that reduces or removes clinical freedom. Guidelines need to be built into a structured process for the delivery of health care, and the final goal is providing patients with greater consistency of care. Although guidelines have been successful in altering doctors' practice, they have not always been successful in changing laboratory test utilization in doctors' practice. Many factors may be claimed to explain this situation. As it is not possible for guidelines to evaluate all variations in clinical practice style relative to the patient population cared for, dissonance will always exist; not all physicians understand, receive, or agree with guidelines and, finally, the release of new evidence makes the recommendations obsolete. The paper by van Wijk et al. [48], which measures the compliance of family physicians in The Netherlands with laboratory guidelines, is of great value. The authors showed that physicians modified the test recommendations in 60.9% of orders. The most common noncompliance was due to the addition of tests. Moreover, it was found that 52.4% of the tests classified as noncompliant were recommended by the guidelines once these were updated.

The guideline development process is complex, as guidelines have to incorporate the strongest evidence, have to be produced by experts from different fields and, possibly, patients also have to be involved. There is a strong correlation between the strength of the evidence incorporated and the quality of guidelines, as well as their capacity to be effective and transferable to clinical practice. Additionally, guidelines are not easy to update and, if the evidence incorporated is poor, they are weak and potentially misleading. With infinite needs and finite resources, there is a reduction in the frequency with which the best opinion for a population matches that for individual patients [49,50]. Guidelines of high quality may help laboratory professionals to change the ordering of investigations to achieve a more rational use of diagnostic tests with a better cost–benefit ratio, but the ultimate goal is to improve the quality of care for patients.

In the case of laboratory tests, clinical audits may assess whether an implemented test or a new technology is useful or correctly requested. As an outcome, the audit exercise may identify the need to modify practice, or may lead to the identification of new questions, such as the need to introduce a diagnostic test in a clinical setting. Experience has shown that auditing practice is a means of controlling demand for laboratory services, and it can identify motivated clinical needs or abuse and inappropriate use of laboratory services. The tools of evidence-based laboratory medicine are at the foundation of the audit program, as in a continuous quality improvement program. EBLM is pivotal in training and maintaining performance in healthcare because it focuses on the evidence to support the decision-making process required in clinical practice [1,51]. The establishment of a new diagnostic practice, as well as established practice, should be subject to regular audit to evaluate the quality and final results of care.

8. Conclusions

EBM has been driven by the need to manage information overload, by cost control, and by public demand for the best in diagnostics and treatments. EBLM can help both in improving the quality of new scientific findings and in translating the results of good-quality research into everyday practice. Since there is a need to integrate many educational tools to focus the strategy on promoting the implementation of best practices, the STARD proposal for robust diagnostic test primary studies, the presence of systematic reviews of high quality, and the development of valid guidelines based on the best scientific evidence may be useful to promote an evidence-based culture for appropriateness, efficiency, and effectiveness in laboratory medicine. The practice of EBLM may have an important impact on many aspects of professional practice. As performance may deteriorate with age, and since traditional means of continuing education are less effective than an EBM approach, the presence of "EBM activities" in relation to diagnosis and therapeutic intervention can keep professional performance up to date. EBM may help laboratory professionals to evaluate and to demonstrate the impact of laboratory tests on a broad range of clinical outcomes. The value for money, reviewed in a wider perspective in terms of clinical benefits and health outcomes, is a further important issue where the EBLM principles may enhance the importance of laboratory activities [52,53].

Acknowledgements

I gratefully acknowledge Mario Plebani (Padua) for very helpful discussions and for critically reading the manuscript.

References

[1] Sackett DL, Straus SE, Richardson WS, Rosenberg W, Haynes RB. Evidence-based medicine: how to practice and teach EBM. Edinburgh, Scotland: Livingstone; 2000. p. 67–75.

[2] Haynes RB. Where's the meat in clinical journals? ACP J Club 1993;119:A23–4.
[3] Granados A, Jonsson E, Banta HD, Bero L, Bonair A, Cochet C, et al. Eur-Assess project subgroup report on dissemination and impact. Int J Technol Assess Health Care 1997;13:220–86.
[4] Munro J, Booth A, Nicholl J. Routine preoperative testing; a systematic review of evidence. Health Technol Assess 1997;1(12):i–iv, 1–62.
[5] Barth JH, Seth J, Howlett TA, Freedman DB. A survey of endocrine function testing by clinical biochemistry laboratories in UK. Ann Clin Biochem 1995;32:442–9.
[6] van Walraven C, Naylor CD. Do we know what inappropriate laboratory utilization is? A systematic review of laboratory clinical audits. JAMA 1998;280:550–8.
[7] Bareford D, Hayling A. Inappropriate use of laboratory service: long term combined approach to modify request patterns. Br Med J 1990;301:1305–7.
[8] Winkens R, Dinant GJ. Evidence base of diagnostic test. Rational, cost effective use of investigations in clinical practice. Br Med J 2002;324:783–5.
[9] Lundberg GD. The need for an outcomes research agenda for clinical laboratory testing. JAMA 1998;280:565–6.
[10] Lundberg GD. How clinicians should use the diagnostic laboratory in changing medical world? Clin Chim Acta 1995;41:775–80.
[11] Price CP. Evidence-based laboratory medicine: supporting decision making. Clin Chem 2000;46:1041–50.
[12] McQueen MJ. Overview of evidence based medicine: challenges for evidence-based laboratory medicine. Clin Chem 2001;47:1536–46.
[13] Sackett DL, Rosenberg WMC, Gray JAM, Haynes RB, Richardson WS. Evidence-based medicine: what it is and what it isn't? Br Med J 1996;312:71–2.
[14] Gene M, Mardh PA. A cost effectiveness analysis of screening and treatment for Chlamydia trachomatis infection in asymptomatic women. Ann Intern Med 1996;124:1–7.
[15] Witte DL. Measuring outcomes: why now? Clin Chim 1995;41:775–80.
[16] Glanville J, Haines M, Auston I. Getting research into practice. Finding information of clinical effectiveness. Br Med J 1998;317:200–3.
[17] Greenhalgh T. How to read a paper. The MEDLINE database. Br Med J 1997;315:180–3.
[18] Fryback DG, Thornbury JR. The efficacy of diagnostic imaging. Med Decis Mak 1991;11:88–94.
[19] Guder WG. Pre analytical factors and their influence on analytical quality specifications. Scand J Clin Lab Invest 1999;59:545–50.
[20] Moore RA, Fingerova H. Evidence based laboratory medicine: using current best evidence to treat individual patients. In: Price CP, Hicks JM, editors. Point of care testing. Washington: AACC Press; 1999. p. 265–88.
[21] Moore RA. Evidence-based clinical biochemistry. Ann Clin Biochem 1997;13:220–86.
[22] Guyatt GH, Oxman AD, Ali M, William A, McIlroy W, Patterson C. Laboratory diagnosis of iron-deficiency anaemia: an overview. J Gen Intern Med 1992;7(1):45–53.
[23] Knottnerus JA, van Weel C, Muris JWM. Evaluation of diagnostic procedure. Br Med J 2002;324:477–80.
[24] Guyatt GH. Readers' guide for articles evaluating diagnostic tests: what ACP Journal Club does for you and what you must do yourself (Editorial). ACP J Club 1991;115:A16.
[25] Sackett DL, Haynes RB. Evidence Based Clinical Diagnosis. The architecture of diagnostic research. Br Med J 2002;324:539–41.
[26] Talwar S, Siebenhofer A, Williams B, Ng L. Influence of hypertension, left ventricular hypertrophy and left ventricular systolic dysfunction on plasma N terminal pre-BNP. Heart 2000;83:278–82.
[27] Selvais PL, Donickier JE, Robert A, Laloux O, van Linden F, Ahn S, et al. Cardiac natriuretic peptides for diagnosis and risk stratification in heart failure. Eur J Clin Invest 1998;28:636–42.
[28] Landray MJ, Lehman R, Arnold I. Measuring brain natriuretic peptide in suspected left ventricular systolic dysfunction in general practice: cross sectional study. Br Med J 2000;320:935–6.
[29] Reid MC, Lane DA, Feinstein AR. Academic calculations versus clinical judgments: practicing physicians' use of quantitative measures of test accuracy. Am J Med 1998;104:374–80.
[30] Greenhalgh T. How to read a paper. Papers that summarize other papers (systematic reviews and meta-analyses). Br Med J 1997;315:672–5.
[31] Olatidoye AG, Wu AHB, Feng YJ, Waters D. Prognostic role of troponin T versus troponin I in unstable angina pectoris for cardiac events with meta-analysis comparing published studies. Am J Cardiol 1998;81:1405–10.
[32] Irwig L, Tosteson ANA, Gatsonis C, Lau J, Colditz G, Chalmers TC, et al. Guidelines for meta-analysis evaluating diagnostic tests. Ann Intern Med 1994;120:67–76.
[33] Scouller K, Conigrave KM, Macaskill P, Irwig L, Whitfield JB. Should we use carbohydrate-deficient transferrin instead of γ-glutamyl-transferase for detecting problem drinkers? A systematic review and metaanalysis. Clin Chem 2000;46:1902–84.
[34] Panteghini M, Pagani F, Bonetti G. The sensitivity of cardiac markers: an evidence based approach. Clin Chem Lab Med 1999;37:1097–106.
[35] Deeks JJ. Systematic reviews of evaluations of diagnostic and screening tests. Br Med J 2001;323:157–62.
[36] Deeks JJ. Using evaluations of diagnostic test: understanding their limitations and making the most available evidence. Ann Oncol 1999;10:761–8.
[37] Guyatt GH, Tugwell P, Feeny DH, Haynes RB, Drummond M. A framework for clinical evaluation of diagnostic technologies. Can Med Assoc J 1986;134:587–94.
[38] Reid MC, Lachs MS, Feinstein AR. Use of methodological standards in diagnostic test research: getting better but still not good. JAMA 1995;274:645–51.
[39] Lijmer JG, Mol BW, Heisterkamp S, Bonsel GJ, Prins MH, van der Meulen JH, et al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 1999;282:1061–6.

[40] Bruns DE. The clinical chemist. Clin Chem 1997;43:2211–2.
[41] Bruns DE, Huth EJ, Magid E, Young DS. Toward a checklist for reporting of studies of diagnostic accuracy of medical tests. Clin Chem 2000;46:893–5.
[42] Bruns DE. The STARD initiative and the reporting of studies of diagnostic accuracy (Editorial). Clin Chem 2003;49:19–20.
[43] Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, et al, for the STARD group. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Clin Chem 2003;49:1–6.
[44] Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, et al. The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Clin Chem 2003;49:7–18.
[45] Scott MG. Faster is better - it's rarely that simple (Editorial). Clin Chem 2000;46:441–2.
[46] Keffer J. Health economics point of care testing. In: Price CP, Hicks JM, editors. Point of care testing. Washington: AACC Press; 1999. p. 233–48.
[47] Zarich S, Bradley K, Seymour J, Ghali W, Traboulsi A, Mayall ID, et al. Impact of troponin T determinations on hospital resource utilization and costs in the evaluation of patients with suspected myocardial ischemia. Am J Cardiol 2001;88:732–6.
[48] van Wijk MAM, van der Lei J, Mosseveld M, Bohnen AM, van Bemmel JH. Compliance of general practitioner with a guidelines-based decision support system for ordering blood test. Clin Chem 2002;48:732.
[49] Granata AV, Hillman AL. Completing practice guidelines: using cost-effectiveness analysis to make optimal decision. Ann Intern Med 1998;128:53–6.
[50] McQueen MJ. Evidence based medicine: its application to laboratory medicine. Ther Drug Monit 2000;22:1–9.
[51] Davis DA, Thomson MA, Oxman AD, Haynes RB. Evidence for the effectiveness of CME. A review of 50 randomized controlled trials. JAMA 1992;268:1111–7.
[52] McQueen MJ. Will physicians and scientists have any role in managing laboratory resources in the year 2000? Eur J Clin Chem Clin Biochem 1996;34:867–71.
[53] Plebani M. Changing the course of medical laboratories in changing environment. Clin Chim Acta 2002;21:87–100.
