POINT–COUNTERPOINT
Background
Recently there has been considerable debate about possible false positive study
outcomes. Several well-known epidemiologists have expressed their concern about
the possibility that epidemiological research may lose credibility with policy
makers as well as the general public.
Methods
We have identified 75 false positive studies and 150 true positive studies, all
published reports and all epidemiological studies reporting results on substances
or work processes generally recognized as being carcinogenic to humans. All studies
were scored on a number of design characteristics and factors relating to the
specificity of the research objective. These factors included type of study design,
use of cancer registry data, adjustment for smoking and other factors, availability
of exposure data, dose- and duration-effect relationship, magnitude of the reported
relative risk, whether the study was considered a fishing expedition, affiliation
and country of the first author.
Results
The strongest factor associated with a false positive or true positive study outcome was whether the study had a specific a priori hypothesis. Fishing expeditions had
more than threefold odds of being false positive. Factors that decreased the odds
of a false positive outcome included observing a dose-effect relationship,
adjusting for smoking and not using cancer registry data.
Conclusion
The results of the analysis reported here clearly indicate that a study with a
specific a priori study objective should be valued more highly in establishing a
causal link between exposure and effect than a mere fishing expedition.
Keywords
Accepted
24 April 2001
Methods
We have distinguished false positive from true positive studies.
Since there is no gold standard for this distinction we have
based our classification on the International Agency for Research
on Cancer (IARC) classification.11 IARC has evaluated a range
of chemicals and occupational exposure circumstances regarding
their possible carcinogenic effect on humans. A small number of
the evaluated chemicals have been classified as being carcinogenic
to humans, based on the available epidemiological study results.
Table 1 List of 20 carcinogenic substances or work processes encountered in the occupational environment and their true positive and false positive effects
Columns: Substance or work process; Triplets (a); true positive effect; false positive effects
[Table body garbled in extraction; the alignment of rows and columns is not recoverable.]
Substances or work processes listed: Arsenic, Asbestos, Benzene, Cadmium, Chromium, Nickel, Radon, Ethylene oxide, Vinyl chloride, Mineral oils, Wood dust, Coal gasification, Painter, Rubber industry, Sulphuric acid, Silica
Cancer sites listed, in extracted order: Lung; Stomach, rectum; Leukaemia; Lung, nose; Lung; Prostate; Lung; Lymphoma; Leukaemia; Lung; Larynx, rectum; Angiosarcoma; Melanoma, lung; Nose; Nose; Pancreas, lung; Lung; Colon/rectum, prostate; Nose; Lung; Lung, larynx; Bladder; Lung
Triplet counts listed: 10, 11
a This column indicates the number of triplets on a particular agent or profession. Each triplet consists of two true positive studies and one false positive study.
Results
A number of study characteristics were investigated in association with the true or false positive status of the study. The
odds ratios (OR) in Table 2 were calculated by taking the cruder
study design as the standard. For instance, the OR for adjustment for
smoking was calculated by dividing the odds for a false positive
study in the studies with adjustment for smoking by the odds
for a false positive outcome in the studies without adjustment
for smoking, the standard. An OR of below 1 indicates a protective effect against being a false positive outcome for the factor
under investigation.
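The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' actual procedure: the function name `odds_ratio_with_ci` is hypothetical, the counts are invented (chosen only so that they sum to 75 false positive and 150 true positive studies), and the confidence interval uses the standard Woolf (log) approximation, which the paper does not state it used.

```python
import math

def odds_ratio_with_ci(fp_with, tp_with, fp_without, tp_without, z=1.96):
    """Crude odds ratio for a false positive outcome, with a Woolf-type CI.

    fp_with / tp_with:       false / true positive studies WITH the factor
    fp_without / tp_without: false / true positive studies WITHOUT the factor
                             (the reference, i.e. the cruder design)
    """
    odds_with = fp_with / tp_with            # odds of false positive, factor present
    odds_without = fp_without / tp_without   # odds of false positive, reference group
    or_ = odds_with / odds_without
    # Standard error of log(OR) under the Woolf approximation (an assumption here)
    se = math.sqrt(1 / fp_with + 1 / tp_with + 1 / fp_without + 1 / tp_without)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper

# Hypothetical counts, NOT taken from the paper:
# 20 false positive and 62 true positive studies adjusted for smoking,
# 55 false positive and 88 true positive studies did not.
or_, lower, upper = odds_ratio_with_ci(20, 62, 55, 88)
print(f"OR = {or_:.2f}, 95% CI: {lower:.2f} to {upper:.2f}")
```

With these invented counts the OR is roughly 0.52 with an interval of about 0.28 to 0.95. An OR below 1 whose interval excludes 1 would be read, in the paper's terms, as a factor that protects against a false positive outcome.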
Study objective
Several factors relating to the study objective were associated
with true or false positivity. Studies with a broader scope than
Study design
Several design characteristics were associated with true or false
positivity. Retrospective cohort studies had a slightly smaller
chance of yielding a false positive finding (OR = 0.80, 95%
CI: 0.44–1.47). The number of prospective cohort studies was
too small to be analysed in detail. However, all five prospective
cohort studies were categorized as being true positives. Case-control studies or cohort studies not based on cancer registries
had a decreased likelihood of yielding false positive outcomes
compared to studies in which cancer registries played an
important role (OR = 0.35, 95% CI: 0.19–0.62). The likelihood
of a false positive study tended to be smaller if adjustments for
other risk factors were made. Studies in which an adjustment
for smoking was made had a decreased likelihood of yielding
false positive results (OR = 0.51, 95% CI: 0.29–0.90). The same
was true for studies in which adjustments for other factors than
smoking were made (OR = 0.67, 95% CI: 0.36–1.22). In general
terms, studies with information on exposure concentration
had a decreased likelihood of yielding false positive results
(OR = 0.70, 95% CI: 0.34–1.44). If a study reported a positive
dose-response relationship the likelihood of a false positive finding was substantially less than if a dose-response relationship
was not tested for (OR = 0.26, 95% CI: 0.10–0.71). This was
not the case if a dose-response relationship was tested for
but not found, compared to studies where the relationship was
not tested for. Thus the presence of a dose-response association
appeared to be associated with a lower risk of a study finding
being a false positive. A decreased OR was found both for
studies that reported a positive duration-effect relationship and
for studies that tested for one but found none, compared to
studies in which this association was not tested for. Thus it must
be concluded that having found a positive duration-effect
relationship is not related to the odds for a false positive finding,
but that testing for this association is.
Other variables
The log-transformed OR was a strong predictor of the outcome
of a study not being a false positive (OR: 0.56 per unit increase
in the logarithm of the reported OR [95% CI: 0.37–0.86]). There
was only a marginal difference in likelihood of a false positive
study between journals specifically focused on the occupational
environment and other journals. The difference between
countries was considerable, although not statistically significant
at conventional levels. Taking the US as the reference group, the
most likely countries to report false positive results were the
Scandinavian countries.
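To make the "per unit increase in the logarithm of the reported OR" figure above more concrete, the sketch below converts it into the change in odds of a false positive between two reported ORs. It rests on two assumptions that the text does not confirm: that the logarithm is the natural logarithm, and that the relationship is log-linear over the range considered. The function name `false_positive_odds_multiplier` is hypothetical.

```python
import math

def false_positive_odds_multiplier(reported_or_from, reported_or_to,
                                   or_per_log_unit=0.56):
    """Factor by which the odds of a false positive change when the reported
    OR moves from `reported_or_from` to `reported_or_to`.

    Assumes a natural-log scale and a log-linear model (both assumptions);
    0.56 is the per-unit estimate quoted in the text.
    """
    delta_log = math.log(reported_or_to) - math.log(reported_or_from)
    return or_per_log_unit ** delta_log

# Example: a study reporting OR = 4 compared with one reporting OR = 2
print(round(false_positive_odds_multiplier(2.0, 4.0), 2))
```

Under these assumptions, doubling the reported OR multiplies the odds of a false positive by about 0.67, which is in line with the key message that a substantial relative risk lowers the risk of a false positive finding.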
Affiliation of the first author was also related to true or false
positivity. Chances of a false positive study outcome were
Table 2 Associations between study characteristics and true or false positive status of the study in terms of odds ratios (OR)^a and 95% CI
Study characteristic | OR | 95% CI
 | 1.27 | 0.73–2.23
UK versus USA
 | 0.68 | 0.19–2.38
 | 1.54 | 0.73–3.26
 | 0.50 | 0.22–1.13
 | 1.01 | 0.43–2.36
 | 0.80 | 0.44–1.47
 | 1.16 | 0.49–2.76
 | 1.14 | 0.47–2.76
 | 0.23 | 0.11–0.49
 | 0.57 | 0.24–1.36
Design
 | 0.62 | 0.35–1.08
 | 0.35 | 0.19–0.62
 | 0.70 | 0.34–1.44
 | 0.67 | 0.36–1.22
 | 0.51 | 0.29–0.90
 | 0.96 | 0.27–3.41
 | 0.26 | 0.10–0.71
 | 0.52 | 0.21–1.31
 | 0.55 | 0.29–1.05
 | 0.76 | 0.35–1.62
 | 0.84 | 0.47–1.53
 | 0.45 | 0.09–2.23
 | 0.92 | 0.35–2.65
 | 0.97 | 0.53–1.78
 | 0.33 | 0.18–0.57
 | 0.56 | 0.37–0.86
Dose-effect relation
a An odds ratio below 1 indicates a factor that decreases the risk of a false positive finding.
b Other designs are all designs other than case-control studies and retrospective and prospective cohort studies, for instance proportional mortality ratio studies
and ecological studies. In other designs, prospective cohort studies were excluded because of the small number.
Discussion
This comparative study focused on identifying factors in
research objectives and design characteristics that may affect
the likelihood of a false positive finding in occupational cancer
epidemiology.
We selected the IARC classification as the gold standard for
distinguishing between true positive and false positive studies.
This choice gives some room for circular reasoning, since the
IARC classification in itself is based on a judgement of true/
false positivity of the epidemiological study results. However, a
line must be drawn somewhere. Our choice was to let this line
be drawn by IARC. The IARC classification has the advantage of
being based on an evaluation of the full evidence of the carcinogenicity of the compound. Clearly the research design least likely
to yield false positive results is the experimental design, with
control over the exposure, randomization and blinded observation
Table 3 Associations between a selection of study characteristics and false positive status of the study in a multiple logistic model, odds ratios
(OR) and 95% CI
Study characteristic | OR | 95% CI
 | 0.22 | 0.07–0.62
 | 0.33 | 0.11–1.00
 | 1.13 | 0.36–3.54
 | 0.49 | 0.07–3.52
 | 0.12 | 0.02–0.80
 | 0.26 | 0.11–0.62
 | 0.75 | 0.30–1.89
 | 0.44 | 0.23–0.84
KEY MESSAGES
Studies testing a specific a priori hypothesis are less likely to report false positive outcomes.
Adjustment for other factors, especially smoking, decreases the risk of a false positive study outcome.
A positive dose-response relationship and a substantial relative risk decrease the risk of a false positive finding.
results from fishing expedition studies only for hypothesis generation, and not as a basis for conclusions regarding the potential
carcinogenicity of the substance under study. This is especially
true if cancer registry data are used. Also, results from studies
without correction for smoking or other confounders, and from
studies that tested for a dose-response relationship but did not
find one, have to be handled with care.
References
1995;269:164–69.
2 Trichopoulos D. The future of epidemiology. Br Med J 1996;313:436–37.
3 Koplan JP, Thacker SB, Lezin NA. Epidemiology in the 21st century:
6 Feinstein AR, Horwitz RI, Spitzer WO, Battista RN. Coffee and pan-
One does not need to agree with the premise of Swaen et al.,1
who examine the issue of false positive outcomes in this issue of
the International Journal of Epidemiology, or have to ignore some
methodological weakness in their study, or even to think their
conclusions simply reaffirm some very basic scientific principles,
to see considerable merit in the approach taken by these investigators and to believe their paper commands attention.
Taubes' paper,2 which highlighted the discrepancies between
different study results that arise in epidemiology and the effect
this has on public opinion, was famously read by some as predicting the imminent demise of epidemiology but it also permitted
a broader examination of the state of the discipline.3 To be
sure, epidemiology produces conflicting results but so does any
research enterprise. It is only because the public has such a keen
interest in the results of epidemiological studies that they are
seen to be at the sharp end of this particular stick. Climatologists,
nuclear physicists and students of the fall of the Roman Empire
are all seen to produce their share of discrepant observations
Correspondence: Professor Michael Bracken, Department of Epidemiology
and Public Health, Yale University School of Medicine, 60 College Street,
PO Box 208034, New Haven, CT 06520–8034, USA. E-mail: michael.bracken@yale.edu
science of review, these studies have led to a better understanding of the role of publication4 and citation bias,5,6 the importance
of different database searching strategies,7 the validity of abstracts
in accumulating research evidence,8–10 and the validity of
different methods used for quantifying the quality of studies.11
Reassuringly, Swaen et al. find the largest factor in a false
positive study to be the absence of an a priori hypothesis as that
is arguably the most fundamental of all scientific principles.
Similarly, a dose-response relationship and adjustment for a
major confounder (smoking), as expected, lead to fewer false
positive results. It is interesting that study design is not itself a
factor but there is little reason it should be; there is nothing
intrinsically in error with case-control studies once problems
with confounding have been adjusted, as they have here. Many
of these issues are a matter of faith in epidemiology and it is
reassuring to have some empirical evidence for them.
It is a great paradox in epidemiology that while the profession
is very conversant with the requirements for conducting valid
studies, it has generally neglected the need for rigorous, objective and hypothesis-driven summaries of the totality of epidemiological evidence when reviewing a particular topic. Early
critics of the lack of scientific rigor in literature reviews focused
on medicine,12,13 but in one recent analysis, over 60% of epidemiology reviews were considered to not meet the standards
of a systematic review,14 and several specific biases in epidemiology
reviews have been reported for chronic fatigue syndrome15 and
passive smoking.16 While calls for more quantitative reviews in
epidemiology are starting to be made,17 the overall poor quality
of current epidemiology reviews is in marked contrast to the
field of evidence-based medicine and healthcare which over
the last 12 years has made remarkable strides in developing a
methodology and strict standards for systematically reviewing
and analysing a body of literature.18 While some epidemiologists have played a major role in these developments, by and
large it appears that epidemiologists still review evidence using
traditional and potentially biased methods.
Tröhler has recently provided a comprehensive account of the
origins of evidence-based medicine, focusing on its early history
in Britain.19 Of particular note are the 'arithmetic observationists'
who sought to quantify the mass of new observations being
made in medicine in the late 18th century, exemplified by
William Black who in his text An Arithmetic and Medical Analysis
of the Diseases and Mortality of the Human Species wrote in 1789:
'however it may be slighted as an heretical innovation, I
would strenuously recommend Medical Arithmetick, as a guide
and compass through the labyrinth of the therapeutick.'20 (cited
in 19 p.117)
Criterion | Medicine 1985–1986 (ref. 12) (n = 50) | Medicine 1996 (ref. 26) (n = 158) | Epidemiology 1997–1999 (a) (n = 39) | Meta-analyses (b) 1996 (ref. 26) (n = 19)
Review addressed a focused research question | 80 | 34 | 49 | 95
28
15
95
14
10
68
a From Epidemiologic Reviews Vols 19–21 omitting methodology reviews, editorials, and a historical review of one major trial.
b A subset of the 158 Medicine reviews described as meta-analyses, systematic reviews or overviews.26
Acknowledgement
I am grateful to Iain Chalmers for comments on an early draft
of this paper.
References
1 Swaen GGMH, Teggeler O, van Amelsvoort LGPM. False positive
654–56.
6 Ojasoo T, Doré JC. Citation bias in medical journals. Scientometrics 1999;45:81–84.
7 Watson RJ, Richardson PH. Accessing the literature on outcome
27 Herring C. Distil or drown: the need for reviews. Physics Today 1968;21:27–33.
1984;18:27–40.
29 Sterling TD. Publication decisions and their possible effects on
Br Med J 1904;3:1243–46.
147:717.
24 Goldberger J. Typhoid bacillus carriers. In: Rosenau MJ, Lumsden
26 McAlister FA, Clark HD, van Walraven C et al. The medical review article revisited: has the science improved. Ann Intern Med 1999;131:947–51.
1997;8:337–39.