
© International Epidemiological Association 2001
International Journal of Epidemiology 2001;30:948–954
Printed in Great Britain

POINT–COUNTERPOINT

False positive outcomes and design characteristics in occupational cancer epidemiology studies
Gerard GMH Swaen, Olga Teggeler and Ludovic GPM van Amelsvoort

Background Recently there has been considerable debate about possible false positive study outcomes. Several well-known epidemiologists have expressed their concern about the possibility that epidemiological research may lose credibility with policy makers as well as the general public.
Methods

We have identified 75 false positive studies and 150 true positive studies, all
published reports and all epidemiological studies reporting results on substances
or work processes generally recognized as being carcinogenic to humans. All studies
were scored on a number of design characteristics and factors relating to the
specificity of the research objective. These factors included type of study design,
use of cancer registry data, adjustment for smoking and other factors, availability
of exposure data, dose- and duration-effect relationship, magnitude of the reported
relative risk, whether the study was considered a fishing expedition, affiliation
and country of the first author.

Results

The strongest factor associated with a false positive or true positive study outcome was whether the study had a specific a priori hypothesis. Fishing expeditions had a more than threefold odds ratio of being false positive. Factors that decreased the odds
ratio of a false positive outcome included observing a dose-effect relationship,
adjusting for smoking and not using cancer registry data.

Conclusion

The results of the analysis reported here clearly indicate that a study with a
specific a priori study objective should be valued more highly in establishing a
causal link between exposure and effect than a mere fishing expedition.

Keywords

Epidemiology, false positive studies, methods, occupational cancer

Accepted

24 April 2001

Recently a number of unexpected outcomes of epidemiological


studies fuelled the discussion among epidemiologists about the
limits and validity of epidemiological research. In a number of
articles and editorials epidemiologists argued that epidemiology
is producing false results, and that the public and policy makers
are developing reservations towards epidemiology in general.
Gary Taubes interviewed a number of prominent epidemiologists and published the responses in an article on the limits of epidemiologic research.1 Taubes' work was triggered by a number of conflicting research results on associations between radon exposure and lung cancer, DDT and breast cancer, cancer risks from electromagnetic fields and a number of other topics, all characterized by conflicting and contradictory study results.

Department of Epidemiology, University of Maastricht, The Netherlands.
Correspondence: Gerard GMH Swaen, Department of Epidemiology, University of Maastricht, PO Box 616, 6200 MD Maastricht, The Netherlands. E-mail: g.swaen@epid.unimaas.nl
The interviewed epidemiologists explained that part of the
problem lies in the very nature of epidemiologic studies,
in particular those that try to isolate causes of non-infectious
disease, known variously as observational or risk factor or
environmental epidemiology. Confounding factors, exposure
misclassification and recall bias were mentioned as important
factors in producing conflicting study results. Trichopoulos,
one of the prominent epidemiologists who was interviewed,
also expressed his concern and stated, 'we (epidemiologists) are fast becoming a nuisance to society. People don't take us seriously anymore', and he suggested that the press should become more sceptical about epidemiological findings.1 Later Trichopoulos
published an article in which he wrote that concern has recently
arisen that epidemiology has either exhausted its potential or,


worse, is generating conflicting results that confuse the public


and disorient policy makers.2 Koplan et al.3 noted that epidemiology sometimes is regarded as a science that brings bad
news or even worse, and often reports contradictory findings. In
the same volume of the American Journal of Public Health it
was stated that epidemiology is accused of being the source of
spurious, confusing and misleading findings.4 This issue was
also addressed in a recent article by Bankhead5 in which the
scientific aspects of epidemiology were discussed.
The problem of false positive findings in epidemiology was
already addressed nearly two decades ago. Following the reporting of a presumably false positive finding of an association
between coffee drinking and pancreatic cancer, Feinstein stated, 'the credibility of risk factor epidemiology can withstand only a limited number of false alarms'.6
Clearly, as indicated by these quotes, scientists are concerned
about the possibility that epidemiological research may produce
false results, either in the form of false negative or false positive
results. False study results may have their roots in chance. For
some reason, not necessarily causally related to the exposure
under investigation, the exposed population may have a different a priori risk for developing the disease than the unexposed
population. In such a situation even a well-designed epidemiological study will produce a false result. In addition, it is possible
that shortcomings in the study design may lead to false results,
either false positive or false negative results.
There are numerous examples of conflicting study results. A
collection of 56 topics with contradictory results in case-control
research has been reported by Mayes et al.7 Thirty of these topics
concerned issues related to cancer risks. Mayes suggested that
most of the disagreement is caused by the lack of a rigorous set
of scientific principles and proposed the use of the principle of
experimental trials to develop the scientific standards for case-control research. We can only partially agree with this proposal.
Clearly an experimental design must be regarded as being
superior to non-experimental designs. However, many epidemiological questions are not open to research in which an experimental design can be applied. The long-term health effects of possible toxic agents cannot be studied experimentally,
as this would be unethical. Despite the notion that experimental
design should be regarded as the more powerful research design,
we are left with the situation that, in many fields, experiments
are either unethical or would take too long to conduct because
of the long latency between exposure and effect, or both.
An association between a certain exposure and a specific type
of tumour can be reported in one study but not in another.
Unfortunately, there is no 100% certainty or gold standard
telling us which result is a true positive result and which a false
positive. It has been suggested that animal carcinogenicity data
can be taken as the gold standard. However, since it is a matter
of great debate how well animal studies correctly predict the
carcinogenicity of chemicals in humans we have decided not to
take these as the gold standard.
In a number of instances it is generally accepted that some
specific causal relation exists and any other association is likely
to be spurious. For instance, it is generally accepted that occupational exposure to asbestos fibres in the air will increase the
risk for lung cancer, mesothelioma and possibly laryngeal cancer.
A study reporting an association between asbestos exposure and
leukaemia can probably be regarded as a false positive finding,


since many cohort studies, with sufficient statistical power to


detect such an association, have not reported this result.
In general terms, false positive results can be a result of chance,
confounding or bias. There are many, more specific hypotheses
about which design characteristics may lead to false positive
findings. For instance, in his introductory book on occupational
epidemiology, Hernberg specifically addresses the problem of
false positive and false negative study results.8 He states that the
most common reasons for false positive studies are information
bias and confounding. According to Hernberg, case-control studies
have a tendency to produce false positive results, since information on exposures can be biased by the disease status of the
respondent (information bias). In retrospective cohort studies,
confounding is more likely to cause false positive studies,
because the effect of confounding cannot be controlled for.
Empirical data to support hypotheses such as the ones cited
above are lacking.
Another possible source of false positive studies may be
publication bias. This bias towards more likely publication of
positive results as compared to negative results has been described
in the literature.9,10 Publication bias may tend to increase the
number of reported positive findings in general. We have focused
on a comparison of false positive studies with true positive
studies. Therefore the occurrence of publication bias was not
the focus of our research. Publication bias may, however, have
an effect on our analysis because false positive results may be
more likely to be the result of publication bias than true positive
studies. Possibly, review boards and editors may be somewhat
hesitant to publish positive findings that have not yet been
reported elsewhere. The contrary may also be true: a new finding, perhaps a false positive finding, may be more capable of attracting the reader's attention.
The study reported here was specifically designed to compare
false positive studies with true positive studies. It was thought
that if design characteristics that increase the probability of a
false positive study result can be identified, studies with these
characteristics should be interpreted with caution. If, for instance, false positive studies more often turn out to be ecologic
studies than true positive studies, this could have an impact on
the interpretation of the results of ecologic studies.
In order to compare false positive studies and true positive
studies we searched the scientific literature focusing on occupational cancer epidemiological studies: studies aimed at investigating the possible carcinogenic effects of occupational exposures.
The advantage of focusing on the literature of occupational
cancer epidemiology is that the risk factors under investigation
are more uniform and we have a set of agents or occupational
exposure conditions that have been generally accepted as being
carcinogenic.

Methods
We have distinguished false positive from true positive studies.
Since there is no gold standard for this distinction we have
based our classification on the International Agency for Research
on Cancer (IARC) classification.11 IARC has evaluated a range
of chemicals and occupational exposure circumstances regarding
their possible carcinogenic effect on humans. A small number of
the evaluated chemicals have been classified as being carcinogenic
to humans, based on the available epidemiological study results.


The IARC expert groups have critically reviewed the available


epidemiological evidence for these chemicals and have
concluded that there is sufficient evidence for a carcinogenic
effect. These chemicals or substances have been classified as
category 1 carcinogens and were selected for the study reported
here. From the category 1 carcinogens only the substances that
form occupational exposures were selected and agents such
as drugs for treating cancer were excluded, in order to arrive at a homogeneous group of substances, all investigated in an
occupational setting.
These exclusions resulted in a list of 20 occupationally related
carcinogenic substances or processes, all classified as category 1
human carcinogens based on occupational epidemiology
studies. These substances or work processes were regarded as true
positive carcinogenic substances. In addition the IARC classifications
were used to identify the target organ(s) in which the carcinogenic effect was observed. Studies reporting a carcinogenic effect
in other organs were regarded as being false positive. For instance,
a study reporting an elevated risk of leukaemia in benzeneexposed workers was regarded as a true positive study. On the
other hand, a study reporting an elevated lung cancer risk in
benzene-exposed workers was regarded as being a false positive
Next, a literature search was performed to identify epidemiological studies reporting on these substances. This was done
using Medline with a time window limited to 1984 until 1997.
By simple searching with the name of the substance as listed in
Table 1 and cancer and epidemiology a large set of over 20 000
hits was produced. Articles reporting negative findings, either
true or false negative, and studies in which no original data
collection on exposed people took place (e.g. meta-analyses)
were excluded from our analysis. Articles that were not based
on original data collection, such as reviews and meta-analyses,
were excluded. Pilot studies were also excluded. Seventy-five

false positive studies were identified in this set by reviewing the abstracts of the articles. Each dataset for each agent was sorted by year of publication. For each false positive study, two true positive studies that occurred in the same search immediately before or after the false positive study were selected. This procedure resulted in 75 triplets, each consisting of one false positive study and two true positive studies. A list of the 225 included studies is available on request. For beryllium, no false positive studies were found in the literature search, so the final list of carcinogenic substances or processes under investigation contained 19 substances or processes. In Table 1 the list of 19 carcinogenic substances and their true and false positive effects is given.
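A minimal sketch of this 1:2 matching step is given below for illustration only; the record layout, the 'label' field and the nearest-by-publication-year selection rule are assumptions made for the example, not the actual selection code used.

# Illustrative sketch of the 1:2 matching of false positive (FP) studies to
# true positive (TP) studies; record layout and selection rule are assumed.
def build_triplets(records):
    """Pair each FP study with the two nearest TP studies by publication year,
    using each TP study at most once."""
    records = sorted(records, key=lambda r: r["year"])
    used_tp_ids, triplets = set(), []
    for fp in (r for r in records if r["label"] == "FP"):
        candidates = [r for r in records
                      if r["label"] == "TP" and id(r) not in used_tp_ids]
        candidates.sort(key=lambda r: abs(r["year"] - fp["year"]))
        if len(candidates) >= 2:
            tp1, tp2 = candidates[:2]
            used_tp_ids.update({id(tp1), id(tp2)})
            triplets.append((fp, tp1, tp2))
    return triplets

# Example with invented records for one agent:
hits = [
    {"title": "cohort study A", "year": 1986, "label": "TP"},
    {"title": "case-control study B", "year": 1988, "label": "FP"},
    {"title": "cohort study C", "year": 1990, "label": "TP"},
]
print(build_triplets(hits))  # one triplet: (B, A, C)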
The 225 articles were looked up in the university library
or requested by interlibrary loan. Copies were made and were
coded by us, by means of a simple coding form, after the copy
was blinded with respect to the false or true positivity of the
results. Blinding sometimes failed because the coder would
know which study was a false positive or true positive. Each
study was scored by GS and OT independently of each other.
Later, the two score forms were compared and discrepancies
were resolved. It must be noted, however, that some scoring
required a certain degree of interpretation and judgement by us.
For instance, all studies were scored by us as being a fishing
expedition or not. This was partly a matter of judgement, since
in none of the articles was it stated that the study was a fishing
expedition, without a specific hypothesis. Each study was scored on a number of items. The scored items included: journal of publication, research design, specific or general study objective (a study was coded as having a specific study objective if one hypothesis between a specific exposure and a specific effect was postulated; if the study had two, three or four specific hypotheses it was separately coded as such), type of exposure data, and correction for smoking habits or other confounders.
Table 1 List of 20 carcinogenic substances or work processes encountered in the occupational environment and their true positive and false positive effects

Substance or work process | True positive effect | False positive effects | Triplets^a
Arsenic | Lung | Stomach, rectum |
Asbestos | Lung, mesothelioma, larynx | Biliary tract, kidney, stomach, colon, prostate, bladder | 10
Benzene | Leukaemia | Lung, nose |
Cadmium | Lung | Prostate |
Chromium | Lung | Colon/rectum, bladder, kidney |
Nickel | Lung, nose, oral cavity | Salivary gland, colon |
Radon | Lung | Larynx, rectum |
Ethylene oxide | Lymphoma | Leukaemia |
Vinyl chloride | Angiosarcoma | Melanoma, lung |
Mineral oils | Lung, skin, larynx | Melanoma, bladder, pancreas |
Wood dust | Nose | Lung, lymphoma, soft tissue sarcoma, stomach, colon, prostate, liver | 11
Boot and shoe manufacture | Nose | Pancreas, lung |
Coal gasification | Lung | Colon/rectum, prostate |
Furniture and cabinet making | Nose | Leukaemia, lymphoma, melanoma |
Iron and steel founding | Lung | Leukaemia, stomach, kidney |
Painter | Bladder, upper respiratory tract | Stomach, ovary, leukaemia, myeloma |
Rubber industry | Lymphatic system, oesophagus, bladder | Brain, uterus, skin, pancreas |
Sulphuric acid | Lung, larynx | Bladder |
Silica | Lung | Bone, salivary glands, brain, mouth, pancreas, liver, larynx |

^a This column indicates the number of triplets on a particular agent or profession. Each triplet consists of two true positive studies and one false positive study.

Further items scored were: testing for and observing a dose-response relationship, number of statistical tests performed, study size for the specific reported association, affiliation of the primary investigator, collaboration with other research institutes, and whether or not the study was a fishing expedition. A study was coded as a fishing expedition if no clear
underlying hypothesis on a specific cause (chemical agent or
profession) and a specific effect (the type of tumours postulated
to have an increased occurrence as the result of the exposure)
was mentioned in the paper. There is only a small difference between the variables 'fishing expedition' and 'specific hypothesis yes/no'. Any study with one clearly defined hypothesis between a specific exposure and a specific effect was coded as 'yes' on the variable 'specific hypothesis' and 'no' on 'fishing expedition'. Similarly, a study with two, three or four clearly defined hypotheses was coded as 'no' on the variable 'specific hypothesis', 'yes' on 'specific hypothesis 2, 3 or 4' and 'no' on 'fishing expedition'. However, a study focusing on five or more specific hypotheses was coded as 'no' on 'specific hypothesis', but on the other hand was not coded as a fishing expedition. An example: in one article the long-term health effects of asbestos were studied. The authors clearly hypothesized that asbestos exposure may cause lung cancer, mesothelioma and laryngeal cancer. This study was not regarded as a fishing expedition, but on the other hand was no longer regarded as having a specific hypothesis. Such a study could of course still report an excess of, say, colon cancer and thus could still be coded as being false positive.
Simple odds ratios (OR) were calculated to investigate the
crude associations between the study characteristics and the
status of the study in terms of false or true positivity. Next,
multiple logistic regression was used to adjust the individual OR
for the effect of other variables that appeared to be related
to true or false positivity. For the model a number of variables
were selected. The selection criterion was that the variable was relatively strongly associated with true or false positivity. Collinearity between the variable 'fishing expedition' and the specificity of the hypothesis prevented simultaneous inclusion of both predictors in the model. We did not choose statistical significance of the unadjusted OR as the primary selection criterion for inclusion in the model because of the relatively limited sample size. The variables presented in Table 3 come from the full logistic regression model, taking into account the matched design by giving each stratum its own unique intercept term.
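As an illustration of this modelling strategy (not the code actually used; the data file and variable names below are hypothetical), a logistic model with one intercept per matched triplet could be fitted as follows:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical coding file: one row per study, false_positive = 1/0 outcome,
# triplet = identifier of the matched set, remaining columns = scored items.
df = pd.read_csv("coded_studies.csv")

# C(triplet) gives every matched stratum its own intercept, as described above;
# conditional logistic regression would be the more usual equivalent choice.
model = smf.logit(
    "false_positive ~ C(triplet) + fishing_expedition + smoking_adjusted"
    " + other_adjustment + dose_effect_found + cancer_registry + log_or",
    data=df,
).fit()

print(np.exp(model.params))  # exponentiated coefficients correspond to the OR in Table 3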

Results
A number of study characteristics were investigated in association with the true or false positive status of the study. The OR in Table 2 were calculated by taking the cruder study characteristic as the reference category. For instance, the OR for adjustment for smoking was calculated by dividing the odds of a false positive outcome in the studies with adjustment for smoking by the odds of a false positive outcome in the studies without adjustment for smoking, the reference. An OR below 1 indicates that the factor under investigation is associated with a lower likelihood of a false positive outcome.
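For example (with invented counts, purely to illustrate the calculation), the crude OR and its 95% CI for one characteristic can be computed from the 2×2 split of the 75 false positive and 150 true positive studies:

import math

# Hypothetical illustration of how a crude OR in Table 2 is formed:
# the counts below are invented for the example, not taken from the paper.
fp_adjusted, fp_unadjusted = 20, 55     # false positive studies
tp_adjusted, tp_unadjusted = 70, 80     # true positive studies

# Odds of a false positive outcome among studies WITH smoking adjustment,
# divided by the same odds among studies WITHOUT adjustment (the reference).
or_smoking = (fp_adjusted / tp_adjusted) / (fp_unadjusted / tp_unadjusted)

# Standard Woolf (log-based) 95% CI for a 2x2 table.
se_log_or = math.sqrt(1/fp_adjusted + 1/tp_adjusted + 1/fp_unadjusted + 1/tp_unadjusted)
ci_low = math.exp(math.log(or_smoking) - 1.96 * se_log_or)
ci_high = math.exp(math.log(or_smoking) + 1.96 * se_log_or)

print(f"OR = {or_smoking:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}")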

Study objective
Several factors relating to the study objective were associated with true or false positivity. Studies with a broader scope than occupation and cancer, for instance focusing on all causes of a disease or all possible health effects of an exposure, had a marginally elevated chance of being false positive (OR = 1.14, 95% CI: 0.47–2.76). Studies that were clearly aimed at a specific hypothesis in terms of a specific exposure and a specific carcinogenic effect had a strongly decreased chance of being false positive (OR = 0.23, 95% CI: 0.11–0.49). This was also reflected in studies reporting fewer than 50 significance tests (OR = 0.62, 95% CI: 0.35–1.08). In addition, studies that were classified by us as not being fishing expeditions were more likely not to be false positives (OR = 0.33, 95% CI: 0.18–0.57).

Study design
Several design characteristics were associated with true or false
positivity. Retrospective cohort studies had a slightly smaller chance of yielding a false positive finding (OR = 0.80, 95% CI: 0.44–1.47). The number of prospective cohort studies was too small to be analysed in detail; however, all five prospective cohort studies were categorized as being true positives. Case-control studies or cohort studies not based on cancer registries had a decreased likelihood of yielding false positive outcomes compared to studies in which cancer registries played an important role (OR = 0.35, 95% CI: 0.19–0.62). The likelihood of a false positive study tended to be smaller if adjustments for other risk factors were made. Studies in which an adjustment for smoking was made had a decreased likelihood of yielding false positive results (OR = 0.51, 95% CI: 0.29–0.90). The same was true for studies in which adjustments for factors other than smoking were made (OR = 0.67, 95% CI: 0.36–1.22). In general terms, studies with information on exposure concentration had a decreased likelihood of yielding false positive results (OR = 0.70, 95% CI: 0.34–1.44). If a study reported a positive dose-response relationship, the likelihood of a false positive finding was substantially less than if a dose-response relationship was not tested for (OR = 0.26, 95% CI: 0.10–0.71). This was not the case if a dose-response relationship was tested for but not found, compared to studies in which the relationship was not tested for. Thus the presence of a dose-response association appeared to be associated with a lower risk of a study finding being a false positive. A decreased OR was found both for studies that reported a positive duration-effect relationship and for studies that found none, compared with studies in which this association was not tested for. Thus it must be concluded that the odds of a false positive finding are related to whether this association was tested for, rather than to whether a positive duration-effect relationship was found.

Other variables
The log-transformed OR was a strong predictor of the outcome of a study not being a false positive (OR: 0.56 per unit increase in the logarithm of the reported OR [95% CI: 0.37–0.86]). There
was only a marginal difference in likelihood of a false positive
study between journals specifically focused on the occupational
environment and other journals. The difference between
countries was considerable, although not statistically significant
at conventional levels. Taking the US as the reference group, the
most likely countries to report false positive results were the
Scandinavian countries.
Affiliation of the first author was also related to true or false positivity.


Table 2 Associations between study characteristics and true or false positive status of the study in terms of odds ratios (OR)^a and 95% CI

Study characteristic | OR | 95% CI
Occupational health journals versus other journal | 1.27 | 0.73–2.23
UK versus USA | 0.68 | 0.19–2.38
Scandinavia versus USA | 1.54 | 0.73–3.26
Rest of Europe versus USA | 0.50 | 0.22–1.13
Elsewhere versus USA | 1.01 | 0.43–2.36
Design
Retrospective cohort versus case-control^b | 0.80 | 0.44–1.47
Other designs versus case-control^b | 1.16 | 0.49–2.76
Hypothesis specific for
Occupation and cancer? Specific versus not specific | 1.14 | 0.47–2.76
Agent and organ? Specific hypothesis versus no specific hypothesis | 0.23 | 0.11–0.49
Agent and organ? 2, 3 or 4 hypotheses versus no specific hypothesis | 0.57 | 0.24–1.36
No. of significance tests in article: Less than 50 versus over 50 | 0.62 | 0.35–1.08
Was study based on cancer registry? No versus yes | 0.35 | 0.19–0.62
Availability of exposure data or estimates? Yes versus no | 0.70 | 0.34–1.44
Adjustment for other factors than smoking? Yes versus no | 0.67 | 0.36–1.22
Adjustment for smoking? Yes versus no | 0.51 | 0.29–0.90
Dose-effect relation
No dose-effect relation found versus not tested | 0.96 | 0.27–3.41
Dose-effect relation found versus not tested | 0.26 | 0.10–0.71
Positive duration effect
No positive duration effect relation found versus not tested | 0.52 | 0.21–1.31
Positive duration effect relation found versus not tested | 0.55 | 0.29–1.05
Observed association statistically significant? Yes versus no | 0.76 | 0.35–1.62
Affiliation first author
Government versus university | 0.84 | 0.47–1.53
Industry versus university | 0.45 | 0.09–2.23
Other versus university | 0.92 | 0.35–2.65
Collaboration between institutes? Yes versus no | 0.97 | 0.53–1.78
Fishing expedition? No versus yes | 0.33 | 0.18–0.57
Ln (odds ratio) (per 1 unit increase) | 0.56 | 0.37–0.86

^a An odds ratio below 1 indicates a factor that decreases the risk of a false positive finding.
^b Other designs are all designs other than case-control studies and retrospective and prospective cohort studies, for instance proportional mortality ratio studies and ecological studies. Prospective cohort studies were excluded from 'other designs' because of the small number.

Chances of a false positive study outcome were smallest if the first author was affiliated with industry. The highest chance of a false positive result was observed if the first author was affiliated with a university.

Adjustment for other factors


A logistic model that included the most important factors from
the univariate analyses reported above is presented in Table 3.
The variable selection for adjustment in the logistic regression
model was restricted to variables describing study design characteristics and was further based on the magnitude of the univariate OR. Since there was a strong correlation between the variables 'fishing expedition' and 'specific hypothesis', the latter was not included in the model. From this model the factors fishing expedition, adjustment for smoking, investigation of a dose-response relationship, log odds ratio and use of cancer registry data emerged as important and statistically significant factors, independently of the other included variables.

Discussion
This comparative study focused on identifying factors in
research objectives and design characteristics that may affect
the likelihood of a false positive finding in occupational cancer
epidemiology.
We selected the IARC classification as the gold standard for
distinguishing between true positive and false positive studies.
This choice gives some room for circular reasoning, since the
IARC classification in itself is based on a judgement of true/
false positivity of the epidemiological study results. However, a
line must be drawn somewhere. Our choice was to let this line
be drawn by IARC. The IARC classification has the advantage of
being based on an evaluation of the full evidence of the carcinogenicity of the compound. Clearly the research design least likely to yield false positive results is the experimental design, with control over the exposure, randomization and blinded observation of changes in health.


Table 3 Associations between a selection of study characteristics and false positive status of the study in a multiple logistic model, odds ratios (OR) and 95% CI

Study characteristic | OR | 95% CI
Fishing expedition? No versus yes | 0.22 | 0.07–0.62
Adjustment for smoking? Yes versus no | 0.33 | 0.11–1.00
Adjustment for other factors than smoking? Yes versus no | 1.13 | 0.36–3.54
No dose-effect relation found versus not tested | 0.49 | 0.07–3.52
Dose-effect relation found versus not tested | 0.12 | 0.02–0.80
Was study based on cancer registry? No versus yes | 0.26 | 0.11–0.62
No. of significance tests in article: <50 versus >50 | 0.75 | 0.30–1.89
Log odds ratio (per unit increase) | 0.44 | 0.23–0.84

However, this option is normally not practicable for ethical reasons and because of possible long latency periods. Intuitively, however, it can be assumed that the study
that most resembles a true experiment should be least likely to
yield false (positive) results, which has also been argued by
Mayes et al.7 Our results support this conclusion, particularly if
one realizes that an experiment is usually performed to test a
highly specific hypothesis. Our study also supports the strong
need to set up studies designed to test for a specific hypothesis.
Over-reporting of positive results, either false positive or true
positive, is thought to exist in the open literature because of
publication bias.12 Publication bias may have had an effect on
the studies selected for our analysis. However, it is questionable
if selection bias has had an effect on our conclusions, since such
an effect is only possible if publication bias has an effect on the
type of false positive or true positive studies (and in a selective
manner) published in the literature.
In summary, several design characteristics, but in particular
the specificity of the study aim and whether it was a fishing
expedition or not, were associated with the likelihood of a false
positive study result. The strongest associations were found for
the inclusion of adjustment for confounders, and the specificity
of the hypothesis under investigation. The latter association
was confirmed by the finding that fishing expeditions were over
three times more likely to yield a false positive finding than
studies that were based on specific hypotheses to be tested.
This finding persisted in a logistic model which included other
important design characteristics.
With respect to study outcome, there were several factors
associated with the likelihood of a false positive finding. Studies
that reported a dose-response relationship were much more
likely to yield a true positive finding than studies in which no
dose-response relationship was reported, or this was not investigated. There was no association between reporting a duration-response relationship or not and the likelihood of a true positive
study result. As expected, the strength of the reported OR was
also associated with true or false positivity.

The finding that researchers affiliated with industry report


fewer false positive findings deserves some discussion. Since
our study does not have an intrinsic gold standard we cannot
distinguish between two explanations for this finding. First, it is
possible that researchers from industry might be less likely to
report findings that are not in agreement with other findings.
Second, it is possible that researchers from academia are more
likely to be driven by the need to publish results. The 'publish or perish' paradigm is more applicable to researchers from
universities than to researchers from industry.
Given the intrinsic limitations of an observational, non-experimental study it is difficult to draw solid causal inferences
from our study. In addition it was not possible to take into
account the possibility that researchers present, in their articles,
their objectives and conclusions with a different perspective
when dealing with a possible false positive outcome or possible
true positive outcome. Despite these limitations it can be concluded that several factors are strongly associated with the
likelihood of a false positive or true positive study result. Some
of these factors, such as strength of the association between exposure and effect and a dose-response relationship, are already
strongly embedded in the criteria for causality that have been
described by Hill13 and later modified by Susser.14 However, a
similarly strong factor, whether or not the study has a specific
a priori hypothesis, is not included in the well-known criteria
for causality. The type of study (cohort or case-control), a factor
often mentioned in the context of inferior or superior designs,
did not emerge as an important factor in the analysis, although
the number of cohort studies and case-control studies included
was comparable (97 retrospective cohort, 95 case-control). We
have to remark that most of the occupational cohort studies
are retrospective cohort studies, a design that is often thought
to be less robust than prospective cohort studies.
The results of the analysis reported here clearly indicate
that a study with a specific a priori hypothesis should be valued
more highly in establishing a causal link between exposure and
effect than a mere fishing expedition.

KEY MESSAGES

Studies testing a specific a priori hypothesis are less likely to report false positive outcomes.

Adjustment for other factors, especially smoking, decreases the risk of a false positive study outcome.

A positive dose-response relationship and a substantial relative risk decrease the risk of a false positive finding.


We therefore suggest using results from fishing expedition studies only for hypothesis generation, and not as a basis for conclusions regarding the potential carcinogenicity of the substance under study. This is especially true if cancer registry data are used. Also, results from studies without correction for smoking or other confounders, and from studies that did test whether or not there is a dose-response relationship but did not find one, have to be handled with care.

References
1 Taubes G. Epidemiology faces its limits. Special News Report. Science 1995;269:164–69.
2 Trichopoulos D. The future of epidemiology. Br Med J 1996;313:436–37.
3 Koplan JP, Thacker SB, Lezin NA. Epidemiology in the 21st century: calculation, communication and intervention. Am J Public Health 1999;89:1153–55.
4 Bhopal R. Paradigms in epidemiology textbooks: in the footsteps of Thomas Kuhn. Am J Public Health 1999;89:1162–65.
5 Bankhead C. Debate swirls around the science of epidemiology. J Natl Cancer Inst 1999;91:1914–16.
6 Feinstein AR, Horwitz RI, Spitzer WO, Battista RN. Coffee and pancreatic cancer: the problem of etiologic science and epidemiological case-control research. JAMA 1981;246:957–61.
7 Mayes L, Horwitz RI, Feinstein AR. A collection of 56 topics with contradictory results in case-control research. Int J Epidemiol 1988;17:680–85.
8 Hernberg S. Introduction to Occupational Epidemiology. Michigan: Lewis Publishers, 1992.
9 Easterbrook PJ, Berlin R, Gopalan R, Matthews DR. Publication bias in clinical research. Lancet 1991;337:867–72.
10 Dickersin K, Min Y, Meinert CL. Factors influencing publication of research results. JAMA 1992;267:374–78.
11 International Agency for Research on Cancer. IARC monographs on the evaluation of carcinogenic risks to humans. List of IARC evaluations. Lyon: IARC, October 1996.
12 Thornton A, Lee P. Publication bias in meta-analysis: its causes and consequences. J Clin Epidemiol 2000;53:207–16.
13 Hill AB. The environment and disease: association or causation? Proc R Soc Med 1965;58:295–300.
14 Susser M. What is a cause and how do we know one? A grammar for pragmatic epidemiology. Am J Epidemiol 1991;133:635–48.

© International Epidemiological Association 2001
International Journal of Epidemiology 2001;30:954–957
Printed in Great Britain

Commentary: Toward systematic reviews in epidemiology
Michael B Bracken

One does not need to agree with the premise of Swaen et al.,1
who examine the issue of false positive outcomes in this issue of
the International Journal of Epidemiology, or have to ignore some
methodological weakness in their study, or even to think their
conclusions simply reaffirm some very basic scientific principles,
to see considerable merit in the approach taken by these investigators and to believe their paper commands attention.
Taubes' paper,2 which highlighted the discrepancies between
different study results that arise in epidemiology and the effect
this has on public opinion, was famously read by some as predicting the imminent demise of epidemiology but it also permitted
a broader examination of the state of the discipline.3 To be
sure, epidemiology produces conflicting results but so does any
research enterprise. It is only because the public has such a keen
interest in the results of epidemiological studies that they are
seen to be at the sharp end of this particular stick. Climatologists, nuclear physicists and students of the fall of the Roman Empire are all seen to produce their share of discrepant observations when the spotlight of public scrutiny falls on them. In fact, epidemiologists know a lot about the correct ways to conduct a research study but less about how to review and synthesize data from multiple studies, and this, I suggest, is a principal source of the public's confusion when faced with a new result from an epidemiological study.

Correspondence: Professor Michael Bracken, Department of Epidemiology and Public Health, Yale University School of Medicine, 60 College Street, PO Box 208034, New Haven, CT 06520-8034, USA. E-mail: michael.bracken@yale.edu
The paper by Swaen et al.1 incurred several methodological
difficulties of its own. How does one define a true positive result
(i.e. what is the gold standard)? If investigators set up a hypothesis and test it on data already collected for another purpose
is that necessarily a fishing expedition? Indeed, is all secondary
data analysis a fishing expedition? Should Swaen et al. have
looked at the issue of false negative studies to derive a completely balanced picture? And one might mischievously ask whether their own paper is itself an example of a false positive result.
Despite these limitations, this is an innovative attempt to try
and quantify some biases that may lead to misinterpretation in
an epidemiology review. It follows in the footsteps of a growing
body of work done under the purview of the discipline of
evidence-based medicine and healthcare which, by doing studies
of studies, searches for sources of bias in accumulating the
entire body of literature on a topic. Among many aspects of the


science of review, these studies have led to a better understanding of the role of publication4 and citation bias,5,6 the importance
of different data base searching strategies,7 the validity of abstracts
in accumulating research evidence,8–10 and the validity of
different methods used for quantifying the quality of studies.11
Reassuringly, Swaen et al. find the largest factor in a false
positive study to be the absence of an a priori hypothesis as that
is arguably the most fundamental of all scientific principles.
Similarly, a dose-response relationship and adjustment for a
major confounder (smoking), as expected, lead to fewer false
positive results. It is interesting that study design is not itself a
factor but there is little reason it should be; there is nothing
intrinsically wrong with case-control studies once problems
with confounding have been adjusted, as they have here. Many
of these issues are a matter of faith in epidemiology and it is
reassuring to have some empirical evidence for them.
It is a great paradox in epidemiology that while the profession
is very conversant with the requirements for conducting valid
studies, it has generally neglected the need for rigorous, objective and hypothesis-driven summaries of the totality of epidemiological evidence when reviewing a particular topic. Early
critics of the lack of scientific rigor in literature reviews focused
on medicine,12,13 but in one recent analysis, over 60% of epidemiology reviews were considered to not meet the standards
of a systematic review,14 and several specific biases in epidemiology
reviews have been reported for chronic fatigue syndrome15 and
passive smoking.16 While calls for more quantitative reviews in
epidemiology are starting to be made,17 the overall poor quality
of current epidemiology reviews is in marked contrast to the
field of evidence-based medicine and healthcare which over
the last 12 years has made remarkable strides in developing a
methodology and strict standards for systematically reviewing
and analysing a body of literature.18 While some epidemiologists have played a major role in these developments, by and
large it appears that epidemiologists still review evidence using
traditional and potentially biased methods.
Tröhler has recently provided a comprehensive account of the
origins of evidence-based medicine, focusing on its early history
in Britain.19 Of particular note are the arithmetic observationists
who sought to quantify the mass of new observations being
made in medicine in the late 18th century, and exemplified by
William Black who in his text An Arithmetic and Medical Analysis
of the Diseases and Mortality of the Human Species wrote in 1789:
'however it may be slighted as an heretical innovation, I would strenuously recommend Medical Arithmetick, as a guide and compass through the labyrinth of the therapeutick.'20 (cited in 19, p.117)


The preparation of systematic reviews in epidemiology goes


back at least 100 years. Chalmers et al.21 remind us of an early
review and meta-analysis of 11 studies by Karl Pearson who in
1904 reviewed evidence of typhoid vaccines using many of the
strategies expected in modern systematic reviews.22 Winkelstein23
has also brought to our attention the early work of Joseph
Goldberger who in 1907 reviewed 26 studies concerning the
frequency of urinary infection in cases of typhoid fever.24
Goldberger also followed many of the maxims of modern
research synthesis. It remains an interesting question why
epidemiologists today have only rarely continued in the early
tradition of more empirical research review.
To test (in an admittedly simple manner) the hypothesis that
epidemiology reviews are not meeting modern standards of
research synthesis, I analysed 39 reviews in 5 recent issues
of Epidemiologic Reviews, the pre-eminent source for reviews of
the epidemiology literature. I asked three questions of each
review, all reflecting some but by no means all of the principles
frequently used to characterize a high quality systematic review
within evidence-based medicine, as promulgated by the Cochrane
Collaboration.25 First, did the review address a focused research
question based on well-defined a priori exposures being related
to a defined pattern of disease? Second, was the method of locating evidence described in detail in the review? Third, were explicit criteria prespecified to indicate the rationale for including or excluding a study? These criteria are also the first three
in a larger set of criteria used by Mulrow and colleagues to
examine the quality of medical reviews.12,26 Importantly, the
use of meta-analysis was not a criterion for a systematic review.
Systematic reviews do not require a meta-analysis, which may
be deemed inappropriate because of sparse or heterogeneous
results, and not all reviews which include meta-analyses follow
the requirements of a systematic review. The choice of studies
meta-analysed may be serendipitous rather than being based on
a well-defined protocol.
Table 1 shows the result of the analysis of the epidemiology
reviews and compares them to the results of Mulrow and colleagues
who recently updated their earlier examination of medical review
articles.26 The single criterion that epidemiology reviews most
commonly meet is to have the review address a focused, well-defined question, although this was still only met in about half
of reviews (49%). Providing a description of the methods used to
locate the evidence being reviewed in the form of prespecified
criteria for data base searching (15%) and using explicit criteria
to select studies included in the review (10%) were rarely met
criteria. Reviews in epidemiology show a similar lack of rigor to
those in medicine generally and are methodologically inferior to
meta-analyses, systematic reviews and overviews.

Table 1 Methodology of review articles in medicine and epidemiology

Per cent of reviews positive

Criterion | Medicine 1985–1986 (n = 50)12 | Medicine 1996 (n = 158)26 | Epidemiology 1997–1999 (n = 39)a | Meta-analyses 1996 (n = 19)b
Review addressed a focused research question | 80 | 34 | 49 | 95
Described method for locating evidence |  | 28 | 15 | 95
Used explicit criteria to select studies |  | 14 | 10 | 68

a From Epidemiologic Reviews Vols 19–21 omitting methodology reviews, editorials, and a historical review of one major trial.
b A subset of the 158 Medicine reviews described as meta-analyses, systematic reviews or overviews.26


Epidemiologists are not alone in having neglected the need


to construct methodologically rigorous and unbiased reviews
of research evidence. Chalmers et al.21 document calls for systematic reviews in physics,27 education,28 psychology,29 and the
social sciences.30 They suggest:
'Many, if not most people working within academia, have not yet recognized (let alone come to grips with) the rationale for and methodological challenges presented by research synthesis. Research synthesis is only now beginning to be seen as proper research in the corridors of academic power. As much as anything else, this change seems to reflect the fact that the consumers of research are pointing out more forcibly that the atomised, unsynthesized products of the research enterprise are of little help to people who wish to use research to inform their decisions.'21
If epidemiologists fail to use modern methods of scientific
review to derive unbiased syntheses of study results, is it any
surprise that journalists do not do so either? It has always been
a premise of scientific reporting that after describing a study's
new findings, the investigator has a duty to synthesize the new
results into the extant body of evidence on the topic. This aspect
of epidemiology reporting may be occurring less frequently,
perhaps because of editorial pressures to reduce the length of
articles or perhaps because students are not being trained in this
aspect of report writing; this itself is an area of research. Clinical
trial reports have been found to inadequately synthesize their
results within the current body of comparable evidence.31 A
systematic review should validly reflect the current state of knowledge on a given topic and should form the basis for scientific
reporting. If there were more concurrent systematic reviews
in epidemiology, and new research findings were routinely
discussed within the context of a systematic review, it would be
a relatively easy task to refer the inquiring journalist or policy
maker to the discussion section of a paper for an explanation
of how the new report had changed the totality of evidence, if
at all.
The study by Swaen et al. uses an innovative, albeit imperfect,
research design to investigate sources of bias in the epidemiology
literature. In doing so, it joins a growing body of literature on
the science of systematically reviewing and analysing research
evidence. The Cochrane Library25 includes a methodological
data base of some 1350 titles. It is worth noting that scholars of
evidence-based medicine have largely focused their attention
on randomized trials, the methodology widely considered to be
the gold standard of study design, but even here there remain
concerns about reviewing evidence based on trials.32 How
much more difficult will be the review of areas of research based
on observational study designs and how much more likely the
chance for bias, confusion and error?33
The limits of epidemiology are most likely faced when
studying associations of rare disease with rare exposures,34 and
some of the characteristics of these studies are found in the
occupational studies forming the basis for Swaen et al.s analysis.
Would a comparable review of a more common exposure with
a common outcome lead to similar conclusions? Only more study
will tell. However, it is the rare exposure-rare outcome that
increasingly tests epidemiology. As individual studies become
more challenging then systematically reviewing the evidence

from these studies will pose its own increasing difficulty. It


may be that unsystematic and poorly conducted reviews of the
smoking-lung cancer association would still correctly conclude
that an association existed simply because of the strength of
the relationship under study. This is less likely to happen as
epidemiologists focus on rare disease-rare exposure associations.
In these instances, the science of conducting high quality evidence-based reviews becomes increasingly critical if epidemiology is to
credibly inform the public of the current risks to health to which
it believes it may be exposed.

Acknowledgement
I am grateful to Iain Chalmers for comments on an early draft
of this paper.

References
1 Swaen GGMH, Teggeler O, van Amelsvoort LGPM. False positive outcomes and design characteristics in occupational cancer epidemiology studies. Int J Epidemiol 2001;30:948–54.
2 Taubes G. Epidemiology faces its limits. Science 1995;269:164–69.
3 Bracken MB. Alarums false, alarums real: challenges and threats to the future of epidemiology. Ann Epidemiol 1998;8:79–82.
4 Laupacis A. Methodologic studies of systematic reviews: is there publication bias? Arch Intern Med 1997;157:357–58.
5 Gøtzsche P. Reference bias in reporting drug trials. Br Med J 1987;295:654–56.
6 Ojasoo T, Doré JC. Citation bias in medical journals. Scientometrics 1999;45:81–84.
7 Watson RJ, Richardson PH. Accessing the literature on outcome studies in group psychotherapy: the sensitivity and precision of Medline and PsycINFO bibliographic data base searching. Br J Med Psychol 1999;72:127–34.
8 Scherer RW, Dickersin K, Langenberg P. Full publication of results initially presented in abstracts. A meta-analysis. JAMA 1994;272:158–62.
9 Callaham M, Wears RL, Weber EJ, Barton C, Young G. Positive outcome bias and other limitations in the outcome of research abstracts submitted to a scientific meeting. JAMA 1998;280:254–57.
10 Pitkin RM, Branagan MA, Burmeister LF. Accuracy of data in abstracts of published research articles. JAMA 1999;281:1110–11.
11 Berlin JA, Rennie D. Measuring the quality of trials: the quality of quality scales. JAMA 1999;282:1083–85.
12 Mulrow CD. The medical review article: state of the science. Ann Intern Med 1987;106:485–88.
13 Peto R. Why do we need systematic overviews of randomized trials? Stat Med 1987;6:233–40.
14 Breslow RA, Ross SA, Weed DL. Quality of reviews in epidemiology. Am J Public Health 1998;88:475–77.
15 Joyce J, Rabe-Hesketh S, Wessely S. Reviewing the reviews: the example of chronic fatigue syndrome. JAMA 1998;280:264–66.
16 Misakian AL, Bero LA. Publication bias and research on passive smoking: comparison of published and unpublished studies. JAMA 1998;280:250–53.
17 Blettner M, Sauerbrei W, Schlehofer B, Scheuchenpflug T, Friedenreich C. Traditional reviews, meta-analyses and pooled analyses in epidemiology. Int J Epidemiol 1999;28:1–9.
18 Clarke M, Oxman AD (eds). Cochrane Reviewers' Handbook 4.0 [updated July 1999]. In: The Cochrane Library [database on CDROM]. The Cochrane Collaboration. Oxford: Update Software, 2000, Issue 1.
19 Tröhler U. To Improve the Evidence of Medicine. The 18th Century British Origins of a Critical Approach. Edinburgh: The Royal College of Physicians of Edinburgh, 2000.
20 Black W. An Arithmetick and Medical Analysis of the Diseases and Mortality of the Human Species. London: C Dilly, 1789.
21 Chalmers I, Hedges LV, Cooper H. A brief history of research synthesis. In: Clarke M (ed.). Evaluation and the Health Professions. In press.
22 Pearson K. Report on certain enteric fever inoculation statistics. Br Med J 1904;3:1243–46.
23 Winkelstein W. The first use of meta-analysis. Am J Epidemiol 1998;147:717.
24 Goldberger J. Typhoid bacillus carriers. In: Rosenau MJ, Lumsden LL, Kastle JH (eds). Report on the origin and prevalence of typhoid fever in the District of Columbia. Hygienic Laboratory Bulletin 1907;No. 35.
25 The Cochrane Library 2000 Issue 3 [database on CDROM]. The Cochrane Collaboration: Update Software; www.cochranelibrary.com
26 McAlister FA, Clark HD, van Walraven C et al. The medical review article revisited: has the science improved? Ann Intern Med 1999;131:947–51.
27 Herring C. Distil or drown: the need for reviews. Physics Today 1968;21:27–33.
28 Pillemer DB. Conceptual issues in research synthesis. J Spec Educ 1984;18:27–40.
29 Sterling TD. Publication decisions and their possible effects on inferences drawn from tests of significance, or vice versa. J Am Statist Assoc 1959;54:30–34.
30 Light RJ, Pillemer DB. Summing Up: The Science of Reviewing Research. Cambridge, MA: Harvard University Press, 1984.
31 Clarke M, Chalmers I. Discussion sections in reports of controlled trials published in general medical journals: islands in search of continents? JAMA 1998;280:280–82.
32 Jadad AR, Cook DJ, Jones A et al. Methodology and reports of systematic reviews and meta-analyses: a comparison of Cochrane reviews with articles published in paper-based journals. JAMA 1998;280:278–80.
33 Egger M, Schneider M, Davey Smith G. Spurious precision? Meta-analysis of observational studies. Br Med J 1998;316:140–44.
34 Bracken MB. Musings on the edge of epidemiology. Epidemiology 1997;8:337–39.

© International Epidemiological Association 2001
International Journal of Epidemiology 2001;30:957–958
Printed in Great Britain

Commentary: Prior specification of hypotheses: cause or just a correlate of informative studies?
David A Savitz

Published epidemiological studies are prone to spurious positive


findings. This is not just an issue bearing on the discipline's
credibility to outsiders but a fundamental methodological
concern. Epidemiologists must accept the challenge to improve
research methods, publish findings regardless of their implications, and objectively appraise the validity of our results and
those of our colleagues. Results are often dichotomized as
positive or negative, despite the loss of quantitative information resulting from this practice. The aetiology of false positive
reports is surely multifactorial. Some of this falls to the media
and the public for overinterpretation. Some results from the
exuberance of investigators who advertise their most surprising,
dramatic findings, despite the fact that results that run counter
to the conventional wisdom are most likely to be erroneous.
Human beings (not just epidemiologists) can become enamoured
with their own achievements, lose objectivity, and seek the
fame and fortune that result from startling discoveries. We need
to improve the resolution of our methods and devote greater
energy to helping to ensure appropriate use (or lack of use, in
many cases) of our findings by policy makers and the public.
Department of Epidemiology, Campus Box #7400, University of North
Carolina School of Public Health, Chapel Hill, NC 27599-7400, USA. E-mail:
david_savitz@unc.edu

Swaen and colleagues1 have taken on the important goal


of improving understanding of the aetiology of false positive
studies, searching for causes based on past research that could
be applied to future studies to help distinguish between true
positive and false positive findings. Such identifiers would enable
us to place a more appropriate level of confidence in study
findings, discounting some and paying more attention to others.
The authors deserve credit for attempting to bring some empirical,
quantitative evidence to bear on this important issue, but some
practical and conceptual barriers constrain the effectiveness of
the search and threaten to introduce false positive predictors of
false positive studies.
Formal specification of prior hypotheses, while empirically
predictive of more valid positive findings, is an artefact, not a
cause of such accuracy. In order for the hypothesis to be defined
in advance and narrowly focused, for few statistical tests to be
conducted, and for the study not to be categorized as a fishing
expedition, the prior evidence in support of the hypothesis
must be sufficiently strong. The biological context, experimental
support, or prior epidemiological studies presumably lay the
foundation that enables the researcher to specify a hypothesis
for evaluation. The act of articulating the hypothesis obviously
does not magically confer improved quality to the study. The
prior evidence in support of the hypothesis simultaneously


enables the investigator to focus the study and makes observed


positive findings more likely to be correct.
Every single hypothesis which a study addresses, even if there
are hundreds of such questions, has some level of credibility
prior to the study (whether the investigator knows it or not)
and a new level of support after the study is completed. Studies
do not generate new hypotheses; they only provide evidence
that helps to evaluate the credibility of hypotheses. If the hypothesis did not exist before the study was conducted, there would
have been no reason to calculate the measure of effect. To counter
the notion that epidemiological studies generate hypotheses,
as opposed to providing evidence to evaluate them, Philip Cole
laid the question of prior hypotheses to rest for all epidemiologists through his hypothesis generating machine.2 By proposing
every imaginable combination of exposure and disease, all hypotheses from that point forward can be considered to have been
formally specified a priori, whether or not the investigators knew
it or made any use of the information. To my knowledge, with
all studies from 1993 forward now based on a priori hypotheses,
there was no discernible improvement in the quality of epidemiological evidence, suggesting that the prior specification of
hypotheses did not in fact generate more accurate results.
It is the pre-existing support from epidemiology and other
disciplines that enhances the probability that a new positive
result will be a true positive, in that it would be consistent with
prior evidence. When studies of identical quality, undertaken
with little or no prior supporting evidence (referred to as the
pursuit of improbable hypotheses by Hatch3) generate positive
findings, this new positive evidence is by definition incompatible with the lack of prior support or may contradict previous
findings. In those situations, the cumulative support for the
hypothesis continues to be quite modest, so that the positive
study is operationally defined as a false positive. Bayesian
inference formalizes the prior belief in a hypothesis and the
influence of a given study in shifting that belief in one direction
or the other to a degree that depends on the quality and precision of the study. When the prior evidence for a hypothesis
is strong, a positive study is called a true positive study and
when the prior evidence is weak or in the opposite direction,
the positive study is called a false positive study. The mistake is
to confuse an increment in support from a positive study with
cumulatively strong support for the hypothesis. In so-called
fishing expeditions, many hypotheses, each with very limited
prior support, are being evaluated.
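A small numerical illustration of this point (the sensitivity and false-alarm probabilities below are assumed values, not taken from any study) shows how the same positive result updates a strong and a weak prior very differently:

# Bayesian updating sketch: the same positive study shifts a weak prior far
# less than a strong one; all probabilities here are assumed for illustration.
def posterior(prior, p_pos_if_true=0.8, p_pos_if_false=0.1):
    """P(hypothesis true | study is positive) by Bayes' theorem."""
    numerator = p_pos_if_true * prior
    return numerator / (numerator + p_pos_if_false * (1 - prior))

for prior in (0.50, 0.05):   # strong versus very limited prior support
    print(f"prior {prior:.2f} -> posterior {posterior(prior):.2f}")
# prior 0.50 -> posterior 0.89   (the positive study reads as a 'true positive')
# prior 0.05 -> posterior 0.30   (the positive study is still likely a 'false positive')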
The other correlates of prior specification of hypotheses suggest that studies with a focused hypothesis are undertaken with
more methodological rigor. Thus, the use of cancer registry data or
having access to the tremendous data resources available in
Scandinavia, which reduce the barrier to conducting research, are

predictive of false positive studies. Studies that are more expensive


and difficult to initiate, reflected by cohort designs and adjustment
for smoking and other confounders, are less vulnerable to false
positive findings. Obtaining research funding generally requires
a focus on specific hypotheses, with the focus allowing more
rigorous measurement of key variables, for example. The declaration of the hypothesis in advance does not convey benefit except
insofar as it leads to a methodologically stronger study. In fishing
expeditions, the quality of the evidence bearing on the many
hypotheses is often weak, given the failure to focus energy and
resources on any one of the many questions being addressed.
Thus, I hypothesize that the benefits of specifying prior hypotheses are spurious, once the level of prior supporting evidence
and the quality of the study methods are considered. Even though
there may well be predictive value, in that positive results from
a focused study are more likely to qualify as true positives than
similar results from a study that addresses many questions, this
is a result of confounding by prior evidence and study quality.
To ignore these correlates of hypothesis specification would lead
to the suggestion that we simply articulate our goals or set few
as opposed to many goals to avoid making errors. A colleague
once suggested that the validity of positive findings could be
enhanced if a small number of hypotheses were written in
advance, placed in a sealed envelope, and opened upon completion of data collection and analysis. Swaen et al.'s proposed
automatic downgrading of findings from fishing expeditions1
likewise suggests we need only state which fish we are looking
for in order to enhance the value of the fish we catch.
Both the prior evidence and quality of the study deserve intense
scrutiny, and the more fully each is understood, the more
effectively the cumulative evidence can be characterized. When
a hypothesis with limited prior support is being addressed, even
a strikingly positive study remains likely to be a false positive
in that the study is positive but the cumulative evidence for
the hypothesis is negative. The study's internal strengths will
determine how much it shifts the evidence for or against the
hypothesis in the direction of its findings. It is the cumulative
evidence, not the shift in evidence from the new study, which
should serve as the basis for individual decision making and
setting policy. Most positive findings from epidemiology really
do call only for more research.

References
1 Swaen GGMH, Teggeler O, van Amelsvoort LGPM. False positive outcomes and design characteristics in occupational cancer epidemiology studies. Int J Epidemiol 2001;30:948–54.
2 Cole P. The hypothesis generating machine. Epidemiology 1993;4:271–73.
3 Hatch M. Pursuit of improbable hypotheses. Epidemiology 1990;1:97–98.
