

What Everyone Should Know about Statistical Correlation

Article in American Scientist, January 2015


DOI: 10.1511/2015.112.26


Author: Vladica Veličković, University of Niš
Ethics

What Everyone Should Know about Statistical Correlation

A common analytical error hinders biomedical research and misleads the public.

Vladica M. Veličković

Vladica M. Veličković is a Doctor of Medicine, a PhD student in public health, and a full-time teaching assistant at the Public Health Department, Faculty of Medicine, University of Niš, Serbia. His research interests are in the use of computational and mathematical models for public health insight. E-mail: vladica.velickovic@medfak.ni.ac.rs

In 2012, the New England Journal of Medicine published a paper claiming that chocolate consumption could enhance cognitive function. The basis for this conclusion was that the number of Nobel Prize laureates in each country was strongly correlated with the per capita consumption of chocolate in that country. When I read this paper I was surprised that it made it through peer review, because it was clear to me that the authors had committed two common mistakes I see in the biomedical literature when researchers perform a correlation analysis.

Correlation describes the strength of the linear relationship between two observed phenomena (to keep matters simple, I focus on the most commonly used linear relationship, or Pearson's correlation, here). For example, the increase in the value of one variable, such as chocolate consumption, may be followed by the increase in the value of the other one, such as Nobel laureates. Or the correlation can be negative: The increase in the value of one variable may be followed by the decrease in the value of the other. Because it is possible to correlate two variables whose values cannot be expressed in the same units (for example, per capita income and cholera incidence), their relationship is measured by calculating a unitless number, the correlation coefficient. The correlation coefficient ranges in value from −1 to +1. The closer the magnitude is to 1, the stronger the relationship.

The stark simplicity of a correlation coefficient hides the considerable complexity in interpreting its meaning. One error in the New England Journal of Medicine paper is that the authors fell into an ecological fallacy, when a conclusion about individuals is reached based on group-level data. In this case, the authors calculated the correlation coefficient at the aggregate level (the country), but then erroneously used that value to reach a conclusion about the individual level (eating chocolate enhances cognitive function). Accurate data at the individual level were completely unknown: No one had collected data on how much chocolate the Nobel laureates consumed, or even if they consumed any at all. I was not the only one to notice this error. Many other scientists wrote about this case of erroneous analysis. Chemist Ashutosh Jogalekar wrote a thorough critique on his Scientific American blog The Curious Wavefunction, and Beatrice A. Golomb of University of California, San Diego, even tested this hypothesis with a team of coauthors, pointing out that there is no link.

Regardless of the scientific community's criticism of this paper, many news agencies reported on this article's results. The paper was never retracted, and to date has been cited 23 times. Even when erroneous papers are retracted, news reports about them remain on the Internet and can continue to spread misinformation. If these faulty conclusions reflecting statistical misconceptions can appear even in the New England Journal of Medicine, I wondered, how often are they appearing in the biomedical literature generally?

The example of chocolate consumption and Nobel Prize winners brings me to another, even more common misinterpretation of correlation analysis: the idea that correlation implies causality. Calculating a correlation coefficient does not explain the nature of a quantitative agreement; it only assesses the intensity of that agreement. The two factors may show a relationship not because they are influenced by each other but because they are both influenced by the same hidden factor: in this case, perhaps a country's affluence affects access to chocolate and the availability of higher education. Correlation can certainly point to a possible existence of causality, but it is not sufficient to prove it.

An eminent statistician, George E. P. Box, wrote in his book Empirical Model Building and Response Surfaces: "Essentially, all [statistical] models are wrong, but some are useful." All statistical models are a description of a real-world phenomenon using mathematical concepts; as such, they are just a simplification of reality. If statistical analyses are carefully designed, in accordance with current good practice guidelines and a thorough understanding of the limitations of the methods used, they can be very useful. But if models are not designed in accordance with the previous two principles, they can be not only inaccurate and completely useless but also potentially dangerous, misleading medical practitioners and public.
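The definition of the correlation coefficient given in this part of the article can be made concrete with a short sketch. The helper name `pearson_r` and the income and cholera numbers below are made up purely for illustration (they are not data from the article); the point is only that the coefficient is unitless and stays between −1 and +1, so re-expressing income in thousands of dollars leaves it unchanged.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))  # co-variation of x and y
    sxx = sum(a * a for a in dx)              # variation of x
    syy = sum(b * b for b in dy)              # variation of y
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical values, invented for this sketch only.
income_usd = [1200, 2500, 4000, 8000, 15000, 30000]   # per capita income, dollars
cholera_per_100k = [95, 80, 40, 22, 9, 1]             # cases per 100,000 people

r_dollars = pearson_r(income_usd, cholera_per_100k)

# Same incomes expressed in thousands of dollars: the units change, r does not.
income_kusd = [v / 1000 for v in income_usd]
r_thousands = pearson_r(income_kusd, cholera_per_100k)

print(r_dollars, r_thousands)
```

With these invented numbers the coefficient is negative (higher income, lower incidence), and both calls agree up to floating-point rounding.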

26 American Scientist, Volume 103
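The "hidden factor" explanation for the chocolate and Nobel Prize correlation can be sketched with simulated data. Everything here is an assumption made for illustration: the variable names, the normal noise, and the effect size of 2 are invented, and neither simulated variable influences the other. The partial correlation uses the standard first-order formula, which removes the linear effect of the third variable.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 5000

# Synthetic "countries": affluence drives both quantities independently.
affluence = [rng.gauss(0, 1) for _ in range(n)]
chocolate = [2 * z + rng.gauss(0, 1) for z in affluence]
laureates = [2 * z + rng.gauss(0, 1) for z in affluence]

r_xy = pearson_r(chocolate, laureates)
r_xz = pearson_r(chocolate, affluence)
r_yz = pearson_r(laureates, affluence)

# First-order partial correlation of chocolate and laureates, controlling for affluence.
partial_r = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

print("raw correlation:", round(r_xy, 3))
print("controlling for affluence:", round(partial_r, 3))
```

In this simulation the raw correlation is strong, while the correlation that remains after accounting for the shared driver is close to zero. Real data are rarely this clean, so the sketch illustrates the mechanism rather than a method for proving or disproving causality.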


I often use and design mathematical models to gain insight into public health problems, especially in the health technology assessment. For this purpose I use data from already published studies. Uncritical use of published data for designing these models would lead to inaccurate, completely useless (or worse, unsafe) conclusions about public health.

Getting to Know the Data
In well-designed experiments, correlation can confirm the existence of causality. Before causal inferences can be derived from nonexperimental data, however, careful statistical modeling must be used. For example, a randomized controlled trial published by epidemiologist Stephen Hulley of University of California, San Francisco, and colleagues determined that hormone-replacement therapy caused increased risk of coronary heart disease, even though previously published nonexperimental studies concluded that the therapy lowered its risk. The well-designed experiment showed that the lower-than-average incidence of coronary heart disease in the nonexperimental studies was caused by the benefits associated with a higher average socioeconomic status of those using the hormone treatment, not by therapy itself. Re-analyses of nonexperimental studies, including the effect of socioeconomic status on outcome, showed the same findings as the randomized controlled trial. But the damage was done: The US Food and Drug Administration Advisory Committee had already approved a label change for hormone replacement therapy that permitted prevention of heart disease to be included as an indication, almost a decade before the experiment mentioned above.

Even though scientists are well aware of the mantra "correlation does not equal causation," studies conflating correlation and causation are all too common in leading journals. A widely discussed 1999 article in Nature found a strong association between myopia and night-time ambient light exposure during sleep in children under two years of age. However, another study published a year later, also in Nature, refuted these findings and reported that the cause of child myopia is genetic, not environmental. This new study found a strong link between parental myopia and the development of child myopia, noting that myopic parents were also incidentally more likely to leave a light on in their children's bedroom. In this example, authors came to a conclusion based on a spurious correlation, without checking for other likely explanations. But as shown in the figure on page 28, completely, laughably unrelated phenomena can be correlated.

Along with the mistaken idea that correlation implies causation, I also see examples of a third, opposite type of correlation error: the belief that a correlation of zero implies independence. If two variables are independent of one another (for example, the number of calories I ate for breakfast over the past month and the temperature of the Moon's surface over the same period), then I would expect the linear correlation coefficient between them to be zero. The reverse is not always the case, however. A linear correlation coefficient of zero does not necessarily mean that the two variables are independent.

Although this principle can be applied in many cases, there are still nonmonotonic relationships (think of a line graph that goes up and down) in which the value of the correlation coefficient equaling zero will not imply independence. To better envision this abstract

www.americanscientist.org 2015 January–February 27


concept, imagine flipping a fair coin to determine the amount of a bet, using the following rule: When heads is flipped first and then tails, you lose $10; if tails comes up first and then heads, you win $20. If we define X as the amount of the bet and Y as the net winning, X and Y may have zero correlation, but they will not be independent; indeed, if you know the value of X, then you know the value of Y. Nevertheless, the relationship between the two variables may be nonlinear, and thus not detected by a linear correlation test.

Figure: per capita consumption of mozzarella cheese (US) plotted with civil engineering doctorates awarded (US), 2000 to 2009. Correlation: 0.958648 (96%). Sources: USDA and National Science Foundation; tylervigen.com.

Year    Mozzarella, pounds per capita (USDA)    Civil engineering doctorates awarded (NSF)
2000    9.3     480
2001    9.7     501
2002    9.7     540
2003    9.7     552
2004    9.9     547
2005    10.2    622
2006    10.5    655
2007    11      701
2008    10.6    712
2009    10.6    708

All sorts of unrelated phenomena can be correlated, including the per capita consumption of mozzarella cheese and the number of civil engineering doctorates awarded in the U.S. Misinterpretations of spurious correlations make it past peer review all too often.

Ideally, a scientist would plot the data first to make sure it is monotonic (steadily increasing or decreasing), but judging from the examples I see in the biomedical literature, some people are cutting corners. A U-shaped relationship between two variables may have a linear correlation coefficient of zero, but in that case it does not imply that the variables are independent.

In 1973, Frank Anscombe, a statistician from England, developed idealized data sets to graphically demonstrate this misconception. Called Anscombe's quartet, this representation shows four data sets that have very similar statistical properties, each with a correlation coefficient of 0.816. On first blush, the variables in each case appear to be strongly correlated. However, it is enough just to observe the plots of these four data sets to realize that such a conclusion is wrong (see figure below). Only the first graph clearly shows a linear relationship where the interpretation of a very strong correlation would be appropriate. The second and the fourth graphs show that the relationship between the two variables is not linear, and so the correlation coefficient of 0.816 would not be relevant. The third graph depicts an almost perfect relationship in which the linear correlation coefficient value should be almost 1, but a single outlier decreases the linear correlation coefficient value to 0.816.

Figure: Anscombe's quartet, four scatter plots of y against x.

Anscombe's quartet is a set of four plots that show data resulting in strong correlation coefficients, in this case of 0.816. Although that statistic appears to indicate a strong linear relationship, such a conclusion would only be appropriate for the top left graph. The other three violate assumptions of the statistical analysis, emphasizing the importance of plotting data first to choose a suitable analysis.

Such misconceptions can have major impacts on human health and policy. When testing the safety of a new substance, toxicologists often assume that high-dose tests will reveal low-dose effects more quickly and with less ambiguity than long-period, low-dose testing. But Anderson Andrade of the Charité University Medical School and his colleagues showed otherwise. They tested the effect of a plastic ingredient and endocrine disruptor called DEHP (di-(2-ethylhexyl)-phthalate) on rats at two widely different levels of exposure; in the experiment, the researchers monitored the activity of a key enzyme called aromatase, which induces masculinization in the brain. They showed that lower doses of DEHP suppress aromatase, but higher doses actually increase the enzyme's activity.

In Andrade's study, this dose-response curve follows a nonmonotonic pattern, and the usual high-dose tests would not predict these low-dose effects. In 2010, the US Consumer Product Safety Commission announced that products containing DEHP may be considered toxic and hazardous. Studies such as this one have led to the questioning of basic assumptions used to design toxicological tests of hormonally active compounds, and this example again confirms that sloppy analysis, or poor and superficial interpretation of data, certainly is not a benign phenomenon.

Avoiding Errors
All three misinterpretations of correlation can be avoided. Epidemiologist and statistician Austin Bradford Hill suggested in 1965 certain criteria that must be met to justify concluding causal

28 American Scientist, Volume 103


associations. Those criteria are still valid, but newer methods for drawing causal inference from observational data have also been developed. Others are still in development; for example, Judea Pearl and James Robins independently introduced a new framework for drawing causal inference from nonexperimental studies. Robins figured out a statistical solution that can convert nonexperimental data into data like those resulting from a randomized controlled trial.

To avoid an ecological inference fallacy, Hill suggests that researchers who lack data at the individual level should perform careful multilevel modeling. This kind of fallacy is often made in epidemiological studies when researchers only have access to aggregate data. In his 1997 book A Solution to the Ecological Inference Problem, Gary King of Harvard University describes the statistical difficulties that lead to such errors. As King explains, data used for ecological inferences tend to have massive levels of heteroskedasticity, meaning that the variability within different parts of a data set fluctuates widely across the range of values.

Aggregate data are often easier to obtain than data on individuals and may offer valuable clues about individual behavior when analyzed correctly, but that requires individual-level data. Then, modeling at the individual level must be performed in an attempt to determine the connection between individual and aggregate levels. Only then is it possible to conclude whether the correlation at the aggregate level applies to the individual level. Ecologic data alone do not allow one to determine whether ecologic bias is likely to be present for this type of data set; the only solution is to supplement the ecologic data with individual-level data. This type of modeling usually involves mixed or multilevel statistical models, which allow for individuals to be nested into aggregates.

To avoid assuming two variables are independent because their correlation equals zero, the data must be plotted to make sure it is monotonic. If not, one or both variables can be transformed to make them so. In a transformation, all values of a variable are recalculated using the same equation, so that the relationship between the variables is maintained but their distribution is changed. Different types of transformations are used for different distributions; for example, the logarithmic transformation compresses the spacing between large values and stretches out the spacing between small values, which is appropriate when groups of values with larger means also have larger variance. Without access to the original data, it is impossible to know whether this error has been committed.

Correlation errors are as old as statistics itself, but as the number of published papers and new journals continues to increase, errors multiply as well. Although it is not realistic to expect all researchers to have an in-depth knowledge of statistical methods, they must continuously monitor and extend basic methodological knowledge. Ignorance or uncritical assessment of the adequacy and limitations of statistical methods used often are the source of errors in academic papers. Involvement of biostatisticians and mathematicians in a research team is no longer an advantage but a necessity. Some universities offer the option for researchers to check their analysis with their statistics department before sending the article to review with a publication. Although this solution could work for some researchers, it provides little incentive for the researcher to take this extra time.

The process of scientific research requires adequate knowledge of biostatistics, a constantly changing field. To that end, biostatisticians should be involved in the research from the very beginning, not after the measurement, observations, or experiments are completed. On the other hand, basic knowledge of biostatistics is essential in the critical appraisal of published scientific papers. A critical approach must exist regardless of the journal in which the paper is published. A more careful use of statistics in biology can also help set more rigorous standards for other fields.

To avoid these problems, scientists must clearly show that they understand the assumptions behind a statistical analysis and explain in their methods what they have done to make sure their data set meets those assumptions. A paper should not make it through review if these best practices are not followed. To make it possible for reviewers to test and replicate analyses, the following three principles must become mandatory for all authors intending to publish results: publishing data sets as supplementary information alongside articles, giving reviewers full access to the software code used for the analysis, and registering the study in a publicly available database online with clearly stated study objectives before the beginning of research, with mandatory submission of summary results to avoid publication bias toward positive results. These steps could speed up the process of detecting errors even when reviewers miss them, provide increased transparency to bolster confidence in science, and, most important, avoid damage to public health caused by unintentional errors.

Bibliography

Aldrich, J. 1995. Correlations genuine and spurious in Pearson and Yule. Statistical Science 10:364–376.

Andrade, A. J. M., S. W. Grande, C. E. Talsness, K. Grote, and I. Chahoud. 2006. A dose-response study following in utero and lactational exposure to di-(2-ethylhexyl)-phthalate (DEHP): Non-monotonic dose response and low dose effects on rat brain aromatase activity. Toxicology 227:185–192.

Anscombe, F. J. 1973. Graphs in statistical analysis. American Statistician 27:17–21.

David, H. A. 2009. A historical note on zero correlation and independence. The American Statistician 63:185–186.

Hill, A. B. 1965. The environment and disease: Association or causation? Proceedings of the Royal Society of Medicine 58:295–300.

King, G. 1997. A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data. Princeton, NJ: Princeton University Press.

Lemmens, P. 2010. U-shaped curve. In N. Salkind (Ed.), Encyclopedia of Research Design. Thousand Oaks, CA: SAGE Publications. pp. 1587–1589. doi: 10.4135/9781412961288.n485.

Pearl, J. 2009. Causal inference in statistics: An overview. Statistics Surveys 3:96–146.

Wakefield, J. 2009. Multi-level modelling, the ecologic fallacy, and hybrid study designs. International Journal of Epidemiology 38:330–336. doi: 10.1093/ije/dyp179.

Zadnik, K., et al. 2000. Myopia and ambient night-time lighting. Nature 404:143–144.
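As a supplement to the article's point that a correlation of zero does not imply independence, here is a minimal sketch of a U-shaped relationship. The data are invented for illustration: y is simply the square of x, so y is completely determined by x, yet the linear coefficient over the full range is zero. The helper `pearson_r` is a hypothetical name used only in this sketch.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# A symmetric U shape: y depends entirely on x, but not linearly.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]
r_full = pearson_r(x, y)

# Restricting to the monotonic (increasing) half reveals the dependence.
x_right = [v for v in x if v >= 0]
y_right = [v * v for v in x_right]
r_right = pearson_r(x_right, y_right)

print("full U-shaped range:", r_full)
print("increasing half only:", round(r_right, 3))
```

The coefficient vanishes over the full range because the falling and rising arms cancel, which is why the article recommends plotting the data before trusting the number.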

www.americanscientist.org 2015 January–February 29
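The values printed with the mozzarella and doctorates figure (2000 to 2009) are enough to recompute its coefficient. The sketch below uses those ten pairs as they appear in the figure's data rows; the helper `pearson_r` is a hypothetical name, and the calculation is an ordinary Pearson coefficient, not necessarily the exact procedure used by the figure's source.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Per capita mozzarella consumption in pounds (USDA), 2000 to 2009.
mozzarella_lb = [9.3, 9.7, 9.7, 9.7, 9.9, 10.2, 10.5, 11, 10.6, 10.6]
# Civil engineering doctorates awarded in the US (National Science Foundation).
doctorates = [480, 501, 540, 552, 547, 622, 655, 701, 712, 708]

r = pearson_r(mozzarella_lb, doctorates)
print(round(r, 4))  # about 0.9586, in line with the figure's "Correlation: 0.958648"
```

A coefficient this high between two plainly unrelated series shows how little the number alone says about cause: both quantities simply rose over the same decade.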

