

Is It Really Robust?

Article  in  Methodology European Journal of Research Methods for the Behavioral and Social Sciences · January 2010
DOI: 10.1027/1614-2241/a000016



Original Article

Is It Really Robust?
Reinvestigating the Robustness of ANOVA Against
Violations of the Normal Distribution Assumption
Emanuel Schmider (1), Matthias Ziegler (1), Erik Danay (1), Luzi Beyer (1), and Markus Bühner (2)
(1) Humboldt University Berlin, Germany; (2) Karl-Franzens University Graz, Austria

Abstract. Empirical evidence for the robustness of the analysis of variance (ANOVA) against violations of the normality assumption is presented by means of Monte Carlo methods. High-quality samples from normally, rectangularly, and exponentially distributed basic populations are created by drawing samples of random numbers from the respective generators, checking their goodness of fit, and allowing only the best 10% to take part in the investigation. A one-way fixed-effects design with three groups of 25 values each is chosen. Effect-sizes are implemented in the samples and varied over a broad range. Comparing the outcomes of the ANOVA calculations for the different types of distributions gives reason to regard the ANOVA as robust. Both the empirical type I error α and the empirical type II error β remain constant under violation. Moreover, regression analysis identifies the factor "type of distribution" as not significant in explaining the ANOVA results.

Keywords: ANOVA, assumption violation, normal distribution, high-quality samples, Monte Carlo

The object of this paper is to empirically investigate the robustness of the univariate one-way fixed-effects analysis of variance (ANOVA) against deviations from the assumption of a normally distributed dependent variable. This is tested by applying samples from basic populations in which the distribution of individual values differs considerably from the normal distribution. The samples were derived with Monte Carlo simulations. In contrast to previous research, extensive effort was made to use data of high quality.

Previous Research

The appropriateness of the ANOVA is often doubted because its assumptions are violated. Micceri (1989) analyzed 400 published data sets and reported that most did not have univariate normal distributions. After analyzing 17 journals, Keselman et al. (1998) found that researchers rarely verify that validity assumptions are met. The harm done by violated assumptions can be understood quite intuitively by considering the role of group means and their variances. When data are skewed, means no longer reflect the central location. When variances are unequal, not every group has the same level of noise, and thus comparisons are invalid. More importantly, the inferences made from the sample statistic to the population parameter based on a test statistic might be flawed (Yu, 2002).

If a one-way ANOVA is used to analyze data sets of randomly selected numbers, the frequency distribution of empirical F values will approximate the probability density curve of the F statistic for the specified degrees of freedom only if the assumptions of the test are respected. Violations of the assumptions will impair this approximation (Ware, 2000), possibly leading to an inflation of the type I or type II error. Reviews by Glass et al. (1972) and Harwell et al. (1992) sum up a lot of evidence for the robustness of the ANOVA with regard to the empirical α and β values. Other evidence discourages the use of nonparametric tests because of the loss of precision that comes with transformations into rank data (Edgington, 1995), lower power (Tanizaki, 1997), and inaccuracy in case of multiple violations (Zimmerman, 1998). All in all, the findings speak for the robustness of the ANOVA concerning violations of the normality assumption and for the lack of valuable alternatives (see also Keselman, Algina, Lix, Wilcox, & Deering, 2008; Lix, Keselman, & Keselman, 1996). As reassuring as this seems, previous research could be inferior concerning the quality of samples. Firstly, early studies (before around 1985) worked with random number generators of questionable quality (Park & Miller, 1988). Secondly, it was not tested whether the samples underlying the Monte Carlo simulations were representative of their distributions. We resolve these deficits by using high-quality samples that incorporate a goodness-of-fit test between the samples and their underlying distributions.

Method

For our simulations, we take random numbers from three different distributions: the normal, the rectangular, and the exponential distribution. The normal distribution function is given by

 2010 Hogrefe Publishing Methodology 2010; Vol. 6(4):147–151



$$ f_{\mathrm{normal}}(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \tag{1} $$

with the mean μ and the standard deviation σ.
The rectangular distribution function is defined by

$$ f_{\mathrm{rect}}(x) = \begin{cases} 0, & \text{if } x < a \text{ or } x > b \\ \dfrac{1}{b-a}, & \text{if } a \le x \le b. \end{cases} \tag{2} $$

The exponential distribution function is given by

$$ f_{\mathrm{exp}}(x) = \begin{cases} 0, & \text{if } x < b \\ \dfrac{a}{e^{-ab}}\, e^{-ax}, & \text{if } x > b, \end{cases} \tag{3} $$

with the decay parameter a and the lower bound b. It is not symmetric but skewed to the right (skewness = 2), and its kurtosis is larger than that of the normal distribution.
In the following, we fix the free parameters of each distribution such that the first two moments correspond to a standard normal distribution: first moment (mean) = 0 and second moment (dispersion) = 1. The moments of the three standardized distributions are displayed in Table 1 and the curves are plotted in Figure 1.

Table 1. Moments with their meanings and values for the normal distribution (Equation 1), the rectangular distribution (Equation 2), and the exponential distribution (Equation 3)

n-th central moment   Meaning    Normal distribution   Rectangular distribution   Exponential distribution
1                     Mean       0                     0                          0
2                     Variance   1                     1                          1
3                     Skewness   0                     0                          2
4                     Kurtosis   3                     1.8                        9
Annotations. The first two moments are identical by construction.

Design of the Monte Carlo Simulations

A univariate, one-way experimental design with three groups (n = 25) and fixed effects was chosen. Empirical studies often use a design with three groups of 25 persons each because this size is often recommended as the threshold for robustness. Besides this, differences between groups were also varied using effect-sizes f. The corresponding means were calculated with G*power (Erdfelder et al., 1996) and are given in Table 2.
These effect-sizes cover the range from low to high effects as proposed by Cohen (Rothstein et al., 1990). Thus, if ANOVAs with an effect-size of, for example, f = 0.2 were analyzed, independent samples were simulated by programming the random number generator (Gough, 2003) such that the groups yield the following mean values: 0 for group 1; 0.2450 for group 2; and 0.4900 for group 3. By doing this, the variance within a group remains the same, which is important. The sample size of 75 can be considered optimal in terms of power for an effect-size of f = 0.37. This effect-size was added to check power empirically. Thus, a 3 (Distributions) × 8 (Effect-sizes) design with 24 conditions was chosen. For each condition 50,000 data sets were simulated.

Getting Appropriate Samples

By drawing samples from the respective distributions, normally, rectangularly, and exponentially distributed data were obtained. However, a sample is not always a good representative of the basic population it is drawn from. For this reason, the samples taken from the respective distributions were analyzed prior to conducting the actual ANOVAs. The aim was to extract those samples that are prototypical for their basic population. To determine prototypicality, the goodness of fit was computed with the Kolmogorov-Smirnov test (K-S test). Only the best 10% of the samples were retained: a linear algorithm picked out, by exhaustive comparison, the 5,000 samples with the best fit.

Statistical Analyses

For each of the 24 conditions 5,000 ANOVAs were computed. Outcomes were coded as 0 (H0 cannot be rejected) in case of no significant differences between the groups or 1 (H0 has to be rejected) in case of significant differences.
The results from the ANOVAs were then regressed on the effect-size and type of distribution with a logistic regression analysis. The type of distribution is a categorical variable and was therefore recoded into binary (dummy) variables.

Results

Goodness of Fit

Figure 2 contains plots of samples drawn from a normally distributed basic population.

Figure 1. (a) Normal distribution (Equation 1). (b) Rectangular distribution (Equation 2). (c) Exponential distribution (Equation 3). Parameters are fixed according to Table 1.
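The standardization of the three distributions (mean 0, variance 1; see Table 1) can be sketched as follows. This is a minimal illustration, not the authors' code: requiring mean 0 and variance 1 gives the rectangular bounds a = −√3 and b = √3, and for the exponential distribution the decay parameter a = 1 with lower bound b = −1. The function names are hypothetical, and Python's built-in generator is assumed here in place of the GNU Scientific Library generator cited in the text.

```python
import math
import random

# Hypothetical helper names; Python's built-in generator stands in for the
# GNU Scientific Library generator cited in the paper.
SQRT3 = math.sqrt(3.0)

def draw_normal(rng):
    """Standard normal: mean 0, variance 1."""
    return rng.gauss(0.0, 1.0)

def draw_rectangular(rng):
    """Uniform on [-sqrt(3), sqrt(3)]: variance (b - a)^2 / 12 = 1."""
    return rng.uniform(-SQRT3, SQRT3)

def draw_exponential(rng):
    """Exponential with rate 1, shifted by -1: mean 0, variance 1."""
    return rng.expovariate(1.0) - 1.0

def moments(xs):
    """Return mean, variance, skewness, and kurtosis (not excess) of xs."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2

rng = random.Random(2010)
for name, draw in [("normal", draw_normal),
                   ("rectangular", draw_rectangular),
                   ("exponential", draw_exponential)]:
    sample = [draw(rng) for _ in range(100_000)]
    print(name, [round(v, 2) for v in moments(sample)])
```

With large samples, the printed moments should lie close to the corresponding columns of Table 1; small deviations are sampling noise.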


Table 2. Effect-sizes f and their corresponding mean differences for the three groups

Mean difference   f = 0   f = 0.1   f = 0.2   f = 0.3   f = 0.37   f = 0.4   f = 0.5   f = 0.6
d1                0       0         0         0         0          0         0         0
d2                0       0.1225    0.2450    0.3675    0.4533     0.4900    0.6125    0.7350
d3                0       0.2450    0.4900    0.7350    0.9065     0.9800    1.2250    1.4700
Annotations. d1 = mean for group 1, d2 = mean for group 2, and d3 = mean for group 3.
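The means in Table 2 can be checked against Cohen's definition of the effect size, f = σ_m / σ, where σ_m is the standard deviation of the group means around their grand mean and σ is the common within-group standard deviation. The sketch below assumes three equally spaced means (0, d, 2d) and σ = 1, as in the table; the helper names are hypothetical. Under these assumptions d = f · √(3/2) ≈ 1.2247 · f, which agrees with the tabulated values to about three decimals (the table appears to be based on a rounded factor of 1.225).

```python
import math

def group_means(f, sigma=1.0):
    """Equally spaced means (0, d, 2d) that yield Cohen's f for three groups."""
    d = f * sigma * math.sqrt(1.5)  # because sigma_m = d * sqrt(2/3)
    return (0.0, d, 2.0 * d)

def cohens_f(means, sigma=1.0):
    """Cohen's f: SD of the group means (population formula) over sigma."""
    grand = sum(means) / len(means)
    sigma_m = math.sqrt(sum((m - grand) ** 2 for m in means) / len(means))
    return sigma_m / sigma

for f in (0.0, 0.1, 0.2, 0.3, 0.37, 0.4, 0.5, 0.6):
    d1, d2, d3 = group_means(f)
    print(f"f = {f:<4}  d1 = {d1:.4f}  d2 = {d2:.4f}  d3 = {d3:.4f}")
```

Applying `cohens_f` to a column of Table 2 recovers the effect size in the column header, which is a quick consistency check on the design.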

Figure 2. (a) Perfect sample from a normal distribution. (b) Good sample, α = 1.00. (c) Bad sample, α = 0.02. (d) Bad sample, α = 0.03.
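The selection of well-fitting samples illustrated in Figure 2 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: samples are ranked by the one-sample Kolmogorov-Smirnov distance D to the standard normal distribution (for a fixed sample size, a smaller D means a better fit), whole samples of 75 values are tested at once, and a plain sort stands in for the linear selection algorithm mentioned in the text. The function names are hypothetical.

```python
import random
from statistics import NormalDist

def ks_distance(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic D = sup |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        fx = cdf(x)
        # The empirical CDF jumps from i/n to (i + 1)/n at x.
        d = max(d, (i + 1) / n - fx, fx - i / n)
    return d

def best_fitting(samples, cdf, keep_fraction=0.10):
    """Return the keep_fraction of samples with the smallest K-S distance."""
    ranked = sorted(samples, key=lambda s: ks_distance(s, cdf))
    return ranked[: round(len(ranked) * keep_fraction)]

rng = random.Random(42)
candidates = [[rng.gauss(0.0, 1.0) for _ in range(75)] for _ in range(500)]
kept = best_fitting(candidates, NormalDist().cdf)
print(len(kept), "of", len(candidates), "samples kept")
```

For the rectangular and exponential conditions, the same filter would be applied with the respective cumulative distribution function passed as `cdf`.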

A perfect sample of the normal distribution is shown in Figure 2a. Good samples should closely resemble the perfect distribution. This holds for the sample shown in Figure 2b. In Figure 2c, however, the density at 0 is quite low, as only three data points lie between −0.3 and 0.2. Moreover, the symmetry is broken. Figure 2d, on the other hand, is rather symmetric but concentrates many values at −1 and 1, whereas the density at 0 is much lower. All in all, 5,000 prototypical samples were drawn in each condition.

ANOVAs

Results for the ANOVAs are shown in Table 3.
As can be seen, the number of significant tests increases with increasing effect-size, as was expected. However, even in the case of no actual group differences (effect-size = 0), there were some significant outcomes (around 5%, independent of the type of distribution). This can be explained by the fact that random numbers sometimes lead to unbalanced mean values between the groups and therefore yield significant differences. The percentage of such false results closely approximates the conventional level for type I errors.
On the other hand, in case of an optimal sample size, the type II error β should match the proportion of (wrongly) not significant outcomes. According to Table 3, the percentage of significant outcomes is around 80% for f = 0.37, so about 20% of the outcomes are (wrongly) not significant. Thus, the type II error β chosen as a basis when constructing the design could be replicated closely.
The invariance of α and β across distributions was assessed by comparing their estimated confidence intervals. According to Glass et al. (1972), the standard error for 5,000 ANOVA simulations, assuming α = 5%, is about 0.005, or 0.5%. The confidence intervals of the empirical α and β belonging to the three distributions overlap by more than 50% with each other. This indicates that the differences are not significant (Cumming & Finch, 2005). Therefore we can state that α and β stay constant under application of non-normal distributions.

Regression Analysis

The results for the regression analysis are given in Table 4. There is a significant influence of effect-size, but no significant impact of type of distribution.




Table 3. Results of the 5,000 ANOVA tests for each condition

Distribution   Effect-size f   Significant ANOVAs (Σ)   Significant ANOVAs (%)
Normal         0               257                      5.14
Normal         0.1             509                      10.18
Normal         0.2             1,550                    31.00
Normal         0.3             3,069                    61.38
Normal         0.37            4,013                    80.26
Normal         0.4             4,342                    86.84
Normal         0.5             4,891                    97.82
Normal         0.6             4,983                    99.66
Rectangular    0               263                      5.26
Rectangular    0.1             533                      10.66
Rectangular    0.2             1,548                    30.96
Rectangular    0.3             3,118                    62.36
Rectangular    0.37            4,038                    80.76
Rectangular    0.4             4,374                    87.48
Rectangular    0.5             4,901                    98.02
Rectangular    0.6             4,998                    99.96
Exponential    0               236                      4.72
Exponential    0.1             562                      11.24
Exponential    0.2             1,645                    32.90
Exponential    0.3             3,172                    63.44
Exponential    0.37            4,027                    80.54
Exponential    0.4             4,343                    86.86
Exponential    0.5             4,817                    96.34
Exponential    0.6             4,971                    99.42
Annotations. For each combination of distribution with effect-size 5,000 tests were computed.

Table 4. Results for the logistic regression analysis

                   B        SE      Wald        df   p       Exp (B)
Distribution                        3.791       2    0.150
Distribution (1)   −0.036   0.021   2.842       1    0.092   0.965
Distribution (2)   0.000    0.021   0.000       1    1.000   1.000
Effect-size        13.231   0.072   33770.980   1    0.000   557451.892
Constant           −3.383   0.025   18227.812   1    0.000   0.034
Annotations. B = regression coefficient; SE = standard error.

Nagelkerke's R² was .64. Considering that the kind of distribution was not a significant factor, this was primarily achieved by the factor effect-size.
The results have been reproduced by repeating all previous steps with new random numbers, leading to the same outcomes.

Discussion

The present study aimed at investigating the robustness of the ANOVA against violations of the underlying assumption of normally distributed data. Unlike previous studies, a high-quality random number generator was used to simulate normally, rectangularly, and exponentially distributed data. Besides distribution shape, effect-size was also varied. All other influences, such as group variance or group size, were held constant. Thus, results can causally be explained by the manipulations of distribution shape and effect-size. The results give strong support for the robustness of the ANOVA under application of non-normally distributed data.
A lot of effort was dedicated to running the Monte Carlo simulations with data of very high quality. The generation of high-quality random numbers was followed by a filtering process that compares the samples with the desired type of distribution and allows only the best 10% of the samples to pass on to the actual calculations. Taking into account that 10 times more random numbers were simulated than finally used, altogether 90 million random numbers were simulated. To the best of our knowledge, no previous investigation has taken this much care to use high-quality samples as a basis for its Monte Carlo simulations.

Limitations and Outlook

The present study focused on only one assumption of the ANOVA. Violations of other assumptions have been shown to negatively influence ANOVAs. It would be interesting to check further variations of the design with high-quality samples under violations of the normality assumption.
For now, the commonly given advice to use samples of 25 participants per condition in ANOVA designs to circumvent possible negative influences of violations of the normality assumption seems well founded.

References

Cumming, G., & Finch, S. (2005). Inference by eye: Confidence intervals and how to read pictures of data. American Psychologist, 60, 170–180.
Edgington, E. S. (1995). Randomization tests. New York: M. Dekker.
Erdfelder, E., Faul, F., & Buchner, A. (1996). G*power: A general power analysis program. Behavior Research Methods, Instruments, & Computers, 28, 1–11.
Glass, G. V., Peckham, P. D., & Sanders, J. R. (1972). Consequences of failure to meet the assumptions underlying the fixed effects analysis of variance and covariance. Review of Educational Research, 42, 237–288.
Gough, B. (2003). GNU Scientific Library Reference Manual (2nd ed.). Network Theory Ltd.
Harwell, M. R., Rubinstein, E. N., Hayes, W. S., & Olds, C. C. (1992). Summarizing Monte Carlo results in methodological research: The one- and two-factor fixed effects ANOVA cases. Journal of Educational and Behavioral Statistics, 17, 315–339.
Keselman, H. J., Algina, J., Lix, L. M., Wilcox, R. R., & Deering, K. N. (2008). A generally robust approach for testing hypotheses and setting confidence intervals for effect sizes. Psychological Methods, 13, 110–129.
Keselman, H. J., Huberty, C., Lix, L. M., Olejnik, S., Cribbie, R. A., Donahue, R. A. B., et al. (1998). Statistical practices of educational researchers: An analysis of their ANOVA, MANOVA, and ANCOVA analyses. Review of Educational Research, 68, 350–386.


Lix, L. M., Keselman, J. C., & Keselman, H. J. (1996). Consequences of assumption violations revisited: A quantitative review of alternatives to the one-way analysis of variance F test. Review of Educational Research, 66, 579–619.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105, 156–166.
Park, S. K., & Miller, K. W. (1988). Random number generators: Good ones are hard to find. Communications of the ACM, 31, 10.
Rothstein, H. R., Borenstein, M., Cohen, J., & Pollack, S. (1990). Statistical power analysis for multiple regression/correlation: A computer program. Educational and Psychological Measurement, 50, 819–830.
Tanizaki, H. (1997). Power comparison of non-parametric tests: Small-sample properties from Monte Carlo experiments. Journal of Applied Statistics, 24, 603–632.
Ware, M. E. (2000). Demonstrations and activities in the teaching of psychology, Vol. I. Erlbaum.
Yu, C. H. (2002). An overview of remedial tools for the violation of parametric test assumptions in the SAS system. Proceedings of 2002 Western Users of SAS Software Conference.
Zimmerman, D. W. (1998). Invalidation of parametric and nonparametric statistical tests by concurrent violation of two assumptions. Journal of Experimental Education, 67, 55–68.

Matthias Ziegler
Humboldt University Berlin
Institute for Psychological Diagnostics
Unter den Linden 6
10099 Berlin
Germany
E-mail matthias.ziegler@cms.hu-berlin.de

