
Analysis of variance

From Wikipedia, the free encyclopedia

In statistics, analysis of variance (ANOVA) is a collection of statistical models, and their associated procedures, in which the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether or not the means of several groups are all equal, and therefore generalizes the t-test to more than two groups. Doing multiple two-sample t-tests would result in an increased chance of committing a type I error. For this reason, ANOVAs are useful in comparing two, three, or more means.
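The inflation of the type I error under repeated t-tests can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that the pairwise tests are independent (in practice they share data, so the figure is only approximate); the function name and the 0.05 level are choices made here, not part of the article.

```python
from itertools import combinations


def familywise_error_rate(num_groups: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive over all pairwise tests.

    Assumes the tests are independent, which is a simplification.
    """
    num_tests = len(list(combinations(range(num_groups), 2)))
    return 1.0 - (1.0 - alpha) ** num_tests


# Two groups need 1 test, three groups need 3, five groups need 10.
for k in (2, 3, 5):
    print(k, round(familywise_error_rate(k), 3))
```

Under this assumption the chance of at least one spurious rejection grows quickly with the number of groups, which is the motivation for a single overall test.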

Models

There are three classes of models used in the analysis of variance, and these are outlined here.


Fixed-effects models (Model 1)

Main article: Fixed effects model

The fixed-effects model of analysis of variance applies to situations in which the experimenter applies one or more treatments to the subjects of the experiment to see if the response variable values change. This allows the experimenter to estimate the ranges of response variable values that the treatment would generate in the population as a whole.


Random-effects models (Model 2)

Main article: Random effects model

Random effects models are used when the treatments are not fixed. This occurs when the various factor levels are sampled from a larger population. Because the levels themselves are random variables, some assumptions and the method of contrasting the treatments differ from ANOVA model 1.

Mixed-effects models (Model 3)

Main article: Mixed model

A mixed-effects model contains experimental factors of both fixed and random-effects types, with appropriately different interpretations and analysis for the two types.



Assumptions of ANOVA

The analysis of variance has been studied from several approaches, the most common of which uses a linear model that relates the response to the treatments and blocks. Even when the statistical model is nonlinear, it can be approximated by a linear model for which an analysis of variance may be appropriate.


Textbook analysis using a normal distribution

The analysis of variance can be presented in terms of a linear model, which makes the following assumptions about the probability distribution of the responses:[citation needed]

Independence of cases[clarification needed]: this is an assumption of the model that simplifies the statistical analysis.

Normality: the distributions of the residuals are normal.

Equality (or "homogeneity") of variances, called homoscedasticity: the variance of data in groups should be the same. Model-based approaches usually assume that the variance is constant. The constant-variance property also appears in the randomization (design-based) analysis of randomized experiments, where it is a necessary consequence of the randomized design and the assumption of unit treatment additivity.[1] If the responses of a randomized balanced experiment fail to have constant variance, then the assumption of unit treatment additivity is necessarily violated.

To test the hypothesis that all treatments have exactly the same effect, the F-test's p-values closely approximate the permutation test's p-values. The approximation is particularly close when the design is balanced.[2] Such permutation tests characterize tests with maximum power against all alternative hypotheses, as observed by Rosenbaum.[nb 1] The ANOVA F-test (of the null hypothesis that all treatments have exactly the same effect) is recommended as a practical test, because of its robustness against many alternative distributions.[3][nb 2]

The Kruskal-Wallis test and the Friedman test are nonparametric tests, which do not rely on an assumption of normality.

The separate assumptions of the textbook model imply that the errors are independently, identically, and normally distributed for fixed-effects models, that is, that the errors ($\varepsilon$'s) are independent and $\varepsilon \sim N(0, \sigma^2)$.
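The permutation test mentioned above can be sketched in a few lines. This is a Monte Carlo approximation (random relabellings rather than all of them), with made-up data; the function names, the number of permutations and the seed are assumptions of this sketch. It does not compute the F-distribution p-value, which would need tables or a statistics library.

```python
import random


def f_statistic(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    values = [y for g in groups for y in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_treat = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_treat = len(groups) - 1
    df_error = len(values) - len(groups)
    return (ss_treat / df_treat) / (ss_error / df_error)


def permutation_p_value(groups, num_permutations=2000, seed=0):
    """Share of random relabellings whose F is at least the observed F."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [y for g in groups for y in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(num_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for n in sizes:
            shuffled.append(pooled[start:start + n])
            start += n
        if f_statistic(shuffled) >= observed:
            hits += 1
    # Count the observed labelling itself so the p-value is never zero.
    return (hits + 1) / (num_permutations + 1)


# Hypothetical responses for three treatments in a balanced design.
example_groups = [[4.1, 5.0, 4.6, 5.3], [6.2, 6.8, 7.1, 6.5], [9.0, 8.4, 9.3, 8.8]]
p_value = permutation_p_value(example_groups)
print(p_value)
```

Because the null hypothesis says the treatment labels are exchangeable, reshuffling the labels generates the reference distribution directly, with no normality assumption.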



Randomization-based analysis

See also: Random assignment and Randomization test

In a randomized controlled experiment, the treatments are randomly assigned to experimental units, following the experimental protocol. This randomization is objective and declared before the experiment is carried out. The objective random assignment is used to test the significance of the null hypothesis, following the ideas of C. S. Peirce and Ronald A. Fisher. This design-based analysis was discussed and developed by Francis J. Anscombe at Rothamsted Experimental Station and by Oscar Kempthorne at Iowa State University.[4] Kempthorne and his students make an assumption of unit treatment additivity, which is discussed in the books of Kempthorne and David R. Cox.[citation needed]


Unit-treatment additivity

In its simplest form, the assumption of unit-treatment additivity states that the observed response $y_{i,j}$ from experimental unit $i$ when receiving treatment $j$ can be written as the sum of the unit's response $y_i$ and the treatment-effect $t_j$, that is

$$y_{i,j} = y_i + t_j.$$

The assumption of unit-treatment additivity implies that, for every treatment $j$, the $j$th treatment has exactly the same effect $t_j$ on every experimental unit.

The assumption of unit-treatment additivity usually cannot be directly falsified, according to Cox and Kempthorne. However, many consequences of unit-treatment additivity can be falsified. For a randomized experiment, the assumption of unit-treatment additivity implies that the variance is constant for all treatments. Therefore, by contraposition, a necessary condition for unit-treatment additivity is that the variance is constant.

The property of unit-treatment additivity is not invariant under a "change of scale", so statisticians often use transformations to achieve unit-treatment additivity. If the response variable is expected to follow a parametric family of probability distributions, then the statistician may specify (in the protocol for the experiment or observational study) that the responses be transformed to stabilize the variance.[8] Also, a statistician may specify that logarithmic transforms be applied to the responses, which are believed to follow a multiplicative model.[9][10] According to Cauchy's functional equation theorem, the logarithm is the only continuous transformation that transforms real multiplication to addition.

The assumption of unit-treatment additivity was enunciated in experimental design by Kempthorne and Cox. Kempthorne's use of unit-treatment additivity and randomization is similar to the design-based inference that is standard in finite-population survey sampling.
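The effect of a logarithmic transform on a multiplicative model can be shown with a toy calculation. All numbers below are hypothetical: each unit has a baseline response and each treatment multiplies it by a fixed factor. On the raw scale the treatment effect differs from unit to unit; on the log scale it is the same for every unit, which is what unit-treatment additivity requires.

```python
import math

unit_baselines = [2.0, 5.0, 9.0]            # hypothetical per-unit responses
treatment_factors = {"A": 1.0, "B": 1.5}    # hypothetical multiplicative effects


def effect_spread(transform):
    """Range, across units, of the B-minus-A effect after applying `transform`."""
    diffs = [
        transform(u * treatment_factors["B"]) - transform(u * treatment_factors["A"])
        for u in unit_baselines
    ]
    return max(diffs) - min(diffs)


raw_spread = effect_spread(lambda y: y)  # effects 1.0, 2.5, 4.5: not additive
log_spread = effect_spread(math.log)     # effect log(1.5) for every unit

print(raw_spread, log_spread)
```

A spread of zero (up to floating-point rounding) on the log scale means the transformed responses satisfy the additive form, with treatment effect log(1.5).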


Derived linear model

Kempthorne uses the randomization-distribution and the assumption of unit treatment additivity to produce a derived linear model, very similar to the textbook model discussed previously. The test statistics of this derived linear model are closely approximated by the test statistics of an appropriate normal linear model, according to approximation theorems and simulation studies by Kempthorne and his students (Hinkelmann and Kempthorne 2008). However, there are differences. For example, the randomization-based analysis results in a small but (strictly) negative correlation between the observations.[11][12] In the randomization-based analysis, there is no assumption of a normal distribution and certainly no assumption of independence. On the contrary, the observations are dependent!

The randomization-based analysis has the disadvantage that its exposition involves tedious algebra and extensive time. Since the randomization-based analysis is complicated and is closely approximated by the approach using a normal linear model, most teachers emphasize the normal linear model approach. Few statisticians object to model-based analysis of balanced randomized experiments.


Statistical models for observational data

However, when applied to data from non-randomized experiments or observational studies, model-based analysis lacks the warrant of randomization. For observational data, the derivation of confidence intervals must use subjective models, as emphasized by Ronald A. Fisher and his followers. In practice, the estimates of treatment effects from observational studies are often inconsistent, and "statistical models" and observational data are useful for suggesting hypotheses that should be treated very cautiously by the public.[13]


Logic of ANOVA

Partitioning of the sum of squares


Main article: Partition of sums of squares

The fundamental technique is a partitioning of the total sum of squares $S$ into components related to the effects used in the model. For example, the model for a simplified ANOVA with one type of treatment at different levels gives

$$S_{\text{Total}} = S_{\text{Error}} + S_{\text{Treatments}}.$$

The number of degrees of freedom $f$ can be partitioned in a similar way: one of these components (that for error) specifies a chi-squared distribution which describes the associated sum of squares, while the same is true for "treatments" if there is no treatment effect.

$$f_{\text{Total}} = f_{\text{Error}} + f_{\text{Treatments}}.$$
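As a numerical check of the partition described above, the following sketch computes the three sums of squares and the matching degrees of freedom for a one-way layout. The data and the function name are invented for illustration.

```python
from statistics import fmean


def partition_sums_of_squares(groups):
    """Return (total, treatments, error) sums of squares for a one-way layout."""
    values = [y for g in groups for y in g]
    grand_mean = fmean(values)
    ss_total = sum((y - grand_mean) ** 2 for y in values)
    ss_treat = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_error = sum((y - fmean(g)) ** 2 for g in groups for y in g)
    return ss_total, ss_treat, ss_error


# Hypothetical responses at three treatment levels.
groups = [[3.0, 4.0, 5.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
ss_total, ss_treat, ss_error = partition_sums_of_squares(groups)

n_total = sum(len(g) for g in groups)
df_total = n_total - 1            # 8
df_treat = len(groups) - 1        # 2
df_error = n_total - len(groups)  # 6

print(ss_total, ss_treat, ss_error)  # 60.0 54.0 6.0
```

Here 60 = 54 + 6 and 8 = 2 + 6, so both the sums of squares and the degrees of freedom add up as the partition requires.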

See also Lack-of-fit sum of squares.



The F-test

Main article: F-test

The F-test is used for comparisons of the components of the total deviation. For example, in one-way, or single-factor, ANOVA, statistical significance is tested for by comparing the F test statistic

$$F = \frac{\text{variance between treatments}}{\text{variance within treatments}} = \frac{S_{\text{Treatments}} / (I - 1)}{S_{\text{Error}} / (n_T - I)},$$

where $I$ = number of treatments and $n_T$ = total number of cases,

to the F-distribution with $I - 1$, $n_T - I$ degrees of freedom. The F-distribution is a natural candidate because the test statistic is the ratio of two scaled sums of squares, each of which follows a scaled chi-squared distribution.
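A minimal sketch of the one-way F statistic follows, using the same notation (I treatments, n_T cases). The data are hypothetical, and the sketch stops at the statistic and its degrees of freedom: turning it into a p-value needs the F-distribution, from tables or a statistics library, which is not shown here.

```python
from statistics import fmean


def one_way_f(groups):
    """Return (F, df_treatments, df_error) for a one-way ANOVA."""
    values = [y for g in groups for y in g]
    num_treatments = len(groups)   # I in the text
    n_total = len(values)          # n_T in the text
    grand_mean = fmean(values)
    ss_treat = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_error = sum((y - fmean(g)) ** 2 for g in groups for y in g)
    df_treat = num_treatments - 1
    df_error = n_total - num_treatments
    f_value = (ss_treat / df_treat) / (ss_error / df_error)
    return f_value, df_treat, df_error


# Hypothetical responses at three treatment levels.
f_value, df_treat, df_error = one_way_f(
    [[3.0, 4.0, 5.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
)
print(f_value, df_treat, df_error)  # 27.0 2 6
```

For these data the between-treatment mean square is 54/2 = 27 and the within-treatment mean square is 6/6 = 1, so F = 27 on (2, 6) degrees of freedom.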



Power analysis

Power analysis is often applied in the context of ANOVA in order to assess the probability of successfully rejecting the null hypothesis if we assume a certain ANOVA design, effect size in the population, sample size and alpha level. Power analysis can assist in study design by determining what sample size would be required in order to have a reasonable chance of rejecting the null hypothesis when the alternative hypothesis is true.



Effect size

Main article: Effect size

Several standardized measures of effect gauge the strength of the association between a predictor (or set of predictors) and the dependent variable. Effect-size estimates facilitate the comparison of findings in studies and across disciplines.

$\eta^2$ (eta-squared): Eta-squared describes the ratio of variance explained in the dependent variable by a predictor while controlling for other predictors. Eta-squared is a biased estimator of the variance explained by the model in the population (it estimates only the effect size in the sample). On average it overestimates the variance explained in the population. As the sample size gets larger the amount of bias gets smaller.

$$\eta^2 = \frac{S_{\text{Treatments}}}{S_{\text{Total}}}$$

Cohen (1992) suggests effect sizes for various indexes, including $f$ (where 0.1 is a small effect, 0.25 is a medium effect and 0.4 is a large effect). He also offers a conversion table (see Cohen, 1988, p. 283) for eta-squared ($\eta^2$) where 0.0099 constitutes a small effect, 0.0588 a medium effect and 0.1379 a large effect.
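The two sets of benchmarks quoted above are linked by a simple conversion. The sketch below assumes the standard relation f² = η² / (1 − η²), which the text does not state explicitly; the function names are illustrative.

```python
import math


def eta_squared(ss_treatments: float, ss_total: float) -> float:
    """Share of the total sum of squares attributed to the treatments."""
    return ss_treatments / ss_total


def cohens_f(eta_sq: float) -> float:
    """Convert eta-squared to Cohen's f, assuming f^2 = eta^2 / (1 - eta^2)."""
    return math.sqrt(eta_sq / (1.0 - eta_sq))


# The eta-squared benchmarks from the text map onto the f benchmarks.
for eta_sq in (0.0099, 0.0588, 0.1379):
    print(eta_sq, round(cohens_f(eta_sq), 2))  # 0.1, 0.25, 0.4
```

Under that relation the small, medium and large values of eta-squared correspond to f of about 0.1, 0.25 and 0.4, matching the figures attributed to Cohen.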



Follow-up tests

A statistically significant effect in ANOVA is often followed up with one or more different follow-up tests. This can be done in order to assess which groups are different from which other groups or to test various other focused hypotheses. Follow-up tests are often distinguished in terms of whether they are planned (a priori) or post hoc. Planned tests are determined before looking at the data and post hoc tests are performed after looking at the data. Post hoc tests such as Tukey's range test most commonly compare every group mean with every other group mean and typically incorporate some method of controlling for Type I errors.

Comparisons, which are most commonly planned, can be either simple or compound. Simple comparisons compare one group mean with one other group mean. Compound comparisons typically compare two sets of group means where one set has two or more groups (e.g., compare the average of the group means of groups A, B and C with group D). Comparisons can also look at tests of trend, such as linear and quadratic relationships, when the independent variable involves ordered levels.
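Simple, compound and trend comparisons can all be written as contrasts: weighted sums of group means whose weights add up to zero. The sketch below uses hypothetical group means and only computes the contrast estimates; standard errors and significance tests are left out.

```python
def contrast_estimate(group_means, weights):
    """Weighted sum of group means; contrast weights must sum to zero."""
    if abs(sum(weights)) > 1e-12:
        raise ValueError("contrast weights must sum to zero")
    return sum(w * m for w, m in zip(weights, group_means))


# Hypothetical means for groups A, B, C and D.
means = [10.0, 12.0, 14.0, 15.0]

simple = contrast_estimate(means, [1, 0, 0, -1])            # A versus D
compound = contrast_estimate(means, [1/3, 1/3, 1/3, -1])    # mean of A, B, C versus D
linear_trend = contrast_estimate(means, [-3, -1, 1, 3])     # equally spaced ordered levels

print(simple, round(compound, 6), linear_trend)
```

The compound comparison here is the example from the text: the average of groups A, B and C set against group D.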


Study designs and ANOVAs

There are several types of ANOVA. Many statisticians base ANOVA on the design of the experiment,[citation needed] especially on the protocol that specifies the random assignment of treatments to subjects; the protocol's description of the assignment mechanism should include a specification of the structure of the treatments and of any blocking. It is also common to apply ANOVA to observational data using an appropriate statistical model.[citation needed] Some popular designs use the following types of ANOVA:

One-way ANOVA is used to test for differences among two or more independent groups (means), e.g. different levels of urea application in a crop. Typically, however, the one-way ANOVA is used to test for differences among at least three groups, since the two-group case can be covered by a t-test.[14] When there are only two means to compare, the t-test and the ANOVA F-test are equivalent; the relation between ANOVA and t is given by $F = t^2$.

Factorial ANOVA is used when the experimenter wants to study the interaction effects among the treatments.

Repeated measures ANOVA is used when the same subjects are used for each treatment (e.g., in a longitudinal study).

Multivariate analysis of variance (MANOVA) is used when there is more than one response variable.
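The relation F = t² for two groups, stated in the one-way ANOVA item above, can be checked numerically. The sketch assumes the pooled-variance (equal-variance) two-sample t statistic, which is the version for which the identity holds; the data are hypothetical.

```python
import math
from statistics import fmean


def pooled_t(a, b):
    """Two-sample t statistic with a pooled variance estimate."""
    mean_a, mean_b = fmean(a), fmean(b)
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (len(a) + len(b) - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))


def one_way_f(groups):
    """One-way ANOVA F statistic."""
    values = [y for g in groups for y in g]
    grand_mean = fmean(values)
    ss_treat = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_error = sum((y - fmean(g)) ** 2 for g in groups for y in g)
    return (ss_treat / (len(groups) - 1)) / (ss_error / (len(values) - len(groups)))


# Hypothetical responses for two groups.
group_a = [5.1, 4.9, 6.0, 5.5]
group_b = [6.8, 7.2, 6.1, 7.5]

t_value = pooled_t(group_a, group_b)
f_value = one_way_f([group_a, group_b])
print(t_value ** 2, f_value)
```

The two printed numbers agree up to floating-point rounding, so with two groups the F-test carries no information beyond the t-test.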

Design of experiments
From Wikipedia, the free encyclopedia

Design of experiments with full factorial design (left), response surface with second-degree polynomial (right)

In general usage, design of experiments (DOE) or experimental design is the design of any information-gathering exercises where variation is present, whether under the full control of the experimenter or not. However, in statistics, these terms are usually used for controlled experiments. Other types of study, and their design, are discussed in the articles on opinion polls and statistical surveys (which are types of observational study), natural experiments and quasi-experiments (for example, quasi-experimental design). See Experiment for the distinction between these types of experiments or studies.

In the design of experiments, the experimenter is often interested in the effect of some process or intervention (the "treatment") on some objects (the "experimental units"), which may be people, parts of people, groups of people, plants, animals, materials, etc. Design of experiments is thus a discipline that has very broad application across all the natural and social sciences.


History of development

Controlled experimentation on scurvy


In 1747, while serving as surgeon on HMS Salisbury, James Lind carried out a controlled experiment to develop a cure for scurvy.[1] Lind selected 12 men from the ship, all suffering from scurvy. Lind limited his subjects to men who "were as similar as I could have them", that is, he provided strict entry requirements to reduce extraneous variation. He divided them into six pairs, giving each pair different supplements to their basic diet for two weeks. The treatments were all remedies that had been proposed:

A quart of cider every day

Twenty-five gutts (drops) of elixir vitriol (sulphuric acid) three times a day upon an empty stomach

One half-pint of seawater every day

A mixture of garlic, mustard, and horseradish in a lump the size of a nutmeg

Two spoonfuls of vinegar three times a day

Two oranges and one lemon every day

The men who had been given citrus fruits recovered dramatically within a week. One of them returned to duty after 6 days and the other cared for the rest. The others experienced some improvement, but nothing was comparable to the citrus fruits, which were proved to be substantially superior to the other treatments.


Statistical experiments, following Charles S. Peirce

Main article: Frequentist statistics

See also: Randomization

A theory of statistical inference was developed by Charles S. Peirce in "Illustrations of the Logic of Science" (1877-1878) and "A Theory of Probable Inference" (1883), two publications that emphasized the importance of randomization-based inference in statistics.



Randomized experiments

Main article: Random assignment

See also: Repeated measures design

Charles S. Peirce randomly assigned volunteers to a blinded, repeated-measures design to evaluate their ability to discriminate weights.[2][3][4][5] Peirce's experiment inspired other researchers in psychology and education, which developed a research tradition of randomized experiments in laboratories and specialized textbooks in the 1800s.[2][3][4][5]


Optimal designs for regression models

Main article: Response surface methodology See also: Optimal design

Charles S. Peirce also contributed the first English-language publication on an optimal design for regression models in 1876.[6] A pioneering optimal design for polynomial regression was suggested by Gergonne in 1815. In 1918 Kirstine Smith published optimal designs for polynomials of degree six (and less).


Sequences of experiments

Main article: Sequential analysis

See also: Multi-armed bandit problem, Gittins index, and Optimal design

The use of a sequence of experiments, where the design of each may depend on the results of previous experiments, including the possible decision to stop experimenting, is within the scope of sequential analysis, a field that was pioneered[7] by Abraham Wald in the context of sequential tests of statistical hypotheses.[8] Herman Chernoff wrote an overview of optimal sequential designs,[9] while adaptive designs have been surveyed by S. Zacks.[10] One specific type of sequential design is the "two-armed bandit", generalized to the multi-armed bandit, on which early work was done by Herbert Robbins in 1952.[11]


Principles of experimental design, following Ronald A. Fisher

A methodology for designing experiments was proposed by Ronald A. Fisher in his innovative book The Design of Experiments (1935). As an example, he described how to test the hypothesis that a certain lady could distinguish by flavour alone whether the milk or the tea was first placed in the cup. While this sounds like a frivolous application, it allowed him to illustrate the most important ideas of experimental design:

Comparison

In some fields of study it is not possible to have independent measurements to a traceable standard. Comparisons between treatments are much more valuable and are usually preferable. Often one compares against a scientific control or traditional treatment that acts as baseline.

Randomization

Random assignment is the process of assigning individuals at random to different groups in an experiment. The random assignment of individuals to groups (or conditions within a group) distinguishes a rigorous, "true" experiment from an adequate, but less-than-rigorous, "quasi-experiment".[12] There is an extensive body of mathematical theory that explores the consequences of making the allocation of units to treatments by means of some random mechanism such as tables of random numbers, or the use of randomization devices such as playing cards or dice. Provided the sample size is adequate, the risks associated with random allocation (such as failing to obtain a representative sample in a survey, or having a serious imbalance in a key characteristic between a treatment group and a control group) are calculable and hence can be managed down to an acceptable level. Random does not mean haphazard, and great care must be taken that appropriate random methods are used.

Replication

Measurements are usually subject to variation and uncertainty. Measurements are repeated and full experiments are replicated to help identify the sources of variation, to better estimate the true effects of treatments, to further strengthen the experiment's reliability and validity, and to add to the existing knowledge about the topic.[13] However, certain conditions must be met before the replication of the experiment is commenced: the original research question has been published in a peer-reviewed journal or widely cited, the researcher is independent of the original experiment, the researcher must first try to replicate the original findings using the original data, and the write-up should state that the study conducted is a replication study that tried to follow the original study as strictly as possible.[14]

Blocking

Blocking is the arrangement of experimental units into groups (blocks) consisting of units that are similar to one another. Blocking reduces known but irrelevant sources of variation between units and thus allows greater precision in the estimation of the source of variation under study.

Orthogonality

Example of orthogonal factorial design

Orthogonality concerns the forms of comparison (contrasts) that can be legitimately and efficiently carried out. Contrasts can be represented by vectors, and sets of orthogonal contrasts are uncorrelated and independently distributed if the data are normal. Because of this independence, each orthogonal treatment provides different information to the others. If there are $T$ treatments and $T - 1$ orthogonal contrasts, all the information that can be captured from the experiment is obtainable from the set of contrasts.

Factorial experiments

Use of factorial experiments instead of the one-factor-at-a-time method. These are efficient at evaluating the effects and possible interactions of several factors (independent variables).
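Several of the principles above (factorial structure, orthogonality and randomization) can be seen together in a small two-level design. The sketch below is illustrative only: the factor names, the coding of levels as -1 and +1, and the seed are all assumptions made here.

```python
import random
from itertools import product

# Hypothetical factors, each coded at a low (-1) and a high (+1) level.
factors = {"temperature": (-1, 1), "pressure": (-1, 1), "catalyst": (-1, 1)}


def full_factorial(factors):
    """Every combination of factor levels, one dict per run."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def randomized_run_order(runs, seed=0):
    """Shuffle the run order with a seeded generator so it can be reproduced."""
    rng = random.Random(seed)
    order = list(runs)
    rng.shuffle(order)
    return order


def column(runs, name):
    return [run[name] for run in runs]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


design = full_factorial(factors)        # 2 * 2 * 2 = 8 runs
run_order = randomized_run_order(design)

# Main-effect columns are mutually orthogonal, and so is the interaction column.
temperature = column(design, "temperature")
pressure = column(design, "pressure")
interaction = [t * p for t, p in zip(temperature, pressure)]
print(dot(temperature, pressure), dot(temperature, interaction))
```

A zero dot product between two coded columns means the corresponding effects are estimated from contrasts that do not overlap, which is why the full factorial can separate main effects from interactions.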

Analysis of the design of experiments was built on the foundation of the analysis of variance, a collection of models in which the observed variance is partitioned into components due to different factors which are estimated and/or tested.


Example

This example is attributed to Harold Hotelling.[9] It conveys some of the flavor of those aspects of the subject that involve combinatorial designs.

The weights of eight objects are to be measured using a pan balance and set of standard weights. Each weighing measures the weight difference between objects placed in the left pan vs. any objects placed in the right pan by adding calibrated weights to the lighter pan until the balance is in equilibrium. Each measurement has a random error. The average error is zero; the standard deviation of the probability distribution of the errors is the same number $\sigma$ on different weighings; and errors on different weighings are independent. Denote the true weights by $\theta_1, \ldots, \theta_8$.

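We consider two different experiments:

1. Weigh each object in one pan, with the other pan empty. Let $X_i$ be the measured weight of the $i$th object, for $i = 1, \ldots, 8$.
2. Do the eight weighings according to the following schedule and let $Y_i$ be the measured difference for $i = 1, \ldots, 8$:

Weighing    Left pan           Right pan
1st         1 2 3 4 5 6 7 8    (empty)
2nd         1 2 3 8            4 5 6 7
3rd         1 4 5 8            2 3 6 7
4th         1 6 7 8            2 3 4 5
5th         2 4 6 8            1 3 5 7
6th         2 5 7 8            1 3 4 6
7th         3 4 7 8            1 2 5 6
8th         3 5 6 8            1 2 4 7

Then the estimated value of the weight $\theta_1$ is

$$\hat{\theta}_1 = \frac{Y_1 + Y_2 + Y_3 + Y_4 - Y_5 - Y_6 - Y_7 - Y_8}{8}.$$

Similar estimates can be found for the weights of the other items. For example,

$$\hat{\theta}_2 = \frac{Y_1 + Y_2 - Y_3 - Y_4 + Y_5 + Y_6 - Y_7 - Y_8}{8}.$$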

The question of design of experiments is: which experiment is better? The variance of the estimate $X_1$ of $\theta_1$ is $\sigma^2$ if we use the first experiment. But if we use the second experiment, the variance of the estimate given above is $\sigma^2/8$. Thus the second experiment gives us 8 times as much precision for the estimate of a single item, and estimates all items simultaneously, with the same precision. What is achieved with 8 weighings in the second experiment would require 64 weighings if items are weighed separately. However, note that the estimates for the items obtained in the second experiment have errors which are correlated with each other.

Many problems of the design of experiments involve combinatorial designs, as in this example.
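The factor-of-eight gain in precision can be reproduced with a short calculation. The sketch below builds a weighing schedule from an 8 × 8 matrix of +1 (left pan) and -1 (right pan) entries using Sylvester's doubling construction. That schedule is an assumption of this sketch and need not be the one used in Hotelling's example; any schedule whose columns are mutually orthogonal gives the same variance. The true weights are hypothetical.

```python
def sylvester_schedule(order):
    """Matrix of +1/-1 entries with mutually orthogonal columns (order a power of 2).

    Row j is one weighing; entry i is +1 if object i sits in the left pan
    and -1 if it sits in the right pan.
    """
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def estimate_weights(schedule, measurements):
    """Signed average of the measurements for each object."""
    n = len(schedule)
    return [
        sum(schedule[j][i] * measurements[j] for j in range(n)) / n
        for i in range(n)
    ]


def estimator_variance(schedule, sigma_squared, item=0):
    """Variance of one estimate when weighing errors are independent."""
    n = len(schedule)
    return sigma_squared * sum((schedule[j][item] / n) ** 2 for j in range(n))


schedule = sylvester_schedule(8)
true_weights = [3.0, 1.0, 4.0, 1.5, 9.0, 2.6, 5.0, 3.5]  # hypothetical

# Error-free measured differences: left-pan total minus right-pan total.
noise_free = [sum(s * w for s, w in zip(row, true_weights)) for row in schedule]
recovered = estimate_weights(schedule, noise_free)

print(recovered)
print(estimator_variance(schedule, sigma_squared=1.0))  # 0.125, i.e. sigma^2 / 8
```

Each estimate averages eight measurements with coefficients of ±1/8, so its variance is 8 × σ²/64 = σ²/8, compared with σ² when the object is weighed once on its own.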



Statistical control

It is best for a process to be in reasonable statistical control prior to conducting designed experiments. When this is not possible, proper blocking, replication, and randomization allow for the careful conduct of designed experiments.[15] To control for nuisance variables, researchers institute control checks as additional measures. Investigators should ensure that uncontrolled influences (e.g., source credibility perception) do not skew the findings of the study. A manipulation check is one example of a control check. Manipulation checks allow investigators to isolate the chief variables to strengthen support that these variables are operating as planned.


Experimental designs after Fisher


Some efficient designs for estimating several main effects simultaneously were found by Raj Chandra Bose and K. Kishen in 1940 at the Indian Statistical Institute, but remained little known until the Plackett-Burman designs were published in Biometrika in 1946. About the same time, C. R. Rao introduced the concepts of orthogonal arrays as experimental designs. This concept played a central role in the development of Taguchi methods by Genichi Taguchi, which took place during his visit to the Indian Statistical Institute in the early 1950s. His methods were successfully applied and adopted by Japanese and Indian industries and subsequently were also embraced by US industry, albeit with some reservations.

In 1950, Gertrude Mary Cox and William Gemmell Cochran published the book Experimental Designs, which became the major reference work on the design of experiments for statisticians for years afterwards.

Developments of the theory of linear models have encompassed and surpassed the cases that concerned early writers. Today, the theory rests on advanced topics in linear algebra, algebra and combinatorics.

As with other branches of statistics, experimental design is pursued using both frequentist and Bayesian approaches: in evaluating statistical procedures like experimental designs, frequentist statistics studies the sampling distribution while Bayesian statistics updates a probability distribution on the parameter space.

Some important contributors to the field of experimental designs are C. S. Peirce, R. A. Fisher, F. Yates, C. R. Rao, R. C. Bose, J. N. Srivastava, S. S. Shrikhande, D. Raghavarao, W. G. Cochran, O. Kempthorne, W. T. Federer, V. V. Fedorov, A. S. Hedayat, J. A. Nelder, R. A. Bailey, J. Kiefer, W. J. Studden, A. Pázman, F. Pukelsheim, D. R. Cox, H. P. Wynn, A. C. Atkinson, G. E. P. Box and G. Taguchi.[citation needed] The textbooks of D. Montgomery and R. Myers have reached generations of students and practitioners.