Statistics for Marketing & Consumer Research

Copyright 2008 - Mario Mazzocchi


Analysis of variance
Chapter 7
Tests on multiple hypotheses
Consider the situation where the means for more than two
groups are compared, e.g. mean alcohol expenditure for:
(a) students; (b) unemployed; (c) employees
One could run a set of two mean comparison tests (students
vs. unemployed, students vs. employed, employed vs.
unemployed)
However, as seen in lecture 6, each of these tests is
subject to Type one error (the level of significance α), i.e.
the probability of rejecting the null hypothesis when it is
actually true
Thus, the overall Type one error for the joint three tests is
larger than α because the probability of Type one error
increases with the number of tests
This is the so-called problem of inflated family-wise (or
experiment-wise) Type one error
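A rough illustration of how quickly the family-wise error grows. This sketch assumes the tests are independent (an assumption not made in the slides), in which case the probability of at least one Type one error across k tests run at level α is 1 - (1 - α)^k. The function name is illustrative only.

```python
def familywise_error(alpha: float, k: int) -> float:
    """Probability of at least one Type I error across k independent tests."""
    return 1.0 - (1.0 - alpha) ** k


# Three pairwise comparisons (students, unemployed, employees), each at alpha = 0.05
fwer_three = familywise_error(0.05, 3)
print(f"Family-wise error for 3 tests at alpha = 0.05: {fwer_three:.4f}")
```

Under this assumption the joint error for three tests is roughly 0.14, almost three times the nominal 0.05.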
Analysis of Variance
It is an alternative approach to mean comparison
for multiple groups
It is a set of techniques for a variety of situations
It is applicable to a sample of individuals that
differ for one or more given factors
It allows tests where variability in a variable is
attributable to one (or more) factors
Example
Economic position of Household Reference Person
Variable: EFS: Total Alcoholic Beverages, Tobacco

Group                             Mean (μ)   St. Dev. (σ)
Self-employed                     18.56      19.0
Full-time employee                14.64      18.5
Part-time employee                12.39      15.0
Unemployed                        19.48      19.7
Ret. unoc., over min. NI age       7.34      14.6
Unoc., under min. NI age          11.99      19.1
TOTAL                             12.67      17.8

Are there significant differences across the means of
these groups?
Or do the differences depend on the different levels of
variability across the groups?
Analysis of Variance
Here the target variable is alcohol
expenditure, the factor is the economic
position of the HRP
Basically we investigate the attribution of the
variation in the metric target variable to
variations in one or more categorical
explanatory variables (the factors)
A treatment is a combination of different
factors in n-way analysis of variance
One-way ANOVA
Only one categorical variable (a single factor)
Several levels (categories) for that factor
The typical hypothesis tested through ANOVA is
that the factor is irrelevant to explain differences
in the target variable (i.e. the means are equal, as
in bivariate mean comparisons/t-tests)
Apart from the tested factor(s), the groups should
be safely considered homogeneous between each
other
Null and alternative hypothesis for ANOVA
Null hypothesis (H_0): all the means are equal
Alternative hypothesis (H_1): at least two means are different
Arranging data for ANOVA
Economic position of Household Reference Person
Groups: 1 = Self-employed; 2 = Full-time employee; 3 = Part-time employee; 4 = Unemployed; 5 = Ret. unoc., over min. NI age; 6 = Unoc., under min. NI age

Group (g)                     1      2      3      4      5      6
Observations                  x_11   x_12   x_13   x_14   x_15   x_16
                              x_21   x_22   x_23   x_24   x_25   x_26
                              x_31   x_32   x_33   x_34   x_35   x_36
                              ...    ...    ...    ...    ...    ...
Number of observations (n)    n_1    n_2    n_3    n_4    n_5    n_6
Means                         x̄_1    x̄_2    x̄_3    x̄_4    x̄_5    x̄_6
Overall mean                  x̄_t
The statistical distribution to carry out
ANOVA
1. Decompose the total variation (sum of
squares corrected for the mean)
2. Compute the F-test statistic
3. Choose the critical value
4. Interpret the result
One-way ANOVA: data
Suppose that we have n observations within each group and g groups

             Group (factor level)
Obs.         1       2       ...   j       ...   g
1            x_11    x_12    ...   x_1j    ...   x_1g
2            x_21    x_22    ...   x_2j    ...   x_2g
...
i            x_i1    x_i2    ...   x_ij    ...   x_ig
...
n            x_n1    x_n2    ...   x_nj    ...   x_ng
Group mean   x̄_1     x̄_2     ...   x̄_j     ...   x̄_g

TOTAL MEAN
$$\bar{x} = \frac{1}{g}\sum_{j=1}^{g}\bar{x}_j$$
Measuring and decomposing the total
variation
SUM OF SQUARES (corrected for the mean)
VARIATION BETWEEN THE GROUPS +
VARIATION WITHIN EACH GROUP =
________________________________

TOTAL VARIATION
Variance decomposition
$$s_T^2 = \frac{\sum_{c=1}^{g}\sum_{r=1}^{n_c}\left(x_{rc}-\bar{x}\right)^2}{n-1}$$
(TOTAL VARIANCE)

$$s_{BW}^2 = \frac{\sum_{c=1}^{g}\left(\bar{x}_c-\bar{x}\right)^2 n_c}{g-1}$$
(VARIANCE BETWEEN GROUPS)

$$s_W^2 = \frac{\sum_{c=1}^{g}\sum_{r=1}^{n_c}\left(x_{rc}-\bar{x}_c\right)^2}{n-g}$$
(VARIANCE WITHIN GROUPS)

$$\sum_{c=1}^{g}\sum_{r=1}^{n_c}\left(x_{rc}-\bar{x}\right)^2 = \sum_{c=1}^{g}\sum_{r=1}^{n_c}\left(x_{rc}-\bar{x}_c\right)^2 + \sum_{c=1}^{g}\left(\bar{x}_c-\bar{x}\right)^2 n_c$$

DEGREES OF FREEDOM
$$n-1 = (g-1) + \sum_{c=1}^{g}(n_c-1) = (g-1) + (n-g)$$
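The decomposition on this slide can be checked numerically. The sketch below uses a small made-up data set (the group labels and values are hypothetical, not taken from the survey) and verifies that the total sum of squares equals the within-group plus the between-group sum of squares, and that the degrees of freedom add up in the same way.

```python
from statistics import mean

# Hypothetical toy data: three groups of unequal size (not from the survey)
groups = {
    "A": [4.0, 6.0, 8.0],
    "B": [10.0, 12.0, 14.0, 16.0],
    "C": [1.0, 3.0],
}

all_obs = [x for obs in groups.values() for x in obs]
n = len(all_obs)            # total number of observations
g = len(groups)             # number of groups
grand_mean = mean(all_obs)  # overall mean

# Sums of squares: total, between groups (weighted by group size), within groups
ss_total = sum((x - grand_mean) ** 2 for x in all_obs)
ss_between = sum(len(obs) * (mean(obs) - grand_mean) ** 2 for obs in groups.values())
ss_within = sum((x - mean(obs)) ** 2 for obs in groups.values() for x in obs)

# Variances: each sum of squares divided by its degrees of freedom
var_total = ss_total / (n - 1)
var_between = ss_between / (g - 1)
var_within = ss_within / (n - g)

print(f"SS total = {ss_total:.3f}, SS between + SS within = {ss_between + ss_within:.3f}")
print(f"df total = {n - 1}, df between + df within = {(g - 1) + (n - g)}")
```

Note that the sums of squares add up, while the variances themselves do not, because each is divided by a different number of degrees of freedom.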

The basic principle of the ANOVA
If the variation between the groups (the variation
explained by the factor) is significantly
larger than the variation within the
groups, then the factor is assumed to be
statistically relevant in explaining the
differences
The test statistic
The test statistic is computed as:

$$F = \frac{s_{BW}^2}{s_W^2} = \frac{\text{Variance between groups}}{\text{Variance within groups}}$$

This test statistic compares the weight of
the variance explained by the factors to the
weight of the variance not explained by the
factors
Distribution of the F-statistic (one-tailed test)
[Figure: density of the F distribution with the rejection area marked in the right tail]
Characteristics of the
F-distribution
Its shape (and hence the critical value) changes according to the
degrees of freedom (which depend on the number of observations and
groups)
It is a non-negative statistic (one-tailed test)
For ANOVA testing: df_1 = g - 1 and df_2 = n - g
Critical value: $F_{\alpha}(df_1, df_2)$
ANOVA in SPSS
[Screenshot: SPSS ANOVA dialog; callouts mark the target variable and the factor]
SPSS output
ANOVA
EFS: Total Alcoholic Beverages, Tobacco

                 Sum of Squares   df    Mean Square   F       Sig.
Between Groups   6171.784         5     1234.357      4.024   .001
Within Groups    151535.3         494   306.752
Total            157707.1         499

Sum of Squares: variation decomposition
df: degrees of freedom
Mean Square: variance between (first row) and variance within (second row)
Sig.: p-value < 0.05, the null is rejected
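As a quick check on how the columns of this output relate to each other, the sketch below recomputes the mean squares and the F statistic from the sums of squares and degrees of freedom reported in the SPSS table. The p-value is not recomputed, since that requires the F distribution, which is not part of the Python standard library.

```python
# Values copied from the SPSS ANOVA table on this slide
ss_between, df_between = 6171.784, 5
ss_within, df_within = 151535.3, 494

ms_between = ss_between / df_between  # variance between groups
ms_within = ss_within / df_within     # variance within groups
f_stat = ms_between / ms_within       # F test statistic

print(f"Mean square between = {ms_between:.3f}")
print(f"Mean square within  = {ms_within:.3f}")
print(f"F = {f_stat:.3f}")
```

The results agree with the Mean Square and F columns of the output to three decimals.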
Contrasts
Contrasts allow one to test hypotheses on specific sub-sets of the
treatments (factor combinations).
They open the way to further explore the sources of
variability when the null hypothesis of mean equality is
rejected.
Comparisons are usually based on a theory and planned
before the analysis, thus they are also called planned
comparisons.
Linear contrasts
Linear contrasts are linear combinations of
the means, allowing one to test other
hypotheses than mean equality
For example, one may want to test whether
the mean for group one is double the mean
for groups two and three, while the means
of groups two and three are equal
Linear contrasts
Contrasts are also useful after rejection of
the null hypothesis of mean equality
Rejection of the null hypothesis means that
at least two means differ, but it does not
say which ones actually differ
Planned comparisons through linear
contrasts can help
Example
Test whether chicken expenditure increases
linearly with household size
Check whether there are significant differences:
Between households with one or two components and
households with more components
Considering the following groups
Households with one component
Households with two components
Households with more than two components
Considering the following comparison
Households with four, five, six and seven components have
equal means
Household sizes and means
Descriptives
In a typical week how much do you spend on fresh or frozen chicken (Euro)?

Household size   N     Mean     Std. Deviation   Std. Error   95% CI Lower   95% CI Upper   Minimum   Maximum
0                1     4.8000   .                .            .              .              4.80      4.80
1                82    4.2470   2.82338          .31179       3.6266         4.8673         .37       15.00
2                145   5.0548   3.41626          .28371       4.4941         5.6156         .00       20.00
3                93    6.3231   4.71695          .48912       5.3517         7.2946         .00       30.00
4                87    6.7334   3.87396          .41533       5.9078         7.5591         .00       18.00
5                24    7.5613   7.64258          1.56003      4.3341         10.7884        .00       30.00
6                10    6.2730   3.25606          1.02966      3.9438         8.6022         .00       10.49
7                1     6.7500   .                .            .              .              6.75      6.75
Total            443   5.6677   4.13383          .19640       5.2817         6.0537         .00       30.00

(95% CI: 95% Confidence Interval for Mean)
7 GROUPS
Example
1 and 2 components versus 3, 4, 5, 6 and 7
Weights (they need to sum to 0)
1 = -2.5
2 = -2.5
3 = 1
4 = 1
5 = 1
6 = 1
7 = 1
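A small sketch of what these weights do. It checks that the weights sum to zero and then applies them to the group means from the Descriptives table to obtain the estimated value of the contrast. It assumes that the weights apply to the rows labelled 1 to 7 in that table; the variable names are illustrative.

```python
# Contrast weights from the slide, keyed by household size (1 to 7 components)
weights = {1: -2.5, 2: -2.5, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0}

# Group means copied from the Descriptives table (rows labelled 1 to 7, an assumption)
means = {1: 4.2470, 2: 5.0548, 3: 6.3231, 4: 6.7334, 5: 7.5613, 6: 6.2730, 7: 6.7500}

# A valid contrast needs weights that sum to zero
weight_sum = sum(weights.values())
assert abs(weight_sum) < 1e-12, "contrast weights must sum to zero"

# Estimated contrast: weighted sum of the group means
contrast_value = sum(weights[size] * means[size] for size in weights)

print(f"Sum of weights = {weight_sum}")
print(f"Estimated contrast value = {contrast_value:.4f}")
```

A positive value points to higher mean expenditure in the larger households; whether it differs significantly from zero is what the contrast test in SPSS evaluates.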
Planned comparisons
Helmert contrasts: the first treatment is compared with all of the
remaining treatments, the second treatment with all the remaining
treatments but the first, the third treatment with all of the remaining
ones but the first two, and so on.
By looking at the results of this battery of tests, it becomes possible to
identify those groups whose difference from the others is most
relevant.
Polynomial contrasts: it is possible to test whether the trend in
means follows a linear, quadratic or cubic sequence, or any polynomial
relationship between the treatments.
Repeated contrasts: each treatment is compared with the one which
follows.
Reverse Helmert contrasts (or difference contrasts): Helmert contrasts
going backwards.
Simple contrasts: the user can choose the benchmark treatment
between the first and the last category.
Post-hoc comparisons
Linear contrasts are carried out independently from each
other
Post-hoc tests consist of a set of paired comparisons,
where the critical values are corrected to account for the
inflated risk of Type I error (rejecting the null
hypothesis when it is true), measured by the cumulative
Type I error or family-wise error.
The approach to correcting the critical values determines
the type of test being used. In SPSS:
Scheffé's test
Bonferroni's test
Tukey's test
Some post-hoc tests
Scheffé test: simultaneous comparisons for all potential
pair-wise and linear combinations of treatments using F
critical values corrected to account for the number of
treatments being compared
Bonferroni post-hoc method: (1) run the usual pair-wise t-tests;
(2) to account for the inflated Type one error rate, an
adjustment is provided by dividing the family-wise error by
the number of tests.
Tukey's test: also known as the Honestly Significant
Difference or HSD test, it can be used when samples are of
equal size, but statistical packages usually provide variants
for unequal sizes. With this test, significant differences are
identified through an adjusted Studentized range
distribution (an extension of the Student t statistic which
uses a pooled estimate of the standard errors)
More tests in the textbook
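The Bonferroni adjustment is simple enough to sketch directly. The example below assumes four groups (as with four income bands) and a family-wise level of 0.05; the function name is illustrative and not part of any package.

```python
from math import comb


def bonferroni_alpha(familywise_alpha: float, n_tests: int) -> float:
    """Per-test significance level: family-wise level divided by the number of tests."""
    return familywise_alpha / n_tests


n_groups = 4                  # assumed number of treatments
n_pairs = comb(n_groups, 2)   # number of distinct pair-wise comparisons
per_test_alpha = bonferroni_alpha(0.05, n_pairs)

print(f"{n_pairs} pair-wise tests, each run at alpha = {per_test_alpha:.4f}")
```

Equivalently, each unadjusted p-value can be multiplied by the number of tests (capped at 1) and compared with the family-wise level.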
Effect size and power
The experimental factor matters, but how much?
(effect size)
Larger F statistics do not necessarily imply larger
effect sizes because they also depend on sample
sizes
A typical measure of effect size is η² (the ratio
of the variation between groups to the total variation)

The power of a test is 1-β, where β is the
probability of not rejecting the null hypothesis
when the alternative is true (Type II error)
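As an illustration, η² can be computed from the sums of squares in the one-way ANOVA output for economic position shown earlier in these slides (between groups 6171.784, total 157707.1). This is only a sketch of the ratio described above.

```python
# Sums of squares from the earlier SPSS ANOVA table (economic position of the HRP)
ss_between = 6171.784
ss_total = 157707.1

# Eta squared: share of the total variation attributable to the factor
eta_squared = ss_between / ss_total

print(f"eta squared = {eta_squared:.4f}")
```

So although the F test rejects mean equality for that example, the factor accounts for only about 4% of the total variation.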
Which post-hoc test?
One should check the probabilities of Type I
error and the power (related to Type II error)
There is a trade-off between power and
Type I error
Conservative tests: low Type I error, low power
(Scheffé, Bonferroni)
Tukey's test is more appropriate for a large number
of means
ANOVA in SPSS
[Screenshot: SPSS ANOVA dialog; callouts mark the target variable (scale), the factor (categorical), the planned comparisons and the post-hoc tests]
Planned comparisons (contrasts)
$$w_1\mu_L + w_2\mu_{ML} + w_3\mu_{MH} + w_4\mu_H = 0$$
[Screenshot of the SPSS contrasts settings; the callouts read:]
Other contrasts: insert a weight for each subgroup
Note: the null hypothesis is that the contrast holds
Click here to insert other sets of weights (one set of weights per
comparison)
The polynomial contrast assumes that the mean follows a given
polynomial (linear, quadratic, etc.)
Note: the null hypothesis is that the polynomial contrast does not
hold
ANOVA output

Descriptives
EFS: Total Alcoholic Beverages, Tobacco

                     N     Mean     Std. Deviation   Std. Error   95% CI Lower   95% CI Upper   Minimum   Maximum
Low income           125   7.467    12.8693          1.1511       5.188          9.745          .0        70.0
Medium-low income    125   11.381   17.9038          1.6014       8.212          14.551         .0        93.9
Medium-high income   125   13.040   16.9137          1.5128       10.046         16.035         .0        79.4
High income          125   18.789   20.8025          1.8606       15.106         22.472         .0        92.5
Total                500   12.669   17.7777          .7950        11.107         14.231         .0        93.9

(95% CI: 95% Confidence Interval for Mean)

ANOVA
EFS: Total Alcoholic Beverages, Tobacco

                                          Sum of Squares   df    Mean Square   F        Sig.
Between Groups   (Combined)               8289.482         3     2763.161      9.172    .000
                 Linear Term: Contrast    7932.717         1     7932.717      26.333   .000
                 Linear Term: Deviation   356.765          2     178.382       .592     .554
Within Groups                             149417.6         496   301.245
Total                                     157707.1         499

(Combined): mean equality is rejected
Linear Term, Contrast: the means are compatible with a linear polynomial
Linear Term, Deviation: and not compatible with a non-linear one
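The linear term in this output can be approximately reproduced by hand. The sketch below assumes that the linear term is the standard orthogonal linear trend for four equally spaced levels, with coefficients (-3, -1, 1, 3), and uses the rounded group means from the Descriptives table, so small discrepancies from the SPSS figures are expected.

```python
# Group means and common group size from the Descriptives table
means = [7.467, 11.381, 13.040, 18.789]  # low, medium-low, medium-high, high income
n_per_group = 125

# Orthogonal linear-trend coefficients for four equally spaced levels (assumed)
coeffs = [-3, -1, 1, 3]

# Value of the linear contrast and its sum of squares (1 degree of freedom)
contrast = sum(c * m for c, m in zip(coeffs, means))
ss_linear = n_per_group * contrast ** 2 / sum(c ** 2 for c in coeffs)

# F statistic for the linear term, using the within-groups mean square from the table
ms_within = 301.245
f_linear = ss_linear / ms_within

print(f"Linear contrast value = {contrast:.3f}")
print(f"SS linear = {ss_linear:.2f} (SPSS reports 7932.717)")
print(f"F linear  = {f_linear:.2f} (SPSS reports 26.333)")
```

The recomputed values are within rounding error of the reported ones, which shows that almost all of the between-groups sum of squares (8289.482) is captured by the linear trend.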
Output
Contrast Coefficients
Anonymised hhold inc + allowances (Banded)

Contrast   Low income   Medium-low income   Medium-high income   High income
1          0            1                   -1                   0
2          1            0                   0                    -1

Contrast Tests
EFS: Total Alcoholic Beverages, Tobacco

                                  Contrast   Value of Contrast   Std. Error   t        df        Sig. (2-tailed)
Assume equal variances            1          -1.659              2.1954       -.756    496       .450
                                  2          -11.322             2.1954       -5.157   496       .000
Does not assume equal variances   1          -1.659              2.2029       -.753    247.202   .452
                                  2          -11.322             2.1879       -5.175   206.788   .000

The first contrast (0, 1, -1, 0) holds (not rejected)
The second contrast (1, 0, 0, -1) does not hold (rejected)
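The "assume equal variances" rows of the contrast tests can be reproduced from figures reported elsewhere in the output. The sketch below takes the group means and group size (125) from the Descriptives table and the within-groups mean square (301.245) from the ANOVA table; the helper function and dictionary keys are illustrative names.

```python
from math import sqrt

# Group means, common group size and pooled within-groups variance from the SPSS output
means = {"low": 7.467, "medium_low": 11.381, "medium_high": 13.040, "high": 18.789}
n_per_group = 125
ms_within = 301.245


def contrast_test(weights):
    """Return (value, standard error, t) for a linear contrast with equal group sizes."""
    value = sum(w * means[group] for group, w in weights.items())
    std_error = sqrt(ms_within * sum(w ** 2 for w in weights.values()) / n_per_group)
    return value, std_error, value / std_error


# Contrast 1: medium-low versus medium-high; contrast 2: low versus high
value1, se1, t1 = contrast_test({"low": 0, "medium_low": 1, "medium_high": -1, "high": 0})
value2, se2, t2 = contrast_test({"low": 1, "medium_low": 0, "medium_high": 0, "high": -1})

print(f"Contrast 1: value = {value1:.3f}, SE = {se1:.4f}, t = {t1:.3f}")
print(f"Contrast 2: value = {value2:.3f}, SE = {se2:.4f}, t = {t2:.3f}")
```

Each t statistic is then compared with a Student t distribution with n - g = 496 degrees of freedom, as in the df column.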
SPSS Post-hoc tests
SPSS output (post-hoc tests)
Multiple Comparisons
Dependent Variable: EFS: Total Alcoholic Beverages, Tobacco
(I) and (J): Anonymised hhold inc + allowances (Banded)

Test         (I)                  (J)                  Mean Difference (I-J)   Std. Error   Sig.    95% CI Lower   95% CI Upper
Tukey HSD    Low income           Medium-low income    -3.9147                 2.1954       .283    -9.574         1.745
                                  Medium-high income   -5.5737                 2.1954       .055    -11.233        .086
                                  High income          -11.3224*               2.1954       .000    -16.982        -5.663
             Medium-low income    Low income           3.9147                  2.1954       .283    -1.745         9.574
                                  Medium-high income   -1.6590                 2.1954       .874    -7.318         4.000
                                  High income          -7.4077*                2.1954       .004    -13.067        -1.748
             Medium-high income   Low income           5.5737                  2.1954       .055    -.086          11.233
                                  Medium-low income    1.6590                  2.1954       .874    -4.000         7.318
                                  High income          -5.7487*                2.1954       .045    -11.408        -.089
             High income          Low income           11.3224*                2.1954       .000    5.663          16.982
                                  Medium-low income    7.4077*                 2.1954       .004    1.748          13.067
                                  Medium-high income   5.7487*                 2.1954       .045    .089           11.408
Bonferroni   Low income           Medium-low income    -3.9147                 2.1954       .451    -9.730         1.901
                                  Medium-high income   -5.5737                 2.1954       .069    -11.389        .242
                                  High income          -11.3224*               2.1954       .000    -17.138        -5.507
             Medium-low income    Low income           3.9147                  2.1954       .451    -1.901         9.730
                                  Medium-high income   -1.6590                 2.1954       1.000   -7.474         4.156
                                  High income          -7.4077*                2.1954       .005    -13.223        -1.592
             Medium-high income   Low income           5.5737                  2.1954       .069    -.242          11.389
                                  Medium-low income    1.6590                  2.1954       1.000   -4.156         7.474
                                  High income          -5.7487                 2.1954       .055    -11.564        .067
             High income          Low income           11.3224*                2.1954       .000    5.507          17.138
                                  Medium-low income    7.4077*                 2.1954       .005    1.592          13.223
                                  Medium-high income   5.7487                  2.1954       .055    -.067          11.564

*. The mean difference is significant at the .05 level.
(95% CI: 95% Confidence Interval)

Results for each paired comparison are reported and the significance level is adjusted
ANOVA: Fixed versus random effects
1. Explore differences in monthly food expenditure
for different geographical regions
2. Explore differences in monthly food expenditure
according to the point of purchase for the last
food shopping
Case 1 is a fixed effect, which implies that the
researcher can fully control the factor (treatment)
Case 2 is a random effect, where the factor (treatment)
cannot be fully controlled and is subject to a
(random) measurement error
ANOVA assumptions
Two key assumptions are needed for running analysis
of variance without risks
1) that the sub-samples defined by the treatment are
independent
2) that no big discrepancies exist in the variances of the
different sub-samples
Normality within the sub-sample: within limits, departure
from normality is not a serious issue
Different variances: results are still reliable if the sizes of
sub-samples are equal
Both variances and sample sizes differ: high risk of biased
results
Adjustments: Brown-Forsythe test and/or the Welch test
instead of the usual F test
Adjustments for violated assumptions in
SPSS
Click on OPTIONS to request descriptive statistics for a
random effect model, the Brown-Forsythe and Welch
tests (plus more plots and descriptive statistics)
Non-parametric ANOVA tests
Exclusive samples extracted from the same
population
Kruskal-Wallis test: extends the Mann-Whitney test to the
case of a higher number of sub-samples. It tests the null
hypothesis that all the sub-populations have the same
distribution function.
Jonckheere-Terpstra test: the same null hypothesis, but
against the alternative that an increase in treatment leads
to an increase in the (median of the) dependent variable.
Related samples (the same respondent may appear in
several treatment sub-samples)
Friedman test, Kendall test or Cochran Q test, extend to
the multiple sample case some of the non-parametric tests
for mean comparisons
Non-parametric ANOVA in SPSS
[Screenshots: SPSS menus for exhaustive sub-samples and for related samples]
The class of ANOVA techniques
Number of target variables | Number of factors | Measurement of factors | Technique
1 | 1 | nominal / ordinal, independent samples | One-way ANOVA
1 | 2 or more | nominal / ordinal, independent samples | Factorial ANOVA
2 or more | 1 or more | nominal / ordinal, independent samples | MANOVA
1 | 2 or more | nominal / ordinal and continuous, independent samples | ANCOVA
2 or more | 2 or more | nominal / ordinal and continuous, independent samples | MANCOVA
1 | 1 or more | nominal / ordinal, repeated samples | Repeated ANOVA
1 | 1 or more | nominal / ordinal, mixed samples | Mixed ANOVA
1 | 1 or more | nominal / ordinal, random effects | Variance Component Model
1 | 1 | nominal / ordinal, independent samples, non-normal data and/or non-homogeneous independent samples | Non-parametric tests: Kruskal-Wallis test or Jonckheere-Terpstra test
1 | 1 | nominal / ordinal, independent samples, non-normal data and/or non-homogeneous related samples | Non-parametric tests: Friedman, Cochran Q or Kendall's test
Other ANOVA designs
Multi-way (factorial) ANOVA
Multivariate ANOVA (MANOVA)
(Multivariate) Analysis of Covariance
(MANCOVA)
General linear model
One-way ANOVA is equivalent to a linear
model, where the target variable is the
dependent variable and each of the
treatments is transformed into a dummy
variable, which takes a value of one if the
respondent is subject to that treatment
(here, if the respondent belongs to that
economic condition) and zero otherwise.
GLM example
y_i is the amount spent on alcohol and tobacco by the i-th
respondent
SE_i = 1 if the respondent is self-employed
FT_i = 1 for full-time employees
PT_i = 1 for part-time employees
UN_i = 1 for unemployed respondents
RE_i = 1 for retired or inactive respondents and
UA_i = 1 for those under working age

$$y_i = \beta_0 + \beta_1 SE_i + \beta_2 FT_i + \beta_3 PT_i + \beta_4 UN_i + \beta_5 RE_i + \beta_6 UA_i + \varepsilon_i$$
Target variable: Alcohol and tobacco expenditure
Factor: employment status
Tests on the GLM coefficients
T-test on each coefficient: bivariate mean
comparison
F-test: one-way ANOVA
Other analyses of variance
Multi-way (Factorial) ANOVA: More than one factor
(interactions)
MANOVA: More than one target variable: allows one
to test whether the factors lead to significant
differences in a set of variables.
ANCOVA
A final generalization which is quite
interesting for consumer research is the
Analysis of Covariance (ANCOVA), which is
the appropriate technique when some of the
factors are continuous quantitative variables
instead of being measured on a nominal or
ordinal scale
GLM and ANOVA techniques in SPSS
Univariate GLM: ANOVA, n-way
ANOVA, ANCOVA
Multivariate GLM: MANOVA,
MANCOVA
Univariate GLM

Target variable
Factors (more than one for
n-way ANOVA, random
factors are allowed)
Scale variables for
ANCOVA
Multivariate GLM

More than one target
variable for MANOVA
or MANCOVA
