Analysis of Variance
15.1 Introduction
Analysis of variance compares two or more populations of interval data. Specifically, we are interested in determining whether differences exist between the population means. The procedure works by analyzing the sample variance.
[Figure: weekly sales plotted by marketing strategy; surviving panel labels: Quality, Price.]
Notation
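Independent samples are drawn from k populations (treatments); $x_{ij}$ denotes the i-th observation in sample j. For example, $x_{11}$ is the first observation of the first sample and $x_{22}$ is the second observation of the second sample.
Sample 1: $x_{11}, x_{21}, \ldots, x_{n_1,1}$ (sample size $n_1$, sample mean $\bar{x}_1$)
...
Sample k: $x_{1k}, x_{2k}, \ldots, x_{n_k,k}$ (sample size $n_k$, sample mean $\bar{x}_k$)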
Terminology
In the context of this problem
Response variable: weekly sales.
Responses: actual sale values.
Experimental unit: weeks in the three cities when we record sales figures.
Factor: the criterion by which we classify the populations (the treatments). In this problem the factor is the marketing strategy.
Factor levels: the population (treatment) names. In this problem the factor levels are the marketing strategies.
Two types of variability are employed when testing for the equality of the population means
[Figure: two dot plots of Treatment 1, Treatment 2 and Treatment 3 with the same sample means in both panels.]
Left panel: A small variability within the samples makes it easier to draw a conclusion about the population means.
Right panel: The sample means are the same as before, but the larger within-sample variability makes it harder to draw a conclusion about the population means.
The variability between the sample means is measured as the sum of squared distances between each mean and the grand mean.
This sum is called the Sum of Squares for Treatments (SST). In our example treatments are represented by the different advertising strategies.
$$SST = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2$$
where k is the number of treatments, $n_j$ is the size of sample j, $\bar{x}_j$ is the mean of sample j, and $\bar{\bar{x}}$ is the grand mean.
Note: When the sample means are close to one another, their distance from the grand mean is small, leading to a small SST. Thus, a large SST indicates large variation between sample means, which supports H1.
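The variability within the samples is measured by the Sum of Squares for Error (SSE). In our example this is the sum of all squared differences between sales in city j and the sample mean of city j (over all the three cities).
$$SSE = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$$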
Is SST = 57,512.23 large enough relative to SSE = 506,983.50 to reject the null hypothesis that specifies that all the means are equal?
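The two sums of squares can be sketched in a few lines of Python. This is only an illustration: the function name and both data sets are hypothetical (they are not the sales data), chosen so that the sample means are identical while the within-sample variability differs.

```python
from statistics import mean

def one_way_sums_of_squares(samples):
    """Return (SST, SSE) for a list of samples, one list per treatment."""
    grand_mean = mean(x for s in samples for x in s)
    # Between-treatments variation: squared distance of each sample mean
    # from the grand mean, weighted by the sample size n_j.
    sst = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Within-treatments variation: squared distance of each observation
    # from its own sample mean.
    sse = sum((x - mean(s)) ** 2 for s in samples for x in s)
    return sst, sse

# Hypothetical data: same sample means (10, 15, 20), different spread.
tight = [[9, 10, 11], [14, 15, 16], [19, 20, 21]]
spread = [[1, 10, 19], [6, 15, 24], [11, 20, 29]]

sst_tight, sse_tight = one_way_sums_of_squares(tight)
sst_spread, sse_spread = one_way_sums_of_squares(spread)
print(sst_tight, sse_tight)
print(sst_spread, sse_spread)
```

Both data sets give the same SST, but SSE is much larger for the second one, which is why the same differences between means are harder to detect there.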
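Calculation of MST
Mean Square for Treatments
With k = 3 treatments:
$$MST = \frac{SST}{k-1} = \frac{57{,}512.23}{3-1} = 28{,}756.12$$
Calculation of MSE
Mean Square for Error
With n = 60 observations in total:
$$MSE = \frac{SSE}{n-k} = \frac{506{,}983.50}{60-3} = 8{,}894.45$$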
The test statistic F = MST/MSE is F-distributed with the following degrees of freedom: $\nu_1 = k - 1$ and $\nu_2 = n - k$.
Required conditions:
1. The populations tested are normally distributed.
2. The variances of all the populations tested are equal.
The F test
H0: $\mu_1 = \mu_2 = \mu_3$
H1: At least two means differ
$$F = \frac{MST}{MSE} = \frac{28{,}756.12}{8{,}894.45} = 3.23$$
The p-value is obtained in Excel as
FDIST(3.23,2,57)
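The arithmetic behind the test statistic can be checked with a short script. It uses only the SST and SSE values quoted above together with k = 3 treatments and n = 60 observations; the variable names are illustrative.

```python
# Values quoted in the text.
sst = 57_512.23    # sum of squares for treatments
sse = 506_983.50   # sum of squares for error
k, n = 3, 60       # number of treatments, total number of observations

mst = sst / (k - 1)   # mean square for treatments, k - 1 = 2 d.f.
mse = sse / (n - k)   # mean square for error, n - k = 57 d.f.
f_stat = mst / mse

print(f"MST = {mst:.2f}, MSE = {mse:.2f}, F = {f_stat:.2f}")
```

The resulting F, together with the degrees of freedom 2 and 57, gives the arguments passed to FDIST.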
[Spreadsheet output: single-factor ANOVA table with columns Count, df, MS, P-value and F crit; the values are not recoverable.]
[Figure: layout of Factor B with Level 1 and Level 2.]
Random effects
If the levels included in our analysis represent a random sample of all the possible levels, we have a random-effect ANOVA. The conclusion of the random-effect ANOVA
Randomized Blocks
Block all the observations with some commonality across treatments
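Observations arranged by block (rows) and treatment (columns):
Block 1: $x_{11}, x_{12}, \ldots, x_{1k}$ (block mean $\bar{x}[B]_1$)
Block 2: $x_{21}, x_{22}, \ldots, x_{2k}$ (block mean $\bar{x}[B]_2$)
...
Block b: $x_{b1}, x_{b2}, \ldots, x_{bk}$ (block mean $\bar{x}[B]_b$)
Treatment means: $\bar{x}[T]_1, \bar{x}[T]_2, \ldots, \bar{x}[T]_k$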
The total sum of squares is partitioned into three sources of variation: treatments, blocks, and within samples (error).
SS(Total) = SST + SSB + SSE
Recall that for the independent samples design we have:
SS(Total) = SST + SSE
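Sum of squares for treatments:
$$SST = \sum_{j=1}^{k} b\,(\bar{x}[T]_j - \bar{\bar{x}})^2$$
Sum of squares for blocks:
$$SSB = \sum_{i=1}^{b} k\,(\bar{x}[B]_i - \bar{\bar{x}})^2 = k(\bar{x}[B]_1 - \bar{\bar{x}})^2 + k(\bar{x}[B]_2 - \bar{\bar{x}})^2 + \cdots + k(\bar{x}[B]_b - \bar{\bar{x}})^2$$
Sum of squares for error, from the partition of SS(Total):
$$SSE = SS(Total) - SST - SSB$$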
Mean Squares
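To perform hypothesis tests for treatments and blocks we need:
Mean square for treatments: $MST = \frac{SST}{k-1}$
Mean square for blocks: $MSB = \frac{SSB}{b-1}$
Mean square for error: $MSE = \frac{SSE}{n-k-b+1}$
Test statistic for treatments
$$F = \frac{MST}{MSE}$$
Test statistic for blocks
$$F = \frac{MSB}{MSE}$$
Testing the mean response for treatments
Rejection region: $F > F_{\alpha,\,k-1,\,n-k-b+1}$
Testing the mean response for blocks
Rejection region: $F > F_{\alpha,\,b-1,\,n-k-b+1}$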
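A minimal sketch of the randomized block calculations follows, assuming one observation per block and treatment combination (so n = bk). The function name and the 4 x 3 data table are hypothetical, and the critical F values would still have to be looked up separately.

```python
from statistics import mean

def randomized_block_anova(table):
    """table[i][j] is the observation in block i under treatment j."""
    b, k = len(table), len(table[0])
    grand = mean(x for row in table for x in row)
    treatment_means = [mean(row[j] for row in table) for j in range(k)]
    block_means = [mean(row) for row in table]

    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    sst = b * sum((m - grand) ** 2 for m in treatment_means)
    ssb = k * sum((m - grand) ** 2 for m in block_means)
    sse = ss_total - sst - ssb          # SS(Total) = SST + SSB + SSE

    df_error = b * k - k - b + 1        # n - k - b + 1 with n = b * k
    mst, msb, mse = sst / (k - 1), ssb / (b - 1), sse / df_error
    return {"SST": sst, "SSB": ssb, "SSE": sse,
            "F_treatments": mst / mse, "F_blocks": msb / mse}

# Hypothetical data: 4 blocks (rows) by 3 treatments (columns).
table = [[10, 12, 14],
         [11, 14, 17],
         [13, 15, 20],
         [14, 19, 21]]
result = randomized_block_anova(table)
print(result)
```

Each F value is then compared with its own critical value, using k - 1 or b - 1 numerator degrees of freedom and n - k - b + 1 denominator degrees of freedom.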
Treatments
Conclusion: At the 5% significance level there is sufficient evidence to infer that the mean cholesterol reductions gained by at least two drugs are different.
[Figure: the cities in the experiment, each labeled with a combination of marketing strategy (Convenience, Quality, Price) and advertising medium (TV, Paper). Data file: Xm15-03.]
The p-value =.0452. We conclude that there is evidence that differences exist in the mean weekly sales among the six cities.
Are there differences in the mean sales caused by different marketing strategies?
Are there differences in the mean sales caused by different advertising media?
Test whether the mean sales for TV and Newspaper significantly differ from one another.
H0: $\mu_{TV} = \mu_{Newspaper}$
H1: The means differ
Calculations are based on the sum of squares for factor B, SS(B).
Are there differences in the mean sales caused by interaction between marketing strategy and advertising medium?
Graphical description of the possible relationships between factors A and B.
[Figure: four panels, each plotting the mean response against the levels of factor A (1, 2, 3), with one line per level of factor B.]
Panel 1: Difference between the levels of factor A, and difference between the levels of factor B; no interaction.
Panel 2: Difference between the levels of factor A. No difference between the levels of factor B.
Panel 3: No difference between the levels of factor A. Difference between the levels of factor B.
Panel 4: Interaction.
Sums of squares
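With a levels of factor A, b levels of factor B and r replicates per combination:
$$SS(A) = rb \sum_{i=1}^{a} (\bar{x}[A]_i - \bar{\bar{x}})^2$$
$$SS(B) = ra \sum_{j=1}^{b} (\bar{x}[B]_j - \bar{\bar{x}})^2$$
$$SS(AB) = r \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{x}[AB]_{ij} - \bar{x}[A]_i - \bar{x}[B]_j + \bar{\bar{x}})^2$$
$$SSE = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{r} (x_{ijk} - \bar{x}[AB]_{ij})^2$$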
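Each test statistic is the ratio of the corresponding mean square to MSE. For factor A the rejection region is $F > F_{\alpha,\,a-1,\,n-ab}$.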
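The two-factor sums of squares can be sketched as below. The function name and the data are hypothetical: three levels of factor A, two levels of factor B and r = 2 replicates per cell, built to be purely additive so that the interaction sum of squares comes out as zero.

```python
from statistics import mean

def two_factor_sums_of_squares(cells):
    """cells[i][j] lists the r replicates for level i of A and level j of B."""
    a, b, r = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean(x for row in cells for cell in row for x in cell)
    a_means = [mean(x for cell in cells[i] for x in cell) for i in range(a)]
    b_means = [mean(x for i in range(a) for x in cells[i][j]) for j in range(b)]
    ab_means = [[mean(cells[i][j]) for j in range(b)] for i in range(a)]

    ss_a = r * b * sum((m - grand) ** 2 for m in a_means)
    ss_b = r * a * sum((m - grand) ** 2 for m in b_means)
    ss_ab = r * sum((ab_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    sse = sum((x - ab_means[i][j]) ** 2
              for i in range(a) for j in range(b) for x in cells[i][j])
    return ss_a, ss_b, ss_ab, sse

# Hypothetical data: rows are levels of A, columns are levels of B.
cells = [[[10, 12], [14, 16]],
         [[13, 15], [17, 19]],
         [[16, 18], [20, 22]]]
ss_a, ss_b, ss_ab, sse = two_factor_sums_of_squares(cells)
print(ss_a, ss_b, ss_ab, sse)
```

A useful sanity check is that the four components add up to the total sum of squares around the grand mean.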
Required conditions:
1. The response distribution is normal.
2. The treatment variances are equal.
3. The samples are independent.
At the 5% significance level there is evidence to infer that differences in weekly sales exist among the marketing strategies.
At the 5% significance level there is insufficient evidence to infer that differences in weekly sales exist between the two advertising media.
Interaction AB = Marketing*Media
At 5% significance level there is insufficient evidence to infer that the two factors interact to affect the mean weekly sales.
Two means are considered different if the difference between the corresponding sample means is larger than a critical number. Then, the larger sample mean is believed to be associated with a larger population mean.
Conditions common to all the methods here:
The ANOVA model is the one-way analysis of variance.
The conditions required to perform the ANOVA are satisfied.
This method builds on the equal variances t-test of the difference between two means. The test statistic is improved by using MSE rather than $s_p^2$. We can conclude that $\mu_i$ and $\mu_j$ differ (at $\alpha$% significance level) if $|\bar{x}_i - \bar{x}_j| > LSD$, where
$$LSD = t_{\alpha/2} \sqrt{MSE \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}$$
with d.f. = n - k.
Bonferroni Adjustment
The procedure:
Compute the number of pairwise comparisons (C): C = k(k-1)/2, where k is the number of populations.
Set $\alpha = \alpha_E / C$, where $\alpha_E$ is the true probability of making at least one Type I error (called the experimentwise Type I error).
We can conclude that $\mu_i$ and $\mu_j$ differ (at $\alpha_E / C$% significance level) if
$$|\bar{x}_i - \bar{x}_j| > t_{\alpha_E/(2C)} \sqrt{MSE \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}$$
with d.f. = n - k.
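The bookkeeping for the Bonferroni adjustment can be sketched as follows. The function names are illustrative, and the t critical value is deliberately left as an input that must be looked up for the adjusted significance level with n - k degrees of freedom; the value 2.0 used in the example is a placeholder, not a table value.

```python
from math import sqrt

def num_pairwise_comparisons(k):
    """C = k(k-1)/2 for k populations."""
    return k * (k - 1) // 2

def bonferroni_alpha(alpha_e, k):
    """Per-comparison significance level alpha = alpha_E / C."""
    return alpha_e / num_pairwise_comparisons(k)

def critical_difference(t_crit, mse, n_i, n_j):
    """t * sqrt(MSE * (1/n_i + 1/n_j)); t_crit is supplied by the caller."""
    return t_crit * sqrt(mse * (1 / n_i + 1 / n_j))

c = num_pairwise_comparisons(3)
alpha = bonferroni_alpha(0.05, 3)
print(c, round(alpha, 4))

# Placeholder t value, for illustration only.
print(critical_difference(2.0, 8894, 20, 20))
```

The same `critical_difference` function gives the LSD when it is called with $t_{\alpha/2}$ instead of the Bonferroni-adjusted critical value.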
The critical number is
$$\omega = q_{\alpha}(k, \nu) \sqrt{\frac{MSE}{n_g}}$$
where
k = the number of samples
$\nu$ = degrees of freedom = n - k
$n_g$ = number of observations per sample (recall, all the sample sizes are the same)
$\alpha$ = significance level
$q_{\alpha}(k, \nu)$ = a critical value obtained from the studentized range table
Repeat this procedure for each pair of samples. Rank the means if possible.
If the sample sizes are not extremely different, we can use the harmonic mean of the sample sizes for $n_g$:
$$n_g = \frac{k}{\frac{1}{n_1} + \frac{1}{n_2} + \cdots + \frac{1}{n_k}}$$
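As a small illustration of this substitution for unequal sample sizes, the helper below computes the harmonic mean of the sample sizes. The function name and the unequal sizes in the example are hypothetical.

```python
def harmonic_group_size(sizes):
    """n_g = k / (1/n_1 + 1/n_2 + ... + 1/n_k)."""
    return len(sizes) / sum(1 / n for n in sizes)

# With equal sample sizes the result is just the common size.
n_g_equal = harmonic_group_size([20, 20, 20])
# Hypothetical unequal sizes.
n_g_unequal = harmonic_group_size([10, 15, 30])
print(n_g_equal, n_g_unequal)
```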
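Sample sizes were equal: $n_1 = n_2 = n_3 = 20$, $\nu = n - k = 60 - 3 = 57$, MSE = 8894.
Take $q_{.05}(3, 60)$ from the table.
For each pair, compare the difference between the larger and the smaller sample mean with the critical number: the two population means are considered different if $\bar{x}_{max} - \bar{x}_{min}$ exceeds it.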
Sample means:
Sales - City 1: 577.55
Sales - City 2: 653
Sales - City 3: 608.65
Differences between sample means:
City 1 vs. City 2: 653 - 577.55 = 75.45
City 1 vs. City 3: 608.65 - 577.55 = 31.1
City 2 vs. City 3: 653 - 608.65 = 44.35
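The comparison can be carried through in a short script. This is a sketch under one stated assumption: the critical value $q_{.05}(3, 60) = 3.40$ is not given in the text and is taken here as the value read from a standard studentized range table. The City 3 mean of 608.65 is the one used in the listed differences; MSE = 8894 and 20 observations per sample come from the text.

```python
from itertools import combinations
from math import sqrt

means = {"City 1": 577.55, "City 2": 653.00, "City 3": 608.65}
mse, n_g = 8894, 20

# ASSUMPTION: q_.05(3, 60) as read from a studentized range table.
q_crit = 3.40

# Critical number: q * sqrt(MSE / n_g).
omega = q_crit * sqrt(mse / n_g)

# Keep the pairs whose sample means differ by more than the critical number.
different = [(a, b) for a, b in combinations(means, 2)
             if abs(means[a] - means[b]) > omega]

print(f"omega = {omega:.2f}")
print(different)
```

Under this assumption only the City 1 vs. City 2 difference (75.45) exceeds the critical number, so only that pair of means is declared different.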