W&W, Chapter 10
Introduction
Last time we learned about the chi-square test for independence, which is useful for data measured at the nominal or ordinal level. If we have data measured at the interval level, we can compare two or more population groups in terms of their population means using a technique called analysis of variance, or ANOVA.
We want to know something about how the populations compare. Do they have the same mean? We can collect random samples from each population, which gives us the following data.
Population 1: sample of n1 cases
Population 2: sample of n2 cases
...
Population k: sample of nk cases
Suppose we want to compare 3 college majors in a business school by the average annual income people make 2 years after graduation. We collect the following data (in $1000s) based on random surveys.
F-Statistic
For this test, we will calculate an F statistic, which is used to compare variances.

$$F = \frac{SST/(k-1)}{SSE/(n-k)}$$

SST = sum of squares for treatment
SSE = sum of squares for error
k = the number of populations
n = total sample size
F-statistic
Intuitively, the F statistic is:

$$F = \frac{\text{explained variance}}{\text{unexplained variance}}$$

Explained variance is the difference between majors.
Unexplained variance is the difference based on random sampling within each group (see Figure 10-1, page 327).
Calculating SST
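$$SST = \sum_{i=1}^{k} n_i (M_i - \bar{M})^2$$

$\bar{M}$ = grand mean: $\bar{M} = \sum_i M_i / k$ when the samples are of equal size, or, in general, the sum of all values for all groups divided by total sample size
$M_i$ = mean for each sample
$n_i$ = size of each sample
k = the number of populations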
Calculating SST
By major:
Accounting: M1 = 29, n1 = 6
Marketing: M2 = 33.5, n2 = 6
Finance: M3 = 37, n3 = 6

$$\bar{M} = (29 + 33.5 + 37)/3 = 33.17$$

$$SST = (6)(29 - 33.17)^2 + (6)(33.5 - 33.17)^2 + (6)(37 - 33.17)^2 = 193$$
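The arithmetic above can be checked with a short Python sketch. The function names `grand_mean` and `sum_of_squares_treatment` are illustrative choices, not part of the text; the means and sample sizes are the ones given for the three majors. The grand mean is computed as a weighted average so the same code would also work with unequal sample sizes.

```python
# Group means and sample sizes from the worked example (income in $1000s).
means = {"Accounting": 29.0, "Marketing": 33.5, "Finance": 37.0}
sizes = {"Accounting": 6, "Marketing": 6, "Finance": 6}

def grand_mean(means, sizes):
    """Weighted average of the group means (the mean of all observations)."""
    n = sum(sizes.values())
    return sum(sizes[g] * means[g] for g in means) / n

def sum_of_squares_treatment(means, sizes):
    """SST = sum over groups of n_i * (M_i - grand mean)^2."""
    gm = grand_mean(means, sizes)
    return sum(sizes[g] * (means[g] - gm) ** 2 for g in means)

gm = grand_mean(means, sizes)
sst = sum_of_squares_treatment(means, sizes)
print(round(gm, 2))   # 33.17
print(round(sst, 2))  # 193.0
```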
Calculating SST
Note that when M1 = M2 = M3, SST = 0, which would support the null hypothesis. In this example the samples are of equal size, but we can also run this analysis with samples of varying size.
Calculating SSE
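$$SSE = \sum_{i=1}^{k} \sum_{t=1}^{n_i} (X_{it} - M_i)^2$$

In other words, it is just the sum of squared deviations from the sample mean, computed within each sample and then added together.

$$SSE = \sum_t (X_{1t} - M_1)^2 + \sum_t (X_{2t} - M_2)^2 + \sum_t (X_{3t} - M_3)^2$$

$$SSE = [(27-29)^2 + (22-29)^2 + \dots + (29-29)^2] + [(23-33.5)^2 + (36-33.5)^2 + \dots] + [(48-37)^2 + (35-37)^2 + \dots + (29-37)^2]$$

SSE = 819.5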
Statistical Output
When you estimate this information in a computer program, it will typically be presented in a table as follows:
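| Source of Variation | df | Sum of squares | Mean squares | F-ratio |
|---|---|---|---|---|
| Treatment | k-1 | SST | MST = SST/(k-1) | F = MST/MSE |
| Error | n-k | SSE | MSE = SSE/(n-k) | |
| Total | n-1 | SS = SST + SSE | | |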
The Results
For 95% confidence (α = .05), with 2 and 15 degrees of freedom, our critical F is 3.68 (averaging across the table values at 14 and 16 denominator degrees of freedom). In this case, F = 1.77 < 3.68, so we cannot reject the null hypothesis. The dean is puzzled by these results because, just by eyeballing the data, it looks like finance majors make more money.
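As a rough check on the F-ratio and the decision, here is a minimal Python sketch that plugs in the SST and SSE values from the worked example. The helper name `f_statistic` is hypothetical, and the critical value 3.68 is simply copied from the text rather than computed.

```python
def f_statistic(sst, sse, k, n):
    """F = [SST / (k - 1)] / [SSE / (n - k)], i.e. MST / MSE."""
    mst = sst / (k - 1)   # mean square for treatment
    mse = sse / (n - k)   # mean square for error
    return mst / mse

# Values from the worked example: 3 majors, 6 graduates each.
sst, sse = 193.0, 819.5
k, n = 3, 18

f = f_statistic(sst, sse, k, n)
critical_f = 3.68          # alpha = .05 with 2 and 15 df, taken from the text
reject_null = f > critical_f

print(round(f, 2))   # 1.77
print(reject_null)   # False
```

Because the computed F falls below the critical value, the sketch reaches the same conclusion as the text: the sample does not give enough evidence of a difference in mean income across majors.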
The Results
Many other factors may determine the salary level, such as GPA. The dean decides to collect new data, randomly selecting one student from each major at each of the following average grades.
New data
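| Average grade | Accounting | Marketing | Finance | M(b) |
|---|---|---|---|---|
| A+ | 41 | 45 | 51 | M(b1) = 45.67 |
| A | 36 | 38 | 45 | M(b2) = 39.67 |
| B+ | 27 | 33 | 31 | M(b3) = 30.33 |
| B | 32 | 29 | 35 | M(b4) = 32 |
| C+ | 26 | 31 | 32 | M(b5) = 29.67 |
| C | 23 | 25 | 27 | M(b6) = 25 |
| M(t) | M(t)1 = 30.83 | M(t)2 = 33.5 | M(t)3 = 36.83 | grand mean = 33.72 |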
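The row and column means in this table can be recomputed with a few lines of Python. This is only a sketch: the dictionary layout and variable names are illustrative assumptions, while the salary figures are the ones listed above. M(t) is the mean for each major and M(b) is the mean for each grade level.

```python
# Salaries (in $1000s) by grade level and major, as listed in the new data.
majors = ["Accounting", "Marketing", "Finance"]
salaries = {
    "A+": [41, 45, 51],
    "A":  [36, 38, 45],
    "B+": [27, 33, 31],
    "B":  [32, 29, 35],
    "C+": [26, 31, 32],
    "C":  [23, 25, 27],
}

# M(b): mean across majors within each grade level.
grade_means = {g: sum(row) / len(row) for g, row in salaries.items()}

# M(t): mean across grade levels within each major.
major_means = {
    m: sum(row[j] for row in salaries.values()) / len(salaries)
    for j, m in enumerate(majors)
}

# Grand mean over all 18 observations.
all_values = [x for row in salaries.values() for x in row]
overall_mean = sum(all_values) / len(all_values)

print({g: round(v, 2) for g, v in grade_means.items()})
print({m: round(v, 2) for m, v in major_means.items()})
print(round(overall_mean, 2))  # 33.72
```

Note that the B+ row averages to 91/3, or about 30.33.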