and Chi-square
The data
Table 3. Solvent Exposure Frequencies and Adjusted Pairwise Odds Ratios in PD-Discordant Twins, n = 99 Pairs
independent
correlated
Chi-square test:
Conditional logistic
regression: multivariate
Logistic regression:
multivariate technique used
when outcome is binary; gives
multivariate-adjusted odds
ratios
independent
correlated
Non-parametric statistics
Pearson's correlation
coefficient (linear
correlation): shows linear
correlation between two
continuous variables
Linear regression:
Repeated-measures
ANOVA: compares changes
Mixed models/GEE
modeling: multivariate
ANOVA example
Mean micronutrient intake from the school lunch by school

             Calcium (mg)     Iron (mg)       Folate (µg)     Zinc (mg)
             Mean     SD      Mean    SD      Mean    SD      Mean    SD
S1, n=28     117.8    62.4    2.0     0.6     26.6    13.1    1.9     1.0
S2, n=25     158.7    70.5    2.0     0.6     38.7    14.5    1.5     1.2
S3, n=21     206.5    86.2    2.0     0.6     42.6    15.1    1.3     0.4
P-value      0.000            0.854           0.000           0.055

FROM: Gould R, Russell J, Barker ME. School lunch menus and 11 to 12 year old children's food choice in three secondary schools in England: are the nutritional standards being met? Appetite. 2006 Jan;46(1):86-92.
ANOVA
(ANalysis Of VAriance)
Hypotheses of One-Way
ANOVA
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: Not all of the population means are the same
ANOVA
The F-test
Is the difference in the means of the groups more
than background noise (=variability within groups)?
Summarizes the mean differences
between all groups at once.
The F-distribution
http://www.econtools.com/jevons/java/Graphics2D/FDist.html
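The ratio of the between-group variance to the within-group variance follows an F-distribution:
\[ \frac{\sigma^2_{between}}{\sigma^2_{within}} \sim F_{n,m} \]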
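The data layout: k=4 groups, n=10 obs./group ($y_{ij}$ = observation j in treatment group i)

Treatment 1   Treatment 2   Treatment 3   Treatment 4
y1,1          y2,1          y3,1          y4,1
y1,2          y2,2          y3,2          y4,2
y1,3          y2,3          y3,3          y4,3
y1,4          y2,4          y3,4          y4,4
y1,5          y2,5          y3,5          y4,5
y1,6          y2,6          y3,6          y4,6
y1,7          y2,7          y3,7          y4,7
y1,8          y2,8          y3,8          y4,8
y1,9          y2,9          y3,9          y4,9
y1,10         y2,10         y3,10         y4,10

The group means
\[ \bar{y}_{i\cdot} = \frac{\sum_{j=1}^{10} y_{ij}}{10}, \qquad i = 1, \dots, 4 \]

The (within) group variances
\[ \frac{\sum_{j=1}^{10} (y_{ij} - \bar{y}_{i\cdot})^2}{10 - 1}, \qquad i = 1, \dots, 4 \]

Sum of squares within (SSW): the numerators of the four group variances added together
\[ \sum_{j=1}^{10} (y_{1j} - \bar{y}_{1\cdot})^2 + \sum_{j=1}^{10} (y_{2j} - \bar{y}_{2\cdot})^2 + \sum_{j=1}^{10} (y_{3j} - \bar{y}_{3\cdot})^2 + \sum_{j=1}^{10} (y_{4j} - \bar{y}_{4\cdot})^2 = \sum_{i=1}^{4} \sum_{j=1}^{10} (y_{ij} - \bar{y}_{i\cdot})^2 \]

Overall mean of all 40 observations (grand mean)
\[ \bar{y}_{\cdot\cdot} = \frac{\sum_{i=1}^{4} \sum_{j=1}^{10} y_{ij}}{40} \]

Sum of squares between (SSB)
\[ 10 \times \sum_{i=1}^{4} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 \]

Total sum of squares (TSS)
\[ \sum_{i=1}^{4} \sum_{j=1}^{10} (y_{ij} - \bar{y}_{\cdot\cdot})^2 \]

Partitioning of Variance
\[ \sum_{i=1}^{4} \sum_{j=1}^{10} (y_{ij} - \bar{y}_{i\cdot})^2 + 10 \times \sum_{i=1}^{4} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 = \sum_{i=1}^{4} \sum_{j=1}^{10} (y_{ij} - \bar{y}_{\cdot\cdot})^2 \]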
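A short derivation may help show why the within and between sums of squares add up to the total. It is a sketch for the 4-group, 10-observation layout used here; the dot notation ($\bar{y}_{i\cdot}$ for a group mean, $\bar{y}_{\cdot\cdot}$ for the grand mean) is a notational assumption, not taken from the slides.

```latex
\begin{align*}
y_{ij} - \bar{y}_{\cdot\cdot}
  &= (y_{ij} - \bar{y}_{i\cdot}) + (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}) \\
\sum_{i=1}^{4}\sum_{j=1}^{10} (y_{ij} - \bar{y}_{\cdot\cdot})^2
  &= \sum_{i=1}^{4}\sum_{j=1}^{10} (y_{ij} - \bar{y}_{i\cdot})^2
   + 10\sum_{i=1}^{4} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2
   + 2\sum_{i=1}^{4} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})
      \underbrace{\sum_{j=1}^{10} (y_{ij} - \bar{y}_{i\cdot})}_{=\,0}
\end{align*}
```

The cross term vanishes because deviations from a group's own mean sum to zero within each group, which leaves the total equal to the within plus the between sum of squares.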
ANOVA Table

Source of variation    d.f.    Sum of squares   Mean Sum of Squares   F-statistic                   p-value
Between (k groups)     k-1     SSB              SSB/(k-1)             [SSB/(k-1)] / [SSW/(nk-k)]    Go to F(k-1, nk-k) chart
Within                 nk-k    SSW              s^2 = SSW/(nk-k)
(n individuals per group)
Total variation        nk-1    TSS

SSB: sum of squared deviations of group means from grand mean
SSW: sum of squared deviations of observations from their group mean
TSS: sum of squared deviations of observations from grand mean
TSS = SSB + SSW
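ANOVA=t-test

With two groups of n observations each, the grand mean is $(\bar{X}_n + \bar{Y}_n)/2$, so:

\[ SSB = \sum_{i=1}^{n} \left( \bar{X}_n - \frac{\bar{X}_n + \bar{Y}_n}{2} \right)^2 + \sum_{i=1}^{n} \left( \bar{Y}_n - \frac{\bar{X}_n + \bar{Y}_n}{2} \right)^2 \]
\[ = n \left( \frac{\bar{X}_n}{2} - \frac{\bar{Y}_n}{2} \right)^2 + n \left( \frac{\bar{Y}_n}{2} - \frac{\bar{X}_n}{2} \right)^2 \]
\[ = n \left( \left(\frac{\bar{X}_n}{2}\right)^2 + \left(\frac{\bar{Y}_n}{2}\right)^2 - \frac{2 \bar{X}_n \bar{Y}_n}{4} + \left(\frac{\bar{Y}_n}{2}\right)^2 + \left(\frac{\bar{X}_n}{2}\right)^2 - \frac{2 \bar{X}_n \bar{Y}_n}{4} \right) \]
\[ = \frac{n}{2} \left( \bar{X}_n^2 - 2 \bar{X}_n \bar{Y}_n + \bar{Y}_n^2 \right) = \frac{n}{2} \left( \bar{X}_n - \bar{Y}_n \right)^2 \]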
Source of variation    d.f.    Sum of squares   Mean Sum of Squares      F-statistic   p-value
Between (2 groups)     1       SSB              SSB                      see below     Go to F(1, 2n-2) chart
Within                 2n-2    SSW              Pooled variance s_p^2
Total variation        2n-1    TSS

SSB: squared difference in means multiplied by n/2
SSW: equivalent to numerator of pooled variance

\[ F = \frac{\frac{n}{2} (\bar{X} - \bar{Y})^2}{s_p^2} = \left( \frac{\bar{X} - \bar{Y}}{\sqrt{\frac{s_p^2}{n} + \frac{s_p^2}{n}}} \right)^2 = (t_{2n-2})^2 \]

In the F(1, 2n-2) chart, notice that the values are just $(t_{2n-2})^2$.
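The equivalence between a two-group ANOVA and the pooled-variance t-test can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name and the two five-value samples are illustrative assumptions, not part of the slides.

```python
from statistics import mean, variance


def two_group_f_and_t(x, y):
    """Return (F, t) for two independent groups of equal size n."""
    n = len(x)
    assert len(y) == n, "this sketch assumes equal group sizes"
    xbar, ybar = mean(x), mean(y)
    grand = (xbar + ybar) / 2
    # Between-group sum of squares with 1 degree of freedom
    ssb = n * (xbar - grand) ** 2 + n * (ybar - grand) ** 2
    # Pooled variance = SSW / (2n - 2)
    sp2 = ((n - 1) * variance(x) + (n - 1) * variance(y)) / (2 * n - 2)
    f_stat = (ssb / 1) / sp2
    t_stat = (xbar - ybar) / (sp2 / n + sp2 / n) ** 0.5
    return f_stat, t_stat


# Illustrative data only (hypothetical values)
x = [60, 67, 42, 67, 56]
y = [50, 52, 43, 67, 67]
f_stat, t_stat = two_group_f_and_t(x, y)
print(f"F = {f_stat:.4f}, t^2 = {t_stat ** 2:.4f}")
```

The two printed numbers should agree up to floating-point error, whatever equal-sized samples are supplied.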
Example

Treatment 1   Treatment 2   Treatment 3   Treatment 4
60 inches     50            48            47
67            52            49            67
42            43            50            54
67            67            55            67
56            67            56            68
62            59            61            65
64            67            61            65
59            64            60            56
72            63            59            60
71            65            64            65
Example
Step 1) calculate the sum of squares between groups:
Mean for group 1 = 62.0
Mean for group 2 = 59.7
Mean for group 3 = 56.3
Mean for group 4 = 61.4
Grand mean = 59.85
SSB = [(62.0-59.85)^2 + (59.7-59.85)^2 + (56.3-59.85)^2 + (61.4-59.85)^2] x 10 per group = 19.65 x 10 = 196.5
Example
Step 2) calculate the sum of squares within groups:
(60-62)^2 + (67-62)^2 + (42-62)^2 + (67-62)^2 + (56-62)^2 + (62-62)^2 + (64-62)^2 + (59-62)^2 + (72-62)^2 + (71-62)^2 + (50-59.7)^2 + (52-59.7)^2 + (43-59.7)^2 + (67-59.7)^2 + (67-59.7)^2 + (59-59.7)^2 + ... (sum of 40 squared deviations) = 2060.6
           d.f.   Sum of squares   Mean Sum of Squares   F-statistic   p-value
Between    3      196.5            65.5                  1.14          .344
Within     36     2060.6           57.2
Total      39     2257.1
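The two steps of the worked example can be reproduced in a few lines. This is a minimal sketch with the Python standard library only; the function name `one_way_anova` and the returned dictionary keys are illustrative choices. It stops at the F-statistic, since the p-value needs an F-distribution table or a statistics package.

```python
from statistics import mean


def one_way_anova(groups):
    """Sums of squares, mean squares and F for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    # Between: squared deviations of group means from the grand mean
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within: squared deviations of observations from their own group mean
    ssw = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    msb, msw = ssb / df_between, ssw / df_within
    return {
        "df_between": df_between, "df_within": df_within,
        "ssb": ssb, "ssw": ssw, "tss": ssb + ssw,
        "msb": msb, "msw": msw, "f": msb / msw,
        "r2": ssb / (ssb + ssw),
    }


# Heights (inches) from the Example, one list per treatment group
heights = [
    [60, 67, 42, 67, 56, 62, 64, 59, 72, 71],
    [50, 52, 43, 67, 67, 59, 67, 64, 63, 65],
    [48, 49, 50, 55, 56, 61, 61, 60, 59, 64],
    [47, 67, 54, 67, 68, 65, 65, 56, 60, 65],
]
result = one_way_anova(heights)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```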
INTERPRETATION of ANOVA:
How much of the variance in height is explained by treatment group?
R^2 = Coefficient of Determination = SSB/TSS = 196.5/2257.1 = 9%
Coefficient of Determination
\[ R^2 = \frac{SSB}{SSB + SSW} = \frac{SSB}{TSS} \]
ANOVA example
Table 6. Mean micronutrient intake from the school lunch by school

             Calcium (mg)     Iron (mg)       Folate (µg)     Zinc (mg)
             Mean     SD      Mean    SD      Mean    SD      Mean    SD
S1, n=25     117.8    62.4    2.0     0.6     26.6    13.1    1.9     1.0
S2, n=25     158.7    70.5    2.0     0.6     38.7    14.5    1.5     1.2
S3, n=25     206.5    86.2    2.0     0.6     42.6    15.1    1.3     0.4
P-value      0.000            0.854           0.000           0.055
Answer
Step 1) calculate the sum of squares between groups:
Mean for School 1 = 117.8
Mean for School 2 = 158.7
Mean for School 3 = 206.5
Grand mean: 161
SSB = [(117.8-161)^2 + (158.7-161)^2 + (206.5-161)^2] x 25 per group = 3941.78 x 25 = 98,545
Answer
Step 2) calculate the sum of squares within groups:
S.D. for S1 = 62.4
S.D. for S2 = 70.5
S.D. for S3 = 86.2
Therefore, sum of squares within is:
(24)[62.4^2 + 70.5^2 + 86.2^2] = 391,067
Answer
Step 3) Fill in your ANOVA table

Source of variation   d.f.   Sum of squares   Mean Sum of Squares   F-statistic   p-value
Between               2      98,545           49,272                9.07          <.05
Within                72     391,067          5,431
Total                 74     489,611

**R^2 = 98,545/489,611 = 20%
School explains 20% of the variance in lunchtime calcium intake in these kids.
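When only group means and standard deviations are published, the same ANOVA table can be rebuilt from those summaries. The sketch below assumes equal group sizes (25 per school, as in this exercise) and uses a hypothetical helper name, `anova_from_summary`. Small differences from hand-worked figures can arise from rounding.

```python
def anova_from_summary(means, sds, n):
    """One-way ANOVA from group means and SDs, assuming n per group."""
    k = len(means)
    grand_mean = sum(means) / k  # valid because the groups are equal-sized
    ssb = n * sum((m - grand_mean) ** 2 for m in means)
    # Each SD^2 is a group's sum of squared deviations divided by (n - 1)
    ssw = (n - 1) * sum(sd ** 2 for sd in sds)
    df_between, df_within = k - 1, n * k - k
    msb, msw = ssb / df_between, ssw / df_within
    return {"ssb": ssb, "ssw": ssw, "tss": ssb + ssw,
            "df_between": df_between, "df_within": df_within,
            "msb": msb, "msw": msw, "f": msb / msw,
            "r2": ssb / (ssb + ssw)}


# Calcium (mg) means and SDs for the three schools, n = 25 each
calcium = anova_from_summary(
    means=[117.8, 158.7, 206.5], sds=[62.4, 70.5, 86.2], n=25
)
for name, value in calcium.items():
    print(f"{name}: {value:,.2f}")
```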
ANOVA summary
1. Bonferroni
For example, to make a Bonferroni correction, divide your desired alpha cut-off level (usually .05) by the number of comparisons you are making. The correction treats the comparisons as completely independent, which makes it overly conservative when they are correlated.

Obtained P-value   Original Alpha   # tests   New Alpha   Significant?
.001               .05              5         .010        Yes
.011               .05              4         .013        Yes
.019               .05              3         .017        No
.032               .05              2         .025        No
.048               .05              1         .050        Yes
adjust=tukey
adjust=scheffe
Holm
1. Order the p-values from smallest to largest, p1 ≤ p2 ≤ … ≤ pT. Start with p1, and compare to the Bonferroni p (= α/T). If p1 < α/T, then p1 is significant and continue to step 2. If not, then we have no significant p-values and stop here.
2. If p2 < α/(T-1), then p2 is significant and continue to step 3. If not, then p2 thru pT are not significant and stop here.
3. If p3 < α/(T-2), then p3 is significant and continue to step 4. If not, then p3 thru pT are not significant and stop here.
Repeat the pattern
Hochberg
1. Start with largest (least significant) p-value, pT, and compare to α. If it's significant, so are all the remaining p-values and stop here. If it's not significant then go to step 2.
2. If pT-1 < α/2, then pT-1 is significant, as are all remaining smaller p-values and stop here. If not, then pT-1 is not significant and go to step 3.
Repeat the pattern, comparing pT-2 to α/3 and so on, down to p1 versus α/T.
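Note: Holm and Hochberg will usually give you the same results, although Hochberg can declare more comparisons significant (it is never less powerful than Holm). Use Holm if you anticipate few significant comparisons; use Hochberg if you anticipate many significant comparisons.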
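The step-down (Holm) and step-up (Hochberg) procedures can be sketched as two small functions. This is a minimal illustration with hypothetical function names, assuming the standard Hochberg thresholds (α, α/2, α/3, …, working from the largest p-value down) and the strict inequality used in the steps. The five p-values are the ones from the Bonferroni table.

```python
def holm(pvalues, alpha=0.05):
    """Indices of p-values declared significant by Holm's step-down rule."""
    total = len(pvalues)
    order = sorted(range(total), key=lambda i: pvalues[i])  # smallest first
    significant = set()
    for step, idx in enumerate(order):
        # Thresholds: alpha/T, alpha/(T-1), alpha/(T-2), ...
        if pvalues[idx] < alpha / (total - step):
            significant.add(idx)
        else:
            break  # this and all larger p-values are not significant
    return significant


def hochberg(pvalues, alpha=0.05):
    """Indices of p-values declared significant by Hochberg's step-up rule."""
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i], reverse=True)
    for step, idx in enumerate(order):
        # Thresholds: alpha, alpha/2, alpha/3, ... starting from the largest p
        if pvalues[idx] < alpha / (step + 1):
            return set(order[step:])  # this and all smaller p-values
    return set()


pvals = [0.001, 0.011, 0.019, 0.032, 0.048]
holm_hits = holm(pvals)
hochberg_hits = hochberg(pvals)
print("Holm:    ", sorted(pvals[i] for i in holm_hits))
print("Hochberg:", sorted(pvals[i] for i in hochberg_hits))
```

On these particular values Hochberg flags more comparisons than Holm, because the largest p-value already falls below α; on many data sets the two agree.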
Practice Problem
A large randomized trial compared an experimental drug and 9 other standard drugs for treating motion sickness. An ANOVA test revealed significant differences between the groups. The investigators wanted to know if the experimental drug (drug 1) beat any of the standard drugs in reducing total minutes of nausea, and, if so, which ones. The p-values from the pairwise t-tests (comparing drug 1 with drugs 2-10) are below.

Drug 1 vs. drug   2     3     4     5     6      7      8     9      10
p-value           .05   .3    .25   .04   .001   .006   .08   .002   .01
Answer
Bonferroni makes new α value = α/9 = .05/9 = .0056; therefore, using Bonferroni, the new drug is only significantly different from standard drugs 6 and 9.
Arrange p-values:

Drug      6      9      7      10    5     2     8     4     3
p-value   .001   .002   .006   .01   .04   .05   .08   .25   .3
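The Bonferroni step of this answer can be checked with a short script. It is a sketch only: the pairing of each p-value with a drug number assumes the p-values are listed in order for drugs 2 through 10, which is consistent with the stated answer (drugs 6 and 9).

```python
# p-values for drug 1 vs. each standard drug (assumed order: drugs 2..10)
p_by_drug = {2: .05, 3: .3, 4: .25, 5: .04, 6: .001,
             7: .006, 8: .08, 9: .002, 10: .01}

alpha = 0.05
threshold = alpha / len(p_by_drug)  # Bonferroni: alpha divided by 9 comparisons

# Drugs whose comparison with drug 1 survives the correction
significant = sorted(drug for drug, p in p_by_drug.items() if p < threshold)

# p-values arranged from smallest to largest, keeping the drug labels
ranked = sorted(p_by_drug.items(), key=lambda item: item[1])

print(f"Bonferroni threshold: {threshold:.4f}")
print("Significant vs. drug 1:", significant)
print("Ranked:", ranked)
```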
Practice problem
Answer
Non-parametric ANOVA
Kruskal-Wallis one-way ANOVA
(just an extension of the Wilcoxon rank-sum (Mann-Whitney U) test for 2 groups to more than 2 groups; based on ranks)
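To make "based on ranks" concrete, here is a minimal sketch of the Kruskal-Wallis H statistic using only the standard library. The function name is illustrative, tied values receive average ranks, and the usual tie correction is applied. It returns the statistic only, which would be compared with a chi-square distribution on (number of groups - 1) degrees of freedom.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with average ranks and tie correction."""
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)
    rank_of = {}   # value -> average rank among all observations
    tie_term = 0   # sum of (t^3 - t) over runs of t tied values
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        run = j - i
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += run ** 3 - run
        i = j
    rank_sums_sq = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * rank_sums_sq - 3 * (n_total + 1)
    # Undefined (division by zero) if every observation is identical
    correction = 1 - tie_term / (n_total ** 3 - n_total)
    return h / correction


# Heights (inches) from the four-treatment Example
heights = [
    [60, 67, 42, 67, 56, 62, 64, 59, 72, 71],
    [50, 52, 43, 67, 67, 59, 67, 64, 63, 65],
    [48, 49, 50, 55, 56, 61, 61, 60, 59, 64],
    [47, 67, 54, 67, 68, 65, 65, 56, 60, 65],
]
print(f"H = {kruskal_wallis_h(heights):.3f} on {len(heights) - 1} d.f.")
```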
Chi-square test
for comparing proportions
(of a categorical variable)
between >2 groups
I. Chi-Square Test of Independence
When both your predictor and outcome variables are categorical, they may be cross-classified in a contingency table and compared using a chi-square test of independence.
A contingency table with R rows and C columns is an R x C contingency table.
Example
The Experiment
[Figure: a standard line shown with comparison lines A, B, and C]

Further Results
Number of group members?
Conformed?    …     …     …     …     10
Yes           20    50    75    60    30
No            80    50    25    40    70

20 + 50 + 75 + 60 + 30 = 235 conformed out of 500 experiments.
Overall likelihood of conforming = 235/500 = .47
Expected frequencies if no association between group size and conformity

Number of group members?
Conformed?    …     …     …     …     10
Yes           47    47    47    47    47
No            53    53    53    53    53
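Chi-Square test

\[ \chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \]

\[ \chi^2_4 = \frac{(20-47)^2}{47} + \frac{(50-47)^2}{47} + \frac{(75-47)^2}{47} + \frac{(60-47)^2}{47} + \frac{(30-47)^2}{47} + \frac{(80-53)^2}{53} + \frac{(50-53)^2}{53} + \frac{(25-53)^2}{53} + \frac{(40-53)^2}{53} + \frac{(70-53)^2}{53} \approx 79.5 \]

Degrees of freedom = (rows-1) x (columns-1) = (2-1) x (5-1) = 4

The chi-square distribution is the distribution of a sum of squared standard normal deviates:
\[ \chi^2_{df} = \sum_{i=1}^{df} Z_i^2; \quad \text{where } Z \sim \text{Normal}(0,1) \]
The expected value and variance of a chi-square:
E(x)=df
Var(x)=2(df)

Rule of thumb: if the chi-square statistic is much greater than its degrees of freedom, this indicates statistical significance. Here 79.5>>4.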
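The expected counts and the chi-square statistic for an R x C table can be computed directly from the observed counts. This is a minimal sketch with the standard library only; the function name is an illustrative choice, and the observed table is the conformity data from the example.

```python
def chi_square_independence(table):
    """Return (statistic, degrees of freedom, expected counts) for an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    expected = []
    for row, row_total in zip(table, row_totals):
        # Expected count under independence: row total x column total / N
        expected_row = [row_total * col_total / total for col_total in col_totals]
        expected.append(expected_row)
        statistic += sum((o - e) ** 2 / e for o, e in zip(row, expected_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected


observed = [
    [20, 50, 75, 60, 30],  # conformed: yes
    [80, 50, 25, 40, 70],  # conformed: no
]
statistic, df, expected = chi_square_independence(observed)
print(f"chi-square = {statistic:.2f} on {df} d.f.")
print("expected counts:", expected)
```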
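                          Brain tumor   No brain tumor   Total
Own a cell phone          5             347              352
Don't own a cell phone    3             88               91
Total                     8             435              443

Z-test for a difference in proportions:
\[ \hat{p}_{tumor/cellphone} = \frac{5}{352} = .014; \quad \hat{p}_{tumor/nophone} = \frac{3}{91} = .033; \quad \hat{p} = \frac{8}{443} = .018 \]
\[ Z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\frac{(\hat{p})(1-\hat{p})}{n_1} + \frac{(\hat{p})(1-\hat{p})}{n_2}}} = \frac{(.0142 - .0330)}{\sqrt{\frac{(.018)(.982)}{352} + \frac{(.018)(.982)}{91}}} = \frac{-.0188}{.0157} = -1.20 \]

The same comparison as a chi-square test:

                  Brain tumor   No brain tumor   Total
Own               5 (a)         347 (b)          352
Don't own         3 (c)         88 (d)           91
Total             8             435              443

\[ p_{tumor} = \frac{8}{443} = .018; \quad p_{cellphone} = \frac{352}{443} = .795 \]
\[ p_{tumor} \times p_{cellphone} = \frac{8}{443} \times \frac{352}{443} = .0143 \]
Expected in cell a = (8/443)(352/443)(443) = 6.36; 1.64 in cell c; 345.64 in cell b; 89.36 in cell d
Expected value in cell c = 1.64, so technically should use a Fisher's exact here! Next term
\[ \chi^2_1 = \frac{(5-6.36)^2}{6.36} + \frac{(3-1.64)^2}{1.64} + \frac{(88-89.36)^2}{89.36} + \frac{(347-345.64)^2}{345.64} = 1.44 \quad \text{NS} \]
note: $Z^2 = 1.20^2 = 1.44$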
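For a 2x2 table, the two-proportion Z-test (with a pooled proportion) and the chi-square test are the same test, and a quick numerical check makes that visible. The sketch below uses only the standard library and the four observed counts from the cell phone table; the variable names are illustrative.

```python
from math import sqrt

# Observed counts: rows = own / don't own a cell phone,
# columns = brain tumor / no brain tumor
a, b = 5, 347
c, d = 3, 88

n1, n2 = a + b, c + d
total = n1 + n2
p1, p2 = a / n1, c / n2          # tumor proportion in each row
p_pooled = (a + c) / total       # overall tumor proportion

# Two-proportion Z statistic using the pooled proportion
standard_error = sqrt(p_pooled * (1 - p_pooled) / n1
                      + p_pooled * (1 - p_pooled) / n2)
z = (p1 - p2) / standard_error

# Chi-square statistic from expected counts (row total x column total / N)
observed = [a, b, c, d]
expected = [
    n1 * (a + c) / total, n1 * (b + d) / total,
    n2 * (a + c) / total, n2 * (b + d) / total,
]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(f"Z = {z:.3f}, Z^2 = {z ** 2:.3f}, chi-square = {chi2:.3f}")
print("smallest expected count:", round(min(expected), 2))
```

The smallest expected count is printed because it is the quantity the caveat about small cells refers to.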
Caveat
**When the expected count in any cell is very small (expected value < 5), Fisher's exact test is used as an alternative to the chi-square test.
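As a rough illustration of what Fisher's exact test computes for a 2x2 table, the sketch below sums hypergeometric probabilities with the margins held fixed. The function name is hypothetical, and the two-sided rule used here (add up every table that is no more probable than the observed one) is one common convention, stated as an assumption rather than the only definition.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small tolerance so that ties with the observed table are included
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Cell phone / brain tumor counts from the 2x2 example
p_value = fisher_exact_two_sided(5, 347, 3, 88)
print(f"Fisher's exact two-sided p = {p_value:.3f}")
```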