
AB1202

Statistics and Analysis


Lecture 8
ANOVA And Chi-Square Tests

Chin Chee Kai


cheekai@ntu.edu.sg
Nanyang Business School
Nanyang Technological University
NBS 2016S1 AB1202 CCK-STAT-018

ANOVA And Chi-Square Tests


Chi-Square Tests
• Chi-Square Multinomial Test
• Chi-Square Goodness-of-Fit Test For Normality
• Chi-Square Test For Independence
• Statistical Inference For Population Variance
• Comparing Two Population Variances
• Use of Variance Equality Test

ANOVA
• Comparing 3 Or More Population Means
• One-way ANOVA
• Concept of Experimental Design

Chi-Square Goodness-of-Fit Test


• Purpose is to ascertain conformity to a given tabled
distribution.
▫ $H_0$: population follows given tabled distribution
▫ $H_1$: population does not follow tabled distribution.

$f_i$ = observed frequency at the $i$-th bin.
$E_i$ = expected frequency at the $i$-th bin.

X        A     B     C     D     E     F
P(X)     0.3   0.2   0.1   0.1   0.2   0.1
f_i      14    10    6     2     4     4
E_i      12    8     4     4     8     4

$k = 6$, $n = \sum_{i=1}^{k} f_i = 40$, $\alpha = 0.05$

[Figure: bar chart of observed $f_i$ against expected $E_i$ for each bin; chi-square density with right-tail area $\alpha$ beyond $\chi_c^2$.]

Expected frequency:
$E_i = n \cdot P(X)$

Test statistic:
$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - E_i)^2}{E_i}$
d.f. $v = k - 1$

Must ensure:
• Have 5 or more bins, and
• Each $E_i$ is 1 or more, and
• Average of all $E_i$ is 5 or more

Critical value: $\chi_c^2 = 11.0705$. Always right-tailed.

In this example, $df = 5$. Since $\chi^2 = 4.8333 < \chi_c^2 = 11.0705$,
we do not reject $H_0$ and conclude that $X$ has the given tabled
distribution at $\alpha = 0.05$.
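
The arithmetic on this slide can be checked with a short Python sketch. It uses only the probabilities, observed frequencies and critical value quoted on the slide; the function name `chi_square_gof` is a label chosen for this illustration, not part of any library.

```python
# Goodness-of-fit statistic for the tabled distribution on this slide.
probs = [0.3, 0.2, 0.1, 0.1, 0.2, 0.1]   # P(X) for bins A..F
observed = [14, 10, 6, 2, 4, 4]          # f_i

def chi_square_gof(observed, probs):
    """Return (chi2, expected, df) for a goodness-of-fit test to tabled probabilities."""
    n = sum(observed)
    expected = [n * p for p in probs]            # E_i = n * P(X)
    chi2 = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    return chi2, expected, len(observed) - 1     # d.f. v = k - 1

chi2, expected, df = chi_square_gof(observed, probs)

# Rule-of-thumb checks listed on the slide.
checks_ok = (
    len(expected) >= 5                           # 5 or more bins
    and min(expected) >= 1                       # each E_i is 1 or more
    and sum(expected) / len(expected) >= 5       # average E_i is 5 or more
)

CRITICAL = 11.0705   # critical value quoted on the slide (df = 5, alpha = 0.05)
reject_h0 = chi2 > CRITICAL                      # always right-tailed
print(round(chi2, 4), df, checks_ok, reject_h0)
```

The computed statistic agrees with the slide's 4.8333, which lies below the critical value, so $H_0$ is not rejected.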

Goodness-of-Fit Test For Normality


• Purpose is to ascertain normality of distribution.
▫ $H_0$: population is normally distributed
▫ $H_1$: population is not normally distributed.
• Can be used to test for goodness-of-fit to other
distributions in similar manner.

$f_i$ = observed frequency at the $i$-th bin.
$E_i$ = expected frequency at the $i$-th bin.

[Figure: histogram with $k = 9$ bins showing observed $f_1, \dots, f_9$ against expected $E_1, \dots, E_9$, where bin $i$ spans $(c_i, d_i]$; chi-square density with right-tail area $\alpha$ beyond $\chi_c^2$.]

Expected frequency:
$E_i = n \cdot P(c_i < X \le d_i)$
where $X \sim N(\bar{x}, s^2)$, $n = \sum_{i=1}^{k} f_i$

Test statistic:
$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - E_i)^2}{E_i}$
d.f. $v = k - 3$

Critical value: $\chi_c^2$. Always right-tailed.
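
A minimal sketch of how the expected frequencies could be computed in Python follows. The bin edges, observed counts, sample mean and sample standard deviation below are hypothetical values invented for illustration (the slide gives no data), and treating the first and last bins as open-ended is an assumption of this sketch.

```python
from statistics import NormalDist

def normality_gof(edges, observed, mean, sd):
    """Chi-square statistic for goodness-of-fit to N(mean, sd^2).

    edges: interior bin boundaries. Bin i covers (c_i, d_i]; the first bin is
    open to -infinity and the last to +infinity (an assumption of this sketch).
    """
    n = sum(observed)
    dist = NormalDist(mean, sd)
    cdf = [0.0] + [dist.cdf(e) for e in edges] + [1.0]
    # E_i = n * P(c_i < X <= d_i)
    expected = [n * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]
    chi2 = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    df = len(observed) - 3        # v = k - 3, as on the slide
    return chi2, expected, df

# Hypothetical data: 100 observations in k = 6 bins, x-bar = 50, s = 10.
edges = [35, 45, 50, 55, 65]
observed = [8, 22, 21, 18, 24, 7]
chi2, expected, df = normality_gof(edges, observed, mean=50, sd=10)
print(round(chi2, 4), df)
```

The resulting statistic would then be compared with the right-tail critical value for `df` degrees of freedom, taken from a chi-square table.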

Chi-Square Test For Independence


• To ascertain independence of random variables.
▫ $H_0$: $X$ and $Y$ are independent
▫ $H_1$: $X$ and $Y$ are not independent.
▫ Usually used with crosstab tables of $X$ and $Y$.
▫ $X$ and $Y$ are usually qualitative.

Crosstab of Gender vs Shoe Size.
Is gender of customer independent from shoe size purchased?

Y \ X         Male    Female   Row Sums
Large         29      12       41
Small         20      25       45
Column Sums   49      37       86

Calculate expected frequencies:
$E_i = \frac{(\text{Row sum}) \cdot (\text{Col sum})}{\text{Total sum}}$

Calculate each cell's $\chi^2$ error:
$\chi_i^2 = \frac{(f_i - E_i)^2}{E_i}$

              Male              Female
Y \ X         E_i     chi_i^2   E_i     chi_i^2   Row Sums
Large         23.36   1.36      17.64   1.80      41
Small         25.64   1.24      19.36   1.64      45
Column Sums   49                37                86

Test statistic (sum over all $k$ cells):
$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - E_i)^2}{E_i}$
d.f. $v = (\text{rows} - 1) \cdot (\text{cols} - 1)$
Critical value: always right-tailed.

In this example:
Test statistic: $\chi^2 = 6.04$
d.f. $v = (2 - 1) \cdot (2 - 1) = 1$
Critical value: $\chi_c^2 = 5.02$
Inference: Reject $H_0$.
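
The crosstab calculation can be reproduced with a small Python sketch. The function name `chi_square_independence` is a label chosen here for illustration. The critical value 5.02 is taken from the slide as given; it appears to correspond to the df = 1 upper-tail value at $\alpha = 0.025$ rather than 0.05, though the slide does not state $\alpha$.

```python
# Observed crosstab from the slide: rows are shoe size, columns are gender.
observed = [[29, 12],   # Large: Male, Female
            [20, 25]]   # Small: Male, Female

def chi_square_independence(table):
    """Return (chi2, expected, cells, df) for a test of independence."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    # E = (row sum) * (col sum) / (total sum) for every cell
    expected = [[r * c / total for c in col_sums] for r in row_sums]
    # Per-cell chi-square contribution (f - E)^2 / E
    cells = [[(f - e) ** 2 / e for f, e in zip(frow, erow)]
             for frow, erow in zip(table, expected)]
    chi2 = sum(sum(row) for row in cells)
    df = (len(row_sums) - 1) * (len(col_sums) - 1)
    return chi2, expected, cells, df

chi2, expected, cells, df = chi_square_independence(observed)

CRITICAL = 5.02               # critical value quoted on the slide
reject_h0 = chi2 > CRITICAL   # always right-tailed
print(round(chi2, 2), df, reject_h0)
```

Without intermediate rounding the statistic comes out near 6.05; the slide's 6.04 is the sum of the four cell values after rounding each to two decimals. Either way $H_0$ is rejected.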

Statistical Inference for Variance


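Test statistic:
$\chi^2 = \frac{(n - 1) \cdot s^2}{\sigma_0^2}$
d.f. $v = n - 1$

Critical value(s) may be left-, right-, or two-tailed.

▫ Two-tailed: $H_0: \sigma^2 = \sigma_0^2$, $H_1: \sigma^2 \ne \sigma_0^2$
  Area $\alpha/2$ in each tail; critical values $\chi^2_{c,1-\alpha/2}$ (lower) and $\chi^2_{c,\alpha/2}$ (upper).
▫ Right-tailed: $H_0: \sigma^2 \le \sigma_0^2$, $H_1: \sigma^2 > \sigma_0^2$
  Area $\alpha$ in the right tail; critical value $\chi^2_{c,\alpha}$.
▫ Left-tailed: $H_0: \sigma^2 \ge \sigma_0^2$, $H_1: \sigma^2 < \sigma_0^2$
  Area $\alpha$ in the left tail; critical value $\chi^2_{c,1-\alpha}$.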
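
As a rough illustration of the three tail cases, the following Python sketch computes the test statistic and applies a decision rule. The sample and the hypothesised variance $\sigma_0^2 = 0.25$ are hypothetical, and the critical values are standard chi-square table entries for 9 degrees of freedom, supplied by hand because the Python standard library has no chi-square quantile function.

```python
from statistics import variance

def variance_chi2(sample, sigma0_sq):
    """Return (chi2, df) where chi2 = (n - 1) * s^2 / sigma0^2."""
    n = len(sample)
    s2 = variance(sample)          # sample variance (divisor n - 1)
    return (n - 1) * s2 / sigma0_sq, n - 1

def reject(chi2, tail, lower=None, upper=None):
    """Decision rule; critical values come from a chi-square table."""
    if tail == "two":
        return chi2 < lower or chi2 > upper
    if tail == "right":
        return chi2 > upper
    if tail == "left":
        return chi2 < lower
    raise ValueError(f"unknown tail: {tail}")

# Hypothetical sample of n = 10 measurements, testing against sigma0^2 = 0.25.
sample = [4.1, 5.2, 6.3, 5.8, 4.9, 5.5, 6.1, 4.4, 5.0, 5.7]
chi2, df = variance_chi2(sample, sigma0_sq=0.25)

# Table values for df = 9 at alpha = 0.05.
two_tailed = reject(chi2, "two", lower=2.700, upper=19.023)   # alpha/2 per tail
right_tailed = reject(chi2, "right", upper=16.919)            # alpha in right tail
left_tailed = reject(chi2, "left", lower=3.325)               # alpha in left tail
print(round(chi2, 2), df, two_tailed, right_tailed, left_tailed)
```

With these made-up numbers the same statistic is rejected by the right-tailed test but not by the two-tailed one, which shows why the form of $H_1$ has to be fixed before looking at the data.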

Comparing Two Population Variances


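▫ Two-tailed: $H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 \ne \sigma_2^2$
  Test statistic: $F = \frac{s_{Large}^2}{s_{Small}^2}$
  df1 $v_1 = n_{Large} - 1$
  df2 $v_2 = n_{Small} - 1$
  Area $\alpha/2$ in the right tail; critical value $F_{df_1, df_2, \alpha/2}$.

▫ Right-tailed: $H_0: \sigma_1^2 \le \sigma_2^2$, $H_1: \sigma_1^2 > \sigma_2^2$
  Test statistic: $F = \frac{s_1^2}{s_2^2}$
  Because our null hypothesis is $\sigma_1^2 \le \sigma_2^2$,
  we implicitly assume that $F$ should not be
  large enough to reject $H_0$ whenever $s_1^2$ is
  not significantly larger than $s_2^2$.
  Area $\alpha$ in the right tail; critical value $F_{df_1, df_2, \alpha}$.

▫ Left-tailed: $H_0: \sigma_1^2 \ge \sigma_2^2$, $H_1: \sigma_1^2 < \sigma_2^2$
  Test statistic: $F = \frac{s_2^2}{s_1^2}$
  By similar reasoning, our $F$ this time will be
  significantly larger when $s_1^2$ is sufficiently
  smaller than $s_2^2$, thereby rejecting $H_0$.
  Area $\alpha$ in the right tail; critical value $F_{df_2, df_1, \alpha}$.

Note: $F_{df_1, df_2, \alpha} = 1 / F_{df_2, df_1, 1-\alpha}$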
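
The way the ratio and its degrees of freedom are arranged in each case can be sketched in Python as below. The function names and the sample variances and sizes are hypothetical, chosen only to show the bookkeeping; the critical value would still be looked up in an F table using the returned degrees of freedom.

```python
def f_two_tailed(s2_a, n_a, s2_b, n_b):
    """H1: variances differ. Larger sample variance goes on top."""
    if s2_a >= s2_b:
        return s2_a / s2_b, n_a - 1, n_b - 1
    return s2_b / s2_a, n_b - 1, n_a - 1

def f_right_tailed(s2_1, n_1, s2_2, n_2):
    """H1: sigma1^2 > sigma2^2, so F = s1^2 / s2^2 with (df1, df2)."""
    return s2_1 / s2_2, n_1 - 1, n_2 - 1

def f_left_tailed(s2_1, n_1, s2_2, n_2):
    """H1: sigma1^2 < sigma2^2, so F = s2^2 / s1^2 with (df2, df1)."""
    return s2_2 / s2_1, n_2 - 1, n_1 - 1

# Hypothetical summary statistics: group 1 has s^2 = 4.0 from n = 8,
# group 2 has s^2 = 10.0 from n = 12.
two = f_two_tailed(4.0, 8, 10.0, 12)      # compare with F at alpha/2
right = f_right_tailed(4.0, 8, 10.0, 12)  # compare with F at alpha
left = f_left_tailed(4.0, 8, 10.0, 12)    # compare with F at alpha
print(two, right, left)
```

In every case the rejection region is in the right tail of the returned $F$; only the ordering of the variances and of the degrees of freedom changes.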

Use Of Variance Equality Test


• Explicitly, such as when statistical objective is
exactly to test if variances are equal or not.
▫ Eg: Do two selected stocks have the same mean risk?

• Implicitly, when there is an assumption of equal
variance in a statistical goal. Eg:
▫ Two-sample test of means equality for small sample
sizes,
▫ ANOVA,
▫ etc
• For additional information – when knowledge of
equality of variance helps us frame a better model
to investigate the data.

ANOVA – Comparing 3 Or More Means


H0: 𝜇1 = 𝜇2 = 𝜇3 = ⋯ = 𝜇𝑘 (all means are equal)
H1: At least 1 𝜇𝑖 is different (NOT all means are equal)
• 𝑘 populations with 𝑘 means. Goal is to test whether
all means are equal (to some value which we do not
know).
• One way is to test $\binom{k}{2}$ combinations of pairs of means
using t-tests at $\alpha$ significance level. But combining
and interpreting $\binom{k}{2}$ results would (i) be tedious, (ii)
tend to make Type I probability exceed $\alpha$.
• We could use Tukey simultaneous test, which is not
in our syllabus (thank goodness!)
• Or we could use ANOVA!
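
To get a feel for point (ii), the sketch below counts the pairwise tests and estimates the chance of at least one false rejection. It assumes the $\binom{k}{2}$ tests are independent, which pairwise t-tests sharing the same groups are not, so the figure is only a rough indication of the inflation.

```python
from math import comb

def pairwise_tests(k):
    """Number of pairwise comparisons among k means: C(k, 2)."""
    return comb(k, 2)

def familywise_error(k, alpha):
    """P(at least one Type I error), ASSUMING the C(k, 2) tests are independent."""
    return 1 - (1 - alpha) ** comb(k, 2)

m = pairwise_tests(4)               # 4 groups need 6 t-tests
fwe = familywise_error(4, 0.05)     # roughly 0.26 rather than 0.05
print(m, round(fwe, 4))
```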

ANOVA
H0: 𝜇1 = 𝜇2 = 𝜇3 = ⋯ = 𝜇𝑘 (all means are equal)
H1: At least 1 𝜇𝑖 is different (NOT all means are equal)
• $SST = \sum_{i=1}^{k} n_i \cdot (\bar{x}_i - \bar{x})^2$
• $SSE = \sum_{j=1}^{n_1} (x_{1j} - \bar{x}_1)^2 + \sum_{j=1}^{n_2} (x_{2j} - \bar{x}_2)^2 + \cdots + \sum_{j=1}^{n_k} (x_{kj} - \bar{x}_k)^2$
• $MST = \frac{SST}{k - 1}$, $MSE = \frac{SSE}{n - k}$

Test statistic:
$F = \frac{MST}{MSE}$
df1 $v_1 = k - 1$
df2 $v_2 = n - k$
Always right-tailed. Critical value: $F_{df_1, df_2, \alpha}$

• $k$ groups each with $n_i$ sample size ($i = 1 \dots k$)
• $\bar{x}_i$ is mean of group $i$'s data.
• $\bar{x}$ is mean of all data (regardless of group boundary)
• $x_{ij}$ is the $j$th piece of data in group $i$.
• SST is also SSbetween
• SSE is also SSwithin
• SSTotal = SST + SSE

Concept Of Experimental Design


• A perfume company has 4 new products but is not sure if
it should market all of them. Perhaps consumers
might be receptive to just one of these new products.
• 4 groups of randomly chosen consumers were each given
one of the new products. Their receptiveness was
surveyed on a scale of 1 to 10. (Data shown in next
slide)
• Perfume company’s statistical question is:
▫ Are the potential consumer populations for each of the
4 products equally receptive?
▫ If so, company would launch all 4 (perhaps with divided
attention), but capturing a more diversified market.
▫ Otherwise, company would launch selectively one
product with same budget, probably with more success.

Calculating ANOVA

SN                     A        B        C        D
1                      6        7        5        9
2                      6        6        5        8
3                      7        7        5        3
4                      5        4        6        4
5                      4        7        7        3
6                      3        1        7        1
7                      8        5        3        7
8                      5        2        9        4
9                      2        7
10                     3        6
$n_i$:                 10       10       8        8
Total:                 49       52       47       39
Group mean $\bar{x}_i$: 4.9      5.2      5.875    4.875
SST (between):         0.8670   0.0003   3.7052   0.8164
$s$:                   1.9120   2.2010   1.8077   2.7999
$s^2$:                 3.6556   4.8444   3.2679   7.8393
SSE (within):          32.9     43.6     22.875   54.875

$\alpha$: 0.05
$n$: 36
No. of Group $k$: 4
All Mean $\bar{x}$: 5.1944
All Variance $s_{Total}^2$: 4.5611
Test Statistic $F$: 0.3727
p-value: 0.7733
Critical Value $F_c$: 2.9011

                       SS       df    MS       F
Between (SST, MST):    5.3889   3     1.7963   0.3727
Within (SSE, MSE):     154.25   32    4.8203
SSTotal:               159.64

• Since $F = 0.3727 < 2.9011$, we do NOT reject $H_0$, and
conclude that all mean receptiveness scores are equal at a
significance level of 5%.
• All receptiveness scores are statistically equal. But the
perfume company eventually did not launch any product. Why?
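
The table can be reproduced directly from the SST and SSE definitions with a short Python sketch. The data are the slide's; reading the two values in rows 9 and 10 as belonging to groups A and B is an inference from the stated sample sizes and totals. The function name `one_way_anova` is a label chosen here, and the critical value is the one quoted on the slide.

```python
# Receptiveness scores from the slide (rows 9 and 10 belong to A and B).
groups = {
    "A": [6, 6, 7, 5, 4, 3, 8, 5, 2, 3],
    "B": [7, 6, 7, 4, 7, 1, 5, 2, 7, 6],
    "C": [5, 5, 5, 6, 7, 7, 3, 9],
    "D": [9, 8, 3, 4, 3, 1, 7, 4],
}

def one_way_anova(groups):
    """One-way ANOVA computed straight from the SST / SSE definitions."""
    data = [x for g in groups.values() for x in g]
    n, k = len(data), len(groups)
    grand_mean = sum(data) / n
    means = {name: sum(g) / len(g) for name, g in groups.items()}
    # Between-group sum of squares: n_i * (group mean - grand mean)^2
    sst = sum(len(g) * (means[name] - grand_mean) ** 2
              for name, g in groups.items())
    # Within-group sum of squares: (x_ij - group mean)^2
    sse = sum((x - means[name]) ** 2
              for name, g in groups.items() for x in g)
    mst, mse = sst / (k - 1), sse / (n - k)
    return {"SST": sst, "SSE": sse, "MST": mst, "MSE": mse,
            "F": mst / mse, "df": (k - 1, n - k)}

result = one_way_anova(groups)
F_CRITICAL = 2.9011                      # quoted on the slide for (3, 32), alpha = 0.05
reject_h0 = result["F"] > F_CRITICAL     # always right-tailed
print({key: (round(v, 4) if isinstance(v, float) else v)
       for key, v in result.items()}, reject_h0)
```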

Secrets To ANOVA Calculation


• Discussion on this slide pertains to use of calculators.
• $SSTotal = (n - 1) \cdot s_{Total}^2$
▫ Enter all values (regardless of group boundaries) into
calculator in 1-var stat mode. Recall sample s.d. [$s_n$],
square it, multiply by $(n - 1)$.
• Each group component of SSE, eg for group 3,
$SSE_3 = \sum_{j=1}^{n_3} (x_{3j} - \bar{x}_3)^2$, is that group's sample variance
multiplied by its sample size less 1.
Ie. $\sum_{j=1}^{n_3} (x_{3j} - \bar{x}_3)^2 = (n_3 - 1) \cdot s_3^2$
▫ Enter that group's values (only) into calculator. Eg, group
3. Recall sample s.d. [$s_n$], square it, multiply by $(n_3 - 1)$.
▫ Sum up all groups' SSEs to get the full SSE.
• Now the magic is: $SST = SSTotal - SSE$. You don't need
to individually find out SST components.
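
The same shortcut can be mimicked in Python, with `statistics.variance` (sample variance, divisor $n - 1$) standing in for the calculator's sample s.d. key. This is a sketch using the perfume data from the previous slide, with the values in rows 9 and 10 read as belonging to groups A and B.

```python
from statistics import variance

# Perfume receptiveness data from the previous slide.
groups = [
    [6, 6, 7, 5, 4, 3, 8, 5, 2, 3],   # A
    [7, 6, 7, 4, 7, 1, 5, 2, 7, 6],   # B
    [5, 5, 5, 6, 7, 7, 3, 9],         # C
    [9, 8, 3, 4, 3, 1, 7, 4],         # D
]

# SSTotal = (n - 1) * s_Total^2, ignoring group boundaries.
all_values = [x for g in groups for x in g]
ss_total = (len(all_values) - 1) * variance(all_values)

# Each group's SSE component is (n_i - 1) * s_i^2; sum them for the full SSE.
sse = sum((len(g) - 1) * variance(g) for g in groups)

# The "magic": SST falls out by subtraction.
sst = ss_total - sse

k, n = len(groups), len(all_values)
f_stat = (sst / (k - 1)) / (sse / (n - k))
print(round(ss_total, 2), round(sse, 2), round(sst, 4), round(f_stat, 4))
```

The values match the ANOVA table (SSTotal 159.64, SSE 154.25, SST 5.3889, F 0.3727) without computing any group-mean deviations from the grand mean.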
