NetId: vdaga2
Section: C1
Homework 2
Exercise 1 (a)
Contingency Table for Treat and Allergy
Table of treat by allergy
(each cell: Frequency / Expected / Percent)

treat           allergy [label missing]   allergy Y          Total
control         1 / 1 / 25.00             1 / 1 / 25.00      2 (50.00)
intervention    1 / 1 / 25.00             1 / 1 / 25.00      2 (50.00)
Total           2 (50.00)                 2 (50.00)          4 (100.00)
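Exercise 1 (b)
Statistics for Table of treat by allergy

Statistic                       DF    Value     Prob
Chi-Square                       1    0.0000    1.0000
Likelihood Ratio Chi-Square      1    0.0000    1.0000
Continuity Adj. Chi-Square       1    0.0000    1.0000
Mantel-Haenszel Chi-Square       1    0.0000    1.0000
Phi Coefficient                       0.0000
Contingency Coefficient               0.0000
Cramer's V                            0.0000

Fisher's Exact Test
Left-sided Pr <= F       0.8333
Right-sided Pr >= F      0.8333
Table Probability (P)    0.6667
Two-sided Pr <= P        1.0000

Sample Size = 4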
From the contingency table in part (a), we can see that every expected count is 1, which is less
than 5, hence we cannot use the Chi-Square test in this case and we have to use Fisher's Exact
Test. Also, we cannot use the Mantel-Haenszel test because the variables are not ordinal.
Interpretation
Ho: The variables treat and allergy are independent
Ha: The variables treat and allergy are associated
Fisher's exact test gives a two-sided p-value of 1.0000, hence we fail to reject Ho and conclude
that there is no significant association between the two variables.
Magnitude of the association: The Contingency Coefficient, Cramer's V and Phi Coefficient all
have value 0. Cramer's V is bounded between 0 and 1, and the Phi Coefficient is bounded between
-1 and 1 for a 2x2 table, so a value of 0 indicates no association in this sample.
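The Fisher's exact p-values above can be checked by hand, because with only four observations there are just three possible tables with the same margins. The following is a minimal sketch using only the Python standard library; the function name `fisher_exact_2x2` is our own for illustration and is not part of the SAS output.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Return (left, right, two_sided) Fisher p-values for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that cell (1,1) equals x with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    left = sum(prob(x) for x in range(lo, a + 1))
    right = sum(prob(x) for x in range(a, hi + 1))
    # Two-sided: add up every table that is no more likely than the observed one.
    two_sided = sum(prob(x) for x in range(lo, hi + 1)
                    if prob(x) <= p_obs * (1 + 1e-9))
    return left, right, two_sided

# Observed table: one subject in each of the four treat-by-allergy cells.
left, right, two_sided = fisher_exact_2x2(1, 1, 1, 1)
print(round(left, 4), round(right, 4), round(two_sided, 4))
```

For the 1/1/1/1 table, the observed table has probability 4/6 and the two more extreme tables have probability 1/6 each, which gives the one-sided values 0.8333 and the two-sided value 1.0000 reported in the output.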
Exercise 1 (c)
Statistics for Table of treat by allergy

                                 (Asymptotic) 95%       (Exact) 95%
              Risk      ASE      Confidence Limits      Confidence Limits
Row 1         0.5000    0.3536    0.0000    1.0000      0.0126    0.9874
Row 2         0.5000    0.3536    0.0000    1.0000      0.0126    0.9874
Total         0.5000    0.2500    0.0100    0.9900      0.0676    0.9324
Difference                       -0.9800    0.9800

Sample Size = 4
The table above gives the risk estimates of allergy for the control group (Row 1) and the
intervention group (Row 2). The risk difference between the two groups is 0.00 and its asymptotic
95% confidence interval is (-0.9800, 0.9800). We can check whether the control group has a
significantly higher rate of peanut allergy than the intervention group by looking at this
confidence interval. Since the 95% confidence interval contains zero, the control group does not
have a significantly higher rate of peanut allergy than the intervention group at the 0.05 level
of significance.
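The asymptotic interval for the risk difference can be reproduced from the cell counts. The sketch below assumes the usual Wald formula (difference plus or minus z times the asymptotic standard error); the function name `risk_difference_wald` is hypothetical and chosen only for this illustration.

```python
from math import sqrt

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

def risk_difference_wald(x1, n1, x2, n2, z=Z_975):
    """Return (difference, ASE, lower, upper) for the risk difference p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # Asymptotic standard error of the difference of two independent proportions.
    ase = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, ase, diff - z * ase, diff + z * ase

# Row 1 (control): 1 of 2; Row 2 (intervention): 1 of 2.
diff, ase, lower, upper = risk_difference_wald(1, 2, 1, 2)
print(f"{diff:.4f} {ase:.4f} {lower:.4f} {upper:.4f}")
```

With two subjects per group the standard error is 0.5, so the interval spans almost the whole range from -1 to 1. This is why the interval is uninformative here, not because the two groups were shown to be equal.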
-------------------------------------------------------------------------------------------------------------
Exercise 2 (a)
Table of carat_cat by price_cat
(each cell: Frequency / Expected)

carat_cat    1-Affordable    2-Average      3-Expensive    Total
1-Small      104 / 50.649    0 / 18.234     0 / 35.117     104
2-Medium     46 / 49.675     47 / 17.883    9 / 34.442     102
3-Large      0 / 49.675      7 / 17.883     95 / 34.442    102
Total        150             54             104            308
Statistic                       DF    Value       Prob
Chi-Square                       4    338.7975    <.0001
Likelihood Ratio Chi-Square      4    388.8967    <.0001
Mantel-Haenszel Chi-Square       1    238.1084    <.0001
Phi Coefficient                       1.0488
Contingency Coefficient               0.7237
Cramer's V                            0.7416
From the contingency table, we can see that all the expected counts are greater than 5 (the
smallest is 17.883), hence we can use the Chi-Square test. We can also use the Mantel-Haenszel
test because both variables are ordinal in this case.
Interpretation
Ho: The variables carat_cat and price_cat are independent
Ha: The variables carat_cat and price_cat are associated
Both the Chi-Square and Mantel-Haenszel tests give p-values below .0001, which is less than the
0.05 significance level, hence we reject the null hypothesis and conclude that there is a
statistically significant association between the two variables.
Magnitude of association: the Contingency Coefficient and Cramer's V have similar values of
0.7237 and 0.7416 respectively, whereas the Phi Coefficient is 1.0488. Phi can exceed 1 for
tables larger than 2x2, so it is not a convenient measure here. Since Cramer's V is bounded
between 0 and 1, it is the better way to assess the strength of association. Its value is
greater than 0.6, hence we conclude that the association between the variables is strong.
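The Chi-Square statistic and the three association measures follow directly from the observed counts in the carat_cat by price_cat table. The sketch below recomputes them with the Python standard library; the function name `association_measures` is our own, and the formulas are the standard textbook definitions, not code taken from the SAS procedure.

```python
from math import sqrt

# Observed frequencies from the carat_cat by price_cat table
# (columns: 1-Affordable, 2-Average, 3-Expensive).
observed = [
    [104, 0, 0],   # 1-Small
    [46, 47, 9],   # 2-Medium
    [0, 7, 95],    # 3-Large
]

def association_measures(table):
    """Pearson chi-square plus phi, contingency coefficient and Cramer's V."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    return {
        "chi2": chi2,
        "phi": sqrt(chi2 / n),                 # can exceed 1 beyond 2x2 tables
        "contingency": sqrt(chi2 / (chi2 + n)),
        "cramers_v": sqrt(chi2 / (n * (k - 1))),
    }

for name, value in association_measures(observed).items():
    print(f"{name}: {value:.4f}")
```

For a 3x3 table, Cramer's V divides the Chi-Square by n times 2 before taking the square root, while Phi divides only by n. This is why Phi comes out above 1 here and V does not.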
Exercise 2 (b)
carat_cat by price_cat
(each cell: Frequency / Percent)

carat_cat    1-Affordable    3-Expensive    Total
2-Medium     46 / 30.67      9 / 6.00       55 / 36.67
3-Large      0 / 0.00        95 / 63.33     95 / 63.33
Total        46 / 30.67      104 / 69.33    150 / 100.00
From the contingency table, we can see that a much larger proportion of the large-sized diamonds
falls in the expensive price category (95 of 95) than of the medium-sized diamonds (9 of 55). We
can further test the significance of this with the help of the risk estimates table as follows:
                                 (Asymptotic) 95%       (Exact) 95%
              Risk      ASE      Confidence Limits      Confidence Limits
Row 1         0.1636    0.0499    0.0659    0.2614      0.0777    0.2880
Row 2         1.0000    0.0000    1.0000    1.0000      0.9619    1.0000
Total         0.6933    0.0376    0.6195    0.7671      0.6129    0.7659
Difference   -0.8364    0.0499   -0.9341   -0.7386
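The exact confidence limits in the risk table are consistent with Clopper-Pearson (exact binomial) limits. The sketch below is an illustration under that assumption, solving the two tail equations by bisection with the standard library only; the helper names `binom_cdf` and `clopper_pearson` are hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided confidence limits for a binomial proportion x / n."""
    def bisect(g):
        # g is increasing in p on (0, 1) and changes sign exactly once.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if g(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit solves P(X >= x | p) = alpha / 2.
    lower = 0.0 if x == 0 else bisect(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper limit solves P(X <= x | p) = alpha / 2 (the CDF decreases in p).
    upper = 1.0 if x == n else bisect(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper

# Expensive diamonds: 9 of 55 medium (Row 1) and 95 of 95 large (Row 2).
for label, x, n in [("Row 1", 9, 55), ("Row 2", 95, 95)]:
    lo, hi = clopper_pearson(x, n)
    print(f"{label}: {lo:.4f} {hi:.4f}")
```

The exact limits are the more useful ones for Row 2: the asymptotic standard error is 0 when the observed risk is 1, so the asymptotic interval collapses to a single point, whereas the exact lower limit is 0.9619.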
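Exercise 3 (a)
Source             DF     Sum of Squares    Mean Square    F Value    Pr > F
Model                2    2928642687        1464321343     712.55     <.0001
Error              305    626784660         2055032
Corrected Total    307    3555427347

Coeff Var    Root MSE    price Mean
28.55947     1433.538    5019.484

Source       DF     Sum of Squares    Mean Square    F Value    Pr > F
carat_cat      2    2928642687        1464321343     712.55     <.0001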
Ho: The mean values of price are equal for all carat categories
Ha: The mean value of price differs for at least one carat category
The ANOVA table gives a p-value less than .0001, hence we reject the null hypothesis and conclude
that not all the carat categories have equal mean values of price.
We assume the following things in one-way ANOVA:
1. The observations are independent, within and across groups.
2. The response variable is normally distributed within each group.
3. The population variances of the response are equal across the groups.
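The F value and the fit statistics in the ANOVA output are simple functions of the sums of squares. The sketch below redoes that arithmetic from the reported values; the model degrees of freedom are taken as 2 (three carat categories minus one), and the variable names are our own.

```python
from math import sqrt

# Values reported in the one-way ANOVA output for price by carat_cat.
ss_model, df_model = 2928642687, 2      # 3 carat categories - 1
ss_error, df_error = 626784660, 305
price_mean = 5019.484

ms_model = ss_model / df_model          # mean square between groups
ms_error = ss_error / df_error          # mean square within groups (MSE)
f_value = ms_model / ms_error

r_square = ss_model / (ss_model + ss_error)   # share of variation explained
root_mse = sqrt(ms_error)
coeff_var = 100 * root_mse / price_mean

print(f"F = {f_value:.2f}")
print(f"R-square = {r_square:.4f}")
print(f"Root MSE = {root_mse:.3f}, Coeff Var = {coeff_var:.4f}")
```

The ratio of the model sum of squares to the total shows that carat category accounts for roughly 82% of the variation in price, which is consistent with the very large F value.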
Exercise 3 (b)
Other than normality, we have to check whether the population variances of the response are equal
across the groups. We can check that with the help of Levene's Test for Homogeneity of
Variance.
Levene's Test for Homogeneity of price Variance
ANOVA of Squared Deviations from Group Means

Source       DF     Sum of Squares    Mean Square    F Value    Pr > F
carat_cat      2    9.589E14          4.794E14       21.73      <.0001
Error        305    6.731E15          2.207E13

Welch's ANOVA for price
Source       DF        F Value    Pr > F
carat_cat    2.0000    712.25     <.0001
Error        182.3
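Levene's test is significant (F = 21.73, p-value less than .0001), so the assumption of equal
variances does not hold and we use Welch's ANOVA, which does not require equal variances.
Welch's ANOVA is highly significant (F = 712.25, p-value less than .0001) and hence we conclude
that the carat categories explain a significant amount of the variation in price values.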
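To show what Welch's ANOVA computes, here is a minimal sketch of the statistic using the Python standard library. The raw diamond prices are not part of this report, so the three small groups below are made-up numbers used only to exercise the function; `welch_anova` is a hypothetical name, and the formula is the standard Welch (1951) weighting by n divided by the sample variance.

```python
from statistics import mean, variance

def welch_anova(groups):
    """Return (F, numerator df, denominator df) for Welch's one-way ANOVA."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Each group is weighted by n_i / s_i^2, so noisy groups count for less.
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    w_total = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / w_total
    between = sum(w * (m - weighted_mean) ** 2
                  for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / w_total) ** 2 / (n - 1)
              for w, n in zip(weights, sizes))
    f_stat = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df_den = (k ** 2 - 1) / (3 * lam)
    return f_stat, k - 1, df_den

# Hypothetical data, NOT the diamond prices from this homework.
example = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [10.0, 11.0, 12.0]]
f_stat, df_num, df_den = welch_anova(example)
print(f"F = {f_stat:.3f}, DF = {df_num}, {df_den:.3f}")
```

The denominator degrees of freedom are generally not an integer, which matches the fractional error DF of 182.3 in the Welch output above.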
Exercise 3 (c)
Tukey's Studentized Range (HSD) Test for price
Note: This test controls the Type I experimentwise error rate.

carat_cat               Difference         Simultaneous 95%
Comparison              Between Means      Confidence Limits
3-Large - 2-Medium        4520.2            4047.4     4993.0    ***
3-Large - 1-Small         7493.5            7023.0     7964.0    ***
2-Medium - 3-Large       -4520.2           -4993.0    -4047.4    ***
2-Medium - 1-Small        2973.3            2502.8     3443.8    ***
1-Small - 3-Large        -7493.5           -7964.0    -7023.0    ***
1-Small - 2-Medium       -2973.3           -3443.8    -2502.8    ***
We can use Tukey's test for comparing all pairwise differences. The table shows that all the
comparisons are statistically significant at the 0.05 level of significance. Hence, the mean
value of price is different for every pair of groups.
Interpretation of pricing for different pairs based on Means and Confidence Intervals
We can see that the large-sized diamonds have the highest mean price, followed by the
medium-sized diamonds, with the small-sized diamonds having the lowest mean price of the three.
If we compare the small-sized diamonds with the other two, they have a mean price 2973.3 less
than that of the medium-sized diamonds, and a mean price 7493.5 less than that of the large-sized
diamonds, with confidence intervals of (-3443.8, -2502.8) and (-7964.0, -7023.0) respectively.
If we compare the medium-sized diamonds with the large-sized diamonds, the medium-sized diamonds
have a mean price 4520.2 less than the large-sized diamonds, with a confidence interval of
(-4993.0, -4047.4). This summarizes how the mean price is ordered across the groups.
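The Tukey intervals can be cross-checked against the ANOVA output. The half-width of each interval should equal a common critical value times sqrt(MSE / 2 * (1/n_i + 1/n_j)). The sketch below assumes the group sizes are the row totals of the Exercise 2 (a) table (104, 102, 102) and backs the critical value out of one reported interval instead of looking it up, so `q_implied` is a derived quantity and not a number from the SAS output.

```python
from math import sqrt

mse = 2055032  # error mean square from the one-way ANOVA table
# Assumed group sizes: row totals of the carat_cat by price_cat table.
n = {"1-Small": 104, "2-Medium": 102, "3-Large": 102}

def pair_se(a, b):
    """Tukey-Kramer standard error term for the pair of groups (a, b)."""
    return sqrt(mse / 2 * (1 / n[a] + 1 / n[b]))

# Half-width of the reported 3-Large - 2-Medium interval (4047.4, 4993.0).
half_large_medium = (4993.0 - 4047.4) / 2
q_implied = half_large_medium / pair_se("3-Large", "2-Medium")

# Predict the 3-Large - 1-Small half-width; the reported interval is (7023.0, 7964.0).
half_large_small_pred = q_implied * pair_se("3-Large", "1-Small")
half_large_small_reported = (7964.0 - 7023.0) / 2

print(f"implied critical value: {q_implied:.3f}")
print(f"predicted {half_large_small_pred:.1f} vs reported {half_large_small_reported:.1f}")
```

The predicted and reported half-widths agree to within rounding, which supports reading all three intervals as coming from the same pooled error variance.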