HW 2 Solution

Name: Varun Daga
NetId: vdaga2
Section: C1
Homework 2
Exercise 1 (a)
Contingency Table for Treat and Allergy
Table of treat by allergy (each cell: Frequency / Expected / Percent)

treat           allergy (label missing)    allergy = Y       Total
control         1 / 1 / 25.00              1 / 1 / 25.00     2 (50.00)
intervention    1 / 1 / 25.00              1 / 1 / 25.00     2 (50.00)
Total           2 (50.00)                  2 (50.00)         4 (100.00)
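Percentage of allergic children in the control group: 50% (1 of 2 children; this cell is 25% of the whole sample)
Percentage of allergic children in the intervention group: 50% (1 of 2 children; this cell is 25% of the whole sample)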
Exercise 1 (b)
Statistics for Table of treat by allergy

Statistic                      DF    Value     Prob
Chi-Square                      1    0.0000    1.0000
Likelihood Ratio Chi-Square     1    0.0000    1.0000
Continuity Adj. Chi-Square      1    0.0000    1.0000
Mantel-Haenszel Chi-Square      1    0.0000    1.0000
Phi Coefficient                      0.0000
Contingency Coefficient              0.0000
Cramer's V                           0.0000

WARNING: 100% of the cells have expected counts less than 5. Chi-Square may not be a valid test.

Fisher's Exact Test
Cell (1,1) Frequency (F)     1
Left-sided Pr <= F           0.8333
Right-sided Pr >= F          0.8333
Table Probability (P)        0.6667
Two-sided Pr <= P            1.0000

Sample Size = 4
From the contingency table in part (a), we can see that every expected count is less than 5 (each
is 1), so the Chi-Square test is not valid here and we have to use Fisher's Exact Test. We also
cannot use the Mantel-Haenszel test because the variables are not ordinal.
Interpretation
Ho: The variables treat and allergy are independent
Ha: The variables treat and allergy are associated
Fisher's exact test gives a two-sided p-value of 1.0000, hence we fail to reject Ho and conclude
that there is no evidence of an association between the two variables.
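The Fisher's exact probabilities can be checked by hand, because with the margins fixed cell (1,1) follows a hypergeometric distribution. The following is a minimal sketch using only the Python standard library; the function name fisher_exact_2x2 is our own and is not part of the SAS output.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With the margins fixed, cell (1,1) is hypergeometric.
    Returns (left-sided, right-sided, table probability, two-sided).
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    lo, hi = max(0, col1 - row2), min(row1, col1)  # feasible values of cell (1,1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_table = prob(a)
    left = sum(prob(x) for x in range(lo, a + 1))
    right = sum(prob(x) for x in range(a, hi + 1))
    # Two-sided: add up every table that is no more likely than the observed one.
    two_sided = sum(prob(x) for x in range(lo, hi + 1)
                    if prob(x) <= p_table * (1 + 1e-9))
    return left, right, p_table, two_sided

# Observed table from part (a): every cell has frequency 1.
left, right, p_table, two_sided = fisher_exact_2x2(1, 1, 1, 1)
print(round(left, 4), round(right, 4), round(p_table, 4), round(two_sided, 4))
```

For this table the observed configuration is also the most likely one, which is why the two-sided p-value reaches its maximum of 1.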
Magnitude of the association: The Contingency Coefficient, Cramer's V and Phi Coefficient all
have value 0. For a 2x2 table the values of Cramer's V and the Phi Coefficient are bounded
between -1 and 1, so a value of 0 indicates that the strength of association is negligible.
Exercise 1 (c)
Statistics for Table of treat by allergy
Column 2 Risk Estimates

                                   (Asymptotic) 95%        (Exact) 95%
              Risk      ASE        Confidence Limits       Confidence Limits
Row 1         0.5000    0.3536      0.0000    1.0000       0.0126    0.9874
Row 2         0.5000    0.3536      0.0000    1.0000       0.0126    0.9874
Total         0.5000    0.2500      0.0100    0.9900       0.0676    0.9324
Difference    0.0000    0.5000     -0.9800    0.9800

Difference is (Row 1 - Row 2)
Sample Size = 4
The table above gives the estimated risk of allergy in the control group (Row 1) and the
intervention group (Row 2). The risk difference between the two groups is 0.00 and its 95%
confidence interval is (-0.9800, 0.9800). We can check whether the control group has a
significantly higher rate of peanut allergy than the intervention group by looking at this
confidence interval. Since the 95% confidence interval contains zero, the control group does
not have a significantly higher rate of peanut allergy than the intervention group at the 0.05
level of significance.
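The asymptotic interval for the risk difference can be reproduced from the cell counts alone. The sketch below assumes the usual Wald form, difference plus or minus z times the ASE, with z = 1.959964; the function name risk_difference_ci is ours.

```python
from math import sqrt

def risk_difference_ci(x1, n1, x2, n2, z=1.959964):
    """Wald interval for p1 - p2, where p1 = x1/n1 and p2 = x2/n2."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # Asymptotic standard error of the difference of two independent proportions.
    ase = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, ase, diff - z * ase, diff + z * ase

# Row 1 (control): 1 allergic of 2; Row 2 (intervention): 1 allergic of 2.
diff, ase, lower, upper = risk_difference_ci(1, 2, 1, 2)
significant = not (lower <= 0 <= upper)  # zero inside the interval: not significant
print(diff, ase, round(lower, 4), round(upper, 4), significant)
```

With only two children per group the standard error is 0.5, so the interval covers almost the whole possible range of a risk difference.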
-------------------------------------------------------------------------------------------------------------
Exercise 2 (a)
Table of carat_cat by price_cat (each cell: Frequency, with Expected in parentheses)

carat_cat    1-Affordable     2-Average       3-Expensive     Total
1-Small      104 (50.649)      0 (18.234)      0 (35.117)     104
2-Medium      46 (49.675)     47 (17.883)      9 (34.442)     102
3-Large        0 (49.675)      7 (17.883)     95 (34.442)     102
Total        150              54             104              308
Statistics for Table of carat_cat by price_cat

Statistic                      DF    Value       Prob
Chi-Square                      4    338.7975    <.0001
Likelihood Ratio Chi-Square     4    388.8967    <.0001
Mantel-Haenszel Chi-Square      1    238.1084    <.0001
Phi Coefficient                      1.0488
Contingency Coefficient              0.7237
Cramer's V                           0.7416

Sample Size = 308
From the contingency table, we can see that all the expected counts are greater than 5 (the
smallest is 17.883), hence we can use the Chi-Square test. We can also use the Mantel-Haenszel
test because both variables are ordinal in this case.
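The expected counts and the Pearson Chi-Square statistic follow directly from the observed frequencies: each expected count is (row total x column total) / n. A minimal sketch in plain Python, with variable names of our own choosing:

```python
# Observed frequencies: rows are carat_cat, columns are price_cat.
observed = [
    [104, 0, 0],   # 1-Small
    [46, 47, 9],   # 2-Medium
    [0, 7, 95],    # 3-Large
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)
min_expected = min(min(row) for row in expected)

print(round(chi_square, 4), df, round(min_expected, 3))
```

The smallest expected count is the quantity that decides whether the Chi-Square approximation is trusted.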
Interpretation
Ho: The variables carat_cat and price_cat are independent
Ha: The variables carat_cat and price_cat are associated
Both the Chi-Square and Mantel-Haenszel tests give p-values below 0.0001, which is less than the
0.05 significance level, hence we reject the null hypothesis and conclude that there is a
statistically significant association between the two variables.
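The Mantel-Haenszel Chi-Square uses the ordering of the categories: it equals (n - 1) r^2, where r is the Pearson correlation between row and column scores. The sketch below assumes integer scores 1, 2, 3 for both variables; that scoring is our assumption, not something shown in the output.

```python
from math import erfc, sqrt

observed = [
    [104, 0, 0],   # 1-Small
    [46, 47, 9],   # 2-Medium
    [0, 7, 95],    # 3-Large
]
row_scores = [1, 2, 3]  # assumed scores for carat_cat
col_scores = [1, 2, 3]  # assumed scores for price_cat

n = sum(sum(row) for row in observed)
cells = [(x, y, observed[i][j])
         for i, x in enumerate(row_scores)
         for j, y in enumerate(col_scores)]

sum_x = sum(x * f for x, _, f in cells)
sum_y = sum(y * f for _, y, f in cells)
ss_x = sum(x * x * f for x, _, f in cells) - sum_x ** 2 / n
ss_y = sum(y * y * f for _, y, f in cells) - sum_y ** 2 / n
ss_xy = sum(x * y * f for x, y, f in cells) - sum_x * sum_y / n

r_squared = ss_xy ** 2 / (ss_x * ss_y)
mh_chi_square = (n - 1) * r_squared
# Upper-tail probability of a chi-square variable with 1 degree of freedom.
p_value = erfc(sqrt(mh_chi_square / 2))

print(round(mh_chi_square, 4), p_value < 0.0001)
```

Because the statistic has a single degree of freedom, it tests specifically for a linear trend across the ordered categories rather than for any departure from independence.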
Magnitude of association: the Contingency Coefficient and Cramer's V have quite similar
values of 0.7237 and 0.7416 respectively, whereas the Phi Coefficient is larger at 1.0488. The
Phi Coefficient is not bounded by 1 for tables larger than 2x2, while Cramer's V is bounded by 1,
so Cramer's V is the best way to assess the strength of association. Hence, we can conclude that
the strength of association between the variables is strong because the value of Cramer's V is
greater than 0.6.
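All three measures are simple functions of the Chi-Square value and the sample size, which makes the difference in their ranges easy to see. A short sketch using the values reported in the statistics table (variable names are ours):

```python
from math import sqrt

chi_square = 338.7975   # Pearson chi-square from the statistics table
n = 308                 # sample size
n_rows, n_cols = 3, 3   # table dimensions

phi = sqrt(chi_square / n)
contingency = sqrt(chi_square / (chi_square + n))
# Cramer's V rescales phi by the smaller table dimension minus 1.
cramers_v = sqrt(chi_square / (n * (min(n_rows, n_cols) - 1)))

print(round(phi, 4), round(contingency, 4), round(cramers_v, 4))
```

For a 3x3 table phi can reach sqrt(2), which is why it exceeds 1 here, while Cramer's V divides out that factor.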
Exercise 2 (b)
Table of carat_cat by price_cat (each cell: Frequency, with Percent in parentheses)

carat_cat    1-Affordable    3-Expensive     Total
2-Medium     46 (30.67)        9 (6.00)       55 (36.67)
3-Large       0 (0.00)        95 (63.33)      95 (63.33)
Total        46 (30.67)      104 (69.33)     150 (100.00)
From the contingency table, we can see that a much larger proportion of the large-sized diamonds
falls in the expensive price category than of the medium-sized diamonds. We can further test the
significance of this with the help of the Risk estimates table as follows:
Column 2 Risk Estimates

                                    (Asymptotic) 95%        (Exact) 95%
              Risk       ASE        Confidence Limits       Confidence Limits
Row 1          0.1636    0.0499      0.0659     0.2614      0.0777    0.2880
Row 2          1.0000    0.0000      1.0000     1.0000      0.9619    1.0000
Total          0.6933    0.0376      0.6195     0.7671      0.6129    0.7659
Difference    -0.8364    0.0499     -0.9341    -0.7386

Difference is (Row 1 - Row 2)
Row 1: Medium-sized diamonds
Row 2: Large-sized diamonds
We can indeed see that the large-sized diamonds are much more likely to be expensive, with a
sample proportion of 1.0000, than the medium-sized diamonds, with a sample proportion of
0.1636. To see whether the difference between the two is significant, we look at the 95%
confidence interval for the difference, which is (-0.9341, -0.7386). Since the 95% confidence
interval does not contain 0, the difference is significant, and the large-sized diamonds have a
larger proportion in the expensive price category than the medium-sized diamonds at the 0.05
level of significance.
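Row 2 shows why the table reports exact limits as well: with 95 of 95 diamonds expensive the ASE is 0 and the asymptotic interval collapses to (1.0000, 1.0000). The sketch below assumes the exact limits are Clopper-Pearson limits, found here by bisection on the binomial tail probabilities; the helper names are ours.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def bisect_increasing(f, lo=0.0, hi=1.0, iters=100):
    """Root of an increasing function f on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided limits for a binomial proportion x/n."""
    # Lower limit: the p at which P(X >= x) equals alpha/2.
    lower = 0.0 if x == 0 else bisect_increasing(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper limit: the p at which P(X <= x) equals alpha/2.
    upper = 1.0 if x == n else bisect_increasing(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper

large = clopper_pearson(95, 95)   # Row 2: large-sized diamonds
medium = clopper_pearson(9, 55)   # Row 1: medium-sized diamonds
print([round(v, 4) for v in large], [round(v, 4) for v in medium])
```

When every observation is a success the lower limit has the closed form (alpha/2)^(1/n), here 0.025^(1/95), so the exact interval stays informative where the asymptotic one does not.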
-------------------------------------------------------------------------------------------------------------
Exercise 3 (a)
Anova table for price with carat_cat as the categorical predictor
Dependent Variable: price

Source              DF    Sum of Squares    Mean Square    F Value    Pr > F
Model                2        2928642687     1464321343     712.55    <.0001
Error              305         626784660        2055032
Corrected Total    307        3555427347

R-Square    Coeff Var    Root MSE    price Mean
0.823710    28.55947     1433.538    5019.484

Source       DF    Anova SS      Mean Square    F Value    Pr > F
carat_cat     2    2928642687    1464321343     712.55     <.0001
Ho: Mean values of price for all carat categories are equal
Ha: Mean value of price for at least one carat category is different
The Anova table gives a p-value less than .0001, hence we reject the null hypothesis and conclude
that not all the carat categories have equal mean values of price.
We assume the following things in one-way Anova:
1. The observations are independent, within and across groups.
2. The response variable is normally distributed within each group.
3. The population variances of the response are equal across the groups.
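Every derived figure in the Anova output can be recomputed from the two sums of squares, their degrees of freedom and the mean price. A minimal sketch using the values reported in the table (variable names are ours):

```python
from math import sqrt

ss_model, df_model = 2928642687, 2     # between-group (carat_cat) variation
ss_error, df_error = 626784660, 305    # within-group variation
price_mean = 5019.484

ms_model = ss_model / df_model
ms_error = ss_error / df_error
f_value = ms_model / ms_error                  # ratio of mean squares
r_square = ss_model / (ss_model + ss_error)    # share of variation explained
root_mse = sqrt(ms_error)
coeff_var = 100 * root_mse / price_mean        # Root MSE as a percentage of the mean

print(round(f_value, 2), round(r_square, 6), round(root_mse, 3), round(coeff_var, 5))
```

The F value compares variation between the carat categories with variation inside them, which is why the equal-variance assumption matters for its validity.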
Exercise 3 (b)
Other than normality, we have to check whether the population variances of the response are
equal across the groups. We can check that with the help of Levene's Test for Homogeneity of
Variance.
Levene's Test for Homogeneity of price Variance
ANOVA of Squared Deviations from Group Means

Source       DF     Sum of Squares    Mean Square    F Value    Pr > F
carat_cat      2    9.589E14          4.794E14       21.73      <.0001
Error        305    6.731E15          2.207E13
Ho: The price variances are equal across the groups
Ha: Some variances are significantly different
The test results show that the p-value is less than 0.0001, hence we reject the null hypothesis
and conclude that some variances are significantly different. Hence, our assumption of equal
variances in part (a) is not valid, and we have to carry out Welch's test.
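The heading "ANOVA of Squared Deviations from Group Means" describes the whole procedure: replace each observation by its squared deviation from its group mean, then run an ordinary one-way Anova on those values. The sketch below illustrates this on made-up numbers, since the diamond data are not reproduced here; the function names and the toy groups are ours.

```python
from statistics import mean

def one_way_f(groups):
    """One-way ANOVA F statistic with its degrees of freedom."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

def levene_squared(groups):
    """Levene's test: one-way ANOVA on squared deviations from group means."""
    squared = [[(y - mean(g)) ** 2 for y in g] for g in groups]
    return one_way_f(squared)

# Hypothetical groups with clearly different spreads (illustration only).
toy_groups = [[10, 11, 12], [0, 10, 20], [5, 10, 15]]
f_levene, df1, df2 = levene_squared(toy_groups)
print(round(f_levene, 3), df1, df2)
```

A large F on the squared deviations means the average spread differs between groups, which is the situation reported for price across the carat categories.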
Welch's ANOVA for price

Source       DF        F Value    Pr > F
carat_cat    2.0000    712.25     <.0001
Error        182.3
Welch's Anova is highly statistically significant, with a p-value less than 0.0001, and hence we
conclude that the carat categories explain a significant amount of the variation in price
values even when equal variances are not assumed.
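Welch's Anova weights each group mean by n / s^2 instead of pooling the variances, and its error degrees of freedom are estimated from the data, which is why the table shows a non-integer value. The sketch below implements the standard Welch formula on made-up numbers, since the raw prices are not reproduced here; the function name and the toy groups are ours.

```python
from statistics import mean, variance

def welch_anova(groups):
    """Welch's heteroscedastic one-way ANOVA: returns (F, df1, df2)."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Weight for each group: sample size divided by sample variance.
    weights = [n_i / variance(g) for n_i, g in zip(sizes, groups)]
    w_sum = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / w_sum

    numerator = sum(w * (m - weighted_mean) ** 2
                    for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / w_sum) ** 2 / (n_i - 1)
              for w, n_i in zip(weights, sizes))
    f_value = numerator / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df2 = (k ** 2 - 1) / (3 * lam)   # estimated error degrees of freedom
    return f_value, k - 1, df2

# Hypothetical groups with unequal spreads (illustration only).
toy_groups = [[10, 11, 12, 13], [0, 10, 20, 30], [5, 10, 15, 20]]
f_welch, df1, df2 = welch_anova(toy_groups)
print(round(f_welch, 3), df1, round(df2, 1))
```

Groups with large variances receive small weights, so they have less influence on both the weighted mean and the F statistic.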
Exercise 3 (c)
Tukey's Studentized Range (HSD) Test for price
Note: This test controls the Type I experimentwise error rate.
Comparisons significant at the 0.05 level are indicated by ***.

carat_cat               Difference         Simultaneous 95%
Comparison              Between Means      Confidence Limits
3-Large - 2-Medium       4520.2             4047.4     4993.0    ***
3-Large - 1-Small        7493.5             7023.0     7964.0    ***
2-Medium - 3-Large      -4520.2            -4993.0    -4047.4    ***
2-Medium - 1-Small       2973.3             2502.8     3443.8    ***
1-Small - 3-Large       -7493.5            -7964.0    -7023.0    ***
1-Small - 2-Medium      -2973.3            -3443.8    -2502.8    ***
We can use Tukey's test for comparing all pairwise differences. The table shows that all the
comparisons are statistically significant at the 0.05 level of significance. Hence, the mean
value of price is different for all the groups.
Interpretation of pricing for different pairs based on Means and Confidence Intervals
We can see that large-sized diamonds have the highest mean price, followed by the medium-sized
diamonds, with the small-sized diamonds having the lowest mean price amongst the three.
If we compare the small-sized diamonds with the other two, their mean price is 2973.3 less
than that of the medium-sized diamonds and 7493.5 less than that of the large-sized
diamonds, with confidence intervals of (-3443.8, -2502.8) and (-7964.0, -7023.0) respectively.
If we compare the medium-sized diamonds with the large-sized diamonds, the medium-sized
diamonds have a mean price 4520.2 less than the large-sized diamonds, with a confidence interval
of (-4993.0, -4047.4). None of these intervals contains 0, so the mean prices are ordered
small < medium < large.
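Reading the Tukey table comes down to one rule: a comparison is significant when its simultaneous interval excludes 0. The sketch below applies that rule to the three distinct pairs from the table and adds two consistency checks; the variable names are ours.

```python
# (difference between means, lower limit, upper limit) from the Tukey table.
tukey = {
    ("3-Large", "2-Medium"): (4520.2, 4047.4, 4993.0),
    ("3-Large", "1-Small"): (7493.5, 7023.0, 7964.0),
    ("2-Medium", "1-Small"): (2973.3, 2502.8, 3443.8),
}

results = {}
for pair, (diff, lower, upper) in tukey.items():
    significant = lower > 0 or upper < 0      # interval excludes zero
    midpoint = (lower + upper) / 2            # should equal the reported difference
    half_width = (upper - lower) / 2
    results[pair] = (significant, midpoint, half_width)
    print(pair, significant, round(midpoint, 1), round(half_width, 1))

all_significant = all(sig for sig, _, _ in results.values())

# Differences between means must add up along the ordering small < medium < large.
additive_gap = (tukey[("3-Large", "2-Medium")][0]
                + tukey[("2-Medium", "1-Small")][0]
                - tukey[("3-Large", "1-Small")][0])
```

The reversed comparisons in the table, such as 1-Small - 3-Large, carry the same information with the signs flipped, so only three pairs need to be checked.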
