
Name: Varun Daga

NetId: vdaga2
Section: C1

Homework 2
Exercise 1 (a)
Contingency Table for Treat and Allergy
Table of treat by allergy
(cell contents: Frequency / Expected / Percent)

treat           allergy
                                  Y                 Total
control         1 / 1 / 25.00     1 / 1 / 25.00     2 / 50.00
intervention    1 / 1 / 25.00     1 / 1 / 25.00     2 / 50.00
Total           2 / 50.00         2 / 50.00         4 / 100.00

Percentage of allergic children in the control group: 50% (1 of 2 children; the 25.00 in the table is the cell's share of all 4 children)
Percentage of allergic children in the intervention group: 50% (1 of 2 children)
Exercise 1 (b)
Statistics for Table of treat by allergy

Statistic                      DF    Value     Prob
Chi-Square                           0.0000    1.0000
Likelihood Ratio Chi-Square          0.0000    1.0000
Continuity Adj. Chi-Square           0.0000    1.0000
Mantel-Haenszel Chi-Square           0.0000    1.0000
Phi Coefficient                      0.0000
Contingency Coefficient              0.0000
Cramer's V                           0.0000

WARNING: 100% of the cells have expected counts less than 5. Chi-Square may not be a valid test.

Fisher's Exact Test
Cell (1,1) Frequency (F)
Left-sided Pr <= F             0.8333
Right-sided Pr >= F            0.8333
Table Probability (P)          0.6667
Two-sided Pr <= P              1.0000

Sample Size = 4

From the contingency table in part (a), we can see that every expected count is 1, which is less
than 5, hence the Chi-Square test is not valid in this case and we have to use Fisher's Exact Test.
Also, we cannot use the Mantel-Haenszel test because the variables are not ordinal.
Interpretation
Ho: The variables treat and allergy are independent
Ha: The variables treat and allergy are associated
Fisher's exact test gives a two-sided p-value of 1.0000, hence we fail to reject Ho and conclude
that there is no significant association between the two variables.
Magnitude of the association: The Contingency Coefficient, Cramer's V and Phi Coefficient all
have value 0. For a 2x2 table, Cramer's V and the Phi Coefficient are bounded between -1 and 1,
so a value of 0 indicates that the strength of association is negligible.
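The Fisher's Exact Test probabilities reported above can be reproduced by hand, which makes it clearer what each line of the output means. The following is a minimal sketch using only the Python standard library; the function name `fisher_exact_2x2` is a hypothetical helper, not part of the SAS output, and it conditions on the observed row and column totals of the table in part (a).

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (left, right, table_prob, two_sided) for cell (1,1),
    conditioning on the observed row and column totals.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability that cell (1,1) equals k
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    lowest = max(0, col1 - row2)   # smallest feasible value of cell (1,1)
    highest = min(row1, col1)      # largest feasible value of cell (1,1)
    support = range(lowest, highest + 1)

    p_obs = prob(a)
    left = sum(prob(k) for k in support if k <= a)
    right = sum(prob(k) for k in support if k >= a)
    # Two-sided: total probability of all tables no more likely than the observed one
    two_sided = sum(prob(k) for k in support if prob(k) <= p_obs * (1 + 1e-9))
    return left, right, p_obs, two_sided


# Observed table from part (a): every cell has frequency 1
left, right, p_table, two_sided = fisher_exact_2x2(1, 1, 1, 1)
print(f"{left:.4f} {right:.4f} {p_table:.4f} {two_sided:.4f}")
```

With all four cells equal to 1, the observed table is the most likely one given the margins, so every possible table counts toward the two-sided p-value, which is why it equals 1.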
Exercise 1 (c)
Statistics for Table of treat by allergy

Column 2 Risk Estimates

                                  (Asymptotic) 95%        (Exact) 95%
              Risk      ASE       Confidence Limits       Confidence Limits
Row 1         0.5000    0.3536     0.0000    1.0000       0.0126    0.9874
Row 2         0.5000    0.3536     0.0000    1.0000       0.0126    0.9874
Total         0.5000    0.2500     0.0100    0.9900       0.0676    0.9324
Difference    0.0000    0.5000    -0.9800    0.9800

Difference is (Row 1 - Row 2)

Sample Size = 4

The table above gives the risk estimates for having the allergy in the control group (Row 1)
and the intervention group (Row 2). The risk difference between the two groups is 0.00 and the
asymptotic 95% confidence interval is (-0.9800, 0.9800). We can check whether the control group
has a significantly higher rate of peanut allergy than the intervention group by looking at this
confidence interval. Since the 95% confidence interval contains zero, the control group does not
have a significantly higher rate of peanut allergy than the intervention group at the 0.05 level
of significance.
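The risk difference, its ASE, and the asymptotic limits in the table can be checked with a few lines of arithmetic. This is a sketch under the assumption that the asymptotic interval is the usual Wald interval (estimate plus or minus 1.96 standard errors); the helper names `risk_and_ase` and `risk_difference_ci` are hypothetical.

```python
from math import sqrt

Z_95 = 1.96  # normal critical value for a 95% interval (rounded)


def risk_and_ase(events, n):
    """Sample proportion and its asymptotic standard error."""
    p = events / n
    return p, sqrt(p * (1 - p) / n)


def risk_difference_ci(events1, n1, events2, n2, z=Z_95):
    """Wald interval for the difference of two independent proportions (row 1 - row 2)."""
    p1, ase1 = risk_and_ase(events1, n1)
    p2, ase2 = risk_and_ase(events2, n2)
    diff = p1 - p2
    ase = sqrt(ase1 ** 2 + ase2 ** 2)
    return diff, ase, diff - z * ase, diff + z * ase


# Control: 1 allergic of 2; intervention: 1 allergic of 2
diff, ase, lower, upper = risk_difference_ci(1, 2, 1, 2)
print(f"{diff:.4f} {ase:.4f} {lower:.4f} {upper:.4f}")
```

The interval is so wide because each group contains only two children, so the standard error of the difference is 0.5.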
-------------------------------------------------------------------------------------------------------------
Exercise 2 (a)
Table of carat_cat by price_cat
(cell contents: Frequency / Expected)

carat_cat    price_cat
             1-Affordable     2-Average       3-Expensive     Total
1-Small      104 / 50.649     0 / 18.234      0 / 35.117      104
2-Medium     46 / 49.675      47 / 17.883     9 / 34.442      102
3-Large      0 / 49.675       7 / 17.883      95 / 34.442     102
Total        150              54              104             308

Statistics for Table of carat_cat by price_cat

Statistic                      DF    Value       Prob
Chi-Square                           338.7975    <.0001
Likelihood Ratio Chi-Square          388.8967    <.0001
Mantel-Haenszel Chi-Square           238.1084    <.0001
Phi Coefficient                      1.0488
Contingency Coefficient              0.7237
Cramer's V                           0.7416

Sample Size = 308

From the contingency table, we can see that all the expected counts are greater than 5,
hence we can use the Chi-Square test. We can also use the Mantel-Haenszel test because the
variables are ordinal in this case.

Interpretation
Ho: The variables carat_cat and price_cat are independent
Ha: The variables carat_cat and price_cat are associated
Both the Chi-Square and Mantel-Haenszel tests give p-values below 0.0001, which is less than the
0.05 significance level, hence we reject the null hypothesis and conclude that there is a
statistically significant association between the two variables.
Magnitude of association: the Contingency Coefficient and Cramer's V have similar values of
0.7237 and 0.7416 respectively, whereas the Phi Coefficient is 1.0488. For tables larger than
2x2 the Phi Coefficient is not bounded by 1, so it is harder to interpret. Since Cramer's V is
bounded by 1, it is the best way to assess the strength of association here. Hence, we conclude
that the association between the variables is strong because the value of Cramer's V is
greater than 0.6.
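The Chi-Square statistic and the three measures of association can be recomputed directly from the observed frequencies, which also shows why the Phi Coefficient exceeds 1 here. This is a minimal sketch in plain Python; the function name `chi_square_measures` is a hypothetical helper and the counts are copied from the contingency table above.

```python
from math import sqrt

# Observed frequencies (rows: carat_cat, columns: price_cat)
observed = [
    [104, 0, 0],   # 1-Small
    [46, 47, 9],   # 2-Medium
    [0, 7, 95],    # 3-Large
]


def chi_square_measures(table):
    """Pearson chi-square plus Phi, Contingency Coefficient and Cramer's V."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1  # min(rows, cols) - 1
    return {
        "chi2": chi2,
        "phi": sqrt(chi2 / n),
        "contingency": sqrt(chi2 / (chi2 + n)),
        "cramers_v": sqrt(chi2 / (n * k)),
    }


measures = chi_square_measures(observed)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

Cramer's V divides the Phi Coefficient by the square root of min(rows, columns) - 1, which is 2 for this 3x3 table, and that rescaling is what keeps it at or below 1.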
Exercise 2 (b)

Table of carat_cat by price_cat
(cell contents: Frequency / Percent)

carat_cat    price_cat
             1-Affordable    3-Expensive     Total
2-Medium     46 / 30.67      9 / 6.00        55 / 36.67
3-Large      0 / 0.00        95 / 63.33      95 / 63.33
Total        46 / 30.67      104 / 69.33     150 / 100.00

With the help of the contingency table, we can see that large-sized diamonds have a much larger
proportion in the expensive price category (95 of 95) than medium-sized diamonds (9 of 55). We
can further test the significance of this with the help of the Risk estimates table as follows:

Column 2 Risk Estimates

                                  (Asymptotic) 95%        (Exact) 95%
              Risk      ASE       Confidence Limits       Confidence Limits
Row 1         0.1636    0.0499     0.0659    0.2614       0.0777    0.2880
Row 2         1.0000    0.0000     1.0000    1.0000       0.9619    1.0000
Total         0.6933    0.0376     0.6195    0.7671       0.6129    0.7659
Difference   -0.8364    0.0499    -0.9341   -0.7386

Difference is (Row 1 - Row 2)

Row 1: Medium-sized diamonds
Row 2: Large-sized diamonds
We can indeed see that large-sized diamonds have a much higher probability of being
expensive, with a sample proportion of 1.0, compared with a sample proportion of 0.1636 for
medium-sized diamonds. To see whether the difference between the two is significant, we look at
the 95% confidence interval for the risk difference (Row 1 - Row 2), which is (-0.9341, -0.7386).
Since the 95% confidence interval does not contain 0, the difference is significant, and
large-sized diamonds have a larger proportion in the expensive price category than medium-sized
diamonds at the 0.05 level of significance.
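For Row 2 the sample proportion is exactly 1, so the ASE is 0 and the asymptotic limits collapse to (1.0000, 1.0000); the exact limits are the more informative ones in that case. The sketch below assumes the "Exact" columns are Clopper-Pearson limits, which have a simple closed form when the proportion is 0 or 1. The function name `exact_limits_at_boundary` is hypothetical.

```python
def exact_limits_at_boundary(events, n, alpha=0.05):
    """Clopper-Pearson limits for the boundary cases events == 0 or events == n."""
    if events == n:
        # All successes: only the lower limit is informative
        return (alpha / 2) ** (1 / n), 1.0
    if events == 0:
        # No successes: only the upper limit is informative
        return 0.0, 1 - (alpha / 2) ** (1 / n)
    raise ValueError("closed form only covers events == 0 or events == n")


# Row 2: 95 of 95 large-sized diamonds are expensive
lower, upper = exact_limits_at_boundary(95, 95)
print(f"{lower:.4f} {upper:.4f}")
```

Under that assumption the lower limit agrees with the 0.9619 shown for Row 2 in the table.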
-------------------------------------------------------------------------------------------------------------
Exercise 3 (a)
Anova table for price with carat_cat as the categorical predictor
Dependent Variable: price

Source             DF     Sum of Squares    Mean Square    F Value    Pr > F
Model              2      2928642687        1464321343     712.55     <.0001
Error              305    626784660         2055032
Corrected Total    307    3555427347

R-Square    Coeff Var    Root MSE    price Mean
0.823710    28.55947     1433.538    5019.484

Source       DF    Anova SS      Mean Square    F Value    Pr > F
carat_cat    2     2928642687    1464321343     712.55     <.0001

Ho: Mean values of price for all carat categories are equal
Ha: Mean value of price for at least one carat category is different
The Anova table gives a p-value less than .0001, hence we reject the null hypothesis and conclude
that not all the carat categories have equal mean values of price.
We assume the following things in one-way Anova:
1. The observations are independent, and identically distributed within each group.
2. The response variable is normally distributed within each group.
3. The population variances of the response are equal across the different groups.
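The summary quantities in the Anova table above follow directly from the sums of squares and degrees of freedom. The short sketch below recomputes them from the values printed in the table; the variable names are chosen for illustration only.

```python
# Values copied from the Anova table for price
ss_model, df_model = 2928642687, 2
ss_error, df_error = 626784660, 305
ss_total = 3555427347
price_mean = 5019.484

ms_model = ss_model / df_model        # mean square between groups
ms_error = ss_error / df_error        # mean square within groups
f_value = ms_model / ms_error         # one-way Anova F statistic
r_square = ss_model / ss_total        # share of price variation explained by carat_cat
root_mse = ms_error ** 0.5            # residual standard deviation
coeff_var = 100 * root_mse / price_mean

print(f"F = {f_value:.2f}")
print(f"R-Square = {r_square:.6f}")
print(f"Root MSE = {root_mse:.3f}")
print(f"Coeff Var = {coeff_var:.5f}")
```

An R-Square of about 0.82 means that the carat categories account for roughly 82% of the total variation in price.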
Exercise 3 (b)
Other than normality, we have to check whether the population variances of the response are
equal across the different groups. We can check that with the help of Levene's Test for
Homogeneity of Variance.
Levene's Test for Homogeneity of price Variance
ANOVA of Squared Deviations from Group Means

Source       DF     Sum of Squares    Mean Square    F Value    Pr > F
carat_cat    2      9.589E14          4.794E14       21.73      <.0001
Error        305    6.731E15          2.207E13

Ho: The price variances are equal across the groups (homogeneity of variance)
Ha: Some variances are significantly different.
The test gives a p-value less than .0001, hence we reject the null hypothesis and conclude
that some variances are significantly different. Hence, our assumption of equal variances in
part (a) is not valid, and we carry out Welch's test, which does not require equal variances.
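As the output header indicates, this version of Levene's test is a one-way Anova applied to the squared deviations of each observation from its group mean. The raw prices are not part of this report, so the sketch below runs on small hypothetical samples purely to illustrate the mechanics; the function names and data are assumptions.

```python
def mean(values):
    return sum(values) / len(values)


def one_way_f(groups):
    """F statistic of a one-way Anova on a list of groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def levene_squared_deviations(groups):
    """Levene's test statistic using squared deviations from the group means."""
    squared = [[(x - mean(g)) ** 2 for x in g] for g in groups]
    return one_way_f(squared)


# Hypothetical data: same spread in both groups, then very different spread
equal_spread = [[1, 2, 3], [2, 3, 4]]
unequal_spread = [[1, 2, 3], [10, 20, 30]]
print(levene_squared_deviations(equal_spread))
print(levene_squared_deviations(unequal_spread))
```

A large F value here signals that the typical squared deviation differs between groups, which is the evidence of unequal variances that the test reports.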
Welch's ANOVA for price

Source       DF        F Value    Pr > F
carat_cat    2.0000    712.25     <.0001
Error        182.3

Welch's Anova is highly statistically significant, with a p-value less than .0001, and hence
we conclude that the carat categories explain a significant amount of the variation in price
even without assuming equal variances.
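Welch's Anova weights each group by its sample size divided by its own variance and adjusts the denominator degrees of freedom, which is why the Error DF in the output (182.3) is not a whole number. The sketch below implements the standard Welch formula on hypothetical data, since the raw prices are not included here; `welch_anova` is an illustrative name.

```python
def welch_anova(groups):
    """Welch's one-way Anova. Returns (F, numerator df, denominator df)."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [sum(g) / len(g) for g in groups]
    variances = [
        sum((x - m) ** 2 for x in g) / (len(g) - 1)
        for g, m in zip(groups, means)
    ]
    # Precision weights: groups with small variance or large size count more
    weights = [n / v for n, v in zip(sizes, variances)]
    w_total = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / w_total

    between = sum(w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / w_total) ** 2 / (n - 1) for w, n in zip(weights, sizes))

    f_value = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df_den = (k ** 2 - 1) / (3 * lam)
    return f_value, k - 1, df_den


# Hypothetical samples for three groups
f_value, df_num, df_den = welch_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f"F = {f_value:.4f}, df = ({df_num}, {df_den:.1f})")
```

When the group variances really are equal, Welch's F is usually close to the ordinary Anova F, which fits the similar values reported here (712.25 versus 712.55).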

Exercise 3 (c)
Tukey's Studentized Range (HSD) Test for price
Note: This test controls the Type I experimentwise error rate.

Comparisons significant at the 0.05 level are indicated by ***.

carat_cat              Difference       Simultaneous 95%
Comparison             Between Means    Confidence Limits
3-Large - 2-Medium      4520.2           4047.4     4993.0    ***
3-Large - 1-Small       7493.5           7023.0     7964.0    ***
2-Medium - 3-Large     -4520.2          -4993.0    -4047.4    ***
2-Medium - 1-Small      2973.3           2502.8     3443.8    ***
1-Small - 3-Large      -7493.5          -7964.0    -7023.0    ***
1-Small - 2-Medium     -2973.3          -3443.8    -2502.8    ***

We can use Tukey's test for comparing all pairwise differences. The table shows that all the
comparisons are statistically significant at the 0.05 level of significance. Hence, the mean
value of price is different for every pair of groups.
Interpretation of pricing for different pairs based on Means and Confidence Intervals
We can see that large-sized diamonds have the highest mean price, followed by medium-sized
diamonds, with small-sized diamonds having the lowest mean price amongst the three.
If we compare the small-sized diamonds with the other two, they have a mean price 2973.3 less
than that of the medium-sized diamonds, and a mean price 7493.5 less than that of the large-sized
diamonds, with confidence intervals of (-3443.8, -2502.8) and (-7964.0, -7023.0) respectively.
If we compare the medium-sized diamonds with the large-sized diamonds, medium-sized diamonds
have a mean price 4520.2 less than the large-sized diamonds, with a confidence interval of
(-4993.0, -4047.4). None of these intervals contains 0, so the mean prices are ordered
Small < Medium < Large.
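As a consistency check on the Tukey intervals, the half-width of each interval should equal one common critical value times the standard error of the difference, sqrt(MSE/2 * (1/n_i + 1/n_j)). The sketch below backs out that implied critical value for each pair. It assumes the group sizes are the row totals from the carat_cat table in Exercise 2 (a) (104, 102, 102), which is an assumption rather than something stated in the Tukey output.

```python
from math import sqrt

MSE = 2055032  # error mean square from the Anova table in Exercise 3 (a)

# Assumed group sizes: row totals of the carat_cat table in Exercise 2 (a)
group_sizes = {"1-Small": 104, "2-Medium": 102, "3-Large": 102}

# Simultaneous 95% confidence limits copied from the Tukey table
intervals = {
    ("3-Large", "2-Medium"): (4047.4, 4993.0),
    ("3-Large", "1-Small"): (7023.0, 7964.0),
    ("2-Medium", "1-Small"): (2502.8, 3443.8),
}


def implied_critical_value(pair):
    """Half-width of the interval divided by the standard error of the difference."""
    first, second = pair
    lower, upper = intervals[pair]
    half_width = (upper - lower) / 2
    std_error = sqrt(MSE / 2 * (1 / group_sizes[first] + 1 / group_sizes[second]))
    return half_width / std_error


implied = {pair: implied_critical_value(pair) for pair in intervals}
for pair, value in implied.items():
    print(pair, round(value, 3))
```

All three pairs imply essentially the same critical value, as expected when a single studentized range quantile is used for every comparison.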
