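(a) Data: $x_1 = 45$, $x_2 = 54$, $x_3 = 36$ and $y_1 = 41$, $y_2 = 50$, $y_3 = 29$.

$\bar{x} = \frac{45 + 54 + 36}{3} = 45$ yrs, $\quad \bar{y} = \frac{41 + 50 + 29}{3} = 40$ yrs,

$s_x^2 = \frac{(45 - 45)^2 + (54 - 45)^2 + (36 - 45)^2}{3 - 1} = 81$ yrs$^2$,

$s_y^2 = \frac{(41 - 40)^2 + (50 - 40)^2 + (29 - 40)^2}{3 - 1} = 111$ yrs$^2$.

The margin of error is (critical value) $\times$ (standard error), with

$\text{s.e.} = \sqrt{s_{\text{pooled}}^2 \left( \frac{1}{n_x} + \frac{1}{n_y} \right)}$, where $s_{\text{pooled}}^2 = \frac{(n_x - 1)\, s_x^2 + (n_y - 1)\, s_y^2}{\text{df}} = \frac{2(81) + 2(111)}{4} = 96$ yrs$^2$,

so that $\text{s.e.} = \sqrt{96 \left( \frac{1}{3} + \frac{1}{3} \right)} = 8$ yrs.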
Therefore, the 95% margin of error = $(t_{4,\,.025})(\text{s.e.}) = (2.776)(8 \text{ yrs}) = 22.208$ yrs, so that the
95% Confidence Interval for $\mu_X - \mu_Y$ = $(5 - 22.2,\ 5 + 22.2) = (-17.2,\ 27.2)$ yrs.
As this interval contains the null value $\mu_X - \mu_Y = 0$, the null hypothesis cannot be rejected
at the $\alpha = .05$ significance level.
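The arithmetic in part (a) can be checked with a short script. This is a minimal sketch using only the Python standard library; the variable names are illustrative, and the critical value 2.776 is copied from the solution rather than computed.

```python
import math
import statistics

# Ages from part (a).
x = [45, 54, 36]
y = [41, 50, 29]

nx, ny = len(x), len(y)
mean_diff = statistics.mean(x) - statistics.mean(y)

# statistics.variance divides by (n - 1), matching s_x^2 and s_y^2 above.
sx2 = statistics.variance(x)
sy2 = statistics.variance(y)

# Pooled variance and standard error of the difference in means.
df = nx + ny - 2
pooled_var = ((nx - 1) * sx2 + (ny - 1) * sy2) / df
std_err = math.sqrt(pooled_var * (1 / nx + 1 / ny))

T_CRIT = 2.776  # tabulated t_{4, .025}, taken from the solution (assumed, not computed)
margin = T_CRIT * std_err
ci = (mean_diff - margin, mean_diff + margin)

print(f"s.e. = {std_err:.3f}, margin = {margin:.3f}, CI = ({ci[0]:.1f}, {ci[1]:.1f})")
```

Because the interval straddles zero, the script reaches the same conclusion as the hand calculation.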
(b) Interpretation: Based on the sample data, there is not enough evidence to conclude that a
statistically significant difference exists between the population mean ages of men and women,
at the $\alpha = .05$ level. (Not surprising, given the small sample sizes, and thus low power of
detecting such a difference, even if present.)
(c) If normality is not established for two independent samples, then a nonparametric test
(specifically, the Wilcoxon Rank Sum Test) should be used instead.
(d) In this design, we have two small, paired samples. Therefore, a paired t-test is appropriate, i.e.,
a t-test of $H_0: \mu_D = 0$ vs. $H_A: \mu_D \neq 0$, where $\mu_D = \mu_X - \mu_Y$, using the single sample of n = 3
individual differences D = {4, 4, 7}. Here, as before, $\bar{d} = 5$ yrs, but

$s_d^2 = \frac{(4 - 5)^2 + (4 - 5)^2 + (7 - 5)^2}{3 - 1} = 3$ yrs$^2$.

So, for $\alpha = .05$ and df = 3 - 1 = 2, we have critical value $t_{2,\,.025} = 4.303$ and standard error $\text{s.e.} = \sqrt{3 \text{ yrs}^2 / 3} = 1$ yr.
Therefore, the 95% margin of error = $(t_{2,\,.025})(\text{s.e.}) = (4.303)(1 \text{ yr}) = 4.303$ yrs, so the
95% Confidence Interval for $\mu_D$ = $(5 - 4.3,\ 5 + 4.3) = (0.7,\ 9.3)$ yrs.
As this interval does not contain the null value $\mu_D = 0$, the null hypothesis can be rejected at
the $\alpha = .05$ significance level.
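The paired analysis reduces to a one-sample calculation on the differences, which can be sketched as follows. The list names `husbands` and `wives` are illustrative labels for the x and y values, and the critical value 4.303 is again copied from the table value quoted in the solution rather than computed.

```python
import math
import statistics

husbands = [45, 54, 36]
wives = [41, 50, 29]

# Reduce the two paired samples to a single sample of differences.
diffs = [h - w for h, w in zip(husbands, wives)]

n = len(diffs)
d_bar = statistics.mean(diffs)
sd2 = statistics.variance(diffs)   # (n - 1) denominator
std_err = math.sqrt(sd2 / n)

T_CRIT = 4.303  # tabulated t_{2, .025}, taken from the solution (assumed, not computed)
margin = T_CRIT * std_err
ci = (d_bar - margin, d_bar + margin)

print(f"diffs = {diffs}, s.e. = {std_err}, CI = ({ci[0]:.1f}, {ci[1]:.1f})")
```

The same data give a much narrower interval here than in the pooled analysis because pairing removes the between-couple variation in age.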
(e) Interpretation: Based on the data from this sample, at the $\alpha = .05$ level, there is a statistically
significant difference between the population mean ages of married couples, with husbands
older than their wives by an average of 5 years.
(f) If normality is not established for two paired samples, then (after reducing them to a single
sample of pairwise differences) a nonparametric test (specifically, the Wilcoxon Signed Rank
Test) should be used instead.
2. Chi-squared Tests
(a) Dark = 48/120 = 0.4, Milk = 40/120 = 0.333, and White = 32/120 = 0.267.
Observed (expected) counts:

Dark Chocolate    Milk Chocolate    White Chocolate
48 (40)           40 (40)           32 (40)
(b) Men | Dark = 12/48 = 0.25, Men | Milk = 16/40 = 0.4, Men | White = 20/32 = 0.625
vs.
Women | Dark = 36/48 = 0.75, Women | Milk = 24/40 = 0.6, Women | White = 12/32 = 0.375
Dark | Men = 12/48 = 0.25, Dark | Women = 36/72 = 0.5 vs.
Milk | Men = 16/48 = 0.333, Milk | Women = 24/72 = 0.333 vs.
White | Men = 20/48 = 0.417, White | Women = 12/72 = 0.167
Observed (expected) counts:

          Dark Chocolate    Milk Chocolate    White Chocolate    Total
Men       12 (19.2)         16 (16.0)         20 (12.8)           48
Women     36 (28.8)         24 (24.0)         12 (19.2)           72
Total     48                40                32                 120
(c) Interpretation: At the 5% level, there is no statistically significant difference shown in the
overall proportions of people who prefer dark, milk, or white chocolate. However, there is
strong evidence that a significant difference exists when stratified on gender. In particular, the
two categorical variables I = Gender (Men / Women) and J = Chocolate preference (Dark /
Milk / White) are not independent of one another; i.e., an association exists between the two.
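The expected counts in both tables, and the test statistics behind the interpretation in (c), can be reproduced with a short sketch. The statistics themselves do not appear in the text above, so the values computed here are a check rather than a quotation. The critical value 5.991 (chi-squared, df = 2, $\alpha$ = .05) is a standard table value supplied as an assumption, and the goodness-of-fit null of equal preference is inferred from the expected counts of 40.

```python
# Observed counts; rows are Men, Women; columns are Dark, Milk, White.
observed = [
    [12, 16, 20],
    [36, 24, 12],
]


def chi_square(obs, exp):
    """Pearson chi-squared statistic for matching lists of counts."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# (a) Goodness of fit: are the three chocolate types equally preferred overall?
gof_expected = [n / len(col_totals)] * len(col_totals)
gof_stat = chi_square(col_totals, gof_expected)

# (b) Independence: expected count = (row total)(column total) / n.
ind_expected = [[r * c / n for c in col_totals] for r in row_totals]
ind_stat = sum(chi_square(o, e) for o, e in zip(observed, ind_expected))

CHI2_CRIT = 5.991  # assumed table value: chi-squared, df = 2, alpha = .05

print(f"goodness of fit: {gof_stat:.2f}, reject H0: {gof_stat > CHI2_CRIT}")
print(f"independence:    {ind_stat:.2f}, reject H0: {ind_stat > CHI2_CRIT}")
```

Both tests have 2 degrees of freedom here: 3 - 1 for the goodness-of-fit test and (2 - 1)(3 - 1) for the test of independence.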
ANOVA Table

Source       df    SS       MS      F = MSTrt / MSErr    p-value
Treatment     2     5754    2877    13.7                 p < .05
Error        60    12600     210
Total        62    18354
(b) Because 13.7 is greater than the tabulated $\alpha = .05$ $F_{2,\,60}$-score of 3.15, it follows that p << .05,
and we can strongly reject the null hypothesis $H_0: \mu_1 = \mu_2 = \mu_3$ at the $\alpha = .05$ significance level.
Interpretation: At the 5% level, there is a statistically significant difference in the mean yields
between the three corn varieties. Specifically, the two experimental varieties do indeed have
significantly higher mean yields than the control, beyond what one would expect by chance.
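The entries of the ANOVA table follow from the sums of squares and degrees of freedom alone. A minimal sketch of that arithmetic, with illustrative variable names and the critical value 3.15 taken from the solution:

```python
# Sums of squares and degrees of freedom from the ANOVA table.
ss_trt, df_trt = 5754, 2
ss_err, df_err = 12600, 60

# Totals are the sums of the treatment and error rows.
ss_total = ss_trt + ss_err
df_total = df_trt + df_err

# Mean squares and the F statistic.
ms_trt = ss_trt / df_trt
ms_err = ss_err / df_err
f_stat = ms_trt / ms_err

F_CRIT = 3.15  # tabulated F_{2, 60} at alpha = .05, as quoted in the solution
reject_h0 = f_stat > F_CRIT

print(f"MS_Trt = {ms_trt}, MS_Err = {ms_err}, F = {f_stat:.1f}, reject H0: {reject_h0}")
```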
[Figure: $F_{2,\,60}$ distribution, with the critical value 3.15 and the observed statistic 13.7 marked.]
[Figure: the data plotted with the fitted line $\hat{Y} = 121.4 - 11.1X$; residuals +20.8, -17.0, -14.8, -2.6, +13.6.]
(a) $s_{xy} = \frac{1}{5 - 1} \left[ (2 - 6)(120 - 54.8) + (4 - 6)(60 - 54.8) + (6 - 6)(40 - 54.8) + (8 - 6)(30 - 54.8) + (10 - 6)(24 - 54.8) \right] = -111$.
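(b) $r = \frac{s_{xy}}{s_x\, s_y} = \frac{-111}{\sqrt{10}\, \sqrt{1515.2}} \approx -0.902$

(c) $SS_{\text{Total}} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = (n - 1)\, s_y^2 = 4\,(1515.2) = 6060.8$

(d) $b_1 = \frac{s_{xy}}{s_x^2} = \frac{-111}{10} = -11.1$, and $b_0 = \bar{y} - b_1 \bar{x} = 54.8 - (-11.1)(6) = 121.4$.
Hence, the equation of the least squares regression line is $\hat{Y} = 121.4 - 11.1X$.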
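The summary statistics and regression coefficients in parts (a) through (d) can be recomputed directly from the five data points. This is a sketch using only the standard library; the variable names are illustrative.

```python
import math
import statistics

x = [2, 4, 6, 8, 10]        # predictor values
y = [120, 60, 40, 30, 24]   # observed responses

n = len(x)
x_bar = statistics.mean(x)
y_bar = statistics.mean(y)

# Sample covariance and variances, all with the (n - 1) denominator.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
sx2 = statistics.variance(x)
sy2 = statistics.variance(y)

# Correlation coefficient, total sum of squares, slope, and intercept.
r = sxy / math.sqrt(sx2 * sy2)
ss_total = (n - 1) * sy2
b1 = sxy / sx2
b0 = y_bar - b1 * x_bar

print(f"s_xy = {sxy:.1f}, r = {r:.3f}, SS_Total = {ss_total:.1f}")
print(f"fitted line: Y = {b0:.1f} + ({b1:.1f}) X")
```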
(e)
predictors X    observed responses Y    fitted responses Ŷ    residuals Y - Ŷ
 2              120                     99.2                  +20.8
 4               60                     77.0                  -17.0
 6               40                     54.8                  -14.8
 8               30                     32.6                   -2.6
10               24                     10.4                  +13.6
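The fitted values and residuals, together with the sums of squares built from them, can be reproduced as follows. This sketch takes the coefficients 121.4 and -11.1 from the regression line above as given; the names `B0` and `B1` are illustrative.

```python
x = [2, 4, 6, 8, 10]
y = [120, 60, 40, 30, 24]

B0, B1 = 121.4, -11.1  # intercept and slope of the fitted line

# Fitted responses and residuals (observed minus fitted).
fitted = [B0 + B1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Sums of squares: error, total, and regression (by subtraction).
y_bar = sum(y) / len(y)
ss_err = sum(e ** 2 for e in residuals)
ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_reg = ss_total - ss_err
r_squared = ss_reg / ss_total

for xi, yi, fi, ei in zip(x, y, fitted, residuals):
    print(f"{xi:>2}  {yi:>3}  {fi:6.1f}  {ei:+6.1f}")
print(f"SS_Err = {ss_err:.1f}, SS_Reg = {ss_reg:.1f}, r^2 = {r_squared:.3f}")
```

The signs of the printed residuals run +, -, -, -, +, which is the curved pattern that motivates looking beyond a straight-line model.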
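(f) $SS_{\text{Reg}}$ can be computed directly from its definition, but because $SS_{\text{Total}} = SS_{\text{Reg}} + SS_{\text{Err}}$,
it follows that $SS_{\text{Reg}} = SS_{\text{Total}} - SS_{\text{Err}} = 6060.8 - 1132.4 = 4928.4$.
(g) The coefficient of determination is the square of the correlation coefficient r or, equivalently,
$r^2 = \frac{SS_{\text{Reg}}}{SS_{\text{Total}}} = \frac{4928.4}{6060.8} = 0.813$.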
Interpretation: The negative linear association between Y (protein level) and X (time)
accounts for 81.3% of the total variation in the data. (The remaining 18.7% of the
variation is unaccounted for by the line, and may in fact just be due to random chance.)
This indicates that the linear model is a reasonably good fit to the data. However...
(h) At x = 12 hours, the corresponding point estimate of the response is $\hat{Y} = 121.4 - 11.1(12) = -11.8$ mg/L,
a negative value, which is physically impossible in this situation.
(i) The overall graph of the line, along with its high $r^2$ value, indicates that this model may be
a reasonable, perhaps useful, description of the true manner in which the protein levels
decrease with time, at least over the recorded intervals. However, the residuals clearly
follow a nonlinear pattern of + to -, then back to +, which we would not expect to see if
this were actually the case. Furthermore, as time increases, we might anticipate that the
protein levels would drop to zero as a lower bound, but not below it, as the model yielded
for x = 12 hours above; the margin of error there is almost certainly so large as to make the
estimate practically useless. All of this suggests that, being physically unrealistic, the
linear model has definite limitations, and a nonlinear model might be more appropriate.
(j) The nonlinear model $\hat{Y} = \frac{240}{X}$ exactly fits the original data values for X = 2, 4, 6, 8, 10, is a
mathematically simple relationship and hence easy to interpret, and unlike the linear model,
decreases toward zero as time X gets large, which conforms to physical intuition. When
x = 12 hours, this model predicts that $\hat{Y} = \frac{240}{12} = 20$ mg/dL, a much more realistic value.
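The comparison between the two models can be made concrete with a few lines of code. This is an illustrative sketch; the function names `linear_model` and `reciprocal_model` are hypothetical labels for the two equations discussed above.

```python
import math

x = [2, 4, 6, 8, 10]
y = [120, 60, 40, 30, 24]


def linear_model(t):
    """Least squares line from the regression above."""
    return 121.4 - 11.1 * t


def reciprocal_model(t):
    """Nonlinear model Y = 240 / X."""
    return 240 / t


# The reciprocal model reproduces every observed value; the line does not.
exact_fit = all(math.isclose(reciprocal_model(xi), yi) for xi, yi in zip(x, y))
print("reciprocal model fits all five points:", exact_fit)

# Extrapolating to x = 12 hours under each model.
print(f"linear:     {linear_model(12):.1f}")
print(f"reciprocal: {reciprocal_model(12):.1f}")
```

Only the reciprocal model stays positive under extrapolation, which is the point made in part (j).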