
1. Two Means (t-test)


(a) Age (yrs)

        x          y
    x1 = 45    y1 = 41
    x2 = 54    y2 = 50
    x3 = 36    y3 = 29

$\bar{x} = \frac{45 + 54 + 36}{3} = 45$ yrs,    $\bar{y} = \frac{41 + 50 + 29}{3} = 40$ yrs

$s_x^2 = \frac{(45 - 45)^2 + (54 - 45)^2 + (36 - 45)^2}{3 - 1} = 81$ yrs$^2$

$s_y^2 = \frac{(41 - 40)^2 + (50 - 40)^2 + (29 - 40)^2}{3 - 1} = 111$ yrs$^2$

Note that the sample mean age difference = $\bar{x} - \bar{y}$ = 5 yrs.


Testing H0: $\mu_X - \mu_Y = 0$ versus HA: $\mu_X - \mu_Y \neq 0$, at the $\alpha$ = .05 significance level. The problem
states that the age variable is normally distributed in each population, and these independent
samples are clearly small. This is therefore a good candidate for the two-sample t-test, provided
that equivariance can be established, via the informal condition that the ratio of sample variances
lies between 0.25 and 4. We have $\frac{s_x^2}{s_y^2} = \frac{81}{111} = 0.73$, which satisfies this required criterion.
Next, for $\alpha$ = .05 and df = $n_x + n_y - 2 = (3 + 3) - 2 = 4$, we calculate

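critical value = $t_{4,\,.025}$ = 2.776

standard error s.e. = $s_{pooled} \sqrt{\frac{1}{n_x} + \frac{1}{n_y}} = \sqrt{96}\,\sqrt{\frac{1}{3} + \frac{1}{3}} = \sqrt{96 \cdot \frac{2}{3}} = 8$ yrs,

where $s_{pooled}^2 = \frac{(n_x - 1)\, s_x^2 + (n_y - 1)\, s_y^2}{df} = \frac{2(81) + 2(111)}{4} = 96$ yrs$^2$.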

Therefore,
95% margin of error = $(t_{4,\,.025})$(s.e.) = (2.776)(8 yrs) = 22.208 yrs, so that the
95% Confidence Interval for $\mu_X - \mu_Y$ = $(5 - 22.2,\ 5 + 22.2) = (-17.2,\ 27.2)$ yrs.
As this interval contains the null value $\mu_X - \mu_Y = 0$, the null hypothesis cannot be rejected
at the $\alpha$ = .05 significance level.
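The pooled two-sample calculation above can be checked with a short script. This is only a sketch using the Python standard library; the function name `pooled_t_interval` is an illustrative choice, and the critical value 2.776 is copied from the text rather than computed.

```python
import math
from statistics import mean, variance

x = [45, 54, 36]  # first sample, ages in years
y = [41, 50, 29]  # second sample, ages in years


def pooled_t_interval(x, y, t_crit):
    """Return (difference, standard error, (lower, upper)) for mu_X - mu_Y."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    # Pooled variance: average of the sample variances, weighted by their df.
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    se = math.sqrt(sp2 * (1 / nx + 1 / ny))
    diff = mean(x) - mean(y)
    margin = t_crit * se
    return diff, se, (diff - margin, diff + margin)


T_CRIT_4DF = 2.776  # t_{4, .025}, taken from the text (table lookup)

diff, se, (lower, upper) = pooled_t_interval(x, y, T_CRIT_4DF)
# H0 is rejected only if the null value 0 falls outside the interval.
reject_h0 = not (lower <= 0 <= upper)
print(f"difference = {diff}, s.e. = {se:.3f}")
print(f"95% CI = ({lower:.1f}, {upper:.1f}), reject H0: {reject_h0}")
```

Because the interval straddles zero, `reject_h0` comes out false, in agreement with the conclusion above.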
(b) Interpretation: Based on the sample data, there is not enough evidence to conclude that a
statistically significant difference exists between the population mean ages of men and women,
at the $\alpha$ = .05 level. (Not surprising, given the small sample sizes, and thus low power of
detecting such a difference, even if present.)
(c) If normality is not established for two independent samples, then a nonparametric test
(specifically, the Wilcoxon Rank Sum Test) should be used instead.

(d) In this design, we have two small, paired samples. Therefore, a paired t-test is appropriate, i.e.,
a t-test of H0: $\mu_D = 0$ vs. HA: $\mu_D \neq 0$, where $\mu_D = \mu_X - \mu_Y$, using the single sample of n = 3
individual differences D = {4, 4, 7}. Here, as before, $\bar{d}$ = 5 yrs, but
$s_d^2 = \frac{(4 - 5)^2 + (4 - 5)^2 + (7 - 5)^2}{3 - 1} = 3$ yrs$^2$.
So, for $\alpha$ = .05 and df = 3 - 1 = 2, we have

critical value = $t_{2,\,.025}$ = 4.303

standard error s.e. = $\sqrt{s_d^2 / n} = \sqrt{3\ \text{yrs}^2 / 3}$ = 1 yr.

Therefore, the 95% margin of error = $(t_{2,\,.025})$(s.e.) = (4.303)(1 yr) = 4.303 yrs, so the
95% Confidence Interval for $\mu_D$ = $(5 - 4.3,\ 5 + 4.3) = (0.7,\ 9.3)$ yrs.
As this interval does not contain the null value $\mu_D = 0$, the null hypothesis can be rejected at
the $\alpha$ = .05 significance level.
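The paired analysis reduces to a one-sample interval on the differences, which the following sketch reproduces. It uses the standard library only; the critical value 4.303 is copied from the text, and the variable names are illustrative.

```python
import math
from statistics import mean, stdev

x = [45, 54, 36]  # first member of each pair, ages in years
y = [41, 50, 29]  # second member of each pair, ages in years

# Reduce the two paired samples to one sample of differences.
d = [xi - yi for xi, yi in zip(x, y)]
n = len(d)

d_bar = mean(d)
se = stdev(d) / math.sqrt(n)  # s_d / sqrt(n)

T_CRIT_2DF = 4.303  # t_{2, .025}, taken from the text (table lookup)
margin = T_CRIT_2DF * se
lower, upper = d_bar - margin, d_bar + margin

# H0: mu_D = 0 is rejected when 0 lies outside the interval.
reject_h0 = not (lower <= 0 <= upper)
print(f"d = {d}, mean = {d_bar}, s.e. = {se:.3f}")
print(f"95% CI = ({lower:.1f}, {upper:.1f}), reject H0: {reject_h0}")
```

The same six ages give a much narrower interval here than in the independent-samples analysis, because pairing removes the between-pair variation from the standard error.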
(e) Interpretation: Based on the data from this sample, at the $\alpha$ = .05 level, there is a statistically
significant difference between the population mean ages of married couples, with husbands
older than their wives by an average of 5 years.
(f) If normality is not established for two paired samples, then (after reducing them to a single
sample of pairwise differences) a nonparametric test (specifically, the Wilcoxon Signed Rank
Test) should be used instead.

2. Chi-squared Tests
(a) Dark = 48/120 = 0.4, Milk = 40/120 = 0.333, and White = 32/120 = 0.267.

    Dark Chocolate    Milk Chocolate    White Chocolate
       48 (40)           40 (40)            32 (40)

With the three equal expected values under H0: $\pi_{Dark} = \pi_{Milk} = \pi_{White}$ shown in parentheses, we
have $\chi^2 = \frac{(+8)^2}{40} + \frac{(0)^2}{40} + \frac{(-8)^2}{40} = 3.2$ on 2 df, so that
0.10 < p-value < 0.25. Do not reject H0 at $\alpha$ = .05.
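As a check on the goodness-of-fit statistic, here is a small standard-library sketch. The p-value line relies on the fact that, for exactly 2 degrees of freedom, the chi-squared upper-tail area is exp(-x/2); for other df a table or a statistics library would be needed.

```python
import math

observed = {"Dark": 48, "Milk": 40, "White": 32}

n = sum(observed.values())
expected = n / len(observed)  # equal expected counts under H0

chi2 = sum((obs - expected) ** 2 / expected for obs in observed.values())
df = len(observed) - 1

# Closed-form upper tail, valid only because df == 2.
p_value = math.exp(-chi2 / 2)

print(f"chi-squared = {chi2:.2f} on {df} df, p-value = {p_value:.3f}")
```

The computed p-value should fall inside the tabulated bracket 0.10 < p < 0.25 quoted above.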

(b) Men | Dark = 12/48 = 0.25, Men | Milk = 16/40 = 0.4, Men | White = 20/32 = 0.625

vs.

Women | Dark = 36/48 = 0.75, Women | Milk = 24/40 = 0.6, Women | White = 12/32 = 0.375
Dark | Men = 12/48 = 0.25, Dark | Women = 36/72 = 0.5 vs.
Milk | Men = 16/48 = 0.333, Milk | Women = 24/72 = 0.333 vs.
White | Men = 20/48 = 0.417, White | Women = 12/72 = 0.167

             Dark         Milk         White
             Chocolate    Chocolate    Chocolate    Total
    Men      12 (19.2)    16 (16.0)    20 (12.8)      48
    Women    36 (28.8)    24 (24.0)    12 (19.2)      72
    Total    48           40           32            120

With the expected values shown, we have
$\chi^2 = \frac{(-7.2)^2}{19.2} + \frac{(+7.2)^2}{12.8} + \frac{(+7.2)^2}{28.8} + \frac{(-7.2)^2}{19.2} = 11.25$ on 2 df, so that .0025 < p-value < .005.
Reject H0 strongly at the $\alpha$ = .05 level.
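The expected counts and the test statistic for the two-way table can be reproduced as follows. This is a sketch with the standard library; each expected count is (row total)(column total)/n, and the closed-form p-value again assumes exactly 2 df.

```python
import math

# Observed counts; columns are Dark, Milk, White.
table = {
    "Men": [12, 16, 20],
    "Women": [36, 24, 12],
}

row_totals = {group: sum(counts) for group, counts in table.items()}
col_totals = [sum(col) for col in zip(*table.values())]
n = sum(row_totals.values())

chi2 = 0.0
expected_counts = {}
for group, counts in table.items():
    expected_counts[group] = []
    for j, obs in enumerate(counts):
        # Expected count under independence of the two variables.
        expected = row_totals[group] * col_totals[j] / n
        expected_counts[group].append(expected)
        chi2 += (obs - expected) ** 2 / expected

df = (len(table) - 1) * (len(col_totals) - 1)
p_value = math.exp(-chi2 / 2)  # upper tail; valid only because df == 2

print(expected_counts)
print(f"chi-squared = {chi2:.2f} on {df} df, p-value = {p_value:.4f}")
```

The two Milk cells have observed counts equal to their expected counts, so they add nothing to the sum, which is why only four terms appear in the hand calculation.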

(c) Interpretation: At the 5% level, there is no statistically significant difference shown in the
overall proportions of people who prefer dark, milk, or white chocolate. However, there is
strong evidence that a significant difference exists when stratified on gender. In particular, the
two categorical variables I = Gender (Men / Women) and J = Chocolate preference (Dark /
Milk / White) are not independent of one another; i.e., an association exists between the two.

3. Analysis of Variance (ANOVA)

(a) With sample size n = 20 + 21 + 22 = 63, the grand mean $\bar{x} = \frac{20(130) + 21(150) + 22(151)}{63} = 144$.

$MS_{Trt} = \frac{SS_{Trt}}{df_{Trt}} = \frac{20(130 - 144)^2 + 21(150 - 144)^2 + 22(151 - 144)^2}{3 - 1} = \frac{5754}{2} = 2877$

$MS_{Err} = \frac{SS_{Err}}{df_{Err}} = \frac{(20 - 1)(207.5) + (21 - 1)(204.5) + (22 - 1)(217.5)}{63 - 3} = \frac{12600}{60} = 210$

ANOVA Table

    Source       df      SS       MS     F = MSTrt / MSErr    p-value
    Treatment     2     5754     2877          13.7           p < .05
    Error        60    12600      210
    Total        62    18354

(b) Because 13.7 is greater than the tabulated $\alpha$ = .05 $F_{2,\,60}$-score of 3.15, it follows that p << .05,
and we can strongly reject the null hypothesis H0: $\mu_1 = \mu_2 = \mu_3$, at the $\alpha$ = .05 significance level.
Interpretation: At the 5% level, there is a statistically significant difference in the mean yields
between the three corn varieties. Specifically, the two experimental varieties do indeed have
significantly higher mean yields than the control, beyond what one would expect by chance.
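The ANOVA table entries can be rebuilt from the three group summaries alone. The sketch below uses plain Python; the tuple layout (size, mean, variance) is an illustrative choice, and the critical value 3.15 is copied from the text rather than computed.

```python
# Each group: (sample size, sample mean, sample variance), as used in part (a).
groups = [
    (20, 130, 207.5),
    (21, 150, 204.5),
    (22, 151, 217.5),
]

n = sum(size for size, _, _ in groups)
k = len(groups)

# Grand mean is the size-weighted average of the group means.
grand_mean = sum(size * m for size, m, _ in groups) / n

# Between-group (treatment) and within-group (error) sums of squares.
ss_trt = sum(size * (m - grand_mean) ** 2 for size, m, _ in groups)
ss_err = sum((size - 1) * var for size, _, var in groups)

ms_trt = ss_trt / (k - 1)
ms_err = ss_err / (n - k)
f_stat = ms_trt / ms_err

F_CRIT = 3.15  # tabulated F_{2, 60} at alpha = .05, taken from the text
reject_h0 = f_stat > F_CRIT

print(f"grand mean = {grand_mean}, SSTrt = {ss_trt}, SSErr = {ss_err}")
print(f"MSTrt = {ms_trt}, MSErr = {ms_err}, F = {f_stat:.1f}, reject H0: {reject_h0}")
```

Working from summaries is enough here because SSTrt depends only on the group sizes and means, and SSErr only on the group sizes and variances.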

[Figure: $F_{2,\,60}$ distribution, with the critical value 3.15 and the observed F = 13.7 marked.]

4. Linear Correlation and Regression

[Figure: scatterplot of the data with the least squares line $\hat{Y} = 121.4 - 11.1X$ and the residuals +20.8, -17.0, -14.8, -2.6, +13.6.]

(a) With means $\bar{x}$ = 6 and $\bar{y}$ = 54.8, we have
$s_{xy} = \frac{1}{5 - 1}\,[(2 - 6)(120 - 54.8) + (4 - 6)(60 - 54.8) + (6 - 6)(40 - 54.8) + (8 - 6)(30 - 54.8) + (10 - 6)(24 - 54.8)] = -111$.

(b) $r = \frac{s_{xy}}{s_x\, s_y} = \frac{-111}{\sqrt{10}\,\sqrt{1515.2}} = -0.902$; a strong, negative linear correlation between X and Y.

(c) $SS_{Total} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = (n - 1)\, s_y^2 = 4\,(1515.2) = 6060.8$

(d) $b_1 = \frac{s_{xy}}{s_x^2} = \frac{-111}{10} = -11.1$,    $b_0 = \bar{y} - b_1 \bar{x} = 54.8 - (-11.1)(6) = 121.4$

Hence, the equation of the least squares regression line is $\hat{Y} = 121.4 - 11.1X$.
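Parts (a) through (d) can be verified numerically with the sketch below, which uses only the standard library and the five data points of this problem. The variable names are illustrative.

```python
import math
from statistics import mean, variance

x = [2, 4, 6, 8, 10]        # predictor values
y = [120, 60, 40, 30, 24]   # observed responses

n = len(x)
x_bar, y_bar = mean(x), mean(y)

# Sample covariance, with the same n - 1 divisor as the sample variances.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

r = s_xy / math.sqrt(variance(x) * variance(y))  # correlation coefficient
ss_total = (n - 1) * variance(y)                 # total sum of squares

b1 = s_xy / variance(x)     # slope
b0 = y_bar - b1 * x_bar     # intercept

print(f"s_xy = {s_xy:.1f}, r = {r:.3f}, SSTotal = {ss_total:.1f}")
print(f"fitted line: Y = {b0:.1f} + ({b1:.1f}) X")
```

Note that the slope divides the covariance by the variance of X, not by its standard deviation, which is where the 10 in the denominator of part (d) comes from.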
(e)
    predictors            X               2        4        6        8       10
    observed responses    Y             120       60       40       30       24
    fitted responses      $\hat{Y}$      99.2     77.0     54.8     32.6     10.4
    residuals             $Y - \hat{Y}$ +20.8    -17.0    -14.8     -2.6    +13.6

$SS_{Err} = (+20.8)^2 + (-17.0)^2 + (-14.8)^2 + (-2.6)^2 + (+13.6)^2 = 1132.4$


(By construction, this is the smallest value that the residual sum of squares can attain, of
any regression line for these data.)

(f) $SS_{Reg}$ can be computed directly from its definition, but because $SS_{Total} = SS_{Reg} + SS_{Err}$,
it follows that $SS_{Reg} = SS_{Total} - SS_{Err} = 6060.8 - 1132.4 = 4928.4$.
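The residual table and the sum-of-squares decomposition can be checked together. In this sketch the coefficients 121.4 and -11.1 are taken from part (d), and SSReg is computed directly from its definition so that the identity SSTotal = SSReg + SSErr is tested rather than assumed.

```python
x = [2, 4, 6, 8, 10]        # predictor values
y = [120, 60, 40, 30, 24]   # observed responses

b0, b1 = 121.4, -11.1       # least squares coefficients from part (d)

fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

y_bar = sum(y) / len(y)
ss_err = sum(e ** 2 for e in residuals)            # unexplained variation
ss_reg = sum((fi - y_bar) ** 2 for fi in fitted)   # variation explained by the line
ss_total = sum((yi - y_bar) ** 2 for yi in y)      # total variation

print("fitted   :", [round(f, 1) for f in fitted])
print("residuals:", [round(e, 1) for e in residuals])
print(f"SSErr = {ss_err:.1f}, SSReg = {ss_reg:.1f}, SSTotal = {ss_total:.1f}")
```

The sign pattern of the rounded residuals (positive, then three negatives, then positive) is worth noting, since a well-fitting line would normally leave residuals with no systematic pattern.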

(g) $r^2 = (-0.902)^2 = 0.813$ or, equivalently, $r^2 = \frac{SS_{Reg}}{SS_{Total}} = \frac{4928.4}{6060.8} = 0.813$.

Interpretation: The negative linear association between Y (protein level) and X (time)
accounts for 81.3% of the total variation in the data. (The remaining 18.7% of the
variation is unaccounted for by the line, and may in fact just be due to random chance.)
This indicates that the linear model is a reasonably good fit to the data. However...
(h) At x = 12 hours, the corresponding point estimate of the response is $\hat{Y} = 121.4 - 11.1(12)$
= -11.8 mg/L, a negative value, which is physically impossible in this situation.
(i) The overall graph of the line, along with its high $r^2$ value, indicates that this model may be
a reasonable, perhaps useful, description of the true manner in which the protein levels
decrease with time, at least over the recorded intervals. However, the residuals clearly
follow a nonlinear pattern of +, to -, then back to +, which we would not expect to see if
this were actually the case. Furthermore, as time increases, we might anticipate that the
protein levels would drop to zero as a lower bound, but not less, as the model yielded
for x = 12 hours above; the margin of error there is almost certainly so large as to make the
estimate practically useless. All of this suggests that, being physically unrealistic, the
linear model has definite limitations, and a nonlinear model might be more appropriate.
(j) The nonlinear model $Y = \frac{240}{X}$ exactly fits the original data values for X = 2, 4, 6, 8, 10, is a
mathematically simple relationship and hence easy to interpret, and unlike the linear model,
decreases toward zero as time X gets large, which conforms to physical intuition. When
x = 12 hours, this model predicts that $Y = \frac{240}{12} = 20$ mg/dL, a much more realistic value.
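To see the two models side by side, the following sketch evaluates both at the observed time points and at x = 12 hours. The function names `linear_model` and `reciprocal_model` are illustrative labels, not names from the problem.

```python
x = [2, 4, 6, 8, 10]        # time points
y = [120, 60, 40, 30, 24]   # observed protein levels


def linear_model(t):
    """Least squares line from part (d)."""
    return 121.4 - 11.1 * t


def reciprocal_model(t):
    """Nonlinear model Y = 240 / X from part (j)."""
    return 240 / t


# Residual sum of squares for each model over the observed data.
sse_linear = sum((yi - linear_model(xi)) ** 2 for xi, yi in zip(x, y))
sse_reciprocal = sum((yi - reciprocal_model(xi)) ** 2 for xi, yi in zip(x, y))

print(f"SSErr (linear)     = {sse_linear:.1f}")
print(f"SSErr (reciprocal) = {sse_reciprocal:.1f}")
print(f"prediction at x = 12: linear = {linear_model(12):.1f}, "
      f"reciprocal = {reciprocal_model(12):.1f}")
```

The reciprocal model leaves no residual error on these five points and stays positive for every positive x, whereas the line turns negative once x exceeds roughly 10.9 hours.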
