

HOMEWORK TOPIC 1&2.

Due Thursday January 15 at the beginning of the lab class. Indicate clearly the
procedures used in each exercise.

1.1. The statistics for the data set are as follows:

Sample   Observations                      Mean   St Dev  CV (%)  SE     s²
1        18 29 10 14 16 12 19 21 19 15     17.3   5.33    30.8    1.687  28.46
2        17 12 27 22 20 11 20 24 16 17     18.6   5.04    27.1    1.593  25.38
3        18 23 19 17 34 14 23 20 21 14     20.3   5.77    28.4    1.826  33.34
4        16 23 14 20 14 11 23 15 17 17     17.0   3.94    23.2    1.247  15.56
Average                                    18.3   5.02    27.4    1.588  25.683

Plus 20 (20 added to each observation):

Sample   Observations                      Mean   St Dev  CV (%)  SE
1        38 49 30 34 36 32 39 41 39 35     37.3   5.33    14.3    1.687
2        37 32 47 42 40 31 40 44 36 37     38.6   5.04    13.1    1.593
3        38 43 39 37 54 34 43 40 41 34     40.3   5.77    14.3    1.826
4        36 43 34 40 34 31 43 35 37 37     37.0   3.94    10.7    1.247
Average                                    38.3   5.02    13.1    1.588

Difference  Observations                          Mean   St Dev  SE     s²
S2 - S1     -1 -17 17 8 4 -1 1 3 -3 2              1.3   8.60    2.720  74.01
S4 - S3     -2 0 -5 3 -20 -3 0 -5 -4 3            -3.3   6.57    2.077  43.12

Theoretical values: mean μ = 18, σ = 5, σ² = 25, SE of the mean = 5/SQRT(10) = 1.581

1.1. Total average of the 40 numbers: 18.30

Average of the standard deviations: 5.02
Average standard error of the means: 1.588, similar to the theoretical SE of the means, 1.581 (5/SQRT(10)).

1.2. Adding 20 to each value increases each mean by 20 but does not change the standard deviations or the standard errors of the means. The CVs decrease because the means increase (recall CV = standard deviation/mean).

1.3. Average of the four standard errors: 1.588 (close to the theoretical value)
and
standard deviation of the four means: 1.503 (close to the theoretical value).
Both are close to the theoretical SE = 5/SQRT(10) = 1.581.

1.4. When random samples of size 10 are taken and their means are computed, the variability of the observations within each sample partly cancels out, and the effect of exceptionally large or small values is diluted. A set of sample means therefore has less dispersion about its center than a set of individual observations. We see this here: the standard deviation of the four means (1.503) is much smaller than the average standard deviation of the four samples (5.02). The two are related through the standard error formula s/SQRT(n): 5.02/SQRT(10) = 1.588, which is close to 1.503.
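The values in 1.1 to 1.4 can be checked with a short script. This is a minimal sketch that uses only the Python standard library. `summary` is a hypothetical helper written for this example. It assumes the sample standard deviation uses divisor n - 1, which matches the variances in the table.

```python
from math import sqrt
from statistics import mean, stdev

# The four samples of size 10 from the table in 1.1
samples = [
    [18, 29, 10, 14, 16, 12, 19, 21, 19, 15],
    [17, 12, 27, 22, 20, 11, 20, 24, 16, 17],
    [18, 23, 19, 17, 34, 14, 23, 20, 21, 14],
    [16, 23, 14, 20, 14, 11, 23, 15, 17, 17],
]

def summary(values):
    """Return mean, SD (divisor n - 1), CV in percent and SE of the mean."""
    m = mean(values)
    s = stdev(values)
    return m, s, 100 * s / m, s / sqrt(len(values))

rows = [summary(x) for x in samples]
for i, (m, s, cv, se) in enumerate(rows, start=1):
    print(f"Sample {i}: mean={m:.1f} sd={s:.2f} cv={cv:.1f} se={se:.3f}")

# 1.2: adding a constant shifts the mean but leaves the SD unchanged
shifted = [summary([y + 20 for y in x]) for x in samples]

# 1.3 and 1.4: SD of the four means versus the theoretical SE
means = [r[0] for r in rows]
avg_sd = mean(r[1] for r in rows)
print("average SD:", round(avg_sd, 2))
print("SD of the 4 means:", round(stdev(means), 3))
print("theoretical SE 5/sqrt(10):", round(5 / sqrt(10), 3))
```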

1.5. Using n = 40 (all the samples combined), the distribution of sample means for samples of size n = 40 has

$$\sigma_{\bar Y(n=40)} = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{40}} = 0.79057$$

$$Z = \frac{\bar Y - \mu}{\sigma_{\bar Y(n=40)}} = \frac{18.3 - 18.00}{0.79057} = 0.379473$$

P(Z ≥ 0.38) = 0.3520. Conclusion: this is not an unusual sample for a population with mean 18 and variance 25. Values this large or larger occur about 35% of the time.
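The 1.5 calculation can be reproduced with `statistics.NormalDist` (Python 3.8+). This is a sketch that takes μ = 18 and σ = 5 as the known population values. The unrounded z gives an upper-tail probability that differs slightly from the table value for z = 0.38.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 18, 5          # theoretical population mean and SD
n, ybar = 40, 18.3         # all four samples pooled
se = sigma / sqrt(n)       # standard error of the mean for n = 40
z = (ybar - mu) / se
p_upper = 1 - NormalDist().cdf(z)   # P(Z >= z)
print(round(se, 5), round(z, 4), round(p_upper, 4))
```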

1.6. The average of the two mean differences is -1, close to the expected value of 0.

The average standard deviation is 7.58 (and the average s² = 58.57). This is close to the expected variance, σ² = 50. (Remember that Var(A - B) = σ²A + σ²B = 25 + 25 = 50: even though the samples are subtracted, their variances add.)
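A quick check of 1.6 in Python (standard library only). It forms the differences S2 - S1 and S4 - S3 from the table and averages their means and variances. Because both difference samples have size 10, pooling them gives the same mean as averaging the two means.

```python
from statistics import mean, variance

s1 = [18, 29, 10, 14, 16, 12, 19, 21, 19, 15]
s2 = [17, 12, 27, 22, 20, 11, 20, 24, 16, 17]
s3 = [18, 23, 19, 17, 34, 14, 23, 20, 21, 14]
s4 = [16, 23, 14, 20, 14, 11, 23, 15, 17, 17]

d21 = [b - a for a, b in zip(s1, s2)]   # S2 - S1
d43 = [b - a for a, b in zip(s3, s4)]   # S4 - S3

avg_mean = mean(d21 + d43)                        # expected value 0
avg_var = mean([variance(d21), variance(d43)])    # expected 25 + 25 = 50
print(avg_mean, round(avg_var, 2))
```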


Question 2 ANSWER [15 points]

2.1. Given a normal distribution Y with mean μ = 1.00 and σ² = 4.00 (σ = 2), find:
2.1.1. P(Y ≤ 3.44) = P(Z ≤ (3.44 - 1.00)/2) = P(Z ≤ 1.22) = 1 - 0.1112 = 0.8888

2.1.2. P(0.0 ≤ Y ≤ 2.66) = P(-0.5 ≤ Z ≤ 0.83) = P(Z ≤ 0.83) - P(Z ≤ -0.5), where

P(Z ≤ 0.83) = 0.7967
P(Z ≤ -0.5) = P(Z ≥ 0.5) = 0.3085
so the probability is 0.7967 - 0.3085 = 0.4882.

2.1.3. P(Y ≤ Yo) = 0.6026, so P(Y ≥ Yo) = 0.3974.

From the Z table, P(Z ≥ 0.26) = 0.3974.
Since Zo = (Yo - μ)/σ, Yo = Zo·σ + μ:
P(Y ≥ 0.26·σ + μ) = P(Y ≥ 0.26·2 + 1) = 0.3974

P(Y ≥ 1.52) = 0.3974, so Yo = 1.52.

Checking:
P(Y ≤ 1.52) = 1 - P(Y ≥ 1.52) = 1 - P(Z ≥ (1.52 - 1)/2) = 1 - P(Z ≥ 0.26) = 1 - 0.3974 = 0.6026
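The 2.1.1 to 2.1.3 answers can be checked with `statistics.NormalDist`. This is a minimal sketch assuming Python 3.8+. `inv_cdf` inverts the cumulative probability directly, so no table lookup is needed.

```python
from statistics import NormalDist

Y = NormalDist(mu=1.0, sigma=2.0)    # variance 4, so sigma = 2

p1 = Y.cdf(3.44)                     # 2.1.1  P(Y <= 3.44)
p2 = Y.cdf(2.66) - Y.cdf(0.0)        # 2.1.2  P(0 <= Y <= 2.66)
y0 = Y.inv_cdf(0.6026)               # 2.1.3  Yo with P(Y <= Yo) = 0.6026
print(round(p1, 4), round(p2, 4), round(y0, 2))
```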

2.1.4. Remember that

P(|Z| ≤ Zo): probability that a random Z is numerically less than Zo, that is, lies within the interval (-Zo, Zo)
P(|Z| ≥ Zo): probability that a random Z numerically exceeds Zo, that is, lies outside the interval (-Zo, Zo)

For Y, the interval must be centered on the mean μ = 1, so we look for a distance d such that
P(|Y - μ| ≤ d) = 0.975, so P(|Y - μ| ≥ d) = 1 - 0.975 = 0.025, so 2P(Y ≥ μ + d) = 0.025

P(Y ≥ μ + d) = 0.0125, and P(Z ≥ 2.2414) = 0.0125, so d = 2.2414·σ = 2.2414·2 = 4.48
Right border: P(Y ≥ 1 + 2.2414·2) = P(Y ≥ 5.48) = 0.0125, so Yo = 5.48
Left border: P(Y ≤ 1 - 2.2414·2) = P(Y ≤ -3.48) = 0.0125, so Yo = -3.48

The two borders differ in absolute value because the mean is now 1 instead of 0. Both lie the same distance from the mean: |5.48 - 1| = |-3.48 - 1| = 4.48.

If everything is shifted 1 unit to the left by subtracting 1, then

left border: -3.48 - 1 = -4.48, mean: 1 - 1 = 0, right border: 5.48 - 1 = 4.48.
When the mean is 0, the left and right borders have the same absolute value.

Checking:
P(-3.48 ≤ Y ≤ 5.48) = 1 - P(Y ≤ -3.48) - P(Y ≥ 5.48) = 1 - 2P(Z ≥ 2.24) = 1 - 2·0.0125 = 0.975
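The symmetric interval in 2.1.4 can be found in a few lines. This sketch assumes Python 3.8+ (`statistics.NormalDist`). It gets the z value that leaves 0.0125 in each tail, maps it back to the Y scale around μ = 1, and confirms that the interval holds 97.5% of the distribution.

```python
from statistics import NormalDist

mu, sigma = 1.0, 2.0
Y = NormalDist(mu, sigma)
z = NormalDist().inv_cdf(1 - 0.025 / 2)     # upper 0.0125 point
lower, upper = mu - z * sigma, mu + z * sigma
coverage = Y.cdf(upper) - Y.cdf(lower)      # P(lower <= Y <= upper)
print(round(z, 4), round(lower, 2), round(upper, 2), round(coverage, 3))
```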

2.2. Given that Y is normally distributed with mean μ = 10 and variance σ² = 25 (σ = 5), and that a sample of 25 observations is drawn, the SE of the mean is 5/SQRT(25) = 1.
2.2.1. P(Ȳ ≥ 13) = P(Z ≥ (13 - 10)/1) = P(Z ≥ 3) = 0.0013
2.2.2. P(7 ≤ Ȳ ≤ 13) = P(-3 ≤ Z ≤ 3) = P(|Z| ≤ 3) = 1 - 2P(Z ≥ 3) = 1 - 2·0.0013 = 0.9974
2.2.3. μ = 24 and σ² = 12; find N such that P(Ȳ ≥ 26) = 0.1587.
P(Ȳ ≥ 26) = 0.1587, so P(Z ≥ (26 - 24)/SQRT(12/N)) = 0.1587. Since P(Z ≥ 1) = 0.1587,
2/SQRT(12/N) = 1, so 2 = SQRT(12/N), 4 = 12/N and N = 12/4 = 3.

Checking
Standard deviation of the mean = SQRT(12/3) = 2
P(Ȳ ≥ 26) = P(Z ≥ (26 - 24)/2) = P(Z ≥ 1) = 0.1587
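The 2.2 answers can be checked the same way. This sketch assumes Python 3.8+. For 2.2.3 it inverts 0.1587 to get z (about 1) and then solves 2/SQRT(12/N) = z for N.

```python
from math import sqrt
from statistics import NormalDist

# 2.2.1 and 2.2.2: Y ~ N(10, 25), mean of n = 25 observations
se = 5 / sqrt(25)                       # standard error = 1
ybar = NormalDist(10, se)
p_ge_13 = 1 - ybar.cdf(13)
p_7_13 = ybar.cdf(13) - ybar.cdf(7)

# 2.2.3: mu = 24, sigma^2 = 12, P(Ybar >= 26) = 0.1587
z = NormalDist().inv_cdf(1 - 0.1587)    # close to 1
n = 12 / ((26 - 24) / z) ** 2           # from 2 / sqrt(12/n) = z
print(round(p_ge_13, 4), round(p_7_13, 4), round(n, 2))
```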

2.3. Given a χ² distribution with 12 degrees of freedom, find:

2.3.1. P(χ² ≤ 21.0) = 1 - P(χ² ≥ 21.0) = 1 - 0.05 = 0.95
2.3.2. χo² such that P(χ² ≥ χo²) = 0.10: χo² = 18.5
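The Python standard library has no chi-square distribution. For an even number of degrees of freedom (12 here), the upper tail has the closed form exp(-x/2)·Σ_{i<df/2} (x/2)^i/i!, which the sketch below uses. `chi2_upper_even_df` is a hypothetical helper written for this example, and it rejects odd df.

```python
from math import exp, factorial

def chi2_upper_even_df(x, df):
    """P(chi-square >= x), valid only for an even number of df."""
    if df % 2:
        raise ValueError("closed form requires even df")
    h = x / 2
    return exp(-h) * sum(h ** i / factorial(i) for i in range(df // 2))

p_le_21 = 1 - chi2_upper_even_df(21.0, 12)     # 2.3.1
p_ge_18_5 = chi2_upper_even_df(18.5, 12)      # 2.3.2 check
print(round(p_le_21, 3), round(p_ge_18_5, 3))
```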

2.4. Given a t distribution, find:

2.4.1. Ȳ = 10 g, s² = 4 (population variance unknown). What is the approximate probability that a bag of 16 oysters has a mean weight of less than 8 g?
P(Ȳ ≤ 8) = P(t ≤ (8 - 10)/SQRT(4/16)) = P(t ≤ -2/0.5) = P(t ≤ -4) = P(t ≥ 4) = 0.00058, or about 0.0006 (15 d.f.)

2.4.2. Find to such that 80% of the values lie within the interval (-to, to) for 22 d.f.
P(|t| ≤ to) = 0.80, so P(|t| ≥ to) = 0.20 and to = 1.321
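The standard library has no Student t distribution either. This sketch integrates the t density numerically with Simpson's rule and finds to by bisection. `t_pdf` and `t_cdf` are hypothetical helpers written for this example, not library functions. The result is accurate enough to confirm table values.

```python
from math import gamma, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|] plus symmetry about 0."""
    a = abs(t)
    h = a / steps
    area = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3                       # P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

# 2.4.1: 16 oysters, so 15 df
t_stat = (8 - 10) / sqrt(4 / 16)        # -4.0
p_low = t_cdf(t_stat, 15)

# 2.4.2: to with P(|T| <= to) = 0.80 for 22 df, by bisection
lo, hi = 0.0, 5.0
for _ in range(60):
    mid = (lo + hi) / 2
    if 2 * t_cdf(mid, 22) - 1 < 0.80:
        lo = mid
    else:
        hi = mid
t0 = (lo + hi) / 2
print(round(p_low, 5), round(t0, 3))
```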

Question 3 ANSWER [20 points]

data PROBLEM3;
input GPC1 $ PROTEIN;   * GPC1 = No/Yes (GPC gene absent/present);
cards;
No 11.6
No 9.3
No 11.7
No 12.7
No 13.4
No 9.2
No 7.9
No 14.1
No 12.0
No 10.6
No 12.0
No 10.1
No 12.1
Yes 18.1
Yes 15.2
Yes 17.2
Yes 11.9
Yes 14.7
Yes 12.0
Yes 12.9
Yes 12.7
Yes 10.9
Yes 14.5
Yes 12.8
Yes 12.9
Yes 14.9
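;
run;

PROC SORT;
by GPC1;
run;

* Normality tests and plots for each GPC1 group;
PROC UNIVARIATE normal plot;
by GPC1;
var PROTEIN;
run;

* Two-sample t test, including the folded F test for equal variances;
PROC TTEST;
class GPC1;
var PROTEIN;
run;

* 3.3: power for 13 to 17 per group (alpha = 0.05, two-sided);
* stddev = SQRT(pooled variance 3.811);
proc power;
twosamplemeans
meandiff = 2.6154
stddev = 1.9521
npergroup = 13 14 15 16 17
power = .;
run;

* 3.4: group size needed for power 0.85 at alpha = 0.01;
proc power;
twosamplemeans
meandiff = 2.6154
stddev = 1.9521
alpha = 0.01
npergroup = .
power = 0.85;
run; quit;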

3.1. The protein values in both groups are normally distributed.

Tests for Normality, No GPC
Tests for Normality, Yes GPC: Shapiro-Wilk W = 0.932847, Pr < W = 0.3711

[Figure: Q-Q plots of PROTEIN for each GPC1 group]

The close fit of the Q-Q plots to the line expected under normality agrees with the high W values and the non-significant Shapiro-Wilk tests. We therefore accept the assumption that the data are normally distributed.

3.2. PROC TTEST output (excerpt):

Method         Variances  DF     t Value  Pr > |t|
Satterthwaite  Unequal    23.33  -3.42    0.0023

Equality of Variances
Method    Num DF  Den DF  F Value  Pr > F
Folded F  12      12      1.41     0.5624

We reject the null hypothesis that the two population means are equal. The groups differ significantly (P = 0.0023): the GPC gene increases the protein content of the grain.

3.3. Using SAS (PROC POWER), computed power:

Index  N per Group  Power
1      13           0.906
2      14           0.927
3      15           0.943
4      16           0.956
5      17           0.966

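Hand calculation of power

Using Section 2.3.2, n = 13, so 2n - 2 = 24 df:
t(α/2, 2(n-1)) = t(0.025, 24 df) = 2.064, with α/2 = 0.025

Group    Mean     s     s²
No GPC   11.2846  1.78  3.165
GPC      13.9000  2.11  4.457
Average s² = 3.811

|μ1 - μ2| = 2.6154
SQRT(2s²/n) = SQRT((2·3.811)/13) = 0.765707
so t = 2.6154/0.765707 = 3.416 and t(β) = 3.416 - 2.064 = 1.352.

Or using Section 2.4.4, with δ = |μ1 - μ2| = 2.6154 (δ² = 6.8402) and r = 13:

$$r = 2\,\frac{s^2_{pooled}}{\delta^2}\left(t_{\alpha/2,\,n_1+n_2-2} + t_{\beta,\,n_1+n_2-2}\right)^2$$

$$13 = 2\,\frac{3.811}{6.8402}\left(2.064 + t_{\beta,\,n_1+n_2-2}\right)^2 = 1.1143\left(2.064 + t_{\beta,\,n_1+n_2-2}\right)^2$$

$$t_{\beta,\,n_1+n_2-2} = \sqrt{13/1.1143} - 2.064 = 1.3517$$

Since t(0.10, 24) = 1.318 < 1.3517, β < 0.10 and the power = 1 - β > 0.90.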
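The hand calculation can be reproduced directly from the data in the SAS step. This sketch uses only the standard library. The t value 2.064 is copied from the t table, as in the text. Because both groups have 13 observations, the pooled variance is the simple average of the two variances.

```python
from math import sqrt
from statistics import mean, variance

no_gpc = [11.6, 9.3, 11.7, 12.7, 13.4, 9.2, 7.9, 14.1, 12.0, 10.6, 12.0, 10.1, 12.1]
gpc = [18.1, 15.2, 17.2, 11.9, 14.7, 12.0, 12.9, 12.7, 10.9, 14.5, 12.8, 12.9, 14.9]

n = len(no_gpc)                                      # 13 per group
delta = mean(gpc) - mean(no_gpc)
s2_pooled = (variance(no_gpc) + variance(gpc)) / 2   # equal n per group
t_alpha2 = 2.064                                     # t(0.025, 24 df), from the table

# r = 2 (s^2 / delta^2) (t_alpha/2 + t_beta)^2, solved for t_beta with r = n
factor = 2 * s2_pooled / delta ** 2
t_beta = sqrt(n / factor) - t_alpha2
print(round(delta, 4), round(s2_pooled, 3), round(factor, 4), round(t_beta, 3))
```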

3.4. Power of 85% and α = 0.01; average s² = 3.811.

$$r = 2\,\frac{s^2_{pooled}}{\delta^2}\left(t_{\alpha/2,\,n_1+n_2-2} + t_{\beta,\,n_1+n_2-2}\right)^2$$

Approximating with the normal distribution: r = 2(3.811/2.6154²)(2.575 + 1.035)² = 14.5
Using t with n = 16 (30 df): r = 2(3.811/2.6154²)(2.75 + 1.055)² = 16.13
Hence at least 17 per group are needed for a power of at least 0.85. The actual power with 17, from SAS, is 0.87.

Guesstimate n  df = 2(n - 1)  t0.005  t0.15   Estimated n
16             30             2.75    1.055   16.1
17             32             2.7385  1.0535  16.0

It is not possible to use 16.014 replications, so to achieve a power of at least 0.85 we must round up to 17 replications. The iterations suggest that 17 replications would achieve a power of at least 0.85.
Using SAS (PROC POWER), computed N per group:

Actual Power  N per Group
0.870         17
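The 3.4 formula can be evaluated in a few lines. This sketch uses `statistics.NormalDist` (Python 3.8+) for the normal approximation. For the t-based rows it uses the tabled t values copied from the iteration table above. Exact normal quantiles (2.5758 and 1.0364) give nearly the same r as the rounded 2.575 and 1.035.

```python
from statistics import NormalDist

s2_pooled, delta = 3.811, 2.6154
factor = 2 * s2_pooled / delta ** 2

# Normal approximation: z(0.005) for alpha = 0.01 two-sided, z(0.15) for power 0.85
z = NormalDist()
r_normal = factor * (z.inv_cdf(0.995) + z.inv_cdf(0.85)) ** 2

# t-based rows: guesstimate n -> (t0.005, t0.15) with df = 2(n - 1)
t_table = {16: (2.75, 1.055), 17: (2.7385, 1.0535)}
r_t = {n: factor * (ta + tb) ** 2 for n, (ta, tb) in t_table.items()}
print(round(r_normal, 1), {n: round(r, 2) for n, r in r_t.items()})
```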

Question 4 ANSWER [10 points]


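4.1. Since the experiment was designed to detect an increase in weight, the test is one-tailed, with α = 0.05 (Type I error).
Using the power formula with σ² = 2500 and n = 4,

$$\text{Power} = P\left(z \ge z_\alpha - \frac{|\mu_1 - \mu_2|}{\sigma_{\bar Y_1 - \bar Y_2}}\right) = P\left(z \ge z_\alpha - \frac{|\mu_1 - \mu_2|}{\sqrt{2\sigma^2/n}}\right) = P\left(z \ge 1.645 - \frac{70}{\sqrt{2 \cdot 2500/4}}\right)$$

$$= P(z \ge -0.3349) = 1 - 0.36885 = 0.63$$

This could also be done using the equation from Section 2.4.4 in the lecture notes.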
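A numerical check of 4.1, assuming Python 3.8+. It uses the exact z(0.05) of about 1.6449 instead of the rounded 1.645, which changes only the fourth decimal of the z argument.

```python
from math import sqrt
from statistics import NormalDist

z_alpha = NormalDist().inv_cdf(0.95)      # one-tailed alpha = 0.05
diff, var, n = 70, 2500, 4
se_diff = sqrt(2 * var / n)               # SE of Ybar1 - Ybar2
power = 1 - NormalDist().cdf(z_alpha - diff / se_diff)
print(round(z_alpha - diff / se_diff, 4), round(power, 2))
```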

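4.2. Table (α = 0.01, two-tailed; power = 0.80)
Z(α/2) = Z(0.005) = 2.575, Z(β) = Z(0.20) = 0.8416
(Z(α/2) + Z(β))² = 11.673

$$r = 2\left(\frac{\sigma}{\delta}\right)^2 (Z_{\alpha/2} + Z_\beta)^2$$

Distance (δ/σ)  1/4  1/2  3/4  1   1 1/4  1 1/2  1 3/4  2
N               374  94   42   24  15     11     8      6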
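The table row N can be regenerated from the formula. This is a sketch that assumes the distances are expressed in units of σ (δ/σ) and that each r is rounded up to a whole replication. It uses exact normal quantiles, so (Z(α/2) + Z(β))² comes out near 11.68 rather than 11.673, with the same rounded N.

```python
from math import ceil
from statistics import NormalDist

z = NormalDist()
k = (z.inv_cdf(1 - 0.01 / 2) + z.inv_cdf(0.80)) ** 2      # (Z(0.005) + Z(0.20))^2
distances = [0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]   # delta / sigma
n_required = [ceil(2 * k / d ** 2) for d in distances]
print(round(k, 2), n_required)
```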

Question 5 ANSWER [15 points]

Data Prob5;
Input C T @@;   * C = control, T = treatment (paired samples);
DIFF = C - T;
Cards;
15.675 14.135
17.160 15.510
18.480 15.015
19.745 17.050
19.470 16.720
19.855 19.525
18.590 17.160
18.150 16.610
18.975 17.820
15.620 16.390
;
run;

* Paired t test of C versus T;
Proc TTEST;
paired C*T;
run;

* Power of the one-sample t test on DIFF (mean and SD from PROC TTEST);
proc power;
onesamplemeans
mean = 1.579
ntotal = 10
stddev = 1.223
nullmean = 0
alpha = 0.05
power = .;
run; quit;
Two-sample paired test (Diff = C - T):

DF  t Value  Pr > |t|
9   4.08     0.0028

One-sample power for DIFF (PROC POWER)

Fixed Scenario Elements
Distribution       Normal
Method             Exact
Null Mean          0
Alpha              0.05
Mean               1.579
Total Sample Size  10
Number of Sides    2

Computed Power
Power  0.951
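The paired test statistic can be checked from the raw pairs. This sketch uses only the standard library and forms DIFF = C - T as in the SAS data step. It computes the paired t with 9 df. The exact power (0.951) needs the noncentral t distribution and is not reproduced here.

```python
from math import sqrt
from statistics import mean, stdev

control = [15.675, 17.160, 18.480, 19.745, 19.470, 19.855, 18.590, 18.150, 18.975, 15.620]
treated = [14.135, 15.510, 15.015, 17.050, 16.720, 19.525, 17.160, 16.610, 17.820, 16.390]

diff = [c - t for c, t in zip(control, treated)]   # DIFF = C - T
d_bar, s_d = mean(diff), stdev(diff)
t_stat = d_bar / (s_d / sqrt(len(diff)))           # df = n - 1 = 9
print(round(d_bar, 3), round(s_d, 3), round(t_stat, 2))
```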

5.1. There is a significant difference between treatment and control (P = 0.0028).
5.2. The power of the test is 0.95.
5.3. Graphical representation of DIFF


Question 6 ANSWER [15 points]

Four locations. Average yield = 7,000 lb/ac, standard deviation = 450 lb/ac. How many locations do we need to estimate the true mean yield with a 95% confidence interval narrower than 800 lb/ac? Half-width d = 400 lb/ac.
First estimate, using r = z²σ²/d²: r = 1.96²·202500/160000 = 4.86.
Using r = t²(α/2, r-1)·s²/d², the sample size is estimated iteratively:

Initial n  t(2.5%, n-1)  n
5          2.776         (2.776)²(450)²/400² = 9.75
10         2.262         (2.262)²(450)²/400² = 6.48
7          2.447         (2.447)²(450)²/400² = 7.57
8          2.365         (2.365)²(450)²/400² = 7.07

The iteration settles at 8: starting from 8 locations gives 7.07, which rounds up to 8 again, so about 8 locations are needed.
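The iteration can be automated. This is a sketch under two assumptions: the two-sided 5% t values for 4 to 9 df are copied from a standard t table, and convergence means that rounding the estimate up reproduces the current n. The path (5, 10, 7, 8) matches the table above.

```python
from math import ceil

s, d = 450, 400                     # SD and CI half-width (lb/ac)
# Two-sided 5% t values by degrees of freedom, from a standard t table
t025 = {4: 2.776, 5: 2.571, 6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}

n = 5                               # initial guess, as in the table
for _ in range(20):
    r = t025[n - 1] ** 2 * s ** 2 / d ** 2
    print(n, round(r, 2))
    if ceil(r) == n:                # converged: rounding up reproduces n
        break
    n = ceil(r)
```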

Question 7 ANSWER [10 points]

No standard deviation is available, but the CV is not greater than 10%. Estimate the number of replications required for the total length of a 95% confidence interval about the true mean yield to be less than the standard deviation.

$$r = \frac{z^2_{\alpha/2}\, CV^2}{(d/\bar Y)^2}$$

CV = s/Ȳ < 0.1 and 2d < s (so s > 2d)

Ȳ > s/0.10
d/Ȳ = (s/2)/(s/0.1) = 0.1/2 = 0.05
Normal approximation: r = 1.96²·0.10²/0.05² = 15.4

Or

CV = s/Ȳ, so s = CV·Ȳ = 0.10·7500 = 750, and s < 750.

2d < 750, thus d < 375 and r = 1.96²·750²/375² = 15.4

Using r = t²(α/2, r-1)·s²/d², the sample size is estimated iteratively:

Initial n  t(2.5%, n-1)  n
16         2.131         (2.131)²(750)²/375² = 18.2
19         2.101         (2.101)²(750)²/375² = 17.66
18         2.110         (2.110)²(750)²/375² = 17.8

18 replications are necessary for the 95% confidence interval for the mean to have a total length of at most 1s.
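The same iteration for Question 7, as a standard-library sketch. It assumes d = s/2 (so s²/d² = 4) and uses two-sided 5% t values for 15 to 18 df copied from a standard t table. It also computes the normal approximation for comparison.

```python
from math import ceil
from statistics import NormalDist

ratio = 2.0                          # s / d, since the CI length 2d equals s
r_normal = NormalDist().inv_cdf(0.975) ** 2 * ratio ** 2   # about 15.4

# Two-sided 5% t values by degrees of freedom, from a standard t table
t025 = {15: 2.131, 16: 2.120, 17: 2.110, 18: 2.101}

n = 16                               # initial guess, as in the table
for _ in range(20):
    r = t025[n - 1] ** 2 * ratio ** 2
    if ceil(r) == n:                 # converged: rounding up reproduces n
        break
    n = ceil(r)
print(round(r_normal, 1), n)
```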
