
Homework 1

Kenneth Guzman
STA 412/512
September 4, 2018

1 Question 1 Solution
For Question 1, we use the following R code to produce the histogram and the Q-Q plot and to
run the K-S test, in order to judge whether the data follow a normal distribution. The R code
used is:
pinetrees <- c(29.1, 40.5, 52.3, 34.4, 37.7, 57.8, 47.1, 45.0, 32.0, 43.6, 28.4, 40.7,
               35.5, 33.4, 42.4, 30.7, 50.0, 39.1)
par(mfrow = c(1, 2))
hist(pinetrees, probability = TRUE)
curve(dnorm(x, mean(pinetrees), sd(pinetrees)), lty = 3, col = 4, lwd = 3, add = TRUE)
qqnorm(pinetrees, main = "Q-Q Plot")
qqline(pinetrees)
ks.test(pinetrees, 'pnorm', mean(pinetrees), sd(pinetrees))

Displayed below are the results of our K-S test

Since our p-value = 0.9921, which is far above 0.05, we fail to reject the hypothesis of
normality, so we treat the heights of the pine trees as normally distributed. Given the
results of our K-S test, I will proceed to complete parts 2-7 of problem 1.
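
One caveat on the K-S test above: the mean and standard deviation passed to it are estimated from the same sample, which tends to make the reported p-value too large (conservative). As a complementary check, a minimal sketch using the Shapiro-Wilk test from base R could look like the following. The use of shapiro.test is an addition for illustration and is not part of the original solution.

```r
# Pine tree heights, copied from the solution above.
pinetrees <- c(29.1, 40.5, 52.3, 34.4, 37.7, 57.8, 47.1, 45.0, 32.0, 43.6, 28.4, 40.7,
               35.5, 33.4, 42.4, 30.7, 50.0, 39.1)

# Shapiro-Wilk test of normality; it needs no mean or sd to be supplied.
sw <- shapiro.test(pinetrees)
sw$statistic  # W statistic (values near 1 are consistent with normality)
sw$p.value    # a large p-value gives no evidence against normality
```

As with the K-S test, a large p-value here only means that normality is not rejected; it does not prove that the data are normal.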

Part II)
The most appropriate hypothesis testing procedure for the company's claim is the one-sample
t-test, because the sample is small (n = 18) and the population standard deviation is unknown.

Part III)
Our null hypothesis: H0 : µ = 40
Our alternative hypothesis: H1 : µ > 40

Part IV)
Displayed below is a graph with the critical region shaded in blue and the location of our
critical value marked with the red line.
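
A plot of this kind can be drawn with base R graphics. The sketch below is one possible way to do it, assuming n = 18 and alpha = 0.05 as in this problem; the variable names (dof, t_crit, xs) and the plotting range are illustrative choices, not taken from the original figure.

```r
# Upper-tail critical region for a t-test with n - 1 degrees of freedom.
n <- 18
alpha <- 0.05
dof <- n - 1
t_crit <- qt(1 - alpha, dof)  # upper-tail critical value

# Draw the t density on a range wide enough to show the tail.
curve(dt(x, dof), from = -4, to = 4, ylab = "Density",
      main = "Critical region, t(17), alpha = 0.05")

# Shade the rejection region to the right of the critical value in blue.
xs <- seq(t_crit, 4, length.out = 200)
polygon(c(t_crit, xs, 4), c(0, dt(xs, dof), 0), col = "blue", border = NA)

# Mark the critical value with a red vertical line.
abline(v = t_crit, col = "red", lwd = 2)
```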

Part V)
In order to calculate our test statistic and our p-value, we will run the following code in R.
I have included the results below:
n = length(pinetrees)
mu0 = 40
alpha = 0.05
t0 = (mean(pinetrees) - mu0) / (sd(pinetrees) / sqrt(n))
t0  # Test statistic
[1] -0.00853031
qt(alpha, df = n - 1, lower.tail = FALSE)  # Critical value
[1] 1.739607
pval = pt(t0, df = n - 1, lower.tail = FALSE)
pval  # p-value
[1] 0.5033534
Part VI)
No, there is not enough evidence to support the company's claim. Our test statistic
(-0.0085) is less than our critical value (1.7396), so we do not reject our null hypothesis.
The p-value (0.5034) leads to the same decision, since it is greater than our alpha value of
0.05. Therefore, at the 0.05 significance level, the data do not provide sufficient evidence
that H1 is true.
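
The hand computation in Part V can be cross-checked with R's built-in t.test by requesting the one-sided alternative explicitly. This is a sketch for verification only; the object name res is an illustrative choice.

```r
# Pine tree heights, copied from the solution above.
pinetrees <- c(29.1, 40.5, 52.3, 34.4, 37.7, 57.8, 47.1, 45.0, 32.0, 43.6, 28.4, 40.7,
               35.5, 33.4, 42.4, 30.7, 50.0, 39.1)

# One-sample t-test of H0: mu = 40 against H1: mu > 40.
res <- t.test(pinetrees, mu = 40, alternative = "greater")
res$statistic  # should match t0 from Part V
res$p.value    # should match the one-sided p-value from Part V
```

Both values agree with the manual results above (t0 = -0.00853031, p-value = 0.5033534).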

Part VII)
Below I have used R's t.test function to obtain our t confidence interval. The output (One
Sample t-test) includes the test statistic, the degrees of freedom, and a p-value. Note that
t.test uses the two-sided alternative by default: the line "true mean is not equal to 40"
states that alternative hypothesis and is not a conclusion, and it is why the p-value here
(0.9933) differs from the one-sided p-value in Part V. We can be 95 percent confident that
our population mean lies within the interval 35.86114 < µ < 44.10553, which contains the
hypothesized value 40. The sample mean is 39.98333.

t.test(pinetrees, conf.level = 1 - alpha, mu = mu0)


	One Sample t-test

data:  pinetrees
t = -0.0085303, df = 17, p-value = 0.9933
alternative hypothesis: true mean is not equal to 40
95 percent confidence interval:
 35.86114 44.10553
sample estimates:
mean of x
 39.98333
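
The interval reported by t.test can also be reproduced by hand as the sample mean plus or minus the t quantile times the standard error. The sketch below shows that arithmetic; the names se and ci are illustrative.

```r
# Pine tree heights, copied from the solution above.
pinetrees <- c(29.1, 40.5, 52.3, 34.4, 37.7, 57.8, 47.1, 45.0, 32.0, 43.6, 28.4, 40.7,
               35.5, 33.4, 42.4, 30.7, 50.0, 39.1)

n <- length(pinetrees)
alpha <- 0.05
se <- sd(pinetrees) / sqrt(n)  # standard error of the sample mean

# Two-sided 95 percent interval: mean -/+ t quantile * standard error.
ci <- mean(pinetrees) + c(-1, 1) * qt(1 - alpha / 2, df = n - 1) * se
ci
```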

2 Question 2 Solution
Part I)
The more appropriate hypothesis testing procedure for the researcher's claim is the Z-test,
because we have a large sample size (n = 36).

Part II)
Our null hypothesis: H0 : µ = 5710
Our alternative hypothesis: H1 : µ ≠ 5710

Part III)
Displayed below is a graph with the critical region shaded in blue and pink respectively, and
the location of our critical values marked with the red lines.

Part IV)
In order to calculate our test statistic and our p-value, we will run the following code in R.
I have included the results below:
mu.h = 5710    # hypothesized mean
sig = 992.05   # standard deviation
n = 36         # sample size
mu0 = 5708.07  # sample mean
alpha = 0.05
z0 = (mu0 - mu.h) / (sig / sqrt(n))
z0  # Test statistic
[1] -0.0116728
qnorm(c(alpha / 2, 1 - alpha / 2))  # Critical values
[1] -1.959964  1.959964
pval = 2 * pnorm(abs(z0), lower.tail = FALSE)
pval  # p-value
[1] 0.9906867

Part V)
No, there is not enough evidence to support the researcher's claim. Our p-value (0.9907) is
greater than our alpha value of 0.05, and our test statistic (-0.0117) lies between the
critical values, so we do not reject the null hypothesis. Therefore, at the 0.05 significance
level, the data do not provide sufficient evidence that H1 is true.
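
Because this problem supplies only summary statistics, the Part IV calculation can be wrapped in a small reusable function. The following is a sketch only: the function name z_test_summary and its argument names are hypothetical, and here xbar denotes the sample mean (called mu0 in the code above) while mu_h denotes the hypothesized mean (called mu.h above).

```r
# Two-sided one-sample Z-test computed from summary statistics.
z_test_summary <- function(xbar, mu_h, sigma, n) {
  z <- (xbar - mu_h) / (sigma / sqrt(n))      # test statistic
  p <- 2 * pnorm(abs(z), lower.tail = FALSE)  # two-sided p-value
  list(z = z, p.value = p)
}

# Values from Question 2.
res <- z_test_summary(xbar = 5708.07, mu_h = 5710, sigma = 992.05, n = 36)
res$z
res$p.value
```

The results reproduce the test statistic (-0.0116728) and p-value (0.9906867) reported in Part IV.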

Part VI)
In order to obtain a 90 percent confidence interval, I ran the following code in R.
error = qnorm(1 - 0.10 / 2)
x = sig / sqrt(n)
lowerlimit = (mu0 - error * x); lowerlimit
[1] 5436.107
upperlimit = (mu0 + error * x); upperlimit
[1] 5980.033
We can be 90 percent confident that our population mean lies within the interval 5436.107 <
µ < 5980.033. The sample mean (5708.07) is the center of this interval by construction, and
the hypothesized value 5710 also lies inside it.
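
A confidence interval and a two-sided test at the matching level always give the same decision: the hypothesized mean falls inside the 90 percent interval exactly when the two-sided p-value exceeds 0.10. The sketch below checks this for the numbers in this problem; the variable names (xbar, mu_h, ci90, inside) are illustrative.

```r
xbar <- 5708.07   # sample mean
mu_h <- 5710      # hypothesized mean
sigma <- 992.05   # standard deviation
n <- 36
se <- sigma / sqrt(n)

# 90 percent interval: sample mean -/+ z quantile * standard error.
ci90 <- xbar + c(-1, 1) * qnorm(1 - 0.10 / 2) * se

# Is the hypothesized mean inside the interval?
inside <- mu_h >= ci90[1] && mu_h <= ci90[2]

# Two-sided p-value for the same hypothesis.
p_two_sided <- 2 * pnorm(abs((xbar - mu_h) / se), lower.tail = FALSE)

inside              # TRUE means "do not reject" at the 0.10 level
p_two_sided > 0.10  # agrees with the interval-based decision
```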

3 Question 3 Solution
Part I)
After running the code below in R, I obtained the following sample.
install.packages("UsingR")
library(UsingR)
?father.son
set.seed(82018)
XS = sort(sample(1:1078, size = 30))
dt = father.son[XS, ]; dt

Part II)
First, I will create a histogram and Q-Q plot, then I will use the K-S test on the variable
fheight using the following code in R.
father = c(dt[, 'fheight'])
par(mfrow = c(1, 2))
hist(father, probability = TRUE)
curve(dnorm(x, mean(father), sd(father)), lty = 2, col = 3, lwd = 2, add = TRUE)
qqnorm(father, main = "Father's Height")
qqline(father)
ks.test(father, 'pnorm', 67.6871, 2.744868)

Second, I will create a histogram and Q-Q plot, then I will use the K-S test on the
variable sheight using the following code in R.
son = c(dt[, 'sheight'])
par(mfrow = c(1, 2))
hist(son, probability = TRUE)
curve(dnorm(x, mean(son), sd(son)), lty = 2, col = 4, lwd = 2, add = TRUE)
qqnorm(son, main = "Son's Height")
qqline(son)
ks.test(son, 'pnorm', 68.68407, 2.814702)

Part III)
According to our K-S tests, the p-values for both variables are large, so we find no evidence
against normality for either one. I will therefore complete parts 4 through 9.

Part IV)
The more appropriate test procedure for whether sons are taller than their fathers is the
two-sample Z-test, because of our large sample size (n = 30 in each group).

Part V)
Our null hypothesis: H0 : µ1 − µ2 = 0
Our alternative hypothesis: H1 : µ1 − µ2 > 0
Here µ1 is the mean height of the sons and µ2 is the mean height of the fathers.

Part VI)
Displayed below is a graph with the critical region shaded in blue and the location of our
critical value marked with the red line.

Part VII)
In order to calculate our test statistic and our p-value, we will run the following code in R.
I have included the results below:
alpha = 0.05
fatherm = mean(father)
sonm = mean(son)
difference = sonm - fatherm
fathersd = var(father) / length(father)  # variance of the fathers' sample mean
sonsd = var(son) / length(son)           # variance of the sons' sample mean
bottom = sqrt(fathersd + sonsd)          # standard error of the difference
z = (difference - 0) / bottom; z  ## Test statistic
[1] 2.213421
qnorm(1 - alpha)  ## Critical value
[1] 1.644854
pvalue = pnorm(z, lower.tail = FALSE); pvalue  ## P-value
[1] 0.0134343
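
The same two-sample Z computation recurs in Question 4, so it can be collected into a helper. The function below is a sketch under the same large-sample assumption used in this solution; the name two_sample_z and its arguments are hypothetical, and the short vectors in the usage lines are made-up illustration values, not the sampled heights.

```r
# Two-sample Z-test for a difference in means (large-sample approximation).
two_sample_z <- function(x, y, alternative = c("greater", "two.sided")) {
  alternative <- match.arg(alternative)
  se <- sqrt(var(x) / length(x) + var(y) / length(y))  # SE of the difference
  z <- (mean(x) - mean(y)) / se                        # test statistic
  p <- if (alternative == "greater") {
    pnorm(z, lower.tail = FALSE)            # upper-tail p-value
  } else {
    2 * pnorm(abs(z), lower.tail = FALSE)   # two-sided p-value
  }
  list(z = z, se = se, p.value = p)
}

# Illustration with made-up values (not the father.son sample).
res <- two_sample_z(x = c(1, 2, 3, 4, 5), y = c(0, 1, 2, 3, 4))
res$z
res$p.value

# Sanity check of the p-value reported above from its z statistic.
pnorm(2.213421, lower.tail = FALSE)
```

With the sampled data, calling two_sample_z(son, father) would give the same z and p-value as the step-by-step code.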

Part VIII)
Since our p-value (0.0134) is less than alpha = 0.05, we reject the null hypothesis: the data
provide sufficient evidence that H1 is true. So, according to the data, sons are on average
taller than their fathers.
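
One point worth weighing for this question: each row of father.son pairs a father with his own son, so the two columns are not independent samples. A paired test on the within-pair differences is a common alternative to the two-sample procedure used here. The sketch below illustrates the idea on simulated heights; the simulated values, the seed, and the names fheight_sim and sheight_sim are hypothetical and are not the data sampled in Part I.

```r
# Simulated, correlated father/son heights for illustration only.
set.seed(1)
fheight_sim <- rnorm(30, mean = 67.7, sd = 2.7)
sheight_sim <- 68.7 + 0.5 * (fheight_sim - 67.7) + rnorm(30, sd = 2.4)

# Paired t-test of H1: sons are taller than their own fathers on average.
paired <- t.test(sheight_sim, fheight_sim, paired = TRUE, alternative = "greater")

# Equivalent one-sample t-test on the within-pair differences.
manual <- t.test(sheight_sim - fheight_sim, mu = 0, alternative = "greater")

paired$statistic
manual$statistic  # identical to the paired statistic
```

With the sampled data, the analogous call would be t.test(son, father, paired = TRUE, alternative = "greater").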

Part IX)
In order to obtain a 95 percent confidence interval, I ran the following code in R.
error = qnorm(1 - 0.05 / 2)
lowerlimit = difference - (error * bottom); lowerlimit
[1] 0.1878732
upperlimit = difference + (error * bottom); upperlimit
[1] 3.093487
We can be 95 percent confident that the difference between the population means lies within
the interval 0.1878732 < µ1 − µ2 < 3.093487. This interval does not contain 0, which agrees
with our decision to reject the null hypothesis.

4 Question 4 Solution
Part I)
After importing the AIS data into R and setting my working directory, I created the two
appropriate variables (BMI.F and BMI.M) with the following R code.
BMI1 = read.csv("ais data.csv")
BMI2 = read.table("ais data.csv", header = TRUE, sep = ",")
attach(BMI2)
BMI.F = BMI[1:100]    # female athletes
BMI.M = BMI[101:202]  # male athletes
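
The split above relies on the rows being ordered with the 100 female athletes first. If the file also carries a column identifying sex, selecting by that column is less fragile. The sketch below uses a tiny made-up data frame; the column name Sex, its codes "f" and "m", and the BMI values are all hypothetical and may not match the actual file.

```r
# Toy stand-in for the imported data (hypothetical column names and values).
ais_toy <- data.frame(
  Sex = c("f", "f", "m", "m", "m"),
  BMI = c(20.6, 21.9, 23.1, 24.5, 22.8)
)

# Select each group by its label instead of by row position.
bmi_f_toy <- ais_toy$BMI[ais_toy$Sex == "f"]  # female athletes
bmi_m_toy <- ais_toy$BMI[ais_toy$Sex == "m"]  # male athletes

length(bmi_f_toy)
length(bmi_m_toy)
```
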
Part II)
The more appropriate test procedure for whether male and female athletes differ in body mass
index is the two-sample Z-test, because of our large sample sizes (100 female and 102 male
athletes).

Part III)
Our null hypothesis: H0 : µ1 = µ2
Our alternative hypothesis: H1 : µ1 ≠ µ2
Here µ1 is the mean BMI of male athletes and µ2 is the mean BMI of female athletes.

Part IV)
Displayed below is a graph with the critical region shaded in blue and pink respectively, and
the location of our critical values marked with the red lines.

Part V)
In order to calculate our test statistic and our p-value, we will run the following code in R.
I have included the results below:
varm = var(BMI.M)
varf = var(BMI.F)
top = mean(BMI.M) - mean(BMI.F)
bottom = (varm / length(BMI.M)) + (varf / length(BMI.F))
actualbottom = sqrt(bottom)
z = top / actualbottom; z  # Test statistic
[1] 5.031262
alpha = 0.05
qnorm(c(alpha / 2, 1 - alpha / 2))  # Critical values
[1] -1.959964  1.959964
pval = 2 * pnorm(abs(z), lower.tail = FALSE); pval  # P-value
[1] 4.872629e-07
Part VI)
Since our p-value (4.87e-07) is less than alpha = 0.05, we reject the null hypothesis: the
data provide sufficient evidence that H1 is true. We conclude that male and female athletes
differ in mean body mass index.

Part VII)
In order to obtain a 95 percent confidence interval, I ran the following code in R.

error = qnorm(1 - alpha / 2)
lowerlimit = top - (error * actualbottom); lowerlimit
[1] 1.168649
upperlimit = top + (error * actualbottom); upperlimit
[1] 2.660206
We can be 95 percent confident that the difference between the population means lies within
the interval 1.168649 < µ1 − µ2 < 2.660206. The interval is centered on the observed
difference between our sample means (1.914427) and does not contain 0, which agrees with
rejecting the null hypothesis.
