
QUESTION - A

1. Study the Lungcap data set and answer the following questions.

i) Construct a two-way table for gender and smoking habit.


ii) Find the marginal probabilities.
iii) Given that one randomly selected person is a smoker, what is the probability that the
person is a female?
iv) Are gender and smoking habit independent?

2. Suppose it is given that 20% of the male smokers and 15% of the female smokers were born by caesarean section. Verify these statements with the help of the data, giving adequate reasons for your answers.
3. Plot the histogram of the distribution of Lungcap amongst smokers.
4. Plot the histogram of the distribution of Height amongst smokers.
5. Are height and Lungcap independent?
6. Are the variances of LungCap of male smokers and female smokers equal?
7. Are the average LungCap values of smokers and non-smokers equal?
8. Plot the histogram of the age amongst smokers.
9. What percentage of people below 16 years smoke?
10. What percentage of people above 17 years smoke?
11. Test if smoking habit and age are dependent.
12. Test if smoking habit and Lungcap are dependent.
13. Fit a suitable distribution to height and also to Lungcap. Test the goodness of fit.

QUESTION – B

Study the car data set and answer the following questions.

1. Find the average and variance of price and mileage separately. Comment on the results. How will
you interpret the result statistically?
2. Test if the mean mileage of different car manufacturers within some price range are equal. Clearly
specify all the assumptions and the null and alternative hypotheses.
3. Find a 90% confidence interval for the mean price of Chevrolet cars.
4. Find a 90% confidence interval for the variance of prices of Pontiac cars.
5. Calculate the correlation coefficient between mileage and Liter for each company.
6. Comment on the results.
7. Suppose a car has a Liter of 3.8. How sure will you be that its mileage is more than 20,000?
8. Is there any correlation between prices and mileage?

QUESTION – C

1. Let $\bar{Y}$ be the mean of a random sample of size $n_1$ from $N(\mu, \sigma^2 = 10)$. Find $n_1$ such that the probability that the random interval $(\bar{Y} - 1/2, \bar{Y} + 1/2)$ includes $\mu$ is approximately 0.954.
1|Page
2. Let $\bar{Z}$ be the mean of a random sample of size $n_2$ from $N(\mu, \sigma^2 = 9)$. Find $n_2$ such that the probability that the random interval $(\bar{Z} - 1, \bar{Z} + 1)$ includes $\mu$ is approximately 0.90.
3. Draw 200 random samples, each of size $n_1$ (found above), from a normal distribution with mean 5 and variance 3.
4. Write down the distribution of the sample mean. Using the data obtained in Q3 above, test whether the sample means follow that distribution.
5. Draw 200 random samples, each of size $n_2$ (found above), from a normal distribution with mean 7 and variance 3.
6. Compute a 95% confidence interval for the difference of means from each of the 200 samples. Draw a graph showing all 200 confidence intervals and comment.

QUESTION – D

1. Collect stock prices for 5 companies from 1st Jan 2016 to 30th June 2016.
2. Plot the histogram of the returns for each company. Describe the histograms.
3. Test whether the average returns for 5 companies are equal. State clearly the assumptions
required, null and alternative hypotheses.
4. Test whether the average returns for each pair of companies are equal.
5. Comment on the results.

QUESTION – E

1. The income distribution of a very large population is exponential with an average income of ₹40,000 per annum. Draw 500 samples (from the income distribution) of size 100 each. Sketch the distribution of the sample average income. Comment.
2. The age distribution of a very large population is given below:

Age Group (years) | 15-18 | 18-21 | 21-23 | 23-25 | 25-27 | 27-29 | 29-31 | 31-33 | 33-35
Proportion        | 0.1   | 0.1   | 0.1   | 0.1   | 0.2   | 0.1   | 0.1   | 0.1   | 0.1

Draw 100 samples (from the age distribution) of size 50 each. Sketch the distribution of sample
average age. Comment.

Section-A
Q.1.i.
Two-Way Table
Gender / Smoking Habit | Smoker | Non-Smoker | Grand Total
Male                   | 33     | 334        | 367
Female                 | 44     | 314        | 358
Grand Total            | 77     | 648        | 725
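Q.1.ii. Marginal Probabilities (the inner cells are joint probabilities; the last row and column are the marginal probabilities)
Gender / Smoking Habit | Smoker | Non-Smoker | Marginal Probability
Male                   | 0.046  | 0.461      | 0.506
Female                 | 0.061  | 0.433      | 0.494
Marginal Probability   | 0.106  | 0.894      | 1.000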
Q.1.iii. Given that one randomly selected person is a smoker, the probability that the person is female is
P(Female | Smoker) = (# of female smokers) / (# of smokers) = 44/77 = 0.571

Q.1.iv. Hypothesis Statement:
H0: Gender and smoking habit are independent.
HA: Gender and smoking habit are dependent.
Rejection Rule: Reject H0 if $\chi^2_{cal} > \chi^2_{1,0.05}$, i.e. if the p-value of $\chi^2_{cal}$ is less than 0.05.
Expected frequencies: $E_{ij} = (\text{row } i \text{ total} \times \text{column } j \text{ total})/n$

Cell | Observed (F) | Expected (E) | F - E | (F - E)^2/E
11   | 33           | 38.98        | -5.98 | 0.917
12   | 334          | 328.02       | 5.98  | 0.109
21   | 44           | 38.02        | 5.98  | 0.940
22   | 314          | 319.98       | -5.98 | 0.112
Degrees of freedom = (2 - 1)(2 - 1) = 1
$\chi^2_{cal} = 2.077$
$\chi^2_{1,0.05} = 3.841$
Since $\chi^2_{cal} < \chi^2_{1,0.05}$, the p-value of $\chi^2_{cal}$ (about 0.15) is greater than 5%. Hence there is not sufficient evidence to reject H0, and the data are consistent with gender and smoking habit being independent.
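The expected frequencies and the statistic above can be recomputed with a short Python sketch (standard library only). It uses the Pearson statistic without Yates' continuity correction, which matches the hand calculation, and gets the p-value from the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable. The same function is applied to the 2x2 tables of Q5, Q11 and Q12; the function name is illustrative.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E_ij = row total * column total / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom chi-square = Z^2, so P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

tables = {
    "Q1.iv gender vs smoking": [[33, 334], [44, 314]],
    "Q5 lungcap vs height": [[229, 34], [54, 408]],
    "Q11 smoking vs age": [[42, 35], [506, 142]],
    "Q12 smoking vs lungcap": [[43, 34], [432, 216]],
}
results = {name: chi_square_2x2(t) for name, t in tables.items()}
for name, (stat, p) in results.items():
    print(f"{name}: chi2 = {stat:.3f}, p = {p:.4f}")
```

For Q12 this gives a p-value of roughly 0.059, just above the 5% level.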

Q.2 Given 20%(=m )of male smokers and 15%(=f ) of female smokers were born caesarean.
a) As per the sample,
# of male smokers = 33 , # of male smoker born caesarean =10
Proportion of male smoker born caesarean, Pm =10/33 =30.3%
Sample size,Nm=33
Since sample size > 30, as per CLT, Pm ~ N(Pm,SDm)
Standard deviation, SDm= sqrt(Pmx(1-Pm)/ Nm)= 0.08
Zcal=Pmm)/SD = (30.30%-20%)/0.08 = 1.29
Z+cri=Z0.975=1.96; Z-cri=Z0.025=-1.96
Hypothesis Statement:
HO: Proportion of male smoker, m = 20%
HA: Proportion of male smoker, m ≠ 20%
Rejection Rule
Reject HO if Zcal > Z+cri or Zcal < Z-cri
Since Z-cri > Zcal (=1.29) < Z+cri , there is not enough
evidence to reject HO.
Hence we accept the hypothesis that 20% of the male smokers were born caesarean.
b) Female smokers. As per the sample:
# of female smokers = 44; # of female smokers born by caesarean section = 11
Sample proportion $\hat{p}_f = 11/44 = 25\%$; sample size $n_f = 44$
Since $n_f > 30$, by the CLT $\hat{p}_f \approx N(\pi_f, SD_f)$, where
$SD_f = \sqrt{\hat{p}_f(1 - \hat{p}_f)/n_f} = 0.065$
$Z_{cal} = (\hat{p}_f - \pi_f)/SD_f = (25\% - 15\%)/0.065 = 1.53$
$Z^{+}_{cri} = Z_{0.975} = 1.96$; $Z^{-}_{cri} = Z_{0.025} = -1.96$
Hypothesis Statement:
H0: $\pi_f = 15\%$
HA: $\pi_f \neq 15\%$
Rejection Rule:
Reject H0 if $Z_{cal} > Z^{+}_{cri}$ or $Z_{cal} < Z^{-}_{cri}$.
Since $Z^{-}_{cri} < Z_{cal} (= 1.53) < Z^{+}_{cri}$, there is not enough evidence to reject H0.
Hence the data are consistent with the statement that 15% of the female smokers were born by caesarean section.
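A minimal Python sketch of the one-proportion z-test above. By default it uses the sample proportion in the standard error, as the worked answer does; many textbooks instead use the hypothesised proportion, so that variant is computed too. The function name and the `use_null_se` flag are illustrative assumptions, not part of the original answer.

```python
import math
from statistics import NormalDist

def proportion_z(successes, n, p0, use_null_se=False):
    """z statistic for H0: pi = p0. By default the standard error uses the sample
    proportion (as in the worked answer); use_null_se=True uses p0 instead."""
    p_hat = successes / n
    p_se = p0 if use_null_se else p_hat
    se = math.sqrt(p_se * (1 - p_se) / n)
    return (p_hat - p0) / se

z_crit = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 5% test

z_male = proportion_z(10, 33, 0.20)
z_female = proportion_z(11, 44, 0.15)
z_male_null = proportion_z(10, 33, 0.20, use_null_se=True)
z_female_null = proportion_z(11, 44, 0.15, use_null_se=True)
for label, z in [("male", z_male), ("female", z_female),
                 ("male, null SE", z_male_null), ("female, null SE", z_female_null)]:
    print(f"{label}: z = {z:.2f}, reject H0: {abs(z) > z_crit}")
```

With these counts both versions give $|z| < 1.96$, so the conclusion does not depend on which standard error is used.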
Q3.&4.

Q5.
LungCap / Height | Height <63 | Height >63 | Total
LungCap <7       | 229        | 34         | 263
LungCap >7       | 54         | 408        | 462
Total            | 283        | 442        | 725
Hypothesis Statement:
H0: Height and LungCap are independent (for the ranges used in the table above).
HA: Height and LungCap are dependent.
Rejection Rule:
Reject H0 if $\chi^2_{cal} > \chi^2_{1,0.05}$, i.e. if the p-value of $\chi^2_{cal}$ is less than 0.05.
Cell | Observed (F) | Expected (E) | F - E   | (F - E)^2/E
11   | 229          | 102.66       | 126.34  | 155.479
12   | 34           | 160.34       | -126.34 | 99.549
21   | 54           | 180.34       | -126.34 | 88.509
22   | 408          | 281.66       | 126.34  | 56.670
Degrees of freedom = (2 - 1)(2 - 1) = 1
$\chi^2_{cal} = 400.207$
$\chi^2_{1,0.05} = 3.841$
Since $\chi^2_{cal} > \chi^2_{1,0.05}$, the p-value of $\chi^2_{cal}$ (approximately 0%) is less than 5%. Hence we reject H0 and conclude that height and LungCap are dependent.

Q6. Hypothesis Statement:
H0: The variances of LungCap of male smokers and female smokers are equal, i.e. $\sigma_1^2/\sigma_2^2 = 1$
HA: The variances of LungCap of male smokers and female smokers are not equal, i.e. $\sigma_1^2/\sigma_2^2 \neq 1$
Rejection Rule:
Reject H0 if $F_{cal} (= s_1^2/s_2^2) > F_{crit}$ at significance level 0.05.
We conducted an F-test with $\alpha/2 = 0.05/2 = 0.025$ in each tail (two-tailed test) and obtained the following results:
Since $F_{cal} (= 1.19) < F_{crit} (= 1.96)$, there is not enough evidence to reject H0. Hence the variances of LungCap of male smokers and female smokers can be taken as equal.

Q7. Let $\mu_1$ and $\mu_2$ be the population mean LungCap of smokers and non-smokers, and let $\sigma_1^2$ and $\sigma_2^2$ be the corresponding population variances.
$\bar{x}_1$ = sample mean LungCap of smokers $\sim N(\mu_1, \sigma_1^2/n_1)$
$\bar{x}_2$ = sample mean LungCap of non-smokers $\sim N(\mu_2, \sigma_2^2/n_2)$
As per the data:
No. of smokers, $n_1 = 77$; no. of non-smokers, $n_2 = 648$
Average LungCap of smokers, $\bar{x}_1 = 8.645$; average LungCap of non-smokers, $\bar{x}_2 = 7.77$
Sample LungCap variance of smokers, $s_1^2 = 3.545$; sample LungCap variance of non-smokers, $s_2^2 = 7.432$
Since the population LungCap variances of smokers ($\sigma_1^2$) and non-smokers ($\sigma_2^2$) are unknown, we assume $\sigma_1^2 = \sigma_2^2$ and estimate the common variance by the pooled variance
$s_p^2 = [(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2]/(n_1 + n_2 - 2) = 7.023$
Then $(\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2))/(s_p\sqrt{1/n_1 + 1/n_2}) \sim t_{n_1 + n_2 - 2}$, and under H0
$t_{cal} = (\bar{x}_1 - \bar{x}_2)/(s_p\sqrt{1/n_1 + 1/n_2}) = 2.74$
$t^{+}_{cri} = t_{0.975,723} \approx 1.96$
$t^{-}_{cri} = t_{0.025,723} \approx -1.96$
Hypothesis Statement
H0: $\mu_1 = \mu_2$, i.e. $\mu_1 - \mu_2 = 0$
HA: $\mu_1 \neq \mu_2$, i.e. $\mu_1 - \mu_2 \neq 0$
Rejection Rule
Reject H0 if $t_{cal} > t^{+}_{cri}$ or $t_{cal} < t^{-}_{cri}$.
Since $t_{cal} (= 2.74) > t^{+}_{cri} (= 1.96)$, we reject H0.
Hence the average LungCap of smokers and non-smokers are not equal ($\mu_1 \neq \mu_2$).
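The pooled two-sample t statistic can be checked from the summary statistics alone. This Python sketch assumes equal population variances, as in the answer above, and uses the normal quantile as an approximation to $t_{0.975,723}$ (with 723 degrees of freedom the two differ only in the third decimal place). The function name is illustrative.

```python
import math
from statistics import NormalDist

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t statistic assuming equal population variances."""
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)  # pooled variance
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    return sp2, (mean1 - mean2) / se, n1 + n2 - 2

sp2, t_cal, df = pooled_t(8.645, 3.545, 77, 7.77, 7.432, 648)
# Normal quantile used as an approximation to t_{0.975, 723}.
t_crit = NormalDist().inv_cdf(0.975)
print(f"sp^2 = {sp2:.3f}, t = {t_cal:.2f}, df = {df}, reject H0: {abs(t_cal) > t_crit}")
```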
Q.8.

Q.9. # of people below 16 years = 548
# of people below 16 years who smoke = 42
Percentage of people below 16 years who smoke = 42/548 = 7.66%
Q.10. # of people above 17 years = 80
# of people above 17 years who smoke = 15
Percentage of people above 17 years who smoke = 15/80 = 18.75%

Q.11.
Smoking Habit / Age | <15 | >15 | Total
Yes                 | 42  | 35  | 77
No                  | 506 | 142 | 648
Total               | 548 | 177 | 725
Hypothesis Statement:
H0: Age and smoking habit are independent (for the age ranges in the table above).
HA: Age and smoking habit are dependent (for the same age ranges).
Rejection Rule: Reject H0 if $\chi^2_{cal} > \chi^2_{1,0.05}$, i.e. if the p-value of $\chi^2_{cal}$ is less than 0.05.
Cell | Observed (F) | Expected (E) | F - E  | (F - E)^2/E
11   | 42           | 58.20        | -16.20 | 4.510
12   | 35           | 18.80        | 16.20  | 13.963
21   | 506          | 489.80       | 16.20  | 0.536
22   | 142          | 158.20       | -16.20 | 1.659
Degrees of freedom = (2 - 1)(2 - 1) = 1
$\chi^2_{cal} = 20.668$
$\chi^2_{1,0.05} = 3.841$
Since $\chi^2_{cal} > \chi^2_{1,0.05}$, the p-value of $\chi^2_{cal}$ (approximately 0%) is less than 5%. Hence we reject H0 and conclude that age and smoking habit are dependent.


Q.12.
Smoking Habit / LungCap | <9  | >9  | Total
Yes                     | 43  | 34  | 77
No                      | 432 | 216 | 648
Total                   | 475 | 250 | 725
Hypothesis Statement:
H0: LungCap and smoking habit are independent (for the LungCap ranges in the table above).
HA: LungCap and smoking habit are dependent (for the same LungCap ranges).
Rejection Rule: Reject H0 if $\chi^2_{cal} > \chi^2_{1,0.05}$, i.e. if the p-value of $\chi^2_{cal}$ is less than 0.05.
Cell | Observed (F) | Expected (E) | F - E | (F - E)^2/E
11   | 43           | 50.45        | -7.45 | 1.100
12   | 34           | 26.55        | 7.45  | 2.089
21   | 432          | 424.55       | 7.45  | 0.131
22   | 216          | 223.45       | -7.45 | 0.248
Degrees of freedom = (2 - 1)(2 - 1) = 1
$\chi^2_{cal} = 3.568$
$\chi^2_{1,0.05} = 3.841$
Since $\chi^2_{cal} < \chi^2_{1,0.05}$, the p-value of $\chi^2_{cal}$ (about 5.9%) is greater than 5%. Hence there is not enough evidence to reject H0, and the data are consistent with LungCap and smoking habit being independent.

Q13. As per the data, we have the following descriptive statistics for LungCap and Height:
Statistic          | LungCap  | Height
Mean               | 7.863148 | 64.83628
Standard Deviation | 2.662008 | 7.202144
Count              | 725      | 725
Distribution for LungCap
H0: The LungCap distribution of the population is normal, $N(\mu = 7.863, \sigma = 2.66)$.
HA: The LungCap distribution is not $N(\mu = 7.863, \sigma = 2.66)$.
We construct the following frequency distribution with 10 bins chosen so that each bin has probability 10% under the fitted normal distribution. The upper edge of each bin is $\bar{x} + z s$, where $z$ is the standard normal quantile of the cumulative percentage, so every expected frequency is $10\% \times 725 = 72.5$.
Cumulative % | Z-value | Bin upper edge | Observed (fi) | Expected (ei) | fi - ei | (fi - ei)^2 | (fi - ei)^2/ei
10%          | -1.28   | 4.456          | 83            | 72.5          | 10.5    | 110.250     | 1.521
20%          | -0.84   | 5.627          | 62            | 72.5          | -10.5   | 110.250     | 1.521
30%          | -0.52   | 6.479          | 67            | 72.5          | -5.5    | 30.250      | 0.417
40%          | -0.25   | 7.198          | 61            | 72.5          | -11.5   | 132.250     | 1.824
50%          | 0       | 7.863          | 72            | 72.5          | -0.5    | 0.250       | 0.003
60%          | 0.25    | 8.529          | 74            | 72.5          | 1.5     | 2.250       | 0.031
70%          | 0.52    | 9.247          | 79            | 72.5          | 6.5     | 42.250      | 0.583
80%          | 0.84    | 10.099         | 71            | 72.5          | -1.5    | 2.250       | 0.031
90%          | 1.28    | 11.271         | 88            | 72.5          | 15.5    | 240.250     | 3.314
More         |         |                | 68            | 72.5          | -4.5    | 20.250      | 0.279
We get $\chi^2_{cal} = 9.524$.
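For significance level 5% and degrees of freedom 7 (= 10 - 2 - 1, since two parameters were estimated from the data), $\chi^2_{7,0.05} = 14.067$.
Since $\chi^2_{cal} < \chi^2_{7,0.05}$, the p-value is greater than 5%. Hence we do not reject H0: the LungCap distribution is consistent with $N(\mu = 7.863, \sigma = 2.66)$.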

Distribution for Height
H0: The height distribution of the population is normal, $N(\mu = 64.836, \sigma = 7.202)$.
HA: The height distribution is not $N(\mu = 64.836, \sigma = 7.202)$.
We construct the frequency distribution in the same way, with 10 bins of probability 10% each under the fitted normal distribution (expected frequency 72.5 per bin).
Cumulative % | Z-value | Bin upper edge | Observed (fi) | Expected (ei) | fi - ei | (fi - ei)^2 | (fi - ei)^2/ei
10%          | -1.28   | 55.618         | 86            | 72.5          | 13.5    | 182.250     | 2.514
20%          | -0.84   | 58.786         | 64            | 72.5          | -8.5    | 72.250      | 0.997
30%          | -0.52   | 61.091         | 64            | 72.5          | -8.5    | 72.250      | 0.997
40%          | -0.25   | 63.036         | 69            | 72.5          | -3.5    | 12.250      | 0.169
50%          | 0       | 64.836         | 61            | 72.5          | -11.5   | 132.250     | 1.824
60%          | 0.25    | 66.637         | 79            | 72.5          | 6.5     | 42.250      | 0.583
70%          | 0.52    | 68.581         | 63            | 72.5          | -9.5    | 90.250      | 1.245
80%          | 0.84    | 70.886         | 70            | 72.5          | -2.5    | 6.250       | 0.086
90%          | 1.28    | 74.055         | 98            | 72.5          | 25.5    | 650.250     | 8.969
More         |         |                | 71            | 72.5          | -1.5    | 2.250       | 0.031
We get $\chi^2_{cal} = 17.414$.
For significance level 5% and degrees of freedom 7 (= 10 - 2 - 1), $\chi^2_{7,0.05} = 14.067$.
Since $\chi^2_{cal} > \chi^2_{7,0.05}$, the p-value is less than 5%. Hence we reject H0: the height distribution does not follow $N(\mu = 64.836, \sigma = 7.202)$.
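Both goodness-of-fit statistics can be reproduced from the observed bin counts with the Python sketch below. Two assumptions: the bin edges are recomputed from exact normal quantiles (the tables above use z rounded to two decimals, so the edges differ slightly, although the statistic only uses the counts), and the p-value uses the closed form of the chi-square upper tail for an odd number of degrees of freedom, which avoids needing SciPy. Function names are illustrative.

```python
import math
from statistics import NormalDist

def chi2_sf_odd_df(x, k):
    """P(X > x) for a chi-square variable with an odd number k of degrees of freedom
    (closed form in terms of the standard normal CDF and density)."""
    if k % 2 != 1:
        raise ValueError("this closed form needs odd degrees of freedom")
    std = NormalDist()
    chi = math.sqrt(x)
    term, series = chi, 0.0  # term = chi^(2r-1) / (1*3*...*(2r-1)), starting at r = 1
    for r in range(1, (k - 1) // 2 + 1):
        series += term
        term *= x / (2 * r + 1)
    return 2 * (1 - std.cdf(chi)) + 2 * std.pdf(chi) * series

def decile_gof(observed, mean, sd, n_params=2):
    """Chi-square goodness of fit against N(mean, sd) using 10 equiprobable bins."""
    dist = NormalDist(mean, sd)
    edges = [dist.inv_cdf(q / 10) for q in range(1, 10)]  # upper edges of bins 1..9
    expected = sum(observed) / len(observed)  # 72.5 per bin for n = 725
    stat = sum((f - expected) ** 2 / expected for f in observed)
    df = len(observed) - n_params - 1  # 10 - 2 - 1 = 7
    return edges, stat, df, chi2_sf_odd_df(stat, df)

lungcap = [83, 62, 67, 61, 72, 74, 79, 71, 88, 68]
height = [86, 64, 64, 69, 61, 79, 63, 70, 98, 71]
lc_edges, lc_stat, lc_df, lc_p = decile_gof(lungcap, 7.863148, 2.662008)
ht_edges, ht_stat, ht_df, ht_p = decile_gof(height, 64.83628, 7.202144)
print(f"LungCap: chi2 = {lc_stat:.3f}, df = {lc_df}, p = {lc_p:.3f}")
print(f"Height:  chi2 = {ht_stat:.3f}, df = {ht_df}, p = {ht_p:.3f}")
```

As a sanity check, `chi2_sf_odd_df(14.067, 7)` returns approximately 0.05, the tabulated 5% critical value.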

Section-B

Q1.
Statistic       | Price    | Mileage
Mean            | 21343.14 | 19831.93
Sample Variance | 97710315 | 67179657

The sample variance of price is larger than that of mileage, which means prices are spread more widely around their mean than mileages are around theirs. In other words, cars across a wide range of prices have mileages that stay relatively close to the average of 19831.93.
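Because price and mileage are measured in different units, comparing their variances directly says little on its own. A unit-free supplementary check, not part of the original answer, is the coefficient of variation (SD/mean), sketched here in Python from the summary table.

```python
import math

# (mean, sample variance) taken from the Q1 summary table
summary = {"price": (21343.14, 97710315), "mileage": (19831.93, 67179657)}
cv = {}
for name, (mean, variance) in summary.items():
    sd = math.sqrt(variance)
    cv[name] = sd / mean  # spread relative to the mean, independent of units
    print(f"{name}: SD = {sd:.2f}, CV = {cv[name]:.3f}")
```

This gives a CV of about 0.46 for price and about 0.41 for mileage, so price is also the more dispersed variable in relative terms.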

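Q2.
Price Range | Average Mileage ($\bar{x}_i$) | Sample Variance ($s_i^2$) | Sample Size ($n_i$)
<20000      | 20241.52                      | 64394503                  | 467
20k-40k     | 19759.26                      | 65564947                  | 297
>40k        | 15589.65                      | 95651556                  | 40
Let $\mu_1$, $\mu_2$ and $\mu_3$ be the population mean mileages of cars in the three price ranges in the table, and $\sigma_1^2$, $\sigma_2^2$ and $\sigma_3^2$ the corresponding population variances. One-way ANOVA assumes that the observations are independent, that the mileages within each price range are normally distributed, and that the variances are equal ($\sigma_1^2 = \sigma_2^2 = \sigma_3^2$).
Hypothesis Statement
H0: $\mu_1 = \mu_2 = \mu_3$
HA: At least one $\mu_i$ differs from the others.

We conducted an ANOVA test. Since the p-value is less than 0.05, we reject H0 and conclude that the average mileages of the cars in the above price ranges are not all equal.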
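The ANOVA F statistic can be recomputed from the group means, sample variances and sizes in the table, without the raw data. This Python sketch assumes the tabulated variances are sample variances (divisor $n_i - 1$); for 2 numerator degrees of freedom the F upper tail has the closed form $P(F > f) = (1 + 2f/\nu_2)^{-\nu_2/2}$, so no statistics library is needed. It is a check on the reported conclusion, not the author's Excel output.

```python
def anova_from_summary(groups):
    """One-way ANOVA F statistic from (mean, sample variance, size) per group."""
    n_total = sum(n for _, _, n in groups)
    grand_mean = sum(m * n for m, _, n in groups) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for m, _, n in groups)
    ss_within = sum((n - 1) * v for _, v, n in groups)
    df_between, df_within = len(groups) - 1, n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

groups = [(20241.52, 64394503, 467), (19759.26, 65564947, 297), (15589.65, 95651556, 40)]
f_stat, df1, df2 = anova_from_summary(groups)
# Closed-form upper tail of F(2, df2); valid only because df1 == 2 here.
p_value = (1 + 2 * f_stat / df2) ** (-df2 / 2)
print(f"F({df1}, {df2}) = {f_stat:.2f}, p = {p_value:.4f}")
```

With these inputs F is about 6.0 on (2, 801) degrees of freedom, with a p-value well below 0.05.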

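Q3.
Price (Chevrolet)        | Value
Mean                     | 16427.6
Standard Deviation       | 6901.439
Sample Variance          | 47629867
Count                    | 320
Confidence Level (90.0%) | 636.4364

Using the t-distribution, the standard error is $s/\sqrt{n} = 6901.439/\sqrt{320} = 385.80$, and the 90% half-width (the "Confidence Level (90.0%)" value) is $t_{0.95,319} \times 385.80 = 1.6496 \times 385.80 = 636.44$.
Hence the 90% confidence interval for the mean price of Chevrolet cars is $16427.6 \pm 636.44$, i.e. (15791.16, 17064.04).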
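The interval can be reproduced from the summary statistics. The sketch below hard-codes $t_{0.95,319} \approx 1.6496$ as a table value (an assumption of this example, since Python's standard library has no t quantile function); this is the multiplier implied by Excel's "Confidence Level(90.0%)" of 636.4364.

```python
import math

mean, sd, n = 16427.6, 6901.439, 320
se = sd / math.sqrt(n)  # standard error of the mean
t_value = 1.6496  # assumed table value of t_{0.95, 319} (two-sided 90%)
half_width = t_value * se
lower, upper = mean - half_width, mean + half_width
print(f"SE = {se:.2f}, half-width = {half_width:.2f}, 90% CI = ({lower:.2f}, {upper:.2f})")
```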

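Q.4. For Pontiac cars:
n = 150
Sample variance, $s^2 = 16708238$
$\chi^2_{149,0.1} = 171.507$ (upper 10% point of the chi-square distribution with 149 degrees of freedom)
90% lower confidence bound for the variance: $(n - 1)s^2/\chi^2_{149,0.1} = 149 \times 16708238/171.507 = 14515607$
That is, with 90% confidence the variance of Pontiac prices is at least 14,515,607.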
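The chi-square quantile used above can be approximated without tables. This Python sketch uses the Wilson-Hilferty approximation (a substitute for a table lookup, not the method used in the answer) and reproduces both $\chi^2_{149,0.1} \approx 171.5$ and the one-sided bound. A two-sided 90% interval would instead divide $(n-1)s^2$ by the upper and lower 5% points.

```python
from statistics import NormalDist

def chi2_quantile_wh(prob, df):
    """Wilson-Hilferty approximation to the chi-square quantile at cumulative prob."""
    z = NormalDist().inv_cdf(prob)
    c = 2 / (9 * df)
    return df * (1 - c + z * c ** 0.5) ** 3

n, s2 = 150, 16708238
chi2_upper = chi2_quantile_wh(0.90, n - 1)  # upper 10% point of chi-square(149)
lower_bound = (n - 1) * s2 / chi2_upper
print(f"chi2 = {chi2_upper:.3f}, 90% lower bound for sigma^2 = {lower_bound:,.0f}")
```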

Q.5. (M = Mileage, L = Liter)
Manufacturer | Cov.(M,L) | SD(M)    | SD(L) | Corr.(M,L)
Buick        | 162.323   | 6932.136 | 0.230 | 0.102
Cadillac     | 594.100   | 8964.292 | 0.803 | 0.083
Chevrolet    | -285.829  | 8203.571 | 1.151 | -0.030
Pontiac      | 959.280   | 8110.435 | 1.098 | 0.108
SAAB         | -9.525    | 8404.288 | 0.162 | -0.007
Saturn       | -501.661  | 8479.994 | 0.301 | -0.197
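Each correlation in the table is the covariance divided by the product of the two standard deviations, $r = \mathrm{Cov}(M,L)/(SD(M)\,SD(L))$. A quick Python check using the tabulated values:

```python
# (covariance of mileage and liter, SD of mileage, SD of liter) from the Q5 table
table = {
    "Buick": (162.323, 6932.136, 0.230),
    "Cadillac": (594.100, 8964.292, 0.803),
    "Chevrolet": (-285.829, 8203.571, 1.151),
    "Pontiac": (959.280, 8110.435, 1.098),
    "SAAB": (-9.525, 8404.288, 0.162),
    "Saturn": (-501.661, 8479.994, 0.301),
}
# Pearson correlation = covariance / (SD_mileage * SD_liter)
corr = {make: cov / (sd_m * sd_l) for make, (cov, sd_m, sd_l) in table.items()}
for make, r in corr.items():
    print(f"{make}: r = {r:.3f}")
```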
Q.6. Analysing the correlation coefficients between mileage and liter for the different car manufacturers, we can state that there is only a weak linear relationship between mileage and liter for every manufacturer, since all the correlation coefficients are close to zero (|r| < 0.2).

Q7. # of total cars = 804
# of cars with liter equal to 3.8 = 160
# of cars with liter equal to 3.8 and mileage more than 20,000 = 95
Given that a car has a liter of 3.8, the probability that its mileage is greater than 20,000 is
P(Mileage > 20000 | Liter = 3.8) = P(Liter = 3.8 and Mileage > 20000) / P(Liter = 3.8)
= (95/804) / (160/804) = 95/160 = 0.5937
Thus, given that a car has a liter of 3.8, the estimated probability that its mileage is greater than 20,000 is about 59.4%.

Q8.
Cov.(P,M)     | SD(P)    | SD(M)    | Corr.(P,M)
-11589868.158 | 9884.853 | 8196.320 | -0.143
As the correlation between price and mileage is close to zero, there is only a weak (negative) linear relationship between them.
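Whether a correlation of about -0.143 is distinguishable from zero depends on the sample size. As a supplementary check (not part of the original answer), this Python sketch applies the usual t-test of $H_0: \rho = 0$, $t = r\sqrt{n-2}/\sqrt{1-r^2}$, assuming all 804 cars enter the calculation and using a normal approximation for the p-value.

```python
import math
from statistics import NormalDist

cov_pm, sd_p, sd_m, n = -11589868.158, 9884.853, 8196.320, 804
r = cov_pm / (sd_p * sd_m)  # Pearson correlation of price and mileage
# t statistic for H0: rho = 0, with n - 2 degrees of freedom
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
# Normal approximation to the two-sided p-value (df = 802 is large)
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
print(f"r = {r:.3f}, t = {t_stat:.2f}, p ~ {p_value:.1e}")
```

With these inputs t is about -4.1, so the correlation is weak but unlikely to be exactly zero.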
Section-C:

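Q1. Given $\bar{Y} \sim N(\mu, \sigma^2/n_1)$ with $\sigma^2 = 10$, and margin of error $E = 0.5$ for sample size $n_1$.
Confidence level $1 - \alpha = 0.954$, so $\alpha = 0.046$ and $\alpha/2 = 0.023$.
Corresponding Z value: $|Z_{\alpha/2}| = |Z_{0.023}| = 1.99$
From the standard normal distribution, $P\left(|\bar{Y} - \mu| \le |Z_{\alpha/2}|\,\sigma/\sqrt{n_1}\right) = 1 - \alpha$,
so $E = |Z_{\alpha/2}| \times \sigma/\sqrt{n_1}$
$0.5 = 1.99 \times \sqrt{10}/\sqrt{n_1}$
$n_1 = 1.99^2 \times 10/0.5^2 = 158.4 \approx 159$.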
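Q2. Given $\bar{Z} \sim N(\mu, \sigma^2/n_2)$ with $\sigma^2 = 9$, and margin of error $E = 1$ for sample size $n_2$.
Confidence level $1 - \alpha = 0.90$, so $\alpha = 0.10$ and $\alpha/2 = 0.05$.
Corresponding Z value: $|Z_{\alpha/2}| = |Z_{0.05}| = 1.65$
From the standard normal distribution, $P\left(|\bar{Z} - \mu| \le |Z_{\alpha/2}|\,\sigma/\sqrt{n_2}\right) = 1 - \alpha$,
so $E = |Z_{\alpha/2}| \times \sigma/\sqrt{n_2}$
$1 = 1.65 \times \sqrt{9}/\sqrt{n_2}$
$n_2 = 1.65^2 \times 9/1^2 = 24.5 \approx 25$.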
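The sample-size formula $n = (z_{\alpha/2}\,\sigma/E)^2$, rounded up, as a small Python sketch (the function name is illustrative). Rounding $z$ matters here: with $z = 1.99$ as above the formula gives 159, while the unrounded $z_{0.023} \approx 1.995$ gives 160; both sample sizes give a coverage probability of about 0.954.

```python
import math
from statistics import NormalDist

def sample_size(confidence, variance, error, z=None):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= error, for known sigma^2."""
    if z is None:
        z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * math.sqrt(variance) / error) ** 2)

n1_rounded_z = sample_size(0.954, 10, 0.5, z=1.99)  # z rounded as in the worked answer
n1_exact_z = sample_size(0.954, 10, 0.5)             # unrounded z, about 1.995
n2 = sample_size(0.90, 9, 1)
print(n1_rounded_z, n1_exact_z, n2)
```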
Q.3. 200 random samples, each of size 159, were generated from N(5, 3).

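Q.4. Since each sample is drawn from the normal distribution N(5, 3) and has size 159, the sample mean follows the normal distribution N(5, 3/159) (because the parent distribution is normal, this holds exactly, not only approximately by the CLT). This can be verified in the descriptive statistics of the 200 sample means and the histogram below:
Mean of the sample means = 5.003317 ≈ 5
Theoretical variance of the sample mean = 3/159 = 0.0189; observed variance of the 200 sample means ≈ 0.020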
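A Python sketch of the simulation in Q3 and Q4, assuming the standard-library `random` module with an arbitrary fixed seed (so its numbers will not match the values above exactly). Note that `random.gauss` takes the standard deviation $\sqrt{3}$, not the variance.

```python
import random
import statistics

random.seed(1)  # arbitrary seed, for reproducibility only
mu, var, n, reps = 5, 3, 159, 200
sigma = var ** 0.5  # random.gauss expects the standard deviation
means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]
print(f"mean of sample means = {statistics.fmean(means):.4f} (theory {mu})")
print(f"variance of sample means = {statistics.variance(means):.4f} (theory {var / n:.4f})")
```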

Q.5. 200 random samples, each of size 25, were generated from N(7, 3).


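Q.6. For a 95% confidence interval (CI) the corresponding Z value is 1.96. With known variance $\sigma^2 = 3$ and $n_2 = 25$, each CI is $\bar{x} \pm Z_{0.975}\,\sigma/\sqrt{n_2}$, with half-width $1.96 \times \sqrt{3/25} = 0.679$ and total width $2 \times 1.96 \times \sqrt{3/25} = 1.358$. The difference of means (sample mean minus 7) for each sample and the corresponding CI limits are plotted below.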

[Figure: difference of means for each of the 200 samples, with the upper CI limit (+0.679) and lower CI limit (-0.679).]
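A Python sketch of the coverage experiment, assuming each interval is $\bar{x} \pm 1.96\sqrt{3/25}$ around the sample mean. An interval contains the true mean 7 exactly when the plotted difference lies between -0.679 and +0.679, so the fraction of covered intervals should be close to 95%. The seed is arbitrary.

```python
import math
import random
import statistics

random.seed(2)  # arbitrary seed, for reproducibility only
mu, var, n, reps = 7, 3, 25, 200
half_width = 1.96 * math.sqrt(var / n)  # known-variance z interval, about 0.679
covered = 0
for _ in range(reps):
    xbar = statistics.fmean(random.gauss(mu, math.sqrt(var)) for _ in range(n))
    # The interval covers mu exactly when |xbar - mu| <= half_width.
    if xbar - half_width <= mu <= xbar + half_width:
        covered += 1
coverage = covered / reps
print(f"half-width = {half_width:.3f}, coverage = {coverage:.1%} (nominal 95%)")
```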

Section D:
Q.1. I collected the stock prices of Monnet Ispat & Energy Ltd, GAIL (India) Ltd, Alstom India Ltd, ABB India Ltd and Siemens Ltd from 01.01.2016 to 30.06.2016.

Q2. Histogram of stock returns

Q.3. H0: The average stock returns of the five companies are equal ($R_1 = R_2 = R_3 = R_4 = R_5$).
HA: At least one average stock return differs from the others.
Reject H0 if the p-value < 0.05.

Since the p-value is greater than 0.05, we do not reject H0 and conclude that the average returns of these companies are equal.

Q4.&5. Hypothesis Statement
H0: The average stock returns of the two companies in a pair are equal.
HA: The average stock returns of the two companies in a pair are not equal.
Rejection Rule:
Reject H0 if $t_{cal}$ lies in the rejection region, i.e. $t_{cal} > t_{0.975,df}$ or $t_{cal} < t_{0.025,df}$.
Observation:
In all cases $t_{0.025,df} < t_{cal} < t_{0.975,df}$, i.e. $t_{cal}$ lies in the acceptance region. Hence we do not have sufficient evidence to reject H0, and we conclude that for every pair of companies the average stock returns are equal.

Section E:
Q.1

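Since the sample size (100) is more than 30, by the CLT the average income of each sample approximately follows a normal distribution, $N(\mu = 40000, SD = 40000/\sqrt{100} = 4000)$; for an exponential distribution the standard deviation equals the mean, ₹40,000.
Mean of the sample averages = 40260.59 ≈ 40000
Standard deviation of the sample averages = 3983.923 ≈ 4000 (= 40000/10)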
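A Python sketch of the exponential-income simulation, assuming the standard-library `random` module (an arbitrary seed, so results differ slightly from the values above). `random.expovariate` takes the rate $1/40000$, not the mean.

```python
import random
import statistics

random.seed(3)  # arbitrary seed, for reproducibility only
mean_income, n, reps = 40000, 100, 500
# expovariate takes the rate lambda = 1 / mean
sample_means = [statistics.fmean(random.expovariate(1 / mean_income) for _ in range(n))
                for _ in range(reps)]
print(f"mean = {statistics.fmean(sample_means):.0f} (theory 40000)")
print(f"SD = {statistics.stdev(sample_means):.0f} (theory {mean_income / n ** 0.5:.0f})")
```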

Q.2 Since the sample size (50) is more than 30, by the CLT the average age of the samples should approximately follow a normal distribution.
Using the group midpoints, the population average age is 25.8 and the population standard deviation is 5.216, so the sample average age should have mean 25.8 and standard deviation $5.216/\sqrt{50} = 0.738$. The simulated sample averages had mean 25.09 and standard deviation 0.64.
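A Python sketch of the age simulation. It assumes every person in an age group is assigned the group midpoint (this assumption reproduces the population mean 25.8 and SD 5.216 quoted above); sampling uniformly within each group would give a slightly larger SD. The seed is arbitrary.

```python
import math
import random
import statistics

# Group midpoints (years) and proportions from the age table in the question
midpoints = [16.5, 19.5, 22, 24, 26, 28, 30, 32, 34]
weights = [0.1, 0.1, 0.1, 0.1, 0.2, 0.1, 0.1, 0.1, 0.1]
pop_mean = sum(m * w for m, w in zip(midpoints, weights))
pop_sd = math.sqrt(sum(w * (m - pop_mean) ** 2 for m, w in zip(midpoints, weights)))
se = pop_sd / math.sqrt(50)  # SD of the mean of a sample of size 50

random.seed(4)  # arbitrary seed, for reproducibility only
sample_means = [statistics.fmean(random.choices(midpoints, weights=weights, k=50))
                for _ in range(100)]
print(f"population: mean = {pop_mean:.2f}, SD = {pop_sd:.3f}, SE(n=50) = {se:.3f}")
print(f"simulated: mean = {statistics.fmean(sample_means):.2f}, "
      f"SD = {statistics.stdev(sample_means):.3f}")
```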

