Tutorial: CTM 2019 Probability and Statistics For Data Sciences

TUTORIAL
CTM 2019
Probability and Statistics for Data Sciences

1. Suppose the average number of lions
seen on a 1-day safari is 5.
 What is the sample space of the number of lions seen on any 1-day safari?
 {0, 1, 2, 3, …, N}, where N is the total number of lions in existence
 Under the Poisson model, N is treated as unbounded, giving the countably infinite sample space {0, 1, 2, …}
 What family of probability distributions does it belong to? Discrete or continuous?
 Poisson distribution – discrete
 What are the mean and variance of the distribution?
 Mean = Variance = λ = 5
 What is the probability that tourists will see 3 lions on the next 1-day safari?
 Let X be the number of lions seen on a 1-day safari, so X ~ Poisson(5)
 $P(X = 3) = \dfrac{e^{-5}\,5^{3}}{3!} = 0.1404$
 What is the probability that tourists will see fewer than 4 lions on the next 1-day safari?
 $P(X < 4) = P(X=0) + P(X=1) + P(X=2) + P(X=3)$
 $= \dfrac{e^{-5}\,5^{0}}{0!} + \dfrac{e^{-5}\,5^{1}}{1!} + \dfrac{e^{-5}\,5^{2}}{2!} + \dfrac{e^{-5}\,5^{3}}{3!}$
 $= 0.0067 + 0.0337 + 0.0842 + 0.1404 = 0.2650$
 What is the probability that tourists will see 5 lions on a 2-day safari?
 The average number of lions on a 1-day safari is 5.
 This implies that for a 2-day safari, the average number of lions is 2*5 = 10.
 Let Y be the number of lions seen on a 2-day safari, so Y ~ Poisson(10).
 $P(Y = 5) = \dfrac{e^{-10}\,10^{5}}{5!} = 0.0378$
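The rate-scaling step can be sketched in code as follows. It assumes, as the solution does, that the Poisson rate grows proportionally with the length of the safari; the variable names are illustrative.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Return P(Y = k) for Y ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


daily_rate = 5.0  # average lions per day (from the problem)
days = 2          # length of the safari in days

# Assumption: the rate scales linearly with the observation window
lam_two_days = daily_rate * days

p_five_in_two_days = poisson_pmf(5, lam_two_days)
print(f"lambda for {days} days = {lam_two_days}")
print(f"P(Y = 5) = {p_five_in_two_days:.4f}")
```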
2. Let X and Y be two jointly continuous
random variables with joint PDF
$$f_{XY}(x,y) = \begin{cases} 6xy, & 0 \le x \le 1,\ 0 \le y \le \sqrt{x} \\ 0, & \text{otherwise} \end{cases}$$
 Find $f_X(x)$ and $f_Y(y)$
 Note: $R_X = R_Y = [0,1]$
 To find $f_X(x)$ for $0 \le x \le 1$, we can write
$$f_X(x) = \int_0^{\sqrt{x}} 6xy\,dy = 3x^2 \quad \text{for } 0 \le x \le 1,\ 0 \text{ otherwise}$$
 To find $f_Y(y)$ for $0 \le y \le 1$, we can write
$$f_Y(y) = \int_{y^2}^{1} 6xy\,dx = 3y(1-y^4) \quad \text{for } 0 \le y \le 1,\ 0 \text{ otherwise}$$
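The two marginals can be checked numerically. The sketch below assumes the support is 0 ≤ y ≤ √x, as the integration limits indicate, and uses a simple midpoint rule; `joint_pdf`, `midpoint_integral`, `marginal_x` and `marginal_y` are illustrative names.

```python
import math


def joint_pdf(x: float, y: float) -> float:
    """f_XY(x, y) = 6xy on 0 <= x <= 1, 0 <= y <= sqrt(x); 0 elsewhere."""
    if 0.0 <= x <= 1.0 and 0.0 <= y <= math.sqrt(x):
        return 6.0 * x * y
    return 0.0


def midpoint_integral(f, a: float, b: float, n: int = 2000) -> float:
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def marginal_x(x: float) -> float:
    """Integrate the joint PDF over y from 0 to sqrt(x)."""
    return midpoint_integral(lambda y: joint_pdf(x, y), 0.0, math.sqrt(x))


def marginal_y(y: float) -> float:
    """Integrate the joint PDF over x from y^2 to 1."""
    return midpoint_integral(lambda x: joint_pdf(x, y), y**2, 1.0)


for v in (0.2, 0.5, 0.9):
    print(f"f_X({v}) numeric = {marginal_x(v):.6f}, closed form = {3 * v**2:.6f}")
    print(f"f_Y({v}) numeric = {marginal_y(v):.6f}, closed form = {3 * v * (1 - v**4):.6f}")
```

Because the integrand is linear in the integration variable, the midpoint rule reproduces the closed forms up to floating-point error.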

 Are X and Y independent?
 $f_{XY}(x,y) \ne f_X(x) f_Y(y)$ in general, since $f_X(x) f_Y(y) = 9x^2 y(1-y^4)$ is not identically $6xy$ on the support
 Therefore, X and Y are not independent.
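A quick numerical spot check of the independence claim, assuming the support 0 ≤ y ≤ √x and the marginals 3x² and 3y(1 − y⁴) derived earlier. The two test points are arbitrary illustrative choices: one inside the support and one outside it.

```python
import math


def joint_pdf(x: float, y: float) -> float:
    """f_XY(x, y) = 6xy on 0 <= x <= 1, 0 <= y <= sqrt(x); 0 elsewhere."""
    if 0.0 <= x <= 1.0 and 0.0 <= y <= math.sqrt(x):
        return 6.0 * x * y
    return 0.0


def f_x(x: float) -> float:
    """Marginal PDF of X: 3x^2 on [0, 1]."""
    return 3.0 * x**2 if 0.0 <= x <= 1.0 else 0.0


def f_y(y: float) -> float:
    """Marginal PDF of Y: 3y(1 - y^4) on [0, 1]."""
    return 3.0 * y * (1.0 - y**4) if 0.0 <= y <= 1.0 else 0.0


# (0.5, 0.5) lies inside the support; (0.1, 0.9) lies outside it
for x, y in [(0.5, 0.5), (0.1, 0.9)]:
    joint = joint_pdf(x, y)
    product = f_x(x) * f_y(y)
    print(f"(x, y) = ({x}, {y}): joint = {joint:.4f}, product of marginals = {product:.4f}")
```

A single point where the joint density differs from the product of the marginals is enough to rule out independence.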
 Find the conditional PDF of X given Y=y
 Conditional PDF:
$$f_{X|Y}(x|y) = \frac{f_{XY}(x,y)}{f_Y(y)} = \frac{6xy}{3y(1-y^4)} = \frac{2x}{1-y^4}, \quad y^2 \le x \le 1;\ 0 \text{ otherwise}$$
 Find E[X|Y=y] for 0 ≤ y ≤ 1
$$E[X \mid Y=y] = \int_{y^2}^{1} x \cdot \frac{2x}{1-y^4}\,dx = \frac{2(1-y^6)}{3(1-y^4)}$$
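The conditional PDF and the conditional expectation can be sanity-checked numerically. This sketch assumes 0 ≤ y < 1 (at y = 1 the expression is 0/0) and uses a midpoint rule; all function names are illustrative.

```python
def conditional_pdf(x: float, y: float) -> float:
    """f_{X|Y}(x|y) = 2x / (1 - y^4) for y^2 <= x <= 1, assuming 0 <= y < 1."""
    if y**2 <= x <= 1.0:
        return 2.0 * x / (1.0 - y**4)
    return 0.0


def midpoint_integral(f, a: float, b: float, n: int = 2000) -> float:
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def conditional_mean_numeric(y: float) -> float:
    """Integrate x * f_{X|Y}(x|y) over x from y^2 to 1."""
    return midpoint_integral(lambda x: x * conditional_pdf(x, y), y**2, 1.0)


def conditional_mean_closed_form(y: float) -> float:
    """Closed form from the solution: 2(1 - y^6) / (3(1 - y^4))."""
    return 2.0 * (1.0 - y**6) / (3.0 * (1.0 - y**4))


for y in (0.0, 0.5, 0.8):
    total = midpoint_integral(lambda x: conditional_pdf(x, y), y**2, 1.0)
    print(f"y = {y}: density integrates to {total:.6f}, "
          f"E[X|Y=y] numeric = {conditional_mean_numeric(y):.6f}, "
          f"closed form = {conditional_mean_closed_form(y):.6f}")
```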
3. A topic of interest in ophthalmology is whether
or not spherical refraction differs between the left
and right eye on average. In a study to investigate
this, refraction was measured on the left and right
eye of 17 patients. The differences (right-left) in
diopters were $d_1, d_2, \dots, d_{17}$ and elementary
calculations gave $\sum_{i=1}^{17} d_i = -3.50$ and
$\sum_{i=1}^{17} d_i^2 = 19.13$.
 Provide a 90% confidence interval for the average difference (right-left).
3a. Provide a 90% confidence interval for
the average difference (right-left).
 Sample mean $\bar{d} = \dfrac{\sum_{i=1}^{17} d_i}{n} = -3.50/17 = -0.2059$
 and sample variance $s^2 = \dfrac{1}{17-1}\left(19.13 - \dfrac{(-3.5)^2}{17}\right) = 1.15059$
 Therefore, standard deviation $s = 1.07266$
 90% confidence interval using the t distribution with 16 degrees of freedom ($t_{0.05,16} = 1.746$):
$$-0.2059 \pm 1.746 \cdot \frac{1.07266}{\sqrt{17}} = (-0.66,\ 0.25)$$
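The interval can be reproduced from the two summary sums. This is a minimal sketch; the critical value 1.746 is taken from the t table as quoted in the solution rather than computed, and the variable names are illustrative.

```python
import math

n = 17            # number of patients
sum_d = -3.50     # sum of the differences d_i
sum_d_sq = 19.13  # sum of the squared differences d_i^2
t_crit = 1.746    # t_{0.05, 16} from the t table (hard-coded, not computed)

mean_d = sum_d / n
# Sample variance via the computational formula (sum d^2 - (sum d)^2 / n) / (n - 1)
var_d = (sum_d_sq - sum_d**2 / n) / (n - 1)
sd_d = math.sqrt(var_d)

margin = t_crit * sd_d / math.sqrt(n)
ci = (mean_d - margin, mean_d + margin)

print(f"mean = {mean_d:.4f}, variance = {var_d:.5f}, sd = {sd_d:.5f}")
print(f"90% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Since the interval contains 0, these data alone do not show a difference between the eyes at the 10% level.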
3b. If the population variance of the difference in
diopters is known to be 1.2, using the same sample
of 17 patients, calculate the confidence level
at which the interval for the average difference
is -0.9568 to 0.545.
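 Given CI: (-0.9568, 0.545)
 The half-width of a z-interval is $z_{\alpha/2}\,\sigma/\sqrt{n}$, so
$$\frac{0.545-(-0.9568)}{2} = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
 Taking σ = 1.2 (this uses 1.2 as the population standard deviation): 1.5018/2 = (1.2/√17) * $z_{\alpha/2}$
 $z_{\alpha/2}$ = 0.7509/0.2910 = 2.58
 α = 0.01
 99% confidence level
 If 1.2 is read strictly as the variance, σ = √1.2 ≈ 1.095, $z_{\alpha/2}$ ≈ 2.83 and the confidence level is about 99.5%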
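The back-calculation of the confidence level from the interval width can be sketched with the standard library's `statistics.NormalDist`. The function name `confidence_level` is illustrative; `sigma` is passed explicitly because the worked solution uses σ = 1.2, while the problem statement calls 1.2 the variance.

```python
import math
from statistics import NormalDist


def confidence_level(lower: float, upper: float, sigma: float, n: int) -> tuple[float, float]:
    """Return (z, confidence) for a z-interval [lower, upper] with known sigma."""
    half_width = (upper - lower) / 2
    z = half_width / (sigma / math.sqrt(n))
    # Two-sided confidence: P(-z < Z < z) = 2 * Phi(z) - 1
    confidence = 2 * NormalDist().cdf(z) - 1
    return z, confidence


# sigma = 1.2, as used in the worked solution
z_sd, conf_sd = confidence_level(-0.9568, 0.545, sigma=1.2, n=17)
print(f"sigma = 1.2:       z = {z_sd:.2f}, confidence = {conf_sd:.4f}")

# Alternative reading: 1.2 is the variance, so sigma = sqrt(1.2)
z_var, conf_var = confidence_level(-0.9568, 0.545, sigma=math.sqrt(1.2), n=17)
print(f"sigma = sqrt(1.2): z = {z_var:.2f}, confidence = {conf_var:.4f}")
```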
3c. If the population variance of the difference in
diopters is known to be 1.2, for a confidence level
of 95%, what should be the minimum sample size
such that the error in estimating the average
difference is less than 0.2?
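 Require the margin of error to satisfy $\dfrac{\sigma}{\sqrt{n}}\,z_{\alpha/2} < 0.2$, with $z_{0.025} = 1.96$
 Taking σ = 1.2 (1.2 is used here as the standard deviation): (1.96*1.2)/√n < 0.2
 √n > (1.96*1.2)/0.2
 √n > 11.76
 n > 138.30
 The sample size must be at least 139.
 If 1.2 is read strictly as the variance, σ = √1.2 and the same steps give n > 115.25, i.e. at least 116.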
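The sample-size calculation generalises to any σ, error bound and confidence level. A minimal sketch using `statistics.NormalDist` for the critical value; `min_sample_size` is an illustrative name, and σ = 1.2 follows the worked solution.

```python
import math
from statistics import NormalDist


def min_sample_size(sigma: float, max_error: float, confidence: float) -> int:
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) strictly below max_error."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    bound = (z * sigma / max_error) ** 2  # n must strictly exceed this value
    return math.floor(bound) + 1


# sigma = 1.2, as used in the worked solution
n_required = min_sample_size(sigma=1.2, max_error=0.2, confidence=0.95)
print(f"sigma = 1.2: n >= {n_required}")

# Alternative reading: 1.2 is the variance, so sigma = sqrt(1.2)
n_required_var = min_sample_size(sigma=math.sqrt(1.2), max_error=0.2, confidence=0.95)
print(f"sigma = sqrt(1.2): n >= {n_required_var}")
```

Using `floor(bound) + 1` rather than `ceil(bound)` keeps the inequality strict when the bound happens to be a whole number.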
4. Two companies manufacture a rubber material
intended for use in an automotive application. The
part will be subjected to abrasive wear in the field
application, so we decide to compare the material
produced by each company in a test. Twenty-five
samples of material from each company are tested in
an abrasion test, and the amount of wear after 1000
cycles is observed. For company 1, the sample mean
and standard deviation of wear are x̅1 = 20
milligrams/1000 cycles and s1 = 2 milligrams/1000
cycles, while for company 2 we obtain x̅2 = 15
milligrams/1000 cycles and s2 = 8 milligrams/1000
cycles.
a) Do the data support the claim that the two companies produce
material with different mean wear? Use α = 0.05, and assume each
population is normally distributed but that their variances are not
equal.
a) H0: µ1 - µ2 = 0, or µ1 = µ2
H1: µ1 - µ2 ≠ 0, or µ1 ≠ µ2
α = 0.05, two-tailed test
The t-test statistic (independent samples, unequal variances) is:
$$t_0 = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$
Under H0, t0 approximately follows a t distribution with k degrees of freedom, where, using the Welch-Satterthwaite approximation,
$$k = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1-1} + \dfrac{(s_2^2/n_2)^2}{n_2-1}} \approx 27$$
Reject the null hypothesis if $t_0 < -t_{0.025,27} = -2.052$ or $t_0 > t_{0.025,27} = 2.052$.
x̅1 = 20, x̅2 = 15, d0 = 0
s1 = 2, s2 = 8, n1 = 25, n2 = 25
$$t_0 = \frac{(20 - 15) - 0}{\sqrt{2^2/25 + 8^2/25}} = 3.03$$
Since 3.03 > 2.052, reject the null hypothesis and conclude that the data support the
claim that the two companies produce material with significantly different mean wear at
the 0.05 level of significance.
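The test statistic and its degrees of freedom can be computed from the summary statistics alone. This sketch uses the Welch-Satterthwaite approximation for the degrees of freedom, which is an assumption about the formula intended on the slide; the critical value 2.052 is the table value quoted in the solution, and `welch_t` is an illustrative name.

```python
import math


def welch_t(mean1: float, s1: float, n1: int,
            mean2: float, s2: float, n2: int, d0: float = 0.0) -> tuple[float, float]:
    """Return (t0, df) for a two-sample t-test with unequal variances."""
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    t0 = (mean1 - mean2 - d0) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t0, df


t0, df = welch_t(20, 2, 25, 15, 8, 25)
t_crit = 2.052  # t_{0.025, 27} from the t table (hard-coded, not computed)
reject = abs(t0) > t_crit

print(f"t0 = {t0:.2f}, df = {df:.2f}, reject H0: {reject}")
```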
b) What is the P-value for this test?
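P-value = $2P(t_{27} > 3.03)$ (two-tailed)
From the t table with 27 degrees of freedom, $t_{0.005,27} = 2.771 < 3.03 < t_{0.0025,27} = 3.057$, so $0.0025 < P(t_{27} > 3.03) < 0.005$ and
0.005 < P-value < 0.010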
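The table bracket can be tightened by integrating the t density numerically. This is a sketch using only the standard library: it applies Simpson's rule to the Student t PDF with 27 degrees of freedom. The function names are illustrative, and a statistics package would normally be used instead.

```python
import math


def t_pdf(t: float, df: float) -> float:
    """Density of the Student t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p_value(t0: float, df: float, n: int = 2000) -> float:
    """Return 2 * P(T > |t0|), using Simpson's rule on [0, |t0|] (n must be even)."""
    b = abs(t0)
    h = b / n
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_b = total * h / 3
    # By symmetry, P(|T| <= b) = 2 * area_0_to_b
    return 1.0 - 2.0 * area_0_to_b


p_value = two_sided_p_value(3.03, 27)
print(f"two-sided P-value = {p_value:.4f}")
```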
c) Do the data support a claim that the material
from company 1 has higher mean wear than the
material from company 2? Use the same
assumptions as in part (a).
One-tailed test
H0: µ1 - µ2 ≤ 0, H1: µ1 - µ2 > 0
α = 0.05
Test statistic:
$$t_0 = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$
Reject the null hypothesis if $t_0 > t_{0.05,27}$, where $t_{0.05,27} = 1.703$
x̅1 = 20, x̅2 = 15, d0 = 0
s1 = 2, s2 = 8, n1 = 25, n2 = 25
$$t_0 = \frac{(20 - 15) - 0}{\sqrt{2^2/25 + 8^2/25}} = 3.03$$
Since 3.03 > 1.703, reject the null hypothesis and conclude that the data support the claim that the
material from company 1 has higher mean wear than the material from company 2 at the 0.05
level of significance.
