1. For a discrete random variable, the variance is given by $\mathrm{Var}(X) = E(X^2) - [E(X)]^2$. Why do we not need to use the $n-1$ divisor of the unbiased estimator of variance in this formula?
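These are two different concepts: $\mathrm{Var}(X)$ is a property of the random variable's distribution (the population), whereas the $n-1$ divisor only appears when estimating the variance from a sample of data.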
In the book, an alternative notation for $\mathrm{Var}(X)$ is $s_x^2$ (from the FM Stats book, page 27).
$s_x^2$ is the sample variance, estimated from the data.
For example, suppose we collected data on the number of spam calls per hour for a week.
Let's say there is a mean rate of 2 spam calls per hour. Why, when calculating the variance for a *sample*, do we not use the unbiased estimator?
Obviously, we cannot record the number of spam calls for every possible week, so the data we obtain is just a *sample* rather than an entire *population*. Is there a difference between these random variable variances and sample variances?
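A small sketch of the distinction, assuming (for illustration only) that the hourly spam-call count follows a Poisson distribution with mean 2, and using made-up hourly counts. $\mathrm{Var}(X)$ is computed directly from the probability distribution as $E(X^2) - [E(X)]^2$, so no $n$ or $n-1$ appears. The sample figures come from data: Python's `statistics.pvariance` divides by $n$ and `statistics.variance` divides by $n-1$.

```python
import math
import statistics

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam). The Poisson model is an assumption for this sketch."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def variance_from_pmf(pmf):
    """Var(X) = E(X^2) - [E(X)]^2 from a known distribution: no n or n - 1 involved."""
    mean = sum(x * p for x, p in pmf.items())
    mean_sq = sum(x * x * p for x, p in pmf.items())
    return mean_sq - mean**2

# Random-variable (population) side: the distribution is known, so Var(X) is a fixed number.
lam = 2.0
pmf = {k: poisson_pmf(k, lam) for k in range(60)}  # truncated; the tail beyond 59 is negligible
var_x = variance_from_pmf(pmf)

# Sample side: hypothetical hourly counts, so we can only estimate the variance.
counts = [2, 1, 3, 2, 4, 0, 2, 1, 3, 2]
s2_biased = statistics.pvariance(counts)   # divides by n
s2_unbiased = statistics.variance(counts)  # divides by n - 1

print(f"Var(X) from the distribution:      {var_x:.4f}")
print(f"Sample variance (divide by n):     {s2_biased:.4f}")
print(f"Sample variance (divide by n - 1): {s2_unbiased:.4f}")
```

With these counts the two sample estimates are 1.2 and about 1.333, while $\mathrm{Var}(X)$ from the assumed distribution is 2; only the sample calculation involves a choice of divisor.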
a. I understand that $H_0$ needs to be precise. Why, when we do $\chi^2$ tests of whether two quantities are independent or dependent, does $H_0$ assume that the two quantities are independent? In other words, why is the two quantities being independent more precise? Is it because we can calculate the expected values if and only if the two quantities are independent?
Yes.
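To illustrate why independence gives a precise $H_0$: under independence, each expected frequency is (row total × column total) / grand total, so every expected count is fully determined. The sketch below uses a hypothetical 2×2 table of observed counts.

```python
def expected_counts(table):
    """Expected frequencies under H0 (independence): row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_squared_statistic(table):
    """Sum of (O - E)^2 / E over all cells, with E computed assuming independence."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

observed = [[20, 30], [30, 20]]  # hypothetical observed counts
expected = expected_counts(observed)
stat = chi_squared_statistic(observed)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected)   # [[25.0, 25.0], [25.0, 25.0]]
print(stat, dof)  # 4.0 1
```

Under a "dependent" hypothesis there would be no single set of expected counts to compare against, which is why dependence cannot serve as $H_0$.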
d. In the example above about rainy days in May, I used to just write $X \sim B(31, 0.5)$, where $X$ is a discrete random variable giving the number of rainy days in May. Is this equivalent to defining a test statistic and stating what distribution the test statistic follows?
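As a sketch of how that statement is then used: once $X \sim B(31, 0.5)$ is taken as the distribution of the test statistic under $H_0$, the p-value is a tail probability of that distribution. The observed count of 21 rainy days and the one-tailed alternative $p > 0.5$ are hypothetical, since the original example is not reproduced here.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ B(n, p): a one-tailed p-value when H1 is p > p0."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

n, p0 = 31, 0.5   # test statistic X ~ B(31, 0.5) under H0
observed = 21     # hypothetical observed number of rainy days
p_value = binom_upper_tail(observed, n, p0)
print(f"P(X >= {observed} | H0) = {p_value:.5f}")
```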
e. Is there a model answer for Q3 from the 0512 prep? It is not included in the Word document on the course website. I also don't really understand what you wrote on the whiteboard in the lesson on 15/5.
b. What does “sampling distribution of the mean” / “the distribution of sample means” mean?
The distribution of $\bar{X}$.
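A short derivation of the two facts that define this sampling distribution, assuming $X_1, \dots, X_n$ are iid with mean $\mu$ and variance $\sigma^2$ (independence is what allows the variances to be added):

```latex
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
E(\bar{X}) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \mu,
\qquad
\operatorname{Var}(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i) = \frac{\sigma^2}{n}
```

If each $X_i \sim N(\mu, \sigma^2)$, then $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ exactly; for other parent distributions the normal shape is only approximate for large $n$ (central limit theorem).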
c. (From the Single maths A2 book, page 384) Let's say the sample size is 50. Does it mean we split the parent population into lots of different groups of 50 and take their means, so that the distribution of those means would be $N\left(\mu, \frac{\sigma^2}{50}\right)$? (Assuming the parent population follows the distribution $N(\mu, \sigma^2)$.)
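A simulation sketch of this idea. One common way to picture the sampling distribution is as repeatedly drawing independent samples of size 50 from the parent population and recording each sample mean. The parent values $\mu = 15$, $\sigma = 5$, the seed, and the 10,000 repetitions are arbitrary choices for illustration.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, num_samples, seed=0):
    """Draw num_samples independent samples of size n from N(mu, sigma^2); return their means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = []
    for _ in range(num_samples):
        # One sample: X_1, ..., X_n iid N(mu, sigma^2)
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

mu, sigma, n = 15.0, 5.0, 50  # hypothetical parent population N(15, 5^2)
means = simulate_sample_means(mu, sigma, n, num_samples=10_000)

mean_of_means = statistics.fmean(means)
var_of_means = statistics.variance(means)
print(f"mean of sample means:     {mean_of_means:.3f} (theory: {mu})")
print(f"variance of sample means: {var_of_means:.3f} (theory: {sigma**2 / n})")
```

The simulated variance of the means should land close to $\sigma^2 / 50 = 0.5$, not close to $\sigma^2 = 25$.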
d. Can I therefore say that the sample variance $S_x^2$ is $\frac{\sigma^2}{50}$? (I don't think that's correct, because in the example on page 388 the calculated sample variance $S_x^2$ is treated as the population variance, since they used the variance $\frac{S_x^2}{50}$ when calculating the test statistic $z$.) Or is the sample variance $S_x^2$ just an estimate of $\sigma^2$, so that we still need to divide by $n$ when calculating the test statistic $z$?
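One way to set out the roles of these quantities for a large sample, as a sketch in the notation of the question ($\mu_0$, the mean under $H_0$, is notation introduced here): $S_x^2$ estimates $\sigma^2$ itself, and the division by $n$ happens only when forming the variance of the sample mean.

```latex
S_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2 \approx \sigma^2,
\qquad
\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n} \approx \frac{S_x^2}{n},
\qquad
z = \frac{\bar{x} - \mu_0}{\sqrt{S_x^2 / n}}
```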
4. We had a few lessons on the distribution of the errors. How can we use that in A-level statistics? Is it just an extension, proving why the mean and median are used (to minimise different error terms)?
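A quick numerical check of the idea in the question, using a small hypothetical data set: searching a grid of candidate values shows that the sum of squared errors is smallest at the mean, while the sum of absolute errors is smallest at the median. The grid search is only for illustration; both results can also be shown algebraically.

```python
import statistics

def sum_squared_errors(data, c):
    """Total squared error if every observation is predicted by the single value c."""
    return sum((x - c) ** 2 for x in data)

def sum_absolute_errors(data, c):
    """Total absolute error if every observation is predicted by the single value c."""
    return sum(abs(x - c) for x in data)

data = [1, 2, 3, 4, 10]                        # hypothetical observations
candidates = [i / 100 for i in range(0, 1101)]  # 0.00, 0.01, ..., 11.00

best_sq = min(candidates, key=lambda c: sum_squared_errors(data, c))
best_abs = min(candidates, key=lambda c: sum_absolute_errors(data, c))

print(best_sq, best_abs)  # 4.0 3.0
print(statistics.mean(data), statistics.median(data))
```

The outlier 10 pulls the squared-error minimiser (the mean) upwards but leaves the absolute-error minimiser (the median) at 3.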
5. About the 0519 prep, question 3 (Normal distribution): can you explain what the iid RVs are?
Independent and identically distributed RVs
Are they just the random variables of the groups of 75 samples? Using the example I mentioned in question 3b, are they the “lots of different groups of 50”?
Let μ be the population mean of the time taken to perform this task.
$H_0: \mu = 15$
$H_1: \mu > 15$
Significance level: 1%
Let $T_i$ be the time taken for the $i$th randomly chosen person to complete the task, where $i = 1, 2, \dots, 75$. Then the $T_i$ are iid RVs following the $N(\mu, \sigma^2)$ distribution.
Hence, $\bar{T} \sim N\left(\mu, \frac{\sigma^2}{75}\right)$.
As the sample size is 75 and the population variance is unknown, we can use the sample
variance as a reliable estimate of the population variance, thus:
$\hat{\sigma}^2 = s^2 = \frac{1}{75-1}\left(21708 - \frac{1215^2}{75}\right) \approx 27.36486$
$p\text{-value} = P\left(\bar{T} \ge \frac{1215}{75} \mid H_0\right) \approx 0.02348 > 0.01$
Since the p-value exceeds the 1% significance level, there is insufficient evidence to reject the null hypothesis. Therefore we conclude that there is insufficient evidence, at the 1% level, to suggest that the mean time taken to complete the task is more than 15 minutes.
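The arithmetic in this solution can be checked with a short script using Python's `statistics.NormalDist`. It uses only the summary figures from the solution ($n = 75$, $\sum t = 1215$, $\sum t^2 = 21708$) and, like the solution, treats $s^2$ as if it were the population variance.

```python
from math import sqrt
from statistics import NormalDist

# Summary figures taken from the worked solution
n = 75
sum_t = 1215     # sum of the 75 times
sum_t2 = 21708   # sum of the squared times
mu0 = 15         # mean under H0
alpha = 0.01     # significance level

t_bar = sum_t / n                       # sample mean
s2 = (sum_t2 - sum_t**2 / n) / (n - 1)  # unbiased estimate of sigma^2
se = sqrt(s2 / n)                       # estimated standard deviation of T-bar

# Under H0, T-bar ~ N(mu0, s2 / n); upper tail because H1: mu > 15
p_value = 1 - NormalDist(mu=mu0, sigma=se).cdf(t_bar)

print(f"s^2 = {s2:.5f}")
print(f"p-value = {p_value:.5f}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

It should reproduce $\bar{t} = 16.2$, $s^2 \approx 27.36486$ and a p-value of about 0.0235, which is above 0.01.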