tMM5BD1 (applied) Statistics Q&A 11/6/2020

1. For a discrete random variable, the variance is given by $\mathrm{Var}(X) = E(X^2) - [E(X)]^2$. Why do we
not need the $n-1$ divisor of the unbiased estimator of variance in this formula?
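They are two different concepts: $\mathrm{Var}(X)$ is a property of a known distribution, whereas the
$n-1$ divisor applies when a variance is estimated from sample data.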

In the book, an alternate notation for $\mathrm{Var}(x)$ is $s_x^2$ (from the FM Stats book, page 27).
$s_x^2$ is the sample variance, estimated from the data.

For example, suppose we collected data on the number of spam calls per hour for a week.
Let’s say the mean rate is 2 spam calls per hour. Why, when calculating the variance
for a *sample*, do we not use the unbiased estimator?

Probability: theoretical. We know the distribution and its parameters, then calculate probabilities.

Statistics: we assume a model, observe the data, then assess the model assumptions and estimate
the model parameters (subjective).

Obviously, we cannot record the number of spam calls for all possible weeks, so the
data we obtain is just a *sample* rather than an entire *population*. Is there a
difference between the variance of a random variable and a sample variance?

An estimator is a function of the observations, so it takes different values when we have
different samples. Therefore, it makes sense to talk about the mean and variance of an
estimator.

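For instance, the sample mean is $\bar{X} = \frac{1}{n}(X_1 + X_2 + \dots + X_n)$.

$E(\bar{X}) = \frac{1}{n} \cdot n \cdot E(X_1) = E(X_1) = \mu$

$\mathrm{Var}(\bar{X}) = \frac{1}{n^2} \cdot n \cdot \mathrm{Var}(X_1) = \frac{\sigma^2}{n}$

since $\mathrm{Var}(X_1 + X_2 + \dots + X_n) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \dots + \mathrm{Var}(X_n) = n\,\mathrm{Var}(X_1)$ by independence.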
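
The difference between the two variances can also be checked numerically. The following is only a sketch under stated assumptions: the pmf is made up for illustration (it is not the spam-call data), and the names `pmf` and `sample_variance`, the sample size and the seed are all hypothetical choices. It first computes $\mathrm{Var}(X) = E(X^2) - [E(X)]^2$ from a known distribution, where no $n-1$ appears, and then simulates many small samples to compare dividing by $n-1$ with dividing by $n$.

```python
import random

# Hypothetical pmf of a discrete random variable X (value -> probability).
pmf = {0: 0.1, 1: 0.2, 2: 0.4, 3: 0.2, 4: 0.1}

# Probability side: the distribution is known, so Var(X) is an exact number.
mean_x = sum(x * p for x, p in pmf.items())
mean_x2 = sum(x * x * p for x, p in pmf.items())
true_var = mean_x2 - mean_x ** 2  # E(X^2) - [E(X)]^2, no n - 1 involved


def sample_variance(data, divisor_offset):
    """Sum of squared deviations from the sample mean, divided by n - divisor_offset."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - divisor_offset)


# Statistics side: we only see samples, so the variance has to be estimated.
rng = random.Random(0)  # fixed seed so the run is repeatable
values, probs = list(pmf), list(pmf.values())
n, repetitions = 5, 20000

unbiased_total = 0.0
biased_total = 0.0
for _ in range(repetitions):
    sample = rng.choices(values, weights=probs, k=n)
    unbiased_total += sample_variance(sample, 1)  # divide by n - 1
    biased_total += sample_variance(sample, 0)    # divide by n

avg_unbiased = unbiased_total / repetitions
avg_biased = biased_total / repetitions

print(f"Var(X) from the pmf:        {true_var:.3f}")
print(f"average of n-1 estimator:   {avg_unbiased:.3f}")
print(f"average of n estimator:     {avg_biased:.3f}")
```

In a run like this, the average of the $n-1$ estimator should land close to the exact $\mathrm{Var}(X)$, while the divide-by-$n$ version should come out lower by a factor of about $(n-1)/n$.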


2. In the whiteboard from the lesson on 15/5
(https://whiteboard.microsoft.com/me/whiteboards/73f2c237-528d-45c1-95ab-55c7a30016e7),
I have some questions regarding hypothesis testing:

a. I understand that $H_0$ needs to be precise. Why, when we do $\chi^2$ tests of whether two
quantities are independent or dependent, does $H_0$ assume that the two quantities are
independent? In other words, why is independence the more precise hypothesis? Is it
because we can calculate the expected values iff the two quantities are independent?
Yes.
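
To illustrate why independence is the hypothesis we can compute with: under $H_0$, each expected frequency follows from the row and column totals alone. The sketch below uses an invented contingency table (the counts and variable names are assumptions, not data from the lesson).

```python
# Hypothetical 2x3 contingency table of observed counts (rows x columns).
observed = [
    [20, 30, 50],
    [30, 30, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Under H0 (independence): P(row i and column j) = P(row i) * P(column j),
# so expected count = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-squared test statistic: sum of (O - E)^2 / E over all cells.
chi_squared = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
print(expected)
print(f"chi-squared = {chi_squared:.3f} on {degrees_of_freedom} degrees of freedom")
```

If the hypothesis were "the quantities are dependent", there would be no single set of expected frequencies to compare the observed counts against.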

b. What do you mean by “Define a test statistic $T(X_1, \dots, X_n)$”?


From what I understand currently, a test statistic is something you are given in the
question, and you do some calculations with it under the assumption that $H_0$ is true.
Let me use an example: Someone claims there are more rainy days in May than
expected, where the probability of a rainy day is ½. It turns out there are 20 rainy days
in May.
Let X be the number of days in May (out of 31) on which it rained.
Under $H_0$: $X \sim \mathrm{Bin}(31, \tfrac{1}{2})$
In this case would the test statistic be 20? (Then we calculate the probability of having
20+ rainy days in May and compare that to the sig. level) Can you explain the notation
used in the whiteboard? Also please correct my understanding if I am wrong.
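
The calculation described in this example can be sketched as follows, assuming the usual convention that X itself is the test statistic and 20 is its observed value. The function name `binomial_upper_tail` is a hypothetical helper, and no significance level is fixed here because the example does not state one.

```python
from math import comb

n_days = 31      # days in May
p_rain = 0.5     # probability of rain on any given day under H0
observed = 20    # observed value of the test statistic X


def binomial_upper_tail(n, p, k):
    """P(X >= k) for X ~ Bin(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


# One-tailed p-value: probability of 20 or more rainy days if H0 is true.
p_value = binomial_upper_tail(n_days, p_rain, observed)
print(f"P(X >= {observed} | H0) = {p_value:.4f}")  # roughly 0.075
```

The resulting probability is then compared with whichever significance level the question specifies.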

c. What do you mean by “Work out the distribution T~( …)”?


Do you mean we explain what distribution the test statistic follows? In the above
example, it would be binomial? In other words, do I say 20~B(31,0.5)?
(I remember you told me that for PMCC and SRCC we can omit this. Just to be sure, is
that true?)

d. In the example above about rainy days in May, I used to just say X~B(31,0.5), where X is
a discrete random variable indicating the number of rainy days in May. Is this equivalent
to defining a test statistic and showing what distribution the test statistic follows?

e. Is there a model answer for Q3 from the 0512 prep? It is not included in the Word
document on the course website. I don’t really understand what you wrote on the
whiteboard from the lesson on 15/5 either.

3. Questions about CLT:


a. Is this correct? We can only apply CLT when we talk about sample means.

b. What does “Sampling distribution of the mean”/ “The distribution of sample means”
mean?
The distribution of $\bar{X}$

c. (From the Single maths A2 book page 384) Let’s say the sample size is 50. Does it mean
we split the parent population into lots of different groups of 50 and get their means, and
their distribution would be $N\left(\mu, \frac{\sigma^2}{50}\right)$? (Assuming the parent population follows a
distribution $N(\mu, \sigma^2)$)
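
One way to picture this is by simulation: repeatedly draw independent random samples of size 50 from the parent population and look at how their means are spread. The sketch below is illustrative only; the values of `mu` and `sigma`, the number of samples and the seed are all assumptions.

```python
import random
import statistics

mu, sigma = 10.0, 3.0   # hypothetical parent population N(mu, sigma^2)
sample_size = 50
num_samples = 5000      # many independent samples of size 50

rng = random.Random(1)  # fixed seed so the run is repeatable

# Draw each sample and record its mean.
sample_means = [
    statistics.fmean([rng.gauss(mu, sigma) for _ in range(sample_size)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
variance_of_means = statistics.variance(sample_means)

print(f"mean of sample means:     {mean_of_means:.3f} (theory: {mu})")
print(f"variance of sample means: {variance_of_means:.4f} "
      f"(theory: {sigma ** 2 / sample_size:.4f})")
```

The spread of the sample means should come out close to $\sigma^2/50$, which is much smaller than the variance $\sigma^2$ of individual observations.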

d. Can I say that the sample variance $S_x^2$ is therefore $\frac{\sigma^2}{50}$? (I don’t think that’s correct,
because in the example on page 388 the calculated sample variance $S_x^2$ is treated as the
population variance, since they used the variance $\frac{S_x^2}{50}$ when calculating the test statistic
z.) Or is the sample variance $S_x^2$ just an estimate of $\sigma^2$, so we still need to divide by n in our
calculations for the test statistic z?

4. We had a few lessons on the distribution of the errors. How can we use that in A-level
stats? Is it just an extension proof of why the mean and median are used? (To minimise
different error terms?)
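
As a small numerical illustration of the "different error terms" idea: the mean is the value that minimises the sum of squared errors, and the median is a value that minimises the sum of absolute errors. The data and the grid search below are made up for this sketch and are not taken from the lessons.

```python
import statistics

data = [1, 2, 3, 4, 10]  # made-up observations, with one large value


def sum_squared_error(c):
    """Total squared distance from the candidate centre c to the data."""
    return sum((x - c) ** 2 for x in data)


def sum_absolute_error(c):
    """Total absolute distance from the candidate centre c to the data."""
    return sum(abs(x - c) for x in data)


# Try candidate centres 0.00, 0.01, ..., 10.00 and keep the best for each error.
candidates = [i / 100 for i in range(0, 1001)]
best_for_squared = min(candidates, key=sum_squared_error)
best_for_absolute = min(candidates, key=sum_absolute_error)

print("minimiser of squared error: ", best_for_squared, "mean:  ", statistics.mean(data))
print("minimiser of absolute error:", best_for_absolute, "median:", statistics.median(data))
```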

5. About the 0519 prep question 3 (Normal distribution): can you explain what the iid RVs
are?
Independent and identically distributed RVs
Are they just the random variables of the groups of 75 samples? Using the example I
mentioned in question 3b, is it the “lots of different groups of 50”?

(Optional consolidation) A random sample of 75 eleven-year-olds performed a simple task and
the time taken, t minutes, was noted for each. You may assume that the distribution of these
times is Normal. The results are summarized as follows:
$n = 75$, $\sum t = 1215$, $\sum t^2 = 21708$
Carry out a hypothesis test, at the 1% significance level, to determine whether there is evidence that
the mean time taken to perform this task is greater than 15 minutes.

Let μ be the population mean of the time taken to perform this task.
$H_0: \mu = 15$
$H_1: \mu > 15$
Significance level: 1%

Let $T_i$ be the time taken for the $i$th randomly chosen person to complete the task, where
$i = 1, 2, \dots, 75$. Then the $T_i$ are iid random variables following a $N(\mu, \sigma^2)$ distribution.
Hence, $\bar{T} \sim N\left(\mu, \frac{\sigma^2}{75}\right)$.
As the sample size is 75 and the population variance is unknown, we can use the sample
variance as a reliable estimate of the population variance, thus:
$\hat{\sigma}^2 = s^2 = \frac{1}{75-1}\left(21708 - \frac{1215^2}{75}\right) \approx 27.36486$

p-value $= P\left(\bar{T} \ge \frac{1215}{75} \mid H_0\right) \approx 0.02348 > 0.01$
Insufficient evidence to reject the null hypothesis. Therefore we conclude that, at the 1%
significance level, there is insufficient evidence to suggest that the mean time taken to
perform this task is greater than 15 minutes.
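
The arithmetic in this worked solution can be reproduced with a short script. This is a sketch that follows the same approach as above (a Normal approximation with the sample variance standing in for the population variance); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 75
sum_t = 1215
sum_t2 = 21708
mu_0 = 15       # value of the mean under H0
alpha = 0.01    # 1% significance level

sample_mean = sum_t / n                            # 1215 / 75
sample_var = (sum_t2 - sum_t ** 2 / n) / (n - 1)   # unbiased estimate s^2
standard_error = sqrt(sample_var / n)

# Standardised sample mean, treated as N(0, 1) under H0.
z = (sample_mean - mu_0) / standard_error

# One-tailed p-value, because H1 is mu > 15.
p_value = 1 - NormalDist().cdf(z)

print(f"s^2 = {sample_var:.5f}, z = {z:.4f}, p-value = {p_value:.5f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```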
