

SAMPLE MEAN

Sample Mean
For a simple random sample $X_1, \dots, X_n$ from some population,
the sample mean is given by

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

What it means:
1. $X_1, \dots, X_n$ are chosen from the same distribution.
2. We'll consider a situation where we choose with replacement:
$X_1, \dots, X_n$ are independent and identically distributed (IID).
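
A minimal sketch of drawing with replacement and averaging. The population values below are made up for illustration and are not taken from the course booklet.

```python
import random

def sample_mean(values):
    """Return the arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)

# Hypothetical population; the numbers are illustrative only.
population = [80, 100, 120, 140, 160]

rng = random.Random(0)  # fixed seed so the run is repeatable
# random.choices draws WITH replacement, so the draws are IID.
sample = rng.choices(population, k=10)
x_bar = sample_mean(sample)
print(sample, x_bar)
```

Re-running with a different seed gives a different sample and hence a different sample mean, which is why the sample mean is itself a random variable.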

Distribution of Sample Mean


The distribution of the sample mean is given by the
probability distribution of the values the sample mean can take.

Example 1, Pg. 60 from the course booklet.


a) Obtain the distribution of the student's daily expenditure on
lunch.
b) Obtain the distribution of the student's average expenditure on
lunch over two days.
c) Obtain the distribution of the student's average
expenditure on lunch over 30 days. (Slight change from the
booklet)

Why is there a distribution of $\bar{X}$?


Consider other students who go to the restaurant
and eat there as well.
Assume there are many of them, but all of them
follow the same daily demand distribution of A, B
and C.
All of them have their own $\bar{X}$, hence we can talk of
a distribution of $\bar{X}$.
Sampling Distribution

Distribution of $\bar{X}$

n=30?

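A Result
If $X_1, X_2, \dots, X_n$ are all independent $N(\mu, \sigma^2)$, then

$$X_1 + \dots + X_n = \sum_{i=1}^{n} X_i \sim N(n\mu, n\sigma^2)$$

Or

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$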
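
A quick simulation can make this result tangible. The sketch below is only a check, not a proof. The values of the mean, standard deviation, sample size and number of repetitions are arbitrary choices for illustration.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Return `reps` sample means, each from n independent N(mu, sigma^2) draws."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]

# Illustrative values (assumptions, not from the slides).
mu, sigma, n = 10.0, 2.0, 4
means = simulate_sample_means(mu, sigma, n, reps=20000)

# The result above predicts mean mu and variance sigma^2 / n for the sample mean.
print("simulated mean:    ", statistics.fmean(means), " theory:", mu)
print("simulated variance:", statistics.pvariance(means), " theory:", sigma**2 / n)
```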

But what if the random variables are not normal?



Sample Mean
Consider the random variables $X_1, X_2, \dots, X_n$.
Let all of them be independent and identically distributed
(IID) with mean $\mu$ and variance $\sigma^2$. (But not normal.)

Then $E(\bar{X}) = \mu$; $V(\bar{X}) = \frac{\sigma^2}{n}$
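
Both results follow from linearity of expectation and, for the variance, from independence. Here is a short derivation using only the IID assumption, with $\mu$ and $\sigma^2$ denoting the common mean and variance of the $X_i$:

```latex
\begin{align*}
E(\bar{X}) &= E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
            = \frac{1}{n}\, n\mu = \mu, \\
V(\bar{X}) &= \frac{1}{n^2}\, V\left(\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n^2}\sum_{i=1}^{n} V(X_i)
            = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.
\end{align*}
```

The second equality in the variance line is where independence is used: the variance of a sum equals the sum of the variances only when the covariance terms vanish.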

But how is $\bar{X}$ distributed?



Example: Example 1, Pg. 60


a) Obtain the distribution of the student's daily expenditure on
lunch.
b) Obtain the distribution of the student's average expenditure
on lunch over two days.
c) Obtain the distribution of the student's average
expenditure on lunch over 30 days.

Distribution of $\bar{X}$

The Central Limit Theorem (CLT)


$X_1, \dots, X_n$: IID sample with mean $\mu$ and variance $\sigma^2$.

For large $n$, $\sum_{i=1}^{n} X_i \sim N(n\mu, n\sigma^2)$ approximately.

Or,

For large sample size $n$, $\bar{X} \sim N(\mu, \sigma^2/n)$ approximately.
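
The CLT can be checked empirically on a clearly non-normal distribution. The sketch below uses the Exponential(1) distribution, whose mean and variance are both 1. The distribution, the sample size n = 50 and the number of repetitions are illustrative assumptions, not part of the slides.

```python
import random
import statistics
from math import sqrt

def standardized_means(n, reps, seed=0):
    """Simulate `reps` sample means of n Exponential(1) draws and
    standardize each one with the exact mean (1) and variance (1/n)."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # mean and s.d. of the Exponential(1) distribution
    z_values = []
    for _ in range(reps):
        x_bar = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        z_values.append((x_bar - mu) / (sigma / sqrt(n)))
    return z_values

z = standardized_means(n=50, reps=5000)
# If the normal approximation is good, about 95% of the standardized
# means should fall inside (-1.96, 1.96).
coverage = sum(abs(v) < 1.96 for v in z) / len(z)
print(round(coverage, 3))
```

Lowering n to a small value such as 3 should make the skew of the underlying distribution visible in the simulated means.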



Back to our Example


Work out $\mu$ and $\sigma$:
$\mu = 118$, $\sigma^2 = 496$ (check), $\sigma = 22.27$.
Now work out the approximate distribution of the sample
mean $\bar{X}$ for $n = 30$ using CLT:
$N(118, 496/30) = N(118, 16.53) = N(118, 4.07^2)$

What is the probability that over 30 days, the average spend
is at least Rs.122?
What is the probability that over 30 days, total spend is at
least Rs. 4000?
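
One way to evaluate these two probabilities is with the standard library's `statistics.NormalDist`. The sketch takes the mean 118, variance 496 and n = 30 from the example and relies on the CLT approximation, so the answers are approximate.

```python
from math import sqrt
from statistics import NormalDist

mu, var, n = 118, 496, 30  # values from the example above

# Average spend over n days: approximately N(mu, var / n).
mean_dist = NormalDist(mu, sqrt(var / n))
p_avg = 1 - mean_dist.cdf(122)

# Total spend over n days: approximately N(n * mu, n * var).
total_dist = NormalDist(n * mu, sqrt(n * var))
p_total = 1 - total_dist.cdf(4000)

print("P(average >= 122):", round(p_avg, 4))
print("P(total >= 4000): ", p_total)
```

The first probability comes out at roughly 0.16. The second is far smaller, because Rs. 4000 lies almost four standard deviations above the expected total of Rs. 3540.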

Distribution of Proportion
Indicator/dummy variables: denote whether an event occurs
or not, and are valued 0 or 1.
Distributed as Bernoulli(p), i.e. Binomial(1, p), where probability of
event = p (true proportion)

Let n = number of trials. X: total no. of occurrences

Total no. of occurrences X can be thought of as sum of indicator
variables: $X = X_1 + X_2 + \dots + X_n$, each Bernoulli(p), independent.
Distribution of X: Binomial(n, p).

Sample proportion $\hat{p}$ = Total no. of occurrences / n = $X/n$


Solve Exercise 1, pg. 51.

Distribution of Proportion
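n = number of trials; p = true proportion
Distribution of total no. of occurrences: $X \sim \text{Binomial}(n, p)$.

Sample proportion $\hat{p} = \frac{X}{n} = \frac{X_1 + X_2 + \dots + X_n}{n}$: sample mean of
indicator variables.

$$E(\hat{p}) = E\left(\frac{X}{n}\right) = p; \qquad V(\hat{p}) = V\left(\frac{X}{n}\right) = \frac{p(1-p)}{n}.$$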
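
These two formulas can be verified exactly, without simulation, by summing over the Binomial(n, p) probabilities. In the sketch below, n = 20 and p = 0.3 are arbitrary illustrative values.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def proportion_moments(n, p):
    """Exact mean and variance of the sample proportion X / n."""
    mean = sum((k / n) * binomial_pmf(k, n, p) for k in range(n + 1))
    second = sum((k / n) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))
    return mean, second - mean**2

n, p = 20, 0.3  # illustrative values (assumptions)
mean, variance = proportion_moments(n, p)
print(mean, p)                      # both should agree up to rounding
print(variance, p * (1 - p) / n)    # both should agree up to rounding
```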

Distribution of Proportion and the CLT


Assume $np \ge 5$ and $n(1-p) \ge 5$ (i.e., n is large).
Then the following Central Limit Theorem may be used:

$$\hat{p} \sim N\left(p, \frac{p(1-p)}{n}\right) \text{ approximately}$$

Or:

X= No. of occurrences ~ N(np, np(1-p)) approximately.


i.e., Bin(n, p) can be approximated by N(np, np(1-p))
(for large n).
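
To see how close the approximation is, one can compare an exact binomial tail probability with its normal counterpart. The values n = 50, p = 0.5 and the cutoff 30 are illustrative assumptions. The optional `continuity` flag applies a half-unit continuity correction, a common refinement that is not part of the slides.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def normal_upper_tail(k, n, p, continuity=False):
    """Approximate P(X >= k) using N(np, np(1-p)).

    continuity=True evaluates the normal tail at k - 0.5; this
    correction is an addition, not something stated in the slides.
    """
    approx = NormalDist(n * p, sqrt(n * p * (1 - p)))
    cutoff = k - 0.5 if continuity else k
    return 1 - approx.cdf(cutoff)

n, p, k = 50, 0.5, 30  # illustrative values; np = n(1-p) = 25 >= 5
exact = binomial_upper_tail(k, n, p)
plain = normal_upper_tail(k, n, p)
corrected = normal_upper_tail(k, n, p, continuity=True)
print(round(exact, 4), round(plain, 4), round(corrected, 4))
```

The uncorrected normal tail underestimates the exact probability here, because it ignores half of the probability mass at k = 30. The corrected version is much closer.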

Continuing With the Example: proportion


a) What proportion of days is the student expected to spend
Rs.140 or more?
b) What is the distribution of the number of times the student
spends Rs.140 or more for lunch in 5 days?
c) What is the distribution of the number of times the student
spends Rs.140 or more for lunch in 50 days?
d) What is the probability that the student spends
Rs.140 or more at least 30 times in 50 days?

Issues With the CLT


When can we use it:
If the observations are IID.

When we may not use it:

Don't use it if the observations are not IID.
Errors may be large for small samples from skewed
distributions.

Example
Binomial(n,p) with p = 0.5, and various n.

Example
Binomial(n,p) with p = 0.1, and various n.

Example
Binomial(n,p) with p = 0.01, and various n.

How good are the approximations?


Depends!
Larger skew: need larger sample size
No skew: even 15 is a reasonable sample size to use the
normal approximation
Moderate skew: need 30 or more
High skew: need 50 or more
Severe skew (Example: binomial with large n and small p
so that Poisson approximation holds): might need very
high sample size, in the range of several hundred or even
higher
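
The link between skew and the sample size needed can be explored numerically for the binomial case. The sketch below uses the standard skewness formula for Binomial(n, p), (1 - 2p) / sqrt(np(1-p)), and measures the largest gap between the binomial CDF and the normal CDF. Evaluating the normal CDF at k + 0.5 and the particular grid of n values are assumptions made for this illustration.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

def binomial_skewness(n, p):
    """Skewness of Binomial(n, p): (1 - 2p) / sqrt(n p (1 - p))."""
    return (1 - 2 * p) / sqrt(n * p * (1 - p))

def binomial_pmf(k, n, p):
    """P(X = k), computed on the log scale to stay stable for large n."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)

def max_cdf_error(n, p):
    """Largest |P(X <= k) - Phi((k + 0.5 - np) / sd)| over k = 0..n."""
    approx = NormalDist(n * p, sqrt(n * p * (1 - p)))
    cumulative, worst = 0.0, 0.0
    for k in range(n + 1):
        cumulative += binomial_pmf(k, n, p)
        worst = max(worst, abs(cumulative - approx.cdf(k + 0.5)))
    return worst

for p in (0.5, 0.1, 0.01):
    for n in (15, 30, 50, 500):
        print(p, n, round(binomial_skewness(n, p), 3),
              round(max_cdf_error(n, p), 4))
```

For a fixed n, both the skewness and the approximation error grow as p moves away from 0.5, which is the pattern the guidelines above describe.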

Quick questions
I collect a sample of 100 children of age 6 from
Ahmedabad.
a) Does the CLT say that their height distribution is
approximately normal?
b) Does the CLT say that the distribution of their average
height is approximately normal?
c) I record the heights of 15 students of age 6 and 15
students of age 10. Is the distribution of the average
height approximately normal? What if I recorded 50+50?
d) I record the heights of 20 students sampled, without
replacement, from this class. Does the CLT say that the
average height is approximately normal?
