Sampling Distribution

If a random sample of size n is drawn from a finite or infinite population, many different
samples with different compositions are possible. Consequently, the value of a statistic
will vary from one sample to the next.
A STATISTIC IS A RANDOM VARIABLE WITH A PROBABILITY DISTRIBUTION
The probability distribution of a statistic is called its SAMPLING DISTRIBUTION
Examples of sampling distribution

1. Suppose the true proportion of females in the PGP 11-13 batch across all the IIMs is p = 0.15.
If you select all possible random samples of 30 students, each sample yields a value of the sample
proportion (of females in that sample). A histogram of those values is precisely the sampling
distribution of the sample proportion (of females).
2. Suppose that each year the placement office of IIMK selects a random sample of 50 graduating
students, records the starting salary for each, and reports the sample mean of those 50 starting
salaries. The distribution of these mean salaries constitutes the sampling distribution (of the
sample mean salaries of IIMK graduating students).
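The first example can be approximated by simulation. The sketch below is illustrative only: instead of enumerating all possible samples, it draws an assumed 10,000 random samples of 30 students, treats each student as an independent draw with p = 0.15 (that is, an effectively infinite population), and summarizes the resulting sample proportions. The names `sample_proportion` and `NUM_SAMPLES` are made up for this illustration.

```python
import math
import random
import statistics

P_TRUE = 0.15         # population proportion from example 1
N = 30                # sample size from example 1
NUM_SAMPLES = 10_000  # assumption: number of simulated samples

rng = random.Random(0)  # fixed seed so the run is repeatable


def sample_proportion(p, n, rng):
    """Draw n independent yes/no observations and return the share of successes."""
    successes = sum(1 for _ in range(n) if rng.random() < p)
    return successes / n


# Each entry is the sample proportion from one simulated sample of 30 students.
p_hats = [sample_proportion(P_TRUE, N, rng) for _ in range(NUM_SAMPLES)]

mean_p_hat = statistics.fmean(p_hats)
sd_p_hat = statistics.pstdev(p_hats)
theoretical_se = math.sqrt(P_TRUE * (1 - P_TRUE) / N)

print(f"mean of sample proportions:      {mean_p_hat:.4f}")
print(f"std. dev. of sample proportions: {sd_p_hat:.4f}")
print(f"sqrt(p(1-p)/n):                  {theoretical_se:.4f}")
```

A histogram of `p_hats` would approximate the sampling distribution described in the example.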
Sampling Distribution

Variation in the value of a statistic from sample to sample is called sampling fluctuation
and is measured by the STANDARD ERROR
Sampling Distribution of Mean

\[ E(\bar{x}) = \mu \]
Standard error of the mean in the case of an infinite population
or sampling with replacement:

\[ \mathrm{s.e.}(\bar{x}) = \frac{\sigma}{\sqrt{n}} \]
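A small sketch of the standard error formula, showing how it shrinks as n grows. The population standard deviation of 10 is a hypothetical value chosen only for illustration, and `standard_error_mean` is an illustrative helper name.

```python
import math


def standard_error_mean(sigma, n):
    """Standard error of the sample mean, sigma / sqrt(n).

    Applies to an infinite population or sampling with replacement.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return sigma / math.sqrt(n)


SIGMA = 10.0  # hypothetical population standard deviation

for n in (4, 16, 64):
    # Quadrupling the sample size halves the standard error.
    print(n, standard_error_mean(SIGMA, n))
```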
Population and Sample Proportions

• The population proportion is equal to the number of elements in the population belonging
to the category of interest, divided by the total number of elements in the population:

\[ p = \frac{X}{N} \]

• The sample proportion is the number of elements in the sample belonging to the category
of interest, divided by the sample size:

\[ \hat{p} = \frac{x}{n} \]
The Sampling Distribution of the Sample Proportion, \(\hat{p}\)
The sample proportion is the fraction of successes in n binomial trials. It is the
number of successes, x, divided by the number of trials, n.

Sample proportion: \(\hat{p} = \dfrac{x}{n}\)

As the sample size, n, increases, the sampling distribution of \(\hat{p}\) approaches a
normal distribution with mean p and standard deviation \(\sqrt{p(1-p)/n}\).

Infinite population:

\[ E(\hat{p}) = p \]

\[ \mathrm{s.e.}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} \]
Example
Suppose the Indian corporate sector believes that about 45% of its senior executives have attended at
least one program (MDP, EPGP etc.) offered by the IIMs at some point in their career. Suppose there
are about 1.2 lakh (120,000) senior executives currently working in India. A research group at IIMK
surveys a random sample of 1000 senior executives to verify that belief.
a) Find the population and sample proportions.
b) Find the standard error of the sample proportion of senior executives who have attended at least
one program in the IIMs.
c) If the research group selected 7000 senior executives, would the standard error remain the same as
above?
d) Suppose 375 of the 1000 senior executives have attended a program in one of the IIMs. What will
be the estimated standard error of the sample proportion?
e) If the research group selected 7000 executives and 3000 of them reported having attended a
program in one of the IIMs, what would be the new estimated standard error of the sample
proportion?
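Parts (b) to (e) can be worked through numerically. The sketch below assumes the infinite-population formula sqrt(p(1-p)/n), so it ignores any finite-population correction for the 1.2 lakh executives; for (d) and (e) it plugs the observed sample proportion into the same formula to get an estimated standard error. The helper name `se_proportion` is illustrative.

```python
import math


def se_proportion(p, n):
    """Standard error of a sample proportion, sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)


# (b) believed population proportion p = 0.45, n = 1000
se_b = se_proportion(0.45, 1000)
# (c) same p, larger sample: the standard error shrinks
se_c = se_proportion(0.45, 7000)
# (d) estimated s.e. using the observed proportion 375/1000
se_d = se_proportion(375 / 1000, 1000)
# (e) estimated s.e. using the observed proportion 3000/7000
se_e = se_proportion(3000 / 7000, 7000)

for label, value in (("b", se_b), ("c", se_c), ("d", se_d), ("e", se_e)):
    print(f"({label}) standard error = {value:.5f}")
```

Under these assumptions the standard errors come out to roughly 0.0157, 0.0059, 0.0153 and 0.0059, so the answer to (c) is that the standard error does not remain the same.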
Sampling from a Normal Population

When sampling from a normal population with mean \(\mu\) and standard deviation \(\sigma\),
the sample mean, \(\bar{X}\), has a normal sampling distribution:

\[ \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \]

This means that, as the sample size increases, the sampling distribution of the sample mean
remains centered on the population mean, but becomes more compactly distributed around
that population mean.

[Figure: Sampling Distribution of the Sample Mean. Density f(X) for the normal population and for the sampling distributions with n = 2, n = 4 and n = 16.]
Example
The foreman of a bottling plant has observed that the
amount of soda in each “32-ounce” bottle is actually a
normally distributed random variable, with a mean of 32.2
ounces and a standard deviation of .3 ounce.

If a customer buys one bottle, what is the probability that the bottle will contain
more than 32 ounces?
Example
We want to find P(X > 32), where X is normally distributed
with µ = 32.2 and σ = 0.3.

\[ P(X > 32) = P\left(\frac{X - \mu}{\sigma} > \frac{32 - 32.2}{0.3}\right) = P(Z > -0.67) = 1 - 0.2514 = 0.7486 \]

“There is about a 75% chance that a single bottle of soda contains more than 32 oz.”
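The single-bottle probability can be checked with Python's standard-library `statistics.NormalDist`. This is only a numerical cross-check of the table lookup: because it uses the unrounded z value of about -0.667 rather than -0.67, the result differs slightly from 0.7486 in the third decimal place.

```python
from statistics import NormalDist

# Amount of soda in one bottle: normal with mean 32.2 oz and sd 0.3 oz.
bottle = NormalDist(mu=32.2, sigma=0.3)

z_single = (32 - bottle.mean) / bottle.stdev  # standardized cutoff
p_single = 1 - bottle.cdf(32)                 # P(X > 32)

print(f"z = {z_single:.3f}")
print(f"P(X > 32) = {p_single:.4f}")
```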
Example
The foreman of a bottling plant has observed that the
amount of soda in each “32-ounce” bottle is actually a
normally distributed random variable, with a mean of 32.2
ounces and a standard deviation of .3 ounce.

If a customer buys a carton of four bottles, what is the probability that the mean
amount of the four bottles will be greater than 32 ounces?
Example …
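We want to find \(P(\bar{X} > 32)\), where X is normally distributed
with µ = 32.2 and σ = 0.3.

Things we know:

1) X is normally distributed, therefore so is \(\bar{X}\).

2) \(\mu_{\bar{X}} = \mu\) = 32.2 oz.

3) \(\sigma_{\bar{X}} = \sigma/\sqrt{n} = 0.3/\sqrt{4} = 0.15\) oz.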
Example
If a customer buys a carton of four bottles, what is the probability that
the mean amount of the four bottles will be greater than 32 ounces?

\[ P(\bar{X} > 32) = P\left(Z > \frac{32 - 32.2}{0.3/\sqrt{4}}\right) = P(Z > -1.33) = 0.9082 \]

“There is about a 91% chance the mean of the four bottles will exceed
32 oz.”
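A brief numerical check of the carton probability, again using `statistics.NormalDist`. It assumes the four bottles are independent draws from the same normal population, so that the sample mean has standard deviation 0.3/sqrt(4) = 0.15.

```python
import math
from statistics import NormalDist

MU = 32.2      # mean fill per bottle (oz)
SIGMA = 0.3    # standard deviation per bottle (oz)
N_BOTTLES = 4  # bottles in the carton

# Sampling distribution of the mean of four bottles.
se_carton = SIGMA / math.sqrt(N_BOTTLES)
carton_mean = NormalDist(mu=MU, sigma=se_carton)

p_carton = 1 - carton_mean.cdf(32)  # P(sample mean > 32)

print(f"standard error = {se_carton:.2f}")
print(f"P(mean of 4 bottles > 32) = {p_carton:.4f}")
```

The smaller spread of the sample mean is what pushes this probability above the single-bottle figure.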
[Figure: two normal curves centered at mean = 32.2. Left panel: probability that one bottle contains more than 32 ounces. Right panel: probability that the mean of four bottles exceeds 32 oz.]
Central Limit Theorem (CLT)
If a random sample of size n is drawn from a population with mean µ and standard
deviation σ, the distribution of the sample mean (\(\bar{x}\)) approaches a normal
distribution with mean µ and standard deviation \(\sigma/\sqrt{n}\) as the sample
size (n) increases,

i.e. \(\bar{x} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)\)

If the population is normal, the distribution of the sample mean is normal
regardless of sample size.
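The theorem can be illustrated by simulation. The sketch below assumes a strongly right-skewed population (exponential with mean 1 and standard deviation 1, chosen only for illustration) and compares the sample means for n = 2 and n = 30. A skewness near zero indicates a roughly symmetric, more normal-looking distribution. The helper names and the number of simulated samples are illustrative choices.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable
NUM_SAMPLES = 5_000     # assumption: number of simulated samples per n


def skewness(values):
    """Population skewness: the average cubed standardized deviation."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def simulate_sample_means(n, num_samples, rng):
    """Means of num_samples samples of size n from an Exponential(1) population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]


results = {}
for n in (2, 30):
    means = simulate_sample_means(n, NUM_SAMPLES, rng)
    results[n] = (
        statistics.fmean(means),   # should be near mu = 1
        statistics.pstdev(means),  # should be near sigma / sqrt(n)
        skewness(means),           # shrinks toward 0 as n grows
    )
    print(n, [round(x, 3) for x in results[n]])
```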
WHY CLT IS USEFUL

• When the sampling distribution of \(\bar{x}\) is approximately normal, we can use the
Empirical rule to predict how close sample means will be to the true population mean.
• Since the CLT holds for a large number of population distributions, it helps us to make
inferences about the population means regardless of the shape of the population
distribution. This is often helpful in practice since we usually do not know the true
shape of the population distribution (and often it is skewed).
Central Limit Theorem
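The Central Limit Theorem Applies to Sampling Distributions from Any Population

[Figure: grid of distributions. Columns: Normal, Uniform, Skewed and General populations. Rows: the population, the sampling distribution of \(\bar{X}\) for n = 2, and the sampling distribution of \(\bar{X}\) for n = 30.]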
NOTE
When the population has a normal distribution, the sampling distribution of \(\bar{x}\)
is normally distributed for any sample size.

In most applications, the sampling distribution of \(\bar{x}\) can be approximated by a
normal distribution whenever the sample size is 30 or more.

In cases where the population is highly skewed or outliers are present, samples of
size 50 may be needed.
Case:
Marketing Iced Coffee
• In order to capitalize on the iced coffee trend, Starbucks
offered for a limited time half-priced Frappuccino
beverages between 3 pm and 5 pm.
• Anne Jones, manager at a local Starbucks, determines
the following from past historical data:
• 43% of iced-coffee customers were women.
• 21% were teenage girls.
• Customers spent an average of $4.18 on iced coffee
with a standard deviation of $0.84.
Case:
Marketing Iced Coffee
• One month after the marketing period ends, Anne
surveys 50 of her iced-coffee customers and finds:
46% were women.
34% were teenage girls.
They spent an average of $4.26 on the drink, with a standard deviation of $0.84.
• Anne wants to use this survey information to calculate
the probability that:
Customers spend an average of $4.26 or more on iced coffee.
46% or more of iced-coffee customers are women.
34% or more of iced-coffee customers are teenage girls.
The Sampling Distribution
of the Means
• Example: Anne wants to determine if the marketing
campaign has had a lingering effect on the amount of
money customers spend on iced coffee.
➢ Before the campaign, µ = $4.18 and σ = $0.84. Based on 50
customers sampled after the campaign, \(\bar{x}\) = $4.26.
➢ Let’s find \(P(\bar{X} \ge 4.26)\). Since n > 30, the central limit theorem
states that \(\bar{X}\) is approximately normal. So,

\[ P(\bar{X} \ge 4.26) = P\left(Z \ge \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\right) = P\left(Z \ge \frac{4.26 - 4.18}{0.84/\sqrt{50}}\right) = P(Z \ge 0.67) = 1 - 0.7486 = 0.2514 \]
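The probability for the mean spend can be reproduced numerically. This sketch assumes, as the example does, that the pre-campaign values µ = 4.18 and σ = 0.84 still describe the population and that the sample mean is approximately normal. Because it keeps the unrounded z value, the result is close to, but not exactly, the table-based 0.2514.

```python
import math
from statistics import NormalDist

MU = 4.18      # pre-campaign mean spend ($)
SIGMA = 0.84   # pre-campaign standard deviation ($)
N = 50         # customers surveyed
X_BAR = 4.26   # observed sample mean ($)

se_spend = SIGMA / math.sqrt(N)             # standard error of the mean
z_spend = (X_BAR - MU) / se_spend           # standardized sample mean
p_spend = 1 - NormalDist().cdf(z_spend)     # P(sample mean >= 4.26)

print(f"z = {z_spend:.4f}")
print(f"P(mean spend >= 4.26) = {p_spend:.4f}")
```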
The Sampling Distribution
of the Sample Proportion
• Example: From the introductory case, Anne wants to
determine if the marketing campaign has had a
lingering effect on the proportion of customers who are
women and teenage girls.
➢ Before the campaign, p = 0.43 for women and p = 0.21 for
teenage girls. Based on 50 customers sampled after the
campaign, \(\hat{p}\) = 0.46 and \(\hat{p}\) = 0.34, respectively.
➢ Let’s find \(P(\hat{p} \ge 0.46)\). Since n > 30, the central limit theorem
states that \(\hat{p}\) is approximately normal.

\[ P(\hat{p} \ge 0.46) = P\left(Z \ge \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}\right) = P\left(Z \ge \frac{0.46 - 0.43}{\sqrt{0.43(1 - 0.43)/50}}\right) = P(Z \ge 0.43) = 1 - 0.6664 = 0.3336 \]
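Both proportion questions from the case (women and teenage girls) can be evaluated with the same few lines. The sketch assumes the pre-campaign proportions still hold and uses the normal approximation to the sample proportion. The helper name `prob_at_least` is made up for this illustration, and unrounded z values are used, so the result for women lands near the table-based 0.3336 rather than exactly on it.

```python
import math
from statistics import NormalDist

N = 50  # customers surveyed


def prob_at_least(p_hat, p, n):
    """P(sample proportion >= p_hat) under a normal approximation with true proportion p."""
    se = math.sqrt(p * (1 - p) / n)
    z = (p_hat - p) / se
    return 1 - NormalDist().cdf(z)


p_women = prob_at_least(0.46, 0.43, N)  # women: observed 46%, historical 43%
p_teens = prob_at_least(0.34, 0.21, N)  # teenage girls: observed 34%, historical 21%

print(f"P(p_hat >= 0.46 | p = 0.43) = {p_women:.4f}")
print(f"P(p_hat >= 0.34 | p = 0.21) = {p_teens:.4f}")
```

Under these assumptions the teenage-girl result is far less likely than the result for women, which is the comparison the case is building toward.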
Problem

1. Suppose that, of all first-year students enrolled in the top business schools across India, about 35% went
abroad for a summer internship last year. Suppose you randomly select a business school and it turns out to be
IIMK, which has about 360 students enrolled in the first year.
a) What is the probability that at least 30% of the 360 IIMK students will go abroad for an internship this year?
b) What is the probability that at most 50% of the 360 IIMK students will go abroad for an internship this year?
c) What is the probability that between 40% and 60% of the 360 IIMK students will go abroad for an internship
this year?
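One possible way to set up Problem 1 is sketched below. It rests on assumptions not stated in the problem: the 360 IIMK students are treated as a random sample from a population with p = 0.35, last year's proportion is taken to hold this year, and the normal approximation to the sample proportion is used.

```python
import math
from statistics import NormalDist

P = 0.35  # assumed population proportion going abroad
N = 360   # first-year students at IIMK

se_abroad = math.sqrt(P * (1 - P) / N)
# Approximate sampling distribution of the sample proportion.
p_hat_dist = NormalDist(mu=P, sigma=se_abroad)

prob_a = 1 - p_hat_dist.cdf(0.30)                   # a) at least 30%
prob_b = p_hat_dist.cdf(0.50)                       # b) at most 50%
prob_c = p_hat_dist.cdf(0.60) - p_hat_dist.cdf(0.40)  # c) between 40% and 60%

print(f"standard error = {se_abroad:.4f}")
print(f"a) {prob_a:.4f}  b) {prob_b:.4f}  c) {prob_c:.4f}")
```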

2. The sales of food and drink at the Milma stall in IIMK vary from day to day. The daily sales figures fluctuate
with mean = Rs 500 and standard deviation = Rs 200. The stall owner wants to calculate the mean daily
sales for the week to check how he is doing.
a) What value would the mean daily sales figures for the week center around?
b) How much variability would you expect in the mean daily sales figures for the week?
c) Suppose the Milma stall owner now wants to look at the monthly sales. What will the sampling
distribution be? Will his mean daily sales for the month vary more or less than the mean daily sales for the
week?
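A sketch of the arithmetic behind Problem 2, under assumptions the problem leaves open: the stall trades 7 days a week and 30 days a month, and daily sales are independent with the same mean and standard deviation.

```python
import math

MU = 500.0            # mean daily sales (Rs)
SIGMA = 200.0         # standard deviation of daily sales (Rs)
DAYS_PER_WEEK = 7     # assumption: stall open every day
DAYS_PER_MONTH = 30   # assumption: a 30-day month

# a) The weekly mean of daily sales centers on the population mean.
center_week = MU
# b) Variability of the weekly mean: sigma / sqrt(7).
se_week = SIGMA / math.sqrt(DAYS_PER_WEEK)
# c) Variability of the monthly mean: sigma / sqrt(30), smaller than for the week.
se_month = SIGMA / math.sqrt(DAYS_PER_MONTH)

print(f"center = Rs {center_week:.0f}")
print(f"s.e. (week)  = Rs {se_week:.2f}")
print(f"s.e. (month) = Rs {se_month:.2f}")
```

With 30 days in the sample, the central limit theorem suggests the monthly mean is approximately normal, centered at Rs 500 with the smaller standard error.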
