Amity Business School

Sampling Distributions

RSR

Sampling Distributions…
A sampling distribution is created by, as the name suggests,
sampling. There are two ways to create a sampling distribution.
• The first is to actually draw samples of the same size from a
population, calculate the statistic of interest, and then use
descriptive techniques to learn more about the sampling
distribution.
• The second method, which we will employ, relies on the rules of
probability and the laws of expected value and variance to derive
the sampling distribution.

For example, consider the roll of one and two dice…



Sampling Distribution of the Mean…


A fair die is thrown infinitely many times,
with the random variable X = # of spots on any throw.

The probability distribution of X is:


x 1 2 3 4 5 6
P(x) 1/6 1/6 1/6 1/6 1/6 1/6

…and the mean and variance are calculated as well:

$$\mu = \sum x\,P(x) = 3.5, \qquad \sigma^2 = \sum (x - \mu)^2 P(x) = 2.92$$
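As a minimal sketch of that calculation (the helper names `mean` and `variance` are illustrative, not from the slides), the mean and variance of a single fair die can be computed exactly from the table above:

```python
from fractions import Fraction

# Probability distribution of X, the number of spots on one throw of a fair die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

def mean(dist):
    """Expected value: sum of x * P(x)."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Variance: sum of (x - mu)^2 * P(x)."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

mu = mean(die)
sigma2 = variance(die)
print(f"mean = {float(mu)}, variance = {float(sigma2):.4f}")
```

Exact fractions are used so that the results (7/2 and 35/12) carry no rounding error.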

A sampling distribution is created by looking at all samples of
size n=2 (i.e. two dice) and their means…

[Figure: All Samples of Size 2 from a Population]

Sampling distribution of X̄ for n = 2:

x̄      P(x̄)
1.0    1/36
1.5    2/36
2.0    3/36
2.5    4/36
3.0    5/36
3.5    6/36
4.0    5/36
4.5    4/36
5.0    3/36
5.5    2/36
6.0    1/36

[Figure: histogram of P(x̄) against x̄ = 1.0, 1.5, …, 6.0]
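The table of P(x̄) can be reproduced by listing all 36 equally likely samples of size 2. The following is a sketch under that assumption; the variable names are illustrative:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes of two fair dice, each with probability 1/36.
faces = range(1, 7)
counts = Counter(Fraction(a + b, 2) for a, b in product(faces, repeat=2))

# Sampling distribution of the mean of two dice.
sampling_dist = {xbar: Fraction(c, 36) for xbar, c in sorted(counts.items())}
for xbar in sampling_dist:
    print(f"{float(xbar):.1f}  {counts[xbar]}/36")

mean_xbar = sum(xbar * p for xbar, p in sampling_dist.items())
var_xbar = sum((xbar - mean_xbar) ** 2 * p for xbar, p in sampling_dist.items())
print(float(mean_xbar), float(var_xbar))
```

The mean of X̄ equals the population mean 7/2, and its variance is 35/24, half the population variance 35/12.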


Compare…
Compare the distribution of X (values 1, 2, …, 6) with the sampling
distribution of X̄ (values 1.0, 1.5, …, 6.0).

As well, note that:

$$\mu_{\bar{x}} = \mu_x = 3.5, \qquad \sigma^2_{\bar{x}} = \frac{\sigma^2_x}{2}$$


Generalize…
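We can generalize the mean and variance of the sampling of two dice:

$$\mu_{\bar{x}} = \mu, \qquad \sigma^2_{\bar{x}} = \frac{\sigma^2}{2}$$

…to n-dice:

$$\mu_{\bar{x}} = \mu, \qquad \sigma^2_{\bar{x}} = \frac{\sigma^2}{n}$$

The standard deviation of the sampling distribution is
called the standard error:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$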

It is important to recognize that the distribution of X̄ is
different from the distribution of X. However, the two
random variables are related.
Their means are the same ($\mu_{\bar{x}} = \mu_x = 3.5$) and their variances
are related ($\sigma^2_{\bar{x}} = \sigma^2_x / 2$).

If we now repeat the sampling process with the same
population but with other values of n, we produce
somewhat different sampling distributions of X̄ when
n = 5, 10 and 25.

The variance of the sampling distribution of X̄ is less than the
variance of the population we’re sampling from for every sample
size n > 1.
Thus, a randomly selected value of X̄ (the mean of the number
of spots in, say, five throws of the die) is likely to be closer to
the mean value of 3.5 than is a randomly selected value of X
(the number of spots observed in one throw). Indeed, this is
what one would expect, because in five throws of the die one
is likely to get some 5s and 6s and some 1s and 2s, which will
tend to offset one another in the averaging process and produce a
sample mean reasonably close to 3.5. As the number of throws
of the die increases, the probability that the sample mean
will be close to 3.5 also increases. Thus we observe that the
sampling distribution of X̄ becomes narrower as n increases.
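The narrowing can be checked exactly instead of by simulation. The sketch below builds the distribution of the total on n fair dice by repeated convolution and then computes the mean and variance of X̄. The function names are illustrative and the dice are assumed independent:

```python
from fractions import Fraction

def sum_distribution(n):
    """Exact distribution of the total number of spots on n independent fair dice."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for total, p in dist.items():
            for face in range(1, 7):
                nxt[total + face] = nxt.get(total + face, Fraction(0)) + p * Fraction(1, 6)
        dist = nxt
    return dist

def mean_and_variance_of_xbar(n):
    """Mean and variance of the sample mean of n dice, computed from the exact distribution."""
    dist = sum_distribution(n)
    mu = sum(Fraction(t, n) * p for t, p in dist.items())
    var = sum((Fraction(t, n) - mu) ** 2 * p for t, p in dist.items())
    return mu, var

for n in (1, 2, 5, 10, 25):
    mu, var = mean_and_variance_of_xbar(n)
    print(f"n = {n:2d}: mean = {float(mu)}, variance = {float(var):.4f}")
```

For every n the mean stays at 3.5 while the variance equals (35/12)/n, so the spread shrinks as n grows.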


Central Limit Theorem…


The sampling distribution of the mean of a random sample drawn from
any population is approximately normal for a sufficiently large
sample size.

The larger the sample size, the more closely the sampling distribution
of X̄ will resemble a normal distribution.


Central Limit Theorem…


If the population is normal, then X̄ is normally distributed for all
values of n.

If the population is non-normal, then X̄ is approximately normal only
for larger values of n.

In most practical situations, a sample size of 30 may be sufficiently
large to allow us to use the normal distribution as an approximation
for the sampling distribution of X̄.
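The following simulation sketch illustrates the theorem for a clearly non-normal population. The exponential population, the seed, the number of replications and the use of skewness as a rough measure of non-normality are all assumptions made for illustration; none of them come from the slides.

```python
import random
import statistics

def skewness(values):
    """Sample skewness; close to 0 for a symmetric (e.g. normal) distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def sample_means(n, reps, rng):
    """Means of `reps` random samples of size n from an exponential population (mean 1)."""
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

rng = random.Random(0)  # fixed seed so the run is repeatable
skew_by_n = {}
for n in (1, 5, 30):
    means = sample_means(n, reps=4000, rng=rng)
    skew_by_n[n] = skewness(means)
    print(f"n = {n:2d}: skewness of sample means = {skew_by_n[n]:.2f}")
```

The skewness of the sample means falls toward 0 as n grows, which is consistent with X̄ becoming more nearly normal for larger samples.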


Sampling Distribution of the Sample Mean


1. $\mu_{\bar{x}} = \mu$

2. $\sigma^2_{\bar{x}} = \sigma^2 / n$ and $\sigma_{\bar{x}} = \sigma / \sqrt{n}$

3. If X is normal, X̄ is normal. If X is nonnormal, X̄ is approximately
normal for sufficiently large sample sizes.
Note: the definition of “sufficiently large” depends on the extent of
nonnormality of X (e.g. heavily skewed; multimodal).


Sampling Distribution of the Sample Mean


We can express the sampling distribution of the mean
simply as

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$


Sampling Distribution of the Sample Mean


The summaries above assume that the population is infinitely large.
However, if the population is finite, the standard error is

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$$

where N is the population size and

$$\sqrt{\frac{N - n}{N - 1}}$$

is the finite population correction factor.


Sampling Distribution of the Sample Mean


If the population size is large relative to the sample size, the
finite population correction factor is close to 1 and can be
ignored.

We will treat any population that is at least 20 times larger
than the sample size as large.

In practice most applications involve populations that qualify
as large.

As a consequence the finite population correction factor is
usually omitted.
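A small helper can capture this rule. It is a sketch: the function name `standard_error` and its signature are illustrative, and the 20-times threshold is the rule of thumb stated above.

```python
import math

def standard_error(sigma, n, N=None):
    """Standard error of the sample mean.

    The finite population correction is applied only when a population size N
    is given and N is less than 20 times the sample size n.
    """
    se = sigma / math.sqrt(n)
    if N is not None and N < 20 * n:
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(standard_error(100, 25))             # population treated as infinite
print(standard_error(100, 25, N=10_000))   # N >= 20n, correction ignored
print(standard_error(15_000, 200, N=2000)) # N < 20n, correction applied
```

With N = 2000 and n = 200 the correction factor is about 0.949, so omitting it would overstate the standard error by roughly 5%.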

Contents of a “32 - Ounce” Bottle

The foreman of a bottling plant has observed that the
amount of soda in each “32-ounce” bottle is actually a
normally distributed random variable, with a mean of
32.2 ounces and a standard deviation of .3 ounce.
a. If a customer buys one bottle, what is the
probability that the bottle will contain more than
32 ounces?
b. If a customer buys a carton of four bottles, what is
the probability that the mean amount of the four
bottles will be greater than 32 ounces?

[Figure: Distribution of X and sampling distribution of X̄]

Solution
a. Because the random variable is the amount of soda in
one bottle, we want to find P(X > 32), where X is normally
distributed, μ = 32.2, and σ = .3. Hence,

$$P(X > 32) = P\left(\frac{X - \mu}{\sigma} > \frac{32 - 32.2}{.3}\right) = P(Z > -.67) = .5 + .2486 = .7486$$


Solution
b. Now we want to find the probability that the mean
amount of four filled bottles exceeds 32 ounces. That is,
we want P(X̄ > 32). From our previous analysis and
from the central limit theorem, we know the following:
1. X̄ is normally distributed
2. $\mu_{\bar{x}} = \mu = 32.2$
3. $\sigma_{\bar{x}} = \sigma / \sqrt{n} = .3 / \sqrt{4} = .15$

Hence,

$$P(\bar{X} > 32) = P\left(\frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} > \frac{32 - 32.2}{.15}\right) = P(Z > -1.33) = .5 + .4082 = .9082$$
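Both parts of the bottle example can be checked with Python's standard-library `statistics.NormalDist`. This is a sketch, and the variable names are illustrative:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 32.2, 0.3  # ounces, as given in the example

# a. One bottle: X is normal with mean mu and standard deviation sigma.
p_one = 1 - NormalDist(mu, sigma).cdf(32)

# b. Carton of four: the sample mean has standard error sigma / sqrt(4) = 0.15.
p_carton = 1 - NormalDist(mu, sigma / sqrt(4)).cdf(32)

print(f"P(X > 32)    = {p_one:.4f}")
print(f"P(Xbar > 32) = {p_carton:.4f}")
```

The results differ slightly in the third decimal from the table-based .7486 and .9082, because the table lookup rounds z to two decimal places (-.67 and -1.33) while the code uses the unrounded z values.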

In the example, we began with the assumption that both µ
and σ were known.
Then, using the sampling distribution, we made a
probability statement about the sample mean.
Unfortunately the values of µ and σ are not usually known,
so an analysis such as that in this example cannot usually
be conducted.
However, we can use the sampling distribution to infer
something about an unknown value of µ on the basis of a
sample mean.


EXERCISE

The number of pizzas consumed per month by university
students is normally distributed with a mean of 10 and a
standard deviation of 3.
a. What proportion of students consume more than 12
pizzas per month?
b. What is the probability that, in a random sample of
25 students, more than 275 pizzas are consumed?


Solution

a.

$$P(X > 12) = P\left(\frac{X - \mu}{\sigma} > \frac{12 - 10}{3}\right) = P(Z > .67) = .5 - P(0 < Z < .67) = .5 - .2486 = .2514$$
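The calculation for part b is not shown above. One way to approach it, sketched here under the assumption that "more than 275 pizzas in a sample of 25" is equivalent to a sample mean above 275/25 = 11, is:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 10, 3, 25

# a. Proportion of individual students consuming more than 12 pizzas.
p_a = 1 - NormalDist(mu, sigma).cdf(12)

# b. Total above 275 in a sample of 25 is the same event as Xbar > 11
#    (assumption stated in the lead-in); standard error is 3 / sqrt(25) = 0.6.
xbar_threshold = 275 / n
p_b = 1 - NormalDist(mu, sigma / sqrt(n)).cdf(xbar_threshold)

print(f"a. P(X > 12)    = {p_a:.4f}")
print(f"b. P(Xbar > 11) = {p_b:.4f}")
```

Part a comes out near .2525 rather than the table-based .2514 because z is not rounded to .67. Part b corresponds to z = 1.67, a probability of a little under 5%.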


EXERCISE

The number of customers who enter a supermarket each
hour is normally distributed with a mean of 600 and a
standard deviation of 200. The supermarket is open 16
hours per day. What is the probability that the total
number of customers who enter the supermarket in one
day is greater than 10,000?


Solution
$$P(\bar{X} > 10{,}000 / 16) = P(\bar{X} > 625) = P\left(\frac{\bar{X} - \mu}{\sigma / \sqrt{n}} > \frac{625 - 600}{200 / \sqrt{16}}\right)$$

= P(Z > .50)

= .5 – P(0 < Z < .50)

= .5 – .1915

= .3085
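The same answer can be reached by working with the daily total directly instead of the hourly mean. The sketch below assumes the 16 hourly counts are independent, so that the total is normal with mean 16 × 600 and standard deviation 200 × √16:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, hours = 600, 200, 16

# Route 1: sample mean of 16 hourly counts, threshold 10,000 / 16 = 625.
p_mean = 1 - NormalDist(mu, sigma / sqrt(hours)).cdf(10_000 / hours)

# Route 2: daily total of 16 independent hourly counts (independence is an assumption).
p_total = 1 - NormalDist(hours * mu, sigma * sqrt(hours)).cdf(10_000)

print(f"via mean:  {p_mean:.4f}")
print(f"via total: {p_total:.4f}")
```

Both routes standardize to z = .50, so they give the same probability.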

Salaries of a Business School’s Graduates


In the advertisements for a large university, the dean of
the School of Business claims that the average salary of
the school’s graduates one year after graduation is $800 per
week with a standard deviation of $100. A second-year
student in the business school who has just completed his
course would like to check whether the claim about the
mean is correct. He does a survey of 25 people who
graduated one year ago and determines their weekly
salary. He discovers the sample mean to be $750. To
interpret his findings he needs to calculate the probability
that a sample of 25 graduates would have a mean of $750
or less when the population mean is $800 and the
standard deviation is $100.

Solution
We want to find the probability that the sample mean is less than $750.
Thus we seek P(X̄ < 750).
The distribution of X, the weekly income, is likely to be positively skewed,
but not sufficiently so to make the distribution of X̄ nonnormal.
As a result, we may assume that X̄ is normal with mean $\mu_{\bar{x}} = \mu = 800$
and standard deviation $\sigma_{\bar{x}} = \sigma / \sqrt{n} = 100 / \sqrt{25} = 20$. Thus

$$P(\bar{X} < 750) = P\left(\frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} < \frac{750 - 800}{20}\right) = P(Z < -2.5) = .5 - .4938 = .0062$$

The probability of observing a sample mean as low as $750 when
the population mean is $800 is extremely small.
Because this event is quite unlikely, we would have to conclude
that the claim is not justified.


Using the Sampling Distribution for Inference


• We know that $z_A$ is the value of z such that the area
to the right of $z_A$ under the standard normal curve is
equal to A.
• It can therefore be shown that $z_{.025} = 1.96$.
• Because the standard normal distribution is
symmetric about 0, the area to the left of -1.96 is
also .025.
• The area between -1.96 and 1.96 is .95.
• We can express this algebraically as:
P(-1.96 < Z < 1.96) = .95
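The value 1.96 can be recovered numerically from the inverse of the standard normal cumulative distribution function. This is a sketch, and the helper name `z_value` is illustrative:

```python
from statistics import NormalDist

def z_value(area_right):
    """Value of z with the given area to its right under the standard normal curve."""
    return NormalDist().inv_cdf(1 - area_right)

z_025 = z_value(0.025)
central_area = NormalDist().cdf(z_025) - NormalDist().cdf(-z_025)

print(f"z_.025 = {z_025:.2f}")
print(f"P(-z_.025 < Z < z_.025) = {central_area:.2f}")
```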

Using the Sampling Distribution for Inference


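Here’s another way of expressing the probability calculated
from a sampling distribution.

P(-1.96 < Z < 1.96) = .95

Substituting the formula for the sampling distribution:

$$P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < 1.96\right) = .95$$

With a little algebra:

$$P\left(\mu - 1.96 \frac{\sigma}{\sqrt{n}} < \bar{X} < \mu + 1.96 \frac{\sigma}{\sqrt{n}}\right) = .95$$

$$P\left(\mu - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} < \bar{X} < \mu + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$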

Using the Sampling Distribution for Inference


We can also produce a general form of this statement:

$$P\left(\mu - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} < \bar{X} < \mu + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

In this formula α (Greek letter alpha) is the probability that X̄
does not fall into the interval.

To apply this formula all we need to do is substitute the values for
µ, σ, n, and α.
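The substitution can be wrapped in a small function. This is a sketch, with the illustrative name `sample_mean_interval`, applied to the dean's claim from the salary example (µ = 800, σ = 100, n = 25, α = .05):

```python
from math import sqrt
from statistics import NormalDist

def sample_mean_interval(mu, sigma, n, alpha):
    """Interval that contains the sample mean with probability 1 - alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    half_width = z * sigma / sqrt(n)
    return mu - half_width, mu + half_width

low, high = sample_mean_interval(mu=800, sigma=100, n=25, alpha=0.05)
print(f"{low:.1f} < Xbar < {high:.1f}")
```

An observed sample mean of 750 lies outside this interval, which is the basis for doubting the claimed population mean.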


Salaries of a Business School’s Graduates

$$P\left(800 - 1.96 \frac{100}{\sqrt{25}} < \bar{X} < 800 + 1.96 \frac{100}{\sqrt{25}}\right) = .95$$

$$P(760.8 < \bar{X} < 839.2) = .95$$

This tells us that there is a 95% probability that a sample
mean will fall between 760.8 and 839.2. Because the sample mean
was computed to be $750, we would have to conclude that
the dean’s claim is not supported by the statistic.

Sampling Distribution of a Proportion…


The estimator of a population proportion of successes is the sample
proportion. That is, we count the number of successes in a sample and
compute:

$$\hat{P} = \frac{X}{n}$$

(read this as “p-hat”).

X is the number of successes, n is the sample size.


Sampling Distribution of Proportion


For example, suppose that we have a binomial experiment with n
= 10 and p = .4. To find the probability that the sample proportion
P̂ is less than or equal to .50, we find the probability that X is less
than or equal to 5 (because X/n = 5/10 = .50).

After calculations,

$$P(\hat{P} \le .50) = P(X \le 5) = .834$$

We can calculate the probability associated with other values of
P̂ similarly.

Discrete distributions such as the binomial do not lend themselves
easily to the kinds of calculation needed for inference. And
inference is the reason we need sampling distributions.
Fortunately, we can approximate the binomial distribution by a
normal distribution.
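The "calculations" for the n = 10, p = .4 example amount to summing binomial probabilities. A minimal sketch using only the standard library (the helper names are illustrative):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for a binomial random variable with parameters n and p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(X <= k), obtained by summing the probabilities of 0, 1, ..., k successes."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# P(P-hat <= .50) with n = 10 is the same event as X <= 5.
p_hat_le_half = binom_cdf(5, 10, 0.4)
print(f"P(X <= 5) = {p_hat_le_half:.4f}")
```

The exact value is about .8338, which the slide reports as .834.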

Normal Approximation to Binomial Distribution


[Figure: Binomial distribution with n = 20 and p = .5]

• Let X be a binomial random variable with n = 20 and p = .5.
• We can easily determine the probability of each value of X,
where X = 0, 1, 2, …, 19, 20.
• A rectangle representing a value of x is drawn so that its area
equals the probability.

Normal Approximation to Binomial Distribution


[Figure: Binomial distribution with n = 20 and p = .5]

• If we now smooth the ends of the rectangles, we produce a
bell-shaped curve.
• Thus the base of each rectangle for x is the interval x - .5 to
x + .5.
• As you can see, the rectangle representing x = 10 is the
rectangle whose base is the interval 9.5 to 10.5 and whose
height is P(X = 10) = .176.
[Figure: Binomial distribution with n = 20 and p = .5 and normal approximation]

Normal Approximation to Binomial Distribution


[Figure: Binomial distribution with n = 20 and p = .5 and normal approximation]

• We accomplish this by letting the height of the rectangle equal
the probability and the base of the rectangle equal 1.
• Thus, to use the normal approximation, all we need to do is find
the area under the normal curve between 9.5 and 10.5.
• To find normal probabilities requires us to first standardize X by
subtracting the mean and dividing by the standard deviation.

Normal Approximation to Binomial Distribution


[Figure: Binomial distribution with n = 20 and p = .5 and normal approximation]

• The values for µ and σ are derived from the binomial
distribution being approximated as:

$$\mu = np \quad \text{and} \quad \sigma = \sqrt{npq}$$

where q = 1 - p.

• For n = 20 and p = .50 we have µ = 10 and σ = 2.24.


Normal Approximation to Binomial Distribution


[Figure: Binomial distribution with n = 20 and p = .5 and normal approximation]

• To calculate the probability that X = 10 using the normal
distribution requires that we find the area under the normal curve
between 9.5 and 10.5, i.e.,

$$P(X = 10) \approx P(9.5 < Y < 10.5) = P\left(\frac{9.5 - 10}{2.24} < \frac{Y - \mu}{\sigma} < \frac{10.5 - 10}{2.24}\right) = P(-.22 < Z < .22) = 2(.0871) = .1742$$

where Y is the approximating normal random variable.

• The actual probability that X equals 10 is P(X = 10) = .176. As we
can see, the approximation is quite good.
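The comparison between the exact binomial probability and its normal approximation can be sketched as follows (variable names are illustrative; the ±.5 adjustment is the rectangle-base argument described above):

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.5
q = 1 - p

# Exact binomial probability P(X = 10).
exact = comb(n, 10) * p**10 * q ** (n - 10)

# Normal approximation: area under N(np, sqrt(npq)) between 9.5 and 10.5.
approx_dist = NormalDist(mu=n * p, sigma=sqrt(n * p * q))
approx = approx_dist.cdf(10.5) - approx_dist.cdf(9.5)

print(f"exact  P(X = 10)         = {exact:.4f}")
print(f"approx P(9.5 < Y < 10.5) = {approx:.4f}")
```

Without rounding z to two decimals the approximation is about .177, even closer to the exact .1762 than the table-based .1742.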

Approximate Sampling Distribution of a Sample Proportion


Using the laws of expected value and variance, we can determine the mean,
variance, and standard deviation of P̂. (The standard deviation of P̂ is known
as the standard error of the proportion.) That is,

$$E(\hat{P}) = p$$

$$V(\hat{P}) = \sigma^2_{\hat{p}} = \frac{pq}{n}$$

$$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$$

Thus the variable

$$Z = \frac{\hat{P} - p}{\sqrt{pq / n}}$$

is approximately standard normally distributed provided that the sample size is
large. The theoretical sample size requirements are that np and nq are both greater
than or equal to 5. This requirement is referred to as theoretical because in practice
much larger sample sizes are needed for the inference to be useful.

Political Survey

In the last election a Member of Parliament received 52% of the
votes cast. One year after the election the representative
organized a survey that asked a random sample of 300 people
whether they would vote for him in the next election. If we
assume that his popularity has not changed, what is the
probability that more than half of the sample would vote for him?


Solution
• The number of respondents who would vote for the representative is
a binomial random variable with n = 300 and p = .52. We want to
determine the probability that the sample proportion is greater than
50%. That is, we want to find P(P̂ > .50).
• We now know that the sample proportion is approximately normally
distributed with mean p = .52 and standard deviation

$$\sqrt{pq / n} = \sqrt{(.52)(.48) / 300} = .0288$$

Thus we calculate

$$P(\hat{P} > .50) = P\left(\frac{\hat{P} - p}{\sqrt{pq / n}} > \frac{.50 - .52}{.0288}\right) = P(Z > -.69) = .5 + .2549 = .7549$$

If we assume the level of support remains at 52%, the probability that
more than half the sample of 300 people would vote for the
representative is 75.49%.
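The same calculation can be sketched in code, using the normal approximation to the sampling distribution of P̂ (variable names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

n, p = 300, 0.52
q = 1 - p

# Standard error of the sample proportion.
se = sqrt(p * q / n)

# P(P-hat > .50) under the normal approximation with mean p and standard error se.
prob = 1 - NormalDist(mu=p, sigma=se).cdf(0.50)

print(f"standard error = {se:.4f}")
print(f"P(P-hat > .50) = {prob:.4f}")
```

The result, about .756, differs slightly from .7549 because the table-based solution rounds z to -.69.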

Housing Board Colony


A housing board colony of Gwalior consists of
2000 houses. A researcher wants to know the
average income of the households in this
housing board colony. The mean income per
household is Rs 150,000 with a standard
deviation of Rs 15,000. A random sample of 200
households is selected by the researcher and
analysed. What is the probability that the
sample average is greater than Rs 160,000?

Solution
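One way to set up this calculation is sketched below. It assumes that the finite population correction should be applied, because the population (N = 2000) is only 10 times the sample size (n = 200), and that X̄ is approximately normal for a sample this large. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

N, n = 2000, 200
mu, sigma = 150_000, 15_000

# Standard error with the finite population correction factor sqrt((N - n) / (N - 1)).
fpc = sqrt((N - n) / (N - 1))
se = sigma / sqrt(n) * fpc

# Standardize the threshold of Rs 160,000 and find the upper-tail probability.
z = (160_000 - mu) / se
prob = 1 - NormalDist().cdf(z)

print(f"standard error = {se:.1f}")
print(f"z = {z:.2f}")
print(f"P(Xbar > 160,000) = {prob:.6f}")
```

Under these assumptions z is close to 10, so the probability of a sample average above Rs 160,000 is effectively zero.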


Summary
• The sampling distribution of a statistic is
created by repeated sampling from one
population. Here, we introduced the sampling
distribution of the mean, the proportion and the
difference between the means. We described how
these distributions are created theoretically and
empirically.


From Here to Inference


We introduced probability distributions, which allowed us to
make probability statements about values of the random
variable.

A prerequisite of this calculation is knowledge of the
distribution and the relevant parameters.


From Here to Inference

The figure below symbolically represents the use of probability
distributions.

Simply put, knowledge of the population and its parameter(s) allows us
to use the probability distribution to make probability statements about
individual members of the population.


From Here to Inference


We developed the sampling distribution, wherein knowledge of the
parameter(s) and some information about the distribution allow us to
make probability statements about a sample statistic.


From Here to Inference


Statistical inference works by reversing the direction of the flow of knowledge in the
previous figure. The next figure displays the character of statistical inference.

From now on, we will assume that most population parameters are unknown. The
statistics practitioner will sample from the population and compute the required
statistic. The sampling distribution of that statistic will enable us to draw inferences
about the parameter.
