
Ch04

Sampling Distributions
CHAPTER CONTENTS
• 4.1 Introduction
• 4.2 Sampling Distributions Associated with Normal Populations
• 4.3 Order Statistics
• 4.4 Large Sample Approximations
• 4.5 Chapter Summary
• 4.6 Computer Examples
• Projects for Chapter 4
4.1 Introduction

Sampling distributions play a very important role in statistical analysis and decision making.

Because a random sample is a set of random variables X1, ..., Xn, it follows that a sample
statistic that is a function of the sample is also random.

We call the probability distribution of a sample statistic its sampling distribution.

Sampling distributions provide the link between probability theory and statistical inference.

A population distribution vs. a sampling distribution

A sample statistic is a quantity computed from a random sample, e.g., the sample average or the sample standard deviation.
A sample statistic (such as the sample mean or the sample variance) is therefore a random variable.

Sampling distribution: the distribution of a sample statistic,
e.g., the distribution of the sample mean or of the sample variance.
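As a concrete illustration (a minimal simulation sketch; the population parameters μ = 50, σ = 10 and the sample size n = 25 below are assumed, not from the textbook), the sample means cluster around μ with standard deviation close to σ/√n:

```python
# Simulating the sampling distribution of the sample mean (illustrative sketch).
# Assumed values: population N(mu=50, sigma=10), sample size n=25, 10,000 replications.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 50.0, 10.0, 25, 10_000

# Each row is one random sample; each row mean is one draw of X_bar.
samples = rng.normal(mu, sigma, size=(reps, n))
xbars = samples.mean(axis=1)

print(xbars.mean())        # close to mu = 50
print(xbars.std(ddof=1))   # close to sigma / sqrt(n) = 2.0
```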


Population: Normal/Non-normal

σ is the standard deviation of the population.


4.1.1 FINITE POPULATION
4.2 Sampling Distributions Associated with Normal Populations

Population: Normal ⇒ the sample mean X_bar is normal
Now we introduce some distributions that can be derived from a
normal distribution.
4.2.1 CHI-SQUARE DISTRIBUTION
The chi-square distribution is a special case of the gamma distribution with α = n/2 and β = 2.

n (a positive integer): degrees of freedom


Mean: μ = n
Variance: σ^2 = 2n

Ref: 3.2.5 Gamma distribution
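A quick illustration of this relationship (a scipy sketch, not part of the textbook): the chi-square density with n degrees of freedom coincides with the gamma density with shape n/2 and scale 2. The value n = 5 below is an arbitrary choice.

```python
# Chi-square(n) as Gamma(alpha = n/2, beta = 2): the densities coincide.
from scipy import stats

n = 5     # degrees of freedom (assumed value for illustration)
x = 3.7   # an arbitrary evaluation point

print(stats.chi2.pdf(x, df=n))                       # chi-square density
print(stats.gamma.pdf(x, a=n / 2, scale=2))          # gamma density, same value
print(stats.chi2.mean(df=n), stats.chi2.var(df=n))   # mean = n, variance = 2n
```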


S^2 : sample variance = Σ (Xi − X_bar)^2 / (n − 1)

[Diagram] Population ~ N(μ, σ^2); σ^2 is the population variance, i.e., the variance of each Xi, i = 1, ..., 10. A random sample X1, X2, ..., X10 is drawn, and S^2 is the sample variance of that random sample.

Each Xi is a random variable; it has a normal distribution.
S^2 is a random variable. What is its distribution?
(n − 1)S^2 / σ^2 ~ χ^2(n − 1)
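A minimal simulation sketch of this fact (the population values μ = 5, σ = 2 and the sample size n = 10 below are assumed for illustration):

```python
# Simulating (n-1) * S^2 / sigma^2 and comparing with Chi-square(n-1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma, n, reps = 5.0, 2.0, 10, 20_000   # assumed population parameters

samples = rng.normal(mu, sigma, size=(reps, n))
s2 = samples.var(axis=1, ddof=1)            # sample variances S^2
stat = (n - 1) * s2 / sigma**2

print(stat.mean())                          # close to n - 1 = 9
print(stat.var())                           # close to 2(n - 1) = 18
print(stats.chi2.mean(df=n - 1), stats.chi2.var(df=n - 1))
```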
4.2.2 STUDENT t-DISTRIBUTION

Let the random variables X1, ..., Xn follow a normal distribution with mean μ and variance σ^2.

If σ is known, then we know that √n (X_bar − μ) / σ is N(0, 1).

If σ is not known (as is usually the case), then it is routinely replaced by the sample standard
deviation S.

If the sample size is large, one could suppose that S ≈ σ, apply the
Central Limit Theorem, and obtain that √n (X_bar − μ) / S is approximately
N(0, 1).


If the random sample is small, then the distribution of √n (X_bar − μ) / S
is given by the so-called Student t-distribution (or simply t-distribution).
This was originally developed by W.S. Gosset in 1908. Because his employers, the
Guinness brewery, would not permit him to publish this important work in his own name,
he used the pseudonym “Student.” Thus, the distribution is known as the Student t-
distribution.
In fact, the standard normal distribution
provides a good approximation to the t-
distribution for sample sizes of 30 or more.
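For instance, here is a small scipy sketch (illustrative, not from the textbook) comparing the upper 5% critical values of the t-distribution with the standard normal cutoff as the degrees of freedom grow:

```python
# Comparing t-distribution and standard normal critical values (upper 5% point).
from scipy import stats

for df in (5, 10, 30, 100):
    print(df, stats.t.ppf(0.95, df))    # t cutoff shrinks toward the normal cutoff
print("N(0,1):", stats.norm.ppf(0.95)) # about 1.645
```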
EXAMPLE 4.2.6
A manufacturer of fuses claims that with 20% overload, the fuses will blow in less than 10 minutes on the average.
To test this claim, a random sample of 20 of these fuses was subjected to a 20% overload, and the times it took them to
blow had a mean of 10.4 minutes and a sample standard deviation of 1.6 minutes. It can be assumed that the data
constitute a random sample from a normal population.
Do they tend to support or refute the manufacturer’s claim?

Hypothesis test: the claimed average time is less than 10 minutes, while the observed sample mean is 10.4 minutes.
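A sketch of the computation (using scipy; the numbers come from the example above). The t statistic and its upper-tail probability, computed under the claimed mean of 10 minutes, can then be weighed against the manufacturer's claim:

```python
# Example 4.2.6 numbers: n = 20, sample mean 10.4, sample std 1.6, claimed mean 10.
# t = (x_bar - mu0) / (s / sqrt(n)), compared with the t(n-1) distribution.
from math import sqrt
from scipy import stats

n, x_bar, s, mu0 = 20, 10.4, 1.6, 10.0
t = (x_bar - mu0) / (s / sqrt(n))
print(t)                         # about 1.118
print(stats.t.sf(t, df=n - 1))   # P(T >= 1.118) with 19 df, about 0.14
```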
EXAMPLE 4.2.7
The human gestation period—the period of time between conception and labor—is
approximately 40 weeks (280 days), measured from the first day of the mother’s last
menstrual period.
For a newborn full-term infant, the length appropriate for gestational age is assumed to be
normally distributed with μ = 50 centimeters and σ = 1.25 centimeters.
Compute the probability that a random sample of 20 infants born at full term results in a
sample mean greater than 52.5 centimeters.

[Figure: normal curve of infant length, centered at μ = 50 cm]

gestation 孕育
conception 懷孕
labor 分娩
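A sketch of the computation for this example (scipy; the values come from the problem statement):

```python
# Example 4.2.7: P(X_bar > 52.5) for n = 20, mu = 50, sigma = 1.25 (normal population).
from math import sqrt
from scipy import stats

mu, sigma, n = 50.0, 1.25, 20
se = sigma / sqrt(n)        # standard error of the mean, about 0.2795
z = (52.5 - mu) / se        # about 8.94
print(stats.norm.sf(z))     # upper-tail probability, essentially 0
```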
4.2.3 F-DISTRIBUTION

The F-distribution was developed by Fisher to study the behavior of the ratio of two sample variances
computed from random samples taken from two independent normal populations.

Question of interest:
whether the population variances are equal or not,
based on the observed random samples.
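A minimal sketch of how such a comparison works (the sample sizes and the common variance below are assumed for illustration; this is not a worked example from the textbook):

```python
# Ratio of sample variances from two independent normal samples (assumed setup).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n1, n2 = 16, 21                         # assumed sample sizes
x = rng.normal(0.0, 1.0, n1)            # both populations have variance 1 here
y = rng.normal(0.0, 1.0, n2)

f = x.var(ddof=1) / y.var(ddof=1)       # F = S1^2 / S2^2 when sigma1^2 = sigma2^2
print(f)
print(stats.f.ppf(0.95, dfn=n1 - 1, dfd=n2 - 1))   # upper 5% point of F(15, 20)
```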
4.3 Order Statistics

This section is skipped.


The extreme (i.e., largest) value distribution

(Skipped)

4.4 Large Sample Approximations

If the sample size is large, the normality assumption on the underlying population can be relaxed.

A useful generalization of Corollary 4.2.2:
By Corollary 4.2.2, if the random sample came from a normal population, then the sampling
distribution of the mean is normally distributed regardless of the size of the sample.

By Theorem 4.4.1, regardless of the form of the population distribution, the
z-transform of the sample mean X_bar will be approximately a standard
normal random variable whenever n is large.

Even though the required sample size to apply Theorem 4.4.1 depends on the
particular distribution of the population, for practical purposes we will consider the
sample size to be large enough if n ≥ 30.
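A small simulation sketch of Theorem 4.4.1 for a non-normal population (an exponential population with mean 1 is assumed here for illustration):

```python
# CLT illustration for a skewed (non-normal) population: exponential with mean 1.
import numpy as np

rng = np.random.default_rng(3)
n, reps = 30, 20_000
samples = rng.exponential(scale=1.0, size=(reps, n))     # skewed population
z = (samples.mean(axis=1) - 1.0) / (1.0 / np.sqrt(n))    # z-transform of X_bar

print(z.mean(), z.std(ddof=1))   # close to 0 and 1
print(np.mean(z <= 1.645))       # roughly 0.95, as for a standard normal
```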
EXAMPLE 4.4.1
The average SAT score for freshmen entering a particular university is 1100 with a standard
deviation of 95. What is the probability that the mean SAT score for a random sample of 50 of
these freshmen will be anywhere from 1075 to 1110?
Note: We don’t know the distribution of the SAT score.
4.4.1 THE NORMAL APPROXIMATION TO THE BINOMIAL DISTRIBUTION

Because Y = nX_bar, by the Central Limit Theorem, Y has an approximate normal distribution with mean μ = np and variance σ^2 = np(1 − p).
Because the calculation of binomial probabilities is cumbersome for large sample sizes n, the normal approximation to the binomial
distribution is widely used. A useful rule of thumb is to treat n as large enough for the normal approximation when np > 5 and
n(1 − p) > 5. Otherwise, the binomial distribution may be so asymmetric that the normal distribution does not provide a good approximation.
Other rules, such as np > 10 and n(1 − p) > 10, or np(1 − p) > 10, are also used in the literature. Because all of these rules are only
approximations, for consistency's sake we will use np > 5 and n(1 − p) > 5 to test for largeness of the sample size in the normal approximation
to the binomial distribution. If the need arises, we could use the more stringent condition np(1 − p) > 10.
CORRECTION FOR CONTINUITY FOR THE NORMAL APPROXIMATION TO THE
BINOMIAL DISTRIBUTION
EXAMPLE 4.4.2
A study of parallel interchange ramps revealed that many drivers do not use the entire length of parallel lanes for
acceleration, but seek, as soon as possible, a gap in the major stream of traffic to merge. At one site on Interstate Highway
75, 46% of drivers used less than one third of the lane length available before merging. Suppose we monitor the merging
pattern of a random sample of 250 drivers at this site.
(a) What is the probability that fewer than 120 of the drivers will use less than one third of the acceleration lane length
before merging?
(b) What is the probability that more than 225 of the drivers will use less than one third of the acceleration lane length
before merging?
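A sketch of the computation using the continuity correction (scipy; the values come from the example):

```python
# Example 4.4.2: normal approximation with continuity correction, n = 250, p = 0.46.
from math import sqrt
from scipy import stats

n, p = 250, 0.46
mu = n * p                                   # 115
sd = sqrt(n * p * (1 - p))                   # about 7.88  (np > 5 and n(1-p) > 5 hold)

# (a) P(Y < 120) = P(Y <= 119)  ->  use 119.5 with the continuity correction
print(stats.norm.cdf((119.5 - mu) / sd))     # about 0.72

# (b) P(Y > 225) = P(Y >= 226)  ->  use 225.5 with the continuity correction
print(stats.norm.sf((225.5 - mu) / sd))      # essentially 0
```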
4.5 Chapter Summary
4.6 Computer Examples (Optional)
Projects for Chapter 4
