A typical problem in Probability Theory is of the following form: a sample space and underlying
probability distribution are specified, and we are asked to compute the probability of a given
event (or events).
Example: Experiment of rolling a loaded die, for which the chance of landing on an odd-
numbered outcome is twice that of landing on an even-numbered outcome
a. Define the sample space.
b. Define an appropriate probability space.
c. Let E be event of getting either a 3 or a 4. Find P(E).
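One way to work this out: take S = {1, 2, 3, 4, 5, 6} and assign P({i}) = 2/9 to each odd outcome and P({i}) = 1/9 to each even outcome, so that the probabilities sum to 3(2/9) + 3(1/9) = 1. Then P(E) = P({3}) + P({4}) = 2/9 + 1/9 = 1/3.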
In a typical problem of Statistical Inference, it is not a single underlying probability distribution which
is specified, but rather a class of probability distributions, any of which may possibly be the one that
governs the chance experiment, whose outcome we shall observe. We know that the underlying
probability distribution is a member of this class, but we do not know which one it is. The objective
is thus to determine a “good” way of guessing, on the basis of the observed outcome of the
experiment, which of the underlying probability distributions is the one that actually governs the
experiment.
Example: Consider the experiment of rolling a die about which we know nothing.
a. Define the random variable X as the number of dots, i.e., face value of the die. What
are the possible values of X?
b. What would be a possible (probability) distribution of X?
c. Suppose the die is rolled 5 times. On the ith roll, let Xi be the number of dots. What
would be the (joint) distribution of X = (X1, X2, …, X5)′?
We now consider the specification of a statistical problem. Suppose that there is an experiment whose
outcome can be observed by the statistician. This outcome is described by a random variable X (or
random vector X), which takes on values in the space S. The distribution function of X, say FX, (or in
the case of a random vector, the distribution of X is FX ) is unknown to the statistician, but it is known
that FX belongs to a specified class of distribution functions, the class Ω. The collection of possible
actions that the statistician can take, or the collection of possible statements that can be made, at the
end of the experiment, is called the decision space, denoted as D. At the conclusion of the experiment,
the statistician actually chooses only one action (or makes only one statement) from the possible
choices in D.
In summary, therefore, any statistical problem can be specified by defining each of the components of
the triplet (S, Ω, D).
Example
Suppose we are given a coin, about which we know nothing. We are allowed to perform 10
independent flips of the coin, on each of which the probability of getting a head is p. We do not
know the value of p, but we know that p lies in the interval [0,1].
In this example, we can let 𝑋 = (𝑋1, 𝑋2, …, 𝑋10)′, with each 𝑋𝑖 defined to be “1” or “0”
according to whether the ith flip is a head or a tail. Then, 𝑺 consists of all the 2¹⁰ = 1,024 possible values of the
vector 𝑋.
vector 𝑋. The class 𝛀 consists of all possible probability mass functions of 𝑋 for which the 𝑋𝑖 𝑠 are
independently and identically distributed Bernoulli random variables with probability of success 𝑝,
i.e., iid 𝐵𝑒(𝑝). Thus, for a specific value of the vector 𝑋, say 𝑥 = (𝑥1 , 𝑥2 , … , 𝑥10 )′,
𝛀 = { 𝑝𝑋 ∶ 𝑝𝑋 (𝑥) = 𝑝∑ 𝑥𝑖 (1 − 𝑝)10−∑ 𝑥𝑖 , 0 ≤ 𝑝 ≤ 1 } .
𝑫 = { 𝑝̂ ∶ 0 ≤ 𝑝̂ ≤ 1} .
This type of statistical problem is referred to as Point Estimation. If, on the other hand, we do not
merely want a guess as to the value of 𝑝, but rather a statement of an interval of values which is thought
to enclose the true value of 𝑝, then we can define 𝑫 as
𝑫 = { (𝑝𝐿 , 𝑝𝑈 ) ∶ 0 ≤ 𝑝𝐿 ≤ 𝑝𝑈 ≤ 1 } .
Suppose that we are not required to come up with a numerical guess as to the value of 𝑝, but only to
decide whether or not the coin is fair. In this case, 𝑫 can be defined as
𝑫 = { “the coin is fair”, “the coin is not fair” } .
Note that 𝑫 can be viewed as the collection of possible answers to a question (e.g., “What do you guess
𝑝 to be?” or “Within what interval do you guess 𝑝 to lie?” or “Is the coin fair?”) asked of the statistician.
The real problem in Statistical Inference lies in choosing the best “guessing method.” Note also
that there are infinitely many ways of arriving at a guess of the value of 𝑝 or arriving at a decision
whether the coin is fair or not. Which of these ways of forming a guess from the experimental data
should we actually employ? This is the real problem of inference.
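To make this concrete, here is a minimal Python sketch of three different "guessing methods" applied to the same 10 flips; the true p, the shrinkage rule, and the 0.2 cutoff are all illustrative assumptions, not part of these notes:

    import random

    random.seed(1)
    p_true = 0.7                                   # unknown to the statistician in practice
    flips = [1 if random.random() < p_true else 0 for _ in range(10)]

    n, s = len(flips), sum(flips)

    # Three of the infinitely many possible guessing methods for p:
    p_hat_prop = s / n                             # sample proportion
    p_hat_shrunk = (s + 1) / (n + 2)               # a shrinkage-type guess pulled toward 1/2
    decision = "fair" if abs(s / n - 0.5) < 0.2 else "not fair"   # an ad hoc fairness rule

    print(p_hat_prop, p_hat_shrunk, decision)

Each rule is a legitimate way of mapping the observed outcome to an element of 𝑫; choosing among such rules is precisely the problem of inference described above.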
When statisticians discuss statistical problems, they naturally classify them in certain ways. In our
case, we shall classify statistical problems on the basis of the structure of 𝛀 and 𝑫.
Example:
In a parametric treatment of a problem, the distributions 𝐹𝑋 in Ω are assumed to share a
common functional form indexed by a finite number of unknown parameters, e.g.,
Ω = { N(μ, σ²) : μ ∈ R, σ² > 0 }. In contrast, a nonparametric treatment of the same problem
would entail only slight assumptions regarding the distributions 𝐹𝑋 in Ω, e.g., that Ω consists
of absolutely continuous distributions, not necessarily having common parameters.
On the basis of the structure of 𝑫, statistical problems may be classified as:
a. Point Estimation
b. Interval (or Region) Estimation
c. Hypothesis Testing
d. Ranking / Multiple Decision Problems
e. Regression / Experimental Designs Problems
Letters (a) – (c) were discussed in the example of the previous section. Region Estimation involves
estimating a vector of parameters 𝜃 = (𝜃1, 𝜃2, …, 𝜃𝑘)′. For example, we might be interested in
estimating the mean 𝜇 and the variance 𝜎² of a distribution simultaneously. In this case, our estimate
will be of the form
𝑫 = { (𝑢, 𝑣) : 𝑢1 ≤ 𝑢 ≤ 𝑢2 ; 𝑣1 ≤ 𝑣 ≤ 𝑣2 } ,
a rectangular region in the (𝜇, 𝜎²) plane.
Multiple Decision Problems are decision problems where there are a finite (more than 2) number
of possible decisions. Note that hypothesis testing is just a special case of this problem with the
number of possible decisions equal to two. Ranking Problems are those for which a decision is a
statement as to the complete ordering of certain objects or things, as in “Method A is best, B is next,
and C is the worst.”
A Regression Problem investigates the (linear) relationship between one variable
(the dependent variable) and a set of other variables (the independent variables). For this type of problem,
the objective is to predict the value of the dependent variable based on the observed values of the
independent variables. Whereas Regression Analysis investigates the linear relationships between
variables, an Experimental Designs Problem looks into the causal
relationship between a dependent variable and several independent variables. Such problems aim
to determine whether changes in the independent variables cause some effect on the dependent
variable.
Topics that do not fall into any of the classifications just mentioned but are of practical importance
to us are listed below. Some of the more important topics usually discussed are
a. Sampling Methods
b. Cost Considerations
c. Randomization
d. Asymptotic Theory
In Sampling Methods, the focus oftentimes falls on the so-called fixed-sample-size and sequential
procedures. The former entails fixing the sample size even before any data are collected, while the
latter is characterized by taking the observations sequentially, hence, the sample size is not fixed in
advance. Cost Considerations frequently become a deciding factor in the choice between the two
sampling procedures.
Some mathematically oriented problems in statistics involve the use of Randomization. Loosely,
this is the process of incorporating some element of chance into the manner in which the experiment
is being performed, so as to minimize the possibility of having biases. Asymptotic Theory is the
class of results and theories that apply for cases using very large samples. The word asymptotic is
usually used to describe a method, a result, a theorem, or a definition associated with very large
samples.
Defn: The totality of elements which are under discussion, and about which information is desired,
will be called the target population.
Remarks:
1. The target population must be capable of being well defined. It may be real or hypothetical.
2. The object of any investigation is to find out something about a given target population.
3. It is generally impossible or impractical to examine the entire population, but, on the basis of
examining a part of it, inferences regarding the entire target population can be made.
Consider 𝑋~𝐹𝑋 . You wish to make inferences about FX on the basis of n independent observations of
𝑋, say 𝑋1 , 𝑋2 , … , 𝑋𝑛 . The problem now is how to select a part of the population, i.e., how do we obtain
a sample? In answering this question, the following consideration should be taken into account: If the
sample is selected in a certain way, we can make probabilistic statements about the population.
Defn: Given a probability space and a positive integer n, a collection of n independent random
variables 𝑋1 , 𝑋2 , … , 𝑋𝑛 , all having common distribution 𝐹𝑋 is called a random sample from a
population (with distribution) 𝐹𝑋 .
Note: We assume that each physical element of the population has some numerical value associated
with it and that the distribution of these values is given by the distribution function 𝐹𝑋 .
Remarks:
1. A random sample (r.s.) can be viewed as a random vector 𝑋 = (𝑋1 , 𝑋2 , … , 𝑋𝑛 )′ defined on the
n-dimensional real space Rn. Further, it can also be interpreted as the outcome of a series of n
independent trials of an experiment performed under identical conditions.
2. Inference works under the assumption that the sample (data) reflects the truth about the
population. To ferret out this truth, inference employs the so-called process of inductive
argumentation.
Example: If a given coin is biased (loaded) in favor of heads, we would expect to observe heads
more frequently than tails in repeated tosses of the coin. Out of 20 tosses of the coin,
14 were heads and only 6 were tails. This is thus taken as evidence that the coin may
not be fair.
Since we cannot make absolutely certain generalizations, uncertainty will always be present in
all inductive inferences we make. This is why statistical inference is based on laws of
probability.
3. The distribution 𝐹𝑋 is usually called the sampled population, the collection of all elements from
which the sample is actually selected. (In certain cases, 𝐹𝑋 may be replaced with the
corresponding PMF or PDF.)
4. Implicitly, sampling without replacement from a finite population is ruled out in the above
definition.
5. Since the r.s. 𝑋 = (𝑋1, 𝑋2, …, 𝑋𝑛)′ consists of independent and identically distributed (iid)
random variables, the distribution of the r.s. X, which is simply the joint distribution of
𝑋1, 𝑋2, …, 𝑋𝑛, is thus
F(x1, x2, …, xn) = ∏ᵢ₌₁ⁿ FX(xᵢ) .
Examples:
1. In studying the “reliability” of light bulbs, the lifetime X (in hours) of a given light bulb is taken
to be a r.v. with density function
fX(x) = λ e^(−λx) I(0,∞)(x) , λ > 0.
A collection of n light bulbs is put to a “reliability test” and their lifetimes are recorded. Then,
𝑋 = (𝑋1, 𝑋2, …, 𝑋𝑛)′ can be considered a r.s. (from an exponential population with parameter
λ). What is the PDF of the r.s. X? What are S and 𝛀 for this statistical problem?
2. A r.s. 𝑋 = (𝑋1, 𝑋2, …, 𝑋𝑛)′ from a Bernoulli population with parameter 0 ≤ p ≤ 1 will have a PMF
of the form
pX(x1, x2, …, xn) = p^(∑xᵢ) (1 − p)^(n − ∑xᵢ) , xᵢ ∈ {0, 1}.
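For instance, for Example 1 (with the rate parameterization assumed above), the PDF of the r.s. is the product of the marginal densities,
f(x1, x2, …, xn) = ∏ᵢ₌₁ⁿ λ e^(−λxᵢ) = λⁿ e^(−λ ∑xᵢ) , each xᵢ > 0,
so that S = (0, ∞)ⁿ and 𝛀 = { λⁿ e^(−λ ∑xᵢ) : λ > 0 }.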
2.2.1 Statistics
Defn: Let X = (X1, X2, …, Xn)’ be an observable random vector. Any observable function of X, say
T(X), which is itself a r.v. (or random vector), is called a statistic. The standard deviation of a
statistic is called its standard error.
Remarks:
1. A statistic is always
a. a function of observable random variables;
b. itself a r.v.; and
c. free of any unknown parameters.
2. By “observable”, we mean that the value of the statistic T(X) can be computed directly from
the values of the r.v.’s in the r.s.
Examples:
For the given random samples, which of the given functions are statistics?
1. Let X be a r.s. (of size 1) from N(μ, σ²), where μ and σ² are both unknown.
a. X − μ      d. X² + 3
b. X/σ        e. X² + log X²
c. X
2. Let X1, X2, …, X10 be a r.s. of size 10 from a distribution FX.
a. ∑ᵢ₌₁⁵ Xᵢ      d. ∑ᵢ₌₁¹⁰ Xᵢ
3. Let X = (X1, X2, …, Xn)′ be a r.s. from N(μ, σ²), where μ and σ² are both unknown.
a. Sn = ∑ᵢ₌₁ⁿ Xᵢ                     d. S² = ∑ᵢ₌₁ⁿ (Xᵢ − X̄)² / (n − 1)
b. X̄ + (n − 1)Sn                    e. Z = (X1 − μ)/σ
c. T1(X) = ∑ᵢ₌₁ⁿ (Xᵢ − X̄)² / n      f. T2(X) = (n − 1)S²/σ²
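As a concrete check on the "observable" requirement, the following minimal Python sketch (the sample size and parameter values are illustrative) computes quantities that are statistics and quantities that are not:

    import random

    random.seed(0)
    n = 20
    mu, sigma = 5.0, 2.0                                # unknown to the statistician in practice
    x = [random.gauss(mu, sigma) for _ in range(n)]

    # Statistics: computable from the observed sample alone
    s_n = sum(x)                                        # sample sum
    xbar = s_n / n                                      # sample mean
    s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)    # sample variance

    # NOT statistics: these involve the unknown parameters mu and sigma
    z = (x[0] - mu) / sigma
    t2 = (n - 1) * s2 / sigma ** 2

    print(xbar, s2)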
Let X = (X1, X2, …, Xn)′ be a r.s. from FX. Some common statistics are:
1. Sample Sum : Sn = ∑ᵢ₌₁ⁿ Xᵢ
2. Sample Mean : X̄ = ∑ᵢ₌₁ⁿ Xᵢ / n = Sn/n
3. Sample Variance : S² = ∑ᵢ₌₁ⁿ (Xᵢ − X̄)² / (n − 1)
4. rth Sample Moment : M′r = ∑ᵢ₌₁ⁿ Xᵢʳ / n , r = 1, 2, …
5. rth Sample Central Moment : Mr = ∑ᵢ₌₁ⁿ (Xᵢ − X̄)ʳ / n , r = 1, 2, …
Theorem: Let X = (X1, X2, …, Xn)′ be a r.s. from FX. Then,
a. E[M′r] = E[Xʳ] , if E[Xʳ] exists; and,
b. Var[M′r] = { E[X²ʳ] − (E[Xʳ])² } / n , if E[X²ʳ] exists.
Corollary: Let X = (X1, X2, …, Xn)′ be a r.s. from FX, with mean μ and variance σ². Then,
a. E(X̄) = E(X) = μ ; and,
b. Var(X̄) = σ²/n .
Theorem: Let X = (X1, X2, …, Xn)′ be a r.s. from FX, with mean μ and variance σ². Then,
a. E[S²] = σ² ; and,
b. Var[S²] = (1/n) { μ₄ − [(n − 3)/(n − 1)] σ⁴ } ,
where μ₄ = E[(X − μ)⁴] is the fourth central moment, provided it exists.
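The unbiasedness result E[S²] = σ² in part (a) can be checked empirically; here is a minimal Python simulation sketch (the parameter values and number of replications are illustrative) that averages S² over many simulated samples:

    import random
    import statistics

    random.seed(42)
    mu, sigma, n, reps = 0.0, 2.0, 10, 20000

    s2_values = []
    for _ in range(reps):
        x = [random.gauss(mu, sigma) for _ in range(n)]
        s2_values.append(statistics.variance(x))   # sample variance, divides by n - 1

    # The average of S^2 over many samples should be close to sigma^2 = 4
    print(sum(s2_values) / reps)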
Remarks:
1. A statistic, being a r.v., has thus its own probability distribution, which is called the sampling
distribution.
2. The sampling distribution of a statistic is affected by the sample size n, the population size N
(for finite cases), and the way X was observed (i.e., the manner in which the r.s. was selected).
Example: Let X = (X1, X2, …, Xn)′ be a r.s. from N(μ, σ²), where μ ∈ R and σ² > 0.
a. The sampling distribution of the sample sum is Sn ~ N(nμ, nσ²).
b. The sampling distribution of the sample mean is X̄ ~ N(μ, σ²/n).
Defn: The family of density (or mass) functions { f(x; θ) : θ ∈ Θ }, with parameter θ, is said to be
reproductive with respect to the parameter θ if, and only if, whenever X1 and X2 are
independent r.v.’s with densities f(x; θ1) and f(x; θ2) in the family, their sum X1 + X2 has
density f(x; θ1 + θ2), which is again a member of the family.
Remark: The above definition also applies for more than 2 independent r.v.’s.
Examples:
1. Xi ~ Bi(mi, p), i = 1, 2, …, n, independent ⇒ Sn ~ Bi(∑ᵢ₌₁ⁿ mi , p)
2. Xi ~ N(μi, σi²), i = 1, 2, …, n, independent ⇒ ∑ᵢ₌₁ⁿ aiXi ~ N(∑ᵢ₌₁ⁿ aiμi , ∑ᵢ₌₁ⁿ ai²σi²)
3. Xi ~ Exp(λ), i = 1, 2, …, n, independent ⇒ Sn ~ Ga(n, λ)
Defn: A continuous r.v. X is said to have a chi-square distribution with k degrees of freedom
(d.f.) if, and only if, the PDF of X is given by
fX(x) = [1 / (2^(k/2) Γ(k/2))] x^(k/2 − 1) e^(−x/2) I(0,∞)(x) , k ∈ Z⁺.
Notation : X ~ χ²ₖ
Mean : E(X) = k
Variance : Var(X) = 2k
MGF : mX(t) = (1 − 2t)^(−k/2) , t < ½
FIGURE 1.1. Graph of the chi-square distribution with varying degrees of freedom (k = 2, 5, 10, 15).
Remarks
1. The degrees of freedom (d.f.) of the chi-square distribution completely specify the
distribution of a chi-square r.v.
2. A chi-square r.v. with k d.f. is equivalent to a Gamma r.v. with parameters r = k/2 and
λ = ½, i.e., χ²ₖ ≡ Ga(r = k/2, λ = ½).
Theorem: If the r.v.’s X1, X2, …, Xk are normally and independently distributed with means
μᵢ and variances σᵢ², i = 1, 2, …, k, respectively, then
U = ∑ᵢ₌₁ᵏ [(Xᵢ − μᵢ)/σᵢ]² ~ χ²ₖ .
Corollary: If X1, X2, …, Xn is a r.s. from N(μ, σ²), then
U = ∑ᵢ₌₁ⁿ [(Xᵢ − μ)/σ]² ~ χ²ₙ .
Remarks:
1. The theorem states that the sum of the squares of independent standard normal
random variables is a chi-square random variable, with d.f. equal to the number of
r.v.’s (number of terms) in the sum.
2. If X ~ N(μ, σ²), then [(X − μ)/σ]² ~ χ²₁ .
3. The chi-square family of densities is reproductive with respect to the degrees of freedom:
if X1, X2, …, Xn are independent with Xᵢ ~ χ²ₖᵢ, then Sn ~ χ² with ∑ᵢ₌₁ⁿ kᵢ d.f.
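The following minimal Python simulation sketch (k = 5 and the replication count are illustrative) checks the theorem numerically: the simulated sums of squares should have mean ≈ k and variance ≈ 2k:

    import random

    random.seed(7)
    k, reps = 5, 50000

    # Sum of squares of k independent N(0,1) draws ~ chi-square with k d.f.
    u = [sum(random.gauss(0, 1) ** 2 for _ in range(k)) for _ in range(reps)]

    mean = sum(u) / reps                                   # should be close to k = 5
    var = sum((ui - mean) ** 2 for ui in u) / (reps - 1)   # should be close to 2k = 10
    print(mean, var)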
Illustration:
Four r.s.’s, each of size 100, from N(0,1) are obtained using PHStat for MS Excel. The
histograms of the normal random samples, and the histograms for the squares of the values
are shown below.
[FIGURE: Four pairs of panels showing the histograms of the normal samples n1–n4 (left) and of their squared values sq1–sq4 (right).]
Theorem: Let X = (X1, X2, …, Xn)′ be a r.s. from N(μ, σ²), where μ ∈ R, σ² > 0, and n ≥ 2.
Then,
a. X̄ and S² are independent; and,
b. (n − 1)S²/σ² ~ χ²ₙ₋₁ .
Defn: A continuous r.v. X is said to have an F distribution with m and n degrees of freedom
if, and only if, the PDF of X is given by
fX(x) = { Γ[(m + n)/2] / [Γ(m/2) Γ(n/2)] } (m/n)^(m/2) x^(m/2 − 1) [1 + (m/n)x]^(−(m + n)/2) I(0,∞)(x) , m, n ∈ Z⁺.
Notation : X ~ Fm,n
Mean : E(X) = n/(n − 2) , n > 2
Variance : Var(X) = [2n²(m + n − 2)] / [m(n − 2)²(n − 4)] , n > 4
MGF : does not exist (DNE)
Remarks:
1. The numerator (m) and denominator (n) degrees of freedom completely specify the
distribution.
Theorem: If U ~ χ²ₘ and V ~ χ²ₙ are independent, then
X = (U/m) / (V/n) ~ Fm,n .
Remark: The theorem states that the ratio of two independent chi-square r.v.’s, each divided
by its respective d.f., is an F-distributed r.v., with numerator d.f. equal to the d.f. of the
chi-square r.v. in the numerator and denominator d.f. equal to the d.f. of the chi-
square r.v. in the denominator.
Corollary: If X1, X2, …, Xm is a r.s. from N(μX, σ²) and Y1, Y2, …, Yn is another independent
r.s. from N(μY, σ²), then
SX² / SY² ~ Fm−1,n−1 ,
where SX² = ∑ᵢ₌₁ᵐ (Xᵢ − X̄)² / (m − 1) and SY² = ∑ᵢ₌₁ⁿ (Yᵢ − Ȳ)² / (n − 1) .
Illustration: Using 4 of the r.s.’s of size 100 each from N(0,1) in the earlier illustration, the
histograms of sq1/sq2 and sq3/sq4 are shown below.
[FIGURE: Two panels showing the histograms of the two ratios, labeled f1_2 and f5_4.]
Defn: A continuous r.v. X is said to have a (Student’s) t distribution with k degrees of freedom
if, and only if, the PDF of X is given by
fX(x) = { Γ[(k + 1)/2] / [√(kπ) Γ(k/2)] } (1 + x²/k)^(−(k + 1)/2) I(−∞,∞)(x) , k ∈ Z⁺.
Notation : X ~ tₖ
Mean : E(X) = 0 , k > 1
Variance : Var(X) = k/(k − 2) , k > 2
MGF : does not exist (DNE)
FIGURE 1.3. Graph of the t-distribution with varying degrees of freedom (k = 2, 5) and the standard normal
distribution.
Remarks:
1. For large d.f. k, the t-distribution approaches the standard normal distribution.
Theorem: If Z ~ N(0,1) and U ~ χ²ₖ are independent, then
T = Z / √(U/k) ~ tₖ .
Remark: The theorem states that the ratio of a standard normal random variable to the
square root of an independent chi-square random variable divided by its degrees of
freedom is a t-distributed random variable, with d.f. equal to the d.f. of the chi-
square random variable in the denominator.
Corollary: If X1, X2, …, Xn is a r.s. from N(μ, σ²), with sample mean X̄ and sample standard
deviation S, then
T = (X̄ − μ) / (S/√n) ~ tₙ₋₁ .
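A minimal Python simulation sketch of the corollary (the parameter values are illustrative; 2.262 is the 97.5th percentile of the t distribution with 9 d.f.): about 5% of the simulated T values should exceed the critical value in absolute value:

    import math
    import random

    random.seed(3)
    mu, sigma, n, reps = 10.0, 4.0, 10, 20000

    hits = 0
    for _ in range(reps):
        x = [random.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(x) / n
        s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
        t = (xbar - mu) / (s / math.sqrt(n))
        if abs(t) > 2.262:        # t critical value, 9 d.f., two-sided 5%
            hits += 1

    print(hits / reps)            # should be close to 0.05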
[FIGURE: Two panels showing the histograms of n1_n2 and n5_rootsq4 from the simulated N(0,1) samples.]
Some Important Results:
Let X1, X2, …, Xn1 be a r.s. from N(μ1, σ1²) and Y1, Y2, …, Yn2 be another independent r.s. from
N(μ2, σ2²). Then,
1. (X̄ − μ1) / (S1/√n1) ~ tn1−1 and (Ȳ − μ2) / (S2/√n2) ~ tn2−1 ,
where S1² = ∑ᵢ₌₁ⁿ¹ (Xᵢ − X̄)² / (n1 − 1) and S2² = ∑ᵢ₌₁ⁿ² (Yᵢ − Ȳ)² / (n2 − 1) .
2. [(X̄ − Ȳ) − (μ1 − μ2)] / √(σ1²/n1 + σ2²/n2) ~ N(0, 1)
3. [(X̄ − Ȳ) − (μ1 − μ2)] / √[Sp²(1/n1 + 1/n2)] ~ tn1+n2−2 , assuming σ1² = σ2² ,
where Sp² = [(n1 − 1)S1² + (n2 − 1)S2²] / (n1 + n2 − 2) (pooled variance)
4. (S1²/σ1²) / (S2²/σ2²) ~ Fn1−1,n2−1
5. S1²/S2² ~ Fn1−1,n2−1 , assuming σ1² = σ2²
Defn: Let X1, X2, …, Xn be a r.s. from FX. Let X(1) ≤ X(2) ≤ … ≤ X(n) be the Xi’s arranged in
increasing order. Then, X(1), X(2), …, X(n) are called the order statistics corresponding
to the r.s. X1, X2, …, Xn, and X(r) is called the rth order statistic.
Remarks
1. In general, the order statistics (o.s.) are not independent, unless FX is a distribution that is
degenerate at some constant c.
2. The first and the last order statistics, X(1) and X(n), are called the sample minimum and
sample maximum, respectively.
Theorem: Let X(1), X(2), …, X(n) represent the o.s. of a r.s. from the distribution FX. For r
= 1, 2, …, n, the CDF of X(r) is given by
Fr(y) = ∑ⱼ₌ᵣⁿ C(n, j) [FX(y)]ʲ [1 − FX(y)]ⁿ⁻ʲ .
Corollary: The CDF of the sample minimum X(1) and maximum X(n) are, respectively,
F1(y) = 1 − [1 − FX(y)]ⁿ and Fn(y) = [FX(y)]ⁿ .
Example: Suppose 20 identical light bulbs operate independently in a system. The system
stops when one light bulb expires. For i = 1, 2, …, 20, let Xi represent the lifetime
(in days) of the ith bulb, with each Xi ~ Exp(λ). Find the CDF of X(4). What is the
probability that the system will still be working after 150 days?
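A worked sketch, assuming the rate parameterization fX(x) = λe^(−λx) used earlier: the order-statistic CDF formula gives
F4(y) = ∑ⱼ₌₄²⁰ C(20, j) (1 − e^(−λy))ʲ (e^(−λy))²⁰⁻ʲ , y > 0,
and, since the system runs only while all 20 bulbs do, P(system still working after 150 days) = P(X(1) > 150) = [e^(−150λ)]²⁰ = e^(−3000λ).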
Theorem: If FX is absolutely continuous with PDF fX, then for r = 1, 2, …, n, the PDF of
X(r), denoted fr, is given by
fr(y) = r C(n, r) [FX(y)]ʳ⁻¹ fX(y) [1 − FX(y)]ⁿ⁻ʳ .
Corollary: If FX is absolutely continuous with PDF fX, then the PDFs of
the sample minimum X(1) and maximum X(n) are, respectively,
f1(y) = n [1 − FX(y)]ⁿ⁻¹ fX(y) and fn(y) = n [FX(y)]ⁿ⁻¹ fX(y) .
Theorem: Let X(1), X(2), …, X(n) represent the o.s. of a r.s. from the distribution FX. For r, s
= 1, 2, …, n, and r < s, if FX has PDF fX, then the joint PDF of X(r) and X(s),
denoted by fr,s, is given by
fr,s(x, y) = { n! / [(r − 1)! (s − r − 1)! (n − s)!] } [FX(x)]ʳ⁻¹ fX(x) [FX(y) − FX(x)]ˢ⁻ʳ⁻¹ fX(y) [1 − FX(y)]ⁿ⁻ˢ , for x < y.
Some statistics defined in terms of the order statistics are:
1. Sample Range : R = X(n) − X(1)
2. Sample Median : X̃ = X((n+1)/2) , if n is odd;
X̃ = [X(n/2) + X(n/2 + 1)] / 2 , if n is even.
The distribution of R can be derived (using transformation) from the joint PDF f1,n .
Examples:
1. Suppose we take a r.s. of size n from Bi(m, p). Find the CDF and the PMF of X(r).
2. Let X1, X2, …, Xn be a r.s. from U(0, θ), n ≥ 2. Find the mean and the variance of the r.v.
[(n + 1)/n] X(n) .
Asymptotic theory deals with results that arise for sample size n approaching infinity, or for very
large n. The following asymptotic results will be useful in obtaining approximate (or asymptotic)
sampling distributions of certain statistics.
Theorem: (Chebyshev’s Inequality) Let X be a r.v. with mean μ and finite variance σ² < ∞.
Then, ∀ε > 0,
P(|X − μ| ≥ ε) ≤ σ²/ε² .
Corollary: P(|X − μ| ≥ kσ) ≤ 1/k² , or, equivalently,
P(|X − μ| < kσ) ≥ 1 − 1/k² , or, P(μ − kσ < X < μ + kσ) ≥ 1 − 1/k² .
Special Results
1. For k = 1, P(μ − σ < X < μ + σ) ≥ 0, a trivial bound.
2. For k = 2, P(μ − 2σ < X < μ + 2σ) ≥ 3/4.
3. For k = 3, P(μ − 3σ < X < μ + 3σ) ≥ 8/9.
1. Let X ~ N(0,1). Find the probabilities that X is within 1, 2, and 3 standard deviations from the
mean μ.
2. Let X ~ Bi(n = 10, p = 0.9). Find the probabilities that X is within 1, 2, and 3 standard
deviations from the mean μ.
3. Let X ~ Po(λ = 9). Find the probabilities that X is within 1, 2, and 3 standard deviations from
the mean μ.
4. Let X ~ t(6). Find the probabilities that X is within 1, 2, and 3 standard deviations from the
mean μ.
Theorem: (Weak Law of Large Numbers, WLLN) Let X1, X2, …, Xn be a r.s. from the PDF fX,
with mean μ and finite variance σ² < ∞. Let ε and δ be 2 arbitrary numbers such that
ε > 0 and 0 < δ < 1. Then, there is an integer n such that, for every sample of at least n
observations,
P(|X̄ − μ| < ε) ≥ 1 − δ .
Remarks
1. Equivalently, lim(n→∞) P(|X̄ − μ| ≥ ε) = 0 ∀ε > 0, and we write X̄ →ᴾ μ (X̄ converges in
probability to μ).
2. Explanation of the WLLN: The probability that X̄ will deviate from the true population mean μ
by more than some arbitrarily small nonzero value ε can be made arbitrarily small, by choosing
n sufficiently large. Because of this, the sample mean can be used to estimate μ reliably.
3. If X1, X2, …, Xn is a r.s. from the PDF fX, with mean μ and variance σ² < ∞, we can determine
n ∈ Z⁺ so that the probability that X̄ will differ from μ by less than an arbitrarily small amount
ε can be made as close to 1 as possible. Thus, X̄ can be used to estimate μ with a high degree
of accuracy:
P(|X̄ − μ| < ε) → 1 as n → ∞ , ∀ε > 0 .
4. If n is sufficiently large, |X̄ − μ| is likely to be small, but this does not imply that |X̄ − μ| is
small for all large n.
5. The result does not imply that P(|X̄ − μ| < ε) = 1. It only means that it can be very likely that X̄
is close to μ.
Example: Consider a distribution with unknown mean μ and variance σ² = 1. How large a
sample should be taken so that a probability of at least 0.95 is attained that the sample
mean X̄ will not deviate from the population mean μ by more than 0.4 units?
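A worked sketch using Chebyshev’s Inequality: we need P(|X̄ − μ| < 0.4) ≥ 0.95, i.e., P(|X̄ − μ| ≥ 0.4) ≤ 0.05. Since Var(X̄) = σ²/n = 1/n, Chebyshev’s Inequality gives P(|X̄ − μ| ≥ 0.4) ≤ (1/n)/(0.4)² = 1/(0.16n), and 1/(0.16n) ≤ 0.05 requires n ≥ 125.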
Theorem: (Central Limit Theorem, CLT) Let X1, X2, …, Xn be a r.s. from the PDF fX, with mean
μ and finite variance σ². Let X̄ be the sample mean of the r.s. and define the r.v. Zn as
Zn = (X̄ − E[X̄]) / √Var(X̄) = (X̄ − μ) / (σ/√n) .
Then, Zn →ᵈ N(0,1) as n → ∞.
Remarks
1. Consequently, for large n, X̄ is approximately distributed as N(μ, σ²/n), and Sn is
approximately distributed as N(nμ, nσ²).
2. The CLT result holds for all r.s.’s, regardless of the form of the parent PMF/PDF, for as long
as this distribution has finite variance.
3. Importance of the CLT: In making inferences about population parameter(s), we need the
distribution of certain statistics, e.g., the sample mean X̄. Finding the sampling distributions of
statistics is often mathematically easier if samples are taken from the normal distribution.
However, if the r.s. is not taken from the normal distribution, finding the sampling distribution
of X̄ can become very difficult. The CLT states that, for as long as (1) the parent PMF/PDF of
the r.s. has finite variance, and (2) the sample size is large, the approximate distribution of the
sample mean is a normal distribution.
Examples
1. Consider a distribution with unknown mean μ and variance σ² = 1. How large a sample should
be taken so that a probability of (exactly) 0.95 is attained that the sample mean X̄ will not
deviate from the population mean μ by more than 0.4 units?
2. An electrical firm manufactures light bulbs that have an average length of life equal to 800 hours
and a standard deviation of 40 hours. Find the probability that a random sample of 16 bulbs will
have an average life of less than 775 hours.
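Worked sketches via the CLT: For Example 1, P(|X̄ − μ| < 0.4) = 0.95 requires 0.4/(1/√n) = z₀.₉₇₅ = 1.96, so n = (1.96/0.4)² ≈ 24.01, or about 24 observations, far fewer than the 125 required by Chebyshev’s Inequality. For Example 2, P(X̄ < 775) = Φ[(775 − 800)/(40/√16)] = Φ(−2.5) ≈ 0.0062.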
Remark: The De Moivre–Laplace Theorem uses a normal distribution to approximate the
probabilities under a binomial distribution. However, the approximation is appropriate
only for binomial distributions with (1) very large values of n, and (2) values of p that
are not very close to 0 or 1. When the value of p is very close to 0 or 1, and when the
value of n is very large, the following corollary, which uses the normal distribution to
approximate the Poisson distribution, will be more appropriate.
Examples
1. Toss a pair of dice 600 times. Find the probability that there will be between 90 and 110 tosses
(exclusive) resulting in a total of “7” on the pair of dice.
2. The probability that a patient recovers from a rare blood disease is 0.6. If 100 people are known
to have contracted the disease, what is the probability that less than half of them will survive?
3. A multiple-choice quiz has 200 questions, each with 4 possible answers, only 1 of which is the
correct answer. What is the probability that sheer guesswork yields from 25 to 30 correct
answers for 80 of the 200 problems about which the student has no knowledge?
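A worked sketch for Example 1: the number of tosses resulting in a “7” is Bi(n = 600, p = 1/6), with mean np = 100 and standard deviation √(np(1 − p)) ≈ 9.13. With the continuity correction, P(90 < X < 110) = P(91 ≤ X ≤ 109) ≈ Φ[(109.5 − 100)/9.13] − Φ[(90.5 − 100)/9.13] = Φ(1.04) − Φ(−1.04) ≈ 0.70.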
Corollary: (Normal Approximation to the Poisson Distribution) If X1, X2, …, Xn is a r.s. from Po(λ),
with λ small, the sample sum Sn = ∑ᵢ₌₁ⁿ Xᵢ is approximately (or asymptotically)
distributed as N(nλ, nλ) as n → ∞.
Examples
1. Suppose that, on average, 1 person in every 1000 is alcoholic. Find the probability that a random
sample of 8000 people will yield fewer than 7 alcoholics.
2. The probability that a person dies from a respiratory infection is 0.002. Find the probability that
fewer than 5 of the next 2000 so infected will die.
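A worked sketch for Example 1: the number of alcoholics in the sample is approximately Poisson with mean 8000(0.001) = 8, so by the corollary it is approximately N(8, 8). With the continuity correction, P(X < 7) = P(X ≤ 6) ≈ Φ[(6.5 − 8)/√8] = Φ(−0.53) ≈ 0.298.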