
8.1.0 Introduction
In real life, we work with data that are affected by randomness, and we need to extract information and draw
conclusions from the data. The randomness might come from a variety of sources. Here are two examples of
such situations:

1. Suppose that we would like to predict the outcome of an election. Since we cannot poll the entire
population, we will choose a random sample from the population and ask them who they plan to vote
for. In this experiment, the randomness comes from the sampling. Note also that if our poll is
conducted one month before the election, another source of randomness is that people might change
their opinions during the one month period.
2. In a wireless communication system, a message is transmitted from a transmitter to a receiver.
However, the receiver receives a corrupted version (a noisy version) of the transmitted signal. The
receiver needs to extract the original message from the received noisy version. Here, the randomness
comes from the noise.

Examples like these are abundant. Dealing with such situations is the subject of the field of statistical
inference.

Statistical inference is a collection of methods that deal with drawing conclusions from data that are prone to
random variation.

Clearly, we use our knowledge of probability theory when we work on statistical inference problems.
However, the big addition here is that we need to work with real data. The probability problems that we have
seen in this book so far were clearly defined and the probability models were given to us. For example, you
might have seen a problem like this:

Let X be a normal random variable with mean μ = 100 and variance σ² = 15.
Find the probability that X > 110.

In real life, we might not know the distribution of X , so we need to collect data, and from the data we should
conclude whether X has a normal distribution or not. Now, suppose that we can use the central limit theorem
to argue that X is normally distributed. Even in that case, we need to collect data to be able to estimate μ and σ.

Here is a general setup for a statistical inference problem: There is an unknown quantity that we would like
to estimate. We get some data. From the data, we estimate the desired quantity. There are two major
approaches to this problem:

1. Frequentist (classical) Inference: In this approach, the unknown quantity θ is assumed to be a fixed
quantity. That is, θ is a deterministic (non-random) quantity that is to be estimated by the observed
data. For example, in the polling problem stated above we might consider θ as the percentage of
people who will vote for a certain candidate, call him/her Candidate A. After asking n randomly
chosen voters, we might estimate θ by

$$\hat{\Theta} = \frac{Y}{n},$$

where Y is the number of people (among the randomly chosen voters) who say they will vote for Candidate A. Although θ is assumed to be a non-random quantity, our estimate of θ, which we denote by Θ̂, is a random variable because it depends on our random sample. (A short simulation sketch of this estimator appears right after this list.)
2. Bayesian Inference: In the Bayesian approach the unknown quantity Θ is assumed to be a random
variable, and we assume that we have some initial guess about the distribution of Θ. After observing
the data, we update the distribution of Θ using Bayes' Rule.

As an example, consider the communication system in which the information is transmitted in the form of bits, i.e., 0's and 1's. Let's assume that, in each transmission, the transmitter sends a 1 with probability p, and a 0 with probability 1 − p. Thus, if Θ is the transmitted bit, then Θ ∼ Bernoulli(p). At the receiver, X, a noisy version of Θ, is received. The receiver has to recover Θ from X. Here, to estimate Θ, we use our prior knowledge that Θ ∼ Bernoulli(p).
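As a quick illustration of the frequentist estimator Θ̂ = Y/n described above, here is a minimal MATLAB sketch; the true fraction p = 0.3 and the poll size n = 1000 are made-up values for illustration only:

% Simulate polling n randomly chosen voters when the true (unknown) fraction
% of people supporting Candidate A is p; Y counts the supporters in the sample.
p = 0.3;                   % hypothetical true fraction (unknown in practice)
n = 1000;                  % number of voters polled
Y = sum(rand(n, 1) < p);   % each voter independently supports A with probability p
Theta_hat = Y / n;         % the estimate of p based on this sample

Running it repeatedly gives a different Theta_hat each time, which is exactly the sense in which Θ̂ is a random variable even though θ itself is fixed.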

In summary, you may say that frequentist (classical) inference deals with estimating non-random quantities,
while Bayesian inference deals with estimating random variables. We will discuss frequentist and Bayesian
approaches more in detail in this and the next chapter. Nevertheless, it is important to note that both
approaches are very useful and widely used in practice. In this chapter, we will focus on frequentist methods,
while in the next chapter we will discuss Bayesian methods.

8.1.1 Random Sampling
When collecting data, we often make several observations on a random variable. For example, suppose that
our goal is to investigate the height distribution of people in a well-defined population (e.g., adults between 25 and 50 in a certain country). To do this, we define random variables X1, X2, X3, ..., Xn as follows: We
choose a random sample of size n with replacement from the population and let Xi be the height of the ith
chosen person. More specifically,

1. We choose a person uniformly at random from the population and let X1 be the height of that person.
Here, every person in the population has the same chance of being chosen.
2. To determine the value of X2 , again we choose a person uniformly (and independently from the first
person) at random and let X2 be the height of that person. Again, every person in the population has
the same chance of being chosen.
3. In general, Xi is the height of the ith person that is chosen uniformly and independently from the
population.

You might ask why we do the sampling with replacement. In practice, we often do the sampling without
replacement, that is, we do not allow one person to be chosen twice. However, if the population is large, then
the probability of choosing one person twice is extremely low, and it can be shown that the results obtained
from sampling with replacement are very close to the results obtained using sampling without replacement.
The big advantage of sampling with replacement (the above procedure) is that Xi 's will be independent and
this makes the analysis much simpler.
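To make the procedure concrete, here is a small MATLAB sketch of simple random sampling with replacement from a finite population; the population vector heights and the sample size n are hypothetical:

% Hypothetical population of heights (in cm); each draw below is uniform over
% the population and independent of the other draws (sampling with replacement).
heights = 150 + 40*rand(10000, 1);   % made-up population of 10,000 heights
n = 50;                              % sample size
idx = randi(numel(heights), n, 1);   % n indices chosen uniformly, with replacement
X = heights(idx);                    % X(i) is the height of the i-th chosen person

For sampling without replacement, one could instead index the population with the first n entries of randperm(numel(heights)).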

Now for example, if we would like to estimate the average height in the population, we may define an
estimator as

$$\hat{\Theta} = \frac{X_1 + X_2 + \cdots + X_n}{n}.$$

The random variables X1 , X2 , X3 , . . ., Xn defined above are independent and identically distributed (i.i.d.)
and we refer to them collectively as a (simple) random sample.

The collection of random variables X1 , X2 , X3 , . . ., Xn is said to be a random sample of size n if they are
independent and identically distributed (i.i.d.), i.e.,

1. X1 , X2 , X3 , . . ., Xn are independent random variables, and


2. they have the same distribution, i.e.,

$$F_{X_1}(x) = F_{X_2}(x) = \cdots = F_{X_n}(x), \quad \text{for all } x \in \mathbb{R}.$$

In the above example, the random variable Θ̂ = (X1 + X2 + ⋯ + Xn)/n is called a point estimator for the average height in the population. After performing the above experiment, we will obtain Θ̂ = θ̂. Here, θ̂ is called an estimate of the average height in the population. In general, a point estimator is a function of the random sample, Θ̂ = h(X1, X2, ⋯, Xn), that is used to estimate an unknown quantity.

It is worth noting that there are different methods for sampling from a population. We refer to the above
sampling method as simple random sampling. In general, "sampling is concerned with the selection of a
subset of individuals from within a statistical population to estimate characteristics of the whole population"
[18]. Nevertheless, for the material that we cover in this book simple random sampling is sufficient. Unless
otherwise stated, when we refer to random samples, we assume they are simple random samples.
Some Properties of Random Samples:

Since we will be working with random samples, we would like to review some properties of random samples
in this section. Here, we assume that X1 , X2 , X3 , . . ., Xn are a random sample. Specifically, we assume

1. the Xi's are independent;
2. FX1(x) = FX2(x) = ... = FXn(x) = FX(x);
3. EXi = EX = μ < ∞;
4. 0 < Var(Xi) = Var(X) = σ² < ∞.

Sample Mean:

The sample mean is defined as

$$\overline{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}.$$

Another common notation for the sample mean is Mn. Since the Xi's are assumed to have the CDF FX(x), the sample mean is sometimes denoted by Mn(X) to indicate the distribution of the Xi's.

Properties of the sample mean

1. E[X̄] = μ.
2. Var(X̄) = σ²/n.
3. Weak Law of Large Numbers (WLLN):

$$\lim_{n\to\infty} P\big(|\overline{X} - \mu| \ge \epsilon\big) = 0.$$

4. Central Limit Theorem: The random variable

$$Z_n = \frac{\overline{X} - \mu}{\sigma/\sqrt{n}} = \frac{X_1 + X_2 + \cdots + X_n - n\mu}{\sqrt{n}\,\sigma}$$

converges in distribution to the standard normal random variable as n goes to infinity, that is,

$$\lim_{n\to\infty} P(Z_n \le x) = \Phi(x), \quad \text{for all } x \in \mathbb{R},$$

where Φ(x) is the standard normal CDF.
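The first two properties are easy to check numerically. The following MATLAB sketch estimates E[X̄] and Var(X̄) by repeated simulation; the Uniform(0,1) population (so μ = 1/2 and σ² = 1/12), the sample size, and the number of repetitions are arbitrary illustrative choices:

% Draw many independent samples of size n from Uniform(0,1) and compute the
% sample mean of each; the means should concentrate around mu = 0.5 with
% variance close to sigma^2/n = (1/12)/n.
n = 25;                        % sample size
reps = 100000;                 % number of simulated samples
Xbar = mean(rand(n, reps));    % one sample mean per column
fprintf('mean of Xbar = %.4f (theory: 0.5000)\n', mean(Xbar));
fprintf('var of Xbar  = %.5f (theory: %.5f)\n', var(Xbar), (1/12)/n);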

Order Statistics:

Given a random sample, we might be interested in quantities such as the largest, the smallest, or the middle
value in the sample. Thus, we often order the observed data from the smallest to the largest. We call the
resulting ordered random variables order statistics. More specifically, let X1 , X2 , X3 , . . ., Xn be a random
sample from a continuous distribution with CDF FX (x) . Let us order Xi 's from the smallest to the largest
and denote the resulting sequence of random variables as

X(1) , X(2) , ⋯ , X(n) .

Thus, we have

$$X_{(1)} = \min(X_1, X_2, \cdots, X_n),$$

and

$$X_{(n)} = \max(X_1, X_2, \cdots, X_n).$$

We call X(1) , X(2) , ⋯ , X(n) the order statistics of the random sample X1 , X2 , X3 , . . ., Xn . We are often
interested in the PDFs or CDFs of the X(i) 's. The following theorem provides these functions.
Theorem 8.1

Let X1 , X2 , . . ., Xn be a random sample from a continuous distribution with CDF FX (x) and PDF fX (x) .
Let X(1) , X(2) , ⋯ , X(n) be the order statistics of X1 , X2 , X3 , . . ., Xn . Then the CDF and PDF of X(i) are
given by

$$f_{X_{(i)}}(x) = \frac{n!}{(i-1)!\,(n-i)!}\, f_X(x)\, \big[F_X(x)\big]^{i-1} \big[1 - F_X(x)\big]^{n-i},$$

$$F_{X_{(i)}}(x) = \sum_{k=i}^{n} \binom{n}{k} \big[F_X(x)\big]^{k} \big[1 - F_X(x)\big]^{n-k}.$$

Also, the joint PDF of X(1), X(2), ⋯, X(n) is given by

$$f_{X_{(1)}, \cdots, X_{(n)}}(x_1, x_2, \cdots, x_n) =
\begin{cases}
n!\, f_X(x_1) f_X(x_2) \cdots f_X(x_n) & \text{for } x_1 \le x_2 \le \cdots \le x_n \\
0 & \text{otherwise.}
\end{cases}$$

A method to prove the above theorem is outlined in the End of Chapter Problems section. Let's look at an
example.

Example 8.1

Let X1, X2, X3, X4 be a random sample from the Uniform(0, 1) distribution, and let X(1), X(2), X(3), X(4) be the corresponding order statistics. Find the PDFs of X(1), X(2), and X(4).

Solution
Here, the ranges of the random variables are [0, 1], so the PDFs and CDFs are zero outside of
[0, 1]. We have

fX (x) = 1,  for x ∈ [0, 1],

and

FX (x) = x,  for x ∈ [0, 1].

By Theorem 8.1, we obtain

$$f_{X_{(1)}}(x) = \frac{4!}{(1-1)!\,(4-1)!}\, f_X(x)\, \big[F_X(x)\big]^{1-1}\big[1 - F_X(x)\big]^{4-1} = 4 f_X(x)\big[1 - F_X(x)\big]^{3} = 4(1-x)^3, \quad \text{for } x \in [0, 1],$$

$$f_{X_{(2)}}(x) = \frac{4!}{(2-1)!\,(4-2)!}\, f_X(x)\, \big[F_X(x)\big]^{2-1}\big[1 - F_X(x)\big]^{4-2} = 12 f_X(x) F_X(x)\big[1 - F_X(x)\big]^{2} = 12x(1-x)^2, \quad \text{for } x \in [0, 1],$$

$$f_{X_{(4)}}(x) = \frac{4!}{(4-1)!\,(4-4)!}\, f_X(x)\, \big[F_X(x)\big]^{4-1}\big[1 - F_X(x)\big]^{4-4} = 4 f_X(x)\big[F_X(x)\big]^{3} = 4x^3, \quad \text{for } x \in [0, 1].$$
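As a sanity check on these formulas, one can compare simulated order statistics with the theoretical PDFs. The following MATLAB sketch draws many samples of size 4 from Uniform(0,1) and compares the average smallest and largest values with the means implied by the PDFs above (E[X(1)] = 1/5 and E[X(4)] = 4/5); the number of repetitions is an arbitrary choice:

% Simulate many samples of size 4 from Uniform(0,1), sort each sample, and
% compare the empirical means of X_(1) and X_(4) with the theoretical values.
reps = 100000;
U = sort(rand(4, reps));    % each column is one ordered sample
fprintf('E[X_(1)] approx %.3f (theory 0.200)\n', mean(U(1, :)));
fprintf('E[X_(4)] approx %.3f (theory 0.800)\n', mean(U(4, :)));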

8.2.0 Point Estimation
Here, we assume that θ is an unknown parameter to be estimated. For example, θ might be the expected
value of a random variable, θ = EX . The important assumption here is that θ is a fixed (non-random)
quantity. To estimate θ, we need to collect some data. Specifically, we get a random sample X1, X2, X3, ..., Xn such that the Xi's have the same distribution as X. To estimate θ, we define a point estimator Θ̂ that is a function of the random sample, i.e.,

$$\hat{\Theta} = h(X_1, X_2, \cdots, X_n).$$

For example, if θ = EX , we may choose Θ̂  to be the sample mean

$$\hat{\Theta} = \overline{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}.$$
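In code, a point estimator is just a function applied to the observed sample. Here is a minimal MATLAB sketch; the function handle h and the data vector are hypothetical illustrations:

% A point estimator is a function h of the sample; here h is the sample mean.
h = @(x) mean(x);                % the estimator Theta_hat = h(X_1, ..., X_n)
sample = [2.1, 1.7, 3.0, 2.4];   % hypothetical observed sample
theta_hat = h(sample);           % the resulting estimate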

There are infinitely many possible estimators for θ, so how can we make sure that we have chosen a good
estimator? How do we compare different possible estimators? To do this, we provide a list of some desirable
properties that we would like our estimators to have. Intuitively, we know that a good estimator should be
able to give us values that are "close" to the real value of θ. To make this notion more precise we provide
some definitions.

8.2.1 Evaluating Estimators
We define three main desirable properties for point estimators. The first one is related to the estimator's bias.
The bias of an estimator Θ̂  tells us on average how far Θ̂  is from the real value of θ.

Let Θ̂ = h(X1, X2, ⋯, Xn) be a point estimator for θ. The bias of the point estimator Θ̂ is defined by

$$B(\hat{\Theta}) = E[\hat{\Theta}] - \theta.$$

In general, we would like to have a bias that is close to 0 , indicating that on average, Θ̂  is close to θ. It is
worth noting that B(Θ̂ ) might depend on the actual value of θ. In other words, you might have an estimator
for which B(Θ̂ ) is small for some values of θ and large for some other values of θ. A desirable scenario is
when B(Θ̂) = 0, i.e., E[Θ̂] = θ, for all values of θ. In this case, we say that Θ̂ is an unbiased estimator of θ.

Let Θ̂ = h(X1, X2, ⋯, Xn) be a point estimator for a parameter θ. We say that Θ̂ is an unbiased estimator of θ if

$$B(\hat{\Theta}) = 0, \quad \text{for all possible values of } \theta.$$

Example 8.2

Let X1 , X2 , X3 , . . ., Xn be a random sample. Show that the sample mean

$$\hat{\Theta} = \overline{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

is an unbiased estimator of θ = EXi .

Solution
We have

$$B(\hat{\Theta}) = E[\hat{\Theta}] - \theta = E[\overline{X}\,] - \theta = EX_i - \theta = 0.$$

Note that if an estimator is unbiased, it is not necessarily a good estimator. In the above example, if we choose Θ̂1 = X1, then Θ̂1 is also an unbiased estimator of θ:

$$B(\hat{\Theta}_1) = E[\hat{\Theta}_1] - \theta = EX_1 - \theta = 0.$$

Nevertheless, we suspect that Θ̂1 is probably not as good as the sample mean X̄. Therefore, we need other measures to ensure that an estimator is a "good" estimator. A very common measure is the mean squared error, defined by E[(Θ̂ − θ)²].
The mean squared error (MSE) of a point estimator Θ̂, shown by MSE(Θ̂), is defined as

$$MSE(\hat{\Theta}) = E\big[(\hat{\Theta} - \theta)^2\big].$$

Note that Θ̂  − θ is the error that we make when we estimate θ by Θ̂ . Thus, the MSE is a measure of the
distance between Θ̂  and θ, and a smaller MSE is generally indicative of a better estimator.

Example 8.3

Let X1, X2, X3, ..., Xn be a random sample from a distribution with mean EXi = θ and variance Var(Xi) = σ². Consider the following two estimators for θ:

1. Θ̂1 = X1.
2. Θ̂2 = X̄ = (X1 + X2 + ... + Xn)/n.

Find MSE(Θ̂1) and MSE(Θ̂2), and show that for n > 1 we have

$$MSE(\hat{\Theta}_1) > MSE(\hat{\Theta}_2).$$

Solution
We have

$$MSE(\hat{\Theta}_1) = E\big[(\hat{\Theta}_1 - \theta)^2\big] = E\big[(X_1 - EX_1)^2\big] = \mathrm{Var}(X_1) = \sigma^2.$$

To find MSE(Θ̂2), we can write

$$MSE(\hat{\Theta}_2) = E\big[(\hat{\Theta}_2 - \theta)^2\big] = E\big[(\overline{X} - \theta)^2\big] = \mathrm{Var}(\overline{X} - \theta) + \big(E[\overline{X} - \theta]\big)^2.$$

The last equality results from EY² = Var(Y) + (EY)², where Y = X̄ − θ. Now, note that

$$\mathrm{Var}(\overline{X} - \theta) = \mathrm{Var}(\overline{X})$$

since θ is a constant. Also, E[X̄ − θ] = 0. Thus, we conclude

$$MSE(\hat{\Theta}_2) = \mathrm{Var}(\overline{X}) = \frac{\sigma^2}{n}.$$

Thus, we conclude that for n > 1,

$$MSE(\hat{\Theta}_1) > MSE(\hat{\Theta}_2).$$

From the above example, we conclude that although both Θ̂1 and Θ̂2 are unbiased estimators of the mean, Θ̂2 = X̄ is probably a better estimator since it has a smaller MSE. In general, if Θ̂ is a point estimator for θ, we can write

$$MSE(\hat{\Theta}) = E\big[(\hat{\Theta} - \theta)^2\big] = \mathrm{Var}(\hat{\Theta} - \theta) + \big(E[\hat{\Theta} - \theta]\big)^2 = \mathrm{Var}(\hat{\Theta}) + B(\hat{\Theta})^2.$$

If Θ̂ is a point estimator for θ,

$$MSE(\hat{\Theta}) = \mathrm{Var}(\hat{\Theta}) + B(\hat{\Theta})^2,$$

where B(Θ̂) = E[Θ̂] − θ is the bias of Θ̂.
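A simulation makes the comparison in Example 8.3 concrete. The following MATLAB sketch estimates MSE(Θ̂1) and MSE(Θ̂2) by Monte Carlo; the normal population, θ = 5, σ = 2, n = 10, and the number of repetitions are arbitrary illustrative choices:

% Monte Carlo estimate of the MSE of Theta1 = X_1 and Theta2 = the sample mean,
% for samples of size n from a Normal(theta, sigma^2) population.
theta = 5; sigma = 2; n = 10; reps = 100000;
X = theta + sigma*randn(n, reps);       % each column is one sample
mse1 = mean((X(1, :) - theta).^2);      % should be close to sigma^2
mse2 = mean((mean(X) - theta).^2);      % should be close to sigma^2 / n
fprintf('MSE(Theta1) approx %.3f (theory %.3f)\n', mse1, sigma^2);
fprintf('MSE(Theta2) approx %.3f (theory %.3f)\n', mse2, sigma^2/n);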

The last property that we discuss for point estimators is consistency. Loosely speaking, we say that an
estimator is consistent if as the sample size n gets larger, Θ̂  converges to the real value of θ. More precisely,
we have the following definition:

Let Θ̂ 1 , Θ̂ 2 , ⋯, Θ̂ n , ⋯, be a sequence of point estimators of θ. We say that Θ̂ n is a consistent estimator of
θ, if

$$\lim_{n\to\infty} P\big(|\hat{\Theta}_n - \theta| \ge \epsilon\big) = 0, \quad \text{for all } \epsilon > 0.$$

Example 8.4

Let X1, X2, X3, ..., Xn be a random sample with mean EXi = θ and variance Var(Xi) = σ². Show that Θ̂n = X̄ is a consistent estimator of θ.

Solution
We need to show that

$$\lim_{n\to\infty} P\big(|\overline{X} - \theta| \ge \epsilon\big) = 0, \quad \text{for all } \epsilon > 0.$$

But this is true because of the weak law of large numbers. In particular, we can use Chebyshev's inequality to write

$$P\big(|\overline{X} - \theta| \ge \epsilon\big) \le \frac{\mathrm{Var}(\overline{X})}{\epsilon^2} = \frac{\sigma^2}{n \epsilon^2},$$

which goes to 0 as n → ∞.
We could also show the consistency of Θ̂n = X̄ by looking at the MSE. As we found previously, the MSE of Θ̂n = X̄ is given by

$$MSE(\hat{\Theta}_n) = \frac{\sigma^2}{n}.$$

Thus, MSE(Θ̂n) goes to 0 as n → ∞. From this, we can conclude that Θ̂n = X̄ is a consistent estimator for θ. In fact, we can state the following theorem:

Theorem 8.2

Let Θ̂1, Θ̂2, ⋯ be a sequence of point estimators of θ. If

$$\lim_{n\to\infty} MSE(\hat{\Theta}_n) = 0,$$

then Θ̂ n is a consistent estimator of θ.

Proof
We can write

$$P\big(|\hat{\Theta}_n - \theta| \ge \epsilon\big) = P\big(|\hat{\Theta}_n - \theta|^2 \ge \epsilon^2\big) \le \frac{E\big[(\hat{\Theta}_n - \theta)^2\big]}{\epsilon^2} \qquad \text{(by Markov's inequality)}$$

$$= \frac{MSE(\hat{\Theta}_n)}{\epsilon^2},$$

which goes to 0 as n → ∞ by the assumption.
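Theorem 8.2 can also be seen numerically: as n grows, a Monte Carlo estimate of MSE(Θ̂n) for the sample mean shrinks like σ²/n. Here is a minimal MATLAB sketch; the Uniform(0,1) population (θ = 0.5, σ² = 1/12), the sample sizes, and the number of repetitions are arbitrary illustrative choices:

% Estimate the MSE of the sample mean for several sample sizes; the values
% should decrease roughly like (1/12)/n, illustrating consistency.
theta = 0.5; reps = 10000;
for n = [10 100 1000]
    Xbar = mean(rand(n, reps));    % one sample mean per column
    fprintf('n = %4d: MSE approx %.6f (theory %.6f)\n', ...
            n, mean((Xbar - theta).^2), (1/12)/n);
end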

8.2.2 Point Estimators for Mean and Variance
The above discussion suggests that the sample mean, X̄, is often a reasonable point estimator for the mean. Now, suppose that we would like to estimate the variance of a distribution, σ². Assuming 0 < σ² < ∞, by definition

$$\sigma^2 = E\big[(X - \mu)^2\big].$$

Thus, the variance itself is the mean of the random variable Y = (X − μ)². This suggests the following estimator for the variance:

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{k=1}^{n} (X_k - \mu)^2.$$

By linearity of expectation, σ̂² is an unbiased estimator of σ². Also, by the weak law of large numbers, σ̂² is a consistent estimator of σ². However, in practice we often do not know the value of μ. Thus, we may replace μ by our estimate of it, the sample mean, to obtain the following estimator for σ²:

$$\overline{S}^2 = \frac{1}{n}\sum_{k=1}^{n} (X_k - \overline{X})^2.$$

Using a little algebra, you can show that

$$\overline{S}^2 = \frac{1}{n}\left(\sum_{k=1}^{n} X_k^2 - n\overline{X}^2\right).$$

Example 8.5

Let X1, X2, X3, ..., Xn be a random sample with mean EXi = μ and variance Var(Xi) = σ². Suppose that we use

$$\overline{S}^2 = \frac{1}{n}\sum_{k=1}^{n} (X_k - \overline{X})^2 = \frac{1}{n}\left(\sum_{k=1}^{n} X_k^2 - n\overline{X}^2\right)$$

to estimate σ². Find the bias of this estimator,

$$B(\overline{S}^2) = E[\overline{S}^2] - \sigma^2.$$

Solution
First note that

$$E[\overline{X}^2] = \big(E[\overline{X}\,]\big)^2 + \mathrm{Var}(\overline{X}) = \mu^2 + \frac{\sigma^2}{n}.$$

Thus,

$$E[\overline{S}^2] = \frac{1}{n}\left(\sum_{k=1}^{n} E[X_k^2] - nE[\overline{X}^2]\right) = \frac{1}{n}\left(n(\mu^2 + \sigma^2) - n\left(\mu^2 + \frac{\sigma^2}{n}\right)\right) = \frac{n-1}{n}\,\sigma^2.$$

Therefore,

$$B(\overline{S}^2) = E[\overline{S}^2] - \sigma^2 = -\frac{\sigma^2}{n}.$$

We conclude that S̄² is a biased estimator of the variance. Nevertheless, note that if n is relatively large, the bias is very small. Since E[S̄²] = (n − 1)σ²/n, we can obtain an unbiased estimator of σ² by multiplying S̄² by n/(n − 1). Thus, we define

$$S^2 = \frac{1}{n-1}\sum_{k=1}^{n} (X_k - \overline{X})^2 = \frac{1}{n-1}\left(\sum_{k=1}^{n} X_k^2 - n\overline{X}^2\right).$$

By the above discussion, S² is an unbiased estimator of the variance. We call it the sample variance. We should note that if n is large, the difference between S² and S̄² is very small. We also define the sample standard deviation as

$$S = \sqrt{S^2}.$$

Although the sample standard deviation is usually used as an estimator for the standard deviation, it is a
biased estimator. To see this, note that S is random, so Var(S) > 0. Thus,
$$0 < \mathrm{Var}(S) = E[S^2] - (E[S])^2 = \sigma^2 - (E[S])^2.$$

Therefore, E[S] < σ, which means that S is a biased estimator of σ.


Let X1, X2, X3, ..., Xn be a random sample with mean EXi = μ < ∞ and variance 0 < Var(Xi) = σ² < ∞. The sample variance of this random sample is defined as

$$S^2 = \frac{1}{n-1}\sum_{k=1}^{n} (X_k - \overline{X})^2 = \frac{1}{n-1}\left(\sum_{k=1}^{n} X_k^2 - n\overline{X}^2\right).$$

The sample variance is an unbiased estimator of σ². The sample standard deviation is defined as

$$S = \sqrt{S^2},$$

and is commonly used as an estimator for σ. Nevertheless, S is a biased estimator of σ.

You can use the mean command in MATLAB to compute the sample mean for a given sample. More specifically, for a given vector x = [x1, x2, ⋯, xn], mean(x) returns the sample average

$$\frac{x_1 + x_2 + \cdots + x_n}{n}.$$

Also, the functions var and std can be used to compute the sample variance and the sample standard deviation, respectively.
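Note that var(x) uses the unbiased 1/(n − 1) normalization (the sample variance S² above), while var(x, 1) uses the 1/n normalization (the biased estimator S̄²); std behaves analogously. A short sketch with a hypothetical data vector:

x = [2.3, 1.9, 3.1, 2.7, 2.0];   % hypothetical sample
S2     = var(x);      % unbiased sample variance, 1/(n-1) normalization
S2_bar = var(x, 1);   % biased estimator S-bar squared, 1/n normalization
S      = std(x);      % sample standard deviation, sqrt of var(x)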

Example 8.6

Let T be the time that is needed for a specific task in a factory to be completed. In order to estimate the mean
and variance of T , we observe a random sample T1 ,T2 ,⋯,T6 . Thus, Ti 's are i.i.d. and have the same
distribution as T . We obtain the following values (in minutes):

18, 21, 17, 16, 24, 20.

Find the values of the sample mean, the sample variance, and the sample standard deviation for the observed
sample.

Solution
The sample mean is

$$\overline{T} = \frac{T_1 + T_2 + T_3 + T_4 + T_5 + T_6}{6} = \frac{18 + 21 + 17 + 16 + 24 + 20}{6} = 19.33.$$

The sample variance is given by

$$S^2 = \frac{1}{6-1}\sum_{k=1}^{6} (T_k - \overline{T})^2 = 8.67.$$

Finally, the sample standard deviation is given by

$$S = \sqrt{S^2} = 2.94.$$

You can use the following MATLAB code to compute the above values:
t = [18, 21, 17, 16, 24, 20];   % observed sample (minutes)
m = mean(t);                    % sample mean
v = var(t);                     % sample variance (1/(n-1) normalization)
s = std(t);                     % sample standard deviation

8.2.5 Solved Problems
Problem 1

Let X be the height of a randomly chosen individual from a population. In order to estimate the mean and
variance of X , we observe a random sample X1 ,X2 ,⋯,X7 . Thus, Xi 's are i.i.d. and have the same
distribution as X . We obtain the following values (in centimeters):

166.8, 171.4, 169.1, 178.5, 168.0, 157.9, 170.1

Find the values of the sample mean, the sample variance, and the sample standard deviation for the observed
sample.

Solution

The sample mean is

$$\overline{X} = \frac{X_1 + X_2 + X_3 + X_4 + X_5 + X_6 + X_7}{7} = \frac{166.8 + 171.4 + 169.1 + 178.5 + 168.0 + 157.9 + 170.1}{7} = 168.8.$$

The sample variance is given by

$$S^2 = \frac{1}{7-1}\sum_{k=1}^{7} (X_k - 168.8)^2 = 37.7.$$

Finally, the sample standard deviation is given by

$$S = \sqrt{S^2} = 6.1.$$

The following MATLAB code can be used to obtain these values:


x = [166.8, 171.4, 169.1, 178.5, 168.0, 157.9, 170.1];   % observed sample (cm)
m = mean(x);   % sample mean
v = var(x);    % sample variance (1/(n-1) normalization)
s = std(x);    % sample standard deviation

Problem 2

Prove the following:

a. If Θ̂1 is an unbiased estimator for θ, and W is a zero-mean random variable, then

$$\hat{\Theta}_2 = \hat{\Theta}_1 + W$$

is also an unbiased estimator for θ.

b. If Θ̂1 is an estimator for θ such that E[Θ̂1] = aθ + b, where a ≠ 0, show that

$$\hat{\Theta}_2 = \frac{\hat{\Theta}_1 - b}{a}$$

is an unbiased estimator for θ.

Solution
a. We have

$$E[\hat{\Theta}_2] = E[\hat{\Theta}_1] + E[W] \qquad \text{(by linearity of expectation)}$$
$$= \theta + 0 \qquad \text{(since } \hat{\Theta}_1 \text{ is unbiased and } EW = 0\text{)}$$
$$= \theta.$$

Thus, Θ̂2 is an unbiased estimator for θ.

b. We have

$$E[\hat{\Theta}_2] = \frac{E[\hat{\Theta}_1] - b}{a} \qquad \text{(by linearity of expectation)}$$
$$= \frac{a\theta + b - b}{a} = \theta.$$

Thus, Θ̂2 is an unbiased estimator for θ.

Problem 3

Let X1, X2, X3, ..., Xn be a random sample from a Uniform(0, θ) distribution, where θ is unknown. Define the estimator

$$\hat{\Theta}_n = \max\{X_1, X_2, \cdots, X_n\}.$$

a. Find the bias of Θ̂n, B(Θ̂n).
b. Find the MSE of Θ̂n, MSE(Θ̂n).
c. Is Θ̂n a consistent estimator of θ?

Solution
If X ∼ Uniform(0, θ), then the PDF and CDF of X are given by

$$f_X(x) = \begin{cases} \frac{1}{\theta} & 0 \le x \le \theta \\ 0 & \text{otherwise,} \end{cases}$$

and

$$F_X(x) = \begin{cases} 0 & x < 0 \\ \frac{x}{\theta} & 0 \le x \le \theta \\ 1 & x > \theta. \end{cases}$$

By Theorem 8.1, the PDF of Θ̂n is given by

$$f_{\hat{\Theta}_n}(y) = n f_X(y) \big[F_X(y)\big]^{n-1} = \begin{cases} \frac{n y^{n-1}}{\theta^n} & 0 \le y \le \theta \\ 0 & \text{otherwise.} \end{cases}$$

a. To find the bias of Θ̂n, we have

$$E[\hat{\Theta}_n] = \int_0^{\theta} y \cdot \frac{n y^{n-1}}{\theta^n}\, dy = \frac{n}{n+1}\,\theta.$$

Thus, the bias is given by

$$B(\hat{\Theta}_n) = E[\hat{\Theta}_n] - \theta = \frac{n}{n+1}\,\theta - \theta = -\frac{\theta}{n+1}.$$

b. To find MSE(Θ̂n), we can write

$$MSE(\hat{\Theta}_n) = \mathrm{Var}(\hat{\Theta}_n) + B(\hat{\Theta}_n)^2 = \mathrm{Var}(\hat{\Theta}_n) + \frac{\theta^2}{(n+1)^2}.$$

Thus, we need to find Var(Θ̂n). We have

$$E\big[\hat{\Theta}_n^2\big] = \int_0^{\theta} y^2 \cdot \frac{n y^{n-1}}{\theta^n}\, dy = \frac{n}{n+2}\,\theta^2.$$

Thus,

$$\mathrm{Var}(\hat{\Theta}_n) = E\big[\hat{\Theta}_n^2\big] - \big(E[\hat{\Theta}_n]\big)^2 = \frac{n}{(n+2)(n+1)^2}\,\theta^2.$$

Therefore,

$$MSE(\hat{\Theta}_n) = \frac{n}{(n+2)(n+1)^2}\,\theta^2 + \frac{\theta^2}{(n+1)^2} = \frac{2\theta^2}{(n+2)(n+1)}.$$

c. Since MSE(Θ̂n) → 0 as n → ∞, Theorem 8.2 implies that Θ̂n is a consistent estimator of θ.
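These formulas can be checked by simulation. Here is a minimal MATLAB sketch; θ = 2, n = 5, and the number of repetitions are arbitrary illustrative choices:

% Simulate the estimator Theta_n = max(X_1, ..., X_n) for Uniform(0, theta)
% samples and compare its mean and MSE with the formulas derived above.
theta = 2; n = 5; reps = 100000;
Tn = max(theta*rand(n, reps));    % one estimate per column (column-wise max)
fprintf('E[Theta_n]   approx %.4f (theory %.4f)\n', mean(Tn), n/(n+1)*theta);
fprintf('MSE(Theta_n) approx %.4f (theory %.4f)\n', ...
        mean((Tn - theta).^2), 2*theta^2/((n+1)*(n+2)));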
