
Estimation

Tieming Ji
Fall 2012
Statisticians have two central tasks: estimation and testing. Within estimation, we may report a point estimate or an interval estimate, and there are several ways to construct each. In this chapter, we are going to learn a few of the simplest but most fundamental methods for constructing point estimators and interval estimators, as well as the pros and cons of these approaches.
Definition: An estimator is a random variable; it is a statistic that generates the estimate. An estimate is a number.

Example: Suppose the random variable $X \sim N(\mu, \sigma^2)$. In order to estimate the unknown parameter $\mu$, we take a random sample of size $n$: $X_1, X_2, \ldots, X_n \overset{i.i.d.}{\sim} N(\mu, \sigma^2)$, and $x_1, x_2, \ldots, x_n$ are the observations. Then,

the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is a random variable, and it is an estimator for $\mu$;

$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ is a number, and it is an estimate for $\mu$.
Definition: An estimator $\hat{\theta}$ is an unbiased estimator for a parameter $\theta$ if and only if $E(\hat{\theta}) = \theta$.

Theorem: Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a distribution with mean $\mu$ and variance $\sigma^2$. Then,

the sample mean $\bar{X}$ is an unbiased estimator for $\mu$;

the sample variance $S^2$ is an unbiased estimator for $\sigma^2$.

Definition: An estimator $\hat{\theta}$ is a consistent estimator for a parameter $\theta$ if and only if $\hat{\theta} \to \theta$ (in probability) as $n \to \infty$.
More Exploration of the Estimator $\bar{X}$

Theorem: Let $\bar{X}$ be the sample mean based on a random sample of size $n$ from a distribution with mean $\mu$ and variance $\sigma^2$. Then

$E(\bar{X}) = \mu$ (unbiasedness);

$Var(\bar{X}) = \frac{\sigma^2}{n}$;

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$.

Proof:

$E(\bar{X}) = E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}E\!\left(\sum_{i=1}^{n} X_i\right) = \frac{1}{n}(EX_1 + \cdots + EX_n) = \frac{1}{n}\, n\mu = \mu.$

$Var(\bar{X}) = Var\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}Var\!\left(\sum_{i=1}^{n} X_i\right) \overset{iid}{=} \frac{1}{n^2}(Var\,X_1 + \cdots + Var\,X_n) = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}.$
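A short simulation can make these identities concrete. The sketch below is our own illustration (the parameter values, sample size, and replication count are arbitrary choices): it draws many samples of size $n$ and checks the mean and variance of the resulting sample means.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 30, 100_000  # arbitrary illustrative values

# Draw `reps` samples of size n; compute the sample mean of each one.
xbars = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

print(xbars.mean())  # close to mu = 5.0
print(xbars.var())   # close to sigma^2 / n = 4/30 ≈ 0.1333
```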
Theorem: Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a normal distribution with mean $\mu$ and variance $\sigma^2$. Then

$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right).$
Point Estimation

Two methods for deriving point estimators for distribution parameters:

Method of Moments Estimators (MME) (due to Karl Pearson).

Maximum Likelihood Estimators (MLE) (due to R. A. Fisher).
Point Estimation: MME

Idea: Match the sample moments to the population moments.

The $k$th population moment: $E(X^k)$; it contains the unknown parameter(s).

The $k$th sample moment: $M_k = \frac{1}{n}\sum_{i=1}^{n} X_i^k$; given an observed sample, it is a number.

Set $E(X^k) = M_k$ and solve for the unknown parameter(s).
Example 1: A forester plants 5 rows of 20 pine seedlings each. We use the r.v. $X$ to denote the number of seedlings that survive in each row. Suppose the probability that each seedling survives is $p$; then $X \sim Binomial(20, p)$. For the 5 rows, we observe $x_1 = 18$, $x_2 = 17$, $x_3 = 15$, $x_4 = 19$, and $x_5 = 20$. Estimate the parameter $p$.

Solution: Use the method of moments.

The 1st sample moment is $\bar{x} = \frac{1}{5}(x_1 + x_2 + \cdots + x_5) = 17.8$.

The 1st population moment is $EX = np = 20p$, since $X$ is Binomial. Then, by matching the sample moment with the population moment, we have $20\hat{p} = 17.8$. Thus, $\hat{p} = 0.89$.
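The arithmetic is easy to script; a minimal sketch in Python (the variable names are our own):

```python
import numpy as np

x = np.array([18, 17, 15, 19, 20])  # surviving seedlings per row
m = 20                              # seedlings planted per row

p_hat = x.mean() / m                # match E(X) = m*p to the 1st sample moment
print(p_hat)                        # 0.89
```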
Example 2: Let $X_1, X_2, \ldots, X_n$ be a random sample from a gamma distribution with parameters $\alpha$ and $\beta$. Use the method of moments to estimate these two unknown parameters.

Solution:

Now there are 2 unknown parameters, so we need to use at least two moments. We choose the 1st and 2nd moments.

We know that if $X \sim Gamma(\alpha, \beta)$, then $E(X) = \alpha\beta$ and $Var(X) = E(X^2) - (EX)^2 = \alpha\beta^2$. By matching the sample moments with the population moments, we have

$M_1 = \alpha\beta,$

$M_2 - M_1^2 = \alpha\beta^2.$

Thus, $\hat{\beta} = (M_2 - M_1^2)/M_1$, and $\hat{\alpha} = M_1^2/(M_2 - M_1^2)$.
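A sketch of these two estimators on simulated data (the true shape and scale values below are arbitrary; numpy's gamma generator uses the same shape/scale parameterization, so $E(X) = \alpha\beta$):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha_true, beta_true = 3.0, 2.0               # arbitrary shape and scale
x = rng.gamma(alpha_true, beta_true, size=500)  # simulated gamma sample

m1 = x.mean()        # 1st sample moment
m2 = (x**2).mean()   # 2nd sample moment

beta_hat = (m2 - m1**2) / m1
alpha_hat = m1**2 / (m2 - m1**2)
print(alpha_hat, beta_hat)  # should land near 3.0 and 2.0
```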
Point Estimation: MLE

Idea: $f_X(x; \theta)$ is the density function for $X$ with unknown parameter $\theta$. The parameter space for $\theta$ is $\Omega$.

With observations $x_1, x_2, \ldots, x_n$, we want to find the value of $\theta$ such that the probability (or the likelihood) of generating such a sample is maximized.
Example 3: Suppose we flip a coin (not necessarily fair) 3 times independently. A head is a success with probability $p$, and a tail is a failure with probability $(1-p)$. We use the r.v. $X$ to denote one flip, so $X \sim Bernoulli(p)$. Suppose the three observations in the sample are $x_1 = 1$, $x_2 = 0$, and $x_3 = 0$. Estimate $p$.

MLE Idea (Very Important):

Pick $p \in [0, 1]$ such that the joint probability density $f(x_1, x_2, \ldots, x_n)$ is maximized with parameter $p$.
Solution:

The probability of seeing the observations is:

$P(X_1 = 1, X_2 = 0, X_3 = 0) \overset{iid}{=} P(X_1 = 1)P(X_2 = 0)P(X_3 = 0) = p(1-p)(1-p) = p(1-p)^2.$

We want to find a $p \in [0, 1]$ such that this probability is maximized. This is the $p$ that makes the observations most plausible (i.e., gives them the highest likelihood). Now the question is how to find such a $p$.
We write $L(p) = f(x_1, x_2, \ldots, x_n \mid p)$, and call this the likelihood function.

So, we want to find a $p$ such that $L(p)$ is maximized, with $p$ in $[0, 1]$. Since $\ln(\cdot)$ (the natural logarithm) is an increasing function, this is equivalent to finding the $p$ that maximizes $\ln L(p)$.

Theorem: Assume $\theta \in \Omega$, and $L(\theta)$ is twice differentiable on $\Omega$. Then $\hat{\theta}$ is an MLE for $\theta$ if

$\left.\frac{d\ln L(\theta)}{d\theta}\right|_{\theta=\hat{\theta}} = 0$ and $\left.\frac{d^2\ln L(\theta)}{d\theta^2}\right|_{\theta=\hat{\theta}} < 0.$
(Continuing with Example 3.)

The likelihood function is

$L(p) = f(x_1 = 1, x_2 = 0, x_3 = 0 \mid p) = p(1-p)^2.$

The log-likelihood function is

$\ln L(p) = \ln p + 2\ln(1-p).$

The 1st derivative of the log-likelihood function is

$\frac{d\ln L(p)}{dp} = \frac{1}{p} - \frac{2}{1-p}.$

Set the 1st derivative to 0 and solve for $p$:

$\frac{1}{p} - \frac{2}{1-p} \overset{set}{=} 0 \implies \hat{p} = \frac{1}{3}.$
(Continuing with Example 3.)

Check the 2nd derivative to see if it is negative at $p = \frac{1}{3}$:

$\left.\frac{d^2\ln L(p)}{dp^2}\right|_{p=\frac{1}{3}} = \left.\frac{d}{dp}\!\left(\frac{1}{p} - \frac{2}{1-p}\right)\right|_{p=\frac{1}{3}} = \left.\left(-\frac{1}{p^2} - \frac{2}{(1-p)^2}\right)\right|_{p=\frac{1}{3}} < 0.$

Thus, $\hat{p} = \frac{1}{3}$ is the maximum likelihood estimate for $p$.
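The closed-form answer can be double-checked numerically; a minimal sketch using scipy (the function name and bounds are our own choices):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_lik(p):
    # Negative log-likelihood for observations 1, 0, 0 from Bernoulli(p).
    return -(np.log(p) + 2 * np.log(1 - p))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x)  # ≈ 0.3333, matching the closed-form MLE of 1/3
```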
Example 4: Let $x_1, x_2, \ldots, x_n$ be a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$. The density for $X$ is

$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}.$

Find the MLE of $\mu$ and $\sigma^2$.

The likelihood function for the sample is

$L(\mu, \sigma^2) = f(x_1, x_2, \ldots, x_n \mid \mu, \sigma^2) \overset{iid}{=} \prod_{i=1}^{n} f(x_i \mid \mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}} = \left(2\pi\sigma^2\right)^{-\frac{n}{2}} e^{-\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{2\sigma^2}}.$
The logarithm of the likelihood function is

$\ln L(\mu, \sigma^2) = -n\ln\sqrt{2\pi} - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2.$

To maximize this function, we take the 1st partial derivatives with respect to $\mu$ and $\sigma^2$, respectively, and set these derivatives equal to 0 to solve for $\mu$ and $\sigma^2$:

$\frac{\partial \ln L(\mu, \sigma^2)}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) \overset{set}{=} 0;$

$\frac{\partial \ln L(\mu, \sigma^2)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}\sum_{i=1}^{n}(x_i - \mu)^2 \overset{set}{=} 0.$

This gives us:

$\sum_{i=1}^{n} x_i - n\mu = 0; \qquad -n\sigma^2 + \sum_{i=1}^{n}(x_i - \mu)^2 = 0.$

Hence

$\hat{\mu} = \bar{x}; \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2.$
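In code, these closed forms are just the sample mean and the divide-by-$n$ variance; a sketch on simulated data (the parameter values are arbitrary). Note that the MLE $\hat{\sigma}^2$ divides by $n$, so it differs from the unbiased sample variance $S^2$, which divides by $n-1$.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(10.0, 3.0, size=200)  # simulated N(mu=10, sigma^2=9) sample

mu_hat = x.mean()
sigma2_hat = x.var(ddof=0)  # the MLE divides by n (ddof=0), not n-1
print(mu_hat, sigma2_hat)   # near 10 and 9
```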
Check the 2nd partial derivatives to see if they are negative at $\mu = \hat{\mu}$ and $\sigma^2 = \hat{\sigma}^2$:

$\left.\frac{\partial^2 \ln L(\mu, \sigma^2)}{\partial \mu^2}\right|_{\mu=\hat{\mu}} = -\frac{n}{\hat{\sigma}^2} < 0;$

$\left.\frac{\partial^2 \ln L(\mu, \sigma^2)}{\partial (\sigma^2)^2}\right|_{\mu=\hat{\mu},\,\sigma^2=\hat{\sigma}^2} = \left.\left(\frac{n}{2(\sigma^2)^2} - \frac{1}{(\sigma^2)^3}\sum_{i=1}^{n}(x_i - \mu)^2\right)\right|_{\mu=\hat{\mu},\,\sigma^2=\hat{\sigma}^2} = -\frac{n}{2(\hat{\sigma}^2)^2} < 0.$

Thus, given that the random variable $X$ follows a normal distribution, and also given a random (iid) sample of size $n$, the maximum likelihood estimators for $\mu$ and $\sigma^2$ are $\hat{\mu}$ and $\hat{\sigma}^2$, respectively.
Interval Estimation

We have learned two fundamental approaches to constructing point estimators: Method of Moments Estimators (MME) and Maximum Likelihood Estimators (MLE).

Often we want to give an interval estimate rather than a point estimate, as with tomorrow's forecast temperature range or the range of body temperatures considered normal.

Next, we are going to learn one approach to constructing an interval estimate.
The simplest and most fundamental method for constructing an interval estimator is based on the Central Limit Theorem (CLT).

Theorem: (Central Limit Theorem) Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a distribution with mean $\mu$ and variance $\sigma^2$. Then for large $n$,

$\bar{X}$ is approximately $N(\mu, \sigma^2/n)$;

$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is approximately $N(0, 1)$.

The Central Limit Theorem tells us that as long as the sample size $n$ is large enough (often $n \ge 20$ or 30), we can treat $\bar{X}$ as approximately normal in applications, no matter what the distribution of $X$ is.
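A quick simulation sketch makes this concrete: sample means from a strongly skewed distribution still behave like a normal (the distribution, sample size, and replication count below are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, reps = 30, 50_000

# Exponential(1) is strongly skewed, with mean 1 and variance 1.
xbars = rng.exponential(1.0, size=(reps, n)).mean(axis=1)

# Standardize and compare a tail probability against N(0, 1).
z = (xbars - 1.0) / (1.0 / np.sqrt(n))
print((z > 1.96).mean())    # close to the normal benchmark below
print(stats.norm.sf(1.96))  # ≈ 0.0250
```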
Definition: (Confidence Interval) Let $\theta$ be an unknown parameter of interest. A $(1-\alpha)$ confidence interval for $\theta$ is an interval $[L_1, L_2]$ such that

$P(L_1 \le \theta \le L_2) \ge 1 - \alpha.$

Next, we will show examples of using the Central Limit Theorem to construct confidence intervals for unknown parameters with confidence level $(1-\alpha)$.
Example 5: Acute myeloblastic leukemia is among the most deadly of cancers. Past experience indicates that the time in months that a patient survives after initial diagnosis of the disease is distributed with a mean of 13 months and a standard deviation of 3 months. A new treatment is being investigated which should prolong the average survival time without affecting variability. Let $X_1, X_2, \ldots, X_n$ denote a random sample (assume $n$ is large) from the distribution of $X$, the survival time under the new treatment. We are assuming that $X$ is distributed with $\sigma^2 = 9$ and $\mu$ unknown. We want to find statistics $L_1$ and $L_2$ so that $P(L_1 \le \mu \le L_2) = 0.95$.
Solution:

Based on the z-table, we have

$P(-1.96 \le Z \le 1.96) = 0.95.$

Since $n$ is large, based on the Central Limit Theorem, we have

$P\!\left(-1.96 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95.$

Thus, for the unknown parameter $\mu$, we have

$P(\bar{X} - 1.96\,\sigma/\sqrt{n} \le \mu \le \bar{X} + 1.96\,\sigma/\sqrt{n}) = 0.95.$

Thus, $L_1 = \bar{X} - 1.96\,\sigma/\sqrt{n}$ and $L_2 = \bar{X} + 1.96\,\sigma/\sqrt{n}$.

Plugging in $\sigma^2 = 9$ (so $\sigma = 3$), we have

$L_1 = \bar{X} - 5.88/\sqrt{n}$ and $L_2 = \bar{X} + 5.88/\sqrt{n}.$
Suppose we have the following observations on $X$, the survival time (months) under the new treatment:

8.0 13.6 13.2 13.6 14.0
12.5 14.2 14.9 14.5 13.5
13.4 8.6 11.5 16.0 13.6
14.2 19.0 17.9 17.0 14.4

Based on the data, $\bar{x} = 13.88$ and $n = 20$. Thus,

$L_1 = \bar{x} - 5.88/\sqrt{n} = 13.88 - 5.88/\sqrt{20} \approx 12.57$, and

$L_2 = \bar{x} + 5.88/\sqrt{n} = 13.88 + 5.88/\sqrt{20} \approx 15.19.$

Thus, under the new treatment, we are 95% confident that the interval [12.57, 15.19] covers the true mean survival time: intervals constructed this way contain $\mu$ 95% of the time.
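The same computation in Python, as a sketch (the data are copied from the table above; 1.96 is the 95% z-value):

```python
import numpy as np

x = np.array([ 8.0, 13.6, 13.2, 13.6, 14.0,
              12.5, 14.2, 14.9, 14.5, 13.5,
              13.4,  8.6, 11.5, 16.0, 13.6,
              14.2, 19.0, 17.9, 17.0, 14.4])
sigma, z = 3.0, 1.96  # known sd; z-value for 95% confidence

half = z * sigma / np.sqrt(len(x))
print(x.mean() - half, x.mean() + half)  # ≈ 12.57, 15.19
```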
Based on the previous example, we can state the following theorem.

Theorem: Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a distribution for variable $X$ with mean $\mu$ and variance $\sigma^2$. If (1) $X$ is normally distributed, or (2) $X$ is not known to be normally distributed but $n$ is large enough, then the $(1-\alpha)$ confidence interval estimate for the mean $\mu$ is given by

$\bar{X} \pm z_{\alpha/2}\,\sigma/\sqrt{n},$

where $z_{\alpha/2}$ satisfies $P(Z \le z_{\alpha/2}) = 1 - \frac{\alpha}{2}$; that is, the upper $\alpha/2$ critical value $z_{\alpha/2}$ equals the $(1-\alpha/2)$ quantile $z_{1-\alpha/2}$.

Note: (1) When $X \sim N(\mu, \sigma^2)$, $\bar{X} \sim N(\mu, \sigma^2/n)$ exactly. (2) When $X$ is not known to be normal but $n$ is large, use the CLT.
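Both cases use the same arithmetic, so a small helper covers either; a sketch (the function name and signature are our own):

```python
import numpy as np
from scipy import stats

def z_interval(xbar, sigma, n, conf=0.95):
    """CI for the mean with known sigma: xbar ± z_{alpha/2} * sigma / sqrt(n)."""
    z = stats.norm.ppf(1 - (1 - conf) / 2)  # upper alpha/2 critical value
    half = z * sigma / np.sqrt(n)
    return xbar - half, xbar + half

print(z_interval(13.88, 3.0, 20))  # ≈ (12.57, 15.19), matching Example 5
```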
Example: The late manifestation of an injury following exposure to a sufficient dose of radiation is common. A random sample of size $n = 40$ is obtained on the variable $X$, the time in days that elapses between the exposure to radiation and the appearance of peak skin redness. We have $\bar{x} = 13.85$ days and $\sigma = 4$. Find the 95% confidence interval on the mean time to the appearance of peak redness. Would you be surprised to hear a claim that $\mu = 17$ days? Explain based on the confidence interval.

Solution: The bounds for the C.I. are

$\bar{x} \pm z_{0.05/2}\,\sigma/\sqrt{n} = 13.85 \pm 1.96\,(4/\sqrt{40}).$

We are 95% confident that the interval [12.6, 15.1] contains $\mu$. I would be surprised to hear a claim that $\mu = 17$ days, since only 2.5% of the probability lies above the upper bound of 15.1 days.
Example 6: The city driving mileage of a particular car is approximately normally distributed with a mean of 27 and a standard deviation of 3. What is the probability that a sample of 9 cars will average between 26 and 28 miles per gallon? (Distinguish the intervals for $\bar{X}$ and $\mu$.)

Solution: This question gives $L_1$ and $L_2$, and asks us to compute the confidence level $(1-\alpha)$. Note that $X$ is given to be normal; thus, even though $n = 9$ is small, we have $\bar{X} \sim N(\mu, \sigma^2/n)$. We want to compute

$P(26 \le \bar{X} \le 28) = P\!\left(\frac{26 - \mu}{\sigma/\sqrt{n}} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \frac{28 - \mu}{\sigma/\sqrt{n}}\right)$

$= P\!\left(\frac{26 - 27}{3/\sqrt{9}} \le Z \le \frac{28 - 27}{3/\sqrt{9}}\right)$

$= P(-1 \le Z \le 1) = P(Z \le 1) - P(Z \le -1)$

$= 0.8413 - 0.1587 = 0.68.$

Thus, the probability is about 0.68 that a sample of size 9 averages between 26 and 28 miles per gallon.
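The same number comes straight from the normal CDF; a sketch with scipy:

```python
import numpy as np
from scipy import stats

mu, sigma, n = 27.0, 3.0, 9
se = sigma / np.sqrt(n)  # sd of the sample mean: 3/sqrt(9) = 1

prob = stats.norm.cdf(28, mu, se) - stats.norm.cdf(26, mu, se)
print(prob)  # ≈ 0.6827
```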
Chapter Summary

The random variable $\bar{X} \sim N(\mu, \frac{\sigma^2}{n})$ if $X \sim N(\mu, \sigma^2)$.

The random variable $\bar{X}$ is approximately $N(\mu, \frac{\sigma^2}{n})$ if $n$ is large and $X$ has mean $\mu$ and variance $\sigma^2$.

Use the MME and MLE methods to give point estimators for the unknown parameters of a distribution.

Give a $(1-\alpha)$ confidence interval for $\mu$ ($\sigma^2$ known) when $X$ is known to be normally distributed, and when $X$ is not known to be normally distributed but $n$ is large.
