November 3, 2009
When we look to estimate the distribution mean $\mu$, we use the sample mean $\bar{x}$. For the variance $\sigma^2$, we have seen two choices:
$$\frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2 \qquad\text{and}\qquad \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2.$$
One criterion for choosing between them is statistical bias.
Definition 1. A statistic $d$ is called an unbiased estimator for a function of the parameter $g(\theta)$ provided that for every choice of $\theta$,
$$E_\theta d(X) = g(\theta).$$
Any estimator that is not unbiased is called biased. The bias is the difference
$$b_d(\theta) = E_\theta d(X) - g(\theta).$$
We can assess the quality of an estimator by computing its mean square error.
$$E_\theta[(d(X) - g(\theta))^2] = E_\theta[(d(X) - E_\theta d(X) + b_d(\theta))^2]$$
$$= E_\theta[(d(X) - E_\theta d(X))^2] + 2 b_d(\theta) E_\theta[d(X) - E_\theta d(X)] + b_d(\theta)^2$$
$$= \mathrm{Var}_\theta(d(X)) + b_d(\theta)^2.$$
Note that the mean square error for an unbiased estimator is its variance. Bias increases the mean square error.
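To see the decomposition in action, here is a minimal Monte Carlo sketch in Python (using NumPy; the population, the deliberately biased estimator $\bar{x} + 0.1$, and all numerical values are illustrative choices, not from the text above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (not from the text): estimate the mean mu of a N(mu, 1)
# population with the deliberately biased estimator d(X) = x_bar + 0.1,
# so the bias is b_d = 0.1.
mu, n, reps = 5.0, 20, 200_000
samples = rng.normal(mu, 1.0, size=(reps, n))
d = samples.mean(axis=1) + 0.1           # biased estimator of mu

mse = np.mean((d - mu) ** 2)             # E[(d(X) - g(theta))^2]
decomposition = d.var() + 0.1 ** 2       # Var(d(X)) + b_d(theta)^2

print(mse, decomposition)                # agree up to Monte Carlo error
```

The two printed numbers agree up to simulation noise, matching the identity above.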
Example 2. Let $X_1, X_2, \ldots, X_n$ be Bernoulli trials with success parameter $p$ and set $d(X) = \bar{X}$. Then
$$E\bar{X} = \frac{1}{n}(p + \cdots + p) = p.$$
Thus, $\bar{X}$ is an unbiased estimator for $p$. In this circumstance, we generally write $\hat{p}$ instead of $\bar{X}$. In addition,
$$\mathrm{Var}(\hat{p}) = \frac{1}{n^2}(p(1-p) + \cdots + p(1-p)) = \frac{1}{n}p(1-p).$$
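A short simulation sketch in Python (with NumPy; the choices $p = 0.3$ and $n = 50$ are arbitrary illustrations) bears out both computations:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative values, not from the text: p = 0.3, n = 50 trials per sample.
p, n, reps = 0.3, 50, 400_000
p_hat = rng.binomial(n, p, size=reps) / n   # one sample proportion per replicate

print(p_hat.mean())   # close to p = 0.3, so p_hat is (empirically) unbiased
print(p_hat.var())    # close to p*(1 - p)/n = 0.0042
```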
Computing Bias
Example 3. If a simple random sample $X_1, X_2, \ldots, X_n$ has unknown finite variance $\sigma^2$, then we can consider the sample variance
$$S^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2.$$
Unbiased Estimation
To find the mean of $S^2$, begin with the identity
$$\sum_{i=1}^n (X_i - \mu)^2 = \sum_{i=1}^n ((X_i - \bar{X}) + (\bar{X} - \mu))^2$$
$$= \sum_{i=1}^n (X_i - \bar{X})^2 + 2\sum_{i=1}^n (X_i - \bar{X})(\bar{X} - \mu) + n(\bar{X} - \mu)^2$$
$$= \sum_{i=1}^n (X_i - \bar{X})^2 + n(\bar{X} - \mu)^2,$$
because the cross term vanishes: $\sum_{i=1}^n (X_i - \bar{X}) = 0$.
Then,
"
1X
)2
(Xi )2 (X
ES = E
n i=1
n1 2
1 2 1 2
n =
.
n
n
n
This shows that S 2 is a biased estimator for 2 . We can see that it is biased downwards.
$$b(\sigma^2) = \frac{n-1}{n}\sigma^2 - \sigma^2 = -\frac{1}{n}\sigma^2.$$
In addition,
$$E\left[\frac{n}{n-1} S^2\right] = \sigma^2$$
and
$$S_u^2 = \frac{n}{n-1} S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$$
is an unbiased estimator for $\sigma^2$. As we shall learn in the next example, because the square root is concave downward, $S_u$ as an estimator for $\sigma$ is downwardly biased.
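The bias of $S^2$ and the unbiasedness of $S_u^2$ are easy to check by simulation. In this Python sketch (with NumPy; the exponential population with $\sigma^2 = 1$ and the sample size $n = 5$ are illustrative choices), the `ddof` argument selects the divisor $n$ or $n-1$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative choices, not from the text: exponential data with sigma^2 = 1,
# samples of size n = 5.  Theory: E S^2 = (n-1)/n * sigma^2 = 0.8, E S_u^2 = 1.
n, reps = 5, 400_000
x = rng.exponential(1.0, size=(reps, n))

s2 = x.var(axis=1, ddof=0)    # divide by n      -> biased downwards
su2 = x.var(axis=1, ddof=1)   # divide by n - 1  -> unbiased

print(s2.mean(), su2.mean())  # approximately 0.8 and 1.0
```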
Example 4. If $X_1, \ldots, X_n$ form a simple random sample with unknown finite mean $\mu$, then $\bar{X}$ is an unbiased estimator of $\mu$. If the $X_i$ have variance $\sigma^2$, then
$$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}.$$
In the method of moments estimation, we have used $g(\bar{X})$ as an estimator for $g(\mu)$. If $g$ is a convex function, we can say something about the bias of this estimator. In Figure 1, we see the method of moments estimator $g(\bar{X})$ for the parameter $\beta = g(\mu) = \mu/(\mu-1)$ in the Pareto distribution. The choice of $\beta = 3$ corresponds to a mean of $\mu = 3/2$ for the Pareto random variables. The central limit theorem states that the sample mean $\bar{X}$ is nearly normally distributed with mean $3/2$. Thus, the distribution of $\bar{X}$ is nearly symmetric around $3/2$. From the figure, we can see that the interval from 1.4 to 1.5 under the function $g$ maps into a longer interval above $\beta = 3$ than the interval from 1.5 to 1.6 maps below $\beta = 3$. Thus, a sample mean below $3/2$ is stretched away from $\beta = 3$ under the mapping $g$ more than a sample mean above $3/2$ is drawn toward it. Consequently, we anticipate that the estimator $g(\bar{X})$ will be upwardly biased.
One way to characterize a convex function is that its graph lies above any tangent line. If we look at the tangent line at the point $x = \mu$, then this statement becomes
$$g(x) - g(\mu) \geq g'(\mu)(x - \mu).$$
Now replace $x$ with the random variable $\bar{X}$ and take expectations. Then,
$$E_\mu[g(\bar{X}) - g(\mu)] \geq E_\mu[g'(\mu)(\bar{X} - \mu)] = g'(\mu) E_\mu[\bar{X} - \mu] = 0.$$
Figure 1: Graph of the convex function $g(x) = x/(x-1)$ and the tangent line $y = g(\mu) + g'(\mu)(x-\mu)$. Note that the tangent line is below the graph of $g$.
Consequently,
$$E_\mu g(\bar{X}) \geq g(\mu). \tag{1}$$
By taking the second derivative, we see that $g''(\mu) = 2(\mu-1)^{-3} > 0$ and, because $\mu > 1$, $g$ is a convex function. To estimate the size of the bias, we look at a quadratic approximation for $g$:
$$g(x) - g(\mu) \approx g'(\mu)(x - \mu) + \frac{1}{2} g''(\mu)(x - \mu)^2.$$
Again, replace $x$ with the random variable $\bar{X}$ and take expectations. Then,
$$b_g(\mu) = E_\mu[g(\bar{X})] - g(\mu) \approx E_\mu[g'(\mu)(\bar{X} - \mu)] + \frac{1}{2} E_\mu[g''(\mu)(\bar{X} - \mu)^2] = \frac{1}{2} g''(\mu)\,\mathrm{Var}(\bar{X}) = \frac{1}{2n} g''(\mu)\,\sigma^2.$$
Returning to the Pareto example, we have $\mu = \beta/(\beta-1)$, so that $\mu - 1 = 1/(\beta-1)$,
$$g''(\mu) = 2(\mu-1)^{-3} = 2(\beta-1)^3, \qquad \sigma^2 = \frac{\beta}{(\beta-1)^2(\beta-2)}.$$
Thus, the bias
$$b_g(\mu) \approx \frac{1}{2n} g''(\mu)\,\sigma^2 = \frac{\beta(\beta-1)}{n(\beta-2)}.$$
So, for $\beta = 3$ and $n = 100$, the bias is approximately 0.06. Compare this to the estimated value of 0.053 from simulation.
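A simulation estimate of this kind can be reproduced along the following lines (a Python sketch with NumPy; the seed and replication count are arbitrary choices). Note that NumPy's `pareto` routine samples the Lomax distribution, so 1 is added to obtain classical Pareto variables on $[1, \infty)$:

```python
import numpy as np

rng = np.random.default_rng(3)

# beta = 3 and n = 100 as in the text; the replication count is arbitrary.
beta, n, reps = 3.0, 100, 50_000
x = 1.0 + rng.pareto(beta, size=(reps, n))   # classical Pareto, x_m = 1
xbar = x.mean(axis=1)
beta_hat = xbar / (xbar - 1.0)               # method of moments estimator g(x_bar)

bias_sim = beta_hat.mean() - beta                    # simulated bias
bias_approx = beta * (beta - 1) / (n * (beta - 2))   # quadratic approximation, 0.06
print(bias_sim, bias_approx)
```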
Cramer-Rao Bound
So, among unbiased estimators, one important goal is to find an estimator that has as small a variance as possible. A more precise goal would be to find an unbiased estimator $d$ that has uniform minimum variance. In other words, $d(X)$ has finite variance for every value of the parameter $\theta$ and, for any other unbiased estimator $\tilde{d}$,
$$\mathrm{Var}_\theta\, d(X) \leq \mathrm{Var}_\theta\, \tilde{d}(X).$$
We begin with a review of correlation. The correlation of two random variables $Y$ and $Z$ is
$$\rho(Y,Z) = \frac{\mathrm{Cov}(Y,Z)}{\sqrt{\mathrm{Var}(Y)\,\mathrm{Var}(Z)}}. \tag{1}$$
The correlation takes values $-1 \leq \rho(Y,Z) \leq 1$ and takes the extreme values $\pm 1$ if and only if $Y$ and $Z$ are linearly related, i.e., $Z = aY + b$ for some constants $a$ and $b$. Consequently,
$$\mathrm{Cov}(Y,Z)^2 \leq \mathrm{Var}(Y)\,\mathrm{Var}(Z).$$
One of the random variables we will encounter on the way to finding the Cramer-Rao bound will have mean zero.
In this case, we can make a small simplification.
If the random variable $Z$ has mean zero, then $\mathrm{Cov}(Y,Z) = E[YZ]$ and
$$E[YZ]^2 \leq \mathrm{Var}(Y)\,\mathrm{Var}(Z) = \mathrm{Var}(Y)\,EZ^2. \tag{2}$$
We begin with data $X = (X_1, \ldots, X_n)$ drawn from an unknown probability $P_\theta$. The parameter space $\Theta \subset \mathbb{R}$. Denote the joint density of these random variables
$$f(x|\theta), \qquad \text{where } x = (x_1, \ldots, x_n).$$
In the case that the data come from a simple random sample, the joint density is the product of the marginal densities:
$$f(x|\theta) = f(x_1|\theta) \cdots f(x_n|\theta). \tag{3}$$
For continuous random variables, we have
$$1 = \int_{\mathbb{R}^n} f(x|\theta)\,dx \tag{4}$$
and, for an unbiased estimator $d(X)$ of $g(\theta)$,
$$g(\theta) = \int_{\mathbb{R}^n} d(x)\,f(x|\theta)\,dx. \tag{5}$$
If the functions in (4) and (5) are differentiable with respect to the parameter $\theta$ and we can pass the derivative through the integral, then
$$0 = \int_{\mathbb{R}^n} \frac{\partial f(x|\theta)}{\partial\theta}\,dx = \int_{\mathbb{R}^n} \frac{\partial \ln f(x|\theta)}{\partial\theta}\, f(x|\theta)\,dx = E_\theta\left[\frac{\partial \ln f(X|\theta)}{\partial\theta}\right] \tag{6}$$
and
$$g'(\theta) = \int_{\mathbb{R}^n} d(x)\,\frac{\partial f(x|\theta)}{\partial\theta}\,dx = E_\theta\left[d(X)\,\frac{\partial \ln f(X|\theta)}{\partial\theta}\right]. \tag{7}$$
Now, return to the review on correlation with $Y = d(X)$ and the score function $Z = \partial \ln f(X|\theta)/\partial\theta$. Then, by equation (6), $EZ = 0$, and from equations (7) and (2), we find that
$$g'(\theta)^2 = E_\theta\left[d(X)\,\frac{\partial \ln f(X|\theta)}{\partial\theta}\right]^2 \leq \mathrm{Var}_\theta(d(X))\, E_\theta\left[\left(\frac{\partial \ln f(X|\theta)}{\partial\theta}\right)^2\right],$$
or,
$$\mathrm{Var}_\theta(d(X)) \geq \frac{g'(\theta)^2}{I(\theta)}, \tag{8}$$
where
$$I(\theta) = E_\theta\left[\left(\frac{\partial \ln f(X|\theta)}{\partial\theta}\right)^2\right]$$
is called the Fisher information.
Also, for a simple random sample, the random variables $\{\partial \ln f(X_k|\theta)/\partial\theta;\ 1 \leq k \leq n\}$ are independent and have the same distribution. Using the fact that the variance of the sum is the sum of the variances for independent random variables, we see that the Fisher information
$$I(\theta) = n E\left[\left(\frac{\partial \ln f(X_1|\theta)}{\partial\theta}\right)^2\right].$$
Example 5. For independent Bernoulli random variables with unknown success probability $\theta$,
$$\ln f(x|\theta) = x \ln\theta + (1-x)\ln(1-\theta),$$
$$\frac{\partial}{\partial\theta} \ln f(x|\theta) = \frac{x}{\theta} - \frac{1-x}{1-\theta} = \frac{x-\theta}{\theta(1-\theta)},$$
$$E_\theta\left[\left(\frac{\partial}{\partial\theta} \ln f(X|\theta)\right)^2\right] = \frac{1}{\theta^2(1-\theta)^2}\, E_\theta[(X-\theta)^2] = \frac{1}{\theta(1-\theta)}$$
and the information is the reciprocal of the variance. Thus, by the Cramer-Rao lower bound, any unbiased estimator based on $n$ observations must have variance at least $\theta(1-\theta)/n$. However, if we take $d(x) = \bar{x}$, then
$$\mathrm{Var}_\theta\, d(X) = \frac{\theta(1-\theta)}{n}$$
and $\bar{x}$ is a uniformly minimum variance unbiased estimator.
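As a numerical companion to Example 5 (a Python sketch with NumPy; the values $\theta = 0.3$ and $n = 25$ are arbitrary illustrations), the variance of the sample proportion indeed matches the Cramer-Rao bound $1/(nI(\theta)) = \theta(1-\theta)/n$:

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative values, not from the text: theta = 0.3, n = 25 observations.
theta, n, reps = 0.3, 25, 400_000
xbar = rng.binomial(n, theta, size=reps) / n   # sample proportion per replicate

bound = theta * (1 - theta) / n                # 1/(n I(theta)) = 0.0084
print(xbar.var(), bound)                       # simulated variance meets the bound
```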
Example 6. For independent normal random variables with known variance $\sigma_0^2$ and unknown mean $\mu$,
$$\ln f(x|\mu) = -\ln(\sigma_0\sqrt{2\pi}) - \frac{(x-\mu)^2}{2\sigma_0^2}$$
and
$$\frac{\partial}{\partial\mu} \ln f(x|\mu) = \frac{1}{\sigma_0^2}(x-\mu).$$
Thus,
$$E_\mu\left[\left(\frac{\partial}{\partial\mu} \ln f(X|\mu)\right)^2\right] = \frac{1}{\sigma_0^4}\, E[(X-\mu)^2] = \frac{1}{\sigma_0^2}.$$
Again, the information is the reciprocal of the variance. Thus, by the Cramer-Rao lower bound, any unbiased estimator based on $n$ observations must have variance at least $\sigma_0^2/n$. However, if we take $d(x) = \bar{x}$, then
$$\mathrm{Var}_\mu\, d(X) = \frac{\sigma_0^2}{n}$$
and $\bar{x}$ is a uniformly minimum variance unbiased estimator.
Using integration by parts, we have an identity that is often a useful alternative for computing the Fisher information:
$$I(\theta) = E_\theta\left[\left(\frac{\partial \ln f(X|\theta)}{\partial\theta}\right)^2\right] = -E_\theta\left[\frac{\partial^2 \ln f(X|\theta)}{\partial\theta^2}\right].$$
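The identity can be verified directly for the Bernoulli density of Example 5, where both expectations are sums over $x \in \{0, 1\}$ (a Python sketch; $\theta = 0.3$ is an arbitrary illustrative value):

```python
# Bernoulli density from Example 5; theta = 0.3 is an arbitrary choice.
# For x in {0, 1}:
#   d/dtheta   ln f(x|theta) = x/theta - (1 - x)/(1 - theta)
#   d2/dtheta2 ln f(x|theta) = -x/theta**2 - (1 - x)/(1 - theta)**2
theta = 0.3

def score(x):
    return x / theta - (1 - x) / (1 - theta)

def second_deriv(x):
    return -x / theta**2 - (1 - x) / (1 - theta)**2

# expectations over X ~ Bernoulli(theta)
info_sq = (1 - theta) * score(0) ** 2 + theta * score(1) ** 2
info_2nd = -((1 - theta) * second_deriv(0) + theta * second_deriv(1))

print(info_sq, info_2nd)   # both equal 1/(theta*(1 - theta)) = 4.7619...
```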
For an exponential random variable,
$$\ln f(x|\lambda) = \ln\lambda - \lambda x, \qquad \frac{\partial^2 \ln f(x|\lambda)}{\partial\lambda^2} = -\frac{1}{\lambda^2}.$$
Thus,
$$I(\lambda) = \frac{1}{\lambda^2}.$$
For an unbiased estimator of the mean $g(\lambda) = 1/\lambda$, we have $g'(\lambda) = -1/\lambda^2$, and the Cramer-Rao bound becomes
$$\frac{g'(\lambda)^2}{nI(\lambda)} = \frac{1/\lambda^4}{n/\lambda^2} = \frac{1}{n\lambda^2}.$$
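Since $\mathrm{Var}(X_i) = 1/\lambda^2$, the variance of the sample mean is exactly $1/(n\lambda^2)$, so $\bar{x}$ attains this bound. A quick simulation confirms it (a Python sketch with NumPy; $\lambda = 2$ and $n = 40$ are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative values, not from the text: lam = 2.0, n = 40 observations.
lam, n, reps = 2.0, 40, 100_000
bound = (1 / lam**2) ** 2 / (n / lam**2)   # g'(lam)^2 / (n I(lam)) = 1/(n lam^2)

xbar = rng.exponential(1 / lam, size=(reps, n)).mean(axis=1)
print(bound, xbar.var())                   # both approximately 0.00625
```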