November 3, 2009
When we look to estimate the distribution mean $\mu$, we use the sample mean $\bar{x}$. For the variance $\sigma^2$, we have seen two choices:
$$\frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2 \qquad\text{and}\qquad \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2.$$
One criterion for choosing between them is statistical bias.
Definition 1. A statistic $d$ is called an unbiased estimator for a function of the parameter $g(\theta)$ provided that for every choice of $\theta$,
$$E_\theta d(X) = g(\theta).$$
Any estimator that is not unbiased is called biased. The bias is the difference
$$b_d(\theta) = E_\theta d(X) - g(\theta).$$
We can assess the quality of an estimator by computing its mean square error.
$$E_\theta[(d(X) - g(\theta))^2] = E_\theta[(d(X) - E_\theta d(X) + b_d(\theta))^2]$$
$$= E_\theta[(d(X) - E_\theta d(X))^2] + 2 b_d(\theta) E_\theta[d(X) - E_\theta d(X)] + b_d(\theta)^2$$
$$= \mathrm{Var}_\theta(d(X)) + b_d(\theta)^2.$$
Note that the mean square error for an unbiased estimator is its variance. Bias increases the mean square error.
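To see the decomposition in action, here is a minimal Monte Carlo sketch in Python (using NumPy; the population, the deliberately biased estimator $\bar{x} + 0.1$, and all numerical values are illustrative choices, not from the text above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (not from the text): estimate the mean mu of a N(mu, 1)
# population with the deliberately biased estimator d(X) = x_bar + 0.1,
# so the bias is b_d = 0.1.
mu, n, reps = 5.0, 20, 200_000
samples = rng.normal(mu, 1.0, size=(reps, n))
d = samples.mean(axis=1) + 0.1           # biased estimator of mu

mse = np.mean((d - mu) ** 2)             # E[(d(X) - g(theta))^2]
decomposition = d.var() + 0.1 ** 2       # Var(d(X)) + b_d(theta)^2

print(mse, decomposition)                # agree up to Monte Carlo error
```

The two printed numbers agree up to simulation noise, matching the identity above.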
Example 2. Let $X_1, X_2, \ldots, X_n$ be Bernoulli trials with success parameter $p$ and set $d(X) = \bar{X}$. Then
$$E\bar{X} = \frac{1}{n}(p + \cdots + p) = p.$$
Thus, $\bar{X}$ is an unbiased estimator for $p$. In this circumstance, we generally write $\hat{p}$ instead of $\bar{X}$. In addition,
$$\mathrm{Var}(\hat{p}) = \frac{1}{n^2}(p(1-p) + \cdots + p(1-p)) = \frac{1}{n}p(1-p).$$
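A short simulation sketch in Python (with NumPy; the choices $p = 0.3$ and $n = 50$ are arbitrary illustrations) bears out both computations:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative values, not from the text: p = 0.3, n = 50 trials per sample.
p, n, reps = 0.3, 50, 400_000
p_hat = rng.binomial(n, p, size=reps) / n   # one sample proportion per replicate

print(p_hat.mean())   # close to p = 0.3, so p_hat is (empirically) unbiased
print(p_hat.var())    # close to p*(1 - p)/n = 0.0042
```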
Computing Bias
Example 3. If a simple random sample $X_1, X_2, \ldots, X_n$ has unknown finite variance $\sigma^2$, then we can consider the sample variance
$$S^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2.$$
Unbiased Estimation
To find the mean of $S^2$, begin with the identity
$$\sum_{i=1}^n (X_i - \mu)^2 = \sum_{i=1}^n ((X_i - \bar{X}) + (\bar{X} - \mu))^2$$
$$= \sum_{i=1}^n (X_i - \bar{X})^2 + 2\sum_{i=1}^n (X_i - \bar{X})(\bar{X} - \mu) + n(\bar{X} - \mu)^2$$
$$= \sum_{i=1}^n (X_i - \bar{X})^2 + n(\bar{X} - \mu)^2,$$
because the cross term vanishes: $\sum_{i=1}^n (X_i - \bar{X}) = 0$.
Then,
"
1X
)2
(Xi )2 (X
ES = E
n i=1
n1 2
1 2 1 2
n =
.
n
n
n
This shows that S 2 is a biased estimator for 2 . We can see that it is biased downwards.
$$b(\sigma^2) = \frac{n-1}{n}\sigma^2 - \sigma^2 = -\frac{1}{n}\sigma^2.$$
In addition,
$$E\left[\frac{n}{n-1} S^2\right] = \sigma^2$$
and
$$S_u^2 = \frac{n}{n-1} S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$$
is an unbiased estimator for $\sigma^2$. As we shall learn in the next example, because the square root is concave downward, $S_u$ as an estimator for $\sigma$ is downwardly biased.
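The bias of $S^2$ and the unbiasedness of $S_u^2$ are easy to check by simulation. In this Python sketch (with NumPy; the exponential population with $\sigma^2 = 1$ and the sample size $n = 5$ are illustrative choices), the `ddof` argument selects the divisor $n$ or $n-1$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative choices, not from the text: exponential data with sigma^2 = 1,
# samples of size n = 5.  Theory: E S^2 = (n-1)/n * sigma^2 = 0.8, E S_u^2 = 1.
n, reps = 5, 400_000
x = rng.exponential(1.0, size=(reps, n))

s2 = x.var(axis=1, ddof=0)    # divide by n      -> biased downwards
su2 = x.var(axis=1, ddof=1)   # divide by n - 1  -> unbiased

print(s2.mean(), su2.mean())  # approximately 0.8 and 1.0
```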
Example 4. If $X_1, \ldots, X_n$ form a simple random sample with unknown finite mean $\mu$, then $\bar{X}$ is an unbiased estimator of $\mu$. If the $X_i$ have variance $\sigma^2$, then
$$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}.$$
In the method of moments estimation, we have used $g(\bar{X})$ as an estimator for $g(\mu)$. If $g$ is a convex function, we can say something about the bias of this estimator. In Figure 1, we see the method of moments estimator $g(\bar{X})$ for the parameter $\beta = g(\mu) = \mu/(\mu-1)$ in the Pareto distribution. The choice of $\beta = 3$ corresponds to a mean of $\mu = 3/2$ for the Pareto random variables. The central limit theorem states that the sample mean $\bar{X}$ is nearly normally distributed with mean $3/2$. Thus, the distribution of $\bar{X}$ is nearly symmetric around $3/2$. From the figure, we can see that the interval from 1.4 to 1.5 under the function $g$ maps into a longer interval above $\beta = 3$ than the interval from 1.5 to 1.6 maps below $\beta = 3$. Thus, a sample mean below $3/2$ is stretched away from $\beta = 3$ under the mapping $g$ more than a sample mean above $3/2$ is drawn toward it. Consequently, we anticipate that the estimator $g(\bar{X})$ will be upwardly biased.
One way to characterize a convex function is that its graph lies above any tangent line. If we look at the tangent line at the point $x = \mu$, then this statement becomes
$$g(x) - g(\mu) \geq g'(\mu)(x - \mu).$$
Now replace $x$ with the random variable $\bar{X}$ and take expectations. Then,
$$E_\mu[g(\bar{X}) - g(\mu)] \geq E_\mu[g'(\mu)(\bar{X} - \mu)] = g'(\mu) E_\mu[\bar{X} - \mu] = 0.$$
Figure 1: Graph of the convex function $g(x) = x/(x-1)$ and the tangent line $y = g(\mu) + g'(\mu)(x-\mu)$. Note that the tangent line is below the graph of $g$.
Consequently,
$$E_\mu g(\bar{X}) \geq g(\mu). \tag{1}$$
By taking the second derivative, we see that $g''(\mu) = 2(\mu-1)^{-3} > 0$ and, because $\mu > 1$, $g$ is a convex function. To estimate the size of the bias, we look at a quadratic approximation for $g$:
$$g(x) - g(\mu) \approx g'(\mu)(x - \mu) + \frac{1}{2} g''(\mu)(x - \mu)^2.$$
Again, replace $x$ with the random variable $\bar{X}$ and take expectations. Then,
$$b_g(\mu) = E_\mu[g(\bar{X})] - g(\mu) \approx E_\mu[g'(\mu)(\bar{X} - \mu)] + \frac{1}{2} E_\mu[g''(\mu)(\bar{X} - \mu)^2] = \frac{1}{2} g''(\mu)\,\mathrm{Var}(\bar{X}) = \frac{1}{2n} g''(\mu)\,\sigma^2.$$
Returning to the Pareto example, we have $\mu = \beta/(\beta-1)$, so that $\mu - 1 = 1/(\beta-1)$,
$$g''(\mu) = 2(\mu-1)^{-3} = 2(\beta-1)^3, \qquad \sigma^2 = \frac{\beta}{(\beta-1)^2(\beta-2)}.$$
Thus, the bias
$$b_g(\mu) \approx \frac{1}{2n} g''(\mu)\,\sigma^2 = \frac{\beta(\beta-1)}{n(\beta-2)}.$$
So, for $\beta = 3$ and $n = 100$, the bias is approximately 0.06. Compare this to the estimated value of 0.053 from simulation.
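A simulation estimate of this kind can be reproduced along the following lines (a Python sketch with NumPy; the seed and replication count are arbitrary choices). Note that NumPy's `pareto` routine samples the Lomax distribution, so 1 is added to obtain classical Pareto variables on $[1, \infty)$:

```python
import numpy as np

rng = np.random.default_rng(3)

# beta = 3 and n = 100 as in the text; the replication count is arbitrary.
beta, n, reps = 3.0, 100, 50_000
x = 1.0 + rng.pareto(beta, size=(reps, n))   # classical Pareto, x_m = 1
xbar = x.mean(axis=1)
beta_hat = xbar / (xbar - 1.0)               # method of moments estimator g(x_bar)

bias_sim = beta_hat.mean() - beta                    # simulated bias
bias_approx = beta * (beta - 1) / (n * (beta - 2))   # quadratic approximation, 0.06
print(bias_sim, bias_approx)
```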
Cramer-Rao Bound
So, among unbiased estimators, one important goal is to find an estimator that has as small a variance as possible. A more precise goal would be to find an unbiased estimator $d$ that has uniform minimum variance. In other words, $d(X)$ has finite variance for every value of the parameter $\theta$ and, for any other unbiased estimator $\tilde{d}$,
$$\mathrm{Var}_\theta\, d(X) \leq \mathrm{Var}_\theta\, \tilde{d}(X).$$
We begin with a review of correlation. The correlation of two random variables $Y$ and $Z$ is
$$\rho(Y,Z) = \frac{\mathrm{Cov}(Y,Z)}{\sqrt{\mathrm{Var}(Y)\,\mathrm{Var}(Z)}}. \tag{1}$$
The correlation takes values $-1 \leq \rho(Y,Z) \leq 1$ and takes the extreme values $\pm 1$ if and only if $Y$ and $Z$ are linearly related, i.e., $Z = aY + b$ for some constants $a$ and $b$. Consequently,
$$\mathrm{Cov}(Y,Z)^2 \leq \mathrm{Var}(Y)\,\mathrm{Var}(Z).$$
One of the random variables we will encounter on the way to finding the Cramer-Rao bound will have mean zero.
In this case, we can make a small simplification.
If the random variable $Z$ has mean zero, then $\mathrm{Cov}(Y,Z) = E[YZ]$ and
$$E[YZ]^2 \leq \mathrm{Var}(Y)\,\mathrm{Var}(Z) = \mathrm{Var}(Y)\,EZ^2. \tag{2}$$
We begin with data $X = (X_1, \ldots, X_n)$ drawn from an unknown probability $P_\theta$. The parameter space $\Theta \subset \mathbb{R}$. Denote the joint density of these random variables
$$f(x|\theta), \qquad \text{where } x = (x_1, \ldots, x_n).$$
In the case that the data come from a simple random sample, the joint density is the product of the marginal densities:
$$f(x|\theta) = f(x_1|\theta) \cdots f(x_n|\theta). \tag{3}$$
For continuous random variables, we have
$$1 = \int_{\mathbb{R}^n} f(x|\theta)\,dx \tag{4}$$
and, for an unbiased estimator $d(X)$ of $g(\theta)$,
$$g(\theta) = \int_{\mathbb{R}^n} d(x)\,f(x|\theta)\,dx. \tag{5}$$
If the functions in (4) and (5) are differentiable with respect to the parameter $\theta$ and we can pass the derivative through the integral, then
$$0 = \int_{\mathbb{R}^n} \frac{\partial f(x|\theta)}{\partial\theta}\,dx = \int_{\mathbb{R}^n} \frac{\partial \ln f(x|\theta)}{\partial\theta}\, f(x|\theta)\,dx = E_\theta\left[\frac{\partial \ln f(X|\theta)}{\partial\theta}\right] \tag{6}$$
and
$$g'(\theta) = \int_{\mathbb{R}^n} d(x)\,\frac{\partial f(x|\theta)}{\partial\theta}\,dx = E_\theta\left[d(X)\,\frac{\partial \ln f(X|\theta)}{\partial\theta}\right]. \tag{7}$$
Now, return to the review on correlation with $Y = d(X)$ and the score function $Z = \partial \ln f(X|\theta)/\partial\theta$. Then, by equation (6), $EZ = 0$, and from equations (7) and (2), we find that
$$g'(\theta)^2 = E_\theta\left[d(X)\,\frac{\partial \ln f(X|\theta)}{\partial\theta}\right]^2 \leq \mathrm{Var}_\theta(d(X))\, E_\theta\left[\left(\frac{\partial \ln f(X|\theta)}{\partial\theta}\right)^2\right],$$
or,
$$\mathrm{Var}_\theta(d(X)) \geq \frac{g'(\theta)^2}{I(\theta)}, \tag{8}$$
where
$$I(\theta) = E_\theta\left[\left(\frac{\partial \ln f(X|\theta)}{\partial\theta}\right)^2\right]$$
is called the Fisher information.
Also, for a simple random sample, the random variables $\{\partial \ln f(X_k|\theta)/\partial\theta;\ 1 \leq k \leq n\}$ are independent and have the same distribution. Using the fact that the variance of the sum is the sum of the variances for independent random variables, we see that the Fisher information
$$I(\theta) = n E\left[\left(\frac{\partial \ln f(X_1|\theta)}{\partial\theta}\right)^2\right].$$
Example 5. For independent Bernoulli random variables with unknown success probability $\theta$,
$$\ln f(x|\theta) = x \ln\theta + (1-x)\ln(1-\theta),$$
$$\frac{\partial}{\partial\theta} \ln f(x|\theta) = \frac{x}{\theta} - \frac{1-x}{1-\theta} = \frac{x-\theta}{\theta(1-\theta)},$$
$$E_\theta\left[\left(\frac{\partial}{\partial\theta} \ln f(X|\theta)\right)^2\right] = \frac{1}{\theta^2(1-\theta)^2}\, E_\theta[(X-\theta)^2] = \frac{1}{\theta(1-\theta)}$$
and the information is the reciprocal of the variance. Thus, by the Cramer-Rao lower bound, any unbiased estimator based on $n$ observations must have variance at least $\theta(1-\theta)/n$. However, if we take $d(x) = \bar{x}$, then
$$\mathrm{Var}_\theta\, d(X) = \frac{\theta(1-\theta)}{n}$$
and $\bar{x}$ is a uniformly minimum variance unbiased estimator.
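As a numerical companion to Example 5 (a Python sketch with NumPy; the values $\theta = 0.3$ and $n = 25$ are arbitrary illustrations), the variance of the sample proportion indeed matches the Cramer-Rao bound $1/(nI(\theta)) = \theta(1-\theta)/n$:

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative values, not from the text: theta = 0.3, n = 25 observations.
theta, n, reps = 0.3, 25, 400_000
xbar = rng.binomial(n, theta, size=reps) / n   # sample proportion per replicate

bound = theta * (1 - theta) / n                # 1/(n I(theta)) = 0.0084
print(xbar.var(), bound)                       # simulated variance meets the bound
```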
Example 6. For independent normal random variables with known variance $\sigma_0^2$ and unknown mean $\mu$,
$$\ln f(x|\mu) = -\ln(\sigma_0\sqrt{2\pi}) - \frac{(x-\mu)^2}{2\sigma_0^2}$$
and
$$\frac{\partial}{\partial\mu} \ln f(x|\mu) = \frac{1}{\sigma_0^2}(x-\mu).$$
Thus,
$$E_\mu\left[\left(\frac{\partial}{\partial\mu} \ln f(X|\mu)\right)^2\right] = \frac{1}{\sigma_0^4}\, E[(X-\mu)^2] = \frac{1}{\sigma_0^2}.$$
Again, the information is the reciprocal of the variance. Thus, by the Cramer-Rao lower bound, any unbiased estimator based on $n$ observations must have variance at least $\sigma_0^2/n$. However, if we take $d(x) = \bar{x}$, then
$$\mathrm{Var}_\mu\, d(X) = \frac{\sigma_0^2}{n}$$
and $\bar{x}$ is a uniformly minimum variance unbiased estimator.
Using integration by parts, we have an identity that is often a useful alternative for computing the Fisher information:
$$I(\theta) = E_\theta\left[\left(\frac{\partial \ln f(X|\theta)}{\partial\theta}\right)^2\right] = -E_\theta\left[\frac{\partial^2 \ln f(X|\theta)}{\partial\theta^2}\right].$$
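The identity can be verified directly for the Bernoulli density of Example 5, where both expectations are sums over $x \in \{0, 1\}$ (a Python sketch; $\theta = 0.3$ is an arbitrary illustrative value):

```python
# Bernoulli density from Example 5; theta = 0.3 is an arbitrary choice.
# For x in {0, 1}:
#   d/dtheta   ln f(x|theta) = x/theta - (1 - x)/(1 - theta)
#   d2/dtheta2 ln f(x|theta) = -x/theta**2 - (1 - x)/(1 - theta)**2
theta = 0.3

def score(x):
    return x / theta - (1 - x) / (1 - theta)

def second_deriv(x):
    return -x / theta**2 - (1 - x) / (1 - theta)**2

# expectations over X ~ Bernoulli(theta)
info_sq = (1 - theta) * score(0) ** 2 + theta * score(1) ** 2
info_2nd = -((1 - theta) * second_deriv(0) + theta * second_deriv(1))

print(info_sq, info_2nd)   # both equal 1/(theta*(1 - theta)) = 4.7619...
```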
For an exponential random variable,
$$\ln f(x|\lambda) = \ln\lambda - \lambda x, \qquad \frac{\partial^2 \ln f(x|\lambda)}{\partial\lambda^2} = -\frac{1}{\lambda^2}.$$
Thus,
$$I(\lambda) = \frac{1}{\lambda^2}.$$
For an unbiased estimator of the mean $g(\lambda) = 1/\lambda$, we have $g'(\lambda) = -1/\lambda^2$, and the Cramer-Rao bound becomes
$$\frac{g'(\lambda)^2}{nI(\lambda)} = \frac{1/\lambda^4}{n/\lambda^2} = \frac{1}{n\lambda^2}.$$
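Since $\mathrm{Var}(X_i) = 1/\lambda^2$, the variance of the sample mean is exactly $1/(n\lambda^2)$, so $\bar{x}$ attains this bound. A quick simulation confirms it (a Python sketch with NumPy; $\lambda = 2$ and $n = 40$ are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative values, not from the text: lam = 2.0, n = 40 observations.
lam, n, reps = 2.0, 40, 100_000
bound = (1 / lam**2) ** 2 / (n / lam**2)   # g'(lam)^2 / (n I(lam)) = 1/(n lam^2)

xbar = rng.exponential(1 / lam, size=(reps, n)).mean(axis=1)
print(bound, xbar.var())                   # both approximately 0.00625
```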