James-Stein Estimator

Tony Ke

January 29, 2012
1.1 Estimator

Suppose we observe a data point $x$ drawn from the multivariate normal distribution $X \sim N(\theta, \sigma^2 I_n)$, where the mean $\theta$ is the unknown parameter to be estimated. The ordinary (maximum-likelihood) estimator is
$$\delta(x) = x. \qquad (1.1)$$
Some definitions and examples in this note are excerpted and modified from wikipedia.org, though not explicitly cited in the text.
For the sake of conceptual simplicity, in the following discussion we often talk about a data set X consisting of only one data point. The generalization is straightforward.
Prepared by Tony Ke for UC Berkeley MFE 230K class. All rights reserved.
1.2 Risk Function

To compare estimators, define the quadratic loss function
$$L(\theta, \delta(x)) = |\delta(x) - \theta|^2, \qquad (1.2)$$
and the risk function as the expected loss,
$$R(\theta, \delta) = E[L(\theta, \delta(X))] = E|\delta(X) - \theta|^2. \qquad (1.3)$$
For the ordinary estimator $\delta(x) = x$ with $X \sim N(\theta, \sigma^2 I_n)$,
$$R(\theta, \delta) = E|X - \theta|^2 = n\sigma^2. \qquad (1.4)$$

1.3 Admissibility
After introducing the risk function to compare the goodness of any two estimators, a natural question is how to define the best estimator, which turns out to be non-trivial. An admissible decision rule is defined as a rule for making a decision such that there is no other rule that is always better than it. Why do we need to specify "always"? Because the parameter $\theta$ is unknown: a decision rule can perform well for some underlying parameter values and poorly for others. Mathematically speaking, we say $\delta$ is an admissible decision rule if there is no other rule $\delta'$ such that $R(\theta, \delta') \leq R(\theta, \delta)$ for all $\theta$, with strict inequality for some $\theta$.
1.4 Stein's Paradox

Stein's estimator is defined as
$$\delta_S(x) = \left(1 - \frac{(n-2)\sigma^2}{|x|^2}\right) x, \qquad (1.5)$$
which will be shown to be a better estimator than the ordinary estimator $\delta(x) = x$ in (1.1) whenever $n \geq 3$.
"
#
2
2
(n
2)
S ) = E X +
R(,
X
|X|2
(n 2) 2
(n 2)2 4
= E | X|2 + 2( X)T
X
+
|X|2
|X|2
|X|4
( X)T X
1
2
2
2 4
= E | X| + 2(n 2) E
+ (n 2) E
2
|X|
|X|2
1
= n 2 (n 2)2 4 E
< n 2.
(1.6)
|X|2
The last equation comes from
by parts. Its not hard to show that
h integration
i
@h
E [(i Xi )h(X)] = E @x
(X) , for any well-behaved function h().
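A quick Monte Carlo sketch (illustrative Python, not part of the original notes; the particular $\theta$ below is made up) confirms inequality (1.6): the empirical risk of Stein's estimator falls strictly below $n\sigma^2$.

```python
import numpy as np

# Monte Carlo check of (1.6): the risk of Stein's estimator lies below
# n*sigma^2, the risk of the ordinary estimator delta(x) = x.
rng = np.random.default_rng(0)
n, sigma2, trials = 10, 1.0, 200_000
theta = rng.normal(size=n)            # an arbitrary fixed true mean

X = theta + np.sqrt(sigma2) * rng.standard_normal((trials, n))
norm2 = np.sum(X**2, axis=1)          # |X|^2 for each draw
stein = (1.0 - (n - 2) * sigma2 / norm2)[:, None] * X

risk_ordinary = np.mean(np.sum((X - theta) ** 2, axis=1))
risk_stein = np.mean(np.sum((stein - theta) ** 2, axis=1))
print(risk_ordinary > risk_stein)     # True
```

The gap between the two risks widens as $|\theta|$ shrinks toward the origin, since $E[1/|X|^2]$ is then larger.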
i
Steins example shows that for estimation of mean of multi-variate Gaussian distribution, Steins estimator (1.5) is better than the ordinary estimator
in that its risk function is smaller. As a quirky example, we measure the speed
of light (1 ), tea consumption in Taiwan (2 ), and hog weight in Montana
(3 ), and observe data point x = (x1 , x2 , x3 ). Estimates i (xi ) = xi based on
individual measurement is worse than
the one
based on measurements on all
1
total mean squared error in measuring light speed, tea consumption, and hog
weight would improve by using the Stein estimator. However, any particular
component (such as the speed of light) would improve for some parameter
values, and deteriorate for others. Thus, although the Stein estimator dominates the ordinary estimator when three or more parameters are estimated,
any single component does not dominate the respective component of the
ordinary estimator.
It is also worthwhile to point out that the validity of Stein's paradox does not depend on the quadratic form of the loss function, even though the quadratic loss function can approximate any well-behaved loss function via Taylor expansion: Brown extended the inadmissibility conclusion to fairly weak conditions on the loss function [1, 2, 3].
1.5 Intuition
It is not intuitively clear why the Stein estimator dominates the ordinary one. Stein's original argument [10] is based on a comparison of $\theta^T\theta$ to $x^Tx$ when $n$ is large, and proceeds as follows (this part is based on [9]). Intuitively, a good estimate $\hat\theta$ should satisfy $\hat\theta_i \approx \theta_i$ for $i = 1, 2, \ldots, n$, which implies $\hat\theta_i^2 \approx \theta_i^2$, and thus
$$\hat\theta^T\hat\theta \approx \theta^T\theta. \qquad (1.7)$$
On the other hand, since $E[x_i^2] = \theta_i^2 + \sigma^2$, the law of large numbers gives
$$\frac{1}{n}\left(x^Tx - \theta^T\theta\right) \to \sigma^2, \quad \text{as } n \to \infty. \qquad (1.8)$$
So $x^Tx$ overshoots $\theta^T\theta$ by roughly $n\sigma^2$, and a good estimator should shrink $x$ toward the origin. Notice that the $x_i$ for $i = 1, 2, \ldots, n$ are not identically distributed, but the law of large numbers still holds, as can be proved by Chebyshev's inequality.
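The overshoot in (1.8) is easy to see numerically (illustrative Python; the particular $\theta$ below is made up):

```python
import numpy as np

# Numerical illustration of (1.8): x^T x concentrates around
# theta^T theta + n*sigma^2 for large n, even with non-identical means.
rng = np.random.default_rng(1)
n, sigma2 = 10_000, 1.0
theta = rng.uniform(-1.0, 1.0, size=n)    # fixed, non-identical means
x = theta + np.sqrt(sigma2) * rng.standard_normal(n)

# Per-coordinate gap between x^T x and theta^T theta is close to sigma^2.
print(abs((x @ x - theta @ theta) / n - sigma2) < 0.1)  # True
```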
1.6 Bayesian Interpretation
Assume a normal prior on the mean,
$$\theta \sim N(0, \tau^2 I_n). \qquad (1.9)$$
The posterior density of $\theta$ given $x$ is
$$p(\theta \mid x) \;\propto\; \exp\left(-\frac{1}{2\sigma^2}(x-\theta)^T(x-\theta)\right)\exp\left(-\frac{1}{2\tau^2}\,\theta^T\theta\right) \;\propto\; \exp\left(-\frac{\sigma^2+\tau^2}{2\sigma^2\tau^2}\left|\theta - \frac{\tau^2}{\sigma^2+\tau^2}\,x\right|^2\right).$$
So
$$\theta \mid x \;\sim\; N\left(\frac{\tau^2}{\sigma^2+\tau^2}\,x,\; \frac{\sigma^2\tau^2}{\sigma^2+\tau^2}\,I_n\right), \qquad (1.10)$$
and the Bayes estimator is the posterior mean
$$\delta_B(x) = \frac{\tau^2}{\sigma^2+\tau^2}\,x = \left(1-\frac{\sigma^2}{\sigma^2+\tau^2}\right)x. \qquad (1.11)$$
$\delta_B$ achieves the smallest risk for any $x$ under the quadratic loss function. One should notice that the definition of the risk function from the Bayesian perspective differs from that from a frequentist's perspective: in the Bayesian framework, we take the expectation of the loss function over the posterior distribution of $\theta$, while in the frequentist framework, we take the expectation over the population space of $x$. Instead of an admissible rule, we usually call the risk-function-minimizing rule a Bayes rule in the Bayesian framework.
The expression of $\delta_B$ in equation (1.11) is intended to evoke the form of Stein's estimator. In fact, instead of fixing $\tau^2$ from outside, we can estimate the prior density from the data and then apply the Bayesian framework. This approach is known as empirical Bayes estimation. It can be shown that $(n-2)\sigma^2/(x^Tx)$ is an unbiased estimator of $\sigma^2/(\sigma^2+\tau^2)$, since marginally $x \sim N(0, (\sigma^2+\tau^2)I_n)$. By substituting this estimator back into (1.11), Stein's estimator (1.5) is obtained.
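The unbiasedness claim behind the empirical Bayes step can be checked by simulation (illustrative Python; the values of $\sigma^2$ and $\tau^2$ are made up):

```python
import numpy as np

# Empirical-Bayes step: marginally x ~ N(0, (sigma^2 + tau^2) I_n), and
# (n - 2) * sigma^2 / |x|^2 is an unbiased estimate of the shrinkage
# factor sigma^2 / (sigma^2 + tau^2) appearing in (1.11).
rng = np.random.default_rng(2)
n, sigma2, tau2, trials = 8, 1.0, 3.0, 500_000

x = np.sqrt(sigma2 + tau2) * rng.standard_normal((trials, n))
shrinkage = (n - 2) * sigma2 / np.sum(x**2, axis=1)

print(shrinkage.mean())        # close to sigma2 / (sigma2 + tau2) = 0.25
```

The expectation works out because $|x|^2/(\sigma^2+\tau^2)$ is chi-squared with $n$ degrees of freedom, and $E[1/\chi^2_n] = 1/(n-2)$.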
1.7 James-Stein Estimator

Stein's idea was later completed and improved, notably by James and Stein [6] and by Efron and Morris [5]. Let us consider it in a more general setting. For $x_t \sim N(\theta, \Sigma)$ $(t = 1, 2, \ldots, T)$, one can similarly show that the ordinary estimator $\bar{x} = \frac{1}{T}\sum_{t=1}^T x_t$ is dominated by the James-Stein estimator
$$\delta_{JS}(\bar{x}) = (1-k)\,\bar{x} + k\, x_0 \mathbf{1}, \qquad (1.12)$$
where
$$k = \frac{(n-2)/T}{(\bar{x} - x_0\mathbf{1})^T\, \Sigma^{-1}\, (\bar{x} - x_0\mathbf{1})}. \qquad (1.13)$$
We find that the James-Stein estimator shrinks not only toward 0 but, more generally, toward any target $x_0\mathbf{1}$. For $\Sigma = \sigma^2 I_n$, $T = 1$ and $x_0 = 0$, (1.12) goes back to Stein's estimator (1.5).
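A minimal implementation of (1.12)-(1.13) (a sketch; the function and variable names are my own) also makes the reduction to (1.5) easy to verify:

```python
import numpy as np

def james_stein(xbar, sigma_inv, T, x0=0.0):
    """James-Stein estimate (1.12)-(1.13): shrink xbar toward x0 * 1."""
    n = len(xbar)
    d = xbar - x0 * np.ones(n)
    k = ((n - 2) / T) / (d @ sigma_inv @ d)
    return (1 - k) * xbar + k * x0 * np.ones(n)

# Sanity check: with Sigma = sigma^2 I, T = 1, x0 = 0, (1.12) equals (1.5).
x = np.array([1.0, -2.0, 0.5, 3.0, -1.5])
sigma2, n = 0.7, 5
js = james_stein(x, np.eye(n) / sigma2, T=1, x0=0.0)
stein = (1 - (n - 2) * sigma2 / (x @ x)) * x
print(np.allclose(js, stein))  # True
```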
2 Application to Portfolio Analysis

The James-Stein method improves accuracy when three or more quantities are estimated together, which makes it a natural fit for estimating the mean returns of multiple assets in a portfolio.
2.1 Framework
Jorion (1986) considered parameter uncertainty in the portfolio optimization problem [7]. He focuses on the uncertainty of the return mean rather than that of the return variance-covariance matrix. As pointed out by Prof. Leland in class, there are two rationales behind this choice: (1) the optimal portfolio allocation is very sensitive to changes in the mean; (2) the estimation accuracy of the variance-covariance matrix improves if one obtains data on a finer time scale, such as high-frequency data.
An empirical Bayes method is applied, with the prior on the mean of asset returns
$$p(\theta \mid V, \lambda, Y_0) \;\propto\; \exp\left(-\frac{\lambda}{2}\,(\theta - Y_0\mathbf{1})^T V^{-1} (\theta - Y_0\mathbf{1})\right). \qquad (2.1)$$
By repeating a procedure similar to that of section 1.6, we obtain the optimal estimator as the James-Stein estimator
$$\delta_B(R) = (1-k)\,\delta_{ML}(R) + k\, Y_0 \mathbf{1}, \qquad (2.2)$$
where $R$ represents all the asset return data, $\delta_{ML}(R) = \bar{x}$ is the ordinary maximum-likelihood estimate of the mean, and
$$k = \frac{\lambda}{\lambda + T}, \qquad (2.3)$$
$$Y_0 = \frac{\mathbf{1}^T V^{-1}\, \delta_{ML}(R)}{\mathbf{1}^T V^{-1} \mathbf{1}}. \qquad (2.4)$$
$Y_0$ happens to be the average return of the minimum-variance portfolio: one can verify that the allocation weights $V^{-1}\mathbf{1}/(\mathbf{1}^T V^{-1}\mathbf{1})$ minimize the variance of the portfolio subject to the condition that they sum to one.
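The minimum-variance property of these weights is easy to spot-check numerically (illustrative Python; the covariance matrix below is made up):

```python
import numpy as np

# Check that w = V^{-1} 1 / (1^T V^{-1} 1) minimizes portfolio variance
# among all weight vectors summing to one.
rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))
V = A @ A.T + 6 * np.eye(6)      # a positive-definite covariance matrix
ones = np.ones(6)

w = np.linalg.solve(V, ones)
w /= ones @ w                    # normalize so the weights sum to one
var_w = w @ V @ w

# No random fully invested portfolio does better.
worse = 0
for _ in range(1000):
    u = rng.standard_normal(6)
    u /= u.sum()
    worse += (u @ V @ u >= var_w - 1e-12)
print(worse)  # 1000
```

This is the standard Lagrange-multiplier solution of min wᵀVw subject to 1ᵀw = 1.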
We can further estimate the shrinkage coefficient from the data:
$$\hat{k} = \frac{n+2}{(n+2) + T\,(\delta_{ML}(R) - Y_0\mathbf{1})^T\, \hat{V}^{-1}\, (\delta_{ML}(R) - Y_0\mathbf{1})}, \qquad (2.5)$$
where
$$\hat{V} = \frac{T-1}{T-n-2}\, S, \qquad (2.6)$$
and $S$ is the sample variance-covariance matrix of the returns.
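Putting (2.2)-(2.6) together gives a compact Bayes-Stein mean estimator; the sketch below (function and variable names are my own, and the simulated returns are made up) assumes returns arrive as a T × n matrix with T > n + 2:

```python
import numpy as np

def bayes_stein_mean(R):
    """Bayes-Stein estimate (2.2) of asset mean returns from T x n data R."""
    T, n = R.shape
    mu_ml = R.mean(axis=0)                        # delta_ML(R)
    S = np.cov(R, rowvar=False)                   # sample covariance
    V = (T - 1) / (T - n - 2) * S                 # eq. (2.6)
    V_inv = np.linalg.inv(V)
    ones = np.ones(n)
    y0 = (ones @ V_inv @ mu_ml) / (ones @ V_inv @ ones)   # eq. (2.4)
    d = mu_ml - y0 * ones
    k = (n + 2) / ((n + 2) + T * (d @ V_inv @ d))  # eq. (2.5)
    return (1 - k) * mu_ml + k * y0 * ones         # eq. (2.2)

rng = np.random.default_rng(4)
R = 0.01 + 0.05 * rng.standard_normal((120, 7))   # 120 months, 7 markets
print(bayes_stein_mean(R).shape)  # (7,)
```

Since $0 < \hat{k} < 1$, every component of the estimate is pulled from the sample mean toward the common target $Y_0$, reducing the cross-sectional spread of the estimated means.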
2.2 Example

Figure 1 shows sample estimates from stock market returns for seven major countries, calculated over a 60-month period. The underlying parameters $\theta$ and $V$ were chosen equal to the estimates reported in Figure 1. Then $T$ independent vectors of returns were generated from this distribution, and the following estimators were computed:
1. Certainty Equivalence: classical mean-variance optimization;
2. Bayes Diffuse Prior: the Klein and Bawa (1976) uninformative prior;
3. Minimum Variance: $\lambda \to \infty$ and $k = 1$;
4. Bayes-Stein estimator.
The results are shown in Figure 2. We can see that the Bayes-Stein estimator beats all the others in estimation accuracy for relatively large sample sizes ($T \geq 50$).
Figure 1: Excerpted from the Jorion (1986) paper. Dollar returns in percent per month. The sample period is January 1977 to December 1981.
Figure 2: Excerpted from the Jorion (1986) paper. Fmax is the investor's utility function calculated from the true underlying parameters, i.e., the theoretically maximal utility that can be achieved. Fi is the investor's utility function when he or she adopts the corresponding estimator calculated from the simulation samples. The y-axis on the left shows the relative difference of the utility functions, which directly characterizes the goodness of estimation.
References

[1] L. Brown. On the admissibility of invariant estimators of one or more location parameters. The Annals of Mathematical Statistics, 37(5):1087–1136, 1966.

[2] L. Brown. Estimation with incompletely specified loss functions (the case of several location parameters). Journal of the American Statistical Association, 70(350):417–427, 1975.

[3] L. Brown. A heuristic method for determining admissibility of estimators, with applications. The Annals of Statistics, pages 960–994, 1979.

[4] B. Efron and C. Morris. Limiting the risk of Bayes and empirical Bayes estimators, Part II: The empirical Bayes case. Journal of the American Statistical Association, 67(337):130–139, 1972.

[5] B. Efron and C. Morris. Stein's estimation rule and its competitors: an empirical Bayes approach. Journal of the American Statistical Association, 68(341):117–130, 1973.

[6] W. James and C. Stein. Estimation with quadratic loss. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 361–379, 1961.

[7] P. Jorion. Bayes-Stein estimation for portfolio analysis. Journal of Financial and Quantitative Analysis, 21(3):279–292, 1986.

[8] D. Lindley. Discussion on Professor Stein's paper. Journal of the Royal Statistical Society, 24:285–287, 1962.

[9] J. Richards. An introduction to James-Stein estimation, 1999.

[10] C. Stein. Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 197–206, 1956.