
Biometrika Trust

Maximum Likelihood Estimation in a Class of Nonregular Cases

Author: Richard L. Smith
Source: Biometrika, Vol. 72, No. 1 (Apr., 1985), pp. 67-90
Published by: Oxford University Press on behalf of Biometrika Trust
Stable URL: http://www.jstor.org/stable/2336336


Maximum likelihood estimation in a class of nonregular cases

BY RICHARD L. SMITH

Department of Mathematics, Imperial College, London SW7 2BZ, U.K.

SUMMARY

We consider maximum likelihood estimation of the parameters of a probability
density which is zero for $x < \theta$ and asymptotically $\alpha c(x-\theta)^{\alpha-1}$ as $x \downarrow \theta$. The other
parameters, which may or may not include $\alpha$ and $c$, are unknown. The classical regularity
conditions for the asymptotic properties of maximum likelihood estimators are not
satisfied, but it is shown that, when $\alpha > 2$, the information matrix is finite and the
classical asymptotic properties continue to hold. For $\alpha = 2$ the maximum likelihood
estimators are asymptotically efficient and normally distributed, but with a different
rate of convergence. For $1 < \alpha < 2$, the maximum likelihood estimators exist in general,
but are not asymptotically normal, while the question of asymptotic efficiency is still
unsolved. For $\alpha < 1$, the maximum likelihood estimators may not exist at all, but
alternatives are proposed. All these results are already known for the case of a single
unknown location parameter $\theta$, but are here extended to the case in which there are
additional unknown parameters. The paper concludes with a discussion of the
applications in extreme value theory.

Some key words: Extreme value theory; Maximum likelihood; Nonregular estimation; Stable distribution;
Weibull distribution.

1. INTRODUCTION

It is well known that, if the support of a probability density depends on an unknown


parameter, then the classical regularity conditions for maximum likelihood estimation
are not satisfied. In some cases the maximum likelihood estimators exist and have the
same asymptotic properties as in regular cases. In other cases they may exist but not be
asymptotically efficient or normally distributed, and in still other cases they may not
exist at all, at least not as solutions of the likelihood equations.
The case of a single unknown location parameter,

$f(x; \theta) = f_0(x - \theta) \quad (\theta < x < \infty); \qquad f_0(x) \sim \alpha c x^{\alpha-1} \quad (1.1)$

as $x \downarrow 0$ ($\alpha > 0$, $c > 0$), subject to certain smoothness conditions, is well documented.
For $\alpha > 2$ the Fisher information is finite and the maximum likelihood estimators have
the same asymptotic properties as in regular cases. This is stated by Woodroofe (1972,
Proposition 1.1), based on results of Le Cam (1970); the result is also implicit in Dawid's
(1970) proof of the asymptotic normality of posterior distributions in this case. For $\alpha = 2$
the maximum likelihood estimators are asymptotically normal (Woodroofe, 1972;
Akahira, 1975a) and efficient (Weiss & Wolfowitz, 1973), but the order of convergence is
$O\{(n \log n)^{1/2}\}$ instead of the usual $O(n^{1/2})$. For $1 < \alpha < 2$ the maximum likelihood
estimators have a nonnormal limiting distribution with order of convergence $O(n^{1/\alpha})$


(Woodroofe, 1974). Asymptotic efficiency is an open question, though Akahira (1975a, b)
showed that no estimator has a larger order of convergence. When $\alpha < 1$ the likelihood
function has no local maximum but is globally maximized at the sample minimum. The
sample minimum itself is a consistent estimator with order of convergence $O(n^{1/\alpha})$, and
Akahira's results show that, for $0 < \alpha < 2$, this order of convergence cannot be
improved. When $\alpha = 1$ a result of Weiss (1979) shows that the sample minimum has a
property of asymptotic sufficiency, but I have not succeeded in applying Weiss's
argument for any other value of $\alpha$.
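The $n^{-1/\alpha}$ order of the sample minimum is easy to illustrate by simulation (my sketch, not from the paper; the parameter values are arbitrary), using the three-parameter Weibull introduced below: the normalised quantity $n^{1/\alpha}(X_{n,1} - \theta_0)$ stabilises as $n$ grows.

```python
import random

def weibull_sample(n, theta, alpha, beta, rng):
    # X - theta = (E / beta)**(1/alpha) with E a standard exponential variable
    return [theta + (rng.expovariate(1.0) / beta) ** (1.0 / alpha) for _ in range(n)]

rng = random.Random(0)
theta0, alpha, beta = 5.0, 0.5, 1.0   # alpha < 1: no MLE, but the minimum is consistent
avgs = {}
for n in (100, 10000):
    # average of n^{1/alpha} * (X_{n,1} - theta0) over replications
    reps = [n ** (1.0 / alpha) * (min(weibull_sample(n, theta0, alpha, beta, rng)) - theta0)
            for _ in range(300)]
    avgs[n] = sum(reps) / len(reps)   # roughly constant in n: the n^{-1/alpha} rate
print(avgs)
```

For this model the statistic $n^{1/\alpha}(X_{n,1} - \theta_0)$ has exactly the distribution of $(G/\beta)^{1/\alpha}$ with $G$ standard exponential, so both averages estimate the same constant.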
In the present paper these results are extended to distributions in which there are
other unknown parameters as well as $\theta$. Some examples which fall within our framework
are as follows.

(i) Three-parameter Weibull,

$f(x; \theta, \alpha, \beta) = \alpha\beta(x-\theta)^{\alpha-1}\exp\{-\beta(x-\theta)^{\alpha}\} \quad (\theta < x < \infty,\ \alpha > 0,\ \beta > 0). \quad (1.2)$

(ii) Three-parameter gamma,

$f(x; \theta, \alpha, \beta) = \beta^{\alpha}(x-\theta)^{\alpha-1}\exp\{-\beta(x-\theta)\}/\Gamma(\alpha) \quad (\theta < x < \infty,\ \alpha > 0,\ \beta > 0). \quad (1.3)$

(iii) Three-parameter beta, which is a beta distribution with an unknown scale
parameter, and which can be recast in our framework by defining $X$ to be a random
variable on $\theta < x < \infty$ with $1 - \exp(-X + \theta)$ beta distributed; thus

$f(x; \theta, \alpha, \beta) = B(\alpha, \beta)^{-1}\{1 - e^{-(x-\theta)}\}^{\alpha-1} e^{-\beta(x-\theta)} \quad (\theta < x < \infty,\ \alpha > 0,\ \beta > 0). \quad (1.4)$

(iv) Three-parameter log gamma, which arises when $\log(X - \theta + 1)$ has a gamma
distribution; thus

$f(x; \theta, \alpha, \beta) = \beta^{\alpha}\{\log(x-\theta+1)\}^{\alpha-1}(x-\theta+1)^{-\beta-1}/\Gamma(\alpha) \quad (\theta < x < \infty,\ \alpha > 0,\ \beta > 0). \quad (1.5)$
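As a quick sanity check (my illustration, not part of the paper; the parameter values are arbitrary), the Weibull density (1.2) can be coded directly: numerical integration confirms it is a proper density, and near $\theta$ it behaves like $\alpha\beta(x-\theta)^{\alpha-1}$, i.e. $c = \beta$ in the notation of the summary.

```python
import math

def weibull3_pdf(x, theta, alpha, beta):
    # three-parameter Weibull density, equation (1.2)
    if x <= theta:
        return 0.0
    y = x - theta
    return alpha * beta * y ** (alpha - 1.0) * math.exp(-beta * y ** alpha)

theta, alpha, beta = 2.0, 3.0, 0.7
# midpoint-rule integral over (theta, theta + 10): should be close to 1
h = 1e-4
total = sum(weibull3_pdf(theta + (k + 0.5) * h, theta, alpha, beta)
            for k in range(100000)) * h
# boundary behaviour: f(x) / (x - theta)^{alpha-1} -> alpha * beta as x -> theta
ratio = weibull3_pdf(theta + 1e-9, theta, alpha, beta) / (1e-9) ** (alpha - 1.0)
print(total, ratio)
```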

Our results also cover certain instances of the Box-Cox transformation family but we
shall not consider these explicitly.
Of these four examples, the Weibull has been studied the most intensively. For all
four, when $\alpha < 1$ the density tends to infinity as $x \downarrow \theta$ so that, unless the parameter space is
restricted, the likelihood function always tends to infinity along some path in the
parameter space as $\theta$ tends to the sample minimum. Therefore it is necessary to
distinguish between global and local maxima of the likelihood. In this paper, by the
maximum likelihood estimator we shall always mean a local maximum, thus satisfying
the likelihood equations. A second point, which applies to all four examples but may not
be true in general, is that for $\alpha < 1$ the density is J-shaped, so there cannot exist a
maximum likelihood estimator for which $\alpha < 1$. In particular, if the true value of $\alpha$ is less
than 1, maximum likelihood estimators either do not exist at all or are inconsistent.
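The unbounded-likelihood phenomenon for $\alpha < 1$ is easy to exhibit numerically (an illustration, not from the paper; the data are arbitrary): with $\alpha$ and $\beta$ held fixed, the Weibull log likelihood grows without bound as $\theta$ approaches the sample minimum.

```python
import math

def weibull_loglik(theta, alpha, beta, xs):
    # log likelihood of (1.2); requires theta < min(xs)
    return sum(math.log(alpha * beta) + (alpha - 1.0) * math.log(x - theta)
               - beta * (x - theta) ** alpha for x in xs)

xs = [0.50, 0.93, 1.31, 1.72, 2.40]        # arbitrary positive data
alpha, beta = 0.4, 1.0                      # alpha < 1: J-shaped density
lls = [weibull_loglik(min(xs) - eps, alpha, beta, xs)
       for eps in (1e-2, 1e-5, 1e-8, 1e-11)]
print(lls)  # increasing: the likelihood blows up as theta -> min(xs)
```

The dominant term is $(\alpha - 1)\log(x_{(1)} - \theta)$, which tends to $+\infty$ because $\alpha - 1 < 0$.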
Harter & Moore (1965) described an iterative procedure for finding maximum
likelihood estimators for the Weibull and gamma distributions with possibly censored
data. In cases where maximum likelihood estimators do not exist, they proposed an ad
hoc modification based on treating the smallest observation as if it were censored.
Rockette, Antle & Klimko (1974) showed for the Weibull distribution that, if a local
maximum of the likelihood function exists, then there is a second solution of the
likelihood equations which is a saddlepoint. Their result shows that, in finding maximum

likelihood estimators one must take care to ensure that a solution of the likelihood
equations really is a local maximum. Other relevant references are Cohen (1965) and
Lemon (1975). The Weibull distribution may be reparameterized as the generalized
extreme value distribution (Jenkinson, 1955) for which modern algorithms are available
(NERC, 1975; Prescott & Walden, 1980, 1983).
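Rockette et al.'s warning is that a root of the likelihood equations need not be a maximum. A minimal illustration of the kind of check involved (my sketch, not their procedure; the data and parameter values are arbitrary): profile the Weibull log likelihood in $\theta$ with $(\alpha, \beta)$ held at known values, and verify via the second difference that the stationary point found is a genuine local maximum.

```python
import math

def loglik(theta, xs, alpha=3.0, beta=1.0):
    # Weibull log likelihood (1.2) as a function of theta, with (alpha, beta) known
    return sum(math.log(alpha * beta) + (alpha - 1.0) * math.log(x - theta)
               - beta * (x - theta) ** alpha for x in xs)

xs = [0.62, 0.75, 0.88, 0.95, 1.04, 1.17, 1.29, 1.40]   # arbitrary data
grid = [min(xs) - 2.0 + 1e-3 * k for k in range(1999)]   # theta values below min(xs)
vals = [loglik(t, xs) for t in grid]
i = max(range(len(vals)), key=vals.__getitem__)
assert 0 < i < len(vals) - 1                 # maximiser is interior to the grid
second_diff = vals[i - 1] - 2 * vals[i] + vals[i + 1]
print(grid[i], second_diff)                  # second difference < 0: local maximum
```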
Thus there is a fair-sized literature on finding maximum likelihood estimators for the
Weibull distribution, but their asymptotic properties are largely unexplored. It is easily
checked that, when $\alpha > 2$, the Fisher information matrix is finite, and it is widely
assumed that the classical properties hold in this case. For $\alpha \le 2$ the Fisher information
for $\theta$ is infinite, so the classical results are certainly not valid.
In this paper it is confirmed that the classical results hold when $\alpha > 2$, and the case
$\alpha \le 2$ is studied in detail. The surprising result is that, in this case, the maximum
likelihood estimators of $\theta$ and of the other parameters, denoted by $\phi$, in general a vector,
are asymptotically independent: each of the maximum likelihood estimators of $\theta$ and $\phi$
has the same asymptotic distribution when the other is unknown as when the other is
known, and we are also able to show that these asymptotic distributions are independent.
For $1 < \alpha \le 2$ we prove the existence of a consistent sequence of maximum likelihood
estimators as the sample size tends to infinity, while for $\alpha < 1$ no consistent maximum
likelihood estimators exist. We also propose efficient alternatives to maximum
likelihood which, in particular, cover the cases where maximum likelihood estimators
do not exist.
Cheng & Amin (1983) have also studied the asymptotic properties of maximum
likelihood estimators for the Weibull and gamma cases, though their Theorem 2 is less
extensive than our results in §§ 3 and 4 and their proofs are unpublished. On the other
hand they introduce a new estimator, the 'maximum product of spacings' estimator,
which deserves to be studied further. Johnson & Haskell (1983) prove consistency
of the maximum likelihood estimator for the three-parameter Weibull with $\alpha > 1$, and
present Monte Carlo results which indicate that, even in the 'regular' case $\alpha > 2$, the
asymptotic normality of the estimators is approached only slowly.
Our main results require a long list of assumptions, which are stated in § 2. The
remainder of the paper is organized so that statements of the main results appear at the
beginning of each section, and may be understood without reference to the proofs.
Results of a purely technical nature are stated as lemmas and contained within the
proofs of the theorems, though Lemmas 6 and 7 may be of independent interest.

2. ASSUMPTIONS

We consider probability densities of the form

$f(x; \theta, \phi) = (x-\theta)^{\alpha-1}\,g(x-\theta; \phi) \quad (\theta < x < \infty), \quad (2.1)$

where $\theta$ and $\phi$, the latter a vector, are unknown parameters and $g$ tends to a positive limit
as $x \downarrow 0$. We assume $\alpha \equiv \alpha(\phi)$ is a twice continuously differentiable function of $\phi$ with
$0 < \alpha < \infty$. This formulation allows both the case when $\alpha$ is a component of $\phi$ and the case
where it is a known constant. Similarly $c \equiv c(\phi)$. For our four examples the function $g$ is
given by:
(i) Weibull, $g(x; \alpha, \beta) = \alpha\beta\exp(-\beta x^{\alpha})$;
(ii) gamma, $g(x; \alpha, \beta) = \beta^{\alpha}\exp(-\beta x)/\Gamma(\alpha)$;
(iii) beta, $g(x; \alpha, \beta) = B(\alpha, \beta)^{-1}\{(1-e^{-x})/x\}^{\alpha-1} e^{-\beta x}$;
(iv) log gamma, $g(x; \alpha, \beta) = \beta^{\alpha}\{\log(1+x)/x\}^{\alpha-1}(1+x)^{-\beta-1}/\Gamma(\alpha)$.
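The common structure can be checked numerically (my illustration; the parameter values are arbitrary): for each of the four $g$ functions above, $g(x; \alpha, \beta)$ tends to a positive finite limit as $x \downarrow 0$, and $c(\phi) = \alpha^{-1}\lim g$ as in Assumption 1 below.

```python
import math

alpha, beta = 1.7, 0.9                # arbitrary values
B = math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)  # beta function

g = {
    "weibull":   lambda x: alpha * beta * math.exp(-beta * x ** alpha),
    "gamma":     lambda x: beta ** alpha * math.exp(-beta * x) / math.gamma(alpha),
    "beta":      lambda x: ((1 - math.exp(-x)) / x) ** (alpha - 1)
                           * math.exp(-beta * x) / B,
    "log gamma": lambda x: beta ** alpha * (math.log1p(x) / x) ** (alpha - 1)
                           * (1 + x) ** (-beta - 1) / math.gamma(alpha),
}
# limits as x -> 0, read off analytically from the formulas above
limits = {
    "weibull":   alpha * beta,
    "gamma":     beta ** alpha / math.gamma(alpha),
    "beta":      1.0 / B,
    "log gamma": beta ** alpha / math.gamma(alpha),
}
c = {}
for name, fn in g.items():
    val = fn(1e-7)
    assert abs(val / limits[name] - 1) < 1e-4   # g tends to a positive finite limit
    c[name] = val / alpha                        # c(phi) = alpha^{-1} lim g
print(c)
```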


In general, we assume $\theta$ is real-valued and $\phi$ takes values in $\Phi$, an open subset of $\mathbb{R}^p$.
We adopt the convention that subscripts, $\phi_1, \phi_2$, etc., are used to denote elements of $\Phi$,
whilst superscripts, $\phi^1, \phi^2$, etc., denote components of a particular $\phi$. Thus
$\phi_j^i$ $(i = 1, \ldots, p)$ is the $i$th component of $\phi_j$. The symbol $|\phi|$ denotes the Euclidean norm.
The detailed assumptions are as follows.

Assumption 1. All second-order partial derivatives of $g(x; \phi)$ exist and are continuous
in $0 < x < \infty$, $\phi \in \Phi$. Moreover $c(\phi) = \alpha^{-1}\lim_{x \downarrow 0} g(x; \phi)$ exists, is positive and finite
for each $\phi$, and is twice continuously differentiable as a function of $\phi$.

Assumption 2. If $\phi_1$ and $\phi_2$ are distinct elements of $\Phi$ then the set

$\{x: f(x; 0, \phi_1) \neq f(x; 0, \phi_2)\}$

has positive Lebesgue measure.

Assumption 3. For some fixed $\xi^* \ge 0$ and for each $\eta > 0$, $\phi_1 \in \Phi$, we have

$|\log g(x - \varepsilon; \phi) - \log g(x - \varepsilon_1; \phi_1)| \le H_\eta(x; \phi_1),$

whenever $|\varepsilon| \le \eta$, $|\varepsilon_1| \le \eta$, $|\phi - \phi_1| \le \eta$, $x - \varepsilon \ge \xi^*$, $x - \varepsilon_1 \ge \xi^*$, and the function $H_\eta$
satisfies

$\lim_{\eta \to 0} \int_0^\infty H_\eta(x + y; \phi_1)\,f(x; 0, \phi_1)\,dx = 0$

for each $y \ge 0$, $\phi_1 \in \Phi$.

Assumption 4. There exists a fixed increasing sequence of compact subsets $\{K_m, m \ge 1\}$
of $\mathbb{R}^{p+1}$ and a fixed constant $\delta'$ such that $\bigcup_m K_m = \mathbb{R} \times \Phi$ and, for $\theta, \theta_0, \phi, \phi_0$ satisfying
$0 \le \theta_0 - \theta \le \delta'$, $\phi, \phi_0 \in \Phi$, $(\theta, \phi) \notin K_m$, we have

$\log f(x; \theta, \phi) - \log f(x; \theta_0, \phi_0) \le H_m(x; \theta_0, \phi_0) \quad (x > \theta_0),$

where

$-\infty \le \limsup_{m \to \infty} \int_{\theta_0}^\infty H_m(x; \theta_0, \phi_0)\,f(x; \theta_0, \phi_0)\,dx < 0.$

Assumption 5. There exists a fixed increasing sequence of compact subsets $\{K'_m, m \ge 1\}$
of $\Phi$ with $\bigcup_m K'_m = \Phi$, and $\delta' > 0$ a fixed constant, such that for $\phi \in \Phi - K'_m$, $\phi_0 \in \Phi$,
$|\varepsilon| \le \delta'$, $\max(0, \varepsilon) < x < \infty$ and some $\eta' > 0$, we have

$\eta'\alpha(\phi) + \{\alpha(\phi) - \alpha(\phi_0)\}\log x + \log g(x - \varepsilon; \phi) - \log g(x; \phi_0) \le H'_m(x; \phi_0),$

where

$-\infty \le \limsup_{m \to \infty} \int_0^\infty H'_m(x; \phi_0)\,f(x; 0, \phi_0)\,dx < 0.$

Assumption 6. If $E_\phi$ denotes expectation with respect to $f(\cdot\,; 0, \phi)$ then for each
$\phi \in \Phi$ $(i, j = 1, \ldots, p)$:
(a)

$E_\phi\{(\partial/\partial\phi^i)\log f(X; 0, \phi)\} = 0,$

$E_\phi\{(\partial/\partial\phi^i)\log f(X; 0, \phi)\,(\partial/\partial\phi^j)\log f(X; 0, \phi)\} = -E_\phi\{(\partial^2/\partial\phi^i\,\partial\phi^j)\log f(X; 0, \phi)\} = m_{ij}(\phi);$

(b) if $\alpha > 1$, then

$E_\phi\{(\partial/\partial x)\log f(X; 0, \phi)\} = 0,$

$-E_\phi\{(\partial/\partial x)\log f(X; 0, \phi)\,(\partial/\partial\phi^i)\log f(X; 0, \phi)\} = E_\phi\{(\partial^2/\partial x\,\partial\phi^i)\log f(X; 0, \phi)\} = m_{i0}(\phi) = m_{0i}(\phi);$

(c) if $\alpha > 2$, then

$E_\phi[\{(\partial/\partial x)\log f(X; 0, \phi)\}^2] = -E_\phi\{(\partial^2/\partial x^2)\log f(X; 0, \phi)\} = m_{00}(\phi) > 0.$

It is part of the assumption that all these expectations are finite.

Assumption 7. If $h(x; \phi)$ is any of $(\partial^2/\partial x\,\partial\phi^i)\log g(x; \phi)$ or $(\partial^2/\partial\phi^i\,\partial\phi^j)\log g(x; \phi)$,
then, as $\theta \to \theta_0$, $\phi \to \phi_0$,

$E_0|h(X - \theta; \phi) - h(X - \theta_0; \phi_0)| \to 0,$

where $E_0$ is with respect to $f(\cdot\,; \theta_0, \phi_0)$. If $\alpha(\phi_0) > 2$, we require the same of
$h(x; \phi) = (\partial^2/\partial x^2)\log g(x; \phi)$.

Assumption 8. For each $\varepsilon > 0$, $\delta > 0$, there exists a function $h_{\varepsilon,\delta}$ such that

$|(\partial^2/\partial x^2)\log g(x; \phi)| \le \varepsilon/x^2 + h_{\varepsilon,\delta}(y; \phi_0)$

whenever $|\phi - \phi_0| \le \delta$, $|x - y| \le \delta$, and $h_{\varepsilon,\delta}$ satisfies

$\int_0^\infty h_{\varepsilon,\delta}(x; \phi_0)\,f(x; 0, \phi_0)\,dx < \infty.$

Assumption 9. For each $\zeta > 0$, $\phi \in \Phi$,

$\int_\zeta^\infty \{(\partial/\partial x)\log g(x; \phi)\}^2 f(x; 0, \phi)\,dx < \infty.$

We now make some remarks on the assumptions. The key assumptions are Assump-
tions 1 and 6; the rest are there for technical reasons. Assumptions 2-5 are similar to, but
necessarily more complicated than, the classical assumptions of Wald (1949). In
Assumption 3 we may have $\xi^* = 0$ or $\xi^* > 0$; in the Weibull case, Assumption 3 is true
with $\xi^* > 0$ for all $\alpha$ but with $\xi^* = 0$ only for $\alpha > 1$. The distinction is reflected in the
statement of Theorem 2 below. Assumptions 7 and 8 are needed for Lemma 4 in § 3, and
Assumption 9 is taken from Woodroofe (1972, 1974). Assumption 1 could be weakened to
$g(x; \phi)$ slowly varying at $x = 0$ for each $\phi$, but at the cost of an increase in technical
detail.
For all our examples, Assumptions 1-6 and 9 are straightforward, if somewhat
tedious, to check. For examples (ii)-(iv), $\log g$ and its partial derivatives are bounded in $x$
for each $\phi$, so Assumptions 7 and 8 are easy to check as well. For the Weibull distribution
we have

$\log g(x; \alpha, \beta) = \log\alpha + \log\beta - \beta x^{\alpha},$

with mixed second derivatives $\partial^2\log g/\partial x\,\partial\alpha = -\beta x^{\alpha-1}(1 + \alpha\log x)$ and
$\partial^2\log g/\partial x\,\partial\beta = -\alpha x^{\alpha-1},$


which are bounded as $x \downarrow 0$ respectively when $\alpha > 1$ and $\alpha \ge 1$, so Assumption 7 follows
easily. To satisfy Assumption 8, given $0 < \alpha_1 < \alpha_0 < \alpha_2$ and $\delta > 0$, there exists $K$ such
that

$|(\partial^2/\partial x^2)\log g(x; \alpha, \beta)| \le K x^{\alpha_1 - 2} \quad (x \le 1),$

for all $(\alpha, \beta)$ satisfying $\alpha_1 \le \alpha \le \alpha_2$, $|\beta - \beta_0| \le \delta$. Let $x_\varepsilon = (\varepsilon/K)^{1/\alpha_1}$, which we assume to be
$< 1$. Then

$|(\partial^2/\partial x^2)\log g(x; \alpha, \beta)| \le \varepsilon/x^2 + h_{\varepsilon,\delta}(x; \alpha_0, \beta_0),$

where

$h_{\varepsilon,\delta}(x; \alpha_0, \beta_0) = \begin{cases} 0 & (x < x_\varepsilon), \\ K x^{\alpha_1 - 2} & (x_\varepsilon \le x \le 1), \\ K x^{\alpha_2 - 2} & (x > 1). \end{cases}$

This function is bounded as $x \to 0$. If we define $\bar{h}_{\varepsilon,\delta}(y; \alpha_0, \beta_0)$ to be

$\sup\{h_{\varepsilon,\delta}(x; \alpha_0, \beta_0): |x - y| \le \delta\},$

Assumption 8 is satisfied.
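The envelope construction can be verified numerically (my sketch; the constants are arbitrary choices): for the Weibull, $\partial^2\log g/\partial x^2 = -\alpha(\alpha-1)\beta x^{\alpha-2}$, and the piecewise function $h_{\varepsilon,\delta}$ dominates it outside $(0, x_\varepsilon)$ while $\varepsilon/x^2$ dominates it inside.

```python
import math

alpha1, alpha2 = 1.2, 2.5        # bracketing 0 < alpha1 < alpha0 < alpha2
beta_hi = 1.5                    # upper bound for beta on the neighbourhood
eps = 0.05

def d2_log_g(x, alpha, beta):
    # second x-derivative of log g for the Weibull: -alpha*(alpha-1)*beta*x^(alpha-2)
    return -alpha * (alpha - 1.0) * beta * x ** (alpha - 2.0)

# K chosen so |d2 log g| <= K x^{alpha1-2} on x <= 1 over the (alpha, beta) box
K = alpha2 * (alpha2 - 1.0) * beta_hi
x_eps = (eps / K) ** (1.0 / alpha1)

def h(x):
    # the dominating function of the text: 0, then K x^{alpha1-2}, then K x^{alpha2-2}
    if x < x_eps:
        return 0.0
    return K * x ** (alpha1 - 2.0) if x <= 1.0 else K * x ** (alpha2 - 2.0)

grid = [10 ** (-6 + 8 * k / 400) for k in range(401)]    # x from 1e-6 to 1e2
for x in grid:
    for alpha in (1.3, 1.8, 2.4):
        for beta in (0.5, 1.0, 1.5):
            assert abs(d2_log_g(x, alpha, beta)) <= eps / x ** 2 + h(x) + 1e-12
print("bound verified on grid; x_eps =", x_eps)
```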

3. EXISTENCE AND UNIQUENESS OF CONSISTENT ESTIMATORS

From now on we shall let $\theta_0$, $\phi_0$ denote the true values of $\theta$ and $\phi$. The letters $\alpha$ and $c$,
unless indicated otherwise, will always denote $\alpha(\phi_0)$ and $c(\phi_0)$. Let $X_1, \ldots, X_n$ denote a
sample of independent observations from the common density $f(\cdot\,; \theta_0, \phi_0)$. Let $L_n$ denote
the log likelihood divided by $n$, that is,

$L_n(\theta, \phi) = n^{-1}\sum_{i=1}^n \log f(X_i; \theta, \phi).$

The order statistics will be denoted $X_{n,1} \le \ldots \le X_{n,n}$; note that $L_n(\theta, \phi)$ is defined only
for $\theta < X_{n,1}$.
The maximum likelihood estimator, when it exists, will be denoted by $(\hat{\theta}_n, \hat{\phi}_n)$ and
satisfies

$\partial L_n(\hat{\theta}_n, \hat{\phi}_n)/\partial\theta = 0, \quad \partial L_n(\hat{\theta}_n, \hat{\phi}_n)/\partial\phi^i = 0 \quad (i = 1, \ldots, p).$

For the special case when $\theta = \theta_0$ is known, let $\tilde{\phi}_n \equiv \tilde{\phi}_n(\theta_0)$ denote the maximum
likelihood estimator for $\phi$, satisfying $(\partial L_n/\partial\phi^i)(\theta_0, \tilde{\phi}_n) = 0$. The existence and con-
sistency of $\tilde{\phi}_n$ follows from the classical results for regular estimation problems.
Similarly, let $\tilde{\theta}_n \equiv \tilde{\theta}_n(\phi_0)$ denote the maximum likelihood estimator for $\theta$ when $\phi = \phi_0$
is known. The asymptotic properties of $\tilde{\theta}_n$ are given by the results in § 1. In particular,
$\tilde{\theta}_n$ exists and is consistent when $\alpha > 1$.
Define $d_{n,\beta}$ $(n \ge 1)$ to be 1 if $\beta > 1$, $\log n$ if $\beta = 1$ and $n^{1/\beta - 1}$ if $0 < \beta < 1$, and write
$Y_n \le_p r_n$, for random variables $\{Y_n\}$ and positive constants $\{r_n\}$, if

$\lim_{a \to \infty}\ \limsup_{n \to \infty}\ \mathrm{pr}(|Y_n| > a r_n) = 0. \quad (3.1)$
Our main result in this section is the following.

THEOREM 1. Assume Assumptions 1 and 6-8 are satisfied, and that $\alpha > 1$. Suppose that
$M$ is strictly positive-definite, where:
(i) for $\alpha > 2$, $M$ is the $(p+1) \times (p+1)$ matrix with entries $m_{ij}(\phi_0)$ $(i, j = 0, \ldots, p)$;
(ii) for $1 < \alpha \le 2$, $M$ is the $p \times p$ matrix with entries $m_{ij}(\phi_0)$ $(i, j = 1, \ldots, p)$, the functions
$m_{ij}$ being defined in Assumption 6. Then there exists a sequence $(\hat{\theta}_n, \hat{\phi}_n)$ of solutions to
the likelihood equations such that

$\hat{\theta}_n - \theta_0 \le_p (n\,d_{n,\alpha/2})^{-1/2}, \quad \hat{\phi}_n - \phi_0 \le_p n^{-1/2}.$

Moreover, if $\alpha = 2$ we have

$\hat{\theta}_n - \tilde{\theta}_n \le_p n^{-1/2}(\log n)^{-1}, \quad \hat{\phi}_n - \tilde{\phi}_n \le_p (n\log n)^{-1/2},$

while if $1 < \alpha < 2$ we have

$\hat{\theta}_n - \tilde{\theta}_n \le_p n^{1/2 - 2/\alpha}, \quad \hat{\phi}_n - \tilde{\phi}_n \le_p n^{-1/\alpha}.$

Theorem 1 is a result about the local behaviour of the likelihood function near the true
parameter values. There is no guarantee that the local maximum, whose existence is
guaranteed by Theorem 1 when $\alpha > 1$, is unique, and we know already that the local
maximum is not a global maximum. The following result goes part of the way to settling
the uniqueness of the estimator. It is an analogue for this problem of Wald's (1949) result
on the consistency of the maximum likelihood estimator in regular estimation problems.

THEOREM 2. Suppose that, for each $n$, we have a sample of $n$ independent observations
from the density $f(\cdot\,; \theta_0, \phi_0)$, ordered $X_{n,1} \le \ldots \le X_{n,n}$. For fixed $\delta > 0$, $\varepsilon > 0$, $\gamma < \infty$, define
the following regions of the parameter space $\mathbb{R} \times \Phi$: $U$ is the set of $(\theta, \phi)$ for which $\theta \le \theta_0 - \delta$,
$\phi \in \Phi$, and $V_n$ is the set of $(\theta, \phi)$ for which $|\theta - \theta_0| \le \delta$, $|\phi - \phi_0| \ge \varepsilon$ and either $\alpha(\phi) \ge 1$
or $\theta < X_{n,1} - n^{-\gamma}$.
(i) Suppose Assumptions 1-6 hold with $\xi^* = \delta$ in Assumption 3, $\delta' = \delta$ in Assumptions
4 and 5. Then

$\lim_{n \to \infty} \mathrm{pr}\{\sup_U L_n(\theta, \phi) \ge L_n(\theta_0, \phi_0)\} = 0.$

(ii) Suppose Assumptions 1-6 hold with $\xi^* = 0$ in Assumption 3, $\delta' = \delta$ in Assumptions
4 and 5, and $\alpha > 1$. Then

$\lim_{n \to \infty} \mathrm{pr}\{\sup_{V_n} L_n(\theta, \phi) \ge L_n(\theta_0, \phi_0)\} = 0.$

We remark that the main interest in this result lies in the case $\alpha > 1$, when (i) and (ii)
both hold. The theorem then shows that the region on which $X_{n,1} - \theta$ is exponentially
small and $\alpha(\phi) < 1$ is asymptotically the only region where the likelihood function is
badly behaved. When $\alpha < 1$, for all our four examples the log likelihood is J-shaped,
which shows at once that there can be no consistent maximum likelihood estimator.
Note that we have not settled the question of whether there is a unique local
maximum of the likelihood function. The proof of Theorem 1 makes it clear that the
Hessian of the log likelihood is negative-definite on a small neighbourhood, depending on
$n$, around $(\theta_0, \phi_0)$, and results of Mäkeläinen, Schmidt & Styan (1981) then show that
there is a unique local maximum on this neighbourhood. The question of global
uniqueness is much harder to resolve.
We start the proofs with several technical lemmas. In Lemmas 1-3, the only
assumptions made are Assumption 1 and (2.1). Convergence of random variables is
always convergence in distribution unless stated; $\to_p$ means convergence in probability.


LEMMA 1. Let $X_1, \ldots, X_n$ be independent from $f(\cdot\,; \theta_0, \phi_0)$ and let

$S_{n,m} = \sum_{k=1}^n (X_k - \theta_0)^{-m} \quad (m > 0).$

Then $a_n^{-1}(S_{n,m} - b_n) \to W$ for some $a_n$, $b_n$ and $W$ having a stable distribution with index
$\min(2, \alpha/m)$. We may take $a_n = (n\,d_{n,\alpha/(2m)})^{1/2}$. If $m = \alpha$ then also

$S_{n,m}/(cn\log n) \to_p 1.$

Proof. The random variable $(X_i - \theta_0)^{-m}$ has a distribution in a stable domain of
attraction. The conclusion follows from standard results (Feller, 1971, §§ XVII.5, VII.7).
□
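The domain-of-attraction claim rests on the tail behaviour $\mathrm{pr}\{(X-\theta_0)^{-m} > t\} = F(\theta_0 + t^{-1/m}) \sim c\,t^{-\alpha/m}$, which can be checked directly for the Weibull (my illustration; the parameter values are arbitrary):

```python
import math

theta0, alpha, beta, m = 0.0, 1.6, 2.0, 2.0
c = beta                      # for the Weibull, F(theta0 + s) ~ c s^alpha with c = beta

def cdf(x):
    # three-parameter Weibull distribution function
    return 1.0 - math.exp(-beta * (x - theta0) ** alpha) if x > theta0 else 0.0

ratios = []
for t in (1e3, 1e5, 1e7):
    tail = cdf(theta0 + t ** (-1.0 / m))      # pr{(X - theta0)^{-m} > t}
    ratios.append(tail / (c * t ** (-alpha / m)))
print(ratios)   # ratios approach 1: regular variation with index alpha/m
```

Here $\alpha/m = 0.8 < 1$, the heavy-tailed case in which even the mean of $S_{n,m}$ is infinite.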

LEMMA 2. Let $X_1, \ldots, X_n$ be as in Lemma 1, ordered $X_{n,1} \le \ldots \le X_{n,n}$. Define

$S^*_{n,m} = \sum_{k=2}^n (X_{n,k} - X_{n,1})^{-m} \quad (m > 0).$

(i) If $\alpha > m$, then $S^*_{n,m}/S_{n,m} \to_p 1$ as $n \to \infty$.
(ii) If $\alpha \le m$, then $S^*_{n,m} \le_p n^{m/\alpha}$ as $n \to \infty$.

Proof. Define $S^{**}_{n,m} = S_{n,m} - (X_{n,1} - \theta_0)^{-m}$. It is easily seen that $S^{**}_{n,m}/S_{n,m} \to_p 1$, so we
concentrate on showing that $S^*_{n,m}/S^{**}_{n,m} \to_p 1$. Now

$(X_{n,k} - X_{n,1})^{-m} - (X_{n,k} - \theta_0)^{-m} \le (X_{n,k} - \theta_0)^{-m}\,h\{(X_{n,1} - \theta_0)/(X_{n,k} - \theta_0)\},$

where $h(t) = mt(1-t)^{-m-1}$ for $0 \le t < 1$. Note that $h(t)$ is increasing in $t$ and tends to 0 as
$t \to 0$. Therefore

$S^*_{n,m} - S^{**}_{n,m} \le \sum_{k=2}^n (X_{n,k} - \theta_0)^{-m}\,h\{(X_{n,1} - \theta_0)/(X_{n,k} - \theta_0)\}. \quad (3.2)$

Part (i) follows by splitting the sum in (3.2) into two parts, using the monotonicity of $h$
and $(X_{n,k} - \theta_0)^{-m}/S^{**}_{n,m} \to_p 0$ for any fixed $k$. Part (ii) is similar but easier. □
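The elementary inequality behind (3.2), $(1-t)^{-m} - 1 \le mt(1-t)^{-m-1}$ for $0 \le t < 1$, follows from the mean value theorem; a quick numerical confirmation (my illustration):

```python
def gap(t, m):
    # right side minus left side of (1-t)^{-m} - 1 <= m*t*(1-t)^{-m-1}
    return m * t * (1 - t) ** (-m - 1) - ((1 - t) ** (-m) - 1)

for m in (0.5, 1.0, 2.0, 5.0):
    for k in range(1, 100):
        t = k / 100.0
        assert gap(t, m) >= 0.0
print("inequality verified for t in (0,1), several m")
```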
For the results which follow, we need some new notation. Suppose $\{Y_n(\lambda_n), n \ge 1\}$ is a
random sequence indexed by $\lambda_n \in \Lambda_n$, and $\{r_n, n \ge 1\}$ is a sequence of positive constants.
We shall say that $Y_n(\lambda_n) \le_p r_n$ uniformly in $\Lambda_n$ if the relation (3.1) holds uniformly over
$\lambda_n \in \Lambda_n$. Similarly, we say that $Y_n(\lambda_n) \to_p c$ uniformly in $\Lambda_n$ if the convergence in
probability holds uniformly over $\Lambda_n$.
Define

$Y_n^{(1)}(\theta) = n^{-1}\sum_{k=1}^n \log(X_{n,k} - \theta), \qquad Y_n^{(i)}(\theta) = n^{-1}\sum_{k=1}^n (X_{n,k} - \theta)^{-i+1} \quad (i = 2, 3).$

LEMMA 3. Given positive sequences $\{\delta_n\}$, $\{b_n\}$, the following relations hold uniformly over
$\theta_n$ satisfying $\theta_n \le X_{n,1} - b_n$, $|\theta_0 - \theta_n| \le \delta_n$:
(i) $|Y_n^{(1)}(\theta_n) - Y_n^{(1)}(\theta_0)| \le_p \max\{n^{-1}\log(1/b_n),\ n^{-1}\alpha^{-1}\log n,\ \delta_n n^{1/\alpha - 1}\};$
(ii) $|Y_n^{(2)}(\theta_n) - Y_n^{(2)}(\theta_0)| \le_p \max\{(nb_n)^{-1},\ n^{1/\alpha - 1},\ \delta_n n^{2/\alpha - 1}\};$
(iii) $|Y_n^{(3)}(\theta_n) - Y_n^{(3)}(\theta_0)| \le_p \max\{(nb_n^2)^{-1},\ n^{2/\alpha - 1},\ \delta_n n^{3/\alpha - 1}\}.$

Proof. We prove only (ii); (i) and (iii) are similar. Write $Y_n^{(2)}(\theta_n) - Y_n^{(2)}(\theta_0)$ as
the sum of three terms: $n^{-1}(X_{n,1} - \theta_n)^{-1}$, $-n^{-1}(X_{n,1} - \theta_0)^{-1}$, and

$n^{-1}\sum_{k=2}^n \{(X_{n,k} - \theta_n)^{-1} - (X_{n,k} - \theta_0)^{-1}\}.$

The first term is $\le_p (nb_n)^{-1}$ and the second is $\le_p n^{1/\alpha - 1}$. For the third,

$|(X_{n,k} - \theta_n)^{-1} - (X_{n,k} - \theta_0)^{-1}| \le |\theta_n - \theta_0|/(X_{n,k} - X_{n,1})^2.$

But it follows from Lemmas 1 and 2 that

$n^{-1}\sum_{k=2}^n (X_{n,k} - X_{n,1})^{-2} \le_p n^{2/\alpha - 1}.$

Hence this third term is $\le_p \delta_n n^{2/\alpha - 1}$. The result follows by putting together these three
bounds. □

LEMMA 4. Let $\delta_n$, $\delta^*_n$, for $n = 1, 2, \ldots$, be any sequences of positive numbers with
$\delta_n \to 0$, $\delta^*_n \to 0$, and let $b$, $b'$ be any positive finite numbers. Let Assumptions 1 and 6-8
hold.
I (i) Suppose $\alpha > 2$. Then $-(\partial^2/\partial\theta^2)L_n(\theta, \phi) \to_p m_{00}(\phi_0)$ uniformly over

$|\theta - \theta_0| \le bn^{-1/2}, \quad |\phi - \phi_0| \le \delta_n, \quad \theta \le X_{n,1} - \delta^*_n n^{-1/2}.$

I (ii) Suppose $\alpha = 2$. Then $-(\log n)^{-1}(\partial^2/\partial\theta^2)L_n(\theta, \phi) \to_p c$ uniformly over

$|\theta - \theta_0| \le b(n\log n)^{-1/2}, \quad |\phi - \phi_0| \le \delta_n, \quad \theta \le X_{n,1} - \delta^*_n(n\log n)^{-1/2}.$

I (iii) Suppose $\alpha < 2$. Then $-(\partial^2/\partial\theta^2)L_n(\theta, \phi) \le_p n^{2/\alpha - 1}$ uniformly over

$|\theta - \theta_0| \le bn^{-1/\alpha}, \quad |\phi - \phi_0| \le \delta_n, \quad \theta \le X_{n,1} - b'n^{-1/\alpha}.$

Moreover $-n^{-2/\alpha + 1}(\partial^2/\partial\theta^2)L_n(\theta, \phi) \ge Z_n$ on this range, where $\{Z_n\}$ is a sequence
of asymptotically positive random variables in the sense that

$\lim_{a \downarrow 0}\ \lim_{n \to \infty}\ \mathrm{pr}(Z_n > a) = 1.$

II (i) Suppose $\alpha > 1$. Then $-(\partial^2/\partial\theta\,\partial\phi^i)L_n(\theta, \phi) \to_p m_{0i}(\phi_0)$ for $i = 1, \ldots, p$, uniformly
over

$|\theta - \theta_0| \le bn^{-1/\alpha}, \quad |\phi - \phi_0| \le \delta_n, \quad \theta \le X_{n,1} - \delta^*_n n^{-1/\alpha}.$

II (ii) Suppose $\alpha = 1$. Then $(\partial^2/\partial\theta\,\partial\phi^i)L_n(\theta, \phi) \le_p \log n$, for $i = 1, \ldots, p$, uniformly over

$|\theta - \theta_0| \le b(\log n)/n, \quad |\phi - \phi_0| \le \delta_n, \quad \theta \le X_{n,1} - b'/(n\log n).$

II (iii) Suppose $\alpha < 1$. Then $(\partial^2/\partial\theta\,\partial\phi^i)L_n(\theta, \phi) \le_p n^{1/\alpha - 1}$ uniformly over

$|\theta - \theta_0| \le bn^{-1/\alpha}, \quad |\phi - \phi_0| \le \delta_n, \quad \theta \le X_{n,1} - b'n^{-1/\alpha}.$

III For $\alpha > 0$, $-(\partial^2/\partial\phi^i\,\partial\phi^j)L_n(\theta, \phi) \to_p m_{ij}(\phi_0)$ for $i, j = 1, \ldots, p$, uniformly over
$|\theta - \theta_0| \le bn^{-1/\alpha}$, $|\phi - \phi_0| \le \delta_n$, $\theta \le X_{n,1} - b'n^{-1/\alpha}$. If the upper bound on $|\phi - \phi_0|$ is
relaxed to $b$, we still have $(\partial^2/\partial\phi^i\,\partial\phi^j)L_n(\theta, \phi) \le_p 1$ uniformly.

Proof. I. We write

$-\frac{\partial^2}{\partial\theta^2}L_n(\theta, \phi) = \{\alpha(\phi) - 1\}\,Y_n^{(3)}(\theta) - n^{-1}\sum_k \frac{\partial^2}{\partial x^2}\log g(X_{n,k} - \theta; \phi). \quad (3.3)$


(i) The result is true for $\theta = \theta_0$, $\phi = \phi_0$, by Assumption 6 and the weak law of large
numbers, so it suffices to prove that $(\partial^2/\partial\theta^2)\{L_n(\theta, \phi) - L_n(\theta_0, \phi_0)\} \to_p 0$ uniformly. But
this difference is bounded in absolute value by

$|\{\alpha(\phi) - 1\}Y_n^{(3)}(\theta) - \{\alpha(\phi_0) - 1\}Y_n^{(3)}(\theta_0)| + n^{-1}\sum_{k=1}^n \left|\frac{\partial^2}{\partial x^2}\log g(X_{n,k} - \theta; \phi) - \frac{\partial^2}{\partial x^2}\log g(X_{n,k} - \theta_0; \phi_0)\right|,$

and the second term tends to zero in probability. By Lemma 3(iii) and the
continuity of $\alpha(\cdot)$, the first term also tends to zero in probability, uniformly as
required.
(ii) For arbitrarily small $\varepsilon$, $\delta$, the second term in (3.3) is bounded by

$\varepsilon\,Y_n^{(3)}(\theta) + n^{-1}\sum_{k=1}^n h_{\varepsilon,\delta}(X_{n,k} - \theta_0; \phi_0)$

uniformly over $|\theta - \theta_0| \le \delta$, $|\phi - \phi_0| \le \delta$, where the latter term is $O_p(1)$, by
Assumption 8. The result then follows from Lemmas 1 and 3(iii).
(iii) The first part again follows by a combination of Assumption 8 and Lemma 3(iii),
together with Lemma 1. For the second part, note that $-(\partial^2 L_n/\partial\theta^2)$ is bounded
below, up to an error of at most $O_p(1)$, by

$(\alpha - 1)n^{-1}(X_{n,1} - \theta)^{-2} \ge (\alpha - 1)n^{-1}(X_{n,1} - \theta_0 + b'n^{-1/\alpha})^{-2}.$

With $W_n = n^{1/\alpha}(X_{n,1} - \theta_0)$, this becomes $(\alpha - 1)n^{2/\alpha - 1}(W_n + b')^{-2}$. But $W_n$ con-
verges to a nondegenerate Weibull limit by extreme value theory. Hence the
stated conditions are satisfied by $Z_n = (\alpha - 1)(W_n + b')^{-2}$. □

We shall not give the proofs of the remaining parts of Lemma 4. They follow by
arguments similar to those already given, using Assumptions 1, 6 and 7 and the relevant
parts of Lemmas 1 and 3.

LEMMA 5. Let $h$ be a continuously differentiable real-valued function of $p + 1$ real
variables and let $H$ denote the gradient vector of $h$. Suppose that the scalar product of $x$
and $H(x)$ is negative whenever $|x| = 1$. Then $h$ has a local maximum, at which $H = 0$,
at some $x$ with $|x| < 1$.

The proof is omitted.

Proof of Theorem 1. The case $\alpha > 2$ is straightforward, so we do this first. Let $\{\delta_n\}$
be any sequence such that $\delta_n \to 0$, $n^{1/2}\delta_n \to \infty$, and define, for $t \in \mathbb{R}$, $y \in \mathbb{R}^p$,

$f_n(t, y) = \delta_n^{-2} L_n(\theta_0 + \delta_n t,\ \phi_0 + \delta_n y).$

By expanding $\partial f_n/\partial t$ and $\partial f_n/\partial y^j$ as far as the second term and using the results of Lemma
4, we deduce that

$\partial f_n(t, y)/\partial t = -t\,m_{00}(\phi_0) - \textstyle\sum_i y^i m_{0i}(\phi_0) + \varepsilon_{n,0}(t, y),$

$\partial f_n(t, y)/\partial y^j = -t\,m_{0j}(\phi_0) - \textstyle\sum_i y^i m_{ij}(\phi_0) + \varepsilon_{n,j}(t, y),$

where $\varepsilon_{n,j}(t, y) \to_p 0$ uniformly over $t^2 + |y|^2 \le 1$ say, for $j = 0, \ldots, p$.
Let $t^2 + |y|^2 = 1$. We have

$t\,\partial f_n/\partial t + \textstyle\sum_i y^i\,\partial f_n/\partial y^i = -t^2 m_{00} - 2t\textstyle\sum_i y^i m_{0i} - \textstyle\sum_i\sum_j y^i y^j m_{ij} + o_p(1),$


which, as $n \to \infty$, is eventually strictly negative by the assumed positive-definiteness of
$M$. Hence Lemma 5 shows that $f_n$ has a local maximum in the range $t^2 + |y|^2 < 1$, with
probability tending to 1 as $n \to \infty$. Since the sequence $n^{1/2}\delta_n$ may be made to increase
arbitrarily slowly, this proves the result for the case $\alpha > 2$.
Now let $\alpha \le 2$. The argument proceeds by showing that $\hat{\theta}_n$, $\hat{\phi}_n$ are close to $\tilde{\theta}_n$, $\tilde{\phi}_n$
respectively. For $1 \le i \le p$,

$\frac{\partial L_n}{\partial\phi^i}(\theta, \phi) = \frac{\partial L_n}{\partial\phi^i}(\theta, \phi) - \frac{\partial L_n}{\partial\phi^i}(\theta_0, \tilde{\phi}_n) = (\theta - \theta_0)\frac{\partial^2 L_n}{\partial\theta\,\partial\phi^i}(\theta^*, \phi^*) + \sum_j (\phi^j - \tilde{\phi}^j_n)\frac{\partial^2 L_n}{\partial\phi^i\,\partial\phi^j}(\theta^*, \phi^*),$

where $(\theta^*, \phi^*) = \lambda(\theta, \phi) + (1 - \lambda)(\theta_0, \tilde{\phi}_n)$ for some $\lambda$ between 0 and 1. Thus we may write

$\partial L_n(\theta, \phi)/\partial\phi^i = -(\theta - \theta_0)m_{0i} - \textstyle\sum_j(\phi^j - \tilde{\phi}^j_n)m_{ij} + e_{n,i}(\theta, \phi). \quad (3.4)$

Similarly,

$\frac{\partial L_n}{\partial\theta}(\theta, \phi) = \left\{\frac{\partial L_n}{\partial\theta}(\theta, \phi) - \frac{\partial L_n}{\partial\theta}(\theta, \tilde{\phi}_n)\right\} + \left\{\frac{\partial L_n}{\partial\theta}(\theta, \tilde{\phi}_n) - \frac{\partial L_n}{\partial\theta}(\tilde{\theta}_n, \tilde{\phi}_n)\right\} + \left\{\frac{\partial L_n}{\partial\theta}(\tilde{\theta}_n, \tilde{\phi}_n) - \frac{\partial L_n}{\partial\theta}(\tilde{\theta}_n, \phi_0)\right\}, \quad (3.5)$

using $\partial L_n(\tilde{\theta}_n, \phi_0)/\partial\theta = 0$. Let $\theta^*_n$ satisfy

$\frac{\partial L_n}{\partial\theta}(\theta^*_n, \tilde{\phi}_n) - \frac{\partial L_n}{\partial\theta}(\tilde{\theta}_n, \tilde{\phi}_n) = \textstyle\sum_i(\tilde{\phi}^i_n - \phi^i_0)m_{0i}. \quad (3.6)$

For the moment, we assume $\theta^*_n$ exists. Comparing (3.6) with (3.5), we see that

$\frac{\partial L_n}{\partial\theta}(\theta, \phi) = (\theta - \theta^*_n)\frac{\partial^2 L_n}{\partial\theta^2}(\theta^{**}, \tilde{\phi}_n) + e^*_n(\theta, \phi) \quad (3.7)$

for some $\theta^{**}$ between $\theta$ and $\theta^*_n$. If we also define $\phi^*_n$ to satisfy

$0 = (\theta - \theta_0)m_{0i} + \textstyle\sum_j(\phi^{*j}_n - \tilde{\phi}^j_n)m_{ij} \quad (i = 1, \ldots, p),$

then (3.4) becomes

$\partial L_n(\theta, \phi)/\partial\phi^i = -\textstyle\sum_j(\phi^j - \phi^{*j}_n)m_{ij} + e_{n,i}(\theta, \phi). \quad (3.8)$

Note that $\phi^*_n$ must exist, since $M$ is invertible.
To show that $\theta^*_n$ exists, with probability tending to 1, note that $(\partial L_n/\partial\theta)(\theta, \tilde{\phi}_n) \to -\infty$
as $\theta \uparrow X_{n,1}$. Since $\partial L_n/\partial\theta$ is continuous in $\theta$, it suffices
to show that there exists a $\theta$ with

$\frac{\partial L_n}{\partial\theta}(\theta, \tilde{\phi}_n) - \frac{\partial L_n}{\partial\theta}(\tilde{\theta}_n, \tilde{\phi}_n) > \textstyle\sum_i(\tilde{\phi}^i_n - \phi^i_0)m_{0i}.$

The right-hand side is $O_p(n^{-1/2})$ while the left is of the form $(\theta - \tilde{\theta}_n)(\partial^2 L_n/\partial\theta^2)(\theta_1, \tilde{\phi}_n)$ for


some $\theta_1$ between $\theta$ and $\tilde{\theta}_n$. For $\alpha = 2$, $-(\partial^2 L_n/\partial\theta^2)$ is of exact order $\log n$
on $|\theta_1 - \theta_0| \le \delta_n$ by Lemma 4 I(ii), hence $\theta^*_n$ exists and $\theta^*_n - \tilde{\theta}_n \le_p n^{-1/2}(\log n)^{-1}$
as $n \to \infty$. Similarly, for $1 < \alpha < 2$, it follows from Lemma 4 I(iii)
that $\theta^*_n - \tilde{\theta}_n \le_p n^{-2/\alpha + 1/2}$.
Now consider $e_n$ and $e^*_n$. For $\alpha = 2$, we have $\tilde{\theta}_n - \theta_0 \le_p (n\log n)^{-1/2}$ and it follows that
$(n\log n)^{1/2} e_{n,i}(\theta_n, \phi_n) \to_p 0$ along any sequence $(\theta_n, \phi_n)$ such that

$(n\log n)^{1/2}(\theta_n - \tilde{\theta}_n) \to_p 0, \quad \phi_n - \tilde{\phi}_n \le_p (n\log n)^{-1/2}.$

Under the same conditions, we have also $n^{1/2} e^*_n(\theta_n, \phi_n) \to_p 0$. For $1 < \alpha < 2$, $\tilde{\theta}_n - \theta_0 \le_p n^{-1/\alpha}$
and it follows that $n^{1/\alpha} e_{n,i}(\theta_n, \phi_n) \to_p 0$ and $n^{1/2} e^*_n(\theta_n, \phi_n) \to_p 0$ along any sequence $(\theta_n, \phi_n)$
such that $n^{1/\alpha}(\theta_n - \tilde{\theta}_n) \to_p 0$ and $\phi_n - \tilde{\phi}_n \le_p n^{-1/\alpha}$.
We are ready for the final step. Suppose $\alpha = 2$. For $t \in \mathbb{R}$, $y \in \mathbb{R}^p$, define

$f_n(t, y) = n\log n\; L_n\{\theta^*_n + tn^{-1/2}(\log n)^{-1},\ \phi^*_n + y(n\log n)^{-1/2}\}.$

Then

$\frac{\partial f_n}{\partial t} = n^{1/2}\frac{\partial L_n}{\partial\theta}\{\theta^*_n + tn^{-1/2}(\log n)^{-1},\ \phi^*_n + y(n\log n)^{-1/2}\} = t(\log n)^{-1}\frac{\partial^2 L_n}{\partial\theta^2}(\theta^{**}, \tilde{\phi}_n) + n^{1/2}e^*_n\{\theta^*_n + tn^{-1/2}(\log n)^{-1},\ \phi^*_n + y(n\log n)^{-1/2}\}$

by (3.7). But the second term tends to zero in probability, and the first to $-tc$. Similarly,
$\partial f_n/\partial y^j \to_p -\sum_i y^i m_{ij}$ uniformly over $t^2 + |y|^2 \le 1$, say, where we use (3.8) and the
previously stated result about $e_n$. Note that

$\theta^*_n - \tilde{\theta}_n \le_p n^{-1/2}(\log n)^{-1} = o\{(n\log n)^{-1/2}\}, \quad \phi^*_n - \tilde{\phi}_n \le_p (n\log n)^{-1/2},$

so the required conditions are satisfied. Then

$t\,\partial f_n/\partial t + \textstyle\sum_i y^i\,\partial f_n/\partial y^i \to_p -t^2 c - \textstyle\sum_i\sum_j y^i y^j m_{ij} < 0,$

so that, applying Lemma 5, the probability that $f_n$ has a local maximum within the ball
$t^2 + |y|^2 < \delta^2$, for any fixed $\delta > 0$, tends to 1 as $n \to \infty$. Hence $L_n$ has a local maximum
at $(\hat{\theta}_n, \hat{\phi}_n)$ satisfying

$\hat{\theta}_n - \tilde{\theta}_n \le_p n^{-1/2}(\log n)^{-1}, \quad \hat{\phi}_n - \tilde{\phi}_n \le_p (n\log n)^{-1/2} \quad (3.9)$

with probability tending to 1.
The case $1 < \alpha < 2$ is similar to the case $\alpha = 2$. Now define

$f_n(t, y) = n^{2/\alpha} L_n(\theta^*_n + tn^{1/2 - 2/\alpha},\ \phi^*_n + yn^{-1/\alpha}).$

Expanding as far as the second term, using (3.7), (3.8) and Lemma 4 I(iii), we may show
that $t\,\partial f_n/\partial t + \sum_i y^i\,\partial f_n/\partial y^i$ is strictly negative over $t^2 + |y|^2 = \delta^2$, with probability tending
to 1 as $n \to \infty$. Again Lemma 5 may be applied, and we conclude that $f_n$ has a local
maximum satisfying $t^2 + |y|^2 < \delta^2$, for any $\delta > 0$, with probability tending to 1. Hence
$L_n$ has a local maximum at $(\hat{\theta}_n, \hat{\phi}_n)$ satisfying

$\hat{\theta}_n - \tilde{\theta}_n \le_p n^{1/2 - 2/\alpha}, \quad \hat{\phi}_n - \tilde{\phi}_n \le_p n^{-1/\alpha}$

with probability tending to 1 as $n \to \infty$.
The proof of Theorem 1 is complete. □


Proof of Theorem 2. (i) This follows Wald's classical proof very closely; see also Walker
(1969), where similar results are needed as a preliminary to establishing asymptotic
normality of the Bayes estimator in a regular case. There are three steps.
First, for fixed $(\theta_1, \phi_1) \in U$ there exists $b(\theta_1, \phi_1) > 0$ such that

$\lim_{n \to \infty} \mathrm{pr}\{L_n(\theta_1, \phi_1) - L_n(\theta_0, \phi_0) < -b(\theta_1, \phi_1)\} = 1.$

This uses Assumption 2 and Jensen's inequality.
Secondly, for fixed $(\theta_1, \phi_1) \in U$ and all sufficiently small $\eta > 0$, we have

$\lim_{n \to \infty} \mathrm{pr}\{\sup L_n(\theta, \phi) < L_n(\theta_0, \phi_0) - b(\theta_1, \phi_1)/4\} = 1,$

where the supremum is over all $(\theta, \phi)$ such that $|\theta - \theta_1| \le \eta$, $|\phi - \phi_1| \le \eta$. This is proved
by bounding $L_n(\theta, \phi) - L_n(\theta_1, \phi_1)$, using Assumptions 1 and 3 and the first step above.
Thirdly, let $K_m$ be a compact subset of $\mathbb{R} \times \Phi$, as in Assumption 4. Extending the result
of the second step above, we have

$\lim_{n \to \infty} \mathrm{pr}\{\sup_{K_m \cap U} L_n(\theta, \phi) < L_n(\theta_0, \phi_0) - \eta_m\} = 1$

for some $\eta_m > 0$. This is because $K_m$ can be covered by a finite number of open
neighbourhoods of points $(\theta_1, \phi_1)$. But now Assumption 4 allows us to drop the
restriction to $K_m$, provided $m$ is sufficiently large.
(ii) First we note that, if $\theta = \theta_0$ is assumed known then, given $\varepsilon > 0$, there exists $\xi > 0$
such that

$\lim_{n \to \infty} \mathrm{pr}\{\sup_{|\phi - \phi_0| \ge \varepsilon} L_n(\theta_0, \phi) < L_n(\theta_0, \phi_0) - \xi\} = 1. \quad (3.10)$

This may be proved by imitating the arguments used to prove (i), using Assumptions 2
and 3 with the shifts set to zero, and Assumption 5.
Let K' be a compact subset of (F, as in Assumption 5. For

(0, 0 1) E- Vn, 0 E- K f 01 c- Km,


we have

LO(0, 0)-LO(0o, /1) = n {X(+)-1} Ik{lOg (Xn,k -0) -log (Xn, k -Oo)}
+n {oIC()-ok 1) } k log (Xn, k -O)
+n' Xk{log (Xn, k-0; ,) )-log (Xn, k-O, 1)} * (3d11
But if kk-k11 < ij,b <, iv

n Ik{log (Xn, k-O0; 4)-log (Xn, k-o; 0 1?) } < nfl kH(Xn, k fO; 1)

and by Assumption 3 we may choose il and hence 6 sufficiently small that


limpr{n '1kHt,(Xn,k-OO;01) > 4/4} = 1. (3.12)

Note that it is essential. that Assumption 3 holds with 6* = 0 for this step.
Define an event 4'n,,(O, b) to hold if and only if
n7 I{xC() - I} Ek{log (Xn,k-0) -log (Xn,k -o)} < e{oC(4) + I}
We claim that, for any E > 0, it is possible to choose 6 sufficiently small so that

lim pr {gn~(, E(fb(l) for all (0, 0b) E V,n} = 1.F (3d 13)
n -o~


To prove this, fix (θ, φ) ∈ Vn and consider two cases: (a) α(φ) > 1; (b) α(φ) ≤ 1. We
use the inequality

|log (x − θ) − log (x − θ0)| ≤ |θ − θ0| / {x − max (θ, θ0)},

valid whenever x > max (θ, θ0). In case (a),

n^{-1}{α(φ) − 1} Σk {log (X_{n,k} − θ) − log (X_{n,k} − θ0)}

is negative when θ > θ0, and is bounded by n^{-1} δ{α(φ) − 1} Σk (X_{n,k} − θ0)^{-1} when θ < θ0.
Since E{(X1 − θ0)^{-1}} < ∞, we may choose δ sufficiently small, independently of φ,
so that

lim_{n→∞} pr [n^{-1} δ{α(φ) − 1} Σk (X_{n,k} − θ0)^{-1} > ε{α(φ) + 1}] = 0.

In case (b),

n^{-1} Σk |log (X_{n,k} − θ) − log (X_{n,k} − θ0)| ≤ n^{-1} γ log n + δ n^{-1} Σ_{k≥2} (X_{n,k} − X_{n,1})^{-1},

and the last term converges in probability by Lemma 2. Putting the results for (a) and
(b) together, we have (3.13).
We also have E{|log (X1 − θ0)|} < ∞ and may make |α(φ) − α(φ1)| arbitrarily small
by choosing η sufficiently small. Combining this observation with (3.10), (3.12) and (3.13),
choosing ε in (3.13) so that ε{α(φ) + 1} < ξ/4 on the range |φ − φ1| < η, we have

lim_{n→∞} pr {sup Ln(θ, φ) < Ln(θ0, φ0) − ξ/4} = 1,   (3.14)

where the supremum is now taken over all (θ, φ) such that

(θ, φ) ∈ Vn,   φ ∈ K'm,   |φ − φ1| < η,

for fixed φ1. This result may immediately be extended to any finite set of values of φ1,
and hence by compactness to the whole of K'm. Thus (3.14) holds if the supremum is taken
over all (θ, φ) such that (θ, φ) ∈ Vn and φ ∈ K'm.
Now consider the case (θ, φ) ∈ Vn, φ ∉ K'm. Taking ε = 1 in (3.13) and φ1 = φ0 in (3.11),
we have with probability tending to one that

Ln(θ, φ) − Ln(θ0, φ0) ≤ n^{-1} Σk Hη(X_{n,k} − θ0; φ0),

and the result now follows from Assumption 5. □

4. ASYMPTOTIC DISTRIBUTIONS

We are now in a position to state our main results about the asymptotic distributions
of θ̂n and φ̂n.

THEOREM 3. Under the assumptions of Theorem 1, let (θ̂n, φ̂n) denote a sequence of
maximum likelihood estimators satisfying the conclusions of Theorem 1.
(i) If α > 2 then n^{1/2}(θ̂n − θ0, φ̂n − φ0) converges in distribution to a normal random vector
with mean 0 and covariance matrix M^{-1}, where M is as in Theorem 1(i).
(ii) If α = 2 then {(nc log n)^{1/2}(θ̂n − θ0), n^{1/2}(φ̂n − φ0)} converges in distribution to a normal
random vector with covariance matrix of the form

[ 1    0
  0    M^{-1} ],

where M is as in Theorem 1(ii).
(iii) If 1 < α < 2 then {(nc)^{1/α}(θ̂n − θ0), n^{1/2}(φ̂n − φ0)} converges in distribution to (Y, Z), where
Y ∈ R and Z ∈ R^p are independent, Z has a normal distribution with mean 0 and
covariance matrix M^{-1}, where M is as in (ii), and the distribution function of Y is H,
where H is defined in the statement of Theorem 2.4 of Woodroofe (1974).

Woodroofe's definition of H is long-winded but, since we do not know any way to
simplify it, we refer to Woodroofe's paper for the definition.
We also have the following Corollary.

COROLLARY. If α = 2 then

(θ̂n − θ0) {−n ∂²Ln(θ̂n, φ0)/∂θ²}^{1/2} → Z,   (4.1)

(θ̂n − θ0) {−n ∂²Ln(θ̂n, φ̂n)/∂θ²}^{1/2} → Z,   (4.2)

where convergence is in distribution and, in each case, Z is standard normal.

The corollary shows that the variance of the estimators may be estimated asymptotically
by means of the observed information, in a case where the expected information
does not exist. In regular estimation problems, Efron & Hinkley (1978) argued that the
observed information is superior to the expected information as an estimator of variance,
but their argument depends on second-order approximations and conditional arguments.
It may therefore be of some interest that we have an example in which the superiority
of observed information is very easily demonstrated. Our argument, however, applies
only to the specific case α = 2 and therefore is of only slight practical significance.
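The use of observed information as a variance estimator can be sketched numerically. The following example is our own illustration, not taken from the paper: it fits an exponential model (the model and parameter values are arbitrary) and approximates the observed information by central finite differences at the maximum likelihood estimate, where for this particular model it coincides with the expected information evaluated at the same point.

```python
import numpy as np

def log_lik(lam, x):
    # Total log-likelihood of an exponential sample with rate lam.
    return len(x) * np.log(lam) - lam * x.sum()

def observed_info(loglik, theta, x, h=1e-5):
    # Central finite-difference approximation to -d2(loglik)/d(theta)2.
    return -(loglik(theta + h, x) - 2.0 * loglik(theta, x)
             + loglik(theta - h, x)) / h**2

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1000)
lam_hat = 1.0 / x.mean()                    # maximum likelihood estimate of the rate
obs = observed_info(log_lik, lam_hat, x)    # observed information at the MLE
exp_info = len(x) / lam_hat**2              # expected information n/lam^2, at lam_hat
var_hat = 1.0 / obs                         # estimated variance of lam_hat
```

Here 1/obs estimates the variance of the estimate; the point of the corollary is that this recipe remains usable in the nonregular case α = 2, where the expected information is infinite.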

Proofs. We require two preliminary lemmas and a remark. Suppose {(Xk, Yk), k ≥ 1} is
a sequence of independent identically distributed random vectors and let (Sn, Tn)
denote the sum of (Xk, Yk) (k = 1, ..., n).

LEMMA 6. Suppose E(X1) = E(Y1) = 0, E(X1²) = 1, E(Y1²) = +∞, and Sn/n^{1/2} → Z1,
Tn/bn → Z2 in distribution, where Z1 and Z2 are each standard normal. Then

(Sn/n^{1/2}, Tn/bn) → (Z1, Z2)

in distribution, with Z1, Z2 independent.

Proof. By Theorem 3.1 of an Erasmus University, Rotterdam, technical report by
L. de Haan, E. Omey and S. I. Resnick, it suffices to show that the conditional
expectations satisfy

E(X1 Y1; |Y1| ≤ bn) / {E(Y1²; |Y1| ≤ bn)}^{1/2} → 0.

By splitting into |X1| ≤ M and |X1| > M and using the Cauchy–Schwarz inequality,
this is less than

M E(|Y1|) / {E(Y1²; |Y1| ≤ bn)}^{1/2} + {E(X1²; |X1| > M)}^{1/2},

each term of which may be made arbitrarily small by taking first M and then n
sufficiently large. □

LEMMA 7. Suppose X1, X2, ... are independent, identically distributed, positive random
variables whose density f satisfies f(x) ~ αcx^{α−1} as x ↓ 0, and that the function g
satisfies ∫ g(x) f(x) dx = 0, ∫ g(x)² f(x) dx = 1. Let Mn = min (X1, ..., Xn),
Wn = g(X1) + ... + g(Xn) and an = (nc)^{-1/α}. Then

(Mn/an, Wn/n^{1/2})

converges weakly to a pair of independent random variables, the first with distribution
function 1 − exp (−u^α) on u > 0 and the second standard normal.

Proof. The case where g is the identity or a linear function is dealt with by Chow &
Teugels (1978), and our method closely follows theirs. It suffices to show that

E{exp (it Wn/n^{1/2}); Mn > an u} → exp (−½t² − u^α).

The left-hand side may be written

[1 + n^{-1} {n ∫_0^∞ [exp {it g(x)/n^{1/2}} − 1] f(x) dx − n ∫_0^{an u} exp {it g(x)/n^{1/2}} f(x) dx}]^n.

But it follows from the standard proof of the central limit theorem that

n ∫_0^∞ [exp {it g(x)/n^{1/2}} − 1] f(x) dx → −½t²,

and from the asymptotic form of f that

n ∫_0^{an u} f(x) dx → u^α.

The result therefore follows by calculating that

n |∫_0^{an u} [exp {it g(x)/n^{1/2}} − 1] f(x) dx| ≤ |t| {∫_0^{an u} g(x)² f(x) dx}^{1/2} {n ∫_0^{an u} f(x) dx}^{1/2} → 0

as n → ∞. □
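The lemma can be illustrated by a small Monte Carlo experiment; the sketch below is our own, with X uniform on (0, 1), so that α = 1, c = 1 and an = n^{-1}, and with g(x) = √12 (x − ½), for which ∫ g f = 0 and ∫ g² f = 1.

```python
import numpy as np

# Monte Carlo sketch of the lemma for X uniform on (0, 1): alpha = 1, c = 1,
# a_n = 1/n, g(x) = sqrt(12) * (x - 1/2).
rng = np.random.default_rng(2)
n, reps = 2000, 4000
x = rng.random((reps, n))
m = n * x.min(axis=1)                                    # M_n / a_n
w = np.sqrt(12.0) * (x - 0.5).sum(axis=1) / np.sqrt(n)   # W_n / sqrt(n)

corr = np.corrcoef(m, w)[0, 1]   # near zero: asymptotic independence
p_exp = np.mean(m < 1.0)         # near 1 - exp(-1): limiting law of M_n/a_n at u = 1
```

The empirical correlation between Mn/an and Wn/n^{1/2} should be near zero, and pr (Mn/an < 1) near 1 − e^{-1}, the value of the limiting distribution function 1 − exp (−u^α) at u = 1 when α = 1.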

Remark 1. Lemma 6 extends the result of Resnick & Greenwood (1979), Theorem 3,
that if Sn/an → Z1, Tn/bn → Z2 with Z1 normal and Z2 stable with index less than 2, then
(Sn/an, Tn/bn) → (Z1, Z2), with Z1, Z2 necessarily independent. The key point in the proof
is the observation that the limit (Z1, Z2) must be infinitely divisible and therefore the
sum of independent Gaussian and Lévy components. Our remark is that the same result
holds if (Z1, Z2) arises as the limit of renormalized row sums of a triangular array subject
to the usual asymptotic negligibility condition. That is, if Z1 is normal and Z2 has an
infinitely divisible distribution without a Gaussian component, then Z1 and Z2 are
independent.

Proof of Theorem 3. (i) Theorem 1 shows the existence of (θ̂n, φ̂n) with |θ̂n − θ0| <p n^{-1/2},
|φ̂n − φ0| <p n^{-1/2}, and the proof of Theorem 1 shows that the second derivatives of Ln are
asymptotically constant in this region. The result therefore follows by standard
arguments.
(ii) Since (n log n)^{1/2}(θ̂n − θ̃n) →p 0 and n^{1/2}(φ̂n − φ̄n) →p 0, it suffices to prove the result
with θ̃n, φ̄n in place of θ̂n, φ̂n, where θ̃n and φ̄n denote the maximum likelihood estimators
of θ with φ = φ0 known and of φ with θ = θ0 known, respectively. For θ̃n alone, the
asymptotic distribution is given by Woodroofe (1972). For φ̄n alone, the asymptotic
distribution is given by the classical results for


regular estimation problems. Therefore the only thing to show is the asymptotic
independence of θ̃n and φ̄n.
Now (nc log n)^{1/2}(θ̃n − θ0) may be written as

(nc log n)^{1/2} {∂Ln(θ0, φ0)/∂θ} {−∂²Ln(θn*, φ0)/∂θ²}^{-1},

where θn* satisfies |θn* − θ0| <p (n log n)^{-1/2}. By Lemma 4 I(ii),

−(c log n)^{-1} ∂²Ln(θn*, φ0)/∂θ² →p 1,

so we have

(nc log n)^{1/2}(θ̃n − θ0) = {n/(c log n)}^{1/2} {∂Ln(θ0, φ0)/∂θ} Vn,

where Vn →p 1 and hence plays no role in determining the limit. Now it follows from
Lemma 1 and Assumption 9 (Woodroofe, 1972) that ∂Ln(θ0, φ0)/∂θ has infinite variance,
but that {n/(c log n)}^{1/2} ∂Ln(θ0, φ0)/∂θ converges to a standard normal variable Z0.
Similar arguments applied to n^{1/2}(φ̄n − φ0), together with the Cramér–Wold device and
Lemma 6, allow us to assert that Z0 is independent of the asymptotic distribution of
n^{1/2}(φ̄n − φ0), as required.
(iii) Since n^{1/α}(θ̂n − θ̃n) →p 0 and n^{1/2}(φ̂n − φ̄n) →p 0, it again suffices to prove the result
with θ̃n, φ̄n in place of θ̂n, φ̂n, and hence the only thing to show is the independence of the
asymptotic distributions of θ̃n and φ̄n. Our proof will make use of Remark 1 as well as
Lemma 7.
Let t > 0, y ∈ R and consider

pr {(cn)^{1/α}(θ̃n − θ0) ≤ −t, n^{1/2} Σj aj(φ̄jn − φj0) ≤ y}   (4.3)

for fixed a1, ..., ap. In the notation of Woodroofe (1974) this is the same as

pr {Znt ≥ 0, n^{1/2} Σj aj(φ̄jn − φj0) ≤ y},

where Znt (n ≥ 1) is a sequence of renormalized row sums of a triangular array,
converging to an infinitely divisible limit. The limit is given in Woodroofe's Theorem 2.2,
and does not contain a Gaussian component. Therefore the limiting probability in (4.3) is
the same as that of

pr {(cn)^{1/α}(θ̃n − θ0) ≤ −t} pr {n^{1/2} Σj aj(φ̄jn − φj0) ≤ y}.

Similar arguments may be applied to the limits of

pr {(cn)^{1/α}(θ̃n − θ0) ≤ 0, n^{1/2} Σj aj(φ̄jn − φj0) ≤ y},

pr {(cn)^{1/α}(θ̃n − θ0) > t, n^{1/2} Σj aj(φ̄jn − φj0) ≤ y | X_{n,1} > θ0 + t(cn)^{-1/α}}

for t > 0, using Theorems 2.1 and 2.3 of Woodroofe's paper. The last equation implies
that

lim pr {(cn)^{1/α}(θ̃n − θ0) > t, n^{1/2} Σj aj(φ̄jn − φj0) ≤ y}
 = lim pr {(cn)^{1/α}(θ̃n − θ0) > t, n^{1/2} Σj aj(φ̄jn − φj0) ≤ y | X_{n,1} > θ0 + t(cn)^{-1/α}}
   × pr {X_{n,1} > θ0 + t(cn)^{-1/α}}
 = lim pr {(cn)^{1/α}(θ̃n − θ0) > t | X_{n,1} > θ0 + t(cn)^{-1/α}}
   × pr {n^{1/2} Σj aj(φ̄jn − φj0) ≤ y | X_{n,1} > θ0 + t(cn)^{-1/α}} pr {X_{n,1} > θ0 + t(cn)^{-1/α}}.


Lemma 7 implies the asymptotic independence of (cn)^{1/α}(X_{n,1} − θ0) and the score
statistic, evaluated at φ0, for the parameter Σj aj φj. It easily follows that the second factor
in this expression is independent of t, and hence that the whole expression equals

lim pr {(cn)^{1/α}(θ̃n − θ0) > t} pr {n^{1/2} Σj aj(φ̄jn − φj0) ≤ y}.

This proves the independence of the asymptotic distributions of θ̃n and Σj aj φ̄jn for all
a1, ..., ap, and hence gives the required result. □

Proof of Corollary. Rewrite (4.1) as

(nc log n)^{1/2}(θ̂n − θ0) {−(c log n)^{-1} ∂²Ln(θ̂n, φ0)/∂θ²}^{1/2},

the product of two factors, the first converging in distribution to standard normal and
the second in probability to 1. This gives (4.1), and (4.2) is similar. Note that (4.1) also
holds if α > 2.

5. AN ALTERNATIVE TO MAXIMUM LIKELIHOOD

The complicated nature of the preceding results when 1 < α < 2, and the nonexistence
of the maximum likelihood estimator when α < 1, make it desirable to seek some alternative
estimator. An obvious candidate for a point estimator of θ is X_{n,1}, the sample minimum.
The asymptotic distribution of n^{1/α}(X_{n,1} − θ0) is Weibull, and Akahira's (1975a, b) results show
that no point estimator of θ converges at a faster rate when α < 2. Thus it seems reasonable to
use X_{n,1} as an estimator of θ when α < 2. The difficulties are, first, that it is generally not
known a priori whether α < 2 and, secondly, that we still need an estimator of φ.
In this section we propose a new estimator φ̃n of φ. It is consistent, so that it may be
used to discriminate between the cases α > 2, α < 2, and when α < 2 it is asymptotically
efficient, and may therefore be used in place of the maximum likelihood estimator φ̂n.
The new estimator is defined as the local maximum of the function

Σ_{k=2}^{n} log f(X_{n,k}; X_{n,1}, φ).

It is therefore equivalent to estimating θ by the smallest observation, and dropping that
observation for the estimation of φ by maximum likelihood.
Define a modified likelihood function

L̃n(θ, φ) = n^{-1} Σ_{k=2}^{n} log f(X_{n,k}; θ, φ).

Thus our new estimator satisfies

∂L̃n(X_{n,1}, φ̃n)/∂φi = 0   (i = 1, ..., p).   (5.1)
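To make the construction concrete, suppose for illustration that f is a shifted exponential density, f(x; θ, φ) = φ^{-1} exp {−(x − θ)/φ} for x > θ (so that α = 1 and the modified likelihood is maximized in closed form). The following sketch is our own, with arbitrary parameter values: θ is estimated by X_{n,1}, that observation is dropped, and φ is estimated by maximum likelihood from the rest.

```python
import numpy as np

def alternative_estimate(x):
    # Estimate theta by the sample minimum, drop that observation, and
    # estimate phi by maximum likelihood from the remaining n - 1 points.
    x = np.sort(np.asarray(x))
    theta_hat = x[0]
    phi_hat = (x[1:] - theta_hat).mean()  # exponential-mean MLE on the rest
    return theta_hat, phi_hat

rng = np.random.default_rng(1)
sample = 3.0 + rng.exponential(scale=2.0, size=5000)  # theta0 = 3, phi0 = 2
theta_hat, phi_hat = alternative_estimate(sample)
```

For this model the ordinary maximum likelihood estimator of θ is also the sample minimum, but the construction above applies unchanged to any parametric family of the form considered in this paper, with the closed-form step replaced by a numerical maximization over φ.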

THEOREM 4. Under the conditions of Theorem 1, with probability tending to one
there exists an estimator φ̃n satisfying (5.1) and, with φ̄n denoting the maximum likelihood
estimator of φ when θ = θ0 is known, we have

|φ̃n − φ̄n| <p n^{-1/α}   (α > 1),
|φ̃n − φ̄n| <p n^{-1} log n   (α < 1).

COROLLARY. For all α, φ̃n is a consistent estimator of φ0. If α < 2, then φ̃n is also
asymptotically efficient, and n^{1/2}(φ̃n − φ0) converges to a normal distribution with mean
zero and covariance matrix M^{-1}, where M is the p × p matrix with entries

mij(φ0)   (i, j = 1, ..., p).

Remark. The results of this section continue to hold if X_{n,1} is replaced by any n^{1/α}-consistent
estimator of θ in (5.1), provided it does not exceed X_{n,1}.
Proofs. We start with some elementary observations about L̃n. Clearly,

Ln(θ, φ) − L̃n(θ, φ) = n^{-1} log f(X_{n,1}; θ, φ)
 = n^{-1}{α(φ) − 1} log (X_{n,1} − θ) + n^{-1} log g(X_{n,1} − θ; φ).

When θ = θ0 we have

log (X_{n,1} − θ0) = Op(log n),   log g(X_{n,1} − θ0; φ) →p log c(φ),

so that the whole expression is Op(n^{-1} log n). The same applies to

(∂/∂φi) {Ln(θ0, φ) − L̃n(θ0, φ)},

since

(∂/∂φi) log g(X_{n,1} − θ0; φ) →p (∂/∂φi) log c(φ)

by Assumption 1. We also note that parts II and III of Lemma 4 apply with L̃n in place
of Ln, and without the lower bound restriction on X_{n,1} − θ0.
Let φ̄n be the maximum likelihood estimator of φ when θ = θ0 is known, so that
∂Ln(θ0, φ̄n)/∂φi = 0, and let

λin = (∂/∂φi) L̃n(θ0, φ̄n) = (∂/∂φi) {L̃n(θ0, φ̄n) − Ln(θ0, φ̄n)},

which is <p n^{-1} log n, by the preceding remarks. Therefore

∂L̃n(X_{n,1}, φ)/∂φi = λin + (X_{n,1} − θ0) ∂²L̃n(θ*, φ*)/∂θ ∂φi + Σj (φj − φ̄jn) ∂²L̃n(θ*, φ*)/∂φi ∂φj,

where (θ*, φ*) = λ(X_{n,1}, φ) + (1 − λ)(θ0, φ̄n) for some λ (0 < λ < 1).
Suppose α < 1. Let bn, n ≥ 1, be any positive sequence such that bn → 0,
nbn/log n → ∞. Define, for y ∈ R^p such that |y| ≤ 1,

fn(y) = bn^{-2} L̃n(X_{n,1}, φ̄n + bn y).   (5.2)

Then

∂fn/∂yi = bn^{-1} λin + bn^{-1}(X_{n,1} − θ0) ∂²L̃n{θn*(y), φn*(y)}/∂θ ∂φi + Σj yj ∂²L̃n{θn*(y), φn*(y)}/∂φi ∂φj,

where the dependence of (θ*, φ*) on both n and y is indicated. Now, for all y with |y| ≤ 1
we have

|θn*(y) − θ0| ≤ |X_{n,1} − θ0| <p n^{-1/α},   |φn*(y) − φ̄n| ≤ bn → 0,

so by Lemma 4, modified to apply to L̃n as indicated above,

∂²L̃n{θn*(y), φn*(y)}/∂θ ∂φi <p n^{1/α − 1}   (α < 1),
∂²L̃n{θn*(y), φn*(y)}/∂φi ∂φj + mij →p 0,

uniformly over y. Therefore

∂fn/∂yi = −Σj mij yj + εin(y),   (5.3)


where εin(y) →p 0 uniformly over |y| ≤ 1. It follows, by the same argument as used
previously, that with probability tending to 1 there exists φ̃n satisfying
|φ̃n − φ̄n| < bn, ∂L̃n(X_{n,1}, φ̃n)/∂φi = 0 for i = 1, ..., p. Since bn may be chosen so that
nbn/log n increases arbitrarily slowly, the result of the theorem follows in the case α < 1.
The argument for α > 1 is almost identical. Now choose bn (n ≥ 1) so that
bn → 0, n^{1/α} bn → ∞, and define fn by (5.2). We have ∂²L̃n/∂θ ∂φi <p 1 uniformly over such
a region, so bn^{-1}(X_{n,1} − θ0) <p bn^{-1} n^{-1/α} → 0, and so (5.3) again holds. The rest of the
argument is the same. □

Proof of Corollary. Consistency, for all α, follows from Theorem 4 and the consistency
of φ̄n. The results for α < 2 follow by noting that

n^{1/2}(φ̃n − φ0) = n^{1/2}(φ̄n − φ0) + n^{1/2}(φ̃n − φ̄n),

and the last term tends in probability to zero. Therefore the asymptotic distribution of
n^{1/2}(φ̃n − φ0) is the same as that of n^{1/2}(φ̄n − φ0), which is as claimed. □

6. HYPOTHESIS TESTING

In this section we consider testing hypotheses about θ, making first a few remarks
about the case φ known before turning to the case φ unknown. The results given are
of course also relevant to the construction of confidence intervals.
Consider first a simple versus simple test of H0: θ = θ0, φ = φ0 against H1: θ = θn,
φ = φ0, based on sample size n. The Neyman–Pearson test is to reject H0 if

n{Ln(θn, φ0) − Ln(θ0, φ0)} > c*   (6.1)

for some critical value c*. If α < 2, the natural choice of θn is of the form

θn = θ0 + n^{-1/α} t,   (6.2)

for fixed real t, since in this case a test of fixed size will have limiting power strictly
between 0 and 1.
Now suppose φ0 is unknown but estimated by φ̃n, as defined in §5. The obvious
analogue of (6.1) is the test: reject H0 if

n{Ln(θn, φ̃n) − Ln(θ0, φ̃n)} > c*.   (6.3)

To compare (6.1) and (6.3), note that

n{Ln(θn, φ̃n) − Ln(θ0, φ̃n) − Ln(θn, φ0) + Ln(θ0, φ0)}
 = n(θn − θ0) Σj (φ̃jn − φj0) ∂²Ln(θn*, φn*)/∂θ ∂φj

for some θn*, φn*. If 1 < α < 2 we have

θn − θ0 = O(n^{-1/α}),   |φ̃n − φ0| <p n^{-1/2},   ∂²Ln/∂θ ∂φj <p 1,

so the whole expression is Op(n^{1/2 − 1/α}) and thus tends to zero in probability. For
α = 1, α < 1 we have ∂²Ln/∂θ ∂φj <p log n, n^{1/α − 1} respectively, leading to the same
conclusion. Therefore the two tests (6.1) and (6.3) are asymptotically equivalent, in the
sense that they make the same decision in large samples, provided α < 2.
Note that this conclusion is false if α > 2, even if φ̂n is used in place of φ̃n. In that case
we have θn − θ0 = O(n^{-1/2}) and hence that n(θn − θ0) Σj (φ̃jn − φj0) ∂²Ln/∂θ ∂φj is Op(1) rather
than op(1), so that the equivalence of (6.1) and (6.3) fails. When α = 2, we have


|φ̃n − φ0| = Op(n^{-1/2}) by Theorem 4, but in this case θn − θ0 = O{(n log n)^{-1/2}},
so that (6.1) and (6.3) are again asymptotically equivalent.
Our conclusion is that, for testing a simple hypothesis about θ against a simple
alternative, ignorance about φ makes no difference to the asymptotic power of the test
when α ≤ 2, but it decreases the power if α > 2.
Now consider a test of H0: θ = θ0 against the composite alternative H1: θ ≠ θ0. For
the construction of two-sided confidence intervals, in particular, we wish to consider this
case. In regular cases, there are three widely used tests, all of which are to first order
asymptotically efficient, namely the Wald test, which is based on the asymptotic
distribution of the maximum likelihood estimator, the score test, which is based on the
first derivative of the log likelihood, and the likelihood ratio test; see, for example, Cox &
Hinkley (1974, Ch. 9). In the nonregular case α < 2, even when φ is known, these tests
are not first-order equivalent and it is not known whether any of them has any
optimality properties. The following discussion is therefore concerned purely with
distributional properties and not with asymptotic optimality.
For the Wald test, the test statistic is the maximum likelihood estimator, assuming it
exists. Woodroofe's (1974) results determine the asymptotic distribution of the maximum
likelihood estimator in the case 1 < α < 2, but this is not easy to work with. For the
likelihood ratio test, nothing appears to be known about the asymptotic distribution of
the test statistic. For the score test, however, things are rather simpler, since the
asymptotic distribution of the score statistic is known and the test does not require
computation, or even existence, of the maximum likelihood estimator. Therefore we
concentrate on the score test, believing that this provides at least a viable method of
testing hypotheses about θ even though it may not have asymptotic optimality
properties.
The score statistic for testing H0: θ = θ0 when φ = φ0 is known is

∂Ln(θ0, φ0)/∂θ = −(α − 1) n^{-1} Σk (X_{n,k} − θ0)^{-1} + n^{-1} Σk (∂/∂θ) log g(X_{n,k} − θ0; φ0).

When α < 2 the second term is Op(n^{-1/2}), whereas the first has infinite variance;
E(∂Ln/∂θ) = 0 when α > 1, by Assumption 6. Hence ∂Ln/∂θ has the same asymptotic
distribution as n^{-1}{Sn,1 − E(Sn,1)} when α > 1, and n^{-1} Sn,1 when α < 1, and this asymptotic
distribution is stable, by Lemma 1. In particular, when 1 < α < 2 or α < 1, n^{1 − 1/α} ∂Ln/∂θ
has a nondegenerate stable limit law.
A two-sided test is therefore defined by the acceptance region

a < n an^{-1} {∂Ln(θ0, φ0)/∂θ} − bn < b,   (6.4)

where an, bn are as in Lemma 1 and the percentage points a, b are calculated from the
limiting stable law. For α = 2, the acceptance region is

a < {n/(c log n)}^{1/2} ∂Ln(θ0, φ0)/∂θ < b,   (6.5)

with a, b the appropriate percentage points of the standard normal distribution.
Now let us consider the effect of φ0 being unknown, and estimated by φ̃n. When
1 < α < 2 we have

n^{1 − 1/α} {∂Ln(θ0, φ̃n)/∂θ − ∂Ln(θ0, φ0)/∂θ} = n^{1 − 1/α} Σj (φ̃jn − φj0) ∂²Ln(θ0, φn*)/∂θ ∂φj,

with φn* between φ̃n and φ0. But ∂²Ln/∂θ ∂φj <p 1 uniformly on a region of the form
|θ − θ0| = o(1), |φ − φ0| = o(1), so the whole expression tends to zero in probability,
uniformly on a region of the form |θ − θ0| = O(n^{-1/α}). Similar arguments show that the
same result holds for α < 1, α = 1 and α = 2. We therefore conclude that the tests
(6.4) and (6.5) remain valid when φ0 is unknown, provided that φ0 is replaced by its
estimated value φ̃n or by some other n^{1/2}-consistent estimator.
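To illustrate the computation of the score statistic, consider the three-parameter Weibull density with shape β, scale σ and endpoint θ, for which log f = log β − β log σ + (β − 1) log (x − θ) − {(x − θ)/σ}^β. The sketch below is our own illustration; the parameter values are arbitrary, and β = 2.5 is chosen so that the example lies in the regular range α = β > 2.

```python
import numpy as np

def score_theta(x, theta0, beta, sigma):
    # d Ln/d theta for the three-parameter Weibull, with Ln the average
    # log-likelihood and (beta, sigma) held at given (e.g. estimated) values.
    z = np.asarray(x) - theta0
    return np.mean(-(beta - 1.0) / z + beta * z**(beta - 1.0) / sigma**beta)

# At the true parameters the score has mean zero.
rng = np.random.default_rng(3)
x = 1.0 + 2.0 * rng.weibull(2.5, size=20000)   # theta0 = 1, sigma = 2
s = score_theta(x, 1.0, 2.5, 2.0)
```

In a test of H0: θ = θ0 this statistic would then be centred and scaled as in (6.4) or (6.5), with the appropriate stable or normal percentage points according to the value of α.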
As previously remarked, no claim of optimality is made for these procedures. In one-parameter
problems, with θ unknown and φ known, the problem of constructing asymptotically
optimal procedures when α < 2 is considered in detail in Chapters 5 and 6 of Ibragimov
& Has'minskii (1981). These results are complicated and depend on the loss function.

7. APPLICATIONS IN EXTREME VALUE THEORY

The results of this paper may be applied to two particular distributions which are of
importance in the analysis of extreme values. These are the generalized extreme value
distribution, which includes the three-parameter Weibull as a special case, and the
generalized Pareto distribution introduced by Pickands (1975).
The density function of the generalized extreme value distribution is

g(y;k,y,u) = a-_{1p-kl-l(ylu} l/k-1 exp[-{1-kFi(y_4)}l/k] (7.1)


defined on the set {y: k(y - p) < a}. In the case k > 0, this is a reparameterization of the
Weibull distribution, while the cases k < 0, k = 0, defined by taking the limit as k - 0 in
(7-1), correspond to the type II and type I extreme value laws, in Gumbel's (1958)
characterization. For computational details, see Prescott & Walden (1980, 1983). Recent
reviews on the fitting of extreme value distributions and discrimination among types
have been given by Mann (1984) and Tiago de Oliveira (1984).
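A direct transcription of (7.1) into code, with the k → 0 Gumbel limit handled as a separate branch, might look as follows; this is an illustrative sketch of ours, not code from the paper, and the switch-over threshold for the Gumbel branch is an arbitrary implementation choice.

```python
import numpy as np

def gev_density(y, k, sigma, mu):
    # Generalized extreme value density (7.1); zero outside {y: k(y - mu) < sigma}.
    if abs(k) < 1e-9:                       # Gumbel (type I) limit as k -> 0
        u = (y - mu) / sigma
        return np.exp(-u - np.exp(-u)) / sigma
    t = 1.0 - k * (y - mu) / sigma
    if t <= 0.0:
        return 0.0
    return t**(1.0 / k - 1.0) * np.exp(-t**(1.0 / k)) / sigma
```

For k > 0 the density behaves like a multiple of (endpoint − y)^{1/k − 1} near the endpoint μ + σ/k, which is what links the GEV shape parameter to the index α of this paper.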
For k > 0, the results of this paper are directly applicable, and show in particular that
the classical properties of maximum likelihood estimators hold for 0 < k < ½, but not for
k ≥ ½. The information matrix is finite over the whole range −∞ < k < ½. For k < 0, the
range of the distribution again depends on the unknown parameters, the density being
positive when y > μ + σ/k. Writing β = −1/k, θ = μ + σ/k gives the reparameterization

g(y; β, θ, σ) = σ^{-1} {(y − θ)/(βσ)}^{−β − 1} exp [−{(y − θ)/(βσ)}^{−β}]   (y > θ),

which converges to zero faster than any power of y − θ as y ↓ θ. Although this does not fall
within the scope of our Theorem 1, the arguments that were applied there to the case
α > 2 are applicable here also, and show the existence of a maximum likelihood estimator
which is asymptotically normal and efficient. We conclude that the classical asymptotic
properties of maximum likelihood estimators hold throughout the range −∞ < k < ½,
while for k ≥ ½ the results for the three-parameter Weibull are directly applicable.
We remark that this argument applies also to the three-parameter log normal
distribution. In this case also, the likelihood function is unbounded but there is a local
maximum which is asymptotically normal and efficient. This observation provides a
theoretical justification for the procedures advocated by Griffiths (1980). Another
distribution for which similar arguments hold is the inverse Gaussian distribution with
unknown location (Cheng & Amin, 1981).
The generalized Pareto distribution has density

g(y; k, σ) = σ^{-1} (1 − ky/σ)^{1/k − 1}   (7.2)

defined on the set {y: y > 0, ky < σ}. Its importance in extreme value theory arises from
the fact that it characterizes the limiting distributions of the excesses over a threshold, as
the threshold tends to infinity (Pickands, 1975; Davison, 1984; Smith, 1984). When
k < 0, it is the same as the Pareto distribution, and when k = 0, defined as the limit as
k → 0, it is the exponential distribution with mean σ. For k > 0, the density is a
transformation of the three-parameter beta distribution of §1, with β = 1. Thus we get
analogous results to those for the generalized extreme value distribution: for k < ½ the
information matrix is finite and the classical asymptotic theory of maximum likelihood
estimators is applicable, while for k ≥ ½ the problem is nonregular and special
procedures are needed.
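The generalized Pareto density (7.2) can be transcribed in the same hedged way (again our own sketch, with the k = 0 exponential limit as a separate branch):

```python
import numpy as np

def gpd_density(y, k, sigma):
    # Generalized Pareto density (7.2) on y > 0, ky < sigma; k = 0 is the
    # exponential limit, k < 0 the Pareto case.
    if y < 0.0:
        return 0.0
    if abs(k) < 1e-9:
        return np.exp(-y / sigma) / sigma
    t = 1.0 - k * y / sigma
    if t <= 0.0:
        return 0.0
    return t**(1.0 / k - 1.0) / sigma
```

For 0 < k < ½ the exponent 1/k − 1 exceeds 1 and the problem is regular, corresponding to α = 1/k > 2; k ≥ ½ corresponds to the nonregular range.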
An alternative approach was taken by Hall (1982), who considered the estimation of θ
when the density is of the form (1.1), but without assuming a specific parametric form for
f0. Hall proposed a procedure based on a specified number of order statistics; his
procedure is almost the same as restricting attention to observations beneath a
threshold, and assuming the generalized Pareto distribution for the differences between
these observations and the threshold. Hall proves asymptotic normality of his estimators
in the case α > 2. The strength of his approach is that it also takes into account the
error in (1.1), but he essentially assumes that this error is known, which is not a
realistic practical assumption.
In conclusion, there are many open questions. Even when α > 2, not all the higher-order
moments of the score statistic are finite and it remains an open question whether
arguments based on higher-order asymptotics, e.g. Efron & Hinkley (1978), Barndorff-Nielsen
(1983), are applicable. Mann (1984) has reviewed a number of papers in which
authors have reported practical difficulties in estimating the parameters of the three-parameter
Weibull. This raises questions of the speed of convergence and small-sample
properties of the estimators, which we have not considered at all. The problem of
determining asymptotically efficient tests and estimators of θ when α < 2 is also
unresolved. Finally, there is the question of the asymptotic performance of Bayes
estimators; in the case α < 2 we would not expect asymptotic normality (Dawid, 1970) to
hold, and it would be of interest to establish under what conditions the asymptotic
posterior distribution does not depend on the prior distribution.

ACKNOWLEDGEMENTS

One of the referees made many helpful suggestions, in particular a greatly shortened
proof of Lemma 6. I thank N. H. Bingham and J. P. Cohen for references.

REFERENCES

AKAHIRA, M. (1975a). Asymptotic theory for estimation of location in non-regular cases, I: Order of
convergence of consistent estimators. Rep. Statist. Appl. Res. Union Jap. Sci. Eng. 22, 8-26.
AKAHIRA, M. (1975b). Asymptotic theory for estimation of location in non-regular cases, II: Bounds of
asymptotic distributions of consistent estimators. Rep. Statist. Appl. Res. Union Jap. Sci. Eng. 22, 99-
115.
BARNDORFF-NIELSEN, 0. (1983). On a formula for the distribution of the maximum likelihood estimator.
Biometrika 70, 343-65.
CHENG, R. C. H. & AMIN, N. A. K. (1981). Maximum likelihood estimation of parameters in the inverse
Gaussian distribution, with unknown origin. Technometrics 23, 257-63.
CHENG, R. C. H. & AMIN, N. A. K. (1983). Estimating parameters in continuous univariate distributions
with a shifted origin. J. R. Statist. Soc. B 45, 394-403.
CHOW, T. L. & TEUGELS, J. L. (1978). The sum and maximum of i.i.d. random variables. In Proc. 2nd
Symp. Asymp. Statist., Ed. P. Mandl and M. Huskova, pp. 81-92. Amsterdam: North Holland.
COHEN, A. C. (1965). Maximum likelihood estimation in the Weibull distribution based on complete and on
censored samples. Technometrics 7, 579-88.
Cox, D. R. & HINKLEY, D. V. (1974). Theoretical Statistics. London: Chapman and Hall.
DAVISON, A. C. (1984). Modelling excesses over high thresholds, with an application. In Statistical Extremes
and Applications, Ed. J. Tiago de Oliveira, pp. 461-82. Dordrecht: Reidel.


DAWID, A. P. (1970). On the limiting normality of posterior distributions. Proc. Camb. Phil. Soc. 67, 625-33.
EFRON, B. & HINKLEY, D. V. (1978). Assessing the accuracy of the maximum likelihood estimator: Observed
versus expected Fisher information. Biometrika 65, 457-87.
FELLER, W. (1971). An Introduction to Probability Theory and its Applications, 2, 2nd ed. New York: Wiley.
GRIFFITHS, D. A. (1980). Interval estimation for the three-parameter lognormal distribution via the
likelihood function. Appl. Statist. 29, 58-68.
GUMBEL, E. J. (1958). Statistics of Extremes. New York: Columbia University Press.
HALL, P. (1982). On estimating the endpoint of a distribution. Ann. Statist. 10, 556-68.
HARTER, H. L. & MOORE, A. H. (1965). Maximum-likelihood estimation of the parameters of gamma and
Weibull populations from complete and from censored samples. Technometrics 7, 639-43.
IBRAGIMOV, I. A. & HAS'MINSKII, R. Z. (1981). Statistical Estimation. Berlin: Springer.
JENKINSON, A. F. (1955). Frequency distribution of the annual maximum (or minimum) values of
meteorological elements. Quart. J. R. Met. Soc. 81, 158-71.
JOHNSON, R. A. & HASKELL, J. H. (1983). Sampling properties of estimators of a Weibull distribution of use
in the lumber industry. Can. J. Statist. 11, 155-69.
LE CAM, L. (1970). On the assumptions used to prove asymptotic normality of maximum likelihood
estimates. Ann. Math. Statist. 41, 802-28.
LEMON, G. H. (1975). Maximum likelihood estimation for the three parameter Weibull distribution based on
censored samples. Technometrics 17, 247-54.
MÄKELÄINEN, T., SCHMIDT, K. & STYAN, G. P. H. (1981). On the existence and uniqueness of the maximum
likelihood estimate of a vector-valued parameter in fixed-size samples. Ann. Statist. 9, 758-67.
MANN, N. R. (1984). Statistical estimation of parameters of the Weibull and Frechet distributions. In
Statistical Extremes and Applications. Ed. J. Tiago de Oliveira, pp. 81-9. Dordrecht: Reidel.
NERC (1975). Flood Studies Report, 1. London: Natural Environment Research Council.
PICKANDS, J. (1975). Statistical inference using extreme order statistics. Ann. Statist. 3, 119-31.
PRESCOTT, P. & WALDEN, A. T. (1980). Maximum likelihood estimation of the parameters of the generalized
extreme-value distribution. Biometrika 67, 723-4.
PRESCOTT, P. & WALDEN, A. T. (1983). Maximum likelihood estimation of the parameters of the three-
parameter generalized extreme-value distribution from censored samples. J. Statist. Comput. Simul. 16,
241-50.
RESNICK, S. & GREENWOOD, P. (1979). A bivariate stable characterization and domains of attraction. J.
Mult. Anal. 9, 206-21.
ROCKETTE, H., ANTLE, C. & KLIMKO, L. A. (1974). Maximum likelihood estimation with the Weibull model.
J. Am. Statist. Assoc. 69, 246-9.
SMITH, R. L. (1984). Threshold methods for sample extremes. In Statistical Extremes and Applications, Ed. J.
Tiago de Oliveira, pp. 621-38. Dordrecht: Reidel.
TIAGO DE OLIVEIRA, J. (1984). Univariate extremes: Statistical choice. In Statistical Extremes and
Applications, Ed. J. Tiago de Oliveira, pp. 91-107. Dordrecht: Reidel.
WALD, A. (1949). Note on the consistency of the maximum likelihood estimate. Ann. Math. Statist. 20, 595-
601.
WALKER, A. M. (1969). On the asymptotic behaviour of posterior distributions. J. R. Statist. Soc. B 31, 80-8.
WEISS, L. (1979). Asymptotic sufficiency in a class of non-regular cases. Selecta Statistica Canadiana 5, 143-
50.
WEISS, L. & WOLFOWITZ, J. (1973). Maximum likelihood estimation of a translation parameter of a
truncated distribution. Ann. Statist. 1, 944-7.
WOODROOFE, M. (1972). Maximum likelihood estimation of a translation parameter of a truncated
distribution. Ann. Math. Statist. 43, 113-22.
WOODROOFE, M. (1974). Maximum likelihood estimation of translation parameter of truncated distribution
II. Ann. Statist. 2, 474-88.

[Received January 1984. Revised May 1984]
