
The Canadian Journal of Statistics

Vol. 8, No. 2, 1980, Pages 193-203


La Revue Canadienne de Statistique


Towards a unified definition of maximum likelihood
F. W. SCHOLZ

Boeing Computer Services Company

Key words and phrases: Maximum likelihood method, nonparametric, density version, definition of maximum likelihood.
AMS 1980 subject classifications: Primary 62A10; secondary 62G05.
ABSTRACT
A unified definition of maximum likelihood (ML) is given. It is based on a pairwise
comparison of probability measures near the observed data point. This definition does not
suffer from the usual inadequacies of earlier definitions, i.e., it does not depend on the choice
of a density version in the dominated case. The definition covers the undominated case as
well, i.e., it provides a consistent approach to nonparametric ML problems, which heretofore
have been solved on a more or less ad hoc basis. It is shown that the new ML definition is a true
extension of the classical ML approach, as it is practiced in the dominated case. Hence the
classical methodology can simply be subsumed. Parametric and nonparametric examples are
discussed.
1. INTRODUCTION

Although the method of maximum likelihood (hereafter abbreviated MML) dates
back to Lambert (1760) or Bernoulli (1777), it is generally agreed that Fisher (1912,
1922) rediscovered it and set the stage for its general acceptance in the statistical
world.
This paper addresses a difficulty that arises in the definition of the method.
Traditionally the MML is motivated by discrete examples, where the likelihood
$L(x, \theta)$ is defined to be just $P(X = x \mid \theta)$, and a maximum likelihood estimator (m.l.e.) $\hat\theta$
of $\theta$ is a value $\theta$ which maximizes $L(x, \theta)$ for fixed $x$. In this context, when one
speaks of probabilities, the MML has much intuitive appeal, although its logical
foundations are metaphysical as already admitted by Bernoulli (1777). In the case of
continuous distributions, which have densities $f(x \mid \theta)$ with respect to Lebesgue
measure, the likelihood $L(x, \theta)$ is redefined to be $f(x \mid \theta)$. To link this definition with
the previous one, it is noted that $P(X \in N(x) \mid \theta) \approx f(x \mid \theta)\,V(N(x))$, where $V(N(x))$
is the volume (independent of the parameter $\theta$) of a small neighbourhood $N(x)$ of $x$.
In this context it is of interest to quote Fisher (1912, p. 156): "If $f$ is an ordinate of the
theoretical curve of unit area, then $p = f\,\delta x$ is the chance of an observation falling
within the range $\delta x$; and if $\log P' = \sum_1^n \log p$ then $P'$ is proportional to the chance of
a given set of observations occurring. The factors $\delta x$ are independent of the theoretical
curve, so the probability of any particular set of $\theta$'s is proportional to $P$ where $\log P
= \sum_1^n \log f$. The most probable set of values for the $\theta$'s will make $P$ a maximum."
*This research, partially supported by NSF grant no. MCS75-08557A01, was carried out at the
University of Washington, and revised since the author joined the Boeing Computer Services Company.


The search for a new definition was originally motivated by the desire that the
MML should also cover nonparametric situations. The difficulty here is that typically
we will not have a common $\sigma$-finite dominating measure and hence cannot use
densities as likelihoods. In spite of this, many nonparametric problems have been
solved by the MML with acceptable solutions; however, it is not clear which extension
of the definition has been employed. We are aware of only two attempts to widen the
definition of the MML in order to cover the undominated case as well. The one by
Kiefer and Wolfowitz (1956) essentially suggests a pairwise comparison of possible
distributions $P$ and $Q$ for the data point $x$, by comparing $(dP/d\mu)(x)$ and $(dQ/d\mu)(x)$,
where $\mu = P + Q$. However, the version of the Radon-Nikodym derivatives is not
specified. Selection of various versions can lead to many different solutions for the
MML. In this context consider the following example: Let $\varphi(x)$ be the standard normal
density and define $\varphi^*(x) = \varphi(x)$ for $x \ne 1$ and $\varphi^*(1) = 10$. Let $\{P_\theta: \theta \in R\}$ be the
family of probability distributions generated by the densities $f_\theta(x) = \varphi^*(x - \theta)$,
$\theta \in R$. Now $P_\theta = N(\theta, 1)$ and the m.l.e. of $\theta$ (or $P_\theta$) upon observing $x$ ought to be
$x$ (or $P_x$). However, if one takes the above versions $f_\theta$ as densities, i.e., as likelihoods,
the m.l.e. will be $x - 1$ for all $x$, since $f_\theta(x) = 10$ exactly when $\theta = x - 1$.
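
The effect of the altered density version can be checked numerically. The following is only a minimal sketch: the observed value `x = 3.0`, the candidate grid, and the neighbourhood half-width `1e-3` are illustrative assumptions, not part of the paper. It compares the classical rule (maximize the chosen density version at $x$) with a comparison of the probabilities that each $N(\theta, 1)$ assigns to a small interval around $x$.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def phi_star(z):
    """Version of phi altered on a null set: equals 10 at z = 1."""
    return 10.0 if z == 1.0 else phi(z)


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interval_prob(theta, x, eps):
    """P_theta((x - eps, x + eps)) under N(theta, 1)."""
    return normal_cdf(x + eps - theta) - normal_cdf(x - eps - theta)


x = 3.0  # hypothetical observation
candidates = [x - 2.0, x - 1.0, x - 0.5, x, x + 0.5]  # hypothetical grid

# Classical rule applied to the altered version f_theta(x) = phi_star(x - theta).
classical_choice = max(candidates, key=lambda t: phi_star(x - t))

# Local comparison: which theta gives the most mass to a small interval at x?
local_choice = max(candidates, key=lambda t: interval_prob(t, x, 1e-3))

print(classical_choice, local_choice)
```

On this grid the density-version rule selects $x - 1$ while the neighbourhood probabilities favour $x$; the latter do not depend on how the density is defined at a single point.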
It appears then that not only is the Kiefer-Wolfowitz definition defective in this
respect but so is the classical definition as well, unless one considers the density
versions as given entities in any statistical model for data. However, the wrong
versions could have been specified and we will not escape the consequences of the
above example. Further, it is not clear how to select unproblematic density versions
in the Kiefer-Wolfowitz approach should one choose to represent the statistical model
in terms of given densities.
The only other approach we are aware of for extending the MML towards nonparametric problems is outlined in the new book by Kalbfleisch and Prentice (1980).
Motivating the likelihood in the derivation of the Kaplan-Meier estimator, they
suggest a discretization of the data, via the concept of rounding errors (cf. also
Kempthorne and Folks (1971) and Kalbfleisch (1971)), and then apply the MML to
the discrete (multinomial) problem and let the rounding error tend to zero, hoping
that the discrete m.l.e. will converge to a limit. Kalbfleisch and Prentice suggest
without further support that this will provide an extension of the usual MML. We did
consider providing rigorous support for this last claim. However, the questions of
existence of the limiting m.l.e. (as the grouping gets finer) and whether this limit
would depend on the grouping employed presented major obstacles, and it remained
unclear whether an elegant or even satisfactory approach could be found this way.
In the next section we will present a new definition of the MML which is based on a
pairwise comparison of probability measures in the neighbourhood of the observed
data point $x$. Thus, in a sense, the new definition is a marriage of the two extension
proposals discussed above.

2. DEFINITIONS
Let $\mathcal{X}$ be a metric space with metric $d$ and let $\mathcal{P}$ be a family of probability
measures on the Borel sets of $\mathcal{X}$. For any (data) point $x \in \mathcal{X}$ let $\mathcal{N}_x$ denote the family
of all measurable sets $N_x$ which contain $x$ as an interior point. By $D(N_x)$ denote the
diameter of the set $N_x$.


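DEFINITION 1. For $P, Q \in \mathcal{P}$ write $P \overset{x}{\ge} Q$ if

$$\lim_{\varepsilon \to 0} \inf\{P(N_x)/Q(N_x): N_x \in \mathcal{N}_x \text{ with } D(N_x) \le \varepsilon\} \ge 1,$$

where by convention $0/0 = 1$.

For convenience we will write the above limit also suggestively as follows:

$$\liminf_{D(N_x) \to 0} P(N_x)/Q(N_x).$$
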

DEFINITION 2. For $P, Q \in \mathcal{P}$ write $P \overset{x}{=} Q$ when $P \overset{x}{\ge} Q$ and $Q \overset{x}{\ge} P$. Then $P$ and $Q$ are
called equivalent at $x$.
Comments.
(i) $P \overset{x}{=} Q$ if and only if $\lim_{D(N_x) \to 0} P(N_x)/Q(N_x)$ exists and equals 1.

(ii) If $P \overset{x}{\ge} Q$ and $Q \overset{x}{\ge} R$ then $P \overset{x}{\ge} R$ (transitivity). [We could not establish the same
transitivity relationship for the Kiefer-Wolfowitz definition, the many density
versions representing the basic difficulty.]

(iii) $P \overset{x}{\ge} P$ for all $P \in \mathcal{P}$ (reflexivity).

(iv) If $\{P\}_x := \{Q \in \mathcal{P}: Q \overset{x}{=} P\}$ and if $\mathcal{P}_x := \{\{P\}_x: P \in \mathcal{P}\}$ then $(\mathcal{P}_x, \overset{x}{\ge})$ is a
partially ordered set if we extend $\overset{x}{\ge}$ in the natural way.

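DEFINITION 3. The statistic $P_0 \in \mathcal{P}$ is a maximum likelihood estimator (m.l.e.) with
respect to $x \in \mathcal{X}$ and $\mathcal{P}$ if for every $Q \in \mathcal{P}$ such that $Q \overset{x}{\ge} P_0$ it follows that $Q \overset{x}{=} P_0$. That
is, $P_0$ is an m.l.e. if and only if there does not exist a $Q \in \mathcal{P} - \{P_0\}_x$ such that $Q \overset{x}{\ge} P_0$,
or equivalently

$$\liminf_{D(N_x) \to 0} Q(N_x)/P_0(N_x) < 1 \quad \text{for all } Q \in \mathcal{P} - \{P_0\}_x.$$
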
The existence of an m.l.e. according to this definition cannot be taken for granted
and depends on the underlying problem. Further, m.l.e.s may not be unique since we
may either have a whole equivalence class of them or the situation may arise where
several equivalence classes of m.l.e.s exist which are not comparable with respect to
$\overset{x}{\ge}$, e.g., let $\mathcal{P}$ consist of only $P$ and $Q$, such that neither $P \overset{x}{\ge} Q$ nor $Q \overset{x}{\ge} P$; then by our
definition both $P$ and $Q$ qualify as m.l.e.s.
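
A two-element family with incomparable members can be illustrated as follows. This is a sketch under stated assumptions: the two densities are hypothetical (piecewise constant near the data point, with one-sided heights 1.5 and 0.5 swapped between $P$ and $Q$), only interval neighbourhoods $(x - a, x + b)$ are searched rather than all of $\mathcal{N}_x$, and the infimum is approximated on a finite grid of interval shapes.

```python
def interval_mass(heights, a, b):
    """Mass of (x - a, x + b) for a density that equals heights[0] just to
    the left of x and heights[1] just to the right (valid for small a, b)."""
    left, right = heights
    return left * a + right * b


def ratio(num, den):
    """Ratio with the convention 0/0 = 1 used in Definition 1."""
    if num == 0.0 and den == 0.0:
        return 1.0
    if den == 0.0:
        return float("inf")
    return num / den


def inf_ratio(p, q, eps, shapes):
    """Approximate inf of P(N)/Q(N) over intervals N of diameter eps at x."""
    best = float("inf")
    for s in shapes:  # s is the fraction of the interval lying left of x
        a, b = s * eps, (1.0 - s) * eps
        best = min(best, ratio(interval_mass(p, a, b), interval_mass(q, a, b)))
    return best


# Hypothetical one-sided density heights at the data point x.
p_heights = (1.5, 0.5)
q_heights = (0.5, 1.5)
shapes = [k / 1000.0 for k in range(1, 1000)]

p_over_q = inf_ratio(p_heights, q_heights, 1e-3, shapes)
q_over_p = inf_ratio(q_heights, p_heights, 1e-3, shapes)
print(p_over_q, q_over_p)
```

Both approximate infima fall below 1, so neither measure dominates the other at $x$ in the sense of Definition 1, and in the family $\{P, Q\}$ both would qualify as m.l.e.s under Definition 3.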


Now we will give some criteria that will clarify the relationship between the new
definition and the classical approach:
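(A) Suppose $P, Q \in \mathcal{P}$ are dominated by a $\sigma$-finite measure $\mu$, and suppose $P$ and $Q$
have density versions $p(\cdot)$ and $q(\cdot)$ (w.r.t. $\mu$), which are continuous at $x$; then

$$\lim_{D(N_x) \to 0} P(N_x)/\mu(N_x) \text{ exists} = p(x)$$

and

$$\lim_{D(N_x) \to 0} Q(N_x)/\mu(N_x) \text{ exists} = q(x),$$

provided $\mu(N_x) > 0$ for all $N_x \in \mathcal{N}_x$.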


Case 1: If $\mu(N_x) = 0$ for some $N_x \in \mathcal{N}_x$, then

$$\liminf_{D(N_x) \to 0} P(N_x)/Q(N_x) = \liminf_{D(N_x) \to 0} Q(N_x)/P(N_x) = 1$$

since $0/0 = 1$, i.e., $P \overset{x}{=} Q$.

Case 2: If $\mu(N_x) > 0$ for every $N_x \in \mathcal{N}_x$, then

$$\lim_{D(N_x) \to 0} P(N_x)/Q(N_x) = p(x)/q(x)$$

provided $p(x) + q(x) > 0$. Thus if $p(x) + q(x) > 0$, then

$$P \overset{x}{\ge} Q \iff p(x) \ge q(x).$$

The case $p(x) + q(x) = 0$ has to be considered on an individual basis in each given
problem.

(B) If $P(\{x\}) > 0$ or $Q(\{x\}) > 0$ then $\liminf_{D(N_x) \to 0} P(N_x)/Q(N_x) = P(\{x\})/Q(\{x\})$ and hence

$$P \overset{x}{\ge} Q \iff P(\{x\}) \ge Q(\{x\}).$$


Criteria (A) and (B) show the agreement between the new and the classical
definition. This means that the classical methodology for finding m.l.e.s is simply
subsumed in the methodology for the new definition. Further, the new definition
points to those density versions (continuous at $x$, if they exist) that should be used in
the classical definition in order for agreement to occur between the definitions. If
there exists a continuous density version it seems only natural to insist on its use in
the classical definition, since densities are supposed to represent the localized
properties of probability distributions. Unresolved are those cases where density
versions which are continuous at $x$ do not exist. In such cases no natural density
version would suggest itself for the classical approach, whereas the new definition
shows how to deal with this question in a satisfactory way, as will be seen from the
following examples.
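
Criterion (A) can be illustrated numerically for two dominated measures with continuous densities. The sketch below assumes, purely for illustration, $P = N(0, 1)$, $Q = N(1, 1)$, the data point $x = 0.2$, and asymmetric interval neighbourhoods $(x - \varepsilon, x + \varepsilon/10)$; none of these choices come from the paper.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    """Standard normal density (the continuous version)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def interval_ratio(x, a, b, mu_p, mu_q):
    """P(N)/Q(N) for N = (x - a, x + b), P = N(mu_p, 1), Q = N(mu_q, 1)."""
    p_mass = normal_cdf(x + b - mu_p) - normal_cdf(x - a - mu_p)
    q_mass = normal_cdf(x + b - mu_q) - normal_cdf(x - a - mu_q)
    return p_mass / q_mass


x, mu_p, mu_q = 0.2, 0.0, 1.0  # hypothetical data point and parameters
density_ratio = normal_pdf(x - mu_p) / normal_pdf(x - mu_q)

for eps in (1e-1, 1e-2, 1e-3):
    # Asymmetric neighbourhoods: the limit should not depend on their shape.
    print(eps, interval_ratio(x, eps, eps / 10.0, mu_p, mu_q), density_ratio)
```

As the diameter shrinks, the ratio of neighbourhood probabilities approaches $p(x)/q(x)$, so comparing $P$ and $Q$ at $x$ reduces to comparing the continuous density versions there.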
3. EXAMPLES
In the following examples P ( Q , . . .) may denote either the distribution of one
random variable or the distribution of a random sample of such random variables,
i.e., P serves as a parameter as well. From the context it should be clear what is
meant so that no confusion arises while at the same time we avoid notational
complexities.
Example 3.1. Let $X_1, \ldots, X_n$ be independently and identically distributed as
$P \in \mathcal{P}$, the class of all probability distributions on the real line. Let $P_n$ be the
empirical distribution corresponding to the observed data vector $x = (x_1, \ldots, x_n)$,
i.e.,

$$P_n(A) = \sum_{i=1}^{n} I_A(x_i)/n$$

where

$$I_A(z) = \begin{cases} 1 & \text{if } z \in A \\ 0 & \text{otherwise.} \end{cases}$$

Then $P_n$ is the unique m.l.e. of $P$.


Proof. Note that

$$P_n(\{x\}) = \prod_{j=1}^{h} (m_j/n)^{m_j} > 0,$$

where $y_1 < \cdots < y_h$ represent the distinct values among $x_1, \ldots, x_n$ appearing with
respective multiplicities $m_1, \ldots, m_h$. Furthermore note that by simple application of
Jensen's inequality one has

$$Q(\{x\}) = \prod_{j=1}^{h} Q(\{y_j\})^{m_j} \le \prod_{j=1}^{h} (m_j/n)^{m_j}$$

with equality if and only if $Q = P_n$. Hence

$$Q \overset{x}{\ge} P_n \iff Q(\{x\}) \ge P_n(\{x\}) \iff P_n = Q,$$

i.e., $P_n$ is the unique m.l.e. Q.E.D.
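
The inequality used in the proof can be spot-checked numerically. The sketch below is an illustration only: the sample and the 1000 randomly generated competitors $Q$ are hypothetical, and each competitor is allowed to place some of its mass away from the observed values.

```python
import math
import random
from collections import Counter


def log_point_prob(q, counts):
    """log of prod_j q[y_j] ** m_j, or -inf if q misses an observed value."""
    total = 0.0
    for y, m in counts.items():
        if q.get(y, 0.0) <= 0.0:
            return float("-inf")
        total += m * math.log(q[y])
    return total


data = [1.2, 3.4, 1.2, 5.0, 3.4, 1.2]  # hypothetical sample
counts = Counter(data)                 # multiplicities m_j of the distinct y_j
n = len(data)
empirical = {y: m / n for y, m in counts.items()}

rng = random.Random(0)
competitors = []
for _ in range(1000):
    weights = [rng.expovariate(1.0) for _ in range(len(counts) + 1)]
    total_weight = sum(weights)
    # The last weight is mass placed away from the observed values.
    competitors.append({y: w / total_weight for y, w in zip(counts, weights)})

empirical_value = log_point_prob(empirical, counts)
best_competitor = max(log_point_prob(q, counts) for q in competitors)
print(math.exp(empirical_value), math.exp(best_competitor))
```

No sampled competitor assigns more probability to the observed data vector than the empirical distribution does, in line with the Jensen bound.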

It seems fitting to give this example first, since in some respect it was the first one
"solved" through the MML, namely by Lambert (1760), cf. Edwards (1974).
Example 3.2. Let $X_1, \ldots, X_n$ be independently and identically distributed
$\mathcal{U}(0, \theta)$, $\theta > 0$ (uniform on $(0, \theta)$); then $P_{\theta_0} = \mathcal{U}(0, \theta_0)$, with $\theta_0 = \max(x_1, \ldots, x_n)$,
is the unique m.l.e. based on the observed data vector $x = (x_1, \ldots, x_n)$, since
(a) if $\theta < \theta_0$ then $P_\theta(N_x) = 0$ for $D(N_x)$ small, and $P_{\theta_0}(N_x) > 0$ for all $N_x \in \mathcal{N}_x$,
so

$$\liminf_{D(N_x) \to 0} P_\theta(N_x)/P_{\theta_0}(N_x) = 0 < 1;$$

(b) if $\theta > \theta_0$, let

$$N_x^* = \{(y_1, \ldots, y_n) \in R^n: \max(y_1, \ldots, y_n) < \theta_0\} \cap N_x,$$

and let $\lambda$ denote Lebesgue measure on $R^n$. Then

$$\liminf_{D(N_x) \to 0} P_\theta(N_x)/P_{\theta_0}(N_x) = \liminf_{D(N_x) \to 0} (\theta_0/\theta)^n \bigl(\lambda(N_x)/\lambda(N_x^*)\bigr) = (\theta_0/\theta)^n < 1$$

since the ratio $\lambda(N_x)/\lambda(N_x^*)$ $(\ge 1)$ can approximate 1 by appropriate choice of $N_x$.
Thus $P_{\theta_0}$ is an m.l.e., and the uniqueness follows easily. Here the classical approach
can either produce the same solution or no solution or any other solution depending
on the choice of density. Note that in this example there is no density for $P_{\theta_0}$ which
is continuous at $x$.
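
Parts (a) and (b) can be traced numerically on box-shaped neighbourhoods. The following sketch makes several assumptions that are not in the paper: the sample `x`, the competing values of $\theta$, and the restriction to boxes $\prod_i (x_i - a, x_i + b)$, which form only a subfamily of $\mathcal{N}_x$.

```python
def box_prob(theta, x, a, b):
    """P_theta of the box prod_i (x_i - a, x_i + b) for i.i.d. U(0, theta)."""
    prob = 1.0
    for xi in x:
        lower = max(xi - a, 0.0)
        upper = min(xi + b, theta)
        prob *= max(upper - lower, 0.0) / theta
    return prob


def box_ratio(theta, theta0, x, a, b):
    """P_theta(N_x) / P_theta0(N_x) on the same box N_x."""
    return box_prob(theta, x, a, b) / box_prob(theta0, x, a, b)


x = (0.3, 0.9, 0.5)  # hypothetical sample
theta0 = max(x)      # candidate m.l.e. parameter
n = len(x)

# (a) theta < theta0: small boxes around x get no mass under P_theta.
print(box_ratio(0.8, theta0, x, 1e-3, 1e-3))

# (b) theta > theta0: boxes extending less and less beyond theta0 push the
# ratio down towards (theta0 / theta) ** n.
for b in (1e-3, 1e-5, 1e-9):
    print(b, box_ratio(1.2, theta0, x, 1e-3, b), (theta0 / 1.2) ** n)
```

For symmetric boxes the ratio in case (b) is about twice $(\theta_0/\theta)^n$ and may even exceed 1 for $\theta$ close to $\theta_0$; it is the infimum over neighbourhood shapes that brings it down to $(\theta_0/\theta)^n < 1$.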
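Example 3.3. Let $X_1, \ldots, X_n$ be independently and identically distributed
$\mathcal{U}(\theta - \tfrac12, \theta + \tfrac12)$, $\theta \in R$ (uniform on $(\theta - \tfrac12, \theta + \tfrac12)$); then any $P_{\theta_0} = \mathcal{U}(\theta_0 - \tfrac12, \theta_0 + \tfrac12)$,
with $\theta_0 \in I = (x_{(n)} - \tfrac12, x_{(1)} + \tfrac12)$, is an m.l.e. based on the observed data vector $x
= (x_1, \ldots, x_n)$; here $x_{(1)} = \min(x_1, \ldots, x_n)$ and $x_{(n)} = \max(x_1, \ldots, x_n)$.

Proof. First note that $P_{\theta_0} \overset{x}{=} P_{\theta_1}$ for $\theta_0, \theta_1 \in I$, since $P_{\theta_0}$ and $P_{\theta_1}$ admit densities which
are continuous at $x$ with the same value 1 at $x$. Now let $\theta_0 \in I$ and $\theta_1 = x_{(n)} - \tfrac12 <
\theta_0$ and let

$$N_x^* = N_x \cap \{y \in R^n: \max(y_1, \ldots, y_n) \le \theta_1 + \tfrac12\};$$

then

$$\liminf_{D(N_x) \to 0} P_{\theta_1}(N_x)/P_{\theta_0}(N_x) = \liminf_{D(N_x) \to 0} \lambda(N_x^*)/\lambda(N_x) = 0 < 1.$$

Similarly one argues for $\theta_0 \in I$ and $\theta_1 < x_{(n)} - \tfrac12$ or $\theta_1 \ge x_{(1)} + \tfrac12$ that

$$\liminf_{D(N_x) \to 0} P_{\theta_1}(N_x)/P_{\theta_0}(N_x) = 0 < 1,$$

hence establishing the claim.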


It is easily seen that $P_{\theta^*}$ with $\theta^* \le x_{(n)} - \tfrac12$ or $\theta^* \ge x_{(1)} + \tfrac12$ are not m.l.e.s under
our definition. Note that Rohatgi (1976, p. 379) gives the closed interval $[x_{(n)} - \tfrac12,
x_{(1)} + \tfrac12]$ as the set of all m.l.e.s, which is due to the choice of density employed. This
arbitrariness is resolved by our definition in a manner which appears satisfactory.
Example 3.4. The Kaplan-Meier estimator: Suppose $T_1, \ldots, T_n$ are independently
and identically distributed as $F$, where $F \in \mathcal{F}$, the family of all distribution functions
on $[0, \infty]$; the $T_i$ represent latent failure times. We also have censoring times
$C_1, \ldots, C_n$ independently and identically distributed as $G$, where $G \in \mathcal{F}$. Assume
that $C = (C_1, \ldots, C_n)$ and $T = (T_1, \ldots, T_n)$ are independent. For $i = 1, \ldots, n$ we
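observe the actual failure times $X_i = \min(T_i, C_i)$ and

$$\delta_i = \begin{cases} 1 & \text{if } T_i \le C_i \\ 0 & \text{otherwise.} \end{cases}$$
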
The distribution of $(X, \delta) = (X_1, \delta_1, \ldots, X_n, \delta_n)$ will be denoted by $P_{F,G}$ and an
observed value of $(X, \delta)$ is denoted by $(x, d)$. Note that as in Example 3.1 there is no
common $\sigma$-finite dominating measure for $\{P_{F,G}: F, G \in \mathcal{F}\}$, and hence no likelihood
in the classical sense. The approach taken so far, cf. Kaplan and Meier (1958), is to
maximize

$$P_{F,G}(X = x, \delta = d) \quad \text{over } F, G \in \mathcal{F}. \tag{3.1}$$

The conventional MML fails to justify (3.1) as a starting point; the new definition,
however, easily leads us to (3.1). This follows from our discussion in (B) above in
Section 2 and the fact that one can find $(F_0, G_0) \in \mathcal{F} \times \mathcal{F}$ such that $P_{F_0,G_0}(X = x,
\delta = d) > 0$.
Example 3.5. The Multivariate Normal Distribution: Let $X_1, \ldots, X_n$ be independently and identically distributed $N_p(m, B)$ with $m \in R^p$ and $B$ a $p \times p$ covariance
matrix. Although $\hat{m}$ and $\hat{B}$, the conventional m.l.e.s of $m$, $B$, are always well defined
as algebraic expressions, they no longer could be called m.l.e.s in the classical sense
when $n \le p$ or rank $\hat{B} < p$. However, under the new definition $N_p(\hat{m}, \hat{B})$ emerges
again as the unique m.l.e. provided we let

$$\mathcal{P} = \{N_p(m, B): m \in R^p,\ B \text{ covariance matrix of rank} \le p\}.$$

The proof is straightforward although somewhat elaborate and is therefore omitted.


Example 3.6. Let $X_1, \ldots, X_n$ be independently and identically distributed as
$P \in \mathcal{P}$ = family of all probability measures on the positive real line which admit a
monotone decreasing density with respect to Lebesgue measure. The solution of this
problem was first given by Grenander (1956). The unique m.l.e. $P_0$ admits a density
$f_0(x)$, which is a step function with steps at $x_1 < \cdots < x_n$ and

$$f_0(x_i-) = \min_{k \le i-1} \max_{l \ge i} \frac{l - k}{n(x_l - x_k)}, \quad i = 1, \ldots, n \quad (x_0 = 0).$$

Here $x_1 < \cdots < x_n$ denote the ordered observed values of the sample. Since $f_0$
exhibits discontinuities at the observed data values, it is not evident how the likelihood
ought to be defined in the classical sense. Should one take the left or right limits, or
some value in between, to account for the fact that the density at $x$ ought to reflect
the probability in a small neighbourhood around $x$? It turns out that the new
definition will produce the $P_0$ given above as the unique m.l.e. for this problem. The
derivation, which is somewhat involved, is given in the Appendix, Section 5.


4. CONCLUDING REMARKS

After making a case for the necessity of a more careful and broader definition for
the MML a new definition has been put forth. In the few examples presented above it
has performed well. Many more examples, dominated or not, could and should be
treated with more or less ease, and we hope that the versatility of the definition, as
we have experienced it so far, will stand up under future tests.
This new definition will not remove inconsistent m.l.e.s and replace them by
consistent ones, as the following example demonstrates, cf. Barlow (1972). Let $X_1,
\ldots, X_n$ be independently and identically distributed as $F \in \mathcal{F}$, the family of all
starshaped distribution functions on $[0, 1]$. Here $F$ is starshaped on $[0, 1]$ if $F(x)/x$
is nondecreasing on $[0, 1]$. One easily shows, using the new definition and following
Barlow (1972), that the m.l.e. of $F$ is

$$\hat{F}(x) = x \sum_{i=1}^{n} I_{[x_i, 1]}(x) / (n\, x_{(n)}),$$

which is certainly not consistent.


Finally we comment on the question: what is the m.l.e. of $g(P)$, where $g$ is some
known functional defined on $\mathcal{P}$? Since $g(P)$ typically is not a probability measure it
does not make sense to apply our definition. Zehna's theorem (1966), which states
that $g(\hat{P})$ is an m.l.e. of $g(P)$ ($\hat{P}$ being an m.l.e. of $P$), requires the introduction of an
"induced likelihood" which is no likelihood in the classical sense, cf. Berk (1967).
Rather than give this artificial definition of an "induced likelihood" and prove the
above theorem we may as well define $g(\hat{P})$ outright as the m.l.e. of $g(P)$, cf. Bickel
and Doksum (1977), or we may follow Berk's suggestion: Adjoin another functional
$h$ to $g$ such that $(h, g)$ represents a one-to-one map defined on $\mathcal{P}$. Clearly one could
then accept $(h(\hat{P}), g(\hat{P}))$ as m.l.e. of $(h(P), g(P))$ and by common convention $g(\hat{P})$
as m.l.e. of $g(P)$.
5. APPENDIX

Proof that $P_0$ is the unique m.l.e. in Example 3.6. Let $x = (x_1, \ldots, x_n)$ be the
observed data vector and $y = (y_1, \ldots, y_n)$ the corresponding vector of ordered
observations; then let $L$ be the linear permutation map that maps $x$ into $y$. If $N_x \in
\mathcal{N}_x$ with $D(N_x) \le \varepsilon$ then $N_y := L N_x \in \mathcal{N}_y$ with $D(N_y) = D(N_x) \le \varepsilon$ and $P(N_x) =
P(N_y)$ for all $P \in \mathcal{P}$. Therefore we may, without loss of generality, assume that the
observed data vector $x$ is ordered a priori, i.e., $x_1 < \cdots < x_n$.
Let $N_x^j$, $j = 1, \ldots, 2^n$, represent the intersections of $N_x \in \mathcal{N}_x$ with the $2^n$ open
quadrants with vertex at $x$, i.e.,

$$N_x^1 = N_x \cap \bigcap_{i=1}^{n} \{y \in R^n: y_i < x_i\}$$

$$N_x^2 = N_x \cap \bigcap_{i=1}^{n-1} \{y \in R^n: y_i < x_i\} \cap \{y \in R^n: y_n > x_n\}$$

$$\vdots$$

$$N_x^{2^n} = N_x \cap \bigcap_{i=1}^{n} \{y \in R^n: y_i > x_i\}.$$



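Then for $P_1, P_2 \in \mathcal{P}$ with $P_2(N_x) > 0$ for all $N_x \in \mathcal{N}_x$,

$$\frac{P_1(N_x)}{P_2(N_x)} = \frac{\sum_{j=1}^{2^n} r_j(N_x) A_{1j}(N_x)}{\sum_{j=1}^{2^n} r_j(N_x) A_{2j}(N_x)}, \tag{5.1}$$

where

$$A_{ij}(N_x) = P_i(N_x^j)/\lambda(N_x^j); \quad r_j(N_x) = \lambda(N_x^j)/\lambda(N_x), \quad i = 1, 2, \ j = 1, \ldots, 2^n.$$

Here $\lambda$ denotes Lebesgue measure on $R^n$. Next note that

$$f_{ij}(x) := \lim_{D(N_x) \to 0} A_{ij}(N_x)$$

exists, e.g.,

$$f_{i2^n}(x) = \prod_{k=1}^{n} f_i(x_k+) \quad \text{for } i = 1, 2.$$

We remark here that the $f_{ij}(x)$ are independent of the density version $f_i$ employed.
Since in the following we will always use only $f(x_k-)$ and $f(x_k+)$, $k = 1, \ldots, n$, there
should be no ambiguity if we identify $P$ and some corresponding density version $f$ of
$P$.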
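We now state two lemmas whose proofs are straightforward and omitted:

LEMMA 5.1. Let $A_{ij} \ge 0$; $j = 1, \ldots, k$, $i = 1, 2$, with $\max\{A_{2j}: 1 \le j \le k\} > 0$ and let

$$\mathcal{Y} = \{r = (r_1, \ldots, r_k) \in R^k: r_j > 0,\ j = 1, \ldots, k,\ \textstyle\sum_1^k r_j = 1\}.$$

Then

$$\inf_{r \in \mathcal{Y}} \frac{\sum_1^k r_j A_{1j}}{\sum_1^k r_j A_{2j}} = \min\{A_{1j}/A_{2j}: j \text{ such that } A_{2j} > 0\}.$$

LEMMA 5.2. With $r_j(N_x)$ defined as in (5.1) and $\mathcal{Y}$ as in Lemma 5.1, it follows that with
$k = 2^n$,

$$\{(r_1(N_x), \ldots, r_k(N_x)): N_x \in \mathcal{N}_x,\ D(N_x) \le \varepsilon\} = \mathcal{Y}$$

for all $\varepsilon > 0$.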
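
The identity of Lemma 5.1, in the form in which it is applied in (5.3) (the infimum of the ratio of weighted sums equals the smallest ratio $A_{1j}/A_{2j}$ over indices with $A_{2j} > 0$), can be checked on a small case. The numbers below are hypothetical and the random search over the simplex is only a sanity check, not a proof.

```python
import random


def weighted_ratio(r, a1, a2):
    """(sum_j r_j * a1_j) / (sum_j r_j * a2_j) for weights r in the simplex."""
    numerator = sum(rj * v for rj, v in zip(r, a1))
    denominator = sum(rj * v for rj, v in zip(r, a2))
    return numerator / denominator


def min_ratio(a1, a2):
    """min of a1_j / a2_j over the indices j with a2_j > 0."""
    return min(v / w for v, w in zip(a1, a2) if w > 0)


a1 = [2.0, 3.0, 5.0]  # hypothetical A_{1j}
a2 = [4.0, 0.0, 1.0]  # hypothetical A_{2j}, not all zero
target = min_ratio(a1, a2)

rng = random.Random(1)
sampled = []
for _ in range(10000):
    w = [rng.expovariate(1.0) for _ in a1]
    total = sum(w)
    sampled.append(weighted_ratio([wj / total for wj in w], a1, a2))

# Weights concentrating on the minimizing index approach the bound.
delta = 1e-6
near_vertex = weighted_ratio([1.0 - 2.0 * delta, delta, delta], a1, a2)
print(target, min(sampled), near_vertex)
```

Every sampled weight vector gives a ratio at or above the bound, and weights concentrating on the minimizing index come arbitrarily close to it, which is why the infimum over the open simplex $\mathcal{Y}$ is approached but not necessarily attained.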

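Then (5.1) and Lemma 5.2 yield

$$\liminf_{D(N_x) \to 0} P_1(N_x)/P_2(N_x) \le \min\{f_{1j}(x)/f_{2j}(x): j \text{ such that } f_{2j}(x) > 0\} \tag{5.2}$$

while (5.1) and Lemma 5.1 yield

$$\liminf_{D(N_x) \to 0} P_1(N_x)/P_2(N_x) \ge \lim_{\varepsilon \to 0} \inf_{\substack{N_x \in \mathcal{N}_x \\ D(N_x) \le \varepsilon}} \min\{A_{1j}(N_x)/A_{2j}(N_x): j \text{ such that } A_{2j}(N_x) > 0\}. \tag{5.3}$$
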


We will now show that $P_0$ (with density $f_0$ as defined in Example 3.6) is an m.l.e.
Note that $f_0(x_n+) = 0$ and $f_0(x_n-) > 0$, which in conjunction with (5.2) and (5.3)
implies that

$$L(P, P_0) := \liminf_{D(N_x) \to 0} P(N_x)/P_0(N_x) = \min\{f_j(x)/f_{0j}(x): j \text{ such that } f_{0j}(x) > 0\}$$

for any $P \in \mathcal{P}$ and $f$ a density version of $P$; here $f_j(x)$ and $f_{0j}(x)$ denote the quadrant
limits of $P$ and $P_0$, defined as the $f_{ij}(x)$ above. We will show in the following steps that
$L(P, P_0) < 1$ for all $P \in \mathcal{P}$ such that $P \ne P_0$, hence establishing that $P_0$ is an m.l.e.
(a) Let $\mathcal{P}_1 \subset \mathcal{P}$ be the following subfamily of distributions: $P \in \mathcal{P}_1$ if and only if
$P$ admits a density $f \in \mathcal{P}$ satisfying the following two conditions:
(i) $f = 0$ on $(x_n, \infty)$;
(ii) $f$ is a step function on $(0, x_n)$ with at most one step in each interval $(x_i, x_{i+1})$, $i =
0, \ldots, n - 1$ ($x_0 = 0$), and in case of a step in $(x_i, x_{i+1})$, $f$ is continuous at $x_i$.

The following considerations show that our problem can be reduced to showing
$L(P, P_0) < 1$ for all $P \in \mathcal{P}_1$ with $P \ne P_0$.

(I) If $f \in \mathcal{P}$ is such that $f(x_i+) > f(x_{i+1}-)$, let $f_1 \in \mathcal{P}$ be chosen such that $f_1$
coincides with $f$ outside $J = (x_i, x_{i+1})$, $f_1$ is a step function on $J$ with exactly one
step in $J$ and $f_1(x_i+) = f(x_i+)$, $f_1(x_{i+1}-) = f(x_{i+1}-)$. Then $L(f_1, f_0) = L(f, f_0)$.
(II) If $f_1$ is as in (I) and $f_1(x_i-) > f_1(x_i+)$, let $f_2 \in \mathcal{P}$ be chosen such that $f_2$ coincides
with $f_1$ outside $J$, $f_2$ is a step function on $J$ with exactly one step in $J$ and $f_2(x_i+)
= f_1(x_i-)$ and $f_2(x_{i+1}-) = f_1(x_{i+1}-)$. Then $L(f_2, f_0) \ge L(f_1, f_0)$.

Note that neither step (I) nor step (II), if carried out, will lead to $P_1 = P_0$ or $P_2 = P_0$.
(III) If $f \in \mathcal{P}$ and $f(x_n+) > 0$, let $f_1(x) = k f(x) I_{(0, x_n)}(x)$ with $k > 1$ so that $f_1 \in \mathcal{P}$.
Then $L(f, f_0) < L(f_1, f_0)$.
Hence it remains to show $L(P, P_0) < 1$ for all $P \in \mathcal{P}_1$ with $P \ne P_0$.

(b) Now let $f \in \mathcal{P}_1$: if $f$ is continuous at $x_i$ then

$$\frac{f(x_i+)}{f_0(x_i-)} = \frac{f(x_i-)}{f_0(x_i-)} \le \frac{f(x_i-)}{f_0(x_i+)}.$$

If $f$ is not continuous at $x_i$ and if

$$\frac{f(x_i-)}{f_0(x_i-)} - \frac{f(x_i+)}{f_0(x_i+)} = d$$

with $d > 0$, let

$$f_1(x) = \begin{cases} f(x) & \text{for } x < x_i \\ f(x_i-) & \text{for } x_i \le x < x_i + e,\ e > 0 \\ \alpha f(x_i+) & \text{for } x_i + e \le x < z \\ f(x) & \text{for } x \ge z, \end{cases}$$

where $0 < \alpha < 1$ and $z$ is the locus of the next jump of $f$ following $x_i$.
Here $e$ and $\alpha$ should be chosen so that $f_1 \in \mathcal{P}_1$; in fact $\alpha$ can be chosen arbitrarily
close to 1 by taking $e > 0$ sufficiently small, so that $L(f, f_0) < L(f_1, f_0)$ and $f_1$ is
continuous at $x_i$. Hence for any $f \in \mathcal{P}_1$ we either have

$$L(f, f_0) = \prod_{i=1}^{n} f(x_i-)/f_0(x_i-)$$

or if not, then there exists an $f_1 \in \mathcal{P}_1$ such that

$$L(f, f_0) < L(f_1, f_0) = \prod_{i=1}^{n} f_1(x_i-)/f_0(x_i-)$$

and $P_1$ corresponding to $f_1$ is different from $P_0$.


(c) It remains to show that $f_0$ yields the unique maximum of $\prod_1^n f(x_i-)$ over
$f \in \mathcal{P}_1$. This was shown by Grenander (1956), cf. also Barlow (1972, p. 223 ff). The
problem basically reduces to maximizing

$$\prod_{i=1}^{n} a_i$$

subject to

$$a_1 \ge \cdots \ge a_n \ge 0 \quad \text{and} \quad \sum_{i=1}^{n} a_i (x_i - x_{i-1}) = 1.$$

The solution is

$$a_i = \min_{k \le i-1} \max_{l \ge i} \frac{l - k}{n(x_l - x_k)}, \quad i = 1, \ldots, n.$$

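
The min-max solution can be evaluated directly for small samples. The sketch below is an illustration under assumptions: the function name and the three-point sample are hypothetical, the observations are taken to be distinct and positive, and $x_0 = 0$ as in step (a). It computes the $a_i$ and checks the two constraints of the maximization problem.

```python
def grenander_left_limits(sample):
    """a_i = min_{k <= i-1} max_{l >= i} (l - k) / (n * (x_l - x_k)),
    with x_0 = 0 and x_1 < ... < x_n the ordered (distinct) sample."""
    xs = [0.0] + sorted(sample)
    n = len(sample)
    values = []
    for i in range(1, n + 1):
        values.append(
            min(
                max((l - k) / (n * (xs[l] - xs[k])) for l in range(i, n + 1))
                for k in range(0, i)
            )
        )
    return values


sample = [2.0, 1.0, 4.0]  # hypothetical observations
a = grenander_left_limits(sample)

xs = [0.0] + sorted(sample)
total_mass = sum(a[i - 1] * (xs[i] - xs[i - 1]) for i in range(1, len(xs)))
product = 1.0
for value in a:
    product *= value

print(a, total_mass, product)
```

For this sample the values are $1/3, 1/3, 1/6$: they are nonincreasing, the step density integrates to 1, and the product $1/54$ exceeds, for instance, the value $1/64$ obtained from the constant density $1/4$ on $(0, 4)$.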

This proves that $P_0$ is in fact an m.l.e. according to the new definition. It remains to
show that no other $P \in \mathcal{P}$ can be an m.l.e.
First we claim that any m.l.e. $P$ with density $f$ must by necessity satisfy the
following conditions:

$$f(x_n+) = 0 \tag{5.4}$$

$$f(x_n-) > 0. \tag{5.5}$$

To prove (5.4) suppose $f(x_n+) > 0$, and let $P^*$ have density

$$f^*(y) = \alpha f(y) I_{(0, x_n]}(y) + f(x_n+) I_{(x_n, x_n + d)}(y),$$

where $\alpha > 1$ and $d > 0$ are chosen such that $P^* \in \mathcal{P}$. Then (5.2) and (5.3) imply

$$\liminf_{D(N_x) \to 0} P^*(N_x)/P(N_x) = \min\{f_j^*(x)/f_j(x): j = 1, \ldots, 2^n\} = \alpha^{n-1} \ge 1$$

and

$$\liminf_{D(N_x) \to 0} P(N_x)/P^*(N_x) = (1/\alpha)^n < 1;$$

hence $P$ cannot be an m.l.e.
To prove (5.5) we can trivially exclude any $P$ as m.l.e. with $P(N_x) = 0$ for some $N_x
\in \mathcal{N}_x$. Hence suppose $f(x_n-) = 0$ and $P((x_n - d, x_n)) > 0$ for all $d > 0$. Let $P^* \in
\mathcal{P}$ with density $f^*$ be such that $f^*(x_n-) > 0$; then (5.3) implies $\liminf P^*(N_x)/P(N_x)
= \infty$ and (5.2) implies $\liminf P(N_x)/P^*(N_x) = 0$, and hence $P$ is not an m.l.e.
Let $P \in \mathcal{P}$ with density $f$ satisfying (5.4) and (5.5). Then as before

Modify $f_0$ as follows: at each jump point $z$ of $f_0$ extend $f_0$ continuously to the right a
small amount beyond $z$, at the same time lowering the plateau value of $f_0$ just prior
to $z$ by an increment $e > 0$ so that the resulting density $f^*$ is in $\mathcal{P}$; then

For $e > 0$ sufficiently small the last expression is greater than 1 if $P \ne P_0$. This
concludes the proof of the uniqueness of $P_0$ as an m.l.e.
ACKNOWLEDGEMENT
I would like to thank Professor Ron Pyke for his stimulating interest in this problem. Our
many discussions on this subject were very essential in formulating the final form of the
definition presented here. I would also like to thank Professor R. Berk for pointing out his
review (1967) of Zehna (1966) to me.

RÉSUMÉ
A unified definition of the maximum likelihood method of estimation is presented. It is based
on a comparison of two probability measures in a neighbourhood of the observed data point.
This definition does not have the inadequacies of earlier definitions, i.e., it does not depend
on the choice of the density version in the dominated case. The definition also applies to the
undominated case, i.e., it provides a consistent approach to nonparametric maximum likelihood
estimation problems which, until now, have been solved by ad hoc methods. It is shown that the
new definition of maximum likelihood constitutes an extension of the classical approach as used
in the dominated case. Parametric and nonparametric examples illustrate the new methodology.

REFERENCES
Barlow, R.E., Bartholomew, D.J., Bremner, J.M., and Brunk, H.D. (1972). Statistical Inference under
Order Restrictions. Wiley, New York.
Berk, R.H. (1967). Review of Zehna (1966). Math. Rev., 33, no. 1922.
Bernoulli, Daniel (1777). The most probable choice between several discrepant observations and the
formation therefrom of the most likely induction. (In Latin.) Acta Acad. Petrop., 3-33. [English
translation: Biometrika, 48 (1961), 3-13.]
Bickel, P.J., and Doksum, K.A. (1977). Mathematical Statistics: Basic Ideas and Selected Topics. Holden
Day, San Francisco.
Edwards, A.W.F. (1974). The history of likelihood. Internat. Statist. Rev., 42, 9-15.
Fisher, R.A. (1912). On an absolute criterion for fitting frequency curves. Messenger Math., 41, 155-160.
Fisher, R.A. (1922). On the mathematical foundations of theoretical statistics. Philos. Trans. Roy. Soc.
London Ser. A, 222, 309-368.
Grenander, U. (1956). On the theory of mortality measurements. Skand. Aktuarietidskr., 39, 125-153.
Kalbfleisch, J.D., and Prentice, R.L. (1980). The Statistical Analysis of Failure Time Data. Wiley, New
York.
Kalbfleisch, J.G. (1980). Probability and Statistical Inference. Volume II. Springer-Verlag, New York.
Kaplan, E.L., and Meier, P. (1958). Nonparametric estimation from incomplete observations. J. Amer.
Statist. Assoc., 53, 457-481.
Kempthorne, O., and Folks, L. (1971). Probability, Statistics, and Data Analysis. The Iowa State University
Press, Ames.
Kiefer, J., and Wolfowitz, J. (1956). Consistency of the maximum likelihood estimator in the presence of
infinitely many incidental parameters. Ann. Math. Statist., 27, 887-906.
Lambert, J.H. (1760). Photometria. Augustae Vindelicorum.
Rohatgi, V.K. (1976). An Introduction to Probability Theory and Mathematical Statistics. Wiley, New York.
Zehna, P.W. (1966). Invariance of maximum likelihood estimation. Ann. Math. Statist., 37, 744.

Received 23 January 1980

Energy Technology Applications (ETA) Division


Boeing Computer Services Company
565 Andover Park West
Tukwila, Washington 98188, U.S.A.