
La Revue Canadienne de Statistique


likelihood

F. W. SCHOLZ

definition of maximum likelihood.

AMS 1980 subject classifications: Primary 62A10; secondary 62G05.

ABSTRACT

A unified definition of maximum likelihood (ML) is given. It is based on a pairwise comparison of probability measures near the observed data point. This definition does not suffer from the usual inadequacies of earlier definitions, i.e., it does not depend on the choice of a density version in the dominated case. The definition covers the undominated case as well, i.e., it provides a consistent approach to nonparametric ML problems, which heretofore have been solved on a more or less ad hoc basis. It is shown that the new ML definition is a true extension of the classical ML approach, as it is practiced in the dominated case. Hence the classical methodology can simply be subsumed. Parametric and nonparametric examples are discussed.

1. INTRODUCTION

Although the method of maximum likelihood (MML) can be traced back to Lambert (1760) or Bernoulli (1777), it is generally agreed that Fisher (1912, 1922) rediscovered it and set the stage for its general acceptance in the statistical world.

This paper addresses a difficulty that arises in the definition of the method.

Traditionally the MML is motivated by discrete examples, where the likelihood $L(x, \theta)$ is defined to be just $P(X = x \mid \theta)$, and a maximum likelihood estimator (m.l.e.) $\hat\theta$ of $\theta$ is a value $\theta$ which maximizes $L(x, \theta)$ for fixed $x$. In this context, when one speaks of probabilities, the MML has much intuitive appeal, although its logical foundations are metaphysical, as already admitted by Bernoulli (1777). In the case of continuous distributions, which have densities $f(x \mid \theta)$ with respect to Lebesgue measure, the likelihood $L(x, \theta)$ is redefined to be $f(x \mid \theta)$. To link this definition with the previous one, it is noted that $P(X \in N(x) \mid \theta) \approx f(x \mid \theta)\, V(N(x))$, where $V(N(x))$ is the volume (independent of the parameter $\theta$) of a small neighbourhood $N(x)$ of $x$.

In this context it is of interest to quote Fisher (1912, p. 156): "If $f$ is an ordinate of the theoretical curve of unit area, then $p = f\,\delta x$ is the chance of an observation falling within the range $\delta x$; and if $\log P' = \sum_1^n \log p$ then $P'$ is proportional to the chance of a given set of observations occurring. The factors $\delta x$ are independent of the theoretical curve, so the probability of any particular set of $\theta$'s is proportional to $P$ where $\log P = \sum_1^n \log f$. The most probable set of values for the $\theta$'s will make $P$ a maximum."

*This research, partially supported by NSF grant no. MCS75-08557A01, was carried out at the University of Washington, and revised since the author joined the Boeing Computer Services Company.

SCHOLZ

194

Vol. 8, No. 2

The search for a new definition was originally motivated by the desire that the MML should also cover nonparametric situations. The difficulty here is that typically we will not have a common $\sigma$-finite dominating measure and hence cannot use densities as likelihoods. In spite of this, many nonparametric problems have been solved by the MML with acceptable solutions; however, it is not clear which extension of the definition has been employed. We are aware of only two attempts to widen the definition of the MML in order to cover the undominated case as well. The one by Kiefer and Wolfowitz (1956) essentially suggests a pairwise comparison of possible distributions $P$ and $Q$ for the data point $x$, by comparing $(dP/d\mu)(x)$ and $(dQ/d\mu)(x)$, where $\mu = P + Q$. However, the version of the Radon-Nikodym derivatives is not specified. Selection of various versions can lead to many different solutions for the MML. In this context consider the following example: Let $\varphi(x)$ be the standard normal density and define $\varphi^*(x) = \varphi(x)$ for $x \ne 1$ and $\varphi^*(1) = 10$. Let $\{P_\theta : \theta \in R\}$ be the family of probability distributions generated by the densities $f_\theta(x) = \varphi^*(x - \theta)$, $\theta \in R$. Now $P_\theta = N(\theta, 1)$ and the m.l.e. of $\theta$ (or $P_\theta$) upon observing $x$ ought to be $x$ (or $P_x$). However, if one takes the above versions $f_\theta$ as densities, i.e., as likelihoods, the m.l.e. will be $x - 1$ for all $x$.
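The effect of the altered density version can be checked numerically. The following is a minimal sketch, not part of the original paper: the observed value `x = 2.0`, the grid of candidate parameters and the helper names are illustrative assumptions. It maximizes the altered version over the grid, and then maximizes the probability of a small interval around `x`, which does not depend on the density version.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def phi_star(z):
    """Version of phi changed on the null set {1}: phi_star(1) = 10."""
    return 10.0 if z == 1.0 else phi(z)

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def interval_prob(theta, x, eps):
    """P_theta((x - eps, x + eps)) for P_theta = N(theta, 1)."""
    return normal_cdf(x + eps - theta) - normal_cdf(x - eps - theta)

x = 2.0                                   # hypothetical observation
grid = [k / 4.0 for k in range(-8, 25)]   # candidate theta: -2, -1.75, ..., 6

# Maximizing the altered density version picks theta = x - 1.
mle_bad_version = max(grid, key=lambda t: phi_star(x - t))

# Maximizing the probability of a small neighbourhood of x picks theta = x.
mle_neighbourhood = max(grid, key=lambda t: interval_prob(t, x, 1e-3))
```

On this grid the first maximizer is `x - 1` and the second is `x`, which is the discrepancy the example describes.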

It appears then that not only is the Kiefer-Wolfowitz definition defective in this respect but so is the classical definition as well, unless one considers the density versions as given entities in any statistical model for data. However, the wrong versions could have been specified and we will not escape the consequences of the above example. Further, it is not clear how to select unproblematic density versions in the Kiefer-Wolfowitz approach should one select to represent the statistical model in terms of given densities.

The only other approach we are aware of for extending the MML towards nonparametric problems is outlined in the new book by Kalbfleisch and Prentice (1980). Motivating the likelihood in the derivation of the Kaplan-Meier estimator, they suggest a discretization of the data, via the concept of rounding errors (cf. also Kempthorne and Folks (1971) and Kalbfleisch (1971)), and then apply the MML to the discrete (multinomial) problem and let the rounding error tend to zero, hoping that the discrete m.l.e. will converge to a limit. Kalbfleisch and Prentice suggest without further support that this will provide an extension of the usual MML. We did consider providing rigorous support for this last claim. However, the questions of existence of the limiting m.l.e. (as the grouping gets finer) and whether this limit would depend on the grouping employed presented major obstacles, and it remained unclear whether an elegant or even satisfactory approach could be found this way.

In the next section we will present a new definition of the MML which is based on a pairwise comparison of probability measures in the neighbourhood of the observed data point $x$. Thus, in a sense, the new definition is a marriage of the two extension proposals discussed above.

2. DEFINITIONS

Let $\mathcal{X}$ be a metric space with metric $d$ and let $\mathcal{P}$ be a family of probability measures on the Borel sets of $\mathcal{X}$. For any (data) point $x \in \mathcal{X}$ let $\mathcal{N}_x$ denote the family of all measurable sets $N_x$ which contain $x$ as an interior point. By $D(N_x)$ denote the diameter of the set $N_x$.

195

1980

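DEFINITION 1. For $P, Q \in \mathcal{P}$ write $P \overset{x}{\ge} Q$ if

$$\lim_{\epsilon \to 0} \inf\{P(N_x)/Q(N_x) : N_x \in \mathcal{N}_x \text{ with } D(N_x) \le \epsilon\} \ge 1.$$

For convenience we will write the above limit also suggestively as follows:

$$\liminf_{D(N_x) \to 0} P(N_x)/Q(N_x).$$

DEFINITION 2. For $P, Q \in \mathcal{P}$ write $P \overset{x}{=} Q$ when $P \overset{x}{\ge} Q$ and $Q \overset{x}{\ge} P$. Then $P$ and $Q$ are called equivalent at $x$.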

Comments.

(i) $P \overset{x}{=} Q$ if and only if $\lim_{D(N_x) \to 0} P(N_x)/Q(N_x)$ exists and equals 1.

(ii) If $P \overset{x}{\ge} Q$ and $Q \overset{x}{\ge} R$ then $P \overset{x}{\ge} R$ (transitivity). [We could not establish the same transitivity relationship for the Kiefer-Wolfowitz definition, the many density versions representing the basic difficulty.]

(iv) If $\{P\}_x := \{Q \in \mathcal{P} : Q \overset{x}{=} P\}$ and if $\mathcal{P}_x := \{\{P\}_x : P \in \mathcal{P}\}$ then $(\mathcal{P}_x, \overset{x}{\ge})$ is a partially ordered set.

DEFINITION 3. The statistic $P_0 \in \mathcal{P}$ is a maximum likelihood estimator (m.l.e.) with respect to $x \in \mathcal{X}$ and $\mathcal{P}$ if for every $Q \in \mathcal{P}$ such that $Q \overset{x}{\ge} P_0$ it follows that $Q \overset{x}{=} P_0$. That is, $P_0$ is an m.l.e. if and only if there does not exist a $Q \in \mathcal{P} - \{P_0\}_x$ such that $Q \overset{x}{\ge} P_0$, or equivalently

$$\liminf_{D(N_x) \to 0} Q(N_x)/P_0(N_x) < 1 \quad \text{for all } Q \in \mathcal{P} - \{P_0\}_x.$$

The existence of an m.l.e. according to this definition cannot be taken for granted and depends on the underlying problem. Further, m.l.e.s may not be unique, since we may either have a whole equivalence class of them or the situation may arise where several equivalence classes of m.l.e.s exist which are not comparable with respect to $\overset{x}{\ge}$.

Qr

P; then by our

Now we will give some criteria that will clarify the relationship between the new

definition and the classical approach:

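(A) Suppose $P, Q \in \mathcal{P}$ are dominated by a $\sigma$-finite measure $\mu$, and suppose $P$ and $Q$ have density versions $p(\cdot)$ and $q(\cdot)$ (w.r.t. $\mu$) which are continuous at $x$.

Case 1. If $\mu(N_x) = 0$ for some $N_x \in \mathcal{N}_x$, then

$$\lim_{D(N_x) \to 0} Q(N_x)/P(N_x) = 1.$$

Case 2. If $\mu(N_x) > 0$ for every $N_x \in \mathcal{N}_x$, then

$$\lim_{D(N_x) \to 0} P(N_x)/\mu(N_x) \text{ exists} = p(x),$$

and likewise for $Q$ and $q(x)$. If in addition $p(x) + q(x) > 0$, then

$$P \overset{x}{\ge} Q \iff p(x) \ge q(x).$$

The case $p(x) + q(x) = 0$ has to be considered on an individual basis in each given problem.

(B) If $P(\{x\}) + Q(\{x\}) > 0$, then

$$\lim_{D(N_x) \to 0} P(N_x)/Q(N_x) = P(\{x\})/Q(\{x\})$$

and hence

$$P \overset{x}{\ge} Q \iff P(\{x\}) \ge Q(\{x\}).$$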

Criteria (A) and (B) show the agreement between the new and the classical

definition. This means that the classical methodology for finding m.l.e.s is simply

subsumed in the methodology for the new definition. Further, the new definition

points to those density versions (continuous at x , if they exist) that should be used in

the classical definition in order for agreement to occur between the definitions. If

there exists a continuous density version it seems only natural to insist on its use in

the classical definition, since densities are supposed to represent the localized

properties of probability distributions. Unresolved are those cases where density

versions, which are continuous at x do not exist. In such cases no natural density

version would suggest itself for the classical approach whereas the new definition

shows how to deal with this question in a satisfactory way, as will be seen from the

following examples.

3. EXAMPLES

In the following examples $P$ ($Q$, ...) may denote either the distribution of one

random variable or the distribution of a random sample of such random variables,

i.e., P serves as a parameter as well. From the context it should be clear what is

meant so that no confusion arises while at the same time we avoid notational

complexities.

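Example 3.1. Let $X_1, \dots, X_n$ be independently and identically distributed as $P \in \mathcal{P}$, the class of all probability distributions on the real line. Let $P_n$ be the empirical distribution corresponding to the observed data vector $x = (x_1, \dots, x_n)$, i.e.,

$$P_n(A) = \sum_{i=1}^{n} I_A(x_i)/n$$

where $I_A(z) = 1$ if $z \in A$ and $I_A(z) = 0$ otherwise. Then $P_n$ is the unique m.l.e. Indeed,

$$P_n(\{x\}) = \prod_{j=1}^{h} (m_j/n)^{m_j} > 0,$$

where $y_1 < \dots < y_h$ represent the distinct values among $x_1, \dots, x_n$ appearing with respective multiplicities $m_1, \dots, m_h$. Furthermore note that by simple application of Jensen's inequality one has, for every $Q \in \mathcal{P}$,

$$Q(\{x\}) = \prod_{j=1}^{h} Q(\{y_j\})^{m_j} \le \prod_{j=1}^{h} (m_j/n)^{m_j} = P_n(\{x\}),$$

with equality only if $Q = P_n$. Hence, by (B),

$$Q \overset{x}{\ge} P_n \iff Q(\{x\}) \ge P_n(\{x\}) \iff P_n = Q,$$

i.e., $P_n$ is the unique m.l.e. Q.E.D.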
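The Jensen step in Example 3.1 can be spot-checked numerically. This is a minimal sketch under stated assumptions, not the author's computation: the sample values, the random seed and the helper name `log_lik` are hypothetical, and only competitors supported on the observed values are tried (any mass placed elsewhere can only lower the product).

```python
import math
import random
from collections import Counter

def log_lik(weights, counts):
    """log of prod_j q_j ** m_j, the probability a discrete Q gives the sample point."""
    return sum(m * math.log(weights[y]) for y, m in counts.items())

sample = [1.2, 0.7, 1.2, 3.1, 0.7, 1.2]      # hypothetical data with ties
counts = Counter(sample)                     # multiplicities m_j of distinct values y_j
n = len(sample)
empirical = {y: m / n for y, m in counts.items()}
best = log_lik(empirical, counts)

# Try many random competitors Q on the same support; none should beat P_n.
rng = random.Random(0)
never_beaten = True
for _ in range(1000):
    raw = {y: rng.random() + 1e-9 for y in counts}
    total = sum(raw.values())
    q = {y: w / total for y, w in raw.items()}
    if log_lik(q, counts) > best + 1e-12:
        never_beaten = False
```

The empirical weights `m_j / n` are never beaten, in line with the inequality above.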

It seems fitting to give this example first, since in some respect it was the first one

"solved" through the MML, namely by Lambert (1760), cf. Edwards (1974).

Example 3.2. Let $X_1, \dots, X_n$ be independently and identically distributed $U(0, \theta)$, $\theta > 0$ (uniform on $(0, \theta)$); then $P_{\theta_0} = U(0, \theta_0)$, with $\theta_0 = \max(x_1, \dots, x_n)$, is the unique m.l.e. based on the observed data vector $x = (x_1, \dots, x_n)$, since

(a) if $\theta < \theta_0$ then $P_\theta(N_x) = 0$ for $D(N_x)$ small, and $P_{\theta_0}(N_x) > 0$ for all $N_x \in \mathcal{N}_x$, so

$$\liminf_{D(N_x) \to 0} P_\theta(N_x)/P_{\theta_0}(N_x) = 0 < 1.$$

(b) if $\theta > \theta_0$, let

$$N_x^* = \{(y_1, \dots, y_n) \in N_x : \max_i y_i \le \theta_0\}$$

and let $\lambda$ denote Lebesgue measure on $R^n$. Then

$$\liminf_{D(N_x) \to 0} P_\theta(N_x)/P_{\theta_0}(N_x) = \liminf_{D(N_x) \to 0} (\theta_0/\theta)^n \bigl(\lambda(N_x)/\lambda(N_x^*)\bigr) = (\theta_0/\theta)^n < 1$$

since the ratio $\lambda(N_x)/\lambda(N_x^*)$ ($\ge 1$) can approximate 1 by appropriate choice of $N_x$.

Thus $P_{\theta_0}$ is an m.l.e., and the uniqueness follows easily. Here the classical approach can either produce the same solution or no solution or any other solution depending on the choice of density. Note that in this example there is no density for $P_{\theta_0}$ which is continuous at $x$.
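The role of the neighbourhood shape in part (b) can be illustrated with rectangular neighbourhoods. The sketch below is an illustration only: the data, the alternative parameter 1.2 and the `skew` parameterization of the boxes are assumptions introduced here. A symmetric cube around `x` loses half of its volume under the candidate m.l.e., while a box that barely extends above each coordinate gives a ratio close to the limiting value.

```python
def uniform_box_prob(theta, lower, upper):
    """P_theta of the box prod_i (lower[i], upper[i]) for n i.i.d. U(0, theta) draws."""
    prob = 1.0
    for lo, hi in zip(lower, upper):
        overlap = max(0.0, min(hi, theta) - max(lo, 0.0))
        prob *= overlap / theta
    return prob

def ratio(theta, x, eps, skew):
    """P_theta(N_x) / P_theta0(N_x) for a box reaching eps below and skew*eps above each x_i."""
    theta0 = max(x)
    lower = [xi - eps for xi in x]
    upper = [xi + skew * eps for xi in x]
    return uniform_box_prob(theta, lower, upper) / uniform_box_prob(theta0, lower, upper)

x = [0.3, 0.9, 0.5]                      # hypothetical sample, theta0 = 0.9
theta0, theta = max(x), 1.2
limit = (theta0 / theta) ** len(x)       # (theta0 / theta)^n

symmetric = ratio(theta, x, 1e-3, 1.0)   # cube: lambda(N_x) / lambda(N_x*) = 2
skewed = ratio(theta, x, 1e-3, 1e-6)     # box hugging x from below: ratio near the limit
below = ratio(0.8, x, 1e-3, 1.0)         # theta < theta0: P_theta(N_x) = 0
```

The symmetric cube gives twice the limiting value, which is why the definition takes the infimum over all small neighbourhoods rather than fixing one shape.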

Example 3.3. Let $X_1, \dots, X_n$ be independently and identically distributed $U(\theta - \tfrac12, \theta + \tfrac12)$, $\theta \in R$ (uniform on $(\theta - \tfrac12, \theta + \tfrac12)$); then any $P_{\theta_0} = U(\theta_0 - \tfrac12, \theta_0 + \tfrac12)$, with $\theta_0 \in I = (x_{(n)} - \tfrac12, x_{(1)} + \tfrac12)$, is an m.l.e. based on the observed data vector $x = (x_1, \dots, x_n)$; here $x_{(1)} = \min(x_1, \dots, x_n)$ and $x_{(n)} = \max(x_1, \dots, x_n)$.

Proof. First note that $P_{\theta_0} \overset{x}{=} P_{\theta_1}$ for $\theta_0, \theta_1 \in I$, since $P_{\theta_0}$ and $P_{\theta_1}$ admit densities which are continuous at $x$ with the same value 1 at $x$. Now let $\theta_0 \in I$ and $\theta_1 = x_{(n)} - \tfrac12 < \theta_0$ and let

$$N_x^* = N_x \cap \{y \in R^n : \max_i y_i \le \theta_1 + \tfrac12\};$$

then

$$\liminf_{D(N_x) \to 0} P_{\theta_1}(N_x)/P_{\theta_0}(N_x) = \liminf_{D(N_x) \to 0} \lambda(N_x^*)/\lambda(N_x) = 0 < 1,$$

hence establishing the claim.

… our definition. Note that Rohatgi (1976, p. 379) gives the closed interval $[x_{(n)} - \tfrac12, x_{(1)} + \tfrac12]$ as the set of all m.l.e.s, which is due to the choice of density employed. This arbitrariness is resolved by our definition in a manner which appears satisfactory.
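Why an endpoint of the interval in Example 3.3 fails to be an m.l.e. can be seen with rectangular neighbourhoods that lean to the right of the data point. The sketch is illustrative only; the data, the interior parameter 0.2 and the `skew` parameterization are assumptions made here.

```python
def box_prob(theta, lower, upper):
    """P_theta of the box prod_i (lower[i], upper[i]) for n i.i.d. U(theta - 1/2, theta + 1/2) draws."""
    prob = 1.0
    for lo, hi in zip(lower, upper):
        prob *= max(0.0, min(hi, theta + 0.5) - max(lo, theta - 0.5))
    return prob

x = [0.1, 0.4, 0.3]           # hypothetical sample
theta_inside = 0.2            # interior point of I = (max(x) - 1/2, min(x) + 1/2) = (-0.1, 0.6)
theta_edge = max(x) - 0.5     # left endpoint of I
eps = 1e-3

# Boxes reaching eps above each x_i but only eps/skew below it.
ratios = []
for skew in (1.0, 10.0, 100.0, 1000.0):
    lower = [xi - eps / skew for xi in x]
    upper = [xi + eps for xi in x]
    ratios.append(box_prob(theta_edge, lower, upper) / box_prob(theta_inside, lower, upper))
```

For these boxes the ratio equals `1 / (1 + skew)`, so it can be driven towards 0 while the diameter stays small; the endpoint distribution is therefore strictly dominated at `x` by every interior one.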

Example 3.4. The Kaplan-Meier estimator: Suppose $T_1, \dots, T_n$ are independently and identically distributed as $F$, where $F \in \mathcal{F}$, the family of all distribution functions on $[0, \infty]$; the $T_i$ represent latent failure times. We also have censoring times $C_1, \dots, C_n$ independently and identically distributed as $G$, where $G \in \mathcal{F}$. Assume that $C = (C_1, \dots, C_n)$ and $T = (T_1, \dots, T_n)$ are independent. For $i = 1, \dots, n$ we observe the actual failure times $X_i = \min(T_i, C_i)$ and the censoring indicators $\delta_i$. The distribution of $(X_1, \delta_1, \dots, X_n, \delta_n)$ will be denoted by $P_{F,G}$ and an observed value of $(X, \delta)$ is denoted by $(x, d)$. Note that as in Example 3.1 there is no common $\sigma$-finite dominating measure for $\{P_{F,G} : F, G \in \mathcal{F}\}$, and hence no likelihood in the classical sense. The approach taken so far, cf. Kaplan and Meier (1958), is to maximize the probability of the observed point,

$$P_{F,G}(X = x, \delta = d). \tag{3.1}$$

The conventional MML fails to justify (3.1) as a starting point; the new definition, however, easily leads us to (3.1). This follows from our discussion in (B) above in Section 2 and the fact that one can find $(F_0, G_0) \in \mathcal{F} \times \mathcal{F}$ such that $P_{F_0,G_0}(X = x, \delta = d) > 0$.
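For readers who want to see the estimator that results from this maximization, here is a minimal sketch of the standard product-limit formula of Kaplan and Meier (1958). It is not taken from the text above: the function name, the data and the convention that failures precede censorings at tied times are assumptions of this illustration.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (t, S(t)) at each distinct observed failure time.

    events[i] is 1 for an observed failure and 0 for a censored observation.
    At tied times, failures are treated as occurring before censorings.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for u, e in data if u == t and e == 1)
        removed = sum(1 for u, e in data if u == t)
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk   # factor (1 - d_j / n_j)
            curve.append((t, surv))
        n_at_risk -= removed
        i += removed
    return curve

# Hypothetical data: failures at 2, 3, 5; censored observations at 3 and 8.
curve = kaplan_meier([2.0, 3.0, 3.0, 5.0, 8.0], [1, 1, 0, 1, 0])
```

With these data the estimated survival function drops to 0.8, 0.6 and 0.3 at times 2, 3 and 5, and keeps mass 0.3 beyond the largest (censored) observation.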

Example 3.5. The Multivariate Normal Distribution: Let $X_1, \dots, X_n$ be independently and identically distributed $N_p(m, B)$ with $m \in R^p$ and $B$ a $p \times p$ covariance matrix. Although $\hat m$ and $\hat B$, the conventional m.l.e.s of $m$, $B$, are always well defined as algebraic expressions, they no longer could be called m.l.e.s in the classical sense when $n \le p$ or rank $\hat B < p$. However, under the new definition $N_p(\hat m, \hat B)$ emerges again as the unique m.l.e. provided we let

$$\mathcal{P} = \{N_p(m, B) : m \in R^p,\ B \text{ covariance matrix of rank} \le p\}.$$

Example 3.6. Let $X_1, \dots, X_n$ be independently and identically distributed as $P \in \mathcal{P}$ = family of all probability measures on the positive real line which admit a monotone decreasing density with respect to Lebesgue measure. The solution of this problem was first given by Grenander (1956). The unique m.l.e. $P_0$ admits a density $f_0(x)$, which is a step function with steps at $x_1 < \dots < x_n$ and

$$f_0(x) = \min_{k \le i-1} \max_{l \ge i} \frac{l - k}{n (x_l - x_k)} \quad \text{for } x_{i-1} < x < x_i,\ i = 1, \dots, n \quad (x_0 = 0),$$

with $f_0(x) = 0$ for $x > x_n$. Here $x_1 < \dots < x_n$ denote the ordered observed values of the sample. Since $f_0$ exhibits discontinuities at the observed data values, it is not evident how the likelihood ought to be defined in the classical sense. Should one take the left or right limits, or some value in between to account for the fact that the density at $x$ ought to reflect the probability in a small neighbourhood around $x$? It turns out that the new definition will produce the $P_0$ given above as the unique m.l.e. for this problem. The derivation, which is somewhat involved, is given in the Appendix, Section 5.


4. CONCLUDING REMARKS

After making a case for the necessity of a more careful and broader definition for the MML, a new definition has been put forth. In the few examples presented above it has performed well. Many more examples, dominated or not, could and should be treated with more or less ease, and we hope that the versatility of the definition, as we have experienced it so far, will stand up under future tests.

This new definition will not remove inconsistent m.l.e.s and replace them by consistent ones, as the following example demonstrates, cf. Barlow (1972). Let $X_1, \dots, X_n$ be independently and identically distributed as $F \in \mathcal{F}$, the family of all starshaped distribution functions on $[0, 1]$. Here $F$ is starshaped on $[0, 1]$ if $F(x)/x$ is nondecreasing on $[0, 1]$. One easily shows, using the new definition and following Barlow (1972), that the m.l.e. of $F$ is

n

F(x)= x

Z[X,~~I/~,

i-l

Finally we comment on the question: what is the m.l.e. of $g(P)$, where $g$ is some known functional defined on $\mathcal{P}$? Since $g(P)$ typically is not a probability measure it does not make sense to apply our definition. Zehna's theorem (1966), which states that $g(\hat P)$ is an m.l.e. of $g(P)$ ($\hat P$ being an m.l.e. of $P$), requires the introduction of an "induced likelihood" which is no likelihood in the classical sense, cf. Berk (1967). Rather than give this artificial definition of an "induced likelihood" and prove the above theorem we may as well define $g(\hat P)$ outright as the m.l.e. of $g(P)$, cf. Bickel and Doksum (1977), or we may follow Berk's suggestion: Adjoin another functional $h$ to $g$ such that $(h, g)$ represents a one-to-one map defined on $\mathcal{P}$. Clearly one could then accept $(h(\hat P), g(\hat P))$ as m.l.e. of $(h(P), g(P))$ and by common convention $g(\hat P)$ as m.l.e. of $g(P)$.

5. APPENDIX

Let $x$ be the observed data vector and $y = (y_1, \dots, y_n)$ the corresponding vector of ordered observations; then let $L$ be the linear permutation map that maps $x$ into $y$. If $N_x \in \mathcal{N}_x$ with $D(N_x) \le \epsilon$ then $N_y := L N_x \in \mathcal{N}_y$ with $D(N_y) = D(N_x) \le \epsilon$ and $P(N_x) = P(N_y)$ for all $P \in \mathcal{P}$. Therefore we may, without loss of generality, assume that the observed data vector $x$ is ordered a priori, i.e., $x_1 < \dots < x_n$.

Let $N_x^j$, $j = 1, \dots, 2^n$, represent the intersections of $N_x \in \mathcal{N}_x$ with the $2^n$ open quadrants with vertex at $x$, i.e.,

$$N_x^1 = N_x \cap \bigcap_{i=1}^{n} \{y \in R^n : y_i < x_i\},$$

and correspondingly for the remaining quadrants.

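Then, for $P_1, P_2 \in \mathcal{P}$ and with $\lambda$ denoting Lebesgue measure on $R^n$,

$$\frac{P_1(N_x)}{P_2(N_x)} = \frac{\sum_{j=1}^{2^n} r_j(N_x) A_{1j}(N_x)}{\sum_{j=1}^{2^n} r_j(N_x) A_{2j}(N_x)} \tag{5.1}$$

where

$$A_{ij}(N_x) = P_i(N_x^j)/\lambda(N_x^j); \quad r_j(N_x) = \lambda(N_x^j)/\lambda(N_x), \quad i = 1, 2,\ j = 1, \dots, 2^n.$$

For each $j$ the limit $f_{ij}(x)$ of $A_{ij}(N_x)$ as $D(N_x) \to 0$ exists, e.g.,

$$f_{i2^n}(x) = \prod_{k=1}^{n} f_i(x_k+) \quad \text{for } i = 1, 2.$$

We remark here that the $f_{ij}(x)$ are independent of the density version $f_i$ employed. Since in the following we will always use only $f(x_k-)$ and $f(x_k+)$, $k = 1, \dots, n$, there should be no ambiguity if we identify $P$ and some corresponding density version $f$ of $P$.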

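We now state two lemmas whose proofs are straightforward and omitted:

LEMMA 5.1. Let $A_{ij} \ge 0$; $j = 1, \dots, k$, $i = 1, 2$, and let

$$\mathcal{S} = \{r = (r_1, \dots, r_k) \in R^k : r_j > 0,\ j = 1, \dots, k,\ \textstyle\sum_1^k r_j = 1\}.$$

Then

$$\inf_{r \in \mathcal{S}} \frac{\sum_1^k r_j A_{1j}}{\sum_1^k r_j A_{2j}} = \min\{A_{1j}/A_{2j} : j \text{ such that } A_{2j} > 0\}.$$

LEMMA 5.2. With $r_j(N_x)$ defined as in (5.1) and $\mathcal{S}$ as in Lemma 5.1, it follows that with $k = 2^n$,

$$\{(r_1(N_x), \dots, r_k(N_x)) : N_x \in \mathcal{N}_x,\ D(N_x) \le \epsilon\} = \mathcal{S}.$$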

(5.2)

lim P1 (Nx)/PZ(Nx )

2

r-0

N,.k>

D(N,kP

We will now show that $P_0$ (with density $f_0$ as defined in Example 3.6) is an m.l.e. Note that $f_0(x_n+) = 0$ and $f_0(x_n-) > 0$, which in conjunction with (5.2) and (5.3) implies that

$$L(P, P_0) = \min\{f_j(x)/f_{0j}(x) : j \text{ such that } f_{0j}(x) > 0\}$$

for any $P \in \mathcal{P}$ and $f$ a density version of $P$. We will show in the following steps that $L(P, P_0) < 1$ for all $P \in \mathcal{P}$ such that $P \ne P_0$, hence establishing that $P_0$ is an m.l.e.

(a) Let $\mathcal{P}_1 \subset \mathcal{P}$ be the following subfamily of distributions: $P \in \mathcal{P}_1$ if and only if $P$ admits a density $f \in \mathcal{F}$ satisfying the following two conditions:

(i) $f = 0$ on $(x_n, \infty)$;

(ii) $f$ is a step function on $(0, x_n)$ with at most one step in each interval $(x_i, x_{i+1})$, $i = 0, \dots, n - 1$ ($x_0 = 0$), and in case of a step in $(x_i, x_{i+1})$ $f$ is continuous at $x_i$.

The following considerations show that our problem can be reduced to showing $L(P, P_0) < 1$ for all $P \in \mathcal{P}_1$ with $P \ne P_0$.

(I) If $f \in \mathcal{F}$ is such that $f(x_i+) > f(x_{i+1}-)$, let $f_1 \in \mathcal{F}$ be chosen such that $f_1$ coincides with $f$ outside $J = (x_i, x_{i+1})$, $f_1$ is a step function on $J$ with exactly one step in $J$, and $f_1(x_i+) = f(x_i+)$, $f_1(x_{i+1}-) = f(x_{i+1}-)$. Then $L(f_1, f_0) = L(f, f_0)$.

(II) If $f_1$ is as in (I) and $f_1(x_i-) > f_1(x_i+)$, let $f_2 \in \mathcal{F}$ be chosen such that $f_2$ coincides with $f_1$ outside $J$, $f_2$ is a step function on $J$ with exactly one step in $J$, and $f_2(x_i+) = f_1(x_i-)$ and $f_2(x_{i+1}-) = f_1(x_{i+1}-)$. Then $L(f_2, f_0) \ge L(f_1, f_0)$.

Note that neither step I nor step II, if carried out, will lead to $P_1 = P_0$ or $P_2 = P_0$.

(III) If $f \in \mathcal{F}$ and $f(x_n+) > 0$ let $f_1(x) = k f(x) I_{(0, x_n)}(x)$ with $k > 1$ so that $f_1 \in \mathcal{F}$. Then $L(f, f_0) < L(f_1, f_0)$.

Hence it remains to show $L(P, P_0) < 1$ for all $P \in \mathcal{P}_1$ with $P \ne P_0$.

A---f(Xl+> - fCxr-)

f<Xt->

fo(xr-)

fo(xr-)

fo(Xl+)

f(xr->

---

_fCXl+>

fo(xr-)

fo(xr+)

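$$f_1(x) = \begin{cases} f(x) & \text{for } x < x_i \\ f(x_i-) & \text{for } x_i \le x < x_i + \epsilon,\ \epsilon > 0 \\ \alpha f(x_i+) & \text{for } x_i + \epsilon \le x < z,\ z = \text{locus of the next jump of } f \text{ following } x_i \\ f(x) & \text{for } x \ge z. \end{cases}$$

Here $\epsilon$ and $\alpha$ should be chosen so that $f_1 \in \mathcal{F}_1$; in fact $\alpha$ can be chosen arbitrarily close to 1 by taking $\epsilon > 0$ sufficiently small, so that $L(f, f_0) < L(f_1, f_0)$ and $f_1$ is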


n

L<f,fo>=

n f<Xl-)/fo(x~-)

1-1

n

= n fi(x,-)/fo(xt-)

1-1

(c) It remains to show that $f_0$ yields the unique maximum of $\prod_{i=1}^{n} f(x_i-)$ over $f \in \mathcal{F}$. This was shown by Grenander (1956), cf. also Barlow (1972, p. 223 ff). The problem basically reduces to maximizing

$$\prod_{i=1}^{n} a_i$$

subject to

$$a_1 \ge \dots \ge a_n \ge 0 \quad \text{and} \quad \sum_{i=1}^{n} a_i (x_i - x_{i-1}) = 1 \quad (x_0 = 0).$$

The solution is

$$a_i = \min_{k \le i-1} \max_{l \ge i} \frac{l - k}{n (x_l - x_k)}, \quad i = 1, \dots, n.$$
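The min-max expression for the step heights $a_i$ can be evaluated directly for small samples. The following is a brute-force sketch, not an efficient algorithm and not part of the original derivation; the function name and data are hypothetical, and it assumes distinct positive observations with $x_0 = 0$.

```python
def grenander_heights(sample):
    """Step heights a_1..a_n via a_i = min_{k <= i-1} max_{l >= i} (l - k) / (n (x_l - x_k)).

    Assumes distinct, strictly positive observations; x_0 = 0 by convention.
    """
    xs = [0.0] + sorted(sample)          # x_0 = 0 followed by the ordered observations
    n = len(sample)
    heights = []
    for i in range(1, n + 1):
        a_i = min(
            max((l - k) / (n * (xs[l] - xs[k])) for l in range(i, n + 1))
            for k in range(0, i)
        )
        heights.append(a_i)
    return xs, heights

xs, a = grenander_heights([4.0, 1.0, 2.0])

# Total mass of the step density: sum of a_i * (x_i - x_{i-1}).
mass = sum(a[i] * (xs[i + 1] - xs[i]) for i in range(len(a)))
```

For the sample 1, 2, 4 the heights are 1/3, 1/3 and 1/6 on the intervals (0, 1), (1, 2) and (2, 4). They are nonincreasing and integrate to 1, so both constraints of the maximization problem are met.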

This proves that $P_0$ is in fact an m.l.e. according to the new definition. It remains to show that no other $P \in \mathcal{P}$ can be an m.l.e.

First we claim that any m.l.e. $P$ with density $f$ must by necessity satisfy the following conditions:

$$f(x_n+) = 0 \tag{5.4}$$

$$f(x_n-) > 0. \tag{5.5}$$

To prove (5.4) suppose $f(x_n+) > 0$, and let $P^*$ have density

$$f^*(y) = \alpha f(y) I_{(0, x_n)}(y) + f(x_n+) I_{[x_n, x_n + \delta)}(y),$$

where $\alpha > 1$ and $\delta > 0$ are chosen such that $P^* \in \mathcal{P}$. Then (5.2) and (5.3) imply

$$\lim P^*(N_x)/P(N_x) = \min\{f_j^*(x)/f_j(x) : j = 1, \dots, 2^n\} = \alpha^{n-1} \ge 1$$

and

$$\lim P(N_x)/P^*(N_x) = \alpha^{-n} < 1,$$

hence $P$ cannot be an m.l.e.

To prove (5.5) we can trivially exclude any $P$ as m.l.e. with $P(N_x) = 0$ for some $N_x \in \mathcal{N}_x$. Hence suppose $f(x_n-) = 0$ and $P((x_n - \delta, x_n)) > 0$ for all $\delta > 0$. Let $P^* \in \mathcal{P}$ with density $f^*$ be such that $f^*(x_n-) > 0$; then (5.3) implies $\lim P^*(N_x)/P(N_x) = \infty$ and (5.2) implies $\lim P(N_x)/P^*(N_x) = 0$, and hence $P$ is no m.l.e.

Let $P \in \mathcal{P}$ with density $f$ satisfying (5.4) and (5.5). Then as before

…

Modify $f_0$ as follows: at each jump point $z$ of $f_0$ extend $f_0$ continuously to the right a small amount beyond $z$, at the same time lowering the plateau value of $f_0$ just prior to $z$ by an increment $\epsilon > 0$ so that the resulting density $f^*$ is in $\mathcal{F}$; then

…

For $\epsilon > 0$ sufficiently small the last expression is greater than 1 if $P \ne P_0$. This concludes the proof of the uniqueness of $P_0$ as an m.l.e.

ACKNOWLEDGEMENT

I would like to thank Professor Ron Pyke for his stimulating interest in this problem. Our

many discussions on this subject were very essential in formulating the final form of the

definition presented here. I would also like to thank Professor R. Berk for pointing out his

review (1967) of Zehna (1966) to me.

RÉSUMÉ

On présente une définition unifiée de la méthode d'estimation du maximum de vraisemblance. Elle est basée sur une comparaison de deux mesures de probabilité dans un voisinage de la donnée observée. Cette définition n'a pas les insuffisances des définitions antérieures, i.e., elle ne dépend pas du choix de la version de la densité dans le cas dominé. La définition s'applique également au cas non dominé, i.e., elle procure une approche cohérente à des problèmes non paramétriques d'estimation du maximum de vraisemblance qui, jusqu'à présent, ont été résolus à l'aide de méthodes ad hoc. On montre que la nouvelle définition du maximum de vraisemblance constitue une extension de l'approche classique telle qu'utilisée dans le cas dominé. Des exemples paramétriques et non paramétriques illustrent la nouvelle méthodologie.

REFERENCES

Barlow, R.E.; Bartholomew, D.J.; Bremner, J.M., and Brunk, H.D. (1972). Statistical Inference under Order Restrictions. Wiley, New York.

Berk, R.H. (1967). Review of Zehna (1966). Math. Rev., 33, no. 1922.

Bernoulli, Daniel (1777). The most probable choice between several discrepant observations and the formation therefrom of the most likely induction. (In Latin.) Acta Acad. Petrop., 3-33. [English translation: Biometrika, 48 (1961), 3-13.]

Bickel, P.J., and Doksum, K.A. (1977). Mathematical Statistics: Basic Ideas and Selected Topics. Holden Day, San Francisco.

Edwards, A.W.F. (1974). The history of likelihood. Internat. Statist. Rev., 42, 9-15.

Fisher, R.A. (1912). On an absolute criterion for fitting frequency curves. Messenger Math., 41, 155-160.

Fisher, R.A. (1922). On the mathematical foundation of theoretical statistics. Philos. Trans. Roy. Soc. London Ser. A, 222, 309-368.

Grenander, U. (1956). On the theory of mortality measurements. Skand. Aktuarietidskr., 39, 125-153.

Kalbfleisch, J.D., and Prentice, R.L. (1980). The Statistical Analysis of Failure Time Data. Wiley, New York.

Kalbfleisch, J.G. (1980). Probability and Statistical Inference. Volume II. Springer-Verlag, New York.

Kaplan, E.L., and Meier, P. (1958). Nonparametric estimation from incomplete observations. J. Amer. Statist. Assoc., 53, 457-481.

Kempthorne, O., and Folks, L. (1971). Probability, Statistics, and Data Analysis. The Iowa State University Press, Ames.

Kiefer, J., and Wolfowitz, J. (1956). Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann. Math. Statist., 27, 887-906.

Lambert, J.H. (1760). Photometria. Augustae Vindelicorum.

Rohatgi, V.K. (1976). An Introduction to Probability Theory and Mathematical Statistics. Wiley, New York.

Zehna, P.W. (1966). Invariance of maximum likelihood estimation. Ann. Math. Statist., 37, 744.

Boeing Computer Services Company

565 Andover Park West

Tukwila, Washington 98188, U.S.A.