Efficient Estimates and Optimum Inference Procedures in Large Samples

By C. Radhakrishna Rao

Indian Statistical Institute, Calcutta

[Read at a Research Methods Meeting of the Society, December 6th, 1961, Professor D. R. Cox in the Chair]

Summary

The concept of efficiency in estimation is linked with closeness of approximation to the derivative of log likelihood, which plays an important role in statistical inference in large samples. Various orders of efficiency are defined depending on degrees of closeness, and properties of estimates satisfying these criteria are studied. Such measures of efficiency appear to be more appropriate than the one related to asymptotic variance of an estimate for judging the performance of an estimate, when used as a substitute for the whole sample in drawing inference about unknown parameters. It is found that, under some conditions, the maximum likelihood estimate has some optimum properties which distinguish it from all other large sample estimates.

1. Introduction

It is nearly four decades since Sir Ronald Fisher introduced the concept of likelihood which, as a function of unknown parameters given the sample, plays a fundamental role in statistical inference. He had also studied and established optimum properties of estimates obtained by maximizing likelihood in the light of criteria of consistency and efficiency in large samples, and of sufficiency and amount of information in the case of small samples (Fisher, 1922, 1925). There has been, however, some controversy regarding the superiority of maximum likelihood (m.l.) estimates over others. For instance, it has been said that the method of m.l. is just one out of an infinity of estimation procedures yielding what are called B.A.N. (best asymptotically normal) estimates, having the same optimum asymptotic properties as m.l. estimates (Neyman, 1949; Haldane, 1951), and that further criteria are necessary for establishing the superiority, if any, of m.l. estimates. It has also been thought that the existence of superefficient estimates, i.e.
with asymptotic variances smaller than those of m.l. estimates, invalidates the concept of efficiency on which the use of m.l. estimates is advocated (LeCam, 1953). The anomalies apparently arise from judging an estimate by narrow concepts, such as asymptotic variance and concentration, defined in a restricted manner round the true value, which are not by themselves well-conditioned indicators of the usefulness of an estimate in statistical inference. For instance, if $T_n$ is an estimate of $\theta$, the limit of $\Pr(|T_n - \theta| < \lambda/\sqrt{n})$ as $n \to \infty$ is defined as limiting concentration, whereas the natural thing to do is to examine the probability of concentration in a fixed interval round the true value as $n \to \infty$, and not in intervals tending to zero.

1962] Rao - Efficient Estimates and Optimum Inference Procedures

Recent investigations at the Indian Statistical Institute on criteria for estimation and limiting properties of estimates, from the wider point of view of statistical inference, have led to some definite results regarding m.l. estimates, which I would like to present at this meeting.

The main line of investigation has been to enquire how good a given estimate is as a substitute for the whole sample in drawing inference about unknown parameters (Rao, 1960a, 1961). This approach is implicit in Fisher's work (Fisher, 1922, 1925) and is also stated by Barnard (1948) in his fundamental paper on statistical inference.

Such an approach is considered by some statisticians to be of limited value because it is general and not intended to answer specific questions in making decisions from observed data (Berkson, 1960). On the contrary, in many practical situations the object of reducing data to an estimate is only to facilitate answering a variety of questions of immediate interest. Further, it would be more economical to preserve an estimate for future use instead of the whole sample, and this could be done satisfactorily only if the estimate is a good substitute for the sample.
Another line of work, initiated by Bahadur (1960), has been to study the concentration of an estimate in fixed intervals round the true value of a parameter as the sample size increases. It may be observed from the approach adopted by Bahadur, or as explicitly demonstrated in the present paper, that concentration is equivalent to certain other properties of an estimate used as a substitute for the whole sample in tests of significance. Thus the approach developed in the earlier papers (Rao, 1960a, 1961) seems to provide a convenient framework for discussing the problem of estimation. The criteria for judging the performance of an estimate compared with that of the whole sample are obtained by a suitable reformulation of the consistency and efficiency introduced by Fisher (1922, 1925). This is done in section 2, and the properties of estimates satisfying these criteria are examined in sections 3, 4 and 5.

While thanking the Royal Statistical Society for giving me an opportunity to read a paper at one of its meetings, I must apologize for choosing a subject which may appear somewhat classical. But I hope this small attempt to state in precise terms what can be claimed about m.l. estimates in large samples will at least throw some light on current controversies.

2. Criteria of Consistency and Efficiency

2.1. Consistency

It will help us in our discussion if we state the criteria of consistency and efficiency in precise terms, especially as there is some misunderstanding in the interpretation of the original definitions given by Fisher (1922, 1925). Although the concept of consistency as discussed in this section is not necessary for the development of the paper, it has been included for the sake of completeness, to demonstrate how this criterion fits in with the general approach to the problem of estimation indicated in the introduction.

Let $\mathcal{X}^\infty$ be the space of infinite sequences of observations and $\mathcal{S}$ the Kolmogoroff $\sigma$-field of measurable sets.
Further, $\{P_\theta\}$ represents a family of probability measures defined over $\mathcal{S}$ and indexed by a parameter $\theta$ varying in a set $\Theta$. Let $T_n(X_n)$ be a real-valued statistic defined on $\mathcal{X}^n$, the space of the first $n$ observations $X_n$. The family of probability measures induced by $T_n$, which may be regarded as a function on $\mathcal{X}^\infty$, is $\{P_\theta T_n^{-1}\}$. Two probability measures $\mu$ and $\nu$ are said to be orthogonal ($\mu \perp \nu$) if there exist disjoint sets $A$ and $B$ such that $\mu(A) = \nu(B) = 1$ and $\mu(B) = \nu(A) = 0$.

Definition 2.1. Let $P_\theta T_n^{-1} \to \mu_\theta$ weakly. Then $T_n$ is said to be consistent for the family $\{P_\theta\}$ if $\mu_\theta \perp \mu_\phi$ whenever $P_\theta \perp P_\phi$.

When the limiting measures $\mu_\theta$ and $\mu_\phi$ do not exist, an alternative definition of consistency may be given.

Definition 2.2. The statistic $T_n$ is said to be consistent for the family $\{P_\theta\}$ if, for any given $\epsilon > 0$, there exists an $n_0(\epsilon)$ such that for each $n > n_0(\epsilon)$ disjoint sets $A_n$ and $B_n$ can be found on the real line with the property

$$P_\theta T_n^{-1}(A_n) \ge 1 - \epsilon, \qquad P_\phi T_n^{-1}(B_n) \ge 1 - \epsilon,$$

whenever $P_\theta \perp P_\phi$.

It is easy to show that definition 2.1 implies definition 2.2 when $\mu_\theta$ and $\mu_\phi$ exist. Orthogonality of the measures $P_\theta$ and $P_\phi$ implies that complete discrimination between these two members of the family is possible as the sample size is increased indefinitely. The criterion of consistency states that the same could be achieved by considering only the estimate (or a statistic) in place of the whole sample at each stage.

There are two definitions of consistency current in the literature. One of them, known as probability consistency (P.C.), requires that $T_n \to \theta$ weakly or strongly, in which case it is easy to see that definitions 2.1 and 2.2 are satisfied. According to our wider concept, $T_n$ would be consistent even if $T_n \to f(\theta)$, a single-valued function of $\theta$, and not necessarily $\theta$ itself. The author has shown (Rao, 196) that a few examples of inconsistency of m.l. estimates recorded in the literature relate to an estimate tending to a function of the parameter instead of to the parameter with respect to which the likelihood is maximized.
Another definition, suitable for sequences of independent and identically distributed observations, is called Fisher consistency (F.C.). If $S_n$ denotes the empirical distribution function based on $n$ observations, Fisher (1956) considers a statistic $T_n = f(S_n)$, where $f$ is a functional defined over the space of all distribution functions. Then $T_n$ is said to be F.C. if $f(F_\theta) = \theta$, where $F_\theta$ is the true distribution function. If the functional is weakly continuous, F.C. implies P.C.

2.2. Efficiency

While consistency ensures that an estimate achieves perfect discrimination between alternative distributions as $n \to \infty$, efficiency is concerned with the difference between the discrimination provided by an estimate and that provided by the whole sample as $n \to \infty$. For simplicity, let us consider the case where probability densities exist and only one unknown parameter is involved. For a given $\theta$, let $P(X_n, \theta)$ denote the probability density of the sample point $X_n$ in $\mathcal{X}^n$, and $P(T_n, \theta)$ that of the statistic $T_n$. The best discriminator (discriminant function) between alternative distributions with indices $\theta$ and $\phi$ is the likelihood ratio $P(X_n, \theta)/P(X_n, \phi)$, while that based on $T_n$ alone is $P(T_n, \theta)/P(T_n, \phi)$. Now $T_n$ is equivalent to the whole sample if

$$\frac{P(X_n, \theta)}{P(X_n, \phi)} = \frac{P(T_n, \theta)}{P(T_n, \phi)}, \qquad (2.1)$$

which is realized if $T_n$ is sufficient for the unknown parameter.

In large samples it is perhaps relevant to consider alternatives close to one another. In such a case, if the first derivative $P'(X_n, \theta)$ of $P(X_n, \theta)$ with respect to $\theta$ exists, the discriminator may be written as $P'(X_n, \theta)/P(X_n, \theta)$. The corresponding expression for $T_n$ is $P'(T_n, \theta)/P(T_n, \theta)$, and the difference

$$d(\theta, n) = \frac{P'(X_n, \theta)}{P(X_n, \theta)} - \frac{P'(T_n, \theta)}{P(T_n, \theta)} \qquad (2.2)$$

plays a key role in studying the performance of $T_n$. The statistic $T_n$ is equivalent in some sense to the whole sample, when $n$ is large, if $n^{\alpha} d(\theta, n) \to 0$ in probability, where $\alpha$ is chosen so that $n^{\alpha} P'(X_n, \theta)/P(X_n, \theta)$ does not itself converge to zero in probability. Usually $\alpha = -\tfrac{1}{2}$ serves the purpose.

Definition 2.3. A statistic $T_n$ is said to be efficient (to the first order) if, for a suitable choice of $\alpha$ such that the statistic $n^{\alpha} P'(X_n, \theta)/P(X_n, \theta)$ does not converge to zero, $n^{\alpha} d(\theta, n) \to 0$ in probability.

There may be a very wide class of statistics satisfying the criterion of first-order efficiency, in which case a further criterion may be necessary for restricting the choice of statistics. This should depend on the rapidity of convergence of $n^{\alpha} d(\theta, n)$, or on the asymptotic behaviour of $d(\theta, n)$ itself. Since $E\{d(\theta, n)\} = 0$, $V\{d(\theta, n)\}$ may provide a satisfactory measure.

Definition 2.4. The second-order efficiency of $T_n$ is

$$\lim_{n \to \infty} V\{d(\theta, n)\} = \lim_{n \to \infty} \{I(X_n) - I(T_n)\},$$

where $I(X_n)$ and $I(T_n)$ stand for the amounts of information (in Fisher's sense) contained in the sample and in the statistic respectively.

Second-order efficiency, as given in definition 2.4, examines the amount of information lost in using a statistic instead of the whole sample. This aspect was first examined by Fisher (1925), who conjectured that m.l. estimates have the least limiting loss.

The criteria of efficiency given in definitions 2.3 and 2.4 are extremely difficult to verify in practice. They are therefore replaced by the simpler definitions 2.5 and 2.6, which are formulated so as to incorporate the essential features of definitions 2.3 and 2.4 and to be equivalent to them under some restrictive conditions on the probability densities.

Definition 2.5. The statistic $T_n$ is said to be efficient (first order) if

$$\left| \frac{1}{\sqrt{n}} \frac{P'(X_n, \theta)}{P(X_n, \theta)} - \beta(\theta) \sqrt{n}\,(T_n - \theta) \right| \to 0 \text{ in probability}, \qquad (2.3)$$

where $\beta(\theta)$ is a function of $\theta$ only.

Note that if (2.3) holds, $T_n$ is automatically consistent, because it tends to $\theta$ in probability. If we replace $\beta(\theta)(T_n - \theta)$ by $\alpha(\theta) + \beta(\theta) T_n$, we have a more general situation, but this is not of direct interest in the context of the present investigation.

Definition 2.6. The second-order efficiency is the minimum asymptotic variance of

$$\frac{P'(X_n, \theta)}{P(X_n, \theta)} - n\beta(\theta)(T_n - \theta) - n\lambda (T_n - \theta)^2, \qquad (2.4)$$

when minimized with respect to $\lambda$.
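Definition 2.4 rests on the identity $V\{d(\theta, n)\} = I(X_n) - I(T_n)$. This identity holds because $E\{P'(X_n, \theta)/P(X_n, \theta) \mid T_n\} = P'(T_n, \theta)/P(T_n, \theta)$ whenever differentiation under the integral sign is permitted. The Python sketch below is an added illustration, not part of the paper. It assumes i.i.d. Bernoulli($\theta$) observations and enumerates all $2^n$ samples in exact rational arithmetic. For any statistic supplied by the user it compares $V\{d\}$ with $I(X_n) - I(T_n)$. The helper name `information_loss` is hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def information_loss(n, theta, statistic):
    """Return (V{d(theta, n)}, I(X_n) - I(T_n)) for n i.i.d. Bernoulli(theta)
    observations and T_n = statistic(xs), computed exactly by enumeration."""
    prob_t = defaultdict(Fraction)   # P(T_n = t, theta)
    deriv_t = defaultdict(Fraction)  # derivative of P(T_n = t, theta) in theta
    samples = []
    for xs in product((0, 1), repeat=n):
        s = sum(xs)
        p = theta**s * (1 - theta) ** (n - s)                          # P(X_n, theta)
        score_x = Fraction(s) / theta - Fraction(n - s) / (1 - theta)  # P'/P for X_n
        t = statistic(xs)
        prob_t[t] += p
        deriv_t[t] += p * score_x
        samples.append((p, score_x, t))
    score_t = {t: deriv_t[t] / prob_t[t] for t in prob_t}  # P'/P for T_n
    var_d = sum(p * (sx - score_t[t]) ** 2 for p, sx, t in samples)  # E{d} = 0
    info_x = sum(p * sx**2 for p, sx, _ in samples)
    info_t = sum(prob_t[t] * score_t[t] ** 2 for t in prob_t)
    return var_d, info_x - info_t


theta = Fraction(3, 10)
print(information_loss(6, theta, sum))               # sufficient statistic: both zero
print(information_loss(6, theta, lambda xs: xs[0]))  # first observation only
```

With the sufficient statistic $\sum x_i$ both quantities vanish, as (2.1) requires. With $T_n = x_1$ alone, the loss is $(n-1)/\{\theta(1-\theta)\}$, which equals 500/21 for $n = 6$ and $\theta = 3/10$.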
In the rest of the paper we shall use first- and second-order efficiency only in the sense of definitions 2.5 and 2.6 respectively.

2.3. Second-order Efficiency of m.l. Estimates

It has been shown by Rao (1961) that, under some regularity conditions, first-order efficiency in the sense of definition 2.5 ensures that, as $n \to \infty$,

$$\frac{I(T_n)}{I(X_n)} \to 1, \qquad (2.5)$$

where $I(X_n)$ and $I(T_n)$ are the amounts of information contained in the sample and in the statistic respectively. There may be an infinity of estimation procedures for which (2.5) is true, in which case the second-order efficiency will be of use in restricting the choice to a subset. It is of some interest to examine the conditions under which definitions 2.4 and 2.6 of second-order efficiency are equivalent. In such a case, the results already proved regarding the second-order efficiency of estimates have direct significance as statements about the actual amount of information lost.

Now, for a multinomial distribution, it has been shown by Rao (1961) that m.l. is the only method with an optimum second-order efficiency under the following conditions, of which the first is purely a restriction on the choice of a parameter:

(i) the parameter under consideration is a continuous functional of the distribution function;

(ii) the cell probabilities, represented by $\pi_1(\theta), \ldots, \pi_k(\theta)$, admit continuous …

… where $\bar{x}$ and $x_m$ are the mean and median respectively. Obviously $T_n$ is consistent for the mean of the normal distribution. Its asymptotic variance is $\alpha^2\pi/(2n)$ when $\theta = 0$ and $1/n$ otherwise. The asymptotic variance of $T_n$ is therefore the same as that of $\bar{x}$ when $\theta \ne 0$, and it can be made smaller than that of $\bar{x}$ at $\theta = 0$ by choosing $\alpha$ arbitrarily small. Now $T_n$, being stochastically equivalent to a multiple of the median when $\theta = 0$, is obviously not a more satisfactory estimate than $\bar{x}$ from any point of view. Indeed one can construct, by an extension of Hodges's method suggested by LeCam (1953), $T_n$ so
as to be asymptotically equivalent to the median for a countable set of values of $\theta$, while possessing a smaller asymptotic variance at these points. That would make the position worse. What first-order efficiency in effect demands is not that the asymptotic variance of an estimate be a minimum, but that its asymptotic correlation with the derivative of log likelihood be unity. For $T_n$ as considered in (2.7), the asymptotic correlation is $\sqrt{2/\pi}$ at $\theta = 0$ and unity elsewhere, whereas for $\bar{x}$ it is unity for all $\theta$. If the deficiency in an estimate is measured by $1 - r^2$, where $r$ is the asymptotic correlation, then the deficiency in $T_n$ defined in (2.7) is $1 - 2/\pi \approx 0.363$ at $\theta = 0$.

2.5. Assumptions and Notations

In the rest of the paper we shall consider only sequences of independent and identically distributed variables with probability density $p(x, \theta)$ with respect to a $\sigma$-finite measure $v$, and distribution function $F(x, \theta)$. The probability density of $n$ observations with respect to $v^n$ is denoted by $P(X_n, \theta)$. The probability density of $T_n$ with respect to a $\sigma$-finite measure $\mu$ is denoted by $P(T_n, \theta)$. The following assumptions are referred to in the various propositions proved.

Assumption 1. The derivative of $p(x, \theta)$ with respect to $\theta$ exists, and $i = E(d \log p/d\theta)^2$ is finite.

Assumption 2. If $E_n$ is any Lebesgue measurable set in $\mathcal{X}^n$,

$$\frac{d}{d\theta} \int_{E_n} P(X_n, \theta)\, dv^n = \int_{E_n} P'(X_n, \theta)\, dv^n.$$

It may be noted that, as a consequence of assumption 1, the statistic

$$Z_n = \frac{P'(X_n, \theta)}{\sqrt{n}\, P(X_n, \theta)}$$

is asymptotically normally distributed with mean zero and variance $i$. Further, $E(Z_n) = 0$ and $V(Z_n) = i$ for each $n$. If we define

$$Y_n = \frac{P'(T_n, \theta)}{\sqrt{n}\, P(T_n, \theta)},$$

the information contained in $T_n$ is $I(T_n) = n\, i_T$, where $i_T = V(Y_n)$. If $T_n$ has first-order efficiency in the sense of definition 2.5, it has been shown by Rao (1961) that both $\sqrt{n}(T_n - \theta)$ and $Y_n$ are normally distributed in the limit.
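3. Efficient Estimates and Tests of Significance

In this section we establish some optimum properties of tests of significance based on first-order efficient estimates. Since optimum tests provide a basis for interval estimation, this also justifies the choice of efficient estimates in two important methodological problems. The main results are given in Theorems 1 and 2.

Lemma 3.1. Let $\pi_n(\theta)$ be the power function of a test of the hypothesis $H_0: \theta = \theta_0$ (a specified value) at a given level of significance $\alpha$. If $\pi_n'(\theta_0)$ is the derivative of $\pi_n(\theta)$ at $\theta_0$, then, under Assumptions 1 and 2,

$$\lim_{n \to \infty} \frac{\pi_n'(\theta_0)}{\sqrt{n}} \le \left(\frac{i}{2\pi}\right)^{1/2} e^{-\lambda_\alpha^2/2}, \qquad (3.1)$$

where $\lambda_\alpha$ is the upper $100\alpha$ per cent point of the standard normal distribution.

For any $n$, it is known (Rao and Poti, 1946) that the test based on the critical region

$$w_n: \; P'(X_n, \theta_0) \ge \lambda_n P(X_n, \theta_0),$$

where $\lambda_n$ is chosen so that the size of $w_n$ is $\alpha$, gives the power function its maximum slope at $\theta_0$. Hence

$$\pi_n'(\theta_0) \le \int_{w_n} P'(X_n, \theta_0)\, dv^n = \sqrt{n} \int_{w_n} Z_n P(X_n, \theta_0)\, dv^n.$$

Dividing by $\sqrt{n}$ and taking limits as $n \to \infty$, we have

$$\lim_{n \to \infty} \frac{\pi_n'(\theta_0)}{\sqrt{n}} \le \lim_{n \to \infty} \int_{w_n} Z_n P(X_n, \theta_0)\, dv^n = \int_{\lambda_\alpha \sqrt{i}}^{\infty} \frac{z}{\sqrt{2\pi i}}\, e^{-z^2/2i}\, dz = \left(\frac{i}{2\pi}\right)^{1/2} e^{-\lambda_\alpha^2/2}. \qquad (3.2)$$

Lemma 3.2. If $T_n$ has first-order efficiency and Assumptions 1 and 2 are satisfied, then $i_T \to i$.

The proof is given by Rao (1961). We shall now prove Theorem 1, which shows that a test based on an efficient estimate has maximum local power asymptotically.

Theorem 1. Let $T_n$ be a first-order efficient estimate, and let $P'(T_n, \theta_0) \ge \lambda_n P(T_n, \theta_0)$ be a test of the hypothesis $H_0: \theta = \theta_0$ at a given level of significance $\alpha$. If $\pi_n(\theta)$ is the power function of this test then, under Assumptions 1 and 2,

$$\lim_{n \to \infty} \frac{\pi_n'(\theta_0)}{\sqrt{n}} = \left(\frac{i}{2\pi}\right)^{1/2} e^{-\lambda_\alpha^2/2}, \qquad (3.3)$$

which is the upper bound derived in Lemma 3.1.

Following the notation of section 2.5, let

$$w_1 = \{X_n : Z_n \ge c_n\} \quad \text{and} \quad w_2 = \{X_n : Y_n \ge d_n\}$$

be two critical regions of the same size $\alpha$. Since $i - i_T = E(Z_n - Y_n)^2$ and $i_T \to i$ as a result of Lemma 3.2,

$$\lim_{n \to \infty} \int (Z_n - Y_n)^2 P(X_n, \theta_0)\, dv^n = 0. \qquad (3.4)$$

Hence

$$\lim_{n \to \infty} \int_{w_1} Z_n P(X_n, \theta_0)\, dv^n = \lim_{n \to \infty} \int_{w_1} Y_n P(X_n, \theta_0)\, dv^n. \qquad (3.5)$$

But

$$\int_{w_2} Y_n P(X_n, \theta_0)\, dv^n \ge \int_{w_1} Y_n P(X_n, \theta_0)\, dv^n,$$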
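because $Y_n \ge d_n$ on $w_2$, the critical region of the locally powerful test based on $T_n$, and the two regions have the same size. Hence

$$\lim_{n \to \infty} \frac{\pi_n'(\theta_0)}{\sqrt{n}} = \lim_{n \to \infty} \int_{w_2} Y_n P(X_n, \theta_0)\, dv^n \ge \lim_{n \to \infty} \int_{w_1} Y_n P(X_n, \theta_0)\, dv^n = \lim_{n \to \infty} \int_{w_1} Z_n P(X_n, \theta_0)\, dv^n = \left(\frac{i}{2\pi}\right)^{1/2} e^{-\lambda_\alpha^2/2}, \qquad (3.6)$$

using (3.2). The result (3.3) of Theorem 1 follows from (3.1) and (3.6).

It is important to consider the property of the test based on $T_n$ in the form $(T_n - \theta_0)\sqrt{n} \ge \lambda$, which is the form usually used, and not, as in Theorem 1, the test based on the log derivative of the density function of $T_n$. Theorem 2 shows that such a test has the same local property.

Theorem 2. Let the assumptions on the probability density $p(x, \theta)$ and on $T_n$ be as in Theorem 1. If the test criterion is $U_n = (T_n - \theta_0)\sqrt{n} \ge \lambda$, where $\lambda$ is chosen so that the level of significance is $\alpha$, then $\pi_n'(\theta_0)/\sqrt{n}$ has the same limit as in (3.3).

By definition,

$$\frac{\pi_n'(\theta_0)}{\sqrt{n}} = \int_{U_n \ge \lambda} Z_n P(X_n, \theta_0)\, dv^n = \iint_{U \ge \lambda} Z\, dF_n(Z, U), \qquad (3.7)$$

where $F_n$ is the joint distribution of $Z_n$ and $U_n$. Since $|Z_n - \beta U_n| \to 0$ in probability because of the first-order efficiency of $T_n$, the joint asymptotic distribution of $Z_n$ and $U_n$ exists and is, in fact, degenerate. If $F(Z, U)$ represents this asymptotic distribution, (3.7) tends to

$$\int_{Z \ge \beta\lambda} Z\, dF = \left(\frac{i}{2\pi}\right)^{1/2} e^{-\lambda_\alpha^2/2},$$

since $\lambda$ is to be chosen so that $\Pr(Z \ge \beta\lambda) = \alpha$. This proves the required result.

The foregoing analysis suggests the following definition of the limiting efficiency of a test.

Definition. Let $\pi_n(\theta)$ be the power function of a test of the hypothesis $H_0: \theta = \theta_0$ at a fixed level of significance $\alpha$. The limiting efficiency of the test is then

$$\lim_{n \to \infty} \frac{\pi_n'(\theta_0)}{\sqrt{n}} \Big/ \left(\frac{i}{2\pi}\right)^{1/2} e^{-\lambda_\alpha^2/2}, \qquad (3.8)$$

or its square, where the denominator is the upper bound derived in Lemma 3.1.

An explicit evaluation of the limiting efficiency is provided by Lemma 3.3.

Lemma 3.3. Let the test criterion be $U_n = (T_n - \theta_0)\sqrt{n} \ge \lambda$, and let the joint limiting distribution of $Z_n$ and $U_n$ be bivariate normal with correlation coefficient $\rho$. Then, under the assumptions of Theorem 2, the limiting efficiency of the test is $\rho$.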
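Lemmas 3.1 and 3.3 can be checked numerically in the simplest case. The sketch below is an added illustration, not from the paper, and its function names are hypothetical. It first assumes $N(\theta, 1)$ data, for which $i = 1$ and the one-sided test $\sqrt{n}\,\bar{x} \ge \lambda_\alpha$ of $\theta_0 = 0$ is the score test. The slope of this test should therefore equal the bound (3.1) for every $n$. It then simulates a bivariate normal pair $(Z, U)$, with $i = 2$ chosen arbitrarily, to check that the ratio in (3.8) equals the correlation $\rho$.

```python
import math
import random

LAM = 1.6448536269514722  # lambda_alpha for alpha = 0.05


def normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def slope_bound(i, lam=LAM):
    """Upper bound (3.1) on lim pi_n'(theta0)/sqrt(n): (i/2pi)^(1/2) exp(-lam^2/2)."""
    return math.sqrt(i / (2.0 * math.pi)) * math.exp(-0.5 * lam * lam)


def z_test_slope(n, lam=LAM, eps=1e-6):
    """pi_n'(theta0)/sqrt(n) by central differences for the test
    sqrt(n) * xbar >= lam of theta0 = 0 with N(theta, 1) data (so i = 1)."""
    power = lambda th: normal_cdf(math.sqrt(n) * th - lam)
    return (power(eps) - power(-eps)) / (2.0 * eps) / math.sqrt(n)


def limiting_efficiency_mc(rho, i=2.0, lam=LAM, reps=400_000, seed=7):
    """Monte Carlo version of (3.9): (Z, U) bivariate normal with V(Z) = i,
    V(U) = 1 and correlation rho.  Estimate the integral of Z over U >= lam
    and divide by the bound (3.1); Lemma 3.3 says the ratio is rho."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        u = rng.gauss(0.0, 1.0)
        z = math.sqrt(i) * (rho * u + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0))
        if u >= lam:
            total += z
    return (total / reps) / slope_bound(i, lam)


print(z_test_slope(100), slope_bound(1.0))  # equal up to rounding error
print(limiting_efficiency_mc(0.8))          # close to 0.8
```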
For a first-order efficient estimate, the asymptotic correlation between the estimate and $Z_n$ is unity, in which case the efficiency of any other estimate may be measured by its asymptotic correlation with $Z_n$. Lemma 3.3 shows that this measure is directly related to the asymptotic slope of the power function of the test based on the estimate.

To prove Lemma 3.3, we have

$$\lim_{n \to \infty} \frac{\pi_n'(\theta_0)}{\sqrt{n}} = \iint_{U \ge \lambda} Z\, dF(Z, U) = \int_{U \ge \lambda} \beta U\, dF(U), \qquad (3.9)$$

where $\beta U$ is the regression of $Z$ on $U$, so that $\beta = \rho\sigma_Z/\sigma_U$. After simplification, the last expression in (3.9) reduces to $\rho (i/2\pi)^{1/2} e^{-\lambda_\alpha^2/2}$. Dividing by the upper bound (3.1), we obtain $\rho$ as the efficiency of the test.

Sundrum (1954) observed that, in examining linkage in the inheritance of two factors, the test based on the m.l. estimate of the recombination fraction (linkage parameter) is locally less powerful than an alternative large-sample test. This obviously cannot be true in view of what has been established in Theorem 2, since the m.l. estimate in this particular situation has first-order efficiency. The alternative test is based on a statistic which happens to be efficient when linkage does not exist, i.e. for a particular value of the linkage parameter. Hence it is expected to have the same local property as the test based on the m.l. estimate for this particular value of the parameter. Sundrum's result is therefore misleading, especially as he claims that it justifies a statement attributed to Fisher (1950, pp. 314-315) that good tests may be based on inefficient estimators. The particular inefficient estimator referred to by Fisher happens to be efficient at the point specified by the null hypothesis, and it provides a test as good as that based on the m.l. estimate, but not better. The results of section 4 of this paper will show that efficiency in an interval of the unknown parameter ensures some other desirable properties.

4. Stronger First-order Efficiency and Tests of Significance

The limiting properties proved in Theorems 1 and 2 state that, compared with any other given test, the power of a one-sided test based on a first-order efficient estimate is not less (and is perhaps greater) in a neighbourhood of $\theta_0$, for each sufficiently large $n$. The neighbourhood, however, may depend on $n$. It would indeed be better if it could be claimed of a test that its power cannot be less than that of any other given test in a specified interval of $\theta$ for all sufficiently large $n$. It appears that such a statement can be made if the test is based on an estimate whose relation to $Z_n$ is stronger than that implied by first-order efficiency.

Lemma 4.1. Let $w_n$ be a critical region of size $\alpha_n$ in $\mathcal{X}^n$ for testing the hypothesis $H_0: \theta = \theta_0$, and let $\beta_n(\theta)$ be the second kind of error. If $\alpha_n$ is bounded away from unity, then

$$\liminf_{n \to \infty} n^{-1} \log \beta_n(\theta) \ge -\int p(x, \theta_0) \log \frac{p(x, \theta_0)}{p(x, \theta)}\, dv = -H(\theta, \theta_0). \qquad (4.1)$$

If $H(\theta, \theta_0) = \infty$, there is nothing to prove. Let $H(\theta, \theta_0)$ be finite and, for any $\delta > 0$, define

$$W_n = \left\{X_n : n^{-1} \log \frac{P(X_n, \theta_0)}{P(X_n, \theta)} \le H(\theta, \theta_0) + \delta \right\}.$$

Then, writing $\bar{w}_n$ and $\bar{W}_n$ for the complements,

$$\beta_n(\theta) = \int_{\bar{w}_n} P(X_n, \theta)\, dv^n \ge e^{-n(H + \delta)} \int_{\bar{w}_n W_n} P(X_n, \theta_0)\, dv^n \ge e^{-n(H + \delta)} \left\{1 - \alpha_n - \int_{\bar{W}_n} P(X_n, \theta_0)\, dv^n \right\}. \qquad (4.2)$$

Hence $\liminf n^{-1} \log \beta_n(\theta) \ge -H(\theta, \theta_0) - \delta$. This follows because $\alpha_n$ is bounded away from unity and, by the mean ergodic theorem (Doob, 1953, p. 469), the second expression in the braces of (4.2) tends to zero. As $\delta$ is arbitrary, (4.1) follows.

Lemma 4.2. For the likelihood ratio test $P(X_n, \theta) \ge \lambda_n P(X_n, \theta_0)$, with the size of the critical region $\alpha_n \to \alpha$ and $\theta$ fixed, […] where $\beta_n(\theta)$ is the second kind of error.

…

Lemma 4.3 shows that, for the test $Z_n(\theta_0) \ge \lambda$, which is known to be locally most powerful on one side, the stronger result (4.6) holds under the additional conditions (i) and (ii). This result ensures the superiority of the test over any other test in a neighbourhood of $\theta_0$ as $n \to \infty$.

Under the conditions (i) and (ii) it is easy to show that […]. Therefore

$$[\ldots] \qquad (4.7)$$

Consider

$$[\ldots] \qquad (4.8)$$

By Tchebycheff's inequality, (4.8) is at least […], provided […] > 0, and therefore

$$\lim_{n \to \infty} n^{-1} \log \beta_n(\theta) \le [\ldots], \qquad (4.9)$$

using (4.7). Combining (4.9) with (4.5), we obtain the result (4.6).

Corollary. If the test is of the form $|Z_n(\theta_0)| \ge \lambda$ and $\beta_n(\theta) = \Pr\{|Z_n(\theta_0)| < \lambda; \theta\}$, then […]

… which implies that, considering $Z_n/\sqrt{i}$ instead of $Z_n$, for sufficiently large $n$, […] independent of […] in an interval of $\theta$ enclosing $\theta_0$. Then […]

$$[\ldots] \qquad (4.16)$$

By definition, $\beta_n = \Pr\{(T_n - \theta_0)\sqrt{n} < \lambda; \theta\}$ […]. Since $\Pr(AB) \le \Pr(C)$,

$$1 - \Pr(C) \le 1 - \Pr(AB) \le \{1 - \Pr(A)\} + \{1 - \Pr(B)\}, \qquad (4.17)$$

using the result (4.14) for $1 - \Pr(B)$. Keeping $\epsilon$ fixed, we can decrease $h$ until the second term in (4.17) dominates. Hence, for sufficiently small $h$, $\lim n^{-1} \log\{1 - \Pr(C)\} \le […]$. Dividing by $h^2$ and taking limits as $h \to 0$, […]. Further, since $\beta_n \le 1 - \Pr(C)$, […]. Now, using (4.5), the result (4.16) of Theorem 3 is established.

Corollary. For the two-sided test $|T_n - \theta_0|\sqrt{n} \ge \lambda$, […]. The proof is on the same lines as that of the corollary of Theorem 2.

5. Stronger First-order Efficiency and Limiting Concentration

In this section we shall consider the problem of limiting concentration as developed by Bahadur (1960). As observed earlier, the main result is a consequence of Lemma 4.1, which provides a lower bound to the second kind of error. Under mild restrictions on the estimate, the same bound holds for the probability that an estimate deviates from the true value by a given amount. Birnbaum (1961) exploited the connection between the probability of such a deviation and the second kind of error, which plays a fundamental role in the Neyman-Pearson theory of testing hypotheses, to prove some optimum properties of m.l. estimates in small samples.
The attempt has not been completely successful, in the sense that no general statements could be made about m.l. estimates. This shows that the small-sample problem may have to be viewed from an entirely different angle, free from the concept of the long-run frequency of errors in the estimate (Barnard, 1949).

Lemma 5.1. Let $T_n$ be a statistic such that $\Pr(T_n \ge \theta; \theta)$ is bounded away from unity as $n \to \infty$ for each $\theta < \theta_0$. Then, for $h > 0$,

$$\liminf_{n \to \infty} n^{-1} \log \Pr(T_n - \theta_0 < -h; \theta_0) \ge -H(\theta_0, \theta_0 - h). \qquad (5.1)$$

If $H(\theta_0, \theta_0 - h) = \tfrac{1}{2} h^2 i(\theta_0) + o(h^2)$, then

$$\lim_{h \to 0} \liminf_{n \to \infty} \frac{1}{n h^2} \log \Pr(T_n - \theta_0 < -h; \theta_0) \ge -\tfrac{1}{2} i(\theta_0). \qquad (5.2)$$

Consider the test $T_n \ge \theta_0 - h$ of the null hypothesis $H_0: \theta = \theta_0 - h$. The second kind of error when $\theta_0$ is the true value is $\Pr(T_n < \theta_0 - h; \theta_0)$. Hence an application of Lemma 4.1 gives the result (5.1). Equation (5.2) follows from (5.1) by considering the expansion of $p(x, \theta_0 - h)$.

Lemma 5.2, due to Bahadur (1960), follows from the results of Lemma 5.1. The conditions imposed by Bahadur are, however, slightly different.

Lemma 5.2. Let $T_n$ be a statistic such that one or both of $\Pr(T_n \ge \theta; \theta)$ for $\theta < \theta_0$ and $\Pr(T_n \le \theta; \theta)$ for $\theta > \theta_0$ are bounded away from unity as $n \to \infty$. Then

$$\lim_{h \to 0} \liminf_{n \to \infty} \frac{1}{n h^2} \log \Pr(|T_n - \theta_0| > h; \theta_0) \ge -\tfrac{1}{2} i(\theta_0). \qquad (5.3)$$

Let $\Pr(T_n \ge \theta; \theta)$, $\theta < \theta_0$, be bounded away from unity. Then (5.3) follows from (5.2) by observing that $\Pr(|T_n - \theta_0| > h) \ge \Pr(T_n - \theta_0 < -h)$. The same result can be established if $\Pr(T_n \le \theta; \theta)$ is bounded away from unity for $\theta > \theta_0$.

Theorem 4. Let $T_n$ be such that, for $\theta$ in an interval enclosing $\theta_0$, […] (5.4), or […]

… $w(\theta_0) = \{X_n : T_n > \ldots\}$, where $T_n$ is assumed to be the unique root of $Z_n(\theta) = 0$. This equivalence is used to deduce the admissibility and local bestness of $T_n$ as an estimate of $\theta$ in finite samples from the corresponding properties of the test based on $Z_n(\theta)$.
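The bounds of Lemmas 5.1 and 5.2 can be compared with exact large-deviation rates in a simple case. For $N(\theta, 1)$ data, $i = 1$ and $H = h^2/2$ exactly. Since $\Pr(|\bar{x} - \theta| > h) = 2\Phi(-\sqrt{n}\,h)$ decays at that rate, $\bar{x}$ attains the bound. The sketch below is an added illustration. It uses a standard binomial (Chernoff) large-deviation rate that is not derived in the paper to compute the corresponding rate for the sample median. Its ratio to the bound stays below 1 and approaches $2/\pi$ as $h \to 0$, the same factor as the median's asymptotic efficiency.

```python
import math


def mean_exponent(h):
    """-(1/n) log Pr(|xbar - theta| > h) as n -> infinity, for N(theta, 1) data.
    This equals H = h^2/2 = i h^2/2 (i = 1), the bound in (5.1)-(5.3)."""
    return 0.5 * h * h


def median_exponent(h):
    """-(1/n) log Pr(median - theta > h) as n -> infinity, for N(theta, 1) data.
    The event means that at least half the observations exceed theta + h, a
    binomial large deviation whose rate is KL(Bernoulli(1/2) || Bernoulli(q))."""
    q = 0.5 * math.erfc(h / math.sqrt(2.0))  # q = Pr(X - theta > h) < 1/2
    return -0.5 * math.log(4.0 * q * (1.0 - q))


for h in (0.5, 0.1, 0.01):
    ratio = median_exponent(h) / mean_exponent(h)
    print(h, round(ratio, 4))  # stays below 1 and tends to 2/pi as h -> 0
```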
Such finite-sample results are available only in extremely special cases, and it appears that one has to consider large samples in order to say something definite in general. The various types of dependence in large samples considered are:

(i) $Z_n(\theta) - \beta(\theta)\sqrt{n}(T_n - \theta) \to 0$ in probability;

(ii) […];

(iii) […]

…

Discussion on Paper by Professor Rao [No. 1,

… classify the two procedures correctly in this way would require rather careful attention to their distributions. Although we know that they are asymptotically the same on the null hypothesis in large samples, the question of the second-order departure from the chi-square distribution would be very relevant. Actually, I have shown that, to this next order of approximation, the two expressions are equivalent on the null hypothesis, at least in distribution (Biometrika, (1953), 308).

The same kind of remark seems relevant to the present paper. Professor Rao has given some very interesting and careful results on the properties of efficient estimates, and of maximum likelihood ones in particular, but he has tended to concentrate on their inference properties rather than on their use simply as estimates. To do this comprehensively, however, requires a closer knowledge of their sampling properties than merely how asymptotically "efficient" they are. Thus, in definition 2.6, Professor Rao defines second-order efficiency in terms of the variance of the difference between the log likelihood derivative and the best quadratic function of $T_n - \theta$. But it is not clear to me how far he proposes to make use of this quadratic function. He does not say what its distribution is, nor does he stress that it involves coefficients which are functions of the unknown parameter $\theta$.
If his purpose, as he hints rather implicitly in the paper, is to obtain a function of $\theta$ which approximates the log likelihood derivative, why not use that derivative directly? This was my own systematic purpose in the paper I have just cited: to use this function itself. We are often told to study the likelihood function directly. The advantage of the log likelihood derivative is that it is usually very well behaved, especially if we choose the right function of the unknown parameter. Its sampling properties, including its normality properties, can also be investigated much more easily and exactly, to the extent of making second-order corrections. I believe in discussing concrete problems, so let me refer to two examples to illustrate these points.

First let me remind you of the first-order situation. If we plot $L_\theta = \sum \partial \log p/\partial\theta$ as a function of $\theta$, we hope to get a straight line near the point $L_\theta = 0$, $\theta = \hat{\theta}$. If the slope is $-I_\theta$, a constant independent of the sample, then we can specify $\hat{\theta}$ and $I_\theta$ in place of $L_\theta$. To this order we cannot distinguish between the two alternatives, as Professor Rao notes when referring to the linkage estimation problem discussed by Sundrum. Let me, however, write down the next-order equations in the developments of $\hat{\theta} - \theta$ and …. They are

$$[\ldots] \qquad (1)$$

which is equation (13) of my paper, and

$$[\ldots] \qquad (2)$$

It is not clear to me offhand which will provide the better test of $\theta = \theta_0$, but it is not too difficult to investigate this question, although it is easier for (2) than for (1). Sundrum concluded that in the linkage problem the test based on $L_\theta$ is better in certain circumstances, though he did not altogether justify his neglect of bias and non-normal effects, which can be allowed for when necessary. If my algebra is correct, the bias in $\hat{\theta}$ in the linkage problem is zero to the next order, and the non-normality correction is the same for both statistics, so Sundrum's conclusion appears unaffected.

As a second example, let me recall the … estimation problem (Phil. Mag., 44 (1953), …) which stimulated these higher-approximation methods based on $L_\theta$. The problem was to estimate the mean lifetime $\theta$ of certain fundamental types of particle from observations of decays over limited (and variable) track lengths.
It was noticed that $L_\theta$ in the neighbourhood of $1/\theta = 0$ was of the form $B/\theta - A$, and hence linear in $1/\theta$. It was thus much more sensible to estimate $1/\theta$ than $\theta$. The maximum likelihood estimate of $1/\theta$ was $0.256 \pm 0.198 \times 10^{\ldots}$ sec$^{-1}$, and the upper 0.95 confidence limit for $1/\theta$, inferred most accurately directly from $L_\theta$, was $0.821 \times 10^{\ldots}$.

Thus, while I agree with Professor Rao that it is most important to investigate second-order approximations in large-sample tests and estimation problems, I am less convinced of the practical value of some of the definitions and results proposed in his paper. I am very pleased to move the vote of thanks to Professor Rao.

Professor H. E. Daniels: Confronted by a paper of such excellence and so full of new ideas, I find it difficult to follow the tradition among seconders of votes of thanks that praise shall be to some extent tempered with blame. So perhaps I may be allowed to fall back on another well-established tradition, that discussion at our meetings may be broad to the point of irrelevance.

Professor Rao measures the first-order efficiency of an estimate in terms of its correlation with $\partial L/\partial\theta$, the derivative of the log likelihood $L(x, \theta)$. Under the assumed regularity conditions this is the best discriminator between neighbouring hypotheses. It is interesting to see how far the idea may still be appropriate when the regularity conditions are relaxed. For densities like $p(x, \theta) = \tfrac{1}{2} e^{-|x - \theta|}$, $\partial L/\partial\theta$ has finite discontinuities in $\theta$. It is nevertheless still a good discriminator, because $L$ is a convex polygon in $\theta$ whose flat portions are of order […]. I have shown (Proceedings of the 4th Berkeley Symposium) that the estimate is still asymptotically efficient in the old sense. On the other hand, for $p(x, \theta) = c \exp(-|x - \theta|^{\ldots})$ …

… then the mean squared error would be a relevant measure of the goodness of the estimator. However, quadratic loss is not always applicable. Worse, according to the theory of utility, utility must be bounded, and therefore quadratic loss is never really applicable when $\theta$ is unbounded. On the other hand, one may frequently assume that $l(t, \theta)$ is locally quadratic, i.e.
that one may be able to approximate $l(t, \theta)$ by $a(\theta)(t - \theta)^2$ for sufficiently small $|t - \theta|$. What then happens to the expected loss $E\, l(T_n, \theta)$? Under mild conditions, the expected loss lies somewhere between the asymptotic variance of the estimator $T_n$ and the variance of $T_n$, and the latter quantity may be larger. Furthermore, Commins proved that, for the maximum likelihood estimator (subject to regularity conditions on the distribution function of the data), the expected loss is given by the asymptotic variance. Together with Stein's version of the Cramér-Rao theorem, this shows that no estimator can do substantially better in any open interval. In this sense we have the efficiency of the maximum likelihood estimator without assuming a particular form for the loss function.

It must be admitted that, in spite of the general applicability of locally quadratic loss functions, there are cases where other loss functions are called for. It would be interesting if one could prove that, given an arbitrary loss function, there is an appropriate function $T_n^* = f_n(T_n)$ of the maximum likelihood estimator for which the expected loss is asymptotically minimized. The proof or disproof of such a generalization of Commins's result would undoubtedly relate to, and depend on, many of the ideas expressed in Professor Rao's paper.

Professor G. A. Barnard: I wish to add my congratulations to Professor Rao. I am sure we have all been most impressed with the range as well as the depth of the contributions he has made during his visit to London. I shall follow Professor Daniels, and perhaps Professor Rao too, in not keeping too closely to the subject. I would like to comment on the first section of his paper, where he discusses the question of consistency. This is not altogether relevant to efficiency, but it is a topic worth dwelling on.
Professor Rao's definition of consistency is formulated with reference to the possibility of distinguishing asymptotically between two distributions, and it has the property that the usual estimate s² of the variance is a consistent estimator of σ as well as of σ². This rather affronts our usual notions of what is meant by consistency, and I want to point out that one can get over the difficulty in this particular case by referring to the group properties of the problem. The reason we usually consider that s² estimates σ² and does not estimate σ can be expressed by saying that if all the observations are multiplied by c, s² is multiplied by c² and so is σ², but σ is multiplied by c.

The group covariance property is relevant also to the examples of superefficient estimates due to LeCam. These fail to satisfy the reasonable group properties that one would normally require of them (they are not covariant under the translation group), and I am glad to learn this evening from Professor Chernoff that there is a theorem which implies that one cannot construct such examples which are group covariant.

I do not wish to suggest that the difficulty of this definition of consistency can always be repaired by group considerations, although the range of cases in which it can be so repaired can be considerably extended if we use the idea of a local group, that is to say, if we apply these notions under the restriction that the transformation lies close to the unit element of the relevant group. But in fact I think that no wholly satisfactory single definition of consistency is ever likely to be given.

The original idea of consistency used by Gauss in connexion with the theorem of least squares, which condition has so long in the textbooks been so confusingly and misleadingly replaced by the condition of unbiasedness, was based on the fact that in the problems Gauss was considering one could distinguish between true values and errors. Gauss's consistency requirement was simply that when the observations were free from error the method of estimation used should give the true value.
This is very close to the Fisher consistency definition which Professor Rao mentions, but that generalization, ingenious though it is, suffers from the disadvantage that it is restricted to functionals of the empirical sampling distribution function. Not all the usual statistics are definable as such, for it is a general property of all such functionals that if the same set of observations occurs twice over in a sample of double size, the sample distribution function is unaltered. But, for example, s² will become (2n − 2)s²/(2n − 1) instead of remaining unaltered. Therefore the usual estimate of variance fails to be a statistic in this sense. One might try to repair this defect by introducing well-behaved factors, such as functions of n which tend monotonically to 1; but I really think that the tendency to look for a single definition of consistency, applicable to a wide variety of different circumstances, is an example of the tendency to over-simplification in a mathematical sense which has been endemic in the field of statistics for many years past. I do not think the search is likely to succeed.

Now I wish to comment on the main part of the paper. In spite of the emphasis which Professor Rao has placed on the importance of regarding statistics as discriminators, it still tends to be overlooked that most of the definitions of consistency, second-order efficiency and so on can be applied to "estimators" of the form (t₁, t₂) used for estimating a single quantity θ. There is no restriction to single numbers. If one is reducing data, one can reduce it to a pair of numbers, or to a triplet, as well as to a single number, and this possibility should be borne in mind. It would help to get us away from the misleading concept of point estimation, which I think does a great deal of harm.

Finally, I would like also to comment, and I do not doubt that Professor Rao will agree with this, that nothing in this paper, or indeed nothing that has been said tonight,