Group Invariance in Statistical Inference
Narayan C. Giri
World Scientific, Singapore
Published by
World Scientific Publishing Co. Pte. Ltd.
P O Box 128, Farrer Road, Singapore 912805
USA office: Suite 1B, 1060 Main Street, River Edge, NJ 07661
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
GROUP INVARIANCE IN STATISTICAL INFERENCE
Copyright 1996 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN 9810218753
Printed in Singapore.
CONTENTS

Chapter 0. GROUP INVARIANCE
0.0. Introduction
0.1. Examples
Chapter 1. MATRICES, GROUPS AND JACOBIANS
1.0. Introduction
1.1. Matrices
1.1.1. Characteristic Roots and Vectors
1.1.2. Factorization of Matrices
1.1.3. Partitioned Matrices
1.2. Groups
1.3. Homomorphism, Isomorphism and Direct Product
1.4. Topological Transitive Groups
1.5. Jacobians
Chapter 2. INVARIANCE
2.0. Introduction
2.1. Invariance of Distributions
2.1.1. Transformation of Variable in Abstract Integral
2.2. Invariance of Testing Problems
2.3. Invariance of Statistical Tests and Maximal Invariant
2.4. Some Examples of Maximal Invariants
2.5. Distribution of Maximal Invariant
2.5.1. Existence of an Invariant Probability Measure on O(p) (Group of p x p Orthogonal Matrices)
2.6. Applications
2.7. The Distribution of a Maximal Invariant in the General Case
2.8. An Important Multivariate Distribution
2.9. Almost Invariance, Sufficiency and Invariance
2.10. Invariance, Type D and E Regions
Chapter 3. EQUIVARIANT ESTIMATION IN CURVED MODELS
3.1. Best Equivariant Estimation of μ with Σ Known
3.1.1. Maximum Likelihood Estimators
3.2. A Particular Case
3.2.1. An Application
3.2.2. Maximum Likelihood Estimator
3.3. Best Equivariant Estimation in Curved Covariance Models
3.3.1. Characterization of Equivariant Estimators of Σ
3.3.2. Characterization of Equivariant Estimators of θ
Chapter 4. SOME BEST INVARIANT TESTS IN MULTINORMALS
4.0. Introduction
4.1. Tests of Mean Vector
4.2. The Classification Problem (Two Populations)
4.3. Tests of Multiple Correlation
4.4. Tests of Multiple Correlation with Partial Information
Chapter 5. SOME MINIMAX TESTS IN MULTINORMALS
5.0. Introduction
5.1. Locally Minimax Tests
5.2. Asymptotically Minimax Tests
5.3. Minimax Tests
5.3.1. Hotelling's T² Test
5.3.2. R² Test
5.3.3. ε-Minimax Test (Linnik, 1966)
Chapter 6. LOCALLY MINIMAX TESTS IN SYMMETRICAL DISTRIBUTIONS
6.0. Introduction
6.1. Elliptically Symmetric Distributions
6.2. Locally Minimax Tests in E_p(μ, Σ)
Chapter 0 GROUP INVARIANCE
0.0. Introduction
One of the unpleasant facts of statistical problems is that they are often too big or too difficult to admit of practical solutions. Statistical decisions are made on the basis of sample observations, and sample observations often contain information which is not relevant to the making of the statistical decision. Some simplification is achieved by characterizing the decision rules in terms of the (minimal) sufficient statistic, which discards that part of the sample observations which is of no value for any decision concerning the parameter, thereby reducing the dimension of the sample space to that of the minimal sufficient statistic. This, however, does not reduce the dimension of the parametric space. By introducing the group invariance principle and restricting attention to invariant decision rules, a reduction in the dimension of the parametric space is possible. In view of the fact that sufficiency and group invariance are both successful in reducing the dimension of statistical problems, one is naturally interested in knowing whether both principles can be used simultaneously and, if so, in what order. Hall, Wijsman and Ghosh (1965) have shown that under certain conditions this reduction can be carried out by using both principles simultaneously, and that the order in which they are used is immaterial in such cases. However, one can avoid verifying these conditions by replacing the sample space by the space of the sufficient statistic and then using group invariance on the space of the sufficient statistic. In this monograph we treat only multivariate problems, where the reduction in dimension is very significant. In what follows we use the term invariance to indicate group invariance.

In statistics the term invariance is used in the mathematical sense to denote a property that remains unchanged (invariant) under a group of transformations. In actual practice many statistical problems possess such a property. As in other branches of applied science, it is a generally accepted principle in statistics that if a problem with a unique solution is invariant under a group of transformations, then the solution should also be invariant under it. This notion has an old origin in statistical science. Apart from this natural justification for the use of invariant decision rules, the unpublished work of Hunt and Stein towards the end of the Second World War has given this principle strong support as to its applicability and meaningfulness in proving various optimum properties, such as minimaxity and admissibility, of statistical decision rules. Although a great deal has been written concerning this principle in statistical inference, no great amount of literature exists concerning the problem of discerning whether or not a given statistical problem is actually invariant under a certain group of transformations. Brillinger (1963) gave necessary and sufficient conditions that a statistical problem must satisfy in order that it be invariant under a fairly large class of groups of transformations, including Lie groups. In this monograph we treat invariance in the framework of statistical decision rules only. De Finetti (1964), in his theory of exchangeability, treats invariance of the distribution of sample observations under finite permutations. It provides a crucial link between his theory of subjective probability and the frequency approach to probability.
The classical statistical methods take as basic a family of distributions; the true distribution of the sample observations is an unknown member of this family, about which statistical inference is required. According to De Finetti's approach no probability is unknown. If $x_1, x_2, \ldots$ are the outcomes of a sequence of experiments conducted under similar conditions, subjective uncertainty is expressed directly by ascribing to the corresponding random variables $X_1, X_2, \ldots$ a known joint distribution. When some of the $X$'s are observed, predictive inference about the others is made by conditioning the original distribution on the observations. De Finetti has shown that these approaches are equivalent when the subjectivist's joint distribution is invariant under finite permutations.

Two other related principles known in the literature are the weak invariance and the strong invariance principles. The weak invariance principle is used to demonstrate the sufficiency of the classical assumptions associated with the weak convergence of stable laws (Billingsley, 1968). This is popularly known as Donsker's theorem (Donsker, 1951). Let $X_1, X_2, \ldots$ be independent and identically distributed random variables with mean zero and variance $\sigma^2$, and let

$$S_j = \sum_{i=1}^{j} X_i, \qquad Y_n(t) = \frac{S_{[nt]}}{\sigma\sqrt{n}}, \quad 0 \le t \le 1.$$

Then $Y_n$ converges weakly to the standard Wiener process on $[0, 1]$. The strong invariance principle was introduced to prove strong convergence results (Tusnády, 1977). Here the term invariance is used in the sense that if $X_1, X_2, \ldots$ are independently distributed random variables with the same mean 0 and the same variance $\sigma^2$, and if $h$ is a continuous functional on the space of functions on $[0, 1]$, then the limiting distribution of $h(Y_n)$ does not depend on any other property of the $X_i$.

0.1. Examples
We now give an example to show how the solution to a statistical problem can be obtained through direct application of group theoretic results.

Example 0.1.1. Let $X_\alpha = (X_{\alpha 1}, \ldots, X_{\alpha p})'$, $\alpha = 1, \ldots, N\ (> p)$, be independently and identically distributed $p$-variate normal vectors with the same mean $\mu = (\mu_1, \ldots, \mu_p)'$ and the same positive definite covariance matrix $\Sigma$.
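The parametric space $\Omega$ is the space of all $(\mu, \Sigma)$. The problem of testing $H_0: \mu = 0$ against the alternatives $H_1: \mu \neq 0$ remains invariant under the full linear group $G_l(p)$ of $p \times p$ nonsingular matrices $g$ transforming each $X_\alpha \to gX_\alpha$, $\alpha = 1, \ldots, N$. Let

$$\bar{X} = \frac{1}{N}\sum_{\alpha=1}^{N} X_\alpha, \qquad S = \sum_{\alpha=1}^{N} (X_\alpha - \bar{X})(X_\alpha - \bar{X})'. \tag{0.1}$$

The pair $(\bar{X}, S)$ is sufficient for $(\mu, \Sigma)$, and the transformation induced by $g$ on the space of $(\bar{X}, S)$ is given by

$$(\bar{X}, S) \to (g\bar{X}, gSg'), \qquad g \in G_l(p). \tag{0.2}$$

Since this transformation permits arbitrary changes of $\bar{X}$ and $S$, and any reasonable statistical test procedure should not depend on any such arbitrary change by $g$, we conclude that a reasonable statistical test procedure should depend on $(\bar{X}, S)$ only through

$$T^2 = N(N-1)\,\bar{X}' S^{-1} \bar{X}. \tag{0.3}$$

The probability density function of $T^2$ is given by

$$f_{T^2}(t^2 \mid \delta^2) = \frac{e^{-\delta^2/2}}{(N-1)\,\Gamma\big(\tfrac{1}{2}(N-p)\big)} \sum_{j=0}^{\infty} \frac{(\delta^2/2)^j\,\Gamma\big(\tfrac{1}{2}N + j\big)}{j!\,\Gamma\big(\tfrac{1}{2}p + j\big)} \left(\frac{t^2}{N-1}\right)^{p/2 + j - 1} \left(1 + \frac{t^2}{N-1}\right)^{-(N/2 + j)}, \quad t^2 > 0, \tag{0.4}$$

where $\delta^2 = N\mu'\Sigma^{-1}\mu$. Under $H_0$, $\delta^2 = 0$ and under $H_1$, $\delta^2 > 0$. Applying the Neyman-Pearson Lemma, we conclude from (0.4) that the uniformly most powerful test of $H_0$ against $H_1$ based on $T^2$ rejects $H_0$ whenever $T^2 \ge C$, where the constant $C$ depends on the level $\alpha$ of the test.

Note. In this problem the dimension of $\Omega$ is $p + p(p+1)/2$, which is also the dimension of $(\bar{X}, S)$. For the distribution of $T^2$ the parameter is $\delta^2$, a scalar quantity.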
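The invariance argument behind (0.2) and (0.3) can be checked numerically. The following Python sketch (standard library only) is illustrative: it assumes $p = 2$, a simulated sample of hypothetical size $N = 10$ and an arbitrarily chosen nonsingular $g$, and all function names are our own. It computes $T^2 = N(N-1)\bar{X}'S^{-1}\bar{X}$ before and after transforming every observation by $g$.

```python
import random

P = 2  # dimension p (assumed small so that explicit 2x2 algebra suffices)

def mean_vector(xs):
    n = len(xs)
    return [sum(x[i] for x in xs) / n for i in range(P)]

def scatter_matrix(xs, xbar):
    # S = sum over alpha of (X_alpha - Xbar)(X_alpha - Xbar)'
    return [[sum((x[i] - xbar[i]) * (x[j] - xbar[j]) for x in xs) for j in range(P)]
            for i in range(P)]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def t_squared(xs):
    """Hotelling's T^2 = N (N - 1) Xbar' S^{-1} Xbar."""
    n = len(xs)
    xbar = mean_vector(xs)
    s_inv = inverse_2x2(scatter_matrix(xs, xbar))
    quad = sum(xbar[i] * s_inv[i][j] * xbar[j] for i in range(P) for j in range(P))
    return n * (n - 1) * quad

def transform(g, x):
    # x -> g x for a 2x2 matrix g
    return [sum(g[i][k] * x[k] for k in range(P)) for i in range(P)]

random.seed(0)
sample = [[random.gauss(0.3, 1.0), random.gauss(-0.1, 1.0)] for _ in range(10)]
g = [[2.0, 1.0], [0.5, 3.0]]  # any nonsingular matrix (det = 5.5)

t_before = t_squared(sample)
t_after = t_squared([transform(g, x) for x in sample])
print(t_before, t_after)  # equal up to floating-point rounding
```

Because $\bar{X} \to g\bar{X}$ and $S \to gSg'$, the quadratic form $\bar{X}'S^{-1}\bar{X}$ is unchanged, so the two printed values differ only by rounding error.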
One main reason for the intuitive appeal of the principle that, for an invariant problem with a unique solution, the solution should be invariant is probably the belief that there should be a unique way of analysing a collection of statistical data. As a word of caution we should point out that in cases where the use of an invariant decision rule conflicts violently with the desire to make a correct decision or to have a smaller risk, it must be abandoned. We give below one such example, which is due to Charles Stein as reported by Lehmann (1959, p. 338).

Example 0.1.2. Let $X = (X_1, \ldots, X_p)'$ and $Y = (Y_1, \ldots, Y_p)'$ be independently distributed normal $p$-vectors with the same mean 0 and positive definite covariance matrices $\Sigma$ and $\delta\Sigma$ respectively, where $\delta$ is an unknown scalar constant. Consider the problem of testing $H_0: \delta = 1$ against $H_1: \delta > 1$. This problem is invariant under $G_l(p)$ transforming $X \to gX$, $Y \to gY$, $g \in G_l(p)$. Since this group is transitive (see Chapter 1) on the space of values of $(X, Y)$ with probability one, the uniformly most powerful invariant test of level $\alpha$ under $G_l(p)$ is the trivial test $\phi(X, Y) = \alpha$, which rejects $H_0$ with constant probability $\alpha$ for all values $(x, y)$ of $(X, Y)$. Hence the maximum power that can be achieved over the alternatives $H_1$ by any invariant test under $G_l(p)$ is also $\alpha$. But the test which rejects $H_0$ whenever

$$\frac{Y_1^2}{X_1^2} \ge C, \tag{0.5}$$

where the constant $C$ depends on the level $\alpha$, has strictly increasing power $\beta(\delta)$ whose minimum over the set $\delta \ge \delta_1 > 1$ is $\beta(\delta_1) > \beta(1) = \alpha$. For more discussions and results refer to Giri (1983a, 1983b).
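The power of Stein's test can be computed in closed form, which makes the contrast with the invariant bound $\alpha$ concrete. This sketch assumes that the test in (0.5) rejects when $Y_1^2/X_1^2 \ge C$ (Stein's statistic as presented by Lehmann). Since $X_1$ and $Y_1/\sqrt{\delta}$ are independent normal variables with the same variance, $Y_1/X_1$ is distributed as $\sqrt{\delta}$ times a standard Cauchy variable, so $\beta(\delta) = 1 - (2/\pi)\arctan\sqrt{C/\delta}$. The function names are illustrative.

```python
import math

def critical_value(alpha):
    """C with P(Y1^2 / X1^2 >= C) = alpha when delta = 1.

    Under delta = 1, Y1 / X1 is standard Cauchy, and
    P(|Cauchy| >= t) = 1 - (2 / pi) * atan(t).
    """
    return math.tan(math.pi * (1.0 - alpha) / 2.0) ** 2

def power(delta, c):
    """beta(delta) = P(Y1^2 / X1^2 >= C), using Y1 / X1 ~ sqrt(delta) * Cauchy."""
    return 1.0 - (2.0 / math.pi) * math.atan(math.sqrt(c / delta))

alpha = 0.05
c = critical_value(alpha)
for delta in (1.0, 2.0, 5.0, 20.0):
    print(delta, round(power(delta, c), 4))
```

The printed power equals $\alpha$ at $\delta = 1$ and increases with $\delta$, whereas every $G_l(p)$-invariant level-$\alpha$ test has power exactly $\alpha$.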
Exercises
1. Let $X_1, \ldots, X_n$ be independently distributed normal random variables with the same mean $\theta$ and the same unknown variance $\sigma^2$, and let $H_0: \theta = 0$ and $H_1: \theta \neq 0$.
(a) Find the largest group of transformations which leaves the problem of testing $H_0$ against $H_1$ invariant.
(b) Using the group theoretic notion, show that the two-sided Student $t$-test is uniformly most powerful among all tests based on $t$.
2. Univariate General Linear Hypothesis. Let $X_1, \ldots, X_n$ be independently distributed normal random variables with $E(X_i) = \theta_i$, $\mathrm{Var}(X_i) = \sigma^2$, $i = 1, \ldots, n$. Let $\theta = (\theta_1, \ldots, \theta_n)' \in \Pi_\Omega$, a linear subspace of dimension $k$, and let $\Pi_\omega \subset \Pi_\Omega$ be a linear subspace of dimension $l < k$. Consider the problem of testing $H_0: \theta \in \Pi_\omega$ against the alternatives $H_1: \theta \in \Pi_\Omega - \Pi_\omega$.
(a) Find the largest groups of transformations which leave the problem invariant.
(b) Using the group theoretic notions, show that the usual $F$-test is uniformly most powerful invariant for testing $H_0$ against $H_1$.
3. Let $X_1, \ldots, X_n$ … For testing $H_0: \sigma^2 = \sigma_1^2$ against $H_1: \sigma^2 = \sigma_2^2 < \sigma_1^2$, where $\sigma_1^2, \sigma_2^2$ are known, find the most powerful …
References
1. P. Billingsley, Convergence of Probability Measures, Wiley, New York, 1968.
2. D. Brillinger, Necessary and sufficient conditions for a statistical problem to be invariant under a Lie group, Ann. Math. Statist., 34 (1963) 492-500.
3. B. De Finetti, Studies in Subjective Probability, H. E. Kyburg and H. E. Smokler, eds., Wiley, New York, (1964) 93-158.
4. M. Donsker, An invariance principle for certain probability limit theorems, Amer. Math. Soc., 6 (1951).
5. N. Giri, Hunt-Stein theorem, Encyclopedia of Statistical Sciences, Vol. 3, Kotz-Johnson, eds., Wiley, New York, 1983a, 686-689.
6. N. Giri, Invariance concepts in statistics, Encyclopedia of Statistical Sciences, Vol. 4, Kotz-Johnson, eds., Wiley, New York, 1983b, 219-224.
7. N. Giri, Multivariate Statistical Inference, Academic Press, New York, 1977.
8. W. J. Hall, R. A. Wijsman and J. K. Ghosh, The relationship between sufficiency and invariance with applications in sequential analysis, Ann. Math. Statist., 36 (1965) 575-614.
9. E. L. Lehmann, Testing Statistical Hypotheses, Wiley, New York, 1959.
10. G. Tusnády, in Recent Developments in Statistics, J. R. Barra, F. Brodeau, G. Romier and B. Van Cutsem, eds., North-Holland, Amsterdam, (1977) 289-300.
Chapter 1 MATRICES, GROUPS AND JACOBIANS
1.0. Introduction
The study of group invariance requires knowledge of matrix algebra and group theory. We present here some basic results on matrices, groups and Jacobians without proofs. Proofs can be obtained from Giri (1993, 1996) or any textbook on these topics.

1.1. Matrices
A $p \times q$ matrix $C = (c_{ij})$ is a rectangular array of real numbers $c_{ij}$ written as

$$C = \begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1q} \\ \vdots & \vdots & & \vdots \\ c_{p1} & c_{p2} & \cdots & c_{pq} \end{pmatrix}, \tag{1.1}$$

where $c_{ij}$ is the element in the $i$th row and the $j$th column. The transpose $C'$ of $C$ is a $q \times p$ matrix obtained by interchanging the rows and the columns of $C$. If $q = p$, $C$ is called a square matrix of order $p$. If $q = 1$, $C$ is a $p \times 1$ column vector, and if $p = 1$, $C$ is a $1 \times q$ row vector. A square matrix $C$ is symmetric if $C = C'$. A square matrix $C = (c_{ij})$ of order $p$ is a diagonal matrix $D(c_{11}, \ldots, c_{pp})$ with diagonal elements $c_{11}, \ldots, c_{pp}$ if all off-diagonal elements of $C$ are zero. A diagonal matrix with unit diagonal elements is an identity matrix and is denoted by $I$. Sometimes it will be necessary to write it as $I_k$ to denote an identity matrix of order $k$.
A square matrix $C = (c_{ij})$ of order $p$ is a lower triangular matrix if $c_{ij} = 0$ for $j > i$. The determinant of a lower triangular matrix is $\det C = \prod_{i=1}^{p} c_{ii}$. We shall also write $\det C = |C|$ for convenience. A square matrix $C = (c_{ij})$ of order $p$ is an upper triangular matrix if $c_{ij} = 0$ for $i > j$, and again $\det C = \prod_{i=1}^{p} c_{ii}$. A square matrix $C$ of order $p$ is nonsingular if $\det C \neq 0$; if $\det C = 0$, $C$ is a singular matrix. A nonsingular matrix $C$ of order $p$ is orthogonal if $CC' = C'C = I$. The inverse of a nonsingular matrix $C$ of order $p$ is the unique matrix $C^{-1}$ such that $CC^{-1} = C^{-1}C = I$. A square matrix $C = (c_{ij})$ of order $p$, or the associated quadratic form $x'Cx = \sum_i \sum_j c_{ij}x_ix_j$, is positive definite if $x'Cx > 0$ for $x = (x_1, \ldots, x_p)' \neq 0$.
1.1.1. Characteristic roots and vectors
The characteristic roots of a square matrix $C$ of order $p$ are given by the roots of the characteristic equation

$$\det(C - \lambda I) = 0, \tag{1.2}$$

where $\lambda$ is a scalar. As $\det(\theta C\theta' - \lambda I) = \det(C - \lambda I)$ for any orthogonal matrix $\theta$ of order $p$, the characteristic roots of $C$ remain invariant under the transformation $C \to \theta C\theta'$. The vector $x = (x_1, \ldots, x_p)' \neq 0$ satisfying

$$(C - \lambda I)x = 0 \tag{1.3}$$

is the characteristic vector of $C$ corresponding to its characteristic root $\lambda$. If $x$ is a characteristic vector of $C$ corresponding to its characteristic root $\lambda$, then any scalar multiple $ax$, $a \neq 0$, is also a characteristic vector of $C$ corresponding to $\lambda$.

Some Results on Characteristic Roots and Vectors
1. The characteristic roots of a real symmetric matrix are real.
2. The characteristic vectors corresponding to distinct characteristic roots of a symmetric matrix are orthogonal.
3. The characteristic roots of a symmetric positive definite matrix $C$ are all positive.
4. Given any real square symmetric matrix $C$ of order $p$, there exists an orthogonal matrix $\theta$ of order $p$ such that $\theta C\theta'$ is a diagonal matrix $D(\lambda_1, \ldots, \lambda_p)$, where $\lambda_1, \ldots, \lambda_p$ are the characteristic roots of $C$.
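The invariance of the characteristic roots under $C \to \theta C\theta'$ can be illustrated for order $p = 2$, where (1.2) reduces to $\lambda^2 - \operatorname{tr}(C)\lambda + \det C = 0$. This is a minimal sketch: the matrix $C$ and the rotation angle are arbitrary choices of ours, and a rotation is used as the orthogonal $\theta$.

```python
import math

def matmul(a, b):
    """Product of two small matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def char_roots_2x2(c):
    """Roots of det(C - lambda I) = lambda^2 - tr(C) lambda + det(C) = 0."""
    tr = c[0][0] + c[1][1]
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    disc = math.sqrt(tr * tr - 4.0 * det)  # nonnegative for a real symmetric C
    return sorted([(tr - disc) / 2.0, (tr + disc) / 2.0])

C = [[4.0, 1.0], [1.0, 3.0]]            # symmetric, so its roots are real
t = 0.7
theta = [[math.cos(t), -math.sin(t)],   # a rotation, hence orthogonal
         [math.sin(t), math.cos(t)]]
C_rot = matmul(matmul(theta, C), transpose(theta))  # theta C theta'

print(char_roots_2x2(C))
print(char_roots_2x2(C_rot))  # the same roots, up to rounding
```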
1.1.2. Factorization of matrices
In the sequel we shall frequently use the following factorizations of matrices. For every positive definite matrix $C$ of order $p$ there exists a nonsingular matrix $A$ of order $p$ such that $C = AA'$ and, hence, there exists a nonsingular matrix $B$ ($B = A^{-1}$) such that $BCB' = I$.

Given a symmetric nonsingular matrix $C$ of order $p$, there exists a nonsingular matrix $A$ of order $p$ such that

$$C = A\begin{pmatrix} I & 0 \\ 0 & -I \end{pmatrix}A', \tag{1.4}$$

where the order of $I$ is equal to the number of positive characteristic roots of $C$ and the order of $-I$ is equal to the number of negative characteristic roots of $C$. Given a symmetric positive definite matrix $C$ of order $p$, there exists a nonsingular lower triangular matrix $A$ (an upper triangular matrix $B$) of the same order $p$ such that

$$C = AA' = B'B. \tag{1.5}$$

Cholesky Decomposition. For every positive definite matrix $C$ there exists a unique lower triangular matrix $D$ with positive diagonal elements such that $C = DD'$.
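The Cholesky factor can be computed row by row from $c_{ij} = \sum_{k \le j} d_{ik}d_{jk}$. The sketch below is a plain textbook implementation of our own, not code from the text; the test matrix was chosen so that $D$ has integer entries. Taking $B = D'$ gives the upper triangular factor in (1.5), and $B = D^{-1}$ gives $BCB' = I$.

```python
import math

def cholesky(c):
    """Lower triangular D with positive diagonal such that C = D D'."""
    p = len(c)
    d = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(d[i][k] * d[j][k] for k in range(j))
            if i == j:
                pivot = c[i][i] - s
                if pivot <= 0.0:
                    raise ValueError("C is not positive definite")
                d[i][i] = math.sqrt(pivot)
            else:
                d[i][j] = (c[i][j] - s) / d[j][j]
    return d

def times_transpose(d):
    """Return D D'."""
    p = len(d)
    return [[sum(d[i][k] * d[j][k] for k in range(p)) for j in range(p)] for i in range(p)]

C = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
D = cholesky(C)
print(D)                        # [[2.0, 0.0, 0.0], [1.0, 2.0, 0.0], [1.0, 1.0, 2.0]]
print(times_transpose(D) == C)  # True
```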
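1.1.3. Partitioned matrices
Let $C = (c_{ij})$ be a $p \times q$ matrix and let $C$ be partitioned into submatrices $C_{ij}$ as

$$C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix},$$

where $C_{11} = (c_{ij})$ $(i = 1, \ldots, m;\ j = 1, \ldots, n)$, $C_{12} = (c_{ij})$ $(i = 1, \ldots, m;\ j = n+1, \ldots, q)$, $C_{21} = (c_{ij})$ $(i = m+1, \ldots, p;\ j = 1, \ldots, n)$ and $C_{22} = (c_{ij})$ $(i = m+1, \ldots, p;\ j = n+1, \ldots, q)$. For a square matrix $C$ $(p = q,\ m = n)$ with $C_{11}$ and $C_{22}$ nonsingular,

$$\det C = \det C_{22}\,\det(C_{11} - C_{12}C_{22}^{-1}C_{21}) = \det C_{11}\,\det(C_{22} - C_{21}C_{11}^{-1}C_{12}).$$

If $C$ is nonsingular and $B = C^{-1}$ is partitioned in the same way as $C$, then

$$B_{11}^{-1} = C_{11} - C_{12}C_{22}^{-1}C_{21}, \qquad B_{22}^{-1} = C_{22} - C_{21}C_{11}^{-1}C_{12},$$

or equivalently $C_{ii}^{-1} = B_{ii} - B_{ij}B_{jj}^{-1}B_{ji}$, $i \neq j = 1, 2$, and

$$C_{12}C_{22}^{-1} = -B_{11}^{-1}B_{12}. \tag{1.6}$$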
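The determinant identity for partitioned matrices can be checked on a small example. In this sketch (our own illustrative numbers) $C$ is $3 \times 3$ with $C_{11}$ of order 2 and $C_{22}$ a $1 \times 1$ block, so $C_{22}^{-1}$ is a reciprocal and the Schur complement $C_{11} - C_{12}C_{22}^{-1}C_{21}$ needs no matrix inversion.

```python
def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def det3(m):
    """Cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

C = [[4.0, 1.0, 2.0],
     [1.0, 3.0, 0.5],
     [2.0, 0.5, 5.0]]

# Partition with m = n = 2: C11 is 2x2, C12 is 2x1, C21 is 1x2, C22 is 1x1.
C11 = [row[:2] for row in C[:2]]
C12 = [row[2] for row in C[:2]]
C21 = C[2][:2]
c22 = C[2][2]                     # C22^{-1} = 1 / c22 for a 1x1 block

# Schur complement C11 - C12 C22^{-1} C21
schur = [[C11[i][j] - C12[i] * C21[j] / c22 for j in range(2)] for i in range(2)]

lhs = det3(C)
rhs = c22 * det2(schur)           # det C22 * det(C11 - C12 C22^{-1} C21)
print(lhs, rhs)                   # both 44, up to rounding in rhs
```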
1.2. Groups
A group $G$ is a set with an operation $\tau$ satisfying the following axioms.
A1. For any two elements $a, b \in G$, $a\tau b \in G$.
A2. For any three elements $a, b, c \in G$, $(a\tau b)\tau c = a\tau(b\tau c)$.
A3. There exists an element $e$ (identity element) such that for all $a \in G$, $a\tau e = a$.
A4. For each $a \in G$ there exists an element $a^{-1} \in G$ (inverse element) such that $a\tau a^{-1} = e$.

In what follows we will write, for convenience of notation, $a\tau b = ab$. In such writing the reader should not confuse it with the arithmetic product $ab$. A group $G$ is abelian if for any pair of elements $a, b$ belonging to $G$, $ab = ba$. A nonempty subset $H$ of $G$ is a subgroup if the restriction of the group operation $\tau$ to $H$ satisfies the axioms A1, ..., A4.

Examples of Groups
Example 1.1. Let $X$ be a set and $G$ be the set of all one-to-one mappings $g: X \to X$ of $X$ onto $X$, that is, $g(x) = g(y)$, $x, y \in X$, implies $x = y$, and for every $y \in X$ there exists $x \in X$ such that $y = g(x)$. With the group operation defined by $g_1\tau g_2(x) = g_1(g_2(x))$, $g_1, g_2 \in G$, $G$ forms a permutation group.
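For a finite set the axioms A1 to A4 can be verified exhaustively. The sketch below, an illustration of ours, takes $X = \{0, 1, 2\}$, represents each one-to-one mapping of $X$ onto $X$ as a tuple, uses composition $g_1(g_2(x))$ as the group operation of Example 1.1, and also confirms that this permutation group is not abelian.

```python
from itertools import permutations, product

X = (0, 1, 2)
G = list(permutations(X))  # each tuple g represents the map x -> g[x]

def compose(g1, g2):
    """(g1 tau g2)(x) = g1(g2(x)), the group operation of Example 1.1."""
    return tuple(g1[g2[x]] for x in X)

e = tuple(X)  # identity mapping

def inverse(g):
    inv = [0] * len(X)
    for x in X:
        inv[g[x]] = x
    return tuple(inv)

closure = all(compose(a, b) in G for a, b in product(G, G))                      # A1
assoc = all(compose(compose(a, b), c) == compose(a, compose(b, c))
            for a, b, c in product(G, G, G))                                     # A2
identity = all(compose(a, e) == a for a in G)                                    # A3
inverses = all(compose(a, inverse(a)) == e for a in G)                           # A4
abelian = all(compose(a, b) == compose(b, a) for a, b in product(G, G))
print(closure, assoc, identity, inverses, abelian)  # True True True True False
```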
Example 1.2. The additive group of real numbers is the set of all reals with the group operation $ab = a + b$. The multiplicative group of all nonzero reals has the group operation $ab = a$ multiplied by $b$.

Example 1.3. Let $X$ be a linear space of dimension $n$. Define for $x_0 \in X$, $g_{x_0}(x) = x + x_0$. The set of all such $g_{x_0}$ forms an additive abelian group and is called the translation group.

Example 1.4. Let $X$ be a linear space of dimension $n$ and let $G_l(n)$ be the set of all nonsingular linear transformations of $X$ onto $X$. $G_l(n)$ with matrix multiplication as the group operation is called the full linear group.

Example 1.5. The affine group is the set of pairs $(g, x)$, $x \in X$, $g \in G_l(n)$, with the group operation $(g_1, x_1)(g_2, x_2) = (g_1g_2, g_1x_2 + x_1)$, $g_1, g_2 \in G_l(n)$, $x_1, x_2 \in X$. The identity element and the inverse of $(g, x)$ are $(I, 0)$ and $(g, x)^{-1} = (g^{-1}, -g^{-1}x)$.

Example 1.6. The set of all nonsingular lower triangular matrices of order $p$, with the usual matrix multiplication as the group operation, forms a group $G_T(p)$ (with the identity matrix as the unit element). Similarly, the set of all nonsingular upper triangular matrices of order $p$ forms the group $G_{UT}(p)$ with the identity matrix as the unit element.

Example 1.7. The set of all orthogonal matrices $\theta$ of order $p$, with matrix multiplication as the group operation, forms a group $O(p)$.

Example 1.8. Let $\{0\} = X_0 \subset X_1 \subset X_2 \subset \cdots \subset X_k = X$ be a strictly increasing sequence of linear subspaces of $X$ ($\dim X = p$) and let $G$ be the subgroup of $G_l(p)$ such that $g \in G$ if and only if $gX_i = X_i$, $i = 1, \ldots, k$. The group $G$ is the group of nonsingular linear transformations of $X$ onto $X$ which leave each $X_i$ invariant. Let $n_i = \dim(X_i) - \dim(X_{i-1})$ and choose vectors $x_1^{(i)}, \ldots, x_{n_i}^{(i)}$, $i = 1, \ldots, k$, which together with a basis of $X_{i-1}$ form a basis of $X_i$. With respect to the basis $x_1^{(1)}, \ldots, x_{n_k}^{(k)}$ of $X$, any $g \in G$ can be written as

$$g = \begin{pmatrix} g_{(11)} & g_{(12)} & \cdots & g_{(1k)} \\ 0 & g_{(22)} & \cdots & g_{(2k)} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & g_{(kk)} \end{pmatrix}, \tag{1.7}$$

where $g_{(ij)}$ is a block of $n_i$ rows and $n_j$ columns for $i \le j$. If $n_i = 1$ for all $i$, $G = G_{UT}(p)$.
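The affine group of Example 1.5 can be checked on concrete elements with $n = 2$. In this sketch (helper names are ours) an element is a pair $(g, x)$ stored with exact rational entries; the code verifies associativity, the stated identity and inverse, and that the operation corresponds to composing the maps $v \to gv + x$.

```python
from fractions import Fraction as F

def mat_mul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def mat_vec(a, v):
    return tuple(sum(a[i][k] * v[k] for k in range(2)) for i in range(2))

def vec_add(u, v):
    return tuple(ui + vi for ui, vi in zip(u, v))

def mat_inv(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return ((a[1][1] / det, -a[0][1] / det), (-a[1][0] / det, a[0][0] / det))

IDENTITY = ((F(1), F(0)), (F(0), F(1)))
ZERO = (F(0), F(0))

def op(e1, e2):
    """(g1, x1)(g2, x2) = (g1 g2, g1 x2 + x1)."""
    (g1, x1), (g2, x2) = e1, e2
    return (mat_mul(g1, g2), vec_add(mat_vec(g1, x2), x1))

def inverse(e):
    """(g, x)^{-1} = (g^{-1}, -g^{-1} x)."""
    g, x = e
    gi = mat_inv(g)
    return (gi, tuple(-c for c in mat_vec(gi, x)))

def act(e, v):
    """(g, x) acting on a point v of X: v -> g v + x."""
    g, x = e
    return vec_add(mat_vec(g, v), x)

a = (((F(2), F(1)), (F(0), F(3))), (F(1), F(-2)))
b = (((F(1), F(4)), (F(1), F(1))), (F(5), F(7)))
c = (((F(1), F(0)), (F(2), F(1))), (F(0), F(3)))
v = (F(3), F(-1))

print(op(op(a, b), c) == op(a, op(b, c)))      # associativity
print(op(a, inverse(a)) == (IDENTITY, ZERO))   # inverse
print(act(op(a, b), v) == act(a, act(b, v)))   # operation = composition of maps
```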
1.3. Homomorphism, Isomorphism and Direct Product
Let $G$, $H$ be groups. A mapping $f$ of $G$ into $H$ is called a homomorphism if, for $g_1, g_2 \in G$,

$$f(g_1g_2) = f(g_1) * f(g_2), \tag{1.8}$$

where $*$ is the group operation in $H$. Clearly $f(e)$ is the identity element in $H$ and $f(g^{-1}) = [f(g)]^{-1}$. If, in addition, $f$ is a one-to-one mapping, then $f$ is called an isomorphism. The Cartesian product $G \times H$ of groups $G$ and $H$ with the operation

$$(g_1, h_1)(g_2, h_2) = (g_1g_2,\ h_1 * h_2), \qquad g_1, g_2 \in G,\ h_1, h_2 \in H,$$

is called the direct product of $G$ and $H$. A subgroup $H$ of $G$ is a normal subgroup of $G$ if for all $h \in H$ and $g \in G$, $ghg^{-1} \in H$, or equivalently if $gHg^{-1} = H$.

Example 1.9. The group of lower triangular matrices with positive diagonal elements is a normal subgroup of $G_T(p)$. Every subgroup of an abelian group is normal.

Let $G$ be a group and let $H$ be a normal subgroup of $G$. The set $G/H$ of elements of the form $gH$, $g \in G$, is called the quotient group of $G$ modulo $H$, with the group operation defined, for $g_1, g_2 \in G$, by letting $(g_1H)(g_2H)$ be the set of elements obtained by multiplying all elements of $g_1H$ by all elements of $g_2H$; this set is $g_1g_2H$. The identity element is $H$ and $(gH)^{-1} = g^{-1}H$.
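A standard example of a homomorphism, added here as an illustration and not taken from the text, is $f(g) = \det g$ from $G_l(2)$ into the multiplicative group of nonzero reals. The sketch checks (1.8), $f(e) = 1$ and $f(g^{-1}) = [f(g)]^{-1}$ in exact arithmetic; since many matrices share a determinant, $f$ is not one-to-one and so is not an isomorphism.

```python
from fractions import Fraction as F

def mat_mul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def mat_inv(a):
    d = det(a)
    return ((a[1][1] / d, -a[0][1] / d), (-a[1][0] / d, a[0][0] / d))

E = ((F(1), F(0)), (F(0), F(1)))
g1 = ((F(2), F(1)), (F(1), F(3)))
g2 = ((F(0), F(-1)), (F(4), F(5)))

preserves_product = det(mat_mul(g1, g2)) == det(g1) * det(g2)   # (1.8)
maps_identity = det(E) == 1
maps_inverse = det(mat_inv(g1)) == 1 / det(g1)
print(preserves_product, maps_identity, maps_inverse)  # True True True
```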
1.4. Topological Transitive Groups
Let $X$ be a set and $\mathcal{A}$ a collection of subsets of $X$. $(X, \mathcal{A})$ is called a topological space if $\mathcal{A}$ satisfies the following axioms:
TA1: $X \in \mathcal{A}$.
TA2: The union of an arbitrary subfamily of $\mathcal{A}$ belongs to $\mathcal{A}$.
TA3: The intersection of a finite subfamily of $\mathcal{A}$ belongs to $\mathcal{A}$.
The elements of $\mathcal{A}$ are called open sets. If, in addition, the topological space $(X, \mathcal{A})$ satisfies: for $x, y \in X$, $x \neq y$, there exist two open sets $A, B$ belonging to $\mathcal{A}$ such that $A \cap B = \emptyset$ with $x \in A$, $y \in B$, then it is called a Hausdorff space. A collection of open sets is a base for a topology if every open set in $\mathcal{A}$ is the union of a subcollection of sets in the base.
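If a group $G$ has a topology $\tau$ defined on it such that under $\tau$ the mappings $(a, b) \to ab$ from $G \times G$ into $G$ and $a \to a^{-1}$ from $G$ into $G$ are continuous, then $(G, \tau)$ is a topological group. A compact group is a topological group which is a compact space. A locally compact group is a topological group which is a locally compact space. A compact space is a Hausdorff space $X$ such that every open covering of $X$ can be reduced to a finite subcovering. A locally compact space is a Hausdorff space such that every point has at least one compact neighborhood.

Let $X$ be a set. The group $G$ operates from the left on $X$ if there exists a function on $G \times X$ into $X$, whose value at $(g, x)$ is denoted by $gx$, such that
(i) $ex = x$ for all $x \in X$, where $e$ is the identity element of $G$;
(ii) $g_2(g_1x) = (g_2g_1)x$ for all $g_1, g_2 \in G$, $x \in X$.
This implies that each $g \in G$ is a one-to-one mapping of $X$ onto $X$. Let $G$ operate from the left on $X$. $G$ operates transitively on $X$ if for every $x_1, x_2 \in X$ there exists a $g \in G$ such that $gx_1 = x_2$.

Example 1.10. Let $X$ be a linear space. The full linear group operates transitively on $X - \{0\}$.

Example 1.11. The group $O(n)$ of orthogonal matrices operates transitively on the set of all $n \times p$ $(n \ge p)$ real matrices $x$ satisfying $x'x = I_p$.

Example 1.12. The linear group $G_l(p)$ acts transitively on the set of all $p \times p$ positive definite matrices $s$, transforming $s$ to $gsg'$, $g \in G_l(p)$.

1.5. Jacobians
Let $X_1, \ldots, X_n$ be continuous random variables with joint probability density function $f_{X_1, \ldots, X_n}(x_1, \ldots, x_n)$, and let $Y_i = g_i(X_1, \ldots, X_n)$, $i = 1, \ldots, n$, be a set of one-to-one transformations of $X_1, \ldots, X_n$ with continuous partial derivatives, whose inverse is $X_i = h_i(Y_1, \ldots, Y_n)$, $i = 1, \ldots, n$. The determinant

$$J = \det\left(\frac{\partial X_i}{\partial Y_j}\right) \tag{1.9}$$

is called the Jacobian of the transformation $X_1, \ldots, X_n$ to $Y_1, \ldots, Y_n$. The pdf of $Y_1, \ldots, Y_n$ is given by

$$f_{Y_1, \ldots, Y_n}(y_1, \ldots, y_n) = f_{X_1, \ldots, X_n}\big(h_1(y), \ldots, h_n(y)\big)\,|J|,$$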
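where $|J|$ is the absolute value of $J$. We now state some results on Jacobians without proof. We refer to Giri (1994), Olkin (1962) and Roy (1957) for further results and proofs.

Some Results on Jacobians
1. Let $X = (X_1, \ldots, X_p)'$, $Y = (Y_1, \ldots, Y_p)' \in E^p$. The Jacobian of the transformation $X \to Y = gX$, $g \in G_l(p)$, is $(\det g)^{-1}$.
2. Let $X, Y$ be $p \times n$ matrices. The Jacobian of the transformation $X \to Y = gX$, $g \in G_l(p)$, is $(\det g)^{-n}$.
3. Let $g, h \in G_T(p)$. The Jacobian of the transformation $g \to hg$ is $\prod_{i=1}^{p} (h_{ii})^{-i}$.
4. Let $g, h \in G_{UT}(p)$. The Jacobian of the transformation $g \to hg$ is $\prod_{i=1}^{p} (h_{ii})^{-(p-i+1)}$.
5. Let $G_{BT}(p)$ be the multiplicative group of $p \times p$ nonsingular lower triangular matrices in the block form, i.e., $g \in G_{BT}(p)$ is

$$g = \begin{pmatrix} g_{(11)} & 0 & \cdots & 0 \\ g_{(21)} & g_{(22)} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ g_{(k1)} & g_{(k2)} & \cdots & g_{(kk)} \end{pmatrix},$$

where $g_{(ij)}$ are submatrices of order $p_i \times p_j$ such that $\sum_{i=1}^{k} p_i = p$. For $g, h \in G_{BT}(p)$ the Jacobian of the transformation $g \to hg$ is $\prod_{i=1}^{k} [\det h_{(ii)}]^{-\sigma_i}$, where $\sigma_i = \sum_{j=1}^{i} p_j$.
6. Let $G_{BUT}(p)$ be the multiplicative group of $p \times p$ nonsingular upper triangular matrices in the block form, i.e., $g \in G_{BUT}(p)$ is

$$g = \begin{pmatrix} g_{(11)} & g_{(12)} & \cdots & g_{(1k)} \\ 0 & g_{(22)} & \cdots & g_{(2k)} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & g_{(kk)} \end{pmatrix},$$

where $g_{(ij)}$ are submatrices of order $p_i \times p_j$ with $\sum_{i=1}^{k} p_i = p$. For $g, h \in G_{BUT}(p)$ the Jacobian of the transformation $g \to hg$ is $\prod_{i=1}^{k} [\det h_{(ii)}]^{-(p - \sigma_{i-1})}$, where $\sigma_0 = 0$.
7. Let $S$ be a symmetric positive definite matrix of order $p$. The Jacobian of the transformation $S \to gSg'$, $g \in G_l(p)$, is $(\det g)^{-(p+1)}$, and that of the transformation $S \to S^{-1}$ is $(\det S)^{p+1}$.
8. Let $S$ be a symmetric positive definite matrix of order $p$ and let $g$ be the unique lower triangular matrix with positive diagonal elements such that $S = gg'$. The Jacobian of the transformation $g \to S = gg'$ is $2^{-p}\prod_{i=1}^{p} (g_{ii})^{-(p-i+1)}$.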
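The Jacobian of left multiplication on $G_T(p)$ can be checked directly. For $g \to hg$ with $g, h$ lower triangular, the determinant of $\partial(hg)/\partial g$ over the free entries $g_{ij}$, $i \ge j$, should equal $\prod_i (h_{ii})^i$; the Jacobian in the convention of (1.9), which differentiates the old variables with respect to the new ones, is its reciprocal. This is a sketch of ours for $p = 3$: because the map is linear, its matrix is built by applying it to unit lower triangular matrices, and the determinant is taken in exact rational arithmetic.

```python
from fractions import Fraction

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def det(m):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in m]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n):
                m[r][k] -= factor * m[col][k]
    return result

def jacobian_left_mult(h):
    """det of d(hg)/dg over the free entries g_ij (i >= j) of a lower triangular g."""
    n = len(h)
    free = [(i, j) for j in range(n) for i in range(j, n)]  # column-wise order
    columns = []
    for (k, l) in free:
        unit = [[1 if (i, j) == (k, l) else 0 for j in range(n)] for i in range(n)]
        image = matmul(h, unit)  # g -> hg is linear, so this is the (k, l) column
        columns.append([image[i][j] for (i, j) in free])
    size = len(free)
    jac = [[columns[c][r] for c in range(size)] for r in range(size)]
    return det(jac)

h = [[2, 0, 0],
     [1, 3, 0],
     [4, 5, 7]]
expected = 1
for i in range(3):
    expected *= h[i][i] ** (i + 1)   # h_ii raised to i (1-based row index)

print(jacobian_left_mult(h), expected)  # 6174 6174
```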
Exercises
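1. Show that for any nonsingular symmetric matrix $A$ of order $p$ and nonnull $p$-vectors $x, y$ with $1 + y'A^{-1}x \neq 0$,
(a) $(A + xy')^{-1} = A^{-1} - \dfrac{A^{-1}xy'A^{-1}}{1 + y'A^{-1}x}$.
2. Let $A$ and $B$ be $p \times q$ and $q \times p$ matrices, respectively. …
3. Show that for any lower triangular matrix $C$ the diagonal elements are its characteristic roots.
4. Let $\mathcal{X}$ be the set of all $n \times p$ real matrices $X$ satisfying $X'X = I_p$. Show that the group $O(n)$, transforming $X \to \theta X$, $\theta \in O(n)$, acts transitively on $\mathcal{X}$.
5. Let $\mathcal{S}$ be the set of all symmetric positive definite matrices $s$ of order $p$. Show that $G_l(p)$, which transforms $s \to gsg'$, $g \in G_l(p)$, acts transitively on $\mathcal{S}$.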
References
1. N. Giri, Multivariate Statistical Inference, Academic Press, New York, 1977.
2. N. Giri, Introduction to Probability and Statistics, 2nd Edition (Expanded), Marcel Dekker, New York, 1993.
3. N. Giri, Multivariate Statistical Analysis, Marcel Dekker, New York, 1996.
4. I. Olkin, Note on the Jacobian of certain transformations useful in multivariate analysis, Biometrika, 40 (1962) 43-46.
5. S. N. Roy, Some Aspects of Multivariate Analysis, Wiley, New York, 1957.
Chapter 2 INVARIANCE
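2.0. Introduction
Many statistical problems involving testing of hypotheses exhibit symmetries, which impose additional restrictions on the choice of appropriate statistical tests; for example, the statistical tests must also exhibit the same kind of symmetry as is present in the problem. In this chapter we shall consider the principle of invariance of statistical testing problems only. For additional references the reader is referred to Lehmann (1959), Ferguson (1967) and Nachbin (1965).

2.1. Invariance of Distributions
Let $(X, \mathcal{A})$ be a measurable space and let $\Omega$ be the set of points $\theta$, that is, $\Omega = \{\theta\}$. Consider a function $P$ on $\Omega$ to the set of probability measures on $(X, \mathcal{A})$ whose value at $\theta$ is $P_\theta$. Let $G$ be a group of transformations $g$ operating from the left on $X$ such that

$$g: X \to X \ \text{is}\ (\mathcal{A}, \mathcal{A})\ \text{measurable}, \tag{2.1}$$

and such that for every $g \in G$ and $\theta \in \Omega$, if $X$ is distributed according to $P_\theta$, then $gX$ is distributed according to $P_{\theta^*}$, where

$$\theta^* = \bar{g}\theta \in \Omega. \tag{2.2}$$

All transformations considered in connection with invariance will be taken for granted as one-to-one from $X$ onto $X$. An equivalent way of stating (2.2) is as follows: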
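$$P_\theta(gX \in A) = P_{\bar{g}\theta}(X \in A), \qquad A \in \mathcal{A}, \tag{2.2a}$$

or

$$P_\theta(g^{-1}A) = P_{\bar{g}\theta}(A), \qquad A \in \mathcal{A}. \tag{2.2b}$$

If, in addition, $P_{\theta_1}(B) = P_{\theta_2}(B)$ for all $B \in \mathcal{A}$ implies $\theta_1 = \theta_2$, then $\bar{g}$ is uniquely determined by $g$, the transformations $\bar{g}$ form a group $\bar{G}$ of transformations of $\Omega$, and the mapping $g \to \bar{g}$ is a homomorphism.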
The condition (2.2a) is often known as the condition of invariance of the distribution. Very often in statistical problems there exists a measure $\lambda$ such that $P_\theta$ is absolutely continuous with respect to $\lambda$ for all $\theta \in \Omega$, so that $p_\theta$ is the corresponding probability density function with respect to $\lambda$. In other words, we can write

$$P_\theta(A) = \int_A p_\theta(x)\, d\lambda(x), \qquad A \in \mathcal{A}. \tag{2.3}$$

Also in a great many cases of interest, it is possible to choose the measure $\lambda$ such that it is invariant under $G$, viz.

$$\lambda(A) = \lambda(gA) \quad \text{for all } A \in \mathcal{A},\ g \in G. \tag{2.4}$$

Definition 2.1.1. A measure $\lambda$ satisfying (2.4) is left invariant. In such cases the condition of invariance of the distribution under $G$ reduces to

$$p_{\bar{g}\theta}(gx) = p_\theta(x) \quad \text{for all } x \in X,\ g \in G. \tag{2.5}$$
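Condition (2.5) can be seen in the simplest location model, chosen by us for illustration: $X \sim N(\theta, 1)$ on the real line, $g_c(x) = x + c$, induced parameter map $\bar{g}_c\theta = \theta + c$, and $\lambda$ the Lebesgue measure, which is translation invariant. The sketch checks (2.5) pointwise and (2.2a) on an interval $A = (a, b]$; the names `g` and `g_bar` are illustrative.

```python
import math

def normal_pdf(x, theta):
    # density of N(theta, 1) with respect to Lebesgue measure
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2.0 * math.pi)

def normal_cdf(x, theta):
    return 0.5 * (1.0 + math.erf((x - theta) / math.sqrt(2.0)))

def g(x, c):
    """Transformation of the sample space: x -> x + c."""
    return x + c

def g_bar(theta, c):
    """Induced transformation of the parameter space: theta -> theta + c."""
    return theta + c

theta, c = 0.4, 1.7
xs = [-2.0, -0.3, 0.0, 1.1, 3.5]

# (2.5): p_{g_bar theta}(g x) = p_theta(x)
pdf_ok = all(math.isclose(normal_pdf(g(x, c), g_bar(theta, c)), normal_pdf(x, theta))
             for x in xs)

# (2.2a) with A = (a, b]: P_theta(gX in A) = P_{g_bar theta}(X in A)
a, b = -0.5, 2.0
lhs = normal_cdf(b - c, theta) - normal_cdf(a - c, theta)  # gX in A  <=>  X in (a - c, b - c]
rhs = normal_cdf(b, g_bar(theta, c)) - normal_cdf(a, g_bar(theta, c))
print(pdf_ok, lhs, rhs)
```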
The general theory of invariant measures in a large class of topological groups was first given by Haar (1933). However, the results we need were all known by the end of the nineteenth century. In the terminology of Haar, $P_\theta(A)$ is called a positive integral of $p_\theta$, not necessarily a probability density function. The basic result is that for a large class of topological groups there is a left invariant measure, positive on open sets and finite on compact sets, and to within multiplication by a positive constant this measure is unique. Haar proved this result for locally compact topological groups. The definition of right invariant positive integrals, viz. $P_\theta(A) = P_{\bar{g}\theta}(Ag)$, is analogous to that for left invariant positive integrals.
Because of the pioneering works of Haar, such invariant measures are called invariant (left or right) Haar measures. A rigorous presentation of Haar measure will not be attempted here; the reader is referred to Nachbin (1965). It has been shown that such invariant measures do exist, and from a left invariant Haar measure one can construct the right invariant Haar measure. We also need the following measure-theoretic notions for further developments. Let $G$ be a locally compact group and let $\mathcal{B}$ be the $\sigma$-algebra generated by the compact subsets of $G$.

Definition 2.1.1 (Relatively left invariant measure). A measure $\nu$ on $(G, \mathcal{B})$ is relatively left invariant with left multiplier $\chi(g)$ if $\nu$ satisfies

$$\nu(gB) = \chi(g)\,\nu(B) \quad \text{for all } B \in \mathcal{B}. \tag{2.5a}$$

The multiplier $\chi(g)$ is a continuous homomorphism from $G$ into $R^{+}$. If $\nu$ is relatively left invariant with left multiplier $\chi(g)$, then $\chi(g^{-1}) = 1/\chi(g)$ and $\chi(g^{-1})\,\nu(dg)$ is a left invariant measure.
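A simple illustration, chosen by us and not taken from the text: on the multiplicative group $G = (0, \infty)$, Lebesgue measure $\nu$ satisfies $\nu(gB) = g\,\nu(B)$, so it is relatively left invariant with multiplier $\chi(g) = g$, and $\chi(g^{-1})\,\nu(dg) = dg/g$ is left invariant. The sketch checks these statements on intervals $B = [a, b]$ with closed-form measures.

```python
import math

def nu(a, b):
    """Lebesgue measure of the interval [a, b], 0 < a < b."""
    return b - a

def mu(a, b):
    """Measure of [a, b] under chi(g^{-1}) nu(dg) = dg / g."""
    return math.log(b / a)

def chi(g):
    """Left multiplier of nu on the multiplicative group (0, inf)."""
    return g

checks = []
for g in (0.25, 3.0, 10.0):
    for a, b in ((0.5, 2.0), (1.0, 7.5)):
        checks.append(math.isclose(nu(g * a, g * b), chi(g) * nu(a, b)))  # (2.5a)
        checks.append(math.isclose(mu(g * a, g * b), mu(a, b)))          # left invariance
        checks.append(math.isclose(chi(1.0 / g), 1.0 / chi(g)))          # chi(g^{-1}) = 1/chi(g)
        checks.append(math.isclose(chi(g * a), chi(g) * chi(a)))         # homomorphism
print(all(checks))  # True
```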
Definition 2.1.2. The group $G$ acts topologically on the left of $X$ if the mapping $T(g, x) = gx$ from $G \times X$ into $X$ is continuous.

Definition 2.1.3 (Proper $G$-space). Let the group $G$ act topologically on the left of $X$ and let $h$ be the mapping of $G \times X$ into $X \times X$ given by

$$h(g, x) = (gx, x) \quad \text{for } g \in G,\ x \in X.$$

The space $X$ is called a proper $G$-space if $h$ is proper, that is, if the inverse image under $h$ of every compact subset of $X \times X$ is compact; $G$ is then said to act properly on $X$. An equivalent concept of the proper $G$-space is the Cartan $G$-space.

Definition 2.1.4 (Cartan $G$-space). Let $G$ act topologically on $X$. Then $X$ is called a Cartan $G$-space if for every $x \in X$ there exists a neighborhood $V$ of $x$ such that $(V, V) = \{g \in G : (gV) \cap V \neq \emptyset\}$ has a compact closure. Wijsman (1967, 1978, 1985, 1986) demonstrates the verification of the conditions of Cartan $G$-space and proper actions in a number of multivariate testing problems on covariance structures.

Example 2.1.1. Let $G$ be a subgroup of the permutation group of a finite set $X$. For any subset $A$ of $X$ define $\lambda(A)$ = number of points in $A$. The measure $\lambda$ is an invariant measure under $G$ and is unique up to a positive constant. It is known as counting measure.

Example 2.1.2. Let $X$ be the Euclidean $n$-space and $G$ the group of translations $g_{x_1}$, $x_1 \in X$, defined by $g_{x_1}(x) = x + x_1$. Clearly the Lebesgue measure $\lambda$ is invariant under $G$. Again $\lambda$ is unique up to a positive multiplicative constant.

Example 2.1.3. Consider the group $G_l(n)$ operating on itself from the left.
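Let $g, x, y \in G_l(n)$ with $g = (g_{ij})$, $x = (x_{ij})$, $y = (y_{ij})$, and consider the transformation $y = gx$, that is,

$$y_{ij} = \sum_{k=1}^{n} g_{ik}x_{kj}.$$

Write the elements of an $n \times n$ matrix as a vector of dimension $n^2$ by writing its first column horizontally, then the second column likewise, and so forth. In this manner the transformation $y = gx$ is linear, with the $n^2 \times n^2$ matrix

$$\begin{pmatrix} g & 0 & \cdots & 0 \\ 0 & g & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & g \end{pmatrix},$$

where each $0$ indicates a null matrix of dimension $n \times n$. Hence the Jacobian of the transformation $y = gx$ is $(\det g)^n$, that is, $dy = |\det g|^n\,dx$. Let

$$d\lambda(x) = \frac{dx}{|\det x|^n}.$$

Then

$$d\lambda(gy) = \frac{|\det g|^n\,dy}{|\det g|^n\,|\det y|^n} = d\lambda(y).$$

Hence $\lambda$ is an invariant measure on $G_l(n)$ under $G_l(n)$. It is also unique up to a positive multiplicative constant.

Example 2.1.4. Let $G$ be the group of $n \times n$ nonsingular lower triangular matrices with positive diagonal elements and let $G$ operate from the left on itself. We will verify that the Jacobian of the transformation $g \to hg$, $h \in G$, is $\prod_{i=1}^{n} (h_{ii})^i$. Thus an invariant measure on $G$ under $G$ is

$$d\lambda(g) = \frac{\prod_{i \ge j} dg_{ij}}{\prod_{i=1}^{n} (g_{ii})^i}.$$

To show that the Jacobian is $\prod_{i=1}^{n} (h_{ii})^i$, let $g = (g_{ij})$, $h = (h_{ij})$ with $g_{ij} = h_{ij} = 0$ for $i < j$, and let $C = hg = (C_{ij})$. Then

$$C_{ij} = \sum_{k=j}^{i} h_{ik}g_{kj}, \qquad i \ge j.$$

This defines a transformation $(g_{ij}) \to (C_{ij})$. We can write the Jacobian of this transformation as the determinant of the matrix

$$\frac{\partial(C_{11}, C_{21}, \ldots, C_{n1}, C_{22}, \ldots, C_{nn})}{\partial(g_{11}, g_{21}, \ldots, g_{n1}, g_{22}, \ldots, g_{nn})},$$

which is lower triangular with diagonal elements $\partial C_{ij}/\partial g_{ij} = h_{ii}$, $i \ge j$. In other words, $h_{11}$ will appear once as a diagonal element, $h_{22}$ will appear twice as a diagonal element and finally $h_{nn}$ will appear $n$ times as a diagonal element. Hence the Jacobian is $\prod_{i=1}^{n} (h_{ii})^i$.

Example 2.1.5. Let $\mathcal{S}$ be the space of all positive definite symmetric matrices $S$ of order $n$ and let $G_l(n)$ operate on $\mathcal{S}$ by

$$S \to gSg', \qquad g \in G_l(n). \tag{2.6}$$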
To show that the Jacobian of this transformation is (det g) , matrix obtained from the nxn the j t h row; M^C)
be the matrix obtained from the n x n identity matrix by identity matrix by adding the j t h row to the i t h . We and A y . T h i s can be easily is
2
multiplying the ith row by any nonzero constant C and let A y be the matrix obtained from the nxn need only show this for the matrices Eij,Mi(C)
verified by the reader; for example, d e t ( M , ( C ) ) = C and Mi{C)SMi(C) obtained from S = ( 5 y ) by multiplying Sa by C that the Jacobian is C ^ " "
1
and S y for i ^ j by C so
= C"
+ 1
(detS)(W
so that X is an invariant measure on S under the transformation (2.6). Here also X is unique upto a positive multiplicative constant. T h e group G;(p) acts transitively on S.
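The three Jacobians of Examples 2.1.3-2.1.5 can be checked numerically for small matrices. The following is a minimal sketch in Python (standard library only); the helper names and the particular integer matrices are our own illustrative choices, not part of the text. Because each transformation is linear, its Jacobian is the determinant of the matrix whose columns are the images of the unit coordinate vectors, and exact rational arithmetic avoids rounding.

```python
from fractions import Fraction


def det(m):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in m]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def linear_map_det(f, coords):
    """Determinant of a linear map f on the listed coordinates.

    f maps a dict {coord: value} to a dict {coord: value}; column j of the
    matrix of f is f(e_j), the image of the j-th unit vector.
    """
    columns = [f({k: int(k == c) for k in coords}) for c in coords]
    size = len(coords)
    return det([[columns[j][coords[i]] for j in range(size)] for i in range(size)])


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def jac_left_mult_full(g):
    """Example 2.1.3: x -> gx on all n x n matrices (expected (det g)^n)."""
    n = len(g)
    coords = [(i, j) for i in range(n) for j in range(n)]
    def f(vals):
        y = matmul(g, [[vals[(i, j)] for j in range(n)] for i in range(n)])
        return {(i, j): y[i][j] for (i, j) in coords}
    return linear_map_det(f, coords)


def jac_left_mult_lower(h):
    """Example 2.1.4: g -> hg on lower triangular g (expected prod h_ii^i)."""
    n = len(h)
    coords = [(i, j) for i in range(n) for j in range(n) if i >= j]
    def f(vals):
        c = matmul(h, [[vals.get((i, j), 0) for j in range(n)] for i in range(n)])
        return {(i, j): c[i][j] for (i, j) in coords}
    return linear_map_det(f, coords)


def jac_congruence(g):
    """Example 2.1.5: s -> g s g' on symmetric s (expected (det g)^(p+1))."""
    p = len(g)
    coords = [(i, j) for i in range(p) for j in range(p) if i <= j]
    def f(vals):
        s = [[vals[(min(i, j), max(i, j))] for j in range(p)] for i in range(p)]
        t = matmul(matmul(g, s), transpose(g))
        return {(i, j): t[i][j] for (i, j) in coords}
    return linear_map_det(f, coords)


if __name__ == "__main__":
    g = [[2, 1, 0], [1, 3, 1], [0, 1, 1]]   # illustrative, det g = 3
    h = [[2, 0, 0], [1, 3, 0], [4, 5, 7]]   # illustrative lower triangular
    print(jac_left_mult_full(g), det(g) ** 3)
    print(jac_left_mult_lower(h), 2 * 3 ** 2 * 7 ** 3)
    print(jac_congruence(g), det(g) ** 4)
```

For these $3 \times 3$ matrices the three lines compare $27 = 3^3$, $2 \cdot 3^2 \cdot 7^3$ and $81 = 3^4$ with the computed determinants, in line with Examples 2.1.3-2.1.5.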
Example 2.1.6. Let $H$ be a subgroup of $G$. The group $G$ acts transitively on the space $\mathcal{X} = G/H$ of left cosets $gH$ of $H$, where the group action is $g_1(gH) = (g_1 g)H$ for $g_1 \in G$, $gH \in G/H$. Conversely, when the group $G$ acts transitively on $\mathcal{X}$, there is a one-to-one correspondence between $\mathcal{X}$ and the quotient space $G/H$, where $H = \{g \in G : g x_0 = x_0\}$ is the isotropy subgroup of any fixed point $x_0 \in \mathcal{X}$.
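2.1.1. Transformation of Variable in Abstract Integral

Theorem 2.1.1. Let $(\mathcal{X}, \mathcal{A}, \mu)$ be a measure space, let $(\mathcal{Y}, \mathcal{C})$ be a measurable space and let $\phi$ be a measurable mapping of $\mathcal{X}$ into $\mathcal{Y}$. Define the measure $\nu$ on $(\mathcal{Y}, \mathcal{C})$ by
$$\nu(C) = \mu(\phi^{-1}(C)), \qquad C \in \mathcal{C}. \qquad (2.7)$$
Then for any real-valued measurable function $g$ on $\mathcal{Y}$, with $f(x) = g(\phi(x))$ on $\mathcal{X}$,
$$\int_{\mathcal{X}} f(x)\, d\mu(x) = \int_{\mathcal{X}} g(\phi(x))\, d\mu(x) = \int_{\mathcal{Y}} g(y)\, d\nu(y), \qquad (2.8)$$
in the sense that if either integral exists, so does the other and the two are equal.

For a proof of this theorem, the reader is referred to Lehmann [(1959), p. 38].

Example 2.1.7 (Wishart distribution). Let $X_1, \ldots, X_n$ be $n$ independent and identically distributed normal $p$-vectors with mean $0$ and covariance matrix $\Sigma$ (positive definite). Assume $n > p$ so that $S = \sum_{i=1}^{n} X_i X_i'$ is positive definite with probability one. The joint probability density function of $X_1, \ldots, X_n$ is given by
$$(2\pi)^{-np/2} (\det \Sigma)^{-n/2} \exp\Big\{-\tfrac12 \operatorname{tr} \Sigma^{-1} \sum_{i=1}^{n} x_i x_i'\Big\}. \qquad (2.9)$$
Using Theorem 2.1.1, we get for any measurable set $A$ in the space of $S$
$$P(S \in A) = (2\pi)^{-np/2} \int_{\sum x_i x_i' \in A} (\det \Sigma)^{-n/2} \exp\Big\{-\tfrac12 \operatorname{tr} \Sigma^{-1} \sum_{i=1}^{n} x_i x_i'\Big\} \prod_{i=1}^{n} dx_i = (2\pi)^{-np/2} \int_{A} (\det \Sigma)^{-n/2} \exp\Big\{-\tfrac12 \operatorname{tr} \Sigma^{-1} s\Big\}\, dm(s), \qquad (2.10)$$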
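where $m$ is the measure corresponding to the measure $\nu$ of Theorem 2.1.1. Let us define the measure $m'$ by
$$dm'(s) = \frac{dm(s)}{(\det s)^{n/2}}, \qquad (2.11)$$
so that (2.10) can be written as
$$P(S \in A) = (2\pi)^{-np/2} \int_A \big(\det(\Sigma^{-1} s)\big)^{n/2} \exp\Big\{-\tfrac12 \operatorname{tr} \Sigma^{-1} s\Big\}\, dm'(s).$$
To find the distribution of $S$, which is popularly called the Wishart distribution $W(\Sigma, n)$ with parameter $\Sigma$ and $n$ degrees of freedom, it is sufficient to find $dm'$. To do this let us first observe the following:

(i) Since $\Sigma$ is positive definite symmetric, there exists $a \in G_l(p)$ such that $\Sigma = aa'$.

(ii) Let $X_i^* = a^{-1} X_i$, $i = 1, \ldots, n$. As $X_1^*, \ldots, X_n^*$ are independent normal $p$-vectors with mean $0$ and covariance matrix $I$, and $S = a\big(\sum_i X_i^* X_i^{*\prime}\big) a'$, the distribution of $S$ under $\Sigma = aa'$ is the distribution of $aSa'$ under $\Sigma = I$.

Hence
$$P_{aa'}(S \in A) = (2\pi)^{-np/2} \int_{s \in A} \big(\det((aa')^{-1} s)\big)^{n/2} \exp\Big\{-\tfrac12 \operatorname{tr} (aa')^{-1} s\Big\}\, dm'(s),$$
$$P_I(aSa' \in A) = (2\pi)^{-np/2} \int_{asa' \in A} (\det s)^{n/2} \exp\Big\{-\tfrac12 \operatorname{tr} s\Big\}\, dm'(s) = (2\pi)^{-np/2} \int_{s \in A} \big(\det((aa')^{-1} s)\big)^{n/2} \exp\Big\{-\tfrac12 \operatorname{tr} (aa')^{-1} s\Big\}\, dm'(a^{-1} s a'^{-1}). \qquad (2.12)$$
Since these two probabilities are equal for every measurable $A$, $dm'(s) = dm'(a^{-1} s a'^{-1})$ for all $a \in G_l(p)$; that is, $m'$ is invariant under the transformation (2.6). By Example 2.1.5 and the uniqueness of the invariant measure,
$$dm'(s) = \text{const} \cdot (\det s)^{-(p+1)/2}\, ds.$$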
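Therefore the probability density function of the Wishart distribution $W(\Sigma, n)$ is
$$W(\Sigma, n) = C (\det \Sigma)^{-n/2} (\det s)^{(n-p-1)/2} \exp\Big\{-\tfrac12 \operatorname{tr} \Sigma^{-1} s\Big\}, \qquad (2.13)$$
where $C$ is a universal constant not depending on $\Sigma$. To evaluate $C$ let us first observe that $C$ is a function of $n$ and $p$. Let us write it as $C = C_{n,p}$. Since
$$\int C_{n,p} (\det s)^{(n-p-1)/2} \exp\Big\{-\tfrac12 \operatorname{tr} s\Big\}\, ds = 1 \quad \text{for all } p, n,$$
we partition
$$s = \begin{pmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{pmatrix},$$
where $s_{11}$ is $1 \times 1$, and make the transformation $s_{11} = t_{11}$, $s_{12} = t_{12}$, $s_{22} - s_{21} s_{11}^{-1} s_{12} = t_{22} = z$ (say), which has a Jacobian equal to unity. Using $\det s = s_{11} \det z$ and $\operatorname{tr} s = s_{11} + s_{12} s_{12}' / s_{11} + \operatorname{tr} z$, we get
$$1 = C_{n,p} \int_0^\infty s_{11}^{(n-p-1)/2} e^{-s_{11}/2} \Big[\int \exp\Big\{-\frac{s_{12} s_{12}'}{2 s_{11}}\Big\}\, ds_{12}\Big] \Big[\int (\det z)^{(n-p-1)/2} \exp\Big\{-\tfrac12 \operatorname{tr} z\Big\}\, dz\Big]\, ds_{11}.$$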
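The integral over $s_{12}$ equals $(2\pi s_{11})^{(p-1)/2}$, and since $(n-p-1)/2 = ((n-1)-(p-1)-1)/2$, the integral over the $(p-1) \times (p-1)$ matrix $z$ equals $C_{n-1,p-1}^{-1}$. Hence
$$C_{n,p}^{-1} = (2\pi)^{(p-1)/2}\, C_{n-1,p-1}^{-1} \int_0^\infty s_{11}^{(n-2)/2} e^{-s_{11}/2}\, ds_{11} = (2\pi)^{(p-1)/2}\, 2^{n/2}\, \Gamma\Big(\frac{n}{2}\Big)\, C_{n-1,p-1}^{-1}.$$
Since $C_{n,1}^{-1} = 2^{n/2} \Gamma(n/2)$, we have
$$C_{n,p}^{-1} = 2^{np/2}\, \pi^{p(p-1)/4} \prod_{i=1}^{p} \Gamma\Big(\frac{n-i+1}{2}\Big).$$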
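The recursion for $C_{n,p}$ and its closed form can be cross-checked numerically. The sketch below (Python, standard library only; the function names are ours) works on the log scale with `math.lgamma` to avoid overflow, and evaluates $\log C_{n,p}^{-1}$ both ways.

```python
import math


def log_inv_c_closed(n, p):
    """log of C_{n,p}^{-1} = 2^{np/2} pi^{p(p-1)/4} prod_{i=1}^{p} Gamma((n-i+1)/2)."""
    return ((n * p / 2) * math.log(2)
            + (p * (p - 1) / 4) * math.log(math.pi)
            + sum(math.lgamma((n - i + 1) / 2) for i in range(1, p + 1)))


def log_inv_c_recursive(n, p):
    """Same quantity from C_{n,p}^{-1} = (2 pi)^{(p-1)/2} 2^{n/2} Gamma(n/2) C_{n-1,p-1}^{-1}."""
    if p == 1:
        # chi-square case: C_{n,1}^{-1} = 2^{n/2} Gamma(n/2)
        return (n / 2) * math.log(2) + math.lgamma(n / 2)
    return (((p - 1) / 2) * math.log(2 * math.pi)
            + (n / 2) * math.log(2) + math.lgamma(n / 2)
            + log_inv_c_recursive(n - 1, p - 1))


if __name__ == "__main__":
    for n, p in [(5, 3), (10, 4), (20, 6)]:
        print(n, p, log_inv_c_closed(n, p), log_inv_c_recursive(n, p))
```

For $p = 1$, $n = 2$ both give $\log 2$, the normalizing constant of the $\chi^2_2$ density $\tfrac12 e^{-s/2}$.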
2.2. Invariance of Testing Problems

We have seen in the previous section that for the invariance of the statistical distribution, the induced transformation $\bar g$ of $g \in G$ must satisfy $\bar g \theta \in \Omega$ for $\theta \in \Omega$.

Definition 2.2.1 (Invariance of parametric space). The parametric space $\Omega$ remains invariant under a group of transformations $G: \mathcal{X} \to \mathcal{X}$ if

(i) for $g \in G$, the induced mapping $\bar g$ on $\Omega$ satisfies $\bar g \theta \in \Omega$ for all $\theta \in \Omega$;

(ii) for any $\theta' \in \Omega$ there exists $\theta \in \Omega$ such that $\bar g \theta = \theta'$.

An equivalent way of writing (i) and (ii) is
$$\bar g\, \Omega = \Omega. \qquad (2.14)$$
It may be remarked that if $P_\theta \neq P_{\theta'}$ for $\theta \neq \theta'$, then $\bar g$ is one-to-one. The following theorem asserts that the set of all induced transformations $\bar g$, $g \in G$, also forms a group.
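Theorem 2.2.1. Let $g_1, g_2 \in G$ be two transformations. Then the induced transformations $\overline{g_2 g_1}$ and $\overline{g_1^{-1}}$ satisfy
$$\overline{g_2 g_1} = \bar g_2 \bar g_1 \qquad \text{and} \qquad \overline{g_1^{-1}} = (\bar g_1)^{-1}.$$

Proof. We know that if the distribution of $X$ is $P_\theta$, $\theta \in \Omega$, the distribution of $g_1 X$ is $P_{\bar g_1 \theta}$, $\bar g_1 \theta \in \Omega$. Hence the distribution of $g_2 g_1(X)$ is $P_{\bar g_2 \bar g_1 \theta}$, $\bar g_2 \bar g_1 \theta \in \Omega$, and it is also $P_{\overline{g_2 g_1} \theta}$, $\overline{g_2 g_1} \theta \in \Omega$. Since distinct parameter values give distinct distributions, $\overline{g_2 g_1} \theta = \bar g_2 \bar g_1 \theta$ for all $\theta$, and obviously $\overline{g_2 g_1} = \bar g_2 \bar g_1$. The reader may find it instructive to verify the other assertion similarly.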
Let us now consider the problem of testing the hypothesis $H_0: \theta \in \Omega_{H_0}$ against the alternatives $H_1: \theta \in \Omega_{H_1}$, where $\Omega_{H_0}$ and $\Omega_{H_1}$ are disjoint subsets of $\Omega$. Let $G$ be a group of transformations which operate from the left on $\mathcal{X}$ satisfying conditions (i) and (ii) above and (2.14). In what follows we will assume that each $g \in G$ is measurable, which ensures that whenever $X$ is a random variable, $gX$ is also a random variable.

The problem of testing $H_0$ against $H_1$ is said to remain invariant under a group of transformations $G$ operating from the left on $\mathcal{X}$ if

(i) for $g \in G$, $A \in \mathcal{A}$, $P_\theta(A) = P_{\bar g \theta}(gA)$;

(ii) $\bar g\, \Omega_{H_0} = \Omega_{H_0}$ and $\bar g\, \Omega_{H_1} = \Omega_{H_1}$ for all $g \in G$.
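2.3. Invariance of Statistical Tests and Maximal Invariant

Let $(\mathcal{X}, \mathcal{A}, \lambda)$ be a measure space. Suppose $p_\theta$, $\theta \in \Omega$, is the probability density function of the random variable $X \in \mathcal{X}$ with respect to the measure $\lambda$. Consider a group $G$ operating on $\mathcal{X}$ from the left and suppose that the measure $\lambda$ is invariant with respect to $G$, that is, $\lambda(gA) = \lambda(A)$ for all $A \in \mathcal{A}$, $g \in G$. Let $(\mathcal{Y}, \mathcal{C})$ be a measurable space and suppose $T(X)$ is a measurable mapping from $\mathcal{X}$ into $\mathcal{Y}$.

Definition 2.3.2 (Maximal invariant). $T(X)$ is a maximal invariant on $\mathcal{X}$ under $G$ if it is invariant, that is, $T(gx) = T(x)$ for all $x \in \mathcal{X}$, $g \in G$, and if $T(x) = T(y)$ implies $y = gx$ for some $g \in G$.

If the testing problem remains invariant under the group of transformations $G$, it is natural to restrict attention to statistical tests which are also invariant under $G$. Let $\phi(X)$, $X \in \mathcal{X}$, be a statistical test for testing $H_0$ against $H_1$, that is, $\phi(x)$ is the probability of rejecting $H_0$ when $x$ is observed. If $\phi$ is invariant under $G$, then $\phi$ must satisfy
$$\phi(x) = \phi(gx), \qquad x \in \mathcal{X},\ g \in G. \qquad (2.16)$$
In terms of the maximal invariant $T$, $\phi$ is invariant if and only if $\phi(x) = k(T(x))$ for some function $k$: if $T(x) = T(y)$, there exists a $g \in G$ such that $y = gx$, and therefore $\phi(x) = \phi(y)$. In general, if $\phi$ is invariant and measurable, then $\phi(x) = k(T(x))$, but $k$ may not be a Borel measurable function. However, if the range of the maximal invariant $T(X)$ is Euclidean and $T$ is Borel measurable, then $\mathcal{A}_1 = \{A : A \in \mathcal{A} \text{ and } gA = A \text{ for all } g \in G\}$ is the smallest $\sigma$-field with respect to which $T$ is measurable, and $\phi$ is invariant if and only if $\phi(x) = h(T(x))$, where $h$ is a Borel measurable function. This result is essentially due to Blackwell (1956).

Let $\bar G$ be the group of induced transformations on $\Omega$ (induced by the group $G$). Let $\nu(\theta)$, $\theta \in \Omega$, be a maximal invariant on $\Omega$ under $\bar G$.

Theorem 2.3.2. The distribution of $T(X)$ (or of any function of $T(X)$) depends on $\theta$ only through $\nu(\theta)$.

Proof. Suppose $\nu(\theta_1) = \nu(\theta_2)$, $\theta_1, \theta_2 \in \Omega$. Then there exists a $g \in G$ such that $\theta_2 = \bar g \theta_1$. Hence for $C \in \mathcal{C}$
$$P_{\theta_1}(T(X) \in C) = P_{\theta_1}(T(gX) \in C) = P_{\bar g \theta_1}(T(X) \in C) = P_{\theta_2}(T(X) \in C).$$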
2.4. Some Examples of Maximal Invariants

Example 2.4.1. Let $X = (X_1, \ldots, X_n)'$ and let $G$ be the group of translations
$$g_c(X) = (X_1 + c, \ldots, X_n + c)', \qquad -\infty < c < \infty.$$
A maximal invariant on $\mathcal{X}$ under $G$ is $T(X) = (X_1 - X_n, \ldots, X_{n-1} - X_n)'$. Obviously, it is invariant. Suppose $x = (x_1, \ldots, x_n)'$, $y = (y_1, \ldots, y_n)'$ and let $T(x) = T(y)$. Writing $c = y_n - x_n$ we get $y_i = x_i + c$ for $i = 1, \ldots, n$. Another maximal invariant is $(X_1 - \bar X, \ldots, X_n - \bar X)'$, where $\bar X = \frac{1}{n} \sum_{j=1}^{n} X_j$. It may be further observed that if $n = 1$ there is no nontrivial maximal invariant: the whole space forms a single orbit, in the sense that given any two points in the space there exists $g_c \in G$ transforming one point into the other. In other words $G$ acts transitively on $\mathcal{X}$.

Example 2.4.2. Let $\mathcal{X}$ be a Euclidean $n$-space, and let $G$ be the group of scale changes, that is, $g_a \in G$,
$$g_a(X) = aX, \qquad a > 0.$$
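Example 2.4.1 can be illustrated with a few lines of code. The sketch below (Python; all function names are hypothetical) computes $T(x) = (x_1 - x_n, \ldots, x_{n-1} - x_n)$, checks that it is constant on an orbit, and, when $T(x) = T(y)$, recovers the translation $c = y_n - x_n$ carrying $x$ to $y$, which is exactly the maximality argument in the text.

```python
def t_max_inv(x):
    """Maximal invariant of Example 2.4.1: (x_1 - x_n, ..., x_{n-1} - x_n)."""
    return tuple(xi - x[-1] for xi in x[:-1])


def translate(x, c):
    """The group element g_c: x -> (x_1 + c, ..., x_n + c)."""
    return tuple(xi + c for xi in x)


def connecting_shift(x, y):
    """If T(x) == T(y), return c = y_n - x_n with y = g_c(x); otherwise None."""
    if t_max_inv(x) != t_max_inv(y):
        return None  # x and y lie on different orbits
    return y[-1] - x[-1]


if __name__ == "__main__":
    x = (1, 4, -2, 7)
    y = translate(x, 5)
    print(t_max_inv(x), t_max_inv(y), connecting_shift(x, y))
```

For $n = 1$ the function returns the empty tuple for every point, a constant: the whole line is a single orbit.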
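Example 2.4.3. Let $\mathcal{X} = \mathcal{X}_1 \times \mathcal{X}_2$, where $\mathcal{X}_1$ is the $p$-dimensional Euclidean space and $\mathcal{X}_2$ is the set of all $p \times p$ symmetric positive definite matrices. Let $X \in \mathcal{X}_1$, $S \in \mathcal{X}_2$. Write
$$X = (X_{(1)}', \ldots, X_{(k)}')', \qquad S = (S_{(ij)}),$$
where $X_{(i)}$ is $d_i \times 1$ and $S_{(ij)}$ is $d_i \times d_j$, such that $\sum_{i=1}^{k} d_i = p$. Let $G$ be the group of all nonsingular $p \times p$ lower triangular matrices in the block form $g \in G$,
$$g = \begin{pmatrix} g_{(11)} & 0 & \cdots & 0 \\ g_{(21)} & g_{(22)} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ g_{(k1)} & g_{(k2)} & \cdots & g_{(kk)} \end{pmatrix},$$
operating as $(X, S) \to (gX, gSg')$. Let $X_{[i]} = (X_{(1)}', \ldots, X_{(i)}')'$ and let $S_{[ii]}$ be the upper left $(d_1 + \cdots + d_i) \times (d_1 + \cdots + d_i)$ submatrix of $S$. Define $(R_1^*, \ldots, R_k^*)$ by
$$\sum_{j=1}^{i} R_j^* = X_{[i]}' S_{[ii]}^{-1} X_{[i]}, \qquad i = 1, \ldots, k. \qquad (2.17)$$
Obviously, $R_i^* \geq 0$. Let us show that $(R_1^*, \ldots, R_k^*)$ is a maximal invariant on $\mathcal{X}$ under $G$. We shall prove it for the case $k = 2$. The general case can be proved in a similar fashion. Let us first observe the following:

(i) If $X \to gX$, $S \to gSg'$, then
$$X_{(1)} \to g_{(11)} X_{(1)}, \qquad S_{(11)} \to g_{(11)} S_{(11)} g_{(11)}'.$$

(ii)
$$X' S^{-1} X = X_{(1)}' S_{(11)}^{-1} X_{(1)} + (X_{(2)} - S_{(21)} S_{(11)}^{-1} X_{(1)})' (S_{(22)} - S_{(21)} S_{(11)}^{-1} S_{(12)})^{-1} (X_{(2)} - S_{(21)} S_{(11)}^{-1} X_{(1)}). \qquad (2.18)$$

By (i), $R_1^*$ is invariant; $R_1^* + R_2^* = X' S^{-1} X$ is invariant under every nonsingular $g$; and by (2.18), $R_2^* \geq 0$. To prove maximality, let $(Y, T) \in \mathcal{X}$, where $Y, T$ are similarly partitioned as $X, S$, and suppose
$$X_{(1)}' S_{(11)}^{-1} X_{(1)} = Y_{(1)}' T_{(11)}^{-1} Y_{(1)}, \qquad X' S^{-1} X = Y' T^{-1} Y.$$
We must show that there exists a $g \in G$ such that $X = gY$, $S = gTg'$. Let us first choose $g, h \in G$ such that $gSg' = I$ and $hTh' = I$. Since $X_{(1)}' S_{(11)}^{-1} X_{(1)} = Y_{(1)}' T_{(11)}^{-1} Y_{(1)}$ implies
$$(g_{(11)} X_{(1)})'(g_{(11)} X_{(1)}) = (h_{(11)} Y_{(1)})'(h_{(11)} Y_{(1)}),$$
we conclude that there exists a $d_1 \times d_1$ orthogonal matrix $O_1$ such that $g_{(11)} X_{(1)} = O_1 h_{(11)} Y_{(1)}$. Since $X' S^{-1} X = Y' T^{-1} Y$, that is, $\|gX\|^2 = \|hY\|^2$, we also have
$$\|g_{(21)} X_{(1)} + g_{(22)} X_{(2)}\|^2 = \|h_{(21)} Y_{(1)} + h_{(22)} Y_{(2)}\|^2,$$
so that there exists an orthogonal matrix $O_2$ of order $d_2 \times d_2$ such that $g_{(21)} X_{(1)} + g_{(22)} X_{(2)} = O_2 (h_{(21)} Y_{(1)} + h_{(22)} Y_{(2)})$. Let
$$O = \begin{pmatrix} O_1 & 0 \\ 0 & O_2 \end{pmatrix}, \qquad a = g^{-1} O h \in G.$$
Then $gX = OhY$ and $gSg' = I = OhTh'O'$, so that
$$X = aY, \qquad S = aTa'.$$
Hence $(R_1^*, R_2^*)$ is a maximal invariant on $\mathcal{X}$ under $G$.

Note that if $G$ is the group of $p \times p$ nonsingular upper triangular matrices in the block form, that is, if $g \in G$,
$$g = \begin{pmatrix} g_{(11)} & g_{(12)} & \cdots & g_{(1k)} \\ 0 & g_{(22)} & \cdots & g_{(2k)} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & g_{(kk)} \end{pmatrix},$$
then a maximal invariant is $(\tilde R_1, \ldots, \tilde R_k)$ defined by
$$\sum_{j=i}^{k} \tilde R_j = (X_{(i)}', \ldots, X_{(k)}') \big(S^{[ii]}\big)^{-1} (X_{(i)}', \ldots, X_{(k)}')', \qquad i = 1, \ldots, k,$$
where $S^{[ii]}$ is the lower right submatrix of $S$ corresponding to $(X_{(i)}', \ldots, X_{(k)}')'$.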
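The invariance half of Example 2.4.3 can be checked exactly on a small case. This is a sketch in Python with exact rational arithmetic; the dimensions $d = (1, 2)$, the matrices and the helper names are our own illustrative assumptions. It computes $(R_1^*, R_2^*)$ from (2.17), applies a block lower triangular $g$ and, for contrast, a matrix outside $G$.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a z = b exactly by Gauss-Jordan elimination (a nonsingular)."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def quad_form(x, s):
    """x' s^{-1} x."""
    return sum(xi * zi for xi, zi in zip(x, solve(s, x)))


def r_star(x, s, dims):
    """(R*_1, ..., R*_k) of (2.17): partial sums equal X_[i]' S_[ii]^{-1} X_[i]."""
    out, prev, m = [], Fraction(0), 0
    for d in dims:
        m += d
        cur = quad_form(x[:m], [row[:m] for row in s[:m]])
        out.append(cur - prev)
        prev = cur
    return out


def transform(g, x, s):
    """(X, S) -> (gX, g S g')."""
    p = len(g)
    gx = [sum(g[i][k] * x[k] for k in range(p)) for i in range(p)]
    gs = [[sum(g[i][k] * s[k][j] for k in range(p)) for j in range(p)] for i in range(p)]
    gsg = [[sum(gs[i][k] * g[j][k] for k in range(p)) for j in range(p)] for i in range(p)]
    return gx, gsg


if __name__ == "__main__":
    x, s, dims = [1, 2, -1], [[4, 1, 0], [1, 3, 1], [0, 1, 2]], (1, 2)
    g_block = [[2, 0, 0], [1, 1, 2], [3, -1, 1]]  # in G: zero upper-right block
    g_bad = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]     # not block lower triangular
    print(r_star(x, s, dims))
    print(r_star(*transform(g_block, x, s), dims))
    print(r_star(*transform(g_bad, x, s), dims))
```

Under `g_block` the pair $(1/4, 11/4)$ is unchanged; under `g_bad` only the total $R_1^* + R_2^* = X'S^{-1}X = 3$ survives, which is why the block triangular structure of $G$ matters.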
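Let $p(x)$ be a probability density function with respect to $\lambda$. Consider a group $G$ operating from the left on $\mathcal{X}$ and suppose that $\lambda$ is a left invariant measure, $\lambda(A) = \lambda(gA)$, $g \in G$, $A \in \mathcal{A}$. Let $(\mathcal{Y}, \mathcal{C})$ be a measurable space and suppose that $T(X)$ is a measurable maximal invariant on $\mathcal{X}$ under $G$ with values in $\mathcal{Y}$. Let $\lambda^*$ be the measure on $(\mathcal{Y}, \mathcal{C})$ induced by $\lambda$, such that
$$\lambda^*(C) = \lambda(T^{-1}(C)), \qquad C \in \mathcal{C},$$
and let $P^*$, the distribution of $T(X)$, be defined by
$$P^*(C) = \int_{T^{-1}(C)} p(x)\, d\lambda(x), \qquad C \in \mathcal{C},$$
so that
$$P^*(C) = \int_C p^*(T)\, d\lambda^*(T) = \int_{T^{-1}(C)} p(x)\, d\lambda(x), \qquad C \in \mathcal{C}.$$
Then, when $G$ is compact and $\mu$ is its invariant probability measure, the probability density function $p^*$ of the maximal invariant $T$ with respect to $\lambda^*$ is given by
$$p^*(T) = \int_G p(gx)\, d\mu(g), \qquad T = T(x). \qquad (2.19)$$
To see this, let $f$ be any bounded measurable function on $\mathcal{Y}$. Then
$$\int f(T)\, p^*(T)\, d\lambda^*(T) = \int f(T(x))\, p(x)\, d\lambda(x).$$
However, $\int_G p(gx)\, d\mu(g)$ is an invariant function of $x$, hence a function of $T(x)$, and by Theorem 2.1.1 and the invariance of $T$ and of $\lambda$,
$$\int f(T(x)) \int_G p(gx)\, d\mu(g)\, d\lambda(x) = \int_G \int f(T(gx))\, p(gx)\, d\lambda(x)\, d\mu(g) = \int_G \int f(T(x))\, p(x)\, d\lambda(x)\, d\mu(g) = \int f(T(x))\, p(x)\, d\lambda(x).$$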
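Formula (2.19) can be seen at work in the simplest compact case. In the sketch below (Python, standard library only) we assume, purely for illustration, $\mathcal{X} = \mathbb{R}$ with Lebesgue measure, $G = \{+1, -1\}$ acting by $x \to gx$ with the uniform invariant probability $\mu$, and $X \sim N(\theta, 1)$. The maximal invariant is $T(x) = |x|$, $\lambda^*$ is twice Lebesgue measure on $[0, \infty)$, and (2.19) gives $p^*(t) = \frac12 (p(t) + p(-t))$. The code compares the resulting distribution of $|X|$ with the exact one.

```python
import math


def normal_pdf(x, theta):
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2 * math.pi)


def normal_cdf(x, theta):
    return 0.5 * (1 + math.erf((x - theta) / math.sqrt(2)))


def p_star(t, theta):
    """(2.19) for G = {+1, -1}: average of p(g t) over g under the uniform measure."""
    return 0.5 * (normal_pdf(t, theta) + normal_pdf(-t, theta))


def density_abs_x(t, theta):
    """Density of T = |X| w.r.t. Lebesgue measure, since d(lambda*) = 2 dt on [0, inf)."""
    return 2 * p_star(t, theta)


def simpson(f, a, b, m=2000):
    """Composite Simpson rule with an even number m of subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


if __name__ == "__main__":
    theta, t = 1.3, 2.0
    print(simpson(lambda u: density_abs_x(u, theta), 0.0, t))
    print(normal_cdf(t, theta) - normal_cdf(-t, theta))  # exact P(|X| <= t)
```

Note also that `p_star(t, theta)` equals `p_star(t, -theta)`: the density of the maximal invariant depends on $\theta$ only through $|\theta|$, the maximal invariant of the induced group, as Theorem 2.3.2 states.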
2.5.1. Existence of an Invariant Probability Measure on $O(p)$

Let $X = (X_1, \ldots, X_p)$ be a $p \times p$ random matrix where the column vectors $X_i$ ($p$-vectors) are independent and identically distributed normal $p$-vectors with mean vector $0$ and covariance matrix $I$ (identity). Apply the Gram-Schmidt orthogonalization process to the columns of $X$ to obtain the random orthogonal matrix $Y = (Y_1, \ldots, Y_p)$, where
$$Y_1 = \frac{X_1}{\|X_1\|}, \qquad Y_i = \frac{X_i - \sum_{j<i} (Y_j' X_i) Y_j}{\big\| X_i - \sum_{j<i} (Y_j' X_i) Y_j \big\|}, \qquad i = 2, \ldots, p.$$
Define $Y = T(X)$. For $g \in O(p)$,
$$T(gX_1, \ldots, gX_p) = g\, T(X_1, \ldots, X_p).$$
Since $X_i^* = gX_i$, $i = 1, \ldots, p$, are independently normally distributed with mean $0$ and covariance matrix $I$, the matrices $(X_1^*, \ldots, X_p^*)$ and $(X_1, \ldots, X_p)$ have the same distribution, and we obtain that $Y$ and $gY$ have the same distribution for each $g \in O(p)$. Thus the probability measure of $Y$ defined by
$$P[Y \in C] = P(X \in T^{-1}(C))$$
is invariant under $O(p)$.

2.6. Applications

Example 2.6.1 (Noncentral Wishart distribution). Let $X = (X_1, \ldots, X_n)$ (each $X_i$ is a $p$-vector) be a $p \times n$ matrix ($n > p$) where each column $X_i$ is normally distributed with mean vector $\mu_i$ and covariance matrix $I$ and is independent of the remaining columns of $X$. The probability density function of $X$ with respect to the $pn$-dimensional Lebesgue measure is
$$p(x, \mu) = (2\pi)^{-np/2} \exp\Big\{-\tfrac12 \operatorname{tr} (x - \mu)(x - \mu)'\Big\},$$
where $\mu = (\mu_1, \ldots, \mu_n)$.
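Let $S = XX'$. $S$ is a maximal invariant under the transformation $X \to X\Gamma$, $\Gamma \in O(n)$, where $O(n)$ is the group of orthogonal matrices of dimension $n \times n$. Let $p(s, \mu)$ be the probability density function of $S$ at the parameter point $\mu$ with respect to the induced measure $d\lambda^*(s)$. By (2.13),
$$p(s, 0)\, d\lambda^*(s) = C (\det s)^{(n-p-1)/2} \exp\Big\{-\tfrac12 \operatorname{tr} s\Big\}\, ds. \qquad (2.20)$$
By (2.19),
$$p(s, 0) = \int_{O(n)} p(x\Gamma, 0)\, d\mu(\Gamma) = (2\pi)^{-np/2} \exp\Big\{-\tfrac12 \operatorname{tr} s\Big\},$$
where $\mu(\Gamma)$ denotes the invariant probability measure on $O(n)$. Hence
$$d\lambda^*(s) = (2\pi)^{np/2}\, C (\det s)^{(n-p-1)/2}\, ds.$$
Again by (2.19),
$$p(s, \mu) = (2\pi)^{-np/2} \exp\Big\{-\tfrac12 \operatorname{tr}(s + \mu\mu')\Big\} \int_{O(n)} \exp\{\operatorname{tr} x\Gamma\mu'\}\, d\mu(\Gamma),$$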
and the integral depends on $x$ only through $s = xx'$. The probability density function of $S$ with respect to the Lebesgue measure $ds$ is given by
$$p(s, \mu)\, d\lambda^*(s) = C (\det s)^{(n-p-1)/2} \exp\Big\{-\tfrac12 \operatorname{tr}(s + \mu\mu')\Big\} \int_{O(n)} \exp\{\operatorname{tr} x\Gamma\mu'\}\, d\mu(\Gamma)\, ds.$$
This is called the probability density function of the noncentral Wishart distribution with noncentrality parameter $\mu\mu'$.

Example 2.6.2 (Noncentral distribution of the characteristic roots). Let $S$ of dimension $p \times p$ be distributed as a central Wishart random variable with parameter $\Sigma$ and degrees of freedom $n > p$. Let $R_1 > R_2 > \cdots > R_p$ be the characteristic roots of $S$ and let $R$ denote a diagonal matrix with diagonal elements $R_1, \ldots, R_p$. Denote by $p(R, \Sigma)$ the probability density function of $R$ at the parameter point $\Sigma$ with respect to the induced measure $d\lambda^*(R)$ of Lemma 2.5.1. We know [see, for example, Giri (1977)]
$$p(R, I)\, d\lambda^*(R) = C_1 \Big(\prod_{i=1}^{p} R_i\Big)^{(n-p-1)/2} \exp\Big\{-\tfrac12 \sum_{i=1}^{p} R_i\Big\} \prod_{i<j} (R_i - R_j)\, dR,$$
where $dR$ stands for the Lebesgue measure and $C_1$ is a universal constant. It may be checked that $R$ is a maximal invariant under the transformation
$$S \to \Gamma S \Gamma', \qquad \Gamma \in O(p).$$
By (2.19),
$$p(R, I) = C \Big(\prod_{i=1}^{p} R_i\Big)^{(n-p-1)/2} \int_{O(p)} \exp\Big\{-\tfrac12 \operatorname{tr} \Gamma R \Gamma'\Big\}\, d\mu(\Gamma) = C \Big(\prod_{i=1}^{p} R_i\Big)^{(n-p-1)/2} \exp\Big\{-\tfrac12 \sum_{i=1}^{p} R_i\Big\}.$$
Again by (2.19),
$$p(R, \Sigma) = C \Big(\prod_{i=1}^{p} \theta_i\Big)^{-n/2} \Big(\prod_{i=1}^{p} R_i\Big)^{(n-p-1)/2} \int_{O(p)} \exp\Big\{-\tfrac12 \operatorname{tr} \theta^{-1} \Gamma R \Gamma'\Big\}\, d\mu(\Gamma),$$
where $\theta$ is a diagonal matrix whose diagonal elements $\theta_1, \ldots, \theta_p$ are the characteristic roots of $\Sigma$. Thus the noncentral probability density function of $R$ depends on $\Sigma$ only through its characteristic roots.

2.7. The Distribution of a Maximal Invariant in the General Case

Invariant tests depend on the observations only through the maximal invariant. To find the optimum test we need to find the explicit form of the maximal invariant statistic and its distribution. For many multivariate testing problems it is not always convenient to find the explicit form of the maximal invariant. Stein (1956) gave the following representation of the ratio of probability densities of a maximal invariant with respect to a group $G$ of transformations $g$ leaving the testing problem invariant.

Theorem 2.7.1. Let $G$ be a group operating from the left on a topological space $(\mathcal{X}, \mathcal{A})$ and let $\lambda$ be a measure on $\mathcal{X}$ which is invariant under $G$ ($G$ is not necessarily transitive on $\mathcal{X}$). Assume that there are two given probability densities $p_1, p_2$ with respect to $\lambda$ such that
$$P_1(A) = \int_A p_1(x)\, d\lambda(x), \qquad P_2(A) = \int_A p_2(x)\, d\lambda(x),$$
for $A \in \mathcal{A}$, where $P_1$ and $P_2$ are absolutely continuous. Suppose $T(X)$ is a maximal invariant under $G$ and $P_i^*$ is the distribution of $T(X)$ under $P_i$, $i = 1, 2$. Then under certain conditions
$$\frac{dP_2^*(T)}{dP_1^*(T)} = \frac{\int_G p_2(gx)\, d\mu(g)}{\int_G p_1(gx)\, d\mu(g)}, \qquad (2.21)$$
where $\mu$ is a left invariant Haar measure on $G$. An alternative form (2.21a) of this representation applies when the densities are taken with respect to a relatively invariant measure with multiplier.

Stein (1956) gave the statement of Theorem 2.7.1 without stating explicitly the conditions under which (2.21) holds. However, this representation was used by Giri (1961, 1964, 1965, 1971) and Schwartz (1967). Schwartz (1967) also gave a set of conditions (rather complicated) which must be satisfied for Stein's representation to be valid. Wijsman (1967, 1969) gave a sufficient condition for (2.21) using the concept of Cartan $G$-space. Koehn (1970) gave a generalization of the results of Wijsman (1967). Bondar (1976) gave some conditions for (2.21) through topological arguments. Anderson (1982) has obtained some results for the validity of (2.21) in terms of the proper action of the group. Wijsman (1985) studied the properness of several groups of transformations used in multivariate testing problems. Subsequently Wijsman (1986) investigated the global cross-section technique for factorization of measures and applied it to the representation of the ratio of densities of a maximal invariant.
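Let $X_1, \ldots, X_N$ be independently and identically distributed $p$-variate normal random variables with mean $\xi = (\xi_1, \ldots, \xi_p)'$ and covariance matrix $\Sigma$ (positive definite). Write
$$\bar X = \frac{1}{N} \sum_{i=1}^{N} X_i, \qquad S = \sum_{i=1}^{N} (X_i - \bar X)(X_i - \bar X)'.$$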
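It is well known that $\bar X$ and $S$ are independent in distribution, $\bar X$ has a $p$-dimensional normal distribution with mean $\xi$ and covariance matrix $(1/N)\Sigma$, and $S$ has a central Wishart distribution with parameter $\Sigma$ and degrees of freedom $n = N - 1$. Throughout this section we will assume that $N > p$, so that $S$ is positive definite almost everywhere. Write, for any $p$-vector $b$ and any $p \times p$ matrix $A$,
$$b = (b_{(1)}', \ldots, b_{(k)}')', \qquad A = \begin{pmatrix} A_{(11)} & \cdots & A_{(1k)} \\ \vdots & \ddots & \vdots \\ A_{(k1)} & \cdots & A_{(kk)} \end{pmatrix},$$
where $b_{(i)}$ is $d_i \times 1$ and $A_{(ij)}$ is $d_i \times d_j$, $\sum_{i=1}^{k} d_i = p$; let $b_{[i]} = (b_{(1)}', \ldots, b_{(i)}')'$ and let $A_{[ii]}$ be the upper left $(d_1 + \cdots + d_i) \times (d_1 + \cdots + d_i)$ submatrix of $A$. Define $R_1, \ldots, R_k$ and $\delta_1, \ldots, \delta_k$ by
$$\sum_{j=1}^{i} R_j = N \bar X_{[i]}' (S_{[ii]} + N \bar X_{[i]} \bar X_{[i]}')^{-1} \bar X_{[i]}, \qquad \sum_{j=1}^{i} \delta_j = N \xi_{[i]}' \Sigma_{[ii]}^{-1} \xi_{[i]}, \qquad i = 1, 2, \ldots, k.$$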
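Theorem 2.8.1. The joint probability density function of $R_1, \ldots, R_k$ is given by (for $R_i > 0$, $\sum_{i=1}^{k} R_i < 1$)
$$f(r_1, \ldots, r_k) = \frac{\Gamma(N/2)}{\Gamma((N-p)/2) \prod_{i=1}^{k} \Gamma(d_i/2)} \Big(1 - \sum_{i=1}^{k} r_i\Big)^{(N-p)/2 - 1} \prod_{i=1}^{k} r_i^{d_i/2 - 1} \exp\Big\{-\frac12 \sum_{i=1}^{k} \delta_i + \frac12 \sum_{i=1}^{k} r_i \sum_{j>i} \delta_j\Big\} \prod_{i=1}^{k} \phi\Big(\frac{N - \sigma_{i-1}}{2}, \frac{d_i}{2}; \frac{r_i \delta_i}{2}\Big), \qquad (2.22)$$
where $\sigma_i = \sum_{j=1}^{i} d_j$, $\sigma_0 = 0$, and $\phi(a, b; x)$ is the confluent hypergeometric series given by
$$\phi(a, b; x) = 1 + \frac{a}{b} x + \frac{a(a+1)}{b(b+1)} \frac{x^2}{2!} + \frac{a(a+1)(a+2)}{b(b+1)(b+2)} \frac{x^3}{3!} + \cdots. \qquad (2.23)$$
Furthermore, the marginal probability density function of $R_1, \ldots, R_j$, $j \le k$, can be obtained from (2.22) by replacing $k$ by $j$ and $p$ by $\sum_{i=1}^{j} d_i$.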
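For $k = 1$ (so $d_1 = p$), (2.22) reduces to a noncentral beta density, which gives a quick numerical sanity check of the normalizing constant and of the series (2.23). The sketch below (Python, standard library only) uses hypothetical helper names and illustrative values $N = 10$, $p = 4$, $\delta = 3$ chosen so that the integrand is smooth on $[0, 1]$; it sums the series term by term and integrates the density with Simpson's rule.

```python
import math


def phi(a, b, x, tol=1e-15, max_terms=10_000):
    """Confluent hypergeometric series (2.23): sum_j (a)_j/(b)_j * x^j/j!."""
    term, total = 1.0, 1.0
    for j in range(max_terms):
        term *= (a + j) / (b + j) * x / (j + 1)  # ratio of consecutive terms
        total += term
        if abs(term) < tol * abs(total):
            break
    return total


def density_k1(r, N, p, delta):
    """(2.22) with k = 1, d_1 = p: the noncentral beta density of R_1."""
    const = math.exp(math.lgamma(N / 2) - math.lgamma((N - p) / 2) - math.lgamma(p / 2))
    return (const * (1 - r) ** ((N - p) / 2 - 1) * r ** (p / 2 - 1)
            * math.exp(-delta / 2) * phi(N / 2, p / 2, r * delta / 2))


def simpson(f, a, b, m=2000):
    """Composite Simpson rule with an even number m of subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


if __name__ == "__main__":
    print(phi(2.5, 2.5, 1.7), math.exp(1.7))  # phi(a, a; x) = e^x
    print(simpson(lambda r: density_k1(r, 10, 4, 3.0), 0.0, 1.0))  # close to 1
```

With $\delta = 0$ the series is identically $1$ and `density_k1` becomes the central beta$(p/2, (N-p)/2)$ density.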
Proof. We will first prove the theorem for the case $k = 2$ and then use this result for the case $k = 3$. The proof for the general case will follow from these cases. For the case $k = 2$ consider the random variables
$$V = S_{(11)}, \qquad W = S_{(22)} - S_{(21)} S_{(11)}^{-1} S_{(12)}.$$
It is well known [see, for example, Stein (1956), or Giri (1977), p. 154] that

(i) $V$ is distributed as Wishart $W(\Sigma_{(11)}, N - 1)$;

(ii) $W$ is distributed as Wishart $W(\Sigma_{(22)} - \Sigma_{(21)} \Sigma_{(11)}^{-1} \Sigma_{(12)}, N - 1 - d_1)$;

(iii) $W$ is independent of $(S_{(11)}, S_{(21)})$.

Define $W_1$ and $W_2$ by
$$W_1 = N \bar X_{(1)}' S_{(11)}^{-1} \bar X_{(1)},$$
$$W_2 (1 + W_1) = N (\bar X_{(2)} - S_{(21)} S_{(11)}^{-1} \bar X_{(1)})' (S_{(22)} - S_{(21)} S_{(11)}^{-1} S_{(12)})^{-1} (\bar X_{(2)} - S_{(21)} S_{(11)}^{-1} \bar X_{(1)}). \qquad (2.24)$$
Given $S_{(11)}$ and $\bar X_{(1)}$, the vector $\sqrt{N}(\bar X_{(2)} - S_{(21)} S_{(11)}^{-1} \bar X_{(1)})$ is normally distributed with mean $\sqrt{N}(\xi_{(2)} - \Sigma_{(21)} \Sigma_{(11)}^{-1} \xi_{(1)})$ and covariance matrix $(\Sigma_{(22)} - \Sigma_{(21)} \Sigma_{(11)}^{-1} \Sigma_{(12)})(1 + W_1)$, and it is independent of $W$. From Giri (1977) it follows that the conditional distribution of $W_2$ given $\bar X_{(1)}$ and $S_{(11)}$ is
$$p(W_2 \mid \bar X_{(1)}, S_{(11)}) = \chi^2_{d_2}\big(\delta_2 (1 + W_1)^{-1}\big) \big/ \chi^2_{N - d_1 - d_2}, \qquad (2.25)$$
where $\chi^2_n(\delta)$ denotes the noncentral chi-square random variable with noncentrality parameter $\delta$ and $n$ degrees of freedom, $\chi^2_n$ denotes the central chi-square random variable with $n$ degrees of freedom, and the numerator and denominator are independent. This distribution depends on $\bar X_{(1)}$ and $S_{(11)}$ only through $W_1$, so that the joint probability density of $W_1, W_2$ is
$$p(W_1, W_2) = p(W_1)\, p(W_2 \mid W_1), \qquad (2.26)$$
and furthermore it is well known that the marginal distribution of $W_1$ is
$$\chi^2_{d_1}(\delta_1) / \chi^2_{N - d_1}. \qquad (2.27)$$
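Hence, expanding the noncentral chi-square distributions in (2.25) and (2.27) as Poisson mixtures of central ones, the joint probability density function of $(W_1, W_2)$ is obtained as a series. (2.28)

By Lemma 2.8.1, $R_1 = W_1 (1 + W_1)^{-1}$ and $R_1 + R_2 = (W_1 + W_2 + W_1 W_2)(1 + W_1 + W_2 + W_1 W_2)^{-1}$, so that
$$W_1 = R_1 (1 - R_1)^{-1}, \qquad W_2 = \{(R_1 + R_2)(1 - R_1 - R_2)^{-1} - R_1 (1 - R_1)^{-1}\}(1 + W_1)^{-1} = R_2 (1 - R_1 - R_2)^{-1}, \qquad (2.29)$$
and the Jacobian of the transformation $(W_1, W_2) \to (R_1, R_2)$ is $(1 - R_1)^{-1} (1 - R_1 - R_2)^{-2}$. Substituting in (2.28), the joint probability density function of $R_1, R_2$ is
$$f(r_1, r_2) = \frac{\Gamma(N/2)}{\Gamma((N-p)/2)\, \Gamma(d_1/2)\, \Gamma(d_2/2)} (1 - r_1 - r_2)^{(N-p)/2 - 1} r_1^{d_1/2 - 1} r_2^{d_2/2 - 1} \exp\Big\{-\frac12 (\delta_1 + \delta_2) + \frac12 r_1 \delta_2\Big\} \phi\Big(\frac{N}{2}, \frac{d_1}{2}; \frac{r_1 \delta_1}{2}\Big) \phi\Big(\frac{N - d_1}{2}, \frac{d_2}{2}; \frac{r_2 \delta_2}{2}\Big), \qquad (2.30)$$
where $p = d_1 + d_2$; this is (2.22) for $k = 2$.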
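For the case $k = 3$, define $W_3$ analogously by
$$W_3 (1 + W_1)(1 + W_2) = N (\bar X_{(3)} - S_{(3[2])} S_{[22]}^{-1} \bar X_{[2]})' (S_{(33)} - S_{(3[2])} S_{[22]}^{-1} S_{([2]3)})^{-1} (\bar X_{(3)} - S_{(3[2])} S_{[22]}^{-1} \bar X_{[2]}),$$
where $S_{(3[2])} = (S_{(31)}, S_{(32)})$ and $S_{([2]3)} = S_{(3[2])}'$. Given $S_{[22]}$ and $\bar X_{[2]}$, the vector $\sqrt{N}(\bar X_{(3)} - S_{(3[2])} S_{[22]}^{-1} \bar X_{[2]})$ is normally distributed with mean $\sqrt{N}(\xi_{(3)} - \Sigma_{(3[2])} \Sigma_{[22]}^{-1} \xi_{[2]})$ and covariance matrix $(\Sigma_{(33)} - \Sigma_{(3[2])} \Sigma_{[22]}^{-1} \Sigma_{([2]3)})(1 + W_1 + W_2 + W_1 W_2)$, and it is independent of $S_{(33)} - S_{(3[2])} S_{[22]}^{-1} S_{([2]3)}$. Hence the conditional distribution of $W_3$ given $W_1, W_2$ is
$$p(W_3 \mid W_1, W_2) = \chi^2_{d_3}\big(\delta_3 (1 + W_1 + W_2 + W_1 W_2)^{-1}\big) \big/ \chi^2_{N - d_1 - d_2 - d_3}, \qquad (2.31)$$
and the joint probability density function of $W_1, W_2, W_3$ is
$$p(W_1, W_2, W_3) = p(W_1, W_2)\, p(W_3 \mid W_1, W_2). \qquad (2.32)$$
As before,
$$W_1 = R_1 (1 - R_1)^{-1}, \qquad W_2 = R_2 (1 - R_1 - R_2)^{-1},$$
$$W_3 = \{(R_1 + R_2 + R_3)(1 - R_1 - R_2 - R_3)^{-1} - (R_1 + R_2)(1 - R_1 - R_2)^{-1}\}(1 - R_1 - R_2) = R_3 (1 - R_1 - R_2 - R_3)^{-1}. \qquad (2.33)$$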
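The Jacobian of the transformation $(W_1, W_2, W_3) \to (R_1, R_2, R_3)$ is $(1 - R_1)^{-1} (1 - R_1 - R_2)^{-1} (1 - R_1 - R_2 - R_3)^{-2}$, and with $p = d_1 + d_2 + d_3$ we obtain
$$f(r_1, r_2, r_3) = \frac{\Gamma(N/2)}{\Gamma((N-p)/2) \prod_{i=1}^{3} \Gamma(d_i/2)} \Big(1 - \sum_{i=1}^{3} r_i\Big)^{(N-p)/2 - 1} \prod_{i=1}^{3} r_i^{d_i/2 - 1} \exp\Big\{-\frac12 \sum_{i=1}^{3} \delta_i + \frac12 \big(r_1 (\delta_2 + \delta_3) + r_2 \delta_3\big)\Big\} \prod_{i=1}^{3} \phi\Big(\frac{N - \sigma_{i-1}}{2}, \frac{d_i}{2}; \frac{r_i \delta_i}{2}\Big). \qquad (2.34)$$
Proceeding exactly in the above fashion we get (2.22) for general $k$. Since the marginal distribution of $\bar X_{[j]}$ is normal with mean $\xi_{[j]}$ and covariance matrix $(1/N)\Sigma_{[jj]}$, it follows that, given the joint probability density function of $R_1, \ldots, R_k$, the marginal distribution of $R_1, \ldots, R_j$, $1 \le j \le k$, can be obtained from (2.22) by replacing $k$ by $j$ (and $p$ by $\sum_{i=1}^{j} d_i$).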
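The following lemma shows that there is a one-to-one correspondence between $(R_1, \ldots, R_k)$ and $(R_1^*, \ldots, R_k^*)$ as defined earlier in this chapter.

Lemma 2.8.1.
$$N \bar X' (S + N \bar X \bar X')^{-1} \bar X = \frac{N \bar X' S^{-1} \bar X}{1 + N \bar X' S^{-1} \bar X}.$$

Proof. Let $(S + N \bar X \bar X')^{-1} = S^{-1} + A$. Hence
$$I = (S + N \bar X \bar X')(S^{-1} + A) = I + N \bar X \bar X' S^{-1} + (S + N \bar X \bar X') A,$$
so we get
$$(S + N \bar X \bar X')^{-1} = S^{-1} - N (S + N \bar X \bar X')^{-1} \bar X \bar X' S^{-1}.$$
Now
$$N \bar X' (S + N \bar X \bar X')^{-1} \bar X = N \bar X' S^{-1} \bar X - N \bar X' (S + N \bar X \bar X')^{-1} \bar X \, (N \bar X' S^{-1} \bar X),$$
and solving for $N \bar X' (S + N \bar X \bar X')^{-1} \bar X$ gives the result.

Note that, applying the lemma to $\bar X_{[i]}$ and $S_{[ii]}$,
$$\sum_{j=1}^{i} R_j = \frac{\sum_{j=1}^{i} R_j^*}{1 + \sum_{j=1}^{i} R_j^*}, \qquad i = 1, \ldots, k,$$
where the $R_j^*$ are defined by (2.17) with $X$ replaced by $\sqrt{N} \bar X$.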
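Lemma 2.8.1 is a rank-one update identity and can be verified exactly on a small example. The sketch below (Python, exact rational arithmetic; the data and the helper names are illustrative assumptions, not from the text) evaluates both sides for a $3 \times 3$ positive definite $S$.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a z = b exactly by Gauss-Jordan elimination (a nonsingular)."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def lemma_2_8_1_sides(xbar, s, N):
    """Return (N xbar'(S + N xbar xbar')^{-1} xbar, q / (1 + q)) with q = N xbar' S^{-1} xbar."""
    p = len(xbar)
    shifted = [[s[i][j] + N * xbar[i] * xbar[j] for j in range(p)] for i in range(p)]
    lhs = N * sum(a * b for a, b in zip(xbar, solve(shifted, xbar)))
    q = N * sum(a * b for a, b in zip(xbar, solve(s, xbar)))
    return lhs, q / (1 + q)


if __name__ == "__main__":
    print(lemma_2_8_1_sides([1, 2, -1], [[4, 1, 0], [1, 3, 1], [0, 1, 2]], 5))
```

Here $\bar X' S^{-1} \bar X = 3$, so with $N = 5$ both sides equal $15/16$; the left side always lies in $[0, 1)$, matching the range $\sum R_i < 1$ in Theorem 2.8.1.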
Definition 2.9.1. A test $\psi(x)$, $x \in \mathcal{X}$, is said to be equivalent to an invariant test with respect to the group of transformations $G$ if there exists an invariant test $\phi(x)$ with respect to $G$ such that
$$\psi(x) = \phi(x)$$
for all $x \in \mathcal{X}$ except possibly for a subset $N$ of $\mathcal{X}$ of probability measure zero.

Definition 2.9.2 (Almost invariance). A test $\phi(x)$, $x \in \mathcal{X}$, is said to be almost invariant with respect to the group of transformations $G$ if $\phi(x) = \phi(gx)$ for all $x \in \mathcal{X} - N_g$, where $N_g$ is a subset of $\mathcal{X}$, depending on $g$, of probability measure zero.

It is tempting to conjecture that if $\phi(x)$ is almost invariant then it is equivalent to an invariant test $\psi(x)$. Obviously, any $\phi(x)$ which is equivalent to an invariant test $\psi(x)$ is almost invariant. For, take $N_g = N \cup g^{-1} N$. If $x \notin N_g$, then $x \notin N$ and $gx \notin N$, so that $\phi(x) = \psi(x) = \psi(gx) = \phi(gx)$. Conversely, if $G$ is countable or finite, then given an almost invariant test $\phi(x)$ we define $N = \bigcup_g N_g$, where $N_g = \{x \in \mathcal{X} : \phi(x) \neq \phi(gx)\}$; $N$ then has probability measure zero. An uncountable $G$ presents difficulties: examples in which an almost invariant test differs from every invariant test are not rare.
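Let $G$ be a group of transformations operating on $\mathcal{X}$ and let $\mathcal{A}$, $\mathcal{B}$ be the $\sigma$-fields of subsets of $\mathcal{X}$ and $G$, respectively, such that for any $A \in \mathcal{A}$ the set of pairs $(x, g)$ for which $gx \in A$ is measurable $\mathcal{A} \times \mathcal{B}$. Suppose further that there exists a $\sigma$-finite measure $\nu$ on $G$ such that for all $B \in \mathcal{B}$, $\nu(B) = 0$ implies $\nu(Bg) = 0$ for all $g \in G$. Then any measurable test which is almost invariant with respect to $G$ is equivalent to an invariant test under $G$. For a proof of this fact the reader is referred to Lehmann (1959). The requirement that for all $g \in G$ and $B \in \mathcal{B}$, $\nu(B) = 0$ implies $\nu(Bg) = 0$, is satisfied in particular when $\nu(B) = \nu(Bg)$, and such a right invariant measure exists for a large number of groups.

Let $\mathcal{A}_I = \{A : A \in \mathcal{A} \text{ and } gA = A \text{ for all } g \in G\}$ and let $\mathcal{A}_s$ be the sufficient $\sigma$-field of $\mathcal{X}$. If we reduce first by sufficiency, getting the measure space $(\mathcal{X}, \mathcal{A}_s, P)$, and subsequently by invariance, we arrive at the measure space $(\mathcal{X}, \mathcal{A}_{sI}, P)$, where $\mathcal{A}_{sI}$ is the class of sets in $\mathcal{A}_s$ that are invariant under $G$. A natural question is whether reducing in the opposite order leads to the same result. A theorem of … and Cox provides an answer to this question under certain conditions. The proof is given by Hall, Wijsman and Ghosh (1965).

Theorem 2.9.1. Let $(\mathcal{X}, \mathcal{A}, P)$ be a measure space and let $G$ be a group of measurable transformations leaving $P$ invariant. Let $\mathcal{A}_s$ be a sufficient subfield for $P$ on $(\mathcal{X}, \mathcal{A})$ such that $g\mathcal{A}_s = \mathcal{A}_s$ for each $g \in G$. Then the $\sigma$-field of almost invariant sets in $\mathcal{A}_s$ is sufficient for $P$ on the $\sigma$-field of almost invariant sets in $\mathcal{A}$.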
Thus, under the conditions of the theorem, if we reduce by sufficiency and invariance, the order in which it is done is immaterial.

2.10. Invariance, Type D and E Regions

The notion of a type D or type E region is due to Isaacson (1951). Kiefer (1958) showed that the F-test of the univariate general linear hypothesis possesses this property. Suppose, for a parametric space $\Omega = \{(\theta, \eta) : \theta \in \Omega', \eta \in H\}$ with associated distributions, with $\Omega'$ a Euclidean set, that every test function $\phi$ has a power function $\beta_\phi(\theta, \eta)$ which, for each $\eta$, is twice continuously differentiable in the components of $\theta$ at $\theta = 0$, an interior point of $\Omega'$. Let $Q_\alpha$ be the class of locally strictly unbiased level $\alpha$ tests of $H_0: \theta = 0$ against the alternatives $H_1: \theta \neq 0$. Our assumption on $\beta_\phi$ implies that all tests in $Q_\alpha$ are similar and that the first partial derivatives of $\beta_\phi(\theta, \eta)$ with respect to $\theta$ vanish at $\theta = 0$. Let $\Delta_\phi(\eta) = \det(\beta_\phi''(\eta))$ be the determinant of the matrix of second partial derivatives of $\beta_\phi(\theta, \eta)$ with respect to the components of $\theta$ at $\theta = 0$, $\phi \in Q_\alpha$.

Assumption. $\Delta_\phi(\eta) > 0$ for all $\eta \in H$ and for at least one $\phi' \in Q_\alpha$.

Definition 2.10.1 (Type E and type D tests). A test $\phi^* \in Q_\alpha$ is said to be of type E if
$$\Delta_{\phi^*}(\eta) = \max_{\phi \in Q_\alpha} \Delta_\phi(\eta) \qquad \text{for all } \eta \in H.$$
If $H$ is a single point, $\phi^*$ is said to be of type D.

Lehmann (1959a) showed that, in finding regions which are of type D, invariance could be invoked in the manner of the Hunt-Stein theorem (Theorem 5.OA); and that this could also be done for type E regions (if they exist), provided that one works with a group which operates on $H$ as the identity. Suppose that our problem is invariant under a group of transformations $G$ for which the Hunt-Stein theorem holds and which acts trivially on $\Omega'$, so that for $g \in G$, $\bar g(\theta, \eta) = (\theta, \bar g \eta)$. If $\phi_g$ is the test function defined by $\phi_g(x) = \phi(gx)$, then a trivial computation shows that
$$\Delta_{\phi_g}(\eta) = \Delta_\phi(\bar g \eta),$$
and hence $\Delta(\eta) = \Delta(\bar g \eta)$, where $\Delta(\eta) = \max_{\phi \in Q_\alpha} \Delta_\phi(\eta)$. Also, if $\phi$ is better than $\phi'$ in the sense of either type D or type E, then $\phi_g$ is clearly better than $\phi'_g$.
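Exercises

1. Let $X_1, \ldots, X_n$ be independently and identically distributed normal random variables with mean $\mu$ and variance $\sigma^2$ (unknown). Using Theorem 2.7.1, find the UMP invariant test of $H_0: \mu = 0$ against the alternative $H_1: \mu \neq 0$ with respect to the group of transformations which transform each $X_i \to cX_i$, $c \neq 0$.

2. Let $\{P_\theta, \theta \in \Omega\}$ be a family of distributions on $(\mathcal{X}, \mathcal{A})$ with density $p(\cdot \mid \theta)$. For testing $H_0: \theta \in \Omega_{H_0}$ against $H_1: \theta \in \Omega_{H_1}$, with $\Omega_{H_0} \cap \Omega_{H_1} = \emptyset$, the likelihood ratio test rejects $H_0$ whenever $\lambda(x)$ is less than a constant, where
$$\lambda(x) = \frac{\sup_{\theta \in \Omega_{H_0}} p(x \mid \theta)}{\sup_{\theta \in \Omega_{H_0} \cup \Omega_{H_1}} p(x \mid \theta)}.$$
Let the problem of testing $H_0$ against $H_1$ be invariant under a group $G$ of transformations $g$ transforming $x \to gx$, $g \in G$, $x \in \mathcal{X}$. Assume that $p(x \mid \theta) = p(gx \mid \bar g \theta)\, \chi(g)$ for some multiplier $\chi$. Show that $\lambda(x)$ is an invariant function.

3. Prove (2.8) and (2.25).

4. Prove Theorem 2.9.1.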
References

1. S. A. Anderson, Distribution of maximal invariants using quotient measures, Ann. Statist. 10, pp. 955-961 (1982).
2. D. Blackwell, On a class of probability spaces, Proc. Third Berkeley Symp. Math. Statist. Probability, Univ. of California Press, Berkeley, California, 1956.
3. J. V. Bondar, Borel cross-sections and maximal invariants, Ann. Statist. 4, …
… Statistics, Ann. Math. Statist. 36, pp. 1061-1065 (1965).
8. N. Giri, On the distribution of a multivariate statistic, Sankhya 33, pp. 207-210 (1971).
9. N. Giri, Multivariate Statistical Inference, …
… specifying … of unbiased … 234 (1951).
11. A. Haar, Der Massbegriff … der kontinuierlichen …
… between sufficiency … symmetrical … (1959a).
18. L. Loomis, An Introduction to Abstract Harmonic Analysis, D. Van Nostrand, …
… 1965.
20. R. Schwartz, Locally minimax tests, Ann. Math. Statist. 38, pp. 340-359 (1967).
21. C. Stein, Multivariate Analysis I, Technical Report 42, Department of Statistics, …
… maximal invariants, Proc. Fifth Berkeley Symp. Math. Statist. Prob. 1, pp. 389-400 (1967). University of California, Berkeley.
23. R. A. Wijsman, General proof of termination with probability one of invariant sequential probability ratio tests based on multivariate … observation, Ann. Math. …
… maximal invariants, … distribution …
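Denote by $G$ a group of transformations operating from the left on $\mathcal{X}$ such that for $g \in G$, $g: \mathcal{X} \to \mathcal{X}$ is one-to-one onto (bijective). Let $\bar G$ be the corresponding group of induced transformations $\bar g$ on $\Omega$. Assume

(a) for $\theta_i \in \Omega$, $i = 1, 2$, $\theta_1 \neq \theta_2$ implies $P_{\theta_1} \neq P_{\theta_2}$;

(b) $P_\theta(A) = P_{\bar g \theta}(gA)$, $A \in \mathcal{A}$, $g \in G$.

Let $\lambda(\theta)$ be a maximal invariant on $\Omega$ under $\bar G$, and let
$$\Omega_0 = \{\theta \mid \theta \in \Omega \text{ with } \lambda(\theta) = \lambda_0\}, \qquad (3.1)$$
where $\lambda_0$ is known. We assume $\mathcal{X}$ to be the space of the minimal sufficient statistic for $\theta$.

Definition 3.1 (Equivariant estimator). A point estimator $\hat\theta(X)$, $X \in \mathcal{X}$, a mapping from $\mathcal{X}$ to $\Omega$, is equivariant if $\hat\theta(gX) = \bar g \hat\theta(X)$, $g \in G$.

Note 1. … transformations on …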
Let $T(X)$, $X \in \mathcal{X}$, be a maximal invariant on $\mathcal{X}$ under $G$. Since the distribution of $T(X)$ depends on $\theta \in \Omega$ only through $\lambda(\theta)$, given $\lambda(\theta) = \lambda_0$, $T(X)$ is an ancillary statistic.

Recall that an ancillary statistic is a function of the minimal sufficient statistic $X$ whose marginal distribution is parameter free. In this chapter we consider the approach of finding the best equivariant estimator in models admitting an ancillary statistic. Such models are assumed to be generated as an orbit under the induced group $\bar G$ on $\Omega$. The ancillary statistic is realized as the maximal invariant on $\mathcal{X}$ under $G$. A model which admits an ancillary statistic is referred to as a curved model. Models of this nature are not uncommon in statistics. Fisher (1925) considered the problem of estimating the mean of a normal population with variance $\sigma^2$ when the coefficient of variation is known. The motivation behind this was the empirically observed fact that a standard deviation $\sigma$ often becomes large proportionally to a corresponding mean $\mu$, so that the coefficient of variation $\sigma/\mu$ remains constant. This fact is often present in mutually correlated multivariate data. In the case of multivariate data, no well-accepted measure of variation between a mean vector $\mu$ and a covariance matrix $\Sigma$ is available. Kariya, Giri and Perron (1988) suggested the following multivariate versions of the coefficient of variation:

(i) $\lambda = \mu' \Sigma^{-1} \mu$;

(ii) $\nu = \Sigma^{-1/2} \mu$, where $\Sigma = \Sigma^{1/2} \Sigma^{1/2\prime}$ with $\Sigma^{1/2}$ the unique lower triangular matrix with positive diagonal elements.

In recent years Cox and Hinkley (1977), Efron (1978) and Amari (1982a, b), among others, have reconsidered the problem of estimating $\mu$ when the coefficient of variation is known, in the context of curved models.

3.1. Best Equivariant Estimation of $\mu$ with $\lambda$ Known

Let $X_1, \ldots, X_n$ ($n > p$) be independently and identically distributed $N_p(\mu, \Sigma)$. We want to estimate $\mu$ under the loss function
$$L(\mu, d) = (d - \mu)' \Sigma^{-1} (d - \mu) \qquad (3.2)$$
when $\lambda = \lambda_0$ is known. Let
$$n \bar X = \sum_{i=1}^{n} X_i, \qquad S = \sum_{i=1}^{n} (X_i - \bar X)(X_i - \bar X)'.$$
Then $(\bar X, S)$ is sufficient for $(\mu, \Sigma)$; $\sqrt{n}\bar X$ is distributed independently of $S$ as $N_p(\sqrt{n}\mu, \Sigma)$, and $S$ is distributed as Wishart with $n - 1$ degrees of freedom and parameter $\Sigma$. Under the loss function (3.2) this problem remains invariant under the group of transformations $G_l(p)$ of $p \times p$ nonsingular matrices, transforming
Equivariant
Estimation
in Curved Models
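$(\bar X, S) \to (g\bar X, gSg')$, $g \in G_l(p)$. The corresponding induced transformations $\bar g$ on the parametric space $\Omega$ transform $\theta = (\mu, \Sigma) \to \bar g\theta = (g\mu, g\Sigma g')$. A maximal invariant on the space of $(\sqrt{n}\bar X, S)$ under $G_l(p)$ is
$$T^2 = n(n-1) \bar X' S^{-1} \bar X,$$
and a maximal invariant on $\Omega$ under $\bar G$ is $\lambda = \mu' \Sigma^{-1} \mu$; given $\lambda = \lambda_0$, $T^2$ is ancillary. Let $\hat\mu$ be an equivariant estimator, that is, $\hat\mu(g\bar X, gSg') = g\hat\mu(\bar X, S)$. Its risk satisfies
$$R(\theta, \hat\mu) = E_\theta\big((\hat\mu - \mu)' \Sigma^{-1} (\hat\mu - \mu)\big) = E_\theta\big((g\hat\mu - g\mu)' (g\Sigma g')^{-1} (g\hat\mu - g\mu)\big) = E_\theta\big((\hat\mu(g\bar X, gSg') - g\mu)' (g\Sigma g')^{-1} (\hat\mu(g\bar X, gSg') - g\mu)\big) = R(\bar g\theta, \hat\mu). \qquad (3.4)$$
Since $G_l(p)$ acts transitively on $\Omega_0$, we conclude from (3.4) that the risk of any equivariant estimator $\hat\mu$ is constant for all $\theta \in \Omega_0$.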
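Using the fact that $R(\theta, \hat\mu)$ is constant, we can choose $\mu = e = (1, 0, \ldots, 0)'$, $\Sigma = I$. To find the best equivariant estimator, which minimizes $R(\theta, \hat\mu)$ among all equivariant estimators $\hat\mu$ satisfying
$$\hat\mu(g\bar X, gSg') = g\hat\mu(\bar X, S),$$
we need to characterize $\hat\mu$. Let $G_T(p)$ be the subgroup of $G_l(p)$ containing all $p \times p$ lower triangular matrices with positive diagonal elements. Since $S$ is positive definite, there exists $W \in G_T(p)$ such that $S = WW'$. Let
$$V = W^{-1} \sqrt{n} \bar X, \qquad U = V'V = \frac{T^2}{n-1}, \qquad Q = \frac{V}{\sqrt{U}}. \qquad (3.5)$$
Then any equivariant estimator $\hat\mu$ can be written as
$$\hat\mu(\bar X, S) = k(U)\, WQ,$$
where $k$ is a measurable function of $U$. To see this, note that by equivariance $\hat\mu(\bar X, S) = W \hat\mu(V, I)$, and for any orthogonal matrix $O$ whose first column is $Q$,
$$\hat\mu(V, I) = \hat\mu(OO'V, OO') = O \hat\mu(\sqrt{U} e, I).$$
Since the columns of $O$ other than the first are arbitrary as long as they are orthogonal to $Q$, the components of $\hat\mu(\sqrt{U} e, I)$ other than the first must be zero, and the first component $\hat\mu_1(\sqrt{U} e, I)$ is a function $k(U)$ of $U$; hence $\hat\mu(V, I) = k(U) Q$.

The following theorem gives the best equivariant estimator (BEE) of $\mu$.

Theorem 3.2. Under the loss function (3.2) the unique BEE of $\mu$ is $\hat\mu = k(U) WQ$, where
$$k(U) = \frac{E(Q'W'e \mid U)}{E(Q'W'WQ \mid U)}. \qquad (3.8)$$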
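Proof. At $\mu = e$, $\Sigma = I$ the risk of $\hat\mu = k(U) WQ$ is
$$R(\theta, \hat\mu) = E\big((k(U) WQ - e)'(k(U) WQ - e)\big).$$
Using the fact that $U$ is ancillary, a unique BEE is obtained as $\hat\mu = k(U) WQ$, where $k(u)$ minimizes the conditional risk given $U$,
$$E\big[(k(U) WQ - e)'(k(U) WQ - e) \mid U\big] = k^2(U)\, E(Q'W'WQ \mid U) - 2k(U)\, E(Q'W'e \mid U) + 1,$$
which is minimized by (3.8).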
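The ingredients $W$, $U$ and $Q$ of (3.5) are easy to compute, and doing so makes two facts visible: $U$ is invariant under $(\bar X, S) \to (g\bar X, gSg')$, and $WQ = \sqrt{n}\bar X / \sqrt{U}$, so every estimator $k(U) WQ$ is a data-dependent multiple of $\bar X$. The sketch below is in Python (standard library only); the function names and numerical values are illustrative assumptions, and it does not compute the conditional expectations in (3.8).

```python
import math


def cholesky_lower(s):
    """Lower triangular W with positive diagonal such that s = W W' (s positive definite)."""
    p = len(s)
    w = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            acc = s[i][j] - sum(w[i][k] * w[j][k] for k in range(j))
            w[i][j] = math.sqrt(acc) if i == j else acc / w[j][j]
    return w


def forward_solve(w, b):
    """Solve W v = b for lower triangular W."""
    v = []
    for i in range(len(b)):
        v.append((b[i] - sum(w[i][k] * v[k] for k in range(i))) / w[i][i])
    return v


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def congruence(g, s):
    """g s g'."""
    p = len(g)
    gs = [[sum(g[i][k] * s[k][j] for k in range(p)) for j in range(p)] for i in range(p)]
    return [[sum(gs[i][k] * g[j][k] for k in range(p)) for j in range(p)] for i in range(p)]


def bee_ingredients(xbar, s, n):
    """(U, W, Q) of (3.5): S = WW', V = W^{-1} sqrt(n) xbar, U = V'V, Q = V / sqrt(U)."""
    w = cholesky_lower(s)
    v = forward_solve(w, [math.sqrt(n) * x for x in xbar])
    u = sum(vi * vi for vi in v)
    return u, w, [vi / math.sqrt(u) for vi in v]


if __name__ == "__main__":
    xbar = [0.5, -1.0, 2.0]                                 # illustrative data
    s = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    u, w, q = bee_ingredients(xbar, s, 10)
    print(u, matvec(w, q))
```

Any choice of $k$ then gives an equivariant estimator `k(u) * matvec(w, q)`; Theorem 3.2 selects the $k$ that minimizes the conditional risk.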
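To find the maximum likelihood estimator of $\mu$ subject to $\mu' \Sigma^{-1} \mu = \lambda_0$, we maximize
$$-\frac{n}{2} \log \det \Sigma - \frac12 \operatorname{tr} S \Sigma^{-1} - \frac{n}{2} (\bar x - \mu)' \Sigma^{-1} (\bar x - \mu) - \frac{\gamma}{2} (\mu' \Sigma^{-1} \mu - \lambda_0) \qquad (3.9)$$
with respect to $\mu$ and $\Sigma$, where $\gamma$ is the Lagrange multiplier. Maximizing (3.9) with respect to $\mu$ we obtain
$$\hat\mu = \frac{n}{n + \gamma} \bar X, \qquad (3.10)$$
where $\hat\Sigma$ and $\gamma$ are determined by the remaining likelihood equations and the constraint $\hat\mu' \hat\Sigma^{-1} \hat\mu = \lambda_0$. The maximum likelihood estimator (mle) $\hat\mu$ is clearly equivariant, and hence it is dominated by the BEE of Theorem 3.2 for any $p$. In the univariate case the mle is … some properties of this model associated with the Fisher information. Amari (1982a, b) proposed, through a geometric approach, what he called the dual mle, which is also equivariant.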
… when $C$ is known. Let $Y = \sqrt{n} \bar X$, $W = \operatorname{tr} S$. Then $(Y, W)$ is a sufficient statistic, and $W$ is distributed independently of $Y$ as $\sigma^2 \chi^2_{(n-1)p}$. (3.13)
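This problem remains invariant with respect to the group $G = R_+ \times O(p)$, $R_+$ being the multiplicative group of positive reals and $O(p)$ being the multiplicative group of $p \times p$ orthogonal matrices, transforming
$$X_i \to b\Gamma X_i, \quad i = 1, \ldots, n, \qquad d \to b\Gamma d,$$
where $(b, \Gamma) \in G$ with $b \in R_+$, $\Gamma \in O(p)$. The transformation induced by $G$ on $(Y, W)$ is $(Y, W) \to (b\Gamma Y, b^2 W)$. The theorem below characterizes the equivariant estimators under $G$.

Theorem 3.3. An estimator $d(Y, W)$ is equivariant under $G$ if and only if there exists a measurable function $h: R_+ \to R$ such that
$$d(Y, W) = h\Big(\frac{Y'Y}{W}\Big) Y \qquad (3.14)$$
for all $Y \in R^p$, $W \in R_+$.

Proof. If $h$ is a measurable function from $R_+$ to $R$ and $d(Y, W) = h(Y'Y/W) Y$, then clearly $d$ is equivariant under $G$. Conversely, if $d(Y, W)$ is equivariant under $G$, then
$$d(b\Gamma Y, b^2 W) = b\Gamma d(Y, W)$$
for all $\Gamma \in O(p)$, $Y \in R^p$, $W \in R_+$, $b > 0$. We may assume without loss of generality that $Y'Y > 0$. Let $Y, W$ be fixed and let $A \in O(p)$ be a fixed $p \times p$ matrix such that $Y'A = (\|Y\|, 0, \ldots, 0)$, that is, the first column of $A$ is $\|Y\|^{-1} Y$. For any $B \in O(p-1)$ take $\Gamma = A \begin{pmatrix} 1 & 0 \\ 0 & B \end{pmatrix} A'$; then $\Gamma Y = Y$, so that
$$d(Y, W) = A \begin{pmatrix} 1 & 0 \\ 0 & B \end{pmatrix} A' d(Y, W). \qquad (3.15)$$
Since (3.15) holds for any choice of $B \in O(p-1)$, $A' d(Y, W)$ must be a multiple of $(1, 0, \ldots, 0)'$, that is, $d(Y, W)$ is a multiple of $Y$. Taking $b = \|Y\|^{-1}$ and $\Gamma = A'$ in the equivariance relation, we then get
$$d(Y, W) = d_1\Big((1, 0, \ldots, 0)', \frac{W}{Y'Y}\Big) Y,$$
where $d_1$ is the first component of the $p$-vector $d$; this is of the form (3.14).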
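The easy direction of Theorem 3.3 can be checked numerically. In the sketch below (Python, standard library only) the function `h_example` is a hypothetical choice of $h$ made only for illustration, and $p = 2$ so that elements of $O(2)$ are easy to write down. The check confirms $d(b\Gamma Y, b^2 W) = b\Gamma d(Y, W)$ for a rotation and for a reflection.

```python
import math


def h_example(v):
    # Hypothetical h: R+ -> R, chosen only for illustration.
    return v / (1.0 + v)


def d_equivariant(y, w, h=h_example):
    """Estimator of the form (3.14): d(Y, W) = h(Y'Y / W) Y."""
    v = sum(yi * yi for yi in y) / w
    return [h(v) * yi for yi in y]


def rotation(angle):
    """A 2 x 2 rotation matrix, an element of O(2)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]


def act(b, gamma, y):
    """The group element (b, Gamma) acting on a vector: y -> b Gamma y."""
    return [b * sum(gij * yj for gij, yj in zip(row, y)) for row in gamma]


if __name__ == "__main__":
    y, w, b = [1.5, -0.7], 2.3, 1.7
    for gamma in (rotation(0.9), [[1.0, 0.0], [0.0, -1.0]]):
        lhs = d_equivariant(act(b, gamma, y), b * b * w)
        rhs = act(b, gamma, d_equivariant(y, w))
        print(lhs, rhs)
```

The check works because $V = Y'Y/W$ is unchanged by $(Y, W) \to (b\Gamma Y, b^2 W)$, so $h(V)$ is an invariant scalar multiplying an equivariant vector.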
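It may be easily verified that a maximal invariant under $G$ in the space of the sufficient statistic is $V = W^{-1} Y'Y$, and that in the parametric space a maximal invariant is …. As the group acts transitively on the parametric space, the risk function (using (3.13)) $R(\mu, d) = E_\mu(L(\mu, d))$ of an equivariant estimator $d$ is constant. Hence we can take $\mu = \mu_0 = \ldots$, and
$$R(\mu_0, d) = E_{\mu_0}\big(L(\mu_0, h(V) Y)\big) = E_V\Big(E_{\mu_0}\big(L(\mu_0, h(V) Y) \mid V = v\big)\Big).$$
The BEE is obtained by choosing, for each $v$, the value $h(v)$ that minimizes $E_{\mu_0}(L(\mu_0, h(v) Y) \mid V = v)$ over all $h: R_+ \to R$, which gives
$$h_0(v) = \frac{E_{\mu_0}(Y_1 \mid V = v)}{E_{\mu_0}(Y'Y \mid V = v)}, \qquad (3.16)$$
where $Y_1$ is the first component of $Y$.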
… is given by
$$d_0(Y, W) = h_0(V)\, Y, \qquad (3.17)$$
with $h_0(v)$ of (3.16) written out explicitly as a ratio of series of gamma-function terms.

Proof. The joint probability density function of $(Y, W)$ is of the form $\exp\{-(C^2/2)(\cdot)\}$ times powers of its arguments, divided by $2^{np/2} (C^2)^{np/2} (\Gamma(\frac12))^p\, \Gamma((n-1)p/2)$, if $w > 0$, and $0$ otherwise. Using (3.16) we get (3.17).

Corollary. If $m = p(n-1)/2$ is an integer, the BEE is given by
$$d_0(Y, W) = g(t)\, \bar X, \qquad (3.18)$$
where $g(t)$ is an explicit finite expression in $t$ involving $m$ ….
P r o o f . Let Y
be distributed as % % Then
c E
./
y n
^_o r(fc/2 + )
t t a 2
k
l f Q >
^ ~
r(fc/2)
^ " '
= v ^ C
^ 3=0
m 1
4 (3.19)
/ v
SC (V
),
n( 2
where Vj is distributed as noncentral X ,+2( ~' i)^ noncentral Xp( C t). get
n 2
and ^2 is distributed as
Hence letting V = ^ ( t f )
d taking r as an integer we
From (3.19) and (3.20) we get (3.18).

Note 1. $g(t)$ is a continuous function of $t$, and
lirn
(.0+
hmg(t)
and g(t) > 0 for all t. T h u s when Y'Y N o t e 2. We can also write d
form is very popular in the literature. Perron and Giri (1990) have shown that $g(t)$ is a strictly decreasing function of $t$ and $T(V)$ is strictly increasing in $V$. The result that $g(t)$ is strictly decreasing in $t$ tells what one may intuitively do if one has an idea of the true value of $C$ and observes many large values concentrated. Normally one is suspicious of their effect on the sample mean and tends to shrink the sample mean towards the origin. That is what our estimator does. The result that $T(V)$ is strictly increasing in $V$ relates the BEE of the mean for $C$ known to the class of minimax estimators of the mean for $C$ unknown. Efron and Morris (1977) have shown that a necessary condition
to be minimax is g(t) * 1 as
t * 1, SO our estimator fails to be minimax if we do not know the value of C. O n the other hand Efron and Morris (1977) have shown that an estimator of the form d = (1 0 < T(V) <(p
Z^p)X
(i) but fails to satisfy (ii). So a truncated version of our estimator could be a compromise between the best one can do when the value of $C$ is known and the worst one can do by using an incorrect value of $C$.

3.2.1. An application
The following interesting application of this model is given by Kent, Briden, and Mardia (1983). The natural remanent magnetization (NRM) in rocks is known to have, in general, originated in one or more relatively short time intervals during rock-forming or metamorphic events, during which NRM is frozen in by falling temperature, grain growth, etc. The NRM acquired during each such event is a single vector magnetization parallel to the then-prevailing geomagnetic field and is called a component of NRM. By thermal, alternating-field or chemical demagnetization in stages these components can be identified. Resistance to these treatments is known as "stability of remanence". At each stage of the demagnetization treatment one measures the remanent magnetization as a vector in 3-dimensional space. These observations are represented by vectors $X_1,\dots,X_n$ of the form $X_i=\alpha_i+\beta_i+e_i$, where $\alpha_i$ denotes the true magnetization at the $i$th step, $\beta_i$ represents the model error, and $e_i$ represents the measurement error. They assumed that $\beta_i$ and $e_i$ are independent, $\beta_i$ is distributed as $N_3(0,\tau^2(\alpha_i)I)$, and $e_i$ is distributed as $N_3(0,\sigma^2(\alpha_i)I)$. The $\alpha_i$ are assumed to possess some specific structures, like collinearity etc., which one attempts to determine. Sometimes the magnitude of the model error is harder to ascertain and one reasonably assumes $\tau^2(\alpha)=0$. In practice $\sigma^2(\alpha)$ is allowed to depend on $\alpha$, and a plausible model which fits many data reasonably well is $\sigma^2(\alpha)=a(\alpha'\alpha)$.
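3.2.2. Maximum likelihood estimator

Based on the sufficient statistic $(y,w)$, the likelihood of $\mu$ is, apart from a factor not depending on $\mu$,
$$L(\mu)=\left(\frac{C^2}{\mu'\mu}\right)^{np/2}\exp\left\{-\frac{C^2}{2\mu'\mu}\big(w+y'y-2\sqrt n\,\mu'y+n\mu'\mu\big)\right\}.$$
Equating $\partial\log L/\partial\mu$ to zero we obtain
$$\mu\left[\frac{np}{C^2}\mu'\mu-w-y'y+2\sqrt n\,\mu'y\right]=\sqrt n\,(\mu'\mu)\,y.\qquad(3.21)$$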
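If Eq. (3.21) in $\mu$ has a solution, it must be collinear with $y$. Let $\mu=ky$ be a solution. From (3.21) we obtain
$$k\left[\frac{np}{C^2}(y'y)k^2+\sqrt n\,(y'y)k-(w+y'y)\right]=0,$$
whose nonzero roots are
$$k_1,\,k_2=\frac{-1\mp\left(1+\frac{4p}{C^2}\left(1+\frac{w}{y'y}\right)\right)^{1/2}}{2\sqrt n\,p/C^2},\qquad k_1<0<k_2.$$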
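To find the value of $k$ which maximizes the likelihood we compute the matrix of mixed derivatives
$$\frac{\partial^2(-\log L)}{\partial\mu\,\partial\mu'}\quad\text{at } \mu=ky,$$
and assert that this matrix should be positive definite. The characteristic roots of this matrix are given by
$$\lambda_1=\frac{\sqrt n\,C^2}{k^3\,y'y}\ \ (\text{multiplicity } p-1),\qquad \lambda_2=\frac{\sqrt n\,C^2+2npk}{k^3\,y'y}.$$
If $k=k_1$, then $\lambda_1<0$, so the matrix is not positive definite. But if $k=k_2$, then $\lambda_1>0$ and $\lambda_2>0$. Hence the mle of $\mu$ is
$$d_1(x_1,\dots,x_n,C)=k_2Y=\frac{C^2}{2p}\left[\left(1+\frac{4p}{C^2}\left(1+\frac{W}{Y'Y}\right)\right)^{1/2}-1\right]\bar X.\qquad(3.22)$$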
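The closed form (3.22) can be compared with a direct numerical maximization of the likelihood along the ray $\mu=ky$. The sketch below is illustrative only: the helper names are hypothetical, constants free of $\mu$ are dropped from $-\log L$, and golden-section search is used because $-\log L$ is unimodal in $k>0$. It tends to $+\infty$ both as $k\to0^+$ and as $k\to\infty$, and it has a single positive stationary point.

```python
import math

def mle_k(y_sq, w, n, p, C):
    """Positive root k_2 of (np/C^2)(y'y)k^2 + sqrt(n)(y'y)k - (w + y'y) = 0."""
    s = math.sqrt(1.0 + (4.0 * p / C ** 2) * (1.0 + w / y_sq))
    return (s - 1.0) / (2.0 * math.sqrt(n) * p / C ** 2)

def neg_log_lik(k, y_sq, w, n, p, C):
    """-log L at mu = k*y, with mu'mu = k^2 y'y and mu'y = k y'y (constants dropped)."""
    mm = k * k * y_sq
    quad = w + y_sq - 2.0 * math.sqrt(n) * k * y_sq + n * mm
    return 0.5 * n * p * math.log(mm) + 0.5 * C ** 2 * quad / mm

def golden_min(f, lo, hi, tol=1e-10):
    """Golden-section search for the minimizer of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            hi, d = d, c
            c = hi - g * (hi - lo)
        else:
            lo, c = c, d
            d = lo + g * (hi - lo)
    return 0.5 * (lo + hi)
```

For example, with $y'y=4$, $w=3$, $n=6$, $p=3$, $C=2$ the root is $k_2=1/\sqrt6$, and `golden_min(lambda k: neg_log_lik(k, 4.0, 3.0, 6, 3, 2.0), 1e-3, 10.0)` returns approximately the same value. The mle is then $k_2Y$.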
Since the maximum likelihood estimator is equivariant and differs from the BEE $d_0$, the mle $d_1$ is inadmissible. The risk function of $d_0$ depends on $C$. Perron and Giri (1990) computed the relative efficiency of $d_0$ when compared with $d_1$, the James-Stein estimator ($d_2$), the … estimator ($d_3$), and the sample mean $\bar X$ ($d_4$) for different values of $C$, $n$, and $p$. They concluded that when the sample size $n$ increases for a given $p$ and $C$, the relative efficiency of $d_0$ when compared with $d_i$, $i=1,\dots,4$, does not change significantly. This changes markedly when $C$ varies. When $C$ is small, $d_0$ is markedly superior to the others. On the other hand, when $C$ is large, all five estimators are more or less similar. These conclusions are not exact as the risk of $d_0$, $d_1$ …
indication that for small values of $C$ the use of the BEE is clearly advantageous.

3.3. Best Equivariant Estimation in Curved Covariance Models

Let $X_1,\dots,X_n$ be independent $p$-dimensional normal vectors with mean vector $\mu$ and positive definite covariance matrix $\Sigma$. Let $\bar X=n^{-1}\sum_{i=1}^nX_i$ and $S=\sum_{i=1}^n(X_i-\bar X)(X_i-\bar X)'$, and let $\Sigma$ and $S$ be partitioned as
$$\Sigma=\begin{pmatrix}\Sigma_{11}&\Sigma_{12}\\ \Sigma_{21}&\Sigma_{22}\end{pmatrix},\qquad S=\begin{pmatrix}S_{11}&S_{12}\\ S_{21}&S_{22}\end{pmatrix},$$
where $\Sigma_{11}$ and $S_{11}$ are $1\times1$ and $\Sigma_{22}$ and $S_{22}$ are $(p-1)\times(p-1)$. We are interested in finding the BEE of $\beta=\Sigma_{22}^{-1}\Sigma_{21}$ on the basis of $n$ sample observations $x_1,\dots,x_n$ when one knows the value of the multiple correlation coefficient
$$\rho^2=\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}.$$
If the value of $\rho^2$ is significant, one would naturally be interested in estimating $\beta$ for prediction and also in estimating $\Sigma_{22}$ to ascertain the variability of the prediction variables. Let $G_l(p)$ be the full linear group of $p\times p$ nonsingular matrices and let $O(p)$ be the multiplicative group of $p\times p$ orthogonal matrices. Let $H_1$ be the
' An 0
P1 0 h22
2
Define G = Hi H
2
of H i and H .
T h e transformation g = (hi,k )
G transforms H .
The
t = l,...,n,
(X,S)
3.3.1.
Characterization
p
of equivariant estimators of
Let S
the group of pxp lower triangular matrices with positive diagonal elements. A n equivariant estimator d{X, S) of with respect to the group of transformations G is a measurable function d(X, S) on S xR
p p
to S
satisfying
S)ti
Hi,
and X,
S.
p
is a mapping from S
to S .
p
another space $\mathcal Y$ (say), then $d^*$ is an equivariant estimator of $u(\Sigma)$ if and only if $d^*=u\circ d$ for some equivariant estimator $d$ of $\Sigma$. Let
L e m m a 3 . 1 . G acts transitively on 0 .
P
Proof.
( A , ) G with
1 p .0
p 1 0
1 / 2
P 0 0 /
j 1 p2
N (3.23)
If p = 0, i.e., p # 0, choose A
1 2
= 0, we take ha = E
1 u / 2
= ,,
, f =  f t p to obtain (3.23). I f
2 2
= F ^3
, where T is a (p  1) x (p  1)
1 2
1 / a 2
r = 0 , 0,
0),
and =  A / / to obtain (3.23). T h e following theorem gives a characterization of the equivariant estimator
d(S) of S .
{a^WRiSu
R~ a tR)S S, S 2
22 21 1 1
C(R){S22S S S )
2l l2
/f,\
[ K )
_ ( aii{R)  \a (R)
2l
a (R) a (R)
12 22
Furthermore
2
<ki p S ^ E i j i E j " ^ S a i
2
l n
ai a
2
2 2
a \ = f>
2
I t consists
1 2
tations presented in the necessary part. To prove the necessary part we observe that
 ( R
*)
FI
><
and d satisfies Ii 0
o
for all T 0(p  2). T h i s implies that
P
0 with C(R) > 0.
P
o \ _ /MR)
I)
2
o
C(R)I _
P 2
Sn
Sn
S\
l2
_(Ti
\ 0
S)
22
0 \ / l T j \U
V \(T[
V J U
0 T
with 7\ G G ( 1 ) , T
>
generality we may assume that U ^ 0. Corresponding to U there exists a B G 0 ( p  1) such that U'B = {R, 0, . . .0) with R = \\U\\= Using such a B we have the decomposition 1 U and W \ _(\ !.,) ~ {0 O W P Bj \ 0 0
J_)
P 2
5 )=
2 1 l
>
R~ U.
\ (l \0
0
B'
V TJ
A o
0
13
fi'a^ffi)^!
R a 2(R}S  S 'S 2
2 2 l l l
+ C(R){S
22
S iS^S )
2 l2
The following theorem gives a characterization of the equivariant estimator of $\beta$.

Theorem 3.6. If $d^*$ is an equivariant estimator of $\beta$, then $d^*(S)$ has the form
$$d^*(S)=R^{-1}a(R)\,S_{22}^{-1}S_{21},$$
where $a:\mathbb R\to\mathbb R$.
p
P r o o f . Define u:S
^ RP'
by
+ C(R)(S
22
+ CfflHCV! 2 1 21
UU )T^S R a {R)
2l 21 1 2 2 21
= R~ (a (R)
22
+ (1 l 2 2
R )(C(R)) a (R)S  S
= R a{R)S  S i
2
Ex(L((3,d'))
1 l i
1 l
= E {S^(R a(R)S S .
n 2
 0}'S {R~
22
l 12
a(R)S
22
l 22
S
.
2l
0)}
(3.25)
= E {a?{R)
z
 2R a{R)S[ S 0
+ S^0'S 0}
T h e o r e m . 3.7.
(3.26)
where
r (
n i
2
r ( ^  M ) ) (2 )
p r 2 m
( V V ) 7 I T / m ! r (2=1 + m)
a'(R)
= r p
^ _
r (2=1 +
m
(3.27)
Proof. a*(R)
E^S^
Si20R~ \R).
with C(p) = \ \
,1)
2 1
fl"" ,
Pi
i=2j=l
1 *
e x p {
~ 2 ( i 
2 P
"
* 
P ^
^ n i=i
fe)
(3.28)
l)^
Proof.
^ r ( a + m + l)r(^ + l ) j
i =0
d"
(=1
0 = (1  7 ) ^ ) ^ ( 1 + )+ ( l ^
a ' M ^  ^  l p y m,
i =0
1  r
1 
r V
Note. (pir S S i.
22 2 l 1
If p
= 1,...,N(
tically distributed pvariate normal random vectors with common mean fi and positive definite common covariance matrix and let
 1
+ NXX')~ X
is an
= (X i,...,X y,a
a ap
cally distributed pvariate normal random vectors with mean 0 and positive definite covariance matrix and let
E
_ (
\(21)
(12) \
with
( l l
) : lxl.
( 1 2
, ~
(
2 )
2 1 1
/ i i find the
ancillary statistic.

3. (Basu (1955)) If $T$ is a boundedly complete sufficient statistic for the family of distributions $\{P_\theta,\ \theta\in\Omega\}$, then any ancillary statistic is independent of $T$.

4. Find the conditions under which the maximum likelihood estimator is equivariant.
References

1. S. Amari, Differential geometry of curved exponential families: curvatures and information …
… (1955).
4. D. R. Cox and D. V. Hinkley, Theoretical Statistics, Chapman and Hall, London (1977).
5. B. Efron, The geometry of exponential families, Ann. Statist. 6, 362-376 (1978).
6. B. Efron and C. Morris, Stein's estimation rule and its competitors: an empirical Bayes approach, J. Amer. Statist. Assoc. 68, 117-130 (1973).
7. B. Efron and C. Morris, Stein's paradox in statistics, Sci. Amer. 239, 119-127 (1977).
…
11. T. Kariya, N. Giri, and F. Perron, Equivariant estimation of a mean vector $\mu$ of $N_p(\mu,\Sigma)$ with $\mu'\Sigma^{-1}\mu=1$ or …, J. Multivariate Anal. 27, 270-283 (1988).
12. J. Kent, J. Briden, and K. Mardia, Linear and planar structure in ordered multivariate data as applied to progressive demagnetization of palaeomagnetic remanence, Geophys. J. Roy. Astronom. Soc. 75, 593-662 (1983).
13. F. Perron and N. Giri, On the best equivariant estimator of mean of a multivariate normal population, … 40, 46-55 (1992).
14. F. Perron and N. Giri, Best equivariant estimation in curved covariance models, J. Multivariate Anal. 32, 1-16 (1990).
4.0. Introduction

In Chapter 3 we dealt with some applications of invariance in statistical estimation. In this chapter we discuss various testing problems concerning means of multivariate normal distributions. Testing problems concerning discriminant coefficients, as they are somewhat related to mean problems, will also be considered. This chapter also includes testing problems concerning the multiple correlation coefficient and a related problem concerning multiple correlation with partial information. We will be concerned with invariant tests only, and we will take a different approach to deriving tests for these problems. Rather than deriving the likelihood ratio tests and studying their optimum properties, we will look for a group under which the testing problem remains invariant and then find tests based on the maximal invariant under the group. A justification of this approach is as follows. If a testing problem is invariant under a group, the likelihood ratio test, under a mild regularity condition, depends on the observations only through the maximal invariant in the sample space under the group (Lehmann, 1959, p. 252). If we then find the optimum invariant test using the above approach, the likelihood ratio test can be no better, since it is an invariant test.
4.1. Tests of Mean Vector

Let $X=(X_1,\dots,X_p)'$ be normally distributed with mean $E(X)=\mu=(\mu_1,\dots,\mu_p)'$ and positive definite covariance matrix $\Sigma=E(X-\mu)(X-\mu)'$. Its pdf is given by
68
Tests in Mattinormals
69
$$f_X(x)=(2\pi)^{-p/2}(\det\Sigma)^{-1/2}\exp\left\{-\tfrac12\operatorname{tr}\Sigma^{-1}(x-\mu)(x-\mu)'\right\}.\qquad(4.1)$$
In what follows we denote by $\mathcal X$ a $p$-dimensional linear vector space and by $\mathcal X'$ the dual space of $\mathcal X$. The uniqueness of the normal distribution follows from the following two facts:
(a) The distribution of $X$ is completely determined by the family of distributions of $\theta'X$, $\theta\in\mathcal X'$.
(b) The mean and the variance of a normal variable completely determine its distribution.
For relevant results on univariate and multivariate normal distributions we refer to Giri (1996). We shall denote a $p$-variate normal distribution with pdf (4.1) as $N_p(\mu,\Sigma)$.
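Let $X^\alpha=(X_{\alpha1},\dots,X_{\alpha p})'$, $\alpha=1,\dots,N$, be a random sample of size $N$ from $N_p(\mu,\Sigma)$.

Problem 1. To test the null hypothesis $H_0:\mu=0$ against the alternatives $H_1:\mu\ne0$ when $\mu,\Sigma$ are unknown. This is known as Hotelling's $T^2$ problem.

Problem 2. To test the null hypothesis $H_0:\mu_1=\dots=\mu_{p_1}=0$ against the alternatives $H_1:(\mu_1,\dots,\mu_{p_1})\ne0$ when $\mu,\Sigma$ are unknown and $p>p_1$.

Problem 3. To test the null hypothesis $H_0:\mu_1=\dots=\mu_{p_1+p_2}=0$ against the alternatives $H_1:\mu_1=\dots=\mu_{p_1}=0$, $(\mu_{p_1+1},\dots,\mu_{p_1+p_2})\ne0$, with $p_1+p_2<p$.

Let $\bar X=N^{-1}\sum_{\alpha=1}^NX^\alpha$ and $S=\sum_{\alpha=1}^N(X^\alpha-\bar X)(X^\alpha-\bar X)'$. Then $(\bar X,S)$ is sufficient (minimal) for $(\mu,\Sigma)$.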
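Here $\sqrt N\bar X$ is distributed as $N_p(\sqrt N\mu,\Sigma)$ and $S$ is distributed, independently of $\bar X$, as Wishart $W_p(n,\Sigma)$ with $n=N-1$ degrees of freedom and parameter $\Sigma$. The pdf of $S$ is given by
$$f_{W_p(n,\Sigma)}(s)=\begin{cases}K(\det\Sigma)^{-n/2}(\det s)^{(n-p-1)/2}\exp\{-\tfrac12\operatorname{tr}\Sigma^{-1}s\}, & \text{if } s \text{ is positive definite},\\ 0, & \text{otherwise},\end{cases}\qquad(4.2)$$
where $K$ is the normalizing constant.

Problem 1 remains invariant under the general linear group $G_l(p)$ transforming $(\bar X,S)\to(g\bar X,gSg')$, $g\in G_l(p)$. A maximal invariant in the space of $(\bar X,S)$ under $G_l(p)$ is
$$R=N\bar X'(S+N\bar X\bar X')^{-1}\bar X=\frac{N\bar X'S^{-1}\bar X}{1+N\bar X'S^{-1}\bar X}.\qquad(4.3)$$
Since $R$ is a one-to-one increasing function of Hotelling's statistic
$$T^2=N(N-1)\bar X'S^{-1}\bar X,\qquad(4.4)$$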
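we will use the test based on $R$. From Theorem 2.8.1 the pdf of $R$ depends on $(\mu,\Sigma)$ only through $\delta_1=N\mu'\Sigma^{-1}\mu$ and is given by
$$f_R(r\mid\delta_1)=e^{-\delta_1/2}\sum_{j=0}^{\infty}\frac{(\delta_1/2)^j}{j!}\,\frac{\Gamma(\frac N2+j)}{\Gamma(\frac p2+j)\,\Gamma(\frac{N-p}2)}\,r^{p/2+j-1}(1-r)^{(N-p)/2-1},\quad 0<r<1.\qquad(4.5)$$
It may be verified that $\delta_1$ is a maximal invariant under the induced group $\bar G_l(p)=G_l(p)$ in the parametric space of $(\mu,\Sigma)$. Since $\Sigma$ is positive definite, $\delta_1=0$ if and only if $\mu=0$, and $\delta_1>0$ for $\mu\ne0$. For invariant tests with respect to $G_l(p)$, Problem 1 reduces to testing $H_0:\delta_1=0$ against the alternatives $H_1:\delta_1>0$. From (4.5) the pdf of $R$ under $H_0$ is given by
$$f_R(r\mid0)=\frac{\Gamma(\frac N2)}{\Gamma(\frac p2)\,\Gamma(\frac{N-p}2)}\,r^{p/2-1}(1-r)^{(N-p)/2-1},\quad 0<r<1,$$
and
$$\frac{f_R(r\mid\delta_1)}{f_R(r\mid0)}=e^{-\delta_1/2}\sum_{j=0}^{\infty}\frac{(\delta_1/2)^j}{j!}\,\frac{\Gamma(\frac N2+j)\,\Gamma(\frac p2)}{\Gamma(\frac N2)\,\Gamma(\frac p2+j)}\,r^j\qquad(4.6)$$
is increasing in $r$ for each $\delta_1>0$. Hence the distribution of $R$ has monotone likelihood ratio in $R$ (see Lehmann, 1959). Thus the test which rejects $H_0$ for large values of $R$ is uniformly most powerful invariant (UMPI). In terms of $T^2$ we obtain the following theorem.

Theorem 4.1.1. For Problem 1, Hotelling's $T^2$ test, which rejects $H_0$ whenever $T^2\ge c$, the constant $c$ depending on the level $\alpha$ of the test, is UMPI.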
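The density (4.5) and the monotone likelihood ratio property (4.6) can be checked numerically. The sketch below is illustrative: `r_pdf` and `likelihood_ratio` are hypothetical helper names, and the density is evaluated as a Poisson($\delta_1/2$) mixture of Beta($p/2+j$, $(N-p)/2$) densities, which is term-by-term the same as (4.5).

```python
import math

def r_pdf(r, delta, N, p, terms=200):
    """Density (4.5) of R: Poisson(delta/2) mixture of Beta(p/2 + j, (N - p)/2)."""
    b = 0.5 * (N - p)
    total = 0.0
    for j in range(terms):
        if delta == 0.0:
            if j > 0:
                break                      # central case: only the j = 0 term
            log_w = 0.0
        else:
            log_w = -0.5 * delta + j * math.log(0.5 * delta) - math.lgamma(j + 1)
        a = 0.5 * p + j
        log_beta = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                    + (a - 1.0) * math.log(r) + (b - 1.0) * math.log1p(-r))
        term = math.exp(log_w + log_beta)
        total += term
        if j > 0.5 * delta and term < 1e-17 * total:
            break                          # remaining Poisson weights are negligible
    return total

def likelihood_ratio(r, delta, N, p):
    """f_R(r | delta) / f_R(r | 0), which (4.6) says is increasing in r."""
    return r_pdf(r, delta, N, p) / r_pdf(r, 0.0, N, p)
```

For instance, with $N=12$, $p=3$ and $\delta_1=2$, the values of `likelihood_ratio` at $r=0.1,0.3,\dots,0.9$ increase, which is the property used to conclude that the $R$ test is UMPI.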
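The group $G_l(p)$ induces on the space of $(\bar X,S)$ the transformations $(\bar X,S)\to(g\bar X,gSg')$, $g\in G_l(p)$. Let $\phi(\bar X,S)$ be a test of $H_0$ whose power function depends on $(\mu,\Sigma)$ only through $\delta_1$. Since $\delta_1$ is unchanged by $(\mu,\Sigma)\to(g\mu,g\Sigma g')$,
$$E_{\mu,\Sigma}\,\phi(g\bar X,gSg')=E_{g\mu,\,g\Sigma g'}\,\phi(\bar X,S)=E_{\mu,\Sigma}\,\phi(\bar X,S)$$
for all $(\mu,\Sigma)$. Hence, $(\bar X,S)$ being complete,
$$\phi(g\bar X,gSg')=\phi(\bar X,S)$$
for all $g\in G_l(p)$, except possibly for a set of measure zero; that is, $\phi$ is almost invariant. As there exists a left invariant measure (Example 2.1.3) on $G_l(p)$ which is also right invariant, any almost invariant function is invariant (Lehmann, 1959). Hence we obtain the following theorem for Problem 1.

Theorem 4.1.2. Among all tests of $H_0:\mu=0$ with power function depending only on $\delta_1$, Hotelling's $T^2$ test is UMP.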
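The statistics of Problem 1 can be computed directly from a sample. The sketch below uses hypothetical helper names and a small Gauss-Jordan solver instead of a linear-algebra library. It computes $T^2$, $R$ and the usual $F$-scaled statistic; the scaling $(N-p)T^2/(p(N-1))\sim F_{p,N-p}$ under $H_0$ is the standard Hotelling result and is assumed here rather than stated in this section.

```python
import random

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def mean_and_scatter(xs):
    """Sample mean vector and S = sum (x - xbar)(x - xbar)'."""
    N, p = len(xs), len(xs[0])
    xbar = [sum(x[j] for x in xs) / N for j in range(p)]
    S = [[sum((x[i] - xbar[i]) * (x[j] - xbar[j]) for x in xs) for j in range(p)]
         for i in range(p)]
    return xbar, S

def hotelling(xs):
    """Return (T2, R, F) for H0: mu = 0."""
    N, p = len(xs), len(xs[0])
    xbar, S = mean_and_scatter(xs)
    q = sum(u * v for u, v in zip(xbar, solve(S, xbar)))  # xbar' S^{-1} xbar
    T2 = N * (N - 1) * q                                  # (4.4)
    R = N * q / (1.0 + N * q)                             # (4.3)
    F = (N - p) * T2 / (p * (N - 1))                      # F_{p, N-p} under H0
    return T2, R, F
```

Since $R=T^2/(N-1+T^2)$, rejecting for large $R$ and rejecting for large $T^2$ are the same test. Replacing each observation $x$ by $gx$ for a nonsingular $g$ leaves $R$ unchanged, as a maximal invariant should.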
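Problem 2. Let $T_1$ be the group of translations such that $t_1\in T_1$ translates the last $p-p_1$ components of each $X^\alpha$, $\alpha=1,\dots,N$, and let $G_1$ be the subgroup of $G_l(p)$ such that $g\in G_1$ has the form
$$g=\begin{pmatrix}g_{(11)}&0\\ g_{(21)}&g_{(22)}\end{pmatrix},$$
where $g_{(11)}$ is the $p_1\times p_1$ upper left submatrix of $g$. This problem is invariant under the affine group $(G_1,T_1)$ such that, for $g\in G_1$, $t_1\in T_1$, $(g,t_1)$ transforms $X^\alpha\to gX^\alpha+t_1$, $\alpha=1,\dots,N$. A maximal invariant in the space of $(\bar X,S)$ under $(G_1,T_1)$ is
$$R_1=N\bar X_{(1)}'\big(S_{(11)}+N\bar X_{(1)}\bar X_{(1)}'\big)^{-1}\bar X_{(1)},$$
as defined in Example 2.4.3 with $d_1=p_1$. A corresponding maximal invariant in the parametric space of $(\mu,\Sigma)$ under $(G_1,T_1)$ is $\delta_1=N\mu_{(1)}'\Sigma_{(11)}^{-1}\mu_{(1)}$. For invariant tests this problem reduces to testing $H_0:\delta_1=0$ against $H_1:\delta_1>0$. Since $R_1$ is the statistic $R$ of Problem 1 computed from the first $p_1$ components, its pdf is given by
$$f_{R_1}(r_1\mid\delta_1)=e^{-\delta_1/2}\sum_{j=0}^{\infty}\frac{(\delta_1/2)^j}{j!}\,\frac{\Gamma(\frac N2+j)}{\Gamma(\frac{p_1}2+j)\,\Gamma(\frac{N-p_1}2)}\,r_1^{p_1/2+j-1}(1-r_1)^{(N-p_1)/2-1}.\qquad(4.8)$$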
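From (4.8) it follows that the distribution of $R_1$ possesses a monotone likelihood ratio in $R_1$. Hence we get the following theorem.

Theorem 4.1.3. For Problem 2 the test which rejects $H_0$ for large values of $R_1$ is UMPI under $(G_1,T_1)$ for testing $H_0$ against $H_1$.

Note: As in Problem 1, this UMPI test is also the likelihood ratio test for Problem 2. Using the arguments of Theorem 4.1.2 we can prove from Theorem 4.1.3 that among all tests of $H_0$ with power function depending only on $\delta_1$, the test which rejects $H_0$ for large values of $R_1$ is UMP for Problem 2.

Problem 3. Let $T_2$ be the translation group which translates the last $p-p_1-p_2$ components of each $X^\alpha$, $\alpha=1,\dots,N$, and let $G_2$ be the subgroup of $G_l(p)$ of matrices
$$g=\begin{pmatrix}g_{(11)}&0&0\\ g_{(21)}&g_{(22)}&0\\ g_{(31)}&g_{(32)}&g_{(33)}\end{pmatrix},$$
where $g_{(11)}$ is $p_1\times p_1$ and $g_{(22)}$ is $p_2\times p_2$. The problem is invariant under $(G_2,T_2)$ transforming $X^\alpha\to gX^\alpha+t_2$, $\alpha=1,\dots,N$, where $g\in G_2$ and $t_2\in T_2$. A maximal invariant in the space of $(\bar X,S)$ under $(G_2,T_2)$ is $(R_1,R_2)$ as defined in Example 2.4.3 with $d_1=p_1$, $d_2=p_2$, and a corresponding maximal invariant in the parametric space is $(\delta_1,\delta_2)$, where $\delta_1=N\mu_{(1)}'\Sigma_{(11)}^{-1}\mu_{(1)}$ and $\delta_1+\delta_2=N\mu_{[2]}'\Sigma_{[22]}^{-1}\mu_{[2]}$, with $\mu_{[2]}=(\mu_1,\dots,\mu_{p_1+p_2})'$ and $\Sigma_{[22]}$ the upper left $(p_1+p_2)\times(p_1+p_2)$ submatrix of $\Sigma$. The joint pdf of $(R_1,R_2)$ is given by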
m l
, _
2 >
n  N ) r t l p i j r c ^ r t ^ p  p !  ^ ) )
'
From Giri (1977) the likelihood ratio test of this problem rejects $H_0$ whenever
$$Z=\frac{1-R_1-R_2}{1-R_1}\le c,\qquad(4.10)$$
where the constant $c$ depends on the level $\alpha$ of the test and, under $H_0$, $Z$ is distributed as central beta with parameters $(\tfrac12(N-p_1-p_2),\tfrac12p_2)$.
From (4.9) it follows that the likelihood ratio test is not UMPI. However, for fixed $p$, the likelihood ratio test is approximately optimum when the sample size $N$ is large (Wald, 1943). Thus if $p$ is not large, it seems likely that the sample sizes commonly occurring in practice will be large enough for this result to hold. However, if the dimension $p$ is large, the sample size $N$ may have to be extremely large for this result to apply. We shall now show that the likelihood ratio test is not locally best invariant (LBI) as $\delta_2\to0$. The LBI test rejects $H_0$ whenever
$$R_1+\frac{N-p_1}{p_2}R_2\ge c,\qquad(4.11)$$
where the constant $c$ depends on the level $\alpha$ of the test. Hotelling's $T^2$ test, which rejects $H_0$ whenever $R_1+R_2\ge c$ for Problem 3, does not coincide with the LBI test, and hence it is locally worse.

Definition 4.1.1 (LBI test). For testing $H_0:\theta\in\Omega_{H_0}$ against $H_1:\theta\in\Omega_{H_1}$, an invariant test $\phi^*$ of level $\alpha$ is LBI if there exists an open neighborhood $\Omega_1$ of $\Omega_{H_0}$ such that
$$E_\theta(\phi^*)\ge E_\theta(\phi),\qquad\theta\in\Omega_1\cap\Omega_{H_1},\qquad(4.12)$$
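where $\phi$ is any other invariant test of level $\alpha$.

Theorem 4.1.4. For testing $H_0:\delta_1=0,\ \delta_2=0$ against $H_1:\delta_1=0,\ \delta_2>0$, the test which rejects $H_0$ whenever
$$R_1+\frac{N-p_1}{p_2}R_2\ge c,$$
the constant $c$ depending on the level $\alpha$ of the test, is LBI as $\delta_2\to0$.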
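The statistic in Theorem 4.1.4 comes from a first-order expansion in $\delta_2$ of the ratio of the densities of $(R_1,R_2)$. The sketch below checks that expansion numerically under an assumed representation that is not stated in this section: when $\delta_1=0$, $R_1$ does not involve $\delta_2$, and given $R_1=r_1$ the ratio $R_2/(1-R_1)$ is noncentral beta $(\tfrac12p_2,\tfrac12(N-p_1-p_2))$ with noncentrality $\delta_2(1-r_1)$. The helper names are hypothetical.

```python
import math

def log_beta_fn(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def exact_ratio(r1, r2, delta2, N, p1, p2, terms=80):
    """f_{H1}/f_{H0} for (R1, R2) when delta1 = 0, under the assumed representation."""
    lam = delta2 * (1.0 - r1)            # conditional noncentrality given R1 = r1
    v = r2 / (1.0 - r1)                  # value of R2 / (1 - R1)
    a, m = 0.5 * p2, 0.5 * (N - p1 - p2)
    total = 0.0
    for j in range(terms):
        log_t = (-0.5 * lam + j * math.log(0.5 * lam) - math.lgamma(j + 1)
                 + log_beta_fn(a, m) - log_beta_fn(a + j, m) + j * math.log(v))
        total += math.exp(log_t)
    return total

def lbi_linear(r1, r2, delta2, N, p1, p2):
    """First-order expansion 1 + (delta2/2)(-1 + r1 + (N - p1)/p2 * r2)."""
    return 1.0 + 0.5 * delta2 * (-1.0 + r1 + (N - p1) / p2 * r2)
```

For small $\delta_2$ the two functions agree up to $O(\delta_2^2)$. The power of a level $\alpha$ test is therefore locally maximized by rejecting when $R_1+\frac{N-p_1}{p_2}R_2$ is large.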
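Proof. Since $(R_1,R_2)$ is a maximal invariant under $(G_2,T_2)$, it suffices to consider tests based on $(R_1,R_2)$. The ratio of the joint pdf of $(R_1,R_2)$ under $H_1$ to that under $H_0$ satisfies
$$\frac{f_{H_1}(r_1,r_2)}{f_{H_0}(r_1,r_2)}=1+\frac{\delta_2}{2}\left(-1+r_1+\frac{N-p_1}{p_2}r_2\right)+o(\delta_2)$$
as $\delta_2\to0$. Hence the power of an invariant level $\alpha$ test $\phi$ is $\alpha+\frac{\delta_2}{2}E_{H_0}\big[\phi\,(-1+R_1+\frac{N-p_1}{p_2}R_2)\big]+o(\delta_2)$, which is maximized by taking $\phi^*=1$ whenever $R_1+\frac{N-p_1}{p_2}R_2\ge c$.

Asymptotically Best Invariant (ABI) Test. Let $\phi^*$ be an invariant test for testing H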
2 0
1 whenever Ri + ~
R n
2
: $1 = 0, 6 = 0 against Hi : S\ = 0,
2
^ c } satisfying
P (R)
Hl 2
where J B ( f i , f ; A ) o ( H ( A ) ) as A > 00 and 0 < ci < R(X) < c $ is an A B I test for testing HQ against H1 as A 1 00.
A > 0, Hotelling's
whenever Ri + R
^ c, the constant c
depending on the level a of the test, is ABI as A * 00. P r o o f . Since _, , . , a a(a +1) x
2
as x < 0 0 ,
+ B(f f2,\))};
u
(4.13)
+ R
< c
) (414)
2
= exp { ^ ( c  1 ) 0 + o ( l ) ) }
Thus, from (4.13) with $H(\lambda)=\tfrac12\lambda(1-c)$, we conclude that Hotelling's $T^2$ test is ABI.

4.2. The Classification Problem (Two Populations)

Given two different $p$-dimensional populations $P_1,P_2$ characterized by the probability density functions $p_1,p_2$, respectively, and an observation $x=(x_1,\dots,x_p)'$, the problem is to classify $x$ into one of the two populations. We assume for certainty that it belongs to one of the two populations. Denote by $a_i$, $i=1,2$, the action of deciding that $x$ comes from $P_i$, and denote, for simplicity, the state of nature $P_i$ simply by $i$. Let $L(i,a_j)$ be the loss incurred by taking the action $a_j$ when $i$ is the true state of nature, having the property that
$$L(i,a_i)=0,\qquad L(i,a_j)=c_{ij}>0,\quad j\ne i=1,2.$$
Let $(\pi,1-\pi)$ be the a priori probability distribution on the states of nature. The posterior probability distribution $(\pi(x),1-\pi(x))$, given the observation $x$, is given by
$$\pi(x)=\frac{\pi p_1(x)}{\pi p_1(x)+(1-\pi)p_2(x)}.$$
The posterior expected loss of taking action $a_1$ is given by
$$\frac{(1-\pi)c_{21}\,p_2(x)}{\pi p_1(x)+(1-\pi)p_2(x)},\qquad(4.16)$$
and that of taking action $a_2$ is given by
$$\frac{\pi c_{12}\,p_1(x)}{\pi p_1(x)+(1-\pi)p_2(x)}.\qquad(4.17)$$
The Bayes rule takes action $a_1$ if
$$\pi c_{12}\,p_1(x)\ge(1-\pi)c_{21}\,p_2(x),\qquad(4.18)$$
and action $a_2$ if
$$\pi c_{12}\,p_1(x)<(1-\pi)c_{21}\,p_2(x).\qquad(4.19)$$
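(4.18) and (4.19) can be simplified as follows: take action $a_1$ if
$$\frac{p_1(x)}{p_2(x)}\ge\frac{(1-\pi)c_{21}}{\pi c_{12}}=K\ \text{(say)},\qquad(4.20)$$
and action $a_2$ if $p_1(x)/p_2(x)<K$.

Let us now specialize to the case of two $p$-variate normal populations with different means but the same covariance matrix. Assume $P_1:N_p(\xi,\Sigma)$, $P_2:N_p(\mu,\Sigma)$, where $\xi=(\xi_1,\dots,\xi_p)'$ and $\mu=(\mu_1,\dots,\mu_p)'$. Then
$$\frac{p_1(x)}{p_2(x)}=\frac{\exp\{-\frac12(x-\xi)'\Sigma^{-1}(x-\xi)\}}{\exp\{-\frac12(x-\mu)'\Sigma^{-1}(x-\mu)\}}=\exp\left\{x'\Sigma^{-1}(\xi-\mu)-\tfrac12(\xi'\Sigma^{-1}\xi-\mu'\Sigma^{-1}\mu)\right\},$$
so that the Bayes rule takes action $a_1$ if
$$x'\Sigma^{-1}(\xi-\mu)-\tfrac12(\xi'\Sigma^{-1}\xi-\mu'\Sigma^{-1}\mu)\ge\log K,$$
and action $a_2$ otherwise. The linear function $x'\Sigma^{-1}(\xi-\mu)$ is called Fisher's discriminant function. In practice $\Sigma$, $\xi$, $\mu$ are usually unknown, and we consider estimation and testing problems for $\Gamma=\Sigma^{-1}(\xi-\mu)=(\Gamma_1,\dots,\Gamma_p)'$.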
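The Bayes rule above can be written as a few lines of code when $\xi$, $\mu$ and $\Sigma^{-1}$ are known. The sketch below is illustrative; the function names are hypothetical. It evaluates the score $(x-\frac12(\xi+\mu))'\Sigma^{-1}(\xi-\mu)$, which equals $\log p_1(x)-\log p_2(x)$ because $\Sigma^{-1}$ is symmetric, and compares it with $\log K$.

```python
import math

def discriminant_score(x, xi, mu, sigma_inv):
    """(x - (xi + mu)/2)' Sigma^{-1} (xi - mu), i.e. log p1(x) - log p2(x)."""
    diff = [a - b for a, b in zip(xi, mu)]
    gamma = [sum(s * d for s, d in zip(row, diff)) for row in sigma_inv]  # Sigma^{-1}(xi - mu)
    mid = [(a + b) / 2.0 for a, b in zip(xi, mu)]
    return sum((xx - m) * g for xx, m, g in zip(x, mid, gamma))

def bayes_classify(x, xi, mu, sigma_inv, K=1.0):
    """Return 1 (take action a1) if the score is >= log K, else 2 (action a2)."""
    return 1 if discriminant_score(x, xi, mu, sigma_inv) >= math.log(K) else 2

def log_density_ratio(x, xi, mu, sigma_inv):
    """log p1(x) - log p2(x) computed directly from the two normal densities."""
    def quad(z, c):
        d = [a - b for a, b in zip(z, c)]
        return sum(d[i] * sigma_inv[i][j] * d[j]
                   for i in range(len(d)) for j in range(len(d)))
    return -0.5 * quad(x, xi) + 0.5 * quad(x, mu)
```

With equal costs and equal prior probabilities ($K=1$), the point $\xi$ is assigned to $P_1$ and the point $\mu$ to $P_2$. Increasing $K$ moves the boundary towards $\xi$.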
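Let $X^\alpha$, $\alpha=1,\dots,N_1$, be a random sample from $N_p(\xi,\Sigma)$, let $Y^\alpha$, $\alpha=1,\dots,N_2$, be a random sample from $N_p(\mu,\Sigma)$, independent of the $X^\alpha$, and let
$$S=\sum_{\alpha=1}^{N_1}(X^\alpha-\bar X)(X^\alpha-\bar X)'+\sum_{\alpha=1}^{N_2}(Y^\alpha-\bar Y)(Y^\alpha-\bar Y)',$$
where $N_1\bar X=\sum_\alpha X^\alpha$ and $N_2\bar Y=\sum_\alpha Y^\alpha$.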
We consider two testing problems about T. P r o b l e m 4. To test the hypothesis H alternatives Hi T ^ O . :r i =  r = 0 against the
:T
P 1 + 1
= = T
0 against the
i = = T ,X )
P
Without any loss of generality we restate our setup in the following canonical form: $X=(X_1,\dots,X_p)'$ is normally distributed with mean $\eta$ and positive definite covariance matrix $\Sigma$, $S$ is distributed, independently of $X$, as Wishart $W_p(n,\Sigma)$, and $\Gamma=\Sigma^{-1}\eta$. Since $\Sigma$ is positive definite, $\Gamma=0$ if and only if $\eta=0$. Thus the problem of testing the hypothesis $\Gamma=0$ against $\Gamma\ne0$ is equivalent to Problem 1 considered earlier in this chapter.

Problem 4. It remains invariant under the group $G$ of $p\times p$ nonsingular matrices $g$ of the form
9
=(m
(4.21)
\9{21)
where 9{\\) is of order p i , operating as (X S;T S)
} }
9(22) }
(gX^gSg'^g') ?^^')
where X, jf are the mean and the sample covariance matrix based on N observations on X. B y E x a m p l e 2.4.3 a maximal invariant in the space of the with di = pi, d% = P2 is given by (4.4) where the expressufficient statistic (X, S) under G is given by (Ri^R^), and p pi + P 2  T h e joint pdf of ( r ? i , R )
2 2
= AT'sr,
2
S =
with
2 2
M Nr' (x r r
22 r l (2)
1 2 )
(422)
{2)
= (E 2)  E ( 2 i ) E j
( 2
_ n )
E(
_ 1
02 0 and under the alternatives HiSi > 0, i = 1 , 2 . From ( 4 . 4 ) it follows that the probability density function of Ri under Ho is given by
m\$i)
= r(ip,)r(i(A/p,)) x(fi)^
p ,
p l )
"
QjT,  M J
Ifi^l
(4.23)
t\.
Giri (1964) has shown that the likelihood ratio test for this problem rejects
T T 7 r ^
( 4
2 4 )
where the constant c depends on the level a of the test and under H Z
0
has a
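To prove that the likelihood ratio test is uniformly most powerful similar, we first check that the family of pdfs $\{f_{R_1}(r_1\mid\delta_1):\delta_1\ge0\}$ is boundedly complete.

Definition 4.2.1 (Boundedly complete). A family of distributions $\{P_\theta(x):\theta\in\Omega\}$ of a random variable $X$, or the corresponding family of pdfs $\{p_\theta(x):\theta\in\Omega\}$, is boundedly complete if
$$E_\theta h(X)=\int h(x)\,dP_\theta(x)=\int h(x)\,p_\theta(x)\,dx=0\qquad(4.25)$$
for all $\theta\in\Omega$ and for any real valued bounded function $h(X)$ implies that $h(X)=0$ almost everywhere with respect to each of the measures $P_\theta$. We also say that $X$ is a boundedly complete statistic, or that $X$ is boundedly complete.

Lemma 4.2.1. The family of pdfs $\{f_{R_1}(r_1\mid\delta_1):\delta_1\ge0\}$ is boundedly complete.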
of Ri we get (using
= j
Hnmr^dfi
= ex {i* }
P 1
x jJo
f = exp <
l  1 ^
I^ )
where
g ( { P  P i ) , 5 ( *  P i ) + j) B ( i ( J V  p,), i
P l
+ j ) B ( i ( J V  p), i ( p 
P l
))
= 0 implies that
I
^Cn)(n) ^i=0
(4.26)
Since the righthand side of (4.26) is a polynomial in 6\, (4.26) implies that all its coefncents must be zero. In other words
/ Jo Let
h*(f\)f{dfi
=0,
j =0,1,2,....
(4.27)
where h"
and ft* denote the positive and negative parts of h'. Hence we get
from (4.27)
Jo
h(f )
l
Ri,
To find the uniformly most powerful test among all similar invariant tests we need the ratio R R = fH (fl,f \Rl
l 2
= fi) = f%)
}H {T\,T \Ri
a 2
= r
where / ^ ( f i f f ^  f i i
given Ri =
H  exp   i  ( l  f x ) } <t>
2
~ P i ) , \(P ~ P i ) ;
(4.28)

From (4.28), using the fact that $Z$ is independent of $R_1$ when $H_0$ is true, we get the following theorem.

Theorem 4.2.1. For Problem 4, the likelihood ratio test given in (4.24) is uniformly most powerful invariant similar.
Problem 5. It remains invariant under the group $G$ of $p\times p$ nonsingular matrices $g$ of the form
$$g=\begin{pmatrix}g_{(11)}&0&0\\ g_{(21)}&g_{(22)}&0\\ g_{(31)}&g_{(32)}&g_{(33)}\end{pmatrix}$$
with $g_{(11)}:p_1\times p_1$ and $g_{(22)}:p_2\times p_2$ submatrices of $g$. A maximal invariant in the space of $(\bar X,S)$ (as in Problem 4) under $G$ is $(R_1,R_2,R_3)$ as defined in Example 2.4.3 with $d_1=p_1$, $d_2=p_2$ and $d_3=p-p_1-p_2$. By Theorem 2.8.1 the joint probability density function of $R_1,R_2,R_3$ is given by
i"" ^i
l F
i(r^WJl
1
x f l  n  ^  f a j i ^  p ' "
l,
'=1
J= l
>>J
where
"  E J'
p
CTQ
' P3 = p  p i  p a
and
&1 = i V f E j i i j l ^ i ) + S
s r
( 1 2
r
i 2
) + (18)1(3) ) ' S j ~ , j
( 3
 i ' '
2 2
j E(ii)r + VE(2i)r ) + (
( 1 ) ( 1
( 1 3 
2 2
r i r ] r j + (23)r(
( 3 ) j l 3 ) ( 2
( 3 )
3 )
\ ^ J ,
, =
( E
3 3
 ( ^ ; ) ' E
. ( ^ ) , r
Under H ,
0
(4.29) it follows that Ri is sufficient for S\ under H . likelihood ratio test of this problem rejects H Z =
1 0
whenever $ c
0
1 
tii
(4.30) Z is
x
where the constant c depends on the level a of the test and under H distributed independently of H i as central beta with parameter (^(N J>2). \V2)Let <p(Ri,R ,R )
2 3
against H i . From
Lemma 4.2.1 H i is boundedly complete. T h u s <j> has Neyman Structure with respect to R .
x
not depend on c\, the condition that the level a test <j> has Neyman structure, that is, H
H D
reduces the problem to that of testing the simple hypothesis 6 = 0 against the alternatives 6% > 0 on each surface Ri f\.
2
most powerful level a invariant test of S = 0 against the simple alternative S 6 >0\s
2 2
given by,
rejectH
whenever
<h
Q(JV 
P l
), ip
z ;
^f 6%j
2
^ c,
(4.31)
(lHi)(lZ)
and $Z$ is independent of $R_1$, we get the following theorem.

Theorem 4.2.2. For Problem 5 the likelihood ratio test given in (4.30) is uniformly most powerful invariant similar.
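4.3. Test of Multiple Correlation

Let $X^\alpha$, $\alpha=1,\dots,N$, be $N$ independent normal $p$-vectors with common mean $\mu$ and common positive definite covariance matrix $\Sigma$, and let $N\bar X=\sum_\alpha X^\alpha$, $S=\sum_\alpha(X^\alpha-\bar X)(X^\alpha-\bar X)'$. Partition $\Sigma$ and $S$ as
$$\Sigma=\begin{pmatrix}\Sigma_{11}&\Sigma_{(12)}\\ \Sigma_{(21)}&\Sigma_{(22)}\end{pmatrix},\qquad S=\begin{pmatrix}S_{11}&S_{(12)}\\ S_{(21)}&S_{(22)}\end{pmatrix},$$
where $\Sigma_{11}$ and $S_{11}$ are $1\times1$, and let $\rho^2=\Sigma_{(12)}\Sigma_{(22)}^{-1}\Sigma_{(21)}/\Sigma_{11}$ be the squared multiple correlation coefficient between the first and the remaining $p-1$ components.

Problem 6. To test $H_0:\rho^2=0$ against $H_1:\rho^2>0$.

This problem remains invariant under the affine group $(G,T)$ transforming
$$(\bar X,S;\mu,\Sigma)\to(g\bar X+t,\ gSg';\ g\mu+t,\ g\Sigma g'),$$
where $g\in G$ is a $p\times p$ nonsingular matrix of the form
$$g=\begin{pmatrix}g_{11}&0\\0&g_{(22)}\end{pmatrix},$$
with $g_{11}$ of order $1\times1$, and $t\in T$ is a translation of each $X^\alpha$. A maximal invariant in the space of $(\bar X,S)$ under $(G,T)$ is
$$R^2=\frac{S_{(12)}S_{(22)}^{-1}S_{(21)}}{S_{11}},$$
which is popularly called the square of the sample multiple correlation coefficient.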
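Distribution of $R^2$. To find the distribution of $R^2$ we note (Giri, 1977) that, with $\Sigma_{11\cdot2}=\Sigma_{11}-\Sigma_{(12)}\Sigma_{(22)}^{-1}\Sigma_{(21)}$,
$$\frac{S_{11}-S_{(12)}S_{(22)}^{-1}S_{(21)}}{\Sigma_{11\cdot2}}\ \text{is distributed as}\ \chi^2_{N-p},$$
independently of $(S_{(12)},S_{(22)})$, and that, given $S_{(22)}$,
$$\frac{S_{(12)}S_{(22)}^{-1}S_{(21)}}{\Sigma_{11\cdot2}}\ \text{is distributed as}\ \chi^2_{p-1}(\lambda),\qquad\lambda=\frac{\Sigma_{(12)}\Sigma_{(22)}^{-1}S_{(22)}\Sigma_{(22)}^{-1}\Sigma_{(21)}}{\Sigma_{11\cdot2}},$$
where $\chi^2_k(\lambda)$ denotes a noncentral chi-square random variable with noncentrality parameter $\lambda$ and $k$ degrees of freedom. Also
$$\frac{\Sigma_{(12)}\Sigma_{(22)}^{-1}S_{(22)}\Sigma_{(22)}^{-1}\Sigma_{(21)}}{\Sigma_{(12)}\Sigma_{(22)}^{-1}\Sigma_{(21)}}\ \text{is distributed as chi-square}\ \chi^2_{N-1},$$
so that $\lambda$ is distributed as $\frac{\rho^2}{1-\rho^2}\chi^2_{N-1}$. Using the fact that a noncentral chi-square random variable $\chi^2_m(\lambda)$ can be represented as $\chi^2_{m+2K}$, where $K$ is a Poisson random variable with parameter $\frac12\lambda$, we conclude that
$$\frac{R^2}{1-R^2}\ \text{is distributed as}\ \frac{\chi^2_{p-1+2K}}{\chi^2_{N-p}},\qquad(4.34)$$
where the two chi-square variables are independent and $K$ is a negative binomial random variable with
$$P(K=k)=\frac{\Gamma(\frac12(N-1)+k)}{k!\,\Gamma(\frac12(N-1))}(\rho^2)^k(1-\rho^2)^{(N-1)/2},\qquad k=0,1,\dots.\qquad(4.33)$$
Simple calculations yield that
$$\frac{(N-p)R^2}{(p-1)(1-R^2)}$$
has a central $F$ distribution with parameters $(p-1,N-p)$ when $\rho^2=0$. From (4.33) and (4.34) the noncentral distribution of $R^2$ is given by
$$f_{R^2}(r^2)=\frac{(1-\rho^2)^{(N-1)/2}(r^2)^{(p-3)/2}(1-r^2)^{(N-p-2)/2}}{\Gamma(\frac12(N-1))\,\Gamma(\frac12(N-p))}\sum_{j=0}^{\infty}\frac{(\rho^2)^j(r^2)^j\,\Gamma^2(\frac12(N-1)+j)}{j!\,\Gamma(\frac12(p-1)+j)},\quad0<r^2<1.\qquad(4.35)$$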
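The quantities of this section can be computed and cross-checked numerically. In the sketch below, whose helper names are hypothetical, `sample_r2` computes $R^2$ from a scatter matrix whose leading block is $1\times1$, `f_statistic` gives the $F$-scaled form, and the density of $R^2$ is evaluated in two ways: as the negative binomial mixture of Beta$(\frac12(p-1)+k,\frac12(N-p))$ densities implied by (4.33) and (4.34), and as the series (4.35). The two evaluations should agree to rounding error.

```python
import math

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def sample_r2(S):
    """R^2 = S_(12) S_(22)^{-1} S_(21) / S_11."""
    s12 = S[0][1:]
    S22 = [row[1:] for row in S[1:]]
    return sum(u * v for u, v in zip(s12, solve(S22, s12))) / S[0][0]

def f_statistic(r2, N, p):
    """(N - p) R^2 / ((p - 1)(1 - R^2)); central F_{p-1, N-p} when rho^2 = 0."""
    return (N - p) * r2 / ((p - 1) * (1.0 - r2))

def r2_pdf(x, rho2, N, p, terms=400):
    """Density of R^2 at x as the negative binomial mixture (4.33)-(4.34)."""
    h, b = 0.5 * (N - 1), 0.5 * (N - p)
    total = 0.0
    for k in range(terms):
        if rho2 == 0.0 and k > 0:
            break
        log_w = (math.lgamma(h + k) - math.lgamma(k + 1) - math.lgamma(h)
                 + (k * math.log(rho2) if k > 0 else 0.0) + h * math.log1p(-rho2))
        a = 0.5 * (p - 1) + k
        log_beta = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                    + (a - 1.0) * math.log(x) + (b - 1.0) * math.log1p(-x))
        term = math.exp(log_w + log_beta)
        total += term
        if k > 10 and term < 1e-17 * total:
            break
    return total

def r2_pdf_series(x, rho2, N, p, terms=400):
    """The same density evaluated as the series (4.35)."""
    h, a0 = 0.5 * (N - 1), 0.5 * (p - 1)
    log_c = (h * math.log1p(-rho2) + (a0 - 1.0) * math.log(x)
             + (0.5 * (N - p) - 1.0) * math.log1p(-x)
             - math.lgamma(h) - math.lgamma(0.5 * (N - p)))
    total = 0.0
    for j in range(terms):
        if rho2 == 0.0 and j > 0:
            break
        lt = 2.0 * math.lgamma(h + j) - math.lgamma(j + 1) - math.lgamma(a0 + j)
        if j > 0:
            lt += j * math.log(rho2 * x)
        term = math.exp(log_c + lt)
        total += term
        if j > 10 and term < 1e-17 * total:
            break
    return total
```

For $p=2$, `sample_r2` reduces to the squared simple correlation $S_{12}^2/(S_{11}S_{22})$. Its value is also unchanged under $S\to gSg'$ for block-diagonal $g$, in line with the invariance argument above.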
It may be checked that a corresponding maximal invariant in the parametric space is $\rho^2$.

Theorem 4.3.1. For Problem 6 the test which rejects $H_0$ whenever $R^2\ge C$, where the constant $C$ depends on the level $\alpha$ of the test, is uniformly most powerful invariant.

Proof. From (4.35),
$$\frac{f_{R^2}(r^2\mid\rho^2)}{f_{R^2}(r^2\mid0)}=(1-\rho^2)^{(N-1)/2}\sum_{j=0}^{\infty}\frac{(\rho^2)^j(r^2)^j\,\Gamma^2(\frac12(N-1)+j)\,\Gamma(\frac12(p-1))}{j!\,\Gamma(\frac12(p-1)+j)\,\Gamma^2(\frac12(N-1))},$$
which for a given value of $\rho^2$ is an increasing function of $r^2$. Using the Neyman-Pearson Lemma we get the theorem.

As in Theorem 4.1.2 we can show that among all tests of $H_0:\rho^2=0$ against $H_1:\rho^2>0$ with power function depending only on $\rho^2$, the $R^2$ test is UMP.

4.4. Test of Multiple Correlation with Partial Information
Let $X$ be a normally distributed $p$-dimensional random column vector with mean $\mu$ and positive definite covariance matrix $\Sigma$, and let $X^\alpha$, $\alpha=1,\dots,N$ $(N>p)$, be a random sample of size $N$ from this distribution. We partition $X$ as $X=(X_1,X_{(2)}',X_{(3)}')'$, where $X_1$ is one-dimensional, $X_{(2)}$ is $p_1$-dimensional and $X_{(3)}$ is $p_2$-dimensional, with $1+p_1+p_2=p$. Let $\rho_1^2$ and $\rho^2$ denote the multiple correlation coefficients of $X_1$ with $X_{(2)}$ and with $(X_{(2)}',X_{(3)}')'$, respectively, and denote $\rho_2^2=\rho^2-\rho_1^2$. We consider here the following two testing problems:

Problem 7. To test $H_{10}:\rho_1^2=\rho_2^2=0$ against $H_{1\lambda}:\rho_1^2=\lambda>0,\ \rho_2^2=0$.

Problem 8. To test $H_{20}:\rho_1^2=\rho_2^2=0$ against $H_{2\lambda}:\rho_1^2=0,\ \rho_2^2=\lambda>0$.
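Let $N\bar X=\sum_{\alpha=1}^NX^\alpha$ and $S=\sum_{\alpha=1}^N(X^\alpha-\bar X)(X^\alpha-\bar X)'$. Let $b_{[i]}$ denote the vector consisting of the first $i$ components of a vector $b$, and let $c_{[i]}$ denote the $i\times i$ upper left submatrix of a matrix $c$. Partition $S$ and $\Sigma$ as
$$S=\begin{pmatrix}S_{11}&S_{(12)}&S_{(13)}\\ S_{(21)}&S_{(22)}&S_{(23)}\\ S_{(31)}&S_{(32)}&S_{(33)}\end{pmatrix},\qquad\Sigma=\begin{pmatrix}\Sigma_{11}&\Sigma_{(12)}&\Sigma_{(13)}\\ \Sigma_{(21)}&\Sigma_{(22)}&\Sigma_{(23)}\\ \Sigma_{(31)}&\Sigma_{(32)}&\Sigma_{(33)}\end{pmatrix},$$
where $S_{(22)}$ and $\Sigma_{(22)}$ are each of dimension $p_1\times p_1$, and $S_{(33)}$ and $\Sigma_{(33)}$ are each of dimension $p_2\times p_2$. Define
$$\rho_1^2=\frac{\Sigma_{(12)}\Sigma_{(22)}^{-1}\Sigma_{(21)}}{\Sigma_{11}},\qquad\rho_1^2+\rho_2^2=\frac{(\Sigma_{(12)},\Sigma_{(13)})\begin{pmatrix}\Sigma_{(22)}&\Sigma_{(23)}\\ \Sigma_{(32)}&\Sigma_{(33)}\end{pmatrix}^{-1}\begin{pmatrix}\Sigma_{(21)}\\ \Sigma_{(31)}\end{pmatrix}}{\Sigma_{11}},$$
and similarly
$$R_1^2=\frac{S_{(12)}S_{(22)}^{-1}S_{(21)}}{S_{11}},\qquad R_1^2+R_2^2=\frac{(S_{(12)},S_{(13)})\begin{pmatrix}S_{(22)}&S_{(23)}\\ S_{(32)}&S_{(33)}\end{pmatrix}^{-1}\begin{pmatrix}S_{(21)}\\ S_{(31)}\end{pmatrix}}{S_{11}}.$$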
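The two statistics $R_1^2$ and $R_2^2$ can be computed from $S$ by two regressions of the first component, one on $X_{(2)}$ and one on $(X_{(2)}',X_{(3)}')'$. The sketch below uses hypothetical helper names and assumes the block order $(1,p_1,p_2)$ described above. It also checks invariance under a block matrix $g=\operatorname{diag}(g_{11},\tilde g)$ with $\tilde g$ block lower-triangular, which is the kind of transformation used for this problem.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def quad_form_ratio(S, idx):
    """S_(1,idx) S_(idx,idx)^{-1} S_(idx,1) / S_11 for the index list idx."""
    s = [S[0][i] for i in idx]
    A = [[S[i][j] for j in idx] for i in idx]
    return sum(u * v for u, v in zip(s, solve(A, s))) / S[0][0]

def partial_r2(S, p1):
    """Return (R1^2, R2^2): X_(2) is components 1..p1, X_(3) is the rest."""
    p = len(S)
    r1 = quad_form_ratio(S, list(range(1, 1 + p1)))
    r_all = quad_form_ratio(S, list(range(1, p)))   # R1^2 + R2^2
    return r1, r_all - r1
```

Since $R_1^2+R_2^2$ is the squared multiple correlation on the larger set of regressors, $R_2^2\ge0$ up to rounding.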
The group of translations $X^\alpha\to X^\alpha+b$, $b\in\mathbb R^p$, leaves the present problem invariant and, along with the full linear group $G$ of $p\times p$ nonsingular matrices $g$ of the form
$$g=\begin{pmatrix}g_{11}&0&0\\0&g_{(22)}&0\\0&g_{(32)}&g_{(33)}\end{pmatrix},\qquad(4.36)$$
where $g_{11}$ is $1\times1$, $g_{(22)}$ is $p_1\times p_1$ and $g_{(33)}$ is $p_2\times p_2$, leaves the present problem invariant. The action of these transformations is to reduce the problem to one where $\mu=0$ and $S=\sum_{\alpha=1}^NX^\alpha X^{\alpha\prime}$ is sufficient for $\Sigma$, where $N$ has been reduced by one from what it was originally. We now treat the latter formulation, considering $X^\alpha$, $\alpha=1,\dots,N$ $(N\ge p>2)$, to have a common zero mean vector, with $G$ operating as $(S,\Sigma)\to(gSg',g\Sigma g')$ for the invariance of the problem. A maximal invariant in the sample space under $G$ is $(R_1^2,R_2^2)$ as defined above, where $R_1^2,R_2^2\ge0$ and $R_1^2+R_2^2=R^2$, the squared sample multiple correlation coefficient between the first and the remaining $p-1$ components of the random vector $X$. A corresponding maximal invariant in the parametric space under $G$ is $(\rho_1^2,\rho_2^2)$. The joint probability density function of $(R_1^2,R_2^2)$ is
/a(n,f )  m
2
 P r (l
N/2
~ fi  f )^  ~V
2
N p
TJfo)***
" A S S
w t * + A >
< 4
'
3 7 )
% = 1 E
P)
'
ff
i = E J'
~
.2
f)/Wii.
7.
and $K$ is the normalizing constant. By straightforward computations, the likelihood ratio test of $H_{10}$ when $\Omega=\{(\mu,\Sigma):\Sigma_{(13)}=0\}$ rejects $H_{10}$ whenever
$$R_1^2\ge c,\qquad(4.38)$$
where the constant $c$ depends on the size $\alpha$ of the test and, under $H_{10}$, $R_1^2$ has a central beta distribution with parameter $(\frac12p_1,\frac12(N-p_1))$. The likelihood ratio test of $H_{20}$ when $\Omega=\{(\mu,\Sigma):\Sigma_{(12)}=0\}$ rejects $H_{20}$ whenever
$$Z=\frac{1-R_1^2-R_2^2}{1-R_1^2}\le c,\qquad(4.39)$$
where the constant $c$ depends on the size $\alpha$ of the test and, under $H_{20}$, the random variable $Z$ is distributed independently of $R_1^2$ as central beta with parameter $(\frac12(N-p_1-p_2),\frac12p_2)$.

Theorem 4.4.1. For Problem 7 the likelihood ratio test given in (4.38) is UMP invariant.
Proof. Under ff
u
Under H
2
i0
= 1 , i = 0,1,2.
Hence a
2
= 0, i = 0, i = 1,2.
7 2
P . = A, p\ = 0, % = 1, 71 = 1  \ a
2
= 1  A,
 1  A, al = 0,
jw/2
i=0
(801
Using the Neyman-Pearson Lemma and (4.40) we get the theorem.

Theorem 4.4.2. For Problem 8 the likelihood ratio test is UMP among all invariant tests $\phi(R_1^2,R_2^2)$ based on $(R_1^2,R_2^2)$ satisfying
Proof. Hence
Under
2
p\ = 0, p\'=
2
A, % = 1 , %
 1
1, 7a = 1  A, &{  0,
a\8 = A, h = 0 and 0 = 4 f A ( l 
fiA) .
fn A 2\ l)/fH ( 2\n)
3 20
fH A l,?2)/fnio( l,r2)
1
0 0
.ra(JVp )+t)/
1
4f A
2 1
(2i)!
\ l  f
a j
= (1 z)(l
1. Prove Equations (4.3) and (4.5).
2. Prove (4.10).
3. Show that $\chi^2_m(\lambda)$ is distributed as $\chi^2_{m+2K}$, where $K$ is a Poisson random variable with parameter $\frac12\lambda$.
4. Prove (4.31).
5. Let $\pi_1,\pi_2$ be two $p$-variate normal populations with means $\mu_1$ and $\mu_2$ and the same positive definite covariance matrix $\Sigma$. Let $X=(X_1,\dots,X_p)'$ be distributed according to $\pi_1$ or $\pi_2$, and let $b=(b_1,\dots,b_p)'$ be a real vector.
(pi p )
2
= {X ,...,
al a n <
X )',
op
dently identically distributed p = 2p!variate normal random vectors with mean p = (/*!.. tUp)' Let X = jjX^X^ *
c o m m
S = ^ , ( X
 X)(X
 X)'.
Write
s =
: M(i) =
0
p ( j , with p
2
( 1 }
= N(X
2
( 1
)  X(2))'(5
( 1 1 )
( 2 
where T
is distributed as x ( S ) / ' x % P 1 _ ( ( 2 1 1 ( 1
P 1
with 6
/ V ( M ( I ) ~ r*(2])'
a P l r e
( E ( i i ) + E 22)  E independent.
(b) Show that the above test is U M P I . 7. ( G i r i , 19946). In problem 5 let T with r (a)
( 1 )
E " V 
(Ti
J' =
(T^j, Tfaf
= (r
1 [
...,r )'.
j
Show that the likelihood ratio test of HQ : T ( i ) = r ( 2 ) rejects Ho for small values of Z _ 1 + i V ( X ( i ) + X ( ) ) ( S ( n ) + 5(22) + 5(21) + 5 ( i ) ) ~ ( X ( i ) + X(2))
2 2 , 1
l +
NX'S^X \pi)
where Z is distributed as central beta with parameter {^{N pi), under Hn. (b) Show that it is U M P I similar. (c) F i n d the likelihood ratio test of H
0
: T ^ , = A F ( ) when A is known.
2
(
where E (
3 1
(ii)
E(
1 2
E(
1 3
) \
/S(it)
( 1 2
S(
1 3 
E (2i) E(3l)
U
{ 2 2
) )
E(23) I , E(
3 2
5 = I 5(2i) \ S"<31)
5(2 )
2
( 2 3 
E(
3 3
)/
2 2
5(32)
5(33)
( 3 3
) and 5 ( ) are 1 x 1, E (
j and
5(33) are pi x pi with 2pi = p 1. (a) Show that the likelihood ratio test of H large values of R\ = (5(12)5(i ))(5( ) + S ( )  S ( 3 2 ) 5(23))
3 2 2 3 3 1 0
:E
( 1 2
) = E(
1 3
) rejects H
for
(5(21) ~ 5 ( ) ) / 5 ( ) ,
3 ! U
90
Inference
j=0 where p
2
mhpi+j)
= i V ( ( 1 2 )  2(13)1(2(22) I { 3 ) 
3
( 3 2
)  E(23])~
x ((2i) 
( 3
i))/
( 1 1
).
(b) Show that no optimum invariant test exists for this problem.
References
5. N. Giri, On the Likelihood Ratio Test of A Multivariate Testing Problem II, Ann.
Math. Statist. 36, 10611065 (1965).
12. A. Wald, Tests of Statistical Hypotheses Concerning Parameters when the Num
5.0. I n t r o d u c t i o n T h e invariance principle, restricting its attention to invariant tests only, allows us to consider a subclass of the class of all available tests. Naturally a question arises, under what conditions, an optimum invariant test is also optimum among the class of all tests if such can at all be achieved. A powerful support for this comes from the celebrated unpublished work of Hunt and Stein, popularly known as the HuntStein theorem, who towards the end of Second World War proved that under certain conditions on the transformation group G , there exists an invariant test of level a which is also minimax, i.e. minimizes the maximum error of second kind (1power) among all tests. Though many proofs of this theorem have now appeared in the literature, the version of this theorem which appeared in Lehmann (1959) is probably close in spirit to that originally developed by Hunt and Stein. P i t t m a n (1939) gave intuitive reasons for the use of best invariant procedure in hypothesis testing problems concerning location and scale parameters. Wald (1939) had the idea that for certain nonsequential location parameter estimation problems under certain restrictions on the group there exists an invariant estimator which is minimax. Peisakoff (1950) in his P h . D . thesis pointed out that there seems to be a locuna in Wald's proof and he gave a general development of the theory of minimax decision procedures invariant under transformation group. Kiefer (1957) proved an analogue of the HuntStein theorem for the continuous and
91
92
Group Invariance
in Statistical
Inference
discrete sequential decision problems and extended this theorem to other decision problems. Wesler (1959) generalized for modified minimax tests based on slices of the parametric space. It is wellknown that for statistical inference problems we can, without any loss of generality, characterize statistical tests as functions of sufficient statistic instead of sample observations. Such a characterization introduces considerable simplifications to the sample space without loosing any information concerning the problem at hand. Though such a characterization in terms of maximal invariant is too strong a result to expect, the HuntStein theorem has made considerable contribution towards that direction. T h e HuntStein theorem gives conditions on the transformation groups such that given any test <f> for the problem of testing H : 8 f l / / against the alternatives K : $ fl/f, with fi// n fi/f a null set, there exists an invariant test sup E (f>> eefi
S e
In other words, V behaves at least as good as any <f> in the worst possible cases. We shall present only the statements of this theorem. For a detailed discussion and a proof the reader is referred to Lehmann (1959) or Ghosh (1967). L e t V {Pg,0 clidean space (X,A), e f!} be a dominated family of distributions on the E u dominated by a crfinite measure p. Let G be the group
of transformations, operating from the left on X, leave fi invariant. T h e o r e m 5.0.1. (HuntStein Theorem). Let B be a afield of subsets of with gx A is in A x B right invariant in that there exists a sequence of
G such that for any A E A, the set of pairs (x,g) and for any B B, g G G , Bg e B. Suppose distribution functions v
n
on ( G , B) which is asymptotically
 v(B)\
= 0.
1
(5.3)
T h e n , given any test <j>, there exists a test V which is almost invariant and satisfies conditions (5.1) and (5.2). It is a remarkable feature of this theorem that its assumptions have nothing to do with the statistical aspects of the problem and they involve only the group G . However, for the problem of admissibility of statistical tests the situation is more complicated. If G is a finite or a locally compact group the best invariant test is admissible. For other groups the nature of V plays a dominant role.
Some Minimax
Test in Muttinormates
93
T h e proof of Theorem 5.0.1 is straightforward if G is a finite group. L e t m denote the number of elements of G. We define
As observed in Chapter 2, invariant measures exist for many groups and they are essentially unique. B u t frequently they are not finite and as a result they cannot be taken as a probability measure. We have shown in Chapter 2 that on the group 0{p) of orthogonal matrices of order p an invariant probability measure exists and this group satisfies the conditions of the HuntStein theorem. T h e group GT(P) of nonsingular lower triangular matrices of order p also satisfies the conditions of this theorem (see Lehmann, 1959, p. 345). 5.1. L o c a l l y M i n i m a x T e s t s Let (X,A) be a measurable space. For each point (5, JJ) in the parametric with
space f!, where 6 > 0 and ij is of arbitrary dimension and its range may depend on i5, suppose that p{; 6,n) is a probability density function on (3Z,A) respect to some ufinite measure p . We are interested in testing at level a (0 < a < 1) the hypothesis HQ : 6 0 against the alternative Hi : 6 = A, where A is a positive specified canstant and in giving a sufficient condition for a test to be locally minimax in the sense of (5.7) below. T h i s is a local theory in sense that p(x;A, JJ) is close to p(x;o, 7?) when A is small. Obviously, then, every test of level a would be locally minimax in the sense of trivial criterion obtained by not substracting a in the numerator and the denominator of (5.7). It may be remarked that our method of proof of (5.7) consists merely of considering local power behaviour with sufficient accuracy to obtain an approximate version of the classical result that a Bayes procedure with constant risk is minimax. A result of this type can be proved under various possible types of conditions of which we choose a form which is more convenient in many applications and stating other possible generalizations and simplifications as remarks. Throughout this section expressions like o(A), o(/i(A)) are to be interpreted as A > 0. For fixed a , 0 < a < 1 we shall consider critical regions of the form (5.4) where V is bounded and positive and has a continuous distribution function for each {S,Tj), equicontinuous in (6,7)} for 6 < some 6$ and which satisfies
94
Group Invariance
in Statistical
Inference
(5.5) P , (R)
x v
= a +
k(X)+q(X, )
V
= o{h(X))
uniformly in v with h(X) > 0 for A > 0 and h(X) = probability density functions <J,A
{ f r o i f ^
=
2
1 +
WW)
+ r(Xmm
B( ,
X
A)
(5.6)
uniformly in x.
T h e o r e m 5.1.1.
(Locally
Minimax
(5.5)
and for sufficiently small X there exist o,i. and fa^ satisfying that is to say,
(5.6) then R is
lim i  o Sup0
t _
tu/iere Q
+ c
r(A)}]
 1
r(X)}.
(5.8)
Using (5.7) and (5.8) the Bayes critical region relative to the a priori distribution
\ = (1  T ) O , X + n
A
95
Pi,*(A)
Px,vW)
6 , A W ,
B\, (5.10)
= B
 R .
o(A)
and the continuity assumption on the distribution function of U we get Plx(Vx Also for U
x
+ W )=o(X).
x
(5.11)
= V
Let r* (A) = {1
x
 n) ^, 4
A
n ( l  pfc(jf)).
Using (5.9), (5.10) and (5.11) the integrated Bayes risk relative to $\ is given by r'x(Bx) = rl(R)
,
+ (1  n ) [ P ' , ( ^ )  P * ( V ) ]
0 x 0 A A 1 A A
^O%(VX + W A ) O ( M A ) )
= r ( R ) + o(/ (A)).
A l
(5.13)
If (5.7) were false one could by (5.5) find a family of tests { 0 > J of level a such that d> has power function cv + g(X, ij) on the set 6 A with
x
> 0.
lim s u K ( i i )  r , ] / h ( A ) > 0 ,
P
A0
contradicting (5.13).
96
Group invariance
in Statistical
Inference
Remarks. (1) Let the set {6 = 0} be a single point and the set {8 = A } be a convex finite dimensional Euclidean set where in each component $ of TJ is
00 (A)) If
P ? ' ^ = l + h(X) U(x) + T, p(ar;0,Jj) where s
1
(5.14)
if
exists any
which
assigns all its measure to the mean of I,A also satisfies (5.6). (2) T h e assumption on B can be weakened to Px,{\B(x, as A  * 0 uniformly in n for each e > 0. If the A )  < e k(X)}  0 i are independent of
A the uniformity of the last condition is unnecessary. T h e boundedness of U and the equicontinuity of the distribution of U can be similarly weakened. (3) T h e conclusion of Theorem 5.1.1 also holds if Q is modified to include every family {<j>x} of tests of level a + o(h{X)). consider the optimality of the family {Ux} replacing R by Rx with qxiv) o(h{X)). (4) We can refine (5.7) by including one or more error terms. Specifically one may be interested to know if a level a critical region R which satisfies (5.7) with inf Px, {R)
n
{x : U {x)
x
> c ,x},
a
where P {Rx}
= a + ciA + o(A),
as A  > 0
also satisfies
l i m
mUPxMRladX o sup^
e
 a  c,X distribution
In the setting of (5.14) this involves two moments of the a priori more moments are brought in.
I , A rather than just one in Theorem 5.1.1. A s further refinements are invoked The theory of locally minimax test as developed above and the theory of asymptotically minimax test (far in distance from the null hypothesis) to be developed later in this chapter serve two purposes. F i r s t the obvious point of demonstrating such properties for their own sake. B u t well known valid doubts
Some Minimax
Test in Multinormales
97
have been raised as to meaningfulness of such properties. Secondly, then, and in our opinion more important, these properties can give an indication of what to look for in the way of genuine minimax or admissibility property of certain tests, even though the later do not follow from the local or the asymptotic properties.
Write NX
=
0
X 5 
f (X; NX'{S
 X)'.
Let 6 =
l
> 0. T
2
For testing H
whenever R 
+ NXX')~ X
> c, where c is
chosen to yield level a, is U M P I (Theorem 4.1.1). N o t e . For notational convenience we are writing R\ as R. L e t 6 A > 0 (specified). We are concerned here to find the locally We minimax test of Hn against Hj : S = A as A  > 0 in the sense of (5.7).
assume that N > p, since it is easily shown that the denominator of (5.7) is zero in the degenerate case N < p. In our search for locally minimax test as A > 0 we may restrict attention to the space of sufficient statistic ( X , S). T h e general linear group G)(p) of nousingular matrix of order p operating as ( , s ; p , E ) . (gx,gsg';gp, gT,g')
leaves this problem invariant. However, as discussed earlier (see also James and Stein, 1960, p. 376), the HuntStein theorem cannot be applied to the group G ( p ) , p > 2. However this theorem does apply to the subgroup G j  ( p )
(
of nonsingular lower triangular matrices of order p. T h u s , for each A, there is a level a test which is almost invariant and hence for this problem which is invariant under Grip) (see Lehmann, 1959, p. 225) and which minimizes, among all level a tests, the minimum power under H i . From the local point of view, the denominator of (5.7) remains unchanged by the restriction to GT invariant tests and for any level a test d> there is a G j invariant level a test 4>' for which the expression i n f , P x , , [4>' rejects H )
a
is at least as large, so
that a procedure which is locally minimax among GT invariant level a tests, is locally minimax among all level a tests. I n the place of onedimensional maximal invariant R under G j ( p ) we obtain a pdimensional maximal invariant (Ri,...,R )
p
98
Group Invariance
in Statistical
Inference
J2 i=]
Rj = NX{ {S
A
+ N X ^ r ' X ^
, i = 1,..  ,p
(5.16)
with R, > 0, ] ?
= l
flj  J? < 1 and the corresponding maximal invariant on 6 ) (Theorem 2.8.1) where
P
(5.17)
5.
i) = 0.
From
on the set
i n , . . . , r ) : r , > 0,
p
JPfl <
is given by
/(ri,...,r) =
x exp
yj , /JV  + 1
i 2'
nSi
2
n c e
(5.18)
We now verify the validity of Theorem 2.8.1 for U Yl^=i ^  s i preceding (5.5) for the locally minimax tests are obvious. fi(A) = b\ with 6 a positive constant. Of course P\,(R) 77. From (5.18) we get with r = ( r i , . . . r ) ' ,
p
those
In (5.5) we take
2 o(A) uniformly in r , n as A * 0.
0  A
B(J,X,V)
(519)
satisfied by letting
gives measure one to the single point rj = TP* = (TJ", . . . T?;  (N  j)~ (N
L
 j + l r v ^ o v  p),
Some Minimax
Tesi in MultinoTmales
99
> p) Hotelling's
test based on
alternatives Hi : 6 = X as X * 0. E x a m p l e 5.1.3. Consider Problem 2 of Chapter 4. From E x a m p l e 5.1.2 it follows that the maximal invariant in the sample space is ( J ? i , . . . , R )
P1
with
Ri
=
P
a n
corresponding maximal invariant in the parametric space Under Ho, Si = 0 and under Hi 6i X.
Now following E x a m p l e 5.1.2 we prove Theorem 5.1.3. T h e o r e m 5.1.3. For every pi,N,a large values of R\ is locally minimax alternative Hi : 6\ X as X * 0. E x a m p l e 5.1.4. Consider Problem 3 of Chapter 4. We are concerned here to find the locally minimax test of Ho : Si = 0, S = 0 against the alternatives
2
the UMPi
Hi
: 8i = 0, 6
X > 0. A s usual we assume that N > p the determinator To find the locally minimax test we restrict attention S).
2
T h e group (G2,T )
2
operating as
does not satisfy the condition of the HuntStein theorem. However this theorem holds for the subgroup (G {p\
T
+ p ) , T ).
2 2
Pl
such that
^R^NX'u
(SM + NXMX ) X ,
{i] [{l
i = l,...,pi
+ P
(5.20)
with Ri >0,Ri=
Rj, Ri+R
= E^Li"
R,
3
such that
(5.21)
= E j L i 6j, h + 6 =
2
5,. Under H
100
Croup Invariance
in Statistical
Inference
w
P i + Pa with E f ^ + i * =
 l + f,!/o,(r) 2
^ r j . + f/V.j + l ) ^
+ (r,n,A) (5.22)
A (fixed)} Is a
Euclidean set wherein each component jj; is o(/i(A)), can be replaced by the degenerate measure
n P2 A
f &,*)
/
I
fx, (r)
n
ux(dv)
fo.n(r)
=I
+ ^  i + f i + "E *(l>+Cwi+i>%J
\iiAdn)
+ B(r,A,7j)
= 1 + L
 l + f , +
J = P 1 +
^ ( ^ i t f + t W  j '
V , > J
+ l h n
/ J
+B(r,A) (5.23)
where B{r, A) o(/i(A)} uniformly in r . Let il* be the rejection region defined by Rk = {X : U(X) = J * , + kR
2
> C }
Q
(5.24)
where fc is chosen such that (5.23) is reduced to yield (5.6) and the constant C
a
depends on the level a of the test for the chosen fc. Choose (jyji)...(/VP l
p )
3
(A Pi)
P2 (iV  p ,  P 2 + 1)
101
j = pi + 1, . . . , p i + p n _ (^Pi)
 1,
( 5 2 5 1
so that
V M ^  J + l l ^ ^^, J =
T h e test 0* with rejection region
N  p ,
P i + 1,
  ,pl + P 2
JT = j * :
U(X) = % +
> C
(5.26)
with Pa,\{R*)
Theorem 4 .1 .4, for any invariant region R*, Px (R*) test based on Ri + R
is locally worse. It is easy to verify that the power function of Hotelling's test which depends only on 03, 6\ being zero, has positive derivative everywhere, in particular, at 8
2
get the following theorem. Theorem 5 .1 .4. For Problem 3 the L B I test 0* is locally minimax as A * 0. R e m a r k s . Hotelling's T
2 2 2
ant under (G , T ) and therefore thier power functions are functions of 6 only. From the above theorem it follows that neither of these two tests maximises the derivative of the power function at S = 0. So Hotelling's T
2 2
likelihood ratio test are not locally minimax for this problem. E x a m p l e 5.1.5. ( G , T), transforming ( X , S; p, S ) (gX + t, gSg'; ga + t,
2
T h e affine group
gBg') 0 again H\ : p
2
A > 0 (specified) invariant. However the subgroup ( G T ( P ) , T ) , where Gj(p) is the multiplicative group of nonsingular lower triangular matrices of order p whose first column contains only zeros expect for the first element, satisfies HuntStein conditions. T h e action of the translation group T is to reduce the
102
by one from what it was originally. We treat the latter formulation considering to have zero mean and positive definite covariance matrix E and for invariance. A maximal invariant in the (R ,
2
 , Rp)', where S
( 2 ] ) [ i
_ S )[i] i=i
5 ~
2 ) [ i ]
5(H)
where C[i] denotes the upper lefthand corner submatrix of C of order i and tyi] denotes the ivector consisting of the first i components of a vector 6 and
\ (21)
3(22)7
>p
which implies that 5 is positive definite with probability one. Hence Ri > 0, = U = ^2* Rj
2
1
(R
coefficient).
2 P
A = (6 ,...,S Y,
5 > i2
.2 _ V P with 6, > 0, p = E ?=2 >2 5 L e t
(5.28)
T h e joint distribution of J ? , . . . , R
2
( l  X ) ^ ( l  T , ^ r i ) nHrDr(l(Np+ I)) ( l + E
P = 2
rjiCl  A ) /
7 i
 1])
x fi
J=2
+ i)^  r(l{Ni
N +2]
+2))
ft,=D
/3 =D
P
\j=2
r ( i ( j v  i + 2) +
3=2 L
ft;
4T (lA)/
J
7 j
(l+7r
10
(5.29)
103
T h e expression 1 / ( 1 + TT^
means 0 if
+ B ( r , A , ) , (5.30)
/
/o,o(r)
3=2
\ i>J
where J(j", A,T?) o ( A ) uniformly in r, 77. From Theorem 4.3.1 it follows that the assumption of Theorem 5.1.1 are satisfied with U = letting i j + 2)
 1  A _ 1 P = 2
^j = R
give measure one to n" = ( ) ) , . ... ,1(2} where jjj (JV j + l j ' ( W
the following theorem. T h e o r e m 5 . 1 . 5 . F o r ever?/ p,N and a the R? test which rejects H coefficient R
2 2
for
large values of the squared sample multiple correlation minimax for testing HQ against Hi : p Remarks. = A > 0.
is locally
form. (5.18) involves the ratio of noncentral to central chisquare distribution while (5.29) involves similar ratio with random noncentrality parameter. T h e first order terms in the expressions which involve only mathematical expectations of these quantities correspond each other. T h e group G
E x a m p l e 5.1.5.
as defined there does not satisfy the condition of the HuntStein theorem. However this theorem does apply to the solvable subgroup GT of p x p lower triangular matrices g of the form (9u 0 9 = V 0 0 0 0 \ (5.31) Spp/
322
92p From E x a m p l e 5.1.4 a maximal invariant in the sample space under GT is R ( i ? , . . . , R ) ' and the corresponding maximal invariant in the parametric
2 p
From (5.29) R has a single distribution under and on the pdimension parameter r
HIQ and H o and it has a distribution which depends continuously on the p i dimension parameter Fix under Hi\ under H ,
2X 2A
where
104
Inference
IX
= { A :tf<> 0, t = 2 , . . . , p i + l ,
PI+I
Si=0,i
= pi
+ 2,...,p,
;=2
= pf = A } , i pi (5.32)
r > = { A : tfi = 0, t = 2 , . . . , p i + l ,
2
6i>0,
+ 2,...,p,
g
P 1
ft
= 4
= A>.
i= +2
Let
with
tfi/p , 0,
if p > 0 if p = 0 .
2
Because of the compactness of the reduced parameter space { 0 } and and the continuity of
1A
fx,n( )
minimax test for the reduced problem in terms of the maximal invariant R is Bayes. Thus any test based on R with constant power on TJX is minimax for problem 7 if and only if it is Bayes. From (5.29) as A ~* 0 we get (for testing H
10
against
Hi\)
PI+I
j=2
,i>j
+ B(r,A,ij)
(5.33)
where S ( r , A, 17) o(A) uniformly in r and r>. It is obvious that the assumption of Theorem 5.1.1 is satisfied with
PI+I
U=
J^Rj
j=2
= Ri,
h(\)
b\
where b > 0. T h e set p 0 is a single point n = 0, so rjj assigns measure one to the point T? 0. T h e set Fix is a convex p\dimensional component r>i o(h(\)).
0 0
can be replaced by the degenerate measure * which assigns the probability one t o m e a n r i  ( J? ,. . . , J 7 , , ) o f i * . C h o o s i n g *
+1 A
= (Nj
105
p + l){Npi),
follows from Theorem 4.4.1. Hence we prove the following theorem. T h e o r e m 5 . 1 . 5 . For Problem 7 the likelihood ratio test of Hit H\\ as X > 0. H x)
2
defined in
(4.38), is locally minimax against the alternatives In problem 8 as X > 0 (for testing H Q against
2
A,(r)
.V A
/o,T,(r)
 i + n + Y,
i(S'K+(
i V
.. i+ H
2
+5(r,A,7i),
(5.34) where B ( r , A, JJ) o(/i(A)) uniformly i n r, TJ. Since the set I ^ A is a p2dimensional Euclidean set wherein each component r/, o(h(X)), argument as in problem 7 we can write using the same
/
I
h,n( )^o,x(dv)
r
1
+ f, +
YI >(
r
3 +
2)%
j=pi+i
1 +
JVA
 i + f,+ 2 ^(Sf+c^j'+iM
=
+ B(r,A) (5.35)
where B ( r , A )
+ 2
to
<)'
> c}
a Q
(5.36) depends on
the level of significance a of the test for the chosen k. Now choosing
(Np + 2)(Np+l)
( N  p + 2)p
106
Inference
B* =
U(x) = ft + ^  ' 2
> c
j
k
with P O , A ( W * ) = Q satisfies (5.6) as A  0. Moreover any region R form (5.36) must have k =
2
of the From
(4.37) it is easy to conclude that the test 4>' is L B I for problem 8 as A 0. T h e iE test which rejects H20 whenever r test depends only on p at p
2 2
f i +f
> c
and is locally worse. From Sec. 4.3 it is evident that the power function of R? and has positive derivatives everywhere in particular > 0. So we get the following theorem. = 0. From (5.5) with R R*, h{\)
whenever
f 1 + ~f
> c
where c
= p\ = A,
"minimax" simply means "most powerful". T h e rejection region of the most powerful test is obtained from (5.34), from which it is easy to conclude that [/A ,7j('")//o,n(r)] depends nontrivially on A so that no test can be most powerful for every value of a. 5.2. A s y m p t o t i c a l l y M i n i m a x T e s t s We treat here the setting of (5.1) when A > 00. o(H(\)) Expressions like o ( l ) ,
imizing a probability of error which tends to zero. T h e reader, familiar with the large sample theory, may recall that in this setting it is difficult to compare directly approximations to such small probabilities for different families of tests and one instead compares their logarithms. While our considerations are asymptotic in a sense not involving sample sizes, we encounter the same difficulty which accounts for the form (5.39) below. Assume that the region R={x:U(x)>c }
a
(5.37)
107
P (R)
x>n
= = 1  e x p {  i r ( A ) (1 + o ( l ) ) }
(5.38)
= e x p { f f ( A ) [ G ( A ) + R(\)U{x)}
0 A
+ B(af,A)}
(5.39)
< oo.
One other
distribution function of U when 6 0 uniformly in n, i.e. infPo(E/ > c for every e > 0. T h e o r e m 5 . 2 . 1 . / / U satisfies (5.38)(5.40) and for sufficiently there exist , A and i
0 ( A a
 e ) > a
(5.40)
large A
satisfying minimax
(5.39), then the rejection region R is asympof level a for testing Ho 6 = 0 against the
totically logarithmically
inf,[log(lP ,,{*})]
A A e
A  sup^
P r o o f . Assume that (5.41) does not hold. T h e n there exists an e > 0 and an unbounded sequence V of values A with corresponding tests d>\ in Q critical region satisfies P , {R}
X V a
whose
(5.42)
for all IJ . There are two possible cases, (5.43) and (5.46). If A T and  1  ( 7 ( A ) < R(X) c
a
+ 2e,
(5.43) and r
A
consider the a priori distribution A (see Theorem 5.1.1) given by satisfying r /(l A n
) = e x p { i f ( A ) (1 + 4e)} .
A
(5.44) must
Using (5.42) and (5.44) the integrated risk of any Bayes procedure f satisfy rl(Bx) < rl(M < (1  r )a
x
+ r e x p {  J / ( A ) (1 + 5e)}
A
(1 
Tx
)\a
+ exp{e H(A)}].
(5.45)
108
Inference
= {x : U(x) + B(x,X)/R(X)H(X)
Hence, if X is so large that B{x,X) sup we conclude from (5.43) that B D{x:
x
H(X)R(X)
U(x) > c
!V
e/c } =
2
B' (sa.y).
x
T h e assumption (5.40) implies that Po {B' } (5.45) for large A. On the other hand if A e V and lG(A)>R C
A Q
+2e,
a
given by * j , and T
. C r7, so
Thus, if s u p I R(X^BIX) I */2c2 we conclude from (5.46) that B\ that, by (5.37) and (5.47), r*(Bx) > nexp{H(X)[l
A
o(l)}}
0
(5.48)
= ( l  r ) [ o + exp{4e
A
H(X)}], = J % .
we contradict (5.48) for sufficiently large A. E x a m p l e 5.2.1. ( T  t e s t ) In Problem 1 of Chapter 4 let U(X) Since 0 ( a , 6 ; x ) = e x p { x ( l + o ( l ) ) } as x * co we get, using (5.18),
2
+
J= l i>j
(5.49)
109
with Sup
= 1, 7)i 0,
< c}
a
= exp{(
C a
 1)[1 + o ( l ) ] } =  ( 1 c ).
a 5
as A oo. i
i A
" Vpi
0 Vp 1
O,A From
assign measure one to (0,0) we get (5.39). Theorem 5.2.1 we get Theorem 5.2.2. T h e o r e m 5.2.2. For every p,N,a,
Hotelling's
test is
asymptotically
minimax for testing HQ .6 0 against H\ : 6 A as A > co. From Sec. (5.1) we conclude that no critical region of the form ai R{ > c other than Hotelling's would have been locally minimax, many regions of this form are asymptotically minimax. T h e o r e m 5 . 2 . 3 . c < 1 and 1 a < a < < a ,
p
region
{ tti Ri > c } is asymptotically minimax among tests of same size as A > 00. P r o o f . T h e maximum of V f j 2~2i>j Vi subject to at r\ c, T2 r
p
0.
region near that point yields (5.50) with c replaced by c. with U Yl{aiRi. Again (5.40) is trivial.
nondecreasing in j it follows from (5.49) that I,A can be chosen to yield (5.39)
E x a m p l e 5.2.2.
R )',
Pl
r, = (n,,...
,n )'
Pl
we get
e x
P S~
^E^E*
( l + B(r,A,>)))
(5.51)
above theorem we now prove the following. T h e o r e m 5.2.3. For every p i , N, a the UMPI imax for testing HQ : 61 0 against Hi :6 = \as\* E x a m p l e 5.2.3. test is asymptotically 00. min
: 61 0,
110
Inference
62 = A as A co.
r 1 r f
/o,n(r)
where sup,.,, \B(r,A,n)
 l + fi +
iK
(5.52)
00,
/Cl .falO)
where s u p , . ^ \B(f ,f ,\)\
t 2
= exp
^[l
+ f i + f ] ( l + B(fi , f . A ) ) j
2
(5.53)
= o ( l ) as A * 0.
A,,(rKi.x(dij)
 l + f,+
>=Pi + l
r, 2 *
i>j
.?.A))
(5.54)
given by (5.55)
k
c}
a
where k is chosen such that (5.54) is reduced to yield (5.39) and R chosen k satisfies (5.38) and (5.40). Now letting 1 / 1 = = T) 7
P l + P )
for the _i = 0, R.
2
P l + p l
fa
 1)(1 + o ( l ) ) j .
2
(5.56) satisfies
(5.38) with H(X) = f (1  c' ). T h e fact that Hotelling's test satisfies (3.39) is trivial. Since the coefficient of r
p i +
with k /
1 and which
111
satisfies (5.38) and (5.40) cannot be minimax as A co. Using Theorem 4.1.5 we now prove the following:
best invariant
T
is
0,
R e m a r k s . From Theorem 5.2.3 it is obvious that there are other asymptotically minimax tests, not of the form Rt, for this problem. It is easy to see that P {R
Xin
(lc) R >
2
1) = l  e x {  ( l o g A  l o g ( ( l  c ) ( l + c ( l ) ) ) } ,
P
P a
'" (
^ r
=1
{ ~
~/l*fi *' 5
1
ft
and the locally minimax test satisfy (5.40) and (5.38) is obvious
in both cases. B u t from Theorem 5.2.4 these two tests are not assymptotically minimax for this problem. 5.3. M i n i m a x T e s t s In this section we discuss the genuine minimax character of Hotelling's T R.
2 2
and the test based on the square of the sample multiple correlation coefficient These problems have remained elusive even in the simplest case of p 2 or
2
test by a method which could not be used to prove the admissibility character of R  t e s t . G i r i , Kiefer and Stein (1964a) attacked the problem of minimax property of Hotelling's T  t e s t among all level a tests and proved its minimax propery for the very special case of p 2, N 3 by solving a Fredholm integral equation of first kind which is transformed into an "overdetermined" linear differential equation of first order. Linnik, Pliss and Salaeveski (1969) extended this result to N 4 and p 2 using a slightly more complicated argument to construct an overdetermined boundary problem with linear differential operator. Later Salaveski (1969) extended this result to the general case of p = 2.
2
112
Inference
5.3.1. Hotelling's
test
In the setting of Examples 5.1.1 and 5.2.1 we give first the details of the proof of the minimax property of Hotelling's T
2
test as developed by
G i r i , Kiefer and Stein (1964a), then give the method of proof by Lunnik, Pliss and Salaeveski (1966). In this setting a maximal invariant under G j f p ) is R =
p
(R ,... R )'
1 } P  1
with
= NX'(S
+ NXXy^X
and
5,>0,
= Let us write
r i n
as the pdf of
reduced parametric spaces { 0 } and T and the continuity of f&( ) in terms of R is Bayes. In particular Hotelling's T
1 2
conclude from Wald (1950) that every minimax test for the reduced problem test which rejects He, whenever U = J R\ > c, which is also G r  i n variant and has a constant power function on each contour Np'T,~ p H\ constant Ci \m 1 = ] *t (5.57) = X, maximizes the minimum power over if and only if there is a probability measure on T such that for some
'r / o t according as
<
= > c
K< J
except possibly for a set of measure zero where c depends on the specified level of significance a and c i may depend on c and the specified X. A n examination of the integrand of (5.57) allows us to replace it by its equivalent
r Jo\ )
d A ) = c,
if
Y >  e . ,
(5.58)
Obviously (5.57) implies (5.58). O n the other hand, if there are a and a constant C[ for which (5.58) is satisfied and if r* = ( r j , . . . , r ' ) ' is such that i r j = c' > c, then writing / = ^ and r" = ft* we conclude that
Some Minimax
Test in Multinormales
113
because of the form of / and the fact that c'/c > 1 and
r** = f E i t
This and similar argument for the case c' < c show that (5.58) implies (5.57). Of course we do not assert that the lefthand side of (5.58) still depends only on 2%n if
p
n / c.
The computation is somewhat simplified by the fact that for fixed c and A we can at this point compute the unique value of c i for which (5.58) can possibly be satisfied. Let R = (Ri,...,
p
= i Ri it with respect to the Lebesgue measure, which is continuous in f and u with U = E i < it < 1. which is continuous for 0 < u < 1 and vanishes elsewhere and
j /AtfMtfdA) =
1
fo(f\c) /;*<<=)
(5.59)
if Ti > 0, E i
<
 The
fi ^
an
Hence the expression inside square brakets of (5.59) is equal to one. From (4.5)
r(f) . ^ )
r( V ) _ ^ 
i  ,
' ^ , E  ) . .
{ 5
6 0 )
, / J V /j
cA
J r
i>3
>> = 1
r c
'2' 2 i
'
'12'!'
2
(5.61)
114
Inference
J r
>
I )=1
i>j
fe=l
'
(  > 'i = 7
a i m
(h
t ) with g j f f t = 7P
From (5.62) it is evident that such *, if it exists, depends on c and A only through their product 7 . Note that for p 1, T i is a single point but the dependence on 7 in other cases is genuine. C a s e p = 2, N = 3. S o l u t i o n of G i r i , K i e f e r a n d S t e i n Since 0 (  , (1 + x)exp(^x), from (5.62) we obtain
jf [i + ( 7  
ft)]<TO)
= ^ " ^ ( f , i ; (5.63)
with fi = 7  f , A = 1  / 3 . We could now try to solve (5.63) for by using 1;. 1 . the theory of Meijer transform with kernal ^ ( 1 , 4 hx). Instead we expand both 21 2' sides of (5.63) as power series in f . Let
2 2 2
Jo
(a)
(b)
l+t*m=B,
(2r 1) p _ ! +
r y
(2r + 7 W
7 ^+1 = B
T()r(i)J
where B = e~^ d>(^, 1; 7 ) . We could now try to show that the sequence { p , } given by (5.64) satisfies the classical necessary and sufficient condition for it to be the moment sequence of a probability measure on [0,1] or, equivalently, that the Laplace transform
GO
=0
Some Minimal
Test in Muttinormales
115
is completely monotone on [0, oo), but we have been unable to proceed successfully in this way Instead, we shall obtain a function m (x)
1
which we
of an absolutely continuous
M t )
= B t [ ( l  t r = S ( ( l  t ) "
 l ] + 7 [ t ( l  / 0  l ]
2
 (  7 .
(5.65)
T h i s is solved by treatment of the corresponding homogeneous equation and by variation of parameter to yield f J
0
m n
'
(li)
zl [3T#Ift*
2
2T (lT) /2
2 1
2T(1T)J
'
(5.66) the integration being understood to start from the origin along the negative real axis of the complex plane. T h e constant of integration has been chosen to make VJ continuous at 0 with VJ(0) = I , and (5.66) defines a singlevalued function on the complex plane minus a cut along the real axis from 1 to oo. The analyticity of ip on this region can easily be demonstrated by considering the integral of ip on a closed curve about 0 avoiding 0 and the cut, making the inversion w =  , shrinking the path down to the cut 0 < w < 1 and using (5.67) below. Now, if there existed an absolutely continuous * whose suitably regular derivative m
7
satisfied
j m ( a ; ) / ( l  tx)dx = i>(t), Jo
7
(5.67)
we could obtain m
m^(x)
limty(x
 1
+ it)  T / > ( X
1
 ie)].
(5.68)
116
Inference
However, there is nothing in the theory of the Stieltjes transform which tells us that an m (x)
y
m,(x)
B
2rr x^ (l
2
v>l (1 +
+ B u) '
3 2
 x) '
1 2
\ J
 l + u
Ja
1
J
(5.69)
(a)
<1, (5.70)
= l xmf(x) dx satisfies (5.64) (a) dx satisfies (5.64) (b) for all r > 1.
(d) p
x m {x)
1
T h e first condition follows from (5.69) and the fact that B > 1 and u / ( l + u) < (1 + w) ' for u > 0. T h e condition (d) follows from the fact that m ( x )
7 3 2
w! (x) + m ( i ) 
7
+ (l2x)/2x(lx)
1/2
(5.71)
( r + l K  r ^
_ , =
/ [(r + l ) x Jo
 rx  \m (x)
y
dx
= / Jo = u (l
T
dx / 2 ) + 7 ii /2
T+l
r_i/2
+ B r ^ + ^/2Tr^r! which is (5.64) (b). T o prove (5.70) (b) and (e) we need the following identities involving confluent hypergeometric function <p(a,b;x). T h e materials presented here can
Some Minimax
Test in Multinormates
117
The
0 K 6; x) =
5=0
T(a + j)T(b)/r{a)T(b
+ j) j \ JS>
when 6 > a > 0. T h e associated solution ip to the hypergeometric equation has the representation
if a > 0. We shall use the fact the general definition of ip, as used in what follows when a 0, satisfies iP(0,b;x) = 1. (5.74)
T h e functions T/J and <f> satisfy the following differential properties, identities and integrals.
~<t>(a,b;x)=
(jl)b+l;x)
+ 0(o,6;i),
(5.75)
^ / > ( a , 6; x )  $ ( a , 6; x)  ^ ( a , 6 + 1; x),
(5.76)
Jo
e~ (x
r,l;y
118
Group Invariance
in Statistical
Inference
e  * ( s + y)vQ,l;z)<te
(5.83)
dvv^e^^dv
r r ? r 2 TT[X (1 E/2
. ( e " ^ f i i : T / 2 ) ^ i , i ; 7 ( i  ) / 2 ) ( \2 /
r()v(,l;7/2)}.
(5.84)
We now prove (5.70) (b) to establish that m,(x) is an honest probability density function. From (5.73), (5.72) and (5.82) we get
/
Jo
=
VTTt^*(i.i;7(i)/?)#
2ir x f l  i ) J f Jo VT7T~ r ^ ( l , 1 ; 7 * / 2 ) tte 2ir[x(l  l ) ] a
v
dt 7o 2 , r [ * ( l  * ) ] 7o
1 J
(5.85)
119
ri f h
r()
dx
= 2 ^ , l ; ^ ( i , l ; , ) where 2z = 7 . We now show that tf'(z)#(*)=0, from which it follows that H{z)=ce for some constant c.
l
 0 ( I , l ; ^ ( ^ , l ;
(5.86)
B y direct evaluation in terms of elementary integrals when 7 0 we get (using (5.85)) / m (x) dx = 1; Jo hence, (5.70) (b) follows from (5.S6). T o prove (5.86), we use (5.75) and (5.76),
a
+ # /  , l ; \ ^ Q
; *
1 ~ n H ^2; 2
 <P[  , 2 ; z
2 \2
T
1 . ^3
+ 0
,l;z
2* U
(N
2 ^
2
01
2 \2
120
Group Invariance in Statistical Inference We now verify (5.70) (c). Prom (5.72) and (5.79), with a = , b = 1 we
obtain
1
(1+7?/) e'
dy = M ,l; /2
7 1
+70
,2;7/2
/4 (5.88)
Using (5.70) (6), which we have just proved, and (5.85) and (5.88) we rewrite (5.70) (c) as 11 AI
1
, mt/aJJ
jf u + 7(1  $}] M * ) *
(l + 7 ) 2T[(1 )]I/2
V
1 ; 7
, ?
 e ^ r
1 7
11; /2
7
)\dy f 7
l
g ^(l,l;7g/2) 2^(1 
, f y I
1 +
(l + 73/)e^
2ff[v(l  J/)]V2
f Jo +
dy
(5.89)
l/3 1_ 2 r (  ] ^ (  , l ; / 2 )  2 r [V 2 ]^[,l;7/2
7
7g^(l,l;7/2)
, _
f
0
M0,0;7y/2)(l,O;7y/2)]
d
2 ^ ( 1  , ) ] ^ ^  y
= 1 
f _ dy Jo * [3/(1  ) ] ^ Jo
a
= l  j T ( l +
= 1+ r
r V Q ^ 7 * / 2 ) e  ^
() ^(I^^A^dHT/s)
/2. (5.90)
Some Afinimai Test in Mv.lttnorma.les T h u s (5.89) and (5.90) imply (5.31) and, hence (5.70) ( c ) . C a s e p = 2, N = 4. S o l u t i o n o f L i n n i k , P l i s s a n d S a l a e v s k i
121
From (5.62) the problem consists of showing (with 0i /J) on the interval 0 < 8 < 1 there exists a probability density function {/3) for which
= 0(2,l; /2).
7
(5.91)
j *
"<'/ ^2,  ; M / 2 ^ { 
8)/2)mW
4l<\M^0)/2jmdf3.
(5.92)
we rewrite (5.92) as
16^0(2,1;
/
1 e
^ ) [ 1 +
_ ^
7 10
( A ) ( f / J
 d d 
/3)K(W
 ^ / 2
r +
2r )8
for r = 1 , 2 , . . . .
(5.94)
Let
r=l
be an arbitrary function represented by a series with radius of convergence greater than one. Multiplying the r t h equation in (5.94) by o _ i and taking
r
122
Inference
s:
 f&
+ /3 )/'
7
!  l +
o 0  ^ + f 0
) ,
J LUMP) Jo
d0 (5.95)
= 0 where L{f)
denotes the differential form inside the square brakets. Applying and choosing a small e > 0 we get (5.96)
 1)C + 0{Z
+ 60 + 7/3  7 / 3 K '
2
m% = ^ ( !  ^ ) ( ^  / ' o  \0+7^ (i  mi n
2 2 2
(5.98)
(5.99) =0 with tto / 0, Co / 0; are two fundamental solution of L*(Q 0 on the interval (0,1) Similarly
n = 96(1  /J) ,
k=0
fc
(5.100)
with 3o ^ 0, fto # 0 are also two fundamental solutions of the same equation on the interval (0,1). Consequently any arbitrary solution of the equation L*() 0 is integrable on [0,1] and we assert that there also exists a solution of the equation L*() 0 for which h m L [ M ]  ;  < = 0. (5.101)
Some Minimax
Test in Mullinormales
123
It can also be shown that any linear combination of the solutions in (5.99) (similarly for solutions in (5.100)) satisfies (5.101) provided that () = i (  ) .
2
T h u s  12 is a solution of (5.95) and hence of (5.91). I n the following paragraph we show that i does not vanish in the interval
2
The equation L"(t\) 0 becomes 2x{\  x)8" + (  1 + 4x,x Substituting u(x) = 6'(x)j8(x) + yx )S' + ^(1 + jx)$ z
2
= 0.
(5.103)
l4s
s ' _
3+^7*
2x{lx)
1
4x(lx)
'
Let us recall that 8(x) E t ^ o ^ ) * ' > 1*1 < ! Assume that 9{x)
vanishes Then
at some point in the interval (0,1). Since 0(0) = 1, among the roots of the equation ${x) = 0 there will be a least root. L e t us denote it by XQ. the function 0(x) ?'(0) is analytic on the circle \x\ < x$.  which implies that u(0) =  . From (5.103) we get
Hence u(x) cannot vanish in the < 0. T h e fact that which in turn
interval (0, Xrj), since at any point where it vanishes we would necessarily have u'(x) > 0 whereas (5.104) shows that at this point u'(x) as we approach the point XQ from the left (u' / contradicts (5.104). A s a result =
i 2
the function u(x) has a negative value implies that it decreases unboundedly 0 since
L e t us now return to (5.91), which, as we have seen, is satisfied by B y the method of Laplace transforms it is easy to obtain the relation
/ x  {l Jo
b l
xy^ei
x))
r(t)r(c  b) r(c)
4>(a,c;p + q),
0<b<c.
(5.105)
5 (1 r )
' and
p = 772,? =
7(lW2,
 y 2 , l ; ^ W
=p ( H ; ^ )
( W .
(5.106)
124
Inference
Now to obtain (5.91) it is sufficient to make use of the homogeneity of the above relation and to normalize the function 12. 5.3.2. R?test
0
Consider the setting of Example 5.1.5 for the problem of testing H against the alternatives Hi alternatives p
2
:p
: p
A > 0.
that among all tests based on sufficient statistic (X,S) > 0 the R test
2
(%m
with 3 ( H ) of order 1 and t is a pvector. For p > 2 this result does not imply the minimax property of rem. R test
2
as the group ( G , T) does not satisfy the condition of the HuntStein theoAs discussed in Example 5.1.5 we assume the mean to be zero and A maximal invariant under G T (P) consider only the subgroup G r ( p ) for invariance. T h i s subgroup satisfies the conditions of the HuntStein theorem. is R = A (Rg,... ,Rp)' with a single distribution under Ho but with a dis6j p
2
with J
= A
A under St
T h e Lebesgue density
function f\, { )
r =
and the continuity of fx, { )
v r m
6 y,
p
6i>o,
4 = A]
(5.107)
reduced problem in terms of R is Bayes. In particular, the Jf test which has a constant power on the contour p = A and which is also G (p)
T
invariant,
maximizes the minimum power over H i if and only if there is a probability measure on Y such that for some constant cj
according as
Some Minimax
Test in Multinormales
125
except possibly for a set of measure zero. A n examination of the integrand in (5.108} allows us to replace it by the equivalent
/ McktfA3i
Obviously (5.108) implies (5.109). Y% i
f
i f X > , = c
(5109)
constant c\ for which (5.109) is satisfied and if f = (r~2>.*p)' = c > c, then writing / =
7
we conclude that
/(f) = / ( c V / c ) > / ( r  ) =
C l
because of the form of / and the fact that c'/c > 1 and ^ j j r * in (5.29) 7 " ( l A) 1  J2j>i jfai
i 1 6 a n d t h a t
ti > 
argument for the case c' < c show that (5.108) implies (5.109). T h e remaining computations in this section is somewhat simplified by the fact that for fixed c and A we can at this point compute the unique value of C\ for which (5.109) can possibly be satisfied. L e t R = (R2,..., for Ti > 0 and E i
_ 1
Rpi)'
and write /x,Tj(ri/) for the version of conditional (u) for the
Lebesgue density of R given that E 2 Ri = U = u which is continuous in f and r < w < 1 and zero elsewhere. Also write
2
/
for fi > 0 and E S P
1 r =
/ ,,(rc)(dA) A
ci
/c*(0
75(c) J
of probability densities, is itself a probability density in f, as is / o o ( r  c ) . Hence the expression in square brackets equals one. From (4.35) with 0 < c < 1 q  A ) ^ 
P
r(^i
, _
( p
3 )
1
_
'
A)
i { N
_ _
p
2 )
r((jv )/2)r((pi)/2)
\(Nl), 1(JV1);(P1)
; C
(5.111)
126
Inference
F(a,b;c;x)
= 2^  * r()r(6)r(c + r)
r=0
E
. where we write ( a )
r
(g)
(fc) _
r r
(5.112)
(c)
= T(a +
r)/T(a).
which
,,_ ,
3 1 J
 _
P
"
r((/v ))r(i(pi))
P
FQ( JV
 1), i ( J V  1);  ( p 
1);CA)
(5.113)
[ i
r ( i  A ) / 7  t )
i i
03=o
* r ( i ( J v  j + l) +
+
0 =a
p
7 j
ft)
j
r 4
r j
(lA)/
i l r (  ( i v  i
i))(2/3 )! U + ^ ( ( 1  * ) / > ;  i ) J
FQ(/V1),
(Nl);
i(pl);cA)
(5.114)
for all T with r, > 0 and J ri = c. C a s e p 3 , N 4 ( N 3 if m e a n k n o w n ) . S o l u t i o n of G i r i a n d K i e f e r In this case (5.113) can be written as . + 2n)
> * 3 *3
V(l4)(i^A)
r g a ( l  A ) ( 2 n + l ) ( 2 n + 3)
a
Ta 83
(lfc)(lr A)
2
(la)(lr A)l
a
(5.115)
One could presumably solve for by using the theory Meijer transforms with kernal F ( , ; \\x). Instead they proceeded as for Hotelling's T
2
problem,
128
Inference
s
measure on [0,1], or, equivalently that the Laplace transform ltj(t) lj\ successfully in this way. We now obtain a function m {x),
t
is
completely monotone on [0, oo}. Unfortunately we have been unable to proceed which we then prove to be the Lebesgue density d * ( x ) / d x of an absolutely continuous probability measure C satisfying (5.118) and hence (5.115). T h a t proof does not rely on the somewhat heuristic development which follows, but we nevertheless sketch that development to give an idea of where the 111,(1) of (5.123) comes from. T h e generating function
CC
3=0
(5.118) by multiplying it by t ~
1zf" .
Solving (5.119) by treatment of the corresponding homogeneous equation and by variation of parameter, we get
m=
W=T)\
k U l  r ) (rCr*)l (1*)*
1
2[T(1T)(TZ)]1
dr. (5.120)
T h e constant of integration has been chosen to make <p(t) continuous at t = 0 with 0 ( 0 ) 1 and (5.120) defines a single valued analytic function on the complex plane cut from 0 to z and from 1 to 0 0 . If there did exist an absolutely continuous * whose suitably regular derivative m
1 z
satisfied
m (x)
f
L lo
we could obtain m
c
(1
dx = 0 ( f ) ,
(5.121)
tx)
m (x)
z
 ie)
(5.122)
27TIX E l O
Since there is nothing in the theory of Stieltjes transforms which tells us that an m
z
129
obtain m
()=
(1**)*
1
f
/
du
(1 + u)(z + u p  2 (1 + 1 i ) i ( z + )
[u(l + u ) ( z + i*)]*
{ B , Q , ( z ) + c,} (say). 2jr(a:(l  as))i We can evaluate c (5.126) below. We obtain by making the change of variables V = (1 + u)
1
(5.123)
and using
and
Q ,  2 ( l  z } 
+ (lz)"t
(5.124)
130
Group Invariance
in Statistical
Inference
T h e first condition will follow from (5.123) and the positivity of B for 0 < z < 1. T h e former is obvious. To prove the positivity of c ,
z
and c , we first
note that
this is seen by comparing the two power series, the coefficient of z [(f j j / j f ] I)2/j{j
+ 3
being
and (j + 1), and the ratio of the former to the latter being n";=i( + i ) > i . T h u s we have B
z M
>l2.
the expression for c and writing u = 1 2 the resulting lower bound for c whose coefficent of v> for j > 1 is [(j + i ) "  T (j the coefficient of u> for j > 1 is > (j +  ) 0 < z < 1.
1 1 2
has +1)].
a power series in u (convergent for \u\ < 1) whose constant term is zero and + )/r(j)r(j +
2 1
< !T(j)r(i + l ) ,
z
> 0 for
T o prove (5.124)(d) we note that m , ( i ) defined by (5.118) satisfies the differential equation 1  2x + zx m' (x)
x 2 1
+ 7ro,(i)
= B /2nxHl
z
 x)%{\  zx),
(5.125)
x(l
 x)(l
zx)
/ {z(n Jo [ x (l(l Jo _
n
+ 2)x
n+1
+ ( 1 + z)(n + l)x
nx ~ }
m {x)
z
dx
+ z)x +
zx )m' (x)dx
t
=  ) 
zu
n + 1
+ B
which is (5.118)(b). T h e proofs of (5.124) (b) and (c) depends on the following identities involving hypergeometric functions F{a, b; c; x) which has the following integral representation when Re(c) > Re(b) > 0: F(a,b;c;x) = ^ _ f  t )
M
&
( \
 tx)~*dt.
(5.126)
Some Minimax Test in Multinormales and the identities F(a,b;c;x) ( c  o  1 ) F(a,b;c\x)  ( c  1) F{a,b;c lim (a) [TicT'F^bc^x) ( f t U i ^ M A *( + + i (JI + 1)!
+ l
131
(5.128)
= 0,
(5.129)
;x) (5.130)
for 7i = 0 , 1 , 2 , . . . , r ( l + A + n ) r ( l + y + /Q
= F \ l + A, ~
 v; 1 + X +
 A, 1 + V;1 + J/ + ii; 1  * j
+ F\ \ + X, \  u; \ + X t fi\x \ f [
 i  A, ~ +1/; 1 + + ./*;! as
 j
+ A ,   i / ; l + A + as)
W C
F (   A,+
6
+ y +
 ;.j; (5.131)
We now prove (5.124) (b) and (c). F r o m (5.123), using (5.126) and (5.127) we obtain
(I48
F ^ , i ; l ; ^
r / 2 ) 1 1 2 2
 ; 1; 1 
3 1  ,  ; 2; 1  z] x F\ 2 2
I 1 zi) ?

(5.132)
132
Group Invariance
in Statistical
Inference
can be written as
_ J
y.
3
(1 
zT
dx (lx)i(lzx)"
^ ( l "
) ^
2n + l Jo
71=0 1 m=0
m
r a
,m(l\ oo z (^)m ^
(n + m ) ! < l  J t ) "
B
2 * H
v
(m!)
'
Z n=0
!(n+)
v
3'
m=0
(i  f )  i
;
(
1
'
1
V
rll r ' _ i i ^
;
0
2
1
( i  2  o *  ; 2 ; 1  z ) . (5.133)
= ( 1  z } " + ( T T / 4 ) (1 
i r * * ( j ,
F ,
V2'2
rii;l
 P
1.1,*..)]
(5.134)
 F (  ^  ^ I ^ ) ] } .
(5.135)
133
)F[
/(
 i , 4 ; l ;
/ 2 ) F ( i , i ;
= 0. (5.134)
T h e expression inside the square brackets is easily seen to be zero by computing the coefficent of z . T h u s we prove (5.124)(b). We now verify (5.124) (c). T h e integrand in (5.132) is unaltered by multiplication by x, and in place of (5.133) we obtain (1 Z)~ /2
1 n
Z~ /3+[K/4Z(1
1
z)i]
pi
I x m (x) Jo
2
dx
;2;*)
^i
1 3
2 2
; ;
n ^ l ^ i  z
(7T/4)(1Z )/Z
kin'3
(
[(lz)*/3.]F
3
: 1
2'2
(5.135)
T o verify (5.124) (c) we then have to prove the following identity (by (5.118)
ril
+ 3 3
,1 :1:1*
2
<x/4)[(l*) /*]
x F
2'2'
1 3   ,  ; 2 ; l  z 2'2'
(5.136)
134
From (5.131) with c = 1, a = b = , then (5.129) with a = , 6 =  J , and then (5.130) with A = /i = 0, f = 1 we can rewrite (5.136) as
(4/3ir){l + 2)
+ (3z/2)F(i,;2i *)
 ,  ; 2 ; 1  * ) + (4/3 TT )(1 
z)
i
2'
i
2'
1 2'
3 2 (5.137)
Using (5.129) with a =  f , b = , c = 1 + r, and then (5.130) with n = 0, c  * 0, the expression inside the square brackets in the last term of (5.137) can be simplified to \ z F ( I , ^ ; 2 ; r ) . T h u s we are faced with the problem of establishing the following identity 1 3 1 1  ,  ; 2; z j F\  ,  ; 1; 1 z 2 2 2'2'
4/TT = Fl
EMinimax
test
(Linnik, 1966)
for Hotelling's T
2
Consider the setting of Sec. 5.3.1. 6 = 0 against alternatives Hj : 8 0jv is  m i n i m a x if sup inf E (<b)8 2
test
inf
E (<p )<
e N
(5.139)
for all JV > / V (e) where 8 = (ft, E ) and 0 runs through all tests of level < a.
0
which is a pdf on the simplex I V If we substitute (5.140) in the left side of (5.62) we obtain a discrepancy with its right side which is of the order of 0 ( J V
 1 +
) , for e > 0.
T h u s , if
(5.141)
(5.142)
inf
Eg(4> ) = O ^ l / i V " )
N
against
Exercises 1. Prove in details (5.65). 2. Prove (5.72) and (5.73). 3. Prove (5.119) and (5.120). 4. Prove (5.127) and (5.130). 5. Prove that in Problem 8 of Chapter 4 no invariant test under ( G T  , ! ^ ) is minimax for testing Ho against Hi for every choice of A.
136
Inference
References 1. M. Behara, and N. Giri, Locally and asymptotically minimax test of some multivariate decision problems, Archiv der Mathmatik, 4, 436441, (1971). 2. A. Erdelyi, Higher Transcedenial Functions, McGrawHill, N . Y . , 1953. 3. J . K. Ghosh, fnvariance in Testing and Estimation, Publication no S M 67/2, Indian Statistical Institute, Calculta India, (1967). 4. N. Giri, Multivariate Statistical Inference, Academic Press, N . Y . , (1977). 5. N. Giri, Locally and asymptotically minimax tests of a multivariate problem., Ann. Math. Statist. 39, 171178 (1968). 6. N. Giri, and J . Kiefer, Local and asymptotic minimax properties of multivariate test, Ann. Math. Statist. 39, 2135 (1964). 7. N. Giri, and J . Kiefer, Minimal character of R test in the simplest case, Ann, Math. Statist, 34, 14751490 (1964). 8. N. Giri, J . Kiefer, and C . Stein, Minimax properties of T test in the simplest case, Ann. Math. Statist. 34, 15241535 (1963). 9. E . L . Ince, Ordinary Differential Equations, Longmans, Green and Co., London (1926). 10. W, James, and C . Stein, Estimation with quadratic loss., Proc. Fourth Berkeley Symp. Math. Statist. Prob. 1, 361379 (1960). 11. E . L . Lehmann, Testing Statistical Hypotheses, Wiley, N . Y . (1959). 12. Ju. V. Linnik, V. A. Pliss and Salaevski, On Hotelling's Test, Doklady, 168, 719722 (1966). 13. J . Kiefer, Invariance, minimax sequential estimation, and continuous time processes, Ann. Math. Statist. 28, 573601, (1957). 14. Ju. V. Linnik, Appoximately minimax detection of a vector signal on a Gaussian background, Soviet Math. Doklady, 7, 966968, (1966). 15. M. P. Peisakoff, Transformation Parameters, Ph. D. thesis, Princeton University (1950). 16. E . J . G . Pittman, Tests of hypotheses concerning location and scale parameters, Biometrika 31, 20G215 (1939). 17. O. V. Salaevski, Minimal character of Hotelling's T test, Soviet Math. Doklady 3, 733735 (1968). 18. J . B. 
Semika, An optimum poroperty of two statistical test, Biometrika 33, 7080 (1941). 19. L . J . Slater, Confluent Hypergeometric Functions, Cambridge University Press (1960). 20. C . Stein, The admissibility of Hotelling's T test, Ann. Math. Statist, 27, 616623 (1956). 21. A. Wald, Contributions to the theory of statistical estimation and testing hypotheses, Ann. Math. Statist 10, 299320, (1939). 22. A. Wald, Statistical Decision Functions, Wiley, N.Y. (1950). 23. O. Wesler, fnvariance theory and a modified principle, Ann. Math. Statist. 30, 120 (1959).
2 2 2 2
6.0.
Introduction
In multivariate analysis the role of multivariate normal distribution is of utmost importance for the obvious reason that many results relating to the univariate normal distribution have been successfully extended to the multivariate normal distribution. However, in actual practice, the assumption of multinormality does not always hold and the verification of multinormality in a given set of data is, often, very cumbersome, if not impossible. Very often, the optimum statistical procedures derived under the assumption of multivariate normal remain optimum when the underlying distribution is a member of a family of elliptically symmetric distributions.
tion belonging to the family of univariate elliptically symmetric distributions with location parameter (i = (pi,... the quadratic form (x  n)''S~ (x
1
definite) if its probability density function can be expressed as a function of  p) and is given by
fx(x)
= E
1 / 2
ff((x 
 ft)),
i e
(6.1)
137
138
Group Invariance
in Statistical
Inference
= 1
We shall denote a family of elliptically symmetric distributions by E (n, It may be verified that E{X) where a p the class E {u
p t  1
).
= u,
l
cov(X)
= oS
E{{X
 n)'% (X
S ) have the same mean and the same correlation matrix. T h e S ) contains a class of probability densities whose contours of equal
family E (ti,
p
density have the same elliptical shape as the multivariate normal but it contains also longtailed and shorttailed distributions relative to multivariate normal. T h i s family of distributions satisfies most properties of the multivariate normal. We refer to Giri (1996) for these results. D e f i n i t i o n 6 . 1 . 2 . (Multivariate Elliptically Symmetric Distribution). A n x p random matrix x = (x )
ij
(x ,,x y
l n
where Xi (Xn,...,
X )'
ip
distribution with the same location parameter p = ( p i , . . . , p } ' and the same scale matrix S (positive definite) if its probability density function is given by
f (x)
x
= \L\^ q
r g f o
tf^fatM
(6.2)
with Xi e E ,i
tically symmetric distributions are becoming increasingly popular because of their frequent use in filtering and stochastic control, random signal input, stock market data analysis and robustness studies of multivariate normal statistical procedures. T h e family E (u,
p
mal, the multivariate Cauchy, the Pearsonian type I I and type I V , the Laplance distributions. T h e family of spherically symmetric distributions which includes the multivariatet distribution, the contaminated normal and the compound normal distributions is a subfamily of i ( p , ) .
p
Distributions
139
Write f (x)
x
in (6.2) as (6.3)
3^
and is independent of
0 and q is fixed integrable function from y to [0, oo) independently of 8. L e t G be a group of transformations which leaves the problem of testing H
0
Assume that \ is
for ail i
d G acts transitively on
Using (2.21a) the ratio i i of the probability density function of the maximal invariant T(X) under G on x; with 8\ 6 QH, , #o QH > > given by
s 0
where 6(g) is the Jacobian of the inverse transformation x > gx. T h e o r e m 6 . 1 . 1 . / / G acts transitively is independent Proof. on the range space y
0
of
then R
element k(x,8j
140
Inference
where A
Q{8 )8{h{x,8
0
: y )i)A (h(x,6>
0 r
: y ))
0
E)
In recent years considerable attentions are being focussed on the study of robustness of commonly used test procedures concerning the parameters of multivariate normal populations in the family of symmetric distributions. For an uptodate reference we refer to K a r i y a and Sinha (1988). T h e criterion mostly used in such studies is the locally best invariant ( L B I ) property of the test procedure. G i r i (1988) gave the following formulation of the locally minimax test in E (ii,
p
) . Let F be the space of values of the maximal invariant depend on the parameter (<5,77), For each (u, 77) in the parametric space of the distribu: $,tf) is the probability density function of Z with
0
Z T(X)
against Hi .8 e f i / / , reduces to that of testing H0 : 8  0 against Hi := A > 0, in the presence of the nuisance parameter IJ, in terms of Z. We are concerned with the local theory in the sense that f(z : A,n) is close to f{z : 0,17) when A o(h(X)) is small for all q in (6.2). Throughout this chapter notations like o ( l ) , are to be interpreted a A * 0 for an q in (6.2). For each a , 0 < a < 1 we now consider a critical region of the form R'  [U{x) = U(T(x)) > C]
a
(6.7)
where U is bounded, positive and has a continous distribution for (8, r/), equicontinuous in (S,ri) with 6 < S for any q (6.2) and R'
0
satisfies (6.8)
P , ( f l * ) = cv, P ( f T ) = a + h{\)
0 M
+ rfA,/,)
for any q in (6.2) where r(A,ri)  o(/i(A)) uniformly in 7 7 with h{\) > 0 and h(X) = o ( l ) . Without any loss of generality we shall assume h(X) bX with
>> 0.
Remarks. (1) T h e assumption P (R*)
0t
of U under H
Locally Minimax Teats in Symmetrical Distributions (2) T h e assumption P\^(R") a + h(X) + r(X,ri)
141
X},
/
g(X)
f(z
f(z:Q, )( (dri)
n 0
< c
= o(X) uniformly in z.
0
Note. If the set 8 = 0 is a single point, assigns measure 1 to that point. In this case we obtain
small X, there
iaU,Px, (R*)
v i e Q a
 a rejects H}
0
f i
A.osup^
iafPx, {4x
v
 a
= (2 + /i(A)[9(A) + C* 7(A)]) .
Q
A Bayes critical region for (0, 1) losses with respect to the a priori T\Z\ + (1  TX)O is given by
142
Group Invariance
in Statistical
Inference
B y (6.10) B (z)
x
= B,
x
= R* 
and W
is continuous, letting
KM) we get
= /
PxMUdv)
(i+o(ft(A))
f&fA))
P&pjJ).
A
r (R*)
x
=(1
+ (1  n ) t * f o T O "
1 I A
T (P '
a
(V )P  ,(W,))
A 1 I
= rl(R')
+ (1  2 ) ( P * ( H M  P * ( V ) )
n 0 A 0 A A
= ri(il*) + (MA)).
0
(6.13)
If (6.11) is false for all q in (6.2), then by (6.8) we can find a family of tests {<px} of level a such that d> has power function cv + r(A, n) on the set {6 = A}
x
for all q in (6.2) satisfying lim sup (inf[r(A,jj) h(X)]/h(X) A > 0 "
A
> 0.
then satisfies
Locally Minimax
Tests in Symmetrical
Distributions
143
X[ = (X' ,...
tl
,X ),
ip
f (x)
x
m^ q(Y(xi
ii
rf'E^JK))
(6.14)
where g is from [0, co) to [0, oo), ft = (pi,... Write for any b = (h,...,b )'
p
,p )'
p
definite matrix. We shall assume that q is thrice contunuously differentiable. = {b' ,b' )',
{1) (2)
with b i) {
(lfa+1,..
.6p)',6p] = (bi,...,bi)'
matrix
A=(a )=(f*
iS 3
\ A
22
where A{
are pi x p
p We shall denote by
\aa,...,
On/
We are interested here in the following three problems. P r o b l e m 1. To test HIQ : p = 0 against H\\ ; p ^ 0 when E is unknown. P r o b l e m 2. T o test H are unknown. P r o b l e m 3 . To test H Q : p 0 against H31 : p ; i j = 0, p ( ) /
Z 2 : 2I)
M(i) = 0 against H \
2
P(i) ^ 0 when p ( ) , E
2
0 when E
is unknown. T h e normal analogues of these problems have been considered in Chapter 4. P r o b l e m 1. Let Gi(p) be the multiplicative group of p > c p nonsingular matrices g. Problem 1 remains invariant under G;(p) transforming (X,S;p,i:)^(gX,gSg';gp,gT g')
l
where X = ^
X i , S = Z? (XiX)(XiX}'.
=l 2
space of ( X , S) is T
= nX'S^X'
or equivalently R = nX'(S
144
Group Invariance
in Statistical
Inference
/ ( l + T
) .
Gi(p) is 6 = T t y t ' E ~ V  I f Q is convex and nonincreasing, then the Hotelling with respect to G ( p ) for testing i f
;
test which rejects H\o whenever i i > C is uniformly most powerful invariant
1 0
against H\i
Chapter 5, the group Gi(p) does not satisfy the conditions of the H u n t  S t e i n theorm. However this theorem does apply for the subgroup GT(P) maximal invariant under GT{P) is [R\,... on A (6\,...,Sf,)'. under Hu T h e ratio
L 0
of probability densities of
: S = A and H
GT{P))
j R = ~ ^
i)/2
/
where dg = Pi(x)
Po(sx)I](^)
0 / 2
dS
Jldgij,
Po(x) = ? ( t r x'x). Since x'x > 0, there exists a g E GT such that gx'xg' = I, Hence f
R =
v^p)'.
q(ti{ g'
9
 2A'gy +
tyf[(fi)f**&*B (6.16)
/
 1(^(99'))
9(tr(ss'))n(^) 
( n
i ) / 2
dff
Under the assumption that q is thrice differentiable, we expand q as q(ti(gg'  2A'gy + A)) + (  2 t r ( A ' g y ) + \)q(ti
2
gg )
+ X) q (z)
(6.17)
Distributions
145
/ /
tr(AW"(tr(S9')) 9iW
t )
fiftfe*****
( 1 i l / 2 r f
= ' ^
0
(tr(
(618)
f f 0
/ 2
^,
R=l
g^(gg'))f[(gir^dg
+ 4
MA'
0 i
,))
2 9
( >(tr(
5 f f
'))n(^)
 n
'
) / 2
+ 4
6 / ?
(2tr(A'
? s /
A)V
3 1
( )n(S ,) i i
 n
l ) / 3
^ (6.19)
T h e first integral in (6.19) is a finite constant Q\. integral in (6.19) we first note that
tr{L\'gy)
Y r)
d
12
(6.20)
5>MI
^(gg'))f[(gl^^dg.
(6.21)
146
Inference
T o evaluate the above integral we need to evaluate the following two integrals.
JOT
i=i (6.22)
h = Define
g 3 '( (s3'))f[(s
j
( 2
t r
2 1 l
} ''
i , / 2
L = tr(ffff')
 gtJL, i = l,...,pi
ep i=flfn,i/ '
+
*=
i,.,Pi;
= 2,...,p2;
Since
p + 1
and e has a Dirichlet distribution ( 1 / 2 , . . . , 1 / 2 ) . From K a r i y a and E a t o n (1977) the probability density function of e is given by
(e) = r ( f c ^ ) / [ r ( i / 2 ) i ^ ' /
p(p+l)/2l
i=.
/
V
p(p+l)/2l
i J
1/21
(6.24)
,
Jk 
.NM
(n  + 1 ) ,
_ WW J 
2
(6.25)
Locally Minimax
147
where / P TJ(
\i=l
M = E[
E I
)(0/a
C = p(p + l ) / 4 + 1/2 ( n  i ) .
(6.26)
X+ A2?
j&+ J
o(A) uniformly in
j>
L
+ ( n  j +
)%] j j +
(6.27)
where B(y,ri,\)
the equation (6.9) is satisfied by letting 0 give measure one to the single point ?7 = 0 while
A
whose j t h corrdinate Wj satisfies V'j = (" " j ) " V so that + l ) n   njp. (6.29) " 3 + i r V ^ f f t " P ) . / = 1, (628)
depend on q. T h e first equation in (6.8) follows from the following lemma. L e m m a 6 . 3 . 1 . Let Y = (Y ...
u
,Y )',
n
Y; = {Yn,..., symmetric
Y )'
ip
i = 1 , . . . , be function
annxp
probability density
f (y)
Y
= 9(tr yy') = q ( g
,i=l
tr
] ,
(6.30)
a n d Jet
148
where Y ) , Y
( L
( 2 )
n  k >
p>k.
The distribution
(Y ' Y( ))
(1)
is positive definite with probability one. Hence AA. A s the Jacobian of the transformation
there exists a p x p symmetric positive definite matrix A such that Y'Y Transform Y to U, given by Y = UA. Y U is \A\ , /r/,A(,) = (tr a a ' ) M . T h u s (7 has the uniform distribution and 17 is independent of A. U as
n n
Partition
where t /
( 1 )
( { )
= t7#j>4,
i = 1, 2. T h u s
Since Y ( ( Y ' Y ( ) )
1 1 1 2 ) 2
_ 1
distribution. C o r o l l a r y 6.3.1. If T
2
(central i.e. 6 = 0). Proof. Since Y ( j ( Y j ' j Y ( 2 ] ) Y j ' j has a completely specified distribution
1 2 1 1
for all q in (6.30), we can, without any loss of generality, assume that Y i , . . . , Y are independently distributed N (0,I).
p
T h i s implies that Y / j l ( 3 ) =
2
lY
Hotelling's T
distribution.
2
= nY'S' ?,
2
with Y = l/n
"
=1
~ Y
Hotelling's T
T o establish the second equation in (6.8) we proceed as follows. (2.21) we get for any g t?/(p), (which we will write as Gi) fT*{t \6) fT2(t \0)
2 2
J g(tT( g'2a'gy
Gi 9
+
2
S))\gg'\^' dg
( J
<j(tr gg')\g0)/ dg
Locally Minimax
Testa in Symmetrical
Distributions
149
After a Taylor expansion of q, the numerator of (6.31) can be written as / q(ti99')\99'\  dg JG,
iin p,
{n
p)n
+ o(6)
where g (jJy). Using (6.18) we get the ratio in (6.13) as l + 6(k + CT) + B(y,t),6) where B(y,r),6) Si/S,i o(6) uniformly in y,v
p
(6.32)
:6 = 0
where c
is a positive constant so that R* has the size a, satisfies P^tRn^a + bX + giKr,) (6.34)
with b > 0 and <7(A,r/) = o(Z>A). T h u s we get the following theorem. T h e o r e m 6 . 3 . 1 . For testing Hm : n = 0 against the alternative H' A specified > 0 Hotelling's of distribution (6.14). T
2
:6
R e m a r k 1, Since G j ( p ) satisfies the condition of Theorem 6.1.1., the ratio (6.31) is independent of q. From Corollary 6.3.1. (6.34) it follows that Hotelling's T Hia against
2
P r o b l e m 2. Write X = ( X i j , X ) ) with
( ( 2
: n x p\,X( )
2
: n x P2, and
p,+P2=
p.
Since the problem is invariant under translations of the last pa components of each Xj, j 1 , . . . , n , this can be reduced to that of testing if o : p ^ j 0
2
0 in terms of X^
/^^(^DiISiir^gftrSr/^ijep'^y^^ep^))
150
Inference
(1,...,1)
g(tr v) =
/ q(tr(v + JR"F1
R
ww'))dw,
and q is a function on [0, oo) to [0, oo) satisfying (6.1a). Now, using the results of problem 1 with p = p , R = Ri L
T h e o r m 6.3.2.
n
21
S V ( i ) = X {specified) > 0 the test which rejects H20 for large values of Rj nX' (S
(1) l) {n)
= nXl S( \ X /{l
l )
nXl S\ X , )
1) ) l )
(6.34).
R e m a r k 2. T h e locally best invariant property and the uniformly most powerful invariant property under the assumption of nonincreasing convex property of q follows from the results of problem 1. P r o b l e m 3. T h e invariance property of this problem has been discussed in Chapter 4. T h e problem of testing i f
3 0
against H i
3
where gin) is p i X P i , transforming (X, S)  (gX, gSg'). in the sample space under G is (Hi, .S3) where
pi p
Ri=2^Ri,
fc=l
Ri+R~
= R =
Y, i=l
Ri
i=l P il
For invariant tests under G the problem is reduced to testing i i against the alternatives H Ri and R 2 3l
3 0
; 5 = 0
2
Distributions
151
As stated in Chapter 5 the group G does not satisfy the conditions of the H u n t  S t e i n theorem. However this theorem does apply for the subgroup GT(P) of p x p nonsingular lower triangular matrices. From problem 1 the whose distribution depends
30
,  , VS )'
P
: 6i 0,
/ R =
q(tv(g '
9
 2A{ (g
2j
{2im)
S ( 3 2 ) J
, ) + A)}
( 2 1
]\[gl)  dg (635)
{n
,)l2
/
J G
T
?(tr
1
gg')\{{gir^dg
where g e G (p)
T
and
*>(a a
with ( i i j ff(22) both lower triangular matrices. 9(tr Sff') + (~2v + \)qM(ti 9 where
 2 )
)
+ A)
2
(tr
3 0
(636)
Using (6.18) the integration of the second term in (6.36) with respect to the measure n
1
(  >
3 7
gives A Q I where
01= /
JG
T
q (K99')f[(9 ,)  dg.
1
in
iy2
T o integrate the third term in (6.36) with respect to the measure given in (6.37) we first observe that (by (6.18))
152
Group Invariance
in Statistical
Inference
(tr(A;
2 ) S ( 2 u y o )
)%' Htrffs0n(5 J
l n
, ) / 2
= /
( 2 t
(tr
S f f
')n^)
( n
"
i ) / a d
f (6.38)
and
/ JG
(tt(A;
v r
2 ) f f ( 2 2 )
( ( 2 )
)%( >(trS3')n(s i
ff
2 1
} 
( n
0 / 2
rfs
=/
T
E >Ei 4 +
[)=Pi + l j>i 1
V i
(6.39)
Denote K= f
JG
T
q^[tigg')dg,
= tr gg', (6.40)
D=
9(tr
gg')f[{glY^dg,
2MN
^ ( E i
r
+ B{y,r,, A)
where
(6.41) of
0
B(y,n,
by Theorem
6.1.1). T h e set {A = 0} is a simple point 17 = 0. So the assigns measure 1 to the single point rj = 0.
in Theorem 6.2.1.
$\eta_i=O(h(\lambda))$.
Any probability measure $\xi_{1,\lambda}$ can be replaced by the degenerate measure which assigns measure 1 to the mean $\eta_i^*$, $i=p_1+1,\dots,p$, of $\xi_{1,\lambda}$. Hence
$+\,B(y,\eta,\lambda)$, where $B(y,\eta,\lambda)=o(\lambda)$ uniformly in $y,\eta$.
$$R_K=\{X: U(X)=\mathbf R_1+K\mathbf R_2\ge C_\alpha\},\tag{6.43}$$
where $K$ is a constant such that (6.42) reduces to (6.9). The constant $C_\alpha$ depends on the level of significance $\alpha$ of the test for the chosen $K$, independently of $q$ (by Lemma 6.3.1). Now choose
_
V j
(nj!) (nj + l)
(np) (np
npi
\ '
3
,_
P l
+ 2) {(np+Vpi)
so that
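Hence the test with rejection region
$$R'=\{x: U(x)=\mathbf R_1+K\mathbf R_2\ge C_\alpha\},\tag{6.45}$$
with $K$ determined by this choice and $P_0(R')=\alpha$, satisfies (6.9) as $\lambda\to0$. Furthermore, any region $R_K$ of the form (6.43) must have this value of $K$ in order to satisfy (6.9) as $\lambda\to0$ for some $\xi_{1,\lambda}$.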
(fi,
f \H )
2 3a
Dl\ where D
2
P2
P2
(6.46)
where $a_1, a_2$ are positive constants and $B(r_1,r_2,\delta_2)=o(\delta_2)$ uniformly in $r_1,r_2$. It follows that, for testing $H_{30}$ against $H_{31}$, the test with critical region $R'$ is locally best invariant as $\delta_2\to0$.

Hotelling's $T^2$ test is not the LBI test for this problem, and hence it is locally worse. From (6.32), Hotelling's test, whose power depends only on $\delta=\delta_1+\delta_2$, has positive derivative at $\delta=0$. Using the critical region of Hotelling's test, we therefore conclude that the value of $h(\lambda)$ in (6.9), with $\delta_2=\lambda$, is positive. This gives the following theorem.
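Theorem 6.2.2. For testing $H_{30}$ against $H_{31}:\delta_2=\lambda$, the level-$\alpha$ test which rejects $H_{30}$ whenever $U(x)=\mathbf R_1+K\mathbf R_2\ge C_\alpha$, with $K$ as in (6.45), is locally minimax as $\lambda\to0$.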
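The statistic $U=\mathbf R_1+K\mathbf R_2$ is unchanged when $X$ is multiplied by a positive scalar. Its null distribution is therefore the same for a normal $X$ and for a scale mixture of normals with a single common scale, which is one member of the family (6.2). The Monte Carlo sketch below estimates $C_\alpha$ under $H_{30}$ for both cases, taking $\mu=0$ and $\Sigma=I$ (enough by invariance).

Three assumptions are made:

- $K=0.5$ is a purely hypothetical value, since the constant fixed before (6.45) is not reproduced here.
- The matrix-$t$ mixing law with 3 degrees of freedom is an arbitrary illustration.
- `block_R` uses the Example 5.1.5 form of $R$.

```python
import numpy as np


def block_R(X, p1):
    """(bold R_1, bold R_2) from leading-block quadratic forms (Example 5.1.5 form assumed)."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    centred = X - xbar
    S = centred.T @ centred

    def leading(k):
        xb = xbar[:k]
        return n * xb @ np.linalg.solve(S[:k, :k] + n * np.outer(xb, xb), xb)

    r1 = leading(p1)
    return r1, leading(p) - r1


def critical_value(n, p, p1, K, alpha, reps=5000, t_df=None, seed=0):
    """Monte Carlo (1 - alpha)-quantile of U = bold R_1 + K bold R_2 under H_30.

    mu = 0 and Sigma = I are used, which is enough by invariance. If t_df is given,
    each whole matrix is divided by one common sqrt(chi2(t_df) / t_df): a matrix-t
    member of the family whose density depends on tr(Sigma^{-1} x'x).
    """
    rng_z = np.random.default_rng(seed)        # Gaussian part (same in both runs)
    rng_w = np.random.default_rng(seed + 1)    # common mixing scale
    u = np.empty(reps)
    for t in range(reps):
        X = rng_z.standard_normal((n, p))
        if t_df is not None:
            X = X / np.sqrt(rng_w.chisquare(t_df) / t_df)
        r1, r2 = block_R(X, p1)
        u[t] = r1 + K * r2
    return np.quantile(u, 1 - alpha)


if __name__ == "__main__":
    K = 0.5  # hypothetical; the text's constant comes from the choice preceding (6.45)
    c_normal = critical_value(n=20, p=4, p1=2, K=K, alpha=0.05)
    c_t3 = critical_value(n=20, p=4, p1=2, K=K, alpha=0.05, t_df=3)
    print(c_normal, c_t3)  # equal: U does not change under X -> s X
```

Separate generators for the Gaussian part and the mixing scale keep the Gaussian draws identical in both runs. The two estimated critical values therefore agree up to rounding.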
Example 6.3.2. Let $X=(X_{ij})=(X_1,\dots,X_n)'$, with $X_i'=(X_{i1},\dots,X_{ip})$, $i=1,\dots,n$, be an $n\times p$ random matrix with probability density function given in (6.2). Let $\bar X=\frac1n\sum_{i=1}^nX_i$ and $S=\sum_{i=1}^n(X_i-\bar X)(X_i-\bar X)'$.
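Partition $\Sigma$ (and $S$ similarly) as
$$\Sigma=\begin{pmatrix}\Sigma_{11}&\Sigma_{(12)}&\Sigma_{(13)}\\ \Sigma_{(21)}&\Sigma_{(22)}&\Sigma_{(23)}\\ \Sigma_{(31)}&\Sigma_{(32)}&\Sigma_{(33)}\end{pmatrix},$$
where $\Sigma_{11}$ is $1\times1$, $\Sigma_{(22)}$ is $p_1\times p_1$ and $\Sigma_{(33)}$ is $p_2\times p_2$, with $1+p_1+p_2=p$. Define
$$\rho_1^2=\frac{\Sigma_{(12)}\Sigma_{(22)}^{-1}\Sigma_{(21)}}{\Sigma_{11}},\qquad \rho^2=\rho_1^2+\rho_2^2=\frac{(\Sigma_{(12)},\Sigma_{(13)})\begin{pmatrix}\Sigma_{(22)}&\Sigma_{(23)}\\ \Sigma_{(32)}&\Sigma_{(33)}\end{pmatrix}^{-1}(\Sigma_{(12)},\Sigma_{(13)})'}{\Sigma_{11}}.\tag{6.47}$$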
We shall consider the following three testing problems (see Sec. 4.4).

Problem 4. To test $H_0:\rho^2=0$ against $H_1:\rho^2>0$ when $\mu,\Sigma$ are unknown.

Problem 5. To test $H_0:\rho^2=0$ against $H_1:\rho_1^2>0,\ \rho_2^2=0$ when $\mu,\Sigma$ are unknown.

Problem 6. To test $H_0:\rho^2=0$ against $H_1:\rho_1^2=0,\ \rho_2^2>0$ when $\mu,\Sigma$ are unknown.

These problems remain invariant under the group of translations transforming $(\bar X,S;\mu,\Sigma)\to(\bar X+b,S;\mu+b,\Sigma)$, $b\in R^p$.
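The effect of these transformations is to reduce these problems to corresponding ones where $\mu=0$ and $S=\sum_{i=1}^{n}X_iX_i'$; thus we reduce $n$ by 1. We now treat the latter formulation with $\mu=0$ and assume that $n>p\ge2$, so that $S$ is positive definite with probability one. Let $G$ be the group of $p\times p$ nonsingular matrices $g$ of the form
$$g=\begin{pmatrix}g_{(11)}&0&0\\0&g_{(22)}&0\\0&g_{(32)}&g_{(33)}\end{pmatrix},\tag{6.48}$$
operating as $(S;\Sigma)\to(gSg';g\Sigma g')$ with $g\in G$. For the first problem ($p_2=0$), a maximal invariant in the sample space under $G$ is
$$R=\frac{S_{(12)}S_{(22)}^{-1}S_{(21)}}{S_{11}}.$$
A corresponding maximal invariant in the parametric space of $\Sigma$ under the induced group (with $p_2=0$) is
$$\rho^2=\frac{\Sigma_{(12)}\Sigma_{(22)}^{-1}\Sigma_{(21)}}{\Sigma_{11}}.\tag{6.49}$$
For Problems 5 and 6, a maximal invariant in the sample space under $G$ is $(\mathbf R_1,\mathbf R_2)$, where
$$\mathbf R_1=\frac{S_{(12)}S_{(22)}^{-1}S_{(21)}}{S_{11}},\tag{6.50}$$
$$\mathbf R_1+\mathbf R_2=\frac{(S_{(12)},S_{(13)})\begin{pmatrix}S_{(22)}&S_{(23)}\\S_{(32)}&S_{(33)}\end{pmatrix}^{-1}(S_{(12)},S_{(13)})'}{S_{11}}.\tag{6.51}$$
The corresponding maximal invariant in the parametric space is $(\rho_1^2,\rho_2^2)$, defined in (6.47).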
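The population quantities in (6.47) and the sample maximal invariants in (6.49)–(6.51) come from one formula, applied to $\Sigma$ or to $S$. The sketch below (function names hypothetical) computes the pair $(r_1,r_2)$ for a positive definite matrix partitioned as $1+p_1+p_2$. It then checks numerically that the pair is unchanged by $M\to gMg'$ for $g$ of the form (6.48). With $p_2=0$ it returns $r_2=0$ and $r_1$ equal to the single multiple correlation used in Problem 4.

```python
import numpy as np


def partial_multiple_correlations(M, p1):
    """(r1, r2) for a p x p positive definite M partitioned as 1 + p1 + p2.

    r1      = M_(12) M_(22)^{-1} M_(21) / M_11
    r1 + r2 = (M_(12), M_(13)) [M_(22) M_(23); M_(32) M_(33)]^{-1} (M_(12), M_(13))' / M_11
    M = Sigma gives (rho_1^2, rho_2^2); M = S gives (bold R_1, bold R_2).
    """
    m11 = M[0, 0]
    m12 = M[0, 1:1 + p1]
    r1 = m12 @ np.linalg.solve(M[1:1 + p1, 1:1 + p1], m12) / m11
    m1_rest = M[0, 1:]
    total = m1_rest @ np.linalg.solve(M[1:, 1:], m1_rest) / m11
    return r1, total - r1


def random_group_element(p, p1, rng):
    """Random g with the zero pattern of (6.48); the diagonal shift keeps it well conditioned."""
    g = rng.standard_normal((p, p)) + 4 * np.eye(p)
    g[0, 1:] = 0.0               # first row: only g_(11)
    g[1:, 0] = 0.0               # first column: only g_(11)
    g[1:1 + p1, 1 + p1:] = 0.0   # the block above g_(33) is zero
    return g


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    p, p1 = 6, 2
    X = rng.standard_normal((40, p))
    S = X.T @ X                  # mu = 0 formulation: S = sum_i X_i X_i'
    g = random_group_element(p, p1, rng)
    print(partial_multiple_correlations(S, p1))
    print(partial_multiple_correlations(g @ S @ g.T, p1))  # same pair, up to rounding
```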
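Problem 4. The invariance of this problem has been discussed in Sec. 4.4. The group $G$ does not satisfy the conditions of the Hunt–Stein theorem. The subgroup $G_T$ of $G$, given by
$$G_T=\left\{g=\begin{pmatrix}g_{(11)}&0&0\\0&g_{(22)}&0\\0&g_{(32)}&g_{(33)}\end{pmatrix}\right\},$$
where $g_{(22)}$ ($p_1\times p_1$) and $g_{(33)}$ ($p_2\times p_2$) are lower triangular, satisfies the conditions of the theorem. Note that we use the same notation $g$ for these lower triangular matrices as for the elements of $G$. In place of the vector $(\mathbf R_1,\mathbf R_2)'$ we use the vector $R=(R_2,\dots,R_p)'$ defined in Example 5.1.5, with
$$\sum_{j=2}^{p_1+1}R_j=\mathbf R_1,\qquad \sum_{j=2}^{p}R_j=\mathbf R_1+\mathbf R_2,$$
and the corresponding parameter $\delta=(\delta_2,\dots,\delta_p)'$, with
$$\sum_{i=2}^{p_1+1}\delta_i=\rho_1^2,\qquad \sum_{i=2}^{p}\delta_i=\rho^2,\qquad p=1+p_1+p_2 .$$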
For testing H
:p
:p
= A the ratio
fi of the probability density function of R under H\\ to that under HQ (using Theorem 2.7.1.) is R = (1 fR(r\H )/f (r\H )
lx R 0
A)""/
$((1  A) hT(g
2 u)
 2g Ay'g'
(11)
(22)
g{ g( 2)]v{dg)
22) 2
(6.53) where
=2 i=2 = /
JGT lG
T
g(tr
gg')v(dg). y/Tp)'
y = (v^2
Since the (p  1) x (p  1) matrix C=(lp )" (/AA') is positive definite there exists a (p 1} x (p 1) lower triangular nonsingular matrix T such that TCT' = I. Define 7i,a <,2 < i < p by i 7i = 1  E^''
j=
2
= &7j>/7<7ii
= 1  p . Let a = ( a
2
2 )
. . . , a ) ' . T h e n a = T A . Since C A = A
f
{r)
_ 1
A,
A = ACA' p.
2
Furthermore with
= (a ,
2
, on)' i
Since C(lp } " we can write (1 A ) " ' " / ' M=g
2 2 2
(654)
(tr S<7')(2tr(a ^
(11)C!
i,'o; )))
22
^ ^ (  2 t r ( ^^{2tr(g,
) )
3
+ where
n 
ay'g
( 2 2 
))
Since, with g 
I / 2
(
i>j
* j)
9 i
and the measure $q^{(1)}(\operatorname{tr}gg')\,\nu(dg)$ is invariant under the change of sign of $g$ to $-g$, we conclude as in (6.18) that
/
JGT
[tr <J(ii)Q!/'i7(22)]9 (
(1)
tr
99'Hdg)
= 0
j 9i 9ihQ (^gg')u(dg)
j
(1)
= 0
i / /, J j t f e .
(6.55)
Let L = tr gg', L e i = g j u ) , L e , = g , i = 2 , . . . , p , Le
2 p 2
i+i = G
2 + 2
,i,
i = 2, . . . , p ,
iei
+ P
( p  i ) / 2 = ffp,
\ !>j (6.56)
I f =6j/X,
M = E
e
j = 2,...,p,
1 2
$B(y,\lambda,\eta)=o(\lambda)$ uniformly in $y$ and $\eta$. Furthermore, from Theorem 6.1.1, $R$ does not depend on $q$. Since $\lambda=0$ is the single point $\eta=0$, $\xi_0$ assigns measure 1 to the point $\eta=0$. The set $\{\rho^2=\lambda\}$
component is $O(\lambda)$. Thus any probability measure $\xi_{1,\lambda}$ can be replaced by the degenerate measure which assigns measure 1 to the mean $\eta_i^*$, $i=2,\dots,p$, of $\xi_{1,\lambda}$. Choosing
$$\eta_j^*=(n-j+1)^{-1}(n-j+2)^{-1}(p-1)^{-1}\,n(n-p+1),\qquad j=2,\dots,p,\tag{6.57}$$
we see that (6.9) is satisfied with $U=\sum_{j=2}^{p}R_j$. Using Theorem 2.7.1 we get $f_R(r\mid H_1)/f_R(r\mid H_0)$
2 0 2 2 R
1 9 8 8
(1 B
2
p )'
+ ( i  p 2i )(/AA')
yG
9 ( 2 2  0
2 2 )
])p(rf )
9
= 0, 
n ( p
ff(22)9f
2 2 )
1 , / 2
^,
gg')p{dg),
2
If p
2
test against the alternative H\ which rejects Ho whenever R by a + h{p ) T(X) = T(Xh) + o(p ),
2
> C =
where h{p )
> 0 for p
= o(l). R
2
null robustness of the test follows from the fact that T(X) theorem:
against H[: p
against H TlA
3=2 + B(r,X,n).
M>j
/ . (6.58)
to
ot*> where
1 1 1 1 I 7
j) = ( n  p ) 7 ( i  i + l )  ( n  J  l  2 )  p r J = l,...,Pi. we conclude that (6.9) is satisfied with U = Rf = ,,'=2 from Theorem 6.2.3.
R
(6.59) j
a n d
(  )
follows
Theorem 6.2.4. For testing $H_0$ against $H_1':\rho_1^2=\lambda$, the level-$\alpha$ test which rejects $H_0$ whenever $\mathbf R_1\ge C_\alpha$, where $C_\alpha$ depends on the level $\alpha$ of the test, is locally minimax as $\lambda\to0$.

Problem 6. Here $\eta_j=0$, $j=2,\dots,p_1+1$. For testing $H_0$ against $H_1':\rho_2^2=\lambda$,
2B + B(r,A,n)
)=Pi+2
V i
(6.60)
where $B(r,\lambda,\eta)=o(\lambda)$ uniformly in $r$ and $\eta$. Let us now consider rejection regions of the form
$$R_K=\{y: U(y)=\mathbf R_1+K\mathbf R_2\ge C_\alpha\},\tag{6.61}$$
where $K$ is chosen such that (6.60) reduces to (6.9), and $C_\alpha$ depends on the level $\alpha$ of the test for the chosen $K$. Let $\xi_{1,\lambda}$ give measure one to a single point $(0,\dots,0,\eta^*_{p_1+2},\dots,\eta^*_p)$. For the corresponding $K$, let
$$R=\{y: U(y)=\mathbf R_1+K\mathbf R_2\ge C_\alpha\},\qquad P_0(R)=\alpha,$$
where $C_\alpha$ depends only on $\alpha$.
From Problem 4 the power of the test (which rejects conclude that h(X) > 0. Hence we have : p\ = X > 0, the test with
whenever R
with $a=p^{-1}E(X-\mu)'\Sigma^{-1}(X-\mu)$.
2. Let $X$ be an $n\times p$ random matrix with probability density function (6.2). Find the maximum likelihood estimators of $\mu$ and $\Sigma$.

References
1. N. Giri, Locally minimax tests for multiple correlations, Canad. J. Statist., pp. 53–60 (1979).
2. N. Giri, Locally minimax tests in symmetrical distributions, Ann. Inst. Stat. Math., pp. 381–394 (1988).
3. N. Giri, Multivariate Statistical Analysis, Marcel Dekker, N.Y. (1996).
4. T. Kariya and M. Eaton, Robust test of spherical symmetry, Ann. Statist., pp. 208–215 (1977).
5. T. Kariya, A robustness property of Hotelling's $T^2$ problem,
of multivariate
and optimality
of Statistical
Chapter 7. TYPE D AND E REGIONS

The notion of a type D or E region is due to Isaacson (1951). Kiefer (1958) showed that the usual F-test of the univariate general linear hypothesis possesses this property. Lehmann (1959) showed that, in finding type D regions, invariance could be invoked in the manner of the Hunt–Stein theorem. This could also be done for type E regions (if they exist), provided that one works with a group which operates as the identity on the nuisance parameter set $H$ of the testing problem.

Consider a parameter set $\Omega=\{(\theta,\eta):\theta\in\Theta,\ \eta\in H\}$ with associated distributions, where $\Theta$ is a Euclidean set. Suppose that every test function $\phi$ has a power function $\beta_\phi(\theta,\eta)$ which, for each $\eta$, is twice continuously differentiable in the components of $\theta$ at $\theta=0$, an interior point of $\Theta$. Let $Q_\alpha$ be the class of locally strictly unbiased level-$\alpha$ tests of $H_0:\theta=0$ against $H_1:\theta\neq0$. Our assumption on $\beta_\phi$ implies that all tests in $Q_\alpha$ are similar, and that $\partial\beta_\phi/\partial\theta_i\,|_{\theta=0}=0$ for all $i$ and all $\phi$ in $Q_\alpha$.

Let $\Delta_\phi(\eta)$ be the determinant of the matrix $B_\phi(\eta)$ of second derivatives of $\beta_\phi(\theta,\eta)$ with respect to the components of $\theta$ at $\theta=0$ (the Gaussian curvature). Suppose the parametrization is such that $\Delta_{\phi'}(\eta)>0$ for all $\eta$ for at least one $\phi'$ in $Q_\alpha$.
D e f i n i t i o n 7.1. T y p e . E t e s t . A test d>' is said to be of type E if <p' e and A ^ . ( n ) = max^gp,, A ^ ( n ) for all 77. D e f i n i t i o n 7.2.
D if the nuisance parameter set H is a single point. I n the problems treated earlier in Chapter 4 it seems doubtful that type E regions exist (in terms of
162
163
Lehmann's development, H is not left invariant by many transformations). We introduce here two possible optimality criteria in the same sprit as the type D and E criteria which will always be fulfilled by some test under minimum regularity assumptions. Let
A ( n ) = max A ^ ( T J ) . (7.1)
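Definition 7.3. Type $D_A$ test. A test $\phi^*$ is said to be of type $D_A$ if $\phi^*\in Q_\alpha$ and
$$\max_\eta\,[\Delta(\eta)-\Delta_{\phi^*}(\eta)]=\min_{\phi\in Q_\alpha}\max_\eta\,[\Delta(\eta)-\Delta_\phi(\eta)].\tag{7.2}$$

Definition 7.4. Type $D_M$ test. A test $\phi^*$ is said to be of type $D_M$ if $\phi^*\in Q_\alpha$ and
$$\max_\eta\,\frac{\Delta(\eta)}{\Delta_{\phi^*}(\eta)}=\min_{\phi\in Q_\alpha}\max_\eta\,\frac{\Delta(\eta)}{\Delta_\phi(\eta)}.\tag{7.3}$$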
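The additive criterion (7.2) and its multiplicative counterpart can be evaluated directly for a finite set of candidate tests on a finite grid of nuisance values. The multiplicative version minimizes the largest ratio $\Delta(\eta)/\Delta_\phi(\eta)$ instead of the largest difference. This is only an illustration under a discretization assumption: the envelope $\Delta(\eta)$ is taken over the listed candidates rather than over all of $Q_\alpha$. The name `type_DA_DM` is hypothetical. The small example shows that the two regret principles can select different tests.

```python
import numpy as np


def type_DA_DM(delta):
    """Indices of the D_A and D_M choices among finitely many candidate tests.

    delta[k, l] = Delta_{phi_k}(eta_l) > 0 for candidate test phi_k and nuisance value eta_l.
    The envelope Delta(eta) of (7.1) is taken over the listed candidates only.
    """
    delta = np.asarray(delta, dtype=float)
    envelope = delta.max(axis=0)                        # Delta(eta_l)
    additive = (envelope - delta).max(axis=1)           # additive regret, as in (7.2)
    multiplicative = (envelope / delta).max(axis=1)     # multiplicative regret
    return int(additive.argmin()), int(multiplicative.argmin())


if __name__ == "__main__":
    # Two hypothetical tests evaluated at two nuisance values.
    print(type_DA_DM([[10.0, 1.0], [7.0, 2.0]]))  # (0, 1): the two criteria disagree
```

In the example, the first test has the smaller worst-case difference (1 against 3). The second has the smaller worst-case ratio (10/7 against 2).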
These two criteria resemble the stringency and regret criteria employed elsewhere in statistics. The subscripts "A" and "M" stand for the "additive" and "multiplicative" regret principles.

Possession of these properties is invariant under two kinds of transformation. The first is any transformation on $\Theta$ (acting trivially on $H$) of the same general type as those under which type D regions retain their property. The second is an arbitrary 1–1 transformation on $H$ (acting trivially on $\Theta$). The same holds for products of the two, but of course not for more general transformations on $\Omega$. Obviously, a type E test automatically satisfies these weaker criteria.

Let us now assume that a testing problem is invariant under a group of transformations $G$ for which the Hunt–Stein theorem holds and which acts trivially on $\Theta$; that is,
$$g(\theta,\eta)=(\theta,\bar g\eta),\qquad g\in G.\tag{7.4}$$
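If $\phi_g$ is the test function defined by $\phi_g(x)=\phi(gx)$, then a trivial computation shows that
$$\Delta_{\phi_g}(\eta)=\Delta_\phi(\bar g\eta),\tag{7.5}$$
so that $\Delta(\bar g\eta)=\Delta(\eta)$. Hence, if $\phi'$ is better than $\phi''$ in the sense of type $D_A$ or $D_M$, then $\phi'_g$ is better than $\phi''_g$. All of the requirements of Lehmann (1959) are easily seen to be satisfied. We can therefore conclude that there is an almost invariant (hence, in our problems, invariant) test which is of type $D_A$ or $D_M$. This differs from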
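the way in which invariance is used on page 883 of Lehmann (1959), where it is used to reduce $Q_\alpha$ rather than $H$ as here. If the group $\bar G$ is transitive on $H$, then $\Delta(\eta)$ is constant. So is $\Delta_\phi(\eta)$ for an invariant $\phi$, which we therefore write simply as $\Delta_\phi$. In this case, suppose $\phi^*$ is invariant and of type D among invariant $\phi$, that is, $\phi^*$ maximizes $\Delta_\phi$ over all invariant $\phi$. Then $\phi^*$ is of type $D_A$ or $D_M$ among all $\phi$.

To verify these optimality properties we need the following lemma.

Lemma 7.1. Let $L$ be a class of nonnegative definite symmetric matrices of order $m$, and suppose $J$ is a fixed nonsingular member of $L$. If $\operatorname{tr}J^{-1}B$ is maximized over $B\in L$ by $B=J$, then $\det B$ is also maximized over $L$ by $B=J$. Conversely, if $L$ is convex and $J$ maximizes $\det B$, then $\operatorname{tr}J^{-1}B$ is maximized by $B=J$.

Proof. Write $A=J^{-1}B$; its characteristic roots are those of the nonnegative definite matrix $J^{-1/2}BJ^{-1/2}$.

For the first part, suppose $\operatorname{tr}A\le\operatorname{tr}I=m$ for all $B\in L$. By the inequality between the arithmetic and geometric means, $(\det A)^{1/m}\le m^{-1}\operatorname{tr}A\le1=\det I$.

For the converse, suppose $I$ maximizes $\det A$. Then it also maximizes $\operatorname{tr}A$. Otherwise $\operatorname{tr}A>m$ for some $B\in L$. By convexity $(1-a)J+aB\in L$, and $\det[(1-a)I+aA]=1+a(\operatorname{tr}A-m)+o(a)>1$ for $a$ small and positive, a contradiction.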
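Both halves of the proof can be checked numerically on $A=J^{-1}B$, whose characteristic roots are real and positive when $J$ and $B$ are positive definite:

- the gap between the arithmetic and geometric means of the roots is nonnegative;
- when $\operatorname{tr}A>m$, $\det[(1-a)I+aA]$ rises above 1 at rate $\operatorname{tr}A-m$ as $a$ increases from 0.

The sketch below shows both. The function names are hypothetical.

```python
import numpy as np


def am_gm_gap(A):
    """m^{-1} tr A - (det A)^{1/m} for A with positive real characteristic roots (>= 0 by AM-GM)."""
    roots = np.linalg.eigvals(A).real
    return roots.mean() - np.exp(np.log(roots).mean())


def det_on_segment(A, a):
    """det[(1 - a) I + a A], the convex combination used in the converse part."""
    m = A.shape[0]
    return np.linalg.det((1 - a) * np.eye(m) + a * A)


if __name__ == "__main__":
    rng = np.random.default_rng(4)
    m = 3
    F, G = rng.standard_normal((m, m)), rng.standard_normal((m, m))
    J = F @ F.T + np.eye(m)          # nonsingular member of L
    B = G @ G.T + 0.1 * np.eye(m)    # another positive definite member
    A = np.linalg.solve(J, B)        # A = J^{-1} B
    print(am_gm_gap(A))              # nonnegative
    A2 = np.diag([3.0, 0.1])         # tr A2 = 3.1 > m = 2 although det A2 = 0.3 < 1
    print(det_on_segment(A2, 0.01))  # slightly above 1
```

The second example shows why the converse needs convexity. $\det A_2<1$, yet moving a short way from $I$ toward $A_2$ increases the determinant, because the first-order change is $a(\operatorname{tr}A_2-m)$.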
Remarks. The usefulness of this lemma lies in the fact that the generalized Neyman–Pearson lemma lets us maximize $\operatorname{tr}QB_\phi$, for fixed $Q$, more easily than $\Delta_\phi=\det B_\phi$ among similar level-$\alpha$ tests. For each $Q$ we can find a $\phi_Q$ which maximizes $\operatorname{tr}QB_\phi$. A $\phi^*$ which maximizes $\Delta_\phi$ is then obtained by finding a $\phi_Q$ for which $B_{\phi_Q}=Q^{-1}$.
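In problems of the type considered here, reduction by invariance under a group $G$ which is transitive on $H$ often yields a reduced problem in which the maximal invariant is a vector $R=(R_1,\dots,R_m)'$. Its distribution depends only on $\delta=(\delta_1,\dots,\delta_m)'$, and its density ratio has the form
$$\frac{f_\delta(r)}{f_0(r)}=1+\sum_{i=1}^{m}\delta_i\Big(h_i+\sum_{j=1}^{m}a_{ij}r_j\Big)+Q(r,\delta),\tag{7.6}$$
where the $h_i$ and $a_{ij}$ are constants and $Q(r,\delta)=o(\sum_i\delta_i)$ as $\delta\to0$. Differentiating under the integral sign gives, for an invariant test $\phi$ of level $\alpha$,
$$\beta_\phi(\delta)=\alpha+\sum_{i=1}^{m}\delta_i\Big[h_i\alpha+\sum_{j=1}^{m}a_{ij}\int r_j\phi(r)f_0(r)\,\mu(dr)\Big]+o\Big(\sum_i\delta_i\Big)\tag{7.7}$$
as $\delta\to0$.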
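Theorem 7.1. In the SRR case, an invariant test $\phi^*$ is of type D among invariant tests if and only if $\phi^*$ is of the form
$$\phi^*(r)=\begin{cases}1, & \sum_i q_i\sum_j a_{ij}r_j>c,\\ 0, & \text{otherwise},\end{cases}\tag{7.8}$$
where $c$ is a constant and
$$q_i^{-1}=\text{const}\times\Big[h_i\alpha+\sum_j a_{ij}E_0\{R_j\phi^*(R)\}\Big].$$

Proof. In the SRR case, every invariant $\phi$ has a diagonal $B_\phi$ whose $i$th diagonal entry, by (7.7), is $2[h_i\alpha+\sum_j a_{ij}E_0\{R_j\phi(R)\}]$. By the Neyman–Pearson lemma, $\operatorname{tr}QB_\phi$, with $Q=\operatorname{diag}(q_1,\dots,q_m)$, is maximized over such $\phi$ by a $\phi^*$ of the form (7.8). The theorem then follows from Lemma 7.1 and the remarks following it.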
E x a m p l e 7.1. Let 0 = A
r l
^ F
 1
transitively on $H=\{\text{positive definite }\Sigma\}$ but trivially on $\Theta$, and we have the SRR case. We thus have (7.6) with $h_i=-1$ and
$$a_{ij}=\begin{cases}1, & i>j,\\ 0, & i<j,\\ N-j+1, & i=j.\end{cases}$$
Consider Hotelling's $T^2$ test $\phi_T$, whose critical region is of the form $\sum_j r_j>c$. Here $\sum_j a_{ij}=N$ for every $i$, and by symmetry $E_0\{R_j\phi_T(R)\}$ does not depend on $j$. So, with the above parametrization for $\theta$, $B_{\phi_T}$ is a multiple of the identity. But when all $q_i$ are equal, the critical region corresponding to (7.8) is of the form $\sum_j(N+p+1-2j)r_j>c$, which is not Hotelling's region if $p>1$. Hence we get the following theorem.
Theorem 7.2. For $0<\alpha<1<p<N$, Hotelling's $T^2$ test is not of type D among $G_T$-invariant tests, and hence is not of type $D_A$ (or $D_M$) among all tests.
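The statements behind Theorem 7.2 can be checked numerically. This sketch rests on two assumptions:

- the constants are $h_i=-1$ and $a_{ij}=1$ ($i>j$), $0$ ($i<j$), $N-j+1$ ($i=j$);
- under $H_0$ the vector $(R_1,\dots,R_p)$ has the Dirichlet$(\tfrac12,\dots,\tfrac12,\tfrac{N-p}{2})$ distribution, taken here as the normal-theory null law of the Hotelling maximal invariant.

The sketch computes the coefficients of $r_j$ in (7.8) when all $q_i$ are equal. It also gives a Monte Carlo estimate of the diagonal of $B_\phi$ implied by (7.7) for a linear critical region. For Hotelling's region, the estimated diagonal entries agree up to simulation error. The function names are hypothetical.

```python
import numpy as np


def example_71_constants(N, p):
    """Assumed constants: h_i = -1 and a_ij = 1 (i > j), 0 (i < j), N - j + 1 (i = j)."""
    h = -np.ones(p)
    a = np.zeros((p, p))
    for i in range(1, p + 1):          # 1-based indices as in the text
        for j in range(1, p + 1):
            if i > j:
                a[i - 1, j - 1] = 1.0
            elif i == j:
                a[i - 1, j - 1] = N - j + 1
    return h, a


def equal_q_coefficients(a):
    """Coefficients of r_j in sum_i q_i sum_j a_ij r_j when all q_i = 1 (column sums)."""
    return a.sum(axis=0)


def curvature_diagonal(h, a, weights, alpha, N, reps=400_000, seed=0):
    """Monte Carlo diagonal of B_phi, 2[h_i alpha + sum_j a_ij E_0{R_j phi(R)}], for the
    region sum_j weights_j R_j > c of level alpha, under the assumed Dirichlet null."""
    p = len(h)
    rng = np.random.default_rng(seed)
    r = rng.dirichlet([0.5] * p + [(N - p) / 2], size=reps)[:, :p]
    score = r @ np.asarray(weights, dtype=float)
    c = np.quantile(score, 1 - alpha)
    phi = (score > c).astype(float)
    e = (r * phi[:, None]).mean(axis=0)      # estimates of E_0{R_j phi(R)}
    return 2 * (h * alpha + a @ e)


if __name__ == "__main__":
    N, p = 10, 3
    h, a = example_71_constants(N, p)
    print(equal_q_coefficients(a))                          # N + p + 1 - 2j, i.e. 12, 10, 8
    print(curvature_diagonal(h, a, np.ones(p), 0.05, N))    # Hotelling: nearly equal entries
```

Every row of $a$ sums to $N$. With a symmetric region, each diagonal entry is therefore $2(-\alpha+N\,E_0\{R_1\phi_T(R)\})$. The equal-$q$ coefficients, however, are not constant in $j$.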
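Remarks. Computing a $\phi^*$ of type D directly appears difficult, because we need to evaluate integrals of the form
$$\int_{\{r:\ \sum_i c_ir_i\le c\}} r_j^{\,\beta}\,f_0(r)\,dr \tag{7.10}$$
for $\beta=0$ or 1 and for various choices of the $c_i$'s and $c$. When $\alpha$ is close to 0 or 1, we can carry out approximate computations as follows.

As $\alpha\to1$, the complement $\bar R$ of the critical region becomes a simplex with one corner at 0. Take $p=2$, write $\rho=1-\alpha$, and consider critical regions of level $\alpha$ of the form $b\,r_1+r_2>c$, so that the complement is close to the origin. From (7.10) we get
$$\rho=E_0\{1-\phi(R)\}=\frac{(N-2)c}{2b^{1/2}}+o(c)$$
as $c\to0$. Similarly,
$$E_0\{(1-\phi(R))R_1\}=\frac{(N-2)c^2}{8b^{3/2}}+o(c^2),\qquad E_0\{(1-\phi(R))R_2\}=\frac{(N-2)c^2}{8b^{1/2}}+o(c^2).$$
Substituting these in (7.7), with the constants of Example 7.1 and $E_0R_j=1/N$, gives
$$\beta_\phi(\delta)=\alpha+\rho(\delta_1+\delta_2)-\frac{\rho^2}{2(N-2)}\Big[\frac{N}{b^{1/2}}\,\delta_1+\Big(\frac{1}{b^{1/2}}+(N-1)b^{1/2}\Big)\delta_2\Big]+o(\rho^2)\sum_i\delta_i+o\Big(\sum_i\delta_i\Big),$$
where the $o(\rho^2)$ and $o(\sum_i\delta_i)$ terms are uniform in $\delta$ and $\rho$, respectively.

The product $\Delta_\phi$ of the coefficients of $\delta_1$ and $\delta_2$ is maximized when $b=(N+1)/(N-1)+o(1)$ as $\rho\to0$. With more care, one can obtain further terms in an expansion in $\rho$ for the type D choice of $b$. The argument is completed by showing that values of $b$ bounded away from $(N+1)/(N-1)$ make $E_0\{(1-\phi(R))R_1\}$ or $E_0\{(1-\phi(R))R_2\}$ too large for the test to be as good as that with $b=(N+1)/(N-1)$.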
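The $p=2$ approximations can be checked numerically. Assume the null law of $(R_1,R_2)$ is Dirichlet$(\tfrac12,\tfrac12,\tfrac{N-2}{2})$, and substitute the leading terms into (7.7) with the constants of Example 7.1. To first order in $\rho$, the product of the $\delta_1$ and $\delta_2$ coefficients is then $\rho^2\{1-\rho\,[(N+1)b^{-1/2}+(N-1)b^{1/2}]/(2(N-2))\}$. This derivation is a sketch of ours under those assumptions, not the book's computation.

The code does two things:

- it minimizes the bracket on a grid, recovering $b=(N+1)/(N-1)$;
- it compares a Monte Carlo estimate of $\rho$ with its leading term $(N-2)c/(2b^{1/2})$.

```python
import numpy as np


def first_order_penalty(b, N):
    """(N + 1) b^{-1/2} + (N - 1) b^{1/2}: O(rho) loss in the delta_1 * delta_2 coefficient product."""
    return (N + 1) / np.sqrt(b) + (N - 1) * np.sqrt(b)


def rho_leading_term(N, b, c):
    """Leading term (N - 2) c / (2 b^{1/2}) of rho = E_0{1 - phi(R)}."""
    return (N - 2) * c / (2 * np.sqrt(b))


def rho_monte_carlo(N, b, c, reps=1_000_000, seed=0):
    """P_0(b R_1 + R_2 <= c) under the assumed Dirichlet(1/2, 1/2, (N - 2)/2) null law."""
    rng = np.random.default_rng(seed)
    r = rng.dirichlet([0.5, 0.5, (N - 2) / 2], size=reps)
    return np.mean(b * r[:, 0] + r[:, 1] <= c)


if __name__ == "__main__":
    N = 10
    grid = np.linspace(0.5, 3.0, 25001)
    b_star = grid[np.argmin(first_order_penalty(grid, N))]
    print(b_star, (N + 1) / (N - 1))     # grid minimiser vs (N + 1)/(N - 1)
    print(rho_monte_carlo(N, 1.5, 0.01), rho_leading_term(N, 1.5, 0.01))
```

The Monte Carlo value sits slightly below the leading term. The neglected factor $(1-r_1-r_2)^{(N-4)/2}$ in the null density is just under 1 near the origin.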
When $\rho$ is very close to 0, all choices of $b>0$ give substantially the same power. Hence, as $\alpha\to1$, the relative departure from being of type D approaches 0 for Hotelling's $T^2$ test, and for any test with critical region $b\,r_1+r_2>c$ with $b$ fixed and positive. However, we do not know how great the departure of $\Delta_{\phi_T}$ from $\Delta$ can be for arbitrary $\alpha$. The case $p>2$ and the case $\alpha\to0$ can be treated similarly, as can the other problems considered in Chapter 4.

References
1. N. Giri and J. Kiefer, Local and Asymptotic Minimax Properties of Multivariate
of Unbiased
Tests of Simple
Statistical
Hypothesis
Parameters,
A n n . M a t h . S t a t i s t . 2 2 , 217 Nonoptimality of
Optimality Tests.,
and Randomized