
Group Invariance in Statistical Inference



Narayan C. Giri

University of Montreal, Canada

World Scientific
Singapore, New Jersey, London, Hong Kong

Published by World Scientific Publishing Co. Pte. Ltd., P O Box 128, Farrer Road, Singapore 912805. USA office: Suite 1B, 1060 Main Street, River Edge, NJ 07661. UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE.

GROUP INVARIANCE IN STATISTICAL INFERENCE
Copyright © 1996 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.

ISBN 981-02-1875-3

Printed in Singapore.

To NILIMA NABANITA NANDIAN


CONTENTS

Chapter 0. GROUP INVARIANCE
0.0. Introduction
0.1. Examples

Chapter 1. MATRICES, GROUPS AND JACOBIANS
1.0. Introduction
1.1. Matrices
1.1.1. Characteristic Roots and Vectors
1.1.2. Factorization of Matrices
1.1.3. Partitioned Matrices
1.2. Groups
1.3. Homomorphism, Isomorphism and Direct Product
1.4. Topological Transitive Groups
1.5. Jacobians

Chapter 2. INVARIANCE
2.0. Introduction
2.1. Invariance of Distributions
2.1.1. Transformation of Variable in Abstract Integral
2.2. Invariance of Testing Problems
2.3. Invariance of Statistical Tests and Maximal Invariant
2.4. Some Examples of Maximal Invariants
2.5. Distribution of Maximal Invariant
2.5.1. Existence of an Invariant Probability Measure on $O(p)$ (Group of $p \times p$ Orthogonal Matrices)
2.6. Applications
2.7. The Distribution of a Maximal Invariant in the General Case
2.8. An Important Multivariate Distribution
2.9. Almost Invariance, Sufficiency and Invariance
2.10. Invariance, Type D and E Regions

Chapter 3. EQUIVARIANT ESTIMATION IN CURVED MODELS
3.1. Best Equivariant Estimation of $\mu$ with $\Sigma$ Known
3.1.1. Maximum Likelihood Estimators
3.2. A Particular Case
3.2.1. An Application
3.2.2. Maximum Likelihood Estimator
3.3. Best Equivariant Estimation in Curved Covariance Models
3.3.1. Characterization of Equivariant Estimators of $\Sigma$
3.3.2. Characterization of Equivariant Estimators of $\Theta$

Chapter 4. SOME BEST INVARIANT TESTS IN MULTINORMALS
4.0. Introduction
4.1. Tests of Mean Vector
4.2. The Classification Problem (Two Populations)
4.3. Tests of Multiple Correlation
4.4. Tests of Multiple Correlation with Partial Information

Chapter 5. SOME MINIMAX TESTS IN MULTINORMALS
5.0. Introduction
5.1. Locally Minimax Tests
5.2. Asymptotically Minimax Tests
5.3. Minimax Tests
5.3.1. Hotelling's $T^2$ Test
5.3.2. $R^2$-Test
5.3.3. $\varepsilon$-Minimax Test (Linnik, 1966)

Chapter 6. LOCALLY MINIMAX TESTS IN SYMMETRICAL DISTRIBUTIONS
6.0. Introduction
6.1. Elliptically Symmetric Distributions
6.2. Locally Minimax Tests in $E_p(\mu, \Sigma)$
6.3. Examples

Chapter 7. TYPE D AND E REGIONS


Chapter 0 GROUP INVARIANCE

0.0. Introduction

One of the unpleasant facts of statistical problems is that they are often too big or too difficult to admit practical solutions. Statistical decisions are made on the basis of sample observations, and these often contain information that is not relevant to the decision. Some simplification is achieved by characterizing decision rules in terms of the (minimal) sufficient statistic, which discards the part of the sample observations that is of no value for any decision concerning the parameter, and thereby reduces the dimension of the sample space to that of the minimal sufficient statistic. This, however, does not reduce the dimension of the parametric space. By introducing the group invariance principle and restricting attention to invariant decision rules, a reduction in the dimension of the parametric space is possible. Since sufficiency and group invariance are both successful in reducing the dimension of statistical problems, one is naturally interested in knowing whether both principles can be used simultaneously and, if so, in what order. Hall, Wijsman and Ghosh (1965) have shown that under certain conditions this reduction can be carried out by using both principles simultaneously, and that the order in which they are used is then immaterial. However, one can avoid verifying these conditions by replacing the sample space by the space of the sufficient statistic and then applying group invariance on the space of the sufficient statistic. In this monograph we treat only multivariate problems, where the reduction in dimension is very significant. In what follows we use the term invariance to mean group invariance.

In statistics the term invariance is used in the mathematical sense to denote a property that remains unchanged (invariant) under a group of transformations. In actual practice many statistical problems possess such a property. As in other branches of the applied sciences, it is a generally accepted principle in statistics that if a problem with a unique solution is invariant under a group of transformations, then the solution should also be invariant under it. This notion has an old origin in the statistical sciences. Apart from this natural justification for the use of invariant decision rules, the unpublished work of Hunt and Stein towards the end of the Second World War gave this principle strong support as to its applicability and meaningfulness in proving various optimum properties, such as minimaxity and admissibility, of statistical decision rules. Although a great deal has been written concerning this principle in statistical inference, little literature exists on the problem of discerning whether or not a given statistical problem is actually invariant under a certain group of transformations. Brillinger (1963) gave necessary and sufficient conditions that a statistical problem must satisfy in order to be invariant under a fairly large class of groups of transformations, including Lie groups. In this monograph we treat invariance in the framework of statistical decision rules only. De Finetti (1964), in his theory of exchangeability, treats invariance of the distribution of sample observations under finite permutations. This provides a crucial link between his theory of subjective probability and the frequency approach to probability.
The classical statistical methods take as basic a family of distributions; the true distribution of the sample observations is an unknown member of this family, about which statistical inference is required. According to De Finetti's approach, no probability is unknown. If $x_1, x_2, \ldots$ are the outcomes of a sequence of experiments conducted under similar conditions, subjective uncertainty is expressed directly by ascribing to the corresponding random variables $X_1, X_2, \ldots$ a known joint distribution. When some of the $X$'s are observed, predictive inference about the others is made by conditioning the original distribution on the observations. De Finetti has shown that these approaches are equivalent when the subjectivist's joint distribution is invariant under finite permutations.

Two other related principles known in the literature are the weak invariance and the strong invariance principles. The weak invariance principle is used to demonstrate the sufficiency of the classical assumptions associated with the weak convergence of stable laws (Billingsley, 1968). This is popularly known as Donsker's theorem (Donsker, 1951). Let $X_1, X_2, \ldots$ be independently and identically distributed random variables with mean zero and variance $\sigma^2$, and let
$$S_0 = 0, \qquad S_j = \sum_{i=1}^{j} X_i, \qquad W_n(t) = \frac{S_{[nt]}}{\sigma\sqrt{n}}, \quad 0 \le t \le 1,$$
where $[nt]$ denotes the integer part of $nt$. Donsker proved that $\{W_n(t)\}$ converges weakly to Brownian motion.
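Donsker's theorem can be illustrated by simulation. The sketch below assumes two step distributions with mean 0 and variance 1 (symmetric $\pm 1$ steps and centred exponential steps), writes $W_n(t) = S_{[nt]}/(\sigma\sqrt{n})$, and estimates $P(\max_{0 \le t \le 1} W_n(t) \le 1)$. For Brownian motion this probability is $2\Phi(1) - 1 \approx 0.683$ by the reflection principle, and both estimates should be near it regardless of the step distribution, up to a discretization bias of order $1/\sqrt{n}$. The helper name, sample sizes and seed are illustrative choices.

```python
import math
import numpy as np

def max_of_partial_sum_process(steps, sigma=1.0):
    """max over 0 <= t <= 1 of W_n(t) = S_[nt] / (sigma sqrt(n)), one value per row.

    steps : array of shape (reps, n) holding X_1, ..., X_n for each replication.
    W_n(0) = S_0 = 0 is included, so the maximum is never negative.
    """
    n = steps.shape[1]
    S = np.cumsum(steps, axis=1)                      # S_1, ..., S_n
    return np.maximum(S.max(axis=1), 0.0) / (sigma * math.sqrt(n))

rng = np.random.default_rng(2024)
reps, n = 8_000, 400
rademacher = rng.choice(np.array([-1.0, 1.0]), size=(reps, n))   # mean 0, variance 1
centred_exp = rng.exponential(1.0, size=(reps, n)) - 1.0          # mean 0, variance 1, skewed

limit = math.erf(1 / math.sqrt(2))   # P(max of Brownian motion on [0, 1] <= 1) = 2*Phi(1) - 1
p_rad = np.mean(max_of_partial_sum_process(rademacher) <= 1.0)
p_exp = np.mean(max_of_partial_sum_process(centred_exp) <= 1.0)
print(round(limit, 4), p_rad, p_exp)
```

The functional used here, the maximum, is continuous on the path space, which is exactly the kind of functional to which the invariance principle applies.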

The strong invariance principle has been introduced to prove strong convergence results (Tusnady, 1977). Here the term invariance is used in the sense that if $X_1, X_2, \ldots$ are independently distributed random variables with the same mean $0$ and the same variance $\sigma^2$, and if $h$ is a continuous functional on the space of functions on $[0,1]$, then the limiting distribution of $h$ evaluated at the normalized partial-sum process does not depend on any other property of the $X_i$.

0.1. Examples

We now give an example to show how the solution to a statistical problem can be obtained through direct application of group theoretic results.

Example 0.1.1. Let $X_\alpha = (X_{\alpha 1}, \ldots, X_{\alpha p})'$, $\alpha = 1, \ldots, N$ $(N > p)$, be independently and identically distributed $p$-variate normal vectors with the same mean $\mu = (\mu_1, \ldots, \mu_p)'$ and the same positive definite covariance matrix $\Sigma$. The parametric space $\Omega$ is the space of all $(\mu, \Sigma)$. The problem of testing $H_0: \mu = 0$ against the alternatives $H_1: \mu \neq 0$ remains unchanged (invariant) under the full linear group $G_l(p)$ of $p \times p$ nonsingular matrices $g$ transforming each $X_\alpha \to gX_\alpha$, $\alpha = 1, \ldots, N$. Let
$$\bar{X} = \frac{1}{N}\sum_{\alpha=1}^{N} X_\alpha, \qquad S = \sum_{\alpha=1}^{N} (X_\alpha - \bar{X})(X_\alpha - \bar{X})'. \tag{0.1}$$
It is well known (Giri, 1977) that $(\bar{X}, S)$ is sufficient for $(\mu, \Sigma)$. The induced transformation on the space of $(\bar{X}, S)$ is given by
$$(\bar{X}, S) \to (g\bar{X},\ gSg'), \qquad g \in G_l(p). \tag{0.2}$$

Since this transformation permits arbitrary changes of $\bar{X}$ and $S$, and any reasonable statistical test procedure should not depend on such arbitrary changes by $g$, we conclude that a reasonable statistical test procedure should depend on $(\bar{X}, S)$ only through
$$T^2 = N(N-1)\,\bar{X}' S^{-1} \bar{X}, \tag{0.3}$$
which is unchanged by (0.2) because $(g\bar{X})'(gSg')^{-1}(g\bar{X}) = \bar{X}'S^{-1}\bar{X}$ for every $g \in G_l(p)$.
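The invariance of (0.3) under (0.2) is easy to check numerically. The NumPy sketch below computes $T^2$ from an $N \times p$ data matrix whose rows are the observations $X_\alpha'$, using $\bar{X}$ and $S = \sum_\alpha (X_\alpha - \bar{X})(X_\alpha - \bar{X})'$, applies a random $g \in G_l(p)$ to every observation, and recomputes $T^2$. The function name `hotelling_t2` and the simulated data are illustrative assumptions.

```python
import numpy as np

def hotelling_t2(X):
    """T^2 = N(N-1) Xbar' S^{-1} Xbar for an N x p data matrix X whose rows are the
    observations, with S = sum_a (X_a - Xbar)(X_a - Xbar)'."""
    N = X.shape[0]
    xbar = X.mean(axis=0)
    centred = X - xbar
    S = centred.T @ centred
    # solve S v = xbar rather than forming S^{-1} explicitly
    return N * (N - 1) * xbar @ np.linalg.solve(S, xbar)

rng = np.random.default_rng(0)
N, p = 12, 3
X = rng.normal(loc=0.5, size=(N, p))     # N observations of a p-variate normal vector
g = rng.normal(size=(p, p))              # nonsingular with probability one

t2 = hotelling_t2(X)
t2_g = hotelling_t2(X @ g.T)             # row X_a' becomes (g X_a)' = X_a' g'
print(t2, t2_g)                          # equal up to rounding error
```

Any statistic that is not a function of $T^2$ (for example $\bar{X}'\bar{X}$) would change under the same $g$.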

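It is well known (Giri, 1977) that the distribution of $T^2$ is given by
$$f_{T^2}(t^2 \mid \delta^2) = \frac{e^{-\delta^2/2}}{N-1} \sum_{j=0}^{\infty} \frac{(\delta^2/2)^j\, \Gamma(\tfrac{N}{2}+j)}{j!\, \Gamma(\tfrac{p}{2}+j)\, \Gamma(\tfrac{N-p}{2})} \left(\frac{t^2}{N-1}\right)^{p/2+j-1} \left(1+\frac{t^2}{N-1}\right)^{-(N/2+j)}, \quad t^2 > 0, \tag{0.4}$$
where $\delta^2 = N\mu'\Sigma^{-1}\mu$. Under $H_0$, $\delta^2 = 0$ and under $H_1$, $\delta^2 > 0$. For every $\delta^2 > 0$ the ratio $f_{T^2}(t^2 \mid \delta^2)/f_{T^2}(t^2 \mid 0)$ is increasing in $t^2$, so applying the Neyman-Pearson Lemma we conclude from (0.4) that the uniformly most powerful test based on $T^2$ of $H_0$ against $H_1$ is the well-known Hotelling's $T^2$ test, which rejects $H_0$ for large values of $T^2$.

Note. In this problem the dimension of $\Omega$ is $p + \tfrac{1}{2}p(p+1) = \tfrac{1}{2}p(p+3)$, which is also the dimension of the space of $(\bar{X}, S)$. For the distribution of $T^2$ the parameter is a scalar quantity.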
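The density (0.4) and the monotone likelihood ratio property behind the use of the Neyman-Pearson Lemma can be checked numerically. This sketch assumes $N = 20$, $p = 3$ and $\delta^2 = 4$ purely for illustration, truncates the series after a fixed number of terms (`terms`, our choice), evaluates each term in log space with `math.lgamma`, and integrates with a plain trapezoidal rule on a finite grid.

```python
import math
import numpy as np

def t2_density(t2, delta2, N, p, terms=80):
    """Density (0.4) of T^2 with noncentrality delta2 = N mu' Sigma^{-1} mu,
    evaluated termwise in log space; t2 may be a NumPy array of positive values."""
    u = np.asarray(t2, dtype=float) / (N - 1)          # u = t^2 / (N - 1)
    total = np.zeros_like(u)
    for j in range(terms):
        if delta2 == 0 and j > 0:
            break                                       # only the j = 0 term survives
        log_coef = (-delta2 / 2 - math.lgamma(j + 1)
                    + math.lgamma(N / 2 + j) - math.lgamma(p / 2 + j)
                    - math.lgamma((N - p) / 2))
        if j > 0:
            log_coef += j * math.log(delta2 / 2)
        total += np.exp(log_coef + (p / 2 + j - 1) * np.log(u)
                        - (N / 2 + j) * np.log1p(u))
    return total / (N - 1)

def trapezoid(y, x):
    """Plain trapezoidal rule."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)

N, p = 20, 3
grid = np.linspace(1e-8, 300.0, 20001)
f0 = t2_density(grid, 0.0, N, p)
f4 = t2_density(grid, 4.0, N, p)
print(trapezoid(f0, grid), trapezoid(f4, grid))   # both close to 1
ratio = f4[::100] / f0[::100]
print(np.all(np.diff(ratio) > 0))                 # likelihood ratio increasing in t^2
```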

One main reason for the intuitive appeal of the principle that, for an invariant problem with a unique solution, the solution should be invariant is probably the belief that there should be a unique way of analysing a collection of statistical data. As a word of caution we should point out that in cases where the use of an invariant decision rule conflicts violently with the desire to make a correct decision or to have a smaller risk, it must be abandoned. We give below one such example, due to Charles Stein, as reported by Lehmann (1959, p. 338).

Example 0.1.2. Let $X = (X_1, \ldots, X_p)'$ and $Y = (Y_1, \ldots, Y_p)'$, $p \ge 2$, be independently distributed normal $p$-vectors with the same mean $0$ and positive definite covariance matrices $\Sigma$ and $\delta\Sigma$ respectively, where $\delta$ is an unknown scalar constant. Consider the problem of testing $H_0: \delta \le 1$ against $H_1: \delta > 1$. This problem is invariant under $G_l(p)$ transforming $X \to gX$, $Y \to gY$, $g \in G_l(p)$. Since this group is transitive (see Chapter 1) on the space of values of $(X, Y)$ with probability one, the uniformly most powerful invariant test of level $\alpha$ under $G_l(p)$ is the trivial test $\phi(X, Y) = \alpha$, which rejects $H_0$ with constant probability $\alpha$ for all values $(x, y)$ of $(X, Y)$. Hence the maximum power that can be achieved over the alternatives $H_1$ by any test invariant under $G_l(p)$ is also $\alpha$. But the test which rejects $H_0$ whenever
$$\frac{Y_1^2}{X_1^2} \ge C, \tag{0.5}$$
where the constant $C$ depends on the level $\alpha$, has strictly increasing power $\beta(\delta)$, whose minimum over the set $\delta \ge \delta_1 > 1$ is $\beta(\delta_1) > \beta(1) = \alpha$. For more discussions and results refer to Giri (1983a, 1983b).
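The power of the test that rejects $H_0$ when $Y_1^2/X_1^2 \ge C$ has a closed form: whatever $\Sigma$ is, the ratio is distributed as $\delta$ times an $F_{1,1}$ variable, the square of a standard Cauchy variable, so $P(F_{1,1} \ge c) = 1 - (2/\pi)\arctan\sqrt{c}$. The sketch below, with illustrative function names of our own and an arbitrary $\Sigma$, computes $C$ for level $\alpha$, evaluates $\beta(\delta)$, and confirms one value by simulation. It shows how far above the invariant bound $\alpha$ the power of this non-invariant test can go.

```python
import math
import numpy as np

def stein_critical_value(alpha):
    """C with P(Y_1^2 / X_1^2 >= C) = alpha when delta = 1 (the ratio is F(1,1))."""
    return math.tan(math.pi * (1 - alpha) / 2) ** 2

def stein_power(delta, C):
    """beta(delta) = P(delta * F(1,1) >= C) = 1 - (2/pi) * arctan(sqrt(C / delta))."""
    return 1 - (2 / math.pi) * math.atan(math.sqrt(C / delta))

alpha = 0.05
C = stein_critical_value(alpha)
for d in (1.0, 2.0, 10.0, 100.0):
    print(d, round(stein_power(d, C), 4))    # increases from alpha towards 1

# Monte Carlo check with an arbitrary positive definite Sigma (illustrative)
rng = np.random.default_rng(7)
p, reps, delta = 3, 200_000, 10.0
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)
L = np.linalg.cholesky(Sigma)
X = rng.normal(size=(reps, p)) @ L.T                      # rows ~ N_p(0, Sigma)
Y = math.sqrt(delta) * rng.normal(size=(reps, p)) @ L.T   # rows ~ N_p(0, delta * Sigma)
mc_power = np.mean(Y[:, 0] ** 2 / X[:, 0] ** 2 >= C)
print(mc_power, stein_power(delta, C))
```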

Exercises

1. Let $X_1, \ldots, X_n$ be independently and identically distributed normal random variables with the same mean $\theta$ and the same unknown variance $\sigma^2$, and let $H_0: \theta = 0$ and $H_1: \theta \neq 0$.
(a) Find the largest group of transformations which leaves the problem of testing $H_0$ against $H_1$ invariant.
(b) Using the group theoretic notion, show that the two-sided Student $t$-test is uniformly most powerful among all tests based on $|t|$.

2. Univariate General Linear Hypothesis. Let $X_1, \ldots, X_n$ be independently distributed normal random variables with $E(X_i) = \theta_i$, $\mathrm{Var}(X_i) = \sigma^2$, $i = 1, \ldots, n$. Let $\Omega$ be the linear coordinate space of dimension $n$ and let $\Pi_\omega \subset \Pi_\Omega$ be two linear subspaces of $\Omega$ such that $\dim \Pi_\Omega = k$ and $\dim \Pi_\omega = l$, $l < k$. Consider the problem of testing $H_0: \theta = (\theta_1, \ldots, \theta_n)' \in \Pi_\omega$ against the alternatives $H_1: \theta \in \Pi_\Omega$.
(a) Find the largest group of transformations which leaves the problem invariant.
(b) Using the group theoretic notions, show that the usual $F$-test is uniformly most powerful invariant for testing $H_0$ against $H_1$.

3. Let $X_1, \ldots, X_n$ be independently distributed normal random variables with the same mean $\theta$ and the same variance $\sigma^2$. For testing $H_0: \sigma^2 = \sigma_0^2$ against $H_1: \sigma^2 = \sigma_1^2$, where $\sigma_0^2$ and $\sigma_1^2$ are known, find the most powerful invariant test using the group theoretic notions.

References
1. P. Billingsley, Convergence of Probability Measures, Wiley, New York, 1968.
2. D. Brillinger, Necessary and sufficient conditions for a statistical problem to be invariant under a Lie group, Ann. Math. Statist., 34 (1963) 492-500.
3. B. De Finetti, Studies in Subjective Probability, H. E. Kyburg and H. E. Smokler, eds., Wiley, New York, (1964) 93-158.
4. M. Donsker, An invariance principle for certain probability limit theorems, Mem. Amer. Math. Soc., 6 (1951).
5. N. Giri, Hunt-Stein theorem, Encyclopedia of Statistical Sciences, Vol. 3, Kotz and Johnson, eds., Wiley, New York, 1983a, 686-689.
6. N. Giri, Invariance Concepts in Statistics, Encyclopedia of Statistical Sciences, Vol. 4, Kotz and Johnson, eds., Wiley, New York, 1983b, 219-224.
7. N. Giri, Multivariate Statistical Inference, Academic Press, New York, 1977.
8. W. J. Hall, R. A. Wijsman and J. K. Ghosh, The relationship between sufficiency and invariance with application in sequential analysis, Ann. Math. Statist., 36 (1965) 575-614.
9. E. L. Lehmann, Testing Statistical Hypotheses, Wiley, New York, 1959.
10. G. Tusnady, in Recent Developments in Statistics, J. R. Barra, F. Brodeau, G. Romier and B. Van Cutsem, eds., North-Holland, Amsterdam, (1977) 289-300.

Chapter 1 MATRICES, GROUPS AND JACOBIANS

1.0. Introduction

The study of group invariance requires knowledge of matrix algebra and group theory. We present here some basic results on matrices, groups and Jacobians without proofs. Proofs can be obtained from Giri (1993, 1996) or any textbook on these topics.

1.1. Matrices

A $p \times q$ matrix $C = (c_{ij})$ is a rectangular array of real numbers $c_{ij}$ written as
$$C = \begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1q} \\ \vdots & \vdots & & \vdots \\ c_{p1} & c_{p2} & \cdots & c_{pq} \end{pmatrix}, \tag{1.1}$$
where $c_{ij}$ is the element in the $i$th row and the $j$th column. The transpose $C'$ of $C$ is a $q \times p$ matrix obtained by interchanging the rows and the columns of $C$. If $q = p$, $C$ is called a square matrix of order $p$. If $p = 1$, $C$ is a $1 \times q$ row vector and if $q = 1$, $C$ is a $p \times 1$ column vector. A square matrix $C$ is symmetric if $C = C'$. A square matrix $C = (c_{ij})$ of order $p$ is a diagonal matrix $D(c_{11}, \ldots, c_{pp})$ with diagonal elements $c_{11}, \ldots, c_{pp}$ if all off-diagonal elements of $C$ are zero.

A diagonal matrix with unit diagonal elements is an identity matrix and is denoted by $I$. Sometimes it will be necessary to write it as $I_k$ to denote an identity matrix of order $k$.

A square matrix $C = (c_{ij})$ of order $p$ is a lower triangular matrix if $c_{ij} = 0$ for $j > i$. The determinant of a lower triangular matrix is $\det C = \prod_{i=1}^{p} c_{ii}$. We shall also write $\det C = |C|$ for convenience. A square matrix $C = (c_{ij})$ of order $p$ is an upper triangular matrix if $c_{ij} = 0$ for $i > j$, and again $\det C = \prod_{i=1}^{p} c_{ii}$. A square matrix $C$ of order $p$ is nonsingular if $\det C \neq 0$; if $\det C = 0$, then $C$ is a singular matrix. A nonsingular matrix $C$ of order $p$ is orthogonal if $CC' = C'C = I$. The inverse of a nonsingular matrix $C$ of order $p$ is the unique matrix $C^{-1}$ such that $CC^{-1} = C^{-1}C = I$. From this it follows that $\det C^{-1} = (\det C)^{-1}$.

A square matrix $C = (c_{ij})$ of order $p$, or the associated quadratic form $x'Cx = \sum_i \sum_j c_{ij} x_i x_j$, is positive definite if $x'Cx > 0$ for all $x = (x_1, \ldots, x_p)' \neq 0$. If $C$ is positive definite, then $C^{-1}$ is positive definite and, for any nonsingular matrix $A$ of order $p$, $ACA'$ is also positive definite.

1.1.1. Characteristic roots and vectors

The characteristic roots of a square matrix $C$ of order $p$ are given by the roots of the characteristic equation
$$\det(C - \lambda I) = 0, \tag{1.2}$$
where $\lambda$ is real. As $\det(\theta C\theta' - \lambda I) = \det(C - \lambda I)$ for any orthogonal matrix $\theta$ of order $p$, the characteristic roots of $C$ remain invariant under the transformation $C \to \theta C\theta'$. The vector $x = (x_1, \ldots, x_p)' \neq 0$ satisfying
$$(C - \lambda I)x = 0 \tag{1.3}$$
is the characteristic vector of $C$ corresponding to its characteristic root $\lambda$. If $x$ is a characteristic vector of $C$ corresponding to its characteristic root $\lambda$, then any scalar multiple $ax$, $a \neq 0$, is also a characteristic vector of $C$ corresponding to $\lambda$.

Some Results on Characteristic Roots and Vectors
1. The characteristic roots of a real symmetric matrix are real.
2. The characteristic vectors corresponding to distinct characteristic roots of a symmetric matrix are orthogonal.
3. The characteristic roots of a symmetric positive definite matrix $C$ are all positive.
4. Given any real symmetric matrix $C$ of order $p$, there exists an orthogonal matrix $\theta$ of order $p$ such that $\theta C\theta'$ is a diagonal matrix $D(\lambda_1, \ldots, \lambda_p)$, where $\lambda_1, \ldots, \lambda_p$ are the characteristic roots of $C$. Hence $\det C = \prod_{i=1}^{p} \lambda_i$ and $\mathrm{tr}\, C = \sum_{i=1}^{p} \lambda_i$, where $\mathrm{tr}\, C$ is the sum of the diagonal elements of $C$.
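Results 3 and 4 are easy to check numerically. The NumPy sketch below (the random test matrix is an illustrative assumption) uses `numpy.linalg.eigh`, which is intended for symmetric matrices and returns the roots in ascending order with the characteristic vectors as columns, to obtain an orthogonal $\theta$ with $\theta C\theta'$ diagonal. It then checks the determinant and trace identities and the invariance of the roots under $C \to QCQ'$ for an orthogonal $Q$.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 4
A = rng.normal(size=(p, p))
C = A @ A.T + np.eye(p)                 # an illustrative symmetric positive definite matrix

lam, vecs = np.linalg.eigh(C)           # characteristic roots (ascending) and vectors (columns)
theta = vecs.T                          # rows are orthonormal characteristic vectors

print(np.allclose(theta @ C @ theta.T, np.diag(lam)))   # Result 4: theta C theta' = D(lambda)
print(np.isclose(np.linalg.det(C), np.prod(lam)))       # det C = product of the roots
print(np.isclose(np.trace(C), np.sum(lam)))             # tr C = sum of the roots
print(np.all(lam > 0))                                  # Result 3

# the roots are unchanged by C -> Q C Q' for an orthogonal Q
Q, _ = np.linalg.qr(rng.normal(size=(p, p)))
print(np.allclose(np.linalg.eigvalsh(Q @ C @ Q.T), lam))
```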

1.1.2. Factorization of matrices

In the sequel we shall frequently use the following factorizations of matrices. For every positive definite matrix $C$ of order $p$ there exists a nonsingular matrix $A$ of order $p$ such that $C = AA'$ and, hence, there exists a nonsingular matrix $B$ ($B = A^{-1}$) of order $p$ such that $BCB' = I$.

Given a symmetric nonsingular matrix $C$ of order $p$, there exists a nonsingular matrix $A$ of order $p$ such that
$$ACA' = \begin{pmatrix} I & 0 \\ 0 & -I \end{pmatrix}, \tag{1.4}$$
where the order of $I$ is equal to the number of positive characteristic roots of $C$ and the order of $-I$ is equal to the number of negative characteristic roots of $C$. Given a symmetric positive definite matrix $C$ of order $p$, there exists a nonsingular lower triangular matrix $A$ (an upper triangular matrix $B$) of the same order $p$ such that
$$C = AA' = B'B. \tag{1.5}$$

Cholesky Decomposition. For every positive definite matrix $C$ there exists a unique lower triangular matrix $D$ with positive diagonal elements such that
$$C = DD'.$$
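The Cholesky factor can be computed by a short recursion that fills $D$ row by row, using $c_{ij} = \sum_{k \le j} d_{ik} d_{jk}$ for $j \le i$. The sketch below is the textbook Cholesky-Banachiewicz scheme; the function name and test matrix are illustrative. It compares the result with `numpy.linalg.cholesky` and also checks the factorization $BCB' = I$ with $B = D^{-1}$. Taking $B = D'$ gives the upper triangular form $C = B'B$ of (1.5).

```python
import math
import numpy as np

def cholesky_lower(C):
    """Return the lower triangular D with positive diagonal such that C = D D'.
    C must be symmetric positive definite."""
    C = np.asarray(C, dtype=float)
    p = C.shape[0]
    D = np.zeros_like(C)
    for i in range(p):
        for j in range(i + 1):
            # c_ij minus the part already explained by earlier columns
            s = C[i, j] - D[i, :j] @ D[j, :j]
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                D[i, i] = math.sqrt(s)
            else:
                D[i, j] = s / D[j, j]
    return D

C = np.array([[4.0, 2.0, 0.4],
              [2.0, 5.0, 1.0],
              [0.4, 1.0, 3.0]])
D = cholesky_lower(C)
print(np.allclose(D @ D.T, C))                  # C = D D'
print(np.allclose(D, np.linalg.cholesky(C)))    # agrees with NumPy's factor
B = np.linalg.inv(D)
print(np.allclose(B @ C @ B.T, np.eye(3)))      # B C B' = I with B = D^{-1}
```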
1.1.3. Partitioned matrices

Let $C = (c_{ij})$ be a $p \times q$ matrix and let $C$ be partitioned into submatrices $C_{ij}$ as
$$C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix},$$
where $C_{11} = (c_{ij})$ $(i = 1, \ldots, m;\ j = 1, \ldots, n)$, $C_{12} = (c_{ij})$ $(i = 1, \ldots, m;\ j = n+1, \ldots, q)$, $C_{21} = (c_{ij})$ $(i = m+1, \ldots, p;\ j = 1, \ldots, n)$ and $C_{22} = (c_{ij})$ $(i = m+1, \ldots, p;\ j = n+1, \ldots, q)$. For any square matrix
$$C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix},$$
where $C_{11}$, $C_{22}$ are square matrices and $C_{22}$ is nonsingular,

(1) $\det(C) = \det(C_{22}) \det(C_{11} - C_{12} C_{22}^{-1} C_{21})$;

(2) if $C$ is symmetric, $C$ is positive definite if and only if $C_{22}$ and $C_{11} - C_{12}C_{22}^{-1}C_{21}$, or equivalently $C_{11}$ and $C_{22} - C_{21}C_{11}^{-1}C_{12}$, are positive definite.
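The determinant identity (1) and the positive definiteness criterion (2) for a partitioned matrix can be checked numerically. In the NumPy sketch below (random test matrix and helper name are illustrative), `schur` is $C_{11} - C_{12}C_{22}^{-1}C_{21}$. A symmetric matrix that is not positive definite, obtained by setting one diagonal entry to $-1$, fails the block criterion as well.

```python
import numpy as np

rng = np.random.default_rng(3)
p, m = 5, 2                                       # C11 is m x m, C22 is (p-m) x (p-m)
A = rng.normal(size=(p, p))
C = A @ A.T + np.eye(p)                           # symmetric positive definite (illustrative)
C11, C12 = C[:m, :m], C[:m, m:]
C21, C22 = C[m:, :m], C[m:, m:]

schur = C11 - C12 @ np.linalg.solve(C22, C21)     # C11 - C12 C22^{-1} C21
print(np.isclose(np.linalg.det(C), np.linalg.det(C22) * np.linalg.det(schur)))  # (1)

def is_pd(M):
    """Positive definiteness of a symmetric matrix via its smallest characteristic root."""
    return bool(np.linalg.eigvalsh(M).min() > 0)

print(is_pd(C), is_pd(C22) and is_pd(schur))      # (2): both True here

E = C.copy()
E[0, 0] = -1.0                                    # e_1' E e_1 < 0, so E is not positive definite
schur_E = E[:m, :m] - E[:m, m:] @ np.linalg.solve(E[m:, m:], E[m:, :m])
print(is_pd(E), is_pd(E[m:, m:]) and is_pd(schur_E))   # (2): both False here
```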


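Let $C^{-1} = B$ be similarly partitioned into submatrices $B_{ij}$, $i, j = 1, 2$. Then
$$C_{11}^{-1} = B_{11} - B_{12}B_{22}^{-1}B_{21}, \qquad C_{22}^{-1} = B_{22} - B_{21}B_{11}^{-1}B_{12}, \qquad C_{11}^{-1}C_{12} = -B_{12}B_{22}^{-1}. \tag{1.6}$$

1.2. Groups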

A group $G$ is a set with an operation $\tau$ satisfying the following axioms.
A1. For any two elements $a, b \in G$, $a\tau b \in G$.
A2. For any three elements $a, b, c \in G$, $(a\tau b)\tau c = a\tau(b\tau c)$.
A3. There exists an element $e \in G$ (identity element) such that for all $a \in G$, $a\tau e = a$.
A4. For any $a \in G$ there exists an element $a^{-1} \in G$ (inverse element) such that $a\tau a^{-1} = e$.

In what follows we will write, for convenience of notation, $a\tau b = ab$; the reader should not confuse this with the arithmetic product $ab$. A group $G$ is abelian if $ab = ba$ for any pair of elements $a, b$ belonging to $G$. A nonempty subset $H$ of $G$ is a subgroup if the restriction of the group operation $\tau$ to $H$ satisfies the axioms A1, ..., A4.

Examples of Groups

Example 1.1. Let $\mathcal{X}$ be a set and $G$ the set of all one-to-one mappings $g: \mathcal{X} \to \mathcal{X}$ of $\mathcal{X}$ onto $\mathcal{X}$; that is, $g(x) = g(y)$, $x, y \in \mathcal{X}$, implies $x = y$, and for every $y \in \mathcal{X}$ there exists $x \in \mathcal{X}$ such that $y = g(x)$. With the group operation defined by $g_1\tau g_2(x) = g_1(g_2(x))$, $g_1, g_2 \in G$, $G$ forms a permutation group.

Example 1.2. The additive group of real numbers is the set of all reals with the group operation $ab = a + b$. The multiplicative group of nonzero reals is the set of all nonzero reals with the group operation $ab$ equal to $a$ multiplied by $b$.

Example 1.3. Let $\mathcal{X}$ be a linear space of dimension $n$. Define, for $x_0 \in \mathcal{X}$, $g_{x_0}(x) = x + x_0$, $x \in \mathcal{X}$. The set of all $\{g_{x_0}\}$ forms an additive abelian group and is called the translation group.

Example 1.4. Let $\mathcal{X}$ be a linear space of dimension $n$ and let $G_l(n)$ be the set of all nonsingular linear transformations of $\mathcal{X}$ onto $\mathcal{X}$. $G_l(n)$ with matrix multiplication as the group operation is called the full linear group.

Example 1.5. The affine group is the set of pairs $(g, x)$, $x \in \mathcal{X}$, $g \in G_l(n)$, with the group operation defined by
$$(g_1, x_1)(g_2, x_2) = (g_1 g_2,\ g_1 x_2 + x_1), \qquad g_1, g_2 \in G_l(n),\ x_1, x_2 \in \mathcal{X}.$$
The identity element and the inverses are $(I, 0)$ and $(g, x)^{-1} = (g^{-1}, -g^{-1}x)$.
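One way to see why the operation of Example 1.5 is defined this way: identify $(g, x)$ with the map $y \to gy + x$ of $\mathcal{X}$, and the product then corresponds to composition of maps. The sketch below (function names and random elements are illustrative) checks this, together with associativity and the stated inverse.

```python
import numpy as np

def compose(a, b):
    """Affine group operation (g1, x1)(g2, x2) = (g1 g2, g1 x2 + x1)."""
    g1, x1 = a
    g2, x2 = b
    return g1 @ g2, g1 @ x2 + x1

def inverse(a):
    """Inverse element (g, x)^{-1} = (g^{-1}, -g^{-1} x)."""
    g, x = a
    g_inv = np.linalg.inv(g)
    return g_inv, -g_inv @ x

def act(a, y):
    """The pair (g, x) acting on a point y of the space: y -> g y + x."""
    g, x = a
    return g @ y + x

rng = np.random.default_rng(5)
n = 3
a, b, c = [(rng.normal(size=(n, n)), rng.normal(size=n)) for _ in range(3)]
y = rng.normal(size=n)

print(np.allclose(act(compose(a, b), y), act(a, act(b, y))))   # product = composition
lhs, rhs = compose(compose(a, b), c), compose(a, compose(b, c))
print(np.allclose(lhs[0], rhs[0]) and np.allclose(lhs[1], rhs[1]))   # associativity (A2)
ai = compose(a, inverse(a))
print(np.allclose(ai[0], np.eye(n)) and np.allclose(ai[1], 0.0))    # a a^{-1} = (I, 0)
```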
Example 1.6. The set of all nonsingular lower triangular matrices of order $p$, with the usual matrix multiplication as the group operation, forms a group $G_T(p)$ (with the identity matrix as the unit element). The set of all nonsingular upper triangular matrices of order $p$, with the identity matrix as the unit element, forms the group $G_{UT}(p)$.

Example 1.7. The set of all orthogonal matrices $\theta$ of order $p$, with matrix multiplication as the group operation, forms a group $O(p)$.

Example 1.8. Let $\{0\} = \mathcal{X}_0 \subset \mathcal{X}_1 \subset \cdots \subset \mathcal{X}_k = \mathcal{X}$ be a strictly increasing sequence of linear subspaces of $\mathcal{X}$ and let $G$ be the subgroup of $G_l(p)$ such that $g \in G$ if and only if $g\mathcal{X}_i = \mathcal{X}_i$, $i = 1, \ldots, k$. The group $G$ is the group of nonsingular linear transformations of $\mathcal{X}$ onto $\mathcal{X}$ which leave each $\mathcal{X}_i$ invariant. Choose a basis $x_1^{(i)}, \ldots, x_{n_i}^{(i)}$, $i = 1, \ldots, k$, for $\mathcal{X}_i - \mathcal{X}_{i-1}$, with $n_i = \dim(\mathcal{X}_i) - \dim(\mathcal{X}_{i-1})$. Then $g \in G$ can be written as
$$g = \begin{pmatrix} g_{(11)} & g_{(12)} & \cdots & g_{(1k)} \\ 0 & g_{(22)} & \cdots & g_{(2k)} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & g_{(kk)} \end{pmatrix}, \tag{1.7}$$
where $g_{(ij)}$ is a block of $n_i$ rows and $n_j$ columns for $i \le j$. If $n_i = 1$ for all $i$, then $G = G_{UT}(p)$.

1.3. Homomorphism, Isomorphism and Direct Product

Let $G$, $H$ be groups. A mapping $f$ of $G$ into $H$ is called a homomorphism if, for $g_1, g_2 \in G$,
$$f(g_1 g_2) = f(g_1) * f(g_2), \tag{1.8}$$
where $*$ is the group operation in $H$. If $e$ is the identity element in $G$, then $f(e)$ is the identity element in $H$ and $f(g^{-1}) = [f(g)]^{-1}$. If, in addition, $f$ is a one-to-one mapping of $G$ onto $H$, then $f$ is called an isomorphism. The Cartesian product $G \times H$ of groups $G$ and $H$ with the operation
$$(g_1, h_1)(g_2, h_2) = (g_1 g_2,\ h_1 * h_2), \qquad g_1, g_2 \in G,\ h_1, h_2 \in H,$$
is called the direct product of $G$ and $H$. A subgroup $H$ of $G$ is a normal subgroup of $G$ if $ghg^{-1} \in H$ for all $h \in H$ and $g \in G$, or equivalently if $gHg^{-1} = H$ for all $g \in G$.

Example 1.9. The group $G_T^+(p)$ of lower triangular matrices of order $p$ with positive diagonal elements is a normal subgroup of $G_T(p)$. Any subgroup of an abelian group is normal.

Let $G$ be a group and let $H$ be a normal subgroup of $G$. The set $G/H$ of all sets of the form $gH$, $g \in G$, is called the quotient group of $G$ modulo $H$, with the group operation defined, for $g_1, g_2 \in G$, by taking $(g_1H)(g_2H)$ to be the set of elements obtained by multiplying all elements of $g_1H$ with all elements of $g_2H$. The identity element is $H$ and $(gH)^{-1} = g^{-1}H$.
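The reason behind Example 1.9 is that the diagonal of a product of lower triangular matrices is the product of their diagonals, so $ghg^{-1}$ is lower triangular with the same diagonal as $h$. A quick numerical check with random matrices (the helper name is ours):

```python
import numpy as np

def random_lower(p, rng, positive_diagonal):
    """An illustrative nonsingular lower triangular matrix with random entries."""
    M = np.tril(rng.normal(size=(p, p)))
    d = rng.uniform(0.5, 2.0, size=p)
    if not positive_diagonal:
        d *= rng.choice([-1.0, 1.0], size=p)    # arbitrary signs: an element of G_T(p)
    np.fill_diagonal(M, d)
    return M

rng = np.random.default_rng(11)
p = 4
g = random_lower(p, rng, positive_diagonal=False)   # g in G_T(p)
h = random_lower(p, rng, positive_diagonal=True)    # h in G_T^+(p)
k = g @ h @ np.linalg.inv(g)

print(np.allclose(k, np.tril(k)))                   # g h g^{-1} is lower triangular
print(np.allclose(np.diag(k), np.diag(h)))          # with the same (positive) diagonal as h
```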

1.4. Topological Transitive Groups

Let $\mathcal{X}$ be a set and $\mathcal{A}$ a collection of subsets of $\mathcal{X}$. $(\mathcal{X}, \mathcal{A})$ is called a topological space if $\mathcal{A}$ satisfies the following axioms:
TA1: $\mathcal{X} \in \mathcal{A}$.
TA2: The union of an arbitrary subfamily of $\mathcal{A}$ belongs to $\mathcal{A}$.
TA3: The intersection of a finite subfamily of $\mathcal{A}$ belongs to $\mathcal{A}$.
The elements of $\mathcal{A}$ are called open sets. If, in addition, the topological space $(\mathcal{X}, \mathcal{A})$ satisfies: for $x, y \in \mathcal{X}$, $x \neq y$, there exist two open sets $A, B$ belonging to $\mathcal{A}$ such that $A \cap B = \emptyset$ with $x \in A$, $y \in B$, then it is called a Hausdorff space. A collection of open sets is a base for a topology if every open set in $\mathcal{A}$ is the union of a subcollection of sets in the base.

If a group $G$ has a topology $\tau$ defined on it such that under $\tau$ the mappings $(a, b) \to ab$ from $G \times G$ into $G$ and $a \to a^{-1}$ from $G$ into $G$ $(a, b \in G)$ are continuous, then $(G, \tau)$ is a topological group. A compact group is a topological group which is a compact space. A locally compact group is a topological group which is a locally compact space. A compact space is a Hausdorff space $\mathcal{X}$ such that every open covering of $\mathcal{X}$ can be reduced to a finite subcovering. A locally compact space is a Hausdorff space such that every point has at least one compact neighborhood.

Let $\mathcal{X}$ be a set. The group $G$ operates from the left on $\mathcal{X}$ if there exists a function on $G \times \mathcal{X}$ into $\mathcal{X}$ whose value at $(g, x)$ is denoted by $gx$ such that
(i) $ex = x$ for all $x \in \mathcal{X}$, where $e$ is the identity element of $G$;
(ii) $g_2(g_1x) = (g_2g_1)x$.
This implies that each $g \in G$ is a one-to-one mapping of $\mathcal{X}$ onto $\mathcal{X}$. Let $G$ operate from the left on $\mathcal{X}$. $G$ operates transitively on $\mathcal{X}$ if for every $x_1, x_2 \in \mathcal{X}$ there exists a $g \in G$ such that $gx_1 = x_2$.

Example 1.10. Let $\mathcal{X}$ be a linear space. The full linear group operates transitively on $\mathcal{X} - \{0\}$.

Example 1.11. The group $O(n)$ of $n \times n$ orthogonal matrices operates transitively on the set of all $n \times p$ $(n \ge p)$ real matrices $x$ satisfying $x'x = I_p$.

Example 1.12. The linear group $G_l(p)$ acts transitively on the set of all $p \times p$ positive definite matrices $s$, transforming $s \to gsg'$, $g \in G_l(p)$.

1.5. Jacobians

Let $X_1, \ldots, X_n$ be a sequence of continuous random variables with pdf $f_{X_1, \ldots, X_n}(x_1, \ldots, x_n)$ and let $Y_i = g_i(X_1, \ldots, X_n)$, $i = 1, \ldots, n$, be a set of continuous one-to-one transformations of $X_1, \ldots, X_n$. Assume that the functions $g_1, \ldots, g_n$ have continuous partial derivatives with respect to $X_1, \ldots, X_n$. Denote the inverse functions by $X_i = h_i(Y_1, \ldots, Y_n)$, $i = 1, \ldots, n$. Let $J$ be the determinant of the $n \times n$ matrix of partial derivatives
$$J = \det\left(\frac{\partial X_i}{\partial Y_j}\right)_{i, j = 1, \ldots, n}. \tag{1.9}$$
$J$ is called the Jacobian of the transformation $X_1, \ldots, X_n$ to $Y_1, \ldots, Y_n$. The pdf of $Y_1, \ldots, Y_n$ is given by
$$f_{Y_1, \ldots, Y_n}(y_1, \ldots, y_n) = f_{X_1, \ldots, X_n}\big(h_1(y_1, \ldots, y_n), \ldots, h_n(y_1, \ldots, y_n)\big)\,|J|,$$
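A sketch of the transformation formula in the linear case $Y = gX$, assuming $X \sim N_p(0, I)$ and $g \in G_l(p)$: here $h(y) = g^{-1}y$ and $J = \det(g^{-1}) = (\det g)^{-1}$, so the formula must reproduce the $N_p(0, gg')$ density. The normal density is coded directly with NumPy, $J$ is also computed from its definition by central differences of $h$, and the function names and data are illustrative.

```python
import numpy as np

def normal_pdf(y, cov):
    """Density of N_p(0, cov) at the point y."""
    p = len(y)
    quad = y @ np.linalg.solve(cov, y)
    return np.exp(-quad / 2) / np.sqrt((2 * np.pi) ** p * np.linalg.det(cov))

rng = np.random.default_rng(4)
p = 3
g = rng.normal(size=(p, p))              # nonsingular with probability one
y = rng.normal(size=p)

# J from its definition: matrix of partial derivatives dX_i / dY_j of h(y) = g^{-1} y
eps = 1e-6
J_matrix = np.column_stack([
    (np.linalg.solve(g, y + eps * e) - np.linalg.solve(g, y - eps * e)) / (2 * eps)
    for e in np.eye(p)
])
J = np.linalg.det(J_matrix)              # should equal 1 / det g

x = np.linalg.solve(g, y)                # x = h(y)
f_y_formula = normal_pdf(x, np.eye(p)) * abs(J)
f_y_direct = normal_pdf(y, g @ g.T)      # Y = gX ~ N_p(0, g g')
print(J, 1 / np.linalg.det(g))
print(f_y_formula, f_y_direct)
```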

where $|J|$ is the absolute value of $J$. We now state some results on Jacobians without proof. We refer to Giri (1994), Olkin (1962) and Roy (1957) for further results and proofs.

Some Results on Jacobians

Let $X = (X_1, \ldots, X_p)'$, $Y = (Y_1, \ldots, Y_p)' \in E^p$. The Jacobian of the transformation $X \to Y = gX$, $g \in G_l(p)$, is $(\det g)^{-1}$.

Let $X$, $Y$ be $p \times n$ matrices. The Jacobian of the transformation $X \to Y = gX$, $g \in G_l(p)$, is $(\det g)^{-n}$.

Let $X$, $Y$ be $p \times n$ matrices and let $A \in G_l(p)$, $B \in G_l(n)$. The Jacobian of the transformation $X \to Y = AXB$ is $(\det A)^{-n}(\det B)^{-p}$.

Let $g, h \in G_T(p)$ with $h = (h_{ij})$. The Jacobian of the transformation $g \to hg$ is $\prod_{i=1}^{p} (h_{ii})^{-i}$ and the Jacobian of the transformation $g \to gh$ is $\prod_{i=1}^{p} (h_{ii})^{-(p-i+1)}$.

Let $g, h \in G_{UT}(p)$ with $h = (h_{ij})$. The Jacobian of the transformation $g \to hg$ is $\prod_{i=1}^{p} (h_{ii})^{-(p-i+1)}$ and the Jacobian of the transformation $g \to gh$ is $\prod_{i=1}^{p} (h_{ii})^{-i}$.

Let $G_{BT}(p)$ be the multiplicative group of $p \times p$ nonsingular lower triangular matrices in the block form, i.e. $g \in G_{BT}(p)$ is
$$g = \begin{pmatrix} g_{(11)} & 0 & \cdots & 0 \\ g_{(21)} & g_{(22)} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ g_{(k1)} & g_{(k2)} & \cdots & g_{(kk)} \end{pmatrix},$$
where $g_{(ii)}$ are submatrices of order $p_i \times p_i$ such that $\sum_{i=1}^{k} p_i = p$. Write $\sigma_i = \sum_{j=1}^{i} p_j$, $\sigma_0 = 0$. For $g, h \in G_{BT}(p)$ the Jacobian of the transformation $g \to hg$ is $\prod_{i=1}^{k} [\det h_{(ii)}]^{-\sigma_i}$ and the Jacobian of the transformation $g \to gh$ is $\prod_{i=1}^{k} [\det h_{(ii)}]^{-(p - \sigma_{i-1})}$.

Let $G_{BUT}(p)$ be the group of nonsingular upper triangular matrices of order $p$ in the block form, i.e. $g \in G_{BUT}(p)$ is
$$g = \begin{pmatrix} g_{(11)} & g_{(12)} & \cdots & g_{(1k)} \\ 0 & g_{(22)} & \cdots & g_{(2k)} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & g_{(kk)} \end{pmatrix},$$
where $g_{(ii)}$ are submatrices of order $p_i \times p_i$ satisfying $\sum_{i=1}^{k} p_i = p$. For $g, h \in G_{BUT}(p)$ the Jacobian of the transformation $g \to hg$ is $\prod_{i=1}^{k} [\det h_{(ii)}]^{-(p - \sigma_{i-1})}$ and the Jacobian of the transformation $g \to gh$ is $\prod_{i=1}^{k} [\det h_{(ii)}]^{-\sigma_i}$.
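For the triangular groups the transformations $g \to hg$ and $g \to gh$ are linear in the free entries of $g$, so their Jacobians can be checked exactly by assembling the matrix of the linear map on those entries. The sketch below does this for $G_T(p)$. In the convention of (1.9), the Jacobian of $g \to hg$ is the reciprocal of the determinant of the forward map $g \mapsto hg$. The helper names and the random $h$ are illustrative.

```python
import numpy as np

def lower_indices(p):
    """Positions (i, j), i >= j, of the free entries of a p x p lower triangular matrix."""
    return [(i, j) for i in range(p) for j in range(i + 1)]

def forward_determinant(p, transform):
    """Determinant of the linear map g -> transform(g) on lower triangular matrices."""
    idx = lower_indices(p)
    M = np.zeros((len(idx), len(idx)))
    for col, (i, j) in enumerate(idx):
        E = np.zeros((p, p))
        E[i, j] = 1.0                        # basis matrix with a single free entry
        image = transform(E)                 # again lower triangular
        M[:, col] = [image[r, c] for (r, c) in idx]
    return np.linalg.det(M)

rng = np.random.default_rng(8)
p = 4
h = np.tril(rng.normal(size=(p, p)))
np.fill_diagonal(h, rng.uniform(0.5, 2.0, size=p))
hd = np.diag(h)
i = np.arange(1, p + 1)                      # i = 1, ..., p as in the text

J_hg = 1 / forward_determinant(p, lambda g: h @ g)   # Jacobian of g -> hg, sense of (1.9)
J_gh = 1 / forward_determinant(p, lambda g: g @ h)   # Jacobian of g -> gh
print(np.isclose(J_hg, np.prod(hd ** (-i))))
print(np.isclose(J_gh, np.prod(hd ** (-(p - i + 1)))))
```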

Let $S$ be a symmetric positive definite matrix of order $p$. The Jacobian of the transformation $S \to gSg'$, $g \in G_l(p)$, is $(\det g)^{-(p+1)}$, and that of $S \to S^{-1}$ is $(\det S)^{p+1}$. Let $S$ be a symmetric positive definite matrix of order $p$ and let $g = (g_{ij})$ be the unique lower triangular matrix with positive diagonal elements such that $S = gg'$. The Jacobian of the transformation $S \to g$ is $2^{p} \prod_{i=1}^{p} (g_{ii})^{p-i+1}$. In particular, the Jacobian of the transformation $S \to gSg'$, $g \in G_T(p)$, is $(\det g)^{-(p+1)}$.

Exercises

1. Show that for any nonsingular symmetric matrix $A$ of order $p$ and non-null $p$-vectors $x$, $y$,
(a) $(A + xy')^{-1} = A^{-1} - \dfrac{A^{-1}xy'A^{-1}}{1 + y'A^{-1}x}$, provided $1 + y'A^{-1}x \neq 0$;
(b) $x'(A + xx')^{-1}x = \dfrac{x'A^{-1}x}{1 + x'A^{-1}x}$, provided $1 + x'A^{-1}x \neq 0$;
(c) $|A + xx'| = |A|(1 + x'A^{-1}x)$.
2. Let $A$, $B$ be $p \times q$ and $q \times p$ matrices respectively. Show that $|I_p + AB| = |I_q + BA|$.
3. Show that for any lower triangular matrix $C$ the diagonal elements are its characteristic roots.
4. Let $\mathcal{X}$ be the set of all $n \times p$ $(n \ge p)$ real matrices $X$ satisfying $X'X = I_p$. Show that the group $O(n)$ of orthogonal matrices $\theta$ of order $n$, which transform $X \to \theta X$, acts transitively on $\mathcal{X}$.
5. Let $\mathcal{S}$ be the set of all symmetric positive definite matrices $s$ of order $p$. Show that $G_l(p)$, which transforms $s \to gsg'$, $g \in G_l(p)$, acts transitively on $\mathcal{S}$.
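Numerical checks are no substitute for the proofs asked for in Exercises 1 and 2, but they are a quick way to catch a misremembered formula. The sketch below checks Exercises 1(a)-(c) and 2 on random matrices. All data are illustrative, and the $p \times q$ and $q \times p$ matrices of Exercise 2 are named `P` and `Q` to avoid clashing with $A$.

```python
import numpy as np

rng = np.random.default_rng(9)
p, q = 4, 2
M = rng.normal(size=(p, p))
A = M @ M.T + np.eye(p)                  # symmetric nonsingular (here positive definite)
x, y = rng.normal(size=p), rng.normal(size=p)
Ainv = np.linalg.inv(A)

# 1(a): (A + x y')^{-1} = A^{-1} - A^{-1} x y' A^{-1} / (1 + y' A^{-1} x)
lhs_a = np.linalg.inv(A + np.outer(x, y))
rhs_a = Ainv - np.outer(Ainv @ x, y @ Ainv) / (1 + y @ Ainv @ x)
# 1(b)
lhs_b = x @ np.linalg.solve(A + np.outer(x, x), x)
rhs_b = (x @ Ainv @ x) / (1 + x @ Ainv @ x)
# 1(c)
lhs_c = np.linalg.det(A + np.outer(x, x))
rhs_c = np.linalg.det(A) * (1 + x @ Ainv @ x)
# 2: |I_p + PQ| = |I_q + QP| for p x q and q x p matrices
P, Q = rng.normal(size=(p, q)), rng.normal(size=(q, p))
lhs_2 = np.linalg.det(np.eye(p) + P @ Q)
rhs_2 = np.linalg.det(np.eye(q) + Q @ P)

print(np.allclose(lhs_a, rhs_a), np.isclose(lhs_b, rhs_b),
      np.isclose(lhs_c, rhs_c), np.isclose(lhs_2, rhs_2))
```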

References
1. N. Giri, Multivariate Statistical Inference, Academic Press, New York, 1977.
2. N. Giri, Introduction to Probability and Statistics, 2nd Edition (Expanded), Marcel Dekker, New York, 1993.
3. N. Giri, Multivariate Statistical Analysis, Marcel Dekker, New York, 1996.
4. I. Olkin, Note on the Jacobian of certain transformations useful in multivariate analysis, Biometrika, 40 (1962), 43-46.
5. S. N. Roy, Some Aspects of Multivariate Analysis, Wiley, New York, 1957.

Chapter 2 INVARIANCE

2.0. Introduction

Invariance is a mathematical term for symmetry. In practice many statistical problems involving testing of hypotheses exhibit symmetries, which impose additional restrictions on the choice of appropriate statistical tests; for example, the statistical tests must also exhibit the same kind of symmetry as is present in the problem. In this chapter we shall consider the principle of invariance of statistical testing problems only. For additional references the reader is referred to Lehmann (1959), Ferguson (1967) and Nachbin (1965).

2.1. Invariance of Distributions

Let $(\mathcal{X}, \mathcal{A})$ be a measure space and $\Omega$ the set of points $\theta$, that is, $\Omega = \{\theta\}$. Consider a function $P$ on $\Omega$ to the set of probability measures on $(\mathcal{X}, \mathcal{A})$ whose value at $\theta$ is $P_\theta$. Let $G$ be a group of transformations operating from the left on $\mathcal{X}$ such that

for every $g \in G$, $g: \mathcal{X} \to \mathcal{X}$ is $(\mathcal{A}, \mathcal{A})$ measurable; (2.1)

for every $g \in G$ there is a transformation $\bar{g}: \Omega \to \Omega$ such that if $X$ has distribution $P_\theta$, then $gX$, $X \in \mathcal{X}$, has distribution $P_{\bar{g}\theta}$, where $\bar{g}\theta \in \Omega$. (2.2)

All transformations considered in connection with invariance will be taken for granted to be one-to-one from $\mathcal{X}$ onto $\mathcal{X}$. An equivalent way of stating (2.2) is as follows:
16

Invariance

I7

P {gX
e

e A) = P- (X
g6

A),Ae

A,

(2.2a)

or

P {g- A)
9

P {A)
se

(2.2b)

This can also be written as

Pt(B) = P {gB),
s

Be A
PH*
t n e n

If all Pe are distinct, that is, 81 / 82 Pe, tribution.

9*

S a

homomorphism.

The condition (2.2a) is often known as the condition of invariance of the disVery often in statistical problems there exists a measure A such that Pa is absolutely continuous with respect to A for all 8 S fi so that p$ is the corresponding probability density function with respect to the measure A. I n other words, we can write

(2.3)

Also in great many cases of interest, it is possible to choose the measure A such that it is invariant under G, viz

\(A) = \[gA)

for all A G A,

geG.

(2.4)

D e f i n i t i o n 2 . 1 . 1 . A measure A satisfying (2.4) is left invariant. In such cases the condition of invariance of distribution under G reduces to Pgs(gx) = pe(x) for alia; e X, g e G. (2.5)

The general theory of invariant measures in a large class of topological groups was first given by Haar (1933). However, the results we need were all known by the end of the nineteenth century. In the terminology of Haar, $P_\theta(A)$ is called a positive integral of $p_\theta$, not necessarily a probability density function. The basic result is that for a large class of topological groups there is a left invariant measure, positive on open sets and finite on compact sets, and to within multiplication by a positive constant this measure is unique. Haar proved this result for locally compact topological groups. The definition of right invariant positive integrals, viz. $P_\theta(A) = P_{\bar g\theta}(Ag)$, is analogous to that for left invariant positive integrals.

Because of the pioneering work of Haar such invariant measures are called invariant (left or right) Haar measures. A rigorous presentation of Haar measure will not be attempted here; the reader is referred to Nachbin (1965). It has been shown that such invariant measures do exist, and from a left invariant Haar measure one can construct the right invariant Haar measure. We also need the following measure-theoretic notions for further developments. Let $G$ be a locally compact group and let $\mathcal{B}$ be the $\sigma$-algebra generated by the compact subsets of $G$.

Definition 2.1.1 (Relatively left invariant measure). A measure $\nu$ on $(G, \mathcal{B})$ is relatively left invariant with left multiplier $\chi(g)$ if $\nu$ satisfies

$$\nu(gB) = \chi(g)\,\nu(B) \quad \text{for all } B \in \mathcal{B}. \tag{2.5a}$$

The multiplier $\chi(g)$ is a continuous homomorphism from $G$ into $R^+$. If $\nu$ is relatively left invariant with left multiplier $\chi(g)$, then $\chi(g^{-1}) = 1/\chi(g)$ and $\chi(g^{-1})\,\nu(dg)$ is a left invariant measure.

Definition 2.1.2. A group $G$ acts topologically on the left of $\mathcal{X}$ if the mapping $T(g, x) : G \times \mathcal{X} \to \mathcal{X}$ is continuous.

Definition 2.1.3 (Proper $G$-space). Let the group $G$ act topologically on the left of $\mathcal{X}$ and let $h$ be the mapping $h : G \times \mathcal{X} \to \mathcal{X} \times \mathcal{X}$ given by

$$h(g, x) = (gx, x) \quad \text{for } g \in G,\ x \in \mathcal{X}.$$

The group $G$ acts properly on $\mathcal{X}$ if for every compact $C \subset \mathcal{X} \times \mathcal{X}$, $h^{-1}(C)$ is compact. The space $\mathcal{X}$ is called a proper $G$-space if $G$ acts properly on $\mathcal{X}$. A closely related concept is the Cartan $G$-space.

Definition 2.1.4 (Cartan $G$-space). Let $G$ act topologically on $\mathcal{X}$. Then $\mathcal{X}$ is called a Cartan $G$-space if for every $x \in \mathcal{X}$ there exists a neighborhood $V$ of $x$ such that

$$(V, V) = \{g \in G : gV \cap V \neq \emptyset\}$$

has a compact closure.

Wijsman (1967, 1978, 1985, 1986) demonstrates the verification of the conditions of Cartan $G$-space and of proper action in a number of multivariate testing problems on covariance structures.

Example 2.1.1. Let $G$ be a subgroup of the permutation group of a finite set $\mathcal{X}$. For any subset $A$ of $\mathcal{X}$ define $\lambda(A)$ = number of points in $A$. The measure $\lambda$ is an invariant measure under $G$ and is unique up to a positive constant. It is known as counting measure.

Example 2.1.2. Let $\mathcal{X}$ be the Euclidean $n$-space and $G$ the group of
translations defined by $g_{x_1} \in G$ for $x_1 \in \mathcal{X}$, where $g_{x_1}(x) = x + x_1$. Clearly $G$ acts transitively on $\mathcal{X}$. The $n$-dimensional Lebesgue measure $\lambda$ is invariant under $G$. Again $\lambda$ is unique up to a positive multiplicative constant.

Example 2.1.3. Consider the group $G_l(n)$ operating on itself from the left. Since an element $g$ operating from the left on $G_l(n)$ is linear, it has a Jacobian with respect to this linear operation. Obviously, $G_l(n)$ is an open subset of the vector space of all $n \times n$ real matrices, which has dimension $n^2$. Let us make the convention to display a matrix of dimension $n \times n$ by its column vectors of dimension $n \times 1$, writing the first column horizontally, then the second column likewise, and so forth. In this manner a matrix of dimension $n \times n$ represents a point in $R^{n^2}$. Let $g, x, y \in G_l(n)$ and let $g = (g_{ij})$, $x = (x_{ij})$, $y = (y_{ij})$. If we display these matrices by their column vectors, the relationship $y = gx$ gives us the coordinates of the point

$$(y_{11}, \ldots, y_{n1}, \ldots, y_{1n}, \ldots, y_{nn})$$

as

$$y_{ij} = \sum_{k=1}^n g_{ik}x_{kj}$$

in terms of the coordinates of $(x_{11}, \ldots, x_{n1}, \ldots, x_{1n}, \ldots, x_{nn})$. The Jacobian is thus the determinant of the $n^2 \times n^2$ matrix

$$\begin{pmatrix} g & 0 & \cdots & 0 \\ 0 & g & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & g \end{pmatrix},$$

where each $0$ indicates the null matrix of dimension $n \times n$. Hence the Jacobian of the transformation $y = gx$ is $(\det g)^n$, and the corresponding volume factor is $|\det g|^n$. Let us now define for $y \in G_l(n)$

$$d\lambda(y) = \frac{\prod_{i,j} dy_{ij}}{|\det y|^n}.$$

Then

$$d\lambda(gy) = \frac{|\det g|^n \prod_{i,j} dy_{ij}}{|\det(gy)|^n} = \frac{\prod_{i,j} dy_{ij}}{|\det y|^n} = d\lambda(y).$$

Hence $\lambda$ is an invariant measure on $G_l(n)$ under $G_l(n)$. It is also unique up to a positive multiplicative constant.

Example 2.1.4. Let $G$ be the group of $n \times n$ nonsingular lower triangular matrices with positive diagonal elements and let $G$ operate from the left on itself. We will verify that the Jacobian of the transformation $g \to hg$, $h \in G$, is $\prod_{i=1}^n (h_{ii})^i$. Thus an invariant measure on $G$ under $G$, which is unique up to a positive multiplicative constant, is

$$d\lambda(g) = \frac{\prod_{i \ge j} dg_{ij}}{\prod_{i=1}^n (g_{ii})^i}.$$

To show that the Jacobian is $\prod_{i=1}^n (h_{ii})^i$, let $g = (g_{ij})$, $h = (h_{ij})$ and $C = hg = (C_{ij})$, with $g_{ij} = h_{ij} = 0$ for $i < j$, so that $C_{ij} = \sum_{k=j}^{i} h_{ik}g_{kj}$ for $i \ge j$. This defines a transformation $(g_{ij}) \to (C_{ij})$. We can write the Jacobian of this transformation as the determinant of the matrix

$$\begin{pmatrix} \dfrac{\partial C_{11}}{\partial g_{11}} & \dfrac{\partial C_{21}}{\partial g_{11}} & \cdots & \dfrac{\partial C_{nn}}{\partial g_{11}} \\ \dfrac{\partial C_{11}}{\partial g_{21}} & \dfrac{\partial C_{21}}{\partial g_{21}} & \cdots & \dfrac{\partial C_{nn}}{\partial g_{21}} \\ \vdots & & & \vdots \\ \dfrac{\partial C_{11}}{\partial g_{nn}} & \dfrac{\partial C_{21}}{\partial g_{nn}} & \cdots & \dfrac{\partial C_{nn}}{\partial g_{nn}} \end{pmatrix},$$

which, with the coordinates ordered as $g_{11}, g_{21}, g_{22}, g_{31}, \ldots, g_{nn}$, is an $[n(n+1)/2] \times [n(n+1)/2]$ triangular matrix whose diagonal elements are $\partial C_{ij}/\partial g_{ij} = h_{ii}$. In other words, $h_{11}$ will appear once as a diagonal element, $h_{22}$ will appear twice, and finally $h_{nn}$ will appear $n$ times. Hence the Jacobian is $\prod_{i=1}^n (h_{ii})^i$.

Example 2.1.5. Let $\mathcal{S}$ be the space of all positive definite symmetric matrices $S$ of dimension $n \times n$ and let $G_l(n)$ operate on $\mathcal{S}$ as

$$S \to gSg', \quad S \in \mathcal{S},\ g \in G_l(n). \tag{2.6}$$

To show that the Jacobian of this transformation is $|\det g|^{n+1}$, let $E_{ij}$ be the matrix obtained from the $n \times n$ identity matrix by interchanging the $i$th and the $j$th rows; let $M_i(C)$ be the matrix obtained from the $n \times n$ identity matrix by multiplying the $i$th row by a nonzero constant $C$; and let $A_{ij}$ be the matrix obtained from the $n \times n$ identity matrix by adding the $j$th row to the $i$th. Since every nonsingular matrix is a product of such elementary matrices and Jacobians multiply under composition, we need only show this for the matrices $E_{ij}$, $M_i(C)$ and $A_{ij}$. This can easily be verified by the reader; for example, $\det(M_i(C)) = C$ and $M_i(C)SM_i(C)'$ is obtained from $S = (S_{ij})$ by multiplying $S_{ii}$ by $C^2$ and $S_{ij}$ for $i \neq j$ by $C$, so that the Jacobian is $C^2 \cdot C^{n-1} = C^{n+1}$.

Let us define the measure $\lambda$ on $\mathcal{S}$ by

$$d\lambda(S) = \frac{\prod_{i \le j} dS_{ij}}{(\det S)^{(n+1)/2}}.$$

Clearly, $d\lambda(S) = d\lambda(gSg')$, so that $\lambda$ is an invariant measure on $\mathcal{S}$ under the transformation (2.6). Here also $\lambda$ is unique up to a positive multiplicative constant. The group $G_l(n)$ acts transitively on $\mathcal{S}$.
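The three Jacobians of Examples 2.1.3 to 2.1.5 can be checked numerically. Each transformation is linear in its matrix argument, so its Jacobian is the determinant of the matrix of the map in the chosen coordinates. The sketch below (Python with NumPy, an assumption of this illustration; the helper `jacobian_det` is hypothetical) builds that matrix by applying the map to coordinate basis matrices and compares its determinant with $(\det g)^n$, $\prod_i (h_{ii})^i$ and $|\det g|^{n+1}$.

```python
import numpy as np


def jacobian_det(f, n, coords, symmetric=False):
    """Determinant of the matrix of the linear map f (n x n -> n x n),
    written in the coordinates M[i, j], (i, j) in coords."""
    m = len(coords)
    J = np.empty((m, m))
    for col, (i, j) in enumerate(coords):
        E = np.zeros((n, n))
        E[i, j] = 1.0
        if symmetric:
            E[j, i] = 1.0  # basis element of the space of symmetric matrices
        image = f(E)
        J[:, col] = [image[k, l] for (k, l) in coords]
    return np.linalg.det(J)


rng = np.random.default_rng(0)
n = 4
g = rng.standard_normal((n, n))                 # nonsingular with probability one
h = np.tril(rng.standard_normal((n, n)))
np.fill_diagonal(h, np.abs(np.diag(h)) + 0.5)   # positive diagonal, as in Example 2.1.4

full = [(i, j) for j in range(n) for i in range(n)]      # column by column, as in the text
lower = [(i, j) for i in range(n) for j in range(i + 1)]  # g_ij, i >= j
upper = [(i, j) for i in range(n) for j in range(i, n)]   # S_ij, i <= j

jac_213 = jacobian_det(lambda x: g @ x, n, full)
jac_214 = jacobian_det(lambda x: h @ x, n, lower)
jac_215 = jacobian_det(lambda s: g @ s @ g.T, n, upper, symmetric=True)

print(jac_213, np.linalg.det(g) ** n)
print(jac_214, np.prod(np.diag(h) ** np.arange(1, n + 1)))
print(abs(jac_215), abs(np.linalg.det(g)) ** (n + 1))
```

The coordinates of a symmetric matrix are taken as $S_{ij}$, $i \le j$, which matches the product $\prod_{i \le j} dS_{ij}$ used in Example 2.1.5.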

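Example 2.1.6. Let $H$ be a subgroup of $G$. The group $G$ acts transitively on the space $\mathcal{X} = G/H$ of left cosets $gH$ of $H$ (the quotient space of $G$ by $H$), where the group action satisfies

$$g_1(gH) = (g_1g)H \quad \text{for } g_1 \in G,\ gH \in G/H.$$

Conversely, when the group $G$ acts transitively on $\mathcal{X}$, there is a one-to-one correspondence between $\mathcal{X}$ and the quotient space $G/H$, where $H = \{g \in G : gx_0 = x_0\}$ is the isotropy subgroup of a fixed point $x_0 \in \mathcal{X}$; the correspondence is $gH \leftrightarrow gx_0$.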
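For a finite group the correspondence $gH \leftrightarrow gx_0$ can be checked exhaustively. The sketch below is a minimal illustration, assuming $G = S_3$ acting on $\{0, 1, 2\}$ (a choice made only for this example): it forms the isotropy subgroup $H$ of $x_0 = 0$, lists the cosets $gH$, and checks that $gH \to gx_0$ is well defined, bijective and compatible with the group action.

```python
from itertools import permutations

points = (0, 1, 2)
G = list(permutations(points))   # S_3: g[x] is the image of the point x


def compose(a, b):
    """The product ab, acting as (ab)(x) = a(b(x))."""
    return tuple(a[b[x]] for x in points)


x0 = 0
H = [h for h in G if h[x0] == x0]        # isotropy subgroup of x0


def coset(g):
    """The left coset gH as a frozenset of permutations."""
    return frozenset(compose(g, h) for h in H)


cosets = {coset(g) for g in G}          # the quotient space G/H

# gH -> g(x0): all elements of one coset send x0 to the same point.
phi = {}
for c in cosets:
    images = {k[x0] for k in c}
    assert len(images) == 1
    phi[c] = images.pop()

print(len(cosets), sorted(phi.values()))
```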


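2.1.1. Transformation of Variable in Abstract Integral

Let $(\mathcal{X}, \mathcal{A}, \mu)$ be a measure space and let $(\mathcal{Y}, \mathcal{C})$ be a measurable space. Suppose $\varphi$ is a measurable function on $\mathcal{X}$ into $\mathcal{Y}$. Let us define the measure $\nu$ on $(\mathcal{Y}, \mathcal{C})$ by

$$\nu(C) = \mu(\varphi^{-1}(C)), \quad C \in \mathcal{C}. \tag{2.7}$$

Theorem 2.1.1. If $f$ is a real measurable function on $\mathcal{X}$ such that $f(x) = g(\varphi(x))$, and if $f$ is integrable, then

$$\int_{\mathcal{X}} f(x)\,d\mu(x) = \int_{\mathcal{X}} g(\varphi(x))\,d\mu(x) = \int_{\mathcal{Y}} g(y)\,d\nu(y). \tag{2.8}$$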
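Theorem 2.1.1 can be illustrated on a finite measure space, where every integral is a finite sum. In the sketch below the point masses, the map $\varphi(x) = x^2$ and the function $g(y) = 3y + 1$ are hypothetical choices; $\nu$ is built exactly as in (2.7), and both sides of (2.8) are computed with exact rational arithmetic.

```python
from fractions import Fraction

# Hypothetical finite measure space (X, mu): point masses on X = {-2, ..., 2}.
mu = {-2: Fraction(1, 8), -1: Fraction(1, 4), 0: Fraction(1, 8),
      1: Fraction(1, 4), 2: Fraction(1, 4)}


def phi(x):
    """A measurable map from X into Y = {0, 1, 4}."""
    return x * x


def g(y):
    """A real function on Y; f = g o phi is then a function on X."""
    return 3 * y + 1


# Induced measure (2.7): nu(C) = mu(phi^{-1}(C)); on a finite space the
# point masses nu({y}) determine nu.
nu = {}
for x, m in mu.items():
    nu[phi(x)] = nu.get(phi(x), Fraction(0)) + m

lhs = sum(g(phi(x)) * m for x, m in mu.items())   # integral over X w.r.t. mu
rhs = sum(g(y) * m for y, m in nu.items())         # integral over Y w.r.t. nu
print(lhs, rhs)
```

With these masses both integrals equal exactly 7.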

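For a proof of this theorem, the reader is referred to Lehmann [(1959), p. 38].

Example 2.1.6 (Wishart distribution). Let $X_1, \ldots, X_n$ be $n$ independent and identically distributed normal $p$-vectors with mean $0$ and covariance matrix $\Sigma$ (positive definite), and let $S = \sum_{i=1}^n X_iX_i'$. Assume $n > p$ so that $S$ is positive definite almost everywhere. The joint probability density function of $X_1, \ldots, X_n$ is given by

$$p(x_1, \ldots, x_n) = (2\pi)^{-np/2}(\det\Sigma)^{-n/2}\exp\Big\{-\tfrac12\operatorname{tr}\Sigma^{-1}\sum_{i=1}^n x_ix_i'\Big\}. \tag{2.9}$$

Using Theorem 2.1.1, we get for any measurable set $A$ in the space of $S$

$$\begin{aligned} P(S \in A) &= (2\pi)^{-np/2}\int_{s = \sum_i x_ix_i' \in A}(\det\Sigma)^{-n/2}\exp\Big\{-\tfrac12\operatorname{tr}\Sigma^{-1}s\Big\}\prod_{i=1}^n dx_i \\ &= (2\pi)^{-np/2}\int_A(\det\Sigma)^{-n/2}\exp\Big\{-\tfrac12\operatorname{tr}\Sigma^{-1}s\Big\}\,dm(s), \end{aligned} \tag{2.10}$$

where $m$ is the measure corresponding to the measure $\nu$ of Theorem 2.1.1. Let us define the measure $m'$ by

$$dm'(s) = (\det s)^{-n/2}\,dm(s). \tag{2.11}$$

To find the distribution of $S$, which is popularly called the Wishart distribution $W(\Sigma, n)$ with parameter $\Sigma$ and degrees of freedom $n$, it is sufficient to find $dm'$. To do this let us first observe the following:

(i) Since $\Sigma$ is positive definite symmetric, there exists $a \in G_l(p)$ such that $\Sigma = aa'$.

(ii) Let $\Sigma = I$. As the $X_i$'s are independently normally distributed with mean vector $0$ and covariance matrix $I$ (identity matrix), $S$ is distributed as $W(I, n)$.

Thus

$$P_{aa'}(S \in A) = (2\pi)^{-np/2}\int_{s \in A}\big(\det((aa')^{-1}s)\big)^{n/2}\exp\Big\{-\tfrac12\operatorname{tr}(aa')^{-1}s\Big\}\,dm'(s);$$

$$P_I(aSa' \in A) = (2\pi)^{-np/2}\int_{s \in A}\big(\det((aa')^{-1}s)\big)^{n/2}\exp\Big\{-\tfrac12\operatorname{tr}(aa')^{-1}s\Big\}\,dm'(a^{-1}sa'^{-1}).$$

Since $P_{aa'}(S \in A) = P_I(aSa' \in A)$ for all measurable sets $A$ in the space of $S$, we must have for all $g \in G_l(p)$

$$dm'(s) = dm'(gsg'). \tag{2.12}$$

By Example 2.1.5 we get

$$dm'(s) = C\,\frac{ds}{(\det s)^{(p+1)/2}},$$

where $ds = \prod_{i \le j} ds_{ij}$ and $C$ is a positive constant.

Thus the probability density function of the Wishart random variable is given by

$$W(\Sigma, n) = C(\det\Sigma)^{-n/2}(\det s)^{(n-p-1)/2}\exp\Big\{-\tfrac12\operatorname{tr}\Sigma^{-1}s\Big\}, \tag{2.13}$$

where $C$ (into which the factor $(2\pi)^{-np/2}$ has been absorbed) is a universal constant not depending on $\Sigma$. To evaluate $C$ let us first observe that $C$ is a function of $n$ and $p$ only. Let us write it as $C = C_{n,p}$. Since $C_{n,p}$ is a constant for all values of $\Sigma$, we can, in particular, take $\Sigma = I$ to evaluate $C_{n,p}$. Write

$$S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix},$$

where $S_{11}$ is $(p-1) \times (p-1)$. Make the transformations

$$T_{11} = S_{11}, \qquad T_{12} = S_{12}, \qquad T_{22} = S_{22} - S_{21}S_{11}^{-1}S_{12} = Z \ \text{(say)},$$

which have a Jacobian equal to unity. Using the facts that

$$|S| = |S_{11}|\,|S_{22} - S_{21}S_{11}^{-1}S_{12}|, \qquad \operatorname{tr}S = \operatorname{tr}S_{11} + Z + S_{21}S_{11}^{-1}S_{12},$$

and

$$\int C_{n,p}|S|^{(n-p-1)/2}\exp\Big\{-\tfrac12\operatorname{tr}S\Big\}\,dS = 1 \quad \text{for all } p, n,$$

we get

$$1 = C_{n,p}\int\!\!\int\!\!\int |S_{11}|^{(n-p-1)/2}z^{(n-p-1)/2}\exp\Big\{-\tfrac12\big(\operatorname{tr}S_{11} + z + S_{21}S_{11}^{-1}S_{12}\big)\Big\}\,dz\,dS_{12}\,dS_{11}.$$

Integrating over $S_{12}$ (a normal integral, giving $(2\pi)^{(p-1)/2}|S_{11}|^{1/2}$) and over $z$ (a gamma integral, giving $2^{(n-p+1)/2}\Gamma((n-p+1)/2)$), we obtain

$$1 = C_{n,p}(2\pi)^{(p-1)/2}2^{(n-p+1)/2}\Gamma\Big(\frac{n-p+1}{2}\Big)\int|S_{11}|^{(n-p)/2}\exp\Big\{-\tfrac12\operatorname{tr}S_{11}\Big\}dS_{11} = \frac{C_{n,p}}{C_{n,p-1}}(2\pi)^{(p-1)/2}2^{(n-p+1)/2}\Gamma\Big(\frac{n-p+1}{2}\Big),$$

since $(n-p)/2 = (n-(p-1)-1)/2$. Since $C_{n,1} = 2^{-n/2}/\Gamma(n/2)$, we have

$$C_{n,p} = \frac{C_{n,p-1}\,\pi^{-(p-1)/2}\,2^{-n/2}}{\Gamma\big(\frac{n-p+1}{2}\big)} = \frac{\pi^{-p(p-1)/4}\,2^{-np/2}}{\prod_{i=1}^p\Gamma\big(\frac{n-i+1}{2}\big)}.$$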
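A quick consistency check of the last step: iterating the recursion from $C_{n,1} = 2^{-n/2}/\Gamma(n/2)$ must reproduce the closed form, which is the familiar $1/\{2^{np/2}\Gamma_p(n/2)\}$ with $\Gamma_p$ the multivariate gamma function. The sketch below (plain Python; working on the log scale is our choice, to avoid overflow) compares the two.

```python
import math


def log_c_recursive(n, p):
    """log C_{n,p} obtained by iterating
    C_{n,q} = C_{n,q-1} pi^{-(q-1)/2} 2^{-n/2} / Gamma((n-q+1)/2)
    from C_{n,1} = 2^{-n/2} / Gamma(n/2)."""
    log_c = -(n / 2) * math.log(2) - math.lgamma(n / 2)
    for q in range(2, p + 1):
        log_c += (-(q - 1) / 2 * math.log(math.pi)
                  - (n / 2) * math.log(2)
                  - math.lgamma((n - q + 1) / 2))
    return log_c


def log_c_closed(n, p):
    """log of pi^{-p(p-1)/4} 2^{-np/2} / prod_{i=1}^p Gamma((n-i+1)/2)."""
    return (-p * (p - 1) / 4 * math.log(math.pi)
            - (n * p / 2) * math.log(2)
            - sum(math.lgamma((n - i + 1) / 2) for i in range(1, p + 1)))


for n, p in [(5, 1), (5, 3), (10, 4)]:
    print(n, p, math.exp(log_c_recursive(n, p)), math.exp(log_c_closed(n, p)))
```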

2.2. Invariance of Testing Problems

We have seen in the previous section that for the invariance of the statistical distribution, the transformation $\bar g$ of $g \in G$ must satisfy $\bar g\theta \in \Omega$ for $\theta \in \Omega$.

Definition 2.2.1 (Invariance of parametric space). The parametric space $\Omega$ remains invariant under a group of transformations $G : \mathcal{X} \to \mathcal{X}$ if
(i) for $g \in G$, the induced mapping $\bar g$ on $\Omega$ satisfies $\bar g\theta \in \Omega$ for all $\theta \in \Omega$;
(ii) for any $\theta' \in \Omega$ there exists $\theta \in \Omega$ such that $\bar g\theta = \theta'$.
An equivalent way of writing (i) and (ii) is

$$\bar g\Omega = \Omega. \tag{2.14}$$

It may be remarked that if $P_\theta$ for different values of $\theta \in \Omega$ are distinct, then $\bar g$ is one-to-one. The following theorem asserts that the set of all induced transformations $\bar g$, $g \in G$, also forms a group.

Theorem 2.2.1. Let $g_1, g_2$ be two transformations which leave $\Omega$ invariant. The transformations $g_2g_1$ and $g_1^{-1}$ defined by

$$g_2g_1(x) = g_2(g_1(x)), \qquad g_1^{-1}g_1(x) = x \tag{2.15}$$

for all $x \in \mathcal{X}$ leave $\Omega$ invariant, and

$$\overline{g_2g_1} = \bar g_2\bar g_1, \qquad \overline{g_1^{-1}} = (\bar g_1)^{-1}.$$

Proof. We know that if the distribution of $X$ is $P_\theta$, $\theta \in \Omega$, the distribution of $gX$ is $P_{\bar g\theta}$, $\bar g\theta \in \Omega$. Hence the distribution of $g_2g_1(X)$ is $P_{\bar g_2\bar g_1\theta}$; as $\bar g_1\theta \in \Omega$ and $\bar g_2$ leaves $\Omega$ invariant, $\bar g_2\bar g_1\theta \in \Omega$. Thus $g_2g_1$ leaves $\Omega$ invariant and, obviously, $\overline{g_2g_1} = \bar g_2\bar g_1$. The reader may find it instructive to verify the other assertion similarly.

Let us now consider the problem of testing the hypothesis $H_0 : \theta \in \Omega_{H_0}$ against the alternatives $H_1 : \theta \in \Omega_{H_1}$, where $\Omega_{H_0}$ and $\Omega_{H_1}$ are two disjoint subsets of $\Omega$. Let $G$ be a group of transformations which operate from the left on $\mathcal{X}$ satisfying conditions (i) and (ii) above and (2.14). In what follows we will assume that each $g \in G$ is measurable, which ensures that whenever $X$ is a random variable, $gX$ is also a random variable.

Definition 2.2.2 (Invariance of statistical problem). The problem of testing $H_0 : \theta \in \Omega_{H_0}$ against the alternatives $H_1 : \theta \in \Omega_{H_1}$ remains invariant under a group of transformations $G$ operating from the left on $\mathcal{X}$ if
(i) for $g \in G$, $A \in \mathcal{A}$, $P_\theta(A) = P_{\bar g\theta}(gA)$;
(ii) $\bar g\Omega_{H_0} = \Omega_{H_0}$, $\bar g\Omega_{H_1} = \Omega_{H_1}$ for all $g \in G$.

2.3. Invariance of Statistical Tests and Maximal Invariant

Let $(\mathcal{X}, \mathcal{A}, \lambda)$ be a measure space. Suppose $p_\theta$, $\theta \in \Omega$, is the probability density function of the random variable $X \in \mathcal{X}$ with respect to the measure $\lambda$. Consider a group $G$ operating on $\mathcal{X}$ from the left and suppose that the measure $\lambda$ is invariant with respect to $G$, that is, $\lambda(gA) = \lambda(A)$ for all $A \in \mathcal{A}$, $g \in G$. Let $(\mathcal{Y}, \mathcal{C})$ be a measurable space and suppose $T(X)$ is a measurable mapping from $\mathcal{X}$ into $\mathcal{Y}$.

Definition 2.3.1 (Invariant function). $T(X)$ is an invariant function on $\mathcal{X}$ under $G$ operating from the left on $\mathcal{X}$ if $T(X) = T(gX)$ for all $g \in G$, $X \in \mathcal{X}$.

Definition 2.3.2 (Maximal invariant). $T(X)$ is a maximal invariant function on $\mathcal{X}$ under $G$ if $T(X)$ is invariant under $G$ and $T(X) = T(Y)$ for $X, Y \in \mathcal{X}$ implies that there exists $g \in G$ such that $Y = gX$.

If a problem of testing $H_0 : \theta \in \Omega_{H_0}$ against the alternatives $H_1 : \theta \in \Omega_{H_1}$ remains invariant under the group of transformations $G$, it is natural to restrict attention to statistical tests which are also invariant under $G$. Let $\varphi(X)$, $X \in \mathcal{X}$, be a statistical test for testing $H_0$ against $H_1$, that is, $\varphi(x)$ is the probability of rejecting $H_0$ when $x$ is the observed sample point. If $\varphi$ is invariant under $G$ then $\varphi$ must satisfy

$$\varphi(x) = \varphi(gx), \quad x \in \mathcal{X},\ g \in G. \tag{2.16}$$

A useful characterization of an invariant test $\varphi(X)$ in terms of the maximal invariant $T(X)$ on $\mathcal{X}$ is given in the following theorem.

Theorem 2.3.1. Let $T(X)$ be a maximal invariant on $\mathcal{X}$ under $G$. A test $\varphi(X)$ is invariant under the group $G$ if and only if there exists a function $h$ for which $\varphi(X) = h(T(X))$.

Proof. Let $\varphi(X) = h(T(X))$. Obviously, $\varphi(x) = \varphi(g(x))$, $g \in G$, $x \in \mathcal{X}$. Conversely, if $\varphi(x)$ is invariant and if $T(x) = T(y)$, $x, y \in \mathcal{X}$, then there exists a $g \in G$ such that $y = gx$, and therefore $\varphi(x) = \varphi(y)$. Thus $\varphi$ is constant on each set $\{x : T(x) = t\}$, and $h(t) = \varphi(x)$, for any $x$ with $T(x) = t$, is well defined.

In general, if $\varphi$ is invariant and measurable, then $\varphi(x) = h(T(x))$, but $h$ may not be a Borel measurable function. However, if the range of the maximal invariant $T(X)$ is Euclidean and $T$ is Borel measurable, then $\mathcal{A}_1 = \{A : A \in \mathcal{A} \text{ and } gA = A \text{ for all } g \in G\}$ is the smallest $\sigma$-field with respect to which $T$ is measurable, and $\varphi$ is invariant if and only if $\varphi(x) = h(T(x))$ where $h$ is a Borel measurable function. This result is essentially due to Blackwell (1956).

Theorem 2.3.2. Let $\bar G$ be the group of induced transformations on $\Omega$ (induced by the group $G$). Let $\nu(\theta)$, $\theta \in \Omega$, be a maximal invariant on $\Omega$ under $\bar G$. The distribution of $T(X)$ (or of any function of $T(X)$) depends on $\Omega$ only through $\nu(\theta)$.

Proof. Suppose $\nu(\theta_1) = \nu(\theta_2)$, $\theta_1, \theta_2 \in \Omega$. Then there exists a $\bar g \in \bar G$ such that $\theta_2 = \bar g\theta_1$. Now for any measurable set $C$

$$P_{\theta_1}(T(X) \in C) = P_{\theta_1}(T(gX) \in C) = P_{\bar g\theta_1}(T(X) \in C) = P_{\theta_2}(T(X) \in C).$$
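As a simulation illustration of Theorem 2.3.2 (the model is an assumption made for the example): let $X \sim N_n(\theta, I)$ and $G = O(n)$ acting by $X \to gX$. Then $T(X) = X'X$ is a maximal invariant on $\mathcal{X}$, the induced group on $\Omega = R^n$ is again $O(n)$, and $\nu(\theta) = \theta'\theta$ is a maximal invariant on $\Omega$. The sketch (NumPy) draws $T(X)$ for two parameter points with the same $\theta'\theta$ and for one with a different value; the first two empirical distributions should agree up to Monte Carlo error, the third should not.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 3, 200_000


def sample_T(theta):
    """Draws of the maximal invariant T(X) = X'X for X ~ N_n(theta, I)."""
    x = theta + rng.standard_normal((reps, n))
    return np.sum(x * x, axis=1)


theta_1 = np.array([2.0, 0.0, 0.0])
theta_2 = np.array([0.0, 1.2, 1.6])   # theta_2'theta_2 = 1.44 + 2.56 = 4 = theta_1'theta_1
theta_3 = np.array([1.0, 0.0, 0.0])   # theta_3'theta_3 = 1

t1, t2, t3 = sample_T(theta_1), sample_T(theta_2), sample_T(theta_3)
qs = [0.1, 0.5, 0.9]
print(np.quantile(t1, qs))
print(np.quantile(t2, qs))   # close to the line above
print(np.quantile(t3, qs))   # clearly different
```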

2.4. Some Examples of Maximal Invariants

Example 2.4.1. Let $X = (X_1, \ldots, X_n)' \in \mathcal{X}$ and let $G$ be the group of translations

$$g_C(X) = (X_1 + C, \ldots, X_n + C)', \quad -\infty < C < \infty.$$

A maximal invariant on $\mathcal{X}$ under $G$ is $T(X) = (X_1 - X_n, \ldots, X_{n-1} - X_n)'$. Obviously, it is invariant. Suppose $x = (x_1, \ldots, x_n)'$, $y = (y_1, \ldots, y_n)' \in \mathcal{X}$ and let $T(x) = T(y)$. Writing $C = y_n - x_n$ we get $y_i = x_i + C$ for $i = 1, \ldots, n$. Another maximal invariant is the redundant $(X_1 - \bar X, \ldots, X_n - \bar X)'$, where $\bar X = \frac1n\sum_{j=1}^n X_j$. It may be further observed that if $n = 1$ there is no nontrivial maximal invariant: the whole space forms a single orbit, in the sense that given any two points in the space there exists $g_C \in G$ transforming one point into the other. In other words, $G$ acts transitively on $\mathcal{X}$.

Example 2.4.2. Let $\mathcal{X}$ be a Euclidean $n$-space, and let $G$ be the group of scale changes, that is, $g_a \in G$,

$$g_a(X) = (aX_1, \ldots, aX_n)', \quad a > 0.$$
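The translation example, Example 2.4.1, translates directly into code. In the sketch below (NumPy; the function names are hypothetical), `T` computes the maximal invariant and `translation_linking` returns the constant $C$ with $y = g_C(x)$ whenever $T(x) = T(y)$, which is exactly the argument used to show maximality.

```python
import numpy as np


def T(x):
    """Maximal invariant of Example 2.4.1: (x_1 - x_n, ..., x_{n-1} - x_n)'."""
    x = np.asarray(x, dtype=float)
    return x[:-1] - x[-1]


def translation_linking(x, y, atol=1e-12):
    """If T(x) == T(y), return C such that y = x + C (i.e. y = g_C(x)); else None."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if not np.allclose(T(x), T(y), rtol=0.0, atol=atol):
        return None
    return y[-1] - x[-1]


x = np.array([1.0, 4.0, -2.0, 0.5])
print(T(x), T(x + 7.0))                  # identical: T is invariant
print(translation_linking(x, x + 3.25))  # 3.25
print(translation_linking(x, 2 * x))     # None: not on the same orbit
```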

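Example 2.4.3. Let $\mathcal{X} = \mathcal{X}_1 \times \mathcal{X}_2$, where $\mathcal{X}_1$ is the $p$-dimensional Euclidean space and $\mathcal{X}_2$ is the set of all $p \times p$ symmetric positive definite matrices. Let $X \in \mathcal{X}_1$, $S \in \mathcal{X}_2$. Write

$$X = (X_{(1)}', \ldots, X_{(k)}')', \qquad S = (S_{(ij)}),$$

where $X_{(i)}$ is $d_i \times 1$ and $S_{(ij)}$ is $d_i \times d_j$, such that $\sum_{i=1}^k d_i = p$. Let $G$ be the group of all $p \times p$ nonsingular block lower triangular matrices of the form

$$g = \begin{pmatrix} g_{(11)} & 0 & \cdots & 0 \\ g_{(21)} & g_{(22)} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ g_{(k1)} & g_{(k2)} & \cdots & g_{(kk)} \end{pmatrix},$$

where $g_{(ij)}$ is $d_i \times d_j$, operating as $(X, S) \to (gX, gSg')$. Let

$$T(X, S) = (R_1^2, \ldots, R_k^2),$$

where

$$\sum_{j=1}^i R_j^2 = X_{[i]}'S_{[ii]}^{-1}X_{[i]}, \quad i = 1, \ldots, k, \tag{2.17}$$

with $X_{[i]} = (X_{(1)}', \ldots, X_{(i)}')'$ and $S_{[ii]}$ the upper left-hand $(d_1 + \cdots + d_i) \times (d_1 + \cdots + d_i)$ submatrix of $S$. Obviously, $R_i^2 \ge 0$. Let us show that $(R_1^2, \ldots, R_k^2)$ is a maximal invariant on $\mathcal{X}$ under $G$. We shall prove it for the case $k = 2$; the general case can be proved in a similar fashion. Let us first observe the following:

(i) If $X \to gX$, $S \to gSg'$, then

$$X_{(1)} \to g_{(11)}X_{(1)}, \qquad S_{(11)} \to g_{(11)}S_{(11)}g_{(11)}',$$

and $X'S^{-1}X \to (gX)'(gSg')^{-1}(gX) = X'S^{-1}X$. Thus $(R_1^2, R_1^2 + R_2^2)$ is invariant under $G$.

(ii)

$$X'S^{-1}X - X_{(1)}'S_{(11)}^{-1}X_{(1)} = (X_{(2)} - S_{(21)}S_{(11)}^{-1}X_{(1)})'(S_{(22)} - S_{(21)}S_{(11)}^{-1}S_{(12)})^{-1}(X_{(2)} - S_{(21)}S_{(11)}^{-1}X_{(1)}).$$

Now, suppose that for $X, Y \in \mathcal{X}_1$; $S, T \in \mathcal{X}_2$,

$$\big(X_{(1)}'S_{(11)}^{-1}X_{(1)},\ X'S^{-1}X\big) = \big(Y_{(1)}'T_{(11)}^{-1}Y_{(1)},\ Y'T^{-1}Y\big), \tag{2.18}$$

where $Y$, $T$ are similarly partitioned as $X$, $S$. We must now show that there exists a $g \in G$ such that $X = gY$, $S = gTg'$. Let us first choose

$$g = \begin{pmatrix} g_{(11)} & 0 \\ g_{(21)} & g_{(22)} \end{pmatrix}$$

with

$$g_{(11)} = S_{(11)}^{-1/2}, \qquad g_{(22)} = (S_{(22)} - S_{(21)}S_{(11)}^{-1}S_{(12)})^{-1/2}, \qquad g_{(21)} = -g_{(22)}S_{(21)}S_{(11)}^{-1}.$$

Then $gSg' = I$. Similarly choose

$$h = \begin{pmatrix} h_{(11)} & 0 \\ h_{(21)} & h_{(22)} \end{pmatrix}$$

such that $hTh' = I$. Since $X_{(1)}'S_{(11)}^{-1}X_{(1)} = Y_{(1)}'T_{(11)}^{-1}Y_{(1)}$ implies

$$(g_{(11)}X_{(1)})'(g_{(11)}X_{(1)}) = (h_{(11)}Y_{(1)})'(h_{(11)}Y_{(1)}),$$

we conclude that there exists an orthogonal matrix $O_1$ of order $d_1 \times d_1$ such that

$$g_{(11)}X_{(1)} = O_1h_{(11)}Y_{(1)}.$$

Since $X'S^{-1}X = Y'T^{-1}Y$, that is, $\|gX\|^2 = \|hY\|^2$, we get

$$\|g_{(21)}X_{(1)} + g_{(22)}X_{(2)}\|^2 = \|h_{(21)}Y_{(1)} + h_{(22)}Y_{(2)}\|^2,$$

so that there exists an orthogonal matrix $O_2$ of order $d_2 \times d_2$ such that

$$g_{(21)}X_{(1)} + g_{(22)}X_{(2)} = O_2(h_{(21)}Y_{(1)} + h_{(22)}Y_{(2)}).$$

Now let

$$\Gamma = \begin{pmatrix} O_1 & 0 \\ 0 & O_2 \end{pmatrix}.$$

Then, obviously,

$$gX = \Gamma hY, \qquad gSg' = \Gamma hTh'\Gamma' = I,$$

so that

$$X = aY, \qquad S = aTa',$$

where $a = g^{-1}\Gamma h \in G$. Hence $(R_1^2, R_1^2 + R_2^2)$, and equivalently $(R_1^2, R_2^2)$, is a maximal invariant on $\mathcal{X}$ under $G$.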
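The invariance half of the argument can be checked numerically for $k = 2$. The sketch below (NumPy; the block sizes $d_1 = 2$, $d_2 = 3$ and the random draws are arbitrary choices) computes $(R_1^2, R_2^2)$ from (2.17) before and after $(X, S) \to (gX, gSg')$ for a random block lower triangular $g$, and contrasts this with a general nonsingular matrix, which leaves $X'S^{-1}X$ unchanged but moves $R_1^2$.

```python
import numpy as np

rng = np.random.default_rng(2)
d1, d2 = 2, 3
p = d1 + d2


def r_squares(X, S):
    """(R_1^2, R_2^2) of (2.17) for k = 2."""
    r1 = X[:d1] @ np.linalg.solve(S[:d1, :d1], X[:d1])   # X_(1)' S_(11)^{-1} X_(1)
    total = X @ np.linalg.solve(S, X)                     # X' S^{-1} X = R_1^2 + R_2^2
    return np.array([r1, total - r1])


A = rng.standard_normal((p, p + 2))
S = A @ A.T                        # symmetric positive definite
X = rng.standard_normal(p)

g = rng.standard_normal((p, p)) + 3 * np.eye(p)
g[:d1, d1:] = 0.0                  # block lower triangular: g_(12) = 0

g_general = rng.standard_normal((p, p))   # not block lower triangular

before = r_squares(X, S)
after = r_squares(g @ X, g @ S @ g.T)
other = r_squares(g_general @ X, g_general @ S @ g_general.T)
print(before, after, other)
```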

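Note that if $G$ is the group of $p \times p$ nonsingular block upper triangular matrices, that is, if $g \in G$,

$$g = \begin{pmatrix} g_{(11)} & g_{(12)} & \cdots & g_{(1k)} \\ 0 & g_{(22)} & \cdots & g_{(2k)} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & g_{(kk)} \end{pmatrix},$$

where $g_{(ii)}$ is $d_i \times d_i$, then a maximal invariant on $\mathcal{X}$ under $G$ is $(T_1^2, \ldots, T_k^2)$ defined by

$$\sum_{j=i}^k T_j^2 = (X_{(i)}', \ldots, X_{(k)}')\,\big(S^{[ii]}\big)^{-1}(X_{(i)}', \ldots, X_{(k)}')', \quad i = 1, \ldots, k,$$

where $S^{[ii]}$ denotes the lower right-hand submatrix of $S$ formed by its block rows and columns $i, \ldots, k$.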

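2.5. Distribution of Maximal Invariant

Let $(\mathcal{X}, \mathcal{A}, \lambda)$ be a measure space and suppose $p_\theta$, $\theta \in \Omega$, is a probability density function with respect to $\lambda$. Consider a group $G$ operating from the left on $\mathcal{X}$ and suppose that $\lambda$ is a left invariant measure,

$$\lambda(A) = \lambda(gA), \quad g \in G,\ A \in \mathcal{A}.$$

Let $(\mathcal{Y}, \mathcal{C})$ be a measurable space and suppose that $T(X)$ is a measurable mapping such that $T$ is a maximal invariant on $\mathcal{X}$ under $G$. Let $\lambda^*$ be the measure on $(\mathcal{Y}, \mathcal{C})$ induced by $\lambda$, such that

$$\lambda^*(C) = \lambda(T^{-1}(C)), \quad C \in \mathcal{C}.$$

Also let $P_\theta^*$ be the probability measure on $(\mathcal{Y}, \mathcal{C})$ defined by

$$P_\theta^*(C) = \int_{T^{-1}(C)} p_\theta(x)\,d\lambda(x),$$

so that $P_\theta^*$ has a density function $p_\theta^*$ with respect to $\lambda^*$. Hence

$$\int_C p_\theta^*(T)\,d\lambda^*(T) = \int_{T^{-1}(C)} p_\theta(x)\,d\lambda(x), \quad C \in \mathcal{C}.$$

Assumption. There exists an invariant probability measure $\mu$ on the group $G$ under $G$.

Lemma 2.5.1. The density function $p_\theta^*$ of the maximal invariant $T$ with respect to $\lambda^*$ is given by

$$p_\theta^*(T(x)) = \int_G p_\theta(gx)\,d\mu(g). \tag{2.19}$$

Proof. Let $f$ be any measurable function on $(\mathcal{Y}, \mathcal{C})$ and write $p = p_\theta$, $p^* = p_\theta^*$. Then

$$\int f(T)\,p^*(T)\,d\lambda^*(T) = \int f(T(x))\,p(x)\,d\lambda(x).$$

However,

$$\begin{aligned} \int f(T(x))\int_G p(g(x))\,d\mu(g)\,d\lambda(x) &= \int_G\int f(T(x))\,p(g(x))\,d\lambda(x)\,d\mu(g) \\ &= \int_G\int f(T(g(x)))\,p(g(x))\,d\lambda(g(x))\,d\mu(g) \\ &= \int f(T(y))\,p(y)\,d\lambda(y), \end{aligned}$$

where we have used $T(gx) = T(x)$, the invariance of $\lambda$ and $\mu(G) = 1$. Then, since the function $\int_G p(g(x))\,d\mu(g)$ is invariant, it is a function of $T(x)$ (Theorem 2.3.1), and since $f$ is arbitrary we have (almost everywhere with respect to $\lambda^*$)

$$p^*(T) = \int_G p(g(x))\,d\mu(g).$$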
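A case where (2.19) can be evaluated in closed form, offered as an illustration under assumptions of our own choosing: $X \sim N_2(\theta, I)$ on $R^2$ with Lebesgue measure $\lambda$, $G = SO(2)$ with its uniform probability $\mu$, and maximal invariant $T(x) = \|x\|$. The induced measure is $d\lambda^*(r) = 2\pi r\,dr$, so $2\pi r\,p_\theta^*(r)$ should be the density of $\|X\|$ with respect to $dr$, namely the Rice density $r\exp\{-\frac12(r^2 + \|\theta\|^2)\}I_0(r\|\theta\|)$. The sketch (NumPy) approximates the group average by equally spaced rotations, which is very accurate for a smooth periodic integrand, and compares the two.

```python
import numpy as np

theta = np.array([1.5, -0.8])
rho = np.linalg.norm(theta)


def p_theta(x):
    """Density of N_2(theta, I) with respect to Lebesgue measure on R^2."""
    d = x - theta
    return np.exp(-0.5 * np.sum(d * d, axis=-1)) / (2 * np.pi)


def p_star(r, m=4096):
    """Right side of (2.19): the average of p_theta(g x) over the uniform
    probability on SO(2), for a point x of norm r, using m equally spaced rotations."""
    a = 2 * np.pi * np.arange(m) / m
    gx = np.stack([r * np.cos(a), r * np.sin(a)], axis=-1)
    return p_theta(gx).mean()


r_grid = np.linspace(0.05, 5.0, 50)
via_lemma = np.array([2 * np.pi * r * p_star(r) for r in r_grid])   # density w.r.t. dr
rice = r_grid * np.exp(-0.5 * (r_grid**2 + rho**2)) * np.i0(r_grid * rho)
print(np.max(np.abs(via_lemma - rice)))
```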


2.5.1. Existence of an Invariant Probability Measure on $O(p)$ (Group of $p \times p$ Orthogonal Matrices)

Let $X = (X_1, \ldots, X_p)$ be a $p \times p$ random matrix where the column vectors $X_i$ ($p$-vectors) are independent and identically distributed normal $p$-vectors with mean vector $0$ and covariance matrix $I$ (identity). Apply the Gram-Schmidt orthogonalization process to the columns of $X$ to obtain the random orthogonal matrix $Y = (Y_1, \ldots, Y_p)$ (defined almost surely), where

$$Y_1 = \frac{X_1}{\|X_1\|}, \qquad Y_i = \frac{X_i - \sum_{j=1}^{i-1}(Y_j'X_i)Y_j}{\big\|X_i - \sum_{j=1}^{i-1}(Y_j'X_i)Y_j\big\|}, \quad i = 2, \ldots, p.$$

Define $Y = T(X)$ and let $G = O(p)$. Obviously,

$$gY = (gY_1, \ldots, gY_p) = T(gX_1, \ldots, gX_p) = T(X_1^*, \ldots, X_p^*),$$

where $X_i^* = gX_i$, $i = 1, \ldots, p$.
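The construction above is easy to simulate. In the sketch below (NumPy; `gram_schmidt` is a hypothetical helper implementing classical Gram-Schmidt on the columns), $Y = T(X)$ is computed from a matrix with i.i.d. $N(0, 1)$ entries, and the equivariance $T(gX) = gT(X)$ used in the argument is checked for an orthogonal $g$. The same $Y$ is also recovered from NumPy's QR factorization after making the diagonal of $R$ positive.

```python
import numpy as np


def gram_schmidt(X):
    """Classical Gram-Schmidt on the columns of X (assumed nonsingular)."""
    Y = np.zeros(X.shape)
    for i in range(X.shape[1]):
        v = X[:, i] - Y[:, :i] @ (Y[:, :i].T @ X[:, i])
        Y[:, i] = v / np.linalg.norm(v)
    return Y


rng = np.random.default_rng(3)
p = 4
X = rng.standard_normal((p, p))   # columns i.i.d. N_p(0, I)
Y = gram_schmidt(X)               # Y = T(X), a draw from the invariant probability on O(p)

g = gram_schmidt(rng.standard_normal((p, p)))   # some orthogonal matrix
print(np.allclose(gram_schmidt(g @ X), g @ Y))  # T(gX) = g T(X)

Q, R = np.linalg.qr(X)
print(np.allclose(Q * np.sign(np.diag(R)), Y))  # same matrix via QR with diag(R) > 0
```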

Since $X_1^*, \ldots, X_p^*$ are independently normally distributed with mean $0$ and covariance matrix $I$, the matrices $X^* = (X_1^*, \ldots, X_p^*)$ and $X$ have the same distribution, and hence $Y$ and $gY$ have the same distribution for each $g \in O(p)$. Thus the probability measure of $Y$, defined by

$$P(Y \in C) = P(X \in T^{-1}(C)),$$

is invariant under $O(p)$.

2.6. Applications

Example 2.6.1 (Non-central Wishart distribution). Let $X = (X_1, \ldots, X_n)$ (each $X_i$ is a $p$-vector) be a $p \times n$ matrix ($n > p$) where each column $X_i$ is normally distributed with mean vector $\mu_i$ and covariance matrix $I$ and is independent of the remaining columns of $X$. The probability density function of $X$ with respect to the $pn$-dimensional Lebesgue measure is

$$p(x, \mu) = (2\pi)^{-np/2}\exp\Big\{-\tfrac12\operatorname{tr}(xx' - 2\mu x' + \mu\mu')\Big\}, \tag{2.19}$$

where $\mu = (\mu_1, \ldots, \mu_n)$. It is well known that

$$S = XX' = \sum_{i=1}^n X_iX_i'$$

is a maximal invariant under the transformation $X \to X\Gamma$, $\Gamma \in O(n)$, where $O(n)$ is the group of orthogonal matrices of dimension $n \times n$. Let $p(s, \mu)$ be the probability density function of $S$ at the parameter point $\mu$ with respect to the induced measure $d\lambda^*(s)$ as defined in Lemma 2.5.1. To find the probability density function of $S$ with respect to the Lebesgue measure $ds$, we first find $d\lambda^*(s)$. From (2.13),

$$p(s, 0)\,d\lambda^*(s) = C(\det s)^{(n-p-1)/2}\exp\Big\{-\tfrac12\operatorname{tr}s\Big\}\,ds. \tag{2.20}$$

By Lemma 2.5.1,

$$p(s, 0) = \int_{O(n)} p(x\Gamma, 0)\,d\nu(\Gamma) = (2\pi)^{-np/2}\exp\Big\{-\tfrac12\operatorname{tr}s\Big\},$$

where $\nu$ is the invariant probability measure on $O(n)$. Hence

$$d\lambda^*(s) = (2\pi)^{np/2}C(\det s)^{(n-p-1)/2}\,ds.$$

Again by Lemma 2.5.1,

$$\begin{aligned} p(s, \mu) &= \int_{O(n)} p(x\Gamma, \mu)\,d\nu(\Gamma) = (2\pi)^{-np/2}\exp\Big\{-\tfrac12\operatorname{tr}(s + \mu\mu')\Big\}\int_{O(n)}\exp\{\operatorname{tr}x\Gamma\mu'\}\,d\nu(\Gamma) \\ &= (2\pi)^{-np/2}\exp\Big\{-\tfrac12\operatorname{tr}(s + \mu\mu')\Big\}\,h(x, \mu), \end{aligned}$$

where

$$h(x, \mu) = \int_{O(n)}\exp\{\operatorname{tr}x\Gamma\mu'\}\,d\nu(\Gamma).$$

It is evident that $h(x, \mu) = h(x\Gamma, \mu\Phi)$ for $\Gamma, \Phi \in O(n)$, as $\nu$ is invariant. Thus $h(x, \mu)$ depends on $x$ only through $s = xx'$ and on $\mu$ only through $\mu\mu'$, and we can write $h(x, \mu) = h^*(s, \mu\mu')$ (say).

The probability density function of $S$ with respect to the Lebesgue measure $ds$ is given by

$$p(s, \mu)\,d\lambda^*(s) = p(s, \mu)(2\pi)^{np/2}C(\det s)^{(n-p-1)/2}\,ds.$$

This is called the probability density function of the noncentral Wishart distribution with noncentrality parameter $\mu\mu'$.

Example 2.6.2 (Non-central distribution of the characteristic roots). Let $S$ of dimension $p \times p$ be distributed as a central Wishart random variable with parameter $\Sigma$ and degrees of freedom $n > p$. Let $R_1 > R_2 > \cdots > R_p > 0$ be the roots of the characteristic equation $\det(S - RI) = 0$, and let $R$ denote a diagonal matrix with diagonal elements $R_1, \ldots, R_p$. Denote by $p(R, \Sigma)$ the probability density function of $R$ at the parameter point $\Sigma$ with respect to the induced measure $d\lambda^*(R)$ of Lemma 2.5.1. We know [see, for example, Giri (1977)]

$$p(R, I)\,d\lambda^*(R) = C_1\Big(\prod_{i=1}^p R_i\Big)^{(n-p-1)/2}\exp\Big\{-\tfrac12\sum_{i=1}^p R_i\Big\}\prod_{i<j}(R_i - R_j)\,dR,$$

where $dR$ stands for the Lebesgue measure and $C_1$ is the universal constant. It may be checked that $R$ is a maximal invariant under the transformation

$$S \to \Gamma S\Gamma', \quad \Gamma \in O(p).$$

By Lemma 2.5.1 (with $C$ the constant of (2.13) and $\nu$ the invariant probability measure on $O(p)$),

$$p(R, I) = C\Big(\prod_{i=1}^p R_i\Big)^{(n-p-1)/2}\exp\Big\{-\tfrac12\sum_{i=1}^p R_i\Big\}\int_{O(p)}d\nu(\Gamma) = C\Big(\prod_{i=1}^p R_i\Big)^{(n-p-1)/2}\exp\Big\{-\tfrac12\sum_{i=1}^p R_i\Big\}.$$

So

$$d\lambda^*(R) = \frac{C_1}{C}\prod_{i<j}(R_i - R_j)\,dR.$$

Again by Lemma 2.5.1, writing $\Sigma = \Phi\theta\Phi'$ with $\Phi \in O(p)$ and using the invariance of $\nu$,

$$p(R, \Sigma) = C(\det\Sigma)^{-n/2}\Big(\prod_{i=1}^p R_i\Big)^{(n-p-1)/2}\int_{O(p)}\exp\Big\{-\tfrac12\operatorname{tr}\theta^{-1}\Gamma R\Gamma'\Big\}\,d\nu(\Gamma),$$

where $\theta$ is a diagonal matrix whose diagonal elements are the characteristic roots of $\Sigma$. Thus the non-central probability density function of $R$ depends on $\Sigma$ only through its characteristic roots.

2.7. The Distribution of a Maximal Invariant in the General Case

Invariant tests depend on the observations only through the maximal invariant. To find the optimum test we need to find the explicit form of the maximal invariant statistic and its distribution. For many multivariate testing problems it is not always convenient to find the explicit form of the maximal invariant. Stein (1956) gave the following representation of the ratio of probability densities of a maximal invariant with respect to a group $G$ of transformations $g$ leaving the testing problem invariant.

Theorem 2.7.1. Let $G$ be a group operating from the left on a topological space $(\mathcal{X}, \mathcal{A})$ and $\lambda$ a measure in $\mathcal{X}$ which is left invariant under $G$ ($G$ is not necessarily transitive on $\mathcal{X}$). Assume that there are two given probability densities $p_1, p_2$ with respect to $\lambda$ such that

$$P_i(A) = \int_A p_i(x)\,d\lambda(x), \quad i = 1, 2,$$

for $A \in \mathcal{A}$, and $P_1$ and $P_2$ are absolutely continuous. Suppose $T : \mathcal{X} \to \mathcal{Y}$ is a maximal invariant and $P_i^*$ is the distribution of $T(X)$ when $X$ has distribution $P_i$. Then under certain conditions

$$\frac{dP_2^*(T(x))}{dP_1^*(T(x))} = \frac{\int_G p_2(gx)\,d\mu(g)}{\int_G p_1(gx)\,d\mu(g)}, \tag{2.21}$$

where $\mu$ is a left invariant Haar measure on $G$. An often used alternative form of (2.21) is given by

$$\frac{dP_2^*(T(x))}{dP_1^*(T(x))} = \frac{\int_G f_2(gx)\,\chi(g)\,d\mu(g)}{\int_G f_1(gx)\,\chi(g)\,d\mu(g)}, \tag{2.21a}$$

where $f_i$ is the probability density of $P_i$ with respect to a relatively invariant measure with multiplier $\chi(g)$.

Stein (1956) gave the statement of Theorem 2.7.1 without stating explicitly the conditions under which (2.21) holds. However, this representation was used by Giri (1961, 1964, 1965, 1971) and Schwartz (1967). Schwartz (1967) also gave a set of (rather complicated) conditions which must be satisfied for Stein's representation to be valid. Wijsman (1967, 1969) gave a sufficient condition for (2.21) using the concept of Cartan $G$-space. Koehn (1970) gave a generalization of the results of Wijsman (1967). Bondar (1976) gave some conditions for (2.21) through topological arguments. Andersson (1982) obtained some results on the validity of (2.21) in terms of the proper action of the group. Wijsman (1985) studied the properness of several groups of transformations used in multivariate testing problems. Subsequently Wijsman (1986) investigated the global cross-section technique for factorization of measures and applied it to the representation of the ratio of densities of a maximal invariant.
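A minimal numerical sketch of (2.21a), under assumptions chosen so that the answer is known in closed form: $\mathcal{X} = R^2 \setminus \{0\}$, $G = \{a > 0\}$ acting by $x \to ax$ with left Haar measure $d\mu(a) = da/a$, Lebesgue measure as the relatively invariant measure with multiplier $\chi(a) = a^2$, $f_1 = N_2(0, I)$ and $f_2 = N_2(0, \Sigma)$ for an arbitrary $\Sigma$. The maximal invariant is $x/\|x\|$, and doing the $a$-integrals analytically gives the ratio $(\det\Sigma)^{-1/2}(x'x)/(x'\Sigma^{-1}x)$. The sketch (NumPy) computes the orbit integrals by the trapezoidal rule and compares, also confirming that the ratio is unchanged along an orbit.

```python
import numpy as np

n = 2
Sigma = np.array([[2.0, 0.6], [0.6, 0.5]])
Sigma_inv = np.linalg.inv(Sigma)
det_Sigma = np.linalg.det(Sigma)
a = np.linspace(0.0, 40.0, 200_001)   # grid for the group G = {a > 0}


def orbit_integral(x, cov_inv, det_cov):
    """int_0^inf f(a x) chi(a) dmu(a) for f = N_n(0, cov), chi(a) = a^n, dmu(a) = da/a,
    by the trapezoidal rule on [0, 40]."""
    q = x @ cov_inv @ x
    vals = np.exp(-0.5 * a**2 * q) * a ** (n - 1) / ((2 * np.pi) ** (n / 2) * np.sqrt(det_cov))
    return np.sum((vals[1:] + vals[:-1]) * np.diff(a)) / 2


def ratio_2_21a(x):
    """Right side of (2.21a) with f_2 = N_2(0, Sigma), f_1 = N_2(0, I)."""
    return orbit_integral(x, Sigma_inv, det_Sigma) / orbit_integral(x, np.eye(n), 1.0)


def ratio_closed(x):
    """The same ratio with the a-integrals done analytically (n = 2)."""
    return det_Sigma ** -0.5 * (x @ x) / (x @ Sigma_inv @ x)


for x in [np.array([1.0, 0.0]), np.array([0.6, -0.8]), np.array([3.0, 1.0])]:
    print(ratio_2_21a(x), ratio_closed(x), ratio_2_21a(2.5 * x))
```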

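2.8. An Important Multivariate Distribution

Let $X_1, \ldots, X_N$ be $N$ independently, identically distributed $p$-dimensional normal random variables with mean $\xi = (\xi_1, \ldots, \xi_p)'$ and covariance matrix $\Sigma$ (positive definite). Write

$$\bar X = \frac1N\sum_{i=1}^N X_i, \qquad S = \sum_{i=1}^N(X_i - \bar X)(X_i - \bar X)'.$$

It is well known that $\bar X$ and $S$ are independent in distribution, $\bar X$ has a $p$-dimensional normal distribution with mean $\xi$ and covariance matrix $(1/N)\Sigma$, and $S$ has a central Wishart distribution with parameter $\Sigma$ and degrees of freedom $n = N - 1$. Throughout this section we will assume that $N > p$ so that $S$ is positive definite almost everywhere. Write, for any $p$-vector $b$,

$$b = (b_{(1)}', \ldots, b_{(k)}')', \qquad b_{[i]} = (b_{(1)}', \ldots, b_{(i)}')',$$

where $b_{(i)}$ is the $d_i \times 1$ subvector of $b$ such that $\sum_{i=1}^k d_i = p$, and for any $p \times p$ matrix $A$

$$A = \begin{pmatrix} A_{(11)} & \cdots & A_{(1k)} \\ \vdots & & \vdots \\ A_{(k1)} & \cdots & A_{(kk)} \end{pmatrix}, \qquad A_{[ii]} = \begin{pmatrix} A_{(11)} & \cdots & A_{(1i)} \\ \vdots & & \vdots \\ A_{(i1)} & \cdots & A_{(ii)} \end{pmatrix},$$

where $A_{(ij)}$ is a $d_i \times d_j$ submatrix of $A$. Let us define $R_1, \ldots, R_k$ and $\delta_1, \ldots, \delta_k$ by

$$\sum_{j=1}^i R_j = N\bar X_{[i]}'\big(S_{[ii]} + N\bar X_{[i]}\bar X_{[i]}'\big)^{-1}\bar X_{[i]}, \qquad \sum_{j=1}^i\delta_j = N\xi_{[i]}'\Sigma_{[ii]}^{-1}\xi_{[i]}, \quad i = 1, 2, \ldots, k.$$

Theorem 2.8.1. The joint probability density function of $R_1, \ldots, R_k$ is given by (for $r_i > 0$, $\sum_{i=1}^k r_i < 1$)

$$\begin{aligned} f(r_1, \ldots, r_k) &= \frac{\Gamma(N/2)}{\Gamma\big((N-p)/2\big)\prod_{i=1}^k\Gamma(d_i/2)}\Big(1 - \sum_{i=1}^k r_i\Big)^{(N-p)/2-1}\prod_{i=1}^k r_i^{d_i/2-1} \\ &\quad\times\exp\Big\{-\frac12\sum_{i=1}^k\delta_i + \frac12\sum_{i=1}^k r_i\sum_{j=i+1}^k\delta_j\Big\}\prod_{i=1}^k\phi\Big(\frac12(N - \sigma_{i-1}), \frac12 d_i; \frac12 r_i\delta_i\Big), \end{aligned} \tag{2.22}$$

where $\sigma_i = \sum_{j=1}^i d_j$, $\sigma_0 = 0$, and $\phi(a, b; x)$ is the confluent hypergeometric series given by

$$\phi(a, b; x) = 1 + \frac ab x + \frac{a(a+1)}{b(b+1)}\frac{x^2}{2!} + \frac{a(a+1)(a+2)}{b(b+1)(b+2)}\frac{x^3}{3!} + \cdots.$$

Furthermore, the marginal probability density function of $R_1, \ldots, R_j$, $j \le k$, can be obtained from (2.22) by replacing $k$ by $j$ and $p$ by $\sum_{i=1}^j d_i$.

Proof.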

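We will first prove the theorem for the case $k = 2$ and then use this result for the case $k = 3$. The proof for the general case will follow from these cases. For the case $k = 2$ consider the random variables

$$W = S_{(22)} - S_{(21)}S_{(11)}^{-1}S_{(12)}, \qquad L = S_{(21)}S_{(11)}^{-1/2}, \qquad V = S_{(11)}.$$

It is well known [see, for example, Stein (1956), or Giri (1977), p. 154] that
(i) $V$ is distributed as Wishart $W(\Sigma_{(11)}, N - 1)$;
(ii) $W$ is distributed as Wishart $W(\Sigma_{(22)} - \Sigma_{(21)}\Sigma_{(11)}^{-1}\Sigma_{(12)}, N - 1 - d_1)$;
(iii) the conditional distribution of $L$ given $V$ is multivariate normal with mean $\Sigma_{(21)}\Sigma_{(11)}^{-1}V^{1/2}$ and covariance matrix $(\Sigma_{(22)} - \Sigma_{(21)}\Sigma_{(11)}^{-1}\Sigma_{(12)}) \otimes I_{d_1}$, where $\otimes$ denotes the tensor product of two matrices and $I_{d_1}$ is the $d_1 \times d_1$ identity matrix;
(iv) $W$ is independent of $(L, V)$.

Let us now define $W_1, W_2$ by

$$W_1 = N\bar X_{(1)}'S_{(11)}^{-1}\bar X_{(1)},$$

$$W_2(1 + W_1) = N(\bar X_{(2)} - S_{(21)}S_{(11)}^{-1}\bar X_{(1)})'(S_{(22)} - S_{(21)}S_{(11)}^{-1}S_{(12)})^{-1}(\bar X_{(2)} - S_{(21)}S_{(11)}^{-1}\bar X_{(1)}). \tag{2.24}$$

Since $\bar X$ is independent of $S$, the conditional distribution of $\sqrt N\bar X_{(2)}$ given $S_{(11)}$ and $\bar X_{(1)}$ is normal with mean $\sqrt N\big(\xi_{(2)} + \Sigma_{(21)}\Sigma_{(11)}^{-1}(\bar X_{(1)} - \xi_{(1)})\big)$ and covariance matrix $\Sigma_{(22)} - \Sigma_{(21)}\Sigma_{(11)}^{-1}\Sigma_{(12)}$. It thereupon follows that the conditional distribution of $\sqrt N\bar X_{(2)}$ given $S_{(11)}$ and $\bar X_{(1)}$ is independent of the conditional distribution of $\sqrt N S_{(21)}S_{(11)}^{-1}\bar X_{(1)}$ given $S_{(11)}$ and $\bar X_{(1)}$. Furthermore, by (iii), the conditional distribution of $\sqrt N S_{(21)}S_{(11)}^{-1}\bar X_{(1)}$ given $S_{(11)}$ and $\bar X_{(1)}$ is normal with mean $\sqrt N\Sigma_{(21)}\Sigma_{(11)}^{-1}\bar X_{(1)}$ and covariance matrix

$$N\bar X_{(1)}'S_{(11)}^{-1}\bar X_{(1)}\,(\Sigma_{(22)} - \Sigma_{(21)}\Sigma_{(11)}^{-1}\Sigma_{(12)}) = W_1(\Sigma_{(22)} - \Sigma_{(21)}\Sigma_{(11)}^{-1}\Sigma_{(12)}).$$

Hence the conditional distribution of

$$\sqrt N(\bar X_{(2)} - S_{(21)}S_{(11)}^{-1}\bar X_{(1)})(1 + W_1)^{-1/2}$$

given $S_{(11)}$ and $\bar X_{(1)}$ is normal with mean

$$\sqrt N(\xi_{(2)} - \Sigma_{(21)}\Sigma_{(11)}^{-1}\xi_{(1)})(1 + W_1)^{-1/2}$$

and covariance matrix $\Sigma_{(22)} - \Sigma_{(21)}\Sigma_{(11)}^{-1}\Sigma_{(12)}$. From Giri (1977) it follows that the conditional distribution of $W_2$ given $\bar X_{(1)}$ and $S_{(11)}$ is

$$p(W_2 \mid \bar X_{(1)}, S_{(11)}) = \chi^2_{d_2}\big(\delta_2(1 + W_1)^{-1}\big)\big/\chi^2_{N-d_1-d_2}, \tag{2.25}$$

where $\chi^2_n(\delta)$ denotes the noncentral chi-square random variable with noncentrality parameter $\delta$ and $n$ degrees of freedom, $\chi^2_n$ denotes the central chi-square random variable with $n$ degrees of freedom, and the two chi-square variables in the ratio are independent. Since this conditional distribution depends on $\bar X_{(1)}$ and $S_{(11)}$ only through $W_1$, we get the joint probability density function of $W_1, W_2$ as

$$p(W_1, W_2) = p(W_1)\,p(W_2 \mid W_1), \tag{2.26}$$

where $p(W_2 \mid W_1)$ is the density of (2.25); furthermore, it is well known that the marginal probability density function of $W_1$ is that of

$$\chi^2_{d_1}(\delta_1)\big/\chi^2_{N-d_1}. \tag{2.27}$$

Hence, with $\delta_1^* = \delta_1$ and $\delta_2^* = \delta_2(1 + W_1)^{-1}$, we have

$$p(W_1, W_2) = \prod_{i=1}^2\Bigg\{e^{-\delta_i^*/2}\sum_{\beta=0}^\infty\frac{(\delta_i^*/2)^\beta}{\beta!}\,\frac{\Gamma\big(\frac12(N - \sigma_{i-1}) + \beta\big)}{\Gamma\big(\frac12 d_i + \beta\big)\Gamma\big(\frac12(N - \sigma_i)\big)}\,\frac{W_i^{d_i/2+\beta-1}}{(1 + W_i)^{(N-\sigma_{i-1})/2+\beta}}\Bigg\}. \tag{2.28}$$

From Lemma 2.8.1 below,

$$W_1 = R_1(1 - R_1)^{-1}, \qquad W_2 = \big\{(R_1 + R_2)(1 - R_1 - R_2)^{-1} - R_1(1 - R_1)^{-1}\big\}\big\{1 + R_1(1 - R_1)^{-1}\big\}^{-1} = R_2(1 - R_1 - R_2)^{-1}. \tag{2.29}$$

From (2.28) and (2.29) the probability density function of $R_1$ and $R_2$ is given by

$$\begin{aligned} f(r_1, r_2) &= \frac{\Gamma(N/2)}{\Gamma\big((N-p)/2\big)\Gamma(d_1/2)\Gamma(d_2/2)}(1 - r_1 - r_2)^{(N-p)/2-1}r_1^{d_1/2-1}r_2^{d_2/2-1} \\ &\quad\times\exp\Big\{-\frac12(\delta_1 + \delta_2) + \frac12 r_1\delta_2\Big\}\prod_{i=1}^2\phi\Big(\frac12(N - \sigma_{i-1}), \frac12 d_i; \frac12 r_i\delta_i\Big), \end{aligned} \tag{2.30}$$

with $p = d_1 + d_2$.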

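We now consider the case $k = 3$. Write

$$W_3(1 + W_1 + W_2 + W_1W_2) = N\bar X_{[3]}'S_{[33]}^{-1}\bar X_{[3]} - N\bar X_{[2]}'S_{[22]}^{-1}\bar X_{[2]}.$$

It is easy to conclude that $S_{(33)} - S_{[32]}S_{[22]}^{-1}S_{[23]}$ is distributed as Wishart $W(N - 1 - d_1 - d_2, \Sigma_{(33)} - \Sigma_{[32]}\Sigma_{[22]}^{-1}\Sigma_{[23]})$ and is independent of $S_{[32]}$ and $S_{[22]}$. Furthermore, the conditional distribution of $\sqrt N\bar X_{(3)}$ given $S_{[22]}$ and $\bar X_{[2]}$ is normal with mean $\sqrt N(\xi_{(3)} + \Sigma_{[32]}\Sigma_{[22]}^{-1}(\bar X_{[2]} - \xi_{[2]}))$ and covariance matrix $\Sigma_{(33)} - \Sigma_{[32]}\Sigma_{[22]}^{-1}\Sigma_{[23]}$, and it is independent of the conditional distribution of $\sqrt N S_{[32]}S_{[22]}^{-1}\bar X_{[2]}$ given $S_{[22]}$ and $\bar X_{[2]}$, which is normal with mean $\sqrt N\Sigma_{[32]}\Sigma_{[22]}^{-1}\bar X_{[2]}$ and covariance matrix $N\bar X_{[2]}'S_{[22]}^{-1}\bar X_{[2]}\,(\Sigma_{(33)} - \Sigma_{[32]}\Sigma_{[22]}^{-1}\Sigma_{[23]})$. Thus the conditional distribution of $W_3$ given $S_{[22]}$ and $\bar X_{[2]}$, or, equivalently, given $(W_1, W_2)$, is

$$p(W_3 \mid W_1, W_2) = \chi'^2_{d_3}(\delta_3^2(1 + W_1 + W_2 + W_1W_2)^{-1}) / \chi^2_{N-d_1-d_2-d_3}. \qquad (2.31)$$

Hence the joint probability density function of $W_1, W_2, W_3$ is given by

$$p(W_1, W_2, W_3) = \frac{\chi'^2_{d_1}(\delta_1^2)}{\chi^2_{N-d_1}}\cdot\frac{\chi'^2_{d_2}(\delta_2^2(1 + W_1)^{-1})}{\chi^2_{N-d_1-d_2}}\cdot\frac{\chi'^2_{d_3}(\delta_3^2(1 + W_1 + W_2 + W_1W_2)^{-1})}{\chi^2_{N-d_1-d_2-d_3}}, \qquad (2.32)$$

where each factor denotes the density of the indicated ratio. From Lemma 2.8.1 below,

$$W_1 = R_1(1 - R_1)^{-1}, \qquad W_2 = R_2(1 - R_1 - R_2)^{-1},$$
$$W_3 = \{(R_1 + R_2 + R_3)(1 - R_1 - R_2 - R_3)^{-1} - (R_1 + R_2)(1 - R_1 - R_2)^{-1}\}\{1 + (R_1 + R_2)(1 - R_1 - R_2)^{-1}\}^{-1} = R_3(1 - R_1 - R_2 - R_3)^{-1}. \qquad (2.33)$$

From (2.32) and (2.33) the joint probability density function of $R_1, R_2, R_3$ is given by

$$f(r_1, r_2, r_3) = \frac{\Gamma(\frac12 N)}{\Gamma(\frac12(N - \sigma_3))\prod_{i=1}^{3}\Gamma(\frac12 d_i)}\prod_{i=1}^{3}r_i^{\frac12 d_i - 1}\Big(1 - \sum_{i=1}^{3}r_i\Big)^{\frac12(N - \sigma_3) - 1}$$
$$\times\exp\Big\{-\tfrac12\sum_{i=1}^{3}\delta_i^2 + \tfrac12\sum_{j=1}^{2}r_j\sum_{i=j+1}^{3}\delta_i^2\Big\}\prod_{i=1}^{3}\phi(\tfrac12(N - \sigma_{i-1}), \tfrac12 d_i; \tfrac12 r_i\delta_i^2), \qquad (2.34)$$

where $\sigma_0 = 0$, $\sigma_i = d_1 + \cdots + d_i$, and $\phi$ is the confluent hypergeometric function.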

Proceeding exactly in the above fashion we get (2.22) for general $k$. Since the marginal distribution of $\sqrt N\bar X_{[j]}$ is normal with mean $\sqrt N\xi_{[j]}$ and covariance matrix $\Sigma_{[jj]}$, and $S_{[jj]}$ is distributed independently of $\bar X_{[j]}$ as Wishart $W(N - 1, \Sigma_{[jj]})$, it follows that the joint probability density function of $R_1, \ldots, R_j$, $1 \le j \le k$, can be obtained from (2.22) by replacing $k$ by $j$.
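The change of variables used in (2.29) and (2.33) is $W_j = R_j(1 - R_1 - \cdots - R_j)^{-1}$, and it telescopes to $\prod_{i \le j}(1 + W_i) = (1 - \sum_{i \le j}R_i)^{-1}$, which gives the inverse map. A short Python sketch of both directions (the helper names are illustrative; it assumes $R_j \ge 0$ and $\sum_j R_j < 1$):

```python
def w_from_r(r):
    """W_j = R_j / (1 - R_1 - ... - R_j), as in (2.29) and (2.33)."""
    w, cum = [], 0.0
    for rj in r:
        cum += rj
        if rj < 0 or cum >= 1:
            raise ValueError("need R_j >= 0 and R_1 + ... + R_j < 1")
        w.append(rj / (1.0 - cum))
    return w

def r_from_w(w):
    """Inverse map R_j = W_j / prod_{i<=j}(1 + W_i), from the telescoping identity
    prod_{i<=j}(1 + W_i) = 1 / (1 - R_1 - ... - R_j)."""
    r, prod = [], 1.0
    for wj in w:
        prod *= 1.0 + wj
        r.append(wj / prod)
    return r

w = w_from_r([0.1, 0.2, 0.3])
print(w)            # approximately [0.1111, 0.2857, 0.75]
print(r_from_w(w))  # recovers [0.1, 0.2, 0.3] up to rounding
```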

The following lemma shows that there is a one-to-one correspondence between $(R_1, \ldots, R_k)$ and $(W_1, \ldots, W_k)$ as defined earlier in this chapter.

Lemma 2.8.1.
$$NX'(S + NXX')^{-1}X = \frac{NX'S^{-1}X}{1 + NX'S^{-1}X}.$$

Proof. Let $(S + NXX')^{-1} = S^{-1} + A$. Hence $I + NXX'S^{-1} + (S + NXX')A = I$. So we get
$$(S + NXX')^{-1} = S^{-1} - N(S + NXX')^{-1}XX'S^{-1}.$$
Now
$$NX'(S + NXX')^{-1}X = NX'S^{-1}X - NX'(S + NXX')^{-1}X\,(NX'S^{-1}X).$$
Therefore
$$NX'(S + NXX')^{-1}X = \frac{NX'S^{-1}X}{1 + NX'S^{-1}X}.$$

Note that
$$\sum_{i=1}^{j}R_i = N\bar X_{[j]}'(S_{[jj]} + N\bar X_{[j]}\bar X_{[j]}')^{-1}\bar X_{[j]}, \qquad 1 + N\bar X_{[j]}'S_{[jj]}^{-1}\bar X_{[j]} = \prod_{i=1}^{j}(1 + W_i) = \Big(1 - \sum_{i=1}^{j}R_i\Big)^{-1}.$$
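Lemma 2.8.1 is easy to check numerically. The sketch below (plain Python, using a small illustrative Gauss-Jordan helper rather than any particular linear-algebra library) compares both sides of the identity for an arbitrary positive definite $S$, vector $X$ and sample size $N$.

```python
def solve(a, b):
    """Solve a x = b for a small dense nonsingular matrix (Gauss-Jordan, partial pivoting)."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def quad_form(n_obs, s, x):
    """N x' s^{-1} x."""
    y = solve(s, x)
    return n_obs * sum(xi * yi for xi, yi in zip(x, y))

def lemma_2_8_1(n_obs, s, x):
    """(left side, right side) of N X'(S + N X X')^{-1} X = b / (1 + b), b = N X' S^{-1} X."""
    p = len(x)
    s_aug = [[s[i][j] + n_obs * x[i] * x[j] for j in range(p)] for i in range(p)]
    b = quad_form(n_obs, s, x)
    return quad_form(n_obs, s_aug, x), b / (1.0 + b)

S = [[4.0, 1.0, 0.0], [1.0, 3.0, 0.5], [0.0, 0.5, 2.0]]
X = [0.3, -0.2, 0.5]
print(lemma_2_8_1(10, S, X))  # the two numbers agree up to rounding
```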

2.9. Almost Invariance, Sufficiency and Invariance

Definition 2.9.1. Let $(\mathcal X, \mathcal A, P)$ be a measure space. A test $\psi(x)$, $x \in \mathcal X$, is said to be equivalent to an invariant test with respect to the group of transformations $G$ if there exists an invariant test $\phi(x)$ with respect to $G$ such that $\psi(x) = \phi(x)$ for all $x \in \mathcal X$ except possibly for a subset $N$ of $\mathcal X$ of probability measure zero.

Definition 2.9.2. Almost invariance. A test $\phi(x)$, $x \in \mathcal X$, is said to be almost invariant with respect to the group of transformations $G$ if $\phi(x) = \phi(gx)$ for all $x \in \mathcal X - N_g$, where $N_g$ is a subset of $\mathcal X$, depending on $g$, of probability measure zero.

It is tempting to conjecture that if $\phi(x)$ is almost invariant, then it is equivalent to an invariant test $\psi(x)$. Obviously, any $\phi(x)$ which is equivalent to an invariant test $\psi(x)$ is almost invariant. For, take $N_g = N \cup g^{-1}N$. If $x \notin N_g$, then $x \notin N$ and $gx \notin N$, so that $\phi(x) = \psi(x) = \psi(gx) = \phi(gx)$. Conversely, if $G$ is countable or finite, then, given an almost invariant test $\phi(x)$, we define $N = \bigcup_g N_g$, where $N_g = \{x \in \mathcal X : \phi(x) \ne \phi(gx)\}$, and $N$ has probability measure zero. An uncountable $G$ presents difficulties: examples in which an almost invariant test is not equivalent to any invariant test are not rare.

Let $G$ be a group of transformations operating on $\mathcal X$ and let $\mathcal A$, $\mathcal B$ be the $\sigma$-fields of subsets of $\mathcal X$ and $G$, respectively, such that for any $A \in \mathcal A$ the set of pairs $(x, g)$ for which $gx \in A$ is measurable $\mathcal A \times \mathcal B$. Suppose further that there exists a $\sigma$-finite measure $\nu$ on $G$ such that for all $B \in \mathcal B$, $\nu(B) = 0$ implies $\nu(Bg) = 0$ for all $g \in G$. Then any measurable almost invariant test function with respect to $G$ is equivalent to an invariant test under $G$. For a proof of this fact the reader is referred to Lehmann (1959). The requirement that for all $g \in G$ and $B \in \mathcal B$, $\nu(B) = 0$ imply $\nu(Bg) = 0$ is satisfied in particular when $\nu(B) = \nu(Bg)$ for all $g \in G$ and $B \in \mathcal B$. Such a right invariant measure exists for a large number of groups.

Let $\mathcal A_I = \{A : A \in \mathcal A \text{ and } gA = A \text{ for all } g \in G\}$ and let $\mathcal A_S$ be the sufficient $\sigma$-field of $\mathcal X$. Let $G$, a group leaving a testing problem invariant, be such that $g\mathcal A_S = \mathcal A_S$ for each $g \in G$. If we first reduce the problem by sufficiency, getting the measure space $(\mathcal X, \mathcal A_S, P)$, and subsequently by invariance, then we arrive at the measure space $(\mathcal X, \mathcal A_{SI}, P)$, where $\mathcal A_{SI}$ is the class of all invariant $\mathcal A_S$-measurable sets. The following theorem, which is essentially due to Stein and Cox, answers, under certain conditions, the question of whether the order of these two reductions matters. The proof is given by Hall, Wijsman and Ghosh (1965).

Theorem 2.9.1. Let $(\mathcal X, \mathcal A, P)$ be a measure space and let $G$ be a group of measurable transformations leaving $P$ invariant. Let $\mathcal A_S$ be a sufficient subfield for $P$ on $(\mathcal X, \mathcal A)$ such that $g\mathcal A_S = \mathcal A_S$ for each $g \in G$. Suppose that for each almost invariant $\mathcal A_S$-measurable function $\phi$ there exists an $\mathcal A_{SI}$-measurable function $\psi$ such that $\phi = \psi$ except possibly on a set $N$ of probability measure zero. Then $\mathcal A_{SI}$ is sufficient for $P$ on $(\mathcal X, \mathcal A_I)$.

Thus, under the conditions of the theorem, if we reduce by sufficiency and invariance, the order in which this is done is immaterial.

2.10. Invariance, Type D and E Regions

The notion of a type D or type E region is due to Isaacson (1951). Kiefer (1958) showed that the F-test of the univariate general linear hypothesis possesses this property. Suppose, for a parametric space $\Omega = \{(\theta, \eta) : \theta \in \Omega', \eta \in H\}$ with associated distributions, with $\Omega'$ a Euclidean set, that every test function $\phi$ has a power function $\beta_\phi(\theta, \eta)$ which, for each $\eta$, is twice continuously differentiable in the components of $\theta$ at $\theta = 0$, an interior point of $\Omega'$. Let $Q_\alpha$ be the class of locally strictly unbiased level $\alpha$ tests of $H_0: \theta = 0$ against the alternatives $H_1: \theta \ne 0$. Our assumption on $\beta_\phi$ implies that all tests in $Q_\alpha$ are similar and that $\partial\beta_\phi(\theta, \eta)/\partial\theta_i = 0$ at $\theta = 0$ for all $\phi$ in $Q_\alpha$. Let $B_\phi(\eta)$ be the matrix of second derivatives of $\beta_\phi(\theta, \eta)$ with respect to the components of $\theta$ at $\theta = 0$, and define

$$\Delta_\phi(\eta) = \det(B_\phi(\eta)),$$

which is also popularly known as the Gaussian curvature.

Assumption. $\Delta_{\phi'}(\eta) > 0$ for all $\eta \in H$ and for at least one $\phi' \in Q_\alpha$.

Definition 2.10.1 (Type E and Type D Tests). A test $\phi^*$ in $Q_\alpha$ is said to be of type E if $\Delta_{\phi^*}(\eta) = \max_{\phi \in Q_\alpha}\Delta_\phi(\eta)$ for all $\eta \in H$. If $H$ is a single point, $\phi^*$ is said to be of type D.

Lehmann (1959a) showed that, in finding regions which are of type D, invariance could be invoked in the manner of the Hunt-Stein theorem (Theorem 5.0A), and that this could also be done for type E regions (if they exist) provided that one works with a group which operates on $H$ as the identity. Suppose that our problem is invariant under a group of transformations $G$ for which the Hunt-Stein theorem holds and which acts trivially on $\Omega'$, so that for $g \in G$, $\bar g(\theta, \eta) = (\theta, \bar g\eta)$. If $\phi_g$ is the test function defined by $\phi_g(x) = \phi(gx)$, then a trivial computation shows that $\Delta_{\phi_g}(\eta) = \Delta_\phi(\bar g\eta)$ and hence $\Delta(\eta) = \Delta(\bar g\eta)$, where $\Delta(\eta) = \max_{\phi \in Q_\alpha}\Delta_\phi(\eta)$. Also, if $\phi$ is better than $\phi'$ in the sense of either type D or type E, then $\phi_g$ is clearly better than $\phi'_g$.

Exercises

1. Let $X_1, \ldots, X_n$ be independently and identically distributed normal random variables with mean $\mu$ and (unknown) variance $\sigma^2$. Using Theorem 2.7.1, find the UMP invariant test of $H_0: \mu = 0$ against the alternative $H_1: \mu \ne 0$ with respect to the group of transformations which transform each $X_i \to cX_i$, $c \ne 0$.

2. Let $\{P_\theta, \theta \in \Omega\}$ be a family of distributions on $(\mathcal X, \mathcal A)$ such that $P_\theta$ has a density $p(\cdot \mid \theta)$ with respect to some $\sigma$-finite measure $\mu$. For testing $H_0: \theta \in \Omega_0$ against $H_1: \theta \in \Omega_1$ with $\Omega_0 \cap \Omega_1 = \emptyset$ (the null set), the likelihood ratio test rejects $H_0$ whenever $\lambda(x)$ is less than a constant, where
$$\lambda(x) = \frac{\sup_{\theta \in \Omega_0}p(x \mid \theta)}{\sup_{\theta \in \Omega_0 \cup \Omega_1}p(x \mid \theta)}.$$
Let the problem of testing $H_0$ against $H_1$ be invariant under a group $G$ of transformations $g$ transforming $x \to gx$, $g \in G$, $x \in \mathcal X$. Assume that $p(x \mid \theta) = p(gx \mid \bar g\theta)\chi(g)$ for some multiplier $\chi$. Show that $\lambda(x)$ is an invariant function.

3. Prove (2.8) and (2.25).

4. Prove Theorem 2.9.1.

References

1. S. A. Andersson, Distribution of maximal invariants using quotient measures, Ann. Statist. 10, pp. 955-961 (1982).
2. D. Blackwell, On a class of probability spaces, Proc. Third Berkeley Symp. Math. Statist. Probability, Univ. of California Press, Berkeley, California, 1956.
3. J. V. Bondar, Borel cross-sections and maximal invariants, Ann. Statist. 4, pp. 866-877 (1976).
4. T. Ferguson, Mathematical Statistics, Academic Press, 1969.
5. N. Giri, On the likelihood ratio test of a normal multivariate testing problems, Ph.D. Thesis, Stanford University, California, 1961.
6. N. Giri, On the likelihood ratio test of a normal multivariate testing problem, Ann. Math. Statist. 35, pp. 181-189 (1964).
7. N. Giri, On the likelihood ratio test of a normal multivariate testing problem-II, Ann. Math. Statist. 36, pp. 1061-1065 (1965).
8. N. Giri, On the distribution of a multivariate statistic, Sankhya 33, pp. 207-210 (1971).
9. N. Giri, Multivariate Statistical Inference, Academic Press, N.Y., 1977.
10. S. L. Isaacson, On the theory of unbiased tests of simple statistical hypotheses specifying the values of two or more parameters, Ann. Math. Statist. 22, pp. 217-234 (1951).
11. A. Haar, Der Massbegriff in der Theorie der kontinuierlichen Gruppen, Annals of Mathematics 34, pp. 147-169 (1933).
12. W. J. Hall, R. A. Wijsman and J. K. Ghosh, The relationship between sufficiency and invariance with application in sequential analysis, Ann. Math. Statist. 36, pp. 575-614 (1965).
13. P. R. Halmos, Finite Dimensional Vector Spaces, D. Van Nostrand Company, Inc., Princeton, 1958.
14. J. Kiefer, On the nonrandomized optimality and randomized nonoptimality of symmetrical designs, Ann. Math. Statist. 29, pp. 675-699 (1958).
15. U. Koehn, Global cross sections and the densities of maximal invariants, Ann. Math. Statist. 41, pp. 2045-2056 (1970).
16. E. L. Lehmann, Testing Statistical Hypotheses, John Wiley, N.Y., 1959.
17. E. L. Lehmann, Optimum invariant tests, Ann. Math. Statist. 30, pp. 881-884 (1959a).
18. L. Loomis, An Introduction to Abstract Harmonic Analysis, D. Van Nostrand Company, Inc., Princeton, 1948.
19. L. Nachbin, The Haar Integral, D. Van Nostrand Company, Inc., Princeton, 1965.
20. R. Schwartz, Locally minimax tests, Ann. Math. Statist. 38, pp. 340-359 (1967).
21. C. Stein, Multivariate Analysis-I, Technical Report 42, Department of Statistics, Stanford University, 1956.
22. R. A. Wijsman, Cross-sections of orbits and their application to densities of maximal invariants, Proc. Fifth Berkeley Symp. Math. Statist. Prob. 1, pp. 389-400 (1967), University of California, Berkeley.
23. R. A. Wijsman, General proof of termination with probability one of invariant sequential probability ratio tests based on multivariate observations, Ann. Math. Statist. 38, pp. 8-24 (1967).
24. R. A. Wijsman, Distribution of maximal invariants using global and local cross-sections, Preprint, Univ. of Illinois at Urbana-Champaign, 1978.
25. R. A. Wijsman, Proper action in steps, with application to density ratios of maximal invariants, Ann. Statist. 13, pp. 395-402 (1985).
26. R. A. Wijsman, Global cross sections as a tool for factorization of measures and distribution of maximal invariants, Sankhya A 48, pp. 1-42 (1986).

Chapter 3. EQUIVARIANT ESTIMATION IN CURVED MODELS

Let $(\mathcal X, \mathcal A)$ be a measure space and $\Omega$ be the set of points $\theta$. Denote by $\{P_\theta, \theta \in \Omega\}$ the set of probability distributions on $(\mathcal X, \mathcal A)$. Let $G$ be a group of transformations operating from the left on $\mathcal X$ such that each $g \in G$, $g: \mathcal X \to \mathcal X$, is one to one onto (bijective). Let $\bar G$ be the corresponding group of induced transformations $\bar g$ on $\Omega$. Assume

(a) for $\theta_i \in \Omega$, $i = 1, 2$, $\theta_1 \ne \theta_2$ implies $P_{\theta_1} \ne P_{\theta_2}$;

(b) $P_\theta(A) = P_{\bar g\theta}(gA)$, $A \in \mathcal A$, $g \in G$, $\bar g \in \bar G$.

Let $\lambda(\theta)$ be a maximal invariant on $\Omega$ under $\bar G$, and let

$$\Omega_{\lambda_0} = \{\theta \mid \theta \in \Omega \text{ with } \lambda(\theta) = \lambda_0\}, \qquad (3.1)$$

where $\lambda_0$ is known. We assume $\mathcal X$ to be the space of the minimal sufficient statistic for $\theta$.

Definition 3.1 (Equivariant Estimator). A point estimator $\hat\theta(X)$, $X \in \mathcal X$, a mapping from $\mathcal X$ to $\Omega$, is equivariant if $\hat\theta(gX) = \bar g\hat\theta(X)$, $g \in G$.

Note 1. For notational convenience we are taking $\bar G$ to be the group of transformations on $\hat\theta(X)$.

Note 2. The unique maximum likelihood estimator is equivariant.

Let $T(X)$, $X \in \mathcal X$, be a maximal invariant on $\mathcal X$ under $G$. Since the distribution of $T(X)$ depends on $\theta \in \Omega$ only through $\lambda(\theta)$, given $\lambda(\theta) = \lambda_0$, $T(X)$ is an ancillary statistic. An ancillary statistic is defined to be a part of the minimal sufficient statistic $X$ whose marginal distribution is parameter free. In this chapter we consider the approach of finding the best equivariant estimator in models admitting an ancillary statistic. Such models are assumed to be generated as an orbit under the induced group $\bar G$ on $\Omega$. The ancillary statistic is realized as the maximal invariant on $\mathcal X$ under $G$. A model which admits an ancillary statistic is referred to as a curved model. Models of this nature are not uncommon in statistics. Fisher (1925) considered the problem of estimating the mean of a normal population with variance $\sigma^2$ when the coefficient of variation is known. The motivation behind this was the empirically observed fact that a standard deviation $\sigma$ often becomes large proportionally to a corresponding mean $\mu$, so that the coefficient of variation $\sigma/\mu$ remains constant. This fact is often present in mutually correlated multivariate data. In the case of multivariate data, no well-accepted measure of variation between a mean vector $\mu$ and a covariance matrix $\Sigma$ is available. Kariya, Giri and Perron (1988) suggested the following multivariate versions of the coefficient of variation:

(i) $\lambda = \mu'\Sigma^{-1}\mu$;

(ii) $\nu = \Sigma^{-1/2}\mu$, where $\Sigma = \Sigma^{1/2}\Sigma^{1/2\prime}$ with $\Sigma^{1/2}$ the unique lower triangular matrix with positive diagonal elements.

In recent years Cox and Hinkley (1977), Efron (1978) and Amari (1982a, b), among others, have reconsidered the problem of estimating $\mu$ when the coefficient of variation is known, in the context of curved models.

3.1. Best Equivariant Estimation of $\mu$ with $\lambda$ Known

Let $X_1, \ldots, X_n$ $(n > p)$ be independently and identically distributed $N_p(\mu, \Sigma)$. We want to estimate $\mu$ under the loss function

$$L(\mu, d) = (d - \mu)'\Sigma^{-1}(d - \mu) \qquad (3.2)$$

when $\lambda = \mu'\Sigma^{-1}\mu = \lambda_0$ (known). Let

$$n\bar X = \sum_{i=1}^{n}X_i, \qquad S = \sum_{i=1}^{n}(X_i - \bar X)(X_i - \bar X)'.$$

Then $(\bar X, S)$ is minimal sufficient for $(\mu, \Sigma)$, $\sqrt n\bar X$ is distributed independently of $S$ as $N_p(\sqrt n\mu, \Sigma)$, and $S$ is distributed as Wishart $W_p(n - 1, \Sigma)$ with $n - 1$ degrees of freedom and parameter $\Sigma$. Under the loss function (3.2) this problem remains invariant under the group of transformations $G_l(p)$ of $p \times p$ nonsingular matrices $g$ transforming

$$(\bar X, S) \to (g\bar X, gSg'). \qquad (3.3)$$

The corresponding group $\bar G$ of induced transformations $\bar g$ on the parametric space $\Omega$ transforms $\theta = (\mu, \Sigma) \to \bar g\theta = (g\mu, g\Sigma g')$. A maximal invariant on the space of $(\sqrt n\bar X, S)$ is

$$T^2 = n(n - 1)\bar X'S^{-1}\bar X,$$

and the corresponding maximal invariant on the parametric space $\Omega$ is $\lambda = \mu'\Sigma^{-1}\mu$. Since the distribution of $T^2$ depends on the parameters only through $\lambda$, given $\lambda = \lambda_0$, $T^2$ is an ancillary statistic. Since the loss function $L(\mu, d)$ is invariant under the group $G_l(p)$ of transformations $g$ transforming $(\bar X, S) \to (g\bar X, gSg')$ and $(\mu, \Sigma) \to (g\mu, g\Sigma g')$, we get for any equivariant estimator $\hat\mu$ of $\mu$

$$R(\theta, \hat\mu) = E_\theta(\hat\mu - \mu)'\Sigma^{-1}(\hat\mu - \mu)$$
$$= E_\theta(g\hat\mu - g\mu)'(g\Sigma g')^{-1}(g\hat\mu - g\mu)$$
$$= E_\theta(\hat\mu(g\bar X, gSg') - g\mu)'(g\Sigma g')^{-1}(\hat\mu(g\bar X, gSg') - g\mu)$$
$$= E_{\bar g\theta}(\hat\mu(\bar X, S) - g\mu)'(g\Sigma g')^{-1}(\hat\mu(\bar X, S) - g\mu)$$
$$= R(\bar g\theta, \hat\mu) \qquad (3.4)$$

for $g \in G_l(p)$, where $\bar g$ is the induced transformation on $\Omega$ corresponding to $g$ on $\mathcal X$. Since $G_l(p)$ acts transitively on the set $\Omega_{\lambda_0}$ of (3.1), we conclude from (3.4) that the risk of any equivariant estimator $\hat\mu$ is a constant for all $\theta \in \Omega_{\lambda_0}$.

Taking $\lambda_0 = 1$, which we can do without any loss of generality, and using the fact that $R(\theta, \hat\mu)$ is a constant, we can choose $\mu = e = (1, 0, \ldots, 0)'$, $\Sigma = I$. To find the best equivariant estimator, which minimizes $R(\theta, \hat\mu)$ among all equivariant estimators $\hat\mu$ satisfying $\hat\mu(g\bar X, gSg') = g\hat\mu(\bar X, S)$, we need to characterize $\hat\mu$. Let $G_T(p)$ be the subgroup of $G_l(p)$ containing all $p \times p$ lower triangular matrices with positive diagonal elements. Since $S$ is positive definite with probability one, we can write $S = WW'$, $W \in G_T(p)$. Let

$$V = W^{-1}Y, \qquad Q = \frac{V}{\|V\|}, \qquad (3.5)$$

where $Y = \sqrt n\bar X$ and $\|V\|^2 = V'V = T^2/(n - 1)$.
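The quantities in (3.5) can be computed directly from a sample. The following Python sketch (standard library only; the function names and the small data set are illustrative) forms $Y = \sqrt n\bar X$, the lower triangular factor $W$ with $S = WW'$, $V = W^{-1}Y$, $Q = V/\|V\|$ and $U = V'V = T^2/(n-1)$. These are the ingredients of the form $k(U)WQ$ in Theorem 3.1; the optimal $k(U)$ of (3.8) involves conditional expectations and is not computed here.

```python
import math

def cholesky_lower(s):
    """Lower triangular W with positive diagonal such that W W' = S (S positive definite)."""
    p = len(s)
    w = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            acc = s[i][j] - sum(w[i][k] * w[j][k] for k in range(j))
            w[i][j] = math.sqrt(acc) if i == j else acc / w[j][j]
    return w

def reduce_sample(xs):
    """Return Y, S, W, V, Q, U of (3.5) for a list of p-dimensional observations."""
    n, p = len(xs), len(xs[0])
    xbar = [sum(x[j] for x in xs) / n for j in range(p)]
    s = [[sum((x[i] - xbar[i]) * (x[j] - xbar[j]) for x in xs) for j in range(p)]
         for i in range(p)]
    y = [math.sqrt(n) * v for v in xbar]
    w = cholesky_lower(s)
    v = []
    for i in range(p):  # forward substitution for W v = y
        v.append((y[i] - sum(w[i][k] * v[k] for k in range(i))) / w[i][i])
    u = sum(vi * vi for vi in v)          # U = T^2 / (n - 1) = Y' S^{-1} Y
    q = [vi / math.sqrt(u) for vi in v]
    return y, s, w, v, q, u

data = [[1.0, 2.0], [2.0, 1.5], [0.5, 3.0], [1.5, 2.5], [3.0, 1.0]]
y, s, w, v, q, u = reduce_sample(data)
print(u, q)
```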

Theorem 3.1. If $\hat\mu$ is an equivariant estimator of $\mu$ under $G_l(p)$, then

$$\hat\mu(Y, S) = k(U)WQ, \qquad (3.6)$$

where $k$ is a measurable function of $U = T^2/(n - 1)$.

Proof. Since $\hat\mu$ is equivariant under $G_l(p)$, we get for $g \in G_l(p)$
$$g\hat\mu(Y, S) = \hat\mu(gY, gSg').$$
Replace $Y$ by $W^{-1}Y$, $g$ by $W$ and $S$ by $I$ in the above to obtain
$$\hat\mu(Y, S) = W\hat\mu(V, I). \qquad (3.7)$$
Let $O$ be a $p \times p$ orthogonal matrix with $Q = V/\|V\|$ as its first column. Then, since $O'V = \|V\|e = \sqrt U e$,
$$\hat\mu(V, I) = \hat\mu(OO'V, OO') = O\hat\mu(\sqrt U e, I).$$
Since the columns of $O$ except the first one are arbitrary as far as they are orthogonal to $Q$, it is easy to see that the components of $\hat\mu(\sqrt U e, I)$ except the first component $\hat\mu_1(\sqrt U e, I)$ are zero. Hence
$$\hat\mu(V, I) = Q\hat\mu_1(\sqrt U e, I),$$
and (3.6) follows from (3.7) with $k(U) = \hat\mu_1(\sqrt U e, I)$.

The following theorem gives the best equivariant estimator (BEE) of $\mu$.

Theorem 3.2. Under the loss function (3.2) the unique BEE of $\mu$ is $\hat\mu = k(U)WQ$, where

$$k(U) = \frac{E(Q'W'e \mid U)}{E(Q'W'WQ \mid U)}. \qquad (3.8)$$

Proof. By Theorem 3.1 the risk function of an equivariant estimator $\hat\mu$, given $\lambda_0 = 1$, is
$$R(\theta, \hat\mu) = R((e, I), \hat\mu) = E(k(U)WQ - e)'(k(U)WQ - e).$$
Using the fact that $U$ is ancillary, a unique BEE is obtained as $\hat\mu = k(U)WQ$, where $k(U)$ minimizes the conditional risk given $U$,
$$E[(k(U)WQ - e)'(k(U)WQ - e) \mid U] = k^2(U)E(Q'W'WQ \mid U) - 2k(U)E(Q'W'e \mid U) + 1.$$
This implies that $k(U)$ is given by (3.8).

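3.1.1. Maximum Likelihood Estimators

The maximum likelihood estimators $\hat\mu$ and $\hat\Sigma$ of $\mu$ and $\Sigma$, respectively, under the restriction $\lambda_0 = 1$ are obtained by maximizing

$$-\frac n2\log\det\Sigma - \frac12\operatorname{tr}S\Sigma^{-1} - \frac n2(\bar x - \mu)'\Sigma^{-1}(\bar x - \mu) - \frac\gamma2(\mu'\Sigma^{-1}\mu - 1) \qquad (3.9)$$

with respect to $\mu$ and $\Sigma$, where $\gamma$ is the Lagrange multiplier. Maximizing (3.9) we obtain

$$\hat\mu = \frac{n\bar x}{n + \gamma}, \qquad \hat\Sigma = \frac1n S + \frac{\gamma}{n + \gamma}\bar x\bar x'. \qquad (3.10)$$

Using the condition $\hat\mu'\hat\Sigma^{-1}\hat\mu = 1$ we get, with $U = T^2/(n - 1)$,

$$\hat\mu = \frac{-U + (U(4 + 5U))^{1/2}}{2U}\,\bar x. \qquad (3.11)$$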
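The closed form (3.11) can be checked against the constraint it comes from. In the sketch below (plain Python with $p = 2$, so that $2 \times 2$ inverses can be written out; the data are arbitrary), $b = n/(n + \gamma)$ is taken from (3.11), $\gamma = n(1 - b)/b$ is recovered from it, $\hat\mu = n\bar x/(n + \gamma)$ and $\hat\Sigma = S/n + \gamma\bar x\bar x'/(n + \gamma)$ are formed, and $\hat\mu'\hat\Sigma^{-1}\hat\mu$ is verified to equal 1.

```python
import math

def inv2(a):
    """Inverse of a 2 x 2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

def quad2(v, a):
    """v' a v for a 2-vector v and a 2 x 2 matrix a."""
    return sum(v[i] * a[i][j] * v[j] for i in range(2) for j in range(2))

def constrained_mle(xbar, s, n):
    """MLE of (mu, Sigma) under mu' Sigma^{-1} mu = 1; s = sum (x_i - xbar)(x_i - xbar)'."""
    u = n * quad2(xbar, inv2(s))                     # U = T^2 / (n - 1)
    b = (math.sqrt(u * (4 + 5 * u)) - u) / (2 * u)   # b = n / (n + gamma), from (3.11)
    gamma = n * (1 - b) / b                          # Lagrange multiplier
    mu = [b * x for x in xbar]
    c = gamma / (n + gamma)
    sigma = [[s[i][j] / n + c * xbar[i] * xbar[j] for j in range(2)] for i in range(2)]
    return mu, sigma

mu_hat, sigma_hat = constrained_mle([1.0, 0.5], [[4.0, 1.0], [1.0, 3.0]], 10)
print(quad2(mu_hat, inv2(sigma_hat)))  # 1 up to rounding
```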

The maximum likelihood estimator (mle) $\hat\mu$ is clearly equivariant and hence it is dominated by the BEE of Theorem 3.2 for any $\mu$. In the univariate case the mle is

$$\hat\mu = \frac12\Big[-\bar x + \operatorname{sgn}(\bar x)\Big(\bar x^2 + \frac4n\sum_{i=1}^{n}x_i^2\Big)^{1/2}\Big]. \qquad (3.12)$$

Thus (3.11) is the natural extension of (3.12). Hinkley (1977) investigated some properties of this model associated with the Fisher information. Amari (1982a, b) proposed, through a geometric approach, what he called the dual mle, which is also equivariant.

3.2. A Particular Case

As a particular case of $\lambda$ constant we now consider the case $\Sigma = C^{-2}(\mu'\mu)I$, where $C$ is known, so that $\lambda = \mu'\Sigma^{-1}\mu = C^2$. Let $Y = \sqrt n\bar X$, $W = \operatorname{tr}S$. Then $(Y, W)$ is a sufficient statistic for this problem, and $C^2W/(\mu'\mu)$ is distributed independently of $Y$ as $\chi^2_{(n-1)p}$. We are interested here in estimating $\mu$ with respect to the loss function

$$L(\mu, d) = \frac{(d - \mu)'(d - \mu)}{\mu'\mu}. \qquad (3.13)$$

This problem remains invariant with respect to the group $G = R_+ \times O(p)$, $R_+$ being the multiplicative group of positive reals and $O(p)$ being the multiplicative group of $p \times p$ orthogonal matrices, transforming

$$X_i \to b\Gamma X_i, \ i = 1, \ldots, n, \qquad \mu \to b\Gamma\mu, \qquad d \to b\Gamma d,$$

where $(b, \Gamma) \in G$ with $b \in R_+$, $\Gamma \in O(p)$. The transformation induced by $G$ on $(Y, W)$ is given by

$$(Y, W) \to (b\Gamma Y, b^2W).$$

The theorem below gives the representation of an equivariant estimator under $G$.

Theorem 3.3. An estimator $d(Y, W)$ is equivariant if and only if there exists a measurable function $h: R_+ \to R$ such that

$$d(Y, W) = h\Big(\frac{Y'Y}{W}\Big)Y$$

for all $(Y, W) \in R^p \times R_+$.

Proof. If $h$ is a measurable function from $R_+$ to $R$ and $d(Y, W) = h(Y'Y/W)Y$, then clearly $d$ is equivariant under $G$. Conversely, if $d$ is equivariant under $G$, then

$$d(b\Gamma Y, b^2W) = b\Gamma d(Y, W) \qquad (3.14)$$

for all $\Gamma \in O(p)$, $Y \in R^p$, $b > 0$, $W > 0$. We may assume without any loss of generality that $Y'Y > 0$. Let $Y, W$ be fixed and let $A \in O(p)$ be a fixed $p \times p$ matrix such that $Y'A = (\|Y\|, 0, \ldots, 0)$. Let $A$ be partitioned as $A = (A_1, A_2)$, where $A_1 = \|Y\|^{-1}Y$. Now choose $\Gamma = (A_1, A_2B)$ with $B \in O(p - 1)$ and $b = \|Y\|$. Writing $d_1$ for the first component of the $p$-vector $d$ and $d_{(2)}$ for the vector of its remaining $p - 1$ components, from (3.14) we get

$$d(Y, W) = d_1\Big((1, 0, \ldots, 0)', \frac{W}{Y'Y}\Big)Y + \|Y\|A_2Bd_{(2)}\Big((1, 0, \ldots, 0)', \frac{W}{Y'Y}\Big). \qquad (3.15)$$

Since (3.15) holds for any choice of $B \in O(p - 1)$, we must have

$$d(Y, W) = d_1\Big((1, 0, \ldots, 0)', \frac{W}{Y'Y}\Big)Y,$$

which is of the required form with $h(v) = d_1((1, 0, \ldots, 0)', v^{-1})$.
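Theorem 3.3 says that every equivariant estimator has the form $h(Y'Y/W)Y$. The sketch below checks the defining relation (3.14), $d(b\Gamma Y, b^2W) = b\Gamma d(Y, W)$, for one such estimator in $p = 2$ with $\Gamma$ a rotation. The particular $h(v) = v/(1 + v)$ is a hypothetical choice made only for illustration.

```python
import math

def make_estimator(h):
    """Build d(Y, W) = h(Y'Y / W) Y, the form given by Theorem 3.3."""
    def d(y, w):
        v = sum(c * c for c in y) / w
        return [h(v) * c for c in y]
    return d

def act(b, theta, y):
    """Apply b * Gamma to a 2-vector, with Gamma the rotation by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [b * (c * y[0] - s * y[1]), b * (s * y[0] + c * y[1])]

d = make_estimator(lambda v: v / (1 + v))   # hypothetical h, for illustration only
y, w, b, theta = [1.2, -0.7], 2.5, 1.7, 0.9
lhs = d(act(b, theta, y), b * b * w)        # d(b Gamma Y, b^2 W)
rhs = act(b, theta, d(y, w))                # b Gamma d(Y, W)
print(lhs, rhs)                             # equal up to rounding
```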

It may be easily verified that a maximal invariant under $G$ in the space of the sufficient statistic is $V = W^{-1}Y'Y$, while the corresponding maximal invariant $\mu'\Sigma^{-1}\mu$ in the parametric space equals the known constant $C^2$. As the group acts transitively on the parametric space, the risk function (using (3.13)) $R(\mu, d) = E_\mu(L(\mu, d))$ of any equivariant estimator $d$ is constant. Hence we can take $\mu = \mu_0 = (1, 0, \ldots, 0)'$. Thus the risk of any equivariant estimator $d$ can be written as

$$R(\mu_0, d) = E_{\mu_0}\Big(L\Big(\mu_0, h\Big(\frac{Y'Y}{W}\Big)Y\Big)\Big) = E_V\big(E_{\mu_0}(L(\mu_0, h(V)Y) \mid V)\big).$$

To find a BEE we need to find the function $h_0$ satisfying

$$E_{\mu_0}(L(\mu_0, h_0(V)Y) \mid V = v) \le E_{\mu_0}(L(\mu_0, h(V)Y) \mid V = v)$$

for all measurable functions $h: R_+ \to R$ and for all values $v$ of $V$. Since

$$E_{\mu_0}(L(\mu_0, h(V)Y) \mid V = v) = h^2(v)E_{\mu_0}(Y'Y \mid V = v) - 2h(v)E_{\mu_0}(Y_1 \mid V = v) + 1,$$

where $Y = (Y_1, \ldots, Y_p)'$, we get

$$h_0(v) = \frac{E_{\mu_0}(Y_1 \mid V = v)}{E_{\mu_0}(Y'Y \mid V = v)}. \qquad (3.16)$$

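Theorem 3.4. The BEE $d_0(X_1, \ldots, X_n; C) = d_0(Y, W)$ is given by $d_0(Y, W) = h_0(V)Y$, where, with $t = v/(1 + v)$,

$$h_0(v) = \frac{\sqrt nC^2}{2}\,\frac{\sum_{j=0}^{\infty}\frac{1}{j!}\big(\frac{nC^2t}{2}\big)^j\frac{\Gamma(\frac12np + j + 1)}{\Gamma(\frac12p + j + 1)}}{\sum_{j=0}^{\infty}\frac{1}{j!}\big(\frac{nC^2t}{2}\big)^j\frac{\Gamma(\frac12np + j + 1)}{\Gamma(\frac12p + j)}}. \qquad (3.17)$$

Proof. The joint probability density function of $Y$ and $W$ under the assumption $\mu = \mu_0$ is

$$f(y, w) = \begin{cases}\dfrac{\exp\{-(C^2/2)(y'y - 2\sqrt n y_1 + n + w)\}\,w^{(n-1)p/2 - 1}}{2^{np/2}(C^{-2})^{np/2}(\Gamma(\frac12))^p\,\Gamma((n - 1)p/2)}, & \text{if } w > 0,\\ 0, & \text{otherwise.}\end{cases}$$

Hence the joint probability density function of $Y$ and $V$ is given by

$$f(y, v) = \begin{cases}\dfrac{\exp\{-(C^2/2)[(1 + v^{-1})y'y - 2\sqrt n y_1 + n]\}\,(y'y)^{(n-1)p/2}\,v^{-(n-1)p/2 - 1}}{2^{np/2}(C^{-2})^{np/2}(\Gamma(\frac12))^p\,\Gamma((n - 1)p/2)}, & \text{if } v > 0,\\ 0, & \text{otherwise.}\end{cases}$$

Using (3.16) we get (3.17).

Corollary. If $m = p(n - 1)/2$ is an integer, the BEE is given by

$$d_0(Y, W) = h_0(v)Y = g(t)\bar X, \qquad (3.18)$$

with $g(t) = \rho(t)/w(t)$, where

$$\rho(t) = \frac{nC^2}{2}\sum_{j=0}^{m}\binom{m}{j}\frac{(nC^2t/2)^j}{\Gamma(\frac12p + j + 1)}, \qquad w(t) = \sum_{j=0}^{m+1}\binom{m+1}{j}\frac{(nC^2t/2)^j}{\Gamma(\frac12p + j)}.$$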
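The infinite series for $h_0$ and the finite sums of the Corollary can be compared numerically. The sketch below (plain Python; function names are illustrative) computes $g(t) = \sqrt n\,h_0(v)$, with $t = v/(1 + v)$ and $x = nC^2t/2$, once from the series and once from $\rho(t)/w(t)$, for cases in which $m = p(n - 1)/2$ is an integer. It also checks the limit $g(t) \to nC^2/p$ as $t \to 0^+$ stated in Note 1 below.

```python
import math

def g_series(t, n, p, c, max_terms=1000, tol=1e-16):
    """g(t) = (n C^2 / 2) * sum_j x^j/j! G(np/2+j+1)/G(p/2+j+1)
                            / sum_j x^j/j! G(np/2+j+1)/G(p/2+j),  x = n C^2 t / 2.
    The common factor G(np/2 + 1) is dropped; successive terms use ratio recursions."""
    x = n * c * c * t / 2
    a = n * p / 2
    num_term = 1.0 / math.gamma(p / 2 + 1)
    den_term = 1.0 / math.gamma(p / 2)
    num, den = num_term, den_term
    for j in range(max_terms):
        num_term *= x / (j + 1) * (a + j + 1) / (p / 2 + j + 1)
        den_term *= x / (j + 1) * (a + j + 1) / (p / 2 + j)
        num += num_term
        den += den_term
        if den_term < tol * den and j > x:
            break
    return n * c * c / 2 * num / den

def g_finite(t, n, p, c):
    """g(t) = rho(t) / w(t) from the Corollary; requires m = p(n - 1)/2 to be an integer."""
    if (p * (n - 1)) % 2:
        raise ValueError("p(n - 1)/2 must be an integer")
    m = p * (n - 1) // 2
    x = n * c * c * t / 2
    rho = n * c * c / 2 * sum(math.comb(m, j) * x ** j / math.gamma(p / 2 + j + 1)
                              for j in range(m + 1))
    w = sum(math.comb(m + 1, j) * x ** j / math.gamma(p / 2 + j) for j in range(m + 2))
    return rho / w

print(g_series(0.6, 5, 2, 0.8), g_finite(0.6, 5, 2, 0.8))  # the two agree
print(g_series(0.0, 5, 2, 0.8))                            # n C^2 / p = 1.6
```

The recursion form avoids evaluating large gamma functions directly; the series is truncated once the terms are negligible relative to the partial sum.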

Proof. Let $Z$ be distributed as $\chi'^2_k(\delta)$. Then

$$E(Z^r) = 2^r\sum_{j=0}^{\infty}\frac{e^{-\delta/2}(\delta/2)^j}{j!}\frac{\Gamma(\frac12k + j + r)}{\Gamma(\frac12k + j)}.$$

Hence, from the joint density of $(Y, V)$ in the proof of Theorem 3.4,

$$h_0(v) = \sqrt nC^2\,\frac{E(V_1^m)}{E(V_2^{m+1})}, \qquad (3.19)$$

where $V_1$ is distributed as noncentral $\chi'^2_{p+2}(nC^2t)$ and $V_2$ is distributed as noncentral $\chi'^2_p(nC^2t)$. Hence, letting $Z$ be distributed as $\chi'^2_k(\delta)$ and taking $r$ as an integer, we get

$$E(Z^r) = 2^r\Gamma(\tfrac12k + r)\sum_{j=0}^{r}\binom{r}{j}\frac{(\delta/2)^j}{\Gamma(\frac12k + j)}. \qquad (3.20)$$

From (3.19) and (3.20) we get (3.18).

Note 1. $g(t)$ is a continuous function of $t$, and

$$\lim_{t \to 0^+}g(t) = nC^2/p, \qquad \lim_{t \to 1}g(t) = g(1) < 1,$$

and $g(t) > 0$ for all $t$. Thus when $Y'Y$ is large relative to $W$ (so that $t$ is close to 1), the BEE shrinks $\bar X$ towards the origin.

Note 2. We can also write $d_0 = (1 - \tau(v)/v)\bar X$, where $\tau(v)/v = 1 - g(t)$. This form is very popular in the literature. Perron and Giri (1990) have shown that $g(t)$ is a strictly decreasing function of $t$ and $\tau(v)$ is strictly increasing in $v$. The result that $g(t)$ is strictly decreasing in $t$ tells what one may intuitively do if one has an idea of the true value of $C$ and observes many large values concentrated together. Normally one is suspicious of their effect on the sample mean and tends to shrink the sample mean towards the origin. That is what our estimator does. The result that $\tau(v)$ is strictly increasing in $v$ relates the BEE of the mean for $C$ known to the class of minimax estimators of the mean for $C$ unknown. Efron and Morris (1977) have shown that a necessary condition for an equivariant estimator of the form $g(t)\bar X$ to be minimax is $g(t) \to 1$ as $t \to 1$, so our estimator fails to be minimax if we do not know the value of $C$. On the other hand, Efron and Morris (1977) have shown that an estimator of the form $d = (1 - \tau(V)/V)\bar X$ is minimax if (i) $\tau$ is an increasing function, and (ii) $0 < \tau(v) \le 2(p - 2)/((n - 1)p + 2)$ for all $v \in (0, \infty)$. Thus our estimator satisfies (i) but fails to satisfy (ii). So a truncated version of our estimator could be a compromise between the best one can do when one knows the value of $C$ and the worst one can do by using an incorrect value of $C$.

3.2.1. An Application

The following interesting application of this model is given by Kent, Briden and Mardia (1983). The natural remanent magnetization (NRM) in rocks is known to have, in general, originated in one or more relatively short time intervals during rock-forming or metamorphic events, during which NRM is frozen in by falling temperature, grain growth, etc. The NRM acquired during each such event is a single vector magnetization parallel to the then-prevailing geomagnetic field and is called a component of NRM. By thermal, alternating field or chemical demagnetization in stages, these components can be identified. Resistance to these treatments is known as "stability of remanence". At each stage of the demagnetization treatment one measures the remanent magnetization as a vector in 3-dimensional space. These observations are represented by vectors $X_1, \ldots, X_n$ in $R^3$. They considered the model given by

$$X_i = \alpha_i + \beta_i + e_i,$$

where $\alpha_i$ denotes the true magnetization at the $i$th step, $\beta_i$ represents the model error, and $e_i$ represents the measurement error. They assumed that $\beta_i$ and $e_i$ are independent, $\beta_i$ is distributed as $N_3(0, \tau^2(\alpha_i)I)$, and $e_i$ is distributed as $N_3(0, \sigma^2(\alpha_i)I)$. The $\alpha_i$ are assumed to possess some specific structure, such as collinearity, which one attempts to determine. Sometimes the magnitude of the model error is harder to ascertain and one reasonably assumes $\tau^2(\alpha) = 0$. In practice $\sigma^2(\alpha)$ is allowed to depend on $\alpha$, and a plausible model which fits many data reasonably well is $\sigma^2(\alpha) = a(\alpha'\alpha) + b$ with $a > 0$, $b > 0$. When $\alpha'\alpha$ is large, $b$ is essentially 0 and $a$ is unknown.

3.2.2. Maximum Likelihood Estimator

The likelihood function of $x_1, \ldots, x_n$ with $C$ known is given by

$$L(x_1, \ldots, x_n \mid \mu) = \Big(\frac{C^2}{2\pi\mu'\mu}\Big)^{np/2}\exp\Big\{-\frac{C^2}{2\mu'\mu}\sum_{i=1}^{n}(x_i'x_i - 2x_i'\mu + \mu'\mu)\Big\}.$$

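Thus the mle $\hat\mu$ of $\mu$ (if it exists) is given by

$$\Big[\frac{np}{C^2}(\hat\mu'\hat\mu) - w - y'y + 2\sqrt n y'\hat\mu\Big]\hat\mu = \sqrt n(\hat\mu'\hat\mu)y, \qquad (3.21)$$

where $y = \sqrt n\bar x$ and $w = \operatorname{tr}s$ are the observed values of $Y$ and $W$. If Eq. (3.21) has a solution in $\hat\mu$, it must be collinear with $y$. Let $\hat\mu = ky$ be a solution. From (3.21) we obtain

$$k\Big[\frac{np}{C^2}(y'y)k^2 + \sqrt n(y'y)k - (y'y + w)\Big] = 0.$$

The two nonzero solutions for $k$ are

$$k_1 = \frac{-1 - (1 + \frac{4p}{C^2}(1 + \frac{w}{y'y}))^{1/2}}{2\sqrt np/C^2}, \qquad k_2 = \frac{-1 + (1 + \frac{4p}{C^2}(1 + \frac{w}{y'y}))^{1/2}}{2\sqrt np/C^2}.$$

To find the value of $k$ which maximizes the likelihood we compute the matrix of second derivatives

$$\frac{\partial^2(-\log L)}{\partial\mu\,\partial\mu'}\Big|_{\mu = ky}$$

and require that this matrix be positive definite. The characteristic roots of this matrix are given by

$$\lambda_1 = \frac{\sqrt nC^2}{k^3y'y} \ \ (\text{with multiplicity } p - 1), \qquad \lambda_2 = \frac{\sqrt nC^2 + 2npk}{k^3y'y}.$$

If $k = k_1$, then $\lambda_1 < 0$, so the matrix is not positive definite. But if $k = k_2$, then $\lambda_1 > 0$ and $\lambda_2 > 0$. Hence the mle $\hat\mu = d_1(x_1, \ldots, x_n; C)$ is given by (corresponding to positive characteristic roots)

$$d_1(x_1, \ldots, x_n; C) = \frac{(1 + \frac{4p}{C^2}(1 + \frac{w}{y'y}))^{1/2} - 1}{2p}\,C^2\bar X. \qquad (3.22)$$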
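A quick numerical check of (3.21) and (3.22): the sketch below (plain Python; the data values are arbitrary and the function names are illustrative) computes $k_2$, forms $\hat\mu = k_2y$, confirms that it agrees with $d_1$ of (3.22), and verifies that it satisfies the likelihood equation (3.21).

```python
import math

def mle_k2(y, w, n, c):
    """Positive root k_2 of (np/C^2)(y'y)k^2 + sqrt(n)(y'y)k - (y'y + w) = 0."""
    p = len(y)
    yy = sum(v * v for v in y)
    r = math.sqrt(1 + 4 * p / c ** 2 * (1 + w / yy))
    return (-1 + r) / (2 * math.sqrt(n) * p / c ** 2)

def d1(xbar, w, n, c):
    """The mle d_1 of (3.22), written in terms of xbar (y = sqrt(n) xbar)."""
    p = len(xbar)
    yy = n * sum(v * v for v in xbar)
    r = math.sqrt(1 + 4 * p / c ** 2 * (1 + w / yy))
    return [(r - 1) / (2 * p) * c ** 2 * v for v in xbar]

def score_residual(mu, y, w, n, c):
    """Left side minus right side of (3.21); zero at a stationary point."""
    p = len(y)
    mm = sum(v * v for v in mu)
    yy = sum(v * v for v in y)
    ym = sum(a * b for a, b in zip(y, mu))
    factor = n * p / c ** 2 * mm - w - yy + 2 * math.sqrt(n) * ym
    return [factor * m_i - math.sqrt(n) * mm * y_i for m_i, y_i in zip(mu, y)]

n, c, w = 8, 1.3, 5.0
xbar = [0.9, -0.4, 0.2]
y = [math.sqrt(n) * v for v in xbar]
mu_hat = [mle_k2(y, w, n, c) * v for v in y]
print(mu_hat, score_residual(mu_hat, y, w, n, c))
```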

Since the maximum likelihood estimator is equivariant and it differs from the BEE $d_0$, the mle $d_1$ is inadmissible. The risk function of $d_0$ depends on $C$. Perron and Giri (1990) computed the relative efficiency of $d_0$ when compared with $d_1$, the James-Stein estimator ($d_2$), the positive part of the James-Stein estimator ($d_3$), and the sample mean $\bar X$ ($d_4$) for different values of $C$, $n$ and $p$. They concluded that when the sample size $n$ increases for a given $p$ and $C$, the relative efficiency of $d_0$ when compared with $d_i$, $i = 1, \ldots, 4$, does not change significantly. This changes markedly when $C$ varies. When $C$ is small, $d_0$ is markedly superior to the others. On the other hand, when $C$ is large all five estimators are more or less similar. These conclusions are not exact, as the risks of $d_0$ and $d_1$ are evaluated by simulation. Nevertheless, they give a sufficient indication that for small values of $C$ the use of the BEE is clearly advantageous.

3.3. Best Equivariant Estimation in Curved Covariance Models

Let $X_1, \ldots, X_n$ $(n > p > 2)$ be independently and identically distributed $p$-dimensional normal vectors with mean vector $\mu$ and positive definite covariance matrix $\Sigma$. Let $\Sigma$ and $S = \sum_{i=1}^{n}(X_i - \bar X)(X_i - \bar X)'$ be partitioned as

$$\Sigma = \begin{pmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{pmatrix}, \qquad S = \begin{pmatrix}S_{11} & S_{12}\\ S_{21} & S_{22}\end{pmatrix},$$

where $\Sigma_{11}$ and $S_{11}$ are $1 \times 1$ and $\Sigma_{22}$ and $S_{22}$ are $(p - 1) \times (p - 1)$. We are interested in finding the BEE of $\beta = \Sigma_{22}^{-1}\Sigma_{21}$ on the basis of the $n$ sample observations $x_1, \ldots, x_n$ when one knows the value of the multiple correlation coefficient $\rho^2 = \Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$. If the value of $\rho^2$ is significant, one would naturally be interested in estimating $\beta$ for prediction purposes and also in estimating $\Sigma_{22}$ to ascertain the variability of the prediction variables. Let $G_l(p)$ be the full linear group of $p \times p$ nonsingular matrices and let $O(p)$ be the multiplicative group of $p \times p$ orthogonal matrices. Let $H_1$ be the subgroup of $G_l(p)$ defined by

$$H_1 = \Big\{h \in G_l(p) : h = \begin{pmatrix}h_{11} & 0\\ 0 & h_{22}\end{pmatrix}, \ h_{11} \text{ is } 1 \times 1, \ h_{22} \text{ is } (p - 1) \times (p - 1)\Big\},$$

and let $H_2$ be the additive group in $R^p$. Define $G = H_1 \oplus H_2$, the direct sum of $H_1$ and $H_2$. The transformation $g = (h_1, h_2) \in G$ transforms

$$X_i \to h_1X_i + h_2, \quad i = 1, \ldots, n, \qquad (\mu, \Sigma) \to (h_1\mu + h_2, h_1\Sigma h_1').$$

The corresponding transformation on the sufficient statistic is given by

$$(\bar X, S) \to (h_1\bar X + h_2, h_1Sh_1').$$

A maximal invariant in the space of $(\bar X, S)$ under $G$ is $R^2 = S_{11}^{-1}S_{12}S_{22}^{-1}S_{21}$, and a corresponding maximal invariant in the parametric space of $(\mu, \Sigma)$ is $\rho^2$.
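The invariance of $R^2$ is easy to confirm numerically. In the sketch below (plain Python, $p = 3$ partitioned as $1 + 2$; the matrices are arbitrary), $S$ is replaced by $hSh'$ with $h = \mathrm{diag}(h_{11}, h_{22})$ and $R^2$ is recomputed; the translation $h_2$ does not affect $S$ at all. This only illustrates invariance, not maximality.

```python
def r_squared(s):
    """R^2 = S_11^{-1} S_12 S_22^{-1} S_21 for a 3 x 3 matrix S partitioned as 1 + 2."""
    a, b, c, e = s[1][1], s[1][2], s[2][2], s[2][1]
    det = a * c - b * e
    inv22 = [[c / det, -b / det], [-e / det, a / det]]
    s12 = [s[0][1], s[0][2]]
    s21 = [s[1][0], s[2][0]]
    quad = sum(s12[i] * inv22[i][j] * s21[j] for i in range(2) for j in range(2))
    return quad / s[0][0]

def congruence(s, h11, h22):
    """Return h S h' with h = diag(h11, h22), h11 a scalar and h22 a 2 x 2 matrix."""
    h = [[h11, 0.0, 0.0], [0.0, h22[0][0], h22[0][1]], [0.0, h22[1][0], h22[1][1]]]
    hs = [[sum(h[i][k] * s[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    return [[sum(hs[i][k] * h[j][k] for k in range(3)) for j in range(3)] for i in range(3)]

S = [[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]]
print(r_squared(S), r_squared(congruence(S, -1.5, [[2.0, 1.0], [0.3, -1.0]])))
```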

3.3.1. Characterization of Equivariant Estimators of $\Sigma$

Let $S_p$ be the space of all $p \times p$ positive definite matrices and let $G_T^+(p)$ be the group of $p \times p$ lower triangular matrices with positive diagonal elements. An equivariant estimator $d(\bar X, S)$ of $\Sigma$ with respect to the group of transformations $G$ is a measurable function $d(\bar X, S)$ on $S_p \times R^p$ to $S_p$ satisfying

$$d(h\bar X + \xi, hSh') = hd(\bar X, S)h'$$

for all $S \in S_p$, $h \in H_1$ and $\bar X, \xi \in R^p$. From this definition it is easy to see that if $d$ is equivariant with respect to $G$, then $d(\bar X, S) = d(0, S)$ for all $\bar X \in R^p$, $S \in S_p$. Thus without any loss of generality we can assume that $d$ is a mapping from $S_p$ to $S_p$. Furthermore, if $u$ is a function mapping $S_p$ into another space $\mathcal Y$ (say), then $d^*$ is an equivariant estimator of $u(\Sigma)$ if and only if $d^* = u \circ d$ for some equivariant estimator $d$ of $\Sigma$. Let

$$\Theta_p = \{(\mu, \Sigma) : \mu \in R^p, \ \Sigma \in S_p, \ \Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} = \rho^2\}$$

and let $\bar G$ be the group of induced transformations corresponding to $G$ on $\Theta_p$.

Lemma 3.1. $\bar G$ acts transitively on $\Theta_p$.
P

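Proof. It is sufficient to show that there exists a $g = (h, \xi) \in G$ with $h \in H_1$, $\xi \in R^p$ such that

$$(h\mu + \xi, h\Sigma h') = \left(0, \begin{pmatrix}1 & \rho & 0\\ \rho & 1 & 0\\ 0 & 0 & I_{p-2}\end{pmatrix}\right), \qquad (3.23)$$

where $\rho = (\rho^2)^{1/2}$. If $\rho = 0$, i.e., $\Sigma_{12} = 0$, choose $h_{11} = \Sigma_{11}^{-1/2}$, $h_{22} = \Sigma_{22}^{-1/2}$ and $\xi = -h\mu$ to obtain (3.23). If $\rho \ne 0$, choose $h_{11} = \Sigma_{11}^{-1/2}$, $h_{22} = \Gamma\Sigma_{22}^{-1/2}$, where $\Gamma$ is a $(p - 1) \times (p - 1)$ orthogonal matrix such that

$$\Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1/2}\Gamma' = (\rho, 0, \ldots, 0),$$

and $\xi = -h\mu$, to obtain (3.23).

The following theorem gives a characterization of the equivariant estimator $d(S)$ of $\Sigma$.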


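Theorem 3.5. An estimator $d$ of $\Sigma$ is equivariant if and only if it admits the decomposition

$$d(S) = \begin{pmatrix} a_{11}(R)S_{11} & R^{-1}a_{12}(R)S_{12} \\ R^{-1}a_{21}(R)S_{21} & R^{-2}a_{22}(R)S_{21}S_{11}^{-1}S_{12} + C(R)(S_{22} - S_{21}S_{11}^{-1}S_{12}) \end{pmatrix} \qquad (3.24)$$

where $C(R) > 0$ and

$$a(R) = \begin{pmatrix} a_{11}(R) & a_{12}(R) \\ a_{21}(R) & a_{22}(R) \end{pmatrix}$$

is a $2 \times 2$ positive definite matrix. Furthermore,

$$\frac{d_{12}d_{22}^{-1}d_{21}}{d_{11}} = \rho^2 \quad\text{if and only if}\quad \frac{a_{12}^2}{a_{11}\,(a_{22} + (1 - R^2)C(R))} = \rho^2.$$

Note. The $d_{ij}$ are submatrices of $d$ as partitioned in (3.24) and $a_{ij} = a_{ij}(R)$.

Proof. The sufficiency part of the proof is computational. It consists in verifying $d(hSh') = h\,d(S)\,h'$ for all $h \in H_1$, $S \in S_p$ and $d_{11}, d_{12}, d_{22}$ as in (3.24). It can be obtained in a straightforward way from the computations presented in the necessary part. To prove the necessary part we observe that, for

$$S_0 = \begin{pmatrix} 1 & R & 0 \\ R & 1 & 0 \\ 0 & 0 & I_{p-2} \end{pmatrix},$$

$d$ satisfies

$$\begin{pmatrix} I_2 & 0 \\ 0 & \Gamma \end{pmatrix} d(S_0) \begin{pmatrix} I_2 & 0 \\ 0 & \Gamma' \end{pmatrix} = d(S_0)$$

for all $\Gamma \in O(p-2)$. This implies that

$$d(S_0) = \begin{pmatrix} A(R) & 0 \\ 0 & C(R)I_{p-2} \end{pmatrix}$$

with $C(R) > 0$ and $A(R)$ a $2 \times 2$ positive definite matrix.

In general, $S$ has a unique decomposition of the form

$$S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix} = \begin{pmatrix} T_1 & 0 \\ 0 & T_2 \end{pmatrix} \begin{pmatrix} 1 & U' \\ U & I_{p-1} \end{pmatrix} \begin{pmatrix} T_1' & 0 \\ 0 & T_2' \end{pmatrix}$$

with $T_1 \in G_T^+(1)$, $T_2 \in G_T^+(p-1)$ and $U = T_2^{-1}S_{21}T_1^{-1}$. Without any loss of generality we may assume that $U \neq 0$. Corresponding to $U$ there exists a $B \in O(p-1)$ such that $U'B = (R, 0, \dots, 0)$ with $R = \|U\| = (S_{12}S_{22}^{-1}S_{21}/S_{11})^{1/2} > 0$. For $p > 2$, $B$ is not uniquely determined but its first column is $R^{-1}U$. Using such a $B$ we have the decomposition

$$\begin{pmatrix} 1 & U' \\ U & I_{p-1} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & B \end{pmatrix} S_0 \begin{pmatrix} 1 & 0 \\ 0 & B' \end{pmatrix}.$$

Hence, by equivariance,

$$d(S) = \begin{pmatrix} T_1 & 0 \\ 0 & T_2B \end{pmatrix} d(S_0) \begin{pmatrix} T_1' & 0 \\ 0 & B'T_2' \end{pmatrix} = \begin{pmatrix} a_{11}(R)S_{11} & R^{-1}a_{12}(R)S_{12} \\ R^{-1}a_{21}(R)S_{21} & R^{-2}a_{22}(R)S_{21}S_{11}^{-1}S_{12} + C(R)(S_{22} - S_{21}S_{11}^{-1}S_{12}) \end{pmatrix},$$

where $a_{11} = A_{11}$, $a_{12} = A_{12}$, $a_{21} = A_{21}$ and $a_{22} = A_{22} - (1 - R^2)C(R)$,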

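which proves the necessary part of the theorem.

3.3.2. Characterization of equivariant estimators of $\beta$

The following theorem gives a characterization of the equivariant estimator of $\beta$.

Theorem 3.6. If $d^*$ is an equivariant estimator of $\beta$ then $d^*(S)$ has the form

$$d^*(S) = R^{-1}a(R)S_{22}^{-1}S_{21},$$

where $a: R_+ \to R$.

Proof. Define $u: S_p \to R^{p-1}$ by $u(\Sigma) = \Sigma_{22}^{-1}\Sigma_{21} = \beta$. If $d^*$ is equivariant then, with $T_1$, $T_2$, $U$ as in the proof of Theorem 3.5,

$$\begin{aligned} d^*(S) &= (d_{22}(S))^{-1}d_{21}(S) \\ &= \big(R^{-2}a_{22}(R)S_{21}S_{11}^{-1}S_{12} + C(R)(S_{22} - S_{21}S_{11}^{-1}S_{12})\big)^{-1}S_{21}R^{-1}a_{21}(R) \\ &= (T_2')^{-1}\big(R^{-2}a_{22}(R)UU' + C(R)(I_{p-1} - UU')\big)^{-1}U\,T_1R^{-1}a_{21}(R) \\ &= R^{-1}\big(a_{22}(R) + (1 - R^2)C(R)\big)^{-1}a_{21}(R)S_{22}^{-1}S_{21} \\ &= R^{-1}a(R)S_{22}^{-1}S_{21}, \end{aligned}$$

with $a(R) = a_{21}(R)/(a_{22}(R) + (1 - R^2)C(R))$.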
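The key step in the proof of Theorem 3.6 is that $d_{22}$ maps $S_{22}^{-1}S_{21}$ to a scalar multiple of $S_{21}$, so that $d_{22}^{-1}d_{21}$ collapses to a multiple of $S_{22}^{-1}S_{21}$. The Python sketch below checks this identity numerically for the blocks of (3.24). It assumes $p = 3$ (so $S_{22}$ is $2 \times 2$), arbitrary illustrative values of $a_{21}$, $a_{22}$, $C$, and a hypothetical helper name `equivariant_beta_check`; it is a sanity check of the algebra, not part of the original text.

```python
import math


def mat_vec(A, x):
    """Multiply a small dense matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def inv2(A):
    """Closed-form inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def equivariant_beta_check(S, a21, a22, C):
    """Return (d22^{-1} d21, R^{-1} a(R) S22^{-1} S21) for a 3 x 3 matrix S.

    d21 and d22 are the blocks of the equivariant estimator (3.24), and
    a(R) = a21 / (a22 + (1 - R^2) C) as in the proof of Theorem 3.6.
    """
    s11 = S[0][0]
    u = [S[1][0], S[2][0]]                          # S_21
    V = [[S[1][1], S[1][2]], [S[2][1], S[2][2]]]    # S_22
    v_inv_u = mat_vec(inv2(V), u)                   # S_22^{-1} S_21
    R = math.sqrt(sum(x * y for x, y in zip(u, v_inv_u)) / s11)

    d21 = [a21 * x / R for x in u]
    d22 = [[a22 * u[i] * u[j] / (R ** 2 * s11) + C * (V[i][j] - u[i] * u[j] / s11)
            for j in range(2)] for i in range(2)]
    lhs = mat_vec(inv2(d22), d21)                   # d_22^{-1} d_21

    a_of_R = a21 / (a22 + (1 - R ** 2) * C)
    rhs = [a_of_R * x / R for x in v_inv_u]         # R^{-1} a(R) S_22^{-1} S_21
    return lhs, rhs


S = [[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]]
lhs, rhs = equivariant_beta_check(S, a21=0.7, a22=1.5, C=0.8)
print(lhs, rhs)
```

The two printed vectors agree up to rounding for any positive definite $S$ and any $C > 0$, which is exactly the statement that only the scalar $a(R)$ survives in $d^*$.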


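The risk function of an equivariant estimator $d^*$ of $\beta$ is given by

$$\begin{aligned} R(\beta, d^*) &= E_\Sigma(L(\beta, d^*)) \\ &= E_\Sigma\{S_{11}^{-1}(R^{-1}a(R)S_{22}^{-1}S_{21} - \beta)'S_{22}(R^{-1}a(R)S_{22}^{-1}S_{21} - \beta)\} \qquad (3.25) \\ &= E_\Sigma\{a^2(R) - 2R^{-1}a(R)S_{11}^{-1}S_{12}\beta + S_{11}^{-1}\beta'S_{22}\beta\}. \end{aligned}$$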

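Theorem 3.7. The best equivariant estimator of $\beta$ given $\rho$, under the loss function $L$, is given by

$$R^{-1}a^*(R)S_{22}^{-1}S_{21}, \qquad (3.26)$$

where

$$a^*(R) = \rho^2R\;\frac{\displaystyle\sum_{m=0}^\infty \frac{\Gamma(\frac{n+1}{2} + m)\,\Gamma(\frac{n-1}{2} + m)}{m!\,\Gamma(\frac{p+1}{2} + m)}(\rho R)^{2m}}{\displaystyle\sum_{m=0}^\infty \frac{\Gamma^2(\frac{n-1}{2} + m)}{m!\,\Gamma(\frac{p-1}{2} + m)}(\rho R)^{2m}}. \qquad (3.27)$$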

Proof. From (3.25), the minimum of $R(\beta, d^*)$ is attained when

$$a(R) = a^*(R) = E_\Sigma(S_{11}^{-1}S_{12}\beta R^{-1} \mid R).$$

Since the problem is invariant and $d^*$ is equivariant we may assume, without any loss of generality, that

$$\Sigma = C(\rho) = \begin{pmatrix} 1 & \rho & 0 \\ \rho & 1 & 0 \\ 0 & 0 & I_{p-2} \end{pmatrix}.$$

To evaluate $a^*(R)$ we write $S_{22} = TT'$, $T \in G_T^+(p-1)$, $S_{21}S_{11}^{-1/2} = RTW$, $W'W = 1$. The joint probability density function of $(R, W, T)$ is (Giri (1977))

P-i

i=2j=l

1 *
e x p {

~ 2 ( i -

2 P

"

* -

P ^

^ n i=i

fe)

(3.28)

A straightforward computation gives (3.26).
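Formula (3.27) is easy to evaluate numerically. The sketch below (Python; the helper name `a_star` is ours, not the book's) sums both series in log-space with `math.lgamma` and truncates them after a fixed number of terms. Truncation is adequate here because the ratio of successive terms tends to $\rho^2R^2 < 1$. It also shows the small-$\rho R$ behaviour, since the $m = 0$ terms alone give $a^*(R) \approx \rho^2R(n-1)/(p-1)$.

```python
import math


def a_star(r, rho, n, p, terms=400):
    """Evaluate a*(r) of (3.27) by truncating both series after `terms` terms."""
    x = (rho * r) ** 2
    num = den = 0.0
    for m in range(terms):
        # log of (rho r)^{2m}; only the m = 0 term survives when x == 0
        log_pow = m * math.log(x) if x > 0 else (0.0 if m == 0 else -math.inf)
        log_num = (math.lgamma((n + 1) / 2 + m) + math.lgamma((n - 1) / 2 + m)
                   - math.lgamma(m + 1) - math.lgamma((p + 1) / 2 + m))
        log_den = (2 * math.lgamma((n - 1) / 2 + m)
                   - math.lgamma(m + 1) - math.lgamma((p - 1) / 2 + m))
        num += math.exp(log_num + log_pow)
        den += math.exp(log_den + log_pow)
    return rho ** 2 * r * num / den


n, p, rho, r = 20, 5, 0.5, 0.01
print(a_star(r, rho, n, p) / r, rho ** 2 * (n - 1) / (p - 1))
```

When $n = p$ the two series coincide term by term, so the sketch returns exactly $\rho^2 r$; this gives a convenient check of the implementation.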


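Lemma 3.2. Let $\alpha > -1$, $\beta > 0$, $0 < \gamma < 1$, and let $m$ be a nonnegative integer. Then

$$\sum_{j=0}^\infty \frac{\Gamma(\alpha + m + j + 1)\,\Gamma(\beta + j)}{\Gamma(\alpha + j + 1)\,j!}\gamma^j = \Gamma(\beta)\,\gamma^{-\alpha}\frac{d^m}{d\gamma^m}\big[\gamma^{\alpha+m}(1 - \gamma)^{-\beta}\big].$$

Proof. Since $\sum_{j=0}^\infty \frac{\Gamma(\beta + j)}{j!}\gamma^j = \Gamma(\beta)(1 - \gamma)^{-\beta}$ and $\frac{d^m}{d\gamma^m}\gamma^{\alpha+m+j} = \frac{\Gamma(\alpha + m + j + 1)}{\Gamma(\alpha + j + 1)}\gamma^{\alpha+j}$, the result follows by differentiating $\Gamma(\beta)\gamma^{\alpha+m}(1 - \gamma)^{-\beta} = \sum_{j=0}^\infty \frac{\Gamma(\beta + j)}{j!}\gamma^{\alpha+m+j}$ term by term $m$ times.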

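Theorem 3.8. If $m = \frac12(n - p)$ is an integer then

$$a^*(R) = \frac1R\,\frac{\dfrac{d^m}{d\gamma^m}\big[\gamma^{(n-1)/2}(1 - \gamma)^{-(n-1)/2}\big]}{\dfrac{d^m}{d\gamma^m}\big[\gamma^{(n-3)/2}(1 - \gamma)^{-(n-1)/2}\big]}\Bigg|_{\gamma = \rho^2R^2},$$

which, by Leibniz's rule, is a ratio of finite sums.

Proof. Follows trivially from Lemma 3.2.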


Note. The mle of $\beta$ is $S_{22}^{-1}S_{21}$. If $\rho$ is such that terms of order $(R\rho)^2$ and higher can be neglected, the best equivariant estimator of $\beta$ is approximately equal to $\rho^2(n-1)(p-1)^{-1}S_{22}^{-1}S_{21}$.
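When $m = (n-p)/2$ is an integer, Theorem 3.8 replaces the infinite series by $m$-th derivatives, which Leibniz's rule turns into finite sums. The Python sketch below is our own illustration, and its function names are hypothetical. It expands $\frac{d^m}{d\gamma^m}[\gamma^A(1-\gamma)^{-B}]$ by Leibniz's rule and compares the resulting closed form with a direct truncation of the series (3.27). It assumes $r > 0$ and $\rho > 0$.

```python
import math


def falling(a, k):
    """a (a - 1) ... (a - k + 1)."""
    out = 1.0
    for j in range(k):
        out *= a - j
    return out


def rising(a, k):
    """a (a + 1) ... (a + k - 1)."""
    out = 1.0
    for j in range(k):
        out *= a + j
    return out


def dm_power_product(m, A, B, g):
    """m-th derivative of g**A * (1 - g)**(-B), expanded by Leibniz's rule."""
    return sum(math.comb(m, i) * falling(A, i) * g ** (A - i)
               * rising(B, m - i) * (1 - g) ** (-B - (m - i))
               for i in range(m + 1))


def a_star_finite(r, rho, n, p):
    """Closed form of Theorem 3.8; requires (n - p)/2 to be a nonnegative integer."""
    if n < p or (n - p) % 2:
        raise ValueError("(n - p)/2 must be a nonnegative integer")
    m = (n - p) // 2
    g = (rho * r) ** 2
    num = dm_power_product(m, (n - 1) / 2, (n - 1) / 2, g)
    den = dm_power_product(m, (n - 3) / 2, (n - 1) / 2, g)
    return num / (r * den)


def a_star_series(r, rho, n, p, terms=400):
    """Direct truncation of the series (3.27), for comparison (needs r, rho > 0)."""
    x = (rho * r) ** 2
    num = den = 0.0
    for m in range(terms):
        log_pow = m * math.log(x)
        num += math.exp(math.lgamma((n + 1) / 2 + m) + math.lgamma((n - 1) / 2 + m)
                        - math.lgamma(m + 1) - math.lgamma((p + 1) / 2 + m) + log_pow)
        den += math.exp(2 * math.lgamma((n - 1) / 2 + m)
                        - math.lgamma(m + 1) - math.lgamma((p - 1) / 2 + m) + log_pow)
    return rho ** 2 * r * num / den


for n, p in [(7, 3), (11, 5), (9, 9)]:
    print(n, p, a_star_finite(0.6, 0.7, n, p), a_star_series(0.6, 0.7, n, p))
```

For example, with $n = 7$, $p = 5$ (so $m = 1$) both routes reduce to $3\rho^2 r/(2 + \rho^2 r^2)$.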


Exercises

1. Let $X^\alpha = (X_{\alpha1}, \dots, X_{\alpha p})'$, $\alpha = 1, \dots, N\ (> p)$ be independently and identically distributed $p$-variate normal random vectors with common mean $\mu$ and positive definite common covariance matrix $\Sigma$, and let $N\bar X = \sum_{\alpha=1}^N X^\alpha$, $S = \sum_{\alpha=1}^N (X^\alpha - \bar X)(X^\alpha - \bar X)'$. Show that if $\delta = N\mu'\Sigma^{-1}\mu$ is known, then $N\bar X'(S + N\bar X\bar X')^{-1}\bar X$ is an ancillary statistic.

2. Let $X^\alpha = (X_{\alpha1}, \dots, X_{\alpha p})'$, $\alpha = 1, \dots, N\ (> p)$ be independently and identically distributed $p$-variate normal random vectors with mean $0$ and positive definite covariance matrix $\Sigma$, and let

$$\Sigma = \begin{pmatrix} \Sigma_{(11)} & \Sigma_{(12)} \\ \Sigma_{(21)} & \Sigma_{(22)} \end{pmatrix}$$

with $\Sigma_{(11)}$: $1 \times 1$. Given the value of $\rho^2 = \Sigma_{(12)}\Sigma_{(22)}^{-1}\Sigma_{(21)}/\Sigma_{(11)}$, find the ancillary statistic.

3. (Basu (1955)). If $T$ is a boundedly complete sufficient statistic for the family of distributions $\{P_\theta, \theta \in \Omega\}$, then any ancillary statistic is independent of $T$.

4. Find the conditions under which the maximum likelihood estimator is equivariant.

References

1. S. Amari, Differential geometry of curved exponential families: curvatures and information loss, Ann. Statist. 10, 357-385 (1982a).
2. S. Amari, Geometrical theory of asymptotic ancillarity and conditional inference, Biometrika 69, 1-17 (1982b).
3. D. Basu, On statistics independent of sufficient statistic, Sankhya 20, 223-226 (1955).
4. D. R. Cox and D. V. Hinkley, Theoretical Statistics, Chapman and Hall, London (1977).
5. B. Efron, The geometry of exponential families, Ann. Statist. 6, 362-376 (1978).
6. B. Efron and C. Morris, Stein's estimation rule and its competitors: an empirical Bayes approach, J. Amer. Statist. Assoc. 68, 117-130 (1973).
7. B. Efron and C. Morris, Stein's paradox in statistics, Sci. Amer. 239, 119-127 (1977).
8. R. A. Fisher, Theory of statistical estimation, Proc. Cambridge Phil. Soc. 22, 700-725 (1925).
9. N. Giri, Multivariate Statistical Inference, Academic Press, N.Y. (1977).
10. D. V. Hinkley, Conditional inference about a normal mean with known coefficient of variation, Biometrika 64, 105-108 (1977).
11. T. Kariya, N. Giri and F. Perron, Equivariant estimation of a mean vector $\mu$ of $N_p(\mu, \Sigma)$ with $\mu'\Sigma^{-1}\mu = 1$ or $\Sigma^{-1/2}\mu = c$ or $\Sigma = \sigma^2(\mu'\mu)I$, J. Multivariate Anal. 27, 270-283 (1988).
12. J. Kent, J. Briden and K. Mardia, Linear and planar structure in ordered multivariate data as applied to progressive demagnetization of palaeomagnetic remanence, Geophys. J. Roy. Astronom. Soc. 75, 593-662 (1983).
13. F. Perron and N. Giri, On the best equivariant estimator of mean of a multivariate normal population, J. Multivariate Anal. 32, 1-16 (1990).
14. F. Perron and N. Giri, Best equivariant estimation in curved covariance models, J. Multivariate Anal. 40, 46-55 (1992).

Chapter 4. SOME BEST INVARIANT TESTS IN MULTINORMALS

4.0. Introduction

In Chapter 3 we have dealt with some applications of invariance in statistical estimation. We discuss in this chapter various testing problems concerning means of multivariate normal distributions. Testing problems concerning discriminant coefficients, as they are somewhat related to mean problems, will also be considered. This chapter will also include testing problems concerning the multiple correlation coefficient and a related problem concerning multiple correlation with partial information. We will be concerned with invariant tests only, and we will take a different approach to derive tests for these problems. Rather than deriving the likelihood ratio tests and studying their optimum properties, we will look for a group under which the testing problem remains invariant and then find tests based on the maximal invariant under the group. A justification of this approach is as follows: if a testing problem is invariant under a group, the likelihood ratio test, under a mild regularity condition, depends on the observations only through the maximal invariant in the sample space under the group (Lehmann, 1959, p. 252). If we then find the optimum invariant test using the above approach, the likelihood ratio test can be no better, since it is an invariant test.

4.1. Tests of Mean Vector

Let $X = (X_1, \dots, X_p)'$ be normally distributed with mean $E(X) = \mu = (\mu_1, \dots, \mu_p)'$ and positive definite covariance matrix $\Sigma = E(X - \mu)(X - \mu)'$. Its pdf is given by


$$f_X(x) = (2\pi)^{-p/2}(\det\Sigma)^{-1/2}\exp\{-\tfrac12\operatorname{tr}\Sigma^{-1}(x - \mu)(x - \mu)'\}. \qquad (4.1)$$

In what follows we denote by $\mathcal X$ a $p$-dimensional linear vector space, and by $\mathcal X'$ the dual space of $\mathcal X$. The uniqueness of the normal distribution follows from the following two facts: (a) the distribution of $X$ is completely determined by the family of distributions of $\theta'X$, $\theta \in \mathcal X'$; (b) the mean and the variance of a normal variable completely determine its distribution. For relevant results on univariate and multivariate normal distributions we refer to Giri (1996). We shall denote a $p$-variate normal distribution with pdf (4.1) by $N_p(\mu, \Sigma)$.

On the basis of observations $x^\alpha = (x_{\alpha1}, \dots, x_{\alpha p})'$, $\alpha = 1, \dots, N$ from $N_p(\mu, \Sigma)$, we are interested in the following testing problems.

Problem 1. To test the null hypothesis $H_0: \mu = 0$ against the alternatives $H_1: \mu \neq 0$ when $\mu$, $\Sigma$ are unknown. This is known as Hotelling's $T^2$ problem.

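Problem 2. To test the null hypothesis $H_0: \mu_1 = \dots = \mu_{p_1} = 0$ against the alternatives $H_1: (\mu_1, \dots, \mu_{p_1})' \neq 0$ when $\mu$, $\Sigma$ are unknown and $p > p_1$.

Problem 3. To test the null hypothesis $H_0: \mu_1 = \dots = \mu_{p_1+p_2} = 0$ against the alternatives $H_1: \mu_1 = \dots = \mu_{p_1} = 0$ when $\mu$, $\Sigma$ are unknown and $p_1 + p_2 < p$.

Let $N\bar X = \sum_{\alpha=1}^N X^\alpha$, $S = \sum_{\alpha=1}^N (X^\alpha - \bar X)(X^\alpha - \bar X)'$. Then $(\bar X, S)$ is (minimal) sufficient for $(\mu, \Sigma)$, $\sqrt N\bar X$ is distributed independently of $S$ as $N_p(\sqrt N\mu, \Sigma)$, and $S$ has the Wishart distribution $W_p(n, \Sigma)$ with $n = N - 1$ degrees of freedom and parameter $\Sigma$. The pdf of $S$ is given by

$$f_{W_p(n,\Sigma)}(s) = \begin{cases} K(\det\Sigma)^{-n/2}(\det s)^{(n-p-1)/2}\exp\{-\tfrac12\operatorname{tr}\Sigma^{-1}s\} & \text{if } s \text{ is positive definite}, \\ 0 & \text{otherwise}, \end{cases} \qquad (4.2)$$

where $K^{-1} = 2^{np/2}\pi^{p(p-1)/4}\prod_{i=1}^p\Gamma(\tfrac12(n - i + 1))$.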

Problem 1 remains invariant under the general linear group $G_l(p)$ transforming $(\bar X, S) \to (g\bar X, gSg')$, $g \in G_l(p)$. From Example 2.4.3 with $d_1 = p$,

$$R = N\bar X'(S + N\bar X\bar X')^{-1}\bar X = \frac{N\bar X'S^{-1}\bar X}{1 + N\bar X'S^{-1}\bar X} \qquad (4.3)$$

is a maximal invariant in the space of $(\bar X, S)$ under $G_l(p)$, and $0 \le R \le 1$.
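The two expressions in (4.3) agree by the Sherman-Morrison identity, and $R$ is unchanged when every observation is replaced by $gX^\alpha$. The Python sketch below checks both facts on a small artificial data set. The data, the matrix `g` and the helper names are our own illustrative choices, and the tiny Gaussian-elimination solver stands in for a proper linear-algebra library.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def mean_and_S(data):
    """Sample mean and sum-of-products matrix S about the mean."""
    N, p = len(data), len(data[0])
    xbar = [sum(x[j] for x in data) / N for j in range(p)]
    S = [[sum((x[i] - xbar[i]) * (x[j] - xbar[j]) for x in data) for j in range(p)]
         for i in range(p)]
    return xbar, S


def R_two_ways(data):
    """Both forms of R in (4.3)."""
    N = len(data)
    xbar, S = mean_and_S(data)
    p = len(xbar)
    A = [[S[i][j] + N * xbar[i] * xbar[j] for j in range(p)] for i in range(p)]
    r_direct = N * dot(xbar, solve(A, xbar))    # N Xbar'(S + N Xbar Xbar')^{-1} Xbar
    q = N * dot(xbar, solve(S, xbar))           # N Xbar' S^{-1} Xbar
    return r_direct, q / (1 + q)


data = [[1.0, 2.0, 0.5], [0.3, -1.0, 1.2], [2.2, 0.4, -0.7],
        [-0.5, 1.5, 0.9], [1.7, -0.3, 0.1], [0.9, 0.8, 2.0]]
g = [[2.0, 0.0, 1.0], [1.0, 3.0, 0.0], [0.0, -1.0, 1.0]]   # any nonsingular g
g_data = [[dot(row, x) for row in g] for x in data]
print(R_two_ways(data), R_two_ways(g_data))
```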


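The likelihood ratio test of $H_0$ depends on the statistic (Giri (1996))

$$T^2 = N(N - 1)\bar X'S^{-1}\bar X, \qquad (4.4)$$

which is known as Hotelling's $T^2$ statistic. Since $R$ is a one-to-one function of $T^2$, we will use the test based on $R$. From Theorem 2.8.1 the pdf of $R$ depends only on $\delta = N\mu'\Sigma^{-1}\mu$ and is given by

$$f_R(r \mid \delta) = \begin{cases} \dfrac{\Gamma(\frac N2)}{\Gamma(\frac p2)\Gamma(\frac{N-p}{2})}r^{p/2-1}(1 - r)^{(N-p)/2-1}\exp\{-\tfrac12\delta\}\,\phi(\tfrac12N, \tfrac12p; \tfrac12r\delta) & \text{if } 0 < r < 1, \\ 0 & \text{otherwise}, \end{cases} \qquad (4.5)$$

where $\phi(a, b; x) = \sum_{j=0}^\infty \frac{(a)_j}{(b)_j}\frac{x^j}{j!}$ is the confluent hypergeometric function and $(a)_j = a(a+1)\cdots(a+j-1)$. It may be verified that $\delta$ is a maximal invariant under the induced group $\bar G_l(p) = G_l(p)$ in the parametric space of $(\mu, \Sigma)$. Since $\Sigma$ is positive definite, $\delta = 0$ if and only if $\mu = 0$, and $\delta > 0$ for $\mu \neq 0$. For invariant tests with respect to $G_l(p)$, Problem 1 reduces to testing $H_0: \delta = 0$ against the alternatives $H_1: \delta > 0$. From (4.5) the pdf of $R$ under $H_0$ is given by

$$f_R(r \mid 0) = \begin{cases} \dfrac{\Gamma(\frac N2)}{\Gamma(\frac p2)\Gamma(\frac{N-p}{2})}r^{p/2-1}(1 - r)^{(N-p)/2-1} & \text{if } 0 < r < 1, \\ 0 & \text{otherwise}, \end{cases} \qquad (4.6)$$

which is a central beta with parameters $(\frac p2, \frac{N-p}{2})$.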
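The density (4.5) can be checked numerically. The Python sketch below is our own illustration; the names `phi`, `f_R` and `integrate` are hypothetical, and $N = 12$, $p = 4$ are chosen so that the integrand is smooth on $(0, 1)$. It sums $\phi(a, b; x)$ by its series, verifies that (4.5) integrates to one, confirms that $\delta = 0$ gives the central beta (4.6), and shows that the likelihood ratio $f_R(r \mid \delta)/f_R(r \mid 0)$ increases in $r$. That monotone likelihood ratio is the property used for Theorem 4.1.1.

```python
import math


def phi(a, b, x, terms=200):
    """Confluent hypergeometric series phi(a, b; x) = sum (a)_j/(b)_j x^j/j!."""
    term, total = 1.0, 1.0
    for j in range(terms):
        term *= (a + j) / (b + j) * x / (j + 1)
        total += term
    return total


def f_R(r, delta, N, p):
    """Density (4.5) of R = N Xbar'(S + N Xbar Xbar')^{-1} Xbar."""
    if not 0.0 < r < 1.0:
        return 0.0
    log_const = math.lgamma(N / 2) - math.lgamma(p / 2) - math.lgamma((N - p) / 2)
    log_kernel = (p / 2 - 1) * math.log(r) + ((N - p) / 2 - 1) * math.log(1 - r)
    return math.exp(log_const + log_kernel - delta / 2) * phi(N / 2, p / 2, r * delta / 2)


def integrate(f, n_grid=4000):
    """Midpoint rule on (0, 1)."""
    h = 1.0 / n_grid
    return h * sum(f((k + 0.5) * h) for k in range(n_grid))


N, p = 12, 4
for delta in (0.0, 3.0):
    print(delta, integrate(lambda r: f_R(r, delta, N, p)))
```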

From (4.5) the distribution of $R$ has monotone likelihood ratio in $\delta$ (see Lehmann, 1959). Thus the test which rejects $H_0$ for large values of $R$ is uniformly most powerful invariant (UMPI) under the full linear group for testing $H_0$ against $H_1$. Since $R \ge$ constant implies that $T^2 \ge$ constant, we get the following theorem.

Theorem 4.1.1. For Problem 1, Hotelling's $T^2$ test which rejects $H_0$ whenever $T^2 \ge C$, the constant $C$ depending on the level $\alpha$ of the test, is UMPI for testing $H_0$ against $H_1$.

The group $G_l(p)$ induces on the space of $(\bar X, S)$ the transformations $(\bar X, S) \to (g\bar X, gSg')$, $g \in G_l(p)$. It is easy to conclude that any statistical test $\phi(\bar X, S)$, a function of $(\bar X, S)$ whose power function depends on $(\mu, \Sigma)$ only through $\delta$, is almost invariant under $G_l(p)$. Since $\delta$ is unchanged by $(\mu, \Sigma) \to (g\mu, g\Sigma g')$,

$$E_{\mu,\Sigma}\phi(\bar X, S) = E_{g\mu,\,g\Sigma g'}\phi(\bar X, S) = E_{\mu,\Sigma}\phi(g\bar X, gSg'),$$

and thus $E_{\mu,\Sigma}(\phi(\bar X, S) - \phi(g\bar X, gSg')) = 0$ for all $(\mu, \Sigma)$. Using the fact that the family of distributions of $(\bar X, S)$ is complete, we conclude that $\phi(\bar X, S) = \phi(g\bar X, gSg')$ for all $g \in G_l(p)$, except possibly for a set of measure zero. As there exists a left invariant measure (Example 2.1.3) on $G_l(p)$ which is also right invariant, any almost invariant function is invariant (Lehmann, 1959). Hence we obtain the following theorem for Problem 1.

Theorem 4.1.2. Among all tests of $H_0: \mu = 0$ with power function depending only on $\delta$, Hotelling's $T^2$ test is UMP.

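Problem 2. Let $T_1$ be the group of translations such that $t_1 \in T_1$ translates the last $p - p_1$ components of each $X^\alpha$, $\alpha = 1, \dots, N$, and let $G_1$ be the subgroup of $G_l(p)$ such that $g \in G_1$ has the form

$$g = \begin{pmatrix} g_{(11)} & 0 \\ g_{(21)} & g_{(22)} \end{pmatrix},$$

where $g_{(11)}$ is the $p_1 \times p_1$ upper left-hand corner submatrix of $g$. This problem is invariant under the affine group $(G_1, T_1)$ such that, for $g \in G_1$, $t_1 \in T_1$, $(g, t_1)$ transforms $X^\alpha \to gX^\alpha + t_1$, $\alpha = 1, \dots, N$. A maximal invariant in the space of $(\bar X, S)$ under $(G_1, T_1)$ is $R_1$, where

$$R_1 = N\bar X_{(1)}'(S_{(11)} + N\bar X_{(1)}\bar X_{(1)}')^{-1}\bar X_{(1)}, \qquad (4.7)$$

as defined in Example 2.4.3 with $d_1 = p_1$. A corresponding maximal invariant in the parametric space of $(\mu, \Sigma)$ under $(G_1, T_1)$ is $\delta_1 = N\mu_{(1)}'\Sigma_{(11)}^{-1}\mu_{(1)}$. For invariant tests this problem reduces to testing $H_0: \delta_1 = 0$ against the alternatives $H_1: \delta_1 > 0$. From Theorem 2.8.1 the pdf of $R_1$ is given by

$$f_{R_1}(r_1 \mid \delta_1) = \begin{cases} \dfrac{\Gamma(\frac N2)}{\Gamma(\frac{p_1}{2})\Gamma(\frac{N-p_1}{2})}r_1^{p_1/2-1}(1 - r_1)^{(N-p_1)/2-1}\exp\{-\tfrac12\delta_1\}\,\phi(\tfrac12N, \tfrac12p_1; \tfrac12r_1\delta_1) & \text{if } 0 < r_1 < 1, \\ 0 & \text{otherwise}. \end{cases} \qquad (4.8)$$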


From (4.8) it follows that the distribution of $R_1$ possesses a monotone likelihood ratio in $\delta_1$. Hence we get the following theorem.

Theorem 4.1.3. For Problem 2 the test which rejects $H_0$ for large values of $R_1$ is UMPI under $(G_1, T_1)$ for testing $H_0$ against $H_1$.

Note: As in Problem 1 this UMPI test is also the likelihood ratio test for Problem 2. Using the arguments of Theorem 4.1.2 we can prove from Theorem 4.1.3 that, among all tests of $H_0$ with power function depending on $\delta_1$, the test which rejects $H_0$ for large values of $R_1$ is UMP for Problem 2.

Problem 3. Let $T_2$ be the translation group which translates the last $p - p_1 - p_2$ components of each $X^\alpha$, $\alpha = 1, \dots, N$, and let $G_2$ be the subgroup of $G_l(p)$ such that $g \in G_2$ has the form

$$g = \begin{pmatrix} g_{(11)} & 0 & 0 \\ g_{(21)} & g_{(22)} & 0 \\ g_{(31)} & g_{(32)} & g_{(33)} \end{pmatrix},$$

where $g_{(11)}$ is $p_1 \times p_1$ and $g_{(22)}$ is $p_2 \times p_2$. This problem is invariant under the affine group $(G_2, T_2)$ transforming $X^\alpha \to gX^\alpha + t_2$, $\alpha = 1, \dots, N$, where $g \in G_2$ and $t_2 \in T_2$. A maximal invariant in the space of $(\bar X, S)$ under $(G_2, T_2)$ is $(R_1, R_2)$, where

$$R_1 = N\bar X_{(1)}'(S_{(11)} + N\bar X_{(1)}\bar X_{(1)}')^{-1}\bar X_{(1)},\qquad R_1 + R_2 = N\bar X_{[2]}'(S_{[22]} + N\bar X_{[2]}\bar X_{[2]}')^{-1}\bar X_{[2]},$$

as defined in Example 2.4.3 with $d_1 = p_1$, $d_2 = p_2$. A corresponding maximal invariant in the parametric space of $(\mu, \Sigma)$ is $(\delta_1, \delta_2)$, defined by

$$\delta_1 = N\mu_{(1)}'\Sigma_{(11)}^{-1}\mu_{(1)},\qquad \delta_1 + \delta_2 = N\mu_{[2]}'\Sigma_{[22]}^{-1}\mu_{[2]}.$$

Under the hypothesis $H_0$, $\delta_1 = 0$, $\delta_2 = 0$, and under the alternatives $H_1$, $\delta_1 = 0$, $\delta_2 > 0$. From Theorem 2.8.1 the joint pdf of $R_1, R_2$ is given by


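$$f(r_1, r_2) = \frac{\Gamma(\frac N2)}{\Gamma(\frac{p_1}{2})\Gamma(\frac{p_2}{2})\Gamma(\frac12(N - p_1 - p_2))}r_1^{p_1/2-1}r_2^{p_2/2-1}(1 - r_1 - r_2)^{(N-p_1-p_2)/2-1}\exp\{-\tfrac12(\delta_1 + \delta_2) + \tfrac12r_1\delta_2\}\,\phi(\tfrac12N, \tfrac12p_1; \tfrac12r_1\delta_1)\,\phi(\tfrac12(N - p_1), \tfrac12p_2; \tfrac12r_2\delta_2), \qquad (4.9)$$

for $r_1 > 0$, $r_2 > 0$, $r_1 + r_2 < 1$.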

From Giri (1977) the likelihood ratio test of this problem rejects $H_0$ whenever

$$Z = \frac{1 - R_1 - R_2}{1 - R_1} \le c, \qquad (4.10)$$

where the constant $c$ depends on the level $\alpha$ of the test and, under $H_0$, $Z$ is distributed as central beta with parameters $(\frac12(N - p_1 - p_2), \frac12p_2)$.

From (4.9) it follows that the likelihood ratio test is not UMPI. However, for fixed $p$, the likelihood ratio test is approximately optimum as the sample size $N$ becomes large (Wald, 1943). Thus if $p$ is not large, it seems likely that the sample sizes commonly occurring in practice will be large enough for this result to hold. However, if the dimension $p$ is large, the sample size $N$ may need to be extremely large for this result to apply. We shall now show that the likelihood ratio test is not locally best invariant (LBI) as $\delta_2 \to 0$. The LBI test rejects $H_0$ whenever

$$R_1 + \frac{N - p_1}{p_2}R_2 \ge c, \qquad (4.11)$$

where the constant $c$ depends on the level $\alpha$ of the test. Hotelling's $T^2$ test, which rejects $H_0$ whenever $R_1 + R_2 \ge c$, the constant $c$ depending on the level $\alpha$ of the test, does not coincide with the LBI test, and hence it is locally worse for Problem 3.

Definition 4.1.1. (LBI test). For testing $H_0: \theta \in \Omega_{H_0}$, an invariant test $\phi^*$ of level $\alpha$ is LBI if there exists an open neighborhood $\Omega_1$ of $\Omega_{H_0}$ such that

$$E_\theta(\phi^*) \ge E_\theta(\phi),\qquad \theta \in \Omega_1 - \Omega_{H_0}, \qquad (4.12)$$

where $\phi$ is any other invariant test of level $\alpha$.

Theorem 4.1.4. For testing $H_0: \delta_1 = 0, \delta_2 = 0$ against $H_1: \delta_1 = 0, \delta_2 > 0$, the test which rejects $H_0$ whenever $R_1 + \frac{N - p_1}{p_2}R_2 \ge c$, where the constant $c$ depends on the level $\alpha$ of the test, is LBI as $\delta_2 \to 0$.


Proof. Since $(R_1, R_2)$ is a maximal invariant under the affine group $(G_2, T_2)$, the ratio of densities of $(R_1, R_2)$ under $H_1$ to that under $H_0$ is given by

$$\frac{f(r_1, r_2 \mid \delta_2)}{f(r_1, r_2 \mid 0)} = 1 + \frac{\delta_2}{2}\Big(-1 + r_1 + \frac{N - p_1}{p_2}r_2\Big) + o(\delta_2)$$

as $\delta_2 \to 0$. Hence the power of any invariant test $\phi$ is given by

$$E_{\delta_2}(\phi) = \alpha + \frac{\delta_2}{2}E_0\Big[\phi\Big(-1 + R_1 + \frac{N - p_1}{p_2}R_2\Big)\Big] + o(\delta_2)$$

as $\delta_2 \to 0$, which is maximized by taking $\phi = 1$ whenever $R_1 + \frac{N - p_1}{p_2}R_2 \ge c$.

Asymptotically Best Invariant (ABI) Test. Let $\phi$ be an invariant test for testing $H_0: \delta_1 = 0, \delta_2 = 0$ against $H_1: \delta_1 = 0, \delta_2 = \lambda$, with associated critical region $R = \{(r_1, r_2): U(r_1, r_2) \ge c\}$ satisfying

$$P_{H_0}(R) = \alpha;\qquad P_{H_1}(R) = 1 - \exp\{-H(\lambda)(1 + o(1))\};$$

$$\frac{f(r_1, r_2; \lambda)}{f(r_1, r_2; 0)} = \exp\{H(\lambda)G(\lambda)U(r_1, r_2) + B(r_1, r_2; \lambda)\}, \qquad (4.12)$$

where $B(r_1, r_2; \lambda) = o(H(\lambda))$ as $\lambda \to \infty$ and $0 < c_1 < G(\lambda) < c_2 < \infty$. Then $\phi$ is an ABI test for testing $H_0$ against $H_1$ as $\lambda \to \infty$.
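The local expansion used in the proof of Theorem 4.1.4 can be checked numerically from (4.9) with $\delta_1 = 0$, where the Gamma constants cancel in the ratio. The Python sketch below is our own illustration; the helper names and the values $N = 20$, $p_1 = 2$, $p_2 = 3$ are assumptions. It compares a finite-difference slope of the density ratio at $\delta_2 = 0$ with $\frac12(-1 + r_1 + \frac{N - p_1}{p_2}r_2)$. It also shows two sample points with the same Hotelling statistic $R_1 + R_2$ that the LBI statistic ranks differently.

```python
import math


def phi(a, b, x, terms=200):
    """Confluent hypergeometric series phi(a, b; x)."""
    term, total = 1.0, 1.0
    for j in range(terms):
        term *= (a + j) / (b + j) * x / (j + 1)
        total += term
    return total


def density_ratio(r1, r2, delta2, N, p1, p2):
    """f(r1, r2 | delta2) / f(r1, r2 | 0) from (4.9) with delta1 = 0."""
    return math.exp(-delta2 * (1 - r1) / 2) * phi((N - p1) / 2, p2 / 2, r2 * delta2 / 2)


def lbi_statistic(r1, r2, N, p1, p2):
    """Left-hand side of (4.11)."""
    return r1 + (N - p1) / p2 * r2


N, p1, p2 = 20, 2, 3
r1, r2, eps = 0.2, 0.3, 1e-6
slope = (density_ratio(r1, r2, eps, N, p1, p2) - 1) / eps
predicted = 0.5 * (-1 + lbi_statistic(r1, r2, N, p1, p2))
print(slope, predicted)

# Same R1 + R2 = 0.5, different LBI statistics:
print(lbi_statistic(0.4, 0.1, N, p1, p2), lbi_statistic(0.1, 0.4, N, p1, p2))
```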

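Theorem 4.1.5. For testing $H_0: \delta_1 = \delta_2 = 0$ against $H_1: \delta_1 = 0, \delta_2 = \lambda > 0$, Hotelling's $T^2$ test, which rejects $H_0$ whenever $R_1 + R_2 \ge c$, the constant $c$ depending on the level $\alpha$ of the test, is ABI as $\lambda \to \infty$.

Proof. Since

$$\phi(a, b; x) = 1 + \frac abx + \frac{a(a+1)}{b(b+1)}\frac{x^2}{2!} + \cdots = \exp\{x(1 + o(1))\}\quad\text{as } x \to \infty,$$

we get from (4.9)

$$\frac{f(r_1, r_2; \lambda)}{f(r_1, r_2; 0)} = \exp\Big\{\frac\lambda2(r_1 + r_2 - 1)(1 + B(r_1, r_2; \lambda))\Big\}, \qquad (4.13)$$

where $B(r_1, r_2; \lambda) = o(1)$ as $\lambda \to \infty$. It follows from (4.5), with $p$ replaced by $p_1 + p_2$ and $\delta$ by $\lambda$, that the distribution of $U = R_1 + R_2$ satisfies

$$P_{H_1}(U(R_1, R_2) = R_1 + R_2 < c) = \exp\Big\{\frac\lambda2(c - 1)(1 + o(1))\Big\}. \qquad (4.14)$$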

Thus, from (4.12) with $H(\lambda) = \frac\lambda2(1 - c)$, we conclude that Hotelling's $T^2$ test is ABI.

4.2. The Classification Problem (Two Populations)

Given two different $p$-dimensional populations $P_1, P_2$ characterized by the probability density functions $p_1, p_2$ respectively, and an observation $x = (x_1, \dots, x_p)'$, the classification problem deals with classifying it to one of the two populations. We assume for certainty that it belongs to one of the two populations. Denote by $a_i$, $i = 1, 2$, the action that $x$ comes from $P_i$, and let us denote, for simplicity, the states of nature $P_i$ simply by $i$. Let $L(i, a_j)$ be the loss incurred by taking the action $a_j$ when $i$ is the true state of nature, having the property that

$$L(i, a_i) = 0,\qquad L(i, a_j) = c_j > 0,\quad j \neq i = 1, 2.$$

Let $(\pi, 1 - \pi)$ be the a priori probability distribution on the states of nature. The posterior probability distribution $(\xi, 1 - \xi)$, given the observation $x$, is given by

$$\xi = \frac{\pi p_1(x)}{\pi p_1(x) + (1 - \pi)p_2(x)}. \qquad (4.15)$$

The posterior expected loss of taking action $a_1$ is given by

$$\frac{(1 - \pi)c_1p_2(x)}{\pi p_1(x) + (1 - \pi)p_2(x)} \qquad (4.16)$$

and that of taking action $a_2$ is given by

$$\frac{\pi c_2p_1(x)}{\pi p_1(x) + (1 - \pi)p_2(x)}. \qquad (4.17)$$

From (4.16)-(4.17) it follows that we take action $a_2$ if

$$\frac{\pi c_2p_1(x)}{\pi p_1(x) + (1 - \pi)p_2(x)} < \frac{(1 - \pi)c_1p_2(x)}{\pi p_1(x) + (1 - \pi)p_2(x)}, \qquad (4.18)$$

and action $a_1$ if

$$\frac{\pi c_2p_1(x)}{\pi p_1(x) + (1 - \pi)p_2(x)} > \frac{(1 - \pi)c_1p_2(x)}{\pi p_1(x) + (1 - \pi)p_2(x)}, \qquad (4.19)$$

and randomize when equality holds.


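(4.18) and (4.19) can be simplified as follows: take action $a_1$ if

$$\frac{p_1(x)}{p_2(x)} > \frac{(1 - \pi)c_1}{\pi c_2} = K\ \text{(say)}, \qquad (4.20)$$

take action $a_2$ if $p_1(x)/p_2(x) < K$, and randomize when $p_1(x)/p_2(x) = K$.

Let us now specialize to the case of two $p$-variate normal populations with different means but the same covariance matrix. Assume $P_1: N_p(\xi, \Sigma)$, $P_2: N_p(\mu, \Sigma)$, where $\xi = (\xi_1, \dots, \xi_p)'$, $\mu = (\mu_1, \dots, \mu_p)' \in E^p$ and $\Sigma$ is positive definite. Hence

$$\frac{p_1(x)}{p_2(x)} = \frac{\exp\{-\frac12(x - \xi)'\Sigma^{-1}(x - \xi)\}}{\exp\{-\frac12(x - \mu)'\Sigma^{-1}(x - \mu)\}} = \exp\{x'\Sigma^{-1}(\xi - \mu) - \tfrac12(\xi'\Sigma^{-1}\xi - \mu'\Sigma^{-1}\mu)\}.$$

Using (4.20) we take action $a_1$ if

$$x'\Sigma^{-1}(\xi - \mu) > \tfrac12(\xi'\Sigma^{-1}\xi - \mu'\Sigma^{-1}\mu) + \log K$$

and take action $a_2$ otherwise. The linear functional $\Gamma'x$, with $\Gamma = \Sigma^{-1}(\xi - \mu)$, is called Fisher's discriminant function. In practice $\Sigma$, $\xi$, $\mu$ are usually unknown, and we consider estimation and testing problems for $\Gamma = (\Gamma_1, \dots, \Gamma_p)'$.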
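The rule (4.20) for two normal populations with a common covariance matrix reduces to comparing Fisher's linear discriminant $\Gamma'x$ with a threshold. The Python sketch below assumes known $\xi$, $\mu$, $\Sigma$ in two dimensions, and its helper names are our own. It implements that comparison and checks that the margin $\Gamma'x - \text{threshold}$ equals $\log(p_1(x)/p_2(x)) - \log K$, computed directly from the two quadratic forms.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def mat_vec(A, x):
    return [dot(row, x) for row in A]


def inv2(A):
    """Closed-form inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def log_likelihood_ratio(x, xi, mu, sigma_inv):
    """log p1(x)/p2(x) for N_p(xi, Sigma) versus N_p(mu, Sigma)."""
    dx = [a - b for a, b in zip(x, xi)]
    dy = [a - b for a, b in zip(x, mu)]
    return -0.5 * dot(dx, mat_vec(sigma_inv, dx)) + 0.5 * dot(dy, mat_vec(sigma_inv, dy))


def fisher_rule(x, xi, mu, sigma, pi, c1, c2):
    """Bayes rule (4.20) written through Fisher's discriminant Gamma'x."""
    s_inv = inv2(sigma)
    gamma = mat_vec(s_inv, [a - b for a, b in zip(xi, mu)])   # Gamma = Sigma^{-1}(xi - mu)
    K = (1 - pi) * c1 / (pi * c2)
    threshold = 0.5 * (dot(xi, mat_vec(s_inv, xi)) - dot(mu, mat_vec(s_inv, mu))) + math.log(K)
    margin = dot(gamma, x) - threshold
    if margin > 0:
        return "a1", margin
    if margin < 0:
        return "a2", margin
    return "randomize", 0.0


identity = [[1.0, 0.0], [0.0, 1.0]]
print(fisher_rule([0.5, 3.0], [1.0, 0.0], [-1.0, 0.0], identity, 0.5, 1.0, 1.0))
```

With equal priors and equal losses ($K = 1$) the threshold is the midpoint of the two means in the metric of $\Sigma^{-1}$, so the point $(0.5, 3)$ above is assigned to $P_1$.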

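Let $X^\alpha$, $\alpha = 1, \dots, N_1$ be a random sample of size $N_1$ from $N_p(\xi, \Sigma)$ and let $Y^\alpha$, $\alpha = 1, \dots, N_2$ be a random sample of size $N_2$ from $N_p(\mu, \Sigma)$. Invariance and sufficiency considerations lead us to consider the statistic $(\bar X, \bar Y, S)$, where

$$S = \sum_{\alpha=1}^{N_1}(X^\alpha - \bar X)(X^\alpha - \bar X)' + \sum_{\alpha=1}^{N_2}(Y^\alpha - \bar Y)(Y^\alpha - \bar Y)',$$

$N_1\bar X = \sum_{\alpha=1}^{N_1}X^\alpha$ and $N_2\bar Y = \sum_{\alpha=1}^{N_2}Y^\alpha$, for the statistical inferences of $\Gamma$.

We consider two testing problems about $\Gamma$.

Problem 4. To test the hypothesis $H_0: \Gamma_{p_1+1} = \dots = \Gamma_p = 0$ against the alternatives $H_1: \Gamma \neq 0$.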


Problem 5. To test the hypothesis $H_0: \Gamma_{p_1+1} = \dots = \Gamma_p = 0$ against the alternatives $H_1: \Gamma_{p_1+p_2+1} = \dots = \Gamma_p = 0$.

Without any loss of generality we restate our setup in the following canonical form, where $X = (X_1, \dots, X_p)'$ is distributed as $N_p(\eta, \Sigma)$, $S$ is distributed, independently of $X$, as Wishart $W_p(n, \Sigma)$, and $\Gamma = \Sigma^{-1}\eta$. Since $\Sigma$ is positive definite, $\Gamma = 0$ if and only if $\eta = 0$. Thus the problem of testing the hypothesis $\Gamma = 0$ against $\Gamma \neq 0$ is equivalent to Problem 1 considered earlier in this chapter.

Problem 4. It remains invariant under the group $G$ of $p \times p$ nonsingular matrices $g$ of the form

$$g = \begin{pmatrix} g_{(11)} & 0 \\ g_{(21)} & g_{(22)} \end{pmatrix}, \qquad (4.21)$$

where $g_{(11)}$ is of order $p_1$, operating as

$$(\bar X, S; \Gamma, \Sigma) \to (g\bar X, gSg'; g'^{-1}\Gamma, g\Sigma g'),$$

where $\bar X$, $S$ are the mean and the sample covariance matrix based on $N$ observations on $X$. By Example 2.4.3 a maximal invariant in the space of the sufficient statistic $(\bar X, S)$ under $G$ is given by $(R_1, R_2)$, with $d_1 = p_1$, $d_2 = p_2$ and $p = p_1 + p_2$. The joint pdf of $(R_1, R_2)$ is given by (4.9), where the expressions for $\delta_1$, $\delta_2$ in terms of $\Gamma$, $\Sigma$ are given by

$$\delta_1 + \delta_2 = N\Gamma'\Sigma\Gamma,\qquad \delta_2 = N\Gamma_{(2)}'\Sigma_{(22\cdot1)}\Gamma_{(2)}, \qquad (4.22)$$

with $\Sigma_{(22\cdot1)} = \Sigma_{(22)} - \Sigma_{(21)}\Sigma_{(11)}^{-1}\Sigma_{(12)}$. Under the hypothesis $H_0$, $\delta_1 > 0$, $\delta_2 = 0$, and under the alternatives $H_1$, $\delta_i > 0$, $i = 1, 2$. From (4.9) it follows that the probability density function of $R_1$ under $H_0$ is given by

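$$f_{H_0}(r_1 \mid \delta_1) = \frac{\Gamma(\frac N2)}{\Gamma(\frac{p_1}{2})\Gamma(\frac{N-p_1}{2})}r_1^{p_1/2-1}(1 - r_1)^{(N-p_1)/2-1}\exp\{-\tfrac12\delta_1\}\,\phi(\tfrac12N, \tfrac12p_1; \tfrac12r_1\delta_1),\quad 0 < r_1 < 1, \qquad (4.23)$$

and $R_1$ is sufficient for $\delta_1$ when $H_0$ is true.

Giri (1964) has shown that the likelihood ratio test for this problem rejects $H_0$ whenever

$$Z = \frac{1 - R_1 - R_2}{1 - R_1} \le c, \qquad (4.24)$$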


where the constant $c$ depends on the level $\alpha$ of the test and, under $H_0$, $Z$ has a central beta distribution with parameter $(\frac12(N - p), \frac12p_2)$ and is independent of $R_1$.

To prove that the likelihood ratio test is uniformly most powerful invariant similar, we first check that the family of pdfs $\{f_{H_0}(r_1 \mid \delta_1): \delta_1 \ge 0\}$ is boundedly complete.

Definition 4.2.1. (Boundedly complete). A family of distributions $\{P_\theta(x): \theta \in \Omega\}$ of a random variable $X$, or the corresponding family of pdfs $\{p_\theta(x): \theta \in \Omega\}$, is boundedly complete if, for any real valued bounded function $h(X)$,

$$E_\theta h(X) = \int h(x)\,dP_\theta(x) = \int h(x)p_\theta(x)\,dx = 0\quad\text{for all } \theta \in \Omega \qquad (4.25)$$

implies that $h(X) = 0$ almost everywhere with respect to each of the measures $P_\theta$. We also say that $X$ is a boundedly complete statistic.

Lemma 4.2.1. The family of pdfs $\{f_{H_0}(r_1 \mid \delta_1): \delta_1 \ge 0\}$ is boundedly complete.

Proof. For any real-valued bounded function $h(R_1)$ of $R_1$ we get (using (4.23))

$$E_{\delta_1}(h(R_1)) = \int_0^1 h(r_1)f_{H_0}(r_1 \mid \delta_1)\,dr_1 = \exp\{-\tfrac12\delta_1\}\sum_{j=0}^\infty \frac{(\frac12\delta_1)^j}{j!\,B(\frac12p_1 + j, \frac12(N - p_1))}\int_0^1 h^*(r_1)r_1^j\,dr_1,$$


where

$$h^*(r_1) = h(r_1)\,r_1^{p_1/2-1}(1 - r_1)^{(N-p_1)/2-1}$$

and $B(p, q)$ is the beta function. Hence $E_{\delta_1}h(R_1) = 0$ implies that

$$\sum_{j=0}^\infty \frac{(\frac12\delta_1)^j}{j!\,B(\frac12p_1 + j, \frac12(N - p_1))}\int_0^1 h^*(r_1)r_1^j\,dr_1 = 0. \qquad (4.26)$$

Since the left-hand side of (4.26) is a power series in $\delta_1$, (4.26) implies that all its coefficients must be zero. In other words,

$$\int_0^1 h^*(r_1)r_1^j\,dr_1 = 0,\qquad j = 0, 1, 2, \dots. \qquad (4.27)$$

Let $h^* = h^{*+} - h^{*-}$, where $h^{*+}$ and $h^{*-}$ denote the positive and negative parts of $h^*$. Hence we get from (4.27)

$$\int_0^1 h^{*+}(r_1)r_1^j\,dr_1 = \int_0^1 h^{*-}(r_1)r_1^j\,dr_1$$

for $j = 0, 1, \dots$, which implies that $h^{*+}(r_1) = h^{*-}(r_1)$ for all $r_1$ except possibly on a set of measure 0. Hence we conclude that $h^*(R_1) = 0$ almost everywhere $\{P_{\delta_1}(R_1): \delta_1 \ge 0\}$, which, in turn, implies that $h(R_1) = 0$ almost everywhere.

Since $R_1$ is boundedly complete, it follows from Lehmann (1959, p. 34) that any invariant test $\phi(R_1, R_2)$ of level $\alpha$ has Neyman structure with respect to $R_1$, i.e., $E_{H_0}(\phi(R_1, R_2) \mid R_1) = \alpha$.

To find the uniformly most powerful test among all similar invariant tests we need the ratio

$$R = \frac{f_{H_1}(r_1, r_2 \mid R_1 = r_1)}{f_{H_0}(r_1, r_2 \mid R_1 = r_1)},$$

where $f_{H_i}(r_1, r_2 \mid R_1 = r_1)$ denotes the conditional pdf of $(R_1, R_2)$ given $R_1 = r_1$ under $H_i$, $i = 0, 1$. From (4.9) we get

$$R = \exp\{-\tfrac12\delta_2(1 - r_1)\}\,\phi(\tfrac12(N - p_1), \tfrac12(p - p_1); \tfrac12r_2\delta_2), \qquad (4.28)$$

which, for fixed $r_1$, is an increasing function of $r_2 = (1 - r_1)(1 - z)$ and hence a decreasing function of $z$. From (4.28), using the fact that $Z$ is independent of $R_1$ when $H_0$ is true, we get the following theorem.

Theorem 4.2.1. For Problem 4, the likelihood ratio test given in (4.24) is uniformly most powerful invariant similar.

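Problem 5. It remains invariant under the group $G$ of $p \times p$ nonsingular matrices $g$ of the form

$$g = \begin{pmatrix} g_{(11)} & 0 & 0 \\ g_{(21)} & g_{(22)} & 0 \\ g_{(31)} & g_{(32)} & g_{(33)} \end{pmatrix},$$

with $g_{(11)}: p_1 \times p_1$ and $g_{(22)}: p_2 \times p_2$ submatrices of $g$. A maximal invariant in the space of $(\bar X, S)$ (as in Problem 4) under $G$ is $(R_1, R_2, R_3)$ as defined in Example 2.4.3 with $d_1 = p_1$, $d_2 = p_2$ and $d_3 = p - p_1 - p_2$. By Theorem 2.8.1 the joint probability density function of $R_1, R_2, R_3$ is given by

$$f(r_1, r_2, r_3) = \frac{\Gamma(\frac N2)}{\Gamma(\frac12(N - p))\prod_{i=1}^3\Gamma(\frac12p_i)}\Big(1 - \sum_{i=1}^3 r_i\Big)^{(N-p)/2-1}\prod_{i=1}^3 r_i^{p_i/2-1}\exp\Big\{-\frac12\sum_{i=1}^3\delta_i + \frac12\sum_{i=1}^3 r_i\sum_{j=i+1}^3\delta_j\Big\}\prod_{i=1}^3\phi\big(\tfrac12(N - \sigma_{i-1}), \tfrac12p_i; \tfrac12r_i\delta_i\big), \qquad (4.29)$$

where

$$\sigma_i = \sum_{j=1}^i p_j,\quad \sigma_0 = 0,\quad p_3 = p - p_1 - p_2,$$

and

$$\delta_1 = N(\Sigma_{(11)}\Gamma_{(1)} + \Sigma_{(12)}\Gamma_{(2)} + \Sigma_{(13)}\Gamma_{(3)})'\Sigma_{(11)}^{-1}(\Sigma_{(11)}\Gamma_{(1)} + \Sigma_{(12)}\Gamma_{(2)} + \Sigma_{(13)}\Gamma_{(3)}),$$

$$\delta_1 + \delta_2 = N\begin{pmatrix} \Sigma_{(11)}\Gamma_{(1)} + \Sigma_{(12)}\Gamma_{(2)} + \Sigma_{(13)}\Gamma_{(3)} \\ \Sigma_{(21)}\Gamma_{(1)} + \Sigma_{(22)}\Gamma_{(2)} + \Sigma_{(23)}\Gamma_{(3)} \end{pmatrix}'\begin{pmatrix} \Sigma_{(11)} & \Sigma_{(12)} \\ \Sigma_{(21)} & \Sigma_{(22)} \end{pmatrix}^{-1}\begin{pmatrix} \Sigma_{(11)}\Gamma_{(1)} + \Sigma_{(12)}\Gamma_{(2)} + \Sigma_{(13)}\Gamma_{(3)} \\ \Sigma_{(21)}\Gamma_{(1)} + \Sigma_{(22)}\Gamma_{(2)} + \Sigma_{(23)}\Gamma_{(3)} \end{pmatrix},$$

$$\delta_3 = N\Gamma_{(3)}'\left(\Sigma_{(33)} - (\Sigma_{(31)}, \Sigma_{(32)})\begin{pmatrix} \Sigma_{(11)} & \Sigma_{(12)} \\ \Sigma_{(21)} & \Sigma_{(22)} \end{pmatrix}^{-1}\begin{pmatrix} \Sigma_{(13)} \\ \Sigma_{(23)} \end{pmatrix}\right)\Gamma_{(3)}.$$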

Under $H_0$, $\delta_1 > 0$, $\delta_2 = 0$, $\delta_3 = 0$, and under $H_1$, $\delta_1 > 0$, $\delta_2 > 0$, $\delta_3 = 0$. From (4.29) it follows that $R_1$ is sufficient for $\delta_1$ under $H_0$. From Giri (1964) the likelihood ratio test of this problem rejects $H_0$ whenever

$$Z = \frac{1 - R_1 - R_2}{1 - R_1} \le c, \qquad (4.30)$$

where the constant $c$ depends on the level $\alpha$ of the test and, under $H_0$, $Z$ is distributed independently of $R_1$ as central beta with parameter $(\frac12(N - p_1 - p_2), \frac12p_2)$. Let $\phi(R_1, R_2, R_3)$ be any invariant level $\alpha$ test of $H_0$ against $H_1$. From Lemma 4.2.1, $R_1$ is boundedly complete. Thus $\phi$ has Neyman structure with respect to $R_1$. Since the conditional distribution of $(R_1, R_2, R_3)$ given $R_1$ does not depend on $\delta_1$, the condition that the level $\alpha$ test $\phi$ has Neyman structure, that is,

$$E_{H_0}(\phi(R_1, R_2, R_3) \mid R_1) = \alpha,$$

reduces the problem to that of testing the simple hypothesis $\delta_2 = 0$ against the alternatives $\delta_2 > 0$ on each surface $R_1 = r_1$. In this conditional situation the most powerful level $\alpha$ invariant test of $\delta_2 = 0$ against the simple alternative $\delta_2 = \delta_2^0 > 0$ is given by: reject $H_0$ whenever

$$\phi(\tfrac12(N - p_1), \tfrac12p_2; \tfrac12r_2\delta_2^0) \ge c, \qquad (4.31)$$

where the constant $c$ depends on $\alpha$. Since $R_2 = (1 - R_1)(1 - Z)$ and $Z$ is independent of $R_1$, we get the following theorem.

Theorem 4.2.2. For Problem 5 the likelihood ratio test given in (4.30) is uniformly most powerful invariant similar.


4.3. Test of Multiple Correlation

Let $X^\alpha$, $\alpha = 1, \dots, N$ be $N$ independent normal $p$-vectors with common mean $\mu$ and common positive definite covariance matrix $\Sigma$, and let $N\bar X = \sum_{\alpha=1}^N X^\alpha$, $S = \sum_{\alpha=1}^N (X^\alpha - \bar X)(X^\alpha - \bar X)'$. As usual we assume $N > p$, so that $S$ is positive definite with probability one. Partition $\Sigma$ and $S$ as

$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{(12)} \\ \Sigma_{(21)} & \Sigma_{(22)} \end{pmatrix},\qquad S = \begin{pmatrix} S_{11} & S_{(12)} \\ S_{(21)} & S_{(22)} \end{pmatrix},$$

where $\Sigma_{(22)}$, $S_{(22)}$ are of order $p - 1$. Let

$$\rho^2 = \frac{\Sigma_{(12)}\Sigma_{(22)}^{-1}\Sigma_{(21)}}{\Sigma_{11}}.$$

The positive square root of $\rho^2$ is called the population multiple correlation coefficient.

Problem 6. To test $H_0: \rho^2 = 0$ against the alternative $H_1: \rho^2 > 0$. This problem is invariant under the affine group $(G, T)$, operating as

$$(\bar X, S; \mu, \Sigma) \to (g\bar X + t, gSg'; g\mu + t, g\Sigma g'),$$

where $g \in G$ is a $p \times p$ nonsingular matrix of the form

$$g = \begin{pmatrix} g_{11} & 0 \\ 0 & g_{(22)} \end{pmatrix},$$

where $g_{(22)}$ is of order $p - 1$, and $T$ is the group of translations $t$ of components of each $X^\alpha$. A maximal invariant in the space of $(\bar X, S)$ under the affine group $(G, T)$ is

$$R^2 = \frac{S_{(12)}S_{(22)}^{-1}S_{(21)}}{S_{11}},$$

which is popularly called the square of the sample multiple correlation coefficient.

Distribution of $R^2$. To find the distribution of $R^2$ we first observe that

$$\frac{R^2}{1 - R^2} = \frac{S_{(12)}S_{(22)}^{-1}S_{(21)}}{S_{11} - S_{(12)}S_{(22)}^{-1}S_{(21)}}.$$

From Giri (1977), $S_{11} - S_{(12)}S_{(22)}^{-1}S_{(21)}$ is distributed as $\Sigma_{11\cdot2}\chi^2_{N-p}$, where $\Sigma_{11\cdot2} = \Sigma_{11} - \Sigma_{(12)}\Sigma_{(22)}^{-1}\Sigma_{(21)}$ and $\chi^2_{N-p}$ is a chisquare random variable with $N - p$ degrees of freedom, and it is independent of $S_{(12)}$, $S_{(22)}$. Thus $R^2/(1 - R^2)$ is distributed as

$$\frac{S_{(12)}S_{(22)}^{-1}S_{(21)}}{\Sigma_{11\cdot2}\,\chi^2_{N-p}}.$$

But the conditional distribution of $S_{(22)}^{-1/2}S_{(21)}/\sqrt{\Sigma_{11\cdot2}}$, given $S_{(22)}$, is $(p-1)$-variate normal with mean $S_{(22)}^{1/2}\Sigma_{(22)}^{-1}\Sigma_{(21)}/\sqrt{\Sigma_{11\cdot2}}$ and covariance $I$. Hence $R^2/(1 - R^2)$ is distributed as

$$\frac{\chi^2_{p-1}\big(\Sigma_{(12)}\Sigma_{(22)}^{-1}S_{(22)}\Sigma_{(22)}^{-1}\Sigma_{(21)}/\Sigma_{11\cdot2}\big)}{\chi^2_{N-p}},$$

where $\chi^2_k(\lambda)$ denotes a noncentral chisquare random variable with noncentrality parameter $\lambda$ and $k$ degrees of freedom. Also

$$\frac{\Sigma_{(12)}\Sigma_{(22)}^{-1}S_{(22)}\Sigma_{(22)}^{-1}\Sigma_{(21)}}{\Sigma_{(12)}\Sigma_{(22)}^{-1}\Sigma_{(21)}}$$

is distributed as chisquare $\chi^2_{N-1}$. Since

$$\frac{\Sigma_{(12)}\Sigma_{(22)}^{-1}\Sigma_{(21)}}{\Sigma_{11} - \Sigma_{(12)}\Sigma_{(22)}^{-1}\Sigma_{(21)}} = \frac{\rho^2}{1 - \rho^2},$$

we conclude that $R^2/(1 - R^2)$ is distributed as

$$\frac{\chi^2_{p-1}\big(\rho^2\chi^2_{N-1}/(1 - \rho^2)\big)}{\chi^2_{N-p}}.$$

Using the fact that a noncentral chisquare random variable $\chi^2_m(\lambda)$ can be represented as $\chi^2_{m+2K}$, where $K$ is a Poisson random variable with parameter $\lambda/2$, we conclude that $R^2/(1 - R^2)$ is distributed as $\chi^2_{p-1+2K}/\chi^2_{N-p}$, where the conditional distribution of $K$ given $\chi^2_{N-1}$ is Poisson with parameter $\frac12[\rho^2/(1 - \rho^2)]\chi^2_{N-1}$. Let $\lambda = \rho^2/(1 - \rho^2)$. Then

$$P(K = k) = \int_0^\infty \frac{e^{-\lambda x/2}(\lambda x/2)^k}{k!}\,\frac{x^{(N-1)/2-1}e^{-x/2}}{2^{(N-1)/2}\Gamma(\frac12(N - 1))}\,dx = \frac{\Gamma(\frac12(N - 1) + k)}{k!\,\Gamma(\frac12(N - 1))}(\rho^2)^k(1 - \rho^2)^{(N-1)/2}, \qquad (4.33)$$

with $k = 0, 1, \dots$. Thus

$$\frac{R^2}{1 - R^2}\ \text{is distributed as}\ \frac{\chi^2_{p-1+2K}}{\chi^2_{N-p}}, \qquad (4.34)$$

where $K$ is a negative binomial random variable with probability function given in (4.33). Simple calculations yield that $(N - p)R^2/((p - 1)(1 - R^2))$ has a central $F$ distribution with parameter $(p - 1, N - p)$ when $\rho^2 = 0$. From (4.33)-(4.34) the noncentral distribution of $R^2$ is given by

$$f_{R^2}(r^2) = \frac{(1 - \rho^2)^{(N-1)/2}(r^2)^{(p-1)/2-1}(1 - r^2)^{(N-p)/2-1}}{\Gamma(\frac12(N - 1))\,\Gamma(\frac12(N - p))}\sum_{k=0}^\infty \frac{(\rho^2)^k(r^2)^k\,\Gamma^2(\frac12(N - 1) + k)}{k!\,\Gamma(\frac12(p - 1) + k)},\quad 0 < r^2 < 1. \qquad (4.35)$$
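The negative binomial representation (4.33)-(4.34) and the closed form (4.35) can be checked against each other numerically. In the Python sketch below (our own illustration, with hypothetical helper names), `f_R2` evaluates the series (4.35) by a term recurrence, and `f_R2_mixture` sums $P(K = k)$ times the beta density with parameters $(\frac12(p-1) + k, \frac12(N-p))$, which is the density of $R^2$ given $K = k$. The check covers the mixture weights summing to one, the two forms agreeing, the density integrating to one, and the reduction to a central beta when $\rho^2 = 0$.

```python
import math


def negbin_pmf(k, rho2, N):
    """P(K = k) from (4.33)."""
    if rho2 == 0.0:
        return 1.0 if k == 0 else 0.0
    a = (N - 1) / 2
    return math.exp(math.lgamma(a + k) - math.lgamma(k + 1) - math.lgamma(a)
                    + k * math.log(rho2) + a * math.log(1 - rho2))


def beta_pdf(x, a, b):
    return math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                    + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def f_R2(x, rho2, N, p, terms=400):
    """Noncentral density (4.35) of the squared sample multiple correlation."""
    a, b, c = (N - 1) / 2, (p - 1) / 2, (N - p) / 2
    front = math.exp(a * math.log(1 - rho2) + (b - 1) * math.log(x)
                     + (c - 1) * math.log(1 - x) - math.lgamma(a) - math.lgamma(c))
    term = math.exp(2 * math.lgamma(a) - math.lgamma(b))    # k = 0 term of the series
    total = term
    for k in range(terms):
        term *= (a + k) ** 2 / ((k + 1) * (b + k)) * rho2 * x
        total += term
    return front * total


def f_R2_mixture(x, rho2, N, p, terms=400):
    """The same density as a negative binomial mixture of beta densities."""
    return sum(negbin_pmf(k, rho2, N) * beta_pdf(x, (p - 1) / 2 + k, (N - p) / 2)
               for k in range(terms))


def integrate(f, n_grid=4000):
    h = 1.0 / n_grid
    return h * sum(f((k + 0.5) * h) for k in range(n_grid))


print(f_R2(0.3, 0.4, 15, 5), f_R2_mixture(0.3, 0.4, 15, 5))
```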

It may be checked that a corresponding maximal invariant in the parametric space is $\rho^2$.

Theorem 4.3.1. For Problem 6 the test which rejects $H_0$ whenever $R^2 \ge C$, the constant $C$ depending on the level $\alpha$ of the test, is uniformly most powerful invariant.
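In practice the test of Theorem 4.3.1 is carried out by computing $R^2$ from $S$, or equivalently the ratio $F = (N-p)R^2/((p-1)(1-R^2))$, which is central $F(p-1, N-p)$ under $H_0$. The Python sketch below uses our own helper names and takes the first coordinate as $X_1$. It computes both statistics and checks that $R^2$ is unchanged by $X^\alpha \to gX^\alpha + t$ with $g = \operatorname{diag}(g_{11}, g_{(22)})$. The data, $g$ and $t$ are arbitrary illustrative values.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def sample_R2(data):
    """R^2 = S_(12) S_(22)^{-1} S_(21) / S_11 with the first coordinate as X_1."""
    N, p = len(data), len(data[0])
    xbar = [sum(x[j] for x in data) / N for j in range(p)]
    S = [[sum((x[i] - xbar[i]) * (x[j] - xbar[j]) for x in data) for j in range(p)]
         for i in range(p)]
    s21 = [S[i][0] for i in range(1, p)]
    s22 = [row[1:] for row in S[1:]]
    return dot(s21, solve(s22, s21)) / S[0][0]


def f_statistic(r2, N, p):
    """(N - p) R^2 / ((p - 1)(1 - R^2))."""
    return (N - p) * r2 / ((p - 1) * (1 - r2))


data = [[2.1, 0.3, 1.0], [1.4, -0.8, 0.6], [3.3, 1.2, -0.4], [0.7, -0.5, 1.9],
        [2.8, 0.9, 0.2], [1.9, 0.1, -1.1], [2.5, -0.2, 0.8]]
g11, g22, t = -2.0, [[1.0, 2.0], [0.0, 3.0]], [5.0, 1.0, -2.0]
moved = [[g11 * x[0] + t[0]] + [dot(row, x[1:]) + ti for row, ti in zip(g22, t[1:])]
         for x in data]
r2 = sample_R2(data)
print(r2, sample_R2(moved), f_statistic(r2, len(data), 3))
```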


Proof. From (4.35)

$$\frac{f_{R^2}(r^2 \mid \rho^2)}{f_{R^2}(r^2 \mid 0)} = (1 - \rho^2)^{(N-1)/2}\sum_{k=0}^\infty \frac{(\rho^2)^k(r^2)^k\,\Gamma^2(\frac12(N - 1) + k)\,\Gamma(\frac12(p - 1))}{k!\,\Gamma(\frac12(p - 1) + k)\,\Gamma^2(\frac12(N - 1))},$$

which, for a given value of $\rho^2$, is an increasing function of $r^2$. Using the Neyman-Pearson lemma we get the theorem.

As in Theorem 4.1.2 we can show that, among all tests of $H_0: \rho^2 = 0$ against $H_1: \rho^2 > 0$ with power function depending only on $\rho^2$, the $R^2$-test is UMP.

4.4. Test of Multiple Correlation with Partial Information

Let $X$ be a normally distributed $p$-dimensional random column vector with mean $\mu$ and positive definite covariance matrix $\Sigma$. Let $X^\alpha$, $\alpha=1,\dots,N$ $(N>p)$, be a random sample of size $N$ from this distribution. We partition $X$ as $X=(X_1,X_{(2)}',X_{(3)}')'$, where $X_1$ is one-dimensional, $X_{(2)}$ is $p_1$-dimensional, $X_{(3)}$ is $p_2$-dimensional, and $1+p_1+p_2=p$. Let $\rho_1^2$ and $\rho^2$ denote the multiple correlation coefficients of $X_1$ with $X_{(2)}$ and with $(X_{(2)}',X_{(3)}')'$, respectively. Denote $\rho_2^2=\rho^2-\rho_1^2$. We consider here the following two testing problems.

Problem 7. To test $H_{10}:\rho^2=0$ against the alternatives $H_{1\lambda}:\rho_1^2=\lambda>0,\ \rho_2^2=0$.

Problem 8. To test $H_{20}:\rho^2=0$ against the alternatives $H_{2\lambda}:\rho_1^2=0,\ \rho_2^2=\lambda>0$.

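Let $N\bar X=\sum_{\alpha=1}^NX^\alpha$ and $S=\sum_{\alpha=1}^N(X^\alpha-\bar X)(X^\alpha-\bar X)'$. Let $b_{[i]}$ denote the $i$-vector consisting of the first $i$ components of a vector $b$, and let $C_{[i]}$ denote the $i\times i$ upper left submatrix of a matrix $C$. Partition $S$ and $\Sigma$ as

$$S=\begin{pmatrix}S_{11}&S_{(12)}&S_{(13)}\\S_{(21)}&S_{(22)}&S_{(23)}\\S_{(31)}&S_{(32)}&S_{(33)}\end{pmatrix},\qquad\Sigma=\begin{pmatrix}\Sigma_{11}&\Sigma_{(12)}&\Sigma_{(13)}\\\Sigma_{(21)}&\Sigma_{(22)}&\Sigma_{(23)}\\\Sigma_{(31)}&\Sigma_{(32)}&\Sigma_{(33)}\end{pmatrix},$$

where $S_{(22)}$ and $\Sigma_{(22)}$ are each of dimension $p_1\times p_1$, and $S_{(33)}$ and $\Sigma_{(33)}$ are each of dimension $p_2\times p_2$. Define

$$\rho_1^2=\frac{\Sigma_{(12)}\Sigma_{(22)}^{-1}\Sigma_{(21)}}{\Sigma_{11}},\qquad\rho^2=\rho_1^2+\rho_2^2=\frac{(\Sigma_{(12)},\Sigma_{(13)})\begin{pmatrix}\Sigma_{(22)}&\Sigma_{(23)}\\\Sigma_{(32)}&\Sigma_{(33)}\end{pmatrix}^{-1}(\Sigma_{(12)},\Sigma_{(13)})'}{\Sigma_{11}},$$

$$R_1^2=\frac{S_{(12)}S_{(22)}^{-1}S_{(21)}}{S_{11}},\qquad R_1^2+R_2^2=\frac{(S_{(12)},S_{(13)})\begin{pmatrix}S_{(22)}&S_{(23)}\\S_{(32)}&S_{(33)}\end{pmatrix}^{-1}(S_{(12)},S_{(13)})'}{S_{11}}.$$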
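As a numerical illustration of these definitions, $R_1^2$ and $R_1^2+R_2^2$ can be computed from determinant ratios using the partitioned-determinant identity

$$1-\frac{c'A^{-1}c}{a}=\frac{\det\begin{pmatrix}a&c'\\c&A\end{pmatrix}}{a\det A}.$$

The pure-Python sketch below rests on a few assumptions. The helper names are hypothetical, and $S$ is assumed to be ordered as $(X_1,X_{(2)},X_{(3)})$ with $X_{(2)}$ of size $p_1$. The sketch also returns $z=(1-R_1^2-R_2^2)/(1-R_1^2)$, the ratio on which the likelihood ratio test of $H_{20}$ in (4.39) is based.

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for j in range(n):
        piv = max(range(j, n), key=lambda i: abs(a[i][j]))
        if a[piv][j] == 0.0:
            return 0.0
        if piv != j:
            a[j], a[piv] = a[piv], a[j]
            d = -d
        d *= a[j][j]
        for i in range(j + 1, n):
            f = a[i][j] / a[j][j]
            for k in range(j, n):
                a[i][k] -= f * a[j][k]
    return d

def submatrix(m, idx):
    """Principal submatrix of m on the index set idx."""
    return [[m[i][j] for j in idx] for i in idx]

def partial_information_stats(S, p1):
    """Return (R1^2, R2^2, z) for S ordered as (X_1, X_(2), X_(3)), X_(2) of size p1."""
    p = len(S)
    s11 = S[0][0]
    r1 = 1.0 - det(submatrix(S, range(p1 + 1))) / (s11 * det(submatrix(S, range(1, p1 + 1))))
    r_all = 1.0 - det(S) / (s11 * det(submatrix(S, range(1, p))))
    z = (1.0 - r_all) / (1.0 - r1)
    return r1, r_all - r1, z
```

For a $3\times3$ correlation-type matrix with $p_1=p_2=1$, the first component reproduces the usual squared correlation $S_{12}^2/(S_{11}S_{22})$.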

It is easy to verify that the translation group transforming $(\bar X,S;\mu,\Sigma)\to(\bar X+b,S;\mu+b,\Sigma)$ leaves the present problem invariant. Together with the group $G$ of $p\times p$ nonsingular matrices

$$g=\begin{pmatrix}g_{11}&0&0\\0&g_{(22)}&0\\0&g_{(32)}&g_{(33)}\end{pmatrix},\qquad(4.36)$$

where $g_{11}$ is $1\times1$, $g_{(22)}$ is $p_1\times p_1$ and $g_{(33)}$ is $p_2\times p_2$, it generates a group which leaves the present problem invariant. The action of these transformations is to reduce the problem to one where $\mu=0$ and $S=\sum_{\alpha=1}^NX^\alpha X^{\alpha\prime}$ is sufficient for $\Sigma$, with $N$ reduced by one from its original value. We now treat the latter formulation, considering $X^\alpha$, $\alpha=1,\dots,N$ $(N>p\ge2)$, to have a common zero mean vector. We therefore consider the group $G$ of transformations $g$ as defined in (4.11), operating as $(S,\Sigma)\to(gSg',g\Sigma g')$, for the invariance of the problem. A maximal invariant in the sample space under $G$ is $(R_1^2,R_2^2)$.

Since $N>p\ge2$ by assumption, $S$ is positive definite with probability 1. Hence $R_1^2>0$, $R_2^2>0$ and $R_1^2+R_2^2=R^2<1$, where $R^2$ is the squared sample multiple correlation coefficient between the first component and the remaining $p-1$ components of the random vector $X$. A corresponding maximal invariant in the parametric space under $G$ is $(\rho_1^2,\rho_2^2)$. The joint probability density function of $(R_1^2,R_2^2)$ is given by (see Giri (1979))

$$f_{\rho_1^2,\rho_2^2}(r_1^2,r_2^2)=K(1-\rho^2)^{N/2}(1-r_1^2-r_2^2)^{\frac12(N-p-1)}\prod_{i=1}^2(r_i^2)^{\frac12p_i-1}\times(\text{series; not legible in the source}),\qquad(4.37)$$

In (4.37), $\gamma_i=1-\sum_{j=1}^i\rho_j^2$ $(i=0,1,2)$ and $K$ is the normalizing constant. The series is expressed through $\gamma_i$ and two further quantities $\alpha_i$, $\theta_i$, whose definitions are not legible in the source.

By straightforward computations, the likelihood ratio test of $H_{10}$ when $\Omega=\{(\mu,\Sigma):\Sigma_{(13)}=0\}$ rejects $H_{10}$ whenever

$$r_1^2\ge c,\qquad(4.38)$$

where the constant $c$ depends on the size $\alpha$ of the test. Under $H_{10}$, $R_1^2$ has a central beta distribution with parameter $(\frac12p_1,\frac12(N-p_1))$. The likelihood ratio test of $H_{20}$ when $\Omega=\{(\mu,\Sigma):\Sigma_{(12)}=0\}$ rejects $H_{20}$ whenever

$$z=\frac{1-r_1^2-r_2^2}{1-r_1^2}\le c,\qquad(4.39)$$

where the constant $c$ depends on the size $\alpha$ of the test. Under $H_{20}$ the corresponding random variable $Z$ is distributed independently of $R_1^2$ as central beta with parameter $(\frac12(N-p_1-p_2),\frac12p_2)$.

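Theorem 4.4.1. For Problem 7 the likelihood ratio test given in (4.38) is UMP invariant.

Proof. Under $H_{10}$, $\gamma_i=1$ for $i=0,1,2$; hence $\alpha_i=0$ for $i=1,2$. Under $H_{1\lambda}$, $\rho_1^2=\lambda$, $\rho_2^2=0$, $\gamma_0=1$, $\gamma_1=\gamma_2=1-\lambda$, $\alpha_2=0$, $\theta_1=4r_1^2\lambda$ and $\theta_2=0$. Thus

$$\frac{f_{H_{1\lambda}}(r_1^2,r_2^2)}{f_{H_{10}}(r_1^2,r_2^2)}=(1-\lambda)^{N/2}\sum_{i=0}^\infty\frac{(\lambda r_1^2)^i\,\Gamma^2(\frac12N+i)\,\Gamma(\frac12p_1)}{i!\,\Gamma(\frac12p_1+i)\,\Gamma^2(\frac12N)},\qquad(4.40)$$

which, for fixed $\lambda$, is an increasing function of $r_1^2$ alone. Using the Neyman–Pearson Lemma and (4.40) we get the theorem.

Theorem 4.4.2. For Problem 8 the likelihood ratio test is UMP invariant among all tests $\phi(R_1^2,R_2^2)$ based on $(R_1^2,R_2^2)$ that satisfy $E_{H_{20}}\{\phi(R_1^2,R_2^2)\mid R_1^2\}=\alpha$.

Proof. Under $H_{2\lambda}$, $\rho_1^2=0$, $\rho_2^2=\lambda$, $\gamma_0=\gamma_1=1$, $\gamma_2=1-\lambda$, $\alpha_1=0$, $\alpha_2=\lambda$ and $\theta_1=0$. Hence we obtain the ratio $f_{H_{2\lambda}}(r_1^2,r_2^2)/f_{H_{20}}(r_1^2,r_2^2)$, a series in $i$ whose terms involve $\Gamma(\frac12(N-p_1)+i)$ and $(2i)!$.

[Display (4.41) is not legible in the source.]

From (4.41), $f_{H_{2\lambda}}(r_2^2\mid r_1^2)$ has a monotone likelihood ratio in $r_2^2=(1-z)(1-r_1^2)$. Using Lehmann (1959) we now get the theorem.

Exercises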

1. Prove Equations (4.3) and (4.5).

2. Prove (4.10).

3. Show that $\chi^2_m(\lambda)$ is distributed as $\chi^2_{m+2K}$, where $K$ is a Poisson random variable with parameter $\frac12\lambda$.

4. Prove (4.31).

5. Let $\pi_1,\pi_2$ be two $p$-variate normal populations with means $\mu_1$ and $\mu_2$ and the same positive definite covariance matrix $\Sigma$. Let $X=(X_1,\dots,X_p)'$ be distributed according to $\pi_1$ or $\pi_2$, and let $b=(b_1,\dots,b_p)'$ be a real vector. Show that
$$\frac{[E_1(b'X)-E_2(b'X)]^2}{\operatorname{var}(b'X)}$$
is maximized over all $b$ by $b=\Sigma^{-1}(\mu_1-\mu_2)$, where $E_i$ denotes the expectation under $\pi_i$.

6. (Giri, 1994a). Let $X^\alpha=(X_{\alpha1},\dots,X_{\alpha p})'$, $\alpha=1,\dots,N\,(>p)$, be independently identically distributed $p=2p_1$-variate normal random vectors with mean $\mu=(\mu_1,\dots,\mu_p)'$ and common covariance matrix $\Sigma$ (positive definite). Let $N\bar X=\sum_{\alpha=1}^NX^\alpha$ and $S=\sum_{\alpha=1}^N(X^\alpha-\bar X)(X^\alpha-\bar X)'$. Write
$$S=\begin{pmatrix}S_{(11)}&S_{(12)}\\S_{(21)}&S_{(22)}\end{pmatrix},\qquad\Sigma=\begin{pmatrix}\Sigma_{(11)}&\Sigma_{(12)}\\\Sigma_{(21)}&\Sigma_{(22)}\end{pmatrix},\qquad\bar X=(\bar X_1,\dots,\bar X_p)',$$
where $S_{(ij)}$ and $\Sigma_{(ij)}$ are $p_1\times p_1$ for all $i,j$. Let $\bar X_{(1)}=(\bar X_1,\dots,\bar X_{p_1})'$ and $\bar X_{(2)}=(\bar X_{p_1+1},\dots,\bar X_p)'$.
(a) Consider the likelihood ratio test of $H_0:\mu_{(1)}=\mu_{(2)}$, where $\mu=(\mu_{(1)}',\mu_{(2)}')'$ and $\mu_{(1)}=(\mu_1,\dots,\mu_{p_1})'$. Show that it rejects $H_0$ for large values of
$$T^2=N(\bar X_{(1)}-\bar X_{(2)})'(S_{(11)}+S_{(22)}-S_{(12)}-S_{(21)})^{-1}(\bar X_{(1)}-\bar X_{(2)}).$$
Show also that $T^2$ is distributed as $\chi^2_{p_1}(\delta)/\chi^2_{N-p_1}$, with $\chi^2_{p_1}(\delta)$ and $\chi^2_{N-p_1}$ independent and
$$\delta=N(\mu_{(1)}-\mu_{(2)})'(\Sigma_{(11)}+\Sigma_{(22)}-\Sigma_{(12)}-\Sigma_{(21)})^{-1}(\mu_{(1)}-\mu_{(2)}).$$
(b) Show that the above test is UMPI.

7. (Giri, 1994b). In Exercise 6 let $\Gamma=\Sigma^{-1}\mu=(\Gamma_{(1)}',\Gamma_{(2)}')'$ with $\Gamma_{(1)}=(\Gamma_1,\dots,\Gamma_{p_1})'$.
(a) Show that the likelihood ratio test of $H_0:\Gamma_{(1)}=\Gamma_{(2)}$ rejects $H_0$ for small values of
$$Z=\frac{1+N(\bar X_{(1)}+\bar X_{(2)})'(S_{(11)}+S_{(22)}+S_{(12)}+S_{(21)})^{-1}(\bar X_{(1)}+\bar X_{(2)})}{1+N\bar X'S^{-1}\bar X},$$
where $Z$ is distributed as central beta with parameter $(\frac12(N-2p_1),\frac12p_1)$ under $H_0$.
(b) Show that it is UMPI similar.
(c) Find the likelihood ratio test of $H_0:\Gamma_{(1)}=A\Gamma_{(2)}$ when $A$ is known.

8. (Giri, 1994c). In Problem 6 write
$$\Sigma=\begin{pmatrix}\Sigma_{11}&\Sigma_{(12)}&\Sigma_{(13)}\\\Sigma_{(21)}&\Sigma_{(22)}&\Sigma_{(23)}\\\Sigma_{(31)}&\Sigma_{(32)}&\Sigma_{(33)}\end{pmatrix},\qquad S=\begin{pmatrix}S_{11}&S_{(12)}&S_{(13)}\\S_{(21)}&S_{(22)}&S_{(23)}\\S_{(31)}&S_{(32)}&S_{(33)}\end{pmatrix},$$
where $\Sigma_{11}$ and $S_{11}$ are $1\times1$, $\Sigma_{(22)}$ and $S_{(22)}$ are $p_1\times p_1$, and $\Sigma_{(33)}$ and $S_{(33)}$ are $p_1\times p_1$, with $2p_1=p-1$.
(a) Show that the likelihood ratio test of $H_0:\Sigma_{(12)}=\Sigma_{(13)}$ rejects $H_0$ for large values of
$$R_1^2=\frac{(S_{(12)}-S_{(13)})(S_{(22)}+S_{(33)}-S_{(32)}-S_{(23)})^{-1}(S_{(21)}-S_{(31)})}{S_{11}}.$$
Show also that the probability density function of $R_1^2$ is given by
$$f(r_1^2)=\frac{(1-\rho_1^2)^{\frac12(N-1)}(r_1^2)^{\frac12(p_1-2)}(1-r_1^2)^{\frac12(N-p_1-3)}}{\Gamma(\frac12(N-1))\,\Gamma(\frac12(N-p_1-1))}\sum_{j=0}^\infty\frac{(\rho_1^2)^j(r_1^2)^j\,\Gamma^2(\frac12(N-1)+j)}{j!\,\Gamma(\frac12p_1+j)},$$
where
$$\rho_1^2=\frac{(\Sigma_{(12)}-\Sigma_{(13)})(\Sigma_{(22)}+\Sigma_{(33)}-\Sigma_{(32)}-\Sigma_{(23)})^{-1}(\Sigma_{(21)}-\Sigma_{(31)})}{\Sigma_{11}}.$$
(b) Show that no optimum invariant test exists for this problem.

References

1. N. Giri, Locally Minimax Tests of Multiple Correlations, Canad. J. Statist. 7, 53-60 (1979).
2. N. Giri, Multivariate Statistical Inference, Academic Press, N.Y., 1977.
3. N. Giri, On the Likelihood Ratio Test of a Normal Multivariate Testing Problem, Ann. Math. Statist. 35, 181-189 (1964).
4. N. Giri, On a Multivariate Testing Problem, Calcutta Stat. Assoc. Bull. 11, 55-60 (1962).
5. N. Giri, On the Likelihood Ratio Test of a Multivariate Testing Problem II, Ann. Math. Statist. 36, 1061-1065 (1965).
6. N. Giri, Admissibility of Some Minimax Tests of Multivariate Mean, Research Report, Mathematics and Statistics, University of Montreal, 1994a.
7. N. Giri, On a Test of Discriminant Coefficients, Research Report, Mathematics and Statistics, University of Montreal, 1994b.
8. N. Giri, On Tests of Multiple Correlations with Partial Informations, Research Report, Mathematics and Statistics, University of Montreal, 1994c.
9. N. Giri, Multivariate Statistical Analysis, Marcel Dekker, N.Y., 1996.
10. J. K. Ghosh, Invariance in Testing and Estimation, Tech. Report No. Math-Stat 2/672, Indian Statistical Institute, Calcutta, India, 1967.
11. E. L. Lehmann, Testing Statistical Hypotheses, Wiley, N.Y., 1959.
12. A. Wald, Tests of Statistical Hypotheses Concerning Parameters when the Number of Observations is Large, Trans. Amer. Math. Soc. 54, 426-482 (1943).

Chapter 5
SOME MINIMAX TESTS IN MULTINORMALES

5.0. Introduction

The invariance principle, by restricting attention to invariant tests only, allows us to consider a subclass of the class of all available tests. A question then arises naturally: under what conditions is an optimum invariant test also optimum among the class of all tests, if such optimality can be achieved at all? Powerful support comes from the celebrated unpublished work of Hunt and Stein, popularly known as the Hunt–Stein theorem. Towards the end of the Second World War they proved that, under certain conditions on the transformation group $G$, there exists an invariant test of level $\alpha$ which is also minimax. That is, it minimizes the maximum error of the second kind (1 − power) among all tests. Many proofs of this theorem have since appeared in the literature. The version in Lehmann (1959) is probably close in spirit to the one originally developed by Hunt and Stein.

The theorem has a longer background:

- Pitman (1939) gave intuitive reasons for using best invariant procedures in hypothesis testing problems concerning location and scale parameters.
- Wald (1939) had the idea that for certain nonsequential location parameter estimation problems, under certain restrictions on the group, there exists an invariant estimator which is minimax.
- Peisakoff (1950), in his Ph.D. thesis, pointed out that there seems to be a lacuna in Wald's proof. He gave a general development of the theory of minimax decision procedures invariant under a transformation group.
- Kiefer (1957) proved an analogue of the Hunt–Stein theorem for continuous and discrete sequential decision problems and extended the theorem to other decision problems.
- Wesler (1959) generalized it to modified minimax tests based on slices of the parametric space.

It is well known that, for statistical inference problems, we can characterize statistical tests as functions of the sufficient statistic instead of the sample observations, without any loss of generality. Such a characterization simplifies the sample space considerably without losing any information about the problem at hand. A similar characterization in terms of the maximal invariant is too strong a result to expect, but the Hunt–Stein theorem has contributed considerably in that direction. It gives conditions on the transformation group under which the following holds. Given any test $\phi$ for testing $H:\theta\in\Omega_H$ against the alternatives $K:\theta\in\Omega_K$, with $\Omega_H\cap\Omega_K$ a null set, there exists an invariant test $\psi$ such that

$$\sup_{\theta\in\Omega_H}E_\theta\psi\le\sup_{\theta\in\Omega_H}E_\theta\phi,\qquad(5.1)$$

$$\inf_{\theta\in\Omega_K}E_\theta\psi\ge\inf_{\theta\in\Omega_K}E_\theta\phi.\qquad(5.2)$$

In other words, $\psi$ performs at least as well as any $\phi$ in the worst possible cases. We present only the statement of this theorem; for a detailed discussion and a proof the reader is referred to Lehmann (1959) or Ghosh (1967).

Let $\mathcal P=\{P_\theta,\theta\in\Omega\}$ be a family of distributions on the Euclidean space $(\mathcal X,\mathcal A)$, dominated by a $\sigma$-finite measure $\mu$. Let $G$ be a group of transformations, operating from the left on $\mathcal X$, which leaves $\mathcal P$ invariant.

Theorem 5.0.1. (Hunt–Stein Theorem). Let $\mathcal B$ be a $\sigma$-field of subsets of $G$ with two properties. First, for any $A\in\mathcal A$, the set of pairs $(x,g)$ with $gx\in A$ is in $\mathcal A\times\mathcal B$. Second, for any $B\in\mathcal B$ and $g\in G$, $Bg\in\mathcal B$. Suppose there exists a sequence of distribution functions $\nu_n$ on $(G,\mathcal B)$ which is asymptotically right invariant, in the sense that for any $g\in G$ and $B\in\mathcal B$,

$$\lim_{n\to\infty}|\nu_n(Bg)-\nu_n(B)|=0.\qquad(5.3)$$

Then, given any test $\phi$, there exists a test $\psi$ which is almost invariant and satisfies conditions (5.1) and (5.2).

A remarkable feature of this theorem is that its assumptions have nothing to do with the statistical aspects of the problem; they involve only the group $G$. For the problem of admissibility of statistical tests, however, the situation is more complicated. If $G$ is a finite or a locally compact group, the best invariant test is admissible. For other groups the nature of $\mathcal P$ plays a dominant role.
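Condition (5.3) is easy to picture for a noncompact group. Purely as an illustration (this group is not one of the text's examples), take $G$ to be the additive group of the real line and $\nu_n$ the uniform distribution on $[-n,n]$. For any interval $B$ and translation $g$, we have $|\nu_n(B+g)-\nu_n(B)|\le|g|/(2n)$. The sketch below computes this gap for intervals $B=[a,b]$, which may be unbounded on the right.

```python
import math

def uniform_measure(n, a, b):
    """nu_n([a, b]) for nu_n = uniform distribution on [-n, n]; b may be math.inf."""
    lo, hi = max(a, -n), min(b, n)
    return max(0.0, hi - lo) / (2.0 * n)

def right_invariance_gap(n, a, b, g):
    """|nu_n(B + g) - nu_n(B)| for B = [a, b] translated by g."""
    return abs(uniform_measure(n, a + g, b + g) - uniform_measure(n, a, b))
```

For $B=[0,\infty)$ and $g=3$ the gap is exactly $3/(2n)$, so it tends to zero as (5.3) requires. The additive group is abelian, so left and right invariance coincide here.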

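The proof of Theorem 5.0.1 is straightforward if $G$ is a finite group. Let $m$ denote the number of elements of $G$. We define

$$\psi(x)=\frac1m\sum_{g\in G}\phi(gx).$$

We have $E_\theta\phi(gX)=E_{\bar g\theta}\phi(X)$, and $\Omega_H$, $\Omega_K$ are invariant under the induced transformations $\bar g$. Hence this $\psi$ is invariant and satisfies (5.1) and (5.2).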
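For a finite group the construction is simply an average over the group. The sketch below builds $\psi(x)=\frac1m\sum_{g\in G}\phi(gx)$ and checks that it is invariant. For illustration only, it assumes the group of coordinate sign changes on $\mathbb R^2$ (order $m=4$) and an arbitrary non-invariant test $\phi$.

```python
import itertools

def sign_change_group(n):
    """All 2**n sign-change matrices diag(+-1, ..., +-1), stored as tuples."""
    return list(itertools.product((1, -1), repeat=n))

def act(g, x):
    """Action of the sign-change matrix g on the point x."""
    return tuple(gi * xi for gi, xi in zip(g, x))

def average_over_group(phi, group):
    """psi(x) = (1/m) * sum over g in G of phi(g x), for a finite group G of order m."""
    m = len(group)
    def psi(x):
        return sum(phi(act(g, x)) for g in group) / m
    return psi

def phi(x):
    """Hypothetical test rejecting when x_1 + 2 x_2 > 1; it is not sign-change invariant."""
    return 1.0 if x[0] + 2.0 * x[1] > 1.0 else 0.0
```

For example, at $x=(0.3,0.7)$ the test $\phi$ rejects, but it accepts at $(0.3,-0.7)$. The averaged test takes the value $0.5$ at every point of the orbit of $x$.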

As observed in Chapter 2, invariant measures exist for many groups and are essentially unique. Frequently, however, they are not finite, and so they cannot be taken as probability measures. We showed in Chapter 2 that an invariant probability measure exists on the group $O(p)$ of orthogonal matrices of order $p$, and this group satisfies the conditions of the Hunt–Stein theorem. The group $G_T(p)$ of nonsingular lower triangular matrices of order $p$ also satisfies the conditions of this theorem (see Lehmann, 1959, p. 345).

5.1. Locally Minimax Tests

Let $(\mathcal X,\mathcal A)$ be a measurable space. Consider each point $(\delta,\eta)$ in the parametric space $\Omega$, where $\delta\ge0$ and $\eta$ is of arbitrary dimension, with a range that may depend on $\delta$. Suppose that $p(\cdot;\delta,\eta)$ is a probability density function on $(\mathcal X,\mathcal A)$ with respect to some $\sigma$-finite measure $\mu$. We wish to test, at level $\alpha$ $(0<\alpha<1)$, the hypothesis $H_0:\delta=0$ against the alternative $H_1:\delta=\lambda$, where $\lambda$ is a specified positive constant. We also wish to give a sufficient condition for a test to be locally minimax in the sense of (5.7) below.

This is a local theory, in the sense that $p(x;\lambda,\eta)$ is close to $p(x;0,\eta)$ when $\lambda$ is small. Consequently, every test of level $\alpha$ would be locally minimax under the trivial criterion obtained by not subtracting $\alpha$ in the numerator and denominator of (5.7). Our method of proving (5.7) consists merely of examining local power behaviour with enough accuracy to obtain an approximate version of a classical result: a Bayes procedure with constant risk is minimax. A result of this type can be proved under various types of conditions. We choose a form that is convenient in many applications, and state other possible generalizations and simplifications as remarks.

Throughout this section, expressions like $o(\lambda)$ and $o(h(\lambda))$ are to be interpreted as $\lambda\to0$. For fixed $\alpha$, $0<\alpha<1$, we consider critical regions of the form

$$R=\{x:U(x)\ge c_\alpha\},\qquad(5.4)$$

where $U$ satisfies the following conditions:

- $U$ is bounded and positive;
- $U$ has a continuous distribution function for each $(\delta,\eta)$, equicontinuous in $(\delta,\eta)$ for $\delta<$ some $\delta_0$;
- $U$ satisfies

$$P_{\lambda,\eta}(R)=\alpha+h(\lambda)+q(\lambda,\eta),\qquad(5.5)$$

where $q(\lambda,\eta)=o(h(\lambda))$ uniformly in $\eta$, with $h(\lambda)>0$ for $\lambda>0$ and $h(\lambda)=o(1)$.

We shall be concerned with a priori probability density functions $\xi_{0,\lambda}$ and $\xi_{1,\lambda}$ on the sets $\delta=0$ and $\delta=\lambda$, respectively, for which

$$\frac{\int p(x;\lambda,\eta)\,\xi_{1,\lambda}(d\eta)}{\int p(x;0,\eta)\,\xi_{0,\lambda}(d\eta)}=1+h(\lambda)[g(\lambda)+r(\lambda)U(x)]+B(x,\lambda),\qquad(5.6)$$

where, for $\lambda$ sufficiently small, $0<c_1<r(\lambda)<c_2<\infty$, $g(\lambda)=O(1)$, and $B(x,\lambda)=o(h(\lambda))$ uniformly in $x$.

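Theorem 5.1.1. (Locally Minimax Tests). Suppose the region $R$ satisfies (5.5) and, for sufficiently small $\lambda$, there exist $\xi_{0,\lambda}$ and $\xi_{1,\lambda}$ satisfying (5.6). Then $R$ is locally minimax of level $\alpha$ for testing $H_0:\delta=0$ against $H_1:\delta=\lambda$ as $\lambda\to0$; that is to say,

$$\lim_{\lambda\to0}\frac{\inf_\eta P_{\lambda,\eta}(R)-\alpha}{\sup_{\phi_\lambda\in Q_\alpha}\inf_\eta P_{\lambda,\eta}\{\phi_\lambda\text{ rejects }H_0\}-\alpha}=1,\qquad(5.7)$$

where $Q_\alpha$ is the class of all level $\alpha$ tests $\phi_\lambda$.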

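Proof. Let $\tau_\lambda=\{2+h(\lambda)[g(\lambda)+c_\alpha r(\lambda)]\}^{-1}$. Then

$$\frac{1-\tau_\lambda}{\tau_\lambda}=1+h(\lambda)[g(\lambda)+c_\alpha r(\lambda)].\qquad(5.8)$$

Consider the a priori distribution $\xi_\lambda=(1-\tau_\lambda)\xi_{0,\lambda}+\tau_\lambda\xi_{1,\lambda}$ and $(0,1)$ loss. The Bayes test rejects when the ratio in (5.6) exceeds $(1-\tau_\lambda)/\tau_\lambda$. Using (5.6) and (5.8), its critical region is therefore

$$B_\lambda=\Big\{x:r(\lambda)[U(x)-c_\alpha]+\frac{B(x,\lambda)}{h(\lambda)}>0\Big\}.\qquad(5.9)$$

For any subset $A$ define

$$P^*_{0,\lambda}(A)=\int P_{0,\eta}(A)\,\xi_{0,\lambda}(d\eta),\quad P^*_{1,\lambda}(A)=\int P_{\lambda,\eta}(A)\,\xi_{1,\lambda}(d\eta),\quad V_\lambda=R-B_\lambda,\quad W_\lambda=B_\lambda-R.\qquad(5.10)$$

We have $\sup_x|B(x,\lambda)|/h(\lambda)=o(1)$, and the distribution function of $U$ is continuous by assumption. Together these give

$$P^*_{0,\lambda}(V_\lambda+W_\lambda)=o(1).\qquad(5.11)$$

Also, for $U_\lambda=V_\lambda$ or $W_\lambda$,

$$P^*_{1,\lambda}(U_\lambda)=P^*_{0,\lambda}(U_\lambda)[1+O(h(\lambda))].\qquad(5.12)$$

Let $r^*_\lambda(A)=(1-\tau_\lambda)P^*_{0,\lambda}(A)+\tau_\lambda(1-P^*_{1,\lambda}(A))$. Using (5.9), (5.10) and (5.11), the integrated Bayes risk relative to $\xi_\lambda$ is given by

$$\begin{aligned}r^*_\lambda(B_\lambda)&=r^*_\lambda(R)+(1-\tau_\lambda)[P^*_{0,\lambda}(W_\lambda)-P^*_{0,\lambda}(V_\lambda)]+\tau_\lambda[P^*_{1,\lambda}(V_\lambda)-P^*_{1,\lambda}(W_\lambda)]\\&=r^*_\lambda(R)+(1-2\tau_\lambda)[P^*_{0,\lambda}(W_\lambda)-P^*_{0,\lambda}(V_\lambda)]+O\big(P^*_{0,\lambda}(V_\lambda+W_\lambda)\,h(\lambda)\big)\\&=r^*_\lambda(R)+o(h(\lambda)).\end{aligned}\qquad(5.13)$$

Suppose (5.7) were false. Then, by (5.5), one could find a family of level $\alpha$ tests $\{\phi_\lambda\}$ such that $\phi_\lambda$ has power function $\alpha+g(\lambda,\eta)$ on the set $\delta=\lambda$, with

$$\limsup_{\lambda\to0}\frac{\inf_\eta g(\lambda,\eta)-h(\lambda)}{h(\lambda)}>0.$$

The integrated risk $r'_\lambda$ of $\phi_\lambda$ with respect to $\xi_\lambda$ would then satisfy

$$\limsup_{\lambda\to0}\frac{r^*_\lambda(R)-r'_\lambda}{h(\lambda)}>0,$$

contradicting (5.13).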

Remarks.

(1) Let the set $\{\delta=0\}$ be a single point and the set $\{\delta=\lambda\}$ be a convex finite-dimensional Euclidean set in which each component $\eta_i$ of $\eta$ is $o(h(\lambda))$. Suppose that

$$\frac{p(x;\lambda,\eta)}{p(x;0,\eta)}=1+h(\lambda)\Big[U(x)+\sum_is_i(x)\eta_i\Big]+B(x,\lambda,\eta),\qquad(5.14)$$

where the $s_i$ are bounded and $\sup_{x,\eta}B(x,\lambda,\eta)=o(h(\lambda))$. If there exists any $\xi_{1,\lambda}$ satisfying (5.6), then the degenerate measure $\xi^*_{1,\lambda}$, which assigns all its mass to the mean of $\xi_{1,\lambda}$, also satisfies (5.6).

(2) The assumption on $B$ can be weakened to $P_{0,\eta}\{|B(x,\lambda)|>\epsilon h(\lambda)\}\to0$ as $\lambda\to0$, uniformly in $\eta$, for each $\epsilon>0$. If the $\xi_{i,\lambda}$ are independent of $\lambda$, the uniformity of this condition is unnecessary. The boundedness of $U$ and the equicontinuity of its distribution can be weakened in the same way.

(3) The conclusion of Theorem 5.1.1 also holds if $Q_\alpha$ is enlarged to include every family $\{\phi_\lambda\}$ of tests of level $\alpha+o(h(\lambda))$. One can similarly consider the optimality of a family of regions $R_\lambda=\{x:U_\lambda(x)\ge c_{\alpha,\lambda}\}$, built from a family $\{U_\lambda\}$ rather than a single $U$, replacing $R$ by $R_\lambda$ with $q_\lambda(\eta)=o(h(\lambda))$.

(4) We can refine (5.7) by including one or more error terms. Specifically, consider a level $\alpha$ critical region $R$ which satisfies (5.7) with $\inf_\eta P_{\lambda,\eta}(R)=\alpha+c_1\lambda+o(\lambda)$ as $\lambda\to0$. One may ask whether it also satisfies

$$\lim_{\lambda\to0}\frac{\inf_\eta P_{\lambda,\eta}(R)-\alpha-c_1\lambda}{\sup_{\phi_\lambda\in Q_\alpha}\inf_\eta P_{\lambda,\eta}\{\phi_\lambda\text{ rejects }H_0\}-\alpha-c_1\lambda}=1.$$

In the setting of (5.14) this involves two moments of the a priori distribution $\xi_{1,\lambda}$, rather than just one as in Theorem 5.1.1. Each further refinement brings in more moments.

The theory of locally minimax tests developed above serves two purposes, as does the theory of asymptotically minimax tests (far in distance from the null hypothesis), which is developed later in this chapter. The first is the obvious one of demonstrating such properties for their own sake. However, well-known and valid doubts have been raised about the meaningfulness of such properties. The second purpose is, in our opinion, the more important: these properties indicate what to look for in the way of genuine minimax or admissibility properties of certain tests, even though those properties do not follow from the local or asymptotic ones.

Example 5.1.1. ($T^2$-test). Consider Problem 1 of Chapter 4. Let $X^1,\dots,X^N$ be independently and identically distributed $N_p(\mu,\Sigma)$ random vectors. Write $N\bar X=\sum_{\alpha=1}^NX^\alpha$ and $S=\sum_{\alpha=1}^N(X^\alpha-\bar X)(X^\alpha-\bar X)'$, and let $\delta=N\mu'\Sigma^{-1}\mu$. For testing $H_0:\delta=0$ against the alternatives $H_1:\delta>0$, Hotelling's $T^2$ test is UMPI (Theorem 4.1.1). It rejects $H_0$ whenever $R=N\bar X'(S+N\bar X\bar X')^{-1}\bar X\ge c$, where $c$ is chosen to yield level $\alpha$.

Note. For notational convenience we write $R_1$ as $R$.

Let $\delta=\lambda>0$ (specified). We wish to find the locally minimax test of $H_0$ against $H_1:\delta=\lambda$ as $\lambda\to0$, in the sense of (5.7). We assume that $N>p$, since it is easily shown that the denominator of (5.7) is zero in the degenerate case $N\le p$. In searching for a locally minimax test as $\lambda\to0$, we may restrict attention to the space of the sufficient statistic $(\bar X,S)$.

The general linear group $G_l(p)$ of nonsingular matrices of order $p$, operating as $(\bar x,s;\mu,\Sigma)\to(g\bar x,gsg';g\mu,g\Sigma g')$, leaves this problem invariant. However, as discussed earlier (see also James and Stein, 1960, p. 376), the Hunt–Stein theorem cannot be applied to $G_l(p)$, $p\ge2$. It does apply to the subgroup $G_T(p)$ of nonsingular lower triangular matrices of order $p$. Thus, for each $\lambda$, there is a level $\alpha$ test which maximizes, among all level $\alpha$ tests, the minimum power under $H_1$. This test is almost invariant under $G_T(p)$, and hence, for this problem, invariant under $G_T(p)$ (see Lehmann, 1959, p. 225).

From the local point of view, the denominator of (5.7) is unchanged by restricting to $G_T$-invariant tests. Moreover, for any level $\alpha$ test $\phi$ there is a $G_T$-invariant level $\alpha$ test $\phi'$ for which $\inf_\eta P_{\lambda,\eta}\{\phi'\text{ rejects }H_0\}$ is at least as large. So a procedure which is locally minimax among $G_T$-invariant level $\alpha$ tests is locally minimax among all level $\alpha$ tests. In place of the one-dimensional maximal invariant $R$ under $G_l(p)$, we obtain under $G_T(p)$ a $p$-dimensional maximal invariant $(R_1,\dots,R_p)$. It is defined in Theorem 2.8.1 with $k=p$ and $d_i=1$ for all $i$, and satisfies

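$$\sum_{j=1}^iR_j=N\bar X_{[i]}'(S_{[i]}+N\bar X_{[i]}\bar X_{[i]}')^{-1}\bar X_{[i]},\qquad i=1,\dots,p,\qquad(5.16)$$

with $R_i\ge0$ and $\sum_{j=1}^pR_j=R<1$. The corresponding maximal invariant on the parametric space of $(\mu,\Sigma)$ is $(\delta_1,\dots,\delta_p)$ (Theorem 2.8.1), where

$$\sum_{j=1}^i\delta_j=N\mu_{[i]}'\Sigma_{[i]}^{-1}\mu_{[i]},\qquad i=1,\dots,p,\qquad(5.17)$$

with $\delta_i\ge0$ and $\sum_{i=1}^p\delta_i=\delta$. The nuisance parameter in this reduced setup is $\eta=(\eta_1,\dots,\eta_p)'$, with $\eta_i=\delta_i/\delta\ge0$ and $\sum_{i=1}^p\eta_i=1$; under $H_0$, $\eta=0$. From Theorem 2.8.1, the Lebesgue density of $(R_1,\dots,R_p)$ on the set $\{(r_1,\dots,r_p):r_i>0,\ \sum_{i=1}^pr_i<1\}$ is given by

$$f(r_1,\dots,r_p)=\frac{\Gamma(\frac12N)}{\Gamma(\frac12(N-p))\,\pi^{p/2}}\Big(1-\sum_{i=1}^pr_i\Big)^{\frac12(N-p)-1}\prod_{i=1}^pr_i^{-1/2}\exp\Big\{-\frac12\delta+\frac12\sum_{j=1}^pr_j\sum_{i>j}\delta_i\Big\}\prod_{j=1}^p{}_1F_1\Big(\frac12(N-j+1);\frac12;\frac12r_j\delta_j\Big),\qquad(5.18)$$

where ${}_1F_1(a;b;x)=\sum_{k=0}^\infty\frac{(a)_k\,x^k}{(b)_k\,k!}$ is the confluent hypergeometric function.

We now verify that the conditions of Theorem 5.1.1 hold for $U=\sum_{i=1}^pR_i=R$. The regularity conditions preceding (5.5) are obvious. In (5.5) we take $h(\lambda)=b\lambda$ with $b$ a positive constant. Of course, the power $P_{\lambda,\eta}(R)$ of the $R$-test does not depend on $\eta$. From (5.18) we get, with $r=(r_1,\dots,r_p)'$,

$$\frac{f_{\lambda,\eta}(r)}{f_{0,\eta}(r)}=1+\frac\lambda2\Big[-1+\sum_{j=1}^pr_j\Big(\sum_{i>j}\eta_i+(N-j+1)\eta_j\Big)\Big]+B(r,\lambda,\eta),\qquad(5.19)$$

where $B(r,\lambda,\eta)=o(\lambda)$ uniformly in $r,\eta$ as $\lambda\to0$. Equation (5.6) is satisfied by letting $\xi_{0,\lambda}$ give measure one to the single point $\eta=0$, while $\xi_{1,\lambda}$ gives measure one to the single point $\eta=\eta^*=(\eta_1^*,\dots,\eta_p^*)'$, where

$$\eta_j^*=(N-j)^{-1}(N-j+1)^{-1}Np^{-1}(N-p),\qquad j=1,\dots,p.$$

With this choice, $\sum_{i>j}\eta_i^*+(N-j+1)\eta_j^*=N/p$ for all $j$.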
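Two computational points of Example 5.1.1 can be sketched in code:

- Let $L$ be the Cholesky factor of $S+N\bar X\bar X'$. Because $L$ is lower triangular, the partial sums in (5.16) are partial sums of $w_i^2$, where $w=L^{-1}\sqrt N\bar X$. Hence $R_i=w_i^2$.
- The weights $\eta_j^*$ make every coefficient $\sum_{i>j}\eta_i^*+(N-j+1)\eta_j^*$ equal to $N/p$. This turns the bracket in (5.19) into $-1+\frac NpR$.

The pure-Python sketch below checks both facts, using exact fractions for the weights. The function names are hypothetical.

```python
import math
from fractions import Fraction

def cholesky(a):
    """Lower-triangular L with L L' = a (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def gt_maximal_invariant(xbar, S, N):
    """(R_1, ..., R_p) of (5.16): R_i = w_i**2 with w = L^{-1} sqrt(N) xbar,
    where L L' = S + N xbar xbar' (forward substitution)."""
    p = len(xbar)
    A = [[S[i][j] + N * xbar[i] * xbar[j] for j in range(p)] for i in range(p)]
    L = cholesky(A)
    w = []
    for i in range(p):
        b = math.sqrt(N) * xbar[i]
        w.append((b - sum(L[i][k] * w[k] for k in range(i))) / L[i][i])
    return [wi * wi for wi in w]

def eta_star(N, p):
    """Weights eta*_j = N (N - p) / (p (N - j)(N - j + 1)), j = 1..p, as exact fractions."""
    return [Fraction(N * (N - p), p * (N - j) * (N - j + 1)) for j in range(1, p + 1)]

def bracket_coefficient(eta, N, j):
    """Coefficient of r_j in (5.19): sum_{i>j} eta_i + (N - j + 1) eta_j (j is 1-based)."""
    return sum(eta[j:]) + (N - j + 1) * eta[j - 1]
```

For $N=10$ and $p=3$ the weights are $7/27$, $35/108$ and $5/12$. They sum to one, and every coefficient equals $10/3=N/p$.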

From Theorem 5.1.1 we get the following theorem.

Theorem 5.1.2. For every $p,N,\alpha$ $(N>p)$, Hotelling's $T^2$ test, based on $T^2$ or equivalently on $R$, is locally minimax for testing $H_0:\delta=0$ against the alternatives $H_1:\delta=\lambda$ as $\lambda\to0$.

Example 5.1.3. Consider Problem 2 of Chapter 4. From Example 5.1.2, the maximal invariant in the sample space is $(R_1,\dots,R_{p_1})$, with $\bar R_1=\sum_{j=1}^{p_1}R_j$. The corresponding maximal invariant in the parametric space is $(\delta_1,\dots,\delta_{p_1})$, with $\bar\delta_1=\sum_{j=1}^{p_1}\delta_j$. Under $H_0$, $\bar\delta_1=0$, and under $H_1$, $\bar\delta_1=\lambda$. Following Example 5.1.2, we now prove Theorem 5.1.3.

Theorem 5.1.3. For every $p_1,N,\alpha$, the UMPI test which rejects $H_0$ for large values of $\bar R_1$ is locally minimax for testing $H_0:\bar\delta_1=0$ against the alternative $H_1:\bar\delta_1=\lambda$ as $\lambda\to0$.

Example 5.1.4. Consider Problem 3 of Chapter 4. We wish to find the locally minimax test of $H_0:\bar\delta_1=0,\ \bar\delta_2=0$ against the alternatives $H_1:\bar\delta_1=0,\ \bar\delta_2=\lambda>0$. As usual we assume that $N>p$, so that the denominator of (5.7) is not zero. To find the locally minimax test we restrict attention to the space of the sufficient statistic $(\bar X,S)$. The group $(G_2,T_2)$, operating as $(\bar X,S)\to(g\bar X+t,gSg')$ with $g\in G_2$ and $t\in T_2$, leaves the problem invariant. However, it does not satisfy the conditions of the Hunt–Stein theorem. The theorem does hold for the subgroup $(G_T(p_1+p_2),T_2)$. A maximal invariant in the sample space under this subgroup is $(R_1,\dots,R_{p_1+p_2})$, such that

$$\sum_{j=1}^iR_j=N\bar X_{[i]}'(S_{[i]}+N\bar X_{[i]}\bar X_{[i]}')^{-1}\bar X_{[i]},\qquad i=1,\dots,p_1+p_2,\qquad(5.20)$$

with $R_i\ge0$, $\bar R_1=\sum_{j=1}^{p_1}R_j$ and $\bar R_1+\bar R_2=\sum_{j=1}^{p_1+p_2}R_j$. The corresponding maximal invariant in the parametric space is $(\delta_1,\dots,\delta_{p_1+p_2})$, such that

$$\sum_{j=1}^i\delta_j=N\mu_{[i]}'\Sigma_{[i]}^{-1}\mu_{[i]},\qquad i=1,\dots,p_1+p_2,\qquad(5.21)$$

with $\delta_i\ge0$, $\bar\delta_1=\sum_{j=1}^{p_1}\delta_j$ and $\bar\delta_1+\bar\delta_2=\sum_{j=1}^{p_1+p_2}\delta_j$. Under $H_0$, $\delta_i=0$ for all $i$. Under $H_1$, $\delta_i=0$ for $i=1,\dots,p_1$, and $\bar\delta_2=\lambda>0$.

The nuisance parameter in this reduced setup is $\eta=(\eta_1,\dots,\eta_{p_1+p_2})'$, with $\eta_i=\delta_i/\lambda$. Under $H_0$, $\eta=0$. Under $H_1$, $\eta_i=0$ for $i=1,\dots,p_1$, and $\eta_i\ge0$ for $i=p_1+1,\dots,p_1+p_2$, with $\sum_{i=p_1+1}^{p_1+p_2}\eta_i=1$. In this case Equation (5.19) reduces to

$$\frac{f_{\lambda,\eta}(r)}{f_{0,\eta}(r)}=1+\frac\lambda2\Big[-1+\bar r_1+\sum_{j=p_1+1}^{p_1+p_2}r_j\Big(\sum_{i>j}\eta_i+(N-j+1)\eta_j\Big)\Big]+B(r,\eta,\lambda)\qquad(5.22)$$

as $\lambda\to0$. Here $\bar r_1=\sum_{j=1}^{p_1}r_j$ and $B(r,\eta,\lambda)=o(\lambda)$ uniformly in $r,\eta$. The set $\{\bar\delta_1=0,\bar\delta_2=0\}$ is the single point $\eta=0$, so $\xi_{0,\lambda}$ assigns measure one to $\eta=0$. The set $\{\bar\delta_1=0,\bar\delta_2=\lambda\ (\text{fixed})\}$ is a convex $p_2$-dimensional Euclidean set in which each component $\eta_i$ is $o(h(\lambda))$. Hence any probability measure $\xi_{1,\lambda}$ can be replaced by the degenerate measure which assigns measure one to the mean $(0,\dots,0,\bar\eta_{p_1+1},\dots,\bar\eta_{p_1+p_2})$ of $\xi_{1,\lambda}$. From (5.22) it follows that

$$\int\frac{f_{\lambda,\eta}(r)}{f_{0,\eta}(r)}\,\xi_{1,\lambda}(d\eta)=1+\frac\lambda2\Big[-1+\bar r_1+\sum_{j=p_1+1}^{p_1+p_2}r_j\Big(\sum_{i>j}\bar\eta_i+(N-j+1)\bar\eta_j\Big)\Big]+B(r,\lambda),\qquad(5.23)$$

where $B(r,\lambda)=o(h(\lambda))$ uniformly in $r$. Let $R_k$ be the rejection region defined by

$$R_k=\{x:U(x)=\bar R_1+k\bar R_2\ge C_\alpha\},\qquad(5.24)$$

where $k$ is chosen so that (5.23) takes the form (5.6), and the constant $C_\alpha$ depends on the level $\alpha$ of the test for the chosen $k$. Choose

$$\eta_j^*=\frac{(N-p_1)(N-p_1-p_2)}{p_2(N-j)(N-j+1)},\quad j=p_1+1,\dots,p_1+p_2,\qquad k=\frac{N-p_1}{p_2},\qquad(5.25)$$

so that $\sum_{i>j}\eta_i^*+(N-j+1)\eta_j^*=(N-p_1)/p_2$ for $j=p_1+1,\dots,p_1+p_2$. The test $\phi^*$ with rejection region

$$R^*=\Big\{x:U(x)=\bar R_1+\frac{N-p_1}{p_2}\bar R_2\ge C_\alpha\Big\},\qquad(5.26)$$

with $P_{0,\lambda}(R^*)=\alpha$, satisfies (5.6) as $\lambda\to0$.
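The choice (5.25) can be checked with exact arithmetic. In the sketch below (the helper names are hypothetical), the first $p_1$ weights are zero. The coefficient of each of the first $p_1$ components in (5.22) is therefore $\sum_{i>p_1}\eta_i=1$, and the sketch confirms that the remaining coefficients all equal $k=(N-p_1)/p_2$. The bracket in (5.23) thus becomes $-1+\bar r_1+k\bar r_2$.

```python
from fractions import Fraction

def eta_star_partial(N, p1, p2):
    """Weights of (5.25): zero for j <= p1, then
    (N - p1)(N - p1 - p2) / (p2 (N - j)(N - j + 1)) for j = p1 + 1, ..., p1 + p2."""
    p = p1 + p2
    eta = [Fraction(0)] * p1
    eta += [Fraction((N - p1) * (N - p), p2 * (N - j) * (N - j + 1))
            for j in range(p1 + 1, p + 1)]
    return eta

def coefficients(eta, N):
    """Coefficient of r_j in (5.22): sum_{i>j} eta_i + (N - j + 1) eta_j, j = 1..p."""
    p = len(eta)
    return [sum(eta[j:]) + (N - j + 1) * eta[j - 1] for j in range(1, p + 1)]
```

With $N=12$, $p_1=2$ and $p_2=3$, the coefficients come out as $1,1,10/3,10/3,10/3$, which matches $k=(12-2)/3$.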

Furthermore, any region $R_k$ of the form (5.24) must have $k=(N-p_1)/p_2$ in order to satisfy (5.6) for some $\xi_{1,\lambda}$. By Theorem 4.1.4, for any invariant region $R^*$, $P_{\lambda,\eta}(R^*)$ depends only on $\lambda$. Hence, from (5.5), $q(\lambda,\eta)=0$, and $\phi^*$ is LBI for this problem as $\lambda\to0$. Hotelling's $T^2$ test, based on $\bar R_1+\bar R_2$, does not coincide with $\phi^*$ and is therefore locally worse. The power function of Hotelling's test depends only on $\bar\delta_2$ ($\bar\delta_1$ being zero), and it is easy to verify that it has a positive derivative everywhere, in particular at $\bar\delta_2=0$. Thus, from (5.5) with $R=R^*$, $h(\lambda)>0$. Hence we get the following theorem.

Theorem 5.1.4. For Problem 3 the LBI test $\phi^*$ is locally minimax as $\lambda\to0$.

Remarks. Hotelling's $T^2$ test and the likelihood ratio test are also invariant under $(G_T,T_2)$, so their power functions are functions of $\bar\delta_2$ only. The above theorem implies that neither of these two tests maximizes the derivative of the power function at $\bar\delta_2=0$. Hence Hotelling's $T^2$ test and the likelihood ratio test are not locally minimax for this problem.

Example 5.1.5. Consider Problem 6 of Chapter 4. The affine group $(G,T)$ transforms $(\bar X,S;\mu,\Sigma)\to(g\bar X+t,gSg';g\mu+t,g\Sigma g')$, where $g\in G$ and $t\in T$. It leaves invariant the problem of testing $H_0:\rho^2=0$ against $H_1:\rho^2=\lambda>0$ (specified). However, consider the subgroup $(G_T(p),T)$, where $G_T(p)$ is the multiplicative group of nonsingular lower triangular matrices of order $p$ whose first column contains only zeros except for the first element. This subgroup satisfies the Hunt–Stein conditions. The translation group $T$ reduces the mean $\mu$ to zero, and $S=\sum_{\alpha=1}^NX^\alpha X^{\alpha\prime}$ is then sufficient, with $N$ reduced by one from its original value. We treat this latter formulation, considering $X^1,\dots,X^N$ to have zero mean and positive definite covariance matrix $\Sigma$. For invariance we consider only the subgroup $G_T(p)$. A maximal invariant in the sample space under $G_T(p)$ is $R=(R_2,\dots,R_p)'$, where

$$\sum_{j=2}^iR_j=\frac{S_{(12)[i-1]}S_{(22)[i-1]}^{-1}S_{(21)[i-1]}}{S_{11}},\qquad i=2,\dots,p.\qquad(5.27)$$

Here $C_{[i]}$ denotes the upper left-hand corner submatrix of $C$ of order $i$, $b_{[i]}$ denotes the $i$-vector consisting of the first $i$ components of a vector $b$, and

$$S=\begin{pmatrix}S_{11}&S_{(12)}\\S_{(21)}&S_{(22)}\end{pmatrix},$$

where $S_{(22)}$ is a square matrix of order $p-1$. As usual we assume that $N>p$, which implies that $S$ is positive definite with probability one. Hence $R_i>0$ and $\sum_{j=2}^pR_j=R^2<1$, where $R^2$ is the squared sample multiple correlation coefficient. The corresponding maximal invariant in the parametric space is $\Delta=(\delta_2,\dots,\delta_p)'$, where

$$\sum_{j=2}^i\delta_j=\frac{\Sigma_{(12)[i-1]}\Sigma_{(22)[i-1]}^{-1}\Sigma_{(21)[i-1]}}{\Sigma_{11}},\qquad i=2,\dots,p,\qquad(5.28)$$

with $\delta_i\ge0$ and $\rho^2=\sum_{j=2}^p\delta_j$. The joint distribution of $R_2,\dots,R_p$ is given in Giri and Kiefer (1964).

[Display (5.29) is not legible in the source.]

The distribution is expressed in terms of $\gamma_j=1-\lambda\sum_{i=2}^j\eta_i$ and $\eta_j=\delta_j/\rho^2$. An expression of the form $1/(1+\cdots)$ in (5.29) is to be read as 0 when the corresponding $\delta_j=0$. From (5.29) we obtain

$$\frac{f_{\lambda,\eta}(r)}{f_{0,\eta}(r)}=1+\frac{N\lambda}2\Big[-1+\sum_{j=2}^pr_j\Big(\sum_{i>j}\eta_i+(N-j+2)\eta_j\Big)\Big]+B(r,\lambda,\eta),\qquad(5.30)$$

where $B(r,\lambda,\eta)=o(\lambda)$ uniformly in $r,\eta$. By Theorem 4.3.1, the assumptions of Theorem 5.1.1 are satisfied with $U=\sum_{j=2}^pR_j=R^2$ and $h(\lambda)=b\lambda$, $b>0$. Proceed as in Example 5.1.3: let $\xi_{0,\lambda}$ give measure one to $\eta=0$, and let $\xi_{1,\lambda}$ give measure one to $\eta^*=(\eta_2^*,\dots,\eta_p^*)'$, where

$$\eta_j^*=(N-j+1)^{-1}(N-j+2)^{-1}(p-1)^{-1}N(N-p+1),\qquad j=2,\dots,p.$$

This choice gives (5.6).
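The first-order expansion (5.30) at $\eta^*$ can be compared with the exact likelihood ratio of $R^2$ alone. With the weights $\eta_j^*$, the bracket in (5.30) becomes $-1+\frac{N}{p-1}R^2$. The exact ratio $f_\lambda(r^2)/f_0(r^2)$ follows from (4.35) with $N-1$ replaced by $N$, since $N$ has been reduced by one. The sketch below uses standard-library Python and shows that the two differ only at order $\lambda^2$. Truncating the series at 60 terms is an assumption that is adequate for small $\lambda$.

```python
import math

def r2_ratio(r2, lam, N, p, terms=60):
    """Exact f_lam(r2) / f_0(r2) for R^2 when S has N degrees of freedom
    ((4.35) with N - 1 replaced by N) and there are p - 1 regressors."""
    a, c = 0.5 * N, 0.5 * (p - 1)
    total = 0.0
    for k in range(terms):
        log_coef = (2.0 * (math.lgamma(a + k) - math.lgamma(a))
                    + math.lgamma(c) - math.lgamma(c + k) - math.lgamma(k + 1))
        total += math.exp(log_coef) * (lam * r2) ** k
    return (1.0 - lam) ** a * total

def first_order(r2, lam, N, p):
    """Right-hand side of (5.30) at eta = eta*, with the remainder B dropped."""
    return 1.0 + 0.5 * N * lam * (-1.0 + N * r2 / (p - 1))
```

For $N=20$, $p=5$ and $r^2=0.3$, the gap is about $1.4\times10^{-7}$ at $\lambda=10^{-4}$. It is roughly one hundred times larger at $\lambda=10^{-3}$, as a second-order remainder should be.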

the following theorem. T h e o r e m 5 . 1 . 5 . F o r ever?/ p,N and a the R? test which rejects H coefficient R
2 2

for

large values of the squared sample multiple correlation minimax for testing HQ against Hi : p Remarks. = A > 0.

is locally

It is not a coincidence that (5.19) and (5.29) have the same

form. (5.18) involves the ratio of noncentral to central chisquare distribution while (5.29) involves similar ratio with random noncentrality parameter. T h e first order terms in the expressions which involve only mathematical expectations of these quantities correspond each other. T h e group G

E x a m p l e 5.1.5.

Consider problems 7 and 8 of Sec. 4.3.

as defined there does not satisfy the condition of the Hunt-Stein theorem. However this theorem does apply to the solvable subgroup GT of p x p lower triangular matrices g of the form (9u 0 9 = V 0 0 0 0 \ (5.31) Spp/

322

92p From E x a m p l e 5.1.4 a maximal invariant in the sample space under GT is R ( i ? , . . . , R ) ' and the corresponding maximal invariant in the parametric
2 p

space is A = (62,...,6 )'.


P 2

From (5.29) R has a single distribution under and on the p-dimension parameter r

HIQ and H o and it has a distribution which depends continuously on the p i dimension parameter Fix under Hi\ under H ,
2X 2A

where

104

Group Invariance in Statistical

Inference

IX

= { A :tf<> 0, t = 2 , . . . , p i + l ,
PI+I

Si=0,i

= pi

+ 2,...,p,
;=2

= pf = A } , i pi (5.32)

r > = { A : tfi = 0, t = 2 , . . . , p i + l ,
2

6i>0,

+ 2,...,p,

g
P 1

ft

= 4

= A>.

i= +2

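Let $\eta = (\eta_2,\dots,\eta_p)'$ with

$$\eta_i = \begin{cases} \delta_i/\rho^2, & \text{if } \rho^2 > 0,\\ 0, & \text{if } \rho^2 = 0. \end{cases}$$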

The reduced parameter spaces $\{0\}$ and $\Gamma_{1\lambda}$ are compact, and $f_{\lambda,\eta}(r)$ is continuous in $\eta$. We therefore conclude from Wald (1950) that every minimax test for the reduced problem in terms of the maximal invariant $R$ is Bayes. Thus any test based on $R$ with constant power on $\Gamma_{1\lambda}$ is minimax for Problem 7 if and only if it is Bayes. From (5.29), as $\lambda \to 0$ we get (for testing $H_{10}$ against $H_{1\lambda}$)
PI+I

j=2

,i>j

+ B(r,A,ij)

(5.33)

where $B(r,\lambda,\eta) = o(\lambda)$ uniformly in $r$ and $\eta$. It is obvious that the assumption of Theorem 5.1.1 is satisfied with

$$U = \sum_{j=2}^{p_1+1} R_j = R_1^2, \qquad h(\lambda) = b\lambda,$$

where $b > 0$. The set $\Gamma_0$ is the single point $\eta = 0$, so $\xi_{0,\lambda}$ assigns measure one to the point $\eta = 0$. The set $\Gamma_{1\lambda}$ is a convex $p_1$-dimensional set, and the leading term of (5.33) is linear in $\eta$, with a remainder of order $o(h(\lambda))$. So, for a Bayes solution, any probability measure $\xi_{1,\lambda}$ can be replaced by the degenerate measure $\xi^*_{1,\lambda}$. This measure assigns probability one to the mean $\bar\eta = (\bar\eta_2,\dots,\bar\eta_{p_1+1})'$ of $\xi_{1,\lambda}$. Choosing $\xi^*_{1,\lambda}$ which assigns probability one to the point whose $j$th coordinate is

Some Mintmax Test in Multinormates


$$\eta_j = (N-j+1)^{-1}(N-j+2)^{-1}p_1^{-1}N(N-p_1), \qquad j = 2,\dots,p_1+1,$$

we get (5.6) from (5.33). The condition (5.5) follows from Theorem 4.4.1. Hence we prove the following theorem.

Theorem 5.1.5. For Problem 7 the likelihood ratio test of $H_{10}$, defined in (4.38), is locally minimax against the alternatives $H_{1\lambda}$ as $\lambda \to 0$.

In Problem 8, as $\lambda \to 0$ (for testing $H_{20}$ against $H_{2\lambda}$), we get

A,(r)

.V A

/o,T,(r)

- i + n + Y,

i(S'K+(

i V

.. -i+ H
2

+5(r,A,7i),

(5.34)

where $B(r,\lambda,\eta) = o(h(\lambda))$ uniformly in $r,\eta$. The set $\Gamma_{2\lambda}$ is a $p_2$-dimensional Euclidean set. Using the same argument as in Problem 7, we can write

/
I
h,n( )^o,x(dv)
r

-1

+ f, +

YI >(
r

3 +

2)%

j=pi+i

1 +

JVA

- i + f,+ 2 ^(Sf+c^-j'+iM
=

+ B(r,A) (5.35)

where $B(r,\lambda) = o(h(\lambda))$ uniformly in $r$, and $\xi_{1,\lambda}$ assigns measure one to

<)'

Let $R_k$ be the rejection region given by

$$R_k = \{x: U(x) = \bar r_1 + k\bar r_2 > c_\alpha\}, \tag{5.36}$$

where $k$ is chosen such that (5.35) is reduced to yield (5.6). Here $c_\alpha$ depends on the level of significance $\alpha$ of the test for the chosen $k$. Now choosing

(N-p + 2)(N-p+l)

( N - p + 2)p


we conclude that the test $\phi^*$ with the rejection region

$$R^* = \{x: U(x) = \bar r_1 + k\bar r_2 > c_\alpha\},$$

with $k$ as chosen above and $P_{0,\lambda}(R^*) = \alpha$, satisfies (5.6) as $\lambda \to 0$. Moreover, any region $R_k$ of the form (5.36) must have this value of $k$ in order to satisfy (5.6) for some $\xi_{1,\lambda}$. From (4.37) it is easy to conclude that the test $\phi^*$ is LBI for Problem 8 as $\lambda \to 0$. The $R^2$-test rejects $H_{20}$ whenever $\bar r_1 + \bar r_2 > c_\alpha$. It does not coincide with $\phi^*$ and is locally worse. From Sec. 4.3 it is evident that the power function of the $R^2$-test depends only on $\rho^2$. It also has a positive derivative everywhere, in particular at $\rho^2 = 0$. Moreover, (5.5) holds with $R = R^*$ and $h(\lambda) > 0$. So we get the following theorem.

Theorem 5.1.6. For Problem 8 the LBI test $\phi^*$ rejects $H_{20}$ whenever $\bar r_1 + k\bar r_2 > c_\alpha$, with $k$ as above and $c_\alpha$ depending on the size $\alpha$ of the test. This test is locally minimax against the alternatives $H_{2\lambda}$ as $\lambda \to 0$.

Remarks. In order for an invariant test of $H_{20}$ against $H_{2\lambda}$ to be minimax, it has to be minimax among all invariant tests also. However, for an invariant test the power function is constant on each contour $\rho^2 - \rho_1^2 = \lambda$. Hence "minimax" simply means "most powerful". The rejection region of the most powerful test is obtained from (5.34). From it, one easily concludes that $f_{\lambda,\eta}(r)/f_{0,\eta}(r)$ depends nontrivially on $\lambda$, so that no test can be most powerful for every value of $\lambda$.

5.2. Asymptotically Minimax Tests

We treat here the setting of (5.1) when $\lambda \to \infty$. Expressions like $o(1)$ and $o(H(\lambda))$ are to be interpreted as $\lambda \to \infty$. We shall be concerned here with minimizing, in the minimax sense, a probability of error which tends to zero. The reader familiar with large sample theory may recall that in this setting it is difficult to compare directly approximations to such small probabilities for different families of tests. One instead compares their logarithms. Our considerations are asymptotic in a sense not involving sample sizes, yet we encounter the same difficulty, which accounts for the form (5.39) below. Assume that the region

$$R = \{x: U(x) > c_\alpha\} \tag{5.37}$$

satisfies, in place of (5.5),


$$P_{\lambda,\eta}(R) = 1 - \exp\{-H(\lambda)(1 + o(1))\}, \tag{5.38}$$

where $H(\lambda) \to \infty$ as $\lambda \to \infty$ and the $o(1)$ term is uniform in $\eta$. Suppose that

$$\frac{\int p(x,\lambda,\eta)\,\xi_{1,\lambda}(d\eta)}{\int p(x,0,\eta)\,\xi_{0,\lambda}(d\eta)} = \exp\{H(\lambda)[G(\lambda) + R(\lambda)U(x)] + B(x,\lambda)\}, \tag{5.39}$$

where $\sup_x |B(x,\lambda)| = o(H(\lambda))$ and $0 < c_1 < R(\lambda) < c_2 < \infty$. One other regularity assumption is that $c_\alpha$ is a point of increase from the left of the distribution function of $U$ when $\delta = 0$, uniformly in $\eta$, i.e.,

$$\inf_\eta P_{0,\eta}(U > c_\alpha - \epsilon) > \alpha \tag{5.40}$$

for every $\epsilon > 0$.

Theorem 5.2.1. Suppose $U$ satisfies (5.38)-(5.40), and for sufficiently large $\lambda$ there exist $\xi_{0,\lambda}$ and $\xi_{1,\lambda}$ satisfying (5.39). Then the rejection region $R$ is asymptotically logarithmically minimax of level $\alpha$ for testing $H_0: \delta = 0$ against the alternatives $H_1: \delta = \lambda$ as $\lambda \to \infty$, i.e.,

$$\lim_{\lambda\to\infty}\frac{\inf_\eta[-\log(1 - P_{\lambda,\eta}\{R\})]}{\sup_{\phi_\lambda\in Q_\alpha}\inf_\eta[-\log(1 - P_{\lambda,\eta}\{\phi_\lambda \text{ rejects } H_0\})]} = 1. \tag{5.41}$$

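Proof. Assume that (5.41) does not hold. Then there exist an $\epsilon > 0$ and an unbounded sequence $\Gamma$ of values of $\lambda$, with corresponding tests $\phi_\lambda$ in $Q_\alpha$, whose critical regions satisfy

$$P_{\lambda,\eta}\{\phi_\lambda \text{ rejects } H_0\} \ge 1 - \exp\{-H(\lambda)(1 + 5\epsilon)\} \tag{5.42}$$

for all $\eta$. There are two possible cases, (5.43) and (5.46). If $\lambda \in \Gamma$ and

$$-1 - G(\lambda) \le R(\lambda)c_\alpha + 2\epsilon, \tag{5.43}$$

consider the a priori distribution $\xi_\lambda$ (see Theorem 5.1.1) given by $\xi_{0,\lambda}$, $\xi_{1,\lambda}$ and $\tau_\lambda$ satisfying

$$\tau_\lambda/(1 - \tau_\lambda) = \exp\{H(\lambda)(1 + 4\epsilon)\}. \tag{5.44}$$

Using (5.42) and (5.44), the integrated risk of any Bayes procedure $B_\lambda$ must satisfy

$$r^*_\lambda(B_\lambda) \le r^*_\lambda(\phi_\lambda) \le (1-\tau_\lambda)\alpha + \tau_\lambda\exp\{-H(\lambda)(1+5\epsilon)\} = (1-\tau_\lambda)[\alpha + \exp\{-\epsilon H(\lambda)\}]. \tag{5.45}$$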
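The algebra in (5.45), and in its counterpart at the end of this proof, uses only the prior odds $\tau_\lambda/(1-\tau_\lambda)$. The Python sketch below checks this numerically, with `normalised_risk_bounds` as a hypothetical helper name. After dividing by $1-\tau_\lambda$, the bound $(1-\tau_\lambda)\alpha + \tau_\lambda e^{-H(1+5\epsilon)}$ should equal $\alpha + e^{-\epsilon H}$ under (5.44) and $\alpha + e^{-4\epsilon H}$ under (5.47). Dividing first avoids the loss of precision in computing $1-\tau_\lambda$ when $H$ is large. The values of $H$, $\epsilon$ and $\alpha$ are arbitrary illustrations.

```python
import math


def normalised_risk_bounds(H: float, eps: float, alpha: float, case: int):
    """Return (direct, closed_form) for the risk bound on r*(phi_lambda), divided by (1 - tau).

    case 1: prior odds tau/(1 - tau) = exp(H(1 + 4 eps)), as in (5.44);
    case 2: prior odds tau/(1 - tau) = exp(H(1 + eps)),   as in (5.47).
    direct      = alpha + odds * exp(-H(1 + 5 eps))
    closed_form = alpha + exp(-eps H) in case 1, alpha + exp(-4 eps H) in case 2.
    """
    odds = math.exp(H * (1 + (4 * eps if case == 1 else eps)))
    direct = alpha + odds * math.exp(-H * (1 + 5 * eps))
    closed_form = alpha + math.exp(-(eps if case == 1 else 4 * eps) * H)
    return direct, closed_form


for H in (1.0, 10.0, 50.0):
    d1, c1 = normalised_risk_bounds(H, 0.1, 0.05, case=1)
    d2, c2 = normalised_risk_bounds(H, 0.1, 0.05, case=2)
    # In case 2 the lower bound (5.48), divided by (1 - tau), behaves like
    # exp(eps H) (o(1) ignored), while the upper bound tends to alpha.
    print(H, math.isclose(d1, c1), math.isclose(d2, c2), math.exp(0.1 * H) > c2)
```

The last column illustrates the contradiction used in the second case: the lower bound grows with $H$, while the upper bound stays near $\alpha$.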


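From (5.39) a Bayes critical region is

$$B_\lambda = \{x: U(x) + B(x,\lambda)/(R(\lambda)H(\lambda)) > [-(1+4\epsilon) - G(\lambda)]/R(\lambda)\}.$$

Hence, if $\lambda$ is so large that $\sup_x |B(x,\lambda)/(H(\lambda)R(\lambda))| < \epsilon/c_2$, we conclude from (5.43) that $B_\lambda \supset \{x: U(x) > c_\alpha - \epsilon/c_2\} = B'_\lambda$ (say). The assumption (5.40) implies that $P_0\{B'_\lambda\} > \alpha + \epsilon'$ with $\epsilon' > 0$, contradicting (5.45) for large $\lambda$. On the other hand, suppose $\lambda \in \Gamma$ and

$$-1 - G(\lambda) > R(\lambda)c_\alpha + 2\epsilon. \tag{5.46}$$

Then consider the a priori distribution $\xi_\lambda$ given by $\xi_{0,\lambda}$, $\xi_{1,\lambda}$ and $\tau_\lambda$ satisfying

$$\tau_\lambda/(1-\tau_\lambda) = \exp\{H(\lambda)(1+\epsilon)\}. \tag{5.47}$$

Using (5.39), a Bayes critical region is

$$B_\lambda = \{x: U(x) + B(x,\lambda)/(R(\lambda)H(\lambda)) > [-(1+\epsilon) - G(\lambda)]/R(\lambda)\}.$$

Thus, if $\sup_x |B(x,\lambda)/(R(\lambda)H(\lambda))| < \epsilon/(2c_2)$, we conclude from (5.46) that $B_\lambda \subset R$. So, by (5.38) and (5.47),

$$r^*_\lambda(B_\lambda) \ge \tau_\lambda\exp\{-H(\lambda)[1 + o(1)]\} = (1-\tau_\lambda)\exp\{H(\lambda)(\epsilon + o(1))\}. \tag{5.48}$$

Since

$$r^*_\lambda(B_\lambda) \le r^*_\lambda(\phi_\lambda) \le (1-\tau_\lambda)\alpha + \tau_\lambda\exp\{-H(\lambda)(1+5\epsilon)\} = (1-\tau_\lambda)[\alpha + \exp\{-4\epsilon H(\lambda)\}],$$

we contradict (5.48) for sufficiently large $\lambda$.

Example 5.2.1 ($T^2$-test). In Problem 1 of Chapter 4 let $U(X) = \sum_{i=1}^p R_i$. Since $\phi(a,b;x) = \exp\{x(1 + o(1))\}$ as $x \to \infty$, we get, using (5.18),

$$\frac{f_{\lambda,\eta}(r)}{f_{0,\eta}(r)} = \exp\Big\{\frac{\lambda}{2}\Big[-1 + \sum_{j=1}^{p} r_j\sum_{i\ge j}\eta_i\Big](1 + B(r,\eta,\lambda))\Big\} \tag{5.49}$$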


with $\sup_{r,\eta}|B(r,\eta,\lambda)| = o(1)$ as $\lambda \to \infty$. From (4.5), put $\eta_p = 1$ and $\eta_i = 0$ for $i < p$ in (5.49). Since the density of $U$ is independent of $\eta$, we see that

$$P_{\lambda,\eta}\{U < c_\alpha\} = \exp\Big\{\frac{\lambda}{2}(c_\alpha - 1)[1 + o(1)]\Big\}. \tag{5.50}$$

Hence (5.38) is satisfied with $H(\lambda) = \frac{\lambda}{2}(1 - c_\alpha)$. Now let $\xi_{1,\lambda}$ assign measure one to the point $\eta_1 = \cdots = \eta_{p-1} = 0$, $\eta_p = 1$, and let $\xi_{0,\lambda}$ assign measure one to $(0,\dots,0)$. This gives (5.39). Finally (5.40) is trivial. From Theorem 5.2.1 we get Theorem 5.2.2.

Theorem 5.2.2. For every $p$, $N$, $\alpha$, Hotelling's $T^2$ test is asymptotically minimax for testing $H_0: \delta = 0$ against $H_1: \delta = \lambda$ as $\lambda \to \infty$.

From Sec. 5.1 we conclude that no critical region of the form $\sum a_i R_i > c$ other than Hotelling's would have been locally minimax. Many regions of this form, however, are asymptotically minimax.

Theorem 5.2.3. If $c < 1$ and $1 = a_1 \le a_2 \le \cdots \le a_p$, then the critical region $\{\sum a_i R_i > c\}$ is asymptotically minimax among tests of the same size as $\lambda \to \infty$.

Proof. The maximum of $\sum_j r_j\sum_{i\ge j}\eta_i$ subject to $\sum a_i r_i \le c$ is achieved at $r_1 = c$, $r_2 = \cdots = r_p = 0$. Hence the integration of $f_{\lambda,\eta}(r)$ over a small region near that point yields (5.50) with $c_\alpha$ replaced by $c$. Since the $a_i$'s are nondecreasing in $j$, it follows from (5.49) that $\xi_{1,\lambda}$ can be chosen to yield (5.39) with $U = \sum a_i R_i$. Again (5.40) is trivial.
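The vertex step in this proof can be made concrete. The feasible set $\{r_i \ge 0,\ \sum a_i r_i \le c\}$ is a simplex with vertices $0$ and $(c/a_j)e_j$, so a linear objective attains its maximum at one of these vertices. The Python sketch below assumes the ordering $1 = a_1 \le a_2 \le \cdots \le a_p$ read from the theorem. It takes the objective coefficients to be the tail sums $s_j = \sum_{i\ge j}\eta_i$ appearing in (5.49). The function names and the numbers are hypothetical illustrations.

```python
from typing import List, Optional, Tuple


def tail_sums(eta: List[float]) -> List[float]:
    """s_j = sum_{i >= j} eta_i, the coefficient of r_j in the exponent of (5.49)."""
    return [sum(eta[j:]) for j in range(len(eta))]


def lp_max(s: List[float], a: List[float], c: float) -> Tuple[float, Optional[int]]:
    """Maximise sum_j s_j r_j over r_j >= 0 subject to sum_j a_j r_j <= c (all a_j > 0).

    The feasible set is the simplex with vertices 0 and (c / a_j) e_j, so the
    maximum of the linear objective is max(0, c * max_j s_j / a_j).
    Returns (value, index of the maximising vertex, or None if the origin wins).
    """
    best_val, best_j = 0.0, None
    for j, (sj, aj) in enumerate(zip(s, a)):
        val = c * sj / aj
        if val > best_val:
            best_val, best_j = val, j
    return best_val, best_j


# With eta on the simplex, s_1 = 1 >= s_2 >= ...; with 1 = a_1 <= a_2 <= ...
# the first vertex r = (c, 0, ..., 0) is optimal and the maximum equals c.
s = tail_sums([0.2, 0.3, 0.5])
print(lp_max(s, [1.0, 1.5, 2.0], 0.6))
```

If the ordering of the $a_i$ is violated, a different vertex can win, and the argument of the proof no longer goes through as stated.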

Example 5.2.2. Consider again Problem 2 of Chapter 4. From Example 5.2.1 with $p = p_1$, $R = (R_1,\dots,R_{p_1})'$ and $\eta = (\eta_1,\dots,\eta_{p_1})'$, we get

$$\frac{f_{\lambda,\eta}(r)}{f_{0,\eta}(r)} = \exp\Big\{\frac{\lambda}{2}\Big[-1 + \sum_{j=1}^{p_1} r_j\sum_{i\ge j}\eta_i\Big](1 + B(r,\lambda,\eta))\Big\} \tag{5.51}$$

with $\sup_{r,\eta}|B(r,\lambda,\eta)| = o(1)$ as $\lambda \to \infty$. Using the same argument as in the above theorem we now prove the following.

Theorem 5.2.3. For every $p_1$, $N$, $\alpha$ the UMPI test is asymptotically minimax for testing $H_0: \delta_1 = 0$ against $H_1: \delta_1 = \lambda$ as $\lambda \to \infty$.

Example 5.2.3. In Problem 3 of Chapter 4 we want to find the asymptotically minimax test of $H_0: \delta_1 = \delta_2 = 0$ against the alternatives $H_1: \delta_1 = 0$,


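$\delta_2 = \lambda$ as $\lambda \to \infty$. In the notation of Examples 5.1.4 and 5.2.1 we write $\bar r_1 = \sum_{i=1}^{p_1} r_i$, $\bar r_2 = \sum_{i=p_1+1}^{p_1+p_2} r_i$ and $\eta = (\eta_1,\dots,\eta_{p_1+p_2})'$. Under $H_0$, $\eta_i = 0$ for $i = 1,\dots,p_1+p_2$. Under $H_1$, $\eta_i = 0$ for $i = 1,\dots,p_1$. Hence

$$\frac{f_{\lambda,\eta}(r)}{f_{0,\eta}(r)} = \exp\Big\{\frac{\lambda}{2}\Big[-1 + \bar r_1 + \sum_{j=p_1+1}^{p_1+p_2} r_j\sum_{i\ge j}\eta_i\Big](1 + B(r,\lambda,\eta))\Big\} \tag{5.52}$$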

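where $\sup_{r,\eta}|B(r,\lambda,\eta)| = o(1)$ as $\lambda \to \infty$. From (4.9), when $\delta_2 = \lambda \to \infty$ and $\delta_1 = 0$, we get

$$\frac{f(\bar r_1,\bar r_2\mid\lambda)}{f(\bar r_1,\bar r_2\mid 0)} = \exp\Big\{\frac{\lambda}{2}[-1 + \bar r_1 + \bar r_2](1 + B(\bar r_1,\bar r_2,\lambda))\Big\} \tag{5.53}$$

where $\sup_{\bar r_1,\bar r_2}|B(\bar r_1,\bar r_2,\lambda)| = o(1)$ as $\lambda \to \infty$. Letting $\xi_{0,\lambda}$ in (5.39) assign measure one to the single point $\eta = 0$, we get from (5.52)

$$\frac{\int f_{\lambda,\eta}(r)\,\xi_{1,\lambda}(d\eta)}{f_{0,0}(r)} = \int \exp\Big\{\frac{\lambda}{2}\Big[-1 + \bar r_1 + \sum_{j=p_1+1}^{p_1+p_2} r_j\sum_{i\ge j}\eta_i\Big](1 + B(r,\lambda,\eta))\Big\}\,\xi_{1,\lambda}(d\eta). \tag{5.54}$$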

Let $R_k$ be the rejection region of size $\alpha$, based on $\bar R_1, \bar R_2$, given by

$$R_k = \{x: U(x) = \bar R_1 + k\bar R_2 > c_\alpha\}, \tag{5.55}$$

where $k$ is chosen such that (5.54) is reduced to yield (5.39), and $R_k$ for the chosen $k$ satisfies (5.38) and (5.40). Now let $\xi_{1,\lambda}$ assign measure one to $\eta_{p_1+1} = \cdots = \eta_{p_1+p_2-1} = 0$, $\eta_{p_1+p_2} = 1$. Then (5.54) reduces to (5.39) with $U(x) = \bar R_1 + \bar R_2$. From (5.53),

$$P(U(x) < c'_\alpha) = \exp\Big\{\frac{\lambda}{2}(c'_\alpha - 1)(1 + o(1))\Big\}. \tag{5.56}$$

Hence Hotelling's test, which rejects $H_0$ for large values of $\bar R_1 + \bar R_2$, satisfies (5.38) with $H(\lambda) = \frac{\lambda}{2}(1 - c'_\alpha)$. The fact that Hotelling's test satisfies (5.40) is trivial. The coefficient of $r_{p_1+1}$ inside the brackets in the exponent of (5.54) is one. Therefore any rejection region $R_k$ must have $k = 1$ to satisfy (5.39) for some $\xi_{1,\lambda}$. From Theorem 5.2.1, a region $R_k$ with $k \ne 1$ and which


satisfies (5.38) and (5.40) cannot be minimax as $\lambda \to \infty$. Using Theorem 4.1.5 we now prove the following.

Theorem 5.2.4. Hotelling's $T^2$ test, which is asymptotically best invariant, is asymptotically minimax for testing $H_0: \delta_1 = \delta_2 = 0$ against $H_1: \delta_1 = 0$, $\delta_2 = \lambda$ as $\lambda \to \infty$.

Remarks. From Theorem 5.2.3 it is obvious that there are other asymptotically minimax tests, not of the form $R_k$, for this problem. It is easy to see that
Xin

(l-c)- R >
2

1) = l - e x { - ( l o g A - l o g ( ( l - c ) ( l + c ( l ) ) ) } ,
P

P a

'" (

^ r

=1

{ ~

Thus the fact that the likelihood ratio test, which rejects $H_0$ whenever

~/l*fi *' 5
1

ft

and the locally minimax test satisfy (5.40) and (5.38) is obvious in both cases. But from Theorem 5.2.4 these two tests are not asymptotically minimax for this problem.

5.3. Minimax Tests

In this section we discuss the genuine minimax character of Hotelling's $T^2$ test and of the test based on the square of the sample multiple correlation coefficient $R^2$. These problems have remained elusive even in the simplest case of $p = 2$ or $3$ dimensions. For Hotelling's $T^2$ test, Simaika (1941) proved the UMP character of the $T^2$ test among all level $\alpha$ tests with power functions depending only on $\delta = N\mu'\Sigma^{-1}\mu$. In 1956 Stein proved the admissibility of the $T^2$ test. His method could not be used to prove the admissibility of the $R^2$-test. Giri, Kiefer and Stein (1964a) attacked the problem of the minimax property of Hotelling's $T^2$-test among all level $\alpha$ tests. They proved its minimax property for the very special case $p = 2$, $N = 3$ by solving a Fredholm integral equation of the first kind. This equation is transformed into an "overdetermined" linear differential equation of the first order. Linnik, Pliss and Salaevskii (1966) extended this result to $N = 4$ and $p = 2$. They used a slightly more complicated argument to construct an overdetermined boundary problem with a linear differential operator. Later Salaevskii (1969) extended this result to the general case of $p = 2$.
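To make the statistics of this section concrete, the Python sketch below computes Hotelling's $T^2 = N(N-1)\bar X'S^{-1}\bar X$ for testing $\mu = 0$. It also computes the components $R_1,\dots,R_p$, defined through the leading $j\times j$ blocks by $\sum_{i\le j}R_i = N\bar X_{[j]}'(S_{[jj]} + N\bar X_{[j]}\bar X_{[j]}')^{-1}\bar X_{[j]}$. The sketch assumes that $S$ is the matrix of sums of squares and products about the mean, not divided by $N-1$. By the Sherman-Morrison formula, $\sum_{i=1}^p R_i = T^2/(N-1+T^2)$, so Hotelling's test rejects for large $\sum R_i$. The sketch checks this identity numerically. Pure-Python linear algebra keeps the example self-contained, and the data are made up.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A x = b by Gaussian elimination with partial pivoting (inputs not modified)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def mean_and_scatter(X: Matrix) -> Tuple[Vector, Matrix]:
    """Sample mean and S = sum_a (x_a - xbar)(x_a - xbar)' (not divided by N - 1)."""
    N, p = len(X), len(X[0])
    xbar = [sum(row[k] for row in X) / N for k in range(p)]
    S = [[sum((row[a] - xbar[a]) * (row[b] - xbar[b]) for row in X)
          for b in range(p)] for a in range(p)]
    return xbar, S


def hotelling_T2(X: Matrix) -> float:
    """T^2 = N (N - 1) xbar' S^{-1} xbar for testing mu = 0."""
    N = len(X)
    xbar, S = mean_and_scatter(X)
    y = solve(S, xbar)
    return N * (N - 1) * sum(u * v for u, v in zip(xbar, y))


def r_components(X: Matrix) -> Vector:
    """R_1..R_p with R_1 + ... + R_j = N xbar_[j]'(S_[jj] + N xbar_[j] xbar_[j]')^{-1} xbar_[j]."""
    N = len(X)
    xbar, S = mean_and_scatter(X)
    cum = []
    for j in range(1, len(xbar) + 1):
        xb = xbar[:j]
        A = [[S[a][b] + N * xb[a] * xb[b] for b in range(j)] for a in range(j)]
        y = solve(A, xb)
        cum.append(N * sum(u * v for u, v in zip(xb, y)))
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


X = [[1.0, 2.0, 0.5], [2.0, 0.0, 1.5], [0.5, 1.0, 2.5],
     [1.5, 3.0, 1.0], [3.0, 1.0, 0.0], [2.5, 2.5, 2.0]]
T2, R = hotelling_T2(X), r_components(X)
print(T2, R, sum(R), T2 / (len(X) - 1 + T2))
```

The last two printed numbers should agree. Each $R_j$ is nonnegative because adding coordinates cannot decrease the quadratic form.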


5.3.1. Hotelling's $T^2$ test

In the setting of Examples 5.1.1 and 5.2.1 we first give the details of the proof of the minimax property of Hotelling's $T^2$ test as developed by Giri, Kiefer and Stein (1964a). We then give the method of proof of Linnik, Pliss and Salaevskii (1966). In this setting a maximal invariant under $G_T(p)$ is $R = (R_1,\dots,R_p)'$ with $\sum_{i=1}^p R_i = N\bar X'(S + N\bar X\bar X')^{-1}\bar X$. The corresponding maximal invariant in the parametric space is $\Delta = (\delta_1,\dots,\delta_p)'$ with $\sum_{i=1}^p\delta_i = N\mu'\Sigma^{-1}\mu = \delta$. From (5.18), $R$ has a single distribution under $H_0$. Its distribution under $H_1: \delta = \lambda$ (fixed) depends continuously on a $(p-1)$-dimensional parameter

$$\Gamma = \Big\{\Delta = (\delta_1,\dots,\delta_p)': \delta_i \ge 0,\ \sum_{i=1}^p\delta_i = \lambda\Big\}.$$

Thus there is no UMPI test under $G_T(p)$, as there was under $G_l(p)$. Let us write $f_\Delta(r)$ for the pdf of $R$ as given in (5.18). The reduced parametric spaces $\{0\}$ and $\Gamma$ are compact, and $f_\Delta(r)$ is continuous in $\Delta$. We therefore conclude from Wald (1950) that every minimax test for the reduced problem in terms of $R$ is Bayes. In particular, consider Hotelling's $T^2$ test, which rejects $H_0$ whenever $U = \sum_{i=1}^p R_i > c$. It is also $G_T$-invariant and has a constant power function on each contour $N\mu'\Sigma^{-1}\mu = \lambda$. It maximizes the minimum power over $H_1$ if and only if there is a probability measure $\xi$ on $\Gamma$ such that, for some constant $c_1$,

$$\int_\Gamma \frac{f_\Delta(r)}{f_0(r)}\,\xi(d\Delta)\ \begin{cases} > \\ = \\ < \end{cases}\ c_1 \quad\text{according as}\quad \sum_{i=1}^p r_i\ \begin{cases} > \\ = \\ < \end{cases}\ c, \tag{5.57}$$

except possibly for a set of measure zero. Here $c$ depends on the specified level of significance $\alpha$, and $c_1$ may depend on $c$ and the specified $\lambda$. An examination of the integrand of (5.57) allows us to replace it by its equivalent

$$\int_\Gamma \frac{f_\Delta(r)}{f_0(r)}\,\xi(d\Delta) = c_1 \quad\text{if}\quad \sum_{i=1}^p r_i = c. \tag{5.58}$$

Obviously (5.57) implies (5.58). On the other hand, suppose there are a $\xi$ and a constant $c_1$ for which (5.58) is satisfied, and let $r^* = (r_1^*,\dots,r_p^*)'$ be such that $\sum r_i^* = c' > c$. Writing $f = \int_\Gamma (f_\Delta/f_0)\,\xi(d\Delta)$ and $r^{**} = c\,r^*/c'$, we conclude that

$$f(r^*) = f(c'r^{**}/c) > f(r^{**}) = c_1$$


because of the form of $f$ and the fact that $c'/c > 1$ and $\sum r_i^{**} = c$. This argument, together with a similar one for the case $c' < c$, shows that (5.58) implies (5.57). Of course we do not assert that the left-hand side of (5.58) still depends only on $\sum r_i$ if $\sum r_i \ne c$.

The computation is somewhat simplified by the fact that, for fixed $c$ and $\lambda$, we can at this point compute the unique value of $c_1$ for which (5.58) can possibly be satisfied. Let $\tilde R = (R_1,\dots,R_{p-1})'$, and let $f_\Delta(\tilde r\mid u)$ be the conditional pdf of $\tilde R$ given $\sum_{i=1}^p R_i = u$. This pdf is continuous in $\tilde r$ and $u$ for $r_i > 0$ and $\sum_{i=1}^{p-1} r_i < u < 1$. Denote by $f^*_\delta(u)$ the pdf of $U = \sum_{i=1}^p R_i$ with respect to Lebesgue measure. It is continuous for $0 < u < 1$, vanishes elsewhere, and depends on $\Delta$ only through $\delta$. Then (5.58) can be written as

$$\int_\Gamma f_\Delta(\tilde r\mid c)\,\xi(d\Delta) = \Big[\frac{c_1 f^*_0(c)}{f^*_\lambda(c)}\Big] f_0(\tilde r\mid c) \tag{5.59}$$

if $r_i > 0$ and $\sum_{i=1}^{p-1} r_i < c$. The integral in (5.59) is a probability mixture of probability densities. It is therefore itself a probability density in $\tilde r$, as is $f_0(\tilde r\mid c)$. Hence the expression inside the square brackets of (5.59) is equal to one. From (4.5)

$$c_1 = \frac{f^*_\lambda(c)}{f^*_0(c)} = e^{-\lambda/2}\,\phi\Big(\frac N2,\frac p2;\frac{c\lambda}{2}\Big). \tag{5.60}$$

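Using (5.60) we rewrite (5.58) as

$$\int_\Gamma \exp\Big\{\frac12\sum_{j=1}^p r_j\sum_{i>j}\delta_i\Big\}\prod_{j=1}^p\phi\Big(\frac{N-j+1}{2},\frac12;\frac{r_j\delta_j}{2}\Big)\,\xi(d\Delta) = \phi\Big(\frac N2,\frac p2;\frac{c\lambda}{2}\Big) \tag{5.61}$$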

for all $r$ with $r_i > 0$ and $\sum_{i=1}^p r_i = c$. Write $\Gamma_1$ for the unit $(p-1)$-simplex

$$\Gamma_1 = \Big\{(\beta_1,\dots,\beta_p): \beta_i \ge 0,\ \sum_{i=1}^p\beta_i = 1\Big\}$$


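where $\beta_i = \delta_i/\lambda$. Let $t_i = \lambda r_i$ and $\gamma = c\lambda$, and let $\xi^*$ be the probability measure on $\Gamma_1$ associated with $\xi$ on $\Gamma$. Since $r_j\delta_i = t_j\beta_i$, (5.61) reduces to

$$\int_{\Gamma_1}\exp\Big\{\frac12\sum_{j=1}^p t_j\sum_{i>j}\beta_i\Big\}\prod_{j=1}^p\phi\Big(\frac{N-j+1}{2},\frac12;\frac{t_j\beta_j}{2}\Big)\,\xi^*(d\beta) = \phi\Big(\frac N2,\frac p2;\frac{\gamma}{2}\Big) \tag{5.62}$$

for all $(t_1,\dots,t_p)$ with $t_i > 0$ and $\sum t_i = \gamma$. Hence, by analyticity, it holds for all $(t_1,\dots,t_p)$ with $\sum t_i = \gamma$.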

From (5.62) it is evident that such a $\xi^*$, if it exists, depends on $c$ and $\lambda$ only through their product $\gamma$. Note that for $p = 1$, $\Gamma_1$ is a single point, but in other cases the dependence on $\gamma$ is genuine.

Case $p = 2$, $N = 3$. Solution of Giri, Kiefer and Stein

Since $\phi(\frac32,\frac12;\frac x2) = (1 + x)\exp(\frac x2)$, from (5.62) we obtain

$$\int_0^1[1 + (\gamma - t_2)\beta_1]\,\phi\Big(1,\frac12;\frac{t_2\beta_2}{2}\Big)\,\xi^*(d\beta) = e^{(t_2-\gamma)/2}\,\phi\Big(\frac32,1;\frac\gamma2\Big) \tag{5.63}$$

with $t_1 = \gamma - t_2$ and $\beta_1 = 1 - \beta_2$. We could now try to solve (5.63) for $\xi^*$ by using the theory of Meijer transforms with kernel $\phi(1,\frac12;\frac12 x)$. Instead we expand both sides of (5.63) as power series in $t_2$. Let

$$\mu_i = \int_0^1 \beta_2^i\,\xi^*(d\beta)$$

be the $i$th moment of $\beta_2$. From (5.63) we obtain

$$\text{(a)}\quad 1 + \gamma - \gamma\mu_1 = B,$$
$$\text{(b)}\quad -(2r-1)\mu_{r-1} + (2r+\gamma)\mu_r - \gamma\mu_{r+1} = B\,\frac{\Gamma(r+\frac12)}{\Gamma(\frac12)\Gamma(r+1)}, \quad r \ge 1, \tag{5.64}$$

where $B = e^{-\gamma/2}\phi(\frac32,1;\frac\gamma2)$. We could now try to show that the sequence $\{\mu_i\}$ given by (5.64) satisfies the classical necessary and sufficient condition for a moment sequence of a probability measure on $[0,1]$. Equivalently, we could try to show that the Laplace transform $\sum_{i=0}^\infty\mu_i(-t)^i/i!$ is completely monotone on $[0,\infty)$. We have been unable to proceed successfully in this way. Instead, we shall obtain a function $m_\gamma(x)$, which we prove below to be the Lebesgue density $d\xi^*(x)/dx$ of an absolutely continuous probability measure $\xi^*$ satisfying (5.64) and hence (5.63). The generating function

$$\psi(t) = \sum_{i=0}^\infty \mu_i t^i$$

of the sequence $\{\mu_i\}$ satisfies a differential equation. It is obtained by multiplying (5.64)(b) by $t^{r-1}$ and summing from 1 to $\infty$:

$$2t^2(1-t)\psi'(t) - (t^2 - \gamma t + \gamma)\psi(t) = Bt[(1-t)^{-1/2} - 1] + \gamma[t(1-\mu_1) - 1] = Bt(1-t)^{-1/2} - t - \gamma. \tag{5.65}$$
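The passage from (5.63) to (5.64) is coefficient bookkeeping. Multiply the coefficient of $t_2^r$ on the left of (5.63) by $2^r(\frac12)_r$. This gives $-(2r-1)\mu_{r-1} + (2r+\gamma)\mu_r - \gamma\mu_{r+1}$. The same operation applied to $Be^{t_2/2}$ gives $B(\frac12)_r/r! = B\Gamma(r+\frac12)/(\Gamma(\frac12)\Gamma(r+1))$. The Python sketch below verifies the left-hand identity exactly, in rational arithmetic, using $\phi(1,\frac12;x) = \sum_k x^k/(\frac12)_k$. It uses an arbitrary two-point test distribution for $\beta_2$, chosen only for illustration; it is not the least favourable $\xi^*$. The helper names are hypothetical.

```python
from fractions import Fraction
from typing import List

HALF = Fraction(1, 2)


def poch(a: Fraction, k: int) -> Fraction:
    """Pochhammer symbol (a)_k = a (a + 1) ... (a + k - 1)."""
    out = Fraction(1)
    for i in range(k):
        out *= a + i
    return out


def lhs_coeffs(points, weights, gamma, R: int) -> List[Fraction]:
    """Coefficients of t^0..t^R in E[(1 + (gamma - t)(1 - b)) * phi(1, 1/2; t b / 2)].

    b plays the role of beta_2 and takes the values `points` with probabilities
    `weights`; phi(1, 1/2; x) = sum_k x^k / (1/2)_k.
    """
    coeffs = [Fraction(0)] * (R + 1)
    for b, w in zip(points, weights):
        phi = [(b / 2) ** k / poch(HALF, k) for k in range(R + 1)]
        for r in range(R + 1):
            term = (1 + gamma * (1 - b)) * phi[r]   # from the factor 1 + gamma(1 - b)
            if r >= 1:
                term -= (1 - b) * phi[r - 1]        # from the factor -t(1 - b)
            coeffs[r] += w * term
    return coeffs


def moment(points, weights, k: int) -> Fraction:
    """k-th moment of the test distribution."""
    return sum((w * b ** k for b, w in zip(points, weights)), Fraction(0))


pts, wts, g = [Fraction(1, 4), Fraction(2, 3)], [Fraction(1, 3), Fraction(2, 3)], Fraction(3, 2)
c = lhs_coeffs(pts, wts, g, 6)
mu = [moment(pts, wts, k) for k in range(8)]
for r in range(1, 6):
    rescaled = 2 ** r * poch(HALF, r) * c[r]
    print(r, rescaled == -(2 * r - 1) * mu[r - 1] + (2 * r + g) * mu[r] - g * mu[r + 1])
```

Because the check holds for any distribution of $\beta_2$, the recursion (5.64) is exactly the statement that the moments of $\xi^*$ reproduce the right-hand side of (5.63).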

T h i s is solved by treatment of the corresponding homogeneous equation and by variation of parameter to yield f J
0

m n

'

(l-i)

zl [3T#-Ift*

2
2T (l-T) /2
2 1

2T(1-T)J

'

(5.66)

Here the integration is understood to start from the origin along the negative real axis of the complex plane. The constant of integration has been chosen to make $\psi$ continuous at 0 with $\psi(0) = 1$. Then (5.66) defines a single-valued function on the complex plane minus a cut along the real axis from 1 to $\infty$. The analyticity of $\psi$ on this region is easy to demonstrate. Consider the integral of $\psi$ on a closed curve about 0 avoiding 0 and the cut. Make the inversion $w = 1/t$, shrink the path down to the cut $0 < w < 1$, and use (5.67) below. Now, suppose there existed an absolutely continuous $\xi^*$ whose suitably regular derivative $m_\gamma$ satisfied

$$\int_0^1\frac{m_\gamma(x)}{1 - tx}\,dx = \psi(t). \tag{5.67}$$

Then we could obtain $m_\gamma$ by using the inversion formula

$$m_\gamma(x) = (2\pi i x)^{-1}\lim_{\epsilon\to 0}\big[\psi(x^{-1} + i\epsilon) - \psi(x^{-1} - i\epsilon)\big]. \tag{5.68}$$
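The inversion formula (5.68) can be sanity-checked on densities whose transform (5.67) is known in closed form. For $m(x) = 1$ one has $\psi(t) = -t^{-1}\log(1-t)$, and for $m(x) = 2x$ one has $\psi(t) = -2/t - 2t^{-2}\log(1-t)$. Both transforms have a cut along $(1,\infty)$. The Python sketch below evaluates the jump across the cut with a small $\epsilon$, using the principal branch of `cmath.log`, and recovers $m(x)$. These test densities are illustrations only and are unrelated to $m_\gamma$. The function names are hypothetical.

```python
import cmath
from typing import Callable


def stieltjes_invert(psi: Callable[[complex], complex], x: float, eps: float = 1e-10) -> float:
    """Approximate m(x) = (2 pi i x)^{-1} [psi(1/x + i eps) - psi(1/x - i eps)] for 0 < x < 1."""
    t = 1.0 / x
    jump = psi(complex(t, eps)) - psi(complex(t, -eps))
    return (jump / (2j * cmath.pi * x)).real


def psi_uniform(t: complex) -> complex:
    """psi(t) = int_0^1 dx / (1 - t x) = -log(1 - t) / t, i.e. m(x) = 1."""
    return -cmath.log(1 - t) / t


def psi_linear(t: complex) -> complex:
    """psi(t) = int_0^1 2x dx / (1 - t x) = -2/t - 2 log(1 - t) / t^2, i.e. m(x) = 2x."""
    return -2 / t - 2 * cmath.log(1 - t) / t ** 2


for x in (0.3, 0.5, 0.8):
    print(x, stieltjes_invert(psi_uniform, x), stieltjes_invert(psi_linear, x))
```

As the text stresses, recovering a candidate $m$ this way is only a formal device. Whether it actually satisfies (5.67) has to be proved separately.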


However, nothing in the theory of the Stieltjes transform tells us that an $m_\gamma(x)$ satisfying (5.68) does satisfy (5.67), and hence (5.63). We therefore use (5.68) only as a formal device to obtain an $m_\gamma$, which we shall then prove satisfies (5.63). From (5.66) and (5.68) we obtain, for $0 < x < 1$,

m-,(x)

B
2rr x^ (l
2

v>l (1 +

+ B u) '
3 2

- x) '

1 2

\ J

| l + u

Ja

1-

J
(5.69)

In order to prove that $\xi^*(dx) = m_\gamma(x)\,dx$ satisfies (5.63) with $\xi^*$ a probability measure, we must prove that

(a) $m_\gamma(x) \ge 0$ for almost all $x$, $0 < x < 1$,
(b) $\int_0^1 m_\gamma(x)\,dx = 1$,
(c) $\mu_1 = \int_0^1 x\,m_\gamma(x)\,dx$ satisfies (5.64)(a),
(d) $\mu_r = \int_0^1 x^r m_\gamma(x)\,dx$ satisfies (5.64)(b) for all $r \ge 1$. $\quad$ (5.70)

T h e first condition follows from (5.69) and the fact that B > 1 and u / ( l + u) < (1 + w) -' for u > 0. T h e condition (d) follows from the fact that m ( x )
7 3 2

as defined in (5.69) satisfies the differential equation

w! (x) + m ( i ) |
7

+ (l-2x)/2x(l-x)

= B/2irx (l-x) < ,

1/2

(5.71)

so that an integration by parts yields, for $r \ge 1$,

( r + l K - r ^

_ , =

/ [(r + l ) x Jo

- rx - \m (x)
y

dx

= / Jo = u (l
T

dx / 2 ) + 7 ii /2
T+l

r_i/2

$+\,B\,\Gamma(r+\frac12)/(\sqrt{\pi}\,r!)$, which is (5.64)(b). To prove (5.70)(b) and (c) we need the following identities involving the confluent hypergeometric function $\phi(a,b;x)$. The material presented here can


be found, for example, in Erdélyi (1953), Chapter 6, or Slater (1960). The confluent hypergeometric function

$$\phi(a,b;x) = \sum_{j=0}^\infty\frac{\Gamma(a+j)\,\Gamma(b)}{\Gamma(a)\,\Gamma(b+j)}\,\frac{x^j}{j!}$$

has the integral representation

$$\phi(a,b;x) = \frac{\Gamma(b)}{\Gamma(a)\Gamma(b-a)}\int_0^1 e^{xt}\,t^{a-1}(1-t)^{b-a-1}\,dt \tag{5.72}$$

when $b > a > 0$. The associated solution $\psi$ of the confluent hypergeometric equation has the representation

$$\psi(a,b;x) = \frac{1}{\Gamma(a)}\int_0^\infty e^{-xt}\,t^{a-1}(1+t)^{b-a-1}\,dt \tag{5.73}$$

if $a > 0$. We shall use the fact that the general definition of $\psi$, as used in what follows when $a = 0$, satisfies

$$\psi(0,b;x) = 1. \tag{5.74}$$

The functions $\psi$ and $\phi$ satisfy the following differential properties, identities and integrals:

$$\frac{d}{dx}\phi(a,b;x) = \frac{a-b}{b}\,\phi(a,b+1;x) + \phi(a,b;x), \tag{5.75}$$
$$\frac{d}{dx}\psi(a,b;x) = \psi(a,b;x) - \psi(a,b+1;x), \tag{5.76}$$
$$\frac{d}{dx}\psi(a,b;x) = x^{-1}a\big[(a-b+1)\psi(a+1,b;x) - \psi(a,b;x)\big], \tag{5.77}$$
$$(a-b+1)\phi(a,b;x) - a\phi(a+1,b;x) + (b-1)\phi(a,b-1;x) = 0, \tag{5.78}$$
$$b\phi(a,b;x) - b\phi(a-1,b;x) - x\phi(a,b+1;x) = 0, \tag{5.79}$$
$$\psi(a,b;x) - a\psi(a+1,b;x) - \psi(a,b-1;x) = 0, \tag{5.80}$$
$$(b-a)\psi(a,b;x) - x\psi(a,b+1;x) + \psi(a-1,b;x) = 0, \tag{5.81}$$

Jo

e~ (x

+ y)- <f>[ - , l ; i <tt = *

r,l;y

for y > 0 , (5.82)


e - * ( s + y)-vQ,l;z)<te
(5.83)
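The identities for $\phi$ listed above are easy to spot-check numerically from the defining series. The Python sketch below sums the series directly, which is adequate for moderate $|x|$ but is not a robust general-purpose implementation. It checks (5.75), (5.78), (5.79) and the closed form $\phi(\frac32,\frac12;\frac x2) = (1+x)e^{x/2}$ used in the case $p = 2$, $N = 3$. The parameter values are arbitrary and the function names are hypothetical. The identities for $\psi$ are not checked here.

```python
import math


def kummer_phi(a: float, b: float, x: float, tol: float = 1e-16, max_terms: int = 10_000) -> float:
    """phi(a, b; x) = sum_j (a)_j / (b)_j * x^j / j!  (b not 0, -1, -2, ...)."""
    term, total = 1.0, 1.0
    for j in range(max_terms):
        term *= (a + j) / (b + j) * x / (j + 1)
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def dphi_dx(a: float, b: float, x: float, h: float = 1e-5) -> float:
    """Central-difference approximation of d/dx phi(a, b; x)."""
    return (kummer_phi(a, b, x + h) - kummer_phi(a, b, x - h)) / (2 * h)


a, b, x = 0.7, 2.5, 1.3
# (5.75): d/dx phi(a,b;x) = ((a - b)/b) phi(a,b+1;x) + phi(a,b;x)
r75 = dphi_dx(a, b, x) - ((a - b) / b * kummer_phi(a, b + 1, x) + kummer_phi(a, b, x))
# (5.78): (a-b+1) phi(a,b;x) - a phi(a+1,b;x) + (b-1) phi(a,b-1;x) = 0
r78 = ((a - b + 1) * kummer_phi(a, b, x) - a * kummer_phi(a + 1, b, x)
       + (b - 1) * kummer_phi(a, b - 1, x))
# (5.79): b phi(a,b;x) - b phi(a-1,b;x) - x phi(a,b+1;x) = 0
r79 = b * kummer_phi(a, b, x) - b * kummer_phi(a - 1, b, x) - x * kummer_phi(a, b + 1, x)
# closed form used for p = 2, N = 3: phi(3/2, 1/2; x/2) = (1 + x) exp(x/2)
r_closed = kummer_phi(1.5, 0.5, x / 2) - (1 + x) * math.exp(x / 2)
print(r75, r78, r79, r_closed)
```

All four residuals should be close to zero. The first one is limited by the finite-difference step rather than by the series.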

Using (5.73), the function $m_\gamma$ defined in (5.69) can be written, in terms of hypergeometric functions, for $0 < x < 1$, as

h [,'^./.{^ / '<l- 1 ; ^)*' 1 - l i T / 2 - r (l)*(i l i ^)


+ </>(^,1 ,7/2 j jf v- e-<"t
l 2

dv-v^e^^dv

r r ? r 2 TT[X (1 E/2

. ( e " ^ f i i : T / 2 ) ^ i , i ; 7 ( i - ) / 2 ) ( \2 /

r(|)v-(|,l;7/2)}.

(5.84)

We now prove (5.70)(b) to establish that $m_\gamma(x)$ is an honest probability density function. From (5.73), (5.72) and (5.82) we get

/
Jo
=

VTTt^*(i.i;7(i-)/?)#
2ir x f l - i ) J f Jo VT7T~ r ^ ( l , 1 ; 7 * / 2 ) tte 2ir[x(l - l ) ] a
v

dt 7o 2 , r [ * ( l - * ) ] 7o
1 J

(5.85)


From (5.85), (5.84) and (5.72) we get i 4 e


z

ri f h

r(|)

dx

= 2 ^ , l ; ^ ( i , l ; , ) where 2z = 7 . We now show that tf'(z)-#(*)=0, from which it follows that H{z)=ce for some constant c.
l

- 0 ( I , l ; ^ ( ^ , l ;

(5.86)

By direct evaluation in terms of elementary integrals when $\gamma = 0$ we get (using (5.85)) $\int_0^1 m_0(x)\,dx = 1$; hence (5.70)(b) follows from (5.86). To prove (5.86), we use (5.75) and (5.76),

which yield H\z)-H(z) = <b{^,2 z^i ^\-,z^


] >

+ # / | , l ; \ ^ Q

- 2 Ki i; ^G' 2;s ) + ^G' 2; ^G ,M )


To this expression we now add the following four left-hand-side expressions, each of which equals zero,
1

; *

1 ~ n H ^2; 2

- -<P[ - , 2 ; z
2 \2
T

1 . ^3

+ 0

-,l;z

2* U

(N
2- ^
2

1 1 . /3 -,2;* - - t f -,2;* -tf -,l;z 2"V 2 1 . (Z

-01

2 \2

to obtain $H' - H = 0$, as desired.

We now verify (5.70)(c). From (5.72) and (5.79), with a = , $b = 1$, we obtain
1

(1+7?/) e'

dy = M -,l; /2
7 1

+70

-,2;7/2

/4 (5.88)

Using (5.70)(b), which we have just proved, together with (5.85) and (5.88), we rewrite (5.70)(c) as

1-

|, mt/aJJ

jf u + 7(1 - $}] M * ) *
(l + 7 ) 2T[(1- )]I/2
V

1 ; 7

, ?

- e ^ r
1 7

11; /2
7

)\dy f 7
l

g ^(l,l;7g/2) 2^(1 -

, f y I
1 +

0(1,1; Tg/g) 2^^(1-2/)]!^


2

r ( | ) f ( f . 1,7/2) 0(|,l; /2)


7 =

(l + 73/)e^

2ff[v(l - J/)]V2

f Jo +

7 ^ ( 1 , 1 ; 7J//2) 2 T[J /(1 J f ) ] '


1 2

dy
(5.89)

l/3 1_ 2 r ( - ] ^ ( - , l ; / 2 ) - -2 r [V -2 ]^[-,l;7/2
7

Using (5.81) with $a = 1$, $b = 0$, together with (5.72), (5.73) and (5.83), we get


1

7g^(l,l;7/2)

, _

f
0

M0,0;7y/2)-(l,O;7y/2)]
d

2 ^ ( 1 - , ) ] ^ ^ - y

= 1 -

f _ dy Jo * [3/(1 - ) ] ^ Jo
a

= l - j T ( l +
= 1+ r

r V Q ^ 7 * / 2 ) e - ^

(|) ^(I^^A-^d-HT/s)

/2. (5.90)

Thus (5.89) and (5.90) imply (5.31) and, hence, (5.70)(c).

Case $p = 2$, $N = 4$. Solution of Linnik, Pliss and Salaevskii

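From (5.62), writing $\beta_1 = \beta$, the problem consists of showing that there exists a probability density function $\xi(\beta)$ on the interval $0 < \beta < 1$ for which

$$\int_0^1 e^{\gamma(1-\beta)/2}\,[1 + (\gamma - t_1)(1-\beta)]\,\phi\Big(2,\frac12;\frac{t_1\beta}{2}\Big)\,\xi(\beta)\,d\beta = \phi\Big(2,1;\frac\gamma2\Big). \tag{5.91}$$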

W i t h a view to solving this problem we now consider the equation | i (7 - t l ) ( l -

j *

"<'-/ ^2, | ; M / 2 ^ { |

8)/2)mW

4l<\M^-0)/2jmdf3.

(5.92)

Using the fact that

we rewrite (5.92) as

16-^0(2,1;
/
1 e

^ ) [ 1 +
_ ^

7 -10
( A ) ( f / J

- d d -

/3)K(W

- ^ / 2

Jo From (5.93) we get + (1 + 7 + - 0,

/" e - ^ [ ( r - 2r W~ Jo - (7 + irW }t:(0)d0


+1

r +

2r )8

for r = 1 , 2 , . . . .

(5.94)

Let

r=l

be an arbitrary function represented by a series with radius of convergence greater than one. Multiplying the $r$th equation in (5.94) by $a_{r-1}$ and summing over $r$ from 1 to $\infty$, we get


f1


s:

(2/3 - 2 / 3 ) / " + (-5/3 + 60

- f&

+ /3 )/'
7

! - l +

o 0 - ^ + f 0

) ,

J LUMP) Jo

d0 (5.95)

= 0,

where $L(f)$ denotes the differential form inside the square brackets. Applying Green's formula to $L(f)$ and choosing a small $\epsilon > 0$, we get (5.96)

where the adjoint form L*() is given by L * ( 0 = 20 (0


2

- 1)C + 0{-Z

+ 60 + 7/3 - 7 / 3 K '
2

(5.97) and the bilinear form $L[f,\xi]$ is given by

m% = ^ ( ! - ^ ) ( ^ - / ' o - \0+7^ (i - mi n
2 2 2

(5.98)

Using the theory of Frobenius (see Ince, 1926) we conclude that


CC CO

(5.99) with nonzero leading coefficients, are two fundamental solutions of $L^*(\xi) = 0$ on the interval $(0,1)$. Similarly

n = 96(1 - /J) ,
k=0

fc

jia - (1 - /3)"5 fefcd - 0)


k=0

(5.100)

with nonzero leading coefficients, are also two fundamental solutions of the same equation on the interval $(0,1)$. Consequently any solution of the equation $L^*(\xi) = 0$ is integrable on $[0,1]$. We assert that there also exists a solution of the equation $L^*(\xi) = 0$ for which

$$\lim_{\epsilon\to 0} L[f,\xi]\Big|_{\epsilon}^{1-\epsilon} = 0. \tag{5.101}$$


It can also be shown that any linear combination of the solutions in (5.99) (similarly for solutions in (5.100)) satisfies (5.101) provided that () = i ( - ) .
2

Thus $\xi_{12}$ is a solution of (5.95) and hence of (5.91). In the following paragraph we show that $\xi_{12}$ does not vanish in the interval

(0,1). Write 0 = 1-*, *(ar)=A5*yf & i ( l - 4 (5.102)

The equation L"(t\) 0 becomes 2x{\ - x)8" + ( - 1 + 4x--,x Substituting u(x) = 6'(x)j8(x) + -yx )S' + ^(1 + jx)$ z
2

= 0.

(5.103)

we obtain the Riccati equation +


7 a

l-4s

s ' _

3+^7*

2x{l-x)
1

4x(l-x)

'

Let us recall that $\theta(x) = \sum_{k=0}^\infty\theta_k x^k$, $|x| < 1$. Assume that $\theta(x)$ vanishes at some point in the interval $(0,1)$. Since $\theta(0) = 1$, among the roots of the equation $\theta(x) = 0$ there will be a least root; let us denote it by $x_0$. Then the function $u(x)$ is analytic on the disc $|x| < x_0$. From (5.103) we obtain $\theta'(0)$, and hence the value $u(0)$. It follows that $u(x)$ cannot vanish in the interval $(0,x_0)$. At any point where it vanished we would necessarily have $u'(x) > 0$, whereas (5.104) shows that at such a point $u'(x) < 0$. Since $u(x)$ has a negative value, it decreases unboundedly as we approach the point $x_0$ from the left, which in turn contradicts (5.104). As a result $\theta$ does not vanish in the interval $(0,1)$.


Let us now return to (5.91), which, as we have seen, is satisfied by $\xi_{12}$. By the method of Laplace transforms it is easy to obtain the relation

/ x - {l Jo

b l

-xy-^ei

*tt>(a,b;px) <p(a - b,c - b; q(l -

x))

r(t)r(c - b) r(c)

-4>(a,c;p + q),

0<b<c.

(5.105)

Letting t\ = 7 T , and multiplying both sides of (5.91) by r

5 (1 r )

' and

integrating with respect to r from 0 to 1 we obtain with a = 2, 6 = | , c = 1,

p = 772,? =

7(l-W2,

| y 2 , l ; ^ W

=p ( H ; ^ )

( W .

(5.106)


Now to obtain (5.91) it is sufficient to make use of the homogeneity of the above relation and to normalize the function $\xi_{12}$.

5.3.2. $R^2$-test

Consider the setting of Example 5.1.5 for the problem of testing $H_0: \rho^2 = 0$ against the alternatives $H_1: \rho^2 = \lambda > 0$. We have shown in Chapter 4 that, among all tests based on the sufficient statistic $(\bar X, S)$, the $R^2$-test is UMP invariant for testing $H_0$ against the alternatives $\rho^2 > 0$. The invariance is under the affine group $(G,T)$ of transformations $(g,t)$, $g \in G$, $t \in T$, operating as $(\bar X, S) \to (g\bar X + t, gSg')$. Here $g$ is a nonsingular matrix of order $p$ of the form

$$g = \begin{pmatrix} g_{(11)} & 0\\ 0 & g_{(22)}\end{pmatrix}$$

with $g_{(11)}$ of order 1, and $t$ is a $p$-vector. For $p > 2$ this result does not imply the minimax property of the $R^2$-test, because the group $(G,T)$ does not satisfy the condition of the Hunt-Stein theorem. As discussed in Example 5.1.5, we assume the mean to be zero and consider only the subgroup $G_T(p)$ for invariance. This subgroup satisfies the conditions of the Hunt-Stein theorem. A maximal invariant under $G_T(p)$ is $R = (R_2,\dots,R_p)'$. It has a single distribution under $H_0$. Under $H_1$ its distribution depends continuously on a $(p-2)$-dimensional parameter $\Delta = (\delta_2,\dots,\delta_p)'$ with $\sum_{i=2}^p\delta_i = \lambda$. The Lebesgue density function $f_\Delta(r)$ of $R$ under $H_1$ is given in (5.29). Consider the reduced parameter spaces $\{0\}$ and

$$\Gamma = \Big\{\Delta = (\delta_2,\dots,\delta_p)': \delta_i \ge 0,\ \sum_{i=2}^p\delta_i = \lambda\Big\}. \tag{5.107}$$

Because these are compact and $f_\Delta(r)$ is continuous in $\Delta$, every minimax test for the reduced problem in terms of $R$ is Bayes. In particular, consider the $R^2$-test, which has a constant power on the contour $\rho^2 = \lambda$ and is also $G_T(p)$-invariant. It maximizes the minimum power over $H_1$ if and only if there is a probability measure $\xi$ on $\Gamma$ such that, for some constant $c_1$,

$$\int_\Gamma\frac{f_\Delta(r)}{f_0(r)}\,\xi(d\Delta)\ \begin{cases} > \\ = \\ < \end{cases}\ c_1 \quad\text{according as}\quad \sum_{i=2}^p r_i\ \begin{cases} > \\ = \\ < \end{cases}\ c, \tag{5.108}$$


except possibly for a set of measure zero. An examination of the integrand in (5.108) allows us to replace it by the equivalent

$$\int_\Gamma\frac{f_\Delta(r)}{f_0(r)}\,\xi(d\Delta) = c_1 \quad\text{if}\quad \sum_{i=2}^p r_i = c. \tag{5.109}$$

Obviously (5.108) implies (5.109).

On the other hand, suppose there are a $\xi$ and a constant $c_1$ for which (5.109) is satisfied, and let $r^* = (r_2^*,\dots,r_p^*)'$ be such that $\sum r_i^* = c' > c$. Writing $f = \int_\Gamma(f_\Delta/f_0)\,\xi(d\Delta)$ and $r^{**} = c\,r^*/c'$, we conclude that

$$f(r^*) = f(c'r^{**}/c) > f(r^{**}) = c_1$$

because of the form of $f$ in (5.29) and the fact that $c'/c > 1$ and $\sum r_i^{**} = c$. A similar argument for the case $c' < c$ shows that (5.109) implies (5.108). The remaining computations in this section are simplified by the following fact. For fixed $c$ and $\lambda$, we can at this point compute the unique value of $c_1$ for which (5.109) can possibly be satisfied. Let $\tilde R = (R_2,\dots,R_{p-1})'$. Write $f_\Delta(\tilde r\mid u)$ for the version of the conditional Lebesgue density of $\tilde R$ given $\sum_{i=2}^p R_i = U = u$. This version is continuous in $\tilde r$ and $u$ for $r_i > 0$ and $\sum_{i=2}^{p-1} r_i < u < 1$, and zero elsewhere. Also write $f^*_\lambda(u)$ for the Lebesgue density of $U = \sum_{i=2}^p R_i$. It is continuous for $0 < u < 1$, vanishes elsewhere, and depends on $\Delta$ only through $\lambda = \sum_{i=2}^p\delta_i$. Then (5.109) can be written as

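$$\int_\Gamma f_\Delta(\tilde r\mid c)\,\xi(d\Delta) = \Big[\frac{c_1 f^*_0(c)}{f^*_\lambda(c)}\Big]f_0(\tilde r\mid c) \tag{5.110}$$

for $r_i > 0$ and $\sum_{i=2}^{p-1} r_i < c$. The integral in (5.110) is a probability mixture of probability densities. It is therefore itself a probability density in $\tilde r$, as is $f_0(\tilde r\mid c)$. Hence the expression in square brackets equals one. From (4.35), with $0 < c < 1$,

$$f^*_\lambda(c) = (1-\lambda)^{(N-1)/2}\,\frac{\Gamma(\frac12(N-1))}{\Gamma(\frac12(N-p))\,\Gamma(\frac12(p-1))}\,c^{(p-3)/2}(1-c)^{(N-p-2)/2}\,F\Big(\frac12(N-1),\frac12(N-1);\frac12(p-1);c\lambda\Big), \tag{5.111}$$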

where $F(a,b;c;x)$ is the ordinary ${}_2F_1$ hypergeometric series given by


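$$F(a,b;c;x) = \sum_{r=0}^\infty\frac{\Gamma(a+r)\,\Gamma(b+r)\,\Gamma(c)}{\Gamma(a)\,\Gamma(b)\,\Gamma(c+r)}\,\frac{x^r}{r!} = \sum_{r=0}^\infty\frac{(a)_r(b)_r}{(c)_r}\,\frac{x^r}{r!}, \tag{5.112}$$

where we write $(a)_r = \Gamma(a+r)/\Gamma(a)$.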

From (5.110), the value of $c_1$ which satisfies (5.109) is given by
$$c_1 = \frac{f^*_\lambda(c)}{f^*_0(c)} = (1-\lambda)^{(N-1)/2}\, F\!\left(\tfrac12(N-1),\tfrac12(N-1);\tfrac12(p-1);c\lambda\right). \qquad (5.113)$$
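The quantities in (5.111) to (5.113) are easy to evaluate numerically. The sketch below sums the Gauss series (5.112), evaluates the density $f^*_\lambda(c) = (1-\lambda)^{(N-1)/2}\Gamma(\tfrac12(N-1))/[\Gamma(\tfrac12(N-p))\Gamma(\tfrac12(p-1))]\,c^{(p-3)/2}(1-c)^{(N-p-2)/2}F(\cdot)$ and forms $c_1 = f^*_\lambda(c)/f^*_0(c)$. It assumes $0 \le \lambda < 1$, $0 < c < 1$ and $N > p \ge 2$; the function names are hypothetical helpers chosen for illustration.

```python
import math

def hyp2f1(a, b, c, x, tol=1e-15, max_terms=10000):
    """Gauss series (5.112): sum over r of (a)_r (b)_r / ((c)_r r!) x^r, for |x| < 1."""
    term, total = 1.0, 1.0
    for r in range(max_terms):
        # ratio of consecutive terms: (a + r)(b + r) x / ((c + r)(r + 1))
        term *= (a + r) * (b + r) * x / ((c + r) * (r + 1))
        total += term
        if abs(term) < tol * abs(total):
            break
    return total

def r2_density(c, lam, N, p):
    """f*_lambda(c) as in (5.111), for 0 < c < 1, 0 <= lam < 1 and N > p >= 2."""
    log_const = (0.5 * (N - 1) * math.log(1 - lam)
                 + math.lgamma(0.5 * (N - 1))
                 - math.lgamma(0.5 * (N - p))
                 - math.lgamma(0.5 * (p - 1)))
    shape = c ** (0.5 * (p - 3)) * (1 - c) ** (0.5 * (N - p - 2))
    series = hyp2f1(0.5 * (N - 1), 0.5 * (N - 1), 0.5 * (p - 1), c * lam)
    return math.exp(log_const) * shape * series

def c1(c, lam, N, p):
    """c_1 = f*_lambda(c) / f*_0(c); by (5.113) this is (1 - lam)^{(N-1)/2} F(...)."""
    return r2_density(c, lam, N, p) / r2_density(c, 0.0, N, p)

if __name__ == "__main__":
    # midpoint-rule check that (5.111) integrates to one (N = 7, p = 3, lambda = 0.4)
    h = 1 / 4000
    print(sum(r2_density((i + 0.5) * h, 0.4, 7, 3) for i in range(4000)) * h)
```

The integration check is the numerical counterpart of the statement that a mixture of probability densities is again a probability density, which is what forces the bracket in (5.110) to equal one.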

From (5.113) and (5.29), the condition (5.109) becomes

(5.114) [display not legible in the source]

for all $r$ with $r_i > 0$ and $\sum_{i=2}^{p} r_i = c$.

Case $p = 3$, $N = 4$ ($N = 3$ if the mean is known). Solution of Giri and Kiefer. In this case (5.113) can be written as

(5.115) [display not legible in the source]

One could presumably solve for $\xi$ by using the theory of Meijer transforms with kernel $F(\cdot,\cdot;\cdot;x)$. Instead they proceeded as for Hotelling's $T^2$ problem, …


… measure on $[0,1]$, or, equivalently, that the Laplace transform $\sum_j \mu_j(-t)^j/j!$ is completely monotone on $[0,\infty)$. Unfortunately we have been unable to proceed successfully in this way. We now obtain a function $m_z(x)$, which we then prove to be the Lebesgue density $d\xi^*(x)/dx$ of an absolutely continuous probability measure $\xi^*$ satisfying (5.118) and hence (5.115). That proof does not rely on the somewhat heuristic development which follows, but we nevertheless sketch that development to give an idea of where the $m_z(x)$ of (5.123) comes from. The generating function
$$\phi(t) = \sum_{j=0}^{\infty}\mu_j t^j$$
of the sequence $\{\mu_j\}$ satisfies a differential equation, which is obtained from (5.118) by multiplying it by $t^{n-1}$ and summing with respect to $n$ from 1 to $\infty$:
$$2(1-t)(t-z)\,\phi'(t) - t^{-1}(t^2 - 2zt + z)\,\phi(t) = \cdots \qquad (5.119)$$
[right-hand side not legible in the source]

Solving (5.119) by treatment of the corresponding homogeneous equation and by variation of parameters, we get

(5.120) [display not legible in the source]

The constant of integration has been chosen to make $\phi(t)$ continuous at $t = 0$ with $\phi(0) = 1$, and (5.120) defines a single-valued analytic function on the complex plane cut from 0 to $z$ and from 1 to $\infty$. If there did exist an absolutely continuous $\xi^*$ whose suitably regular derivative $m_z$ satisfied
$$\int_0^1 \frac{m_z(x)}{1 - tx}\,dx = \phi(t), \qquad (5.121)$$
we could obtain $m_z$ by using the simple inversion formula
$$m_z(x) = \frac{1}{2\pi i x}\,\lim_{\epsilon \downarrow 0}\left[\phi(x^{-1} + i\epsilon) - \phi(x^{-1} - i\epsilon)\right]. \qquad (5.122)$$
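The inversion formula (5.122) can be checked on densities whose Stieltjes-type transform $\phi(t) = \int_0^1 m(x)/(1-tx)\,dx$ is known in closed form. The sketch below is an illustration only, assuming $m(x) = 1$ (so $\phi(t) = -t^{-1}\log(1-t)$) and $m(x) = 2x$ (so $\phi(t) = 2(-t^{-1} - t^{-2}\log(1-t))$), both continued by the principal branch of the logarithm; it evaluates the jump of $\phi$ across the cut at $t = x^{-1}$ with a small fixed $\epsilon$ instead of a limit.

```python
import cmath

def phi_uniform(t):
    """phi(t) = integral_0^1 dx / (1 - t x) for m(x) = 1, i.e. -log(1 - t) / t
    (principal branch, analytic off the cut [1, infinity))."""
    return -cmath.log(1 - t) / t

def phi_linear(t):
    """phi(t) = integral_0^1 2x dx / (1 - t x) for m(x) = 2x,
    i.e. 2 * (-1/t - log(1 - t) / t**2)."""
    return 2 * (-1 / t - cmath.log(1 - t) / t ** 2)

def invert(phi, x, eps=1e-12):
    """Formal inversion (5.122): m(x) ~ [phi(1/x + i eps) - phi(1/x - i eps)] / (2 pi i x)."""
    jump = phi(1 / x + 1j * eps) - phi(1 / x - 1j * eps)
    return (jump / (2j * cmath.pi * x)).real

if __name__ == "__main__":
    print(invert(phi_uniform, 0.3), invert(phi_linear, 0.3))
```

The jump comes entirely from the logarithm, whose imaginary part changes by $2\pi$ across the cut; this is the same mechanism that the formal use of (5.122) exploits for the $\phi$ of (5.120).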

Since there is nothing in the theory of Stieltjes transforms which tells us that an $m_z$ satisfying (5.122) does satisfy (5.121), we use (5.122) only as a formal device to


obtain $m_z$, which we shall prove satisfies (5.115). From (5.120) and (5.122) we obtain, for $0 < x < 1$,

(5.123) [display only partly legible in the source: $m_z(x)$ is written as $\{B_z Q_x(z) + c_z\}$ (say), with $2\pi(x(1-x))^{1/2}$ in the denominator, where $B_z$ and $c_z$ are given by integrals with respect to a variable $u$]

We can evaluate $c_z$ by (5.126) below. We obtain $B_z$ by making the change of variables $v = (1+u)^{-1}$ and using […], and

[display defining $Q_x(z)$, not legible in the source; it involves $\log\{(1+(1-z)^{1/2})/(1-(1-z)^{1/2})\}$]

Now, to show that $\xi^*(dx) = m_z(x)\,dx$ satisfies (5.115) with $\xi^*$ a probability measure, we must show that

(a) $m_z(x) \ge 0$ for almost all $x$, $0 < x < 1$;
(b) $\int_0^1 m_z(x)\,dx = 1$;
(c) $\mu_1 = \int_0^1 x\,m_z(x)\,dx$ satisfies (5.118a);
(d) $\mu_n = \int_0^1 x^n m_z(x)\,dx$ satisfies (5.118b) for $n > 1$. (5.124)


The first condition will follow from (5.123) and the positivity of $B_z$ and $c_z$ for $0 < z < 1$. The former is obvious. To prove the positivity of $c_z$, we first note an inequality for $B_z$ [display not legible in the source]; this is seen by comparing the two power series in $z$ term by term, the ratio of corresponding coefficients being a product of factors each greater than one. This gives a lower bound for $B_z$. Substituting this lower bound into the expression for $c_z$ and writing the result in terms of a variable $u$, the resulting lower bound for $c_z$ has a power series in $u$ (convergent for $|u| < 1$) whose constant term is zero and whose coefficient of $u^j$ for $j \ge 1$ is
$$\frac{1}{j+\tfrac12} - \frac{\Gamma^2(j+\tfrac12)}{\Gamma(j)\,\Gamma(j+2)}.$$
Using the logarithmic convexity of the $\Gamma$-function, i.e. $\Gamma^2(j+\tfrac12) < \Gamma(j)\Gamma(j+1)$, the coefficient of $u^j$ for $j \ge 1$ is greater than $(j+\tfrac12)^{-1} - (j+1)^{-1} > 0$. Hence $c_z > 0$ for $0 < z < 1$.
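The positivity argument rests on two elementary facts that can be spot-checked numerically: the log-convexity inequality $\Gamma^2(j+\tfrac12) < \Gamma(j)\Gamma(j+1)$ and the resulting bound on the coefficient of $u^j$. This is a finite sanity check for $j = 1,\ldots,100$, not a proof, and the coefficient formula is our reading of a partly garbled display, so treat it as an assumption.

```python
import math

def log_convexity_gap(j):
    """log Gamma(j) + log Gamma(j + 1) - 2 log Gamma(j + 1/2); positive by log-convexity."""
    return math.lgamma(j) + math.lgamma(j + 1) - 2 * math.lgamma(j + 0.5)

def coeff(j):
    """Coefficient of u^j (j >= 1): 1/(j + 1/2) - Gamma(j + 1/2)^2 / (Gamma(j) Gamma(j + 2))."""
    ratio = math.exp(2 * math.lgamma(j + 0.5) - math.lgamma(j) - math.lgamma(j + 2))
    return 1 / (j + 0.5) - ratio

if __name__ == "__main__":
    for j in (1, 2, 10):
        print(j, coeff(j), 1 / (j + 0.5) - 1 / (j + 1))
```

Working with `math.lgamma` rather than `math.gamma` keeps the ratio accurate for larger $j$, where the individual Gamma values overflow.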

To prove (5.124)(d) we note that $m_z(x)$ defined by (5.123) satisfies a first-order linear differential equation

(5.125) [display not legible in the source; its coefficients involve $1 - 2x + zx^2$ and $x(1-x)(1-zx)$, and its right-hand side is a multiple of $B_z$]

so that an integration by parts yields, for $n > 1$,
$$-z(n+2)\mu_{n+1} + (1+z)(n+1)\mu_n - n\mu_{n-1} = \int_0^1 \{-z(n+2)x^{n+1} + (1+z)(n+1)x^n - nx^{n-1}\}\,m_z(x)\,dx = \int_0^1 x^n\big(1 - (1+z)x + zx^2\big)\,m_z'(x)\,dx = \cdots$$
[the final expression is not legible in the source], which is (5.118)(b). The proofs of (5.124)(b) and (c) depend on the following identities involving the hypergeometric function $F(a,b;c;x)$, which has the following integral representation when $\operatorname{Re}(c) > \operatorname{Re}(b) > 0$:
$$F(a,b;c;x) = \frac{\Gamma(c)}{\Gamma(b)\,\Gamma(c-b)}\int_0^1 t^{b-1}(1-t)^{c-b-1}(1-tx)^{-a}\,dt. \qquad (5.126)$$

We will also use the representation
$$2x^{1/2}F(\tfrac12,1;\tfrac32;x) = \log\frac{1+x^{1/2}}{1-x^{1/2}} \qquad (5.127)$$

and the identities
$$F(a,b;c;x) = F(b,a;c;x), \qquad (5.128)$$
$$(c-a-1)\,F(a,b;c;x) + a\,F(a+1,b;c;x) - (c-1)\,F(a,b;c-1;x) = 0, \qquad (5.129)$$
$$\lim_{c \to -n}\frac{F(a,b;c;x)}{\Gamma(c)} = \frac{(a)_{n+1}(b)_{n+1}}{(n+1)!}\,x^{n+1}\,F(a+n+1,\,b+n+1;\,n+2;\,x) \qquad (5.130)$$
for $n = 0,1,2,\ldots$,
$$\begin{aligned}
&F(\tfrac12+\lambda,-\tfrac12-\nu;1+\lambda+\mu;x)\,F(\tfrac12-\lambda,\tfrac12+\nu;1+\nu+\mu;1-x)\\
&\quad + F(\tfrac12+\lambda,\tfrac12-\nu;1+\lambda+\mu;x)\,F(-\tfrac12-\lambda,\tfrac12+\nu;1+\nu+\mu;1-x)\\
&\quad - F(\tfrac12+\lambda,\tfrac12-\nu;1+\lambda+\mu;x)\,F(\tfrac12-\lambda,\tfrac12+\nu;1+\nu+\mu;1-x)
= \frac{\Gamma(1+\lambda+\mu)\,\Gamma(1+\nu+\mu)}{\Gamma(\lambda+\mu+\nu+\tfrac32)\,\Gamma(\tfrac12+\mu)},
\end{aligned} \qquad (5.131)$$
$$F(a,b;c;x) = (1-x)^{c-a-b}\,F(c-a,c-b;c;x).$$
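The identities above can be spot-checked with the series (5.112). The sketch below tests (5.127), the contiguous relation (5.129), the Euler transformation on the last line, and (5.131) in its special case $\lambda = \mu = \nu = 0$, where it reduces to Legendre's relation for complete elliptic integrals and the right-hand side is $1/(\Gamma(\tfrac32)\Gamma(\tfrac12)) = 2/\pi$. The general form of (5.131) is our reconstruction of a garbled display, so only this special case is asserted; the parameter values are arbitrary illustrations.

```python
import math

def hyp2f1(a, b, c, x, tol=1e-16, max_terms=100000):
    """Gauss series (5.112) for |x| < 1."""
    term, total = 1.0, 1.0
    for r in range(max_terms):
        term *= (a + r) * (b + r) * x / ((c + r) * (r + 1))
        total += term
        if abs(term) < tol * abs(total):
            break
    return total

def check_5127(x):
    """2 x^{1/2} F(1/2, 1; 3/2; x) - log((1 + x^{1/2}) / (1 - x^{1/2})); should vanish."""
    s = math.sqrt(x)
    return 2 * s * hyp2f1(0.5, 1.0, 1.5, x) - math.log((1 + s) / (1 - s))

def check_5129(a, b, c, x):
    """(c - a - 1) F(a,b;c;x) + a F(a+1,b;c;x) - (c - 1) F(a,b;c-1;x); should vanish."""
    return ((c - a - 1) * hyp2f1(a, b, c, x) + a * hyp2f1(a + 1, b, c, x)
            - (c - 1) * hyp2f1(a, b, c - 1, x))

def check_euler(a, b, c, x):
    """F(a,b;c;x) - (1 - x)^{c-a-b} F(c-a,c-b;c;x); should vanish."""
    return hyp2f1(a, b, c, x) - (1 - x) ** (c - a - b) * hyp2f1(c - a, c - b, c, x)

def legendre_case_5131(x):
    """Left-hand side of (5.131) with lambda = mu = nu = 0; should equal 2/pi."""
    k, k1 = hyp2f1(0.5, 0.5, 1.0, x), hyp2f1(0.5, 0.5, 1.0, 1 - x)
    e, e1 = hyp2f1(0.5, -0.5, 1.0, x), hyp2f1(-0.5, 0.5, 1.0, 1 - x)
    return e * k1 + k * e1 - k * k1

if __name__ == "__main__":
    print(check_5127(0.25), check_5129(0.3, 0.7, 2.5, 0.4), legendre_case_5131(0.5), 2 / math.pi)
```

Arguments are kept away from $x = 1$ because the plain series converges slowly there; a production implementation would switch to a transformation such as the one on the last line.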

We now prove (5.124)(b) and (c). From (5.123), using (5.126) and (5.127), we obtain

(5.132) [display not legible in the source]


The first expression in square brackets in (5.132) vanishes, as is easily seen from the power series of $F$ in (5.112). Using (5.127), the integral in (5.132) can be written as

(5.133) [display not legible in the source]

From (5.132) and (5.133) we get

(5.134) [display not legible in the source]

Hence, from (5.129) and (5.133), we obtain

(5.135) [display not legible in the source]


Using (5.129) with $c = 1 + \epsilon$ and (5.130) with $n = 0$ and $\epsilon \to 0$, (5.132) reduces to

(5.136) [display not legible in the source]

Now, by (5.131), the expression (5.133) equals one if we have
$$[\cdots] = 0. \qquad (5.134)$$
[bracketed expression not legible in the source]

The expression inside the square brackets is easily seen to be zero by computing the coefficient of $z^n$. Thus we prove (5.124)(b). We now verify (5.124)(c). The integrand in (5.132) changes only by a factor $x$, and in place of (5.133) we obtain an expression involving $F(-\tfrac12,\cdot\,;2;1-z)$ [not fully legible in the source]. The analogue of (5.134) is
$$\int_0^1 x\,m_z(x)\,dx = \cdots \qquad (5.135)$$
[right-hand side not legible in the source]

To verify (5.124)(c), we then have to prove the following identity (by (5.118a)):

(5.136) [display not legible in the source]


From (5.131) with $c = 1$, $a = b = \ldots$, then (5.129) with $a = \ldots$, $b = -\tfrac12$, and then (5.130) with $\lambda = \mu = 0$, $\nu = 1$, we can rewrite (5.136) as

(5.137) [display not legible in the source]

Using (5.129) with $a = \ldots$, $b = \ldots$, $c = 1 + \epsilon$, and then (5.130) with $n = 0$, $\epsilon \to 0$, the expression inside the square brackets in the last term of (5.137) can be simplified to a multiple of $zF(\cdot,\cdot\,;2;z)$ [parameters not legible in the source]. Thus we are faced with the problem of establishing the identity
$$\frac{4}{\pi} = \cdots \qquad (5.138)$$
[right-hand side not legible in the source], which finally, by (5.126), reduces to

(5.139) [display not legible in the source]

The expression inside the square brackets in (5.139) has a power series in $1 - z$, the value of which is seen to be zero by computing the coefficients of the various powers of $1 - z$. Hence we prove (5.124)(c).

5.3.3. $\epsilon$-Minimax test (Linnik, 1966) for Hotelling's $T^2$

Consider the setting of Sec. 5.3.1. The Hotelling $T^2$ test $\phi^*_N$ for testing $H_0: \delta = 0$ against the alternatives $H_1: \delta > 0$ at level $\alpha \in (0,1)$ is $\epsilon$-minimax if
$$\sup_{\phi}\,\inf_{H_1} E_\theta(\phi) - \inf_{H_1} E_\theta(\phi^*_N) < \epsilon \qquad (5.139)$$
for all $N > N_0(\epsilon)$, where $\theta = (\mu,\Sigma)$ and $\phi$ runs through all tests of level $\le \alpha$.

For $N \to \infty$, $\lambda = \ldots$, $0 < \ldots < 2(\log N)^{\ldots}$, (5.26) has the approximate solution

(5.140) [display not legible in the source]

which is a pdf on the simplex. If we substitute (5.140) in the left side of (5.62), we obtain a discrepancy with its right side which is of the order of $O(N^{-1+\epsilon})$, for $\epsilon > 0$. Thus, if $\alpha = \alpha_N$ satisfies the condition
$$O\big(\exp\{-(\log N)^{\ldots}\}\big) < \alpha < 1 - O\big(\exp\{-(\log N)^{\ldots}\}\big) \qquad (5.141)$$
and we have
$$\exp\{-(\log N)^{\ldots}\} < \delta < (\log N)^{\ldots}, \qquad (5.142)$$
then
$$\sup_{\phi}\,\inf_{H_1} E_\theta(\phi) - \inf_{H_1} E_\theta(\phi^*_N) = O(1/N^{1-\epsilon})$$
for $\epsilon > 0$, and hence the Hotelling $T^2$ test is $\epsilon$-minimax for testing $H_0$ against $H_1: \delta > 0$.

Exercises

1. Prove (5.65) in detail.
2. Prove (5.72) and (5.73).
3. Prove (5.119) and (5.120).
4. Prove (5.127) and (5.130).
5. Prove that in Problem 8 of Chapter 4 no invariant test under $(G_T, \ldots)$ is minimax for testing $H_0$ against $H_1$ for every choice of $\lambda$.


References

1. M. Behara and N. Giri, Locally and asymptotically minimax test of some multivariate decision problems, Archiv der Mathematik 4, 436-441 (1971).
2. A. Erdélyi, Higher Transcendental Functions, McGraw-Hill, N.Y. (1953).
3. J. K. Ghosh, Invariance in Testing and Estimation, Publication no. SM 67/2, Indian Statistical Institute, Calcutta, India (1967).
4. N. Giri, Multivariate Statistical Inference, Academic Press, N.Y. (1977).
5. N. Giri, Locally and asymptotically minimax tests of a multivariate problem, Ann. Math. Statist. 39, 171-178 (1968).
6. N. Giri and J. Kiefer, Local and asymptotic minimax properties of multivariate tests, Ann. Math. Statist. 35, 21-35 (1964).
7. N. Giri and J. Kiefer, Minimax character of the R²-test in the simplest case, Ann. Math. Statist. 35, 1475-1490 (1964).
8. N. Giri, J. Kiefer and C. Stein, Minimax properties of T²-test in the simplest case, Ann. Math. Statist. 34, 1524-1535 (1963).
9. E. L. Ince, Ordinary Differential Equations, Longmans, Green and Co., London (1926).
10. W. James and C. Stein, Estimation with quadratic loss, Proc. Fourth Berkeley Symp. Math. Statist. Prob. 1, 361-379 (1960).
11. E. L. Lehmann, Testing Statistical Hypotheses, Wiley, N.Y. (1959).
12. Ju. V. Linnik, V. A. Pliss and O. V. Salaevski, On Hotelling's test, Doklady 168, 719-722 (1966).
13. J. Kiefer, Invariance, minimax sequential estimation, and continuous time processes, Ann. Math. Statist. 28, 573-601 (1957).
14. Ju. V. Linnik, Approximately minimax detection of a vector signal on a Gaussian background, Soviet Math. Doklady 7, 966-968 (1966).
15. M. P. Peisakoff, Transformation Parameters, Ph.D. thesis, Princeton University (1950).
16. E. J. G. Pitman, Tests of hypotheses concerning location and scale parameters, Biometrika 31, 200-215 (1939).
17. O. V. Salaevski, Minimax character of Hotelling's T² test, Soviet Math. Doklady 3, 733-735 (1968).
18. J. B. Semika, An optimum property of two statistical tests, Biometrika 33, 70-80 (1941).
19. L. J. Slater, Confluent Hypergeometric Functions, Cambridge University Press (1960).
20. C. Stein, The admissibility of Hotelling's T² test, Ann. Math. Statist. 27, 616-623 (1956).
21. A. Wald, Contributions to the theory of statistical estimation and testing hypotheses, Ann. Math. Statist. 10, 299-320 (1939).
22. A. Wald, Statistical Decision Functions, Wiley, N.Y. (1950).
23. O. Wesler, Invariance theory and a modified minimax principle, Ann. Math. Statist. 30, 1-20 (1959).

Chapter 6. LOCALLY MINIMAX TESTS IN SYMMETRICAL DISTRIBUTIONS

6.0. Introduction

In multivariate analysis the role of the multivariate normal distribution is of utmost importance, for the obvious reason that many results relating to the univariate normal distribution have been successfully extended to the multivariate normal distribution. However, in actual practice the assumption of multinormality does not always hold, and the verification of multinormality in a given set of data is often very cumbersome, if not impossible. Very often, the optimum statistical procedures derived under the assumption of multivariate normality remain optimum when the underlying distribution is a member of a family of elliptically symmetric distributions.

6.1. Elliptically Symmetric Distributions

Definition 6.1.1. (Elliptically Symmetric Distributions (Univariate)). A random vector $X = (X_1,\ldots,X_p)'$ with values in $E^p$ is said to have a distribution belonging to the family of univariate elliptically symmetric distributions with location parameter $\mu = (\mu_1,\ldots,\mu_p)' \in E^p$ and scale matrix $\Sigma$ (positive definite) if its probability density function can be expressed as a function of the quadratic form $(x-\mu)'\Sigma^{-1}(x-\mu)$ and is given by
$$f_X(x) = |\Sigma|^{-1/2}\,q\big((x-\mu)'\Sigma^{-1}(x-\mu)\big), \quad x \in E^p, \qquad (6.1)$$


where $q$ is a function on $[0,\infty)$ satisfying $\int_{E^p} q(y'y)\,dy = 1$ for $y \in E^p$.
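To make (6.1) concrete, the sketch below evaluates $|\Sigma|^{-1/2}q((x-\mu)'\Sigma^{-1}(x-\mu))$ for $p = 2$ with two choices of the generator $q$: the normal generator $(2\pi)^{-p/2}e^{-u/2}$ and a multivariate $t$ generator with $\nu = 5$ degrees of freedom. Both generators and the helper names are illustrative choices, not taken from the text; the $2\times 2$ inverse is written out explicitly to keep the example dependency-free.

```python
import math

def elliptical_density(x, mu, sigma, q):
    """f_X(x) = |Sigma|^{-1/2} q((x - mu)' Sigma^{-1} (x - mu)) for p = 2, as in (6.1)."""
    (s11, s12), (s21, s22) = sigma
    det = s11 * s22 - s12 * s21
    d0, d1 = x[0] - mu[0], x[1] - mu[1]
    # quadratic form via the explicit inverse of a symmetric 2 x 2 matrix
    quad = (s22 * d0 * d0 - (s12 + s21) * d0 * d1 + s11 * d1 * d1) / det
    return q(quad) / math.sqrt(det)

def q_normal(u, p=2):
    """Normal generator: (2 pi)^{-p/2} exp(-u/2)."""
    return (2 * math.pi) ** (-p / 2) * math.exp(-u / 2)

def q_t(u, nu=5.0, p=2):
    """Multivariate t generator with nu degrees of freedom."""
    c = math.gamma((nu + p) / 2) / (math.gamma(nu / 2) * (nu * math.pi) ** (p / 2))
    return c * (1 + u / nu) ** (-(nu + p) / 2)

if __name__ == "__main__":
    sigma = ((2.0, 0.5), (0.5, 1.0))
    for q in (q_normal, q_t):
        print(q.__name__, elliptical_density((1.5, 0.0), (1.0, -1.0), sigma, q))
```

With $\Sigma = I$ and the normal generator the function reproduces the product of two standard normal densities, and changing $q$ changes the tails while keeping the elliptical contours, as the text describes.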

We shall denote a family of elliptically symmetric distributions by $E_p(\mu,\Sigma)$. It may be verified that $E(X) = \mu$ and $\operatorname{cov}(X) = \alpha\Sigma$, where $\alpha = p^{-1}E\big((X-\mu)'\Sigma^{-1}(X-\mu)\big)$. In other words, all distributions in the class $E_p(\mu,\Sigma)$ have the same mean and the same correlation matrix. The family $E_p(\mu,\Sigma)$ contains a class of probability densities whose contours of equal density have the same elliptical shape as the multivariate normal, but it also contains long-tailed and short-tailed distributions relative to the multivariate normal. This family of distributions satisfies most properties of the multivariate normal. We refer to Giri (1996) for these results.

Definition 6.1.2. (Multivariate Elliptically Symmetric Distribution). An $n \times p$ random matrix $X = (X_{ij}) = (X_1,\ldots,X_n)'$, where $X_i = (X_{i1},\ldots,X_{ip})'$, is said to have a multivariate elliptically symmetric distribution with the same location parameter $\mu = (\mu_1,\ldots,\mu_p)'$ and the same scale matrix $\Sigma$ (positive definite) if its probability density function is given by
$$f_X(x) = |\Sigma|^{-n/2}\,q\Big(\sum_{i=1}^{n}(x_i-\mu)'\Sigma^{-1}(x_i-\mu)\Big), \qquad (6.2)$$
with $x_i \in E^p$, $i = 1,\ldots,n$, where $q$ is a function on $[0,\infty)$ of the sum of the quadratic forms $(x_i-\mu)'\Sigma^{-1}(x_i-\mu)$, $i = 1,\ldots,n$. (We have modified Definition 6.1.2 for statistical applications.)

Elliptically symmetric distributions are becoming increasingly popular because of their frequent use in filtering and stochastic control, random signal input, stock market data analysis and robustness studies of multivariate normal statistical procedures. The family $E_p(\mu,\Sigma)$ includes, among others, the multivariate normal, the multivariate Cauchy, the Pearsonian type II and type IV, and the Laplace distributions. The family of spherically symmetric distributions, which includes the multivariate $t$, the contaminated normal and the compound normal distributions, is a subfamily of $E_p(\mu,\Sigma)$.


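Write $f_X(x)$ in (6.2) as
$$f_X(x \mid \theta) = Q(\theta)\,q\big(\psi(x \mid \theta)\big), \qquad (6.3)$$
where $\theta = (\mu,\Sigma) \in \Omega$, $x \in \mathcal X$, $Q(\theta) = |\Sigma|^{-n/2}$, and $\psi$ is a measurable function from $\mathcal X \times \Omega$ to $\mathcal Y$. Let us assume that $\mathcal Y$ is a nonempty open subset of $R^m$ and is independent of $\theta$, and that $q$ is a fixed integrable function from $\mathcal Y$ to $[0,\infty)$ independent of $\theta$. Let $G$ be a group of transformations which leaves the problem of testing $H_0: \theta \in \Omega_{H_0}$ against $H_1: \theta \in \Omega_{H_1}$ invariant. Assume that $\mathcal X$ is a Cartan $G$-space and that the function $\psi$ satisfies
$$\psi(gx \mid \theta) = \tilde g\,\psi(x \mid \theta) \qquad (6.4)$$
for all $x \in \mathcal X$, $g \in G$, $\theta \in \Omega$ and $\tilde g \in \tilde G$, where $\tilde G$ is the induced group of transformations on $\psi$ corresponding to $g \in G$ on $\mathcal X$, and $\tilde G$ acts transitively on the range space $\mathcal Y$ of $\psi$.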

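Using (2.21a), the ratio $R$ of the probability density functions of the maximal invariant $T(X)$ under $G$, with $\theta_1 \in \Omega_{H_1}$ and $\theta_0 \in \Omega_{H_0}$, is given by
$$R = \frac{\int_G Q(\theta_1)\,q\big(\psi(gx \mid \theta_1)\big)\,\delta(g)\,\rho(dg)}{\int_G Q(\theta_0)\,q\big(\psi(gx \mid \theta_0)\big)\,\delta(g)\,\rho(dg)}, \qquad (6.5)$$
where $\delta(g)$ is the Jacobian of the inverse transformation $x \to gx$ and $\rho$ is the invariant measure on $G$.

Theorem 6.1.1. If $\tilde G$ acts transitively on the range space $\mathcal Y$ of $\psi$, then $R$ is independent of $q$ for all $\theta_1 \in \Omega_{H_1}$, $\theta_0 \in \Omega_{H_0}$.

Proof. Since $\tilde G$ acts transitively on $\mathcal Y$, for any $y_0 \in \mathcal Y$ there exists an element $h(x,\theta_j : y_0)$ of $G$ such that
$$\psi\big(h(x,\theta_j : y_0)\,x \mid \theta_j\big) = y_0, \qquad j = 0,1. \qquad (6.6)$$
Using the invariance property of $\rho$ and replacing $g$ by $g\,h(x,\theta_j : y_0)$, we get

[display not legible in the source]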


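where $\Delta_r(\cdot)$ is the right modular function of $G$. Hence
$$R = \frac{Q(\theta_1)\,\delta\big(h(x,\theta_1 : y_0)^{-1}\big)\,\Delta_r\big(h(x,\theta_1 : y_0)\big)}{Q(\theta_0)\,\delta\big(h(x,\theta_0 : y_0)^{-1}\big)\,\Delta_r\big(h(x,\theta_0 : y_0)\big)}$$
is independent of $q$.

6.2. Locally Minimax Tests in $E_p(\mu,\Sigma)$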

In recent years considerable attention has been focused on the study of robustness of commonly used test procedures concerning the parameters of multivariate normal populations in the family of symmetric distributions. For an up-to-date reference we refer to Kariya and Sinha (1988). The criterion mostly used in such studies is the locally best invariant (LBI) property of the test procedure. Giri (1988) gave the following formulation of the locally minimax test in $E_p(\mu,\Sigma)$.

Let $\mathcal Z$ be the space of values of the maximal invariant $Z = T(X)$ and let the distribution of $T(X)$ depend on the parameter $(\delta,\eta)$, with $\delta \ge 0$ and $\eta \in \mathcal N$. For each $(\delta,\eta)$ in the parametric space of the distribution of $Z$, suppose that $f(z : \delta,\eta)$ is the probability density function of $Z$ with respect to some $\sigma$-finite measure $\nu$. Suppose the problem of testing $H_0: \theta \in \Omega_{H_0}$ against $H_1: \theta \in \Omega_{H_1}$ reduces to that of testing $H_0: \delta = 0$ against $H_1: \delta = \lambda > 0$, in the presence of the nuisance parameter $\eta$, in terms of $Z$. We are concerned with the local theory in the sense that $f(z : \lambda,\eta)$ is close to $f(z : 0,\eta)$ when $\lambda$ is small, for all $q$ in (6.2). Throughout this chapter, notations like $o(1)$ and $o(h(\lambda))$ are to be interpreted as $\lambda \to 0$ for any $q$ in (6.2). For each $\alpha$, $0 < \alpha < 1$, we now consider a critical region of the form
$$R^* = \{U(x) = U(T(x)) \ge C_\alpha\} \qquad (6.7)$$
where $U$ is bounded, positive and has a continuous distribution for each $(\delta,\eta)$, equicontinuous in $(\delta,\eta)$ with $\delta < \delta_0$ for any $q$ in (6.2), and $R^*$ satisfies
$$P_{0,\eta}(R^*) = \alpha, \qquad P_{\lambda,\eta}(R^*) = \alpha + h(\lambda) + r(\lambda,\eta) \qquad (6.8)$$
for any $q$ in (6.2), where $r(\lambda,\eta) = o(h(\lambda))$ uniformly in $\eta$, with $h(\lambda) > 0$ and $h(\lambda) = o(1)$. Without any loss of generality we shall assume $h(\lambda) = b\lambda$ with $b > 0$.
Remarks. (1) The assumption $P_{0,\eta}(R^*) = \alpha$ for any $q$ implies that the distribution of $U$ under $H_0$ is independent of the choice of $q$. The test with critical region $R^*$ satisfying this assumption is called null robust.

(2) The assumption $P_{\lambda,\eta}(R^*) = \alpha + h(\lambda) + r(\lambda,\eta)$ for any $q$ implies that the distribution of $U$ under $H_1$ is independent of the choice of $q$ as $\lambda \to 0$. The test with critical region $R^*$ satisfying this assumption is called locally non-null robust.

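Let $\xi_{0,\lambda}$ and $\xi_{1,\lambda}$ be probability measures on the sets $\{\delta = 0\}$ and $\{\delta = \lambda\}$, respectively, satisfying
$$\frac{\int f(z : \lambda,\eta)\,\xi_{1,\lambda}(d\eta)}{\int f(z : 0,\eta)\,\xi_{0,\lambda}(d\eta)} = 1 + h(\lambda)\big[g(\lambda) + r(\lambda)\,U(z)\big] + B(z,\lambda) \qquad (6.9)$$
for any $q$ in (6.2), where $0 < c_1 < r(\lambda) < c_2 < \infty$ for $\lambda$ sufficiently small, $g(\lambda) = O(1)$, and $B(z,\lambda) = o(\lambda)$ uniformly in $z$.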

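Note. If the set $\{\delta = 0\}$ is a single point, $\xi_{0,\lambda}$ assigns measure 1 to that point. In this case we obtain
$$\int \frac{f(z : \lambda,\eta)}{f(z : 0)}\,\xi_{1,\lambda}(d\eta) = 1 + h(\lambda)\big[g(\lambda) + r(\lambda)\,U(z)\big] + B(z,\lambda). \qquad (6.10)$$
Since $\tilde G$ acts transitively on the range space $\mathcal Y$ of $\psi$, the right-hand side of (6.10) is independent of $q$.

Theorem 6.2.1. If $R^*$ satisfies (6.8) and, for sufficiently small $\lambda$, there exist $\xi_{0,\lambda}$ and $\xi_{1,\lambda}$ satisfying (6.9), then $R^*$ is locally minimax for testing $H_0: \delta = 0$ against the alternative $H_1: \delta = \lambda$ for any $q$ in (6.2) as $\lambda \to 0$, i.e.
$$\lim_{\lambda \to 0}\frac{\inf_\eta P_{\lambda,\eta}(R^*) - \alpha}{\sup_{\phi_\lambda \in Q_\alpha}\inf_\eta E_{\lambda,\eta}(\phi_\lambda) - \alpha} = 1 \qquad (6.11)$$
for any $q$ in (6.2), where $Q_\alpha$ is the class of tests $\phi_\lambda$ of level $\alpha$.

Proof. Let $\tau_\lambda = \big(2 + h(\lambda)[g(\lambda) + C_\alpha r(\lambda)]\big)^{-1}$. A Bayes critical region for $(0,1)$ losses with respect to the a priori distribution $\xi_\lambda = \tau_\lambda\xi_{1,\lambda} + (1-\tau_\lambda)\xi_{0,\lambda}$ is given by
$$B_\lambda = \left\{z : \frac{\int f(z : \lambda,\eta)\,\xi_{1,\lambda}(d\eta)}{\int f(z : 0,\eta)\,\xi_{0,\lambda}(d\eta)} > \frac{1-\tau_\lambda}{\tau_\lambda}\right\}. \qquad (6.12)$$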


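By (6.10), $B_\lambda = \{z : U(z) + B(z,\lambda)/(h(\lambda)r(\lambda)) > C_\alpha\}$ for any $q$ in (6.2). Write $V_\lambda = R^* - B_\lambda$ and $W_\lambda = B_\lambda - R^*$. Since $\sup_z |B(z,\lambda)/h(\lambda)| = o(1)$ and the distribution of $U$ is continuous, letting
$$P^*_{0,\lambda}(A) = \int P_{0,\eta}(A)\,\xi_{0,\lambda}(d\eta), \qquad P^*_{1,\lambda}(A) = \int P_{\lambda,\eta}(A)\,\xi_{1,\lambda}(d\eta),$$
we get $P^*_{0,\lambda}(V_\lambda \cup W_\lambda) = o(1)$. Since, with $A = V_\lambda$ or $W_\lambda$, $P^*_{1,\lambda}(A) = P^*_{0,\lambda}(A)\,(1 + O(h(\lambda)))$, writing
$$r^*_\lambda(A) = (1-\tau_\lambda)\,P^*_{0,\lambda}(A) + \tau_\lambda\big(1 - P^*_{1,\lambda}(A)\big),$$
we get the integrated Bayes risks with respect to $\xi_\lambda$ as
$$r^*_\lambda(B_\lambda) = (1-\tau_\lambda)\,P^*_{0,\lambda}(B_\lambda) + \tau_\lambda\big(1 - P^*_{1,\lambda}(B_\lambda)\big), \qquad r^*_\lambda(R^*) = (1-\tau_\lambda)\,P^*_{0,\lambda}(R^*) + \tau_\lambda\big(1 - P^*_{1,\lambda}(R^*)\big).$$
Hence, for any $q$ in (6.2),
$$\begin{aligned}
r^*_\lambda(B_\lambda) &= r^*_\lambda(R^*) + (1-\tau_\lambda)\big(P^*_{0,\lambda}(W_\lambda) - P^*_{0,\lambda}(V_\lambda)\big) - \tau_\lambda\big(P^*_{1,\lambda}(W_\lambda) - P^*_{1,\lambda}(V_\lambda)\big)\\
&= r^*_\lambda(R^*) + (1-2\tau_\lambda)\big(P^*_{0,\lambda}(W_\lambda) - P^*_{0,\lambda}(V_\lambda)\big) + o(h(\lambda))\\
&= r^*_\lambda(R^*) + o(h(\lambda)).
\end{aligned} \qquad (6.13)$$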

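If, for some $q$ in (6.2), (6.11) is false, then by (6.8) we can find a family of tests $\{\phi_\lambda\}$ of level $\alpha$ such that $\phi_\lambda$ has power function $\alpha + r(\lambda,\eta)$ on the set $\{\delta = \lambda\}$ satisfying
$$\limsup_{\lambda \to 0}\,\big[\inf_\eta r(\lambda,\eta) - h(\lambda)\big]/h(\lambda) > 0.$$
Hence the integrated Bayes risk $r_\lambda$ of $\phi_\lambda$ with respect to $\xi_\lambda$ then satisfies
$$\limsup_{\lambda \to 0}\,\big(r^*_\lambda(R^*) - r_\lambda\big)/h(\lambda) > 0,$$
contradicting (6.13), since $B_\lambda$ is Bayes and therefore $r^*_\lambda(B_\lambda) \le r_\lambda$.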
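The two pieces of algebra used in the proof above can be written out explicitly. The first shows how the threshold of the Bayes critical region $B_\lambda$ turns the likelihood-ratio expansion (6.9) into a condition on $U$; the second shows why the middle term of (6.13) is $o(h(\lambda))$. This is a worked restatement of the proof's steps, assuming $\tau_\lambda = (2 + h(\lambda)[g(\lambda) + C_\alpha r(\lambda)])^{-1}$ as in the proof.

```latex
\frac{1-\tau_\lambda}{\tau_\lambda} = \tau_\lambda^{-1} - 1
  = 1 + h(\lambda)\bigl[g(\lambda) + C_\alpha r(\lambda)\bigr],
\qquad\text{so by (6.9)}\qquad
z \in B_\lambda \iff
1 + h(\lambda)\bigl[g(\lambda) + r(\lambda)U(z)\bigr] + B(z,\lambda)
  > 1 + h(\lambda)\bigl[g(\lambda) + C_\alpha r(\lambda)\bigr]
\iff U(z) + \frac{B(z,\lambda)}{h(\lambda)\,r(\lambda)} > C_\alpha ,
\quad\text{using } h(\lambda)r(\lambda) > 0;
\\[6pt]
1 - 2\tau_\lambda
  = \frac{h(\lambda)\bigl[g(\lambda) + C_\alpha r(\lambda)\bigr]}
         {2 + h(\lambda)\bigl[g(\lambda) + C_\alpha r(\lambda)\bigr]}
  = O(h(\lambda)),
\qquad\text{hence}\qquad
(1-2\tau_\lambda)\bigl(P^*_{0,\lambda}(W_\lambda) - P^*_{0,\lambda}(V_\lambda)\bigr)
  = O(h(\lambda))\cdot o(1) = o(h(\lambda)).
```

Because $B(z,\lambda)/(h(\lambda)r(\lambda)) \to 0$ uniformly in $z$, the Bayes region differs from $R^* = \{U \ge C_\alpha\}$ only on a set of vanishing probability, which is exactly what the symmetric differences $V_\lambda$ and $W_\lambda$ measure.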


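6.3. Examples

Example 6.3.1. Let $X = (X_{ij}) = (X_1,\ldots,X_n)'$, $X_i = (X_{i1},\ldots,X_{ip})'$, $i = 1,\ldots,n$, be an $n \times p$ ($n > p$) matrix with probability density function
$$f_X(x) = |\Sigma|^{-n/2}\,q\Big(\sum_{i=1}^{n}(x_i-\mu)'\Sigma^{-1}(x_i-\mu)\Big) \qquad (6.14)$$
where $q$ is from $[0,\infty)$ to $[0,\infty)$, $\mu = (\mu_1,\ldots,\mu_p)'$ and $\Sigma$ is a $p \times p$ positive definite matrix. We shall assume that $q$ is thrice continuously differentiable. Write, for any $b = (b_1,\ldots,b_p)'$, $b = (b'_{(1)}, b'_{(2)})'$ with $b_{(1)} = (b_1,\ldots,b_{p_1})'$, $b_{(2)} = (b_{p_1+1},\ldots,b_p)'$ and $b_{[i]} = (b_1,\ldots,b_i)'$, and for any $p \times p$ matrix
$$A = (a_{ij}) = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},$$
where $A_{ij}$ are $p_i \times p_j$ submatrices of $A$ with $p_1 + p_2 = p$. We shall denote by
$$A_{[i]} = \begin{pmatrix} a_{11} & \cdots & a_{1i} \\ \vdots & & \vdots \\ a_{i1} & \cdots & a_{ii} \end{pmatrix}$$
the upper left-hand $i \times i$ submatrix of $A$.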

We are interested here in the following three problems.

Problem 1. To test $H_{10}: \mu = 0$ against $H_{11}: \mu \ne 0$ when $\Sigma$ is unknown.

Problem 2. To test $H_{20}: \mu_{(1)} = 0$ against $H_{21}: \mu_{(1)} \ne 0$ when $\mu_{(2)}$ and $\Sigma$ are unknown.

Problem 3. To test $H_{30}: \mu = 0$ against $H_{31}: \mu_{(1)} = 0$, $\mu_{(2)} \ne 0$ when $\Sigma$ is unknown.

The normal analogues of these problems have been considered in Chapter 4.

Problem 1. Let $G_l(p)$ be the multiplicative group of $p \times p$ nonsingular matrices $g$. Problem 1 remains invariant under $G_l(p)$ transforming
$$(\bar X, S;\ \mu, \Sigma) \to (g\bar X, gSg';\ g\mu, g\Sigma g'),$$
where $\bar X = n^{-1}\sum_{i=1}^{n} X_i$ and $S = \sum_{i=1}^{n}(X_i - \bar X)(X_i - \bar X)'$. A maximal invariant in the space of $(\bar X, S)$ is $T^2 = n\bar X'S^{-1}\bar X$ or, equivalently, $R = n\bar X'(S + n\bar X\bar X')^{-1}\bar X = T^2/(1+T^2)$.

A corresponding maximal invariant in the parametric space under $G_l(p)$ is $\delta = n\mu'\Sigma^{-1}\mu$. If $q$ is convex and nonincreasing, then the Hotelling $T^2$ test which rejects $H_{10}$ whenever $R \ge C$ is uniformly most powerful invariant with respect to $G_l(p)$ for testing $H_{10}$ against $H_{11}$ (Kariya (1981)). As stated in Chapter 5, the group $G_l(p)$ does not satisfy the conditions of the Hunt-Stein theorem. However, this theorem does apply to the subgroup $G_T(p)$ of $p \times p$ nonsingular lower triangular matrices, with $p \ge 2$. From Example 5.1.1, the maximal invariant under $G_T(p)$ is $(R_1,\ldots,R_p)'$, whose distribution depends on $\Delta = (\delta_1,\ldots,\delta_p)'$. The ratio $R$ of the probability densities of $(R_1,\ldots,R_p)'$ under $H_{11}: \delta = \lambda$ and $H_{10}: \delta = 0$ is given by (with $g = (g_{ij}) \in G_T = G_T(p)$)
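$$R = \frac{\int_{G_T} p_1(gx)\prod_{i=1}^{p}(g_{ii}^2)^{(n-i)/2}\,dg}{\int_{G_T} p_0(gx)\prod_{i=1}^{p}(g_{ii}^2)^{(n-i)/2}\,dg}, \qquad (6.15)$$
where $dg = \prod_{i \ge j} dg_{ij}$, $p_1(x) = q(\operatorname{tr}x'x - 2n\bar x'\mu + \lambda)$ and $p_0(x) = q(\operatorname{tr}x'x)$. Since $x'x > 0$, there exists a $g \in G_T$ such that $gx'xg' = I$ and $\sqrt n\,g\bar x = y = (\sqrt{R_1},\ldots,\sqrt{R_p})'$. Hence
$$R = \frac{\int_{G_T} q\big(\operatorname{tr}(gg') - 2\Delta'gy + \lambda\big)\prod_{i=1}^{p}(g_{ii}^2)^{(n-i)/2}\,dg}{\int_{G_T} q\big(\operatorname{tr}(gg')\big)\prod_{i=1}^{p}(g_{ii}^2)^{(n-i)/2}\,dg}. \qquad (6.16)$$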

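Under the assumption that $q$ is thrice differentiable, we expand $q$ as
$$\begin{aligned}
q\big(\operatorname{tr}(gg') - 2\Delta'gy + \lambda\big) &= q(\operatorname{tr}gg') + \big(-2\operatorname{tr}(\Delta'gy) + \lambda\big)\,q^{(1)}(\operatorname{tr}gg') + \tfrac12\big(-2\operatorname{tr}(\Delta'gy) + \lambda\big)^2 q^{(2)}(\operatorname{tr}gg')\\
&\quad + \tfrac16\big(-2\operatorname{tr}(\Delta'gy) + \lambda\big)^3 q^{(3)}(z)
\end{aligned} \qquad (6.17)$$
where $z = \operatorname{tr}(gg') + (1-a)\big(-2\operatorname{tr}(\Delta'gy) + \lambda\big)$, $0 < a < 1$, and $q^{(k)}(u) = d^k q(u)/du^k$.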

Since the measures
$$q^{(k)}\big(\operatorname{tr}(gg')\big)\prod_{i=1}^{p}(g_{ii}^2)^{(n-i)/2}\,dg, \qquad k = 1,2,3,$$
are invariant under the change of sign of $g$ to $-g$, we conclude that
$$\int_{G_T}\operatorname{tr}(\Delta'gy)\,q^{(k)}\big(\operatorname{tr}(gg')\big)\prod_{i=1}^{p}(g_{ii}^2)^{(n-i)/2}\,dg = 0 \qquad (6.18)$$
[further relations in (6.18), stated for $i \ne j$, are not legible in the source]. Now, letting $D = \ldots$, from (6.15)-(6.18) we get

(6.19) [display not legible in the source]

The first integral in (6.19) is a finite constant $Q_1$. To evaluate the second integral in (6.19), we first note that

(6.20) [display not legible in the source]

From (6.17) and (6.19), the second integral in (6.18) reduces to

(6.21) [display not legible in the source]


To evaluate the above integral we need to evaluate the following two integrals:

(6.22) [displays not legible in the source]

Define $L = \operatorname{tr}(gg')$ and $e_i = g_{ii}^2/L$, $i = 1,\ldots,p$, and let the remaining components $e_{p+1},\ldots,e_{p(p+1)/2}$ of $e$ be the ratios $g_{ij}^2/L$, $i > j$, taken in a fixed order (the ordering in the source is not fully legible); define also

(6.23) [display not legible in the source]

Since $q(\operatorname{tr}(gg'))$ is a spherical density of $g$, $L$ and $e = (e_1,\ldots,e_{p(p+1)/2})'$ are independent, and $e$ has a Dirichlet distribution $D(\tfrac12,\ldots,\tfrac12)$. From Kariya and Eaton (1977), the probability density function of $e$ is given by
$$f(e) = \frac{\Gamma\big(p(p+1)/4\big)}{[\Gamma(1/2)]^{p(p+1)/2}}\prod_{i=1}^{p(p+1)/2-1} e_i^{1/2-1}\Big(1 - \sum_{i=1}^{p(p+1)/2-1} e_i\Big)^{1/2-1}. \qquad (6.24)$$

Using (6.22) and (6.23) we get

(6.25) [display not legible in the source]


where
$$M = E\Big[\prod_{i=1}^{p} e_i^{(n-i)/2}\Big], \qquad C = \frac{p(p+1)}{4} + \frac12\sum_{i=1}^{p}(n-i). \qquad (6.26)$$
(6.26)

Let $\eta = (\eta_1,\ldots,\eta_p)'$, $\eta_i = \delta_i/\lambda$. From (6.18) we get

(6.27) [display not legible in the source]

where $B(y,\eta,\lambda) = o(\lambda)$ uniformly in $y$ and $\eta$. Thus, from (6.27), with

(6.28) [display not legible in the source]

the equation (6.9) is satisfied by letting $\xi_0$ give measure one to the single point $\eta = 0$, while $\xi_\lambda$ gives measure one to the single point $\eta^* = (\eta^*_1,\ldots,\eta^*_p)'$ whose $j$th coordinate satisfies
$$\eta^*_j = \frac{n(n-p)}{p\,(n-j)(n-j+1)}, \quad j = 1,\ldots,p, \qquad \text{so that} \quad \sum_{j=1}^{p}\eta^*_j = 1. \qquad (6.29)$$

Since $G_l(p)$ and $G_T(p)$ satisfy the condition of Theorem 6.1.1, $R$ does not depend on $q$. The first equation in (6.8) follows from the following lemma.

Lemma 6.3.1. Let $Y = (Y_1,\ldots,Y_n)'$, $Y_i = (Y_{i1},\ldots,Y_{ip})'$, $i = 1,\ldots,n$, be an $n \times p$ random matrix with spherically symmetric probability density function
$$f_Y(y) = q(\operatorname{tr}yy') = q\Big(\sum_{i=1}^{n} y_i'y_i\Big), \qquad (6.30)$$
and let
$$Y = \begin{pmatrix} Y_{(1)} \\ Y_{(2)} \end{pmatrix},$$
where $Y_{(1)}$ and $Y_{(2)}$ are of dimensions $k \times p$ and $(n-k) \times p$ respectively, and $n - k > p > k$. The distribution of $Y_{(1)}(Y_{(2)}'Y_{(2)})^{-1}Y_{(1)}'$ does not depend on $q$.

Proof. Since $n > p$, $Y'Y$ is positive definite with probability one. Hence there exists a $p \times p$ symmetric positive definite matrix $A$ such that $Y'Y = AA$. Transform $Y$ to $U$, given by $Y = UA$. Then $U'U = I$, so $\operatorname{tr}(YY') = \operatorname{tr}(AA)$ and the density $q(\operatorname{tr}yy')$ depends on $Y$ only through $A$. As the Jacobian of the transformation $Y \to U$ is $|A|^n$ [the resulting joint density display is not legible in the source], $U$ has the uniform distribution (on the set of $n \times p$ matrices $u$ with $u'u = I$) and $U$ is independent of $A$. Partition $U$ as
$$U = \begin{pmatrix} U_{(1)} \\ U_{(2)} \end{pmatrix},$$
where $U_{(1)}$ is $k \times p$ and $U_{(2)}$ is $(n-k) \times p$. From $Y = UA$ we get $Y_{(i)} = U_{(i)}A$, $i = 1,2$. Thus
$$Y_{(1)}(Y_{(2)}'Y_{(2)})^{-1}Y_{(1)}' = U_{(1)}(U_{(2)}'U_{(2)})^{-1}U_{(1)}'.$$
Since $Y_{(1)}(Y_{(2)}'Y_{(2)})^{-1}Y_{(1)}'$ is a function of $U$ alone, it has a completely specified distribution.

Corollary 6.3.1. If $k = 1$, $Y_{(1)}(Y_{(2)}'Y_{(2)})^{-1}Y_{(1)}'$ is distributed as Hotelling's $T^2$ (central, i.e. $\delta = 0$).

Proof. Since $Y_{(1)}(Y_{(2)}'Y_{(2)})^{-1}Y_{(1)}'$ has a completely specified distribution for all $q$ in (6.30), we can, without any loss of generality, assume that $Y_1,\ldots,Y_n$ are independently distributed as $N_p(0,I)$. This implies that $Y_{(2)}'Y_{(2)} = \sum_{i=2}^{n} Y_iY_i'$ has the Wishart distribution $W_p(n-1, I)$, independently of $Y_1$. Hence $Y_{(1)}(Y_{(2)}'Y_{(2)})^{-1}Y_{(1)}'$ has Hotelling's $T^2$ distribution. From this it follows that, when $\delta = 0$, $T^2 = n\bar Y'S^{-1}\bar Y$, with $\bar Y = n^{-1}\sum_{i=1}^{n} Y_i$ and $S = \sum_{i=1}^{n}(Y_i - \bar Y)(Y_i - \bar Y)'$, has Hotelling's $T^2$ distribution for all $q$ in (6.30).

To establish the second equation in (6.8) we proceed as follows. Using (2.21), we get for any $g \in G_l(p)$ (which we will write as $G_l$)
$$\frac{f_{T^2}(t^2 \mid \delta)}{f_{T^2}(t^2 \mid 0)} = \frac{\int_{G_l} q\big(\operatorname{tr}(gg') - 2a'gy + \delta\big)\,|gg'|^{(n-p)/2}\,dg}{\int_{G_l} q(\operatorname{tr}gg')\,|gg'|^{(n-p)/2}\,dg}, \qquad (6.31)$$
where $y = (\sqrt r, 0,\ldots,0)'$ and $a = (\sqrt\delta, 0,\ldots,0)'$ are $p$-vectors and $r = t^2/(1+t^2)$.

After a Taylor expansion of $q$, the numerator of (6.31) can be written as
$$\int_{G_l} q(\operatorname{tr}gg')\,|gg'|^{(n-p)/2}\,dg + \delta\int_{G_l} q^{(1)}(\operatorname{tr}gg')\,|gg'|^{(n-p)/2}\,dg + 2\delta r\int_{G_l} q^{(2)}(\operatorname{tr}gg')\,g_{11}^2\,|gg'|^{(n-p)/2}\,dg + o(\delta),$$
where $g = (g_{ij})$. Using (6.18), we get the ratio in (6.31) as
$$1 + \delta(k + cr) + B(y,\eta,\delta), \qquad (6.32)$$
where $B(y,\eta,\delta) = o(\delta)$ uniformly in $y$ and $\eta$, with $\eta = (\eta_1,\ldots,\eta_p)'$, $\eta_i = \delta_i/\delta$, $i = 1,\ldots,p$, and $k$, $c$ are positive constants. Hence, for testing $H_{10}: \delta = 0$ against the alternative $H'_{11}: \delta = \lambda$, the Hotelling $T^2$ test with critical region
$$R^* = \{U(X) = R \ge c_\alpha\}, \qquad (6.33)$$
where $c_\alpha$ is a positive constant such that $R^*$ has size $\alpha$, satisfies
$$P_{\lambda,\eta}(R^*) = \alpha + b\lambda + g(\lambda,\eta) \qquad (6.34)$$
with $b > 0$ and $g(\lambda,\eta) = o(b\lambda)$. Thus we get the following theorem.

Theorem 6.3.1. For testing $H_{10}: \mu = 0$ against the alternative $H'_{11}: \delta = \lambda$ (specified) $> 0$, Hotelling's $T^2$ test is locally minimax as $\lambda \to 0$ for the family of distributions (6.14).

Remark 1. Since $G_l(p)$ satisfies the condition of Theorem 6.1.1, the ratio (6.31) is independent of $q$. From Corollary 6.3.1 and equations (6.31) and (6.34), it follows that Hotelling's $T^2$ test is locally best invariant for testing $H_{10}$ against $H'_{11}: \delta = \lambda$ as $\lambda \to 0$ for the family of distributions (6.14). If $q$ is a nonincreasing convex function, then Hotelling's $T^2$ test is uniformly most powerful invariant.
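The invariance facts used for Problem 1 can be seen numerically: $T^2 = n\bar x'S^{-1}\bar x$ (with $S$ the matrix of sums of squares and products, not divided by $n-1$) is unchanged by $x_i \to gx_i$ for nonsingular $g$, it is unchanged by a common scalar rescaling of the data (the special case of a spherical scale mixture, where Lemma 6.3.1 says the null law cannot depend on $q$), and $R = n\bar x'(S + n\bar x\bar x')^{-1}\bar x$ equals $T^2/(1+T^2)$. The sketch below is a plain-Python illustration with hypothetical helper names and a small hand-written solver, not a library routine.

```python
import random

def solve(a, b):
    """Solve a x = b (a is p x p, nonsingular) by Gaussian elimination with partial pivoting."""
    p = len(a)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, p):
            f = m[r][col] / m[col][col]
            for k in range(col, p + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * p
    for r in range(p - 1, -1, -1):
        x[r] = (m[r][p] - sum(m[r][k] * x[k] for k in range(r + 1, p))) / m[r][r]
    return x

def mean_and_sscp(xs):
    """Sample mean xbar and S = sum_i (x_i - xbar)(x_i - xbar)' (no division by n - 1)."""
    n, p = len(xs), len(xs[0])
    xbar = [sum(x[j] for x in xs) / n for j in range(p)]
    s = [[sum((x[i] - xbar[i]) * (x[j] - xbar[j]) for x in xs) for j in range(p)]
         for i in range(p)]
    return xbar, s

def hotelling(xs):
    """(T^2, R) with T^2 = n xbar' S^{-1} xbar and R = T^2 / (1 + T^2)."""
    n = len(xs)
    xbar, s = mean_and_sscp(xs)
    t2 = n * sum(u * v for u, v in zip(xbar, solve(s, xbar)))
    return t2, t2 / (1 + t2)

def r_direct(xs):
    """R = n xbar' (S + n xbar xbar')^{-1} xbar, computed without going through T^2."""
    n = len(xs)
    xbar, s = mean_and_sscp(xs)
    p = len(xbar)
    a = [[s[i][j] + n * xbar[i] * xbar[j] for j in range(p)] for i in range(p)]
    return n * sum(u * v for u, v in zip(xbar, solve(a, xbar)))

def transform(xs, g):
    """Apply x_i -> g x_i to every observation (g is p x p)."""
    return [[sum(g[i][k] * x[k] for k in range(len(x))) for i in range(len(g))] for x in xs]

if __name__ == "__main__":
    rng = random.Random(0)
    xs = [[rng.gauss(0.3, 1.0) for _ in range(3)] for _ in range(8)]
    g = [[2.0, 1.0, 0.0], [1.0, -1.0, 0.5], [0.5, 3.0, 1.0]]  # nonsingular, det = -5.75
    print(hotelling(xs)[0], hotelling(transform(xs, g))[0], r_direct(xs), hotelling(xs)[1])
```

The agreement of `r_direct` with $T^2/(1+T^2)$ is the Sherman-Morrison identity; the $G_l(p)$ invariance is why $T^2$ (equivalently $R$) serves as the maximal invariant in Problem 1.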

P r o b l e m 2. Write X = ( X i j , X ) ) with
( ( 2

: n x p\,X( )
2

: n x P2, and

p,+P2=

p.

Since the problem is invariant under translations of the last pa components of each Xj, j 1 , . . . , n , this can be reduced to that of testing if o : p ^ j 0
2

against H21 P(i) function is given by

0 in terms of X^

only whose marginal probability density

/^^(^Di-ISiir^gftrSr/^ij-ep'^y^^-ep^))

150

Group Invariance in Statistical

Inference

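where $\mu_{(1)}$ is a $p_1$-vector, $\Sigma_{(11)}$ is the $p_1 \times p_1$ left-hand cornered submatrix of $\Sigma$, $e = (1, \ldots, 1)'$,

$$g(\operatorname{tr} v) = \int_{R^{np_2}} q(\operatorname{tr}(v + ww'))\,dw,$$

and $q$ is a function on $[0, \infty)$ to $[0, \infty)$ satisfying (6.1a). Now, using the results of Problem 1 with $p = p_1$, $R = R_1$, we get the following theorem.

Theorem 6.3.2. For testing $H_{20}: \mu_{(1)} = 0$ against $H'_{21}: \delta_1 = n\mu'_{(1)}\Sigma_{(11)}^{-1}\mu_{(1)} = \lambda$ (specified) $> 0$, the test which rejects $H_{20}$ for large values of

$$R_1 = n\bar X'_{(1)}\big(S_{(11)} + n\bar X_{(1)}\bar X'_{(1)}\big)^{-1}\bar X_{(1)} = \frac{n\bar X'_{(1)}S_{(11)}^{-1}\bar X_{(1)}}{1 + n\bar X'_{(1)}S_{(11)}^{-1}\bar X_{(1)}}$$

is locally minimax as $\lambda \to 0$ for the family of distributions (6.34).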
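The two expressions for $R_1$ in Theorem 6.3.2 agree by the Sherman-Morrison identity. The following is a minimal Python sketch that checks this on one data set using exact rational arithmetic. The helper names and the values of $n$, $\bar X_{(1)}$ and $S_{(11)}$ are hypothetical and serve only to illustrate the algebra.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination (a small and nonsingular)."""
    m = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(a, b)]
    for col in range(m):
        piv = next(r for r in range(col, m) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(m):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def quad_inv(a, x):
    """Quadratic form x' a^{-1} x."""
    return sum(xi * zi for xi, zi in zip(x, solve(a, x)))


def r1_direct(n, xbar, s):
    """R_1 = n xbar'(S + n xbar xbar')^{-1} xbar."""
    m = len(xbar)
    w = [[s[i][j] + n * xbar[i] * xbar[j] for j in range(m)] for i in range(m)]
    return n * quad_inv(w, xbar)


def r1_via_t(n, xbar, s):
    """R_1 = t / (1 + t) with t = n xbar' S^{-1} xbar."""
    t = n * quad_inv(s, xbar)
    return t / (1 + t)


# Hypothetical example with p_1 = 2.
n = 10
xbar = [Fraction(1, 2), Fraction(-1, 3)]
s11 = [[4, 1], [1, 3]]
print(r1_direct(n, xbar, s11), r1_via_t(n, xbar, s11))
```

The direct form needs one inverse of $S_{(11)} + n\bar X_{(1)}\bar X'_{(1)}$. The second form shows that $R_1$ is a monotone function of $n\bar X'_{(1)}S_{(11)}^{-1}\bar X_{(1)}$, that is, of the $T^2$ statistic computed from the first $p_1$ components.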

Remark 2. The locally best invariant property, and the uniformly most powerful invariant property under the assumption that $q$ is nonincreasing and convex, follow from the results of Problem 1.

Problem 3. The invariance property of this problem has been discussed in Chapter 4. The problem of testing $H_{30}$ against $H_{31}$ remains invariant under the multiplicative group $G$ of $p \times p$ nonsingular matrices

$$g = \begin{pmatrix} g_{(11)} & 0 \\ g_{(21)} & g_{(22)} \end{pmatrix},$$

where $g_{(11)}$ is $p_1 \times p_1$, transforming $(\bar X, S) \to (g\bar X, gSg')$. A maximal invariant in the sample space under $G$ is $(\bar R_1, \bar R_2)$, where

$$\bar R_1 = \sum_{i=1}^{p_1} R_i, \qquad \bar R_1 + \bar R_2 = R = \sum_{i=1}^{p} R_i.$$

A corresponding maximal invariant in the parametric space is $(\bar\delta_1, \bar\delta_2)$, where

$$\bar\delta_1 = \sum_{i=1}^{p_1}\delta_i, \qquad \bar\delta_1 + \bar\delta_2 = \delta = \sum_{i=1}^{p}\delta_i.$$

For invariant tests under $G$ the problem is reduced to testing $H_{30}: \bar\delta_2 = 0$ against the alternatives $H_{31}: \bar\delta_2 > 0$, when it is given that $\bar\delta_1 = 0$, in terms of $\bar R_1$ and $\bar R_2$.
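The statistics $\bar R_1$ and $\bar R_2$ are partial sums of the components $R_i$, and these components can be computed from a Cholesky factor. The Python sketch below assumes the usual construction, for which the text refers to Problem 1 and Example 5.1.5: with $W = S + n\bar X\bar X' = TT'$ and $T$ lower triangular, set $y = \sqrt{n}\,T^{-1}\bar X$ and $R_i = y_i^2$. Under this construction $R_1 + \cdots + R_k = n\bar X_{[k]}'W_{[kk]}^{-1}\bar X_{[k]}$. The sketch then checks numerically that the $R_i$ are unchanged under $(\bar X, S) \to (g\bar X, gSg')$ for a lower triangular $g$. All function names and numbers are hypothetical.

```python
import math


def cholesky(a):
    """Lower-triangular T with T T' = a, for symmetric positive definite a."""
    m = len(a)
    t = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = a[i][j] - sum(t[i][k] * t[j][k] for k in range(j))
            t[i][j] = math.sqrt(s) if i == j else s / t[j][j]
    return t


def forward_sub(t, b):
    """Solve t y = b for lower-triangular t."""
    y = []
    for i, bi in enumerate(b):
        y.append((bi - sum(t[i][k] * y[k] for k in range(i))) / t[i][i])
    return y


def r_components(n, xbar, s):
    """R_i = y_i^2 with y = sqrt(n) T^{-1} xbar and T T' = S + n xbar xbar'."""
    m = len(xbar)
    w = [[s[i][j] + n * xbar[i] * xbar[j] for j in range(m)] for i in range(m)]
    y = forward_sub(cholesky(w), [math.sqrt(n) * v for v in xbar])
    return [v * v for v in y]


def act(g, xbar, s):
    """Group action (xbar, S) -> (g xbar, g S g')."""
    m = len(xbar)
    gx = [sum(g[i][k] * xbar[k] for k in range(m)) for i in range(m)]
    gs = [[sum(g[i][k] * s[k][l] * g[j][l] for k in range(m) for l in range(m))
           for j in range(m)] for i in range(m)]
    return gx, gs


n, p1 = 12, 1
xbar = [0.4, -0.2, 0.7]
s = [[5.0, 1.0, 0.5], [1.0, 4.0, -0.3], [0.5, -0.3, 3.0]]
r = r_components(n, xbar, s)
rbar1, rbar2 = sum(r[:p1]), sum(r[p1:])
g = [[2.0, 0.0, 0.0], [-1.0, 0.5, 0.0], [0.3, 1.2, -1.5]]  # lower triangular, nonsingular
r_g = r_components(n, *act(g, xbar, s))
```

A negative diagonal entry of $g$ only flips the sign of some $y_i$, so the squares $R_i$ are unaffected.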


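As stated in Chapter 5, the group $G$ does not satisfy the conditions of the Hunt-Stein theorem. However, this theorem does apply to the subgroup $G_T(p)$ of $p \times p$ nonsingular lower triangular matrices. From Problem 1, the maximal invariant under $G_T(p)$ is $(R_1, \ldots, R_p)'$, whose distribution depends continuously on $\Delta = (\delta_1, \ldots, \delta_p)'$ with $\delta_i \ge 0$ for all $i$. Under $H_{30}$, $\delta_i = 0$, $i = 1, \ldots, p$. Under $H_{31}$, $\delta_i = 0$, $i = 1, \ldots, p_1$. The ratio $R$ of the probability densities of $(R_1, \ldots, R_p)'$ under $H'_{31}: \bar\delta_2 = \lambda > 0$ and under $H_{30}: \bar\delta_2 = 0$, when it is given that $\bar\delta_1 = 0$, is given by

$$R = \frac{\int_{G_T} q\big(\operatorname{tr} gg' - 2\lambda^{1/2}(g_{(21)}y_{(1)} + g_{(22)}y_{(2)})'\eta_{(2)}^{1/2} + \lambda\big)\prod_{i=1}^{p}(g_{ii}^2)^{(n-i)/2}\,dg}{\int_{G_T} q(\operatorname{tr} gg')\prod_{i=1}^{p}(g_{ii}^2)^{(n-i)/2}\,dg}, \qquad (6.35)$$

where $g \in G_T(p) = G_T$, $y = (\sqrt{r_1}, \ldots, \sqrt{r_p})' = (y'_{(1)}, y'_{(2)})'$, $\eta_{(2)}^{1/2} = (\eta_{p_1+1}^{1/2}, \ldots, \eta_p^{1/2})'$ with $\eta_i = \delta_i/\lambda$, and

$$g = \begin{pmatrix} g_{(11)} & 0 \\ g_{(21)} & g_{(22)} \end{pmatrix}$$

with $g_{(11)}$ and $g_{(22)}$ both lower triangular matrices.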

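As before, we expand $q$ in the numerator of (6.35) as

$$q(\operatorname{tr} gg' - 2v + \lambda) = q(\operatorname{tr} gg') + (-2v + \lambda)\,q^{(1)}(\operatorname{tr} gg') + \tfrac{1}{2}(-2v + \lambda)^2\,q^{(2)}(z), \qquad (6.36)$$

where

$$z = \operatorname{tr} gg' + (1 - \theta)(-2v + \lambda), \quad 0 < \theta < 1, \qquad v = \lambda^{1/2}(g_{(21)}y_{(1)} + g_{(22)}y_{(2)})'\eta_{(2)}^{1/2}.$$

Using (6.18), the integration of the second term in (6.36) with respect to the measure

$$\prod_{i=1}^{p}(g_{ii}^2)^{(n-i)/2}\,dg \qquad (6.37)$$

gives $\lambda Q_1$, where

$$Q_1 = \int_{G_T} q^{(1)}(\operatorname{tr} gg')\prod_{i=1}^{p}(g_{ii}^2)^{(n-i)/2}\,dg.$$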

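To integrate the third term in (6.36) with respect to the measure given in (6.37), we first observe that, by (6.18), the cross-product terms in $v^2$ integrate to zero, so that

$$\int_{G_T}\big((g_{(21)}y_{(1)})'\eta_{(2)}^{1/2}\big)^2 q^{(2)}(\operatorname{tr} gg')\prod_{k=1}^{p}(g_{kk}^2)^{(n-k)/2}\,dg = \int_{G_T}\sum_{j=p_1+1}^{p}\eta_j\sum_{i=1}^{p_1} g_{ji}^2\,r_i\;q^{(2)}(\operatorname{tr} gg')\prod_{k=1}^{p}(g_{kk}^2)^{(n-k)/2}\,dg \qquad (6.38)$$

and

$$\int_{G_T}\big((g_{(22)}y_{(2)})'\eta_{(2)}^{1/2}\big)^2 q^{(2)}(\operatorname{tr} gg')\prod_{k=1}^{p}(g_{kk}^2)^{(n-k)/2}\,dg = \int_{G_T}\sum_{j=p_1+1}^{p}\eta_j\Big(\sum_{i=p_1+1}^{j-1} g_{ji}^2\,r_i + g_{jj}^2\,r_j\Big)q^{(2)}(\operatorname{tr} gg')\prod_{k=1}^{p}(g_{kk}^2)^{(n-k)/2}\,dg. \qquad (6.39)$$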

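Denote

$$M = \int_{G_T}(\operatorname{tr} gg')\,q^{(2)}(\operatorname{tr} gg')\prod_{i=1}^{p}(g_{ii}^2)^{(n-i)/2}\,dg, \qquad L = \operatorname{tr} gg', \qquad D = \int_{G_T} q(\operatorname{tr} gg')\prod_{i=1}^{p}(g_{ii}^2)^{(n-i)/2}\,dg. \qquad (6.40)$$

From (6.35)-(6.40), using the $e$'s defined in (6.23) to write $g_{ji}^2 = Le_{ji}$, we get

$$R = 1 + \lambda\Big[\frac{Q_1}{D} + \frac{2M}{npD}\sum_{j=p_1+1}^{p}\eta_j\Big(\sum_{i<j} r_i + (n-j+1)\,r_j\Big)\Big] + B(y, \eta, \lambda), \qquad (6.41)$$

where $B(y, \eta, \lambda) = o(\lambda)$ uniformly in $y, \eta$ (independently of $q$, by Theorem 6.1.1). The set $\{\lambda = 0\}$ is the single point $\eta = 0$. So the measure $\xi_0$ in Theorem 6.2.1 assigns measure 1 to the single point $\eta = 0$. The set $\{\bar\delta_2 = \lambda\}$ is a convex $p_2 = (p - p_1)$-dimensional Euclidean set wherein each component is $O(h(\lambda))$. Any probability measure $\xi_\lambda$ can be replaced by the degenerate measure which assigns measure 1 to the mean $\eta_i^*$, $i = p_1 + 1, \ldots, p$, of $\xi_\lambda$. Hence

$$R = 1 + \lambda\Big[\frac{Q_1}{D} + \frac{2M}{npD}\sum_{j=p_1+1}^{p}\eta_j^*\Big(\sum_{i<j} r_i + (n-j+1)\,r_j\Big)\Big] + B(y, \eta, \lambda), \qquad (6.42)$$

where $B(y, \eta, \lambda) = o(h(\lambda))$ uniformly in $y, \eta$.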

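Consider the rejection region

$$R_K = \{X : U(X) = \bar R_1 + K\bar R_2 \ge c_0\}, \qquad (6.43)$$

where $K$ is a constant such that (6.42) is reduced to yield (6.9), and $c_0$ depends on the level of significance $\alpha$ of the test for the chosen $K$, independently of $q$ (by Lemma 6.3.1). Now choose

$$\eta_j^* = \frac{(n-p)(n-p_1)}{p_2\,(n-j)(n-j+1)}, \qquad j = p_1 + 1, \ldots, p, \qquad (6.44)$$

so that $\sum_{j=p_1+1}^{p}\eta_j^* = 1$ and

$$\sum_{j=p_1+1}^{p}\eta_j^*\Big(\sum_{i<j} r_i + (n-j+1)\,r_j\Big) = \bar r_1 + \frac{n-p_1}{p_2}\,\bar r_2.$$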

Hence we can conclude that the test with rejection region

$$R' = \Big\{X : U(X) = \bar R_1 + \frac{n-p_1}{p_2}\,\bar R_2 \ge c_0\Big\}, \qquad (6.45)$$

with $P_0(R') = \alpha$, satisfies (6.9) as $\lambda \to 0$. Furthermore, to satisfy (6.9) as $\lambda \to 0$ for some $\xi_\lambda$, any region $R_K$ of the form (6.43) must have $K = (n-p_1)/p_2$.
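The weights in (6.44) can be checked with exact arithmetic. The sketch rests on one assumption of its own: the first-order term of the density ratio involves $r$ only through $\sum_j \eta_j\big(\sum_{i<j} r_i + (n-j+1)r_j\big)$. It verifies that the $\eta_j^*$ sum to 1 and that they turn this combination into $\bar r_1 + \frac{n-p_1}{p_2}\bar r_2$, which is why $K = (n-p_1)/p_2$ in (6.45). The function names and the choice $n = 20$, $p = 5$, $p_1 = 2$ are hypothetical.

```python
from fractions import Fraction


def eta_star(n, p, p1):
    """eta*_j = (n-p)(n-p1) / (p2 (n-j)(n-j+1)), j = p1+1, ..., p  (eq. 6.44)."""
    p2 = p - p1
    return {j: Fraction((n - p) * (n - p1), p2 * (n - j) * (n - j + 1))
            for j in range(p1 + 1, p + 1)}


def r_coefficients(n, p, p1):
    """Coefficient of r_i in sum_j eta*_j (sum_{i<j} r_i + (n-j+1) r_j)."""
    eta = eta_star(n, p, p1)
    coef = {}
    for i in range(1, p + 1):
        c = sum((w for j, w in eta.items() if j > i), Fraction(0))  # r_i in later terms
        if i in eta:
            c += (n - i + 1) * eta[i]                                 # diagonal term
        coef[i] = c
    return coef


n, p, p1 = 20, 5, 2
eta = eta_star(n, p, p1)
coef = r_coefficients(n, p, p1)
```

Each $r_i$ with $i \le p_1$ picks up every weight once, so its coefficient is $\sum_j \eta_j^* = 1$. Each $r_i$ with $i > p_1$ ends up with the common coefficient $(n - p_1)/p_2$.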

It can be shown (Giri (1988)) that

$$\frac{f_{\bar R_1,\bar R_2}(\bar r_1, \bar r_2 \mid H_{31})}{f_{\bar R_1,\bar R_2}(\bar r_1, \bar r_2 \mid H_{30})} = 1 + \bar\delta_2\Big[-a_1 + a_2\Big(\bar r_1 + \frac{n-p_1}{p_2}\,\bar r_2\Big)\Big] + B(\bar r_1, \bar r_2, \bar\delta_2), \qquad (6.46)$$

where $a_1, a_2$ are positive constants and $B(\bar r_1, \bar r_2, \bar\delta_2) = o(\bar\delta_2)$ uniformly in $\bar r_1, \bar r_2$. From this and Theorem 6.1.1 it follows that, for testing $H_{30}$ against $H_{31}$, the test with critical region $R'$ is locally best invariant as $\bar\delta_2 \to 0$. Hotelling's test, which rejects $H_{30}$ whenever $\bar r_1 + \bar r_2 \ge$ constant, does not coincide with the LBI test and hence it is locally worse. From (6.32) it follows that Hotelling's test, whose power depends only on $\delta = \bar\delta_2$, has positive derivative at $\bar\delta_2 = 0$. Thus, for the critical region of Hotelling's test, the value of $h(\lambda)$ with $\bar\delta_2 = \lambda$ in (6.8) is positive. The first equation in (6.8) follows from Lemma 6.3.1. Hence we get the following theorem.

Theorem 6.2.2. For testing $H_{30}$ against $H'_{31}: \bar\delta_2 = \lambda$, the test which rejects $H_{30}$ whenever $\bar r_1 + \frac{n-p_1}{p_2}\,\bar r_2 \ge$ constant is locally minimax as $\lambda \to 0$ for the family of distributions (6.2).

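Example 6.3.2. Let $X = (X_{ij}) = (X_1, \ldots, X_n)'$, $X_i = (X_{i1}, \ldots, X_{ip})'$, $i = 1, \ldots, n$, be an $n \times p$ random matrix with probability density function given in (6.2). Let $\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $S = \sum_{i=1}^{n}(X_i - \bar X)(X_i - \bar X)'$. For any $p \times p$ matrix $A$ we write

$$A = \begin{pmatrix} A_{(11)} & A_{(12)} & A_{(13)} \\ A_{(21)} & A_{(22)} & A_{(23)} \\ A_{(31)} & A_{(32)} & A_{(33)} \end{pmatrix} \qquad (6.47)$$

with $A_{(11)}: 1 \times 1$, $A_{(22)}: p_1 \times p_1$, $A_{(33)}: p_2 \times p_2$, so that $p_1 + p_2 = p - 1$. Define

$$\rho_1^2 = \frac{\Sigma_{(12)}\Sigma_{(22)}^{-1}\Sigma_{(21)}}{\Sigma_{(11)}}, \qquad \rho_1^2 + \rho_2^2 = \frac{(\Sigma_{(12)}, \Sigma_{(13)})\begin{pmatrix}\Sigma_{(22)} & \Sigma_{(23)} \\ \Sigma_{(32)} & \Sigma_{(33)}\end{pmatrix}^{-1}(\Sigma_{(12)}, \Sigma_{(13)})'}{\Sigma_{(11)}}.$$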

We shall consider the following three testing problems (see Sec. 4.4):

Problem 4. To test $H_0: \rho^2 = 0$ against $H_1: \rho^2 > 0$ when $\mu, \Sigma$ are unknown.

Problem 5. To test $H_0: \rho^2 = 0$ against $H_1: \rho_1^2 > 0, \rho_2^2 = 0$ when $\mu, \Sigma$ are unknown.

Problem 6. To test $H_0: \rho^2 = 0$ against $H_1: \rho_1^2 = 0, \rho_2^2 > 0$ when $\mu, \Sigma$ are unknown.

These problems remain invariant under the group of translations transforming $(\bar X, S; \mu, \Sigma) \to (\bar X + b, S; \mu + b, \Sigma)$, $b \in R^p$.


The effect of these transformations is to reduce these problems to corresponding ones where $\mu = 0$ and $S = \sum_{i=1}^{n} X_i X_i'$. Thus we reduce $n$ by 1. We now treat the latter formulation with $\mu = 0$ and assume that $n > p > 2$, so that $S$ is positive definite with probability one. Let $G$ be the group of $p \times p$ nonsingular matrices $g$ of the form

$$g = \begin{pmatrix} g_{(11)} & 0 & 0 \\ 0 & g_{(22)} & 0 \\ 0 & g_{(32)} & g_{(33)} \end{pmatrix}. \qquad (6.48)$$

For the first problem $p_2 = 0$ and $p_1 = p - 1$. For the other two problems $p_1 > 0$, $p_2 > 0$ with $p_1 + p_2 = p - 1$. The problems above remain invariant under $G$ operating as $(S; \Sigma) \to (gSg'; g\Sigma g')$ with $g \in G$.

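For Problem 4 a maximal invariant in the space of $S$ under $G$ (with $p_2 = 0$) is

$$R = \frac{S_{(12)}S_{(22)}^{-1}S_{(21)}}{S_{(11)}},$$

and a corresponding maximal invariant in the parametric space of $\Sigma$ under the induced group (with $p_2 = 0$) is

$$\rho^2 = \frac{\Sigma_{(12)}\Sigma_{(22)}^{-1}\Sigma_{(21)}}{\Sigma_{(11)}}. \qquad (6.49)$$

For Problems 5 and 6 a maximal invariant under $G$ is $(\bar R_1, \bar R_2)$, where

$$\bar R_1 = \frac{S_{(12)}S_{(22)}^{-1}S_{(21)}}{S_{(11)}}, \qquad (6.50)$$

$$\bar R_1 + \bar R_2 = \frac{(S_{(12)}, S_{(13)})\begin{pmatrix} S_{(22)} & S_{(23)} \\ S_{(32)} & S_{(33)}\end{pmatrix}^{-1}(S_{(12)}, S_{(13)})'}{S_{(11)}}. \qquad (6.51)$$

A corresponding maximal invariant in the parametric space of $\Sigma$ is $(\rho_1^2, \rho_2^2)$, where

$$\rho_1^2 = \frac{\Sigma_{(12)}\Sigma_{(22)}^{-1}\Sigma_{(21)}}{\Sigma_{(11)}}, \qquad \rho_1^2 + \rho_2^2 = \frac{(\Sigma_{(12)}, \Sigma_{(13)})\begin{pmatrix}\Sigma_{(22)} & \Sigma_{(23)} \\ \Sigma_{(32)} & \Sigma_{(33)}\end{pmatrix}^{-1}(\Sigma_{(12)}, \Sigma_{(13)})'}{\Sigma_{(11)}}. \qquad (6.52)$$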
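The statistics (6.50) and (6.51) are sample squared multiple correlations of the first variable, regressed on the first $p_1$ of the remaining variables and on all $p - 1$ of them. The Python sketch below computes them and checks that they are unchanged when $S$ is replaced by $gSg'$ with $g$ of the block form (6.48). The helper names and the numbers ($p = 4$, $p_1 = 2$, $p_2 = 1$) are hypothetical.

```python
import math


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small nonsingular a."""
    m = len(a)
    aug = [[float(v) for v in row] + [float(bi)] for row, bi in zip(a, b)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, m):
            f = aug[r][c] / aug[c][c]
            aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    x = [0.0] * m
    for r in reversed(range(m)):
        x[r] = (aug[r][m] - sum(aug[r][k] * x[k] for k in range(r + 1, m))) / aug[r][r]
    return x


def sq_mult_corr(s, idx):
    """S_(1,idx) S_(idx,idx)^{-1} S_(idx,1) / S_11, with variable 0 as the response."""
    sub = [[s[i][j] for j in idx] for i in idx]
    cross = [s[0][i] for i in idx]
    return sum(c * z for c, z in zip(cross, solve(sub, cross))) / s[0][0]


def rbar(s, p1):
    """(Rbar_1, Rbar_2) of (6.50)-(6.51)."""
    p = len(s)
    r1 = sq_mult_corr(s, list(range(1, p1 + 1)))
    return r1, sq_mult_corr(s, list(range(1, p))) - r1


def congruence(g, s):
    """g S g'."""
    m = len(s)
    return [[sum(g[i][k] * s[k][l] * g[j][l] for k in range(m) for l in range(m))
             for j in range(m)] for i in range(m)]


s = [[4.0, 1.0, 0.8, 0.5], [1.0, 3.0, 0.4, 0.2],
     [0.8, 0.4, 2.5, 0.3], [0.5, 0.2, 0.3, 2.0]]
g = [[1.5, 0.0, 0.0, 0.0],      # g_(11)
     [0.0, 2.0, 0.3, 0.0],      # g_(22) occupies rows/columns 1-2
     [0.0, -0.7, 1.1, 0.0],
     [0.0, 0.4, 0.9, -1.3]]     # g_(32) = (0.4, 0.9), g_(33) = -1.3
before = rbar(s, 2)
after = rbar(congruence(g, s), 2)
same = all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(before, after))
```

The block of zeros above $g_{(33)}$ is what keeps $\bar R_1$ invariant: the first $p_1$ regressors are transformed only among themselves.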


Problem 4. The invariance of this problem has been discussed in Sec. 4.4. The group $G$ does not satisfy the conditions of the Hunt-Stein theorem. The subgroup $G_T$ of $G$, given by

$$G_T = \left\{ g = \begin{pmatrix} g_{(11)} & 0 & 0 \\ 0 & g_{(22)} & 0 \\ 0 & g_{(32)} & g_{(33)} \end{pmatrix} \right\},$$

where $g_{(22)}$ ($p_1 \times p_1$) and $g_{(33)}$ ($p_2 \times p_2$) are nonsingular lower triangular matrices, satisfies the conditions of the theorem. Note that we use the same notation $g$ for the elements of $G_T$ as for the elements of $G$. In place of $R$, the maximal invariant in the sample space under $G_T$ is the $(p-1)$-dimensional vector $R = (R_2, \ldots, R_p)'$ defined in Example 5.1.5, with

$$\sum_{j=2}^{p_1+1} R_j = \bar R_1, \qquad \sum_{j=2}^{p} R_j = \bar R_1 + \bar R_2.$$

The corresponding maximal invariant in the parametric space under $G_T$ is $\Delta = (\delta_2, \ldots, \delta_p)'$, as defined in Example 5.1.5, with

$$\sum_{i=2}^{p_1+1}\delta_i = \rho_1^2, \qquad \sum_{i=2}^{p}\delta_i = \rho_1^2 + \rho_2^2 = \rho^2.$$

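For testing $H_0: \rho^2 = 0$ against the specified alternative $H'_1: \rho^2 = \lambda$, the ratio $R$ of the probability density function of $R$ under $H'_1$ to that under $H_0$ is, using Theorem 2.7.1,

$$R = \frac{f_R(r \mid H'_1)}{f_R(r \mid H_0)} = (1-\lambda)^{-n/2}\,\frac{\int_{G_T} q\big((1-\lambda)^{-1}\big[g_{(11)}^2 - 2g_{(11)}\Delta'g_{(22)}y + \operatorname{tr}(Cg_{(22)}g'_{(22)})\big]\big)\,\nu(dg)}{\int_{G_T} q(\operatorname{tr} gg')\,\nu(dg)}, \qquad (6.53)$$

where $y = (\sqrt{r_2}, \ldots, \sqrt{r_p})'$.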


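Since the $(p-1) \times (p-1)$ matrix $C = (1-\rho^2)(I - \Delta\Delta')^{-1}$ is positive definite, there exists a $(p-1) \times (p-1)$ lower triangular nonsingular matrix $T$ such that $TCT' = I$. Let $\alpha = (\alpha_2, \ldots, \alpha_p)' = T\Delta$, and define $\gamma_i$, $\delta_i$, $2 \le i \le p$, by

$$\gamma_i = 1 - \sum_{j=2}^{i}\alpha_j^2, \qquad \delta_i = \frac{\alpha_i^2\,\gamma_p}{\gamma_i\,\gamma_{i-1}}, \qquad \gamma_1 = 1.$$

Since $C\Delta = \Delta$ and $T'T = C^{-1}$, we get

$$\alpha'\alpha = \Delta'(T'T)\Delta = \Delta'C^{-1}\Delta = (1-\rho^2)^{-1}\big[(\Delta'\Delta) - (\Delta'\Delta)(\Delta'\Delta)\big] = \rho^2.$$

Obviously $\gamma_p = 1 - \rho^2$, and $\sum_{i=2}^{p}\delta_i = \rho^2$. Since $|C| = (1-\rho^2)^{p-2}$, we can write $R$ as a positive factor depending only on $\lambda$ times

$$\frac{\int_{G_T} q\big((1-\lambda)^{-1}\big[g_{(11)}^2 - 2g_{(11)}\alpha'g_{(22)}y + \operatorname{tr} g_{(22)}g'_{(22)}\big]\big)\,\nu(dg)}{\int_{G_T} q(\operatorname{tr} gg')\,\nu(dg)}. \qquad (6.54)$$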
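The parametrization by $\gamma_i$ and $\delta_i$ telescopes. Because $\alpha_i^2 = \gamma_{i-1} - \gamma_i$, we have $\delta_i = \gamma_p(1/\gamma_i - 1/\gamma_{i-1})$, so $\sum_{i=2}^{p}\delta_i = 1 - \gamma_p = \rho^2$. Below is a small exact-arithmetic check in Python. It assumes the convention $\gamma_1 = 1$ and uses a hypothetical vector $\alpha$ with $\alpha'\alpha < 1$.

```python
from fractions import Fraction


def gammas_and_deltas(alpha):
    """alpha = [alpha_2, ..., alpha_p] with sum of squares < 1.

    gamma_1 = 1 (assumed convention), gamma_i = 1 - sum_{j=2}^{i} alpha_j^2,
    delta_i = alpha_i^2 gamma_p / (gamma_i gamma_{i-1}), i = 2, ..., p.
    """
    gammas = [Fraction(1)]                      # gamma_1
    for a in alpha:
        gammas.append(gammas[-1] - a * a)       # gamma_2, ..., gamma_p
    gamma_p = gammas[-1]
    deltas = [a * a * gamma_p / (gammas[k + 1] * gammas[k])
              for k, a in enumerate(alpha)]
    return gammas, deltas


alpha = [Fraction(1, 3), Fraction(1, 4), Fraction(1, 5)]   # hypothetical, p = 4
rho2 = sum(a * a for a in alpha)
gammas, deltas = gammas_and_deltas(alpha)
```

With exact fractions the identity $\sum_i \delta_i = \rho^2$ holds with no rounding.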

Expanding the integrand in (6.54) as

$$q(\operatorname{tr} gg') + q^{(1)}(\operatorname{tr} gg')\big(-2g_{(11)}\alpha'g_{(22)}y\big) + \tfrac{1}{2}q^{(2)}(z)\big(-2g_{(11)}\alpha'g_{(22)}y\big)^2 + \cdots,$$

where

$$z = \operatorname{tr} gg' + (1-\theta)\big(-2g_{(11)}\alpha'g_{(22)}y\big), \qquad 0 < \theta < 1,$$

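and since, with $g_{(22)} = (g_{ij})$,

$$g_{(11)}\alpha'g_{(22)}y = g_{(11)}\sum_{j=2}^{p}\alpha_j\Big(\sum_{i=2}^{j} g_{ji}\,r_i^{1/2}\Big),$$

while the measures $q^{(k)}(\operatorname{tr} gg')\,\nu(dg)$, $k = 1, 2$, are invariant under changes of sign of the rows and columns of $g$ (in particular under $g_{(11)} \to -g_{(11)}$), we conclude, as in (6.18), that

$$\int_{G_T} g_{(11)}\alpha'g_{(22)}y\;q^{(1)}(\operatorname{tr} gg')\,\nu(dg) = 0, \qquad \int_{G_T} g_{ij}\,g_{i'j'}\,q^{(2)}(\operatorname{tr} gg')\,\nu(dg) = 0 \quad \text{for } (i,j) \ne (i',j'). \qquad (6.55)$$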

Let $L = \operatorname{tr} gg'$, $Le_1 = g_{(11)}^2$, $Le_i = g_{ii}^2$, $i = 2, \ldots, p$, and define the remaining $e$'s in the same way from the squared off-diagonal elements of $g_{(22)}$. Now proceeding as in Problem 3 we get

$$R = 1 + \lambda\Big[c_0 + c_1\sum_{j=2}^{p}\eta_j\Big(\sum_{i=2}^{j-1} r_i + (n-j+2)\,r_j\Big)\Big] + B(y, \lambda, \eta), \qquad (6.56)$$

where $\eta_j = \delta_j/\lambda$, $j = 2, \ldots, p$, $c_0$ and $c_1 > 0$ are constants not depending on $r$ or $\eta$, and $B(y, \lambda, \eta) = o(\lambda)$ uniformly in $y$ and $\eta$. Furthermore, from Theorem 6.1.1, $R$ does not depend on $q$. Since $\lambda = 0$ is the single point $\eta = 0$, $\xi_0$ assigns measure 1 to the point $\eta = 0$. The set $\rho^2 = \lambda$ is a convex $(p-1)$-dimensional Euclidean set wherein each component is $O(\lambda)$. Thus any probability measure $\xi_\lambda$ can be replaced by the degenerate measure which assigns measure 1 to the mean $\eta_i^*$, $i = 2, \ldots, p$, of $\xi_\lambda$. Choosing

$$\eta_j^* = \frac{n(n-p+1)}{(p-1)(n-j+1)(n-j+2)}, \qquad j = 2, \ldots, p, \qquad (6.57)$$

we see that (6.9) is satisfied with $U = \sum_{j=2}^{p} R_j = R$. Using Theorem 2.7.1, the ratio $f_R(r \mid H_1)/f_R(r \mid H_0)$ of the densities of the $G$-maximal invariant $R$ can likewise be written (Giri (1988)) as a ratio of integrals over $g \in G$, as defined in (6.48) with $p_2 = 0$. In these integrals the sample enters through a vector $u = (u_1, \ldots, u_{p-1})'$ satisfying $u'u = r$.

If $\rho^2$ is small, the power against the alternative $H_1$ of the level-$\alpha$ test which rejects $H_0$ whenever $R \ge C$ is given by $\alpha + h(\rho^2) + o(\rho^2)$, where $h(\rho^2) > 0$ for $\rho^2 > 0$ and $h(\rho^2) = o(1)$. The null robustness of the test follows from the fact that the statistic $T(X) = R$ satisfies $T(X) = T(Xh)$ for $h \in G_T$ (Kariya (1981b)). Hence we get the following theorem.

Theorem 6.2.3. For testing $H_0$ against $H'_1: \rho^2 = \lambda$, the level-$\alpha$ test which rejects $H_0$ whenever $R \ge C$ is locally minimax as $\lambda \to 0$.

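Problem 5. Here $\eta_j = 0$, $j = p_1 + 2, \ldots, p$. For testing $H_0$ against $H'_1: \rho_1^2 = \lambda$ we can write, using (6.55),

$$R = 1 + \lambda\Big[c_0 + c_1\sum_{j=2}^{p_1+1}\eta_j\Big(\sum_{i=2}^{j-1} r_i + (n-j+2)\,r_j\Big)\Big] + B(r, \lambda, \eta), \qquad (6.58)$$

where $c_0$ and $c_1 > 0$ are constants not depending on $r$ or $\eta$, and $B(r, \lambda, \eta) = o(\lambda)$ uniformly in $r$ and $\eta$. Now let $\xi_0$ give measure one to $\eta = 0$, and let $\xi_\lambda$ give measure one to the mean $\eta^*$, where

$$\eta_j^* = \frac{n(n-p_1)}{p_1(n-j+1)(n-j+2)}, \qquad j = 2, \ldots, p_1 + 1. \qquad (6.59)$$

We conclude that (6.9) is satisfied with $U = \bar R_1 = \sum_{j=2}^{p_1+1} R_j$. The following theorem then follows as Theorem 6.2.3 does.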

Theorem 6.2.4. For testing $H_0$ against $H'_1: \rho_1^2 = \lambda$, the level-$\alpha$ test which rejects $H_0$ whenever $\bar R_1 \ge C_\alpha$, where the constant $C_\alpha$ depends on the level $\alpha$ of the test, is locally minimax as $\lambda \to 0$.

Problem 6. Here $\eta_j = 0$, $j = 2, \ldots, p_1 + 1$. For testing $H_0$ against $H'_1: \rho_2^2 = \lambda$ we can write $R$, as $\lambda \to 0$, as

$$R = 1 + \lambda\Big[c_0 + c_1\sum_{j=p_1+2}^{p}\eta_j\Big(\sum_{i=2}^{j-1} r_i + (n-j+2)\,r_j\Big)\Big] + B(r, \lambda, \eta), \qquad (6.60)$$

where $c_0$ and $c_1 > 0$ are constants not depending on $r$ or $\eta$, and $B(r, \lambda, \eta) = o(\lambda)$ uniformly in $r$ and $\eta$. Let us now consider a rejection region of the form

$$R_K = \{y : U(y) = \bar r_1 + K\bar r_2 \ge C_\alpha\}, \qquad (6.61)$$

where $K$ is chosen such that (6.60) is reduced to yield (6.9), and $C_\alpha$ depends on the level $\alpha$ of the test for the chosen $K$. Let $\xi_0$ give measure one to $\eta = 0$, and let $\xi_\lambda$ give measure one to the single point $(0, \ldots, 0, \eta^*_{p_1+2}, \ldots, \eta^*_p)$, where

$$\eta_j^* = \frac{(n-p+1)(n-p_1)}{p_2(n-j+1)(n-j+2)}, \qquad j = p_1 + 2, \ldots, p.$$

Let

$$\mathcal{R} = \Big\{y : U(y) = \bar r_1 + \frac{n-p_1}{p_2}\,\bar r_2 \ge C_\alpha\Big\} \quad \text{with } P_0(\mathcal{R}) = \alpha,$$

where $p_1 + p_2 = p - 1$. Observe that the rejection region $\mathcal{R}$ satisfies (6.9). Furthermore, for the invariant region $\mathcal{R}$, $P_{\lambda,\eta}(\mathcal{R})$ depends only on $\lambda$. Hence, from (6.8), $r(\lambda, \eta) = 0$. Since $\mathcal{R}$ is locally best invariant as $\lambda \to 0$, the test which rejects $H_0$ whenever $R \ge C$ does not coincide with $\mathcal{R}$ and is therefore locally worse.

From Problem 4, the power of the test which rejects $H_0$ whenever $R \ge C$ depends only on $\lambda$ and has positive derivative at $\lambda = 0$. Thus, from (6.8) with $R^* = \mathcal{R}$, we conclude that $h(\lambda) > 0$. Hence we have:

Theorem 6.2.5. For testing $H_0$ against $H'_1: \rho_2^2 = \lambda > 0$, the test with critical region $\mathcal{R}$ is locally minimax as $\lambda \to 0$.

Exercises

1. Let $X = (X_1, \ldots, X_p)'$ be a random vector with probability density function (6.1). Show that $E(X) = \mu$ and $\operatorname{cov}(X) = \alpha\Sigma$ with $\alpha = p^{-1}E\big((X - \mu)'\Sigma^{-1}(X - \mu)\big)$.

2. Let $X$ be an $n \times p$ random matrix with probability density function (6.2). Find the maximum likelihood estimators of $\mu$ and $\Sigma$.

References
1. N. Giri, Locally minimax tests for multiple correlations, Canad. J. Statist., pp. 53-60 (1979).
2. N. Giri, Locally minimax tests in symmetrical distributions, Ann. Inst. Stat. Math., pp. 381-394 (1988).
3. N. Giri, Multivariate Statistical Analysis, Marcel Dekker, N.Y. (1996).
4. T. Kariya and M. Eaton, Robust test of spherical symmetry, Ann. Statist., pp. 208-215 (1977).
5. T. Kariya, A robustness property of Hotelling's $T^2$-problem, Ann. Statist., pp. 206-215 (1981).
6. T. Kariya, Robustness of multivariate tests, Ann. Statist., pp. 1267-1275 (1981b).
7. T. Kariya and B. Sinha, Non-null and optimality robustness of some tests, Ann. Statist., pp. 1182-1197 (1985).
8. T. Kariya and B. Sinha, Robustness of Statistical Tests, Academic Press, N.Y. (1988).

Chapter 7. TYPE D AND E REGIONS

The notion of a type D or E region is due to Isaacson (1951). Kiefer (1958) showed that the usual F-test of the univariate general linear hypothesis possesses this property. Lehmann (1959) showed that, in finding type D regions, invariance could be invoked in the manner of the Hunt-Stein theorem. This could also be done for type E regions (if they exist), provided that one works with a group which operates as the identity on the nuisance parameter set $H$ of the testing problem.

Suppose we have a parameter set $\Omega = \{(\theta, \eta) : \theta \in \Theta, \eta \in H\}$ with associated distributions, where $\Theta$ is a Euclidean set. Suppose further that every test function $\phi$ has a power function $\beta_\phi(\theta, \eta)$ which, for each $\eta$, is twice continuously differentiable in the components of $\theta$ at $\theta = 0$, an interior point of $\Theta$. Let $Q_\alpha$ be the class of locally strictly unbiased level-$\alpha$ tests of $H_0: \theta = 0$ against $H_1: \theta \ne 0$. Our assumption on $\beta_\phi$ implies that all tests in $Q_\alpha$ are similar and that $\partial\beta_\phi/\partial\theta_i\,|_{\theta=0} = 0$ for all $\phi$ in $Q_\alpha$. Let $\Delta_\phi(\eta)$ be the determinant of the matrix $B_\phi(\eta)$ of second derivatives of $\beta_\phi(\theta, \eta)$ with respect to the components of $\theta$ at $\theta = 0$ (the Gaussian curvature). Suppose the parametrization is such that $\Delta_{\phi'}(\eta) > 0$ for all $\eta$ for at least one $\phi'$ in $Q_\alpha$.

Definition 7.1. Type E test. A test $\phi^*$ is said to be of type E if $\phi^* \in Q_\alpha$ and $\Delta_{\phi^*}(\eta) = \max_{\phi \in Q_\alpha}\Delta_\phi(\eta)$ for all $\eta$.

Definition 7.2. Type D test. A type E test $\phi^*$ is said to be of type D if the nuisance parameter set $H$ is a single point.

In the problems treated earlier in Chapter 4 it seems doubtful that type E regions exist (in terms of Lehmann's development, $H$ is not left invariant by many transformations). We introduce here two possible optimality criteria in the same spirit as the type D and E criteria, which will always be fulfilled by some test under minimal regularity assumptions. Let

$$\Delta(\eta) = \max_{\phi \in Q_\alpha}\Delta_\phi(\eta). \qquad (7.1)$$

Definition 7.3. Type $D_A$ test. A test $\phi^*$ is of type $D_A$ if

$$\max_\eta\big[\Delta(\eta) - \Delta_{\phi^*}(\eta)\big] = \min_{\phi \in Q_\alpha}\max_\eta\big[\Delta(\eta) - \Delta_\phi(\eta)\big]. \qquad (7.2)$$

Definition 7.4. Type $D_M$ test. A test $\phi^*$ is of type $D_M$ if

$$\max_\eta\big[\Delta(\eta)/\Delta_{\phi^*}(\eta)\big] = \min_{\phi \in Q_\alpha}\max_\eta\big[\Delta(\eta)/\Delta_\phi(\eta)\big]. \qquad (7.3)$$
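Definitions 7.3 and 7.4 are minimax-regret rules applied to the curvature $\Delta_\phi(\eta)$. The Python sketch below applies them to a hypothetical finite table of values $\Delta_\phi(\eta)$ for three tests and three nuisance values. In the actual problems the class $Q_\alpha$ and the set $H$ are infinite, so this only illustrates the bookkeeping in (7.1)-(7.3).

```python
def regret_choices(curv):
    """curv[phi] = [Delta_phi(eta_1), ..., Delta_phi(eta_k)], all positive.

    Returns the tests chosen by the additive (D_A) and multiplicative (D_M)
    minimax-regret rules, together with the regrets themselves.
    """
    k = len(next(iter(curv.values())))
    best = [max(c[t] for c in curv.values()) for t in range(k)]                      # (7.1)
    add = {phi: max(best[t] - c[t] for t in range(k)) for phi, c in curv.items()}    # (7.2)
    mult = {phi: max(best[t] / c[t] for t in range(k)) for phi, c in curv.items()}   # (7.3)
    return min(add, key=add.get), min(mult, key=mult.get), add, mult


curv = {"phi1": [4.0, 1.0, 2.0],      # hypothetical curvatures
        "phi2": [3.0, 2.5, 1.5],
        "phi3": [2.0, 2.0, 2.4]}
d_a, d_m, add, mult = regret_choices(curv)
```

No test in this table attains $\Delta(\eta)$ for every $\eta$, so no type E test exists among them. Both regret rules still single out a test, which mirrors the motivation given for the weaker criteria.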

These two criteria resemble stringency and regret criteria employed elsewhere in statistics; the subscripts "A" and "M" stand for the "additive" and "multiplicative" regret principles. Possession of these properties is preserved under two kinds of transformation, and under their product. The first kind acts on $\Theta$ (and trivially on $H$) and is of the same general type as those under which type D regions retain their property. The second kind is an arbitrary 1-1 transformation of $H$ acting trivially on $\Theta$. Possession is not, of course, preserved under more general transformations of $\Omega$. Obviously, a type E test automatically satisfies these weaker criteria. Let us now assume that a testing problem is invariant under a group of transformations $G$ for which the Hunt-Stein theorem holds and which acts trivially on $\Theta$; that is,

$$g(\theta, \eta) = (\theta, g\eta), \qquad g \in G. \qquad (7.4)$$

If $\phi_g$ is the test function defined by $\phi_g(x) = \phi(gx)$, then a trivial computation shows that

$$\Delta_{\phi_g}(\eta) = \Delta_\phi(g\eta). \qquad (7.5)$$

It is easy to verify that (7.5) implies $\Delta(\eta) = \Delta(g\eta)$. Furthermore, if $\phi$ is better than $\phi'$ in the sense of either of the above criteria, then $\phi_g$ is better than $\phi'_g$. All of the requirements of Lehmann (1959) are easily seen to be satisfied, so we can conclude that there is an almost invariant (hence, in our problems, invariant) test which is of type $D_A$ or $D_M$. This differs from the way in which invariance is used on page 883 of Lehmann (1959), where it is used to reduce $\Theta$ rather than, as here, $H$. If the group $G$ is transitive on $H$, then $\Delta(\eta)$ is constant, as is $\Delta_\phi(\eta)$ for an invariant $\phi$, which we therefore write simply as $\Delta_\phi$. In this case we conclude that if $\phi^*$ is invariant and of type D among invariant $\phi$ (i.e. if $\Delta_{\phi^*}$ maximizes $\Delta_\phi$ over all invariant $\phi$), then $\phi^*$ is of type $D_A$ or $D_M$ among all $\phi$.

To verify these optimality properties we need the following lemma.

Lemma 7.1. Let $L$ be a class of non-negative definite symmetric matrices of order $m$, and suppose $J$ is a fixed nonsingular member of $L$. If $\operatorname{tr} J^{-1}B$ is maximized (over $B$ in $L$) by $B = J$, then $\det B$ is also maximized by $J$. Conversely, if $L$ is convex and $J$ maximizes $\det B$, then $\operatorname{tr} J^{-1}B$ is maximized by $B = J$.

Proof. Write $J^{-1}B = A$. If $A = I$ maximizes $\operatorname{tr} A$, we get $(\det A)^{1/m} \le \operatorname{tr} A/m \le 1 = \det I$. Conversely, if $I$ maximizes $\det A$, it also maximizes $\operatorname{tr} A$. This is because $\operatorname{tr} A > \operatorname{tr} I$ implies $\det(aA + (1-a)I) > 1$ for $a$ small and positive.
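Both steps of the proof can be checked numerically in the case $J = I$, $m = 2$. The sketch below uses a hypothetical matrix $B$. It verifies the arithmetic-geometric mean inequality $(\det B)^{1/m} \le \operatorname{tr} B/m$. It also shows that when $\operatorname{tr} B > \operatorname{tr} I$, a small mixture $aB + (1-a)I$ has determinant above $\det I = 1$, even though here $\det B < 1$. So $I$ cannot maximize the determinant over a convex class containing $B$.

```python
import math


def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def trace2(a):
    return a[0][0] + a[1][1]


def mix(b, a):
    """a B + (1 - a) I for a 2 x 2 matrix B and a scalar a in (0, 1)."""
    return [[a * b[0][0] + (1 - a), a * b[0][1]],
            [a * b[1][0], a * b[1][1] + (1 - a)]]


B = [[2.0, 0.5], [0.5, 0.3]]   # hypothetical; positive definite, tr B = 2.3 > 2
amgm_holds = math.sqrt(det2(B)) <= trace2(B) / 2
# For 2 x 2 matrices, det(aB + (1-a)I) = 1 + a(tr B - 2) + a^2 (det B - tr B + 1).
mixed_det = det2(mix(B, 0.1))
```

The first-order term $a(\operatorname{tr} B - m)$ dominates for small $a$, which is exactly the step used in the converse part of the lemma.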

Remarks. The usefulness of this lemma lies in the fact that the generalized Neyman-Pearson lemma allows us to maximize $\operatorname{tr} QB_\phi$ for fixed $Q$ more easily than to maximize $\Delta_\phi = \det B_\phi$ among similar level-$\alpha$ tests. We can find, for each $Q$, a $\phi_Q$ which maximizes $\operatorname{tr} QB_\phi$. A $\phi^*$ which maximizes $\Delta_\phi$ is then obtained by finding a $\phi_Q$ for which $B_{\phi_Q} = Q^{-1}$.

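In problems of the type which we consider here, the reduction by invariance under a group $G$ which is transitive on $H$ often results in a reduced problem of the following kind. The maximal invariant is a vector $R = (R_1, \ldots, R_m)'$ whose distribution depends only on $\Delta = (\delta_1, \ldots, \delta_m)'$, where $\delta_i = \theta_i^2$ and $\theta = (\theta_1, \ldots, \theta_m)'$. Moreover, the density $f_\Delta$ of $R$ with respect to a $\sigma$-finite measure $\mu$ is of the form

$$\frac{f_\Delta(r)}{f_0(r)} = 1 + \sum_{i=1}^{m}\delta_i\Big[h_i + \sum_{j=1}^{m} a_{ij}\,r_j\Big] + Q(r, \Delta), \qquad (7.6)$$

where the $h_i$ and $a_{ij}$ are constants and $Q(r, \Delta) = o(\sum_i\delta_i)$ as $\Delta \to 0$. Differentiating under the integral sign, we obtain, for an invariant test $\phi$ of level $\alpha$,

$$\beta_\phi(\theta) = \alpha + \sum_{i=1}^{m}\delta_i\Big[h_i\alpha + \sum_{j=1}^{m} a_{ij}\int r_j\,\phi(r)\,f_0(r)\,\mu(dr)\Big] + o\Big(\sum_i\delta_i\Big) \qquad (7.7)$$

as $\Delta \to 0$.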

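Following Giri and Kiefer (1964), this is called the symmetric reduced regular (SRR) case.

Theorem 7.1. In the SRR case, an invariant test $\phi^*$ of level $\alpha$ is of type D among invariant $\phi$ (and hence of type $D_A$ and $D_M$ among all $\phi$) if and only if $\phi^*$ is of the form

$$\phi^*(r) = \begin{cases} 1 & \text{if } \sum_{i=1}^{m} q_i\sum_{j=1}^{m} a_{ij}\,r_j > c, \\ 0 & \text{otherwise}, \end{cases} \qquad (7.8)$$

where $c$ is a constant and the $q_i > 0$ satisfy

$$q_i^{-1} = \text{const}\times\Big[h_i\alpha + E_0\Big\{\sum_{j=1}^{m} a_{ij}R_j\,\phi^*(R)\Big\}\Big], \qquad i = 1, \ldots, m.$$

Proof. In the SRR case, every invariant $\phi$ has a diagonal $B_\phi$ whose $i$th diagonal entry, by (7.7), is $2\big[h_i\alpha + E_0\{\sum_j a_{ij}R_j\,\phi(R)\}\big]$. By the Neyman-Pearson lemma, $\operatorname{tr} QB_\phi$ with $Q = \operatorname{diag}(q_1, \ldots, q_m)$ is maximized over such $\phi$ by a $\phi^*$ of the form (7.8). Hence, by Lemma 7.1 and the remarks following it, we get the theorem.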

Example 7.1 ($T^2$-test). Consider once again Problem 1 of Chapter 4. Let $\theta = \Gamma^{-1}\mu$, where $\Gamma$ is the unique member of $G_T$ with positive diagonal elements satisfying $\Gamma\Gamma' = \Sigma$. Then $\Theta$ is Euclidean $p$-space. $G_T$ operates transitively on $H = \{\text{positive definite } \Sigma\}$ but trivially on $\Theta$, and we have the SRR case. We thus have (7.6) with $h_i = -1$ and

$$a_{ij} = \begin{cases} 1 & \text{if } i > j, \\ 0 & \text{if } i < j, \\ N - j + 1 & \text{if } i = j. \end{cases} \qquad (7.9)$$

Hotelling's $T^2$ test has a power function which depends only on $\sum_i\delta_i$. With the above parametrization for $\theta$, $B_{T^2}$ is therefore a multiple of the identity. Hotelling's critical region is of the form $\sum_i r_i > c$. But when all the $q_i$ are equal, the critical region corresponding to (7.8) is of the form $\sum_j (N + p + 1 - 2j)\,r_j > c$, which is not Hotelling's region if $p > 1$. Hence we get the following theorem.

Theorem 7.2. For $0 < \alpha < 1 < p < N$, Hotelling's $T^2$ test is not of type D among $G_T$-invariant tests, and hence is not of type $D_A$ or $D_M$ (nor of type E) among all tests.
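When all $q_i$ are equal, the coefficients of the region (7.8) are the column sums $\sum_i a_{ij}$ of (7.9). The short Python check below uses hypothetical values of $N$ and $p$. It confirms that these sums equal $N + p + 1 - 2j$ and are not all equal when $p > 1$. That is the step separating this region from Hotelling's $\sum_j r_j > c$.

```python
def a_matrix(N, p):
    """a_ij of (7.9), 1-based: 1 if i > j, 0 if i < j, N - j + 1 if i = j."""
    return [[N - j + 1 if i == j else (1 if i > j else 0)
             for j in range(1, p + 1)] for i in range(1, p + 1)]


def equal_q_coefficients(N, p):
    """Coefficient of r_j in sum_i sum_j a_ij r_j, i.e. the column sums of a."""
    a = a_matrix(N, p)
    return [sum(a[i][j] for i in range(p)) for j in range(p)]


N, p = 10, 3
coefs = equal_q_coefficients(N, p)
```

Because the coefficients decrease in $j$, the type D region puts more weight on the earlier components $r_j$ than Hotelling's region does.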


Remarks. The actual computation of a $\phi^*$ of type D appears difficult because we need to evaluate an integral of the form

$$\int_{\{\sum_i c_i r_i > c\}} r_j^h\,f_0(r)\,\mu(dr) \qquad (7.10)$$

for $h = 0$ or $1$ and for various choices of the $c_i$'s and $c$. When $\alpha$ is close to 0 or 1, we can carry out approximate computations as follows. As $\alpha \to 1$, the complement $\bar R$ of the critical region becomes a simplex with one corner at 0. Let $p = 2$, write $\rho = 1 - \alpha$, and consider critical regions of level $\alpha$ of the form $br_1 + r_2 > c$, where $0 < L^{-1} < b < L$ and $L$ is fixed but large (this keeps $\bar R$ close to the origin). We then get from (7.10) that

$$\rho = E_0\{1 - \phi(R)\} = \frac{(N-2)c}{2b^{1/2}} + o(c) \quad \text{as } c \to 0.$$

Similarly,

$$E_0\{(1 - \phi(R))R_1\} = \frac{(N-2)c^2 b^{-3/2}}{8} + o(c^2) = \frac{\rho^2 b^{-1/2}}{2(N-2)} + o(\rho^2),$$

$$E_0\{(1 - \phi(R))R_2\} = \frac{\rho^2 b^{1/2}}{2(N-2)} + o(\rho^2)$$

as $\rho \to 0$, while $E_0(R_i) = 1/N$.
From (5.18) we obtain, for the power near $H_0$,

$$\beta_\phi(\delta_1, \delta_2) = \alpha + \frac{\rho}{2}\Big\{\delta_1\Big[1 - \frac{N\rho}{2(N-2)b^{1/2}} + o(\rho)\Big] + \delta_2\Big[1 - \frac{\rho}{2(N-2)b^{1/2}} - \frac{(N-1)\rho\,b^{1/2}}{2(N-2)} + o(\rho)\Big]\Big\} + o\Big(\rho\sum_i\delta_i\Big),$$

where the $o(\rho)$ and $o(\sum_i\delta_i)$ terms are uniform in $\delta$ and $\rho$, respectively. The product $\Delta_\phi$ of the coefficients of $\delta_1$ and $\delta_2$ is maximized when $b = (N+1)/(N-1) + o(1)$ as $\rho \to 0$. With more care, one can obtain further terms in an expansion in $\rho$ for the type D choice of $b$. To complete the argument one shows that $b < L^{-1}$ implies that $\bar R$ lies in a strip close to the $r_1$-axis. In that case $E_0\{(1-\phi(R))R_1\}$ is too large and $E_0\{(1-\phi(R))R_2\}$ too small to yield a $\phi$ as good as that with $b = (N+1)/(N-1)$. A similar argument applies if $b > L$.
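The choice $b = (N+1)/(N-1)$ can be confirmed numerically. To first order in $\rho$, the product of the two bracketed coefficients is $1 - \frac{\rho}{2(N-2)}\{(N+1)b^{-1/2} + (N-1)b^{1/2}\}$. This is maximized at $b = (N+1)/(N-1)$. The Python sketch below maximizes the product of the brackets, with the $o(\rho)$ terms dropped, by grid search for a hypothetical $N$ and a small $\rho$.

```python
import math


def bracket_product(b, N, rho):
    """Product of the delta_1 and delta_2 brackets with the o(rho) terms dropped."""
    k = rho / (2 * (N - 2))
    c1 = 1 - N * k / math.sqrt(b)
    c2 = 1 - k / math.sqrt(b) - (N - 1) * k * math.sqrt(b)
    return c1 * c2


def best_b(N, rho, lo=0.05, hi=20.0, steps=100_000):
    """Grid search for the b in [lo, hi] that maximizes bracket_product."""
    grid = (lo + (hi - lo) * t / steps for t in range(steps + 1))
    return max(grid, key=lambda b: bracket_product(b, N, rho))


N, rho = 10, 1e-3
b_numeric = best_b(N, rho)
b_theory = (N + 1) / (N - 1)
```

The second-order terms in $\rho$ shift the maximizer only slightly, consistent with the $o(1)$ correction in the text.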

When $\rho$ is very close to 0, all choices of $b > 0$ give substantially the same power. Hence the relative departure from being type D approaches 0 as $\alpha \to 1$, both for Hotelling's $T^2$ test and for any other critical region of the form $bR_1 + R_2 > c$ with $b$ fixed and positive. However, we do not know how great the departure of $\Delta_{T^2}$ from $\Delta$ can be for arbitrary $\alpha$. We can similarly treat the case $p > 2$ and also the case $\alpha \to 0$. Other problems considered in Chapter 4 can be treated similarly.

References
1. N. Giri and J. Kiefer, Local and Asymptotic Minimax Properties of Multivariate Tests, Ann. Math. Statist. 35, 21-35 (1964).
2. S. L. Isaacson, On the Theory of Unbiased Tests of Simple Statistical Hypotheses Specifying the Values of Two or More Parameters, Ann. Math. Statist. 22, 217-234 (1951).
3. J. Kiefer, On the Nonrandomized Optimality and Randomized Nonoptimality of Symmetrical Designs, Ann. Math. Statist. 29, 675-699 (1958).
4. E. L. Lehmann, Optimum Invariant Tests, Ann. Math. Statist. 30, 881-884 (1959).