Estimation and Inferential Statistics

Pradip Kumar Sahu · Santi Ranjan Pal · Ajit Kumar Das

Pradip Kumar Sahu, Ajit Kumar Das
Department of Agricultural Statistics
Bidhan Chandra Krishi Viswavidyalaya
Mohanpur, Nadia, West Bengal, India
Nowadays one can hardly find any field where statistics is not used. With a given sample, one can draw inferences about the population. The role of estimation and inferential statistics remains pivotal in the study of statistics. Statistical inference is concerned with problems of estimation of population parameters and tests of hypotheses. In statistical inference, a conclusion about the population is drawn on the basis of a portion of the population. This book is written keeping in mind the needs of users, the present availability of literature to cater to these needs, and its merits and demerits under a constantly changing scenario. Theories are followed by relevant worked-out examples which help the user grasp not only the theory but also its practice.

This work is the result of the authors' experience of more than 20 years in teaching and research. The wide scope and coverage of the book will help not only students, researchers and professionals in the field of statistics but also several others in various allied disciplines. Every effort has been made to present "estimation and statistical inference", its meaning, intention and usefulness. The book reflects current methodological techniques used in interdisciplinary research, as illustrated with many relevant research examples. Statistical tools have been presented with the help of real-life examples in such a manner that the fear of the otherwise complicated subject of statistics will vanish. In its seven chapters, theories followed by examples will help readers find the most suitable applications.
Starting from the meaning of statistical inference, its development, different parts and types are discussed. How one can use statistical inference in everyday life is the main point of discussion in the examples. How one can draw conclusions about a population under varied situations, even without studying each and every unit of the population, is discussed through numerous examples. All sorts of inferential problems are discussed in one place, supported by examples, to help students not only in meeting their examination and research requirements but also in daily life. One can hardly find such a compilation of statistical inference in one place. The step-by-step procedure will immensely help not only graduate and Ph.D. students but also other researchers and professionals. Graduate and postgraduate students, researchers and professionals in various fields will be the users of this book. Researchers in medical, social and other disciplines will benefit greatly from it. The book will also help students in various competitive examinations.

Written in a lucid language, the book will be useful to graduate, postgraduate and research students and to practitioners in diverse fields, including the medical, social and other sciences. It will also cater to the needs of those preparing for different competitive examinations. One can hardly find a single book in which all topics related to estimation and inference are included. Numerous relevant examples for the related theories are an added feature of this book. An introduction chapter and an annexure are special features which will help readers acquire the basic ideas and plug gaps in their background. A chapter-wise summary of the content of the book is presented below.
for the publication of this book and for continuous monitoring, help and suggestions during this book project. The authors acknowledge the help, cooperation and encouragement received from various quarters, not all of which are mentioned here. The effort will be successful if this book is well accepted by the students, teachers, researchers and other users at whom it is aimed. Every effort has been made to avoid errors. Constructive suggestions from readers for improving the quality of this book will be highly appreciated.
2 Methods of Estimation ..... 47
2.1 Introduction ..... 47
2.2 Method of Moments ..... 47
2.3 Method of Maximum Likelihood ..... 48
2.4 Method of Minimum χ² ..... 55
2.5 Method of Least Square ..... 56
Appendix ..... 237
References ..... 303
Index ..... 315
About the Authors
In a statistical investigation, it is known that, for reasons of time or cost, one may not be able to study each individual element of the population. In such a situation, a random sample should be taken from the population, and the inference can be drawn about the population on the basis of the sample. Hence, statistics deals with the collection of data and their analysis and interpretation. In this book, the problem of data collection is not considered. We shall take the data as given, and we study what they have to tell us. The main objective is to draw a conclusion about the unknown population characteristics on the basis of information on the same characteristics of a suitably selected sample. The observations are now postulated to be the values taken by random variables. Let X be a random variable which describes the population under investigation, and let F be the distribution function of X. There are two possibilities. Either X has a distribution function $F_\theta$ with a known functional form (except perhaps for the parameter θ, which may be a vector), or X has a distribution function F about which we know nothing (except perhaps that F is, say, absolutely continuous). In the former case, let Θ be the set of possible values of the unknown parameter θ; then the job of the statistician is to decide, on the basis of suitably selected samples, which member or members of the family $\{F_\theta:\ \theta\in\Theta\}$ can represent the distribution function of X. Problems of this type are called problems of parametric statistical inference. The two principal areas of statistical inference are the "area of estimation of parameters" and the "tests of statistical hypotheses". The problem of estimation of parameters involves both point and interval estimation. Diagrammatically, the components and constituents of statistical inference can be shown in a chart.
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

is a point estimate of λ.
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \text{sample mean}$

and

$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \text{sample mean square.}$
Besides point estimation and interval estimation, we are often required to decide
which value among a set of values of a parameter is true for a given population
distribution, or we may be interested in finding out the relevant distribution to
describe a population. The procedure by which a decision is taken regarding the
plausible value of a parameter or the nature of a distribution is known as the testing
of hypotheses. Some examples of hypotheses that can be subjected to statistical tests are as follows:
1. The average length of life μ of electric light bulbs of a certain brand is equal to some specified value μ₀.
2. The average number of bacteria killed by test drops of germicide is equal to some number.
3. Steel made by method A has a mean hardness greater than steel made by
method B.
4. Penicillin is more effective than streptomycin in the treatment of disease X.
5. The growing period of one hybrid of corn is more variable than the growing
period for other hybrids.
6. The manufacturer claims that tires made by a new process have a mean life greater than that of tires made by an earlier process.
7. Several varieties of wheat are equivalent in terms of yield.
8. Several brands of batteries have different lifetimes.
9. The characters in the population are uncorrelated.
10. The proportion of non-defective items produced by machine A is greater than
that of machine B.
The examples given are simple in nature; they are well established and have well-accepted decision rules.
So far we have assumed in (parametric) statistical inference that the distribution of the random variable being sampled is known except for some parameters. In practice, the functional form of the distribution may be unknown. Here, we are concerned not with techniques of estimating the parameters directly, but with certain pertinent hypotheses relating to the properties of the population, such as equality of distributions or tests of randomness of the sample, without making any assumption about the nature of the distribution function. Statistical inference under such a setup is called non-parametric.
Bayes Estimator
In the case of parametric inference, we consider a density function $f(x/\theta)$, where θ is a fixed unknown quantity which can take any value in the parameter space Θ. In the Bayesian approach, it is assumed that θ itself is a random variable and $f(x/\theta)$ is the density of x for a given θ. For example, suppose we are interested in estimating P, the fraction of defective items in a consignment. Consider a collection of lots, called superlots. It may happen that the parameter P differs from lot to lot. In the classical approach, we consider P as a fixed unknown parameter, whereas in the Bayesian approach we say that P varies from lot to lot: it is a random variable having a density f(P), say. The Bayes method tries to use this additional information about P.
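A minimal numerical sketch of this idea, assuming (purely for illustration) a Beta prior density f(P) and hypothetical lot data, is the following; the posterior mean is then a natural Bayes estimate of P:

```python
# Sketch: the lot fraction defective P is treated as a random variable.
# Assuming a Beta(a, b) prior f(P) and x defectives in a sample of n items,
# the posterior is Beta(a + x, b + n - x); its mean is a Bayes estimate.
a, b = 2.0, 8.0          # hypothetical prior: lots average about 20% defective
n, x = 25, 9             # hypothetical sample: 9 defectives out of 25

posterior_mean = (a + x) / (a + b + n)   # Bayes estimate of P (~0.314)
classical_mle = x / n                    # classical estimate ignores f(P) (0.36)
print(posterior_mean, classical_mle)
```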
Example Let $X_1, X_2, \ldots, X_n$ be a random sample from the p.d.f.

$f(x; \alpha, \beta) = \frac{1}{B(\alpha,\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}, \quad 0 < x < 1;\ \alpha, \beta > 0.$

Here

$E(x) = \mu_1' = \frac{\alpha}{\alpha+\beta} \quad\text{and}\quad E(x^2) = \mu_2' = \frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)}.$

Hence

$\frac{\alpha}{\alpha+\beta} = \bar{x} \quad\text{and}\quad \frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)} = \frac{1}{n}\sum_{i=1}^{n} x_i^2.$

Solving, we get

$\hat{\beta} = \frac{(\bar{x}-1)\left(\sum x_i^2 - n\bar{x}\right)}{\sum (x_i - \bar{x})^2} \quad\text{and}\quad \hat{\alpha} = \frac{\bar{x}\,\hat{\beta}}{1-\bar{x}}.$
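A short numerical check of these moment estimators (simulated data with hypothetical true values α = 2, β = 5):

```python
import numpy as np

def beta_moment_estimates(x):
    """Method-of-moments estimates for Beta(a, b), using the formulas above."""
    n = len(x)
    xbar = x.mean()
    b_hat = (xbar - 1.0) * (np.sum(x**2) - n * xbar) / np.sum((x - xbar)**2)
    a_hat = xbar * b_hat / (1.0 - xbar)
    return a_hat, b_hat

rng = np.random.default_rng(0)
x = rng.beta(2.0, 5.0, size=10_000)    # sample with known a = 2, b = 5
print(beta_moment_estimates(x))         # should be close to (2, 5)
```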
$f(x; \theta, r) = \frac{1}{\theta^r\,\Gamma(r)}\, e^{-x/\theta}\, x^{r-1}, \quad x > 0;\ \theta > 0,\ r > 0.$
$E(x) = \mu_1' = r\theta, \quad E(x^2) = \mu_2' = r(r+1)\theta^2, \quad\text{and}\quad m_1' = \bar{x},\ m_2' = \frac{1}{n}\sum_{i=1}^{n} x_i^2.$

Hence

$r\theta = \bar{x} \quad\text{and}\quad r(r+1)\theta^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2.$

Solving, we get

$\hat{r} = \frac{n\bar{x}^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2} \quad\text{and}\quad \hat{\theta} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n\bar{x}}.$
(i) $L = \frac{1}{\theta^{nr}\{\Gamma(r)\}^n}\, e^{-\frac{1}{\theta}\sum_{i=1}^{n} x_i}\prod_{i=1}^{n} x_i^{\,r-1}$

(ii) $\log L = -nr\log\theta - n\log\Gamma(r) - \frac{1}{\theta}\sum_{i=1}^{n} x_i + (r-1)\sum_{i=1}^{n}\log x_i$

Now,

$\frac{\partial\log L}{\partial\theta} = -\frac{nr}{\theta} + \frac{n\bar{x}}{\theta^2} = 0 \;\Rightarrow\; \hat{\theta} = \frac{\bar{x}}{r},$

and, substituting $\theta = \bar{x}/r$,

$\frac{\partial\log L}{\partial r} = -n\log\theta - n\frac{\Gamma'(r)}{\Gamma(r)} + \sum_{i=1}^{n}\log x_i = n\log r - n\frac{\Gamma'(r)}{\Gamma(r)} - n\log\bar{x} + \sum_{i=1}^{n}\log x_i.$

Setting

$\frac{\partial\log L}{\partial r} = 0$
and solving numerically gives the estimate of r. Thus, for this example, the estimators of θ and r are more easily obtained by the method of moments than by the method of maximum likelihood.
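The contrast can be made concrete numerically. In the sketch below (simulated data; assumed true values r = 3, θ = 2), the moment estimators are closed-form, while the likelihood equation in r must be solved with a root finder:

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

rng = np.random.default_rng(1)
x = rng.gamma(shape=3.0, scale=2.0, size=5000)   # true r = 3, theta = 2
n, xbar = len(x), x.mean()

# Method of moments: closed form, as derived above.
r_mom = n * xbar**2 / np.sum((x - xbar)**2)
theta_mom = np.sum((x - xbar)**2) / (n * xbar)

# Maximum likelihood: with theta = xbar/r substituted, the likelihood
# equation n log r - n digamma(r) - n log xbar + sum(log x_i) = 0
# has no closed-form solution and is solved numerically.
def score_r(r):
    return n * np.log(r) - n * digamma(r) - n * np.log(xbar) + np.sum(np.log(x))

r_mle = brentq(score_r, 1e-6, 100.0)
theta_mle = xbar / r_mle
print(r_mom, theta_mom, r_mle, theta_mle)
```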
Example Let $X_1, X_2, \ldots, X_n$ be a random sample from the rectangular distribution R(α, β). Find the estimators of α and β by the method of moments.
Proof We know

$E(x) = \mu_1' = \frac{\alpha+\beta}{2} \quad\text{and}\quad V(x) = \mu_2 = \frac{(\beta-\alpha)^2}{12}.$

Hence

$\frac{\alpha+\beta}{2} = \bar{x} \quad\text{and}\quad \frac{(\beta-\alpha)^2}{12} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2.$

Solving, we get

$\hat{\alpha} = \bar{x} - \sqrt{\frac{3\sum(x_i-\bar{x})^2}{n}} \quad\text{and}\quad \hat{\beta} = \bar{x} + \sqrt{\frac{3\sum(x_i-\bar{x})^2}{n}}.$
[For a sample of size one from $f(x;\beta) = \frac{2}{\beta^2}(\beta - x)$, $0 < x < \beta$ (worked out in full as Example 2.15 of Chap. 2), we have]

$L = \frac{2}{\beta^2}(\beta - x).$

Then

$\log L = \log 2 - 2\log\beta + \log(\beta - x),$

or

$\frac{\partial\log L}{\partial\beta} = -\frac{2}{\beta} + \frac{1}{\beta - x} = 0 \;\Rightarrow\; \hat{\beta} = 2x.$

Now,

$E(x) = \frac{2}{\beta^2}\int_0^{\beta}(\beta x - x^2)\,dx = \frac{\beta}{3}.$

Hence

$\frac{\beta}{3} = x \;\Rightarrow\; \beta^* = 3x.$

Hence $\hat{\beta}$ is biased but $\beta^*$ is unbiased. Again,

$E(x^2) = \frac{2}{\beta^2}\int_0^{\beta}(\beta x^2 - x^3)\,dx = \frac{\beta^2}{6}.$

Therefore,

$V(x) = \frac{\beta^2}{6} - \frac{\beta^2}{9} = \frac{\beta^2}{18},$

so that

$V(\beta^*) = 9V(x) = \frac{\beta^2}{2} \quad\text{and}\quad V(\hat{\beta}) = 4V(x) = \frac{2}{9}\beta^2.$

Hence

$M(\hat{\beta}) = V(\hat{\beta}) + \left[E(\hat{\beta}) - \beta\right]^2 = \frac{2}{9}\beta^2 + \left(\frac{2}{3}\beta - \beta\right)^2 = \frac{1}{3}\beta^2.$
Solution
Let $x_1, x_2, \ldots, x_n$ be arranged in k groups such that there are $n_i$ observations with $x = i$, $i = r+1, \ldots, r+k-2$; $n_L$ observations with $x \le r$; and $n_u$ observations with $x \ge r+k-1$, so that the smallest and the largest values of x, which occur with low frequency, are pooled together and

$n_L + \sum_{i=r+1}^{r+k-2} n_i + n_u = n.$

Let

$p_i(\mu) = P(x = i) = \frac{e^{-\mu}\mu^i}{i!},$

$p_L(\mu) = P(x \le r) = \sum_{i=0}^{r} p_i(\mu),$

$p_u(\mu) = P(x \ge r+k-1) = \sum_{i=r+k-1}^{\infty} p_i(\mu).$
Now, by using

$\sum_{i=1}^{k} \frac{n_i}{p_i(\theta)}\,\frac{\partial p_i(\theta)}{\partial\theta_j} = 0, \quad j = 1, 2, \ldots, p,$
we have

$n_L\,\frac{\sum_{i=0}^{r}\left(\frac{i}{\mu}-1\right)p_i(\mu)}{\sum_{i=0}^{r}p_i(\mu)} + \sum_{i=r+1}^{r+k-2} n_i\left(\frac{i}{\mu}-1\right) + n_u\,\frac{\sum_{i=r+k-1}^{\infty}\left(\frac{i}{\mu}-1\right)p_i(\mu)}{\sum_{i=r+k-1}^{\infty}p_i(\mu)} = 0.$

Since there is only one parameter, i.e. p = 1, we get only the above equation. Solving, we get

$n\hat{\mu} = n_L\,\frac{\sum_{i=0}^{r} i\,p_i(\mu)}{\sum_{i=0}^{r}p_i(\mu)} + \sum_{i=r+1}^{r+k-2} i\,n_i + n_u\,\frac{\sum_{i=r+k-1}^{\infty} i\,p_i(\mu)}{\sum_{i=r+k-1}^{\infty}p_i(\mu)}$

= sum of all x's, with each pooled class contributing its conditional expectation.
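Because the right-hand side itself involves μ through the pooled-class conditional expectations, the equation is solved iteratively. A minimal sketch with hypothetical grouped frequencies:

```python
import numpy as np
from scipy.stats import poisson

# Hypothetical grouped data: r = 1 (x <= 1 pooled into n_L),
# interior classes i = 2, 3, 4, and x >= 5 pooled into n_u.
r = 1
interior = {2: 30, 3: 24, 4: 12}     # n_i for i = r+1, ..., r+k-2
n_L, n_u = 22, 12                    # pooled tail frequencies
c = 5                                # r + k - 1, start of the upper tail
n = n_L + sum(interior.values()) + n_u

def lower_tail_mean(mu):
    # E[X | X <= r] under Poisson(mu)
    i = np.arange(0, r + 1)
    p = poisson.pmf(i, mu)
    return np.sum(i * p) / np.sum(p)

def upper_tail_mean(mu):
    # E[X | X >= c] = (mu - sum_{i<c} i p_i) / P(X >= c)
    i = np.arange(0, c)
    p = poisson.pmf(i, mu)
    return (mu - np.sum(i * p)) / (1.0 - np.sum(p))

mu = 2.5                             # starting value
for _ in range(100):                 # fixed-point iteration on n*mu = ...
    mu = (n_L * lower_tail_mean(mu)
          + sum(i * ni for i, ni in interior.items())
          + n_u * upper_tail_mean(mu)) / n
print(mu)                            # approximate estimate of mu
```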
$\frac{\partial\phi}{\partial\beta} = 0, \quad\text{or}\quad -2X'(Y - X\beta) = 0.$

The least square estimators of the β's are thus given by the vector $\hat{\beta} = (X'X)^{-1}X'Y$.
Example Let $y_i = \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + e_i$, i = 1, 2, …, n, or $E(y_i) = \beta_1 x_{1i} + \beta_2 x_{2i}$ with $x_{1i} = 1$ for all i. Find the least square estimates of β₁ and β₂. Prove that the method of maximum likelihood and the method of least square are identical for the case of the normal distribution.
Solution
In matrix notation, we have $E(Y) = X\beta$, where

$X = \begin{pmatrix} 1 & x_{21} \\ 1 & x_{22} \\ \vdots & \vdots \\ 1 & x_{2n} \end{pmatrix}, \qquad \beta = \begin{pmatrix}\beta_1 \\ \beta_2\end{pmatrix}, \qquad Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}.$

Now,

$\hat{\beta} = (X'X)^{-1}X'Y.$

Here

$X'X = \begin{pmatrix} n & \sum x_{2i} \\ \sum x_{2i} & \sum x_{2i}^2 \end{pmatrix}, \qquad X'Y = \begin{pmatrix}\sum y_i \\ \sum x_{2i}y_i\end{pmatrix}.$
Then

$\hat{\beta} = \frac{1}{n\sum x_{2i}^2 - \left(\sum x_{2i}\right)^2}\begin{pmatrix}\sum x_{2i}^2 & -\sum x_{2i} \\ -\sum x_{2i} & n\end{pmatrix}\begin{pmatrix}\sum y_i \\ \sum x_{2i}y_i\end{pmatrix} = \frac{1}{n\sum x_{2i}^2 - \left(\sum x_{2i}\right)^2}\begin{pmatrix}\sum x_{2i}^2\sum y_i - \sum x_{2i}\sum x_{2i}y_i \\ -\sum x_{2i}\sum y_i + n\sum x_{2i}y_i\end{pmatrix}.$

Hence

$\hat{\beta}_2 = \frac{n\sum x_{2i}y_i - \sum x_{2i}\sum y_i}{n\sum x_{2i}^2 - \left(\sum x_{2i}\right)^2} = \frac{\sum x_{2i}y_i - n\bar{x}_2\bar{y}}{\sum x_{2i}^2 - n\bar{x}_2^2} = \frac{\sum (x_{2i}-\bar{x}_2)(y_i-\bar{y})}{\sum (x_{2i}-\bar{x}_2)^2}$

and

$\hat{\beta}_1 = \frac{\sum x_{2i}^2\sum y_i - \sum x_{2i}\sum x_{2i}y_i}{n\sum x_{2i}^2 - \left(\sum x_{2i}\right)^2} = \frac{\bar{y}\sum x_{2i}^2 - \bar{x}_2\sum x_{2i}y_i}{\sum x_{2i}^2 - n\bar{x}_2^2} = \bar{y} + \frac{\bar{y}\,n\bar{x}_2^2 - \bar{x}_2\sum x_{2i}y_i}{\sum x_{2i}^2 - n\bar{x}_2^2} = \bar{y} - \bar{x}_2\hat{\beta}_2.$
Again, for the normal case, writing

$\phi = \sum_{i=1}^{n}(y_i - \beta_1 - \beta_2 x_i)^2,$

the likelihood L is maximum when $\sum_{i=1}^{n}(y_i - \beta_1 - \beta_2 x_i)^2$ is minimum. Thus, by the method of maximum likelihood, we choose β₁ and β₂ such that $\phi = \sum_{i=1}^{n}(y_i - \beta_1 - \beta_2 x_i)^2$ is minimum. Hence the methods of least square and maximum likelihood are identical.
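A small numerical check of the least-squares solution derived above (simulated data; the matrix form and the scalar formulas should agree):

```python
import numpy as np

rng = np.random.default_rng(2)
x2 = rng.uniform(0, 10, size=50)
y = 1.5 + 0.8 * x2 + rng.normal(0, 1, size=50)   # true b1 = 1.5, b2 = 0.8

X = np.column_stack([np.ones_like(x2), x2])      # first column: x_{1i} = 1
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)     # normal equations

# Scalar formulas from inverting the 2x2 matrix:
b2 = np.sum((x2 - x2.mean()) * (y - y.mean())) / np.sum((x2 - x2.mean())**2)
b1 = y.mean() - x2.mean() * b2
print(beta_hat, (b1, b2))                        # identical, as derived
```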
Chapter 1
Theory of Point Estimation
1.1 Introduction
The process is called statistical inference; it is similar to the process of inductive inference as envisaged in classical logic, for here too the problem is to know the general nature of the phenomenon under study (as represented by the distribution of the r.v.'s) on the basis of a particular set of observations. The only difference is that in a statistical investigation induction is achieved within a probabilistic framework. Probabilistic considerations enter the picture in three ways. First, the model used to represent the field of study is probabilistic. Second, certain probabilistic principles provide the guidelines in making the inference. Third, as we shall see in the sequel, the reliability of the conclusions is also judged in probabilistic terms.
Random Sampling
Consider a statistical experiment that culminate in outcomes x which are the values
assumed by a r.v. X. Let F be the distribution function of X. One can also obtain n
independent observations on X. This means that the n values observed as
x1 ; x2 ; . . . ; xn are assumed by the r.v. X [This can be obtained by replicating the
experiment under (more or less) identical conditions]. Again each xi may be
regarded as the value assumed by a r.v. Xi, i = 1 (1)n, where X 1 ; X 2 ; . . . X n are
independent random variables with common distribution function F. The set
X 1 ; X 2 ; . . . X n of iid r.v’s is known as a random sample from the distribution
function F. The set of values ðx1 ; x2 ; . . . ; xn Þ is called a realization of the sample
ðX 1 ; X 2 ; . . .; X n Þ.
Parameter and Parameter Space
A constant which changes its value from one situation to another is knownpa-
rameter. The set of all admissible values of a parameter is often called the parameter
space. Parameter is denoted by θ (θ may be a vector). We denote the parameter
space by H.
Example 1.3
(a) Let y = 2x + θ. Here, θ is a parameter and Θ = {θ: −∞ < θ < ∞}.
(b) Θ = {p: 0 < p < 1} (e.g., p a binomial success probability).
(c) Θ = {λ: λ > 0} (e.g., λ a Poisson mean).
(d) Here, θ = μ/σ is a parameter and Θ = {μ/σ: −∞ < μ < ∞, σ > 0}.
Family of distributions
Let $X \sim F_\theta$, where θ ∈ Θ. Then the set of distribution functions $\{F_\theta:\ \theta\in\Theta\}$ is called a family of distribution functions. Similarly, we define a family of p.d.f.'s and a family of p.m.f.'s.
Remark
(1) If the functional form of $F_\theta$ is known, then θ can be taken as an index.
(2) In the theory of estimation, we restrict ourselves to the case $\Theta \subseteq R^k$, where k is the number of unknown functionally unrelated parameters.
Statistic
A statistic is a function of observable random variables that is free of unknown parameter(s); that is, a Borel measurable function $f: R^n \to R^k$ of the sample observations $X = (x_1, x_2, \ldots, x_n) \in R^n$ is called a statistic.
In statistics, the job of a statistician is to interpret the data that he has collected and to draw statistically valid conclusions about the population under investigation. But in many cases the raw data, which are too numerous and too costly to store, are not suitable for this purpose. Therefore, the statistician would like to condense the data by computing some statistics and to base his analysis on them, provided there is no loss of relevant information in doing so; that is, the statistician would like to choose those statistics which exhaust all the information about the parameter that is contained in the sample. Keeping this idea in mind, we define sufficient statistics as follows:
Definition Let $X = (X_1, X_2, \ldots, X_n)$ be a random sample from $\{F_\theta:\ \theta\in\Theta\}$. A statistic T(X) is said to be sufficient for θ [or for the family of distributions $\{F_\theta:\ \theta\in\Theta\}$] iff the conditional distribution of X given T is free of θ.
$y_1 = \frac{X_1 + X_2 + \cdots + X_n}{\sqrt{n}}$

and

$y_k = \frac{(k-1)X_k - (X_1 + X_2 + \cdots + X_{k-1})}{\sqrt{k(k-1)}}, \quad k = 2(1)n.$

Clearly, $y_1 \sim N(\sqrt{n}\,\mu,\ 1)$ and each $y_k \sim N(0, 1)$. Again, $y_1, y_2, \ldots, y_n$ are independent. Note that the joint distribution of $y_2, y_3, \ldots, y_n$ does not involve μ, i.e. $y_2, \ldots, y_n$ provide no information on μ. So to estimate μ, we use either the observations on $X_1, X_2, \ldots, X_n$ or simply the observed value of $y_1$. Any analysis based on $y_1$ is just as effective as an analysis based on all the observed values of $X_1, X_2, \ldots, X_n$. Hence, we can say that $y_1$ is a sufficient statistic for μ.
$g(t; \theta) = \frac{n!}{t!(n-t)!}\,\theta^t(1-\theta)^{n-t}, \quad t = 0, 1, 2, \ldots, n.$

Thus the joint probability mass function of $X_1, X_2, \ldots, X_n$ may be written as

$f(x; \theta) = \theta^{x_1+x_2+\cdots+x_n}(1-\theta)^{n-(x_1+x_2+\cdots+x_n)} = \left\{\frac{n!}{t!(n-t)!}\theta^t(1-\theta)^{n-t}\right\}\cdot\frac{t!(n-t)!}{n!}.$

By the Fisher–Neyman criterion, $T(X) = X_1 + X_2 + \cdots + X_n$ is a sufficient statistic for θ. In some cases, it is quite tedious to find the p.d.f. or p.m.f. of a statistic in order to check whether or not it is sufficient for θ. This problem can be avoided if we use the following
Fisher–Neyman factorization theorem
Let $X = (X_1, X_2, \ldots, X_n)$ be a random sample from a population with c.d.f. $F_\theta$, θ ∈ Θ. Furthermore, let all of $X_1, X_2, \ldots, X_n$ be of discrete type or of continuous type. Then a statistic T(x) will be sufficient for θ [or for $\{F_\theta:\ \theta\in\Theta\}$] iff the joint p.m.f. or p.d.f. f(x; θ) of $X_1, X_2, \ldots, X_n$ can be expressed as

$f(x; \theta) = g\{T(x); \theta\}\cdot h(x),$

where the first factor g{T(x); θ} is a function of θ and x only through T(x), and for fixed T(x) the second factor h(x) is free of θ and is non-negative.
Remark 1.1 When we say that a function is free of θ, we mean not only that θ does not appear in the functional form but also that the domain of the function does not involve θ. For example, the function

$f(x) = \begin{cases} 1/2, & \theta-1 < x < \theta+1 \\ 0 & \text{otherwise} \end{cases}$

is not free of θ.
$f(x; \theta) = g\{T(x); \theta\}\cdot h(x).$

Since the function $T' = \psi(T)$ is one-to-one,

$f(x; \theta) = g\left[\psi^{-1}\{T'(x)\}; \theta\right]\cdot h(x).$

Clearly, the first factor on the R.H.S. depends on θ and x only through T′(x), and the second factor h(x) is free of θ and is non-negative. Therefore, by the factorizability criterion, T′(x) is also sufficient for the same parameter θ.
Example 1.6 Let $X_1, X_2, \ldots, X_n$ be a random sample from b(1, π). We show that $\frac{1}{n}\sum X_i$ is a sufficient statistic for π.
The p.m.f. of X is

$f_\theta(x) = \begin{cases} \theta^x(1-\theta)^{1-x} & \text{if } x = 0, 1 \quad [\theta \equiv \pi] \\ 0 & \text{otherwise,} \end{cases}$

where 0 < θ < 1, i.e. the parameter space is Θ = (0, 1). We write $f_\theta(x)$ in the form

$f_\theta(x) = C(x)\,\theta^x(1-\theta)^{1-x} \quad\text{with}\quad C(x) = \begin{cases} 1 & \text{if } x = 0, 1 \\ 0 & \text{otherwise.} \end{cases}$
The joint p.d.f. of $X_1, X_2, \ldots, X_n$ is

$\frac{1}{(\sigma\sqrt{2\pi})^n}\, e^{-\frac{1}{2\sigma^2}\sum_i (x_i-\mu)^2} = e^{-\frac{n(\bar{x}-\mu)^2}{2\sigma^2}}\left\{e^{-\frac{\sum_i (x_i-\bar{x})^2}{2\sigma^2}}\,\frac{1}{(\sigma\sqrt{2\pi})^n}\right\} = g_\mu(t)\,h(x_1, x_2, \ldots, x_n),\ \text{say},$

where $t = \bar{x}$, $g_\mu(t) = e^{-n(\bar{x}-\mu)^2/2\sigma^2}$

and $h(x_1, x_2, \ldots, x_n) = \frac{1}{(\sigma\sqrt{2\pi})^n}\, e^{-\frac{\sum (x_i-\bar{x})^2}{2\sigma^2}}.$

Thus the factorizability condition holds with respect to $T = \bar{X}$, the sample mean, which is therefore sufficient for μ. So is the sum $\sum_i X_i$.
(ii) The unknown variance σ² = θ, say, is supposed to vary over Θ = (0, ∞). The joint p.d.f. of $X_1, X_2, \ldots, X_n$ may be written as

$\prod_i \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(x_i-\mu)^2} = \frac{1}{(\sigma\sqrt{2\pi})^n}\, e^{-\frac{1}{2\sigma^2}\sum_i (x_i-\mu)^2} = g_\theta(t)\,h(x_1, x_2, \ldots, x_n),\ \text{say},$

where $t = \sum_i (x_i-\mu)^2$, $g_\theta(t) = \frac{1}{(\sqrt{2\pi\theta})^n}\, e^{-\frac{t}{2\theta}}$ (σ² ≡ θ) and h(x) = 1. Hence, $T = \sum (X_i - \mu)^2$ is a sufficient statistic for θ. So is $S_0^2 = \frac{1}{n}\sum_i (x_i-\mu)^2$, which in this situation is commonly used to estimate σ².
(iii) Taking the unknown mean and variance to be θ₁ and θ₂, respectively, we now have for θ a vector θ = (θ₁, θ₂) varying over the parameter space (which is a half-plane) Θ = {(θ₁, θ₂): −∞ < θ₁ < ∞, 0 < θ₂ < ∞}. The joint p.d.f. of $X_1, X_2, \ldots, X_n$ may now be written as

$\prod_i \frac{1}{\sqrt{2\pi\theta_2}}\, e^{-\frac{1}{2\theta_2}(x_i-\theta_1)^2} = \frac{1}{(2\pi\theta_2)^{n/2}}\, e^{-\frac{1}{2\theta_2}\left[n(\bar{x}-\theta_1)^2 + (n-1)s^2\right]} = g_\theta(t_1, t_2)\,h(x),\ \text{say},$

where $t_1 = \bar{x}$, $t_2 = s^2 = \sum_i (x_i-\bar{x})^2/(n-1)$,

$g_\theta(t_1, t_2) = \frac{1}{(2\pi\theta_2)^{n/2}}\, e^{-\frac{1}{2\theta_2}\left[n(\bar{x}-\theta_1)^2 + (n-1)s^2\right]} \quad\text{and}\quad h(x) = 1.$

The factorizability condition is thus observed to hold with regard to the statistics $T_1 = \bar{X}$, the sample mean, and $T_2 = s^2$, the sample variance. Hence, $\bar{X}$ and s² are jointly sufficient for θ₁ and θ₂, i.e. $\left(\sum X_i, \sum X_i^2\right)$ is a joint sufficient statistic for (μ, σ²).
Example 1.9 Let $X_1, X_2, \ldots, X_n$ be a random sample from R(0, θ). Show that $X_{(n)} = \max_{1\le i\le n} X_i$ is a sufficient statistic for θ.

Ans.: The joint p.d.f. of $x_1, x_2, \ldots, x_n$ is

$f(x; \theta) = \begin{cases} \frac{1}{\theta^n} & \text{if } 0 < x_i < \theta\ \forall i \\ 0 & \text{otherwise} \end{cases} = \begin{cases} \frac{1}{\theta^n} & \text{if } 0 < x_{(n)} < \theta \\ 0 & \text{otherwise} \end{cases}$

$= \frac{1}{\theta^n}\, I_{(0,\theta)}\{x_{(n)}\}, \quad\text{where}\quad I_{(a,b)}(x) = \begin{cases} 1 & \text{if } a < x < b \\ 0 & \text{otherwise,} \end{cases}$

$= g\{x_{(n)}; \theta\}\cdot h(x),\ \text{say}.$
where

$I_{1(\theta_1,\infty)}\{x_{(1)}\} = \begin{cases} 1 & \text{if } \theta_1 < x_{(1)} < \infty \\ 0 & \text{otherwise} \end{cases} \quad\text{and}\quad I_{2(-\infty,\theta_2)}\{x_{(n)}\} = \begin{cases} 1 & \text{if } -\infty < x_{(n)} < \theta_2 \\ 0 & \text{otherwise,} \end{cases}$

i.e. $f(x; \theta) = g\left[\{x_{(1)}, x_{(n)}\}; (\theta_1, \theta_2)\right]h(x)$, where

$g\left[\{x_{(1)}, x_{(n)}\}; (\theta_1, \theta_2)\right] = \frac{1}{(\theta_2-\theta_1)^n}\, I_{1(\theta_1,\infty)}\{x_{(1)}\}\, I_{2(-\infty,\theta_2)}\{x_{(n)}\} \quad\text{and}\quad h(x) = 1.$

Note that g is a function of (θ₁, θ₂) and x only through $\{x_{(1)}, x_{(n)}\}$, whereas for fixed $\{x_{(1)}, x_{(n)}\}$ the factor h(x) is free of (θ₁, θ₂). For

$f(x) = \frac{1}{\theta_2 - \theta_1}, \quad \theta_1 < x < \theta_2,$

let $X_{(1)} = \min_{1\le i\le n} X_i = y_1$ and $X_{(n)} = \max_{1\le i\le n} X_i = y_2$. The joint p.d.f. of $y_1, y_2$ is

$g(y_1, y_2; \theta_1, \theta_2) = \frac{n(n-1)}{(\theta_2-\theta_1)^n}\,(y_2 - y_1)^{n-2}, \quad \theta_1 < y_1 < y_2 < \theta_2.$
$f(x; \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^2}{2\sigma^2}}, \quad \sigma > 0,$

$= \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{|x|^2}{2\sigma^2}}\cdot 1 = g(t, \sigma)\cdot h(x), \quad h(x) = 1,$

where g(t, σ) is a function of σ and x only through t = |x|, and for fixed t, h(x) = 1 is free of σ. Hence, by the Fisher–Neyman factorization theorem, |X| is sufficient for σ.
Example 1.13 Let $X_1, X_2, \ldots, X_n$ be a random sample from a double-exponential distribution whose p.d.f. may be taken as $f_\theta(x) = \frac{1}{2}\exp(-|x-\theta|)$, where the unknown parameter θ varies over the space Θ = (−∞, ∞). In this case, the joint p.d.f. is $\prod_i f_\theta(x_i) = \frac{1}{2^n}\exp\left(-\sum_i |x_i-\theta|\right)$. For no single statistic T is it possible to express the joint p.d.f. in the form $g_\theta(t)\,h(x_1, x_2, \ldots, x_n)$. Hence, there exists no statistic T which taken alone is sufficient for θ. The whole set $X_1, X_2, \ldots, X_n$, or the set $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$, is of course sufficient.
Remark 1.2 A single sufficient statistic does not always exist. E.g., let $X_1, X_2, \ldots, X_n$ be a random sample from a population having p.d.f.

$f(x; \theta) = \begin{cases} \frac{1}{\theta}, & k\theta < x < (k+1)\theta,\ k > 0 \\ 0 & \text{otherwise.} \end{cases}$

Here, no single sufficient statistic for θ exists. In fact, $\{x_{(1)}, x_{(n)}\}$ is sufficient for θ.
Remark 1.3 Not all functions of a sufficient statistic are sufficient. For example, in random sampling from N(μ, σ²), σ² being known, $\bar{X}^2$ is not sufficient for μ. (Is $\bar{X}^2$ sufficient for μ²?)
Remark 1.4 Not all statistics are sufficient. Let $X_1, X_2$ be a random sample from P(λ). Then $X_1 + 2X_2$ is not sufficient for λ because, in particular,

$P\{X_1 = 0, X_2 = 1 \mid X_1 + 2X_2 = 2\} = \frac{P\{X_1 = 0, X_2 = 1\}}{P\{X_1 + 2X_2 = 2\}} = \frac{P\{X_1=0, X_2=1\}}{P\{X_1=0, X_2=1\} + P\{X_1=2, X_2=0\}}$

$= \frac{e^{-\lambda}\cdot e^{-\lambda}\lambda}{e^{-\lambda}\cdot e^{-\lambda}\lambda + e^{-\lambda}\frac{\lambda^2}{2!}\cdot e^{-\lambda}} = \frac{2}{2+\lambda},$

which depends on λ.
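An empirical check of this conditional probability, 2/(2 + λ), confirms that it changes with λ (simulation sketch; sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
for lam in (0.5, 2.0, 8.0):
    x1 = rng.poisson(lam, size=2_000_000)
    x2 = rng.poisson(lam, size=2_000_000)
    cond = (x1 + 2 * x2 == 2)                 # condition on X1 + 2 X2 = 2
    est = np.mean((x1 == 0) & (x2 == 1) & cond) / np.mean(cond)
    print(lam, est, 2 / (2 + lam))            # estimate vs 2/(2 + lambda)
```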
Then we have

$\sum_{i=1}^{n} u(x_i) = G(T) \qquad (1.2)$

$\frac{\partial u(x_i)}{\partial x_i} = \frac{\partial G(T)}{\partial T}\,\frac{\partial T}{\partial x_i} \qquad (1.4)$

$\frac{\partial G(T; \theta)}{\partial T} = k_1(\theta)\,\frac{\partial G(T)}{\partial T} \;\Rightarrow\; G(T; \theta) = G(T)k_1(\theta) + k_2(\theta)$

$\Rightarrow \frac{\partial\sum_i \log f(x_i; \theta)}{\partial\theta} = G(T)k_1(\theta) + k_2(\theta)$

$\Rightarrow \sum_i \log f(x_i; \theta) = G(T)\int k_1(\theta)\,d\theta + \int k_2(\theta)\,d\theta + c(x)$

$\Rightarrow \prod_i f(x_i; \theta) = A(x)\,e^{\theta_1 G(T) + \theta_2},$

where A(x) is a function of x, θ₁ is a function of θ, and θ₂ is another function of θ. Thus, if a distribution is to have a sufficient statistic for its parameter, it must be of this exponential-family form. Here, for the Poisson distribution,

$f(x; \lambda) = \frac{e^{-\lambda}\lambda^x}{x!} = e^{-\lambda + x\log\lambda - \log x!},$

which is of the form $e^{B_1(\theta)u(x) + B_2(\theta) + R(x)}$. Hence, there exists a sufficient statistic for λ.
Completeness A family of distributions is said to be complete if

$E[g(X)] = 0\ \forall\theta\in\Theta \;\Rightarrow\; P\{g(X) = 0\} = 1\ \forall\theta\in\Theta.$

(a) Let $T \sim b(n, p)$. As $E[g(T)] = 0\ \forall p\in(0,1)$,

$\sum_{t=0}^{n} g(t)\binom{n}{t}p^t(1-p)^{n-t} = 0$

$\Rightarrow (1-p)^n\sum_{t=0}^{n}\binom{n}{t}g(t)\left(\frac{p}{1-p}\right)^t = 0 \quad \forall p\in(0,1)$

$\Rightarrow g(t) = 0 \ \text{for } t = 0, 1, 2, \ldots, n$

$\Rightarrow P\{g(t) = 0\} = 1\ \forall p.$

(b) Let X ~ N(0, σ²). Then X is not complete, as E(X) = 0 but P(X = 0) ≠ 1 for any σ².
points such that T(x) = T(y), then the inference about θ should be the same whether X = x or X = y is observed.

Definition (Sufficient statistic) A statistic T(X) is a sufficient statistic for θ if the conditional distribution of the sample X given the value of T(X) does not depend on θ.

Factorization theorem: Let $f(x\mid\theta)$ denote the joint pdf/pmf of a sample X. A statistic T(X) is a sufficient statistic for θ iff there exist functions $g(t\mid\theta)$ and h(x) such that, for all sample points x and all parameter values θ,

$f(x\mid\theta) = g(t\mid\theta)\,h(x).$

Result: If T(X) is a function of T′(X), then sufficiency of T′(X) implies sufficiency of T(X).

Proof Let $\{B_{t'}: t'\in\tau'\}$ and $\{A_t: t\in\tau\}$ be the partitions induced by T′(X) and T(X), respectively. Since T(X) is a function of T′(X), for each $t'\in\tau'$ we have $B_{t'}\subset A_t$ for some $t\in\tau$. Thus, sufficiency of T′(X)
⇔ the conditional distribution of X = x given T′(X) = t′ is independent of θ, ∀t′ ∈ τ′
⇔ the conditional distribution of X = x given X ∈ B_{t′} is independent of θ, ∀t′ ∈ τ′
⇒ the conditional distribution of X = x given X ∈ A_t is independent of θ, ∀t ∈ τ
⇔ the conditional distribution of X = x given T(X) = t is independent of θ, ∀t ∈ τ
⇔ sufficiency of T(X).
Let

$T(X) = \sum_{j=1}^{n} X_j$

and let U = ψ(T) be a function of T that is not one-to-one, i.e. ψ(t₁) = ⋯ = ψ(t_k) = u for distinct values t₁, …, t_k. Then

$P_\theta[T = t \mid U = u] = \begin{cases} \dfrac{(n\theta)^{t_i}/t_i!}{\sum_{i=1}^{k}(n\theta)^{t_i}/t_i!} & \text{if } t = t_i\ (i = 1, 2, \ldots, k) \\ 0 & \text{otherwise} \end{cases}$

depends on θ, so that U is not sufficient: no function of T other than a one-to-one one retains sufficiency. Hence, $T = \sum_{i=1}^{n} X_i$ is a minimal sufficient statistic.
Remark 1 Since a minimal sufficient statistic is a function of a sufficient statistic, a minimal sufficient statistic is also sufficient.
Remark 2 A minimal sufficient statistic is not unique, since any one-to-one function of a minimal sufficient statistic is also a minimal sufficient statistic.
The definition of a minimal sufficient statistic does not help us to find one, except for verifying whether a given statistic is minimal sufficient. Fortunately, the following result of Lehmann and Scheffé (1950) gives an easier way to find a minimal sufficient statistic.
Theorem Let $f(x\mid\theta)$ be the pmf/pdf of a sample X. Suppose there exists a function T(X) such that, for every two sample points x and y, the ratio $f(x\mid\theta)/f(y\mid\theta)$ is constant as a function of θ (i.e. independent of θ) iff T(x) = T(y). Then T(X) is a minimal sufficient statistic.

Proof Let us assume $f(x\mid\theta) > 0$ for all $x\in\mathcal{X}$ and all θ. First, we show that T(X) is a sufficient statistic. Let $\tau = \{t: t = T(x),\ x\in\mathcal{X}\}$ be the image of $\mathcal{X}$ under T(x). [...] Since this ratio does not depend on θ, the assumptions of the theorem imply T(x) = T(y). Thus T(x) is a function of T′(x), and T(x) is minimal.
Let $(\bar{x}, s_x^2)$ and $(\bar{y}, s_y^2)$ be the sample means and variances corresponding to the x and y samples, respectively. Then we must have

$\frac{f(x\mid\mu,\sigma^2)}{f(y\mid\mu,\sigma^2)} = \frac{(2\pi\sigma^2)^{-n/2}\exp\left\{-\left[n(\bar{x}-\mu)^2 + (n-1)s_x^2\right]/(2\sigma^2)\right\}}{(2\pi\sigma^2)^{-n/2}\exp\left\{-\left[n(\bar{y}-\mu)^2 + (n-1)s_y^2\right]/(2\sigma^2)\right\}}$

$= \exp\left\{\left[-n(\bar{x}^2 - \bar{y}^2) + 2n\mu(\bar{x}-\bar{y}) - (n-1)(s_x^2 - s_y^2)\right]/(2\sigma^2)\right\}.$

This ratio will be constant as a function of μ and σ² iff $\bar{x} = \bar{y}$ and $s_x^2 = s_y^2$, i.e. $(\bar{x}, s_x^2) = (\bar{y}, s_y^2)$. Then, by the above theorem, $(\bar{X}, s^2)$ is a minimal sufficient statistic.

$\sum_{i=1}^{n} X_i$ and $\sum_{i=1}^{n} X_i^2$ are each singly sufficient for μ, $\sum_{i=1}^{n} X_i^2$ being minimal. (This particular example also establishes the fact that single sufficiency does not imply minimal sufficiency.)
It can readily be shown that there exists no T for which (1.8) holds. [E.g., let θ₀ be a value of θ and consider $T_0 \equiv \theta_0$. Note that the m.s.e. of T₀ at θ = θ₀ is 0, but the m.s.e. of T₀ for other values of θ may be quite large.] To sidestep this, we introduce the concept of unbiasedness. Actually, we choose an estimator on the basis of a set of criteria; such a set must depend on the purpose for which we want to choose an estimator. Usually, the set consists of the following criteria: (i) unbiasedness; (ii) minimum-variance unbiased estimation; (iii) consistency; and (iv) efficiency.
Unbiasedness
An estimator T is said to be an unbiased estimator (u.e.) of θ [or of γ(θ)] iff E(T) = θ [or γ(θ)] ∀θ ∈ Θ. Otherwise, it is called a biased estimator. The quantity b(θ, T) = E_θ(T) − θ is called the bias. A function γ(θ) is estimable if it has an unbiased estimator.
Let $X_1, X_2, \ldots, X_n$ be a random sample from a population with mean μ and variance σ². Then $\bar{X}$ and $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$ are u.e.'s of μ and σ², respectively.

Note
(i) Every individual observation is an unbiased estimator of the population mean.
(ii) Every partial mean is an unbiased estimator of the population mean.
(iii) Every partial sample variance [e.g. $\frac{1}{k-1}\sum_{1}^{k}(X_i - \bar{X}_k)^2$, with $\bar{X}_k = \frac{1}{k}\sum_{1}^{k} X_i$ and k < n] is an unbiased estimator of σ².
Example 1.15 Let $X_1, X_2, \ldots, X_n$ be a random sample from N(μ, σ²). Then $\bar{X}$ and $s^2 = \frac{1}{n-1}\sum_{1}^{n}(X_i - \bar{X})^2$ are u.e.'s for μ and σ², respectively. But the estimator $s = \sqrt{\frac{1}{n-1}\sum_{1}^{n}(X_i - \bar{X})^2}$ is a biased estimator of σ. The bias is

$b(s, \sigma) = \sigma\left[\sqrt{\frac{2}{n-1}}\,\frac{\Gamma(n/2)}{\Gamma\left(\frac{n-1}{2}\right)} - 1\right].$

(For a random sample from P(λ), both $\bar{X}$ and s² are unbiased estimators of λ.) Let $T_a = a\bar{X} + (1-a)s^2$, 0 ≤ a ≤ 1. Here, $T_a$ is an unbiased estimator of λ.
Remark 1.11 An unbiased estimator may be absurd.

Example Let X ~ P(λ). Then $T(X) = (-2)^X$ is an unbiased estimator of $e^{-3\lambda}$, since

$E\{T(X)\} = \sum_x (-2)^x\,\frac{e^{-\lambda}\lambda^x}{x!} = e^{-\lambda}\sum_x \frac{(-2\lambda)^x}{x!} = e^{-\lambda}e^{-2\lambda} = e^{-3\lambda}.$
$\Rightarrow \mathrm{MSE}(T_1) = V(T_1) + b^2(T_1, \theta), \quad \mathrm{MSE}(T_2) = V(T_2);$

if $V(T_2) > V(T_1) + b^2(T_1, \theta)$, then we prefer T₁.

E.g., let $X_1, X_2, \ldots, X_n$ be a random sample from N(μ, σ²). Then $s^2 = \frac{1}{n-1}\sum_i (X_i - \bar{X})^2$ is an unbiased estimator of σ². Clearly, $\frac{n-1}{n+1}\,\frac{1}{n-1}\sum_i (X_i - \bar{X})^2 = \frac{n-1}{n+1}s^2$ is a biased estimator of σ².

As $\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$, $V\left(\frac{(n-1)s^2}{\sigma^2}\right) = 2(n-1)$

$\Rightarrow V(s^2) = \frac{2}{n-1}\sigma^4 = \text{MSE of } s^2.$

On the other hand, the MSE of $\frac{n-1}{n+1}s^2$ is

$V\left(\frac{n-1}{n+1}s^2\right) + \left[E\left(\frac{n-1}{n+1}s^2\right) - \sigma^2\right]^2 = \left(\frac{n-1}{n+1}\right)^2\frac{2\sigma^4}{n-1} + \frac{4\sigma^4}{(n+1)^2} = \frac{2\sigma^4}{(n+1)^2}(n-1+2) = \frac{2\sigma^4}{n+1} < \frac{2\sigma^4}{n-1}.$
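The two mean square errors can be checked by simulation (hypothetical choices n = 10, σ² = 4):

```python
import numpy as np

rng = np.random.default_rng(5)
n, sigma2, reps = 10, 4.0, 200_000
x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
s2 = x.var(axis=1, ddof=1)                  # unbiased estimator s^2
shrunk = (n - 1) / (n + 1) * s2             # biased estimator, smaller MSE

print(np.mean((s2 - sigma2) ** 2), 2 * sigma2**2 / (n - 1))       # ~3.56
print(np.mean((shrunk - sigma2) ** 2), 2 * sigma2**2 / (n + 1))   # ~2.91
```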
The implication of this statement is that $\bar{T}_k$ gets closer and closer to the true value of the parameter as k → ∞ (k becomes larger and larger). On the other hand, if the Tᵢ's are biased estimators with common bias β, then $\bar{T}_k$ approaches the wrong value θ + β instead of the true value θ even as k → ∞.
Problem 1.1 Let $X_1, X_2, \ldots, X_n$ be a random sample from b(1, π). Show that

(i) $\frac{X(X-1)}{n(n-1)}$ is an unbiased estimator of π²;
(ii) $\frac{X(n-X)}{n(n-1)}$ is an unbiased estimator of π(1 − π),

where X = number of successes in n tosses = $\sum_{i=1}^{n} X_i$.
Minimum-Variance Unbiased Estimator (MVUE)
Let U be the set of all u.e.'s (T) of θ with $E(T^2) < \infty\ \forall\theta\in\Theta$. Then an estimator T₀ ∈ U will be called a minimum-variance unbiased estimator (MVUE) of θ {or γ(θ)} if V(T₀) ≤ V(T) ∀θ and for every T ∈ U.

Result 1.1 Let U be the set of all u.e.'s (T) of θ with E(T²) < ∞ ∀θ ∈ Θ. Furthermore, let U₀ be the class of all u.e.'s (v) of '0' {zero} with E(v²) < ∞ ∀θ, i.e. U₀ = {v: E(v) = 0 ∀θ and E(v²) < ∞}. Then an estimator T₀ ∈ U will be an MVUE of θ iff

$\mathrm{Cov}(T_0, v) = E(T_0 v) = 0 \quad \forall\theta,\ \forall v\in U_0. \quad (1.9)$

Now, $V(T_0 - T) \ge 0$
$\Rightarrow V(T_0) + V(T) - 2\,\mathrm{Cov}(T_0, T) \ge 0$
$\Rightarrow V(T_0) + V(T) - 2V(T_0) \ge 0$ (by (1.10))
$\Rightarrow V(T) \ge V(T_0)\ \forall\theta\in\Theta.$
$E(T_1 v) = 0 = E(T_2 v).$

Now,

$V(T) \ge \frac{(\gamma'(\theta))^2}{I(\theta)} \quad \forall\theta.$
Proof Since $1 = \int_{R^n} f(x; \theta)\,dx$, we have from condition (b)

$0 = \int_{R^n}\frac{\partial f(x;\theta)}{\partial\theta}\,dx = \int_{R^n}\frac{\partial\log f(x;\theta)}{\partial\theta}\, f(x;\theta)\,dx. \quad (1.11)$

Again, since T(X) is a u.e. of γ(θ), we have from condition (c)

$\gamma'(\theta) = \int_{R^n} T(x)\,\frac{\partial\log f(x;\theta)}{\partial\theta}\, f(x;\theta)\,dx. \quad (1.12)$
Remark 1 If the variables are of discrete type, the underlying conditions and the proof of the Cramér–Rao inequality are similar, the multiple integrals being replaced by multiple sums.

Remark 2 For any estimator T having expectation γ(θ) = θ + B(T, θ),

$\mathrm{MSE} = E(T-\theta)^2 = V(T) + B^2(T,\theta) \ge \frac{\{\gamma'(\theta)\}^2}{I(\theta)} + B^2(T,\theta) = \frac{\left[1 + B'(T,\theta)\right]^2}{I(\theta)} + B^2(T,\theta).$
Remark 3 Assuming that f(x; θ) is differentiable not only once but twice, equality is attained when

$\frac{\partial\log f(x;\theta)}{\partial\theta} = k(\theta)\{T - \gamma(\theta)\}.$

Note that every MVBE is an MVUE, but the converse may not be true.
Remark 6 Distributions admitting an MVUE
A distribution having an MVUE of γ(θ) must satisfy $\frac{\partial\log f(x;\theta)}{\partial\theta} = k(\theta)\{T - \gamma(\theta)\}$. This is a differential equation, so

$\log f(x;\theta) = T\int k(\theta)\,d\theta - \int k(\theta)\gamma(\theta)\,d\theta + c(x)$

$\Rightarrow f(x;\theta) = A(x)\,e^{T\theta_1 + \theta_2},$

where θ₁, θ₂ are functions of θ; that is, the likelihood factorizes as $L = g(T, \theta)\,h(x_1, x_2, \ldots, x_n)$.
Remark 7 A necessary condition for the regularity conditions to hold is that the domain of positive p.d.f. must be free of θ.

Example 1.16 Let X ~ U[0, θ]. Let us compute $nE\left(\frac{\partial\log f(x;\theta)}{\partial\theta}\right)^2$, which is $\frac{n}{\theta^2}$. So the Cramér–Rao lower bound, in this case, for the variance of an unbiased estimator of θ is $\frac{\theta^2}{n}$ (apparent). Now, we consider the estimator $T = \frac{n+1}{n}X_{(n)}$, $X_{(n)} = \max(X_1, X_2, \ldots, X_n)$. The p.d.f. of $X_{(n)}$ is $f_{X_{(n)}}(x) = n\left(\frac{x}{\theta}\right)^{n-1}\frac{1}{\theta}$, $0 \le x \le \theta$.

$\therefore E\left(X_{(n)}\right) = \frac{n}{\theta^n}\int_0^{\theta} x^n\,dx = \frac{n}{\theta^n}\cdot\frac{\theta^{n+1}}{n+1} = \frac{n}{n+1}\theta,$

so T is unbiased for θ, and a direct computation gives $V(T) = \frac{\theta^2}{n(n+2)} < \frac{\theta^2}{n}$. This is not surprising, because the regularity conditions do not hold here. Actually, here $\frac{\partial f(x;\theta)}{\partial\theta}$ exists for θ ≠ x but not for θ = x, since

$f(x;\theta) = \begin{cases} \frac{1}{\theta} & \text{if } \theta \ge x \\ 0 & \text{if } \theta < x. \end{cases}$
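A simulation sketch of this example (hypothetical θ = 5, n = 20) shows the variance of T falling below the apparent bound θ²/n:

```python
import numpy as np

# For U[0, theta], T = ((n+1)/n) X_(n) is unbiased with variance
# theta^2 / (n(n+2)), below the "bound" theta^2 / n: the Cramer-Rao
# inequality does not apply because the support depends on theta.
rng = np.random.default_rng(6)
theta, n, reps = 5.0, 20, 200_000
x = rng.uniform(0, theta, size=(reps, n))
t = (n + 1) / n * x.max(axis=1)

print(t.mean())                              # ~5.0: unbiased
print(t.var(), theta**2 / (n * (n + 2)))     # ~0.0568, matching the formula
print(theta**2 / n)                          # apparent "bound" 1.25
```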
'=' holds iff $V(\hat{\theta}\mid T) = 0$, i.e. iff $\hat{\theta} = E(\hat{\theta}\mid T)$ with probability '1', where

$V(\hat{\theta}\mid T) = E\left[\{\hat{\theta} - E(\hat{\theta}\mid T)\}^2 \mid T\right].$
Result 1.6: Lehmann–Scheffé Theorem If T is a complete sufficient statistic for θ and $\hat{\theta}$ is an unbiased estimator of γ(θ), then $E(\hat{\theta}\mid T)$ will be an MVUE of γ(θ).

Proof Let $\hat{\theta}_1, \hat{\theta}_2 \in U = \{\hat{\theta}: E(\hat{\theta}) = \gamma(\theta),\ E(\hat{\theta}^2) < \infty\}$. Then $E\{E(\hat{\theta}_1\mid T)\} = \gamma(\theta) = E\{E(\hat{\theta}_2\mid T)\}$ (from Result 1.5). Hence, $E\{E(\hat{\theta}_1\mid T) - E(\hat{\theta}_2\mid T)\} = 0\ \forall\theta$
$\Rightarrow P\left[E(\hat{\theta}_1\mid T) = E(\hat{\theta}_2\mid T)\right] = 1$ (∵ T is complete).
Example 1.17 Let X ~ P(λ). Show that

$d(x) = \begin{cases} 1 & \text{if } x = 0 \\ 0 & \text{otherwise} \end{cases}$

is the only unbiased estimator of $\gamma(\lambda) = e^{-\lambda}$. Is it an MVUE of $e^{-\lambda}$?

Answer
Let h(x) be an unbiased estimator of $e^{-\lambda} = \theta$, say. Then $E\{h(x)\} = \theta\ \forall\theta$

$\Rightarrow \sum_{x=0}^{\infty} h(x)\,\frac{\left\{\log_e(1/\theta)\right\}^x}{x!}\,\theta = \theta \quad \forall\theta$

$\Rightarrow h(x) = \begin{cases} 1 & \text{if } x = 0 \\ 0 & \text{if } x \ne 0, \end{cases}$

⇒ h(x), i.e. d(x), is the only unbiased estimator of $e^{-\lambda}$. Here, the unbiased estimator of $e^{-\lambda}$ is unique and its variance exists:

$E\{h(x)\}^2 = 1\cdot P(x=0) + 0\cdot\sum_{i=1}^{\infty} P(x=i) = e^{-\lambda} < \infty.$

Therefore, d(x) is an MVUE of $\gamma(\lambda) = e^{-\lambda}$.
3. $\int \frac{\partial^i}{\partial\theta^i} f_\theta(x)\,dx = \frac{\partial^i}{\partial\theta^i}\int f_\theta(x)\,dx \quad \forall\theta,\ i = 1, 2, \ldots, k;$ and

4. $V_k(\theta) = \left(\nu_{ij}(\theta)\right),\ i, j = 1, 2, \ldots, k,$ exists and is positive definite ∀θ, where

$\nu_{ij}(\theta) = E\left[\frac{1}{f_\theta(x)}\frac{\partial^i f_\theta(x)}{\partial\theta^i}\cdot\frac{1}{f_\theta(x)}\frac{\partial^j f_\theta(x)}{\partial\theta^j}\right].$

Then

$\mathrm{Var}_\theta(T) \ge g' V^{-1} g \quad \forall\theta,$

where

$g' = \left(g^{(1)}(\theta), g^{(2)}(\theta), \ldots, g^{(k)}(\theta)\right), \quad g^{(i)}(\theta) = \frac{\partial^i}{\partial\theta^i}\, g(\theta).$
Proof
Define

$\beta_i(x, \theta) = \frac{1}{f_\theta(x)}\frac{\partial^i f_\theta(x)}{\partial\theta^i}.$

Then

$E[\beta_i(x,\theta)] = \int \frac{1}{f_\theta(x)}\frac{\partial^i f_\theta(x)}{\partial\theta^i}\, f_\theta(x)\,dx = 0,$

$V[\beta_i(x,\theta)] = E[\beta_i(x,\theta)]^2 = E\left[\frac{1}{f_\theta(x)}\frac{\partial^i f_\theta(x)}{\partial\theta^i}\right]^2 = \nu_{ii}(\theta), \qquad \mathrm{Cov}(\beta_i, \beta_j) = \nu_{ij}(\theta),$

$\mathrm{Cov}(T, \beta_i) = \int t(x)\,\frac{1}{f_\theta(x)}\frac{\partial^i f_\theta(x)}{\partial\theta^i}\, f_\theta(x)\,dx = \frac{\partial^i}{\partial\theta^i}\int t(x)\, f_\theta(x)\,dx = g^{(i)}(\theta).$
Let

$\Sigma_{(k+1)\times(k+1)} = \mathrm{Disp}\begin{pmatrix} T \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix} = \begin{pmatrix} V_\theta(T) & g^{(1)}(\theta) & g^{(2)}(\theta) & \cdots & g^{(k)}(\theta) \\ g^{(1)}(\theta) & \nu_{11} & \nu_{12} & \cdots & \nu_{1k} \\ g^{(2)}(\theta) & \nu_{21} & \nu_{22} & \cdots & \nu_{2k} \\ \vdots & & & & \\ g^{(k)}(\theta) & \nu_{k1} & \nu_{k2} & \cdots & \nu_{kk} \end{pmatrix} = \begin{pmatrix} V_\theta(T) & g' \\ g & V \end{pmatrix}.$

Then

$|\Sigma| = |V|\left(V_\theta(T) - g'V^{-1}g\right).$

As $|\Sigma| \ge 0$ and $|V| \ge 0$,

$V_\theta(T) - g'V^{-1}g \ge 0, \quad\text{i.e.}\quad V_\theta(T) \ge g'V^{-1}g \quad \forall\theta.$
Cor.: For k = 1, $V_\theta(T) \ge \frac{\{g'(\theta)\}^2}{\nu_{11}(\theta)} = \frac{\{g'(\theta)\}^2}{I(\theta)}$ = Cramér–Rao lower bound.

The case of equality holds when $|\Sigma| = 0$

$\Rightarrow R(\Sigma) < k+1, \ \text{or}\ R(\Sigma) \le k; \quad R(V) = k; \quad R(\Sigma) \ge R(V) \Rightarrow R(\Sigma) = k,$

where R(·) denotes the rank of a matrix.
Lemma 1.1 Let $X' = (x_1, x_2, \ldots, x_p)$ and $D(X) = \Sigma_{p\times p}$. Then Σ is of rank r (≤ p) iff $x_1, x_2, \ldots, x_p$ satisfy (p − r) linear restrictions of the form

$a_{11}\{x_1 - E(x_1)\} + a_{12}\{x_2 - E(x_2)\} + \cdots + a_{1p}\{x_p - E(x_p)\} = 0$
$a_{21}\{x_1 - E(x_1)\} + a_{22}\{x_2 - E(x_2)\} + \cdots + a_{2p}\{x_p - E(x_p)\} = 0$
⋮
$a_{p-r,1}\{x_1 - E(x_1)\} + a_{p-r,2}\{x_2 - E(x_2)\} + \cdots + a_{p-r,p}\{x_p - E(x_p)\} = 0$

with probability 1.

Put p = k + 1, r = k; x₁ = T, x₂ = β₁, …, x_p = β_k. Then R(Σ) = k iff T, β₁, β₂, …, β_k satisfy one restriction with probability '1' of the form
Result
$T - g(\theta) = b'\beta$ with probability '1' ⇒ $T - g(\theta) = g'V^{-1}\beta$ with probability '1'.

Proof $T - g(\theta) = b'\beta \Rightarrow V_\theta(T) = g'V^{-1}g$. Consider

$V_\theta\left(b'\beta - g'V^{-1}\beta\right) = V_\theta\left(T - g'V^{-1}\beta\right) = V_\theta(T) + g'V^{-1}V(\beta)V^{-1}g - 2g'V^{-1}\mathrm{Cov}(T, \beta).$
Now

$D_{n+1} = g_{n+1}'V_{n+1}^{-1}g_{n+1} = g_{n+1}'C'(C')^{-1}V_{n+1}^{-1}C^{-1}Cg_{n+1} = (Cg_{n+1})'(CV_{n+1}C')^{-1}(Cg_{n+1})$

for any non-singular matrix $C_{(n+1)\times(n+1)}$. Choose

$C = \begin{pmatrix} I_n & 0 \\ -m_n'V_n^{-1} & 1 \end{pmatrix}.$

Then

$Cg_{n+1} = \begin{pmatrix} I_n & 0 \\ -m_n'V_n^{-1} & 1 \end{pmatrix}\begin{pmatrix} g_n \\ g_{n+1} \end{pmatrix} = \begin{pmatrix} g_n \\ g_{n+1} - m_n'V_n^{-1}g_n \end{pmatrix}$

and

$CV_{n+1}C' = \begin{pmatrix} I_n & 0 \\ -m_n'V_n^{-1} & 1 \end{pmatrix}\begin{pmatrix} V_n & m_n \\ m_n' & \nu_{n+1,n+1} \end{pmatrix}\begin{pmatrix} I_n & -V_n^{-1}m_n \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} V_n & 0 \\ 0 & E_{n+1,n+1} \end{pmatrix},$

where $E_{n+1,n+1} = \nu_{n+1,n+1} - m_n'V_n^{-1}m_n$. Then

$D_{n+1} = \left(g_n',\ g_{n+1} - m_n'V_n^{-1}g_n\right)\begin{pmatrix} V_n^{-1} & 0 \\ 0 & E_{n+1,n+1}^{-1} \end{pmatrix}\begin{pmatrix} g_n \\ g_{n+1} - m_n'V_n^{-1}g_n \end{pmatrix} = g_n'V_n^{-1}g_n + \frac{\left(g_{n+1} - m_n'V_n^{-1}g_n\right)^2}{E_{n+1,n+1}} \ge g_n'V_n^{-1}g_n = D_n,$

i.e. $D_{n+1} \ge D_n$.
If there exists no unbiased estimator T of g(θ) for which V(T) attains the nth Bhattacharyya lower bound (BLB), then one can try to find a sharper lower bound by considering the (n + 1)th BLB. In case the lower bound is attained at the nth stage, then $D_{n+1} = D_n$. However, $D_{n+1} = D_n$ does not imply that the lower bound is attained at the nth stage.
Example 1.18 Let $X_1, X_2, \ldots, X_n$ be a random sample of i.i.d. observations from N(θ, 1):

$f_\theta(x) = \text{const}\cdot e^{-\frac{1}{2}\sum (x_i-\theta)^2}, \qquad g(\theta) = \theta^2.$

$\bar{X} \sim N\left(\theta, \frac{1}{n}\right)$, i.e. $E(\bar{X}) = \theta$, $V(\bar{X}) = \frac{1}{n}$

$\Rightarrow E(\bar{X}^2) - E^2(\bar{X}) = \frac{1}{n} \Rightarrow E(\bar{X}^2) = \theta^2 + \frac{1}{n} \Rightarrow E\left(\bar{X}^2 - \frac{1}{n}\right) = \theta^2; \quad T = \bar{X}^2 - \frac{1}{n}.$

$\frac{\partial}{\partial\theta} f_\theta(x) = \text{const}\cdot e^{-\frac{1}{2}\sum(x_i-\theta)^2}\sum(x_i-\theta)$

$\beta_1 = \frac{1}{f_\theta(x)}\frac{\partial}{\partial\theta} f_\theta(x) = \sum(x_i-\theta)$

$\beta_2 = \frac{1}{f_\theta(x)}\frac{\partial^2}{\partial\theta^2} f_\theta(x) = \left\{\sum(x_i-\theta)\right\}^2 - n$

$E(\beta_1) = 0, \quad E(\beta_2) = 0, \quad E(\beta_1^2) = n,$

$E(\beta_1\beta_2) = E\left\{\sum(x_i-\theta)\right\}^3 - nE\left\{\sum(x_i-\theta)\right\} = 0,$

$E(\beta_2^2) = E\left\{\sum(x_i-\theta)\right\}^4 + n^2 - 2nE\left\{\sum(x_i-\theta)\right\}^2 = 3n^2 + n^2 - 2n\cdot n = 2n^2.$

$V = \begin{pmatrix} n & 0 \\ 0 & 2n^2 \end{pmatrix} \Rightarrow V^{-1} = \begin{pmatrix} \frac{1}{n} & 0 \\ 0 & \frac{1}{2n^2} \end{pmatrix}.$
$\frac{\partial}{\partial\theta} f_\theta(x) = f_\theta(x)\left[k_1'(\theta)t(x) + k_2'(\theta)\right]$

$\beta_1 = \frac{1}{f_\theta(x)}\frac{\partial}{\partial\theta} f_\theta(x) = k_1'(\theta)t(x) + k_2'(\theta)$

$\frac{\partial^2}{\partial\theta^2} f_\theta(x) = f_\theta(x)\left[\left\{k_1'(\theta)t(x) + k_2'(\theta)\right\}^2 + k_1''(\theta)t(x) + k_2''(\theta)\right]$

$\beta_2 = \frac{1}{f_\theta(x)}\frac{\partial^2}{\partial\theta^2} f_\theta(x) = \left\{k_1'(\theta)t(x) + k_2'(\theta)\right\}^2 + k_1''(\theta)t(x) + k_2''(\theta)$

Generally, $\beta_i = \frac{1}{f_\theta(x)}\frac{\partial^i}{\partial\theta^i} f_\theta(x) = \left\{k_1'(\theta)t(x) + k_2'(\theta)\right\}^i + P_{i-1}\{t(x), \theta\},$

where $P_{i-1}\{t(x), \theta\}$ is a polynomial in t(x) of degree at most (i − 1).
Let $P_{i-1}\{t(x),\theta\} = \sum_{j=0}^{i-1} Q_{ij}(\theta)\,t^j(x)$. Then

$\beta_i = \left\{k_1'(\theta)t(x) + k_2'(\theta)\right\}^i + \sum_{j=0}^{i-1} Q_{ij}(\theta)\,t^j(x) = \sum_{j=0}^{i}\binom{i}{j}\left\{k_1'(\theta)\right\}^j t^j(x)\left\{k_2'(\theta)\right\}^{i-j} + \sum_{j=0}^{i-1} Q_{ij}(\theta)\,t^j(x). \quad (1.15)$

[An estimator $\hat{g}(x)$ is a polynomial of degree k in t(x) iff it is of the form]

$\hat{g}(x) = a_0(\theta) + \sum_{i=1}^{k} a_i(\theta)\beta_i \quad (1.16)$

with $a_k(\theta) \ne 0$.

Proof ('Only if' part) Given that $\hat{g}(x)$ is of the form (1.16), we have to show that $\hat{g}(x)$ is a polynomial of degree k in t(x). From (1.15), $\beta_i$ is a polynomial of degree i in t(x). So, by putting the value of $\beta_i$ into (1.16), we get $\hat{g}(x)$ as a polynomial of degree k in t(x), since $a_k \ne 0$.

('If' part) Given that

$\hat{g}(x) = \sum_{j=0}^{k} C_j\, t^j(x), \quad [C_k \ne 0], \quad (1.17)$

a polynomial of degree k in t(x), consider

$a_0(\theta) + \sum_{i=1}^{k} a_i(\theta)\beta_i = a_0(\theta) + \sum_{i=1}^{k} a_i(\theta)\sum_{j=0}^{i-1} Q_{ij}(\theta)\,t^j(x) + \sum_{i=1}^{k} a_i(\theta)\sum_{j=0}^{i}\binom{i}{j}\left\{k_1'(\theta)\right\}^j t^j(x)\left\{k_2'(\theta)\right\}^{i-j}$ (from (1.15))

$= \sum_{j=0}^{k} t^j(x)\left\{k_1'(\theta)\right\}^j\sum_{i=j}^{k}\binom{i}{j} a_i(\theta)\left\{k_2'(\theta)\right\}^{i-j} + \sum_{j=0}^{k-1} t^j(x)\sum_{i=j+1}^{k} a_i(\theta)Q_{ij}(\theta)$

$= t^k(x)\left\{k_1'(\theta)\right\}^k a_k(\theta) + \sum_{j=0}^{k-1} t^j(x)\left[\left\{k_1'(\theta)\right\}^j a_j(\theta) + \sum_{i=j+1}^{k} a_i(\theta)\left\{\binom{i}{j}\left\{k_1'(\theta)\right\}^j\left\{k_2'(\theta)\right\}^{i-j} + Q_{ij}(\theta)\right\}\right]. \quad (1.18)$
$\Pr\{|T_n - E(T_n)| \le \varepsilon_0\} > 1 - \frac{V(T_n)}{\varepsilon_0^2}.$

Now, $|T_n - \gamma(\theta)| \le |T_n - E(T_n)| + |E(T_n) - \gamma(\theta)|$, so

$|T_n - E(T_n)| \le \varepsilon_0 \Rightarrow |T_n - \gamma(\theta)| \le \varepsilon_0 + |E(T_n) - \gamma(\theta)|.$

Hence,

$\Pr\{|T_n - \gamma(\theta)| \le \varepsilon_0 + |E(T_n) - \gamma(\theta)|\} \ge \Pr\{|T_n - E(T_n)| \le \varepsilon_0\} > 1 - \frac{V(T_n)}{\varepsilon_0^2}. \quad (1.19)$

Taking $\varepsilon = \varepsilon_0 + \varepsilon_0''$,

$\Pr\{|T_n - \gamma(\theta)| \le \varepsilon\} > 1 - \delta \quad\text{whenever } n \ge n_0.$

Since $\varepsilon_0$, $\varepsilon_0''$ and δ are arbitrary positive numbers, the proof is complete.
(It should be remembered that consistency is a large sample criterion)
Example 1.19 Let $X_1, X_2, \ldots, X_n$ be a random sample from a population with mean μ and standard deviation σ. Then $\bar{X}_n = \frac{1}{n}\sum_i X_i$ is a consistent estimator of μ.

Proof $E(\bar{X}_n) = \mu$ and $V(\bar{X}_n) = \frac{\sigma^2}{n} \to 0$ as n → ∞. The sufficient condition for consistency holds, so $\bar{X}_n$ is consistent for μ.
Alternatively, by Chebyshev's inequality, for any ε,

$\Pr\left\{|\bar{X}_n - \mu| \le \varepsilon\right\} > 1 - \frac{\sigma^2}{n\varepsilon^2}.$

Now for any δ we can find an n₀ such that $\Pr\{|\bar{X}_n - \mu| \le \varepsilon\} > 1 - \delta$ whenever n ≥ n₀ (here $\delta = \frac{\sigma^2}{n_0\varepsilon^2}$).
Example 1.20 Show that in random sampling from a normal population, the sample mean is a consistent estimator of the population mean.

Proof For any ε (> 0),

$\Pr\left\{|\bar{X}_n - \mu| \le \varepsilon\right\} = \Pr\left\{|Z| \le \frac{\varepsilon\sqrt{n}}{\sigma}\right\} = \int_{-\varepsilon\sqrt{n}/\sigma}^{\varepsilon\sqrt{n}/\sigma}\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}t^2}\,dt, \quad\text{where } Z = \frac{\bar{X}_n - \mu}{\sigma}\sqrt{n} \sim N(0, 1).$

Hence, we can choose an n₀, depending on any two positive numbers ε and δ, such that

$\Pr\{|\bar{X}_n - \mu| \le \varepsilon\} > 1 - \delta \quad\text{whenever } n \ge n_0.$

$\therefore \bar{X}_n \xrightarrow{\Pr} \mu$ as n → ∞, i.e. $\bar{X}_n$ is consistent for μ.
Example 1.21 Show that for random sampling from the Cauchy population with density function

$f(x; \mu) = \frac{1}{\pi}\,\frac{1}{1 + (x-\mu)^2}, \quad -\infty < x < \infty,$

the sample mean is not a consistent estimator of μ but the sample median is a consistent estimator of μ.

Answer
Let $X_1, X_2, \ldots, X_n$ be a random sample from f(x; μ). It can be shown that the sample mean $\bar{X}$ is distributed as X (a single observation).

$\therefore \Pr\left\{|\bar{X}_n - \mu| \le \varepsilon\right\} = \frac{1}{\pi}\int_{-\varepsilon}^{\varepsilon}\frac{dZ}{1 + Z^2} = \frac{2}{\pi}\tan^{-1}\varepsilon, \quad\text{taking } Z = \bar{X} - \mu,$

which does not depend on n, so the sample mean is not consistent. On the other hand, since $E(\tilde{X}_n) \to \mu$ and $V(\tilde{X}_n) \to 0$ as n → ∞, the sufficient condition for consistency holds; therefore $\tilde{X}_n$, the sample median, is consistent for μ.
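The contrast between the two estimators is easy to see by simulation (hypothetical μ = 3):

```python
import numpy as np

# For Cauchy data the sample mean does not settle down as n grows,
# while the sample median converges to the location parameter mu.
rng = np.random.default_rng(7)
mu = 3.0
for n in (100, 10_000, 1_000_000):
    x = mu + rng.standard_cauchy(n)
    print(n, x.mean(), np.median(x))   # mean stays erratic; median -> 3.0
```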
Remark 1.17 Consistency is essentially a large sample criterion.
Remark 1.18 Let $T_n$ be a consistent estimator of γ(θ) and ψ(y) a continuous function. Then ψ(Tₙ) will be a consistent estimator of ψ{γ(θ)}.

Proof Since $T_n$ is a consistent estimator of γ(θ), for any two positive numbers ε₁ and δ we can find an n₀ such that

$\Pr\{|T_n - \gamma(\theta)| \le \varepsilon_1\} > 1 - \delta \quad\text{whenever } n \ge n_0.$

Now, ψ(Tₙ) is a continuous function of Tₙ; therefore, for any ε we can choose an ε₁ such that $|T_n - \gamma(\theta)| \le \varepsilon_1$ implies $|\psi(T_n) - \psi\{\gamma(\theta)\}| \le \varepsilon$.
A consistent estimator is not necessarily unbiased.

Remark 1.21 An unbiased estimator is not necessarily consistent; e.g., for $f(x) = \frac{1}{2}e^{-|x-\theta|}$, −∞ < x < ∞, an unbiased estimator of θ is $\frac{X_{(1)} + X_{(n)}}{2}$, but it is not consistent.

Remark 1.22 A consistent estimator may be meaningless; e.g., let

$T_n' = \begin{cases} 0 & \text{if } n \le 10^{10} \\ T_n & \text{if } n > 10^{10}. \end{cases}$

If $T_n$ is consistent, then $T_n'$ is also consistent, but $T_n'$ is meaningless for any practical purpose.
Remark 1.23 If T₁ and T₂ are consistent estimators of γ₁(θ) and γ₂(θ), then
(i) (T₁ + T₂) is consistent for γ₁(θ) + γ₂(θ), and
(ii) T₁T₂ is consistent for γ₁(θ)γ₂(θ).

Proof (i) Since T₁ and T₂ are consistent for γ₁(θ) and γ₂(θ), we can always choose an n₀ such that

$\Pr\{|T_1 - \gamma_1(\theta)| \le \varepsilon_1\} > 1 - \delta_1 \quad\text{and}\quad \Pr\{|T_2 - \gamma_2(\theta)| \le \varepsilon_2\} > 1 - \delta_2 \quad\text{whenever } n \ge n_0,$

where ε₁, ε₂, δ₁, δ₂ are arbitrary positive numbers. Writing ε₁ + ε₂ = ε and δ₁ + δ₂ = δ, and using $P(AB) \ge P(A) + P(B) - 1$,

$\Pr\{|T_1 + T_2 - \gamma_1(\theta) - \gamma_2(\theta)| \le \varepsilon\} \ge 1 - \delta_1 + 1 - \delta_2 - 1 = 1 - \delta \quad\text{for } n \ge n_0.$

(ii) $|T_1T_2 - \gamma_1(\theta)\gamma_2(\theta)| = |\{T_1-\gamma_1(\theta)\}\{T_2-\gamma_2(\theta)\} + \gamma_1(\theta)\{T_2-\gamma_2(\theta)\} + \gamma_2(\theta)\{T_1-\gamma_1(\theta)\}|$
$\le |\{T_1-\gamma_1(\theta)\}\{T_2-\gamma_2(\theta)\}| + |\gamma_1(\theta)|\,|T_2-\gamma_2(\theta)| + |\gamma_2(\theta)|\,|T_1-\gamma_1(\theta)|$
$\le \varepsilon_1\varepsilon_2 + |\gamma_1(\theta)|\,\varepsilon_2 + |\gamma_2(\theta)|\,\varepsilon_1 = \varepsilon,\ \text{say}.$
As $E(m_r') = \mu_r'$ and $V(m_r') = \frac{\mu_{2r}' - \mu_r'^2}{n}$,

$V(m_r') \to 0$ as n → ∞, so $m_r'$ is consistent for $\mu_r'$. Also, $E(m_r) = \mu_r + O\left(\frac{1}{n}\right)$ and

$V(m_r) = \frac{1}{n}\left(\mu_{2r} - \mu_r^2 - 2r\mu_{r-1}\mu_{r+1} + r^2\mu_{r-1}^2\mu_2\right) + O\left(\frac{1}{n^2}\right) \to 0 \quad\text{as } n \to \infty,$

so $m_r$ is consistent for $\mu_r$.

(c) It can also be shown that $b_1$ and $b_2$ are consistent estimators of $\beta_1 = \frac{\mu_3^2}{\mu_2^3}$ and $\beta_2 = \frac{\mu_4}{\mu_2^2}$.
Example 1.23 Let T₁ and T₂ be two unbiased estimators of θ with efficiencies e₁ and e₂, respectively. If ρ denotes the correlation coefficient between T₁ and T₂, then

$\sqrt{e_1e_2} - \sqrt{(1-e_1)(1-e_2)} \le \rho \le \sqrt{e_1e_2} + \sqrt{(1-e_1)(1-e_2)}.$

Proof For any real a, $T = aT_1 + (1-a)T_2$ will also be an unbiased estimator of θ. Now,

$V(T) = a^2V(T_1) + (1-a)^2V(T_2) + 2a(1-a)\rho\sqrt{V(T_1)V(T_2)} \ge V_0,$

where V₀ is the variance of the most efficient estimator, so that $V(T_1) = V_0/e_1$ and $V(T_2) = V_0/e_2$. Therefore,

$a^2\frac{V_0}{e_1} + (1-a)^2\frac{V_0}{e_2} + 2a(1-a)\rho\frac{V_0}{\sqrt{e_1e_2}} \ge V_0$

$\Rightarrow a^2\left(\frac{1}{e_1} + \frac{1}{e_2} - \frac{2\rho}{\sqrt{e_1e_2}}\right) - 2a\left(\frac{1}{e_2} - \frac{\rho}{\sqrt{e_1e_2}}\right) + \frac{1}{e_2} - 1 \ge 0.$

Taking $a = \dfrac{\frac{1}{e_2} - \frac{\rho}{\sqrt{e_1e_2}}}{\frac{1}{e_1} + \frac{1}{e_2} - \frac{2\rho}{\sqrt{e_1e_2}}}$ (the minimizing value), we get

$\left(\frac{1}{e_2} - 1\right)\left(\frac{1}{e_1} + \frac{1}{e_2} - \frac{2\rho}{\sqrt{e_1e_2}}\right) - \left(\frac{1}{e_2} - \frac{\rho}{\sqrt{e_1e_2}}\right)^2 \ge 0$

$\Rightarrow \rho^2 - 2\rho\sqrt{e_1e_2} - 1 + e_1 + e_2 \le 0$

$\Rightarrow \left(\rho - \sqrt{e_1e_2}\right)^2 \le (1-e_1)(1-e_2)$

$\Rightarrow \left|\rho - \sqrt{e_1e_2}\right| \le \sqrt{(1-e_1)(1-e_2)}.$ Hence the result. ∎
Remark 1.27 The correlation coefficient between T and the most efficient estimator is √e, where e is the efficiency of the unbiased estimator T. Put e₂ = e and e₁ = 1 in the above inequality, $\sqrt{e_1e_2} - \sqrt{(1-e_1)(1-e_2)} \le \rho \le \sqrt{e_1e_2} + \sqrt{(1-e_1)(1-e_2)}$, and the result $\rho = \sqrt{e}$ follows at once.
Chapter 2
Methods of Estimation
2.1 Introduction
If the correspondence between the $\mu_r'$ and θ is one-to-one, with inverse function $\theta_i = f_i(\mu_1', \mu_2', \ldots, \mu_k')$, i = 1, 2, …, k, then the method-of-moments estimate becomes $\hat{\theta}_i = f_i(m_1', m_2', \ldots, m_k')$. Now, if the function fᵢ(·) is continuous, then by the weak law of large numbers the method-of-moments estimators will be consistent. This method gives maximum likelihood estimators when f(x, θ) = exp(b₀ + b₁x + b₂x² + ⋯), and so in this case it gives efficient estimators. But the estimators obtained by this method are not in general efficient. This is one of the simplest methods; therefore, these estimates can be used as a first approximation from which to obtain better estimates. The method is not applicable when the theoretical moments do not exist, as in the case of the Cauchy distribution.
Example 2.1 Let $X_1, X_2, \ldots, X_n$ be a random sample from the p.d.f.

$f(x; \alpha, \beta) = \frac{1}{B(\alpha,\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}, \quad 0 < x < 1;\ \alpha, \beta > 0.$

Find the estimators of α and β by the method of moments.

Solution As worked out in the Introduction, equating $E(x) = \frac{\alpha}{\alpha+\beta}$ and $E(x^2) = \frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)}$ to $\bar{x}$ and $\frac{1}{n}\sum x_i^2$, respectively, and solving gives

$\hat{\beta} = \frac{(\bar{x}-1)\left(\sum x_i^2 - n\bar{x}\right)}{\sum (x_i-\bar{x})^2} \quad\text{and}\quad \hat{\alpha} = \frac{\bar{x}\,\hat{\beta}}{1-\bar{x}}.$
This method of estimation is due to R.A. Fisher. It is the most important general method of estimation. Let $X = (X_1, X_2, \ldots, X_n)$ denote a random sample with joint p.d.f. or p.m.f. f(x; θ), θ ∈ Θ (θ may be a vector). The function f(x; θ), considered as a function of θ, is called the likelihood function; it is denoted by L(θ). The principle of maximum likelihood consists of choosing an estimate, say $\hat{\theta}$, within the admissible range of θ, that maximizes the likelihood. $\hat{\theta}$ is called the maximum likelihood estimate (MLE) of θ. In other words, $\hat{\theta}$ will be an MLE of θ if

$L(\hat{\theta}) \ge L(\theta) \quad \forall\theta\in\Theta,$

or, equivalently,

$\log L(\hat{\theta}) \ge \log L(\theta) \quad \forall\theta\in\Theta.$

Again, if log L(θ) is differentiable within Θ and $\hat{\theta}$ is an interior point, then $\hat{\theta}$ will be a solution of

$\frac{\partial\log L(\theta)}{\partial\theta_i} = 0, \quad i = 1, 2, \ldots, k; \qquad \theta_{k\times 1} = (\theta_1, \theta_2, \ldots, \theta_k)'.$
Answer

$L(\theta) = \text{const}\cdot e^{-\sum_{i=1}^{n}|x_i - \theta|}$

Maximization of L(θ) is equivalent to the minimization of $\sum_{i=1}^{n}|x_i - \theta|$. Now, $\sum_{i=1}^{n}|x_i - \theta|$ is least when $\theta = \tilde{X}$, the sample median, since the mean deviation about the median is least. Hence $\tilde{X}$ is an MLE of θ.
Properties of MLE
(a) If a sufficient statistic exists, then the MLE will be a function of the sufficient statistic.

Proof Let T be a sufficient statistic for the family $\{f(x; \theta): \theta\in\Theta\}$. By the factorization theorem, we have $\prod_{i=1}^{n} f(x_i; \theta) = g\{T(x); \theta\}\,h(x)$. To find the MLE, we maximize g{T(x); θ} with respect to θ. Since g{T(x); θ} depends on θ and x only through T(x), the maximizing value of θ, i.e. the MLE, is a function of T(x).
Remark 2.1 Property (a) does not imply that an MLE is itself a sufficient statistic.

Example 2.3 Let $X_1, X_2, \ldots, X_n$ be a random sample from a population having p.d.f.

$f(x; \theta) = \begin{cases} 1 & \text{if } \theta \le x \le \theta + 1 \\ 0 & \text{otherwise.} \end{cases}$

Then

$L(\theta) = \begin{cases} 1 & \text{if } \theta \le \min X_i \le \max X_i \le \theta + 1 \\ 0 & \text{otherwise.} \end{cases}$

Any value of θ satisfying $\max X_i - 1 \le \theta \le \min X_i$ will be an MLE of θ. In particular, min Xᵢ is an MLE of θ, but it is not sufficient for θ. In fact, here (min Xᵢ, max Xᵢ) is a sufficient statistic.
(b) If T is the MVBE, then the likelihood equation will have the solution T.

Proof Since T is an MVBE,

$\frac{\partial\log f(x;\theta)}{\partial\theta} = (T - \theta)k(\theta).$

Now, $\frac{\partial\log f(x;\theta)}{\partial\theta} = 0 \Rightarrow \theta = T$ [∵ k(θ) ≠ 0].
(ii) $\left|\frac{\partial f(x;\theta)}{\partial\theta}\right| < A_1(x)$, $\left|\frac{\partial^2 f(x;\theta)}{\partial\theta^2}\right| < A_2(x)$ and $\left|\frac{\partial^3 f(x;\theta)}{\partial\theta^3}\right| < B(x)$, where A₁(x) and A₂(x) are integrable functions of x and

$\int_{-\infty}^{\infty} B(x)\,f(x;\theta)\,dx < M, \quad\text{a finite quantity;}$

(iii) $\int_{-\infty}^{\infty}\left(\frac{\partial\log f(x;\theta)}{\partial\theta}\right)^2 f(x;\theta)\,dx$ is finite and positive.

If $\hat{\theta}_n$ is an MLE of θ on the basis of a sample of size n from a population having p.d.f. (or p.m.f.) f(x; θ) which satisfies the above regularity conditions, then $\sqrt{n}(\hat{\theta}_n - \theta)$ is asymptotically normal with mean 0 and variance

$\left[\int_{-\infty}^{\infty}\left(\frac{\partial\log f(x;\theta)}{\partial\theta}\right)^2 f(x;\theta)\,dx\right]^{-1}.$

Also, $\hat{\theta}_n$ is asymptotically efficient and consistent.
(e) An MLE may not be unique.

Example 2.4 Let $f(x;\theta) = \begin{cases} 1 & \text{if } \theta \le x \le \theta+1 \\ 0 & \text{otherwise.} \end{cases}$

Then

$L(\theta) = \begin{cases} 1 & \text{if } \theta \le \min x_i \le \max x_i \le \theta+1 \\ 0 & \text{otherwise,} \end{cases} \quad\text{i.e.}\quad L(\theta) = \begin{cases} 1 & \text{if } \max x_i - 1 \le \theta \le \min x_i \\ 0 & \text{otherwise,} \end{cases}$

so every point of the interval [max xᵢ − 1, min xᵢ] is an MLE.

[Figure: the likelihood L(θ) plotted against θ, attaining its maximum at θ = Max Xᵢ.]

From the figure, it is clear that the likelihood L(θ) will be largest when θ = Max Xᵢ. Therefore, Max Xᵢ will be an MLE of θ. Note that $E(\max X_i) = \frac{n}{n+1}\theta \ne \theta$. Therefore, here the MLE is a biased estimator.
Example 2.6 Let

$f(x; p) = p^x(1-p)^{1-x}, \quad x = 0, 1;\ p\in\left\{\frac{1}{4}, \frac{3}{4}\right\}.$

Then

$L(p) = \begin{cases} p & \text{if } x = 1 \\ 1-p & \text{if } x = 0, \end{cases}$ i.e. L(p) will be maximized at $\hat{p} = \frac{3}{4}$ if x = 1 and $\hat{p} = \frac{1}{4}$ if x = 0.

The MLE may thus be written as $T = \frac{2x+1}{4}$, whose MSE is

$E(T - p)^2 = E\left(\frac{2x+1}{4} - p\right)^2 = \frac{1}{16}E\{2(x-p) + 1 - 2p\}^2 = \frac{1}{16}E\left\{4(x-p)^2 + (1-2p)^2 + 4(x-p)(1-2p)\right\} = \frac{1}{16}\left\{4p(1-p) + (1-2p)^2\right\} = \frac{1}{16}.$
$f(x; \theta) = \frac{1}{2}e^{-|x-\theta|}, \quad -\infty < x < \infty,\ -\infty < \theta < \infty.$

Here, the regularity conditions do not hold. However, the MLE (= the sample median) is asymptotically normal and efficient.
Solution

$L(\alpha, \beta) = \beta^n\, e^{-\beta\sum_{i=1}^{n}(x_i - \alpha)}$

$\log_e L(\alpha, \beta) = n\log_e\beta - \beta\sum_{i=1}^{n}(x_i - \alpha)$

$\frac{\partial\log L}{\partial\beta} = \frac{n}{\beta} - \sum(x_i - \alpha) \quad\text{and}\quad \frac{\partial\log L}{\partial\alpha} = n\beta.$

Now, $\frac{\partial\log L}{\partial\alpha} = 0$ gives us β = 0, which is inadmissible. Thus, the method of differentiation fails here. From the expression for L(α, β), it is clear that for fixed β (>0), L(α, β) becomes maximum when α is largest. The largest possible value of α is $X_{(1)} = \min x_i$. Now, we maximize $L(X_{(1)}, \beta)$ with respect to β. This can be done by the method of differentiation:

$\frac{\partial\log L(x_{(1)}, \beta)}{\partial\beta} = 0 \Rightarrow \frac{n}{\beta} - \sum(x_i - \min x_i) = 0 \Rightarrow \beta = \frac{n}{\sum(x_i - \min x_i)}.$

So, the MLE of (α, β) is $\left(\min x_i,\ \frac{n}{\sum(x_i - \min x_i)}\right)$.
Example 2.10 Let $X_1, X_2, \ldots, X_n$ be a random sample from

$f(x; \alpha, \beta) = \begin{cases} \frac{1}{\beta-\alpha}, & \alpha \le x \le \beta \\ 0, & \text{otherwise.} \end{cases}$

(a) Show that the MLE of (α, β) is (Min Xᵢ, Max Xᵢ).
(b) Also find the estimators of α and β by the method of moments.

Proof
(a) $L(\alpha, \beta) = \frac{1}{(\beta-\alpha)^n} \quad\text{if } \alpha \le \min x_i < \max x_i \le \beta. \quad (2.1)$

It is evident from (2.1) that the likelihood will be made as large as possible when (β − α) is made as small as possible. Clearly, α cannot be larger than Min xᵢ and β cannot be smaller than Max xᵢ; hence, the smallest possible value of (β − α) is (Max xᵢ − Min xᵢ). Thus the MLEs of α and β are $\hat{\alpha} = \min x_i$ and $\hat{\beta} = \max x_i$, respectively.

(b) We know $E(x) = \mu_1' = \frac{\alpha+\beta}{2}$ and $V(x) = \mu_2 = \frac{(\beta-\alpha)^2}{12}$. Hence

$\frac{\alpha+\beta}{2} = \bar{x} \quad\text{and}\quad \frac{(\beta-\alpha)^2}{12} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2.$

Solving, we get

$\hat{\alpha} = \bar{x} - \sqrt{\frac{3\sum(x_i-\bar{x})^2}{n}} \quad\text{and}\quad \hat{\beta} = \bar{x} + \sqrt{\frac{3\sum(x_i-\bar{x})^2}{n}}.$
Expanding $\left(\frac{\partial\log L}{\partial\theta}\right)_{\theta=\hat{\theta}}$ about θ = θ₀ and neglecting terms involving $(\hat{\theta} - \theta_0)$ with powers higher than unity,

$0 = \left(\frac{\partial\log L}{\partial\theta}\right)_{\theta=\hat{\theta}} \simeq \left(\frac{\partial\log L}{\partial\theta}\right)_{\theta=\theta_0} + (\hat{\theta} - \theta_0)\left(\frac{\partial^2\log L}{\partial\theta^2}\right)_{\theta=\theta_0}$

$\Rightarrow 0 \simeq \left(\frac{\partial\log L}{\partial\theta}\right)_{\theta=\theta_0} - (\hat{\theta} - \theta_0)\, I(\theta_0), \quad\text{where } I(\theta) = E\left(\frac{\partial\log L}{\partial\theta}\right)^2.$

Thus, the first approximate value of θ is

$\theta^{(1)} = \theta_0 + \frac{\left(\frac{\partial\log L}{\partial\theta}\right)_{\theta=\theta_0}}{I(\theta_0)}.$
This method may be used when the population is grouped into a number of mutually exclusive and exhaustive classes and the observations are given in the form of frequencies. Suppose there are k classes and pᵢ(θ) is the probability of an individual belonging to the ith class. Let fᵢ denote the sample frequency. Clearly, $\sum_{i=1}^{k} p_i(\theta) = 1$ and $\sum_{i=1}^{k} f_i = n$. The discrepancy between the observed frequencies and the corresponding expected frequencies is measured by the Pearsonian χ², which is given by

$\chi^2 = \sum_{i=1}^{k}\frac{\{f_i - np_i(\theta)\}^2}{np_i(\theta)} = \sum_{i=1}^{k}\frac{f_i^2}{np_i(\theta)} - n.$

The principle of the method of minimum χ² consists of choosing as the estimate of θ the value $\hat{\theta}$ that minimizes χ²; we first consider the minimum-χ² equations $\frac{\partial\chi^2}{\partial\theta_j} = 0$. It can be shown that for large n the estimates obtained by minimum χ² are approximately equal to the MLEs. Difficulty arises if some of the classes are empty. In this case, we minimize the modified measure

$\chi''^2 = \sum_{i:\, f_i \ne 0}\frac{\{f_i - np_i(\theta)\}^2}{f_i} + 2M,$

where 2M is a correction term arising from the empty classes.
i
i¼r þ 1 ni l 1 þ nu
i P
i¼r þ k1 l
1 ¼ 0.
i¼r þ k1
pi ð lÞ
Since there is only one parameter, i.e. p ¼ 1 we get the only above equation. By
solving, we get
P
r P
1
ipi ðlÞ X
rþ k2 ipi ðlÞ
i¼0 i¼r þ k1
l¼
n^ nL P r þ ini þ nu P 1
pi ðlÞ i¼r þ 1 pi ðlÞ
i¼0 i¼r þ k1
In the method of least squares, we consider the estimation of parameters using some specified form of the expectation and second moment of the observations. For fitting a curve of the form $y = f(x; \beta_0, \beta_1, \ldots, \beta_p)$ to the data (xᵢ, yᵢ), i = 1, 2, …, n, we may use the method of least squares. This method consists of minimizing the sum of squares

$S = \sum_{i=1}^{n} e_i^2, \quad\text{where } e_i = y_i - f(x_i; \beta_0, \beta_1, \ldots, \beta_p),\ i = 1, 2, \ldots, n,$

with respect to β₀, β₁, …, β_p. Sometimes we minimize $\sum w_i e_i^2$ instead of $\sum e_i^2$; in that case, it is called the weighted least squares method. To minimize S, we consider the (p + 1) first-order partial derivatives and get (p + 1) equations in (p + 1) unknowns. Solving these equations, we get the least square estimates of the βᵢ's.

In general, the least square estimates do not have any optimum properties, even asymptotically. However, in the case of linear estimation this method provides good estimators. When $f(x_i; \beta_0, \beta_1, \ldots, \beta_p)$ is a linear function of the parameters and the x-values are known, the least square estimators will be BLUE. Again, if we assume that the eᵢ's are independently and identically normally distributed, then a linear estimator of the form $a'\hat{\beta}$ will be MVUE within the entire class of unbiased estimators. In general, we consider n uncorrelated observations $y_1, y_2, \ldots, y_n$ such that

$E(y_i) = \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki}.$
EðY Þ ¼ Xb
V ðeÞ ¼ E ðee0 Þ ¼ r2 I
@/
¼0
@b
Or; 2X 0 ðY Xb Þ ¼ 0:
^ ¼ ðX 0 X Þ1 X 0 Y:
The least square estimators b 0 s is thus given by the vector b
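A minimal numerical sketch of $\hat\beta = (X'X)^{-1}X'Y$ in Python (the data are made up for illustration; solving the normal equations is numerically preferable to forming the inverse explicitly):

```python
import numpy as np

# Hypothetical data for y_i = b1 + b2*x_i + e_i.
x2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

X = np.column_stack([np.ones_like(x2), x2])    # design matrix with intercept
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # solves (X'X) beta = X'Y
residuals = y - X @ beta_hat
```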
Example 2.13 Let $y_i = \beta_1 x_{1i} + \beta_2 x_{2i} + e_i$, $i = 1, 2, \ldots, n$, i.e. $E(y_i) = \beta_1 x_{1i} + \beta_2 x_{2i}$, with $x_{1i} = 1$ for all $i$.
Find the least square estimates of $\beta_1$ and $\beta_2$. Also prove that the method of maximum likelihood and the method of least squares are identical for the case of the normal distribution.
Solution In matrix notation we have $E(Y) = X\beta$, where
$$X = \begin{pmatrix} 1 & x_{21} \\ 1 & x_{22} \\ \vdots & \vdots \\ 1 & x_{2n} \end{pmatrix}, \qquad \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}, \qquad Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}.$$
Now, $\hat\beta = (X'X)^{-1}X'Y$. Here
$$X'X = \begin{pmatrix} n & \sum x_{2i} \\ \sum x_{2i} & \sum x_{2i}^2 \end{pmatrix}, \qquad X'Y = \begin{pmatrix} \sum y_i \\ \sum x_{2i}y_i \end{pmatrix},$$
$$\Rightarrow \hat\beta = \frac{1}{n\sum x_{2i}^2 - \left(\sum x_{2i}\right)^2}\begin{pmatrix} \sum x_{2i}^2 & -\sum x_{2i} \\ -\sum x_{2i} & n \end{pmatrix}\begin{pmatrix} \sum y_i \\ \sum x_{2i}y_i \end{pmatrix}.$$
Hence
$$\hat\beta_2 = \frac{n\sum x_{2i}y_i - \sum x_{2i}\sum y_i}{n\sum x_{2i}^2 - \left(\sum x_{2i}\right)^2} = \frac{\sum x_{2i}y_i - n\bar{x}_2\bar{y}}{\sum x_{2i}^2 - n\bar{x}_2^2} = \frac{\sum (x_{2i}-\bar{x}_2)(y_i-\bar{y})}{\sum (x_{2i}-\bar{x}_2)^2}$$
and
$$\hat\beta_1 = \frac{\sum x_{2i}^2\sum y_i - \sum x_{2i}\sum x_{2i}y_i}{n\sum x_{2i}^2 - \left(\sum x_{2i}\right)^2} = \frac{\bar{y}\sum x_{2i}^2 - \bar{x}_2\sum x_{2i}y_i}{\sum x_{2i}^2 - n\bar{x}_2^2} = \bar{y} - \bar{x}_2\hat\beta_2.$$
For the second part, if the $e_i$'s are i.i.d. $N(0, \sigma^2)$, the log-likelihood is, apart from a constant, $-\frac{1}{2\sigma^2}\phi$, where
$$\phi = \sum_{i=1}^n (y_i - \beta_1 - \beta_2 x_{2i})^2.$$
Maximizing the likelihood with respect to $\beta_1, \beta_2$ is therefore the same as minimizing $\phi$, which is precisely the least squares criterion; hence the two methods coincide for the normal distribution.
Answer
$$m_1' = \bar{x}, \qquad m_2' = \frac1n\sum_{i=1}^n x_i^2.$$
Hence $r\theta = \bar{x}$ and $r(r+1)\theta^2 = \frac1n\sum_{i=1}^n x_i^2$. By solving, we get
$$\hat{r} = \frac{n\bar{x}^2}{\sum_{i=1}^n (x_i-\bar{x})^2} \quad\text{and}\quad \hat\theta = \frac{\sum_{i=1}^n (x_i-\bar{x})^2}{n\bar{x}}.$$
(ii) $L = \frac{1}{\theta^{nr}\{\Gamma(r)\}^n}\, e^{-\frac1\theta\sum_{i=1}^n x_i}\prod_{i=1}^n x_i^{r-1}$
$$\log L = -nr\log\theta - n\log\Gamma(r) - \frac1\theta\sum_{i=1}^n x_i + (r-1)\sum_{i=1}^n \log x_i.$$
Now, $\frac{\partial\log L}{\partial\theta} = -\frac{nr}{\theta} + \frac{n\bar{x}}{\theta^2} = 0 \Rightarrow \hat\theta = \frac{\bar{x}}{r}$.
Thus, for this example, estimators of $\theta$ and $r$ are more easily obtained by the method of moments than by the method of maximum likelihood.
Example 2.15 A sample of size one is drawn from the p.d.f.
$$f(x; b) = \frac{2}{b^2}(b-x), \quad 0<x<b.$$
Find $\hat{b}$, the MLE of $b$, and $b^*$, the estimator of $b$ based on the method of moments. Show that $\hat{b}$ is biased but $b^*$ is unbiased, and that the efficiency of $\hat{b}$ w.r.t. $b^*$ is 2/3.
Solution
$$L = \frac{2}{b^2}(b-x), \qquad \frac{\partial\log L}{\partial b} = -\frac2b + \frac{1}{b-x} = 0 \Rightarrow b = 2x.$$
Thus the MLE of $b$ is given by $\hat{b} = 2x$.
Now, $E(x) = \frac{2}{b^2}\int_0^b (bx-x^2)\,dx = \frac{b}{3}$. Hence $\frac{b}{3} = x \Rightarrow b^* = 3x$; the estimator of $b$ based on the method of moments is $b^* = 3x$.
Now,
$$E(\hat{b}) = 2\cdot\frac{b}{3} = \frac{2b}{3} \neq b, \qquad E(b^*) = 3\cdot\frac{b}{3} = b.$$
Hence $\hat{b}$ is biased but $b^*$ is unbiased.
Again,
$$E(x^2) = \frac{2}{b^2}\int_0^b (bx^2-x^3)\,dx = \frac{b^2}{6} \Rightarrow V(x) = \frac{b^2}{6} - \frac{b^2}{9} = \frac{b^2}{18},$$
$$V(b^*) = 9V(x) = \frac{b^2}{2}, \qquad V(\hat{b}) = 4V(x) = \frac{2b^2}{9},$$
$$M(\hat{b}) = V(\hat{b}) + \left[E(\hat{b})-b\right]^2 = \frac{2b^2}{9} + \left(\frac{2b}{3}-b\right)^2 = \frac13 b^2.$$
Thus, comparing mean square errors, $\frac{M(\hat{b})}{V(b^*)} = \frac{b^2/3}{b^2/2} = \frac23$: the efficiency of $\hat{b}$ with respect to $b^*$ is $\frac23$.
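This comparison is easy to check by simulation. A Python sketch (the true value $b = 2$ and the replication count are illustrative assumptions); sampling uses the inverse CDF $F(x) = (2bx - x^2)/b^2$, which gives $x = b(1-\sqrt{1-u})$:

```python
import numpy as np

rng = np.random.default_rng(0)
b = 2.0
u = rng.uniform(size=100_000)
x = b * (1.0 - np.sqrt(1.0 - u))     # one draw from f(x;b) per replication

mse_mle = np.mean((2 * x - b) ** 2)  # MSE of b-hat = 2x, ~ b^2/3
mse_mom = np.mean((3 * x - b) ** 2)  # MSE of b*   = 3x, ~ b^2/2
print(mse_mle / mse_mom)             # ~ 2/3
```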
Chapter 3
Theory of Testing of Hypothesis
3.1 Introduction
$$p(x/\theta) = \binom{m}{x}p^x(1-p)^{m-x}, \quad\text{or, for a sample, } p(\underline{x}/\theta) = \prod_i\binom{m}{x_i}\,p^{\sum x_i}(1-p)^{\sum(m-x_i)}.$$
Here the combinatorial factor $\binom{m}{x}$ is known, but $p^x(1-p)^{m-x}$, and hence $p(x/\theta)$ itself, is unknown if $p$ is unknown.
Example 3.2 Let $X_1, X_2, \ldots, X_{n_1}$ be i.i.d. $P(\lambda_1)$ and $Y_1, Y_2, \ldots, Y_{n_2}$ be i.i.d. $P(\lambda_2)$. Also, they are independent, and $n_1$ and $n_2$ are known.
Now, for i.i.d. $N(\mu, \sigma^2)$ observations,
$$p(x/\theta) = (2\pi)^{-n/2}\sigma^{-n}\,e^{-\frac{1}{2\sigma^2}\sum_1^n(x_i-\mu)^2},$$
or $= (2\pi)^{-n/2}\,e^{-\frac12\sum_1^n(x_i-\mu)^2}$ when $\sigma^2 = 1$, or $= (2\pi)^{-n/2}\sigma^{-n}\,e^{-\frac{1}{2\sigma^2}\sum_1^n x_i^2}$ when $\mu = 0$. Accordingly, the families
$$q = \{p(x/\theta): -\infty<\mu<\infty,\ \sigma^2>0\}, \quad \{p(x/\theta): -\infty<\mu<\infty\}, \quad \{p(x/\theta): \sigma^2>0\}$$
are all known, but unknown is $p(x/\theta)$ for the fixed (unknown) $\theta$.
Parametric set-up
$p(x) = p(x/\theta)$, $\theta\in\Theta$. Then we can find $\Theta_H(\subset\Theta)$ and $\Theta_K(\subset\Theta)$ such that
$$H: p\in P_H \Leftrightarrow H: \theta\in\Theta_H, \qquad K: p\in P_K \Leftrightarrow K: \theta\in\Theta_K.$$
Example 3.4 $X_1, X_2, \ldots, X_n$ are i.i.d. $N(\mu, \sigma^2)$. Consider the following hypotheses $(H)$:
1. $H: \mu = 0,\ \sigma^2 = 1$, i.e. $H$ specifies the $N(0, 1)$ distribution
2. $H: \mu \le 0,\ \sigma^2 = 1$
3. $H: \mu = 0,\ \sigma^2 > 0$
4. $H: \sigma^2 = \sigma_0^2$
5. $H: \mu + \sigma = 0$
The first one is a simple hypothesis and the remaining are composite hypotheses.
Definition 3 Let x be the observed value of the random variable X with probability
model pðx=hÞ; h unknown. Wherever X ¼ x is observed, pðx=hÞ is a function of h
only and is called the likelihood of getting such a sample. It is simply called the
likelihood function and often denoted by LðhÞ or Lðh=xÞ:
Definition 4 Test It is a rule for the acceptance or the rejection of the null
hypothesis (H) on the basis of an observed value of X.
Definition 5 Non-randomized test
Let $\omega$ be a subset of the sample space $\mathcal{X}$ such that
$$X\in\omega \Rightarrow \text{rejection of } H, \qquad X\in\mathcal{X}-\omega \Rightarrow \text{acceptance of } H.$$
Then $\omega$ is called the critical region, or a test, for $H$ against $K$. Test '$\omega$' means the rule determined by $\omega$. Note that $\omega$ does not depend on the random experiment (that is, on $X$); so it is called a non-randomized test.
Definition 6 Randomized test
It consists in determining a function $\phi(x)$ such that
(i) $0\le\phi(x)\le1\ \forall x\in\mathcal{X}$;
(ii) $H$ is rejected with probability $\phi(x)$ whenever $X = x$ is observed.
Such a $\phi(x)$ is called a 'critical function', 'test function' or simply 'test' for $H$ against $K$. Here the function $\phi(x)$ depends on the random experiment (that is, on $X$), so the name 'randomized' is justified.
e.g. (i) and (ii) ⇒ whenever $X = x$ is observed, perform a Bernoullian trial with probability of success $\phi(x)$. If the trial results in success, reject $H$; otherwise $H$ is accepted. Thus $\phi(x)$ represents the probability of rejection of $H$.
If $\phi(x)$ is non-randomized with critical region $\omega$, then we have
$$\omega = \{x: \phi(x) = 1\}, \qquad \mathcal{X}-\omega = \{x: \phi(x) = 0\}\ \text{(acceptance region)}.$$
Detailed study of non-randomized tests
If $\omega$ is a non-randomized test, then $H$ is rejected iff $X\in\omega$. In many cases we get a statistic $T = T(X)$ such that, for some $C$, or $C_1$ and $C_2$,
$$[X\in\omega] \Leftrightarrow [T>C] \text{ or } [T<C] \text{ or } [T<C_1 \text{ or } T>C_2],\ C_1<C_2.$$
Such a ‘T’ is called ‘test statistic’.
The event ½T [ C is called right tail test based on T.
The event ½T\C is called left tail test based on T.
The event ½T\C1 or T [ C2 is called two tailed test based on ‘T’.
Sometimes C1 and C2 are such that PfT\C1 =hg ¼ PfT [ C2 =hg8h 2 HH then
the test ½T\C1 or T [ C2 is called equal-tail test based on T.
Testing problem
$H: \theta\in\Theta_H$ versus $K: \theta\in\Theta_K$ $\{\Theta_H\cap\Theta_K = \emptyset\}$. A test $\omega$ is of level $\alpha$ if
$$P_\omega(\theta)\le\alpha\ \forall\theta\in\Theta_H \quad [\alpha: 0<\alpha<1] \tag{3.1}$$
and of size $\alpha$ if
$$P_\omega(\theta)\le\alpha\ \forall\theta\in\Theta_H \text{ and } P_\omega(\theta) = \alpha \text{ for some }\theta\in\Theta_H. \tag{3.2}$$
(For a composite $H$ the size is a supremum, e.g. $\sup\{P_w(\mu, \sigma^2): \mu = 0,\ 0<\sigma^2<\infty\}$.)
Example 3.5 $X_1, X_2, \ldots, X_n$ are i.i.d. $N(\mu, \sigma_0^2)$; $H: \mu\le\mu_0$ against $K: \mu>\mu_0$. Take
$$\omega = \left\{(X_1, \ldots, X_n): \bar{X} > \mu_0 + \frac{\sigma_0}{\sqrt{n}}\right\}.$$
$$P_\omega(\theta) = P_\omega(\mu) = P\left\{\bar{X}>\mu_0+\frac{\sigma_0}{\sqrt{n}}\,\Big|\,\mu\right\} = P\left\{\frac{\sqrt{n}(\bar{X}-\mu)}{\sigma_0} > 1 - \frac{\sqrt{n}(\mu-\mu_0)}{\sigma_0}\,\Big|\,\mu\right\}$$
$$= P\left\{Z > 1-\frac{\sqrt{n}(\mu-\mu_0)}{\sigma_0}\,\Big|\,Z\sim N(0,1)\right\} = \Phi\left(\frac{\sqrt{n}(\mu-\mu_0)}{\sigma_0} - 1\right).$$
$$\text{Size of }\omega = \sup_{\mu\le\mu_0} P_\omega(\mu) = \sup_{\mu\le\mu_0}\Phi\left(\frac{\sqrt{n}(\mu-\mu_0)}{\sigma_0}-1\right) = \Phi(-1) = \text{size of }\omega\text{ for testing } H: \mu = \mu_0.$$
Example 3.6 $X_1, X_2, \ldots, X_n$ are i.i.d. $N(\mu, \sigma_0^2)$; $H: \mu = \mu_0$ against $K: \mu>\mu_0$; $\alpha\in(0, 1)$. Take
$$\omega = \left\{(X_1, \ldots, X_n): \bar{X} > \mu_0 + \frac{\sigma_0}{\sqrt{n}}\tau_\alpha\right\},$$
where $\tau_\alpha$ is the upper $\alpha$-point of $N(0, 1)$. More generally, consider
$$\omega: \{(X_1, \ldots, X_n): \bar{X}>c\}. \tag{3.3}$$
$$P_\omega(\mu_0) = \text{size of }\omega = P\{\bar{X}>c\,|\,\mu_0\} = P\left\{\frac{\sqrt{n}(\bar{X}-\mu_0)}{\sigma_0} > \frac{(c-\mu_0)\sqrt{n}}{\sigma_0}\,\Big|\,\mu_0\right\}$$
$$= P\left\{Z>\frac{(c-\mu_0)\sqrt{n}}{\sigma_0}\,\Big|\,Z\sim N(0,1)\right\} = 0.05\ \text{(given)}$$
$$\Rightarrow \frac{(c-\mu_0)\sqrt{n}}{\sigma_0} = \tau_{0.05} \simeq 1.645 \Rightarrow c = \mu_0 + \frac{1.645\,\sigma_0}{\sqrt{n}}. \tag{3.4}$$
$f(x/p) = p^x(1-p)^{1-x}$, $x = 0, 1$. For $n = 2$ observations and the randomized test that rejects $H$ with probability 1 when $X_1+X_2 = 2$, with probability $\frac12$ when $X_1+X_2 = 1$, and with probability 0 otherwise (at $p = \frac12$),
$$\text{Size} = 1\cdot P\{X_1+X_2 = 2\} + \frac12 P\{X_1+X_2 = 1\} + 0\cdot P\{X_1+X_2 = 0\} = 0.25 + 0.25 = 0.50.$$
The power function of a test $\phi$ is
$$P_\phi(\theta) = \int\phi(x)\,p(x/\theta)\,dx = E_\theta\,\phi(x) \quad\text{when } X \text{ is continuous} \tag{3.5}$$
$$= \sum_x \phi(x)\,p(x/\theta) = E_\theta\,\phi(x) \quad\text{when } X \text{ is discrete,} \tag{3.6}$$
since $P\{E/x, \theta\} = P\{E/x\} = \phi(x)$, $E$ denoting rejection of $H$. In either case we have $P_\phi(\theta) = E_\theta\,\phi(x)\ \forall\theta\in\Theta$.
Special cases
1. Suppose $\phi(x)$ takes only two values, viz. 0 and 1. In that case, $\phi(x)$ is non-randomized with critical region $\omega = \{x: \phi(x) = 1\}$.
Example 3.9 $X_1, X_2, \ldots, X_n$ are i.i.d. Bernoulli $(1, p)$, $n = 25$. Testing problem: $H: p = \frac12$ against $K: p\neq\frac12$. Consider the randomized test
$$\phi(x) = \begin{cases} 1 & \text{if } \sum_1^{25} x_i > c \\ a & \text{if } \sum_1^{25} x_i = c \\ 0 & \text{if } \sum_1^{25} x_i < c. \end{cases} \tag{2}$$
By inspection we find that $P_{p=\frac12}\left\{\sum_1^{25}x_i>17\right\} = 0.02237$ and $P_{p=\frac12}\left\{\sum_1^{25}x_i>16\right\} = 0.0546$. Hence $c = 17$. Now,
$$a = \frac{0.05 - P_{p=\frac12}\left\{\sum_1^{25}x_i>17\right\}}{P_{p=\frac12}\left\{\sum_1^{25}x_i = 17\right\}} = \frac{0.05-0.02237}{0.03223} = 0.8573.$$
Thus the test given by
$$\phi(x) = \begin{cases} 1 & \text{if } \sum_1^{25}x_i>17 \\ 0.8573 & \text{if } \sum_1^{25}x_i = 17 \\ 0 & \text{if } \sum_1^{25}x_i<17 \end{cases}$$
is of size exactly 0.05, whereas the comparable non-randomized test
$$\phi(x) = \begin{cases} 1 & \text{if } \sum_1^{25}x_i\ge17 \\ 0 & \text{if } \sum_1^{25}x_i<17 \end{cases}$$
has size $P_{p=\frac12}\left\{\sum_1^{25}x_i\ge17\right\} = 0.0546 > 0.05$.
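The boundary constant and the randomization probability can be recomputed directly. A Python sketch using scipy (exact binomial tail values may differ slightly in the last digits from the table values quoted above):

```python
from scipy.stats import binom

n, p0, alpha = 25, 0.5, 0.05
# Smallest c with P(S > c | H) <= alpha, S = sum of the x_i ~ Bin(25, 1/2).
c = min(k for k in range(n + 1) if binom.sf(k, n, p0) <= alpha)
a = (alpha - binom.sf(c, n, p0)) / binom.pmf(c, n, p0)
size = binom.sf(c, n, p0) + a * binom.pmf(c, n, p0)   # exactly alpha
print(c, round(a, 4), round(size, 4))                 # c = 17
```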
Performance of x
Our object is to choose x such that Px ðhÞ8h 2 HH and ð1 Px ðhÞÞ8h 2 HK are as
small as possible. While performing a test x we reach any of the following
decisions:
I. Observe X = x, Accept H when θ actually belongs to HH : A correct decision.
II. Observe X = x, Reject H when θ actually belongs to HH : An incorrect
decision.
III. Observe X = x, Accept H when θ actually belongs to HK : An incorrect
decision.
IV. Observe X = x, Reject H when θ actually belongs to HK : A correct decision.
An incorrect decision of the type as stated in II above is called type-I error and
an incorrect decision of the type as stated in III above is called type-II error. Hence,
the performance of x is measured by the following:
(a) Size of type-I error = Probability{type-I error} $= \sup_{\theta\in\Theta_H} P\{X\in\omega/\theta\} = \sup_{\theta\in\Theta_H} P_\omega(\theta)$;
(b) Size of type-II error $= 1 - P_\omega(\theta)\ \forall\theta\in\Theta_K$.
So we want to minimize simultaneously both the errors. In practice, it is not
possible to minimize both of them simultaneously, because the minimization of one
leads to the increase of the other.
Thus the conventional procedure: choose $\omega$ such that, for a given $\alpha\in(0, 1)$, $P_\omega(\theta)\le\alpha\ \forall\theta\in\Theta_H$ and $1-P_\omega(\theta)$, $\theta\in\Theta_K$, is as low as possible, i.e. $P_\omega(\theta)$, $\theta\in\Theta_K$, is as high as possible. An '$\omega$' satisfying the above (if it exists) is called an optimum test at level $\alpha$.
Suppose $\Theta_H = \{\theta_0\}$ and $\Theta_K = \{\theta_1\}$ are single-point sets. The above condition then reduces to: $P_\omega(\theta_1)$ = maximum, subject to $P_\omega(\theta_0)\le\alpha$.
Definition 9
1. For testing $H: \theta\in\Theta_H$ against $K: \theta = \theta_1\notin\Theta_H$, a test '$\omega_0$' is said to be most powerful (MP) of level $\alpha\in(0, 1)$ if
$$P_{\omega_0}(\theta)\le\alpha\ \forall\theta\in\Theta_H \tag{3.7}$$
and $P_{\omega_0}(\theta_1)\ge P_\omega(\theta_1)$ for every $\omega$ satisfying (3.7).
2. For testing $H: \theta\in\Theta_H$ against $K: \theta\in\Theta_K$, a test '$\omega_0$' is uniformly most powerful (UMP) of level $\alpha$ if
$$P_{\omega_0}(\theta)\le\alpha\ \forall\theta\in\Theta_H \tag{3.9}$$
and $P_{\omega_0}(\theta_1)\ge P_\omega(\theta_1)\ \forall\theta_1\in\Theta_K$ and $\forall\omega$ satisfying (3.9). Equivalently, '$\omega_0$' is UMP of size $\alpha$ if $\sup_{\theta\in\Theta_H}P_{\omega_0}(\theta) = \alpha$ and $P_{\omega_0}(\theta_1)\ge P_\omega(\theta_1)\ \forall\theta_1\in\Theta_K$ and $\forall\omega$ satisfying $\sup_{\theta\in\Theta_H}P_\omega(\theta)\le\alpha$. Again, if $\Theta_H = \{\theta_0\}$, the aforesaid conditions reduce to the single-point versions with $\theta = \theta_0$.
The definition of most powerful critical region, i.e. best critical region (BCR) of
size α does not provide a systematic method of determining it. The following
lemma, due to Neyman and Pearson, provides a solution of the problem if we,
however, test a simple hypothesis against a simple alternative.
The Neyman–Pearson lemma may be stated as follows:
$$= \int_{\omega_0-\omega} p(x/\theta_1)\,dx - \int_{\omega-\omega_0} p(x/\theta_1)\,dx \tag{3.10}$$
$$\ge k\left[\int_{\omega_0-\omega} p(x/\theta_0)\,dx - \int_{\omega-\omega_0} p(x/\theta_0)\,dx\right],$$
since $p(x/\theta_1)\ge k\,p(x/\theta_0)$ on $\omega_0-\omega$ and $p(x/\theta_1)\le k\,p(x/\theta_0)$ on $\omega-\omega_0$.
Notes
1. Define $Y = \frac{p(x|\theta_1)}{p(x|\theta_0)}$. If the random variable $Y$ is continuous, we can always find a $k$ such that, for $\alpha\in(0, 1)$, $P[Y\ge k] = \alpha$. If $Y$ is discrete, we can only sometimes find such a $k$. In most cases we have (assuming that $P[Y\ge k]\neq\alpha$ for every $k$)
$$P_{\theta_0}(Y\ge k_1)<\alpha \quad\text{and}\quad P_{\theta_0}(Y\ge k_2)>\alpha, \quad k_1>k_2$$
($\Rightarrow P(Y\ge k) = \alpha$ has no solution). In that case we get a non-randomized test '$\omega_0$' of level $\alpha$, given by
$$\omega_0 = \left\{x: \frac{p(x|\theta_1)}{p(x|\theta_0)}\ge k_1\right\}, \qquad P_{\omega_0}(\theta_0)\le\alpha,$$
or a randomized test
$$\phi_0(x) = \begin{cases} 1 & \text{if } \frac{p(x|\theta_1)}{p(x|\theta_0)}\ge k_1 \\ a = \dfrac{\alpha - P_{\theta_0}\{Y\ge k_1\}}{P_{\theta_0}\{Y = k_2\}} & \text{if } \frac{p(x/\theta_1)}{p(x/\theta_0)} = k_2 \\ 0 & \text{if } \frac{p(x/\theta_1)}{p(x/\theta_0)}<k_2. \end{cases}$$
The test $\phi_0(x)$ is obviously of size $\alpha$.
2. $k = 0 \Rightarrow P_{\omega_0}(\theta_1) = 1 \Rightarrow \omega_0$ is a trivial MP test.
3. If the test $(\omega_0)$ given by the N–P lemma is independent of $\theta_1\in\Theta_K$ (not including $\theta_0$), the test is UMP of size $\alpha$.
4. The test $(\omega_0)$ is unbiased of size $\alpha$.
Again, outside $\omega_0$: $p(x|\theta_1)\le k\,p(x|\theta_0)$, so
$$\int_{\omega_0^c} p(x|\theta_1)\,dx \le k\int_{\omega_0^c} p(x|\theta_0)\,dx \Leftrightarrow 1-P_{\omega_0}(\theta_1) \le k(1-\alpha). \tag{3.12}$$
Dividing (3.11) by (3.12), $\frac{P_{\omega_0}(\theta_1)}{1-P_{\omega_0}(\theta_1)}\ge\frac{\alpha}{1-\alpha} \Leftrightarrow P_{\omega_0}(\theta_1)\ge\alpha$. ⇒ The test is unbiased.
Conclusion An MP test is unbiased. Moreover, if $\omega_0$ is an MP size-$\alpha$ test then, with probability one, it is equivalent to (assuming that $\frac{p(x|\theta_1)}{p(x|\theta_0)}$ has a continuous distribution under $\theta_0$ and $\theta_1$) $\omega_0 = \{x: p(x|\theta_1)>k\,p(x|\theta_0)\}$, where $k$ is such that $P_{\omega_0}(\theta_0) = \alpha\in(0, 1)$. □
Example 3.10 $X_1, X_2, \ldots, X_n$ are i.i.d. $N(\mu, \sigma_0^2)$, $-\infty<\mu<\infty$, $\sigma_0$ known (without any loss of generality take $\sigma_0 = 1$). $X = (X_1, \ldots, X_n)$; observed value of $X$ is $x = (x_1, \ldots, x_n)$. To find the UMP size-$\alpha$ test for $H: \mu = \mu_0$ against $K: \mu>\mu_0$, take any $\mu_1>\mu_0$ and find the MP size-$\alpha$ test for $H: \mu = \mu_0$ against $K: \mu = \mu_1$.
Solution
$$p(x/\mu) = \left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac12\sum_1^n (x_i-\mu)^2}.$$
Then
$$\frac{p(x/\mu_1)}{p(x/\mu_0)} = \frac{e^{-\frac12\sum_1^n(x_i-\mu_1)^2}}{e^{-\frac12\sum_1^n(x_i-\mu_0)^2}} = e^{\frac12(\mu_1-\mu_0)\sum_1^n(2x_i-\mu_1-\mu_0)} = e^{n\bar{x}(\mu_1-\mu_0)-\frac{n}{2}(\mu_1^2-\mu_0^2)} \quad\left(\because\ \bar{x} = \frac1n\sum x_i\right).$$
The MP test is
$$\omega_0 = \{x: p(x/\mu_1)>k\,p(x/\mu_0)\}; \tag{3.13}$$
since $\mu_1>\mu_0$, the ratio is increasing in $\bar{x}$, so (3.13) is equivalent to
$$\omega_0 = \{x: \bar{x}>c\}. \tag{3.16}$$
By (3.16), the size condition is
$$P\{\bar{x}>c\,/\,\mu_0\} = \alpha \Leftrightarrow P\left\{\frac{\sqrt{n}(\bar{x}-\mu_0)}{1}>\frac{\sqrt{n}(c-\mu_0)}{1}\,\Big|\,\mu_0\right\} = \alpha$$
($X_1, \ldots, X_n$ are i.i.d. $N(\mu_0, 1)$ under $H$, so $\bar{X}\sim N(\mu_0, 1/n)$ under $H$)
$$\Leftrightarrow P\{Z>\sqrt{n}(c-\mu_0)\,|\,Z\sim N(0,1)\} = \alpha \Rightarrow \sqrt{n}(c-\mu_0) = \tau_\alpha\ \left[\int_{\tau_\alpha}^\infty N(z|0,1)\,dz = \alpha\right]$$
$$\Leftrightarrow c = \mu_0 + \frac{\tau_\alpha}{\sqrt{n}}. \tag{3.17}$$
Observations
1. The power function of the test given by (3.16) and (3.17) is
$$P_{\omega_0}(\mu) = P(X\in\omega_0\,|\,\mu) = P\left\{\bar{X}>\mu_0+\frac{\tau_\alpha}{\sqrt{n}}\,\Big|\,\mu\right\} = P\{Z>\tau_\alpha-\sqrt{n}(\mu-\mu_0)\,|\,Z\sim N(0,1)\}$$
$$= \int_{\tau_\alpha-\sqrt{n}(\mu-\mu_0)}^\infty N(z|0,1)\,dz = 1-\Phi\left(\tau_\alpha-\sqrt{n}(\mu-\mu_0)\right)$$
(under any $\mu$: $X_1, \ldots, X_n$ are i.i.d. $N(\mu, 1) \Rightarrow \sqrt{n}(\bar{x}-\mu)\sim N(0, 1)$).
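The power function $1-\Phi(\tau_\alpha-\sqrt{n}(\mu-\mu_0))$ is immediate to evaluate numerically. A minimal sketch (the values of $n$, $\mu_0$ and $\alpha$ are illustrative assumptions):

```python
import math
from scipy.stats import norm

def power(mu, mu0=0.0, n=25, alpha=0.05):
    """Power of the one-sided test reject H iff xbar > mu0 + tau_alpha/sqrt(n)."""
    tau = norm.ppf(1 - alpha)                         # tau_alpha
    return 1 - norm.cdf(tau - math.sqrt(n) * (mu - mu0))

assert abs(power(0.0) - 0.05) < 1e-12   # at mu = mu0 the power equals the size
```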
Similarly, for $K: \mu<\mu_0$ the MP region is
$$\omega_0' = \{x: \bar{x}<c'\}, \tag{3.20}$$
and the size condition $P_{\omega_0'}(\mu_0) = \alpha$, i.e. $P\{\bar{X}<c'\,|\,\mu_0\} = \alpha$, gives
$$c' = \mu_0 - \frac{\tau_\alpha}{\sqrt{n}} \tag{3.21}$$
(by the same argument as before while finding $c$). The test given by (3.20) and (3.21) is independent of the particular $\mu_1<\mu_0$; hence it is UMP size-$\alpha$ for $H: \mu = \mu_0$ against $K: \mu<\mu_0$.
5. (i) The UMP size-$\alpha$ test for $H: \mu = \mu_0$ against $K: \mu>\mu_0$ is $\omega_0 = \left\{x: \bar{x}>\mu_0+\frac{\tau_\alpha}{\sqrt{n}}\right\}$, while that against $K: \mu<\mu_0$ is $\omega_0'$. Clearly $\omega_0\neq\omega_0'$ ($\omega_0$ is biased for $H$ against $\mu<\mu_0$ and $\omega_0'$ is biased for $H$ against $\mu>\mu_0$). Hence there does not exist any test which is UMP for $H: \mu = \mu_0$ against $K: \mu\neq\mu_0$.
Solution
Here
$$p(x/\sigma) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{-\frac{1}{2\sigma^2}\sum_1^n x_i^2}.$$
Hence
$$\frac{p(x/\sigma_1)}{p(x/\sigma_0)} = \left(\frac{\sigma_0}{\sigma_1}\right)^n e^{\frac12\sum_1^n x_i^2\left(\frac{1}{\sigma_0^2}-\frac{1}{\sigma_1^2}\right)}, \tag{3.22}$$
and the MP region is $\omega_0 = \{x: p(x/\sigma_1)>k\,p(x/\sigma_0)\}$, where $k(>0)$ is such that $P_{\omega_0}(\sigma_0) = \alpha$. Now, from (3.22),
$$\frac{p(x/\sigma_1)}{p(x/\sigma_0)}>k \Leftrightarrow \sum_1^n x_i^2 > \frac{2\log_e k + 2n\log_e(\sigma_1/\sigma_0)}{\frac{1}{\sigma_0^2}-\frac{1}{\sigma_1^2}} = c\ \text{(say)} \quad [\text{As } \sigma_1>\sigma_0] \tag{3.25}$$
and
$$P\left\{\sum_1^n x_i^2>c\,\big/\,\sigma_0\right\} = \alpha. \tag{3.27}$$
Under $H$, $\sum_1^n x_i^2/\sigma_0^2\sim\chi^2_n$. Hence (3.27) gives
$$\frac{c}{\sigma_0^2} = \chi^2_{n,\alpha} \quad\left[\int_{\chi^2_{n,\alpha}}^\infty \frac{1}{\Gamma(n/2)2^{n/2}}\,e^{-y/2}y^{\frac{n}{2}-1}\,dy = \alpha\right].$$
Observations
1. Under $\sigma_0^2$, $\frac{1}{\sigma_0^2}\sum_1^n x_i^2 = Y_n\sim\chi^2_n$, so $E(Y_n) = n$ and $V(Y_n) = 2n$. For large $n$ the test may therefore be approximated by
$$\omega_0 = \left\{x: \sum_{i=1}^n x_i^2 > \sigma_0^2\left(\tau_\alpha\sqrt{2n}+n\right)\right\}.$$
2. The UMP size-$\alpha$ test for $H: \sigma^2 = \sigma_0^2$ against $K: \sigma^2>\sigma_0^2$ is
$$w_0 = \left\{x: \sum_1^n x_i^2 > \sigma_0^2\chi^2_{n,\alpha}\right\},$$
while the UMP size-$\alpha$ test against $K: \sigma^2<\sigma_0^2$ is
$$w_0' = \left\{x: \sum_1^n x_i^2 < \sigma_0^2\chi^2_{n,1-\alpha}\right\} \quad\left[\int_{\chi^2_{n,1-\alpha}}^\infty f(\chi^2_n)\,d\chi^2_n = 1-\alpha\right].$$
Clearly $w_0\neq w_0'$; hence there does not exist a UMP test for $H: \sigma^2 = \sigma_0^2$ against $K: \sigma^2\neq\sigma_0^2$.
The power function of the test $w_0$ is
$$P_{w_0}(\sigma^2) = P\left\{\sum_1^n x_i^2 > \sigma_0^2\chi^2_{n,\alpha}\,\Big/\,\sigma^2\right\} = \int_{\frac{\sigma_0^2}{\sigma^2}\chi^2_{n,\alpha}}^\infty f(\chi^2_n)\,d\chi^2_n.$$
Also $P_{w_0}(\sigma^2)\le P_{w_0}(\sigma_0^2) = \alpha\ \forall\sigma^2\le\sigma_0^2$.
Next observe that $\frac1n\sum_1^n x_i^2$ is a consistent estimator of $\sigma^2$: for fixed $\sigma^2$, as $n\to\infty$, $\frac1n\sum x_i^2\to\sigma^2$ in probability, while $\frac{\sigma_0^2\chi^2_{n,\alpha}}{n}\to\sigma_0^2$. Thus, if $\sigma^2>\sigma_0^2$, we get
$$\lim_{n\to\infty} P\left\{\sum x_i^2>\sigma_0^2\chi^2_{n,\alpha}\,\Big/\,\sigma^2\right\} = 1,$$
implying that the test $w_0$ is consistent against $K: \sigma^2>\sigma_0^2$. Similarly the test $w_0'$ is consistent against $K: \sigma^2<\sigma_0^2$.
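The power curve $P_{w_0}(\sigma^2)$ is a one-line computation with scipy's chi-square survival function; a sketch (sample size, $\sigma_0^2$ and $\alpha$ are illustrative assumptions):

```python
from scipy.stats import chi2

def power_var_test(sigma2, sigma2_0=1.0, n=20, alpha=0.05):
    """P( chi2_n > (sigma0^2/sigma^2) * chi2_{n,alpha} ) under a given sigma^2."""
    crit = chi2.ppf(1 - alpha, df=n)          # chi^2_{n,alpha}
    return chi2.sf(sigma2_0 / sigma2 * crit, df=n)

# Power rises towards 1 as sigma^2 grows past sigma0^2:
print([round(power_var_test(s), 3) for s in (1.0, 1.5, 2.0, 3.0)])
```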
Example 3.12 Find the MP size-$\alpha$ test for $H: X\sim\frac{1}{\sqrt{2\pi}}e^{-X^2/2}$ against $K: X\sim\frac12 e^{-|X|}$, the size condition being
$$P_{\omega_0}(H) = \alpha. \tag{3.31}$$
Now,
$$\frac{p(x/K)}{p(x/H)} = \sqrt{\frac{\pi}{2}}\,e^{\frac{x^2}{2}-|x|}>k \Leftrightarrow \log_e\sqrt{\frac{\pi}{2}}+\frac{x^2}{2}-|x|>\log_e k$$
$$\Leftrightarrow x^2-2|x|+\left(\log_e\frac{\pi}{2}-2\log_e k\right)>0 \Leftrightarrow x^2-2|x|+C>0. \tag{3.32}$$
For $x\ge0$, the roots of $g(x) = x^2-2x+C = 0$ are
$$x = \frac{2\pm\sqrt{4-4C}}{2} = 1\pm\sqrt{1-C},$$
so $x_1(C) = 1-\sqrt{1-C}$ and $x_2(C) = 1+\sqrt{1-C}$. Hence (3.34) becomes, using the symmetry about 0,
$$P_H\left\{0<x<1-\sqrt{1-C}\right\} + P_H\left\{x>1+\sqrt{1-C}\right\} = \frac{\alpha}{2}$$
$$\Leftrightarrow \Phi\left(1-\sqrt{1-C}\right)-\frac12+1-\Phi\left(1+\sqrt{1-C}\right) = \frac{\alpha}{2}$$
$$\Leftrightarrow \Phi\left(1+\sqrt{1-C}\right)-\Phi\left(1-\sqrt{1-C}\right) = 1-\frac{\alpha}{2}-\frac12 = \frac{1-\alpha}{2}. \tag{3.35}$$
Here the MP test reduces to
$$\frac{p(x/\theta_1)}{p(x/\theta_0)}>k \Leftrightarrow |x|>C, \tag{3.36}$$
with size condition
$$2\int_C^\infty\frac{\theta_0}{\pi(\theta_0^2+x^2)}\,dx = \alpha \Leftrightarrow \frac12-\frac1\pi\tan^{-1}\frac{C}{\theta_0} = \frac{\alpha}{2} \Leftrightarrow \frac1\pi\left(\frac{\pi}{2}-\tan^{-1}\frac{C}{\theta_0}\right) = \frac{\alpha}{2} \Leftrightarrow 1-\frac2\pi\tan^{-1}\frac{C}{\theta_0} = \alpha. \tag{3.38}$$
The test given by (3.36) and (3.38) is MP size-$\alpha$. As the test is independent of the particular $\theta_1>\theta_0$, it is UMP size-$\alpha$ for $H: \theta = \theta_0$ against $K: \theta>\theta_0$. The power function is given by
$$P_{\omega_0}(\theta) = P\{|X|>C/\theta\} = 1-\int_{-C}^{C}\frac{\theta}{\pi(\theta^2+x^2)}\,dx.$$
Here
$$\frac{p(x/\theta_1)}{p(x/\theta_0)}>k \Leftrightarrow \frac{1+x^2}{1+(x-1)^2}>k \Leftrightarrow 1+x^2>k(1+x^2-2x+1)$$
$$\Leftrightarrow x^2(1-k)+2kx+1-2k>0. \tag{3.39}$$
Several cases arise:
(a) $k = 1 \Leftrightarrow x>0$; hence the size of the test is $P(X>0/\theta = 0) = \frac12$.
(b) $0<k<1$: writing $g(x) = (1-k)x^2+2kx+(1-2k)$, we have $g'(x) = 2(1-k)x+2k = 0 \Rightarrow x = -\frac{k}{1-k}$, and $g''(x) = 2(1-k)>0$; this means that the curve $y = g(x)$ has a minimum at $x = -\frac{k}{1-k}$.
The function given by (3.41) and (3.42) is called the randomized test corresponding to the non-randomized test $\omega_0$. It states that, after observing $Y$ (i.e. $X$): reject $H$ if $Y>k$; accept $H$ if $Y<k$ (randomize on the boundary). Here
$$Y = \frac{p(x/\theta_1)}{p(x/\theta_0)} = \frac{\prod_{i=1}^n f(x_i/\theta_1)}{\prod_{i=1}^n f(x_i/\theta_0)} = \frac{\theta_1^{\sum x_i}(1-\theta_1)^{n-\sum x_i}}{\theta_0^{\sum x_i}(1-\theta_0)^{n-\sum x_i}} = \left(\frac{1-\theta_1}{1-\theta_0}\right)^n\left\{\frac{\theta_1(1-\theta_0)}{\theta_0(1-\theta_1)}\right\}^s,$$
where $s = \sum x_i$. Observe that $Y$ is a discrete r.v. under any $\theta$. Now,
$$\left(\frac{1-\theta_1}{1-\theta_0}\right)^n\left\{\frac{\theta_1(1-\theta_0)}{\theta_0(1-\theta_1)}\right\}^s \gtrless k$$
$$\Leftrightarrow n\log\left(\frac{1-\theta_1}{1-\theta_0}\right)+s\log\left\{\frac{\theta_1(1-\theta_0)}{\theta_0(1-\theta_1)}\right\} \gtrless k'\ (k' = \log_e k)$$
$$\Leftrightarrow s \gtrless \frac{k'-n\log\left(\frac{1-\theta_1}{1-\theta_0}\right)}{\log\left\{\frac{\theta_1(1-\theta_0)}{\theta_0(1-\theta_1)}\right\}} = C\ \text{(say)} \quad\left[\text{As } \theta_1>\theta_0 \Rightarrow \log\frac{\theta_1(1-\theta_0)}{\theta_0(1-\theta_1)}>0\right]$$
and $C$, $a$ are determined from
$$P_{\theta_0}\{s>C\}<\alpha<P_{\theta_0}\{s\ge C\}$$
$$\Rightarrow a = \frac{\alpha-\sum_{s=C+1}^n\binom{n}{s}\theta_0^s(1-\theta_0)^{n-s}}{\binom{n}{C}\theta_0^C(1-\theta_0)^{n-C}}. \tag{3.47}$$
Observation
1. For $\theta_1<\theta_0$, $\log\left\{\frac{\theta_1(1-\theta_0)}{\theta_0(1-\theta_1)}\right\}<0$. In that case (3.43) and (3.44) are equivalent to
$$\phi^*(x) = \begin{cases} 1 & \text{if } s<C \\ a & \text{if } s = C \\ 0 & \text{if } s>C \end{cases} \quad\text{with}\quad P_{\theta_0}\{s<C\}+aP_{\theta_0}\{s = C\} = \alpha.$$
We can get the UMP test for $H: \theta = \theta_0$ against $K: \theta<\theta_0$ by similar arguments. Obviously $\phi_0\neq\phi^*$; so there does not exist a single test which is UMP for $H: \theta = \theta_0$ against $K: \theta\neq\theta_0$.
2. By the De Moivre–Laplace limit theorem, for large $n$, $\frac{S-n\theta}{\sqrt{n\theta(1-\theta)}}$ is approximately $N(0, 1)$. Hence, from (3.45) and (3.46), we get
$$\frac{C-n\theta_0}{\sqrt{n\theta_0(1-\theta_0)}}\simeq\tau_\alpha \Rightarrow C\simeq n\theta_0+\tau_\alpha\sqrt{n\theta_0(1-\theta_0)}.$$
Then the approximately size-$\alpha$ test is: reject $H$ if $s>n\theta_0+\tau_\alpha\sqrt{n\theta_0(1-\theta_0)}$, accept $H$ otherwise.
The power function of $\phi_0$ is
$$P(\theta) = E_\theta\,\phi_0(X) = P_\theta\{S>c\}+aP_\theta\{S = c\} = \sum_{s=c+1}^n\binom{n}{s}\theta^s(1-\theta)^{n-s}+a\binom{n}{c}\theta^c(1-\theta)^{n-c}$$
[can be obtained using the Biometrika tables]
$$= (1-a)\sum_{s=c+1}^n\binom{n}{s}\theta^s(1-\theta)^{n-s}+a\sum_{s=c}^n\binom{n}{s}\theta^s(1-\theta)^{n-s} = (1-a)I_\theta(c+1, n-c)+aI_\theta(c, n-c+1)$$
[can be obtained using the incomplete beta function table]. Observe that, as $I_\theta(m, n)$ is an increasing function of $\theta$, the power function $P(\theta)$ increases with $\theta$.
Here $p(x/H) = 1$ on $0<x<1$, while
$$p(x/K) = 1 \text{ if } 1/2<x<3/2; \quad = 0 \text{ otherwise}.$$
As the ratio $p(x/K)/p(x/H)$ is discrete, the MP test for $H$ against $K$ is given by
$$\phi_0(x) = \begin{cases} 1 & \text{if } p(x/K)>k\,p(x/H) \\ a & \text{if } p(x/K) = k\,p(x/H) \\ 0 & \text{if } p(x/K)<k\,p(x/H). \end{cases} \tag{3.48}$$
Taking $k<1$:
$$0<x\le\tfrac12 \Rightarrow p(x/K) = 0,\ p(x/H) = 1 \Rightarrow p(x/K)<k\,p(x/H) \Rightarrow \phi_0(x) = 0;$$
$$\tfrac12<x<1 \Rightarrow p(x/K) = p(x/H) = 1 \Rightarrow p(x/K)>k\,p(x/H) \Rightarrow \phi_0(x) = 1;$$
$$1\le x<\tfrac32 \Rightarrow p(x/K) = 1,\ p(x/H) = 0 \Rightarrow p(x/K)>k\,p(x/H) \Rightarrow \phi_0(x) = 1.$$
So, for $k<1$, we get $E_H\phi_0(X) = 1\cdot P_H\left(\tfrac12<X<1\right)+1\cdot P_H(X\ge1) = \tfrac12$. Thus it is a trivial test of size 0.5.
Taking $k>1$:
$$0<x\le\tfrac12 \Rightarrow \phi_0(x) = 0; \qquad \tfrac12<x<1 \Rightarrow \phi_0(x) = 0; \qquad 1\le x<\tfrac32 \Rightarrow \phi_0(x) = 1;$$
then $E_H\phi_0(X) = 0$, and it is a trivial test of size 0.
Taking $k = 1$: on $0<x\le\tfrac12$, $\phi_0(x) = 0$ (we always accept $H$); on $1\le x<\tfrac32$, $\phi_0(x) = 1$ (we always reject $H$); and on $\tfrac12<x<1$ we randomize. Thus
$$\phi_0(x) = \begin{cases} 0 & \text{if } 0<x\le\tfrac12 \\ 2\alpha & \text{if } \tfrac12<x<1 \\ 1 & \text{if } 1\le x<\tfrac32 \end{cases}$$
is the MP size-$\alpha$ test.
Let $g_0, g_1, \ldots, g_m$ be integrable functions and let $\omega_0$ be a region such that
inside $\omega_0$: $g_0(x)>\sum_1^m k_ig_i(x)$; outside $\omega_0$: $g_0(x)\le\sum_1^m k_ig_i(x)$,
where $k_1, k_2, \ldots, k_m$ are so chosen that $\int_{\omega_0}g_i(x)\,dx = c_i$, $i = 1(1)m$. Then, for every region $\omega$ satisfying $\int_\omega g_i(x)\,dx = c_i$, $i = 1(1)m$, we have
$$\int_{\omega_0}g_0(x)\,dx \ge \int_\omega g_0(x)\,dx.$$
This is called the generalized Neyman–Pearson lemma.
Proof Note that $\omega_0-\omega = \omega_0-\omega\cap\omega_0$ lies inside $\omega_0$ and $\omega-\omega_0 = \omega-\omega\cap\omega_0$ lies outside $\omega_0$. Then
$$\int_{\omega_0}g_0\,dx-\int_\omega g_0\,dx = \int_{\omega_0-\omega}g_0\,dx-\int_{\omega-\omega_0}g_0\,dx. \quad(1)$$
Now
$$x\in\omega_0-\omega \Rightarrow g_0(x)>\sum_1^m k_ig_i(x) \Rightarrow \int_{\omega_0-\omega}g_0\,dx \ge \sum_{i=1}^m k_i\int_{\omega_0-\omega}g_i\,dx,$$
$$x\in\omega-\omega_0 \Rightarrow g_0(x)\le\sum_1^m k_ig_i(x) \Rightarrow \int_{\omega-\omega_0}g_0\,dx \le \sum_{i=1}^m k_i\int_{\omega-\omega_0}g_i\,dx.$$
Since $\int_{\omega_0}g_i = \int_\omega g_i = c_i$ implies $\int_{\omega_0-\omega}g_i = \int_{\omega-\omega_0}g_i$, substitution in (1) gives $\int_{\omega_0}g_0\,dx-\int_\omega g_0\,dx \ge 0$. □
$$f(x/\theta) = \frac1\pi\cdot\frac{1}{1+(x-\theta)^2} \quad (-\infty<\theta<\infty,\ -\infty<x<\infty).$$
Similarly, let $\omega_0$ be such that $P_{\omega_0}(\theta_0) = \alpha$ and $P'_{\omega_0}(\theta_0)$ is maximum, i.e. $P'_{\omega_0}(\theta_0)\ge P'_\omega(\theta_0)\ \forall\omega: P_\omega(\theta_0) = \alpha$. Then, comparing (3.50) and (3.51), we get an $\varepsilon>0$ such that $P_{\omega_0}(\theta)\ge P_\omega(\theta)\ \forall\theta: \theta_0<\theta<\theta_0+\varepsilon$. Such an $\omega_0$ is called the locally most powerful size-$\alpha$ test for $H: \theta = \theta_0$ against $\theta>\theta_0$.
Now our problem is to choose $\omega_0$ such that
$$P_{\omega_0}(\theta_0) = \alpha \Leftrightarrow \int_{\omega_0}p(x/\theta_0)\,dx = \alpha \tag{3.52}$$
and
$$P'_{\omega_0}(\theta_0)\ge P'_\omega(\theta_0) \Leftrightarrow \int_{\omega_0}p'(x/\theta_0)\,dx \ge \int_\omega p'(x/\theta_0)\,dx,$$
where $\omega$ satisfies $P_\omega(\theta_0) = \alpha \Leftrightarrow \int_\omega p(x/\theta_0)\,dx = \alpha$. By the generalized N–P lemma, the solution is $\omega_0 = \{x: p'(x/\theta_0)>k\,p(x/\theta_0)\}$. Thus the test given by (3.53) and (3.54) is locally most powerful size-$\alpha$ for $H: \theta = \theta_0$ against $\theta>\theta_0$.
Note If UMP test exists for H : h ¼ h0 against h [ h0 ) LMP test corre-
sponding to the said problem must be identical to the UMP test. But the converse
may not be true.
Example 3.17 $X_1, X_2, \ldots, X_n$ are i.i.d. $N(\theta, 1)$; $H: \theta = \theta_0$ against $\theta>\theta_0$. The LMP test is provided by
$$p'(x/\theta_0)>k\,p(x/\theta_0) \Leftrightarrow \frac{p'(x/\theta_0)}{p(x/\theta_0)}>k \Leftrightarrow \frac{d}{d\theta_0}\left[\log_e p(x/\theta_0)\right]>k. \tag{3.57}$$
Here $p(x/\theta) = (2\pi)^{-n/2}e^{-\frac12\sum(x_i-\theta)^2}$, so
$$\log p(x/\theta) = \text{const.}-\frac12\sum_1^n(x_i-\theta)^2 \Rightarrow \frac{d\log p(x/\theta_0)}{d\theta_0} = \sum_1^n(x_i-\theta_0);$$
hence, by (3.57), the LMP test is $\omega_0 = \{x: \bar{x}>k'\}$.
In general,
$$p'(x/\theta_0)>k\,p(x/\theta_0) \Leftrightarrow \frac{p'(x/\theta_0)}{p(x/\theta_0)}>k \Leftrightarrow \frac{d\log p(x/\theta_0)}{d\theta_0}>k \quad [p(x/\theta)>0]$$
$$\Leftrightarrow \sum_1^n\frac{f'(x_i/\theta_0)}{f(x_i/\theta_0)}>k \Leftrightarrow \sum_1^n y_i>k, \quad\text{where } y_i = \frac{f'(x_i/\theta_0)}{f(x_i/\theta_0)}.$$
Now, under $H$, the $y_i$'s are i.i.d. with
$$E_{\theta_0}\{y_i\} = \int\frac{f'(x/\theta_0)}{f(x/\theta_0)}f(x/\theta_0)\,dx = \int f'(x/\theta_0)\,dx = \frac{d}{d\theta_0}\int f(x/\theta_0)\,dx = \frac{d}{d\theta_0}(1) = 0,$$
$$V_{\theta_0}\{y_i\} = \int\left\{\frac{f'(x/\theta_0)}{f(x/\theta_0)}\right\}^2 f(x/\theta_0)\,dx = \int\left\{\frac{\partial\log f(x/\theta_0)}{\partial\theta_0}\right\}^2 f(x/\theta_0)\,dx = I(\theta_0)\ [\text{Fisher's information}].$$
Hence, by the central limit theorem, for large $n$, $\frac{\sum_1^n y_i}{\sqrt{nI(\theta_0)}}\sim N(0, 1)$ under $H$. So, for large $n$, the above test can be approximated by
$$\omega = \left\{x: \sum_{i=1}^n\frac{f'(x_i/\theta_0)}{f(x_i/\theta_0)}>\tau_\alpha\sqrt{nI(\theta_0)}\right\}.$$
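This large-sample LMP (score) test is straightforward to code. A sketch for the $N(\theta, 1)$ case, where $f'(x|\theta)/f(x|\theta) = x-\theta$ and $I(\theta) = 1$ per observation (the data and $\theta_0$ are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

def score_test(x, theta0, alpha=0.05):
    """Reject H: theta = theta0 if sum of scores > tau_alpha * sqrt(n I(theta0))."""
    n = len(x)
    u = np.sum(x - theta0)                  # sum of the scores y_i
    return u > norm.ppf(1 - alpha) * np.sqrt(n * 1.0)

x = np.random.default_rng(1).normal(loc=0.4, scale=1.0, size=30)
print(score_test(x, theta0=0.0))
```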
For the two-sided problem, expand the power function:
$$P_w(\theta) = P_w(\theta_0)+(\theta-\theta_0)P'_w(\theta_0)+\frac{(\theta-\theta_0)^2}{2!}P''_w(\theta^*), \quad |\theta^*-\theta_0|<|\theta-\theta_0|.$$
Construction We seek $w_0$ satisfying $\int_{w_0}p(x/\theta_0)\,dx = \alpha$ (i.e. $c_1 = \alpha$) and $\int_{w_0}p'(x/\theta_0)\,dx = 0$ (i.e. $c_2 = 0$), and maximizing $P''_{w_0}(\theta_0)$. By the generalized N–P lemma,
$$w_0 = \left\{x: p''(x/\theta_0)>k_1\,p(x/\theta_0)+k_2\,p'(x/\theta_0)\right\},$$
where $k_1$ and $k_2$ are such that $\int_{w_0}g_1(x)\,dx = \alpha$ and $\int_{w_0}g_2(x)\,dx = 0$; then $\int_{w_0}g_0(x)\,dx\ge\int_w g_0(x)\,dx$ provided $w$ satisfies (i) and (ii).
Example 3.18 $X_1, X_2, \ldots, X_n$ are i.i.d. $N(\mu, 1)$. To find the LMPU test for $H: \mu = \mu_0$ against $K: \mu\neq\mu_0$.
Answer Here
$$p(x/\theta) = \left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac12\sum_1^n(x_i-\mu)^2},$$
$$p'(x/\theta) = n(\bar{x}-\mu)\,p(x/\theta), \qquad p''(x/\theta) = \left[n(\bar{x}-\mu)\right]^2 p(x/\theta)-n\,p(x/\theta),$$
and the side conditions are $\int_{w_0}p(x/\mu_0)\,dx = \alpha$ (3.58) and
$$\int_{w_0}p'(x/\mu_0)\,dx = 0. \tag{3.59}$$
Hence
$$w_0 = \left\{x: \left[n(\bar{x}-\mu_0)\right]^2-n>k_1+k_2\,n(\bar{x}-\mu_0)\right\} = \left\{x: \left(\sqrt{n}(\bar{x}-\mu_0)\right)^2>k_1'+k_2'\sqrt{n}(\bar{x}-\mu_0)\right\} = \left\{x: y^2>k_1'+k_2'y\right\},$$
where $y = \sqrt{n}(\bar{x}-\mu_0)\sim N(0, 1)$ under $H$. Condition (3.59) becomes
$$\int_{y^2>k_1'+k_2'y} y\,N(y/0, 1)\,dy = 0. \tag{3.60}$$
Since $N(y/0, 1)$ is symmetric about 0, the LHS of (3.60) vanishes when the rejection region is symmetric; so here we can safely take $k_2' = 0$ without affecting the size condition. Then our test reduces to $w_0: \{x: y^2>k_1'\}\equiv\{x: |y|>c\}$, and (3.58) is equivalent to
$$\int_{|y|>c} N(y/0, 1)\,dy = \alpha \Rightarrow c = \tau_{\alpha/2}.$$
Then we obtain the LMPU test for $H: \mu = \mu_0$ against $\mu\neq\mu_0$.
A test which is locally most powerful and locally unbiased is called a Type A test, and the corresponding critical region $w_0$ is said to be a Type-A critical region.
If $\frac{p(x/\theta_1)}{p(x/\theta_0)}$ is increasing in $T(x)$ for $\theta_1>\theta_0$, then conditions (i) and (ii) below are satisfied. In that case, we try to choose a test $w_0$ for which
(i) $P_{w_0}(\theta_0) = \alpha$;
(ii) $P_{w_0}(\theta)\ge\alpha\ \forall\theta\neq\theta_0$;
(iii) $P_{w_0}(\theta)\ge P_w(\theta)\ \forall\theta\neq\theta_0$, $\forall w$ satisfying (i) and (ii).
Thus, if a test $w_0$ satisfies (i), (ii) and (iii), then under (3.61) $w_0$ also satisfies (i) and (iii). A test satisfying (i), (iii) and (3.62) is called a type-A₁ test.
For the exponential family, if a type-A₁ test exists, then it must be unbiased; but this is not true in general.
Construction Our problem is to get $w_0$ such that
(i) $\int_{w_0}p(x/\theta_0)\,dx = \alpha$;
(ii) $\int_{w_0}p'(x/\theta_0)\,dx = 0$;
(iii) $\int_{w_0}p(x/\theta)\,dx\ge\int_w p(x/\theta)\,dx$ $\forall w$ satisfying (i) and (ii) and $\forall\theta\neq\theta_0$.
Here
$$p(x/\theta) = (2\pi)^{-n/2}e^{-\frac12\sum_1^n(x_i-\mu)^2}, \qquad p'(x/\theta) = \sum_{i=1}^n(x_i-\mu)\,p(x/\theta),$$
$$\frac{p(x/\theta)}{p(x/\theta_0)} = \frac{e^{-\frac12\sum(x_i-\mu)^2}}{e^{-\frac12\sum(x_i-\mu_0)^2}} = e^{\frac{n}{2}(\mu-\mu_0)\{2\bar{x}-(\mu_0+\mu)\}}.$$
Therefore
$$w_0 = \left\{x: e^{(\mu-\mu_0)t}>k_1'+k_2't\right\} = \left\{x: e^{\delta t}>k_1'+k_2't\right\}, \quad\text{where } t = \sqrt{n}(\bar{x}-\mu_0),\ \delta = \sqrt{n}(\mu-\mu_0),$$
with side conditions
$$\int_{w_0}N(t/0, 1)\,dt = \alpha \tag{3.63}$$
$$\int_{w_0}t\,N(t/0, 1)\,dt = 0. \tag{3.64}$$
Writing $g(t) = e^{\delta t}-k_1'-k_2't$, we have $g'(t) = \delta e^{\delta t}-k_2'$ and $g''(t) = \delta^2e^{\delta t}>0\ \forall t$, so $y = g(t)$ has a single (global) minimum. Taking $\alpha<0.5$, because of (3.63) and since the distribution of $t$ is symmetric about 0 under $H_0$, the convex curve $y = g(t)$ crosses zero at two points $c_1<c_2$, with $g(t)>0$ for $t<c_1$ and $t>c_2$, and $g(t)\le0$ otherwise. [Figure: the convex curve $y = g(t)$ with zeros at $c_1$ and $c_2$.]
Hence $w_0$ is equivalent to $w_0 = \{x: t<c_1 \text{ or } t>c_2\}$, with
$$(3.63) \Leftrightarrow \int_{t<c_1,\ t>c_2}N(t/0, 1)\,dt = \alpha \quad\text{and}\quad (3.64) \Leftrightarrow \int_{t<c_1,\ t>c_2}t\,N(t/0, 1)\,dt = 0. \tag{3.65}$$
By symmetry we may take
$$w_0 = \{x: t<-c \text{ or } t>c\}; \tag{3.66}$$
here (3.65) is automatically satisfied. Hence the test given by (3.66) and (3.67) is type-A₁ (which is UMPU).
Example 3.20 For $H: \sigma^2 = \sigma_0^2$ in the $N(0, \sigma^2)$ family,
$$\frac{p(x/\theta)}{p(x/\theta_0)} = \left(\frac{\sigma_0}{\sigma}\right)^n e^{\frac12\left(\frac{1}{\sigma_0^2}-\frac{1}{\sigma^2}\right)\sum_1^n x_i^2},$$
so, with $t = \sum_1^n x_i^2/\sigma_0^2$,
$$w_0 = \left\{x: \left(\frac{\sigma_0}{\sigma}\right)^n e^{\frac{\delta}{2}t}>k_1'+k_2't\right\}.$$
Now, as before, the curve $y = g(t) = \left(\frac{\sigma_0}{\sigma}\right)^n e^{\frac{\delta}{2}t}-k_1'-k_2't$ has a single minimum. Here $P\{T>0/\theta\} = 1$, so the relevant part of the curve lies on $t>0$ and crosses zero at two points $d_1<d_2$. [Figure: the convex curve $y = g(t)$ with zeros at $d_1$ and $d_2$.]
The side conditions are
$$\int_{w_0}p(x/\theta_0)\,dx = \alpha \Leftrightarrow \int_{t<d_1 \text{ or } t>d_2}f_{\chi^2_n}(t)\,dt = \alpha \tag{3.68}$$
$$\int_{w_0}p'(x/\theta_0)\,dx = 0 \Leftrightarrow \int_{t<d_1 \text{ or } t>d_2}(t-n)f_{\chi^2_n}(t)\,dt = 0. \tag{3.69}$$
Now (3.69) $\Leftrightarrow \int_{t<d_1 \text{ or } t>d_2}t\,f_{\chi^2_n}(t)\,dt = n\int_{t<d_1 \text{ or } t>d_2}f_{\chi^2_n}(t)\,dt = n\alpha$, by (3.68),
$$\Leftrightarrow \int_{d_1}^{d_2}t\,f_{\chi^2_n}(t)\,dt = (1-\alpha)n \Leftrightarrow \int_{d_1}^{d_2}f_{\chi^2_{n+2}}(t)\,dt = 1-\alpha. \tag{3.70}$$
Example 3.21 $X_1, X_2, \ldots, X_n$ are i.i.d. with $p(x/\theta) = \theta e^{-\theta x}$. Find the Type-A and Type-A₁ tests for $H: \theta = \theta_0$ against $\theta\neq\theta_0$.
Answer Proceed as in Examples 3.19 and 3.20 and hence get
$$\omega_0 = \left\{x: \sum_1^n X_i<c_1 \text{ or } \sum_1^n X_i>c_2\right\}.$$
4.1 Introduction
In the previous chapter we have seen that UMP or UMP-unbiased tests exist only
for some special families of distributions, while they do not exist for other families.
Further, computations of UMP-unbiased tests in K-parameter family of distribution
are usually complex. Neyman and Pearson (1928) suggested a simple method for
testing a general testing problem.
Consider X pðxjhÞ, where h is a real parameter or a vector of parameters,
h 2 H:
A general testing problem is
H : h 2 H0 Against K : h 2 H1 :
Here, $H$ and $K$ may be treated as subsets of $\Theta$; these are such that $\Theta_H\cap\Theta_K = \emptyset$ and $\Theta_H\cup\Theta_K\subseteq\Theta$. Given that $X = x$, $p(x|\theta)$ is a function of $\theta$ and is called the likelihood function. The likelihood ratio test for $H$ against $K$ is provided by the statistic
$$L(x) = \frac{\sup_{\theta\in\Theta_H}p(x|\theta)}{\sup_{\theta\in\Theta_H\cup\Theta_K}p(x|\theta)},$$
which is called the likelihood ratio criterion for testing H against K. It is known that
(i) $p(x|\theta)\ge0\ \forall\theta$;
(ii) $\sup_{\theta\in H}p(x|\theta)\le\sup_{\theta\in H\cup K}p(x|\theta)$,
so that $0\le L(x)\le1$. The denominator measures the best explanation of $X$ as coming from some population covered under $H\cup K$; the numerator measures the best explanation attainable under $H$. Values of the numerator close to the overall best possible explanation result in larger values of $L(x)$, leading to acceptance of $H$; that is, $L(x)$ tends to be larger under $H$ than under $K$. Conversely, smaller values of $L(x)$ lead to the rejection of $H$. Hence, our test procedure is:
and accept H otherwise,
where C is such that PfLð xÞ\CjH g ¼ a 2 ð0; 1Þ:
If the distribution of $L(x)$ is continuous, then the size $\alpha$ is exactly attained and no randomization on the boundary is needed. If the distribution is discrete, the size may not attain $\alpha$, and one may require randomization; in that case, we choose $C$ from the relation
Example 4.1 Let X be a binomial bðn; hÞ random variable. Find the size-a likeli-
hood ratio test for testing H : h h0 against K : h [ h0
The MLE of $\theta$ under $H$ is
$$\hat\theta_H = \frac{x}{n}\ \text{ if } \frac{x}{n}\le\theta_0; \qquad = \theta_0\ \text{ if } \frac{x}{n}>\theta_0.$$
Thus we have
$$\sup_{H\cup K}\binom{n}{x}\theta^x(1-\theta)^{n-x} = \binom{n}{x}\left(\frac{x}{n}\right)^x\left(1-\frac{x}{n}\right)^{n-x},$$
$$\sup_{\theta\le\theta_0}\binom{n}{x}\theta^x(1-\theta)^{n-x} = \begin{cases} \binom{n}{x}\left(\frac{x}{n}\right)^x\left(1-\frac{x}{n}\right)^{n-x} & \text{if } \frac{x}{n}\le\theta_0 \\ \binom{n}{x}\theta_0^x(1-\theta_0)^{n-x} & \text{if } \frac{x}{n}>\theta_0. \end{cases}$$
So,
$$L(x) = \begin{cases} 1 & \text{if } \frac{x}{n}\le\theta_0 \\ \dfrac{\theta_0^x(1-\theta_0)^{n-x}}{\left(\frac{x}{n}\right)^x\left(1-\frac{x}{n}\right)^{n-x}} & \text{if } \frac{x}{n}>\theta_0. \end{cases}$$
Solution
(a) Here $H_0 = \{\mu_0\}$, $\Theta = \{\mu: -\infty<\mu<\infty\}$ ($\sigma$ known),
$$p(x|\theta) = p(x|\mu) = (2\pi)^{-n/2}\sigma^{-n}e^{-\frac{1}{2\sigma^2}\sum_{i=1}^n(x_i-\mu)^2}.$$
The LR test rejects for large values of $\frac{\sqrt{n}|\bar{x}-\mu_0|}{\sigma}$, and the size condition $E_{\mu_0}[\phi(x)] = \alpha$, i.e.
$$P\left\{\frac{\sqrt{n}|\bar{x}-\mu_0|}{\sigma}>C_2\,\Big|\,\mu_0\right\} = \alpha,$$
determines $C_2$. Now, under $H: \mu = \mu_0$, the statistic $\frac{\sqrt{n}(\bar{x}-\mu_0)}{\sigma}$ follows the $N(0, 1)$ distribution. Since the distribution is symmetric about 0, $C_2$ must be the upper $\alpha/2$-point of the distribution. Finally, the test is given as
$$\phi(x) = \begin{cases} 1 & \text{if } \frac{\sqrt{n}|\bar{x}-\mu_0|}{\sigma}>\tau_{\alpha/2} \\ 0 & \text{otherwise.} \end{cases}$$
(b) Here $H_0 = \{(\mu, \sigma^2): \mu = \mu_0, \sigma^2>0\}$, $\Theta = \{(\mu, \sigma^2): -\infty<\mu<\infty, \sigma^2>0\}$. In this case,
$$\sup_{H_0}p(x|\mu, \sigma^2) = \sup_{H_0}\left(\frac{1}{2\pi\sigma^2}\right)^{n/2}e^{-\frac{1}{2\sigma^2}\sum_{i=1}^n(x_i-\mu_0)^2} = \left(\frac{1}{2\pi s_0^2}\right)^{n/2}e^{-\frac{n}{2}},$$
where $s_0^2 = \frac1n\sum_{i=1}^n(x_i-\mu_0)^2$. Further,
$$\sup_\Theta p(x|\mu, \sigma^2) = \left(\frac{1}{2\pi\frac{(n-1)}{n}s^2}\right)^{n/2}e^{-\frac{n}{2}},$$
the MLEs of $\mu$ and $\sigma^2$ being $\bar{x}$ and $\frac1n\sum_{i=1}^n(x_i-\bar{X})^2 = \frac{(n-1)}{n}s^2$, where $s^2 = \frac{1}{n-1}\sum_{i=1}^n(x_i-\bar{X})^2$. Using the identity $\sum(x_i-\mu_0)^2 = \sum(x_i-\bar{x})^2+n(\bar{x}-\mu_0)^2$, the rejection rule $L(x)<C$ becomes
$$\frac{n(\bar{x}-\mu_0)^2}{s^2}>C_1, \quad\text{or}\quad \frac{\sqrt{n}|\bar{x}-\mu_0|}{s}>C_2.$$
Under $H$, $\frac{\sqrt{n}(\bar{x}-\mu_0)}{s}\sim t_{n-1}$, so the size-$\alpha$ LR test is
$$\phi(x, s) = \begin{cases} 1 & \text{if } \frac{\sqrt{n}|\bar{x}-\mu_0|}{s}>t_{\frac{\alpha}{2}, n-1} \\ 0 & \text{otherwise.} \end{cases}$$
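This is exactly the one-sample $t$-test, so the LR statistic can be checked against a library routine. A sketch (the data and $\mu_0$ are illustrative assumptions):

```python
import numpy as np
from scipy import stats

x = np.array([20.1, 19.4, 21.2, 18.7, 20.8, 19.9, 21.5, 20.3])
mu0 = 20.0

t = np.sqrt(len(x)) * (x.mean() - mu0) / x.std(ddof=1)  # LR-equivalent statistic
t_lib, p = stats.ttest_1samp(x, popmean=mu0)            # same t, two-sided p-value
assert np.isclose(t, t_lib)
```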
Example 4.3 $X_1, X_2, \ldots, X_n$ are i.i.d. $N(\mu, \sigma^2)$. Derive the LR test for $H: \mu\le\mu_0$ against $K: \mu>\mu_0$.
Answer $\theta = (\mu, \sigma^2)$, $\Theta = \{\theta: -\infty<\mu<\infty, \sigma^2>0\}$. Here
$$p(x/\theta) = p(x|\mu, \sigma^2) = (2\pi)^{-\frac{n}{2}}(\sigma^2)^{-\frac{n}{2}}e^{-\frac{1}{2\sigma^2}\sum_1^n(x_i-\mu)^2}.$$
The unrestricted MLEs are $\hat\mu = \bar{x}$, $\hat\sigma^2 = \frac1n\sum_1^n(x_i-\bar{x})^2 = \frac{(n-1)}{n}s^2$; under $H$,
$$\hat\mu_H = \bar{x},\ \hat\sigma_H^2 = \frac{n-1}{n}s^2\ \text{ if } \bar{x}\le\mu_0; \qquad \hat\mu_H = \mu_0,\ \hat\sigma_H^2 = s_0^2 = \frac1n\sum_1^n(x_i-\mu_0)^2\ \text{ if } \bar{x}>\mu_0.$$
Thus $p(x|\hat\mu, \hat\sigma^2) = (2\pi)^{-n/2}\left(\frac{n-1}{n}s^2\right)^{-n/2}e^{-n/2}$ and
$$p(x|\hat\mu_H, \hat\sigma_H^2) = (2\pi)^{-n/2}\left(\frac{n-1}{n}s^2\right)^{-n/2}e^{-n/2}\ \text{ if } \bar{x}\le\mu_0; \qquad = (2\pi)^{-n/2}(s_0^2)^{-n/2}e^{-n/2}\ \text{ if } \bar{x}>\mu_0.$$
So
$$L(x) = 1\ \text{ if } \bar{x}\le\mu_0; \qquad = \left(\frac{\frac{n-1}{n}s^2}{s_0^2}\right)^{n/2}\ \text{ if } \bar{x}>\mu_0,$$
and the rejection rule $L(x)<C$ is equivalent to
$$P\left\{\frac{\sqrt{n}(\bar{x}-\mu_0)}{\sqrt{\frac{1}{n-1}\sum_1^n(x_i-\bar{x})^2}}>C''\,\Big|\,\mu = \mu_0\right\} = \alpha \Rightarrow C'' = t_{\alpha; n-1}.$$
Hence, reject $H$ iff $\frac{\sqrt{n}(\bar{x}-\mu_0)}{s}>t_{\alpha; n-1}$.
⇒ The test can be carried out by using Student's $t$-statistic.
Example 4.4 $X_1, X_2, \ldots, X_n$ are i.i.d. $N(\mu, \sigma^2)$, $-\infty<\mu<\infty$, $\sigma^2>0$. Find the LR test for $H: \sigma^2 = \sigma_0^2$ ($\mu$ unknown).
$\hat\mu_H$ = MLE of $\mu$ under $H$ = $\bar{x}$. Hence we get
$$L(x) = \begin{cases} \dfrac{(\sigma_0^2)^{-n/2}e^{-\frac{(n-1)s^2}{2\sigma_0^2}}}{\left(\frac{n-1}{n}s^2\right)^{-n/2}e^{-\frac{n}{2}}} & \text{if } \frac{n-1}{n}s^2\ge\sigma_0^2 \\ 1 & \text{if } \frac{n-1}{n}s^2<\sigma_0^2. \end{cases}$$
The rejection region is
$$L(x)<C, \tag{4.3}$$
which, writing $u = \frac{(n-1)s^2}{\sigma_0^2}$, is equivalent to
$$u^{n/2}e^{-u/2}<C' \quad\text{if } u\ge n. \tag{4.5}$$
For $g(u) = u^{n/2}e^{-u/2}$,
$$g'(u) = \frac{n}{2}u^{\frac{n}{2}-1}e^{-\frac{u}{2}}-\frac12u^{n/2}e^{-\frac{u}{2}} = 0 \Rightarrow n-u = 0 \Leftrightarrow u = n.$$
The curve $y = g(u)$ has a maximum at $u = n$. [Figure: the unimodal curve $y = g(u)$ cut by the line $g(u) = C'$ at $u_1$ and $u_0$.] From the graph, we observe that the line $y = C'$ cuts the curve $y = g(u)$ at two points $u_1$ and $u_0$ such that $0<u_1<n<u_0$. Hence the test is equivalent to: reject $H$ iff $u<u_1$ or $u>u_0$, where
$$P_H\left\{\chi^2_{n-1}<u_1\right\}+P_H\left\{\chi^2_{n-1}>u_0\right\} = \alpha.$$
Although the $\chi^2$ distribution is not symmetric, for simplicity equal error probabilities $\alpha/2$ are attached to both the left- and right-sided critical regions. Thus $u_1 = \chi^2_{n-1, 1-\alpha/2}$ and $u_0 = \chi^2_{n-1, \alpha/2}$.
Example 4.5 Let $X_1, X_2, \ldots, X_{n_1}$ and $Y_1, Y_2, \ldots, Y_{n_2}$ be two independent samples drawn from $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively. Find the likelihood ratio test of
(a) $H: \sigma_1^2 = \sigma_2^2$ against $K: \sigma_1^2>\sigma_2^2$
(b) $H: \sigma_1^2 = \sigma_2^2$ against $K: \sigma_1^2\neq\sigma_2^2$, when $\mu_1$ and $\mu_2$ are unknown.
(a) Here $\theta = (\mu_1, \mu_2, \sigma_1^2, \sigma_2^2)$,
$$H_0 = \{\theta: -\infty<\mu_1, \mu_2<\infty,\ \sigma_1^2 = \sigma_2^2 = \sigma^2>0\}, \qquad \Theta = \{\theta: -\infty<\mu_i<\infty,\ \sigma_i^2>0,\ i = 1, 2\},$$
$$p(x, y|\theta) = (2\pi)^{-\frac{n_1+n_2}{2}}\sigma_1^{-n_1}\sigma_2^{-n_2}\,e^{-\frac{1}{2\sigma_1^2}\sum_{i=1}^{n_1}(x_i-\mu_1)^2-\frac{1}{2\sigma_2^2}\sum_{i=1}^{n_2}(y_i-\mu_2)^2},$$
$$L(x, y) = \frac{\sup_{\theta\in H_0}p(x, y|\theta)}{\sup_{\theta\in\Theta}p(x, y|\theta)} = \frac{p(x, y|\hat\theta_H)}{p(x, y|\hat\theta)},$$
where $\hat\theta_H$ is the MLE of $\theta$ under $H$ and $\hat\theta$ is the MLE of $\theta$ under $H\cup K$. With $\hat\sigma_1^2 = \frac{1}{n_1}\sum(x_i-\bar{x})^2$, $\hat\sigma_2^2 = \frac{1}{n_2}\sum(y_i-\bar{y})^2$ and $\hat\sigma_H^2 = \frac{n_1\hat\sigma_1^2+n_2\hat\sigma_2^2}{n_1+n_2}$,
$$p(x, y|\hat\theta_H) = (2\pi)^{-\frac{n_1+n_2}{2}}\left(\hat\sigma_H^2\right)^{-\frac{n_1+n_2}{2}}e^{-\frac{n_1+n_2}{2}},$$
and, for the one-sided problem (a),
$$L(x, y) = \frac{\left(\hat\sigma_1^2\right)^{n_1/2}\left(\hat\sigma_2^2\right)^{n_2/2}}{\left(\hat\sigma_H^2\right)^{\frac{n_1+n_2}{2}}}\ \text{ if } \frac{\hat\sigma_1^2}{\hat\sigma_2^2}\ge1; \qquad = 1\ \text{ if } \frac{\hat\sigma_1^2}{\hat\sigma_2^2}<1.$$
Equivalently,
$$L(x, y) = \frac{\left\{\sum(x_i-\bar{x})^2/n_1\right\}^{n_1/2}\left\{\sum(y_i-\bar{y})^2/n_2\right\}^{n_2/2}}{\left\{\dfrac{\sum(x_i-\bar{x})^2+\sum(y_i-\bar{y})^2}{n_1+n_2}\right\}^{(n_1+n_2)/2}} \quad\text{if } \frac{\sum(x_i-\bar{x})^2/n_1}{\sum(y_i-\bar{y})^2/n_2}\ge1,$$
and $L(x, y) = 1$ otherwise. Now, under the null hypothesis $H: \sigma_1^2 = \sigma_2^2 = \sigma^2$, consider the statistic
$$F = \frac{\sum(x_i-\bar{x})^2/(n_1-1)}{\sum(y_i-\bar{y})^2/(n_2-1)} = \frac{s_1^2}{s_2^2}\sim F_{n_1-1, n_2-1}.$$
Writing
$$g(F) = \frac{\left(\frac{n_1-1}{n_2-1}F\right)^{n_1/2}}{\left(1+\frac{n_1-1}{n_2-1}F\right)^{(n_1+n_2)/2}},$$
we see that $L(x, y)$ is a decreasing function of $F$ on the relevant region, so the rejection rule $L<C$ reduces to $F>d_0$, where
$$P_{\sigma_1^2 = \sigma_2^2}\{F>d_0\} = \alpha, \quad\text{i.e. } d_0 = F_{\alpha; n_1-1, n_2-1}.$$
(b) Similarly, for testing $H: \sigma_1^2 = \sigma_2^2$ against $K: \sigma_1^2\neq\sigma_2^2$, the LR test is equivalent to
$$\frac{s_1^2}{s_2^2}<d_1 \quad\text{or}\quad \frac{s_1^2}{s_2^2}>d_0, \qquad P_H\{F<d_1\} = P_H\{F>d_0\} = \alpha/2.$$
This gives $d_1 = F_{n_1-1, n_2-1; 1-\frac{\alpha}{2}}$ and $d_0 = F_{n_1-1, n_2-1; \alpha/2}$. The LR test is therefore given as
$$\phi(x, y) = \begin{cases} 1 & \text{if } \frac{s_1^2}{s_2^2}<F_{n_1-1, n_2-1; 1-\alpha/2} \text{ or } \frac{s_1^2}{s_2^2}>F_{n_1-1, n_2-1; \alpha/2} \\ 0 & \text{otherwise.} \end{cases}$$
0 otherwise
Solution
(a) Here $\theta = (\mu_1, \mu_2)$, with $\sigma_1^2$ and $\sigma_2^2$ known, and
$$p(x, y|\theta) = (2\pi)^{-\frac{n_1+n_2}{2}}\sigma_1^{-n_1}\sigma_2^{-n_2}\,e^{-\frac{1}{2\sigma_1^2}\sum_{i=1}^{n_1}(x_i-\mu_1)^2-\frac{1}{2\sigma_2^2}\sum_{i=1}^{n_2}(y_i-\mu_2)^2},$$
$$\sup_\Theta p(x, y|\theta) = (2\pi)^{-\frac{n_1+n_2}{2}}\sigma_1^{-n_1}\sigma_2^{-n_2}\,e^{-\frac{1}{2\sigma_1^2}\sum(x_i-\bar{x})^2-\frac{1}{2\sigma_2^2}\sum(y_i-\bar{y})^2}.$$
Under $H_0$ ($\mu_1 = \mu_2 = \mu$),
$$\log p(x, y|\mu) = k-\frac{1}{2\sigma_1^2}\sum(x_i-\mu)^2-\frac{1}{2\sigma_2^2}\sum(y_i-\mu)^2,$$
$$\frac{d}{d\mu}\log p(x, y|\mu) = 0 \Rightarrow \frac{1}{\sigma_1^2}\sum_1^{n_1}(x_i-\mu)+\frac{1}{\sigma_2^2}\sum_1^{n_2}(y_i-\mu) = 0$$
$$\Rightarrow \frac{n_1\bar{x}}{\sigma_1^2}+\frac{n_2\bar{y}}{\sigma_2^2} = \left(\frac{n_1}{\sigma_1^2}+\frac{n_2}{\sigma_2^2}\right)\mu \Rightarrow \hat\mu_H = \frac{\frac{n_1\bar{x}}{\sigma_1^2}+\frac{n_2\bar{y}}{\sigma_2^2}}{\frac{n_1}{\sigma_1^2}+\frac{n_2}{\sigma_2^2}}.$$
This gives
$$L(x, y) = \frac{\sup_{H_0}p(x, y|\theta)}{\sup_\Theta p(x, y|\theta)} = e^{-\frac{n_1}{2\sigma_1^2}(\bar{x}-\hat\mu_H)^2-\frac{n_2}{2\sigma_2^2}(\bar{y}-\hat\mu_H)^2}.$$
Now
$$\bar{x}-\hat\mu_H = \frac{\frac{n_2}{\sigma_2^2}(\bar{x}-\bar{y})}{\frac{n_1}{\sigma_1^2}+\frac{n_2}{\sigma_2^2}}, \qquad \bar{y}-\hat\mu_H = \frac{\frac{n_1}{\sigma_1^2}(\bar{y}-\bar{x})}{\frac{n_1}{\sigma_1^2}+\frac{n_2}{\sigma_2^2}},$$
$$\Rightarrow \frac{n_1}{\sigma_1^2}(\bar{x}-\hat\mu_H)^2+\frac{n_2}{\sigma_2^2}(\bar{y}-\hat\mu_H)^2 = \frac{(\bar{x}-\bar{y})^2}{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}.$$
Thus
$$L(x, y) = e^{-\frac12\frac{(\bar{x}-\bar{y})^2}{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}},$$
and $L(x, y)<C$ is equivalent to
$$\frac{|\bar{x}-\bar{y}|}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}>C_3.$$
(b) For $(\mu_1, \mu_2, \sigma^2)\in\Theta$ (common unknown variance $\sigma^2$),
$$p(x, y|\mu_1, \mu_2, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{\frac{n_1+n_2}{2}}e^{-\frac{1}{2\sigma^2}\left[\sum_1^{n_1}(x_i-\mu_1)^2+\sum_1^{n_2}(y_i-\mu_2)^2\right]},$$
with MLEs
$$\frac{d\log p}{d\mu_1} = 0 \Rightarrow \hat\mu_1 = \bar{x}, \qquad \frac{d\log p}{d\mu_2} = 0 \Rightarrow \hat\mu_2 = \bar{y}, \qquad \frac{d\log p}{d\sigma^2} = 0 \Rightarrow \hat\sigma^2 = \frac{\sum(x_i-\bar{x})^2+\sum(y_i-\bar{y})^2}{n_1+n_2},$$
$$\Rightarrow \sup_{(\mu_1, \mu_2, \sigma^2)\in\Theta}p(x, y|\mu_1, \mu_2, \sigma^2) = \left(\frac{1}{2\pi\hat\sigma^2}\right)^{\frac{n_1+n_2}{2}}e^{-\frac{n_1+n_2}{2}}.$$
For $(\mu, \sigma^2)\in H_0$ ($\mu_1 = \mu_2 = \mu$), maximizing
$$\log p = -\frac{n_1+n_2}{2}\log(2\pi\sigma^2)-\frac{1}{2\sigma^2}\left[\sum_1^{n_1}(x_i-\mu)^2+\sum_1^{n_2}(y_i-\mu)^2\right]$$
gives $\hat\mu_H = \frac{n_1\bar{x}+n_2\bar{y}}{n_1+n_2}$. Now
$$(\bar{x}-\hat\mu_H)^2 = \left\{\frac{n_2(\bar{x}-\bar{y})}{n_1+n_2}\right\}^2, \qquad (\bar{y}-\hat\mu_H)^2 = \left\{\frac{n_1(\bar{y}-\bar{x})}{n_1+n_2}\right\}^2,$$
which gives
$$\hat\sigma_H^2 = \frac{1}{n_1+n_2}\left[\sum_1^{n_1}(x_i-\bar{x})^2+\sum_1^{n_2}(y_i-\bar{y})^2+\frac{n_1n_2}{n_1+n_2}(\bar{x}-\bar{y})^2\right],$$
$$\sup_{(\mu, \sigma^2)\in H_0}p(x, y|\mu, \sigma^2) = \left(\frac{1}{2\pi\hat\sigma_H^2}\right)^{\frac{n_1+n_2}{2}}e^{-\frac{n_1+n_2}{2}}.$$
Hence we get
$$L(x, y) = \left(\frac{\hat\sigma^2}{\hat\sigma_H^2}\right)^{\frac{n_1+n_2}{2}} = \left[1+\frac{(\bar{x}-\bar{y})^2}{\left(\frac{1}{n_1}+\frac{1}{n_2}\right)(n_1+n_2-2)\cdot\frac{\sum(x_i-\bar{x})^2+\sum(y_i-\bar{y})^2}{n_1+n_2-2}}\right]^{-\frac{n_1+n_2}{2}}.$$
We know $\bar{X}\sim N\left(\mu_1, \frac{\sigma^2}{n_1}\right)$ and $\bar{Y}\sim N\left(\mu_2, \frac{\sigma^2}{n_2}\right)$, so
$$\bar{X}-\bar{Y}\sim N\left(\mu_1-\mu_2,\ \sigma^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)\right), \qquad \frac{(\bar{X}-\bar{Y})-(\mu_1-\mu_2)}{\sigma\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\sim N(0, 1).$$
Again, $\frac{1}{\sigma^2}\left[\sum(X_i-\bar{X})^2+\sum(Y_i-\bar{Y})^2\right]\sim\chi^2_{n_1+n_2-2}$. Thus, under $H: \mu_1 = \mu_2$,
$$t = \frac{(\bar{X}-\bar{Y})\big/\sigma\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}{\sqrt{\frac{1}{\sigma^2}\left[\sum(X_i-\bar{X})^2+\sum(Y_i-\bar{Y})^2\right]\big/(n_1+n_2-2)}} = \frac{\bar{x}-\bar{y}}{s\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\sim t_{n_1+n_2-2},$$
where
$$s^2 = \frac{\sum(x_i-\bar{x})^2+\sum(y_i-\bar{y})^2}{n_1+n_2-2} = \frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}.$$
So
$$L(x, y) = \frac{1}{\left(1+\frac{t^2}{n_1+n_2-2}\right)^{\frac{n_1+n_2}{2}}},$$
and the rejection region $L(x, y)<C$ gives
$$1+\frac{t^2}{n_1+n_2-2}>C_1, \quad\text{or } t^2>C_2, \quad\text{or } |t|>C_3 = t_{\frac{\alpha}{2}; n_1+n_2-2}.$$
For the corresponding one-sided alternative, the LR test is
$$\phi(x, y) = \begin{cases} 1 & \text{if } \dfrac{\bar{x}-\bar{y}}{s\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}<-t_{n_1+n_2-2; \alpha} \\ 0 & \text{otherwise.} \end{cases}$$
(The $k$-sample problem: $x_{ij}\sim N(\mu_i, \sigma^2)$ independent, $j = 1, \ldots, n_i$, $i = 1, \ldots, k$, $n = \sum n_i$; test $H: \mu_1 = \cdots = \mu_k$.) Under $\Theta$,
$$\frac{\partial\log p(x/\theta)}{\partial\mu_i} = 0 \Rightarrow \hat\mu_i = \bar{x}_i, \qquad \frac{\partial\log p(x/\theta)}{\partial\sigma^2} = 0 \Rightarrow \hat\sigma^2 = \frac1n\sum_1^k\sum_1^{n_i}\left(x_{ij}-\bar{x}_i\right)^2 = \frac{\text{within S.S.}}{n} = \frac{W}{n}\ \text{(say)}.$$
Under $H$,
$$\hat\mu_H = \frac1n\sum_1^k\sum_1^{n_i}x_{ij} = \frac1n\sum_1^k n_i\bar{x}_i = \bar{x}\ \text{(say)}, \qquad \hat\sigma_H^2 = \frac1n\sum_1^k\sum_1^{n_i}\left(x_{ij}-\bar{x}\right)^2 = \frac{\text{total S.S.}}{n} = \frac{T}{n} = \frac{W+B}{n},$$
where $B = \sum_1^k n_i(\bar{x}_i-\bar{x})^2$ = between-(means) S.S. Hence we get
$$p(x/\hat\theta) = (2\pi)^{-n/2}\hat\sigma^{-n}e^{-n/2}, \qquad p(x/\hat\theta_H) = (2\pi)^{-n/2}\hat\sigma_H^{-n}e^{-n/2}, \qquad L(x) = \left(\frac{\hat\sigma^2}{\hat\sigma_H^2}\right)^{n/2},$$
and therefore we reject $H$ iff $L(x)<c$, $P_H\{L(x)<c\} = \alpha\in(0, 1)$ $(0<c<1)$
$$\Leftrightarrow \frac{\hat\sigma_H^2}{\hat\sigma^2}>c' \Leftrightarrow \frac{W+B}{W}>c' \Leftrightarrow \frac{B}{W}>c'' \Leftrightarrow \frac{B/(k-1)}{W/(n-k)}>c''',$$
with $c''' = F_{\alpha; (k-1, n-k)}$. So our LR test rejects $H$ iff $\frac{B/(k-1)}{W/(n-k)}>F_{\alpha; (k-1, n-k)}$.
Note It is the same as the ANOVA test.
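A sketch computing $W$, $B$ and the $F$ ratio by hand and checking against scipy's one-way ANOVA (the group data are illustrative assumptions):

```python
import numpy as np
from scipy import stats

groups = [np.array([5.1, 4.8, 5.6, 5.0]),
          np.array([6.2, 5.9, 6.5]),
          np.array([4.3, 4.9, 4.6, 4.4, 4.7])]

grand = np.concatenate(groups).mean()
W = sum(((g - g.mean()) ** 2).sum() for g in groups)         # within S.S.
B = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)    # between S.S.
k, n = len(groups), sum(len(g) for g in groups)
F = (B / (k - 1)) / (W / (n - k))

F_lib, p = stats.f_oneway(*groups)                           # same F statistic
assert np.isclose(F, F_lib)
```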
Special case: (i) $\mu = \mu_0$ (given)
$$L(x) = \left(\frac{\hat\sigma^2}{\hat\sigma_H^2}\right)^{n/2} = \left[\frac{\sum_1^k\sum_1^{n_i}\left(x_{ij}-\bar{x}_i\right)^2}{\sum_1^k\sum_1^{n_i}\left(x_{ij}-\mu_0\right)^2}\right]^{n/2} = \left[\frac{W}{W+\sum_1^k n_i(\bar{x}_i-\mu_0)^2}\right]^{n/2}.$$
(ii) When $\sigma^2 = \sigma_0^2$ is known, the LR criterion reduces to rejecting $H$ iff
$$\frac{B}{\sigma_0^2}>\chi^2_{\alpha; k-1}.$$
Obtain the likelihood ratio test of $H: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$ against $K$: not all $\sigma_i^2$'s are equal.
Answer $\theta = (\mu_1, \mu_2, \ldots, \mu_k, \sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2)$, $\Theta = \{\theta: -\infty<\mu_i<\infty,\ \sigma_i^2>0,\ i = 1(1)k\}$. The likelihood function is
$$p(x/\theta) = \frac{(2\pi)^{-n/2}}{\prod_{i=1}^k\left(\sigma_i^2\right)^{n_i/2}}\,e^{-\sum_{i=1}^k\frac{1}{2\sigma_i^2}\sum_{j=1}^{n_i}\left(x_{ij}-\mu_i\right)^2}, \qquad n = \sum_1^k n_i.$$
Now,
$$\frac{\partial\log p(x/\theta)}{\partial\mu_i} = 0 \Rightarrow \frac{1}{\sigma_i^2}\sum_{j=1}^{n_i}\left(x_{ij}-\mu_i\right) = 0 \Rightarrow \hat\mu_i = \bar{x}_i,$$
$$\frac{\partial\log p(x/\theta)}{\partial\sigma_i^2} = 0 \Rightarrow -\frac{n_i}{2\sigma_i^2}+\frac{1}{2\sigma_i^4}\sum_{j=1}^{n_i}\left(x_{ij}-\mu_i\right)^2 = 0 \Rightarrow \hat\sigma_i^2 = \frac{1}{n_i}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}_i\right)^2 = \frac{n_i-1}{n_i}s_i^2.$$
Hence, for $\theta\in\Theta$, $\sup_{\theta\in\Theta}p(x/\theta) = p(x/\hat\theta) = (2\pi)^{-n/2}\prod_{i=1}^k\left(\hat\sigma_i^2\right)^{-n_i/2}e^{-n/2}$.
Under $H$, $p(x/\theta)$ reduces to
$$p(x/\theta) = (2\pi)^{-n/2}\sigma^{-n}e^{-\frac{1}{2\sigma^2}\sum_i\sum_j\left(x_{ij}-\mu_i\right)^2},$$
from which we get $\hat\mu_{iH} = \bar{x}_i$ and $\hat\sigma_H^2 = \frac1n\sum_1^k\sum_1^{n_i}\left(x_{ij}-\bar{x}_i\right)^2 = \frac1n\sum_1^k(n_i-1)s_i^2$. Hence $\sup_{\theta\in H}p(x/\theta) = p(x/\hat\theta_H) = (2\pi)^{-n/2}\left(\hat\sigma_H^2\right)^{-n/2}e^{-n/2}$, and the likelihood ratio is
$$L(x) = \frac{\left(\hat\sigma_H^2\right)^{-n/2}}{\prod_{i=1}^k\left(\hat\sigma_i^2\right)^{-n_i/2}} = \frac{\prod_{i=1}^k\left\{\frac{(n_i-1)s_i^2}{n_i}\right\}^{n_i/2}}{\left[\frac1n\sum_{i=1}^k(n_i-1)s_i^2\right]^{n/2}}.$$
With the divisors replaced by degrees of freedom (Bartlett's modification),
$$-2\log_e L(x) = (n-k)\log_e\frac{\sum_{i=1}^k(n_i-1)s_i^2}{n-k}-\sum_{i=1}^k(n_i-1)\log_e s_i^2 = \sum_{i=1}^k(n_i-1)\log_e\frac{s^2}{s_i^2}, \quad\text{where } s^2 = \frac{\sum_{i=1}^k(n_i-1)s_i^2}{\sum_{i=1}^k(n_i-1)}.$$
Bartlett has also suggested that the chi-square approximation will hold good for $n_i$ as low as four or five if the above statistic is divided by $t$, where
$$t = 1+\frac{1}{3(k-1)}\left[\sum_{i=1}^k\frac{1}{n_i-1}-\frac{1}{\sum_{i=1}^k(n_i-1)}\right].$$
Hence
$$T = \frac{\sum_{i=1}^k(n_i-1)\log_e\frac{s^2}{s_i^2}}{t}\sim\chi^2_{k-1}\ \text{(approximately)},$$
so we reject $H$ approximately at level $\alpha$ iff $T>\chi^2_{k-1; \alpha}$.
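Bartlett's corrected statistic matches scipy.stats.bartlett, so the hand computation can be verified directly (the samples are illustrative assumptions):

```python
import numpy as np
from scipy import stats

samples = [np.array([3.1, 2.8, 3.6, 3.0, 2.9]),
           np.array([4.2, 3.9, 4.8, 4.5]),
           np.array([2.3, 2.9, 2.6, 2.4, 2.7, 2.5])]

k = len(samples)
ni = np.array([len(s) for s in samples])
si2 = np.array([np.var(s, ddof=1) for s in samples])
s2 = np.sum((ni - 1) * si2) / np.sum(ni - 1)                 # pooled variance
t_corr = 1 + (np.sum(1 / (ni - 1)) - 1 / np.sum(ni - 1)) / (3 * (k - 1))
T = np.sum((ni - 1) * np.log(s2 / si2)) / t_corr             # ~ chi2_{k-1} under H

T_lib, p = stats.bartlett(*samples)                          # same statistic
assert np.isclose(T, T_lib)
```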
Solution
I. The likelihood function is
$$p(x/\mu, \sigma) = \frac{1}{\sigma^n}e^{-\frac1\sigma\sum_i(x_i-\mu)}\ \text{ if } x_i>\mu\ \forall i; \qquad = 0 \text{ otherwise}.$$
Under $H: \mu = \mu_0$,
$$p(x/\mu_0, \sigma) = \sigma^{-n}e^{-\frac1\sigma\sum_i(x_i-\mu_0)} \Rightarrow \text{MLE of } \sigma \text{ is } \hat\sigma_H = \frac1n\sum_i(x_i-\mu_0) = \frac1n\sum_i(y_i-\mu_0),$$
where $y_1\le y_2\le\cdots\le y_n$ are the ordered observations. Then $\sup_{\mu, \sigma\in H}p(x/\mu, \sigma) = (\hat\sigma_H)^{-n}e^{-n}$. Proceeding similarly with the unrestricted MLEs $\hat\mu = y_1$ and $\hat\sigma = \frac1n\sum(y_i-y_1)$, the LR test reduces to a test based on
$$\frac{n(y_1-\mu_0)}{\sum_i(y_i-y_1)\big/(n-1)}\sim F_{2, 2n-2} \quad\text{under } H.$$
II. As earlier, it can be shown that the test (for the scale parameter) has acceptance region
$$c_1<T = \frac{1}{\sigma_0}\sum_i(y_i-y_1)<c_2.$$
Example 4.10 Let ðX11 ; X21 Þ; ðX12 ; X22 Þ. . .; ðX1n ; X2n Þ be a random sample from a
bivariate normal distribution with means l1 and l2 , variances r21 and r22 and cor-
relation coefficient q. Find the likelihood ratio test of
H : q ¼ 0 against K : q 6¼ 0
Solution
Here $\theta = (\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$,
$$\Theta = \{\theta: -\infty<\mu_i<\infty,\ \sigma_i^2>0,\ -1<\rho<1,\ i = 1, 2\}, \qquad H_0 = \{\theta\in\Theta: \rho = 0\}.$$
In $\Theta$, the ML estimators for $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$ and $\rho$ are
$$\hat\mu_1 = \bar{x}_1,\ \hat\mu_2 = \bar{x}_2,\ \hat\sigma_1^2 = \frac1n\sum(x_{1i}-\bar{x}_1)^2,\ \hat\sigma_2^2 = \frac1n\sum(x_{2i}-\bar{x}_2)^2,$$
$$\hat\rho = \frac{\sum(x_{1i}-\bar{x}_1)(x_{2i}-\bar{x}_2)}{\left\{\sum(x_{1i}-\bar{x}_1)^2\sum(x_{2i}-\bar{x}_2)^2\right\}^{1/2}} = r.$$
Thus
$$\sup_{\theta\in\Theta}p(x|\theta) = \left(2\pi\hat\sigma_1\hat\sigma_2\sqrt{1-r^2}\right)^{-n}e^{-n},$$
while under $H_0$, $\hat\mu_{iH} = \bar{x}_i$, $\hat\sigma_{iH}^2 = \hat\sigma_i^2$, giving $\sup_{\theta\in H_0}p(x|\theta) = \left(2\pi\hat\sigma_{1H}\hat\sigma_{2H}\right)^{-n}e^{-n}$ and
$$L(x) = \frac{\sup_{\theta\in H_0}p(x|\theta)}{\sup_{\theta\in\Theta}p(x|\theta)} = \left(1-r^2\right)^{n/2}.$$
The rejection rule $L(x)<C$ gives
$$\left(1-r^2\right)^{n/2}<C \Leftrightarrow 1-r^2<C_1 \Leftrightarrow r^2>C_2 \Leftrightarrow |r|>C_3,$$
where $C_3$ is obtained from $P_H[|r|>C_3] = \alpha$ using the null distribution of $r$, which is symmetric about 0; e.g. for $\alpha = 0.05$,
$$\int_{-1}^{-C_3}f(r)\,dr = \int_{C_3}^{1}f(r)\,dr = 0.025.$$
$H_0 = \{\lambda: \lambda = \lambda_0\}$. For i.i.d. exponential observations with mean $\lambda$, the MLE of $\lambda$ in $\Theta$ is $\hat\lambda = \bar{x}$, and the LR test has acceptance region of the form
$$P_H[d_0<\bar{x}<d_1] = 1-\alpha.$$
Since $M_X(t) = (1-\lambda t)^{-1}$, we get
$$M_{\bar{X}}(t) = \left(1-\frac{\lambda t}{n}\right)^{-n};$$
thus $\bar{X}\sim G\left(\frac{n}{\lambda}, n\right)$, i.e. $f(\bar{x}) = \frac{1}{\Gamma(n)}\left(\frac{n}{\lambda}\right)^n e^{-n\bar{x}/\lambda}\bar{x}^{n-1}$.
One can find the values of $d_0$ and $d_1$ from the gamma distribution table under $H_0$.
Chapter 5
Interval Estimation
5.1 Introduction
A confidence interval (CI) for $\theta$ is a random interval which covers the true value of the parameter $\theta$ with a specified degree of confidence (assurance). In other words, it is a random interval $I(X) = \left[\underline\theta(X), \bar\theta(X)\right]$ satisfying
$$\Pr_\theta\left\{\theta\in I(X)\right\}\ge1-\alpha\ \forall\theta\in\Theta. \tag{5.1}$$
Method I
A simple procedure for finding a confidence interval
Let $T$ be a statistic and $\Psi(T, \theta)$ be a function of $T$ and $\theta$. Suppose the distribution of $\Psi(T, \theta)$ is free from $\theta$. Then it is always possible to choose two constants $K_1$ and $K_2$ $(K_1\le K_2)$ such that $P\left[K_1\le\Psi(T, \theta)\le K_2\right] = 1-\alpha$, and the inequalities can be inverted to give an interval for $\theta$.
Example 5.2 For a random sample from $N(\mu, \sigma^2)$ with $\mu$ known, $\sum(x_i-\mu)^2/\sigma^2\sim\chi^2_n$, so
$$P\left[\chi^2_{n, 1-\alpha_1}\le\frac{\sum(x_i-\mu)^2}{\sigma^2}\le\chi^2_{n, \alpha_2}\right] = 1-(\alpha_1+\alpha_2) = 1-\alpha$$
$$\Rightarrow P\left[\frac{\sum(x_i-\mu)^2}{\chi^2_{n, \alpha_2}}\le\sigma^2\le\frac{\sum(x_i-\mu)^2}{\chi^2_{n, 1-\alpha_1}}\right] = 1-\alpha.$$
Thus $\left[\frac{\sum(x_i-\mu)^2}{\chi^2_{n, \alpha_2}}, \frac{\sum(x_i-\mu)^2}{\chi^2_{n, 1-\alpha_1}}\right]$ is a $100(1-\alpha)\%$ confidence interval for $\sigma^2$ when $\mu$ is known.
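The equal-tails version of this interval is a two-line computation; a sketch (the data and the known $\mu$ are illustrative assumptions):

```python
import numpy as np
from scipy.stats import chi2

x = np.array([4.2, 5.1, 3.8, 4.9, 5.4, 4.4, 4.7, 5.0])
mu, alpha = 4.8, 0.05

q = np.sum((x - mu) ** 2)
lo = q / chi2.ppf(1 - alpha / 2, df=len(x))   # divide by upper alpha/2 point
hi = q / chi2.ppf(alpha / 2, df=len(x))       # divide by lower alpha/2 point
print(round(lo, 3), round(hi, 3))             # equal-tails 95% CI for sigma^2
```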
Example 5.3 Let $X_1, X_2, \ldots, X_n$ be a random sample from the density $f(x|\theta) = \frac1\theta$, $0<x<\theta$. Find a $100(1-\alpha)\%$ confidence interval for $\theta$.
Solution The likelihood function is $L = \frac{1}{\theta^n}$. This is maximized when $\theta$ is smallest; but $\theta$ cannot be less than $x_{(n)}$, the maximum of the sample observations. Thus $\hat\theta = x_{(n)}$.
The p.d.f. of $\hat\theta$ is given by
$$h(\hat\theta) = \frac{n\hat\theta^{n-1}}{\theta^n}, \quad 0<\hat\theta<\theta.$$
Let $u = \frac{\hat\theta}{\theta} = \frac{x_{(n)}}{\theta}$, so that $g(u) = nu^{n-1}$, $0<u<1$; thus the distribution of $u$ is independent of $\theta$. We find $u_1$ and $u_2$ such that
$$\int_0^{u_1}g(u)\,du = \alpha_1, \qquad \int_{u_2}^1 g(u)\,du = \alpha_2,$$
i.e. $P\left[u_1<\frac{\hat\theta}{\theta}<u_2\right] = 1-\alpha$, whence $P\left[\frac{\hat\theta}{u_2}<\theta<\frac{\hat\theta}{u_1}\right] = 1-\alpha$.
Thus $\left(\frac{\max x_i}{u_2}, \frac{\max x_i}{u_1}\right)$ is a $100(1-\alpha)\%$ confidence interval for $\theta$.
Example 5.4 $X_1, X_2, \ldots, X_n$ is a random sample from a $G\left(\frac1\theta, 1\right)$ distribution having p.d.f.
$$f(x/\theta) = \frac1\theta e^{-x/\theta}, \quad x\ge0.$$
Here $t = \frac{n\bar{x}}{\theta}$ is a gamma variable with shape $n$, whose density $g(t)$ is free of $\theta$. We find $k_1$ and $k_2$ where
$$\int_0^{k_1}g(t)\,dt = \alpha_1, \qquad \int_{k_2}^\infty g(t)\,dt = \alpha_2,$$
i.e.
$$P\left[\frac{n\bar{x}}{k_2}<\theta<\frac{n\bar{x}}{k_1}\right] = 1-\alpha.$$
[Figure: the confidence belt — for each $\theta$ the curves $A(t, \theta)$ and $B(t, \theta)$ bound the acceptance region; inverting at $T = t$ gives the confidence limits $\underline\theta(t)$ and $\bar\theta(t)$.]
Note 1: To avoid the drawing, one may use an inverse interpolation formula.
Note 2: If the left-hand sides of the equations in (5.1) can be given explicit expressions in terms of $\theta$, and if the equations can be solved for $\theta$ uniquely, then the roots are the confidence limits for $\theta$ at confidence level $(1-\alpha)$.
Example 5.5 Let $X_1, X_2, \ldots, X_n$ be a random sample from the density $f(x|\theta) = \frac1\theta$, $0<x<\theta$. Find a $100(1-\alpha)\%$ confidence interval for $\theta$, choosing $k_1(\theta)$, $k_2(\theta)$ where
$$\int_0^{k_1(\theta)}h(\hat\theta)\,d\hat\theta = \alpha_1 \tag{5.2}$$
and
$$\int_{k_2(\theta)}^\theta h(\hat\theta)\,d\hat\theta = \alpha_2. \tag{5.3}$$
From (5.2), $\frac{k_1^n(\theta)}{\theta^n} = \alpha_1$, or $k_1(\theta) = \theta\,\alpha_1^{1/n}$. From (5.3), $1-\frac{k_2^n(\theta)}{\theta^n} = \alpha_2$, or $k_2(\theta) = \theta(1-\alpha_2)^{1/n}$. Therefore
$$P\left[\theta\,\alpha_1^{1/n}<\hat\theta<\theta(1-\alpha_2)^{1/n}\right] = 1-\alpha, \quad\text{or } P\left[\frac{\hat\theta}{(1-\alpha_2)^{1/n}}<\theta<\frac{\hat\theta}{\alpha_1^{1/n}}\right] = 1-\alpha.$$
Note We can also get the confidence interval of $\theta$ by Method I, as in Example 5.3.
Large-sample confidence interval: Let the asymptotic distribution of a statistic $T_n$ be normal with mean $\theta$ and variance $\frac{\sigma^2(\theta)}{n}$; then
$$\Pr\left\{-\tau_{\alpha_1}\le\frac{(T_n-\theta)\sqrt{n}}{\sigma(\theta)}\le\tau_{\alpha_2}\right\}\simeq1-(\alpha_1+\alpha_2) = 1-\alpha\ \text{(say)}.$$
This fact gives us a confidence interval for $\theta$ at confidence level $(1-\alpha)$, approximately.
5.3 Construction of Confidence Interval 137
h pffiffiffiffiffiffiffi pffiffiffiffiffiffiffii
) P x sa2 x=n\l\x s1a1 x=n ¼ 1 a
Example 5.7 Consider the problem of Example 5.3. Find a $100(1-\alpha)\%$ confidence interval for $\theta$ by using Chebyshev's inequality.
Solution
We have $E(\hat\theta) = \frac{n}{n+1}\theta$ and $E(\hat\theta-\theta)^2 = \frac{2\theta^2}{(n+1)(n+2)}$. By applying Chebyshev's inequality we get
$$P\left[\frac{|\hat\theta-\theta|}{\theta}\sqrt{\frac{(n+1)(n+2)}{2}}<\lambda\right]>1-\frac{1}{\lambda^2}.$$
Since $\hat\theta\xrightarrow{p}\theta$, we may replace $\theta$ by $\hat\theta$ for moderately large $n$. Again, $\frac{1}{\sqrt{(n+1)(n+2)}}\simeq\frac1n$ for large $n$; taking $\lambda = \sqrt{1/\alpha}$ and using the fact that $\hat\theta\le\theta$, we have
$$P\left[\hat\theta<\theta<\hat\theta\left(1+\frac1n\sqrt{\frac2\alpha}\right)\right]>1-\alpha\ \text{(approximately)}.$$
Thus $\left(\max x_i,\ \max x_i\left(1+\frac1n\sqrt{\frac2\alpha}\right)\right)$ is an approximate $(1-\alpha)$ level confidence interval for $\theta$.
From the above discussion it is clear that the $(1-\alpha)$ level C.I. is not unique. In fact, an infinite number of C.I.'s can be constructed by the simple method [because the equation $\alpha_1+\alpha_2 = \alpha$, $\alpha_1\ge0$, $\alpha_2\ge0$, has an infinite number of solutions $(\alpha_1, \alpha_2)$]. Again, for different choices of statistic we get different confidence intervals. For example, in a random sample from
$$f(x; \theta) = \frac1\theta e^{-x/\theta}, \quad 0<x<\infty,$$
$$\frac{2\sum X_i}{\chi^2_{2n, \alpha}}$$
is a $(1-\alpha)$ level lower confidence bound for $\theta$ [as $\frac{2X}{\theta}\sim\chi^2_2\Rightarrow\frac{2\sum X_i}{\theta}\sim\chi^2_{2n}$, since the m.g.f. of $X$ is $(1-t\theta)^{-1}$, whence the m.g.f. of $\frac{2X}{\theta}$ is $(1-2t)^{-1}$ = m.g.f. of $\chi^2_2$]. On the other hand, a $(1-\alpha)$ level lower confidence bound for $\theta$ based on $X_{(1)} = \min_i X_i$ is $\frac{2nX_{(1)}}{\chi^2_{2, \alpha}}$.
So we need some optimality criteria to choose one of the $(1-\alpha)$ level confidence intervals.
1. Shortest length confidence interval [Wilks' criterion]
A $(1-\alpha)$ level confidence interval $I(\theta) = \left[\underline\theta(T), \bar\theta(T)\right]$ based on $T$ will be of shortest length if
$$\bar\theta(T)-\underline\theta(T)\le\bar\theta^*(T)-\underline\theta^*(T) \text{ for all } \theta$$
holds for every other $(1-\alpha)$ level C.I. $\left[\underline\theta^*(T), \bar\theta^*(T)\right]$ based on the same statistic $T$.
Example 5.8 On the basis of a random sample from $N(\mu, \sigma^2)$, a $(1-\alpha)$ level C.I. for $\mu$ based on $\bar{X}$ is given by $\left[\bar{X}-\tau_{\alpha_2}\frac{\sigma}{\sqrt{n}},\ \bar{X}+\tau_{\alpha_1}\frac{\sigma}{\sqrt{n}}\right]$, $\alpha_1, \alpha_2\ge0$, $\alpha_1+\alpha_2 = \alpha$, with length $(\tau_{\alpha_1}+\tau_{\alpha_2})\frac{\sigma}{\sqrt{n}}$; by the symmetry and unimodality of the normal curve, this is minimized at $\alpha_1 = \alpha_2 = \alpha/2$. When $\sigma^2$ is unknown, the corresponding C.I. based on Student's $t$ is $\left[\bar{X}-t_{\alpha_2; n-1}\frac{s}{\sqrt{n}},\ \bar{X}+t_{\alpha_1; n-1}\frac{s}{\sqrt{n}}\right]$, whose length $\left(t_{\alpha_1; n-1}+t_{\alpha_2; n-1}\right)\frac{s}{\sqrt{n}}$ is a random quantity. So, to find the shortest (expected) length C.I., we minimize $\left(t_{\alpha_1; n-1}+t_{\alpha_2; n-1}\right)\frac{E(s)}{\sqrt{n}}$ subject to $\alpha_1, \alpha_2\ge0$ and $\alpha_1+\alpha_2 = \alpha$. Owing to the symmetry of the $t_{n-1}$ distribution about 0, the minimum is attained at $\alpha_1 = \alpha_2 = \alpha/2$. Therefore the required shortest expected length confidence interval is
$$\left[\bar{X}-t_{\alpha/2; n-1}\frac{s}{\sqrt{n}},\ \bar{X}+t_{\alpha/2; n-1}\frac{s}{\sqrt{n}}\right].$$
Example 5.9 Consider the problem discussed in Example 5.2. On the basis of a random sample from $N(\mu, \sigma^2)$, $\mu$ known, a $(1-\alpha)$ level CI for $\sigma^2$ is given by $\left[\frac{\sum(x_i-\mu)^2}{\chi^2_{n, \alpha_2}}, \frac{\sum(x_i-\mu)^2}{\chi^2_{n, 1-\alpha_1}}\right]$, $\alpha_1, \alpha_2\ge0$, $\alpha_1+\alpha_2 = \alpha$. The length of the interval is $\left(\frac{1}{\chi^2_{n, 1-\alpha_1}}-\frac{1}{\chi^2_{n, \alpha_2}}\right)\sum(X_i-\mu)^2$, which has the expected value
$$n\sigma^2\left[\frac{1}{\chi^2_{n, 1-\alpha_1}}-\frac{1}{\chi^2_{n, \alpha_2}}\right].$$
We wish to minimize
$$\frac{1}{v_1^2}-\frac{1}{v_2^2}, \quad\text{where } v_1^2 = \chi^2_{n, 1-\alpha_1},\ v_2^2 = \chi^2_{n, \alpha_2},$$
subject to $\int_{v_1^2}^{v_2^2}f(\chi^2)\,d\chi^2 = 1-\alpha$, where $f(\chi^2)$ is the p.d.f. of a chi-square r.v. with $n$ d.f. Now let
$$\phi = \frac{1}{v_1^2}-\frac{1}{v_2^2}+\lambda\left[\int_{v_1^2}^{v_2^2}f(\chi^2)\,d\chi^2-(1-\alpha)\right];$$
$$\frac{\partial\phi}{\partial v_1^2} = -\frac{1}{v_1^4}-\lambda f(v_1^2) = 0, \qquad \frac{\partial\phi}{\partial v_2^2} = \frac{1}{v_2^4}+\lambda f(v_2^2) = 0$$
$$\Rightarrow -\lambda = \frac{1}{v_1^4 f(v_1^2)} = \frac{1}{v_2^4 f(v_2^2)},$$
so $v_1^4 f(v_1^2) = v_2^4 f(v_2^2)$ is to be satisfied along with $\int_{v_1^2}^{v_2^2}f(\chi^2)\,d\chi^2 = 1-\alpha$.
It is very difficult to find the actual values of $v_1^2$ and $v_2^2$. In practice, the equal-tails interval $\left[\frac{\sum(x_i-\mu)^2}{\chi^2_{n, \alpha/2}}, \frac{\sum(x_i-\mu)^2}{\chi^2_{n, 1-\alpha/2}}\right]$ is used.
For the uniform case (Example 5.3) the length is $L = \max x_i\left(\frac{1}{u_1}-\frac{1}{u_2}\right)$. We minimize $L$ subject to
$$\int_{u_1}^{u_2}nu^{n-1}\,du = u_2^n-u_1^n = 1-\alpha,$$
which requires $(1-\alpha)^{1/n}<u_2\le1$. Regarding $u_1$ as a function of $u_2$ (with $\frac{du_1}{du_2} = \frac{u_2^{n-1}}{u_1^{n-1}}$),
$$\frac{dL}{du_2} = \max x_i\left[-\frac{1}{u_1^2}\frac{du_1}{du_2}+\frac{1}{u_2^2}\right] = \max x_i\left[-\frac{u_2^{n-1}}{u_1^{n+1}}+\frac{1}{u_2^2}\right] = \max x_i\,\frac{u_1^{n+1}-u_2^{n+1}}{u_2^2\,u_1^{n+1}}<0,$$
so $L$ is decreasing in $u_2$ and is minimized at $u_2 = 1$, whence $u_1 = \alpha^{1/n}$. The shortest-length confidence interval is therefore given by $\left(\max x_i,\ \max x_i\,\alpha^{-1/n}\right)$. This confidence interval has the smallest length among all confidence intervals for $\theta$ based on $\max x_i$.
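A sketch of this interval in Python (the sample and the true $\theta$ used to generate it are illustrative assumptions):

```python
import numpy as np

def uniform_ci(x, alpha=0.05):
    """Shortest-length (1-alpha) CI (x_(n), x_(n) * alpha^(-1/n)) for theta."""
    m = np.max(x)
    return m, m / alpha ** (1 / len(x))

x = np.random.default_rng(4).uniform(0, 7.3, size=25)
print(uniform_ci(x))   # covers theta = 7.3 in about 95% of repeated samples
```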
2. Neyman's criterion
Let $I_1(X)$ and $I_2(X)$ be two $(1-\alpha)$ level confidence intervals for $\theta$. $I_1(X)$ is said to be more accurate (or shorter) than $I_2(X)$ if it covers false values of the parameter with smaller probability. The link with tests is the correspondence
$$x\in A(\theta) \Leftrightarrow \theta\in S(x) \Rightarrow P_\theta\left\{\theta\in S(x)\right\} = P_\theta\left\{x\in A(\theta)\right\}\ge1-\alpha\ \forall\theta.$$
Note The implication of this theorem is that, for a fixed $x$, the confidence region is obtained by inverting a family of acceptance regions; since $S(x)$ is arbitrary, the proof follows immediately.
Theorem 5.4 Let $S(x)$ be a UMA $(1-\alpha)$ level confidence interval for $\theta$. Define $A(\theta) = \{x: \theta\in S(x)\}$. Then $A(\theta_0)$ will be the acceptance region of a level-$\alpha$ UMP test for testing $H_0: \theta = \theta_0$.
Proof According to the construction of $A(\theta)$, $A(\theta_0)$ is the acceptance region of a level-$\alpha$ non-randomized test for testing $H_0: \theta = \theta_0$. Now, corresponding to another $(1-\alpha)$ level C.I. $S^*(x)$, let $A^*(\theta) = \{x: \theta\in S^*(x)\}$. The UMA property of $S(x)$ then implies that $A(\theta_0)$ is the acceptance region of the level-$\alpha$ UMP non-randomized test for testing $H_0: \theta = \theta_0$, since $A^*(\theta)$ is arbitrary.
Relation between UMPU non-randomized tests and UMAU confidence intervals
Theorem 5.5 Let $A(\theta_0)$ be the acceptance region of a UMPU level-$\alpha$ non-randomized test for testing $H_0: \theta = \theta_0$. Define $S(x) = \{\theta: x\in A(\theta)\}$. Then $S(x)$ is a UMAU $(1-\alpha)$ level C.I. for $\theta$, since $A^*(\theta_0)$ is arbitrary.
Theorem 5.6 Let $S(x)$ be a UMAU $(1-\alpha)$ level confidence interval for $\theta$. Define $A(\theta) = \{x: \theta\in S(x)\}$; then $A(\theta_0)$ will be the acceptance region of a level-$\alpha$ UMPU test for testing $H_0: \theta = \theta_0$.
Example 5.11 Let $X_1, X_2, \ldots, X_n$ be a r.s. from $R(0, \theta)$. The UMP level-$\alpha$ non-randomized test for testing $H_0: \theta = \theta_0$ against $\theta\neq\theta_0$ is given by the critical region $x_{(n)}>\theta_0$ or $x_{(n)}\le\theta_0\sqrt[n]{\alpha}$.
Let $A(\theta) = \left\{x: \theta\sqrt[n]{\alpha}<x_{(n)}\le\theta\right\}$ and define
$$S(x) = \{\theta: x\in A(\theta)\} = \left\{\theta: \theta\sqrt[n]{\alpha}<x_{(n)}\le\theta\right\} = \left\{\theta: x_{(n)}\le\theta<\frac{x_{(n)}}{\sqrt[n]{\alpha}}\right\}.$$
Thus, by Theorem 5.3, $S(x) = \left\{\theta: x_{(n)}\le\theta<\frac{x_{(n)}}{\sqrt[n]{\alpha}}\right\}$ will be a $(1-\alpha)$ level UMA confidence interval for $\theta$.
6.1 Introduction
(ii) No assumption is made about the form of frequency function of the parent
population from which the sample is drawn.
(iii) Parametric techniques are not applicable to data that are mere classifications (i.e. measured on a nominal scale), while non-parametric methods exist to deal with such data.
(iv) Since the socio-economic data are not, in general, normally distributed,
non-parametric tests have found applications in psychometry, sociology and
educational statistics.
(v) Non-parametric tests are available to deal with data which are given in ranks
or whose seemingly numerical scores have the strength of the ranks. For
example, no parametric test can be applied if the scores are given in grades
such as A, B, C, D, etc.
Disadvantages
(i) Non-parametric tests can be used only if the measurements are nominal or ordinal. Even in that case, if a parametric test exists, it is more powerful than the non-parametric test.
In other words, if all the assumptions of a statistical model are satisfied by the
data and if the measurements are of required strength, then non-parametric
tests are wasteful of time and data.
(ii) No non-parametric method exists for testing interactions in ANOVA model
unless special assumptions about the additivity of the model are made.
(iii) Non-parametric tests are designed to test statistical hypothesis only but not for
estimating parameters.
$$\chi^2 = \sum_{i=1}^k\frac{(x_i-np_i)^2}{np_i}.$$
If the agreement between the observed frequencies $(x_i)$ and the expected frequencies $(e_i = np_i)$ is close, then the differences $(x_i-np_i)$ will be small and consequently $\chi^2$ will be small; otherwise it will be large. The larger the value of $\chi^2$, the more likely it is that the observed frequencies did not come from the population under $H_0$. This means that the test is always right-sided. It can be shown that for large samples the sampling distribution of $\chi^2$ under $H_0$ follows the chi-square distribution with $(k-1)$ d.f. The approximation holds good if every $e_i\ge5$. In case some $e_i<5$, we have to combine adjacent classes until the expected frequency in the combined class is at least 5; then $k$ will be the actual number of classes used in computing $\chi^2$. Thus the null hypothesis $H_0$ is rejected if the calculated $\chi^2>\chi^2_{\alpha; k-1}$.
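A minimal sketch with scipy, using the pea-frequency data of the worked example further below:

```python
from scipy.stats import chisquare

observed = [315, 108, 101, 32]
expected = [556 * r / 16 for r in (9, 3, 3, 1)]   # ratio 9:3:3:1, n = 556
stat, p = chisquare(observed, f_exp=expected)     # stat ~ 0.47, large p-value
print(round(stat, 2), round(p, 3))
```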
$$F_n(x) = \frac{k}{n}, \quad k = \text{number of sample observations}\le x.$$
$$D_n^- = \sup_x\left[F_0(x)-F_n(x)\right], \qquad D_n^+ = \sup_x\left[F_n(x)-F_0(x)\right].$$
The statistics $D_n^+$ and $D_n^-$ have the same distribution because of symmetry. The test rejects $H_0$ if $D_n^+>D_{n,\alpha}^+$ when the alternative is $F(x)\ge F_0(x)\ \forall x$, and rejects $H_0$ if $D_n^->D_{n,\alpha}^-$ when the alternative is $F(x)\le F_0(x)\ \forall x$, at level $\alpha$.
Let $\xi_p$ denote the $p$-th quantile of the distribution, so that $\Pr[X\le\xi_p] = p$. We test $H_0: \xi_p = \xi_p^0$.
Case 1 $H_1: \xi_p>\xi_p^0$.
To perform the test we consider the number of positive quantities among $x_1-\xi_p^0, x_2-\xi_p^0, \ldots, x_n-\xi_p^0$. Sample values equal to $\xi_p^0$ are ignored. Let $S$ = total number of + signs. We note that, under $H_0$,
$$\Pr\left[X-\xi_p^0\le0\right] = p \Rightarrow \Pr\left[X-\xi_p^0>0\right] = 1-p = q,\ \text{say},$$
so, under $H_0$, $S\sim B(n, q)$. Also, under $H_1$, $\Pr\left[X\le\xi_p^0\right]<p$, i.e. $\Pr\left[X-\xi_p^0>0\right]>q$; say $\Pr\left[X-\xi_p^0>0\right] = q'$, where $q'>q$. Thus, under $H_1$, $S\sim B(n, q')$, $q'>q$, and large values of $S$ support $H_1$. So the test is
$$\phi(S) = \begin{cases} 1 & \text{if } S>s_\alpha \\ a & \text{if } S = s_\alpha \\ 0 & \text{if } S<s_\alpha. \end{cases}$$
Case 2 $H_2: \xi_p<\xi_p^0$. Under $H_2$, $\Pr\left[X\le\xi_p^0\right]>p$, so small values of $S$ support $H_2$, giving the critical region $\omega_0: S<s_\alpha'$.
Case 3 $H_3: \xi_p\neq\xi_p^0$. A two-tailed test is used, with constants $s_1<s_2$ and randomization probabilities
$$a_1 = \frac{\frac\alpha2-\Pr[S<s_1/H_0]}{\Pr[S = s_1]}, \qquad a_2 = \frac{\frac\alpha2-\Pr[S>s_2/H_0]}{\Pr[S = s_2/H_0]},$$
so that the critical region is $\omega_0: S<s_1$ or $S>s_2$ (with randomization on the boundaries).
Another similar modification of the sign test is the Wilcoxon signed-rank test. This is used to test the hypothesis that the observations have come from a symmetrical population with a common specified median, say $\mu_0$. Thus the problem is to test $H_0: \mu = \mu_0$. The signed-rank statistic $T^+$ is computed as follows:
1. Subtract l0 from each observation.
2. Rank the resulting differences in order of size, discarding sign.
3. Restore the sign of the original difference to the corresponding rank.
4. Obtain T þ , the sum of the positive ranks.
Similarly, $T^-$ is the sum of the negative ranks. Under $H_0$ we expect $T^+$ and $T^-$ to be nearly equal. We also note that
$$T^++T^- = \sum_{i=1}^n i = \frac{n(n+1)}{2}.$$
The critical constants are obtained from $P[T^+<C_1] = \alpha$ (left-sided case) and $P[T^+>C_2] = \alpha$ (right-sided case).
The one-sample run test is based on the order or sequence in which the indi-
vidual scores or observations originally were obtained.
Example 6.1 The theory predicts that the proportion of peas in the four groups A,
B, C and D should be 9:3:3:1. In an experiment among 556 peas, the numbers in the
four groups were 315, 108, 101 and 32. Does the experimental result support the
theory?
Solution If P1, P2, P3 and P4 be the proportions of peas in the four classes in the
whole population of peas, then the null hypothesis to be tested is
$$H_0: P_1 = \frac{9}{16},\ P_2 = \frac{3}{16},\ P_3 = \frac{3}{16},\ P_4 = \frac{1}{16}.$$
To test $H_0$ we use the statistic
$$\chi^2 = \sum_{i=1}^k\frac{(x_i-np_i^0)^2}{np_i^0} = \sum_{i=1}^k\frac{x_i^2}{np_i^0}-n \quad\text{with } (k-1)\text{ d.f.}$$
Here $n = 556$ and the expected frequencies are $np_1^0 = 312.75$, $np_2^0 = np_3^0 = 104.25$ and $np_4^0 = 34.75$, giving $\chi^2 = 0.47$.
From the table we have $\chi^2_{0.05; 3} = 7.815$. Since the calculated value of $\chi^2$, i.e. 0.47,
is less than the tabulated value, i.e. 7.815, it is not significant. Hence the null
hypothesis may be accepted at 5 % level of significance and we may conclude that
the experimental result supports the theory.
Example 6.2 Can the following sample be reasonably regarded as coming from a
uniform distribution on the interval (35,70): 36, 42, 44, 50, 64, 58, 56, 50, 37, 48,
52, 63, 57, 43, 39, 42, 47, 61, 53, 58? Use Kolmogorov–Smirnov test.
Solution Here we test $H_0: F(x) = F_0(x)$ for all $x$, where $F_0(x)$ is the distribution function of the uniform distribution on the interval (35, 70). Now
$$F_0(x) = 0 \text{ if } x\le35; \qquad = \frac{x-35}{35} \text{ if } 35<x<70; \qquad = 1 \text{ if } x\ge70.$$
The computation for the upper half of the ordered sample is:

x     F₀(x)   F₂₀(x)   |F₂₀(x) − F₀(x)|
50    15/35   11/20    17/140
52    17/35   12/20    16/140
53    18/35   13/20    19/140
56    21/35   14/20    14/140
57    22/35   15/20    17/140
58    23/35   16/20    20/140
58    23/35   17/20    27/140
61    26/35   18/20    22/140
63    28/35   19/20    21/140
64    29/35   20/20    24/140

$$D_{20} = \sup_x|F_{20}(x)-F_0(x)| = \frac{27}{140} = 0.1929.$$
Let us take a ¼ 0:05: Then from the table D20;0:05 ¼ 0:294. Since
0.1929 < 0.294, we accept H0 at 5 % level of significance. So we can conclude that
the given data has come from a uniform distribution on the interval (35,70).
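The same test can be run with scipy (Uniform(35, 70) is `uniform(loc=35, scale=35)` in scipy's parametrization):

```python
from scipy.stats import kstest, uniform

data = [36, 42, 44, 50, 64, 58, 56, 50, 37, 48,
        52, 63, 57, 43, 39, 42, 47, 61, 53, 58]
stat, p = kstest(data, uniform(loc=35, scale=35).cdf)
print(round(stat, 4), round(p, 3))   # D_20 ~ 0.193; H0 is not rejected
```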
Example 6.3 The following data represent the yields of maize in q/ha recorded
from an experiment.
16.4, 19.2, 24.5, 15.4, 17.3, 23.6, 22.7, 20.9, 18.2
Test whether the median yield (M) is 20 q/ha.
Solution We test $H_0: M = 20$ against $H_1: M\neq20$. To test $H_0$, we find the differences $(X-20)$ and write down their signs:
−  −  +  −  −  +  +  +  −
Here $n = 9$ and $r$ = number of '+' signs = 4. Under $H_0$, $r$ is a binomial variate with parameters $n = 9$ and $p = 0.5$.
To test $H_0$ against $H_1: M\neq20$ (i.e. $H_1: p\neq0.5$), the critical region $\omega$ is given by $r\le r_{\alpha/2}$ and $r\ge r'_{\alpha/2}$, where $r'_{\alpha/2}$ is the smallest integer and $r_{\alpha/2}$ is the largest integer such that
$$P\left[r\ge r'_{\alpha/2}\,\big|\,H_0\right] = \sum_{x=r'_{\alpha/2}}^9\binom9x\left(\frac12\right)^9\le\frac\alpha2 = 0.025,$$
i.e. $\sum_{x=0}^{r'_{\alpha/2}-1}\binom9x\left(\frac12\right)^9\ge0.975$, and
$$P\left[r\le r_{\alpha/2}\,\big|\,H_0\right] = \sum_{x=0}^{r_{\alpha/2}}\binom9x\left(\frac12\right)^9\le\frac\alpha2 = 0.025.$$
From the table we have $r'_{\alpha/2}-1 = 7$, i.e. $r'_{\alpha/2} = 8$, and $r_{\alpha/2} = 1$. Here $r_{\alpha/2} = 1<r = 4<r'_{\alpha/2} = 8$, so $H_0$ is accepted at the 5 % level of significance.
Example 6.4 For the problem given in Example 6.3, test H0 : M ¼ 20 against
H1 : M 6¼ 20 by using Wilcoxon signed-rank test.
Solution The differences $X_i-20$ are
−3.6, −0.8, 4.5, −4.6, −2.7, 3.6, 2.7, 0.9, −1.8
The ordered sequence of the numbers, ignoring sign, with ranks (mid-ranks for ties) and original signs, is:

|difference|   0.8   0.9   1.8   2.7   2.7   3.6   3.6   4.5   4.6
rank           1     2     3     4.5   4.5   6.5   6.5   8     9
sign           −     +     −     −     +     −     +     +     −

Thus $T^+$ = the sum of the positive ranks = $2+4.5+6.5+8 = 21$ and $T^-$ = the sum of the negative ranks = $1+3+4.5+6.5+9 = 24$. We note that $T^++T^- = \frac{n(n+1)}{2} = 45$.
To test $H_0: M = 20$ against $H_1: M\neq20$, the critical region $\omega$ is given by $T^+>C_4$ or $T^+<C_3$ at level $\alpha$; here we take $\alpha = 0.05$. From the table we have $P[T^+>39]\simeq0.025$ and, by the symmetry of $T^+$ about $n(n+1)/4 = 22.5$, $P[T^+<6]\simeq0.025$. Since $6<T^+ = 21<39$, $H_0$ is accepted at the 5 % level of significance.
(For the one-sample run test about the median, consider the ordered observations)

15, 18, 19, 20, 21, 22, 24, 28, 30, 32, 35  ⇒  Median = 22

Each original observation is replaced by a '+' or '−' sign according as it is larger or smaller than the median, i.e. 22. Any observation equal to the median is simply discarded. Thus we have, from the original observations,

21  19  22  18  20  24  15  32  35  28  30
−   −   ×   −   −   +   −   +   +   +   +
M F F M M M F M F F M M F M
Test whether the order of males and females in the queue was random.
Solution Here the null hypothesis is
H₀: the order of males and females in the queue was random, against
H₁: the order of males and females in the queue was not random.
For the given sequence,
MFFMMMFMFFMMFM
we have,
n1 = number of males = 8
n2 = number of females = 6
r = number of runs = 9
Since the observed value of r = 9 lies between the critical values 3 and 12, we
accept H0 at 5 % level of significance. It means that the order of males and females
in the queue was random.
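For larger samples the run count is referred to a normal approximation rather than exact tables; a sketch (our own helper, not from the text) that counts runs and forms the usual large-sample statistic:

```python
import math

def runs_test(seq):
    """Count runs in a two-symbol sequence and return the large-sample
    N(0,1) statistic; exact tables should be used for small n1, n2
    as in the example above."""
    labels = sorted(set(seq))
    n1, n2 = seq.count(labels[0]), seq.count(labels[1])
    r = 1 + sum(a != b for a, b in zip(seq, seq[1:]))
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return r, (r - mean) / math.sqrt(var)

print(runs_test(list("MFFMMMFMFFMMFM")))   # r = 9, |z| small: random
```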
We draw a random sample (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ) from F(x, y). We wish to test H₀: ξ_p(x − y) = ξ_p⁰; writing z = x − y, this becomes H₀: ξ_p(z) = ξ_p⁰, i.e. H₀: ξ_p = ξ_p⁰, where we write ξ_p = ξ_p(z).
Assumption: z = x − y is continuous in the neighbourhood of ξ_p(z). Note that Pr[z ≤ ξ_p] = p ⇒ Pr[z − ξ_p > 0] = q, q = 1 − p. We define S = total number of positive signs among z₁ − ξ_p⁰, z₂ − ξ_p⁰, …, zₙ − ξ_p⁰.
⇒ Under H₀, Pr[z − ξ_p⁰ > 0] = q and S ~ B(n, q). Proceed for Case 1, Case 2 and Case 3 as worked out already in Sect. 6.2.
Note Since ξ_p(x − y) is not necessarily equal to ξ_p(x) − ξ_p(y), the paired-sample sign test is a test for the quantile of the difference (not for the difference of the quantiles), whereas the paired t-test is a test for the mean of the difference (and also for the difference of the means).
This is another test used on matched pairs. It is more powerful than the sign test because it gives more weight to large numerical differences between the members of a pair than to small differences. Under matched-pair sampling, the differences d within the n paired sample values (x₁ᵢ, x₂ᵢ), i = 1, 2, …, n, are assumed to come from a continuous and symmetric population of differences. If M_d is the median of the population of differences, then the null hypothesis is M_d = 0 and the alternative hypothesis is one of M_d > 0, M_d < 0 or M_d ≠ 0.
The observed differences dᵢ = x₁ᵢ − x₂ᵢ are ranked in increasing order of absolute magnitude and the sum of ranks is computed for all the differences of like sign. The test statistic T is the smaller of these two rank-sums. Pairs with dᵢ = 0 are not counted. Under the null hypothesis, the expected values of the two rank-sums are equal. If the positive rank-sum is the smaller and is equal to or less than the table value, the null hypothesis is rejected at the corresponding level of significance α in favour of the alternative that M_d < 0; if the negative rank-sum is the smaller, the alternative is M_d > 0. If a two-tailed test is required, the alternative being M_d ≠ 0, the given levels of significance should be doubled.
Example 6.7 For nine animals, tested under control conditions and experimental
conditions, the following values of a measured variable were observed:
Animal 1 2 3 4 5 6 7 8 9
Control (x1) 21 24 26 32 55 82 46 55 88
Experimental (x2) 18 9 23 26 82 199 42 30 62
Test whether a significant difference exists between the medians, using (i) the
sign test and (ii) the Wilcoxon signed-ranks test.
Solution Let θ be the median of the distribution of differences. Our null hypothesis is H₀: θ = 0 against H₁: θ ≠ 0.
(i) Let dᵢ = x₁ᵢ − x₂ᵢ be the difference of the values under the control and experimental conditions.
Here we have 7 '+' signs among 9 non-zero values. Under H₀, the number r of '+' signs follows a binomial distribution with parameters n = 9 and p = 0.5. To test H₀: θ = 0 (i.e. H₀: p = 0.5) against H₁: θ ≠ 0 (i.e. H₁: p ≠ 0.5), the critical region ω is given by r ≥ r_{α/2} and r ≤ r′_{α/2}, where r_{α/2} is the smallest integer and r′_{α/2} is the largest integer such that

$$P\left[r \ge r_{\alpha/2}\,\middle|\,H_0\right] = \sum_{x=r_{\alpha/2}}^{9}\binom{9}{x}\left(\frac{1}{2}\right)^9 \le \frac{\alpha}{2} = 0.025,$$

i.e.
$$\sum_{x=0}^{r_{\alpha/2}-1}\binom{9}{x}\left(\frac{1}{2}\right)^9 \ge 0.975,$$

and
$$P\left[r \le r'_{\alpha/2}\,\middle|\,H_0\right] = \sum_{x=0}^{r'_{\alpha/2}}\binom{9}{x}\left(\frac{1}{2}\right)^9 \le \frac{\alpha}{2} = 0.025.$$

From the table we get r_{α/2} − 1 = 7 ⇒ r_{α/2} = 8, and r′_{α/2} = 1. For our example r = 7, which lies between r′_{α/2} (= 1) and r_{α/2} (= 8). So H₀ is accepted.
(ii) The observed differences dᵢ = x₁ᵢ − x₂ᵢ are ranked in increasing order of absolute magnitude and the sum of the ranks is computed for all the differences of like sign. Thus

dᵢ     3    15    3    6    −27    −117    4    25    26
Rank   1.5   5   1.5   4     8       9     3     6     7

The test statistic T is the smaller of the two rank-sums (one for positive dᵢ and one for negative dᵢ). Here T = 8 + 9 = 17. From the table, we reject H₀ at α = 0.05 if either T > 39 or T < 6. Since 6 < T = 17 < 39, we accept H₀.
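The same paired comparison can be run in scipy (a sketch; scipy's wilcoxon reports the smaller rank-sum, matching T above):

```python
from scipy.stats import wilcoxon

control      = [21, 24, 26, 32, 55,  82, 46, 55, 88]
experimental = [18,  9, 23, 26, 82, 199, 42, 30, 62]

stat, pvalue = wilcoxon(control, experimental)
print(stat, pvalue)   # stat = 17.0, p > 0.05 -> H0 accepted
```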
The pooled observations, arranged in increasing order, receive the ranks 1, 2, …, n; for instance,

3 < 5 < 7 < 8 < 9 < 10 < 11 < 17 < 20
1   2   3   4   5    6    7    8    9

The critical values t₁ and t₂ of the rank-sum T satisfy
$$P[T < t_1 \mid H_0] + P[T > t_2 \mid H_0] \le \alpha.$$

Since under H₀ the second-sample ranks are a random sample of size n₂ drawn without replacement from {1, 2, …, n},

$$E\left(\frac{T}{n_2}\,\Big|\,H_0\right) = \mu = \frac{n+1}{2} \;\Rightarrow\; E(T\mid H_0) = \frac{n_2(n+1)}{2},$$

$$V\left(\frac{T}{n_2}\,\Big|\,H_0\right) = \frac{n-n_2}{n-1}\cdot\frac{\sigma^2}{n_2} = \frac{n_1}{n-1}\cdot\frac{n^2-1}{12\,n_2} = \frac{n_1(n+1)}{12\,n_2}
\;\Rightarrow\; V(T\mid H_0) = \frac{n_1 n_2(n+1)}{12}.$$

The Mann–Whitney statistic U counts the pairs in which a second-sample observation exceeds a first-sample observation:

$$U = \sum_{j=1}^{n_2}\sum_{i=1}^{n_1} g(x_i,\, x_{n_1+j}) = \sum_{j=1}^{n_2}\sum_{i=1}^{n_1} g(R_i,\, R_{n_1+j}) = \sum_{j=1}^{n_2}\sum_{i=1}^{n_1} g(R_i,\, S_j),$$

where $\sum_{i=1}^{n_1} g(R_i, S_j)$ = number of first-sample ranks that are less than $S_j$. Since $j-1$ of the ranks below $S_j$ belong to the second sample, this number is $(S_j - 1) - (j - 1)$, and hence

$$U = \sum_{j=1}^{n_2}\left[(S_j - 1) - (j-1)\right] = \sum_{j=1}^{n_2}(S_j - j) = T - \frac{n_2(n_2+1)}{2}$$

$$\Rightarrow E(U\mid H_0) = E(T\mid H_0) - \frac{n_2(n_2+1)}{2} = \frac{n_1 n_2}{2},\qquad V(U\mid H_0) = V(T\mid H_0) = \frac{n_1 n_2(n+1)}{12}.$$
Therefore:
(1) For H₀: Δ = 0 against H₁: Δ > 0, ω₀: τ > τ_α;
(2) for H₀: Δ = 0 against H₂: Δ < 0, ω₀: τ < −τ_α;
(3) for H₀: Δ = 0 against H₃: Δ ≠ 0, ω₀: |τ| > τ_{α/2}.
B. Mood's Median Test
Here we test H₀: F₁(x) = F₂(x) against H₁: F₁(x) ≥ F₂(x), i.e. H₀: Δ = 0 against H₁: Δ > 0.
We draw a sample (x₁, x₂, …, x_{n₁}) of size n₁ from the 1st population and another sample (x_{n₁+1}, x_{n₁+2}, …, x_{n₁+n₂}) of size n₂ from the 2nd population.
We mix the two samples and arrange them in ascending order of magnitude, say x₍₁₎ < x₍₂₎ < ⋯ < x₍ₙ₎, and let x₍ₘ₎ = the combined sample median.
Define T = total number of 2nd-sample observations > x₍ₘ₎ = total number of 2nd-sample ranks > m. Here T is the test statistic.
Under H₁, T would be too large, and hence a right-tailed test is appropriate.
So for H₁: Δ > 0 ⇒ ω₀: T > t_α, where t_α is such that P_{H₀}[T ≥ t_α] ≤ α;
for H₂: Δ < 0 ⇒ ω₀: T < t′_α, where P_{H₀}[T ≤ t′_α] ≤ α; and
for H₃: Δ ≠ 0 ⇒ ω₀: T ≤ t₁ or T ≥ t₂, where t₁, t₂ are such that P_{H₀}[T ≤ t₁] + P_{H₀}[T ≥ t₂] ≤ α.
Null distribution of T: We want P(T = t | H₀).
Note that the totality of the pooled ranks (1, 2, …, n) comprises two subsets: {1, 2, …, m} and {m+1, m+2, …, n}. Under H₀, the second-sample ranks form a random sample without replacement of size n₂ from the entire set. Since T = number of 2nd-sample ranks exceeding m, the probability that there will be exactly t members from the 2nd subset in the random sample of size n₂ is given by the hypergeometric law:

$$P(T = t \mid H_0) = \frac{\dbinom{n-m}{t}\dbinom{m}{\,n_2-t\,}}{\dbinom{n}{n_2}}$$

$$\Rightarrow E(T\mid H_0) = \frac{n_2(n-m)}{n} \quad\text{and}\quad V(T\mid H_0) = \frac{n_1 n_2\, m(n-m)}{n^2(n-1)}.$$
For large samples (taking m ≈ n/2),

$$\tau = \frac{T - n_2/2}{\sqrt{\dfrac{n_1 n_2}{4n}}} \;\overset{a}{\sim}\; N(0, 1).$$

⇒ For H₁: Δ > 0 ⇒ ω₀: τ > τ_α;
for H₂: Δ < 0 ⇒ ω₀: τ < −τ_α;
and for H₃: Δ ≠ 0 ⇒ ω₀: |τ| > τ_{α/2}.
Case II The two populations differ in every respect, i.e. with respect to location, dispersion, skewness, kurtosis, etc.
C. Wald–Wolfowitz Run Test
H₀: F₁(x) = F₂(x) against H₁: F₁(x) ≠ F₂(x).
Here also we arrange the combined sample in ascending order x₍₁₎ < x₍₂₎ < ⋯ < x₍ₙ₎.
Suppose (R₁, …, R_{n₁}) are the ranks of the 1st-sample observations and (R_{n₁+1}, …, R_{n₁+n₂}) the ranks of the 2nd-sample observations. According to the ordered arrangement, we write
z_a = 0 if x₍ₐ₎ comes from the 1st sample,
    = 1 if x₍ₐ₎ comes from the 2nd sample.
We note that the 1st sample can be written as x₍R₁₎, x₍R₂₎, …, x₍R_{n₁}₎ and the 2nd sample as x₍R_{n₁+1}₎, x₍R_{n₁+2}₎, …, x₍R_{n₁+n₂}₎.
⇒ z_a = 0 if a ∈ (R₁, R₂, …, R_{n₁}); z_a = 1 if a ∈ (R_{n₁+1}, R_{n₁+2}, …, R_{n₁+n₂}).
(Note: Since U and V are not independent, the classical CLT for W = U + V is not applicable here. Still (6.1) holds, as shown by Wald and Wolfowitz using Stirling's approximation.) We write λ₁ = n₁/n and λ₂ = n₂/n, so that λ₁ + λ₂ = 1. Then

$$\tau = \frac{W - 2n\lambda_1\lambda_2}{\sqrt{4n\,\lambda_1^2\lambda_2^2}} \;\overset{a}{\sim}\; N(0, 1),$$

with ω₀: τ ≤ −τ_α (too few runs indicate that the two samples come from different populations).
D. Kolmogorov–Smirnov Test
Let X₁, X₂, …, X_{n₁} be from F₁ and X_{n₁+1}, X_{n₁+2}, …, X_n be from F₂. We are to test H₀: F₁(x) = F₂(x) ∀x. The empirical distribution functions are

$$F_{1n_1}(x) = \frac{\#\{x_a \le x;\; a = 1, 2, \ldots, n_1\}}{n_1}, \qquad F_{2n_2}(x) = \frac{\#\{x_b \le x;\; b = n_1+1, \ldots, n_1+n_2\}}{n_2}.$$
Test statistics:

$$D^{+}_{n_1,n_2} = \sup_x\{F_{1n_1}(x) - F_{2n_2}(x)\} \text{ for } H_1,\qquad D^{-}_{n_1,n_2} = \sup_x\{F_{2n_2}(x) - F_{1n_1}(x)\} \text{ for } H_2,$$
$$D_{n_1,n_2} = \sup_x|F_{1n_1}(x) - F_{2n_2}(x)| = \max\{D^{+}_{n_1,n_2},\, D^{-}_{n_1,n_2}\} \text{ for } H_3.$$

Let the 2nd-sample ranks be R_{n₁+1}, …, R_n with ordered values S₁ < S₂ < ⋯ < S_{n₂}; similarly, the 1st-sample ranks are R₁, R₂, …, R_{n₁} with ordered values S′₁ < S′₂ < ⋯ < S′_{n₁}. Then

$$D^{+}_{n_1,n_2} = \sup_x\{F_{1n_1}(x) - F_{2n_2}(x)\} = \max_{i=1,\ldots,n_1}\max\left\{\frac{i}{n_1} - \frac{S'_i - i}{n_2},\; 0\right\},$$

$$\text{and similarly}\quad D^{-}_{n_1,n_2} = \max_{j=1,\ldots,n_2}\max\left\{\frac{j}{n_2} - \frac{S_j - j}{n_1},\; 0\right\},\qquad D_{n_1,n_2} = \max\{D^{+}_{n_1,n_2},\, D^{-}_{n_1,n_2}\}.$$

Under H₀, D⁺, D⁻ and D are distribution-free. [Under H₀, the distribution of (S₁, S₂, …, S_{n₂}) and (S′₁, S′₂, …, S′_{n₁}) does not depend on the common F₁ = F₂.]
Critical region: under H₀ we expect D⁺, D⁻ and D to be very small. Hence right-tailed tests based on the D's are appropriate.
Asymptotic distribution:

For the one-sided test,
$$P_{H_0}\!\left[\sqrt{\frac{n_1 n_2}{n_1+n_2}}\; D^{+}_{n_1,n_2} \le z\right] \to 1 - e^{-2z^2}\quad\text{as } \min(n_1, n_2) \to \infty,\; z > 0.$$

Practically we find z such that $e^{-2z^2} = \alpha$ and reject H₀ if $\sqrt{\tfrac{n_1 n_2}{n_1+n_2}}\,(\text{observed } D^{+}_{n_1,n_2}) \ge z$.

For the two-sided test,
$$P_{H_0}\!\left[\sqrt{\frac{n_1 n_2}{n_1+n_2}}\; D_{n_1,n_2} \le z\right] \to 1 - 2\sum_{i=1}^{\infty}(-1)^{i-1} e^{-2i^2 z^2}\quad\text{as } \min(n_1, n_2) \to \infty.$$
The advantages of the K–S test over the homogeneity χ² test are as follows:
1. The K–S test is applicable to ungrouped data, while χ² is applicable to grouped data only.
2. Under H₀ the K–S statistic is exactly distribution-free, while χ² is only asymptotically distribution-free.
3. The K–S test is consistent against any alternative, while χ² is consistent only against specific alternatives.
Example 6.8 Twelve 4-year-old boys and twelve 4-year-old girls were observed
during two 15 min play sessions and each child’s play during these two periods was
scored as follows for incidence and degree of aggression:
Boys : 86; 69; 72; 65; 113; 65; 118; 45; 141; 104; 41; 50
Girls : 55; 40; 22; 58; 16; 7; 9; 16; 26; 36; 20; 15
Test the hypothesis that there were sex differences in the amount of aggression
shown, using (a) the Wald-Wolfowitz runs test, (b) the Mann–Whitney–Wilcoxon
test and (c) the Kolmogorov–Smirnov test.
Solution We want to test H0 : incidence and degree of aggression are the same in
four-year olds of both sexes against H1 : four-year-old boys and four-year-old girls
display differences in incidence and degree of aggression.
Score   7  9 15 16 16 20 22 26 36 40 | 41 45 50 | 55 58
Group   G  G  G  G  G  G  G  G  G  G |  B  B  B |  G  G
Run     ———————————— 1 —————————————   ——— 2 ———  — 3 —

Score  65 65 69 72 86 104 113 118 141
Group   B  B  B  B  B   B   B   B   B
Run    —————————————— 4 ———————————————
(b) Mann–Whitney–Wilcoxon test
The rank sum of the girls' scores is
R₂ = 1 + 2 + 3 + 4.5 + 4.5 + 6 + 7 + 8 + 9 + 10 + 14 + 15 = 84

$$U = n_1 n_2 + \frac{n_2(n_2+1)}{2} - R_2 = 144 + 78 - 84 = 138$$

or, equivalently,

$$U = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1 = 144 + 78 - 216 = 6.$$

The test statistic is the smaller of the two quantities, so here U = 6. The other value of U can be obtained from the relation U′ = n₁n₂ − U = 144 − 6 = 138. The critical value of U for a two-tailed test at α = 0.05 and n₁ = n₂ = 12 is 37. The observed U = 6 is less than the table value; hence it is significant at the 5 % level, and H₀ is rejected.
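A sketch of the same computation with scipy (mannwhitneyu returns U for its first argument, here 6 for the girls):

```python
from scipy.stats import mannwhitneyu

boys  = [86, 69, 72, 65, 113, 65, 118, 45, 141, 104, 41, 50]
girls = [55, 40, 22, 58, 16, 7, 9, 16, 26, 36, 20, 15]

stat, pvalue = mannwhitneyu(girls, boys, alternative='two-sided')
print(stat, pvalue)   # U = 6, p < 0.05 -> reject H0
```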
(c) Kolmogorov–Smirnov test
The scores of the boys and girls are presented in two frequency distributions shown
below:
Score (x)    No. of boys    No. of girls    F₁₂(x)    G₁₂(x)    |F₁₂(x) − G₁₂(x)|
7–20 0 6 0 6/12 6/12
21–34 0 2 0 8/12 8/12
35–48 2 2 2/12 10/12 8/12
49–62 1 2 3/12 12/12 9/12
63–76 4 0 7/12 12/12 5/12
77–90 1 0 8/12 12/12 4/12
91–104 1 0 9/12 12/12 3/12
105–118 2 0 11/12 12/12 1/12
119–132 0 0 11/12 12/12 1/12
133–146 1 0 12/12 12/12 0
D₁₂,₁₂ = sup|F₁₂(x) − G₁₂(x)| = 9/12. From the table, the critical value for n₁ = n₂ = 12 at level α = 0.05 is D₁₂,₁₂;₀.₀₅ = 6/12. Since D₁₂,₁₂ > D₁₂,₁₂;₀.₀₅, we reject H₀.
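For comparison, scipy's two-sample K–S test on the raw (ungrouped) scores; note that the book's D = 9/12 comes from the grouped table, while the raw data give D = 10/12 — the conclusion is the same:

```python
from scipy.stats import ks_2samp

boys  = [86, 69, 72, 65, 113, 65, 118, 45, 141, 104, 41, 50]
girls = [55, 40, 22, 58, 16, 7, 9, 16, 26, 36, 20, 15]

stat, pvalue = ks_2samp(boys, girls)
print(stat, pvalue)   # D = 10/12, p < 0.05 -> reject H0
```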
If
$$\Pr\left[\Pr(L \le X \le U) \ge \beta\right] = \gamma \tag{6.2}$$
then the interval (L, U) is called a 100β % tolerance interval with tolerance coefficient γ. L and U are called the lower and upper tolerance limits, respectively. If the determination of γ does not depend upon F, then the limits (L, U) are called non-parametric (distribution-free) tolerance limits. Taking L = x₍ᵣ₎ and U = x₍ₛ₎, r < s, the joint density of the pair of order statistics is

$$g(x_{(r)}, x_{(s)}) = \frac{n!}{(r-1)!\,(s-r-1)!\,(n-s)!}\,[F(x_{(r)})]^{r-1}\,[F(x_{(s)}) - F(x_{(r)})]^{s-r-1}\,[1 - F(x_{(s)})]^{n-s}\, f(x_{(r)})\, f(x_{(s)}),\quad x_{(r)} < x_{(s)},$$

so that with u = F(x₍ᵣ₎) and v = F(x₍ₛ₎),

$$g(u, v) = \frac{n!}{(r-1)!\,(s-r-1)!\,(n-s)!}\, u^{r-1}(v-u)^{s-r-1}(1-v)^{n-s},\quad 0 < u < v < 1.$$
Transforming to w = u and y = v − u,

$$g(w, y) = \frac{n!}{(r-1)!\,(s-r-1)!\,(n-s)!}\, w^{r-1}\, y^{s-r-1}\,(1 - w - y)^{n-s},$$

and integrating out w (substituting w = (1 − y)t),

$$g(y) = \frac{n!}{(r-1)!\,(s-r-1)!\,(n-s)!}\, y^{s-r-1}\int_0^{1-y} w^{r-1}(1-w-y)^{n-s}\,dw$$
$$= \frac{n!}{(r-1)!\,(s-r-1)!\,(n-s)!}\, y^{s-r-1}(1-y)^{n+r-s}\int_0^1 t^{r-1}(1-t)^{n-s}\,dt$$
$$= \frac{\Gamma(n+1)}{\Gamma(s-r)\,\Gamma(n+r-s+1)}\, y^{s-r-1}(1-y)^{n+r-s} = \frac{1}{B(s-r,\, n+r-s+1)}\, y^{s-r-1}(1-y)^{n+r-s},\quad 0 < y < 1.$$

Hence
$$\Pr\left[F(x_{(s)}) - F(x_{(r)}) \ge \beta\right] = \gamma \iff \Pr[y \ge \beta] = \gamma \iff \Pr[y \le \beta] = 1 - \gamma,$$

i.e.
$$\int_0^{\beta} g(y)\,dy = 1 - \gamma,\qquad\text{i.e.}\qquad \frac{\int_0^{\beta} y^{s-r-1}(1-y)^{n+r-s}\,dy}{B(s-r,\, n+r-s+1)} = 1 - \gamma,$$

i.e.
$$I_{\beta}(s-r,\; n+r-s+1) = 1 - \gamma. \tag{6.4}$$

Then (6.4) gives
$$\gamma = 1 - I_{\beta}(s-r,\; n+r-s+1) = \sum_{x=0}^{s-r-1}\binom{n}{x}\beta^x(1-\beta)^{n-x}.$$

So for given n, β and γ we can find s and r; in particular we may take x₍ᵣ₎ and x₍ₛ₎ symmetrically placed.
Suppose F(x) is continuous and a random sample (x₁, x₂, …, xₙ) is drawn from it. ξ_p is the p-th quantile, so that P[X ≤ ξ_p] = p. Define X₍ᵣ₎ and X₍ₛ₎ as the r-th and s-th order statistics, r < s. Then (X₍ᵣ₎, X₍ₛ₎) is said to be a 100(1 − α)% confidence interval for ξ_p if

$$\Pr\left[X_{(r)} \le \xi_p \le X_{(s)}\right] = 1 - \alpha. \tag{6.5}$$

Now,
$$\Pr\left[X_{(r)} \le \xi_p \le X_{(s)}\right] = \Pr[\text{at least } r \text{ of the observations} \le \xi_p] - \Pr[\text{at least } s \text{ of the observations} \le \xi_p]$$
$$= \sum_{x=r}^{s-1}\binom{n}{x}p^x(1-p)^{n-x} \tag{6.6}$$
$$= \sum_{x=0}^{s-1}\binom{n}{x}p^x(1-p)^{n-x} - \sum_{x=0}^{r-1}\binom{n}{x}p^x(1-p)^{n-x}$$
$$= \left[1 - I_p(s,\, n-s+1)\right] - \left[1 - I_p(r,\, n-r+1)\right] = I_p(r,\, n-r+1) - I_p(s,\, n-s+1).$$

Since Pr[X₍ᵣ₎ ≤ ξ_p ≤ X₍ₛ₎] = 1 − α, r and s are chosen so that this sum equals 1 − α. (6.7)
Given α and n, the selection of r and s satisfying (6.7) is not unique. We select the pair (r, s) for which (s − r) is minimum.
For symmetrically placed order statistics x₍ᵣ₎ and x₍ₛ₎, we select the pair (r, s) such that r + s = n + 1, i.e. s − 1 = n − r.

$$\Rightarrow\ \text{From (6.7)},\qquad 1 - \alpha = \sum_{x=r}^{n-r}\binom{n}{x}p^x(1-p)^{n-x}.$$
$$\Pr[s_1 < S < s_2] = 1 - \alpha \tag{6.9}$$

In order to obtain a confidence interval for ξ₁/₂ we need only translate the inequality on the LHS of (6.9) into an equivalent statement involving the order statistics and ξ₁/₂. We have seen earlier that

$$1 - \alpha = \Pr\left[X_{(r)} \le \xi_p \le X_{(s)}\right] = \sum_{x=r}^{s-1}\binom{n}{x}p^x(1-p)^{n-x}.$$

Now, for p = 1/2,

$$1 - \alpha = \Pr\left[X_{(r)} \le \xi_{1/2} \le X_{(s)}\right] = \sum_{x=r}^{s-1}\binom{n}{x}\left(\frac{1}{2}\right)^n = \Pr[r \le S \le s-1],\quad\text{as } S \sim B\!\left(n, \tfrac12\right)\text{ under } H_0.$$

$$\Rightarrow\ \Pr\left[X_{(r)} \le \xi_{1/2} \le X_{(s)}\right] = \Pr[r \le S \le s-1] = 1 - \alpha. \tag{6.10}$$

For large n, using the normal approximation,

$$\Pr\left[\frac{s_1 + 1 - n/2}{\sqrt{n/4}} \le \frac{S - n/2}{\sqrt{n/4}} \le \frac{s_2 - 1 - n/2}{\sqrt{n/4}}\right] = 1 - \alpha,$$

or, with the continuity correction,

$$\Pr\left[\frac{s_1 + 1 - n/2 - 0.5}{\sqrt{n/4}} \le \tau \le \frac{s_2 - 1 - n/2 + 0.5}{\sqrt{n/4}}\right] = 1 - \alpha,$$

i.e.
$$s_1 = \frac{n}{2} - 0.5 - \sqrt{n/4}\;\tau_{\alpha/2} \qquad\text{and}\qquad s_2 = \frac{n}{2} + 0.5 + \sqrt{n/4}\;\tau_{\alpha/2}. \tag{6.11}$$
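A small sketch implementing (6.11); the rounding of s₁ and s₂ to integer ranks is our own convention:

```python
import math
from scipy.stats import norm

def median_ci_ranks(n, alpha=0.05):
    """Order-statistic ranks (s1, s2) giving an approximate
    100(1 - alpha)% confidence interval for the median, via (6.11)."""
    z = norm.ppf(1 - alpha / 2)
    s1 = n / 2 - 0.5 - math.sqrt(n / 4) * z
    s2 = n / 2 + 0.5 + math.sqrt(n / 4) * z
    return math.floor(s1), math.ceil(s2)

print(median_ci_ranks(100))   # ~ (39, 61): use X_(39) and X_(61)
```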
When several tests of the same hypothesis H0 are made on the basis of independent
sets of data, it is quite likely that some of the tests will dictate rejection of the
hypothesis (at the chosen level of significance) while the others will dictate its
acceptance. In such a case, one would naturally like to have a means of combining
the results of the individual tests to reach a firm, overall decision. While one may
well apply the same test to the combined set of data, what we are envisaging is a
situation where only the values of the test statistics used are available.
Let us denote by Ti the statistic used in making the ith test (say, for i = 1, 2,...,k).
Commonly T1 ; T2 ; . . .; Tk will be statistics defined in the same way (like v2
statistics or t-statistics), but with varying sampling distributions simply because
they are based on varying sample sizes. To fix ideas, let us assume that in each case
the test requires that H0 be rejected if, and only if, the observed value of the
corresponding statistic be too large. Consider, in this situation, the probabilities
yᵢ = Pr[Tᵢ > tᵢ | H₀], for i = 1, 2, …, k.
Provided Tᵢ has a continuous distribution under H₀, say with probability density function gᵢ(t), so that $y_i = \int_{t_i}^{\infty} g_i(t)\,dt$, where tᵢ is the observed value of Tᵢ, yᵢ has the rectangular distribution over the interval [0, 1] under H₀, and hence −2 logₑ yᵢ has the χ² distribution with d.f. = 2. Consequently

$$P_k = -2\sum_{i=1}^{k}\log_e y_i$$

has, under H₀, the χ² distribution with 2k degrees of freedom. This statistic is used as the test statistic for making the combined test. One rejects H₀ if, and only if, the observed value of P_k exceeds χ²_{α,2k}.
The case where each individual test requires rejection of H₀ if, and only if, the observed value of the corresponding test statistic is too small, and the case where each individual test requires rejection if the observed value is either too large or too small, are dealt with similarly. The reason is that, if the Tᵢ have continuous distributions under H₀, then uᵢ = Pr[Tᵢ < tᵢ | H₀] and vᵢ = Pr[|Tᵢ| > |tᵢ| | H₀] are also rectangularly distributed over (0, 1). This implies that the statistics P_k appropriate to these situations, viz.

$$P_k = -2\sum_{i=1}^{k}\log_e u_i \qquad\text{and}\qquad P_k = -2\sum_{i=1}^{k}\log_e v_i,$$

are also distributed as χ² with d.f. = 2k under H₀. In each of these cases also, the overall decision is to reject H₀ if, and only if, the observed value of the respective P_k exceeds χ²_{α,2k}.
Example 6.9 In order to test whether the mean height (μ) of a variety of paddy plants, when fully grown, is 60 cm or less than 60 cm, five experimenters made independent (Student's) t-tests with their respective data. The probabilities of the t-statistics (with the appropriate d.f. in each case) being less than their respective observed values are 0.023, 0.061, 0.07, 0.105 and 0.007. If the tests are made at the 5 % level, then the hypothesis H₀: μ = 60 cm has to be accepted in three cases out of the five.
In order to combine the results of the 5 tests, we note that log₁₀ yᵢ, for i = 1, 2, 3, 4 and 5, are 2̄.36173, 2̄.78533, 2̄.23045, 1̄.02119 and 3̄.84510, respectively. Hence, for the data,

$$\sum \log_{10} u_i = -10 + 2.24380 = -7.75620,\qquad\text{so that}\qquad P_k = -2\sum\log_e u_i = 2 \times 2.30259 \times 7.75620 = 35.719.$$

This is to be compared with χ²₀.₀₅,₁₀ = 18.307 and χ²₀.₀₁,₁₀ = 23.205. Since the observed value of P_k exceeds the tabulated values, the combined result of the experimenters' tests leads to the rejection of H₀ at both the 5 % and the 1 % levels. In other words, in the light of all 5 experimenters' data, we may conclude that the mean height of the variety of paddy plants is less than 60 cm.
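Fisher's combination is available directly in scipy. One caution: the book's third tabulated logarithm 2̄.23045 corresponds to a probability of about 0.017 rather than the listed 0.07, so combining the five listed p-values directly gives P_k ≈ 32.9 instead of 35.72; either value exceeds the tabulated χ² points, so the conclusion is unchanged. A sketch:

```python
from scipy.stats import combine_pvalues, chi2

pvalues = [0.023, 0.061, 0.07, 0.105, 0.007]   # the five one-sided p-values
stat, pvalue = combine_pvalues(pvalues, method='fisher')
print(stat, pvalue)             # ~32.9, p << 0.01
print(chi2.ppf(0.99, df=10))    # ~23.2 -> reject H0 at the 1 % level
```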
increasing order of magnitude separately; if the X's and Y's have continuous distribution functions, we get a unique set of rankings. The data then reduce to n pairs of rankings. Let us write
R₁ₐ = rank of Xₐ, a = 1, 2, …, n;
R₂ₐ = rank of Yₐ, a = 1, 2, …, n.
The Pearsonian coefficient of correlation between the ranks R₁ₐ and R₂ₐ is called Spearman's rank correlation coefficient r_s, which is given by

$$r_s = \frac{\sum_{a=1}^{n}(R_{1a} - \bar{R}_1)(R_{2a} - \bar{R}_2)}{\left\{\sum_{a=1}^{n}(R_{1a} - \bar{R}_1)^2\;\sum_{a=1}^{n}(R_{2a} - \bar{R}_2)^2\right\}^{1/2}} = \frac{12\sum_{a=1}^{n}\left(R_{1a} - \frac{n+1}{2}\right)\left(R_{2a} - \frac{n+1}{2}\right)}{n(n^2-1)}.$$
If for n individuals Dₐ = R₁ₐ − R₂ₐ is the difference between the ranks of the a-th individual, a = 1, 2, …, n, the formula for Spearman's rank correlation reduces to

$$r_s = 1 - \frac{6\sum_{a=1}^{n} D_a^2}{n(n^2-1)}.$$

The value of r_s lies between −1 and +1. If X and Y are independent, then E(r_s) = 0. Also, if the population Spearman rank correlation coefficient ρ_s = 0, then E(r_s) = 0. Kendall in 1962 derived the frequency function of r_s and gave exact critical values of r_s, but the approximate test for r_s, which is the same as the t-test for the Pearsonian correlation coefficient, is good enough for all practical purposes. Here we test H₀: ρ_s = 0 against H₁: ρ_s ≠ 0. The test statistic

$$t = \frac{r_s\sqrt{n-2}}{\sqrt{1-r_s^2}}$$

has (n − 2) d.f. The decision about H₀ is taken in the usual way. For large samples, under H₀ the random variable $Z = r_s\sqrt{n-1}$ has approximately a standard normal distribution; the approximation is good for n ≥ 10.
B. Kendall's rank correlation coefficient
Kendall's rank correlation coefficient τ is suitable for paired ranks, as in the case of Spearman's rank correlation. Let (X₁, Y₁), (X₂, Y₂), …, (Xₙ, Yₙ) be a sample from a bivariate population. For any two pairs (Xᵢ, Yᵢ) and (Xⱼ, Yⱼ) we say that the relation is one of perfect concordance if both members of one pair exceed (or both fall below) the corresponding members of the other pair. Writing p_c and p_d for the probabilities of concordance and discordance,

τ = p_c − p_d.

The sample analogue is

$$T = \frac{1}{\binom{n}{2}}\sum_{i<j} s(x_j - x_i)\,s(y_j - y_i), \tag{6.12}$$

where s(r) = 1 if r > 0, s(r) = 0 if r = 0 and s(r) = −1 if r < 0. Naturally, E[s(x_j − x_i)\,s(y_j − y_i)] = p_c − p_d = τ.
The statistic T defined in (6.12) is known as Kendall's sample tau (τ) coefficient.
The procedure for calculating T consists of the following steps:
Step 1: Arrange the ranks of the first set (X) in ascending order and rearrange the ranks of the second set (Y) so that the n pairs of ranks remain the same.
Step 2: After Step 1, the ranks of X are in natural order. We are now left to determine how many pairs of ranks in the set Y are in their natural order and how many are not. A number is in natural order with a succeeding number if it is smaller than that number, and the pair is coded +1; if it is greater than the succeeding number, the pair is not in natural order and is coded −1. In this way all $\binom{n}{2}$ pairs of the set Y are considered and assigned the values +1 and −1.
$$T = \frac{S}{\binom{n}{2}} = \frac{\text{actual score}}{\text{maximum possible score}} = \frac{2S}{n(n-1)}.$$
Compute (i) Spearman's rank correlation coefficient (r_s) and Kendall's sample tau coefficient (T), and test their significance.
Solution (i) First we find dᵢ = xᵢ − yᵢ for all i, which are

d: −2 −4 −2 −1 3 2 4

Also, $\sum_{i=1}^{7} d_i^2 = 54$; thus

$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2-1)} = 1 - \frac{6 \times 54}{7 \times 48} = 0.036.$$

To test H₀: ρ_s = 0 against H₁: ρ_s ≠ 0, the statistic is

$$t_5 = \frac{r_s\sqrt{n-2}}{\sqrt{1-r_s^2}} = \frac{0.036\sqrt{7-2}}{\sqrt{1-(0.036)^2}} = 0.080.$$

From the table, t₀.₀₂₅,₅ = 2.571. The calculated |t| = 0.080 < 2.571; hence we accept H₀. It means there is no significant association between the ranks awarded by the two judges.
x 1 2 3 4 5 6 7
y 3 6 5 2 1 7 4
Solution (a) The heights of husband and wife are each ranked from 1 to 12 in increasing order of magnitude separately; let us denote their ranks by xᵢ and yᵢ respectively (i = 1, 2, …, 12).

xᵢ:            12  7 10  9  1  4 11  8  2  3  6  5
yᵢ:            11  2 12  4  5  1  9 10  3  6  7  8
dᵢ = xᵢ − yᵢ:   1  5 −2  5 −4  3  2 −2 −1 −3 −1 −3

$\sum d_i^2 = 108.$ Thus

$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2-1)} = 1 - \frac{6 \times 108}{12 \times 143} = 0.6224.$$

We write below the ranks of x in natural order and the corresponding ranks of y:

xᵢ:  1  2  3  4  5  6  7  8  9 10 11 12
yᵢ:  5  3  6  1  8  7  2 10  4 12  9 11

Total number of scores = $\binom{12}{2}$ = (12 × 11)/2 = 66.
Actual score S = 3 + 6 + 3 + 8 + 1 + 2 + 5 + 0 + 3 − 2 + 1 = 30 (the procedure for calculating S is explained in Example 6.10 (ii)).
Thus T = 30/66 = 0.4545.
(b) To test H₀: ρ_s = 0 against H₁: ρ_s ≠ 0, the approximate test statistic is

$$Z = r_s\sqrt{n-1} = 0.6224\sqrt{11} = 2.06 \sim N(0, 1).$$

Since the calculated |Z| = 2.06 > Z₀.₀₂₅ = 1.96, we reject H₀. It means that the heights of husband and wife are not independent.
To test H₀: τ = 0 against H₁: τ ≠ 0, the approximate test statistic is

$$Z = \frac{3}{2}\sqrt{n}\; T = \frac{3}{2}\sqrt{12} \times 0.4545 = 2.36 \sim N(0, 1).$$

Since the calculated |Z| = 2.36 > Z₀.₀₂₅ = 1.96, we reject H₀ and conclude that there is an association between the heights of husband and wife.
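Both coefficients are available in scipy; a sketch (scipy's p-values rest on slightly different approximations than the Z statistics above):

```python
from scipy.stats import spearmanr, kendalltau

husband = [12, 7, 10, 9, 1, 4, 11, 8, 2, 3, 6, 5]   # ranks of x
wife    = [11, 2, 12, 4, 5, 1, 9, 10, 3, 6, 7, 8]   # ranks of y

rs, p_rs = spearmanr(husband, wife)
tau, p_tau = kendalltau(husband, wife)
print(rs, tau)   # 0.6224 and 0.4545, matching the text
```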
Chapter 7
Statistical Decision Theory
7.1 Introduction
In this chapter we discuss the problems of point estimation, hypothesis testing and
interval estimation of a parameter from a different standpoint.
Before we start the discussion, let us first define certain terms commonly used in statistical inference problems and decision theory. Let X₁, X₂, …, Xₙ denote a random sample of size n from a distribution that has the p.d.f. f(x, θ), where θ is an unknown state of nature or an unknown parameter and Θ is the set of all possible values of θ, i.e. the parameter space (known).
To make some inference about θ, i.e. to take some decision or action about θ, the statistician acts on the basis of the sample point (x₁, x₂, …, xₙ).
Let us define the loss function L(θ, a): the loss incurred when the true state of nature is θ and action a is taken.

Testing of hypotheses about θ
To test H₀: θ ≤ θ₀ (a given value of θ) against H₁: θ > θ₀, the action space is 𝒜 = {a₀, a₁}, where a₀ = accept H₀ and a₁ = accept H₁. Here the simple (0–1) loss function is

            a₀    a₁
θ ≤ θ₀       0     1
θ > θ₀       1     0

or an assigned-value loss function,

            a₀    a₁
θ ≤ θ₀      l₀₀   l₀₁        with l₀₀ < l₀₁ and l₁₁ < l₁₀,
θ > θ₀      l₁₀   l₁₁

or a (0–ω) type loss function,

            a₀      a₁
θ ≤ θ₀       0     ω₁(θ)     with ω₁(θ) increasing in θ₀ − θ
θ > θ₀     ω₂(θ)    0        and ω₂(θ) increasing in θ − θ₀.
Interval estimation
Here we are to choose one interval from the action space. The model: P_θ: P_θ[X ∈ A], or F_θ(x) = P_θ[X ≤ x], or f_θ(x) = the p.d.f. or p.m.f. of X.
If d(x) = the action taken, the loss incurred under θ is L(θ, d(x)). If d(x) = a decision rule, then the loss incurred (under θ), L(θ, d(X)), is a real-valued random variable, and the expected loss under θ, E_θ L(θ, d(X)) = R_d(θ), is called the risk of d(X) under θ.
Let us restrict attention to rules d(x) for which R_d(θ) < ∞ ∀θ, and let D = the set of all such d(x)'s. R_d(θ) gives a preference pattern on D for given θ: the smaller the risk, the better the decision rule d(x).
Thus, (Θ, 𝒜, L) → (Θ, D, R).
Example 7.2 Point estimation of a real θ: 𝒜 = Θ.
d: 𝔛 → 𝒜 (= Θ); d(x) = a point estimator of θ.
For squared error loss, R_d(θ) = E_θ(d(X) − θ)² = the MSE of d(X) under θ.

Example 7.3 𝒜 = {a₀, a₁}, aᵢ = accept Hᵢ, i = 0, 1.
d: 𝔛 → {a₀, a₁}
ω₀ = {x : d(x) = a₀} = acceptance region;
ω₁ = {x : d(x) = a₁} = rejection region.
⇒ d(x) = a₀ if x ∈ ω₀, and a₁ if x ∈ ω₁;
ω₀ and ω₁ are disjoint and ω₀ + ω₁ = 𝔛.
If θ ∈ Θ₁, R_d(θ) = P_θ{d(X) = a₀} = P_θ{X ∈ ω₀} = 1 − P_θ{X ∈ ω₁} = the probability of a type II error.
Interval estimation of a real θ
𝒜 = the set of all possible intervals of Θ; d: 𝔛 → 𝒜.
D = the set of all non-randomized decision rules (with finite risks ∀θ).
For testing, 𝒜 = {a₀, a₁}, aᵢ = accept Hᵢ, i = 0, 1; for a randomized test φ,
L(θ, a) averages to φ·0 + (1 − φ)·1 = 1 − φ for θ ∈ Θ₁, so that
R_φ(θ) = E_θ φ(X) for θ ∈ Θ₀ and R_φ(θ) = E_θ[1 − φ(X)] for θ ∈ Θ₁.

With two sample points x₁, x₂ and two actions a₁, a₂, the non-randomized rules are
d₁: d₁(x₁) = a₁, d₁(x₂) = a₁;   d₂: d₂(x₁) = a₂, d₂(x₂) = a₂;
d₃: d₃(x₁) = a₁, d₃(x₂) = a₂;   d₄: d₄(x₁) = a₂, d₄(x₂) = a₁.
A mixed (randomized) rule d chooses dᵢ with probability pᵢ, so that

$$R_d(\theta) = \sum_{i=1}^{4} p_i R_{d_i}(\theta).$$

We shall consider only mixed rules d for which R_d(θ) is finite ∀θ, and shall denote by D* the class of all such mixed decision rules. Clearly D ⊂ D*, since a non-randomized rule d = a probability distribution over D degenerate at d.
First mode of randomization:
(Θ, 𝒜, L) → (Θ, D, R) → (Θ, D*, R)
D* = {(φ₁, φ₂) : 0 ≤ φ₁, φ₂ ≤ 1}.
If one chooses a d ∈ D*, then
a₁ is chosen with probability p₁ + p₃ for X = x₁;
a₂ is chosen with probability 1 − (p₁ + p₃) = p₂ + p₄;
a₁ is chosen with probability p₁ + p₄ for X = x₂;
a₂ is chosen with probability 1 − (p₁ + p₄) = p₂ + p₃.
Note
1. d₁ ⪰ d₂ ⇒ either d₁ ≻ d₂ or d₁ ≅ d₂; and d₁ ≻ d₂ ⇒ d₁ ⪰ d₂.
2. d₁ ≻ d₂ and d₂ ≻ d₃ ⇒ d₁ ≻ d₃, and similarly for ⪰.
3. It may so happen that neither d₁ ≻ (or ⪰) d₂ nor d₂ ≻ (or ⪰) d₁; in such a case d₁ and d₂ are non-comparable. Thus ≻ (or ⪰) gives only a partial ordering of the rules in D*.
Fig. 7.1 Risk functions R_{d₁}(θ), R_{d₂}(θ) and R_{d′}(θ) plotted against θ.
Some relationships between a complete (or a minimal complete) class and the class of all admissible rules
Let A = the class of all admissible rules.
Result 1 For any complete class C, A ⊂ C, i.e. any complete class C contains all admissible rules.
Suppose, to the contrary, that for some admissible rule
d₀ ∉ A  (7.1)
fails; then by completeness there exists a d₁ ∈ C such that
d₁ ≻ d₀.  (7.2)
(Let d ∉ C₁.
Case 1: d = d₀. By (7.2) there exists a d₁ ∈ C, and hence d₁ ∈ C₁, such that d₁ ≻ d₀.
Case 2: d ≠ d₀. Then d ∉ C, so there exists a d′ ∈ C such that d′ ≻ d.
  A: d′ = d₀. By (7.2) there exists a d₁ ∈ C, and hence ∈ C₁, such that d₁ ≻ d₀ ≻ d.
  B: d′ ≠ d₀. Then d′ ∈ C₁, so there exists a d′ ∈ C₁ such that d′ ≻ d.
Thus, given any d ∉ C₁, in all cases there exists a d′ ∈ C₁ such that d′ ≻ d ⇒ C₁ is complete.)
Now (7.3) contradicts that C is minimal complete, and hence (7.1) must be false ⇒ C ⊂ A. So C = A.
(Figure: illustration of a convex set and a non-convex set.)
Let S be a convex subset of ℝᵏ and f(x) a real-valued function defined on S (Fig. 7.4).
f(x) is said to be a convex function if for any two x, y ∈ S and for any 0 ≤ α ≤ 1,

$$f(\alpha x + (1-\alpha)y) \le \alpha f(x) + (1-\alpha) f(y). \tag{7.4}$$

If strict inequality holds in (7.4) whenever x ≠ y, f(x) is said to be strictly convex.
Examples 7.10 On ℝ¹, f(x) = x² and f(x) = eˣ are strictly convex, while f(x) = |x| is convex.
Lemma 2 (Jensen's inequality) Let S be a convex subset of ℝᵏ and f(x) a real-valued convex function defined on S. Let Z be a random variable such that P[Z ∈ S] = 1 and E(Z) exists. Then (i) E(Z) ∈ S; (ii) E f(Z) ≥ f(E(Z)).
If f is strictly convex, then strict inequality holds in (ii) unless the distribution of Z is degenerate.
Let 𝒜 be a convex subset of ℝᵏ. The loss function L(θ, a) is said to be convex (or strictly convex) if for each given θ, L(θ, a) is a convex (or strictly convex) function of a.
Example 7.11 With 𝒜 = ℝ¹, L(θ, a) = (θ − a)² is strictly convex, and L(θ, a) = |θ − a| is convex.
$$E(Z^x)\ \text{exists for each}\ x \in \mathfrak{X}, \tag{7.5}$$
where, for a randomized rule δ ∈ D*, Z^x denotes a random variable with probability distribution δ(x) over 𝒜.
Let D = the class of all non-randomized rules, D ⊂ D*.
Lemma 3 Let 𝒜 be a convex subset of ℝᵏ and let the loss function be convex. Then for each δ ∈ D* satisfying (7.5) there exists a d′ ∈ D, viz. d′(x) = E(Z^x), such that d′ ⪰ δ. If the loss function is strictly convex, then d′ ≻ δ unless δ itself ∈ D.
Corollary 1 Let 𝒜 be a convex subset of ℝᵏ, let the loss function be strictly convex, and let every δ ∈ D* satisfy (7.5); then D (= the class of all non-randomized rules) is essentially complete.
Proof of Lemma 3 Let δ ∈ D*.
δ(x) is a probability distribution over 𝒜. For each x, let Z^x be a random variable with probability distribution δ(x), and define d′(x) = E(Z^x). By (i) of Lemma 2, d′(x) ∈ 𝒜 ∀x, i.e. d′ = d′(X) ∈ D. Also, by (ii) of Lemma 2,

$$L(\theta, d'(x)) = L(\theta, E(Z^x)) \le E\,L(\theta, Z^x) = L(\theta, \delta(x)). \tag{7.6}$$

If the loss function is strictly convex, strict inequality holds in (7.6) for at least one θ unless the Z^x-distribution is degenerate for all x, except possibly for x ∈ A with P_θ[X ∈ A] = 0 ∀θ, in which case δ itself ∈ D. Hence d′ ≻ δ unless δ itself ∈ D. □
Corollary 2 Let 𝒜 be a convex subset of ℝᵏ, let the loss function be strictly convex, and let every δ ∈ D* satisfy (7.5). Let T be a sufficient statistic and D₀ = the class of non-randomized rules based on T, D₀ ⊂ D. Then D₀ is essentially complete (complete).
Proof Let δ ∈ D*, and let D₀* = the class of all randomized decision rules based on T. By Lemma 1, there exists a δ₀ = δ₀(T) ∈ D₀* such that δ₀ ≅ δ. For each t, δ₀(t) is a probability distribution over 𝒜. Define Z^t = a random variable with probability distribution δ₀(t), and d′(t) = E(Z^t). As in the proof of Lemma 3, d′(t) ∈ 𝒜, i.e. d′ = d′(T) ∈ D₀, and d′ ⪰ δ₀ (≻ δ₀ for a strictly convex loss function unless δ₀ ∈ D and δ₀ ≅ d′). Thus, given any δ ∈ D*, there exists a d′ ∈ D₀ such that d′ ⪰ δ (≻ δ for a strictly convex loss function unless δ ∈ D₀).
⇒ D₀ is essentially complete (complete). □
Note On the condition stated by (7.5):
Let δ ∈ D*, and let Z^x be a random variable with probability distribution δ(x) over 𝒜. L(θ, δ(x)) = E L(θ, Z^x), which exists for each x and θ. In many cases this implies that (7.5) holds.
Example 7.12 k = 1, L(θ, a) = (θ − a)², 𝒜 = ℝ¹.
E L(θ, Z^x) = E(θ − Z^x)² exists ∀x and ∀θ ⇒ E(Z^x) exists ∀x.
For L(θ, a) = |θ − a|:
E L(θ, Z^x) = E|Z^x − θ| ≥ E|Z^x| − |θ|, i.e. E|Z^x| ≤ |θ| + E|Z^x − θ|.
Thus E|Z^x − θ| existing ∀x and ∀θ ⇒ (7.5) holds.
For k ≥ 2, 𝒜 = ℝᵏ:

$$L(\theta, a) = \sum_{i=1}^{k} |a_i - \theta_i|^2 = \lVert a - \theta\rVert^2,$$

and E L(θ, Z^x) = E‖Z^x − θ‖², which exists ∀x and ∀θ ⇒ (7.5) holds.
Fig. 7.5 A loss function L(θ, a) dominated by a line c₁a + c₂.
Rao–Blackwell Theorem For L(θ, a) = (θ − a)², R_d(θ) existing ∀θ ⇒ E_θ(d(X)) exists ⇒ (7.7) holds. Thus the proposition gives a sufficient condition on the loss function for (7.7) to hold.
d₁ ⪰ d₂ if R_{d₁}(θ) ≤ R_{d₂}(θ) ∀θ; this is the natural partial ordering of decision rules. d₀ ∈ D* is said to be best or optimal if d₀ ⪰ d ∀ d ∈ D*, but generally such an optimal rule does not exist.
Example To estimate a real parameter θ with 𝔛 = 𝒜 = Θ = (−∞, ∞), let L(θ, a) = (θ − a)². If possible, suppose there exists a best rule, say d₀. Consider any given value of θ, say θ₀, and define d′(x) = θ₀ ∀x. Clearly R_{d′}(θ₀) = 0 ⇒ R_{d₀}(θ₀) = 0, since d₀ ⪰ d′. Since θ₀ is arbitrary, we must have R_{d₀}(θ) = 0 ∀θ, which is generally impossible.
⇒ Generally there does not exist a best rule.
So to find a reasonably good decision rule we need some additional principles. Two such principles are generally followed:
(i) the restriction principle;
(ii) the linear ordering principle.
Restriction principle: Put some reasonable restrictions on the decision rules, i.e. consider a reasonably restricted sub-class of decision rules having good overall performance, and then try to find a best rule within this restricted sub-class. Two restriction criteria often used are (i) unbiasedness and (ii) invariance.
Linear ordering principle: For every d, replace the risk function by a representative number and then compare the rules in terms of these representative numbers. If the representative number of d₁ ≤ the representative number of d₂, then we prefer d₁ to d₂. d₀ is considered optimal if the representative number of d₀ ≤ the representative number of d for all d ∈ D*.
Thus a linear ordering principle is a way of specifying the representative number.
Note Any linear ordering principle should not disagree with the partial ordering principle, i.e. if d₁ ⪰ d₂ we must have the representative number of d₁ ≤ the representative number of d₂.
Two linear ordering principles that are used in general are
(i) the Bayes principle;
(ii) the minimax principle.
If Θ is a non-degenerate interval of ℝᵏ, let τ(θ) be the p.d.f. of a (continuous) prior distribution over Θ. The Bayes risk of d is

$$r(\tau, d) = \int_{\Theta} R_d(\theta)\,\tau(\theta)\,d\theta.$$

Rules are compared with respect to r(τ, d): if r(τ, d₁) ≤ r(τ, d₂), then d₁ is preferred to d₂. A d₀ ∈ D* is considered optimal if it minimizes r(τ, d) with respect to d ∈ D*. Such a d₀ is called a Bayes rule with respect to the prior τ.
Definition A rule d₀ ∈ D* is said to be a Bayes rule with respect to a prior τ if it minimizes the Bayes risk r(τ, d) with respect to d ∈ D*, i.e. if r(τ, d₀) = inf_{d∈D*} r(τ, d).
Note
1. A Bayes rule may or may not exist. If it exists, the inf is attained (min).
2. A Bayes rule depends on the prior τ.
3. A Bayes rule, even if it exists, may not be unique.
4. The Bayes principle does not disagree with the partial ordering principle, i.e. R_{d₁}(θ) ≤ R_{d₂}(θ) ∀θ ⇒ r(τ, d₁) ≤ r(τ, d₂), whatever be τ.
Minimax principle
For a d ∈ D*, the representative number is taken as sup_{θ∈Θ} R_d(θ) = the maximum risk that may be incurred due to the choice of d. d₁ is preferred to d₂ if sup_{θ∈Θ} R_{d₁}(θ) ≤ sup_{θ∈Θ} R_{d₂}(θ).
d₀ is considered optimal if it minimizes sup_{θ∈Θ} R_d(θ) with respect to d ∈ D*. Such a d₀ is called a "minimax rule".
Definition A rule d₀ ∈ D* is said to be a minimax rule if it minimizes sup_{θ∈Θ} R_d(θ) with respect to d ∈ D*, i.e. if

$$\sup_{\theta\in\Theta} R_{d_0}(\theta) = \inf_{d\in D^*}\,\sup_{\theta\in\Theta} R_d(\theta).$$
Notes
1. A minimax rule may or may not exist.
2. A minimax rule does not involve any prior τ.
3. A minimax rule, even if it exists, may not be unique.
4. The minimax principle does not disagree with the partial ordering principle.
Let τ be a given prior. To find a Bayes rule d₀ with respect to τ is to find a rule d₀ that minimizes the Bayes risk r(τ, d) with respect to d.
Proposition If a Bayes rule d₀ with respect to a given prior τ exists, then there exists a non-randomized rule d′ which is Bayes with respect to τ.
Implication For finding a Bayes rule, we can, without any loss of generality, consider non-randomized rules only.
Proof Let d₀ be a Bayes rule with respect to τ; d₀ may be considered as a probability distribution over D (= the class of non-randomized rules).
Let Z be a random variable with probability distribution d₀ over D. Then, by (7.8), we must have equality in (7.9), and consequently Z must lie in D₀ with probability 1, where D₀ = {d : d ∈ D, r(τ, d) = r(τ, d₀)}. □
Consider any d′ ∈ D₀; then r(τ, d′) = r(τ, d₀) = inf_{d∈D*} r(τ, d) (since d₀ is Bayes)
⇒ d′ is also Bayes. This proves the Proposition.
Note It is clear from the proof that
(1) a randomized Bayes rule = a probability distribution over D₀, i.e. over the class of non-randomized Bayes rules;
(2) if the non-randomized Bayes rule is unique, i.e. D₀ consists of a single d′, then the Bayes rule is unique and is d′.
Assuming the interchange of sum and minimization is permissible, suppose there exists a d₀ = d₀(x) such that for each x, d₀(x) minimizes

$$\sum_{\theta\in\Theta} \tau(\theta)\, p(x/\theta)\, L(\theta, d(x))$$

with respect to d(x) ∈ 𝒜. Then clearly d₀ minimizes (7.10) w.r.t. d ∈ D ⇒ d₀ is a Bayes rule with respect to τ. Here
p(x/θ) = conditional p.m.f. of X given θ;
τ(θ) = marginal p.m.f. of θ;
p(x/θ)τ(θ) = joint p.m.f. of X and θ;

$$p(x) = \sum_{\theta\in\Theta} p(x/\theta)\,\tau(\theta) = \text{marginal p.m.f. of } X;$$

$$q(\theta/x) = \frac{p(x/\theta)\,\tau(\theta)}{p(x)} = \text{conditional (posterior) p.m.f. of } \theta \text{ given } X = x.$$

Then
$$r(\tau, d) = E_\theta R_d(\theta) = E_\theta\!\left[E_{X/\theta} L(\theta, d(X))\right] \quad [\because R_d(\theta) = E_{X/\theta} L(\theta, d(X))]$$
$$= E_X\, E_{\theta/X}\, L(\theta, d(X)),$$

which is minimized by minimizing, for each X = x, E_{θ/X=x} L(θ, d(x)) with respect to d(x) ∈ 𝒜.
Applications
1. Estimation of a real parameter θ under squared error loss. To estimate a real parameter θ, where 𝒜 = Θ = ℝ¹ or an open interval of it:
L(θ, a) = (θ − a)², τ(θ) = a prior p.d.f. of θ.
We minimize

$$E\{L(\theta, d(x))/X = x\} = \int_{\Theta} (\theta - d(x))^2\, q(\theta/x)\,d\theta \quad\text{w.r.t. } d(x) \in \mathcal{A}.$$

Clearly the minimizing d₀(x) is the posterior mean:

$$d_0(x) = E(\theta/X = x) = \int_{\Theta} \theta\, q(\theta/x)\,d\theta = \frac{\int_{\Theta} \theta\, p(x/\theta)\,\tau(\theta)\,d\theta}{\int_{\Theta} p(x/\theta)\,\tau(\theta)\,d\theta}.$$
Example 7.14

$$p(x/\theta) = \frac{1}{\theta},\quad 0 < x < \theta.$$

With the prior proportional to e^{−θ} on θ > x, the mean of the posterior distribution of θ given X = x is

$$d_0(x) = \frac{\int_x^\infty \theta\, e^{-\theta}\,d\theta}{\int_x^\infty e^{-\theta}\,d\theta} = e^{x}\left[-\theta e^{-\theta}\Big|_x^\infty + \int_x^\infty e^{-\theta}\,d\theta\right] = e^{x}\left(x e^{-x} + e^{-x}\right) = x + 1.$$

Thus the unique Bayes estimator of θ w.r.t. τ is d₀(X) = X + 1.
Example 7.15 X ~ Bin(n, θ), n given, 0 < θ < 1. To estimate θ under squared error loss:

$$p(x/\theta) = \binom{n}{x}\theta^x(1-\theta)^{n-x},\quad x = 0, 1, \ldots, n.$$

Let the prior p.d.f. of θ be the Beta prior

$$\tau(\theta) = \frac{1}{B(\alpha, \beta)}\,\theta^{\alpha-1}(1-\theta)^{\beta-1},\quad \alpha, \beta > 0.$$

The posterior of θ given X = x is then Beta(x + α, n − x + β), so the unique Bayes estimator of θ w.r.t. the Beta(α, β) prior is

$$d_0(x) = \frac{x + \alpha}{n + \alpha + \beta}.$$
Similarly, let X ~ Poisson(θ):

$$p(x/\theta) = e^{-\theta}\frac{\theta^x}{x!},\quad x = 0, 1, \ldots,$$

with the Gamma prior τ(θ) = (α^β/Γ(β)) e^{−αθ} θ^{β−1}, θ > 0. Then

$$q(\theta/x) = \frac{e^{-\theta}\dfrac{\theta^x}{x!}\cdot\dfrac{\alpha^\beta}{\Gamma(\beta)}e^{-\alpha\theta}\theta^{\beta-1}}{\displaystyle\int_0^\infty e^{-\theta}\frac{\theta^x}{x!}\cdot\frac{\alpha^\beta}{\Gamma(\beta)}e^{-\alpha\theta}\theta^{\beta-1}\,d\theta} = \frac{(1+\alpha)^{x+\beta}}{\Gamma(x+\beta)}\, e^{-(1+\alpha)\theta}\,\theta^{x+\beta-1},$$

i.e. the posterior is Gamma with rate 1 + α and shape x + β. Hence

$$d_0(x) = \frac{(1+\alpha)^{x+\beta}}{\Gamma(x+\beta)}\int_0^\infty e^{-(1+\alpha)\theta}\,\theta^{x+\beta}\,d\theta = \frac{(1+\alpha)^{x+\beta}}{\Gamma(x+\beta)}\cdot\frac{\Gamma(x+\beta+1)}{(1+\alpha)^{x+\beta+1}} = \frac{x+\beta}{1+\alpha},$$

i.e.
$$d_0(x) = \frac{x+\beta}{1+\alpha}.$$
Notes
1. d₀(x) is also the (unique) Bayes rule if L(θ, a) = c(θ − a)², where c is a given constant.
2. If a sufficient statistic T exists, we may consider rules based on T only (because of the essential completeness of the rules based on T) and then find the Bayes rule based on T.
(For X₁, …, Xₙ i.i.d. N(θ, 1) with prior θ ~ N(0, σ²) and t = x̄, completing the square gives)

$$q(\theta/t) \propto e^{-\frac{n}{2}(t-\theta)^2}\, e^{-\frac{\theta^2}{2\sigma^2}} \propto \exp\left[-\frac{n\sigma^2+1}{2\sigma^2}\left(\theta - \frac{n\sigma^2}{n\sigma^2+1}\,t\right)^2\right]$$

$$\Rightarrow\ \text{given } t,\qquad \theta \sim N\!\left(\frac{n\sigma^2}{n\sigma^2+1}\,t,\;\frac{\sigma^2}{n\sigma^2+1}\right).$$

Posterior mean = K x̄, with K = nσ²/(nσ² + 1), and the Bayes risk is

$$r(\tau, d_0) = E_x\, E_{\theta/x}(d_0 - \theta)^2 = E_x\!\left[\frac{\sigma^2}{n\sigma^2+1}\right] = \frac{\sigma^2}{n\sigma^2+1}.$$
Applications
2. Estimation of a real θ under weighted squared error loss: θ = a real parameter. To estimate θ, 𝒜 = Θ = some interval of ℝ¹.
Let L(θ, a) = w(θ)(θ − a)², w(θ) > 0.
d₀ is Bayes if for each x, d₀(x) minimizes

$$E_{\theta/X=x}\!\left[w(\theta)(\theta - d(x))^2\right] = \int w(\theta)(\theta - d(x))^2\, q(\theta/x)\,d\theta,$$

which gives

$$d_0(x) = \frac{\int_\Theta \theta\, w(\theta)\, q(\theta/x)\,d\theta}{\int_\Theta w(\theta)\, q(\theta/x)\,d\theta}.$$

For X ~ Bin(n, θ), w(θ) = 1/{θ(1 − θ)} and the uniform prior, so that q(θ/x) is the Beta(x + 1, n − x + 1) density:

$$d_0(x) = \frac{\displaystyle\int_0^1 \frac{\theta}{\theta(1-\theta)}\cdot\frac{\theta^x(1-\theta)^{n-x}}{B(x+1,\,n-x+1)}\,d\theta}{\displaystyle\int_0^1 \frac{1}{\theta(1-\theta)}\cdot\frac{\theta^x(1-\theta)^{n-x}}{B(x+1,\,n-x+1)}\,d\theta} = \frac{\int_0^1 \theta^x(1-\theta)^{n-x-1}\,d\theta}{\int_0^1 \theta^{x-1}(1-\theta)^{n-x-1}\,d\theta} = \frac{B(x+1,\, n-x)}{B(x,\, n-x)} = \frac{x}{n},$$

for x = 1, 2, …, n − 1.
For x = 0,

$$\int w(\theta)(\theta - d_0(0))^2\, q(\theta/x=0)\,d\theta \propto \int_0^1 (\theta - c)^2\,\theta^{-1}(1-\theta)^{n-1}\,d\theta \quad [\text{taking } d_0(0) = c]$$

is finite if c = 0 and infinite if c ≠ 0.
⇒ ∫ w(θ)(θ − d₀(0))² q(θ/x = 0) dθ is minimized for d₀(0) = 0 = x/n.
Similarly, for x = n, ∫ w(θ)(θ − d₀(n))² q(θ/x = n) dθ is minimized for d₀(n) = 1 = x/n.
Thus for every x = 0, 1, 2, …, n, d₀(x) = x/n minimizes ∫ w(θ)(θ − d(x))² q(θ/x) dθ with respect to d(x) ∈ 𝒜
⇒ d₀(x) = x/n (= the minimum variance unbiased estimator, and the maximum likelihood estimator, of θ) is the unique Bayes rule.
Application
3. Estimation of a real θ under absolute error loss. To estimate θ = a real parameter, 𝒜 = Θ = some interval of ℝ¹.
Let L(θ, a) = |θ − a|.
d₀ = d₀(X) is Bayes if for each x, d₀(x) minimizes E_{θ/X=x}|θ − d(x)| with respect to d(x) ∈ 𝒜 — i.e. d₀(x) is a median of the posterior distribution of θ given X = x.
Application
4. Estimation of a function of θ: to estimate g(θ), a real-valued function of θ.
Let y = the risk point of d. Two risk points y⁽¹⁾ and y⁽²⁾ may be considered equivalent if they lie on the same corner Q_c through the point (c, c) (see figure).
Fig. 7.7 Equivalent points; the risk set S and the minimax point.
Notes
1. If S does not contain its boundary points, a minimax rule may not exist.
2. A minimax point may not be unique. (Figure: a risk set with corners (1, 0), (2, 0), (1, 1), (2, 1), all points on one edge being minimax.)
3. A minimax point does not necessarily lie on the equalizer line (Figs. 7.8 and 7.9).
Fig. 7.8 A risk set S with corners (1, 0), (2, 0), (1, 1), (2, 1); the corner Q_{c₀} picks out the minimax points.
Fig. 7.9 The corner Q_{c₀} touching S at a unique minimax point.
Example 7.20 Let

$$P_{\theta_1}[X = x] = \frac{1}{2^x},\; x = 1, 2, \ldots;\qquad P_{\theta_2}[X = x] = \frac{1}{2^{x+1}},\; x = 0, 1, \ldots$$

The risk point of a rule δ is (y₁, y₂), with

$$y_1 = R_\delta(\theta_1) = \sum_{x=1}^{\infty} \delta(x)\, P_{\theta_1}(X = x),$$

$$y_2 = R_\delta(\theta_2) = \sum_{x=0}^{\infty} \{1 - \delta(x)\}\, P_{\theta_2}(X = x) = 1 - \frac{1}{2}\delta(0) - \frac{1}{2}\sum_{x=1}^{\infty} \delta(x)\frac{1}{2^x}.$$

The minimax choice turns out to be

$$\delta(x) = \frac{1}{3}\ \forall x \ge 1,\qquad \delta(0) = 1,$$

which equalizes the two risks (y₁ = y₂ = 1/3).
Example 7.21

$$\Theta = \left\{\theta_1 = \tfrac14,\ \theta_2 = \tfrac12\right\},\qquad \mathcal{A} = \left\{a_1 = \tfrac14,\ a_2 = \tfrac12\right\}$$

Loss matrix:

        a₁   a₂
θ₁       1    4
θ₂       3    2

Let X = 0 with probability θ (θ = θ₁, θ₂) and X = 1 with probability (1 − θ).
Then D = {d₁, d₂, d₃, d₄}, where
d₁(0) = d₁(1) = a₁;  d₂(0) = d₂(1) = a₂;  d₃(0) = a₁ but d₃(1) = a₂;  d₄(0) = a₂ but d₄(1) = a₁.
R_{d₁}(θ₁) = 1, R_{d₁}(θ₂) = 3, R_{d₂}(θ₁) = 4, R_{d₂}(θ₂) = 2,

$$R_{d_3}(\theta_1) = \frac14\cdot 1 + \frac34\cdot 4 = 3\tfrac14,\qquad R_{d_3}(\theta_2) = \frac12\cdot 3 + \frac12\cdot 2 = 2\tfrac12,$$

$$R_{d_4}(\theta_1) = \frac14\cdot 4 + \frac34\cdot 1 = 1\tfrac34,\qquad R_{d_4}(\theta_2) = \frac12\cdot 2 + \frac12\cdot 3 = 2\tfrac12.$$

Fig. 7.10 The risk set generated by these four points (with vertex (4, 1)); the minimax point lies on its lower boundary.
Let d ∈ D* choose a₁ with probability u when X = 0 and with probability v when X = 1. Then

$$R_d(\theta_1) = \frac14\{u\cdot 1 + (1-u)\cdot 4\} + \frac34\{v\cdot 1 + (1-v)\cdot 4\} = \frac14(16 - 3u - 9v),$$

$$R_d(\theta_2) = \frac12\{u\cdot 3 + (1-u)\cdot 2\} + \frac12\{v\cdot 3 + (1-v)\cdot 2\} = \frac12(u + v + 4).$$

For an equalizer rule,

$$R_d(\theta_1) = R_d(\theta_2) = \frac{26}{11} \tag{7.12}$$

$$\Rightarrow\ \frac14(16 - 3u - 9v) = \frac{26}{11} \Rightarrow u + 3v = \frac{24}{11},$$

$$\text{and}\quad \frac12(u + v + 4) = \frac{26}{11} \Rightarrow u + v = \frac{8}{11}. \tag{7.13}$$

Solving, u = 0 and v = 8/11, i.e.

$$\delta(a_1/0) = 0,\quad \delta(a_2/0) = 1,\quad \delta(a_1/1) = \frac{8}{11},\quad \delta(a_2/1) = \frac{3}{11}.$$

Note The unique minimax rule is purely randomized. Thus, unlike Bayes rules, a minimax rule may be purely randomized; i.e., although a minimax rule exists, no non-randomized rule is minimax.
Alternative (direct/algebraic) approach
Fig. 7.11 The (u, v) unit square partitioned into regions D₁ and D₂ by the equalizer line (v-intercepts 8/11 and 3/11 marked).
Along the boundary where R_d(θ₁) = R_d(θ₂),

$$\sup_{\theta\in\Theta} R_d(\theta) = \max\{R_d(\theta_1),\, R_d(\theta_2)\} = \frac14(16 - 3u - 9v) = \frac{1}{44}(12u + 104),$$

so

$$\inf_{0\le u\le 1}\frac{1}{44}(12u + 104) = \frac{104}{44} = \frac{26}{11},$$

and hence

$$\min\left\{\inf_{d\in D_1}\sup_{\theta\in\Theta} R_d(\theta),\ \inf_{d\in D_2}\sup_{\theta\in\Theta} R_d(\theta)\right\} = \frac{26}{11}.$$
Unique minimaxity
If possible, let d₀ not be the unique minimax rule. Then there exists another d₁ which is also minimax, i.e. there exists another d₁ such that
Write d = (u, v). Now

$$\frac{\partial r(\tau, d)}{\partial u} = 0 \Rightarrow -2m_2 + 2m_1 u = 0 \Rightarrow u = \frac{m_2}{m_1},$$

$$\frac{\partial r(\tau, d)}{\partial v} = 0 \Rightarrow 2m_2 - 2m_1 v - 2m_1 + 2v = 0 \Rightarrow v = \frac{m_1 - m_2}{1 - m_1}.$$

Hence the equalizer non-randomized rule is unique Bayes w.r.t. a prior τ such that m₂/m₁ = 3/4 and (m₁ − m₂)/(1 − m₁) = 1/4
⇒ m₁ = 1/2 and m₂ = 3/8.
[For example, let τ = B(1/2, 1/2) prior (α = β = 1/2): m₁ = α/(α + β) = 1/2 and m₂ = α(α + 1)/{(α + β)(α + β + 1)} = 3/8, and the equalizer non-randomized rule is unique Bayes w.r.t. the B(1/2, 1/2) prior.]
Thus the non-randomized rule d′(1) = 3/4, d′(0) = 1/4 is an equalizer as well as unique Bayes (w.r.t. some prior) ⇒ d′(X) is the (unique) minimax rule.
Example 7.23 X ~ Bin(n, θ); consider

$$d_{\alpha\beta}(X) = \frac{X + \alpha}{n + \alpha + \beta},$$

$$R_{d_{\alpha\beta}}(\theta) = E_\theta\!\left(\frac{X+\alpha}{n+\alpha+\beta} - \theta\right)^2 = \frac{E_\theta\{(X - n\theta) - \theta(\alpha+\beta) + \alpha\}^2}{(n+\alpha+\beta)^2}$$

$$= \frac{E_\theta(X - n\theta)^2 + \theta^2(\alpha+\beta)^2 + \alpha^2 - 2\theta\alpha(\alpha+\beta)}{(n+\alpha+\beta)^2} \quad [\because E_\theta(X - n\theta) = 0]$$

$$= \frac{\theta^2\{(\alpha+\beta)^2 - n\} + \theta\{n - 2\alpha(\alpha+\beta)\} + \alpha^2}{(n+\alpha+\beta)^2} \quad [\because E_\theta(X - n\theta)^2 = n\theta(1-\theta)].$$

d_{αβ} is an equalizer iff

$$(\alpha+\beta)^2 = n \ \text{and}\ 2\alpha(\alpha+\beta) = n \iff \alpha = \frac{\sqrt n}{2},\ \beta = \frac{\sqrt n}{2}.$$
R_{d₀}(θ) = c ∀θ, together with (7.19) and (7.20), establishes the minimaxity of d₀ ⇒ d₀ is minimax.
Unique minimaxity of d₀: for any d (≠ d₀), r(τ, d) ≥ r(τ, d₀).
In the example,

$$R_{d_0}(\theta_1, \theta_2) = \frac{1}{\left(1 + \sqrt{2n}\right)^2} = c \quad \forall(\theta_1, \theta_2) \in \Theta_0,$$

and < c ∀(θ₁, θ₂) ∈ Θ − Θ₀.
By Corollary 2, Step 1 + Step 2 give us that

$$d_0 = \frac{2(X - Y)}{2n + \sqrt{2n}}$$

is the unique minimax estimator of θ₁ − θ₂.
Result 4 Let d₀ be such that
(i) R_{d₀}(θ) ≤ c ∀θ ∈ Θ, c = a real constant;
(ii) there exists a sequence of Bayes rules {dₙ} w.r.t. a sequence of priors {τₙ} such that r(τₙ, dₙ) → c.
Then d₀ is minimax.
Indeed, (7.23) gives sup_θ R_d(θ) ≥ c for every d, while condition (i) gives sup_θ R_{d₀}(θ) ≤ c; i.e. d₀ is minimax.
Example 7.27 Let X₁, X₂, …, Xₙ be i.i.d. N(θ, 1), −∞ < θ < ∞. To estimate θ under squared error loss, let d₀ = X̄; then R_{d₀}(θ) = 1/n ∀θ, so (i) is satisfied with c = 1/n.
Let τ_σ be the N(0, σ²) prior, and let d_σ = the Bayes estimator of θ w.r.t. τ_σ:

$$d_\sigma = \frac{n\sigma^2}{1 + n\sigma^2}\,\bar{X},\qquad r(\tau_\sigma, d_\sigma) = \frac{\sigma^2}{1 + n\sigma^2} \to \frac{1}{n} = c \quad\text{as } \sigma^2 \to \infty.$$
Let b_d(θ) = E_θ(d(X)) − θ = the bias of d(X), and let C_d(θ) denote the Cramér–Rao lower bound for the risk of d under squared error loss,

$$C_d(\theta) = b_d^2(\theta) + \frac{\{1 + b'_d(\theta)\}^2}{n}.$$

Suppose (Lemma)
$$C_{d_1}(\theta) \le C_{d_0}(\theta)\ \forall\theta \ \Rightarrow\ b_{d_1}(\theta) = b_{d_0}(\theta)\ \forall\theta.$$
Then d₀ is admissible. If, further, d₀ is an equalizer, then d₀ is minimax.
Proof Result 1 proves that it is minimax. □
Proof of admissibility If possible, let d₀ be inadmissible. Then there exists a d₁ such that d₁ ≻ d₀. Since (7.26) ⇒

$$R_{d_0}(\theta) = C_{d_0}(\theta) = \frac{1}{n}\ \forall\theta,$$

we get
$$\sup_{\theta} R_{d_1}(\theta) \le \sup_{\theta} R_{d_0}(\theta) = \frac{1}{n} = C_{d_0}(\theta),$$

$$\Rightarrow\ C_{d_1}(\theta) \underbrace{\le}_{\text{by C–R inequality}} R_{d_1}(\theta) \le \sup_{\theta} R_{d_1}(\theta) = C_{d_0}(\theta)\ \forall\theta$$

⇒ C_{d₁}(θ) ≤ C_{d₀}(θ) ∀θ ⇒ b_{d₁}(θ) = 0 ∀θ (by the Lemma)
⇒ d₁ is an unbiased estimator of θ. But since X̄ is complete, d₀ = X̄ is the unique unbiased estimator of θ, i.e. d₁ = d₀. Hence d₀ = X̄ is the unique minimax estimator of θ.
Proof of Lemma Writing b_d(θ) = b(θ), let C_d(θ) ≤ C_{d₀}(θ) = 1/n ∀θ, i.e.

$$b^2(\theta) + \frac{\{1 + b'(\theta)\}^2}{n} \le \frac{1}{n}\ \forall\theta. \tag{7.30}$$

Expanding (7.30),

$$b^2(\theta) + \frac{2b'(\theta)}{n} + \frac{b'^2(\theta)}{n} \le 0 \ \Rightarrow\ \frac{2b'(\theta)}{n} \le 0 \ \Rightarrow\ b'(\theta) \le 0. \tag{7.31}$$

Also,

$$\frac{-b'(\theta)}{b^2(\theta)} \ge \frac{n}{2}\quad \forall\theta\ \text{such that}\ b(\theta) \ne 0, \tag{7.32}$$

or

$$\frac{d}{d\theta}\left[\frac{1}{b(\theta)}\right] \ge \frac{n}{2}\quad \forall\theta\ \text{such that}\ b(\theta) \ne 0. \tag{7.33}$$

(7.31) and (7.33) ⇒ b(θ) → 0 as θ → ±∞. (7.34)
L(θ, a) = the loss (to the statistician) if the statistician chooses an action a and nature chooses a state θ.
A randomized action for the statistician = a probability distribution over 𝒜. The statistician observes the value of a r.v. X; if X = x is observed, the statistician chooses a randomized action δ(x).
Proof (7.35) and (7.36) ⇒ sup_{θ∈Θ} R_d(θ) = sup_τ r(τ, d); hence the proof.
A rule d₀ is minimax if it minimizes sup_{θ∈Θ} R_d(θ) w.r.t. d ∈ D*.
If nature chooses a least favourable prior τ₀, then the expected loss (of the statistician) is at least m̲ whatever rule the statistician chooses. □
Result 2 m̲ ≤ m̄.
Proof … ⇒ m̲ ≤ m̄. □
Result 3 If the statistical game has a value, and a least favourable prior τ₀ and a minimax rule d₀ exist, then d₀ is Bayes w.r.t. τ₀.
Proof m̲ = inf_d r(τ₀, d) ≤ r(τ₀, d₀) ≤ sup_τ r(τ, d₀) = m̄.
If m̲ = m̄, then '=' must hold everywhere, implying inf_d r(τ₀, d) = r(τ₀, d₀) ⇒ d₀ is Bayes w.r.t. τ₀. □
Minimax theorem Let Θ be finite and the risk set S be bounded below. Then the statistical game has a value and a least favourable prior τ₀ exists.
If, further, S is closed from below, then an admissible minimax rule d₀ exists and d₀ is Bayes w.r.t. τ₀.
Thus if Θ is finite and S is bounded below as well as closed from below, then
(i) a minimax rule exists;
(ii) an admissible minimax rule exists; and
(iii) a minimax rule is Bayes (w.r.t. the least favourable prior τ₀).
Proof
(a) Proved earlier.
(b) To show: inf_d r(τ₀, d) ≥ inf_d r(τ, d) ∀τ. (b)
Now (i) ⇒ r(τ, d₀) ≤ c ∀τ
⇒ inf_d r(τ, d) ≤ r(τ, d₀) ≤ c = r(τ₀, d₀) = inf_d r(τ₀, d) ∀τ, by (ii).
This proves (b). □
7.7 Invariance
A rule d is location invariant if

$$d(x + c) = d(x) + c. \tag{7.37}$$

A class G of transformations is a group if
i. g₁, g₂ ∈ G ⇒ g₂g₁ ∈ G;
ii. g ∈ G ⇒ g⁻¹ ∈ G.
Note Let G be a group of transformations; then every g ∈ G is 1:1 and onto (since g⁻¹ exists). Also, the identity transformation e always ∈ G [if g ∈ G, then g⁻¹ ∈ G, and e = g⁻¹g ∈ G].
Example 7.33 𝔛 = ℝ¹, g_c(x) = x + c, c = a real constant. Let G = {g_c : −∞ < c < ∞}. Then
g_c ∈ G ⇒ g_c⁻¹ ∈ G [as g_c⁻¹(x) = x + (−c) = g_{−c}(x)];
g_{c₁}, g_{c₂} ∈ G ⇒ g_{c₁}g_{c₂} ∈ G;
so G is a group of transformations.
It is likewise a group under the combined location and scale transformations.
Example 7.36 𝔛 = {0, 1, 2, …, n}. Let g(x) = n − x and G = {e, g}.
For families of distributions:
P = {exp(θ) : 0 < θ < ∞};
P = {Bin(n, θ) : 0 < θ < 1}.
The loss function is invariant if for each g ∈ G and a ∈ 𝒜 there is a unique a′ = g̃(a) ∈ 𝒜 such that L(θ, a) = L(ḡ(θ), a′) ∀θ ∈ Θ.
With G = {g_c : −∞ < c < ∞}, g_c(x) = x + c, the family P is invariant w.r.t. G, with Ḡ = {ḡ_c : −∞ < c < ∞}, ḡ_c(θ) = θ + c.
To estimate θ under L(θ, a) = (θ − a)²: for any g_c ∈ G and a ∈ 𝒜, there is an a′ = a + c ∈ 𝒜 such that L(θ, a) = L(ḡ_c(θ), a′) ∀θ ∈ Θ.
a′ is uniquely determined by a and c. Hence the loss function is invariant w.r.t. G.
Example Θ₀ = {θ = (μ, σ²) : μ ≤ 0}. With X ~ N(μ, σ²), the scale transformation g_c(X) = cX (c > 0) gives g_c(X) ~ N(cμ, c²σ²) ∈ P.
𝒜 = {a₀, a₁}, aᵢ = accept Hᵢ, with loss

            a₀    a₁
θ ∈ Θ₀       0    L₀
θ ∈ Θ₁      L₁     0

Let g ∈ G, and let d(X) = a be a reasonable non-randomized rule for the original problem. Then d(g(x)) should be a reasonable rule for the transformed problem. Also, if for X = x, d(x) ∈ 𝒜 is a reasonable action for the original problem, then for g(X) = g(x), g̃(d(x)) should be a reasonable action in the transformed problem.
These two agree if d(g(x)) = g̃(d(x)). (ii)
A non-randomized rule is said to be an invariant non-randomized rule if (ii) holds ∀x ∈ 𝔛, ∀g ∈ G.
We thus get a class of non-randomized decision rules, D_I = the class of invariant non-randomized rules.
Appendix
Note
(1) For other values of π₀ the exact test cannot be obtained in this way, as the binomial distribution is symmetric only when π = 1/2.
(2) For some selected n and π, the binomial probability sums considered above are given in Table 37 of Biometrika (Vol. 1).
A.1.2 Suppose we have two infinite populations with π₁ and π₂ as the unknown proportions of individuals having character A. We are to test H₀: π₁ = π₂.
To do this we draw two samples from the two populations, of sizes n₁ and n₂. Let x₁ and x₂ be the random variables denoting the numbers of individuals in the 1st and 2nd samples with character A.
To test H₀: π₁ = π₂ we make use of the statistics x₁ and x₂ such that x₁ + x₂ = x (constant), say.
Under H₀: π₁ = π₂ = π (say),

$$f(x_1) = \text{p.m.f. of } x_1 = \binom{n_1}{x_1}\pi^{x_1}(1-\pi)^{n_1-x_1},$$

$$f(x_2) = \text{p.m.f. of } x_2 = \binom{n_2}{x_2}\pi^{x_2}(1-\pi)^{n_2-x_2},$$

$$f(x) = \text{p.m.f. of } x = \binom{n_1+n_2}{x}\pi^{x}(1-\pi)^{n_1+n_2-x}.$$
$$\text{i.e.}\qquad \sum_{y \ge y_0} e^{-n\lambda_0}\frac{(n\lambda_0)^y}{y!} \le \alpha.$$
In many investigations one is faced with the problem of judging whether two qualitative characters, say A and B, may be said to be independent. Let us denote the forms of A by Aᵢ {i = 1(1)k} and the forms of B by Bⱼ {j = 1(1)l}, and the probability associated with the cell AᵢBⱼ in the two-way classification of the population by p_{ij}. The probability associated with Aᵢ is then p_{i0} = Σⱼ p_{ij}, and that associated with Bⱼ is p_{0j} = Σᵢ p_{ij}. We show the concerned distribution in the following table:
A\B     B₁    B₂   ….   Bⱼ    ….   Bₗ    Total
A₁      p₁₁   p₁₂  ….   p₁ⱼ   ….   p₁ₗ   p₁₀
A₂      p₂₁   p₂₂  ….   p₂ⱼ   ….   p₂ₗ   p₂₀
⋮       ⋮     ⋮         ⋮          ⋮     ⋮
Aᵢ      pᵢ₁   pᵢ₂  ….   pᵢⱼ   ….   pᵢₗ   pᵢ₀
⋮       ⋮     ⋮         ⋮          ⋮     ⋮
Aₖ      pₖ₁   pₖ₂  ….   pₖⱼ   ….   pₖₗ   pₖ₀
Total   p₀₁   p₀₂  ….   p₀ⱼ   ….   p₀ₗ   1

where p_{ij} = P(A = Aᵢ, B = Bⱼ) ∀(i, j), p_{i0} = P(A = Aᵢ) and p_{0j} = P(B = Bⱼ).
This may be used for testing H₀. Keeping the marginal frequencies fixed, we change the cell frequencies and calculate the corresponding probabilities. If the sum of the probabilities ≤ α, then we reject H₀.
(Case I, σ known: under H₀, τ = √n(x̄ − μ₀)/σ ~ N(0, 1).)
H₁: μ > μ₀, ω₀: τ > τ_α
H₂: μ < μ₀, ω₀: τ < −τ_α
H₃: μ ≠ μ₀, ω₀: |τ| > τ_{α/2}
The 100(1 − α)% confidence interval for μ is x̄ ∓ (σ/√n) τ_{α/2}.
Case II, σ unknown: here we estimate σ by s′, and √n(x̄ − μ)/s′ ~ t_{n−1}.
Under H₀, t = √n(x̄ − μ₀)/s′ ~ t_{n−1}.
H₁: μ > μ₀, ω₀: t > t_{α,n−1}
H₂: μ < μ₀, ω₀: t < −t_{α,n−1}
H₃: μ ≠ μ₀, ω₀: |t| > t_{α/2,n−1}
The 100(1 − α)% confidence interval for μ is x̄ ∓ (s′/√n) t_{α/2,n−1}.
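The Case II test is the ordinary one-sample t-test; a sketch reusing the maize yields of Example 6.3 purely for illustration:

```python
import numpy as np
from scipy.stats import ttest_1samp

x = np.array([16.4, 19.2, 24.5, 15.4, 17.3, 23.6, 22.7, 20.9, 18.2])
t, p = ttest_1samp(x, popmean=20)   # two-sided H3: mu != 20
print(t, p)
```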
A.4.2 To test H₀: σ = σ₀.
Case I, μ known: we know Σ(xᵢ − μ)²/σ² ~ χ²ₙ; under H₀, χ² = Σ(xᵢ − μ)²/σ₀² ~ χ²ₙ.
H₁: σ > σ₀, ω₀: χ² > χ²_{α,n}
H₂: σ < σ₀, ω₀: χ² < χ²_{1−α,n}
H₃: σ ≠ σ₀, ω₀: χ² > χ²_{α/2,n} or χ² < χ²_{1−α/2,n}
Since

$$P\left[\chi^2_{1-\alpha/2,n} < \frac{\sum(x_i-\mu)^2}{\sigma^2} < \chi^2_{\alpha/2,n}\right] = 1 - \alpha,$$

i.e.

$$P\left[\frac{\sum(x_i-\mu)^2}{\chi^2_{\alpha/2,n}} < \sigma^2 < \frac{\sum(x_i-\mu)^2}{\chi^2_{1-\alpha/2,n}}\right] = 1 - \alpha,$$

the 100(1 − α)% confidence interval for σ² when μ is known is

$$\left[\frac{\sum(x_i-\mu)^2}{\chi^2_{\alpha/2,n}},\ \frac{\sum(x_i-\mu)^2}{\chi^2_{1-\alpha/2,n}}\right].$$

Case II, μ unknown: under H₀, χ² = Σ(xᵢ − x̄)²/σ₀² ~ χ²_{n−1}.
H₁: σ > σ₀, ω₀: χ² > χ²_{α,n−1}
H₂: σ < σ₀, ω₀: χ² < χ²_{1−α,n−1}
H₃: σ ≠ σ₀, ω₀: χ² > χ²_{α/2,n−1} or χ² < χ²_{1−α/2,n−1}
Since

$$P\left[\chi^2_{1-\alpha/2,n-1} < \frac{\sum(x_i-\bar x)^2}{\sigma^2} < \chi^2_{\alpha/2,n-1}\right] = 1 - \alpha,$$

i.e.

$$P\left[\frac{\sum(x_i-\bar x)^2}{\chi^2_{\alpha/2,n-1}} < \sigma^2 < \frac{\sum(x_i-\bar x)^2}{\chi^2_{1-\alpha/2,n-1}}\right] = 1 - \alpha,$$

the 100(1 − α)% confidence interval for σ² when μ is unknown is

$$\left[\frac{\sum(x_i-\bar x)^2}{\chi^2_{\alpha/2,n-1}},\ \frac{\sum(x_i-\bar x)^2}{\chi^2_{1-\alpha/2,n-1}}\right].$$
Suppose we have two independent populations N(μ₁, σ₁²) and N(μ₂, σ₂²). We draw a random sample (x₁₁, x₁₂, …, x₁ₙ₁) of size n₁ from the first population and another random sample (x₂₁, x₂₂, …, x₂ₙ₂) of size n₂ from the second population. The sample means and variances are

$$\bar{x}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} x_{1i},\qquad \bar{x}_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} x_{2i},$$

$$s_1'^2 = \frac{1}{n_1-1}\sum_{i=1}^{n_1}(x_{1i}-\bar{x}_1)^2,\qquad s_2'^2 = \frac{1}{n_2-1}\sum_{i=1}^{n_2}(x_{2i}-\bar{x}_2)^2,$$

respectively.
(I) H₀: ℓ₁μ₁ + ℓ₂μ₂ = ℓ₃.
Case I, σ₁, σ₂ known: We find that

$$\frac{\ell_1\bar{x}_1 + \ell_2\bar{x}_2 - (\ell_1\mu_1 + \ell_2\mu_2)}{\sqrt{\dfrac{\ell_1^2\sigma_1^2}{n_1} + \dfrac{\ell_2^2\sigma_2^2}{n_2}}} \sim N(0, 1).$$

Under H₀,

$$\tau = \frac{\ell_1\bar{x}_1 + \ell_2\bar{x}_2 - \ell_3}{\sqrt{\dfrac{\ell_1^2\sigma_1^2}{n_1} + \dfrac{\ell_2^2\sigma_2^2}{n_2}}} \sim N(0, 1).$$

⇒ H₁: ℓ₁μ₁ + ℓ₂μ₂ > ℓ₃, ω₀: τ > τ_α
H₂: ℓ₁μ₁ + ℓ₂μ₂ < ℓ₃, ω₀: τ < −τ_α
H₃: ℓ₁μ₁ + ℓ₂μ₂ ≠ ℓ₃, ω₀: |τ| > τ_{α/2}
Case II, σ₁, σ₂ unknown (Fisher's t-test): We assume σ₁ = σ₂ = σ, say; σ² is estimated by

$$s'^2 = \frac{(n_1-1)s_1'^2 + (n_2-1)s_2'^2}{n_1+n_2-2},$$

and under H₀,

$$t = \frac{\ell_1\bar{x}_1 + \ell_2\bar{x}_2 - \ell_3}{s'\sqrt{\dfrac{\ell_1^2}{n_1} + \dfrac{\ell_2^2}{n_2}}} \sim t_{n_1+n_2-2}.$$

H₁: ℓ₁μ₁ + ℓ₂μ₂ > ℓ₃, ω₀: t > t_{α,n₁+n₂−2}
H₂: ℓ₁μ₁ + ℓ₂μ₂ < ℓ₃, ω₀: t < −t_{α,n₁+n₂−2}
H₃: ℓ₁μ₁ + ℓ₂μ₂ ≠ ℓ₃, ω₀: |t| > t_{α/2,n₁+n₂−2}
Note I The above procedure may also be applicable when σ₁ and σ₂ are not equal, provided the ratio σ₁²/σ₂² does not deviate far from 1; theoretical investigation in this area verifies this.
Note II When the homoscedasticity assumption σ₁ = σ₂ is not tenable, an alternative procedure is required; the corresponding problem is known as the Fisher–Behrens problem.
Note III For ℓ₁ = 1 and ℓ₂ = −1 we get the test procedure for the difference between the two means. Also, for testing the ratio of the means, i.e. for testing H₀: μ₁/μ₂ = k, say, we start with (x̄₁ − k x̄₂).
(II) H 0 : rr12 ¼ n0 :
P
1
ðx1i l1 Þ2
Case I l1 ; l2 known: n11 P ðx l Þ2 : r12 F n1 ;n2
n1 2i 2 1
r2
2
P
ðx1i l Þ2 =n1
) Under H 0 ; F ¼ P ðx l1 Þ2 n : n12 F n1 ;n2
2i 2 = 2 0
r1
H1 : [ n0 ; x0 : F [ F a;n1 ;n2
r2
r1
H 2 : \n0 ; x0 : F\F 1a;n1 ;n2
r2
r1
H3 : 6¼ n0 ; x0 : F [ F a=2;n1 ;n2 or; F\F 1a=2;n1 ;n2 :
r2
P
ðx1i l1 Þ2 =n1 r22
P
Also, P F 1a=2;n1 ;n2 \ ðx l Þ2 n : r2 \F a=2;n1 ;n2 ¼ 1 a
2i 2 = 2 1
rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
P rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
P
ðx1i l1 Þ2 ðx1i l1 Þ2
Or, P P n2 r1
\ r2 \ n P n2
¼1a
n 1 ðx l Þ2 F
2i 2 a=2;n1 ;n2 ðx l Þ2 F
1 2i 2 1a=2;n1 ;n2
r1
This provides the 100ð1 aÞ% confidence interval for r2 when l1 ; l2 are
known.
Case II: $\mu_1, \mu_2$ unknown.

We have $\frac{\frac{1}{n_1-1}\sum(x_{1i}-\bar{x}_1)^2}{\frac{1}{n_2-1}\sum(x_{2i}-\bar{x}_2)^2}\cdot\frac{\sigma_2^2}{\sigma_1^2} \sim F_{n_1-1,n_2-1}$,

i.e., $\frac{s_1'^2}{s_2'^2}\cdot\frac{\sigma_2^2}{\sigma_1^2} \sim F_{n_1-1,n_2-1}$

Under $H_0$, $F = \frac{s_1'^2}{s_2'^2}\cdot\frac{1}{\xi_0^2} \sim F_{n_1-1,n_2-1}$

$\Rightarrow H_1: \frac{\sigma_1}{\sigma_2} > \xi_0$, $\omega_0: F > F_{\alpha;n_1-1,n_2-1}$
$H_2: \frac{\sigma_1}{\sigma_2} < \xi_0$, $\omega_0: F < F_{1-\alpha;n_1-1,n_2-1}$
$H_3: \frac{\sigma_1}{\sigma_2} \neq \xi_0$, $\omega_0: F > F_{\alpha/2;n_1-1,n_2-1}$ or $F < F_{1-\alpha/2;n_1-1,n_2-1}$.

Also, $P\left[F_{1-\alpha/2;n_1-1,n_2-1} < \frac{s_1'^2}{s_2'^2}\cdot\frac{\sigma_2^2}{\sigma_1^2} < F_{\alpha/2;n_1-1,n_2-1}\right] = 1-\alpha$

Or, $P\left[\sqrt{\frac{s_1'^2}{s_2'^2\,F_{\alpha/2;n_1-1,n_2-1}}} < \frac{\sigma_1}{\sigma_2} < \sqrt{\frac{s_1'^2}{s_2'^2\,F_{1-\alpha/2;n_1-1,n_2-1}}}\right] = 1-\alpha$

i.e., the $100(1-\alpha)\%$ confidence interval for $\frac{\sigma_1}{\sigma_2}$ when $\mu_1, \mu_2$ are unknown is

$\left[\frac{s_1'/s_2'}{\sqrt{F_{\alpha/2;n_1-1,n_2-1}}},\ \frac{s_1'/s_2'}{\sqrt{F_{1-\alpha/2;n_1-1,n_2-1}}}\right].$
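A quick numerical sketch of this F-test and interval (Python with scipy; names and data are illustrative). In scipy, f.ppf(1 - a/2, d1, d2) is the upper a/2 point written F_{a/2} above.

    import numpy as np
    from scipy.stats import f as f_dist

    def sd_ratio_test(x1, x2, xi0=1.0, alpha=0.05):
        """H0: sigma1/sigma2 = xi0 (means unknown), plus the CI for sigma1/sigma2."""
        x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
        d1, d2 = x1.size - 1, x2.size - 1
        F = (x1.var(ddof=1) / x2.var(ddof=1)) / xi0**2   # ~ F_{n1-1, n2-1} under H0
        ratio = np.sqrt(x1.var(ddof=1) / x2.var(ddof=1))
        lo = ratio / np.sqrt(f_dist.ppf(1 - alpha / 2, d1, d2))
        hi = ratio / np.sqrt(f_dist.ppf(alpha / 2, d1, d2))
        return F, (lo, hi)

    print(sd_ratio_test([4.1, 5.2, 4.7, 5.5, 4.9], [3.8, 4.0, 4.4, 3.5]))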
Suppose in a given population the variables x and y are distributed in the bivariate normal form $N_2(\mu_x, \mu_y, \sigma_x^2, \sigma_y^2, \rho)$. Let $(x_1,y_1), (x_2,y_2), \ldots, (x_n,y_n)$ be the values of x and y observed in a sample of size n drawn from this population. We shall suppose that the n pairs of sample observations are random and independent. We shall also assume that all the parameters are unknown.

We have for the sample observations

$\bar{x} = \frac1n\sum_i x_i$, $\bar{y} = \frac1n\sum_i y_i$, $s_x'^2 = \frac{1}{n-1}\sum_i(x_i-\bar{x})^2$,

$s_y'^2 = \frac{1}{n-1}\sum_i(y_i-\bar{y})^2$, and $r_{xy} = \frac{\frac{1}{n-1}\sum_i(x_i-\bar{x})(y_i-\bar{y})}{s_x's_y'}$
(1) To test $H_0: \rho = 0$:

We know that when $\rho = 0$, $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2}$

$H_1: \rho > 0$, $\omega_0: t > t_{\alpha;n-2}$
$H_2: \rho < 0$, $\omega_0: t < -t_{\alpha;n-2}$
$H_3: \rho \neq 0$, $\omega_0: |t| > t_{\alpha/2;n-2}$

Note: For testing $\rho = \rho_0\ (\neq 0)$, an exact test is difficult to get, as for $\rho \neq 0$ the distribution of r is complicated in nature. But for moderately large n one can use the large-sample test, which will be considered later.
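A minimal numerical sketch of this exact test for rho = 0 (Python; the simulated data are illustrative only):

    import numpy as np
    from scipy.stats import t as t_dist

    def corr_t_test(x, y):
        """t = r*sqrt(n-2)/sqrt(1-r^2) ~ t_{n-2} under H0: rho = 0."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        n = x.size
        r = np.corrcoef(x, y)[0, 1]
        t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
        return r, t, 2 * t_dist.sf(abs(t), n - 2)

    rng = np.random.default_rng(1)
    x = rng.normal(size=30)
    y = 0.5 * x + rng.normal(size=30)
    print(corr_t_test(x, y))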
(2) $H_0: \mu_x - \mu_y = \xi_0$:

Define $z = x - y \Rightarrow \mu_z = \mu_x - \mu_y$, i.e., we are to test $H_0: \mu_z = \xi_0$. Also note that $\frac{\sqrt{n}(\bar{z}-\mu_z)}{s_z'} \sim t_{n-1}$, where $s_z'^2 = \frac{1}{n-1}\sum_i(z_i-\bar{z})^2 = s_x'^2 + s_y'^2 - 2s_{xy}'$ and $s_{xy}' = \frac{1}{n-1}\sum_i(x_i-\bar{x})(y_i-\bar{y})$. Under $H_0$, $t = \frac{\sqrt{n}(\bar{z}-\xi_0)}{s_z'} \sim t_{n-1}$.

(3) $H_0: \frac{\mu_x}{\mu_y} = \eta_0$ (the tests below refer to this hypothesis; its heading is implied by the text):

Define $z_0 = x - \eta_0 y$, so that $\bar{z}_0 = \bar{x} - \eta_0\bar{y}$ and $s_{z_0}'^2 = s_x'^2 + \eta_0^2 s_y'^2 - 2\eta_0 s_{xy}'$. Under $H_0$, $t = \frac{\sqrt{n}(\bar{x}-\eta_0\bar{y})}{s_{z_0}'} \sim t_{n-1}$.

$H_1: \frac{\mu_x}{\mu_y} > \eta_0$, $\omega_0: t > t_{\alpha;n-1}$
$H_2: \frac{\mu_x}{\mu_y} < \eta_0$, $\omega_0: t < -t_{\alpha;n-1}$
$H_3: \frac{\mu_x}{\mu_y} \neq \eta_0$, $\omega_0: |t| > t_{\alpha/2;n-1}$.
Again, writing $z = x - \eta y$ as a function of $\eta$, $P\left[-t_{\alpha/2;n-1} < \frac{\sqrt{n}\,\bar{z}}{s_z'} < t_{\alpha/2;n-1}\right] = 1-\alpha$

i.e., $P\left[\frac{n\bar{z}^2}{s_z'^2} < t^2_{\alpha/2;n-1}\right] = 1-\alpha$, or $P[\psi(\eta) < 0] = 1-\alpha$.

Solving the equation $\psi(\eta) = n\bar{z}^2 - s_z'^2\,t^2_{\alpha/2;n-1} = 0$, which is a quadratic equation in $\eta$, one can get two roots $\eta_1$ and $\eta_2\ (> \eta_1)$. Now if $\psi(\eta)$ is a convex function and $\eta_1$ and $\eta_2$ are real, then $P[\eta_1 < \eta < \eta_2] = 1-\alpha$. If $\psi(\eta)$ is a concave function, then $P[\eta < \eta_1\ \text{or}\ \eta > \eta_2] = 1-\alpha$. But if $\eta_1$ and $\eta_2$ are imaginary, then from the given sample a $100(1-\alpha)\%$ confidence interval does not exist.
(4) Test for the ratio $\xi = \frac{\sigma_x}{\sigma_y}$:

We write $u = x + \xi y$, $v = x - \xi y$. Since $\mathrm{Cov}(u,v) = \sigma_x^2 - \xi^2\sigma_y^2$, the hypothesis $H_0: \frac{\sigma_x}{\sigma_y} = \xi_0$ is equivalent to $\rho_{uv} = 0$, which may be tested as in (1) using the sample correlation of $(u_i, v_i)$ computed with $\xi = \xi_0$.

(The text also records, for $\sigma_x, \sigma_y, \rho$ known, the quadratic-form statistic for the mean point: under $H_0: \mu_x = \mu_x^0,\ \mu_y = \mu_y^0$,

$\chi^2 = \frac{n}{1-\rho^2}\left[\left(\frac{\bar{x}-\mu_x^0}{\sigma_x}\right)^2 - 2\rho\left(\frac{\bar{x}-\mu_x^0}{\sigma_x}\right)\left(\frac{\bar{y}-\mu_y^0}{\sigma_y}\right) + \left(\frac{\bar{y}-\mu_y^0}{\sigma_y}\right)^2\right] \sim \chi^2_2\ .)$
Suppose there are k populations $N(\mu_1,\sigma_1^2), N(\mu_2,\sigma_2^2), \ldots, N(\mu_k,\sigma_k^2)$. We draw a random sample of size $n_i$ from the i-th population (with $n_i \ge 2$ for at least one i). Define

$x_{ij}$ = j-th observation of the i-th sample, $i = 1,2,\ldots,k$; $j = 1,2,\ldots,n_i$;
$\bar{x}_i$ = i-th sample mean $= \frac{1}{n_i}\sum_{j=1}^{n_i}x_{ij}$;
$s_i'^2$ = i-th sample variance $= \frac{1}{n_i-1}\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i)^2$.
(I) We are to test $H_0: \mu_1 = \mu_2 = \cdots = \mu_k\ (= \mu$, say$)$ against $H_1$: there is at least one inequality in $H_0$.

Assumption: $\sigma_1 = \sigma_2 = \cdots = \sigma_k\ (= \sigma)$, say.

Note that $\bar{x}_i \sim N\left(\mu_i, \frac{\sigma^2}{n_i}\right)$

$\Rightarrow \frac{\sqrt{n_i}(\bar{x}_i-\mu_i)}{\sigma} \sim N(0,1)\ \forall i$, and these are independent.

Also, $\frac{(n_i-1)s_i'^2}{\sigma^2} \sim \chi^2_{n_i-1}$ ($\bar{x}_i$ and $s_i'$ are independent).

Under $H_0$, $\sum_{i=1}^k \frac{n_i(\bar{x}_i-\mu)^2}{\sigma^2} \sim \chi^2_k$ and $\sum_{i=1}^k \frac{(n_i-1)s_i'^2}{\sigma^2} \sim \chi^2_{n-k}$, and these two $\chi^2$ are independent.

$\Rightarrow$ Under $H_0$, $\sum_{i=1}^k n_i(\bar{x}_i-\bar{x})^2 \sim \sigma^2\chi^2_{k-1}$ and $\sum_{i=1}^k(n_i-1)s_i'^2 \sim \sigma^2\chi^2_{n-k}$.

$\Rightarrow$ Under $H_0$, $F = \frac{\sum_i n_i(\bar{x}_i-\bar{x})^2/(k-1)}{\sum_i(n_i-1)s_i'^2/(n-k)} \sim F_{k-1,n-k}$.

$\omega_0: F > F_{\alpha;k-1,n-k}$. If $H_0$ is rejected, then we may be interested to test $H_0: \mu_i = \mu_j$ against $H_1: \mu_i \neq \mu_j\ \forall (i,j)$.
$\bar{x}_i - \bar{x}_j \sim N\left(\mu_i-\mu_j,\ \sigma^2\left(\frac{1}{n_i}+\frac{1}{n_j}\right)\right)$

$\Rightarrow \frac{(\bar{x}_i-\bar{x}_j) - (\mu_i-\mu_j)}{\sigma\sqrt{\frac{1}{n_i}+\frac{1}{n_j}}} \sim N(0,1)$

Unknown $\sigma^2$ is estimated by $\hat{\sigma}^2 = \frac{\sum_i(n_i-1)s_i'^2}{n-k} = s'^2$, say $\Rightarrow \frac{(\bar{x}_i-\bar{x}_j)-(\mu_i-\mu_j)}{s'\sqrt{\frac{1}{n_i}+\frac{1}{n_j}}} \sim t_{n-k}$.

$\Rightarrow$ under $H_0$, $t = \frac{\bar{x}_i-\bar{x}_j}{s'\sqrt{\frac{1}{n_i}+\frac{1}{n_j}}} \sim t_{n-k}$

$\Rightarrow \omega_0: |t| > t_{\alpha/2;n-k}$. Also, the $100(1-\alpha)\%$ confidence interval for $\mu_i-\mu_j$ is

$\left\{(\bar{x}_i-\bar{x}_j) \mp s'\sqrt{\frac{1}{n_i}+\frac{1}{n_j}}\ t_{\alpha/2;n-k}\right\}.$
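A compact numerical sketch of the one-way F-test and a follow-up pairwise interval (Python; the grouped data are hypothetical):

    import numpy as np
    from scipy.stats import f as f_dist, t as t_dist

    def one_way_anova(groups, alpha=0.05):
        """F = [sum n_i(xbar_i - xbar)^2/(k-1)] / [sum (n_i-1)s_i'^2/(n-k)]."""
        means = np.array([np.mean(g) for g in groups])
        sizes = np.array([len(g) for g in groups])
        k, n = len(groups), sizes.sum()
        grand = sum(np.sum(g) for g in groups) / n
        ssb = np.sum(sizes * (means - grand) ** 2)
        ssw = sum(np.sum((np.asarray(g) - m) ** 2) for g, m in zip(groups, means))
        F = (ssb / (k - 1)) / (ssw / (n - k))
        # 100(1-alpha)% CI for mu_1 - mu_2 (first two groups), if H0 is rejected
        sp = np.sqrt(ssw / (n - k))
        half = t_dist.ppf(1 - alpha / 2, n - k) * sp * np.sqrt(1 / sizes[0] + 1 / sizes[1])
        ci = (means[0] - means[1] - half, means[0] - means[1] + half)
        return F, f_dist.sf(F, k - 1, n - k), ci

    print(one_way_anova([[5.1, 4.9, 5.3], [4.2, 4.0, 4.5, 4.1], [5.8, 6.0, 5.5]]))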
(II) Bartlett's test: To test $H_0: \sigma_1 = \sigma_2 = \cdots = \sigma_k\ (= \sigma$, say$)$ against $H_1$: there is at least one inequality in $H_0$.

Define $c_i = n_i - 1$ and $c = \sum_{i=1}^k c_i = n - k$. Bartlett's test statistic M is such that

$M = c\log_e\left(\sum_{i=1}^k\frac{c_i s_i'^2}{c}\right) - \sum_{i=1}^k c_i\log_e s_i'^2$

Under $H_0$ the scaled statistic $M'$ is approximately $\chi^2_{k-1}$ (the usual scaling divides M by the correction factor $C = 1 + \frac{1}{3(k-1)}\left(\sum_i\frac{1}{c_i} - \frac{1}{c}\right)$).

$\omega_0: M' > \chi^2_{\alpha;k-1}$.
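A sketch of the computation (Python; the interpretation of M' as M/C with the standard correction factor is an assumption stated above, and scipy.stats.bartlett computes the same corrected statistic):

    import numpy as np
    from scipy.stats import chi2

    def bartlett_M(groups):
        """Bartlett's M with the usual scale correction C (M' = M/C assumed)."""
        s2 = np.array([np.var(g, ddof=1) for g in groups])
        c_i = np.array([len(g) - 1 for g in groups])
        c, k = c_i.sum(), len(groups)
        M = c * np.log(np.sum(c_i * s2) / c) - np.sum(c_i * np.log(s2))
        C = 1 + (np.sum(1 / c_i) - 1 / c) / (3 * (k - 1))
        Mp = M / C
        return Mp, chi2.sf(Mp, k - 1)

    print(bartlett_M([[5.1, 4.9, 5.3, 5.0], [4.2, 4.0, 4.5, 4.1], [5.8, 6.0, 5.5, 6.1]]))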
Suppose the sample values of x and y are arranged in arrays of y according to the fixed values of x as given below:

x:  x_1      x_2      ...  x_i      ...  x_k
    y_11     y_21     ...  y_i1     ...  y_k1
    y_12     y_22     ...  y_i2     ...  y_k2
    ...      ...           ...           ...
    y_1n_1   y_2n_2   ...  y_in_i   ...  y_kn_k

Define $\bar{y}_{i0} = \frac{1}{n_i}\sum_{j=1}^{n_i}y_{ij}$, $\bar{y}_{00} = \frac{1}{n}\sum_i n_i\bar{y}_{i0} = \bar{y}$,

$\bar{x} = \frac{1}{n}\sum_i n_i x_i$, $n = \sum_i n_i$,

$e_{yx}^2 = \frac{\sum_i n_i(\bar{y}_{i0}-\bar{y}_{00})^2}{\sum_i\sum_j(y_{ij}-\bar{y}_{00})^2}$,

$e_{yx} = +\sqrt{e_{yx}^2}$ = sample correlation ratio,

$r = \frac{\frac{1}{n}\sum_i\sum_j(y_{ij}-\bar{y}_{00})(x_i-\bar{x})}{\sqrt{\left\{\frac{1}{n}\sum_i\sum_j(y_{ij}-\bar{y}_{00})^2\right\}\left\{\frac{1}{n}\sum_i n_i(x_i-\bar{x})^2\right\}}}$
We assume $y_{ij}\mid x_i \sim N_1(\mu_i, \sigma^2)$, i.e. $E(y_{ij}\mid x_i) = \mu_i$.

(I) Test for regression: $H_0$: there does not exist any regression of y on x, $\Leftrightarrow H_0: \mu_1 = \mu_2 = \cdots = \mu_k$.

Define $\eta_{yx}^2 = \frac{V(E(y/x))}{V(y)}$, $\eta_{yx} = +\sqrt{\eta_{yx}^2}$ = population correlation ratio.

$\Rightarrow$ To test $H_0$ is equivalent to testing $H_0: \eta_{yx}^2 = 0$ against $H_1: \eta_{yx}^2 > 0$.

We note that $\sum_i\sum_j(y_{ij}-\bar{y}_{00})^2 = \sum_i\sum_j(y_{ij}-\bar{y}_{i0})^2 + \sum_i n_i(\bar{y}_{i0}-\bar{y}_{00})^2$

Under $H_0$:

$SS_B = e_{yx}^2\sum_i\sum_j(y_{ij}-\bar{y}_{00})^2 = \sum_i n_i(\bar{y}_{i0}-\bar{y}_{00})^2 \sim \sigma^2\chi^2_{k-1}$

$SS_W = (1-e_{yx}^2)\sum_i\sum_j(y_{ij}-\bar{y}_{00})^2 = \sum_i\sum_j(y_{ij}-\bar{y}_{i0})^2 \sim \sigma^2\chi^2_{n-k}$

$\Rightarrow$ Under $H_0$, $F = \frac{SS_B/(k-1)}{SS_W/(n-k)} = \frac{e_{yx}^2/(k-1)}{(1-e_{yx}^2)/(n-k)} \sim F_{k-1,n-k}$

$\Rightarrow \omega_0: F > F_{\alpha;k-1,n-k}$.
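A numerical sketch of the correlation-ratio F-test (Python; the arrays of y at fixed x are hypothetical):

    import numpy as np
    from scipy.stats import f as f_dist

    def corr_ratio_F(arrays):
        """F = [e^2/(k-1)] / [(1-e^2)/(n-k)] for H0: eta^2_{yx} = 0."""
        k = len(arrays)
        n = sum(len(a) for a in arrays)
        grand = sum(np.sum(a) for a in arrays) / n
        ssb = sum(len(a) * (np.mean(a) - grand) ** 2 for a in arrays)
        sst = sum(np.sum((np.asarray(a) - grand) ** 2) for a in arrays)
        e2 = ssb / sst                     # squared sample correlation ratio
        F = (e2 / (k - 1)) / ((1 - e2) / (n - k))
        return e2, F, f_dist.sf(F, k - 1, n - k)

    print(corr_ratio_F([[2.1, 2.4, 2.2], [3.0, 3.3, 2.9, 3.1], [4.2, 4.0, 4.4]]))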
(II) Test for linearity of regression:

$H_0: \mu_i = \alpha + \beta x_i\ \forall i$ against $H_1: \mu_i \neq \alpha + \beta x_i$.

We note that $e_{yx}^2\sum_i\sum_j(y_{ij}-\bar{y}_{00})^2 = \sum_i n_i(\bar{y}_{i0}-\bar{y}_{00})^2$

Also, $r^2\sum_i\sum_j(y_{ij}-\bar{y}_{00})^2 = \frac{\left\{\sum_i\sum_j(y_{ij}-\bar{y}_{00})(x_i-\bar{x})\right\}^2}{\sum_i n_i(x_i-\bar{x})^2} = \hat{\beta}^2\sum_i n_i(x_i-\bar{x})^2$

where $\hat{\beta} = \frac{\sum_i\sum_j(y_{ij}-\bar{y}_{00})(x_i-\bar{x})}{\sum_i n_i(x_i-\bar{x})^2}$

$\Rightarrow (e_{yx}^2 - r^2)\sum_i\sum_j(y_{ij}-\bar{y}_{00})^2 = \sum_i n_i(\bar{y}_{i0}-\bar{y}_{00})^2 - \hat{\beta}^2\sum_i n_i(x_i-\bar{x})^2 \sim \sigma^2\chi^2_{k-2}$ under $H_0$.

Also, $(1-e_{yx}^2)\sum_i\sum_j(y_{ij}-\bar{y}_{00})^2$ and $(e_{yx}^2-r^2)\sum_i\sum_j(y_{ij}-\bar{y}_{00})^2$ are independent.

$\Rightarrow$ under $H_0$, $F = \frac{(e_{yx}^2-r^2)/(k-2)}{(1-e_{yx}^2)/(n-k)} \sim F_{k-2,n-k}$

$\Rightarrow \omega_0: F > F_{\alpha;k-2,n-k}$
For the simple regression model $E(y\mid x) = \alpha + \beta x$, with least-squares estimates $b = \frac{\sum(x_i-\bar{x})(y_i-\bar{y})}{S_{xx}}$ and $a = \bar{y} - b\bar{x}$, where $S_{xx} = \sum(x_i-\bar{x})^2$, $V(b) = \frac{\sigma^2}{S_{xx}}$ and $\mathrm{Cov}(\bar{y}, b) = 0$:

$V(a) = V(\bar{y}) + \bar{x}^2V(b) = \frac{\sigma^2}{n} + \bar{x}^2\frac{\sigma^2}{S_{xx}} = \frac{\sigma^2}{nS_{xx}}\left(S_{xx} + n\bar{x}^2\right) = \frac{\sigma^2}{nS_{xx}}\sum x_i^2$

i.e., $a \sim N\left(\alpha,\ \sigma^2\left(\frac{1}{n}+\frac{\bar{x}^2}{S_{xx}}\right)\right)$

$H_{01}: \alpha = \alpha_0$: under $H_{01}$, $t = \frac{a-\alpha_0}{\hat{\sigma}\sqrt{\frac{1}{n}+\frac{\bar{x}^2}{S_{xx}}}} \sim t_{n-2}$

where $\hat{\sigma}^2 = \sum(y_i - a - bx_i)^2/(n-2)$

$\Rightarrow H_{11}: \alpha > \alpha_0$, $\omega_0: t > t_{\alpha;n-2}$
$H_{21}: \alpha < \alpha_0$, $\omega_0: t < -t_{\alpha;n-2}$
$H_{31}: \alpha \neq \alpha_0$, $\omega_0: |t| > t_{\alpha/2;n-2}$.

Also, the $100(1-\alpha)\%$ confidence interval for $\alpha$ is $\left(a \mp \hat{\sigma}\sqrt{\frac{1}{n}+\frac{\bar{x}^2}{S_{xx}}}\ t_{\alpha/2;n-2}\right)$.

$H_{02}: \beta = \beta_0$: under $H_{02}$, $t = \frac{(b-\beta_0)\sqrt{S_{xx}}}{\hat{\sigma}} \sim t_{n-2}$

$\Rightarrow H_{12}: \beta > \beta_0$, $\omega_0: t > t_{\alpha;n-2}$
$H_{22}: \beta < \beta_0$, $\omega_0: t < -t_{\alpha;n-2}$
$H_{32}: \beta \neq \beta_0$, $\omega_0: |t| > t_{\alpha/2;n-2}$.
$\mathrm{Cov}(a,b) = -\bar{x}V(b) = -\bar{x}\frac{\sigma^2}{S_{xx}}$

$\Rightarrow \binom{a}{b} \sim N_2\left[\binom{\alpha}{\beta},\ \begin{pmatrix}\frac{\sigma^2\sum x_i^2}{nS_{xx}} & -\frac{\sigma^2\bar{x}}{S_{xx}} \\ -\frac{\sigma^2\bar{x}}{S_{xx}} & \frac{\sigma^2}{S_{xx}}\end{pmatrix}\right]$

i.e., $\binom{a}{b} \sim N_2\left[\binom{\alpha}{\beta},\ \frac{\sigma^2}{nS_{xx}}\begin{pmatrix}\sum x_i^2 & -n\bar{x} \\ -n\bar{x} & n\end{pmatrix}\right]$

Let $\Sigma = \frac{\sigma^2}{nS_{xx}}\begin{pmatrix}\sum x_i^2 & -n\bar{x} \\ -n\bar{x} & n\end{pmatrix}$

$\Rightarrow \binom{a-\alpha}{b-\beta}'\Sigma^{-1}\binom{a-\alpha}{b-\beta} \sim \chi^2_2$.

Now, $\Sigma^{-1} = \frac{\mathrm{adj}\,\Sigma}{|\Sigma|}$, and since $|\Sigma| = \frac{\sigma^4}{(nS_{xx})^2}\left(n\sum x_i^2 - n^2\bar{x}^2\right) = \frac{\sigma^4}{nS_{xx}}$,

$\Sigma^{-1} = \frac{1}{\sigma^2}\begin{pmatrix}n & n\bar{x} \\ n\bar{x} & \sum x_i^2\end{pmatrix}$

$\Rightarrow \binom{a-\alpha}{b-\beta}'\Sigma^{-1}\binom{a-\alpha}{b-\beta} = \frac{1}{\sigma^2}\left[n(a-\alpha)^2 + 2n\bar{x}(a-\alpha)(b-\beta) + (b-\beta)^2\sum x_i^2\right] \sim \chi^2_2$

Again, $\frac{1}{\sigma^2}\sum(y_i - a - bx_i)^2 \sim \chi^2_{n-2}$ independently.

$\Rightarrow$ under $H_{03}: \alpha = \alpha_0,\ \beta = \beta_0$,

$F = \frac{\left\{n(a-\alpha_0)^2 + 2n\bar{x}(a-\alpha_0)(b-\beta_0) + (b-\beta_0)^2\sum x_i^2\right\}/2}{\sum(y_i-a-bx_i)^2/(n-2)} \sim F_{2,n-2}$

$\Rightarrow \omega_0: F > F_{\alpha;2,n-2}$.
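A short sketch of the separate t-tests for intercept and slope (Python; the simulated data and names are illustrative):

    import numpy as np
    from scipy.stats import t as t_dist

    def slope_intercept_tests(x, y, alpha0=0.0, beta0=0.0):
        """t statistics for H01: alpha = alpha0 and H02: beta = beta0 in y = a + b x."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        n = x.size
        sxx = np.sum((x - x.mean()) ** 2)
        b = np.sum((x - x.mean()) * (y - y.mean())) / sxx
        a = y.mean() - b * x.mean()
        sigma_hat = np.sqrt(np.sum((y - a - b * x) ** 2) / (n - 2))
        t_a = (a - alpha0) / (sigma_hat * np.sqrt(1 / n + x.mean() ** 2 / sxx))
        t_b = (b - beta0) * np.sqrt(sxx) / sigma_hat
        p = lambda t: 2 * t_dist.sf(abs(t), n - 2)
        return (a, t_a, p(t_a)), (b, t_b, p(t_b))

    rng = np.random.default_rng(2)
    x = np.linspace(0, 10, 25)
    y = 1.0 + 0.8 * x + rng.normal(scale=0.5, size=25)
    print(slope_intercept_tests(x, y))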
To test

$H_0: \rho_{1\cdot23\ldots p} = 0$ against $H_1: \rho_{1\cdot23\ldots p} > 0$

one uses the standard F statistic (implied by the critical region that follows): under $H_0$, $F = \frac{r^2_{1\cdot23\ldots p}/(p-1)}{(1-r^2_{1\cdot23\ldots p})/(n-p)} \sim F_{p-1,n-p}$

$\Rightarrow \omega_0: F > F_{\alpha;p-1,n-p}$.

$r_{12\cdot34\ldots p}$ = sample partial correlation coefficient of $X_1$ and $X_2$ eliminating the effect of $X_3, \ldots, X_p$

$= \frac{-R_{12}}{\sqrt{R_{11}R_{22}}}$ ($R_{ij}$ being cofactors of the sample correlation matrix). If $\rho_{12\cdot34\ldots p} = 0$ then

$t = \frac{r_{12\cdot34\ldots p}\sqrt{n-p}}{\sqrt{1-r^2_{12\cdot34\ldots p}}} \sim t_{n-p}$

Similarly, we write

$\rho_0^{(p+1)\times(p+1)} = \begin{pmatrix}\rho_{yy} & \rho_{y1} & \rho_{y2} & \cdots & \rho_{yp} \\ \rho_{1y} & \rho_{11} & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{2y} & \rho_{21} & \rho_{22} & \cdots & \rho_{2p} \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ \rho_{py} & \rho_{p1} & \rho_{p2} & \cdots & \rho_{pp}\end{pmatrix}$ = correlation matrix of $y, x_1, x_2, \ldots, x_p$.

Now, writing $\Sigma_0$ for the corresponding dispersion matrix, $|\Sigma_0| = (\sigma_{yy}\sigma_{11}\sigma_{22}\cdots\sigma_{pp})\,|\rho_0|$

$\Rightarrow \rho^2_{y\cdot12\ldots p} = 1 - \frac{|\rho_0|}{\text{Cofactor of }\rho_{yy}\text{ in }\rho_0} = 1 - \frac{|\Sigma_0|/(\sigma_{yy}\sigma_{11}\cdots\sigma_{pp})}{|\Sigma|/(\sigma_{11}\cdots\sigma_{pp})} = 1 - \frac{|\Sigma_0|}{\sigma_{yy}|\Sigma|}$

$= 1 - \frac{\sigma_{yy} - \sigma_{(1)}'\Sigma^{-1}\sigma_{(1)}}{\sigma_{yy}} = \frac{\sigma_{(1)}'\Sigma^{-1}\sigma_{(1)}}{\sigma_{yy}} = \frac{\sigma_{(1)}'b}{\sigma_{yy}}$, as $b = \Sigma^{-1}\sigma_{(1)}$,

where $\sigma_{(1)}$ is the vector of covariances of y with $x_1, \ldots, x_p$ and $\Sigma$ the dispersion matrix of $x_1, \ldots, x_p$.

$\Rightarrow \rho^2_{y\cdot12\ldots p} = \frac{\sigma_{(1)}'b}{\sigma_{yy}} = \frac{b'\Sigma b}{\sigma_{yy}}$; $b = \Sigma^{-1}\sigma_{(1)} \Rightarrow \Sigma b = \sigma_{(1)} \Rightarrow \sigma_{(1)}'b = b'\Sigma b$
$S_{p\times p} = \begin{pmatrix}S_{11} & S_{12} & \cdots & S_{1p} \\ S_{21} & S_{22} & \cdots & S_{2p} \\ \cdots & \cdots & \cdots & \cdots \\ S_{p1} & S_{p2} & \cdots & S_{pp}\end{pmatrix}$, which is positive definite.

The estimated regression equation of y on $x_1, x_2, \ldots, x_p$ is $y = \hat{\beta}_0 + \hat{\beta}_1x_1 + \hat{\beta}_2x_2 + \cdots + \hat{\beta}_px_p$,

where $\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_p$ are the solutions of the following normal equations:

$S_{1y} = \hat{\beta}_1S_{11} + \hat{\beta}_2S_{12} + \cdots + \hat{\beta}_pS_{1p}$
$S_{2y} = \hat{\beta}_1S_{21} + \hat{\beta}_2S_{22} + \cdots + \hat{\beta}_pS_{2p}$   (A.2)
$\cdots$
$S_{py} = \hat{\beta}_1S_{p1} + \hat{\beta}_2S_{p2} + \cdots + \hat{\beta}_pS_{pp}$

and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}_1 - \hat{\beta}_2\bar{x}_2 - \cdots - \hat{\beta}_p\bar{x}_p$.

We write $y_{n\times1} = (y_1, y_2, \ldots, y_n)'$,

$K_{n\times p} = \begin{pmatrix}x_{11}-\bar{x}_1 & x_{21}-\bar{x}_2 & \cdots & x_{p1}-\bar{x}_p \\ x_{12}-\bar{x}_1 & x_{22}-\bar{x}_2 & \cdots & x_{p2}-\bar{x}_p \\ \cdots & \cdots & \cdots & \cdots \\ x_{1n}-\bar{x}_1 & x_{2n}-\bar{x}_2 & \cdots & x_{pn}-\bar{x}_p\end{pmatrix}$, $\hat{\beta}_{p\times1} = (\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_p)'$

Noting that $S_{iy} = \sum_{a=1}^n(x_{ia}-\bar{x}_i)y_a$, (A.2) reduces to

$S\hat{\beta} = K'y \Rightarrow \hat{\beta} = S^{-1}K'y$

$\Rightarrow \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_p$ are linear functions of $y_1, y_2, \ldots, y_n$, which are normal.

$\Rightarrow \hat{\beta} \sim N_p\left(E(\hat{\beta}), D(\hat{\beta})\right)$

Now, $\hat{\beta} = S^{-1}K'y = S^{-1}K'(y - \bar{y}\varepsilon)$, where $\varepsilon = (1, 1, \ldots, 1)'$ and $K'\varepsilon = 0$

$\Rightarrow E(\hat{\beta}) = S^{-1}K'E(y - \bar{y}\varepsilon) = S^{-1}K'K\beta = S^{-1}S\beta = \beta\ [\because K'K = S]$

$D(\hat{\beta}) = S^{-1}K'(\sigma^2I_n)KS^{-1} = \sigma^2S^{-1}K'KS^{-1} = \sigma^2S^{-1}SS^{-1} = \sigma^2S^{-1}$

$\Rightarrow \hat{\beta} = S^{-1}K'y \sim N_p(\beta, \sigma^2S^{-1})$

We write $S^{-1} = (S^{ij})$ $\Rightarrow V(\hat{\beta}_i) = \sigma^2S^{ii}$ and $\mathrm{Cov}(\hat{\beta}_i, \hat{\beta}_j) = \sigma^2S^{ij}\ \forall i,j = 1(1)p$

$\Rightarrow \hat{\beta}_i \sim N_1(\beta_i, \sigma^2S^{ii})$, $i = 1(1)p$
Again, $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}_1 - \hat{\beta}_2\bar{x}_2 - \cdots - \hat{\beta}_p\bar{x}_p$

$\Rightarrow E(\hat{\beta}_0) = E(\bar{y}) - \sum_{i=1}^pE(\hat{\beta}_i)\bar{x}_i = \left(\beta_0 + \sum_{i=1}^p\beta_i\bar{x}_i\right) - \sum_{i=1}^p\beta_i\bar{x}_i = \beta_0$

$V(\hat{\beta}_0) = V\left(\bar{y} - \sum_i\hat{\beta}_i\bar{x}_i\right) = V\left(\bar{y} - \bar{x}'\hat{\beta}\right)$, $\bar{x}' = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_p)$

$= \frac{\sigma^2}{n} + \bar{x}'D(\hat{\beta})\bar{x}$ (as $\bar{y}$ and $\hat{\beta}$ are independent)

$= \frac{\sigma^2}{n} + \bar{x}'\sigma^2S^{-1}\bar{x} = \sigma^2\left(\frac{1}{n} + \bar{x}'S^{-1}\bar{x}\right).$
Again, $Y = \hat{\beta}_0 + \sum_{i=1}^p\hat{\beta}_ix_i$ $\Rightarrow Y \sim N_1(E(Y), V(Y))$,

where $E(Y) = E(\hat{\beta}_0) + \sum E(\hat{\beta}_i)x_i = \beta_0 + \sum\beta_ix_i = \xi_x$, say.

$V(Y) = V\left(\hat{\beta}_0 + \sum_{i=1}^p\hat{\beta}_ix_i\right) = V\left(\bar{y} - \sum_i\hat{\beta}_i\bar{x}_i + \sum_i\hat{\beta}_ix_i\right) = V\left(\bar{y} + \sum_i\hat{\beta}_i(x_i-\bar{x}_i)\right)$

$= V(\bar{y}) + V\left(\sum_i\hat{\beta}_i(x_i-\bar{x}_i)\right) = \frac{\sigma^2}{n} + \sum_{i=1}^p\sum_{j=1}^p(x_i-\bar{x}_i)(x_j-\bar{x}_j)\,\mathrm{Cov}(\hat{\beta}_i,\hat{\beta}_j)$

$= \frac{\sigma^2}{n} + \sum_{i=1}^p\sum_{j=1}^p(x_i-\bar{x}_i)(x_j-\bar{x}_j)\sigma^2S^{ij} = \sigma^2\left[\frac{1}{n} + (x-\bar{x})'S^{-1}(x-\bar{x})\right]$

$\Rightarrow Y \sim N_1\left(\beta_0 + \sum\beta_ix_i = \xi_x,\ \sigma^2\left\{\frac{1}{n} + (x-\bar{x})'S^{-1}(x-\bar{x})\right\}\right)$

$\hat{\sigma}^2 = \frac{1}{n-p-1}\sum_{a=1}^n\left(y_a - \hat{\beta}_0 - \hat{\beta}_1x_{1a} - \cdots - \hat{\beta}_px_{pa}\right)^2$

$= \frac{1}{n-p-1}\sum_{a=1}^n\left\{(y_a-\bar{y}) - \sum_i\hat{\beta}_i(x_{ia}-\bar{x}_i)\right\}^2$

$= \frac{1}{n-p-1}\left[S_{yy} - \sum_i\sum_j\hat{\beta}_i\hat{\beta}_j\sum_{a=1}^n(x_{ia}-\bar{x}_i)(x_{ja}-\bar{x}_j)\right]$

$= \frac{1}{n-p-1}\left[S_{yy} - \sum_i\sum_j\hat{\beta}_i\hat{\beta}_jS_{ij}\right] = \frac{1}{n-p-1}\left[S_{yy} - \hat{\beta}'S\hat{\beta}\right]$

(Note that $\rho^2_{y\cdot12\ldots p} = \frac{b'\Sigma b}{\sigma_{yy}}$.)
(1) $H_{01}: \beta_1 = \beta_2 = \cdots = \beta_p = 0$

$\Rightarrow x_1, x_2, \ldots, x_p$ are not worthwhile in predicting y.

$\because \rho^2_{y\cdot12\ldots p} = \frac{\beta'\Sigma\beta}{\sigma_{yy}}$, $\beta_1 = \beta_2 = \cdots = \beta_p = 0 \Rightarrow \rho^2_{y\cdot12\ldots p} = 0$

So the problem is to test $H_{01}: \rho^2_{y\cdot12\ldots p} = 0$ against $H_1: \rho^2_{y\cdot12\ldots p} > 0$.

Now, $S_{yy}\left(1 - r^2_{y\cdot12\ldots p}\right) = S_{yy} - \hat{\beta}'S\hat{\beta} = \sum_{a=1}^n\left(y_a - \hat{\beta}_0 - \hat{\beta}_1x_{1a} - \cdots - \hat{\beta}_px_{pa}\right)^2 \sim \sigma^2\chi^2_{n-p-1}$

Also, $S_{yy}r^2_{y\cdot12\ldots p} = \hat{\beta}'S\hat{\beta}$ $\Rightarrow S_{yy} = \left(S_{yy} - \hat{\beta}'S\hat{\beta}\right) + S_{yy}r^2_{y\cdot12\ldots p}$,

and under $H_{01}$, $\hat{\beta}'S\hat{\beta} \sim \sigma^2\chi^2_p$ independently, so that $F = \frac{r^2_{y\cdot12\ldots p}/p}{(1-r^2_{y\cdot12\ldots p})/(n-p-1)} \sim F_{p,\,n-p-1}$; $\omega_0: F > F_{\alpha;p,n-p-1}$.
(2) $H_0: \beta_0 = \beta_0^0$ against $H_1: \beta_0 \neq \beta_0^0$

Under $H_0$, $t = \frac{\hat{\beta}_0 - \beta_0^0}{\hat{\sigma}\sqrt{\frac{1}{n} + \bar{x}'S^{-1}\bar{x}}} \sim t_{n-p-1}$

where $\hat{\sigma}^2 = \frac{1}{n-p-1}\left(S_{yy} - \hat{\beta}'S\hat{\beta}\right) = s'^2_{y\cdot12\ldots p}$, say,

and $\hat{\beta}_0 = \bar{y} - \sum\hat{\beta}_i\bar{x}_i = \bar{y} - \bar{x}'\hat{\beta}$

$\omega_0: |t| > t_{\alpha/2;n-p-1}$

(3) $H_0: \beta_i = \beta_i^0$: under $H_0$, $t = \frac{\hat{\beta}_i - \beta_i^0}{\hat{\sigma}\sqrt{S^{ii}}} \sim t_{n-p-1}$

$\omega_0: |t| > t_{\alpha/2;n-p-1}$

(4) $H_0: \beta_i = \beta_j$: under $H_0$, $t = \frac{\hat{\beta}_i - \hat{\beta}_j}{\hat{\sigma}\sqrt{S^{ii} + S^{jj} - 2S^{ij}}} \sim t_{n-p-1}$; $\omega_0: |t| > t_{\alpha/2;n-p-1}$.

The $100(1-\alpha)\%$ confidence interval for $\beta_i - \beta_j$ is

$\left(\hat{\beta}_i - \hat{\beta}_j\right) \mp \hat{\sigma}\sqrt{S^{ii} + S^{jj} - 2S^{ij}}\ t_{\alpha/2;n-p-1}$

(5) $H_0: E(Y) = \xi_x = \xi_0$

Under $H_0$, $t = \frac{Y - \xi_0}{\hat{\sigma}\sqrt{\frac{1}{n} + (x-\bar{x})'S^{-1}(x-\bar{x})}} \sim t_{n-p-1}$

$\omega_0: |t| > t_{\alpha/2;n-p-1}$, where $Y = \hat{\beta}_0 + \sum_{i=1}^p\hat{\beta}_ix_i$.
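The normal-equation solution and the t-tests above can be checked numerically. A minimal sketch in Python with numpy (data and names are illustrative):

    import numpy as np
    from scipy.stats import t as t_dist

    def fit_normal_equations(X, y):
        """Solve S b = K'y with K the column-centred X and S = K'K, then
        t statistics for H0: beta_i = 0, sigma^2_hat = (Syy - b'Sb)/(n-p-1)."""
        X, y = np.asarray(X, float), np.asarray(y, float)
        n, p = X.shape
        K = X - X.mean(axis=0)                 # centred regressor matrix
        S = K.T @ K
        b = np.linalg.solve(S, K.T @ (y - y.mean()))
        b0 = y.mean() - X.mean(axis=0) @ b
        Syy = np.sum((y - y.mean()) ** 2)
        sigma2 = (Syy - b @ S @ b) / (n - p - 1)
        Sinv = np.linalg.inv(S)
        t_stats = b / np.sqrt(sigma2 * np.diag(Sinv))  # beta_i_hat ~ N(beta_i, sigma^2 S^{ii})
        pvals = 2 * t_dist.sf(np.abs(t_stats), n - p - 1)
        return b0, b, t_stats, pvals

    rng = np.random.default_rng(3)
    X = rng.normal(size=(40, 2))
    y = 2.0 + 1.5 * X[:, 0] - 0.7 * X[:, 1] + rng.normal(scale=0.3, size=40)
    print(fit_normal_equations(X, y))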
The p.d.f. of $x = (x_1, \ldots, x_p)' \sim N_p(\mu, \Sigma)$ is

$f(x) = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}\,e^{-\frac12(x-\mu)'\Sigma^{-1}(x-\mu)}$, $-\infty < x_i < \infty$, $0 < \sigma_i < \infty$, $-\infty < \mu_i < \infty$.

$Q(x) = (x-\mu)'\Sigma^{-1}(x-\mu)$

Since $\Sigma$ is positive definite, there exists a nonsingular matrix $V_{p\times p}$ such that $\Sigma^{-1} = VV'$.

$\Rightarrow Q(x) = (x-\mu)'VV'(x-\mu) = y'y$, where $y = V'(x-\mu)$, $= \sum_{i=1}^p y_i^2$.

$|J| = \left|\frac{\partial(x_1, \ldots, x_p)}{\partial(y_1, \ldots, y_p)}\right| = \frac{1}{|V'|} = \frac{1}{\sqrt{|\Sigma^{-1}|}} = \sqrt{|\Sigma|}$

$\Rightarrow$ p.d.f. of y is $f(y) = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}\,e^{-\frac12 y'y}\sqrt{|\Sigma|} = \frac{1}{(2\pi)^{p/2}}\,e^{-\frac12\sum_1^p y_i^2}$

$\Rightarrow \sum_1^p y_i^2 \sim \chi^2_p$, i.e., $Q(x) \sim \chi^2_p$.

If we now want to find the distribution of $Q^*(x) = x'\Sigma^{-1}x$: since $\Sigma^{-1}$ is positive definite, there exists a non-singular matrix $V_{p\times p}$ such that $\Sigma^{-1} = VV'$.

$\Rightarrow Q^*(x) = x'VV'x = z'z$, where $z = V'x$. Here also $|J| = \sqrt{|\Sigma|}$.

$\Rightarrow (x-\mu)'\Sigma^{-1}(x-\mu) = (x-\mu)'VV'(x-\mu) = (V'x - V'\mu)'I_p(V'x - V'\mu) = (z - V'\mu)'I_p(z - V'\mu)$

$\Rightarrow f(z) = \frac{1}{(2\pi)^{p/2}}\,e^{-\frac12(z-V'\mu)'I_p(z-V'\mu)}$

$\Rightarrow z_1, z_2, \ldots, z_p$ are normal with common variance unity but with means given by $E(z) = V'\mu$.

$\Rightarrow \sum_1^p z_i^2 \sim$ non-central $\chi^2_p$ with non-centrality parameter $(V'\mu)'(V'\mu) = \mu'VV'\mu = \mu'\Sigma^{-1}\mu$.
Events       A_1   A_2   ...   A_i   ...   A_k   Total
Probability  p_1   p_2   ...   p_i   ...   p_k   1
Frequency    n_1   n_2   ...   n_i   ...   n_k   n

$\Rightarrow f(n_1, n_2, \ldots, n_k) = \frac{n!}{\prod_i n_i!}\prod_{i=1}^k p_i^{n_i}$, $n_i \sim \mathrm{Bin}(n, p_i)$

The Pearsonian chi-square statistic is $\chi^2 = \sum_{i=1}^k\frac{(n_i - np_i)^2}{np_i}$

Using Stirling's approximation to factorials,

$f(n_1, n_2, \ldots, n_k) \simeq \frac{\sqrt{2\pi}\,e^{-n}\,n^{n+\frac12}}{\prod_1^k\sqrt{2\pi}\,e^{-n_i}\,n_i^{n_i+\frac12}}\prod_{i=1}^k p_i^{n_i} = \frac{n^{n+\frac12}}{(2\pi)^{\frac{k-1}{2}}}\cdot\frac{\prod_1^k p_i^{n_i}}{\prod_1^k n_i^{n_i+\frac12}} = \frac{\sqrt{n}}{(2\pi)^{\frac{k-1}{2}}\prod_1^k\sqrt{np_i}}\prod_{i=1}^k\left(\frac{np_i}{n_i}\right)^{n_i+\frac12}$

$\Rightarrow \log_e f(n_1, n_2, \ldots, n_k) \simeq C + \sum_{i=1}^k\left(n_i + \frac12\right)\log_e\frac{np_i}{n_i}$   (A.3)

where $C = \log_e\frac{\sqrt{n}}{(2\pi)^{\frac{k-1}{2}}\prod_1^k\sqrt{np_i}}$

We write $d_i = \frac{n_i - np_i}{\sqrt{np_iq_i}}$, $q_i = 1 - p_i$
$\Rightarrow n_i = np_i + d_i\sqrt{np_iq_i} \Rightarrow \frac{n_i}{np_i} = 1 + d_i\sqrt{\frac{q_i}{np_i}} \Rightarrow \frac{np_i}{n_i} = \left(1 + d_i\sqrt{\frac{q_i}{np_i}}\right)^{-1}$

$\Rightarrow \log_e f(n_1, \ldots, n_k) = C - \sum_1^k\left(np_i + \frac12 + d_i\sqrt{np_iq_i}\right)\log_e\left(1 + d_i\sqrt{\frac{q_i}{np_i}}\right)$

$= C - \sum_1^k\left(np_i + \frac12 + d_i\sqrt{np_iq_i}\right)\left(d_i\sqrt{\frac{q_i}{np_i}} - \frac{d_i^2q_i}{2np_i} + \frac{d_i^3q_i^{3/2}}{3(np_i)^{3/2}} - \cdots\right)$, provided $\left|d_i\sqrt{\frac{q_i}{np_i}}\right| < 1$

$= C - \sum_1^k\left(d_i\sqrt{np_iq_i} + \frac{d_i}{2}\sqrt{\frac{q_i}{np_i}} + d_i^2q_i - \frac{d_i^2q_i}{2} - \frac{d_i^2}{4}\frac{q_i}{np_i} + \frac{q_id_i^3}{\sqrt{n}}(\cdots) + \cdots\right)$

$= C - \sum_1^k\left(d_i\sqrt{np_iq_i} + \frac{d_i}{2}\sqrt{\frac{q_i}{np_i}} + \frac{d_i^2q_i}{2} - \frac{d_i^2}{4}\frac{q_i}{np_i} + \frac{q_id_i^3}{\sqrt{n}}(\cdots)\right)$   (A.4)

Note that $\sum d_i\sqrt{np_iq_i} = \sum_1^k(n_i - np_i) = n - n = 0$;

we assume $d_i^3 = o(\sqrt{n})$, i.e., $\frac{d_i^3}{\sqrt{n}} \to 0$, $\frac{d_i}{\sqrt{n}} \to 0 \Rightarrow \frac{d_i^2}{n} \to 0$

$\Rightarrow$ All the terms on the R.H.S. of (A.4) tend to zero except $\frac12d_i^2q_i$; thus (A.4) implies

$\log_e f \simeq C - \frac12\sum_1^k d_i^2q_i \Rightarrow f \simeq e^Ce^{-\frac12\sum_1^k d_i^2q_i}$

$\Rightarrow f \simeq \frac{\sqrt{n}}{(2\pi)^{\frac{k-1}{2}}\prod_{i=1}^k\sqrt{np_i}}\,e^{-\frac12\sum_{i=1}^k\frac{(n_i-np_i)^2}{np_i}} = \frac{1}{(2\pi)^{\frac{k-1}{2}}\sqrt{p_k}}\,e^{-\frac12\sum_1^k\frac{(n_i-np_i)^2}{np_i}}\cdot\frac{1}{\prod_1^{k-1}\sqrt{np_i}}$   (A.5)
We note that $\sum_1^k(n_i - np_i) = 0$,

i.e., $n_k - np_k = -\sum_1^{k-1}(n_i - np_i)$

$\Rightarrow \sum_{i=1}^k\frac{(n_i-np_i)^2}{np_i} = \sum_{i=1}^{k-1}\frac{(n_i-np_i)^2}{np_i} + \frac{(n_k-np_k)^2}{np_k} = \sum_1^{k-1}\frac{(n_i-np_i)^2}{np_i} + \frac{\left\{\sum_1^{k-1}(n_i-np_i)\right\}^2}{np_k}$   (A.6)

Writing $x_i = \frac{n_i - np_i}{\sqrt{np_i}}$, $i = 1(1)(k-1)$,

$|J| = \left|\frac{\partial(n_1, n_2, \ldots, n_{k-1})}{\partial(x_1, x_2, \ldots, x_{k-1})}\right| = \left|\mathrm{Diag}\left(\sqrt{np_1}, \ldots, \sqrt{np_{k-1}}\right)\right| = \prod_1^{k-1}\sqrt{np_i}$

(A.6) $\Rightarrow \sum_1^k\frac{(n_i-np_i)^2}{np_i} = \sum_1^{k-1}x_i^2 + \frac{\left(\sum_1^{k-1}\sqrt{np_i}\,x_i\right)^2}{np_k} = \sum_1^{k-1}\left(1 + \frac{p_i}{p_k}\right)x_i^2 + \sum\sum_{i\neq j}\frac{\sqrt{p_ip_j}}{p_k}x_ix_j = x'Ax$

where

$A_{(k-1)\times(k-1)} = \begin{pmatrix}1+\frac{p_1}{p_k} & \frac{\sqrt{p_1p_2}}{p_k} & \cdots & \frac{\sqrt{p_1p_{k-1}}}{p_k} \\ \frac{\sqrt{p_1p_2}}{p_k} & 1+\frac{p_2}{p_k} & \cdots & \frac{\sqrt{p_2p_{k-1}}}{p_k} \\ \cdots & \cdots & \cdots & \cdots \\ \frac{\sqrt{p_1p_{k-1}}}{p_k} & \cdots & \cdots & 1+\frac{p_{k-1}}{p_k}\end{pmatrix} = \begin{pmatrix}1+a_1^2 & a_1a_2 & \cdots & a_1a_{k-1} \\ a_1a_2 & 1+a_2^2 & \cdots & a_2a_{k-1} \\ \cdots & \cdots & \cdots & \cdots \\ a_1a_{k-1} & \cdots & \cdots & 1+a_{k-1}^2\end{pmatrix}$, where $a_i = \sqrt{\frac{p_i}{p_k}}\ \forall i = 1(1)k-1$.

Now, taking $a_i$ common from the i-th row,

$|A| = (a_1a_2\cdots a_{k-1})\begin{vmatrix}a_1+\frac{1}{a_1} & a_2 & \cdots & a_{k-1} \\ a_1 & a_2+\frac{1}{a_2} & \cdots & a_{k-1} \\ \cdots & \cdots & \cdots & \cdots \\ a_1 & a_2 & \cdots & a_{k-1}+\frac{1}{a_{k-1}}\end{vmatrix}$

and then $a_j$ common from the j-th column,

$= (a_1a_2\cdots a_{k-1})^2\begin{vmatrix}1+\frac{1}{a_1^2} & 1 & \cdots & 1 \\ 1 & 1+\frac{1}{a_2^2} & \cdots & 1 \\ \cdots & \cdots & \cdots & \cdots \\ 1 & 1 & \cdots & 1+\frac{1}{a_{k-1}^2}\end{vmatrix} = \begin{vmatrix}a_1^2+1 & a_1^2 & \cdots & a_1^2 \\ a_2^2 & a_2^2+1 & \cdots & a_2^2 \\ \cdots & \cdots & \cdots & \cdots \\ a_{k-1}^2 & a_{k-1}^2 & \cdots & a_{k-1}^2+1\end{vmatrix}$ (multiplying the i-th row by $a_i^2$)

Replacing the first row by the sum of all the rows ($R_1 = \sum R_i$),

$= \left(1 + \sum_1^{k-1}a_i^2\right)\begin{vmatrix}1 & 1 & \cdots & 1 \\ a_2^2 & a_2^2+1 & \cdots & a_2^2 \\ \cdots & \cdots & \cdots & \cdots \\ a_{k-1}^2 & a_{k-1}^2 & \cdots & a_{k-1}^2+1\end{vmatrix} = \left(1 + \sum_1^{k-1}a_i^2\right)\begin{vmatrix}1 & 1 & \cdots & 1 \\ 0 & 1 & \cdots & 0 \\ \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & \cdots & 1\end{vmatrix}$ (subtracting $a_i^2\times$ first row from the i-th row)

$= 1 + \sum_1^{k-1}a_i^2 = 1 + \sum_1^{k-1}\frac{p_i}{p_k} = \frac{\sum_1^k p_i}{p_k} = \frac{1}{p_k}$
$\Rightarrow$ (A.5) $\Rightarrow f(x_1, x_2, \ldots, x_{k-1}) = \frac{1}{(2\pi)^{\frac{k-1}{2}}\sqrt{p_k}}\,e^{-\frac12x'Ax}\cdot\frac{1}{\prod_1^{k-1}\sqrt{np_i}}\cdot|J|$

$= \frac{\sqrt{|A|}}{(2\pi)^{\frac{k-1}{2}}}\,e^{-\frac12x'Ax} = \frac{1}{(2\pi)^{\frac{k-1}{2}}|\Sigma|^{1/2}}\,e^{-\frac12x'\Sigma^{-1}x}$, where $A^{-1} = \Sigma$.
Since $\Sigma^{-1}$ is positive definite, there exists a non-singular V such that $\Sigma^{-1} = VV'$.

$\Rightarrow x'\Sigma^{-1}x = x'VV'x = y'y$, where $y = V'x$.

Using the transformation $(x_1, x_2, \ldots, x_{k-1}) \to (y_1, y_2, \ldots, y_{k-1})$,

$|J| = \frac{1}{|V|} = |\Sigma|^{1/2}$

$\Rightarrow f(y_1, y_2, \ldots, y_{k-1}) = \frac{1}{(2\pi)^{\frac{k-1}{2}}}\,e^{-\frac12y'y}$ $\Rightarrow y_1, y_2, \ldots, y_{k-1}$ are i.i.d. N(0, 1)

$\Rightarrow y'y \sim \chi^2_{k-1}$

$\Rightarrow x'\Sigma^{-1}x \sim \chi^2_{k-1} \Rightarrow \sum_{i=1}^k\frac{(n_i-np_i)^2}{np_i} \sim \chi^2_{k-1}$
Note This approximate distribution is valid if $\left|d_i\sqrt{\frac{q_i}{np_i}}\right| < 1$.

Again $d_i = \frac{n_i - np_i}{\sqrt{np_iq_i}}$; using the normal approximation, the effective range of $d_i$ is (-3, 3), so $d_i^2 \le 9$, i.e., $\max d_i^2 = 9$. Since $q_i < 1$, the condition holds if $np_i > 9$. So the approximation is valid if the expected frequency for each event is at least 10.

Again, if we consider the effective range of $d_i$ as (-2, 2), then the approximation is valid if the expected frequency for each event is at least 5.

It has been found by enquiry that if the expected frequencies are greater than 5, then the approximation is good enough.

If the expected frequencies of some classes are not at least 5, then some of the adjacent classes are pooled so that the expected frequencies of all classes after pooling are at least 5. If $k^*$ is the number of classes after pooling, then

$\sum_{i=1}^{k^*}\frac{(n_i^* - np_i^*)^2}{np_i^*} \sim \chi^2_{k^*-1}$,

where $n_i^*$ = observed frequency after pooling, $np_i^*$ = expected frequency after pooling.
Uses of the Pearsonian $\chi^2$:

(1) Test for goodness of fit: to test $H_0: p_i = p_i^0$ (specified) $\forall i = 1(1)k$. Under $H_0$,

$\chi^2 = \sum_{i=1}^k\frac{(n_i - np_i^0)^2}{np_i^0} \stackrel{a}{\sim} \chi^2_{k-1}$

$\Rightarrow \omega_0: \chi^2 > \chi^2_{\alpha;k-1}$
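A minimal goodness-of-fit sketch (Python; the die-throw counts are hypothetical):

    import numpy as np
    from scipy.stats import chi2

    def gof_chi2(obs, p0):
        """Pearson chi^2 = sum (n_i - n p_i^0)^2/(n p_i^0) ~ chi^2_{k-1} under H0."""
        obs = np.asarray(obs, float)
        exp = obs.sum() * np.asarray(p0, float)
        stat = np.sum((obs - exp) ** 2 / exp)
        return stat, chi2.sf(stat, obs.size - 1)

    # e.g. testing a fair die from 120 hypothetical throws
    print(gof_chi2([18, 24, 16, 22, 21, 19], [1/6] * 6))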
(2) Test for homogeneity of l populations with respect to k classes:

Classes   P_1    P_2    ...   P_j    ...   P_l
A_1       p_11   p_12   ...   p_1j   ...   p_1l
A_2       p_21   p_22   ...   p_2j   ...   p_2l
...       ...    ...          ...          ...
A_i       p_i1   p_i2   ...   p_ij   ...   p_il
...       ...    ...          ...          ...
A_k       p_k1   p_k2   ...   p_kj   ...   p_kl
Total     1      1      ...   1      ...   1

where $p_{ij}$ = the probability that an individual selected from the j-th population will belong to the i-th class.

We are to test $H_0: p_{i1} = p_{i2} = \cdots = p_{il}\ (= p_i$, say$)\ \forall i = 1(1)k$. To do this we draw samples from the l populations and classify the observations as frequencies $n_{ij}$ (i-th class, j-th population) with marginal totals $n_{i0}$ and $n_{0j}$, $\sum_{ij}n_{ij} = n$.

$\Rightarrow$ Under $H_0$, $\chi^2 = \sum_{j=1}^l\sum_{i=1}^k\frac{\left(n_{ij} - n_{0j}p_i\right)^2}{n_{0j}p_i} \sim \chi^2_{l(k-1)}$

The $p_i$'s are unknown and they are estimated by $\hat{p}_i = \frac{n_{i0}}{n}\ \forall i = 1(1)k$:

$\Rightarrow$ under $H_0$, $\chi^2 = \sum_{j=1}^l\sum_{i=1}^k\frac{\left(n_{ij} - \frac{n_{0j}n_{i0}}{n}\right)^2}{\frac{n_{0j}n_{i0}}{n}} \sim \chi^2_{l(k-1)-(k-1)} = \chi^2_{(k-1)(l-1)}$

$\Rightarrow \omega_0: \chi^2 > \chi^2_{\alpha;(k-1)(l-1)}$
(3) Test for independence of two attributes A and B:

A       B_1    B_2    ...   B_j    ...   B_l    Total
A_1     p_11   p_12   ...   p_1j   ...   p_1l   p_10
A_2     p_21   p_22   ...   p_2j   ...   p_2l   p_20
...     ...    ...          ...          ...    ...
A_i     p_i1   p_i2   ...   p_ij   ...   p_il   p_i0
...     ...    ...          ...          ...    ...
A_k     p_k1   p_k2   ...   p_kj   ...   p_kl   p_k0
Total   p_01   p_02   ...   p_0j   ...   p_0l   1

We are to test $H_0$: A and B are independent, i.e. to test $H_0: p_{ij} = p_{i0}p_{0j}\ \forall (i,j)$, from the observed frequencies:

A       B_1    B_2    ...   B_j    ...   B_l    Total
A_1     n_11   n_12   ...   n_1j   ...   n_1l   n_10
A_2     n_21   n_22   ...   n_2j   ...   n_2l   n_20
...     ...    ...          ...          ...    ...
A_i     n_i1   n_i2   ...   n_ij   ...   n_il   n_i0
...     ...    ...          ...          ...    ...
A_k     n_k1   n_k2   ...   n_kj   ...   n_kl   n_k0
Total   n_01   n_02   ...   n_0j   ...   n_0l   n

$P(n_{11}, \ldots, n_{kl}) = \frac{n!}{\prod_i\prod_j n_{ij}!}\prod_i\prod_j p_{ij}^{n_{ij}}$, $E(n_{ij}) = np_{ij}$, $i = 1(1)k$, $j = 1(1)l$.

$\Rightarrow \sum_i\sum_j\frac{(n_{ij} - np_{ij})^2}{np_{ij}} \stackrel{a}{\sim} \chi^2_{kl-1}$

Under $H_0$, estimating the $(k-1)+(l-1)$ marginal probabilities by $\hat{p}_{i0} = \frac{n_{i0}}{n}$ and $\hat{p}_{0j} = \frac{n_{0j}}{n}$, the statistic is asymptotically $\chi^2_{(k-1)(l-1)}$,

i.e., $\chi^2 = n\sum_i\sum_j\frac{n_{ij}^2}{n_{i0}n_{0j}} - n \stackrel{a}{\sim} \chi^2_{(k-1)(l-1)}$

$\Rightarrow \omega_0: \chi^2 > \chi^2_{\alpha;(k-1)(l-1)}$.
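A sketch of the computational form for an r x c table (Python; the counts are hypothetical, and scipy.stats.chi2_contingency gives the same result):

    import numpy as np
    from scipy.stats import chi2

    def independence_chi2(table):
        """chi^2 = n * sum n_ij^2/(n_i0 n_0j) - n ~ chi^2_{(k-1)(l-1)} under H0."""
        T = np.asarray(table, float)
        n = T.sum()
        row, col = T.sum(axis=1), T.sum(axis=0)
        stat = n * np.sum(T**2 / np.outer(row, col)) - n
        df = (T.shape[0] - 1) * (T.shape[1] - 1)
        return stat, chi2.sf(stat, df)

    print(independence_chi2([[20, 15, 10], [10, 25, 20]]))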
(i) The k x 2 table:

A       B_1   B_2   Total
A_1     a_1   b_1   T_1
A_2     a_2   b_2   T_2
...     ...   ...   ...
A_i     a_i   b_i   T_i
...     ...   ...   ...
A_k     a_k   b_k   T_k
Total   T_a   T_b   n

Here,

$\chi^2 = \sum_1^k\frac{\left(a_i - \frac{T_iT_a}{n}\right)^2}{\frac{T_iT_a}{n}} + \sum_1^k\frac{\left(b_i - \frac{T_iT_b}{n}\right)^2}{\frac{T_iT_b}{n}}$

Since $b_i - \frac{T_iT_b}{n} = (T_i - a_i) - \frac{T_i(n - T_a)}{n} = -\left(a_i - \frac{T_iT_a}{n}\right)$,

$\Rightarrow \chi^2 = \sum_1^k\left(a_i - \frac{T_iT_a}{n}\right)^2\frac{n}{T_i}\left(\frac{1}{T_a} + \frac{1}{T_b}\right) = \frac{n^2}{T_aT_b}\sum_1^k\frac{1}{T_i}\left(a_i - \frac{T_iT_a}{n}\right)^2$

$= \frac{n^2}{T_aT_b}\sum_1^k\left(\frac{a_i^2}{T_i} - \frac{2a_iT_a}{n} + \frac{T_iT_a^2}{n^2}\right) = \frac{n^2}{T_aT_b}\left[\sum_1^k\frac{a_i^2}{T_i} - \frac{2T_a^2}{n} + \frac{T_a^2}{n}\right]$

$= \frac{n^2}{T_aT_b}\left[\sum_1^k\frac{a_i^2}{T_i} - \frac{T_a^2}{n}\right]$

This formula, or its equivalent $\chi^2 = \frac{n^2}{T_aT_b}\left[\sum_1^k\frac{b_i^2}{T_i} - \frac{T_b^2}{n}\right]$, will be found more convenient for computational purposes.

$\omega_0: \chi^2 > \chi^2_{\alpha;k-1}$
(ii) k = 2, l = 2:

A       B_1   B_2   Total
A_1     a     b     a+b
A_2     c     d     c+d
Total   a+c   b+d   n = a+b+c+d

Here,

$\chi^2 = \frac{\left\{a - \frac{(a+b)(a+c)}{n}\right\}^2}{\frac{(a+b)(a+c)}{n}} + \frac{\left\{b - \frac{(a+b)(b+d)}{n}\right\}^2}{\frac{(a+b)(b+d)}{n}} + \frac{\left\{c - \frac{(c+d)(a+c)}{n}\right\}^2}{\frac{(c+d)(a+c)}{n}} + \frac{\left\{d - \frac{(c+d)(b+d)}{n}\right\}^2}{\frac{(c+d)(b+d)}{n}}$

Since each squared numerator equals $\frac{(ad-bc)^2}{n^2}$,

$\Rightarrow \chi^2 = \frac{(ad-bc)^2}{n}\left[\frac{1}{(a+b)(a+c)} + \frac{1}{(a+b)(b+d)} + \frac{1}{(c+d)(a+c)} + \frac{1}{(c+d)(b+d)}\right]$

$= \frac{(ad-bc)^2}{n}\left[\frac{n}{(a+b)(a+c)(b+d)} + \frac{n}{(a+c)(c+d)(b+d)}\right]$

$= \frac{(ad-bc)^2\,n}{(a+b)(c+d)(a+c)(b+d)}$
For small cell frequencies a correction for continuity is applied, adjusting the cells by 1/2 so as to reduce |ad - bc|.

Case 1 If ad < bc:

A       B_1      B_2      Total
A_1     a+1/2    b-1/2    a+b
A_2     c-1/2    d+1/2    c+d
Total   a+c      b+d      a+b+c+d

Here, $\left(a+\frac12\right)\left(d+\frac12\right) - \left(b-\frac12\right)\left(c-\frac12\right) = (ad-bc) + \frac{n}{2} = -|ad-bc| + \frac{n}{2}$, so in magnitude $|ad-bc| - \frac{n}{2}$ (since ad - bc < 0).

Case 2 If ad > bc:

A       B_1      B_2      Total
A_1     a-1/2    b+1/2    a+b
A_2     c+1/2    d-1/2    c+d
Total   a+c      b+d      a+b+c+d

Here, $\left(a-\frac12\right)\left(d-\frac12\right) - \left(b+\frac12\right)\left(c+\frac12\right) = (ad-bc) - \frac{n}{2} = |ad-bc| - \frac{n}{2}$

In either case the corrected statistic is

$\chi^2 = \frac{n\left(|ad-bc| - \frac{n}{2}\right)^2}{(a+b)(c+d)(a+c)(b+d)}$
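A sketch of the 2 x 2 statistic with and without the continuity correction (Python; the cell counts are hypothetical):

    from scipy.stats import chi2

    def chi2_2x2(a, b, c, d, yates=True):
        """n(|ad-bc| - n/2)^2 / ((a+b)(c+d)(a+c)(b+d)); drop n/2 when yates=False."""
        n = a + b + c + d
        num = abs(a * d - b * c) - (n / 2 if yates else 0)
        stat = n * max(num, 0) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
        return stat, chi2.sf(stat, 1)

    print(chi2_2x2(12, 5, 7, 15))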
Definition 1 Let $\{X_n\}$, $n = 1, 2, \ldots$ be a sequence of random variables. If for every $\varepsilon > 0$

$P\{|X_n - X| < \varepsilon\} \to 1$ as $n \to \infty$,

then $X_n$ is said to converge in probability to X, and we write it as $X_n \stackrel{P}{\to} X$. If X is degenerate, i.e. a constant, say c, then this convergence is known as the WLLN.

Definition 2 Let $\{X_n\}$, $n = 1, 2, \ldots$ be a sequence of random variables having distribution functions $\{F_n(x)\}$ and X be a random variable having distribution function $F(x)$. If $F_n(x) \to F(x)$ as $n \to \infty$ at all continuity points of $F(x)$, then we say $X_n$ converges in law to X, and we write it as $X_n \stackrel{L}{\to} X$; i.e., the asymptotic distribution of $X_n$ is nothing but the distribution of X.
Result 1(a): If $X_n \stackrel{P}{\to} X$ and $g(x)$ is a continuous function for all x, then $g(X_n) \stackrel{P}{\to} g(X)$.

Result 1(b): If $X_n \stackrel{P}{\to} C$ and $g(x)$ is continuous in the neighbourhood of C, then $g(X_n) \stackrel{P}{\to} g(C)$.

Result 2(a): $X_n \stackrel{P}{\to} X \Rightarrow X_n \stackrel{L}{\to} X$; $X_n \stackrel{P}{\to} C \Leftrightarrow X_n \stackrel{L}{\to} C$.

Result 2(b): $X_n \stackrel{L}{\to} X \Rightarrow g(X_n) \stackrel{L}{\to} g(X)$ if g is a continuous function.

Result 3: Let $\{X_n\}$ and $\{Y_n\}$ be sequences of random variables such that $X_n \stackrel{L}{\to} X$ and $Y_n \stackrel{P}{\to} C$, where X is a random variable and C is a constant. Then

(a) $X_n + Y_n \stackrel{L}{\to} X + C$; (b) $X_nY_n \stackrel{L}{\to} CX$;
(c) $\frac{X_n}{Y_n} \stackrel{L}{\to} \frac{X}{C}$ if $C \neq 0$; and
(d) $X_nY_n \stackrel{P}{\to} 0$ if C = 0.
Theorem 1 Let $\{T_n\}$ be a sequence of statistics such that $\sqrt{n}(T_n - \theta) \stackrel{L}{\to} X \sim N(0, \sigma^2(\theta))$. If $g(\xi)$ is a function admitting a derivative $g'(\xi)$ in the neighbourhood of $\theta$, then $\sqrt{n}\left(g(T_n) - g(\theta)\right) \stackrel{L}{\to} Y \sim N(0, \sigma^2(\theta)g'^2(\theta))$.

Sketch of proof: expanding, $g(T_n) = g(\theta) + (T_n - \theta)\{g'(\theta) + \varepsilon_n\}$, where $\varepsilon_n \to 0$ as $T_n \to \theta$. For any fixed $\delta' > 0$,

$P\{\sqrt{n}|T_n - \theta| < \sqrt{n}\,\delta'\} \to \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\,\sigma(\theta)}\,e^{-\frac{x^2}{2\sigma^2(\theta)}}\,dx = 1$

$\Rightarrow T_n \stackrel{P}{\to} \theta \Rightarrow P\{|\varepsilon_n| < \delta\} \to 1$ as $n \to \infty$

$\Rightarrow \varepsilon_n \stackrel{P}{\to} 0$   (A.8)

Again, $\sqrt{n}(T_n - \theta) \stackrel{L}{\to} X \sim N(0, \sigma^2(\theta))$   (A.9)

Combining (A.8) and (A.9) and using Result 3(d), we can write

$\sqrt{n}(T_n - \theta)\,\varepsilon_n \stackrel{P}{\to} 0$   (A.10)

Hence $\sqrt{n}\left(g(T_n) - g(\theta)\right) = g'(\theta)\sqrt{n}(T_n - \theta) + \sqrt{n}(T_n - \theta)\varepsilon_n \stackrel{L}{\to} N(0, \sigma^2(\theta)g'^2(\theta))$.

Note 2 If g' is continuous, then $g'(T_n) \stackrel{P}{\to} g'(\theta) \Rightarrow \frac{g'(\theta)}{g'(T_n)} \stackrel{P}{\to} 1$,

$\Rightarrow \frac{\sqrt{n}(g(T_n) - g(\theta))}{g'(T_n)} = \frac{\sqrt{n}(g(T_n) - g(\theta))}{g'(\theta)}\cdot\frac{g'(\theta)}{g'(T_n)}$
As the first part of the R.H.S. converges in law to $X \sim N(0, \sigma^2(\theta))$ and the second part converges in probability to 1, their product converges in law to $N(0, \sigma^2(\theta))$.

Note 3 Further, if $\sigma(\cdot)$ is continuous, then

$\frac{\sqrt{n}\left(g(T_n) - g(\theta)\right)}{g'(T_n)\sigma(T_n)} \stackrel{a}{\sim} N(0, 1).$

Proof $\frac{\sqrt{n}(g(T_n)-g(\theta))}{g'(T_n)\sigma(T_n)} = \frac{\sqrt{n}(g(T_n)-g(\theta))}{g'(T_n)\sigma(\theta)}\cdot\frac{\sigma(\theta)}{\sigma(T_n)}$

By Note 2, $\frac{\sqrt{n}(g(T_n)-g(\theta))}{g'(T_n)\sigma(\theta)} \stackrel{a}{\sim} N(0,1)$.

Also, $T_n \stackrel{P}{\to} \theta$ and $\sigma(\cdot)$ is continuous, so $\sigma(T_n) \stackrel{P}{\to} \sigma(\theta) \Rightarrow \frac{\sigma(\theta)}{\sigma(T_n)} \stackrel{P}{\to} 1$.

$\Rightarrow \frac{\sqrt{n}\left(g(T_n) - g(\theta)\right)}{g'(T_n)\sigma(T_n)} \stackrel{L}{\to} N(0, 1)$
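The delta method of Theorem 1 is easy to verify by simulation. A minimal Monte Carlo sketch (Python; the choice T_n = xbar and g(t) = t^2 is illustrative, so the target variance is sigma^2 g'(mu)^2 = 4 mu^2 sigma^2):

    import numpy as np

    # Check: sqrt(n)(g(T_n) - g(mu)) is approx N(0, sigma^2 * g'(mu)^2)
    rng = np.random.default_rng(0)
    mu, sigma, n, reps = 2.0, 1.0, 400, 20000
    xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    lhs = np.sqrt(n) * (xbar**2 - mu**2)
    print(lhs.var(), (2 * mu * sigma) ** 2)  # empirical vs delta-method variance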
Generalization of Theorem 1

Theorem 2 Let $T_n = (T_{1n}, T_{2n}, \ldots, T_{kn})'$, $n = 1, 2, \ldots$ be a sequence of statistics such that

$\sqrt{n}\left(T_n - \theta\right) = \begin{pmatrix}\sqrt{n}(T_{1n}-\theta_1) \\ \sqrt{n}(T_{2n}-\theta_2) \\ \cdots \\ \sqrt{n}(T_{kn}-\theta_k)\end{pmatrix} \stackrel{a}{\sim} N_k\left(0, \Sigma_{k\times k}(\theta)\right)$, where $\Sigma(\theta) = \left(\sigma_{ij}(\theta)\right)$.

Let $g(\ldots)$ be a function of k variables such that it is totally differentiable. Then

$\sqrt{n}\left(g(T_n) - g(\theta)\right) \stackrel{L}{\to} X \sim N_1\left(0, V(\theta)\right)$

where $V(\theta) = \sum_i^k\sum_j^k\frac{\partial g}{\partial\theta_i}\frac{\partial g}{\partial\theta_j}\sigma_{ij}(\theta)$, $\frac{\partial g}{\partial\theta_i} = \left.\frac{\partial g}{\partial T_{in}}\right|_{T_n = \theta}$.
By total differentiability, $g(T_n) - g(\theta) = \sum_1^k(T_{in}-\theta_i)\frac{\partial g}{\partial\theta_i} + \|T_n - \theta\|\,\varepsilon_n$, with $\varepsilon_n \stackrel{P}{\to} 0$   (A.13)

and $\sqrt{n}\,\|T_n - \theta\|\,\varepsilon_n \stackrel{P}{\to} 0$.

$\Rightarrow$ (A.13) implies $\sqrt{n}\left\{g(T_n) - g(\theta)\right\} - \sqrt{n}\sum_1^k(T_{in}-\theta_i)\frac{\partial g}{\partial\theta_i} = \sqrt{n}\,\|T_n - \theta\|\,\varepsilon_n \stackrel{P}{\to} 0$

i.e., $Y_n - X_n \stackrel{P}{\to} 0$

where $Y_n = \sqrt{n}\left\{g(T_n) - g(\theta)\right\}$ and $X_n = \sqrt{n}\sum_1^k(T_{in}-\theta_i)\frac{\partial g}{\partial\theta_i}$.

We note that $X_n$, being a linear function of the normal variables $\sqrt{n}(T_{in}-\theta_i)$, $i = 1(1)k$, will be asymptotically normal with mean 0 and variance $\sum_i^k\sum_j^k\frac{\partial g}{\partial\theta_i}\frac{\partial g}{\partial\theta_j}\sigma_{ij}(\theta) = V(\theta)$,

i.e., $X_n \stackrel{L}{\to} X \sim N(0, V(\theta))$

$\Rightarrow (Y_n - X_n) + X_n \stackrel{L}{\to} X \sim N(0, V(\theta))$

i.e., $Y_n \stackrel{L}{\to} X \sim N(0, V(\theta))$

i.e., $\sqrt{n}\left(g(T_n) - g(\theta)\right) \stackrel{a}{\sim} N(0, V(\theta))$
For a random sample $x_1, \ldots, x_n$ define the sample raw moment $m_r' = \frac1n\sum x_i^r$, the sample moment about $\mu$, $\bar{m}_r' = \frac1n\sum(x_i-\mu)^r$, and the sample central moment $m_r = \frac1n\sum(x_i-\bar{x})^r$.

(i) To find $E(m_r')$, $V(m_r')$, $\mathrm{Cov}(m_r', m_s')$:

$E(m_r') = \frac1n\sum_1^nE(x_i^r) = \frac1n\cdot n\mu_r' = \mu_r'$

$\mathrm{Cov}(m_r', m_s') = E(m_r'm_s') - \mu_r'\mu_s' = \frac{1}{n^2}E\left\{\sum x_i^r\sum x_i^s\right\} - \mu_r'\mu_s'$

$= \frac{1}{n^2}\left[\sum E(x_i^{r+s}) + \sum\sum_{i\neq j}E(x_i^r)E(x_j^s)\right] - \mu_r'\mu_s' = \frac{\mu_{r+s}' - \mu_r'\mu_s'}{n}$

$\Rightarrow V(m_r') = \frac1n\left(\mu_{2r}' - \mu_r'^2\right)$

$\Rightarrow \frac{\sqrt{n}\left(m_r' - \mu_r'\right)}{\sqrt{\mu_{2r}' - \mu_r'^2}} \stackrel{L}{\to} N(0, 1)$

Since the sample s.d. $s = \sqrt{\frac1n\sum_{i=1}^n(x_i-\bar{x})^2}$ is a consistent estimator of $\sigma$, $s \stackrel{P}{\to} \sigma$, i.e. $\frac{\sigma}{s} \stackrel{P}{\to} 1$

$\Rightarrow \frac{\sqrt{n}(\bar{x}-\mu)}{s} = \frac{\sqrt{n}(\bar{x}-\mu)}{\sigma}\cdot\frac{\sigma}{s} \stackrel{L}{\to} N(0, 1)$
(ii) To find $E(\bar{m}_r')$, $V(\bar{m}_r')$, $\mathrm{Cov}(\bar{m}_r', \bar{m}_s')$:

$\bar{m}_r' = \frac1n\sum_{i=1}^n(x_i-\mu)^r$, $\Rightarrow E(\bar{m}_r') = \mu_r$

$E(\bar{m}_r'\bar{m}_s') = \frac{1}{n^2}E\left(\sum_{i=1}^n(x_i-\mu)^r\sum_{i=1}^n(x_i-\mu)^s\right)$

$= \frac{1}{n^2}\left[\sum_1^nE(x_i-\mu)^{r+s} + \sum\sum_{i\neq j}E(x_i-\mu)^rE(x_j-\mu)^s\right]$

$= \frac{1}{n^2}\left[n\mu_{r+s} + n(n-1)\mu_r\mu_s\right] = \frac1n\left[\mu_{r+s} + (n-1)\mu_r\mu_s\right]$

$\Rightarrow \mathrm{Cov}(\bar{m}_r', \bar{m}_s') = \frac1n\left[\mu_{r+s} + (n-1)\mu_r\mu_s\right] - \mu_r\mu_s = \frac1n\left(\mu_{r+s} - \mu_r\mu_s\right)$

$\Rightarrow V(\bar{m}_r') = \frac1n\left(\mu_{2r} - \mu_r^2\right)$

We note that $\bar{m}_r' = \frac1n\sum_{i=1}^n(x_i-\mu)^r = \frac1n\sum_i Z_i$, where $Z_i = (x_i-\mu)^r$, $E(Z_i) = \mu_r$ and $V(Z_i) = \mu_{2r} - \mu_r^2$.

For $x_1, x_2, \ldots, x_n$ i.i.d., $Z_1, Z_2, \ldots, Z_n$ are also i.i.d.

$\Rightarrow \frac{\sqrt{n}(\bar{Z}-\mu_r)}{\sqrt{\mu_{2r}-\mu_r^2}} \stackrel{L}{\to} N(0, 1)$, that is, $\frac{\sqrt{n}(\bar{m}_r'-\mu_r)}{\sqrt{\mu_{2r}-\mu_r^2}} \stackrel{a}{\sim} N(0, 1)$
(iii) The central moment

$m_r = \frac1n\sum_{i=1}^n(x_i-\bar{x})^r = \frac1n\sum\left[(x_i-\mu) - (\bar{x}-\mu)\right]^r = \frac1n\sum\left[(x_i-\mu) - \bar{m}_1'\right]^r$

$= \frac1n\sum_{i=1}^n\left[(x_i-\mu)^r - \binom{r}{1}(x_i-\mu)^{r-1}\bar{m}_1' + \cdots + (-1)^{r-1}\binom{r}{r-1}(x_i-\mu)\bar{m}_1'^{\,r-1} + (-1)^r\bar{m}_1'^{\,r}\right]$

$= \bar{m}_r' - \binom{r}{1}\bar{m}_{r-1}'\bar{m}_1' + \cdots + (-1)^r\bar{m}_1'^{\,r} = g\left(\bar{m}_1', \bar{m}_2', \ldots, \bar{m}_r'\right) = g(\bar{m}')$

We have observed that $\sqrt{n}\left(\bar{m}_j' - \mu_j\right) \stackrel{a}{\sim} N\left(0, \mu_{2j} - \mu_j^2\right)\ \forall j = 1(1)r$

$\Rightarrow \sqrt{n}\left(\bar{m}' - \mu\right) = \sqrt{n}\begin{pmatrix}\bar{m}_1'-\mu_1 \\ \bar{m}_2'-\mu_2 \\ \cdots \\ \bar{m}_r'-\mu_r\end{pmatrix} \stackrel{a}{\sim} N_r\left(0, \Sigma_{r\times r}\right)$

where $\Sigma_{r\times r} = \left(\sigma_{ij}(\mu)\right)$ and $\sigma_{ij}(\mu) = \mathrm{Cov}\left(\sqrt{n}\,\bar{m}_i', \sqrt{n}\,\bar{m}_j'\right) = n\,\mathrm{Cov}(\bar{m}_i', \bar{m}_j') = \mu_{i+j} - \mu_i\mu_j$.

So by Theorem 2,

$\sqrt{n}\left(g(\bar{m}') - g(\mu)\right) \stackrel{a}{\sim} N\left(0, V(\mu)\right)$

where $V(\mu) = \sum_{i=1}^r\sum_{j=1}^r\frac{dg}{d\mu_i}\frac{dg}{d\mu_j}\sigma_{ij}(\mu)$, $\frac{dg}{d\mu_i} = \left.\frac{dg}{d\bar{m}_i'}\right|_{\bar{m}' = \mu}$.
$g(\mu_1, \mu_2, \ldots, \mu_r) = \mu_r - \binom{r}{1}\mu_{r-1}\mu_1 + \cdots + (-1)^r\mu_1^r = \mu_r$ (since $\mu_1 = 0$)

$\frac{dg}{d\mu_1} = \left.\frac{dg}{d\bar{m}_1'}\right|_{\bar{m}'=\mu} = -r\mu_{r-1}$

$\frac{dg}{d\mu_r} = \left.\frac{dg}{d\bar{m}_r'}\right|_{\bar{m}'=\mu} = 1$, and $\frac{dg}{d\mu_i} = 0\ \forall i = 2(1)r-1$

$\Rightarrow V(\mu) = \left(\frac{dg}{d\mu_1}\right)^2\sigma_{11}(\mu) + \left(\frac{dg}{d\mu_r}\right)^2\sigma_{rr}(\mu) + 2\frac{dg}{d\mu_1}\frac{dg}{d\mu_r}\sigma_{1r}(\mu)$

$= r^2\mu_{r-1}^2\left(\mu_2 - \mu_1^2\right) + \left(\mu_{2r} - \mu_r^2\right) + 2(-r\mu_{r-1})\left(\mu_{r+1} - \mu_1\mu_r\right)$

$= r^2\mu_{r-1}^2\mu_2 + \mu_{2r} - \mu_r^2 - 2r\mu_{r-1}\mu_{r+1}$

$\Rightarrow \sqrt{n}(m_r - \mu_r) \stackrel{a}{\sim} N(0, V(\mu))$, i.e., $m_r \stackrel{a}{\sim} N\left(\mu_r, \frac{V(\mu)}{n}\right)$

In particular,

for r = 2: $m_2 = s^2 \stackrel{a}{\sim} N\left(\mu_2 = \sigma^2, \frac{\mu_4 - \mu_2^2}{n}\right)$, i.e. $s^2 \stackrel{a}{\sim} N\left(\sigma^2, \frac{\mu_4 - \sigma^4}{n}\right)$

for r = 3: $m_3 \stackrel{a}{\sim} N\left(\mu_3, \frac{9\mu_2^3 + \mu_6 - \mu_3^2 - 6\mu_2\mu_4}{n}\right)$

for r = 4: $m_4 \stackrel{a}{\sim} N\left(\mu_4, \frac{16\mu_3^2\mu_2 + \mu_8 - \mu_4^2 - 8\mu_3\mu_5}{n}\right)$

Again, if sampling is from a normal distribution $N(\mu, \sigma^2)$, then $\mu_3 = \mu_5 = \cdots = 0$ and $\mu_{2r} = (2r-1)(2r-3)\cdots3\cdot1\,\sigma^{2r}$,

i.e., $\mu_4 = 3\sigma^4$, $\mu_6 = 15\sigma^6$, $\mu_8 = 105\sigma^8$.

$\Rightarrow s^2 \stackrel{a}{\sim} N\left(\sigma^2, \frac{2\sigma^4}{n}\right)$, $m_3 \stackrel{a}{\sim} N\left(0, \frac{6\sigma^6}{n}\right)$, $m_4 \stackrel{a}{\sim} N\left(3\sigma^4, \frac{96\sigma^8}{n}\right)$
Thus $\frac{\sqrt{n}(s^2-\sigma^2)}{\sigma^2} \stackrel{a}{\sim} N(0, 2)$ and, as $s^2 \stackrel{P}{\to} \sigma^2$, $\frac{\sqrt{n}(s^2-\sigma^2)}{s^2} \stackrel{a}{\sim} N(0, 2)$; this can be used for testing hypotheses regarding $\sigma^2$.

Note For testing $H_0: \sigma_1 = \sigma_2$:

$s_1 \stackrel{a}{\sim} N\left(\sigma_1, \frac{\sigma_1^2}{2n_1}\right)$ and $s_2 \stackrel{a}{\sim} N\left(\sigma_2, \frac{\sigma_2^2}{2n_2}\right)$

$\Rightarrow s_1 - s_2 \stackrel{a}{\sim} N\left(\sigma_1 - \sigma_2,\ \frac{\sigma_1^2}{2n_1} + \frac{\sigma_2^2}{2n_2}\right)$

Under $H_0$, $\frac{s_1 - s_2}{\hat{\sigma}\sqrt{\frac{1}{2n_1}+\frac{1}{2n_2}}} \stackrel{a}{\sim} N(0, 1)$, where the unknown $\sigma$ is estimated as $\hat{\sigma} = \frac{n_1s_1 + n_2s_2}{n_1 + n_2}$.

(iv) $\mathrm{Cov}(m_r, \bar{x})$: since $\bar{x} = \frac1n\sum_{i=1}^n\{(x_i-\mu)+\mu\} = \bar{m}_1' + \mu$ and $m_r \approx \bar{m}_r' - r\mu_{r-1}\bar{m}_1'$,

$\mathrm{Cov}(m_r, \bar{x}) = \mathrm{Cov}\left(\bar{m}_r' - r\mu_{r-1}\bar{m}_1',\ \bar{m}_1' + \mu\right) = \mathrm{Cov}(\bar{m}_r', \bar{m}_1') - r\mu_{r-1}V(\bar{m}_1')$

$= \frac1n\mu_{r+1} - r\mu_{r-1}\frac{\mu_2}{n} = \frac1n\left(\mu_{r+1} - r\mu_{r-1}\mu_2\right)$
For the sample coefficient of variation $v = \frac{\sqrt{m_2}}{\bar{x}}$ as an estimator of $V = \frac{\sqrt{\mu_2}}{\mu_1'}$ (this identification of the statistic is inferred from the partial derivatives recorded in the text), take $\theta = (\theta_1, \theta_2)' = (\mu_2, \mu_1')'$ and $g(\theta) = \frac{\sqrt{\theta_1}}{\theta_2}$; then

$\frac{dg}{d\theta_1} = \frac{1}{2\sqrt{\mu_2}\,\mu_1'}$, $\frac{dg}{d\theta_2} = -\frac{\sqrt{\mu_2}}{\mu_1'^2}$, $\sigma_{11}(\theta) = \mu_4 - \mu_2^2$, $\sigma_{22}(\theta) = \mu_2$, $\sigma_{12}(\theta) = \mu_3$

$V(\theta) = \frac{\mu_4 - \mu_2^2}{4\mu_2\mu_1'^2} + \frac{\mu_2^2}{\mu_1'^4} - \frac{\mu_3}{\mu_1'^3}$

For the sample skewness $g_1 = \frac{m_3}{m_2^{3/2}}$ estimating $\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}}$:

$\sqrt{n}\left(g_1 - \gamma_1\right) \stackrel{a}{\sim} N(0, V(\theta))$

where, with $\theta = (\mu_3, \mu_2)'$, $\frac{dg}{d\theta_1} = \frac{1}{\mu_2^{3/2}}$, $\frac{dg}{d\theta_2} = -\frac32\frac{\mu_3}{\mu_2^{5/2}}$, and

$V(\theta) = \frac{1}{\mu_2^3}\left(\mu_6 - \mu_3^2 + 9\mu_2^3 - 6\mu_4\mu_2\right) + \frac94\frac{\mu_3^2}{\mu_2^5}\left(\mu_4 - \mu_2^2\right) - 3\frac{\mu_3}{\mu_2^4}\left(\mu_5 - 4\mu_2\mu_3\right)$

For the sample kurtosis ratio $g_2 = \frac{m_4}{m_2^2}$ estimating $\gamma_2 = \frac{\mu_4}{\mu_2^2}$:

$\sqrt{n}\left(T_n - \theta\right) = \sqrt{n}\binom{m_4 - \mu_4}{m_2 - \mu_2} \stackrel{a}{\sim} N_2\left(0, \Sigma\right)$

where $\Sigma = \left(\sigma_{ij}(\theta)\right) = \begin{pmatrix}\mu_8 - \mu_4^2 + 16\mu_3^2\mu_2 - 8\mu_3\mu_5 & \mu_6 - \mu_4\mu_2 - 4\mu_3^2 \\ \mu_6 - \mu_4\mu_2 - 4\mu_3^2 & \mu_4 - \mu_2^2\end{pmatrix}$

$\Rightarrow$ By Theorem 2, $\sqrt{n}\left(g(T_n) - g(\theta)\right) \stackrel{a}{\sim} N(0, V(\theta))$, i.e., $\sqrt{n}(g_2 - \gamma_2) \stackrel{a}{\sim} N(0, V(\theta))$

where $\frac{dg}{d\theta_1} = \frac{1}{\mu_2^2}$ and $\frac{dg}{d\theta_2} = -\frac{2\mu_4}{\mu_2^3}$, so that

$V(\theta) = \frac{\mu_8 - \mu_4^2 + 16\mu_3^2\mu_2 - 8\mu_3\mu_5}{\mu_2^4} + \frac{4\mu_4^2\left(\mu_4 - \mu_2^2\right)}{\mu_2^6} - \frac{4\mu_4\left(\mu_6 - \mu_4\mu_2 - 4\mu_3^2\right)}{\mu_2^5}$
For a bivariate sample define

$m_{10}' = \frac1n\sum x_i = \bar{x}$, $\mu_{10}' = E(X) = \mu_x$

$m_{01}' = \frac1n\sum y_i = \bar{y}$, $\mu_{01}' = E(Y) = \mu_y$

$\bar{m}_{rs}' = \frac1n\sum_1^n\left(x_i - \mu_{10}'\right)^r\left(y_i - \mu_{01}'\right)^s = \frac1n\sum_1^n(x_i-\mu_x)^r(y_i-\mu_y)^s$

$m_{rs} = \frac1n\sum_1^n(x_i-\bar{x})^r(y_i-\bar{y})^s$

$\mu_{rs} = E(X-\mu_x)^r(Y-\mu_y)^s$

For the raw product moments $m_{rs}' = \frac1n\sum x_i^ry_i^s$:

$E(m_{rs}') = \frac1n\sum_1^nE(x_i^ry_i^s) = \mu_{rs}'$

$E(m_{rs}'m_{uv}') = \frac{1}{n^2}E\left\{\sum x_i^ry_i^s\sum x_i^uy_i^v\right\} = \frac{1}{n^2}E\left\{\sum_1^nx_i^{r+u}y_i^{s+v} + \sum\sum_{i\neq j}x_i^ry_i^sx_j^uy_j^v\right\}$

$= \frac{1}{n^2}\left[n\mu_{r+u,s+v}' + n(n-1)\mu_{rs}'\mu_{uv}'\right] = \frac{\mu_{r+u,s+v}' + (n-1)\mu_{rs}'\mu_{uv}'}{n}$

$\Rightarrow \mathrm{Cov}(m_{rs}', m_{uv}') = \frac1n\left[\mu_{r+u,s+v}' - \mu_{rs}'\mu_{uv}'\right]$, $V(m_{rs}') = \frac1n\left(\mu_{2r,2s}' - \mu_{rs}'^2\right)$

Similarly, for the moments about the population means,

$E(\bar{m}_{rs}') = \mu_{rs}$

$E(\bar{m}_{rs}'\bar{m}_{uv}') = \frac1n\left[\mu_{r+u,s+v} + (n-1)\mu_{rs}\mu_{uv}\right]$

$\Rightarrow \mathrm{Cov}(\bar{m}_{rs}', \bar{m}_{uv}') = \frac1n\left(\mu_{r+u,s+v} - \mu_{rs}\mu_{uv}\right)$, $V(\bar{m}_{rs}') = \frac1n\left(\mu_{2r,2s} - \mu_{rs}^2\right)$

Now,

$m_{rs} = \frac1n\sum_1^n(x_i-\bar{x})^r(y_i-\bar{y})^s = \frac1n\sum_1^n\left\{(x_i-\mu_{10}') - (\bar{x}-\mu_{10}')\right\}^r\left\{(y_i-\mu_{01}') - (\bar{y}-\mu_{01}')\right\}^s$

Since $\bar{x} - \mu_{10}' = \bar{m}_{10}'$ and $\bar{y} - \mu_{01}' = \bar{m}_{01}'$,

$m_{rs} = \frac1n\sum_1^n\left\{(x_i-\mu_{10}') - \bar{m}_{10}'\right\}^r\left\{(y_i-\mu_{01}') - \bar{m}_{01}'\right\}^s$

$= \bar{m}_{rs}' - \binom{r}{1}\bar{m}_{r-1,s}'\bar{m}_{10}' - \binom{s}{1}\bar{m}_{r,s-1}'\bar{m}_{01}' + \binom{r}{1}\binom{s}{1}\bar{m}_{r-1,s-1}'\bar{m}_{10}'\bar{m}_{01}' + \cdots + (-1)^{r+s}\bar{m}_{10}'^{\,r}\bar{m}_{01}'^{\,s}$

$= g\left(\bar{m}_{ij}';\ i = 0(1)r,\ j = 0(1)s,\ (i,j) \neq (0,0)\right)$

$\Rightarrow m_{rs} = g(\bar{m}')$,
say, where $\bar{m}'$ is the $\{(r+1)(s+1)-1\}\times1$ vector $\left(\bar{m}_{10}', \bar{m}_{01}', \ldots, \bar{m}_{rs}'\right)'$.

Using the expansion in Taylor's series,

$m_{rs} = g\left(\mu_{ij};\ i = 0(1)r,\ j = 0(1)s,\ (i,j)\neq(0,0)\right) + \sum_{i=0}^r\sum_{j=0}^s\left(\bar{m}_{ij}' - \mu_{ij}\right)\left.\frac{\partial g}{\partial\bar{m}_{ij}'}\right|_{\bar{m}'=\mu} + \cdots$

where $\mu = (\mu_{10}, \mu_{01}, \ldots, \mu_{rs})'$

$= \mu_{rs} + \sum_i\sum_j\left(\bar{m}_{ij}' - \mu_{ij}\right)\left.\frac{\partial g}{\partial\bar{m}_{ij}'}\right|_{\bar{m}'=\mu}$ (as $\mu_{01} = \mu_{10} = 0$)

Now $\left.\frac{\partial g}{\partial\bar{m}_{10}'}\right|_{\bar{m}'=\mu} = -r\mu_{r-1,s}$, $\left.\frac{\partial g}{\partial\bar{m}_{01}'}\right|_{\bar{m}'=\mu} = -s\mu_{r,s-1}$, $\left.\frac{\partial g}{\partial\bar{m}_{rs}'}\right|_{\bar{m}'=\mu} = 1$, and all other partial derivatives vanish.

$\mathrm{Cov}(m_{rs}, m_{uv}) = \mathrm{Cov}\left(\bar{m}_{rs}' - r\mu_{r-1,s}\bar{m}_{10}' - s\mu_{r,s-1}\bar{m}_{01}',\ \bar{m}_{uv}' - u\mu_{u-1,v}\bar{m}_{10}' - v\mu_{u,v-1}\bar{m}_{01}'\right)$

$= \frac1n\left[\mu_{r+u,s+v} - \mu_{rs}\mu_{uv} - u\mu_{u-1,v}\mu_{r+1,s} - v\mu_{u,v-1}\mu_{r,s+1} - r\mu_{r-1,s}\mu_{u+1,v} + ru\mu_{r-1,s}\mu_{u-1,v}\mu_{20} + rv\mu_{r-1,s}\mu_{u,v-1}\mu_{11} - s\mu_{r,s-1}\mu_{u,v+1} + us\mu_{r,s-1}\mu_{u-1,v}\mu_{11} + sv\mu_{r,s-1}\mu_{u,v-1}\mu_{02}\right]$

$\Rightarrow V(m_{rs}) = \frac1n\left[\mu_{2r,2s} - \mu_{rs}^2 + r^2\mu_{r-1,s}^2\mu_{20} + s^2\mu_{r,s-1}^2\mu_{02} - 2r\mu_{r-1,s}\mu_{r+1,s} - 2s\mu_{r,s-1}\mu_{r,s+1} + 2rs\mu_{r-1,s}\mu_{r,s-1}\mu_{11}\right]$

$\Rightarrow V(m_{20}) = \frac1n\left(\mu_{40} - \mu_{20}^2\right)$; $\mathrm{Cov}(m_{20}, m_{02}) = \frac1n\left[\mu_{22} - \mu_{20}\mu_{02}\right]$

$V(m_{02}) = \frac1n\left(\mu_{04} - \mu_{02}^2\right)$; $\mathrm{Cov}(m_{20}, m_{11}) = \frac1n\left[\mu_{31} - \mu_{20}\mu_{11}\right]$

$V(m_{11}) = \frac1n\left(\mu_{22} - \mu_{11}^2\right)$; $\mathrm{Cov}(m_{02}, m_{11}) = \frac1n\left[\mu_{13} - \mu_{02}\mu_{11}\right]$
Sample correlation $r = \frac{m_{11}}{\sqrt{m_{20}m_{02}}} = g(m) = g(T_n)$, say,

where $m = (m_{20}, m_{02}, m_{11})'$ and $T_n = (T_{1n}, T_{2n}, T_{3n})' = (m_{20}, m_{02}, m_{11})'$.

$\Rightarrow \sqrt{n}\left(T_n - \theta\right) \stackrel{a}{\sim} N_3\left(0, \Sigma_{3\times3}\right)$

where $\theta = \begin{pmatrix}\theta_1 \\ \theta_2 \\ \theta_3\end{pmatrix} = \begin{pmatrix}\mu_{20} \\ \mu_{02} \\ \mu_{11}\end{pmatrix} = \mu$, i.e., $\sqrt{n}(m - \mu) \stackrel{a}{\sim} N_3(0, \Sigma)$,

and $\Sigma = \left(\sigma_{ij}(\theta)\right) = \begin{pmatrix}\mu_{40}-\mu_{20}^2 & \mu_{22}-\mu_{20}\mu_{02} & \mu_{31}-\mu_{20}\mu_{11} \\ & \mu_{04}-\mu_{02}^2 & \mu_{13}-\mu_{02}\mu_{11} \\ & & \mu_{22}-\mu_{11}^2\end{pmatrix}$ (symmetric)

$\rho = \frac{\mu_{11}}{\sqrt{\mu_{20}\mu_{02}}} = g(\mu) = g(\theta)$

$\sqrt{n}\left(g(T_n) - g(\theta)\right) \stackrel{a}{\sim} N(0, V(\theta))$, i.e., $\sqrt{n}(r - \rho) \stackrel{a}{\sim} N(0, V(\theta))$

where $V(\theta) = \sum_{i=1}^3\sum_{j=1}^3\frac{\partial g}{\partial\theta_i}\frac{\partial g}{\partial\theta_j}\sigma_{ij}(\theta)$ and $\frac{\partial g}{\partial\theta_i} = \left.\frac{\partial g}{\partial T_{in}}\right|_{T_n=\theta}$,

i.e. $V(\theta) = \left(\frac{\partial g}{\partial\theta_1}\right)^2\sigma_{11}(\theta) + \left(\frac{\partial g}{\partial\theta_2}\right)^2\sigma_{22}(\theta) + \left(\frac{\partial g}{\partial\theta_3}\right)^2\sigma_{33}(\theta) + 2\frac{\partial g}{\partial\theta_1}\frac{\partial g}{\partial\theta_2}\sigma_{12}(\theta) + 2\frac{\partial g}{\partial\theta_1}\frac{\partial g}{\partial\theta_3}\sigma_{13}(\theta) + 2\frac{\partial g}{\partial\theta_2}\frac{\partial g}{\partial\theta_3}\sigma_{23}(\theta)$

$= \rho^2\left[\frac{\mu_{22}}{\mu_{11}^2} + \frac14\left(\frac{\mu_{40}}{\mu_{20}^2} + \frac{\mu_{04}}{\mu_{02}^2} + \frac{2\mu_{22}}{\mu_{20}\mu_{02}}\right) - \left(\frac{\mu_{31}}{\mu_{20}\mu_{11}} + \frac{\mu_{13}}{\mu_{11}\mu_{02}}\right)\right]$

If the sampling is from $N_2\left(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho\right)$, then

$\mu_{40} = 3\sigma_1^4$, $\mu_{04} = 3\sigma_2^4$, $\mu_{11} = \rho\sigma_1\sigma_2$, $\mu_{22} = \sigma_1^2\sigma_2^2\left(1+2\rho^2\right)$, $\mu_{13} = 3\rho\sigma_1\sigma_2^3$, $\mu_{31} = 3\rho\sigma_1^3\sigma_2$, $\mu_{20} = \sigma_1^2$, $\mu_{02} = \sigma_2^2$

Using these values in the expression of $V(\theta)$, we get

$V(\theta) = \left(1-\rho^2\right)^2$

$\Rightarrow \sqrt{n}(r-\rho) \stackrel{a}{\sim} N\left(0, \left(1-\rho^2\right)^2\right)$, i.e., $r \stackrel{a}{\sim} N\left(\rho, \frac{(1-\rho^2)^2}{n}\right)$

This result can be used for testing hypotheses regarding $\rho$.
(i) $H_0: \rho = \rho_0$; under $H_0$, $\tau = \frac{\sqrt{n}(r-\rho_0)}{1-\rho_0^2} \stackrel{a}{\sim} N(0, 1)$

(ii) $H_0: \rho_1 = \rho_2\ (= \rho$, say$)$; $r_1 \stackrel{a}{\sim} N\left(\rho_1, \frac{(1-\rho_1^2)^2}{n_1}\right)$, $r_2 \stackrel{a}{\sim} N\left(\rho_2, \frac{(1-\rho_2^2)^2}{n_2}\right)$

$\Rightarrow r_1 - r_2 \stackrel{a}{\sim} N\left(\rho_1 - \rho_2,\ \frac{(1-\rho_1^2)^2}{n_1} + \frac{(1-\rho_2^2)^2}{n_2}\right)$

Under $H_0$, $\tau = \frac{r_1 - r_2}{(1-\rho^2)\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} \stackrel{a}{\sim} N(0, 1)$

If $\rho$ is unknown, then it is estimated by $\hat{\rho} = \frac{n_1r_1 + n_2r_2}{n_1 + n_2}$.

If $\rho$ is known, then the efficiency of the test will be good enough, but if it is unknown, then the efficiency will be diminished. We can use the estimate of $\rho$ only when the sample sizes are very large. Otherwise, we transform the statistic so that its distribution is independent of $\rho$.
For the binomial proportion p = X/n the angular transformation gives $\sin^{-1}\sqrt{p} \stackrel{a}{\sim} N\left(\sin^{-1}\sqrt{P}, \frac{1}{4n}\right)$, with variance independent of P.

Uses: (i) $H_0: P = P_0$

Under $H_0$, $\tau = \left(\sin^{-1}\sqrt{p} - \sin^{-1}\sqrt{P_0}\right)2\sqrt{n} \stackrel{a}{\sim} N(0, 1)$

$\omega_0: |\tau| > \tau_{\alpha/2}$, where $H_1: P \neq P_0$.

Interval estimate of P:

$P_r\left[-\tau_{\alpha/2} \le 2\sqrt{n}\left(\sin^{-1}\sqrt{p} - \sin^{-1}\sqrt{P}\right) \le \tau_{\alpha/2}\right] = 1-\alpha$

i.e., $P_r\left[\sin^2\left(\sin^{-1}\sqrt{p} - \frac{\tau_{\alpha/2}}{2\sqrt{n}}\right) \le P \le \sin^2\left(\sin^{-1}\sqrt{p} + \frac{\tau_{\alpha/2}}{2\sqrt{n}}\right)\right] = 1-\alpha$

(ii) $H_0: P_1 = P_2\ (= P)$, say

$\sin^{-1}\sqrt{p_1} \stackrel{a}{\sim} N\left(\sin^{-1}\sqrt{P_1}, \frac{1}{4n_1}\right)$, $\sin^{-1}\sqrt{p_2} \stackrel{a}{\sim} N\left(\sin^{-1}\sqrt{P_2}, \frac{1}{4n_2}\right)$

$\Rightarrow \sin^{-1}\sqrt{p_1} - \sin^{-1}\sqrt{p_2} \stackrel{a}{\sim} N\left(\sin^{-1}\sqrt{P_1} - \sin^{-1}\sqrt{P_2},\ \frac{1}{4n_1} + \frac{1}{4n_2}\right)$

Under $H_0$,

$\tau = \frac{\sin^{-1}\sqrt{p_1} - \sin^{-1}\sqrt{p_2}}{\sqrt{\frac{1}{4n_1}+\frac{1}{4n_2}}} \stackrel{a}{\sim} N(0, 1)$

$\Rightarrow \omega_0: |\tau| > \tau_{\alpha/2}$ if $H_1: P_1 \neq P_2$.

If $H_0$ is rejected, a conservative interval for $P_1 - P_2$ follows from the individual intervals $P_r\{L_1 \le P_1 \le U_1\} = 1-\alpha$, where

$L_1 = \sin^2\left(\sin^{-1}\sqrt{p_1} - \frac{\tau_{\alpha/2}}{2\sqrt{n_1}}\right)$, $U_1 = \sin^2\left(\sin^{-1}\sqrt{p_1} + \frac{\tau_{\alpha/2}}{2\sqrt{n_1}}\right)$

Similarly, $\sin^{-1}\sqrt{p_2} \stackrel{a}{\sim} N\left(\sin^{-1}\sqrt{P_2}, \frac{1}{4n_2}\right)$ and $P_r\{L_2 \le P_2 \le U_2\} = 1-\alpha$, where

$L_2 = \sin^2\left(\sin^{-1}\sqrt{p_2} - \frac{\tau_{\alpha/2}}{2\sqrt{n_2}}\right)$, $U_2 = \sin^2\left(\sin^{-1}\sqrt{p_2} + \frac{\tau_{\alpha/2}}{2\sqrt{n_2}}\right)$

$\Rightarrow P_r\{L_1 \le P_1 \le U_1,\ L_2 \le P_2 \le U_2\} \ge (1-\alpha) + (1-\alpha) - 1$

$\Rightarrow P_r\{L_1 - U_2 \le P_1 - P_2 \le U_1 - L_2\} \ge (1-2\alpha)$.
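A short sketch of the two-proportion arcsine test (Python; the counts are hypothetical):

    import numpy as np
    from scipy.stats import norm

    def arcsine_two_prop(x1, n1, x2, n2):
        """tau = (asin(sqrt p1) - asin(sqrt p2))/sqrt(1/(4n1)+1/(4n2)) ~ N(0,1) under H0."""
        z1, z2 = np.arcsin(np.sqrt(x1 / n1)), np.arcsin(np.sqrt(x2 / n2))
        tau = (z1 - z2) / np.sqrt(1 / (4 * n1) + 1 / (4 * n2))
        return tau, 2 * norm.sf(abs(tau))

    print(arcsine_two_prop(45, 100, 30, 90))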
(iii) $H_0: P_1 = P_2 = \cdots = P_k\ (= P)$, say

$\sin^{-1}\sqrt{p_i} \stackrel{a}{\sim} N\left(\sin^{-1}\sqrt{P_i}, \frac{1}{4n_i}\right)$, $i = 1(1)k$

$\Rightarrow$ under $H_0$, $\sum_{i=1}^k\left(\sin^{-1}\sqrt{p_i} - \sin^{-1}\sqrt{P}\right)^24n_i \sim \chi^2_k$

$\widehat{\sin^{-1}\sqrt{P}} = \frac{\sum n_i\sin^{-1}\sqrt{p_i}}{\sum n_i}$, and thus $\chi^2 = \sum_{i=1}^k\left(\sin^{-1}\sqrt{p_i} - \widehat{\sin^{-1}\sqrt{P}}\right)^24n_i \sim \chi^2_{k-1}$
Square-root transformation of a Poisson variate: for $X \sim P(\lambda)$, $\sqrt{X} \stackrel{a}{\sim} N\left(\sqrt{\lambda}, \frac14\right)$, with variance independent of $\lambda$.

(ii) $H_0: \lambda_1 = \lambda_2\ (= \lambda)$, say

$\sqrt{X_1} \stackrel{a}{\sim} N\left(\sqrt{\lambda_1}, \frac14\right)$, $\sqrt{X_2} \stackrel{a}{\sim} N\left(\sqrt{\lambda_2}, \frac14\right)$

$\Rightarrow \sqrt{X_1} - \sqrt{X_2} \stackrel{a}{\sim} N\left(\sqrt{\lambda_1} - \sqrt{\lambda_2}, \frac12\right)$

$\Rightarrow$ Under $H_0$, $\tau = \left(\sqrt{X_1} - \sqrt{X_2}\right)\sqrt{2} \stackrel{a}{\sim} N(0, 1)$

$\omega_0: |\tau| > \tau_{\alpha/2}$ if $H_1: \lambda_1 \neq \lambda_2$.
If $H_0$ is accepted, then to find the confidence interval for $\lambda$:

$\widehat{\sqrt{\lambda}} = \frac{4\sqrt{X_1} + 4\sqrt{X_2}}{4+4} = \frac{\sqrt{X_1} + \sqrt{X_2}}{2}$

$E\left(\widehat{\sqrt{\lambda}}\right) = \frac{\sqrt{\lambda} + \sqrt{\lambda}}{2} = \sqrt{\lambda}\ \left[\because \lambda_1 = \lambda_2 = \lambda\right]$

$V\left(\widehat{\sqrt{\lambda}}\right) = \frac{\frac14 + \frac14}{4} = \frac18$

So $\widehat{\sqrt{\lambda}} \stackrel{a}{\sim} N\left(\sqrt{\lambda}, \frac18\right)$

$\Rightarrow$ Probability$\left[-\tau_{\alpha/2} \le \left(\widehat{\sqrt{\lambda}} - \sqrt{\lambda}\right)\sqrt{8} \le \tau_{\alpha/2}\right] = 1-\alpha$

$\Rightarrow$ Probability$\left[\left(\widehat{\sqrt{\lambda}} - \frac{\tau_{\alpha/2}}{\sqrt{8}}\right)^2 \le \lambda \le \left(\widehat{\sqrt{\lambda}} + \frac{\tau_{\alpha/2}}{\sqrt{8}}\right)^2\right] = 1-\alpha$

If $H_0$ is rejected, then to find the confidence interval for the difference $(\lambda_1 - \lambda_2)$:

Since $\sqrt{X_1} \stackrel{a}{\sim} N\left(\sqrt{\lambda_1}, \frac14\right)$,

$\Rightarrow$ Probability$\left[\left(\sqrt{X_1} - \frac{\tau_{\alpha/2}}{2}\right)^2 \le \lambda_1 \le \left(\sqrt{X_1} + \frac{\tau_{\alpha/2}}{2}\right)^2\right] = 1-\alpha$

i.e. $P(A) = 1-\alpha$, where $A = \{L_1 \le \lambda_1 \le U_1\}$ with $L_1 = \left(\sqrt{X_1} - \frac{\tau_{\alpha/2}}{2}\right)^2$ and $U_1 = \left(\sqrt{X_1} + \frac{\tau_{\alpha/2}}{2}\right)^2$

Similarly, $\sqrt{X_2} \stackrel{a}{\sim} N\left(\sqrt{\lambda_2}, \frac14\right)$

$\Rightarrow$ Probability$\left[\left(\sqrt{X_2} - \frac{\tau_{\alpha/2}}{2}\right)^2 \le \lambda_2 \le \left(\sqrt{X_2} + \frac{\tau_{\alpha/2}}{2}\right)^2\right] = 1-\alpha$

i.e. $P(B) = 1-\alpha$, where $B = \{L_2 \le \lambda_2 \le U_2\}$ having $L_2 = \left(\sqrt{X_2} - \frac{\tau_{\alpha/2}}{2}\right)^2$, $U_2 = \left(\sqrt{X_2} + \frac{\tau_{\alpha/2}}{2}\right)^2$

$\Rightarrow$ Probability$\{L_1 \le \lambda_1 \le U_1,\ L_2 \le \lambda_2 \le U_2\} \ge (1-\alpha) + (1-\alpha) - 1$

$\Rightarrow$ Probability$\{L_1 - U_2 \le \lambda_1 - \lambda_2 \le U_1 - L_2\} \ge 1 - 2\alpha$
(iii) $H_0: \lambda_1 = \lambda_2 = \cdots = \lambda_k\ (= \lambda)$, say

$\sqrt{X_i} \stackrel{a}{\sim} N\left(\sqrt{\lambda_i}, \frac14\right)$, $i = 1(1)k$

$\Rightarrow$ under $H_0$, $\sum_{i=1}^k\left(\sqrt{X_i} - \sqrt{\lambda}\right)^24 \sim \chi^2_k$

$\widehat{\sqrt{\lambda}} = \frac{\sum\sqrt{X_i}}{k}$, and then

$\chi^2 = \sum_{i=1}^k\left(\sqrt{X_i} - \widehat{\sqrt{\lambda}}\right)^24 \sim \chi^2_{k-1}$

$\omega_0: \chi^2 > \chi^2_{\alpha;k-1}$.

If $H_0$ is accepted, then to find the interval estimate of $\lambda$:

$E\left(\widehat{\sqrt{\lambda}}\right) = \frac{\sum\sqrt{\lambda_i}}{k} = \frac{k\sqrt{\lambda}}{k} = \sqrt{\lambda}\ \left[\text{as }\lambda_1 = \lambda_2 = \cdots = \lambda_k = \lambda\right]$

$V\left(\widehat{\sqrt{\lambda}}\right) = \frac{\sum V\left(\sqrt{X_i}\right)}{k^2} = \frac{1}{4k}$

$\Rightarrow \widehat{\sqrt{\lambda}} \stackrel{a}{\sim} N\left(\sqrt{\lambda}, \frac{1}{4k}\right)$

$\Rightarrow \tau = \left(\widehat{\sqrt{\lambda}} - \sqrt{\lambda}\right)\sqrt{4k} \sim N(0, 1)$

$\Rightarrow$ Probability$\left[-\tau_{\alpha/2} \le \left(\widehat{\sqrt{\lambda}} - \sqrt{\lambda}\right)2\sqrt{k} \le \tau_{\alpha/2}\right] = 1-\alpha$

$\Rightarrow$ Probability$\left[\left(\widehat{\sqrt{\lambda}} - \frac{\tau_{\alpha/2}}{2\sqrt{k}}\right)^2 \le \lambda \le \left(\widehat{\sqrt{\lambda}} + \frac{\tau_{\alpha/2}}{2\sqrt{k}}\right)^2\right] = 1-\alpha$
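A sketch of the k-sample Poisson homogeneity test via the square-root transformation (Python; the counts are hypothetical):

    import numpy as np
    from scipy.stats import chi2

    def poisson_homogeneity(counts):
        """chi^2 = 4*sum (sqrt(X_i) - mean(sqrt X))^2 ~ chi^2_{k-1} under equal lambdas."""
        r = np.sqrt(np.asarray(counts, float))
        stat = 4 * np.sum((r - r.mean()) ** 2)
        return stat, chi2.sf(stat, r.size - 1)

    print(poisson_homogeneity([23, 31, 27, 35, 26]))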
$s'^2 = \frac{1}{n-1}\sum(x_i - \bar{x})^2$

$E(s'^2) = \sigma^2$ and $V(s'^2) \simeq \frac{2\sigma^4}{n}$. Also $E(s^2) \to \sigma^2$ and $V(s^2) \simeq \frac{2\sigma^4}{n}$ for $s^2 = \frac{\sum(x_i-\bar{x})^2}{n}$

$\Rightarrow s^2 \stackrel{a}{\sim} N\left(\sigma^2, \frac{2\sigma^4}{n}\right)$

We like to get a function $g(\cdot)$ such that $g(s^2) \stackrel{a}{\sim} N\left(g(\sigma^2), c^2\right)$, where $c^2$ is independent of $\sigma^2$:

$g(\sigma^2) = \int\frac{c}{\sqrt{\frac{2\sigma^4}{n}}}\,d\sigma^2 = c\sqrt{\frac{n}{2}}\int\frac{d\sigma^2}{\sigma^2} = c\sqrt{\frac{n}{2}}\log_e\sigma^2 + k$

Choosing k = 0 and $c = \sqrt{\frac{2}{n}}$, we get $g(\sigma^2) = \log_e\sigma^2$

$\Rightarrow g(s^2) = \log_es^2 \stackrel{a}{\sim} N\left(\log_e\sigma^2, \frac{2}{n}\right)$
For $H_0: \sigma_1^2 = \sigma_2^2\ (= \sigma^2$, say$)$ one may use $\tau = \frac{\log_es_1^2 - \log_es_2^2}{\sqrt{\frac{2}{n_1}+\frac{2}{n_2}}} \stackrel{a}{\sim} N(0, 1)$. If $H_0$ is accepted, the common $\log_e\sigma^2$ is estimated by the weighted mean $\widehat{\log_e\sigma^2} = \frac{n_1\log_es_1^2 + n_2\log_es_2^2}{n_1+n_2}$, with

$\widehat{\log_e\sigma^2} \stackrel{a}{\sim} N\left(\log_e\sigma^2, \frac{2}{n_1+n_2}\right)$

$\Rightarrow$ Probability$\left[e^{\widehat{\log_e\sigma^2} - \sqrt{\frac{2}{n_1+n_2}}\,\tau_{\alpha/2}} \le \sigma^2 \le e^{\widehat{\log_e\sigma^2} + \sqrt{\frac{2}{n_1+n_2}}\,\tau_{\alpha/2}}\right] = 1-\alpha$.

If $H_0$ is rejected, then the confidence interval for $\sigma_1^2 - \sigma_2^2$ can be obtained in the following way:

Probability$\left[e^{\log_es_1^2 - \sqrt{\frac{2}{n_1}}\,\tau_{\alpha/2}} \le \sigma_1^2 \le e^{\log_es_1^2 + \sqrt{\frac{2}{n_1}}\,\tau_{\alpha/2}}\right] = 1-\alpha$

i.e., $P(A) = 1-\alpha$, where $A = \left\{L_1 \le \sigma_1^2 \le U_1\right\}$ having $L_1 = e^{\log_es_1^2 - \sqrt{\frac{2}{n_1}}\,\tau_{\alpha/2}}$, $U_1 = e^{\log_es_1^2 + \sqrt{\frac{2}{n_1}}\,\tau_{\alpha/2}}$

Also, Probability$\left[e^{\log_es_2^2 - \sqrt{\frac{2}{n_2}}\,\tau_{\alpha/2}} \le \sigma_2^2 \le e^{\log_es_2^2 + \sqrt{\frac{2}{n_2}}\,\tau_{\alpha/2}}\right] = 1-\alpha$

i.e., $P(B) = 1-\alpha$, where $B = \left\{L_2 \le \sigma_2^2 \le U_2\right\}$.

As $P(AB) \ge P(A) + P(B) - 1$,

$\Rightarrow$ Probability$\left\{L_1 \le \sigma_1^2 \le U_1,\ L_2 \le \sigma_2^2 \le U_2\right\} \ge (1-\alpha) + (1-\alpha) - 1$

Or Probability$\left\{L_1 - U_2 \le \sigma_1^2 - \sigma_2^2 \le U_1 - L_2\right\} \ge 1 - 2\alpha$

(iii) $H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2\ (= \sigma^2)$, say

$\log_es_i^2 \stackrel{a}{\sim} N\left(\log_e\sigma_i^2, \frac{2}{n_i}\right)$

$\Rightarrow$ under $H_0$, $\chi^2 = \sum_{i=1}^k\left(\log_es_i^2 - \widehat{\log_e\sigma^2}\right)^2\frac{n_i}{2} \stackrel{a}{\sim} \chi^2_{k-1}$

where $\widehat{\log_e\sigma^2} = \frac{\sum n_i\log_es_i^2}{\sum n_i}$

$\Rightarrow \omega_0: \chi^2 > \chi^2_{\alpha;k-1}$
Similarly, for the standard deviation, $g(\sigma) = \log_e\sigma$

$\Rightarrow g(s) = \log_es \stackrel{a}{\sim} N\left(\log_e\sigma, \frac{1}{2n}\right)$

We may use this result for testing hypotheses related to $\sigma$.
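A quick sketch of the two-sample log-variance test (Python; simulated data, with s^2 taken with divisor n as in the text):

    import numpy as np
    from scipy.stats import norm

    def log_var_test(x1, x2):
        """tau = (log s1^2 - log s2^2)/sqrt(2/n1 + 2/n2) ~ N(0,1) under H0: sigma1 = sigma2."""
        x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
        n1, n2 = x1.size, x2.size
        tau = (np.log(x1.var()) - np.log(x2.var())) / np.sqrt(2 / n1 + 2 / n2)
        return tau, 2 * norm.sf(abs(tau))

    rng = np.random.default_rng(4)
    print(log_var_test(rng.normal(0, 1.0, 80), rng.normal(0, 1.4, 70)))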
V. Z-transformation of the sample correlation coefficient from $N_2\left(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho\right)$:

$E(r) \simeq \rho$ and $V(r) \simeq \frac{(1-\rho^2)^2}{n}$

$\Rightarrow r \stackrel{a}{\sim} N\left(\rho, \frac{(1-\rho^2)^2}{n}\right)$

We like to get a function $g(\cdot)$ such that $g(r)$ is asymptotically normal with variance independent of $\rho$:

$g(\rho) = \int\frac{c\sqrt{n}}{1-\rho^2}\,d\rho = \sqrt{n}\,c\cdot\frac12\log_e\frac{1+\rho}{1-\rho} + k$

Choosing k = 0 and $c = \frac{1}{\sqrt{n}}$,

$g(\rho) = \frac12\log_e\frac{1+\rho}{1-\rho} = \tanh^{-1}\rho = \xi$, say

$g(r) = \frac12\log_e\frac{1+r}{1-r} = \tanh^{-1}r = Z$, say
$\Rightarrow Z = g(r) \stackrel{a}{\sim} N\left(\xi, \frac{1}{n}\right)$

More precisely,

$\mu_2(Z) = \frac{1}{n-1} + \frac{4-\rho^2}{2(n-1)^2} + O\left(\frac{1}{n^3}\right)$

Now, $\frac{1}{n-3} = \frac{1}{(n-1)-2} = \frac{1}{n-1}\left(1 - \frac{2}{n-1}\right)^{-1} = \frac{1}{n-1}\left[1 + \frac{2}{n-1} + \frac{4}{(n-1)^2} + O\left(\frac{1}{n^3}\right)\right] = \frac{1}{n-1} + \frac{2}{(n-1)^2} + O\left(\frac{1}{n^3}\right)$

Again, $\mu_2(Z) - \frac{1}{n-3} = \frac{4-\rho^2}{2(n-1)^2} - \frac{2}{(n-1)^2} + O\left(\frac{1}{n^3}\right) = -\frac{\rho^2}{2(n-1)^2} + O\left(\frac{1}{n^3}\right)$

$\Rightarrow \mu_2(Z) \simeq \frac{1}{n-3}$

(i) $H_0: \rho = \rho_0\ (\Leftrightarrow \xi = \xi_0)$: under $H_0$, $\tau = (Z - \xi_0)\sqrt{n-3} \stackrel{a}{\sim} N(0, 1)$

$\omega_0: |\tau| \ge \tau_{\alpha/2}$
Interval estimate of $\rho$: from Probability$\left[-\tau_{\alpha/2} \le (Z-\xi)\sqrt{n-3} \le \tau_{\alpha/2}\right] = 1-\alpha$ and $\xi = \frac12\log_e\frac{1+\rho}{1-\rho}$,

$\Rightarrow$ Probability$\left[e^{2\left(Z - \frac{\tau_{\alpha/2}}{\sqrt{n-3}}\right)} \le \frac{1+\rho}{1-\rho} \le e^{2\left(Z + \frac{\tau_{\alpha/2}}{\sqrt{n-3}}\right)}\right] = 1-\alpha$

i.e., Probability$\left[\frac{e^{2\left(Z - \frac{\tau_{\alpha/2}}{\sqrt{n-3}}\right)} - 1}{e^{2\left(Z - \frac{\tau_{\alpha/2}}{\sqrt{n-3}}\right)} + 1} \le \rho \le \frac{e^{2\left(Z + \frac{\tau_{\alpha/2}}{\sqrt{n-3}}\right)} - 1}{e^{2\left(Z + \frac{\tau_{\alpha/2}}{\sqrt{n-3}}\right)} + 1}\right] = 1-\alpha$
(ii) $H_0: \rho_1 = \rho_2\ (= \rho)$, say, $\Leftrightarrow H_0: \xi_1 = \xi_2\ (= \xi)$, say

$Z_1 = \tanh^{-1}r_1 \stackrel{a}{\sim} N\left(\xi_1, \frac{1}{n_1-3}\right)$

$Z_2 = \tanh^{-1}r_2 \stackrel{a}{\sim} N\left(\xi_2, \frac{1}{n_2-3}\right)$

$\Rightarrow (Z_1 - Z_2) \stackrel{a}{\sim} N\left(\xi_1 - \xi_2,\ \frac{1}{n_1-3} + \frac{1}{n_2-3}\right)$

Under $H_0$, $\tau = \frac{Z_1 - Z_2}{\sqrt{\frac{1}{n_1-3}+\frac{1}{n_2-3}}} \stackrel{a}{\sim} N(0, 1)$, $\Rightarrow \omega_0: |\tau| \ge \tau_{\alpha/2}$ if $H_1: \rho_1 \neq \rho_2$

If $H_0$ is rejected, a conservative interval for $\rho_1 - \rho_2$ follows from

Probability$\left[\frac{e^{2\left(Z_1 - \frac{\tau_{\alpha/2}}{\sqrt{n_1-3}}\right)} - 1}{e^{2\left(Z_1 - \frac{\tau_{\alpha/2}}{\sqrt{n_1-3}}\right)} + 1} \le \rho_1 \le \frac{e^{2\left(Z_1 + \frac{\tau_{\alpha/2}}{\sqrt{n_1-3}}\right)} - 1}{e^{2\left(Z_1 + \frac{\tau_{\alpha/2}}{\sqrt{n_1-3}}\right)} + 1}\right] = 1-\alpha$

Or, $P(A) = 1-\alpha$, where $A = \{L_1 \le \rho_1 \le U_1\}$ having

$L_1 = \frac{e^{2\left(Z_1 - \frac{\tau_{\alpha/2}}{\sqrt{n_1-3}}\right)} - 1}{e^{2\left(Z_1 - \frac{\tau_{\alpha/2}}{\sqrt{n_1-3}}\right)} + 1}$, $U_1 = \frac{e^{2\left(Z_1 + \frac{\tau_{\alpha/2}}{\sqrt{n_1-3}}\right)} - 1}{e^{2\left(Z_1 + \frac{\tau_{\alpha/2}}{\sqrt{n_1-3}}\right)} + 1}$

and similarly $P(B) = 1-\alpha$ with $B = \{L_2 \le \rho_2 \le U_2\}$ defined from $Z_2$ and $n_2$.

As $P(AB) \ge P(A) + P(B) - 1$,

$\Rightarrow$ Probability$\{L_1 \le \rho_1 \le U_1,\ L_2 \le \rho_2 \le U_2\} \ge (1-\alpha) + (1-\alpha) - 1$

$\Rightarrow$ Probability$\{L_1 - U_2 \le \rho_1 - \rho_2 \le U_1 - L_2\} \ge (1-2\alpha)$.
(iii) $H_0: \rho_1 = \rho_2 = \cdots = \rho_k\ (= \rho)$, $\Leftrightarrow H_0: \xi_1 = \xi_2 = \cdots = \xi_k\ (= \xi)$

$Z_i = \frac12\log_e\frac{1+r_i}{1-r_i} \stackrel{a}{\sim} N\left(\xi_i, \frac{1}{n_i-3}\right)$

Under $H_0$, $\chi^2 = \sum_1^k(n_i-3)\left(Z_i - \hat{\xi}\right)^2 \stackrel{a}{\sim} \chi^2_{k-1}$, where $\hat{\xi} = \frac{\sum(n_i-3)Z_i}{\sum(n_i-3)}$

$\omega_0: \chi^2 > \chi^2_{\alpha;k-1}$
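A sketch of the Fisher-Z comparison of two correlations, with the interval for a single rho (Python; the inputs are hypothetical):

    import numpy as np
    from scipy.stats import norm

    def fisher_z_two_corr(r1, n1, r2, n2):
        """tau = (Z1 - Z2)/sqrt(1/(n1-3) + 1/(n2-3)) ~ N(0,1) under H0: rho1 = rho2."""
        z1, z2 = np.arctanh(r1), np.arctanh(r2)
        tau = (z1 - z2) / np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
        return tau, 2 * norm.sf(abs(tau))

    def rho_ci(r, n, alpha=0.05):
        """100(1-alpha)% CI for rho via Z = arctanh(r) ~ N(xi, 1/(n-3))."""
        half = norm.ppf(1 - alpha / 2) / np.sqrt(n - 3)
        return np.tanh(np.arctanh(r) - half), np.tanh(np.arctanh(r) + half)

    print(fisher_z_two_corr(0.62, 50, 0.45, 60), rho_ci(0.62, 50))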
Barnett, V.D.: Evaluation of the maximum likelihood estimator where the likelihood equation has
multiple roots. Biometrika 53, 151–166 (1966)
Barron, A.R.: Uniformly powerful goodness of fit test. Ann. Statist. 17, 107–124 (1982)
Basawa, I.V., Prakasa Rao, B.L.S.: Statistical Inference in Stochastic Processes. Academic Press,
London (1980)
Basu, D.: A note on the theory of unbiased estimation. Ann. Math. Statist. 26, 345-348. Reprinted
as Chapter XX of Statistical Information and Likelihood: A Collection of Critical Essays.
Springer-Verlag, New York (1955a)
Basu, D.: On statistics independent of a complete sufficient statistic. Sankhya 15, 377-380.
Reprinted as Chapter XXII of Statistical Information and Likelihood: A Collection of Critical
Essays. Springer-Verlag, New York (1955b)
Basu, D.: The concept of asymptotic efficiency. Sankhya 17, 193–196. Reprinted as Chapter XXI
of Statistical Information and Likelihood: A Collection of Critical Essays. Springer-Verlag,
New York (1956)
Basu, D.: Statistical Information and Likelihood: A Collection of Critical Essays (J.K. Ghosh, ed.).
Springer-Verlag, New York (1988)
Bayes, T.: An essay toward solving a problem in the doctrine of chances. Phil. Trans. Roy. Soc.
153, 370–418 (1763). Reprinted in (1958) Biometrika 45, 293-315
Berger, J.: Selecting a minimax estimator of a multivariate normal mean. Ann. Statist. 10, 81–92
(1982)
Berger, J.0.: The Robust Bayesian Viewpoint (with discussion). Robustness of Bayesian Analysis
(J. Kadane, ed.), 63–144. North-Holland. Amsterdam (1984)
Berger, J.0.: Statistical Decision Theory and Bayesian Analysis, 2nd edn. Springer-Verlag, New
York (1985)
Berger, J.O.: Estimation in continuous exponential families: Bayesian estimation subject to risk
restrictions. In: Gupta, S.S., Berger, J.O. (eds.) Statistical Decision Theory III. Academic Press,
New York (1982b)
Berger, J.O.: Statistical Decision Theory and Bayesian Analysis, 2nd edn. Springer-Verlag, New
York (1985)
Berger, J.O., Bernardo, J.M. On the development of reference priors. In: Berger, J.O., Bernardo, J.
M. (eds.) Bayesian Statist. 4. Clarendon Press, London (1992a)
Berger, J.O., Bernardo, J.M.: Ordered group reference priors with application to multinomial
probabilities. Biometrika 79, 25–37 (1992b)
Berger, J.O., Wolpert, R.W.: The Likelihood Principle, 2nd edn. Institute of Mathematical
Statistics, Hayward, CA (1988)
Bernardo, J.M., Smith, A.F.M.: Bayesian Theory. Wiley, New York (1994)
Berndt, E.R.: The practice of Econometrics: Classic and Contemporary. Addison and Wesley
(1991)
Bhattacharya, G.K., Johnson, R.A.: Statistical Concepts and Methods. John Wiley (1977)
Bhatttachayra, R., Denker, M.: Asymptotic Statistics. Birkhauser-Verlag, Basel (1990)
Bickel, P.J.: Minimax estimation of the mean of a normal distribution subject to doing well at a
point. In: Rizvi, M.H., Rustagi, J.S., Siegmund, D., (eds.) Recent Advances in Statistics: Papers
in Honor of Herman Chernoff on his Sixtieth Birthday. Academic Press, New York (1983)
Bickel, P.J., Klaassen, P., Ritov, C.A.J., Wellner, J.: Efficient and Adaptive Estimation for
Semiparametric Models. Johns Hopkins University Press, Baltimore (1993)
Billingsley, P.: Probability and Measure, 3rd edn. Wiley, New York (1995)
Blackwell, D., Girshick, M.A.: Theory of Games and Statistical Decision. John Wiley and Sons,
New York (1954)
Blackwell, D., Girshick, M.A.: Theory of Games and Statistical Decisions. Wiley, New York
(1954)
Blyth, C.R.: Maximum probability estimation in small samples. In: Bickel, P.J., Doksum, K.A.,
Hodges, J.L., Jr., (eds.) Festschrift for Erich Lehmann. Wadsworth and Brooks/Cole. Pacific
Grove, CA (1982)
References 305
Bock, M.E.: Employing vague inequality information in the estimation of normal mean vectors
(Estimators that shrink toward closed convex polyhedra). In: Gupta, S.S., Berger, J.O. (eds.).
Statistical Decision Theory III. Academic Press, New York (1982)
Bock, M.E.: Shrinkage estimators: Pseudo-Bayes estimators for normal mean vectors. In: Gupta,
S.S., Berger, J.O. (eds.) Statistical Decision Theory IV. Springer-Verlag, New York (1988)
Bradley, R.A., Gart, J.J.: The asymptotic properties of ML estimators when sampling for
associated populations. Biometrika 49, 205–214 (1962)
Brandwein, A.C., Strawderman, W.E.: Minimax estimation of location parameters for spherically
symmetric unimodal distributions under quadratic loss. Ann. Statist. 6, 377–416 (1978)
Bridge, J.I.: Applied Econometrics. North Holland, Amsterdam (1971)
Brockwell, P.J., Davis, R.A.: Time Series: Theory and Methods. Springer-Verlag, New York
(1987)
Brown, D., Rothery, P.: Models in Biology: Mathematics, Statistics and Computing. Wiley, New
York (1993)
Brown, L.D.: Fundamentals of Statistical Exponential Families with Applications in Statistical
Decision Theory. Institute of Mathematical Statistics Lecture Notes–Monograph Series.
Hayward, CA: IMS (1986)
Brown, L.D.: Fundamentals of Statistical Exponential Families. Institute of Mathematical
Statistics, Hayward, CA (1986)
Brown, L.D.: Commentary on paper [19]. Kiefer Collected Papers, Supple-mentary Volume.
Springer-Verlag, New York (1986)
Brown, L.D.: Minimaxity, more or less. In: Gupta, S.S., Berger, J.O. (eds.) Statistical Decision
Theory and Related Topics V. Springer-Verlag, New York (1994)
Brown, L.D., Hwang, J.T.: A unified admissibility proof. In: Gupta, S.S., Berger, J.O. (eds.)
Statistical Decision Theory III. Academic Press, New York (1982)
Bucklew, J.A.: Large Deviation Techniques in Decision, Simulation and Estimation (1990)
Burdick, R.K., Graybill, F.A.: Confidence Intervals on Variance Components. Marcel Dekker,
New York (1992)
Carlin, B.P., Louis, T.A.: Bayes and Empirical Bayes Methods for Data Analysis. Chapman &
Hall, London (1996)
Carroll, R.J., Ruppert, D., Stefanski, L.: Measurment Error in Nonlinear Models. Chapman & Hall,
London (1995)
Carroll, R.J., and Ruppert, D.: Transformation and Weighting in Regression. Chapman and Hall,
London (1988)
Carroll, R.J., Ruppert, D., Stefanski, L.A.: Measurement Error in Non-linear Models. Chapman
and HaIl, London (1995)
Casella, G., Berger, R.L.: Statistical Inference.: Wadsworth/Brooks Cole. Pacific Grove, CA
(1990)
Cassel, C., Sa¨rndal, C., Wretman, J.H.: Foundations of Inference in Survey Sampling. Wiley,
New York (1977)
CBBella., G., Robert, C.P.: Rao-Blackwellisation of Sampling Schemes. Biometrika 83, 81–94
(1996)
Chatterji, S., Price, B.: Regression Analysis by Example. John Wiley (1991)
Chatterji, S.D.: A remark on the Crame´r-Rao inequality. In: Kallianpur, G., Krishnaiah, P.R.,
Ghosh, J.K. (eds.) Statistics and Probability: Essays in Honor of C. R. Rao. North Holland,
New York (1982)
Chaudhuri, A., Mukerjee, R.: Randomized Response: Theory and Techniques. Marcel Dekker,
New York (1988)
Chikka.ra, R.S., Folks, J.L.: The Inverse Gaussian Distribution: Theory, Methodolog y, and
Applications.: Marcel Dekker. New York (1989)
306 References
Chow, G.C.: Test of equality between sets of coefficient in two linear regressions. Econometrica
28(3), 591–605 (1960)
Chow, G.C.: EconometricMethods. McGraw-Hill, New York (1983)
Christensen, R.: Plane Answers to Complex Questions: The Theory of Linear Models, 2nd edn.
Springer-Verlag, New York (1987)
Christensen, R.: Log-linear Models. Springer-Verlag, New York (1990)
Christensen, R.: Plane Answers to Complex Questions. The Theory of Linear Models, 2nd edn.
Springer-Verlag, New York (1996)
Christopher, A.H.: Interpreting and using regression. Sage Publication (1982)
Chung, K.L.: A Course in Probability Theory. Academic Press, New York (1974)
Chung, K.L.: A Course in Probability Theory, 2nd edn. Academic Press, New York (1974)
Chung, K.L.: A Course in Probability Theory, Harcourt. Brace & World, New York (1968)
Cleveland, W.S.: The Elements of Graphing Data. Wadsworth. Monterey (1985)
Clevensen, M.L., Zidek, J.: Simultaneous estimation of the mean of independent Poisson laws.
J. Amer. Statist. Assoc. 70, 698–705 (1975)
Clopper, C.J., Pearson, E.S.: The Use of Confidence or Fiducial Limits Illustrated in the Case of
the Binomial. Biometrika 26, 404–413 (1934)
Cochran, W.G.: Sampling technique. Wiley Eastern Limited, New Delhi (1985)
Cochran, W.G.: Sampling Techniques, 3rd edn. Wiley, New York (1977)
Cox, D.R.: The Analysis of Binary Data. Methuen, London (1970)
Cox, D.R.: Partial likelihood. Biometrika 62, 269–276 (1975)
Cox, D.R., Oakes, D.O.: Analysis of Survival Data. Chapman & Hall, London (1984)
Cox, D.R., Hinkley, D.V.: Theoretical Statistics. Chapman and Hall, London (1974)
Crame´r, H.: Mathematical Methods of Statistics. Princeton Univer-sity Press, Princeton (1946a)
Crow, E.L., Shimizu, K.: Lognormal Distributions: Theory and Practice. Marcel Dekker, New
York (1988)
Crowder, M.J., Sweeting, T.: Bayesian inference for a bivariate binomial distribution. Biometrika
76, 599–603 (1989)
Croxton, F.E., Cowden, D.J.: Applied General Statistics. Prentice-Hall (1964)
Daniels, H.E.: Exact saddlepoint approximations. Biometrika 67, 59–63 (1980)
DasGupta, A.: Bayes minimax estimation in multiparameter families when the parameter space is
restricted to a bounded convex set. Sankhya 47, 326–332 (1985)
DasGupta, A.: An examination of Bayesian methods and inference: In search of the truth.
Technical Report, Department of Statistics, Purdue University (1994)
DasGupta, A., Rubin, H.: Bayesian estimation subject to minimaxity of the mean of a multivariate
normal distribution in the case of a common unknown variance. In: Gupta, S.S., Berger, J.O.
(eds.) Statistical Decision Theory and Related Topics IV. Springer-Verlag, New York (1988)
Datta, G.S., Ghosh, J.K.: On priors providing frequentist validity for Bayesian inference.
Biometrika 82, 37–46 (1995)
Dean, A., Voss, D.: Design and Analysis of Experiments.: Springer- Verlag, New York (1999)
deFinetti, B.: Probability, Induction, and Statistics. Wiley, New York (1972)
DeGroot, M.: Optimal Statistical Decisions. McGraw-Hill, New York (1970)
DeGroot, M.H.: Probability and Statistics, 2nd edn. Addison-Wesley, New York (1986)
Raj, D., Chandhok, P.: Samlpe Survey Theory. Narosa Publishing House, New Delhi (1999)
Devroye, L.: Non-Uniform Random Variate Generation. Springer-Verlag, New York (1985)
Devroye, L., Gyoerfi, L.: Nonparametric Density Estimation: The L1 View. Wiley, New York
(1985)
Diaconis, P.: Theories of data analysis, from magical thinking through classical statistics. In:
Hoaglin, F. Mosteller, J (eds.) Exploring Data Tables, Trends and Shapes (1985)
Diaconis, P.: Group Representations in Probability and Statistics. Institute of Mathematical
Statistics, Hayward (1988)
Dobson, A.J.: An Introduction to Generalized Linear Models. Chapman & Hall, London (1990)
Draper, N.R., Smith, H.: Applied Regression Analysis, 3rd edn. Wiley, New York (1998)
References 307
Draper, N.R., Smith, H.: Applied Regression Analysis. John Wiley and Sons, New York (1981)
Dudley, R.M.: Real Analysis and Probability. Wadsworth and Brooks/Cole, Pacific Grove, CA
(1989)
Durbin, J.: Estimation of parameters in time series regression model. J. R. Statis. Soc.-Ser-B, 22,
139–153 (1960)
Dutta, M.: Econometric Methods. South Western Publishing Company, Cincinnati (1975)
Edwards, A.W.F.: Likelihood. Johns Hopkins University Press, Baltimore (1992)
Efron, B., Hinkley, D.: Assessing the accuracy of the maximum likelihood estimator: Observed vs.
expected Fisher information. Biometrica 65, 457–481 (1978)
Efron, B., Morris, C.: Empirical Bayes on vector observations-An extension of Stein’s method.
Biometrika 59, 335–347 (1972)
Efron, B., Tibshirani, R.J.: An Introduction to the Bootstrap. Chapman and Hall, London (1993)
Everitt, B.S.: The Analysis of Contingency Tables, 2nd edn. Chap-man & Hall, London (1992)
Everitt, B.S.: The Analysis of Contingency Tables. John Wiley (1977)
Everitt, B.S., Hand, D.J.: Finite Mixture Distributions. Chapman & Hall, London (1981)
Ezekiel, M., Fox, K.A.: Methods of Correlation and Regression Analysis. John Wiley (1959)
Faith, R.E.: Minimax Bayes point and set estimators of a multivariate normal mean. Ph.D. thesis,
Department of Statistics, University of Michigan (1976)
Feller, W.: An Introduction to Probability Theory and Its Applications, vol. 1, 3rd edn. Wiley,
New York (1968)
Feller, W.: An Introduction to Probability Theory and Its Applications, vol. II. Wiley, New York
(1971)
Feller, W.: An Introduction to Probability Theory and Its Applications, vol II, 2nd edn. John
Wiley, New York (1971)
Ferguson, T.S.: Mathematical Statistics: A Decision Theoretic Approach. Academic Press, New
York (1967)
Ferguson, T.S.: A Course in Large Sample Theory. Chapman and Hall, London (1996)
Ferguson, T.S.: Mathematical Statistics. Academic Press, New York (1967)
Field, C.A., Ronchetti, E.: Small Sample Asymptotics. Institute of Mathematical Statistics.
Hayward, CA (1990)
Finney, D.J.: Probit Analysis. Cambridge University Press, New York (1971)
Fisher, R.A.: Statistical Methods and Scientific Inference, 2nd edn. Hafner, New York. Reprinted
(1990). Oxford University Press, Oxford (1959)
Fisz, M.: Probability Theory and Mathematical Statistics, 3rd edn. John Wiley (1963)
Fleming, T.R., Harrington, D.P.: Counting Processes and Survival Analysis. Wiley, New York
(1991)
Fox, K.: Intermediate Economic Statistics. Wiley (1968)
Fraser, D.A.S.: The Structure of Inference. Wiley, New York (1968)
Fraser, D.A.S.: Inference and Linear Models. McGraw-Hill, New York (1979)
Fraser, D.A.S.: Nonparametric Methods in Statistics. John Wiley, New York (1965)
Freedman, D., Pisani, R., Purves, R., Adhikari, A.: Statistics, 2nd edn. Norton, New York (1991)
Freund, J.E.: Mathematical Statistics. Prentice-Hall of India (1992)
Fuller, W.A.: Measurement Error Models. Wiley, New York (1987)
Gelman, A., Carlin, J., Stern, H., Rubin, D.B.: Bayesian Data Analysis. Chapman & Hall, London
(1995)
Ghosh, J.K., Mukerjee, R.: Non-informative priors (with discussion). In: Bernardo, J.M., Berger, J.
O., Dawid, A.P., Smith, A.F.M. (eds.) Bayesian Statistics IV. Oxford University Press, Oxford
(1992)
Ghosh, J.K., Subramanyam, K.: Second order efficiency of maximum likelihood estimators.
Sankhya A 36, 325–358 (1974)
Ghosh, M., Meeden, G.: Admissibility of the MLE of the normal integer mean. Wiley, New York
(1978)
Gibbons, J.D.: Nonparametric Inference. McGraw-Hill, New York (1971)
308 References
Gibbons, J.D., Chakrabarty, S.: Nonparametric Methods for Quantitative Analysis. American
Sciences Press (1985)
Gilks, W.R., Richardson, S., Spiegelhalter, D.J. (eds.): Markov Chain Monte Carlo in Practice.
Chapman & Hall, London (1996)
Girshick, M.A., Savage, L.J.: Bayes and minimax estimates for quadratic loss functions.
University of California Press (1951)
Glejser, H.: A new test for heteroscedasticity, J. Am. Stat. Assoc. 64 316–323 (1969)
Gnedenko, B.V.: The Theory of Probabilit y. MIR Publishers, Moscow (1978)
Godambe, V.P.: Estimating Functions. Clarendon Press, UK (1991)
Goldberg, S.: Probability, an Introduction. Prentice-Hall (1960)
Goldberger, A.S.: Econometric Theory. John Wiley and Sons, New York (1964)
Goldfield, S.M., Quandt, R.E.: Nonlinear methods in Econometrics. North Holland Publishing
Company, Amsterdam (1972)
Goon, A.M., Gupta, M.K., Dasgupta, B.: Fundamentals of Statistics, vol. 1. World Press. Kolkata
(1998)
Goon, A.M., Gupta, M.K., Dasgupta, B.: Fundamentals of Statistics, vol. 2. World Press. Kolkata,
India (1998)
Goon, A.M., Gupta, M.K., Dasgupta, B.: Outline of Statistics, vol. 1. World Press. Kolkata (1998)
Goon, A.M., Gupta, M.K., Dasgupta, B.: Outline of Statistics, vol. 2. World Press. Kolkata (1998)
Granger, C.W.J., Mowbold, P.: R2 and the transformation of regression variables. J. Econ. 4 205–
210 (1976)
Granger, C.W.J.: Investigating Causal Relations by Econometric Models and Cross- spectral
Methods. Econometrica, 424–438 (1969)
Graybill, F.A.: Introduction to Linear Statistical Models, vol. 1. Mc-Graw Hill Inc., New York
(1961)
Gujarati, D.N.: Basic Econometrics. McGraw-Hill, Inc., Singapore (1995)
Gupta, S.C., Kapoor, V.K.: Fundamentals of Mathematical Statistics. Sultan Chand and Sons, New Delhi (2002)
Haberman, S.J.: The Analysis of Frequency Data. University of Chicago Press, Chicago (1974)
Hall, P.: Pseudo-likelihood theory for empirical likelihood. Ann. Statist. 18, 121–140 (1990)
Hall, P.: The Bootstrap and Edgeworth Expansion. Springer-Verlag, New York (1992)
Halmos, P.R.: Measure Theory. Van Nostrand, New York (1950)
Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., Stahel, W.A.: Robust Statistics: The Approach
Based on Influence Functions. Wiley, New York (1986)
Hardy, G.H., Littlewood, J.E., Polya, G.: Inequalities, 2nd edn. Cambridge University Press,
London (1952)
Hedayat, A.S., Sinha, B.K.: Design and Inference in Finite Population Sampling. Wiley, New
York (1991)
Helms, L.: Introduction to Potential Theory. Wiley, New York (1969)
Hinkley, D.V., Reid, N., Snell, L. (eds.): Statistical Theory and Modelling: In Honor of Sir David Cox. Chapman & Hall, London (1991)
Hobert, J.: Occurrences and consequences of nonpositive Markov chains in Gibbs sampling. Ph.D.
Thesis, Biometrics Unit, Cornell University (1994)
Hogg, R.V., Craig, A.T.: Introduction to Mathematical Statistics. Amerind (1972)
Hollander, M., Wolfe, D.A.: Nonparametric Statistical Methods. John Wiley (1973)
Hsu, J.C.: Multiple Comparisons: Theory and Methods. Chapman and Hall, London (1996)
Huber, P.J.: Robust Statistics. Wiley, New York (1981)
Hudson, H.M.: Empirical Bayes estimation. Technical Report No. 58, Department of Statistics, Stanford University (1974)
Ibragimov, I.A., Has’minskii, R.Z.: Statistical Estimation: Asymptotic Theory. Springer-Verlag,
New York (1981)
James, W., Stein, C.: Estimation with quadratic loss. In: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability 1, pp. 361–380. University of California Press, Berkeley (1961)
Johnson, N.L., Kotz, S., Kemp, A.W.: Univariate Discrete Distributions, 2nd edn. Wiley, New
York (1992)
Johnson, N.L., Kotz, S.: Distributions in Statistics (4 vols.). Wiley, New York (1969–1972)
Johnson, N.L., Kotz, S., Balakrishnan, N.: Continuous Univariate Distributions, vol. 1, 2nd edn. Wiley, New York (1994)
Johnson, N.L., Kotz, S., Balakrishnan, N.: Continuous Univariate Distributions, vol. 2, 2nd edn. Wiley, New York (1995)
Johnston, J.: Econometric Methods, 3rd edn. McGraw-Hill Book Company (1985)
Kagan, A.M., Linnik, Yu.V., Rao, C.R.: Characterization Problems in Mathematical Statistics. Wiley, New York (1973)
Kalbfleisch, J.D., Prentice, R.L.: The Statistical Analysis of Failure Time Data. Wiley, New York
(1980)
Kane, E.J.: Economic Statistics and Econometrics. Harper International (1968)
Kapur, J.N., Saxena, H.C.: Mathematical Statistics. S. Chand and Co. (Pvt) Ltd, New Delhi (1973)
Kelker, D.: Distribution theory of spherical distributions and a location-scale parameter generalization. Sankhya Ser. A 32, 419–430 (1970)
Kempthorne, P.: Dominating inadmissible procedures using compromise decision theory. In:
Gupta, S.S., Berger, J.O. (eds.) Statistical Decision Theory IV. Springer-Verlag, New York
(1988a)
Kendall, M.G., Stuart, A.: The Advanced Theory of Statistics, vol. 3, 2nd edn. Charles Griffin and Company Limited, London (1968)
Kendall, M., Stuart, A.: The Advanced Theory of Statistics, vol. 2. Charles Griffin and Co. Ltd., London (1973)
Kendall, M., Stuart, A.: The Advanced Theory of Statistics, vol. 1. Charles Griffin and Co. Ltd., London (1977)
Kendall, M.G.: Rank Correlation Methods, 3rd edn. Griffin, London (1962)
Kendall, M., Stuart, A.: The Advanced Theory of Statistics, Volume II: Inference and
Relationship, 4th edn. Macmillan, New York (1979)
Kiefer, J.: Multivariate optimality results. In: Krishnaiah, P. (ed.) Multivariate Analysis. Academic
Press, New York (1966)
Kirk, R.E.: Experimental Design: Procedures for the Behavioral Sciences, 2nd edn. Brooks/Cole,
Pacific Grove (1982)
Klein, L.R., Shinkai, Y.: An econometric model of Japan, 1930–1959. International Economic Review 4, 1–28 (1963)
Klein, L.R.: An Introduction to Econometrics. Prentice-Hall (1962)
Kmenta, J.: Elements of Econometrics, 2nd edn. Macmillan, New York (1986)
Kolmogorov, A.N., Fomin, S.V.: Elements of the Theory of Functions and Functional Analysis,
vol. 2. Graylock Press, Albany, New York (1961)
Koroljuk, V.S., Borovskich, Yu.V.: Theory of U-Statistics. Kluwer Academic Publishers, Boston (1994)
Koutsoyiannis, A.: Theory of Econometrics. Macmillan Press Ltd., London (1977)
Kraft, C.H., Van Eeden, C.: A Nonparametric Introduction to Statistics. Macmillan, New York (1968)
Kramer, J.S.: The Logit Model for Economists. Edward Arnold Publishers, London (1991)
Kuehl, R.O.: Design of Experiments: Statistical Principles of Research Design and Analysis, 2nd edn. Duxbury, Pacific Grove (2000)
Lane, D.A.: Fisher, Jeffreys, and the nature of probability. In: Fienberg, S.E., Hinkley, D.V. (eds.)
R.A. Fisher: An Appreciation. Lecture Notes in Statistics 1. Springer-Verlag, New York (1980)
Lange, N., Billard, L., Conquest, L., Ryan, L., Brillinger, D., Greenhouse, J. (eds.): Case Studies in
Biometry. Wiley-Interscience, New York (1994)
Le Cam, L.: Maximum Likelihood: An Introduction. Lecture Notes in Statistics No. 18. University
of Maryland, College Park, Md (1979)
Le Cam, L.: Asymptotic Methods in Statistical Decision Theory. Springer-Verlag, New York
(1986)
Le Cam, L., Yang, G.L.: Asymptotics in Statistics: Some Basic Concepts. Springer-Verlag, New
York (1990)
Lehmann, E.L.: Testing Statistical Hypotheses, 2nd edn. Wiley, New York (1986)
Lehmann, E.L.: Elements of Large-Sample Theory. Springer-Verlag, New York (1999)
Lehmann, E.L.: Testing Statistical Hypotheses. John Wiley, New York (1959)
Lehmann, E.L., Scholz, F.W.: Ancillarity. In: Ghosh, M., Pathak, P.K. (eds.) Current Issues in
Statistical Inference: Essays in Honor of D. Basu. Institute of Mathematical Statistics, Hayward
(1992)
Lehmann, E.L., Casella, G.: Theory of Point Estimation, 2nd edn. Springer-Verlag, New York
(1998)
LePage, R., Billard, L.: Exploring the Limits of Bootstrap. Wiley, New York (1992)
Leser, C.: Econometric Techniques and Problems. Griffin (1966)
Letac, G., Mora, M.: Natural real exponential families with cubic variance functions. Ann. Statist.
18, 1–37 (1990)
Lindgren, B.W.: Statistical Theory, 2nd edn. The Macmillan Company, New York (1968)
Lindley, D.V.: The Bayesian analysis of contingency tables. Ann. Math. Statist. 35, 1622–1643
(1964)
Lindley, D.V.: Introduction to Probability and Statistics from a Bayesian Viewpoint. Part 1. Probability. Cambridge University Press, Cambridge (1965)
Lindley, D.V.: Introduction to Probability and Statistics from a Bayesian Viewpoint. Part 2. Inference. Cambridge University Press, Cambridge (1965)
Little, R.J.A., Rubin, D.B.: Statistical Analysis with Missing Data. Wiley, New York (1987)
Liu, C., Rubin, D.B.: The ECME algorithm: A simple extension of EM and ECM with faster
monotone convergence. Biometrika 81, 633–648 (1994)
Loeve, M.: Probability Theory, 3rd edn. Van Nostrand, Princeton (1963)
Luenberger, D.G.: Optimization by Vector Space Methods. Wiley, New York (1969)
Lukacs, E.: Characteristic Functions, 2nd edn. Hafner, New York (1970)
Lukacs, E.: Probability and Mathematical Statistics. Academic Press, New York (1972)
Maddala, G.S.: Limited-Dependent and Qualitative Variables in Econometrics. Cambridge University Press, New York (1983)
Madnani, G.M.K.: Introduction to Econometrics: Principles and Applications, 4th edn. Oxford and IBH Publishing Co. Pvt. Ltd (1988)
Maritz, J.S., Lwin, T.: Empirical Bayes Methods, 2nd edn. Chapman & Hall, London (1989)
Marshall, A., Olkin, I.: Inequalities: Theory of Majorization and Its Applications. Academic Press, New York (1979)
McCullagh, P.: Quasi-likelihood and estimating functions. In: Hinkley, D., Reid, N., Snell, L.
(eds.) Statistical Theory and Modelling: In Honor of Sir David Cox. Chapman & Hall, London
(1991)
McCullagh, P., Nelder, J.A.: Generalized Linear Models, 2nd edn. Chapman & Hall, London
(1989)
McLachlan, G., Krishnan, T.: The EM Algorithm and Extensions. Wiley, New York (1997)
McLachlan, G.: Recent Advances in Finite Mixture Models. Wiley, New York (1997)
McLachlan, G., Basford, K.: Mixture Models: Inference and Applications to Clustering. Marcel
Dekker, New York (1988)
McPherson, G.: Statistics in Scientific Investigation. Springer-Verlag, New York (1990)
Meng, X.-L., Rubin, D.B.: Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika 80, 267–278 (1993)
Meyn, S., Tweedie, R.: Markov Chains and Stochastic Stability. Springer-Verlag, New York
(1993)
Miller, R.G.: Simultaneous Statistical Inference, 2nd edn. Springer-Verlag, New York (1981)
Montgomery, D.C., Peck, E.A.: Introduction to Linear Regression Analysis. John Wiley & Sons, New York (1982)
Mood, A.M., Graybill, F.A., Boes, D.C.: Introduction to the Theory of Statistics. McGraw-Hill
(1974)
Murray, M.K., Rice, J.W.: Differential Geometry and Statistics. Chapman & Hall, London (1993)
Neter, J., Wasserman, W., Whitmore, G.A.: Applied Statistics. Allyn & Bacon, Boston (1993)
Novick, M.R., Jackson, P.H.: Statistical Methods for Educational and Psychological Research. McGraw-Hill, New York (1974)
Oakes, D.: Life-table analysis. In: Hinkley, D.V., Reid, N., Snell, L. (eds.) Statistical Theory and Modelling: In Honor of Sir David Cox. Chapman & Hall, London (1991)
Olkin, I., Selliah, J.B.: Estimating covariances in a multivariate distribution. In: Gupta, S.S.,
Moore, D.S. (eds) Statistical Decision Theory and Related Topics II. Academic Press, New
York (1977)
Owen, A.: Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75,
237–249 (1988)
Panse, V.G., Sukhatme, P.V.: Statistical Methods for Agricultural Workers. Indian Council of Agricultural Research, New Delhi (1989)
Park, R.E.: Estimation with heteroscedastic error terms. Econometrica 34(4), 888 (1966)
Parzen, E.: Modern Probability Theory and Its Applications. Wiley Eastern (1972)
Pfanzagl, J.: Parametric Statistical Theory. DeGruyter, New York (1994)
Raiffa, H., Schlaifer, R.: Applied Statistical Decision Theory. Division of Research, Harvard Business School, Harvard University, Boston (1961)
Rao, C.R.: Simultaneous estimation of parameters—A compound decision problem. In: Gupta, S.
S., Moore, D.S. (eds.) Decision Theory and Related Topics. Academic Press, New York (1977)
Rao, C.R.: Linear Statistical Inference and Its Applications. John Wiley, New York (1965)
Rao, C.R., Kleffe, J.: Estimation of variance components and applications. North Holland/Elsevier,
Amsterdam (1988)
Ripley, B.D.: Stochastic Simulation. Wiley, New York (1987)
Robbins, H.: Asymptotically subminimax solutions of compound statistical decision problems. In: Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability. University of California Press, Berkeley (1951)
Robert, C., Casella, G.: Improved confidence statements for the usual multivariate normal
confidence set. In: Gupta, S.S., Berger, J.O. (eds.) Statistical Decision Theory V.
Springer-Verlag, New York (1994)
Robert, C., Casella, G.: Monte Carlo Statistical Methods. Springer-Verlag, New York (1998)
Robert, C.P.: The Bayesian Choice. Springer-Verlag, New York (1994)
Rohatgi, V.K.: Statistical Inference. Wiley, New York (1984)
Romano, J.P., Siegel, A.F.: Counterexamples in Probability and Statistics. Wadsworth and Brooks/Cole, Pacific Grove (1986)
Rosenblatt, M.: Markov Processes. Structure and Asymptotic Behavior. Springer-Verlag, New
York (1971)
Ross, S.M.: A First Course in Probability Theory, 3rd edn. Macmillan, New York (1988)
Ross, S.: Introduction to Probability Models, 3rd edn. Academic Press, New York (1985)
Rothenberg, T.J.: The Bayesian approach and alternatives in econometrics. In: Fienberg, S., Zellner, A. (eds.) Studies in Bayesian Econometrics and Statistics, vol. 1. North-Holland, Amsterdam (1977)
Rousseeuw, P.J., Leroy, A.M.: Robust Regression and Outlier Detection. Wiley, New York (1987)
Royall, R.M.: Statistical Evidence: A Likelihood Paradigm. Chapman and Hall, London (1997)
Rubin, D.B.: Using the SIR algorithm to simulate posterior distributions. In: Bernardo, J.M., DeGroot, M.H., Lindley, D.V., Smith, A.F.M. (eds.) Bayesian Statistics 3, pp. 395–402. Oxford University Press, Oxford (1988)
Rudin, W.: Principles of Mathematical Analysis, 3rd edn. McGraw-Hill, New York (1976)
Sahu, P.K.: Agriculture and Applied Statistics-I, 2nd reprint. Kalyani Publishers, New Delhi, India (2013)
Sahu, P.K., Das, A.K.: Agriculture and Applied Statistics-II, 2nd edn. Kalyani Publishers, New
Delhi, India (2014)
Sahu, P.K.: Research Methodology: A Guide for Researchers in Agricultural Science, Social Science and Other Related Fields. Springer, India (2013)
Särndal, C.-E., Swensson, B., Wretman, J.: Model Assisted Survey Sampling. Springer-Verlag, New York (1992)
Savage, L.J.: The Foundations of Statistics. Wiley, New York (1954). Revised edn, Dover Publications, New York (1972)
Savage, L.J.: On rereading R. A. Fisher (with discussion). Ann. Statist. 4, 441–500 (1976)
Schafer, J.L.: Analysis of Incomplete Multivariate Data. Chapman and Hall, London (1997)
Scheffe, H.: The Analysis of Variance. John Wiley and Sons, New York (1959)
Schervish, M.J.: Theory of Statistics. Springer-Verlag, New York (1995)
Scholz, F.W.: Maximum likelihood estimation. In: Kotz, S., Johnson, N.L., Read, C.B. (eds.)
Encyclopedia of Statistical Sciences 5. Wiley, New York (1985)
Searle, S.R.: Linear Models. Wiley, New York (1971)
Searle, S.R.: Matrix Algebra Useful for Statistics. Wiley, New York (1982)
Searle, S.R.: Linear Models for Unbalanced Data. Wiley, New York (1987)
Searle, S.R., Casella, G., McCulloch, C.E.: Variance Components. Wiley, New York (1992)
Seber, G.A.F.: Linear Regression Analysis. Wiley, New York (1977)
Seber, G.A.F., Wild, C.J.: Nonlinear Regression. John Wiley (1989)
Seshadri, V.: The Inverse Gaussian Distribution: A Case Study in Exponential Families. Clarendon Press, New York (1993)
Shao, J.: Mathematical Statistics. Springer-Verlag, New York (1999)
Shao, J., Tu, D.: The Jackknife and the Bootstrap. Springer- Verlag, New York (1995)
Siegel, S.: Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill (1956)
Silverman, B.W.: Density Estimation for Statistics and Data Analysis. Chapman & Hall, London (1986)
Singh, D., Chaudhary, F.S.: Theory and Analysis of Sample Survey Designs. Wiley Eastern Limited, New Delhi (1989)
Singh, R.K., Chaudhary, B.D.: Biometrical Methods in Quantitative Genetic Analysis. Kalyani Publishers, Ludhiana (1995)
Snedecor, G.W., Cochran, W.G.: Statistical Methods, 8th edn. Iowa State University Press, Ames (1989)
Snedecor, G.W., Cochran, W.G.: Statistical Methods. Iowa State University Press (1967)
Spiegel, M.R.: Theory and Problems of Statistics. McGraw-Hill Book Co., Singapore (1988)
Staudte, R.G., Sheather, S.J.: Robust Estimation and Testing. Wiley, New York (1990)
Stein, C.: Efficient nonparametric testing and estimation. In: Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability 1. University of California Press, Berkeley (1956a)
Stein, C.: Inadmissibility of the usual estimator for the mean of a multivariate distribution. In: Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability 1. University of California Press, Berkeley (1956b)
Stigler, S.: The History of Statistics: The Measurement of Uncertainty before 1900. Harvard
University Press, Cambridge (1986)
Stuart, A., Ord, J.K.: Kendall's Advanced Theory of Statistics, Volume I: Distribution Theory, 5th edn. Oxford University Press, New York (1987)
Stuart, A., Ord, J.K.: Kendall's Advanced Theory of Statistics, vol. II, 5th edn. Oxford University Press, New York (1991)
Stuart, A., Ord, J.K., Arnold, S.: Advanced Theory of Statistics, Volume 2A: Classical Inference
and the Linear Model, 6th edn. Oxford University Press, London (1999)
Susarla, V.: Empirical Bayes theory. In: Kotz, S., Johnson, N.L., Read, C.B. (eds.) Encyclopedia
of Statistical Sciences 2. Wiley, New York (1982)
Tanner, M.A.: Tools for Statistical Inference: Observed Data and Data Augmentation Methods,
3rd edn. Springer-Verlag, New York (1996)
The Selected Papers of E. S. Pearson. Cambridge University Press, New York (1966)
Theil, H.: On the Relationships Involving Qualitative Variables. American J. Sociol. 76, 103–154
(1970)
Theil, H.: Principles of Econometrics. North Holland (1972)
Theil, H.: Introduction to Econometrics. Prentice-Hall (1978)
Thompson, W.A., Jr.: Applied Probability. Holt, Rinehart and Winston, New York (1969)
Tintner, G.: Econometrics. John Wiley and Sons, New York (1965)
Tukey, J.W.: A survey of sampling from contaminated distributions. In: Olkin, I. (ed.) Contributions to Probability and Statistics. Stanford University Press, Stanford (1960)
Unni, K.: The theory of estimation in algebraic and analytical exponential families with applications to variance components models. Ph.D. Thesis, Indian Statistical Institute, Calcutta, India (1978)
Wald, A.: Statistical Decision Functions. Wiley, New York (1950)
Walker, H.M., Lev, J.: Statistical Inference. Oxford & IBH (1965)
Wand, M.P., Jones, M.C.: Kernel Smoothing. Chapman & Hall, London (1995)
Wasserman, L.: Recent methodological advances in robust Bayesian inference (with discussion). In: Bernardo, J.M., Berger, J.O., Dawid, A.P. (eds.) Bayesian Statistics 4. Oxford University Press, Oxford (1992)
Weiss, L., Wolfowitz, J.: Maximum Probability Estimators and Related Topics. Springer-Verlag,
New York (1974)
White, H.: A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48, 817–838 (1980)
Wilks, S.S.: Mathematical Statistics. John Wiley, New York (1962)
Williams, E.J.: Regression Analysis. Wiley, New York (1959)
Wu, C.F.J.: On the convergence of the EM algorithm. Ann. Statist. 11, 95–103 (1983)
Yamada, S., Morimoto, H.: Sufficiency. In: Ghosh, M., Pathak, P.K. (eds.) Current Issues in Statistical Inference: Essays in Honor of D. Basu. Institute of Mathematical Statistics, Hayward, CA (1992)
Yamane, T.: Statistics. Harper International (1970)
Yule, G.U., Kendall, M.G.: An Introduction to the Theory of Statistics. Charles Griffin, London (1950)
Zacks, S.: The Theory of Statistical Inference. John Wiley, New York (1971)
Zellner, A.: An Introduction to Bayesian Inference in Econometrics. Wiley, New York (1971)
Index
A
Admissibility, 2, 48, 189
Alternative hypothesis, 64, 157
ANOVA, 122, 146
Asymptotically normal, 51, 52, 278, 299

B
Bartlett's test, 250
Bayes principle, 198
Bernoulli distribution, 64
Best critical region, 73
Bhattacharya system of lower bounds, 32
Binomial distribution, 158, 237

C
Cauchy distribution, 48
Chebysheff's inequality, 40, 41, 137
Combination of tests, 173
Complete decision rule, 189
Completeness, 15, 192, 204
Complete statistics, 15
Conditional distribution, 3, 5, 13, 16, 30, 238, 240, 241
Confidence interval, 131–138, 141, 143, 170, 172, 248, 293, 295, 298, 301, 302
Consistency, 21, 39, 40, 42, 47
Consistent estimator, 39–43, 52, 82, 280
Correlation coefficient, 25, 45, 127, 128
Critical function, 66
Critical region, 66, 70, 73, 84, 92, 94, 97, 128, 129, 144, 154, 158, 164, 249

D
Dandekar's correction, 274
Distribution free test, 145
Distribution function, 1–3, 5, 30, 131, 153, 168, 274

E
Efficiency, 21, 44, 47, 60, 61, 290
Efficient estimator, 44, 45, 48, 268
Essential complete, 191, 192
Essential complete class, 191
Estimate, 3, 4, 8, 21, 39, 48, 55, 60, 176, 177, 188, 197, 203, 207, 220, 223, 229, 242, 290
Estimation, 3, 39, 44, 47, 48, 54, 56, 63, 131, 132, 181, 183
Estimator, 3, 21–26, 28–31, 36, 39, 44, 48, 53, 104, 132, 203–205, 220, 228
Euclidean space, 64
Exact tests, 239
Expected frequency, 55, 147, 267, 268, 273

F
Factorizability condition, 8, 9
Factorization criterion, 7
Family of distribution function, 3, 30
Fisher-Neyman criterion, 5, 11, 12
Fisher-Neyman factorisation theorem, 11
Fisher's t statistic, 244

H
Homogeneity, 123, 165, 268

I
Interval estimation, 131, 132, 182
Invariance, 231, 233

J
Jensen's inequality, 193
Joint distribution, 1, 4, 7, 168

K
Kolmogorov-Smirnov test, 163, 165
Koopman form, 14, 29

N
Neyman-Pearson Lemma, 73, 91
Neyman's criterion, 138, 141
Nominal, 146
Non-parametric test, 145, 146, 156
Non-randomized test, 66, 75, 86, 142–144
Non-singular, 261, 262, 267
Normal distribution, 1, 57, 104, 105, 145, 243, 249, 261, 282, 291
Null hypothesis, 64, 66, 113, 147, 152, 156–158

O
Optimal decision rule, 197
Optimum test, 72
Ordinal, 146
Orthogonal transformation, 4

P
Paired sample sign test, 157
Parameter, 2, 3, 6, 12–14, 17, 21, 47, 56, 63, 64, 103, 131, 132, 146, 156, 181, 197, 202, 223, 246, 268

S
Sample, 2, 3, 7, 9, 15, 17, 20, 39, 55, 60, 63, 65, 104, 112, 132, 145, 147, 148, 153, 155, 159, 174, 181, 291
Sign test, 146, 148, 151, 156–158, 171–173
Spearman's rank correlation coefficient, 174, 175
Statistic, 3, 5, 6, 12, 18, 107, 135, 138, 145, 148, 173, 179, 192
Statistical decision theory, 173
Statistical hypothesis, 63, 146
Statistical inference, 2, 131, 181
Student's t-statistic, 110, 133
Sufficient statistic, 3, 4, 15, 17, 192, 204, 239

T
Test for independence, 240, 270
Test for randomness, 152
Test function, 66
Testing of hypothesis, 182, 279
Tolerance limit, 168
Transformation of statistic, 291
Type-1 and Type-2 error, 72

U
UMAU, 143, 144
Unbiased estimator, 21–26, 28–32, 36, 37, 39, 42, 44, 45, 57, 124, 225
Uni-dimensional random variable, 1
Uniformly most powerful unbiased test (UMPU test), 73, 97
Uniformly powerful size, 73

W
Wald-Wolfowitz run test, 162, 165
Wilcoxon-Mann-Whitney rank test, 159
Wilk's criterion, 138

Y
Yates' correction, 273