
Pradip Kumar Sahu · Santi Ranjan Pal · Ajit Kumar Das

Estimation and Inferential Statistics


Pradip Kumar Sahu
Department of Agricultural Statistics
Bidhan Chandra Krishi Viswavidyalaya
Mohanpur, Nadia, West Bengal
India

Santi Ranjan Pal
Department of Agricultural Statistics
Bidhan Chandra Krishi Viswavidyalaya
Mohanpur, Nadia, West Bengal
India

Ajit Kumar Das
Department of Agricultural Statistics
Bidhan Chandra Krishi Viswavidyalaya
Mohanpur, Nadia, West Bengal
India

ISBN 978-81-322-2513-3 ISBN 978-81-322-2514-0 (eBook)


DOI 10.1007/978-81-322-2514-0

Library of Congress Control Number: 2015942750

Springer New Delhi Heidelberg New York Dordrecht London


© Springer India 2015
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made.

Printed on acid-free paper

Springer (India) Pvt. Ltd. is part of Springer Science+Business Media (www.springer.com)


Preface

Nowadays one can hardly find any field where statistics is not used. With a given
sample, one can infer about the population. The role of estimation and inferential
statistics remains pivotal in the study of statistics. Statistical inference is concerned
with problems of estimation of population parameters and test of hypotheses. In
statistical inference, drawing a conclusion about the population takes place on the
basis of a portion of the population. This book has been written keeping in mind the needs of the users, the literature presently available to cater to these needs, and its merits and demerits under a constantly changing scenario. Theories are followed by relevant worked-out examples which help the user grasp not only the theory but also its practice.
This work is a result of the experience of the authors in teaching and research
work for more than 20 years. The wider scope and coverage of the book will help
not only the students, researchers and professionals in the field of statistics but also
several others in various allied disciplines. All efforts have been made to present "estimation and statistical inference": its meaning, intention and usefulness. This book reflects current methodological techniques used in interdisciplinary research, as illustrated with many relevant research examples. Statistical tools have been presented with the help of real-life examples in such a manner that the fear factor about the otherwise complicated subject of statistics will vanish. In its seven chapters, theories followed by examples will help the readers find the most suitable applications.
Starting from the meaning of the statistical inference, its development, different
parts and types have been discussed eloquently. How someone can use statistical
inference in everyday life has remained the main point of discussion in examples.
How someone can draw conclusions about the population under varied situations,
even without studying each and every unit of the population, has been discussed
taking numerous examples. All sorts of inferential problems have been discussed in one place, supported by examples, to help the students not only in meeting their examination needs and research requirements, but also in daily life. One can hardly get such a compilation of statistical inference in one place. The step-by-step
procedure will immensely help not only graduate and Ph.D. students but also other researchers and professionals. Graduate and postgraduate students, researchers and professionals in various fields will be the users of the book. Researchers in medical, social and other disciplines will benefit greatly from the book. The book would also help students in various competitive examinations.
Written in a lucid language, the book will be useful to graduate, postgraduate and research students and practitioners in diverse fields, including the medical, social and other sciences. It will also cater to the need for preparation for different competitive examinations. One can hardly find a single book in which all topics related to estimation and inference are included. Numerous relevant examples for the related theories are an added feature of this book. An introductory chapter and an annexure are special features of this book, which will help readers grasp the basic ideas and fill gaps in their background. A chapter-wise summary of the content of the book is presented below.

Estimation and Inferential Statistics


• Chapter 1: This chapter introduces the theory of point estimation and inferential statistics. Different criteria for a good estimator are discussed. The chapter also presents real-life worked-out problems that help the reader understand the subject. Compared to the partial coverage of this topic in most books on statistical inference, this book aims at elaborate coverage of the subject of point estimation.
• Chapter 2: This chapter deals with different methods of estimation, such as the least squares method, the method of moments, the method of minimum χ2 and the method of maximum likelihood. Not all these methods are equally good and applicable in all situations. The merits, demerits and applicability of these methods have been discussed in one place, whereas they otherwise remain mostly dispersed or scattered in the competing literature.
• Chapter 3: Testing of hypotheses has been discussed in this chapter. The chapter is characterized by typical examples in different forms and spheres, including Type A1 testing, which is mostly overlooked in much of the available literature but has been taken up in this book.
• Chapter 4: The essence and technique of the likelihood ratio test have been discussed in this chapter. Irrespective of the nature of the hypotheses tested (simple or composite), this chapter emphasizes how easily the test can be performed, supported by a good number of examples. Merits and drawbacks have also been discussed. Some typical examples are discussed in this chapter that one can hardly find in any other competing literature.

• Chapter 5: This chapter deals with interval estimation. Techniques of interval estimation under different situations, and the problems and prospects of different approaches to interval estimation, have been discussed with numerous examples in one place.
• Chapter 6: This chapter deals with non-parametric methods of testing hypotheses. All types of non-parametric tests have been put together and discussed in detail. In each case, suitable examples are the special feature of this chapter.
• Chapter 7: This chapter is devoted to the discussion of decision theory. The discussion is particularly useful to students and researchers interested in inferential statistics. In this chapter, an attempt has been made to present decision theory in an exhaustive manner, keeping in mind the requirements and purposes of the readers for whom the book is intended. Bayes and minimax methods of estimation have been discussed in the Annexure. Most of the available literature on inferential statistics lacks due attention to these important aspects of inference. In this chapter, the importance and utilities of the above methods have been discussed in detail, supported with relevant examples.
• Annexure: The authors feel that the Annexure will be an asset to varied types of readers of this book. Related topics, proofs, examples, etc., which could not be provided in the text itself during the discussion of the various chapters, for the sake of continuity and flow, are provided in this section. Besides many useful proofs and derivations, this section includes transformation of statistics, large sample theories, and exact tests related to the binomial and Poisson populations. This added section will be of much help to the readers.
In each chapter, theories are followed by examples from applied fields, which will help the readers of this book to understand the theories and applications of specific tools. Attempts have been made to illustrate the problems with examples on each topic in a lucid manner. During the preparation of this book, a good number of books and articles from different national and international journals have been consulted. Efforts have been made to acknowledge and list these in the bibliography section. An inquisitive reader may find more material in the literature cited.
The primary purpose of the book is to help students of statistics and allied fields.
Sincere efforts have been made to present the material in the simplest and
easy-to-understand form. Encouragements, suggestions and help received from our
colleagues at the Department of Agricultural Statistics, Bidhan Chandra Krishi
Viswavidyalaya are sincerely acknowledged. Their valuable suggestions towards
improvement of the content helped a lot and are sincerely acknowledged. The
authors thankfully acknowledge the constructive suggestions received from the
reviewers towards the improvement of the book. Thanks are also due to Springer
for the publication of this book and for continuous monitoring, help and suggestions during this book project. The authors also acknowledge the help, cooperation and encouragement received from various corners which are not mentioned here. The effort will be successful if this book is well accepted by the students, teachers, researchers and other users for whom it is intended. Every effort has been made to avoid errors. Constructive suggestions from readers towards improving the quality of this book will be highly appreciated.

Mohanpur, Nadia, India
Pradip Kumar Sahu
Santi Ranjan Pal
Ajit Kumar Das
Contents

1 Theory of Point Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1


1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Sufficient Statistic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Unbiased Estimator and Minimum-Variance
Unbiased Estimator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.4 Consistent Estimator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
1.5 Efficient Estimator. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

2 Methods of Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.2 Method of Moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.3 Method of Maximum Likelihood . . . . . . . . . . . . . . . . . . . . . . . 48
2.4 Method of Minimum χ2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.5 Method of Least Square . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

3 Theory of Testing of Hypothesis . . . . . . . . . . . . . . . . . . . . . . . . . . 63


3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.2 Definitions and Some Examples . . . . . . . . . . . . . . . . . . . . . . . . 63
3.3 Method of Obtaining BCR . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.4 Locally MPU Test. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
3.5 Type A1 (Uniformly Most Powerful Unbiased) Test . . . . . . . . . 97

4 Likelihood Ratio Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
4.1.1 Some Selected Examples . . . . . . . . . . . . . . . . . . . . . . . . 104

5 Interval Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
5.2 Confidence Interval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131


5.3 Construction of Confidence Interval. . . . . . . . . . . . . . . . . . . . . . 132


5.4 Shortest Length Confidence Interval and Neyman’s Criterion . . . . 138

6 Non-parametric Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145


6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
6.2 One-Sample Non-parametric Tests. . . . . . . . . . . . . . . . . . . . . . . 146
6.2.1 Chi-Square Test (i.e., Test for Goodness of Fit) . . . . . . . . . 146
6.2.2 Kolmogorov–Smirnov Test . . . . . . . . . . . . . . . . . . . . . . . 147
6.2.3 Sign Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
6.2.4 Wilcoxon Signed-Rank Test . . . . . . . . . . . . . . . . . . . . . . 151
6.2.5 Run Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
6.3 Paired Sample Non-parametric Test . . . . . . . . . . . . . . . . . . . . . . 156
6.3.1 Sign Test (Bivariate Single Sample Problem)
or Paired Sample Sign Test . . . . . . . . . . . . . . . . . . . . . . 156
6.3.2 Wilcoxon Signed-Rank Test . . . . . . . . . . . . . . . . . . . . . . 157
6.4 Two-Sample Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
6.5 Non-parametric Tolerance Limits . . . . . . . . . . . . . . . . . . . . . . . 168
6.6 Non-parametric Confidence Interval for ξp . . . . . . . . . . . . . . . . . 170
6.7 Combination of Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
6.8 Measures of Association for Bivariate Samples . . . . . . . . . . . . . . 174

7 Statistical Decision Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181


7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
7.2 Complete and Minimal Complete Class of Decision Rules . . . . . . 189
7.3 Optimal Decision Rule. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
7.4 Method of Finding a Bayes Rule . . . . . . . . . . . . . . . . . . . . . . . 199
7.5 Methods for Finding Minimax Rule. . . . . . . . . . . . . . . . . . . . . . 208
7.6 Minimax Rule: Some Theoretical Aspects . . . . . . . . . . . . . . . . . 226
7.7 Invariance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228

Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237

References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
About the Authors

P.K. Sahu is associate professor and head of the Department of Agricultural
Statistics, Bidhan Chandra Krishi Viswavidyalaya (a state agriculture university),
West Bengal. With over 20 years of teaching experience, Dr. Sahu has published
over 70 research papers in several international journals of repute and has guided
several postgraduate students and research scholars. He has authored four books:
Agriculture and Applied Statistics, Vol. 1, and Agriculture and Applied Statistics,
Vol. 2 (both published with Kalyani Publishers), Gender, Education,
Empowerment: Stands of Women (published with Agrotech Publishing House) and
Research Methodology: A Guide for Researchers in Agricultural Science, Social
Science and Other Related Fields (published by Springer) as well as contributed a
chapter to the book Modelling, Forecasting, Artificial Neural Network and Expert
System in Fisheries and Aquaculture, edited by Ajit Kumar Roy and Niranjan
Sarangi (Daya Publishing House). Dr. Sahu has presented his research papers in
several international conferences. He also visited the USA, Bangladesh, Sri Lanka,
and Vietnam to attend international conferences.
S.R. Pal is former eminent professor at the Department of Agricultural Statistics at
R.K. Mission Residential College and Bidhan Chandra Krishi Viswavidyalaya (a
state agriculture university). An expert in agricultural statistics, Prof. Pal has over
35 years of teaching experience and has guided several postgraduate students and
research scholars. He has several research papers published in statistics and related
fields in several international journals of repute. With his vast experience in
teaching, research and industrial advisory role, Prof. Pal has tried to incorporate the
problems faced by the users, students, and researchers in this field.
A.K. Das is professor at the Department of Agricultural Statistics, Bidhan Chandra
Krishi Viswavidyalaya (a state agriculture university). With over 30 years of
teaching experience, Prof. Das has a number of good research articles to his credit
published in several international journals of repute and has guided several
postgraduate students and research scholars. He has coauthored a book, Agriculture
and Applied Statistics, Vol. 2 (published with Kalyani Publishers), and contributed
a chapter to the book Modelling, Forecasting, Artificial Neural Network and Expert
System in Fisheries and Aquaculture, edited by Ajit Kumar Roy and Niranjan
Sarangi (Daya Publishing House).
Introduction

In a statistical investigation, it is known that for reasons of time or cost one may
not be able to study each individual element of the population. In such a situation, a
random sample should be taken from the population, and the inference can be
drawn about the population on the basis of the sample. Hence, statistics deals with
the collection of data and their analysis and interpretation. In this book, the problem
of data collection is not considered. We shall take the data as given, and we study
what they have to tell us. The main objective is to draw a conclusion about the
unknown population characteristics on the basis of information on the same
characteristics of a suitably selected sample. The observations are now postulated to
be the values taken by random variables. Let X be a random variable which
describes the population under investigation and F be the distribution function of X.
There are two possibilities. Either X has a distribution function F_θ with a known
functional form (except perhaps for the parameter θ, which may be a vector), or X
has a distribution function F about which we know nothing (except perhaps that F
is, say, absolutely continuous). In the former case, let Θ be the set of possible values
of the unknown parameter θ; then the job of the statistician is to decide, on the basis
of suitably selected samples, which member or members of the family {F_θ; θ ∈ Θ}
can represent the distribution function of X. These types of problems are called
problems of parametric statistical inference. The two principal areas of statistical
inference are the "area of estimation of parameters" and the "tests of statistical
hypotheses". The problem of estimation of parameters involves both point and
interval estimation. Diagrammatically, the components and constituents of
statistical inference can be shown in a chart.


Problem of Point Estimation

The problem of point estimation relates to the estimating formula of a parameter
based on a random sample of size n from the population. The method basically
comprises finding an estimating formula for a parameter, which is called the
estimator of the parameter. The numerical value obtained on the basis of a sample
while using the estimating formula is called an estimate. Suppose, for example,
that a random variable X is known to have a normal distribution N(μ, σ²), but we do
not know one of the parameters, say μ. Suppose further that a sample X₁, X₂, …, Xₙ
is taken on X. The problem of point estimation is to pick a statistic T(X₁, X₂, …, Xₙ)
that best estimates the parameter μ. The numerical value of T when the realization
is x₁, x₂, …, xₙ is called an estimate of μ, while the statistic T is called an estimator
of μ. If both μ and σ² are unknown, we seek a joint statistic T = (U, V) as an
estimate of (μ, σ²).

Example Let X₁, X₂, …, Xₙ be a random sample from any distribution F_θ for which
the mean exists and is equal to θ. We may want to estimate the mean θ of the
distribution. For this purpose, we may compute the mean of the observations
x₁, x₂, …, xₙ, i.e.,

\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i .

This x̄ can be taken as the point estimate of θ.

Example Let X₁, X₂, …, Xₙ be a random sample from a Poisson distribution with
parameter λ, i.e., P(λ), where λ is not known. Then the mean of the observations
x₁, x₂, …, xₙ, i.e.,

\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i ,

is a point estimate of λ.

Example Let X₁, X₂, …, Xₙ be a random sample from a normal distribution with
parameters μ and σ², i.e., N(μ, σ²), where both μ and σ² are unknown; μ and σ² are,
respectively, the mean and the variance of the normal distribution. In this case, we
may take the joint statistic (x̄, s²) as a point estimate of (μ, σ²), where

\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \text{sample mean}

and

s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \text{sample mean square}.

Problem of Interval Estimation

In many cases, instead of point estimation, we are interested in constructing a
family of sets that contain the true (unknown) parameter value with a specified
(high) probability, say 100(1 − α)%. This set is taken to be an interval, which is
known as a confidence interval with confidence coefficient (1 − α), and the
technique of constructing such intervals is known as interval estimation.

Let X₁, X₂, …, Xₙ be a random sample from any distribution F_θ. Let θ̲(x) and θ̄(x)
be functions of x₁, x₂, …, xₙ. If P[θ̲(x) < θ < θ̄(x)] = 1 − α, then (θ̲(x), θ̄(x)) is called
a 100(1 − α)% confidence interval for θ, whereas θ̲(x) and θ̄(x) are, respectively,
called the lower and upper limits for θ.

Example Let X₁, X₂, …, Xₙ be a random sample from N(μ, σ²), where both μ and σ²
are unknown. We can find a 100(1 − α)% confidence interval for μ. To estimate the
population mean μ and the population variance σ², we may take the observed
sample mean

\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i

and the observed sample mean square

s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2,

respectively. A 100(1 − α)% confidence interval for μ is given by

\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}},

where t_{α/2, n−1} is the upper α/2 point of the t-distribution with (n − 1) d.f.
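As a hedged numerical sketch (added here, not from the original text), the interval can be computed as follows; the data values and the 95 % confidence level are purely illustrative, and SciPy's t quantile plays the role of t_{α/2, n−1}.

```python
import numpy as np
from scipy import stats

# Illustrative (hypothetical) data and confidence level
x = np.array([10.2, 9.7, 11.1, 10.5, 9.9, 10.8, 10.1, 10.4])
alpha = 0.05
n = len(x)

x_bar = x.mean()
s = x.std(ddof=1)                               # square root of the sample mean square
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)   # upper alpha/2 point of t with n-1 d.f.

half_width = t_crit * s / np.sqrt(n)
print(f"{100*(1-alpha):.0f}% CI for mu: ({x_bar - half_width:.3f}, {x_bar + half_width:.3f})")
```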

Problem of Testing of Hypothesis

Besides point estimation and interval estimation, we are often required to decide
which value among a set of values of a parameter is true for a given population
distribution, or we may be interested in finding out the relevant distribution to
describe a population. The procedure by which a decision is taken regarding the
plausible value of a parameter or the nature of a distribution is known as the testing
of hypotheses. Some examples of hypothesis, which can be subjected to statistical
tests, are as follows:
1. The average length of life μ of electric light bulbs of a certain brand is equal to
some specified value μ₀.
2. The average number of bacteria killed by test drops of germicide is equal to
some number.
3. Steel made by method A has a mean hardness greater than steel made by
method B.
4. Penicillin is more effective than streptomycin in the treatment of disease X.
5. The growing period of one hybrid of corn is more variable than the growing
period for other hybrids.
6. The manufacturer claims that the tires made by a new process have mean life
greater than the life of a tire manufactured by an earlier process.
7. Several varieties of wheat are equally important in terms of yields.
8. Several brands of batteries have different lifetimes.
9. The characters in the population are uncorrelated.
10. The proportion of non-defective items produced by machine A is greater than
that of machine B.
The examples given are simple in nature, and are well established and have
well-accepted decision rules.

Problems of Non-parametric Estimation

So far we have assumed in (parametric) statistical inference that the distribution of the
random variable being sampled is known except for some parameters. In practice, the
functional form of the distribution may be unknown. Here, we are not concerned with the
techniques of estimating the parameters directly, but with certain pertinent hypotheses
relating to the properties of the population, such as equality of distributions or tests of
randomness of the sample, without making any assumption about the nature of the
distribution function. Statistical inference under such a setup is called non-parametric.

Bayes Estimator
In the case of parametric inference, we consider a density function f(x|θ), where θ is a
fixed unknown quantity which can take any value in the parameter space Θ. In the
Bayesian approach, it is assumed that θ itself is a random variable and the density f(x|θ)
is the density of x for a given θ. For example, suppose we are interested in estimating
P, the fraction of defective items in a consignment. Consider a collection of lots,
called superlots. It may happen that the parameter P differs from lot to lot. In the
classical approach, we consider P as a fixed unknown parameter, whereas in the
Bayesian approach, we say that P varies from lot to lot: it is a random variable having
a density f(P), say. The Bayes method tries to use this additional information about P.
Example Let X₁, X₂, …, Xₙ be a random sample from the p.d.f.

f(x; \alpha, \beta) = \frac{1}{B(\alpha, \beta)}\, x^{\alpha-1}(1-x)^{\beta-1}, \quad 0 < x < 1,\ \alpha, \beta > 0.

Find the estimators of α and β by the method of moments.

Answer
We know

E(x) = \mu_1' = \frac{\alpha}{\alpha+\beta} \quad \text{and} \quad E(x^2) = \mu_2' = \frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)}.

Hence

\frac{\alpha}{\alpha+\beta} = \bar{x}, \qquad \frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)} = \frac{1}{n}\sum_{i=1}^{n} x_i^2 .

Solving, we get

\hat{\beta} = \frac{(\bar{x}-1)\left(\sum x_i^2 - n\bar{x}\right)}{\sum (x_i - \bar{x})^2} \quad \text{and} \quad \hat{\alpha} = \frac{\bar{x}\,\hat{\beta}}{1-\bar{x}}.
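The following small Python sketch (an added illustration, not part of the original) applies these moment formulas to a simulated Beta sample as a numerical check; the true values (α, β) = (2, 5) are arbitrary, and the data are assumed to lie in (0, 1).

```python
import numpy as np

def beta_moment_estimates(x):
    """Method-of-moments estimates of (alpha, beta) for a sample x in (0, 1)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    x_bar = x.mean()
    # beta_hat = (x_bar - 1) * (sum(x_i^2) - n*x_bar) / sum((x_i - x_bar)^2)
    beta_hat = (x_bar - 1.0) * (np.sum(x**2) - n * x_bar) / np.sum((x - x_bar) ** 2)
    alpha_hat = x_bar * beta_hat / (1.0 - x_bar)   # from alpha/(alpha+beta) = x_bar
    return alpha_hat, beta_hat

rng = np.random.default_rng(1)
sample = rng.beta(2.0, 5.0, size=500)    # true (alpha, beta) = (2, 5), chosen for illustration
print(beta_moment_estimates(sample))     # should be close to (2, 5)
```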

Example Let X₁, X₂, …, Xₙ be a random sample from the p.d.f.

f(x; \theta, r) = \frac{1}{\theta^r\,\Gamma(r)}\, e^{-x/\theta} x^{r-1}, \quad x > 0,\ \theta > 0,\ r > 0.

Find estimators of θ and r by
(i) the method of moments,
(ii) the method of maximum likelihood.

Answer
(i) Here

E(x) = \mu_1' = r\theta, \quad E(x^2) = \mu_2' = r(r+1)\theta^2, \quad \text{and} \quad m_1' = \bar{x}, \quad m_2' = \frac{1}{n}\sum_{i=1}^{n} x_i^2 .

Hence

r\theta = \bar{x}, \qquad r(r+1)\theta^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 .

Solving, we get

\hat{r} = \frac{n\bar{x}^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2} \quad \text{and} \quad \hat{\theta} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n\bar{x}}.

(ii) The likelihood is

L = \frac{1}{\theta^{nr}\,\{\Gamma(r)\}^n}\, e^{-\frac{1}{\theta}\sum_{i=1}^{n} x_i} \prod_{i=1}^{n} x_i^{r-1},

so that

\log L = -nr\log\theta - n\log\Gamma(r) - \frac{1}{\theta}\sum_{i=1}^{n} x_i + (r-1)\sum_{i=1}^{n}\log x_i .

Now,

\frac{\partial \log L}{\partial \theta} = -\frac{nr}{\theta} + \frac{n\bar{x}}{\theta^2} = 0 \;\Rightarrow\; \hat{\theta} = \frac{\bar{x}}{r}.

Also,

\frac{\partial \log L}{\partial r} = -n\log\theta - n\frac{\partial \log\Gamma(r)}{\partial r} + \sum_{i=1}^{n}\log x_i
= n\log r - n\frac{\Gamma'(r)}{\Gamma(r)} - n\log\bar{x} + \sum_{i=1}^{n}\log x_i \quad (\text{on putting } \theta = \bar{x}/r).

It is, however, difficult to solve the equation ∂log L/∂r = 0 and so to get the estimate of r.
Thus, for this example, the estimators of θ and r are more easily obtained by the method of
moments than by the method of maximum likelihood.
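The contrast between the two methods can be seen numerically. The hedged Python sketch below (added here as an illustration) computes the moment estimates in closed form and then solves the likelihood equation for r with SciPy's digamma function, which is exactly the step that has no closed-form solution. The true values r = 3, θ = 2 are arbitrary.

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

rng = np.random.default_rng(2)
# Gamma sample with shape r = 3 and scale theta = 2 (illustrative true values)
x = rng.gamma(shape=3.0, scale=2.0, size=1000)
n, x_bar = len(x), x.mean()

# Method of moments: r_hat = n*x_bar^2 / sum((x - x_bar)^2), theta_hat = x_bar / r_hat
s2n = np.sum((x - x_bar) ** 2)
r_mom = n * x_bar**2 / s2n
theta_mom = s2n / (n * x_bar)

# Maximum likelihood: solve  log r - psi(r) = log(x_bar) - mean(log x)  for r,
# then theta_hat = x_bar / r_hat (psi is the digamma function).
c = np.log(x_bar) - np.mean(np.log(x))
r_mle = brentq(lambda r: np.log(r) - digamma(r) - c, 1e-6, 1e6)
theta_mle = x_bar / r_mle

print("MoM:", r_mom, theta_mom)
print("MLE:", r_mle, theta_mle)
```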
Example Let X₁, X₂, …, Xₙ be a random sample from the rectangular (uniform)
distribution R(α, β). Find the estimators of α and β by the method of moments.
Answer We know

E(x) = \mu_1' = \frac{\alpha+\beta}{2} \quad \text{and} \quad V(x) = \mu_2 = \frac{(\beta-\alpha)^2}{12}.

Hence

\frac{\alpha+\beta}{2} = \bar{x} \quad \text{and} \quad \frac{(\beta-\alpha)^2}{12} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 .

Solving, we get

\hat{\alpha} = \bar{x} - \sqrt{\frac{3\sum (x_i - \bar{x})^2}{n}} \quad \text{and} \quad \hat{\beta} = \bar{x} + \sqrt{\frac{3\sum (x_i - \bar{x})^2}{n}}.

Example If a sample of size one is drawn from the p.d.f.

f(x; \beta) = \frac{2}{\beta^2}(\beta - x), \quad 0 < x < \beta,

find β̂, the MLE of β, and β*, the estimator of β based on the method of moments.
Show that β̂ is biased but β* is unbiased. Show that the efficiency of β̂ with respect
to β* is 2/3.

Solution
Here the likelihood of the single observation x is

L = \frac{2}{\beta^2}(\beta - x), \qquad \log L = \log 2 - 2\log\beta + \log(\beta - x).

Hence

\frac{\partial \log L}{\partial \beta} = -\frac{2}{\beta} + \frac{1}{\beta - x} = 0 \;\Rightarrow\; \hat{\beta} = 2x .

Now,

E(x) = \frac{2}{\beta^2}\int_0^{\beta} (\beta x - x^2)\,dx = \frac{\beta}{3}.

Hence the moment equation is β/3 = x, so the estimator of β based on the method of
moments is β* = 3x. Now,

E(\hat{\beta}) = 2\,\frac{\beta}{3} = \frac{2\beta}{3} \neq \beta, \qquad E(\beta^*) = 3\,\frac{\beta}{3} = \beta .

Hence β̂ is biased but β* is unbiased.
Again,

E(x^2) = \frac{2}{\beta^2}\int_0^{\beta} (\beta x^2 - x^3)\,dx = \frac{\beta^2}{6},

and therefore

V(x) = \frac{\beta^2}{6} - \frac{\beta^2}{9} = \frac{\beta^2}{18}.

It follows that

V(\beta^*) = 9\,V(x) = \frac{\beta^2}{2}, \qquad V(\hat{\beta}) = 4\,V(x) = \frac{2}{9}\beta^2 .

Hence the mean square error of β̂ is

M(\hat{\beta}) = V(\hat{\beta}) + \left[E(\hat{\beta}) - \beta\right]^2 = \frac{2}{9}\beta^2 + \left(\frac{2\beta}{3} - \beta\right)^2 = \frac{1}{3}\beta^2 .

Thus the efficiency of β̂ with respect to β*, given by the ratio M(β̂)/V(β*) = (β²/3)/(β²/2),
is 2/3.

Example Let (x₁, x₂, …, xₙ) be a given sample of size n. It is to be tested whether
the sample comes from some Poisson distribution with unknown mean μ. How do
you estimate μ by the method of modified minimum chi-square?

Solution
Let x₁, x₂, …, xₙ be arranged in k groups such that there are nᵢ observations with
x = i, i = r + 1, …, r + k − 2; n_L observations with x ≤ r; and n_U observations with
x ≥ r + k − 1, so that the smallest and the largest values of x, which occur less
frequently, are pooled together, and

n_L + \sum_{i=r+1}^{r+k-2} n_i + n_U = n .

Let

p_i(\mu) = P(x = i) = \frac{e^{-\mu}\mu^i}{i!}, \qquad
p_L(\mu) = P(x \le r) = \sum_{i=0}^{r} p_i(\mu), \qquad
p_U(\mu) = P(x \ge r+k-1) = \sum_{i=r+k-1}^{\infty} p_i(\mu).

Now, by using

\sum_{i=1}^{k} \frac{n_i}{p_i(\theta)}\,\frac{\partial p_i(\theta)}{\partial \theta_j} = 0, \qquad j = 1, 2, \ldots, p,

we have

n_L\,\frac{\sum_{i=0}^{r}\left(\frac{i}{\mu}-1\right)p_i(\mu)}{\sum_{i=0}^{r} p_i(\mu)}
+ \sum_{i=r+1}^{r+k-2} n_i\left(\frac{i}{\mu}-1\right)
+ n_U\,\frac{\sum_{i=r+k-1}^{\infty}\left(\frac{i}{\mu}-1\right)p_i(\mu)}{\sum_{i=r+k-1}^{\infty} p_i(\mu)} = 0 .

Since there is only one parameter (p = 1), we get only the above equation. Solving, we get

n\hat{\mu} = n_L\,\frac{\sum_{i=0}^{r} i\,p_i(\mu)}{\sum_{i=0}^{r} p_i(\mu)}
+ \sum_{i=r+1}^{r+k-2} i\,n_i
+ n_U\,\frac{\sum_{i=r+k-1}^{\infty} i\,p_i(\mu)}{\sum_{i=r+k-1}^{\infty} p_i(\mu)}
\approx \text{sum of all } x\text{'s}.

Hence μ̂ is approximately the sample mean x̄.

Example In general, we consider n uncorrelated observations y₁, y₂, …, yₙ such that

E(y_i) = \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} \quad \text{and} \quad V(y_i) = \sigma^2, \quad i = 1, 2, \ldots, n;\ x_{1i} = 1\ \forall i,

where β₁, β₂, …, β_k and σ² are unknown parameters. If Y and β stand for the column
vectors of the variables yᵢ and the parameters βⱼ, and if X = (xⱼᵢ) is an (n × k) matrix of
known coefficients xⱼᵢ, the above equations can be written as

E(Y) = X\beta \quad \text{and} \quad V(\varepsilon) = E(\varepsilon\varepsilon') = \sigma^2 I,

where ε = Y − Xβ is an (n × 1) vector of error random variables with E(ε) = 0 and I is the
(n × n) identity matrix. The least squares method requires that the β's be calculated such
that φ = ε′ε = (Y − Xβ)′(Y − Xβ) is a minimum. This is satisfied when

\frac{\partial \phi}{\partial \beta} = 0, \quad \text{i.e.,} \quad -2X'(Y - X\beta) = 0 .

The least squares estimator of β is thus given by the vector β̂ = (X′X)⁻¹X′Y.
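A brief numerical sketch of this estimator (an added illustration, not from the original text): it builds a design matrix X with a leading column of ones, simulates y with arbitrary true coefficients, and computes β̂ = (X′X)⁻¹X′Y; in practice one would normally use a dedicated solver such as numpy.linalg.lstsq rather than forming the inverse explicitly.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative model: E(y) = b1 + b2*x2 + b3*x3 with arbitrary true coefficients
n = 200
x2, x3 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x2, x3])          # x_{1i} = 1 for all i
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)  # add error with E(e) = 0

# Least squares estimator: beta_hat = (X'X)^{-1} X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # close to [1.0, 2.0, -0.5]
```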
Example Let yᵢ, i = 1, 2, …, n, be as in the previous example with k = 2, i.e.,
E(yᵢ) = β₁x₁ᵢ + β₂x₂ᵢ with x₁ᵢ = 1 for all i. Find the least squares estimates of β₁ and β₂.
Prove that the method of maximum likelihood and the method of least squares are
identical for the case of the normal distribution.

Solution
In matrix notation, we have E(Y) = Xβ, where

X = \begin{pmatrix} 1 & x_{21} \\ 1 & x_{22} \\ \vdots & \vdots \\ 1 & x_{2n} \end{pmatrix}, \qquad
\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}, \qquad
Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}.

Now β̂ = (X′X)⁻¹X′Y. Here

X'X = \begin{pmatrix} n & \sum x_{2i} \\ \sum x_{2i} & \sum x_{2i}^2 \end{pmatrix}, \qquad
X'Y = \begin{pmatrix} \sum y_i \\ \sum x_{2i} y_i \end{pmatrix}.

Then

\hat{\beta} = \frac{1}{n\sum x_{2i}^2 - \left(\sum x_{2i}\right)^2}
\begin{pmatrix} \sum x_{2i}^2 & -\sum x_{2i} \\ -\sum x_{2i} & n \end{pmatrix}
\begin{pmatrix} \sum y_i \\ \sum x_{2i} y_i \end{pmatrix}.

Hence

\hat{\beta}_2 = \frac{n\sum x_{2i}y_i - \sum x_{2i}\sum y_i}{n\sum x_{2i}^2 - \left(\sum x_{2i}\right)^2}
= \frac{\sum x_{2i}y_i - n\bar{x}_2\bar{y}}{\sum x_{2i}^2 - n\bar{x}_2^2}
= \frac{\sum (x_{2i}-\bar{x}_2)(y_i-\bar{y})}{\sum (x_{2i}-\bar{x}_2)^2}

and

\hat{\beta}_1 = \frac{\sum x_{2i}^2 \sum y_i - \sum x_{2i}\sum x_{2i}y_i}{n\sum x_{2i}^2 - \left(\sum x_{2i}\right)^2}
= \frac{\bar{y}\sum x_{2i}^2 - \bar{x}_2\sum x_{2i}y_i}{\sum x_{2i}^2 - n\bar{x}_2^2}
= \bar{y} + \frac{n\bar{x}_2^2\bar{y} - \bar{x}_2\sum x_{2i}y_i}{\sum x_{2i}^2 - n\bar{x}_2^2}
= \bar{y} - \bar{x}_2\hat{\beta}_2 .

Now let yᵢ be independent N(β₁ + β₂xᵢ, σ²) variates, i = 1, 2, …, n, so that
E(yᵢ) = β₁ + β₂xᵢ. The estimators of β₁ and β₂ are obtained by the method of least
squares by minimizing

\phi = \sum_{i=1}^{n}\left(y_i - \beta_1 - \beta_2 x_i\right)^2 .

The likelihood function is

L = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n e^{-\frac{1}{2\sigma^2}\sum (y_i - \beta_1 - \beta_2 x_i)^2}.

L is maximum when Σᵢ(yᵢ − β₁ − β₂xᵢ)² is minimum. By the method of maximum
likelihood we choose β₁ and β₂ such that Σᵢ(yᵢ − β₁ − β₂xᵢ)² = φ is minimum. Hence the
methods of least squares and maximum likelihood are identical.

Chapter 1
Theory of Point Estimation

1.1 Introduction

In carrying out any statistical investigation, we start with a suitable probability
model for the phenomenon that we seek to describe (the choice of the model is
dictated partly by the nature of the phenomenon and partly by the way data on the
phenomenon are collected; mathematical simplicity is also a point that is given
some consideration in choosing the model). In general, the model takes the form of
a specification of the joint distribution function of some random variables
X₁, X₂, …, Xₙ (all or some of which may as well be multidimensional). According
to the model, the distribution function F is supposed to be some (unspecified)
member of a more or less general class ℱ of distribution functions.

Example 1.1 In many situations, we start by assuming that X₁, X₂, …, Xₙ are iid
(independently and identically distributed) unidimensional r.v.'s (random variables)
with a common but unspecified distribution function F₁, say. In other words, the
model states that F is some member of the class of all distribution functions of the
form

F(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} F_1(x_i).

Example 1.2 In traditional statistical practice, it is frequently assumed that
X₁, X₂, …, Xₙ each have the normal distribution (but with mean and/or variance left
unspecified), besides making the assumption that they are iid r.v.'s.

In carrying out the statistical investigation, we then take as our goal the task of
specifying F more completely than is done by the model. This task is achieved by
taking a set of observations on the r.v.'s X₁, X₂, …, Xₙ. These observations are the
raw material of the investigation, and we may denote them, respectively, by
x₁, x₂, …, xₙ. These are used to make a guess about the distribution function F,
which is partly unknown.


The process is called Statistical Inference, being similar to the process of inductive
inference as envisaged in classical logic. For here too the problem is to know the
general nature of the phenomenon under study (as represented by the distribution of
the r.v’s) on the basis of the particular set of observations. The only difference that in a
statistical investigation induction is achieved within a probabilistic framework.
Probabilistic considerations enter into the picture in three ways. Firstly, the model used
to represent the field of study is probabilistic. Second, certain probabilistic principles
provide the guidelines in making the inference. Third, as we shall see in the sequel, the
reliability of the conclusions also is judged in probabilistic terms.
Random Sampling
Consider a statistical experiment that culminate in outcomes x which are the values
assumed by a r.v. X. Let F be the distribution function of X. One can also obtain n
independent observations on X. This means that the n values observed as
x1 ; x2 ; . . . ; xn are assumed by the r.v. X [This can be obtained by replicating the
experiment under (more or less) identical conditions]. Again each xi may be
regarded as the value assumed by a r.v. Xi, i = 1 (1)n, where X 1 ; X 2 ; . . . X n are
independent random variables with common distribution function F. The set
X 1 ; X 2 ; . . . X n of iid r.v’s is known as a random sample from the distribution
function F. The set of values ðx1 ; x2 ; . . . ; xn Þ is called a realization of the sample
ðX 1 ; X 2 ; . . .; X n Þ.
Parameter and Parameter Space
A constant which changes its value from one situation to another is knownpa-
rameter. The set of all admissible values of a parameter is often called the parameter
space. Parameter is denoted by θ (θ may be a vector). We denote the parameter
space by H.
Example 1.3
(a) Let y ¼ 2x þ h. Here, θ is a parameter and

H ¼ fh;  / \ h \ /g:

(b) Let x  bð1; pÞ. Here, p is a parameter and

H ¼ fp; 0 \ p \ 1g:

(c) Let x  PðkÞ Here, λ is a parameter and

H ¼ fk; k [ 0g:

(d) Let x  Nðl0 ; r2 Þ, μ0 is a known constant.


Here, σ is a parameter and H ¼ fr; r [ 0g:
 r Þ, both μ and σ are unknown.
(e) Let x  Nðl; 2

l   
Here, h ¼ is a parameter and H ¼ lr ; 1 \ l \ 1; r [ 0
r
Family of distributions
Let X ~ F_θ, where θ ∈ Θ. Then the set of distribution functions {F_θ, θ ∈ Θ} is
called a family of distribution functions. Similarly, we define a family of p.d.f.'s
and a family of p.m.f.'s.
Remark
(1) If the functional form of F_θ is known, then θ can be taken as an index.
(2) In the theory of estimation, we restrict ourselves to the case Θ ⊆ R^k, where k is
the number of unknown functionally unrelated parameters.

Statistic
A statistic is a function of observable random variables which must be free from
the unknown parameter(s); that is, a Borel measurable function f: Rⁿ → R^k of the
sample observations x = (x₁, x₂, …, xₙ) ∈ Rⁿ is often called a statistic.

Example 1.4 Let X₁, X₂, …, Xₙ be a random sample from N(μ, σ²). Then ΣXᵢ, ΣXᵢ²
and the pair (ΣXᵢ, ΣXᵢ²) — each of these is a statistic.

Estimator and estimate
Any statistic which is used to estimate (or to guess) τ(θ), a function of the unknown
parameter θ, is said to be an estimator of τ(θ). The experimentally determined value
of an estimator is called an estimate.

Example 1.5 Let X₁, X₂, …, X₅ be a random sample from P(λ). An estimator of λ is
X̄ = (1/5)Σᵢ₌₁⁵ Xᵢ. Suppose the experimentally determined values are X₁ = 1, X₂ = 4,
X₃ = 2, X₄ = 6 and X₅ = 0. Then the estimate of λ is (1 + 4 + 2 + 6 + 0)/5 = 2.6.

1.2 Sufficient Statistic

In statistics, the job of a statistician is to interpret the data that he has collected and to
draw statistically valid conclusions about the population under investigation. But, in
many cases, the raw data, which are too numerous and too costly to store, are not
suitable for this purpose. Therefore, the statistician would like to condense the data by
computing some statistics and to base his analysis on these statistics, so that there is no
loss of relevant information in doing so; that is, the statistician would like to choose
those statistics which exhaust all the information about the parameter that is contained
in the sample. Keeping this idea in mind, we define sufficient statistics as follows:

Definition Let X = (X₁, X₂, …, Xₙ) be a random sample from {F_θ; θ ∈ Θ}.
A statistic T(X) is said to be sufficient for θ [or for the family of distributions
{F_θ; θ ∈ Θ}] iff the conditional distribution of X given T is free from θ.

Illustration 1.1 Suppose we want to study the nature of a coin. To do this, we
want to estimate p, the probability of getting a head in a single toss. To estimate p,
n tosses are performed. Suppose the results are X₁, X₂, …, Xₙ, where

X_i = \begin{cases} 0 & \text{if a tail appears} \\ 1 & \text{if a head appears (in the } i\text{th toss)}. \end{cases}

Intuitively, it seems unnecessary to mention the order of occurrences of heads. To
estimate p, it is enough to keep a record of the number of heads. So the statistic
T = ΣXᵢ should be sufficient for p.

Again, the conditional probability of X₁ = x₁, X₂ = x₂, …, Xₙ = xₙ given T(X) = t,
where t = T(x₁, x₂, …, xₙ), is given by

P\{X_1 = x_1, \ldots, X_n = x_n \mid T = t\}
= \begin{cases} \dfrac{P\{X_1 = x_1, \ldots, X_n = x_n,\ T = t\}}{P\{T = t\}} & \text{if } \sum x_i = t \\[4pt] 0 & \text{otherwise} \end{cases}
= \begin{cases} \dfrac{p^{\sum x_i}(1-p)^{n-\sum x_i}}{\binom{n}{t}\, p^t (1-p)^{n-t}} & \text{if } \sum x_i = t \\[4pt] 0 & \text{otherwise} \end{cases}
= \begin{cases} \dfrac{1}{\binom{n}{t}} & \text{if } \sum x_i = t \\[4pt] 0 & \text{otherwise,} \end{cases}

which is free from the parameter p. So from the definition of sufficient statistics, we
observe that Σxᵢ is a sufficient statistic for p.
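The conditional-distribution calculation above can also be checked by simulation. The short Python sketch below (added here as an illustration, not part of the original) conditions simulated tosses on ΣXᵢ = t for two different values of p and shows that each admissible sample pattern has conditional probability close to 1/(n choose t) in both cases.

```python
import numpy as np
from collections import Counter

def conditional_dist(p, n=5, t=2, reps=200_000, seed=0):
    """Empirical distribution of the sample pattern given sum(X) = t."""
    rng = np.random.default_rng(seed)
    x = (rng.random((reps, n)) < p).astype(int)   # Bernoulli(p) tosses
    keep = x[x.sum(axis=1) == t]                  # condition on T = t
    counts = Counter(map(tuple, keep))
    total = sum(counts.values())
    return {k: v / total for k, v in sorted(counts.items())}

# For n = 5, t = 2 there are C(5,2) = 10 patterns, each with conditional
# probability about 1/10, whatever the value of p.
print(conditional_dist(p=0.3))
print(conditional_dist(p=0.8))
```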
Illustration 1.2 Let X₁, X₂, …, Xₙ be a random sample from N(μ, 1), where μ is
unknown. Consider an orthogonal transformation of the form

y_1 = \frac{X_1 + X_2 + \cdots + X_n}{\sqrt{n}}

and

y_k = \frac{(k-1)X_k - (X_1 + X_2 + \cdots + X_{k-1})}{\sqrt{k(k-1)}}, \quad k = 2(1)n.

Clearly, y₁ ~ N(√n μ, 1) and each y_k ~ N(0, 1), k = 2, …, n. Again, y₁, y₂, …, yₙ are
independent.

Note that the joint distribution of y₂, y₃, …, yₙ does not involve μ, i.e., y₂, …, yₙ do
not provide any information on μ. So to estimate μ, we use either the observations
on X₁, X₂, …, Xₙ or simply the observed value of y₁. So any analysis based on y₁ is
just as effective as the analysis based on all the observed values of X₁, X₂, …, Xₙ.
Hence, we can suggest that y₁ is a sufficient statistic for μ.

From the above discussion, we see that the conditional distribution of
(y₂, y₃, …, yₙ) given y₁ is the same as the unconditional distribution of
(y₂, y₃, …, yₙ). Hence, the conditional distribution of X given y₁ will be free from μ.
Thus, according to the definition of sufficient statistics, y₁ will be a sufficient
statistic for μ.

However, this approach is not always fruitful. To overcome this, we consider a
necessary and sufficient condition for a statistic to be sufficient. We first consider
the Fisher–Neyman criterion for the existence of a sufficient statistic for a parameter.

Let X = (X₁, X₂, …, Xₙ) be a random sample from a population with continuous
distribution function F_θ, θ ∈ Θ. Let T(X) be a statistic whose probability density
function is g{T(x); θ}. Then T(X) is a sufficient statistic for θ iff the joint probability
density function f(x; θ) of X₁, X₂, …, Xₙ can be expressed as

f(x; \theta) = g\{T(x); \theta\}\, h(x),

where, for every fixed value of T(x), h(x) does not depend upon θ.

Example 1.5 Let X₁, X₂, …, Xₙ be a random sample from the distribution that has
probability mass function

f(x; \theta) = \theta^x (1-\theta)^{1-x}, \quad x = 0, 1;\ 0 < \theta < 1.

The statistic T(X) = Σᵢ₌₁ⁿ Xᵢ has the probability mass function

g(t; \theta) = \frac{n!}{t!(n-t)!}\, \theta^t (1-\theta)^{n-t}, \quad t = 0, 1, 2, \ldots, n.

Thus the joint probability mass function of X₁, X₂, …, Xₙ may be written as

f(x; \theta) = \theta^{x_1+x_2+\cdots+x_n}(1-\theta)^{n-(x_1+x_2+\cdots+x_n)}
= \frac{n!}{t!(n-t)!}\,\theta^t(1-\theta)^{n-t}\cdot\frac{t!(n-t)!}{n!}.

By the Fisher–Neyman criterion, T(X) = X₁ + X₂ + ⋯ + Xₙ is a sufficient statistic
for θ. In some cases, it is quite tedious to find the p.d.f. or p.m.f. of a certain statistic
which is or is not a sufficient statistic for θ. This problem can be avoided if we use
the following

Fisher–Neyman factorization theorem
Let X = (X₁, X₂, …, Xₙ) be a random sample from a population with c.d.f. F_θ, θ ∈ Θ.
Furthermore, let all of X₁, X₂, …, Xₙ be of the discrete type or of the continuous
type. Then a statistic T(x) will be sufficient for θ, or for {F_θ; θ ∈ Θ}, iff the joint
p.m.f. or p.d.f. f(x; θ) of X₁, X₂, …, Xₙ can be expressed as

f(x; \theta) = g\{T(x); \theta\}\cdot h(x),

where the first factor g{T(x); θ} is a function of θ and x only through T(x), and for
fixed T(x) the second factor h(x) is free from θ and is non-negative.

Remark 1.1 When we say that a function is free from θ, we do not only mean that θ
does not appear in the functional form, but also that the domain of the function does
not involve θ. For example, the function

f(x) = \begin{cases} 1/2, & \theta - 1 < x < \theta + 1 \\ 0 & \text{otherwise} \end{cases}

does depend upon θ.

Corollary 1.1 Let T(X) be a sufficient statistic for θ and T′(X) = ψ{T(X)} be a
one-to-one function of T. Then T′(X) is also sufficient for the same parameter θ.

Proof Since T is sufficient for θ, by the factorization theorem we have

f(x; \theta) = g\{T(x); \theta\}\cdot h(x).

Since the function ψ is one-to-one,

f(x; \theta) = g\left[\psi^{-1}\{T'(x)\}; \theta\right] h(x).

Clearly, the first factor on the R.H.S. depends on θ and x only through T′(x), and
the second factor h(x) is free from θ and is non-negative. Therefore, according to
the factorizability criterion, we can say that T′(x) is also sufficient for the same
parameter θ.

Example 1.6 Let X₁, X₂, …, Xₙ be a random sample from b(1, π). We show that
(1/n)ΣXᵢ is a sufficient statistic for π.
The p.m.f. of x is

f_\theta(x) = \begin{cases} \theta^x(1-\theta)^{1-x} & \text{if } x = 0, 1 \quad [\theta \equiv \pi] \\ 0 & \text{otherwise,} \end{cases}

where 0 < θ < 1, i.e., the parameter space is Θ = (0, 1). Writing f_θ(x) in the form

f_\theta(x) = C(x)\,\theta^x(1-\theta)^{1-x} \quad \text{with} \quad C(x) = \begin{cases} 1 & \text{if } x = 0, 1 \\ 0 & \text{otherwise,} \end{cases}

we find that the joint p.m.f. of X₁, X₂, …, Xₙ is

\prod_i f_\theta(x_i) = \theta^{\sum x_i}(1-\theta)^{n-\sum x_i}\prod_i C(x_i) = g_\theta(t)\,h(x_1, x_2, \ldots, x_n) \ \text{(say)},

where t = Σxᵢ, g_θ(t) = θ^t(1 − θ)^{n−t} and h(x₁, x₂, …, xₙ) = ∏ᵢ C(xᵢ).
Hence, the factorization criterion is met by the joint distribution, implying that
T = ΣXᵢ is sufficient for θ. So is T/n, the sample proportion of successes, being in
one-to-one correspondence with T.

Example 1.7 Let X₁, …, Xₙ be a random sample from P(λ). We show that (1/n)ΣXᵢ
is a sufficient statistic for λ.
The p.m.f. of the Poisson distribution is

f_\theta(x) = \begin{cases} \dfrac{e^{-\theta}\theta^x}{x!} & \text{if } x = 0, 1, 2, \ldots \quad [\theta \equiv \lambda] \\[4pt] 0 & \text{otherwise,} \end{cases}

where 0 < θ < ∞, i.e., Θ = (0, ∞). Let us write the p.m.f. in the form
f_θ(x) = C(x)e^{−θ}θ^x with

C(x) = \begin{cases} \dfrac{1}{x!} & \text{if } x = 0, 1, 2, \ldots \\[4pt] 0 & \text{otherwise.} \end{cases}

We may represent the joint p.m.f. of X₁, X₂, …, Xₙ as

\prod_i f_\theta(x_i) = e^{-n\theta}\,\theta^{\sum x_i}\prod_i C(x_i) = g_\theta(t)\,h(x_1, x_2, \ldots, x_n) \ \text{(say)},

where t = Σxᵢ, g_θ(t) = e^{−nθ}θ^t and h(x₁, x₂, …, xₙ) = ∏ᵢ C(xᵢ).
The factorizability condition is thus observed to hold, so that T = ΣXᵢ is sufficient
for θ; so is T/n = X̄, the sample mean.

Example 1.8 Let X₁, X₂, …, Xₙ be a random sample from N(μ, σ²). Show that
(i) if σ is known, ΣXᵢ is a sufficient statistic for μ; (ii) if μ is known, Σ(Xᵢ − μ)² is a
sufficient statistic for σ²; and (iii) if both μ and σ are unknown, (ΣXᵢ, ΣXᵢ²) is a
sufficient statistic for (μ, σ²).

Ans. (i) We may take the variance to be σ² and the unknown mean to be μ, varying
over the space Θ = (−∞, ∞). Here, the joint p.d.f. of X₁, X₂, …, Xₙ is

\prod_i \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(x_i-\mu)^2}
= \frac{1}{(\sigma\sqrt{2\pi})^n}\, e^{-\frac{1}{2\sigma^2}\sum_i (x_i-\mu)^2}
= \left\{ e^{-\frac{n(\bar{x}-\mu)^2}{2\sigma^2}} \right\}
  \left\{ e^{-\frac{\sum_i (x_i-\bar{x})^2}{2\sigma^2}}\,\frac{1}{(\sigma\sqrt{2\pi})^n} \right\}
= g_\mu(t)\cdot h(x_1, x_2, \ldots, x_n) \ \text{(say)},

where t = x̄, g_μ(t) = e^{−n(x̄−μ)²/(2σ²)} and
h(x₁, x₂, …, xₙ) = (σ√(2π))^{−n} e^{−Σᵢ(xᵢ−x̄)²/(2σ²)}.
Thus the factorizability condition holds with respect to T = X̄, the sample mean,
which is therefore sufficient for μ; so is the sum ΣᵢXᵢ.

(ii) The unknown variance σ² = θ, say, is supposed to vary over Θ = (0, ∞). The
joint p.d.f. of X₁, X₂, …, Xₙ may be written as

\prod_i \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(x_i-\mu)^2}
= \frac{1}{(\sigma\sqrt{2\pi})^n}\, e^{-\frac{1}{2\sigma^2}\sum_i (x_i-\mu)^2}
= g_\theta(t)\, h(x_1, x_2, \ldots, x_n), \ \text{say,}

where t = Σᵢ(xᵢ − μ)², g_θ(t) = (√(2π)σ)^{−n} e^{−t/(2σ²)} with σ² ≡ θ, and h(x) = 1.
Hence, T = Σ(Xᵢ − μ)² is a sufficient statistic for θ; so is S₀² = (1/n)Σᵢ(xᵢ − μ)²,
which in this situation is commonly used to estimate σ².

(iii) Taking the unknown mean and variance to be θ₁ and θ₂, respectively, we now
have for θ a vector θ = (θ₁, θ₂) varying over the parameter space (which is a
half-plane) Θ = {(θ₁, θ₂): −∞ < θ₁ < ∞, 0 < θ₂ < ∞}. The joint p.d.f. of
X₁, X₂, …, Xₙ may now be written as

\prod_i \frac{1}{\sqrt{2\pi\theta_2}}\, e^{-\frac{1}{2\theta_2}(x_i-\theta_1)^2}
= \frac{1}{(2\pi\theta_2)^{n/2}}\, e^{-\frac{1}{2\theta_2}\left[n(\bar{x}-\theta_1)^2 + (n-1)s^2\right]}
= g_\theta(t_1, t_2)\, h(x), \ \text{say,}

where t₁ = x̄, t₂ = s² = Σᵢ(xᵢ − x̄)²/(n − 1),
g_θ(t₁, t₂) = (2πθ₂)^{−n/2} e^{−[n(x̄−θ₁)² + (n−1)s²]/(2θ₂)} and h(x) = 1.

The factorizability condition is thus observed to hold with regard to the statistics
T₁ = X̄, the sample mean, and T₂ = s², the sample variance. Hence, X̄ and s² are
jointly sufficient for θ₁ and θ₂, i.e., (ΣXᵢ, ΣXᵢ²) is a joint sufficient statistic for (μ, σ²).

Example 1.9 Let X₁, X₂, …, Xₙ be a random sample from R(0, θ). Show that
X₍ₙ₎ = max₁≤ᵢ≤ₙ Xᵢ is a sufficient statistic for θ.
Ans.: The joint p.d.f. of x₁, x₂, …, xₙ is

f(x; \theta) = \begin{cases} \dfrac{1}{\theta^n} & \text{if } 0 < x_i < \theta \ \forall i \\[4pt] 0 & \text{otherwise} \end{cases}
= \begin{cases} \dfrac{1}{\theta^n} & \text{if } 0 < x_{(n)} < \theta \\[4pt] 0 & \text{otherwise} \end{cases}
= \frac{1}{\theta^n}\, I_{(0,\theta)}\{x_{(n)}\}, \quad \text{where } I_{(a,b)}(x) = \begin{cases} 1 & \text{if } a < x < b \\ 0 & \text{otherwise} \end{cases}
= g\{x_{(n)}; \theta\}\cdot h(x), \ \text{say,}

where g{x₍ₙ₎; θ} = (1/θⁿ)I₍₀,θ₎{x₍ₙ₎} and h(x) = 1.
Note that g{x₍ₙ₎, θ} is a function of θ and x only through x₍ₙ₎, whereas for fixed
x₍ₙ₎, h(x) is free from θ. Hence, x₍ₙ₎ is a sufficient statistic for θ.

Example 1.10 Let X₁, X₂, …, Xₙ be a random sample from R(θ₁, θ₂). Show that
{X₍₁₎, X₍ₙ₎} is a sufficient statistic for θ = (θ₁, θ₂), where X₍₁₎ = min₁≤ᵢ≤ₙ Xᵢ and
X₍ₙ₎ = max₁≤ᵢ≤ₙ Xᵢ.
Solution The joint p.d.f. of X₁, X₂, …, Xₙ is

f(x; \theta) = \begin{cases} \dfrac{1}{(\theta_2-\theta_1)^n} & \text{if } \theta_1 < x_i < \theta_2 \ \forall i \\[4pt] 0 & \text{otherwise} \end{cases}
= \begin{cases} \dfrac{1}{(\theta_2-\theta_1)^n} & \text{if } \theta_1 < x_{(1)} \le x_{(n)} < \theta_2 \\[4pt] 0 & \text{otherwise} \end{cases}
= \frac{1}{(\theta_2-\theta_1)^n}\, I_{1(\theta_1,\infty)}\{x_{(1)}\}\, I_{2(-\infty,\theta_2)}\{x_{(n)}\},

where

I_{1(\theta_1,\infty)}\{x_{(1)}\} = \begin{cases} 1 & \text{if } \theta_1 < x_{(1)} < \infty \\ 0 & \text{otherwise} \end{cases}
\quad \text{and} \quad
I_{2(-\infty,\theta_2)}\{x_{(n)}\} = \begin{cases} 1 & \text{if } -\infty < x_{(n)} < \theta_2 \\ 0 & \text{otherwise,} \end{cases}

i.e., f(x; θ) = g[{x₍₁₎, x₍ₙ₎}; (θ₁, θ₂)] h(x), where
g[{x₍₁₎, x₍ₙ₎}; (θ₁, θ₂)] = (θ₂ − θ₁)^{−n} I₁₍θ₁,∞₎{x₍₁₎} I₂₍₋∞,θ₂₎{x₍ₙ₎} and h(x) = 1.

Note that g is a function of (θ₁, θ₂) and x only through {x₍₁₎, x₍ₙ₎}, whereas for
fixed {x₍₁₎, x₍ₙ₎}, h(x) is free from θ.
Hence, {x₍₁₎, x₍ₙ₎} is a sufficient statistic for (θ₁, θ₂).

Example 1.11 Let X₁, X₂, …, Xₙ be a random sample from a population having p.d.f.

f(x; \theta) = \begin{cases} e^{-(x-\theta)}, & x > \theta \\ 0 & \text{otherwise.} \end{cases}

Show that X₍₁₎ = min₁≤ᵢ≤ₙ Xᵢ is a sufficient statistic for θ.
Solution The joint p.d.f. of X₁, X₂, …, Xₙ is

f(x; \theta) = \begin{cases} e^{-\sum_i (x_i-\theta)} & \text{if } x_{(1)} > \theta \\ 0 & \text{otherwise} \end{cases}
= e^{-\sum_i (x_i-\theta)}\, I_{(\theta,\infty)}\{x_{(1)}\}, \quad \text{where } I_{(\theta,\infty)}\{x_{(1)}\} = \begin{cases} 1 & \text{if } x_{(1)} > \theta \\ 0 & \text{otherwise} \end{cases}
= \left[e^{n\theta}\, I_{(\theta,\infty)}\{x_{(1)}\}\right]\left[e^{-\sum_i x_i}\right]
= g\{x_{(1)}; \theta\}\cdot h(x), \ \text{say,}

where g{x₍₁₎; θ} = e^{nθ} I₍θ,∞₎{x₍₁₎} and h(x) = e^{−Σᵢxᵢ}.
Note that g{x₍₁₎, θ} is a function of θ and x only through x₍₁₎, and for fixed x₍₁₎,
h(x) is free from θ. Hence, according to the factorizability criterion, x₍₁₎ is a sufficient
statistic for θ.
Note In the above three problems, the domain of the probability density depends
upon the parameter θ. In this situation, we should be careful in applying the
Fisher–Neyman factorization theorem, and we should give proper consideration to
the domain of the function h(x) for every fixed value of T(X). In such situations, it
is better to use the Fisher–Neyman criterion. Let us solve Example 1.10 by using
the Fisher–Neyman criterion. Here

f(x) = \frac{1}{\theta_2 - \theta_1}, \quad \theta_1 < x < \theta_2 .

Let X₍₁₎ = min₁≤ᵢ≤ₙ Xᵢ = y₁ and X₍ₙ₎ = max₁≤ᵢ≤ₙ Xᵢ = y₂. The joint p.d.f. of (y₁, y₂) is

g(y_1, y_2; \theta_1, \theta_2) = \frac{n(n-1)}{(\theta_2-\theta_1)^n}\,(y_2 - y_1)^{n-2}, \quad \theta_1 < y_1 < y_2 < \theta_2 .

The joint p.d.f. of X₁, X₂, …, Xₙ is

f(x; \theta_1, \theta_2) = \frac{1}{(\theta_2-\theta_1)^n}
= \frac{n(n-1)}{(\theta_2-\theta_1)^n}\left(x_{(n)} - x_{(1)}\right)^{n-2} \cdot \frac{1}{n(n-1)\left(x_{(n)} - x_{(1)}\right)^{n-2}}
= g\left\{x_{(1)}, x_{(n)}; \theta_1, \theta_2\right\} h(x).

By the Fisher–Neyman criterion, (x₍₁₎, x₍ₙ₎) is a sufficient statistic for θ = (θ₁, θ₂).

Example 1.12 Let X ~ N(0, σ²). Show that |X| is sufficient for σ.
Solution

f(x; \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^2}{2\sigma^2}}, \quad \sigma > 0
= \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{|x|^2}{2\sigma^2}} \cdot 1
= g(t; \sigma)\, h(x), \quad h(x) = 1,

where g(t; σ) is a function of σ and x only through t = |x|, and for fixed t, h(x) = 1 is
free from σ. Hence, by the Fisher–Neyman factorization theorem, |X| is sufficient for σ.

Example 1.13 Let X₁, X₂, …, Xₙ be a random sample from a double-exponential
distribution whose p.d.f. may be taken as f_θ(x) = ½exp(−|x − θ|), where the unknown
parameter θ varies over the space Θ = (−∞, ∞). In this case, the joint p.d.f. is

\prod_i f_\theta(x_i) = \frac{1}{2^n}\exp\left(-\sum_i |x_i - \theta|\right).

For no single statistic T is it possible to express the joint p.d.f. in the form
g_θ(t)h(x₁, x₂, …, xₙ). Hence, there exists no statistic T which, taken alone, is
sufficient for θ. The whole set X₁, X₂, …, Xₙ, or the set X₍₁₎, X₍₂₎, …, X₍ₙ₎, is of
course sufficient.

Remark 1.2 A single sufficient statistic does not always exist. For example, let
X₁, X₂, …, Xₙ be a random sample from a population having p.d.f.

f(x; \theta) = \begin{cases} \dfrac{1}{\theta}, & k\theta < x < (k+1)\theta,\ k > 0 \\[4pt] 0 & \text{otherwise.} \end{cases}

Here, no single sufficient statistic for θ exists. In fact, {x₍₁₎, x₍ₙ₎} is sufficient for θ.

Remark 1.3 Not all functions of a sufficient statistic are sufficient. For example, in
random sampling from N(μ, σ²), σ² being known, X̄² is not sufficient for μ. (Is X̄²
sufficient for μ²?)

Remark 1.4 Not all statistics are sufficient. Let X₁, X₂ be a random sample from
P(λ). Then X₁ + 2X₂ is not sufficient for λ because, in particular,

P\{X_1 = 0, X_2 = 1 \mid X_1 + 2X_2 = 2\}
= \frac{P\{X_1 = 0, X_2 = 1\}}{P\{X_1 = 0, X_2 = 1\} + P\{X_1 = 2, X_2 = 0\}}
= \frac{e^{-\lambda}\cdot e^{-\lambda}\lambda}{e^{-\lambda}\cdot e^{-\lambda}\lambda + e^{-\lambda}\frac{\lambda^2}{2!}\cdot e^{-\lambda}}
= \frac{2}{2+\lambda},

which depends upon λ.


Remark 1.5 Let θ = (θ₁, θ₂, …, θ_k) and T = (T₁, T₂, …, T_m). Further, let T be a
sufficient statistic for θ. Then we cannot put any restriction on m relative to k, the
number of parameters involved in the distribution. Even if m = k, we cannot say
that the component Tᵢ of T is sufficient for the component θᵢ of θ. It is better to say
that (T₁, T₂, …, T_m) are jointly sufficient for (θ₁, θ₂, …, θ_k). For example, let
X₁, X₂, …, Xₙ be a random sample from N(μ, σ²). Here, ΣXᵢ and ΣXᵢ² are jointly
sufficient for μ and σ².

Remark 1.6 The whole set of observations X = (X₁, X₂, …, Xₙ) is always sufficient
for θ. But we do not consider this to be a real sufficient statistic when another
sufficient statistic exists. There are a few situations where the whole set of
observations is a sufficient statistic [as shown in the example of the
double-exponential distribution].

Remark 1.7 The set of all order statistics T = {X₍₁₎, X₍₂₎, …, X₍ₙ₎},
X₍₁₎ ≤ X₍₂₎ ≤ ⋯ ≤ X₍ₙ₎, is sufficient for the family. The conditional distribution of
(X | T = t) is 1/n!, because for each T = t we have n! n-tuples of the form
(x₁, x₂, …, xₙ).

Remark 1.8 (Distributions admitting a sufficient statistic) Let X₁, X₂, …, Xₙ be a
random sample from f(x; θ) and let T(X) be a sufficient statistic for θ (θ a scalar).
According to the factorization theorem,

\sum_i \log f(x_i; \theta) = \log g(T; \theta) + \log h(x).

Differentiating w.r.t. θ, we have

\sum_i \frac{\partial \log f(x_i; \theta)}{\partial \theta} = \frac{\partial \log g(T; \theta)}{\partial \theta} = G(T, \theta), \ \text{(say)} \qquad (1.1)

Putting a particular value of θ in (1.1), we have

\sum_{i=1}^{n} u(x_i) = G(T). \qquad (1.2)

Now, differentiating (1.1) and (1.2) w.r.t. xᵢ, we have

\frac{\partial^2 \log f(x_i; \theta)}{\partial \theta\, \partial x_i} = \frac{\partial G(T, \theta)}{\partial T}\cdot \frac{\partial T}{\partial x_i} \qquad (1.3)

\frac{\partial u(x_i)}{\partial x_i} = \frac{\partial G(T)}{\partial T}\cdot \frac{\partial T}{\partial x_i} \qquad (1.4)

Equations (1.3) and (1.4) give us

\frac{\partial^2 \log f(x_i; \theta)}{\partial \theta\, \partial x_i} \Big/ \frac{\partial u(x_i)}{\partial x_i} = \frac{\partial G(T, \theta)/\partial T}{\partial G(T)/\partial T} \quad \forall i. \qquad (1.5)

Since the R.H.S. of (1.5) is free from xᵢ, we can write

\frac{\partial G(T, \theta)}{\partial T} = k_1(\theta)\,\frac{\partial G(T)}{\partial T}
\;\Rightarrow\; G(T, \theta) = G(T)k_1(\theta) + k_2(\theta)
\;\Rightarrow\; \frac{\partial}{\partial \theta}\sum_i \log f(x_i; \theta) = G(T)k_1(\theta) + k_2(\theta)
\;\Rightarrow\; \sum_i \log f(x_i; \theta) = G(T)\int k_1(\theta)\,d\theta + \int k_2(\theta)\,d\theta + c(x)
\;\Rightarrow\; \prod_i f(x_i; \theta) = A(x)\, e^{\theta_1 G(T) + \theta_2},

where A(x) is a function of x, θ₁ is a function of θ, and θ₂ is another function of θ.
Thus if a distribution is to have a sufficient statistic for its parameter, it must be of
the form

f(x; \theta) = e^{B_1(\theta)u(x) + B_2(\theta) + R(x)}. \qquad (1.6)

(1.6) is known as the Koopman form.

Example Show, by expressing the Poisson p.m.f. in Koopman form, that the
Poisson distribution possesses a sufficient statistic for its parameter λ.
Here,

f(x; \lambda) = \frac{e^{-\lambda}\lambda^x}{x!} = e^{-\lambda + x\log\lambda - \log x!},

which is of the form e^{B₁(θ)u(x) + B₂(θ) + R(x)}. Hence, there exists a sufficient
statistic for λ.
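As a further illustration (added here; it is not part of the original example), a normal distribution with known variance σ² can also be written in Koopman form:

f(x; \mu) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}
= \exp\left\{ \frac{\mu}{\sigma^2}\,x \;-\; \frac{\mu^2}{2\sigma^2} \;+\; \left(-\frac{x^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2)\right) \right\},

so that B₁(θ) = μ/σ², u(x) = x, B₂(θ) = −μ²/(2σ²) and R(x) = −x²/(2σ²) − ½log(2πσ²). The corresponding sufficient statistic based on a sample is Σu(xᵢ) = Σxᵢ, in agreement with Example 1.8(i).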
Completeness A family of distributions is said to be complete

if E ½gðX Þ ¼ 0 8h2H
) PfgðxÞ ¼ 0g ¼ 1 8h2H

A statistic T is said to be complete if family of distributions of T is complete.


Examples 1.14P(a) Let X1, X2,…, Xn be a random sample from b(1, π), 0 < π < 1.
Then T ¼ ni¼ 1 X i is a complete statistic.

As E ½gðT Þ ¼ 0 8 p 2 ð0; 1Þ
X n  
) gðtÞ nt pt ð1  pÞnt ¼ 0
t¼0
X
n n 
p t
) ð 1  pÞ n gð t Þ ¼ 0 8 p 2 ð0; 1Þ
t¼0
t
1p
) gð t Þ ¼ 0 for t ¼ 0; 1; 2. . . n 8 p 2 ð0; 1Þ
) P fgð t Þ ¼ 0g ¼ 1 8p
(b) Let X * N (0, σ2). Then X is not complete

as; E ð X Þ ¼ 0 ; P ð X ¼ 0Þ ¼ 1 8 r2

(c) If X  Uð0; hÞ, then X is a complete statistic [or R(0, θ)].


A statistic is said to be complete sufficient statistic if it is complete as well as
sufficient. P
If (X1, X2,…, Xn) is a random sample from b (1, π), 0 < π < 1, then
P T ¼ X i is
also sufficient. So T is a complete sufficient statistic where T ¼ Xi.
Minimal Sufficient Statistic
A statistic T is said to be minimal sufficient if it is a function of every other sufficient statistic.
The sufficiency principle
A sufficient statistic for a parameter θ is a statistic that, in a certain sense, captures all the information about θ contained in the sample. Any additional information in the sample, besides the value of the sufficient statistic, does not contain any more information about θ. These considerations lead to the data reduction technique known as the sufficiency principle.
If $T(\underline{X})$ is a sufficient statistic for θ, then any inference about θ should depend on the sample $\underline{X}$ only through the value $T(\underline{X})$; that is, if $\underline{x}$ and $\underline{y}$ are two sample points such that $T(\underline{x}) = T(\underline{y})$, then the inference about θ should be the same whether $\underline{X} = \underline{x}$ or $\underline{X} = \underline{y}$ is observed.

Definition (Sufficient statistic) A statistic $T(\underline{X})$ is a sufficient statistic for θ if the conditional distribution of the sample $\underline{X}$ given the value of $T(\underline{X})$ does not depend on θ.

Factorization theorem: Let $f(\underline{x}\,|\,\theta)$ denote the joint p.d.f./p.m.f. of a sample $\underline{X}$. A statistic $T(\underline{X})$ is a sufficient statistic for θ iff there exist functions $g(t\,|\,\theta)$ and $h(\underline{x})$ such that, for all sample points $\underline{x}$ and all parameter values θ,

$f(\underline{x}\,|\,\theta) = g(T(\underline{x})\,|\,\theta)\,h(\underline{x}).$
Result: If $T(\underline{X})$ is a function of $T'(\underline{X})$, then sufficiency of $T'(\underline{X})$ implies sufficiency of $T(\underline{X})$, a function of $T'(\underline{X})$.
Proof Let $\{B_{t'} : t' \in \tau'\}$ and $\{A_t : t \in \tau\}$ be the partitions induced by $T'(\underline{X})$ and $T(\underline{X})$, respectively.
Since $T(\underline{X})$ is a function of $T'(\underline{X})$, for each $t' \in \tau'$, $B_{t'} \subseteq A_t$ for some $t \in \tau$.
Thus, sufficiency of $T'(\underline{X})$
⇔ the conditional distribution of $\underline{X} = \underline{x}$ given $T'(\underline{X}) = t'$ is independent of θ, ∀ t′ ∈ τ′
⇔ the conditional distribution of $\underline{X} = \underline{x}$ given $\underline{X} \in B_{t'}$ is independent of θ, ∀ t′ ∈ τ′
⇒ the conditional distribution of $\underline{X} = \underline{x}$ given $\underline{X} \in A_t$ (for some t ∈ τ) is independent of θ, ∀ t ∈ τ
⇔ the conditional distribution of $\underline{X} = \underline{x}$ given $T(\underline{X}) = t$ is independent of θ, ∀ t ∈ τ
⇔ sufficiency of $T(\underline{X})$.  □

Sufficient statistic for an exponential family of distributions:
Let $X_1, X_2, \ldots, X_n$ be i.i.d. observations from a p.d.f./p.m.f. $f(x\,|\,\underline{\theta})$ that belongs to an exponential family given by

$f(x\,|\,\underline{\theta}) = h(x)\,c(\underline{\theta})\exp\!\left(\sum_{i=1}^{k} w_i(\underline{\theta})\,t_i(x)\right)$

[the weight functions $w_i(\underline{\theta})$ are written here in the standard exponential-family notation], where $\underline{\theta} = (\theta_1, \theta_2, \ldots, \theta_d)$, $d \le k$. Then

$T(\underline{X}) = \left(\sum_{j=1}^{n} t_1(X_j), \ldots, \sum_{j=1}^{n} t_k(X_j)\right)$

is a (complete) sufficient statistic for $\underline{\theta}$.
Minimal sufficient statistic


When we introduced the concept of sufficiency, we said that our objective was to condense the data without losing any information about the parameter. In any problem, there are, in fact, many sufficient statistics. In general, we have to consider the choice between alternative sets of sufficient statistics. In a sample of n observations, we always have a set of n sufficient statistics [viz., the observations $\underline{X} = (X_1, X_2, \ldots, X_n)$ themselves or the order statistics $(X_{(1)}, X_{(2)}, \ldots, X_{(n)})$] for the k(≥1) parameters of the distributions. For example, in sampling from the $N(\mu, \sigma^2)$ distribution with both μ and σ² unknown, there are, in fact, three sets of jointly sufficient statistics: the observations $\underline{X} = (X_1, X_2, \ldots, X_n)$, the order statistics $(X_{(1)}, X_{(2)}, \ldots, X_{(n)})$ and $(\bar{X}, s^2)$. We naturally prefer the jointly sufficient statistic $(\bar{X}, s^2)$ since it condenses the data more than either of the other two. Sometimes, though not always, there will be a set of s (< n) statistics sufficient for the parameters. Often s = k, but s may be < k also.
The question that we might ask is as follows: does there exist a set of sufficient statistics that condenses the data more than $(\bar{X}, s^2)$? The answer is that there does not. The notion that we are alluding to is of a minimum set of sufficient statistics, which we label a minimal sufficient statistic. In other words, we ask: what is the smallest number s of statistics that constitute a sufficient set in any problem? It may be said in general that a sufficient statistic T may be expected to be minimal sufficient if it has the same dimension (i.e. the same number of components) as θ.
Statistics and partition


It may be noted that every statistic induces a partition of the sample space $\mathcal{X}$. The same is true for a set of statistics; a set of statistics induces a partition of $\mathcal{X}$. Loosely speaking, the condensation of data that a statistic or a set of statistics achieves can be measured by the number of subsets in the partition induced by it. If a set of statistics has fewer subsets (co-sets) in its induced partition than another set of statistics, then we say that the first set condenses the data more than the latter. Still loosely speaking, a minimal sufficient set of statistics is then a sufficient set of statistics that has fewer subsets in its partition than the induced partition of any other set of sufficient statistics. So a set of sufficient statistics is minimal if no other set of sufficient statistics condenses the data more without losing sufficiency.
Thus T is minimal sufficient if any further reduction of the data is not possible without losing sufficiency, i.e. T is minimal sufficient if there does not exist a (many-to-one) function U = ψ(T) such that U is sufficient.
Definition (Minimal sufficient statistic) A sufficient statistic $T(\underline{X})$ is called minimal sufficient if, for every other sufficient statistic $T'(\underline{X})$, $T(\underline{X})$ is a function of $T'(\underline{X})$.
To say that $T(\underline{X})$ is a function of $T'(\underline{X})$ simply means that if $T'(\underline{x}) = T'(\underline{y})$, then $T(\underline{x}) = T(\underline{y})$. In terms of the partition sets, if $\{B_{t'} : t' \in \tau'\}$ are the partition sets for $T'(\underline{X})$ and $\{A_t : t \in \tau\}$ are the partition sets for $T(\underline{X})$, then the above definition of a minimal sufficient statistic states that every $B_{t'}$ is a subset of some $A_t$. Thus, the partition associated with a minimal sufficient statistic is the coarsest possible partition for a sufficient statistic, and a minimal sufficient statistic achieves the greatest possible data reduction for a sufficient statistic.
Example Let $X_i$ (i = 1, 2, …, n) be independent P(θ) variables. Then $T = \sum_{i=1}^{n} X_i$ is sufficient for θ and, in fact, it is minimal sufficient.
Since $T = \sum_{i=1}^{n} X_i$ is minimal sufficient, any further reduction of the data is not possible without losing sufficiency, i.e. there does not exist a (many-to-one) function U = ψ(T) such that U is sufficient. Suppose that T is sufficient and, if possible, there exists a function

$U = \psi(t) \ni \psi(t_1) = \cdots = \psi(t_k) = u.$

Then

$P_\theta[T = t\,|\,U = u] = \dfrac{(n\theta)^{t_i}/t_i!}{\sum_{i=1}^{k}(n\theta)^{t_i}/t_i!}$ if $t = t_i$ (i = 1, 2, …, k), and 0 otherwise,

which depends on θ,
so that U is not sufficient. Hence, $T = \sum_{i=1}^{n}X_i$ is a minimal sufficient statistic.
Remark 1 Since a minimal sufficient statistic is a function of every sufficient statistic, a minimal sufficient statistic is itself sufficient.
Remark 2 A minimal sufficient statistic is not unique, since any one-to-one function of a minimal sufficient statistic is also a minimal sufficient statistic.
The definition of a minimal sufficient statistic does not help us to find one, except for verifying whether a given statistic is minimal sufficient. Fortunately, the following result of Lehmann and Scheffé (1950) gives an easier way to find a minimal sufficient statistic.


Theorem Let $f(\underline{x}\,|\,\theta)$ be the p.m.f./p.d.f. of a sample $\underline{X}$. Suppose there exists a function $T(\underline{X})$ such that, for every two sample points $\underline{x}$ and $\underline{y}$, the ratio $f(\underline{x}\,|\,\theta)\big/f(\underline{y}\,|\,\theta)$ is constant as a function of θ (i.e. independent of θ) iff $T(\underline{x}) = T(\underline{y})$. Then $T(\underline{X})$ is a minimal sufficient statistic.


 
Proof Let us assume $f(\underline{x}\,|\,\theta) > 0$ for all $\underline{x} \in \mathcal{X}$ and θ. First, we show that $T(\underline{X})$ is a sufficient statistic. Let $\tau = \{t : t = T(\underline{x}),\ \underline{x} \in \mathcal{X}\}$ be the image of $\mathcal{X}$ under $T(\underline{x})$. Define the partition sets induced by $T(\underline{X})$ as $A_t = \{\underline{x} : T(\underline{x}) = t\}$. For each $A_t$, choose and fix one element $\underline{x}_t \in A_t$. For any $\underline{x} \in \mathcal{X}$, $\underline{x}_{T(\underline{x})}$ is the fixed element that is in the same set $A_t$ as $\underline{x}$. Since $\underline{x}$ and $\underline{x}_{T(\underline{x})}$ are in the same set $A_t$, $T(\underline{x}) = T(\underline{x}_{T(\underline{x})})$ and, hence, $f(\underline{x}\,|\,\theta)\big/f(\underline{x}_{T(\underline{x})}\,|\,\theta)$ is constant as a function of θ. Thus, we can define a function on $\mathcal{X}$ by $h(\underline{x}) = f(\underline{x}\,|\,\theta)\big/f(\underline{x}_{T(\underline{x})}\,|\,\theta)$, and h does not depend on θ.
Define a function on τ by $g(t\,|\,\theta) = f(\underline{x}_t\,|\,\theta)$. Then

$f(\underline{x}\,|\,\theta) = \dfrac{f(\underline{x}_{T(\underline{x})}\,|\,\theta)\,f(\underline{x}\,|\,\theta)}{f(\underline{x}_{T(\underline{x})}\,|\,\theta)} = g(t\,|\,\theta)\,h(\underline{x})$

and, by the factorization theorem, $T(\underline{X})$ is sufficient for θ. Now, to show that $T(\underline{X})$ is minimal, let $T'(\underline{X})$ be any other sufficient statistic. By the factorization theorem, there exist functions g′ and h′ such that

$f(\underline{x}\,|\,\theta) = g'(T'(\underline{x})\,|\,\theta)\,h'(\underline{x}).$

Let $\underline{x}$ and $\underline{y}$ be any two sample points with $T'(\underline{x}) = T'(\underline{y})$. Then

$\dfrac{f(\underline{x}\,|\,\theta)}{f(\underline{y}\,|\,\theta)} = \dfrac{g'(T'(\underline{x})\,|\,\theta)\,h'(\underline{x})}{g'(T'(\underline{y})\,|\,\theta)\,h'(\underline{y})} = \dfrac{h'(\underline{x})}{h'(\underline{y})}.$

Since this ratio does not depend on θ, the assumptions of the theorem imply $T(\underline{x}) = T(\underline{y})$. Thus $T(\underline{x})$ is a function of $T'(\underline{x})$ and $T(\underline{x})$ is minimal.  □
Example (Normal minimal sufficient statistic) Let $X_1, X_2, \ldots, X_n$ be i.i.d. $N(\mu, \sigma^2)$, both μ and σ² unknown. Let $\underline{x}$ and $\underline{y}$ denote two sample points, and let $(\bar{x}, s_x^2)$ and $(\bar{y}, s_y^2)$ be the sample means and variances corresponding to the $\underline{x}$ and $\underline{y}$ samples, respectively. Then we must have

$\dfrac{f(\underline{x}\,|\,\mu,\sigma^2)}{f(\underline{y}\,|\,\mu,\sigma^2)} = \dfrac{(2\pi\sigma^2)^{-n/2}\exp\{-[n(\bar{x}-\mu)^2 + (n-1)s_x^2]/(2\sigma^2)\}}{(2\pi\sigma^2)^{-n/2}\exp\{-[n(\bar{y}-\mu)^2 + (n-1)s_y^2]/(2\sigma^2)\}}$
$= \exp\{-[n(\bar{x}^2 - \bar{y}^2) - 2n\mu(\bar{x}-\bar{y}) + (n-1)(s_x^2 - s_y^2)]/(2\sigma^2)\}$

This ratio will be constant as a function of μ and σ² iff $\bar{x} = \bar{y}$ and $s_x^2 = s_y^2$, i.e. $(\bar{x}, s_x^2) = (\bar{y}, s_y^2)$. Then, by the above theorem, $(\bar{X}, s^2)$ is a minimal sufficient statistic for $(\mu, \sigma^2)$.
Remark Although minimal sufficiency ⇒ sufficiency, the converse is not necessarily true. For a random sample $X_1, X_2, \ldots, X_n$ from the N(μ, μ) distribution, $\left(\sum_{i=1}^{n}X_i, \sum_{i=1}^{n}X_i^2\right)$ is sufficient but not a minimal sufficient statistic. In fact, $\sum_{i=1}^{n}X_i$ and $\sum_{i=1}^{n}X_i^2$ are each singly sufficient for μ, $\sum_{i=1}^{n}X_i^2$ being minimal. (This particular example also establishes the fact that single sufficiency does not imply minimal sufficiency.)

1.3 Unbiased Estimator and Minimum-Variance Unbiased Estimator

Let X be a random variable having c.d.f. $F_\theta$, θ ∈ Θ. The functional form of $F_\theta$ is known, but the parameter θ is unknown. Here, we wish to find the true value of θ on the basis of the experimentally determined values x1, x2,…, xn, corresponding to a random sample X1, X2,…, Xn from $F_\theta$. Since the observed values x1, x2, …, xn change from one case to another, leading to different estimates in different cases, we cannot expect that the estimate in each case will be good in the sense of having small deviation from the true value of the parameter. So, we first choose an estimator T of θ such that the following condition holds:

$P\{|T - \theta| < c\} \ge P\{|T' - \theta| < c\} \quad \forall\,\theta\in\Theta \text{ and } \forall\, c \qquad (1.7)$

where T′ is any rival estimator.
Surely, (1.7) is an ideal condition, but the mathematical handling of (1.7) is very difficult. So we require some simpler condition. Such a condition is based on mean square error (m.s.e.). In this case, an estimator will be best if its m.s.e. is least. In other words, an estimator T will be best in the sense of m.s.e. if

$E(T-\theta)^2 \le E(T'-\theta)^2 \quad \forall\,\theta \text{ and for any rival estimator } T' \qquad (1.8)$

It can readily be shown that there exists no T for which (1.8) holds. [e.g. Let θ0 be a value of θ and consider T′ = θ0. Note that the m.s.e. of T′ at θ = θ0 is 0, but the m.s.e. of T′ for other values of θ may be quite large.]
To sidetrack this, we introduce the concept of unbiasedness.
Actually, we choose an estimator on the basis of a set of criteria. Such a set of criteria must depend on the purpose for which we want to choose an estimator. Usually, the set consists of the following criteria: (i) unbiasedness; (ii) minimum-variance unbiased estimation; (iii) consistency; and (iv) efficiency.
Unbiasedness
An estimator T is said to be an unbiased estimator (u.e.) of θ [or γ(θ)] iff $E(T) = \theta$ [or γ(θ)] ∀ θ ∈ Θ.
Otherwise, it will be called a biased estimator. The quantity b(θ, T) = Eθ(T) − θ is called the bias. A function γ(θ) is estimable if it has an unbiased estimator.
Let X1, X2,…, Xn be a random sample from a population with mean μ and variance σ². Then $\bar{X}$ and $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$ are u.e.'s of μ and σ², respectively.
Note
(i) Every individual observation is an unbiased estimator of the population mean.
(ii) Every partial mean is an unbiased estimator of the population mean.
(iii) Every partial sample variance [e.g. $\frac{1}{k-1}\sum_{1}^{k}(X_i - \bar{X}_k)^2$, $\bar{X}_k = \frac{1}{k}\sum_{1}^{k}X_i$ and k < n] is an unbiased estimator of σ².

Example 1.15 Let X1, X2,…, Xn be a random sample from N(μ, σ²). Then $\bar{X}$ and $s^2 = \frac{1}{n-1}\sum_{1}^{n}(X_i - \bar{X})^2$ are u.e.'s for μ and σ², respectively. But the estimator $s = \sqrt{\frac{1}{n-1}\sum_{1}^{n}(X_i - \bar{X})^2}$ is a biased estimator of σ.
The bias is $b(s, \sigma) = \sigma\left[\sqrt{\frac{2}{n-1}}\,\Gamma(n/2)\Big/\Gamma\!\left(\frac{n-1}{2}\right) - 1\right]$.
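A quick Monte Carlo check of the bias formula for s (an added illustrative sketch, not from the original text; NumPy and SciPy assumed):

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(1)
n, sigma, reps = 10, 2.0, 200_000
samples = rng.normal(0.0, sigma, size=(reps, n))
s = samples.std(axis=1, ddof=1)            # sample s.d. with divisor (n - 1)

# exact E(s) = sigma * sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2)
exact_Es = sigma * np.sqrt(2.0 / (n - 1)) * np.exp(gammaln(n / 2) - gammaln((n - 1) / 2))
print(s.mean(), exact_Es)                  # both fall below sigma = 2: s is biased downward
```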

Remark 1.9 An unbiased estimator may not exist.
Example (a) Let X ~ b(1, π), 0 < π < 1. Then there is no estimator T(X) for which $E\{T(X)\} = \pi^2 \;\forall\,\pi\in(0,1)$, i.e. π² is not estimable. Similarly, 1/π has no unbiased estimator.
(b) For $f(x;\theta) = \binom{m}{x}\binom{\theta-m}{n-x}\Big/\binom{\theta}{n}$, x = 0, 1, 2, …, n; θ = m, m + 1, …, there is no unbiased estimator for θ.
Remark 1.10 Usually, an unbiased estimator is not unique. Starting from two unbiased estimators, we can construct an infinite number of unbiased estimators.
Example Let X1, X2,…, Xn be a random sample from P(λ). Then both $\bar{X}$ and $s^2 = \frac{1}{n-1}\sum_{1}^{n}(X_i - \bar{X})^2$ are unbiased estimators of λ, as mean = variance = λ for P(λ).
Let $T_\alpha = \alpha\bar{X} + (1-\alpha)s^2$, 0 ≤ α ≤ 1. Here, $T_\alpha$ is an unbiased estimator of λ.
Remark 1.11 An unbiased estimator may be absurd.
Example Let X ~ P(λ). Then $T(X) = (-2)^X$ is an unbiased estimator of $e^{-3\lambda}$ since

$E\{T(X)\} = \sum_x (-2)^x\frac{e^{-\lambda}\lambda^x}{x!} = e^{-\lambda}\sum_x\frac{(-2\lambda)^x}{x!} = e^{-\lambda}\cdot e^{-2\lambda} = e^{-3\lambda}.$
Note that T(X) > 0 for even X and < 0 for odd X.
∴ T(X), which is an estimator of a positive quantity $e^{-3\lambda} > 0$, may occasionally be negative.
Example Let X ~ P(λ). Construct an unbiased estimator of $e^{-\lambda}$.
Ans Let $T(x) = 1$ if x = 0, and 0 otherwise.
$\Rightarrow E\{T(X)\} = 1\cdot P(X = 0) + 0\cdot P(X \ne 0) = e^{-\lambda} \;\forall\,\lambda.$
∴ T(X) is an unbiased estimator of $e^{-\lambda}$.

Remark 1.12 The mean square error of an unbiased estimator (i.e. the variance of the unbiased estimator) may be greater than that of a biased estimator, and then we prefer the biased estimator.

$E(T-\theta)^2 = E[T - E(T) + \{E(T)-\theta\}]^2 = V(T) + b^2(T,\theta), \text{ where } b(T,\theta) = E(T) - \theta.$

Let T1 be a biased estimator and T2 an unbiased estimator, i.e. E(T1) ≠ θ but E(T2) = θ.

$\Rightarrow \mathrm{MSE}(T_1) = V(T_1) + b^2(T_1,\theta), \quad \mathrm{MSE}(T_2) = V(T_2);$

if $V(T_2) > V(T_1) + b^2(T_1,\theta)$, then we prefer T1.
e.g. Let X1, X2,…, Xn be a random sample from N(μ, σ²). Then $s^2 = \frac{1}{n-1}\sum_i(X_i-\bar{X})^2$ is an unbiased estimator of σ². Clearly, $\frac{1}{n+1}\sum_i(X_i-\bar{X})^2 = \frac{n-1}{n+1}s^2$ is a biased estimator of σ².

As $\dfrac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1} \Rightarrow V\!\left(\dfrac{(n-1)s^2}{\sigma^2}\right) = 2(n-1)$
$\Rightarrow V(s^2) = \dfrac{2}{n-1}\sigma^4 = \text{MSE of } s^2.$


On the other hand, MSE of $\frac{n-1}{n+1}s^2 = V\!\left(\frac{n-1}{n+1}s^2\right) + \left\{E\!\left(\frac{n-1}{n+1}s^2\right) - \sigma^2\right\}^2$

$= \left(\frac{n-1}{n+1}\right)^2\frac{2\sigma^4}{n-1} + \frac{4\sigma^4}{(n+1)^2} = \frac{2\sigma^4}{(n+1)^2}(n-1+2) = \frac{2\sigma^4}{n+1} < \frac{2\sigma^4}{n-1}$

⇒ MSE of s² > MSE of $\frac{n-1}{n+1}s^2$, i.e. MSE (unbiased estimator) > MSE (biased estimator).
Remark 1.13: Pooling of information Let Ti be an unbiased estimator of θ obtained from the ith source, i = 1, 2, …, k. Suppose the Ti's are independent and $V(T_i) = \sigma_i^2 < \sigma^2$ ∀ i. Then $\bar{T}_k = \frac{1}{k}(T_1 + T_2 + \cdots + T_k)$ is also an unbiased estimator of θ with $V(\bar{T}_k) = \frac{1}{k^2}\sum_{1}^{k}\sigma_i^2 < \frac{\sigma^2}{k} \to 0$ as k → ∞.
The implication of this statement is that $\bar{T}_k$ gets closer and closer to the true value of the parameter as k → ∞ (k becomes larger and larger).
On the other hand, if the Ti's are biased estimators with common bias β, then $\bar{T}_k$ approaches the wrong value θ + β instead of the true value θ even if k → ∞.
Problem 1.1 Let X1, X2, …, Xn be a random sample from b(1, π). Show that
(i) $\dfrac{X(X-1)}{n(n-1)}$ is an unbiased estimator of π²,
(ii) $\dfrac{X(n-X)}{n(n-1)}$ is an unbiased estimator of π(1 − π),
where X = number of successes in n trials = $\sum_{i=1}^{n}X_i$.
Minimum-Variance Unbiased Estimator (MVUE)
Let U be the set of all u.e.'s (T) of θ with $E(T^2) < \infty$ ∀ θ ∈ Θ; then an estimator T0 ∈ U will be called a minimum-variance unbiased estimator (MVUE) of θ {or γ(θ)} if V(T0) ≤ V(T) ∀ θ and for every T ∈ U.
Result 1.1 Let U be the set of all u.e.'s (T) of θ with E(T²) < ∞, ∀ θ ∈ Θ. Furthermore, let U0 be the class of all u.e.'s (v) of '0' {zero} with E(v²) < ∞ ∀ θ, i.e. U0 = {v : E(v) = 0 ∀ θ and E(v²) < ∞}. Then an estimator T0 ∈ U will be an MVUE of θ iff

$\mathrm{Cov}(T_0, v) = E(T_0v) = 0 \quad \forall\,\theta,\ \forall\, v \in U_0.$

Proof (Only if part) Given that T0 is an MVUE of θ, we have to prove that

$E(T_0v) = 0 \quad \forall\,\theta,\ \forall\, v \in U_0 \qquad (1.9)$

Suppose the statement (1.9) is wrong; then E(T0v) ≠ 0 for some θ0 and for some v0 ∈ U0.
Note that for every real λ, T0 + λv0 is an u.e. of θ. Again, T0 + λv0 ∈ U, as E(T0 + λv0)² < ∞.
Now $V_{\theta_0}(T_0 + \lambda v_0) = V_{\theta_0}(T_0) + \lambda^2E_{\theta_0}(v_0^2) + 2\lambda E_{\theta_0}(v_0T_0)$.
Choose the particular value $\lambda = -\dfrac{E_{\theta_0}(T_0v_0)}{E_{\theta_0}(v_0^2)}$, assuming $E_{\theta_0}(v_0^2) > 0$.
(If $E_{\theta_0}(v_0^2) = 0$, then $P_{\theta_0}(v_0 = 0) = 1$, and hence $E_{\theta_0}(T_0v_0) = 0$.)
We have $V_{\theta_0}(T_0 + \lambda v_0) = V_{\theta_0}(T_0) - \dfrac{E^2_{\theta_0}(T_0v_0)}{E_{\theta_0}(v_0^2)} < V_{\theta_0}(T_0)$, which contradicts the fact that T0 is a minimum-variance unbiased estimator of θ.
(If part) It is given that Cov(T0, v) = 0 ∀ θ, ∀ v ∈ U0. We have to prove that T0 is an MVUE of θ. Let T be an estimator belonging to U; then (T0 − T) ∈ U0.
∴ From the given condition, Cov(T0, T0 − T) = 0

$\Rightarrow V(T_0) - \mathrm{Cov}(T_0, T) = 0 \Rightarrow \mathrm{Cov}(T_0, T) = V(T_0) \qquad (1.10)$

Now, $V(T_0 - T) \ge 0$
$\Rightarrow V(T_0) + V(T) - 2\,\mathrm{Cov}(T_0, T) \ge 0$
$\Rightarrow V(T_0) + V(T) - 2V(T_0) \ge 0$ (by (1.10))
$\Rightarrow V(T) \ge V(T_0) \quad \forall\,\theta\in\Theta.$
Since T is an arbitrary member of U, the result follows.  □


Result 1.2 A minimum-variance unbiased estimator is unique.
Proof Suppose T1 and T2 are MVUEs of θ. Then

$E\{T_1(T_1 - T_2)\} = 0 \quad \text{(from Result 1.1)}$
$\Rightarrow E(T_1^2) = E(T_1T_2) \Rightarrow \rho_{T_1T_2} = 1$

as V(T1) = V(T2) ∀ θ ⇒ T1 = βT2 + α with probability 1.
Now $V(T_1) = \beta^2V(T_2)$ ∀ θ ⇒ β² = 1 ⇒ β = 1 (as $\rho_{T_1T_2} = 1$).
Again $E(T_1) = \beta E(T_2) + \alpha$ ∀ θ ⇒ α = 0, as E(T1) = E(T2) = θ and β = 1 ⇒ P(T1 = T2) = 1.  □
Remark The correlation coefficient between T1 and T2 (where T1 is an MVUE of θ and T2 is any unbiased estimator of θ) is always non-negative:

$E\{T_1(T_1 - T_2)\} = 0 \quad \text{(from Result 1.1)} \Rightarrow \mathrm{Cov}(T_1, T_2) = V(T_1) \ge 0.$
Result 1.3 Let T1 be an MVUE of γ1(θ) and T2 be an MVUE of γ2(θ). Then αT1 + βT2 will be an MVUE of αγ1(θ) + βγ2(θ).
Proof Let v be an u.e. of zero. Then $E(T_1v) = 0 = E(T_2v)$. Now

$E\{(\alpha T_1 + \beta T_2)v\} = \alpha E(T_1v) + \beta E(T_2v) = 0$
$\Rightarrow (\alpha T_1 + \beta T_2)$ is an MVUE of $\alpha\gamma_1(\theta) + \beta\gamma_2(\theta)$.  □
Result 1.4: (Rao–Cramer inequality) Let X1, X2,…, Xn be a random sample from a population having p.d.f. f(x; θ), θ ∈ Θ. Assume that Θ is a non-degenerate open interval on the real line. Let T be an unbiased estimator of γ(θ). Again assume that the joint p.d.f. $f(\underline{x};\theta) = \prod_{i=1}^{n}f(x_i;\theta)$ of $\underline{X} = (X_1, X_2, \ldots, X_n)$ satisfies the following regularity conditions:

(a) $\frac{\partial f(\underline{x};\theta)}{\partial\theta}$ exists;
(b) $\frac{\partial}{\partial\theta}\int f(\underline{x};\theta)\,d\underline{x} = \int\frac{\partial f(\underline{x};\theta)}{\partial\theta}\,d\underline{x}$;
(c) $\frac{\partial}{\partial\theta}\int T(\underline{x})f(\underline{x};\theta)\,d\underline{x} = \int T(\underline{x})\frac{\partial f(\underline{x};\theta)}{\partial\theta}\,d\underline{x}$; and
(d) 0 < I(θ) < ∞, where $I(\theta) = E\left[\frac{\partial\log f(\underline{X};\theta)}{\partial\theta}\right]^2$ is the information on θ supplied by the sample of size n.

Then

$V(T) \ge \frac{\{\gamma'(\theta)\}^2}{I(\theta)} \quad \forall\,\theta.$

Proof Since $1 = \int_{R^n}f(\underline{x};\theta)\,d\underline{x}$, we have from condition (b)

$0 = \int_{R^n}\frac{\partial f(\underline{x};\theta)}{\partial\theta}\,d\underline{x} = \int_{R^n}\frac{\partial\log f(\underline{x};\theta)}{\partial\theta}f(\underline{x};\theta)\,d\underline{x} \qquad (1.11)$
Again, since $T(\underline{X})$ is an u.e. of γ(θ), we have from condition (c)

$\gamma'(\theta) = \int_{R^n}T(\underline{x})\frac{\partial\log f(\underline{x};\theta)}{\partial\theta}f(\underline{x};\theta)\,d\underline{x} \qquad (1.12)$

Now, (1.12) − (1.11) × γ(θ) gives us

$\gamma'(\theta) = \int_{R^n}[T(\underline{x}) - \gamma(\theta)]\frac{\partial\log f(\underline{x};\theta)}{\partial\theta}f(\underline{x};\theta)\,d\underline{x} = \mathrm{Cov}\!\left(T(\underline{X}), \frac{\partial\log f(\underline{X};\theta)}{\partial\theta}\right).$

From the result $[\mathrm{Cov}(X, Y)]^2 \le V(X)V(Y)$, we have

$\{\gamma'(\theta)\}^2 = \left[\mathrm{Cov}\!\left(T(\underline{X}), \frac{\partial\log f(\underline{X};\theta)}{\partial\theta}\right)\right]^2 \le V\{T(\underline{X})\}\,V\!\left\{\frac{\partial\log f(\underline{X};\theta)}{\partial\theta}\right\}$
$= V\{T(\underline{X})\}\,E\!\left(\frac{\partial\log f(\underline{X};\theta)}{\partial\theta}\right)^2 \quad \left(\text{since from (1.11), } E\,\frac{\partial\log f(\underline{X};\theta)}{\partial\theta} = 0\right)$
$= V\{T(\underline{X})\}\,I(\theta)$
$\Rightarrow V(T) \ge \frac{\{\gamma'(\theta)\}^2}{I(\theta)}, \quad \forall\,\theta. \qquad \square$

Remark 1 If the variables are of discrete type, the underlying conditions and the proof of the Cramer–Rao inequality are similar, only the multiple integrals being replaced by multiple sums.
Remark 2 For any estimator T having expectation γ(θ) = θ + B(T, θ),

$\mathrm{MSE} = E(T-\theta)^2 = V(T) + B^2(T,\theta) \ge \frac{\{\gamma'(\theta)\}^2}{I(\theta)} + B^2(T,\theta) = \frac{[1 + B'(T,\theta)]^2}{I(\theta)} + B^2(T,\theta).$
Remark 3 Assuming that $f(\underline{x};\theta)$ is differentiable not only once but also twice, we have

$0 = \int_{R^n}\frac{\partial^2\log f(\underline{x};\theta)}{\partial\theta^2}f(\underline{x};\theta)\,d\underline{x} + \int_{R^n}\left\{\frac{\partial\log f(\underline{x};\theta)}{\partial\theta}\right\}^2 f(\underline{x};\theta)\,d\underline{x}$
$\Rightarrow I(\theta) = -E\left\{\frac{\partial^2\log f(\underline{X};\theta)}{\partial\theta^2}\right\}.$

Remark 4 Since X1, X2,…, Xn are i.i.d. random variables,

$I(\theta) = -nE\left\{\frac{\partial^2\log f(X;\theta)}{\partial\theta^2}\right\},$

where f(x; θ) here denotes the common density of a single observation.
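A numerical check of the bound (an added illustrative sketch, not part of the text): for a Poisson(λ) sample, $I(\lambda) = n/\lambda$ and γ(λ) = λ, so the Cramer–Rao lower bound λ/n is attained by the sample mean.

```python
import numpy as np

rng = np.random.default_rng(3)
lam, n, reps = 2.5, 20, 100_000
xbar = rng.poisson(lam, size=(reps, n)).mean(axis=1)

# For P(lambda): I(lambda) = n/lambda and gamma(lambda) = lambda, so the bound is lambda/n.
crlb = lam / n
print(xbar.var(), crlb)    # the simulated variance of the sample mean attains the bound
```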

Remark 5 An estimator T for which the Cramer–Rao lower bound is attained is often called a minimum-variance bound estimator (MVBE). In this case, we have

$\frac{\partial\log f(\underline{x};\theta)}{\partial\theta} = k(\theta)\{T - \gamma(\theta)\}.$

Note that every MVBE is an MVUE, but the converse may not be true.
Remark 6 Distributions admitting an MVUE
A distribution having an MVUE of γ(θ) attaining the bound must satisfy $\frac{\partial\log f(x;\theta)}{\partial\theta} = k(\theta)\{T - \gamma(\theta)\}$. This is a differential equation, so

$\log f(x;\theta) = T\int k(\theta)\,d\theta - \int k(\theta)\gamma(\theta)\,d\theta + c(x)$
$\Rightarrow f(x;\theta) = A\,e^{T\theta_1 + \theta_2}$

where $\theta_i$, i = 1, 2, are functions of θ and $A = e^{c(x)}$.
Note If T is a sufficient statistic for θ, then

$L = g(T, \theta)\,h(x_1, x_2, \ldots, x_n)$
or, $\frac{\partial\log L}{\partial\theta} = \frac{\partial\log g(T,\theta)}{\partial\theta} \qquad (1.13)$

which is a function of T and θ.
Now the condition that T be an MVB unbiased estimator of θ is that $\phi = \frac{\partial\log L}{\partial\theta} = B(T - \theta)$, which is a linear function of T and θ, and V(T) = 1/B. Thus if there exists an MVB unbiased estimator of θ, it is also sufficient. The converse is not

necessarily true. Equation (1.13) may be a non-linear functions of T and h in which


case T is not an MVB unbiased estimator of h. Thus the existence of MVB unbiased
estimator implies the existence of a sufficient estimator, but the existence of suf-
ficient statistic does not necessarily imply the existence of an MVB unbiased
estimator. It also follows that the distribution possessing an MVB unbiased esti-
mator for its parameter can be expressed in Koopman form. Thus, when
0
L ¼ eA T þ BðhÞ þ Rðx1 ;x2 ;...xn Þ , T is an MVB unbiased estimator of h with variance
0
1=ð@A
@h Þ, which is also MVB.

Example $x_1, x_2, \ldots, x_n$ is a random sample from N(μ, 1).
Here,

$L = \left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac{1}{2}\sum_{i=1}^{n}(x_i-\mu)^2} = e^{-\frac{1}{2}\sum_{i=1}^{n}(x_i-\bar{x})^2 - \frac{n}{2}\bar{x}^2 - \frac{n}{2}\mu^2 + n\mu\bar{x} - \frac{n}{2}\log 2\pi}.$

Take $A'T = n\mu\bar{x}$, where $T = \bar{x}$. $\mathrm{MVB} = 1\big/\frac{\partial(n\mu)}{\partial\mu} = \frac{1}{n}$.
Example $x_1, x_2, \ldots, x_n$ is a random sample from the b(1, π) distribution.
Here,

$L = \pi^{\sum_i x_i}(1-\pi)^{n-\sum_i x_i} = e^{\sum_i x_i\log\pi + (n-\sum_i x_i)\log(1-\pi)} = e^{n\bar{x}\log\frac{\pi}{1-\pi} + n\log(1-\pi)}.$

Take $A'T = n\bar{x}\log\frac{\pi}{1-\pi}$, where $T = \bar{x} = \frac{k}{n}$, k = number of successes in n trials.

$\mathrm{MVB} = 1\Big/\frac{\partial\left(n\log\frac{\pi}{1-\pi}\right)}{\partial\pi} = \frac{\pi(1-\pi)}{n}.$
Remark 7 A necessary condition for the regularity conditions to hold is that the domain of positive p.d.f. must be free from θ.
Example 1.16 Let X ~ U[0, θ]. Formally computing $nE\left(\frac{\partial\log f(x;\theta)}{\partial\theta}\right)^2$ gives $\frac{n}{\theta^2}$, so the (apparent) Cramer–Rao lower bound for the variance of an unbiased estimator of θ is $\frac{\theta^2}{n}$.
Now, we consider an estimator $T = \frac{n+1}{n}X_{(n)}$, $X_{(n)} = \max(X_1, X_2, \ldots, X_n)$. The p.d.f. of $X_{(n)}$ is $f_{X_{(n)}}(x) = n\left(\frac{x}{\theta}\right)^{n-1}\frac{1}{\theta}$, 0 ≤ x ≤ θ.

$\therefore E(X_{(n)}) = \frac{n}{\theta^n}\int_0^\theta x^n\,dx = \frac{n}{\theta^n}\cdot\frac{\theta^{n+1}}{n+1} = \frac{n}{n+1}\theta.$

$\therefore T = \frac{n+1}{n}X_{(n)}$ is an unbiased estimator of θ. It can be shown that $V(T) = \frac{\theta^2}{n(n+2)} < \frac{\theta^2}{n}$.
This is not surprising because the regularity conditions do not hold here. Actually, here $\frac{\partial f(x;\theta)}{\partial\theta}$ exists for θ ≠ x but not for θ = x, since

$f(x;\theta) = \frac{1}{\theta}$ if $\theta \ge x$, and $= 0$ if $\theta < x$.
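The following sketch (added for illustration, not from the text) verifies by simulation that T = ((n+1)/n)·X(n) is unbiased for θ with variance below the apparent bound θ²/n:

```python
import numpy as np

rng = np.random.default_rng(5)
theta, n, reps = 3.0, 10, 200_000
x = rng.uniform(0.0, theta, size=(reps, n))
t = (n + 1) / n * x.max(axis=1)              # unbiased estimator based on the maximum

print(t.mean())                               # close to theta
print(t.var(), theta**2 / (n * (n + 2)), theta**2 / n)
# observed variance ~ theta^2/(n(n+2)), well below the "apparent" bound theta^2/n
```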

Result 1.5: Rao–Blackwell Theorem Let $\{F_\theta; \theta\in\Theta\}$ be a family of distribution functions and h be any estimator of γ(θ) in U, the class of unbiased estimators (h) with $E_\theta(h^2) < \infty$ ∀ θ. Let T be a sufficient statistic for the family $\{F_\theta; \theta\in\Theta\}$. Then E(h|T) is free from θ and will be an unbiased estimator of γ(θ). Moreover,

$V\{E(h|T)\} \le V(h) \quad \forall\,\theta\in\Theta.$

The equality sign holds iff h = E(h|T) with probability 1.
Proof Since T is sufficient for the family $\{F_\theta; \theta\in\Theta\}$, the conditional distribution of h given T must be independent of θ. ∴ E(h|T) will be free from θ.

Now, $\gamma(\theta) = E(h) = E_T\{E(h|T)\}$ ∀ θ, i.e. $\gamma(\theta) = E\{E(h|T)\}$ ∀ θ.

∴ E(h|T) is an unbiased estimator of γ(θ).
Again we know that $V(h) = V\{E(h|T)\} + E\{V(h|T)\}$
$\Rightarrow V(h) \ge V\{E(h|T)\}$ (since V(h|T) ≥ 0).
'=' holds iff V(h|T) = 0, i.e. iff h = E(h|T) with probability 1, where $V(h|T) = E_{h|T}\big[\{h - E(h|T)\}^2\big]$.  □
Result 1.6: Lehmann–Scheffé Theorem If T is a complete sufficient statistic for θ and if h is an unbiased estimator of γ(θ), then E(h|T) will be an MVUE of γ(θ).
Proof Let $h_1, h_2 \in U = \{h : E(h) = \gamma(\theta), E(h^2) < \infty\}$.
Then $E\{E(h_1|T)\} = \gamma(\theta) = E\{E(h_2|T)\}$ (from Result 1.5).
Hence, $E\{E(h_1|T) - E(h_2|T)\} = 0$ ∀ θ
$\Rightarrow P\{E(h_1|T) = E(h_2|T)\} = 1$ (∵ T is complete).
∴ E(h|T) is unique for any h ∈ U.
Again, applying Result 1.5, we have $V\{E(h|T)\} \le V(h)$ ∀ h ∈ U. Now, since E(h|T) is unique, it will be an MVUE of γ(θ).  □
Remark 1.14 The implication of Result 1.5 is that if we are given an unbiased estimator h, then we can improve upon h by forming the new estimator E(h|T) based on h and the sufficient statistic T. This process of finding an improved estimator starting from an unbiased estimator has been called Blackwellization.
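An illustrative sketch of Blackwellization (the standard Poisson example, added here and not taken verbatim from the text): start from the crude unbiased estimator $1\{X_1 = 0\}$ of $e^{-\lambda}$ and condition on the sufficient statistic $T = \sum X_i$. Since $X_1\,|\,T = t \sim \mathrm{Bin}(t, 1/n)$, the improved estimator is $\left(\frac{n-1}{n}\right)^T$, and simulation shows the drop in variance with unbiasedness preserved.

```python
import numpy as np

rng = np.random.default_rng(11)
lam, n, reps = 1.5, 15, 200_000
x = rng.poisson(lam, size=(reps, n))

crude = (x[:, 0] == 0).astype(float)          # unbiased: E 1{X1 = 0} = exp(-lambda)
t = x.sum(axis=1)
blackwellized = ((n - 1) / n) ** t            # E[1{X1 = 0} | sum Xi = t] = ((n-1)/n)^t

print(np.exp(-lam), crude.mean(), blackwellized.mean())   # all agree (unbiasedness kept)
print(crude.var(), blackwellized.var())                   # the variance drops sharply
```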
Problem 1.2 Let $X_1, X_2, \ldots, X_n$ be a random sample from N(μ, σ²), σ² known. Let γ(μ) = μ².
(a) Show that the variance of any unbiased estimator of μ² cannot be less than $\frac{4\mu^2\sigma^2}{n}$.
(b) Show that $T = \bar{X}^2 - \frac{\sigma^2}{n}$ is an MVUE of μ² with variance $\frac{4\mu^2\sigma^2}{n} + \frac{2\sigma^4}{n^2}$.

Example 1.17 Let X ~ P(λ). Show that $d(x) = 1$ if X = 0, and 0 otherwise, is the only unbiased estimator of $\gamma(\lambda) = e^{-\lambda}$. Is it an MVUE of $e^{-\lambda}$?
Answer
Let h(x) be an unbiased estimator of $e^{-\lambda} = \theta$, say. Then $E\{h(X)\} = \theta$ ∀ θ

$\Rightarrow \sum_{x=0}^{\infty}h(x)\frac{\left(\log_e\frac{1}{\theta}\right)^x}{x!}\,\theta = \theta \quad \forall\,\theta$
$\Rightarrow h(x) = 1$ if x = 0, and 0 if x ≠ 0,

⇒ h(x), i.e. d(x), is the only unbiased estimator of $e^{-\lambda}$.
Here, the unbiased estimator of $e^{-\lambda}$ is unique and its variance exists. Therefore, d(x) will be an MVUE of γ(λ) = $e^{-\lambda}$:

$E\{h(X)\}^2 = 1\cdot P(X = 0) + 0\cdot\sum_{i=1}^{\infty}P(X = i) = e^{-\lambda} < \infty.$
Remark 1.15 An MVUE may not be very sensible.
Example Let $X_1, X_2, \ldots, X_n$ be a random sample from N(μ, 1); then $T = \bar{X}^2 - \frac{1}{n}$ is an MVUE of μ². Note that $\bar{X}^2 - \frac{1}{n}$ may occasionally be negative, so that an MVUE of μ² is not very sensible in this case.
Remark 1.16 An MVUE may not exist even though an unbiased estimator does exist.
Example Let $P\{X = -1\} = \theta$ and $P\{X = n\} = (1-\theta)^2\theta^n$, n = 0, 1, 2, …; 0 < θ < 1.
No MVUE of θ exists even though an unbiased estimator of θ exists, e.g. $T(X) = 1$ if X = −1, and 0 otherwise.
Bhattacharya system of Lower Bounds (Sankhyā A (1946))
(Generalization of the Cramer–Rao lower bound)
Regularity conditions
A family of distributions $P = \{f_\theta(x); \theta\in\Omega\}$ is said to satisfy the Bhattacharya regularity conditions if
1. θ lies in an open interval Ω of the real line R; Ω may be infinite;
2. $\frac{\partial^i}{\partial\theta^i}f_\theta(x)$ exists for almost all x and ∀ θ, i = 1, 2, …, k;
3. $\frac{\partial^i}{\partial\theta^i}\int f_\theta(x)\,dx = \int\frac{\partial^i}{\partial\theta^i}f_\theta(x)\,dx$ ∀ θ, i = 1, 2, …, k; and
4. $V_{k\times k}(\theta) = \left(m_{ij}(\theta)\right)$, i, j = 1, 2, …, k, exists and is positive definite ∀ θ, where

$m_{ij}(\theta) = E_\theta\left[\frac{1}{f_\theta(x)}\frac{\partial^i}{\partial\theta^i}f_\theta(x)\cdot\frac{1}{f_\theta(x)}\frac{\partial^j}{\partial\theta^j}f_\theta(x)\right].$

For i = 1, the Bhattacharya regularity conditions ≡ the Cramer–Rao regularity conditions.
Theorem 1.1 Let $P = \{f_\theta(x); \theta\in\Omega\}$ be a family of distributions satisfying the above-mentioned regularity conditions and g(θ) be a real-valued, estimable, and k-times differentiable function of θ. Let T be an unbiased estimator of g(θ) satisfying
5. $\frac{\partial^i}{\partial\theta^i}\int t(x)f_\theta(x)\,dx = \int t(x)\frac{\partial^i}{\partial\theta^i}f_\theta(x)\,dx.$
Then

$\mathrm{Var}_\theta(T) \ge g'V^{-1}g \quad \forall\,\theta$

where $g' = \left(g^{(1)}(\theta), g^{(2)}(\theta), \ldots, g^{(k)}(\theta)\right)$, $g^{(i)}(\theta) = \frac{\partial^i}{\partial\theta^i}g(\theta)$.
Proof
Define $\beta_i(x,\theta) = \frac{1}{f_\theta(x)}\frac{\partial^i}{\partial\theta^i}f_\theta(x)$. Then

$E[\beta_i(x,\theta)] = \int\frac{1}{f_\theta(x)}\frac{\partial^i}{\partial\theta^i}f_\theta(x)\cdot f_\theta(x)\,dx = 0$
$V[\beta_i(x,\theta)] = E[\beta_i(x,\theta)]^2 = E\left[\frac{1}{f_\theta(x)}\frac{\partial^i}{\partial\theta^i}f_\theta(x)\right]^2 = m_{ii}(\theta)$
$\mathrm{Cov}(\beta_i, \beta_j) = m_{ij}(\theta)$
$\mathrm{Cov}(T, \beta_i) = \int t(x)\frac{1}{f_\theta(x)}\frac{\partial^i}{\partial\theta^i}f_\theta(x)\cdot f_\theta(x)\,dx = \int t(x)\frac{\partial^i}{\partial\theta^i}f_\theta(x)\,dx = g^{(i)}(\theta).$

Let

$\Sigma_{(k+1)\times(k+1)} = \mathrm{Disp}(T, \beta_1, \beta_2, \ldots, \beta_k) = \begin{pmatrix} V_\theta(T) & g' \\ g & V \end{pmatrix}$

where $V = ((m_{ij}))$ is the dispersion matrix of $(\beta_1, \beta_2, \ldots, \beta_k)$. Then

$|\Sigma| = |V|\left(V_\theta(T) - g'V^{-1}g\right),$

and as $|\Sigma| \ge 0$, $|V| > 0$,

$V_\theta(T) - g'V^{-1}g \ge 0, \quad \text{i.e. } V_\theta(T) \ge g'V^{-1}g \;\forall\,\theta.$

Corollary For k = 1, $V_\theta(T) \ge \frac{\{g^{(1)}(\theta)\}^2}{m_{11}(\theta)} = \frac{\{g'(\theta)\}^2}{I(\theta)}$ = the Cramer–Rao lower bound.  □

The case of equality holds when $|\Sigma| = 0$
$\Rightarrow R(\Sigma) < k + 1$, i.e. $R(\Sigma) \le k$; since R(V) = k and $R(\Sigma) \ge R(V)$, we get $R(\Sigma) = k$, where R(·) denotes the rank of a matrix.
Lemma 1.1 Let $\underline{X} = (x_1, x_2, \ldots, x_p)'$, $D(\underline{X}) = \Sigma_{p\times p}$. $\Sigma$ is of rank r (≤ p) iff $x_1, x_2, \ldots, x_p$ satisfy (p − r) linear restrictions of the form

$a_{11}\{x_1 - E(x_1)\} + a_{12}\{x_2 - E(x_2)\} + \cdots + a_{1p}\{x_p - E(x_p)\} = 0$
$a_{21}\{x_1 - E(x_1)\} + a_{22}\{x_2 - E(x_2)\} + \cdots + a_{2p}\{x_p - E(x_p)\} = 0$
⋮
$a_{p-r,1}\{x_1 - E(x_1)\} + a_{p-r,2}\{x_2 - E(x_2)\} + \cdots + a_{p-r,p}\{x_p - E(x_p)\} = 0$

with probability 1.
Put p = k + 1, r = k; $x_1 = T$, $x_2 = \beta_1, \ldots, x_p = \beta_k$. Then R(Σ) = k iff T, β1, β2, …, βk satisfy one restriction with probability 1 of the form

$a_1\{T - E(T)\} + a_2\{\beta_1 - E(\beta_1)\} + \cdots + a_{k+1}\{\beta_k - E(\beta_k)\} = 0$
$\Rightarrow a_1\{T - g(\theta)\} + a_2\beta_1 + \cdots + a_{k+1}\beta_k = 0$
$\Rightarrow T - g(\theta) = b_1\beta_1 + b_2\beta_2 + \cdots + b_k\beta_k = \underline{b}'\underline{\beta}$

where $\underline{b}' = (b_1, b_2, \ldots, b_k)$ and $\underline{\beta} = (\beta_1, \beta_2, \ldots, \beta_k)'$.

Result
$T - g(\theta) = \underline{b}'\underline{\beta}$ with probability 1 $\Rightarrow T - g(\theta) = g'V^{-1}\underline{\beta}$ with probability 1.
Proof

$T - g(\theta) = \underline{b}'\underline{\beta} \Rightarrow V_\theta(T) = g'V^{-1}g.$

Consider $V_\theta(\underline{b}'\underline{\beta} - g'V^{-1}\underline{\beta}) = V_\theta(T - g'V^{-1}\underline{\beta})$
$= V_\theta(T) + g'V^{-1}V(\underline{\beta})V^{-1}g - 2g'V^{-1}\mathrm{Cov}(T, \underline{\beta})$
$= g'V^{-1}g + g'V^{-1}g - 2g'V^{-1}g = 0 \Rightarrow \underline{b}'\underline{\beta} = g'V^{-1}\underline{\beta}$ with probability 1.  □

A series of lower bounds: $g'V^{-1}g = (g^{(1)}, g^{(2)}, \ldots, g^{(k)})\,V^{-1}\,(g^{(1)}, g^{(2)}, \ldots, g^{(k)})'$ gives the

nth lower bound $= g'_{(n)}V_n^{-1}g_{(n)} = \Delta_n$, n = 1, 2, …, k.
Theorem 1.2 The sequence {Δn} is a non-decreasing sequence, i.e. $\Delta_{n+1} \ge \Delta_n$ ∀ n.
Proof The (n + 1)th lower bound is $\Delta_{n+1} = g'_{n+1}V^{-1}_{n+1}g_{n+1}$, where

$g'_{n+1} = \left(g^{(1)}(\theta), g^{(2)}(\theta), \ldots, g^{(n)}(\theta), g^{(n+1)}(\theta)\right) = \left(g'_n,\; g^{(n+1)}\right)$

$V_{n+1} = \begin{pmatrix} V_n & m_n \\ m'_n & m_{n+1,n+1} \end{pmatrix}, \quad \text{where } m'_n = (m_{n+1,1}, m_{n+1,2}, \ldots, m_{n+1,n}).$

Now $\Delta_{n+1} = g'_{n+1}V^{-1}_{n+1}g_{n+1} = (Cg_{n+1})'(CV_{n+1}C')^{-1}(Cg_{n+1})$ for any non-singular matrix $C_{(n+1)\times(n+1)}$.
Choose

$C = \begin{pmatrix} I_n & 0 \\ -m'_nV^{-1}_n & 1 \end{pmatrix}.$

Then

$Cg_{n+1} = \begin{pmatrix} g_n \\ g^{(n+1)} - m'_nV^{-1}_ng_n \end{pmatrix}, \quad CV_{n+1}C' = \begin{pmatrix} V_n & 0 \\ 0 & m_{n+1,n+1} - m'_nV^{-1}_nm_n \end{pmatrix} = \begin{pmatrix} V_n & 0 \\ 0 & E_{n+1,n+1} \end{pmatrix}.$

Since $V_{n+1}$ is positive definite, $CV_{n+1}C'$ is also positive definite, so $E_{n+1,n+1} > 0$ and

$(CV_{n+1}C')^{-1} = \begin{pmatrix} V^{-1}_n & 0 \\ 0 & E^{-1}_{n+1,n+1} \end{pmatrix}.$

Then

$\Delta_{n+1} = g'_nV^{-1}_ng_n + \frac{\left(g^{(n+1)} - m'_nV^{-1}_ng_n\right)^2}{E_{n+1,n+1}} \ge g'_nV^{-1}_ng_n = \Delta_n,$

i.e. $\Delta_{n+1} \ge \Delta_n$.
If there exists no unbiased estimator T of g(θ) for which V(T) attains the nth Bhattacharya lower bound (BLB), then one can try to find a sharper lower bound by considering the (n + 1)th BLB. In case the lower bound is attained at the nth stage, then $\Delta_{n+1} = \Delta_n$. However, $\Delta_{n+1} = \Delta_n$ does not imply that the lower bound is attained at the nth stage.
Example 1.18 $X_1, X_2, \ldots, X_n$ is a random sample from the N(θ, 1) distribution;

$f_\theta(\underline{x}) = \text{Const.}\,e^{-\frac{1}{2}\sum(x_i-\theta)^2}, \quad g(\theta) = \theta^2.$

$\bar{X} \sim N\!\left(\theta, \frac{1}{n}\right)$, i.e. $E(\bar{X}) = \theta$, $V(\bar{X}) = \frac{1}{n}$
$\Rightarrow E(\bar{X}^2) - E^2(\bar{X}) = \frac{1}{n} \Rightarrow E(\bar{X}^2) = \theta^2 + \frac{1}{n}$
$\Rightarrow E\!\left(\bar{X}^2 - \frac{1}{n}\right) = \theta^2; \quad T = \bar{X}^2 - \frac{1}{n}.$

$\frac{\partial}{\partial\theta}f_\theta(\underline{x}) = \text{Const.}\,e^{-\frac{1}{2}\sum(x_i-\theta)^2}\cdot\sum(x_i-\theta)$
$\beta_1 = \frac{1}{f_\theta(\underline{x})}\frac{\partial}{\partial\theta}f_\theta(\underline{x}) = \sum(x_i-\theta)$
$\beta_2 = \frac{1}{f_\theta(\underline{x})}\frac{\partial^2}{\partial\theta^2}f_\theta(\underline{x}) = \left\{\sum(x_i-\theta)\right\}^2 - n$

$E(\beta_1) = 0,\quad E(\beta_2) = 0,\quad E(\beta_1^2) = n,$
$E(\beta_1\beta_2) = E\left\{\sum(x_i-\theta)\right\}^3 - nE\left\{\sum(x_i-\theta)\right\} = 0$
$E(\beta_2^2) = E\left\{\sum(x_i-\theta)\right\}^4 + n^2 - 2nE\left\{\sum(x_i-\theta)\right\}^2 = 3n^2 + n^2 - 2n\cdot n = 2n^2$

$V = \begin{pmatrix} n & 0 \\ 0 & 2n^2 \end{pmatrix} \Rightarrow V^{-1} = \begin{pmatrix} \frac{1}{n} & 0 \\ 0 & \frac{1}{2n^2} \end{pmatrix}$

$g(\theta) = \theta^2,\quad g^{(1)}(\theta) = 2\theta,\quad g^{(2)}(\theta) = 2 \Rightarrow g' = (2\theta, 2)$

$\Delta_2 = g'V^{-1}g = (2\theta, 2)\begin{pmatrix} \frac{1}{n} & 0 \\ 0 & \frac{1}{2n^2} \end{pmatrix}\begin{pmatrix} 2\theta \\ 2 \end{pmatrix} = \frac{4\theta^2}{n} + \frac{2}{n^2}.$

On the other hand, $n\bar{X}^2 \sim \chi^2_{1,\lambda}$, λ = nθ², so $V(n\bar{X}^2) = 2 + 4n\theta^2$ and $V(\bar{X}^2) = \frac{2 + 4n\theta^2}{n^2} = \frac{4\theta^2}{n} + \frac{2}{n^2}$, i.e. the second lower bound is attained.
Indeed, the lower bound is attained as $\underline{b}'\underline{\beta} = T - g(\theta) = g'V^{-1}\underline{\beta}$:

$T - g(\theta) = (2\theta, 2)\begin{pmatrix} \frac{1}{n} & 0 \\ 0 & \frac{1}{2n^2} \end{pmatrix}\begin{pmatrix} \sum(x_i-\theta) \\ \{\sum(x_i-\theta)\}^2 - n \end{pmatrix} = 2\theta(\bar{x}-\theta) + (\bar{x}-\theta)^2 - \frac{1}{n} = \bar{x}^2 - \theta^2 - \frac{1}{n}.$

Theorem 1.3 Let $f_\theta(x)$ be of the exponential form, i.e.

$f_\theta(x) = h(x)e^{k_1(\theta)t(x) + k_2(\theta)}$ such that $k'_1(\theta) \ne 0. \qquad (1.14)$

Then the variance of an unbiased estimator of g(θ), say $\hat{g}(x)$, attains the kth lower bound but not the (k − 1)th iff $\hat{g}(x)$ is a polynomial of degree k in t(x).
Proof If $f_\theta(x)$ is of form (1.14), then

$\frac{\partial}{\partial\theta}f_\theta(x) = f_\theta(x)\left[k'_1(\theta)t(x) + k'_2(\theta)\right]$
$\beta_1 = \frac{1}{f_\theta(x)}\frac{\partial}{\partial\theta}f_\theta(x) = k'_1(\theta)t(x) + k'_2(\theta)$
$\frac{\partial^2}{\partial\theta^2}f_\theta(x) = f_\theta(x)\left[\left\{k'_1(\theta)t(x) + k'_2(\theta)\right\}^2 + k''_1(\theta)t(x) + k''_2(\theta)\right]$
$\beta_2 = \frac{1}{f_\theta(x)}\frac{\partial^2}{\partial\theta^2}f_\theta(x) = \left\{k'_1(\theta)t(x) + k'_2(\theta)\right\}^2 + k''_1(\theta)t(x) + k''_2(\theta)$

Generally, $\beta_i = \frac{1}{f_\theta(x)}\frac{\partial^i}{\partial\theta^i}f_\theta(x) = \left\{k'_1(\theta)t(x) + k'_2(\theta)\right\}^i + P_{i-1}\{t(x), \theta\}$,
where $P_{i-1}\{t(x),\theta\}$ is a polynomial in t(x) of degree at most (i − 1). Let $P_{i-1}\{t(x),\theta\} = \sum_{j=0}^{i-1}Q_{ij}(\theta)\,t^j(x)$.
Then

$\beta_i = \left\{k'_1(\theta)t(x) + k'_2(\theta)\right\}^i + \sum_{j=0}^{i-1}Q_{ij}(\theta)\,t^j(x)$
$= \sum_{j=0}^{i}\binom{i}{j}\left\{k'_1(\theta)\right\}^j t^j(x)\left\{k'_2(\theta)\right\}^{i-j} + \sum_{j=0}^{i-1}Q_{ij}(\theta)\,t^j(x) \qquad (1.15)$

= a polynomial in t(x) of degree i, since $k'_1(\theta) \ne 0$.
Condition of equality in the BLB
The variance of $\hat{g}(x)$ attains the kth BLB but not the (k − 1)th BLB iff

$\hat{g}(x) = a_0(\theta) + \sum_{i=1}^{k}a_i(\theta)\beta_i \qquad (1.16)$

with $a_k(\theta) \ne 0$.
Proof (Only if part) Given that $\hat{g}(x)$ is of the form (1.16), we have to show that $\hat{g}(x)$ is a polynomial of degree k in t(x). From (1.15), $\beta_i$ is a polynomial of degree i in t(x). So by putting the value of $\beta_i$ in (1.16), we get $\hat{g}(x)$ as a polynomial of degree k in t(x), since $a_k(\theta) \ne 0$.
(If part) Given that

$\hat{g}(x) = \sum_{j=0}^{k}C_j\,t^j(x), \quad C_k \ne 0, \qquad (1.17)$

i.e. a polynomial of degree k in t(x), it is sufficient to show that we can write $\hat{g}(x)$ in the form of (1.16):

X
k X
k X
i1
a0 ðhÞ þ ai ðhÞbi ¼ a0 ðhÞ þ ai ðhÞ Qij ðhÞ  t j ðxÞ
i¼0 i¼0 j¼0
!
X
k X
i i  j  ij
þ ai ðhÞ k10 ðhÞ t j ðxÞ k20 ðhÞ
i¼1 j¼0 j

from (1.15)
!
X
k   jXk i  0 ij X
k1 X k
¼ t j ðxÞ k10 ðhÞ ai ðhÞ k2 ðhÞ þ t j ðxÞ ai ðhÞQij
j¼0 i¼j j j¼0 i¼j þ 1
!
 k X i1   jXk i  0 ij X
k1 X k
¼ tk ðxÞ k10 ðhÞ ak ðhÞ þ t j ðxÞ k10 ðhÞ ai ðhÞ k2 ðhÞ þ t j ðxÞ ai ðhÞQij ðhÞ
j¼0 i¼j j j¼0 i¼j þ 1
" ! !#
 k X i1  j X k i  0  j  0 ij
¼ tk ðxÞ k10 ðhÞ ak ðhÞ þ t j ðxÞ k10 ðhÞ aj ðhÞ þ ai ðhÞ k1 ðhÞ  k2 ðhÞ þ Qij ðhÞ
j¼0 i¼j þ 1 j

ð1:18Þ

Equating coefficients of t j from (1.17) and (1.18), we get


 k
Ck ¼ ak ðhÞ k10 ðhÞ
Ck
) ak ð hÞ ¼  k 6¼ 0 and
k10 ðhÞ
! !
Pk i   j   ij
Cj  i¼j þ 1 ai ð hÞ k10 ðhÞ  k20 ðhÞ þ Qij ðhÞ
j
aj ð hÞ ¼
fk 0 ðhÞg j
for j ¼ 0; 1; . . .; k  1

As such a choice of $a_j(\theta)$ exists with $a_k(\theta) \ne 0$, the result follows.
Result 1 If there exists an unbiased estimator of g(θ), say $\hat{g}(x)$, such that $\hat{g}(x)$ is a polynomial of degree k in t(x), then
Δk = the kth BLB to the variance of an unbiased estimator of g(θ) = Var{$\hat{g}(x)$}.
Result 2 If there does not exist any polynomial in t(x) which is an unbiased estimator of g(θ), then it is not possible to find any unbiased estimator of g(θ) whose variance attains the BLB for some k.

1.4 Consistent Estimator

An estimation procedure should be such that the accuracy of an estimate increases with the sample size. Keeping this idea in mind, we define consistency as follows.
Definition An estimator Tn is said to be (weakly) consistent for γ(θ) if for any two positive numbers ε and δ there exists an n0 (depending upon ε, δ) such that

$\Pr\{|T_n - \gamma(\theta)| \le \varepsilon\} > 1 - \delta$ whenever n ≥ n0 and for all θ ∈ Θ,

i.e. if $T_n \xrightarrow{\Pr} \gamma(\theta)$ as n → ∞.
Result 1.7 (Sufficient condition for consistency):
An estimator Tn will be consistent for γ(θ) if $E(T_n) \to \gamma(\theta)$ and $V(T_n) \to 0$ as n → ∞.
Proof By Chebyshev's inequality, for any ε′ > 0,

$\Pr\{|T_n - E(T_n)| \le \varepsilon'\} > 1 - \frac{V(T_n)}{\varepsilon'^2}.$

Now $|T_n - \gamma(\theta)| \le |T_n - E(T_n)| + |E(T_n) - \gamma(\theta)|$, so $|T_n - E(T_n)| \le \varepsilon' \Rightarrow |T_n - \gamma(\theta)| \le \varepsilon' + |E(T_n) - \gamma(\theta)|$.
Hence,

$\Pr\{|T_n - \gamma(\theta)| \le \varepsilon' + |E(T_n) - \gamma(\theta)|\} \ge \Pr\{|T_n - E(T_n)| \le \varepsilon'\} > 1 - \frac{V(T_n)}{\varepsilon'^2}. \qquad (1.19)$

Since $E(T_n) \to \gamma(\theta)$ and $V(T_n) \to 0$ as n → ∞, for any pair of two positive numbers (ε″, δ), we can find an n0 (depending on (ε″, δ)) such that

$|E(T_n) - \gamma(\theta)| \le \varepsilon'' \qquad (1.20)$

and

$V(T_n) \le \varepsilon'^2\delta \qquad (1.21)$

whenever n ≥ n0. For such n0,

$|T_n - \gamma(\theta)| \le \varepsilon' + |E(T_n) - \gamma(\theta)| \Rightarrow |T_n - \gamma(\theta)| \le \varepsilon' + \varepsilon'' \quad \text{and} \quad 1 - \frac{V(T_n)}{\varepsilon'^2} \ge 1 - \delta. \qquad (1.22)$

Now from (1.19) and (1.22), we have

$\Pr\{|T_n - \gamma(\theta)| \le \varepsilon' + \varepsilon''\} \ge \Pr\{|T_n - \gamma(\theta)| \le \varepsilon' + |E(T_n) - \gamma(\theta)|\} > 1 - \delta.$

Taking ε = ε′ + ε″, $\Pr\{|T_n - \gamma(\theta)| \le \varepsilon\} > 1 - \delta$ whenever n ≥ n0. Since ε′, ε″ and δ are arbitrary positive numbers, the proof is complete.  □
(It should be remembered that consistency is a large-sample criterion.)
Example 1.19 Let $X_1, X_2, \ldots, X_n$ be a random sample from a population with mean μ and standard deviation σ. Then $\bar{X}_n = \frac{1}{n}\sum_iX_i$ is a consistent estimator of μ.
Proof $E(\bar{X}_n) = \mu$, $V(\bar{X}_n) = \frac{\sigma^2}{n} \to 0$ as n → ∞. The sufficient condition for consistency holds. ∴ $\bar{X}_n$ will be consistent for μ.  □
Alternatively, by Chebyshev's inequality, for any ε,

$\Pr\{|\bar{X}_n - \mu| \le \varepsilon\} > 1 - \frac{\sigma^2}{n\varepsilon^2}.$

Now for any δ, we can find an n0 so that $\Pr\{|\bar{X}_n - \mu| \le \varepsilon\} > 1 - \delta$ whenever n ≥ n0 (here δ = σ²/(nε²)).
Example 1.20 Show that in random sampling from a normal population, the sample mean is a consistent estimator of the population mean.
Proof For any ε (> 0),

$\Pr\{|\bar{X}_n - \mu| \le \varepsilon\} = \Pr\{|Z| \le \varepsilon\sqrt{n}/\sigma\} = \int_{-\varepsilon\sqrt{n}/\sigma}^{\varepsilon\sqrt{n}/\sigma}\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}t^2}\,dt, \quad \text{where } Z = \frac{\bar{X}_n - \mu}{\sigma}\sqrt{n} \sim N(0, 1).$

Hence, we can choose an n0 depending on any two positive numbers ε and δ such that $\Pr\{|\bar{X}_n - \mu| \le \varepsilon\} > 1 - \delta$ whenever n ≥ n0.
$\therefore \bar{X}_n \xrightarrow{\Pr} \mu$ as n → ∞, i.e. $\bar{X}_n$ is consistent for μ.  □

Example 1.21 Show that for random sampling from the Cauchy population with density function $f(x;\mu) = \frac{1}{\pi}\cdot\frac{1}{1+(x-\mu)^2}$, −∞ < x < ∞, the sample mean is not a consistent estimator of μ but the sample median is a consistent estimator of μ.
Answer
Let $X_1, X_2, \ldots, X_n$ be a random sample from $f(x;\mu) = \frac{1}{\pi}\cdot\frac{1}{1+(x-\mu)^2}$. It can be shown that the sample mean $\bar{X}$ is distributed exactly as a single observation X.

$\therefore \Pr\{|\bar{X}_n - \mu| \le \varepsilon\} = \int_{-\varepsilon}^{\varepsilon}\frac{1}{\pi}\cdot\frac{1}{1+Z^2}\,dZ = \frac{2}{\pi}\tan^{-1}\varepsilon \quad (\text{taking } Z = \bar{X} - \mu)$

which is free from n. Since this probability does not involve n, $\Pr\{|\bar{X}_n - \mu| \le \varepsilon\}$ cannot always be made greater than 1 − δ, however large n may be.
It can be shown that for the sample median $\tilde{X}_n$,

$E(\tilde{X}_n) = \mu + O\!\left(\frac{1}{n}\right), \quad V(\tilde{X}_n) = \frac{\pi^2}{4n} + o\!\left(\frac{1}{n}\right).$
   
∴ Since $E(\tilde{X}_n) \to \mu$ and $V(\tilde{X}_n) \to 0$ as n → ∞, the sufficient condition for consistency holds. ∴ $\tilde{X}_n$ is consistent for μ.
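A simulation of Example 1.21 (an added sketch, not from the text): the probability that the sample mean lands near μ stays flat at about $\frac{2}{\pi}\tan^{-1}(0.2)$ regardless of n, while the corresponding probability for the median tends to 1.

```python
import numpy as np

rng = np.random.default_rng(21)
mu = 4.0
for n in (10, 100, 1000, 10000):
    x = mu + rng.standard_cauchy(size=(1000, n))
    means, medians = x.mean(axis=1), np.median(x, axis=1)
    # fraction of replications falling within 0.2 of mu
    print(n, (np.abs(means - mu) <= 0.2).mean(), (np.abs(medians - mu) <= 0.2).mean())
# The first fraction stays near 2/pi * arctan(0.2) ~ 0.13; the second tends to 1.
```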
Remark 1.17 Consistency is essentially a large-sample criterion.
Remark 1.18 Let Tn be a consistent estimator of γ(θ) and ψ{y} be a continuous function. Then ψ{Tn} will be a consistent estimator of ψ{γ(θ)}.
Proof Since Tn is a consistent estimator of γ(θ), for any two positive numbers ε1 and δ we can find an n0 such that

$\Pr\{|T_n - \gamma(\theta)| \le \varepsilon_1\} > 1 - \delta$ whenever n ≥ n0.

Now ψ{Tn} is a continuous function of Tn. Therefore, for any ε, we can choose an ε1 such that $|T_n - \gamma(\theta)| \le \varepsilon_1 \Rightarrow |\psi\{T_n\} - \psi\{\gamma(\theta)\}| \le \varepsilon$.

$\therefore \Pr\{|\psi\{T_n\} - \psi\{\gamma(\theta)\}| \le \varepsilon\} \ge \Pr\{|T_n - \gamma(\theta)| \le \varepsilon_1\} > 1 - \delta$ whenever n ≥ n0,

i.e. $\Pr\{|\psi\{T_n\} - \psi\{\gamma(\theta)\}| \le \varepsilon\} > 1 - \delta$ whenever n ≥ n0.  □
Remark 1.19 A consistent estimator is not unique. For example, if Tn is a consistent estimator of θ, then for any fixed a and b, $T'_n = \frac{n-a}{n-b}T_n$ is also consistent for θ.
Remark 1.20 A consistent estimator is not necessarily unbiased, e.g. for $f(x;\theta) = \frac{1}{\theta}$, 0 < x < θ, a consistent estimator of θ is $X_{(n)} = \max_{1\le i\le n}X_i$. But it is not unbiased.
Remark 1.21 An unbiased estimator is not necessarily consistent, e.g. for $f(x) = \frac{1}{2}e^{-|x-\theta|}$, −∞ < x < ∞, an unbiased estimator of θ is $\frac{X_{(1)} + X_{(n)}}{2}$, but it is not consistent.
Remark 1.22 A consistent estimator may be meaningless, e.g. let

$T'_n = 0$ if $n \le 10^{10}$, and $T'_n = T_n$ if $n > 10^{10}$.

If Tn is consistent, then T′n is also consistent, but T′n is meaningless for any practical purpose.
Remark 1.23 If T1 and T2 are consistent estimators of γ1(θ) and γ2(θ), then
(i) (T1 + T2) is consistent for γ1(θ) + γ2(θ), and
(ii) T1T2 is consistent for γ1(θ)γ2(θ).
Proof (i) Since T1 and T2 are consistent for γ1(θ) and γ2(θ), we can always choose an n0 such that

$\Pr\{|T_1 - \gamma_1(\theta)| \le \varepsilon_1\} > 1 - \delta_1$ and $\Pr\{|T_2 - \gamma_2(\theta)| \le \varepsilon_2\} > 1 - \delta_2$

whenever n ≥ n0; ε1, ε2, δ1, δ2 are arbitrary positive numbers.
Now $|T_1 + T_2 - \gamma_1(\theta) - \gamma_2(\theta)| \le |T_1 - \gamma_1(\theta)| + |T_2 - \gamma_2(\theta)| \le \varepsilon_1 + \varepsilon_2 = \varepsilon$, (say)

$\therefore \Pr\{|T_1 + T_2 - \gamma_1(\theta) - \gamma_2(\theta)| \le \varepsilon\} \ge \Pr\{|T_1 - \gamma_1(\theta)| \le \varepsilon_1,\ |T_2 - \gamma_2(\theta)| \le \varepsilon_2\}$
$\ge \Pr\{|T_1 - \gamma_1(\theta)| \le \varepsilon_1\} + \Pr\{|T_2 - \gamma_2(\theta)| \le \varepsilon_2\} - 1 \quad [\because P(AB) \ge P(A) + P(B) - 1]$
$\ge 1 - \delta_1 + 1 - \delta_2 - 1 = 1 - (\delta_1 + \delta_2) = 1 - \delta$ for n ≥ n0.

$\therefore \Pr\{|T_1 + T_2 - \gamma_1(\theta) - \gamma_2(\theta)| \le \varepsilon\} > 1 - \delta$ for n ≥ n0.
Hence, T1 + T2 is a consistent estimator of γ1(θ) + γ2(θ).
(ii) Again, $|T_1 - \gamma_1(\theta)| \le \varepsilon_1$ and $|T_2 - \gamma_2(\theta)| \le \varepsilon_2$

$\Rightarrow |T_1T_2 - \gamma_1(\theta)\gamma_2(\theta)| = |\{T_1-\gamma_1(\theta)\}\{T_2-\gamma_2(\theta)\} + \gamma_1(\theta)\{T_2-\gamma_2(\theta)\} + \gamma_2(\theta)\{T_1-\gamma_1(\theta)\}|$
$\le |T_1-\gamma_1(\theta)||T_2-\gamma_2(\theta)| + |\gamma_1(\theta)||T_2-\gamma_2(\theta)| + |\gamma_2(\theta)||T_1-\gamma_1(\theta)|$
$\le \varepsilon_1\varepsilon_2 + |\gamma_1(\theta)|\varepsilon_2 + |\gamma_2(\theta)|\varepsilon_1 = \varepsilon \quad \text{(say)}$

$\therefore \Pr\{|T_1T_2 - \gamma_1(\theta)\gamma_2(\theta)| \le \varepsilon\} \ge \Pr\{|T_1-\gamma_1(\theta)| \le \varepsilon_1,\ |T_2-\gamma_2(\theta)| \le \varepsilon_2\}$
$\ge \Pr\{|T_1-\gamma_1(\theta)| \le \varepsilon_1\} + \Pr\{|T_2-\gamma_2(\theta)| \le \varepsilon_2\} - 1 \ge 1 - (\delta_1 + \delta_2) = 1 - \delta$ whenever n ≥ n0.

∴ T1T2 is consistent for γ1(θ)γ2(θ).  □


Example 1.22 Let $X_1, X_2, \ldots, X_n$ be a random sample from the distribution of X for which the moments of order 2r ($\mu'_{2r}$) exist. Then show that
(a) $m'_r = \frac{1}{n}\sum_{1}^{n}X_i^r$ is a consistent estimator of $\mu'_r$, and
(b) $m_r = \frac{1}{n}\sum_{1}^{n}(X_i - \bar{X})^r$ is a consistent estimator of $\mu_r$.
These can be proved using the following results.
As $E(m'_r) = \mu'_r$ and $V(m'_r) = \frac{\mu'_{2r} - \mu'^2_r}{n}$,
$\therefore V(m'_r) \to 0$ as n → ∞, so $m'_r$ is consistent for $\mu'_r$; and $E(m_r) = \mu_r + O\!\left(\frac{1}{n}\right)$,

$V(m_r) = \frac{1}{n}\left(\mu_{2r} - \mu_r^2 - 2r\mu_{r-1}\mu_{r+1} + r^2\mu_{r-1}^2\mu_2\right) + O\!\left(\frac{1}{n^2}\right) \to 0$ as n → ∞.

∴ $m_r$ is consistent for $\mu_r$.
(c) Also it can be shown that $b_1$ and $b_2$ are consistent estimators of $\beta_1 = \frac{\mu_3^2}{\mu_2^3}$ and $\beta_2 = \frac{\mu_4}{\mu_2^2}$.

1.5 Efficient Estimator

Suppose the regularity conditions hold for the family of distributions {f(x; θ); θ ∈ Θ}. Let T be an unbiased estimator of γ(θ). Then the efficiency of T is given by

$e(T) = \dfrac{\{\gamma'(\theta)\}^2/I(\theta)}{V(T)}.$

It is denoted by eff.(T) or e(T). Clearly, 0 ≤ e(T) ≤ 1.
An estimator T will be called (most) efficient if eff(T) = 1. An estimator T of γ(θ) is said to be asymptotically efficient if $E(T) \to \gamma(\theta)$ and eff(T) → 1 as n → ∞.
Let T1 and T2 be two unbiased estimators of γ(θ). Then the efficiency of T1 relative to T2 is given by eff.(T1/T2) = $\dfrac{V(T_2)}{V(T_1)}$.
Remark 1.24 An MVBE will be efficient.
Remark 1.25 In many cases, an MVBE does not exist even though the family satisfies the regularity conditions. Again, in many cases the regularity conditions do not hold. In such cases, the above definition fails. If an MVUE exists, we take it as an efficient estimator.
Remark 1.26 The efficiency measure has an appealing property of determining the relative sample sizes needed to attain the same precision of estimation as measured by variance.
e.g.: Suppose an estimator T1 is 80 % efficient and $V(T_1) = \frac{c}{n}$, where c depends upon θ. Then $V(T_0) = 0.8\frac{c}{n}$ for the most efficient estimator T0. Thus the efficient estimator based on a sample of size 80 will be just as good as the estimator T1 based on a sample of size 100.
Example 1.23 Let T1 and T2 be two unbiased estimators of θ with efficiencies e1 and e2, respectively. If ρ denotes the correlation coefficient between T1 and T2, then

$\sqrt{e_1e_2} - \sqrt{(1-e_1)(1-e_2)} \le \rho \le \sqrt{e_1e_2} + \sqrt{(1-e_1)(1-e_2)}.$

Proof For any real a, $T = aT_1 + (1-a)T_2$ will also be an unbiased estimator of θ. Now

$V(T) = a^2V(T_1) + (1-a)^2V(T_2) + 2a(1-a)\rho\sqrt{V(T_1)V(T_2)}.$

Suppose T0 is an MVUE of θ with variance V0, so that $V(T_1) = V_0/e_1$ and $V(T_2) = V_0/e_2$. Then $V(T) \ge V_0$ for every a:

$a^2\frac{V_0}{e_1} + (1-a)^2\frac{V_0}{e_2} + 2a(1-a)\rho\frac{V_0}{\sqrt{e_1e_2}} \ge V_0$
$\Rightarrow a^2\left(\frac{1}{e_1} + \frac{1}{e_2} - \frac{2\rho}{\sqrt{e_1e_2}}\right) - 2a\left(\frac{1}{e_2} - \frac{\rho}{\sqrt{e_1e_2}}\right) + \left(\frac{1}{e_2} - 1\right) \ge 0 \quad \forall\, a.$

Since this quadratic in a is non-negative for all real a, its discriminant cannot be positive:

$\left(\frac{1}{e_2} - \frac{\rho}{\sqrt{e_1e_2}}\right)^2 \le \left(\frac{1}{e_1} + \frac{1}{e_2} - \frac{2\rho}{\sqrt{e_1e_2}}\right)\left(\frac{1}{e_2} - 1\right)$
$\Rightarrow \rho^2 - 2\rho\sqrt{e_1e_2} - 1 + e_1 + e_2 \le 0$
$\Rightarrow (\rho - \sqrt{e_1e_2})^2 \le (1-e_1)(1-e_2)$
$\Rightarrow |\rho - \sqrt{e_1e_2}| \le \sqrt{(1-e_1)(1-e_2)}.$

Hence, the result.  □
Remark 1.27 The correlation coefficient between T and the most efficient estimator is $\sqrt{e}$, where e is the efficiency of the unbiased estimator T. Put e2 = e and e1 = 1 in the above inequality $|\rho - \sqrt{e_1e_2}| \le \sqrt{(1-e_1)(1-e_2)}$, and we easily get the result.
Chapter 2
Methods of Estimation

2.1 Introduction

In chapter one, we have discussed different optimum properties of good point


estimators viz. unbiasedness, minimum variance, consistency and efficiency which
are the desirable properties of a good estimator. In this chapter, we shall discuss
different methods of estimating parameters which are expected to provide estima-
tors having some of these important properties. Commonly used methods are:
1. Method of moments
2. Method of maximum likelihood
3. Method of minimum χ²
4. Method of least squares
In general, depending on the situation and the purpose of our study we apply any
one of the methods that may be suitable among the above-mentioned methods of
point estimation.

2.2 Method of Moments

The method of moments, introduced by K. Pearson, is one of the oldest methods of estimation. Let (X1, X2,…, Xn) be a random sample from a population having p.d.f. (or p.m.f.) f(x, θ), θ = (θ1, θ2,…, θk). Further, let the first k population moments about zero exist as explicit functions of θ, i.e. $\mu'_r = \mu'_r(\theta_1, \theta_2, \ldots, \theta_k)$, r = 1, 2,…, k. In the method of moments, we equate k sample moments with the corresponding population moments. Generally, the first k moments are taken because the errors due to sampling increase with the order of the moment. Thus, we get k equations $\mu'_r(\theta_1, \theta_2, \ldots, \theta_k) = m'_r$, r = 1, 2,…, k, where $m'_r = \frac{1}{n}\sum_{i=1}^{n}X_i^r$ (or $m'_r = \frac{1}{n}\sum_{i=1}^{n}x_i^r$) are the sample moments. Solving these equations, we get the method of moments estimators (or estimates).

If the correspondence between $\mu'_r$ and θ is one-to-one and the inverse function is $\theta_i = f_i(\mu'_1, \mu'_2, \ldots, \mu'_k)$, i = 1, 2,…, k, then the method of moments estimate becomes $\hat{\theta}_i = f_i(m'_1, m'_2, \ldots, m'_k)$. Now, if the function fi(·) is continuous, then by the weak law of large numbers, the method of moments estimators will be consistent.
This method gives maximum likelihood estimators when f(x, θ) = exp(b0 + b1x + b2x² + …) and so, in this case, it gives efficient estimators. But the estimators obtained by this method are not generally efficient. This is one of the simplest methods; therefore, these estimates can be used as a first approximation to get a better estimate. This method is not applicable when the theoretical moments do not exist, as in the case of the Cauchy distribution.
Example 2.1 Let $X_1, X_2, \ldots, X_n$ be a random sample from the p.d.f. $f(x;\alpha,\beta) = \frac{1}{B(\alpha,\beta)}x^{\alpha-1}(1-x)^{\beta-1}$, 0 < x < 1; α, β > 0. Find the estimators of α and β by the method of moments.
Solution
We know $E(x) = \mu'_1 = \frac{\alpha}{\alpha+\beta}$ and $E(x^2) = \mu'_2 = \frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)}$.
Hence, $\frac{\alpha}{\alpha+\beta} = \bar{x}$ and $\frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)} = \frac{1}{n}\sum_{i=1}^{n}x_i^2$.
By solving, we get $\hat{\beta} = \dfrac{(\bar{x}-1)\left(\sum x_i^2 - n\bar{x}\right)}{\sum(x_i-\bar{x})^2}$ and $\hat{\alpha} = \dfrac{\bar{x}\hat{\beta}}{1-\bar{x}}$.
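A short numerical check of Example 2.1 (an added sketch, not from the text), computing the moment estimates from a simulated Beta sample:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta, n = 2.0, 5.0, 5000
x = rng.beta(alpha, beta, size=n)

xbar = x.mean()
beta_hat = (xbar - 1) * ((x ** 2).sum() - n * xbar) / ((x - xbar) ** 2).sum()
alpha_hat = xbar * beta_hat / (1 - xbar)
print(alpha_hat, beta_hat)     # close to the true values (2, 5) for large n
```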

2.3 Method of Maximum Likelihood

This method of estimation is due to R.A. Fisher. It is the most important general method of estimation. Let $\underline{X} = (X_1, X_2, \ldots, X_n)$ denote a random sample with joint p.d.f. or p.m.f. $f(\underline{x};\theta)$, θ ∈ Θ (θ may be a vector). The function $f(\underline{x};\theta)$, considered as a function of θ, is called the likelihood function. In this case, it is denoted by L(θ). The principle of maximum likelihood consists of choosing an estimate, say $\hat{\theta}$, within the admissible range of θ, that maximizes the likelihood. $\hat{\theta}$ is called the maximum likelihood estimate (MLE) of θ. In other words, $\hat{\theta}$ will be an MLE of θ if

$L(\hat{\theta}) \ge L(\theta) \quad \forall\,\theta\in\Theta.$

In practice, it is convenient to work with the logarithm. Since the log-function is a monotone function, $\hat{\theta}$ satisfies

$\log L(\hat{\theta}) \ge \log L(\theta) \quad \forall\,\theta\in\Theta.$

Again, if log L(θ) is differentiable within Θ and $\hat{\theta}$ is an interior point, then $\hat{\theta}$ will be a solution of

$\frac{\partial\log L(\theta)}{\partial\theta_i} = 0, \quad i = 1, 2, \ldots, k; \quad \theta_{k\times 1} = (\theta_1, \theta_2, \ldots, \theta_k)'.$

These equations are known as the likelihood equations.


Problem 2.1 Let (X1, X2, …, Xn) be a random sample from b(m, π) (m known). Show that $\hat{\pi} = \frac{1}{mn}\sum_{i=1}^{n}X_i$ is an MLE of π.
Problem 2.2 Let (X1, X2, …, Xn) be a random sample from P(λ). Show that $\hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n}X_i$ is an MLE of λ.
Problem 2.3 Let (X1, X2, …, Xn) be a random sample from N(μ, σ²). Show that $(\bar{X}, s^2)$ is an MLE of (μ, σ²), where $\bar{X} = \frac{1}{n}\sum_{i=1}^{n}X_i$ and $s^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$.

Example 2.2 Let (X1, X2, …, Xn) be a random sample from a population having p.d.f. $f(x;\theta) = \frac{1}{2}e^{-|x-\theta|}$, −∞ < x < ∞. Show that the sample median $\tilde{X}$ is an MLE of θ.
Answer

$L(\theta) = \text{Const.}\;e^{-\sum_{i=1}^{n}|x_i - \theta|}$

Maximization of L(θ) is equivalent to the minimization of $\sum_{i=1}^{n}|x_i - \theta|$. Now, $\sum_{i=1}^{n}|x_i - \theta|$ is least when $\theta = \tilde{X}$, the sample median, as the mean deviation about the median is least. Hence $\tilde{X}$ is an MLE of θ.
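A grid-search sketch of Example 2.2 (added for illustration, not from the text): evaluating the log-likelihood over a fine grid shows the maximizer coinciding with the sample median.

```python
import numpy as np

rng = np.random.default_rng(4)
theta, n = 1.7, 201
x = theta + rng.laplace(0.0, 1.0, size=n)

grid = np.linspace(x.min(), x.max(), 4001)
loglik = np.array([-np.abs(x - t).sum() for t in grid])   # log L(theta) up to a constant

print(grid[loglik.argmax()], np.median(x))   # the grid maximizer sits at the sample median
```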
Properties of MLE
(a) If a sufficient statistic exists, then the MLE will be a function of the sufficient statistic.
Proof Let T be a sufficient statistic for the family $\{f(\underline{x};\theta); \theta\in\Theta\}$. By the factorization theorem, we have $\prod_{i=1}^{n}f(x_i;\theta) = g\{T(\underline{x});\theta\}\,h(\underline{x})$.
To find the MLE, we maximize $g\{T(\underline{x});\theta\}$ with respect to θ. Since $g\{T(\underline{x});\theta\}$ is a function of θ and $\underline{x}$ only through $T(\underline{x})$, the conclusion follows immediately.  □
Remark 2.1 Property (a) does not imply that an MLE is itself a sufficient statistic.
Example 2.3 Let X1, X2,…, Xn be a random sample from a population having p.d.f.

$f(x;\theta) = 1$ if $\theta \le x \le \theta + 1$, and 0 otherwise.

Then $L(\theta) = 1$ if $\theta \le \min X_i \le \max X_i \le \theta + 1$, and 0 otherwise.
Any value of θ satisfying $\max X_i - 1 \le \theta \le \min X_i$ will be an MLE of θ. In particular, Min Xi is an MLE of θ, but it is not sufficient for θ. In fact, here (Min Xi, Max Xi) is a sufficient statistic.
(b) If T is the MVBE, then the likelihood equation will have the solution T.
Proof Since T is an MVBE, $\frac{\partial\log f(\underline{X};\theta)}{\partial\theta} = (T - \theta)k(\theta)$.
Now, $\frac{\partial\log f(\underline{X};\theta)}{\partial\theta} = 0 \Rightarrow \theta = T$ [∵ k(θ) ≠ 0].  □
(c) Let T be an MLE of θ and d = ψ(θ) be a one-to-one function of θ. Then $\hat{d} = \psi(T)$ will be an MLE of d.
Proof Since T is an MLE of θ, $L\{T(\underline{X})\} \ge L(\theta)$ ∀ θ.
Since the correspondence between θ and d is one-to-one, the inverse function must exist. Suppose the inverse function is $\theta = \psi^{-1}(d)$. Thus, $L(\theta) = L\{\psi^{-1}(d)\} = L_1(d)$ (say).
Now, $L_1(\hat{d}) = L\big[\psi^{-1}\{\psi(T(\underline{X}))\}\big] = L\{T(\underline{X})\} \ge L(\theta) = L_1(d)$.
Therefore, $\hat{d}$ is an MLE of d.  □
(d) Suppose the p.d.f. (or p.m.f.) f(x, θ) satisfies the following regularity conditions:
(i) For almost all x, $\frac{\partial f(x;\theta)}{\partial\theta}$, $\frac{\partial^2 f(x;\theta)}{\partial\theta^2}$, $\frac{\partial^3 f(x;\theta)}{\partial\theta^3}$ exist ∀ θ ∈ Θ.
(ii) $\left|\frac{\partial f(x;\theta)}{\partial\theta}\right| < A_1(x)$, $\left|\frac{\partial^2 f(x;\theta)}{\partial\theta^2}\right| < A_2(x)$ and $\left|\frac{\partial^3 f(x;\theta)}{\partial\theta^3}\right| < B(x)$, where A1(x) and A2(x) are integrable functions of x and $\int_{-\infty}^{\infty}B(x)f(x;\theta)\,dx < M$, a finite quantity.
(iii) $\int_{-\infty}^{\infty}\left(\frac{\partial\log f(x;\theta)}{\partial\theta}\right)^2 f(x;\theta)\,dx$ is a finite and positive quantity.
If $\hat{\theta}_n$ is an MLE of θ on the basis of a sample of size n from a population having p.d.f. (or p.m.f.) f(x, θ) which satisfies the above regularity conditions, then $\sqrt{n}(\hat{\theta}_n - \theta)$ is asymptotically normal with mean 0 and variance $\left[\int_{-\infty}^{\infty}\left(\frac{\partial\log f(x;\theta)}{\partial\theta}\right)^2 f(x;\theta)\,dx\right]^{-1}$. Also, $\hat{\theta}_n$ is asymptotically efficient and consistent.
(e) An MLE may not be unique.
Example 2.4 Let $f(x;\theta) = 1$ if $\theta \le x \le \theta + 1$, and 0 otherwise.
Then $L(\theta) = 1$ if $\theta \le \min x_i \le \max x_i \le \theta + 1$, and 0 otherwise, i.e. $L(\theta) = 1$ if $\max x_i - 1 \le \theta \le \min x_i$, and 0 otherwise.
Clearly, L(θ) is maximized by any value of θ of the form $T_\alpha = \alpha(\max x_i - 1) + (1-\alpha)\min x_i$, 0 ≤ α ≤ 1. For fixed α, $T_\alpha$ will be an MLE. Thus, we observe that an infinite number of MLEs exist in this case.
(f) An MLE may not be unbiased.
Example 2.5

$f(x;\theta) = \frac{1}{\theta}$ if $0 \le x \le \theta$, and 0 otherwise.
Then $L(\theta) = \frac{1}{\theta^n}$ if $\max x_i \le \theta$, and 0 otherwise.

[Figure: L(θ) plotted against θ; L(θ) is zero for θ < Max Xi and decreases in θ thereafter.]
From the figure, it is clear that the likelihood L(θ) is largest when θ = Max Xi. Therefore Max Xi is an MLE of θ. Note that $E(\max X_i) = \frac{n}{n+1}\theta \ne \theta$. Therefore, here the MLE is a biased estimator.
(g) An MLE may be worthless.
Example 2.6

$f(x;\pi) = \pi^x(1-\pi)^{1-x}, \quad x = 0, 1; \quad \tfrac{1}{4} \le \pi \le \tfrac{3}{4}.$

Then L(π) = π if x = 1 and 1 − π if x = 0, i.e. L(π) is maximized at π = 3/4 if x = 1 and at π = 1/4 if x = 0.
Thus, $T = \frac{2X+1}{4}$ is an MLE of π.
Now, $E(T) = \frac{2\pi+1}{4} \ne \pi$. Thus, T is a biased estimator of π.

$\mathrm{MSE\ of\ } T = E(T-\pi)^2 = E\left(\frac{2X+1}{4} - \pi\right)^2 = \frac{1}{16}E\{2(X-\pi) + 1 - 2\pi\}^2$
$= \frac{1}{16}E\{4(X-\pi)^2 + (1-2\pi)^2 + 4(X-\pi)(1-2\pi)\} = \frac{1}{16}\{4\pi(1-\pi) + (1-2\pi)^2\} = \frac{1}{16}.$

Now, we consider the trivial estimator d(x) = 1/2.
MSE of d(x) = $\left(\frac{1}{2} - \pi\right)^2 \le \frac{1}{16}$ = MSE of T ∀ π ∈ [1/4, 3/4].
Thus, in the sense of mean square error, the MLE is meaningless.
(h) An MLE may not be consistent.
Example 2.7

$f(x;\theta) = \theta^x(1-\theta)^{1-x}$ if θ is rational, and $(1-\theta)^x\theta^{1-x}$ if θ is irrational; 0 < θ < 1, x = 0, 1.

An MLE of θ is $\hat{\theta}_n = \bar{X}$. Here, $\hat{\theta}_n$ is not a consistent estimator of θ.
(i) The regularity conditions in (d) are not necessary conditions.
Example 2.8

$f(x;\theta) = \frac{1}{2}e^{-|x-\theta|}, \quad -\infty < x < \infty,\ -\infty < \theta < \infty.$

Here, the regularity conditions do not hold. However, the MLE (= sample median) is asymptotically normal and efficient.
Example 2.9 Let $X_1, X_2, \ldots, X_n$ be a random sample from $f(x;\alpha,\beta) = \beta e^{-\beta(x-\alpha)}$, α ≤ x < ∞ and β > 0.
Find MLEs of α, β.
Solution

$L(\alpha,\beta) = \beta^n e^{-\beta\sum_{i=1}^{n}(x_i-\alpha)}$
$\log_e L(\alpha,\beta) = n\log_e\beta - \beta\sum_{i=1}^{n}(x_i-\alpha)$
$\frac{\partial\log L}{\partial\beta} = \frac{n}{\beta} - \sum(x_i-\alpha) \quad \text{and} \quad \frac{\partial\log L}{\partial\alpha} = n\beta.$

Now, $\frac{\partial\log L}{\partial\alpha} = 0$ gives us β = 0, which is not admissible. Thus, the method of differentiation fails here.
Now, from the expression of L(α, β), it is clear that for fixed β (> 0), L(α, β) becomes maximum when α is the largest. The largest possible value of α is X(1) = Min xi.
Now, we maximize L(X(1), β) with respect to β. This can be done by the method of differentiation:

$\frac{\partial\log L(x_{(1)},\beta)}{\partial\beta} = 0 \Rightarrow \frac{n}{\beta} - \sum(x_i - \min x_i) = 0 \Rightarrow \beta = \frac{n}{\sum(x_i - \min x_i)}.$

So, the MLE of (α, β) is $\left(\min x_i,\ \dfrac{n}{\sum(x_i - \min x_i)}\right)$.
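A quick numerical check of Example 2.9 (an added sketch, not from the text):

```python
import numpy as np

rng = np.random.default_rng(6)
alpha, beta, n = 2.0, 0.5, 10_000
x = alpha + rng.exponential(1.0 / beta, size=n)   # density beta*exp(-beta*(x - alpha)), x >= alpha

alpha_hat = x.min()
beta_hat = n / (x - alpha_hat).sum()
print(alpha_hat, beta_hat)        # close to the true values (2.0, 0.5)
```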
Example 2.10 Let $X_1, X_2, \ldots, X_n$ be a random sample from
$$f(x;\alpha,\beta) = \begin{cases} \frac{1}{\beta-\alpha}, & \alpha \le x \le \beta \\ 0, & \text{otherwise.}\end{cases}$$

(a) Show that the MLE of (α, β) is (Min Xᵢ, Max Xᵢ).
(b) Also find the estimators of α and β by the method of moments.

Proof

(a)
$$L(\alpha,\beta) = \frac{1}{(\beta-\alpha)^n} \quad \text{if } \alpha \le \min x_i < \max x_i \le \beta \qquad (2.1)$$

It is evident from (2.1) that the likelihood will be made as large as possible when (β − α) is made as small as possible. Clearly, α cannot be larger than min xᵢ and β cannot be smaller than max xᵢ; hence, the smallest possible value of (β − α) is (max xᵢ − min xᵢ). Then the MLEs of α and β are $\hat{\alpha} = \min x_i$ and $\hat{\beta} = \max x_i$, respectively.

(b) We know $E(x) = \mu_1' = \frac{\alpha+\beta}{2}$ and $V(x) = \mu_2 = \frac{(\beta-\alpha)^2}{12}$.

Hence $\frac{\alpha+\beta}{2} = \bar{x}$ and $\frac{(\beta-\alpha)^2}{12} = \frac1n\sum_{i=1}^{n}(x_i - \bar{x})^2$.

By solving, we get $\hat{\alpha} = \bar{x} - \sqrt{\dfrac{3\sum(x_i-\bar{x})^2}{n}}$ and $\hat{\beta} = \bar{x} + \sqrt{\dfrac{3\sum(x_i-\bar{x})^2}{n}}$.
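A minimal Python sketch comparing the two sets of estimators on a simulated sample (the true endpoints and the sample size are illustrative):

    import numpy as np

    rng = np.random.default_rng(2)
    a_true, b_true, n = 1.0, 5.0, 200                    # illustrative values
    x = rng.uniform(a_true, b_true, size=n)

    # MLE: the sample extremes
    a_mle, b_mle = x.min(), x.max()

    # Method of moments: mean +/- sqrt(3) times the (divide-by-n) standard deviation
    s = np.sqrt(np.mean((x - x.mean()) ** 2))
    a_mom, b_mom = x.mean() - np.sqrt(3) * s, x.mean() + np.sqrt(3) * s

    print("MLE     :", a_mle, b_mle)
    print("Moments :", a_mom, b_mom)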

Successive approximation for the estimation of a parameter

It frequently happens that the likelihood equation is by no means easy to solve. A general method in such cases is to assume a trial solution and correct it by an extra term to get a more accurate solution. This process can be repeated until we get the solution to a sufficient degree of accuracy.

Let L denote the likelihood and $\hat{\theta}$ be the MLE. Then $\left.\frac{\partial\log L}{\partial\theta}\right|_{\theta=\hat\theta} = 0$. Suppose $\theta_0$ is a trial solution of $\frac{\partial\log L}{\partial\theta} = 0$. Then

$$0 = \left.\frac{\partial\log L}{\partial\theta}\right|_{\theta=\hat\theta} = \left.\frac{\partial\log L}{\partial\theta}\right|_{\theta=\theta_0} + (\hat\theta - \theta_0)\left.\frac{\partial^2\log L}{\partial\theta^2}\right|_{\theta=\theta_0} + \text{terms involving } (\hat\theta - \theta_0) \text{ with powers higher than unity}$$

$$\Rightarrow 0 \simeq \left.\frac{\partial\log L}{\partial\theta}\right|_{\theta=\theta_0} + (\hat\theta - \theta_0)\left.\frac{\partial^2\log L}{\partial\theta^2}\right|_{\theta=\theta_0}, \text{ neglecting the terms involving } (\hat\theta - \theta_0) \text{ with powers higher than unity}$$

$$\Rightarrow 0 \simeq \left.\frac{\partial\log L}{\partial\theta}\right|_{\theta=\theta_0} - (\hat\theta - \theta_0)I(\theta_0), \quad \text{where } I(\theta) = E\left(-\frac{\partial^2\log L}{\partial\theta^2}\right).$$

Thus, the first approximate value of θ is

$$\theta^{(1)} = \theta_0 + \frac{\left.\frac{\partial\log L}{\partial\theta}\right|_{\theta=\theta_0}}{I(\theta_0)}.$$

Example 2.11 Let $X_1, X_2, \ldots, X_n$ be a random sample from $f(x,\theta) = \dfrac{1}{\pi\{1+(x-\theta)^2\}}$.

Here $\frac{\partial\log f(x,\theta)}{\partial\theta} = \frac{2(x-\theta)}{1+(x-\theta)^2}$, and so the likelihood equation is $\sum_{i=1}^{n}\dfrac{(x_i-\theta)}{1+(x_i-\theta)^2} = 0$; clearly it is difficult to solve for θ.

So, we consider the successive approximation method. In this case, $I(\theta) = \frac{n}{2}$.

Here, the first approximation is $\theta^{(1)} = \theta_0 + \frac{4}{n}\sum_{i=1}^{n}\frac{(x_i-\theta_0)}{1+(x_i-\theta_0)^2}$, $\theta_0$ being a trial solution.

Usually, we take $\theta_0$ = sample median.
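A minimal Python sketch of this correction step for the Cauchy location parameter, starting from the sample median and repeating the update a few times; the simulated data and the true θ are illustrative.

    import numpy as np

    rng = np.random.default_rng(3)
    theta_true, n = 1.5, 200                        # illustrative values
    x = theta_true + rng.standard_cauchy(size=n)

    theta = np.median(x)                            # trial solution theta_0
    for _ in range(5):                              # repeat the correction a few times
        score = 2 * np.sum((x - theta) / (1 + (x - theta) ** 2))  # d log L / d theta
        theta = theta + score / (n / 2)             # I(theta) = n/2 for this model
    print(theta)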

2.4 Method of Minimum χ²

This method may be used when the population is grouped into a number of mutually exclusive and exhaustive classes and the observations are given in the form of frequencies.

Suppose there are k classes and $p_i(\theta)$ is the probability of an individual belonging to the ith class. Let $f_i$ denote the sample frequency. Clearly, $\sum_{i=1}^{k} p_i(\theta) = 1$ and $\sum_{i=1}^{k} f_i = n$.

The discrepancy between the observed frequency and the corresponding expected frequency is measured by the Pearsonian χ², which is given by

$$\chi^2 = \sum_{i=1}^{k}\frac{\{f_i - np_i(\theta)\}^2}{np_i(\theta)} = \sum_{i=1}^{k}\frac{f_i^2}{np_i(\theta)} - n.$$

The principle of the method of minimum χ² consists of choosing as the estimate of θ, say $\hat\theta$, the value that minimizes χ². To find it, we first consider the minimum-χ² equations $\frac{\partial\chi^2}{\partial\theta_i} = 0$, i = 1, 2, …, r, where $\theta_i$ is the ith component of θ.

It can be shown that for large n, the minimum-χ² equations and the likelihood equations are identical and provide identical estimates.

The method of minimum χ² is found to be more troublesome to apply in many cases, and offers no improvement on the maximum likelihood method. This method can be used when the maximum likelihood equations are difficult to solve; in particular situations, this method may be simpler. To avoid the difficulty in the minimum χ² method, we consider another measure of discrepancy, given by

$$\chi'^2 = \sum_{i=1}^{k}\frac{\{f_i - np_i(\theta)\}^2}{f_i};$$

χ′² is called the modified Pearsonian χ². Now, we minimize χ′², instead of χ², with respect to θ.

It can be shown that for large n the estimates obtained by minimizing χ′² would also be approximately equal to the MLEs. Difficulty arises if some of the classes are empty. In this case, we minimize

$$\chi''^2 = \sum_{i:\,f_i\ne 0}\frac{\{f_i - np_i(\theta)\}^2}{f_i} + 2M,$$

where M = sum of the expected frequencies of the empty classes.

Example 2.12 Let $(x_1, x_2, \ldots, x_n)$ be a given sample of size n. It is to be tested whether the sample comes from some Poisson distribution with unknown mean μ. How do you estimate μ by the method of modified minimum chi-square?
Solution

Let $x_1, x_2, \ldots, x_n$ be arranged in k groups such that there are

$n_i$ observations with $x = i$, $i = r+1, \ldots, r+k-2$;
$n_L$ observations with $x \le r$;
$n_u$ observations with $x \ge r+k-1$,

so that the smallest and the largest values of x, which are fewer, are pooled together and $n_L + \sum_{i=r+1}^{r+k-2} n_i + n_u = n$.

Let $p_i(\mu) = P(x = i) = \frac{e^{-\mu}\mu^i}{i!}$, $p_L(\mu) = P(x \le r) = \sum_{i=0}^{r} p_i(\mu)$ and $p_u(\mu) = P(x \ge r+k-1) = \sum_{i=r+k-1}^{\infty} p_i(\mu)$.

Now, using $\sum_{i=1}^{k}\frac{n_i}{p_i(\theta)}\frac{\partial p_i(\theta)}{\partial\theta_j} = 0$, $j = 1, 2, \ldots, p$, we have

$$n_L\,\frac{\sum_{i=0}^{r}\left(\frac{i}{\mu}-1\right)p_i(\mu)}{\sum_{i=0}^{r}p_i(\mu)} + \sum_{i=r+1}^{r+k-2}n_i\left(\frac{i}{\mu}-1\right) + n_u\,\frac{\sum_{i=r+k-1}^{\infty}\left(\frac{i}{\mu}-1\right)p_i(\mu)}{\sum_{i=r+k-1}^{\infty}p_i(\mu)} = 0.$$

Since there is only one parameter, i.e. p = 1, we get only the above equation. By solving, we get

$$n\hat{\mu} = n_L\,\frac{\sum_{i=0}^{r} i\,p_i(\mu)}{\sum_{i=0}^{r} p_i(\mu)} + \sum_{i=r+1}^{r+k-2} i\,n_i + n_u\,\frac{\sum_{i=r+k-1}^{\infty} i\,p_i(\mu)}{\sum_{i=r+k-1}^{\infty} p_i(\mu)} \approx \text{sum of all } x\text{'s}.$$

Hence, $\hat{\mu}$ is approximately the sample mean $\bar{x}$.

2.5 Method of Least Square

In the method of least squares, we consider the estimation of parameters using some specified form of the expectation and second moment of the observations. For fitting a curve of the form $y = f\left(x;\beta_0,\beta_1,\ldots,\beta_p\right)$ to the data $(x_i, y_i)$, i = 1, 2, …, n, we may use the method of least squares. This method consists of minimizing the sum of squares

$$S = \sum_{i=1}^{n} e_i^2, \quad \text{where } e_i = y_i - f\left(x_i;\beta_0,\beta_1,\ldots,\beta_p\right),\ i = 1, 2, \ldots, n,$$

with respect to $\beta_0,\beta_1,\ldots,\beta_p$. Sometimes, we minimize $\sum w_i e_i^2$ instead of $\sum e_i^2$. In that case, it is called a weighted least squares method.

To minimize S, we consider the (p + 1) first-order partial derivatives and get (p + 1) equations in (p + 1) unknowns. Solving these equations, we get the least square estimates of the $\beta_i$'s.

In general, the least square estimates do not have any optimum properties even asymptotically. However, in the case of linear estimation this method provides good estimators. When $f\left(x_i;\beta_0,\beta_1,\ldots,\beta_p\right)$ is a linear function of the parameters and the x-values are known, least square estimators will be BLUE. Again, if we assume that the $e_i$'s are independently and identically normally distributed, then a linear estimator of the form $\mathbf{a}'\boldsymbol{\beta}$ will be MVUE for the entire class of unbiased estimators. In general, we consider n uncorrelated observations $y_1, y_2, \ldots, y_n$ such that

$$E(y_i) = \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki}, \quad V(y_i) = \sigma^2,\ i = 1, 2, \ldots, n;\ x_{1i} = 1\ \forall i,$$

where $\beta_1,\beta_2,\ldots,\beta_k$ and $\sigma^2$ are unknown parameters. If Y and $\boldsymbol{\beta}$ stand for the column vectors of the variables $y_i$ and parameters $\beta_j$, and if $X = (x_{ji})$ is an (n × k) matrix of known coefficients $x_{ji}$, then the above equation can be written as

$$E(Y) = X\boldsymbol{\beta}, \qquad V(e) = E(ee') = \sigma^2 I,$$

where $e = Y - X\boldsymbol{\beta}$ is an (n × 1) vector of error random variables with E(e) = 0 and I is an (n × n) identity matrix. The least square method requires that the β's be so calculated that $\phi = e'e = (Y - X\boldsymbol{\beta})'(Y - X\boldsymbol{\beta})$ is a minimum. This is satisfied when

$$\frac{\partial\phi}{\partial\boldsymbol{\beta}} = 0, \quad \text{or} \quad -2X'(Y - X\boldsymbol{\beta}) = 0.$$

The least square estimator of $\boldsymbol{\beta}$ is thus given by the vector $\hat{\boldsymbol{\beta}} = (X'X)^{-1}X'Y$.
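The closed form $\hat{\boldsymbol{\beta}} = (X'X)^{-1}X'Y$ translates directly into a few lines of code. A minimal Python sketch (the design matrix and response below are illustrative, generated from a simple straight-line model):

    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.uniform(0, 10, size=50)
    y = 2 + 3 * x + rng.normal(0, 1, size=50)        # illustrative data: y = 2 + 3x + noise

    X = np.column_stack([np.ones_like(x), x])        # first column of ones: x_{1i} = 1 for all i
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)     # solves the normal equations (X'X) beta = X'y
    print(beta_hat)                                  # approximately [2, 3]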
Example 2.13 Let $y_i = \beta_1 x_{1i} + \beta_2 x_{2i} + e_i$, $i = 1, 2, \ldots, n$, or $E(y_i) = \beta_1 x_{1i} + \beta_2 x_{2i}$, with $x_{1i} = 1$ for all i.

Find the least square estimates of $\beta_1$ and $\beta_2$. Prove that the method of maximum likelihood and the method of least squares are identical for the case of the normal distribution.

Solution
In matrix notation we have $E(Y) = X\boldsymbol{\beta}$, where

$$X = \begin{pmatrix} 1 & x_{21} \\ 1 & x_{22} \\ \vdots & \vdots \\ 1 & x_{2n}\end{pmatrix}, \quad \boldsymbol{\beta} = \begin{pmatrix}\beta_1 \\ \beta_2\end{pmatrix}, \quad Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n\end{pmatrix}.$$

Now, $\hat{\boldsymbol{\beta}} = (X'X)^{-1}X'Y$. Here

$$X'X = \begin{pmatrix} n & \sum x_{2i} \\ \sum x_{2i} & \sum x_{2i}^2\end{pmatrix}, \qquad X'Y = \begin{pmatrix}\sum y_i \\ \sum x_{2i}y_i\end{pmatrix},$$

$$\Rightarrow \hat{\boldsymbol{\beta}} = \frac{1}{n\sum x_{2i}^2 - \left(\sum x_{2i}\right)^2}\begin{pmatrix}\sum x_{2i}^2 & -\sum x_{2i} \\ -\sum x_{2i} & n\end{pmatrix}\begin{pmatrix}\sum y_i \\ \sum x_{2i}y_i\end{pmatrix} = \frac{1}{n\sum x_{2i}^2 - \left(\sum x_{2i}\right)^2}\begin{pmatrix}\sum x_{2i}^2\sum y_i - \sum x_{2i}\sum x_{2i}y_i \\ -\sum x_{2i}\sum y_i + n\sum x_{2i}y_i\end{pmatrix}.$$

Hence,

$$\hat\beta_2 = \frac{n\sum x_{2i}y_i - \sum x_{2i}\sum y_i}{n\sum x_{2i}^2 - \left(\sum x_{2i}\right)^2} = \frac{\sum x_{2i}y_i - n\bar{x}_2\bar{y}}{\sum x_{2i}^2 - n\bar{x}_2^2} = \frac{\sum(x_{2i}-\bar{x}_2)(y_i-\bar{y})}{\sum(x_{2i}-\bar{x}_2)^2},$$

$$\hat\beta_1 = \frac{\sum x_{2i}^2\sum y_i - \sum x_{2i}\sum x_{2i}y_i}{n\sum x_{2i}^2 - \left(\sum x_{2i}\right)^2} = \frac{\bar{y}\sum x_{2i}^2 - \bar{x}_2\sum x_{2i}y_i}{\sum x_{2i}^2 - n\bar{x}_2^2} = \bar{y} + \frac{\bar{y}n\bar{x}_2^2 - \bar{x}_2\sum x_{2i}y_i}{\sum x_{2i}^2 - n\bar{x}_2^2} = \bar{y} - \bar{x}_2\hat\beta_2.$$

Let $y_i$ be independent $N(\beta_1 + \beta_2 x_i, \sigma^2)$ variates, $i = 1, 2, \ldots, n$, so that $E(y_i) = \beta_1 + \beta_2 x_i$. The estimators of $\beta_1$ and $\beta_2$ are obtained by the method of least squares on minimizing

$$\phi = \sum_{i=1}^{n}(y_i - \beta_1 - \beta_2 x_i)^2.$$

The likelihood function is

$$L = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n e^{-\frac{1}{2\sigma^2}\sum(y_i - \beta_1 - \beta_2 x_i)^2}.$$

L is maximum when $\sum_{i=1}^{n}(y_i - \beta_1 - \beta_2 x_i)^2$ is minimum. By the method of maximum likelihood, we choose $\beta_1$ and $\beta_2$ such that $\sum_{i=1}^{n}(y_i - \beta_1 - \beta_2 x_i)^2 = \phi$ is minimum. Hence, the methods of least squares and maximum likelihood are identical here.
Example 2.14 Let $X_1, X_2, \ldots, X_n$ be a random sample from the p.d.f.
$$f(x;\theta,r) = \frac{1}{\theta^r\Gamma(r)}e^{-x/\theta}x^{r-1}, \quad x > 0,\ \theta > 0,\ r > 0.$$
Find estimators of θ and r by
(i) Method of moments
(ii) Method of maximum likelihood

Answer

(i) Here, $E(x) = \mu_1' = r\theta$, $E(x^2) = \mu_2' = r(r+1)\theta^2$,
$$m_1' = \bar{x}, \qquad m_2' = \frac1n\sum_{i=1}^{n}x_i^2.$$
Hence $r\theta = \bar{x}$ and $r(r+1)\theta^2 = \frac1n\sum_{i=1}^{n}x_i^2$.

By solving, we get $\hat{r} = \dfrac{n\bar{x}^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}$ and $\hat{\theta} = \dfrac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n\bar{x}}$.

(ii)
$$L = \frac{1}{\theta^{nr}\left(\Gamma(r)\right)^n}e^{-\frac1\theta\sum_{i=1}^{n}x_i}\prod_{i=1}^{n}x_i^{r-1},$$
$$\log L = -nr\log\theta - n\log\Gamma(r) - \frac1\theta\sum_i^n x_i + (r-1)\sum_{i=1}^{n}\log x_i.$$
Now, $\frac{\partial\log L}{\partial\theta} = -\frac{nr}{\theta} + \frac{n\bar{x}}{\theta^2} = 0 \Rightarrow \hat\theta = \frac{\bar{x}}{r}$, and, substituting $\theta = \bar{x}/r$,
$$\frac{\partial\log L}{\partial r} = -n\log\theta - n\frac{\partial\log\Gamma(r)}{\partial r} + \sum_{i=1}^{n}\log x_i = n\log r - n\frac{\Gamma'(r)}{\Gamma(r)} - n\log\bar{x} + \sum_i^n\log x_i.$$
It is, however, difficult to solve the equation $\frac{\partial\log L}{\partial r} = 0$ and to get the estimate of r. Thus, for this example, estimators of θ and r are more easily obtained by the method of moments than by the method of maximum likelihood.
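A minimal Python sketch of the moment estimators just derived; the simulated gamma sample and the true parameters are illustrative.

    import numpy as np

    rng = np.random.default_rng(5)
    r_true, theta_true, n = 3.0, 2.0, 1000          # illustrative values
    x = rng.gamma(shape=r_true, scale=theta_true, size=n)

    s2 = np.mean((x - x.mean()) ** 2)               # (1/n) * sum (x_i - xbar)^2
    r_hat = x.mean() ** 2 / s2                      # r-hat = n*xbar^2 / sum(x_i - xbar)^2
    theta_hat = s2 / x.mean()                       # theta-hat = sum(x_i - xbar)^2 / (n*xbar)
    print(r_hat, theta_hat)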
Example 2.15 If a sample of size one is drawn from the p.d.f. $f(x,\beta) = \frac{2}{\beta^2}(\beta - x)$, $0 < x < \beta$:

Find $\hat\beta$, the MLE of β, and $\beta^*$, the estimator of β based on the method of moments. Show that $\hat\beta$ is biased but $\beta^*$ is unbiased. Show that the efficiency of $\hat\beta$ w.r.t. $\beta^*$ is 2/3.

Solution

$$L = \frac{2}{\beta^2}(\beta - x), \qquad \log L = \log 2 - 2\log\beta + \log(\beta - x),$$
$$\frac{\partial\log L}{\partial\beta} = -\frac{2}{\beta} + \frac{1}{\beta - x} = 0 \Rightarrow \beta = 2x.$$

Thus, the MLE of β is given by $\hat\beta = 2x$.

Now, $E(x) = \frac{2}{\beta^2}\int_0^\beta(\beta x - x^2)dx = \frac{\beta}{3}$. Hence $\frac{\beta}{3} = x \Rightarrow \beta = 3x$. Thus, the estimator of β based on the method of moments is $\beta^* = 3x$.

Now,

$$E\left(\hat\beta\right) = 2\cdot\frac{\beta}{3} = \frac{2\beta}{3} \ne \beta, \qquad E(\beta^*) = 3\cdot\frac{\beta}{3} = \beta.$$

Hence $\hat\beta$ is biased but $\beta^*$ is unbiased.

Again,

$$E(x^2) = \frac{2}{\beta^2}\int_0^\beta\left(\beta x^2 - x^3\right)dx = \frac{\beta^2}{6} \Rightarrow V(x) = \frac{\beta^2}{6} - \frac{\beta^2}{9} = \frac{\beta^2}{18},$$
$$V(\beta^*) = 9V(x) = \frac{\beta^2}{2}, \qquad V\left(\hat\beta\right) = 4V(x) = \frac{2\beta^2}{9},$$
$$M\left(\hat\beta\right) = V\left(\hat\beta\right) + \left[E\left(\hat\beta\right) - \beta\right]^2 = \frac29\beta^2 + \left(\frac23\beta - \beta\right)^2 = \frac13\beta^2.$$

Thus, the efficiency of $\hat\beta$ with respect to $\beta^*$ is $\dfrac{M(\hat\beta)}{V(\beta^*)} = \dfrac{\beta^2/3}{\beta^2/2} = \dfrac{2}{3}$.
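The mean square errors above are easy to confirm by simulation. The sketch below (Python; the true β and the replication count are illustrative) samples from f(x, β) by inverting its c.d.f., $F(x) = 1 - (1 - x/\beta)^2$, and compares the empirical MSEs of 2x and 3x with the theoretical values β²/3 and β²/2.

    import numpy as np

    rng = np.random.default_rng(6)
    beta, reps = 4.0, 200_000                       # illustrative values
    u = rng.uniform(size=reps)
    x = beta * (1 - np.sqrt(1 - u))                 # inverse-CDF sampling from f(x) = 2(beta - x)/beta^2

    mle, mom = 2 * x, 3 * x                         # beta-hat = 2x (MLE), beta* = 3x (moments)
    print("MSE of MLE     :", np.mean((mle - beta) ** 2), " theory:", beta**2 / 3)
    print("MSE of moments :", np.mean((mom - beta) ** 2), " theory:", beta**2 / 2)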
Chapter 3
Theory of Testing of Hypothesis

3.1 Introduction

Consider a random sample from an infinite or a finite population. From such a


sample or samples we try to draw inferences regarding the population. Suppose the form of the distribution of the population is $F_\theta$, which is assumed to be known, but the parameter θ is unknown. Inferences are drawn about unknown parameters of the distribution. In many practical problems, we are interested in testing the validity of an assertion about the unknown parameter θ. Some hypothesis is made regarding the parameters, and it is tested whether it is acceptable in the light of sample observations. For example, suppose we are interested in introducing a high-yielding rice variety. We have at our disposal a standard variety having average yield x quintals per acre. We want to know whether the average yield for the new variety is higher than x. Similarly, we may be interested to check the claim of a tube light manufacturer about the average life hours achieved by a particular brand.
A problem of this type is usually referred to as a problem of testing of hypothesis.
Testing of hypothesis is closely linked with estimation theory in which we seek the
best estimator of unknown parameter. In this chapter, we shall discuss the problem
of testing of hypothesis.

3.2 Definitions and Some Examples

In this section, some aspects of statistical hypotheses and tests of statistical


hypothesis will be discussed.
Let $\rho = \{p(x)\}$ be a class of all p.m.f.s or p.d.f.s. In a testing problem p(x) is unknown, but ρ is known. Our objective is to provide more information about p(x) on the basis of X = x, that is, to know whether $p(x) \in \rho^* \subset \rho$.


Definition 1 A hypothesis is a conjecture or assertion about p(x). It is of two types, viz., the null hypothesis (H) and the alternative hypothesis (K).

Null hypothesis (H): A hypothesis that is tentatively set up is called the null hypothesis. The alternative to H is called the alternative hypothesis.

H and K are such that $\rho_H \cap \rho_K = \varnothing$ and $\rho_H \cup \rho_K \subseteq \rho$. We write H as $H: p(x) \in \rho_H \subset \rho$ and K as $K: p(x) \in \rho_K \subset \rho$.

Labelling of the distribution

Write $\rho = \left\{p(x) = p(x/\theta),\ \theta \in \Theta\right\}$. Then θ is called the labelling parameter of the distribution and Θ is called the parameter space.

Example 3.1 $X \sim \mathrm{bin}(m, p) \Leftrightarrow X_1, X_2, \ldots, X_m$ are i.i.d. Bernoulli(p) $\Rightarrow X = \sum_{i=1}^{m}X_i \sim \mathrm{bin}(m, p)$; m is known, θ = p, Θ = [0, 1]; outcome space

$$\mathcal{X} = \{0, 1, 2, \ldots, m\} \quad \text{or} \quad \{0,1\}\times\{0,1\}\times\cdots\times\{0,1\}.$$

$$p(x/\theta) = \binom{m}{x}p^x(1-p)^{m-x} \quad \text{or} \quad p(\underline{x}/\theta) = p^{\sum_1^m x_i}(1-p)^{m-\sum_1^m x_i}.$$

$\rho = \left\{\binom{m}{x}p^x(1-p)^{m-x};\ p \in [0,1]\right\}$ is known, but $\binom{m}{x}p^x(1-p)^{m-x}$ is unknown if p is unknown.
Example 3.2 Let $X_1, X_2, \ldots, X_{n_1}$ be i.i.d. $P(\lambda_1)$ and $Y_1, Y_2, \ldots, Y_{n_2}$ be i.i.d. $P(\lambda_2)$. Also, they are independent, and $n_1$ and $n_2$ are known. Now,

$$X = (X_1, \ldots, X_{n_1}, Y_1, \ldots, Y_{n_2}),\ n = n_1 + n_2; \qquad \mathcal{X} = \{0, 1, \ldots, \infty\}^{n_1}\times\{0, 1, \ldots, \infty\}^{n_2};$$
$$\theta = (\lambda_1, \lambda_2), \quad \Theta = (0,\infty)\times(0,\infty) = \{(\lambda_1,\lambda_2): 0 < \lambda_1, \lambda_2 < \infty\};$$
$$p(x/\theta) = \prod_{i=1}^{n_1}p(x_i/\lambda_1)\prod_{j=1}^{n_2}p(y_j/\lambda_2) = \frac{\lambda_1^{\sum x_i}\lambda_2^{\sum y_j}}{\prod x_i!\,\prod y_j!}e^{-(n_1\lambda_1 + n_2\lambda_2)}.$$

$\rho = \left\{p(x/\theta);\ 0 < \lambda_1, \lambda_2 < \infty\right\}$ is known, but $p(x/\theta)$ is unknown if θ is unknown.
Example 3.3 $X_1, X_2, \ldots, X_n$ are i.i.d. $N(\mu, \sigma^2)$. $X = (X_1, X_2, \ldots, X_n)$, $n \ge 1$; $\theta = (\mu, \sigma^2)$, or $\{\mu\}$ (if $\sigma^2$ is known), or $\{\sigma^2\}$ (if μ is known); $\Theta = \{(\mu,\sigma^2): -\infty < \mu < \infty,\ \sigma > 0\}$, or $\{\mu: -\infty < \mu < \infty\} \equiv \mathbb{R}^1$, or $\{\sigma^2: \sigma^2 > 0\}$.

$\mathcal{X} = \mathbb{R}^n$: n-dimensional Euclidean space.

$$p(x/\theta) = (2\pi)^{-n/2}\sigma^{-n}e^{-\frac{1}{2\sigma^2}\sum_1^n(x_i-\mu)^2}, \quad \text{or}$$
$$= (2\pi)^{-n/2}e^{-\frac12\sum_1^n(x_i-\mu)^2} \text{ if } \sigma^2 = 1, \quad \text{or}$$
$$= (2\pi)^{-n/2}\sigma^{-n}e^{-\frac{1}{2\sigma^2}\sum_1^n x_i^2} \text{ if } \mu = 0.$$

$\rho = \left\{p(x/\theta);\ -\infty < \mu < \infty,\ \sigma^2 > 0\right\}$, or $\left\{p(x/\theta);\ -\infty < \mu < \infty\right\}$, or $\left\{p(x/\theta);\ \sigma^2 > 0\right\}$: all are known, but $p(x/\theta)$ for fixed (unknown) θ is unknown.

Parametric set-up

$p(x) = p(x/\theta)$, $\theta \in \Theta$. Then we can find $\Theta_H (\subset \Theta)$ and $\Theta_K (\subset \Theta)$ such that $\Theta_H \cap \Theta_K = \varnothing$, and $\rho_H = \{p(x/\theta),\ \theta \in \Theta_H\}$, $\rho_K = \{p(x/\theta),\ \theta \in \Theta_K\}$. So,

$$H: p \in \rho_H \Leftrightarrow H: \theta \in \Theta_H, \qquad K: p \in \rho_K \Leftrightarrow K: \theta \in \Theta_K.$$

Definition 2 A hypothesis H* is called

i. Simple if H* contains just one parametric point, i.e. H* specifies the distribution $\{p(x/\theta)\}$ completely;
ii. Composite if H* contains more than one parametric point, i.e. H* cannot specify the distribution $\{p(x/\theta)\}$ completely.

Example 3.4 $X_1, X_2, \ldots, X_n$ are i.i.d. $N(\mu, \sigma^2)$. Consider the following hypotheses (H*):

1. $H: \mu = 0,\ \sigma^2 = 1$ (under H*, each X ∼ N(0, 1))
2. $H: \mu \le 0,\ \sigma^2 = 1$
3. $H: \mu = 0,\ \sigma^2 > 0$
4. $H: \sigma^2 = \sigma_0^2$
5. $H: \mu + \sigma = 0$

The first one is a simple hypothesis and the remaining are composite hypotheses.

Definition 3 Let x be the observed value of the random variable X with probability model $p(x/\theta)$, θ unknown. Whenever X = x is observed, $p(x/\theta)$ is a function of θ only and is called the likelihood of getting such a sample. It is simply called the likelihood function and is often denoted by L(θ) or L(θ/x).

Definition 4 Test: It is a rule for the acceptance or the rejection of the null hypothesis (H) on the basis of an observed value of X.

Definition 5 Non-randomized test: Let ω be a subset of $\mathcal{X}$ such that

$$X \in \omega \Rightarrow \text{the rejection of } H, \qquad X \in \mathcal{X} - \omega \Rightarrow \text{the acceptance of } H.$$

Then ω is called the critical region or a test for H against K. Test 'ω' means a rule determined by ω. Note that ω does not depend on the random experiment (that is, on X). So it is called a non-randomized test.

Definition 6 Randomized test: It consists in determining a function φ(x) such that

(i) $0 \le \phi(x) \le 1\ \forall x \in \mathcal{X}$;
(ii) H is rejected with probability φ(x) whenever X = x is observed.

Such a φ(x) is called the 'critical function' or 'test function' or simply 'test' for H against K. Here the function φ(x) depends on the random experiment (that is, on X), so the name 'randomized' is justified.

e.g. (i) and (ii) ⇒ whenever X = x is observed, perform a Bernoullian trial with probability of success φ(x). If the trial results in success, reject H; otherwise H is accepted. Thus φ(x) represents the probability of rejection of H.

If φ(x) is non-randomized with critical region ω, then we have ω = {x: φ(x) = 1} and $\mathcal{X} - \omega$ = {x: φ(x) = 0} (acceptance region).

Detailed study on non-randomized tests

If ω is a non-randomized test, then H is rejected iff X ∈ ω. In many cases, we get a statistic T = T(X) such that, for some C or $C_1$ and $C_2$,

$$[X \in \omega] \Leftrightarrow [X: T > C] \text{ or } [X: T < C] \text{ or } [X: T < C_1 \text{ or } T > C_2],\ C_1 < C_2.$$

Such a T is called a 'test statistic'.

The event [T > C] is called a right-tail test based on T.
The event [T < C] is called a left-tail test based on T.
The event [T < C₁ or T > C₂] is called a two-tailed test based on T.
Sometimes C₁ and C₂ are such that $P\{T < C_1/\theta\} = P\{T > C_2/\theta\}\ \forall\theta \in \Theta_H$; then the test [T < C₁ or T > C₂] is called an equal-tail test based on T.

Definition 7 Power Function

Let X be a random variable with $p(x/\theta)$ as the p.d.f. or p.m.f. of X, $\theta \in \Theta$, $x \in \mathcal{X}$.

Testing problem: $H: \theta \in \Theta_H$ versus $K: \theta \in \Theta_K$ $(\Theta_H \cap \Theta_K = \varnothing)$.

Let ω be a test for H against K. Then the function given by

$$P_\omega(\theta) = \text{Probability}\{\text{rejecting } H \text{ under } \theta\} = P\{X \in \omega/\theta\},\ \theta \in \Theta$$

is called the power function (a function of θ) of the test ω.

For a given $\theta \in \Theta_K$, $P_\omega(\theta)$ is called the power of ω at θ. For the continuous case, we have $P_\omega(\theta) = \int_\omega p(x/\theta)dx$, and for the discrete case we have $P_\omega(\theta) = \sum_\omega p(x/\theta)$.

A test ω is called size-α if

$$P_\omega(\theta) \le \alpha\ \forall\theta \in \Theta_H \quad [\alpha: 0 < \alpha < 1] \qquad (3.1)$$

and it is called strictly size-α if

$$P_\omega(\theta) \le \alpha\ \forall\theta \in \Theta_H \text{ and } P_\omega(\theta) = \alpha \text{ for some } \theta \in \Theta_H. \qquad (3.2)$$

(3.1) $\Leftrightarrow \sup_{\theta\in\Theta_H}P_\omega(\theta) \le \alpha$ and (3.2) $\Leftrightarrow \sup_{\theta\in\Theta_H}P_\omega(\theta) = \alpha$.

The quantity $\sup\{P_\omega(\theta);\ \theta \in \Theta_H\}$ is called the size of the test. Sometimes α is called the level of the test ω.

Some specific cases

(i) θ real-valued; testing problem $H: \theta = \theta_0$ (simple) or $H: \theta \le \theta_0$ (composite) against $K: \theta > \theta_0$; ω: a test; $P_\omega(\theta)$: power function; size of the test: $P_\omega(\theta_0)$ or $\sup_{\theta\le\theta_0}P_\omega(\theta)$.

(ii) $\theta = (\theta_1, \theta_2)$: 2-component vector. Testing problem: $H: \theta_1 = \theta_1^0$ (given) against $K: \theta_1 > \theta_1^0$ (composite); ω: a test; $P_w(\theta)$: power function $= P_w(\theta_1, \theta_2) = P_w(\theta_1^0, \theta_2)$ (at H) = a function of $\theta_2$.

Thus, the power function (under H) is still unknown. The quantity $\sup\left\{P_w(\theta_1^0, \theta_2);\ \theta_2 \in \text{space of } \theta_2\right\}$ is known and is called the size of the test. For example, take the $N(\mu, \sigma^2)$ distribution and consider the problem of testing H: μ = 0 against K: μ > 0; then the size of the test is

$$\sup\left\{P_w(\mu, \sigma^2):\ \mu = 0,\ 0 < \sigma^2 < \infty\right\}.$$
 
Example 3.5 $X_1, X_2, \ldots, X_n$ are i.i.d. $N(\mu, \sigma_0^2)$; $H: \mu \le \mu_0$ against $K: \mu > \mu_0$.

$$\omega = \left\{(X_1, X_2, \ldots, X_n): \bar{X} > \mu_0 + \frac{\sigma_0}{\sqrt{n}}\right\}$$

$$P_w(\theta) = P_w(\mu) = P\left\{\bar{X} > \mu_0 + \frac{\sigma_0}{\sqrt{n}}\Big/\mu\right\} = P\left\{\frac{\sqrt{n}(\bar{X}-\mu)}{\sigma_0} > \frac{\sqrt{n}(\mu_0-\mu)}{\sigma_0} + 1\Big/\mu\right\}$$
$$= P\left\{Z > 1 - \frac{\sqrt{n}(\mu-\mu_0)}{\sigma_0}\ \Big|\ Z\sim N(0,1)\right\} = \Phi\left(\frac{\sqrt{n}(\mu-\mu_0)}{\sigma_0} - 1\right)$$

$$\text{Size of }\omega = \sup_{\mu\le\mu_0}P_w(\mu) = \sup_{\mu\le\mu_0}\Phi\left(\frac{\sqrt{n}(\mu-\mu_0)}{\sigma_0} - 1\right) = \Phi(-1) = \text{size of }\omega\text{ for testing } H: \mu = \mu_0.$$

Example 3.6 $X_1, X_2, \ldots, X_n$ are i.i.d. $N(\mu, \sigma_0^2)$; $H: \mu = \mu_0$ against $K: \mu > \mu_0$.

$$\omega = \left\{(X_1, X_2, \ldots, X_n): \bar{X} > \mu_0 + \frac{\sigma_0}{\sqrt{n}}s_\alpha\right\}, \quad \alpha \in (0,1),\ P(Z > s_\alpha) = \alpha.$$

$$P_w(\mu_0) = \text{size of }\omega\text{ for testing } H = P\left\{\bar{X} > \mu_0 + \frac{\sigma_0}{\sqrt{n}}s_\alpha\Big/\mu_0\right\} = P\left\{\frac{\sqrt{n}(\bar{X}-\mu_0)}{\sigma_0} > s_\alpha\Big/\mu_0\right\}$$
$$= P\{Z > s_\alpha\ |\ Z\sim N(0,1)\} = \alpha.$$

⟹ The test is exactly of size α.

Example 3.7 $X_1, X_2, \ldots, X_n$ are i.i.d. $N(\mu, \sigma_0^2)$; $H: \mu = \mu_0$ against $K: \mu > \mu_0$.

$$\omega: \{(X_1, X_2, \ldots, X_n): \bar{X} > c\} \qquad (3.3)$$

where c is such that the test is of size 0.05.

$$P_w(\mu_0) = \text{size of }\omega\text{ for testing } H = P\left\{\bar{X} > c/\mu_0\right\} = P\left\{\frac{\sqrt{n}(\bar{X}-\mu_0)}{\sigma_0} > \frac{(c-\mu_0)\sqrt{n}}{\sigma_0}\Big/\mu_0\right\}$$
$$= P\left\{Z > \frac{(c-\mu_0)\sqrt{n}}{\sigma_0}\ \Big|\ Z\sim N(0,1)\right\} = 0.05 \text{ (given)}$$

$$\Rightarrow \frac{(c-\mu_0)\sqrt{n}}{\sigma_0} = s_{0.05} \simeq 1.645 \Rightarrow c = \mu_0 + \frac{1.645\,\sigma_0}{\sqrt{n}} \qquad (3.4)$$

The test given by (3.3) and (3.4) is strictly (exactly) of size 0.05 (or, the level of significance of the test is 0.05).
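A minimal Python sketch of this calculation, also evaluating the power at a few alternatives; the values of μ₀, σ₀ and n are illustrative.

    import math

    mu0, sigma0, n, alpha = 10.0, 2.0, 25, 0.05
    z_alpha = 1.645                                  # upper 5% point of N(0, 1)
    c = mu0 + z_alpha * sigma0 / math.sqrt(n)        # critical value from (3.4)

    def power(mu: float) -> float:
        # P(Z > z_alpha - sqrt(n)(mu - mu0)/sigma0) for an alternative mu
        z = z_alpha - math.sqrt(n) * (mu - mu0) / sigma0
        return 0.5 * math.erfc(z / math.sqrt(2))     # 1 - Phi(z)

    print(c, power(mu0), power(10.5), power(11.0))   # power(mu0) reproduces 0.05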
Example 3.8 $X_1$ and $X_2$ are i.i.d. Bernoulli(1, p):

$$f(x/p) = p^x(1-p)^{1-x},\ x = 0, 1.$$

Testing problem: $H: p = \frac12$ against $K: p > \frac12$.

Consider the test $\omega = \{(X_1, X_2): X_1 + X_2 = 2\}$; accept H if $(X_1, X_2) \notin \omega$.

Test statistic: $T = X_1 + X_2 \sim \mathrm{bin}(2, p)$.

Size of the test is $P\left\{(X_1, X_2) \in \omega/p = \tfrac12\right\} = P\left\{T = 2/p = \tfrac12\right\} = \left(\tfrac12\right)^2 = 0.25$.

If we take $\omega = \{(X_1, X_2): X_1 + X_2 = 1 \text{ or } 2\}$, i.e. accept H if $(X_1, X_2) \notin \omega$, we get size $= 2\cdot\frac12\cdot\frac12 + \left(\frac12\right)^2 = 0.75$.

Let us take another test of the form:

ω: Reject H if $X_1 + X_2 = 2$;
$\omega_B$: Reject H if $X_1 + X_2 = 1$, with probability $\frac12$;
A: Accept H if $X_1 + X_2 = 0$.

Sample space = {0, 1, 2} = ω + ω_B + A.

$$\text{Size} = 1\cdot P\{X_1 + X_2 = 2\} + \frac12\cdot P\{X_1 + X_2 = 1\} + 0\cdot P\{X_1 + X_2 = 0\} = 0.25 + 0.25 = 0.50.$$

The test given above is called a randomized test.

Definition 8 Power function of a randomized test:

Consider φ(x) as a randomized test, which is equivalent to the probability of rejection of H when the observed value is X = x, and E as the event of rejection of H. Then P(E|X = x) = φ(x). The power function of φ is

$$P_\phi(\theta) = \text{Probability}\{\text{rejection of } H \text{ under } \theta \text{ using the function } \phi\} = P\{E/\theta\}$$
$$= \int P\{E/x,\theta\}\,p(x/\theta)\,dx, \text{ when } X \text{ is continuous} \qquad (3.5)$$
$$= \sum P\{E/x,\theta\}\,p(x/\theta), \text{ when } X \text{ is discrete} \qquad (3.6)$$

In case of (3.5), we get

$$P_\phi(\theta) = \int \phi(x)\,p(x/\theta)\,dx = E_\theta\,\phi(x) \quad \left[\text{as } P\{E/x,\theta\} = P\{E/x\} = \phi(x)\right].$$

In case of (3.6), we get $P_\phi(\theta) = \sum\phi(x)\,p(x/\theta) = E_\theta\,\phi(x)$.

In either case we have $P_\phi(\theta) = E_\theta\,\phi(x)\ \forall\theta \in \Theta$.

Special cases

1. Suppose φ(x) takes only two values, viz. 0 and 1. In that case, we say φ(x) is non-randomized with critical region ω = {x: φ(x) = 1}. In that case

$$P_\phi(\theta) = 1\cdot P_\theta\{\phi(x) = 1\} + 0\cdot P_\theta\{\phi(x) = 0\} = P_\theta\{X \in \omega\} = P_\omega(\theta)$$

⟹ φ is a generalization of ω.

2. Suppose φ(x) takes three values, viz. 0, a and 1 according as $x \in A$, $x \in \omega_B$ and $x \in \omega$. In that case φ(x) is called a randomized test having the boundary region $\omega_B$. The power function of this test is $P_\phi(\theta) = P_\theta\{X \in \omega\} + a\,P_\theta\{X \in \omega_B\}$.

(1) ⟹ no need of post-randomization: non-randomized test.
(2) ⟹ requires post-randomization: randomized test.

Example 3.9 $X_1, X_2, \ldots, X_n$ are i.i.d. Bernoulli(1, p), n = 25. Testing problem: $H: p = \frac12$ against $K: p \ne \frac12$.

Consider the following tests:

(1) Non-randomized:
$$\phi(x) = \begin{cases} 1 & \text{if } \sum_1^{25}x_i > 12 \\ 0 & \text{if } \sum_1^{25}x_i \le 12\end{cases}$$

(2)
$$\phi(x) = \begin{cases} 1 & \text{if } \sum_1^{25}x_i > c \\ a & \text{if } \sum_1^{25}x_i = c \\ 0 & \text{if } \sum_1^{25}x_i < c\end{cases}$$

Find c and a such that $E_{p=\frac12}\phi(x) = 0.05$. The test is randomized if $a \in (0,1)$ and non-randomized if a = 0 or 1.

In case of (1), size $= E_{p=\frac12}\phi(x) = P\left\{\sum_1^{25}x_i > 12\ \big|\ p = \tfrac12\right\} = 0.50001$.

In case of (2), we want to get (c, a) such that $E_{p=\frac12}\phi(x) = 0.05$

$$\Leftrightarrow P_{p=\frac12}\left\{\sum_1^{25}x_i > c\right\} + a\,P_{p=\frac12}\left\{\sum_1^{25}x_i = c\right\} = 0.05.$$

By inspection we find that $P_{p=\frac12}\left\{\sum_1^{25}x_i > 17\right\} = 0.02237$ and $P_{p=\frac12}\left\{\sum_1^{25}x_i > 16\right\} = 0.0546$. Hence c = 17.

Now, $a = \dfrac{0.05 - P_{p=\frac12}\left\{\sum_1^{25}x_i > 17\right\}}{P_{p=\frac12}\left\{\sum_1^{25}x_i = 17\right\}} = \dfrac{0.05 - 0.02237}{0.03223} = 0.8573$.

Thus the test given by

$$\phi(x) = \begin{cases} 1 & \text{if } \sum_1^{25}x_i > 17 \\ 0.8573 & \text{if } \sum_1^{25}x_i = 17 \\ 0 & \text{if } \sum_1^{25}x_i < 17\end{cases}$$

is randomized and of size 0.05.

But the test given by

$$\phi(x) = \begin{cases} 1 & \text{if } \sum_1^{25}x_i \ge 17 \\ 0 & \text{if } \sum_1^{25}x_i < 17\end{cases}$$

is non-randomized and of size 0.0546 (at the level 0.06).
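The pair (c, a) above can be computed directly from the binomial p.m.f. The following is a minimal Python sketch of that calculation (n, p and α as in the example):

    from math import comb

    # Find (c, a) with P(S > c) + a*P(S = c) = alpha for S ~ bin(25, 1/2)
    n, p, alpha = 25, 0.5, 0.05
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

    c = next(k for k in range(n + 1) if sum(pmf[k + 1:]) <= alpha)   # smallest c with tail prob <= alpha
    tail = sum(pmf[c + 1:])
    a = (alpha - tail) / pmf[c]
    print(c, round(tail, 5), round(a, 4))    # expected: 17, 0.02237, about 0.857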

Performance of ω

Our object is to choose ω such that $P_\omega(\theta)\ \forall\theta \in \Theta_H$ and $(1 - P_\omega(\theta))\ \forall\theta \in \Theta_K$ are as small as possible. While performing a test ω we reach any of the following decisions:

I. Observe X = x, accept H when θ actually belongs to $\Theta_H$: a correct decision.
II. Observe X = x, reject H when θ actually belongs to $\Theta_H$: an incorrect decision.
III. Observe X = x, accept H when θ actually belongs to $\Theta_K$: an incorrect decision.
IV. Observe X = x, reject H when θ actually belongs to $\Theta_K$: a correct decision.

An incorrect decision of the type stated in II above is called a type-I error and an incorrect decision of the type stated in III above is called a type-II error. Hence, the performance of ω is measured by the following:

(a) Size of type-I error = Probability{type-I error} $= \sup_{\theta\in\Theta_H}P\{X \in \omega/\theta\} = \sup_{\theta\in\Theta_H}P_\omega(\theta)$;

(b) Size of type-II error = Probability{type-II error} $= P\{X \in \mathcal{X} - \omega/\theta\} = 1 - P_\omega(\theta)\ \forall\theta \in \Theta_K$.

So we want to minimize both the errors simultaneously. In practice, it is not possible to minimize both of them simultaneously, because the minimization of one leads to the increase of the other.

Thus the conventional procedure: choose ω such that, for a given $\alpha \in (0,1)$, $P_\omega(\theta) \le \alpha\ \forall\theta \in \Theta_H$ and $1 - P_\omega(\theta)\ \forall\theta \in \Theta_K$ is as low as possible, i.e. $P_\omega(\theta)\ \forall\theta \in \Theta_K$ is as high as possible. A test ω satisfying the above (if it exists) is called an optimum test at level α.

Suppose $\Theta_H = \{\theta_0\}$, a single-point set, and $\Theta_K = \{\theta_1\}$, a single-point set. The above condition thus reduces to: $P_\omega(\theta_1)$ = maximum subject to $P_\omega(\theta_0) \le \alpha$.
Definition 9

1. For testing $H: \theta \in \Theta_H$ against $K: \theta = \theta_1 \notin \Theta_H$, a test $\omega_0$ is said to be most powerful (MP) at level $\alpha \in (0,1)$ if

$$P_{\omega_0}(\theta) \le \alpha\ \forall\theta \in \Theta_H \qquad (3.7)$$

and

$$P_{\omega_0}(\theta_1) \ge P_\omega(\theta_1)\ \forall\omega \text{ satisfying } (3.7). \qquad (3.8)$$

In particular, if $H: \theta = \theta_0$, (3.7) and (3.8) reduce to $P_{\omega_0}(\theta_0) \le \alpha$ and $P_{\omega_0}(\theta_1) \ge P_\omega(\theta_1)\ \forall\omega$ satisfying the first condition.

2. A test $\omega_0$ is said to be MP size-α if $\sup_{\theta\in\Theta_H}P_{\omega_0}(\theta) = \alpha$ and $P_{\omega_0}(\theta_1) \ge P_\omega(\theta_1)\ \forall\omega$ satisfying $P_\omega(\theta) \le \alpha\ \forall\theta \in \Theta_H$. Again, if $\Theta_H = \{\theta_0\}$, we get the above condition as $P_{\omega_0}(\theta_0) = \alpha$ and $P_{\omega_0}(\theta_1) \ge P_\omega(\theta_1)\ \forall\omega$ satisfying $P_\omega(\theta_0) \le \alpha$.

3. For testing $H: \theta \in \Theta_H$ against $K: \theta \in \Theta_K$, $\Theta_K \cap \Theta_H = \varnothing$, a test $\omega_0$ is said to be uniformly most powerful (UMP) at level α if

$$P_{\omega_0}(\theta) \le \alpha\ \forall\theta \in \Theta_H \qquad (3.9)$$

and $P_{\omega_0}(\theta_1) \ge P_\omega(\theta_1)\ \forall\theta_1 \in \Theta_K$ and ∀ω satisfying (3.9). Likewise, $\omega_0$ is said to be UMP size-α if $\sup_{\theta\in\Theta_H}P_{\omega_0}(\theta) = \alpha$ and $P_{\omega_0}(\theta_1) \ge P_\omega(\theta_1)\ \forall\theta_1 \in \Theta_K$ and ∀ω satisfying $\sup_{\theta\in\Theta_H}P_\omega(\theta) \le \alpha$. Again, if $\Theta_H = \{\theta_0\}$, the aforesaid conditions reduce to

(a) $P_{\omega_0}(\theta_0) \le \alpha$ and $P_{\omega_0}(\theta_1) \ge P_\omega(\theta_1)\ \forall\theta_1 \in \Theta_K$ and ∀ω satisfying $P_\omega(\theta_0) \le \alpha$;
(b) $P_{\omega_0}(\theta_0) = \alpha$ and $P_{\omega_0}(\theta_1) \ge P_\omega(\theta_1)\ \forall\theta_1 \in \Theta_K$ and ∀ω satisfying $P_\omega(\theta_0) \le \alpha$.

4. A test ω is said to be unbiased if (under the testing problem $H: \theta = \theta_0$ against $K: \theta = \theta_1$, $\theta_1 \ne \theta_0$) $P_\omega(\theta_1) \ge P_\omega(\theta_0)$ (⟹ power ≥ size); it is said to be unbiased size-α if $P_\omega(\theta_0) = \alpha$ and $P_\omega(\theta_1) \ge \alpha$. If $K: \theta \in \Theta_K$ is composite, the above relations reduce to (A) $P_\omega(\theta_1) \ge P_\omega(\theta_0)\ \forall\theta_1 \in \Theta_K$; (B) $P_\omega(\theta_1) \ge \alpha\ \forall\theta_1 \in \Theta_K$, where $\alpha = P_\omega(\theta_0)$.

5. For testing $H: \theta = \theta_0$ against $K: \theta \in \Theta_K \not\ni \theta_0$, a test $\omega_0$ is said to be uniformly most powerful unbiased (UMPU) size-α if (i) $P_{\omega_0}(\theta_0) = \alpha$; (ii) $P_{\omega_0}(\theta_1) \ge \alpha\ \forall\theta_1 \in \Theta_K$; and (iii) $P_{\omega_0}(\theta_1) \ge P_\omega(\theta_1)\ \forall\theta_1 \in \Theta_K$, ∀ω satisfying (i) and (ii).

3.3 Method of Obtaining BCR

The definition of most powerful critical region, i.e. best critical region (BCR) of
size α does not provide a systematic method of determining it. The following
lemma, due to Neyman and Pearson, provides a solution of the problem if we,
however, test a simple hypothesis against a simple alternative.
The Neyman–Pearson Lemma may be stated as follows:

For testing $H: \theta = \theta_0$ against $K: \theta = \theta_1$; $\theta_0, \theta_1 \in \Theta$, $\theta_1 \ne \theta_0$: for some $\alpha \in (0,1)$, let $\omega_0$ be a subset of $\mathcal{X}$. Suppose $\omega_0$ satisfies the following conditions:

(i) if $x \in \omega_0$: $p(x/\theta_1) \ge k\,p(x/\theta_0)$ (inside $\omega_0$);
(ii) if $x \in \mathcal{X} - \omega_0$: $p(x/\theta_1) < k\,p(x/\theta_0)$ (outside $\omega_0$);

(x: observed value of X) where k(>0) is such that $P_{\omega_0}(\theta_0) = \alpha$. Then $P_{\omega_0}(\theta_1) \ge P_\omega(\theta_1)\ \forall\omega$ satisfying $P_\omega(\theta_0) \le \alpha$. That means $\omega_0$ is an MP size-α test.
Proof (Continuous case)

$$P_{\omega_0}(\theta_1) - P_\omega(\theta_1) = \int_{\omega_0}p(x/\theta_1)dx - \int_{\omega}p(x/\theta_1)dx$$
$$= \int_{\omega_0\cap\omega}p(x/\theta_1)dx + \int_{\omega_0-\omega}p(x/\theta_1)dx - \int_{\omega\cap\omega_0}p(x/\theta_1)dx - \int_{\omega-\omega_0}p(x/\theta_1)dx$$
$$= \int_{\omega_0-\omega}p(x/\theta_1)dx - \int_{\omega-\omega_0}p(x/\theta_1)dx \qquad (3.10)$$

Now, $x \in \omega_0 - \omega \Rightarrow x$ is inside $\omega_0 \Rightarrow p(x/\theta_1) \ge k\,p(x/\theta_0)$

$$\Rightarrow \int_{\omega_0-\omega}p(x/\theta_1)dx \ge k\int_{\omega_0-\omega}p(x/\theta_0)dx;$$

$x \in \omega - \omega_0 \Rightarrow x$ is outside $\omega_0 \Rightarrow p(x/\theta_1) < k\,p(x/\theta_0)$

$$\Rightarrow \int_{\omega-\omega_0}p(x/\theta_1)dx < k\int_{\omega-\omega_0}p(x/\theta_0)dx.$$

Hence the R.H.S. of (3.10)

$$\ge k\left[\int_{\omega_0-\omega}p(x/\theta_0)dx - \int_{\omega-\omega_0}p(x/\theta_0)dx\right] = k\left[\int_{\omega_0}p(x/\theta_0)dx - \int_{\omega}p(x/\theta_0)dx\right]$$
$$= k\left[\alpha - P_\omega(\theta_0)\right] \ge k(\alpha - \alpha) = 0, \text{ as } P_\omega(\theta_0) \le \alpha.$$

Hence we get $P_{\omega_0}(\theta_1) - P_\omega(\theta_1) \ge 0 \Leftrightarrow P_{\omega_0}(\theta_1) \ge P_\omega(\theta_1)$.

(A similar result can be obtained for the discrete case by replacing $\int$ by $\Sigma$.) □

Notes

1. Define $Y = \frac{p(x|\theta_1)}{p(x|\theta_0)}$. If the random variable Y is continuous, we can always find a k such that, for $\alpha \in (0,1)$, $P[Y \ge k] = \alpha$. If the random variable Y is discrete, we can only sometimes find k such that $P[Y \ge k] = \alpha$.

But, in most of the cases, we have (assuming that $P[Y \ge k] \ne \alpha$ for every k) $P_{\theta_0}(Y \ge k_1) < \alpha$ and $P_{\theta_0}(Y \ge k_2) > \alpha$, $k_1 > k_2$ (⟹ $P(Y \ge k) = \alpha$ has no solution).

In that case we get a non-randomized test $\omega_0$ of level α given by

$$\omega_0 = \left\{x: \frac{p(x|\theta_1)}{p(x|\theta_0)} \ge k_1\right\}, \quad P_{\omega_0}(\theta_0) \le \alpha.$$

In order to get a size-α test, we proceed as follows:

(i) Reject H if $Y \ge k_1$;
(ii) Accept H if $Y < k_2$;
(iii) Acceptance (or rejection) depends on a random experiment whenever $Y = k_2$.

Random experiment: when $Y = k_2$ is observed, perform a random experiment with probability of success

$$P = a = \frac{\alpha - P_{\theta_0}\{Y \ge k_1\}}{P_{\theta_0}\{Y = k_2\}}.$$

If the experiment results in success, reject H; otherwise accept H. Hence, we get the following randomized test:

$$\phi_0(x) = \begin{cases} 1 & \text{if } \frac{p(x|\theta_1)}{p(x|\theta_0)} \ge k_1 \\[4pt] a = \frac{\alpha - P_{\theta_0}\{Y \ge k_1\}}{P_{\theta_0}\{Y = k_2\}} & \text{if } \frac{p(x/\theta_1)}{p(x/\theta_0)} = k_2 \\[4pt] 0 & \text{if } \frac{p(x/\theta_1)}{p(x/\theta_0)} < k_2.\end{cases}$$

The test $\phi_0(x)$ is obviously of size α.

2. $k = 0 \Rightarrow P_{\omega_0}(\theta_1) = 1 \Rightarrow \omega_0$ is a trivial MP test.
3. If the test ($\omega_0$) given by the N–P lemma is independent of $\theta_1 \in \Theta_K$ (which does not include $\theta_0$), the test is UMP size-α.
4. The test ($\omega_0$) is unbiased size-α.

Proof $\omega_0 = \{x: p(x|\theta_1) \ge k\,p(x|\theta_0)\}$; we want to show $P_{\omega_0}(\theta_1) \ge \alpha$. Take k = 0. Then $\omega_0 = \{x: p(x/\theta_1) > 0\}$. In that case

$$P_{\omega_0}(\theta_1) = \int_{\omega_0}p(x|\theta_1)dx = \int_{\{x:\,p(x|\theta_1)>0\}}p(x|\theta_1)dx = \int_{\mathcal{X}}p(x|\theta_1)dx = 1 \ge \alpha$$

⟹ the test is trivially unbiased. So throughout we assume that k > 0. Now,

$$p(x|\theta_1) \ge k\,p(x|\theta_0) \text{ inside } \omega_0 \Rightarrow \int_{\omega_0}p(x|\theta_1)dx \ge k\int_{\omega_0}p(x|\theta_0)dx = k\alpha$$
$$\Leftrightarrow P_{\omega_0}(\theta_1) \ge k\alpha \qquad (3.11)$$

Again,

$$p(x|\theta_1) \le k\,p(x|\theta_0) \text{ outside } \omega_0 \Rightarrow \int_{\omega_0^c}p(x|\theta_1)dx \le k\int_{\omega_0^c}p(x|\theta_0)dx$$
$$\Leftrightarrow 1 - P_{\omega_0}(\theta_1) \le k(1-\alpha) \qquad (3.12)$$

(3.11) ÷ (3.12) $\Rightarrow \dfrac{P_{\omega_0}(\theta_1)}{1 - P_{\omega_0}(\theta_1)} \ge \dfrac{\alpha}{1-\alpha} \Leftrightarrow P_{\omega_0}(\theta_1) \ge \alpha$ ⟹ the test is unbiased.

Conclusion An MP test is unbiased. Let $\omega_0$ be an MP size-α test. Then, with probability one, the test is equivalent to (assuming that $\frac{p(x|\theta_1)}{p(x|\theta_0)}$ has a continuous distribution under $\theta_0$ and $\theta_1$) $\omega_0 = \{x: p(x|\theta_1) > k\,p(x|\theta_0)\}$, where k is such that $P_{\omega_0}(\theta_0) = \alpha \in (0,1)$. □
 
Example 3.10 X1 ; X2 ; . . .Xn are i.i.d. N l; r20 ; 1\l\1; r0 = known. (without
any loss of generality take r0 ¼ 1).
X ¼ ðX1 ; X2 ; . . .Xn Þ observed value of X ¼ x ¼ ðx1 ; x2 ; . . .; xn Þ. To find UMP
size-a test for H : l ¼ l0 against K : l [ l0 . Take any l1 [ l0 and find MP
size-a test for
H : l ¼ l0 against K : l ¼ l1 ;

Solution


n 1 P
n
ðxi lÞ2
1 2
p x=l ¼ pffiffiffiffiffiffi e 1 :
2p
3.3 Method of Obtaining BCR 77

Then


P
n
1
ðxi l0 Þ2 Pn
p x=l 2
e 1
1
2 ðl1 l0 Þ ð2xi l1 l0 Þ
1

¼ P n ¼e 1
p x=l 1
2 ðxi l1 Þ2
0
e 1
 
1X
¼ enxðl1 l0 Þ2ðl1 l0 Þ
n 2 2
* x ¼ xi
n

Hence, by N–P lemma, the MP size-a test is given by

n

o
x0 ¼ x : p x=l [ kp x=l ð3:13Þ
1 0

where k is such that

Px0 ðl0 Þ ¼ a ð3:14Þ


n o
ð3:13Þ , x : enxðl1 l0 Þ2ðl1 l0 Þ [ k
n 2 2
ð3:15Þ

loge k 1
, x : x [ þ ðl1 þ l0 Þ as l1 [ l0
n ð l1  l0 Þ 2
, fx : x [ cg; say ð3:16Þ

By (3.16),

ð3:14Þ , P x [ c=l ¼ a
0
pffiffiffi pffiffiffi  
nðx  l0 Þ nðc  l0 Þ
,P [ l0 ¼ a
1 1
 
  N l0 ; 1 under H)
(X1 ; X2 ; . . .Xn are i.i.d N ðl0 ; 1Þ under H ) X n
 pffiffiffi 
, P Z [ nðc  l0 ÞjZ  N ð0; 1Þ ¼ a
2 1 3
Z
pffiffiffi
) nðc  l0 Þ ¼ sa 4 N ðZjð0; 1ÞÞdz ¼ a5
sa
1
, c ¼ l0 þ pffiffiffi sa ð3:17Þ
n

Test given by (3.16) and (3.17) is MP size-a for H : l ¼ l0 against


K : l ¼ l1 ð [ l0 Þ.

The test is independent of any $\mu_1 (> \mu_0)$. Hence it is UMP size-α for $H: \mu = \mu_0$ against $K: \mu > \mu_0$.

Observations

1. The power function of the test given by (3.16) and (3.17) is

$$P_{\omega_0}(\mu) = P(X \in \omega_0|\mu) = P\left\{\bar{X} > \mu_0 + \frac{s_\alpha}{\sqrt{n}}\Big/\mu\right\} = P\left\{Z > \sqrt{n}(\mu_0-\mu) + s_\alpha\ |\ Z\sim N(0,1)\right\}$$
$$= \int_{s_\alpha - \sqrt{n}(\mu-\mu_0)}^{\infty}N(z|0,1)dz = 1 - \Phi\left(s_\alpha - \sqrt{n}(\mu-\mu_0)\right)$$

(under any μ, $(X_1, \ldots, X_n)$ are i.i.d. N(μ, 1) ⟹ $\sqrt{n}(\bar{x}-\mu) \sim N(0,1)$).

Hence, for any fixed μ(>μ₀),

$$P_{\omega_0}(\mu) \to 1 \text{ as } n \to \infty \qquad (3.18)$$

and for any fixed μ(<μ₀),

$$P_{\omega_0}(\mu) \to 0 \text{ as } n \to \infty. \qquad (3.19)$$

(3.18) ⟹ the test is consistent against any μ > μ₀.


Definition 10

1. For testing $H: \theta = \theta_0$ against $K: \theta = \theta_1$, a test ω (based on n observations) is said to be consistent if the power $P_\omega(\theta_1)$ of the test tends to 1 as $n \to \infty$.

2. $P_{\omega_0}(\mu) = 1 - \Phi\left(s_\alpha - \sqrt{n}(\mu-\mu_0)\right)$, which increases as μ increases for fixed n.

$$\Rightarrow P_{\omega_0}(\mu) > 1 - \Phi(s_\alpha) = 1 - (1-\alpha) = \alpha \text{ for all } \mu > \mu_0$$

⟹ $\omega_0$ is unbiased.
3. $P_{\omega_0}(\mu) < P_{\omega_0}(\mu_0) = \alpha$ for all $\mu < \mu_0$ ⟹ power < α for any $\mu < \mu_0$. That is, the test $\omega_0$ is biased for testing $H: \mu = \mu_0$ against $K: \mu < \mu_0$.

4. From (3.15), if $\mu_1 < \mu_0$, we get $\omega_0$ to be equivalent to

$$\{x: \bar{x} < c'\} \qquad (3.20)$$

and $P_{\omega_0}(\mu_0) = \alpha$ is equivalent to $P\{\bar{X} < c'|\mu_0\} = \alpha$

$$\Rightarrow c' = \mu_0 - \frac{s_\alpha}{\sqrt{n}} \qquad (3.21)$$

(by the same argument as before while finding c).

The test given by (3.20) and (3.21) is independent of any $\mu_1 < \mu_0$. Hence it is UMP size-α for $H: \mu = \mu_0$ against $K: \mu < \mu_0$.

5. (i) The UMP size-α test for $H: \mu = \mu_0$ against $K: \mu > \mu_0$ is $\omega_0 = \left\{x: \bar{x} > \mu_0 + \frac{s_\alpha}{\sqrt{n}}\right\}$.
(ii) The UMP size-α test for $H: \mu = \mu_0$ against $K: \mu < \mu_0$ is $\omega_0' = \left\{x: \bar{x} < \mu_0 - \frac{s_\alpha}{\sqrt{n}}\right\}$.

Clearly, $\omega_0 \ne \omega_0'$ ($\omega_0$ is biased for H against $\mu < \mu_0$ and $\omega_0'$ is biased for H against $\mu > \mu_0$). There does not exist any test which is UMP for $H: \mu = \mu_0$ against $K: \mu \ne \mu_0$.

Example 3.11 $X_1, X_2, \ldots, X_n$ are i.i.d. $N(\mu_0, \sigma^2)$, $\sigma^2 > 0$ and $\mu_0$ known (without any loss of generality we take $\mu_0 = 0$). $X = (X_1, X_2, \ldots, X_n)$, observed value $x = (x_1, x_2, \ldots, x_n)$.

Testing problem: $H: \sigma = \sigma_0$ against $K: \sigma > \sigma_0$. To find the UMP size-α test for H against $K: \sigma > \sigma_0$ we take any $\sigma_1 > \sigma_0$.

Solution
Here $p(x/\sigma) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{-\frac{1}{2\sigma^2}\sum_i^n x_i^2}$. Hence

$$\frac{p(x/\sigma_1)}{p(x/\sigma_0)} = \left(\frac{\sigma_0}{\sigma_1}\right)^n e^{\frac12\sum_i^n x_i^2\left(\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}\right)}. \qquad (3.22)$$

By the N–P lemma the MP size-α test is given by

$$w_0 = \left\{x: \frac{p(x/\sigma_1)}{p(x/\sigma_0)} > k\right\} \qquad (3.23)$$

where k(>0) is such that

$$P_{w_0}(\sigma_0) = \alpha. \qquad (3.24)$$




 
Now,

$$\frac{p(x/\sigma_1)}{p(x/\sigma_0)} > k \Leftrightarrow \left(\frac{\sigma_0}{\sigma_1}\right)^n e^{\frac12\sum_i^n x_i^2\left(\frac{1}{\sigma_0^2}-\frac{1}{\sigma_1^2}\right)} > k \quad [\text{from }(3.22)]$$
$$\Leftrightarrow \sum_i^n x_i^2 > \frac{2\log_e k - 2n\log_e\frac{\sigma_0}{\sigma_1}}{\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}} = c \text{ (say)} \quad [\text{as } \sigma_1 > \sigma_0]. \qquad (3.25)$$

Hence (3.23) and (3.24) are equivalent to

$$w_0 = \left\{x: \sum_i^n x_i^2 > c\right\} \qquad (3.26)$$

and

$$P\left\{\sum_i^n x_i^2 > c\Big/\sigma_0\right\} = \alpha. \qquad (3.27)$$

Under any $\sigma^2$, $X_1, X_2, \ldots, X_n$ are i.i.d. $N(0, \sigma^2)$

$$\Rightarrow \frac{\sum_i^n x_i^2}{\sigma^2} \sim \chi^2_n.$$

Hence (3.27)

$$\Rightarrow \frac{c}{\sigma_0^2} = \chi^2_{n,\alpha} \quad \left[\int_{\chi^2_{n,\alpha}}^{\infty}\frac{1}{\Gamma(n/2)2^{n/2}}e^{-\frac y2}y^{\frac n2 - 1}dy = \alpha\right]$$
$$\Leftrightarrow c = \sigma_0^2\chi^2_{n,\alpha}. \qquad (3.28)$$

Thus the test given by

$$w_0 = \left\{x: \sum_i^n x_i^2 > \sigma_0^2\chi^2_{n,\alpha}\right\} \qquad (3.29)$$

is MP size-α for $H: \sigma = \sigma_0$ against $K: \sigma = \sigma_1$. The test is independent of any $\sigma_1 > \sigma_0$. Hence it is UMP size-α for $H: \sigma = \sigma_0$ against $K: \sigma > \sigma_0$.

Observations

1. Under $\sigma_0^2$,

$$\frac{1}{\sigma_0^2}\sum_i^n x_i^2 = Y_n \sim \chi^2_n \Rightarrow E(Y_n) = n,\ V(Y_n) = 2n.$$

Hence, from the asymptotic theory of $\chi^2$, for large n under H, $\frac{Y_n - n}{\sqrt{2n}}$ is asymptotically N(0, 1). So, for large n,

$$w_0 = \left\{x: \frac{Y_n - n}{\sqrt{2n}} > \frac{\chi^2_{n,\alpha} - n}{\sqrt{2n}}\right\} \quad \text{and} \quad \frac{\chi^2_{n,\alpha} - n}{\sqrt{2n}} \simeq s_\alpha, \text{ i.e. } \chi^2_{n,\alpha} \simeq s_\alpha\sqrt{2n} + n.$$

Thus, (3.29) can be approximated by

$$\omega_0 = \left\{x: \sum_{i=1}^{n}x_i^2 > \sigma_0^2\left(s_\alpha\sqrt{2n} + n\right)\right\}.$$
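A quick numerical check of this approximation (Python sketch; n and α are illustrative). The "exact" quantile is estimated here by Monte Carlo so that no special-function library is needed.

    import numpy as np

    rng = np.random.default_rng(7)
    n, alpha = 50, 0.05                               # illustrative values
    s_alpha = 1.645                                   # upper 5% point of N(0, 1)

    # large-n approximation to the upper-alpha point of chi-square with n d.f.
    approx = s_alpha * np.sqrt(2 * n) + n

    # Monte Carlo estimate of the exact chi-square_{n, alpha} point
    samples = np.sum(rng.standard_normal((100_000, n)) ** 2, axis=1)
    exact = np.quantile(samples, 1 - alpha)

    print(approx, exact)                              # the two values are close for moderate n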
2. The UMP size-α test for $H: \sigma^2 = \sigma_0^2$ against $K: \sigma^2 > \sigma_0^2$ is

$$w_0 = \left\{x: \sum_i^n x_i^2 > \sigma_0^2\chi^2_{n,\alpha}\right\};$$

the UMP size-α test for $H: \sigma^2 = \sigma_0^2$ against $K: \sigma^2 < \sigma_0^2$ is

$$w^* = \left\{x: \sum_i^n x_i^2 < \sigma_0^2\chi^2_{n,1-\alpha}\right\} \quad \left[\int_{\chi^2_{n,1-\alpha}}^{\infty}f(\chi^2_n)d\chi^2_n = 1 - \alpha\right].$$

Clearly, $w_0 \ne w^*$. Hence there does not exist a UMP test for $H: \sigma^2 = \sigma_0^2$ against $K: \sigma^2 \ne \sigma_0^2$.
The power function of the test $w_0$ is

$$P_{w_0}(\sigma^2) = P\left\{\sum_i^n x_i^2 > \sigma_0^2\chi^2_{n,\alpha}\Big/\sigma^2\right\} = \int_{\frac{\sigma_0^2}{\sigma^2}\chi^2_{n,\alpha}}^{\infty}f(\chi^2_n)d\chi^2_n.$$

Clearly, $P_{w_0}(\sigma^2)$ increases as $\sigma^2$ increases. Also $P_{w_0}(\sigma^2) \le P_{w_0}(\sigma_0^2) = \alpha\ \forall\sigma^2: \sigma^2 \le \sigma_0^2$

⟹ the test is biased ⟹ $w_0$ cannot be recommended for $H: \sigma^2 = \sigma_0^2$ against $K: \sigma^2 < \sigma_0^2$.

Similarly, $w^*$ is biased (here $P_{w^*}(\sigma^2)$ increases as $\sigma^2$ decreases) and hence it cannot be recommended for $H: \sigma^2 = \sigma_0^2$ against $K: \sigma^2 > \sigma_0^2$.

Next observe that $\frac1n\sum_i^n x_i^2$ is a consistent estimator of $\sigma^2$. That means, for fixed $\sigma^2$, as $n \to \infty$, $\frac1n\sum_i^n x_i^2 \to \sigma^2$ in probability, and $\frac{\sigma_0^2\chi^2_{n,\alpha}}{n} \to \sigma_0^2$. Thus if $\sigma^2 > \sigma_0^2$, we get

$$\lim_{n\to\infty}P\left\{\sum_i x_i^2 > \sigma_0^2\chi^2_{n,\alpha}\Big/\sigma^2\right\} = 1,$$

implying that the test $w_0$ is consistent against $K: \sigma^2 > \sigma_0^2$. Similarly, the test $w^*$ is consistent against $K: \sigma^2 < \sigma_0^2$.

Example 3.12 Find the MP size-α test for $H: X \sim \frac{1}{\sqrt{2\pi}}e^{-X^2/2}$ against $K: X \sim \frac12 e^{-|X|}$.

Answer The MP size-α test is given by (using the N–P lemma)

$$\omega_0 = \{x: p(x/K) > k\,p(x/H)\} \qquad (3.30)$$

where k is such that

$$P_{\omega_0}(H) = \alpha. \qquad (3.31)$$

Now,

$$\frac{p(x/K)}{p(x/H)} = \sqrt{\frac{\pi}{2}}\,e^{\frac{x^2}{2} - |x|} > k \Leftrightarrow \log_e\sqrt{\frac{\pi}{2}} + \frac{x^2}{2} - |x| > \log_e k$$
$$\Leftrightarrow x^2 - 2|x| + \left(\log_e\frac{\pi}{2} - 2\log_e k\right) > 0 \Leftrightarrow x^2 - 2|x| + C > 0. \qquad (3.32)$$

Using (3.32), (3.31) is equivalent to

$$P\left\{x^2 - 2|x| + C > 0/H\right\} = \alpha. \qquad (3.33)$$

The test given by (3.32) and (3.33) is MP size-α. To find C we proceed as follows:

$$P\{x^2 - 2|x| + C > 0/H\} = P_H\{x^2 + 2x + C > 0 \cap x < 0\} + P_H\{x^2 - 2x + C > 0 \cap x > 0\}.$$

Now, under H, X ∼ N(0, 1)

$$\Rightarrow P_H\{x^2 + 2x + C > 0 \cap x < 0\} = P_H\{x^2 - 2x + C > 0 \cap x > 0\}.$$

Thus (3.33) is equivalent to

$$P_H\{x^2 - 2x + C > 0 \cap x > 0\} = \frac{\alpha}{2}. \qquad (3.34)$$

Writing $g(x) = x^2 - 2x + C$, we have $g''(x) = 2$ and $g'(x) = 0$ at x = 1

⟹ g(x) is minimum at x = 1

$$\Rightarrow \{x^2 - 2x + C > 0 \cap x > 0\} \Leftrightarrow x < x_1(c) \text{ or } x > x_2(c),$$

[Figure: the parabola g(x) with roots $x_1(c) < x_2(c)$; g(x) > 0 outside the roots.]

where $x_1(c) < x_2(c)$ are the roots of $x^2 - 2x + C = 0$:

$$x = \frac{2\pm\sqrt{4-4c}}{2} = 1\pm\sqrt{1-c}.$$

So $x_1(c) = 1 - \sqrt{1-c}$ and $x_2(c) = 1 + \sqrt{1-c}$. Hence (3.34)

$$\Leftrightarrow P_H\left\{0 < x < 1 - \sqrt{1-c}\right\} + P_H\left\{x > 1 + \sqrt{1-c}\right\} = \frac{\alpha}{2}$$
$$\Leftrightarrow \Phi\left(1-\sqrt{1-c}\right) - \frac12 + 1 - \Phi\left(1+\sqrt{1-c}\right) = \frac{\alpha}{2}$$
$$\Leftrightarrow \Phi\left(1+\sqrt{1-c}\right) - \Phi\left(1-\sqrt{1-c}\right) = 1 - \frac{\alpha}{2} - \frac12 = \frac{1-\alpha}{2}. \qquad (3.35)$$

The test given by (3.32) and (3.35) is MP size-α.


 
Example 3.13 Let X be a single observation from the p.d.f. $p(x/\theta) = \frac{\theta}{\pi}\frac{1}{\theta^2 + x^2}$, $\theta > 0$. Find the UMP size-α test for $H: \theta = \theta_0$ against $K: \theta > \theta_0$.

Answer Take any $\theta_1 > \theta_0$ and consider the ratio

$$\frac{p(x/\theta_1)}{p(x/\theta_0)} = \frac{\theta_1}{\theta_0}\cdot\frac{\theta_0^2 + x^2}{\theta_1^2 + x^2} = \frac{\theta_1}{\theta_0}\cdot\frac{1}{1 + \frac{\theta_1^2 - \theta_0^2}{\theta_0^2 + x^2}},$$

which is a strictly increasing function of $x^2$ (i.e., of |x|), since $\theta_1 > \theta_0$. Hence we can find a C such that

$$\frac{p(x/\theta_1)}{p(x/\theta_0)} > k \Leftrightarrow |x| > C \qquad (3.36)$$

where C is such that

$$P[|x| > C/\theta_0] = \alpha \qquad (3.37)$$

$$\Leftrightarrow \int_C^{\infty}\frac{\theta_0}{\pi}\frac{dx}{\theta_0^2 + x^2} = \frac{\alpha}{2} \Leftrightarrow \frac1\pi\left[\tan^{-1}\frac{x}{\theta_0}\right]_C^{\infty} = \frac{\alpha}{2}$$
$$\Leftrightarrow \frac1\pi\left(\frac{\pi}{2} - \tan^{-1}\frac{C}{\theta_0}\right) = \frac{\alpha}{2} \Leftrightarrow 1 - \frac2\pi\tan^{-1}\frac{C}{\theta_0} = \alpha. \qquad (3.38)$$

The test given by (3.36) and (3.38) is MP size-α. As the test is independent of any $\theta_1 > \theta_0$, it is UMP size-α for $H: \theta = \theta_0$ against $K: \theta > \theta_0$. The power function is given by

$$P_{\omega_0}(\theta) = P\{|X| > C/\theta\} = 1 - \int_{-C}^{C}\frac{\theta}{\pi}\frac{dx}{\theta^2 + x^2}.$$
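From (3.38), C = θ₀ tan(π(1 − α)/2). A minimal Python sketch (θ₀ and α are illustrative) computing the cut-off and the power at a few alternatives:

    import math

    theta0, alpha = 1.0, 0.05                          # illustrative values
    C = theta0 * math.tan(math.pi * (1 - alpha) / 2)   # solves 1 - (2/pi) arctan(C/theta0) = alpha

    def power(theta: float) -> float:
        # P(|X| > C) under the density (theta/pi)/(theta^2 + x^2)
        return 1 - (2 / math.pi) * math.atan(C / theta)

    print(C, power(theta0), power(2.0), power(5.0))    # power(theta0) reproduces alpha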

Example 3.14 X is a single observation from the Cauchy p.d.f. $f(x/\theta) = \frac{1}{\pi\{(x-\theta)^2 + 1\}}$. We are to find the MP size-α test for $H: \theta = \theta_0$ against $K: \theta = \theta_1 (>\theta_0)$.

Answer $X \sim \text{Cauchy}(\theta) \Rightarrow Y = X - \theta_0 \sim \text{Cauchy}(\theta - \theta_0 = \delta)$. Hence $H: \theta = \theta_0 \Leftrightarrow H: \delta = 0$ using the Y-observation. So, without any loss of generality, we take H: θ = 0 and, for the sake of simplicity, $\theta_1 = 1$.

Here, by the N–P lemma, the MP test has the critical region $\omega = \{x: p(x/\theta_1) > k\,p(x/\theta_0)\}$ with $P_{\theta_0}(X \in \omega) = \alpha \in (0,1)$.

Here

$$\frac{p(x/\theta_1)}{p(x/\theta_0)} > k \Leftrightarrow \frac{1+x^2}{1+(x-1)^2} > k \Leftrightarrow 1+x^2 > k(1+x^2-2x+1)$$
$$\Leftrightarrow x^2(1-k) + 2kx + 1 - 2k > 0. \qquad (3.39)$$

Several cases:

(a) $k = 1 \Leftrightarrow x > \frac12$; hence the size of the test is $P\left(X > \frac12/\theta = 0\right) = \frac12 - \frac1\pi\tan^{-1}\frac12$.

(b) 0 < k < 1: if we write $g(x) = (1-k)x^2 + 2kx + (1-2k)$, we have $g'(x) = 2(1-k)x + 2k = 0 \Rightarrow x = -\frac{k}{1-k}$, and $g''(x) = 2(1-k) > 0$, which means that the curve y = g(x) has a minimum at $x = -\frac{k}{1-k}$.

[Figure: the upward parabola y = g(x) with roots $x_1 < x_2$; g(x) > 0 for x < x₁ or x > x₂.]

Here $x_1 < x_2$ are the roots of g(x) = 0. Clearly, the test is given by $x < x_1$ or $x > x_2$ such that

$$P\{X < x_1/\theta = 0\} + P\{X > x_2/\theta = 0\} = \alpha. \qquad (3.40)$$

We take those values of $x_1, x_2$ that satisfy (3.40). Eventually, it is not possible to get $x_1, x_2$ for any given α; they exist only for some specific values of α.

(c) If k > 1, then $g''(x) = 2(1-k) < 0$; thus y = g(x) has a maximum at $x = -\frac{k}{1-k} > 0$. As in (b) above, here also we can find $x_1$ and $x_2$, the two roots of g(x) = 0, and the test is given by $x_1 < x < x_2$ with $P\{x_1 < X < x_2/\theta = 0\} = \alpha$.

Taking $\theta_1 = 2$, it can be shown that the MP test for H: θ = 0 against θ = 2 is completely different. Hence, based on a single observation, there does not exist a UMP test for H: θ = 0 against K: θ > 0.

Randomized Test Testing problem: $H: \theta = \theta_0$ against $K: \theta = \theta_1$. If the random variable $Y = \frac{p(X/\theta_1)}{p(X/\theta_0)}$ is continuous under $\theta = \theta_0$, we can always find k(>0) such that, for a given $\alpha \in (0,1)$, $P_{\theta_0}(Y > k) = \alpha$.

On the other hand, if the random variable Y is discrete under $\theta = \theta_0$, it may not always be possible to find k such that, for a given $\alpha \in (0,1)$, $P_{\theta_0}(Y > k) = \alpha$. In that case, we modify the non-randomized test $\omega_0 = \{x: p(x/\theta_1) > k\,p(x/\theta_0)\}$ by using the following function:

$$\phi_0(x) = \begin{cases} 1 & \text{if } p(x/\theta_1) > k\,p(x/\theta_0) \\ a & \text{if } p(x/\theta_1) = k\,p(x/\theta_0) \\ 0 & \text{if } p(x/\theta_1) < k\,p(x/\theta_0)\end{cases} \quad \Leftrightarrow \quad \begin{cases} Y > k \\ Y = k \\ Y < k\end{cases} \qquad (3.41)$$

where a and k are such that

$$P_{\theta_0}\{Y > k\} + a\,P_{\theta_0}\{Y = k\} = \alpha. \qquad (3.42)$$

The function given by (3.41) and (3.42) is called the randomized test corresponding to the non-randomized test $\omega_0$. It states that, after observing Y (i.e., X): reject H if Y > k; accept H if Y < k; perform a random experiment with probability of success a if Y = k. Occurrence of success ⟹ rejection of H, and occurrence of failure ⟹ acceptance of H.

Now we can show that the test given by (3.41) and (3.42) is MP size-α among all tests φ satisfying $E_{\theta_0}\phi(x) \le \alpha$. Observe that $\phi_0(x) = 1 \Rightarrow \phi_0(x) - \phi(x) \ge 0\ \forall x: p(x/\theta_1) > k\,p(x/\theta_0)$, and $\phi_0(x) = 0 \Rightarrow \phi_0(x) - \phi(x) \le 0\ \forall x: p(x/\theta_1) < k\,p(x/\theta_0)$. Hence, for all x, we have

$$\left(\phi_0(x) - \phi(x)\right)\left[p(x/\theta_1) - k\,p(x/\theta_0)\right] \ge 0$$
$$\Rightarrow \int\left(\phi_0(x) - \phi(x)\right)\left[p(x/\theta_1) - k\,p(x/\theta_0)\right]dx \ge 0$$
$$\Leftrightarrow E_{\theta_1}\phi_0(x) - E_{\theta_1}\phi(x) - k\alpha + kE_{\theta_0}\phi(x) \ge 0$$
$$\Rightarrow E_{\theta_1}\phi_0(x) - E_{\theta_1}\phi(x) \ge k\left(\alpha - E_{\theta_0}\phi(x)\right) \ge 0, \text{ as } k > 0 \text{ and } E_{\theta_0}\phi(x) \le \alpha$$
$$\Leftrightarrow P_{\phi_0}(\theta_1) \ge P_\phi(\theta_1)$$

⟹ $\phi_0$ is MP size-α among all φ with $E_{\theta_0}\phi(x) \le \alpha$.

Example 3.15 $X_1, X_2, \ldots, X_n$ are i.i.d. according to $f(x/\theta) = \theta^x(1-\theta)^{1-x}$, x = 0, 1. To find the UMP size-α test for $H: \theta = \theta_0$ against $K: \theta > \theta_0$.

Answer Take any $\theta_1 > \theta_0$. To get the MP size-α test for $H: \theta = \theta_0$ against $K: \theta = \theta_1$, we consider the ratio

$$Y = \frac{p(x/\theta_1)}{p(x/\theta_0)} = \frac{\prod_{i=1}^n f(x_i/\theta_1)}{\prod_{i=1}^n f(x_i/\theta_0)} = \frac{\theta_1^{\sum x_i}(1-\theta_1)^{n-\sum x_i}}{\theta_0^{\sum x_i}(1-\theta_0)^{n-\sum x_i}} = \left(\frac{1-\theta_1}{1-\theta_0}\right)^n\left\{\frac{\theta_1(1-\theta_0)}{\theta_0(1-\theta_1)}\right\}^s,$$

where $s = \sum x_i$. Observe that Y is a discrete r.v. under any θ. Hence, by the N–P lemma, the MP size-α test is given by

$$\phi_0(x) = \begin{cases} 1 & \text{if } Y > k \\ a & \text{if } Y = k \\ 0 & \text{if } Y < k\end{cases} \qquad (3.43)$$

where k and a are such that $E_{\theta_0}\phi_0(x) = \alpha$

$$\Leftrightarrow P_{\theta_0}\{Y > k\} + a\,P_{\theta_0}\{Y = k\} = \alpha. \qquad (3.44)$$

Now,

$$\left(\frac{1-\theta_1}{1-\theta_0}\right)^n\left\{\frac{\theta_1(1-\theta_0)}{\theta_0(1-\theta_1)}\right\}^s > (=,<)\ k$$
$$\Leftrightarrow n\log\left(\frac{1-\theta_1}{1-\theta_0}\right) + s\log\left\{\frac{\theta_1(1-\theta_0)}{\theta_0(1-\theta_1)}\right\} > (=,<)\ k' \quad (k' = \log_e k)$$
$$\Leftrightarrow s > (=,<)\ \frac{k' - n\log\left(\frac{1-\theta_1}{1-\theta_0}\right)}{\log\left\{\frac{\theta_1(1-\theta_0)}{\theta_0(1-\theta_1)}\right\}} = C \text{ (say)} \quad \left[\text{as } \theta_1 > \theta_0 \Rightarrow \log\frac{\theta_1(1-\theta_0)}{\theta_0(1-\theta_1)} > 0\right].$$

Hence (3.43) and (3.44) are equivalent to

$$\phi_0(x) = \begin{cases} 1 & \text{if } s > C \\ a & \text{if } s = C \\ 0 & \text{if } s < C\end{cases} \qquad (3.45)$$

and

$$P_{\theta_0}\{s > C\} + a\,P_{\theta_0}\{s = C\} = \alpha. \qquad (3.46)$$

Under any θ, $s = \sum_1^n X_i \sim \mathrm{bin}(n, \theta)$. Hence from (3.46) we have either

$$P_{\theta_0}\{s > C\} = \alpha \Leftrightarrow \sum_{s=C+1}^{n}\binom ns\theta_0^s(1-\theta_0)^{n-s} = \alpha \Rightarrow a = 0,$$

or

$$P_{\theta_0}\{s > C\} < \alpha < P_{\theta_0}\{s \ge C\}$$
$$\Rightarrow a = \frac{\alpha - \sum_{s=C+1}^{n}\binom ns\theta_0^s(1-\theta_0)^{n-s}}{\binom nC\theta_0^C(1-\theta_0)^{n-C}}. \qquad (3.47)$$

The test given by (3.45) and (3.47) is MP size-α for $H: \theta = \theta_0$ against $K: \theta = \theta_1(>\theta_0)$. The test is independent of any $\theta_1(>\theta_0)$. Hence it is UMP size-α for $H: \theta = \theta_0$ against $K: \theta > \theta_0$.

Observation

1. For $\theta_1 < \theta_0 \Rightarrow \log\left\{\frac{\theta_1(1-\theta_0)}{\theta_0(1-\theta_1)}\right\} < 0$. In that case (3.43) and (3.44) are equivalent to

$$\phi^*(x) = \begin{cases} 1 & \text{if } s < C \\ a & \text{if } s = C \\ 0 & \text{if } s > C\end{cases} \quad \text{and} \quad P_{\theta_0}\{s < C\} + a\,P_{\theta_0}\{s = C\} = \alpha.$$

We can get the UMP test for $H: \theta = \theta_0$ against $K: \theta < \theta_0$ by similar arguments. Obviously $\phi_0 \ne \phi^*$, so there does not exist a single test which is UMP for $H: \theta = \theta_0$ against $K: \theta \ne \theta_0$.

2. By the De Moivre–Laplace limit theorem, for large n, $\frac{S - n\theta}{\sqrt{n\theta(1-\theta)}}$ is approximately N(0, 1). Hence, from (3.45) and (3.46), we get

$$\frac{C - n\theta_0}{\sqrt{n\theta_0(1-\theta_0)}} \simeq s_\alpha \Rightarrow C \simeq n\theta_0 + s_\alpha\sqrt{n\theta_0(1-\theta_0)}.$$

Then an approximately size-α test is: reject H if $s > n\theta_0 + s_\alpha\sqrt{n\theta_0(1-\theta_0)}$;
Accept H otherwise.
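A small Python sketch (n, θ₀ and α are illustrative) comparing the normal-approximation cut-off with the exact binomial tail probability it induces:

    from math import comb, sqrt

    n, theta0, alpha, s_alpha = 40, 0.3, 0.05, 1.645     # illustrative values

    C = n * theta0 + s_alpha * sqrt(n * theta0 * (1 - theta0))   # approximate cut-off

    def binom_tail(c: float) -> float:
        # exact P(S > c) under H for S ~ bin(n, theta0)
        return sum(comb(n, s) * theta0**s * (1 - theta0)**(n - s)
                   for s in range(int(c) + 1, n + 1))

    print(C, binom_tail(C))        # the exact size of "reject if S > C" is close to alpha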

3. The power function of the test given by (3.45) and (3.46) is

$$P(\theta) = E_\theta\phi_0(X) = P_\theta\{S > c\} + a\,P_\theta\{S = c\}$$
$$= \sum_{s=c+1}^{n}\binom ns\theta^s(1-\theta)^{n-s} + a\binom nc\theta^c(1-\theta)^{n-c} \quad [\text{can be obtained using the Biometrika tables}]$$
$$= (1-a)\sum_{s=c+1}^{n}\binom ns\theta^s(1-\theta)^{n-s} + a\sum_{s=c}^{n}\binom ns\theta^s(1-\theta)^{n-s}$$
$$= (1-a)I_\theta(c+1, n-c) + a\,I_\theta(c, n-c+1) \quad [\text{can be obtained using the incomplete Beta function table}].$$

Observe that, as $I_\theta(m, n)$ is an increasing function of θ, the power function P(θ) increases with θ.

Example 3.16 Let X be a single observation. To find the MP size-α test for $H: X \sim R(0,1)$ against $K: X \sim R\left(\frac12, \frac32\right)$.

Answer

$$p(x/H) = \begin{cases} 1 & \text{if } 0 < x < 1 \\ 0 & \text{otherwise}\end{cases} \qquad p(x/K) = \begin{cases} 1 & \text{if } \tfrac12 < x < \tfrac32 \\ 0 & \text{otherwise}\end{cases}$$

As the ratio p(x/K)/p(x/H) is discrete, the MP test for H against K is given by

$$\phi_0(x) = \begin{cases} 1 & \text{if } p(x/K) > k\,p(x/H) \\ a & \text{if } p(x/K) = k\,p(x/H) \\ 0 & \text{if } p(x/K) < k\,p(x/H)\end{cases} \qquad (3.48)$$

where a and k are such that

$$E_H\phi_0(x) = \alpha. \qquad (3.49)$$

Taking k < 1:

$0 < x \le \frac12 \Rightarrow p(x/K) = 0$, $p(x/H) = 1 \Rightarrow p(x/K) < k\,p(x/H) \Rightarrow \phi_0(x) = 0$;
$\frac12 < x < 1 \Rightarrow p(x/K) = p(x/H) = 1 \Rightarrow p(x/K) > k\,p(x/H) \Rightarrow \phi_0(x) = 1$;
$1 \le x < \frac32 \Rightarrow p(x/K) = 1$, $p(x/H) = 0 \Rightarrow p(x/K) > k\,p(x/H) \Rightarrow \phi_0(x) = 1$.

So, for k < 1, we get $E_H\phi_0(X) = 1\cdot P_H\left(\frac12 < X < 1\right) + 1\cdot P_H(X \ge 1) = \frac12$. Thus it is a trivial test of size 0.5.

Taking k > 1:

$0 < x \le \frac12 \Rightarrow \phi_0(x) = 0$; $\frac12 < x < 1 \Rightarrow \phi_0(x) = 0$; $1 \le x < \frac32 \Rightarrow \phi_0(x) = 1$; so $E_H\phi_0(X) = 0$ and it is a trivial test of size 0.

Taking k = 1:

$0 < x \le \frac12 \Rightarrow \phi_0(x) = 0$: we always accept H;
$1 \le x < \frac32 \Rightarrow \phi_0(x) = 1$: we always reject H;
$\frac12 < x < 1 \Rightarrow p(x/K) = k\,p(x/H) \Rightarrow$ we perform a random experiment with probability of success a determined by $E_H\phi_0(x) = \alpha$

$$\Leftrightarrow a\cdot P_H\left(\frac12 < X < 1\right) = \alpha \Leftrightarrow a = 2\alpha.$$

Thus the randomized test given by

$$\phi_0(x) = \begin{cases} 0 & \text{if } 0 < x \le \frac12 \\ 2\alpha & \text{if } \frac12 < x < 1 \\ 1 & \text{if } 1 \le x < \frac32\end{cases}$$

is the MP size-α test.

3.4 Locally MPU Test

The optimum region is obtained by the use of the following generalization of the N–P lemma.
Theorem 2 Let $g_0, g_1, g_2, \ldots, g_m$ be (m + 1) non-negative integrable functions on the sample space $\mathcal{X}$. Let ω be any region such that $\int_\omega g_i(x)dx = C_i$, i = 1(1)m, where the $C_i$'s are all known numbers.

Suppose $\omega_0$ is a subset of $\mathcal{X}$ such that:

inside $\omega_0$: $g_0(x) > \sum_1^m k_i g_i(x)$;
outside $\omega_0$: $g_0(x) \le \sum_1^m k_i g_i(x)$,

where $k_1, k_2, \ldots, k_m$ are so chosen that $\int_{\omega_0}g_i(x)dx = C_i$, i = 1(1)m. Then we have

$$\int_{\omega_0}g_0(x)dx \ge \int_{\omega}g_0(x)dx.$$

This is called the generalized Neyman–Pearson lemma.
Proof

$$\int_{\omega_0}g_0(x)dx - \int_{\omega}g_0(x)dx = \int_{\omega_0-\omega}g_0(x)dx - \int_{\omega-\omega_0}g_0(x)dx \qquad \ldots (1)$$

$$\left[\omega_0 - \omega = \omega_0 - \omega\cap\omega_0 \subseteq \text{inside } \omega_0;\quad \omega - \omega_0 = \omega - \omega\cap\omega_0 \subseteq \text{outside } \omega_0\right]$$

$$x \in \omega_0 - \omega \Rightarrow g_0(x) > \sum_1^m k_i g_i(x) \Rightarrow \int_{\omega_0-\omega}g_0(x)dx \ge \sum_{i=1}^{m}k_i\left\{\int_{\omega_0-\omega}g_i(x)dx\right\};$$

$$x \in \omega - \omega_0 \Rightarrow g_0(x) \le \sum_1^m k_i g_i(x) \Rightarrow \int_{\omega-\omega_0}g_0(x)dx \le \sum_1^m k_i\left\{\int_{\omega-\omega_0}g_i(x)dx\right\}.$$

Hence the L.H.S. of (1)

$$\ge \sum_i^m k_i\left\{\int_{\omega_0-\omega}g_i(x)dx\right\} - \sum_i^m k_i\left\{\int_{\omega-\omega_0}g_i(x)dx\right\} = \sum_i^m k_i\left[\int_{\omega_0}g_i(x)dx - \int_{\omega}g_i(x)dx\right] = \sum_i^m k_i(C_i - C_i) = 0.$$

Hence the proof. □



Locally Best Tests

1. One-sided case: For the family $\{p(x/\theta),\ \theta \in \Theta\}$, sometimes we cannot find a UMP size-α test for $H: \theta = \theta_0$ against $K: \theta > \theta_0$ or $\theta < \theta_0$. For example, if $X_1, X_2, \ldots, X_n$ ($n \ge 1$) are i.i.d. according to the p.d.f.

$$f(x/\theta) = \frac1\pi\frac{1}{1+(x-\theta)^2}, \quad (-\infty < \theta < \infty,\ -\infty < x < \infty),$$

we cannot find a UMP size-α test for $H: \theta = \theta_0$ against $\theta > \theta_0$ or $\theta < \theta_0$.

In that case, we can find an ε > 0 for which there exists a critical region $\omega_0$ such that $P_{\omega_0}(\theta_0) = \alpha$ and $P_{\omega_0}(\theta) \ge P_\omega(\theta)\ \forall\theta: \theta_0 < \theta < \theta_0 + \varepsilon$ and $\forall\omega: P_\omega(\theta_0) = \alpha$.

Construction Let $p(x/\theta)$ be such that, for every ω, $\frac{d}{d\theta}P_\omega(\theta)$ exists and is continuous in the neighbourhood of $\theta_0$. Then we have, by the mean value theorem, for any $\theta > \theta_0$,

$$P_\omega(\theta) = P_\omega(\theta_0) + (\theta - \theta_0)\left.\frac{d}{d\theta}P_\omega(\theta)\right|_{\theta=\theta^*} = P_\omega(\theta_0) + (\theta - \theta_0)P'_\omega(\theta^*), \quad \theta_0 < \theta^* < \theta. \qquad (3.50)$$

Similarly,

$$P_{\omega_0}(\theta) = P_{\omega_0}(\theta_0) + (\theta - \theta_0)P'_{\omega_0}(\theta^{**}), \text{ say.} \qquad (3.51)$$

Let $\omega_0$ be such that $P_{\omega_0}(\theta_0) = \alpha$ and $P'_{\omega_0}(\theta_0)$ is maximum, i.e. $P'_{\omega_0}(\theta_0) \ge P'_\omega(\theta_0)\ \forall\omega: P_\omega(\theta_0) = \alpha$. Then, comparing (3.50) and (3.51), we get an ε > 0 such that $P_{\omega_0}(\theta) \ge P_\omega(\theta)\ \forall\theta: \theta_0 < \theta < \theta_0 + \varepsilon$. Such an $\omega_0$ is called the locally most powerful size-α test for $H: \theta = \theta_0$ against $\theta > \theta_0$.

Now our problem is to choose $\omega_0$ such that

$$P_{\omega_0}(\theta_0) = \alpha \Leftrightarrow \int_{\omega_0}p(x/\theta_0)dx = \alpha \qquad (3.52)$$

and

$$P'_{\omega_0}(\theta_0) \ge P'_\omega(\theta_0) \Leftrightarrow \int_{\omega_0}\frac{dp(x/\theta)}{d\theta}\bigg|_{\theta_0}dx \ge \int_{\omega}\frac{dp(x/\theta)}{d\theta}\bigg|_{\theta_0}dx \Leftrightarrow \int_{\omega_0}p'(x/\theta_0)dx \ge \int_{\omega}p'(x/\theta_0)dx,$$

where ω satisfies $P_\omega(\theta_0) = \alpha \Leftrightarrow \int_\omega p(x/\theta_0)dx = \alpha$.

In the generalized N–P lemma, take m = 1 and set $g_0(x) = p'(x/\theta_0)$, $g_1(x) = p(x/\theta_0)$, $C_1 = \alpha$, $k_1 = k$. Then we get

$$\text{inside } \omega_0:\ p'(x/\theta_0) > k\,p(x/\theta_0); \qquad \text{outside } \omega_0:\ p'(x/\theta_0) \le k\,p(x/\theta_0). \qquad (3.53)$$

Finally, $\int_{\omega_0}p'(x/\theta_0)dx \ge \int_\omega p'(x/\theta_0)dx$, where $\omega_0$ and ω satisfy

$$P_{\omega_0}(\theta_0) = P_\omega(\theta_0) = \alpha. \qquad (3.54)$$

Thus the test given by (3.53) and (3.54) is locally most powerful size-α for $H: \theta = \theta_0$ against $\theta > \theta_0$.

Note If a UMP test exists for $H: \theta = \theta_0$ against $\theta > \theta_0$, then the LMP test corresponding to the said problem must be identical to the UMP test. But the converse may not be true.
Example 3.17 $X_1, X_2, \ldots, X_n$ are i.i.d. N(θ, 1). $H: \theta = \theta_0$ against $\theta > \theta_0$.

The LMP test is provided by

$$\omega_0 = \{x: p'(x/\theta_0) > k\,p(x/\theta_0)\} \qquad (3.55)$$

where k is such that

$$\int_{\omega_0}p(x/\theta_0)dx = \alpha. \qquad (3.56)$$

It can be observed that

$$p'(x/\theta_0) > k\,p(x/\theta_0) \Leftrightarrow \frac{p'(x/\theta_0)}{p(x/\theta_0)} > k \Leftrightarrow \frac{d}{d\theta_0}\left[\log_e p(x/\theta_0)\right] > k. \qquad (3.57)$$

Here $p(x/\theta) = (2\pi)^{-n/2}e^{-\frac12\sum(x_i-\theta)^2} \Rightarrow \log p(x/\theta) = \text{const.} - \frac12\sum_1^n(x_i-\theta)^2$

$\Rightarrow \frac{d\log p(x/\theta_0)}{d\theta_0} = \sum_1^n(x_i - \theta_0)$; hence, by (3.57), (3.55)

$$\Leftrightarrow \omega_0 = \{x: \bar{x} > k'\},$$

and (3.56) $\Leftrightarrow P_{\theta_0}\{\bar{x} > k'\} = \alpha$

$$\Leftrightarrow P_{\theta_0}\left\{\sqrt{n}(\bar{x}-\theta_0) > \sqrt{n}(k'-\theta_0)\right\} = \alpha \Rightarrow \sqrt{n}(k'-\theta_0) = s_\alpha,$$

i.e. $k' = \theta_0 + \frac{1}{\sqrt{n}}s_\alpha$. Thus $\omega_0 \Leftrightarrow \omega_0 = \left\{x: \bar{x} > \theta_0 + \frac{1}{\sqrt{n}}s_\alpha\right\}$, which is identical to the UMP test for $H: \theta = \theta_0$ against $\theta > \theta_0$.

General case: Let $X_1, X_2, \ldots, X_n$ be i.i.d. with p.d.f. $f(x/\theta)$. To find the LMP test for $H: \theta = \theta_0$ against $\theta > \theta_0$: here $p(x/\theta) = \prod_{i=1}^n f(x_i/\theta)$, and the LMP test is given by the critical region

$$\omega = \{x: p'(x/\theta_0) > k\,p(x/\theta_0)\}, \text{ where } k \text{ is such that } P_\omega(\theta_0) = \alpha.$$

Now,

$$p'(x/\theta_0) > k\,p(x/\theta_0) \Leftrightarrow \frac{p'(x/\theta_0)}{p(x/\theta_0)} > k \Leftrightarrow \frac{d\log p(x/\theta_0)}{d\theta_0} > k \quad [p(x/\theta) > 0]$$
$$\Leftrightarrow \sum_1^n\frac{f'(x_i/\theta_0)}{f(x_i/\theta_0)} > k \Leftrightarrow \sum_1^n y_i > k, \text{ where } y_i = \frac{f'(x_i/\theta_0)}{f(x_i/\theta_0)}.$$

Now, under H, the $y_i$'s are i.i.d. with

$$E_{\theta_0}\{y_i\} = \int\frac{f'(x_i/\theta_0)}{f(x_i/\theta_0)}f(x_i/\theta_0)dx = \int f'(x/\theta_0)dx = \frac{d}{d\theta_0}\int f(x/\theta_0)dx = \frac{d}{d\theta_0}(1) = 0,$$

$$V_{\theta_0}\{y_i\} = \int\left\{\frac{f'(x_i/\theta_0)}{f(x/\theta_0)}\right\}^2 f(x/\theta_0)dx = \int\left\{\frac{\partial\log f(x/\theta_0)}{\partial\theta_0}\right\}^2 f(x/\theta_0)dx = I(\theta_0) \quad [\text{Fisher's information}].$$

Hence, by the central limit theorem, for large n, $\frac{\sum_1^n y_i}{\sqrt{nI(\theta_0)}} \sim N(0,1)$ under H. So, for large n, the above test can be approximated by

$$\omega = \left\{x: \sum_{i=1}^{n}\frac{f'(x_i/\theta_0)}{f(x_i/\theta_0)} > s_\alpha\sqrt{nI(\theta_0)}\right\}.$$
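For the Cauchy location family used at the start of this subsection, I(θ) = 1/2, so the large-sample LMP (score) test rejects when the score statistic exceeds $s_\alpha\sqrt{n/2}$. A minimal Python sketch (the simulated data, θ₀ and α are illustrative):

    import math
    import numpy as np

    rng = np.random.default_rng(8)
    theta0, alpha, n = 0.0, 0.05, 400                  # illustrative values
    x = theta0 + 0.3 + rng.standard_cauchy(size=n)     # data generated under a nearby alternative

    # score contribution f'(x|theta0)/f(x|theta0) for the Cauchy density 1/(pi*(1+(x-theta)^2))
    score = np.sum(2 * (x - theta0) / (1 + (x - theta0) ** 2))
    cutoff = 1.645 * math.sqrt(n * 0.5)                # s_alpha * sqrt(n * I(theta0)), I = 1/2

    print(score, cutoff, score > cutoff)               # reject H if the score exceeds the cutoff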

Locally Best test: Two-sided case

For testing $H: \theta = \theta_0$ against $\theta \ne \theta_0$ corresponding to the family $\{p(x/\theta),\ \theta \in \Theta\}$, it is known that (except in some cases) there does not exist any test which is UMP against both-sided alternatives, e.g. testing μ = 0 against μ ≠ 0 for N(μ, 1), and testing σ² = 1 against σ² ≠ 1 for N(0, σ²), etc.

In those cases, we can think of a test which is UMP in a neighbourhood of $\theta_0$. Thus a test $w_0$ is said to be locally best (of size α) for $H: \theta = \theta_0$ against $K: \theta \ne \theta_0$ if there exists an ε > 0 for which

(i) $P_{w_0}(\theta_0) = \alpha$;
(ii) $P_{w_0}(\theta) \ge P_w(\theta)\ \forall\theta: |\theta - \theta_0| < \varepsilon$ and ∀w satisfying (i).

Let $p(x/\theta)$ be such that, for a chosen w,

(i) $P'_w(\theta)$ exists in the neighbourhood of $\theta_0$;
(ii) $P''_w(\theta)$ exists and is continuous in the neighbourhood of $\theta_0$.

Then we have, by Taylor's theorem,

$$P_w(\theta) = P_w(\theta_0) + (\theta - \theta_0)P'_w(\theta_0) + \frac{(\theta-\theta_0)^2}{2!}P''_w(\theta^*), \quad |\theta^* - \theta_0| < |\theta - \theta_0|.$$

Let $w_0$ be such that

(i) $P_{w_0}(\theta_0) = \alpha$ (size condition);
(ii) $P'_{w_0}(\theta_0) = 0$ (locally unbiased condition);
(iii) $P''_{w_0}(\theta_0)$ is maximum.

Then we can find an ε > 0 such that, ∀θ: |θ − θ₀| < ε, we have $P_{w_0}(\theta) \ge P_w(\theta)$ ∀w satisfying (i) and (ii).

Now $P_w(\theta) = P_w(\theta_0) + (\theta - \theta_0)P'_w(\theta_0) + \frac{(\theta-\theta_0)^2}{2!}\left\{P''_w(\theta_0) + \eta\right\}$, with $\eta \to 0$ as $\theta \to \theta_0$. To get $P_{w_0}(\theta) \ge P_w(\theta)\ \forall\theta: |\theta-\theta_0| < \varepsilon$, we must have $P''_{w_0}(\theta_0) \ge P''_w(\theta_0)$ [due to the continuity of $P''_w(\theta)$].

Then $w_0$ is called the locally most powerful unbiased size-α test if

(i) $P_{w_0}(\theta_0) = \alpha$;
(ii) $P'_{w_0}(\theta_0) = 0$;
(iii) $P''_{w_0}(\theta_0) \ge P''_w(\theta_0)$ ∀w satisfying (i) and (ii).

Construction

$$P_w(\theta_0) = \int_w p(x/\theta_0)dx, \quad P'_w(\theta_0) = \int_w p'(x/\theta_0)dx, \quad P''_w(\theta_0) = \int_w p''(x/\theta_0)dx.$$

Let us set, in the generalized N–P lemma,

$$g_0(x) = p''(x/\theta_0),\ g_1(x) = p(x/\theta_0),\ g_2(x) = p'(x/\theta_0);\quad c_1 = \alpha,\ c_2 = 0.$$

Then

$$w_0 = \left\{x: p''(x/\theta_0) > k_1 p(x/\theta_0) + k_2 p'(x/\theta_0)\right\},$$

where $k_1$ and $k_2$ are such that $\int_{w_0}g_1(x)dx = \alpha$, $\int_{w_0}g_2(x)dx = 0$. Then we have $\int_{w_0}g_0(x)dx \ge \int_w g_0(x)dx$ provided w satisfies (i) and (ii),

$$\Leftrightarrow P''_{w_0}(\theta_0) \ge P''_w(\theta_0).$$

Example 3.18 $X_1, X_2, \ldots, X_n$ are i.i.d. N(μ, 1). To find the LMPU test for $H: \mu = \mu_0$ against $K: \mu \ne \mu_0$.

Answer Here

$$p(x/\theta) = \left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac12\sum_1^n(x_i-\mu)^2},$$
$$p'(x/\theta) = \left(\frac{1}{\sqrt{2\pi}}\right)^n n(\bar{x}-\mu)e^{-\frac12\sum_1^n(x_i-\mu)^2} = n(\bar{x}-\mu)p(x/\theta),$$
$$p''(x/\theta) = \left[n(\bar{x}-\mu)\right]^2 p(x/\theta) - n\,p(x/\theta).$$

The LMPU size-α test is

$$w_0 = \left\{x: p''(x/\theta_0) > k_1 p(x/\theta_0) + k_2 p'(x/\theta_0)\right\}, \quad \int_{w_0}p(x/\theta_0)dx = \alpha \qquad (3.58)$$

and

$$\int_{w_0}p'(x/\theta_0)dx = 0. \qquad (3.59)$$

$$w_0 = \left\{x: [n(\bar{x}-\mu_0)]^2 - n > k_1 + k_2 n(\bar{x}-\mu_0)\right\} = \left\{x: \left(\sqrt{n}(\bar{x}-\mu_0)\right)^2 > k_1' + k_2'\sqrt{n}(\bar{x}-\mu_0)\right\}$$
$$= \left\{x: y^2 > k_1' + k_2'y\right\}, \quad y = \sqrt{n}(\bar{x}-\mu_0) \sim N(0,1) \text{ under } H.$$

(3.59) ⟺

$$\int_{y^2 > k_1' + k_2'y}y\,N(y/0,1)dy = 0. \qquad (3.60)$$

Now, since N(y/0, 1) is symmetrical about 0, the L.H.S. of (3.60) vanishes whenever the critical region is symmetric about 0; this can be achieved by taking $k_2' = 0$ without affecting the size condition. Then our test reduces to $w_0: \left\{x: y^2 > k_1'\right\} \equiv \{x: |y| > c\}$, and hence (3.58) is equivalent to

$$\int_{|y|>c}N(y/0,1)dy = \alpha \Rightarrow c = s_{\alpha/2}.$$

Then we obtain the LMPU test for $H: \mu = \mu_0$ against $\mu \ne \mu_0$.

A test which is locally most powerful and locally unbiased is called a Type A test and the corresponding critical region $w_0$ is said to be a Type-A critical region.
3.5 Type A1 (Uniformly Most Powerful Unbiased) Test


 
Let $\{p(x/\theta),\ \theta \in \Theta\}$ be a real-parameter family of distributions. Testing problem: $H: \theta = \theta_0$ against $K: \theta \ne \theta_0$. T(X) = T: test statistic.

(i) The right-tail test based on T is UMP for $H: \theta = \theta_0$ against $\theta > \theta_0$ (in most of the cases).
(ii) The left-tail test based on T is UMP for $H: \theta = \theta_0$ against $\theta < \theta_0$ (in most of the cases).

[As for example N(μ, 1), N(0, σ²), …: $T = \sum x_i$, $T = \sum x_i^2$, etc., and for B(n, p): T = x, etc.]

There does not exist a single test which is UMP for $H: \theta = \theta_0$ against $\theta \ne \theta_0$. If $p(x/\theta)$ has a monotone likelihood ratio in T(X), i.e. $\frac{p(x/\theta_1)}{p(x/\theta_0)} \uparrow T(x)$ for $\theta_1 > \theta_0$, then (i) and (ii) are satisfied.

In that case, we try to choose a test $w_0$ for which

(i) $P_{w_0}(\theta_0) = \alpha$;
(ii) $P_{w_0}(\theta) \ge \alpha\ \forall\theta \ne \theta_0$;
(iii) $P_{w_0}(\theta) \ge P_w(\theta)\ \forall\theta \ne \theta_0$, ∀w satisfying (i) and (ii).

Such a test is called the UMPU size-α test for $H: \theta = \theta_0$ against $\theta \ne \theta_0$.

Let $p(x/\theta)$ be such that, for every test w, $\frac{d}{d\theta}[P_w(\theta)]$ exists, and

$$\frac{d}{d\theta}[P_w(\theta)] = \frac{d}{d\theta}\int_w p(x/\theta)dx = \int_w\frac{dp(x/\theta)}{d\theta}dx = \int_w p'(x/\theta)dx. \qquad (3.61)$$

Then unbiasedness of a test w

$$\Rightarrow \frac{d}{d\theta_0}P_w(\theta) = 0. \qquad (3.62)$$

Thus, if a test $w_0$ satisfies (i), (ii) and (iii), then under (3.61) $w_0$ also satisfies (i), (iii) and (3.62). A test satisfying (i), (iii) and (3.62) is called a type-A₁ test.

For the exponential family of distributions, if a type-A₁ test exists, then it must be unbiased. But this is not true in general.
Construction Our problem is to get $w_0$ such that

(i) $\int_{w_0} p(x/\theta_0)\,dx = \alpha$,
(ii) $\int_{w_0} p'(x/\theta_0)\,dx = 0$,
(iii) $\int_{w_0} p(x/\theta)\,dx \geq \int_w p(x/\theta)\,dx$ for all $w$ satisfying (i) and (ii) and all $\theta \neq \theta_0$.

In the generalized N–P lemma, put $g_0 = p(x/\theta)$, $g_1 = p(x/\theta_0)$, $g_2 = p'(x/\theta_0)$, $c_1 = \alpha$, $c_2 = 0$.

Then define $w_0 = \left\{x : p(x/\theta) > k_1\, p(x/\theta_0) + k_2\, p'(x/\theta_0)\right\}$, and hence $\int_{w_0} p(x/\theta)\,dx \geq \int_w p(x/\theta)\,dx$ for all $w$ satisfying (i) and (ii) and all $\theta \neq \theta_0$.

For the exponential family it is always possible to have such a region $w_0$ (which means a type-$A_1$ test exists).

Example 3.19 $X_1, X_2, \ldots, X_n$ are i.i.d. $N(\mu, 1)$. We test $H: \mu = \mu_0$ against $K: \mu \neq \mu_0$.
$$p(x/\theta) = (2\pi)^{-n/2}\, e^{-\frac{1}{2}\sum_1^n (x_i - \mu)^2},$$
$$p'(x/\theta) = (2\pi)^{-n/2} \sum_{i=1}^n (x_i - \mu)\, e^{-\frac{1}{2}\sum_1^n (x_i - \mu)^2} = \sum_i (x_i - \mu)\, p(x/\theta).$$

Then the type-$A_1$ region (test) is given by
$$w_0 = \left\{x : p(x/\theta) > k_1\, p(x/\theta_0) + k_2\, n(\bar{x} - \mu_0)\, p(x/\theta_0)\right\}.$$

Since
$$\frac{p(x/\theta)}{p(x/\theta_0)} = \frac{e^{-\frac{1}{2}\sum (x_i - \mu)^2}}{e^{-\frac{1}{2}\sum (x_i - \mu_0)^2}} = \frac{e^{-\frac{n}{2}(\bar{x} - \mu)^2}}{e^{-\frac{n}{2}(\bar{x} - \mu_0)^2}} = e^{\frac{n}{2}(\mu - \mu_0)\{2\bar{x} - (\mu_0 + \mu)\}},$$
$$\Rightarrow\; w_0 = \left\{x : e^{\delta t} > k_1' + k_2'\, t\right\}, \qquad t = \sqrt{n}(\bar{x} - \mu_0),\ \ \delta = \sqrt{n}(\mu - \mu_0),$$
where $k_1'$ and $k_2'$ are such that
$$\int_{w_0} p(x/\theta_0)\,dx = \alpha, \qquad \int_{w_0} n(\bar{x} - \mu_0)\, p(x/\theta_0)\,dx = 0,$$
$$\Leftrightarrow \int_{w_0} N(t/0, 1)\,dt = \alpha \qquad (3.63)$$
$$\Leftrightarrow \int_{w_0} t\, N(t/0, 1)\,dt = 0 \qquad (3.64)$$

Writing $g(t) = e^{\delta t} - k_1' - k_2'\, t$, we have $g'(t) = \delta e^{\delta t} - k_2'$ and $g''(t) = \delta^2 e^{\delta t} > 0$ for all $t$, so $y = g(t)$ has a single (global) minimum. Now, if we take $\alpha < 0.5$, because of (3.63) and since the distribution of $t$ is symmetric about 0 under $H$, the curve crosses the $t$-axis at two points $c_1 < c_2$: $g(t) > 0$ for $t < c_1$ and for $t > c_2$, and $g(t) \leq 0$ otherwise.

[Figure: the convex curve $y = g(t)$ cutting the $t$-axis at $c_1$ and $c_2$.]

Hence $w_0$ is equivalent to $w_0 = \{x : t < c_1 \text{ or } t > c_2\}$, and (3.63), (3.64) become
$$\int_{t < c_1 \,\cup\, t > c_2} N(t/0, 1)\,dt = \alpha \quad \text{and} \quad \int_{t < c_1 \,\cup\, t > c_2} t\, N(t/0, 1)\,dt = 0. \qquad (3.65)$$

Now, as $T \sim N(0, 1)$, we take $w_0$ as
$$w_0 = \{x : t < -c \text{ or } t > c\}, \qquad (3.66)$$
where $c$ is such that
$$\int_{|t| > c} N(t/0, 1)\,dt = \alpha \;\Rightarrow\; c = \tau_{\alpha/2}. \qquad (3.67)$$

Here (3.65) is automatically satisfied. Hence the test given by (3.66) and (3.67) is type-$A_1$ (which is UMPU).
Example 3.20 $X_1, X_2, \ldots, X_n$ are i.i.d. $N(0, \sigma^2)$.

Testing problem: $H: \sigma^2 = \sigma_0^2$ against $K: \sigma^2 \neq \sigma_0^2$.
$$p(x/\theta) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{-\frac{1}{2\sigma^2}\sum_i^n x_i^2},$$
$$p'(x/\theta) = \left(\frac{\sum_i x_i^2}{\sigma^2} - n\right)\frac{1}{2\sigma^2}\, p(x/\theta).$$

Thus,
$$w_0 = \left\{x : p(x/\theta) > k_1\, p(x/\theta_0) + k_2 \left(\frac{\sum_i x_i^2}{\sigma_0^2} - n\right)\frac{1}{2\sigma_0^2}\, p(x/\theta_0)\right\}
= \left\{x : \frac{p(x/\theta)}{p(x/\theta_0)} > k_1' + k_2'\, t\right\}, \qquad t = \frac{\sum_i x_i^2}{\sigma_0^2}.$$

As
$$\frac{p(x/\theta)}{p(x/\theta_0)} = \left(\frac{\sigma_0}{\sigma}\right)^n e^{\frac{\sum_i x_i^2}{2\sigma_0^2}\left(1 - \frac{\sigma_0^2}{\sigma^2}\right)} = \left(\frac{\sigma_0}{\sigma}\right)^n e^{\frac{\delta t}{2}}, \qquad \delta = 1 - \frac{\sigma_0^2}{\sigma^2},$$
$$w_0 = \left\{x : \left(\frac{\sigma_0}{\sigma}\right)^n e^{\frac{\delta t}{2}} > k_1' + k_2'\, t\right\}.$$

Now, as before, the curve $y = g(t) = \left(\frac{\sigma_0}{\sigma}\right)^n e^{\frac{\delta t}{2}} - k_1' - k_2'\, t$ has a single minimum. Here $P\{T > 0/\theta\} = 1$, so only the part of the curve with $t > 0$ is relevant.

[Figure: the convex curve $y = g(t)$, for $t > 0$, cutting the $t$-axis at $d_1$ and $d_2$.]

This means there exist $d_1$ and $d_2$ such that $w_0$ is equivalent to $w_0 = \{x : t < d_1 \text{ or } t > d_2\}$. Here $d_1$ and $d_2$ are such that
$$\int_{w_0} p(x/\theta_0)\,dx = \alpha \;\Leftrightarrow\; \int_{d_1}^{d_2} f_{\chi^2_n}(t)\,dt = 1 - \alpha \qquad (3.68)$$
and
$$\int_{w_0} p'(x/\theta_0)\,dx = 0 \;\Leftrightarrow\; \int_{t < d_1 \,\cup\, t > d_2} (t - n)\, f_{\chi^2_n}(t)\,dt = 0. \qquad (3.69)$$

(3.69) $\Leftrightarrow \int_{t < d_1 \cup t > d_2} t\, f_{\chi^2_n}(t)\,dt = n \int_{t < d_1 \cup t > d_2} f_{\chi^2_n}(t)\,dt = n\alpha$, by (3.68),
$$\Leftrightarrow \int_{d_1}^{d_2} t\, f_{\chi^2_n}(t)\,dt = (1 - \alpha)\, n \;\Leftrightarrow\; \int_{d_1}^{d_2} f_{\chi^2_{n+2}}(t)\,dt = 1 - \alpha. \qquad (3.70)$$

Thus the UMPU (a type-$A_1$) size-$\alpha$ test is
$$w_0 = \{x : t < d_1 \text{ or } t > d_2\} \quad \text{such that} \quad P\left(d_1 < \chi^2_n < d_2\right) = 1 - \alpha \ \text{ and } \ P\left(d_1 < \chi^2_{n+2} < d_2\right) = 1 - \alpha.$$

Note that in this example the type-$A_1$ test coincides with the type-A test.
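The pair of conditions $P(d_1 < \chi^2_n < d_2) = 1 - \alpha$ and $P(d_1 < \chi^2_{n+2} < d_2) = 1 - \alpha$ has no closed form, but it is easy to solve numerically. The following is a rough sketch (ours, assuming Python with scipy): treat $d_1$ as the free variable, recover $d_2$ from the $\chi^2_n$ condition, and search for the value of $d_1$ at which the $\chi^2_{n+2}$ condition also holds.

```python
# Hypothetical numerical solution of (3.68) and (3.70) for the UMPU
# two-sided chi-square test of Example 3.20.
from scipy.stats import chi2
from scipy.optimize import brentq

n, alpha = 10, 0.05

def d2_from_d1(d1):
    # choose d2 so that P(d1 < chi2_n < d2) = 1 - alpha
    return chi2.ppf(chi2.cdf(d1, n) + 1 - alpha, n)

def gap(d1):
    # residual of the second condition P(d1 < chi2_{n+2} < d2) = 1 - alpha
    d2 = d2_from_d1(d1)
    return chi2.cdf(d2, n + 2) - chi2.cdf(d1, n + 2) - (1 - alpha)

# d1 must lie below the lower alpha point of chi2_n, so bracket the root there
d1 = brentq(gap, 1e-6, 0.999 * chi2.ppf(alpha, n))
d2 = d2_from_d1(d1)
print(d1, d2)                                               # unbiased (unequal-tail) cut-offs
print(chi2.ppf(alpha / 2, n), chi2.ppf(1 - alpha / 2, n))   # equal-tailed cut-offs, for comparison
```

The unbiased cut-offs differ somewhat from the equal-tailed pair, which is why the equal-tailed chi-square test is only approximately unbiased.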



 
Example 3.21 $X_1, X_2, \ldots, X_n$ are i.i.d. with $p(x/\theta) = \theta e^{-\theta x}$. Find the Type-A and Type-$A_1$ tests for $H: \theta = \theta_0$ against $\theta \neq \theta_0$.

Answer Proceed as in Examples 3.19 and 3.20 and hence get
$$w_0 = \left\{x : \sum_{1}^{n} X_i < c_1 \ \text{ or } \ \sum_{1}^{n} X_i > c_2\right\},$$
where $c_1$ and $c_2$ are such that
$$P\left(2\theta_0 c_1 < \chi^2_{2n} < 2\theta_0 c_2\right) = 1 - \alpha \quad \text{and} \quad P\left(2\theta_0 c_1 < \chi^2_{2n+2} < 2\theta_0 c_2\right) = 1 - \alpha.$$
Chapter 4
Likelihood Ratio Test

4.1 Introduction

In the previous chapter we have seen that UMP or UMP-unbiased tests exist only
for some special families of distributions, while they do not exist for other families.
Further, computations of UMP-unbiased tests in K-parameter family of distribution
are usually complex. Neyman and Pearson (1928) suggested a simple method for
testing a general testing problem.
Consider $X \sim p(x|\theta)$, where $\theta$ is a real parameter or a vector of parameters, $\theta \in \Theta$.
A general testing problem is
$$H: \theta \in \Theta_0 \quad \text{against} \quad K: \theta \in \Theta_1.$$
Here, $H$ and $K$ may be treated as subsets of $\Theta$; they are such that $H \cap K = \phi$ and $H \cup K \subseteq \Theta$. Given that $X = x$, $p(x|\theta)$ is a function of $\theta$ and is called the likelihood function. The likelihood test for $H$ against $K$ is provided by the statistic
$$L(x) = \frac{\displaystyle\sup_{\theta \in H} p(x|\theta)}{\displaystyle\sup_{\theta \in H \cup K} p(x|\theta)},$$
which is called the likelihood ratio criterion for testing $H$ against $K$. It is known that
(i) $p(x|\theta) \geq 0$ for all $\theta$,
(ii) $\sup_{\theta \in H} p(x|\theta) \leq \sup_{\theta \in H \cup K} p(x|\theta)$.

Obviously $0 \leq L(x) \leq 1$. The numerator in $L(x)$ measures the best explanation that the observation $X$ comes from some population under $H$, and the denominator


measures the best explanation of X to come from some population covered under
H [ K. Higher values of the numerator correspond to the better explanation of
X given by H compared to the overall best possible explanation of X, which results
in larger values in Lð xÞ leading to acceptance of H. That is, Lð xÞ would be larger
under H than under K. Indeed, smaller values of Lð xÞ will lead to the rejection of
H. Hence, our test procedure is:
Reject H iff Lð xÞ\C
and accept H otherwise,
where C is such that PfLð xÞ\CjH g ¼ a 2 ð0; 1Þ:
If the distribution of $L(x)$ is continuous, then the size $\alpha$ is exactly attained and no randomization on the boundary is needed. If the distribution is discrete, the size may not attain $\alpha$ and one may require randomization. In this case, we obtain $C$ from the relation
$$P\{L(x) < C \,|\, H\} \leq \alpha.$$
Here, we reject $H$ if $L(x) < C$, accept $H$ if $L(x) > C$, and reject with probability $a$ if $L(x) = C$, where the randomization probability $a$ is chosen so that
$$P\{L(x) < C \,|\, H\} + a\, P\{L(x) = C \,|\, H\} = \alpha.$$
The likelihood ratio tests are useful, especially when h is a vector of parameters
and the testing involves only some of them. This test criterion is very popular
because of its computational simplicity. Moreover, this criterion proves to be a
powerful alternative for testing vector valued parameters that involve nuisance
parameters. Generally, the likelihood ratio tests result in optimal tests, whenever
they exist. An LR test is generally UMP, if an UMP test at all exists. In many cases
the LR tests are unbiased, although this is not universally true. However, it is
difficult to compute the exact null distribution of the test statistic Lð xÞ in many
cases. Therefore, a study of large sample properties of Lð xÞ becomes necessary
where maximum likelihood estimators follow normal distribution under certain
regularity conditions. We mention the following large sample property of the
likelihood ratio test statistic without proof.
Under $H$, the statistic $-2\log_e L(x)$ is asymptotically distributed as $\chi^2$ with degrees of freedom equal to the difference between the number of independent parameters in $\Theta$ and the number in $\Theta_0$.
Drawback: Likelihood ratio test is constructed completely by intuitive argu-
ment. So, it may not satisfy all the properties that are satisfied by a test obtained
from N–P theory; it also may not be unbiased.
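As a concrete illustration of this large-sample calibration (our own sketch, not an example from the text): for $n$ Poisson counts and $H: \lambda = \lambda_0$ against $K: \lambda \neq \lambda_0$, the statistic $-2\log_e L(x) = 2[\ell(\hat\lambda) - \ell(\lambda_0)]$ with $\hat\lambda = \bar{x}$ is referred to a $\chi^2_1$ distribution. The code assumes Python with numpy/scipy and uses made-up data.

```python
# Illustrative large-sample LR test for H: lambda = lambda0 in a Poisson model.
import numpy as np
from scipy.stats import poisson, chi2

rng = np.random.default_rng(1)
lambda0, n, alpha = 2.0, 200, 0.05
x = rng.poisson(lam=2.3, size=n)              # data generated away from H

lam_hat = x.mean()                            # unrestricted MLE
loglik = lambda lam: poisson.logpmf(x, lam).sum()
lr_stat = -2 * (loglik(lambda0) - loglik(lam_hat))   # -2 log L(x)

p_value = chi2.sf(lr_stat, df=1)              # df = 1 free parameter minus 0 under H
print(lr_stat, p_value, p_value < alpha)
```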

4.1.1 Some Selected Examples

Example 4.1 Let X be a binomial bðn; hÞ random variable. Find the size-a likeli-
hood ratio test for testing H : h  h0 against K : h [ h0

Solution Here, H0 ¼ fh : 0  h  h0 g and H ¼ fh : 0  h  1g.


The likelihood ratio test statistic is given as

Sup pðxjhÞ Sup pðxjhÞ


h2H h  h0
L ð xÞ ¼ ¼
Sup pðxjhÞ Sup pðxjhÞ
H[K H
 
n x
Sup h ð1  hÞnx
h  h0 x
¼  
n x
Sup h ð1  hÞnx
H x
_
The MLE of h for h 2 H is h ¼ nx.
For, h 2 H0 , we have

_ x x
hH ¼ if  h0
n n
x
¼ h0 if [ h0
n

Thus, we have
     
n n x x
nx x nx
Sup h ð 1  hÞ ¼
x
1
H x x n n
8 
> n    nx
  >
> x x
1  nx if x
 h0
n x < x n n
nx
Sup h ð 1  hÞ ¼  
h  h0 x >
> n x
>
: h ð1  h0 Þnx if x
[ h0
x 0 n

(
1 if x
n  h0
So, Lð xÞ ¼ hx0 ð1h0 Þnx
if x
[ h0
ðnxÞ ð1nxÞ
x nx
n

It can be observed that $L(x) \leq 1$ when $x > n\theta_0$ and $L(x) = 1$ when $x \leq n\theta_0$. This shows that $L(x)$ is a decreasing function of $x$. Thus $L(x) < C$ iff $x > C'$, and the likelihood ratio test rejects $H$ if $x > C'$, where ideally $C'$ satisfies $P_{\theta_0}(X > C') = \alpha$. Since $X$ is a discrete random variable, $C'$ is obtained such that
$$P_{\theta_0}(X > C') \leq \alpha < P_{\theta_0}(X > C' - 1).$$
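The cut-off $C'$ can be read directly off the binomial survival function; a small sketch (ours, Python with scipy, hypothetical $n$, $\theta_0$ and $\alpha$):

```python
# Finding the critical value C' of Example 4.1: the smallest integer c with
# P_{theta0}(X > c) <= alpha.
from scipy.stats import binom

n, theta0, alpha = 20, 0.3, 0.05
c = next(c for c in range(n + 1) if binom.sf(c, n, theta0) <= alpha)
print(c, binom.sf(c, n, theta0), binom.sf(c - 1, n, theta0))
# reject H: theta <= theta0 when the observed count x exceeds c
```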

Example 4.2 Let X1 ; . . .; Xn be a random sample from a normal distribution with


mean l and variance r2 . Find out the likelihood ratio test of
(a) H : l ¼ l0 aganist K : l 6¼ l0 when r2 is known.
(b) H : l ¼ l0 aganist K : l 6¼ l0 when r2 is unknown.

Solution
(a) Here, H0 ¼ fl0 g; H ¼ fl : 1\l\1g
P
n
 n=2 2r2 1
ðXi lÞ2
pðxjhÞ ¼ pðxjlÞ ¼ ð2pÞn=2 r2 e i¼1

The likelihood ratio test statistic is given as

Sup pðxjlÞ Sup pðxjlÞ


H H0
Lð xÞ ¼ ¼
Sup pðxjlÞ Sup pðxjlÞ
H[K H

The maximum likelihood estimate of l for l 2 H is x.


P
n
 1 ðXi l0 Þ2
n=2 2 n=2 2r2 i¼1
So, Sup pðxjlÞ ¼ ð2pÞ ðr Þ e
H0
P
n
 1
ðXi xÞ2
n=2
and Sup pðxjlÞ ¼ ð2pÞn=2 ðr2 Þ
2r2
e i¼1 :
H
This gives
Pn
ðXi l0 Þ2
e2r2
1
i¼1 2
¼ e2r2 ðxl0 Þ
n
L ð xÞ ¼ Pn
 1
ðXi xÞ2
e 2r2 i¼1

The rejection region Lð xÞ\C gives


n
 2 ðx  l0 Þ2 \C1
2r
pffiffiffi
nðx  l0 Þ
or; [ C2
r

Thus, the likelihood ratio test is given as


( pffiffi

1 if nðxrl0 Þ [ C2
/ðxÞ ¼
0 Otherwise

where the constant C2 is obtained by the size condition

El0 ½/ðxÞ ¼ a

pffiffiffi
nðx  l0 Þ

or; P l0
r [ C2 ¼ a

pffiffi
Now, under H : l ¼ l0 , the statistic nðxrl0 Þ follows N ð0; 1Þ distribution. Since
the distribution is symmetrical about 0, C2 must be the upper a=2—point of the
distribution. Finally, the test is given as
( pffiffi

1 if nðxrl0 Þ [ sa=2
/ðxÞ ¼
0 Otherwise

(b) Here, H0 ¼ ðl; r2 Þ : l ¼ l0 ; r2 [ 0
 
H¼ l; r2 : 1\l\1; r2 [ 0

In this case,
  Pn
  1 n=2 2r2 ðXi lÞ
1 2

Sup p xjl; r 2
¼ Sup e i¼1

H0 H0 2pr2

MLE of r2 given l ¼ l0 is given as


1X n
s20 ¼ ð X i  l0 Þ 2 :
n i¼1

 n=2
e 2 .
n
This gives Sup pðxjl0 ; r2 Þ ¼ 1
2ps02
H0
 n=2 Pn
ðXi lÞ2
e2r2
1
Further, Sup pðxjl; r2 Þ ¼ Sup pðxjl; r2 Þ ¼ Sup p 1
2pr2
i¼1
H l;r2 l;r2
Pn
The MLE of l and r are given as x and2 1
i¼1
 Þ2 ¼ ðn1Þ s2 ; where
ðXi  X
Pn n n
s ¼ n1  2
i¼1 ðXi  X Þ . We have,
2 1

!n=2

 1
e 2
n
Sup p xjl; r2 ¼
l;r2 2p ðn1Þ 2
n s

Hence, likelihood ratio test statistic is given as


 n2
e2
1  n n=2
2ps20 ðn  1Þs2
Lð xÞ ¼  n2 ¼
ns20
e2
1 n
ðn1Þ 2
2p n s
!n=2 " #n=2
ðn  1Þs2 ðn  1Þs2
¼ Pn ¼ Pn
i¼1 ðX i  l 0 Þ2 nðx  l0 Þ þ
2  2
i¼1 ðXi  X Þ
" #n=2
ðn  1Þs2
¼
nðx  l0 Þ2 þ ðn  1Þs2

The LR critical region is given as

Lð xÞ\C
nðx  l0 Þ2
or; 2
[ C1
pffiffiffi s
nðx  l0 Þ
or; [ C2 :
s

The likelihood ratio test is given as


( pffiffi

1 if nðxsl0 Þ
/ðx; sÞ ¼ [ C2 ;
0 otherwise

where the constant $C_2$ is obtained by the size condition
$$P_{\mu_0}\left[\left|\frac{\sqrt{n}(\bar{x} - \mu_0)}{s}\right| > C_2\right] = \alpha.$$
Now under $H: \mu = \mu_0$ the statistic $\frac{\sqrt{n}(\bar{x} - \mu_0)}{s}$ is distributed as $t$ with $(n-1)$ d.f., which is symmetric about 0. Hence $C_2 = t_{\frac{\alpha}{2}, n-1}$. Finally, the test is given as
$$\phi(x, s) = \begin{cases} 1 & \text{if } \left|\dfrac{\sqrt{n}(\bar{x} - \mu_0)}{s}\right| > t_{\frac{\alpha}{2}, n-1}, \\ 0 & \text{otherwise.} \end{cases}$$
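This is the ordinary two-sided one-sample $t$-test, so the LR procedure can be carried out with any standard routine. A minimal sketch (ours; Python with scipy and illustrative data):

```python
# Two-sided one-sample t-test of H: mu = mu0 (the LR test of Example 4.2(b)).
import numpy as np
from scipy import stats

x = np.array([4.8, 5.3, 5.1, 4.6, 5.7, 5.0, 4.9, 5.4])   # illustrative data
mu0, alpha = 5.0, 0.05

t_stat = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(len(x)))
crit = stats.t.ppf(1 - alpha / 2, df=len(x) - 1)
print(abs(t_stat) > crit)                          # reject H iff True

# the same decision via the built-in routine
print(stats.ttest_1samp(x, mu0).pvalue < alpha)
```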

Example 4.3 X1 ; X2 . . .Xn are i.i.d N ðl; r2 Þ. Derive LR test for H : l  l0 against
K : l [ l0 .
Answer h ¼ ðl; r2 Þ; H ¼ h : 1\l\1; r2 [ 0
P
n
    n 2r2
1
ðXi lÞ2
Here, pðx=hÞ ¼ p x l; r2 ¼ ð2pÞ2 ðr2 Þ 2 e
n
1

Likelihood ratio criterion


     
Sup x l; r2 Sup x l; r2
H l  l0
Lð xÞ ¼ ¼ ½H [ K ¼ H ¼ ðl  l0 Þ[ðl [ l0 Þ
Supðx=l; r2 Þ Supðx=l; r2 Þ
H[K H
  
p x l ^2H
^H ; r
¼
pðx=^ ^2 Þ
l; r
 
where l ^2H : MLE ðl; r2 Þ under H
^H ; r
^; r
ðl ^2 Þ: MLE of ðl; r2 Þ under H [ K (in the unrestricted case).

For ðl; r2 Þ 2 H [ K, i.e. H, we have

_ _2 1Xn
ð n  1Þ 2
l ¼ x; r ¼ ðxi  xÞ2 ¼ s
n 1 n

For ðl; r2 Þ 2 H, we have

_ n1 2
lH ¼ x if x  l0 and r
^2H ¼ s if x  l0
n
1X n
¼ l0 if x [ l0 and r
^2 H ¼ s20 ¼ ðxi  l0 Þ2 if x [ l0 :
n 1
 .  n  n
_ 2 2 n
^2 ¼ ð2pÞ 2 n1
Thus, we have p x l; r n s e2

 .   n2
_ n2 n  1 2 n
^ H ¼ ð2pÞ
p x lH ; r 2
s e 2 if x  l0
n
n n n
¼ ð2pÞ2 s20 2 e2 if x [ l0

So, Lð xÞ ¼ 1 if x  l0

n1 2 n=2
n s
¼ ; if x [ l0
s20

Hence, we reject H if Lð xÞ\C; where Cð\1Þ is such that

PfLð xÞ\C=l ¼ l0 g ¼ a 2 ð0; 1Þ ð4:1Þ


n1 2 n=2
s
, n2 \C and x [ l0
s0
" #n=2
ðn  1Þs2
, \C and x [ l0
ðn  1Þs2 þ nðx  l0 Þ2
nðx  l0 Þ2
, 1þ [ C 0 and x [ l0
ðn  1Þs2
pffiffiffi
nðx  l0 Þ
, [ C00 ð4:2Þ
s
npffiffi 00
o
Thus, (4.1) is , P nðxsl0 Þ [ C =l ¼ l0 ¼ a

8 9
>
> p ffiffi
ffi >
>
< nðx  l Þ 00
=
, P rP C
ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi [ =l ¼ l
0
¼a
>
> n 0>
>
: ðx xÞ2
1 i ;
n1
00
) C ¼ ta;n1
pffiffi
Hence, reject H iff nðxsl0 Þ [ ta;n1
) Test can be carried out by using students’ t-statistic.
Example 4.4 X1 ; X2 . . .Xn are i.i.d N ðl; r2 Þ; 1\l\1; r2 [ 0: Find the LR test
for

I. H : r2 ¼ r20 against K : r2 [ r20


II. H : r2 ¼ r20 against K : r2 6¼ r20
Answer I. h ¼ ðl; r2 Þ
Pn
   n  12 ðxi lÞ2
p x=l; r2 ¼ Likelihood function ¼ ð2pÞn=2 r2
=2 2r
e i¼1

 n=2  1 ½P ðxi xÞ2 þ nðxlÞ2 


¼ ð2pÞn=2 r2 e 2r2

Likelihood ratio is:


     
Supl;r2 ¼r20 p x l; r2 p x l^H ; r20
Lð xÞ ¼ ¼ ;
Supl;r2  r20 pðx=l; r2 Þ ^2 Þ
l; r
pðx=^

^ = (MLE of l overall h : r2  r20 ) = x


where l
 n1
s2 ; if n1
n s  r0
2 2
^ ¼ MLE of r ¼
r 2 2 n
r0 ; if n s \r0
2 n1 2 2

^H = MLE of l under 8
l H = x
>
> 
ðn1ÞS2
< ðr20 Þn=2 e 2r20
Hence we get, Lð xÞ ¼ Þ 2 n=2 n
; if n1 2
n s  r20
>
> ððn1
n s Þ e 2
: 1; if n1 s2 \r2
n 0

Now we apply LR technique: reject H iff

Lð xÞ\C: ð4:3Þ

where C (<1) is such that


PfLð xÞ\C=H g ¼ a 2 ð0; 1Þ ð4:4Þ
n1 2 n2 ðn1Þs2
s  2 ðn  1Þs2
ð 1Þ , n 2 e 2r0 \C 0 iff n
r0 r20

, u2 e2 \C 
n u
if u  n ð4:5Þ

Writing gðuÞ ¼ u2 e2 ; u  0


n u

n
n n u2 u
g0 ðuÞ ¼ u21 e2  e2 ¼ 0
u

2 2
)nu¼0,u¼n

The curve y = g(u) has a maximum and the shape of the curve is

g(u)=c'

u-1 u= n u0

From the graph, it is clear that gðuÞ\C  , u\u1 or u [ u0 where


0\u1 \n\u0 :
Hence, (4.5) , u [ u0 and (4.4) , PH ðU [ u0 Þ ¼ a


2 As under H;
, P vn1 [ u0 ¼ a
U  v2n1
) u0 ¼ v2n1;a
P
Thus, LR test is: reject H if ni¼1 ðxi  xÞ2 [ r20 v2n1;a :
ðn1Þs2

Supl;r2 ¼r2 pðx=l;r2 Þ ðr2 Þn=2 e 2r20
¼ K u2 e  2
n u
II. Lð xÞ ¼ Sup 0pðx=l;r2 Þ ¼ n1 0
n=2
l;r2 ð n s2 Þ en=2 n n U o
Our test is to reject H if u2 e2 \C0 ; where C 0 is such that PH U 2 e 2 \C0 ¼ a:
n u

From the graph, we observe that the line y = C′ cuts the curve y = g(u) at two
points: u1 and u0 such that 0\ u1 \n\ u0 . Hence, the test is equivalent to reject
H iff u\u1 or u [ u0 ; where
   
PH v2n1 \u1 þ PH v2n1 [ u0 ¼ a:

Although the $\chi^2$ distribution is not symmetric, for the sake of simplicity equal error probabilities $\alpha/2$ are attached to both the left- and right-sided critical regions. Thus $u_1 = \chi^2_{n-1,\, 1-\alpha/2}$ and $u_0 = \chi^2_{n-1,\, \alpha/2}$.
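With this equal-tailed choice, the test of part II is one line of arithmetic plus two chi-square quantiles; a sketch (ours, Python with scipy, illustrative data and a hypothetical $\sigma_0^2$):

```python
# Equal-tailed chi-square test of H: sigma^2 = sigma0^2 (Example 4.4, part II).
import numpy as np
from scipy.stats import chi2

x = np.array([9.8, 10.4, 10.1, 9.5, 10.9, 10.2, 9.7, 10.6])   # illustrative data
sigma0_sq, alpha = 0.16, 0.05
n = len(x)

u = (n - 1) * x.var(ddof=1) / sigma0_sq      # (n-1)s^2 / sigma0^2 ~ chi2_{n-1} under H
u1 = chi2.ppf(alpha / 2, n - 1)              # lower cut-off
u0 = chi2.ppf(1 - alpha / 2, n - 1)          # upper cut-off
print(u, (u < u1) or (u > u0))               # reject H iff True
```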

Example
 4.5 Let X1 ; X2 ; . . .Xn1 and Y1 ; Y2 ; . . .Yn2 be two independent samples drawn
from N l1 ; r21 and N l2 ; r22 ; respectively. Find out the likelihood ratio test of
(a) H : r21 ¼ r22 against K : r21 [ r22
(b) H : r21 ¼ r22 against K : r21 6¼ r22 ; when l1 and l2 are unknown
 
(a) Here, h ¼ l1 ; l2 ; r21 ; r22
 
H0 ¼ l1 ; l2 ; r21 ; r22 : 1\l1 ; l2 \1; r21 ¼ r22 ¼ r2 [ 0
 
H ¼ l1 ; l2 ; r21 ; r22 : 1\li \1; r2i [ 0; i ¼ 1; 2
P
n1
2
P
n2
2
n1 þ n2
1
 1
ðXi l1 Þ  ðyi l2 Þ
pðx; yjhÞ ¼ ð2pÞ rn 1 n2 2r2 2r2
1 r2 e
2 1 i¼1 2 i¼1

 _ 
Sup pðx; yjhÞ
p x; y hH
h2H
Lðx; yÞ ¼ ¼  _
Sup pðx; yjhÞ
h2H[K
p x; y h
_
where hH = MLE of h under H
_
h ¼ MLE of h under H [ K:

Under H, we obtain MLEs


P P
_ _ _2 ðxi  xÞ2 þ ðyi  yÞ2
l1H ¼ x; l2H ¼ y; rH ¼
n 1 þ n2

Under H [ K, MLEs are


8 P
>
>
_ _ _2
l1 ¼ x; l2 ¼ y; r1 ¼ n11 ðxi  xÞ2
>
>
>
< _2 P _2
r
r2 ¼ n12 ðyi  yÞ2 if _12  1
> r2
>
>
>
> _ _ _2
_2
r1
: l1 ¼ x; l2 ¼ y; rH if _2 \1
r2

 _
 n1 þ n2  n1 þ2 n2 n þ n
_ 2
Hence, p x; yjhH ¼ ð2pÞ 2 rH e 2
1 2

8
n þn  n21  2 n22 n þ n
>
> ð Þ 1 2 2 _2
r
_
r2 e 2
1 2
_2
r1
1
 _
 < 2p 1 if _2
r2
and p x; yjh ¼ n1 þ n2  2   1 2 2
n þn
>
> n1 þ n2 _2
r1
: ð2pÞ 2 r _
H e 2 if _2
\1
r2
8
> _2 n21 _2 n22
>
> r1 r2 _2
r1
>
<   n1 þ n2 if 1
_2
_2 r2
Therefore, Lðx; yÞ ¼
2
rH
>
>
>
>
_2
r1
: 1 if _2
\1
r2

8 nP on1=2 nP on2=2
>
> ðxi xÞ2 ðyi yÞ2 P
>
> ðxi  xÞ2=
>
>
n1 n2
P n1  1
> if
< nP
> 2
P
ðxi xÞ þ
2 oðn1 þ n2 Þ=2
ðyi yÞ ðyi  yÞ2=
¼ n1 þ n2
n2
>
> P
>
> ðxi  xÞ2=
>
> n1 \1
>
> P
:1 if
ðyi  yÞ2=
n2
8
> P n1=2
>
> 2
P
>
> P ðxi xÞ2 ðxi  xÞ2=
>
>
n1 þ n2
ðyi yÞ
> ðn1 þ n2 Þ 2
> if P n1  1
>
< n 1= n 2=
n 2 n 2  P n1 þ2 n2 ðyi  yÞ2=
ðxi xÞ2
¼ 1þP n2
1 2

>
> ðyi yÞ2
P
>
>
>
> ðxi  xÞ2=
>
> n1 \1
>
> 1 if P
: ðyi  yÞ2=
n2

Now, under the null hypothesis H : r21 ¼ r22 ¼ r2 ; consider the statistic
P .
ðxi  xÞ2
ðn1  1Þ s21
F¼P 2.
¼ 2  Fn1 1;n2 1 :
ðyi  yÞ s2
ð n2  1Þ

On writing Lð xÞ in terms of F, we have


8
>  n1=2
>
>
>
> ðn þ n Þ=2 n1 1
F
< ð n1 þ n2 Þ 1 2

n2 1
if F n1 ðn2 1Þ
Lðx; yÞ ¼ n 1=2 n 2=2 ðn1 þ n2 Þ=2 n2 ðn1 1Þ
n1 n2
>
> 1 þ
n1 1
F
>
>
n2 1
>
:1 if F \ nn12 ððnn21 1Þ
1Þ

Now we apply LR technique: reject

H iff L ðx; yÞ\C ð4:6Þ

where C(<1) is such that



PfLðx; yÞ\CjH g ¼ a 2 ð0; 1Þ: ð4:7Þ


 n1 =2
n1 1
n2 1 F n1 ð n2  1Þ
ð4:6Þ )  ðn1 þ n2 Þ=2 \C1 if F ;
n1 1
n2 ð n1  1Þ
1þ n2 1 F

2  n1 =2 3
n1 1
6 n2 1F n1 ðn2 1Þ7
where C1 is such that P4 ðn1 þ n2 Þ=2 \C1 and F n2 ðn1 1Þ5 ¼ a:
n 1
1 þ n1 1F
 n1=2
2

n1 1
n2 1F
Writing gðF Þ ¼  ðn1 þ n2 Þ=2
n 1
1 þ n1 1F
2

 n21 1  n1 þ2 n2  n21


0 n1 n1  1 n1  1 n1  1 n1  1 n1 þ n2
) g ðF Þ ¼ F 1þ F  F
2 n2  1 n2  1 n2  1 n2  1 2
 n1 þ2 n2 1
n1  1 n1  1
1þ F ¼0
n2  1 n2  1
   
n1  1 n1  1
) n1 1 þ F  ðn 1 þ n 2 Þ F¼0
n2  1 n2  1
n1 ðn2  1Þ
)F¼
n2 ðn1  1Þ

The curve $g(F)$ has a single maximum at $F = \frac{n_1(n_2 - 1)}{n_2(n_1 - 1)}$, and the shape of the curve is as in the earlier examples. From the graph we observe that $g(F) < C_1$ together with $F \geq \frac{n_1(n_2 - 1)}{n_2(n_1 - 1)}$ implies $F > d_0 > \frac{n_1(n_2 - 1)}{n_2(n_1 - 1)}$. The constant $d_0$ is obtained by the size condition

Pr21 ¼r22 fF [ d0 g ¼ a

This gives d0 ¼ Fn1 1;n2 1;a


(
s21
1 if [ Fn1 1;n2 1;a
) LR test is given by /ðx; yÞ ¼ s22
0 otherwise

(b) Similarly, for testing H : r21 ¼ r22 against K : r21 6¼ r22 the LR test is equivalent
to

s21 s21
\d 1 or [ d0 :
s22 s22

These constants d 1 and d0 are obtained by the size condition

PH fF\d 1g ¼ PH fF [ d0 g ¼ a=2

This gives d 1 ¼ Fn1 1;n2 1;1a2 and d0 ¼ Fn1 1;n2 1;a=2 : The LR test is, there-
fore, given as
$$\phi(x, y) = \begin{cases} 1 & \text{if } \dfrac{s_1^2}{s_2^2} < F_{n_1-1,\, n_2-1,\, 1-\alpha/2} \ \text{ or } \ \dfrac{s_1^2}{s_2^2} > F_{n_1-1,\, n_2-1,\, \alpha/2}, \\ 0 & \text{otherwise.} \end{cases}$$
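Both versions of the test reduce to comparing the variance ratio $s_1^2/s_2^2$ with $F$-quantiles; a sketch of the two-sided case (ours, Python with scipy, illustrative data):

```python
# Two-sided F-test of H: sigma1^2 = sigma2^2 (Example 4.5(b)).
import numpy as np
from scipy.stats import f

x = np.array([12.1, 11.4, 13.0, 12.6, 11.9, 12.8])          # sample from population 1
y = np.array([11.2, 11.8, 11.5, 12.0, 11.1, 11.7, 11.4])    # sample from population 2
alpha = 0.05

F_obs = x.var(ddof=1) / y.var(ddof=1)                # s1^2 / s2^2
lo = f.ppf(alpha / 2, len(x) - 1, len(y) - 1)        # F_{n1-1, n2-1, 1-alpha/2}
hi = f.ppf(1 - alpha / 2, len(x) - 1, len(y) - 1)    # F_{n1-1, n2-1, alpha/2}
print(F_obs, (F_obs < lo) or (F_obs > hi))           # reject H iff True
```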

Example 4.6 Let X1; X2 ; . . .; Xn1 and


 Y1 ; Y2 ; . . .Yn2 be two independent samples
drawn from N l1 ; r1 and N l2 ; r2 ; respectively. Obtain the likelihood ratio test of
2 2

(a) H : l1 ¼ l2 against K : l1 6¼ l2 when r21 and r22 are known


(b) H : l1 ¼ l2 against K : l1 6¼ l2 when r21 ¼ r22 ¼ r2 but unknown
(c) H : l1  l2 against K : l1 \l2 when r21 ¼ r22 ¼ r2 but unknown

Solution
 
(a) Here, h ¼ l1 ; l2 ; r21 ; r22

H0 ¼ fðl1 ¼ l2 ¼ lÞ; 1\l\1g


H ¼ fðl1 ; l2 Þ; 1\li \1; i ¼ 1; 2g

P n1 P n2
n1 þ n2  1
ðXi l1 Þ2  1
ðyi l2 Þ2

rn1 n2 2r2 2r2
pðx; yjhÞ ¼ ð2pÞ 2
1 r2 e
1 i¼1 2 i¼1

ML estimators of l1 and l2 under H are given as


_ _
l1 ¼ x and l2 ¼ y

P
n1 n2 P
n1 þ n2  1
ðxi xÞ2  1
ðyi yÞ2

rn1 n2 2r2 2r2
Sup pðx; yjhÞ ¼ ð2pÞ 2
1 r2 e
1 i¼1 2 i¼1

ðl1 ;l2 Þ2H

Under l 2 H0 ;

P
n1 P n2
n1 þ n2  1
ðXi lÞ2  1
ðyi lÞ2

rn 1 n2 2r2 2r2
pðx; yjlÞ ¼ ð2pÞ 2
1 r2 e
1 i¼1 2 i¼1

On taking log, we get

1 X 1 X
log pðx; yjlÞ ¼ k  2
ð x i  lÞ 2  2 ð y i  lÞ 2 ;
2r1 2r2

where k is a constant which is independent of l.


The ML estimator for l is obtained as

d
log pðx; yjlÞ ¼ 0
dl
1X n1
1X
) 2 ð xi  l Þ þ 2 ð y i  lÞ ¼ 0
r1 1 r2
 
1 1 n1 n2
) 2 n1x þ 2 n2y ¼ þ l
r1 r2 r21 r22
n1 x n2 y r21 r22
þ
_ r21 r22 n1 yþ n2 x
) lH ¼ ¼ :
n1
þ n2 r21 r22
r21 r22 n1 þ n2

This gives

P
n1 _ 2 Pn2 _ 2
n þn  1
xi l H  1
yi l H
12 2
rn 1 n2
2 2
pðx; yjhÞ ¼ ð2pÞ 1 r2 e
2r 2r
Sup 1 i¼1 2 i¼1

ðl1 ;l2 Þ2H0

LR test L(x, y) is given as



 2
 2
P _ P _
 1
ðxi xÞ2 þ n1 xl H  1
ðyi yÞ2 þ n2 yl H
2r2 2r2
e 1 2

Lðx; yÞ ¼ P P
 1
ðxi xÞ2  1
ðyi yÞ2
2r2 2r2
e 1 2
 _ 2  _ 2
n1 n2
 xl H  yl H
¼e 2r2
1
2r2
2

"r2 #2
 2
n1 ð
1 x yÞ
_
Now, x  lH ¼ r2 r2
n1 þ n2
1 2

22 32
 2 r2
ð 
y  
x Þ
y  lH ¼ 4n2r2 r2 5
_

n1 þ n2
1 2

 2 1
n1 _ n2 _ 2 r1 r22
2 2
) 2 ðx  lH Þ þ 2 ðy  lH Þ ¼ ðx  yÞ þ
r1 r2 n1 n2
2 1
r r2
12 1 þ n2 ðxyÞ2
Thus, Lðx; yÞ ¼ e 2n1

The rejection region Lðx; yÞ\C gives


 1
1 r21 r22
 þ ðx  yÞ2 \C1
2 n1 n2
0 1
ðx  
y Þ 2
or; @ r2 r2 A [ C2
n þ n2
1 2

1

x  y

or; qffiffiffiffiffiffiffiffiffiffiffiffiffiffi [ C3
r21 r22
n1 þ n2

We know that under H : l1 ¼ l2 ; qxffiffiffiffiffiffiffiffiffi


y ffi
2 2
 N(0, 1). Hence, the likelihood ratio
r r
1
n1 þ n2
2

test has critical region


8 9
> >
< x  y =

q
x ¼ ðx; yÞ : 2 ffiffiffiffiffiffiffiffiffiffiffiffiffiffi [ C3 ;
>
: r1 r22 >
;
n1 þ n2

where C3 is such that


2 3

x  y
6 7
PH 4 qffiffiffiffiffiffiffiffiffiffiffiffiffiffi [ C3 5 ¼ a
r21 r22
n1 þ n2

This gives C3 ¼ sa2 :



Finally, the LR test is given as


8
<1 if qjxffiffiffiffiffiffiffiffiffi
yj
ffi [ sa2
r 2 2r
/ðx; yÞ ¼ 1 þ n2
: n1 2
0 Otherwise
 
(b) Here h ¼ l1 ; l2 ; r21 ; r22
 
H0 ¼ l; r2 ; 1\l\1; r2 [ 0
 
H¼ l1 ; l2 ; r2 ; 1\li \1; r2 [ 0; i ¼ 1; 2

For ðl1 ; l2 ; r2 Þ 2 H;

n
 n1 þ2 n2 P 1 P
n2

 
2 2
 12 ðxi l1 Þ þ ðyi l2 Þ
1 2r
p x; yjl1 ; l2 ; r2 ¼ e 1 1
2pr2

On taking log, we get


" #
n1 þ n2   1 X n1
2
X
n2
2
log p ¼  log 2pr  2
2
ð x i  l1 Þ þ ð y i  l2 Þ
2 2r 1 1

The ML estimators for ðl1 ; l2 ; r2 Þ 2 H are given as

d log p _
¼ 0 ) l1 ¼ x
dl1
d log p _
¼ 0 ) l2 ¼ y
dl2
P P
d log p _2 ðxi  xÞ2 þ ðyi  yÞ2
¼ 0 ) r ¼
dr2 n1 þ n2
 n1 þ2 n2  ðn1 þ n2 Þ
  1 _2
e2ðn1 þ n2 Þ
2 1
) Sup p x; yjl1 ; l2 ; r ¼ 2
r
ðl1 ;l2 ;r Þ2H
2 2p

For ðl; r2 Þ 2 H0 ;


 n1 þ2 n2 P
n1 P
n2

 
2 2
 12 ðxi lÞ þ ðyi lÞ
1 2r
p x; yjl; r2 ¼ e 1 1
2pr2
" #
n1 þ n2   1 X n1
2
X
n2
2
) log p ¼  log 2pr  2
2
ð x i  lÞ þ ð y i  lÞ
2 2r 1 1

Now,

d log p _ n1x þ n2y


¼ 0 ) lH ¼
dl n1 þ n2
" #
Xn1  2 Xn2  2
d log p _2 1 _ _
¼ 0 ) rH ¼ xi  lH þ yi  lH
dr2 n1 þ n2 1 1
" #
1 Xn1  2 X n2  2
2 _ 2 _
¼ ðxi  xÞ þ n1 x  lH þ ðyi  yÞ þ n2 y  lH
n1 þ n2 1 1

 2  2 n o2
_ n1 x þ n2 y n2 ðx  yÞ
Here, x  lH ¼ x  n1 þ n2 ¼ n1 þ n2

 2   
n1x þ n2y 2

n1 ðy  xÞ 2
_
y  lH ¼ y  ¼
n1 þ n2 n1 þ n2

This gives
"     #
  1 X
n1
n2 ðx  yÞ 2 X n2
n1 ðy  xÞ 2
_2 2 2
rH ¼ ðxi  xÞ þ n1 þ ðyi  yÞ þ n2
n1 þ n2 1 n1 þ n2 1
n1 þ n2
" #
1 X
n1 X
n2
n1 n2
¼ ðxi  xÞ2 þ ðyi  yÞ2 þ ðx  yÞ2
n1 þ n2 1 1
n 1 þ n2

 1 n1 þ2 n2 _ 2 
ðn1 þ n2 Þ

e2ðn1 þ n2 Þ
2 1
Therefore, Sup pðx; yjl; r Þ ¼ 2p 2
rH
ðl;r2 Þ2H0
Hence we get,

Sup pðx; yjl; r2 Þ


ðl;r2 Þ2H0
Lðx; yÞ ¼
Sup pðx; yjl1 ; l2 ; r2 Þ
ðl1 ;l2 ;r2 Þ2H

_2
!n1 þ2 n2 Pn1 P
r ðxi  xÞ2 þ n12 ðyi  yÞ2
¼ _2
¼ Pn 1 P 1

rH xÞ2 þ n12 ðyi  yÞ2 þ n1n1þn2n2 ðx  yÞ2


1 ð xi  

2 3
6 7
6 7
6 1 7
¼6 7
6 n n ðx  yÞ2 7
41 þ nP 1 2 o5
n1 P
ðn1 þ n2 Þ 1 ðxi   xÞ2 þ n12 ðyi  yÞ2

2 3
6 7
6 1 7
¼6
6
7
7
ðxyÞ
P
2
P n2
41 þ 5
ðn11 þ n12 Þðn1 þ n2 2Þf ðyi yÞ2 g
n1
1
ðxi xÞ2 þ 1
ðn1 þ n2 2Þ

   
  N l1 ; r2 and Y  N l2 ; r2
We know, X n1 n2

  
  Y  N l1  l2 ; r2 1 þ 1
X
n1 n2
 ðl1 l2 Þ
Thus, X ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
rY
   N ð0; 1Þ
r 1
n1 þ n1
2
P
Again, 1
ðXi  X Þ2  v2n 1
r2
P 1

and r12 ðYi  Y Þ  v2n2 1 :


2
hP P i
Therefore, r12 ðX i  X Þ2 þ ðYi  Y Þ  v2n1 þ n2 2
2

Thus, under H : l1 ¼ l2 ;
. ffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ðX  Y Þ q
n1 þ n2
r 1 1

t ¼ rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
hP i ffi  tn1 þ n2 2 ;
P
1
ðX i  X Þ þ 2  2
ðYi  Y Þ n1 þ n2 2 1
r2

ðx  yÞ
or; t ¼ rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
  ffi  n1 þ n2  2
n1 þ n2 s
1 1 2

P P
ðxi  xÞ2 þ ðyi  yÞ2 ðn1  1Þs21 þ ðn2  1Þs22
where s ¼
2
n1 þ n2 2 ¼ n1 þ n2  2
So, Lðx; yÞ ¼ 1
2
1 þ n þt n 2
1 2
Thus, the rejection region Lðx; yÞ\C gives

t2
1þ [ C1
n1 þ n2  2
or; t2 [ C2
or; jtj [ C3

Therefore, the LR test is given as jtj [ C3 ;where C3 is obtained as


PH ½jtj [ C3  ¼ a:

This gives C3 ¼ tn1 þ n2 2; a=2 : Hence, LR test is given as


$$\phi(x, y) = \begin{cases} 1 & \text{if } \dfrac{|\bar{x} - \bar{y}|}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} > t_{n_1+n_2-2,\, \alpha/2}, \\ 0 & \text{otherwise.} \end{cases}$$

(c) Proceeding similarly as in (b), for testing $H: \mu_1 \geq \mu_2$ against $K: \mu_1 < \mu_2$, the LR test is given as
$$\phi(x, y) = \begin{cases} 1 & \text{if } \dfrac{\bar{x} - \bar{y}}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} < -t_{n_1+n_2-2,\, \alpha}, \\ 0 & \text{otherwise.} \end{cases}$$
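Part (b) is the familiar pooled two-sample $t$-test. A sketch (ours; Python with scipy, illustrative data), computing the statistic directly and then via the built-in equal-variance routine:

```python
# Pooled two-sample t-test of H: mu1 = mu2 (Example 4.6(b)).
import numpy as np
from scipy import stats

x = np.array([21.3, 20.8, 22.1, 21.7, 20.9, 21.5])
y = np.array([20.1, 20.6, 19.8, 20.9, 20.3, 20.0, 20.5])
alpha = 0.05
n1, n2 = len(x), len(y)

# pooled variance s^2 = [(n1-1)s1^2 + (n2-1)s2^2] / (n1+n2-2)
s2 = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / (n1 + n2 - 2)
t_obs = (x.mean() - y.mean()) / np.sqrt(s2 * (1 / n1 + 1 / n2))
crit = stats.t.ppf(1 - alpha / 2, n1 + n2 - 2)
print(abs(t_obs) > crit)                                       # reject H iff True

print(stats.ttest_ind(x, y, equal_var=True).pvalue < alpha)    # same decision
```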

Example 4.7 Suppose xij  N ðli ; r2 Þ; j ¼ 1ð1Þni ; i ¼ 1ð1Þk independently. This is


one-way classified data.
We are to find LR test for H : l1 ¼ l2 ¼ ¼ lk against K: l0i are not all equal.
Answer Here, h ¼ ðl1 ; l2 ; . . .lk ; r2 Þ and H ¼ fh : 1\li \1; i ¼ 1ð1Þk;
r [ 0g: Observe that H[K ¼ H: Likelihood functions ¼ pð x=hÞ ¼
2
P
k Pni
ðxij li Þ
2
n=2 n
 12 P
ð2pÞ r e ; n ¼ k1 ni :
2r
1 1
 
Sup pðx=hÞ p x=h
H
Likelihood ratio is Lð xÞ ¼ h2H x ¼ ;
Sup pð =hÞ pðx=hÞ
h2H

where h and hH are, respectively, the MLEs of hð2 HÞ and hð2 H Þ:


P
ni
Now, h 2 H) MLEs are li ¼ xi ¼ n1i xij
1


 
@ log p x=h
as ¼ 0 ) li ¼ xi
@li
1X k X ni
 2 within S:S: W
r2 ¼ xij  xi ¼ ¼ :ðsayÞ:
n 1 1 n n
"   #
@ log p x=h 1 X
k X ni
  2
Here ¼ 0 ) r2 ¼ xij  xi
@r2 n 1 1

Again h 2 H ) h ¼ ðl; r2 Þ; where l is the common value of li ’s.



Here, we have MLEs

1Xk X
ni
1Xk
lH ¼ xij ¼ nixi ¼ x; ðsayÞ
n 1 1 n 1
1X k Xni
 2 Total S:S: T
r2H ¼ xij  x ¼ ¼
n 1 1 n n
" #
1 X k  2 W þB
¼ Wþ ni xi  x ¼
n 1
n
" #
Xk  2
B¼ ni xi  x ¼ BetweenðmeansÞ S:S :
1

 . 
Hence, we get p x ^h ¼ ð2pÞn=2 ðrÞn e2 :
n

 . 
p x ^hH ¼ ð2pÞn=2 ðr
^H Þn e2
n

 2 n=2
^
r
So, Lð xÞ ¼ ^2H
r
and therefore reject H iff Lð xÞ\c; PH fLð xÞ\cg ¼
a 2 ð0; 1Þð0 \ c \ 1Þ

^2H
r W þB B
, [ c0 , [ c0 , [ c00
r 2 W W
B=
, T  ¼ k  1 [ c000
W=
nk

The size condition now reduces to PH fT  [ c000 g ¼ a under H.


Under H, T   F ðk  1; n  kÞ

)c000 ¼ Fa;ðk1;nkÞ

So, our LR test rejects $H$ iff $\dfrac{B/(k-1)}{W/(n-k)} > F_{\alpha;(k-1,\, n-k)}$.
Note It is the same as the ANOVA test.
Special case: (i) l ¼ l0 (given)
 n=2 "P P  2 #n=2
k ni
^2
r 1 xij   xi
Lð xÞ ¼ ¼ Pk Pn 
1
2
^H 2
r
1 xij  l0
i
1
" #n=2
W
¼ P
W þ k1 ni ðxi  l0 Þ2

Test reduces to reject H iff


Pk

1
ni ðxi  l0 Þ2
T ¼ k 1
[ Fa;ðk;nkÞ
W=
ðn  kÞ

(ii) Common value is unknown ðlÞ but r2 ¼ r20 is known.


 ni 

P
k P 2 n o
exp  2r1 2 xij  x exp  2r1 2 ðB þ WÞ
0 1
Lð xÞ ¼  1
¼ n0 o
P ni 
k P 2
exp  2r2
1
xij  xi exp  2rW
2
0
0
1 1
 
B
¼ exp  2
2r0

Hence LR test is: reject H iff

B
[ v2a;k1 :
r20

(iii) If $\min(n_1, n_2, \ldots, n_k)$ is large, we can approximate the distribution of $-2\log L(x)$ by the $\chi^2$-distribution with d.f. $k - 1$.
Note The above hypothesis is equivalent to the homogeneity of $k$ univariate normal populations.
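The statistic $\frac{B/(k-1)}{W/(n-k)}$ is exactly the one-way ANOVA $F$-ratio, so the LR test can be run with standard tools. A sketch (ours; Python with scipy, illustrative data):

```python
# One-way ANOVA LR test of H: mu_1 = ... = mu_k (Example 4.7).
import numpy as np
from scipy import stats

groups = [np.array([5.1, 4.8, 5.6, 5.3]),
          np.array([6.0, 5.7, 6.2, 5.9, 6.1]),
          np.array([5.4, 5.2, 5.8, 5.5])]
alpha = 0.05

k = len(groups)
n = sum(len(g) for g in groups)
grand = np.concatenate(groups).mean()
B = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)   # between-groups SS
W = sum(((g - g.mean()) ** 2).sum() for g in groups)        # within-groups SS
F_obs = (B / (k - 1)) / (W / (n - k))

print(F_obs > stats.f.ppf(1 - alpha, k - 1, n - k))   # reject H iff True
print(stats.f_oneway(*groups).pvalue < alpha)         # same decision
```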
Example 4.8
 
Suppose xij  N li ; r2i ; j ¼ 1ð1Þni ; i ¼ 1ð1Þk independently:

Obtain the likelihood ratio test of H : r21 ¼ r22 ¼ r23 ¼ ¼ r2k against K: not
all ri ’s are equal. 
Answer h ¼ l1 ; l2 ; . . .; lk ; r21 ; r22 ; . . .; r2k


H ¼ h : 1\li \1; r2i [ 0; i ¼ 1ð1Þk
8 9
<  12 P ðxij li Þ2 =

ni

  ð2pÞ
n =2 Q P
k
Likelihood function = p x=h ¼ Q k
; n¼
2r
ni e i ni
k
ð r2 2
i Þ
i¼1
: ;
i¼1


@ log pðx=hÞ 2
Pn i  
Now, @li ¼0) 2r2i j¼1 ^i ¼ xi
xij  li ð1Þ ¼ 0 ) l
and
 
@ log p x=h ni 1 X ni
 2
) þ xij  li ¼ 0
@ri2 2
2ri 4
2ri j¼1
1X ni
  2 ni  1 2
^2i ¼
)r xij  xi ¼ s
ni j¼1 ni i
. Q  2 n2i
Hence, for h 2 H we get p x ^h ¼ ð2pÞn=2 ki¼1 r ^i
 
 
x
¼ Sup p =h
h2H

 
Under H, p x=h reduces to

P
k P
ni

  ðxij li Þ
2
 12
p x=h ¼ ð2pÞn=2 rn e ;
2r

P ni 
k P 2 P
k
_ _2
where from we get liH ¼ xi and rH ¼ 1n xij  xi ¼ 1n ðni  1Þs2i
  .   2 n=2 n=2
Hence, Sup p x=h ¼ p x ^hH ¼ ð2pÞn=2 r ^H e
h2H
Hence, likelihood ratio is

Qk nðni 1Þs2i o 2i
n
 n=2
r^2H i¼1 ni
Lð xÞ ¼ Qk ¼h in=2 :
ni
^2i Þ 2 Pk
i¼1 ðr i¼1 ðni  1Þsi
1 2
n

The distribution of the statistic obtained in $L(x)$ is difficult to calculate. Therefore, we can only speak about its asymptotic distribution, i.e. that of $-2\log_e L(x)$:
$$-2\log_e L(x) = n \log_e \frac{\sum_{i=1}^k (n_i - 1)s_i^2}{n} - \sum_{i=1}^k n_i \log_e \frac{(n_i - 1)}{n_i}\, s_i^2,$$
which is distributed as $\chi^2_{2k - (k+1) = k-1}$ for large $n_i$'s.


It has been suggested by Bartlett (1937) that the above approximation to
Chi-square for large n can be improved if we replace the ML estimators of r2 ’s by
unbiased estimators, i.e. if we replace ni by ni  1 and n by n − k in the above
expression. So,
$$-2\log_e L(x) = (n - k)\log_e \frac{\sum_{i=1}^k (n_i - 1)s_i^2}{n - k} - \sum_{i=1}^k (n_i - 1)\log_e s_i^2 = \sum_{i=1}^k (n_i - 1)\log_e \frac{s^2}{s_i^2},$$
where $s^2 = \dfrac{\sum_{i=1}^k (n_i - 1)s_i^2}{\sum_{i=1}^k (n_i - 1)}$.

Bartlett has also suggested that the Chi-square approximation will hold good for $n_i$ as low as four or five if the above statistic is divided by $t$, where
" #
1 Xk
1 1
t ¼ 1þ  Pk
3ðk  1Þ i¼1 ni  1 i¼1 ðni  1Þ

PK 2
ðni 1Þ loge s2
i¼1 s
Hence, T ¼ t
i
 v2k1
So, we reject H approximately at level a iff

T [ v2k1;a

It is noted that the rapidity of convergence for this statistic T towards v2 is


greater than that of 2 loge Lð xÞ:
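This corrected statistic is Bartlett's test for the homogeneity of variances and is available directly in scipy; a sketch (ours, with illustrative data):

```python
# Bartlett's test of H: sigma_1^2 = ... = sigma_k^2 (Example 4.8).
import numpy as np
from scipy import stats

groups = [np.array([5.1, 4.8, 5.6, 5.3, 5.0]),
          np.array([6.0, 5.2, 6.8, 5.5, 6.3, 5.9]),
          np.array([5.4, 5.45, 5.5, 5.35, 5.42])]
alpha = 0.05

T, p_value = stats.bartlett(*groups)   # Bartlett's corrected statistic and chi2_{k-1} p-value
print(T, p_value, p_value < alpha)     # reject H iff True
```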

Example 4.9 $X_1, X_2, \ldots, X_n$ are i.i.d. with density $\frac{1}{\sigma}e^{-\frac{1}{\sigma}(x - \mu)}$ for $x \geq \mu$. Find the LR test for testing
I. $H: \mu = \mu_0$ against $K: \mu \neq \mu_0$
II. $H: \sigma = \sigma_0$ against $K: \sigma \neq \sigma_0$

Solution
I. Likelihood function
P
n
  1 r
1
ðxi lÞ
p x=l; r ¼ n e i if xi [ l8i
r
¼ 0 otherwise

MLEs of l and r are given as


^ ¼ y1 ; y1 ¼ 1st order statistic
l
Pn
^ ¼ 1n ðyi  y1 Þ; y1 \y2 \yn are the order statistics of x1 ; x2 . . .; xn :
r
i¼1
Then
   
Sup p x=l; r ¼ p x=l ^Þn en
^ ¼ ðr
^; r
l;r
 
Under H, p x=l; r reduces to

P
n
  r1 ðxi l0 Þ
p x=l ; r ¼ rn e i
0

P
n Pn
^H ¼ 1n ðxi  l0 Þ ¼ 1n ðyi  l0 Þ
) MLE of r is r
  
i  i
x x
Then Sup p =l; r ¼ p =l ; r ¼ ðr^H Þn en
l;r2H 0 ^H

Then likelihood ratio is given as


 
Sup p x=l; r  n
l;r2H ^
r
Lð xÞ ¼   ¼
Sup p x=l; r ^H
r
l;r

Now, reject H iff


P
ð yi  y1 Þ
Lð xÞ\C , P \c
ð y i  l0 Þ
P
ð yi  y1 Þ n ð y 1  l0 Þ
,P \c , P [ c0
ð y i  y 1 Þ þ nð y 1  l 0 Þ ð yi  y1 Þ

Under

H : l ¼ l0
nðy1  l0 Þ=
P 2  F2;2n2 :
ðyi  y1 Þ=
2n  2

Hence, we reject H iff


n= ðy1  l Þ
T ¼P 2 0
[ Fa;2;2n2
ðyi  y1 Þ=
ð2n  2Þ

II. As earlier, it can be shown that the test has acceptance region as

1X n
c1 \T ¼ ðyi  y1 Þ\c2 ;
r0 i

where c1 and c2 are determined that



PH c1 \v22n2 \c2 ¼ 1  a

PH c1 \v22n \c2 ¼ 1  a

Example 4.10 Let ðX11 ; X21 Þ; ðX12 ; X22 Þ. . .; ðX1n ; X2n Þ be a random sample from a
bivariate normal distribution with means l1 and l2 , variances r21 and r22 and cor-
relation coefficient q. Find the likelihood ratio test of
H : q ¼ 0 against K : q 6¼ 0
Solution
 
Here, h ¼ l1 ; l2 ; r21 ; r22 ; q
 
H ¼ l1 ; l2 ; r21 ; r22 ; q : 1\li \1; r2i [ 0; 1\q\1; i ¼ 1; 2
 
H0 ¼ l1 ; l2 ; r21 ; r22 ; q : 1\li \1; r2i [ 0; q ¼ 0; i ¼ 1; 2
In H, ML estimators for l1 ; l2 ; r21 ; r22 and q are
_ _ _2 P _2 P _
l1 ¼ x1 ; l2 ¼ x2 ; r1 ¼ 1n ðx1i  x1 Þ2 ; r2 ¼ 1n ðx2i  x2 Þ2 : and q¼
P
ðx1i x1 Þðx2i x2 Þ
P P 1=2 ¼ r.
f ðx1i x1 Þ2 ðx2i x2 Þ2 g
Thus,
h 2 2 i
 pffiffiffiffiffiffiffiffiffiffiffiffiffin 2 1r
1
r
n^ r
n^
1 þ 2 2r:n:r

e ð Þ r^1 r^2
_ _
Sup pð xjhÞ ¼ 2pr1 r2 1  r 2
2 2 2

h2H
 pffiffiffiffiffiffiffiffiffiffiffiffiffin
en
_ _
¼ 2pr1 r2 1  r 2

In Ho ; ML estimators for l1 ; l2 ; r21 and r22 are

_ _ 1X _2 _2 1X
l1H ¼ x1 ; l2H ¼ x2 ; r1H ¼ðx1i  x1 Þ2 ; r2H ¼ ðx2i  x2 Þ2
n n
h i
 n 1 n^r21H þ n^r22H
_ _
Thus, Sup pð xjhÞ ¼ 2pr1H r2H
2 2 2
e r^1H r^2H
h2H

 n
en
_ _
¼ 2pr1H r2H

Hence, the LR is given as

Sup pð xjhÞ
h2H0  n=2
Lð xÞ ¼ ¼ 1  r2
Sup pð xjhÞ
h2H

The LR critical region is given as

Lð xÞ\C
 n=2
or; 1  r2 \C
or; 1  r 2 \C1
or; r 2 [ C2
or; jr j [ C3 ;

where C3 is obtained as

PH ½jr j [ C3  ¼ a

Thus, the test of H : q ¼ 0 against K : q 6¼ 0 is based on r, the distribution of the


sample correlation coefficient and its distribution for q ¼ 0 is symmetric about 0.
Thus, PH ½r\  C3  ¼ PH ½r [ C3  ¼ a=2
Equivalently, the critical region for H : q ¼ 0 against K : q 6¼ 0 is
pffiffiffiffiffiffiffiffiffiffiffi
jr j n  2
pffiffiffiffiffiffiffiffiffiffiffiffiffi [ k:
1  r2
pffiffiffiffiffiffi
Since rpffiffiffiffiffiffiffi
n2ffi
1r2
has the t-distribution with (n − 2) d.f. when q ¼ 0, the constant k is
given as

pffiffiffiffiffiffiffiffiffiffiffi
jr j n  2
PH pffiffiffiffiffiffiffiffiffiffiffiffiffi [ k ¼ a:
1  r2

This gives k ¼ tn2;a=2 :


Note For example, if n = 4 and a = 0.05 then

Zc3 Z1
1 1
dr ¼ dr ¼ 0:025
2 2
1 c3

gives C3 = 0.95. Hence, H is rejected at 5 % level of significance if r based on a


sample of size four is such that jr j [ 0:95:
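The same decision is reached by referring $r\sqrt{n-2}/\sqrt{1-r^2}$ to the $t_{n-2}$ distribution; a sketch (ours, Python with scipy, illustrative data):

```python
# LR test of H: rho = 0 based on the sample correlation r (Example 4.10).
import numpy as np
from scipy import stats

x = np.array([1.2, 2.4, 3.1, 4.0, 5.2, 6.1, 7.3, 8.0])
y = np.array([2.0, 2.9, 3.5, 4.8, 5.1, 6.9, 7.2, 8.4])
alpha = 0.05
n = len(x)

r = np.corrcoef(x, y)[0, 1]
t_obs = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
crit = stats.t.ppf(1 - alpha / 2, n - 2)
print(abs(t_obs) > crit)                       # reject H iff True
print(stats.pearsonr(x, y)[1] < alpha)         # same decision via the built-in routine
```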

Example 4.11 Let X1 ; X2 ; . . .; Xn be a random sample form density f ð xÞ ¼


1 x=k
ke ; x [ 0; k [ 0: Find the likelihood ratio test of H : k ¼ k against K : k 6¼ k :
0 0

Solution Here, h ¼ ðkÞ; H ¼ fk : k [ 0g

H 0 ¼ fk : k ¼ k0 g
_
In H, MLE of k is k ¼ x
Thus, the LR test is

Sup pðxjkÞ nx


1 k0
H0 kno e xn n x
Lð xÞ ¼ ¼ ¼ e k0 :en
Sup pðxjkÞ 1 n
xn :e
kn0
H

The rejection region $L(x) < C$ gives
$$\bar{x}^n e^{-n\bar{x}/\lambda_0} < C_1,$$
writing $g(\bar{x}) = \bar{x}^n e^{-n\bar{x}/\lambda_0}$. The curve $y = g(\bar{x})$ has a single maximum at $\bar{x} = \lambda_0$, and its shape is as in the earlier examples. The graph shows the critical region $0 < \bar{x} < d_0$ or $d_1 < \bar{x} < \infty$ corresponding to the critical region $L(x) < C$. The constants $d_0$ and $d_1$ are obtained by the size condition
$$P_H[d_0 < \bar{x} < d_1] = 1 - \alpha.$$

In this problem $X \sim G(1, \lambda)$, so $M_X(t) = (1 - \lambda t)^{-1}$ and hence
$$M_{\bar{X}}(t) = \left(1 - \frac{\lambda t}{n}\right)^{-n}.$$
Thus $\bar{X} \sim G\!\left(n, \frac{\lambda}{n}\right)$, i.e.
$$f(\bar{x}) = \frac{1}{\Gamma(n)}\left(\frac{n}{\lambda}\right)^n e^{-n\bar{x}/\lambda}\, \bar{x}^{\,n-1}.$$
One can find the values of $d_0$ and $d_1$ from the gamma distribution table under $H_0$.
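Rather than a gamma table, the cut-offs can be taken from the gamma quantile function; the equal-tail split below is our simplification of the size condition (Python with scipy, hypothetical $n$, $\lambda_0$, $\alpha$):

```python
# Equal-tailed cut-offs d0 < d1 for xbar under H: lambda = lambda0 (Example 4.11),
# using xbar ~ Gamma(shape=n, scale=lambda0/n).
from scipy.stats import gamma

n, lambda0, alpha = 15, 2.0, 0.05
d0 = gamma.ppf(alpha / 2, a=n, scale=lambda0 / n)
d1 = gamma.ppf(1 - alpha / 2, a=n, scale=lambda0 / n)
print(d0, d1)    # reject H when the observed xbar falls outside (d0, d1)
```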
Chapter 5
Interval Estimation

5.1 Introduction

In point estimation, a random sample $(X_1, X_2, \ldots, X_n)$ is drawn from a population having distribution function $F_\theta$, where $\theta$ is the unknown parameter (or set of unknown parameters), and we try to estimate the parametric function $\gamma(\theta)$ by means of a single value, say $t$, the value of a statistic $T$ corresponding to the observed values $(x_1, x_2, \ldots, x_n)$ of the random variables $(X_1, X_2, \ldots, X_n)$. This estimate may differ from the exact value of $\gamma(\theta)$ in the given population. In other words, we take $t$ as an estimate of $\gamma(\theta)$ such that $|t - \gamma(\theta)|$ is small with high probability.
estimate we try to choose a unique point in the parameter space which can rea-
sonably be considered as the true value of the parameter. Instead of unique estimate
of the parameter we are interested in constructing a family of sets that contain the
true (unknown) parameter value with a specified (high) probability. In many
problems of statistical inference we are not interested only in estimating the
parameter or testing some hypothesis concerning the parameter, we also want to get
a lower or an upper bound or both, for the real-valued parameter. Here two limits
are computed from the set of observations, say t1 and t2 and it is claimed with a
certain degree of confidence (measured in probabilistic terms) that the true value of
$\gamma(\theta)$ lies between $t_1$ and $t_2$. Thus we get an interval $(t_1, t_2)$ which we expect would include the true value of $\gamma(\theta)$. So this type of estimation is called interval estimation. In this chapter we discuss the problem of interval estimation.

5.2 Confidence Interval

An interval depending on a random variable X is called a random interval. For


example, $(X, 2X)$ is a random interval. Note that $\frac{1}{2} \leq X \leq 1 \Leftrightarrow X \leq 1 \leq 2X$.


A confidence interval (CI) of h is a random interval which covers the true value
of the parameter h with specified degrees of confidence (assurance). In other words,
 
a random interval $I(X) = \left[\underline{\theta}(X),\, \overline{\theta}(X)\right]$ satisfying
$$\Pr_\theta\{\theta \in I(X)\} \geq 1 - \alpha \quad \forall\, \theta \in \Theta \qquad (7.1)$$
will be a confidence interval for $\theta$ at confidence level $(1 - \alpha)$. If equality in (7.1) holds, then $(1 - \alpha)$ will be called the confidence coefficient. $\underline{\theta}(X)$ and $\overline{\theta}(X)$ are the lower and upper confidence limits, respectively.
Let $I(X) = [\underline{\theta}(X), \infty)$ be a random interval such that $\Pr_\theta\{\theta \in I(X)\} = \Pr_\theta\{\theta \geq \underline{\theta}(X)\} \geq 1 - \alpha$ for all $\theta \in \Theta$. Then $\underline{\theta}(X)$ is called the lower confidence bound of $\theta$ at confidence level $(1 - \alpha)$. Similarly, we can define an upper confidence bound $\overline{\theta}(X)$ such that $\Pr_\theta\{\theta \in I(X)\} = \Pr_\theta\{\theta \leq \overline{\theta}(X)\} \geq 1 - \alpha$ for all $\theta \in \Theta$, corresponding to a random interval $I(X) = \left(-\infty,\, \overline{\theta}(X)\right]$.
Remark 1 In making the probability statement we do not mean θ is a random
variable. Indeed, θ is a constant. All that is meant here is that the probability is
 
(1 − α) that the random interval hð X Þ; hð X Þ will cover θ whatever the true value
of θ may be. More specifically, it is asserted that about 100(1 − α)% statements of
 
the form h 2 hð X Þ; hð X Þ should be correct.
b
Remark 2 In thepoint estimation, we  choose
 an estimate, say h ð xÞ; on the basis of a
b 
sample x such that the difference  h x  h is small with high probability. In other
 
words, in the point estimator we try to choose a unique point in the parameter space
which can reasonably be considered as the true value of theparameter. On the other

hand, in the interval estimation, we choose a subset of the parameter space, say I x ;

on the basis of a sample x which reasonably includes the true value of the parameter.
  
More specifically in interval estimation, we choose an interval I x ; such that

n  o
Prh h 2 I x  1a 8 h:


5.3 Construction of Confidence Interval

Method I
A simple procedure for finding a confidence interval
Let T be a statistic and WðT; hÞ be a function of T and θ. Suppose the distribution of
WðT; hÞ is free from θ. Then it is always possible to choose two constants K1 and K2
ðK1  K2 Þ such that

PrfwðT; hÞ\K1 g\a1 and PrfwðT; hÞ [ K2 g\a2 where a1 ; a2 [ 0 and


a1 þ a2 ¼ a:
Hence PrfK1  wðT; hÞ  K2 g  1  ða1 þ a2 Þ ¼ 1  a:
Suppose it is possible to convert the inequality K1  wðT; hÞ  K2 into the form
hðT Þ  h  hðT Þ:
 
Then Pr hðT Þ  h  hðT Þ  1  a: This fact gives us a (1 −α) level confidence
interval for θ.
Example 5.1 Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$. Find a $(1-\alpha)$ level confidence interval for $\mu$ when (i) $\sigma^2$ is known and (ii) $\sigma^2$ is unknown.

Solution (i) Suppose $\sigma^2$ is known.
We take $w(T, \theta) = \frac{\sqrt{n}(\bar{x} - \mu)}{\sigma}$, which is an $N(0, 1)$ variate. Hence the distribution of $w(T, \theta)$ is independent of $\theta$. We can choose $k_1$ and $k_2$ from $N(0, 1)$ such that
$$P\left[\tau_{1-\alpha_1} \leq \frac{\sqrt{n}(\bar{x} - \mu)}{\sigma} \leq \tau_{\alpha_2}\right] = 1 - (\alpha_1 + \alpha_2) = 1 - \alpha.$$
So $\left[\bar{x} - \tau_{\alpha_2}\frac{\sigma}{\sqrt{n}},\ \bar{x} - \tau_{1-\alpha_1}\frac{\sigma}{\sqrt{n}}\right]$ is a $(1-\alpha)$ level confidence interval for $\mu$ if $\sigma$ is known.

(ii) Suppose $\sigma^2$ is unknown.
We take $w(T, \theta) = \frac{\sqrt{n}(\bar{x} - \mu)}{s}$, which is a Student's $t$ statistic with d.f. $(n-1)$, where $s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$. The distribution of $w(T, \theta)$ is independent of $\theta$. Again we choose $k_1$ and $k_2$ using a $t$-distribution with $(n-1)$ d.f. such that
$$P\left[t_{n-1,\, 1-\alpha_1} \leq \frac{\sqrt{n}(\bar{x} - \mu)}{s} \leq t_{n-1,\, \alpha_2}\right] = 1 - (\alpha_1 + \alpha_2) = 1 - \alpha$$
$$\Rightarrow P\left[\bar{x} - t_{n-1,\, \alpha_2}\frac{s}{\sqrt{n}} \leq \mu \leq \bar{x} - t_{n-1,\, 1-\alpha_1}\frac{s}{\sqrt{n}}\right] = (1 - \alpha).$$
So $\left[\bar{x} - t_{n-1,\, \alpha_2}\frac{s}{\sqrt{n}},\ \bar{x} - t_{n-1,\, 1-\alpha_1}\frac{s}{\sqrt{n}}\right]$ is a $(1-\alpha)$ level confidence interval for $\mu$ if $\sigma^2$ is unknown.
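For the usual equal-tailed choice $\alpha_1 = \alpha_2 = \alpha/2$, both intervals take one line each; a sketch (ours, Python with scipy, illustrative data and a hypothetical known $\sigma$):

```python
# Equal-tailed (1 - alpha) confidence intervals for mu (Example 5.1).
import numpy as np
from scipy import stats

x = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3])
alpha = 0.05
n, xbar = len(x), x.mean()

# (i) sigma known, say sigma = 0.3
sigma = 0.3
z = stats.norm.ppf(1 - alpha / 2)
print(xbar - z * sigma / np.sqrt(n), xbar + z * sigma / np.sqrt(n))

# (ii) sigma unknown: replace sigma by s and z by t_{n-1, alpha/2}
s = x.std(ddof=1)
t = stats.t.ppf(1 - alpha / 2, n - 1)
print(xbar - t * s / np.sqrt(n), xbar + t * s / np.sqrt(n))
```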
Example 5.2 Let X1 ; X2 ; . . .; Xn be a random sample from N ðl; r2 Þ. Find ð1  aÞ
level confidence interval for r2 when (i) l is known and (ii) l is unknown.
Solution (i) SupposeP l is known.
ðxi lÞ2
We take wðT; hÞ ¼ r2 which is distributed as v2 with n d.f. Thus its
distribution is independent of h. We can choose k1 and k2 from v2 distribution with
n d.f such that

" P #
ð x i  lÞ 2
P v2n;1a1   v2n;a2 ¼ 1  ð a1 þ a2 Þ ¼ 1a
r2
"P P #
ðxi  lÞ2 ð x i  lÞ 2
)P  r2  ¼ 1a
v2n;a2 v2n;1a1
P P
ðxi lÞ2 ðxi lÞ2
Thus v2n;a ; v2n;1a
is 100(1 − a)% confidence interval of r2 when l is
2 1

known. (ii) Suppose l is unknown.P


ðxi xÞ2
We take the function wðT; hÞ ¼ r2 which is distributed as v2 with (n − 1)
d.f. This distribution
is independent of h. Proceeding as in (i),
P P
ðxi xÞ2 ðxi xÞ2
v2
; v2 is 100ð1  aÞ% confidence interval of r2 when l is
n1;a2 n1;1a1

unknown.
Example 5.3 Let $X_1, X_2, \ldots, X_n$ be a random sample from the density $f(x|\theta) = \frac{1}{\theta}$, $0 < x < \theta$. Find a $100(1-\alpha)\%$ confidence interval for $\theta$.

Solution The likelihood function is $L = \frac{1}{\theta^n}$. This is maximum when $\theta$ is smallest; but $\theta$ cannot be less than $x_{(n)}$, the maximum of the sample observations. Thus $\hat{\theta} = x_{(n)}$.
The p.d.f. of $\hat{\theta}$ is given by
$$h(\hat{\theta}) = \frac{n\,\hat{\theta}^{\,n-1}}{\theta^n}, \qquad 0 < \hat{\theta} < \theta.$$
Let $u = \frac{x_{(n)}}{\theta} = \frac{\hat{\theta}}{\theta}$, so that $g(u) = n u^{n-1}$, $0 < u < 1$. Thus the distribution of $u$ is independent of $\theta$. We find $u_1$ and $u_2$ such that
$$P[u_1 < u < u_2] = 1 - (\alpha_1 + \alpha_2) = 1 - \alpha,$$
where $\int_0^{u_1} g(u)\,du = \alpha_1$ and $\int_{u_2}^{1} g(u)\,du = \alpha_2$, i.e.
$$P\left[u_1 < \frac{\hat{\theta}}{\theta} < u_2\right] = 1 - \alpha \;\Rightarrow\; P\left[\frac{\hat{\theta}}{u_2} < \theta < \frac{\hat{\theta}}{u_1}\right] = 1 - \alpha.$$
Thus $\left(\frac{\max_i X_i}{u_2},\ \frac{\max_i X_i}{u_1}\right)$ is a $100(1-\alpha)\%$ confidence interval for $\theta$.

 
Example 5.4 X1 ; X2 ; . . .; Xn is a random sample from a G 1h ; 1 distribution having
p.d.f.

1
f ðx/hÞ ¼ ex=h ; x  0:
h

Find 100ð1  aÞ% confidence


Pn interval of h.
xi
Solution Let t = i¼1
h ¼ n
x
h which is a G(1, n) variate having p.d.f. g
t n1
(t) = CðnÞ e t ; 0  t\1:
1

Thus the distribution of t is independent of h. We find k1 and k2 such that




nx
P k1 \t ¼ \k2 ¼ 1  ða1 þ a2 Þ ¼ 1  a
h

where

Zk1 Z1
gðtÞdt ¼ a1 ; gðtÞdt ¼ a2
0 k2

i.e.


nx nx
P \h\ ¼1a
k2 k1

Method 2: Confidence based methods: A general approach:


Let T be a statistic and t1 ðhÞ and t2 ðhÞ be two quantities such that PrfT\t1 ðhÞg\a1
and PrfT [ t2 ðhÞg\a2 ; a1 ; a2 [ 0; a1 þ a2 ¼ a: The equation T ¼ t1 ðhÞ and
T ¼ t2 ðhÞ give us two curves as

θ(t) B(t,θ(t))

θ
T = t 2 (θ)
θ(t) A(t, θ(t))

Suppose t be the observed value of the statistic T. Draw a perpendicular at


T = t. It intersects the curves at A and B. Suppose the co-ordinates of A and B are
 
ðt; hðtÞÞ and t; hðtÞ respectively. According to the construction

t1 ðhÞ  T  t2 ðhÞ , hðtÞ  h  hðtÞ:


 
) Pr½t1 ðhÞ  T  t2 ðhÞ ¼ Pr hðtÞ  h  hðT Þ  1  a:

This fact gives us ð1  aÞ level Confidence Interval for h.



Note 1: To avoid the drawing one may consider inverse interpolation formula.
Note 2: If the L.H.S’s of the Eq. (7.1) can be given explicit expression in terms of h
and if the equations can be solved for h uniquely, then roots are the confidence
limits for h at confidence level ð1  aÞ.
Example 5.5 Let X1 ; X2 ; . . .; Xn be a random sample from density function f ðxjhÞ ¼
h;0\x\h: Find 100ð1  aÞ% confidence interval of h.
1

Solution The likelihood function is L ¼ h1n : This is maximum when h is the


smallest; but h cannot be less than xðnÞ ; the maximum of sample observations. Thus
_
h ¼ xð n Þ :
_
The p.d.f of h is given by
n1
_ nh
_
_
h h ¼ n ; 0\h\h:
h

We find k1 ðhÞ and k2 ðhÞ such that


h _
i
P k1 ðhÞ\h\k2 ðhÞ ¼ 1  ða1 þ a2 Þ ¼ 1  a

where

Z
k1 ðhÞ
 
h ^h d ^h = a1 ð7:2Þ
0

and
Zh  
h h^ d ^h = a2 ð7:3Þ
k2 ðhÞ

^n k ðhÞ
From (7.2), hhn 01 ¼ a1 or, k1 ðhÞ ¼ hða1 Þ1=n
^ n
From (7.3),¼ a2 or, 1 − ½k2hðhn Þ ¼ a2 or, k2 ðhÞ ¼ hð1  a2 Þ1=n . Therefore,
hn h
hn k2 ðhÞ
h _
i h i
1=n ^
h ^h
P ha1 \h\hð1  a2 Þ1=n ¼ 1  a or, P 1=n \h\ 1=n ¼ 1  a:
ð1a2 Þ ða1 Þ
Note We can get the confidence interval of h by the Method I which is given in
Example 5.3.
Large sample confidence interval: Let the asymptotic distribution of a statistic $T_n$ be normal with mean $\theta$ and variance $\frac{\sigma^2(\theta)}{n}$; then
$$\Pr\left\{\tau_{1-\alpha_1} \leq \frac{(T_n - \theta)\sqrt{n}}{\sigma(\theta)} \leq \tau_{\alpha_2}\right\} \simeq 1 - (\alpha_1 + \alpha_2) = 1 - \alpha \text{ (say)}.$$
This fact gives us a confidence interval for $\theta$ at confidence level $(1-\alpha)$, approximately.

Example 5.6 X1 ; X2 ; . . .; Xn is a large random sample from PðkÞ. Find the


$100(1-\alpha)\%$ confidence interval for $\lambda$.

Solution The likelihood function is
$$L = \frac{e^{-n\lambda}\,\lambda^{\sum x_i}}{x_1!\, x_2! \cdots x_n!}.$$
The MLE of $\lambda$ is $\hat{\lambda} = \bar{x}$, and
$$V(\hat{\lambda}) = \frac{1}{-E\left(\frac{\partial^2 \log L}{\partial \lambda^2}\right)} = \frac{\lambda}{n}.$$
Thus $\frac{\bar{x} - \lambda}{\sqrt{\lambda/n}} \to N(0, 1)$ as $n \to \infty$. Hence
$$P\left[\tau_{1-\alpha_1} < \frac{\bar{x} - \lambda}{\sqrt{\lambda/n}} < \tau_{\alpha_2}\right] = 1 - (\alpha_1 + \alpha_2) = 1 - \alpha$$
$$\Rightarrow P\left[\bar{x} - \tau_{\alpha_2}\sqrt{\bar{x}/n} < \lambda < \bar{x} - \tau_{1-\alpha_1}\sqrt{\bar{x}/n}\right] = 1 - \alpha,$$
using the approximation $\lambda = \hat{\lambda} = \bar{x}$ in the denominator. So a $100(1-\alpha)\%$ confidence interval for $\lambda$ is from $\bar{x} - \tau_{\alpha_2}\sqrt{\bar{x}/n}$ to $\bar{x} - \tau_{1-\alpha_1}\sqrt{\bar{x}/n}$.
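With the equal-tail choice this becomes $\bar{x} \mp \tau_{\alpha/2}\sqrt{\bar{x}/n}$; a sketch (ours, Python with numpy/scipy, simulated data):

```python
# Large-sample (1 - alpha) confidence interval for a Poisson mean (Example 5.6).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.poisson(lam=3.2, size=400)      # illustrative large sample
alpha = 0.05

xbar, n = x.mean(), len(x)
z = stats.norm.ppf(1 - alpha / 2)
half = z * np.sqrt(xbar / n)
print(xbar - half, xbar + half)         # approximate 95% interval for lambda
```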
Method 3 Method based on Chebysheff's inequality:
By Chebysheff's inequality, $\Pr\left[|T - E(T)| \leq \epsilon\, \sigma_T\right] > 1 - \frac{1}{\epsilon^2}$. Now, setting $1 - \frac{1}{\epsilon^2} = 1 - \alpha$, we can construct a confidence interval.

Example 5.7 Consider the problem of Example 5.3. Find the 100ð1  aÞ% confi-
dence interval of h by using the method of Chebysheff’s inequality.
Solution
We have $E(\hat{\theta}) = \frac{n}{n+1}\theta$ and $E(\hat{\theta} - \theta)^2 = \frac{2\theta^2}{(n+1)(n+2)}$.
By applying Chebysheff's inequality we get
$$P\left[\frac{|\hat{\theta} - \theta|}{\theta} < \epsilon\sqrt{\frac{2}{(n+1)(n+2)}}\right] > 1 - \frac{1}{\epsilon^2}.$$
Since $\hat{\theta} \xrightarrow{p} \theta$, we replace $\theta$ by $\hat{\theta}$, and for moderately large $n$,
$$P\left[\frac{|\hat{\theta} - \theta|}{\hat{\theta}} < \epsilon\sqrt{\frac{2}{(n+1)(n+2)}}\right] > 1 - \frac{1}{\epsilon^2}.$$
Choosing $1 - \frac{1}{\epsilon^2} = 1 - \alpha$, i.e. $\epsilon = \frac{1}{\sqrt{\alpha}}$, we have
$$P\left[\hat{\theta} - \frac{1}{\sqrt{\alpha}}\,\hat{\theta}\,\frac{\sqrt{2}}{\sqrt{(n+1)(n+2)}} < \theta < \hat{\theta} + \frac{1}{\sqrt{\alpha}}\,\hat{\theta}\,\frac{\sqrt{2}}{\sqrt{(n+1)(n+2)}}\right] > 1 - \alpha.$$
Again, $\frac{1}{\sqrt{(n+1)(n+2)}} \simeq \frac{1}{n}$ for large $n$, and using the fact that $\theta \geq \hat{\theta}$, we have
$$P\left[\hat{\theta} < \theta < \hat{\theta}\left(1 + \frac{1}{n}\sqrt{\frac{2}{\alpha}}\right)\right] > 1 - \alpha.$$
Thus $\left(\max x_i,\ \max x_i\left(1 + \frac{1}{n}\sqrt{\frac{2}{\alpha}}\right)\right)$ is an approximate $1-\alpha$ level confidence interval for $\theta$.

5.4 Shortest Length Confidence Interval


and Neyman’s Criterion

From the above discussion, it is clear that ð1  aÞ level C.I is not unique. In fact,
infinite number of C.I’s can be constructed by simple method [Because the equation
a1 þ a2 ¼ a; a1  0; a2  0 has infinite number of solution for ða1 ; a2 Þ]. Again
for different choice of statistic, we get different confidence intervals. For example,
in random sampling from
$$f(x; \theta) = \frac{1}{\theta} e^{-x/\theta}, \qquad 0 < x < \infty,$$
$\frac{2\sum X_i}{\chi^2_{2n,\, \alpha}}$ is a $(1-\alpha)$ level lower confidence bound for $\theta$.
[As $\frac{2X}{\theta} \sim \chi^2_2$, since the m.g.f. of $X$ is $(1 - t\theta)^{-1}$, so that the m.g.f. of $\frac{2X}{\theta}$ is $(1 - 2t)^{-1}$, the m.g.f. of $\chi^2_2$; hence $\frac{2\sum X_i}{\theta} \sim \chi^2_{2n}$.]
On the other hand, a $(1-\alpha)$ level lower confidence bound for $\theta$ based on $X_{(1)} = \min_i X_i$ is $\frac{2n X_{(1)}}{\chi^2_{2,\, \alpha}}$.
So we need some optimality criteria to choose one of the $(1-\alpha)$ level confidence intervals.
1. Shortest length confidence interval [Wilk’s criterion]
 
A ð1  aÞ level confidence interval I ðhÞ ¼ hðT Þ; hðT Þ based on T will be of
shortest length if the inequality
hð T Þ  hð T Þ  
 h ðT Þ  h ðT Þ; for all h holds for every other ð1  aÞ level C.I.
  
h ðT Þ; 
h ðT Þ based on the same statistic T.
Example 5.8 On the basis of an r.s. from N ðl; r2 Þ; r2 being unknown, a ð1  aÞ
 given by
h C.I. for l based on X is
level i
  sa2 prffiffi ; X
X   s1a1 prffiffi ; a1 ; a2  0 and a1 þ a2 ¼ a: The length of the
n n
interval is ðsa2  s1a1 Þ prffiffi : n

To find the shortest length confidence interval, we minimize ðsa2  s1a1 Þ


subject to a1 þ a2 ¼ a; a1 ; a2  0:
pffiffi
Owing to the symmetry of the distribution nðrxlÞ about ‘0’, the quantity sa2 
s1a1 will be a minimum when sa=2 ¼ s1a1 , i.e., when a1 ¼ a2 ¼ a=2: Thus the
h i
shortest length ð1  aÞ level C.I. for l based on x is x  sa=2 prffiffin ; x þ sa=2 prffiffin :

Remarks Occasionally, the length of a C.I is a random quantity. In this case, we


minimize its expected length. e.g. In random sampling from N ðl; r2 Þ; (both l and
hr are unknown), a ð1i  aÞ level C.I for h l is given i by
2

  t
X  ta ;n1 psffiffi ; X psffiffi : This length of the C.I is t psffiffi :
2 n 1a ;n1 1 n a ;n1  t1a ;n1 2 1 n
which is a random quantity.
h So, to find i the shortest (expected) length C.I, we minimize
ta2 ;n1  t1a1 ;n1 EpðsffiffinÞ subject to a1 ; a2  0 and a1 þ a2 ¼ a: Owing to the sym-
metry of tn1 distribution about ‘0’, the minimum is attained at a1 ¼ a2 ¼ a=2:
Therefore
h the required shortest i expected length confidence interval is
 sffiffi 
X  ta=2;n1 n ; X þ ta=2;n1 n :
p psffiffi

Example 5.9 Consider the problem discussed in Example 5.2. On the basis of a
random sample from N ðl; r2 Þ; l being known, a ð1  aÞ level CI for r2 is given by

P P
ðxi lÞ2 ðxi lÞ2
vn;a
2 ; v 2 , a1 ; a2  0 and a1 þ a2 ¼ a: The length of the interval is

1
2 n;1a

P
v2
1
 v21 ðXi  lÞ2 which has the expected value
n;1a1 n;a2

" #
1 1
 nr2 :
v2n;1a1 v2n;a2


We wish to minimize 1
v2n;1a
 v21 ;
n;a2
1

Rv2
2

subject to f ðv2 Þdv2 ¼ 1  a; where v21 ¼ v2n;1a1 ; v22 ¼ v2n;a2 and f ðv2 Þ is the
v21
p.d.f. of a chi-square r.v. with
2 n d.f. 3
v22
R
Now let / ¼ v12  v12 þ k4 f ðv2 Þdv2  ð1  aÞ5
1 2
v21

where k is a Lagrangian multiplier. We get



d/ 1  
¼  4  kf v21 ¼ 0
dv21 v1

d/ 1  
¼ 4 þ kf v22 ¼ 0
dv2 v2
2

1 1
)k ¼   ¼  4  2
v41 f v21 v2 f v2

Hence v21 and v22 are such that the equation

Zv2
2

     
v41 f v21 ¼ v42 f v22 is to be satisfied and f v2 dv2 ¼ 1  a:
v21

It is very difficult to find the actual values of v21 and v22 . In practice, the equal tails

P P
ðxi lÞ2 ðxi lÞ2
interval, v2
; v2 , is used.
n;a=2 n;1a=2

PSimilarly if l is unknown, the equal tail confidence interval,


P
ðxi xÞ2 ðxi xÞ2
v2
; v2 , is employed.
n1;a=2 n1;1a=2

Example 5.10 Consider the problem discussed in Example in 5.3. A ð1  aÞ level


C.I. for h is given by
max xi ; max xi ; a ; a  0; a þ a ¼ a: The length L of the interval is
u2 u1 1 2 1 2
 
u1  u2 max xi .
1 1

We minimize L subject to

Zu2
nun1 du ¼ un2  un1 ¼ 1  a
u1

This implies 1  a\un2

) ð1  aÞ1=n \u2  1

and


dL dL du1 1
¼ max xi þ
du2 du1 du2 u22

1 nu2n1 1
¼ max xi  2 n1 þ 2
u1 nu1 u2
un1 þ 1  u2n þ 1
¼ max xi \0;
u22 un1 þ 1

so that the minimum occurs at $u_2 = 1$. When $u_2 = 1$, $u_1 = \alpha^{1/n}$. Thus a $1-\alpha$ level confidence interval is given by $\left(\max_i x_i,\ \frac{\max_i x_i}{\alpha^{1/n}}\right)$. This confidence interval has the smallest length among all confidence intervals for $\theta$ based on $\max_i x_i$.

2. Neyman’s criterion
Let I1 ð X Þ and I2 ð X Þ be two ð1  aÞ levelconfidence intervals for h. I1 ð X Þ will be
accurate (or shorter) than I2 ð X Þ if

Ph0 fh 2 I1 ð X Þg  Ph0 fh 2 I2 ð X Þg8h; h0 2 H; h 6¼ h0 ðh0 ¼ true valueÞ

A ð1  aÞ level C.I. I ð X Þ is said to be most accurate (UMA) (or shortest) if


Ph0 fh 2 I ð X Þg  Ph0 fh 2 I  ð X Þg8h; h0 2 H; h 6¼ h0 for any other ð1  aÞ level C.I.
I  ð X Þ.
A ð1  aÞ level C.I. I ð X Þ is said to be unbiased if,

Ph0 fh 2 I ð X Þg  1  a ¼ Ph0 fh0 2 I ð X Þg8h; h0 2 H; h 6¼ h0

i.e. Probability (containing wrong value of θ) ≤ Probability (containing true


value of θ).
Implication An unbiased confidence interval includes true value more often than
it does contain wrong value.
A ð1  aÞ level unbiased C.I. I ð X Þ is said to be most accurate amongst the class
of unbiased ð1  aÞ level if Ph0 fh 2 I ð X Þg  Ph0 fh 2 I  ð X Þg8h; h0 2 H;h 6¼ h0 for
any other ð1  aÞ level unbiased C.I. I  ð X Þ
Relation between non randomized test and confidence interval
Theorem 5.1 Suppose Aðh0 Þ denoted the acceptance region of a level a test for
testing H0 :h ¼ h0 n o
Define S x ¼ h= x 2 AðhÞ
  

Then S x will be a ð1  aÞ level confidence interval for h.



 
Proof By the construction of S x , we have


 
x 2 AðhÞ , h 2 S x
 
n  o n o
)Ph h 2 S x ¼ Ph x 2 AðhÞ  1  a8h:
 

Note The implication of this theorem is that for a fixed x , the confidence region
  

S x is that set of values h0 for which the hypothesis H0 : h ¼ h0 is accepted when



x is the observed value of x
 
 
Theorem 5.2 Let S x be a ð1  aÞ level confidence interval for h: Define AðhÞ ¼
n  o 

x =h 2 S x : Then Aðh0 Þ will be an acceptance region of a level a


 
non-randomized test for testing H0 : h ¼ h0 .
Proof By the construction of AðhÞ, we have
 
x 2 AðhÞ , h 2 S x
 
n o n  o
)Ph x 2 AðhÞ ¼ Ph h 2 S x  1  a8h:
 

Relation between UMP non-randomized test and UMA confidence interval


Theorem 5.3 Suppose Aðh0 Þ denoted the acceptance  region
 n of an UMP,olevel-a
non-randomized test for testing H0 : h ¼ h0 . Define S x ¼ h= x 2 AðhÞ . Then
   

S x will be an UMA ð1  aÞ level confidence interval for h:



 
Proof By Theorem 5.1, it is clear that the level of set S x is ð1  aÞ.

Consider another acceptance region A ðh0 Þ of a level a non-randomized test for
testing H0: h¼ hn0 o  
Let S x ¼ h= x 2 A ðhÞ , then the level of S x is also ð1  aÞ.
  
Since AðhÞ is the acceptance region of a UMP non-randomized test, we can write,
n o n o
Ph0 x 2 AðhÞ  Ph0 x 2 A ðhÞ ðh 6¼ h0 Þ
 
n  o n  o
) Ph0 h 2 S x  Ph0 h 2 S x 8ðh 6¼ h0 Þh; h0 2 H
 

 
Since S x is arbitrary the proof follows immediately.

 
Theorem 5.4 Let S x be an UMA ð1  aÞ level confidence interval for h. Define
n  o
AðhÞ ¼ x =h 2 S x : Then Aðh0 Þ will be an acceptance region of a level-a
 
UMP test for testing H0 : h ¼ h0 .
Proof According to the construction of AðhÞ, Aðh0 Þ will be the acceptance region of
a level-a non-randomized test for testing H0 : h ¼ h0 .  
Now corresponding to another ð1  aÞ level C.I. S x ,

n  o
let A ðhÞ ¼ x : h 2 S x :
 

Then A ðh0 Þ will be also an acceptance region of a level-a non-randomized test


for testing H0 : 
h ¼h0
Now since S x is an UMA ð1  aÞ level C.I. for h,

n  o n  o
Ph0 h 2 S x  Ph0 h 2 S x 8ðh 6¼ h0 Þh; h0 2 H; h 6¼ h0
 
n o n o
) Ph0 x 2 AðhÞ  Ph0 x 2 A ðhÞ
 

which implies that Aðh0 Þ will be the acceptance region of level-a UMP
non-randomized test for testing H0 : h ¼ h0 , since A ðhÞ is arbitrary.
Relation between UMPU non-randomized test and UMAU confidence interval
Theorem 5.5 Let $A(\theta_0)$ be the acceptance region of a UMPU level-$\alpha$ non-randomized test for testing $H_0: \theta = \theta_0$. Define $S(x) = \{\theta \,/\, x \in A(\theta)\}$. Then $S(x)$ will be a UMAU $(1-\alpha)$ level confidence interval for $\theta$.



 
Proof According to construction of S x it will be a ð1  aÞ level Confidence
  n  o
Interval for h. Let S x ¼ h= x 2 A ðhÞ corresponding to any other accep-

 
tance region A ðh0 Þ of a level-a non-randomized test for testing H0 : h ¼ h0 .
Now since Aðh0 Þ is the acceptance region of a level-a UMPU non-randomized
testfor testing H0 : h ¼ h0
n o n o
Ph0 x 2 AðhÞ  Ph0 x 2 A ðh0 Þ  1  a8h; h0 2 H; h 6¼ h0
 
n  o n  o
) Ph0 h 2 S x  Ph0 h 2 S x 1  a
 

 
i.e., S x is a UMAU ð1  aÞ level C.I. for h, since A ðh0 Þ is arbitrary.

 
Theorem 5.6 Let S x be an UMAU ð1  aÞ level confidence interval for h.
n   o

Define AðhÞ ¼ x =h 2 S x : Then Aðh0 Þ will be an acceptance region of a


 
level-a UMPU test for testing H0 : h ¼ h0 .
Proof According to the construction of AðhÞ, Aðh0 Þ will be the acceptance region of
a level-a non-randomized test for testing H0 : h ¼ h0 .  
Now, corresponding to any other ð1  aÞ level C.I. S x for h,
n  o 
  
let A ðhÞ ¼ x =h 2 S x ; then A ðhÞ will also be an acceptance region of a
 
level-a non-randomized
  test for testing H0 : h ¼ h0 .
Since S x is UMAU ð1  aÞ level C.I.

n  o n  o
)Ph0 h 2 S x  Ph0 h 2 S x  1  a8h; h0 2 H; h 6¼ h0
 
n o n o
) Ph0 x 2 AðhÞ  Ph0 x 2 A ðh0 Þ  1  a
 

i.e. Aðh0 Þ will be an acceptance region of a level-a UMPU test for testing
H 0 : h ¼ h0 .
Example 5.11 Let X1 ; X2 ; . . .; Xn be a r.s. from Rð0; hÞ: The UMP level-a
non-randomized test for testing H0 : h ¼ h0 against h 6¼ h0 is given by the critical
pffiffiffi
region xðnÞ [ h0 or xðnÞ  h0 n a:
n  pffiffiffi o

Let AðhÞ ¼ x h n a\xðnÞ  h

  n o
Define S x ¼ hj x 2 AðhÞ
 
 pffiffiffi 
¼ hjh n a\xðnÞ  h
 
xð n Þ
¼ hjxðnÞ  h\ p ffiffiffi
n
a
n o
Thus, by Theorem 5.3, Sð xÞ ¼ hjxðnÞ  h\ pðnnffiffiaÞ will be a ð1  aÞ level UMA
x

confidence interval for h.


Chapter 6
Non-parametric Test

6.1 Introduction

In parametric tests we generally assume a particular form of the population dis-


tribution (say, normal distribution) from which a random sample is drawn and we
try to construct a test criterion (for testing hypothesis regarding parameter of the
population) and the distribution of the test criterion depends upon the parent
population.
In non-parametric tests the form of the parent population is unknown. We only
assume that the population, from which a random sample is drawn, is continuous
and try to develop a test criterion whose distribution is independent of the popu-
lation distribution under the hypothesis under consideration. A non-parametric test
is concerned with the form of the population but not with any parametric value.
A test procedure is said to be distribution free if the statistic used has a distri-
bution which does not depend upon the form of the distribution of the parent
population from which the sample is drawn. So in such procedure assumptions
regarding the population are not necessary.
Note Sometimes the term ‘distribution free’ is used instead of non-parametric. But
we should make some distinction between them.
In fact, the terms ‘distribution free’ and ‘non-parametric’ are not synonymous.
The term ‘distribution free’ is used to indicate the nature of the distribution of the
test statistic whereas the term ‘non-parametric’ is used to indicate the type of
hypothesis problem investigated.
Advantages and disadvantages of non-parametric method over parametric
method
Advantages
(i) Non-parametric methods are readily comprehensible, very simple and easy to
apply and do not require complicated sample theory.


(ii) No assumption is made about the form of frequency function of the parent
population from which the sample is drawn.
(iii) No parametric technique is applicable to data which are mere classifications (i.e. measured on a nominal scale), while non-parametric methods exist to deal with such data.
(iv) Since the socio-economic data are not, in general, normally distributed,
non-parametric tests have found applications in psychometry, sociology and
educational statistics.
(v) Non-parametric tests are available to deal with data which are given in ranks
or whose seemingly numerical scores have the strength of the ranks. For
example, no parametric test can be applied if the scores are given in grades
such as A, B, C, D, etc.
Disadvantages
(i) Non-parametric tests are appropriate when the measurements are only nominal or ordinal. Even in that case, if a parametric test exists, it is more powerful than the non-parametric test.
In other words, if all the assumptions of a statistical model are satisfied by the
data and if the measurements are of required strength, then non-parametric
tests are wasteful of time and data.
(ii) No non-parametric method exists for testing interactions in ANOVA model
unless special assumptions about the additivity of the model are made.
(iii) Non-parametric tests are designed to test statistical hypothesis only but not for
estimating parameters.

6.2 One-Sample Non-parametric Tests

In this section we consider the following one-sample non-parametric tests:


(i) Chi-square test
(ii) Kolmogorov–Smirnov test
(iii) Sign test
(iv) Wilcoxon signed-rank test
(v) Run test

6.2.1 Chi-Square Test (i.e. Test for Goodness of Fit)

Let the $n$ sample observations be continuous measurements grouped into $k$ class intervals, or let the observations themselves be the frequencies of $k$ mutually exclusive events $A_1, A_2, \ldots, A_k$ such that $S = A_1 + A_2 + \cdots + A_k$ is the space of the variable under consideration. The form of the distribution is not known. We want to test $H_0: F(x) = F_0(x)$ against $H_1: F(x) \neq F_0(x)$. Here $F_0(x)$ is specified with all its parameters.
Under $H_0$ we can obtain the probability $p_i$ that a random observation from $F_0$ belongs to the $i$th class $A_i$ $(i = 1, 2, \ldots, k)$. The expected frequency in the $i$th class is $e_i = np_i$ for $i = 1, 2, \ldots, k$. These are compared with the observed frequencies $x_i$. Pearson suggested the statistic
$$\chi^2 = \sum_{i=1}^{k}\frac{(x_i - np_i)^2}{np_i}.$$

If the agreement between the observed frequencies $(x_i)$ and the expected frequencies $(e_i)$ is close, then the differences $(x_i - np_i)$ will be small and consequently $\chi^2$ will be small; otherwise it will be large. The larger the value of $\chi^2$, the more likely it is that the observed frequencies did not come from the population under $H_0$. This means that the test is always right-sided. It can be shown that for large samples the sampling distribution of $\chi^2$ under $H_0$ follows the chi-square distribution with $(k-1)$ d.f. The approximation holds good if every $e_i \geq 5$. In case there are some $e_i < 5$, we have to combine adjacent classes till the expected frequency in the combined class is at least 5; then $k$ is the actual number of classes used in computing $\chi^2$. Thus the null hypothesis $H_0$ is rejected if the calculated $\chi^2 > \chi^2_{\alpha, k-1}$.

6.2.2 Kolmogorov–Smirnov Test

Let $X_1, X_2, \ldots, X_n$ be a sample from a continuous distribution function $F(x)$. We are to test $H_0: F(x) = F_0(x)\ \forall x$ against $H_1: F(x) \neq F_0(x)$ for some $x$.
Suppose $F_n(x)$ is the sample (empirical) distribution function corresponding to any given $x$; that is, if the number of observations $\leq x$ is $k$, then
$$F_n(x) = \frac{k}{n}.$$
The test statistic under $H_0$ is given by
$$D_n = \sup_x |F_n(x) - F_0(x)|,$$
which is known as the Kolmogorov–Smirnov statistic.
The distribution of $D_n$ does not depend on $F_0$ as long as $F_0$ is continuous. $H_0$ is rejected if $D_n > D_{n,\alpha}$.

Similarly, the one-sided KS statistics for one-sided alternatives are the following:
(i) for the alternative $H^+: F(x) \geq F_0(x)\ \forall x$, the appropriate statistic is
$$D_n^+ = \sup_x\,[F_n(x) - F_0(x)];$$
(ii) for the alternative $H^-: F(x) \leq F_0(x)\ \forall x$, the appropriate statistic is
$$D_n^- = \sup_x\,[F_0(x) - F_n(x)].$$
The statistics $D_n^+$ and $D_n^-$ have the same distribution because of symmetry. The test rejects $H_0$ if $D_n^+ > D_{n,\alpha}^+$ when the alternative is $F(x) \geq F_0(x)\ \forall x$, and rejects $H_0$ if $D_n^- > D_{n,\alpha}^-$ when the alternative is $F(x) \leq F_0(x)\ \forall x$, at level $\alpha$.

6.2.3 Sign Test

$F(x)$ is the unknown, continuous distribution function of the parent population, from which we draw a random sample $(x_1, x_2, \ldots, x_n)$. We define $\xi_p$ = the $p$th order population quantile,
$$\Rightarrow \Pr[X \leq \xi_p] = p,\quad \text{i.e.}\quad \Pr[X - \xi_p \leq 0] = p.$$
Assumption $F(x)$ is continuous in the neighbourhood of $\xi_p$. To test $H_0: \xi_p = \xi_p^0$.
Case 1 $H_1: \xi_p > \xi_p^0$
To perform the test we consider the number of positive quantities among $(x_1 - \xi_p^0), (x_2 - \xi_p^0), \ldots, (x_n - \xi_p^0)$. Sample values equal to $\xi_p^0$ are ignored. Suppose $S$ = total number of + signs. We note that, under $H_0$,
$$\Pr[X - \xi_p^0 \leq 0] = p \Rightarrow \Pr[X - \xi_p^0 > 0] = 1 - p = q,\ \text{say}.$$
$\Rightarrow$ Under $H_0$, $S \sim B(n, q)$.
Also, under $H_1$, $\Pr[X \leq \xi_p^0] < p$, i.e. $\Pr[X - \xi_p^0 > 0] > q$. Suppose, under $H_1$, $\Pr[X - \xi_p^0 > 0] = q'$ where $q' > q$.
$\Rightarrow$ Under $H_1$, $S \sim B(n, q')$ where $q' > q$.

Hence a large value of $S$ indicates the rejection of $H_0$. So the test is
$$\phi(S) = \begin{cases} 1 & \text{if } S > s \\ a & \text{if } S = s \\ 0 & \text{if } S < s \end{cases}$$
where $s$ and $a$ are such that
(I) $\Pr[S > s \mid H_0] < \alpha \leq \Pr[S \geq s \mid H_0]$
(II) $E_{H_0}\phi(S) = \alpha$.
From (I) we get $s$, and from (II) $\alpha = \Pr[S > s \mid H_0] + a\Pr[S = s \mid H_0]$,
$$\Rightarrow a = \frac{\alpha - \Pr[S > s \mid H_0]}{\Pr[S = s \mid H_0]}.$$
Hence the test is given by: $S > s$ $\Rightarrow$ rejection of $H_0$; $S < s$ $\Rightarrow$ acceptance of $H_0$; $S = s$ $\Rightarrow$ draw a random number with probability of rejection $a$ and probability of acceptance $1 - a$.
Case 2 $H_2: \xi_p < \xi_p^0$
Under $H_2$,
$$\Pr[X \leq \xi_p^0] > p, \quad \text{or } \Pr[X - \xi_p^0 \leq 0] > p, \quad \text{i.e. } \Pr[X - \xi_p^0 > 0] < 1 - p = q.$$
Suppose under $H_2$, $\Pr[X - \xi_p^0 > 0] = q'$ where $q' < q$.
$\Rightarrow$ Under $H_2$, $S \sim B(n, q')$ where $q' < q$.
So a small value of $S$ indicates the rejection of $H_0$. So our test is
$$\phi(S) = \begin{cases} 1 & \text{if } S < s \\ a & \text{if } S = s \\ 0 & \text{if } S > s \end{cases}$$

where $s$ and $a$ are such that
$$\Pr[S < s \mid H_0] < \alpha \leq \Pr[S \leq s \mid H_0] \quad \text{and} \quad E_{H_0}\phi(S) = \alpha,$$
i.e. $\Pr[S < s \mid H_0] + a\Pr[S = s \mid H_0] = \alpha$, or
$$a = \frac{\alpha - \Pr[S < s \mid H_0]}{\Pr[S = s \mid H_0]},$$
i.e. $S < s$ $\Rightarrow$ reject $H_0$; $S > s$ $\Rightarrow$ accept $H_0$; $S = s$ $\Rightarrow$ draw a random number with probability of rejection $a$ and probability of acceptance $1 - a$.
Large sample test Under $H_0$, $S \sim B(n, q)$,
$$\Rightarrow \text{under } H_0,\quad \tau = \frac{S - nq}{\sqrt{npq}} \sim N(0, 1)\ \text{approximately}, \qquad \omega_0: \tau < -\tau_\alpha.$$
Case 3 $H_3: \xi_p \neq \xi_p^0$
Under $H_3$, $\Pr[X \leq \xi_p^0] \neq p \Rightarrow \Pr[X - \xi_p^0 > 0] \neq q$.
Suppose under $H_3$, $\Pr[X - \xi_p^0 > 0] = q'$ where $q' \neq q$.
$\Rightarrow$ Under $H_3$, $S \sim B(n, q')$ where $q' \neq q$.
So a small or a large value of $S$ indicates the rejection of $H_0$. Here the test is
$$\phi(S) = \begin{cases} 1 & \text{if } S < s_1 \\ a_1 & \text{if } S = s_1 \\ 0 & \text{if } s_1 < S < s_2 \\ a_2 & \text{if } S = s_2 \\ 1 & \text{if } S > s_2 \end{cases}$$
where $s_1$ and $s_2$ are such that
$$\Pr[S < s_1 \mid H_0] < \alpha_1 \leq \Pr[S \leq s_1 \mid H_0], \qquad \Pr[S > s_2 \mid H_0] < \alpha_2 \leq \Pr[S \geq s_2 \mid H_0]$$
and $\alpha_1 + \alpha_2 = \alpha$. For simplicity we take $\alpha_1 = \alpha_2 = \alpha/2$.
$a_1$ and $a_2$ are such that
$$\frac{\alpha}{2} = \Pr[S < s_1 \mid H_0] + a_1\Pr[S = s_1 \mid H_0] \Rightarrow a_1 = \frac{\frac{\alpha}{2} - \Pr[S < s_1 \mid H_0]}{\Pr[S = s_1 \mid H_0]}$$

$$\frac{\alpha}{2} = \Pr[S > s_2 \mid H_0] + a_2\Pr[S = s_2 \mid H_0] \Rightarrow a_2 = \frac{\frac{\alpha}{2} - \Pr[S > s_2 \mid H_0]}{\Pr[S = s_2 \mid H_0]}$$
Thus, we reject $H_0$ if $S < s_1$ or $S > s_2$. We accept $H_0$ if $s_1 < S < s_2$, and randomize (or reach no conclusion) if $S = s_1$ or $S = s_2$.
Large sample test Under $H_0$, $S \sim B(n, q)$,
$$\Rightarrow \text{under } H_0,\quad \tau = \frac{S - nq}{\sqrt{npq}} \sim N(0, 1), \qquad \omega_0: |\tau| > \tau_{\alpha/2}.$$
Note For $p = \tfrac{1}{2}$, $\xi_p = \xi_{1/2}$ = median. Under $H_0$, $S \sim B\!\left(n, \tfrac{1}{2}\right)$ and then $S$ is symmetric about $\tfrac{n}{2}$. Therefore, for the two-sided test of Case 3,
$$\frac{n}{2} - s_1 = s_2 - \frac{n}{2} \Rightarrow s_1 = n - s_2, \ \text{and hence } a_1 = a_2.$$

6.2.4 Wilcoxon Signed-Rank Test

Another modification of the sign test is the Wilcoxon signed-rank test. It is used to test the hypothesis that the observations have come from a symmetrical population with a common specified median, say, $\mu_0$. Thus the problem is to test $H_0: \mu = \mu_0$. The signed-rank statistic $T^+$ is computed as follows:
1. Subtract $\mu_0$ from each observation.
2. Rank the resulting differences in order of absolute size, discarding sign.
3. Restore the sign of the original difference to the corresponding rank.
4. Obtain $T^+$, the sum of the positive ranks.
Similarly, $T^-$ is the sum of the negative ranks. Then under $H_0$ we expect $T^+$ and $T^-$ to be about the same. We also note that
$$T^+ + T^- = \sum_{i=1}^{n} i = \frac{n(n+1)}{2}.$$
The statistic $T^+$ (or $T^-$) is known as the Wilcoxon statistic. A large value of $T^+$ (or equivalently, a small value of $T^-$) means that most of the large deviations from $\mu_0$ are positive, and therefore we reject $H_0$ in favour of the alternative $H_1: \mu > \mu_0$.
Thus the test rejects $H_0$ at level $\alpha$ if $T^+ < C_1$ when $H_1: \mu < \mu_0$; if $T^+ > C_2$ when $H_1: \mu > \mu_0$; and if $T^+ < C_3$ or $T^+ > C_4$ when $H_1: \mu \neq \mu_0$, where $C_1, C_2, C_3$ and $C_4$ are such that

$$P[T^+ < C_1] = \alpha, \qquad P[T^+ > C_2] = \alpha, \qquad P[T^+ < C_3] + P[T^+ > C_4] = \alpha.$$

6.2.5 Run Test

Suppose we have a set of observations $(X_1, X_2, \ldots, X_n)$. We are to test $H_0$: the set of observations is random, against $H_1$: they are not random.
We replace each observation by a ‘+’ or ‘−’ sign according as it is larger or smaller than the median of the sample observations. Any observation equal to the median is simply discarded. A run is defined to be a sequence of values of the same kind bounded by values of the other kind. We compute the total number of runs $r$. Too many runs as well as too few runs give an indication of non-randomness. Thus the test rejects $H_0$ at level $\alpha$ if $r < r_1$ or $r > r_2$, where $r_1$ and $r_2$ are such that
$$P[r < r_1] = \alpha/2, \qquad P[r > r_2] = \alpha/2.$$

The one-sample run test is based on the order or sequence in which the indi-
vidual scores or observations originally were obtained.
Example 6.1 The theory predicts that the proportion of peas in the four groups A,
B, C and D should be 9:3:3:1. In an experiment among 556 peas, the numbers in the
four groups were 315, 108, 101 and 32. Does the experimental result support the
theory?
Solution If P1, P2, P3 and P4 be the proportions of peas in the four classes in the
whole population of peas, then the null hypothesis to be tested is

$$H_0: P_1 = \frac{9}{16},\quad P_2 = \frac{3}{16},\quad P_3 = \frac{3}{16},\quad P_4 = \frac{1}{16}.$$
The test statistic under $H_0$ is given by
$$\chi^2 = \sum_{i=1}^{k}\frac{(x_i - np_i^0)^2}{np_i^0}\ \ \text{with } (k-1)\ \text{d.f.} \; = \; \sum_{i=1}^{k}\frac{x_i^2}{np_i^0} - n.$$

The expected frequencies are
$$e_1 = np_1^0 = 556\times\frac{9}{16} = 312.75, \qquad e_2 = np_2^0 = 556\times\frac{3}{16} = 104.25,$$
$$e_3 = np_3^0 = 556\times\frac{3}{16} = 104.25, \qquad e_4 = np_4^0 = 556\times\frac{1}{16} = 34.75.$$
So,
$$\chi^2 = \frac{315^2}{312.75} + \frac{108^2}{104.25} + \frac{101^2}{104.25} + \frac{32^2}{34.75} - 556 = 556.47 - 556 = 0.47 \ \text{with 3 d.f.}$$

From the table we have $\chi^2_{0.05,3} = 7.815$. Since the calculated value of $\chi^2$, i.e. 0.47, is less than the tabulated value 7.815, it is not significant. Hence the null hypothesis may be accepted at the 5 % level of significance and we may conclude that the experimental result supports the theory.
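The computation of Example 6.1 can also be checked numerically. The following short Python sketch is only an illustration (it is not part of the original text) and assumes the NumPy and SciPy libraries; it applies the chi-square goodness-of-fit test to the same pea data.

```python
# Chi-square goodness-of-fit for Example 6.1 (pea data, expected ratio 9:3:3:1)
import numpy as np
from scipy import stats

observed = np.array([315, 108, 101, 32])
n = observed.sum()                         # 556
p0 = np.array([9, 3, 3, 1]) / 16           # hypothesised cell probabilities
expected = n * p0                          # 312.75, 104.25, 104.25, 34.75

chi2_stat, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(chi2_stat, p_value)                  # statistic is about 0.47, p-value is large

# reject H0 at level alpha if the statistic exceeds chi2_{alpha, k-1}
critical = stats.chi2.ppf(0.95, df=len(observed) - 1)   # 7.815
print(chi2_stat > critical)                # False: H0 is not rejected
```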
Example 6.2 Can the following sample be reasonably regarded as coming from a
uniform distribution on the interval (35,70): 36, 42, 44, 50, 64, 58, 56, 50, 37, 48,
52, 63, 57, 43, 39, 42, 47, 61, 53, 58? Use Kolmogorov–Smirnov test.
Solution Here we test $H_0: F(x) = F_0(x)$ for all $x$, where $F_0(x)$ is the distribution function of the uniform distribution on the interval (35, 70). Now
$$F_0(x) = 0 \ \text{if } x \leq 35; \qquad = \frac{x - 35}{35} \ \text{if } 35 < x < 70; \qquad = 1 \ \text{if } x \geq 70.$$

Rearranging the data in increasing order of magnitude, we have the following


results:
x    F0(x)    F20(x)    |F20(x) − F0(x)|
36 1/35 1/20 3/140
37 2/35 2/20 6/140
39 4/35 3/20 5/140
42 7/35 4/20 0
42 7/35 5/20 7/140
43 8/35 6/20 10/140
44 9/35 7/20 13/140
47 12/35 8/20 8/140
48 13/35 9/20 11/140
50 15/35 10/20 10/140
(continued)

(continued)
50 15/35 11/20 17/140
52 17/35 12/20 16/140
53 18/35 13/20 19/140
56 21/35 14/20 14/140
57 22/35 15/20 17/140
58 23/35 16/20 20/140
58 23/35 17/20 27/140
61 26/35 18/20 22/140
63 28/35 19/20 21/140
64 29/35 20/20 24/140

$$D_{20} = \sup_x |F_{20}(x) - F_0(x)| = \frac{27}{140} = 0.1929.$$

Let us take $\alpha = 0.05$. Then from the table $D_{20,0.05} = 0.294$. Since $0.1929 < 0.294$, we accept $H_0$ at the 5 % level of significance. So we can conclude that the given data have come from a uniform distribution on the interval (35, 70).
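A similar numerical check of Example 6.2 is possible. The sketch below is illustrative only and assumes SciPy; it evaluates the Kolmogorov–Smirnov statistic for the same data against the uniform distribution on (35, 70).

```python
# Kolmogorov-Smirnov test of Example 6.2: uniform distribution on (35, 70)
from scipy import stats

x = [36, 42, 44, 50, 64, 58, 56, 50, 37, 48,
     52, 63, 57, 43, 39, 42, 47, 61, 53, 58]

# scipy's 'uniform' is parameterised by loc and scale: support (loc, loc + scale)
D, p_value = stats.kstest(x, 'uniform', args=(35, 35))
print(D, p_value)        # D is about 0.193, as in the hand computation; H0 is not rejected
```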
Example 6.3 The following data represent the yields of maize in q/ha recorded
from an experiment.
16.4, 19.2, 24.5, 15.4, 17.3, 23.6, 22.7, 20.9, 18.2
Test whether the median yield (M) is 20 q/ha.
Solution We test $H_0: M = 20$ against $H_1: M \neq 20$. To test $H_0$, we find the differences $(X - 20)$ and write down their signs:
$$-\ \ -\ \ +\ \ -\ \ -\ \ +\ \ +\ \ +\ \ -$$
Here $n = 9$ and $r$ = number of ‘+’ signs = 4. This $r$ will be a binomial variate with parameters $n = 9$ and $p = 0.5$. To test $H_0$ against $H_1: M \neq 20 \equiv H_1: p \neq 0.5$, the critical region $\omega$ will be given by $r \geq r_{\alpha/2}$ and $r \leq r'_{\alpha/2}$, where $r_{\alpha/2}$ is the smallest integer and $r'_{\alpha/2}$ is the largest integer such that
$$P\!\left[r \geq r_{\alpha/2} \mid H_0\right] = \sum_{x = r_{\alpha/2}}^{9}\binom{9}{x}\left(\frac{1}{2}\right)^9 \leq \frac{\alpha}{2} = 0.025,$$
i.e.
$$\sum_{x=0}^{r_{\alpha/2}-1}\binom{9}{x}\left(\frac{1}{2}\right)^9 \geq 0.975,$$
and
$$P\!\left[r \leq r'_{\alpha/2} \mid H_0\right] = \sum_{x=0}^{r'_{\alpha/2}}\binom{9}{x}\left(\frac{1}{2}\right)^9 \leq \frac{\alpha}{2} = 0.025.$$

From the table we have $r_{\alpha/2} - 1 = 7$, i.e. $r_{\alpha/2} = 8$, and $r'_{\alpha/2} = 1$. Here $r'_{\alpha/2} = 1 < r = 4 < r_{\alpha/2} = 8$, so $H_0$ is accepted at the 5 % level of significance.
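The exact binomial computation underlying this sign test can be reproduced in a few lines of Python. The sketch below is illustrative only and assumes a recent version of SciPy (which provides stats.binomtest).

```python
# Exact sign test for Example 6.3: H0: median = 20 against a two-sided alternative
from scipy import stats

yields = [16.4, 19.2, 24.5, 15.4, 17.3, 23.6, 22.7, 20.9, 18.2]
m0 = 20
plus = sum(y > m0 for y in yields)     # number of '+' signs, here 4
n = sum(y != m0 for y in yields)       # number of non-zero differences, here 9

# two-sided exact binomial test of p = 0.5 based on the number of '+' signs
result = stats.binomtest(plus, n, p=0.5, alternative='two-sided')
print(result.pvalue)                   # well above 0.05, so H0 is not rejected
```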

Example 6.4 For the problem given in Example 6.3, test H0 : M ¼ 20 against
H1 : M 6¼ 20 by using Wilcoxon signed-rank test.
Solution The differences Xi  20 are
−3.6, −0.8, 4.5, −4.6, −2.7, 3.6, 2.7, 0.9, −1.8
The order sequence of numbers ignoring the sign and their ranks with original
signs are as follows:

0.8 0.9 1.8 2.7 2.7 3.6 3.6 4.5 4.6


−1 2 −3 4.5 −4.5 6.5 −6.5 8 −9

Thus, T þ = The sum of the positive ranks = 21 and T  = The sum of negative
ranks = 24.
We note that T þ þ T  ¼ nðn 2þ 1Þ ¼ 45
To test H0 : M ¼ 20 against H1 : M 6¼ 20, the critical region ω will be given by
T þ [ C4 and T þ \C3 at the level a. Here we take a ¼ 0:05:
From the table we have P½T þ [ 39  0:025 and

P½T þ \6  0:025

Since T þ ¼ 21 lies between 6 and 39 (table values), we accept H0 . It means that


the median yield of maize is 20 q/ha.
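For comparison, the following illustrative Python sketch (assuming SciPy) applies the Wilcoxon signed-rank test to the same maize yields.

```python
# Wilcoxon signed-rank test for Example 6.4: H0: median yield = 20 q/ha
import numpy as np
from scipy import stats

yields = np.array([16.4, 19.2, 24.5, 15.4, 17.3, 23.6, 22.7, 20.9, 18.2])
d = yields - 20

# scipy reports the smaller of the two signed-rank sums as its statistic
stat, p_value = stats.wilcoxon(d)
print(stat, p_value)    # statistic min(T+, T-) = 21 here; p-value exceeds 0.05, so H0 is accepted
```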
Example 6.5 Test whether the observations
21, 19, 22, 18, 20, 24, 15, 32, 35, 28, 30 are random.
Solution We test H0 : The observation are random against H1 : The observations
are not random.
The sample values are arranged in increasing order.

15; 18; 19; 20; 21; 22; 24; 28; 30; 32; 35

) Median = 22
Each original observation is replaced by ‘+’ or ‘−’ sign according as it is larger
or smaller than the median, i.e. 22. Any observation equal to median is simply
discarded. Thus we have from the original observation
21 19 22 18 20 24 15 32 35 28 30
- - x - - + - + + + +

Thus number of runs = r = 4, number of ‘+’ signs = n1 = 5 and number of ‘−’


signs = n2 = 5. From table for n1 = 5, n2 = 5 any observed r of 2 or less or of 10 or
more is in the region of rejection at 5 % level of significance. So H0 is accepted, i.e.
the observations are random.
Example 6.6 The males (M) and females (F) were queued in front of the railway
reservation counter in the order below

M F F M M M F M F F M M F M

Test whether the order of males and females in the queue was random.
Solution Here null hypothesis is
H0 : The order of males and females in the queue was random against
H1 : The order of males and females in the queue was not random.
For the given sequence,
MFFMMMFMFFMMFM
we have,
n1 = number of males = 8
n2 = number of females = 6
r = number of runs = 9
Since the observed value of r = 9 lies between the critical values 3 and 12, we
accept H0 at 5 % level of significance. It means that the order of males and females
in the queue was random.
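Counting runs about the median is easy to automate. The function below is an illustrative Python sketch, not part of the original text; the name runs_about_median is chosen here for convenience. Applied to the data of Example 6.5 it returns the same counts as the hand computation.

```python
# One-sample run test (Examples 6.5 and 6.6): count runs of '+'/'-' about the median
import numpy as np

def runs_about_median(x):
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    signs = [v > med for v in x if v != med]   # discard values equal to the median
    r = 1 + sum(s1 != s2 for s1, s2 in zip(signs, signs[1:]))   # number of runs
    n1 = sum(signs)                            # number of '+' signs
    n2 = len(signs) - n1                       # number of '-' signs
    return r, n1, n2

print(runs_about_median([21, 19, 22, 18, 20, 24, 15, 32, 35, 28, 30]))   # (4, 5, 5)
```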

6.3 Paired Sample Non-parametric Test

In this section we consider the following paired sample non-parametric tests:


(i) Sign test.
(ii) Wilcoxon signed-rank test.

6.3.1 Sign Test (Bivariate Single Sample Problem) or Paired


Sample Sign Test

Suppose we have a bivariate population with continuous distribution function


F(x,y) which is unknown but continuous. The ordinary sign test for the location
parameter of a univariate population is equally applicable to a paired sample
problem. This is the non-parametric version of paired ‘t’ test.

We draw a random sample $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ from $F(x, y)$. To test $H_0: \xi_p(x - y) = \xi_p^0$; writing $z = x - y$, this becomes $H_0: \xi_p(z) = \xi_p^0$, i.e. $H_0: \xi_p = \xi_p^0$, writing $\xi_p(z) = \xi_p$.
Assumption $z = x - y$ is continuous in the neighbourhood of $\xi_p(z)$. Note that $\Pr[z \leq \xi_p] = p \Rightarrow \Pr[z - \xi_p > 0] = q$, $q = 1 - p$. We define $S$ = total number of positive signs among $(z_1 - \xi_p^0), (z_2 - \xi_p^0), \ldots, (z_n - \xi_p^0)$.
$\Rightarrow$ Under $H_0$, $\Pr[z - \xi_p^0 > 0] = q$ and $S \sim B(n, q)$. Proceed as in Case 1, Case 2 and Case 3 of Sect. 6.2.
Note Since $\xi_p(x - y)$ is not necessarily equal to $\xi_p(x) - \xi_p(y)$, the paired sample sign test is a test for the quantile of the differences (but not for the difference of the quantiles), whereas the paired ‘t’ test is a test for the mean difference (and also for the difference of the means).

6.3.2 Wilcoxon Signed-Rank Test

This is another test used on matched pairs. It is more powerful than the sign test
because it gives more weight to large numerical differences between the members
of a pair than to small differences. Under matched-paired samples, the differences
d within n paired sample values ðx1i ; x2i Þ for i ¼ 1; 2; . . .; n are assumed to have
come from continuous and symmetric population differences. If Md is the median of
the population of differences, then the null hypotheses is that Md ¼ 0 and the
alternative hypothesis is one of Md [ 0; Md \0 or Md 6¼ 0:
The observed differences di ¼ x1i  x2i are ranked in increasing order of abso-
lute magnitude and the sum of ranks is computed for all the differences of like sign.
The test statistic $T$ is the smaller of these two rank-sums. Pairs with $d_i = 0$ are not
counted. On the null hypothesis, the expected values of the two rank-sums would be
equal. If the positive rank-sum is the smaller and is equal to or less than the table
value, the null hypothesis will be rejected at the corresponding level of significance
a in favour of the alternative hypothesis that Md [ 0. If the negative rank-sum is the
smaller, the alternative will be that Md \0. If a two-tailed test is required, the
alternative being that Md 6¼ 0, the given levels of significance should be doubled.
Example 6.7 For nine animals, tested under control conditions and experimental
conditions, the following values of a measured variable were observed:
Animal 1 2 3 4 5 6 7 8 9
Control (x1) 21 24 26 32 55 82 46 55 88
Experimental (x2) 18 9 23 26 82 199 42 30 62

Test whether a significant difference exists between the medians, using (i) the
sign test and (ii) the Wilcoxon signed-ranks test.
Solution Let h be the median of the distribution of differences. Our null hypothesis
will be H0 : h ¼ 0 against H1 : h 6¼ 0.
(i) Let di ¼ x1i  x2i be the difference of the values under control and experi-
mental conditions.

$$d_i:\ 3,\ 15,\ 3,\ 6,\ -27,\ -117,\ 4,\ 25,\ 26$$

Here we have 7 ‘+’ signs among 9 non-zero values. Under $H_0$, the number $r$ of ‘+’ signs will follow a binomial distribution with parameters $n = 9$ and $p = 0.5$. To test $H_0: \theta = 0 \equiv H_0: p = 0.5$ against $H_1: \theta \neq 0 \equiv H_1: p \neq 0.5$, the critical region $\omega$ will be given by $r \geq r_{\alpha/2}$ and $r \leq r'_{\alpha/2}$, where $r_{\alpha/2}$ is the smallest integer and $r'_{\alpha/2}$ is the largest integer such that
$$P\!\left[r \geq r_{\alpha/2} \mid H_0\right] = \sum_{x = r_{\alpha/2}}^{9}\binom{9}{x}\left(\frac{1}{2}\right)^9 \leq \frac{\alpha}{2} = 0.025,$$
i.e.
$$\sum_{x=0}^{r_{\alpha/2}-1}\binom{9}{x}\left(\frac{1}{2}\right)^9 \geq 0.975,$$
and
$$P\!\left[r \leq r'_{\alpha/2} \mid H_0\right] = \sum_{x=0}^{r'_{\alpha/2}}\binom{9}{x}\left(\frac{1}{2}\right)^9 \leq \frac{\alpha}{2} = 0.025.$$
From the table we get $r_{\alpha/2} - 1 = 7 \Rightarrow r_{\alpha/2} = 8$ and $r'_{\alpha/2} = 1$. For our example $r = 7$, which lies between $r'_{\alpha/2}\,(=1)$ and $r_{\alpha/2}\,(=8)$. So $H_0$ is accepted.
(ii) The observed differences di ¼ x1i  x2i are ranked in increasing order of
absolute magnitude and the sum of the ranks is computed for all the difference of
like sign. Thus

di 3 15 3 6 −27 −117 4 25 26
Rank 1.5 5 1.5 4 8 9 3 6 7

The test statistic T is the smaller of these two rank-sums (one for positive di and
one for negative di ). Here T = 17. From the table, we reject H0 at a ¼ 0:05 if either
T > 39 or T < 6. Since T > 6 and < 39, we accept H0 .
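Both paired-sample tests of Example 6.7 can be verified numerically. The Python sketch below is illustrative only and assumes SciPy; it applies the sign test and the Wilcoxon signed-rank test to the same control and experimental values.

```python
# Paired-sample sign test and Wilcoxon signed-rank test for Example 6.7
import numpy as np
from scipy import stats

control      = np.array([21, 24, 26, 32, 55,  82, 46, 55, 88])
experimental = np.array([18,  9, 23, 26, 82, 199, 42, 30, 62])
d = control - experimental                     # 3, 15, 3, 6, -27, -117, 4, 25, 26

# sign test: number of positive differences among the non-zero ones
plus, n = int((d > 0).sum()), int((d != 0).sum())
print(stats.binomtest(plus, n, 0.5).pvalue)    # > 0.05: H0 is not rejected

# Wilcoxon signed-rank test on the paired differences
print(stats.wilcoxon(control, experimental).pvalue)   # also > 0.05: H0 is not rejected
```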

6.4 Two-Sample Problem

Case 1 The two populations differ in location only:


We take two univariate populations with continuous distribution functions $F_1(x)$ and $F_2(x)$, which are unknown but continuous.
Assumption The two populations differ only in location.
To test $H_0: F_1(x) = F_2(x)$ against $H_1$: $F_2(x)$ is located to the right of $F_1(x)$, i.e. $H_0: F_1(x) = F_2(x)$ against $H_1: F_1(x) \geq F_2(x)$.
We draw a random sample $(x_1, x_2, \ldots, x_{n_1})$ of size $n_1$ from the first population and another sample $(x_{n_1+1}, x_{n_1+2}, \ldots, x_{n_1+n_2})$ of size $n_2$ from the second population. We write $F_1(x) = F(x)$ and $F_2(x) = F(x - \delta)$, where $\delta$ is an unknown location parameter. So we are to test $H_0: \delta = 0$ against $H_1: \delta > 0$.
A. Wilcoxon–Mann Whitney Rank-Sum Test
We pool the two samples and rank all the observations. Suppose $(R_1, R_2, \ldots, R_{n_1})$ and $(R_{n_1+1}, R_{n_1+2}, \ldots, R_{n_1+n_2})$ are the ranks of the 1st and 2nd sample observations respectively.
[Example: (10, 7, 9, 11, 3), $n_1 = 5$, is sample 1 and (20, 5, 17, 8), $n_2 = 4$, is sample 2. The pooled ordering is $3 < 5 < 7 < 8 < 9 < 10 < 11 < 17 < 20$ with ranks $1, 2, \ldots, 9$. Hence $(R_1 = 6, R_2 = 3, R_3 = 5, R_4 = 7, R_5 = 1)$ are the 1st sample ranks and $(R_6 = 9, R_7 = 2, R_8 = 8, R_9 = 4)$ are the 2nd sample ranks.]
If there is any tie then the corresponding observation is ignored. Let $S_1, S_2, \ldots, S_{n_2}$ be the ordered ranks of the 2nd sample observations, i.e. $S_1 < S_2 < \cdots < S_{n_2}$.
[In the example above $2 < 4 < 8 < 9 \Rightarrow R_7 = S_1, R_9 = S_2, R_8 = S_3, R_6 = S_4$.]
Define $T$ = sum of the ranks of the 2nd sample observations $= \sum_{j=1}^{n_2} R_{n_1+j} = \sum_{j=1}^{n_2} S_j$.
If $H_1$ is true, then it is expected that the second sample observations will generally have higher ranks and hence $T$ will be large. So a right tail test is appropriate here.
Hence for testing $H_0: \delta = 0$ against $H_1: \delta > 0$, $\omega_0: T > t_\alpha$ where $t_\alpha$ is such that $\Pr[T > t_\alpha \mid H_0] \leq \alpha$. Similarly, for $H_0: \delta = 0$ against $H_2: \delta < 0$, $\omega_0: T < t'_\alpha$ where $t'_\alpha$ is such that $\Pr[T < t'_\alpha \mid H_0] \leq \alpha$; and for $H_0: \delta = 0$ against $H_3: \delta \neq 0$, $\omega_0: T < t_1$ or $T > t_2$, where $t_1$ and $t_2$ are such that
$$P[T < t_1 \mid H_0] + P[T > t_2 \mid H_0] \leq \alpha.$$
Null distribution of T Under $H_0$ all the $n\,(= n_1 + n_2)$ observations $x_1, x_2, \ldots, x_{n_1}, x_{n_1+1}, \ldots, x_{n_1+n_2}$ are i.i.d., so that the second sample ranks can be considered as a random sample of size $n_2$ drawn without replacement from $(1, 2, \ldots, n)$.

$\Rightarrow \mu$ = population mean $= \frac{n+1}{2}$ and $\sigma^2$ = variance $= \frac{n^2-1}{12}$.
$$E\!\left(\frac{T}{n_2}\,\Big|\,H_0\right) = \mu = \frac{n+1}{2} \Rightarrow E(T \mid H_0) = \frac{n_2(n+1)}{2},$$
$$V\!\left(\frac{T}{n_2}\,\Big|\,H_0\right) = \frac{n - n_2}{n - 1}\cdot\frac{\sigma^2}{n_2} = \frac{n_1}{n-1}\cdot\frac{n^2-1}{12\,n_2} = \frac{n_1(n+1)}{12\,n_2} \Rightarrow V(T \mid H_0) = \frac{n_1 n_2 (n+1)}{12}.$$
Hence, if $n$ is large, under $H_0$,
$$\tau = \frac{T - \frac{n_2(n+1)}{2}}{\sqrt{n_1 n_2 (n+1)/12}} \ \text{is asymptotically } N(0, 1).$$
$\Rightarrow$ For $H_0: \delta = 0$ against $H_1: \delta > 0$, $\omega_0: \tau > \tau_\alpha$; against $H_2: \delta < 0$, $\omega_0: \tau < -\tau_\alpha$; and against $H_3: \delta \neq 0$, $\omega_0: |\tau| > \tau_{\alpha/2}$.
Mann–Whitney
An alternative description of the test is more convenient. Let
$$g(x_i, x_{n_1+j}) = \begin{cases} 1 & \text{if } x_{n_1+j} > x_i \\ 0 & \text{otherwise} \end{cases} \qquad i = 1(1)n_1,\ j = 1(1)n_2.$$
$U$ = number of pairs in which the 2nd sample observation is greater than the 1st sample observation
$$= \sum_{j=1}^{n_2}\sum_{i=1}^{n_1} g(x_i, x_{n_1+j}) = \sum_{j=1}^{n_2}\sum_{i=1}^{n_1} g(R_i, R_{n_1+j}) \quad \text{[number of pairs in which 2nd sample ranks exceed 1st sample ranks]}$$
$$= \sum_{j=1}^{n_2}\sum_{i=1}^{n_1} g(R_i, S_j), \qquad \text{where } \sum_{i=1}^{n_1} g(R_i, S_j) = \text{number of 1st sample ranks that are less than } S_j,$$
$$= \sum_{j=1}^{n_2}\left[(S_j - 1) - (j - 1)\right] = \sum_{j=1}^{n_2}(S_j - j) = T - \frac{n_2(n_2+1)}{2}.$$
$$\Rightarrow E(U \mid H_0) = E(T \mid H_0) - \frac{n_2(n_2+1)}{2} = \frac{n_1 n_2}{2}.$$

$$V(U \mid H_0) = V(T \mid H_0) = \frac{n_1 n_2 (n+1)}{12}.$$
Hence, for large $n$, under $H_0$,
$$\tau = \frac{U - \frac{n_1 n_2}{2}}{\sqrt{n_1 n_2 (n+1)/12}} \overset{a}{\sim} N(0, 1).$$
Therefore
(1) For $H_0: \delta = 0$ against $H_1: \delta > 0$, $\omega_0: \tau > \tau_\alpha$;
(2) For $H_0: \delta = 0$ against $H_2: \delta < 0$, $\omega_0: \tau < -\tau_\alpha$;
(3) For $H_0: \delta = 0$ against $H_3: \delta \neq 0$, $\omega_0: |\tau| > \tau_{\alpha/2}$.
B. Mood’s Median Test
Here we test $H_0: F_1(x) = F_2(x)$ against $H_1: F_1(x) \geq F_2(x)$, i.e. $H_0: \delta = 0$ against $H_1: \delta > 0$.
We draw a sample $(x_1, x_2, \ldots, x_{n_1})$ of size $n_1$ from the 1st population and another sample $(x_{n_1+1}, x_{n_1+2}, \ldots, x_{n_1+n_2})$ of size $n_2$ from the 2nd population.
We mix the two samples and arrange them in ascending order of magnitude, say $x_{(1)} < x_{(2)} < \cdots < x_{(n)}$, and let $x_{(m)}$ = the combined sample median.
Define $T$ = total number of 2nd sample observations $> x_{(m)}$ = total number of 2nd sample ranks $> m$. Here $T$ is the test statistic.
Under $H_1$, $T$ would be too large and hence a right tail test is appropriate.
So for $H_1: \delta > 0 \Rightarrow \omega_0: T > t_\alpha$ where $t_\alpha$ is such that $P_{H_0}[T \geq t_\alpha] \leq \alpha$; for $H_2: \delta < 0 \Rightarrow \omega_0: T < t'_\alpha$ where $P_{H_0}[T \leq t'_\alpha] \leq \alpha$; and for $H_3: \delta \neq 0 \Rightarrow \omega_0: T \leq t_1$ or $T \geq t_2$, where $t_1$, $t_2$ are such that $P_{H_0}[T \leq t_1] + P_{H_0}[T \geq t_2] \leq \alpha$.
Null distribution of T We want $P(T = t \mid H_0)$. Note that the totality of the pooled ranks $(1, 2, \ldots, n)$ is comprised of two subsets: $\{1, 2, \ldots, m\}$ and $\{m+1, m+2, \ldots, n\}$. Under $H_0$, the second sample ranks represent a random sample of size $n_2$ drawn without replacement from the entire set. Since $T$ = number of 2nd sample ranks exceeding $m$, the probability that there will be just $t$ members from the 2nd subset in the random sample of size $n_2$ is given by the hypergeometric law:
$$P(T = t \mid H_0) = \frac{\binom{n-m}{t}\binom{m}{n_2 - t}}{\binom{n}{n_2}}$$
$$\Rightarrow E(T \mid H_0) = \frac{n_2(n-m)}{n} \quad \text{and} \quad V(T \mid H_0) = \frac{n_1 n_2\, m (n-m)}{n^2(n-1)}.$$

As $n \to \infty$, $\frac{m}{n} \simeq \frac{1}{2}$ and then $E(T \mid H_0) \simeq \frac{n_2}{2}$ and $V(T \mid H_0) \simeq \frac{n_1 n_2}{4n}$.
$\Rightarrow$ For large $n$, under $H_0$,
$$\tau = \frac{T - n_2/2}{\sqrt{n_1 n_2/(4n)}} \overset{a}{\sim} N(0, 1)$$
$\Rightarrow$ for $H_1: \delta > 0 \Rightarrow \omega_0: \tau > \tau_\alpha$; for $H_2: \delta < 0 \Rightarrow \omega_0: \tau < -\tau_\alpha$; and for $H_3: \delta \neq 0 \Rightarrow \omega_0: |\tau| > \tau_{\alpha/2}$.

Case II The two populations differ in every respect, i.e. with respect to location, dispersion, skewness, kurtosis, etc.
C. Wald–Wolfowitz Run Test
$H_0: F_1(x) = F_2(x)$ against $H_1: F_1(x) \neq F_2(x)$.
Here also we arrange the combined sample in ascending order $x_{(1)} < x_{(2)} < \cdots < x_{(n)}$.
Suppose $(R_1, \ldots, R_{n_1})$ are the ranks of the 1st sample observations and $(R_{n_1+1}, \ldots, R_{n_1+n_2})$ are the ranks of the 2nd sample observations. According to the ordered arrangement, we write $z_a = 0$ if $x_{(a)}$ comes from the 1st sample, and $z_a = 1$ if $x_{(a)}$ comes from the 2nd sample.
We note that the 1st sample can be written as $x_{(R_1)}, x_{(R_2)}, \ldots, x_{(R_{n_1})}$ and the 2nd sample as $x_{(R_{n_1+1})}, x_{(R_{n_1+2})}, \ldots, x_{(R_{n_1+n_2})}$.
$$\Rightarrow z_a = 0 \ \text{if } a \in (R_1, R_2, \ldots, R_{n_1}); \qquad = 1 \ \text{if } a \in (R_{n_1+1}, R_{n_1+2}, \ldots, R_{n_1+n_2}).$$
So $z_1, z_2, \ldots, z_n$ is a sequence of 0's and 1's determined by $(R_1, R_2, \ldots, R_n)$. Let $U$ = number of ‘0’ runs, $V$ = number of ‘1’ runs, and $W = U + V$ = total number of runs. Here $W$ is our test statistic.
The idea is that if the populations are identical, then the 1st sample and 2nd sample ranks would get thoroughly mixed up, i.e. the runs of ‘0’ and ‘1’ would be thoroughly interspersed, so $W$ would be large. On the other hand, if the two populations are not identical, i.e. if $H_0$ is not true, then the arrangement of runs will be patchy, so $W$ would be too small. Hence a left tail test is appropriate.

Hence $\omega_0: W \leq w_\alpha$, where $w_\alpha$ is such that $P_{H_0}[W \leq w_\alpha] \leq \alpha$. It can be shown that under $H_0$
$$\Pr[U = u, V = v] = \begin{cases} 0 & \text{if } |u - v| \geq 2 \\[4pt] \dfrac{\binom{n_1-1}{u-1}\binom{n_2-1}{v-1}}{\binom{n}{n_1}} & \text{if } |u - v| = 1 \\[10pt] \dfrac{2\binom{n_1-1}{u-1}\binom{n_2-1}{v-1}}{\binom{n}{n_1}} & \text{if } u - v = 0 \end{cases}$$
$$\Rightarrow P_{H_0}[W = 2m] = P_{H_0}\{u = m, v = m\} = \frac{2\binom{n_1-1}{m-1}\binom{n_2-1}{m-1}}{\binom{n}{n_1}}$$
and
$$P_{H_0}[W = 2m+1] = P_{H_0}\{u = m, v = m+1\} + P_{H_0}\{u = m+1, v = m\} = \frac{\binom{n_1-1}{m-1}\binom{n_2-1}{m} + \binom{n_1-1}{m}\binom{n_2-1}{m-1}}{\binom{n}{n_1}}.$$
It can be shown that
$$E(W \mid H_0) = \frac{2n_1 n_2}{n} + 1, \qquad V(W \mid H_0) = \frac{2n_1 n_2}{n(n-1)}\left(\frac{2n_1 n_2}{n} - 1\right).$$

For large $n_1$ and $n_2$, under $H_0$,
$$\tau = \frac{W - E_{H_0}(W)}{\sqrt{V_{H_0}(W)}} \overset{a}{\sim} N(0, 1). \tag{6.1}$$
(Note: Since $U$ and $V$ are not independent, the traditional CLT for $W = U + V$ is not applicable here. Still, (6.1) holds, as shown by Wald and Wolfowitz using Stirling's approximation.) We write $\lambda_1 = \frac{n_1}{n}$ and $\lambda_2 = \frac{n_2}{n}$, so $\lambda_1 + \lambda_2 = 1$.
$$\Rightarrow E(W \mid H_0) = 2n\lambda_1\lambda_2 + 1 \simeq 2n\lambda_1\lambda_2 \quad \text{and} \quad V(W \mid H_0) \simeq 4n\lambda_1^2\lambda_2^2.$$
Then
$$\tau = \frac{W - 2n\lambda_1\lambda_2}{\sqrt{4n\lambda_1^2\lambda_2^2}} \overset{a}{\sim} N(0, 1), \qquad \omega_0: \tau \leq -\tau_\alpha.$$
D. Kolmogorov–Smirnov test
Let $X_1, X_2, \ldots, X_{n_1}$ be from $F_1$ and $X_{n_1+1}, X_{n_1+2}, \ldots, X_n$ be from $F_2$. We are to test $H_0: F_1(x) = F_2(x)\ \forall x$ against
$H_1: F_1(x) \geq F_2(x)\ \forall x$, with $F_1(x) > F_2(x)$ for some $x$;
or $H_2: F_1(x) \leq F_2(x)\ \forall x$, with $F_1(x) < F_2(x)$ for some $x$;
or $H_3: F_1(x) \neq F_2(x)$ for some $x$.
Let ‘\#’ denote the number of cases satisfying a stated condition, and define
$$F_{1n_1}(x) = \frac{\#\{x_a \leq x,\ a = 1, 2, \ldots, n_1\}}{n_1}, \qquad F_{2n_2}(x) = \frac{\#\{x_b \leq x,\ b = n_1+1, n_1+2, \ldots, n\}}{n_2}.$$

Test statistics
$$D^+_{n_1,n_2} = \sup_x\{F_{1n_1}(x) - F_{2n_2}(x)\} \ \text{for } H_1; \qquad D^-_{n_1,n_2} = \sup_x\{F_{2n_2}(x) - F_{1n_1}(x)\} \ \text{for } H_2;$$
$$D_{n_1,n_2} = \sup_x|F_{1n_1}(x) - F_{2n_2}(x)| = \max\left\{D^+_{n_1,n_2},\ D^-_{n_1,n_2}\right\} \ \text{for } H_3.$$
Let the 2nd sample ranks be $R_{n_1+1}, \ldots, R_n$ with ordered ranks $S_1 < S_2 < \cdots < S_{n_2}$. Similarly, the 1st sample ranks are $R_1, R_2, \ldots, R_{n_1}$ with ordered ranks $S'_1 < S'_2 < \cdots < S'_{n_1}$. Then
$$D^+_{n_1,n_2} = \sup_x\{F_{1n_1}(x) - F_{2n_2}(x)\} = \max_{i=1,\ldots,n_1}\max\left\{\frac{i}{n_1} - \frac{S'_i - i}{n_2},\ 0\right\}.$$
Similarly,
$$D^-_{n_1,n_2} = \max\left\{0,\ \max_{j=1,\ldots,n_2}\left(\frac{j}{n_2} - \frac{S_j - j}{n_1}\right)\right\}, \qquad D_{n_1,n_2} = \max\left\{D^+_{n_1,n_2},\ D^-_{n_1,n_2}\right\}.$$
Under $H_0$, $D^+$, $D^-$ and $D$ are distribution free. [Under $H_0$, the distribution of $(S_1, S_2, \ldots, S_{n_2})$, $(S'_1, S'_2, \ldots, S'_{n_1})$ does not depend on the common distribution $F_1 = F_2$.]
Critical region: under $H_0$ we expect $D^+$, $D^-$ and $D$ to be very small. Hence a right tailed test based on the $D$'s is appropriate.
Asymptotic distribution
For a one-sided test,
$$P_{H_0}\left[\sqrt{\frac{n_1 n_2}{n_1 + n_2}}\, D^+_{n_1,n_2} \leq z\right] \to 1 - e^{-2z^2} \quad \text{as } \min(n_1, n_2) \to \infty,\ z > 0.$$
Practically we find a $z$ such that $e^{-2z^2} = \alpha$ and reject $H_0$ if $\sqrt{\frac{n_1 n_2}{n_1+n_2}}\,(\text{observed } D^+_{n_1,n_2}) \geq z$.
For a two-sided test,
$$P_{H_0}\left[\sqrt{\frac{n_1 n_2}{n_1 + n_2}}\, D_{n_1,n_2} \leq z\right] \to 1 - 2\sum_{i=1}^{\infty}(-1)^{i-1} e^{-2i^2 z^2} \quad \text{as } \min(n_1, n_2) \to \infty.$$
Advantages of the K–S test over the homogeneity $\chi^2$ test are as follows:
1. The K–S test is applicable to ungrouped data, while $\chi^2$ is applicable to grouped data only.
2. Under $H_0$ the K–S test is exactly distribution free, while $\chi^2$ is only asymptotically distribution free.
3. The K–S test is consistent against any alternative, while $\chi^2$ is so for specific alternatives only.
Example 6.8 Twelve 4-year-old boys and twelve 4-year-old girls were observed
during two 15 min play sessions and each child’s play during these two periods was
scored as follows for incidence and degree of aggression:
Boys : 86; 69; 72; 65; 113; 65; 118; 45; 141; 104; 41; 50
Girls : 55; 40; 22; 58; 16; 7; 9; 16; 26; 36; 20; 15

Test the hypothesis that there were sex differences in the amount of aggression
shown, using (a) the Wald-Wolfowitz runs test, (b) the Mann–Whitney–Wilcoxon
test and (c) the Kolmogorov–Smirnov test.
Solution We want to test H0 : incidence and degree of aggression are the same in
four-year olds of both sexes against H1 : four-year-old boys and four-year-old girls
display differences in incidence and degree of aggression.

(a) Wald–Wolfowitz runs test


We combine the scores of boys (B’s) and girls (G’s) in a single-ordered series, we
may determine the number of runs of G’s and B’s. The ordered series is given below.

Score 7 9 15 16 16 20 22 26 36 40 41 45 50 55 58
Groups G G G G G G G G G G B B B G G
Runs _________________1___________________________ ____2_____ __3___
Score 65 65 69 72 86 104 113 118 141
Groups B B B B B B B B B
Runs __________________4______________________

Each run is underlined and we observe that r = 4.


From the table for n1 = 12, n2 = 12, we reject H0 at a ¼ 0:05 if r  7. Since our
value of r is smaller than 7, we may reject H0 . So we can conclude that boys and
girls display differences in aggression.
(b) Mann–Whitney–Wilcoxon test
The pooled sample and the ranks are given below:

Score  7  9  15  16   16   20  22  26  36  40  41  45  50  55  58  65    65    69  72  86  104  113  118  141
Group  G  G  G   G    G    G   G   G   G   G   B   B   B   G   G   B     B     B   B   B   B    B    B    B
Rank   1  2  3   4.5  4.5  6   7   8   9   10  11  12  13  14  15  16.5  16.5  18  19  20  21   22   23   24
The sum of the ranks for the observations corresponding to the boys is

R1 ¼ 11 þ 12 þ 13 þ 16:5 þ 16:5 þ 18 þ 19 þ 20 þ 21 þ 22 þ 23 þ 24 ¼ 216

and that for girls is

R2 ¼ 1 þ 2 þ 3 þ 4:5 þ 4:5 þ 6 þ 7 þ 8 þ 9 þ 10 þ 14 þ 15 ¼ 84

The smaller rank-sum is 84. This corresponds to girls.


Hence

n2 ðn2 þ 1Þ
U ¼ n1 n2 þ  R2
2
¼ 144 þ 78  84 ¼ 138

Or, equivalently,

n1 ðn1 þ 1Þ
U ¼ n1 n2 þ  R1
2
¼ 144 þ 78  216 ¼ 6

The test statistic is given by the smaller of the two quantities. Here U = 6. The
other value of U can be obtained from the relation U 0 = n1 n2 − U = 144 – 6 = 138.
The critical value of U for a two-tailed test at a = 0.05 and n1 ¼ n2 = 12 is 37. The
observed U = 6 is less than the table value. Hence it is significant at 5 % level.
Hence H0 is rejected.
(c) Kolmogorov–Smirnov test
The scores of the boys and girls are presented in two frequency distributions shown
below:

Score (x) No. of boys No. of girls F12 ðxÞ G12 ðxÞ jF12 ðxÞ  G12 ðxÞj
7–20 0 6 0 6/12 6/12
21–34 0 2 0 8/12 8/12
35–48 2 2 2/12 10/12 8/12
49–62 1 2 3/12 12/12 9/12
63–76 4 0 7/12 12/12 5/12
77–90 1 0 8/12 12/12 4/12
91–104 1 0 9/12 12/12 3/12
105–118 2 0 11/12 12/12 1/12
119–132 0 0 11/12 12/12 1/12
133–146 1 0 12/12 12/12 0

$D_{12,12} = \sup_x|F_{12}(x) - G_{12}(x)| = 9/12$. From the table, the critical value for $n_1 = n_2 = 12$ at level $\alpha = 0.05$ is $D_{12,12,0.05} = 6/12$. Since $D_{12,12} > D_{12,12,0.05}$, we reject $H_0$.
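The two-sample analyses of Example 6.8 can likewise be reproduced. The following Python sketch is illustrative only and assumes SciPy; it runs the Mann–Whitney–Wilcoxon test and the two-sample Kolmogorov–Smirnov test on the boys' and girls' scores.

```python
# Two-sample tests for Example 6.8 (aggression scores of boys and girls)
from scipy import stats

boys  = [86, 69, 72, 65, 113, 65, 118, 45, 141, 104, 41, 50]
girls = [55, 40, 22, 58, 16, 7, 9, 16, 26, 36, 20, 15]

# Mann-Whitney-Wilcoxon test (two-sided)
u_stat, p_u = stats.mannwhitneyu(girls, boys, alternative='two-sided')
print(u_stat, p_u)        # U = 6 for the girls' side; p-value well below 0.05, so H0 is rejected

# Kolmogorov-Smirnov two-sample test
d_stat, p_d = stats.ks_2samp(boys, girls)
print(d_stat, p_d)        # D is about 0.75, close to the hand value 9/12; H0 again rejected
```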

6.5 Non-parametric Tolerance Limits

We draw a random sample $(X_1, X_2, \ldots, X_n)$ from a distribution with continuous distribution function $F(x)$. We define functions of the sample observations $L = L(x_1, x_2, \ldots, x_n)$ and $U = U(x_1, x_2, \ldots, x_n)$ such that $L < U$.
If
$$\Pr[\Pr(L \leq X \leq U) \geq \beta] = \gamma \tag{6.2}$$
then the interval $(L, U)$ is called a $100\beta$ % tolerance interval with tolerance coefficient $\gamma$. $L$ and $U$ are called the lower and upper tolerance limits respectively. If the determination of $\gamma$ does not depend upon $F$, then the limits $(L, U)$ are called non-parametric (distribution free) tolerance limits. We note that (6.2) can be written as
$$\Pr\{F(U) - F(L) \geq \beta\} = \gamma, \tag{6.3}$$
that is, a tolerance interval $(L, U)$ for a continuous distribution having c.d.f. $F(x)$ with tolerance coefficient $\gamma$ is a random interval such that the probability is $\gamma$ that the area between the endpoints of the interval $(L, U)$ is at least a certain pre-assigned quantity $\beta$.
If $L$ and $U$ are two order statistics, say $x_{(r)}$ and $x_{(s)}$ $(r < s)$, then (6.3) is equivalent to $\Pr\{F(x_{(s)}) - F(x_{(r)}) \geq \beta\} = \gamma$.
Wilks has shown that the order statistics provide non-parametric tolerance limits,
while it is Robbins who has shown that it is only the order statistics which provide
distribution free tolerance limits.
Determination of Tolerance Limits
The joint distribution of $x_{(r)}, x_{(s)}$ is
$$g(x_{(r)}, x_{(s)}) = \frac{n!}{(r-1)!(s-r-1)!(n-s)!}\left[F(x_{(r)})\right]^{r-1}\left[F(x_{(s)}) - F(x_{(r)})\right]^{s-r-1}\left[1 - F(x_{(s)})\right]^{n-s} f(x_{(r)})f(x_{(s)}), \quad x_{(r)} < x_{(s)}.$$
Putting $U = F(x_{(r)})$ and $V = F(x_{(s)})$ we get
$$g(u, v) = \frac{n!}{(r-1)!(s-r-1)!(n-s)!}\,u^{r-1}(v-u)^{s-r-1}(1-v)^{n-s}, \quad 0 < u < v < 1.$$

Again we put $W = U$, $Y = V - U$, so that $U = W$, $V = W + Y$, with $0 < y < 1$, $0 < w < 1 - y$.
$$\Rightarrow g(w, y) = \frac{n!}{(r-1)!(s-r-1)!(n-s)!}\,w^{r-1} y^{s-r-1}(1 - w - y)^{n-s}$$
$$\Rightarrow g(y) = \frac{n!}{(r-1)!(s-r-1)!(n-s)!}\,y^{s-r-1}\int_0^{1-y} w^{r-1}(1-w-y)^{n-s}\,dw$$
(substituting $w = (1-y)t$)
$$= \frac{n!}{(r-1)!(s-r-1)!(n-s)!}\,y^{s-r-1}(1-y)^{n+r-s}\int_0^1 t^{r-1}(1-t)^{n-s}\,dt$$
$$= \frac{\Gamma(n+1)}{\Gamma(s-r)\Gamma(n+r-s+1)}\,y^{s-r-1}(1-y)^{n+r-s} = \frac{1}{B(s-r,\ n+r-s+1)}\,y^{s-r-1}(1-y)^{n+r-s}, \quad 0 < y < 1.$$
Now
$$\Pr\{F(x_{(s)}) - F(x_{(r)}) \geq \beta\} = \gamma \;\Leftrightarrow\; \Pr[Y \geq \beta] = \gamma \;\Leftrightarrow\; \Pr[Y \leq \beta] = 1 - \gamma,$$
i.e.
$$\int_0^\beta g(y)\,dy = 1 - \gamma, \qquad \text{i.e.}\qquad \frac{\int_0^\beta y^{s-r-1}(1-y)^{n+r-s}\,dy}{B(s-r,\ n+r-s+1)} = 1 - \gamma,$$
i.e.
$$I_\beta(s-r,\ n+r-s+1) = 1 - \gamma. \tag{6.4}$$

For given $\beta$, $\gamma$ and $n$ we choose $r$ and $s$ satisfying (6.4) such that $r + s = n + 1$, that is, $x_{(r)}$ and $x_{(s)}$ are symmetrically placed.
Particular case: $r = 1$, $s = n$. Then (6.4) $\Rightarrow I_\beta(n-1, 2) = 1 - \gamma$,
$$\text{i.e.}\quad 1 - \gamma = \frac{\int_0^\beta t^{n-2}(1-t)\,dt}{B(n-1, 2)} = \left[\frac{\beta^{n-1}}{n-1} - \frac{\beta^n}{n}\right]\Big/\frac{\Gamma(n-1)\Gamma(2)}{\Gamma(n+1)} = \frac{n\beta^{n-1} - (n-1)\beta^n}{n(n-1)}\cdot n(n-1)$$
$$\Rightarrow 1 - \gamma = n\beta^{n-1}(1-\beta) + \beta^n.$$

That is, $1 - \gamma \simeq n\beta^{n-1}(1-\beta)$ as $0 < \beta < 1$ and $n \to \infty$. So, for large $n$,
$$1 - \gamma \simeq n\beta^{n-1}(1-\beta).$$
For given $\beta$ and $\gamma$, one can find $n$ from this relationship.
Alternative
For $\mathrm{Bin}(n, p)$, we know
$$\sum_{x=0}^{c}\binom{n}{x}p^x q^{n-x} = I_q(n-c,\ c+1) = 1 - I_p(c+1,\ n-c).$$
Then (6.4) $\Rightarrow$
$$\gamma = 1 - I_\beta(s-r,\ n+r-s+1) = \sum_{x=0}^{s-r-1}\binom{n}{x}\beta^x(1-\beta)^{n-x}.$$
So for given $n$, $\beta$ and $\gamma$ we can find $s$ and $r$ such that $x_{(r)}$ and $x_{(s)}$ are symmetrically placed.
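For the particular case $r = 1$, $s = n$, the relation $1 - \gamma = n\beta^{n-1}(1-\beta) + \beta^n$ can be solved numerically for the smallest adequate $n$. The Python sketch below is illustrative only; the function name tolerance_sample_size is chosen here and is not from the text.

```python
# Smallest n such that (x_(1), x_(n)) is a 100*beta % tolerance interval
# with coefficient gamma, using 1 - gamma = n*beta**(n-1)*(1-beta) + beta**n
def tolerance_sample_size(beta, gamma):
    n = 2
    while n * beta ** (n - 1) * (1 - beta) + beta ** n > 1 - gamma:
        n += 1
    return n

print(tolerance_sample_size(0.90, 0.95))   # n = 46 for beta = 0.90, gamma = 0.95
```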

6.6 Non-parametric Confidence Interval for $\xi_p$

Suppose $F(x)$ is continuous and a random sample $(x_1, x_2, \ldots, x_n)$ is drawn from it. $\xi_p$ is the $p$-th order quantile, so $P[X \leq \xi_p] = p$. Define $X_{(r)}$ and $X_{(s)}$ as the $r$th and $s$th order statistics, $r < s$. Then $(X_{(r)}, X_{(s)})$ is said to be a $100(1-\alpha)$% confidence interval for $\xi_p$ if
$$\Pr[X_{(r)} \leq \xi_p \leq X_{(s)}] = 1 - \alpha. \tag{6.5}$$
Now,
$$\Pr[X_{(r)} \leq \xi_p \leq X_{(s)}] = \Pr[\xi_p \leq X_{(s)}] - \Pr[\xi_p \leq X_{(r)}] = \Pr[X_{(s)} \geq \xi_p] - \Pr[X_{(r)} \geq \xi_p]$$
$$= 1 - \Pr[X_{(s)} < \xi_p] - 1 + \Pr[X_{(r)} < \xi_p] = \Pr[X_{(r)} < \xi_p] - \Pr[X_{(s)} < \xi_p]$$
$$= \Pr[\text{at least } r \text{ of the observations} < \xi_p] - \Pr[\text{at least } s \text{ of the observations} < \xi_p]$$

$$= \sum_{x=r}^{s-1}\binom{n}{x}p^x(1-p)^{n-x} \tag{6.6}$$
$$= \sum_{x=0}^{s-1}\binom{n}{x}p^x(1-p)^{n-x} - \sum_{x=0}^{r-1}\binom{n}{x}p^x(1-p)^{n-x}$$
$$= 1 - I_p(s,\ n-s+1) - 1 + I_p(r,\ n-r+1) = I_p(r,\ n-r+1) - I_p(s,\ n-s+1).$$
Since $\Pr[X_{(r)} \leq \xi_p \leq X_{(s)}] = 1 - \alpha$, $r$ and $s$ are such that
$$1 - \alpha = I_p(r,\ n-r+1) - I_p(s,\ n-s+1). \tag{6.7}$$
Given $\alpha$ and $n$, the selection of $r$ and $s$ satisfying (6.7) is not unique. We select that pair of $r$ and $s$ for which $(s - r)$ is minimum.
For symmetrically placed order statistics $x_{(r)}$ and $x_{(s)}$, we select the pair $(r, s)$ such that $r + s = n + 1 \Rightarrow s - 1 = n - r$.
$$\Rightarrow \text{From (6.7)},\quad 1 - \alpha = \sum_{x=r}^{n-r}\binom{n}{x}p^x(1-p)^{n-x}.$$
From this relation one can find $r$, and hence $s = n + 1 - r$.
Note If in (6.7) the exact probability $(1-\alpha)$ is not attained, then we choose the pair of $r$ and $s$ such that
$$\Pr[X_{(r)} \leq \xi_p \leq X_{(s)}] \geq 1 - \alpha, \quad \text{i.e.}\quad I_p(r,\ n-r+1) - I_p(s,\ n-s+1) \geq 1 - \alpha.$$

Non-parametric confidence interval for $\xi_{1/2}$ (= median) using the sign test
The sign test technique can be applied to obtain an interval estimate for the unknown population median $\xi_{1/2}$. Suppose $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ are the order statistics. We consider the testing problem $H_0: \xi_{1/2} = \xi_0$ against $H_1: \xi_{1/2} \neq \xi_0$.
Define $S$ = total number of +ve signs among $(X_i - \xi_0)$, $i = 1(1)n$.
The ordinary sign test is
$$\phi(s) = \begin{cases} 1 & \text{if } s < s_1 \\ a_1 & \text{if } s = s_1 \\ 0 & \text{if } s_1 < s < s_2 \\ a_2 & \text{if } s = s_2 \\ 1 & \text{if } s > s_2 \end{cases}$$

where $s_1$ and $s_2$ are such that
$$\Pr[s < s_1 \mid H_0] < \frac{\alpha}{2} \leq \Pr[s \leq s_1 \mid H_0], \qquad \Pr[s > s_2 \mid H_0] < \frac{\alpha}{2} \leq \Pr[s \geq s_2 \mid H_0]. \tag{6.8}$$
Also, $a_1$ and $a_2$ are such that
$$a_1 = \frac{\frac{\alpha}{2} - \Pr[s < s_1 \mid H_0]}{P[s = s_1 \mid H_0]} \quad \text{and} \quad a_2 = \frac{\frac{\alpha}{2} - \Pr[s > s_2 \mid H_0]}{P[s = s_2 \mid H_0]}.$$
We accept $H_0$ if $s_1 < s < s_2$, and so
$$\Pr[s_1 < S < s_2] = 1 - \alpha, \quad \text{i.e.}\quad \Pr[s_1 + 1 \leq S \leq s_2 - 1] = 1 - \alpha. \tag{6.9}$$
In order to obtain a confidence interval for $\xi_{1/2}$ we need only translate the inequality in the LHS of (6.9) into an equivalent statement involving the order statistics and $\xi_{1/2}$. We have seen earlier that
$$1 - \alpha = \Pr[X_{(r)} \leq \xi_p \leq X_{(s)}] = \sum_{x=r}^{s-1}\binom{n}{x}p^x(1-p)^{n-x}.$$
Now, for $p = \frac{1}{2}$,
$$1 - \alpha = \Pr[X_{(r)} \leq \xi_{1/2} \leq X_{(s)}] = \sum_{x=r}^{s-1}\binom{n}{x}\left(\frac{1}{2}\right)^n = \Pr[r \leq S \leq s-1], \ \text{as } S \sim B\!\left(n, \tfrac{1}{2}\right)\ \text{under } H_0.$$
$$\Rightarrow \Pr[X_{(r)} \leq \xi_{1/2} \leq X_{(s)}] = \Pr[r \leq S \leq s-1] = 1 - \alpha. \tag{6.10}$$
Comparing (6.9) and (6.10), we can write
$$\Pr[X_{(s_1+1)} \leq \xi_{1/2} \leq X_{(s_2)}] = 1 - \alpha.$$
$\Rightarrow$ The $100(1-\alpha)$% C.I. for $\xi_{1/2}$ using the sign test is $(X_{(s_1+1)}, X_{(s_2)}) = (X_{(s_1+1)}, X_{(n-s_1)})$ {since $S$ is symmetric about $n/2$, $\frac{n}{2} - s_1 = s_2 - \frac{n}{2}$}.
For large samples, (6.9) is equivalent to
$$\Pr\left[\frac{s_1 + 1 - n/2}{\sqrt{n/4}} \leq \frac{S - n/2}{\sqrt{n/4}} \leq \frac{s_2 - 1 - n/2}{\sqrt{n/4}}\right] = 1 - \alpha$$
or, with a continuity correction,
$$\Pr\left[\frac{s_1 + 1 - n/2 - 0.5}{\sqrt{n/4}} \leq \tau \leq \frac{s_2 - 1 - n/2 + 0.5}{\sqrt{n/4}}\right] = 1 - \alpha$$
$$\Rightarrow \frac{s_1 + 1 - n/2 - 0.5}{\sqrt{n/4}} = -\tau_{\alpha/2} \quad \text{and} \quad \frac{s_2 - 1 - n/2 + 0.5}{\sqrt{n/4}} = \tau_{\alpha/2},$$
$$\text{i.e.}\quad s_1 = \frac{n}{2} - 0.5 - \sqrt{\frac{n}{4}}\,\tau_{\alpha/2} \quad \text{and} \quad s_2 = \frac{n}{2} + 0.5 + \sqrt{\frac{n}{4}}\,\tau_{\alpha/2}. \tag{6.11}$$

So, the $100(1-\alpha)$% C.I. for $\xi_{1/2}$ using the sign test is $(X_{(s_1+1)}, X_{(s_2)}) = (X_{(s_1+1)}, X_{(n-s_1)})$, where $s_1$ and $s_2$ are given by (6.11).
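The large-sample limits in (6.11) are easy to compute. The Python sketch below is illustrative only (it assumes NumPy and SciPy, and the name median_ci is chosen here); it returns the interval $(X_{(s_1+1)}, X_{(n-s_1)})$. Note that the normal approximation is rough for small samples such as the one used in this illustration.

```python
# Large-sample confidence interval for the median based on the sign test, Eq. (6.11)
import numpy as np
from scipy import stats

def median_ci(x, alpha=0.05):
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    z = stats.norm.ppf(1 - alpha / 2)
    s1 = int(np.floor(n / 2 - 0.5 - z * np.sqrt(n / 4)))
    # the interval is (X_(s1+1), X_(n-s1)); array indices below are 0-based
    return x[s1], x[n - s1 - 1]

sample = [16.4, 19.2, 24.5, 15.4, 17.3, 23.6, 22.7, 20.9, 18.2]   # data of Example 6.3
print(median_ci(sample))
```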

6.7 Combination of Tests

When several tests of the same hypothesis H0 are made on the basis of independent
sets of data, it is quite likely that some of the tests will dictate rejection of the
hypothesis (at the chosen level of significance) while the others will dictate its
acceptance. In such a case, one would naturally like to have a means of combining
the results of the individual tests to reach a firm, overall decision. While one may
well apply the same test to the combined set of data, what we are envisaging is a
situation where only the values of the test statistics used are available.
Let us denote by $T_i$ the statistic used in making the $i$th test (say, for $i = 1, 2, \ldots, k$). Commonly $T_1, T_2, \ldots, T_k$ will be statistics defined in the same way (like $\chi^2$ statistics or $t$-statistics), but with varying sampling distributions simply because they are based on varying sample sizes. To fix ideas, let us assume that in each case the test requires that $H_0$ be rejected if, and only if, the observed value of the corresponding statistic is too large. Consider, in this situation, the probabilities $y_i = \Pr[T_i > t_i \mid H_0]$, for $i = 1, 2, \ldots, k$.
Provided $T_i$ has a continuous distribution under $H_0$, say with probability density function $g_i(t)$, so that $y_i = \int_{t_i}^{\infty} g_i(t)\,dt$, where $t_i$ is the observed value of $T_i$, $y_i$ has the rectangular distribution over the interval $[0, 1]$ under $H_0$, and hence $-2\log_e y_i$ has the $\chi^2$ distribution with d.f. = 2. Consequently,
$$P_k = -2\sum_{i=1}^{k}\log_e y_i$$
has, under $H_0$, the $\chi^2$ distribution with $2k$ degrees of freedom. This statistic is used as the test statistic for making the combined test. One would reject $H_0$ if, and only if, the observed value of $P_k$ exceeds $\chi^2_{\alpha,2k}$.

The case where each individual test requires rejection of $H_0$ if, and only if, the observed value of the corresponding test statistic is too small, or the case where each individual test requires rejection of $H_0$ if, and only if, the observed value of the test statistic is either too large or too small, is to be similarly dealt with. The reason is that, if the $T_i$ have continuous distributions under $H_0$, then $u_i = \Pr[T_i < t_i \mid H_0]$ and $v_i = \Pr[|T_i| > |t_i| \mid H_0]$ are also rectangularly distributed over $(0, 1)$. This implies that the statistics $P_k$ appropriate to these situations, viz.
$$P_k = -2\sum_{i=1}^{k}\log_e u_i \quad \text{and} \quad P_k = -2\sum_{i=1}^{k}\log_e v_i,$$
are also distributed as $\chi^2$ with d.f. = $2k$ under $H_0$. In each of these cases also, the overall decision will be to reject $H_0$ if, and only if, the observed value of the respective $P_k$ exceeds $\chi^2_{\alpha,2k}$.
Example 6.9 In order to test whether the mean height ðlÞ of a variety of paddy
plants, when fully grown, is 60 cm, or less than 60 cm, five experimenters made
independent (student’s) t-tests with their respective data. The probabilities of the t-
statistics (with the appropriate df in each case) to be less than their respective
observed values are 0.023, 0.061, 0.07, 0.105 and 0.007. If the tests are made at 5 %
level, then the hypothesis H0 : l ¼ 60 cm, has to be accepted in three cases out of
the five.
In order to combine the results of the 5 tests, we note that $\log_{10} u_i$, for $i = 1, 2, 3, 4$ and 5, are $-1.63827$, $-1.21467$, $-1.76955$, $-0.97881$ and $-2.15490$, respectively. Hence, for the data, $\sum\log_{10} u_i = -10 + 2.24380 = -7.75620$, so that $P_k = -2\sum\log_e u_i = 2.30259\times 15.5124 = 35.719$.
This is to be compared with $\chi^2_{0.05,10} = 18.307$ and $\chi^2_{0.01,10} = 23.205$. Since the observed value of $P_k$ exceeds the tabulated values, the combined result of the experimenters' tests leads to the rejection of $H_0$ at both the 5 % and the 1 % levels. In other words, in the light of all 5 experimenters' data, we may conclude that the mean height of the variety of paddy plant is less than 60 cm.
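Fisher's method of combining probabilities, as described above, takes only a few lines to compute. The Python sketch below is illustrative only and assumes SciPy; it combines the five probabilities quoted in Example 6.9. The resulting statistic again far exceeds the tabulated chi-square values, so $H_0$ is rejected, in agreement with the conclusion of the text.

```python
# Combining independent one-sided tests as in Example 6.9 (Fisher's method)
import numpy as np
from scipy import stats

u = np.array([0.023, 0.061, 0.07, 0.105, 0.007])   # P[T_i < t_i | H0] for each test
P_k = -2 * np.sum(np.log(u))                       # chi-square with 2k d.f. under H0
k = len(u)
print(P_k, stats.chi2.sf(P_k, df=2 * k))           # combined statistic and its p-value
print(P_k > stats.chi2.ppf(0.99, df=2 * k))        # True: reject H0 even at the 1 % level
```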

6.8 Measures of Association for Bivariate Samples

A. Spearman’s rank correlation coefficient


In many situations, the individuals are ranked by two judges or the measurements
taken for two variables are assigned ranks within the samples independently. Now it
is desired to know the extent of association between the ranks. The method of
calculating the association between ranks was given by Charles Edward Spearman
in 1906 and is known as Spearman’s rank correlation.
Let ðX1 ; Y1 Þ; ðX2 ; Y2 Þ; . . .; ðXn ; Yn Þ: be a sample from a bivariate population. If
the sample values X1 ; X2 ; . . .; Xn and Y1 ; Y2 ; . . .; Yn are each ranked from 1 to n in

increasing order of magnitude separately and if the X’s and Y’s have continuous
distribution functions, we get a unique set of rankings. The data will then reduce to
n pairs of rankings. Let us write
$$R_{1a} = \text{rank of } X_a, \quad R_{2a} = \text{rank of } Y_a, \qquad a = 1, 2, \ldots, n.$$
The Pearsonian coefficient of correlation between the ranks $R_{1a}$ and $R_{2a}$ is called Spearman's rank correlation coefficient $r_s$, which is given by
$$r_s = \frac{\sum_{a=1}^{n}(R_{1a} - \bar{R}_1)(R_{2a} - \bar{R}_2)}{\left\{\sum_{a=1}^{n}(R_{1a} - \bar{R}_1)^2\sum_{a=1}^{n}(R_{2a} - \bar{R}_2)^2\right\}^{1/2}} = \frac{12\sum_{a=1}^{n}\left(R_{1a} - \frac{n+1}{2}\right)\left(R_{2a} - \frac{n+1}{2}\right)}{n(n^2-1)}.$$
If, for the $n$ individuals, $D_a = R_{1a} - R_{2a}$ is the difference between the ranks of the $a$th individual, $a = 1, 2, \ldots, n$, the formula for Spearman's rank correlation reduces to
$$r_s = 1 - \frac{6\sum_{a=1}^{n} D_a^2}{n(n^2-1)}.$$

The value of $r_s$ lies between $-1$ and $+1$. If $X$, $Y$ are independent then $E(r_s) = 0$. Also, if the population Spearman rank correlation coefficient $\rho_s = 0$, then $E(r_s) = 0$. Kendall in 1962 derived the frequency function of $r_s$ and gave exact critical values of $r_s$, but the approximate test for $r_s$, which is the same as the $t$-test for the Pearsonian correlation coefficient, is good enough for all practical purposes. Here we test $H_0: \rho_s = 0$ against $H_1: \rho_s \neq 0$. The test statistic
$$t = \frac{r_s\sqrt{n-2}}{\sqrt{1-r_s^2}}$$
has $(n-2)$ d.f. The decision about $H_0$ is taken in the usual way. For large samples under $H_0$, the random variable $Z = r_s\sqrt{n-1}$ has approximately a standard normal distribution. The approximation is good for $n \geq 10$.
B. Kendall’s rank correlation coefficient
Kendall’s rank correlation coefficient τ is suitable for the paired ranks as in case of
Spearman’s rank correlation. Let ðX1 ; Y1 Þ; ðX2 ; Y2 Þ; . . .; ðXn ; Yn Þ be a sample from a
bivariate population.  
For any two pairs ðXi ; Yi Þ and Xj ; Yj we say that the relation is perfect con-
cordance if

Xi \Xj whenever Yi \Yj or Xi [ Xj whenever Yi [ Yj and that the relation is


perfect discordance if Xi [ Xj whenever Yi \Yj or Xi \Xj whenever Yi [ Yj .
Let pc and pd be the probability of perfect concordance and of perfect discor-
dance respectively defined by
 
pc ¼P ðXj  Xi ÞðYj  Yi Þ [ 0
 
and pd ¼P ðXj  Xi ÞðYj  Yi Þ\0 :

The measure of association between the random variables X and Y defined by

s ¼ pc  pd

is known as Kendall’s tau ðsÞ


It is noted that
s ¼ 0 if X and Y are independent.
= +1 if X and Y be in prefect concordance.
= −1 if X and Y be in prefect discordance.
We now need to find an estimate of s from the sample.
Using sample observations, Kendall’s measure of association becomes

1 X
n X
T¼ ! sðxj  xi Þsðyj  yi Þ ð6:12Þ
n 1  i\j  n
2

where sðrÞ ¼ 1 if r [ 0
¼ 0 if r ¼ 0
¼  1 if r\0
  
Naturally E s xj  xi sðyj  yi Þ ¼ pc  pd ¼ s
The statistic T defined in (6.12) is known as Kendall’s sample tau ðsÞ coefficient.
The procedure for calculating T consists of the following steps:
Step 1: Arrange the rank of the first set (X) in ascending order and rearrange the
ranks of the second set (Y) in such a way that n pairs of rank remain the same.
Step 2: After operating Step 1, the ranks of X are in natural order. Now we are left
to determine how many pairs of ranks on the set Y are in their natural order and how
many are not. A number is said to be in natural order if it is smaller than the
succeeding number and is coded as +1 and also if it is greater than its succeeding
 it will not be taken in natural order and will be coded as −1. In this
number then
n
way all pairs of the set (Y) will be considered and assigned the values +1
2
and −1.

Step 3: Find the sum $S$ of all the coded values.
Step 4: The formula for Kendall's rank correlation coefficient $T$ is
$$T = \frac{S}{\binom{n}{2}} = \frac{\text{actual value}}{\text{maximum possible value}} = \frac{2S}{n(n-1)}.$$
Here we test $H_0: \tau = 0$ against $H_1: \tau \neq 0$. Thus we reject $H_0$ if the observed value of $|T| > t_{\alpha/2}$, where $P\left[|T| > t_{\alpha/2} \mid H_0\right] = \alpha$. The values of $t_\alpha$ are given in the table for selected values of $n$ and $\alpha$; values for $4 \leq n \leq 10$ are tabulated by Kendall.
It can be shown that $E(T) = \tau$ and that, under $H_0: \tau = 0$, $V(T) = \frac{2(2n+5)}{9n(n-1)}$. Hence, if $n \to \infty$, under $H_0: \tau = 0$,
$$\frac{3\sqrt{n}}{2}\,T \sim N(0, 1)\ \text{approximately},$$
and we can test the independence of $x$ and $y$.

Remark An important difference between T and rs is that T provides an unbiased


estimate of s, whereas rs is not an unbiased estimate of qs :
Example 6.10 Following are the ranks awarded to seven debators in a competition
by two judges.
Debators A B C D E F G
Ranks by judge I (x) 3 2 1 6 7 4 5
Ranks by judge II (y) 5 6 3 7 4 2 1

Compute (i) Spearmen’s rank correlation coefficient ðrs Þ and Kendall’s sample
tau coefficient (T) and test their significance.
Solution (i) First we find $d_i = x_i - y_i$ for all $i$, which are
$$d:\ -2,\ -4,\ -2,\ -1,\ 3,\ 2,\ 4.$$
Also, $\sum_{i=1}^{7} d_i^2 = 54$; thus
$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2-1)} = 1 - \frac{6\times 54}{7\times 48} = 0.036.$$
To test $H_0: \rho_s = 0$ against $H_1: \rho_s \neq 0$, the statistic is
$$t_5 = \frac{r_s\sqrt{n-2}}{\sqrt{1-r_s^2}} = \frac{0.036\sqrt{7-2}}{\sqrt{1-(0.036)^2}} = 0.080.$$
From the table, $t_{0.025,5} = 2.571$. The calculated value $|t| = 0.080 < 2.571$, hence we accept $H_0$. It means there is no significant association between the ranks awarded by the two judges.

(ii) We write below ranks of x in natural order and ranks of y correspondingly

x 1 2 3 4 5 6 7
y 3 6 5 2 1 7 4

For this problem, n = 7


For S, take the rank 3 and give +1 or −1 value for all pairs with subsequent ranks
of y. 3 < 6, give a number +1; 3 < 5, again +1; 3 > 2, give a number −1 and so on.
Then choose 6 and take the pairs (6,5), (6,2), (6,1), (6,7) and (6,4) and continue the
process till we reach the last pair (7,4). Proceeding in this manner,
S = (+1 +1 −1 −1 +1 +1) + (−1 −1 −1 +1 −1) + (−1 −1 +1 −1) + (−1 +1 +1) + (+1 +1) + (−1)
= 2 − 3 − 2 + 1 + 2 − 1 = −1.
Thus $T = \dfrac{S}{\binom{n}{2}} = \dfrac{-1}{21} = -0.048$.
To test the significance of T, we test
H0 : s ¼ 0 against H1 : s 6¼ 0.
From the table, for n = 7 we have t0.025 = 0.62. Since |T| = 0.048 < 0.62, we
accept H0 . It reveals that there is no association between the ranks awarded by two
judges.
Example 6.11 A random sample of 12 couples showed the following distribution of
heights (in inches):
Couple no. 1 2 3 4 5 6 7 8 9 10 11 12
Husband 80 70 73 72 62 65 74 71 63 64 68 67
height
Wife height 72 60 76 62 63 46 68 71 61 65 66 67

(a) Compute rs and T.


(b) Test the hypothesis that the heights of husband and wife are independent using
rs as well as T. In each case use the normal approximation.

Solution (a) The heights of husband and wife are each ranked from 1 to 12 in
increasing order of magnitude separately and let us denote their ranks by xi and yi
respectively (i = 1,2, …, 12).

xi : 12 7 10 9 1 4 11 8 2 3 6 5
yi : 11 2 12 4 5 1 9 10 3 6 7 8
di ¼ x i  y i :1 5 2 5 4 3 2 2 1 3 1 3
X
di2 ¼ 108:
Thus
$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2-1)} = 1 - \frac{6\times 108}{12\times 143} = 0.6224.$$
We write below the ranks of x in natural order and ranks of y correspondingly

xi : 1 2 3 4 5 6 7 8 9 10 11 12
yi : 5 3 6 1 8 7 2 10 4 12 9 11

Total number of scores $= \binom{n}{2} = \frac{12\times 11}{2} = 66$.
Actual score $= S = 3 + 6 + 3 + 8 + 1 + 2 + 5 + 0 + 3 - 2 + 1 = 30$ (the procedure for calculating $S$ is explained in Example 6.10 (ii)).
Thus, $T = \frac{30}{66} = 0.4545$.
(b) To test $H_0: \rho_s = 0$ against $H_1: \rho_s \neq 0$, the approximate test statistic is
$$Z = r_s\sqrt{n-1} = 0.6224\sqrt{11} = 2.06 \sim N(0, 1).$$
Since the calculated $|Z| = 2.06 > Z_{0.025} = 1.96$, we reject $H_0$. It means that the heights of husband and wife are not independent.
To test $H_0: \tau = 0$ against $H_1: \tau \neq 0$, the approximate test statistic is
$$Z = \frac{3}{2}\sqrt{n}\,T = \frac{3}{2}\sqrt{12}\times 0.4545 = 2.36 \sim N(0, 1).$$

Since Cal |Z|, i.e., 2.36 > Z0.025 = 1.96, hence we reject H0 . Hence we can
conclude that there is an association between the heights of husband and wife.
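Both coefficients and their tests can also be obtained directly. The Python sketch below is illustrative only and assumes SciPy; it computes $r_s$ and $T$ for the couple data of Example 6.11.

```python
# Spearman's r_s and Kendall's T for the couple data of Example 6.11
from scipy import stats

husband = [80, 70, 73, 72, 62, 65, 74, 71, 63, 64, 68, 67]
wife    = [72, 60, 76, 62, 63, 46, 68, 71, 61, 65, 66, 67]

rho, p_rho = stats.spearmanr(husband, wife)
tau, p_tau = stats.kendalltau(husband, wife)
print(rho, p_rho)   # about 0.62, significant at the 5 % level
print(tau, p_tau)   # about 0.45, also significant at the 5 % level
```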
Chapter 7
Statistical Decision Theory

7.1 Introduction

In this chapter we discuss the problems of point estimation, hypothesis testing and
interval estimation of a parameter from a different standpoint.
Before we start the discussion, let us first define certain terms commonly used in
statistical inference problems and decision theory. Let $X_1, X_2, \ldots, X_n$ denote a ran-
dom sample of size n from a distribution that has the p.d.f. f(x, θ), where θ is an
unknown state of nature or an unknown parameter and H is the set of all possible
values of θ, i.e. parameter space (known).
To make some inference about θ, i.e. to take some decisions or action about θ,
the statistician takes an action on the basis of the sample point ðx1 ; x2 ; . . .; xn Þ.
Let us define
$$\mathcal{A} = \text{the set of all possible actions for the statistician (action space)}.$$
The statistician has to choose an action $a$ from $\mathcal{A}$. So, $\theta$ = the true state of nature and $a$ = the action taken by the statistician.
The value $L(\theta, a)$ is the loss incurred by taking action $a$ when $\theta$ is true. Equivalently, it is a measure of the degree of undesirability of choosing action $a$ when $\theta$ is true, and this gives a preference pattern over $\mathcal{A}$ for given $\theta$, i.e. the smaller the loss the better the action under $\theta$. $L(\theta, a)$ is a real-valued function on $\Theta \times \mathcal{A}$, the loss function. Thus $(\Theta, \mathcal{A}, L)$ is the basic element in our discussion.
Example 7.1 Let $\theta$ = the average life length of the electric bulbs produced in a factory, and $\Theta = (0, \infty)$.
Point estimation of θ
To estimate the value of $\theta$ is to choose one value from $(0, \infty)$; so $\mathcal{A} = (0, \infty)$. Observe the life lengths of some randomly selected bulbs.
Define $L(\theta, a) = (\theta - a)^2$ = squared error loss function


(or) = jh  aj = absolute error loss function


(or) = wðhÞðh  aÞ2 = weighted squared error loss function where
wðhÞ = a known function of θ.
Desired nature of (L, θ) graph should be a convex function with minimum at θ
and increasing in jh  aj.

L(θ,a)

Testing of hypothesis about θ
To test $H_0: \theta \leq \theta_0$ (a given value of $\theta$) against $H_1: \theta > \theta_0$:
$\mathcal{A} = \{a_0, a_1\}$, where $a_0$ = accept $H_0$ and $a_1$ = accept $H_1$. Here, the simple (0–1) loss function is
$$L(\theta, a_0) = 0,\ L(\theta, a_1) = 1 \ \text{for } \theta \leq \theta_0; \qquad L(\theta, a_0) = 1,\ L(\theta, a_1) = 0 \ \text{for } \theta > \theta_0.$$
Or, an assigned-value loss function:
$$L(\theta, a_0) = l_{00},\ L(\theta, a_1) = l_{01} \ \text{for } \theta \leq \theta_0; \qquad L(\theta, a_0) = l_{10},\ L(\theta, a_1) = l_{11} \ \text{for } \theta > \theta_0,$$
with $l_{00} < l_{01}$ and $l_{11} < l_{10}$.
Or, a $(0\text{–}w)$ type loss function:
$$L(\theta, a_0) = 0,\ L(\theta, a_1) = w_1(\theta) \ \text{for } \theta \leq \theta_0; \qquad L(\theta, a_0) = w_2(\theta),\ L(\theta, a_1) = 0 \ \text{for } \theta > \theta_0,$$
where $w_1(\theta)$ is increasing in $\theta_0 - \theta$ and $w_2(\theta)$ is increasing in $\theta - \theta_0$.

Interval estimation
Here, we are to choose one interval from $(0, \infty)$. So
$$\mathcal{A} = \text{the set of all possible intervals of } (0, \infty) = \{a = (a_1, a_2)\}.$$
$$L(\theta, a) = \begin{cases} 1 & \text{if } \theta \notin a \\ 0 & \text{if } \theta \in a \end{cases}$$
or, possibly, $L(\theta, a) = a_2 - a_1$ = the length of the interval.

Let $\mathcal{E}$ = the random experiment performed, $X$ = the random outcome of the experiment (a random variable or vector), $x$ = the observed value of $X$, and $\mathcal{X}$ = the sample space.
The probability distribution of $X$ depends on $\theta$, say
$$P_\theta: P_\theta[X \in A], \quad \text{or } F_\theta(x) = P_\theta[X \leq x], \quad \text{or } f_\theta(x) = \text{the p.d.f. or p.m.f. of } X.$$
The statistician observes the value $x$ of $X$ to take his decision. If $X = x$ is observed, the statistician takes an action $d(x) \in \mathcal{A}$, $d(x): \mathcal{X} \to \mathcal{A}$, where $d(x)$ = a decision rule in its simplest form = a non-randomized decision rule.
If $d(x)$ is the action taken, the loss incurred under $\theta$ is $L(\theta, d(x))$. If $d(x)$ is a decision rule, then the loss incurred under $\theta$, $L(\theta, d(X))$, is a random quantity, i.e. a real-valued random variable. The expected loss under $\theta$ is
$$E_\theta L(\theta, d(X)) = R_d(\theta) = \text{the risk of } d(x) \text{ under } \theta.$$
$\Rightarrow R_d(\theta)$, $\theta \in \Theta$, is the risk function of $d(x)$.
Let us restrict attention to rules $d(x)$ for which $R_d(\theta) < \infty\ \forall\theta$ and let $D$ = the set of all such $d(x)$'s. $R_d(\theta)$ gives a preference pattern on $D$ for given $\theta$: the smaller the risk, the better the decision rule $d(x)$.
Thus, $(\Theta, \mathcal{A}, L) \to (\Theta, D, R)$.
Example 7.2 Point estimation of a real θ: ɶ = Θ.
d(x) : 𝔛 → ɶ (= Θ); d(x) = point estimator of θ.
For squared error loss, R_d(θ) = E_θ(d(x) − θ)² = MSE of d(x) under θ.
Example 7.3  ¼ fa0 ; a1 g; ai ¼ accept Hi , i = 0, 1,
 
d(x) : 𝔛 → {a0, a1}

𝔛0 = {x : d(x) = a0} = acceptance region
𝔛1 = {x : d(x) = a1} = rejection region

⇒ d(x) = a0 if x ∈ 𝔛0 and d(x) = a1 if x ∈ 𝔛1,

where 𝔛0 and 𝔛1 are disjoint and 𝔛0 + 𝔛1 = 𝔛.

For (0–1) loss, if θ ∈ Θ0, R_d(θ) = P_θ{d(X) = a1} = P_θ{X ∈ 𝔛1} = probability of error of the first kind.

If θ ∈ Θ1, R_d(θ) = P_θ{d(X) = a0} = P_θ{X ∈ 𝔛0} = 1 − P_θ{X ∈ 𝔛1} = probability of error of the second kind.
Interval estimation of real θ
ɶ = set of all possible intervals of H.

d ð xÞ : 
x!

d ð xÞ ¼ ðd1 ð xÞ; d2 ð xÞÞ



1 if h 62 a
Lðh; aÞ ¼
0 if h 2 a

Then Rd ðhÞ ¼ Ph fh 62 d ð xÞg ¼ 1  Ph fh 2 d ð xÞg


If Lðh; aÞ ¼ a2  a1
then Rd ðhÞ ¼ Eh ½d2 ð xÞ  d1 ð xÞ = Expected length of d(x).
Thus, (Θ, ɶ, L) = Basic element of a statistical decision problem.
X = observable random variable; for each x, d ð xÞ 2 , i.e. d : 
x!
d(x) = a non-randomized decision rule.

Rd ðhÞ ¼ Eh ðLðh; d ð xÞÞÞ ¼ Risk of d ðxÞ

D = the set of all non-randomized decision rules (with finite risks 8h)

Randomized Decision Rules


Randomized action
Example 7.4 Let Θ = {θ1, θ2}, ɶ = {a1, a2, a3} and the loss function be

             a1   a2   a3
      θ1      1    4    3
      θ2      4    1    3

Neither a1 nor a2 is better than a3 for every value of θ. Now define an action a*:

      a* = a1 with probability 1/2
         = a2 with probability 1/2

The expected loss for a* is

      L(θ1, a*) = (1/2) L(θ1, a1) + (1/2) L(θ1, a2) = 2.5
      L(θ2, a*) = (1/2) L(θ2, a1) + (1/2) L(θ2, a2) = 2.5

Thus a is to be preferred to a3 both under h1 and h2 . Such an a is called


randomized action.
Generally, by randomized action a we mean actually a probability distribution
over ɶ and loss due to a randomized action a is
Lðh; a Þ ¼ ELðh; zÞ where z is a random variable with probability distribution a
over ɶ.

Advantages of considering randomized actions


1. Extends the class of actions, i.e. allows more flexibility for the statistician.
2. The set of all randomized actions is convex, i.e. if a1 ; a2 are two randomized
actions, then for every 0  a  1; aa1 þ ð1  aÞa2 is also a randomized action
     
with L h; aa1 þ ð1  aÞa2 ¼ aL h; a1 þ ð1  aÞL h; a2 .
We shall consider only randomized actions a for which Lðh; a Þ is finite 8h and
shall denote by ɶ* the set of all such randomized actions.
Note Clearly ɶ  ɶ* because a non-randomized action ‘a’ ≡ A probability dis-
tribution over ɶ degenerate at the point ‘a’.

First definition of randomized decision rule


Let X = observable random variable
x = observed value of X
For each x, let dð xÞ 2 ɶ*, i.e. d : 
x ! ɶ*
d ¼ dð xÞ = a (behavioural) randomized decision rule.
Rd ðhÞ ¼ Risk of d at h ¼ Eh Lðh; dð xÞÞ
We shall consider only behavioural rules d for which Rd ðhÞ is finite 8h and shall
denote by D as the class of all such behavioural rules. Clearly, D  D.
Example 7.5 Test of hypothesis problem H0 : h 2 H0 against H1 : h 2 H1 .

 ¼ fa0 ; a1 g; ai ¼ accept H i ; i ¼ 0; 1; . . .

A typical randomized action a ¼ /


where / = probability of accepting H 1
1  / = probability of accepting H 0 ; 0  /  1.

A typical behavioural decision rule: d ¼ dðxÞ ¼ /ðxÞ


where /ðxÞ ¼ probability of accepting H 1 for X ¼ x

1  /ðxÞ ¼ probability of accepting H 0 for X ¼ x

0  /ðxÞ  1. For 0–1 loss, Lðh; a Þ ¼ /  1 þ ð1  /Þ  0 ¼ / for h 2 H0

Lðh; a Þ ¼ /  0 þ ð1  /Þ  1 ¼ 1  / for h 2 H1

R/ ðhÞ ¼ Eh /ð xÞ for h 2 H0
¼ Eh ½1  /ð xÞ for h 2 H1

Second definition of randomized decision rule


Let X = observable random variable; x = observed value of X.
D = the set of all non-randomized decision rules.
d ¼ A probability distribution over D
¼ A randomized ðmixedÞ decision rule with Rd ðhÞ ¼ ERz ðhÞ ¼ Risk of d at h where
Z ¼ A random variable with probability distribution d over D:

Example 7.6  ¼ fa1 ; a2 g 


x ¼ fx 1 ; x 2 g
D ¼ fd 1 ; d 2 ; d 3 ; d 4 g

d 1 : d 1 ð x 1 Þ ¼ a1 ; d 1 ð x 2 Þ ¼ a1
d 2 : d 2 ð x 1 Þ ¼ a2 ; d 2 ð x 2 Þ ¼ a2
d 3 : d 3 ð x 1 Þ ¼ a1 ; d 3 ð x 2 Þ ¼ a2
d 4 : d 4 ð x 1 Þ ¼ a2 ; d 4 ð x 2 Þ ¼ a1

A typical mixed decision rule is d* = (p1, p2, p3, p4), pi ≥ 0 ∀ i = 1(1)4, Σᵢ₌₁⁴ pi = 1, where

      pi = probability of choosing the non-randomized rule di, and

      R_{d*}(θ) = Σᵢ₌₁⁴ pi R_{di}(θ).

We shall consider only mixed rules d for which Rd ðhÞ is finite 8h and shall
denote by D as the class of all such mixed decision rules. Clearly, D  D since a
non-randomized rule d = a probability distribution over D degenerate at d.
First mode of randomization:
X
ðH; ; LÞ ! ðH;  ; LÞ !ðH; D; RÞ

Second mode of randomization:


X
ðH; ; LÞ !ðH; D; LÞ ! ðH; D ; RÞ

Note The two modes of randomization can be considered to be equivalent in the


sense that given any d 2 D one can find a d 2 D with Rd ðhÞ ¼ Rd ðhÞ 8h and
conversely.
Example 7.7  ¼ fa1 ; a2 g,  x ¼ fx 1 ; x 2 g
D ¼ fd 1 ; d 2 ; d 3 ; d 4 g as defined earlier.
P
4
A typical d 2 D is d ¼ ðp1 ; p2 ; p3 ; p4 Þ; pi 0 for i = 1(1)4, pi ¼ 1, where
1
pi ¼ probability of choosing d i .
n . X o
D ¼ ðp1 ; p2 ; p3 ; p4 Þ pi 08i; pi ¼ 1

A typical d 2 D is d ¼ ð/1 ; /2 Þ; 0  /1 ; /2  1, where /i ¼ /ðxi Þ = probabil-


ity of taking action a1 if X ¼ xi ;

1  /i ¼ probability of taking action a2 if X ¼ xi :

D ¼ fð/1 ; /2 Þ=0  /1 ; /2  1g

If one chooses a d 2 D ,

a1 is chosen with probability p1 þ p3 for X ¼ x1
a2 is chosen with probability 1  ðp1 þ p3 Þ ¼ p2 þ p4

a1 is chosen with probability p1 þ p4 for X ¼ x2
a2 is chosen with probability 1  ðp1 þ p4 Þ ¼ p2 þ p3

Thus, d can be considered to be equivalent to a d 2 D with /1 ¼ p1 þ p3 ,


/2 ¼ p1 þ p4 .
Similarly, a d 2 D can be considered to be equivalent to a d ¼ D with
p1 þ p3 ¼ /1 , p1 þ p4 ¼ /2 .

Advantages of considering randomized rules


1. Extends the class of decision rules, i.e. allows more flexibility to the statistician
2. The set of all randomized rules is convex, i.e. if d1 ; d2 2 D (or D ) then
ad1 þ ð1  aÞd2 2 D (or D ).
For every 0  a  1 and Rad1 þ ð1aÞd2 ðhÞ ¼ aRd1 ðhÞ þ ð1  aÞRd2 ðhÞ8h.
Thus, h 2 H; a 2 ; Lðh; aÞ; ðH; ; LÞ
X = observable random variable
P ¼ fPh =h 2 Hg ¼ family of probability distribution of X

d(x) = a non-randomized decision rule


D = the class of all non-randomized decision rules
dðX Þ = a behavioural or randomized decision rule
D = the class of all behavioural rules
D = the class of all randomized rules
D and D are equivalent classes.
We shall hereafter denote both D and D as D.
Note D
D
Let d 2 D, Rd ðhÞ = risk function of d; h 2 H.
Goodness of a d is measured by risk function.

A natural ordering of decision rules


Let d1 ; d2 2 D
1. d1 is said to be equivalent to d2 ðd1  d2 Þ if Rd1 ðhÞ ¼ Rd2 ðhÞ 8h 2 H
2. d1 is at least as good as d2 ðd1 d2 Þ if Rd1 ðhÞ  Rd2 ðhÞ 8h 2 H
3. d1 is said to be better than d2 ðd1 [ d2 Þ if Rd1 ðhÞ  Rd2 ðhÞ 8h 2 H with strict
inequality for at least one θ.

Note
1. d1 d2 ) either d1 [ d2 , or d1  d2
d1 [ d2 ) d1 d2
2. d1 [ d2 ; d2 [ d3 ) d1 [ d3 , similarly for case
3. It may so happen that neither d1 [ ðor Þd2 nor d2 [ ðor Þd1 . In such case d1
and d2 are non-comparable. Thus [ ðor Þ gives a partial ordering of rules
2D

Example 7.8 X N ðh; 1Þ


To estimate h; H ¼  ¼ ð1; 1Þ:
Lðh; aÞ ¼ ðh  aÞ2 ¼ squared error loss. For any real constant C, let d c ðX Þ ¼
CX ¼ A non-randomized rule (Fig. 7.1).

R_{d_c}(θ) = E_θ[CX − θ]² = E_θ[C(X − θ) − θ(1 − C)]²
           = C² E_θ(X − θ)² + θ²(1 − C)² − 2C(1 − C)θ E_θ(X − θ)
           = C² + θ²(1 − C)²

Fig. 7.1 Risk functions of the rules d_c plotted against θ: the constant line R_{d_1}(θ), a curve R_{d_c}(θ) for C > 1, and the parabola R_{d_{1/2}}(θ)

For C = 1, R_{d_1}(θ) = 1 ∀θ.

For C > 1, R_{d_c}(θ) > 1 = R_{d_1}(θ) ∀θ ⇒ d_1 > d_c.

If C = 1/2, R_{d_{1/2}}(θ) = 1/4 + θ²/4.

Here neither d_1 > d_{1/2} nor d_{1/2} > d_1; hence d_1 and d_{1/2} are non-comparable.
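This comparison is easy to confirm numerically. A small sketch, assuming the setup of Example 7.8 (X ~ N(θ, 1), d_c(X) = CX, squared error loss) and the closed-form risk R_{d_c}(θ) = C² + θ²(1 − C)² derived above:

import numpy as np

def risk(C, theta):
    # risk of d_c(X) = C*X under squared error loss when X ~ N(theta, 1)
    return C**2 + theta**2 * (1 - C)**2

thetas = np.linspace(-3, 3, 13)
r1    = risk(1.0, thetas)       # constant risk 1 of d_1(X) = X
r15   = risk(1.5, thetas)       # a rule with C > 1
rhalf = risk(0.5, thetas)       # d_{1/2}: 1/4 + theta^2/4

print(np.all(r15 > r1))                          # True: d_1 is better than d_1.5 everywhere
print(np.any(rhalf < r1), np.any(rhalf > r1))    # True True: d_1 and d_{1/2} are non-comparable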

Admissibility of Decision Rules


Definition A d 2 D is said to be an admissible decision rule if there does not exist
any d0 2 D such that d0 [ d. Otherwise d is said to be inadmissible, i.e. d is said to
be an inadmissible rule if there exists a d0 2 D such that d0 [ d.
In the above example, for any C > 1, d c is inadmissible as d 1 [ d c .
Note Admissibility is the minimum requirement for any reasonably good decision
rule though the criterion is of negative nature.

7.2 Complete and Minimal Complete Class of Decision


Rules

Definition Let C(⊂ D) be a class of decision rules. C is said to be a complete class
of decision rules if, given any d ∉ C, there exists a d′ ∈ C such that d′ > d
(Fig. 7.2).
C is said to be minimal complete if

(i) C is complete and


(ii) No proper sub-class of C is complete.
Significance
If a complete class of C is available one can restrict to this class only for finding a
reasonable decision rule and thus reduce the problem.
A minimal complete class, if exists, provides maximal reduction to this extent.
Note A minimal complete class does not necessarily exist.

Some relationship between a complete (or a minimal complete) class and the
class of all admissible rules
Let A = the class of all admissible rules.
Result 1 For any complete class C, A  C, i.e. any complete class C contains all
admissible rules.

Proof Let d 2 A. If possible let d 62 C. So there exists a d0 2 C such that d0 [ d )


d is inadmissible, which is a contradiction as we have assumed d is an admissible
rule. So d 2 A ) d 2 C, i.e. A  C. h
Result 2 If A is complete, then A is minimal complete.
Proof Assume A is complete. Result 1 ) No proper sub-class of A can be com-
plete. Hence A is minimal complete. h
Result 3 If a minimal complete class C exists, then C  A.
Proof Let C be a minimal complete class. Then C is complete. By Result 1, A  C.
So it is enough to prove that C  A. Suppose this is not true. Then there exists a d0
such that d0 2 C but

d0 62 A ð7:1Þ

This will imply that there exists a d1 2 C such that

d1 [ d0 ð7:2Þ

(Since d0 62 A, i.e. d0 is inadmissible. Hence, there exists a d such that d [ d0 . If


d 2 C, take d ¼ d1 . If d 62 C, there exists a d1 2 C such that d1 [ d [ d0 . Thus, in
all cases there exists a d1 2 C such that d1 [ d0 ).
Let us define C 1 ¼ C  fd0 g.

Let us define C 1 ¼ C  fd0 g: Then it follows that C1 is also complete ð7:3Þ

(Let d 62 C1 h
Case 1 d ¼ d0 . By (7.2), there exists a d1 2 C and hence d1 2 C 1 such that
d1 [ d0 :
Case 2 d 6¼ d0 . Then d 62 C, so there exists a d0 2 C such that d0 [ d:
A: d0 ¼ d0 . By (7.2), there exists a d1 2 C and hence 2 C1 such that
d1 [ d0 [ d.
B: d0 6¼ d0 . d0 2 C 1 . Hence, there exists a d0 2 C 1 such that d0 [ d:
Thus, given any d 62 C 1 in all cases there will exist a d0 2 C1 such that d0 [ d )
C 1 is complete)
Now (7.3) contradicts that C is minimal complete and hence (7.1) must be false
) C  A: So C  A.

Result 2 + Result 3 gives us ) A minimal complete class exists iff A is complete


and in this case C  A.
Corollary 1 A minimal complete class, if it exists, is unique.
Proof Let C = a minimal complete class. C  A which is unique. h

Corollary 2 Let C be a minimal complete class and let d 2 C. Then if d0 d, d0


also 2 C.
Proof C  A, d 2 C , d 2 A. d0 d ) d0 also 2 A and hence 2 C. h
Corollary 3 If d is admissible and d0 d then d0 is also admissible.

Essential complete class and minimal essential complete class


Definition Let CðDÞ be a class of decision rules. Then C is said to be an essential
complete class if given any d 62 C there exists a d0 2 C such that d0 d.

C is said to be minimal essential complete class if


(i) C is essential complete; and (ii) No proper sub-class of C is essential complete.

Note A complete class C is also essential complete since d0 [ d ) d0 d.


Result 1 Let A = the class of all admissible rules and C = an essential complete
class.
If d 2 A but 62 C, then there exists a d0 d such that d0 2 C (and hence 2 A).
Proof Let d 2 A but 62 C. Then there exists a d0 2 C such that d0 d. But as d 2 A,
it is impossible that d0 [ d. So, d0 d. h
Result 2 Let C be minimal essential complete and let d 2 C. If d0 d, then d0 62 C.
Proof If possible, let d0 2 C. Define C 0 ¼ C  fd0 g then C0 will be also essential
complete. This contradicts that C is minimal essential complete. Hence d0 62 C. h
Note Let D1 ðDÞ be a class of decision rules. D1 is said to be an equivalent class
if all rules 2 D1 are equivalent to each other, but no rule 2 D  D1 is equivalent to
a rule 2 D1 . Then D can be considered as the disjoint union of some equivalent
classes.
Then,
(i) If C = a min. complete class then C does or does not entirely contain an
equivalent class (by Corollary 2)
(ii) If C = a minimal essential complete class then C contains at most one rule
from each equivalent class (by Result 2)
Further if d 2 C and in C, d is replaced by d0 d, then resultant class is also
minimal essential complete.
So,
(a) A minimal complete class A minimal essential complete class (by (i) and
(ii) above)
(b) A min. essential complete class is not necessarily unique. (by 2nd part of
(ii) above).

If C be a complete class such that C contains no proper essentially complete


sub-class, then C is minimal complete and is also minimal essential complete.
Example 7.9 Examples of complete and essential complete class
(1) Essential completeness of the class of rules based on a sufficient statistic:
Let d  dð xÞ 2 D. For such x, dðxÞ is a probability distribution over ɶ. T = t(x) = a
statistic. d is said to be based on T if dðxÞ is a function of t(x), i.e. dðxÞ ¼ dðx0 Þ
whenever tðxÞ ¼ tðx0 Þ.
Such a rule can be denoted by dðT Þ: T is said to be a sufficient statistic if the
conditional probability distribution of X given T is the same 8h.
Let T = a sufficient statistic and D0 = the class of rules based on T.
Lemma 1 For any d 2 D, there exists a d0 2 D0 such that d0 d: [Cor. D0 is an
essential complete class]
Proof Let d 2 D
For each given value t of T we define a probability distribution d0 ðtÞ over ɶ as
follows:
Observe the value of a random variable X 0 having the probability distribution the
same as the conditional probability distribution of X given T = t (which is inde-
pendent of h) and then if X 0 ¼ x0 choose an action a 2 ɶ according to the proba-
bility distribution dðx0 Þ. h

Clearly, d0 ðT Þ ¼ a decision rule based on T, i.e. 2 D0 .


Also, Lðh; d0 ðtÞÞ ¼ E fLðh; dðxÞÞ=T ¼ tg

) Rd0 ðhÞ ¼ E h Lðh; dðT ÞÞ ¼ Eh E fLðh; d0 ðxÞÞ=T g

¼ E h Lðh; dðxÞÞ ¼ Rd ðhÞ i:e:; d0 d

Thus, given any d 2 D we can find a d0 2 D0 D0 such that d0 d.


(2) Essential completeness of the class of non-randomized rules for convex
(strictly convex) loss. Let Rk ¼ k-dimensional real space. S  Rk .
S is said to be a convex subset if for any two x ; y 2 S and for any 0  a  1,

a x þ ð1  aÞ y also 2 S (Fig. 7.3).

Fig. 7.3 (a) A convex set S: the segment joining any two points x, y of S lies entirely in S. (b) A non-convex set

Let S = a convex subset of R^k and f(x) = a real-valued function defined on S (Fig. 7.4).

f(x) is said to be a convex function if for any two x, y ∈ S and for any 0 ≤ α ≤ 1,

      f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y)                              (7.4)

If strict inequality holds in (7.4) whenever x ≠ y, f(x) is said to be strictly convex.

Examples 7.10 f(x) = x², eˣ (strictly convex); f(x) = |x| (convex), x ∈ R¹.

Lemma 2 (Jensen's inequality) Let S = a convex subset of R^k and f(x) = a
real-valued convex function defined on S. Let Z = a random variable such that
P[Z ∈ S] = 1 and E(Z) exists. Then (i) E(Z) ∈ S; (ii) Ef(Z) ≥ f(E(Z)).

If f is strictly convex, then strict inequality holds in (ii) unless the distribution of
Z is degenerate.

Let ɶ = a convex subset of Rk . The loss function Lðh; aÞ is said to be convex (or
strictly convex) if for each given h, Lðh; aÞ is a convex (or strictly convex) function
of a.

Example 7.11

 ¼ R1 ; Lðh; aÞ ¼ ðh  aÞ2 or jh  aj
# #
Strictly convex convex

Let d 2 D. For each x, dðxÞ is a probability distribution over ɶ. Let Z x ¼ a


random variable with probability distribution dðxÞ over ɶ.
We assume that Ezx exists for each

x2x ð7:5Þ
Let D = the class of all non-randomized rules D  D.
Lemma 3 Let ɶ = a convex subset of Rk and the loss function be convex. Then for
each d 2 D satisfying (7.5) there exists a d 0 2 D; viz, d 0 ðxÞ ¼ EZ x such that
d 0 d. If the loss function is strictly convex, then d 0 [ d unless d itself 2 D.
Corollary 1 Let ɶ = a convex subset of Rk , the loss function be strictly convex and
every d 2 D satisfying (7.5), then D (=the class of all non-randomized rules) is
essential complete.
Proof of Lemma 3 Let d 2 D
dðxÞ ¼ a probability distribution over ɶ. For each x, Z x ¼ a random variable
with probability distribution dðxÞ. Define d 0 ðxÞ ¼ EZ x . By (i) of Lemma 2, d 0 ðxÞ 2
ɶ 8x, i.e. d 0 ¼ d 0 ðxÞ 2 D. Also, by (ii) of Lemma 2 Lðh; d 0 ðxÞÞ ¼ Lðh; EZ x Þ 
ELðh; Z x Þ ¼ Lðh; dðxÞÞ.

) Rd0 ðhÞ ¼ E h Lðh; d 0 ðxÞÞ  Eh ðL; h; dðxÞÞ ¼ Rd ðhÞ8h


ð7:6Þ
) d0 d

If the loss function is strictly convex, strict inequality holds in (7.6) for at least
one h unless Z x -distribution is degenerate, i.e. 8x except possibly for x 2 A such
that Ph ½x 2 A ¼ 0 8h, in which case it means that d itself 2 D ) d 0 [ d unless d
itself 2 D h
Corollary 2 Let ɶ = a convex subset of Rk , the loss function is strictly convex and
every d 2 D satisfying (7.5). Let T be a sufficient statistic and D0 = the class of
non-randomized rules based on T, D0  D. Then D0 is essential complete
(complete).
Proof Let d 2 D.
D0 = the class of all randomized decision rules based on T. By Lemma 1, there
exists a d0 ¼ d0 ðT Þ 2 D0 such that d0 d. For each t, d0 ðT Þ is a probability dis-
tribution over ɶ. Define Z t ¼ a random variable with probability distribution d0 ðtÞ
and d 0 ðtÞ ¼ EZ t . As in proof of Lemma 3, d 0 ðtÞ 2 ɶ, i.e. d 0 ¼ d 0 ðT Þ 2 D0 and
d 0 d0 ( [ d0 for strictly convex loss function unless d0 2 D; d). Thus, given

any d 2 D, there exists a d 0 2 D0 such that d 0 d (> d for strictly convex loss
function unless d 2 D0 ).
) D0 is essential complete (complete). h
Note On the condition stated by (7.5)
Let d 2 D, Z x ¼ a random variable with probability distribution dðxÞ over ɶ.
Lðh; dðxÞÞ ¼ ELðh; Z x Þ which exists for each x and h. This in many cases implies
(7.5) holds.
Example 7.12 k ¼ 1; Lðh; aÞ ¼ ðh  aÞ2 ɶ ¼ R1
ELðh; Z x Þ ¼ E ðh  Z x Þ2 exists 8x and 8h
) EZ x exists 8x.

Lðh; aÞ ¼ jh  aj

ELðh; Z x Þ ¼ E jZ x  hj E jZ x j  h

i.e. E jZ x j  h þ E jZ x  hj:
Thus E jZ x  hj exists 8x and 8h ) (7.5) holds.
For K 2, ɶ = ¼ Rk ¼ X

 Xk 2

L h; a ¼ j ai  hi j 2 ¼ a  h

i¼1




EL h ; Zx ¼ E
Zx  h which exists 8x and 8h ) (7.5) holds.

Proposition Suppose for some h


Lðh; aÞ C 1 jaj þ C2 for some C 1 ð [ 0Þ; C2 . Then ELðh; Z x Þ exists 8x ) (7.5)
holds.
This fact gives a sufficient condition on loss function for (7.5) to hold (Fig. 7.5).

Fig. 7.5 A loss function L(θ, a) lying above the line c1|a| + c2

Rao-Blackwell Theorem

Let T = a sufficient statistic.


D = the class of randomized rules.
D0 = the class of randomized rules based on T.
D = the class of non-randomized rules.

D0 = the class of non-randomized rules based on T.

Let d 2 D satisfy E ðd ðxÞ=T ¼ tÞ exists ð7:7Þ

Lemma 4 (Rao-Blackwell Theorem) Let ɶ be a convex subset of Rk and let the


loss function be convex. For any d 2 D satisfying (7.7), there exists a d 0 2 D0 , viz,
d 0 ðtÞ ¼ E ðd ðxÞ=T ¼ tÞ. If the loss function be strictly convex d 0 [ d unless d itself
2 D0 .
Proof d 0 ðtÞ ¼ E ðd ðxÞ=T ¼ tÞ is independent of h.

Lðh; d 0 ðtÞÞ ¼ Lðh; Eðd ðxÞ=T ¼ tÞÞ


 E fLðh; d ðxÞ=T ¼ tÞg by Lemma 2:
) Rd 0 ðhÞ ¼ E h Lðh; d 0 ðT ÞÞ  E h E fLðh; d ðxÞ=T ¼ tÞg
¼ E h Lðh; d ðxÞÞ ¼ Rd ðhÞ 8h
) d0 d

If L is strictly convex, ‘=’ in the above inequality 8h


iff d is a function of t, i.e. d itself 2 D0 implying that d 0 [ d unless d itself
2 D0 . h
Corollary Let ɶ be a convex subset of Rk and the loss function be convex. Let
every d 2 D satisfy (7.6) and every d 2 D satisfy (7.7), then D0 is essential com-
plete. If the loss function be strictly convex, D0 is complete.
Proof Let d 2 D
By Lemma 3, there exists a d 2 D such that d d. Also, by Lemma 4 there exists a
d 0 2 D such that d 0 d d. Thus given any d 2 D, there exists a d 0 2 D0 such
that d 0 d ) D0 is essentially complete. If the loss function is strictly convex
d 0 [ d unless d itself 2 D0 ) D0 is complete. h
Note on condition (7.7) For every d 2 D, Rd ðhÞ ¼ Eh Lðh; d ðxÞÞ exists 8h.
This generally implies that Eh ðd ðxÞÞ exists 8h.
) Eðd ðxÞ=T ¼ tÞ exists, i.e. (7.7) holds.
Example 7.13 To estimate a real parameter h; X ¼ ɶ ¼ ð1; 1Þ

Lðh; aÞ ¼ ðh  aÞ2

Rd ðhÞ ¼ Eh ðd ðxÞ  hÞ2 exists 8h ) E h ðd ðxÞÞ exists 8h ) (7.7) holds.


Similarly, it can be shown for absolute error loss Lðh; aÞ ¼ jh  aj
Proposition Let for some h, Lðh; aÞ C 1 jaj þ C 2 for some constant C 1 ð [ 0Þ and
C2 .

Then Rd ðhÞ exists 8h ) E h ðd ðxÞÞ exists ) (7.7) holds. Thus the proposition
gives a sufficient condition on loss function for (7.7) to hold.

7.3 Optimal Decision Rule

d1 d2 if Rd1 ðhÞ  Rd2 ðhÞ 8h and it is a natural partial ordering of decision rules.
d0 2 D is said to be best or optimal if d0 d 8 d 2 D, but generally such an optimal
rule does not exist.
Example To estimate a real parameter h, X ¼  ¼ ð1; 1Þ. Let
Lðh; aÞ ¼ ðh  aÞ2 . If possible, suppose there exists a best rule, say d0 . Consider
any given value of h, say h0 and define d 0 ðxÞ ¼ h0 8x. Clearly, Rd 0 ðh0 Þ ¼ 0 )
Rd0 ðh0 Þ ¼ 0 where d0 d 0 . Since h0 is arbitrary we must have Rd0 ðhÞ ¼ 0; 8h
which is generally impossible.
) generally there does not exist a best rule.
So to find a reasonably good decision rule we need some additional principles.
Two such principles are generally followed:
(i) Restriction principle
(ii) Linear ordering principle
Restriction principle Put some reasonable restrictions on decision rules, i.e.
consider a reasonable restricted sub-class of decision rules having good overall
performances and then try to find a best in this restricted sub-class.
Two restriction criteria often used are
(i) Unbiasedness and
(ii) Invariance
Linear ordering principle
For every d replace the risk function by a representative number and then compare
the rules in terms of these representative numbers.
If representative number of d1  representative number of d2 , then we prefer d1
to d2 . d0 is considered to be optimal if representative number of d0  representative
number of d 8 d 2 D.
Thus a linear ordering principle  is a way of specifying representative number
Note Any linear ordering principle should not disagree with partial ordering
principle, i.e. if d1 d2 we must have representative number of d1  as repre-
sentative number of d2 .
Two linear ordering principles that are used in general are
(i) Bayes principle
(ii) Minimax principle

Bayes Principle Let X may be finite or countable P


sðhÞ : h 2 X ! a suitable weight function over X. sðhÞ 0 8h and sðhÞ ¼ 1.
h2X
Take representative number as weighted average risk
X
¼ sðhÞRd ðhÞ ¼ r ðs; dÞ
h2X

sðhÞ ¼ a p:m:f of a ðdiscreteÞ distribution over X


¼ prior p:m:f of h:
r ðs; dÞ ¼ Bayes risk of d with respect to s:

If X ¼ a non-degenerate interval of Rk ,
sðhÞ = p.d.f of a (continuous)
R distribution over X.
Bayes risk of d ¼ rðs; dÞ ¼ Rd ðhÞsðhÞdh.
X
d0 S are compared with respect to rðs; dÞ, i.e. if rðs; d1 Þ  rðs; d2 Þ, then d1 is
preferred to d2 . A d0 2 D is considered to be optimum if it minimizes rðs; dÞ with
respect to d 2 D. Such a d0 is called a Bayes rule with respect to prior s.
Definition A rule d0 2 D is said to be a Bayes rule with respect to a prior s if it
minimizes Bayes risk (w.r.t. s) rðs; dÞ w.r.t. d 2 D, i.e. if rðs; d0 Þ ¼ inf rðs; dÞ.
d21

Note
1. A Bayes rule may or may not exist. If it exists, inf  min.
2. A Bayes rule depends on prior s.
3. A Bayes rule, even if exists, may not be unique.
4. Bayes principle does not disagree with partial ordering principle, i.e.
Rd1 ðhÞ  Rd2 ðhÞ 8h ) rðs; d1 Þ  rðs; d2 Þ whatever be s.
Minimax principle
For a d 2 D, representative number is taken as
Sup Rd ðhÞ ¼ Max: Risk that may be incurred due to choice of d. d1 is preferred
h2X
to d2 if Sup Rd1 ðhÞ  Sup Rd2 ðhÞ.
h2X h2X
d0 is considered to be optimum if it minimizes Sup Rd ðhÞ with respect to d 2 D.
h2X
Such a d0 is called a “Minimax Rule”.
Definition A rule d0 2 D is said to be a minimax rule if it minimizes Sup Rd ðhÞ
h2X
with respect to d 2 D, i.e. if
Sup Rd0 h ¼ inf Sup Rd h.
h2X d2D h2X

Notes
1. A minimax rule may or may not exist.
2. A minimax rule does not involve any prior s.
3. A minimax rule, even if exists, may not be unique.
4. Minimax principle doesn’t disagree with partial ordering principle

i.e. Rd1 ðhÞ  Rd2 ðhÞ 8h


) Sup Rd1 ðhÞ  Sup Rd2 ðhÞ
h2X h2X

7.4 Method of Finding a Bayes Rule

s = a given prior.
To find a Bayes rule d0 with respect to s  to find a rule d0 that minimizes Bayes
risk rðs; dÞ with respect to d.
Proposition If a Bayes rule d0 with respect to a given prior s exists, then there
exists a non-randomized rule d 0 which is Bayes with respect to s.

Implication For finding a Bayes rule, we can without any loss of generality con-
sider non-randomized rules only.
Proof Let d0 be a Bayes rule with respect to s. d0 may be considered as a prob-
ability distribution over D (=the class of non-randomized rules).
Let Z = a random variable with probability distribution d0 over D. Then

rðs; d0 Þ ¼ E z rðs; zÞ ð7:8Þ


P P
[Let X be finite or countable. r ðs; d0 Þ ¼ sðhÞRd0 ðhÞ ¼ sðhÞEz Rz ðhÞ
h2X h2X
X
¼ Ez sðhÞRz ðhÞðassuming that it is permissibleÞ
h2X
¼ Ez r ðs; zÞ

Similarly, we can show if when X ¼ a non-degenerate interval of Rk ].

Now d0 is Bayes ) r ðs; d0 Þ  r ðs; dÞ 8 d 2 D


) r ðs; d0 Þ  r ðs; d Þ 8d 2 D as D  D
ð7:9Þ
) r ðs; d0 Þ  r ðs; zÞ 8 values of z:
) r ðs; d0 Þ  E z r ðs; zÞ ¼ r ðs; d0 Þ

by (7.8).
We must have equality in (7.9), and consequently Z must 2 D0 with probability
1, where D0 ¼ fd=d 2 D;r ðs; d Þ ¼ r ðs; d0 Þg. h
Consider any d 0 2 D0 ,
then r ðs; d 0 Þ ¼ r ðs; d0 Þ ¼ inf r ðs; dÞ (since d0 is Bayes)
d2D
) d 0 is also Bayes. This proves the Proposition.
Note It is clear from the proof that
(1) A randomized Bayes rule = A probability distribution over D0 ; i.e. the class of
non-randomized Bayes rules.
(2) If a non-randomized Bayes rule is unique, i.e. D0 consists of a single d 0 , then a
Bayes rule is unique and is d 0 .

Method of finding Bayes rule


sðhÞ ¼ a prior distribution of h.
To minimize r ðs; dÞ with respect to d 2 D,
Without any loss of generality we may restrict to non-randomized rules only. So
we are to minimize r ðs; d Þ with respect to d 2 D.
Let X be Rcountable and  x be also countable (If 
x is an open interval of Rk ,
replace Σ by ).
Then for any d 2 D
X X X
r ðs; d Þ ¼ sðhÞRd ðhÞ ¼ sðhÞ pðx=hÞLðh; d ð xÞÞ
h2X h2X x2
x
XX ð7:10Þ
¼ sðhÞpðx=hÞLðh; d ð xÞÞ
x h2X
x2

assuming it is permissible.
PSuppose there exists a d0 ¼ d0 ð xÞ such that for each x, d0 ð xÞ minimizes
sðhÞpðx=hÞLðh; d ð xÞÞ with respect to d ð xÞ 2 ɶ.
h2X
Then clearly, d0 minimizes (7.10) w.r.t. d 2 D ) d0 is Bayes rule with respect
to s.
pðx=hÞ ¼ conditional p:m:f of X given h.
sðhÞ ¼ marginal p:m:f of h.
pðx=hÞsðhÞ ¼ Joint p:m:f of X and h.
X
pð x Þ ¼ pðx=hÞsðhÞ ¼ marginal p:m:f of X:
h2X

pðx=hÞsðhÞ
qðh=xÞ ¼ ¼ conditional ðPosteriorÞ p:m:f: of h given X ¼ x
pð x Þ

if pð xÞ [ 0 ½pð xÞ ¼ 0 , sðhÞpðx=hÞ ¼ 08h 2 X.


P
To minimize sðhÞpðx=hÞLðh; dð xÞÞ with respect to d ð xÞ 2 ɶ
h2XP
, to min. pð xÞ qðh=xÞLðh; dð xÞÞ with respect to d ð xÞ 2 ɶ.
P h2X
, min qðh=xÞLðh; dð xÞÞ w.r.t. d ð xÞ 2 ɶ.
h2X
(It is conditional (posterior) loss given X = x), i.e. EfLðh; d ð xÞÞ=X ¼ xg.
Thus if there exists a d0  d0 ð xÞ such that for each x, d0 ð xÞ gives min
E fLðh; d ðxÞÞ=X ¼ xg ¼ Conditional ðposteriorÞ loss given X ¼ x w:r:t: d ð xÞ 2 :
Then d0 is a Bayes rule.
If the minimizing d0 ð xÞ is unique for each x, then d0 is the unique Bayes rule.
[Let X be an open interval of RkR and  x be also an open interval of Rk
(If x is countable, replace Σ by )
Then for any d 2 D
Z
r ðs; d Þ ¼ sðhÞRd ðhÞd ðhÞ
X
2 3
Z Z
¼ sðhÞ4 pðx=hÞLðh; d ð xÞÞdx5dh ð7:11Þ
X 
x
Z Z
¼ sðhÞpðx=hÞLðh; d ð xÞÞdhdx
U X

(assuming this to be permissible)


R Suppose there exists a d0  d0 ð xÞ such that for each x, d0 ð xÞ minimize
sðhÞpðx=hÞLðh; d ð xÞÞdh with respect to d ð xÞ 2 .
X
Then clearly, d0 minimizes (7.11) with respect to d 2 D ) d0 is Bayes rule with
respect to s.
pðx=hÞ ¼ conditional p:d:f of Xgiven h.
sðhÞ ¼ marginal p:d:f of h
pðx=hÞsRðhÞ ¼ Joint p:d:f of Xand h.
pð xÞ ¼ sðhÞpðx=hÞdh ¼ marginal p:d:f of X:
X
pðx=hÞsðhÞ
qðh=xÞ ¼ p ð xÞ ¼ conditional ðposteriorÞ p:d:f of h given X = x, if pð xÞ [ 0.
R
To minimize sðhÞpðx=hÞLðh; d ð xÞÞdh with respect to d ð xÞ 2 
RX
, min. pð xÞ qðh=xÞLðh; d ð xÞÞdh with respect to d ð xÞ 2 
R X
, min. qðh=xÞLðh; d ð xÞÞdh with respect to d ð xÞ 2 
X
which is to min conditional (posterior) loss given X = x.
i.e. E ðLðh; d ð xÞÞ=X ¼ xÞ.

Thus if there exists a d0  d0 ð X Þ such that for each x, d0 ð xÞ min


E fLðh; d ð xÞÞ=X ¼ xg = conditional (posterior) loss given X = x with respect to
d ð xÞ 2 ɶ then d0 is a Bayes rule.
If the minimizing d0 ð xÞ is unique for each x, then d0 is unique Bayes rule]
Summary To min r ðs; d Þ with respect to d 2 D

r ðs; d Þ ¼ Eh Rd ðhÞ
h i
¼ Eh EX=h Lðh; d ð xÞÞ *Rd ðhÞEX=h Lðh; d ð xÞÞ
¼ EX Eh=X Lðh; d ð xÞÞ min for each X ¼ x with respect to d ð xÞ 2 

If d0 ð xÞ is the minimizing, then d0 ¼ d0 ð xÞ is Bayes rule.

Applications
1 Estimation of a real parameter h for squared error loss. To estimate a real
parameter h where X ¼ ɶ ¼ R1 or an open interval of it.
Lðh; aÞ ¼ ðh  aÞ2 , sðhÞ ¼ a prior p.d.f of h
R
To min. E fLðh; d ð xÞÞ=X ¼ xg ¼ ðh  d ð xÞÞ2 qðh=xÞdh w.r.t. d ð xÞ 2 .
X
Clearly, minimizing d0 ð xÞ is given by
R
 Z hpðh=xÞsðhÞdh
h X
d0 ð xÞ ¼ E =X ¼ x ¼ hqðh=xÞdh ¼ R
sðhÞpðh=xÞdh
X X

Thus, here unique Bayes rule is d0 where

d0 ð xÞ ¼ Mean of the posterior distribution of h given X ¼ x:

Example 7.14

X ~ R(0, θ), 0 < θ < ∞.

To estimate θ under squared error loss.

Let τ(θ) = prior p.d.f. of θ = θ e^(−θ), θ > 0

      p(x/θ) = 1/θ,  0 < x < θ

q(θ/x) = conditional p.d.f. of θ given X = x
       = e^(−θ) / ∫ₓ^∞ e^(−θ) dθ,  x < θ < ∞
       = e^(−θ) e^x

Mean of the posterior distribution of θ given (X = x)
       = e^x ∫ₓ^∞ θ e^(−θ) dθ = e^x [x e^(−x) + ∫ₓ^∞ e^(−θ) dθ]
       = e^x (x e^(−x) + e^(−x)) = x + 1.

Thus the unique Bayes estimator of θ w.r.t. τ is d0(X) = X + 1.
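A hedged numerical check of this example (using the same prior τ(θ) = θe^(−θ) and model p(x/θ) = 1/θ on (0, θ); only the integration is done by machine):

import numpy as np
from scipy.integrate import quad

def posterior_mean(x):
    # joint density tau(theta) * p(x/theta) = exp(-theta) on theta > x
    num, _ = quad(lambda t: t * np.exp(-t), x, np.inf)
    den, _ = quad(lambda t: np.exp(-t), x, np.inf)
    return num / den

for x in (0.3, 1.0, 2.7):
    print(round(posterior_mean(x), 6))    # -> x + 1 in each case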
Example 7.15 X ~ Bin(n, θ), n given, 0 < θ < 1.
To estimate θ under squared error loss.

      p(x/θ) = (n choose x) θ^x (1 − θ)^(n−x),  x = 0, 1, …, n

Let τ(θ) = prior p.d.f. of θ
         = [1/B(a, b)] θ^(a−1) (1 − θ)^(b−1),  a, b > 0
         = Beta prior.

q(θ/x) = posterior p.d.f. of θ given X = x
       = θ^(x+a−1) (1 − θ)^(n−x+b−1) / ∫₀¹ θ^(x+a−1) (1 − θ)^(n−x+b−1) dθ
       = [1/B(x + a, n − x + b)] θ^(x+a−1) (1 − θ)^(n−x+b−1),  0 < θ < 1.

d0(x) = mean of the posterior distribution of θ given (X = x)
      = [1/B(x + a, n − x + b)] ∫₀¹ θ^(x+a) (1 − θ)^(n−x+b−1) dθ
      = B(x + a + 1, n − x + b)/B(x + a, n − x + b) = (x + a)/(a + b + n).

Thus the unique Bayes estimator of θ w.r.t. the Beta(a, b) prior is d0(X) = (X + a)/(n + a + b).
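A minimal numerical check of this result (notation as above; the posterior mean is computed by direct integration and compared with the closed form (x + a)/(n + a + b)):

from scipy.integrate import quad

def posterior_mean(x, n, a, b):
    # posterior kernel is theta^(x+a-1) * (1-theta)^(n-x+b-1)
    kern = lambda t, p: t**p * t**(x + a - 1) * (1 - t)**(n - x + b - 1)
    num, _ = quad(kern, 0, 1, args=(1,))
    den, _ = quad(kern, 0, 1, args=(0,))
    return num / den

x, n, a, b = 7, 20, 2.0, 3.0
print(round(posterior_mean(x, n, a, b), 6))    # numerical posterior mean
print(round((x + a) / (n + a + b), 6))         # closed form: 9/25 = 0.36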

Particular case: if a = 1, b = 1, then τ(θ) = 1 ∀ 0 < θ < 1, i.e. the uniform prior.
The unique Bayes estimator is (X + 1)/(n + 2).
Example 7.16 Let X ~ Poisson(θ), 0 < θ < ∞.
To estimate θ under squared error loss.
To find the Bayes estimator w.r.t. a Gamma prior,

i.e. τ(θ) ∝ e^(−aθ) θ^(b−1), θ ≥ 0.

Let τ(θ) = K e^(−aθ) θ^(b−1), θ ≥ 0. Since ∫₀^∞ τ(θ) dθ = 1 ⇒ K Γ(b)/a^b = 1 ⇒ K = a^b/Γ(b).

      p(x/θ) = e^(−θ) θ^x / x!,  x = 0, 1, …

q(θ/x) = [e^(−θ) θ^x / x!] · [a^b/Γ(b)] e^(−aθ) θ^(b−1) / ∫₀^∞ [a^b/(Γ(b) x!)] e^(−(1+a)θ) θ^(x+b−1) dθ
       = (1 + a)^(x+b) e^(−(1+a)θ) θ^(x+b−1) / Γ(x + b)

d0(x) = [(1 + a)^(x+b)/Γ(x + b)] ∫₀^∞ e^(−(1+a)θ) θ^(x+b) dθ
      = [(1 + a)^(x+b)/Γ(x + b)] · Γ(x + b + 1)/(1 + a)^(x+b+1) = (x + b)/(1 + a)

⇒ The unique Bayes estimator of θ w.r.t. the Gamma prior is

      d0(x) = (x + b)/(1 + a).
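A short sketch (same parametrisation: prior kernel e^(−aθ) θ^(b−1)) confirming the closed form (x + b)/(1 + a) numerically:

import numpy as np
from scipy.integrate import quad

def posterior_mean(x, a, b):
    # posterior kernel: exp(-(1+a)*theta) * theta^(x+b-1), i.e. a Gamma density
    kern = lambda t, p: t**p * np.exp(-(1 + a) * t) * t**(x + b - 1)
    num, _ = quad(kern, 0, np.inf, args=(1,))
    den, _ = quad(kern, 0, np.inf, args=(0,))
    return num / den

x, a, b = 4, 2.0, 3.0
print(round(posterior_mean(x, a, b), 6))    # numerical value
print(round((x + b) / (1 + a), 6))          # closed form: 7/3 ≈ 2.333333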

Notes
1. d 0 ðxÞ is also (unique) Bayes if Lðh; aÞ ¼ cðh  aÞ2 1ðh  aÞ2 , c = a given
constant
2. If a sufficient statistic T exists we may consider rules based on T only (because
of essential completeness of rules based on T) and then may find Bayes rule
based on T.

Example 7.17 X = (X1, X2, …, Xn); X1, X2, …, Xn i.i.d. N(θ, 1), −∞ < θ < ∞.

To estimate θ under squared error loss.

T = X̄ = minimal sufficient statistic, with X̄ ~ N(θ, 1/n).

τ(θ): θ ~ N(0, σ²), σ² (> 0) known.

p(t/θ) τ(θ) = const. e^(−(n/2)(t − θ)²) · e^(−θ²/(2σ²))
            = const. e^(−(n/2)t²) · e^(−(θ²/2)(n + 1/σ²) + nθt)
            = const. e^(−(n/2)t² + n²σ²t²/(2(nσ² + 1))) · e^(−((nσ² + 1)/(2σ²)) (θ − nσ²t/(nσ² + 1))²)

q(θ/t) = posterior p.d.f. of θ given t
       = const. e^(−((nσ² + 1)/(2σ²)) (θ − nσ²t/(nσ² + 1))²)

⇒ given t, θ ~ N( nσ²t/(nσ² + 1), σ²/(nσ² + 1) ).

Posterior mean = K x̄, with K = nσ²/(nσ² + 1).

⇒ The (unique) Bayes estimator of θ is K x̄ = d0(x̄).

Also, the minimum Bayes risk = Bayes risk of d0 = E_θ E_{x̄/θ}(d0 − θ)²
                             = E_{x̄} E_{θ/x̄}(d0 − θ)² = E_{x̄}[σ²/(nσ² + 1)] = σ²/(nσ² + 1).

Applications
2. Estimation of a real h under weighted squared error loss: h ¼ a real parameter.
To estimate h, X ¼  ¼ some interval of R1
Let Lðh; aÞ ¼ wðhÞðh  aÞ2 ; wðhÞ [ 0
d 0 be Bayes if for each x, d 0 ðxÞ minimizes
Z
Eh=X¼x wðhÞðh  d ðxÞÞ2 ¼ wðhÞðh  d ðxÞÞ2 qðh=xÞdh

with respect to d ðRxÞ 2 


hwðhÞqðh=xÞdh
Clearly, d 0 ðxÞ ¼ R .
wðhÞqðh=xÞdh

Example 7.18 X ~ Bin(n, θ), n known, 0 < θ < 1.

To estimate θ with L(θ, a) = (θ − a)²/[θ(1 − θ)] = w(θ)(θ − a)², where w(θ) = 1/[θ(1 − θ)].

Let τ(θ) = 1 ∀ 0 < θ < 1, i.e. the uniform prior.

q(θ/x) = θ^x (1 − θ)^(n−x) / ∫₀¹ θ^x (1 − θ)^(n−x) dθ = θ^x (1 − θ)^(n−x) / B(x + 1, n − x + 1),  0 < θ < 1

d0(x) = ∫ θ w(θ) q(θ/x) dθ / ∫ w(θ) q(θ/x) dθ
      = ∫₀¹ θ^x (1 − θ)^(n−x−1) dθ / ∫₀¹ θ^(x−1) (1 − θ)^(n−x−1) dθ
      = B(x + 1, n − x)/B(x, n − x) = x/n,  for x = 1, 2, …, n − 1.

For x = 0, ∫ w(θ)(θ − d0(0))² q(θ/x = 0) dθ ∝ ∫₀¹ (θ − c)² θ^(−1) (1 − θ)^(n−1) dθ   [taking d0(0) = c],
which is finite if c = 0 and = ∞ if c ≠ 0.

⇒ ∫ w(θ)(θ − d0(0))² q(θ/x = 0) dθ is minimized by d0(0) = 0 = x/n.

Similarly, for x = n, ∫ w(θ)(θ − d0(n))² q(θ/x = n) dθ is minimized by d0(n) = 1 = x/n.

Thus for every x = 0, 1, 2, …, n, d0(x) = x/n minimizes ∫ w(θ)(θ − d(x))² q(θ/x) dθ with respect to d(x) ∈ ɶ.

⇒ d0(X) = X/n (= the minimum variance unbiased estimator, or the maximum likelihood estimator, of θ) is the unique Bayes rule.
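A quick numerical confirmation (uniform prior on (0, 1), weight w(θ) = 1/[θ(1 − θ)], and 1 ≤ x ≤ n − 1 so that both integrals are finite):

from scipy.integrate import quad

def bayes_estimate(x, n):
    post = lambda t: t**x * (1 - t)**(n - x)      # posterior kernel under the uniform prior
    w    = lambda t: 1.0 / (t * (1 - t))          # weight of the loss function
    num, _ = quad(lambda t: t * w(t) * post(t), 0, 1)
    den, _ = quad(lambda t: w(t) * post(t), 0, 1)
    return num / den

for x, n in [(3, 10), (7, 12)]:
    print(round(bayes_estimate(x, n), 6), x / n)  # the two columns agree: d_0(x) = x/n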

Application
3. Estimation of a real h under absolute error loss. To estimate h ¼ a real
parameter, X ¼  some interval of R1
Let Lðh; aÞ ¼ jh  aj
d0 ¼ d0 ðX Þ be Bayes if for each x d 0 ðxÞ minimizes Eh=X¼x jh  d ðxÞj with respect
to d ðxÞ 2 

Clearly, d 0 ðxÞ = median of the posterior distribution of h given X = x. If the


median of the posterior distribution is unique, then d 0 is the unique Bayes rule.
Example 7.19 X ¼ ðX 1 ; X 2 ; . . .; X n Þ, X 1 ; X 2 ; . . .; X n i.i.d. N ðh; 1Þ; a\h\a
To estimate h under absolute error loss, without loss of any generality we restrict
to rules based on T ¼ X. Let sðhÞ : h N ð0; r2 Þ; r2 ð [ 0Þ known.
2
Median of posterior distribution of h given t ¼ kx; k ¼ nrnr2 þ1

) ðUniqueÞ Bayes estimator of h is kx.

Application

4. Estimation of function of h:
To estimate gðhÞ ¼ a real-value function of h.

 ¼ X ¼ the set of possible values of gðhÞ

Let Lðh; aÞ ¼ ðgðhÞ  aÞ2 ! squared error loss.


d 0 be Bayes if it minimizes E h=X¼x fgðhÞ  d ðxÞg2 with respect to d ðxÞ 2 ӕ for
each given x. Clearly, d 0 ðxÞ ¼ Eh=x¼x fgðhÞg
) d 0 ðxÞ ¼ E h=x fgðhÞg is (unique) Bayes.
Similarly, we can find it for weighted squared error loss or for absolute error
loss.
Example 7.20 X ¼ ðX 1 ; X 2 Þ; X i ’s independent and X i Binðni ; hi Þ; where n1 ; n2
known, 0\h1 ; h2 \1; h ¼ ðh1 ; h2 Þ
To estimate gðhÞ ¼ h1  h2 under squared error loss.
sðhÞ : h1 ; h2 independent, hi Rð0; 1Þ; i ¼ 1; 2



n1 n1 x1 n2
h1 ð 1  h1 Þ
x1
h2 x2 ð1  h2 Þn2 x2
x1 x2
qðh=xÞ ¼ 1 1


R R n1 n1 x1 n2
h1 ð 1  h1 Þ
x1
h2 x2 ð1  h2 Þn2 x2 dh1 dh2
0 0 x 1 x 2
hx11 ð1  h1 Þn1 x1 hx22 ð1  h2 Þn2 x2
¼ ; 0\h1 ; h2 \1:
Bðx1 þ 1; n1  x1 þ 1ÞBðx2 þ 1; n2  x2 þ 1Þ

i.e. the posterior distribution of θ is: θ1, θ2 independent with θi ~ Beta(xi + 1, ni − xi + 1), i = 1, 2.

d0(x) = E_{θ/x}(θ1 − θ2) = E_{θ1/x}(θ1) − E_{θ2/x}(θ2) = (x1 + 1)/(n1 + 2) − (x2 + 1)/(n2 + 2)

Thus the (unique) Bayes estimator of θ1 − θ2 is

      d0(X) = (X1 + 1)/(n1 + 2) − (X2 + 1)/(n2 + 2).
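A short sketch (independent uniform priors on θ1 and θ2, as above) verifying the estimator numerically; each posterior mean is computed by direct integration:

from scipy.integrate import quad

def posterior_mean_uniform(x, n):
    # posterior of theta under a U(0,1) prior and a Bin(n, theta) likelihood
    num, _ = quad(lambda t: t * t**x * (1 - t)**(n - x), 0, 1)
    den, _ = quad(lambda t: t**x * (1 - t)**(n - x), 0, 1)
    return num / den

x1, n1, x2, n2 = 6, 10, 3, 8
print(round(posterior_mean_uniform(x1, n1) - posterior_mean_uniform(x2, n2), 6))
print(round((x1 + 1) / (n1 + 2) - (x2 + 1) / (n2 + 2), 6))   # closed form, same value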

7.5 Methods for Finding Minimax Rule

I. Geometric or Direct Method

We find geometrically or directly a rule d0 such that

Sup Rd0 ðhÞ ¼ inf Sup Rd ðhÞ:


h2X d2D h2X

Let X ¼ fh1 ; h2 ; . . .; hk g; d 2 D; S = risk set

y ¼ risk point of d

Sup Rd ðhÞ ¼ max yj


h2X 1jk

ð1Þ ð2Þ
Two risk points y ; y may be considered to be equivalent if

max yj ð1Þ ¼ max yj ð2Þ :


1jk 1jk

A risk point y0 is said to be a minimax point if max yj0 ¼ inf max yj :


1jk y 2S 1  j  k

If y0 is a minimax point and d0 is a rule with risk point y0 , then d0 is minimax.


  
For any real C, let Qc ¼ Qðc;c;::;cÞ ¼ y yj  c8j ¼ 1; 2; ::k :

All risk points lying on the boundary of a Qc are equivalent points (Figs. 7.6 and
7.7).
Fig. 7.6 The equalizer line z1 = z2, the point (c, c) and the set Qc

Fig. 7.7 The risk set S, the equivalent points on the boundary of Qc, and the minimax point

(For any such point y, max_j y_j = c.)



Let C0 ¼ inf fC=Qc \ S 6¼ /g:
Any risk point 2 boundary of Qc0 is a minimax point. Any rule d0 with risk point
y0 is minimax.

Notes
1. If S does not contain its boundary points, a minimax rule may not exist.
2. A minimax point may not be unique

(1,1) (2,1)

S
All Minimax
Points

(1,0) (2,0)

3. A minimax point does not necessarily lie on the equalizer line (Figs. 7.8 and 7.9).

Example 7.20 Let X ¼ fh1 ; h2 g  ¼ fa1 ; a2 g


Loss is (0–1). 
x ¼ f0; 1; 2; . . .g
Ph1 ½X ¼ x ¼ 0 if x = 0

1
¼ if x 1
2x

Fig. 7.8 The risk set S with corners (1, 0), (2, 0), (1, 1), (2, 1); the set Q_{C0} touches S along the edge joining (1, 0) and (1, 1), so every point of that edge is a minimax point

Fig. 7.9 A risk set S for which the boundary of Q_{C0} meets S at a single minimax point

1
Ph2 ½X ¼ x ¼ ; x ¼ 0; 1; . . .
2x þ 1

Let d 2 D: dða2 =xÞ ¼ dðxÞ dða1 =xÞ ¼ 1  dðxÞ


0  dð x Þ  1

X
1
y 1 ¼ R d ð h1 Þ ¼ dðxÞ  Ph1 ðX ¼ xÞ
x¼1
X1
y 2 ¼ R d ð h2 Þ ¼ f1  dðxÞg  Ph2 ðX ¼ xÞ
x¼0
1 1X 1
1
¼ 1  dð 0Þ   dð x Þ x
2 2 x¼1 2

To find dðxÞ such that y1 ¼ y2 ¼ 13 we take

1
dð x Þ ¼ 8x 1; dð 0Þ ¼ 1
3

Thus, a minimax rule is given as

dða2 =0Þ ¼ 1 dða1 =0Þ ¼ 0


1 2
dða2 =xÞ ¼ dða1 =xÞ ¼ ; x ¼ 1; 2; . . .
3 3

Example 7.21  
1 1
X¼ h1 ¼ ; h2 ¼
4 2
 
1 1
¼ a1 ¼ ; a2 ¼
4 2

a1 a2
Loss matrix: h1 1 4
h2 3 2

Let X ¼ 0 with probability h
h ¼ h1 ; h2
¼ 1 with probability ð1  hÞ
Then D ¼ fd 1 ; d 2 ; d 3 ; d 4 g
d 1 ð0Þ ¼ d 1 ð1Þ ¼ a1 ; d 2 ð0Þ ¼ d 2 ð1Þ ¼ a2 ; d 3 ð0Þ ¼ a1 but d 3 ð1Þ ¼ a2 and
d 4 ð0Þ ¼ a2 but d 4 ð1Þ ¼ a1
Rd 1 ðh1 Þ ¼ 1; Rd 1 ðh2 Þ ¼ 3; Rd2 ðh1 Þ ¼ 4; Rd2 ðh2 Þ ¼ 2

R_{d3}(θ1) = (1/4)·1 + (3/4)·4 = 3¼,   R_{d3}(θ2) = (1/2)·3 + (1/2)·2 = 2½
R_{d4}(θ1) = (1/4)·4 + (3/4)·1 = 1¾,   R_{d4}(θ2) = (1/2)·2 + (1/2)·3 = 2½

⇒ S0 = the set of risk points of all non-randomized rules
     = {(1, 3), (4, 2), (3¼, 2½), (1¾, 2½)}.

If y2 = m·y1 + C is the line joining the lower boundary risk points (1¾, 2½) and (4, 2), then m = −2/9 and C = 26/9.
This line meets the equalizer line y1 = y2 at (26/11, 26/11). Here, to find a minimax rule means to find a rule with risk point (26/11, 26/11) (Fig. 7.10).

Fig. 7.10 The risk points of the non-randomized rules of Example 7.21 and the minimax point (26/11, 26/11)

Let d 2 D. For X = x

d ¼ a1 with probability dðxÞ


¼ a2 with probability 1  dðxÞ

Let dð0Þ ¼ u; dð1Þ ¼ v, d ffi ðu; vÞ; 0  u; v  1

dða1 =0Þ ¼ u; dða1 =1Þ ¼ v

dða2 =0Þ ¼ 1  u; dða2 =1Þ ¼ 1  v:

R_δ(θ1) = (1/4){u·1 + (1 − u)·4} + (3/4){v·1 + (1 − v)·4} = (1/4)(16 − 3u − 9v)

R_δ(θ2) = (1/2){u·3 + (1 − u)·2} + (1/2){v·3 + (1 − v)·2} = (1/2)(u + v + 4)

Setting R_δ(θ1) = R_δ(θ2) = 26/11                                         (7.12)

⇒ (1/4)(16 − 3u − 9v) = 26/11 ⇒ u + 3v = 24/11

and (1/2)(u + v + 4) = 26/11 ⇒ u + v = 8/11                               (7.13)

(7.12) and (7.13) give the unique solution u = 0, v = 8/11.

Thus, the unique minimax rule is given as

      δ(a1/0) = 0, δ(a2/0) = 1, δ(a1/1) = 8/11, δ(a2/1) = 3/11.

Note The unique minimax rule is purely randomized. Thus, unlike Bayes rules, a
minimax rule may be purely randomized, i.e. although a minimax rule exists, no
non-randomized rule is minimax.
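The value 26/11 and the rule u = 0, v = 8/11 can be confirmed by a crude grid search over all rules d ≅ (u, v); a minimal sketch using the risk formulas derived above:

import numpy as np

def max_risk(u, v):
    # sup over theta of the risk of the rule d = (u, v) in Example 7.21
    return np.maximum((16 - 3 * u - 9 * v) / 4, (u + v + 4) / 2)

print(round(max_risk(0.0, 8 / 11), 3))     # 2.364 = 26/11, the claimed minimax value

grid = np.linspace(0.0, 1.0, 1001)         # crude search over all (u, v) in [0, 1]^2
U, V = np.meshgrid(grid, grid)
print(round(max_risk(U, V).min(), 3))      # also 2.364: no rule does better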
Alternative (direct/or Algebraic approach)

Let us take the same Example 7.21 (Fig. 7.11)

Fig. 7.11 The unit square of rules d ≅ (u, v), divided by the line 5u + 11v = 8 (through v = 8/11 at u = 0 and v = 3/11 at u = 1) into the regions D1 and D2
Sup Rd ðhÞ ¼ maxfRd ðh1 Þ; Rd ðh2 Þg ¼ ð16  3u  9vÞ
h2X 4

if 14 ð16  3u  9vÞ 12 ðu þ v þ 4Þ,


i.e. if 5u þ 11v  8 and ¼ 12 ðu þ v þ 4Þ if
2 ðu þ v þ 4Þ 4 ð16  3u  9vÞ, i.e. if
5u þ 11v 8
1 1

Let D1 ¼ fd  ðu; vÞ=5u þ 11v  8g

D2 ¼ fd  ðu; vÞ=5u þ 11v [ 8g ) D1 þ D2 ¼ D

For d 2 D1 , Sup Rd ðhÞ ¼ 16  3u


4
 9v
h2X
163u9v
Now Inf Sup Rd ðhÞ ¼ Inf 4
d2D1 h2H 0  u; v  1
5u þ 11v  8

 
16  3u  9v 1 9ð8  5uÞ
¼ inf inf ¼ inf 16  3u 
0  u  1 0  v  85u 4 0u1 4 11
11

1 104 26
¼ inf ð12u þ 104Þ ¼ ¼
0u1 4 44 11

and inf attained for u = 0, v ¼ 8 115:0 ¼ 11


8

Similarly, inf Sup Rd ðhÞ ¼ 11, which is for u = 0, v ¼ 11


26 8
d2D2 h2H
Finally, inf Sup Rd ðhÞ
d2D h2H

 
26
¼ min inf Sup Rd ðhÞ; inf Sup Rd ðhÞ ¼
d2D1 h2H d2D2 h2X 11

and inf is attained if u = 0, v ¼ 11


8
.
Thus, the unique minimax rule is given as u = 0, v ¼ 11 8

i.e. dða1 =0Þ ¼ 0; dða2 =0Þ ¼ 1; dða1 =1Þ ¼ 11 ; dða2 =1Þ ¼ 11


8 3

II. Use of Bayes rule

A rule d0 is said to be an equalizer rule if Rd0 ðhÞ ¼ Const 8h 2 X.


Result 1 If an equalizer rule d0 is Bayes (w.r.t some prior s), then d0 is minimax. If
d0 is unique Bayes (w.r.t. s), then d0 is unique minimax (and hence admissible).
Proof Rd0 ðhÞ ¼ c 8h ) Sup Rd0 ðhÞ ¼ c and r ðs; d0 Þ ¼ c
h2H
Minimaxiety: If possible let d0 be not minimax, so there exists a d1 such that

Sup Rd1 ðhÞ\ Sup Rd0 ðhÞ ¼ c


h2H h2H
) Rd1 ðhÞ  Sup Rd1 ðhÞ ¼ c8h
h2H
) r ðs; d1 Þ\c ¼ r ðs; d0 Þ

But this contradicts that d0 is Bayes w.r.t. s. Hence, d0 is minimax.

Unique minimaxiety
If possible let d0 be not unique minimax. So there exists another d1 which is also
minimax, i.e. there exists another d1 such that

Sup Rd1 ðhÞ ¼ Sup Rd1 ðhÞ ¼ c


h2H h2H
) Rd1 ðhÞ  Sup Rd1 ðhÞ ¼ c8h
h2H
) r ðs; d1 Þ  c ¼ r ðs; d0 Þ

i.e. r ðs; d1 Þ ¼ r ðs; d0 Þ (* d0 is Bayes w.r.t. s)


) d1 is also Bayes w.r.t s, but this contradicts that d0 is unique Bayes w.r.t. s.

Hence d0 is unique minimax.


X ¼ 1 with probabilityh
Example 7.22 Let
¼ 0 with prob 1  h: 0\h\1
To estimate θ under squared error loss.
Let d ðxÞ ¼ a non-randomized rule.
Let d ð1Þ ¼ u; d ð0Þ ¼ v; 0\u; v\1.

d  ðu; vÞ

Equalizer rule: R_d(θ) = θ(u − θ)² + (1 − θ)(v − θ)²
                       = θ²(1 + 2v − 2u) + θ(u² − v² − 2v) + v².

The rule is an equalizer iff 1 + 2v − 2u = 0, or

      u = (1 + 2v)/2                                                       (7.14)

and u² − v² − 2v = 0, or (1 + 2v)²/4 − v² − 2v = 0 (using 7.14),
or 1/4 − v = 0 ⇒ v = 1/4 ⇒ u = 3/4.

Thus, the only equalizer non-randomized rule is d(1) = 3/4, d(0) = 1/4.

Bayes rule: Let s ¼ a prior distribution


 
E ðhÞ ¼ m1 and E h2 ¼ m2
 
) r ðs; d Þ ¼ ERd ðhÞ ¼ m2 ð1 þ 2v  2uÞ þ m1 u2  v2  2v þ v2

@rðs;d Þ
Now @u ¼ 0 ) 2m2 þ 2m1 u ¼ 0 ) u ¼ m2
m1

@r ðs; d Þ
¼ 0 ) 2m2  2m1 v  2m1 þ 2v ¼ 0
@v
m1  m2
)v¼
1  m1

Thus, the unique Bayes rule w.r.t. s is


m2 m1  m2  
d ð 1Þ ¼ ; d ð 0Þ ¼ where m1 ¼ E ðhÞ; m2 ¼ E h2
m1 1  m1

Hence, the equalizer non-randomized rule is unique Bayes w.r.t. a τ such that
m2/m1 = 3/4 and (m1 − m2)/(1 − m1) = 1/4,
⇒ m1 = 1/2 and m2 = 3/8.

 
[For example, let τ = Beta(1/2, 1/2) prior (α = β = 1/2): m1 = α/(α + β) = 1/2 and
m2 = α(α + 1)/[(α + β)(α + β + 1)] = 3/8, and the equalizer non-randomized rule is
unique Bayes w.r.t. the Beta(1/2, 1/2) prior.]
Thus the non-randomized rule d0(1) = 3/4, d0(0) = 1/4 is an equalizer as well as
unique Bayes (w.r.t. some prior) ⇒ d0(X) is (unique) minimax.
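A quick check (X Bernoulli(θ), squared error loss, as in this example) that the rule d0(1) = 3/4, d0(0) = 1/4 is indeed an equalizer rule, with constant risk 1/16:

import numpy as np

thetas = np.linspace(0.01, 0.99, 50)
risk = thetas * (0.75 - thetas) ** 2 + (1 - thetas) * (0.25 - thetas) ** 2

print(np.allclose(risk, 1 / 16))    # True: the risk does not depend on theta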
Example 7.23

X ~ Bin(n, θ), n known and 0 < θ < 1.

To estimate θ under squared error loss.

τ_{ab} = Beta(a, b) prior, a, b > 0.

The unique Bayes rule w.r.t. τ_{ab} is

      d_{ab}(X) = (X + a)/(n + a + b)

R_{d_{ab}}(θ) = E_θ[(X + a)/(n + a + b) − θ]²
             = E_θ{(X − nθ) − θ(a + b) + a}² / (n + a + b)²
             = [E_θ(X − nθ)² + θ²(a + b)² + a² − 2θa(a + b)] / (n + a + b)²     [since E_θ(X − nθ) = 0]
             = [θ²{(a + b)² − n} + θ{n − 2a(a + b)} + a²] / (n + a + b)²        [since E_θ(X − nθ)² = nθ(1 − θ)]

d_{ab} is an equalizer iff

      (a + b)² = n  and  2a(a + b) = n  ⇔  a = √n/2, b = √n/2.

Thus the rule

      d_{√n/2, √n/2}(X) = (X + √n/2)/(n + √n)

is an equalizer as well as unique Bayes (w.r.t. the Beta(√n/2, √n/2) prior). Hence (X + √n/2)/(n + √n) is the
unique minimax estimator of θ.
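A hedged numerical check of the equalizer property: the exact risk of (X + √n/2)/(n + √n) is evaluated for several θ and compared with 1/[4(√n + 1)²]:

import numpy as np
from scipy.stats import binom

n = 16
xs = np.arange(n + 1)
d = (xs + np.sqrt(n) / 2) / (n + np.sqrt(n))          # the minimax estimator

for theta in (0.1, 0.5, 0.9):
    pmf = binom.pmf(xs, n, theta)
    print(round(np.sum(pmf * (d - theta) ** 2), 8))   # same value for every theta

print(round(1 / (4 * (np.sqrt(n) + 1) ** 2), 8))      # = 0.01 for n = 16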

Example 7.24 Let X Binðn; hÞ; n be known 0\h\1


2
To estimate h under loss function Lðh; aÞ ¼ hðhð1ahÞ Þ
E h ðX  h Þ
2

Let d 0 ðX Þ ¼ Xn ; Rd 0 ðhÞ ¼ hðn1hÞ ¼ 1n 8h


i.e. d 0 is an equalizer rule.
Also, d 0 is unique Bayes w.r.t. R(0,1) prior.
Hence d 0 ðX Þ ¼ Xn is the unique minimax estimator of h.
Result 2 If an equalizer rule d0 is extended Bayes, then it is minimax.
Example 7.25 X 1 ; X 2 ; . . .; X n i.i.d N ðh; 1Þ; 1\h\1
To estimate h under squared error loss. Let d 0 ¼ X; Rd 0 ðhÞ ¼ 1n 8h, i.e. d 0 is
equalizer. Also, d 0 is extended Bayes. Hence X is minimax.
Proof of Result 2

Rd0 ðhÞ ¼ c 8h

So, Sup Rd0 ðhÞ ¼ c ) r ðs; d0 Þ ¼ c 8s:


h2H
Also, d0 is extended Bayes
) given any 2 [ 0; there exists a prior s2 such that

c ¼ r ðs2 ; d0 Þ  inf r ðs2 ; dÞ þ 2


d2D
ð7:15Þ
or; inf r ðs2 ; dÞ c 2
d2D

if possible let d0 be not minimax.


So there exists a d1 such that

Sup Rd1 ðhÞ\ Sup Rd0 ðhÞ ¼ c ð7:16Þ


h2H h2H

(7.16) implies there exists an 2 such that

Sup Rd1 ðhÞ\c 2


h2H
) Rd1 ðhÞ\c 2 8h; since Rd ðhÞ  Sup Rd ðhÞ
h2H ð7:17Þ
) r ðs; d1 Þ\c 2 whatever be s:
) inf r ðs; dÞ\c 2 whatever be s
d2D
 
* inf r ðs; dÞ  r ðs; d1 Þ
d2D

(7.17) contradicts (7.15). Hence d0 must be minimax. h



Result 3 Let d0 be such that


(i) Rd0 ðhÞ  c 8h for some real constant c.
(ii) d0 is Bayes (unique Bayes) w.r.t. a prior s0 such that r ðs0 ; d0 Þ ¼ c
Then d0 is minimax (unique minimax).
Corollary 1 Let d0 be such that
ðiÞ0 Rd0 ðhÞ ¼ c 8h (This is in fact Result 1)
ðiiÞ0 d0 is Bayes (unique Bayes) w.r.t. a prior s0 .
Then d0 is minimax (unique minimax). ðiÞ0 ; ðiiÞ0 ) ðiÞ; ðiiÞ
^
Corollary 2 Let d0 be such that
Rd0 ðhÞ ¼ c 8h 2 H0 ðHÞ
ðiÞ0
 c 8h 2 H  H0
ðiiÞ0 d0 is Bayes (unique Bayes) w.r.t. a s0 such that Prfh 2 H0 g ¼ 1
Then d0 is minimax (unique minimax)
ðiÞ0 ; ðiiÞ0 ; ) ðiÞ; ðiiÞ
Note For H0 ¼ H, Corollary 2 ) Corollary 1
Proof of Result 3 For any d and any s,

Sup Rd ðhÞ r ðs; dÞ ð7:18Þ


h2X
 
As Rd ðhÞ  Sup Rd ðhÞ8h ) r ðs; dÞ  Sup Rd ðhÞ
h2H h2H

For d ¼ d0 and s ¼ s0 ; (7.18)

) r ðs0 ; d0 Þ  Sup Rd0 ðhÞ by (ii) ð7:19Þ


h2H

Also ðiÞ ) Sup Rd0 ðhÞ  c ð7:20Þ


h2H

(7.19), (7.20)

) Sup Rd0 ðhÞ ¼ c ð7:21Þ


h2H

So, minimaxiety of d0 :

For any; d; Sup Rd ðhÞ r ðs0 ; dÞ by ð7:18Þ


h2H
r ð s 0 ; d0 Þ ðSince; d0 is Bayes w:r:t: s0 Þ
¼ cðbyðiiÞÞ
¼ Sup Rd0 ðhÞ by ð7:21Þ
h2H

) d0 is minimax.
Unique minimaxiety of d0 : For any dð6¼ d0 Þ

Sup Rd ðhÞ r ðs0 ; dÞ ðbyð7:18ÞÞ


h2X
[ r ðs0 ; d0 Þ ðSince d0 is unique Bayes w:r:t:s0 Þ
¼ c ðby ðiiÞÞ
¼ Sup Rd0 ðhÞ by ð7:21Þ
h2X

Thus Sup Rd ðhÞ [ Sup Rd0 ðhÞ


h2X h2X
8dð6¼ d0 Þ ) d0 is unique minimax. h
Example 7.26 Let X Binðn; h1 Þ n be known.
Y Binðn; h2 Þ 0\h1 ; h2 \1; h1 ; h2 are unknown.
To estimate h1  h2 under squared error loss, we can expect a rule of the form
aX + bY + c to be minimum. However, no rule of this form is an equalizer rule. So
Result 1 (or Corollary 1) cannot be applied. But Corollary 2 can be applied as
follows:
Step 1: To find an equalizer Bayes rule in some H0 ðHÞ. Let
H0 ¼ fh1 ; h2 =0\h1 ; h2 \1; h1 þ h2 ¼ 1g. Restricting to H0 , let us write h1 ¼ h;
h2 ¼ 1  h.
Thus, we have,
3
X Binðn; hÞ
7 independent:
Y Binðn; 1  hÞ 5
or n  Y Binðn; hÞ
Without any loss of generality we may restrict ourselves to rules based on
Z ¼ X þ ðn  Y Þ Binð2n; hÞ (Sufficient statistic) pffiffi pffiffi
If X Binðn; hÞ, an equalizer and unique Bayes (w.r.t. Bin 2n ; 2n prior)
pffi
Xþ n
estimator of h under squared error loss is n þ p2ffiffin.
pffiffiffiffi pffiffiffiffi
If Z Binð2n; hÞ; an equalizer and unique Bayes (w.r.t. Bin 22n ; 22n prior)
pffiffiffi
Z þ 2n
estimator of h under squared error loss is 2n þ p2ffiffiffi
2n
ffi.

To estimate now h1  h2 ¼ 2h  1; consider the following:


Lemma Under squared error loss, if d0 is an equalizer Bayes (unique) estimator of
gðhÞ, then d0 ¼ ad0 þ b is an equalizer Bayes (unique) estimator of
g ðhÞ ¼ agðhÞ þ b.
Proof For any estimator d of gðhÞ we can define an induced estimator, viz. d ¼
ad0 þ b of g ðhÞ ¼ agðhÞ þ b and vice versa.
Under squared error loss, Rd ðhÞ ¼ a2 Rd ðhÞ

r ðs; dÞ ¼ a2 r ðs; d Þ

Hence, d0 is equalizer ) d ¼ ad0 þ b is equalizer. h


d0 is Bayes (unique) w.r.t. s ) d0 ¼ ad0 þ b is Bayes (unique).
By the Lemma, an equalizer Bayes (unique) estimator of 2h  1 is
 pffiffiffiffi
2 z þ 22n 2ðX  YÞ
pffiffiffiffiffi ¼ pffiffiffiffiffi
2n þ 2n 2n þ 2n

Thus, if we restrict to H0 , an equalizer Bayes (unique) estimator of h1  h2 is


2ðXY Þ
pffiffiffiffi
¼ d 0 (say)
2n þ 2n
Step 2: Rd0 ðh1 ; h2 Þ  c 8ðh1 ; h2 Þ 2 H where c ¼ Rd 0 ðh1 ; h2 Þ for ðh1 ; h2 Þ 2 H0 .
Proof For ðh1 ; h2 Þ 2 H
 2
2ðX  Y Þ
Rd 0 ðh1 ; h2 Þ ¼ E h1 ;h2 pffiffiffiffiffi  ðh1  h2 Þ
2n þ 2n
n pffiffiffiffiffi o 2  pffiffiffiffiffi 2
¼ E h 2ðX  nh1 Þ  2ðY  nh2 Þ  2nðh1  h2 Þ 2n þ 2n

4E h ðX  nh1 Þ2 þ 4E h ðY  nh2 Þ2 þ 2nðh1  h2 Þ2


¼  pffiffiffiffiffi2
2n þ 2n
2h1 ð1  h1 Þ þ 2h2 ð1  h2 Þ þ ðh1  h2 Þ2 Numerator
¼  pffiffiffiffiffi2 ¼ :
1 þ 2n Dinominator

Now Numerator ¼ 2h1 þ 2h2  h12  h22  2h1 h2


¼ 1  f1  h1  h2 g2  1
0
¼ 0 holds iff h1 þ h2 ¼ 1

1
R d 0 ð h1 ; h2 Þ ¼  pffiffiffiffiffi2 ¼ c 8ðh1 ; h2 Þ 2 H0
Hence, 1 þ 2n h
\c 8ðh1 ; h2 Þ 2 H  H0

2ðXY Þ
By Corollary 2, Step 1 + Step 2 gives us d 0 ¼ 2n pffiffiffiffi is the unique minimax
þ 2n
estimator of h1  h2 .
Result 4 Let d0 be such that
(i) Rd0 ðhÞ  c 8h 2 H, c = a real constant
(ii) There exists a sequence of Bayes rules fdn g w.r.t. sequence of priors fsn g
such that r fsn ; dn g ! c. Then d0 is a minimax.

Proof For any d and any s,

Sup Rd ðhÞ r ðs; dÞ ð7:22Þ


h2H

(as was in the Proof of Result 3) h


(7.22) ) For any d,

Sup Rd ðhÞ r ðsn ; dÞ r ðsn ; dn Þ ! c by ðiiÞ


h2H

ðSince; dn is Bayes w:r:t: sn priorÞ ð7:23Þ

For d ¼ d0
(7.23) ) Sup Rd0 ðhÞ c and also condition ðiÞ ) Sup Rd0 ðhÞ  c
h2H h2H

) Sup Rd0 ðhÞ ¼ c ð7:24Þ


h2H

Then (7.23), (7.24) ) for any d,

Sup Rd ðhÞ c ¼ Sup Rd0 ðhÞ


h2H h2H

i.e. d0 is minimax.
Example 7.27 Let X 1 ; X 2 ; ::; X n i.i.d N ðh; 1Þ; 1\h\1
To estimate h under squared error loss,
Let d 0 ¼ X; Rd 0 ðhÞ ¼ 1n 8h
(i) is satisfied with c ¼ 1n.
Let sr : N ð0; r2 Þ prior
2
d r ¼ Bayes estimator of h w.r.t. sr ¼ 1 þnrnr2 X
r ðsr ; d r Þ ¼ 1 þr nr2 ! 1n ¼ c as r2 ! 1
2

Thus (ii) is satisfied.


Hence d 0 ¼ X is minimax.

Example 7.28 Let X ~ Poisson(θ), 0 < θ < ∞.

To estimate θ with L(θ, a) = (θ − a)²/θ.
(Apply Result 4 to prove that d0 = X is minimax.)

Hint R_{d0}(θ) = E_θ(X − θ)²/θ = 1 ⇒ (i) is satisfied with c = 1. Take
τ_{ab}(θ) ∝ e^(−aθ) θ^(b−1), 0 < θ < ∞. Then d_{ab}(X) = Bayes estimator of θ w.r.t. τ_{ab} = (X + b − 1)/(1 + a), and
r(τ_{ab}, d_{ab}) → 1 = c as a → 0, b → 1. Hence d0 = X is minimax.
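A short sketch verifying the hint numerically (the Poisson variance gives E_θ(X − θ)² = θ, so the risk under L(θ, a) = (θ − a)²/θ is identically 1):

import numpy as np
from scipy.stats import poisson

for theta in (0.5, 2.0, 7.0):
    xs = np.arange(0, 200)                 # truncation is ample for these values of theta
    pmf = poisson.pmf(xs, theta)
    risk = np.sum(pmf * (xs - theta) ** 2) / theta
    print(round(risk, 6))                  # -> 1.0 in each case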

Other Methods: Use of Cramer-Rao inquality.

Result 1 If an equalizer rule d0 is admissible, then d0 is minimax.


Proof Rd 0 ðhÞ ¼ c 8h ) Sup Rd 0 ðhÞ ¼ c.
h2H
If possible let d0 be not minimax. Then there exists a d1 such that
Sup Rd1 ðhÞ\ Sup Rd0 ðhÞ ¼ c
h2H h2H

) Rd1 ðhÞ\C ¼ Rd0 ðhÞ 8h ð7:25Þ

(7.25) ) d1 [ d0 ; which contradicts that d0 is admissible. Hence, d0 is minimax.


To estimate a real-valued parameter h under squared error loss. H ¼ an open
interval of R1 . Without any loss of generality we can restrict ourselves to
non-randomized rules only (since the loss function is convex). h
Let d ðX Þ ¼ a non-randomized rule.

bd ðhÞ ¼ E h ðd ðX ÞÞ  h ¼ Bias of d ðX Þ

By C--R inequality Rd ðhÞ ¼ MSEh ðd ðX ÞÞ


¼ b2d ðhÞ þ V h ðd ðX ÞÞ
 2 .
b2d ðhÞ þ 1 þ b0d ðhÞ I ð hÞ 8h
¼ C d ð hÞ ðsayÞ

IðhÞ ¼ Fisher's information function:

Result 2 Let d 0 be a non-randomized rule such that


(i) MSE of d 0 attains C–R lower bound, i.e.

Rd0 ðhÞ ¼ C d0 ðhÞ 8h



(ii) For any non-randomized rule d 1 ,

C d1 ðhÞ  Cd 0 ðhÞ 8h
) bd 1 ð hÞ ¼ bd 0 ð hÞ 8h

Then d 0 is admissible.
If further, d 0 is equalizer, then d 0 is minimax.
Proof Result I ) proves that it is minimax h
Proof of admissibility If possible let d 0 be inadmissible.
Then there exists a d 1 such that

Rd 1 ðhÞ  Rd 0 ðhÞ 8h with strict inequality for at least one h ð7:26Þ

(7.26) ⇒

Cd 1 ðhÞ  Rd 1 ðhÞ  Rd 0 ðhÞ ¼ C d 0 ðhÞ 8h ð7:27Þ


|fflfflfflfflfflfflfflfflfflfflffl
ffl{zfflfflfflfflfflfflfflfflfflfflfflffl} |fflfflfflfflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflfflfflfflffl}

by C–R inequality and by (i)


(7.27) ⇒

bd 1 ðhÞ ¼ bd 0 ðhÞ 8h by ðiiÞ

) Cd 1 ðhÞ ¼ Cd 0 ðhÞ 8h ð7:28Þ

(7.27) and (7.28) ⇒

Cd 0 ðhÞ  Rd 1 ðhÞ  Rd0 ðhÞ ¼ Cd 0 ðhÞ 8h ð7:29Þ

We must have equality in (7.29) everywhere, implying that Rd 1 ðhÞ ¼ Rd 0 ðhÞ8h:


Thus, strict inequality in (7.26) cannot hold for any h, i.e. there cannot be any d 1
such that d 1 [ d 0 .
Hence d 0 is admissible. h
Example 7.29 Let x1 ; x2 . . .xn i.i.d N ðh; 1Þ; 1\h\1. To estimate h under
squared error loss.
 is sufficient ) it is enough to restrict to n.r. rules based on X
X  only.
Let d 0 ¼ d 0 ðX Þ ¼ X.

Rd 0 ðhÞ ¼ 1n 8 h, i.e. d 0 is equalizer.
Also, bd0 ðhÞ ¼ 0 8 h, i.e. d 0 is unbiased.

1
Rd 0 ðhÞ ¼ C d 0 ðhÞ ¼ 8h
n

i.e. condition (i) of Result 2 is satisfied [Here IðhÞ ¼ n].


Let d ¼ d ðX Þ be any n.r rule based on X.

Lemma C d ðhÞ  C d 0 ðhÞ 8 h.


) bd ðhÞ ¼ 0 8 h, i.e. d is also unbiased.
Lemma ) Condition (ii) of Result 2 is also satisfied.
Hence, (i) d 0 is admissible.
(ii) d 0 is minimax.
Also, (iii) d 0 is unique minimax.
[Proof of (iii): Let d 1 ¼ d 1 ðxÞ be another minimax rule.
Then

1
Sup Rd1 ðhÞ ¼ Sup Rd 0 ðhÞ ¼ ¼ C d 0 ð hÞ
h2H h2H n
) C d1 ðhÞ  Rd 1 ðhÞ  Sup Rd 1 ðhÞ ¼ Cd 0 ðhÞ8h
|fflfflfflfflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflfflfflfflffl} h2H

By C–R inequality
) Cd 1 ðhÞ  Cd 0 ðhÞ 8 h ) bd1 ðhÞ ¼ 0 8 h (By Lemma).
) d 1 is an unbiased estimator of h. But since X  is complete d 0 is the unique
 is the unique minimax esti-
unbiased estimator of h; i.e., d 1 ¼ d 0 , Hence d 0 ¼ X
mator of h]
Proof of Lemma Writing bd ðhÞ ¼ bðhÞ
Let Cd ðhÞ  Cd 0 ðhÞ ¼ 1n 8 h
.
i:e:; b2 ðhÞ þ f1 þ b0 ðhÞg
2
n  1=n 8 h ð7:30Þ

ð7:30Þ ) b0 ðhÞ  0 8 h i:e:; bðhÞ is non-increasing ð7:31Þ

1 2b0 ðhÞ .
 b2 ðhÞ þ f1 þ b0 ðhÞg n  1=n
2
½as ð7:30Þ) þ
n n
2b0 ðhÞ
)  0 ) b0 ðhÞ  0
n

Also ð7:30Þ ) b2 ðhÞ þ 2b0 ðhÞ  0 ð7:32Þ

½As ð7:30Þ)nb2 ðhÞ þ b0 ðhÞ þ 2b0 ðhÞ  0


2

)b2 ðhÞ þ 2b0 ðhÞ  nb2 ðhÞ þ b0 ðhÞ þ 2b0 ðhÞ  0


2

b0 ðhÞ
Now ð7:32Þ )  1
8 h such that bðhÞ 6¼ 0
b0 ðhÞ
2 2

d 1 1
Or b ð hÞ 8 h such that bðhÞ 6¼ 0 ð7:33Þ
dh 2
ð7:31Þ; ð7:33Þ ) bðhÞ ! 0 as h ! 1 ð7:34Þ

Finally (7.31), (7.34) ) bðhÞ ¼ 0 8 h, which proves the Lemma. h

7.6 Minimax Rule: Some Theoretical Aspects

A statistical decision problem  A game between statistician and nature.

H ¼ the set of possible actions for nature:

 ¼ the set of possible actions for statistician:

Lðh; aÞ ¼ Loss ðto the statisticianÞ if the statistician chooses an action ‘a’ and
nature chooses an action ‘h’.
A randomized action for the statistician = a probability distribution over ɶ.
The statistician observes the value of a r.v. X. If X = x is observed, the statistician
chooses a randomized action dðxÞ.

dðxÞ ¼ a randomized rule for statistician:

s ¼ a prior distribution: ¼ a probability distribution over H:


¼ a randomized action for the nature:
If the statistician chooses a randomized rule d and the nature chooses a ran-
domized action s, then the statistician’s expected loss is
cðs; dÞ ¼ Bayes risk of d w:r:t: s.
Result 1 For any d ∈ D,

Sup_{θ∈Θ} R_d(θ) = Sup_{τ∈𝒯} r(τ, d), where 𝒯 = the set of all possible priors τ.

Proof

R_d(θ) ≤ Sup_{θ∈Θ} R_d(θ) ∀θ

⇒ r(τ, d) ≤ Sup_{θ∈Θ} R_d(θ) ∀τ        (7.35)

⇒ Sup_τ r(τ, d) ≤ Sup_{θ∈Θ} R_d(θ)

Consider a prior τ₀ which chooses a particular value θ with probability 1.

Then r(τ₀, d) = R_d(θ).

Hence, Sup_τ r(τ, d) ≥ r(τ₀, d) = R_d(θ) ∀θ.

Thus R_d(θ) ≤ Sup_τ r(τ, d) ∀θ

⇒ Sup_{θ∈Θ} R_d(θ) ≤ Sup_τ r(τ, d)        (7.36)

(7.35), (7.36) ⇒ Sup_{θ∈Θ} R_d(θ) = Sup_τ r(τ, d), hence the proof. □
h2H s2H
A rule d₀ is minimax if it minimizes

Sup_{θ∈Θ} R_d(θ) w.r.t. d ∈ D,

or, Sup_τ r(τ, d) w.r.t. d ∈ D [by Result 1],

i.e. if Sup_τ r(τ, d₀) = Inf_{d∈D} Sup_τ r(τ, d) = m̄ (say).

m̄ = upper value of the game.

Thus, if the statistician chooses a minimax rule d₀, his expected loss is at most m̄ whatever be the action chosen by nature.
Similarly, a prior τ₀ is said to be a maximin rule for nature, or a least favourable prior for the statistician, if τ₀ maximizes Inf_d r(τ, d) w.r.t. τ, i.e. if

Inf_d r(τ₀, d) = Sup_τ Inf_d r(τ, d) = m̲ (say).

m̲ = lower value of the game.

If nature chooses a least favourable τ₀, then the expected loss (of the statistician) is at least m̲ whatever be the rule the statistician chooses. □
Result 2 m̲ ≤ m̄

Proof

r(τ, d) ≤ Sup_τ r(τ, d) ∀τ, d

⇒ Inf_d r(τ, d) ≤ Inf_d Sup_τ r(τ, d) = m̄ ∀τ

⇒ Sup_τ Inf_d r(τ, d) ≤ m̄

⇒ m̲ ≤ m̄.

The statistical game is said to have a value m if m̲ = m̄ = m. □



Result 3 If the statistical game has a value, and a least favourable prior τ₀ and a minimax rule d₀ exist, then d₀ is Bayes w.r.t. τ₀.

Proof m̲ = Inf_d r(τ₀, d) ≤ r(τ₀, d₀) ≤ Sup_τ r(τ, d₀) = m̄.

If m̲ = m̄, then ‘=’ must hold everywhere, implying Inf_d r(τ₀, d) = r(τ₀, d₀) ⇒ d₀ is Bayes w.r.t. τ₀. □
Minimax theorem Let Θ be finite and the risk set S be bounded below. Then the statistical game has a value and a least favourable prior τ₀ exists.
If, further, S is closed from below, an admissible minimax rule d₀ exists and d₀ is Bayes w.r.t. τ₀.
Thus if Θ is finite and S is bounded below as well as closed from below, then
(i) a minimax rule exists,
(ii) an admissible minimax rule exists, and
(iii) a minimax rule is Bayes (w.r.t. a least favourable prior τ₀).

Result 4 Suppose there exists a rule d₀ such that

(i) R_{d₀}(θ) ≤ c ∀θ
(ii) d₀ is Bayes w.r.t. some τ₀ and r(τ₀, d₀) = c,

then

(a) d₀ is minimax
(b) τ₀ is a least favourable prior.

Proof
(a) Proved earlier.
(b) To show Inf_d r(τ₀, d) ≥ Inf_d r(τ, d) ∀τ        (b)

Now (i) ⇒ r(τ, d₀) ≤ c ∀τ

⇒ Inf_d r(τ, d) ≤ r(τ, d₀) ≤ c = r(τ₀, d₀) = Inf_d r(τ₀, d) ∀τ, by (ii).

This proves (b). □

7.7 Invariance

Many statistical decision problems are invariant w.r.t. some transformations of X. In


such case it seems reasonable to restrict to decision rules, which are also invariant
w.r.t. similar transformations. Such a decision rule is called an invariant decision
rule and in many problems a best rule exists within the class of invariant rules.

Example 7.30 X N ðh; 1Þ; 1\h\1


We are to estimate h under the squared error loss.
Suppose one considers a transformation of X, viz., X 0 ¼ X þ c, c = a given
constant and considers the problem of estimating h0 ¼ h þ c on the basis of
X 0 N ðh0 ; 1Þ under the squared error loss.
For an action ‘a’ for the first problem, there is an action a0 ¼ a þ c for the second
problem and vice versa with Lðh; aÞ ¼ Lðh0 ; a0 Þ. Thus the two problems may be
considered to be equivalent in the sense that ðH; ɶ, L) ðH0 ; ɶ′, L′).
Now let d ¼ d ð X Þ ¼ a reasonable estimator of h on the basis of X. Then
d ðX 0 Þ ¼ d ðX þ cÞ should be a reasonable estimator of h0 on the basis of X 0 . Also, if
d ð xÞ ¼ a reasonable estimate of h on the basis of X ¼ x then d ð xÞ þ c should be a
reasonable estimate for h0 . The two estimates are identical if

d ðx þ cÞ ¼ d ð xÞ þ c ð7:37Þ

An estimator d ð X Þ is said to be a location invariant or an equivariant if (7.37)


holds 8x8c.
d ð X Þ is an equivariant estimator iff d ð X Þ ¼ X þ K ¼ dK ð X Þ (say) for some
constant K.
[If d ð X Þ ¼ X þ K, then (7.37) is satisfied 8x8c. Let (7.37) be satisfied 8x8c. For
c ¼ x, ðiÞ ) d0 ¼ dðxÞ  x; or d ð xÞ ¼ x þ K, K ¼ dð0Þ Rdk ðhÞ ¼
E ðX þ K  hÞ2 ¼ 1 þ K 2 which is minimum when K = 0. Thus
Rd0 ðhÞ  Rdk ðhÞ8h8K.
) d0 ð X Þ ¼ X is the best within the class of equivariant estimators.]

Invariant statistical decision problems


ðH; ɶ, L) X = a r.v. and x = observed value of X 2  x (=sample space)
Ph = A probability distribution over  x depending on h.
P ¼ fPh =h 2 Hg = family of probability distribution.
A statistical decision problem  ðH; a; LÞ and P,
Groups of transformation of X(or  x)
Y ¼ gðXÞ = a transformation of X
gðxÞ ¼ a single valued function of x.
g: x! x , g = a transformation on  x
We assume that g is measurable so that g(X) is an r.v. g is said to be an onto transformation if the range of g is the whole of 𝔛, i.e. g(𝔛) = 𝔛.
g is said to be 1:1 if g(x₁) = g(x₂) ⇒ x₁ = x₂.
Example 7.31  x ¼ R1 ; gðxÞ ¼ x þ c; c ¼ a real constant. This g is 1:1 and onto.
The identity transformation e is defined as eðxÞ ¼ x. Let g1 ; g2 be two trans-
formations on  x. Then the composition of g2 ; g1 , denoted by g2 g1 is defined as
g2 g1 ðxÞ ¼ g2 ½g1 ðxÞ.
Example 7.32  x ¼ R1
g₁(x) = x + c₁ and g₂(x) = x + c₂, where c₁, c₂ are real constants. Then g₁g₂(x) = x + c₁ + c₂.

Clearly, g1 g2 g3 ¼ g1 ðg2 g3 Þ ¼ ðg1 g2 Þg3


Also ge ¼ eg ¼ g
If g is a transformation on  x, then the inverse transformation of g, denoted by
g1 , is the transformation g such that
gg1 ¼ g1 g ¼ e.
In the example, g1 1 ðxÞ ¼ x  c1 .

Note g1 exists iff g is 1:1 and onto.


Let G = a class of transformation on 
x
Definition G is called a group of transformations if G is closed under the com-
positions and inverses, i.e. if

i. g1 ; g2 2 G ) g2 g1 2 G
ii. g 2 G ) g1 2 G.

Note Let G be a group of transformations, then every g 2 G is 1:1 and onto (since
g1 exists).
Also, the identity transformation e always 2 G [if g 2 G, then g1 2 G,
e ¼ g1 g 2 G].
Example 7.33  x ¼ R1
gc ðxÞ ¼ x þ c; c ¼ a real constant.
Let G ¼ fgc =  1\c\1g

gc1 ; gc2 2 G ) gc1 gc2 2 G½Asgc1 gc2 ðxÞ ¼ x þ c1 þ c2 ; c1 þ c2 ¼ c

 1 
gc 2 G ) g1
c 2 G Asgc ðxÞ ¼ x þ ðcÞ

Hence, G is a group of transformation which is Additive or Location group.


Example 7.34 
x ¼ R1 , gc ðxÞ ¼ cx where c = a positive real constant

gc1 gc2 ðxÞ ¼ c1 c2 x

1
g1
c ðxÞ ¼ x
c

Let G ¼ fge =0\c\1g

gc 1 ; gc 2 2 G ) gc 1 gc 2 2 G

gc 2 G ) g1
c 2G

Thus G is a group of transformations.


These are multiplicative or group under scale transformation.
Example 7.35 
x ¼ R1 , ga;b ¼ a þ bx
 
G ¼ ga;b =  1\a\1; 0\b\1

G is a group transformation.
It is a group under both location and scale transformation.
Example 7.36 x ¼ f0; 1; 2. . .ng
Let gðxÞ ¼ n  x

G ¼ fe; gg

eg ¼ g 2 G; g1 ðxÞ ¼ x ¼ eðxÞ 2 G

Also e1 2 G [Trivially]


Hence, G is a group of transformation.
Example 7.37 x ¼ ðx1 ; x2 ; x3 ; . . .. . .xn Þ
x0 = The set of possible values of xi


x¼ x0 x. . .. . .. . .:x
x0 x x0

Let i ¼ ði1 ; i2 ; i3 ; . . .. . .in Þ be a permutation of 1; 2. . .n


Let gi ð xÞ ¼ ðxi1 ; xi2 . . .. . .xin Þ

G ¼ fgi =i 2 the set of all possible permutation of ð1; 2. . .nÞg

G is a group of transformations. It is a permutation group.


The invariance of a statistical decision problem is considered to be w.r.t a given
group transformations G on  x:

Invariance of P Let G = a given group of transformations on 


x:
Definition P ¼ fPh =h 2 Hg is said to be invariant w.r.t G if for any g 2 G and any
h 2 H ði:e:; any Ph 2 PÞ there exists a unique h0 2 H (i.e. a unique Ph0 2 P) such
that probability distribution of y ¼ gð xÞ is Ph0 when the probability distribution of
X is Ph .
This unique h0 determined by g and h is denoted by gðhÞ.
Example 7.38 X Nðh; 1Þ; 1\h\1

P ¼ fNðh; 1Þ=  1\h\1g

Let G ¼ fgc =  1\c\1g where gc ðxÞ ¼ x þ c


If X Nðh; 1Þ then Y ¼ gc ðxÞ Nðh þ c ¼ h0 ; 1Þ

h0 is uniquely determined by c and h:


Thus P is invariant under G with

gc ðhÞ ¼ h þ c:
Example 7.39 X expðhÞ; 0\h\1
P.d.f of X under h is 1h eh ; x [ 0
x

P ¼ fexpðhÞ=0\h\1g

Let G ¼ fgc =0\c\1g where gc ðxÞ ¼ cx


If X expðhÞ, then gc ðxÞ expðchÞ, i.e. ch ¼ h0 : h0 is uniquely determined by
c and h. Thus P is invariant under G with gc ðhÞ ¼ ch:
Example 7.40 Let X Binðn; hÞ, n known, 0\h\1

P ¼ fBinðn; hÞ=0\h\1g

Let G = a group of transformations on  x ¼ fe; gg where gðxÞ ¼ n  x


If X Binðn; hÞ then eðxÞ Binðn; h ¼ h0 Þ and gðxÞ Binðn; 1  h ¼ h0 Þ
h0 is uniquely determined by h and member of G. Thus P is invariant under
G with eðhÞ ¼ h; 
gðhÞ ¼ 1  h:

Invariance of loss function

Let G = a group of transformations on 


x
Let P be invariant w.r.t G with induced group of transformations on H as
 ¼ f
G g=g 2 Gg:
Definition The loss function L is said to be invariant w.r.t G if for each g 2 G and
each a 2 , there exists a unique a0 2  such that

Lðh; aÞ ¼ LðgðhÞ; a0 Þ 8h 2 H:

This unique a0 determined by g and ‘a’ is denoted by gðaÞ:


Example 7.41 X Nðh; 1Þ; 1\h\1

G ¼ fgc =  1\c\1g; gc ¼ x þ c

P is invariant w.r.t
G with G  ¼ fgc =  1\c\1g; gc ðhÞ ¼ c þ h:
To estimate h under Lðh; aÞ ¼ ðh  aÞ2
For any gc 2 G; a 2 , there is an a0 ¼ a þ c 2 
such that Lðh; aÞ ¼ Lðgc ðhÞ; a0 Þ 8h 2 X:
a0 is uniquely determined by a and c. Hence the loss function is invariant w.r.t G.

Example 7.42 X expðhÞ; 0\h\1

G ¼ fgc =0\c\1g; gc ðxÞ ¼ cx

P is invariant w.r.t. G with


G ¼ fgc =0\c\1g, gc ðhÞ ¼ ch:
 2
To estimate h with Lðh; aÞ ¼ 1  ah
For a0 ¼ ca, Lðh; aÞ ¼ Lðgc ðhÞ; a0 Þ 8h 2 X:
This a0 is uniquely determined by a and c. Hence the loss function is invariant w.
r.t. G.
Example 7.43 X Binðn; hÞ, 0\h\1

G ¼ fe; gg; eðxÞ ¼ x; gðxÞ ¼ n  x

P is invariant w.r.t. G with


 ¼ fe; 
G gg, eðhÞ ¼ h, gðhÞ ¼ 1  h:
To estimate h under squared error loss.
Then Lðh; aÞ ¼ LðeðhÞ; a0 Þ where a0 ¼ a
and Lðh; aÞ ¼ LðgðhÞ; a0 Þ where a0 ¼ 1  a: a0 is uniquely determined by a
member of G. Thus L is invariant w.r.t. G.
Invariance of a statistical decision problem:
A statistical decision problem  ðH; ; LÞ and P
G = A group of transformation of  x.
Definition A Statistical decision problem is said to be invariant under G if
(i) P is invariant under G
and (ii) L is invariant under G.
Thus as already shown
i. X Nðh; 1Þ to estimate h under squared error loss
G ¼ fge =  1\c\1g; ge ðxÞ ¼ x þ c

the problem is invariant under G.

ii: X expðhÞ; 0\h\1


 2
To estimate h under Lðh; aÞ ¼ 1  ah ; ge ðxÞ ¼ cx the problem is invariant
under G.
iii. X Binðn; hÞ; n is known, 0\h\1
To estimate h under squared error loss with G ¼ fe; gg
e(x) = x, g(x) = n − x, the problem is invariant under G.

Example 7.44 X Nðl; r2 Þ; 1\l\1; r2 [ 0


To test H0 : l  0 against H1 : l [ 0, i.e. h 2 H0 against h 2 H1

H0 ¼ fh ¼ ðl; r2 Þ=l  0g

H1 ¼ fh ¼ ðl; r2 Þ=l [ 0g; H ¼ H0 þ H1

Let G ¼ fge =0\c\1g; gc ðxÞ ¼ cx


= A group of transformation on 
x

X Nðl; r2 Þ

) gc ðxÞ Nðc l; c2 r2 Þ 2 P

P is invariant under G with gc ðhÞ ðc l; c2 r2 Þ h 2 Hi ,


Note ge ðhÞ 2 Hi ; i ¼ 0; 1; 2; . . .. . .:ðiÞ
i.e. both P0 and P1 are invariant under G
where Pi ¼ fPh =h 2 Hi g; i ¼ 0; 1:
Also, Lðh; aÞ ¼ Lðgc ðhÞ; a0i Þ; i ¼ 0; 18h 2 H by (i)
⇒ Loss is invariant under G
Note To test H0 : h 2 H0 against H1 : h 2 H1 ; H0 ; H1 , disjoint, H0 þ H1 ¼ H

a ¼ fa0 ; a1 g; ai ¼ accept H i :

a0 a1
h 2 H0 0 L0
h 2 H1 L1 0

Let the loss function be 0–Li


Let G = a group of transformation on 
x

P ¼ fPh =h 2 Hg; Pi ¼ fPh =h 2 Hi g

Let both P0 and P1 be invariant under G, then P is invariant under G.


Also, h 2 Hi , gðhÞ 2 Hi ; i ¼ 0; 1
Hence, Lðh; ai Þ ¼ LðgðhÞ; ai Þ; i ¼ 0; 18h 2 H
L is invariant under G
A test of hypothesis problem (with 0–Li loss) is said to be invariant under G if
both P0 and P1 are invariant under G.

Invariant decision rule

Let G = a group of transformation on x. The problem is invariant under G with


corresponding group of induced transformations on H and a.

Let g 2 G

Let dðXÞ ¼ a be reasonable n.r. rule for the original problem. dðgðxÞÞ should be
a reasonable rule for the transformed problem. Also if for X = x, dðxÞ 2  is a
reasonable action for the original problem, then for gðXÞ ¼ gðxÞ; ~gðdðXÞÞ should be
a reasonable action in the transformed problem.
These two agree if dðgðxÞÞ ¼ ~gðdðxÞÞ. . .. . .:ðiiÞ
A non-randomized rule is said to be an invariant non-randomized rule if
(ii) holds 8x 2 x8g 2 G.
We thus get a class of n.r. decision rules as
DI = the class of invariant n.r. rules.
Appendix

A.1 Exact Tests Related to Binomial Distribution

A.1.1 We have an infinite population for which π = unknown proportion of indi-


viduals having certain character, say A. We are to test H0 : p ¼ p0 .
For doing this we draw a sample of size n. Suppose x = no. of individuals in the
sample have character A. The sufficient statistic x is used for testing H0 : p ¼ p0 .
Suppose x₀ is the observed value of x. Then x ~ bin(n, π).

(a) H₁: π > π₀; ω₀: P[x ≥ x₀ | H₀] ≤ α, i.e., Σ_{x ≥ x₀} C(n, x) π₀ˣ (1 − π₀)ⁿ⁻ˣ ≤ α

(b) H₂: π < π₀; ω₀: P[x ≤ x₀ | H₀] ≤ α, i.e., Σ_{x ≤ x₀} C(n, x) π₀ˣ (1 − π₀)ⁿ⁻ˣ ≤ α

(c) H₃: π ≠ π₀, where π₀ = 1/2 may be of our interest.

ω₀: P[|x − n/2| ≥ d₀ | H₀] ≤ α

i.e., P[x ≥ n/2 + d₀ | H₀] + P[x ≤ n/2 − d₀ | H₀] ≤ α

i.e., Σ_{x ≥ n/2 + d₀} C(n, x)(1/2)ⁿ + Σ_{x ≤ n/2 − d₀} C(n, x)(1/2)ⁿ ≤ α, where d₀ = |x₀ − n/2|.

Note
(1) For other values of π0 the exact test cannot be obtained as binomial distri-
bution is symmetric only when p ¼ 12.
(2) For some selected n and π the binomial probability sums considered above are
given in Table 37 of Biometrika (Vol. 1)
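For readers who wish to compute these exact binomial tail probabilities without recourse to tables, the following Python sketch (using scipy.stats.binom; the values of n, x₀, π₀ and α are purely illustrative, not taken from the text) carries out the one-sided test of (a):

# Exact one-sided binomial test of H0: pi = pi0 against H1: pi > pi0
# (a sketch; n, x0, pi0 and alpha are illustrative values only)
from scipy.stats import binom

n, x0, pi0, alpha = 20, 15, 0.4, 0.05

# P[x >= x0 | H0] = sum_{x >= x0} C(n, x) pi0^x (1 - pi0)^(n - x)
p_value = binom.sf(x0 - 1, n, pi0)   # survival function gives P[X > x0 - 1]

print(f"exact p-value = {p_value:.4f}")
if p_value <= alpha:
    print("reject H0 at level", alpha)
else:
    print("do not reject H0")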


A.1.2 Suppose we have two infinite populations with π1 and π2 as the unknown
proportion of individuals having character A. We are to test H0 : p1 ¼ p2 .
To do this we draw two samples from two populations having sizes n1 and n2.
Suppose x1 and x2 as the random variables denoting the no. of individuals in the 1st
and 2nd samples with character A.
To test H0 : p1 ¼ p2 we make use of the statistics x1 and x2 such that x1 þ x2 ¼ x
(constant), say.
Under H0 : p1 ¼ p2 ¼ p (say),
!
n1
f ðx1 Þ ¼ p:m:f: of x1 ¼ px1 ð1  pÞn1 x1
x1
!
n2
f ðx2 Þ ¼ p:m:f: of x2 ¼ px2 ð1  pÞn2 x2
x2
 
n1 þ n2
f ðxÞ ¼ p:m:f: of x ¼ px ð1  pÞn1 þ n2 x :
x

The conditional distribution of x1 given x has p.m.f.


! !
n1 n2
x1 x2
f ðx1 =xÞ ¼   , which is hypergeometric and independent of p.
n1 þ n2
x
Suppose the observed values of x1 and x are x10 and x0 respectively.

(a) H1 : p1 [ p2 ; x0 : P½x1  x10 =x ¼ x0   a


! 
n1 n2
X x1 x0  x1
i:e:;   a
x1  x10 n1 þ n2
x0

(b) H2 : p1 \p2 ; x0 : P½x1  x10 =x ¼ x0   a


! 
n1 n2
X x1 x0  x1
i:e:;   a
x1  x10 n1 þ n2
x0

(c) H3 : p1 6¼ p2 ; exact test is not available.


Note The above probabilities can be obtained from the tables of the hypergeometric distribution (Stanford University Press).
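The conditional test above can also be computed directly from the hypergeometric distribution; a minimal Python sketch (with purely illustrative sample sizes and counts) is given below:

# One-sided exact test of H0: pi1 = pi2 against H1: pi1 > pi2,
# conditioning on x = x1 + x2 (hypergeometric distribution).
# n1, n2, x10, x20 are illustrative values only.
from scipy.stats import hypergeom

n1, n2 = 12, 15          # sample sizes
x10, x20 = 9, 4          # observed numbers with character A
x0 = x10 + x20           # conditioning total

# Under H0, x1 | x = x0 follows Hypergeometric(N = n1 + n2, K = n1, draws = x0)
p_value = hypergeom.sf(x10 - 1, n1 + n2, n1, x0)   # P[x1 >= x10 | x = x0]
print(f"one-sided exact p-value = {p_value:.4f}")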

A.2 Exact Tests Related to Poisson Distribution

A.2.1 Suppose we have a Poisson population with unknown parameter k. We draw


a random sample ðx1 ; x2 ; . . .; xn Þ of size n from this population. Here, we are to test
H 0 : k ¼ k0 . P
To develop a test we make use of the sufficient statistic y ¼ ni¼1 xi ; which is
itself distributed as Poisson with parameter nk. The p.m.f. of y under H0 is therefore
y
f ðyÞ ¼ enk0 ðnky!0 Þ ; y ¼ 0; 1; 2. . .
Suppose y0 is the observed value of y.
(a) H1 : k [ k0 ; x0 : P½y  y0 =k ¼ k0   a

X ðnk0 Þy
i:e:; enk0  a:
y  y0
y!

(b) H2 : k\k0 ; x0 : P½y  y0 =k ¼ k0   a

X ðnk0 Þy
i:e:; enk0  a:
y  y0
y!

(c) H3 : k 6¼ k0 : exact test is not available.


Note These probabilities may be obtained from Table 7 of Biometrika (Vol. 1)
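A minimal computational sketch of the one-sided test in (a), with illustrative values of n, y₀ and λ₀, is given below:

# Exact test of H0: lambda = lambda0 against H1: lambda > lambda0,
# based on y = sum(x_i) ~ Poisson(n * lambda0) under H0.
# n, y0 and lambda0 are illustrative values only.
from scipy.stats import poisson

n, y0, lam0, alpha = 10, 28, 2.0, 0.05
p_value = poisson.sf(y0 - 1, n * lam0)   # P[y >= y0 | H0]
print(f"exact p-value = {p_value:.4f}",
      "-> reject H0" if p_value <= alpha else "-> do not reject H0")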
A.2.2 Suppose we have two populations Pðk1 Þ and Pðk2 Þ. We draw a random
sample ðx11 ; x12 ; . . .; x1n1 Þ of size n1 from Pðk1 Þ and another random sample
ðx21 ; x22 ; . . .; x2n2 Þ of size
Pn12 from Pðk2 Þ. We are to Ptest H0 : k1 ¼ k2 ¼ k (say).
Here we note that y1 ¼ ni¼1 x1i  Pðn1 k1 Þ and y2 ¼ ni¼1
2
x2i  Pðn2 k2 Þ.
To develop a test we shall make use of the sufficient statisticsy1 and y2 but shall
concentrate only on those for which y ¼ y1 þ y2 = constant. Under H0 the p.m.f. of
y1 ; y2 and y are

ðn1 kÞy1 ðn2 kÞy2 fðn1 þ n2 Þkgy


f ðy1 Þ ¼ en1 k ; f ðy2 Þ ¼ en2 k and f ðyÞ ¼ eðn1 þ n2 Þk
y1 ! y2 ! y!

The conditional distribution of y1 given y has the p.m.f. as


yy1 y1
en2 k ðnðyy
2 kÞ
1 Þ!
en1 k ðn1ykÞ
1!
f ðy1 =yÞ ¼ fðn1 þ n2 Þkgy
eðn1 þ n2 Þk y!
y! ny11 ny22
¼
y1 !y2 ! ðn1 þ n2 Þy
!  y1  y2  
y n1 n1 n1
¼ 1  bin y; free of k:
y1 n1 þ n2 n1 þ n2 n1 þ n2

So this may be regarded as sufficient statistic. Suppose the observed values of y1


and y are y10 and y0 respectively. We consider the conditional p.m.f. f ðy1 =y0 Þ for
testing H0 .

(a) H1 : k1 [ k2 ; x0 : P½y1  y10 =y ¼ y0   a

! y1  y0 y1


X y0 n1 n2
i:e:; a
y1  y10 y1 n1 þ n2 n1 þ n2

(b) H2 : k1 \k2 ; x0 : P½y1  y10 =y ¼ y0   a

! y1  y0 y1


X y0 n1 n2
i:e:;  a:
y1  y10 y1 n1 þ n2 n1 þ n2

(c) H₃: λ₁ ≠ λ₂: exact test is not available.
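Since, under H₀, the conditional law of y₁ given y = y₀ is bin(y₀, n₁/(n₁ + n₂)), the one-sided test in (a) is easy to compute; the following sketch uses illustrative counts only:

# Exact conditional test of H0: lambda1 = lambda2 against H1: lambda1 > lambda2.
# Under H0, y1 | (y1 + y2 = y0) ~ bin(y0, n1 / (n1 + n2)).
# n1, n2, y10, y20 are illustrative values only.
from scipy.stats import binom

n1, n2 = 8, 10
y10, y20 = 25, 14
y0 = y10 + y20

p_value = binom.sf(y10 - 1, y0, n1 / (n1 + n2))   # P[y1 >= y10 | y = y0]
print(f"exact conditional p-value = {p_value:.4f}")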

A.3 A Test for Independence of Two Attributes

In many investigations one is faced with the problem of judging whether two
qualitative characters, say A and B, may be said to be independent. Let us denote the
forms of A by Ai fi ¼ 1ð1Þkg and the forms of B by Bj fj ¼ 1ð1Þlg, and the prob-
ability associated with the cell Ai Bj in the two-way classification of the population
P
by pij . The probability associated with Ai is then pi0 ¼ j pij and that associated
P
with Bj is p0j ¼ i pij . We show the concerned distribution in the following table:

A B Total
B1 B2 …. Bj …. Bl
A1 p11 p12 …. pij …. p1l p10
A2 p21 p22 …. p2j …. p2l p20
. . . . . .
. . . . . .
Ai pi1 pi2 …. pij …. pil pi0
. . . . . .
. . . . . .
Ak pk1 pk2 …. pkj …. pkl pk0
Total p01 p02 …. p0j …. p0l 1

 
where pij ¼ P A ¼ Ai ; B ¼ Bj  8ði; jÞ
 
pi0 ¼ PðA ¼ Ai Þ and p0j ¼ P B ¼ Bj

We are to test H 0 : A and B are independent , H 0 : pij ¼ pi0  p0j 8ði; jÞ


To do this we draw a random sample of size n. Let nij P = observed frequency for
the cell Ai Bj . The marginal frequency of Ai is ni0 ¼ j nij and that of Bj is
P
n0j ¼ i nij . Note that the joint p.m.f. of nij is multinomial, i.e.
  YY
i ¼ 1ð1Þk i ¼ 1ð1Þk n!
f nij ; pij ; ¼Q Q ðpij Þnij :
j ¼ 1ð1Þl j ¼ 1ð1Þl i j ðn ij Þ! i j

Under H 0 : pij ¼ pio  poj 8ði; jÞ


  Y Y
i ¼ 1ð1Þk n!
f nij ; ¼Q Q ðpio Þnio ðpoj Þnoj
j ¼ 1ð1Þl i j ðn ij Þ! i j
n! Y
f ðni0 Þ ¼ Q ðpi0 Þni0 8i ¼ 1ð1Þk
i ðn i0 Þ! i
n! Y
f ðn0j Þ ¼ Q ðp0j Þn0j 8j ¼ 1ð1Þl
j ðn 0j Þ! j

The conditional Q distribution


Q of nij keeping marginals fixed is, under
f ðnij Þ ðni0 Þ! ðn0j Þ!
H 0 ; f ðni0 Þf ðn0j Þ ¼ n!i Q Q ðnj Þ!
i j ij

This may be used for testing H 0 . Keeping marginal frequencies fixed we change
the cell-frequencies and calculate the corresponding probabilities. If the sum of the
probabilities  a, then we reject H 0 .
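For a 2 × 2 table this conditional test reduces to Fisher's exact test, which is available directly in scipy; the table entries below are illustrative only:

# Conditional (exact) test of independence for a 2 x 2 table,
# i.e. the k = l = 2 case of the test sketched above.
# The observed frequencies are illustrative only.
from scipy.stats import fisher_exact

table = [[8, 2],
         [1, 9]]
oddsratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"Fisher exact p-value = {p_value:.4f}")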

A.4 Problems Related to Univariate


Normal Distribution

Suppose we have a normal population with mean l and standard deviation r. We


draw a random sample ðx1 ; x2 ; . . .; xn Þ of size n from this population. Here x ¼
Pn P P
1
n 1 xi ; s ¼ n
2 1
xÞ2 and s02 ¼ n 1 1 i ðxi  xÞ2 .
i ðxi  
A.4.1 To test H 0 : l ¼ l0 .
pffiffi
Case I r known: we note that nðxlÞ r  Nð0; 1Þ
pffiffi
Under H 0 ; s ¼ nðxr l0 Þ  Nð0; 1Þ:

H 1 : l [ l0 ; x0 : s [ sa
H 2 : l\l0 ; x0 : s\sa
H 3 : l 6¼ l0 ; x0 : jsj [ sa=2

100ð1  aÞ% confidence interval for l (when H 0 is rejected) is x prffiffin sa=2
pffiffi
Case II r unknown: Here we estimate r by s’ and nðxs0 lÞ  tn1 .
pffiffi
Under H 0 t ¼ nðxs0 l0 Þ  tn1 :

H 1 : l [ l0 ; x0 : t [ ta;n1
H 2 : l\l0 ; x0 : t\ta;n1
H 3 : l 6¼ l0 ; x0 : jtj [ ta=2;n1
0

100ð1  aÞ% confidence interval for l is x ps ffiffin ta=2 ; n  1
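The Case II test (σ unknown) and the corresponding confidence interval can be computed as follows; the data vector and μ₀ are illustrative only:

# One-sample t-test of H0: mu = mu0 (sigma unknown), Case II above.
# The data and mu0 are illustrative values only.
import numpy as np
from scipy import stats

x = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.3, 9.7])
mu0 = 10.0

t_stat, p_two_sided = stats.ttest_1samp(x, mu0)      # two-sided p-value
n = len(x)
ci = stats.t.interval(0.95, df=n - 1, loc=x.mean(), scale=stats.sem(x))

print(f"t = {t_stat:.3f}, two-sided p = {p_two_sided:.4f}")
print("95% confidence interval for mu:", ci)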
A.4.2 To test H 0 : r ¼ r0P
: P
ðxi  lÞ2 ðxi  lÞ2
Case I l known: we know r2  v2n , under H 0 ; v2 ¼ r20
 v2n .

H 1 : r [ r0 ; x0 : v2 [ v2a;n
H 2 : r\r0 ; x0 : v2 \v21a;n
H 3 : r 6¼ r0 ; x0 : v2 [ v2a=2;n

or,

v2 \v21a=2;n
" P #
ðxi  lÞ2
P v1a=2;n \
2
\va=2;n ¼ 1  a
2
r2
"P P #
ðxi  lÞ2 ðxi  lÞ2
i:e:; P \r \2
¼1a
v2a=2;n v21a=2;n

P)100ð1  aÞ% confidence


P
interval for r2 when l is known is
ðxi lÞ2 ðxi lÞ2
v 2 ; v2
.
a=2;n 1a=2;n
P P
ðxi  xÞ2 ðxi xÞ2
Case II l unknown: we know r2  v2n1 under H 0 ; v2 ¼ r20
 v2n1

H 1 : r [ r0 ; x0 : v2 [ v2a;n1
H 2 : r\r0 ; x0 : v2 \v21a;n1
H 3 : r 6¼ r0 ; x0 : v2 [ v2a=2;n1 :

or,

v2 [ v21a=2;n1
" P #
ðxi  xÞ2
P v1a=2;n1 \
2
\va=2;n1 ¼ 1  a
2
r2
"P P #
ðxi  xÞ2 ðxi  xÞ2
i:e., P \r \ 2
2
¼1a
v2a=2;n1 v1a=2;n1

P) 100ð1 P
 aÞ% confidence interval for r2 when l is unknown is
ðxi xÞ2 ðxi xÞ2
v2
; v2 .
a=2;n1 1a=2;n1
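A computational sketch of the Case II test for σ (μ unknown), with illustrative data and σ₀:

# Chi-square test of H0: sigma = sigma0 (mu unknown), Case II above.
# Data and sigma0 are illustrative only.
import numpy as np
from scipy.stats import chi2

x = np.array([4.8, 5.3, 5.1, 4.6, 5.4, 5.0, 4.9, 5.2])
sigma0 = 0.2
n = len(x)

chi2_stat = (n - 1) * x.var(ddof=1) / sigma0**2   # sum (x_i - xbar)^2 / sigma0^2
p_upper = chi2.sf(chi2_stat, df=n - 1)            # for H1: sigma > sigma0

# 95% confidence interval for sigma^2 (mu unknown)
ss = (n - 1) * x.var(ddof=1)
ci = (ss / chi2.ppf(0.975, n - 1), ss / chi2.ppf(0.025, n - 1))
print(f"chi2 = {chi2_stat:.3f}, p (H1: sigma > sigma0) = {p_upper:.4f}")
print("95% CI for sigma^2:", ci)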

A.5 Problems Relating Two Univariate Normal


Distributions

Suppose we have two independent populations Nðl; r21 Þ and Nðl2 ; r22 Þ. We draw a
random sample ðx11 ; x12 ; . . .; x1n1 Þ of size n1 from the first population and another
random sample ðx21 ; x22 ; . . .; x2n2 Þ of size n2 from the second population.

Now, we have for the 1st and the 2nd samples

1X n1
1X n2
x1 ¼ x1i and x2 ¼ x2i
n1 i¼1 n2 i¼1
1 X n1
1 X n2
s02
1 ¼ ðx1i  x1 Þ2 s02
2 ¼ ðx2i  x2 Þ2
n1  1 i¼1 n2  1 i¼1

respectively.
(I) H 0 : 11 l1 þ 12 l2 ¼ 13 :
Case I r1 ; r2 known:
11x1 þ 12x2  ð11 l1 þ 12 l2 Þ
We find that sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi  Nð0; 1Þ
121 r21 122 r22
þ
n1 n2
11x1 þ 12x2  13
Under H 0 ; s ¼ sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi  Nð0; 1Þ
121 r21 122 r22
þ
n1 n2

) H 1 : 11 l1 þ 12 l2 [ 13 ; x0 : s [ sa
H 2 : 11 l1 þ 12 l2 \13 ; x0 : s\sa
H 3 : 11 l1 þ 12 l2 6¼ 13 ; x0 : jsj [ sa=2

Also, ð1  aÞ100 % confidence interval for ð11 l1 þ 12 l2 Þ is


2 sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 3
1 2 r2 1 2 r2
411x1 þ 12x2 1 1
þ 2 2 sa=2 5
n1 n2

Case II r1 ; r2 unknown:
Fisher’s t-test: We assume r1 ¼ r2 ¼ r, say.
ðn1 1Þs02 02
1 þ ðn2 1Þs2
r2 is estimated by ðn1 þ n2 2Þ ¼ s02 say

11x1 þ 12x2  ð11 l1 þ 12 l2 Þ


Also, sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi  tn1 þ n2  2
2 2
1 1
s0 1
þ 2
n1 n2
11x1 þ 12x2  13
Under H 0 ; t ¼ sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi  tn1 þ n2 2
121 122
s0 þ
n1 n2
This t is known as Fisher’s t when 11 ¼ 1, 12 ¼ 1.

H 1 : 11 l1 þ 12 l2 [ 13 ; x0 : t [ ta;n1 þ n2 2
H 2 : 11 l1 þ 12 l2 \13 ; x0 : t\  ta;n1 þ n2 2
H 3 : 11 l1 þ 12 l2 6¼ 13 ; x0 : jtj [ ta=2 ;n1 þ n2 2

Also 100(1 − α)% confidence interval for 11 l1 þ 12 l2 is


0 sffiffiffiffiffiffiffiffiffiffiffiffiffiffi 1
1 2 1 2
@11x1 þ 12x2 s0 1
þ 2 ta=2;n1 þ n2 2 A:
n1 n2

Note–I The above procedure may also be applicable when r1 and r2 are not
 r2 
equal provided 1  r12 \ 0.4—theoretical investigation in this area verifies this.
2
Note–II when homoscedasticity assumption r1 ¼ r2 is not tenable then we
require the alternative procedure and the corresponding problem is known as the
Fisher-Behren problem.
Note–III For ℓ₁ = 1 and ℓ₂ = −1 we get the test procedure for the difference between the two means. Also, for testing the ratio of the means, i.e. for testing H₀: μ₁/μ₂ = k, say, we start with (x̄₁ − k x̄₂).
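For ℓ₁ = 1, ℓ₂ = −1, ℓ₃ = 0 and σ₁ = σ₂, Fisher's t is the usual pooled two-sample t statistic; a sketch with illustrative data:

# Two-sample (pooled-variance) t-test of H0: mu1 - mu2 = 0,
# i.e. Fisher's t with l1 = 1, l2 = -1, l3 = 0.  Data are illustrative only.
import numpy as np
from scipy import stats

x1 = np.array([12.1, 11.8, 12.6, 12.3, 11.9, 12.4])
x2 = np.array([11.2, 11.6, 11.0, 11.5, 11.3, 11.7, 11.4])

t_stat, p_two_sided = stats.ttest_ind(x1, x2, equal_var=True)  # assumes sigma1 = sigma2
print(f"t = {t_stat:.3f}, two-sided p = {p_two_sided:.4f}")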
(II) H 0 : rr12 ¼ n0 :
P
1
ðx1i l1 Þ2
Case I l1 ; l2 known: n11 P ðx l Þ2 : r12  F n1 ;n2
n1 2i 2 1
r2
2
P
ðx1i l Þ2 =n1
) Under H 0 ; F ¼ P ðx l1 Þ2 n : n12  F n1 ;n2
2i 2 = 2 0

r1
H1 : [ n0 ; x0 : F [ F a;n1 ;n2
r2
r1
H 2 : \n0 ; x0 : F\F 1a;n1 ;n2
r2
r1
H3 : 6¼ n0 ; x0 : F [ F a=2;n1 ;n2 or; F\F 1a=2;n1 ;n2 :
r2
P 
ðx1i l1 Þ2 =n1 r22
P
Also, P F 1a=2;n1 ;n2 \ ðx l Þ2 n : r2 \F a=2;n1 ;n2 ¼ 1  a
2i 2 = 2 1

rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
P rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
P 
ðx1i  l1 Þ2 ðx1i  l1 Þ2
Or, P P n2 r1
\ r2 \ n P n2
¼1a
n 1 ðx  l Þ2 F
2i 2 a=2;n1 ;n2 ðx  l Þ2 F
1 2i 2 1a=2;n1 ;n2

r1
This provides the 100ð1  aÞ% confidence interval for r2 when l1 ; l2 are
known.
Case II l1 ; l2 unknown:
P
1
ðx1i x1 Þ2
We have n111 P ðx x Þ2 : r12  F n1 1;n2 1
n2 1 2i 2 1
r2
2

s02
1 r2
2
i:e:; :  F n1 1;n2 1
s02
2 r1
2

s02
under H 0 ; F ¼ s102 : e12  F n1 1;n2 1
2 o

r1
) H1 : [ n0 ; x0 : F [ F a;n1 1;n2 1
r2
r1
H2 : \n0 ; x0 : F\F 1a;n1 1;n2 1
r2
r1
H3 : 6¼ n0 ; x0 : F [ F a=2;n1 1;n2 1 or F\F 1a=2;n1 1;n2 1 :
r2

2 3
6 s02 7
Also, P4F 1a=2;n1 1;n2 1 \ s102 : 1 2 \F a=2;n1 1;n2 1 5 ¼ 1  a
2 r1
r2

rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
s02 r1 s02
Or, P s02 F
1
\ r2 \ s02 F
1
¼1a
2 a=2;n1 1;n2 1 2 1a=2;n1 1;n2 1

i.e., 100ð1  aÞ% confidence interval for rr12 , when l1 ; l2 are unknown, is
2 3
 
s01 0 s01 0
6 s2 s2 7
4qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ffi ; qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi5:
F a=2;n1 1;n2 1 F 1a=2;n1 1;n2 1
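The Case II variance-ratio test (with ξ₀ = 1) may be computed as follows; the data are illustrative only:

# F-test of H0: sigma1 / sigma2 = 1 (mu1, mu2 unknown), Case II above.
# Data are illustrative only.
import numpy as np
from scipy.stats import f

x1 = np.array([3.1, 2.8, 3.5, 3.0, 2.9, 3.3, 3.2])
x2 = np.array([2.7, 2.9, 2.8, 3.0, 2.6, 2.8])
n1, n2 = len(x1), len(x2)

F = x1.var(ddof=1) / x2.var(ddof=1)      # s1'^2 / s2'^2
p_upper = f.sf(F, n1 - 1, n2 - 1)        # for H1: sigma1 > sigma2
print(f"F = {F:.3f}, p (H1: sigma1 > sigma2) = {p_upper:.4f}")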

A.6 Problems Relating to Bivariate Normal


Distributions

Suppose in a given population the variables x and y are distributed in the bivariate
normal form N 2 ðlx; ly ; rx ; ry ; qÞ. Let ðx1 ; y1 Þ; ðx2 ; y2 Þ; . . .; ðxn ; yn Þ be the values of
x and y observed in a sample of size n drawn from this population. We shall
suppose that the n pairs of sample observations are random and independent. We
shall also assume that all the parameters are unknown.
We have for the sample observations
1X 1X 1 X
x¼ xi ; y ¼ yi ; s02
x ¼ ðxi  xÞ2 ;
n i n i n1 i
P
1 X
2 2
i ðxi  xÞ ðyi  yÞ
1
02 2
sy ¼ ðyi  yÞ ; and r xy ¼ n1
n1 i s0x s0y

(1) To test H 0 : q ¼ 0:
pffiffiffiffiffiffi
We know when q ¼ 0; t ¼ rpffiffiffiffiffiffiffi
n2ffi
2
 tn2
1r

H 1 : q [ 0; x0 : t [ ta;n2
H 2 : q\0; x0 : t [  ta;n2
H 3 : q 6¼ 0; x0 : jtj [ ta=2;n2

Note For testing q ¼ q0 ð6¼0Þ, exact test is difficult to get as for q 6¼ 0 the
distribution of r is complicated in nature. But for moderately large n one can use the
large sample test which will be considered later.
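A sketch of the test of H₀: ρ = 0 with illustrative paired data (the p-value returned by scipy's pearsonr is based on the same t_{n−2} statistic):

# Test of H0: rho = 0 using t = r * sqrt(n - 2) / sqrt(1 - r^2).
# Data are illustrative only.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.1, 2.9, 4.2, 5.1, 6.0, 7.2, 8.1])
y = np.array([2.3, 2.8, 3.9, 4.1, 5.5, 5.9, 7.0, 7.8])

r, p_two_sided = stats.pearsonr(x, y)
n = len(x)
t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
print(f"r = {r:.3f}, t = {t_stat:.3f}, two-sided p = {p_two_sided:.4f}")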
(2) H 0 : lx  ly ¼ n0
Define z ¼ x  y ) lz ¼ lx  ly i.e., we are to test H 0 : lz ¼ n0 . Also
pffiffi P
nðzlz Þ
note that s0z  tn1 where s02
z ¼ n1
1
zÞ2 ¼ s02
i ðzi  
02 0
x þ sy  2sxy
P pffiffi
nðzn0 Þ
s0xy ¼ n1
1 2 2
i ðxi xÞ ðyi yÞ . Under H 0 ; t ¼ s0  tn1 :
z

For H 1 : lx ly [ n0 ; x0 : t [ ta;n1

H 2 : lx ly \n0 ; x0 : t\  ta;n1


H 3 : lx ly 6¼ n0 ; x0 : jtj [ ta=2;n1

s0
Also, 100ð1  aÞ% confidence interval for lz ¼ lx  ly is z pzffiffin ta=2;n1
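Since the test reduces to a one-sample t-test on z = x − y, it coincides with the usual paired t-test; a sketch with illustrative data:

# Paired t-test of H0: mu_x - mu_y = 0 based on z_i = x_i - y_i
# for bivariate normal observations.  Data are illustrative only.
import numpy as np
from scipy import stats

x = np.array([68, 72, 65, 70, 74, 69, 71])
y = np.array([66, 70, 66, 68, 71, 67, 70])

t_stat, p_two_sided = stats.ttest_rel(x, y)   # same as a one-sample t-test on z = x - y
print(f"t = {t_stat:.3f}, two-sided p = {p_two_sided:.4f}")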
(3) H 0 : llx ¼ g0 : we write g¼ llx .
y y

To test H 0 : g ¼ g0 , we take z ¼ xgy ) lz ¼ lx  gly ¼ 0.


z ¼ x  gy = a function of g.
s02 02 2 02 0
z ¼ sz þ g sy  2gsxy = a function of g.
pffiffi pffiffi
nðzlz Þ
Now, s0z  tn1 : i:e:; sn0 z  tn1 ð*lz ¼ 0Þ
pffiffi z

Under H 0 ; t ¼ sn0 z0  tn1 where z0 ¼ x  g0y


z0

s02 02 02 0
z0 ¼ sx þ g0 sy  2g0 sxy

So for H 1 : llx [ g0 ; x0 : t [ ta;n1


y

lx
H2 : \g0 ; x0 : t \  ta;n1
ly
l
H 3 : x 6¼ g0 ; x0 : jtj [ ta=2;n1 :
ly

h pffiffi i
Again P ta=2;n1 \ sn0 z \ta=2;n1 ¼ 1  a
z
hpffiffi  i
 nz
i.e., P  s0 \ta=2;n1 ¼ 1  a or P½wðgÞ\0 ¼ 1  a.
 2
z
z2
Solving the equation wðgÞ ¼ n s02  ta=2;n1 ¼ 0 which is a quadratic equation
z
in g, one can get two roots g1 and g2 ð [ g1 Þ. Now if wðgÞ is a convex function and
g1 and g2 are real, then P½g1 \g\g2  ¼ 1  a. If wðgÞ is a concave function, then
P½g\g1 ; g [ g2  ¼ 1  a. But if g1 and g2 be imaginary then from the given sample
100ð1  aÞ%Confidence interval does not exist.
(4) Test for the ratio n ¼ rrxy :
We write u ¼ x þ ny; v ¼ x  ny

) Covðu; vÞ ¼ r2x  n2 r2y ) quv ¼ 0


pffiffiffiffiffiffi
Then, rp
uv ffiffiffiffiffiffiffiffi
n2ffi
2
 tn2
1ruv
P
1
ðui uÞðvi vÞ
where r uv ¼ n su sv = a function of n. We are to test H 0 : rrxy ¼ n0 , i.e.
H 0 : n ¼ n0 .
pffiffiffiffiffiffi
r0uv n2
) under H0, t ¼ p ffiffiffiffiffiffiffiffiffi
02
 tn 2
1ruv
where r 0uv ¼ value of r uv under n ¼ n0 .
For H 1 : n [ n0 ; x0 : t [ ta;n2
H 2 : n\n0 ; x0 : t\ta;n2
H 3 : n 6¼ n0 ; x0 : jtj [ ta=2;n2 :
pffiffiffiffiffiffi

Also, P ta=2;n2 \ rp
uv ffiffiffiffiffiffiffiffi
n2ffi
2
\t a=2;n2 ¼ 1  a
1ruv
r2 ðn2Þ
Solving the equation wðnÞ ¼ uv1r2  t2a=2;n2 ¼ 0, (which is a quadratic in n)
uv
one can get two roots n1 and n2 ð [ n1 Þ. If these roots are real and wðnÞ is a convex
function, then Pðn1 \n\n2 Þ ¼ 1  a. Again if wðnÞ is concave,
Pðn\n1 ; n [ n2 Þ ¼ 1  a. But if n1 and n2 are not real, then 100ð1  aÞ%
Confidence interval does not exist so far as the given sample is concerned.
(5) rx ; ry ; q are known:
H 0 : lx ¼ l0x ; ly ¼ l0y against H1 : H0 is not true. We know that
" 2
     #
1 x  lx x  lx y  ly y  ly 2
Qðx; yÞ ¼ 2q þ  v22
1  q2 rx rx ry ry
!
r2x r2y
ðx; yÞ  N 2 lx ; ly ; ; ; q
n n
"       #
n x  lx 2 x  lx y  ly y  ly 2
) Qðx; yÞ ¼ 2q þ  v22
1  q2 rx rx ry ry

Under H 0 ,
2 ! ! 3
    0 2
n x  l 0 2
x  l 0 y  l 0
y  l
v2 ¼ 4 x
2q x y
þ
y 5  v2
1  q2 rx rx ry ry 2

Hence, the critical region is x0 : v2 [ v2a;2 .

A.7 Problems Relating to k-Univariate Normal


Distributions

Suppose there are k-populations Nðl1 ; r21 Þ; Nðl2 ; r22 Þ; . . .Nðlk ; r2k Þ. We draw a
random sample of size ni from the ith population with ni (  2 for at least one i).
Define
xij ¼ jth observation of ith sample, i = 1,2,...,k; j = 1,2,..., ni
Pi
xi ¼ ith sample mean ¼ n1i nj¼1 xij
02
Pi  2
si ¼ ith sample variance ¼ ni 1 nj¼1
1
xij  xi
(I) We are to test H 0 : l1 ¼ l2 ¼    ¼ lk ð¼lÞ, say against H 1 . There is at least
one inequality in H 0 .
Assumption r1 ¼ r2 ¼    ¼ rk ð¼rÞ say.
Note that xi  N li ; rni
2

pffiffiffi
n ðx l Þ
) i ri i  Nð0; 1Þ 8i and are independent.
ðn 1Þs02
Also, i r2 i  v2ni 1 (xi and s0i are independent.)
Under H 0
X k
ni ðxi  lÞ2
 v2k 
i¼1
r2
these two v2 are independent.
Xk
ðni  1Þs02
and i
 v2nk
i¼1
r2

But the unknown l is estimated by


1X X

l nixi ¼ xðsayÞ; n ¼ ni
n i

Pk P
) Under H 0 ; ni ðxi  xÞ2  r2 v2k1 and ki¼1 ðni  1Þs02
i  r vnk .
2 2
P
i¼1
2
ni ðxi  xÞ =
) Under H 0 ; F ¼ P k  1  F k1;nk .
02
i ðn i  1Þs i =n  k
x0 : F [ F a;k1;nk . If H 0 is rejected, then we may be interested to test H 0 : li ¼
lj against H 1 : li 6¼ lj 8ði; jÞ.

  
  2 1 1
xi  xj  N li  lj ; r þ
ni nj
xi  xj  ðli  lj Þ
) qffiffiffiffiffiffiffiffiffiffiffiffiffi  Nð0; 1Þ
r n1i þ n1j
P
ðni 1Þs02 ðxi xpj Þðli lj Þ
ffiffiffiffiffiffiffiffi
Unknown r2 is estimated by r ^2 ¼ n  k i ¼ s02 , say )  tnk .
s0 ni þ nj
1 1
ðp xi xj Þ
) under H 0 ; t ¼ 0 ffiffiffiffiffiffiffiffi  tnk :
ni þ nj
1 1
s
 
) x0 : jtj [ ta=2;nk . Also, 100ð1  aÞ% confidence interval for li  lj is
n  qffiffiffiffiffiffiffiffiffiffiffiffiffi o
xi xj s0 n1i þ n1j ta=2;nk .
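The F-test of H₀: μ₁ = ⋯ = μ_k above is the usual one-way analysis-of-variance F; a sketch with three illustrative groups:

# One-way ANOVA F-test of H0: mu_1 = ... = mu_k (common unknown sigma).
# The three groups below are illustrative only.
import numpy as np
from scipy import stats

g1 = np.array([5.1, 4.9, 5.4, 5.0, 5.2])
g2 = np.array([5.6, 5.8, 5.5, 5.9, 5.7])
g3 = np.array([5.0, 5.3, 5.1, 5.2, 4.8])

F, p_value = stats.f_oneway(g1, g2, g3)
print(f"F = {F:.3f}, p = {p_value:.4f}")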
(II)Bartlett’s test To test H 0 : r1 ¼ r2 ¼    ¼ rk ð¼ rÞ, say against H 1 : There
is at least one inequality in H 0 .
P
Define ci ¼ ni  1 and c ¼ ki¼1 ci ¼ n  k. Bartlett’s test statistic M is such that
( )
X
k
c s02 X
k
M ¼ c loge i i
 ci loge s02
i¼1
c i¼1
i

Under H 0 M  v2k1 (approximately) provided none of ci is small. For small


P
samples M 0 ¼ n M o  v2k1 under H 0 where c1 ¼ ki¼1 c1  1c and
c i
1 þ 3ðk1Þ
1

x0 : M 0 [ v2a;k1 :
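A computational sketch of Bartlett's test with illustrative groups (scipy's implementation includes the usual small-sample correction factor, so it returns the corrected statistic M′):

# Bartlett's test of H0: sigma_1 = ... = sigma_k (homogeneity of variances).
# Data are illustrative only.
import numpy as np
from scipy import stats

g1 = np.array([5.1, 4.9, 5.4, 5.0, 5.2])
g2 = np.array([5.6, 6.8, 5.5, 6.9, 5.7])
g3 = np.array([5.0, 5.3, 5.1, 5.2, 4.8])

M, p_value = stats.bartlett(g1, g2, g3)   # compared with chi-square on k - 1 d.f.
print(f"M' = {M:.3f}, p = {p_value:.4f}")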

A.8 Test for Regression

Suppose the sample values of x and y are arranged in arrays of y according to the
fixed values of x as given below:

x1 x2 . . . xi ... xk
y1 y21 . . . yi1 ... yk1
y12 y22 . . . yi2 yk2
: : ...
: :
: :
y1n1 y2n1 yini ... yknk

Pni P
Define yi0 ¼ 1
ni j¼1 yij ; y00 ¼ 1n i niyi0 ¼ y

1X X
x¼ ni x i ; n ¼ ni
n i i
P
ni ðy  y00 Þ2
e2yx ¼ P iP i0
i j ð yij  y00 Þ2
qffiffiffiffiffiffi
eyx ¼ þ e2yx ¼ sample correlation ratio:
P P
j ðyij   y00 Þðxi  xÞ
1
i
ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
r ¼ rn n
on P offi
P P
j ðyij  
2 2
1
n i y00 Þ 1
n i ni ð x i  x Þ

 
We assume yij xi  N 1 ðli ; r2 Þ , i.e. Eðyij xi Þ ¼ li: .
(I) Test for regression: H 0 There does not exist any regression of y on x.

, H 0 : l1 ¼ l2 ¼    ¼ lk :
qffiffiffiffiffiffi
Define g2yx ¼ VðEðy=xÞÞ
VðyÞ ; gyx ¼ þ g2yx ¼ population correlation ratio.
) To test H 0 is equivalent to test H 0 : g2yx ¼ 0 against H 1 : g2yx [ 0
We note that
XX XX X
ðyij  y00 Þ2 ¼ ðyij  yi0 Þ2 þ ni ðyi0  y00 Þ2
i j i j i

Under H 0
XX X
SSB ¼ e2yx ðyij  y00 Þ2 ¼ ni ðyi0  y00 Þ2  r2  v2k1
i j i
XX XX
SSw ¼ 1  e2yx ðyij  y00 Þ2 ¼ ðyij  yi0 Þ2  r2 :v2nk
i j i j
h i
=ðk1Þ e2yx B =ðk1Þ
) Under H 0 :F ¼ ð1e2 Þ nk  F k1;nk : F ¼ SS
yx =
SSW =nk

) x0 : F [ F a;k1;nk :

(II) If H 0 is rejected then we may be interested in testing whether the regression


is linear, i.e. we are to test

H 0 : li ¼ a þ bxi 8i
H 1 : li 6¼ a þ bxi

P P P
We note that, e2yx ðyij  y00 Þ2 ¼ i ni ðyi0  y00 Þ2
nP P i j
o2
P P ðy 
y Þ ðx i xÞ
^2 P ni ðxi xÞ2
Pj
ij 00
Also, r 2 i j ðyij  y00 Þ2 ¼ ¼b
i

ni ðxi xÞ2
PP i
ðyij y00 Þðxi xÞ

where b i Pj
2
ni ðxi xÞ
P P
i
P
) e2yx r 2
ðyij  y00 Þ2 ¼ ^ 2 P ni ðxi xÞ2  r2 :v2 under
ni ðyi0  y00 Þ2  b
i j i i k2

H0. P P
P P
Also, e2yx i j ðyij  y00 Þ2 and e2yx  r 2 i y00 Þ2 are independent.
j ðyij  
ðe2yx r2 Þ=ðk2Þ
) under H 0 ; F ¼ ð1e2yx Þ=nk
 F k2;nk

) x0 : F [ F a;k2;nk

A.9 Tests Relating to Simple Linear


Regression Equation

Regression of y on x is established and it is linear, i.e. Eðy=xÞ ¼a þ bx; say

) Eðy=x ¼ xi Þ ¼ a þ bxi ; i ¼ 1ð1Þn:


y=x  Nða þ bx; r2 Þ

Least square (LS) regression line is given by


PY = a + bx, where a, b are the LS
ðy yÞðxi xÞ
estimates of α and β, i.e. a ¼ y  bx and b ¼ Pi ðx xÞ2 ¼ Sxyxx :
S

2 i

)y  N a þ bx; rn and b  N b; Srxx . Also they are independent.


2

) ‘a’ is normal with EðaÞ ¼ EðyÞ  xEðbÞ ¼ a.

r2 r2 r2   r2 X 2
VðaÞ ¼ VðyÞ þ x2 VðbÞ ¼ þ x2 ¼ Sxx þ nx2 ¼ xi
n Sxx nsxx nsxx

x2
i.e., a  N a; r2 1n þ Sxx
a  a0
H 01 : a ¼ a0 : under H 01 , t ¼ sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi  tn2
1 x2
^
r þ
n Sxx
P 2
^ ¼
where r 2
ðyi  a  bxi Þ =ðn  2Þ

) H 11 : a [ a0 ; x0 : t [ ta;n2
H 21 : a\a0 ; x0 : t\ta;n2
H 31 : a 6¼ a0 ; x0 : jtj [ ta=2;n2 :
qffiffiffiffiffiffiffiffiffiffiffiffiffi
Also, 100ð1  aÞ% confidence interval for a is a r ^ 1n þ Sxxx ta=2 ; n  2 :
2

pffiffiffiffiffi
H 02 : b ¼ b0 : under H 02 ; t ¼ ðbb0r^Þ Sxx  tn2

) H 12 : b [ b0 ; x0 : t [ ta;n2
H 22 : b\b0 ; x0 : t\ta;n2
H 32 : b 6¼ b0 ; x0 : jtj [ ta=2;n2 :

Also, 100ð1  aÞ% confidence interval for b is


 
^
r
b pffiffiffiffiffiffi  ta=2;n2
Sxx
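The least-squares estimates a, b and the t-test of H₀: β = 0 (i.e. β₀ = 0) can be obtained as follows; the data are illustrative only:

# Least-squares fit of E(y|x) = alpha + beta*x and t-test of H0: beta = 0.
# Data are illustrative only.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2])

res = stats.linregress(x, y)
print(f"a = {res.intercept:.3f}, b = {res.slope:.3f}")
print(f"t-test of beta = 0: p = {res.pvalue:.4f}, se(b) = {res.stderr:.4f}")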

H 03 : a ¼ a0 ; b¼b0 : Covða; bÞ ¼ Cov (y  bx; bÞ

r2
¼ xVðbÞ ¼  x :
Sxx
! ( ! P 2 !)
r 2
a a xi
 rSxxx
2

)  N2 ; nSxx
b b
 rSxxx r 2 2
Sxx
! ( ! P 2  )
a a xi x  n
; nSr xx
2
i.e.,  N2
b b x  n n
P 2  X
r 2
xi x  n
Let ¼
nSxx x  n n
!0 !
aa X1 aa
)  v22 :
bb bb
 
n Pnx
P r2
P1 adj nSxx
nx x2
Now, ¼ j P j ¼ r2 2 P 2 i2 2
ðnSxx Þ ðn xi  n x Þ

  
nSxx n nx 1 n nx
¼ 2 P 2
¼ 2 P 2
r nSxx nx xir nx xi
!0 ! !0   !
a  a X1 a  a 1 aa n nx aa
) ¼ 2 P 2
bb bb r bb nx xi bb
h X i
) nða  aÞ2 þ 2nxða  aÞðb  bÞ þ ðb  bÞ2 x2i  r2 v22
P
Again, n1 ðyi  a  bxi Þ2  r2  v2n2
) under H 03 ,
n P o.
nðaa0 Þ2 þ 2nxða  a0 Þðb  b0 Þ þ ðb  b0 Þ2 x2i 2
F¼ P .  F 2;n2
ðyi  a  bxi Þ2 ðn  2Þ
) w0 : F [ F a;2;n2 :

A.10 Tests Relating to Multiple and Partial


Correlation Coefficient
 
P
Suppose x px1  N p l ; pxp
 
q1:23...p = population multiple correlation coefficient of X 1 on X 2 ; X 3 ; . . .; X p
r 1:23...p = sample multiple correlation coefficient of X 1 on X 2 ; X 3 ; . . .; X p based on
a sample of size n ð  p þ 1Þ 0 1
1 r 12 r 13 . . . r 1p
1=2 B 1 r 23 . . . r 1p C
B C
¼ 1  RjRj11 where R ¼ BB ... ...C C and R11 = cofactor of r 11
@ ... ...A
1
in R.
r2 =ðp1Þ

If q1:23...p ¼ 0 then F ¼ 1:23...p  F p1;np .
ð1r1:23...p Þ ðnpÞ
2

To test
H 0 : q1:23...p ¼ 0 against H 1 : q1:23...p [ 0

)w0 : F [ F a;p1;np :

q12:34 ...p = population partial correlation coefficient of X 1 and X 2 eliminating the


effect of X 3 ; . . .; X p .

r 12:34 ...p = sample partial correlation coefficient of X 1 andX 2 eliminating the effect
of X 3 ; . . .; X p
¼  pffiffiffiffiffiffiffiffiffi
R12 ffi
R R
. If q12:34 ...p then
11 22

pffiffiffiffiffiffiffiffiffiffiffi
r 12:34...p n  p
t ¼ qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi  tnp
1  r 212:34...p

Thus for testing H 0 : q12:34 ...p ¼ 0 against

H 1 : q12:34 ...p [ 0; x0 : t [ ta;np


H 2 : q12:34 ...p \0; x0 : t\  ta;np
H 3 : q12:34 ...p 6¼ 0; x0 : jtj [ ta=2;np :

A.11 Problems Related to Multiple Regression


 
We consider a set of variables y; x1 ; x2 ; . . .; xp , where y is stochastic and
 
x1 ; x2 ; . . .; xp are nonstochastic. Let the multiple regression of y on x1 ; x2 ; . . .; xp be
  
E y x1 ; x2 ; . . .; xp ¼ b0 þ b1 x1 þ b2 x2 :: þ bp xp ðA:1Þ

where b0 ; b1 ; b2 ; bp are constants. In fact,


bi = partial regression coefficient of y on xi eliminating the effects of
xj ; j 6¼ i ¼ 1; 2;. . .p:
Define riy ¼ Cov(xi ; yÞ;rij ¼ Cov(xi ; xj Þ; ryy ¼ vðyÞ; qiy ¼ correlation of
ðxi ; yÞ; qij ¼ correlation of ðxi ; xj Þ; i ¼ 1; 2; ::p and j ¼ 1ð1Þp
We write r px1 ¼ ðr1y ; r2y ; . . .; rpy ; Þ0

0 ð1Þ 1
r11 r12 . . . r1p
Ppxp B r21 r22 . . . r2p C
¼B C
@ . . . . . . . . . . . . A = variance-covariance matrix of x1 ; x2 ; ::; xp
rP1 rP2 . . . rpp
We write
0 1
ryy ry1 ry2 ryp
B
Xp þ 1xp þ 1 B r1y r11 r12 r1p C
C
0
¼B
B r2y r21 r22 r2p C
C
@... ... ... ... A
0 1 rpy rp1 rp2 rpp
ryy r0
 ð1Þ
¼@ P A = variance–covariance matrix of y, x1 ; x2 ; . . .; xp .
r pxp
 ð1Þ

Similarly, we0write 1
qyy qy1 qy2 qyp
B q1y q11 q12 q1p C
B C
q0p þ 1xp þ 1 ¼ B q
B 2y q21 q22 q2p C
C = correlation matrix of y,
@... ... ... ... A
qpy qp1 qp2 qpp
x1 ; x2 ; . . .; xp . 
P  P
Now,  0  ¼ product of the diagonal element of 0 jq0 j

¼ ðryy r11 r22 . . .rpp Þjq0 j


P
j j P
)jq0 j ¼ ðryy r11 r220...rpp Þ. Also, j j ¼ ðr P11 r22 . . .rpp Þx Cofactor of qyy in q0 .
j j
) Cofactor of qyy in q0 ¼ r11 r22 ...rpp

jq 0 j
) q2y:12...p ¼ 1 
Cofactor of qyy in q0
P  P 
  ðryy r11 r22 . . .rpp Þ  
¼1 P 0
 ¼1 P
0
j j ðr11 r22 . . .rpp Þ ryy j j
P1 P
ryy  r 0 r 0
r0ð1Þ 1 rð1Þ rð1Þ b X1
 ð1Þ  ð1Þ 
¼1 ¼ ¼ as b ¼ r :
ryy ryy ryy   ð1Þ
P
r0 b b0 b X1 X X 
 ð1Þ 
) q2y:12...p ¼ ¼ 
; b¼ r ) b ¼ r ) r0 b ¼ b0 b
ryy ryy   ð1Þ   ð1Þ  ð1Þ   

Suppose we are given the set of observations


 
ya ; x1a ; x2a ; . . .; xpa ; a ¼ 1ð1Þn; n [ p þ 1:
Pn Pn  
Define xi ¼ 1n a¼1 xia ; Sij ¼ a¼1 ðxia  xi Þ xja  xj
Xn  
Siy ¼ a¼1
ðxia  xi Þ ya  yj : 8i; j ¼ 1ð1Þp

0 1
S11 S12 ... S1p
B S21 S22 ... S2p C
Spxp ¼B
@...
C which is positive definite.
... ... ... A
Sp1 Sp2 ... Spp

^ þb
Estimated regression equation of y on x1 ; x2 ; . . .; xp is y ¼ b ^ x1 þ
0 1
^ ^
b2 x2 þ    þ bp xp

^ ;b
where b ^ ^ ^
0 1 ; b2 ; . . .; bp are the solutions of the following normal equations:
9
S1y ¼b ^ S11 ^ S12
b þ  þ ^ S1p >
b >
1 2 p >
S2y ¼b ^ S21 ^ S22
b þ  þ ^ S2p =
b
1 2 p ðA:2Þ
... ... ... >
>
Spy ¼b ^ Sp1 ^ Sp2
b þ  þ ^ Spp >
b ;
1 2 p

^ ¼yb
and b ^ x1  b
^ x2      b ^ xp
0 1 2 p
We write y nx1
¼ ðy1 ; y2 ; . . .; yn Þ0

0 1
x11  x1 x21  x2 . . . xp1  xp
B x12  x1 x22  x2 . . . xp2  xp C
K nxp ¼B
@ ...
C
... ... ... A
x1n  x1 x2n  x2 . . . xpn  xp
0
^ px1 ¼ b
b ^ ;b
^ ^
1 2 ; ::; bp


Pn
Note that Siy ¼ a¼1 ðxia  xi Þya ; (A.2) reduces to

^ ¼ K0 y ) b
Sb ^ ¼ S1 K 0 y
   

^ ;b
)b ^ ^
1 2 ; . . .; bp are linear functions of y1 ; y2 ; . . .; yn which are normal.
  
^ ^ ;D b
) b  Np E b ^
 
 
^ ¼ S1 K 0 y ¼ S1 K 0 y  y 2
Now, b
   
0
where 2 ¼ ð1; 1; . . .; 1Þ and K 0 2 ¼ 0
  

   
^ 1 0
) E b ¼ S K E y  y 2
  

Eðya Þ ¼ b0 þ b1 x1a þ    þ bp xpa


EðyÞ ¼ b0 þ b1x1 þ    þ bpxp
Eðya  yÞ¼b1 ðx1a  x1 Þ þ    þ bp ðxpa  xp Þ
0 1
y1  y
  B y  y C
B 2 C
E y  y 2 ¼ E B C ¼ Kb
  @ : A 

yn  y

 
)E b ^ ¼ S1 K 0 K b
^ ¼ S1 S b ¼ b ½* K 0 K ¼ S
   

^
D b ¼ S K r I n KS ¼ r S K KS1 ¼ r2 S1 SS1 ¼ r2 S1
1 0 2 1 2 1 0

 
^ ¼ S1 K 0 y  Np b ; r2 S1
)b
  

 ij 
We write S1 ¼ S ^ Þ ¼ r2 Sii
) Vðb i
and

Cov b ^
^ ;b
i j ¼r S 8i; j ¼ 1ð1Þp
2 ij

 
^  N 1 b ; r2 Sii i ¼ 1ð1Þp
)bi i

^ ¼y  b
Again, b ^ x1  b
^ x2  b
^ xp
0 1 2 p
!
X
p X
p X
p
^ ¼ EðyÞ
)E b E ^ xi ¼
b b0 þ bixi  bixi ¼ b0
0 i
i¼1 i¼1 i¼1
X
^ ¼V y
V b ^ xi
b
0 i
 
 
^ ; x 0 ¼ x1 ; x2 ; . . .; xp
¼ V y  x0b
  
 
¼ rn
2
0 ^ x (as y and b
þ x D b ^ are independent)
   

r2 1
¼ þ x 0 r2 S1 x ¼ r2 þ x 0 S1 x :
n   n  

^ is also a linear combination of normal variables.


Thus b0
 
^  N 1 b ; r2 1 þ x 0 S1 x
)b 0 0
n  

^ þ Pb
p
Again Y ¼ b ^ xi ; ) Y  N 1 ðEðYÞ; VðYÞÞ
0 i
1
P
^ þ
where EðYÞ ¼ E b ^ xi ¼ b þ P b xi ¼ nx ; (say)
E b
0 i 0 i

! " #
X
p X X
VðYÞ ¼ V b^0 þ ^ xi
b i ¼ V y  ^ xi þ
b i
^ xi
b i
i¼1 i i
" # !
X X
¼ V y þ ^
bi ðxi  xi Þ ¼ VðyÞ þ V ^
bi ðxi  xi Þ
i i

r2 X X
p p
 
¼ þ ðxi  xi Þ xj  xj Cov b ^
^ ;b
i j
n 1 1
0 
r2 X X   2 ij
p p
2 1 1
¼ þ ðxi  xi Þ xj  xj r S ¼ r þ x  x S ð x  xÞ
n n  
1 1
  
X 1 0
) Y  N1 b0 þ bi xi ¼ nx ; r2 þ x  x S1 ð x  xÞ
n  

To get different test procedures r2 is estimated as

X p 2
1 ^ b ^ x1a þ . . . þ b ^ xpa
^2 ¼
r ya  b
n  p  1 a¼1 0 1 p

( )2
1 Xn X
¼ ðya  yÞ  ^ ðxia  xi Þ
b
n  p  1 a¼1 i
i

" #
1 X n XX
¼ Syy  ^ ^
bi bj ðxia  xi Þðxia  xjÞ
np1 a¼1 i j
" # 
1 XX 1 0
¼ Syy  ^ ^
bi bj Sij ¼ ^
Syy  b S b^
np1 i j
np1  
0
P
b b
(Note that q2y:12:::p ¼  ryy  )
(1) H 01 : b1 ¼ b2 ¼    ¼ bp ¼ 0
) x1 ; x2 ; . . .; xp are not worthwhile in predicting y.

^0 P b
b ^
 
* q2y:12...p ¼ ) b1 ¼ b2 ¼    ¼ bp ¼ 0 ) q2y:12...p ¼ 0
ryy

So the problem
is to test H 01 : q2y:12...p ¼ 0 against H 1 : q2y:12...p [ 0
Now Syy 1  r 2y:12...p ¼ Syy  b ^0 P b ^
 

n
X
¼ ^ b
ya  b ^ x1a      b
^ xpa  r2 v2
0 1 p np1
a¼1
 
^ 0S b
Also, Syy r 2y:12...p ¼ b ^ 0S b
^ ¼ Syy  Syy  b ^
   
 
) Syy ¼ ^ 0S b
Syy  b ^ þ Syy r 2y:12...p
 

Syy  r2 v2n1 ) Syy r 2y:12...p  r2 v2p


.
r 2y:12...p p
) F1 ¼ .  F p;np1
1  r 2y:12...p ðn  1  pÞ
) x0 : F 1 [ F a;p;np1

(2) H 0 : b0 ¼ b against H 1 : b0 6¼ b
^ b
b
Under H 0 ; t ¼ rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
0
 tnp1
1 0 1
^
r þ x S x
n   
^2 ¼ np1
where r 1 ^ 0S b
Syy  b ^ ¼ S02
y:12...p , say
 

X
^ ¼y
and b ^ xi ¼ y x 0 b
b ^
0 i
 

x0 : jtj [ ta=2;np1

(3) H 0 : bi ¼ b0i against H 1 : bi 6¼ b0i 8i ¼ 1ð1Þp


 
^  N 1 b ; r2 Sii
b i i

^ b0
b ipffiffiffiiffi
Under H 0 ; t ¼  tnp1 :
r
^ Sii

x0 : jtj [ ta=2;np1

100ð1  aÞ% confidence interval for bi is


pffiffiffiffiffi
b^ r
^ S ii
t a=2;np1
i

(4) H 0 : bi  bj ¼ d0 against H 1 : bi  bj 6¼ d0 : 8i 6¼ j ¼ 1ð1Þp


  
^ b
b ^  N 1 b  b ; r2 Sii þ Sjj  2Sij
i j i j

ðb^i b^j Þd


) under H 0 ; t ¼ pffiffiffiffiffiffiffiffiffiffi ffi 0  tnp1
r
^
ii jj
S þ S 2Sij

) x0 : jtj [ ta=2;np1
 
100ð1  aÞ% confidence interval for bi  bj is
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
b^ b
^ r ^ Sii þ Sjj  2Sij ta=2;np1
i j

(5) H 0 : EðYÞ ¼ nx ¼ n0
Yn0
Under H 0 ; t ¼ rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
0 ffi  tn  p  1
r
^ nþ
1
x  x S1 x  x
   

x0 : jtj [ ta=2;np1 :

100ð1  aÞ% confidence interval for nx is


rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi !
1 0
1

^
Y r þ x  x S x  x ta=2;n  p  1
n    

^ þ Pp b
where Y ¼ b ^
0 i¼1 i xi

A.12 Distribution of the Exponent


of the Multivariate Normal Distribution
 
Ppxp P
Let x px1
 Np l px1
; ; is positive definite.
 
The p.d.f. of x is


0 P
1
1 12 xl xl 1\xi \1
f x ¼ p ffiffiffiffiffiffiffiffi
P e     ; and 0\ri \1
 ð2pÞp=2 j j 1\li \1

X1
Qð x Þ ¼ ð x  l Þ0 ð x  lÞ
    

P
Since is positive definite, there exists a nonsingular matrix V pxp such that
P1
¼ VV 0 :
) Qð x Þ ¼ ð x  l Þ0 VV 0 ð x  l Þ ¼ y 0 y where y ¼ V 0 ð x  l Þ
         

X
p
¼ y2i :
i¼1
   rffiffiffiffiffiffiffiffiffiffi
X
@ x1 ; x2 ; . . .; xp  1 1  

jJ j ¼   ¼ ¼ q ffiffiffiffiffiffiffiffiffiffiffiffiffi ¼  
@ ðy1 ; y2 ; . . .; yn Þ  jV j P1 
 

12 y 0 y  pffiffiffiffiffiffiffiffi
P
) p.d.f. of y is f ð y Þ ¼ p
1 ffiffiffiffiffiffiffiffi
P e   j j
  ð2pÞ p=2
j j
P
P

1 12 y21
¼ e 1 :
ð2pÞp=2

Now y1 ; y2 ; . . .; yp are i.i.d Nð0; 1Þ

X
p
) y2i  v2p ; i.e., Qð x Þ  v2p

1

P P
If we now want to find the distribution of Q
ð x Þ ¼ x 0 1 x, since is
  P
positive definite, there exists a non-singular matrix V pxp such that 1 ¼ VV 0 .
) Q
ð x Þ ¼ x 0 VV 0 x ¼ z 0 z where Z ¼ V 0 x .
   
P 
pffiffiffiffiffiffiffiffi
Here also jJ j ¼ j j.
X1
)ð x  l Þ0 ð x  l Þ¼ ð x  l Þ0 VV 0 ð x  l Þ
       
 0  
¼ V 0 x V 0 l I p V 0 x V 0 l
   
 0  
¼ z  V 0 l Ip z  V 0 l
   
0
1 12 z V 0 l I p z V 0 l
) f ðzÞ ¼ e    

ð2pÞp=2

) z1 ; z2 ; . . .; zp are normal with common variance unity but with means given by
E z ¼ V0 l.
 
 0  
Pp 2
) 1 zi  non-central v2p with non-centrality parameter V 0 l V0 l ¼
 
P
l 0 VV 0 l ¼ l 0 1 l.
  

A.13 Large Sample Distribution of Pearsonian


Chi-Square

Events A1 A2 …. Ai …… Ak Total
Probability P1 P2 …. Pi …… Pk 1
Frequency n1 n2 …. ni …… nk n

n! Y k
) f ðn1 ; n2 ; . . .; nk Þ ¼ Q pni i
n !
i i i¼1

ni  Binðn; pi Þ
P 2
Pearsonian chi-square statistic is v2 ¼ ki¼1 ðni npnpi Þ
i
Using Stirling’s approximation to factorials
pffiffiffiffiffiffi n n þ 1
2pe n 2 Y k
f ðn1 ; n2 ; . . .; nk Þ ’ Q pffiffiffiffiffiffi ni þ 12
pni i
k n
1 2pe ni
i 1
n þ 12
Qk n i
n 1 pi
¼ k1 Q n þ1
k
ð2pÞ 2 1 ni i 2
pffiffiffi Y k   1
n npi ni þ 2
¼ k1 Q pffiffiffiffiffiffi
ð2pÞ 2 npi 1 ni

Xk    
1 npi
) loge f ðn1 ; n2 ; . . .; nk Þ ’ C þ ni þ logc ðA:3Þ
i¼1
2 ni
pffiffi

where C ¼ loge n
k1 Qk pffiffiffiffiffi
ð2pÞ 2
1
npi

ni npi
We write di ¼ p ffiffiffiffiffiffiffi
np q ; qi ¼ 1  pi
i i

rffiffiffiffiffiffi
pffiffiffiffiffiffiffiffiffiffi ni qi
) ni ¼ npi þ di npi qi ) ¼ 1 þ di
npi npi
 rffiffiffiffiffiffi1
np qi
) i ¼ 1 þ di
ni npi
X k    rffiffiffiffiffiffi
1 pffiffiffiffiffiffiffiffiffiffi qi
) loge f ðn1 ; n2 ; . . .; nk Þ¼ C  npi þ þ di npi qi logc 1 þ di
1
2 npi
0 1
Xk   rffiffiffiffiffiffi 3=2
1 pffiffiffiffiffiffiffiffiffiffi qi d2 q d3 q
¼C npi þ þ di npi qi @di  i i þ i i     A;
2 np i 2np i 3ðnpi Þ 3=2
1
 qffiffiffiffiffi
 qi 
Provided di np \1
i

k 
X rffiffiffiffiffiffi 
pffiffiffiffiffiffiffiffiffiffi 1 qi 1 1 q d3
¼C di npi qi þ di  d2i qi  d2i i þ d2i qi þ piffiffiffi ð  Þ þ   
1
2 npi 2 4 npi n

k 
X rffiffiffiffiffiffi 
pffiffiffiffiffiffiffiffiffiffi 1 qi 1 1 q d3
¼C di npi qi þ di þ d2i qi  d2i i þ piffiffiffi ð  Þ    ðA:4Þ
1
2 npi 2 4 npi n
P pffiffiffiffiffiffiffiffiffiffi Pk
Note that di npi qi ¼ 1 ðni  npi Þ ¼ n  n ¼ 0;
pffiffiffi d3 d2
we assume that d3i ¼ 0ð nÞ i:e:; piffiffin ! 0; pdiffiffin ! 0 ) ni ! 0
) All the terms in the R.H.S of (A.4) tends to zero except 12 d2i qi , thus (A.4)
implies

1X k Pk 2
d2i qi ) f ’ eC  e2 1 di qi
1
loge f ’ C 
2 1
pffiffiffi Pk ðni npi Þ2
n 12
)f ’ k1 Q pffiffiffiffiffiffi e
i¼1 npi

ð2pÞ 2 k1 npi

1 Pk ðni npi Þ2 1
2 1
¼ pffiffiffiffiffi e
k1
i¼1 npi
: Qk1 pffiffiffiffiffiffi ðA:5Þ
ð2pÞ pk
2
1 npi
P
We note that k1 ðni  npi Þ ¼ 0
P
i.e., nk  npk ¼  1k1 ðni  npi Þ

X
k
ðni  np Þ2 X
k1
ðni  np Þ2 ðnk  npk Þ2
) i
¼ i
þ
i¼1
npi i¼1
npi npk

nP o2
k1
X
k1
ðni  np Þ2 1 ðni  npi Þ
¼ i
þ ðA:6Þ
1
npi npk

We use the transformation ðn1 ; n2 ; . . .; nk1 Þ ! ðx1 ; x2 ; . . .; xk1 Þ


where xi ¼ npi np
ffiffiffiffi
ffii
np ; i ¼ 1ðlÞk  1:
i

 
@ ðn1 ; n2 ; . . .; nk1 Þ   pffiffiffiffiffiffiffiffiffiffiffi

jJ j ¼   ¼ Diag pffiffiffiffiffiffiffi
np1 . . . npk1 
@ ðx ; x ; . . .; x Þ 
1 2 k1
Y
k 1
pffiffiffiffiffiffi
¼ npi
Pk1 pffiffiffiffiffi 2
Pk ðni npi Þ2 Pk1 npi xi
(A.6) ) 1 npi ¼ 1 x2i þ 1
npk
" #
X
k1
1 X k1 k1 X
X pffiffiffiffiffiffiffiffi
¼ þ x2i px þ
2
pi pj x i x j
1
pk 1 i i i6¼j¼1
k1 
X  Xk1 X pffiffiffiffiffiffiffi

pi pj
p
¼ 1 þ i x2i þ xi xj
1
pk i6¼j¼1
pk
¼ x 0A x
 

where
0 pffiffiffiffiffiffiffi pffiffiffiffiffiffiffi pffiffiffiffiffiffiffiffiffiffi 1
p1 p1 p2 p1 p3 p1 pk1
1þ pk ffi . . .
B
pk pk pffiffiffiffiffiffi pffiffiffiffiffiffiffiffiffi
pk ffi
C
B 1 þ pp2
p2 p3 p2 pk1
C
Ak1xk1 ¼ B pk . . . pk C
@ A
k
... ...
1 þ ppk1
k

0 1
1 þ a21 a1 a2 a1 a3 . . . a1 ak1
B a1 a2 1 þ a22 a2 a3 . . . a2 ak1 C qffiffiffiffi
¼B
@
C where ai ¼ pi 8i ¼ 1ð1Þk  1
... ... A pk

a1 ak1 a2 ak1 a3 ak1 1 þ a2k1

Now
 
 a1 þ a1 a2 a3 . . . ak1 
 1   
 a1 a2 þ a12 a3 . . . ak1  Rows

jAj ¼ ða1 a2 ::ak1 Þ ;

 . . .: ... ... ...  ai
 a1 a2 a3 1 
ak1 þ ak1

 
1þ 1
1 1. . . 1 
 a21 
 
1 1þ 1
1. . . 1 

2
a22 
¼ ða1 a2    ak1 Þ  1 1þ 1 
1 1 
... ... ...
a23
... 
 
1 1þ 
 1 1 1
a2k1 

 2 
 a1 þ 1 a21 ... a21 
 2 
 a a2 þ 1
2
... a22 
¼  2  (ith row X a2 )

 . 2. . ... ... i

 a a2k1 ... a2k1 þ1 
k1

 Pk1 P Pk1 
1þ a2i 1 þ 1k1 a2i ... 1þ a2i 
 1 1
 
¼ 
a22 1 þ a22 ... a22  (1st row R1 ¼ P Ri =
... ... ... 
 
 a2k1 a2k1 ... 1 þ a2k1 
sum of all rows)
 
 1 1 ... 1 
 2 
Xk1  a a22 þ 1 ... a22 
¼ 1þ a2i ¼  2 ;

1
 . 2. . ... ... 
a a2k1 ... 1 þ a2k1 
k1

 
 1 1 ... 1 

Pk1 2  0 1 ... 0 
¼ 1 þ 1 ai ¼  ;
 . . .: . . .: ... . . . 
 0 0 ... 1 

P
k
X
k1 X
k1 pi
pi 1
¼ 1þ a2i ¼ 1 þ ¼ 1 ¼
1
pk pk pk
12 x 0 A x
) (A.5) ) f ðx1 ; x1 ; . . .; xk1 Þ ¼ 1
k1 p ffiffiffiffi e   Qk11 pffiffiffiffiffi jJ j
ð2pÞ 2 pk npi
pffiffiffiffiffiffi
1

jAj 12 x 0 A x
¼ k1 e  

ð2pÞ 2

P1
12 x 0 x P
¼ k1 P 1=2 e
1   where A1 ¼ :
ð2pÞ 2 j j

P
Since 1 is positive definite therefore there exists a non-singular V such that
P1
¼ VV 0 .
P
) x 0 1 x ¼ x 0 vv0 x ¼ y 0 y where y ¼ v0 x .
       
Using transformation ðx1 ; x2 ; . . .; xk1 Þ ! ðy1 ; y2 ; . . .; yk1 Þ
X1=2
1  
jJ j ¼ ¼ 
jV j
12 y 0 y
) f ðy1 ; y2 ; . . .; yk1 Þ ¼ 1
k1 e  ) y1 ; y2 ; . . .; yk1 are i.i.d. N(0, 1)
ð2pÞ 2

) y 0 y  v2k1
 

X1 X
k
ðni  np Þ2
) x0 x  v2k1 ) i
 v2k1
  npi
i¼1
 qffiffiffiffiffi
 qi 
Note This approximate distribution is valid if di np \1
i

i.e., d2i qi \npi ) if d2i \ np npi


q , i.e. if Maxdi \ q
i 2
i i

n np
Again di ¼ pi ffiffiffiffiffiffiffii , using normal approximation the effective range of di is (−3, 3)
npi qi
)d2i 9
i.e.,

Maxd2i ¼ 9

So the approximate distribution will be valid if 9\ np qi , i.e. if npi [ 9ð1  pi Þ ,


i

i.e. if npi [ 9. So the approximation is valid if the expected frequency for each
event is at least 10.
Again, if we consider the effective range of di as (−2, 2), then the approximation
is valid if the expected frequency for each event is at least 5.
It has been found by enquiry that if the expected frequencies are greater than 5
then the approximation is good enough.
If the expected frequencies of some classes be not at least 5 then some of the
adjacent classes are pooled such that the expected frequencies of all classes after
coalition are at least 5. If k
be the no. of classes after coalition, then
Pk
ðn
i np
i Þ2
i¼1 np
 v2k
1 ;
i
where n
i ¼ observed frequency after coalition,
np
i ¼expected frequency after coalition.
Uses of Pearsonian-v2 :
(1) Test for goodness of fit:

Classes Probability Frequency


A1 p1 n1
A2 p2 n2
. . .
. . .
. . .
Ai pi ni
: : :
Ak pk nk
Total 1 n

We are to test H 0 : pi ¼ p0i  8i


Under H 0 ; expected frequencies are np0i : We assume np0i  58i.
Pk ðni np0i Þ2
) Under H 0 ; i¼1 np0i
 v2k1
P n2
i.e. ki¼1 npi 0  n  v2k1 :
Pi 2
i.e. v2 ¼ ki¼1 OE  n  v2k1
where O = observed frequency (ni )
E = Expected frequency (np0i )

) x0 : v2 [ v2a;k1

Note Suppose the cell probabilities p1 ; p2 ; . . .; pk depend on unknown parame-


ters θ_{s×1} = (θ₁, θ₂, …, θ_s)′, and suppose θ̂ is an efficient estimator of θ. Then

Σᵢ₌₁ᵏ {nᵢ − n pᵢ(θ̂)}² / {n pᵢ(θ̂)} ~ χ²_{k−1−s}.
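A computational sketch of the goodness-of-fit test, with illustrative observed frequencies and hypothesised cell probabilities pᵢ⁰ (all expected frequencies being at least 5):

# Pearson chi-square goodness-of-fit test: H0: p_i = p_i^0 for all i.
# Observed frequencies and hypothesised probabilities are illustrative only.
import numpy as np
from scipy.stats import chisquare

observed = np.array([18, 55, 67, 40, 20])        # n_i
p0 = np.array([0.10, 0.25, 0.35, 0.20, 0.10])    # p_i^0
expected = p0 * observed.sum()                   # n * p_i^0

chi2_stat, p_value = chisquare(observed, f_exp=expected)   # df = k - 1
print(f"chi2 = {chi2_stat:.3f}, p = {p_value:.4f}")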


(2) Test for homogeneity of similarly classified populations

Classes Population
P1 P2 ………. Pj ………. Pl
A1 p11 p12 ………. p1j ………. p1l
A2 p21 p22 ………. p2j ………. p2l
. . . ………. . ………. .
. . . ………. . ………. .
Ai pi1 pi2 pij pij
. . . . .
. . . . .
. .
Ak pk1 pk2 pkj pkl
Total 1 1 ………. 1 ………. 1

where pij ¼ the probability that an individual selected from jth population will
belong to ith class.
We are to test H 0 : pi1 ¼ pi2 ¼    ¼ pil ð¼ pi sayÞ 8i ¼ 1ð1Þk. To do this we
draw a sample of size n and classify as shown below:

Classes Population Total


P1 P2 …. Pj …. Pl
A1 n11 n12 n1j n1l n10
A2 n21 n22 n2j n2l n20
. . . . . . . .
. . . . . . . .
Ai ni1 ni2 nij nil ni0
. . . . . . . .
. . . . . . . .
Ak nkl nk2 …. nkj …. nkl nk0
Total n01 n01 …. n0j …. n0l n

For the jth population the Pearsonian chi-square statistic is


 2
Xk
nij  n0j pij
 vð2k1Þ  8j ¼ 1ð1Þ1
i¼1
n0j pij
 2
Xl X k
nij  n0j pij
)  v21ðk1Þ
j¼1 i¼1
n 0j p ij

Pl Pk fnij n0j pi g2
) Under H 0 ; v2 ¼ j¼1 i¼1 n0j pi  v21ðk1Þ
pi ’s are unknown and they are estimated by p^i ¼ nni0 8i ¼ 1ð1Þk:
P P fnij n0j ni0 g
) under H 0 ; v2 ¼ 1j¼1 ki¼1 n ni0n  v21ðk1Þðk1Þ ¼ v2ðk1Þðl1Þ
0j n

as the d.f. will be reduced by (k − 1) since we are to estimate any (k − 1) of


P
p1 ; p2 ; . . .; pk as k1 pi ¼ 1:

) x0 : v2 [ v2a;ðk1Þð11Þ

(3) Test for independence of two attributes

A B Total
B1 B2 ………. Bj ………. Bl
A1 p11 p12 ………. p1j ………. p1l p10
A2 p21 p22 ………. p2j ………. p2l p20
. . . ………. . ………. . .
. . . ………. . ………. . .
Ai pi1 pi2 pij pi1 pi0
. . . . . .
. . . . . .
Ak pk1 pk2 pkj pkl pk0
Total p01 p02 ………. p0j ………. p0l 1

We are to test
H 0 : A and B are independent, i.e. to test

H 0 : pij ¼ pi0 xp0j 8ði; jÞ

To do this we draw a sample of size n and suppose the sample observations be


classified as shown below:

A B Total
B1 B2 ………. Bj ………. Bl
A1 n11 n12 ………. n1j ………. n1l n10
A2 n21 n22 ………. n2j ………. n2l n20
. . . ………. . ………. . .
. . . ………. . ………. . .
Ai ni1 ni2 nij nil ni0
. . . . . .
. . . . . .
Ak nk1 nk2 nkj nkl nk0
Total n01 n02 ………. n0j ………. n0l n

n! Y Y  nij
Pðn11 ; . . .; nkl Þ ¼ Q Q   pij
i j nij ! i j

 
E nij ¼ npij ;

i ¼ 1ð1Þk; j ¼ 1ð1Þ1:
 
X X nij  npij 2 a
)  v2k11
i j
np ij

P P ðnij npi0 p0j Þ2 a


Under H 0 ; v2 ¼ npi0 p0j  v2k11
i j
Now, unknown pi0 and p0j are estimated by
n
pi0 ¼ nni0 and ^
^ p0j ¼ n0j
P P ðnij ni0nn0j Þ2 a 2
) under H 0 ; v2 ¼ ni0 n0j  vðk11Þðk þ 12Þ
i j n

a
 vð2k1Þð11Þ

PP n2ij a
i.e., v2 ¼ n ni0 n0j  n  vð2k1Þð11Þ
i j

) x0 : v2 [ v2a;ðk1Þð11Þ :
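A computational sketch of this chi-square test of independence, for an illustrative 3 × 3 table of observed frequencies nᵢⱼ:

# Chi-square test of independence of two attributes A and B (k x l table).
# The table of observed frequencies is illustrative only.
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[22, 18, 10],
                  [15, 25, 10],
                  [13, 17, 20]])

chi2_stat, p_value, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2 = {chi2_stat:.3f}, df = {dof}, p = {p_value:.4f}")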

Particular cases: (i) l = 2

A B Total
B1 B2
A1 a1 b1 T1
A2 a2 b2 T2
. . . .
Ai ai bi Ti
. . . .
Ak ak bk Tk
Total Ta Tb n

Here,
 2  2
X
k
ai  T inT a X
k
bi  T inT b
v ¼
2
TiT a þ TiTb
1 n 1 n

Now, bi  T inT b ¼ T i  ai  T i ðnT


n


 
T iT a T iT a
¼ T i  ai  T i þ ¼  ai 
n n
X k  2  
T T
i a 1 1 1
) v2 ¼ n ai  þ
1
n T a T b T i

X k   2
T iT a n
¼n ai 
1
n T i T aT b

2 X k  2 2

n ai T iT ai T a
¼ þ 2a  2
T aT b 1 T i n n
" #
n 2 Xak 2
Ta 2
Ta
¼ i
þ 2 n  2 Ta
T aT b 1 T i n n
" #
n2 X a2i T 2a
k
¼ 
T a T b 1 T i n2
hP i
2 k b2i T2
This formula or its equivalent v2 ¼ T na T b 1 Ti  n2b will be found more con-
venient for computational purpose.
x0 : v2 [ v2a;k1

(ii) k = 2, l = 2:

A B Total
B1 B2
A1 a b a+b
A2 c d c+d
Total a+c b+d n=a+b+c+d

Here,
     
ða þ bÞða þ cÞ 2 ða þ bÞðb þ d Þ 2 ðc þ d Þða þ cÞ 2 n o2
a b c d  ðc þ dÞnðb þ dÞ
n n n
v ¼
2
þ þ þ
ða þ bÞða þ cÞ ða þ bÞðb þ d Þ ðc þ d Þða þ cÞ ðc þ d Þðb þ d Þ
n n n n

Now, a  ða þ bÞnða þ cÞ ¼ 1n ½aða þ b þ c þ d Þ  ða þ bÞða þ cÞ ¼ ad n bc

Similarly; b  ða þ bÞnðb þ d Þ ¼  ad n bc ; c  ðc þ d Þða þ cÞ


n ¼  ad n bc
and d  ðb þ dÞnðc þ d Þ ¼ ad n bc


ðad  bcÞ2 1 1 1 1
)v2 ¼ þ þ þ
n ða þ bÞða þ cÞ ða þ bÞðb þ d Þ ðc þ d Þða þ cÞ ðc þ d Þðb þ d Þ
2 
ðad  bcÞ n n
¼ þ
n ða þ bÞða þ cÞðb þ d Þ ða þ cÞðc þ d Þðb þ d Þ
ðad  bcÞ2 n
¼
ða þ bÞðc þ d Þða þ cÞðb þ d Þ

This turns out to be much easier to apply.


Corrections for continuity
We know that for the validity of the v2 -approximation it is necessary that the
expected frequency in each class should be sufficiently large (say > 4). When
expected frequencies are smaller we pool some of the classes in order to satisfy this
condition. However, it should be apparent that this procedure should be ruled out in
case of 2 × 2 table. For 2 × 2 table the following two methods of correction may be
applied.
(I) Yates’ correction: Yates has suggested a correction to be applied to the
observed frequencies in a 2 × 2 table in case any expected frequency is found
to be
too small. This is done by increasing or decreasing the frequencies by half 1=2 in
such a way that the marginal totals remain unaltered.
Case 1 Say ad < bc

A B Total
B1 B2
A1 aþ 1
2 b  12 a+b
A2 c 1
2 dþ 1
2
c+d
Total a+c b+d a+b+c+d

     
Here, a þ 12 d þ 12  b  12 c  12 ¼ ðad  bcÞ þ n2
 
¼ jad  bcj þ n2 ¼  jad  bcj  n2 (since ad − bc < 0)
Case 2 If ad > bc

A B Total
B1 B2
A1 a  12 bþ 1
2
a+b
A2 cþ 1
2 d 1
2
c+d
Total a+c b+d a+b+c+d
     
Here, a  12 d  12  b þ 12 c þ 12

n
¼ ðad  bcÞ 
2
n
¼ jad  bcj 
2
n½jadbcjn
2

) For both the cases v2 ¼ ða þ bÞðc þ d Þða þ 2cÞðb þ dÞ :
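A sketch of the 2 × 2 test with Yates' continuity correction applied, for illustrative cell counts a, b, c, d:

# 2 x 2 chi-square test with Yates' continuity correction.
# Cell counts are illustrative only.
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[12,  5],
                  [ 6, 14]])

chi2_stat, p_value, dof, _ = chi2_contingency(table, correction=True)  # Yates' correction
print(f"corrected chi2 = {chi2_stat:.3f}, p = {p_value:.4f}")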

(ii) Dandekar’s correction: A slightly different method suggested by V.N.


Dandekar involves the calculation of v20 ;v21 and v21 for the observed 2 × 2
configuration
2
where v20 ¼ ða þ bÞðcnþðadbc Þ
d Þða þ cÞðb þ d Þ
v21 = the chi-square obtained by decreasing the smallest frequency by ‘1’ keeping
marginal totals fixed and
v21 = the chi-square obtained by increasing the smallest frequency by ‘1’
keeping the marginal totals fixed.
Then the test statistic is given by

v20  v21  2 
v2 ¼ v20  v  v20 :
v21  v21 1

A.14 Some Convergence Results


Definition 1 A sequence of random variables $\{X_n\}$, $n=1,2,\ldots$, is said to converge in probability to a random variable $X$ if for any $\epsilon>0$, however small,
\[
P\{|X_n-X|<\epsilon\}\to 1\ \text{as } n\to\infty,
\]
and we write $X_n\xrightarrow{P}X$. If $X$ is degenerate, i.e. a constant, say $c$, this is the mode of convergence that appears in the weak law of large numbers (WLLN).

Definition 2 Let $\{X_n\}$, $n=1,2,\ldots$ be a sequence of random variables having distribution functions $\{F_n(x)\}$ and let $X$ be a random variable having distribution function $F(x)$. If $F_n(x)\to F(x)$ as $n\to\infty$ at all continuity points of $F(x)$, then we say $X_n$ converges in law to $X$ and write $X_n\xrightarrow{L}X$; i.e., the asymptotic distribution of $X_n$ is nothing but the distribution of $X$.

Result 1(a): If $X_n\xrightarrow{P}X$ and $g(x)$ is a continuous function for all $x$, then $g(X_n)\xrightarrow{P}g(X)$.
Result 1(b): If $X_n\xrightarrow{P}C$ and $g(x)$ is continuous in the neighbourhood of $C$, then $g(X_n)\xrightarrow{P}g(C)$.
Result 2(a):
\[
X_n\xrightarrow{P}X\ \Rightarrow\ X_n\xrightarrow{L}X,\qquad
X_n\xrightarrow{P}C\ \Leftrightarrow\ X_n\xrightarrow{L}C.
\]
Result 2(b): $X_n\xrightarrow{L}X\ \Rightarrow\ g(X_n)\xrightarrow{L}g(X)$ if $g$ is a continuous function.
Result 3: Let $\{X_n\}$ and $\{Y_n\}$ be sequences of random variables such that $X_n\xrightarrow{L}X$ and $Y_n\xrightarrow{P}C$, where $X$ is a random variable and $C$ is a constant. Then
(a) $X_n+Y_n\xrightarrow{L}X+C$; (b) $X_nY_n\xrightarrow{L}CX$;
(c) $X_n/Y_n\xrightarrow{L}X/C$, if $C\ne 0$; and
(d) $X_nY_n\xrightarrow{P}0$, if $C=0$.
Theorem 1 Let $\{T_n\}$ be a sequence of statistics such that $\sqrt n\,(T_n-\theta)\xrightarrow{L}X\sim N\bigl(0,\sigma^2(\theta)\bigr)$. If $g(\cdot)$ is a function admitting a derivative $g'(\cdot)$ in the neighbourhood of $\theta$, then
\[
\sqrt n\,\{g(T_n)-g(\theta)\}\xrightarrow{L}Y\sim N\bigl(0,\sigma^2(\theta)g'^2(\theta)\bigr).
\]

Proof By the mean value theorem,
\[
g(T_n)=g(\theta)+(T_n-\theta)\{g'(\theta)+\epsilon_n\}\qquad\text{(A)}
\]
where $\epsilon_n\to 0$ as $T_n\to\theta$. Since $\epsilon_n\to 0$ as $T_n\to\theta$, for any small positive quantity $\eta$ we can determine a $\delta>0$ such that $|T_n-\theta|<\eta\ \Rightarrow\ |\epsilon_n|<\delta$.
\[
\therefore\ P\{|T_n-\theta|<\eta\}\le P\{|\epsilon_n|<\delta\}\qquad(\text{A.7})
\]
i.e.
\[
P\{|\epsilon_n|<\delta\}\ \ge\ P\{-\eta<T_n-\theta<\eta\}
=P\bigl\{-\sqrt n\,\eta<\sqrt n\,(T_n-\theta)<\sqrt n\,\eta\bigr\}
\to\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\,\sigma(\theta)}e^{-\frac{x^2}{2\sigma^2(\theta)}}\,dx=1.
\]
\[
\therefore\ P\{|\epsilon_n|<\delta\}\to 1\ \text{as } n\to\infty
\ \Rightarrow\ \epsilon_n\xrightarrow{P}0.\qquad(\text{A.8})
\]
Again,
\[
\sqrt n\,(T_n-\theta)\xrightarrow{L}X\sim N\bigl(0,\sigma^2(\theta)\bigr).\qquad(\text{A.9})
\]
Combining (A.8) and (A.9) and using Result 3(d), we can write
\[
\sqrt n\,(T_n-\theta)\,\epsilon_n\xrightarrow{P}0.\qquad(\text{A.10})
\]
Again, (A) gives
\[
\sqrt n\,\{g(T_n)-g(\theta)\}-\sqrt n\,(T_n-\theta)g'(\theta)=\sqrt n\,(T_n-\theta)\,\epsilon_n,
\]
i.e. $X_n-Y_n=\sqrt n\,(T_n-\theta)\,\epsilon_n\xrightarrow{P}0$, where $X_n=\sqrt n\,\{g(T_n)-g(\theta)\}$ and $Y_n=\sqrt n\,(T_n-\theta)g'(\theta)$; i.e.
\[
X_n-Y_n\xrightarrow{P}0.\qquad(\text{A.11})
\]
Also,
\[
Y_n=\sqrt n\,(T_n-\theta)g'(\theta)\xrightarrow{L}Y\sim N\bigl(0,\sigma^2(\theta)g'^2(\theta)\bigr).\qquad(\text{A.12})
\]
Combining (A.11) and (A.12) and using Result 3(a),
\[
Y_n+(X_n-Y_n)\xrightarrow{L}Y+0,\quad\text{i.e. } X_n\xrightarrow{L}Y,
\]
i.e. $\sqrt n\,\{g(T_n)-g(\theta)\}\xrightarrow{L}Y\sim N\bigl(0,g'^2(\theta)\sigma^2(\theta)\bigr)$,
i.e. $\sqrt n\,\{g(T_n)-g(\theta)\}\overset{a}{\sim}N\bigl(0,g'^2(\theta)\sigma^2(\theta)\bigr)$.
Note 1 If $T_n\overset{a}{\sim}N\bigl(\theta,\frac{\sigma^2(\theta)}{n}\bigr)$, then $g(T_n)\overset{a}{\sim}N\bigl(g(\theta),\,g'^2(\theta)\frac{\sigma^2(\theta)}{n}\bigr)$, provided $g(\cdot)$ is continuous in the neighbourhood of $\theta$ and admits the first derivative.

Note 2 $\dfrac{\sqrt n\,\{g(T_n)-g(\theta)\}}{g'(T_n)}\overset{a}{\sim}N\bigl(0,\sigma^2(\theta)\bigr)$, provided $g'(\cdot)$ is continuous.

Proof We have $\dfrac{\sqrt n\,\{g(T_n)-g(\theta)\}}{g'(\theta)}\overset{a}{\sim}N\bigl(0,\sigma^2(\theta)\bigr)$. Since $T_n\xrightarrow{P}\theta$ and $g'(\cdot)$ is continuous,
\[
g'(T_n)\xrightarrow{P}g'(\theta)\ \Rightarrow\ \frac{g'(\theta)}{g'(T_n)}\xrightarrow{P}1,
\]
\[
\therefore\ \frac{\sqrt n\,\{g(T_n)-g(\theta)\}}{g'(T_n)}
=\frac{\sqrt n\,\{g(T_n)-g(\theta)\}}{g'(\theta)}\cdot\frac{g'(\theta)}{g'(T_n)}.
\]
As the first part of the R.H.S. converges in law to $X\sim N(0,\sigma^2(\theta))$ and the second part converges in probability to 1, their product converges in law to $N(0,\sigma^2(\theta))$.

Note 3 Further, if $\sigma(\cdot)$ is continuous, then
\[
\frac{\sqrt n\,\{g(T_n)-g(\theta)\}}{g'(T_n)\sigma(T_n)}\overset{a}{\sim}N(0,1).
\]
Proof
\[
\frac{\sqrt n\,\{g(T_n)-g(\theta)\}}{g'(T_n)\sigma(T_n)}
=\frac{\sqrt n\,\{g(T_n)-g(\theta)\}}{g'(T_n)\sigma(\theta)}\cdot\frac{\sigma(\theta)}{\sigma(T_n)}.
\]
By Note 2, the first factor $\overset{a}{\sim}N(0,1)$. Also, $T_n\xrightarrow{P}\theta$ and $\sigma(\cdot)$ is continuous, so
\[
\sigma(T_n)\xrightarrow{P}\sigma(\theta)\ \Rightarrow\ \frac{\sigma(\theta)}{\sigma(T_n)}\xrightarrow{P}1,
\qquad\therefore\quad
\frac{\sqrt n\,\{g(T_n)-g(\theta)\}}{g'(T_n)\sigma(T_n)}\xrightarrow{L}N(0,1).
\]
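The following small simulation is a hedged illustration of Theorem 1 (not part of the original derivation): it takes $T_n$ to be the mean of an exponential sample with mean $\theta$, so that $\sigma^2(\theta)=\theta^2$, and $g(t)=\log t$, for which the theorem gives $\sqrt n\,\{g(T_n)-g(\theta)\}\overset{a}{\sim}N(0,1)$.

```python
# Simulation sketch of the delta method (Theorem 1) with g(t) = log t.
# T_n = mean of an Exponential(mean = theta) sample, so sigma^2(theta) = theta^2
# and the asymptotic variance is sigma^2(theta) g'(theta)^2 = 1.  Illustrative only.
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 2.0, 500, 20000
Tn = rng.exponential(scale=theta, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (np.log(Tn) - np.log(theta))
print(z.var())   # should be close to 1 for large n
```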

Generalization of Theorem 1

Theorem 2 Let $T_n=(T_{1n},T_{2n},\ldots,T_{kn})'$, $n=1,2,\ldots$, be a sequence of statistics such that
\[
\sqrt n\,(T_n-\theta)=\bigl(\sqrt n\,(T_{1n}-\theta_1),\ldots,\sqrt n\,(T_{kn}-\theta_k)\bigr)'
\ \overset{a}{\sim}\ N_k\bigl(0,\Sigma(\theta)\bigr),
\qquad
\Sigma(\theta)=\bigl(\bigl(\sigma_{ij}(\theta)\bigr)\bigr)_{k\times k}.
\]
Let $g(\cdot)$ be a function of $k$ variables which is totally differentiable. Then
\[
\sqrt n\,\{g(T_n)-g(\theta)\}\xrightarrow{L}X\sim N_1\bigl(0,V(\theta)\bigr),
\qquad
V(\theta)=\sum_{i=1}^{k}\sum_{j=1}^{k}\frac{\partial g}{\partial\theta_i}\frac{\partial g}{\partial\theta_j}\,\sigma_{ij}(\theta),
\quad
\frac{\partial g}{\partial\theta_i}=\left.\frac{\partial g}{\partial T_{in}}\right|_{T_n=\theta}.
\]

Proof Since $g$ is totally differentiable, by the mean value theorem,
\[
g(T_n)=g(\theta)+\sum_{i=1}^{k}(T_{in}-\theta_i)\frac{\partial g}{\partial\theta_i}
+\epsilon_n\bigl\|T_n-\theta\bigr\|\qquad(\text{A.13})
\]
where $\epsilon_n\to 0$ as $T_{in}\to\theta_i$ for all $i=1(1)k$.

For any given small $\eta>0$ we can find a $\delta>0$, however small, such that $|T_{in}-\theta_i|<\eta\ \Rightarrow\ |\epsilon_n|<\delta$.
\[
\therefore\ P\{|\epsilon_n|<\delta\}\ge P\{|T_{in}-\theta_i|<\eta\}\to 1,
\]
since $\sqrt n\,(T_{in}-\theta_i)\overset{a}{\sim}N$. Hence $T_{in}\xrightarrow{P}\theta_i$ and $\epsilon_n\xrightarrow{P}0$.

Again, since $\sqrt n\,(T_{in}-\theta_i)\overset{a}{\sim}N$ for each $i$,
\[
\sqrt n\,\bigl\|T_n-\theta\bigr\|=\Bigl\{n\sum_{i=1}^{k}(T_{in}-\theta_i)^2\Bigr\}^{1/2}
\]
has an asymptotic distribution; suppose $\sqrt n\,\bigl\|T_n-\theta\bigr\|\xrightarrow{L}Y$. Then, by Result 3(d),
\[
\sqrt n\,\bigl\|T_n-\theta\bigr\|\,\epsilon_n\xrightarrow{P}0.
\]
Therefore (A.13) implies
\[
\sqrt n\,\{g(T_n)-g(\theta)\}-\sqrt n\sum_{i=1}^{k}(T_{in}-\theta_i)\frac{\partial g}{\partial\theta_i}
=\sqrt n\,\epsilon_n\bigl\|T_n-\theta\bigr\|\xrightarrow{P}0,
\]
i.e. $Y_n-X_n\xrightarrow{P}0$, where
\[
Y_n=\sqrt n\,\{g(T_n)-g(\theta)\}\quad\text{and}\quad
X_n=\sqrt n\sum_{i=1}^{k}(T_{in}-\theta_i)\frac{\partial g}{\partial\theta_i}.
\]
We note that $X_n$, being a linear function of the asymptotically normal variables $\sqrt n\,(T_{in}-\theta_i)$, $i=1(1)k$, is asymptotically normal with mean 0 and variance
\[
\sum_{i=1}^{k}\sum_{j=1}^{k}\frac{\partial g}{\partial\theta_i}\frac{\partial g}{\partial\theta_j}\,\sigma_{ij}(\theta)=V(\theta),
\]
i.e. $X_n\xrightarrow{L}X\sim N\bigl(0,V(\theta)\bigr)$.
\[
\therefore\ (Y_n-X_n)+X_n\xrightarrow{L}X\sim N\bigl(0,V(\theta)\bigr),
\quad\text{i.e. } Y_n\xrightarrow{L}X\sim N\bigl(0,V(\theta)\bigr),
\]
i.e. $\sqrt n\,\{g(T_n)-g(\theta)\}\overset{a}{\sim}N\bigl(0,V(\theta)\bigr)$.

A.15 Large Sample Standard Errors of Sample Moments

Let $F(x)$ be a continuous c.d.f. and let $(x_1,x_2,\ldots,x_n)$ be a random sample drawn from it. Write the population moments as
\[
\mu'_r=E(X^r),\quad \mu'_1=\mu,\qquad
\mu_r=E\bigl(X-\mu'_1\bigr)^r=E(X-\mu)^r,
\]
and the sample moments as
\[
m'_r=\frac1n\sum x_i^r,\quad m'_1=\bar x;\qquad
m_r^{0}=\frac1n\sum(x_i-\mu)^r;\qquad
m_r=\frac1n\sum(x_i-\bar x)^r.
\]

(i) To find $E(m'_r)$, $V(m'_r)$, $\mathrm{Cov}(m'_r,m'_s)$:
\[
E(m'_r)=\frac1n\sum_{1}^{n}E\bigl(x_i^r\bigr)=\frac1n\sum_{1}^{n}\mu'_r=\mu'_r,
\]
\[
\mathrm{Cov}(m'_r,m'_s)=E(m'_rm'_s)-\mu'_r\mu'_s
=\frac1{n^2}E\Bigl\{\sum x_i^r\sum x_i^s\Bigr\}-\mu'_r\mu'_s
=\frac1{n^2}\Bigl[\sum E\bigl(x_i^{r+s}\bigr)+\sum\sum_{i\ne j}E\bigl(x_i^r\bigr)E\bigl(x_j^s\bigr)\Bigr]-\mu'_r\mu'_s
=\frac{\mu'_{r+s}-\mu'_r\mu'_s}{n},
\]
\[
\therefore\ V(m'_r)=\frac1n\bigl(\mu'_{2r}-\mu'^2_r\bigr)
\quad\Rightarrow\quad
\frac{\sqrt n\,(m'_r-\mu'_r)}{\sqrt{\mu'_{2r}-\mu'^2_r}}\xrightarrow{L}N(0,1).
\]
This fact can be used for testing hypotheses related to $\mu'_r$.

For $r=1$,
\[
\frac{\sqrt n\,(m'_1-\mu'_1)}{\sqrt{\mu'_2-\mu'^2_1}}=\frac{\sqrt n\,(\bar x-\mu)}{\sigma}\xrightarrow{L}N(0,1).
\]
Since the sample s.d. $s=\sqrt{\frac1n\sum_{i=1}^{n}(x_i-\bar x)^2}$ is a consistent estimator of $\sigma$, $s\xrightarrow{P}\sigma$, i.e. $\frac{\sigma}{s}\xrightarrow{P}1$,
\[
\therefore\ \frac{\sqrt n\,(\bar x-\mu)}{s}=\frac{\sqrt n\,(\bar x-\mu)}{\sigma}\cdot\frac{\sigma}{s}\xrightarrow{L}N(0,1).
\]
(ii) To find $E\bigl(m_r^{0}\bigr)$, $V\bigl(m_r^{0}\bigr)$, $\mathrm{Cov}\bigl(m_r^{0},m_s^{0}\bigr)$:
\[
m_r^{0}=\frac1n\sum_{i=1}^{n}(x_i-\mu)^r\ \Rightarrow\ E\bigl(m_r^{0}\bigr)=\mu_r,
\]
\[
E\bigl(m_r^{0}m_s^{0}\bigr)=\frac1{n^2}E\Bigl\{\sum_{i=1}^{n}(x_i-\mu)^r\sum_{i=1}^{n}(x_i-\mu)^s\Bigr\}
=\frac1{n^2}\Bigl[\sum_{1}^{n}E(x_i-\mu)^{r+s}+\sum\sum_{i\ne j}E(x_i-\mu)^rE(x_j-\mu)^s\Bigr]
=\frac1n\bigl[\mu_{r+s}+(n-1)\mu_r\mu_s\bigr],
\]
\[
\therefore\ \mathrm{Cov}\bigl(m_r^{0},m_s^{0}\bigr)=\frac1n\bigl[\mu_{r+s}+(n-1)\mu_r\mu_s\bigr]-\mu_r\mu_s
=\frac1n\bigl(\mu_{r+s}-\mu_r\mu_s\bigr),
\qquad
V\bigl(m_r^{0}\bigr)=\frac1n\bigl(\mu_{2r}-\mu_r^2\bigr).
\]
We note that $m_r^{0}=\frac1n\sum_{i=1}^{n}(x_i-\mu)^r=\frac1n\sum_i Z_i$, where $Z_i=(x_i-\mu)^r$, $E(Z_i)=\mu_r$ and $V(Z_i)=\mu_{2r}-\mu_r^2$. As $x_1,x_2,\ldots,x_n$ are i.i.d., so are $Z_1,Z_2,\ldots,Z_n$, and hence
\[
\frac{\sqrt n\,(\bar Z-\mu_r)}{\sqrt{\mu_{2r}-\mu_r^2}}\xrightarrow{L}N(0,1),
\quad\text{that is,}\quad
\frac{\sqrt n\,\bigl(m_r^{0}-\mu_r\bigr)}{\sqrt{\mu_{2r}-\mu_r^2}}\overset{a}{\sim}N(0,1).
\]

(iii) To find $E(m_r)$, $V(m_r)$, $\mathrm{Cov}(m_r,m_s)$:
\[
m_r=\frac1n\sum_{i=1}^{n}(x_i-\bar x)^r
=\frac1n\sum\bigl[(x_i-\mu)-(\bar x-\mu)\bigr]^r
=\frac1n\sum\bigl[(x_i-\mu)-m_1^{0}\bigr]^r
\]
\[
=\frac1n\sum\Bigl[(x_i-\mu)^r-\binom r1(x_i-\mu)^{r-1}m_1^{0}+\cdots
+(-1)^{r-1}\binom{r}{r-1}(x_i-\mu)\bigl(m_1^{0}\bigr)^{r-1}+(-1)^r\bigl(m_1^{0}\bigr)^{r}\Bigr]
\]
\[
=m_r^{0}-\binom r1 m_{r-1}^{0}m_1^{0}+\cdots+(-1)^r\bigl(m_1^{0}\bigr)^{r}
=g\bigl(m_1^{0},m_2^{0},\ldots,m_r^{0}\bigr)=g\bigl(m^{0}\bigr).
\]
We have observed that $\sqrt n\,\bigl(m_j^{0}-\mu_j\bigr)\overset{a}{\sim}N\bigl(0,\mu_{2j}-\mu_j^2\bigr)$ for all $j=1(1)r$. Hence
\[
\sqrt n\,\bigl(m^{0}-\mu\bigr)
=\sqrt n\begin{pmatrix}m_1^{0}-\mu_1\\ m_2^{0}-\mu_2\\ \vdots\\ m_r^{0}-\mu_r\end{pmatrix}
\overset{a}{\sim}N_r\bigl(0,\Sigma_{r\times r}\bigr),
\]
where $\Sigma=\bigl(\bigl(\sigma_{ij}(\mu)\bigr)\bigr)$ and
$\sigma_{ij}(\mu)=\mathrm{Cov}\bigl(\sqrt n\,m_i^{0},\sqrt n\,m_j^{0}\bigr)
=n\,\mathrm{Cov}\bigl(m_i^{0},m_j^{0}\bigr)=\mu_{i+j}-\mu_i\mu_j$.

So by Theorem 2,
\[
\sqrt n\,\bigl\{g\bigl(m^{0}\bigr)-g(\mu)\bigr\}\overset{a}{\sim}N\bigl(0,V(\mu)\bigr),
\qquad
V(\mu)=\sum_{i=1}^{r}\sum_{j=1}^{r}\frac{dg}{d\mu_i}\frac{dg}{d\mu_j}\,\sigma_{ij}(\mu),
\quad
\frac{dg}{d\mu_i}=\left.\frac{dg}{dm_i^{0}}\right|_{m^{0}=\mu}.
\]
Now, since $\mu_1=0$,
\[
g(\mu_1,\mu_2,\ldots,\mu_r)=\mu_r-\binom r1\mu_{r-1}\mu_1+\cdots+(-1)^r\mu_1^r=\mu_r,
\]
\[
\frac{dg}{d\mu_1}=-r\mu_{r-1},\qquad
\frac{dg}{d\mu_r}=1,\qquad
\frac{dg}{d\mu_i}=0\ \ \forall\ i=2(1)(r-1).
\]
\[
\therefore\ V(\mu)=\Bigl(\frac{dg}{d\mu_1}\Bigr)^2\sigma_{11}(\mu)+\Bigl(\frac{dg}{d\mu_r}\Bigr)^2\sigma_{rr}(\mu)
+2\Bigl(\frac{dg}{d\mu_1}\Bigr)\Bigl(\frac{dg}{d\mu_r}\Bigr)\sigma_{1r}(\mu)
\]
\[
=r^2\mu_{r-1}^2\bigl(\mu_2-\mu_1^2\bigr)+\bigl(\mu_{2r}-\mu_r^2\bigr)+2(-r\mu_{r-1})\bigl(\mu_{r+1}-\mu_1\mu_r\bigr)
=r^2\mu_{r-1}^2\mu_2+\mu_{2r}-\mu_r^2-2r\mu_{r-1}\mu_{r+1}.
\]
\[
\therefore\ \sqrt n\,(m_r-\mu_r)\overset{a}{\sim}N\bigl(0,V(\mu)\bigr),
\quad\text{i.e. } m_r\overset{a}{\sim}N\Bigl(\mu_r,\ \frac{V(\mu)}{n}\Bigr).
\]
In particular,
for $r=2$: $m_2=s^2\overset{a}{\sim}N\Bigl(\mu_2=\sigma^2,\ \dfrac{\mu_4-\mu_2^2}{n}\Bigr)$, i.e. $s^2\overset{a}{\sim}N\Bigl(\sigma^2,\ \dfrac{\mu_4-\sigma^4}{n}\Bigr)$;
for $r=3$: $m_3\overset{a}{\sim}N\Bigl(\mu_3,\ \dfrac{\mu_6-\mu_3^2+9\mu_2^3-6\mu_2\mu_4}{n}\Bigr)$;
for $r=4$: $m_4\overset{a}{\sim}N\Bigl(\mu_4,\ \dfrac{16\mu_3^2\mu_2+\mu_8-\mu_4^2-8\mu_3\mu_5}{n}\Bigr)$.

Again, if the sampling is from a normal distribution $N(\mu,\sigma^2)$, then $\mu_3=\mu_5=\cdots=0$ and $\mu_{2r}=(2r-1)(2r-3)\cdots 3\cdot 1\,\sigma^{2r}$, i.e. $\mu_4=3\sigma^4$, $\mu_6=15\sigma^6$, $\mu_8=105\sigma^8$.
\[
\therefore\ s^2\overset{a}{\sim}N\Bigl(\sigma^2,\ \frac{2\sigma^4}{n}\Bigr),\qquad
m_3\overset{a}{\sim}N\Bigl(0,\ \frac{6\sigma^6}{n}\Bigr),\qquad
m_4\overset{a}{\sim}N\Bigl(3\sigma^4,\ \frac{96\sigma^8}{n}\Bigr).
\]
Thus $\dfrac{\sqrt n\,(s^2-\sigma^2)}{\sigma^2}\overset{a}{\sim}N(0,2)$ and, as $s^2\xrightarrow{P}\sigma^2$, also $\dfrac{\sqrt n\,(s^2-\sigma^2)}{s^2}\overset{a}{\sim}N(0,2)$; this can be used for testing hypotheses regarding $\sigma^2$.

Note For testing $H_0:\sigma_1=\sigma_2$,
\[
s_1\overset{a}{\sim}N\Bigl(\sigma_1,\ \frac{\sigma_1^2}{2n_1}\Bigr),\qquad
s_2\overset{a}{\sim}N\Bigl(\sigma_2,\ \frac{\sigma_2^2}{2n_2}\Bigr)
\ \Rightarrow\
s_1-s_2\overset{a}{\sim}N\Bigl(\sigma_1-\sigma_2,\ \frac{\sigma_1^2}{2n_1}+\frac{\sigma_2^2}{2n_2}\Bigr).
\]
Under $H_0$, $\tau=\dfrac{s_1-s_2}{\hat\sigma\sqrt{\frac{1}{2n_1}+\frac{1}{2n_2}}}\overset{a}{\sim}N(0,1)$, where the unknown $\sigma$ is estimated as $\hat\sigma=\dfrac{n_1s_1+n_2s_2}{n_1+n_2}$.
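As a hedged numerical sketch of the result $\sqrt n\,(s^2-\sigma^2)/s^2\overset{a}{\sim}N(0,2)$, the Python fragment below (simulated data, hypothetical $\sigma_0^2$) carries out a large-sample test of $H_0:\sigma^2=\sigma_0^2$.

```python
# Sketch: large-sample test of H0: sigma^2 = sigma0^2 using
# tau = sqrt(n)(s^2 - sigma0^2)/(sqrt(2) s^2) ~a N(0, 1).  Data are simulated.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=3.0, size=200)   # hypothetical sample
n = x.size
s2 = x.var(ddof=1)
sigma0_sq = 9.0
tau = np.sqrt(n) * (s2 - sigma0_sq) / (np.sqrt(2.0) * s2)
p_value = 2 * (1 - norm.cdf(abs(tau)))
print(tau, p_value)
```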

(iv) $\mathrm{Cov}(m_r,\bar x)$: since $\bar x=\frac1n\sum_{i=1}^{n}(x_i-\mu)+\mu=m_1^{0}+\mu$ and $m_r\simeq m_r^{0}-r\mu_{r-1}m_1^{0}$,
\[
\mathrm{Cov}(m_r,\bar x)=\mathrm{Cov}\bigl(m_r^{0},m_1^{0}\bigr)-r\mu_{r-1}V\bigl(m_1^{0}\bigr)
=\frac1n\mu_{r+1}-r\mu_{r-1}\frac{\mu_2}{n}
=\frac1n\bigl(\mu_{r+1}-r\mu_{r-1}\mu_2\bigr).
\]
In particular, $\mathrm{Cov}(m_2,\bar x)\simeq\frac1n\mu_3=0$ if the sampling is from $N(\mu,\sigma^2)$.

Note The exact expression is $\mathrm{Cov}(m_2,\bar x)=\dfrac{n-1}{n^2}\mu_3$.
(v) Large-sample distribution of the C.V.
\[
v=\frac{\sqrt{m_2}}{m'_1}=g\bigl(m_2,m'_1\bigr)=g(T_{1n},T_{2n})=g(T_n),
\qquad
T_n=\begin{pmatrix}T_{1n}\\ T_{2n}\end{pmatrix}=\begin{pmatrix}m_2\\ m'_1\end{pmatrix}.
\]
Writing $\theta=\begin{pmatrix}\theta_1\\ \theta_2\end{pmatrix}=\begin{pmatrix}\mu_2\\ \mu'_1\end{pmatrix}$, we have observed that
\[
\sqrt n\,(T_n-\theta)=\sqrt n\begin{pmatrix}m_2-\mu_2\\ m'_1-\mu'_1\end{pmatrix}
\overset{a}{\sim}N_2\bigl(0,\Sigma\bigr),
\qquad
\Sigma=\bigl(\bigl(\sigma_{ij}(\theta)\bigr)\bigr)
=\begin{pmatrix}\mu_4-\mu_2^2&\mu_3\\ \mu_3&\mu_2\end{pmatrix}.
\]
If $g(\theta)=g(\mu_2,\mu'_1)=\dfrac{\sqrt{\mu_2}}{\mu'_1}=V$, the population C.V., then by Theorem 2,
\[
\sqrt n\,\{g(T_n)-g(\theta)\}\overset{a}{\sim}N\bigl(0,V(\theta)\bigr),
\quad\text{i.e. }\sqrt n\,(v-V)\overset{a}{\sim}N\bigl(0,V(\theta)\bigr),
\]
where
\[
V(\theta)=\Bigl(\frac{dg}{d\theta_1}\Bigr)^2\sigma_{11}(\theta)
+\Bigl(\frac{dg}{d\theta_2}\Bigr)^2\sigma_{22}(\theta)
+2\frac{dg}{d\theta_1}\frac{dg}{d\theta_2}\sigma_{12}(\theta).
\]
Now $\dfrac{dg}{d\theta_1}=\dfrac{1}{2\sqrt{\mu_2}\,\mu'_1}$, $\sigma_{11}(\theta)=\mu_4-\mu_2^2$; $\dfrac{dg}{d\theta_2}=-\dfrac{\sqrt{\mu_2}}{\mu'^2_1}$, $\sigma_{22}(\theta)=\mu_2$, $\sigma_{12}(\theta)=\mu_3$. Hence
\[
V(\theta)=\frac{\mu_4-\mu_2^2}{4\mu_2\mu'^2_1}+\frac{\mu_2^2}{\mu'^4_1}-\frac{\mu_3}{\mu'^3_1}
=V^2\Bigl[\frac{\mu_4-\mu_2^2}{4\mu_2^2}+V^2-\frac{\mu_3}{\mu_2\mu'_1}\Bigr].
\]
If the sampling is from $N(\mu,\sigma^2)$, $\mu_2=\sigma^2$, $\mu_3=0$, $\mu_4=3\sigma^4$, then
\[
V(\theta)=V^2\Bigl[\frac{3\sigma^4-\sigma^4}{4\sigma^4}+V^2\Bigr]=\frac{V^2\bigl(1+2V^2\bigr)}{2}.
\]
Thus $\sqrt n\,(v-V)\overset{a}{\sim}N\Bigl(0,\ \dfrac{V^2(1+2V^2)}{2}\Bigr)$.

(vi) Large-sample distribution of skewness
Sample skewness $g_1=\dfrac{m_3}{m_2^{3/2}}=g(m_3,m_2)=g(T_n)=g\begin{pmatrix}T_{1n}\\ T_{2n}\end{pmatrix}$, where $T_{1n}=m_3$ and $T_{2n}=m_2$.
We define $\theta=\begin{pmatrix}\theta_1\\ \theta_2\end{pmatrix}=\begin{pmatrix}\mu_3\\ \mu_2\end{pmatrix}$, so that $g(\theta)=\dfrac{\mu_3}{\mu_2^{3/2}}=\gamma_1$.
We know
\[
\sqrt n\,(T_n-\theta)=\sqrt n\begin{pmatrix}m_3-\mu_3\\ m_2-\mu_2\end{pmatrix}
\overset{a}{\sim}N_2\bigl(0,\Sigma\bigr),
\qquad
\Sigma=\begin{pmatrix}\mu_6-\mu_3^2+9\mu_2^3-6\mu_4\mu_2&\mu_5-4\mu_2\mu_3\\ \mu_5-4\mu_2\mu_3&\mu_4-\mu_2^2\end{pmatrix}.
\]
By Theorem 2, $\sqrt n\,\{g(T_n)-g(\theta)\}\overset{a}{\sim}N\bigl(0,V(\theta)\bigr)$, i.e. $\sqrt n\,(g_1-\gamma_1)\overset{a}{\sim}N\bigl(0,V(\theta)\bigr)$, where
\[
V(\theta)=\Bigl(\frac{dg}{d\theta_1}\Bigr)^2\sigma_{11}(\theta)
+\Bigl(\frac{dg}{d\theta_2}\Bigr)^2\sigma_{22}(\theta)
+2\frac{dg}{d\theta_1}\frac{dg}{d\theta_2}\sigma_{12}(\theta),
\qquad
\frac{dg}{d\theta_1}=\frac{1}{\mu_2^{3/2}},\quad
\frac{dg}{d\theta_2}=-\frac32\frac{\mu_3}{\mu_2^{5/2}}.
\]
\[
\therefore\ V(\theta)=\frac{\mu_6-\mu_3^2+9\mu_2^3-6\mu_4\mu_2}{\mu_2^3}
+\frac94\frac{\mu_3^2\bigl(\mu_4-\mu_2^2\bigr)}{\mu_2^5}
-\frac{3\mu_3\bigl(\mu_5-4\mu_2\mu_3\bigr)}{\mu_2^4}.
\]
If the sampling is from $N(\mu,\sigma^2)$, then $\mu_2=\sigma^2$, $\mu_3=\mu_5=0$, $\mu_4=3\sigma^4$, $\mu_6=15\sigma^6$, and
\[
V(\theta)=\frac{(15+9-18)\sigma^6}{\sigma^6}=6
\quad\Rightarrow\quad
\sqrt n\,(g_1-\gamma_1)\overset{a}{\sim}N(0,6).
\]

(vii) Large-sample distribution of kurtosis
Sample kurtosis $g_2=\dfrac{m_4}{m_2^2}-3=g(m_4,m_2)=g(T_n)=g\begin{pmatrix}T_{1n}\\ T_{2n}\end{pmatrix}$, where $T_{1n}=m_4$ and $T_{2n}=m_2$.
Let $\theta=\begin{pmatrix}\theta_1\\ \theta_2\end{pmatrix}=\begin{pmatrix}\mu_4\\ \mu_2\end{pmatrix}$, so that $g(\theta)=\dfrac{\mu_4}{\mu_2^2}-3=\gamma_2$.
We know
\[
\sqrt n\,(T_n-\theta)=\sqrt n\begin{pmatrix}m_4-\mu_4\\ m_2-\mu_2\end{pmatrix}
\overset{a}{\sim}N_2\bigl(0,\Sigma\bigr),
\qquad
\Sigma=\begin{pmatrix}\mu_8-\mu_4^2+16\mu_3^2\mu_2-8\mu_3\mu_5&\mu_6-\mu_4\mu_2-4\mu_3^2\\ \mu_6-\mu_4\mu_2-4\mu_3^2&\mu_4-\mu_2^2\end{pmatrix}.
\]
By Theorem 2, $\sqrt n\,(g_2-\gamma_2)\overset{a}{\sim}N\bigl(0,V(\theta)\bigr)$, where
\[
V(\theta)=\Bigl(\frac{dg}{d\theta_1}\Bigr)^2\sigma_{11}(\theta)
+\Bigl(\frac{dg}{d\theta_2}\Bigr)^2\sigma_{22}(\theta)
+2\frac{dg}{d\theta_1}\frac{dg}{d\theta_2}\sigma_{12}(\theta),
\qquad
\frac{dg}{d\theta_1}=\frac{1}{\mu_2^2},\quad
\frac{dg}{d\theta_2}=-\frac{2\mu_4}{\mu_2^3}.
\]
\[
V(\theta)=\frac{\mu_8-\mu_4^2+16\mu_3^2\mu_2-8\mu_3\mu_5}{\mu_2^4}
+\frac{4\mu_4^2\bigl(\mu_4-\mu_2^2\bigr)}{\mu_2^6}
-\frac{4\mu_4\bigl(\mu_6-\mu_4\mu_2-4\mu_3^2\bigr)}{\mu_2^5}.
\]
Now, if the sampling is from $N(\mu,\sigma^2)$, $\mu_2=\sigma^2$, $\mu_3=\mu_5=\mu_7=0$, $\mu_4=3\sigma^4$, $\mu_6=15\sigma^6$ and $\mu_8=105\sigma^8$, so
\[
V(\theta)=96+4\cdot 9(3-1)-4\cdot 3(15-3)=24
\quad\Rightarrow\quad
\sqrt n\,(g_2-\gamma_2)\overset{a}{\sim}N(0,24).
\]
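A small sketch of these two asymptotic results under normal sampling is given below (Python, simulated data). Standardizing $g_1$ and $g_2$ by their asymptotic variances 6 and 24 and summing the squares gives a $\chi^2_2$ statistic, which is essentially the idea behind Jarque–Bera type tests of normality; this combination is an illustration, not part of the original text.

```python
# Sketch: standardized sample skewness and kurtosis under normal sampling,
# sqrt(n) g1 ~a N(0, 6) and sqrt(n) g2 ~a N(0, 24).
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=500)
n = x.size
d = x - x.mean()
m2, m3, m4 = (d ** 2).mean(), (d ** 3).mean(), (d ** 4).mean()
g1 = m3 / m2 ** 1.5
g2 = m4 / m2 ** 2 - 3
z1 = np.sqrt(n) * g1 / np.sqrt(6.0)      # approximately N(0, 1)
z2 = np.sqrt(n) * g2 / np.sqrt(24.0)     # approximately N(0, 1)
print(z1, z2, z1 ** 2 + z2 ** 2)         # the sum of squares is approximately chi^2 with 2 d.f.
```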

(viii) Large-sample distribution of bivariate moments

Let $F(x,y)$ be the c.d.f. from which a random sample $(x_1,y_1),(x_2,y_2),\ldots,(x_n,y_n)$ is drawn. We define
\[
m'_{rs}=\frac1n\sum x_i^ry_i^s,\qquad \mu'_{rs}=E\bigl(X^rY^s\bigr),
\]
\[
m'_{10}=\frac1n\sum x_i=\bar x,\quad \mu'_{10}=E(X)=\mu_x;\qquad
m'_{01}=\frac1n\sum y_i=\bar y,\quad \mu'_{01}=E(Y)=\mu_y;
\]
\[
m^{0}_{rs}=\frac1n\sum_{1}^{n}\bigl(x_i-\mu'_{10}\bigr)^r\bigl(y_i-\mu'_{01}\bigr)^s
=\frac1n\sum_{1}^{n}(x_i-\mu_x)^r(y_i-\mu_y)^s,
\]
\[
m_{rs}=\frac1n\sum_{1}^{n}(x_i-\bar x)^r(y_i-\bar y)^s
=\frac1n\sum_{1}^{n}\bigl(x_i-m'_{10}\bigr)^r\bigl(y_i-m'_{01}\bigr)^s,
\]
\[
\mu_{rs}=E(X-\mu_x)^r(Y-\mu_y)^s=E\bigl(X-\mu'_{10}\bigr)^r\bigl(Y-\mu'_{01}\bigr)^s.
\]
Then
\[
E\bigl(m'_{rs}\bigr)=\frac1n\sum_{1}^{n}E\bigl(x_i^ry_i^s\bigr)=\mu'_{rs},
\]
\[
E\bigl(m'_{rs}m'_{uv}\bigr)=\frac1{n^2}E\Bigl\{\sum x_i^ry_i^s\sum x_i^uy_i^v\Bigr\}
=\frac1{n^2}E\Bigl\{\sum_{1}^{n}x_i^{r+u}y_i^{s+v}+\sum\sum_{i\ne j}x_i^ry_i^sx_j^uy_j^v\Bigr\}
=\frac{\mu'_{r+u,s+v}+(n-1)\mu'_{rs}\mu'_{uv}}{n},
\]
\[
\therefore\ \mathrm{Cov}\bigl(m'_{rs},m'_{uv}\bigr)=\frac1n\bigl[\mu'_{r+u,s+v}-\mu'_{rs}\mu'_{uv}\bigr],
\qquad
V\bigl(m'_{rs}\bigr)=\frac1n\bigl[\mu'_{2r,2s}-\mu'^2_{rs}\bigr].
\]
Similarly,
\[
E\bigl(m^{0}_{rs}\bigr)=\mu_{rs},\qquad
E\bigl(m^{0}_{rs}m^{0}_{uv}\bigr)=\frac1n\bigl[\mu_{r+u,s+v}+(n-1)\mu_{rs}\mu_{uv}\bigr],
\]
\[
\therefore\ \mathrm{Cov}\bigl(m^{0}_{rs},m^{0}_{uv}\bigr)=\frac1n\bigl[\mu_{r+u,s+v}-\mu_{rs}\mu_{uv}\bigr],
\qquad
V\bigl(m^{0}_{rs}\bigr)=\frac1n\bigl[\mu_{2r,2s}-\mu_{rs}^2\bigr].
\]
Now
\[
m_{rs}=\frac1n\sum_{1}^{n}(x_i-\bar x)^r(y_i-\bar y)^s
=\frac1n\sum_{1}^{n}\bigl[(x_i-\mu'_{10})-(\bar x-\mu'_{10})\bigr]^r\bigl[(y_i-\mu'_{01})-(\bar y-\mu'_{01})\bigr]^s.
\]
Since $\bar x-\mu'_{10}=\frac1n\sum_{1}^{n}(x_i-\mu'_{10})=m^{0}_{10}$ and $\bar y-\mu'_{01}=\frac1n\sum_{1}^{n}(y_i-\mu'_{01})=m^{0}_{01}$,
\[
m_{rs}=\frac1n\sum_{1}^{n}\bigl[(x_i-\mu'_{10})-m^{0}_{10}\bigr]^r\bigl[(y_i-\mu'_{01})-m^{0}_{01}\bigr]^s.
\]
Expanding the two binomials and collecting terms,
\[
m_{rs}=m^{0}_{rs}-\binom r1 m^{0}_{r-1,s}m^{0}_{10}-\binom s1 m^{0}_{r,s-1}m^{0}_{01}
+\binom r1\binom s1 m^{0}_{r-1,s-1}m^{0}_{10}m^{0}_{01}+\cdots
+(-1)^{r+s}\bigl(m^{0}_{10}\bigr)^r\bigl(m^{0}_{01}\bigr)^s
\]
\[
=g\bigl(m^{0}_{ij};\ i=0(1)r,\ j=0(1)s,\ (i,j)\ne(0,0)\bigr)=g\bigl(m^{0}\bigr),
\]
say, where $m^{0}$ is the $\{(r+1)(s+1)-1\}\times 1$ vector $\bigl(m^{0}_{10},m^{0}_{01},\ldots,m^{0}_{rs}\bigr)'$.

Using the expansion in Taylor's series about $\mu=(\mu_{10},\mu_{01},\ldots,\mu_{rs})'$ and noting that $\mu_{10}=\mu_{01}=0$,
\[
m_{rs}=g(\mu)+\sum_{i=0}^{r}\sum_{\substack{j=0\\ (i,j)\ne(0,0)}}^{s}\bigl(m^{0}_{ij}-\mu_{ij}\bigr)\left.\frac{\partial g}{\partial m^{0}_{ij}}\right|_{m^{0}=\mu}+\cdots
=\mu_{rs}+\sum_i\sum_j\bigl(m^{0}_{ij}-\mu_{ij}\bigr)\left.\frac{\partial g}{\partial m^{0}_{ij}}\right|_{m^{0}=\mu}.
\]
Now
\[
\left.\frac{\partial g}{\partial m^{0}_{10}}\right|_{m^{0}=\mu}=-r\mu_{r-1,s},\qquad
\left.\frac{\partial g}{\partial m^{0}_{01}}\right|_{m^{0}=\mu}=-s\mu_{r,s-1},\qquad
\left.\frac{\partial g}{\partial m^{0}_{rs}}\right|_{m^{0}=\mu}=1,
\]
and $\left.\dfrac{\partial g}{\partial m^{0}_{ij}}\right|_{m^{0}=\mu}=0$ for all other $(i,j)$, $i=0(1)r$, $j=0(1)s$.
\[
\therefore\ m_{rs}\simeq m^{0}_{rs}-r\mu_{r-1,s}m^{0}_{10}-s\mu_{r,s-1}m^{0}_{01},
\]
\[
\therefore\ E(m_{rs})=E\bigl(m^{0}_{rs}\bigr)-r\mu_{r-1,s}E\bigl(m^{0}_{10}\bigr)-s\mu_{r,s-1}E\bigl(m^{0}_{01}\bigr)=\mu_{rs}.
\]
Similarly,
\[
\mathrm{Cov}(m_{rs},m_{uv})
=\mathrm{Cov}\bigl(m^{0}_{rs}-r\mu_{r-1,s}m^{0}_{10}-s\mu_{r,s-1}m^{0}_{01},\
m^{0}_{uv}-u\mu_{u-1,v}m^{0}_{10}-v\mu_{u,v-1}m^{0}_{01}\bigr)
\]
\[
=\frac1n\Bigl[\mu_{r+u,s+v}-\mu_{rs}\mu_{uv}
-u\mu_{u-1,v}\mu_{r+1,s}-v\mu_{u,v-1}\mu_{r,s+1}-r\mu_{r-1,s}\mu_{u+1,v}
+ru\mu_{r-1,s}\mu_{u-1,v}\mu_{20}+rv\mu_{r-1,s}\mu_{u,v-1}\mu_{11}
\]
\[
\qquad\qquad
-s\mu_{r,s-1}\mu_{u,v+1}+us\mu_{r,s-1}\mu_{u-1,v}\mu_{11}+sv\mu_{r,s-1}\mu_{u,v-1}\mu_{02}\Bigr],
\]
\[
\therefore\ V(m_{rs})=\frac1n\Bigl[\mu_{2r,2s}-\mu_{rs}^2
+r^2\mu_{r-1,s}^2\mu_{20}+s^2\mu_{r,s-1}^2\mu_{02}
-2r\mu_{r-1,s}\mu_{r+1,s}-2s\mu_{r,s-1}\mu_{r,s+1}+2rs\mu_{r-1,s}\mu_{r,s-1}\mu_{11}\Bigr].
\]
In particular,
\[
V(m_{20})=\frac1n\bigl[\mu_{40}-\mu_{20}^2\bigr],\qquad
\mathrm{Cov}(m_{20},m_{02})=\frac1n\bigl[\mu_{22}-\mu_{20}\mu_{02}\bigr],
\]
\[
V(m_{02})=\frac1n\bigl[\mu_{04}-\mu_{02}^2\bigr],\qquad
\mathrm{Cov}(m_{20},m_{11})=\frac1n\bigl[\mu_{31}-\mu_{20}\mu_{11}\bigr],
\]
\[
V(m_{11})=\frac1n\bigl[\mu_{22}-\mu_{11}^2\bigr],\qquad
\mathrm{Cov}(m_{02},m_{11})=\frac1n\bigl[\mu_{13}-\mu_{02}\mu_{11}\bigr].
\]
The sample correlation coefficient is
\[
r=\frac{m_{11}}{\sqrt{m_{20}m_{02}}}=g(m)=g(T_n),\quad\text{say,}
\]
where $m=(m_{20},m_{02},m_{11})'$ and $T_n=(T_{1n},T_{2n},T_{3n})'=(m_{20},m_{02},m_{11})'$.
\[
\therefore\ \sqrt n\,(T_n-\theta)\overset{a}{\sim}N_3\bigl(0,\Sigma_{3\times 3}\bigr),
\qquad
\theta=\begin{pmatrix}\theta_1\\ \theta_2\\ \theta_3\end{pmatrix}
=\begin{pmatrix}\mu_{20}\\ \mu_{02}\\ \mu_{11}\end{pmatrix}=\mu,
\quad\text{i.e. }\sqrt n\,(m-\mu)\overset{a}{\sim}N_3\bigl(0,\Sigma\bigr),
\]
with (symmetric matrix)
\[
\Sigma=\bigl(\bigl(\sigma_{ij}(\theta)\bigr)\bigr)
=\begin{pmatrix}
\mu_{40}-\mu_{20}^2&\mu_{22}-\mu_{20}\mu_{02}&\mu_{31}-\mu_{20}\mu_{11}\\
&\mu_{04}-\mu_{02}^2&\mu_{13}-\mu_{02}\mu_{11}\\
&&\mu_{22}-\mu_{11}^2
\end{pmatrix}.
\]
Also $\rho=\dfrac{\mu_{11}}{\sqrt{\mu_{20}\mu_{02}}}=g(\mu)=g(\theta)$, so by Theorem 2,
\[
\sqrt n\,\{g(T_n)-g(\theta)\}\overset{a}{\sim}N\bigl(0,V(\theta)\bigr),
\quad\text{i.e. }\sqrt n\,(r-\rho)\overset{a}{\sim}N\bigl(0,V(\theta)\bigr),
\]
where $V(\theta)=\sum_{i=1}^{3}\sum_{j=1}^{3}\dfrac{\partial g}{\partial\theta_i}\dfrac{\partial g}{\partial\theta_j}\sigma_{ij}(\theta)$ and $\dfrac{\partial g}{\partial\theta_i}=\left.\dfrac{\partial g}{\partial T_{in}}\right|_{T_n=\theta}$.
That is,
\[
V(\theta)=\Bigl(\frac{\partial g}{\partial\theta_1}\Bigr)^2\sigma_{11}(\theta)
+\Bigl(\frac{\partial g}{\partial\theta_2}\Bigr)^2\sigma_{22}(\theta)
+\Bigl(\frac{\partial g}{\partial\theta_3}\Bigr)^2\sigma_{33}(\theta)
+2\frac{\partial g}{\partial\theta_1}\frac{\partial g}{\partial\theta_2}\sigma_{12}(\theta)
+2\frac{\partial g}{\partial\theta_1}\frac{\partial g}{\partial\theta_3}\sigma_{13}(\theta)
+2\frac{\partial g}{\partial\theta_2}\frac{\partial g}{\partial\theta_3}\sigma_{23}(\theta)
\]
\[
=\rho^2\Bigl[\frac{\mu_{22}}{\mu_{11}^2}
+\frac14\Bigl(\frac{\mu_{40}}{\mu_{20}^2}+\frac{\mu_{04}}{\mu_{02}^2}+\frac{2\mu_{22}}{\mu_{20}\mu_{02}}\Bigr)
-\Bigl(\frac{\mu_{31}}{\mu_{20}\mu_{11}}+\frac{\mu_{13}}{\mu_{11}\mu_{02}}\Bigr)\Bigr].
\]
If the sampling is from $N_2\bigl(\mu_1,\mu_2,\sigma_1^2,\sigma_2^2,\rho\bigr)$, then
\[
\mu_{40}=3\sigma_1^4,\quad \mu_{04}=3\sigma_2^4,\quad \mu_{11}=\rho\sigma_1\sigma_2,\quad
\mu_{22}=\sigma_1^2\sigma_2^2\bigl(1+2\rho^2\bigr),\quad
\mu_{13}=3\rho\sigma_1\sigma_2^3,\quad \mu_{31}=3\rho\sigma_1^3\sigma_2,\quad
\mu_{20}=\sigma_1^2,\quad \mu_{02}=\sigma_2^2.
\]
Using these values in the expression of $V(\theta)$, we get $V(\theta)=\bigl(1-\rho^2\bigr)^2$, so that
\[
\sqrt n\,(r-\rho)\overset{a}{\sim}N\bigl(0,(1-\rho^2)^2\bigr),
\quad\text{i.e. } r\overset{a}{\sim}N\Bigl(\rho,\ \frac{(1-\rho^2)^2}{n}\Bigr).
\]
This result can be used for testing hypotheses regarding $\rho$.
(i) $H_0:\rho=\rho_0$; under $H_0$, $\tau=\dfrac{\sqrt n\,(r-\rho_0)}{1-\rho_0^2}\overset{a}{\sim}N(0,1)$.
(ii) $H_0:\rho_1=\rho_2\,(=\rho,\text{ say})$; $r_1\overset{a}{\sim}N\Bigl(\rho_1,\dfrac{(1-\rho_1^2)^2}{n_1}\Bigr)$, $r_2\overset{a}{\sim}N\Bigl(\rho_2,\dfrac{(1-\rho_2^2)^2}{n_2}\Bigr)$
\[
\Rightarrow\ r_1-r_2\overset{a}{\sim}N\Bigl(\rho_1-\rho_2,\ \frac{(1-\rho_1^2)^2}{n_1}+\frac{(1-\rho_2^2)^2}{n_2}\Bigr).
\]
Under $H_0$,
\[
\tau=\frac{r_1-r_2}{(1-\rho^2)\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\overset{a}{\sim}N(0,1).
\]
If $\rho$ is unknown, it is estimated by $\hat\rho=\dfrac{n_1r_1+n_2r_2}{n_1+n_2}$.
If $\rho$ is known the efficiency of the test is good enough, but if it is unknown the efficiency is diminished. We can use the estimate of $\rho$ only when the sample sizes are very large. Otherwise, we transform the statistic so that its distribution is independent of $\rho$.
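The following minimal Python sketch illustrates use (i) above with the untransformed correlation coefficient; the values of $r$, $n$ and $\rho_0$ are hypothetical.

```python
# Sketch: testing H0: rho = rho0 using r ~a N(rho, (1 - rho^2)^2 / n)
import numpy as np
from scipy.stats import norm

r, n, rho0 = 0.62, 100, 0.5                      # hypothetical values
tau = np.sqrt(n) * (r - rho0) / (1 - rho0 ** 2)
p_value = 2 * (1 - norm.cdf(abs(tau)))
print(tau, p_value)
```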

A.16 Transformation of Statistics


If a sequence of statistics $\{T_n\}$ for estimating $\theta$ is such that $\sqrt n\,(T_n-\theta)\overset{a}{\sim}N\bigl(0,\sigma^2(\theta)\bigr)$, then for large samples the normal distribution can be used for testing hypotheses regarding $\theta$ provided $\sigma^2(\theta)$ is independent of $\theta$. Otherwise, it may be necessary to transform the statistic $T_n$ so that the new statistic $g(T_n)$ has an asymptotic variance independent of $\theta$. This is known as transformation of statistics. Another important advantage of such a transformation is that in many cases the distribution of $g(T_n)$ tends more rapidly to normality than $T_n$ itself, so that large-sample tests can be made even for moderately large sample sizes. Also, in the analysis of variance, where the assumption of homoscedasticity is made, such transformations of statistics may be useful.

A general formula We know that if $\{T_n\}$ is a sequence of statistics with $\sqrt n\,(T_n-\theta)\overset{a}{\sim}N\bigl(0,\sigma^2(\theta)\bigr)$, then $\sqrt n\,\{g(T_n)-g(\theta)\}\overset{a}{\sim}N\bigl(0,g'^2(\theta)\sigma^2(\theta)\bigr)$, provided $g(\cdot)$ admits a first derivative and $g'(\theta)\ne 0$.
By equating the standard deviation $g'(\theta)\sigma(\theta)$ to a constant $c$, we get the differential equation
\[
dg(\theta)=\frac{c}{\sigma(\theta)}\,d\theta.
\]
Solving this equation we get $g(\theta)=\int\frac{c}{\sigma(\theta)}\,d\theta+k$, where $k$ is the constant of integration. Using this formula and suitably choosing $c$ and $k$, we can obtain a number of transformations of statistics for different important cases.
I. $\sin^{-1}$ transformation of the square root of the binomial proportion
We know $\sqrt n\,(p-P)\overset{a}{\sim}N\bigl(0,P(1-P)=\sigma^2(P)\bigr)$. We would like to have a function $g(\cdot)$ such that $\sqrt n\,\{g(p)-g(P)\}\overset{a}{\sim}N(0,c^2)$, where $c$ is independent of $P$.
From the differential equation, we have
\[
g(P)=\int\frac{c}{\sigma(P)}\,dP+k
=c\int\frac{dP}{\sqrt{P(1-P)}}+k
=c\cdot 2\theta+k
=2c\sin^{-1}\sqrt P+k\qquad(\text{where }\sin^2\theta=P).
\]
Now, selecting $c=\tfrac12$ and $k=0$, we have $g(P)=\sin^{-1}\sqrt P$.
\[
\therefore\ g(p)=\sin^{-1}\sqrt p\quad\text{and}\quad
\sqrt n\bigl(\sin^{-1}\sqrt p-\sin^{-1}\sqrt P\bigr)\overset{a}{\sim}N\Bigl(0,\ c^2=\tfrac14\Bigr),
\]
i.e.
\[
\sin^{-1}\sqrt p\ \overset{a}{\sim}\ N\Bigl(\sin^{-1}\sqrt P,\ \frac{1}{4n}\Bigr).
\]
This fact can be used for testing hypotheses regarding $P$.

Note Anscombe (1948) has shown that a slightly better transformation, with more stable variance, is $\sin^{-1}\sqrt{\dfrac{p+\frac{3}{8n}}{1+\frac{3}{4n}}}$, which has asymptotic variance $\dfrac{1}{4n+2}$.
4n

Uses: (i) $H_0:P=P_0$
Under $H_0$, $\tau=\bigl(\sin^{-1}\sqrt p-\sin^{-1}\sqrt{P_0}\bigr)\,2\sqrt n\ \overset{a}{\sim}\ N(0,1)$;
$\omega_0:|\tau|>\tau_{\alpha/2}$, where $H_1:P\ne P_0$.
Interval estimate of $P$:
\[
\Pr\Bigl[-\tau_{\alpha/2}\le 2\sqrt n\bigl(\sin^{-1}\sqrt p-\sin^{-1}\sqrt P\bigr)\le\tau_{\alpha/2}\Bigr]=1-\alpha,
\]
i.e.
\[
\Pr\Bigl[\sin^2\Bigl(\sin^{-1}\sqrt p-\frac{\tau_{\alpha/2}}{2\sqrt n}\Bigr)\le P\le
\sin^2\Bigl(\sin^{-1}\sqrt p+\frac{\tau_{\alpha/2}}{2\sqrt n}\Bigr)\Bigr]=1-\alpha.
\]

(ii) $H_0:P_1=P_2\,(=P,\text{ say})$
\[
\sin^{-1}\sqrt{p_1}\overset{a}{\sim}N\Bigl(\sin^{-1}\sqrt{P_1},\ \frac{1}{4n_1}\Bigr),\qquad
\sin^{-1}\sqrt{p_2}\overset{a}{\sim}N\Bigl(\sin^{-1}\sqrt{P_2},\ \frac{1}{4n_2}\Bigr)
\]
\[
\Rightarrow\ \sin^{-1}\sqrt{p_1}-\sin^{-1}\sqrt{p_2}
\overset{a}{\sim}N\Bigl(\sin^{-1}\sqrt{P_1}-\sin^{-1}\sqrt{P_2},\ \frac{1}{4n_1}+\frac{1}{4n_2}\Bigr).
\]
Under $H_0$,
\[
\tau=\frac{\sin^{-1}\sqrt{p_1}-\sin^{-1}\sqrt{p_2}}{\sqrt{\frac{1}{4n_1}+\frac{1}{4n_2}}}\overset{a}{\sim}N(0,1);
\qquad
\omega_0:|\tau|>\tau_{\alpha/2}\ \text{if } H_1:P_1\ne P_2.
\]
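A hedged Python sketch of use (i), the single-proportion test and its confidence interval via the arcsine transformation, is given below; the counts and $P_0$ are hypothetical.

```python
# Sketch: test of H0: P = P0 and confidence interval for P using
# arcsin(sqrt(p)) ~a N(arcsin(sqrt(P)), 1/(4n))
import numpy as np
from scipy.stats import norm

x, n, P0, alpha = 46, 100, 0.5, 0.05             # hypothetical data
p = x / n
tau = 2 * np.sqrt(n) * (np.arcsin(np.sqrt(p)) - np.arcsin(np.sqrt(P0)))
p_value = 2 * (1 - norm.cdf(abs(tau)))
z = norm.ppf(1 - alpha / 2)
lo = np.sin(np.arcsin(np.sqrt(p)) - z / (2 * np.sqrt(n))) ** 2
hi = np.sin(np.arcsin(np.sqrt(p)) + z / (2 * np.sqrt(n))) ** 2
print(tau, p_value, (lo, hi))
```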
If $H_0$ is accepted, then to find the confidence interval for $P$ we pool the two transformed proportions:
\[
\widehat{\sin^{-1}\sqrt P}
=\frac{4n_1\sin^{-1}\sqrt{p_1}+4n_2\sin^{-1}\sqrt{p_2}}{4n_1+4n_2}
=\frac{n_1\sin^{-1}\sqrt{p_1}+n_2\sin^{-1}\sqrt{p_2}}{n_1+n_2},
\]
\[
E\Bigl(\widehat{\sin^{-1}\sqrt P}\Bigr)
=\frac{n_1\sin^{-1}\sqrt{P_1}+n_2\sin^{-1}\sqrt{P_2}}{n_1+n_2}
=\sin^{-1}\sqrt P\quad[\text{as }P_1=P_2=P],
\]
\[
V\Bigl(\widehat{\sin^{-1}\sqrt P}\Bigr)
=\frac{\frac{n_1^2}{4n_1}+\frac{n_2^2}{4n_2}}{(n_1+n_2)^2}
=\frac{1}{4(n_1+n_2)}
\quad\Rightarrow\quad
\widehat{\sin^{-1}\sqrt P}\overset{a}{\sim}N\Bigl(\sin^{-1}\sqrt P,\ \frac{1}{4(n_1+n_2)}\Bigr).
\]
\[
\Pr\Bigl[-\tau_{\alpha/2}\le\Bigl(\widehat{\sin^{-1}\sqrt P}-\sin^{-1}\sqrt P\Bigr)2\sqrt{n_1+n_2}\le\tau_{\alpha/2}\Bigr]=1-\alpha
\]
\[
\Rightarrow\ \Pr\Bigl[\sin^2\Bigl(\widehat{\sin^{-1}\sqrt P}-\frac{\tau_{\alpha/2}}{2\sqrt{n_1+n_2}}\Bigr)\le P\le
\sin^2\Bigl(\widehat{\sin^{-1}\sqrt P}+\frac{\tau_{\alpha/2}}{2\sqrt{n_1+n_2}}\Bigr)\Bigr]=1-\alpha.
\]

If $H_0$ is rejected, then to find the confidence interval for the difference $(P_1-P_2)$:
since $\sin^{-1}\sqrt{p_1}\overset{a}{\sim}N\bigl(\sin^{-1}\sqrt{P_1},\frac{1}{4n_1}\bigr)$,
\[
\Pr\Bigl[\sin^2\Bigl(\sin^{-1}\sqrt{p_1}-\frac{\tau_{\alpha/2}}{2\sqrt{n_1}}\Bigr)\le P_1\le
\sin^2\Bigl(\sin^{-1}\sqrt{p_1}+\frac{\tau_{\alpha/2}}{2\sqrt{n_1}}\Bigr)\Bigr]=1-\alpha,
\]
i.e. $P(A)=1-\alpha$, where $A=\{L_1\le P_1\le U_1\}$ with
\[
L_1=\sin^2\Bigl(\sin^{-1}\sqrt{p_1}-\frac{\tau_{\alpha/2}}{2\sqrt{n_1}}\Bigr),\qquad
U_1=\sin^2\Bigl(\sin^{-1}\sqrt{p_1}+\frac{\tau_{\alpha/2}}{2\sqrt{n_1}}\Bigr).
\]
Similarly, $\sin^{-1}\sqrt{p_2}\overset{a}{\sim}N\bigl(\sin^{-1}\sqrt{P_2},\frac{1}{4n_2}\bigr)$ and $\Pr\{L_2\le P_2\le U_2\}=1-\alpha$, i.e. $P(B)=1-\alpha$ where $B=\{L_2\le P_2\le U_2\}$ with
\[
L_2=\sin^2\Bigl(\sin^{-1}\sqrt{p_2}-\frac{\tau_{\alpha/2}}{2\sqrt{n_2}}\Bigr),\qquad
U_2=\sin^2\Bigl(\sin^{-1}\sqrt{p_2}+\frac{\tau_{\alpha/2}}{2\sqrt{n_2}}\Bigr).
\]
As $P(AB)\ge P(A)+P(B)-1$,
\[
\Pr\{L_1\le P_1\le U_1,\ L_2\le P_2\le U_2\}\ge(1-\alpha)+(1-\alpha)-1
\ \Rightarrow\
\Pr\{L_1-U_2\le P_1-P_2\le U_1-L_2\}\ge 1-2\alpha.
\]
(iii) $H_0:P_1=P_2=\cdots=P_k\,(=P,\text{ say})$
\[
\sin^{-1}\sqrt{p_i}\overset{a}{\sim}N\Bigl(\sin^{-1}\sqrt{P_i},\ \frac{1}{4n_i}\Bigr),\qquad i=1(1)k
\ \Rightarrow\ \text{under } H_0,\quad
\sum_{i=1}^{k}\bigl(\sin^{-1}\sqrt{p_i}-\sin^{-1}\sqrt P\bigr)^2\,4n_i\ \sim\ \chi^2_k.
\]
Estimating $\sin^{-1}\sqrt P$ by $\widehat{\sin^{-1}\sqrt P}=\dfrac{\sum n_i\sin^{-1}\sqrt{p_i}}{\sum n_i}$, we thus have
\[
\chi^2=\sum_{i=1}^{k}\bigl(\sin^{-1}\sqrt{p_i}-\widehat{\sin^{-1}\sqrt P}\bigr)^2\,4n_i\ \sim\ \chi^2_{k-1};
\qquad
\omega_0:\chi^2>\chi^2_{\alpha;k-1}.
\]
If $H_0$ is accepted, then to find the interval estimate of $P$:
\[
E\Bigl(\widehat{\sin^{-1}\sqrt P}\Bigr)=\frac{\sum n_i\sin^{-1}\sqrt{P_i}}{\sum n_i}=\sin^{-1}\sqrt P
\quad[\because P_1=P_2=\cdots=P_k=P],
\qquad
V\Bigl(\widehat{\sin^{-1}\sqrt P}\Bigr)=\frac{\sum\frac{n_i^2}{4n_i}}{\bigl(\sum n_i\bigr)^2}=\frac{1}{4\sum n_i},
\]
\[
\therefore\ \widehat{\sin^{-1}\sqrt P}\overset{a}{\sim}N\Bigl(\sin^{-1}\sqrt P,\ \frac{1}{4\sum n_i}\Bigr),
\]
\[
\Pr\Bigl[\sin^2\Bigl(\widehat{\sin^{-1}\sqrt P}-\frac{\tau_{\alpha/2}}{2\sqrt{\sum n_i}}\Bigr)\le P\le
\sin^2\Bigl(\widehat{\sin^{-1}\sqrt P}+\frac{\tau_{\alpha/2}}{2\sqrt{\sum n_i}}\Bigr)\Bigr]=1-\alpha.
\]

II. Square root transformation of a Poisson variate

If $X\sim P(\lambda)$, then $E(X)=V(X)=\lambda$ and $(X-\lambda)\overset{a}{\sim}N\bigl(0,\lambda=\sigma^2(\lambda)\bigr)$. We would like to have a function $g(\cdot)$ such that $g(X)-g(\lambda)\overset{a}{\sim}N(0,c^2)$, where $c^2$ is independent of $\lambda$.
\[
g(\lambda)=c\int\frac{d\lambda}{\sigma(\lambda)}+k=c\int\frac{d\lambda}{\sqrt\lambda}+k=2c\sqrt\lambda+k.
\]
Taking $k=0$ and $c=1/2$, $g(\lambda)=\sqrt\lambda$.
\[
\therefore\ g(X)=\sqrt X,\quad c^2=\tfrac14,\quad
\sqrt X-\sqrt\lambda\overset{a}{\sim}N\bigl(0,\tfrac14\bigr),
\quad\text{i.e. }\sqrt X\overset{a}{\sim}N\bigl(\sqrt\lambda,\tfrac14\bigr).
\]

Uses: (i) $H_0:\lambda=\lambda_0$
Under $H_0$, $\tau=2\bigl(\sqrt X-\sqrt{\lambda_0}\bigr)\overset{a}{\sim}N(0,1)$; $\omega_0:|\tau|>\tau_{\alpha/2}$, where $H_1:\lambda\ne\lambda_0$.
Interval estimate of $\lambda$:
\[
P\bigl[-\tau_{\alpha/2}\le 2\bigl(\sqrt X-\sqrt\lambda\bigr)\le\tau_{\alpha/2}\bigr]=1-\alpha
\ \Rightarrow\
P\Bigl[\Bigl(\sqrt X-\tfrac12\tau_{\alpha/2}\Bigr)^2\le\lambda\le\Bigl(\sqrt X+\tfrac12\tau_{\alpha/2}\Bigr)^2\Bigr]=1-\alpha.
\]

(ii) $H_0:\lambda_1=\lambda_2\,(=\lambda,\text{ say})$
\[
\sqrt{X_1}\overset{a}{\sim}N\bigl(\sqrt{\lambda_1},\tfrac14\bigr),\quad
\sqrt{X_2}\overset{a}{\sim}N\bigl(\sqrt{\lambda_2},\tfrac14\bigr)
\ \Rightarrow\
\sqrt{X_1}-\sqrt{X_2}\overset{a}{\sim}N\bigl(\sqrt{\lambda_1}-\sqrt{\lambda_2},\tfrac12\bigr).
\]
Under $H_0$, $\tau=\bigl(\sqrt{X_1}-\sqrt{X_2}\bigr)\sqrt2\overset{a}{\sim}N(0,1)$; $\omega_0:|\tau|>\tau_{\alpha/2}$ if $H_1:\lambda_1\ne\lambda_2$.
If $H_0$ is accepted, then to find the confidence interval for $\lambda$:
\[
\widehat{\sqrt\lambda}=\frac{4\sqrt{X_1}+4\sqrt{X_2}}{4+4}=\frac{\sqrt{X_1}+\sqrt{X_2}}{2},
\qquad
E\bigl(\widehat{\sqrt\lambda}\bigr)=\frac{\sqrt\lambda+\sqrt\lambda}{2}=\sqrt\lambda\quad[\because\lambda_1=\lambda_2=\lambda],
\qquad
V\bigl(\widehat{\sqrt\lambda}\bigr)=\frac{\frac14+\frac14}{4}=\frac18.
\]
So $\widehat{\sqrt\lambda}\overset{a}{\sim}N\bigl(\sqrt\lambda,\tfrac18\bigr)$ and
\[
\text{Probability}\Bigl[-\tau_{\alpha/2}\le\bigl(\widehat{\sqrt\lambda}-\sqrt\lambda\bigr)\sqrt8\le\tau_{\alpha/2}\Bigr]=1-\alpha
\ \Rightarrow\
\text{Probability}\Bigl[\Bigl(\widehat{\sqrt\lambda}-\frac{\tau_{\alpha/2}}{\sqrt8}\Bigr)^2\le\lambda\le
\Bigl(\widehat{\sqrt\lambda}+\frac{\tau_{\alpha/2}}{\sqrt8}\Bigr)^2\Bigr]=1-\alpha.
\]
If $H_0$ is rejected, then to find the confidence interval for the difference $(\lambda_1-\lambda_2)$:
since $\sqrt{X_1}\overset{a}{\sim}N\bigl(\sqrt{\lambda_1},\tfrac14\bigr)$,
\[
\text{Probability}\Bigl[\Bigl(\sqrt{X_1}-\frac{\tau_{\alpha/2}}{2}\Bigr)^2\le\lambda_1\le
\Bigl(\sqrt{X_1}+\frac{\tau_{\alpha/2}}{2}\Bigr)^2\Bigr]=1-\alpha,
\]
i.e. $P(A)=1-\alpha$, where $A=\{L_1\le\lambda_1\le U_1\}$ with $L_1=\bigl(\sqrt{X_1}-\frac{\tau_{\alpha/2}}{2}\bigr)^2$ and $U_1=\bigl(\sqrt{X_1}+\frac{\tau_{\alpha/2}}{2}\bigr)^2$.
Similarly, $\sqrt{X_2}\overset{a}{\sim}N\bigl(\sqrt{\lambda_2},\tfrac14\bigr)$ and $P(B)=1-\alpha$, where $B=\{L_2\le\lambda_2\le U_2\}$ with $L_2=\bigl(\sqrt{X_2}-\frac{\tau_{\alpha/2}}{2}\bigr)^2$ and $U_2=\bigl(\sqrt{X_2}+\frac{\tau_{\alpha/2}}{2}\bigr)^2$.
As $P(AB)\ge P(A)+P(B)-1$,
\[
\text{Probability}\{L_1\le\lambda_1\le U_1,\ L_2\le\lambda_2\le U_2\}\ge(1-\alpha)+(1-\alpha)-1
\ \Rightarrow\
\text{Probability}\{L_1-U_2\le\lambda_1-\lambda_2\le U_1-L_2\}\ge 1-2\alpha.
\]

(iii) $H_0:\lambda_1=\lambda_2=\cdots=\lambda_k\,(=\lambda,\text{ say})$
\[
\sqrt{X_i}\overset{a}{\sim}N\bigl(\sqrt{\lambda_i},\tfrac14\bigr),\quad i=1(1)k
\ \Rightarrow\ \text{under } H_0,\quad
\sum_{i=1}^{k}\bigl(\sqrt{X_i}-\sqrt\lambda\bigr)^2\cdot 4\ \sim\ \chi^2_k.
\]
Estimating $\sqrt\lambda$ by $\widehat{\sqrt\lambda}=\dfrac{\sum\sqrt{X_i}}{k}$, we then have
\[
\chi^2=\sum_{i=1}^{k}\bigl(\sqrt{X_i}-\widehat{\sqrt\lambda}\bigr)^2\cdot 4\ \sim\ \chi^2_{k-1};
\qquad
\omega_0:\chi^2>\chi^2_{\alpha;k-1}.
\]
If $H_0$ is accepted, then to find the interval estimate of $\lambda$:
\[
E\bigl(\widehat{\sqrt\lambda}\bigr)=\frac{\sum\sqrt{\lambda_i}}{k}=\sqrt\lambda\quad[\text{as }\lambda_1=\lambda_2=\cdots=\lambda_k=\lambda],
\qquad
V\bigl(\widehat{\sqrt\lambda}\bigr)=\frac{\sum V\bigl(\sqrt{X_i}\bigr)}{k^2}=\frac{1}{4k},
\]
\[
\therefore\ \widehat{\sqrt\lambda}\overset{a}{\sim}N\Bigl(\sqrt\lambda,\ \frac{1}{4k}\Bigr),
\qquad
\tau=\bigl(\widehat{\sqrt\lambda}-\sqrt\lambda\bigr)\sqrt{4k}\ \overset{a}{\sim}\ N(0,1),
\]
\[
\therefore\ \text{Probability}\Bigl[\Bigl(\widehat{\sqrt\lambda}-\frac{\tau_{\alpha/2}}{2\sqrt k}\Bigr)^2\le\lambda\le
\Bigl(\widehat{\sqrt\lambda}+\frac{\tau_{\alpha/2}}{2\sqrt k}\Bigr)^2\Bigr]=1-\alpha.
\]

Note It can be shown that
\[
E\bigl(\sqrt X\bigr)=\sqrt\lambda+O\Bigl(\frac{1}{\sqrt\lambda}\Bigr),\qquad
V\bigl(\sqrt X\bigr)=\frac14+O\Bigl(\frac{1}{\lambda}\Bigr),
\]
whereas
\[
E\Bigl(\sqrt{X+3/8}\Bigr)=\sqrt{\lambda+3/8}+O\Bigl(\frac{1}{\sqrt\lambda}\Bigr)
\quad\text{and}\quad
V\Bigl(\sqrt{X+3/8}\Bigr)=\frac14+O\Bigl(\frac{1}{\lambda^2}\Bigr).
\]
Comparing $V\bigl(\sqrt X\bigr)$ and $V\bigl(\sqrt{X+3/8}\bigr)$, we observe that $\sqrt{X+3/8}$ is a better transformation than $\sqrt X$.
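The next Python sketch illustrates use (i) of the square-root transformation, with a hypothetical count $X$ and null value $\lambda_0$; the $\sqrt{X+3/8}$ variant noted above could be substituted for slightly better variance stabilization.

```python
# Sketch: test of H0: lambda = lambda0 and an approximate confidence interval,
# using sqrt(X) ~a N(sqrt(lambda), 1/4)
import numpy as np
from scipy.stats import norm

X, lam0, alpha = 36, 25, 0.05                    # hypothetical values
tau = 2 * (np.sqrt(X) - np.sqrt(lam0))
p_value = 2 * (1 - norm.cdf(abs(tau)))
z = norm.ppf(1 - alpha / 2)
lo, hi = (np.sqrt(X) - z / 2) ** 2, (np.sqrt(X) + z / 2) ** 2
print(tau, p_value, (lo, hi))
```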
III. Logarithmic transformation of the sample variance for $N(\mu,\sigma^2)$
\[
s^2=\frac{1}{n-1}\sum(x_i-\bar x)^2,\qquad
E(s^2)=\sigma^2,\quad V(s^2)\simeq\frac{2\sigma^4}{n}.
\]
Also $E(S^2)\to\sigma^2$ and $V(S^2)\simeq\frac{2\sigma^4}{n}$ for $S^2=\frac{\sum(x_i-\bar x)^2}{n}$.
\[
\therefore\ s^2\overset{a}{\sim}N\Bigl(\sigma^2,\ \frac{2\sigma^4}{n}\Bigr).
\]
We would like to get a function $g(\cdot)$ such that $g(s^2)\overset{a}{\sim}N\bigl(g(\sigma^2),c^2\bigr)$, where $c^2$ is independent of $\sigma^2$.
\[
g(\sigma^2)=\int\frac{c}{\sqrt{2\sigma^4/n}}\,d\sigma^2+k
=c\sqrt{\frac n2}\int\frac{d\sigma^2}{\sigma^2}+k
=c\sqrt{\frac n2}\log_e\sigma^2+k.
\]
Choosing $k=0$ and $c=\sqrt{2/n}$, we get $g(\sigma^2)=\log_e\sigma^2$.
\[
\therefore\ g(s^2)=\log_e s^2\overset{a}{\sim}N\Bigl(\log_e\sigma^2,\ \frac2n\Bigr).
\]

Uses: (i) $H_0:\sigma^2=\sigma_0^2$
Under $H_0$, $\tau=\sqrt{\dfrac n2}\bigl(\log s^2-\log\sigma_0^2\bigr)\overset{a}{\sim}N(0,1)$;
$\omega_0:|\tau|>\tau_{\alpha/2}$ if $H_1:\sigma^2\ne\sigma_0^2$.
The interval estimate of $\sigma^2$ is given by
\[
\text{Probability}\Bigl[-\tau_{\alpha/2}\le\sqrt{\frac n2}\bigl(\log s^2-\log\sigma^2\bigr)\le\tau_{\alpha/2}\Bigr]=1-\alpha,
\]
i.e.
\[
\text{Probability}\Bigl[e^{\log s^2-\sqrt{2/n}\,\tau_{\alpha/2}}\le\sigma^2\le e^{\log s^2+\sqrt{2/n}\,\tau_{\alpha/2}}\Bigr]=1-\alpha.
\]
(ii) H0 : r21 ¼ r22 ð¼r2 Þsay


log s log s2 a
2 2
Under H0 : s ¼ p1ffiffiffiffiffiffiffiffiffi  N ð0; 1Þ
2 2
n1 þn
2

w0 : jsj  sa=2 if H1 : r21 6¼ r22 :


If $H_0$ is accepted, then the interval estimate of the common variance $\sigma^2$ can be obtained from $\widehat{\log\sigma^2}$, where
\[
\widehat{\log\sigma^2}=\frac{\frac{n_1}{2}\log s_1^2+\frac{n_2}{2}\log s_2^2}{\frac{n_1}{2}+\frac{n_2}{2}}
=\frac{n_1\log s_1^2+n_2\log s_2^2}{n_1+n_2},
\qquad
E\bigl(\widehat{\log\sigma^2}\bigr)=\log\sigma^2,\quad
V\bigl(\widehat{\log\sigma^2}\bigr)=\frac{2}{n_1+n_2},
\]
\[
\widehat{\log\sigma^2}\overset{a}{\sim}N\Bigl(\log\sigma^2,\ \frac{2}{n_1+n_2}\Bigr)
\ \Rightarrow\
\text{Probability}\Bigl[e^{\widehat{\log\sigma^2}-\sqrt{\frac{2}{n_1+n_2}}\,\tau_{\alpha/2}}\le\sigma^2\le
e^{\widehat{\log\sigma^2}+\sqrt{\frac{2}{n_1+n_2}}\,\tau_{\alpha/2}}\Bigr]=1-\alpha.
\]
If $H_0$ is rejected, then a confidence interval for $(\sigma_1^2-\sigma_2^2)$ can be obtained in the following way:
\[
\text{Probability}\Bigl[e^{\log s_1^2-\sqrt{2/n_1}\,\tau_{\alpha/2}}\le\sigma_1^2\le e^{\log s_1^2+\sqrt{2/n_1}\,\tau_{\alpha/2}}\Bigr]=1-\alpha,
\]
i.e. $P(A)=1-\alpha$, where $A=\{L_1\le\sigma_1^2\le U_1\}$ with
$L_1=e^{\log s_1^2-\sqrt{2/n_1}\,\tau_{\alpha/2}}$ and $U_1=e^{\log s_1^2+\sqrt{2/n_1}\,\tau_{\alpha/2}}$.
Also,
\[
\text{Probability}\Bigl[e^{\log s_2^2-\sqrt{2/n_2}\,\tau_{\alpha/2}}\le\sigma_2^2\le e^{\log s_2^2+\sqrt{2/n_2}\,\tau_{\alpha/2}}\Bigr]=1-\alpha,
\]
i.e. $P(B)=1-\alpha$, where $B=\{L_2\le\sigma_2^2\le U_2\}$.
As $P(AB)\ge P(A)+P(B)-1$,
\[
\text{Probability}\{L_1\le\sigma_1^2\le U_1,\ L_2\le\sigma_2^2\le U_2\}\ge(1-\alpha)+(1-\alpha)-1,
\]
or
\[
\text{Probability}\{L_1-U_2\le\sigma_1^2-\sigma_2^2\le U_1-L_2\}\ge 1-2\alpha.
\]
(iii) $H_0:\sigma_1^2=\sigma_2^2=\cdots=\sigma_k^2\,(=\sigma^2,\text{ say})$
\[
\log_e s_i^2\overset{a}{\sim}N\Bigl(\log_e\sigma_i^2,\ \frac{2}{n_i}\Bigr)
\ \Rightarrow\ \text{under } H_0,\quad
\chi^2=\sum_{i=1}^{k}\frac{n_i}{2}\bigl(\log_e s_i^2-\widehat{\log_e\sigma^2}\bigr)^2\ \overset{a}{\sim}\ \chi^2_{k-1},
\]
where $\widehat{\log_e\sigma^2}=\dfrac{\sum n_i\log_e s_i^2}{\sum n_i}$;
\[
\therefore\ \omega_0:\chi^2>\chi^2_{\alpha;k-1}.
\]

If $H_0$ is accepted, then the interval estimate of $\sigma^2$ can be obtained as follows:
\[
\widehat{\log_e\sigma^2}\overset{a}{\sim}N\Bigl(\log_e\sigma^2,\ \frac{2}{\sum n_i}\Bigr)
\ \Rightarrow\
\text{Probability}\Bigl[-\tau_{\alpha/2}\le\frac{\widehat{\log_e\sigma^2}-\log_e\sigma^2}{\sqrt{2/\sum n_i}}\le\tau_{\alpha/2}\Bigr]=1-\alpha.
\]
So,
\[
\text{Probability}\Bigl[e^{\widehat{\log_e\sigma^2}-\sqrt{\frac{2}{\sum n_i}}\,\tau_{\alpha/2}}\le\sigma^2\le
e^{\widehat{\log_e\sigma^2}+\sqrt{\frac{2}{\sum n_i}}\,\tau_{\alpha/2}}\Bigr]=1-\alpha.
\]
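A short Python sketch of use (ii), the two-sample comparison of variances on the log scale, follows; the two samples are simulated purely for illustration.

```python
# Sketch: test of H0: sigma1^2 = sigma2^2 using
# tau = (log s1^2 - log s2^2) / sqrt(2/n1 + 2/n2) ~a N(0, 1)
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
x1 = rng.normal(0.0, 2.0, size=120)              # simulated samples
x2 = rng.normal(0.0, 2.5, size=150)
n1, n2 = x1.size, x2.size
s1_sq, s2_sq = x1.var(ddof=1), x2.var(ddof=1)
tau = (np.log(s1_sq) - np.log(s2_sq)) / np.sqrt(2.0 / n1 + 2.0 / n2)
p_value = 2 * (1 - norm.cdf(abs(tau)))
print(tau, p_value)
```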

IV. Logarithmic transformation of the sample s.d.
\[
s\overset{a}{\sim}N\Bigl(\sigma,\ \frac{\sigma^2}{2n}\Bigr).
\]
We would like to get a $g(\cdot)$ such that $g(s)\overset{a}{\sim}N\bigl(g(\sigma),c^2\bigr)$, where $c^2$ is independent of $\sigma$.
\[
g(\sigma)=\int\frac{c}{\sigma/\sqrt{2n}}\,d\sigma+k=c\sqrt{2n}\,\log_e\sigma+k.
\]
Choosing $c=\frac{1}{\sqrt{2n}}$ and $k=0$, we have $g(\sigma)=\log_e\sigma$.
\[
\therefore\ g(s)=\log_e s\overset{a}{\sim}N\Bigl(\log_e\sigma,\ \frac{1}{2n}\Bigr).
\]
We may use this result for testing hypotheses related to $\sigma$.
 
V. $z$-transformation of the sample correlation coefficient from $N_2\bigl(\mu_1,\mu_2,\sigma_1^2,\sigma_2^2,\rho\bigr)$
\[
E(r)\simeq\rho\quad\text{and}\quad V(r)\simeq\frac{(1-\rho^2)^2}{n}
\ \Rightarrow\ r\overset{a}{\sim}N\Bigl(\rho,\ \frac{(1-\rho^2)^2}{n}\Bigr).
\]
We would like to get a function $g(\cdot)$ such that $g(r)$ is asymptotically normal with variance independent of $\rho$:
\[
g(\rho)=\int\frac{c\sqrt n}{1-\rho^2}\,d\rho+k=c\sqrt n\cdot\frac12\log_e\frac{1+\rho}{1-\rho}+k.
\]
We choose $c=\frac{1}{\sqrt n}$ and $k=0$, and then
\[
g(\rho)=\frac12\log_e\frac{1+\rho}{1-\rho}=\tanh^{-1}\rho=\xi\ \text{(say)},
\qquad
g(r)=\frac12\log_e\frac{1+r}{1-r}=\tanh^{-1}r=Z\ \text{(say)},
\]
\[
\therefore\ Z=g(r)\overset{a}{\sim}N\Bigl(\xi,\ \frac1n\Bigr).
\]
Note Putting $Z-\xi=y$, the distribution of $y$ may be derived using the distribution of $r$. The first four moments were found by Fisher and later revised by Gayen (1951). In fact,
\[
E(Z)=\xi+\frac{\rho}{2(n-1)}+O\Bigl(\frac{1}{n^2}\Bigr),
\qquad
\mu_2(Z)=\frac{1}{n-1}+\frac{4-\rho^2}{2(n-1)^2}+O\Bigl(\frac{1}{n^3}\Bigr).
\]
Now,
\[
\frac{1}{n-3}=\frac{1}{(n-1)\bigl(1-\frac{2}{n-1}\bigr)}
=\frac{1}{n-1}\Bigl[1+\frac{2}{n-1}+\frac{4}{(n-1)^2}+O\Bigl(\frac{1}{n^3}\Bigr)\Bigr]
=\frac{1}{n-1}+\frac{2}{(n-1)^2}+O\Bigl(\frac{1}{n^3}\Bigr).
\]
Again,
\[
\mu_2(Z)=\frac{1}{n-1}+\frac{2}{(n-1)^2}-\frac{\rho^2}{2(n-1)^2}+O\Bigl(\frac{1}{n^3}\Bigr)
=\frac{1}{n-3}-\frac{\rho^2}{2(n-1)^2}+O\Bigl(\frac{1}{n^3}\Bigr)
\ \Rightarrow\ \mu_2(Z)\simeq\frac{1}{n-3}.
\]
In fact, $V(Z)\simeq\frac1n$ for large $n$ and $\simeq\frac{1}{n-3}$ for moderately large $n$. Therefore, for moderately large $n$,
\[
Z=\tanh^{-1}r=\frac12\log_e\frac{1+r}{1-r}\ \overset{a}{\sim}\ N\Bigl(\xi,\ \frac{1}{n-3}\Bigr),
\qquad
\xi=\tanh^{-1}\rho=\frac12\log_e\frac{1+\rho}{1-\rho}.
\]
Uses: (i) $H_0:\rho=\rho_0$ against $H_1:\rho\ne\rho_0$
$\Leftrightarrow H_0:\xi=\xi_0$ against $H_1:\xi\ne\xi_0$, where $\xi_0=\frac12\log_e\frac{1+\rho_0}{1-\rho_0}$.
Under $H_0$, $\tau=(Z-\xi_0)\sqrt{n-3}\overset{a}{\sim}N(0,1)$; $\omega_0:|\tau|\ge\tau_{\alpha/2}$.
Also, the $100(1-\alpha)\%$ confidence interval for $\rho$ is obtained as follows:
\[
\text{Probability}\Bigl[-\tau_{\alpha/2}\le\sqrt{n-3}\,(Z-\xi)\le\tau_{\alpha/2}\Bigr]=1-\alpha,
\qquad
\text{Probability}\Bigl[Z-\frac{\tau_{\alpha/2}}{\sqrt{n-3}}\le\xi\le Z+\frac{\tau_{\alpha/2}}{\sqrt{n-3}}\Bigr]=1-\alpha
\]
\[
\Rightarrow\ \text{Probability}\Bigl[e^{2\bigl(Z-\frac{\tau_{\alpha/2}}{\sqrt{n-3}}\bigr)}\le\frac{1+\rho}{1-\rho}\le
e^{2\bigl(Z+\frac{\tau_{\alpha/2}}{\sqrt{n-3}}\bigr)}\Bigr]=1-\alpha,
\]
i.e.
\[
\text{Probability}\left[\frac{e^{2\bigl(Z-\frac{\tau_{\alpha/2}}{\sqrt{n-3}}\bigr)}-1}{e^{2\bigl(Z-\frac{\tau_{\alpha/2}}{\sqrt{n-3}}\bigr)}+1}
\le\rho\le
\frac{e^{2\bigl(Z+\frac{\tau_{\alpha/2}}{\sqrt{n-3}}\bigr)}-1}{e^{2\bigl(Z+\frac{\tau_{\alpha/2}}{\sqrt{n-3}}\bigr)}+1}\right]=1-\alpha.
\]

(ii) $H_0:\rho_1=\rho_2\,(=\rho,\text{ say})\ \Leftrightarrow\ H_0:\xi_1=\xi_2\,(=\xi,\text{ say})$
\[
Z_1=\tanh^{-1}r_1\overset{a}{\sim}N\Bigl(\xi_1,\ \frac{1}{n_1-3}\Bigr),
\qquad
Z_2=\tanh^{-1}r_2\overset{a}{\sim}N\Bigl(\xi_2,\ \frac{1}{n_2-3}\Bigr)
\]
\[
\Rightarrow\ Z_1-Z_2\overset{a}{\sim}N\Bigl(\xi_1-\xi_2,\ \frac{1}{n_1-3}+\frac{1}{n_2-3}\Bigr).
\]
Under $H_0$,
\[
\tau=\frac{Z_1-Z_2}{\sqrt{\frac{1}{n_1-3}+\frac{1}{n_2-3}}}\overset{a}{\sim}N(0,1);
\qquad
\omega_0:|\tau|\ge\tau_{\alpha/2}\ \text{if } H_1:\rho_1\ne\rho_2.
\]
If $H_0$ is accepted, a $100(1-\alpha)\%$ confidence interval for $\xi$ is obtained from
\[
\hat Z=\frac{(n_1-3)Z_1+(n_2-3)Z_2}{(n_1-3)+(n_2-3)}=\frac{(n_1-3)Z_1+(n_2-3)Z_2}{n_1+n_2-6},
\qquad
E\bigl(\hat Z\bigr)=\xi=\frac12\log_e\frac{1+\rho}{1-\rho},\quad
V\bigl(\hat Z\bigr)=\frac{1}{n_1+n_2-6},
\]
\[
\therefore\ \text{Probability}\Bigl[\hat Z-\frac{\tau_{\alpha/2}}{\sqrt{n_1+n_2-6}}\le\frac12\log_e\frac{1+\rho}{1-\rho}\le
\hat Z+\frac{\tau_{\alpha/2}}{\sqrt{n_1+n_2-6}}\Bigr]=1-\alpha.
\]
We can get the $100(1-\alpha)\%$ confidence interval for $\rho$ from this.
If $H_0$ is rejected, then a $100(1-2\alpha)\%$ confidence interval for $(\rho_1-\rho_2)$ can be obtained as follows:
\[
\text{Probability}\left[\frac{e^{2\bigl(Z_1-\frac{\tau_{\alpha/2}}{\sqrt{n_1-3}}\bigr)}-1}{e^{2\bigl(Z_1-\frac{\tau_{\alpha/2}}{\sqrt{n_1-3}}\bigr)}+1}
\le\rho_1\le
\frac{e^{2\bigl(Z_1+\frac{\tau_{\alpha/2}}{\sqrt{n_1-3}}\bigr)}-1}{e^{2\bigl(Z_1+\frac{\tau_{\alpha/2}}{\sqrt{n_1-3}}\bigr)}+1}\right]=1-\alpha,
\]
or $P(A)=1-\alpha$, where $A=\{L_1\le\rho_1\le U_1\}$ with
\[
L_1=\frac{e^{2\bigl(Z_1-\frac{\tau_{\alpha/2}}{\sqrt{n_1-3}}\bigr)}-1}{e^{2\bigl(Z_1-\frac{\tau_{\alpha/2}}{\sqrt{n_1-3}}\bigr)}+1},
\qquad
U_1=\frac{e^{2\bigl(Z_1+\frac{\tau_{\alpha/2}}{\sqrt{n_1-3}}\bigr)}-1}{e^{2\bigl(Z_1+\frac{\tau_{\alpha/2}}{\sqrt{n_1-3}}\bigr)}+1}.
\]
Similarly, we get $P(B)=1-\alpha$, where $B=\{L_2\le\rho_2\le U_2\}$ with
\[
L_2=\frac{e^{2\bigl(Z_2-\frac{\tau_{\alpha/2}}{\sqrt{n_2-3}}\bigr)}-1}{e^{2\bigl(Z_2-\frac{\tau_{\alpha/2}}{\sqrt{n_2-3}}\bigr)}+1},
\qquad
U_2=\frac{e^{2\bigl(Z_2+\frac{\tau_{\alpha/2}}{\sqrt{n_2-3}}\bigr)}-1}{e^{2\bigl(Z_2+\frac{\tau_{\alpha/2}}{\sqrt{n_2-3}}\bigr)}+1}.
\]
As $P(AB)\ge P(A)+P(B)-1$,
\[
\text{Probability}\{L_1\le\rho_1\le U_1,\ L_2\le\rho_2\le U_2\}\ge(1-\alpha)+(1-\alpha)-1
\ \Rightarrow\
\text{Probability}\{L_1-U_2\le\rho_1-\rho_2\le U_1-L_2\}\ge 1-2\alpha.
\]

(iii) $H_0:\rho_1=\rho_2=\cdots=\rho_k\,(=\rho)\ \Leftrightarrow\ H_0:\xi_1=\xi_2=\cdots=\xi_k\,(=\xi)$
\[
Z_i=\frac12\log_e\frac{1+r_i}{1-r_i}\overset{a}{\sim}N\Bigl(\xi_i,\ \frac{1}{n_i-3}\Bigr).
\]
Under $H_0$,
\[
\chi^2=\sum_{1}^{k}(n_i-3)\bigl(Z_i-\hat\xi\bigr)^2\ \overset{a}{\sim}\ \chi^2_{k-1},
\qquad
\hat\xi=\frac{\sum(n_i-3)Z_i}{\sum(n_i-3)};
\qquad
\omega_0:\chi^2>\chi^2_{\alpha;k-1}.
\]
If $H_0$ is accepted, then a $100(1-\alpha)\%$ confidence interval for $\xi$ (and subsequently for $\rho$) can be obtained as follows:
\[
E\bigl(\hat\xi\bigr)=\xi,\qquad
V\bigl(\hat\xi\bigr)=\frac{1}{\sum(n_i-3)}=\frac{1}{\sum n_i-3k},
\]
\[
\therefore\ \text{Probability}\Bigl[-\tau_{\alpha/2}\le\sqrt{\textstyle\sum n_i-3k}\,\bigl(\hat\xi-\xi\bigr)\le\tau_{\alpha/2}\Bigr]=1-\alpha.
\]

This provides an interval estimate of $\xi$ and thus of $\rho$.
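The following Python sketch illustrates use (ii) of the $z$-transformation, comparing two sample correlations and, if $H_0$ is not rejected, pooling them for a confidence interval for the common $\rho$; $r_1$, $r_2$, $n_1$, $n_2$ are hypothetical.

```python
# Sketch: test of H0: rho1 = rho2 via Z_i = arctanh(r_i) ~a N(xi_i, 1/(n_i - 3)),
# followed by a pooled confidence interval for the common rho.
import numpy as np
from scipy.stats import norm

r1, n1, r2, n2, alpha = 0.55, 80, 0.35, 120, 0.05   # hypothetical values
Z1, Z2 = np.arctanh(r1), np.arctanh(r2)
tau = (Z1 - Z2) / np.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
p_value = 2 * (1 - norm.cdf(abs(tau)))
Z_hat = ((n1 - 3) * Z1 + (n2 - 3) * Z2) / (n1 + n2 - 6)
z = norm.ppf(1 - alpha / 2)
lo = np.tanh(Z_hat - z / np.sqrt(n1 + n2 - 6))
hi = np.tanh(Z_hat + z / np.sqrt(n1 + n2 - 6))
print(tau, p_value, (lo, hi))
```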


References

Agresti, A.: Categorical Data Analysis. Wiley, New York (1990)


Aigner D.J.: Basic Econometrics, Prentice-Hall (1971)
Aitken, M., Anderson, D., Francis, B., Hinde, J.: Statistical Modelling in GLIM. Clarendon Press,
Oxford (1989)
Akahira, M., Takeuchi, K.: Asymptotic Efficiency of Statistical Estimators. Springer-Verlag, New
York (1981)
Allen, R.G.D.: Statistics for Economics. Hutchinson Universal Library (1951)
Amari, S.-I., Barndorff-Nielsen, O.E., Kass, R.E., Lauritzen, S.L., Rao, C.R.: Differential geometry
and statistical inference. Institute of Mathematical Statistics, Hayward (1987)
Anderson, T.W.: An Introduction to multivariate analysis. Wiley, New York (1958)
Anderson, T.W.: Introduction to Multivariate Statistical Analysis, 2nd edn. Wiley, New York
(1984)
Arnold, S.F.: The Theory of Linear Models and Multivariate Analysis. Wiley, New York (1981)
Arnold, S.F.: Sufficiency and invariance. Statist. Prob. Lett. 3, 275–279 (1985)
Ash, R.: Real Analysis and Probability. Academic Press, New York (1972)
Bahadur, R.R.: Some Limit Theorems in Statistics. SIAM, Philadelphia (1971)
Bar-Lev, S.K., Enis, P.: On the classical choice of variance stabilizing transformations and an
application for a Poisson variate. Biometrika 75, 803–804 (1988)
Barlow, R., Proschan, F.: Statistical Theory of Life Testing. Holt, Rinehart and Winston, New
York (1975)
Barnard, G.A.: Pivotal Inference and the Bayesian Controversy (with discussion). In: Bernardo, J.
M., DeGroot, M.H., Lindley, D.V., Smith, A.F.M. (eds.) Bayesian Statistics. University Press,
Valencia (1980)
Barnard, G.A.: Pivotal inference. In: Johnson, N.L., Kota, S., Reade, C. (eds.) Encyclopedia
Statistical Sciences. Wiley, New York (1985)
Barndorff-Nielsen, O.: Information and Exponential Families in Statistical Theory. Wiley, New
York (1978)
Barndorff-Nielsen, O.: Conditionality resolutions. Biometrika 67, 293–310 (1980)
Barndorff-Nielsen, O.: On a formula for the distribution of the maximum likelihood estimator.
Biometrika 70, 343–365 (1983)
Barndorff-Nielsen, O.: Parametric Statistical Models and Likelihood. Lecture Notes in Statistics
50. Springer, New York (1988)
Barndorff-Nielsen, O., Blaesild, P.: Global maxima, and likelihood in linear models. Research
Rept. 57. Department of Theoretical Statistics, University of Aarhus (1980)
Barndorff-Nielsen, O., Cox, D.R.: Inference and Asymptotics. Chapman & Hall (1994)
Barr, D.R., Zehna, P.W.: Probability: Modeling Uncertainty. Addison-Wesley, Reading (1983)
Barnett, V.: Comparitive Statistical Inference, 2nd edn. Wiley, New York (1989)
Barnett, V., Lewis, T.: Outliers in Statistics. John Wiley (1978)


Barnett, V.D.: Evaluation of the maximum likelihood estimator where the likelihood equation has
multiple roots. Biometrika 53, 151–166 (1966)
Barron, A.R.: Uniformly powerful goodness of fit test. Ann. Statist. 17, 107–124 (1982)
Basawa, I.V., Prakasa Rao, B.L.S.: Statistical Inference in Stochastic Processes. Academic Press,
London (1980)
Basu, D.: A note on the theory of unbiased estimation. Ann. Math. Statist. 26, 345-348. Reprinted
as Chapter XX of Statistical Information and Likelihood: A Collection of Critical Essays.
Springer-Verlag, New York (1955a)
Basu, D.: On statistics independent of a complete sufficient statistic. Sankhya 15, 377-380.
Reprinted as Chapter XXII of Statistical Information and Likelihood: A Collection of Critical
Essays. Springer-Verlag, New York (1955b)
Basu, D.: The concept of asymptotic efficiency. Sankhya 17, 193–196. Reprinted as Chapter XXI
of Statistical Information and Likelihood: A Collection of Critical Essays. Springer-Verlag,
New York (1956)
Basu, D.: Statistical Information and Likelihood: A Collection of Critical Essays (J.K. Ghosh, ed.).
Springer-Verlag, New York (1988)
Bayes, T.: An essay toward solving a problem in the doctrine of chances. Phil. Trans. Roy. Soc.
153, 370–418 (1763). Reprinted in (1958) Biometrika 45, 293-315
Berger, J.: Selecting a minimax estimator of a multivariate normal mean. Ann. Statist. 10, 81–92
(1982)
Berger, J.O.: The Robust Bayesian Viewpoint (with discussion). In: Kadane, J. (ed.) Robustness of Bayesian Analysis, pp. 63–144. North-Holland, Amsterdam (1984)
Berger, J.O.: Statistical Decision Theory and Bayesian Analysis, 2nd edn. Springer-Verlag, New York (1985)
Berger, J.O.: Estimation in continuous exponential families: Bayesian estimation subject to risk
restrictions. In: Gupta, S.S., Berger, J.O. (eds.) Statistical Decision Theory III. Academic Press,
New York (1982b)
Berger, J.O.: Statistical Decision Theory and Bayesian Analysis, 2nd edn. Springer-Verlag, New
York (1985)
Berger, J.O., Bernardo, J.M. On the development of reference priors. In: Berger, J.O., Bernardo, J.
M. (eds.) Bayesian Statist. 4. Clarendon Press, London (1992a)
Berger, J.O., Bernardo, J.M.: Ordered group reference priors with application to multinomial
probabilities. Biometrika 79, 25–37 (1992b)
Berger, J.O., Wolpert, R.W.: The Likelihood Principle, 2nd edn. Institute of Mathematical
Statistics, Hayward, CA (1988)
Bernardo, J.M., Smith, A.F.M.: Bayesian Theory. Wiley, New York (1994)
Berndt, E.R.: The practice of Econometrics: Classic and Contemporary. Addison and Wesley
(1991)
Bhattacharya, G.K., Johnson, R.A.: Statistical Concepts and Methods. John Wiley (1977)
Bhatttachayra, R., Denker, M.: Asymptotic Statistics. Birkhauser-Verlag, Basel (1990)
Bickel, P.J.: Minimax estimation of the mean of a normal distribution subject to doing well at a
point. In: Rizvi, M.H., Rustagi, J.S., Siegmund, D., (eds.) Recent Advances in Statistics: Papers
in Honor of Herman Chernoff on his Sixtieth Birthday. Academic Press, New York (1983)
Bickel, P.J., Klaassen, P., Ritov, C.A.J., Wellner, J.: Efficient and Adaptive Estimation for
Semiparametric Models. Johns Hopkins University Press, Baltimore (1993)
Billingsley, P.: Probability and Measure, 3rd edn. Wiley, New York (1995)
Blackwell, D., Girshick, M.A.: Theory of Games and Statistical Decision. John Wiley and Sons,
New York (1954)
Blackwell, D., Girshick, M.A.: Theory of Games and Statistical Decisions. Wiley, New York
(1954)
Blyth, C.R.: Maximum probability estimation in small samples. In: Bickel, P.J., Doksum, K.A.,
Hodges, J.L., Jr., (eds.) Festschrift for Erich Lehmann. Wadsworth and Brooks/Cole. Pacific
Grove, CA (1982)

Bock, M.E.: Employing vague inequality information in the estimation of normal mean vectors
(Estimators that shrink toward closed convex polyhedra). In: Gupta, S.S., Berger, J.O. (eds.).
Statistical Decision Theory III. Academic Press, New York (1982)
Bock, M.E.: Shrinkage estimators: Pseudo-Bayes estimators for normal mean vectors. In: Gupta,
S.S., Berger, J.O. (eds.) Statistical Decision Theory IV. Springer-Verlag, New York (1988)
Bradley, R.A., Gart, J.J.: The asymptotic properties of ML estimators when sampling for
associated populations. Biometrika 49, 205–214 (1962)
Brandwein, A.C., Strawderman, W.E.: Minimax estimation of location parameters for spherically
symmetric unimodal distributions under quadratic loss. Ann. Statist. 6, 377–416 (1978)
Bridge, J.I.: Applied Econometrics. North Holland, Amsterdam (1971)
Brockwell, P.J., Davis, R.A.: Time Series: Theory and Methods. Springer-Verlag, New York
(1987)
Brown, D., Rothery, P.: Models in Biology: Mathematics, Statistics and Computing. Wiley, New
York (1993)
Brown, L.D.: Fundamentals of Statistical Exponential Families with Applications in Statistical
Decision Theory. Institute of Mathematical Statistics Lecture Notes–Monograph Series.
Hayward, CA: IMS (1986)
Brown, L.D.: Fundamentals of Statistical Exponential Families. Institute of Mathematical
Statistics, Hayward, CA (1986)
Brown, L.D.: Commentary on paper [19]. Kiefer Collected Papers, Supple-mentary Volume.
Springer-Verlag, New York (1986)
Brown, L.D.: Minimaxity, more or less. In: Gupta, S.S., Berger, J.O. (eds.) Statistical Decision
Theory and Related Topics V. Springer-Verlag, New York (1994)
Brown, L.D., Hwang, J.T.: A unified admissibility proof. In: Gupta, S.S., Berger, J.O. (eds.)
Statistical Decision Theory III. Academic Press, New York (1982)
Bucklew, J.A.: Large Deviation Techniques in Decision, Simulation and Estimation (1990)
Burdick, R.K., Graybill, F.A.: Confidence Intervals on Variance Components. Marcel Dekker,
New York (1992)
Carlin, B.P., Louis, T.A.: Bayes and Empirical Bayes Methods for Data Analysis. Chapman &
Hall, London (1996)
Carroll, R.J., Ruppert, D., Stefanski, L.: Measurment Error in Nonlinear Models. Chapman & Hall,
London (1995)
Carroll, R.J., and Ruppert, D.: Transformation and Weighting in Regression. Chapman and Hall,
London (1988)
Carroll, R.J., Ruppert, D., Stefanski, L.A.: Measurement Error in Non-linear Models. Chapman
and HaIl, London (1995)
Casella, G., Berger, R.L.: Statistical Inference.: Wadsworth/Brooks Cole. Pacific Grove, CA
(1990)
Cassel, C., Särndal, C., Wretman, J.H.: Foundations of Inference in Survey Sampling. Wiley, New York (1977)
Casella, G., Robert, C.P.: Rao-Blackwellisation of sampling schemes. Biometrika 83, 81–94 (1996)
Chatterji, S., Price, B.: Regression Analysis by Example. John Wiley (1991)
Chatterji, S.D.: A remark on the Cramér–Rao inequality. In: Kallianpur, G., Krishnaiah, P.R.,
Ghosh, J.K. (eds.) Statistics and Probability: Essays in Honor of C. R. Rao. North Holland,
New York (1982)
Chaudhuri, A., Mukerjee, R.: Randomized Response: Theory and Techniques. Marcel Dekker,
New York (1988)
Chhikara, R.S., Folks, J.L.: The Inverse Gaussian Distribution: Theory, Methodology, and Applications. Marcel Dekker, New York (1989)

Chow, G.C.: Test of equality between sets of coefficient in two linear regressions. Econometrica
28(3), 591–605 (1960)
Chow, G.C.: EconometricMethods. McGraw-Hill, New York (1983)
Christensen, R.: Plane Answers to Complex Questions: The Theory of Linear Models, 2nd edn.
Springer-Verlag, New York (1987)
Christensen, R.: Log-linear Models. Springer-Verlag, New York (1990)
Christensen, R.: Plane Answers to Complex Questions. The Theory of Linear Models, 2nd edn.
Springer-Verlag, New York (1996)
Christopher, A.H.: Interpreting and using regression. Sage Publication (1982)
Chung, K.L.: A Course in Probability Theory. Academic Press, New York (1974)
Chung, K.L.: A Course in Probability Theory, 2nd edn. Academic Press, New York (1974)
Chung, K.L.: A Course in Probability Theory, Harcourt. Brace & World, New York (1968)
Cleveland, W.S.: The Elements of Graphing Data. Wadsworth. Monterey (1985)
Clevensen, M.L., Zidek, J.: Simultaneous estimation of the mean of independent Poisson laws.
J. Amer. Statist. Assoc. 70, 698–705 (1975)
Clopper, C.J., Pearson, E.S.: The Use of Confidence or Fiducial Limits Illustrated in the Case of
the Binomial. Biometrika 26, 404–413 (1934)
Cochran, W.G.: Sampling technique. Wiley Eastern Limited, New Delhi (1985)
Cochran, W.G.: Sampling Techniques, 3rd edn. Wiley, New York (1977)
Cox, D.R.: The Analysis of Binary Data. Methuen, London (1970)
Cox, D.R.: Partial likelihood. Biometrika 62, 269–276 (1975)
Cox, D.R., Oakes, D.O.: Analysis of Survival Data. Chapman & Hall, London (1984)
Cox, D.R., Hinkley, D.V.: Theoretical Statistics. Chapman and Hall, London (1974)
Cramér, H.: Mathematical Methods of Statistics. Princeton University Press, Princeton (1946a)
Crow, E.L., Shimizu, K.: Lognormal Distributions: Theory and Practice. Marcel Dekker, New
York (1988)
Crowder, M.J., Sweeting, T.: Bayesian inference for a bivariate binomial distribution. Biometrika
76, 599–603 (1989)
Croxton, F.E., Cowden, D.J.: Applied General Statistics. Prentice-Hall (1964)
Daniels, H.E.: Exact saddlepoint approximations. Biometrika 67, 59–63 (1980)
DasGupta, A.: Bayes minimax estimation in multiparameter families when the parameter space is
restricted to a bounded convex set. Sankhya 47, 326–332 (1985)
DasGupta, A.: An examination of Bayesian methods and inference: In search of the truth.
Technical Report, Department of Statistics, Purdue University (1994)
DasGupta, A., Rubin, H.: Bayesian estimation subject to minimaxity of the mean of a multivariate
normal distribution in the case of a common unknown variance. In: Gupta, S.S., Berger, J.O.
(eds.) Statistical Decision Theory and Related Topics IV. Springer-Verlag, New York (1988)
Datta, G.S., Ghosh, J.K.: On priors providing frequentist validity for Bayesian inference.
Biometrika 82, 37–46 (1995)
Dean, A., Voss, D.: Design and Analysis of Experiments.: Springer- Verlag, New York (1999)
deFinetti, B.: Probability, Induction, and Statistics. Wiley, New York (1972)
DeGroot, M.: Optimal Statistical Decisions. McGraw-Hill, New York (1970)
DeGroot, M.H.: Probability and Statistics, 2nd edn. Addison-Wesley, New York (1986)
Raj, D., Chandhok, P.: Samlpe Survey Theory. Narosa Publishing House, New Delhi (1999)
Devroye, L.: Non-Uniform Random Variate Generation. Springer-Verlag, New York (1985)
Devroye, L., Gyoerfi, L.: Nonparametric Density Estimation: The L1 View. Wiley, New York
(1985)
Diaconis, P.: Theories of data analysis, from magical thinking through classical statistics. In:
Hoaglin, F. Mosteller, J (eds.) Exploring Data Tables, Trends and Shapes (1985)
Diaconis, P.: Group Representations in Probability and Statistics. Institute of Mathematical
Statistics, Hayward (1988)
Dobson, A.J.: An Introduction to Generalized Linear Models. Chapman & Hall, London (1990)
Draper, N.R., Smith, H.: Applied Regression Analysis, 3rd edn. Wiley, New York (1998)

Draper, N.R., Smith, H.: Applied Regression Analysis. John Wiley and Sons, New York (1981)
Dudley, R.M.: Real Analysis and Probability. Wadsworth and Brooks/Cole, Pacific Grove, CA
(1989)
Durbin, J.: Estimation of parameters in time series regression model. J. R. Statis. Soc.-Ser-B, 22,
139–153 (1960)
Dutta, M.: Econometric Methods. South Western Publishing Company, Cincinnati (1975)
Edwards, A.W.F.: Likelihood. Johns Hopkins University Press, Baltimore (1992)
Efron, B., Hinkley, D.: Assessing the accuracy of the maximum likelihood estimator: Observed vs.
expected Fisher information. Biometrica 65, 457–481 (1978)
Efron, B., Morris, C.: Empirical Bayes on vector observations-An extension of Stein’s method.
Biometrika 59, 335–347 (1972)
Efron, B., Tibshirani, R.J.: An Introduction to the Bootstrap. Chapman and Hall, London (1993)
Everitt, B.S.: The Analysis of Contingency Tables, 2nd edn. Chapman & Hall, London (1992)
Everitt, B.S.: The Analysis of Contingency Tables. John Wiley (1977)
Everitt, B.S., Hand, D.J.: Finite Mixture Distributions. Chapman & Hall, London (1981)
Ezekiel, M., Fox, K.A.: Methods of Correlation and Regression Analysis. John Wiley (1959)
Faith, R.E.: Minimax Bayes point and set estimators of a multivariate normal mean. Ph.D. thesis,
Department of Statistics, University of Michigan (1976)
Feller, W.: An Introduction to Probability Theory and Its Applications, vol. 1, 3rd edn. Wiley,
New York (1968)
Feller, W.: An Introduction to Probability Theory and Its Applications, vol. II. Wiley, New York
(1971)
Feller, W.: An Introduction to Probability Theory and Its Applications, vol II, 2nd edn. John
Wiley, New York (1971)
Ferguson, T.S.: Mathematical Statistics: A Decision Theoretic Approach. Academic Press, New
York (1967)
Ferguson, T.S.: A Course in Large Sample Theory. Chapman and Hall, London (1996)
Ferguson, T.S.: Mathematical Statistics. Academic Press, New York (1967)
Field, C.A., Ronchetti, E.: Small Sample Asymptotics. Institute of Mathematical Statistics.
Hayward, CA (1990)
Finney, D.J.: Probit Analysis. Cambridge University Press, New York (1971)
Fisher, R.A.: Statistical Methods and Scientific Inference, 2nd edn. Hafner, New York. Reprinted
(1990). Oxford University Press, Oxford (1959)
Fisz, M.: Probability Theory and Mathematical Statistics, 3rd edn. John Wiley (1963)
Fleming, T.R., Harrington, D.P.: Counting Processes and Survival Analysis. Wiley, New York
(1991)
Fox, K.: Intermediate Economic Statistics. Wiley (1968)
Fraser, D.A.S.: The Structure of Inference. Wiley, New York (1968)
Fraser, D.A.S.: Inference and Linear Models. McGraw-Hill, New York (1979)
Fraser, D.A.S.: Nonparametric Methods in Statistics. John Wiley, New York (1965)
Freedman, D., Pisani, R., Purves, R., Adhikari, A.: Statistics, 2nd edn. Norton, New York (1991)
Freund, J.E.: Mathematical Statistics. Prentice-Hall of India (1992)
Fuller, W.A.: Measurement Error Models. Wiley, New York (1987)
Gelman, A., Carlin, J., Stern, H., Rubin, D.B.: Bayesian Data Analysis. Chapman & Hall, London
(1995)
Ghosh, J.K., Mukerjee, R.: Non-informative priors (with discussion). In: Bernardo, J.M., Berger, J.
O., Dawid, A.P., Smith, A.F.M. (eds.) Bayesian Statistics IV. Oxford University Press, Oxford
(1992)
Ghosh, J.K., Subramanyam, K.: Second order efficiency of maximum likelihood estimators.
Sankhya A 36, 325–358 (1974)
Ghosh, M., Meeden, G.: Admissibility of the MLE of the normal integer mean. Wiley, New York
(1978)
Gibbons, J.D.: Nonparametric Inference. McGraw-Hill, New York (1971)

Gibbons, J.D., Chakrabarty, S.: Nonparametric Methods for Quantitative Analysis. American
Sciences Press (1985)
Gilks, W.R., Richardson, S., Spiegelhalter, D.J. (eds.): Markov Chain Monte Carlo in Practice.
Chapman & Hall, London (1996)
Girshick, M.A., Savage, L.J.: Bayes and minimax estimates for quadratic loss functions.
University of California Press (1951)
Glejser, H.: A new test for heteroscedasticity, J. Am. Stat. Assoc. 64 316–323 (1969)
Gnedenko, B.V.: The Theory of Probability. MIR Publishers, Moscow (1978)
Godambe, V.P.: Estimating Functions. Clarendon Press, UK (1991)
Goldberg, S.: Probability, an Introduction. Prentice-Hall (1960)
Goldberger, A.S.: Econometric Theory. John Wiley and Sons, New York (1964)
Goldfield, S.M., Quandt, R.E.: Nonlinear methods in Econometrics. North Holland Publishing
Company, Amsterdam (1972)
Goon, A.M., Gupta, M.K., Dasgupta, B.: Fundamentals of Statistics, vol. 1. World Press. Kolkata
(1998)
Goon, A.M., Gupta, M.K., Dasgupta, B.: Fundamentals of Statistics, vol. 2. World Press. Kolkata,
India (1998)
Goon, A.M., Gupta, M.K., Dasgupta, B.: Outline of Statistics, vol. 1. World Press. Kolkata (1998)
Goon, A.M., Gupta, M.K., Dasgupta, B.: Outline of Statistics, vol. 2. World Press. Kolkata (1998)
Granger, C.W.J., Mowbold, P.: R2 and the transformation of regression variables. J. Econ. 4 205–
210 (1976)
Granger, C.W.J.: Investigating Causal Relations by Econometric Models and Cross- spectral
Methods. Econometrica, 424–438 (1969)
Graybill, F.A.: Introduction to Linear Statistical Models, vol. 1. Mc-Graw Hill Inc., New York
(1961)
Gujarati, D.N.: Basic Econometrics. McGraw-Hill, Inc., Singapore (1995)
Gupta, S.C., Kapoor, V.K.: Fundamentals of Mathematical Statistics. Sultan Chand and Sons. New
Delhi (2002)
Haberman, S.J.: The Analysis of Frequency Data. University of Chicago Press. Chicago (1974)
Hall, P.: Pseudo-likelihood theory for empirical likelihood. Ann. Statist. 18, 121–140 (1990)
Hall, P.: The Bootstrap and Edgeworth Expansion. Springer-Verlag, New York (1992)
Halmos, P.R.: Measure Theory. Van Nostrand, New York (1950)
Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., Stahel, W.A.: Robust Statistics: The Approach
Based on Influence Functions. Wiley, New York (1986)
Hardy, G.H., Littlewood, J.E., Polya, G.: Inequalities, 2nd edn. Cambridge University Press,
London (1952)
Hedayat, A.S., Sinha, B.K.: Design and Inference in Finite Population Sampling. Wiley, New
York (1991)
Helms, L.: Introduction to Potential Theory. Wiley, New York (1969)
Hinkley, D.V., Reid, N., Snell, L.: Statistical Theory and Modelling. In honor of Sir David Cox.
Chapman & Hall, London (1991)
Hobert, J.: Occurrences and consequences of nonpositive Markov chains in Gibbs sampling. Ph.D.
Thesis, Biometrics Unit, Cornell University (1994)
Hogg, R.V., Craig, A.T.: Introduction to Mathematical Statistics. Amerind (1972)
Hollander, M., Wolfe, D.A.: Nonparametric Statistical Methods. John Wiley (1973)
Hsu, J.C.: Multiple Comparisons: Theory and Methods. Chapman and Hall, London (1996)
Huber, P.J.: Robust Statistics. Wiley, New York (1981)
Hudson, H.M.: Empirical Bayes estimation. Technical Report No. 58, Department of Statistics,
Stanford University ((1974))
Ibragimov, I.A., Has’minskii, R.Z.: Statistical Estimation: Asymptotic Theory. Springer-Verlag,
New York (1981)

James, W., Stein, C.: Estimation with Quadratic Loss. In: Proceedings of the Fourth Berkeley
Symposium on Mathematical Statistics and Probability 1, pp. 361–380. University of
California Press, Berkeley (1961)
Johnson, N.L., Kotz, S., Kemp, A.W.: Univariate Discrete Distributions, 2nd edn. Wiley, New
York (1992)
Johnson, N.L., Kotz, S.: Distributions in Statistics (4 vols.). Wiley, New York (1969–1972)
Johnson, N.L., Kotz, S., Balakrishnan, N.: Continuous Univariate Distri- butions, vol 1, 2nd edn.
Wiley, New York (1994)
Johnson, N.L., Kotz. S., Balakrishnan, N.: Continuous Univariate Distributions, vol. 2, 2d edn.
Wiley, New York (1995)
Johnston, J.: Econometric Methods, 3rd edn. Mc-Grawl-Hill Book Company (1985)
Kagan, A.M., Linnik, YuV, Rao, C.R.: Characterization Problems in Mathe- matical Statistics.
Wiley, New York (1973)
Kalbfleisch, J.D., Prentice, R.L.: The Statistical Analysis of Failure Time Data. Wiley, New York
(1980)
Kane, E.J.: Econmic Statistics and Econometrics. Harper International (1968)
Kapoor, J.N., Saxena, H.C.: Mathematical Statistics. S Chand and Co (Pvt) Ltd, New Delhi (1973)
Kelker, D.: Distribution Theory of Spherical Distributions and a Location-Scale Parameter
Generalization. Sankhya. Ser. A 32, 419–430 (1970)
Kempthorne, P.: Dominating inadmissible procedures using compromise decision theory. In:
Gupta, S.S., Berger, J.O. (eds.) Statistical Decision Theory IV. Springer-Verlag, New York
(1988a)
Kendall, M.G., Stuart, A.: The Advance Theory of Statistics vol. 3, 2nd edn. Charles Griffin and
Company Limited, London (1968)
Kendall, M., Stuart, A.: The Advance Theory of Statistics, vol. 2. Charles Griffin and Co. Ltd.,
London (1973)
Kendall, M., Stuart, A.: The Advance Theory of Statistics. vol 1. Charles Griffin and Co. Ltd.,
London (1977)
Kendall, M.G.: Rank Correlation Methods, 3rd edn. Griffin, London (1962)
Kendall, M., Stuart, A.: The Advanced Theory of Statistics, Volume II: Inference and
Relationship, 4th edn. Macmillan, New York (1979)
Kiefer, J.: Multivariate optimality results. In: Krishnaiah, P. (ed.) Multivariate Analysis. Academic
Press, New York (1966)
Kirk, R.E.: Experimental Design: Procedures for the Behavioral Sciences, 2nd edn. Brooks/Cole,
Pacific Grove (1982)
Klein, L.R., Shinkai, Y.: An Econometric Model of Japan, 1930–1959. International Economic
Review 4, 1–28 (1963)
Klein, L.R.: An Introduction to Econometrics. Prentice-Hall (1962)
Kmenta, J.: Elements of Econometrics, 2nd edn. Macmillan, New York (1986)
Kolmogorov, A.N., Fomin, S.V.: Elements of the Theory of Functions and Functional Analysis,
vol. 2. Graylock Press, Albany, New York (1961)
Koroljuk, V.S., Borovskich, Yu.V.: Theory of U-Statistics. Kluwer Academic Publishers, Boston
(1994)
Koutsoyiannis, A.: Theory of Econometrics. Macmillan Press Ltd., London (1977)
Kraft, C.H., Van Eeden, C.: A Nonparametric Introduction to Statistics. Macmillan, New York
(1968)
Kramer, J.S.: The Logit Model for Economists. Edward Arnold Publishers, London (1991)
Kuehl, R.O.: Design of Experiments: Statistical Principles of Research Design and Analysis, 2nd
edn. Duxbury, Pacific Grove (2000)
Lane, D.A.: Fisher, Jeffreys, and the nature of probability. In: Fienberg, S.E., Hinkley, D.V. (eds.)
R.A. Fisher: An Appreciation. Lecture Notes in Statistics 1. Springer-Verlag, New York (1980)
Lange, N., Billard, L., Conquest, L., Ryan, L., Brillinger, D., Greenhouse, J. (eds.): Case Studies in
Biometry. Wiley-Interscience, New York (1994)
Le Cam, L.: Maximum Likelihood: An Introduction. Lecture Notes in Statistics No. 18. University
of Maryland, College Park, Md (1979)
Le Cam, L.: Asymptotic Methods in Statistical Decision Theory. Springer-Verlag, New York
(1986)
Le Cam, L., Yang, G.L.: Asymptotics in Statistics: Some Basic Concepts. Springer-Verlag, New
York (1990)
Lehmann, E.L.: Testing Statistical Hypotheses, 2nd edn. Wiley, New York (1986)
Lehmann, E.L.: Elements of Large-Sample Theory. Springer-Verlag, New York (1999)
Lehmann, E.L.: Testing Statistical Hypotheses. John Wiley, New York (1959)
Lehmann, E.L., Scholz, F.W.: Ancillarity. In: Ghosh, M., Pathak, P.K. (eds.) Current Issues in
Statistical Inference: Essays in Honor of D. Basu. Institute of Mathematical Statistics, Hayward
(1992)
Lehmann, E.L., Casella, G.: Theory of Point Estimation, 2nd edn. Springer-Verlag, New York
(1998)
LePage, R., Billard, L.: Exploring the Limits of Bootstrap. Wiley, New York (1992)
Leser, C.: Econometric Techniques and Problems. Griffin (1966)
Letac, G., Mora, M.: Natural real exponential families with cubic variance functions. Ann. Statist.
18, 1–37 (1990)
Lindgren, B.W.: Statistical Theory, 2nd edn. The Macmillan Company, New York (1968)
Lindley, D.V.: The Bayesian analysis of contingency tables. Ann. Math. Statist. 35, 1622–1643
(1964)
Lindley, D.V.: Introduction to Probability and Statistics. Cambridge University Press, Cambridge
(1965)
Lindley, D.V.: Introduction to Probability and Statistics from a Bayesian Viewpoint. Part 2.
Inference. Cambridge University Press, Cambridge (1965)
Little, R.J.A., Rubin, D.B.: Statistical Analysis with Missing Data. Wiley, New York (1987)
Liu, C., Rubin, D.B.: The ECME algorithm: A simple extension of EM and ECM with faster
monotone convergence. Biometrika 81, 633–648 (1994)
Loeve, M.: Probability Theory, 3rd edn. Van Nostrand, Princeton (1963)
Luenberger, D.G.: Optimization by Vector Space Methods. Wiley, New York (1969)
Lukacs, E.: Characteristic Functions, 2nd edn. Hafner, New York (1970)
Lukacs, E.: Probability and Mathematical Statistics. Academic Press, New York (1972)
Maddala, G.S.: Limited Dependent and Qualitative Variables in Econometrics. Cambridge University
Press, New York (1983)
Madnani, J.M.K.: Introduction to Econometrics: Principles and Applications, 4th edn. Oxford and
IBH Publishing Co. Pvt. Ltd (1988)
Maritz, J.S., Lwin, T.: Empirical Bayes Methods, 2nd edn. Chapman & Hall, London (1989)
Marshall, A., Olkin, I.: Inequalities—Theory of Majorization and its Applications. Academic
Press, New York (1979)
McCullagh, P.: Quasi-likelihood and estimating functions. In: Hinkley, D., Reid, N., Snell, L.
(eds.) Statistical Theory and Modelling: In Honor of Sir David Cox. Chapman & Hall, London
(1991)
McCullagh, P., Nelder, J.A.: Generalized Linear Models, 2nd edn. Chapman & Hall, London
(1989)
McLachlan, G., Krishnan, T.: The EM Algorithm and Extensions. Wiley, New York (1997)
McLachlan, G.: Recent Advances in Finite Mixture Models. Wiley, New York (1997)
McLachlan, G., Basford, K.: Mixture Models: Inference and Applications to Clustering. Marcel
Dekker, New York (1988)
McPherson, G.: Statistics in Scientific Investigation. Springer-Verlag, New York (1990)
Meng, X.-L., Rubin, D.B.: Maximum likelihood estimation via the ECM algorithm: A general
framework. Biometrika 80, 267–278 (1993)
Meyn, S., Tweedie, R.: Markov Chains and Stochastic Stability. Springer-Verlag, New York
(1993)
Miller, R.G.: Simultaneous Statistical Inference, 2nd edn. Springer-Verlag, New York (1981)
Montgomery, D.C., Peck, E.A.: Introduction to Linear Regression Analysis. John Wiley & Sons
(1982)
Mood, A.M., Graybill, F.A., Boes, D.C.: Introduction to the Theory of Statistics. McGraw-Hill
(1974)
Murray, M.K., Rice, J.W.: Differential Geometry and Statistics. Chapman & Hall, London (1993)
Neter, J., Wasserman, W., Whitmore, G.A.: Applied Statistics. Allyn & Bacon, Boston (1993)
Novick, M.R., Jackson, P.H.: Statistical Methods for Educational and Psychological Research.
McGraw-Hill, New York (1974)
Oakes, D.: Life-table analysis. In: Statistical Theory and Modelling, in Honor of Sir David Cox, FRS.
Chapman & Hall, London (1991)
Olkin, I., Selliah, J.B.: Estimating covariances in a multivariate distribution. In: Gupta, S.S.,
Moore, D.S. (eds) Statistical Decision Theory and Related Topics II. Academic Press, New
York (1977)
Owen, A.: Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75,
237–249 (1988)
Panse, V.G., Sukhatme, P.V.: Statistical Methods for Agricultural Workers. Indian Council of
Agricultural Research, New Delhi (1989)
Park, R.E.: Estimation with heteroscedastic error terms. Econometrica. 34(4), 888 (1966)
Parzen, E.: Modern Probability Theory and Its Applications. Wiley Eastern (1972)
Pfanzagl, J.: Parametric Statistical Theory. DeGruyter, New York (1994)
Raiffa, H., Schlaifer, R.: Applied Statistical Decision Theory. Division of Research,
Harvard Business School, Harvard University, Boston (1961)
Rao, C.R.: Simultaneous estimation of parameters—A compound decision problem. In: Gupta, S.
S., Moore, D.S. (eds.) Decision Theory and Related Topics. Academic Press, New York (1977)
Rao, C.R.: Linear Statistical Inference and Its Applications. John Wiley, New York (1965)
Rao, C.R., Kleffe, J.: Estimation of variance components and applications. North Holland/Elsevier,
Amsterdam (1988)
Ripley, B.D.: Stochastic Simulation. Wiley, New York (1987)
Robbins, H.: Asymptotically subminimax solutions of compound statistical decision problems. In:
Proceedings Second Berkeley Symposium Mathematics Statistics Probability. University of
California Press, Berkeley (1951)
Robert, C., Casella, G.: Improved confidence statements for the usual multivariate normal
confidence set. In: Gupta, S.S., Berger, J.O. (eds.) Statistical Decision Theory V.
Springer-Verlag, New York (1994)
Robert, C., Casella, G.: Monte Carlo Statistical Methods. Springer-Verlag, New York (1998)
Robert, C.P.: The Bayesian Choice. Springer-Verlag, New York (1994)
Rohatgi, V.K.: Statistical Inference. Wiley, New York (1984)
Romano, J.P., Siegel, A.F.: Counter examples in Probability and Statistics. Wadsworth and
Brooks/Cole, Pacific Grove (1986)
Rosenblatt, M.: Markov Processes. Structure and Asymptotic Behavior. Springer-Verlag, New
York (1971)
Ross, S.M.: A First Course in Probability Theory, 3rd edn. Macmillan, New York (1988)
Ross, S.: Introduction to Probability Models, 3rd edn. Academic Press, New York (1985)
Rothenberg, T.J.: The Bayesian approach and alternatives in econometrics. In: Fienberg, S.,
Zellner, A. (eds.) Studies in Bayesian Econometrics and Statistics, vol. 1. North-Holland,
Amsterdam (1977)
Rousseeuw, P.J., Leroy, A.M.: Robust Regression and Outlier Detection. Wiley, New York (1987)
Royall, R.M.: Statistical Evidence: A Likelihood Paradigm. Chapman and Hall, London (1997)
Rubin, D.B.: Using the SIR Algorithm to Simulate Posterior Distributions. In: Bernardo, J.M.,
DeGroot, M.H., Lindley, D.V., Smith, A.F.M. (eds.) Bayesian Statistics, pp. 395–402. Oxford
University Press, Cambridge (1988)
Rudin, W.: Principles of Mathematical Analysis. McGraw-Hill, New York (1976)
Sahu, P.K.: Agriculture and Applied Statistics-I, 2nd Reprint. Kalyani Publishers, New Delhi, India
(2013)
Sahu, P.K., Das, A.K.: Agriculture and Applied Statistics-II, 2nd edn. Kalyani Publishers, New
Delhi, India (2014)
Sahu, P.K.: Research Methodology: A Guide for Researchers in Agricultural Science, Social Science
and Other Related Fields. Springer, New Delhi (2013)
Särndal, C.-E., Swensson, B., Wretman, J.: Model Assisted Survey Sampling. Springer-Verlag,
New York (1992)
Savage, L.J.: The Foundations of Statistics. Wiley, New York (1954); rev. edn., Dover
Publications, New York (1972)
Savage, L.J.: On rereading R. A. Fisher (with discussion). Ann. Statist. 4, 441–500 (1976)
Schafer, J.L.: Analysis of Incomplete Multivariate Data. Chapman and Hall, London (1997)
Scheffe, H.: The Analysis of Variance. John Wiley and Sons, New York (1959)
Schervish, M.J.: Theory of Statistics. Springer-Verlag, New York (1995)
Scholz, F.W.: Maximum likelihood estimation. In: Kotz, S., Johnson, N.L., Read, C.B. (eds.)
Encyclopedia of Statistical Sciences 5. Wiley, New York (1985)
Searle, S.R.: Linear Models. Wiley, New York (1971)
Searle, S.R.: Matrix Algebra Useful for Statistics. Wiley, New York (1982)
Searle, S.R.: Linear Models for Unbalanced Data. Wiley, New York (1987)
Searle, S.R., Casella, G., McCulloch, C.E.: Variance Components. Wiley, New York (1992)
Seber, G.A.F.: Linear Regression Analysis. Wiley, New York (1977)
Seber, G.A.F., Wild, C.J.: Nonlinear Regression. John Wiley (1989)
Seshadri, V.: The Inverse Gaussian Distribution. A Case Study in Exponential Families.
Clarendon Press, New York (1993)
Shao, J.: Mathematical Statistics. Springer-Verlag, New York (1999)
Shao, J., Tu, D.: The Jackknife and the Bootstrap. Springer-Verlag, New York (1995)
Siegel, S.: Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill (1956)
Silverman, B.W.: Density Estimation for Statistics and Data Analysis. Chapman & Hall, London
(1986)
Singh, D., Chaudhary, F.S.: Theory and Analysis of Sample Survey Designs. Wiley Eastern
Limited, New Delhi (1989)
Singh, R.K., Chaudhary, B.D.: Biometrical Methods in Quantitative Genetic Analysis. Kalyani
Publishers, Ludhiana (1995)
Snedecor, G.W., Cochran, W.G.: Statistical Methods, 8th edn. Iowa State University Press, Ames
(1989)
Snedecor, G.W., Cochran, W.G.: Statistical Methods. Iowa State University Press (1967)
Spiegel, M.R.: Theory and Problems of Statistics. McGraw-Hill Book Co., Singapore (1988)
Staudte, R.G., Sheather, S.J.: Robust Estimation and Testing. Wiley, New York (1990)
Stein, C.: Efficient nonparametric testing and estimation. In: Proceedings Third Berkeley
Symposium Mathematics Statistics Probability 1. University of California Press (1956a)
Stein, C.: Inadmissibility of the usual estimator for the mean of a multivariate distribution. In:
Proceedings Third Berkeley Symposium Mathematics Statistics Probability 1. University of
California Press (1956b)
Stigler, S.: The History of Statistics: The Measurement of Uncertainty before 1900. Harvard
University Press, Cambridge (1986)
Stuart, A., Ord, J.K.: Kendall’s Advanced Theory of Statistics, Volume I: Distribution Theory, 5th
edn. Oxford University Press, New York (1987)
Stuart, A., Ord, J.K.: Kendall’s Advanced Theory of Statistics, vol. II, 5th edn. Oxford University
Press, New York (1991)
Stuart, A., Ord, J.K., Arnold, S.: Advanced Theory of Statistics, Volume 2A: Classical Inference
and the Linear Model, 6th edn. Oxford University Press, London (1999)
Susarla, V.: Empirical Bayes theory. In: Kotz, S., Johnson, N.L., Read, C.B. (eds.) Encyclopedia
of Statistical Sciences 2. Wiley, New York (1982)
Tanner, M.A.: Tools for Statistical Inference: Observed Data and Data Augmentation Methods,
3rd edn. Springer-Verlag, New York (1996)
The Selected Papers of E. S. Pearson. Cambridge University Press, New York (1966)
Theil, H.: On the Relationships Involving Qualitative Variables. American J. Sociol. 76, 103–154
(1970)
Theil, H.: Principles of Econometrics. North Holland (1972)
Theil, H.: Introduction to Econometrics. Prentice-Hall (1978)
Thompson, W.A., Jr.: Applied Probability. Holt, Rinehart and Winston, New York (1969)
Tintner, G.: Econometrics. John Wiley and Sons, New York (1965)
Tukey, J.W.: A survey of sampling from contaminated distributions. In: Olkin, I. (ed.)
Contributions to Probability and Statistics. Stanford University Press, Stanford (1960)
Unni, K.: The theory of estimation in algebraic and analytical exponential families with
applications to variance components models. Ph.D. Thesis, Indian Statistical Institute, Calcutta,
India (1978)
Wald, A.: Statistical Decision Functions. Wiley, New York (1950)
Walker, H.M., Lev, J.: Statistical Inference. Oxford & IBH (1965)
Wand, M.P., Jones, M.C.: Kernel Smoothing. Chapman & Hall, London (1995)
Wasserman, L.: Recent methodological advances in robust Bayesian inference (with discussion).
In: Bernardo, J.M., Berger, J.O., Dawid, A.P. (eds.) Bayesian Statistics 4 (1990)
Weiss, L., Wolfowitz, J.: Maximum Probability Estimators and Related Topics. Springer-Verlag,
New York (1974)
White, H.: A heteroskedasticity-consistent covariance matrix estimator and a direct test for
heteroskedasticity. Econometrica 48, 817–838 (1980)
Wilks, S.S.: Mathematical Statistics. John Wiley, New York (1962)
Williams, E.J.: Regression Analysis. Wiley, New York (1959)
Wu, C.F.J.: On the convergence of the EM algorithm. Ann. Statist. 11, 95–103 (1983)
Yamada, S., Morimoto, H.: Sufficiency. In: Ghosh, M., Pathak, P.K. (eds.) Current Issues in
Statistical Inference: Essays in Honor of D. Basu. Institute of Mathematical Statistics, Hayward,
CA (1992)
Yamane, T.: Statistics. Harper International (1970)
Yule, G.U., Kendall, M.G.: An Introduction to the Theory of Statistics. Charles Griffin,
London (1950)
Zacks, S.: The Theory of Statistical Inference. John Wiley, New York (1971)
Zellner, A.: An Introduction to Bayesian Inference in Econometrics. Wiley, New York (1971)
Index

A
Admissibility, 2, 48, 189
Alternative hypothesis, 64, 157
ANOVA, 122, 146
Asymptotically normal, 51, 52, 278, 299

B
Bartlett’s test, 250
Bayes principle, 198
Bernoulli distribution, 64
Best critical region, 73
Bhattacharya system of lower bounds, 32
Binomial distribution, 158, 237

C
Cauchy distribution, 48
Chebysheff’s inequality, 40, 41, 137
Combination of tests, 173
Complete decision rule, 189
Completeness, 15, 192, 204
Complete statistics, 15
Conditional distribution, 3, 5, 13, 16, 30, 238, 240, 241
Confidence interval, 131–138, 141, 143, 170, 172, 248, 293, 295, 298, 301, 302
Consistency, 21, 39, 40, 42, 47
Consistent estimator, 39–43, 52, 82, 280
Correlation coefficient, 25, 45, 127, 128
Critical function, 66
Critical region, 66, 70, 73, 84, 92, 94, 97, 128, 129, 144, 154, 158, 164, 249

D
Dandekar’s correction, 274
Distribution free test, 145
Distribution function, 1–3, 5, 30, 131, 153, 168, 274

E
Efficiency, 21, 44, 47, 60, 61, 290
Efficient estimator, 44, 45, 48, 268
Essential complete, 191, 192
Essential complete class, 191
Estimate, 3, 4, 8, 21, 39, 48, 55, 60, 176, 177, 188, 197, 203, 207, 220, 223, 229, 242, 290
Estimation, 3, 39, 44, 47, 48, 54, 56, 63, 131, 132, 181, 183
Estimator, 3, 21–26, 28–31, 36, 39, 44, 48, 53, 104, 132, 203–205, 220, 228
Euclidian space, 64
Exact tests, 239
Expected frequency, 55, 147, 267, 268, 273

F
Factorizability condition, 8, 9
Factorization criterion, 7
Family of distribution function, 3, 30
Fisher-Neyman criterion, 5, 11, 12
Fisher-Neyman factorisation theorem, 11
Fisher’s t statistic, 244

H
Homogeneity, 123, 165, 268

I
Interval estimation, 131, 132, 182
Invariance, 231, 233

J
Jensen’s inequality, 193
Joint distribution, 1, 4, 7, 168

K
Kolmogorov-Smirnov test, 163, 165
Koopman form, 14, 29
L
Lebelling parameter, 64
Lehmann-Scheffe theorem, 30
Likelihood ratio test, 103–106, 108, 112, 117, 123, 127, 128
Linear ordering principle, 197
Locally Best test, 92, 95
Locally MPU test, 90

M
Mann Whitney Test, 159
Mean square error, 21, 23, 52
Measure of discrepancy, 55
Method of least square, 47, 56, 59
Method of maximum likelihood, 48, 57, 59, 60
Method of minimum Chi-square, 47, 55
Method of moments, 47, 53, 59, 60
Minimal complete, 17, 189, 190
Minimal complete decision rule, 189
Minimal essential complete class, 191
Minimal sufficient statistics, 15, 17–20
Minimax principle, 199
Minimum variance bound estimator, 28
Minimum variance unbiased estimator (MVUE), 28, 31, 57
Mood’s Median test, 161
Most powerful size, 92, 93
Most powerful test, 97
Mutually exclusive, 55, 146
Mutually exhaustive, 55

N
Neyman-Pearson Lemma, 73, 91
Neyman’s criterion, 138, 141
Nominal, 146
Non-parametric test, 145, 146, 156
Non-randomized test, 66, 75, 86, 142–144
Non-singular, 261, 262, 267
Normal distribution, 1, 57, 104, 105, 145, 243, 249, 261, 282, 291
Null hypothesis, 64, 66, 113, 147, 152, 156–158

O
Optimal decision rule, 197
Optimum test, 72
Ordinal, 146
Orthogonal transformation, 4

P
Paired sample sign test, 157
Parameter, 2, 3, 6, 12–14, 17, 21, 47, 56, 63, 64, 103, 131, 132, 146, 156, 181, 197, 202, 223, 246, 268
Parametric space, 67, 147
Parametric test, 145
Point estimation, 47, 131, 132, 181
Poisson distribution, 7, 14, 55, 239
Population, 3, 5, 12, 22, 26, 41, 47, 63, 103, 131, 145, 147, 151, 156, 157, 161, 174, 238, 242, 251
Powerful test, 95, 146, 157
Power function, 67, 70, 78, 81, 89
Prior distribution, 200, 216, 226
Probability density function, 5, 173
Probability mass function, 5

R
Random interval, 131, 132, 168
Randomized decision rule, 175, 185, 188
Randomized test, 66, 69, 70, 75, 86, 90
Randomness, xv
Random sample, 2, 3, 5, 8, 12, 15, 20–22, 26, 40, 47, 48, 50, 128, 131, 145, 159, 168, 170, 178, 286
Random sampling, 2, 12, 41, 139
Rao-Blackwell theorem, 30
Rao-Cramer inequality, 26
Regression, 250–252, 256
Regularity condition, 26, 29, 30, 32, 44, 50, 52, 104
Restriction principle, 34, 197
Risk, 183, 185, 188, 198, 208, 212, 227
Risk function, 188, 197

S
Sample, 2, 3, 7, 9, 15, 17, 20, 39, 55, 60, 63, 65, 104, 112, 132, 145, 147, 148, 153, 155, 159, 174, 181, 291
Sign test, 146, 148, 151, 156–158, 171–173
Spearman’s rank correlation coefficient, 174, 175
Statistic, 3, 5, 6, 12, 18, 107, 135, 138, 145, 148, 173, 179, 192
Statistical decision theory, 173
Statistical hypothesis, 63, 146
Statistical inference, 2, 131, 181
Students’ t-statistic, 110, 133
Sufficient statistic, 3, 4, 15, 17, 192, 204, 239

T
Test for independence, 240, 270
Test for randomness, 152
Test function, 66
Testing of hypothesis, 182, 279
Tolerance limit, 168
Transformation of statistic, 291
Type-1 and Type-2 error, 72
U
UMAU, 143, 144
Unbiased estimator, 21–26, 28–32, 36, 37, 39, 42, 44, 45, 57, 124, 225
Uni-dimensional random variable, 1
Uniformly most powerful unbiased test (UMPU test), 73, 97
Uniformly powerful size, 73

W
Wald-Wolfowitz run test, 162, 165
Wilcoxon-Mann Whitney rank test, 159
Wilk’s criterion, 138

Y
Yates’ correction, 273