
PROBABILITY MODELS AND THEIR PARAMETRIC ESTIMATION

NET/JRF/CSIR EXAMINATIONS

A. SANTHAKUMARAN
Dr. A. Santhakumaran
Associate Professor and Head
Department of Statistics
Salem Sowdeswari College
Salem - 636010
Tamil Nadu
E-mail: ask.stat@yahoo.com
About the Author
Dr. A. Santhakumaran is an Associate Professor and Head of the Department of Statistics at
Salem Sowdeswari College, Salem - 10, Tamil Nadu. He holds a Ph.D. in Statistics -
Mathematics from the Ramanujan Institute for Advanced Study in Mathematics, University
of Madras. His interests are in Stochastic Processes and Their Applications. He has
to his credit over 31 research papers in Feedback Queues, Statistical Quality Control and
Reliability Theory. He is the author of the book Fundamentals of Testing Statistical
Hypotheses and Research Methodology.
Acknowledgments

My special thanks to the Correspondent and Secretary of Salem Sowdeswari College, Salem
and to my colleagues for their enthusiastic and unstinting support in publishing this
book. I am grateful to Professor V. Thangaraj, RIASM, University of Madras, for his
encouragement in writing the book. My greatest debt is to Dr. J. Subramaniam, Professor
of Mathematics, Bannari Amman Institute of Technology, Sathyamangalam, who read most of
the manuscript and whose critical comments resulted in numerous significant improvements.
My thanks to Mr. G. Narayanan, Ramanujan Institute Computer Centre, RIASM, University of
Madras, for his suggestions towards the successful completion of the LaTeX typesetting
of the book.
Finally, I wish to express my gratitude to all my teachers under whose influence I have
come to appreciate statistics as the science of a winding and twisting network connecting
Mathematics, Scientific Philosophy, Computer Software and other intellectual sources of
the Millennium.
A. SANTHAKUMARAN
PREFACE

Even though the science of Statistics originated more than 200 years ago, it was
recognized as a separate discipline in India only in the early 1940s. Since then,
statistics has been evolving as a versatile, powerful and indispensable instrument for
analyzing statistical data in real-life problems. We have reached a stage where no
empirical science can afford to ignore the science of Statistics, since statistical
patterns can be diagnosed and recognized through it. Because of the rapid growth of
modern science and technology, one who learns statistics must have capacity, knowledge
and intellect. A bird has the capacity to imitate what it is taught. A child is not born
with a language, but it is born with an innate capacity to learn language. So when we
teach a child, the child manipulates the structure of the language and creates new
sentences. A bird cannot do this. So the child has both the knowledge and the capacity
to create new sentences. When a person with ability and knowledge becomes inventive and
innovative, that inventiveness and innovation constitute intellect.
If a student has ability, knowledge and intellect, then he will be able to learn and
implement statistics successfully. If these three faculties are lacking, learning
statistics will not be possible. We shall give a number of examples drawn from the story
of the improvement of natural knowledge and the success of decision making. They show how
statistical ideas have played an important role in scientific investigations and other
decision-making processes. The most successful man in life is one who makes the best
decision based on the available information. In practice, it is very difficult to take
a decision on real-life problems. We illustrate this with the help of the following
examples.
Suppose one wants to know in how many ways a loaf of bread can be divided into two equal
parts. Immediately one reflects that it can be divided in a finite number of ways. In fact,
the bread can be divided into two equal parts in an infinite number of ways. Naturally,
every article can be studied in any number of dimensions. Our interest of study may be
one dimension, namely the length of the bread; two dimensions, the area (= length ×
breadth); three dimensions, the volume (= length × breadth × height); and so on.
Analogous to this are the measures of average (location), measures of variability
(scale) and measures of skewness and kurtosis (shape).
Another example: a manufacturer introduces a new two-wheeler in the market and wants to
announce how many kilometres per litre it gives on the road. For this purpose, the
manufacturer rides the two-wheeler on the road three times and observes that it gives
50 km per litre, 55 km per litre and 60 km per litre respectively. What immediately comes
to mind is that the two-wheeler gives $\frac{50+55+60}{3} = 55$ km per litre. This is
absolutely wrong. Actually, the two-wheeler gives 60 km per litre, the value of the
maximum order statistic.
A cyclist pedals from his house to his college at a speed of 10 mph and returns from the
college to his house at a speed of 15 mph. He wants to know his average speed. Assume that
the distance between the house and the college is x miles. Then the average speed of the
cyclist is

$$\text{Average speed} = \frac{\text{Total distance}}{\text{Total time taken}} = \frac{2x}{\frac{x}{10} + \frac{x}{15}} = 12 \text{ mph},$$

which is the Harmonic Mean of 10 and 15.
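The arithmetic can be checked with exact fractions. The short sketch below (the function names `average_speed` and `harmonic_mean` are our own, chosen for illustration) computes total distance over total time for several assumed values of x and shows that the answer is always 12 mph, the Harmonic Mean, rather than the arithmetic mean 12.5 mph.

```python
from fractions import Fraction

def average_speed(distance, speed_out, speed_back):
    """Round-trip average speed: total distance / total time (exact arithmetic)."""
    total_distance = 2 * distance
    total_time = Fraction(distance, speed_out) + Fraction(distance, speed_back)
    return total_distance / total_time

def harmonic_mean(values):
    """Number of values divided by the sum of their reciprocals."""
    return len(values) / sum(Fraction(1, v) for v in values)

# The distance x cancels, so every assumed x gives the same answer.
for x in (1, 5, 30):
    print(x, average_speed(x, 10, 15))   # 12 each time
print(harmonic_mean([10, 15]))           # 12
print(Fraction(10 + 15, 2))              # 25/2, the arithmetic mean, which is wrong here
```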
Seven students and a master want to cross a river from one bank to the other. The
students are not able to swim. The master measures the average height of the students,
which is 5.5 ft. He also measures the depth of the river at 10 places from one bank to
the other: 2, 2.5, 4, 5.5, 6, 6.5, 10, 2.5, 1.5 and 1 ft, which gives an average depth of
4.15 ft. The master decides to cross the river on foot, since the average height of the
students is greater than the average depth of the river. The students fail to cross the
river, since in some places the depth of the river is more than 5.5 ft. The master is not
happy with his decision. The master would have taken the right decision had he required
the minimum height of the students to be greater than the maximum depth of the river.
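A small sketch of the two decision rules, using the depths and the average height from the example. Individual heights are not given, so the sketch relies only on the fact that the minimum height cannot exceed the average height; the variable names are our own.

```python
import statistics

depths = [2, 2.5, 4, 5.5, 6, 6.5, 10, 2.5, 1.5, 1]  # feet, the 10 measured depths
average_height = 5.5                                 # feet, average height of the students

mean_depth = statistics.mean(depths)
max_depth = max(depths)

# The master's rule: compare the two averages.
cross_by_mean = average_height > mean_depth

# The safer rule needs min(height) > max(depth). Since min(height) <= average height,
# if even the average height does not exceed the maximum depth, the safer rule fails.
could_be_safe = average_height > max_depth

print(mean_depth, max_depth)          # 4.15 10
print(cross_by_mean, could_be_safe)   # True False
```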
Keeping this in mind, the first chapter of the book deals with some well-known
distributions and the diagnosis of statistical patterns. Chapter 2 gives the criteria of
point estimation. Chapter 3 illustrates the properties of the complete family of
distributions. Chapter 4 focuses on the study of optimal estimation. Chapter 5 explains
the methods of estimation. Chapter 6 discusses interval estimation. Chapter 7 deals with
Bayesian estimation.

DISTINCTIVE FEATURES

• Care has been taken to provide conceptual clarity, simplicity and up-to-date materials.

• Properly graded and solved problems to illustrate each concept and procedure
are presented in the text.
• About 300 solved problems and 50 remarks.
• A chapter on complete family of distributions.

• It is intended to serve as a textbook for a one-semester course on Statistical Inference
for Undergraduate and Postgraduate Statistics programmes of Indian universities and for
other Applicable Sciences, Allied Statistical Courses, Mathematical Sciences and various
Competitive Examinations like ISS, UGC Junior Fellowship, SLET, NET etc.

Salem - 636010 A.Santhakumaran

January 2010

CONTENTS

1 Diagnosis of Statistical Pattern

1.1 Introduction
1.2 Collection of data
1.3 Diagnosing Probability Models from Data
1.4 Discrete Probability Models
1.5 Continuous Probability Models
1.6 Diagnosis of Probability Models
1.7 Quantile - Quantile plot

2 Criteria of point estimation

2.1 Introduction
2.2 Point estimator
2.3 Problems of point estimation
2.4 Criteria of the point estimation
2.5 Consistency
2.6 Sufficient condition for consistency
2.7 Unbiased estimator
2.8 Sufficient Statistic
2.9 Neyman Factorizability Criterion
2.10 Exponential family of distributions
2.11 Distribution Admitting Sufficient Statistic
2.12 Joint Sufficient Statistics
2.13 Efficient estimator

3 Complete Family of Distributions

3.1 Introduction
3.2 Completeness
3.3 Minimal Sufficient Statistic

4 Optimal Estimation

4.1 Introduction
4.2 Uniformly Minimum Variance Unbiased Estimator
4.3 Uncorrelatedness Approach
4.4 Rao - Blackwell Theorem
4.5 Lehmann - Scheffe Theorem
4.6 Inequality Approach
4.7 Cramer - Rao Inequality
4.8 Chapman - Robbins Inequality
4.9 Efficiency
4.10 Extension of Cramer - Rao Inequality
4.11 Cramer - Rao Inequality - Multiparameter case
4.12 Bhattacharyya Inequality

5 Methods of Estimation

5.1 Introduction
5.2 Method of Maximum Likelihood Estimation
5.3 Numerical Methods of Maximum Likelihood Estimation
5.4 Optimum property of MLE
5.5 Method of Minimum Variance Bound Estimation
5.6 Method of Moment Estimation
5.7 Method of Minimum Chi - Square Estimation
5.8 Method of Least Squares Estimation
5.9 Gauss - Markoff Theorem

6 Interval Estimation

6.1 Introduction
6.2 Confidence Intervals
6.3 Alternative Method of Confidence Intervals
6.4 Shortest Length Confidence Intervals

7 Bayes estimation

7.1 Introduction
7.2 Bayes point estimation
7.3 Bayes confidence intervals

References
Glossary of Notation
Appendix
Answers to problems
Index

Probability Models and their Parametric Estimation

1. DIAGNOSIS OF STATISTICAL PATTERN

1.1 Introduction
Statistics is a decision-making tool which aims to resolve real-life problems. It
originated more than 2000 years ago, but it was recognized as a separate discipline in
India only from 1940. Since then, statistics has been evolving as a versatile, powerful
and indispensable instrument for investigation in all fields of real-life problems. It
provides a wide variety of analytical tools. We have reached a stage where no empirical
science can afford to ignore the science of statistics, since statistical patterns can
be diagnosed and recognized through the science of statistics.
Statistics is a method of obtaining and analyzing data in order to take decisions on
them. In India, during the period of Chandra Gupta Maurya there was an efficient system of
collecting official and administrative statistics. During Akbar’s reign (1556-1605 AD),
people maintained good records of land and agricultural statistics. Statistical surveys
were also conducted during his reign.
Sir Ronald A. Fisher, known as the Father of Statistics, placed statistics on a very
sound footing by applying it to various diversified fields. His contributions to
statistics earned it a prominent position among the sciences.
Professor P. C. Mahalanobis is regarded as the founder of statistics in India. He was a
physicist by training, a statistician by instinct and an economist by conviction. The
Government of India observes 29th June, the birthday of Professor Prasanta Chandra
Mahalanobis, as National Statistics Day. Professor C. R. Rao is an Indian legend whose
career spans the history of modern statistics. He is considered by many to be the
greatest living statistician in the world today.
There are many definitions of the term statistics. Some authors have defined statistics
as statistical data (plural sense) and others as statistical methods (singular sense).

Statistics as Statistical Data


Yule and Kendall state: “By statistics we mean quantitative data affected to a marked
extent by multiplicity of causes.” Their definition points out the following
characteristics:
• Statistics are aggregates of facts.
• Statistics are affected to a marked extent by multiplicity of causes.
• Statistics are numerically expressed.
• Statistics are enumerated or estimated according to reasonable standards of accuracy.
• Statistics are collected in a systematic manner.
• Statistics are collected for a predetermined purpose, and
• Statistics should be placed in relation to each other.

A. Santhakumaran

Statistics as Statistical Methods


One of the best definitions of statistics is given by Croxton and Cowden. They define
statistics as the science which deals with the collection, analysis and interpretation of
numerical data. This definition points out the scientific ways of:
• Data collection
• Data presentation
• Data analysis
• Data interpretation

Statistics as Statistical Models and Methods


Statistics is an imposing form of Mathematics. The use of statistical methods expanded
briskly in the late 20th century, because statistical models and methods have great
value in the applications of many interdisciplinary sciences. So we define Statistics as
the science of a winding and twisting network connecting Mathematics, Scientific
Philosophy, Computer software and other intellectual sources of the millennium.
This definition reveals that statisticians work to translate real-life problems into
mathematical models by using assumptions, axioms or principles. Then they derive exact
solutions using their knowledge, intellectually validate the results and express their
merits in non-mathematical forms that are consistent with the real-life problems.
In real-life problems, there are many situations where the actions of the entities
within the system under study cannot be predicted with 100 percent certainty. There is
always some variation. The variation can be classified into two categories: variation due
to assignable causes, which has to be identified and eliminated, and variation due to
chance causes, whose spread is taken to be 6σ (3σ on either side of the mean). The latter
is also called natural variation. In general, reducing natural variation is not necessary
and involves more cost, so it is not feasible to reduce it. However, some appropriate
statistical patterns of recognition may well describe the causes of variation.
An appropriate statistical pattern can be diagnosed by repeated sampling of the phenomenon
of interest. Then, through a systematic study of these data, a statistician can obtain a
known distribution suitable for the data and estimate the parameters of the distribution.
A statistician makes continuous efforts in the selection of a distribution form.
There are four steps in the diagnosis of a statistical distribution. They are
(i) Data collection
Data collection for real-life problems often requires substantial knowledge of the
problem, planning time and a commitment of resources.
(ii) Identification of statistical pattern
When the data are available, identification of a probability distribution begins by
developing a frequency distribution or Histogram of the data. Based on the pattern of the
frequency distribution and knowledge of the nature and behaviour of the process, a family
of distributions is chosen.
(iii) Parameters selection
When the data are available, choose the parameters that determine a specific instance of
the distribution family. These parameters are estimated from the data.
(iv) Validity of the distribution
The validity of the chosen distribution and the associated parameters is evaluated with
the help of statistical tests. The validity of the various assumptions made on the
parameters can be assessed only at a certain level of significance.

If the chosen distribution is not a good approximation of the data, then the analyst goes
back to the second step, chooses a different family of distributions and repeats the
procedure.
If several iterations of this procedure fail to give a fit between an assumed
distributional form and the collected data, then the empirical form of the distribution
may be used.
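The loop over steps (ii) to (iv) can be sketched as follows. This is only an illustration under stated assumptions: the candidate families (Exponential, then Normal) and their order are our choice; parameters are estimated by the sample mean and standard deviation; and validity is judged by the Kolmogorov-Smirnov distance against the approximate 5% critical value 1.36/√n. That critical value is meant for a fully specified distribution, so with estimated parameters the check is lenient and should be read as a rough screen. The names `diagnose`, `fit_exponential`, `fit_normal` and `ks_distance` are hypothetical.

```python
import math
import statistics

def ks_distance(data, cdf):
    """Largest gap between the empirical CDF of `data` and a fitted CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def fit_exponential(data):
    theta = 1 / statistics.fmean(data)           # estimate theta by 1 / sample mean
    return lambda x: 1 - math.exp(-theta * x) if x > 0 else 0.0

def fit_normal(data):
    return statistics.NormalDist(statistics.fmean(data), statistics.stdev(data)).cdf

# Hypothetical candidate families, tried in this order.
CANDIDATES = [("exponential", fit_exponential), ("normal", fit_normal)]

def diagnose(data, candidates=CANDIDATES):
    """Steps (ii)-(iv): choose a family, estimate its parameters, check the fit."""
    critical = 1.36 / math.sqrt(len(data))       # rough 5% Kolmogorov-Smirnov bound
    for name, fit in candidates:                 # step (ii): try a family
        cdf = fit(data)                          # step (iii): parameters from the data
        if ks_distance(data, cdf) <= critical:   # step (iv): validity check
            return name
    return "empirical"                           # no family fits: use the data itself

# Usage with artificial, perfectly regular data (quantiles of known distributions).
n = 100
probs = [(i - 0.5) / n for i in range(1, n + 1)]
exp_like = [-2 * math.log(1 - p) for p in probs]
normal_like = [statistics.NormalDist(10, 1).inv_cdf(p) for p in probs]
print(diagnose(exp_like), diagnose(normal_like), diagnose([1] * 50 + [100] * 50))
```

With a larger list of candidate families the same loop applies unchanged; only `CANDIDATES` grows.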

1.2 Collection of Data


Collection of data is one of the most important tasks in finding a solution for real-life
problems. Even if the statistical pattern of a real-life problem is valid, if the data are
inaccurately collected, inappropriately analyzed or not representative of the problem,
then the data will be misleading when used for decision making.
One learns data collection from actual experience. The following suggestions may enhance
and facilitate data collection. Data collection and analysis must be tackled with great
care.
(i) Before collecting data, planning is very important. It could begin with a period of
pre-observation, during which one tries to collect data while observing. Data forms
should be devised for the intended purpose; it is very likely that these forms will have
to be modified several times before the actual data collection begins. Watch for unusual
as well as normal circumstances and consider how they will be handled. Planning is very
important even if the data are collected automatically. After collecting the data, find
out whether the collected data are appropriate or not.
(ii) If the data being collected are adequate to diagnose the statistical distribution,
then determine the apt distribution. If the data being collected are useless for
diagnosing the statistical distribution, then there is no need to collect such
superfluous data.
(iii) Try to combine homogeneous data sets. Check the data for homogeneity in successive
time periods and during the same time period over successive intervals of time.


(iv) Beware of the possibility of data censoring, in which a quantity of interest is not
observed in its entirety. This problem most often occurs when the analyst is interested
in the time required to complete some process, but the process begins prior to, or
finishes after, the completion of the observation period. Censoring can result in
especially long process times being left out of the data sample.
(v) One may use a scatter diagram, which indicates the relationship between two variables
of interest.
(vi) Consider the possibility that a sequence of observations which appears to be
independent may possess autocorrelation. Autocorrelation may exist between observations
in successive time periods.
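Suggestion (vi) can be checked numerically with the sample autocorrelation at a given lag. A minimal sketch, assuming the common rule of thumb that |r| larger than about 2/√n deserves attention; the function names are our own.

```python
import math
import statistics

def lag_autocorrelation(xs, lag=1):
    """Sample autocorrelation of the sequence xs at the given lag."""
    n = len(xs)
    mean = statistics.fmean(xs)
    num = sum((xs[t] - mean) * (xs[t + lag] - mean) for t in range(n - lag))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den

def looks_autocorrelated(xs, lag=1):
    """Rough screen: flag |r_lag| larger than about 2 / sqrt(n)."""
    return abs(lag_autocorrelation(xs, lag)) > 2 / math.sqrt(len(xs))

alternating = [1, -1] * 50        # each value is followed by its opposite
trend = list(range(100))          # steadily increasing observations
print(lag_autocorrelation(alternating), looks_autocorrelated(alternating))
print(lag_autocorrelation(trend), looks_autocorrelated(trend))
```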

1.3 Diagnosis of a distribution with data


Families of distributions can be selected only if statistical data are available. The
specific distribution within a family is specified by estimating its parameters.
Estimating the parameters of a family of distributions leads to the theory of estimation.
The formation of a frequency distribution or Histogram is useful in guessing the shape of
a distribution. Hines and Montgomery suggest choosing the number of class intervals to be
approximately equal to the square root of the sample size. If the intervals are too wide,
the Histogram will be coarse or blocky, and its shape and other details will not show
well; if the intervals are too narrow, the Histogram will be ragged and will not smooth
the data. So one has to allow the interval sizes to change until a good choice is found.
The Histogram for continuous data corresponds to the probability density function of a
theoretical distribution: a line drawn through the centre point of each class interval
frequency should result in a shape like that of the probability density function (pdf)
(see Figure 1.2).
For discrete data with a large number of data points, the Histogram should have a cell
for each value in the range of the data. However, if there are only a few data points, it
may be necessary to combine adjacent cells to eliminate the ragged appearance of the
Histogram. A Histogram of discrete data should look like a probability mass function
(pmf) (see Figure 1.1).
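The square-root rule for the number of class intervals can be sketched as below. This is an illustrative frequency table with equal-width classes (the function name `histogram` and its `num_classes` override are our own); passing different values of `num_classes` is one way to let the interval sizes change until a good choice is found.

```python
import math

def histogram(data, num_classes=None):
    """Frequency table with about sqrt(n) equal-width class intervals."""
    n = len(data)
    k = num_classes or max(1, round(math.sqrt(n)))
    lo, hi = min(data), max(data)
    width = (hi - lo) / k or 1.0                    # guard against all-equal data
    counts = [0] * k
    for x in data:
        index = min(int((x - lo) / width), k - 1)   # the maximum goes in the last class
        counts[index] += 1
    edges = [lo + i * width for i in range(k + 1)]
    return edges, counts

edges, counts = histogram(list(range(100)))   # 100 observations -> 10 classes
print(counts)
```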

1.4 Discrete Distributions


Discrete random variables are used to describe random phenomena in which only integer
values can occur. The following are some important discrete distributions.
1.4.1 Bernoulli distribution
An experiment consists of n trials; each trial results in a success or a failure and is
repeated under the same conditions. Let $X_j = 1$ if the jth trial results in a success
and $X_j = 0$ if it results in a failure, so that the sample space is {0, 1}. Suppose the
trials are independent, each trial has only two possible outcomes (success or failure)
and the probability of success θ remains constant from trial to trial. Then for one
trial the pmf

$$p_\theta(x) = \begin{cases} \theta^{x}(1-\theta)^{1-x} & x = 0, 1;\ 0 < \theta < 1 \\ 0 & \text{otherwise} \end{cases}$$

is the Bernoulli distribution.


In a production process satisfying the above assumptions, if X indicates the quality
(defective or non-defective) of a produced item, then X is a Bernoulli random variable.
1.4.2 Binomial Distribution
Let X be a random variable denoting the number of successes in n Bernoulli trials. Then
X is called a Binomial random variable with parameters n and θ. Here the sample space is
{0, 1, 2, · · · , n} and the pmf is

$$p_\theta(x) = \begin{cases} \dfrac{n!}{x!\,(n-x)!}\,\theta^{x}(1-\theta)^{n-x} & x = 0, 1, \cdots, n;\ 0 < \theta < 1 \\ 0 & \text{otherwise} \end{cases}$$

In the Binomial distribution, the mean nθ is always greater than the variance nθ(1 − θ).
If X1, X2, · · · , Xn are independent and identically distributed Bernoulli random
variables, then $\sum_{i=1}^{n} X_i \sim b(n, \theta)$. Problems relating to tossing a coin
or throwing dice lead to the Binomial distribution. In a production process, the number
X of defective units in a random sample of n units follows the Binomial distribution.
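The statement that a sum of n iid Bernoulli variables is b(n, θ) can be verified numerically by convolving the Bernoulli pmf with itself n times. A sketch with assumed values n = 10 and θ = 0.3; it also confirms that the mean exceeds the variance.

```python
from math import comb

def binomial_pmf(x, n, theta):
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)

def sum_of_bernoullis_pmf(n, theta):
    """pmf of X1 + ... + Xn for iid Bernoulli(theta), built by repeated convolution."""
    pmf = [1.0]                          # an empty sum equals 0 with probability 1
    bernoulli = [1 - theta, theta]       # P{X = 0}, P{X = 1}
    for _ in range(n):
        new = [0.0] * (len(pmf) + 1)
        for s, ps in enumerate(pmf):
            for x, px in enumerate(bernoulli):
                new[s + x] += ps * px
        pmf = new
    return pmf

n, theta = 10, 0.3                       # assumed values for illustration
conv = sum_of_bernoullis_pmf(n, theta)
exact = [binomial_pmf(x, n, theta) for x in range(n + 1)]
mean = sum(x * p for x, p in enumerate(exact))
var = sum((x - mean) ** 2 * p for x, p in enumerate(exact))
print(max(abs(a - b) for a, b in zip(conv, exact)))   # only rounding error
print(mean, var)                                      # about 3.0 and 2.1
```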
1.4.3 Geometric Distribution
Consider a sequence of Bernoulli trials and let X be the number of failures before the
first success, so that the first success occurs on trial (x + 1). Then

$$p_\theta(x) = \begin{cases} \theta(1-\theta)^{x} & x = 0, 1, 2, \cdots;\ 0 < \theta < 1 \\ 0 & \text{otherwise} \end{cases}$$

It is the probability of the event {X = x}, which occurs when there are x failures
followed by a success.
A couple decides to have children until they have a male child. If the probability of
having a male child in their family is p, how many children should they expect to have
before the first male child is born? Let X denote the number of female children born
before the first male child. Then X is a Geometric random variable with θ = p.
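The couple's question is answered by the mean of the Geometric distribution, E[X] = (1 − θ)/θ. The sketch below sums the (truncated) series numerically for an assumed θ = 0.5, where the tail beyond 200 terms is negligible, and compares it with the closed form.

```python
def geometric_pmf(x, theta):
    """P{X = x}: x failures (female children) before the first success (male child)."""
    return theta * (1 - theta) ** x

theta = 0.5                        # assumed probability of a male child
terms = range(200)                 # truncate the infinite series; the tail is negligible
total = sum(geometric_pmf(x, theta) for x in terms)
mean = sum(x * geometric_pmf(x, theta) for x in terms)
print(total, mean, (1 - theta) / theta)   # total about 1, mean about (1 - theta)/theta = 1
```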
1.4.4 Negative Binomial Distribution
If X1, X2, · · · , Xn are iid Geometric variables, then $T = t(X) = \sum_{i=1}^{n} X_i$ is
a Negative Binomial variate whose pmf is

$$p_\theta(t) = \begin{cases} \dfrac{(t+n-1)!}{t!\,(n-1)!}\,\theta^{n}(1-\theta)^{t} & t = 0, 1, \cdots \\ 0 & \text{otherwise} \end{cases}$$

Equivalently, consider a sequence of Bernoulli trials and let X be the number of failures
preceding the nth success, which occurs on trial (x + n). Then

$$p_\theta(x) = \begin{cases} \dfrac{(x+n-1)!}{(n-1)!\,x!}\,\theta^{n}(1-\theta)^{x} & x = 0, 1, 2, \cdots \\ 0 & \text{otherwise} \end{cases}$$

This will happen if the last trial results in a success and among the previous
(n + x − 1) trials there are exactly x failures. Note that if n = 1, then $p_\theta(x)$ is
the Geometric pmf. The Negative Binomial distribution has Mean < Variance. In a
production process, the number of non-defective units inspected before the nth defective
unit, that is, when the nth defective appears at the (x + n)th unit, follows the Negative
Binomial distribution.
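The claim that the sum of n iid Geometric variables is Negative Binomial can be checked by convolution. A sketch with assumed values n = 3 and θ = 0.4; the Geometric pmf is truncated at 60 terms, which leaves the convolved values below 60 exact, so only those are compared.

```python
from math import comb

def geometric_pmf(x, theta):
    return theta * (1 - theta) ** x

def negative_binomial_pmf(x, n, theta):
    """x failures before the n-th success."""
    return comb(x + n - 1, x) * theta**n * (1 - theta) ** x

def convolve(p, q):
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

n, theta, top = 3, 0.4, 60               # assumed values; `top` truncates the support
geo = [geometric_pmf(x, theta) for x in range(top)]
pmf = geo
for _ in range(n - 1):
    pmf = convolve(pmf, geo)             # pmf of X1 + ... + Xn
for x in range(5):                       # entries below `top` are exact
    print(x, pmf[x], negative_binomial_pmf(x, n, theta))
```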
1.4.5 Multinomial Distribution
If the sample space of a random experiment is split into more than two mutually exclusive
and exhaustive events, then one can define random variables which lead to the Multinomial
distribution. Let E1, E2, · · · , Ek be k mutually exclusive and exhaustive events of a
random experiment with respective probabilities θ1, θ2, · · · , θk, such that
θ1 + θ2 + · · · + θk = 1 and 0 < θi < 1, i = 1, 2, · · · , k. Then the probability that E1
occurs x1 times, E2 occurs x2 times, · · · , Ek occurs xk times in n independent trials is
given by the Multinomial distribution with pmf

$$p_{\theta_1,\theta_2,\cdots,\theta_k}(x_1, x_2, \cdots, x_k) = \begin{cases} \dfrac{n!}{x_1!\,x_2!\cdots x_k!}\,\theta_1^{x_1}\theta_2^{x_2}\cdots\theta_k^{x_k} & \text{where } \sum_{i=1}^{k} x_i = n \\ 0 & \text{otherwise} \end{cases}$$

If k = 2, that is, there are only two mutually exclusive events, then the Multinomial
distribution becomes the Binomial distribution:

$$p_{\theta_1,\theta_2}(x_1, x_2) = \begin{cases} \dfrac{n!}{x_1!\,x_2!}\,\theta_1^{x_1}\theta_2^{x_2} & \text{where } x_1 + x_2 = n \text{ and } \theta_1 + \theta_2 = 1 \\ 0 & \text{otherwise} \end{cases}$$

That is, x2 = n − x1 and θ2 = 1 − θ1, which implies

$$p_{\theta_1}(x_1) = \begin{cases} \dfrac{n!}{x_1!\,(n-x_1)!}\,\theta_1^{x_1}(1-\theta_1)^{n-x_1} & 0 < \theta_1 < 1,\ x_1 = 0, 1, \cdots, n \\ 0 & \text{otherwise} \end{cases}$$

Consider two brands A and B. Each individual in the population prefers brand A to brand B
with probability θ1, prefers B to A with probability θ2 and is indifferent between brands
A and B with probability θ3 = 1 − θ1 − θ2. In a random sample of n individuals, X1 prefer
brand A, X2 prefer brand B and X3 are indifferent between A and B. Then the three random
variables follow a Trinomial distribution, i.e.,

$$p_{\theta_1,\theta_2,\theta_3}(x_1, x_2, x_3) = P\{X_1 = x_1, X_2 = x_2, X_3 = x_3\} = \begin{cases} \dfrac{n!}{x_1!\,x_2!\,x_3!}\,\theta_1^{x_1}\theta_2^{x_2}\theta_3^{x_3} & x_1 + x_2 + x_3 = n \\ 0 & \text{otherwise} \end{cases}$$
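A sketch of the Multinomial pmf, with two checks of the statements above: for k = 2 it agrees with the Binomial pmf, and the Trinomial probabilities of the brand example sum to 1. The probabilities θ1, θ2, θ3 = 0.5, 0.3, 0.2 and the sample sizes are assumed values for illustration.

```python
from math import comb, factorial, prod

def multinomial_pmf(xs, thetas):
    """n!/(x1! ... xk!) * theta1^x1 ... thetak^xk, with n = sum(xs)."""
    n = sum(xs)
    coef = factorial(n)
    for x in xs:
        coef //= factorial(x)            # stays an integer at every step
    return coef * prod(t**x for t, x in zip(thetas, xs))

# Brand example with assumed probabilities and n = 4 individuals.
thetas = (0.5, 0.3, 0.2)
print(multinomial_pmf([2, 1, 1], thetas))      # about 0.18 = 12 * 0.25 * 0.3 * 0.2
total = sum(multinomial_pmf([a, b, 4 - a - b], thetas)
            for a in range(5) for b in range(5 - a))
print(total)                                   # 1, up to rounding

# k = 2 reduces to the Binomial pmf.
n, t1 = 6, 0.35
print([round(multinomial_pmf([x1, n - x1], [t1, 1 - t1]), 6) for x1 in range(n + 1)])
print([round(comb(n, x1) * t1**x1 * (1 - t1) ** (n - x1), 6) for x1 in range(n + 1)])
```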


1.4.6 Discrete Uniform Distribution


A random variable X is said to follow the uniform distribution on N points
(x1, x2, · · · , xN), if its pmf is given by

$$p_N(x_i) = P_N\{X = x_i\} = \begin{cases} \dfrac{1}{N} & i = 1, 2, \cdots, N \text{ and } N \in I^{+} \\ 0 & \text{otherwise} \end{cases}$$

A random experiment with complete uncertainty, whose outcomes are equally likely, may be
described by the Uniform distribution. In a finite population of N units, the unit xi,
i = 1, 2, · · · , N, selected from the population by simple random sampling has a discrete
uniform distribution.
1.4.7 Hypergeometric Distribution
One situation in which Bernoulli trials are encountered is when an object is drawn at
random from a box containing a collection of objects of two types. In order to repeat
this experiment so that the results are independent and identically distributed, it is
necessary to replace each object drawn and to mix the objects before the next one is
drawn. This process is referred to as sampling with replacement. If the sampling is done
without replacement of the objects drawn, the resulting trials are still of the Bernoulli
type but are no longer independent.
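For example, four balls are drawn one at a time, at random and without replacement, from
a box of 8 balls, 3 black and 5 red. The probability that the third ball drawn is black is

$$\begin{aligned} P\{\text{3rd ball black}\} &= P(RRB) + P(RBB) + P(BRB) + P(BBB) \\ &= \frac{5}{8}\cdot\frac{4}{7}\cdot\frac{3}{6} + \frac{5}{8}\cdot\frac{3}{7}\cdot\frac{2}{6} + \frac{3}{8}\cdot\frac{5}{7}\cdot\frac{2}{6} + \frac{3}{8}\cdot\frac{2}{7}\cdot\frac{1}{6} \\ &= \frac{126}{336} = \frac{3}{8}, \end{aligned}$$

which is the same as the probability that the first ball drawn is black. It should not be
surprising that the probability of a black ball is the same on the third draw as on the
first draw: before any draw is observed, each of the 8 balls is equally likely to occupy
the third position.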
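The 3/8 result can be confirmed by brute-force enumeration of every ordered draw of 4 balls without replacement (8 × 7 × 6 × 5 = 1680 equally likely draws). A short sketch with exact fractions; the helper name `prob_black_at` is our own.

```python
from fractions import Fraction
from itertools import permutations

balls = ["B"] * 3 + ["R"] * 5            # 3 black, 5 red; list positions make balls distinct
draws = list(permutations(range(8), 4))  # ordered draws of 4 balls without replacement

def prob_black_at(position):
    """Exact probability that the ball drawn at `position` (0-based) is black."""
    hits = sum(1 for d in draws if balls[d[position]] == "B")
    return Fraction(hits, len(draws))

print([prob_black_at(i) for i in range(4)])   # every draw position gives 3/8
```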
In the general case, n objects are drawn at random, one at a time, from a collection of
N objects, M of one kind and N − M of another kind. Objects of the first kind will be
thought of as successes and coded 1; objects of the other kind are coded 0. Let
X1, X2, · · · , Xn denote the sequence of coded outcomes; that is, Xi is 1 or 0 according to
whether the ith draw results in a success or a failure. The total number of successes in
n trials is just the sum of the X's,

$$S_n = X_1 + X_2 + \cdots + X_n,$$

as it was in the case of independent identically distributed Bernoulli trials. Moreover,
the probability of a 1 is the same at each trial:

$$P\{X_i = 1\} = \frac{M}{N}, \quad i = 1, 2, \cdots, n.$$


To see this, observe first that the probability of any given ordered sequence of n objects
drawn from the N objects is

$$\frac{1}{N}\cdot\frac{1}{N-1}\cdots\frac{1}{N-n+1}.$$

The probability that an object of type 1 occurs in the ith position of the sequence of n
objects drawn is

$$P\{X_i = 1\} = \frac{M\,(N-1)(N-2)\cdots(N-n+1)}{N(N-1)\cdots(N-n+2)(N-n+1)} = \frac{M}{N}, \quad i = 1, 2, \cdots, n,$$

where M is the number of ways of filling the ith position with an object coded 1 and
(N − 1)(N − 2) · · · (N − n + 1) is the number of ways of filling the remaining (n − 1)
places in the sequence from the (N − 1) remaining objects. The distribution of the number
of successes among the n objects is the same whether they are drawn one at a time at
random or all n are drawn simultaneously at random.
The probability function of $S_n$ is

$$P\{S_n = k\} = \begin{cases} \dfrac{\binom{M}{k}\binom{N-M}{n-k}}{\binom{N}{n}} & k = 0, 1, 2, \cdots, \min(n, M) \\ 0 & \text{otherwise} \end{cases}$$

The random variable Sn with the above probability function is said to have a
Hypergeometric distribution. The mean of the random variable Sn is easily
obtained from the representation of a Hypergeometric variable as a sum of the
Bernoulli trials. That is,

$$\begin{aligned} E[S_n] &= E[X_1 + X_2 + \cdots + X_n] = E[X_1] + E[X_2] + \cdots + E[X_n] \\ &= \left(1 \times P\{X_1 = 1\} + 0 \times P\{X_1 = 0\}\right) + \cdots + \left(1 \times P\{X_n = 1\} + 0 \times P\{X_n = 0\}\right) \\ &= \frac{M}{N} + \cdots + \frac{M}{N} = \frac{nM}{N} \end{aligned}$$

$$\text{Variance of } S_n = n\,\frac{M}{N}\,\frac{N-M}{N}\,\frac{N-n}{N-1}, \quad N \in I^{+} \tag{1.1}$$

The probability at each trial that the object drawn is of the type of which there are
initially M is p = M/N; writing q = 1 − p,

$$\text{Variance of } S_n = npq\,\frac{N-n}{N-1}, \quad N \in I^{+} \tag{1.2}$$

Formula (1.2) differs from the Binomial variance npq by the extra factor (N − n)/(N − 1).
The variance of $S_n$ is npq (N − n)/(N − 1) in the no-replacement case and npq in the
replacement case. For fixed p and fixed n, the factor (N − n)/(N − 1) → 1 as N → ∞.
Thus the Hypergeometric distribution is exact for sampling without replacement, whereas
the Binomial distribution is an approximation to it which is good when N is large compared
with n.
The 50 students of M.Sc. Statistics in a certain college are divided at random into 5
batches of 10 each for the annual practical examination in Statistics. The class consists
of 20 resident students and 30 non-resident students. Let X denote the number of resident
students in the first batch appearing for the practical examination. The Hypergeometric
distribution is apt to describe the random variable X and has the pmf

$$P\{X = x\} = \begin{cases} \dfrac{\binom{20}{x}\binom{30}{10-x}}{\binom{50}{10}} & x = 0, 1, 2, \cdots, 10 \\ 0 & \text{otherwise} \end{cases}$$
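For the students example, the mean and variance can be computed exactly from the pmf and compared with nM/N and formula (1.2). A sketch using exact fractions; the function name `hypergeometric_pmf` is our own.

```python
from fractions import Fraction
from math import comb

def hypergeometric_pmf(k, N, M, n):
    """P{S_n = k}: k successes in n draws without replacement, M successes among N."""
    return Fraction(comb(M, k) * comb(N - M, n - k), comb(N, n))

N, M, n = 50, 20, 10                     # 20 resident students among 50, batch of 10
pmf = {k: hypergeometric_pmf(k, N, M, n) for k in range(min(n, M) + 1)}
mean = sum(k * p for k, p in pmf.items())
var = sum((k - mean) ** 2 * p for k, p in pmf.items())

p = Fraction(M, N)
q = 1 - p
print(sum(pmf.values()))                          # 1
print(mean, n * p)                                # 4 4
print(var, n * p * q * Fraction(N - n, N - 1))    # 96/49 96/49, formula (1.2)
print(n * p * q)                                  # 12/5, the Binomial variance npq
```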

1.4.8 Poisson Distribution


The Poisson random variable is used to describe rare events, for example, the number of
air crashes occurring on Mondays between 3 pm and 5 pm. The pmf of a Poisson random
variable is given by

$$p_\theta(x) = \begin{cases} \dfrac{e^{-\theta}\theta^{x}}{x!} & \theta > 0,\ x = 0, 1, 2, \cdots \\ 0 & \text{otherwise} \end{cases}$$

where θ is a parameter. One of the important properties of the Poisson distribution is
that its mean and variance are the same and are equal to θ. If X1, X2, · · · , Xn are iid
Poisson random variables with parameter θ, then the sum $\sum_{i=1}^{n} X_i$ follows a
Poisson distribution with parameter nθ.
After correcting 50 pages of the proof of a book, the proofreaders find that there are,
on average, 2 errors per 5 pages. One would like to know the expected number of pages
with 0, 1, 2, 3, · · · errors in 10000 pages of the first print of the book. Let X denote
the number of errors per page; then the random variable X follows the Poisson
distribution with parameter θ = 2/5 = 0.4.
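The expected number of pages with x errors is 10000 × P{X = x}. A short sketch of that arithmetic with θ = 0.4:

```python
import math

theta = 2 / 5          # on average 2 errors per 5 pages
pages = 10000

def poisson_pmf(x, theta):
    return math.exp(-theta) * theta**x / math.factorial(x)

expected_pages = [pages * poisson_pmf(x, theta) for x in range(6)]
for x, e in enumerate(expected_pages):
    print(x, round(e, 1))   # roughly 6703, 2681, 536, 72, 7 and 1 pages
```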
1.4.9 Power series distribution
If a random variable X follows a Power series distribution, then its pmf is

$$P_\theta\{X = x\} = \begin{cases} \dfrac{a_x\theta^{x}}{f(\theta)} & x \in S;\ a_x \ge 0,\ \theta > 0 \\ 0 & \text{otherwise} \end{cases}$$

where f(θ) is a generating function, i.e., $f(\theta) = \sum_{x \in S} a_x\theta^{x}$, θ > 0, so
that f(θ) is positive, finite and differentiable, and S is a non-empty countable subset of
non-negative integers.


1.4.9 Particular cases.


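(i) Binomial Distribution
Let θ = p/(1 − p), f(θ) = (1 + θ)^n and S = {0, 1, 2, 3, · · · , n}, a set of non-negative
integers. Then

$$f(\theta) = \sum_{x \in S} a_x\theta^{x} \;\Rightarrow\; (1+\theta)^{n} = \sum_{x=0}^{n} a_x\theta^{x} \;\Rightarrow\; a_x = \binom{n}{x}$$

$$P_p\{X = x\} = \frac{\binom{n}{x}\left(\frac{p}{1-p}\right)^{x}}{\left[1 + \frac{p}{1-p}\right]^{n}} = \begin{cases} \binom{n}{x} p^{x} q^{n-x} & x = 0, 1, 2, \cdots, n \\ 0 & \text{otherwise} \end{cases}$$

where q = 1 − p.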

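(ii) Negative Binomial Distribution
Let θ = p/(1 + p), f(θ) = (1 − θ)^{−n} and S = {0, 1, 2, · · · }, 0 < θ < 1 and n ∈ I⁺. Now

$$f(\theta) = \sum_{x \in S} a_x\theta^{x} \;\Rightarrow\; (1-\theta)^{-n} = \sum_{x=0}^{\infty} a_x\theta^{x}$$

$$a_x = (-1)^{x}\binom{-n}{x} = (-1)^{x}(-1)^{x}\binom{n+x-1}{x} = \binom{n+x-1}{x}$$

$$\begin{aligned} P\{X = x\} &= \frac{\binom{n+x-1}{x}\left(\frac{p}{1+p}\right)^{x}}{\left[1 - \frac{p}{1+p}\right]^{-n}} = \binom{n+x-1}{x}\left(\frac{p}{1+p}\right)^{x}(1+p)^{-n} \\ &= \binom{n+x-1}{x} p^{x}(1+p)^{-(n+x)} \\ &= \binom{-n}{x}(-p)^{x}(1+p)^{-(n+x)}, \quad x = 0, 1, 2, \cdots \end{aligned}$$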

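(iii) Poisson distribution
Let f(θ) = e^θ and S = {0, 1, 2, · · · }. Now

$$f(\theta) = \sum_{x \in S} a_x\theta^{x} \;\Rightarrow\; e^{\theta} = \sum_{x=0}^{\infty}\frac{\theta^{x}}{x!} = \sum_{x=0}^{\infty} a_x\theta^{x} \;\Rightarrow\; a_x = \frac{1}{x!}$$

$$P_\theta\{X = x\} = \frac{a_x\theta^{x}}{f(\theta)} = \frac{1}{x!}\,\frac{\theta^{x}}{e^{\theta}} = \frac{e^{-\theta}\theta^{x}}{x!}, \quad x = 0, 1, 2, \cdots$$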
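The three particular cases can be checked numerically with one generic Power series pmf, $a_x\theta^x / f(\theta)$. In this sketch the coefficient sequence is passed as a function, the infinite supports of cases (ii) and (iii) are truncated (the neglected tails are far below rounding error), and n, p and the Poisson θ are assumed values.

```python
import math

def power_series_pmf(x, theta, a, support):
    """a(x) * theta**x / f(theta), where f(theta) = sum of a(s) * theta**s over the support."""
    f = sum(a(s) * theta**s for s in support)
    return a(x) * theta**x / f

n = 5                                      # assumed n for cases (i) and (ii)

# (i) Binomial: theta = p/(1-p), a_x = C(n, x), S = {0, 1, ..., n}
p = 0.3
binom_support = range(n + 1)
binom_from_series = [power_series_pmf(x, p / (1 - p), lambda s: math.comb(n, s), binom_support)
                     for x in binom_support]
binom_direct = [math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in binom_support]

# (ii) Negative Binomial: theta = p/(1+p), a_x = C(n+x-1, x); S truncated at 400 terms
p_nb = 0.5
nb_from_series = power_series_pmf(3, p_nb / (1 + p_nb),
                                  lambda s: math.comb(n + s - 1, s), range(400))
nb_direct = math.comb(n + 3 - 1, 3) * p_nb**3 * (1 + p_nb) ** (-(n + 3))

# (iii) Poisson: a_x = 1/x!; S truncated at 100 terms
theta_poisson = 1.5
poisson_from_series = power_series_pmf(2, theta_poisson,
                                       lambda s: 1 / math.factorial(s), range(100))
poisson_direct = math.exp(-theta_poisson) * theta_poisson**2 / math.factorial(2)

print(binom_from_series)
print(binom_direct)
print(nb_from_series, nb_direct)
print(poisson_from_series, poisson_direct)
```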

1.5 Continuous Distributions


A continuous random variable can be used to describe random phenomena in which the
variable X of interest can take any value x in some interval, with P{X = x} = 0 for all x
in that interval.
1.5.1 Uniform Distribution
A random variable X is uniformly distributed on an interval [a, b] if its pdf is given by

$$p_{a,b}(x) = \begin{cases} \dfrac{1}{b-a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$

Note that $P\{x_1 < X < x_2\} = F(x_2) - F(x_1) = \frac{x_2 - x_1}{b-a}$ is proportional to
the length of the interval for all x1 and x2 satisfying a ≤ x1 ≤ x2 ≤ b. If a random
phenomenon has complete unpredictability, then it can be described by the uniform
distribution.
1.5.2 Normal Distribution
A random variable X with mean θ (−∞ < θ < ∞) and variance σ² (> 0) has a Normal distribution if it has the pdf
$$p_{\theta,\sigma^2}(x)=\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{1}{2\sigma^2}(x-\theta)^2},\qquad -\infty<x<\infty$$
A random variable that is the sum of a number of component random variables of a random experiment can often be thought of as Normally distributed. For example, the time to assemble a product is the sum of the times required for each assembly operation, and it may be described by a Normal random variable.

1.5.3 Exponential Distribution


A random variable X is said to be Exponentially distributed with parameter θ > 0 if its pdf is given by
$$p_\theta(x)=\begin{cases}\theta e^{-\theta x} & x>0\\ 0 & \text{otherwise}\end{cases}$$


The intercept of the pdf on the vertical axis is always equal to the value of θ. This intercept is also the largest value of the pdf, since the Exponential distribution has its mode at the origin. Note that the pdfs for different values of θ eventually intersect one another. The mean and the standard deviation of the Exponential distribution are equal (both are 1/θ). In a random phenomenon, the time between independent events has the memoryless property, and it may appropriately be modelled by an Exponential random variable. For example, the times between the arrivals of a large number of customers who act independently of each other may be adequately fitted by an Exponential distribution.
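The following Python sketch gives a rough numerical check of two claims above: that the mean equals the standard deviation, and that the distribution is memoryless. It integrates x p_θ(x) and x² p_θ(x) with a midpoint rule. The rate θ = 0.5, the truncation point 60 and the step count are assumptions chosen for illustration. It then compares P{X > s + t | X > s} with P{X > t}.

```python
import math

def moment(k, theta, upper=60.0, steps=60_000):
    """Midpoint-rule value of E[X^k] for the pdf theta * exp(-theta x), x > 0.
    The integral is truncated at `upper`, where the remaining tail is negligible."""
    h = upper / steps
    return h * sum(((i + 0.5) * h) ** k * theta * math.exp(-theta * (i + 0.5) * h)
                   for i in range(steps))


def survival(x, theta):
    """P{X > x} = exp(-theta x)."""
    return math.exp(-theta * x)


theta = 0.5                                   # hypothetical rate
mean = moment(1, theta)
sd = math.sqrt(moment(2, theta) - mean**2)
print(mean, sd)                               # both close to 1/theta = 2

s, t = 3.0, 2.0                               # memoryless: P{X > s+t | X > s} = P{X > t}
print(survival(s + t, theta) / survival(s, theta), survival(t, theta))
```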
1.5.4 Gamma Distribution
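A function used to define the Gamma distribution is the Gamma function. A random variable X follows a Gamma distribution if
$$p_{\theta,\beta}(x)=\begin{cases}\dfrac{\theta^\beta}{\Gamma(\beta)}e^{-\theta x}x^{\beta-1} & x>0,\ \beta>0,\ \theta>0\\[4pt] 0 & \text{otherwise}\end{cases}$$
where β is called the shape parameter and 1/θ is called the scale parameter. Suppose $X_1, X_2, \cdots, X_n$ are independent and each $X_i\sim \exp(\frac{1}{\theta})$, i.e., each is Exponential with mean 1/θ. Then $\sum_{i=1}^{n}X_i\sim G(n,\frac{1}{\theta})$. The cumulative distribution function F(x) = P{X ≤ x} of X is given by
$$F(x)=\begin{cases}1-\displaystyle\int_x^\infty\frac{\theta^\beta}{\Gamma(\beta)}t^{\beta-1}e^{-\theta t}\,dt & x>0\\[4pt] 0 & \text{otherwise}\end{cases}$$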

1.5.5 Erlang Distribution


The Gamma distribution becomes the Erlang distribution of order k when β = k, a positive integer. In that case the cumulative distribution function F(x) is given by
$$F(x)=\begin{cases}1-\displaystyle\sum_{i=0}^{k-1}e^{-\theta x}\frac{(\theta x)^i}{i!} & x>0\\[4pt] 0 & \text{otherwise}\end{cases}$$
that is, one minus the sum of the first k terms of a Poisson distribution with mean θx.
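The Erlang formula can be checked against a direct numerical integral of the Gamma pdf θ^β e^{−θx}x^{β−1}/Γ(β) with β = k. Under this parameterization the Poisson terms have mean θx. This is a sketch only: the values θ = 0.8, k = 3 and x = 4 and the midpoint-rule step count are arbitrary choices.

```python
import math

def gamma_pdf(x, theta, beta):
    """Gamma pdf theta^beta / Gamma(beta) * exp(-theta x) * x^(beta - 1), x > 0."""
    return theta**beta / math.gamma(beta) * math.exp(-theta * x) * x ** (beta - 1)


def erlang_cdf(x, theta, k):
    """1 - sum_{i=0}^{k-1} exp(-theta x) (theta x)^i / i!, for integer k >= 1."""
    return 1 - sum(math.exp(-theta * x) * (theta * x) ** i / math.factorial(i)
                   for i in range(k))


def gamma_cdf_numeric(x, theta, beta, steps=100_000):
    """Midpoint-rule approximation of the integral of the Gamma pdf over (0, x]."""
    h = x / steps
    return h * sum(gamma_pdf((j + 0.5) * h, theta, beta) for j in range(steps))


theta, k, x = 0.8, 3, 4.0          # hypothetical values
print(erlang_cdf(x, theta, k), gamma_cdf_numeric(x, theta, k))
```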
1.5.6 Weibull Distribution
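A random variable X has a Weibull distribution if it has pdf
$$p_{\beta,\alpha,\gamma}(x)=\begin{cases}\dfrac{\beta}{\alpha}\left(\dfrac{x-\gamma}{\alpha}\right)^{\beta-1}\exp\left[-\left(\dfrac{x-\gamma}{\alpha}\right)^{\beta}\right] & x\ge\gamma\\[4pt] 0 & \text{otherwise}\end{cases}$$
The Weibull distribution has three parameters:
- γ (−∞ < γ < ∞), the location parameter;
- α (α > 0), the scale parameter;
- β (β > 0), the shape parameter.

When γ = 0 the Weibull pdf becomes
$$p_{\beta,\alpha}(x)=\begin{cases}\dfrac{\beta}{\alpha}\left(\dfrac{x}{\alpha}\right)^{\beta-1}\exp\left[-\left(\dfrac{x}{\alpha}\right)^{\beta}\right] & x\ge0\\[4pt] 0 & \text{otherwise}\end{cases}$$
When γ = 0 and β = 1, the Weibull distribution reduces to the Exponential distribution with pdf
$$p_\alpha(x)=\begin{cases}\dfrac{1}{\alpha}e^{-x/\alpha} & x\ge0\\[4pt] 0 & \text{otherwise}\end{cases}$$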


1.5.7 Triangular Distribution


A random variable X has a Triangular distribution if its pdf is given by
$$p_{a,b,c}(x)=\begin{cases}\dfrac{2(x-a)}{(b-a)(c-a)} & a\le x\le b\\[6pt] \dfrac{2(c-x)}{(c-b)(c-a)} & b<x\le c\\[6pt] 0 & \text{otherwise}\end{cases}$$
where a ≤ b ≤ c. The mode occurs at x = b. Since $E[X]=\frac{a+b+c}{3}$ and a ≤ b ≤ c, it follows that $\frac{2a+c}{3}\le E[X]\le\frac{a+2c}{3}$. The mode is used more often than the mean to characterize the Triangular distribution.
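The following small Python sketch integrates the Triangular pdf numerically with a midpoint rule, using hypothetical values a = 1, b = 2 and c = 6. It checks that the pdf integrates to 1, that E[X] = (a + b + c)/3, and that E[X] lies between (2a + c)/3 and (a + 2c)/3.

```python
def triangular_pdf(x, a, b, c):
    """Triangular pdf with a <= b <= c and mode b."""
    if a <= x <= b and b > a:
        return 2 * (x - a) / ((b - a) * (c - a))
    if b < x <= c:
        return 2 * (c - x) / ((c - b) * (c - a))
    return 0.0


def integrate(f, lo, hi, steps=50_000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


a, b, c = 1.0, 2.0, 6.0                      # hypothetical parameters
total = integrate(lambda x: triangular_pdf(x, a, b, c), a, c)
mean = integrate(lambda x: x * triangular_pdf(x, a, b, c), a, c)
print(total, mean, (a + b + c) / 3)
print((2 * a + c) / 3 <= mean <= (a + 2 * c) / 3)
```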
1.5.8 Empirical Distribution
An empirical distribution may be either continuous or discrete in nature. It is used to establish a statistical model for the available data whenever the data do not agree with the intended distribution or whenever one is unable to arrive at a known distribution.
(a) Empirical Continuous Distributions
The time taken to install 100 machines is collected. The data are given in Table 1.1, which gives the number of machines together with the time taken. For example, 30 machines were installed within 1 hour, 25 between 1 and 2 hours, 20 between 2 and 3 hours and 25 between 3 and 4 hours. X denotes the time taken to install a machine.

Table 1.1 Distribution of the time taken to install the machines

Duration (hours)   Frequency   p(x)   F(x) = P{X ≤ x}
0 ≤ x ≤ 1              30       .30        .30
1 < x ≤ 2              25       .25        .55
2 < x ≤ 3              20       .20        .75
3 < x ≤ 4              25       .25       1.00
(b) Empirical Discrete Distributions
At the end of each day, the number of shipments on the loading docks of an export company is observed over 100 days. The values 0, 1, 2, 3, 4 and 5 occur with frequencies 23, 15, 12, 10, 25 and 15 respectively. Let X be the number of shipments on the loading docks of the company at the end of the day. Then X is a discrete random variable which takes the values 0, 1, 2, 3, 4 and 5 with the distribution given in Table 1.2. Figure 1.1 is the Histogram of the number of shipments on the loading docks of the company.


Table 1.2 Distribution of number of shipments

Number of shipments x   Frequency   P{X = x}   F(x) = P{X ≤ x}
0 23 .23 .23
1 15 .15 .38
2 12 .12 .50
3 10 .10 .60
4 25 .25 .85
5 15 .15 1.00

Figure 1.1 Histogram of shipments (frequency against number of shipments, 0 to 5)

1.6 Diagnosis of distributions


If the probability of an occurrence per unit increment of the variable is constant, an Exponential distribution is apt to fit the data. If the variable of an item can take either positive or negative values, a Normal distribution may be appropriate for the data. Sometimes the variable of interest seems to follow the Normal probability distribution but is restricted to be greater than or less than a certain value; in that case a truncated Normal distribution will be adequate to fit the data. The Gamma and Weibull distributions are also used to describe such data, and the Exponential distribution is a special case of both of them. The Exponential, Gamma and Weibull distributions differ in the location of the modes of their pdfs and in the shapes of their tails for large and small times. The Exponential distribution has its mode at the origin, but the Gamma and Weibull distributions have their modes at some point (≥ 0) which is a function of the parameter values selected. The tail of the Gamma distribution is long, like that of an Exponential distribution, while the tail of the Weibull distribution may decline more rapidly or less rapidly than that of an Exponential distribution. In practice, if there are more large values of the variable than an Exponential distribution can account for, a Weibull distribution may provide a better fit to the data.

Illustration 1.6.1
Sixteen pieces of equipment were produced and placed on test. Table 1.3 gives the lengths of the time intervals between failures in hours.

Table 1.3 Time between failures of the equipment

Equipment number                1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
Time between failures (hours)  19  12  16   1  15   5  10   1  46   7  33  25   4   9   1  10

For simplicity in processing the data, one can arrange them as an ordered set, as given below:

Table 1.4 Ordered set of times between failures

Order number                    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
Time between failures (hours)   1   1   1   4   5   7   9  10  10  12  15  16  19  25  33  46

On this basis, one may construct a Histogram to judge the pattern of the data in Table 1.4. An approximate width of the class interval can be determined from the formula (Sturges' rule)
$$\Delta t=\frac{\text{maximum value}-\text{minimum value}}{1+3.3\log_{10}N}$$
where the maximum and minimum are the largest and smallest values in the ordered set and N is the number of observations. In this case the maximum value is 46, the minimum value is 1 and N is 16. Thus
$$\Delta t=\frac{46-1}{1+3.3\log_{10}16}=\frac{45}{4.97}=9.05\approx10,$$
which is taken as the width of the class interval.

Table 1.5 Frequency distribution

Time interval (hours)   0 - 10   10 - 20   20 - 30   30 - 40   40 - 50
Number of equipment        9        4         1         1         1
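The class-width calculation and the counts in Table 1.5 can be reproduced with a few lines of Python. The class convention is an assumption: each class is taken to include its upper limit, i.e. (0, 10], (10, 20], and so on. This convention is inferred from the counts in Table 1.5, where the two values equal to 10 fall in the first class.

```python
import math

# Ordered times between failures (hours), Table 1.4
times = [1, 1, 1, 4, 5, 7, 9, 10, 10, 12, 15, 16, 19, 25, 33, 46]

N = len(times)
width = (max(times) - min(times)) / (1 + 3.3 * math.log10(N))
print(f"approximate class width: {width:.2f}")

class_width = 10                                  # width rounded up, as in the text
lower_limits = range(0, 50, class_width)          # 0, 10, 20, 30, 40
# Assumed convention: each class is (lower, lower + 10], which reproduces Table 1.5
freq = [sum(1 for t in times if lo < t <= lo + class_width) for lo in lower_limits]
for lo, f in zip(lower_limits, freq):
    print(f"{lo} - {lo + class_width}: {f}")
```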


A Histogram drawn from the frequency distribution in Table 1.5 is given in Figure 1.2.
Figure 1.2 Histogram of time to failures (number of equipment 9, 4, 1, 1, 1 in the time intervals 0 - 10, 10 - 20, 20 - 30, 30 - 40, 40 - 50)

The Histogram reveals that the distribution could be Negative Exponential or the right portion of a Normal distribution. Assume that the time to failure follows an Exponential distribution of the form
$$p_\theta(x)=\begin{cases}\theta e^{-\theta x} & \theta>0,\ x>0\\ 0 & \text{otherwise}\end{cases}$$
How far this assumption is valid has to be verified. The validity of the assumption is tested by the χ² test of goodness of fit.

Table 1.6 Distribution of time to failures

Interval    p_i     Expected frequency E   Observed frequency O
0 - 10     .5262      8.41 ≈ 8                  9
10 - 20    .2493      3.98 ≈ 4                  4
20 - 30    .1181      1.886 ≈ 2                 1
30 - 40    .0559      .8944 ≈ 1                 1
40 - 50    .0265      .424 ≈ 1                  1


where
$$p_i=\int_{x_i}^{x_{i+1}}\theta e^{-\theta x}\,dx=e^{-\theta x_i}-e^{-\theta x_{i+1}},\qquad x_i=0,10,20,30,40,\quad x_{i+1}=x_i+10,$$
and E = 16 p_i. If some expected cell frequencies are less than 5, adjacent cells are pooled so that each expected frequency is 5 or more. Here pooling gives only two classes, 0 - 10 and 10 - 50. The expected frequencies are 8 each, and the corresponding observed frequencies are 9 and 7 respectively. With two classes and one estimated parameter, no degrees of freedom remain (2 − 1 − 1 = 0). The χ² test of goodness of fit therefore fails to test the validity of the assumption that the sample data come from an Exponential distribution with parameter θ = 1/13.38 = .0747 failures per hour. Here the mean life time of the equipment is 214/16 = 13.38 hours. To test the validity of the assumption that the time to failure follows an Exponential distribution, consider the likelihood function of the cell frequencies $o_1 = 9$ and $o_2 = 7$:
$$L=\begin{cases}\dfrac{n!}{o_1!\,o_2!}\left(\dfrac{e_1}{n}\right)^{o_1}\left(\dfrac{e_2}{n}\right)^{o_2} & o_1+o_2=n\\[6pt] 0 & \text{otherwise}\end{cases}$$
Under $H_0$ the likelihood function is a Binomial probability law b(16, p), where $p=\frac{e_1}{n}$. Testing $H_0$: the fit is the best one vs $H_1$: the fit is not the best one is equivalent to testing $H_0: p\le .5$ vs $H_1: p>.5$. The UMP level α = .05 test is given by
$$\phi(x)=\begin{cases}1 & x>11\\ .17 & x=11\\ 0 & \text{otherwise}\end{cases}$$
The observed value is 9, which is less than 11, so there is no evidence to reject the hypothesis $H_0$ at the 5% level of significance. The data may therefore be regarded as coming from an Exponential distribution; that is, the time to failure of the equipment follows an Exponential distribution. One may conclude that, on average, the equipment would operate for 13.38 hours without failure.
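The numbers in this illustration can be recomputed in Python. The sketch first recomputes the cell probabilities p_i = e^{−θx_i} − e^{−θx_{i+1}} and expected frequencies E = 16p_i, using the rounded rate θ = .0747 from the text. Rounding p_4 rather than truncating it gives .0560; Table 1.6 shows .0559. The sketch then derives the constants of the randomized UMP test from the b(16, .5) probabilities. It assumes the standard construction for a one-sided Binomial test: start from the upper tail and stop at the first value whose inclusion would push the size above α.

```python
import math

THETA = 0.0747          # rounded failure rate 1/13.38 used in the text
N_EQUIP = 16            # number of pieces of equipment
EDGES = [0, 10, 20, 30, 40, 50]

# Cell probabilities p_i = exp(-theta x_i) - exp(-theta x_{i+1}) and E = 16 p_i
p = [math.exp(-THETA * lo) - math.exp(-THETA * hi) for lo, hi in zip(EDGES, EDGES[1:])]
expected = [N_EQUIP * pi for pi in p]
for lo, hi, pi, e in zip(EDGES, EDGES[1:], p, expected):
    print(f"{lo:>2} - {hi:<2}  p = {pi:.4f}  E = {e:.3f}")


def ump_upper_binomial(n, p0, alpha):
    """Randomized UMP size-alpha test of H0: p <= p0 vs H1: p > p0, X ~ b(n, p).

    Returns (k, gamma): reject H0 if x > k; reject with probability gamma if x == k.
    """
    pmf = [math.comb(n, x) * p0**x * (1 - p0) ** (n - x) for x in range(n + 1)]
    tail = 0.0                                   # P{X > k} when p = p0
    for k in range(n, -1, -1):
        if tail + pmf[k] > alpha:                # adding x = k would exceed alpha
            return k, (alpha - tail) / pmf[k]
        tail += pmf[k]
    return -1, 0.0                               # alpha >= 1: always reject


k, gamma = ump_upper_binomial(N_EQUIP, 0.5, 0.05)
print(k, round(gamma, 2))                        # compare with phi(x) in the text

observed = 9
if observed > k:
    decision = "reject H0"
elif observed == k:
    decision = f"reject H0 with probability {gamma:.2f}"
else:
    decision = "do not reject H0"
print(decision)
```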

1.7 Quantile - Quantile plot


The construction of Histograms and the recognition of a distributional shape are necessary ingredients for selecting a family of distributions to represent sample data. A Histogram, however, is not very useful for evaluating the fit of the chosen distribution. When there is a small number of data points (≤ 30), a Histogram can be rather ragged. Further, the perception of the fit depends on the width of the Histogram intervals. Even if the intervals are well chosen, grouping the data into cells makes it difficult to compare a Histogram with a continuous pdf. A quantile - quantile (q - q) plot is a useful tool for evaluating distribution fit that does not suffer from these problems.
If X is a random variable with cumulative distribution function F(x), then the q quantile of X is the value y such that F(y) = P{X ≤ y} = q for 0 < q < 1. When F(x) has an inverse, $y=F^{-1}(q)$. Let $x_1, x_2, \cdots, x_n$ be sample observations of X. Order the observations from the smallest to the largest and denote these as $y_j$, j = 1 to n, where $y_1\le y_2\le\cdots\le y_n$. The index j is called the rank or order number; thus j = 1 for the smallest observation and j = n for the largest. The q - q plot is based on the fact that $y_j$ is an estimate of the $\left(\frac{j-1/2}{n}\right)$ quantile of X, i.e., $y_j$ is approximately
$$F^{-1}\left(\frac{j-\frac12}{n}\right).$$


A distribution with cumulative distribution function F(x) is a possible representation of the random variable X. If F(x) is a member of an appropriate family of distributions, then a plot of $y_j$ versus $F^{-1}\left(\frac{j-1/2}{n}\right)$ will be approximately a straight line.
If F(x) is from an appropriate family of distributions and also has appropriate parameter values, then the line will have slope 1. On the other hand, if the assumed distribution is inappropriate, the points will deviate from a straight line in a systematic manner. The decision whether to accept or reject a hypothesized distribution is subjective.
In the construction of a q - q plot, the following should be borne in mind.
(i) The observed values will never fall exactly on a straight line.
(ii) The ordered values are not independent, since they have been ranked.
(iii) The variances of the extremes are much higher than the variances in the middle of the plot. Greater discrepancies can therefore be accepted at the extremes, and the linearity of the points in the middle of the plot is more important than the linearity at the extremes.

Illustration 1.7.1
A sample of 20 repairing times of electronic watches was considered. The repairing time X, measured in seconds, is a random variable. The values are arranged in increasing order of magnitude in Table 1.7.

Table 1.7 Repairing times of electronic watch


j Value j Value j Value j Value
1 88.54 6 88.82 11 88.98 16 89.26
2 88.56 7 88.85 12 89.02 17 89.30
3 88.60 8 88.90 13 89.08 18 89.35
4 88.64 9 88.95 14 89.18 19 89.41
5 88.75 10 88.97 15 89.25 20 89.45


Table 1.8 Normal quantiles

 j   (j − 1/2)/20   z_j (standard Normal quantile)   x_j = 0.08 z_j + 88.993
 1       .025             −1.96                          88.84
 2       .075             −1.44                          88.88
 3       .125             −1.15                          88.90
 4       .175             −0.93                          88.92
 5       .225             −0.76                          88.93
 6       .275             −0.60                          88.95
 7       .325             −0.45                          88.96
 8       .375             −0.32                          88.97
 9       .425             −0.19                          88.98
10       .475             −0.06                          88.99
11       .525              0.06                          89.00
12       .575              0.19                          89.01
13       .625              0.32                          89.02
14       .675              0.45                          89.03
15       .725              0.60                          89.04
16       .775              0.76                          89.05
17       .825              0.93                          89.07
18       .875              1.15                          89.09
19       .925              1.44                          89.11
20       .975              1.96                          89.15
The ordered observations in Table 1.7 are then plotted versus the Normal quantiles $F^{-1}\left(\frac{j-1/2}{n}\right)$ of Table 1.8, for j = 1, 2, · · · , 20, to obtain the q - q plot. Here F(·) is the cumulative distribution function of the Normal random variable X with mean 88.993 seconds and standard deviation .08 seconds. The plotted values are shown in Figure 1.3. The general perception of a straight line is quite clear in the q - q plot, supporting the hypothesis of a Normal distribution.

Figure 1.3 q − q plot of the repairing times (Normal quantile against repairing time)
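The following Python sketch computes the q - q plot coordinates for Illustration 1.7.1 with `statistics.NormalDist` from the standard library (Python 3.8 or later). It pairs each ordered repairing time y_j with the Normal quantile F^{-1}((j − 1/2)/20), taking mean 88.993 and standard deviation .08 as in the text. Plotting the pairs with any plotting tool gives a picture like Figure 1.3.

```python
from statistics import NormalDist, fmean

# Ordered repairing times (seconds), Table 1.7
times = [88.54, 88.56, 88.60, 88.64, 88.75, 88.82, 88.85, 88.90, 88.95, 88.97,
         88.98, 89.02, 89.08, 89.18, 89.25, 89.26, 89.30, 89.35, 89.41, 89.45]

n = len(times)
mu = round(fmean(times), 3)          # 88.993, the sample mean
sigma = 0.08                         # standard deviation used in the text
model = NormalDist(mu, sigma)

# q - q pairs: (ordered observation y_j, Normal quantile F^{-1}((j - 1/2)/n))
pairs = [(y, model.inv_cdf((j - 0.5) / n)) for j, y in enumerate(sorted(times), start=1)]
for j, (y, q) in enumerate(pairs, start=1):
    print(f"{j:>2}  {y:.2f}  {q:.2f}")
```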

Note: The diagnoses of statistical distributions in real life problems are not exact; at best they represent reasonable approximations.
Problems
1.1 The mean and variance of the number of defective items drawn randomly one
by one with replacement from a lot are found to be 10 and 6 respectively. The
distribution of the number of defective items is:
(a) Poisson with mean 10


(b) Binomial with n = 25 and p = 0.4


(c) Normal with mean 10 and variance 6
(d) None of the above
1.2 If X is a Poisson random variate with mean 3, then P {|X − 3| < 1} will be:
(a) $\frac{1}{2}e^{-3}$ (b) $3e^{-3}$ (c) $4.5e^{-3}$ (d) $27e^{-3}$
1.3 Let $U_{(1)}, U_{(2)}, \cdots, U_{(n)}$ be the order statistics of a random sample $U_1, U_2, \cdots, U_n$ of size n from the uniform (0, 1) distribution. Then the conditional distribution of $U_1$ given $U_{(n)}=u_{(n)}$ is given by:
(a) Uniform on $(0, u_{(n)})$
(b) $P\{U_1=u_{(n)}\}=\frac{1}{n}$ and, with probability $\frac{n-1}{n}$, $U_1$ is uniformly distributed over $(0, u_{(n)})$
(c) Beta$\left(\frac{1}{n},\frac{n-1}{n}\right)$
(d) Uniform (0, 1)
1.4 A biased coin is tossed 4 times or until a head turns up, whichever occurs earlier.
The distribution of the number of tails turning up is:
(a) Binomial (b) Geometric (c) Negative Binomial (d) Hypergeometric
1.5 If X and Y are independent Exponential random variables with the same mean
λ , then the distribution of min(X, Y ) is :
(a) Exponential with mean $\frac{\lambda}{2}$
(b) Exponential with mean λ
(c) not Exponential with mean λ
(d) Exponential with mean 2λ
1.6 The $\chi^2$ goodness of fit test is based on the assumption that the character under study is
(a) Normal (b) Non - Normal (c) any distribution (d) not required
1.7 Each experimental unit of a random sample of size n is classified into one of k (> 2) categories. The exact distribution of the $\chi^2$ goodness of fit statistic depends on:
(a) Hypergeometric distribution
(b) Normal distribution
(c) Multinomial distribution
(d) Binomial distribution
1.8 If X1 ∼ b(n1 , θ1 ) , X2 ∼ b(n2 , θ2 ) and X1 , X2 are independent, then the
sum of the variates X1 + X2 is distributed as :
(a) Hypergeometric distribution
(b) Binomial distribution
(c) Poisson distribution
(d) None of the above
1.9 If X1 ∼ b(n1 , θ) , X2 ∼ b(n2 , θ) and X1 , X2 are independent, then the sum
of the variates X1 + X2 is distributed as :
(a) Hypergeometric distribution
(b) Binomial distribution


(c) Poisson distribution


(d) None of the above
1.10 If $X_1\sim P(\theta_1)$, $X_2\sim P(\theta_2)$ and $X_1, X_2$ are independent, then the sum of
the variates X1 + X2 is distributed as :
(a) Hypergeometric distribution
(b) Binomial distribution
(c) Poisson distribution
(d) None of the above
1.11 The skewness of a Binomial distribution will be zero if:
(a) p < .5 (b) p > .5 (c) p ≠ .5 (d) p = .5
1.12 If the sample size n = 2, Student's t - distribution reduces to:
(a) Normal distribution
(b) F - distribution
(c) χ2 - distribution
(d) Cauchy distribution
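1.13 The reciprocal property of the $F_{n_1-1,\,n_2-1}$ distribution can be expressed as:
(a) $F_{n_2,n_1}(1-\alpha)=\dfrac{1}{F_{n_1,n_2}(\alpha)}$
(b) $P\{F_{n_1,n_2}(\alpha)\ge c\}=P\left\{F_{n_2,n_1}(\alpha)\le\dfrac{1}{c}\right\}$
(c) $F_{n_2,n_1}\left(1-\dfrac{\alpha}{2}\right)=\dfrac{1}{F_{n_1,n_2}\left(\frac{\alpha}{2}\right)}$
(d) All the above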
1.14 The distribution for which the moment generating function is not useful in finding the moments is:
(a) Binomial distribution
(b) Negative Binomial distribution
(c) Hypergeometric distribution
(d) Geometric distribution
1.15 Probability of selecting a unit from a population of N units in a simple random
sampling technique is a :
(a) Bernoulli distribution
(b) Binomial distribution
(c) Geometric distribution
(d) discrete Uniform distribution
1.16 A production process is a sequence of Bernoulli trials. The number x of defective units in a sample of n units follows a:
(a) Bernoulli distribution
(b) Binomial distribution
(c) Multinomial distribution
(d) Hypergeometric distribution
1.17 A random variable X is related to a sequence of Bernoulli trials in which (x + 1) trials are needed to achieve the first success. Then the distribution of X is:
(a) Bernoulli distribution
(b) Binomial distribution
(c) Multinomial distribution
(d) Geometric distribution
1.18 If $X_1, X_2, \cdots, X_n$ are iid Geometric variables, then $\sum_{i=1}^{n}X_i$ follows:
(a) Negative Binomial distribution
(b) Binomial distribution
(c) Multinomial distribution
(d) Geometric distribution
1.19 A random variable X is related to a sequence of Bernoulli trials: it counts the x failures preceding the nth success in (x + n) trials. X follows a:
(a) Binomial distribution
(b) Multinomial distribution
(c) Negative Binomial distribution
(d) Geometric distribution
1.20 If a random experiment has only two mutually exclusive outcomes of a Bernoulli
trial, then the random variable leads to:
(a) Binomial distribution
(b) Multinomial distribution
(c) Negative Binomial distribution
(d) Geometric distribution
1.21 A box contains N balls, M of which are white and N − M red. If X denotes the number of white balls in a sample of n balls drawn with replacement, then X is a:
(a) Binomial variate
(b) Bernoulli variate
(c) Negative Binomial variate
(d) Hypergeometric variate
1.22 The number of independent events that occur in a fixed amount of time may
follow:
(a) Exponential distribution
(b) Poisson distribution
(c) Geometric distribution
(d) Gamma distribution
1.23 A power series distribution
$$P_\theta\{X=x\}=\begin{cases}\dfrac{a_x\theta^x}{f(\theta)} & x\in S,\ a_x\ge0\\[4pt] 0 & \text{otherwise}\end{cases}$$
where $f(\theta)=(1+\theta)^n$, $\theta=\frac{p}{1-p}$ and S = {0, 1, 2, · · · }. Then the random variable X has

(a) Geometric distribution


(b) Bernoulli distribution
(c) Binomial distribution
(d) Negative Binomial distribution
1.24 The probability function $p(x)=\frac{2}{3^{x+1}}$ for x = 0, 1, 2, 3, · · · represents:
(a) Negative Binomial distribution
(b) Binomial distribution
(c) Bernoulli distribution
(d) Geometric distribution
1.25 Dinesh Kumar receives 2, 2, 4 and 4 telephone calls on 4 randomly selected days.
Assuming that the telephone calls follow Poisson distribution, the estimate of the
number of telephone calls in 8 days is:
(a) 12 (b) 3 (c) 24 (d) none of the above
1.26 Each experimental unit of a random sample of size n is classified into one of two categories. The exact distribution of the $\chi^2$ goodness of fit statistic depends on:
(a) Hypergeometric distribution
(b) Normal distribution
(c) Multinomial distribution
(d) Binomial distribution
1.27 The pmf of a random variable X is
$$p_\theta(x)=\begin{cases}\displaystyle\sum_{k=0}^{\infty}(-1)^k\binom{k+x}{k}\frac{\theta^{x+k}}{\Gamma(x+k+1)} & x=0,1,\cdots\\[4pt] 0 & \text{otherwise}\end{cases}$$
It is known as
(a) Binomial (b) Negative Binomial (c) Poisson (d) Geometric

2. CRITERIA OF POINT ESTIMATION

2.1 Introduction
In real life applications, determining appropriate distributions from the random sample is a major task. A faulty assumption about the distribution will lead to misleading recommendations. Once a family of distributions indexed by a parameter has been selected, the next step is to estimate the parameters of the distribution. The criteria of point estimators for many standard distributions are described in this chapter.
The set of all admissible values of the parameters of a distribution is called the parameter space Ω. Any member of the parameter space is called a parameter. For example, suppose a random variable X is assumed to follow a normal distribution with mean θ and variance σ². The parameter space is Ω = {(θ, σ²) | −∞ < θ < ∞, 0 < σ² < ∞}. Suppose a random sample $X_1, X_2, \cdots, X_n$ is taken on X. One seeks a statistic T = t(X), computed from the sample, which gives the best value for the parameter θ. The particular value $t(x)=\bar x$ of the statistic, based on the observed values $x_1, x_2, \cdots, x_n$, is called an estimate of θ. If the statistic $T=\bar X$ is used to estimate the unknown parameter θ, then the sample mean is called an estimator of θ. Thus an estimator is a rule or a procedure to estimate the value of θ, while the numerical value $\bar x$ is called an estimate of θ.

2.2 Point Estimator


Let $X_1, X_2, \cdots, X_n$ be a random sample of n independent identically distributed (iid) random variables drawn from a population with probability density function (pdf) $p_\theta(x)$, θ ∈ Ω. The statistic T = t(X) is said to be a point estimator of θ if the function T = t(X) maps each sample to a single point $\hat\theta(X)$ of the parameter space Ω.

2.3 Problems of Point Estimation


The problems involved in point estimation are
• to select or choose a statistic T = t(X) .
• to find the distribution function of the statistic T = t(X) .
• to verify that the selected statistic satisfies the criteria of point estimation.

2.4 Criteria of the Point Estimation


The criteria of the point estimation are
(i) Consistency
(ii) Unbiasedness
(iii) Sufficiency and
(iv) Efficiency


2.5 Consistency
Consistency is a convergence property of an estimator. It is an asymptotic, or large sample, property. Let $X_1, X_2, \cdots, X_n$ be an iid random sample drawn from a population with common distribution $P_\theta$, θ ∈ Ω. An estimator T = t(X) is consistent for θ if for every ε > 0 and for each fixed θ ∈ Ω,
$$P_\theta\{|T-\theta|>\epsilon\}\to0\quad\text{as } n\to\infty,$$
i.e., $T\xrightarrow{P}\theta$ as n → ∞ for fixed θ ∈ Ω.
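Example 2.1 Let $X_1, X_2, \cdots, X_n$ be a random sample drawn from a normal population with mean θ and known variance σ². The statistic $T=\bar X$ is chosen as an estimator of the parameter θ, and $\bar X\sim N\left(\theta,\frac{\sigma^2}{n}\right)$. To test the consistency of the estimator, consider for every ε > 0 and fixed θ ∈ Ω,
$$\begin{aligned}
P_\theta\{|\bar X-\theta|>\epsilon\} &= 1-P_\theta\{|\bar X-\theta|<\epsilon\} = 1-P_\theta\{-\epsilon<\bar X-\theta<\epsilon\}\\
&= 1-P_\theta\left\{-\frac{\epsilon\sqrt n}{\sigma}<\frac{\bar X-\theta}{\sigma/\sqrt n}<\frac{\epsilon\sqrt n}{\sigma}\right\}\\
&= 1-P_\theta\left\{-\frac{\epsilon\sqrt n}{\sigma}<Z<\frac{\epsilon\sqrt n}{\sigma}\right\},\qquad\text{where } Z=\frac{\bar X-\theta}{\sigma/\sqrt n}\sim N(0,1)\\
&\to 1-P\{-\infty<Z<\infty\} = 1-1 = 0\quad\text{as } n\to\infty.
\end{aligned}$$
Thus $\bar X\xrightarrow{P}\theta$ as n → ∞. The sample mean $\bar X$ of the normal population is a consistent estimator of the population mean θ.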
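For Example 2.1 the probability has the closed form P{|X̄ − θ| > ε} = 2[1 − Φ(ε√n/σ)], where Φ is the standard Normal cdf. The sketch below evaluates this expression for increasing n with hypothetical values ε = 0.5 and σ = 2, and shows it tending to 0.

```python
from math import sqrt
from statistics import NormalDist

def prob_far(n, eps, sigma):
    """P{|Xbar - theta| > eps} when Xbar ~ N(theta, sigma^2 / n)."""
    z = eps * sqrt(n) / sigma
    return 2 * (1 - NormalDist().cdf(z))


eps, sigma = 0.5, 2.0                      # hypothetical values
sizes = (1, 10, 100, 1000)
probs = [prob_far(n, eps, sigma) for n in sizes]
for n, pr in zip(sizes, probs):
    print(n, pr)
```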
Remark 2.1 In general, the sample mean need not be a consistent estimator of the population mean.
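Example 2.2 Let $X_1, X_2, X_3, \cdots, X_n$ be an iid random sample from a Cauchy population with pdf
$$p_\theta(x)=\frac{1}{\pi}\,\frac{1}{1+(x-\theta)^2},\qquad -\infty<x<\infty.$$
For every ε > 0 and fixed θ ∈ Ω,
$$\begin{aligned}
P_\theta\{|\bar X-\theta|>\epsilon\} &= 1-P_\theta\{-\epsilon<\bar X-\theta<\epsilon\} = 1-P_\theta\{\theta-\epsilon<\bar X<\theta+\epsilon\}\\
&= 1-\int_{\theta-\epsilon}^{\theta+\epsilon}\frac{1}{\pi}\,\frac{1}{1+(\bar x-\theta)^2}\,d\bar x\\
&= 1-\int_{-\epsilon}^{\epsilon}\frac{1}{\pi}\,\frac{1}{1+z^2}\,dz\qquad\text{where } z=\bar x-\theta\\
&= 1-\frac{1}{\pi}\left[\tan^{-1}z\right]_{-\epsilon}^{\epsilon} = 1-\frac{2}{\pi}\tan^{-1}\epsilon\qquad\text{since } \tan^{-1}(-\epsilon)=-\tan^{-1}\epsilon.
\end{aligned}$$
The second line holds since $\bar X$ has the Cauchy distribution with parameter θ for every n. The resulting probability does not depend on n. Thus $P_\theta\{|\bar X-\theta|>\epsilon\}\not\to0$ as n → ∞, i.e., $\bar X$ does not converge in probability to θ. For a Cauchy population the sample mean $\bar X$ is not a consistent estimator of the parameter θ.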
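In contrast to Example 2.1, the Cauchy probability 1 − (2/π)tan^{-1}ε does not change with n. The following sketch evaluates the exact expression. As a rough check it also runs a small Monte Carlo simulation using inverse-cdf sampling, X = θ + tan(π(U − 1/2)) with U uniform on (0, 1). The choices θ = 0, ε = 1, 4000 replications and the seed are arbitrary.

```python
import math
import random

def exact_prob(eps):
    """1 - (2/pi) * arctan(eps): P{|Xbar - theta| > eps} for every n (Cauchy case)."""
    return 1 - (2 / math.pi) * math.atan(eps)


def simulated_prob(n, eps, theta=0.0, reps=4000, seed=1):
    """Monte Carlo estimate of P{|Xbar - theta| > eps} for Cauchy samples of size n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Inverse-cdf sampling: theta + tan(pi (U - 1/2)) is Cauchy with parameter theta
        xbar = sum(theta + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)) / n
        hits += abs(xbar - theta) > eps
    return hits / reps


eps = 1.0
print(exact_prob(eps))
for n in (1, 100):
    print(n, simulated_prob(n, eps))     # stays near the exact value as n grows
```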

2.6 Sufficient condition for consistency


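Theorem 2.1 If $\{T_n\}_{n=1}^{\infty}$ is a sequence of estimators such that $E_\theta[T_n]\to\theta$ and $V_\theta[T_n]\to0$ as n → ∞, then the statistic $T_n$ is a consistent estimator of the parameter θ.
Proof Consider
$$\begin{aligned}
E_\theta[T_n-\theta]^2 &= E_\theta\left(T_n-E_\theta[T_n]+E_\theta[T_n]-\theta\right)^2\\
&= E_\theta\left(T_n-E_\theta[T_n]\right)^2+\left\{E_\theta[T_n]-\theta\right\}^2\\
&= V_\theta[T_n]+\left\{E_\theta[T_n]-\theta\right\}^2,
\end{aligned}$$
since the cross term vanishes because $E_\theta(T_n-E_\theta[T_n])=0$. By Chebyshev's inequality,
$$\begin{aligned}
P_\theta\{|T_n-\theta|>\epsilon\} &\le \frac{1}{\epsilon^2}E_\theta[T_n-\theta]^2 = \frac{1}{\epsilon^2}\left[V_\theta[T_n]+\left\{E_\theta[T_n]-\theta\right\}^2\right]\\
&\to 0\quad\text{as } n\to\infty,
\end{aligned}$$
since $V_\theta[T_n]\to0$ and $E_\theta[T_n]\to\theta$ as n → ∞.
∴ $T_n$ is a consistent estimator of θ.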
Remark 2.2 The conditions are only sufficient, not necessary. Suppose $\{X_n\}_{n=1}^{\infty}$ is a sequence of iid random variables from a population with finite mean $\theta=E_\theta[X]$. Then $\bar X$ converges to θ in probability for each fixed θ ∈ Ω. This is Khintchine's Weak Law of Large Numbers: whenever the population mean exists finitely, the sample mean $\bar X$ is a consistent estimator of the population mean θ. It does not require the condition $V_\theta[\bar X]\to0$ as n → ∞ for every fixed θ ∈ Ω. Thus the consistency of $\bar X$ follows from the existence of the expectation, and the assumption of finite variance is not needed.
For illustration, the standard Cauchy pdf is
$$p(x)=\frac{1}{\pi}\,\frac{1}{1+x^2},\qquad -\infty<x<\infty.$$
The mean E[X] does not exist finitely, i.e.,
$$E[X]=\int_{-\infty}^{\infty}\frac{1}{\pi}\,\frac{x}{1+x^2}\,dx$$
is divergent. But the Cauchy principal value is
$$\begin{aligned}
\lim_{t\to\infty}\frac{1}{\pi}\int_{-t}^{t}\frac{x}{1+x^2}\,dx &= \lim_{t\to\infty}\frac{1}{2\pi}\int_{-t}^{t}\frac{2x}{1+x^2}\,dx = \lim_{t\to\infty}\frac{1}{2\pi}\left[\log(1+x^2)\right]_{-t}^{t}\\
&= \lim_{t\to\infty}\frac{1}{2\pi}\left[\log(1+t^2)-\log(1+t^2)\right] = 0.
\end{aligned}$$
The Cauchy principal value 0 is sometimes taken as the mean of the Cauchy distribution, but the mean itself does not exist finitely. Hence, for the Cauchy population, the sample mean $\bar X$ is not a consistent estimator of the parameter θ.
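Example 2.3 If $X_1, X_2, \cdots, X_n$ is a random sample drawn from a normal population N(0, σ²), show that $\frac{1}{3n}\sum_{k=1}^{n}X_k^4$ is a consistent estimator of σ⁴.
Let $T=\frac{1}{3n}\sum_{k=1}^{n}X_k^4$. Then
$$\begin{aligned}
E_\sigma[T] &= \frac{1}{3n}\sum_{k=1}^{n}E_\sigma[X_k^4] = \frac{1}{3n}\sum_{k=1}^{n}E_\sigma[X_k-0]^4\qquad\text{since } E[X_k]=0\ \forall\,k=1,2,\cdots\\
&= \frac{1}{3n}\,n\mu_4 = \frac{1}{3n}\,3n\sigma^4 = \sigma^4,
\end{aligned}$$
since $\mu_4=3\sigma^4$, where $\mu_{2r}=1\times3\times5\times\cdots\times(2r-1)\,\sigma^{2r}$, r = 1, 2, · · · . Also
$$\begin{aligned}
V_\sigma[T] &= \frac{1}{(3n)^2}\sum_{k=1}^{n}V_\sigma[X_k^4] = \frac{1}{(3n)^2}\sum_{k=1}^{n}\left\{E_\sigma[X_k^8]-\left(E_\sigma[X_k^4]\right)^2\right\}\\
&= \frac{1}{(3n)^2}\,n\left[\mu_8-\mu_4^2\right] = \frac{1}{3^2n}\left[105\sigma^8-(3\sigma^4)^2\right]\qquad\text{since } \mu_8=1\times3\times5\times7\,\sigma^8\\
&= \frac{96\sigma^8}{3^2n}\to0\quad\text{as } n\to\infty.
\end{aligned}$$
Thus, by Theorem 2.1, T is a consistent estimator of σ⁴.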
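Example 2.3 uses the facts μ_4 = 3σ⁴ and μ_8 = 105σ⁸ for N(0, σ²), and these can be checked numerically. The Python sketch below compares the product formula with a midpoint-rule integral; the integration range of ±12σ and the step count are assumptions. It also evaluates V_σ[T] = 96σ⁸/(9n).

```python
import math

def normal_moment_formula(r, sigma):
    """mu_{2r} = 1 * 3 * 5 * ... * (2r - 1) * sigma^(2r) for N(0, sigma^2)."""
    prod = 1
    for odd in range(1, 2 * r, 2):
        prod *= odd
    return prod * sigma ** (2 * r)


def normal_moment_numeric(k, sigma, half_width=12.0, steps=240_000):
    """Midpoint-rule value of E[X^k] for X ~ N(0, sigma^2), truncated at +-12 sigma."""
    lo = -half_width * sigma
    h = 2 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += x**k * math.exp(-x * x / (2 * sigma**2))
    return total * h / (sigma * math.sqrt(2 * math.pi))


def var_T(n, sigma):
    """V[T] for T = (1/(3n)) * sum X_k^4: n (mu_8 - mu_4^2) / (3n)^2."""
    mu4 = normal_moment_formula(2, sigma)
    mu8 = normal_moment_formula(4, sigma)
    return n * (mu8 - mu4**2) / (3 * n) ** 2


sigma = 1.0
print(normal_moment_formula(2, sigma), normal_moment_numeric(4, sigma))   # mu_4
print(normal_moment_formula(4, sigma), normal_moment_numeric(8, sigma))   # mu_8
print([var_T(n, sigma) for n in (1, 10, 100)])                            # 96/(9n)
```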
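Example 2.4 Let $X_1, X_2, \cdots, X_n$ be a random sample drawn from a population with rectangular distribution U(0, θ), θ > 0. Show that $\left(\prod_{i=1}^{n}X_i\right)^{1/n}$ is a consistent estimator of $\theta e^{-1}$.
Let $GM=\left(\prod_{i=1}^{n}X_i\right)^{1/n}$, $X_i>0$, i = 1, 2, · · · , n. Then
$$\log_e GM=\frac{1}{n}\sum_{i=1}^{n}\log X_i$$
$$\begin{aligned}
E_\theta[\log X] &= \frac{1}{\theta}\int_0^\theta\log x\,dx = \frac{1}{\theta}\left\{[x\log x]_0^\theta-\int_0^\theta dx\right\}\\
&= \frac{1}{\theta}\left[\theta\log\theta-\lim_{x\to0}x\log x-\theta\right] = \log\theta-1,
\end{aligned}$$
since
$$\lim_{x\to0}x\log x=\lim_{x\to0}\frac{\log x}{1/x}=\lim_{x\to0}\frac{1/x}{-1/x^2}=\lim_{x\to0}(-x)=0.$$
$$\begin{aligned}
E_\theta[(\log X)^2] &= \frac{1}{\theta}\int_0^\theta(\log x)^2\,dx = \frac{1}{\theta}\left[x(\log x)^2\right]_0^\theta-\frac{1}{\theta}\int_0^\theta 2x\,\frac{\log x}{x}\,dx\\
&= (\log\theta)^2-\frac{1}{\theta}\lim_{x\to0}x(\log x)^2-\frac{2}{\theta}\left[\theta\log\theta-\theta\right]\\
&= (\log\theta)^2-2\log\theta+2\qquad\text{since } \lim_{x\to0}x(\log x)^2=0
\end{aligned}$$
$$V_\theta[\log X]=(\log\theta)^2-2\log\theta+2-(\log\theta-1)^2=1$$
$$V_\theta[\log GM]=\frac{1}{n^2}\sum_{i=1}^{n}V_\theta[\log X_i]=\frac{1}{n}\to0\quad\text{as } n\to\infty,\ \forall\,\theta>0.$$
Also $E_\theta[\log GM]=\log\theta-1$ for every n, so by Theorem 2.1 $\log_e GM$ is a consistent estimator of $\log\theta-1=\log(\theta e^{-1})$. Since the exponential function is continuous, GM is a consistent estimator of $\theta e^{-1}$ (Theorem 2.2).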
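The following Monte Carlo sketch illustrates Example 2.4: the geometric mean of n draws from U(0, θ) approaches θe^{−1} as n grows. The value θ = 5, the sample sizes and the seed are hypothetical. The computation works on the log scale, mirroring log_e GM = (1/n)Σ log X_i.

```python
import math
import random

def geometric_mean_uniform(n, theta, seed=0):
    """Geometric mean of n iid draws from U(0, theta), computed on the log scale."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so the logarithm is always defined
    log_sum = sum(math.log(theta * (1 - rng.random())) for _ in range(n))
    return math.exp(log_sum / n)


theta = 5.0                     # hypothetical parameter
target = theta / math.e         # theta * e^{-1}
for n in (10, 1_000, 100_000):
    print(n, geometric_mean_uniform(n, theta), target)
```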
Example 2.5 Let $X_1, X_2, \cdots, X_n$ be an iid random sample drawn from a population with $E_\theta[X_i]=\theta$ and $V_\theta[X_i]=\sigma^2$, ∀ i = 1, 2, · · · , n. Prove that $\frac{2}{n(n+1)}\sum_{i=1}^{n}iX_i$ is a consistent estimator of θ.
$$\begin{aligned}
E_\theta\left[\sum_{i=1}^{n}iX_i\right] &= E_\theta[X_1+2X_2+\cdots+nX_n] = \theta+2\theta+\cdots+n\theta = \theta[1+2+\cdots+n] = \frac{n(n+1)}{2}\,\theta\\
E_\theta\left[\frac{2}{n(n+1)}\sum_{i=1}^{n}iX_i\right] &= \theta,\quad\forall\,\theta\in\Omega\\
V_\theta\left[\sum_{i=1}^{n}iX_i\right] &= \sum_{i=1}^{n}i^2V_\theta[X_i] = \sigma^2\sum_{i=1}^{n}i^2 = \sigma^2\,\frac{n(n+1)(2n+1)}{6}\\
V_\theta\left[\frac{2}{n(n+1)}\sum_{i=1}^{n}iX_i\right] &= \frac{2(2n+1)}{3n(n+1)}\,\sigma^2\to0\quad\text{as } n\to\infty
\end{aligned}$$
Thus, by Theorem 2.1, $\frac{2}{n(n+1)}\sum_{i=1}^{n}iX_i$ is a consistent estimator of θ.
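The mean and variance in Example 2.5 can be verified with exact rational arithmetic. The sketch below uses Python's `fractions.Fraction` with hypothetical values θ = 3 and σ² = 4. It compares the variance with 2(2n + 1)σ²/(3n(n + 1)).

```python
from fractions import Fraction

def weighted_estimator_moments(n, theta, sigma2):
    """Exact mean and variance of (2/(n(n+1))) * sum_{i=1}^n i X_i, X_i iid (theta, sigma2)."""
    c = Fraction(2, n * (n + 1))
    mean = c * sum(i * theta for i in range(1, n + 1))
    var = c**2 * sum(i * i * sigma2 for i in range(1, n + 1))
    return mean, var


theta, sigma2 = Fraction(3), Fraction(4)          # hypothetical values
for n in (1, 5, 50, 500):
    mean, var = weighted_estimator_moments(n, theta, sigma2)
    closed_form = Fraction(2 * (2 * n + 1), 3 * n * (n + 1)) * sigma2
    print(n, mean, var, var == closed_form, float(var))
```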

Consistent estimator is not unique


Example 2.6 Let $T=\max_{1\le i\le n}\{X_i\}$ be the nth order statistic of a random sample of size n drawn from a population with a uniform distribution on the interval (0, θ). The pdf of T is
$$p_\theta(t)=\begin{cases}\dfrac{nt^{n-1}}{\theta^n} & 0<t<\theta,\ \theta>0\\[4pt] 0 & \text{otherwise}\end{cases}$$
$$E_\theta[T]=\frac{n}{\theta^n}\int_0^\theta t^n\,dt=\frac{n}{n+1}\,\theta,\qquad E_\theta[T^2]=\frac{n\theta^2}{n+2},\qquad V_\theta[T]=\frac{n\theta^2}{(n+2)(n+1)^2}$$
Hence $E_\theta[T]\to\theta$ and $V_\theta[T]\to0$ as n → ∞, so T is a consistent estimator of θ. Also
$$E_\theta\left[\frac{n+1}{n}T\right]=\theta\quad\text{and}\quad V_\theta\left[\frac{n+1}{n}T\right]=\frac{\theta^2}{n(n+2)}\to0\quad\text{as } n\to\infty,$$
i.e., $\frac{n+1}{n}T$ is also a consistent estimator of θ. The statistics T and $\frac{n+1}{n}T$ are two consistent estimators of the same parameter θ. Thus a consistent estimator is not unique.
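The moments in Example 2.6 follow from E_θ[T^k] = nθ^k/(n + k), obtained by integrating t^k against the pdf of T. The sketch below uses exact fractions with a hypothetical value θ = 2. It checks V_θ[T], the unbiasedness of (n + 1)T/n, and the variance θ²/(n(n + 2)).

```python
from fractions import Fraction

def raw_moment_of_max(k, n, theta):
    """E[T^k] for T = max of n iid U(0, theta): integral of t^k * n t^(n-1) / theta^n."""
    return Fraction(n, n + k) * theta**k


theta = Fraction(2)                                # hypothetical value of theta
for n in (1, 10, 100):
    mean = raw_moment_of_max(1, n, theta)
    var = raw_moment_of_max(2, n, theta) - mean**2
    c = Fraction(n + 1, n)                         # bias-correcting factor (n+1)/n
    print(n, float(mean), float(var), c * mean == theta, float(c**2 * var))
```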


2.7 Invariance Property of Consistent Estimator


If T = t(X) is a consistent estimator of θ, then $a_nT$, $T+c_n$ and $a_nT+c_n$ are also consistent estimators of θ, where $a_n=1+\frac{k}{n}$, $k\in\mathbb{R}$, so that $a_n\to1$, and $c_n\to0$ as n → ∞ for every fixed θ ∈ Ω. In general, we have Theorem 2.2.
Theorem 2.2 If $T_n=t_n(X)$ is a consistent estimator of τ(θ) and ψ(τ(θ)) is a continuous function of τ(θ), then $\psi(T_n)$ is a consistent estimator of ψ(τ(θ)).
Proof We are given that $T_n=t_n(X)$ is a consistent estimator of τ(θ), i.e., $T_n\xrightarrow{P}\tau(\theta)$ as n → ∞. Therefore, for given ε > 0 and η > 0, there exists a positive integer N(ε, η) such that
$$P\{|T_n-\tau(\theta)|<\epsilon\}>1-\eta\quad\forall\,n\ge N.$$
Also ψ(·) is a continuous function, i.e., for every $\epsilon_1>0$ there exists an ε > 0 such that
$$|\psi(T_n)-\psi(\tau(\theta))|<\epsilon_1\quad\text{whenever}\quad|T_n-\tau(\theta)|<\epsilon,$$
i.e., $|T_n-\tau(\theta)|<\epsilon\Rightarrow|\psi(T_n)-\psi(\tau(\theta))|<\epsilon_1$.
For any two events A and B, if A ⇒ B, then A ⊆ B, and therefore P(A) ≤ P(B), i.e., P(B) ≥ P(A). Let $A=\{|T_n-\tau(\theta)|<\epsilon\}$ and $B=\{|\psi(T_n)-\psi(\tau(\theta))|<\epsilon_1\}$. Then
$$P\{|\psi(T_n)-\psi(\tau(\theta))|<\epsilon_1\}\ge P\{|T_n-\tau(\theta)|<\epsilon\}>1-\eta\quad\forall\,n\ge N,$$
so that $\psi(T_n)\xrightarrow{P}\psi(\tau(\theta))$ as n → ∞, i.e., $\psi(T_n)$ is a consistent estimator of ψ(τ(θ)).
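Example 2.7 Suppose T = t(X) is a statistic with pdf $p_\theta(t)$, θ > 0, θ ∈ Ω. Prove that $T^2=t^2(X)$ is a consistent estimator of θ² if T = t(X) is a consistent estimator of θ.
We are given that T = t(X) is a consistent estimator of θ. By the definition of a consistent estimator, $P_\theta\{|T-\theta|<\epsilon\}\to1$ as n → ∞ for θ > 0, ∀ θ ∈ Ω. Taking 0 < ε < θ, consider
$$\begin{aligned}
P_\theta\{|T-\theta|<\epsilon\} &= P_\theta\{\theta-\epsilon<T<\theta+\epsilon\} \le P_\theta\{(\theta-\epsilon)^2<T^2<(\theta+\epsilon)^2\}\\
&= P_\theta\{-2\theta\epsilon<T^2-\theta^2-\epsilon^2<2\theta\epsilon\}\\
&= P_\theta\{-\epsilon'<T'-\theta^2<\epsilon'\} = P_\theta\{|T'-\theta^2|<\epsilon'\},
\end{aligned}$$
where $\epsilon'=2\theta\epsilon$ and $T'=T^2-\epsilon^2$. Since the left side tends to 1, $P_\theta\{|T'-\theta^2|<\epsilon'\}\to1$ as n → ∞. Moreover $|T^2-\theta^2|\le|T'-\theta^2|+\epsilon^2$. Hence $P_\theta\{|T^2-\theta^2|<2\theta\epsilon+\epsilon^2\}\to1$ as n → ∞, and 2θε + ε² can be made as small as desired by choosing ε small. Thus T² is a consistent estimator of θ². This also follows directly from Theorem 2.2 with ψ(x) = x², which is continuous.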


2.8 Unbiased Estimator


For any statistic g(T), if its mathematical expectation is equal to a parameter τ(θ), then g(T) is called an unbiased estimator of the parameter τ(θ),
$$\text{i.e., } E_\theta[g(T)]=\tau(\theta),\quad\forall\,\theta\in\Omega.$$
Otherwise, the statistic g(T) is said to be a biased estimator of τ(θ). An unbiased estimator is also called a zero-bias estimator. A statistic g(T) is said to be an asymptotically unbiased estimator if $E_\theta[g(T)]\to\tau(\theta)$ as n → ∞, ∀ θ ∈ Ω.
Example 2.8 A random variable X has the pdf

 2θx if 0 < x < 1
pθ (x) = (1 − θ) if 1 ≤ x < 2, 0 < θ < 1
0 otherwise

Show that g(X) , a measurable function of X is an unbiased estimator of θ if and


R 1 R2
only if 0 xg(x)dx = 21 and 1 g(x)dx = 0.
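Assume g(X) is an unbiased estimator of θ, i.e.,
$$E_\theta[g(X)] = \theta$$
$$\int_0^1 g(x)\,2\theta x\,dx + \int_1^2 g(x)(1-\theta)\,dx = \theta$$
$$\theta\left[\int_0^1 2xg(x)\,dx - \int_1^2 g(x)\,dx\right] + \int_1^2 g(x)\,dx = \theta \quad \forall\ 0<\theta<1$$
Equating the coefficient of θ and the constant term on both sides,
$$\int_0^1 2xg(x)\,dx - \int_1^2 g(x)\,dx = 1 \quad\text{and}\quad \int_1^2 g(x)\,dx = 0,$$
i.e., $\int_0^1 xg(x)\,dx = \frac12$ and $\int_1^2 g(x)\,dx = 0$.
Conversely, if $\int_0^1 xg(x)\,dx = \frac12$ and $\int_1^2 g(x)\,dx = 0$, then g(X) is an unbiased estimator of θ:
$$E_\theta[g(X)] = \int_0^1 2\theta x g(x)\,dx + \int_1^2 (1-\theta)g(x)\,dx = 2\theta\int_0^1 xg(x)\,dx + (1-\theta)\int_1^2 g(x)\,dx = 2\theta\cdot\frac12 + (1-\theta)\times 0 = \theta$$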


Thus g(X) is an unbiased estimator of θ .
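As a numerical illustration of Example 2.8, take the hypothetical function g(x) = 1 on (0, 1) and g(x) = x − 3/2 on [1, 2). It satisfies $\int_0^1 xg(x)\,dx = \frac12$ and $\int_1^2 g(x)\,dx = 0$, so $E_\theta[g(X)]$ should equal θ for every θ. The sketch approximates the expectation with a midpoint rule (an assumed numerical method; it is exact up to rounding here because each piece of the integrand is linear on cells that do not straddle x = 1).

```python
def g(x):
    """Hypothetical estimator: 1 on (0, 1) and x - 3/2 on [1, 2)."""
    return 1.0 if x < 1 else x - 1.5


def density(x, theta):
    """pdf of Example 2.8."""
    if 0 < x < 1:
        return 2 * theta * x
    if 1 <= x < 2:
        return 1 - theta
    return 0.0


def expectation(func, theta, m=20000):
    """Midpoint-rule approximation of E_theta[func(X)] over (0, 2)."""
    h = 2.0 / m
    total = 0.0
    for k in range(m):
        x = (k + 0.5) * h
        total += func(x) * density(x, theta)
    return total * h


for theta in (0.1, 0.5, 0.9):
    print(theta, round(expectation(g, theta), 6))
```

By contrast, the constant function 1 satisfies the first condition but not the second, and its expectation is 1 rather than θ.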


Example 2.9 Let T denote the number of successes in n independent and identical trials of an experiment with probability of success θ. Obtain unbiased estimators of $\theta^2$ and $\theta(1-\theta)$, 0 < θ < 1.
Let $X_i \sim b(1,\theta)$, i = 1, 2, ..., n; then $T = \sum_{i=1}^n X_i \sim b(n,\theta)$. If g(T) is an unbiased estimator of $\tau(\theta) = \theta(1-\theta)$, then $E_\theta[g(T)] = \theta(1-\theta)$, i.e.,
$$\sum_{t=0}^n g(t)c^n_t\,\theta^t(1-\theta)^{n-t} = \theta(1-\theta)$$
$$\sum_{t=0}^n g(t)c^n_t\left(\frac{\theta}{1-\theta}\right)^t = \theta(1-\theta)^{1-n}$$
Consider $\rho = \frac{\theta}{1-\theta} \Rightarrow \theta = \frac{\rho}{1+\rho}$. Then
$$\sum_{t=0}^n g(t)c^n_t\,\rho^t = \frac{\rho}{1+\rho}\left(\frac{1}{1+\rho}\right)^{1-n} = \rho(1+\rho)^{n-2} = \rho\left[1 + c^{n-2}_1\rho + c^{n-2}_2\rho^2 + \cdots + \rho^{n-2}\right]$$
Equating the coefficients of $\rho^t$ on both sides,
$$g(t)c^n_t = c^{n-2}_{t-1}$$
$$g(t) = \frac{(n-2)!}{(t-1)!\,(n-t-1)!}\cdot\frac{t!\,(n-t)!}{n!} = \frac{(n-2)!\,t(t-1)!\,(n-t)(n-t-1)!}{(t-1)!\,n(n-1)(n-2)!\,(n-t-1)!} = \frac{t(n-t)}{n(n-1)}, \quad n = 2, 3, \ldots$$

Thus the unbiased estimator of θ(1 − θ) is
$$\frac{T(n-T)}{n(n-1)}, \quad n = 2, 3, \ldots$$


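Let the unbiased estimator $g^*(T)$ of $\theta^2$ be given by
$$E_\theta[g^*(T)] = \theta^2$$
$$\sum_{t=0}^n g^*(t)c^n_t\left(\frac{\theta}{1-\theta}\right)^t(1-\theta)^n = \theta^2$$
$$\sum_{t=0}^n g^*(t)c^n_t\,\rho^t = \rho^2(1+\rho)^{n-2} = \rho^2\left[1 + c^{n-2}_1\rho + \cdots + c^{n-2}_t\rho^t + \cdots + \rho^{n-2}\right]$$
Equating the coefficients of $\rho^t$, $g^*(t)c^n_t = c^{n-2}_{t-2}$
$$\Rightarrow g^*(t) = \frac{(n-2)!\,t!\,(n-t)!}{(t-2)!\,(n-t)!\,n!} = \frac{(n-2)!\,t(t-1)(t-2)!}{(t-2)!\,n(n-1)(n-2)!} = \frac{t(t-1)}{n(n-1)}, \quad n = 2, 3, \ldots$$

Thus the unbiased estimator of $\theta^2$ is
$$g^*(T) = \frac{T(T-1)}{n(n-1)}, \quad n = 2, 3, \ldots$$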
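The two estimators of Example 2.9 can be verified exactly with rational arithmetic: the sketch sums $g(t)c^n_t\theta^t(1-\theta)^{n-t}$ over t = 0, ..., n and compares the result with θ(1 − θ) and θ². The helper names below are hypothetical; only the standard `fractions` and `math.comb` modules are used.

```python
from fractions import Fraction
from math import comb


def binomial_expectation(g, n, theta):
    """Exact E_theta[g(T)] for T ~ b(n, theta) using rational arithmetic."""
    return sum(g(t) * comb(n, t) * theta**t * (1 - theta) ** (n - t)
               for t in range(n + 1))


def est_theta_one_minus_theta(t, n):
    """T(n - T) / (n(n - 1)), unbiased for theta(1 - theta)."""
    return Fraction(t * (n - t), n * (n - 1))


def est_theta_squared(t, n):
    """T(T - 1) / (n(n - 1)), unbiased for theta^2."""
    return Fraction(t * (t - 1), n * (n - 1))


n, theta = 7, Fraction(3, 10)
print(binomial_expectation(lambda t: est_theta_one_minus_theta(t, n), n, theta))  # 21/100
print(binomial_expectation(lambda t: est_theta_squared(t, n), n, theta))          # 9/100
```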

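Example 2.10 Obtain an unbiased estimator of $\frac1\theta$, given a sample observation from a geometric population with pmf
$$p_\theta(x) = \begin{cases} \theta(1-\theta)^{x-1} & x = 1, 2, 3, \ldots, \quad 0 < \theta < 1 \\ 0 & \text{otherwise} \end{cases}$$
Let g(X) be unbiased for $\frac1\theta$:
$$E_\theta[g(X)] = \frac1\theta$$
$$\sum_{x=1}^\infty g(x)\theta(1-\theta)^{x-1} = \frac1\theta$$
$$\sum_{x=1}^\infty g(x)(1-\theta)^x = \frac{1-\theta}{\theta^2}$$
Take 1 − θ = ρ ⇒ θ = 1 − ρ:
$$\sum_{x=1}^\infty g(x)\rho^x = \rho(1-\rho)^{-2} = \rho(1 + 2\rho + 3\rho^2 + \cdots + x\rho^{x-1} + \cdots)$$
⇒ g(x) = x ∀ x = 1, 2, 3, ...
Thus g(X) = X is the unbiased estimator of $\frac1\theta$.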
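A quick numerical check of Example 2.10 (a sketch, assuming that truncating the series after 20 000 terms is harmless for the θ values used, which holds because the omitted tail is of order $(1-\theta)^{20000}$): the truncated sum $\sum x\,\theta(1-\theta)^{x-1}$ should reproduce 1/θ.

```python
def expected_value(g, theta, terms=20000):
    """E_theta[g(X)] for the geometric pmf theta (1 - theta)^(x - 1), x >= 1,
    truncated after `terms` terms (the tail is negligible for the theta used)."""
    return sum(g(x) * theta * (1 - theta) ** (x - 1) for x in range(1, terms + 1))


for theta in (0.05, 0.3, 0.9):
    print(theta, expected_value(lambda x: x, theta), 1 / theta)
```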


Unbiased estimator does not exist

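Example 2.11 Assume X ∼ b(1, θ), 0 < θ < 1. Given a single observation x of X from a Bernoulli population, there is no unbiased estimator of $\theta^2$.
$$p_\theta(x) = \begin{cases} \theta^x(1-\theta)^{1-x} & x = 0, 1 \text{ and } 0 < \theta < 1 \\ 0 & \text{otherwise} \end{cases}$$
Suppose there is an unbiased estimator g(X) of $\theta^2$. That is,
$$E_\theta[g(X)] = \theta^2$$
$$\sum_{x=0}^1 g(x)\theta^x(1-\theta)^{1-x} = \theta^2$$
$$g(0)(1-\theta) + g(1)\theta = \theta^2$$
$$[g(1)-g(0)]\theta + g(0) = \theta^2 \quad \forall\ 0<\theta<1.$$
Equating the constant terms and the coefficients of θ gives g(0) = 0 and g(1) = 0, i.e., g(x) = 0 for x = 0, 1. Then $E_\theta[g(X)] = 0$ for every θ, whereas $\theta^2$ lies strictly between 0 and 1 for 0 < θ < 1 (equivalently, the coefficient of $\theta^2$ is 0 on the left and 1 on the right). ∴ An unbiased estimator of $\theta^2$ does not exist.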
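Example 2.12 If X ∼ b(n, θ), show that there exists no unbiased estimator of the parameter $\frac1\theta$.
Consider
$$E_\theta[g(X)] = \frac1\theta$$
$$\sum_{x=0}^n g(x)\frac{n!}{x!(n-x)!}\left(\frac{\theta}{1-\theta}\right)^x(1-\theta)^n = \frac1\theta$$
$$\sum_{x=0}^n g(x)\frac{n!}{x!(n-x)!}\rho^x = \frac{(1+\rho)^{n+1}}{\rho}, \quad \text{where } \rho = \frac{\theta}{1-\theta}.$$
As θ → 0 (i.e., ρ → 0), the left-hand side $\sum_{x=0}^n g(x)\frac{n!}{x!(n-x)!}\rho^x \to g(0)$, a finite value, while the right-hand side $\frac{(1+\rho)^{n+1}}{\rho} \to \infty$.
Thus no unbiased estimator of the parameter $\frac1\theta$ exists.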

Unbiased estimator is unique


Example 2.13 A random sample X is drawn from a Bernoulli population b(1, θ), θ ∈ {1/4, 1/2}. Then there exists a unique unbiased estimator of $\theta^2$.

Let $E_\theta[g(X)] = \theta^2$:
$$\sum_{x=0}^1 g(x)\theta^x(1-\theta)^{1-x} = \theta^2$$

$$\text{When } \theta = \frac14 \Rightarrow 3g(0) + g(1) = \frac14 \qquad (2.1)$$


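$$\text{When } \theta = \frac12 \Rightarrow g(0) + g(1) = \frac12 \qquad (2.2)$$
Solving equations (2.1) and (2.2) for g(0) and g(1), one gets $g(0) = -\frac18$ and $g(1) = \frac58$, i.e.,
$$g(x) = \begin{cases} -\frac18 & \text{for } x = 0 \\ \frac58 & \text{for } x = 1 \end{cases}$$
Since the two equations have exactly one solution, the unbiased estimator of $\theta^2$, $g(X) = \frac{6X-1}{8}$, is unique.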
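The uniqueness argument of Example 2.13 amounts to a 2 × 2 linear system with non-zero determinant. A minimal sketch solving it by Cramer's rule in exact arithmetic (the helper name `solve_two_point` is hypothetical):

```python
from fractions import Fraction


def solve_two_point(thetas):
    """Solve g(0)(1 - th) + g(1) th = th^2 at two parameter values by Cramer's rule."""
    (a1, b1, c1), (a2, b2, c2) = [(1 - th, th, th * th) for th in thetas]
    det = a1 * b2 - a2 * b1          # non-zero, so the solution is unique
    g0 = (c1 * b2 - c2 * b1) / det
    g1 = (a1 * c2 - a2 * c1) / det
    return g0, g1


g0, g1 = solve_two_point([Fraction(1, 4), Fraction(1, 2)])
print(g0, g1)  # -1/8 5/8
```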
Unbiased estimator is not unique

Example 2.14 Let X1, X2, ..., Xn be an iid random sample drawn from a population with Poisson distribution P(θ). Since the mean and the variance of P(θ) both equal θ, $g_1(X) = \bar X$ and $g_2(X) = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2$ are two unbiased estimators of θ. Consider the statistic $g(X) = \alpha g_1(X) + (1-\alpha)g_2(X)$, α ∈ ℝ, θ > 0. Then $E_\theta[g(X)] = \theta$ ∀ θ ∈ Ω and for every α ∈ ℝ, so there are infinitely many unbiased estimators of θ. Thus an unbiased estimator is not unique.
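Example 2.15 Show that the mean $\bar X$ of a random sample of size n drawn from a population with probability density function
$$p_\theta(x) = \begin{cases} \frac1\theta e^{-x/\theta} & 0 < x < \infty, \ \theta > 0 \\ 0 & \text{otherwise} \end{cases}$$
is an unbiased estimator of θ and has variance $\frac{\theta^2}{n}$.
Let $T = \sum_{i=1}^n X_i \sim G(n, \theta)$. The pdf of T is
$$p_\theta(t) = \begin{cases} \frac{1}{\theta^n\Gamma n}e^{-t/\theta}t^{n-1} & 0 < t < \infty, \ \theta > 0 \\ 0 & \text{otherwise} \end{cases}$$
$$E_\theta[T] = \int_0^\infty \frac{1}{\theta^n\Gamma n}e^{-\frac1\theta t}t^{n+1-1}\,dt = n\theta$$
$$E_\theta\left[\sum_{i=1}^n X_i\right] = n\theta \ \forall\ \theta>0 \;\Rightarrow\; E_\theta[n\bar X] = n\theta \;\Rightarrow\; E_\theta[\bar X] = \theta \ \forall\ \theta > 0$$
Similarly $E_\theta[T^2] = n(n+1)\theta^2$, so $V_\theta[T] = n\theta^2$ ∀ θ > 0.
$$\therefore V_\theta[\bar X] = V_\theta\left[\frac{\sum_{i=1}^n X_i}{n}\right] = \frac{1}{n^2}V_\theta[T] = \frac{1}{n^2}n\theta^2 = \frac{\theta^2}{n}$$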


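Example 2.16 Let X1, X2, ..., Xn be a random sample drawn from a normal population with mean zero and variance σ², 0 < σ² < ∞. Show that $\frac{\sum_{i=1}^n X_i^2}{n}$ is an unbiased estimator of σ² and has variance $\frac{2\sigma^4}{n}$.
Define $ns^2 = \sum_{i=1}^n X_i^2$; then $Y = \frac{ns^2}{\sigma^2}$ has a χ² distribution with n degrees of freedom, i.e., $Y \sim G(\frac n2, \frac12)$:
$$p(y) = \begin{cases} \frac{1}{2^{n/2}\Gamma\frac n2}e^{-\frac12 y}y^{\frac n2 - 1} & 0 < y < \infty \\ 0 & \text{otherwise} \end{cases}$$
$$E[Y] = \int_0^\infty \frac{1}{2^{n/2}\Gamma\frac n2}e^{-\frac12 y}y^{\frac n2+1-1}\,dy = \frac{1}{2^{n/2}\Gamma\frac n2}\cdot\frac{\Gamma(\frac n2+1)}{(\frac12)^{\frac n2+1}} = n$$
Similarly $E[Y^2] = n^2 + 2n$, so $V[Y] = 2n$.
But $Y = \frac{ns^2}{\sigma^2}$:
$$E_{\sigma^2}\left[\frac{ns^2}{\sigma^2}\right] = n \Rightarrow E_{\sigma^2}[s^2] = \sigma^2$$
Thus $\frac{\sum X_i^2}{n}$ is an unbiased estimator of σ². Also
$$V_{\sigma^2}\left[\frac{ns^2}{\sigma^2}\right] = 2n \Rightarrow \frac{n^2}{\sigma^4}V_{\sigma^2}[s^2] = 2n \Rightarrow V_{\sigma^2}[s^2] = \frac{2\sigma^4}{n}$$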
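Example 2.17 Let Y1 < Y2 < Y3 be the order statistics of a random sample of size 3 drawn from a uniform population with pdf
$$p_\theta(x) = \begin{cases} \frac1\theta & 0 < x < \theta \\ 0 & \text{otherwise} \end{cases}$$
Show that 4Y1 and 2Y2 are unbiased estimators of θ. Also find the variances of these estimators.
The pdf of Y1 is
$$p_\theta(y_1) = \begin{cases} \frac{3!}{1!\,2!}\,\frac1\theta\left[\int_{y_1}^\theta \frac1\theta\,dx\right]^2 & 0 < y_1 < \theta \\ 0 & \text{otherwise} \end{cases}$$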


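$$p_\theta(y_1) = \begin{cases} \frac3\theta\left[1 - \frac{y_1}{\theta}\right]^2 & 0 < y_1 < \theta \\ 0 & \text{otherwise} \end{cases}$$
$$E_\theta[Y_1] = \int_0^\theta y_1\,\frac3\theta\left(1-\frac{y_1}{\theta}\right)^2 dy_1 = \frac3\theta\int_0^1 \theta t(1-t)^2\,\theta\,dt \quad \text{where } \frac{y_1}{\theta} = t$$
$$= 3\theta\int_0^1 t^{2-1}(1-t)^{3-1}\,dt = 3\theta\,\frac{\Gamma 2\,\Gamma 3}{\Gamma 5} = \frac\theta4 \quad \forall\ \theta > 0$$
Hence 4Y1 is an unbiased estimator of θ. Similarly $E_\theta[Y_1^2] = \frac{\theta^2}{10}$ and $V_\theta[Y_1] = \frac{3\theta^2}{80}$.
$$\therefore V_\theta[4Y_1] = \frac{3\theta^2}{5}$$
The pdf of Y2 is
$$p_\theta(y_2) = \frac{3!}{1!\,1!\,1!}\left(\int_0^{y_2}\frac1\theta\,dx\right)\frac1\theta\left(\int_{y_2}^\theta \frac1\theta\,dx\right)$$
$$p_\theta(y_2) = \begin{cases} \frac{6}{\theta^2}y_2\left[1 - \frac{y_2}{\theta}\right] & 0 < y_2 < \theta \\ 0 & \text{otherwise} \end{cases}$$
$\therefore E_\theta[Y_2] = \frac\theta2$ ⇒ 2Y2 is an unbiased estimator of θ, and $E_\theta[Y_2^2] = \frac{3\theta^2}{10}$ and $V_\theta[Y_2] = \frac{\theta^2}{20}$
$$\Rightarrow V_\theta[2Y_2] = \frac{\theta^2}{5}$$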
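The moments in Example 2.17 can be cross-checked with a standard fact about uniform order statistics: $Y_k/\theta$, the k-th smallest of n independent U(0, θ) draws scaled by θ, has a Beta(k, n − k + 1) distribution, so its mean is k/(n + 1) and its second moment is k(k + 1)/((n + 1)(n + 2)). The sketch below uses that fact (it is not the integration route of the text) and a hypothetical helper name.

```python
from fractions import Fraction


def scaled_order_stat_moments(k, n):
    """Mean and variance of Y_k / theta, where Y_k is the k-th smallest of n
    iid U(0, theta) draws; Y_k / theta has a Beta(k, n - k + 1) distribution."""
    mean = Fraction(k, n + 1)
    second = Fraction(k * (k + 1), (n + 1) * (n + 2))
    return mean, second - mean**2


for k, c in [(1, 4), (2, 2)]:
    m, v = scaled_order_stat_moments(k, 3)
    print(f"{c}*Y{k}: mean = {c * m} theta, variance = {c * c * v} theta^2")
```

Both estimators have mean θ, and 2Y2 has the smaller variance (θ²/5 against 3θ²/5).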
Example 2.18 Let Y1 and Y2 be two independent unbiased estimators of θ. If the variance of Y1 is twice the variance of Y2, find the constants k1 and k2 so that k1Y1 + k2Y2 is an unbiased estimator of θ with the smallest possible variance among such linear combinations.
Given $E_\theta[Y_1] = \theta$ ∀ θ, $E_\theta[Y_2] = \theta$ ∀ θ, $V_\theta[Y_1] = 2\sigma^2$ and


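$V_\theta[Y_2] = \sigma^2$. Also $E_\theta[k_1Y_1 + k_2Y_2] = \theta$ ∀ θ:
$$k_1E_\theta[Y_1] + k_2E_\theta[Y_2] = \theta \Rightarrow k_1 + k_2 = 1, \text{ i.e., } k_2 = 1 - k_1$$
Consider
$$\phi = V_\theta[k_1Y_1 + k_2Y_2] = k_1^2V_\theta[Y_1] + k_2^2V_\theta[Y_2] = 2k_1^2\sigma^2 + (1-k_1)^2\sigma^2 = 3k_1^2\sigma^2 - 2k_1\sigma^2 + \sigma^2$$
Differentiating twice with respect to k1,
$$\frac{d\phi}{dk_1} = 6k_1\sigma^2 - 2\sigma^2, \qquad \frac{d^2\phi}{dk_1^2} = 6\sigma^2$$
For a minimum, $\frac{d\phi}{dk_1} = 0$ and $\frac{d^2\phi}{dk_1^2} > 0$
$$\Rightarrow 6k_1\sigma^2 - 2\sigma^2 = 0, \text{ i.e., } k_1 = \frac13 \text{ and } k_2 = \frac23$$
Thus $\frac13Y_1 + \frac23Y_2$ has the minimum variance.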
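The same derivative argument with general variances $v_1 = V_\theta[Y_1]$ and $v_2 = V_\theta[Y_2]$ gives $k_1 = v_2/(v_1 + v_2)$, which reduces to 1/3 when $v_1 = 2\sigma^2$ and $v_2 = \sigma^2$. The sketch below (helper names hypothetical, σ² taken as the unit of variance) computes the weights and the minimum variance 2σ²/3 exactly.

```python
from fractions import Fraction


def best_weights(v1, v2):
    """k1, k2 with k1 + k2 = 1 minimising Var(k1*Y1 + k2*Y2) for independent
    unbiased Y1, Y2 with variances v1, v2 (from setting d(phi)/dk1 = 0)."""
    k1 = v2 / (v1 + v2)
    return k1, 1 - k1


def combined_variance(k1, v1, v2):
    """phi = k1^2 v1 + (1 - k1)^2 v2."""
    return k1**2 * v1 + (1 - k1) ** 2 * v2


sigma2 = Fraction(1)          # sigma^2 taken as the unit of variance
v1, v2 = 2 * sigma2, sigma2   # V[Y1] = 2 sigma^2, V[Y2] = sigma^2
k1, k2 = best_weights(v1, v2)
print(k1, k2, combined_variance(k1, v1, v2))  # 1/3 2/3 2/3
```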
Consistent estimator need not be unbiased
Example 2.19 Let X1, X2, ..., Xn be a sample of size n drawn from a normal population with mean θ and variance σ². Define $s^2 = \frac1n\sum_{i=1}^n(X_i - \bar X)^2$; then $Y = \frac{ns^2}{\sigma^2}$ has a χ² distribution with (n − 1) degrees of freedom, i.e., $Y \sim G(\frac{n-1}{2}, \frac12)$. It has the pdf
$$p(y) = \begin{cases} \frac{1}{2^{\frac{n-1}{2}}\Gamma\frac{n-1}{2}}e^{-\frac12 y}y^{\frac{n-1}{2}-1} & 0 < y < \infty \\ 0 & \text{otherwise} \end{cases}$$


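$$E[Y^r] = \int_0^\infty \frac{1}{2^{\frac{n-1}{2}}\Gamma\frac{n-1}{2}}e^{-\frac12 y}y^{\frac{n-1}{2}+r-1}\,dy = \frac{1}{2^{\frac{n-1}{2}}\Gamma\frac{n-1}{2}}\cdot\frac{\Gamma(\frac{n-1}{2}+r)}{(\frac12)^{\frac{n-1}{2}+r}} = \frac{2^r}{\Gamma\frac{n-1}{2}}\,\Gamma\left(\frac{n-1}{2}+r\right)$$
When r = 1,
$$E[Y] = \frac{2}{\Gamma\frac{n-1}{2}}\cdot\frac{n-1}{2}\Gamma\frac{n-1}{2} = n - 1$$
$$\therefore E_{\sigma^2}\left[\frac{ns^2}{\sigma^2}\right] = n - 1 \Rightarrow E_{\sigma^2}[s^2] = \frac{n-1}{n}\sigma^2$$
When r = 2, $E[Y^2] = (n-1)(n+1)$, so $V[Y] = 2(n-1)$ and
$$V_{\sigma^2}[s^2] = \frac{2(n-1)}{n^2}\sigma^4$$
Thus $E_{\sigma^2}[s^2] \to \sigma^2$ and $V_{\sigma^2}[s^2] \to 0$ as n → ∞.
∴ $\frac1n\sum_{i=1}^n(X_i - \bar X)^2$ is a consistent estimator of σ².
But $E_{\sigma^2}[s^2] \ne \sigma^2$. ∴ $\frac1n\sum_{i=1}^n(X_i - \bar X)^2$ is not an unbiased estimator of σ².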

Example 2.20 Illustrate with an example that an estimator can be both consistent and unbiased.
Let X1, X2, ..., Xn be a random sample of size n drawn from a normal population with mean θ and variance σ². Define $s^2 = \frac1n\sum_{i=1}^n(X_i - \bar X)^2$ and $S^2 = \frac{1}{n-1}\sum_{i=1}^n(X_i - \bar X)^2$; then $Y = \frac{ns^2}{\sigma^2}$ has a χ² distribution with (n − 1) degrees of freedom and $Y \sim G(\frac{n-1}{2}, \frac12)$, with $E_{\sigma^2}[s^2] = \frac{n-1}{n}\sigma^2$ and $V_{\sigma^2}[s^2] = \frac{2(n-1)}{n^2}\sigma^4$.
$$(n-1)S^2 = ns^2 \Rightarrow S^2 = \frac{n}{n-1}s^2$$
$$E_{\sigma^2}[S^2] = \frac{n}{n-1}E_{\sigma^2}[s^2] = \frac{n}{n-1}\cdot\frac{n-1}{n}\sigma^2 = \sigma^2$$
$$V_{\sigma^2}[S^2] = \frac{n^2}{(n-1)^2}V_{\sigma^2}[s^2] = \frac{n^2}{(n-1)^2}\cdot\frac{2(n-1)}{n^2}\sigma^4 = \frac{2\sigma^4}{n-1} \to 0 \text{ as } n \to \infty$$
Thus $S^2 = \frac{1}{n-1}\sum_{i=1}^n(X_i - \bar X)^2$ is both a consistent and an unbiased estimator of σ².
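Examples 2.19 and 2.20 can be illustrated by simulation. This is a sketch under stated assumptions: samples of size n = 5 from N(0, σ²) with σ = 2 (so σ² = 4), averaged over 20 000 replications with a fixed seed. The average of $s^2$ should sit near $\frac{n-1}{n}\sigma^2 = 3.2$ and the average of $S^2$ near 4. The function name is hypothetical.

```python
import random


def average_variance_estimates(n, sigma=2.0, reps=20000, seed=7):
    """Monte Carlo averages of s^2 (divisor n) and S^2 (divisor n - 1) for
    samples of size n from N(0, sigma^2); the mean 0 is an illustrative choice."""
    rng = random.Random(seed)
    total_s2 = total_S2 = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        ss = sum((x - xbar) ** 2 for x in xs)
        total_s2 += ss / n
        total_S2 += ss / (n - 1)
    return total_s2 / reps, total_S2 / reps


s2_avg, S2_avg = average_variance_estimates(n=5)
print(round(s2_avg, 3), round(S2_avg, 3))  # near 3.2 and 4.0 respectively
```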

Example 2.21 Give an example to show that an unbiased estimator need not be consistent.
Let X1, X2, ..., Xn be a random sample drawn from a normal population with mean θ and known variance σ²; then the estimator X1 (the first observation) of the


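sample is unbiased but not consistent, since $E_\theta[X_1] = \theta$ and $V_\theta[X_1] = \sigma^2$ ∀ θ ∈ Ω and
$$P_\theta\{|X_1 - \theta| < \epsilon\} = P_\theta\{-\epsilon < X_1 - \theta < \epsilon\} = P_\theta\{\theta - \epsilon < X_1 < \theta + \epsilon\} = \int_{\theta-\epsilon}^{\theta+\epsilon}\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{1}{2\sigma^2}(x_1-\theta)^2}\,dx_1 = 2\Phi\left(\frac\epsilon\sigma\right) - 1,$$
where Φ is the standard normal distribution function. This probability does not depend on n, so it $\not\to 1$ as n → ∞.
∴ X1 is an unbiased but not a consistent estimator of θ.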
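The contrast in Example 2.21 can be made concrete with the identity $2\Phi(z) - 1 = \mathrm{erf}(z/\sqrt2)$. The sketch below (σ = 1 and ε = 0.5 are assumed values; the helper name is hypothetical) shows that $P_\theta\{|X_1-\theta|<\epsilon\}$ stays at about 0.383 for every n, while the same probability for $\bar X \sim N(\theta, \sigma^2/n)$ tends to 1.

```python
import math


def prob_within(eps, sd):
    """P{|Z| < eps} for Z ~ N(0, sd^2), i.e. 2*Phi(eps/sd) - 1 = erf(eps / (sd*sqrt(2)))."""
    return math.erf(eps / (sd * math.sqrt(2)))


sigma, eps = 1.0, 0.5
for n in (1, 10, 100, 1000):
    first_obs = prob_within(eps, sigma)                   # X1 ~ N(theta, sigma^2): no n
    sample_mean = prob_within(eps, sigma / math.sqrt(n))  # Xbar ~ N(theta, sigma^2 / n)
    print(n, round(first_obs, 4), round(sample_mean, 4))
```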


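Example 2.22 Give an example of an estimator that is neither consistent nor unbiased.
Let Y1 < Y2 < Y3 be the order statistics of a random sample of size 3 drawn from a uniform population with pdf for given θ
$$p_\theta(x) = \begin{cases} \frac1\theta & 0 < x < \theta \\ 0 & \text{otherwise} \end{cases}$$
Then Y1 is neither a consistent nor an unbiased estimator of θ, since
$$E_\theta[Y_1] = \frac\theta4 \ne \theta \quad \forall\ \theta \in \Omega$$
and
$$P_\theta\left\{\left|Y_1 - \frac\theta4\right| < \epsilon\right\} = P_\theta\left\{\frac\theta4 - \epsilon < Y_1 < \frac\theta4 + \epsilon\right\} = \int_{\frac\theta4-\epsilon}^{\frac\theta4+\epsilon}\frac3\theta\left[1 - \frac{y_1}{\theta}\right]^2 dy_1 \not\to 1 \text{ as } n \to \infty$$

Thus Y1, the first order statistic, is neither a consistent nor an unbiased estimator of θ.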

2.9 Sufficient Statistic


A sufficient statistic conveys as much information about the distribution of a random variable as is contained in the sample. It helps to identify a family of distributions only, and not the parameters of the distributions.
Definition 2.1 Let X1, X2, ..., Xn be a random sample of size n drawn from a population with pdf p(x | θ). Let T = t(X) be a statistic whose pdf is $p_\theta(t)$. For a continuous random variable X, T = t(X) is said to be a sufficient statistic iff
$$\frac{p_\theta(x_1, x_2, \ldots, x_n)}{p_\theta(t)}$$
is independent of θ for every given T = t. Similarly, for a discrete random variable X, T = t(X) is said to be a sufficient statistic iff
$$P_\theta\{X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n \mid T = t\}$$


is independent of θ for every given T = t .


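Example 2.23 Let X be a single observation from a population with pmf $p_\theta(x)$, 0 < θ < 1:
$$p_\theta(x) = \begin{cases} \frac{\theta^{|x|}(1-\theta)^{|x|}}{2} & x = -1, 1 \\ 1 - \theta(1-\theta) & x = 0 \\ 0 & \text{otherwise} \end{cases}$$
Show that |X| is sufficient.
Let Y = |X|. Then $P\{Y = 0\} = P\{|X| = 0\} = P\{X = 0\} = 1 - \theta(1-\theta)$ and
$P\{Y = 1\} = P\{|X| = 1\} = P\{X = 1 \text{ or } X = -1\} = P\{X = 1\} + P\{X = -1\} = \theta(1-\theta)$.
Consider
$$P\{X = 1 \mid Y = 1\} = \frac{P\{X = 1 \cap Y = 1\}}{P\{Y = 1\}} = \frac{P\{X = 1 \cap |X| = 1\}}{P\{Y = 1\}} = \frac{P\{X = 1\}}{P\{Y = 1\}} = \frac{\theta(1-\theta)/2}{\theta(1-\theta)} = \frac12,$$
which is independent of θ. Similarly $P\{X = -1 \mid Y = 1\} = \frac12$ and $P\{X = 0 \mid Y = 0\} = 1$.
Therefore Y = |X| is sufficient.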
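Example 2.24 Let X1, X2, ..., Xn be independent random variables, where $X_i$ has the pdf
$$p_\theta(x_i) = \begin{cases} e^{i\theta - x_i} & x_i > i\theta, \quad i = 1, 2, 3, \ldots, n \\ 0 & \text{otherwise} \end{cases}$$
Show that $T = \min_{1\le i\le n}\frac{X_i}{i}$ is a sufficient statistic.
Let $y = \frac{x_i}{i}$; then $dx_i = i\,dy$. Given $p_\theta(x_i) = e^{i[\theta - \frac{x_i}{i}]}$, the pdf of $Y_i = \frac{X_i}{i}$ is
$$p_\theta(y) = ie^{i[\theta - y]}, \quad y > \theta.$$
Take $T = \min_{1\le i\le n} Y_i$ and write $m = \sum_{i=1}^n i = \frac{n(n+1)}{2}$. Since the $Y_i$ are independent,
$$P_\theta\{T > t\} = \prod_{i=1}^n\int_t^\infty ie^{i\theta - iy}\,dy = \prod_{i=1}^n e^{-i(t-\theta)} = e^{-m(t-\theta)}, \quad t > \theta,$$
so the pdf of T is
$$p_\theta(t) = me^{m\theta - mt}, \quad \theta < t < \infty.$$
Then
$$\frac{p_\theta(x_1, x_2, \ldots, x_n)}{p_\theta(t)} = \frac{e^{m\theta - \sum x_i}}{me^{m\theta - mt}} = \frac1m e^{mt - \sum x_i}.$$
It is independent of θ. Thus $T = \min_{1\le i\le n} Y_i = \min_{1\le i\le n}\frac{X_i}{i}$ is sufficient.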
Example 2.25 Let X1 and X2 be iid Poisson random variables with parameter
θ . Prove that


(i) X1 + X2 is a sufficient statistic.


(ii) X1 + 2X2 is not a sufficient statistic.
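(i) Given that
$$P_\theta\{X_1 = x_1\} = \begin{cases} \frac{e^{-\theta}\theta^{x_1}}{x_1!} & x_1 = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$$
and
$$P_\theta\{X_2 = x_2\} = \begin{cases} \frac{e^{-\theta}\theta^{x_2}}{x_2!} & x_2 = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$$
Let T = X1 + X2; then T ∼ P(2θ):
$$P_\theta\{T = t\} = \begin{cases} \frac{e^{-2\theta}(2\theta)^t}{t!} & t = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$$
Consider
$$P_\theta\{X_1 = x_1, X_2 = x_2 \mid T = t\} = \frac{P_\theta\{X_1 = x_1, X_2 = t - x_1\}}{P_\theta\{T = t\}} = \frac{P_\theta\{X_1 = x_1\}P_\theta\{X_2 = t - x_1\}}{P_\theta\{T = t\}} = \frac{\frac{e^{-\theta}\theta^{x_1}}{x_1!}\cdot\frac{e^{-\theta}\theta^{t-x_1}}{(t-x_1)!}}{\frac{e^{-2\theta}(2\theta)^t}{t!}} = \frac{t!}{(t-x_1)!\,x_1!\,2^t},$$
which is independent of θ.
∴ X1 + X2 is a sufficient statistic.

(ii) Consider
$$\begin{aligned}
P_\theta\{X_1 + 2X_2 = 2\} &= P_\theta\{X_1 = 0, X_2 = 1\} + P_\theta\{X_1 = 2, X_2 = 0\}\\
&= P_\theta\{X_1 = 0\}P_\theta\{X_2 = 1\} + P_\theta\{X_1 = 2\}P_\theta\{X_2 = 0\}\\
&= \theta e^{-2\theta} + \frac{\theta^2}{2}e^{-2\theta} = \theta e^{-2\theta}\left[1 + \frac\theta2\right]
\end{aligned}$$
Therefore
$$P_\theta\{X_1 = 0, X_2 = 1 \mid X_1 + 2X_2 = 2\} = \frac{P_\theta\{X_1 = 0, X_2 = 1\}}{P_\theta\{X_1 + 2X_2 = 2\}} = \frac{e^{-2\theta}\theta}{\theta e^{-2\theta}[1 + \frac\theta2]} = \frac{2}{2+\theta},$$
which depends on θ.
∴ X1 + 2X2 is not a sufficient statistic.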
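Both parts of Example 2.25 can be checked numerically. In this sketch (helper names hypothetical), $P\{X_1 = 1 \mid X_1 + X_2 = 3\}$ comes out as $\binom31/2^3 = 0.375$ for every θ, whereas $P\{X_1 = 0, X_2 = 1 \mid X_1 + 2X_2 = 2\}$ follows $2/(2+\theta)$ and so changes with θ.

```python
import math


def poisson_pmf(k, lam):
    """P{X = k} for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def cond_given_sum(x1, t, theta):
    """P{X1 = x1 | X1 + X2 = t} for iid Poisson(theta); X1 + X2 ~ Poisson(2 theta)."""
    joint = poisson_pmf(x1, theta) * poisson_pmf(t - x1, theta)
    return joint / poisson_pmf(t, 2 * theta)


def cond_given_weighted(theta):
    """P{X1 = 0, X2 = 1 | X1 + 2 X2 = 2}."""
    a = poisson_pmf(0, theta) * poisson_pmf(1, theta)
    b = poisson_pmf(2, theta) * poisson_pmf(0, theta)
    return a / (a + b)


for theta in (0.5, 1.0, 4.0):
    print(theta, cond_given_sum(1, 3, theta), cond_given_weighted(theta))
```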
Example 2.26 Let X1 and X2 be two independent Bernoulli random variables such


that $P_\theta\{X_1 = 1\} = 1 - P_\theta\{X_1 = 0\} = \theta$, 0 < θ < 1, and $P_\theta\{X_2 = 1\} = 1 - P_\theta\{X_2 = 0\} = 2\theta$, $0 < \theta \le \frac12$. Show that X1 + X2 is not a sufficient statistic.

Let T = X1 + X2. Consider
$$P_\theta\{T = 1\} = P_\theta\{X_1 + X_2 = 1\} = P_\theta\{X_1 = 0, X_2 = 1\} + P_\theta\{X_1 = 1, X_2 = 0\} = (1-\theta)2\theta + \theta(1-2\theta) = \theta(3-4\theta)$$
$$\therefore P_\theta\{X_1 = 0 \mid X_1 + X_2 = 1\} = \frac{P_\theta\{X_1 = 0 \cap X_1 + X_2 = 1\}}{P_\theta\{X_1 + X_2 = 1\}} = \frac{P_\theta\{X_1 = 0, X_2 = 1\}}{P_\theta\{X_1 + X_2 = 1\}} = \frac{(1-\theta)2\theta}{\theta(3-4\theta)} = \frac{2(1-\theta)}{3-4\theta},$$
which depends on θ.
∴ X1 + X2 is not a sufficient statistic.
Example 2.27 Let X1 and X2 denote a random sample drawn from a normal population N(θ, 1), −∞ < θ < ∞. Show that T = X1 + X2 is a sufficient statistic.
The joint pdf of X1 and X2 is
$$p_\theta(x_1, x_2) = p_\theta(x_1)p_\theta(x_2) = \frac{1}{2\pi}e^{-\frac12(x_1-\theta)^2 - \frac12(x_2-\theta)^2}$$
Let T = X1 + X2 ∼ N(2θ, 2):
$$p_\theta(t) = \begin{cases} \frac{1}{\sqrt{2\pi}\sqrt2}e^{-\frac14(t-2\theta)^2} & -\infty < t < \infty \\ 0 & \text{otherwise} \end{cases}$$
The definition of sufficient statistic gives
$$\frac{p_\theta(x_1, x_2)}{p_\theta(t)} = \frac{\frac{1}{2\pi}e^{-\frac12[x_1^2 + x_2^2 - 2(x_1+x_2)\theta + 2\theta^2]}}{\frac{1}{2\sqrt\pi}e^{-\frac14[t^2 - 4t\theta + 4\theta^2]}} = \frac{1}{\sqrt\pi}\cdot\frac{e^{-\frac12(x_1^2+x_2^2) + (x_1+x_2)\theta - \theta^2}}{e^{-\frac14(x_1+x_2)^2 + (x_1+x_2)\theta - \theta^2}} = \frac{1}{\sqrt\pi}e^{-\frac12(x_1^2+x_2^2) + \frac14(x_1+x_2)^2},$$
which is independent of θ.
∴ T = X1 + X2 is a sufficient statistic.

Example 2.28 Let X1, X2, X3 be a sample from b(1, θ). Show that $X_1X_2 + X_3$ is not sufficient.


Let $Y = X_1X_2$ and $T = X_1X_2 + X_3$; then
$$P\{Y = 0\} = P\{X_1 = 0, X_2 = 0\} + P\{X_1 = 1, X_2 = 0\} + P\{X_1 = 0, X_2 = 1\} = (1-\theta)^2 + \theta(1-\theta) + \theta(1-\theta) = 1 - \theta^2$$
$$P\{Y = 1\} = P\{X_1 = 1, X_2 = 1\} = \theta^2$$
$$P\{Y + X_3 = 1\} = P\{Y = 0, X_3 = 1\} + P\{Y = 1, X_3 = 0\} = (1-\theta^2)\theta + \theta^2(1-\theta),$$
i.e., $P\{T = 1\} = \theta(1-\theta)(1+2\theta)$.
Consider
$$P\{Y = 1 \mid T = 1\} = \frac{P\{Y = 1, T = 1\}}{P\{T = 1\}} = \frac{P\{Y = 1\}P\{X_3 = 0\}}{P\{T = 1\}} = \frac{\theta^2(1-\theta)}{\theta(1-\theta)(1+2\theta)} = \frac{\theta}{1+2\theta}$$
P{Y = 1 | T = 1} depends on the parameter θ. Thus $X_1X_2 + X_3$ is not sufficient.
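The conditional probabilities in Examples 2.26 and 2.28 involve only a handful of Bernoulli outcomes, so they can be computed exactly by enumeration. The sketch below (helper names hypothetical) reproduces $\frac{2(1-\theta)}{3-4\theta}$ and $\frac{\theta}{1+2\theta}$, and both visibly change with θ.

```python
from fractions import Fraction
from itertools import product


def conditional(event, given, probs):
    """P{event | given} for independent Bernoulli variables with success
    probabilities `probs`, by exact enumeration of all 0/1 outcomes."""
    num = den = Fraction(0)
    for xs in product((0, 1), repeat=len(probs)):
        p = Fraction(1)
        for x, q in zip(xs, probs):
            p *= q if x == 1 else 1 - q
        if given(xs):
            den += p
            if event(xs):
                num += p
    return num / den


def example_2_26(theta):
    # X1 ~ b(1, theta), X2 ~ b(1, 2 theta): P{X1 = 0 | X1 + X2 = 1}
    return conditional(lambda x: x[0] == 0, lambda x: x[0] + x[1] == 1,
                       [theta, 2 * theta])


def example_2_28(theta):
    # X1, X2, X3 iid b(1, theta): P{X1 X2 = 1 | X1 X2 + X3 = 1}
    return conditional(lambda x: x[0] * x[1] == 1,
                       lambda x: x[0] * x[1] + x[2] == 1, [theta] * 3)


for theta in (Fraction(1, 5), Fraction(2, 5)):
    print(theta, example_2_26(theta), example_2_28(theta))
```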


Remark 2.3 The definition of a sufficient statistic is not always useful for finding a sufficient statistic, since
(i) it does not reveal which statistic is sufficient;
(ii) even when a candidate is known, it may be tedious to find the pdf of the statistic;
(iii) it requires deriving a conditional density, which may not be easy, particularly for continuous random variables.
To avoid these difficulties one may use the Neyman Factorization Theorem.

2.10 Neyman Factorization Theorem


Theorem 2.3 Let X1, X2, ..., Xn be discrete random variables with pmf $p_\theta(x_1, x_2, \ldots, x_n)$, θ ∈ Ω. Then T = t(X) is a sufficient statistic if and only if
$$p_\theta(x_1, x_2, \ldots, x_n) = p_\theta(t)h(x_1, x_2, \ldots, x_n)$$
where $h(x_1, x_2, \ldots, x_n)$ is a non-negative function of x1, x2, ..., xn only and does not depend on θ, and $p_\theta(t)$ is a non-negative function of θ and T = t only.


Proof: Assume that T = t(X) is a sufficient statistic. We show that
$$p_\theta(x_1, x_2, \ldots, x_n) = p_\theta(t)h(x_1, x_2, \ldots, x_n).$$
At any given sample point X1 = x1, X2 = x2, ..., Xn = xn, let t(x) = t; then adding the consistent restriction T = t does not alter the event {X1 = x1, X2 = x2, ..., Xn = xn}:
$$P_\theta\{X_1 = x_1, \ldots, X_n = x_n\} = P_\theta\{X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n, T = t\} = P_\theta\{T = t\}P\{X_1 = x_1, \ldots, X_n = x_n \mid T = t\}$$
provided that $P\{X_1 = x_1, \ldots, X_n = x_n \mid T = t\}$ is well defined.
Choose $P_\theta\{X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n\} > 0$ for some θ. Define $h(x_1, x_2, \ldots, x_n) = P\{X_1 = x_1, \ldots, X_n = x_n \mid T = t\}$, which is free of θ by sufficiency, and $p_\theta(t) = P_\theta\{T = t\}$; then
$$P_\theta\{X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n\} = p_\theta(t)h(x_1, x_2, \ldots, x_n).$$
Conversely, suppose $P_\theta\{X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n\} = p_\theta(t)h(x_1, x_2, \ldots, x_n)$ holds; we prove that T = t(X) is a sufficient statistic. The marginal pmf of T = t(X) is
$$P_\theta\{T = t\} = \sum_{t(x)=t}P_\theta\{X_1 = x_1, \ldots, X_n = x_n, t(X) = t\} = \sum_{t(x)=t}P_\theta\{X_1 = x_1, \ldots, X_n = x_n\} = \sum_{t(x)=t}p_\theta(t)h(x_1, x_2, \ldots, x_n)$$
Assume $P_\theta\{T = t\} > 0$ for some θ; then
$$P_\theta\{X_1 = x_1, \ldots, X_n = x_n \mid T = t\} = \frac{P_\theta\{X_1 = x_1, \ldots, X_n = x_n, T = t\}}{P_\theta\{T = t\}} = \begin{cases} 0 & \text{if } t(x) \ne t \\ \frac{P_\theta\{X_1 = x_1, \ldots, X_n = x_n\}}{P_\theta\{T = t\}} & \text{if } t(x) = t \end{cases}$$
If t(x) = t, then
$$\frac{P_\theta\{X_1 = x_1, \ldots, X_n = x_n\}}{P_\theta\{T = t\}} = \frac{p_\theta(t)h(x_1, x_2, \ldots, x_n)}{p_\theta(t)\sum_{t(x)=t}h(x_1, x_2, \ldots, x_n)} = \frac{h(x_1, x_2, \ldots, x_n)}{\sum_{t(x)=t}h(x_1, x_2, \ldots, x_n)},$$
which is independent of θ.

Thus T = t(X) is a sufficient statistic.


Theorem 2.4 If T = t(X) is a sufficient statistic, then any one to one function of
the sufficient statistic is also a sufficient statistic.


Proof: Let T = t(X) be a sufficient statistic; then by the Neyman Factorization Theorem $p(x_1, x_2, \ldots, x_n \mid \theta) = p_\theta(t)h(x_1, x_2, \ldots, x_n)$. Let U be any one-to-one function of T = t(X), i.e., u = α(t). Since u = α(t) ⇒ t = α⁻¹(u),
$$\frac{dt}{du} = \frac{d\alpha^{-1}(u)}{du} = [\alpha^{-1}(u)]'.$$
$$\therefore p(x_1, x_2, \ldots, x_n \mid \theta) = p(\alpha^{-1}(u) \mid \theta)[\alpha^{-1}(u)]'\,\frac{h(x_1, x_2, \ldots, x_n)}{[\alpha^{-1}(u)]'} = p(u \mid \theta)h_1(x_1, x_2, \ldots, x_n)$$
where $p(u \mid \theta) = p(\alpha^{-1}(u) \mid \theta)[\alpha^{-1}(u)]'$ and
$$h_1(x_1, x_2, \ldots, x_n) = \frac{h(x_1, x_2, \ldots, x_n)}{[\alpha^{-1}(u)]'}$$
is a function of x1, x2, ..., xn for given U = u which is independent of θ. Thus any one-to-one function of T = t(X) is also a sufficient statistic.
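Remark 2.4
(i) A sufficient statistic is not unique: by Theorem 2.4, every one-to-one function of a sufficient statistic is also a sufficient statistic.
(ii) Theorem 2.4 covers one-to-one functions only; an arbitrary (many-to-one) function of a sufficient statistic need not be a sufficient statistic.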
Example 2.29 Let X1, X2, ..., Xn be a random sample drawn from a population with pmf
$$p_\theta(x) = \begin{cases} \theta^x(1-\theta)^{1-x} & x = 0, 1 \\ 0 & \text{otherwise} \end{cases}$$
Find a sufficient statistic.
Consider
$$p_\theta(x_1, x_2, \ldots, x_n) = \theta^t(1-\theta)^{n-t} \quad \text{where } t = \sum_{i=1}^n x_i$$
$$= \left(\frac{\theta}{1-\theta}\right)^t(1-\theta)^n = p_\theta(t)h(x_1, x_2, \ldots, x_n)$$
where $p_\theta(t) = \left(\frac{\theta}{1-\theta}\right)^t(1-\theta)^n$ and $h(x_1, x_2, \ldots, x_n) = 1$.
∴ $T = \sum_{i=1}^n X_i$ is a sufficient statistic.
Remark 2.5 If the range of the distribution depends on the parameter, the Neyman Factorization Theorem fails to find the sufficient statistic. For such distributions, the definition of a sufficient statistic is useful for finding the sufficient statistic.
Example 2.30 Let X1, X2, ..., Xn be a random sample drawn from a population with pdf
$$p_\theta(x) = \begin{cases} e^{-(x-\theta)} & \theta < x < \infty \\ 0 & \text{otherwise} \end{cases}$$
Obtain a sufficient statistic.


Consider Y1 ≤ Y2 ≤ ... ≤ Yn, the order statistics of X1, X2, ..., Xn. The pdf of the statistic Y1 is
$$p_\theta(y_1) = \frac{n!}{1!(n-1)!}e^{-(y_1-\theta)}\left[\int_{y_1}^\infty e^{-(x-\theta)}\,dx\right]^{n-1}$$
$$p_\theta(y_1) = \begin{cases} ne^{-n(y_1-\theta)} & \theta < y_1 < \infty \\ 0 & \text{otherwise} \end{cases}$$
The definition of sufficient statistic gives
$$\frac{p_\theta(x_1, x_2, \ldots, x_n)}{p_\theta(y_1)} = \frac{e^{-(x_1-\theta)}\cdots e^{-(x_n-\theta)}}{ne^{-n(y_1-\theta)}} = \frac{e^{-t+n\theta}}{ne^{-ny_1+n\theta}} = \frac{e^{-t}}{ne^{-ny_1}} \quad \text{where } t = \sum_{i=1}^n x_i,$$
which is independent of θ.
∴ $Y_1 = \min_{1\le i\le n}\{X_i\}$ is sufficient. Again, consider
$$p_\theta(x_1, x_2, \ldots, x_n) = e^{-(x_1 + x_2 + \cdots + x_n) + n\theta} = e^{-y_n + n\theta}e^{-t + y_n} \quad \text{where } t = \sum_{i=1}^n x_i \text{ and } Y_n = \max_{1\le i\le n}\{X_i\}$$
$$p_\theta(x_1, x_2, \ldots, x_n) = e^{-y_n + n\theta}e^{-t + y_n} = p_\theta(y_n)h(x_1, x_2, \ldots, x_n)$$
By the Neyman Factorization Theorem, $Y_n = \max_{1\le i\le n}\{X_i\}$ would then be a sufficient statistic. But the joint pdf is non-zero only on the range $\theta < y_1 \le y_2 \le \cdots \le y_n < \infty$, which depends on θ; so, for fixed Y1 = y1, Y2 = y2, ..., Yn = yn, the factor $h(x_1, x_2, \ldots, x_n)$, which must carry this range, depends on θ. An h(x1, x2, ..., xn) that depends on θ contradicts the Neyman Factorization Theorem. Hence the Neyman Factorization Theorem, applied in this way, fails when the range of the distribution depends on the parameter θ.
Example 2.31 Show that the set of order statistics based on a random sample drawn from a continuous population with pdf p(x | θ) is a sufficient statistic.
The order statistics Y1 ≤ Y2 ≤ ... ≤ Yn are jointly sufficient for identifying the distribution. Given the order statistics Y1 = y1, Y2 = y2, ..., Yn = yn, the sample (X1, X2, ..., Xn) takes each of the n! permutations of (y1, y2, ..., yn) with equal probability. So the conditional probability of any particular permutation is $\frac{1}{n!}$, which is independent of the parameter θ. ∴ The set of order statistics is a sufficient statistic.
Example 2.32 Let X1, X2, ..., Xn be a random sample of size n drawn from a population with pdf
$$p_\theta(x) = \begin{cases} \frac1\pi\cdot\frac{1}{1+(x-\theta)^2} & -\infty < x < \infty \\ 0 & \text{otherwise} \end{cases}$$


Can the joint pdf of X1, X2, ..., Xn be written in the form given in the Neyman Factorization Theorem? Does the Cauchy distribution have a sufficient statistic?
The joint pdf of X1, X2, ..., Xn is
$$p_\theta(x_1, x_2, \ldots, x_n) = \prod_{i=1}^n \frac1\pi\cdot\frac{1}{1+(x_i-\theta)^2}$$

It cannot be written in the form of the Neyman Factorization Theorem; hence the Cauchy distribution does not have a single sufficient statistic.

2.11 Exponential Family of Distributions


Definition 2.2 A family {pθ(x), θ ∈ Ω} of probability functions of the form
$$p_\theta(x) = \begin{cases} c(\theta)e^{Q(\theta)t(x)}h(x) & a < x < b \\ 0 & \text{otherwise} \end{cases}$$
is said to be a regular exponential family of probability functions if
• the range a < x < b of the distribution is independent of the parameter θ;
• Q(θ) is a non-trivial continuous function of θ;
• t(x) is a non-trivial function of x;
• h(x) is a continuous function of x in a < x < b.
If θ is a real-valued (scalar) parameter, the family is called a single parameter exponential family.
Definition 2.3 If a pdf pθ(x) with a single parameter θ can be expressed as a single parameter exponential family
$$p_\theta(x) = \begin{cases} c(\theta)e^{Q(\theta)t(x)}h(x) & a < x < b \\ 0 & \text{otherwise} \end{cases}$$
then T = t(X) is called a sufficient statistic.


Remark 2.6 The simplicity of this definition is that the sufficient statistic can be determined by inspection.
Example 2.33 Let the Xi's be independent with Xi ∼ N(iθ, 1), i = 1 to n, where θ is unknown. Find a sufficient statistic for θ.
$$p_\theta(x_i) = \begin{cases} \frac{1}{\sqrt{2\pi}}e^{-\frac12(x_i - i\theta)^2} & -\infty < x_i < \infty \\ 0 & \text{otherwise} \end{cases}$$


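Consider
$$p_\theta(x_1, x_2, \ldots, x_n) = \prod_{i=1}^n p_\theta(x_i) = \left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac12\sum_{i=1}^n(x_i - i\theta)^2}$$
$$= \left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac12\sum_{i=1}^n x_i^2 + \theta\sum_{i=1}^n ix_i - \frac12\sum_{i=1}^n i^2\theta^2} = \left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac12\sum_{i=1}^n x_i^2 + \theta\sum_{i=1}^n ix_i - \frac{n(n+1)(2n+1)}{12}\theta^2}$$
$$= c(\theta)e^{Q(\theta)t(x)}h(x)$$
where $t(x) = \sum_{i=1}^n ix_i$, Q(θ) = θ, $h(x) = e^{-\frac12\sum_{i=1}^n x_i^2}$ and $c(\theta) = \left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac{1}{12}n(n+1)(2n+1)\theta^2}$.
Thus $T = \sum_{i=1}^n iX_i$ is a sufficient statistic.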
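One consequence of the factorization in Example 2.33 can be seen numerically: for two samples with the same value of $T = \sum iX_i$, the log-likelihood difference equals $\log h(a) - \log h(b)$ and so does not change with θ. The sketch uses two hypothetical samples a and b, both with T = 14. Their difference should be $-\frac12\cdot14 + \frac12\cdot26 = 6$ for every θ.

```python
import math


def log_likelihood(xs, theta):
    """Log-likelihood of independent X_i ~ N(i*theta, 1), i = 1, ..., n."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - i * theta) ** 2
               for i, x in enumerate(xs, start=1))


def stat_T(xs):
    """T = sum_i i * x_i."""
    return sum(i * x for i, x in enumerate(xs, start=1))


a = [1.0, 2.0, 3.0]   # hypothetical sample, T = 1 + 4 + 9 = 14
b = [3.0, 4.0, 1.0]   # hypothetical sample, T = 3 + 8 + 3 = 14
for theta in (-1.0, 0.0, 0.5, 2.0):
    print(theta, log_likelihood(a, theta) - log_likelihood(b, theta))
```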
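Example 2.34 Given n independent observations on a random variable X with probability density function
$$p_\theta(x) = \begin{cases} \frac{1}{2\theta}e^{-\frac x\theta} & \text{if } x > 0, \ \theta > 0 \\ \frac\theta2 e^{\theta x} & \text{if } x \le 0 \\ 0 & \text{otherwise} \end{cases}$$
obtain a sufficient statistic.
Consider
$$p_\theta(x_1, x_2, \ldots, x_n) = \begin{cases} \left(\frac{1}{2\theta}\right)^n e^{-\frac{t(x)}{\theta}} & \text{if } x > 0 \\ \left(\frac\theta2\right)^n e^{\theta t(x)} & \text{if } x \le 0 \end{cases} \quad \text{where } t(x) = \sum_{i=1}^n x_i$$
$$p_\theta(x) = \begin{cases} c_1(\theta)e^{Q_1(\theta)t(x)}h(x) & \text{if } x > 0, \ \theta > 0 \\ c_2(\theta)e^{Q_2(\theta)t(x)}h(x) & \text{if } x \le 0 \end{cases}$$
where $c_1(\theta) = \left(\frac{1}{2\theta}\right)^n$, $Q_1(\theta) = -\frac1\theta$, $c_2(\theta) = \left(\frac\theta2\right)^n$, $Q_2(\theta) = \theta$ and h(x) = 1. ∴ $T = \sum_{i=1}^n X_i$ is a sufficient statistic.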
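Example 2.35 If X is a single observation from N(0, σ²), show that |X| is a sufficient statistic.
Given
$$p_\sigma(x) = \begin{cases} \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{1}{2\sigma^2}x^2} & \sigma^2 > 0, \ -\infty < x < \infty \\ 0 & \text{otherwise} \end{cases}$$
The pdf is expressed as
$$p_\sigma(x) = c(\sigma)e^{Q(\sigma)t(x)}h(x)$$
where $c(\sigma) = \frac{1}{\sqrt{2\pi}\sigma}$, $Q(\sigma) = -\frac{1}{2\sigma^2}$, $t(x) = x^2$, h(x) = 1.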


It is a one parameter exponential family. Thus $T = X^2$ is sufficient. Since $X^2 = |X|^2$ is a one-to-one function of |X| ≥ 0, T = |X| is also sufficient.
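Example 2.36 Let X1, X2, ..., Xn be a random sample from N(θ, θ), θ > 0. Find a sufficient statistic for the random sample.
Given
$$p_\theta(x) = \begin{cases} \frac{1}{\sqrt{2\pi\theta}}e^{-\frac{1}{2\theta}(x-\theta)^2} & -\infty < x < \infty \\ 0 & \text{otherwise} \end{cases}$$
The joint pdf of X1, X2, ..., Xn is
$$p_\theta(x_1, x_2, \ldots, x_n) = \left(\frac{1}{\sqrt{2\pi\theta}}\right)^n e^{-\frac{1}{2\theta}\sum(x_i-\theta)^2} = \left(\frac{1}{\sqrt{2\pi\theta}}\right)^n e^{-\frac{1}{2\theta}\sum x_i^2 + \sum x_i - \frac n2\theta} = c(\theta)e^{Q(\theta)t(x)}h(x)$$
where $c(\theta) = \left(\frac{1}{\sqrt{2\pi\theta}}\right)^n e^{-\frac n2\theta}$, $h(x) = e^{\sum x_i}$, $Q(\theta) = -\frac{1}{2\theta}$ and $t(x) = \sum x_i^2$. It is a one parameter exponential family. Thus $T = \sum X_i^2$ is a sufficient statistic.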

2.12 Distribution Admitting Sufficient Statistic


Let X be a random sample drawn from a population with distribution Pθ, θ ∈ Ω, whose pdf is given by pθ(x). Assume θ is a real-valued (scalar) parameter and the range of the distribution is independent of the parameter θ. Let T = t(X) be a sufficient statistic. Using the Neyman Factorization Theorem,
$$p_\theta(x) = p_\theta(t)h(x)$$
$$\log p_\theta(x) = \log p_\theta(t) + \log h(x)$$
Assume that the function pθ(t) is partially differentiable with respect to θ; then
$$\frac{\partial \log p_\theta(x)}{\partial\theta} = \frac{\partial \log p_\theta(t)}{\partial\theta} = Q_\theta(t) \qquad (2.3)$$
Since the equation holds for all values of θ, it is also true for θ = 0. So one can obtain the relation t(x) = k(t), where
$$\left.\frac{\partial \log p_\theta(x)}{\partial\theta}\right|_{\theta=0} = t(x) \quad\text{and}\quad Q_0(t) = k(t)$$
Suppose k(t) and t(x) are differentiable with respect to x; then
$$\frac{\partial t(x)}{\partial x} = \frac{\partial k(t)}{\partial t}\,\frac{\partial t}{\partial x}$$
Again, differentiating equation (2.3) with respect to x,
$$\frac{\partial^2 \log p_\theta(x)}{\partial x\,\partial\theta} = \frac{\partial Q_\theta(t)}{\partial t}\,\frac{\partial t}{\partial x}$$


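$$\frac{\dfrac{\partial^2 \log p_\theta(x)}{\partial x\,\partial\theta}}{\dfrac{\partial t(x)}{\partial x}} = \frac{\partial Q_\theta(t)}{\partial k(t)} \qquad (2.4)$$
The left-hand side of equation (2.4) is the same for all x. It must depend on θ alone, so that $\frac{\partial Q_\theta(t)}{\partial k(t)} = \lambda(\theta)$, i.e.,
$$\partial Q_\theta(t) = \lambda(\theta)\,\partial k(t)$$
Integrating with respect to t,
$$\int \partial Q_\theta(t) = \lambda(\theta)\int \partial k(t) + c_1(\theta)$$
$$Q_\theta(t) = \lambda(\theta)k(t) + c_1(\theta)$$
Again integrating with respect to θ,
$$\int Q_\theta(t)\,d\theta = k(t)\int\lambda(\theta)\,d\theta + \int c_1(\theta)\,d\theta + c(x)$$
$$\int \frac{\partial \log p_\theta(x)}{\partial\theta}\,d\theta = t(x)\int\lambda(\theta)\,d\theta + B(\theta) + c(x)$$
since k(t) = t(x) for θ = 0 and $B(\theta) = \int c_1(\theta)\,d\theta$.
$$\log p_\theta(x) = Q(\theta)t(x) + B(\theta) + c(x), \quad \text{where } Q(\theta) = \int\lambda(\theta)\,d\theta$$
$$e^{\log p_\theta(x)} = e^{Q(\theta)t(x) + B(\theta) + c(x)}$$
$$p_\theta(x) = e^{Q(\theta)t(x)}e^{B(\theta)}e^{c(x)} = c(\theta)e^{Q(\theta)t(x)}h(x)$$
where $c(\theta) = e^{B(\theta)}$ and $h(x) = e^{c(x)}$.

This is a one parameter exponential family.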


Remark 2.7 The Neyman Factorization Theorem and the exponential family form are two equivalent methods of identifying a sufficient statistic.

2.13 Joint Sufficient Statistics


Definition 2.4 Let X1, X2, ..., Xn be a random sample of size n drawn from a population with pdf $p_{\theta_1,\theta_2}(x)$, θ1, θ2 ∈ Ω. Let T1 = t1(X) and T2 = t2(X) be two statistics whose joint pdf is $p_{\theta_1,\theta_2}(t_1, t_2)$. The statistics T1 = t1(X) and T2 = t2(X) are called jointly sufficient statistics iff
$$\frac{p_{\theta_1,\theta_2}(x_1, x_2, \ldots, x_n)}{p_{\theta_1,\theta_2}(t_1, t_2)}$$
is independent of the parameters θ1 and θ2 for fixed T1 = t1 and T2 = t2.


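Example 2.37 Let X1, X2, ..., Xn be a random sample drawn from a population with density function
$$p_{\theta_1,\theta_2}(x) = \begin{cases} \frac{1}{2\theta_2} & \theta_1 - \theta_2 < x < \theta_1 + \theta_2, \text{ where } -\infty < \theta_1 < \infty, \ 0 < \theta_2 < \infty \\ 0 & \text{otherwise} \end{cases}$$
Find a sufficient statistic.

Let Y1 ≤ Y2 ≤ ... ≤ Yn be the order statistics of X1, X2, ..., Xn. The joint pdf of (Y1, Yn) is
$$p_{\theta_1,\theta_2}(y_1, y_n) = \frac{n!}{1!(n-2)!1!}\,\frac{1}{2\theta_2}\left[\int_{y_1}^{y_n}\frac{1}{2\theta_2}\,dx\right]^{n-2}\frac{1}{2\theta_2} = \begin{cases} \frac{n(n-1)}{(2\theta_2)^n}(y_n - y_1)^{n-2} & \theta_1 - \theta_2 < y_1 < y_n < \theta_1 + \theta_2 \\ 0 & \text{otherwise} \end{cases}$$
$$\frac{p_{\theta_1,\theta_2}(x_1, x_2, \ldots, x_n)}{p_{\theta_1,\theta_2}(y_1, y_n)} = \frac{\left(\frac{1}{2\theta_2}\right)^n}{\frac{n(n-1)}{(2\theta_2)^n}(y_n - y_1)^{n-2}} = \frac{1}{n(n-1)(y_n - y_1)^{n-2}}$$

is independent of the parameters θ1 and θ2. ∴ (Y1, Yn) are jointly sufficient statistics.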

Example 2.38 Let X1, X2, ..., Xn be a random sample from the pdf
$$p_{\theta,\sigma}(x) = \begin{cases} \frac1\sigma e^{-\frac1\sigma(x-\theta)} & \theta < x < \infty, \ -\infty < \theta < \infty, \ 0 < \sigma < \infty \\ 0 & \text{otherwise} \end{cases}$$
Find a two dimensional sufficient statistic.

Consider the transformation
$$\begin{aligned}
Y_1 &= nX_{(1)}\\
Y_2 &= (n-1)[X_{(2)} - X_{(1)}]\\
Y_3 &= (n-2)[X_{(3)} - X_{(2)}]\\
&\ \ \vdots\\
Y_{n-1} &= 2[X_{(n-1)} - X_{(n-2)}]\\
Y_n &= X_{(n)} - X_{(n-1)}, \quad\text{so that } \sum_{i=1}^n Y_i = \sum_{i=1}^n X_{(i)}
\end{aligned}$$
The Jacobian of the transformation is $|J| = \frac{1}{n!}$. The joint pdf of $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ is given by $p(x_{(1)}, x_{(2)}, \ldots, x_{(n)}) = n!\prod_{i=1}^n p(x_{(i)})$. The joint


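pdf of Y1, Y2, ..., Yn is given by
$$p_{\theta,\sigma}(y_1, y_2, \ldots, y_n) = n!\prod_{i=1}^n p(x_{(i)}) \times |J| = \prod_{i=1}^n p(x_{(i)}) = \frac{1}{\sigma^n}e^{-\frac1\sigma(\sum y_i - n\theta)}, \quad n\theta < y_1 < \infty, \ 0 \le y_2, \ldots, y_n < \infty$$
Consider a further transformation
$$\begin{aligned}
U_1 &= Y_2, & Y_2 &= U_1\\
U_2 &= Y_2 + Y_3, & Y_3 &= U_2 - U_1\\
U_3 &= Y_2 + Y_3 + Y_4, & Y_4 &= U_3 - U_2\\
&\ \ \vdots & &\ \ \vdots\\
U_{n-2} &= Y_2 + Y_3 + \cdots + Y_{n-1}, & Y_{n-1} &= U_{n-2} - U_{n-3}\\
T &= Y_2 + Y_3 + \cdots + Y_n, & Y_n &= T - U_{n-2}
\end{aligned}$$
Now Y2 + Y3 + ... + Yn = T and the Jacobian of the transformation is |J| = 1. The joint pdf of Y1, U1, U2, ..., U_{n−2}, T is
$$p_{\theta,\sigma}(y_1, u_1, \ldots, u_{n-2}, t) = \frac1\sigma e^{-\frac{(y_1 - n\theta)}{\sigma}}\,\frac{1}{\sigma^{n-1}}e^{-\frac t\sigma}, \quad n\theta \le y_1, \ 0 \le u_1 \le u_2 \le \cdots \le u_{n-2} \le t < \infty$$
The marginal density of (Y1, T) is
$$\begin{aligned}
p_{\theta,\sigma}(y_1, t) &= \frac1\sigma e^{-\frac{(y_1 - n\theta)}{\sigma}}\frac{1}{\sigma^{n-1}}e^{-\frac t\sigma}\int_0^t\int_0^{u_{n-2}}\cdots\int_0^{u_2}du_1\,du_2\cdots du_{n-2}\\
&= \frac1\sigma e^{-\frac{(y_1 - n\theta)}{\sigma}}\frac{1}{\sigma^{n-1}}e^{-\frac t\sigma}\int_0^t\int_0^{u_{n-2}}\cdots\int_0^{u_3}u_2\,du_2\cdots du_{n-2}\\
&= \frac1\sigma e^{-\frac{(y_1 - n\theta)}{\sigma}}\frac{1}{\sigma^{n-1}}e^{-\frac t\sigma}\int_0^t\int_0^{u_{n-2}}\cdots\int_0^{u_4}\frac{u_3^2}{2!}\,du_3\cdots du_{n-2}\\
&= \frac1\sigma e^{-\frac{(y_1 - n\theta)}{\sigma}}\frac{1}{\sigma^{n-1}}e^{-\frac t\sigma}\int_0^t\frac{u_{n-2}^{n-3}}{(n-3)!}\,du_{n-2}\\
&= \frac1\sigma e^{-\frac{(y_1 - n\theta)}{\sigma}}\frac{1}{\sigma^{n-1}}e^{-\frac t\sigma}\frac{t^{n-2}}{(n-2)!}
\end{aligned}$$
The first order statistic $X_{(1)}$ has the pdf $\frac n\sigma e^{-\frac n\sigma(x-\theta)}$, θ < x < ∞; i.e., $Y_1 = nX_{(1)}$ has the pdf
$$p_{\theta,\sigma}(y_1) = \frac1\sigma e^{-\frac1\sigma(y_1 - n\theta)}, \quad n\theta < y_1 < \infty.$$
Thus Y1 − nθ is exponential with mean σ and $T = \sum_{i=2}^n Y_i \sim G(\sigma, n-1)$, so that
$$p_{\theta,\sigma}(y_1, t) = \begin{cases} \frac1\sigma e^{-\frac{y_1 - n\theta}{\sigma}}\,\frac{t^{n-2}}{(n-2)!\,\sigma^{n-1}}e^{-\frac t\sigma} & n\theta < y_1 < \infty, \ 0 < t < \infty \\ 0 & \text{otherwise} \end{cases}$$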


The conditional joint density of U1, U2, ..., U_{n−2} given (Y1, T) is
$$p(u_1, u_2, \ldots, u_{n-2} \mid y_1, t) = \frac{(n-2)!}{t^{n-2}}, \quad 0 < u_1 < u_2 < \cdots < u_{n-2} < t,$$
which is independent of θ and σ. Thus (Y1, T) are jointly sufficient statistics, i.e., $\left(X_{(1)}, \sum_{i=1}^n[X_{(i)} - X_{(1)}]\right)$ are jointly sufficient statistics.
Definition 2.5 Let θ = (θ1, θ2, ..., θk) be a vector of parameters and T = (T1, T2, ..., Tk) a random vector. The vector T is jointly sufficient if pθ(x) can be expressed in the form
$$p_\theta(x) = \begin{cases} c(\theta)e^{\sum_{j=1}^k Q_j(\theta)t_j(x)}h(x) & a < x < b \\ 0 & \text{otherwise} \end{cases}$$

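Example 2.39 Let X1, X2, ..., Xn be a random sample drawn from a population N(θ, σ²). Show that the statistic $T = \left(\sum_{i=1}^n X_i, \sum_{i=1}^n X_i^2\right)$ is jointly sufficient.
$$\begin{aligned}
p_{\theta,\sigma^2}(x_1, x_2, \ldots, x_n) &= \left(\frac{1}{\sqrt{2\pi}\sigma}\right)^n e^{-\frac{1}{2\sigma^2}\sum_{i=1}^n(x_i-\theta)^2}\\
&= \left(\frac{1}{\sqrt{2\pi}\sigma}\right)^n e^{-\frac{1}{2\sigma^2}[\sum_{i=1}^n x_i^2 - 2\theta\sum_{i=1}^n x_i + n\theta^2]}\\
&= \left(\frac{1}{\sqrt{2\pi}\sigma}\right)^n e^{-\frac{n\theta^2}{2\sigma^2}}e^{-\frac{1}{2\sigma^2}[\sum_{i=1}^n x_i^2 - 2\theta\sum_{i=1}^n x_i]}\\
&= c(\theta, \sigma^2)e^{Q_1(\theta,\sigma^2)t_1(x) + Q_2(\theta,\sigma^2)t_2(x)}h(x)
\end{aligned}$$
where $c(\theta, \sigma^2) = \left(\frac{1}{\sqrt{2\pi}\sigma}\right)^n e^{-\frac{n\theta^2}{2\sigma^2}}$, $Q_1(\theta, \sigma^2) = \frac{\theta}{\sigma^2}$, $Q_2(\theta, \sigma^2) = -\frac{1}{2\sigma^2}$, h(x) = 1, $t_1(x) = \sum_{i=1}^n x_i$, $t_2(x) = \sum_{i=1}^n x_i^2$.
∴ $\left(\sum_{i=1}^n X_i, \sum_{i=1}^n X_i^2\right)$ are jointly sufficient statistics.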
Example 2.40 Let X1, X2, ..., Xn be a random sample from a Gamma(α, β) population. Find a two dimensional sufficient statistic for the random sample.
Given
$$p_{\alpha,\beta}(x) = \begin{cases} \frac{\alpha^\beta}{\Gamma\beta}e^{-\alpha x}x^{\beta-1} & x > 0, \ \alpha > 0, \ \beta > 0 \\ 0 & \text{otherwise} \end{cases}$$


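The joint pdf of X1, X2, ..., Xn is
$$\begin{aligned}
p_{\alpha,\beta}(x_1, x_2, \ldots, x_n) &= \frac{\alpha^{n\beta}}{(\Gamma\beta)^n}e^{-\alpha\sum x_i}\left(\prod x_i\right)^{\beta-1} = \frac{\alpha^{n\beta}}{(\Gamma\beta)^n}e^{-\alpha\sum x_i}e^{(\beta-1)\log(\prod_{i=1}^n x_i)}\\
&= \frac{\alpha^{n\beta}}{(\Gamma\beta)^n}e^{-\alpha\sum x_i + (\beta-1)\sum\log x_i} = \frac{\alpha^{n\beta}}{(\Gamma\beta)^n}e^{-\alpha\sum x_i + \beta\sum\log x_i - \sum\log x_i}\\
&= c(\alpha, \beta)e^{Q_1(\alpha,\beta)t_1(x) + Q_2(\alpha,\beta)t_2(x)}h(x)
\end{aligned}$$
where $c(\alpha, \beta) = \frac{\alpha^{n\beta}}{(\Gamma\beta)^n}$, Q1(α, β) = −α, $t_1(x) = \sum_{i=1}^n x_i$, Q2(α, β) = β, $t_2(x) = \sum_{i=1}^n \log x_i$ and $h(x) = e^{-\sum_{i=1}^n \log x_i}$. It is a two parameter exponential family. Therefore $\left(\sum X_i, \sum \log X_i\right)$ are jointly sufficient statistics.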

2.14 Efficient Estimator


There are two types of efficient estimators: the relatively efficient estimator and the efficient estimator. The efficient estimator is based on the Cramer-Rao lower bound for the variance of an unbiased estimator. The relatively efficient estimator is defined below.
Definition 2.6 Let T1 = t1(X) and T2 = t2(X) be two unbiased estimators of θ with $E_\theta[T_1^2] < \infty$ and $E_\theta[T_2^2] < \infty$. The efficiency of T1 = t1(X) relative to T2 = t2(X) is defined as
$$\text{Efficiency} = \frac{V_\theta[T_1]}{V_\theta[T_2]}.$$
If $\frac{V_\theta[T_1]}{V_\theta[T_2]} < 1$, then T1 = t1(X) is more efficient than T2 = t2(X).
Vθ [T2 ]
Example 2.41 Let $Y_1 < Y_2 < Y_3 < Y_4 < Y_5$ be the order statistics of a random sample of size 5 from a uniform population with pdf
$$p_\theta(x) = \begin{cases} \frac{1}{\theta} & 0 < x < \theta,\ \theta > 0 \\ 0 & \text{otherwise} \end{cases}$$
Show that $2Y_3$ is an unbiased estimator of $\theta$. Find the conditional expectation $T = E_\theta[2Y_3 \mid Y_5]$. Compare the variances of $2Y_3$ and the statistic $T$.


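The pdf of $Y_3$ is
$$p_\theta(y_3) = \frac{5!}{2!\,1!\,2!}\left[\int_0^{y_3}\frac{1}{\theta}\,dx\right]^2\frac{1}{\theta}\left[\int_{y_3}^{\theta}\frac{1}{\theta}\,dx\right]^2$$
$$= \frac{30}{\theta^5}\, y_3^2\,[\theta - y_3]^2, \quad 0 < y_3 < \theta$$
$$= \frac{30}{\theta^3}\, y_3^2\left[1 - \frac{y_3}{\theta}\right]^2, \quad 0 < y_3 < \theta$$
$$E_\theta[Y_3] = \frac{30}{\theta^3}\int_0^\theta y_3^3\left[1 - \frac{y_3}{\theta}\right]^2 dy_3$$
$$= \frac{30}{\theta^3}\,\theta^4\int_0^1 t^3(1 - t)^2\,dt \quad \text{where } t = \frac{y_3}{\theta}$$
$$= 30\theta\int_0^1 t^{4-1}(1 - t)^{3-1}\,dt = 30\theta\,\frac{\Gamma 4\,\Gamma 3}{\Gamma 7} = 30\theta\,\frac{3! \times 2!}{6!} = \frac{\theta}{2}$$
$$E_\theta[2Y_3] = \theta$$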

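The joint pdf of $Y_3$ and $Y_5$ is
$$p_\theta(y_3, y_5) = \frac{5!}{2!\,1!\,1!\,1!}\left[\int_0^{y_3}\frac{1}{\theta}\,dx\right]^2\frac{1}{\theta}\left[\int_{y_3}^{y_5}\frac{1}{\theta}\,dx\right]\frac{1}{\theta}$$
$$= \begin{cases} \frac{60}{\theta^5}\, y_3^2\,[y_5 - y_3] & 0 < y_3 < y_5 < \theta \\ 0 & \text{otherwise} \end{cases}$$
The pdf of $Y_5$ is
$$p_\theta(y_5) = \begin{cases} \frac{5}{\theta^5}\, y_5^4 & 0 < y_5 < \theta \\ 0 & \text{otherwise} \end{cases}$$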


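The conditional distribution of $Y_3$ given $Y_5 = y_5$ is
$$p_\theta(y_3 \mid y_5) = \frac{p_\theta(y_3, y_5)}{p_\theta(y_5)} = \frac{60}{5}\,\frac{y_3^2\,[y_5 - y_3]}{y_5^4}, \quad 0 < y_3 < y_5$$
$$E_\theta[Y_3 \mid Y_5] = \frac{12}{y_5^4}\int_0^{y_5} y_3^3\,[y_5 - y_3]\,dy_3 = \frac{3}{5}\,y_5$$
$\therefore E_\theta[2Y_3 \mid Y_5 = y_5] = \frac{6}{5}\,y_5$, so that $T = \frac{6}{5}Y_5$.
$$V_\theta[Y_3] = \frac{\theta^2}{28} \quad \text{since } E_\theta[Y_3^2] = \frac{2\theta^2}{7}$$
$$V_\theta[2Y_3] = \frac{\theta^2}{7}$$
$$E_\theta[Y_5] = \frac{5}{\theta^5}\int_0^\theta y_5^5\,dy_5 = \frac{5}{6}\,\theta, \qquad E_\theta[Y_5^2] = \frac{5\theta^2}{7}$$
$$V_\theta[Y_5] = \frac{5\theta^2}{7} - \frac{25\theta^2}{36} = \frac{5\theta^2}{7 \times 36}$$
$$V_\theta\left[\frac{6}{5}Y_5\right] = \frac{\theta^2}{35}$$
Note that $E_\theta\left[\frac{6}{5}Y_5\right] = \theta$, so $T$ is also unbiased. The efficiency of $\frac{6}{5}Y_5$ relative to $2Y_3$ is
$$\frac{V_\theta\left[\frac{6}{5}Y_5\right]}{V_\theta[2Y_3]} = \frac{\theta^2/35}{\theta^2/7} = \frac{1}{5} < 1$$
Thus $\frac{6}{5}Y_5$ is a more efficient unbiased estimator of $\theta$ than $2Y_3$.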
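The Python sketch below checks Example 2.41 in two ways. The exact part assumes the standard fact that $Y_k/\theta$ has a Beta$(k, n - k + 1)$ distribution for the $k$th order statistic of $n$ observations from $U(0, \theta)$; the Monte Carlo part uses an arbitrary seed, $\theta = 2$ and $10^5$ replications.

```python
import random
import statistics
from fractions import Fraction

def beta_moments(a, b):
    """Mean and variance of a Beta(a, b) random variable, as exact fractions."""
    mean = Fraction(a, a + b)
    var = Fraction(a * b, (a + b) ** 2 * (a + b + 1))
    return mean, var

n = 5
m3, v3 = beta_moments(3, n - 3 + 1)    # Y3/theta ~ Beta(3, 3)
m5, v5 = beta_moments(5, n - 5 + 1)    # Y5/theta ~ Beta(5, 1)
print(2 * m3, Fraction(6, 5) * m5)     # 1 1   (both estimators unbiased for theta)
var_2y3 = 4 * v3                       # V[2*Y3] / theta^2
var_t = Fraction(36, 25) * v5          # V[(6/5)*Y5] / theta^2
print(var_2y3, var_t, var_t / var_2y3) # 1/7 1/35 1/5

# Monte Carlo cross-check (seed and theta are arbitrary choices).
random.seed(1)
theta, reps = 2.0, 100_000
two_y3, t_vals = [], []
for _ in range(reps):
    ys = sorted(random.uniform(0, theta) for _ in range(n))
    two_y3.append(2 * ys[2])           # 2 * Y3
    t_vals.append(1.2 * ys[4])         # (6/5) * Y5
print(statistics.fmean(two_y3), statistics.pvariance(two_y3) / theta ** 2)
print(statistics.fmean(t_vals), statistics.pvariance(t_vals) / theta ** 2)
```

The simulated means should be close to $\theta$ and the scaled variances close to $1/7$ and $1/35$, up to Monte Carlo error.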


Problems
2.1 Give an example for each of the following cases:
(i) Estimator with zero bias.
(ii) Estimator with non-zero bias.
(iii) Consistent estimator with zero bias and
(iv) Consistent estimator with non-zero bias.
2.2 Give a sufficient condition for an estimator to be consistent. Is the sample mean a consistent estimator of the population mean?
2.3 If $X_1, X_2, \dots, X_n$ is a random sample of size $n$ drawn from a population with uniform distribution $U[-2\theta, \theta]$, examine whether $\max_{1 \le i \le n}\{X_i\}$ is consistent for $\theta$.


2.4 What is a consistent estimator? Examine whether a consistent estimator is (i) unique and (ii) unbiased.
2.5 Show that if the bias of an estimator and its variance both approach zero, then the estimator is consistent.
2.6 When would you say that an estimator of a parameter is good? In particular, discuss the requirements of consistency and unbiasedness of an estimator. Give an example to show that a consistent estimator need not be unbiased.
2.7 Let $X_1, X_2, \dots, X_n$ be a random sample of $n$ independent observations drawn from a normal population with mean $\theta$ and variance $\sigma^2$. Obtain unbiased estimators of (i) $\theta$ and (ii) $\sigma^2$.

2.8 Let $T_n$ denote the number of successes in $n$ independent and identical trials of an experiment with probability of success $p$. Obtain an unbiased estimator of $p^2$ of the form $aT_n^2 + bT_n + c$.
2.9 Obtain the unbiased estimator of θ(1 − θ) , where θ is the parameter of the
Binomial distribution.
2.10 Find the unbiased estimator of λ2 in a Poisson population with parameter λ
based on a random sample of size n .
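2.11 Let $X_1, X_2, \dots, X_n$ be an iid random sample of size $n$ drawn from a population with common density
$$p_\theta(x) = \begin{cases} \frac{1}{\theta}\, e^{-\frac{x}{\theta}} & \theta > 0 \text{ and } x > 0 \\ 0 & \text{otherwise} \end{cases}$$
(i) Show that $T_1 = \frac{\sum X_i}{n}$ is an unbiased estimator of $\theta$.
(ii) Let $T_c = c\sum_{i=1}^n X_i$. Show that $E_\theta[T_c - \theta]^2 = \theta^2 E_1[T_c - 1]^2$.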
2.12 Obtain a sufficient statistic, given a sample of size $n$ from a uniform distribution $U(-\theta, \theta)$.
2.13 State two equivalent definitions of a sufficient statistic and prove their equivalence.
2.14 Explain the concept of sufficiency. State the Factorization Theorem for a sufficient statistic and indicate its importance.
2.15 Let $X_1, X_2, \dots, X_n$ be a random sample drawn from a uniform population on the interval $[0, \theta]$. Define $T_1 = \frac{2}{n}\sum_{i=1}^n X_i$ and $T_2 = \frac{n+1}{n}\max_{1 \le i \le n}\{X_i\}$. Evaluate their relative efficiency.
2.16 Let $X_1, X_2, \dots, X_n$ be a random sample drawn from a population $N(\theta, \sigma^2)$. Prove that the sample mean is a more efficient estimator of $\theta$ than the sample median.


2.17 Let $X$ be a single observation from a normal population $N(2\theta, 1)$ and let $Y_1, Y_2, \dots, Y_n$ be a random sample from a normal population $N(\theta, 1)$, independent of $X$. Define $T = 2X + \sum_{k=1}^n Y_k$. Show that $T$ is a sufficient statistic.
2.18 Let $X_1, X_2, \dots, X_n$ be a random sample drawn from a normal population $N(0, \theta)$, $0 < \theta < \infty$. Show that $\sum_{i=1}^n X_i^2$ is a sufficient statistic.
2.19 Let $T_1 = \frac{3}{2}\max\{X_1, X_2\}$ and $T_2 = 2(X_1 + X_2)$ be estimators of $\theta$ based on two independent observations $X_1$ and $X_2$ on a random variable distributed uniformly over $(0, \theta)$. Which one do you prefer and why?
2.20 Let $X_1, X_2, \dots, X_n$ be a random sample drawn from a Poisson population with parameter $\theta$. Show that $\frac{\sum X_i}{n+2}$ is not an unbiased estimator of $\theta$ but is consistent for $\theta$.
2.21 Distinguish between an estimate and an estimator. Given three observations $X_1, X_2$ and $X_3$ on a normal random variable $X$ from $N(\theta, 1)$, a person constructs the following estimators for $\theta$:
$$T_1 = \frac{X_1 + X_2 + X_3}{6}, \quad T_2 = \frac{X_1 + 2X_2 + 3X_3}{7}, \quad T_3 = \frac{X_1 + X_2}{2}.$$
Which one would you choose and why?
2.22 A random sample $X_1, X_2, \dots, X_n$ is drawn on $X$, which takes the values 1 or 0 with respective probabilities $\theta$ and $(1 - \theta)$. Show that $\frac{\sum X_i\left(\sum X_i - 1\right)}{n(n-1)}$ is an unbiased estimator of $\theta^2$.
2.23 Discuss whether an unbiased estimator exists for the parametric function $\tau(\theta) = \theta^2$ of Binomial$(1, \theta)$ based on a sample of size one.
2.24 Obtain a sufficient statistic for the pdf
$$p_\theta(x) = \begin{cases} (1 + \theta)\, x^\theta & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$
based on an independent sample of size $n$.
2.25 $X_1, X_2, X_3$ and $X_4$ constitute a random sample of size four from a Poisson population with parameter $\theta$. Show that $(X_1 + X_2 + X_3 + X_4)$ and $(X_1 + X_2, X_3 + X_4)$ are sufficient statistics. Which would you prefer?
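2.26 A statistic $T_n$ such that $V_\theta[T_n] \to 0\ \forall\ \theta$ is consistent as an estimator of $\theta$ as $n \to \infty$:
(a) if and only if $E_\theta[T_n] \to \theta\ \forall\ \theta$
(b) if, but not only if, $E_\theta[T_n] \to \theta\ \forall\ \theta$
(c) if and only if $E_\theta[T_n] = \theta\ \forall\ \theta$, for every $n$
(d) if and only if $|E_\theta[T_n] - \theta|$ and $V_\theta[T_n] \to 0\ \forall\ \theta$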


2.27 The sequence $\{X_n\}$ of random variables is said to converge to $X$ in probability if, as $n \to \infty$:
(a) $P\{|X_n - X| > \epsilon\} \to 0$ for some $\epsilon > 0$
(b) $P\{|X_n - X| > \epsilon\} \to 0$ for some $\epsilon < 0$
(c) $P\{|X_n - X| > \epsilon\} \to 0$ for every $\epsilon > 0$
(d) $P\{|X_n - X| < \epsilon\} \to 0$ for every $\epsilon > 0$

2.28 $X_1, X_2, \dots, X_n$ are iid Bernoulli random variables with $E_\theta[X_i] = \theta$ and $S_n = \sum_{i=1}^n X_i$. Then, for a sequence of non-negative numbers $\{k_n\}$, $T_n = \frac{S_n + k_n}{n + k_n}$ is a consistent estimator of $\theta$:
(a) if $\frac{k_n}{n} \to 0$ as $n \to \infty$
(b) if and only if $k_n = 0\ \forall\ n$
(c) if and only if $k_n$ is bounded as $n \to \infty$
(d) whatever $\{k_n\}$ is
2.29 In tossing a coin the P {Head} = p2 . It is tossed n times to estimate the value
of p2 . X denotes the number of heads. One may use to estimate the unbiased
estimator
2 is 2
(a) X n (b) Xn (c) Xn (d) nX2
2.30 Which of the following statement is not correct for a consistent estimator?
1. If there exists one consistent estimator, then an infinite number of consistent
statistics may be constructed.
2. Unbiased estimators are always consistent.
3. A consistent estimator with finite mean value must tend to be unbiased in
large samples.
Select the correct answer given below:
(a) 1 (b) 2 (c) 1 and 3 (d) 1, 2 and 3
2.31 Consider the following type of population :
1. Normal 2 . Cauchy 3 . Poisson
Sample mean is the best estimator of population mean in case of
(a) 1 and 3 (b) 1 and 2 (c) 2 and 3 (d) 1 , 2 and 3


3. COMPLETE FAMILY OF DISTRIBUTIONS

3.1 Introduction
Suppose a given family of probability distributions admits a non-trivial sufficient statistic. One then seeks the sufficient statistic that gives the greatest reduction of the data, and such a reduction can be achieved through a complete sufficient statistic. The existence of the mathematical expectation $E_\theta[g(X)]$, $\theta \in \Omega$, implies that the integral (or sum) involved in $E_\theta[g(X)]$ converges absolutely. This absolute convergence is tacitly assumed in the definition of completeness.

3.2 Completeness
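Definition 3.1 A family of distributions $\{P_\theta, \theta \in \Omega\}$ is said to be complete if
$$E_\theta[g(X)] = 0\ \forall\ \theta \in \Omega \Rightarrow g(x) = 0\ \forall\ x$$
(more precisely, $g(X) = 0$ with probability one under every $P_\theta$, $\theta \in \Omega$).
Definition 3.2 A statistic $T = t(X)$ is said to be a complete statistic if the family of distributions of the statistic $T = t(X)$ is complete, i.e.,
$$E_\theta[g(T)] = 0\ \forall\ \theta \in \Omega \Rightarrow g(t) = 0\ \forall\ t \in \mathbb{R}.$$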

Example 3.1 Let $X$ be one observation from the pmf
$$p_\theta(x) = \begin{cases} \frac{1}{2}\,\theta^{|x|}(1 - \theta)^{|x|} & x = -1, 1 \\ 1 - \theta(1 - \theta) & x = 0 \\ 0 & \text{otherwise} \end{cases}$$
Show that the family is not complete but the family of distributions of $Y = |X|$ is complete.

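Consider $E_\theta[g(X)] = 0$:
$$\sum_{x=-1}^{1} g(x)\,p_\theta(x) = 0$$
$$g(-1)\,\frac{1}{2}\theta(1 - \theta) + g(0)[1 - \theta(1 - \theta)] + g(1)\,\frac{1}{2}\theta(1 - \theta) = 0$$
$$[g(-1) + g(1) - 2g(0)]\,\theta(1 - \theta) + 2g(0) = 0$$
Since $\theta(1 - \theta) = \theta - \theta^2$, equating the coefficients of the powers of $\theta$ on both sides gives
$$g(0) = 0 \text{ and } g(-1) + g(1) - 2g(0) = 0 \Rightarrow g(-1) + g(1) = 0, \text{ i.e., } g(-1) = -g(1).$$
Any $g$ with $g(0) = 0$ and $g(-1) = -g(1) \ne 0$, for example $g(x) = x$, satisfies $E_\theta[g(X)] = 0$ for every $\theta$, although $g(x) \ne 0$ for $x = -1, 1$. Thus the family is not complete.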


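Let $Y = |X|$ and find the pmf of $Y$:
$$p_\theta(y) = \begin{cases} 1 - \theta(1 - \theta) & y = 0 \\ \theta(1 - \theta) & y = 1 \\ 0 & \text{otherwise} \end{cases}$$
Consider $E_\theta[g(Y)] = 0$:
$$\sum_{y=0}^{1} g(y)\,[\theta(1 - \theta)]^y\,[1 - \theta(1 - \theta)]^{1-y} = 0$$
$$\sum_{y=0}^{1} g(y)\,\rho^y = 0 \quad \text{where } \rho = \frac{\theta(1 - \theta)}{1 - \theta(1 - \theta)}$$
$$g(0) + g(1)\,\rho = 0$$
As $\theta$ varies over $(0, 1)$, $\rho$ takes every value in $(0, \frac{1}{3}]$, so this linear function of $\rho$ vanishes identically only if $g(0) = 0$ and $g(1) = 0$, i.e., $g(y) = 0\ \forall\ y = 0, 1$. Thus the family of distributions of $Y = |X|$ is complete.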
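The check below is a small exact-arithmetic sketch in Python (the grid of $\theta$ values is an arbitrary choice). For $X$ it confirms that $g(x) = x$ has zero expectation at every $\theta$ tried. For $Y = |X|$ it shows that the two equations $E_\theta[g(Y)] = 0$ at $\theta = 1/10$ and $\theta = 1/2$ already have a non-singular coefficient matrix in the unknowns $g(0), g(1)$, so both must vanish. Two values with the same $\theta(1 - \theta)$, such as $\theta$ and $1 - \theta$, would not do, since they give the same distribution.

```python
from fractions import Fraction

def pmf_x(theta):
    """pmf of X in Example 3.1, returned as {x: P_theta(X = x)}."""
    q = theta * (1 - theta)
    return {-1: q / 2, 0: 1 - q, 1: q / 2}

def expectation(g, pmf):
    return sum(g(x) * p for x, p in pmf.items())

thetas = [Fraction(k, 10) for k in range(1, 10)]

# g(x) = x is not the zero function, yet E_theta[g(X)] = 0 for every theta tried.
print(all(expectation(lambda x: x, pmf_x(t)) == 0 for t in thetas))   # True

def det_y(theta_a, theta_b):
    """Determinant of the 2x2 system E_theta[g(Y)] = 0 at two theta values,
    with unknowns g(0), g(1); it equals q_b - q_a where q = theta(1 - theta)."""
    qa, qb = theta_a * (1 - theta_a), theta_b * (1 - theta_b)
    return (1 - qa) * qb - qa * (1 - qb)

print(det_y(Fraction(1, 10), Fraction(1, 2)))   # 4/25, non-zero, so g(0) = g(1) = 0
```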


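Example 3.2 Let $X_1, X_2, \dots, X_n$ be an iid random sample drawn from a Bernoulli population $b(1, \theta)$, $0 < \theta < 1$. Prove that $T = \sum_{i=1}^n X_i$ is a complete statistic.
Given $T = \sum_{i=1}^n X_i \sim b(n, \theta)$,
$$E_\theta[g(T)] = 0$$
$$\sum_{t=0}^n g(t)\binom{n}{t}\theta^t(1 - \theta)^{n-t} = 0$$
$$\sum_{t=0}^n g(t)\binom{n}{t}\left(\frac{\theta}{1 - \theta}\right)^t(1 - \theta)^n = 0$$
Here $(1 - \theta)^n \ne 0$, so
$$\sum_{t=0}^n g(t)\binom{n}{t}\rho^t = 0 \quad \text{where } \rho = \frac{\theta}{1 - \theta}$$
$$g(0)\binom{n}{0} + g(1)\binom{n}{1}\rho + \dots + g(n)\binom{n}{n}\rho^n = 0$$
This polynomial in $\rho$ vanishes for every $\rho \in (0, \infty)$, so comparing the coefficients of $\rho^t$ on both sides gives
$$g(0)\binom{n}{0} = 0 \ (\text{coefficient of } \rho^0) \Rightarrow g(0) = 0$$
$$g(1)\binom{n}{1} = 0 \ (\text{coefficient of } \rho^1) \Rightarrow g(1) = 0$$
$$\dots$$
$$g(n)\binom{n}{n} = 0 \ (\text{coefficient of } \rho^n) \Rightarrow g(n) = 0$$
Thus $g(t) = 0\ \forall\ t = 0, 1, 2, \dots, n$. Hence $T = \sum_{i=1}^n X_i$ is a complete statistic.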
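The coefficient-comparison argument can also be seen through linear algebra. In the Python sketch below (exact fractions; $n = 4$ and the $\theta$ grid are arbitrary choices), the equations $E_\theta[g(T)] = 0$ at $n + 1$ distinct values of $\theta$ form a square linear system in $g(0), \dots, g(n)$. Its matrix is a Vandermonde matrix in $\rho = \theta/(1 - \theta)$ scaled by the non-zero factors $\binom{n}{t}$ and $(1 - \theta)^n$, so it is non-singular and the only solution is $g \equiv 0$. The helper names are illustrative.

```python
from fractions import Fraction
from math import comb

def binomial_rows(n, thetas):
    """Row j lists P_theta_j{T = t} for t = 0, ..., n, where T ~ b(n, theta_j)."""
    return [[comb(n, t) * th ** t * (1 - th) ** (n - t) for t in range(n + 1)]
            for th in thetas]

def det(matrix):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in matrix]
    size, result = len(m), Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:                      # a row swap flips the sign
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return result

n = 4
thetas = [Fraction(k, n + 2) for k in range(1, n + 2)]   # n + 1 distinct values in (0, 1)
print(det(binomial_rows(n, thetas)) != 0)                # True
```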
Example 3.3 Let $X_1, X_2, \dots, X_n$ be an iid random sample drawn from a Poisson population with parameter $\lambda > 0$. Show that $T = \sum_{i=1}^n X_i$ is a complete statistic.
$$T = \sum_{i=1}^n X_i \sim P(n\lambda), \quad \text{i.e., } p_\lambda(t) = e^{-n\lambda}\frac{(n\lambda)^t}{t!}, \quad t = 0, 1, 2, \dots$$
$$E_\lambda[g(T)] = 0$$
$$\sum_{t=0}^\infty g(t)\,e^{-n\lambda}\frac{(n\lambda)^t}{t!} = 0$$
$$\sum_{t=0}^\infty g(t)\,\frac{(n\lambda)^t}{t!} = 0 \quad \text{since } e^{-n\lambda} \ne 0$$
$$g(0) + g(1)\frac{n\lambda}{1!} + \dots + g(n)\frac{(n\lambda)^n}{n!} + \dots = 0$$
This power series in $\lambda$ vanishes for every $\lambda > 0$, so comparing the coefficients of $\lambda^t$ on both sides,
$$g(0) = 0 \ (\text{coefficient of } \lambda^0)$$
$$n\,g(1) = 0 \ (\text{coefficient of } \lambda^1) \Rightarrow g(1) = 0$$
$$\dots$$
Thus $g(t) = 0\ \forall\ t = 0, 1, 2, \dots$. Hence $T = \sum_{i=1}^n X_i$ is a complete statistic.
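Example 3.4 Let $X \sim U(0, \theta)$, $\theta > 0$. Show that the family of distributions is complete.
Consider $E_\theta[g(X)] = 0$:
$$\int_0^\theta g(x)\frac{1}{\theta}\,dx = 0 \Rightarrow \int_0^\theta g(x)\,dx = 0 \quad \forall\ \theta > 0$$
One can differentiate the above integral with respect to $\theta$ on both sides:
$$\int_0^\theta 0\,dx + g(\theta) \times 1 - g(0) \times 0 = 0$$
since, by the Leibniz rule,
$$\frac{d}{d\theta}\int_{a(\theta)}^{b(\theta)} p_\theta(x)\,dx = \int_{a(\theta)}^{b(\theta)}\frac{\partial p_\theta(x)}{\partial\theta}\,dx + p_\theta[b(\theta)]\frac{db(\theta)}{d\theta} - p_\theta[a(\theta)]\frac{da(\theta)}{d\theta}.$$
Hence $g(\theta) = 0\ \forall\ \theta > 0$, i.e., $g(x) = 0\ \forall\ 0 < x < \theta$, $\theta > 0$. Thus the family of distributions is complete.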


Example 3.5 Let $X \sim N(0, \theta)$. Prove that the family of pdf's $\{N(0, \theta), \theta > 0\}$ is not complete.
Take $g(x) = x$ and consider $E_\theta[g(X)] = E_\theta[X]$ for $\theta > 0$:
$$\int_{-\infty}^\infty x\,\frac{1}{\sqrt{2\pi}\sqrt{\theta}}\,e^{-\frac{1}{2\theta}x^2}\,dx = 0 \quad \forall\ \theta > 0,$$
since the integrand is an odd function of $x$. Thus $E_\theta[g(X)] = 0$ for every $\theta > 0$, but $g(x) = x \ne 0$ for every $x \ne 0$. Hence the family $X \sim N(0, \theta)$, $\theta > 0$ is not complete.
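Example 3.6 Let $X \sim N(0, \theta)$, $\theta > 0$. Prove that $T = X^2$ is a complete statistic.
Let $T = (X - 0)^2$; then $\frac{T}{\theta} = \frac{X^2}{\theta} \sim \chi^2$ distribution with one degree of freedom, i.e., $\frac{T}{\theta}$ has the pdf of $G(\frac{1}{2}, \frac{1}{2})$. Hence
$$p_\theta(t) = \begin{cases} \frac{1}{2^{1/2}\,\Gamma\frac{1}{2}}\,e^{-\frac{t}{2\theta}}\left(\frac{t}{\theta}\right)^{\frac{1}{2} - 1}\frac{1}{\theta} & 0 < t < \infty \\ 0 & \text{otherwise} \end{cases}
= \begin{cases} \frac{1}{\sqrt{2\pi\theta}}\,e^{-\frac{t}{2\theta}}\,t^{\frac{1}{2} - 1} & 0 < t < \infty \\ 0 & \text{otherwise} \end{cases}$$
$$E_\theta[g(T)] = 0 \Rightarrow \int_0^\infty g(t)\frac{1}{\sqrt{2\pi\theta}}\,e^{-\frac{t}{2\theta}}\,t^{\frac{1}{2} - 1}\,dt = 0$$
$$\int_0^\infty e^{-\frac{1}{2\theta}t}\left[g(t)\,t^{-\frac{1}{2}}\right]dt = 0 \quad \forall\ \theta > 0$$
This is the Laplace Transform $\int_0^\infty e^{-st}f(t)\,dt$ of $f(t) = g(t)\,t^{-\frac{1}{2}}$ evaluated at $s = \frac{1}{2\theta}$, and it vanishes for every $s > 0$. Using the uniqueness property of the Laplace Transform,
$$g(t)\,t^{-\frac{1}{2}} = 0\ \forall\ t > 0,$$
i.e., $g(t) = 0\ \forall\ t > 0$. Thus $T = X^2$ is a complete statistic.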
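Example 3.7 Examine whether the family of distributions
$$p_\theta(x) = \begin{cases} 2\theta & \text{if } 0 < x < \frac{1}{2},\ 0 < \theta < 1 \\ 2(1 - \theta) & \text{if } \frac{1}{2} \le x < 1 \end{cases}$$
is complete.
Consider $E_\theta[g(X)] = 0$:
$$\int_0^{\frac{1}{2}} g(x)\,2\theta\,dx + \int_{\frac{1}{2}}^1 g(x)\,2(1 - \theta)\,dx = 0$$
$$2\theta\int_0^{\frac{1}{2}} g(x)\,dx + 2(1 - \theta)\int_{\frac{1}{2}}^1 g(x)\,dx = 0$$
$$\theta\int_0^{\frac{1}{2}} g(x)\,dx - \theta\int_{\frac{1}{2}}^1 g(x)\,dx + \int_{\frac{1}{2}}^1 g(x)\,dx = 0$$
$$\theta\left[\int_0^{\frac{1}{2}} g(x)\,dx - \int_{\frac{1}{2}}^1 g(x)\,dx\right] + \int_{\frac{1}{2}}^1 g(x)\,dx = 0$$
Since this holds for every $\theta \in (0, 1)$,
$$\int_0^{\frac{1}{2}} g(x)\,dx - \int_{\frac{1}{2}}^1 g(x)\,dx = 0 \quad \text{and} \quad \int_{\frac{1}{2}}^1 g(x)\,dx = 0,$$
so that
$$\int_0^{\frac{1}{2}} g(x)\,dx = \int_{\frac{1}{2}}^1 g(x)\,dx = 0.$$
These conditions do not force $g(x) = 0$ for all $x$; a function with $g(x) \ne 0$ for some $x$ can satisfy them. Thus the family of distributions is not complete. For example, choose
$$g(x) = \begin{cases} +1 & \text{if } 0 < x < \frac{1}{4} \\ -1 & \text{if } \frac{1}{4} \le x < \frac{1}{2} \\ +1 & \text{if } \frac{1}{2} \le x < \frac{3}{4} \\ -1 & \text{if } \frac{3}{4} \le x < 1 \end{cases}$$
Then
$$E_\theta[g(X)] = \int_0^{\frac{1}{4}}(+1)\,2\theta\,dx + \int_{\frac{1}{4}}^{\frac{1}{2}}(-1)\,2\theta\,dx + \int_{\frac{1}{2}}^{\frac{3}{4}}(+1)\,2(1 - \theta)\,dx + \int_{\frac{3}{4}}^1(-1)\,2(1 - \theta)\,dx$$
$$= 2\theta\,\frac{1}{4} - 2\theta\,\frac{1}{4} + 2(1 - \theta)\,\frac{1}{4} - 2(1 - \theta)\,\frac{1}{4} = 0,$$
but $g(x) \ne 0$ for every $x \in (0, 1)$.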
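A short exact check of this counterexample, sketched in Python (the names are illustrative): since both $g$ and $p_\theta$ are constant on each quarter of $(0, 1)$, $E_\theta[g(X)]$ is a finite sum, and it comes out exactly zero for every $\theta$ on the grid tried, even though $g$ is never zero.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# g from the example: +1, -1, +1, -1 on the four quarters of (0, 1).
PIECES = [(Fraction(0), Fraction(1, 4), 1), (Fraction(1, 4), HALF, -1),
          (HALF, Fraction(3, 4), 1), (Fraction(3, 4), Fraction(1), -1)]

def density(x, theta):
    """p_theta(x) = 2*theta on (0, 1/2) and 2*(1 - theta) on [1/2, 1)."""
    return 2 * theta if x < HALF else 2 * (1 - theta)

def expected_g(theta):
    """E_theta[g(X)]: g and p_theta are constant on each quarter, so each
    piece contributes g * p_theta * (length of the quarter)."""
    return sum(g * density(lo, theta) * (hi - lo) for lo, hi, g in PIECES)

for k in range(1, 10):
    theta = Fraction(k, 10)
    print(theta, expected_g(theta))   # the expectation is 0 for every theta
```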

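Theorem 3.1 Let $\{P_\theta, \theta \in \Omega\}$ be a single parameter exponential family of distributions with pdf
$$p_\theta(x) = \begin{cases} c(\theta)\,e^{Q(\theta)t(x)}\,h(x) & \text{if } a < x < b \\ 0 & \text{otherwise} \end{cases}$$
where $a$ and $b$ are independent of $\theta$, and assume that the range of $Q(\theta)$ contains an open interval. Then the family of distributions of $T = t(X)$ is complete.
Proof: Assume $p_\theta(x)$, $\theta \in \Omega$ is a pmf. Choose the parametrization $Q(\theta) = \theta$, write $t(x) = t$, and write $P_\theta\{T = t\} = c(\theta)\,e^{\theta t + s(t)}$, where $e^{s(t)}$ is the sum of $h(x)$ over the points $x$ with $t(x) = t$. Then
$$E_\theta[g(T)] = \sum_t g(t)\,P_\theta\{T = t\} = c(\theta)\sum_t g(t)\,e^{\theta t + s(t)} = 0,$$
and since $c(\theta) \ne 0$,
$$\sum_t g(t)\,e^{\theta t + s(t)} = 0 \quad \forall\ \theta \in \Omega.$$
Define $g^+(t) = \max[g(t), 0]$ and $g^-(t) = -\min[g(t), 0]$; then $g(t) = g^+(t) - g^-(t)$ and both $g^+(t)$ and $g^-(t)$ are non-negative functions. Hence
$$\sum_t [g^+(t) - g^-(t)]\,e^{\theta t + s(t)} = 0 \quad \forall\ \theta \in \Omega$$
$$\sum_t g^+(t)\,e^{\theta t + s(t)} = \sum_t g^-(t)\,e^{\theta t + s(t)} \quad \forall\ \theta \in \Omega.$$
Fix a point $\theta_0$ in the interior of $\Omega$. If both sides are zero at $\theta_0$, then $g^+ = g^- = 0$ and there is nothing more to prove. Otherwise divide $g^+(t)\,e^{\theta_0 t + s(t)}$ by the constant $\sum_t g^+(t)\,e^{\theta_0 t + s(t)}$ and denote it by
$$p^+(t) = \frac{g^+(t)\,e^{\theta_0 t + s(t)}}{\sum_t g^+(t)\,e^{\theta_0 t + s(t)}}.$$
Again divide $g^-(t)\,e^{\theta_0 t + s(t)}$ by the constant $\sum_t g^-(t)\,e^{\theta_0 t + s(t)}$, which equals the previous one, and denote it by
$$p^-(t) = \frac{g^-(t)\,e^{\theta_0 t + s(t)}}{\sum_t g^-(t)\,e^{\theta_0 t + s(t)}}.$$
Now $p^+(t)$ and $p^-(t)$ are both pmf's and, writing $\theta = \theta_0 + \delta$,
$$\sum_t p^+(t)\,e^{\delta t} = \sum_t p^-(t)\,e^{\delta t}$$
for all $\delta$ in a neighbourhood of zero. By the uniqueness property of moment generating functions, $p^+(t) = p^-(t)\ \forall\ t$. Hence $g^+(t) = g^-(t)\ \forall\ t \Rightarrow g(t) = 0\ \forall\ t$. Thus the family of distributions is complete.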
Example 3.8 Let $X_1, X_2, \dots, X_n$ be a random sample drawn from a population $N(\theta, \theta^2)$. Define $T = \left(\sum_{i=1}^n X_i, \sum_{i=1}^n X_i^2\right)$. Prove that $T$ is not complete.
 2
n n
X X 2
Define g(X) = 2 − (n + 1)
Xi  Xi , n = 2, 3, · · ·
i=1 i=1
 2  
n n
X X 2
Eθ [g(X)] = 2Eθ  Xi  − (n + 1)Eθ  Xi 
i=1 i=1
 2
n
X 2 2 θ2
 Xi  = n X̄ and X̄ ∼ N (θ, )
i=1 n

2
Z ∞
2 n − n (x̄−θ)2
Eθ [X̄ ] = x̄ √ e 2θ 2 dx̄
−∞ 2πθ
x̄ − θ √ θ θ
If z = n, then x̄ − θ = z√ and dx̄ = √ dz
θ n n
√ 2
θ 2 n −z θ
Z ∞
. 2 2 √ dz
. . Eθ [X̄ ] = (θ + z √ ) √ e
−∞ n 2πθ n

z2
 
2
Z ∞ z2 2z 1 −
= θ 1 + + √  √ e 2 dz
−∞ n n 2π
2
 
2 1 Z ∞ 2 1 −z
= θ 1 + z √ e 2 dz  + 0
n −∞ 2π
1 1 1 −1
One can take z 2 = t, then z = t 2 and dz = t2 dt
2
" #
2 − t 1 1 −1
Z ∞
2 2
i.e., Eθ [X̄ ] = θ √1+ te 2 t2 dt
2π 0 n 2
" #
1 − t 3 −1
Z ∞
2
= θ 1+ √ e 2 t2 dt
n 2π 0
 

2 1 Γ3 
= θ 

1 + 2 

n 2π 1 3
 
( )2
2
 √ √ 
2 1 1 π2 2
= θ 1 + 2
√ 
n 2π

2
 1 n+1 2
= θ 1+ = θ
n n
 2
n
. X 2 2 2 2 2n+1
. . Eθ  Xi  = Eθ [nX̄] = n Eθ [X̄] = n θ
i=1 n
2
= n(n + 1)θ
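Consider
$$\sum_{i=1}^n X_i^2 = \sum_{i=1}^n (X_i - \theta + \theta)^2 = \sum_{i=1}^n (X_i - \theta)^2 + n\theta^2 + 2\theta\sum_{i=1}^n (X_i - \theta) = \sum_{i=1}^n (X_i - \theta)^2 + 2\theta n\bar{X} - n\theta^2$$
$$E_\theta\left[\sum_{i=1}^n X_i^2\right] = E_\theta[ns^2] + 2\theta n\,E_\theta[\bar{X}] - n\theta^2 = E_\theta[ns^2] + 2n\theta^2 - n\theta^2 = E_\theta[ns^2] + n\theta^2$$
where $s^2 = \frac{1}{n}\sum (X_i - \theta)^2$.
Let $Y = \frac{ns^2}{\sigma^2} = \frac{\sum_{i=1}^n (X_i - \theta)^2}{\theta^2} \sim \chi^2$ distribution with $n$ degrees of freedom, where $\sigma^2 = \theta^2$. $Y$ has the pdf $G\left(\frac{n}{2}, \frac{1}{2}\right)$:
$$p(y) = \begin{cases} \frac{1}{2^{\frac{n}{2}}\,\Gamma\frac{n}{2}}\,e^{-\frac{1}{2}y}\,y^{\frac{n}{2} - 1} & 0 < y < \infty \\ 0 & \text{otherwise} \end{cases}$$
$$E[Y] = \frac{1}{2^{\frac{n}{2}}\,\Gamma\frac{n}{2}}\int_0^\infty e^{-\frac{1}{2}y}\,y^{\frac{n}{2} + 1 - 1}\,dy = n$$
i.e., $E_\theta\left[\frac{ns^2}{\sigma^2}\right] = n$, so $E_\theta[s^2] = \theta^2$ since $\sigma^2 = \theta^2$. Hence
$$E_\theta\left[\sum_{i=1}^n X_i^2\right] = n\theta^2 + n\theta^2 = 2n\theta^2.$$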
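Therefore
$$E_\theta[g(X)] = 2E_\theta\left[\sum_{i=1}^n X_i\right]^2 - (n + 1)\,E_\theta\left[\sum_{i=1}^n X_i^2\right] = 2n(n + 1)\theta^2 - (n + 1)\,2n\theta^2 = 0 \quad \forall\ \theta,$$
but $g(x)$ is not zero for all $x$:
$$g(x) = 2\left(\sum_{i=1}^n x_i\right)^2 - (n + 1)\sum_{i=1}^n x_i^2 \ne 0, \quad \text{i.e., } 2\left(\sum_{i=1}^n x_i\right)^2 \ne (n + 1)\sum_{i=1}^n x_i^2$$
for some $x$, $n = 2, 3, \dots$ (for instance, $n = 2$ and $x = (1, 0)$ give $g(x) = 2 - 3 = -1$). Thus the statistic $\left(\sum_{i=1}^n X_i, \sum_{i=1}^n X_i^2\right)$ is not complete.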
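A Monte Carlo sketch in Python supports this conclusion ($\theta = 2$, $n = 3$, the seed and the replication count are arbitrary choices). The simulated mean of $g(X)$ is close to zero, as the calculation above predicts, while $g$ itself is clearly not the zero function.

```python
import random
import statistics

def g_stat(sample):
    """g(x) = 2 (sum x_i)^2 - (n + 1) sum x_i^2 from Example 3.8."""
    n = len(sample)
    s1 = sum(sample)
    s2 = sum(v * v for v in sample)
    return 2 * s1 * s1 - (n + 1) * s2

random.seed(2024)
theta, n, reps = 2.0, 3, 100_000
# N(theta, theta^2): random.gauss takes the standard deviation, here |theta|.
values = [g_stat([random.gauss(theta, abs(theta)) for _ in range(n)])
          for _ in range(reps)]
print(statistics.fmean(values))   # near 0; the exact expectation is 0
print(g_stat([1.0, 0.0]))         # -1.0, so g is not the zero function
```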


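Example 3.9 Show that the family of distributions given by the pdf's
$$p_\theta(x) = \begin{cases} \theta & \text{if } 0 < x < \theta \\ (1 + \theta) & \text{if } \theta \le x < 1,\ 0 < \theta < 1 \\ 0 & \text{otherwise} \end{cases}$$
is complete.
Consider $E_\theta[g(X)] = 0$:
$$\theta\int_0^\theta g(x)\,dx + (1 + \theta)\int_\theta^1 g(x)\,dx = 0 \quad \forall\ 0 < \theta < 1.$$
Write $G(\theta) = \int_0^\theta g(x)\,dx$ and $C = \int_0^1 g(x)\,dx$, so that $\int_\theta^1 g(x)\,dx = C - G(\theta)$. Then
$$\theta\,G(\theta) + (1 + \theta)[C - G(\theta)] = 0 \Rightarrow G(\theta) = (1 + \theta)\,C \quad \forall\ 0 < \theta < 1.$$
Letting $\theta \to 0$, $G(\theta) \to 0$, so $C = 0$ and hence
$$\int_0^\theta g(x)\,dx = 0 \quad \text{and} \quad \int_\theta^1 g(x)\,dx = 0 \quad \forall\ 0 < \theta < 1.$$
One can differentiate the above integrals with respect to $\theta$:
$$\int_0^\theta 0\,dx + g(\theta) \times 1 - g(0) \times 0 = 0 \quad \text{and} \quad \int_\theta^1 0\,dx + g(1) \times 0 - g(\theta) \times 1 = 0$$
$$g(\theta) = 0 \text{ and } -g(\theta) = 0 \quad \forall\ 0 < \theta < 1,$$
i.e., $g(x) = 0\ \forall\ 0 < x < 1$. Thus the family of distributions is complete.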
Definition 3.3 A statistic $T = t(X)$ is said to be a bounded complete statistic if, for every function $g$ with $|g(t)| \le M$ for some $M \in \mathbb{R}$, $E_\theta[g(T)] = 0\ \forall\ \theta \in \Omega \Rightarrow g(t) = 0\ \forall\ t \in \mathbb{R}$.
Example 3.10 Show that completeness implies bounded completeness, but bounded completeness does not imply completeness.
Proof: Assume $T = t(X)$ is a complete statistic, that is, $E_\theta[g(T)] = 0\ \forall\ \theta \Rightarrow g(t) = 0\ \forall\ t \in \mathbb{R}$ for every function $g$ for which the expectation exists. Every bounded function $g$, $|g(t)| \le M$, has a finite expectation and is one of these functions, so in particular $E_\theta[g(T)] = 0\ \forall\ \theta \Rightarrow g(t) = 0\ \forall\ t \in \mathbb{R}$ for every bounded $g$, i.e., $T = t(X)$ is a bounded complete statistic.
Conversely, a bounded complete statistic need not be complete. Consider the family of distributions
$$p_\theta(x) = \begin{cases} \theta & x = -1 \\ (1 - \theta)^2\,\theta^x & x = 0, 1, 2, \dots \\ 0 & \text{otherwise} \end{cases}$$
with $0 < \theta < 1$. Consider any function $g$ with $E_\theta[g(X)] = 0$ for all $\theta$:
$$\sum_{x=-1}^\infty g(x)\,p_\theta(x) = 0$$
$$g(-1)\,\theta + \sum_{x=0}^\infty g(x)(1 - \theta)^2\theta^x = 0$$
$$\sum_{x=0}^\infty g(x)\,\theta^x = \frac{-g(-1)\,\theta}{(1 - \theta)^2} = -g(-1)\,\theta(1 - \theta)^{-2}$$
$$= -g(-1)\,\theta[1 + 2\theta + 3\theta^2 + \dots + n\theta^{n-1} + (n + 1)\theta^n + \dots]$$
$$= -g(-1)[\theta + 2\theta^2 + 3\theta^3 + \dots] = -g(-1)\sum_{x=0}^\infty x\,\theta^x.$$
Comparing the coefficients of $\theta^x$ on both sides,
$$g(x) = -g(-1)\,x = cx, \quad x = 0, 1, 2, \dots, \quad \text{where } c = -g(-1) \text{ and } c \in \mathbb{R},$$
and this also holds at $x = -1$, since $g(-1) = -c = c \times (-1)$. Thus every unbiased estimator of zero has the form $g(x) = cx$.
The family is bounded complete: if $g$ is bounded, $|g(x)| \le M$ for all $x$, then $|c|\,x \le M$ for every $x = 0, 1, 2, \dots$, which forces $c = 0$, i.e., $g(x) = 0\ \forall\ x = -1, 0, 1, 2, \dots$.
But the family is not complete: the unbounded function $g(x) = x$ has
$$E_\theta[X] = -\theta + (1 - \theta)^2\sum_{x=0}^\infty x\,\theta^x = -\theta + (1 - \theta)^2\,\frac{\theta}{(1 - \theta)^2} = 0 \quad \forall\ 0 < \theta < 1,$$
although $g(x) = x \ne 0$ for $x \ne 0$. Thus the family of distributions is bounded complete but not complete.
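The Python sketch below illustrates both halves numerically, with the infinite series truncated after a fixed number of terms (an approximation; the truncation length and the clipping bounds are arbitrary choices). The unbounded $g(x) = x$ has expectation zero up to rounding error for every $\theta$ tried, while a bounded function that agrees with $x$ for $x \le 5$ does not, in line with the fact that only $g(x) = cx$ has zero expectation.

```python
def expectation(g, theta, terms=5000):
    """E_theta[g(X)] for P(X = -1) = theta and P(X = x) = (1 - theta)^2 theta^x,
    x = 0, 1, 2, ...; the infinite series is truncated after `terms` terms."""
    tail = sum(g(x) * (1 - theta) ** 2 * theta ** x for x in range(terms))
    return g(-1) * theta + tail

def identity(x):
    """Unbounded: E_theta[X] = 0 for every theta."""
    return x

def clipped(x):
    """Bounded (values in [-1, 5]); agrees with x for x <= 5."""
    return max(-1, min(x, 5))

for theta in (0.1, 0.5, 0.9):
    print(theta, expectation(identity, theta), expectation(clipped, theta))
```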
Example 3.11 Examine whether the family of distributions given by $P_\theta\{X = -1\} = \theta^2$, $P_\theta\{X = 0\} = 1 - \theta$ and $P_\theta\{X = 1\} = \theta(1 - \theta)$, $0 < \theta < 1$, is complete.
Consider $E_\theta[g(X)] = 0$:
$$g(-1)\,\theta^2 + g(0)(1 - \theta) + g(1)\,\theta(1 - \theta) = 0$$
$$\theta^2[g(-1) - g(1)] + \theta[g(1) - g(0)] + g(0) = 0$$
$$g(-1) - g(1) = 0 \ (\text{coefficient of } \theta^2) \Rightarrow g(-1) = g(1)$$
$$g(1) - g(0) = 0 \ (\text{coefficient of } \theta) \Rightarrow g(1) = g(0)$$
$$g(0) = 0 \ (\text{coefficient of } \theta^0)$$
Hence $g(-1) = g(1) = g(0) = 0$. Thus $g(x) = 0$ for $x = -1, 0$ and $1$. $\therefore$ The family of distributions is complete.
Example 3.12 $X$ has the following distribution:
$$\begin{array}{c|ccc} X = x & 0 & 1 & 2 \\ \hline P_\theta\{X = x\} & 1 - \theta - \theta^2 & \theta & \theta^2 \end{array}$$
Prove that the family of distributions is complete.
Consider $E_\theta[g(X)] = 0$:
$$g(0)[1 - \theta - \theta^2] + g(1)\,\theta + g(2)\,\theta^2 = 0$$
$$\theta^2[g(2) - g(0)] + \theta[g(1) - g(0)] + g(0) = 0$$
$$g(2) - g(0) = 0 \ (\text{coefficient of } \theta^2), \quad g(1) - g(0) = 0 \ (\text{coefficient of } \theta), \quad g(0) = 0 \ (\text{coefficient of } \theta^0)$$
Hence $g(0) = g(1) = g(2) = 0$, i.e., $g(x) = 0$ for $x = 0, 1$ and $2$. Thus the family of distributions is complete.
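The coefficient comparisons in Examples 3.11 and 3.12 amount to a linear system in the unknown values of $g$. The Python sketch below (an illustration, not the book's method; the helper names are hypothetical) writes each probability as a polynomial in $\theta$, collects the coefficients into a $3 \times 3$ matrix, and checks that its determinant is non-zero, so that the only solution is $g \equiv 0$. It relies on the fact that a polynomial vanishing on an interval of $\theta$ values has all its coefficients equal to zero.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (p, q, r) = m
    return a * (e * r - f * q) - b * (d * r - f * p) + c * (d * q - e * p)

def coefficient_matrix(pmf):
    """pmf maps each support point x to [c0, c1, c2], meaning
    P_theta{X = x} = c0 + c1*theta + c2*theta**2. Row k of the result holds the
    theta**k coefficients, one column per support point, so that
    E_theta[g(X)] = 0 for all theta  <=>  matrix times (g(x) for x) = 0."""
    xs = sorted(pmf)
    return [[pmf[x][k] for x in xs] for k in range(3)]

example_3_11 = {-1: [0, 0, 1], 0: [1, -1, 0], 1: [0, 1, -1]}   # theta^2, 1-theta, theta(1-theta)
example_3_12 = {0: [1, -1, -1], 1: [0, 1, 0], 2: [0, 0, 1]}    # 1-theta-theta^2, theta, theta^2

for name, pmf in [("3.11", example_3_11), ("3.12", example_3_12)]:
    m = coefficient_matrix(pmf)
    print(name, det3(m))   # non-zero determinant: only g = 0 solves the system
```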
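Example 3.13 $X$ has the following distribution:
$$\begin{array}{c|cccccc} X = x & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline P_\theta\{X = x\} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} \end{array}$$
Examine whether the family of pmf's is complete.
Define
$$g(x) = \begin{cases} c & \text{when } x = 1, 3, 5 \\ -c & \text{when } x = 2, 4, 6 \end{cases}$$
with $c \ne 0$. Consider
$$E[g(X)] = \frac{3c}{6} - \frac{3c}{6} = 0.$$
But $g(x) \ne 0$ for $x = 1, 2, 3, 4, 5, 6$. Thus the family of pmf's is not complete.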


Example 3.14 Show that the family of pmf's $\{p_N(x), x = 1, 2, \dots, N \text{ and } \forall\ N = 1, 2, 3, \dots\}$ is complete.
The pmf of a random variable $X$ is
$$p_N(x) = \frac{1}{N}, \quad x = 1, 2, \dots, N \text{ and } \forall\ N = 1, 2, \dots$$
i.e.,
$$p_{N=1}(x) = 1,\ x = 1; \quad p_{N=2}(x) = \frac{1}{2},\ x = 1, 2; \quad p_{N=3}(x) = \frac{1}{3},\ x = 1, 2, 3; \quad \dots$$
Consider $E_N[g(X)] = 0\ \forall\ N \in I^+$, i.e., $\sum_{x=1}^N g(x)\frac{1}{N} = 0\ \forall\ N$.
When $N = 1 \Rightarrow g(1) = 0$.
When $N = 2 \Rightarrow g(1) + g(2) = 0 \Rightarrow g(2) = 0$ since $g(1) = 0$.
When $N = 3 \Rightarrow g(3) = 0$ since $g(1) + g(2) = 0$, and so on.
Thus $g(x) = 0\ \forall\ x$ and $\forall\ N \in I^+$, and the discrete family of uniform distributions defined on the sample space $\{x \mid x = 1, 2, 3, \dots, N \text{ and } N \in I^+\}$ is complete.
Example 3.15 Examine whether the family of pmf's $\{p_N(x), x = 1, 2, \dots, N \text{ and } N = 2, 3, \dots\}$ is complete.
Consider $\sum_{x=1}^N g(x)\frac{1}{N} = 0$ when $N = 2, 3, \dots$.
When $N = 2 \Rightarrow \sum_{x=1}^2 g(x) = 0 \Rightarrow g(1) + g(2) = 0$, i.e., $g(2) = -g(1)$.
When $N = 3 \Rightarrow g(1) + g(2) + g(3) = 0 \Rightarrow g(3) = 0$, and so on.
Thus $E_N[g(X)] = 0$ gives only $g(2) = -g(1)$ and $g(x) = 0\ \forall\ x = 3, 4, \dots, N$ and $\forall\ N = 2, 3, 4, \dots$; $g(1)$ and $g(2)$ need not be zero. Thus the family of distributions is not complete.
Remark 3.1 Completeness is a property of a family of distributions. As Example 3.15 shows, the exclusion of even one member from the family $\{p_N(x), x = 1, 2, \dots, N \text{ and } N = 1, 2, \dots\}$ destroys completeness.
Remark 3.2 For Example 3.15, define
$$g(x) = \begin{cases} c & \text{if } x = 1 \\ -c & \text{if } x = 2 \\ 0 & \text{if } x = 3, 4, 5, \dots \end{cases}$$
Then $\sum_{x=1}^N g(x)\frac{1}{N} = 0$ when $N = 2, 3, \dots$, although $g(1) = c \ne 0$ for $c \ne 0$. Since this $g$ is bounded ($|g(x)| \le |c|$), the family of distributions is not even bounded complete. Thus there exists a class of unbiased estimators of zero, $U_0 = \{g(X) \mid c \in \mathbb{R}\}$, where
$$g(x) = \begin{cases} (-1)^{x-1}\,c & x = 1, 2 \text{ and } c \in \mathbb{R} \\ 0 & \text{otherwise} \end{cases}$$
If the family of distributions is complete, then the unbiased estimator of zero is unique, namely $g \equiv 0$.
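The contrast between Examples 3.14 and 3.15 can be checked directly. In the Python sketch below (exact fractions; $c = 1$ and the range of $N$ are arbitrary choices), the bounded function $g$ of Remark 3.2 has zero mean under every $p_N$ with $N \ge 2$, but not under $p_1$, which is exactly the member that Example 3.14 includes and Example 3.15 excludes.

```python
from fractions import Fraction

def expect_uniform(g, N):
    """E_N[g(X)] for X uniform on {1, 2, ..., N}."""
    return sum(Fraction(g(x)) for x in range(1, N + 1)) / N

def g_c(x, c=1):
    """The bounded unbiased estimator of zero from Remark 3.2 (c = 1 by default)."""
    return (-1) ** (x - 1) * c if x in (1, 2) else 0

# Zero mean for every N >= 2 (the family of Example 3.15) ...
print(all(expect_uniform(g_c, N) == 0 for N in range(2, 50)))   # True
# ... but not for N = 1, the member that Example 3.14 includes.
print(expect_uniform(g_c, 1))                                    # 1
```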

3.3 Minimal Sufficient Statistic


Consider a random sample $(X_1, X_2, \dots, X_n)$ from an iid discrete population with probability function $p_\theta(x)$, $\theta \in \Omega$. The statistic $T = (X_1, X_2, \dots, X_{n-1})$ is not sufficient, for
$$P\{X_1 = x_1, \dots, X_n = x_n \mid X_1 = x_1, X_2 = x_2, \dots, X_{n-1} = x_{n-1}\} = P\{X_n = x_n\} = p_\theta(x_n).$$
This conditional probability given the value of $T$, i.e., $X_1 = x_1, \dots, X_{n-1} = x_{n-1}$, is just the probability function of the $n$th observation, which does depend on $\theta$. Using a statistic means reducing the given sample. A reduction usually simplifies the methodology and the theory, and the natural question is how much the data can be reduced without sacrificing sufficiency.


Definition 3.4 A statistic is said to be minimal sufficient if it is sufficient and if any reduction of the partition of the sample space defined by the statistic is not sufficient.

3.4 Method of constructing minimal sufficient statistics


The Lehmann and Scheffe technique obtains a minimal sufficient statistic through a partition of the sample space. Once the partition is obtained, a minimal sufficient statistic can be defined by assigning distinct numbers to distinct partition sets.
In constructing the sets of a partition that is to be sufficient for the family of densities $p_\theta(x)$, $\theta \in \Omega$, two sample points $X_1 = x_1, \dots, X_n = x_n$ and $Y_1 = y_1, Y_2 = y_2, \dots, Y_n = y_n$ lie in the same set of the minimal sufficient partition iff
$$p_\theta(x_1, x_2, \dots, x_n) = k(y_1, \dots, y_n; x_1, x_2, \dots, x_n)\,p_\theta(y_1, y_2, \dots, y_n) \quad \forall\ \theta \in \Omega,$$
where $k(y_1, \dots, y_n; x_1, x_2, \dots, x_n) \ne 0$ is independent of $\theta$. Wherever the densities are positive, this says that the ratio of the density at $x_1, x_2, \dots, x_n$ to its value at $y_1, y_2, \dots, y_n$,
$$\frac{p_\theta(x_1, x_2, \dots, x_n)}{p_\theta(y_1, y_2, \dots, y_n)} = k(y_1, \dots, y_n; x_1, x_2, \dots, x_n),$$
is independent of $\theta$.
The reason for writing the definition in terms of a product rather than a ratio is to take into account the points for which $p_\theta(x_1, x_2, \dots, x_n) = 0$: all points $x_1, x_2, \dots, x_n$ such that $p_\theta(x_1, x_2, \dots, x_n) = 0\ \forall\ \theta \in \Omega$ are equivalent. Every point $x_1, x_2, \dots, x_n$ lies in some set $D$, namely in $D(x_1, x_2, \dots, x_n)$, and there is no overlapping of the $D$'s, so that they constitute a partition of the sample space. For if two sets, say $D(x_1, x_2, \dots, x_n)$ and $D(y_1, y_2, \dots, y_n)$, have a point $z_1, z_2, \dots, z_n$ in common, then $z_1, z_2, \dots, z_n$ is equivalent to both $x_1, x_2, \dots, x_n$ and $y_1, y_2, \dots, y_n$, which are then equivalent to each other and define the same $D$. Thus the partition of the sample space into the sets $D$ defines the minimal sufficient partition.
Example 3.16 Let $X_1, X_2, \dots, X_n$ be an iid random sample drawn from a Binomial population $b(1, \theta)$. Obtain the minimal sufficient statistic by the partition method.
The joint pmf at $X_1 = x_1, X_2 = x_2, \dots, X_n = x_n$ is
$$p_\theta(x_1, x_2, \dots, x_n) = \theta^{\sum x_i}(1 - \theta)^{n - \sum x_i}.$$
The joint pmf at $Y_1 = y_1, Y_2 = y_2, \dots, Y_n = y_n$ is
$$p_\theta(y_1, y_2, \dots, y_n) = \theta^{\sum y_i}(1 - \theta)^{n - \sum y_i}.$$
The ratio is
$$\frac{p_\theta(x_1, x_2, \dots, x_n)}{p_\theta(y_1, y_2, \dots, y_n)} = \left(\frac{\theta}{1 - \theta}\right)^{\sum x_i - \sum y_i}.$$
The ratio is independent of $\theta$ iff $\sum x_i = \sum y_i$. Thus the points $x_1, x_2, \dots, x_n$ and $y_1, y_2, \dots, y_n$ whose coordinates have the same sum lie in the same set of the minimal sufficient partition. Therefore $\sum X_i$ is a minimal sufficient statistic.
Example 3.17 Let $X_1, X_2, \dots, X_n$ be an iid random sample from $N(\theta, \sigma^2)$. Assume $\theta$ and $\sigma^2$ are unknown. Prove that $\left(\sum X_i, \sum X_i^2\right)$ is a minimal sufficient statistic.
Consider the ratio
$$\frac{p_{\theta,\sigma^2}(x_1, x_2, \dots, x_n)}{p_{\theta,\sigma^2}(y_1, y_2, \dots, y_n)} = \exp\left\{-\frac{1}{2\sigma^2}\left[\left(\sum x_i^2 - \sum y_i^2\right) - 2\theta\left(\sum x_i - \sum y_i\right)\right]\right\}.$$
The ratio is independent of the parameters $(\theta, \sigma^2)$ iff $\sum x_i = \sum y_i$ and $\sum x_i^2 = \sum y_i^2$. Therefore $\left(\sum X_i, \sum X_i^2\right)$ is a minimal sufficient statistic.
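The Lehmann and Scheffe criterion can be probed numerically. The Python sketch below uses a hypothetical helper `same_set` that places two samples in the same partition set when $\log p_\theta(x) - \log p_\theta(y)$ takes the same value at every parameter value on a finite grid. This is only a heuristic illustration (a finite grid cannot prove independence of $\theta$), applied here to Examples 3.16 and 3.17; the grids and samples are arbitrary choices.

```python
import math

def same_set(loglik, x, y, params, tol=1e-9):
    """Heuristic check on a finite grid: x and y are put in the same partition
    set when log p(x) - log p(y) takes the same value for every value in params."""
    diffs = [loglik(x, p) - loglik(y, p) for p in params]
    return max(diffs) - min(diffs) < tol

def bernoulli_loglik(sample, theta):
    s = sum(sample)
    return s * math.log(theta) + (len(sample) - s) * math.log(1 - theta)

def normal_loglik(sample, params):
    theta, sigma2 = params
    return sum(-0.5 * math.log(2 * math.pi * sigma2) - (v - theta) ** 2 / (2 * sigma2)
               for v in sample)

thetas = [k / 10 for k in range(1, 10)]
grid = [(m, s2) for m in (-1.0, 0.0, 2.5) for s2 in (0.5, 1.0, 3.0)]
print(same_set(bernoulli_loglik, [1, 0, 1, 0], [0, 1, 1, 0], thetas))   # True: equal sums
print(same_set(bernoulli_loglik, [1, 1, 1, 0], [0, 1, 1, 0], thetas))   # False
print(same_set(normal_loglik, [0, 3, 3], [1, 1, 4], grid))             # True: equal sum and sum of squares
print(same_set(normal_loglik, [0, 3, 3], [2, 2, 2], grid))             # False: same sum only
```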
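Example 3.18 Determine the minimal sufficient statistic based on a random sample of size $n$ from each of the following:
(i)
$$p_\theta(x) = \begin{cases} \theta\,e^{-\theta x} & x > 0,\ \theta > 0 \\ 0 & \text{otherwise} \end{cases}$$
(ii)
$$p_\theta(x) = \begin{cases} \frac{x}{\theta}\exp\left[-\frac{x^2}{2\theta}\right] & x > 0 \\ 0 & \text{otherwise} \end{cases}$$
and (iii)
$$p_\sigma(x) = \begin{cases} \sqrt{\frac{2}{\pi}}\,\frac{x^2}{\sigma^3}\,e^{-\frac{x^2}{2\sigma^2}} & x > 0 \\ 0 & \text{otherwise} \end{cases}$$
(i) Consider the ratio
$$\frac{p_\theta(x_1, x_2, \dots, x_n)}{p_\theta(y_1, y_2, \dots, y_n)} = \exp\left[-\theta\left(\sum x_i - \sum y_i\right)\right].$$
The ratio is independent of $\theta$ iff $\sum x_i = \sum y_i$. Therefore $\sum X_i$ is a minimal sufficient statistic.
(ii) Consider the ratio
$$\frac{p_\theta(x_1, x_2, \dots, x_n)}{p_\theta(y_1, y_2, \dots, y_n)} = \prod\left(\frac{x_i}{y_i}\right)\exp\left[-\frac{1}{2\theta}\left(\sum x_i^2 - \sum y_i^2\right)\right].$$
The ratio is independent of the parameter $\theta$ iff $\sum x_i^2 = \sum y_i^2$. Therefore $\sum X_i^2$ is a minimal sufficient statistic.
(iii) Consider the ratio
$$\frac{p_\sigma(x_1, x_2, \dots, x_n)}{p_\sigma(y_1, y_2, \dots, y_n)} = \prod\left(\frac{x_i^2}{y_i^2}\right)\exp\left[-\frac{1}{2\sigma^2}\left(\sum x_i^2 - \sum y_i^2\right)\right].$$
The ratio is independent of $\sigma$ iff $\sum x_i^2 = \sum y_i^2$. Therefore $\sum X_i^2$ is a minimal sufficient statistic.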
Theorem 3.2 The exponential family of distributions consists of those distributions with densities or probability functions expressible in the form $p_\theta(x) = c(\theta)\,e^{Q(\theta)t(x)}\,h(x)$. If $p_\theta(x)$ is a member of the exponential family, then there exists a minimal sufficient statistic.
Proof: The joint density function of the random sample $X_1, X_2, \dots, X_n$ is
$$p_\theta(x_1, x_2, \dots, x_n) = [c(\theta)]^n\exp\left[Q(\theta)\sum t(x_i)\right]\prod h(x_i).$$
The ratio of this density at $x_1, x_2, \dots, x_n$ to its value at $y_1, y_2, \dots, y_n$ is
$$\frac{p_\theta(x_1, x_2, \dots, x_n)}{p_\theta(y_1, y_2, \dots, y_n)} = \exp\left[Q(\theta)\left\{\sum t(x_i) - \sum t(y_i)\right\}\right]\prod\frac{h(x_i)}{h(y_i)}.$$
Provided $Q(\theta)$ is not constant on $\Omega$, this is independent of $\theta$ iff $\sum t(x_i) = \sum t(y_i)$. Therefore $T = \sum t(X_i)$ is a minimal sufficient statistic.
Remark 3.3 A complete sufficient statistic is minimal sufficient whenever a minimal sufficient statistic exists.
Theorem 3.3 Let $p_{\theta_0}(x)$ and $p_{\theta_1}(x)$ be two densities with the same support (the sets on which the two densities are positive are the same). Then, for the family $\{p_{\theta_0}, p_{\theta_1}\}$, the statistic $T = \frac{p_{\theta_1}(X)}{p_{\theta_0}(X)}$ is minimal sufficient.
Proof: $T$ is sufficient, since by the factorization theorem
$$p_{\theta_1}(x_1, \dots, x_n) = T(x)\,p_{\theta_0}(x_1, \dots, x_n) \quad \text{and} \quad p_{\theta_0}(x_1, \dots, x_n) = 1 \cdot p_{\theta_0}(x_1, \dots, x_n).$$
Now let $U = u(X_1, X_2, \dots, X_n)$ be any sufficient statistic. By the factorization theorem, $p_{\theta_1}(x_1, \dots, x_n) = p_{\theta_1}(u)\,h(x_1, \dots, x_n)$ and $p_{\theta_0}(x_1, \dots, x_n) = p_{\theta_0}(u)\,h(x_1, \dots, x_n)$, so that
$$T = \frac{p_{\theta_1}(x_1, x_2, \dots, x_n)}{p_{\theta_0}(x_1, x_2, \dots, x_n)} = \frac{p_{\theta_1}(u)}{p_{\theta_0}(u)}$$
is a function of $U$. Since $T$ is sufficient and is a function of every sufficient statistic, $T = t(X)$ is a minimal sufficient statistic.
If $P$ is a family of distributions with common support, $P_0 \subset P$, and $T = t(X)$ is minimal sufficient for $P_0$ and sufficient for $P$, then it is minimal sufficient for $P$.
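Example 3.18 Let $P$ be the family $\{N(\theta, 1)\}$ and let $P_0 = \{N(\theta_0, 1), N(\theta, 1)\}$, with $\theta \ne \theta_0$, be a two-member subfamily, so that $P_0 \subset P$. Let $X_1, X_2, \dots, X_n$ be a random sample of size $n$. Then
$$\frac{p_\theta(x_1, x_2, \dots, x_n)}{p_{\theta_0}(x_1, x_2, \dots, x_n)} = \frac{e^{-\frac{1}{2}\sum(x_i - \theta)^2}}{e^{-\frac{1}{2}\sum(x_i - \theta_0)^2}} = e^{\frac{1}{2}\left[2n(\theta - \theta_0)\bar{x} - n(\theta^2 - \theta_0^2)\right]},$$
which, for $\theta \ne \theta_0$, is a one-to-one function of $\bar{x}$.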
Thus $T = \bar{X}$ is the minimal sufficient statistic for $N(\theta, 1)$.
Example 3.19 Let $X_1, X_2, \dots, X_n$ be a random sample from a population defined by the Cauchy density with parameter $\theta$:
$$p_\theta(x) = \frac{1}{\pi[1 + (x - \theta)^2]}, \quad -\infty < x < \infty,\ -\infty < \theta < \infty.$$
Find the minimal sufficient statistic.
Two sets of sample points $x_1, x_2, \dots, x_n$ and $y_1, y_2, \dots, y_n$ lie in the same set of the minimal sufficient partition iff the ratio
$$\frac{p_\theta(x_1, x_2, \dots, x_n)}{p_\theta(y_1, y_2, \dots, y_n)} = \prod_{j=1}^n\frac{1 + (y_j - \theta)^2}{1 + (x_j - \theta)^2}$$
is independent of $\theta$. The numerator and denominator are polynomials of degree $2n$ in $\theta$, each with leading coefficient 1. The ratio is independent of $\theta$ iff the two polynomials are identical. This means that the set of zeroes of the numerator polynomial, $y_j \pm i$ ($i = \sqrt{-1}$, $j = 1, 2, \dots, n$), is the same as the set of zeroes of the denominator polynomial, $x_j \pm i$ ($j = 1, 2, \dots, n$). This is true iff the real numbers $(x_1, x_2, \dots, x_n)$ are a permutation of the numbers $(y_1, y_2, \dots, y_n)$. A partition set of the minimal sufficient partition therefore consists of the (at most $n!$) permutations of $n$ real numbers. This minimal sufficient partition is defined by the order statistic $(X_{(1)}, X_{(2)}, \dots, X_{(n)})$.
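A numerical illustration of the Cauchy argument, sketched in Python (the $\theta$ grid and the samples are arbitrary choices, and a finite grid only illustrates, it does not prove): the log-likelihood ratio is constant in $\theta$ when one sample is a permutation of the other, and it varies as soon as a single coordinate is changed.

```python
import math

def cauchy_loglik(sample, theta):
    """Joint log-density of iid Cauchy(theta) observations."""
    return -sum(math.log(math.pi * (1 + (v - theta) ** 2)) for v in sample)

def log_ratio_spread(x, y, thetas):
    """Range of log p(x) - log p(y) over the theta grid; 0 means the ratio
    is constant there."""
    diffs = [cauchy_loglik(x, t) - cauchy_loglik(y, t) for t in thetas]
    return max(diffs) - min(diffs)

thetas = [-3 + 0.5 * k for k in range(13)]   # -3.0, -2.5, ..., 3.0
x = [0.2, -1.0, 3.5]
print(log_ratio_spread(x, [3.5, 0.2, -1.0], thetas) < 1e-12)   # True: a permutation
print(log_ratio_spread(x, [0.2, -1.0, 3.6], thetas) > 1e-6)     # True: not a permutation
```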
Problems
3.1 $X$ has the following distribution:
$$\begin{array}{c|ccc} X = x & 0 & 1 & 2 \\ \hline P_\theta\{X = x\} & 1 - \theta - \theta^2 & \theta^2 & \theta \end{array}$$
Prove that the family of distributions is complete.
3.2 Let $X_1, X_2, \dots, X_n$ be a sample from the pmf
$$p(x \mid N) = P_N\{X = x\} = \begin{cases} \frac{1}{N} & x = 1, 2, 3, \dots, N;\ N \in I^+ \\ 0 & \text{otherwise} \end{cases}$$
Examine whether the family of distributions is complete.
3.3 Let $X_1, X_2, \dots, X_n$ be iid random variables from $U(0, \theta)$. Prove that the statistic $Y_n = \max_{1 \le i \le n}\{X_i\}$ is complete.
3.4 Consider the class of Hypergeometric probability distributions $\{P_D, D = 0, 1, 2, \dots, N\}$, where
$$P_D\{X = x\} = \begin{cases} \frac{\binom{D}{x}\binom{N - D}{n - x}}{\binom{N}{n}} & x = 0, 1, \dots, \min(n, D) \\ 0 & \text{otherwise} \end{cases}$$
Show that it is a complete class.


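3.5 Examine whether the family of distributions
$$p_\theta(x) = \begin{cases} \theta & \text{if } 0 < x \le 1 \\ 1 - \theta & \text{if } 1 < x \le 2 \end{cases}$$
is complete.
3.6 Let $X_1, X_2, \dots, X_n$ be a sample from $U\left(\theta - \frac{1}{2}, \theta + \frac{1}{2}\right)$, $\theta \in \mathbb{R}$. Show that the statistic $T = \left(\min_{1 \le i \le n} X_i, \max_{1 \le i \le n} X_i\right)$ is not complete.
3.7 Let $X_1, X_2, \dots, X_n$ be a sample of $n$ independent observations from $N(\theta, \sigma^2)$, $-\infty < \theta < \infty$, $0 < \sigma^2 < \infty$. Show that $\left(\sum_{i=1}^n X_i, \sum_{i=1}^n X_i^2\right)$ is a sufficient statistic. Is it complete? Justify.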

88
Probability Models and their Parametric Estimation

3.8 Show that the Exponential family of distributions

$$p_\theta(x) = \begin{cases} \exp[\theta x + w(x) + A(\theta)] & -\infty < x < \infty \\ 0 & \text{otherwise} \end{cases}$$

depending on a single parameter θ is complete.


3.9 Prove that a complete sufficient statistic is minimal sufficient whenever a minimal sufficient statistic exists.
3.10 Explain the method of construction of a minimal sufficient statistic.
3.11 If a family $\mathcal{P}$ is complete, then completeness can be concluded for
(a) a larger class
(b) an equal class
(c) a smaller class
(d) none of these
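3.12 For a fixed $n_0 \in \{1, 2, \ldots\}$, consider the family of densities $\{p_N : N \in I_+\}$ and let $\mathcal{P} = \{p_N : N \in I_+ \text{ and } N \ne n_0\}$, where

$$p_N(x) = \begin{cases} \frac{1}{N} & x = 1, 2, \ldots, N;\ N \in I_+ \\ 0 & \text{otherwise} \end{cases}$$

Then
(a) $\mathcal{P}$ is complete
(b) $\mathcal{P}$ is not complete
(c) $\mathcal{P}$ is bounded complete
(d) $\mathcal{P}$ is not bounded complete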
3.13 If a complete sufficient statistic does not exist, then UMVUE
(a) may not exist
(b) may exist
(c) may be unique
(d) none of the above
3.14 If a complete sufficient statistic exists, then UMVUE is
(a) unique
(b) not unique
(c) not exist
(d) none of the above


4. OPTIMAL ESTIMATION

4.1 Introduction
Let $g(T)$ be an unbiased estimator of $\tau(\theta)$ and $\delta(T)$ be another unbiased estimator of $\tau(\theta)$ different from $g(T)$. Then there always exist infinitely many unbiased estimators of $\tau(\theta)$ of the form $\lambda g(T) + (1 - \lambda)\delta(T)$, $0 < \lambda < 1$. In this case one seeks the best, or optimal, estimator among all the unbiased estimators. The following procedures are used to identify the optimal estimator.

4.2 Uniformly Minimum Variance Unbiased Estimator


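Let $U = \{\delta_i(T), i = 1, 2, 3, \ldots\}$ be the set of all unbiased estimators of the parameter $\tau(\theta)$ $\forall\, \theta \in \Omega$, with $V_\theta[\delta_i(T)] < \infty$, $i = 1, 2, 3, \ldots$, $\forall\, \theta \in \Omega$, and let $g(T)$ be a statistic with $V_\theta[g(T)] < \infty$. Then the estimator $g(T)$ is called a Uniformly Minimum Variance Unbiased Estimator (UMVUE) of $\tau(\theta)$ if

$$E_\theta[g(T)] = \tau(\theta) \quad \forall\, \theta \in \Omega \quad \text{and}$$

$$E_\theta[g(T) - \tau(\theta)]^2 \le E_\theta[\delta_i(T) - \tau(\theta)]^2,$$

i.e., $V_\theta[g(T)] \le V_\theta[\delta_i(T)]$ $\forall\, \theta \in \Omega$ and $\forall\, i = 1, 2, 3, \ldots$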

The procedures to identify the UMVUE are


• Uncorrelatedness Approach.
• Rao - Blackwell or Lehmann - Scheffé Approach.
• Inequality Approach.

4.3 Uncorrelatedness Approach


This approach is based on a mathematical property: the uncorrelatedness of estimators. The following result gives a necessary and sufficient condition for an unbiased estimator to be a UMVUE.
Theorem 4.1 Let $U$ be the class of all unbiased estimators $T = t(X)$ of a parameter $\tau(\theta)$ $\forall\, \theta \in \Omega$ with $E_\theta[T^2] < \infty$ for all θ. Suppose that $U$ is a non-empty set. Let $U_0$ be the set of all unbiased estimators $V$ of zero, i.e.,

$$U_0 = \{V \mid E_\theta[V] = 0,\ E_\theta[V^2] < \infty\ \forall\, \theta \in \Omega\}$$

Then $T \in U$ is a UMVUE of $\tau(\theta)$ if and only if $E_\theta[VT] = 0$ $\forall\, \theta \in \Omega$ and $\forall\, V \in U_0$.
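Proof: Let $T \in U$ and $V \in U_0$. Assume that $T = t(X)$ is a UMVUE of $\tau(\theta)$. We prove that $E_\theta[VT] = 0$ $\forall\, \theta \in \Omega$; since $E_\theta[V] = 0$, this is the same as $\mathrm{Cov}_\theta[V, T] = 0$ $\forall\, \theta \in \Omega$ and $V \in U_0$. Consider $T + \lambda V$ for some real $\lambda$ ($\lambda \ne 0$). Then

$$E_\theta[T + \lambda V] = \tau(\theta) + \lambda E_\theta[V] = \tau(\theta) \quad \text{since } E_\theta[V] = 0$$

$\Rightarrow T + \lambda V \in U$ is also an unbiased estimator of $\tau(\theta)$, and since $T$ is a UMVUE, $V_\theta[T + \lambda V] \ge V_\theta[T]$ $\forall\, \theta$ and $\forall\, \lambda$:

$$V_\theta[T] + \lambda^2 V_\theta[V] + 2\lambda\, \mathrm{Cov}_\theta[T, V] \ge V_\theta[T]$$

$$\text{i.e., } 2\lambda\, \mathrm{Cov}_\theta[T, V] + \lambda^2 V_\theta[V] \ge 0 \quad \forall\, \theta \text{ and } \forall\, \lambda$$

The left side is a quadratic in $\lambda$ with two real roots, $\lambda = 0$ and $\lambda = -\frac{2\,\mathrm{Cov}_\theta[T, V]}{V_\theta[V]}$. If the roots coincide, i.e., $\mathrm{Cov}_\theta[T, V] = 0$, there is nothing to prove. Otherwise fix θ and take the midpoint of the roots,

$$\lambda_0 = \frac{\lambda}{2} = -\frac{\mathrm{Cov}_\theta[T, V]}{E_\theta[V^2]} \quad (\text{note } V_\theta[V] = E_\theta[V^2])$$

Define $T' = T + \lambda_0 V \in U$. Then $E_\theta[T'] = E_\theta[T] = \tau(\theta)$ and

$$\begin{aligned}
V_\theta[T'] &= E_\theta[T + \lambda_0 V]^2 - \{E_\theta[T + \lambda_0 V]\}^2 = E_\theta[T + \lambda_0 V]^2 - \tau^2(\theta) \\
&= E_\theta[T^2] - \tau^2(\theta) + \lambda_0^2 E_\theta[V^2] + 2\lambda_0\, \mathrm{Cov}_\theta[T, V] \\
&= V_\theta[T] + \frac{(\mathrm{Cov}_\theta[T, V])^2}{(E_\theta[V^2])^2} E_\theta[V^2] - 2\frac{(\mathrm{Cov}_\theta[T, V])^2}{E_\theta[V^2]} \\
&= V_\theta[T] - \frac{(\mathrm{Cov}_\theta[T, V])^2}{E_\theta[V^2]} < V_\theta[T]
\end{aligned}$$

Thus $T'$, with $\lambda_0 = -\frac{E_\theta[TV]}{E_\theta[V^2]}$, has a strictly smaller variance than $T$ at this θ, which contradicts that $T$ is the UMVUE of $\tau(\theta)$. Hence, if $T$ is the UMVUE of $\tau(\theta)$, then $\mathrm{Cov}_\theta[T, V] = 0$, i.e., $E_\theta[TV] = 0$ $\forall\, \theta \in \Omega$.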
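Conversely, assume $\mathrm{Cov}_\theta[T, V] = 0$ for all $\theta \in \Omega$ and all $V \in U_0$. We prove that $T$ is a UMVUE of $\tau(\theta)$. Let $T'$ be another unbiased estimator of $\tau(\theta)$, so that $T' \in U$; then $T' - T \in U_0$. Since $E_\theta[T] = \tau(\theta)$ and $E_\theta[T'] = \tau(\theta)$,

$$\begin{aligned}
E_\theta[T' - T] &= 0 \\
\Rightarrow E_\theta[T(T' - T)] &= 0 \\
E_\theta[TT'] &= E_\theta[T^2]
\end{aligned}$$

Applying the Cauchy - Schwarz inequality to $E_\theta[TT']$,

$$\{E_\theta[TT']\}^2 \le E_\theta[T^2]\, E_\theta[T'^2] \Rightarrow E_\theta[TT'] \le \{E_\theta[T^2]\}^{1/2} \{E_\theta[T'^2]\}^{1/2}$$

Substituting $E_\theta[TT'] = E_\theta[T^2]$,

$$\frac{E_\theta[T^2]}{\{E_\theta[T^2]\}^{1/2}} \le \{E_\theta[T'^2]\}^{1/2} \Rightarrow E_\theta[T^2] \le E_\theta[T'^2]$$

Subtracting $\tau^2(\theta)$ from both sides gives $V_\theta[T] \le V_\theta[T']$.

Thus $T = t(X)$ is the UMVUE of $\tau(\theta)$.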
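Theorem 4.1 can be checked exactly on a small case. The sketch below assumes a Bernoulli($\theta$) sample of size $n$ (an illustrative choice, not from the text) and enumerates all $2^n$ outcomes with exact fractions. $V = X_1 - X_2$ is an unbiased estimator of zero; the sample mean $\bar X$ is uncorrelated with it, while the unbiased estimator $X_1$ gives $E_\theta[X_1 V] = \theta(1 - \theta) \ne 0$, so $X_1$ cannot be the UMVUE. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def expect(stat, n, theta):
    """Exact E_theta[stat(x)] for an iid Bernoulli(theta) sample of size n."""
    total = Fraction(0)
    for x in product((0, 1), repeat=n):
        k = sum(x)
        prob = theta ** k * (1 - theta) ** (n - k)
        total += prob * stat(x)
    return total

n, theta = 4, Fraction(1, 3)
v = lambda x: x[0] - x[1]                 # unbiased estimator of zero
xbar = lambda x: Fraction(sum(x), n)      # candidate UMVUE of theta
x1 = lambda x: x[0]                       # unbiased for theta, but not UMVUE

print(expect(v, n, theta))                          # 0
print(expect(lambda x: xbar(x) * v(x), n, theta))   # 0
print(expect(lambda x: x1(x) * v(x), n, theta))     # theta*(1-theta) = 2/9
```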


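Theorem 4.2 Let $U$ be the non-empty class of unbiased estimators as in Theorem 4.1. Then there exists at most one UMVUE of $\tau(\theta)$.
Proof: Assume $T = t(X)$ is a UMVUE of $\tau(\theta)$. Let $T' = t'(X)$ be another UMVUE of $\tau(\theta)$. Then

$$E_\theta[T'] = \tau(\theta), \quad E_\theta[T] = \tau(\theta) \Rightarrow E_\theta[T' - T] = 0$$

so $T' - T \in U_0$ and, by Theorem 4.1,

$$E_\theta[T(T' - T)] = 0, \quad \text{i.e., } E_\theta[T^2] = E_\theta[TT'], \quad \mathrm{Cov}_\theta[T, T'] = V_\theta[T]$$

Since both are UMVUEs, $V_\theta[T] = V_\theta[T']$, and the correlation coefficient between $T$ and $T'$ is

$$\frac{\mathrm{Cov}_\theta[T, T']}{\sqrt{V_\theta[T]}\sqrt{V_\theta[T']}} = \frac{V_\theta[T]}{V_\theta[T]} = 1$$

A correlation coefficient of one implies that $T' - \tau(\theta) = c\,[T - \tau(\theta)]$ with probability one for some constant $c > 0$, and equal variances force $c = 1$. Hence $P_\theta\{T = T'\} = 1$, so $T$ and $T'$ are the same. ∴ The UMVUE $T$ is unique.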
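Theorem 4.3 If UMVUEs $T_i = t_i(X)$, $i = 1, 2$, exist for real functions $\tau_1(\theta)$ and $\tau_2(\theta)$ of θ, then $aT_1 + bT_2$ is a UMVUE of $a\tau_1(\theta) + b\tau_2(\theta)$.
Proof: Given that $T_1 = t_1(X)$ is a UMVUE of $\tau_1(\theta)$, i.e., $E_\theta[T_1 V] = 0$ $\forall\, \theta \in \Omega$ and $V \in U_0$. Similarly $E_\theta[T_2 V] = 0$ $\forall\, \theta \in \Omega$ and $V \in U_0$. Clearly $E_\theta[aT_1 + bT_2] = a\tau_1(\theta) + b\tau_2(\theta)$, so by Theorem 4.1 it suffices to prove that $E_\theta[(aT_1 + bT_2)V] = 0$ $\forall\, \theta \in \Omega$:

$$\begin{aligned}
\mathrm{Cov}_\theta[aT_1 + bT_2, V] &= E_\theta[aT_1 V + bT_2 V] - E_\theta[aT_1 + bT_2]\, E_\theta[V] \\
&= E_\theta[aT_1 V] + E_\theta[bT_2 V] \quad \text{since } E_\theta[V] = 0 \\
&= a\, \mathrm{Cov}_\theta[T_1, V] + b\, \mathrm{Cov}_\theta[T_2, V] \\
&= a \times 0 + b \times 0 = 0
\end{aligned}$$

Thus $aT_1 + bT_2$ is a UMVUE of $a\tau_1(\theta) + b\tau_2(\theta)$.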
Theorem 4.4 Let $\{T_n = t_n(X)\}$ be a sequence of UMVUEs of $\tau(\theta)$ and $T = t(X)$ be a statistic with $E_\theta[T^2] < \infty$ such that $E_\theta[T_n - T]^2 \to 0$ as $n \to \infty$ $\forall\, \theta \in \Omega$. Then $T = t(X)$ is also the UMVUE of $\tau(\theta)$.
Proof: Given that each $T_n$, $n = 1, 2, 3, \ldots$, is a UMVUE of $\tau(\theta)$, i.e., $E_\theta[T_n V] = 0$ $\forall\, n$ and $\forall\, \theta$, for every $V$ with $E_\theta[V] = 0$ $\forall\, \theta$. We prove that $T$ is unbiased and that $E_\theta[TV] = 0$ $\forall\, \theta$.
First, $T$ is unbiased:

$$\begin{aligned}
|E_\theta[T - \tau(\theta)]| &= |E_\theta[T - T_n] + E_\theta[T_n - \tau(\theta)]| = |E_\theta[T - T_n]| \quad \text{since } E_\theta[T_n] = \tau(\theta) \\
&\le \{E_\theta[T - T_n]^2\}^{1/2} \to 0 \text{ as } n \to \infty
\end{aligned}$$

so $E_\theta[T] = \tau(\theta)$. Next, since $E_\theta[T_n V] = 0$,

$$\mathrm{Cov}_\theta[T, V] = E_\theta[TV] - 0 = E_\theta[TV] - E_\theta[T_n V] = E_\theta[(T - T_n)V]$$

Applying the Cauchy - Schwarz inequality to $E_\theta[(T - T_n)V]$,

$$|E_\theta[TV]| = |E_\theta[(T - T_n)V]| \le \{E_\theta[V^2]\}^{1/2} \{E_\theta[T - T_n]^2\}^{1/2}$$

The left side does not depend on $n$, while the right side tends to 0 as $n \to \infty$ because $E_\theta[T - T_n]^2 \to 0$. Hence $E_\theta[TV] = 0$, i.e., $\mathrm{Cov}_\theta[T, V] = 0$ $\forall\, \theta \in \Omega$ and $\forall\, V \in U_0$.

Thus, by Theorem 4.1, $T = t(X)$ is a UMVUE of $\tau(\theta)$.


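Example 4.1 If $T_1 = t_1(X)$ and $T_2 = t_2(X)$ are UMVUEs of $\tau(\theta)$, show that the correlation coefficient between $T_1$ and $T_2$ is one.
Given $E_\theta[T_1] = \tau(\theta)$ and $E_\theta[T_2] = \tau(\theta)$ for $\theta \in \Omega$, and $V_\theta[T_1] = V_\theta[T_2]$ for $\theta \in \Omega$. Consider a new estimator $T = \frac{1}{2}[T_1 + T_2]$, which is also an unbiased estimator of $\tau(\theta)$, i.e.,

$$E_\theta[T] = \tfrac{1}{2}E_\theta[T_1] + \tfrac{1}{2}E_\theta[T_2] = \tfrac{1}{2}\tau(\theta) + \tfrac{1}{2}\tau(\theta) = \tau(\theta)$$

$$\begin{aligned}
V_\theta[T] &= V_\theta\left[\tfrac{1}{2}(T_1 + T_2)\right] = \tfrac{1}{4}\{V_\theta[T_1] + V_\theta[T_2] + 2\,\mathrm{Cov}_\theta[T_1, T_2]\} \\
&= \tfrac{1}{4}\left\{V_\theta[T_1] + V_\theta[T_2] + 2\rho\sqrt{V_\theta[T_1]\, V_\theta[T_2]}\right\} \\
&= \tfrac{1}{4}\{2V_\theta[T_1] + 2\rho V_\theta[T_1]\} = \tfrac{1}{2}V_\theta[T_1](1 + \rho)
\end{aligned}$$

where ρ is the correlation coefficient between $T_1$ and $T_2$. Since $T_1$ is the UMVUE of $\tau(\theta)$,

$$V_\theta[T] \ge V_\theta[T_1] \Rightarrow \tfrac{1}{2}V_\theta[T_1](1 + \rho) \ge V_\theta[T_1] \Rightarrow 1 + \rho \ge 2 \Rightarrow \rho \ge 1$$

But $\rho \le 1 \Rightarrow \rho = 1$. Thus the correlation between the UMVUEs $T_1$ and $T_2$ is one.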


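Example 4.2 Let $X_1, X_2, \ldots, X_n$ be a sample from a population with mean θ and finite variance, and let $T$ be an estimate of θ of the form $T(X_1, X_2, \ldots, X_n) = \sum_{i=1}^n \alpha_i X_i$. If $T$ is an unbiased estimate of θ that has minimum variance and $T'$ is another linear unbiased estimate of θ, show that

$$\mathrm{Cov}_\theta(T, T') = V_\theta[T]$$

Given that $T = \sum_{i=1}^n \alpha_i X_i$ is an unbiased estimator of θ, $E_\theta[T] = \theta$. Also $T'$ is an unbiased estimator of θ, i.e., $E_\theta[T'] = \theta$. Then $T - T'$ is a linear unbiased estimator of zero, and the argument of Theorem 4.1 (applied within the class of linear unbiased estimators, where $T + \lambda(T - T')$ is again linear and unbiased) gives $E_\theta[T(T - T')] = 0$:

$$\begin{aligned}
E_\theta[T - T'] &= 0 \\
E_\theta[T(T - T')] &= 0 \\
E_\theta[T^2] - E_\theta[TT'] &= 0 \\
E_\theta[TT'] &= E_\theta[T^2]
\end{aligned}$$

Subtracting $\theta^2 = E_\theta[T]\,E_\theta[T']$ from both sides, $\mathrm{Cov}_\theta(T, T') = V_\theta[T]$.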

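Example 4.3 Let $T_1, T_2$ be two unbiased estimates of $\tau(\theta)$ having common variance $\alpha\sigma^2$ ($\alpha > 1$), where $\sigma^2$ is the variance of the UMVUE. Show that the correlation coefficient between $T_1$ and $T_2$ is greater than or equal to $\frac{2 - \alpha}{\alpha}$.
Given $E_\theta[T_1] = \tau(\theta)$ and $V_\theta[T_1] = \alpha\sigma^2$. Also $E_\theta[T_2] = \tau(\theta)$ and $V_\theta[T_2] = \alpha\sigma^2$, where $\alpha > 1$. Consider the estimator $\frac{1}{2}[T_1 + T_2]$. It is also an unbiased estimator of $\tau(\theta)$, since

$$E_\theta\left[\tfrac{1}{2}(T_1 + T_2)\right] = \tfrac{1}{2}E_\theta[T_1] + \tfrac{1}{2}E_\theta[T_2] = \tfrac{1}{2}\tau(\theta) + \tfrac{1}{2}\tau(\theta) = \tau(\theta)$$

and

$$\begin{aligned}
V_\theta\left[\tfrac{1}{2}(T_1 + T_2)\right] &= \tfrac{1}{4}\{V_\theta[T_1] + V_\theta[T_2] + 2\,\mathrm{Cov}_\theta(T_1, T_2)\} \\
&= \tfrac{1}{4}\left\{V_\theta[T_1] + V_\theta[T_2] + 2\rho\sqrt{V_\theta[T_1]\, V_\theta[T_2]}\right\} \\
&= \tfrac{1}{4}[2V_\theta[T_1] + 2\rho V_\theta[T_1]] = \tfrac{1}{2}V_\theta[T_1](1 + \rho)
\end{aligned}$$

where ρ is the correlation coefficient between $T_1$ and $T_2$. Let $T$ be the UMVUE of $\tau(\theta)$; therefore

$$\begin{aligned}
V_\theta\left[\tfrac{1}{2}(T_1 + T_2)\right] &\ge V_\theta[T] \\
\text{i.e., } \tfrac{1}{2}V_\theta[T_1](1 + \rho) &\ge V_\theta[T] \\
\tfrac{1}{2}\alpha\sigma^2(1 + \rho) &\ge \sigma^2 \\
\alpha(1 + \rho) &\ge 2 \\
1 + \rho &\ge \frac{2}{\alpha} \\
\rho &\ge \frac{2 - \alpha}{\alpha}
\end{aligned}$$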
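A worked instance of the bound, using a case chosen here for illustration (not taken from the text): for a sample of size 2 from $N(\theta, 1)$, the UMVUE $\bar X$ has $\sigma^2 = \frac{1}{2}$, while $T_1 = X_1$ and $T_2 = X_2$ are unbiased with variance $1 = \alpha\sigma^2$, so $\alpha = 2$. Since $X_1$ and $X_2$ are independent, their correlation attains the bound exactly:

$$\rho = 0 \ \ge\ \frac{2 - \alpha}{\alpha} = \frac{2 - 2}{2} = 0, \qquad \tfrac{1}{2}(T_1 + T_2) = \bar X$$

so in this case the averaged estimator is itself the UMVUE.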

4.4 Rao - Blackwell Theorem


The Rao - Blackwell Theorem helps to search for a UMVUE $T = t(X)$ of a parametric function $\tau(\theta)$. Let $\delta(T)$ be any statistic which is an unbiased estimator of the parametric function $\tau(\theta)$, i.e., $E_\theta[\delta(T)] = \tau(\theta)$. The Rao - Blackwell Theorem improves on $\delta(T)$ by conditioning on a sufficient statistic $T = t$, that is, by computing $E[\delta(T) \mid T = t] = g(t)$, so that $E_\theta[g(T)] = \tau(\theta)$ with a variance no larger than that of $\delta(T)$. The improvement obtained by conditioning on the sufficient statistic $T = t(X)$ applies to any unbiased estimator $\delta(T)$ of $\tau(\theta)$.
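Theorem 4.5 Let $\{P_\theta, \theta \in \Omega\}$ be a family of probability distributions and $\delta(T)$ be any statistic in $U$, where $U$ is the non-empty class of all unbiased estimators of $\tau(\theta)$ with $E_\theta[\delta^2(T)] < \infty$. Let $T = t(X)$ be a sufficient statistic for $\{P_\theta, \theta \in \Omega\}$. Then the conditional expectation $E[\delta(T) \mid T = t] = g(t)$ is independent of θ and $g(T)$ is an unbiased estimator of $\tau(\theta)$. Also $E_\theta[g(T) - \tau(\theta)]^2 \le E_\theta[\delta(T) - \tau(\theta)]^2$ $\forall\, \theta \in \Omega$.
Proof: Given that $\delta(T)$ is an unbiased estimator of $\tau(\theta)$ $\forall\, \theta \in \Omega$. Since $T$ is sufficient, the conditional distribution of the sample given $T = t$ does not depend on θ, so $E[\delta(T) \mid T = t] = g(t)$ is free of θ and $g(T)$ is a statistic. It is an unbiased estimator of $\tau(\theta)$, since $E_\theta[E\{\delta(T) \mid T\}] = E_\theta[\delta(T)] = \tau(\theta)$ $\forall\, \theta \in \Omega$.
Now we prove that $E_\theta[g(T) - \tau(\theta)]^2 \le E_\theta[\delta(T) - \tau(\theta)]^2$ $\forall\, \theta \in \Omega$. Since both estimators have mean $\tau(\theta)$, it is enough to prove that $E_\theta[g^2(T)] \le E_\theta[\delta^2(T)]$ $\forall\, \theta \in \Omega$.
Write $E[\delta(T) \mid T] = E[\delta(T) \cdot 1 \mid T]$. Applying the Cauchy - Schwarz inequality to this conditional expectation,

$$\{E[\delta(T) \cdot 1 \mid T]\}^2 \le E[\delta^2(T) \mid T]\, E[1^2 \mid T] = E[\delta^2(T) \mid T]$$

i.e., $g^2(T) \le E[\delta^2(T) \mid T]$. Taking expectations,

$$E_\theta[g^2(T)] \le E_\theta\{E[\delta^2(T) \mid T]\} = E_\theta[\delta^2(T)]$$

$$\Rightarrow E_\theta[g(T) - \tau(\theta)]^2 \le E_\theta[\delta(T) - \tau(\theta)]^2 \quad \forall\, \theta \in \Omega$$

The inequality becomes an equality iff $E_\theta[\delta^2(T)] = E_\theta[g^2(T)]$, i.e.,

$$E_\theta\{E[\delta^2(T) \mid T]\} = E_\theta\{E[\delta(T) \mid T]\}^2$$

since $E[E[X^2 \mid Y]] = E[X^2]$ and $g(T) = E[\delta(T) \mid T]$. Equivalently,

$$E_\theta\left\{E[\delta^2(T) \mid T] - \{E[\delta(T) \mid T]\}^2\right\} = E_\theta\{\mathrm{Var}[\delta(T) \mid T]\} = 0$$

which holds iff $\mathrm{Var}[\delta(T) \mid T] = 0$ with probability one, i.e., iff $\delta(T) = E[\delta(T) \mid T] = g(T)$ with probability one. In that case $\delta(T)$ is already a function of $T$ and conditioning gives no improvement.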
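A Monte Carlo sketch of Rao - Blackwellization, assuming a Poisson($\theta$) sample of size $n$ (an illustrative model chosen here). The crude unbiased estimator $\delta = I(X_1 = 0)$ of $e^{-\theta}$ is conditioned on the sufficient statistic $T = \sum X_i$: given $T = t$, $X_1$ is Binomial$(t, \frac{1}{n})$, so $E[\delta \mid T = t] = (1 - \frac{1}{n})^t$. Both estimators average to $e^{-\theta}$, but the conditioned one has a much smaller variance. The seed and parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=2024)   # fixed seed, illustrative only
theta, n, reps = 1.3, 6, 200_000

samples = rng.poisson(theta, size=(reps, n))
t = samples.sum(axis=1)                       # sufficient statistic T = sum X_i

delta = (samples[:, 0] == 0).astype(float)    # crude unbiased estimator of e^{-theta}
g = (1.0 - 1.0 / n) ** t                      # E[delta | T = t] = (1 - 1/n)^t

target = np.exp(-theta)
print(target, delta.mean(), g.mean())         # all three close together
print(delta.var(), g.var())                   # conditioning shrinks the variance
```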
Remark 4.1 The Rao - Blackwell Theorem has the following limitations.
(i) If the unbiased estimator is already a function of the sufficient statistic $T = t(X)$, then the derived statistic is identical to it. In this case there is no improvement in the variance.
(ii) If more than one sufficient statistic exists, the improvement depends on which one is used for conditioning, since the set of jointly sufficient statistics is not unique; conditioning on a minimal sufficient statistic reduces the variance the most. Adding the concept of completeness makes the derived statistic unique, and this identifies the UMVUE. This leads to the Lehmann - Scheffé Theorem.

4.5 Lehmann - Scheffé Theorem

The theorem states that if a complete sufficient statistic exists, then the UMVUE of $\tau(\theta)$ is unique. This does not mean that a UMVUE exists only when a complete sufficient statistic exists: even if a complete sufficient statistic does not exist, a UMVUE may still exist.
Theorem 4.6 If $T = t(X)$ is a complete sufficient statistic and there exists an unbiased estimator $\delta(T)$ of $\tau(\theta)$, then there exists a unique UMVUE of $\tau(\theta)$, which is given by $E[\delta(T) \mid T = t] = g(t)$.
Proof: The Rao - Blackwell Theorem gives $g(t) = E[\delta(T) \mid T = t]$, with $g(T)$ unbiased for $\tau(\theta)$ and $V_\theta[g(T)] \le V_\theta[\delta(T)]$. It remains to prove that $g(T)$ does not depend on the choice of $\delta$. If $\delta_1(T) \in U$ and $\delta_2(T) \in U$, then $E_\theta[E[\delta_1(T) \mid T]] = \tau(\theta)$ and $E_\theta[E[\delta_2(T) \mid T]] = \tau(\theta)$ $\forall\, \theta \in \Omega$, so

$$E_\theta\{E[\delta_1(T) \mid T] - E[\delta_2(T) \mid T]\} = 0 \quad \forall\, \theta \in \Omega$$

Since $T = t(X)$ is a complete sufficient statistic,

$$E[\delta_1(T) \mid T] - E[\delta_2(T) \mid T] = 0, \quad \text{i.e., } E[\delta_1(T) \mid T] = E[\delta_2(T) \mid T]$$

with probability one. Hence every unbiased estimator $\delta(T)$ satisfies $V_\theta[\delta(T)] \ge V_\theta[g(T)]$ for the same $g(T)$. ∴ The UMVUE $g(T)$ is unique if the sufficient statistic $T = t(X)$ is complete.
From Theorems 4.5 and 4.6, the UMVUE of $\tau(\theta)$ is obtained either by solving a set of equations or by conditioning on the sufficient statistic.
Solving a set of equations of the sufficient statistic
Let $\{P_\theta, \theta \in \Omega\}$ be the distribution of a random variable $X$. If $T$ is a complete sufficient statistic, then the UMVUE $g(T)$ of any parametric function $\tau(\theta)$ is uniquely determined by solving the set of equations $E_\theta[g(T)] = \tau(\theta)$ $\forall\, \theta \in \Omega$.
Conditioning on the sufficient statistic
If a random variable $X$ has a distribution $P_\theta$, $\theta \in \Omega$, $\delta(T)$ is any unbiased estimator of $\tau(\theta)$ and $T = t(X)$ is a complete sufficient statistic, then the UMVUE $g(T)$ can be obtained as the conditional expectation of $\delta(T)$ given $T = t$, i.e., $g(t) = E[\delta(T) \mid T = t]$.
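Example 4.4 Obtain the UMVUE of $\theta + 2$ for the pmf of the Poisson distribution

$$p(x \mid \theta) = \begin{cases} \frac{e^{-\theta}\theta^x}{x!} & x = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$$

by taking a sample of size $n$.
Let $T = \sum_{i=1}^n X_i$; then $T \sim P(n\theta)$:

$$p(t \mid \theta) = \begin{cases} \frac{e^{-n\theta}(n\theta)^t}{t!} & t = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$$

$$p(t \mid \theta) = e^{-n\theta} e^{t \log n\theta} \frac{1}{t!} = c(\theta) e^{Q(\theta)t(x)} h(x)$$

where $c(\theta) = e^{-n\theta}$, $Q(\theta) = \log n\theta$, $t(x) = \sum_{i=1}^n x_i$, $h(x) = \frac{1}{t!}$. ∴ The statistic $T$ is complete and sufficient. Thus the UMVUE $g(T)$ of $\theta + 2$ satisfies

$$\begin{aligned}
E_\theta[g(T)] &= \theta + 2 \\
\sum_{t=0}^{\infty} g(t) e^{-n\theta} \frac{(n\theta)^t}{t!} &= \theta + 2 \\
\sum_{t=0}^{\infty} g(t) \frac{n^t \theta^t}{t!} &= (\theta + 2)e^{n\theta} = (\theta + 2)\sum_{t=0}^{\infty} \frac{(n\theta)^t}{t!} \\
&= \sum_{t=0}^{\infty} \frac{n^t \theta^{t+1}}{t!} + 2\sum_{t=0}^{\infty} \frac{n^t \theta^t}{t!}
\end{aligned}$$

Equating the coefficients of $\theta^t$ on both sides,

$$g(t)\frac{n^t}{t!} = \frac{n^{t-1}}{(t-1)!} + 2\frac{n^t}{t!} \Rightarrow g(t) = \frac{t}{n} + 2 = \frac{\sum x_i}{n} + 2 = \bar x + 2$$

Thus $\bar X + 2$ is the UMVUE of $\theta + 2$.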


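Example 4.5 Let $X_i$ ($i = 1$ to $n$) be a sample from a Poisson distribution with parameter θ. Obtain the UMVUE of $\theta^{r-1}e^{-r\theta}$, $r = 1, 2, 3, \ldots$
As in Example 4.4, $T = \sum_{i=1}^n X_i$ is complete and sufficient. Therefore there exists a unique UMVUE of $\tau(\theta) = \theta^{r-1}e^{-r\theta}$, $r = 1, 2, \ldots$:

$$\sum_{t=0}^{\infty} g(t) e^{-n\theta}\frac{(n\theta)^t}{t!} = \theta^{r-1}e^{-r\theta}$$

$$\sum_{t=0}^{\infty} g(t)\frac{n^t\theta^t}{t!} = \theta^{r-1}e^{(n-r)\theta} = \sum_{s=0}^{\infty} \frac{(n-r)^s}{s!}\theta^{s+r-1}$$

Equating the coefficients of $\theta^t$ on both sides ($s = t - r + 1$),

$$g(t)\frac{n^t}{t!} = \frac{(n-r)^{t-r+1}}{(t-r+1)!}$$

$$g(t) = \frac{(n-r)^{t-r+1}}{n^t}\,\frac{t!}{(t-r+1)!}, \quad t \ge r - 1,\ r = 1, 2, \ldots \text{ and } n > r$$

and $g(t) = 0$ for $t < r - 1$. Thus the UMVUE of $\theta^{r-1}e^{-r\theta}$ is

$$\frac{(n-r)^{T-r+1}}{n^T}\,\frac{T!}{(T-r+1)!}, \quad r = 1, 2, \ldots \text{ and } n > r.$$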


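Remark 4.2 When $r = 1$, $g(t) = \left(\frac{n-1}{n}\right)^t = \left(1 - \frac{1}{n}\right)^t$, $n = 2, 3, \ldots$; then $\left(1 - \frac{1}{n}\right)^T$ is the UMVUE of $e^{-\theta}$, where $T = \sum X_i$.
When $r = 2$, $\frac{T}{n-2}\left[1 - \frac{2}{n}\right]^T$, $n = 3, 4, \ldots$, is the UMVUE of $\theta e^{-2\theta}$, where $T = \sum X_i$.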
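The formula of Example 4.5 can be checked by summing $g(t)P\{T = t\}$ over the Poisson$(n\theta)$ pmf, truncating the series far in the tail. A minimal sketch with illustrative values of θ and $n$; `math.perm(t, r - 1)` computes the falling factorial $t!/(t-r+1)!$ as an integer, which avoids overflowing floats with large factorials. The helper names are hypothetical.

```python
import math

def umvue_ex45(t, n, r):
    """UMVUE of theta**(r-1) * exp(-r*theta) from Example 4.5 (requires n > r)."""
    if t < r - 1:
        return 0.0
    return (n - r) ** (t - r + 1) / n ** t * math.perm(t, r - 1)

def poisson_expectation(g, mean, t_max=200):
    """E[g(T)] for T ~ Poisson(mean), truncating the series at t_max."""
    total, pmf = 0.0, math.exp(-mean)      # pmf at t = 0
    for t in range(t_max + 1):
        total += g(t) * pmf
        pmf *= mean / (t + 1)              # p(t + 1) = p(t) * mean / (t + 1)
    return total

theta, n = 0.7, 5
for r in (1, 2, 3):
    est = poisson_expectation(lambda t: umvue_ex45(t, n, r), n * theta)
    print(r, est, theta ** (r - 1) * math.exp(-r * theta))   # columns agree
```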
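Example 4.6 Obtain the UMVUE of $\theta^r + (r - 1)\theta$, $r = 1, 2, \ldots$, for a random sample of size $n$ from a Poisson distribution with parameter θ.
As in Example 4.4, $T = \sum_{i=1}^n X_i$ is complete and sufficient. There exists a UMVUE of $\tau(\theta) = \theta^r + (r - 1)\theta$, $r = 1, 2, \ldots$:

$$\begin{aligned}
E_\theta[g(T)] = \sum_{t=0}^{\infty} g(t) e^{-n\theta}\frac{(n\theta)^t}{t!} &= \theta^r + (r - 1)\theta \\
\sum_{t=0}^{\infty} g(t)\frac{n^t\theta^t}{t!} &= [\theta^r + (r - 1)\theta]e^{n\theta} = \theta^r e^{n\theta} + (r - 1)\theta e^{n\theta} \\
&= \theta^r \sum_{t=0}^{\infty}\frac{n^t\theta^t}{t!} + (r - 1)\theta\sum_{t=0}^{\infty}\frac{n^t\theta^t}{t!} \\
&= \sum_{t=0}^{\infty}\frac{n^t}{t!}\theta^{t+r} + (r - 1)\sum_{t=0}^{\infty}\frac{n^t}{t!}\theta^{t+1}
\end{aligned}$$

Equating the coefficients of $\theta^t$ on both sides,

$$\begin{aligned}
g(t)\frac{n^t}{t!} &= \frac{n^{t-r}}{(t-r)!} + (r - 1)\frac{n^{t-1}}{(t-1)!} \\
g(t) &= \frac{1}{n^r}\frac{t!}{(t-r)!} + \frac{(r - 1)}{n}\frac{t!}{(t-1)!} \\
&= \frac{t(t-1)\cdots(t-r+1)}{n^r} + \frac{(r - 1)}{n}t
\end{aligned}$$

with the first term equal to 0 when $t < r$. The UMVUE of $\theta^r + (r - 1)\theta$ is

$$g(T) = \frac{T(T-1)\cdots(T-r+1)}{n^r} + \frac{(r - 1)}{n}T, \quad r = 1, 2, \ldots$$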
Remark 4.3 When $r = 1$, $\bar X$ is the UMVUE of θ.
When $r = 2$, $\frac{\bar X(n\bar X - 1)}{n} + \bar X$ is the UMVUE of $\theta^2 + \theta$.
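A numerical check of Example 4.6, again summing against the Poisson$(n\theta)$ pmf with a truncated series (values of θ and $n$ are illustrative). `math.perm(t, r)` is the falling factorial $t(t-1)\cdots(t-r+1)$ and returns 0 when $t < r$, matching the convention noted in the example. Helper names are hypothetical.

```python
import math

def umvue_ex46(t, n, r):
    """UMVUE of theta**r + (r-1)*theta from Example 4.6."""
    return math.perm(t, r) / n ** r + (r - 1) * t / n

def poisson_expectation(g, mean, t_max=200):
    """E[g(T)] for T ~ Poisson(mean), truncating the series at t_max."""
    total, pmf = 0.0, math.exp(-mean)
    for t in range(t_max + 1):
        total += g(t) * pmf
        pmf *= mean / (t + 1)
    return total

theta, n = 1.2, 4
for r in (1, 2, 3):
    est = poisson_expectation(lambda t: umvue_ex46(t, n, r), n * theta)
    print(r, est, theta ** r + (r - 1) * theta)   # columns agree
```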
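Example 4.7 Obtain the UMVUE of $\theta(1 - \theta)$ using a random sample of size $n$ drawn from a Bernoulli population with parameter θ.
Given

$$p_\theta(x) = \begin{cases} \theta^x(1 - \theta)^{1-x} & x = 0, 1 \\ 0 & \text{otherwise} \end{cases}$$

Let $T = \sum_{i=1}^n X_i$; then $T \sim b(n, \theta)$, i.e.,

$$\begin{aligned}
p_\theta(t) &= \binom{n}{t}\theta^t(1 - \theta)^{n-t}, \quad t = 0, 1, 2, \ldots, n \\
&= \binom{n}{t}\left(\frac{\theta}{1 - \theta}\right)^t(1 - \theta)^n = (1 - \theta)^n e^{t\log\left(\frac{\theta}{1-\theta}\right)}\binom{n}{t} = c(\theta)e^{Q(\theta)t(x)}h(x)
\end{aligned}$$

where $c(\theta) = (1 - \theta)^n$, $Q(\theta) = \log\left(\frac{\theta}{1-\theta}\right)$, $t(x) = \sum x_i$ and $h(x) = \binom{n}{t}$.
It is a one parameter exponential family. ∴ The statistic $T = \sum X_i$ is complete and sufficient. The UMVUE $g(T)$ of $\theta(1 - \theta)$ satisfies

$$\begin{aligned}
E_\theta[g(T)] &= \theta(1 - \theta) \\
\sum_{t=0}^{n} g(t)\binom{n}{t}\theta^t(1 - \theta)^{n-t} &= \theta(1 - \theta) \\
\sum_{t=0}^{n} g(t)\binom{n}{t}\left(\frac{\theta}{1 - \theta}\right)^t &= \theta(1 - \theta)(1 - \theta)^{-n}
\end{aligned}$$

Take $\rho = \frac{\theta}{1-\theta}$; then $1 + \rho = 1 + \frac{\theta}{1-\theta} = \frac{1}{1-\theta}$, so $1 - \theta = \frac{1}{1+\rho}$ and $\theta = \frac{\rho}{1+\rho}$. Hence

$$\begin{aligned}
\sum_{t=0}^{n} g(t)\binom{n}{t}\rho^t &= \rho(1 + \rho)^{n-2} = \rho\left[1 + \binom{n-2}{1}\rho + \cdots + \rho^{n-2}\right] \\
&= \rho + \binom{n-2}{1}\rho^2 + \cdots + \rho^{n-1} = \sum_{t=1}^{n-1}\binom{n-2}{t-1}\rho^t
\end{aligned}$$

Equating the coefficients of $\rho^t$ on both sides, $g(t)\binom{n}{t} = \binom{n-2}{t-1}$ for $t = 1, \ldots, n-1$ (and $g(0) = g(n) = 0$):

$$\begin{aligned}
g(t) &= \frac{(n-2)!}{(t-1)!(n-t-1)!}\cdot\frac{t!(n-t)!}{n!} \\
&= \frac{(n-2)!\,t(t-1)!\,(n-t)(n-t-1)!}{(t-1)!(n-t-1)!\,n(n-1)(n-2)!} \\
&= \frac{t(n-t)}{n(n-1)}, \quad n = 2, 3, \ldots
\end{aligned}$$

i.e., $\frac{T(n-T)}{n(n-1)}$ is the UMVUE of $\theta(1 - \theta)$.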
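An exact check of Example 4.7 with rational arithmetic, for an illustrative $\theta = 2/7$: the expectation of $\frac{T(n-T)}{n(n-1)}$ under $b(n, \theta)$ equals $\theta(1-\theta)$ for every $n$. For contrast, the plug-in estimator $\bar X(1 - \bar X)$ has mean $\frac{n-1}{n}\theta(1-\theta)$, so it is biased. The helper names are hypothetical.

```python
from fractions import Fraction
from math import comb

def umvue_ex47(t, n):
    """UMVUE of theta*(1-theta) from Example 4.7: t(n-t) / (n(n-1))."""
    return Fraction(t * (n - t), n * (n - 1))

def binomial_expectation(g, n, theta):
    """Exact E[g(T)] for T ~ b(n, theta) with rational theta."""
    return sum(g(t) * comb(n, t) * theta ** t * (1 - theta) ** (n - t)
               for t in range(n + 1))

theta = Fraction(2, 7)
for n in (2, 3, 10):
    print(n, binomial_expectation(lambda t: umvue_ex47(t, n), n, theta))
# the expectation is 10/49 = theta*(1-theta) for every n

# Plug-in Xbar*(1-Xbar) with n = 10: mean is (9/10) * (10/49) = 9/49
plug_in = binomial_expectation(lambda t: Fraction(t, 10) * (1 - Fraction(t, 10)), 10, theta)
print(plug_in)
```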
Example 4.8 Obtain the UMVUE of $\frac{1}{p}$ of the pmf

$$p_p(x) = \begin{cases} pq^x & x = 0, 1, \ldots \\ 0 & \text{otherwise} \end{cases}$$

based on a sample of size $n$.
If $X_i$ denotes the number of trials after the $(i-1)$th success up to but not including the $i$th success, the probability that $X_i = x$ is $pq^x$ for $x = 0, 1, \ldots$ and $i = 1, 2, \ldots, n$.
The joint pmf of $X_1, X_2, \ldots, X_n$ is

$$\begin{aligned}
p_p(x_1, x_2, \ldots, x_n) &= \begin{cases} p^n q^{\sum x_i} & x_i = 0, 1, \ldots;\ i = 1, 2, \ldots, n \\ 0 & \text{otherwise} \end{cases} \\
&= p^n e^{\log(1-p)\sum x_i} = c(p)e^{Q(p)t(x)}h(x)
\end{aligned}$$

where $c(p) = p^n$, $Q(p) = \log(1 - p)$, $t(x) = \sum x_i$, $h(x) = 1$.
This is a one parameter exponential family, so $T = \sum X_i$ is complete and sufficient. Thus there exists a unique UMVUE of $\frac{1}{p}$. It is given by $E_p[g(T)] = \frac{1}{p}$.
The statistic $T = \sum_{i=1}^n X_i$, the sum of $n$ iid Geometric variables with the same parameter $p$, has the Negative Binomial distribution. The pmf of $T$ is

$$p_p(t) = P\{T = t\} = \begin{cases} \binom{n+t-1}{n-1}p^n q^t & t = 0, 1, \ldots \\ 0 & \text{otherwise} \end{cases}$$

$$\begin{aligned}
\sum_{t=0}^{\infty} g(t)\binom{n+t-1}{n-1}p^n q^t &= \frac{1}{p} \\
\sum_{t=0}^{\infty} g(t)\binom{n+t-1}{t}q^t &= (1 - q)^{-(n+1)} = \sum_{t=0}^{\infty}\binom{n+t}{t}q^t
\end{aligned}$$

Equating the coefficients of $q^t$ on both sides,

$$g(t)\binom{n+t-1}{t} = \binom{n+t}{t}$$

$$g(t) = \frac{(n+t)!}{t!\,n!}\cdot\frac{t!\,(n-1)!}{(n+t-1)!} = \frac{(n+t)(n+t-1)!\,(n-1)!}{n(n-1)!\,(n+t-1)!} = \frac{t+n}{n}$$

Thus $\frac{T+n}{n}$ is the UMVUE of $\frac{1}{p}$.
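A quick numerical check of Example 4.8: summing $g(t)P\{T = t\}$ over the Negative Binomial pmf (truncated far in the tail) gives $1/p$. The values of $n$ and $p$ are illustrative, and the helper names are hypothetical.

```python
import math

def umvue_ex48(t, n):
    """UMVUE of 1/p from Example 4.8: (T + n) / n."""
    return (t + n) / n

def negbin_expectation(g, n, p, t_max=1000):
    """E[g(T)] with P{T = t} = C(n+t-1, t) p^n q^t, t = 0, 1, ...; truncated at t_max."""
    q = 1.0 - p
    return sum(g(t) * math.comb(n + t - 1, t) * p ** n * q ** t
               for t in range(t_max + 1))

n, p = 4, 0.3
print(negbin_expectation(lambda t: umvue_ex48(t, n), n, p), 1 / p)   # both about 3.3333
print(negbin_expectation(lambda t: 1.0, n, p))                        # total probability, about 1
```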
Example 4.9 For a single observation $x$ of $X$, find the UMVUE of $\frac{1}{p}$ of the pmf

$$p_p(x) = \begin{cases} pq^x & x = 0, 1, \ldots \\ 0 & \text{otherwise} \end{cases}$$

The pmf of the random variable is written as

$$p_p(x) = pe^{x\log(1-p)} = c(p)e^{Q(p)t(x)}h(x)$$

where $c(p) = p$, $Q(p) = \log(1 - p)$, $t(x) = x$, $h(x) = 1$.
It is a one parameter exponential family. The statistic $T = X$ is complete and sufficient. The UMVUE of $\frac{1}{p}$ is given by

$$\begin{aligned}
E_p[g(X)] &= \frac{1}{p} \\
\sum_{x=0}^{\infty} g(x)pq^x &= \frac{1}{p} \\
\sum_{x=0}^{\infty} g(x)q^x &= (1 - q)^{-2} = \sum_{x=0}^{\infty}(x + 1)q^x \\
\Rightarrow g(x) &= x + 1
\end{aligned}$$

Thus the UMVUE of $\frac{1}{p}$ is $X + 1$.


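Example 4.10 Let $X_1, X_2, \ldots, X_n$ be iid $N(\theta, 1)$. Prove that $E[X_1 \mid Y] = \bar x$, where $Y = \sum_{i=1}^n X_i$.
The sample mean $\bar X \sim N(\theta, \frac{1}{n})$ and $E_\theta[X_1] = \theta$ $\forall\, \theta \in \Omega$. The pdf of the sample of size $n$ is

$$\begin{aligned}
p_\theta(x_1, x_2, \ldots, x_n) &= \prod_{i=1}^n p(x_i \mid \theta) = \left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac{1}{2}\sum(x_i - \theta)^2} \\
&= \left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac{1}{2}\sum x_i^2 - \frac{n\theta^2}{2} + n\bar x\theta} = c(\theta)e^{Q(\theta)t(x)}h(x)
\end{aligned}$$

where $c(\theta) = e^{-\frac{n\theta^2}{2}}$, $h(x) = \left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac{1}{2}\sum x_i^2}$, $Q(\theta) = \theta$ and $t(x) = \sum x_i$.
It is a one parameter exponential family, so $T = \bar X$ is complete and sufficient. The UMVUE of θ is $g(T)$ with $g(t) = E[X_1 \mid Y]$, where $\delta(T) = X_1$ is an unbiased estimator of θ. Since $(X_1, Y)$ is bivariate normal, the conditional expectation of $X_1$ given $Y = \sum_{i=1}^n X_i$ is a regression line, i.e.,

$$E[X_1 \mid Y] = E_\theta[X_1] + b_{X_1Y}(Y - E_\theta[Y]) \quad \text{where } b_{X_1Y} = \rho\frac{\sigma_{X_1}}{\sigma_Y}$$

and ρ is the correlation coefficient between $X_1$ and $Y$:

$$\rho = \frac{\mathrm{Cov}[X_1, Y]}{\sigma_{X_1}\sigma_Y}, \qquad Y = \sum X_i \sim N(n\theta, n), \quad \sigma_Y = \sqrt{n}, \quad \sigma_{X_1} = 1$$

$$\begin{aligned}
\mathrm{Cov}_\theta[X_1, Y] &= E_\theta[X_1 Y] - E_\theta[X_1]E_\theta[Y] \\
E_\theta[X_1 Y] &= E_\theta\left[X_1\sum_{i=1}^n X_i\right] = E_\theta[X_1^2] + E_\theta[X_1X_2 + \cdots + X_1X_n] \\
&= E_\theta[X_1^2] + E_\theta[X_1]E_\theta[X_2] + \cdots + E_\theta[X_1]E_\theta[X_n] \\
&= 1 + \theta^2 + (n - 1)\theta^2 \quad \text{since } E_\theta[X_1^2] = V_\theta[X_1] + \theta^2 \\
&= n\theta^2 + 1 \\
\mathrm{Cov}_\theta[X_1, Y] &= n\theta^2 + 1 - \theta \cdot n\theta = 1
\end{aligned}$$

$$\rho = \frac{1}{\sqrt{n}} \quad \text{and} \quad b_{X_1Y} = \frac{1}{n}$$

$$E[X_1 \mid Y = y] = E_\theta[X_1] + \frac{1}{n}[y - n\theta] = \theta + \frac{y}{n} - \theta = \bar x$$

So $E[X_1 \mid Y = y] = \bar x$, and $\bar X$ is the UMVUE of θ.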
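Example 4.11 Let $X_1, X_2, \ldots, X_n$ be an iid random sample with pdf

$$p_\theta(x) = \begin{cases} \frac{1}{\theta} & 0 < x < \theta \\ 0 & \text{otherwise} \end{cases}$$

Find the UMVUE of θ.
Let $T = \max_{1 \le i \le n}\{X_i\}$. The pdf of $T$ is

$$p_\theta(t) = \frac{n!}{1!(n-1)!}\left[\int_0^t \frac{1}{\theta}dx\right]^{n-1}\frac{1}{\theta} = \begin{cases} \frac{n}{\theta^n}t^{n-1} & 0 < t < \theta \\ 0 & \text{otherwise} \end{cases}$$

The joint density of $X_1, X_2, \ldots, X_n$ is $p_\theta(x_1, x_2, \ldots, x_n) = \left(\frac{1}{\theta}\right)^n$ for $0 < x_{(n)} < \theta$, and

$$\frac{p_\theta(x_1, x_2, \ldots, x_n)}{p_\theta(t)} = \frac{1/\theta^n}{n t^{n-1}/\theta^n} = \frac{1}{nt^{n-1}}$$

It is independent of θ, so $T = \max_{1 \le i \le n}\{X_i\}$ is a sufficient statistic. For completeness, suppose

$$E_\theta[g(T)] = \int_0^\theta g(t)\frac{n}{\theta^n}t^{n-1}dt = 0 \quad \forall\, \theta > 0 \Rightarrow \int_0^\theta g(t)t^{n-1}dt = 0$$

Differentiating this with respect to θ,

$$\frac{\partial}{\partial\theta}\int_0^\theta g(t)t^{n-1}dt = g(\theta)\theta^{n-1} = 0 \Rightarrow g(\theta) = 0\ \forall\, \theta \Rightarrow g(t) = 0\ \forall\, t > 0$$

Thus $T = t(X)$ is a complete and sufficient statistic. $\delta(T) = 2X_1$ is an unbiased estimator of θ, since $E_\theta[X_1] = \int_0^\theta x_1\frac{1}{\theta}dx_1 = \frac{\theta}{2}$. The UMVUE of θ is given by $g(T)$, where $g(t) = E[2X_1 \mid T = t]$.
When $x_1 = t$, the conditional pmf of $X_1$ given $T = t$ is $p(x_1 \mid t) = \frac{1}{n}$.
When $0 < x_1 < t$, the conditional density of $X_1$ given $T = t$ is

$$p_\theta(x_1 \mid t) = \frac{p_\theta(x_1, t)}{p_\theta(t)} = \frac{\frac{1}{\theta}\cdot\frac{(n-1)t^{n-2}}{\theta^{n-1}}}{\frac{n}{\theta^n}t^{n-1}} = \begin{cases} \frac{n-1}{n}\,\frac{1}{t} & 0 < x_1 < t \\ 0 & \text{otherwise} \end{cases}$$

$$\begin{aligned}
E[2X_1 \mid T = t] &= \int_0^t 2x_1\, p_\theta(x_1 \mid t)dx_1 + 2t\cdot\frac{1}{n} = 2\,\frac{n-1}{n}\,\frac{1}{t}\int_0^t x_1dx_1 + \frac{2t}{n} \\
&= 2\,\frac{n-1}{n}\,\frac{1}{t}\,\frac{t^2}{2} + \frac{2t}{n} = \left(1 + \frac{1}{n}\right)t
\end{aligned}$$

Thus the UMVUE of θ is $\left(\frac{n+1}{n}\right)T$, where $T = \max_{1 \le i \le n}\{X_i\}$.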
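A Monte Carlo sketch comparing the UMVUE $\frac{n+1}{n}T$ with the other unbiased estimator $2\bar X$ for $U(0, \theta)$ (θ, $n$ and the seed are illustrative). Both average to θ; the standard variances are $\frac{\theta^2}{n(n+2)}$ and $\frac{\theta^2}{3n}$, so the UMVUE is noticeably more precise.

```python
import numpy as np

rng = np.random.default_rng(seed=7)   # arbitrary seed for reproducibility
theta, n, reps = 3.0, 5, 200_000

x = rng.uniform(0.0, theta, size=(reps, n))
umvue = (n + 1) / n * x.max(axis=1)       # (n+1)/n * max X_i
moment = 2.0 * x.mean(axis=1)             # 2 * Xbar, also unbiased for theta

print(umvue.mean(), moment.mean())              # both close to theta = 3
print(umvue.var(), theta**2 / (n * (n + 2)))    # about 0.257
print(moment.var(), theta**2 / (3 * n))         # about 0.6
```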
Example 4.12 Let $X_1, X_2, \ldots, X_n$ be a random sample drawn from a distribution with probability density function

$$p_\theta(x) = \begin{cases} e^{-(x-\theta)} & \theta < x < \infty \\ 0 & \text{otherwise} \end{cases}$$

Derive the UMVUE of (i) θ and (ii) $e^\theta$.
Define $T = \min_{1 \le i \le n}\{X_i\}$. The pdf of $T$ is

$$p_\theta(t) = \frac{n!}{1!(n-1)!}e^{-(t-\theta)}\left[\int_t^\infty e^{-(x-\theta)}dx\right]^{n-1} = \begin{cases} ne^{-n(t-\theta)} & \theta < t < \infty \\ 0 & \text{otherwise} \end{cases}$$

For completeness, suppose

$$E_\theta[g(T)] = \int_\theta^\infty g(t)\,ne^{-n(t-\theta)}dt = 0 \ \ \forall\, \theta \Rightarrow \int_\theta^\infty g(t)e^{-nt}dt = 0 \ \ \forall\, \theta$$

since $e^{n\theta} > 0$. Differentiating with respect to θ gives $-g(\theta)e^{-n\theta} = 0$, so $g(t) = 0$ for all $t$ ($\theta < t < \infty$). Thus $T$ is complete. Also $T = \min_{1 \le i \le n}\{X_i\}$ is sufficient.

$$\begin{aligned}
E_\theta[X_1] &= \int_\theta^\infty x_1e^{-(x_1-\theta)}dx_1 = \int_0^\infty(z + \theta)e^{-z}dz \quad \text{where } z = x_1 - \theta \\
&= \int_0^\infty e^{-z}z^{2-1}dz + \theta\int_0^\infty e^{-z}z^{1-1}dz = \Gamma(2) + \theta\,\Gamma(1) = 1 + \theta
\end{aligned}$$

so $E_\theta[X_1 - 1] = \theta$. Taking $\delta(T) = X_1 - 1$, the UMVUE of θ is $g(T)$, where $g(t) = E[(X_1 - 1) \mid T = t]$.
When $x_1 = t$, the conditional pmf of $X_1$ given $T = t$ is $p_\theta(x_1 \mid t) = \frac{1}{n}$.
When $t < x_1 < \infty$, the conditional density of $X_1$ given $T = t$ is

$$p_\theta(x_1 \mid t) = \frac{e^{-(x_1-\theta)}(n-1)e^{-(n-1)(t-\theta)}}{ne^{-n(t-\theta)}} = \frac{n-1}{n}e^{-(x_1-t)}$$

$$\begin{aligned}
E[(X_1 - 1) \mid T = t] &= \frac{n-1}{n}\int_t^\infty(x_1 - 1)e^{-(x_1-t)}dx_1 + \frac{1}{n}(t - 1) \\
&= \frac{n-1}{n}\int_0^\infty(z + t - 1)e^{-z}dz + \frac{1}{n}(t - 1) \quad \text{where } z = x_1 - t \\
&= \frac{n-1}{n}[\Gamma(2) + (t - 1)\Gamma(1)] + \frac{1}{n}(t - 1) \\
&= \frac{n-1}{n}t + \frac{1}{n}(t - 1) = t - \frac{1}{n}
\end{aligned}$$

(i) The UMVUE of θ is $T - \frac{1}{n}$.
(ii) Since $T - \theta$ has density $ne^{-nz}$, $z > 0$, for $n \ge 2$ we have $E_\theta[e^T] = e^\theta\int_0^\infty e^z\,ne^{-nz}dz = \frac{n}{n-1}e^\theta$. Hence $\frac{n-1}{n}e^T$ is an unbiased estimator of $e^\theta$ that is a function of the complete sufficient statistic $T$, and so it is the UMVUE of $e^\theta$. (Substituting the UMVUE of θ, $e^{T - \frac{1}{n}}$, does not give an unbiased estimator of $e^\theta$.)
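A Monte Carlo sketch for the shifted exponential model of Example 4.12 (θ, $n$ and the seed are illustrative). It checks that $T - \frac{1}{n}$ averages to θ, and compares two candidate estimators of $e^\theta$: $\frac{n-1}{n}e^T$, which is unbiased because $E_\theta[e^T] = \frac{n}{n-1}e^\theta$ for $n \ge 2$, and the plug-in $e^{T - 1/n}$, whose mean is $e^{-1/n}\frac{n}{n-1}e^\theta$.

```python
import numpy as np

rng = np.random.default_rng(seed=11)   # arbitrary seed
theta, n, reps = 0.8, 5, 400_000

# X = theta + Exp(1) has density exp(-(x - theta)) on (theta, inf)
x = theta + rng.exponential(scale=1.0, size=(reps, n))
t = x.min(axis=1)

umvue_theta = t - 1.0 / n
umvue_exp = (n - 1) / n * np.exp(t)
plug_in = np.exp(t - 1.0 / n)          # biased upward for this n

print(umvue_theta.mean(), theta)
print(umvue_exp.mean(), plug_in.mean(), np.exp(theta))
```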
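Example 4.13 Let $X_1$ and $X_2$ be a random sample drawn from a population with pdf

$$p_\theta(x) = \begin{cases} \frac{1}{\theta}e^{-\frac{x}{\theta}} & 0 < x < \infty \\ 0 & \text{otherwise} \end{cases}$$

Obtain the UMVUE of θ.

$$p_\theta(x_1, x_2) = \frac{1}{\theta^2}e^{-\frac{1}{\theta}(x_1 + x_2)} = \frac{1}{\theta^2}e^{-\frac{1}{\theta}t} = c(\theta)e^{Q(\theta)t(x)}h(x)$$

where $c(\theta) = \frac{1}{\theta^2}$, $Q(\theta) = -\frac{1}{\theta}$, $t(x) = \sum_{i=1}^2 x_i$, $h(x) = 1$.
It is a one parameter exponential family, so $T = X_1 + X_2$ is complete and sufficient. Let $t = x_1 + x_2$ and $t_1 = x_2$; then $x_1 = t - t_1$ and $x_2 = t_1$, with $\frac{\partial x_1}{\partial t} = 1$, $\frac{\partial x_1}{\partial t_1} = -1$, $\frac{\partial x_2}{\partial t} = 0$, $\frac{\partial x_2}{\partial t_1} = 1$:

$$J = \begin{vmatrix} \frac{\partial x_1}{\partial t} & \frac{\partial x_1}{\partial t_1} \\ \frac{\partial x_2}{\partial t} & \frac{\partial x_2}{\partial t_1} \end{vmatrix} = \begin{vmatrix} 1 & -1 \\ 0 & 1 \end{vmatrix} = 1$$

The joint density of $T$ and $T_1$ is

$$p(t, t_1 \mid \theta) = \begin{cases} \frac{1}{\theta^2}e^{-\frac{1}{\theta}t} & 0 < t_1 < t < \infty \\ 0 & \text{otherwise} \end{cases}$$

The pdf of $T$ is

$$p_\theta(t) = \int_0^t p(t, t_1 \mid \theta)dt_1 = \frac{1}{\theta^2}e^{-\frac{1}{\theta}t}\int_0^t dt_1 = \begin{cases} \frac{1}{\theta^2\Gamma(2)}e^{-\frac{1}{\theta}t}t^{2-1} & 0 < t < \infty \\ 0 & \text{otherwise} \end{cases}$$

The pdf of $T_1$ is

$$p_\theta(t_1) = \int_{t_1}^\infty \frac{1}{\theta^2}e^{-\frac{1}{\theta}t}dt = \frac{1}{\theta^2}\left[\frac{e^{-\frac{1}{\theta}t}}{-\frac{1}{\theta}}\right]_{t_1}^{\infty} = \frac{1}{\theta}e^{-\frac{1}{\theta}t_1}, \quad 0 < t_1 < \infty$$

The conditional density of $T_1$ given $T = t$ is

$$p(t_1 \mid t) = \begin{cases} \frac{1}{t} & 0 < t_1 < t \\ 0 & \text{otherwise} \end{cases}$$

Since $E_\theta[X_2] = \theta$, $\delta(T) = X_2 = T_1$ is an unbiased estimator of θ. Thus the UMVUE of θ is

$$E[T_1 \mid T = t] = \int_0^t t_1\frac{1}{t}dt_1 = \frac{1}{t}\left[\frac{t_1^2}{2}\right]_0^t = \frac{t}{2} = \frac{x_1 + x_2}{2} = \bar x$$

The UMVUE of θ is $\bar X$.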


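Example 4.14 The random variables $X$ and $Y$ have the joint pdf

$$p(x, y \mid \theta) = \begin{cases} \frac{2}{\theta^2}e^{-\frac{1}{\theta}(x+y)} & 0 < x < y < \infty \\ 0 & \text{otherwise} \end{cases}$$

Show that
(i) $E_\theta[Y \mid X = x] = x + \theta$
(ii) $E_\theta[Y] = E_\theta[X + \theta]$ and
(iii) $V_\theta[X + \theta] \le V_\theta[Y]$
The marginal density of $X$ is

$$p(x \mid \theta) = \int_x^\infty \frac{2}{\theta^2}e^{-\frac{x+y}{\theta}}dy = \begin{cases} \frac{2}{\theta}e^{-\frac{2x}{\theta}} & 0 < x < \infty \\ 0 & \text{otherwise} \end{cases}$$

The marginal density of $Y$ is

$$p_\theta(y) = \int_0^y \frac{2}{\theta^2}e^{-\frac{x+y}{\theta}}dx = \begin{cases} \frac{2}{\theta}e^{-\frac{y}{\theta}} - \frac{2}{\theta}e^{-\frac{2y}{\theta}} & 0 < y < \infty \\ 0 & \text{otherwise} \end{cases}$$

The conditional pdf of $Y$ given $X = x$ is

$$p_\theta(y \mid x) = \frac{\frac{2}{\theta^2}e^{-\frac{x+y}{\theta}}}{\frac{2}{\theta}e^{-\frac{2x}{\theta}}} = \begin{cases} \frac{1}{\theta}e^{\frac{x}{\theta}}e^{-\frac{y}{\theta}} & x < y < \infty \\ 0 & \text{otherwise} \end{cases}$$

$$E_\theta[Y \mid X = x] = \int_x^\infty y\,p_\theta(y \mid x)dy = \int_0^\infty (x + u)\frac{1}{\theta}e^{-\frac{u}{\theta}}du = x + \theta \quad \text{where } u = y - x$$

$$E_\theta[Y] = \frac{2}{\theta}\int_0^\infty ye^{-\frac{y}{\theta}}dy - \frac{2}{\theta}\int_0^\infty ye^{-\frac{2y}{\theta}}dy = \frac{2}{\theta}\,\frac{\Gamma(2)}{(\frac{1}{\theta})^2} - \frac{2}{\theta}\,\frac{\Gamma(2)}{(\frac{2}{\theta})^2} = 2\theta - \frac{\theta}{2} = \frac{3}{2}\theta$$

$$E_\theta[Y^2] = \frac{7\theta^2}{2}, \qquad V_\theta[Y] = \frac{7\theta^2}{2} - \frac{9\theta^2}{4} = \frac{5}{4}\theta^2$$

$$E_\theta[X] = \int_0^\infty x\,\frac{2}{\theta}e^{-\frac{2x}{\theta}}dx = \frac{\theta}{2}, \qquad E_\theta[X + \theta] = \frac{\theta}{2} + \theta = \frac{3}{2}\theta = E_\theta[Y]$$

$$V_\theta[X + \theta] = V_\theta[X] = \frac{\theta^2}{4}$$

Thus $V_\theta[X + \theta] = \frac{\theta^2}{4} \le \frac{5}{4}\theta^2 = V_\theta[Y]$.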
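A simulation sketch of Example 4.14. It samples the joint pdf through the factorization $p(x, y) = p(x)\,p(y \mid x)$ derived above: $X$ is exponential with mean $\theta/2$, and given $X$, $Y - X$ is exponential with mean θ. The value of θ and the seed are illustrative. $X + \theta = E[Y \mid X]$ has the same mean as $Y$ but a much smaller variance, as Rao - Blackwell predicts.

```python
import numpy as np

rng = np.random.default_rng(seed=3)   # arbitrary seed
theta, reps = 2.0, 500_000

# X ~ Exp(mean theta/2); given X, Y = X + Exp(mean theta)
x = rng.exponential(scale=theta / 2, size=reps)
y = x + rng.exponential(scale=theta, size=reps)

print(y.mean(), (x + theta).mean(), 1.5 * theta)   # both means about 3.0
print(y.var(), 1.25 * theta**2)                     # about 5.0
print((x + theta).var(), theta**2 / 4)              # about 1.0
```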

Example 4.15 Let $X_1, X_2, \ldots, X_n$ be a sample from the pmf

$$p(x \mid N) = \begin{cases} \frac{1}{N} & x = 1, 2, \ldots, N \text{ and } N \in I_+ \\ 0 & \text{otherwise} \end{cases}$$

Obtain the UMVUE of $N$.
Define $X_{(n)} = \max_{1 \le i \le n}\{X_i\}$. For $x = 1, 2, \ldots, N$,

$$\begin{aligned}
P_N\{X_{(n)} \le x\} &= P_N\{X_1 \le x, X_2 \le x, \ldots, X_n \le x\} = P_N\{X_1 \le x\}\cdots P_N\{X_n \le x\} = \left(\frac{x}{N}\right)^n \\
P_N\{X_{(n)} \le x - 1\} &= \left(\frac{x-1}{N}\right)^n \\
P_N\{X_{(n)} = x\} &= P_N\{X_{(n)} \le x\} - P_N\{X_{(n)} \le x - 1\} = \left(\frac{x}{N}\right)^n - \left(\frac{x-1}{N}\right)^n
\end{aligned}$$

For completeness, suppose that for every $N$,

$$E_N[g(T)] = \sum_{t=1}^{N} g(t)\left[\left(\frac{t}{N}\right)^n - \left(\frac{t-1}{N}\right)^n\right] = 0$$

When $N = 1$, $g(1)[1 - 0] = 0 \Rightarrow g(1) = 0$.
When $N = 2$, $g(1)\left(\frac{1}{2}\right)^n + g(2)\left[1 - \left(\frac{1}{2}\right)^n\right] = 0$; since $g(1) = 0$, $g(2)\left[1 - \frac{1}{2^n}\right] = 0 \Rightarrow g(2) = 0$, and so on. Hence $g(t) = 0$ $\forall\, t = 1, 2, \ldots, N$, and the statistic $X_{(n)}$ is complete.
Consider $P_N\{X_1 = x_1 \mid X_{(n)} = x\}$. If $x_1 = 1, 2, \ldots, (x - 1)$, i.e., $x_1 \ne x$, the maximum $x$ must be attained among the other $n - 1$ observations:

$$P_N\{X_1 = x_1 \mid X_{(n)} = x\} = \frac{P_N\{X_1 = x_1 \cap X_{(n)} = x\}}{P_N\{X_{(n)} = x\}} = \frac{\left[\left(\frac{x}{N}\right)^{n-1} - \left(\frac{x-1}{N}\right)^{n-1}\right]\frac{1}{N}}{\left(\frac{x}{N}\right)^n - \left(\frac{x-1}{N}\right)^n} = \frac{x^{n-1} - (x-1)^{n-1}}{x^n - (x-1)^n}$$

If $x_1 = x$,

$$P_N\{X_1 = x \mid X_{(n)} = x\} = \frac{\left(\frac{x}{N}\right)^{n-1}\frac{1}{N}}{\left(\frac{x}{N}\right)^n - \left(\frac{x-1}{N}\right)^n} = \frac{x^{n-1}}{x^n - (x-1)^n}$$

These conditional probabilities are free of $N$; thus $X_{(n)}$ is a sufficient statistic.

$$E_N[X_1] = \sum_{x_1=1}^{N} x_1\frac{1}{N} = \frac{1}{N}\,\frac{N(N+1)}{2} = \frac{N+1}{2}, \qquad E_N[2X_1] = N + 1, \qquad E_N[2X_1 - 1] = N$$

∴ $\delta(T) = 2X_1 - 1$ is an unbiased estimator of $N$. The UMVUE of $N$ is given by

$$\begin{aligned}
E[(2X_1 - 1) \mid X_{(n)} = x] &= \sum_{x_1=1}^{x-1}(2x_1 - 1)P_N\{X_1 = x_1 \mid X_{(n)} = x\} + (2x - 1)P_N\{X_1 = x \mid X_{(n)} = x\} \\
&= \frac{x^{n-1} - (x-1)^{n-1}}{x^n - (x-1)^n}\sum_{x_1=1}^{x-1}(2x_1 - 1) + \frac{x^{n-1}}{x^n - (x-1)^n}(2x - 1) \\
&= \frac{x^{n-1}}{x^n - (x-1)^n}[1 + 3 + 5 + \cdots + (2x - 1)] - \frac{(x-1)^{n-1}}{x^n - (x-1)^n}[1 + 3 + \cdots + (2x - 3)]
\end{aligned}$$

Now

$$\begin{aligned}
1 + 3 + \cdots + (2x - 3) + (2x - 1) &= [1 + 2 + \cdots + (2x - 1) + 2x] - (2 + 4 + \cdots + 2x) \\
&= \frac{2x(2x + 1)}{2} - 2\times\frac{x(x + 1)}{2} = x(2x + 1) - x(x + 1) = x^2 \\
1 + 3 + \cdots + (2x - 3) &= [1 + 2 + \cdots + (2x - 2)] - [2 + 4 + \cdots + (2x - 2)] \\
&= \frac{(2x - 2)(2x - 1)}{2} - \frac{2(x - 1)x}{2} = (2x - 1)(x - 1) - x(x - 1) = (x - 1)^2
\end{aligned}$$

Therefore

$$E[2X_1 - 1 \mid X_{(n)} = x] = \frac{x^{n-1}}{x^n - (x-1)^n}x^2 - \frac{(x-1)^{n-1}}{x^n - (x-1)^n}(x-1)^2 = \frac{x^{n+1} - (x-1)^{n+1}}{x^n - (x-1)^n}$$

Thus the UMVUE of $N$ is $\dfrac{X_{(n)}^{n+1} - (X_{(n)} - 1)^{n+1}}{X_{(n)}^n - (X_{(n)} - 1)^n}$.
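An exact check of Example 4.15 with rational arithmetic: averaging the UMVUE over $P\{X_{(n)} = x\} = \frac{x^n - (x-1)^n}{N^n}$ gives exactly $N$, because the sum telescopes to $N^{n+1}/N^n$. The values of $N$ and $n$ are illustrative; for $n = 1$ the estimator reduces to $2X - 1$. Helper names are hypothetical.

```python
from fractions import Fraction

def umvue_N(x, n):
    """UMVUE of N from Example 4.15, evaluated at X_(n) = x."""
    return Fraction(x ** (n + 1) - (x - 1) ** (n + 1), x ** n - (x - 1) ** n)

def expected_umvue(N, n):
    """Exact E_N[g(X_(n))] using P{X_(n) = x} = (x^n - (x-1)^n) / N^n."""
    return sum(umvue_N(x, n) * Fraction(x ** n - (x - 1) ** n, N ** n)
               for x in range(1, N + 1))

for N in (1, 2, 7, 20):
    for n in (1, 3, 5):
        print(N, n, expected_umvue(N, n))   # equals N every time
```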
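Remark 4.4 The family in Example 3.14 of Chapter 3 is not complete, but it is bounded complete. The class of unbiased estimators of zero is

$$U_0 = \{g(X) \mid c \in \mathbb{R}\}$$

where

$$g(x) = \begin{cases} c(-1)^{x-1} & x = 1, 2 \\ 0 & x = 3, 4, \ldots, N;\ N = 2, 3, \ldots \end{cases}$$

By Theorem 4.1, $\mathrm{Cov}_N[\delta(T), g(X)] = 0$ for $N = 2, 3, \ldots$ implies that $\delta(T)$ is a UMVUE of its expectation, where $T = t(X)$. That is,

$$\begin{aligned}
E_N[\delta(t(X))g(X)] &= 0, \quad N = 2, 3, \ldots,\ \forall\, c \in \mathbb{R} \\
\sum_{x=1}^{N}\delta(t(x))g(x)\frac{1}{N} &= 0, \quad N = 2, 3, \ldots,\ \forall\, c \in \mathbb{R} \\
\Rightarrow \sum_{x=1}^{N}\delta(t(x))g(x) &= 0, \quad N = 2, 3, \ldots,\ \forall\, c \in \mathbb{R} \\
\text{i.e., } \delta(t(1))c - \delta(t(2))c &= 0 \quad \forall\, c \in \mathbb{R}
\end{aligned}$$

Taking $c = 1$ gives $\delta(t(1)) = \delta(t(2))$.

∴ Any estimator $\delta(T)$ such that $\delta(t(1)) = \delta(t(2))$ is a UMVUE of its expectation $E_N[\delta(T)]$ (in particular of $N$, when it is unbiased for $N$), provided $E_N[\delta^2(T)] < \infty$ for $N = 2, 3, \ldots$ Thus, when a family of distributions is bounded complete but not complete, a whole class of UMVUEs can still be identified.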
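Example 4.16 Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a distribution with pdf

$$p_\theta(x) = \begin{cases} \frac{1}{\theta}e^{-\frac{x}{\theta}} & 0 < x < \infty,\ \theta > 0 \\ 0 & \text{otherwise} \end{cases}$$

Obtain the UMVUE of $P_\theta\{X \ge 2\}$.
The joint pdf of the sample of size $n$ is

$$p(x_1, x_2, \ldots, x_n \mid \theta) = \frac{1}{\theta^n}e^{-\frac{1}{\theta}\sum_{i=1}^n x_i} = c(\theta)e^{Q(\theta)t(x)}h(x)$$

It is a one parameter exponential family. The statistic $T = \sum_{i=1}^n X_i$ is complete and sufficient.

$$P_\theta\{X \ge 2\} = 1 - P_\theta\{X < 2\} = 1 - \int_0^2 \frac{1}{\theta}e^{-\frac{x}{\theta}}dx = e^{-\frac{2}{\theta}}$$

$$E_\theta[X_1] = \int_0^\infty \frac{1}{\theta}e^{-\frac{x_1}{\theta}}x_1^{2-1}dx_1 = \theta$$

Let $T = \sum_{i=1}^n X_i$; then $T \sim G(n, \theta)$:

$$p_\theta(t) = \begin{cases} \frac{1}{\theta^n\Gamma(n)}e^{-\frac{1}{\theta}t}t^{n-1} & t > 0 \\ 0 & \text{otherwise} \end{cases}$$

Let $Y = \sum_{i=2}^n X_i \sim G(n - 1, \theta)$; then the joint probability density of $(X_1, Y)$ is

$$\begin{aligned}
p_\theta(x_1, y) &= p_\theta(x_1)p_\theta(y) = \frac{1}{\theta}e^{-\frac{1}{\theta}x_1}\,\frac{1}{\theta^{n-1}\Gamma(n-1)}e^{-\frac{1}{\theta}y}y^{n-2} \\
&= \frac{1}{\theta^n\Gamma(n-1)}e^{-\frac{1}{\theta}\sum_{i=1}^n x_i}y^{n-2} = \frac{1}{\theta^n\Gamma(n-1)}e^{-\frac{1}{\theta}t}[t - x_1]^{n-2} \quad \text{where } y = t - x_1
\end{aligned}$$

$$p_\theta(x_1, t) = \begin{cases} \frac{1}{\theta^n\Gamma(n-1)}e^{-\frac{1}{\theta}t}[t - x_1]^{n-2} & 0 < x_1 \le t < \infty \\ 0 & \text{otherwise} \end{cases}$$

$$p_\theta(x_1 \mid t) = \frac{\frac{1}{\theta^n\Gamma(n-1)}e^{-\frac{1}{\theta}t}[t - x_1]^{n-2}}{\frac{1}{\theta^n\Gamma(n)}e^{-\frac{1}{\theta}t}t^{n-1}} = (n - 1)[t - x_1]^{n-2}\frac{1}{t^{n-1}} = \begin{cases} (n - 1)\frac{1}{t}\left[1 - \frac{x_1}{t}\right]^{n-2} & 0 < x_1 < t \\ 0 & \text{otherwise} \end{cases}$$

The UMVUE of θ is

$$E[X_1 \mid T = t] = \int_0^t x_1\frac{n-1}{t}\left[1 - \frac{x_1}{t}\right]^{n-2}dx_1$$

Take $z = \frac{x_1}{t}$; then $dx_1 = t\,dz$, and $x_1 = 0 \Rightarrow z = 0$, $x_1 = t \Rightarrow z = 1$:

$$\begin{aligned}
E[X_1 \mid T = t] &= \frac{n-1}{t}\int_0^1 (tz)(1 - z)^{n-2}t\,dz = (n - 1)t\int_0^1 (1 - z)^{n-1-1}z^{2-1}dz \\
&= (n - 1)t\,\frac{\Gamma(2)\Gamma(n-1)}{\Gamma(n - 1 + 2)} = \frac{t}{n} = \frac{n\bar x}{n} = \bar x
\end{aligned}$$

The UMVUE of $P_\theta\{X \ge 2\} = e^{-2/\theta}$ is not obtained by substituting $\bar X$ for θ, since $e^{-2/\bar X}$ is not an unbiased estimator. Instead take $\delta = I(X_1 \ge 2)$, the indicator of the event $\{X_1 \ge 2\}$, which is unbiased for $P_\theta\{X \ge 2\}$. Using the same conditional density, for $t > 2$,

$$E[I(X_1 \ge 2) \mid T = t] = \int_2^t \frac{n-1}{t}\left[1 - \frac{x_1}{t}\right]^{n-2}dx_1 = \left(1 - \frac{2}{t}\right)^{n-1}$$

and it equals 0 for $t \le 2$. Hence the UMVUE of $P_\theta\{X \ge 2\}$ is $\left(1 - \frac{2}{T}\right)^{n-1}$ if $T > 2$ and 0 otherwise.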
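A Monte Carlo sketch for estimating $P_\theta\{X \ge 2\} = e^{-2/\theta}$ in the exponential model (θ, $n$ and the seed are illustrative). It evaluates $\left(1 - \frac{2}{T}\right)^{n-1}$ for $T > 2$ and 0 otherwise, which is what conditioning the indicator $I(X_1 \ge 2)$ on $T$ with the conditional density $\frac{n-1}{t}\left[1 - \frac{x_1}{t}\right]^{n-2}$ gives; its average should match the target. The plug-in $e^{-2/\bar X}$ is printed only for comparison.

```python
import numpy as np

rng = np.random.default_rng(seed=5)   # arbitrary seed
theta, n, reps = 2.0, 5, 400_000

x = rng.exponential(scale=theta, size=(reps, n))   # pdf (1/theta) e^{-x/theta}
t = x.sum(axis=1)

# (1 - 2/T)^(n-1) when T > 2, else 0; clipping at 0 handles T <= 2
umvue = np.clip(1.0 - 2.0 / t, 0.0, None) ** (n - 1)
plug_in = np.exp(-2.0 / x.mean(axis=1))

print(np.exp(-2.0 / theta))            # target e^{-1}, about 0.368
print(umvue.mean(), plug_in.mean())
```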
Example 4.17 Let X1 , X2 , · · · , Xn be a random sample from N (θ, σ 2 ) . Both
θ and σ are unknown. Find the UMVUE of σ and pth quantile.
(n−1)S 2 X̄)2
P
Let Y = σ2 = (Xσi − 2 ∼ χ2 distribution with (n − 1) degrees of
freedom. Y ∼ G( 12 , (n−1)
2 ).
( 1 n−1
n−1
1
e− 2 y y 2 −1 0<y<∞
p(y) = 2 2 Γ n−1
2
0 otherwise
√ Z ∞
1 1 n
E[ Y ] = n−1 e− 2 y y 2 −1 dy
0 2 Γ n−1
2
2
1 Γ n2
= n
( 12 ) 2
n−1
2 2 Γ n−1
2
"r #
n−1 2 Γ n2 √
i.e., Eσ S = 2
σ2 Γ n−1
2
Γ n2 √ σ
→ Eσ [S] = 2√
Γ n−1
2 n −1
1 Γ n−1
q
2
= σ where k(n) = Γn
2
n −1
k(n) 2

114
Probability Models and their Parametric Estimation

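Thus $k(n)S$ is an unbiased estimator of $\sigma$.
$$\begin{aligned}
p(x_1, x_2, \cdots, x_n \mid \theta, \sigma) &= \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n e^{-\frac{1}{2\sigma^2}\left[\sum x_i^2 - 2\theta\sum x_i + n\theta^2\right]}\\
&= \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n e^{-\frac{1}{2\sigma^2}\sum x_i^2}\, e^{\frac{\theta}{\sigma^2}\sum x_i}\, e^{-\frac{n\theta^2}{2\sigma^2}}\\
&= c(\theta_1, \theta_2)\, e^{\sum_{j=1}^{2} Q_j(\theta_1, \theta_2)\, t_j(x)}\, h(x) \quad\text{where } \theta_1 = \theta,\ \theta_2 = \sigma
\end{aligned}$$
and $c(\theta_1, \theta_2) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n e^{-\frac{n\theta^2}{2\sigma^2}}$, $Q_1(\theta_1, \theta_2) = -\frac{1}{2\sigma^2}$, $Q_2(\theta_1, \theta_2) = \frac{\theta}{\sigma^2}$, $t_1(x) = \sum x_i^2$, $t_2(x) = \sum x_i$ and $h(x) = 1$. Hence $T = (T_1, T_2) = \left(\sum X_i^2, \sum X_i\right)$ is jointly sufficient and complete. Since $(\bar X, S^2)$ is a one-to-one function of $T$, it is also jointly sufficient and complete. Thus the UMVUE of $\sigma$ is $k(n)S$, where $S^2 = \frac{1}{n-1}\sum (X_i - \bar X)^2$.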
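The $p$th quantile $\delta_p$ is given by
$$\begin{aligned}
p &= P_{\theta,\sigma}\{X \le \delta_p\} = P_{\theta,\sigma}\left\{\frac{X - \theta}{\sigma} \le \frac{\delta_p - \theta}{\sigma}\right\}\\
&= P\left\{Z \le \frac{\delta_p - \theta}{\sigma}\right\} \quad\text{where } Z = \frac{X - \theta}{\sigma} \sim N(0, 1)\\
&= \int_{-\infty}^{\frac{\delta_p - \theta}{\sigma}} p(z)\,dz
\end{aligned}$$
i.e.,
$$1 - p = \int_{\frac{\delta_p - \theta}{\sigma}}^{\infty} p(z)\,dz \;\Rightarrow\; \frac{\delta_p - \theta}{\sigma} = z_{1-p} \;\Rightarrow\; \delta_p = z_{1-p}\,\sigma + \theta$$
where $z_{1-p}$ is the point of $N(0, 1)$ with upper-tail area $1 - p$. Since $\bar X$ and $k(n)S$ are unbiased for $\theta$ and $\sigma$ and are functions of the complete sufficient statistic, the UMVUE of $\delta_p$ is $z_{1-p}\,k(n)S + \bar X$.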
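The estimators of Example 4.17 can be sketched in Python as follows, assuming Python 3.8+ for `statistics.NormalDist`. The function names `k` and `umvue_quantile` and the sample values are hypothetical. `statistics.stdev` uses divisor $n - 1$, matching $S$ above, and `NormalDist().inv_cdf(p)` returns the point with lower-tail area $p$, which is $z_{1-p}$ in the notation of the text.

```python
import math
import statistics


def k(n):
    """Unbiasing constant k(n) = sqrt((n-1)/2) * Gamma((n-1)/2) / Gamma(n/2), so E[k(n) S] = sigma.

    Uses math.lgamma (log-gamma) so that large n does not overflow.
    """
    return math.sqrt((n - 1) / 2) * math.exp(math.lgamma((n - 1) / 2) - math.lgamma(n / 2))


def umvue_quantile(xs, p):
    """Estimate delta_p = z_{1-p} sigma + theta by z_{1-p} k(n) S + xbar.

    NormalDist().inv_cdf(p) is the standard normal point with lower-tail area p,
    i.e. z_{1-p} in the notation of the text; statistics.stdev uses divisor n - 1.
    """
    z = statistics.NormalDist().inv_cdf(p)
    return z * k(len(xs)) * statistics.stdev(xs) + statistics.mean(xs)


sample = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7]   # illustrative data, not from the text
print(k(2), k(10), k(1000))               # decreases towards 1 as n grows
print(umvue_quantile(sample, 0.9))
```

For $p = 0.5$ the estimate reduces to $\bar X$, since $z_{0.5} = 0$.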

4.6 Inequality Approach


Under some regularity conditions, the Cramer - Rao inequality provides a lower bound for the variance of unbiased estimators. It may enable us to judge whether a given unbiased estimator is the UMVUE or not: if the variance of an unbiased estimator coincides with the Cramer - Rao lower bound for every $\theta \in \Omega$, then the estimator is the UMVUE.
Covariance Inequality
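Theorem 4.7 Let $T = t(X)$ be a statistic with pdf $p_\theta(t)$ and let $\psi(X, \theta)$ be a function of $X$ and $\theta$, both with finite second moments and with $V_\theta[\psi(X, \theta)] > 0$. The covariance inequality between $T$ and $\psi(X, \theta)$ is
$$V_\theta[T] \ge \frac{\{Cov_\theta[T, \psi(X, \theta)]\}^2}{V_\theta[\psi(X, \theta)]} \quad \forall\,\theta \in \Omega$$
Proof: The Cauchy - Schwarz inequality between two random variables $X$ and $Y$ is
$$\{E[X - E[X]][Y - E[Y]]\}^2 \le E[X - E[X]]^2\,E[Y - E[Y]]^2$$
i.e.,
$$(Cov[X, Y])^2 \le V[X]\,V[Y]$$
Now replace $X$ by $T$ and $Y$ by $\psi(X, \theta)$:
$$(Cov_\theta[T, \psi(X, \theta)])^2 \le V_\theta[T]\,V_\theta[\psi(X, \theta)]$$
$$V_\theta[T] \ge \frac{\{Cov_\theta[T, \psi(X, \theta)]\}^2}{V_\theta[\psi(X, \theta)]} \quad \forall\,\theta \in \Omega$$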
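A small exact check of Theorem 4.7, as an illustration under assumed values: for a hypothetical four-point distribution, the code below computes $V[T]$ and the bound $\{Cov[T, \psi]\}^2 / V[\psi]$ with rational arithmetic. Equality holds when $\psi$ is an affine function of $T$, which is the equality case of the Cauchy - Schwarz inequality.

```python
from fractions import Fraction

# A small discrete distribution for X (illustrative values, not from the text)
SUPPORT = [0, 1, 2, 3]
PROBS = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]


def expect(g):
    """E[g(X)] as an exact rational number."""
    return sum(p * g(x) for x, p in zip(SUPPORT, PROBS))


def cov(f, g):
    """Cov[f(X), g(X)] = E[f g] - E[f] E[g]."""
    return expect(lambda x: f(x) * g(x)) - expect(f) * expect(g)


def t(x):
    """The statistic T = t(X) = X^2."""
    return x * x


def psi(x):
    """An arbitrary psi(X) with positive variance."""
    return x - 1


def psi_affine(x):
    """A psi that is an affine function of T: the equality case."""
    return 3 * x * x + 1


var_t = cov(t, t)
bound = cov(t, psi) ** 2 / cov(psi, psi)
bound_eq = cov(t, psi_affine) ** 2 / cov(psi_affine, psi_affine)
print(var_t, bound, bound_eq)
```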

Fisher Measure of Information

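Definition 4.1 Let $P_\theta$, $\theta \in \Omega$ be the distribution of the random variable $X$. The function
$$\psi(x, \theta) = \frac{\partial p_\theta(x)}{\partial\theta}\,\frac{1}{p_\theta(x)} = \frac{\partial \log p_\theta(x)}{\partial\theta}$$
is the relative rate at which the density $p_\theta(x)$ changes at $x$. The average of the square of this rate is defined by
$$I(\theta) = E_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta}\right]^2 = \int \left[\frac{p'_\theta(x)}{p_\theta(x)}\right]^2 p_\theta(x)\,dx$$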
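For a distribution with finite support the expectation in Definition 4.1 is a finite sum, so $I(\theta)$ can be computed exactly. A minimal sketch for the Bernoulli($\theta$) distribution (our choice of example; the function name is hypothetical), where the known answer is $I(\theta) = 1/[\theta(1-\theta)]$:

```python
from fractions import Fraction


def bernoulli_information(theta):
    """I(theta) = E[(d/dtheta log p_theta(X))^2] for X ~ Bernoulli(theta).

    p_theta(x) = theta^x (1 - theta)^(1 - x), x in {0, 1}, so the score is
    x/theta - (1 - x)/(1 - theta). The expectation is an exact finite sum.
    """
    total = Fraction(0)
    for x in (0, 1):
        p = theta if x == 1 else 1 - theta
        score = Fraction(x) / theta - Fraction(1 - x) / (1 - theta)
        total += score ** 2 * p
    return total


theta = Fraction(1, 4)
print(bernoulli_information(theta), 1 / (theta * (1 - theta)))
```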

Likelihood Function
Definition 4.2 Consider a random sample $X_1, X_2, \cdots, X_n$ from a distribution having pdf $p_\theta(x)$, $\theta \in \Omega$. The joint probability density function of $X_1, X_2, \cdots, X_n$ with parameter $\theta$ is $p(x_1, x_2, \cdots, x_n \mid \theta)$. Regarded as a function of $\theta$ for the observed sample, the joint probability density function is called the likelihood function of the sample and is denoted by $L(\theta) = p_\theta(x_1, x_2, \cdots, x_n)$, $\theta \in \Omega$.
Property 4.1 Let $I_X(\theta)$ and $I_Y(\theta)$ be the amounts of information in two independent samples $(X_1, X_2, \cdots, X_n)$ and $(Y_1, Y_2, \cdots, Y_n)$ respectively, and let $I_{XY}(\theta)$ be the amount of information in the joint sample $(X_1, Y_1), (X_2, Y_2), \cdots, (X_n, Y_n)$. Then $I_{XY}(\theta) = I_X(\theta) + I_Y(\theta)$. This is known as the additive property of Fisher's measure of information.

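Proof:
$$\begin{aligned}
L_{XY}(\theta) &= p_\theta(x_1, y_1) \cdots p_\theta(x_n, y_n) = p_\theta(x_1)\,p_\theta(y_1) \cdots p_\theta(x_n)\,p_\theta(y_n)\\
&= \prod_{i=1}^n p_\theta(x_i)\prod_{i=1}^n p_\theta(y_i) = L_X(\theta)\,L_Y(\theta)\\
\log L_{XY}(\theta) &= \log L_X(\theta) + \log L_Y(\theta)
\end{aligned}$$
Differentiate this with respect to $\theta$:
$$\frac{\partial \log L_{XY}(\theta)}{\partial\theta} = \frac{\partial \log L_X(\theta)}{\partial\theta} + \frac{\partial \log L_Y(\theta)}{\partial\theta}$$
Since the two samples are independent, the two scores on the right are independent, so the variance of the sum is the sum of the variances:
$$V_\theta\left[\frac{\partial \log L_{XY}(\theta)}{\partial\theta}\right] = V_\theta\left[\frac{\partial \log L_X(\theta)}{\partial\theta}\right] + V_\theta\left[\frac{\partial \log L_Y(\theta)}{\partial\theta}\right]$$
$$I_{XY}(\theta) = I_X(\theta) + I_Y(\theta)$$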

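Property 4.2 Let $X_1, X_2, \cdots, X_n$ be an iid random sample drawn from a population with density function $p_\theta(x)$, $\theta \in \Omega$. Let $I(\theta)$ be the amount of information in each $X_i$, $i = 1, 2, \cdots, n$. Then the amount of information in $(X_1, X_2, \cdots, X_n)$ is $nI(\theta)$.
Proof: The likelihood function for $\theta$ of the sample of size $n$ is
$$L(\theta) = \prod_{i=1}^n p_\theta(x_i), \qquad \log L(\theta) = \sum_{i=1}^n \log p_\theta(x_i), \qquad \frac{\partial \log L(\theta)}{\partial\theta} = \sum_{i=1}^n \frac{\partial \log p_\theta(x_i)}{\partial\theta}$$
$$V_\theta\left[\frac{\partial \log L(\theta)}{\partial\theta}\right] = V_\theta\left[\sum_{i=1}^n \frac{\partial \log p_\theta(X_i)}{\partial\theta}\right] = \sum_{i=1}^n V_\theta\left[\frac{\partial \log p_\theta(X_i)}{\partial\theta}\right] \;\text{ since the } X_i\text{'s are iid}$$
$$= \sum_{i=1}^n I(\theta) = nI(\theta)$$
The amount of information in $X_1, X_2, \cdots, X_n$ is $nI(\theta)$, where
$$I(\theta) = V_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta}\right] \quad \forall\,\theta \in \Omega$$
is the amount of information in a single observation $x$ of $X$.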
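Property 4.2 can be checked by brute force in a small case. This sketch (the Bernoulli model and the function name `sample_information` are illustrative choices) enumerates all $2^n$ outcomes of $n$ iid Bernoulli($\theta$) observations and computes the variance of the sample score exactly; it equals $n$ times the single-observation information $1/[\theta(1-\theta)]$.

```python
from fractions import Fraction
from itertools import product


def sample_information(theta, n):
    """Variance of the sample score d/dtheta log L(theta) for n iid Bernoulli(theta).

    Enumerates all 2^n outcomes with exact rational arithmetic.
    """
    mean = Fraction(0)
    second = Fraction(0)
    for xs in product((0, 1), repeat=n):
        s = sum(xs)
        prob = theta ** s * (1 - theta) ** (n - s)
        score = sum(Fraction(x) / theta - Fraction(1 - x) / (1 - theta) for x in xs)
        mean += prob * score            # ends up as 0: the score has mean zero
        second += prob * score ** 2
    return second - mean ** 2


theta, n = Fraction(1, 3), 5
single = 1 / (theta * (1 - theta))      # I(theta) for one observation
print(sample_information(theta, n), n * single)
```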


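Property 4.3 Let $X_1, X_2, \cdots, X_n$ be an iid random sample drawn from a population with density function $p(x \mid \theta)$, $\theta \in \Omega$. Let $I_X(\theta)$ be the amount of information in the sample $X_1, X_2, \cdots, X_n$ and $I_T(\theta)$ be the amount of information in the statistic $T = t(X)$. Then $I_X(\theta) \ge I_T(\theta)$. If $T = t(X)$ is sufficient, then $I_X(\theta) = I_T(\theta)$.
Proof: Assume $\frac{\partial}{\partial\theta}\int p_\theta(x)\,dx = \int \frac{\partial p_\theta(x)}{\partial\theta}\,dx$. For a single observation $x$ of $X$
$$E_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta}\right] = \int \frac{\partial \log p_\theta(x)}{\partial\theta}\,p_\theta(x)\,dx = \int \frac{\partial p_\theta(x)}{\partial\theta}\,\frac{1}{p_\theta(x)}\,p_\theta(x)\,dx = \int \frac{\partial p_\theta(x)}{\partial\theta}\,dx = \frac{\partial}{\partial\theta}\int p_\theta(x)\,dx = 0$$
and in the same way $E_\theta\left[\frac{\partial \log p_\theta(T)}{\partial\theta}\right] = 0$. Write $p_\theta(x) = p_\theta(t)\,p_\theta(x \mid t)$, where $p_\theta(x \mid t)$ is the conditional density of $X$ given $T = t$. The score of the conditional density has conditional mean zero given $T$, so
$$E_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta}\,\Big|\,T\right] = \frac{\partial \log p_\theta(T)}{\partial\theta} \quad\text{and hence}\quad E_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta}\,\frac{\partial \log p_\theta(T)}{\partial\theta}\right] = E_\theta\left[\frac{\partial \log p_\theta(T)}{\partial\theta}\right]^2 = I_T(\theta)$$
Consider
$$E_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta} - \frac{\partial \log p_\theta(T)}{\partial\theta}\right]^2 \ge 0$$
$$E_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta}\right]^2 + E_\theta\left[\frac{\partial \log p_\theta(T)}{\partial\theta}\right]^2 - 2E_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta}\,\frac{\partial \log p_\theta(T)}{\partial\theta}\right] \ge 0$$
$$I_X(\theta) + I_T(\theta) - 2I_T(\theta) \ge 0 \;\Rightarrow\; I_X(\theta) - I_T(\theta) \ge 0 \;\Rightarrow\; I_X(\theta) \ge I_T(\theta)$$
Suppose $T = t(X)$ is a sufficient statistic; then $p_\theta(x) = p_\theta(t)\,h(x)$, where $h(x)$ does not depend on $\theta$, and
$$\log p_\theta(x) = \log p_\theta(t) + \log h(x)$$
Differentiate this with respect to $\theta$:
$$\frac{\partial \log p_\theta(x)}{\partial\theta} = \frac{\partial \log p_\theta(t)}{\partial\theta} \;\Rightarrow\; V_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta}\right] = V_\theta\left[\frac{\partial \log p_\theta(T)}{\partial\theta}\right] \;\Rightarrow\; I_X(\theta) = I_T(\theta)$$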

4.7 Cramer - Rao Inequality


When a UMVUE does not exist, one may be interested in a Locally Minimum Variance Unbiased Estimator (LMVUE), which has the smallest variance that an unbiased estimator can achieve at a given point $\theta = \theta_0$. Lower bounds for the variance are also helpful for measuring the performance of a given unbiased estimator, even when those bounds are not sharp. The Cramer - Rao inequality gives such a lower bound for the variance of an unbiased estimator and is very simple to calculate. It also serves as the benchmark for asymptotically efficient estimators. The assumptions of the Cramer - Rao inequality are
(i) $\Omega$ is an open interval (finite, infinite or semi infinite).
(ii) The range of the distribution $P_\theta$, i.e. the set $\{x : p_\theta(x) > 0\}$, is independent of the parameter $\theta$.
(iii) For any $x$ and $\theta$ the derivative $\frac{\partial p_\theta(x)}{\partial\theta}$ exists and is finite.
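Theorem 4.8 Suppose the assumptions (i), (ii) and (iii) hold and that $I(\theta) > 0$. Let $T = t(X)$ be any statistic with $E_\theta[T^2] < \infty$ for which the derivative with respect to $\theta$ of $E_\theta[T] = \int t\,p_\theta(x)\,dx$ exists and can be obtained by differentiating under the integral sign. Then
$$V_\theta[T] \ge \frac{\left[\frac{\partial E_\theta[T]}{\partial\theta}\right]^2}{I(\theta)} \quad \forall\,\theta \in \Omega$$
where
$$I(\theta) = E_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta}\right]^2 = -E_\theta\left[\frac{\partial^2 \log p_\theta(X)}{\partial\theta^2}\right]$$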

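Proof: Suppose the assumptions hold for a single observation $x$ of $X$ and that $\int p_\theta(x)\,dx = 1$ can be differentiated twice under the integral sign with respect to $\theta$. Then
$$\int \frac{\partial p_\theta(x)}{\partial\theta}\,dx = 0$$
$$\int \frac{\partial p_\theta(x)}{\partial\theta}\,\frac{1}{p_\theta(x)}\,p_\theta(x)\,dx = 0$$
$$\int \frac{\partial \log p_\theta(x)}{\partial\theta}\,p_\theta(x)\,dx = 0 \qquad (4.1)$$
$$\Rightarrow\; E_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta}\right] = 0$$
Differentiate the equation (4.1) with respect to $\theta$:
$$\int \frac{\partial^2 \log p_\theta(x)}{\partial\theta^2}\,p_\theta(x)\,dx + \int \frac{\partial \log p_\theta(x)}{\partial\theta}\,\frac{\partial p_\theta(x)}{\partial\theta}\,dx = 0$$
$$\int \frac{\partial^2 \log p_\theta(x)}{\partial\theta^2}\,p_\theta(x)\,dx + \int \frac{\partial \log p_\theta(x)}{\partial\theta}\,\frac{\partial \log p_\theta(x)}{\partial\theta}\,p_\theta(x)\,dx = 0$$
$$\int \frac{\partial^2 \log p_\theta(x)}{\partial\theta^2}\,p_\theta(x)\,dx + \int \left[\frac{\partial \log p_\theta(x)}{\partial\theta}\right]^2 p_\theta(x)\,dx = 0$$
$$E_\theta\left[\frac{\partial^2 \log p_\theta(X)}{\partial\theta^2}\right] + E_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta}\right]^2 = 0$$
$$E_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta}\right]^2 = -E_\theta\left[\frac{\partial^2 \log p_\theta(X)}{\partial\theta^2}\right]$$
But, since the score has mean zero,
$$I(\theta) = E_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta}\right]^2 = -E_\theta\left[\frac{\partial^2 \log p_\theta(X)}{\partial\theta^2}\right] = V_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta}\right]$$
Now $E_\theta[T] = \int t\,p_\theta(x)\,dx$. Differentiate this with respect to $\theta$:
$$\begin{aligned}
\frac{\partial E_\theta[T]}{\partial\theta} &= \int t\,\frac{\partial p_\theta(x)}{\partial\theta}\,dx = \int t\,\frac{\partial p_\theta(x)}{\partial\theta}\,\frac{1}{p_\theta(x)}\,p_\theta(x)\,dx = \int t\,\frac{\partial \log p_\theta(x)}{\partial\theta}\,p_\theta(x)\,dx\\
&= E_\theta\left[T\,\frac{\partial \log p_\theta(X)}{\partial\theta}\right] = Cov_\theta\left[T, \frac{\partial \log p_\theta(X)}{\partial\theta}\right] \quad \because\; E_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta}\right] = 0
\end{aligned}$$
By the covariance inequality
$$V_\theta[T] \ge \frac{\{Cov_\theta[T, \psi(X, \theta)]\}^2}{V_\theta[\psi(X, \theta)]} \quad \forall\,\theta \in \Omega$$
Take $\psi(x, \theta) = \frac{\partial \log p_\theta(x)}{\partial\theta}$, then
$$V_\theta[T] \ge \frac{\left[\frac{\partial E_\theta[T]}{\partial\theta}\right]^2}{V_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta}\right]} \quad \forall\,\theta \in \Omega$$
i.e.,
$$V_\theta[T] \ge \frac{\left[\frac{\partial E_\theta[T]}{\partial\theta}\right]^2}{I(\theta)} \quad \forall\,\theta \in \Omega$$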
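As a numerical sketch of the two forms of $I(\theta)$ in Theorem 4.8, take a Poisson($\lambda$) observation (an example chosen here, not from the text): the score is $x/\lambda - 1$ and $\partial^2 \log p/\partial\lambda^2 = -x/\lambda^2$. The sums are truncated at an assumed cut-off `xmax`; both forms agree with $1/\lambda$, and the resulting bound $\lambda/n$ for unbiased estimators of $\lambda$ equals $V[\bar X]$.

```python
import math


def poisson_pmf(x, lam):
    """Poisson probability, computed on the log scale so large x does not overflow."""
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))


def information_two_ways(lam, xmax=200):
    """Return (E[score^2], -E[d^2 log p / d lam^2]) for X ~ Poisson(lam).

    log p = x log(lam) - lam - log(x!), so the score is x/lam - 1 and the
    second derivative is -x/lam^2. Sums are truncated at xmax.
    """
    e_score_sq = sum((x / lam - 1) ** 2 * poisson_pmf(x, lam) for x in range(xmax + 1))
    minus_e_hess = sum((x / lam ** 2) * poisson_pmf(x, lam) for x in range(xmax + 1))
    return e_score_sq, minus_e_hess


lam, n = 3.5, 10
i1, i2 = information_two_ways(lam)
cr_bound = 1 / (n * i1)        # [tau'(lam)]^2 / (n I(lam)) with tau(lam) = lam
print(i1, i2, 1 / lam)
print(cr_bound, lam / n)       # V[X-bar] = lam / n attains the bound
```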

Different forms of Cramer - Rao Inequality

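(i) Suppose $T = t(X)$ is a biased estimator of the parameter $\tau(\theta)$, i.e., $E_\theta[T] = \tau(\theta) + b(\theta)$; then the Cramer - Rao inequality becomes
$$V_\theta[T] \ge \frac{[\tau'(\theta) + b'(\theta)]^2}{I(\theta)} \quad \forall\,\theta \in \Omega$$
(ii) Suppose $X_1, X_2, \cdots, X_n$ are iid with pdf $p_\theta(x)$, $\theta \in \Omega$, and $E_\theta[T] = \tau(\theta)$ $\forall\,\theta \in \Omega$; then the Cramer - Rao inequality is written as
$$V_\theta[T] \ge \frac{[\tau'(\theta)]^2}{nI(\theta)} \quad \forall\,\theta \in \Omega$$
where $I(\theta) = V_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta}\right]$ is the information in a single observation $x$ of $X$, or
$$V_\theta[T] \ge \frac{[\tau'(\theta)]^2}{I(\theta)} \quad \forall\,\theta \in \Omega$$
where $I(\theta) = V_\theta\left[\frac{\partial \log L(\theta)}{\partial\theta}\right]$ and $L(\theta) = \prod_{i=1}^n p_\theta(x_i)$.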

4.8 Chapman - Robbins Inequality

The Chapman - Robbins inequality is an improvement of the Cramer - Rao inequality, since it does not require the regularity conditions of the Cramer - Rao inequality. It also gives a lower bound for the variance of an unbiased estimator.
Theorem 4.9 Suppose $X$ is distributed with density function $p_\theta(x)$ and $T = t(X)$ is a statistic with $E_\theta[T] = \tau(\theta)$ and $E_\theta[T^2] < \infty$. Suppose $p_\theta(x) > 0$ $\forall\, x$. Let $\theta$ and $\theta + \Delta$ be two values for which $\tau(\theta) \ne \tau(\theta + \Delta)$, and define the function $\psi(x, \theta) = \frac{p_{\theta+\Delta}(x)}{p_\theta(x)} - 1$. Then
$$V_\theta[T] \ge \sup_{\Delta}\left\{\frac{[\tau(\theta+\Delta) - \tau(\theta)]^2}{E_\theta\left[\frac{p_{\theta+\Delta}(X)}{p_\theta(X)} - 1\right]^2}\right\} \quad \forall\,\theta \in \Omega$$


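Proof: First prove that $E_\theta[\psi(X, \theta)] = 0$ $\forall\,\theta \in \Omega$:
$$E_\theta[\psi(X, \theta)] = \int \psi(x, \theta)\,p_\theta(x)\,dx = \int \left[\frac{p_{\theta+\Delta}(x)}{p_\theta(x)} - 1\right] p_\theta(x)\,dx = \int [p_{\theta+\Delta}(x) - p_\theta(x)]\,dx = 1 - 1 = 0$$
Hence
$$\begin{aligned}
Cov_\theta[T, \psi(X, \theta)] &= E_\theta[T\psi(X, \theta)] - E_\theta[T]\,E_\theta[\psi(X, \theta)] = E_\theta[T\psi(X, \theta)]\\
&= E_\theta\left[T\left(\frac{p_{\theta+\Delta}(X)}{p_\theta(X)} - 1\right)\right] = \int t\,\frac{p_{\theta+\Delta}(x) - p_\theta(x)}{p_\theta(x)}\,p_\theta(x)\,dx\\
&= \int t\,p_{\theta+\Delta}(x)\,dx - \int t\,p_\theta(x)\,dx = \tau(\theta + \Delta) - \tau(\theta)
\end{aligned}$$
By the covariance inequality
$$V_\theta[T] \ge \frac{[\tau(\theta+\Delta) - \tau(\theta)]^2}{V_\theta\left[\frac{p_{\theta+\Delta}(X)}{p_\theta(X)} - 1\right]}$$
This is true for all values of $\Delta$, so
$$V_\theta[T] \ge \sup_{\Delta}\left\{\frac{[\tau(\theta+\Delta) - \tau(\theta)]^2}{V_\theta\left[\frac{p_{\theta+\Delta}(X)}{p_\theta(X)} - 1\right]}\right\}$$
Since $E_\theta[\psi(X, \theta)] = 0$, the variance in the denominator equals $E_\theta\left[\frac{p_{\theta+\Delta}(X)}{p_\theta(X)} - 1\right]^2$, which is the form stated in Theorem 4.9.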

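Remark 4.5 If the requirement on the range of the distribution $P_\theta$, $\theta \in \Omega$ is relaxed to $S(\phi) \subset S(\theta)$, $\phi \ne \theta$, where $S(\theta) = \{x : p_\theta(x) > 0\}$ denotes the range of $P_\theta$, then the Chapman - Robbins inequality becomes
$$V_\theta[T] \ge \sup_{\phi : S(\phi) \subset S(\theta)}\left\{\frac{[\tau(\phi) - \tau(\theta)]^2}{V_\theta\left[\frac{p_\phi(X)}{p_\theta(X)} - 1\right]}\right\}$$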

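Example 4.18 Using a single observation $x$ of $X$, obtain the Chapman - Robbins bound for the variance of an unbiased estimator of the parameter $\theta$ of the pdf
$$p_\theta(x) = \begin{cases} \frac{1}{\theta} & 0 < x < \theta\\ 0 & \text{otherwise} \end{cases}$$
Here $\tau(\theta) = \theta$. Assume $\phi < \theta$, so that $S(\phi) \subset S(\theta)$, and $\tau(\phi) \ne \tau(\theta)$. Define
$$p_\phi(x) = \begin{cases} \frac{1}{\phi} & 0 < x < \phi\\ 0 & \text{otherwise} \end{cases}$$
$$E_\theta\left[\frac{p_\phi(X)}{p_\theta(X)}\right] = \int_0^\phi \frac{\theta}{\phi}\,p_\theta(x)\,dx + \int_\phi^\theta 0 \cdot p_\theta(x)\,dx = \int_0^\phi \frac{\theta}{\phi}\,\frac{1}{\theta}\,dx = 1$$
$$E_\theta\left[\frac{p_\phi(X)}{p_\theta(X)}\right]^2 = \int_0^\phi \left(\frac{\theta}{\phi}\right)^2 \frac{1}{\theta}\,dx = \frac{\theta^2}{\phi^2}\,\frac{1}{\theta}\,\phi = \frac{\theta}{\phi}$$
$$V_\theta\left[\frac{p_\phi(X)}{p_\theta(X)} - 1\right] = V_\theta\left[\frac{p_\phi(X)}{p_\theta(X)}\right] = \frac{\theta}{\phi} - 1 = \frac{\theta - \phi}{\phi}$$
The Chapman - Robbins inequality is
$$V_\theta[T] \ge \sup_{\phi : S(\phi) \subset S(\theta)}\left\{\frac{(\phi - \theta)^2\,\phi}{\theta - \phi}\right\} = \sup_{\phi : S(\phi) \subset S(\theta)}\{\phi(\theta - \phi)\}$$
Let $y = \phi(\theta - \phi)$. Differentiate this with respect to $\phi$:
$$\frac{dy}{d\phi} = \theta - 2\phi, \qquad \frac{d^2y}{d\phi^2} = -2 < 0$$
For a maximum of $y$, $\frac{d^2y}{d\phi^2} < 0$ at the value of $\phi$ for which $\frac{dy}{d\phi} = 0$. At $\phi = \frac{\theta}{2}$, $y$ has its maximum, and the maximum value of $y$ is $\frac{\theta^2}{4}$. The Chapman - Robbins lower bound for the variance of an unbiased estimator of $\theta$ is $\frac{\theta^2}{4}$.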
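The maximisation in Example 4.18 can also be done numerically. The sketch below (the grid size and function name are arbitrary choices) evaluates the ratio $(\phi - \theta)^2 / [(\theta - \phi)/\phi]$ on a grid of $\phi \in (0, \theta)$ and recovers the maximiser $\phi = \theta/2$ and the bound $\theta^2/4$.

```python
def chapman_robbins_uniform(theta, grid=10_000):
    """Grid approximation of sup over 0 < phi < theta of
    (phi - theta)^2 / V_theta[p_phi(X)/p_theta(X) - 1] for U(0, theta),
    using V_theta[...] = (theta - phi) / phi from Example 4.18.
    Returns (best_bound, best_phi)."""
    best, best_phi = 0.0, None
    for k in range(1, grid):
        phi = theta * k / grid
        bound = (phi - theta) ** 2 / ((theta - phi) / phi)
        if bound > best:
            best, best_phi = bound, phi
    return best, best_phi


theta = 3.0
bound, phi_star = chapman_robbins_uniform(theta)
print(bound, theta ** 2 / 4, phi_star)
```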
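Remark 4.6 The Chapman - Robbins bound becomes the Cramer - Rao lower bound by letting $\Delta \to 0$, provided the range of the distribution is independent of the parameter, the derivative $\frac{\partial \log p_\theta(x)}{\partial\theta}$ exists and is finite, and the limit can be taken inside the expectation. Dividing the numerator and the denominator by $\Delta^2$,
$$\begin{aligned}
V_\theta[T] &\ge \frac{[\tau(\theta+\Delta) - \tau(\theta)]^2}{E_\theta\left[\{p_{\theta+\Delta}(X) - p_\theta(X)\}\frac{1}{p_\theta(X)}\right]^2}\\
&\ge \frac{\left[\lim_{\Delta \to 0}\frac{\tau(\theta+\Delta) - \tau(\theta)}{\Delta}\right]^2}{E_\theta\left[\lim_{\Delta \to 0}\frac{p_{\theta+\Delta}(X) - p_\theta(X)}{\Delta}\,\frac{1}{p_\theta(X)}\right]^2}\\
&= \frac{[\tau'(\theta)]^2}{E_\theta\left[p'_\theta(X)\,\frac{1}{p_\theta(X)}\right]^2} = \frac{[\tau'(\theta)]^2}{E_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta}\right]^2} = \frac{[\tau'(\theta)]^2}{I(\theta)} \quad \forall\,\theta \in \Omega
\end{aligned}$$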
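Example 4.19 Obtain the Cramer - Rao lower bound for the variance of an unbiased estimator of the parameter $\theta$ of the Cauchy distribution, by considering a sample of size $n$, where
$$p_\theta(x) = \frac{1}{\pi}\,\frac{1}{1 + (x - \theta)^2}, \quad -\infty < x < \infty,\ -\infty < \theta < \infty$$
For a single observation $x$ of $X$,
$$L(\theta) = p_\theta(x) = \frac{1}{\pi}\,\frac{1}{1 + (x - \theta)^2}, \qquad \log L(\theta) = -\log\pi - \log[1 + (x - \theta)^2]$$
$$\frac{\partial \log p_\theta(x)}{\partial\theta} = \frac{2(x - \theta)}{1 + (x - \theta)^2}, \qquad \left[\frac{\partial \log p_\theta(x)}{\partial\theta}\right]^2 = \frac{4(x - \theta)^2}{[1 + (x - \theta)^2]^2}$$
$$\begin{aligned}
E_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta}\right]^2 &= E_\theta\left[\frac{4(X - \theta)^2}{[1 + (X - \theta)^2]^2}\right] = \frac{4}{\pi}\int_{-\infty}^{\infty}\frac{(x - \theta)^2}{[1 + (x - \theta)^2]^3}\,dx\\
&= \frac{4}{\pi}\int_{-\infty}^{\infty}\frac{t^2}{(1 + t^2)^3}\,dt \quad\text{where } t = x - \theta\\
&= \frac{8}{\pi}\int_0^{\infty}\frac{t^2}{(1 + t^2)^3}\,dt\\
&= \frac{4}{\pi}\int_0^{\infty}\frac{u^{\frac{3}{2}-1}}{(1 + u)^{\frac{3}{2}+\frac{3}{2}}}\,du \quad\text{where } t^2 = u\\
&= \frac{4}{\pi}\,\frac{\Gamma\left(\frac{3}{2}\right)\Gamma\left(\frac{3}{2}\right)}{\Gamma(3)} = \frac{4}{\pi}\cdot\frac{\frac{1}{2}\sqrt{\pi}\cdot\frac{1}{2}\sqrt{\pi}}{2} = \frac{1}{2}
\end{aligned}$$
so $I(\theta) = \frac{1}{2}$. The Cramer - Rao lower bound from the sample of size $n$ for the variance of an unbiased estimator of the parameter $\tau(\theta) = \theta$ is
$$\frac{[\tau'(\theta)]^2}{nI(\theta)} = \frac{1}{n\cdot\frac{1}{2}} = \frac{2}{n}$$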
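A numerical cross-check of $I(\theta) = \frac{1}{2}$, as a sketch: substituting $t = \tan u$ turns $\frac{4}{\pi}\int_{-\infty}^{\infty} \frac{t^2}{(1+t^2)^3}\,dt$ into $\frac{4}{\pi}\int_{-\pi/2}^{\pi/2} \sin^2 u \cos^2 u\,du$, a finite-range integral that Simpson's rule handles accurately. The substitution, the step count and the sample size $n = 25$ are our choices.

```python
import math


def cauchy_information(m=2000):
    """(4/pi) * integral over the real line of t^2 / (1 + t^2)^3 dt.

    With t = tan(u) the integrand becomes sin(u)^2 cos(u)^2 on (-pi/2, pi/2);
    the finite integral is evaluated by composite Simpson with m (even) subintervals.
    """
    def g(u):
        return math.sin(u) ** 2 * math.cos(u) ** 2

    a, b = -math.pi / 2, math.pi / 2
    h = (b - a) / m
    total = g(a) + g(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return 4 / math.pi * total * h / 3


info = cauchy_information()
n = 25
print(info, 1 / (n * info), 2 / n)   # I(theta), the Cramer-Rao bound, and 2/n
```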
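Example 4.20 Let $X_1, X_2, \cdots, X_n$ be a sample from $N(\theta, 1)$. Obtain the Cramer - Rao lower bound for the variance of unbiased estimators of (i) $\theta$ and (ii) $\theta^2$. Also find an unbiased estimator of $\theta^2$, and verify whether its actual variance is the same as the Cramer - Rao lower bound.
(i) The likelihood function for $\theta$ is
$$L(\theta) = \prod_{i=1}^n p_\theta(x_i) = \left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac{1}{2}\sum_{i=1}^n (x_i - \theta)^2}$$
$$\log L(\theta) = -n\log\sqrt{2\pi} - \frac{1}{2}\sum_{i=1}^n (x_i - \theta)^2$$
Differentiate this with respect to $\theta$:
$$\frac{\partial \log L(\theta)}{\partial\theta} = \sum_{i=1}^n (x_i - \theta) = n(\bar x - \theta), \qquad \left[\frac{\partial \log L(\theta)}{\partial\theta}\right]^2 = n^2(\bar x - \theta)^2$$
$$E_\theta\left[\frac{\partial \log L(\theta)}{\partial\theta}\right]^2 = n^2 E_\theta[\bar X - \theta]^2 = n^2 V_\theta[\bar X] = n^2\,\frac{1}{n} = n = I(\theta)$$
The Cramer - Rao lower bound for the variance of the unbiased estimator $\bar X$ of $\tau(\theta) = \theta$ is $\frac{[\tau'(\theta)]^2}{I(\theta)} = \frac{1}{n}$.
Remark 4.7 The actual variance of the statistic $\bar X$ is $V_\theta[\bar X] = \frac{1}{n}$. It is the same as the Cramer - Rao lower bound. $\therefore$ $\bar X$ is the UMVUE of $\theta$.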
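(ii) Taking $\theta > 0$, so that $\theta = \sqrt{\theta^2}$, the log likelihood in terms of $\theta^2$ becomes
$$\log L(\theta) = -\frac{n}{2}\log 2\pi - \frac{1}{2}\sum_{i=1}^n \left(x_i - \sqrt{\theta^2}\right)^2$$
Differentiate this with respect to $\theta^2$:
$$\frac{\partial \log L(\theta)}{\partial\theta^2} = \frac{1}{2\theta}\sum_{i=1}^n \left(x_i - \sqrt{\theta^2}\right) = \frac{1}{2\theta}\sum_{i=1}^n (x_i - \theta) = \frac{n}{2\theta}[\bar x - \theta]$$
$$E_\theta\left[\frac{\partial \log L(\theta)}{\partial\theta^2}\right]^2 = \frac{n^2}{4\theta^2}E_\theta[\bar X - \theta]^2 = \frac{n^2}{4\theta^2}V_\theta[\bar X] = \frac{n^2}{4\theta^2}\,\frac{1}{n} = \frac{n}{4\theta^2}$$
The Cramer - Rao lower bound for the variance of an unbiased estimator of $\tau(\theta) = \theta^2$ is $\frac{[\tau'(\theta)]^2}{I(\theta)} = \frac{4\theta^2}{n}$, where $\tau'(\theta) = \frac{d\tau(\theta)}{d\theta^2} = 1$.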

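Consider $E_\theta[X - \theta]^2 = 1$, i.e., $E_\theta[X^2] - 1 = \theta^2$, so
$$E_\theta\left[\frac{\sum_{i=1}^n X_i^2}{n}\right] - 1 = \theta^2$$
$\therefore$ $\frac{\sum_{i=1}^n X_i^2}{n} - 1$ is an unbiased estimator of $\theta^2$. Consider
$$\frac{\sum_{i=1}^n X_i^2}{n} = \frac{\sum_{i=1}^n (X_i - \theta + \theta)^2}{n} = \frac{\sum_{i=1}^n (X_i - \theta)^2}{n} + \theta^2 + \frac{2\theta}{n}\sum_{i=1}^n (X_i - \theta)$$
Since $E_\theta[X_i - \theta]^3 = 0$ for the normal distribution, $\sum (X_i - \theta)^2$ and $\sum (X_i - \theta)$ are uncorrelated, so
$$\begin{aligned}
V_\theta\left[\frac{\sum X_i^2}{n} - 1\right] = V_\theta\left[\frac{\sum X_i^2}{n}\right] &= V_\theta\left[\frac{\sum (X_i - \theta)^2}{n}\right] + \left(\frac{2\theta}{n}\right)^2\sum_{i=1}^n V_\theta[X_i]\\
&= V_\theta\left[\frac{\sum (X_i - \theta)^2}{n}\right] + \frac{4\theta^2}{n^2}\,n \quad\text{since } V_\theta[X_i] = 1 \;\forall\, i = 1 \text{ to } n\\
&= V_\theta\left[\frac{\sum (X_i - \theta)^2}{n}\right] + \frac{4\theta^2}{n}
\end{aligned}$$
Define $s^2 = \frac{1}{n}\sum (X_i - \theta)^2$ and
$$Y = \frac{ns^2}{\sigma^2} = \frac{\sum (X_i - \theta)^2}{\sigma^2} \sim \chi^2 \text{ distribution with } n \text{ degrees of freedom}$$
The pdf of $Y \sim G\left(\frac{n}{2}, \frac{1}{2}\right)$ is
$$p(y) = \begin{cases} \dfrac{1}{2^{\frac{n}{2}}\,\Gamma\left(\frac{n}{2}\right)}\, e^{-\frac{1}{2}y}\, y^{\frac{n}{2}-1} & 0 < y < \infty\\ 0 & \text{otherwise} \end{cases}$$
$$E[Y^r] = \int_0^\infty \frac{1}{2^{\frac{n}{2}}\,\Gamma\left(\frac{n}{2}\right)}\, e^{-\frac{1}{2}y}\, y^{\frac{n}{2}+r-1}\,dy = \frac{1}{2^{\frac{n}{2}}\,\Gamma\left(\frac{n}{2}\right)}\,\frac{\Gamma\left(\frac{n}{2}+r\right)}{\left(\frac{1}{2}\right)^{\frac{n}{2}+r}} = \frac{2^r\,\Gamma\left(\frac{n}{2}+r\right)}{\Gamma\left(\frac{n}{2}\right)}, \quad r = 1, 2, \cdots$$
$$E[Y] = \frac{2\,\Gamma\left(\frac{n}{2}+1\right)}{\Gamma\left(\frac{n}{2}\right)} = n, \qquad E[Y^2] = (n+2)n \quad\text{and}\quad V[Y] = 2n$$
But $Y = \frac{ns^2}{\sigma^2}$ and $\sigma^2 = 1$, so
$$V_\theta[s^2] = V_\theta\left[\frac{Y}{n}\right] = \frac{2n}{n^2} = \frac{2}{n}$$
$$V_\theta\left[\frac{\sum X_i^2}{n} - 1\right] = V_\theta[s^2] + \frac{4\theta^2}{n} = \frac{2}{n} + \frac{4\theta^2}{n}$$
The actual variance of $\frac{\sum X_i^2}{n} - 1$ is $\frac{4\theta^2}{n} + \frac{2}{n}$. Here the Cramer - Rao lower bound $\frac{4\theta^2}{n}$ is less than the actual variance of the unbiased estimator $\frac{\sum X_i^2}{n} - 1$ of the parameter $\theta^2$. Note that the UMVUE of $\theta^2$ is $\bar X^2 - \frac{1}{n}$, since $E_\theta[\bar X^2] - \{E_\theta[\bar X]\}^2 = \frac{1}{n}$
$$\Rightarrow\; E_\theta[\bar X^2] - \frac{1}{n} = \theta^2$$
i.e., $\bar X^2 - \frac{1}{n}$ is an unbiased estimator of $\theta^2$, and it is a function of the complete sufficient statistic $\bar X$.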
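A Monte Carlo sketch of Example 4.20 (ii), with assumed values $\theta = 1.5$, $n = 5$ and a fixed seed: it estimates the variances of the two unbiased estimators $\frac{\sum X_i^2}{n} - 1$ and $\bar X^2 - \frac{1}{n}$ of $\theta^2$. Both stay above the Cramer - Rao bound $4\theta^2/n$; the theoretical values are $(4\theta^2 + 2)/n$ and $4\theta^2/n + 2/n^2$ respectively.

```python
import random


def pvar(values):
    """Population variance (divisor len(values))."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def simulate_variances(theta, n, reps=100_000, seed=1):
    """Monte Carlo variances of sum(X_i^2)/n - 1 and xbar^2 - 1/n under N(theta, 1)."""
    rng = random.Random(seed)
    naive, umvue = [], []
    for _ in range(reps):
        xs = [rng.gauss(theta, 1.0) for _ in range(n)]
        xbar = sum(xs) / n
        naive.append(sum(x * x for x in xs) / n - 1)   # sum X_i^2 / n - 1
        umvue.append(xbar * xbar - 1 / n)              # X-bar^2 - 1/n
    return pvar(naive), pvar(umvue)


theta, n = 1.5, 5
v_naive, v_umvue = simulate_variances(theta, n)
cr = 4 * theta ** 2 / n
print("CR bound      ", cr)
print("mean(X^2) - 1 ", v_naive, "theory", (4 * theta ** 2 + 2) / n)
print("xbar^2 - 1/n  ", v_umvue, "theory", 4 * theta ** 2 / n + 2 / n ** 2)
```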
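Example 4.21 Given $p_\theta(x) = \frac{1}{\theta}$, $0 < x < \theta$, $\theta > 0$. Compute the reciprocal of $nE_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta}\right]^2$. Compare this with the variance of $\frac{n+1}{n}T$, where $T$ is the largest observation of a random sample of size $n$ from this distribution.
$$p_\theta(x) = \begin{cases} \frac{1}{\theta} & 0 < x < \theta\\ 0 & \text{otherwise} \end{cases}$$
$$\log p_\theta(x) = -\log\theta, \qquad \frac{\partial \log p_\theta(x)}{\partial\theta} = -\frac{1}{\theta}, \qquad \left[\frac{\partial \log p_\theta(x)}{\partial\theta}\right]^2 = \frac{1}{\theta^2}$$
$$E_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta}\right]^2 = \frac{1}{\theta^2}, \quad\text{i.e.,}\quad nE_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta}\right]^2 = \frac{n}{\theta^2}, \qquad \frac{1}{nE_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta}\right]^2} = \frac{\theta^2}{n}$$
Let $T = \max_{1 \le i \le n}\{X_i\}$. The pdf of $T$ is
$$p(t \mid \theta) = \begin{cases} \frac{n}{\theta^n}\,t^{n-1} & 0 < t < \theta\\ 0 & \text{otherwise} \end{cases}$$
$$E_\theta[T] = \frac{n}{n+1}\,\theta \;\Rightarrow\; \frac{n+1}{n}T \text{ is an unbiased estimator of } \theta$$
$$E_\theta[T^2] = \frac{n}{n+2}\,\theta^2$$
$$V_\theta[T] = \frac{n}{n+2}\,\theta^2 - \left(\frac{n}{n+1}\right)^2\theta^2 = \frac{n\theta^2}{(n+1)^2(n+2)}$$
$$V_\theta\left[\frac{n+1}{n}T\right] = \frac{\theta^2}{n(n+2)}$$
The actual variance of the unbiased estimator $\frac{n+1}{n}T$ is $\frac{\theta^2}{n(n+2)}$. Here the actual variance of this unbiased estimator of $\theta$ is less than the value $\frac{\theta^2}{n}$ given by the Cramer - Rao formula. This does not contradict Theorem 4.8, since the distribution does not satisfy the assumptions of the Cramer - Rao inequality: the range $0 < x < \theta$ depends on $\theta$. Note that $\frac{n+1}{n}T$ is the UMVUE of $\theta$.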
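A sketch that checks the algebra of Example 4.21 exactly with `fractions.Fraction` and then by simulation (the values $\theta = 2$, $n = 8$, the seed and the replication count are assumptions). The simulated variance of $\frac{n+1}{n}T$ is close to $\theta^2/[n(n+2)]$, far below the value $\theta^2/n$ given by the formal Cramer - Rao computation.

```python
import random
from fractions import Fraction


def exact_variance_over_theta_sq(n):
    """V[(n+1)T/n] / theta^2 from E[T] = n theta/(n+1) and E[T^2] = n theta^2/(n+2)."""
    e1 = Fraction(n, n + 1)
    e2 = Fraction(n, n + 2)
    return Fraction(n + 1, n) ** 2 * (e2 - e1 ** 2)


def simulated_variance(theta, n, reps=100_000, seed=7):
    """Monte Carlo variance of (n+1)/n * max(X_1, ..., X_n) with X_i ~ U(0, theta)."""
    rng = random.Random(seed)
    est = [(n + 1) / n * max(rng.uniform(0, theta) for _ in range(n)) for _ in range(reps)]
    m = sum(est) / reps
    return sum((e - m) ** 2 for e in est) / reps


theta, n = 2.0, 8
v_sim = simulated_variance(theta, n)
print(exact_variance_over_theta_sq(n), Fraction(1, n * (n + 2)))
print(v_sim, theta ** 2 / (n * (n + 2)), theta ** 2 / n)
```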
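Example 4.22 Find the Cramer - Rao lower bound for the variance of an unbiased estimator of $P_\theta\{X > 2\}$, based on a single observation $x$ of $X$ with pdf
$$p_\theta(x) = \begin{cases} \frac{1}{\theta}\,e^{-\frac{x}{\theta}} & x > 0,\ \theta > 0\\ 0 & \text{otherwise} \end{cases}$$
Consider
$$\tau(\theta) = P_\theta\{X > 2\} = 1 - \int_0^2 \frac{1}{\theta}\,e^{-\frac{x}{\theta}}\,dx = 1 - \frac{1}{\theta}\left[\frac{e^{-\frac{x}{\theta}}}{-\frac{1}{\theta}}\right]_0^2 = 1 + e^{-\frac{2}{\theta}} - 1 = e^{-\frac{2}{\theta}}$$
$$\log p_\theta(x) = -\log\theta - \frac{x}{\theta}$$
One can take $\lambda = e^{-\frac{2}{\theta}}$, then $\log\lambda = -\frac{2}{\theta}$, i.e., $\theta = -\frac{2}{\log\lambda}$, and
$$\log p_\lambda(x) = -\log\left(-\frac{2}{\log\lambda}\right) + \frac{x}{2}\log\lambda$$
$$\frac{\partial \log p_\lambda(x)}{\partial\lambda} = \frac{1}{\lambda\log\lambda} + \frac{x}{2\lambda}$$
Substituting $\lambda = e^{-\frac{2}{\theta}}$ and $\log\lambda = -\frac{2}{\theta}$,
$$\frac{\partial \log p_\theta(x)}{\partial\left(e^{-\frac{2}{\theta}}\right)} = -\frac{\theta}{2}\,e^{\frac{2}{\theta}} + \frac{x}{2}\,e^{\frac{2}{\theta}} = \frac{e^{\frac{2}{\theta}}}{2}\,[x - \theta]$$
$$E_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\left(e^{-\frac{2}{\theta}}\right)}\right]^2 = \frac{e^{\frac{4}{\theta}}}{4}\,E_\theta[X - \theta]^2 = \frac{e^{\frac{4}{\theta}}}{4}\,\theta^2 \quad\text{since } E_\theta[X - \theta]^2 = \theta^2$$
The Cramer - Rao lower bound for the variance of an unbiased estimator of $\tau(\theta) = e^{-\frac{2}{\theta}}$ is $\frac{4}{\theta^2}\,e^{-\frac{4}{\theta}}$, since $\tau'(\theta) = \frac{\partial\tau(\theta)}{\partial\left(e^{-\frac{2}{\theta}}\right)} = 1$. An unbiased estimator of $\tau(\theta) = e^{-\frac{2}{\theta}}$ is
$$T = \begin{cases} 1 & \text{if } X > 2\\ 0 & \text{otherwise} \end{cases}$$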
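The bound of Example 4.22 can be compared with the actual variance of the unbiased estimator $T$: since $T$ is a Bernoulli variable with success probability $e^{-2/\theta}$, $V_\theta[T] = e^{-2/\theta}(1 - e^{-2/\theta})$. A short sketch over a few illustrative values of $\theta$ shows that this variance stays above $\frac{4}{\theta^2}e^{-4/\theta}$, so $T$ does not attain the bound.

```python
import math


def cr_bound(theta):
    """Cramer-Rao bound (4 / theta^2) * exp(-4 / theta) for unbiased estimators of exp(-2 / theta)."""
    return 4 / theta ** 2 * math.exp(-4 / theta)


def indicator_variance(theta):
    """Variance of T = 1{X > 2}, a Bernoulli variable with success probability exp(-2 / theta)."""
    p = math.exp(-2 / theta)
    return p * (1 - p)


for theta in (0.5, 1.0, 2.0, 5.0, 20.0):
    print(theta, cr_bound(theta), indicator_variance(theta))
```

Because a single observation $X$ is complete and sufficient here, $T$ is in fact the UMVUE of $e^{-2/\theta}$, so this is also an instance of a UMVUE that does not attain the Cramer - Rao lower bound.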

4.9 Efficiency
As a consequence of the Cramer - Rao inequality, efficient estimators are defined as follows.
Definition 4.3 Let $T = t(X)$ be an unbiased estimator of a parameter $\theta$. Then $T = t(X)$ is called an efficient estimator of $\theta$ iff the variance of $T = t(X)$ attains the Cramer - Rao lower bound.
Definition 4.4 The ratio of the actual variance of an unbiased estimator of a parameter to the Cramer - Rao lower bound is called the efficiency of that estimator:
$$\text{Efficiency} = \frac{\text{Actual variance of the statistic}}{\text{Cramer - Rao lower bound of that statistic}}$$
Definition 4.5 An estimator is said to be an efficient estimator if its efficiency is one.
Definition 4.6 An estimator is said to be an asymptotically efficient estimator if its efficiency tends to one as $n \to \infty$.
Using the Cramer - Rao lower bound to find efficient estimators has the following limitations.
• A UMVUE may exist even when the Cramer - Rao regularity conditions are not satisfied.
• A UMVUE may exist when the regularity conditions are satisfied and yet not attain the Cramer - Rao lower bound.
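Example 4.23 Let $X_1, X_2, \cdots, X_n$ be a random sample from
$$p_\theta(x) = \begin{cases} \theta e^{-\theta x} & 0 < x < \infty,\ \theta > 0\\ 0 & \text{otherwise} \end{cases}$$
Obtain an asymptotically efficient estimator of $\theta$.
For a single observation $x$ of $X$
$$L(\theta) = p_\theta(x) = \theta e^{-\theta x}, \qquad \log L(\theta) = \log\theta - \theta x$$
$$\frac{\partial \log p_\theta(x)}{\partial\theta} = \frac{1}{\theta} - x, \qquad \frac{\partial^2 \log p_\theta(x)}{\partial\theta^2} = -\frac{1}{\theta^2}, \qquad E_\theta\left[\frac{\partial^2 \log p_\theta(X)}{\partial\theta^2}\right] = -\frac{1}{\theta^2}$$
The Cramer - Rao lower bound for the variance of an unbiased estimator of $\theta$ is $\frac{1}{n\frac{1}{\theta^2}} = \frac{\theta^2}{n}$.
Let $T = \sum_{i=1}^n X_i$, then $T \sim G\left(n, \frac{1}{\theta}\right)$ with pdf
$$p_\theta(t) = \begin{cases} \dfrac{\theta^n}{\Gamma(n)}\,e^{-\theta t}\,t^{n-1} & 0 < t < \infty\\ 0 & \text{otherwise} \end{cases}$$
$$E_\theta\left[\frac{1}{T}\right] = \int_0^\infty \frac{\theta^n}{\Gamma(n)}\,e^{-\theta t}\,t^{n-1-1}\,dt = \frac{\theta^n}{\Gamma(n)}\,\frac{\Gamma(n-1)}{\theta^{n-1}} = \frac{\theta}{n-1}$$
$$E_\theta\left[\frac{n-1}{T}\right] = \theta \quad\text{if } n = 2, 3, \cdots$$
so $\frac{n-1}{T}$ is an unbiased estimator of $\theta$.
$$E_\theta\left[\frac{1}{T^2}\right] = \frac{\theta^2}{(n-1)(n-2)} \quad\text{if } n = 3, 4, \cdots$$
$$V_\theta\left[\frac{1}{T}\right] = \frac{\theta^2}{(n-1)^2(n-2)}, \qquad V_\theta\left[\frac{n-1}{T}\right] = \frac{\theta^2}{n-2} \quad\text{if } n = 3, 4, \cdots$$
The actual variance of $\frac{n-1}{T}$ is $\frac{\theta^2}{n-2}$. The Cramer - Rao lower bound for the unbiased estimator $\frac{n-1}{T}$ of $\theta$ is $\frac{\theta^2}{n}$.
$$\text{Efficiency} = \frac{\theta^2/(n-2)}{\theta^2/n} = \frac{n}{n-2} = \frac{1}{1 - \frac{2}{n}}, \quad n = 3, 4, \cdots \;\to 1 \text{ as } n \to \infty$$
Thus $\frac{n-1}{T}$ is an asymptotically efficient estimator of $\theta$. Note that $\frac{n-1}{T}$ is the UMVUE of $\theta$.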
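A simulation sketch for Example 4.23, with assumed values $\theta = 2$, $n = 12$ and a fixed seed: `random.expovariate` takes the rate $\theta$, matching $p_\theta(x) = \theta e^{-\theta x}$. The empirical mean of $(n-1)/T$ is close to $\theta$ and its variance close to $\theta^2/(n-2)$, giving an efficiency near $n/(n-2)$.

```python
import random


def simulate_umvue(theta, n, reps=100_000, seed=3):
    """Monte Carlo mean and variance of (n - 1)/T, T = X_1 + ... + X_n, X_i with density theta * exp(-theta x)."""
    rng = random.Random(seed)
    est = [(n - 1) / sum(rng.expovariate(theta) for _ in range(n)) for _ in range(reps)]
    m = sum(est) / reps
    v = sum((e - m) ** 2 for e in est) / reps
    return m, v


theta, n = 2.0, 12
mean, var = simulate_umvue(theta, n)
print(mean, var, theta ** 2 / (n - 2))                        # unbiased; variance theta^2/(n-2)
print("efficiency", var / (theta ** 2 / n), "theory", n / (n - 2))
```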
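Theorem 4.10 A necessary and sufficient condition for an unbiased estimator $T = t(X)$ of $\tau(\theta)$ to be the most efficient is that $T = t(X)$ is sufficient and $t(x) - \tau(\theta)$ is proportional to $\frac{\partial \log p_\theta(x)}{\partial\theta}$, where $E_\theta[T] = \tau(\theta)$.
Proof: Assume $T = t(X)$ is a most efficient estimator of $\tau(\theta)$, i.e., its variance equals the Cramer - Rao lower bound. Then equality holds in the covariance (Cauchy - Schwarz) inequality used in the proof of Theorem 4.8, and equality holds there iff $t(X) - \tau(\theta)$ is, with probability one, proportional to $\frac{\partial \log p_\theta(X)}{\partial\theta}$:
$$t(x) - \tau(\theta) = A(\theta)\,\frac{\partial \log p_\theta(x)}{\partial\theta}$$
Prove that $T = t(X)$ is a sufficient statistic.
$$\frac{t(x) - \tau(\theta)}{A(\theta)} = \frac{\partial \log p_\theta(x)}{\partial\theta}, \qquad \frac{t(x)}{A(\theta)} - \frac{\tau(\theta)}{A(\theta)} = \frac{\partial \log p_\theta(x)}{\partial\theta}$$
Integrating with respect to $\theta$,
$$t(x)\int \frac{1}{A(\theta)}\,d\theta - \int \frac{\tau(\theta)}{A(\theta)}\,d\theta = \log p_\theta(x) + c(x)$$
Choose $\int^\theta \frac{1}{A(\theta)}\,d\theta = Q(\theta)$ and $\int^\theta \frac{\tau(\theta)}{A(\theta)}\,d\theta = c_1(\theta)$. Then
$$t(x)Q(\theta) - c_1(\theta) - c(x) = \log p_\theta(x)$$
$$p_\theta(x) = e^{Q(\theta)t(x) - c_1(\theta) - c(x)} = c(\theta)\,e^{Q(\theta)t(x)}\,h(x)$$
where $c(\theta) = e^{-c_1(\theta)}$ and $h(x) = e^{-c(x)}$.

This is a one parameter exponential family. $\therefore$ $T = t(X)$ is a sufficient statistic.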


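Conversely, assume $T = t(X)$ is sufficient and $t(x) - \tau(\theta) = A(\theta)\,\frac{\partial \log p_\theta(x)}{\partial\theta}$. Prove that $T = t(X)$ is the most efficient estimator of $\tau(\theta)$.
$$t(x) - \tau(\theta) = A(\theta)\,\frac{\partial \log p_\theta(x)}{\partial\theta} \;\Rightarrow\; \left[\frac{t(x) - \tau(\theta)}{A(\theta)}\right]^2 = \left[\frac{\partial \log p_\theta(x)}{\partial\theta}\right]^2$$
$$\frac{1}{[A(\theta)]^2}\,E_\theta[T - \tau(\theta)]^2 = E_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta}\right]^2 \;\Rightarrow\; \frac{V_\theta[T]}{[A(\theta)]^2} = E_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta}\right]^2$$
$$V_\theta[T] = [A(\theta)]^2\,E_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta}\right]^2 \qquad (4.2)$$
But, as in the proof of Theorem 4.8,
$$E_\theta\left[T\,\frac{\partial \log p_\theta(X)}{\partial\theta}\right] = \tau'(\theta)$$
i.e.,
$$E_\theta\left[(T - \tau(\theta))\,\frac{\partial \log p_\theta(X)}{\partial\theta}\right] = \tau'(\theta) \quad\text{since } E_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta}\right] = 0$$
Since $t(x) - \tau(\theta) = A(\theta)\,\frac{\partial \log p_\theta(x)}{\partial\theta}$,
$$E_\theta\left[A(\theta)\left(\frac{\partial \log p_\theta(X)}{\partial\theta}\right)^2\right] = \tau'(\theta) \;\Rightarrow\; A(\theta)\,E_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta}\right]^2 = \tau'(\theta)$$
i.e.,
$$A(\theta) = \frac{\tau'(\theta)}{E_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta}\right]^2}$$
From equation (4.2),
$$V_\theta[T] = \frac{[\tau'(\theta)]^2}{E_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta}\right]^2} \quad \forall\,\theta \in \Omega$$
Thus the actual variance of $T = t(X)$ is equal to the Cramer - Rao lower bound.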
Remark 4.8 A UMVUE need not be the most efficient estimator. As discussed in Example 4.23, $\frac{n-1}{T}$, $n = 3, 4, \cdots$ is the UMVUE of $\theta$ but not the most efficient estimator of $\theta$.

4.10 Extension of Cramer - Rao Inequality


The Cramer - Rao inequality has been modified and extended in different directions. The first extension covers the case where $\theta$ is a vector. The second extends the inequality to obtain sharper bounds for the variance of unbiased estimators: Bhattacharyya gives a method of obtaining a whole sequence of non-decreasing lower bounds for the variance of an unbiased estimator by successive differentiation of the likelihood function with respect to the parameter.
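Lemma 4.1 For any random variables $X_1, X_2, \cdots, X_r$ with finite second moments, the covariance matrix
$$C = [Cov(X_i, X_j)]_{r \times r}$$
is positive semi definite. It is positive definite iff no non-trivial linear combination $\sum_{i=1}^r a_i X_i$ is constant with probability one; in particular, $C$ is positive definite when the $X_i$'s, $i = 1$ to $r$, are independent with positive variances.
Proof: For any $a' = (a_1, a_2, \cdots, a_r)$ consider the variance of $\sum_{i=1}^r a_i X_i$:
$$V\left[\sum_{i=1}^r a_i X_i\right] = (a_1, a_2, \cdots, a_r)\begin{pmatrix} c_{11} & \cdots & c_{1r}\\ c_{21} & \cdots & c_{2r}\\ \vdots & & \vdots\\ c_{r1} & \cdots & c_{rr} \end{pmatrix}\begin{pmatrix} a_1\\ a_2\\ \vdots\\ a_r \end{pmatrix} = a'Ca \ge 0 \quad \forall\, a$$
where $C$ is the covariance matrix. $\Rightarrow$ $C$ is positive semi definite. Moreover $a'Ca = 0$ iff $V\left[\sum a_i X_i\right] = 0$, i.e., iff $\sum a_i X_i$ is constant with probability one, which proves the condition for positive definiteness. If the $X_i$'s are independent, then
$$V\left[\sum_{i=1}^r a_i X_i\right] = (a_1, a_2, \cdots, a_r)\begin{pmatrix} c_{11} & 0 & \cdots & 0\\ 0 & c_{22} & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & c_{rr} \end{pmatrix}\begin{pmatrix} a_1\\ a_2\\ \vdots\\ a_r \end{pmatrix} = \sum_{i=1}^r a_i^2\,c_{ii} > 0 \quad \forall\, a \ne 0$$
$\Rightarrow$ $C$ is positive definite, since $c_{ii} = V[X_i] > 0$ $\forall\, i = 1, 2, \cdots, r$ and $c_{ij} = 0$ if $i \ne j$.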


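Lemma 4.2 Let $(X_1, X_2, \cdots, X_r)$ and $Y$ have finite second moments, let $\nu_i = Cov[X_i, Y]$ and let $\Sigma$ be the covariance matrix of the $X_i$'s. Without loss of generality suppose $\Sigma$ is positive definite. Then $\rho^2 = \frac{\nu'\Sigma^{-1}\nu}{V[Y]}$, where $\rho$ is the multiple correlation coefficient between $Y$ and the vector $(X_1, X_2, \cdots, X_r)$.
Proof: Let $\rho$ be the correlation coefficient between $a'X$ and $Y$, where $a' = (a_1, a_2, \cdots, a_r)$ and $X' = (X_1, X_2, \cdots, X_r)$, i.e.,
$$\rho^2 = \frac{\left\{Cov\left[\sum_{i=1}^r a_i X_i, Y\right]\right\}^2}{V[Y]\,V\left[\sum_{i=1}^r a_i X_i\right]}$$
The multiple correlation coefficient is the maximum of $\rho$ over $a$. The maximising $a$ is not unique, since $\rho$ is invariant under changes of scale of $a$. To obtain a unique maximum, one can impose the condition $V\left[\sum_{i=1}^r a_i X_i\right] = a'\Sigma a = 1$. Maximizing $\rho$ subject to $a'\Sigma a = 1$ is equivalent to maximizing $a'\nu$ subject to $a'\Sigma a = 1$. By the Lagrange multiplier method, the Lagrangian is
$$L(a, \lambda) = a'\nu - \frac{1}{2}\lambda[a'\Sigma a - 1], \qquad \frac{\partial L(a, \lambda)}{\partial a} = \nu - \lambda\Sigma a$$
The necessary condition for a maximum is $\frac{\partial L(a, \lambda)}{\partial a} = 0$:
$$\nu - \lambda\Sigma a = 0 \;\Rightarrow\; a = \frac{1}{\lambda}\Sigma^{-1}\nu$$
$$\frac{1}{\lambda^2}\,\nu'\Sigma^{-1}\nu = 1 \quad\text{since } a'\Sigma a = 1 \;\Rightarrow\; \lambda = \pm\sqrt{\nu'\Sigma^{-1}\nu}$$
The positive root gives the maximum, so
$$a = \frac{\Sigma^{-1}\nu}{\sqrt{\nu'\Sigma^{-1}\nu}}, \qquad Cov\left[\sum_{i=1}^r a_i X_i, Y\right] = a'\,Cov[X, Y] = a'\nu$$
$$\therefore\; \rho = \frac{a'\nu}{\sqrt{V[Y]}} = \frac{\nu'\Sigma^{-1}\nu}{\sqrt{\nu'\Sigma^{-1}\nu}\,\sqrt{V[Y]}}, \qquad \rho^2 = \frac{\nu'\Sigma^{-1}\nu}{V[Y]}$$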
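Lemma 4.2 can be illustrated for $r = 2$. The sketch below uses hypothetical moments $\Sigma$, $\nu$ and $V[Y]$, evaluates $\nu'\Sigma^{-1}\nu / V[Y]$ with an explicit $2 \times 2$ inverse, and compares it with a brute-force maximisation of $\text{corr}^2(a'X, Y)$ over unit directions $a$.

```python
import math


def multiple_corr_sq(sigma, nu, var_y):
    """rho^2 = nu' Sigma^{-1} nu / V[Y] for a 2x2 positive definite Sigma."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    q = sum(nu[i] * inv[i][j] * nu[j] for i in range(2) for j in range(2))
    return q / var_y


def brute_force_corr_sq(sigma, nu, var_y, steps=20_000):
    """Maximise corr(a'X, Y)^2 = (a'nu)^2 / (V[Y] a'Sigma a) over directions a = (cos t, sin t)."""
    best = 0.0
    for k in range(steps):
        t = math.pi * k / steps
        a = (math.cos(t), math.sin(t))
        num = (a[0] * nu[0] + a[1] * nu[1]) ** 2
        den = var_y * sum(a[i] * sigma[i][j] * a[j] for i in range(2) for j in range(2))
        best = max(best, num / den)
    return best


# Hypothetical moments: Sigma = Cov(X), nu_i = Cov(X_i, Y), V[Y]
sigma = ((2.0, 0.6), (0.6, 1.0))
nu = (0.9, 0.5)
var_y = 1.5
rho2 = multiple_corr_sq(sigma, nu, var_y)
print(rho2, brute_force_corr_sq(sigma, nu, var_y))
```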

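Theorem 4.11 For any unbiased estimator $T = t(X)$ of $\tau(\theta)$ and any functions $\psi_i(x, \theta)$, $i = 1, 2, \cdots, r$, with finite second moments, $V[T] \ge \nu' C^{-1}\nu$, where $\nu' = (\nu_1, \nu_2, \cdots, \nu_r)$ and $C = [c_{ij}]_{r \times r}$ (assumed positive definite) are defined by $\nu_i = Cov[T, \psi_i(X, \theta)]$ and $c_{ij} = Cov[\psi_i(X, \theta), \psi_j(X, \theta)]$, $i, j = 1, 2, \cdots, r$.
Proof: As in Lemma 4.2, replace $Y$ by $T$ and $X_i$ by $\psi_i(X, \theta)$; then
$$\rho^2 = \frac{\nu' C^{-1}\nu}{V[T]} \le 1 \;\Rightarrow\; V[T] \ge \nu' C^{-1}\nu$$
where $\nu_i = Cov[T, \psi_i(X, \theta)]$, $i = 1, 2, \cdots, r$, and $C$ takes the place of $\Sigma$. In particular, when $\psi_i(x, \theta) = \frac{\partial \log p_\theta(x)}{\partial\theta_i}$, $\nu_i = \frac{\partial \tau(\theta)}{\partial\theta_i}$.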

4.11 Cramer - Rao Inequality - Multiparameter case


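Let $X$ be distributed with density $p_\theta(x)$, $\theta \in \Omega$, where $\theta$ is a vector, say $\theta = (\theta_1, \theta_2, \cdots, \theta_r)$.
Assumptions:
(i) $\Omega$ is an open set in $R^r$ (for example a product of finite, infinite or semi infinite open intervals).
(ii) The range of the distribution $P_\theta$ is independent of the parameter $\theta = (\theta_1, \theta_2, \cdots, \theta_r)$.
(iii) For any $x$, $\theta \in \Omega$ and $i = 1, 2, \cdots, r$ the derivative $\frac{\partial p_\theta(x)}{\partial\theta_i}$ exists and is finite.
Define the information matrix of order $r$
$$I(\theta) = [I_{ij}(\theta)]_{r \times r} \quad\text{where}\quad I_{ij}(\theta) = E_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta_i}\,\frac{\partial \log p_\theta(X)}{\partial\theta_j}\right]$$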

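For a single observation $x$ of $X$ and under the assumptions (i) to (iii),
$$\int p_\theta(x)\,dx = 1$$
Differentiate partially with respect to $\theta_i$:
$$\int \frac{\partial p_\theta(x)}{\partial\theta_i}\,dx = 0 \;\Rightarrow\; \int \frac{\partial \log p_\theta(x)}{\partial\theta_i}\,p_\theta(x)\,dx = 0 \;\Rightarrow\; E_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta_i}\right] = 0$$
Differentiating the middle equation partially with respect to $\theta_j$,
$$\int \frac{\partial^2 \log p_\theta(x)}{\partial\theta_i\,\partial\theta_j}\,p_\theta(x)\,dx + \int \frac{\partial \log p_\theta(x)}{\partial\theta_i}\,\frac{1}{p_\theta(x)}\,\frac{\partial p_\theta(x)}{\partial\theta_j}\,p_\theta(x)\,dx = 0$$
$$E_\theta\left[\frac{\partial^2 \log p_\theta(X)}{\partial\theta_i\,\partial\theta_j}\right] + E_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta_i}\,\frac{\partial \log p_\theta(X)}{\partial\theta_j}\right] = 0$$
Hence
$$I_{ij}(\theta) = E_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta_i}\,\frac{\partial \log p_\theta(X)}{\partial\theta_j}\right] = -E_\theta\left[\frac{\partial^2 \log p_\theta(X)}{\partial\theta_i\,\partial\theta_j}\right] \text{ for } i \ne j, \quad\text{and}\quad I_{ii}(\theta) = -E_\theta\left[\frac{\partial^2 \log p_\theta(X)}{\partial\theta_i^2}\right] \text{ for } i = j$$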

Theorem 4.12 Suppose that the assumptions (i) to (iii) and the relations $E_\theta\left[\frac{\partial \log p_\theta(X)}{\partial\theta_i}\right] = 0$, $i = 1, 2, \cdots, r$ hold and that $I(\theta)$ is positive definite. Let $T = t(X)$ be any statistic with $E_\theta[T^2] < \infty$ for which the derivative with respect to $\theta_i$, $i = 1, 2, \cdots, r$ of $E_\theta[T] = \int t\,p_\theta(x)\,dx$ exists for each $i$ and can be obtained by differentiating under the integral sign. Then $V_\theta[T] \ge \alpha' I^{-1}(\theta)\,\alpha$, where $\alpha'$ is the row vector with $i$th element $\alpha_i = \frac{\partial E_\theta[T]}{\partial\theta_i}$, $i = 1, 2, \cdots, r$.
Proof: As in Theorem 4.11, replace $\psi_i(x, \theta) = \frac{\partial \log p_\theta(x)}{\partial\theta_i}$, $i = 1, 2, \cdots, r$, and take $\nu = \alpha$, $C = I(\theta)$ $\Rightarrow$ $V_\theta[T] \ge \alpha' I^{-1}(\theta)\,\alpha$.
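Example 4.24 Let $X_1, X_2, \cdots, X_n$ be iid $N(\theta, \sigma^2)$. Obtain the information inequality for the parameter $(\theta_1, \theta_2) = (\theta, \sigma^2)$.
The Cramer - Rao inequality for $(\theta_1, \theta_2)$ is $V_\theta[T] \ge \alpha' I^{-1}(\theta)\,\alpha$, where, for a statistic $T$ with $E_\theta[T] = \tau(\theta_1, \theta_2)$,
$$\alpha = \begin{pmatrix} \frac{\partial E_\theta[T]}{\partial\theta_1}\\ \frac{\partial E_\theta[T]}{\partial\theta_2} \end{pmatrix}, \qquad I(\theta) = \begin{pmatrix} I_{11}(\theta) & I_{12}(\theta)\\ I_{21}(\theta) & I_{22}(\theta) \end{pmatrix}, \qquad I_{ij}(\theta) = -E_\theta\left[\frac{\partial^2 \log L(\theta)}{\partial\theta_i\,\partial\theta_j}\right],\ i, j = 1, 2$$
With $\theta_1 = \theta$ and $\theta_2 = \sigma^2$,
$$I_{11}(\theta) = -E_\theta\left[\frac{\partial^2 \log L(\theta)}{\partial\theta^2}\right] = E_\theta\left[\frac{\partial \log L(\theta)}{\partial\theta}\right]^2, \quad I_{12}(\theta) = I_{21}(\theta) = -E_\theta\left[\frac{\partial^2 \log L(\theta)}{\partial\theta\,\partial\sigma^2}\right], \quad I_{22}(\theta) = -E_\theta\left[\frac{\partial^2 \log L(\theta)}{\partial(\sigma^2)^2}\right]$$
The likelihood function is
$$L(\theta) = \prod_{i=1}^n p_\theta(x_i) = \left(\frac{1}{2\pi\sigma^2}\right)^{\frac{n}{2}} e^{-\frac{1}{2\sigma^2}\sum (x_i - \theta)^2}$$
$$\log L(\theta) = -\frac{n}{2}\log 2\pi - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum (x_i - \theta)^2$$
$$\frac{\partial \log L(\theta)}{\partial\theta} = \frac{1}{2\sigma^2}\,2\sum (x_i - \theta) = \frac{n}{\sigma^2}[\bar x - \theta]$$
$$E_\theta\left[\frac{\partial \log L(\theta)}{\partial\theta}\right]^2 = \frac{n^2}{\sigma^4}\,E_\theta[\bar X - \theta]^2 \;\Rightarrow\; I_{11}(\theta) = \frac{n^2}{\sigma^4}\,\frac{\sigma^2}{n} = \frac{n}{\sigma^2}$$
$$I_{12}(\theta) = I_{21}(\theta) = -E_\theta\left[\frac{\partial^2 \log L(\theta)}{\partial\theta\,\partial\sigma^2}\right] = 0 \quad\text{since}\quad E_\theta\left[\frac{\partial^2 \log L(\theta)}{\partial\sigma^2\,\partial\theta}\right] = -\frac{n}{\sigma^4}\,E_\theta[\bar X - \theta] = 0$$
$$\frac{\partial \log L(\theta)}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}\sum (x_i - \theta)^2$$
$$\frac{\partial^2 \log L(\theta)}{\partial(\sigma^2)^2} = \frac{n}{2\sigma^4} - \frac{1}{(\sigma^2)^3}\sum (x_i - \theta)^2$$
$$I_{22}(\theta) = -E_{\sigma^2}\left[\frac{\partial^2 \log L(\theta)}{\partial(\sigma^2)^2}\right] = -\frac{n}{2\sigma^4} + \frac{n\sigma^2}{\sigma^6} = \frac{n}{\sigma^4}\left(1 - \frac{1}{2}\right) = \frac{n}{2\sigma^4}$$
$$I(\theta) = \begin{pmatrix} \frac{n}{\sigma^2} & 0\\ 0 & \frac{n}{2\sigma^4} \end{pmatrix}, \qquad I^{-1}(\theta) = \begin{pmatrix} \frac{\sigma^2}{n} & 0\\ 0 & \frac{2\sigma^4}{n} \end{pmatrix}$$
For an unbiased estimator $T_1$ of $\theta$, $\alpha = (1, 0)'$, and for an unbiased estimator $T_2$ of $\sigma^2$, $\alpha = (0, 1)'$, i.e., $V_\theta[T_1] \ge \frac{\sigma^2}{n}$ and $V_{\sigma^2}[T_2] \ge \frac{2\sigma^4}{n}$.
Remark 4.9 The actual variance $\frac{\sigma^2}{n}$ of the unbiased estimator $T_1 = \bar X$ of $\theta$ is the same as the Cramer - Rao lower bound of that estimator, but the actual variance $\frac{2\sigma^4}{n-1}$ of the unbiased estimator $T_2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2$ of $\sigma^2$ is greater than the Cramer - Rao lower bound of that estimator.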

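4.11 Bhattacharya Inequality

When the Cramer - Rao lower bound is not sharp, it can be improved by considering the higher order derivatives of the likelihood function of the parameter $\theta$.
Assumptions: Let $X_1, X_2, \dots, X_n$ be distributed with pdf $p(x \mid \theta)$, $\theta\in\Omega$.
(i) $\Omega$ is an open interval (finite, infinite or semi-infinite).
(ii) The range of the distribution $P_\theta$, $\theta\in\Omega$, is independent of the parameter $\theta$.
(iii) For any $x$ and $\theta\in\Omega$, the higher order derivatives
$$\frac{\partial^{i_1+i_2+\cdots+i_s} L(\theta)}{\partial\theta_1^{i_1}\cdots\partial\theta_s^{i_s}}$$
exist and are finite.
(iv) Define
$$\psi_i(x,\theta) = \frac{1}{L(\theta)}\,\frac{\partial^{i_1+i_2+\cdots+i_s} L(\theta)}{\partial\theta_1^{i_1}\cdots\partial\theta_s^{i_s}}$$
and $K(\theta) = [K_{ij}(\theta)]_{s\times s}$, where $K_{ij}(\theta) = E_\theta[\psi_i(X,\theta)\,\psi_j(X,\theta)]$.

Theorem 4.13 Suppose that the assumptions (i) to (iv) hold and that the covariance matrix $K(\theta)$ is positive definite. Let $T = t(X)$ be any statistic with $E_\theta[T^2] < \infty$ for which the higher order derivative $\tau^{(i_1+i_2+\cdots+i_s)}(\theta)$ exists for each $i = 1, 2, \dots, s$ and can be obtained by differentiating under the integral sign. Then $V_\theta[T] \ge \alpha' K^{-1}(\theta)\alpha$, where $\alpha'$ is the row vector with elements
$$\frac{\partial^{i_1+\cdots+i_s}E_\theta[T]}{\partial\theta_1^{i_1}\cdots\partial\theta_s^{i_s}} = \mathrm{Cov}_\theta\left(T, \psi_i(X,\theta)\right) = \tau^{(i_1+\cdots+i_s)}(\theta).$$
Proof: As in Theorem 4.11, replace $\psi_i(x,\theta)$ as defined in (iv), $C = K(\theta) = [K_{ij}(\theta)]_{s\times s}$ and $\nu = \alpha' = (\tau'(\theta)\ \ \tau''(\theta)\ \cdots\ \tau^{(s)}(\theta))$; then $V_\theta[T] \ge \alpha' K^{-1}(\theta)\alpha$. Note that for a first order derivative $\psi_1 = \frac{\partial\log L(\theta)}{\partial\theta}$.

Example 4.25 Given that $X \sim b(n, \theta)$, $0 < \theta < 1$, obtain the Bhattacharya bound (of order $s = 2$, $n \ge 2$) for the variance of an unbiased estimator of $\tau(\theta) = \theta^2$.
$$L(\theta) = p(x\mid\theta) = \binom nx\theta^x(1-\theta)^{n-x}$$
$$\log L(\theta) = \log\binom nx + x\log\theta + (n-x)\log(1-\theta)$$
With $\psi_1 = \frac{1}{L}\frac{\partial L}{\partial\theta}$ and $\psi_2 = \frac1L\frac{\partial^2 L}{\partial\theta^2}$,
$$K(\theta) = E_\theta\begin{pmatrix}\psi_1^2 & \psi_1\psi_2\\ \psi_2\psi_1 & \psi_2^2\end{pmatrix}.$$
$$\psi_1 = \frac{\partial\log L(\theta)}{\partial\theta} = \frac x\theta - \frac{n-x}{1-\theta} = \frac{x - x\theta - n\theta + x\theta}{\theta(1-\theta)} = \frac{x-n\theta}{\theta(1-\theta)}$$
$$\frac{\partial^2\log L(\theta)}{\partial\theta^2} = -\frac{n\theta(1-\theta) + (1-2\theta)(x-n\theta)}{\theta^2(1-\theta)^2}$$
$$\psi_2 = \left(\frac{\partial\log L(\theta)}{\partial\theta}\right)^2 + \frac{\partial^2\log L(\theta)}{\partial\theta^2} = \frac{(x-n\theta)^2 - (1-2\theta)(x-n\theta) - n\theta(1-\theta)}{\theta^2(1-\theta)^2}$$
Write $Y = X - n\theta$. For the binomial distribution $E_\theta[Y] = 0$, $E_\theta[Y^2] = n\theta(1-\theta)$, $E_\theta[Y^3] = n\theta(1-\theta)(1-2\theta)$ and $E_\theta[Y^4] = 3n^2\theta^2(1-\theta)^2 + n\theta(1-\theta)[1 - 6\theta(1-\theta)]$. Hence
$$K_{11}(\theta) = E_\theta\left[\frac{(X-n\theta)^2}{\theta^2(1-\theta)^2}\right] = \frac{n\theta(1-\theta)}{\theta^2(1-\theta)^2} = \frac{n}{\theta(1-\theta)}$$
$$K_{12}(\theta) = K_{21}(\theta) = \frac{E_\theta[Y^3] - (1-2\theta)E_\theta[Y^2]}{\theta^3(1-\theta)^3} = 0$$
$$K_{22}(\theta) = \frac{E_\theta\left[\{Y^2 - (1-2\theta)Y - n\theta(1-\theta)\}^2\right]}{\theta^4(1-\theta)^4} = \frac{2n(n-1)\theta^2(1-\theta)^2}{\theta^4(1-\theta)^4} = \frac{2n(n-1)}{\theta^2(1-\theta)^2}$$
$$K(\theta) = \begin{pmatrix}\frac{n}{\theta(1-\theta)} & 0\\ 0 & \frac{2n(n-1)}{\theta^2(1-\theta)^2}\end{pmatrix},\qquad K^{-1}(\theta) = \begin{pmatrix}\frac{\theta(1-\theta)}{n} & 0\\ 0 & \frac{\theta^2(1-\theta)^2}{2n(n-1)}\end{pmatrix}$$
$$\tau(\theta) = \theta^2,\quad \tau'(\theta) = 2\theta,\quad \tau''(\theta) = 2$$
$$V_\theta[T] \ge (2\theta\ \ 2)\,K^{-1}(\theta)\begin{pmatrix}2\theta\\ 2\end{pmatrix} = \frac{4\theta^3(1-\theta)}{n} + \frac{2\theta^2(1-\theta)^2}{n(n-1)}$$
$\ge$ Cramer - Rao lower bound of $\theta^2$ + positive quantity.
To identify the first term, reparametrise by $\theta^2$. Since
$$\log L(\theta) = \log\frac{n!}{x!(n-x)!} + \frac x2\log\theta^2 + (n-x)\log\left[1 - (\theta^2)^{1/2}\right],$$
differentiating with respect to $\theta^2$ gives
$$\frac{\partial\log L(\theta)}{\partial\theta^2} = \frac{x - n\theta}{2\theta^2(1-\theta)}$$
$$E_\theta\left[\left(\frac{\partial\log L(\theta)}{\partial\theta^2}\right)^2\right] = E_\theta\left[\frac{(X-n\theta)^2}{4\theta^4(1-\theta)^2}\right] = \frac{n\theta(1-\theta)}{4\theta^4(1-\theta)^2} = \frac{n}{4\theta^3(1-\theta)} = I(\theta^2)$$
The Cramer - Rao lower bound for the variance of an unbiased estimator of $\theta^2$ is $\frac{1}{I(\theta^2)} = \frac{4\theta^3(1-\theta)}{n}$, since the derivative of $\theta^2$ with respect to itself is 1.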
Remark 4.10 (i) The Bhattacharya Inequality becomes the Cramer - Rao Inequality when $s = 1$, i.e., $\alpha_1 = \tau'(\theta)$ and
$$K_{11}(\theta) = E_\theta\left[\frac{\partial\log L(\theta)}{\partial\theta}\,\frac{\partial\log L(\theta)}{\partial\theta}\right] = E_\theta\left[\left(\frac{\partial\log L(\theta)}{\partial\theta}\right)^2\right] = I(\theta)$$
$$V_\theta[T] \ge \alpha_1[I^{-1}(\theta)]\alpha_1 = \frac{\alpha_1^2}{I(\theta)} = \frac{[\tau'(\theta)]^2}{V_\theta\left[\frac{\partial\log L(\theta)}{\partial\theta}\right]}$$
(ii) When $s = 2$, the Bhattacharya Inequality gives a lower bound for the variance of an unbiased estimator of $\tau(\theta)$ that is never smaller than the Cramer - Rao bound.
The Bhattacharya Inequality is $V_\theta[T] \ge \alpha' K^{-1}(\theta)\alpha$, where $\alpha' = (\tau'(\theta)\ \ \tau''(\theta))$ and
$$K(\theta) = \begin{pmatrix}K_{11}(\theta) & K_{12}(\theta)\\ K_{21}(\theta) & K_{22}(\theta)\end{pmatrix}_{2\times2}.$$
Since the covariance matrix of $(T, \psi_1, \psi_2)$ of Theorem 4.13 is non-negative definite and $K_{21} = K_{12}$, consider
$$\begin{vmatrix}V_\theta[T] & \tau'(\theta) & \tau''(\theta)\\ \tau'(\theta) & K_{11}(\theta) & K_{12}(\theta)\\ \tau''(\theta) & K_{21}(\theta) & K_{22}(\theta)\end{vmatrix} \ge 0.$$
Expanding along the first row (and dropping the argument $\theta$),
$$V_\theta[T][K_{11}K_{22} - K_{12}^2] - \tau'[\tau'K_{22} - \tau''K_{12}] + \tau''[\tau'K_{12} - \tau''K_{11}] \ge 0$$
$$V_\theta[T][K_{11}K_{22} - K_{12}^2] \ge [\tau']^2K_{22} - 2\tau'\tau''K_{12} + [\tau'']^2K_{11}$$
$$= \frac{1}{K_{11}}\left\{[\tau']^2K_{22}K_{11} - 2\tau'\tau''K_{11}K_{12} + [\tau'']^2K_{11}^2\right\}$$
$$= \frac{1}{K_{11}}\left\{[\tau'K_{12} - \tau''K_{11}]^2 + [\tau']^2[K_{11}K_{22} - K_{12}^2]\right\}$$
Dividing by $K_{11}K_{22} - K_{12}^2$,
$$V_\theta[T] \ge \frac{[\tau'(\theta)]^2}{K_{11}(\theta)} + \frac{[\tau'(\theta)K_{12}(\theta) - \tau''(\theta)K_{11}(\theta)]^2}{K_{11}(\theta)[K_{11}(\theta)K_{22}(\theta) - K_{12}^2(\theta)]}$$
$\ge$ Cramer - Rao lower bound + non-negative quantity.
Since $K(\theta)$ is positive definite, $K_{11}(\theta)K_{22}(\theta) - K_{12}^2(\theta) > 0$ and $K_{11}(\theta) = V_\theta\left[\frac{\partial\log L(\theta)}{\partial\theta}\right] > 0$. Thus the Bhattacharya Inequality is at least as sharp as the Cramer - Rao Inequality.
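The algebra of Remark 4.10 (ii) can be checked numerically. The C sketch below (function names are hypothetical) evaluates the $s = 2$ bound $\alpha' K^{-1}\alpha$ directly from the $2\times2$ inverse, and again in the split form "Cramer - Rao term plus non-negative correction"; for any positive definite $K$ the two values agree.

```c
#include <assert.h>
#include <math.h>

/* alpha' K^{-1} alpha for s = 2 from the explicit 2x2 inverse.
   d1 = tau'(theta), d2 = tau''(theta); K = [[k11, k12], [k12, k22]] must be
   positive definite. */
double bhatt2_direct(double d1, double d2, double k11, double k12, double k22)
{
    double det = k11 * k22 - k12 * k12;
    assert(k11 > 0.0 && det > 0.0);
    return (d1 * d1 * k22 - 2.0 * d1 * d2 * k12 + d2 * d2 * k11) / det;
}

/* Same bound as d1^2/k11 + (d1 k12 - d2 k11)^2 / (k11 det);
   the Cramer-Rao part d1^2/k11 is returned through crlb_part. */
double bhatt2_split(double d1, double d2, double k11, double k12, double k22,
                    double *crlb_part)
{
    double det = k11 * k22 - k12 * k12;
    assert(k11 > 0.0 && det > 0.0);
    double extra = d1 * k12 - d2 * k11;
    *crlb_part = d1 * d1 / k11;
    return *crlb_part + extra * extra / (k11 * det);
}
```

With $\tau' = 1$, $\tau'' = 0.7$, $K_{11} = 2$, $K_{12} = 0.5$, $K_{22} = 3$ both functions give about $0.5704$, against a Cramer - Rao part of $0.5$.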
Problems
4.1 Let X1 , X2 , · · · , Xn be a random sample drawn from a normal population with
mean θ . Which among the two estimators T1 = X1 +X2n+···+Xn and T2 =
X1 +X2 +···+Xn
n is better? Why?
4.2 Show that, under some conditions to be stated, there is a lower limit to the variance of an unbiased estimator. How would you modify the lower limit for a biased estimator?
4.3 Let $X_1, X_2$ be independent random variables each having a Poisson distribution with mean $\theta$. Show that $V_\theta\left[\frac{X_1+X_2}{2}\right] \le V_\theta[2X_1 - X_2]$. Also justify the inequality by the Rao - Blackwell Theorem.
4.4 Show that Bhattacharya bound is better than Cramer - Rao bound.
4.5 Define the Bhattacharya bound of order $r$. Also obtain $B(2)$ for estimating $\theta^2$ unbiasedly, $\theta$ being the mean of a Bernoulli distribution from which a sample of size $n$ is available.
4.6 Let $X$ and $Y$ have a bivariate normal distribution with means $\theta_1$ and $\theta_2$, positive variances $\sigma_1^2$ and $\sigma_2^2$, and correlation coefficient $\rho$. Find $E_{\theta_2}[Y \mid X = x] = \phi(x)$ and the variance of $\phi(X)$.
4.7 Mention the significance of Rao - Blackwell Theorem.
4.8 In what way is the Lehmann - Scheffé Theorem different from the Rao - Blackwell Theorem?
4.9 Let $X$ be a Hypergeometric random variable with pmf
$$P_D\{X = x\} = \frac{\binom{D}{x}\binom{N-D}{n-x}}{\binom{N}{n}},$$
where $\max(0, D + n - N) \le x \le \min(n, D)$. Find the UMVUE for $D$, where $N$ is assumed to be known.
4.10 Let $X_1, X_2, \dots, X_n$ be a random sample from a population with mean $\theta$ and finite variance, and let $T = t(X)$ be an estimator of $\theta$ of the form $T = \sum_{i=1}^n \alpha_i X_i$. If $T$ is an unbiased estimator of $\theta$ that has minimum variance and $T' = t'(X)$ is another linear unbiased estimator of $\theta$, then show that $\mathrm{Cov}_\theta(T, T') = V_\theta[T]$.
4.11 Let $X_1, X_2, \dots, X_n$ be a random sample from $p(x \mid \theta) = \theta e^{-\theta x}$, $\theta > 0$, $x > 0$. Show that $\frac{n-1}{\sum_{i=1}^n X_i}$ is the UMVUE of $\theta$.

4.12 Stating the assumptions clearly, derive the Chapman - Robbins lower bound for the variance of an unbiased estimator of a function of a real valued parameter $\theta$.
4.13 A random sample $X_1, X_2, \dots, X_n$ is available from a Poisson population with mean $\lambda$. Using the unbiased estimator $T = t(X_1, X_2) = X_1^2 - X_2$, obtain the UMVUE of $\lambda^2$ based on the sample.
4.14 State the Bhattacharya bound of order s . Also prove that it is a non - decreasing
function of s .
4.15 Define Bhattacharya bound. Show that it is sharper than the Cramer - Rao bound.
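4.16 On the basis of a random sample of size $n$, the Cramer - Rao lower bound of the variance of an unbiased estimator of $\theta$ in
$$p_\theta(x) = \begin{cases}\frac{1}{\pi[1+(x-\theta)^2]} & -\infty < x < \infty;\ -\infty < \theta < \infty\\ 0 & \text{otherwise}\end{cases}$$
is equal to
(a) $\frac1n$ (b) $\frac{1}{n^2}$ (c) $\frac2n$ (d) $\frac2n$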

4.17 $T_1 = t_1(X)$ and $T_2 = t_2(X)$ are independent unbiased estimators of $\theta$ with $V[T_i] = v_i$, $i = 1, 2$. The best linear unbiased estimator $(l_1T_1 + l_2T_2)$ of $\theta$ is the one for which
(a) $l_1 = l_2 = .5$
(b) $l_1 = \frac{v_2}{v_1+v_2}$; $l_2 = \frac{v_1}{v_1+v_2}$
(c) $l_1 = \frac{v_1^{-1}}{v_1^{-1}+v_2^{-1}}$
(d) $l_1 = 0$, $l_2 = 1$ if $v_1 > v_2$ and vice versa
4.18 Consider the following statements:
If $X_1, X_2, \dots, X_n$ are iid random variables with uniform distribution over $(0, \theta)$, then
1. $2\bar X$ is an unbiased estimator of $\theta$.
2. The largest among $X_1, X_2, \dots, X_n$ is an unbiased estimator of $\theta$.
3. The largest among $X_1, X_2, \dots, X_n$ is sufficient for $\theta$.
4. $\frac{n+1}{n}X_{(n)}$ is a minimum variance unbiased estimator of $\theta$.
Of these statements:
(a) 1 and 3 are correct
(b) 1 and 4 are correct
(c) 1 and 2 are correct
(d) 1, 3 and 4 are correct
4.19 Which one of the following is not necessary for the UMVU estimation of θ by
T = t(X) ?
(a) Eθ [T − θ] = 0
(b) Eθ [T − θ]2 < ∞
(c) Eθ [T − θ]2 is minimum
(d) T is a linear function of observations
4.20 If $T_1 = t_1(X)$ and $T_2 = t_2(X)$ are unbiased estimators of $\theta$ and $\theta^2$ $(0 < \theta < 1)$ and $T$ is a sufficient statistic, then $E[T_1 \mid T] - E[T_2 \mid T]$ is:
(a) the minimum variance unbiased estimator of θ
(b) always an unbiased estimator of θ(1 − θ), which has variance not exceeding
that of θ(1 − θ)
(c) always the minimum variance unbiased estimator of θ(1 − θ)
(d) not an unbiased estimator of θ(1 − θ)
4.21 $T' = t'(X)$ and $T = t(X)$ are two unbiased estimators of $\tau(\theta)$ with variances $V_\theta[T] < \infty$ and $V_\theta[T'] < \infty$. The estimator $T$ is said to be an efficient estimator of $\tau(\theta)$ if:
(a) $V_\theta[T] < V_\theta[T']$
(b) $V_\theta[T] > V_\theta[T']$
(c) $V_\theta[T] = V_\theta[T']$
(d) none of the above
4.22 $T' = t'(X)$ and $T = t(X)$ are two unbiased estimators of $\tau(\theta)$ with variances $V_\theta[T] < \infty$ and $V_\theta[T'] < \infty$. The estimator $T$ is an efficient estimator relative to $T'$ of the parameter $\tau(\theta)$ if:
(a) $V_\theta[T] < V_\theta[T']$
(b) $V_\theta[T] > V_\theta[T']$
(c) $V_\theta[T] \ne V_\theta[T']$
(d) none of the above


5. METHODS OF ESTIMATION

5.1 Introduction
Chapters 2, 3 and 4 discuss the properties of a good estimator. The methods of obtaining such estimators are as follows:
(i) Method of Maximum Likelihood Estimation
(ii) Method of Minimum Variance Bound Estimation
(iii) Method of Moments Estimation
(iv) Method of Least Square Estimation
(v) Method of Minimum Chi-Square Estimation

5.2 Method of Maximum Likelihood Estimation


The principle of Maximum Likelihood Estimation states that the estimate of $\theta$ is the value $\hat\theta(x)$, within the admissible range of $\theta$, which makes the likelihood function $L(\theta)$ as large as possible; that is, $L(\hat\theta) \ge L(\theta)$ for every admissible value $\theta$. Thus $\hat\theta(x)$ is a solution of $\frac{\partial L(\theta)}{\partial\theta} = 0$ with $\frac{\partial^2 L(\theta)}{\partial\theta^2} < 0$ at $\theta = \hat\theta(x)$. Equivalently, $\frac{\partial\log L(\theta)}{\partial\theta} = 0$ and $\frac{\partial^2\log L(\theta)}{\partial\theta^2} < 0$ at $\theta = \hat\theta(x)$. Any non-trivial solution $\hat\theta(X)$ of these equations which maximizes $L(\theta)$ is called the Maximum Likelihood Estimator (MLE) of $\theta$.

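Example 5.1 Let $X_1, X_2, \dots, X_n$ be an iid random sample drawn from a population with pdf $N(0, \theta)$, $\theta > 0$. Find the MLE of $\theta$.
The likelihood function for $\theta$ of the sample size $n$ is
$$L(\theta) = \prod_{i=1}^n p_\theta(x_i) = \left(\frac{1}{2\pi\theta}\right)^{n/2} e^{-\frac{1}{2\theta}\sum_{i=1}^n x_i^2}$$
$$\log L(\theta) = -\frac n2\log 2\pi - \frac n2\log\theta - \frac{1}{2\theta}\sum_{i=1}^n x_i^2$$
Differentiating this with respect to $\theta$,
$$\frac{\partial\log L(\theta)}{\partial\theta} = 0 - \frac{n}{2\theta} + \frac{1}{2\theta^2}\sum_{i=1}^n x_i^2$$
$$\frac{\partial^2\log L(\theta)}{\partial\theta^2} = \frac{n}{2\theta^2} - \frac{1}{\theta^3}\sum_{i=1}^n x_i^2$$
For maximum, $\frac{\partial\log L(\theta)}{\partial\theta} = 0 \Rightarrow -n + \frac1\theta\sum_{i=1}^n x_i^2 = 0$, i.e., $\hat\theta = \frac{\sum_{i=1}^n x_i^2}{n}$, and
$$\left.\frac{\partial^2\log L(\theta)}{\partial\theta^2}\right|_{\theta=\hat\theta(x)} = \frac{n}{2\hat\theta^2} - \frac{n\hat\theta}{\hat\theta^3} = -\frac{n}{2\hat\theta^2} < 0.$$
The MLE of $\theta$ is $\hat\theta(X) = \frac{\sum_{i=1}^n X_i^2}{n}$.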
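The estimator of Example 5.1 is straightforward to compute. The short C sketch below (function names are hypothetical, not from the text) evaluates $\hat\theta = \frac1n\sum x_i^2$ and the score $\partial\log L(\theta)/\partial\theta$, so one can confirm that the score vanishes at $\hat\theta$.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* MLE of theta for N(0, theta): theta_hat = (1/n) sum x_i^2. */
double mle_normal_zero_mean_var(const double *x, size_t n)
{
    assert(n > 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i] * x[i];
    return s / (double)n;
}

/* Score d log L / d theta = -n/(2 theta) + sum x_i^2 / (2 theta^2). */
double score_normal_zero_mean_var(const double *x, size_t n, double theta)
{
    assert(theta > 0.0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i] * x[i];
    return -(double)n / (2.0 * theta) + s / (2.0 * theta * theta);
}
```

For the data $1, -2, 3, 0.5$ the sketch gives $\hat\theta = 14.25/4 = 3.5625$, where the score is zero up to rounding.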
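Example 5.2 A random sample of size $n$ is drawn from a population having density function
$$p_\theta(x) = \begin{cases}\theta x^{\theta-1} & 0 < x < 1,\ 0 < \theta < \infty\\ 0 & \text{otherwise}\end{cases}$$
Find the MLE of $\theta$.
The likelihood function for $\theta$ of the sample size $n$ is
$$L(\theta) = \prod_{i=1}^n p_\theta(x_i) = \theta^n\prod_{i=1}^n x_i^{\theta-1}$$
$$\log L(\theta) = n\log\theta + (\theta-1)\sum_{i=1}^n\log x_i$$
$$\frac{\partial\log L(\theta)}{\partial\theta} = \frac n\theta + \sum_{i=1}^n\log x_i,\qquad \frac{\partial^2\log L(\theta)}{\partial\theta^2} = -\frac{n}{\theta^2}$$
For maximum, $\frac{\partial\log L(\theta)}{\partial\theta} = 0 \Rightarrow \frac n\theta + \sum_{i=1}^n\log x_i = 0$, i.e., $\hat\theta(x) = \frac{-n}{\sum_{i=1}^n\log x_i}$, and
$$\left.\frac{\partial^2\log L(\theta)}{\partial\theta^2}\right|_{\theta=\hat\theta(x)} = -\frac{n}{\hat\theta^2(x)} = -\frac{\left(\sum\log x_i\right)^2}{n} < 0.$$
Thus the MLE of $\theta$ is $\hat\theta(X) = \frac{-n}{\sum_{i=1}^n\log X_i}$.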

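Example 5.3 Let $X_1, X_2, \dots, X_n$ be iid with common pdf
$$p_\theta(x) = \begin{cases}\frac1\theta e^{-x/\theta} & 0 < x < \infty,\ \theta > 0\\ 0 & \text{otherwise}\end{cases}$$
Obtain the MLE of $P_\theta\{X > 2\}$.
Let
$$p = P_\theta\{X > 2\} = 1 - P_\theta\{X \le 2\} = 1 - \int_0^2\frac1\theta e^{-x/\theta}dx = e^{-2/\theta}$$
$$\log p = -\frac2\theta \Rightarrow \log\frac1p = \frac2\theta \Rightarrow \theta = \frac{2}{\log\frac1p}$$
A sample of size $n$ is taken and it is known only that $k$ of the observations satisfy $X > 2$ and the remaining $(n-k)$ observations satisfy $X \le 2$. The likelihood function for $p$ of the sample size $n$ is
$$L(p) = p^k(1-p)^{n-k}$$
$$\log L(p) = k\log p + (n-k)\log(1-p)$$
$$\frac{\partial\log L(p)}{\partial p} = \frac kp - \frac{n-k}{1-p} = \frac{k - np}{p(1-p)}$$
$$\frac{\partial^2\log L(p)}{\partial p^2} = \frac{-np^2 - k + 2pk}{[p(1-p)]^2}$$
For maximum, $\frac{\partial\log L(p)}{\partial p} = 0 \Rightarrow k - np = 0$, i.e., $\hat p = \frac kn$, and
$$\left.\frac{\partial^2\log L(p)}{\partial p^2}\right|_{\hat p = k/n} = \frac{-n\left(\frac kn\right)^2 - k + 2\frac kn k}{\left[\frac kn\left(1-\frac kn\right)\right]^2} = -\frac{k\left(1-\frac kn\right)}{\left(\frac kn\right)^2\left(1-\frac kn\right)^2} < 0\quad\text{since } 0 < \frac kn < 1.$$
Thus the value of the MLE of $p$ is $\hat p = \frac kn$. The value of the MLE of $P\{X > 2\}$ is $e^{-2/\hat\theta(x)}$, where $\hat\theta(x) = \frac{-2}{\log\left(\frac kn\right)}$.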
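Example 5.4 Let $X_1, X_2, \dots, X_n$ be a random sample drawn from a normal population with mean $\theta$ and variance $\sigma^2$. The density function is
$$p_{\theta,\sigma^2}(x) = \begin{cases}\frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{1}{2\sigma^2}(x-\theta)^2} & -\infty < x < \infty,\ -\infty < \theta < \infty,\ \sigma^2 > 0\\ 0 & \text{otherwise}\end{cases}$$
Find the MLE of
(i) $\theta$ when $\sigma^2$ is known.
(ii) $\sigma^2$ when $\theta$ is known.
(iii) both $\theta$ and $\sigma^2$ when both are unknown.
Case (i) When $\sigma^2$ is known, the likelihood function for $\theta$ is
$$L(\theta) = \prod_{i=1}^n\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{1}{2\sigma^2}(x_i-\theta)^2} = (2\pi\sigma^2)^{-n/2}e^{-\frac{1}{2\sigma^2}\sum(x_i-\theta)^2}$$
$$\log L(\theta) = -\frac n2\log 2\pi - \frac n2\log\sigma^2 - \frac{1}{2\sigma^2}\sum(x_i-\theta)^2$$
$$\frac{\partial\log L(\theta)}{\partial\theta} = \frac{1}{\sigma^2}\sum(x_i-\theta),\qquad \frac{\partial^2\log L(\theta)}{\partial\theta^2} = -\frac{n}{\sigma^2} < 0$$
For maximum, $\frac{\partial\log L(\theta)}{\partial\theta} = 0 \Rightarrow \sum(x_i - \theta) = 0$, i.e., $\hat\theta(x) = \bar x$, and $\frac{\partial^2\log L(\theta)}{\partial\theta^2} < 0$.
Thus the value of the MLE of $\theta$ is $\hat\theta(x) = \bar x$.
Case (ii) When $\theta$ is known, the likelihood function for $\sigma^2$ gives
$$\log L(\sigma^2) = -\frac n2\log 2\pi - \frac n2\log\sigma^2 - \frac{1}{2\sigma^2}\sum(x_i-\theta)^2$$
$$\frac{\partial\log L(\sigma^2)}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{\sum(x_i-\theta)^2}{2(\sigma^2)^2},\qquad \frac{\partial^2\log L(\sigma^2)}{\partial(\sigma^2)^2} = \frac{n}{2\sigma^4} - \frac{\sum(x_i-\theta)^2}{\sigma^6}$$
For maximum, $\frac{\partial\log L(\sigma^2)}{\partial\sigma^2} = 0 \Rightarrow -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum(x_i-\theta)^2 = 0$, so that $\hat\sigma^2(x) = \frac{\sum(x_i-\theta)^2}{n}$, and
$$\left.\frac{\partial^2\log L(\sigma^2)}{\partial(\sigma^2)^2}\right|_{\sigma^2=\hat\sigma^2(x)} = \frac{n}{2\hat\sigma^4} - \frac{n}{\hat\sigma^4} = -\frac{n}{2\hat\sigma^4} < 0.$$
Thus the value of the MLE of $\sigma^2$ is $\hat\sigma^2(x) = \frac{\sum_{i=1}^n(x_i-\theta)^2}{n}$.
Case (iii) When $\theta$ and $\sigma^2$ are unknown, the likelihood function for $\theta$ and $\sigma^2$ gives
$$\log L(\theta,\sigma^2) = -\frac n2\log 2\pi - \frac n2\log\sigma^2 - \frac{1}{2\sigma^2}\sum(x_i-\theta)^2$$
$$\frac{\partial\log L(\theta,\sigma^2)}{\partial\theta} = \frac{1}{\sigma^2}\sum(x_i-\theta)$$
$\frac{\partial^2\log L(\theta,\sigma^2)}{\partial\theta\,\partial\sigma^2} = \frac{\partial^2\log L(\theta,\sigma^2)}{\partial\sigma^2\,\partial\theta}$, since both partial derivatives exist and are continuous:
$$\frac{\partial^2\log L(\theta,\sigma^2)}{\partial\theta\,\partial\sigma^2} = -\frac{1}{\sigma^4}\sum(x_i-\theta)$$
$$\frac{\partial\log L(\theta,\sigma^2)}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum(x_i-\theta)^2,\qquad \frac{\partial^2\log L(\theta,\sigma^2)}{\partial(\sigma^2)^2} = \frac{n}{2\sigma^4} - \frac{1}{\sigma^6}\sum(x_i-\theta)^2$$
$$\frac{\partial^2\log L(\theta,\sigma^2)}{\partial\theta^2} = -\frac{n}{\sigma^2}$$
For maximum of $L(\theta,\sigma^2)$,
$$\frac{\partial\log L}{\partial\theta} = 0,\quad \frac{\partial\log L}{\partial\sigma^2} = 0,\quad \frac{\partial^2\log L}{\partial\theta^2} < 0\quad\text{and}\quad \frac{\partial^2\log L}{\partial\theta^2}\,\frac{\partial^2\log L}{\partial(\sigma^2)^2} - \left(\frac{\partial^2\log L}{\partial\theta\,\partial\sigma^2}\right)^2 > 0$$
at $\theta = \hat\theta(x)$ and $\sigma^2 = \hat\sigma^2(x)$. The two likelihood equations give $\hat\theta(x) = \bar x$ and $\hat\sigma^2(x) = \frac{\sum(x_i-\bar x)^2}{n}$, and at this point
$$\left(\frac{-n}{\hat\sigma^2(x)}\right)\left(\frac{-n}{2\hat\sigma^4(x)}\right) - 0 > 0,$$
since
$$\left.\frac{\partial^2\log L}{\partial\theta^2}\right|_{\theta=\hat\theta(x)} = \frac{-n}{\hat\sigma^2(x)} < 0,\qquad \left.\frac{\partial^2\log L}{\partial(\sigma^2)^2}\right|_{\sigma^2=\hat\sigma^2(x)} = \frac{-n}{2(\hat\sigma^2(x))^2} < 0,$$
$$\left.\frac{\partial^2\log L}{\partial\theta\,\partial\sigma^2}\right|_{\theta=\hat\theta(x),\,\sigma^2=\hat\sigma^2(x)} = -\frac{1}{\hat\sigma^4(x)}\sum(x_i-\bar x) = 0.$$
$\therefore$ The values of the MLE of $\theta$ and $\sigma^2$ are $\hat\theta(x) = \bar x$ and $\hat\sigma^2(x) = \frac{\sum(x_i-\bar x)^2}{n}$.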
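For Case (iii) of Example 5.4, a C sketch (the function name is hypothetical) computing $\hat\theta = \bar x$ and $\hat\sigma^2$ with divisor $n$; note that $\hat\sigma^2$ differs from the unbiased $S^2$, which uses divisor $n - 1$.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Case (iii): both theta and sigma^2 unknown. theta_hat = xbar and
   sigma2_hat = (1/n) sum (x_i - xbar)^2 (divisor n, not n - 1). */
void mle_normal(const double *x, size_t n, double *theta_hat, double *sigma2_hat)
{
    assert(n > 0);
    double s = 0.0, ss = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    double xbar = s / (double)n;
    for (size_t i = 0; i < n; i++)
        ss += (x[i] - xbar) * (x[i] - xbar);
    *theta_hat = xbar;
    *sigma2_hat = ss / (double)n;
}
```

For the data $2, 4, 4, 4, 5, 5, 7, 9$ it returns $\hat\theta = 5$ and $\hat\sigma^2 = 32/8 = 4$.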
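Example 5.5 Find the MLE of the parameters $\alpha$ and $\lambda$ ($\lambda$ being large) from a sample of $n$ independent observations from the population represented by the following density function
$$p_{\alpha,\lambda}(x) = \begin{cases}\frac{(\lambda/\alpha)^\lambda}{\Gamma\lambda}e^{-\frac\lambda\alpha x}x^{\lambda-1} & x > 0,\ \lambda > 0,\ \alpha > 0\\ 0 & \text{otherwise}\end{cases}$$
Also obtain the asymptotic form of the covariance matrix of the two estimators for large $n$. Given that $\frac{\partial\log\Gamma\lambda}{\partial\lambda} \approx \log\lambda - \frac{1}{2\lambda}$.
The likelihood function for $\alpha$ and $\lambda$ of the sample size $n$ is
$$L(\alpha,\lambda) = \frac{1}{(\Gamma\lambda)^n}\left(\frac\lambda\alpha\right)^{n\lambda}e^{-\frac\lambda\alpha\sum_{i=1}^n x_i}\prod_{i=1}^n x_i^{\lambda-1}$$
$$\log L(\alpha,\lambda) = -n\log\Gamma\lambda + n\lambda\log\lambda - n\lambda\log\alpha - \frac\lambda\alpha\sum_{i=1}^n x_i + (\lambda-1)\sum_{i=1}^n\log x_i$$
$$\frac{\partial\log L}{\partial\alpha} = -\frac{n\lambda}{\alpha} + \lambda\frac{\sum x_i}{\alpha^2},\qquad \frac{\partial^2\log L}{\partial\alpha^2} = \frac{n\lambda}{\alpha^2} - 2\lambda\frac{\sum x_i}{\alpha^3},\qquad \frac{\partial^2\log L}{\partial\lambda\,\partial\alpha} = -\frac n\alpha + \frac{\sum x_i}{\alpha^2}$$
$$\frac{\partial\log L}{\partial\lambda} = -n\frac{\partial\log\Gamma\lambda}{\partial\lambda} + n(1+\log\lambda) - n\log\alpha - \frac{\sum_{i=1}^n x_i}{\alpha} + \sum_{i=1}^n\log x_i$$
$$\approx -n\left(\log\lambda - \frac{1}{2\lambda}\right) + n + n\log\lambda - n\log\alpha - \frac{\sum x_i}{\alpha} + \sum\log x_i = \frac{n}{2\lambda} + n - n\log\alpha - \frac{\sum x_i}{\alpha} + \sum\log x_i$$
$$\frac{\partial^2\log L}{\partial\lambda^2} = -\frac{n}{2\lambda^2}$$
For maximum of $\log L(\alpha,\lambda)$, $\frac{\partial\log L}{\partial\alpha} = 0$ and $\frac{\partial\log L}{\partial\lambda} = 0$:
$$-n\frac\lambda\alpha + \lambda\frac{\sum x_i}{\alpha^2} = 0 \rightarrow \hat\alpha(x) = \bar x,\quad\text{and}\quad \frac{n}{2\lambda} + n - n\log\alpha - \frac{\sum x_i}{\alpha} + \sum\log x_i = 0 \rightarrow \hat\lambda(x) = \frac{n}{2\sum_{i=1}^n(\log\bar x - \log x_i)}.$$
Further
$$\left.\frac{\partial^2\log L}{\partial\lambda\,\partial\alpha}\right|_{\alpha=\hat\alpha(x),\lambda=\hat\lambda(x)} = -\frac{n}{\bar x} + \frac{n\bar x}{\bar x^2} = 0,\qquad \left.\frac{\partial^2\log L}{\partial\lambda^2}\right|_{\lambda=\hat\lambda(x)} < 0,$$
and
$$\frac{\partial^2\log L}{\partial\lambda^2}\,\frac{\partial^2\log L}{\partial\alpha^2} - \left(\frac{\partial^2\log L}{\partial\lambda\,\partial\alpha}\right)^2 > 0\ \text{at}\ \alpha = \hat\alpha(x),\ \lambda = \hat\lambda(x),$$
i.e.,
$$\left[-\frac{n}{2\hat\lambda^2(x)}\right]\left[\frac{n\hat\lambda(x)}{\bar x^2} - \frac{2\hat\lambda(x)n\bar x}{\bar x^3}\right] - 0 = \frac{n^2}{2\hat\lambda(x)\bar x^2} > 0.$$
Thus the values of the MLE of $\alpha$ and $\lambda$ are $\hat\alpha(x) = \bar x$ and $\hat\lambda(x) = \frac{n}{2\sum(\log\bar x - \log x_i)}$.
The information matrix is
$$D = \begin{pmatrix}-E_{\alpha,\lambda}\left[\frac{\partial^2\log L}{\partial\alpha^2}\right] & -E_{\alpha,\lambda}\left[\frac{\partial^2\log L}{\partial\lambda\,\partial\alpha}\right]\\ -E_{\alpha,\lambda}\left[\frac{\partial^2\log L}{\partial\alpha\,\partial\lambda}\right] & -E_{\alpha,\lambda}\left[\frac{\partial^2\log L}{\partial\lambda^2}\right]\end{pmatrix}$$
$$-E_{\alpha,\lambda}\left[\frac{\partial^2\log L}{\partial\alpha^2}\right] = -\frac{n\lambda}{\alpha^2} + \frac{2\lambda}{\alpha^3}E_\alpha\left[\sum_{i=1}^n X_i\right] = -\frac{n\lambda}{\alpha^2} + \frac{2\lambda}{\alpha^3}n\alpha = \frac{n\lambda}{\alpha^2}\quad\text{since } E_\alpha[X_i] = \alpha\ \forall\, i$$
$$-E_{\alpha,\lambda}\left[\frac{\partial^2\log L}{\partial\lambda\,\partial\alpha}\right] = \frac n\alpha - \frac{n\alpha}{\alpha^2} = 0,\qquad -E_{\alpha,\lambda}\left[\frac{\partial^2\log L}{\partial\lambda^2}\right] = \frac{n}{2\lambda^2}$$
At $\alpha = \hat\alpha(x)$ and $\lambda = \hat\lambda(x)$,
$$D = \begin{pmatrix}\frac{n\hat\lambda(x)}{\hat\alpha^2(x)} & 0\\ 0 & \frac{n}{2\hat\lambda^2(x)}\end{pmatrix},$$
and the asymptotic covariance matrix of $(\hat\alpha, \hat\lambda)$ is its inverse
$$D^{-1} = \begin{pmatrix}\frac{\hat\alpha^2(x)}{n\hat\lambda(x)} & 0\\ 0 & \frac{2\hat\lambda^2(x)}{n}\end{pmatrix},$$
so that $\hat\alpha$ and $\hat\lambda$ are asymptotically uncorrelated.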

Remark 5.1 If the likelihood equation $\frac{\partial L(\theta)}{\partial\theta} = 0$ or $\frac{\partial\log L(\theta)}{\partial\theta} = 0$ has more than one root, or $L(\theta)$ is not differentiable everywhere in $\Omega$, then the MLE may be a terminal value or the middle value of the sample, and it need not be unbiased, sufficient, unique or consistent. If the likelihood function $L(\theta)$ for $\theta$ is continuously differentiable and bounded above, then any maximum of $L(\theta)$ attained at an interior point of $\Omega$ is a root of the likelihood equation.
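Example 5.6 MLE is a terminal value
The maximum likelihood estimate of the parameter $\alpha$, when $\beta$ is known, for the pdf
$$p_{\alpha,\beta}(x) = \begin{cases}\beta e^{-\beta(x-\alpha)} & \alpha \le x < \infty,\ \beta > 0,\ \alpha > 0\\ 0 & \text{otherwise}\end{cases}$$
from a sample of size $n$ is $\hat\alpha = x_{(1)}$.
When $\beta$ is known, the likelihood function for $\alpha$ of the sample size $n$ is
$$L(\alpha) = \beta^n e^{-\beta\sum_{i=1}^n(x_i-\alpha)}\quad\text{for } \alpha \le x_{(1)}\ (\text{and } 0 \text{ otherwise})$$
$$\log L(\alpha) = n\log\beta - \beta\sum_{i=1}^n(x_i-\alpha),\qquad \frac{\partial\log L(\alpha)}{\partial\alpha} = n\beta > 0$$
The direct method cannot be used to estimate $\alpha$, since the likelihood equation $n\beta = 0$ has no solution; this happens because the range of the distribution depends on the parameter: $\alpha \le x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)} < \infty$. Now
$$\log L(\alpha) = n\log\beta - n\beta\bar x + n\beta\alpha$$
is an increasing function of $\alpha$, so it is maximum when $\alpha$ takes its largest admissible value, i.e., $\hat\alpha = x_{(1)}$, the value of the minimum order statistic of the sample. Thus the value of the MLE of $\alpha$ is the terminal value $x_{(1)}$.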
Example 5.7 Let $X_1, X_2, \dots, X_n$ be a random sample drawn from a population having density
$$p_\theta(x) = \begin{cases}\frac12 e^{-|x-\theta|} & -\infty < x < \infty,\ -\infty < \theta < \infty\\ 0 & \text{otherwise}\end{cases}$$
Show that the sample median is the MLE of $\theta$.
The likelihood function for $\theta$ of the sample size $n$ is
$$L(\theta) = \left(\frac12\right)^n e^{-\sum|x_i-\theta|} = \left(\frac12\right)^n\frac{1}{e^{\sum|x_i-\theta|}}$$
$L(\theta)$ is maximum if $e^{\sum|x_i-\theta|}$ is minimum. But $e^{\sum|x_i-\theta|}$ is minimum if $\hat\theta(x)$ = median of the sample values, since the mean deviation is least when measured from the median. Thus the value of the MLE of $\theta$ is the middle value of the sample.
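A C sketch for Example 5.7 (names are hypothetical): it computes the sample median and the sum of absolute deviations $\sum|x_i - \theta|$, which equals $-\log L(\theta) - n\log 2$, so one can check that moving $\theta$ away from the median lowers the likelihood. For even $n$ any value between the two middle order statistics is a maximizer; the sketch returns their midpoint.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b)
{
    double u = *(const double *)a, v = *(const double *)b;
    return (u > v) - (u < v);
}

/* Sum of absolute deviations; -log L(theta) = n log 2 + this quantity. */
double sum_abs_dev(const double *x, size_t n, double theta)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += fabs(x[i] - theta);
    return s;
}

/* Sample median; sorts a copy held in buf (room for n values). */
double sample_median(const double *x, size_t n, double *buf)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        buf[i] = x[i];
    qsort(buf, n, sizeof *buf, cmp_double);
    return (n % 2) ? buf[n / 2] : 0.5 * (buf[n / 2 - 1] + buf[n / 2]);
}
```

For the data $3.1, -0.4, 7.2, 1.5, 2.0$ the median is $2.0$ with $\sum|x_i - 2.0| = 9.2$, while $\theta = 1.9$ or $2.1$ gives $9.3$.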
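Example 5.8 MLE is not unbiased
Let $X_1, X_2, \dots, X_5$ be a random sample of size 5 from the uniform distribution having pdf
$$p_\theta(x) = \begin{cases}\frac1\theta & 0 < x < \theta,\ \theta > 0\\ 0 & \text{otherwise}\end{cases}$$
Show that the MLE of $\theta$ is not unbiased.
The likelihood function for $\theta$ of the sample size $n = 5$ is
$$L(\theta) = \frac{1}{\theta^5}\quad\text{if } 0 < x_i < \theta,\ i = 1, 2, 3, 4, 5.$$
Since $1/\theta^5$ decreases as $\theta$ increases, $L(\theta)$ is maximized by taking $\theta$ as small as possible. However, $\theta$ must exceed every observation: a value such as $\hat\theta(x) = \min_{1\le i\le5}\{x_i\} = x_{(1)}$ is not consistent with the sample, since it makes $L = 0$. The smallest value of $\theta$ consistent with the sample gives
$$L[\hat\theta(x)] = \begin{cases}\left(\frac{1}{x_{(5)}}\right)^5 & \text{if } 0 < x_i \le x_{(5)}\\ 0 & \text{otherwise}\end{cases}$$
so $\hat\theta(x) = x_{(5)} = \max_{1\le i\le5}\{x_i\}$ is the value of the MLE of $\theta$.
Let $Y = \max_{1\le i\le5}\{X_i\}$. The pdf of $Y$ is
$$p_\theta(y) = \begin{cases}\frac{5}{\theta^5}y^4 & 0 < y < \theta\\ 0 & \text{otherwise}\end{cases}$$
$$E_\theta[Y] = \int_0^\theta\frac{5}{\theta^5}y^5\,dy = \frac56\theta \ne \theta$$
The MLE $\hat\theta(X) = X_{(5)}$ is not an unbiased estimator of $\theta$.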
Example 5.9 MLE is not unique and not a sufficient statistic
Let $X_1, X_2, \dots, X_n$ be iid with the pdf
$$p_\theta(x) = \begin{cases}1 & \theta \le x \le \theta + 1\\ 0 & \text{otherwise}\end{cases}$$
The likelihood function for $\theta$ of the sample size $n$ is
$$L(\theta) = \begin{cases}1 & \text{if } \theta \le x_i \le \theta+1,\ i = 1, 2, \dots, n\\ 0 & \text{otherwise}\end{cases} = \begin{cases}1 & \text{if } \theta \le \min\{x_i\} = x_{(1)} \le \max\{x_i\} = x_{(n)} \le \theta + 1\\ 0 & \text{otherwise}\end{cases} = \begin{cases}1 & \text{if } \theta\in[x_{(n)} - 1, x_{(1)}]\\ 0 & \text{otherwise}\end{cases}$$
Thus any point in $[x_{(n)} - 1, x_{(1)}]$ is a value of the MLE of $\theta$. Hence the MLE of $\theta$ is not unique and, in general, is not a sufficient statistic.
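Example 5.10 MLE does not exist
Let $X_1, X_2, \dots, X_n$ be a random sample drawn from a population with pmf $b(1, \theta)$, $0 < \theta < 1$, where $\theta$ is unknown, and suppose the only sample values available are $(0, 0, \dots, 0)$ or $(1, 1, \dots, 1)$.
The likelihood function for $\theta$ of the sample size $n$ is
$$L(\theta) = \theta^{\sum x_i}(1-\theta)^{n-\sum x_i}$$
$$\log L(\theta) = \sum x_i\log\theta + \left(n - \sum x_i\right)\log(1-\theta)$$
$$\frac{\partial\log L(\theta)}{\partial\theta} = \frac{\sum x_i}{\theta} - \frac{n - \sum x_i}{1-\theta}$$
For maximum, $\frac{\partial\log L(\theta)}{\partial\theta} = 0 \rightarrow \hat\theta(x) = \bar x$, and $\left.\frac{\partial^2\log L(\theta)}{\partial\theta^2}\right|_{\theta=\bar x} < 0$.
If $(0, 0, \dots, 0)$ or $(1, 1, \dots, 1)$ alone is observed, then $\bar x = 0$ or $1$ would be the value of the MLE of $\theta$. It is not an admissible value of $\theta$, since $\theta\in(0, 1)$. Thus the MLE of $\theta$ does not exist.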
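Example 5.11 MLE is not consistent
Let
$$\begin{pmatrix}X_i\\ Y_i\end{pmatrix} \sim N\left(\begin{pmatrix}\mu_i\\ \mu_i\end{pmatrix}, \sigma^2 I_2\right),\quad i = 1, 2, \dots, n,$$
be independent vectors, where $\mu_i$, $i = 1, 2, \dots, n$, and $\sigma^2$ are unknown. Thus $E[X_i] = E[Y_i] = \mu_i$ and $V[X_i] = V[Y_i] = \sigma^2$ for all $i$.
The likelihood function for $\mu_i$, $i = 1, 2, \dots, n$, and $\sigma^2$ is
$$L(\mu_i, \sigma^2) = \prod_{i=1}^n\frac{1}{2\pi\sigma^2}e^{-\frac{1}{2\sigma^2}(x_i-\mu_i)^2 - \frac{1}{2\sigma^2}(y_i-\mu_i)^2}$$
$$\log L(\mu_i,\sigma^2) = -n\log 2\pi - n\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n(x_i-\mu_i)^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n(y_i-\mu_i)^2$$
$$\frac{\partial\log L}{\partial\mu_i} = 0 \Rightarrow \frac{1}{\sigma^2}(x_i-\mu_i) + \frac{1}{\sigma^2}(y_i-\mu_i) = 0 \Rightarrow \hat\mu_i = \frac{x_i+y_i}{2},\quad i = 1, 2, \dots, n$$
$$\frac{\partial\log L}{\partial\sigma^2} = -\frac{n}{\sigma^2} + \frac{1}{2\sigma^4}\left[\sum_{i=1}^n(x_i-\mu_i)^2 + \sum_{i=1}^n(y_i-\mu_i)^2\right] = 0$$
Substituting $\hat\mu_i$,
$$-\frac{n}{\sigma^2} + \frac{1}{2\sigma^4}\left[\sum_{i=1}^n\left(x_i - \frac{x_i+y_i}{2}\right)^2 + \sum_{i=1}^n\left(y_i - \frac{x_i+y_i}{2}\right)^2\right] = 0$$
$$-\frac{n}{\sigma^2} + \frac{1}{2\sigma^4}\left[\frac14\sum_{i=1}^n(x_i-y_i)^2 + \frac14\sum_{i=1}^n(x_i-y_i)^2\right] = 0$$
$$\Rightarrow \hat\sigma^2(x,y) = \frac{1}{4n}\sum_{i=1}^n(x_i-y_i)^2$$
If $V_i = X_i - Y_i$, then $V_i \sim N(0, 2\sigma^2)$, $i = 1, 2, \dots, n$, since $X_i \sim N(\mu_i, \sigma^2)$ and $Y_i \sim N(\mu_i, \sigma^2)$ are independent. Then $V_1^2, V_2^2, \dots, V_n^2$ are iid random variables and each $V_i^2/(2\sigma^2)$ is a $\chi^2$ variate with one degree of freedom, so $E_{\sigma^2}[V_i^2] = 2\sigma^2$. By Kolmogorov's Strong Law of Large Numbers
$$\frac1n\sum_{i=1}^n V_i^2 \xrightarrow{as} E_{\sigma^2}[V_i^2] = 2\sigma^2\ \text{as}\ n\to\infty$$
$$\text{i.e.,}\quad\frac{1}{2n}\sum_{i=1}^n V_i^2 \xrightarrow{as} \sigma^2\ \text{as}\ n\to\infty$$
$$\text{i.e.,}\quad\frac{1}{4n}\sum_{i=1}^n V_i^2 \xrightarrow{as} \frac{\sigma^2}{2} \ne \sigma^2\ \text{as}\ n\to\infty$$
Thus $\hat\sigma^2(X,Y) = \frac{1}{4n}\sum_{i=1}^n(X_i-Y_i)^2$ is not a consistent estimator of $\sigma^2$.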

5.3 Numerical Methods of Maximum Likelihood Estimation

The likelihood equations are often difficult to solve explicitly for $\theta$, even in cases where all the regularity conditions hold and a unique solution exists. Even in exponential families the equations are very often non-linear and difficult to solve. It may be difficult to locate the global maximum of the likelihood function in the following cases:
(i) the family of distributions under consideration is not of the exponential type;
(ii) there exist multiple roots of the likelihood equations.
Successive iterations are then used to solve the likelihood equations, assuming $\frac{\partial\log L(\theta)}{\partial\theta}$ is continuous in $\theta$ for each $x_i$, $i = 1, 2, \dots, n$, where $n$ is the sample size.
For example, a random variable has a Cauchy distribution depending on a location parameter $\theta$, i.e.,
$$p_\theta(x) = \begin{cases}\frac1\pi\,\frac{1}{1+(x-\theta)^2} & -\infty < x < \infty\\ 0 & \text{otherwise}\end{cases}$$
Taking a sample of size $n$ from the population, the log likelihood function for $\theta$ is
$$\log L(\theta) = -n\log\pi - \sum_{i=1}^n\log[1+(x_i-\theta)^2]$$
$$\frac{\partial\log L(\theta)}{\partial\theta} = \sum_{i=1}^n\frac{2(x_i-\theta)}{1+(x_i-\theta)^2}$$
The likelihood equation
$$\sum_{i=1}^n\frac{2(x_i-\theta)}{1+(x_i-\theta)^2} = 0$$
has no explicit solution. The log likelihood function of $\theta$ may have several local maxima for a given sample $X_1, X_2, \dots, X_n$: each term $-\log[1+(x_i-\theta)^2]$ has its maximum at $\theta = x_i$, so the sum $-\sum_{i=1}^n\log[1+(x_i-\theta)^2]$ may have up to $n$ different local maxima, depending on the sample values. The Newton - Raphson method is used to locate the local maxima.
(i) Newton - Raphson Method
Expanding the likelihood equation $\frac{\partial\log L(\theta)}{\partial\theta}$ around an initial (trial) solution $\theta_0$ gives
$$\frac{\partial\log L(\hat\theta(x))}{\partial\theta} = \frac{\partial\log L(\theta_0)}{\partial\theta} + \left(\hat\theta(x) - \theta_0\right)\frac{\partial^2\log L\left[\theta_0 + \nu\left(\hat\theta(x) - \theta_0\right)\right]}{\partial\theta^2}\quad\text{for some } 0 < \nu < 1, \tag{5.1}$$
where $\hat\theta(x)$ is the root of the likelihood equation. Since $\hat\theta(x)$ is the solution of $\frac{\partial\log L(\hat\theta(x))}{\partial\theta} = 0$, evaluating the second derivative at $\theta_0$ (i.e., taking $\nu = 0$) gives
$$\frac{\partial\log L(\theta_0)}{\partial\theta} + \left(\hat\theta(x) - \theta_0\right)\frac{\partial^2\log L(\theta_0)}{\partial\theta^2} = 0$$
$$\Rightarrow \hat\theta(x) \approx \theta_0 - \frac{\partial\log L(\theta_0)/\partial\theta}{\partial^2\log L(\theta_0)/\partial\theta^2} = \theta_1\ (\text{say}) \tag{5.2}$$
The value $\theta_1$ can be substituted for $\theta_0$ in equation (5.2) to obtain another value $\theta_2$, so that
$$\theta_2 = \theta_1 - \frac{\partial\log L(\theta_1)/\partial\theta}{\partial^2\log L(\theta_1)/\partial\theta^2} \tag{5.3}$$
and so on. Starting from an initial solution $\theta_0$, one can generate a sequence $\{\theta_k, k = 0, 1, \dots\}$ which is determined successively by the formula
$$\theta_{k+1} = \theta_k - \frac{\partial\log L(\theta_k)/\partial\theta}{\partial^2\log L(\theta_k)/\partial\theta^2},\quad k = 0, 1, 2, \dots \tag{5.4}$$
If the initial solution $\theta_0$ is chosen close to the root $\hat\theta(x)$ of the likelihood equation and if $\frac{\partial^2\log L(\theta_k)}{\partial\theta^2}$, $k = 0, 1, \dots$, is bounded away from zero, there is a good chance that the sequence generated by equation (5.4) will converge to the root $\hat\theta(x)$. The sequence $\{\theta_k, k = 0, 1, \dots\}$ generated by equation (5.4) depends on the sample values $X_1, X_2, \dots, X_n$. If the chosen initial solution $\theta_0$ is a consistent estimator of $\theta$, then the sequence obtained by equation (5.4) converges faster to the root $\hat\theta(x)$ and provides a best asymptotically normal estimator of $\theta$.
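The Newton - Raphson iteration (5.4) for the Cauchy likelihood can be sketched in C as follows. The names are hypothetical; the second derivative used is $\frac{\partial^2\log L}{\partial\theta^2} = \sum 2[(x_i-\theta)^2 - 1]/[1+(x_i-\theta)^2]^2$, and the sketch gives up (returns `NAN`) if that derivative is not negative, since the step would then not head toward a local maximum.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Score and second derivative of the Cauchy log likelihood at theta. */
static void cauchy_derivs(const double *x, size_t n, double theta,
                          double *score, double *hess)
{
    double s = 0.0, h = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - theta, q = 1.0 + d * d;
        s += 2.0 * d / q;
        h += 2.0 * (d * d - 1.0) / (q * q);
    }
    *score = s;
    *hess = h;
}

/* Newton-Raphson iteration (5.4) from theta0. Returns NAN if the second
   derivative is not negative or no convergence within max_iter steps. */
double cauchy_mle_newton(const double *x, size_t n, double theta0,
                         double tol, int max_iter)
{
    assert(n > 0 && tol > 0.0);
    double theta = theta0;
    for (int k = 0; k < max_iter; k++) {
        double s, h;
        cauchy_derivs(x, n, theta, &s, &h);
        if (!(h < 0.0))
            return NAN;
        double next = theta - s / h;
        if (fabs(next - theta) < tol)
            return next;
        theta = next;
    }
    return NAN;
}
```

Started at the sample median of the data in Example 5.12, this sketch should reach a root close to the value reported there.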
In small sample situations the sequence $\{\theta_k, k = 0, 1, \dots\}$ generated by equation (5.4) may show irregularities due to the particular sample values obtained in the experiment. In order to avoid irregularities in the approximating sequence, two methods are proposed: the fixed derivative method and the method of scoring.
(ii) The Method of Fixed Derivative
In the fixed derivative method, the term $\frac{\partial^2\log L(\theta_k)}{\partial\theta^2}$ in equation (5.4) is replaced by $-\frac{n}{a_k}$, where $\{a_k, k = 0, 1, \dots\}$ is a suitably chosen sequence of constants and $n$ is the sample size. The sequence $\{\theta_k, k = 0, 1, \dots\}$ is now generated by
$$\theta_{k+1} = \theta_k + \frac{a_k}{n}\,\frac{\partial\log L(\theta_k)}{\partial\theta},\quad k = 0, 1, 2, \dots \tag{5.5}$$
With a suitable choice of the sequence $\{a_k\}_{k=0}^\infty$, the sequence $\{\theta_k\}$ converges to the root $\hat\theta(x)$ in a more regular fashion than that generated by equation (5.4).
The fixed derivative method fails to converge in many cases. The method of scoring may then be used to locate the local maximum, since the log likelihood curve is steep in the neighbourhood of a local maximum.
(iii) The Method of Scoring
The method of scoring is a special case of the fixed derivative method, in which the special sequence $\{a_k, k = 0, 1, \dots\}$ is chosen by Fisher: $a_k = \frac{n}{I(\theta_k)}$, where $I(\theta_k)$ is the amount of Fisher Information in the $n$ observations $x$ of $X$ and $\theta_k$ is the value of the approximation after the $k$th iteration. Thus Fisher's scoring method generates the sequence
$$\theta_{k+1} = \theta_k + \frac{1}{I(\theta_k)}\,\frac{\partial\log L(\theta_k)}{\partial\theta},\quad k = 0, 1, 2, \dots$$
The iteration continues and stops when the sequence $\{\theta_k, k = 0, 1, \dots\}$ converges on a local maximum.
Example 5.12 The following data represent a sample from a Cauchy population. Obtain the maximum likelihood estimate of the parameter involved in the distribution by the method of successive approximation.

7.3344   3.4004   3.944    4.434   6.304
4.444    7.784    10.844   8.604   6.334
5.998    4.406    6.394    5.006   9.582

The pdf of the Cauchy distribution is
$$p_\theta(x) = \begin{cases}\frac1\pi\,\frac{1}{1+(x-\theta)^2} & -\infty < x < \infty\\ 0 & \text{otherwise}\end{cases}$$
Arrange the sample values in increasing order of magnitude. Take the first trial value of $\theta$ to be $t_1$ = the value of the sample median. The first approximation value is
$$t_2 = t_1 + \frac4n\sum_{i=1}^n\frac{x_i - t_1}{1+(x_i-t_1)^2},$$
which is Fisher's scoring step, since the Fisher Information in $n$ Cauchy observations is $I(\theta) = n/2$. The successive iteration values are $t_3, t_4, \dots$. This procedure is continued until two successive iteration values are equal to the required accuracy. The convergent value is the value of the MLE of $\theta$.
C programme for MLE of θ of Cauchy distribution
#include <stdio.h>
#include <math.h>

#define MAXN 100   /* maximum number of observations */
#define MAXIT 100  /* maximum number of scoring iterations */

int main(void)
{
    int i, j, n, last;
    float a[MAXN + 1], t[MAXIT + 2], sum, temp;

    printf("Enter the number of observations n:\n");
    if (scanf("%d", &n) != 1 || n < 1 || n > MAXN)
        return 1;
    printf("Enter the observations a:\n");
    for (i = 1; i <= n; i++)
        if (scanf("%f", &a[i]) != 1)
            return 1;

    /* Arrange the observations in increasing order of magnitude. */
    for (i = 1; i <= n - 1; i++)
        for (j = i + 1; j <= n; j++)
            if (a[i] > a[j]) {
                temp = a[i];
                a[i] = a[j];
                a[j] = temp;
            }

    /* First trial value t[1]: the sample median. */
    if (n % 2 == 0)
        t[1] = (a[n / 2] + a[n / 2 + 1]) / 2;
    else
        t[1] = a[(n + 1) / 2];

    printf("\n OUTPUT \n\n");
    printf("Value of the MLE of the Cauchy Distribution\n");
    printf("\n--------------\n");
    for (i = 1; i <= n; i++)
        printf("\t%f\n", a[i]);
    printf("\nResult:\n\n");
    printf("Median = t[1] = %f\n\n", t[1]);

    /* Scoring step t[j+1] = t[j] + (4/n) sum (a[i]-t[j]) / (1 + (a[i]-t[j])^2);
       stop when two successive values differ by less than 0.001. */
    last = 1;
    for (j = 1; j <= MAXIT; j++) {
        sum = 0;
        for (i = 1; i <= n; i++)
            sum += (a[i] - t[j]) / (1 + (a[i] - t[j]) * (a[i] - t[j]));
        printf("Sum[%d] = %f\t\n", j, sum);
        t[j + 1] = t[j] + (4 / (float)n) * sum;
        printf("t[%d] = %f\n", j + 1, t[j + 1]);
        last = j + 1;
        if (fabs(t[j] - t[j + 1]) < .001)
            break;
    }
    printf("\nValue of the MLE of theta = %f\n", t[last]);
    return 0;
}
The value of the MLE of θ is 6.013498.
Example 5.13 Obtain the values of the MLEs of the parameters $b$ and $c$ of the pdf
$$p_{b,c}(x) = \begin{cases}\frac cb x^{c-1}e^{-\frac{x^c}{b}} & x, b, c > 0\\ 0 & \text{otherwise}\end{cases}$$
based on a sample of size $n$.
The likelihood function for $b$ and $c$ of the sample size $n$ is
$$L(c, b) = \left(\frac cb\right)^n\prod_{i=1}^n x_i^{c-1}e^{-\frac1b\sum_{i=1}^n x_i^c}$$
$$\log L(c, b) = n\log c - n\log b + (c-1)\sum_{i=1}^n\log x_i - \frac1b\sum_{i=1}^n x_i^c$$
$$\frac{\partial\log L(c,b)}{\partial c} = \frac nc + \sum_{i=1}^n\log x_i - \frac1b\sum_{i=1}^n x_i^c\log x_i$$
$$\frac{\partial\log L(c,b)}{\partial b} = -\frac nb + \frac{1}{b^2}\sum_{i=1}^n x_i^c$$
Case (i) When $c$ is known, the maximum of $L(b)$ is obtained from
$$\frac{\partial\log L(b)}{\partial b} = -\frac nb + \frac{1}{b^2}\sum_{i=1}^n x_i^c = 0 \Rightarrow nb = \sum_{i=1}^n x_i^c.$$
The value of the MLE of $b$ is $\hat b(x) = \frac{\sum_{i=1}^n x_i^c}{n}$.
Case (ii) When $b$ is known, the maximum of $L(c)$ is obtained from
$$\frac{\partial\log L(c)}{\partial c} = 0 \Rightarrow \frac nc + \sum_{i=1}^n\log x_i - \frac1b\sum_{i=1}^n x_i^c\log x_i = 0,$$
$$\text{i.e.,}\quad nb + cb\sum_{i=1}^n\log x_i - c\sum_{i=1}^n x_i^c\log x_i = 0.$$
The estimate of $c$ is obtained by an iterative method.
Case (iii) When both $c$ and $b$ are unknown, the maximum of $L(c, b)$ is obtained by solving the equations $\frac{\partial\log L(c,b)}{\partial c} = 0$ and $\frac{\partial\log L(c,b)}{\partial b} = 0$:
$$nb - \sum_{i=1}^n x_i^c = 0$$
$$nb + cb\sum_{i=1}^n\log x_i - c\sum_{i=1}^n x_i^c\log x_i = 0$$
$$\text{i.e.,}\quad\sum_{i=1}^n x_i^c + c\,\frac{\sum_{i=1}^n x_i^c}{n}\sum_{i=1}^n\log x_i - c\sum_{i=1}^n x_i^c\log x_i = 0$$
The estimates of $c$ and $b$ are obtained by solving the last equation for $c$ by an iterative method and then setting $\hat b = \frac1n\sum_{i=1}^n x_i^{\hat c}$.

5.4 Optimum Property of MLE


Lemma 5.1 Denote X ∼ Pθ , θ ∈ Ω and it has pdf pθ (x)
(i) The probability distributions Pθ are distinct for distinct values of θ .
(ii) The range of the density functions p(x | θ) are independent of the parameter θ .
(iii) The random observations X1 , X2 , · · · , Xn on X are independent and identi-
cally distributed.

162
Probability Models and their Parametric Estimation

(iv) Ω contains an open interval and Ω containing θ0 , the true value of θ as an


interior point in Ω .
Then Pθ0 {L(θ0 ) > L(θ1 )} → 1 as n → ∞ for θ0 and θ1 ∈ Ω.
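Proof: Let
$$L(\theta_1) = \prod_{i=1}^n p_{\theta_1}(x_i) \quad\text{and}\quad L(\theta_0) = \prod_{i=1}^n p_{\theta_0}(x_i).$$
Define $S_n = \{x : L(\theta_0) > L(\theta_1)\}$. It is to be proved that $P_{\theta_0}\{S_n\} \to 1$ as $n \to \infty$. Now
$$\frac{L(\theta_0)}{L(\theta_1)} > 1 \iff \log\frac{L(\theta_0)}{L(\theta_1)} > 0 \iff \sum_{i=1}^n \log\frac{p_{\theta_0}(x_i)}{p_{\theta_1}(x_i)} > 0 \iff \frac{1}{n}\sum_{i=1}^n \log\frac{p_{\theta_1}(x_i)}{p_{\theta_0}(x_i)} < 0,$$
so that
$$P_{\theta_0}\{S_n\} = P_{\theta_0}\left\{\frac{1}{n}\sum_{i=1}^n \log\frac{p_{\theta_1}(X_i)}{p_{\theta_0}(X_i)} < 0\right\}.$$
Since $X_1, X_2, \cdots, X_n$ are iid, the ratios $\frac{p_{\theta_1}(X_i)}{p_{\theta_0}(X_i)}$ are iid, and by Khintchin's Law of Large Numbers
$$\frac{1}{n}\sum_{i=1}^n \log\frac{p_{\theta_1}(X_i)}{p_{\theta_0}(X_i)} \xrightarrow{P} E_{\theta_0}\left[\log\frac{p_{\theta_1}(X)}{p_{\theta_0}(X)}\right] \quad\text{as } n \to \infty.$$
The function $y = \log x$ is strictly concave for $x > 0$, since $\frac{dy}{dx} = \frac{1}{x} > 0$ and $\frac{d^2y}{dx^2} = -\frac{1}{x^2} < 0$ (equivalently, $-\log x$ is strictly convex). By Jensen's inequality for a concave function $f$, $E[f(X)] \le f(E[X])$, with strict inequality for a strictly concave $f$ unless $X$ is degenerate. Since $P_{\theta_1} \neq P_{\theta_0}$ by (i), the ratio $\frac{p_{\theta_1}(X)}{p_{\theta_0}(X)}$ is not degenerate, and therefore
$$E_{\theta_0}\left[\log\frac{p_{\theta_1}(X)}{p_{\theta_0}(X)}\right] < \log E_{\theta_0}\left[\frac{p_{\theta_1}(X)}{p_{\theta_0}(X)}\right] = \log 1 = 0,$$
because, by (ii),
$$E_{\theta_0}\left[\frac{p_{\theta_1}(X)}{p_{\theta_0}(X)}\right] = \int \frac{p_{\theta_1}(x)}{p_{\theta_0}(x)}\,p_{\theta_0}(x)\,dx = \int p_{\theta_1}(x)\,dx = 1.$$
Thus the average $\frac{1}{n}\sum_{i=1}^n \log\frac{p_{\theta_1}(X_i)}{p_{\theta_0}(X_i)}$ converges in probability to a strictly negative constant, and hence
$$P_{\theta_0}\{S_n\} = P_{\theta_0}\{L(\theta_0) > L(\theta_1)\} \to 1 \quad\text{as } n \to \infty.$$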
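A small simulation can make Lemma 5.1 concrete. The sketch below assumes, purely for illustration, that $X \sim N(\theta, 1)$ with $\theta_0 = 0$ and $\theta_1 = 0.3$ (values not taken from the text), and estimates $P_{\theta_0}\{L(\theta_0) > L(\theta_1)\}$ by the fraction of simulated samples in which the log-likelihood at $\theta_0$ exceeds that at $\theta_1$. The function names are hypothetical.

```python
import math
import random


def log_lik(theta, xs):
    """Log-likelihood of N(theta, 1) for the sample xs."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - theta) ** 2 for x in xs)


def freq_true_beats_alt(n, theta0=0.0, theta1=0.3, reps=1000, seed=1):
    """Fraction of samples of size n (drawn under theta0) with L(theta0) > L(theta1)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(reps):
        xs = [rng.gauss(theta0, 1.0) for _ in range(n)]
        if log_lik(theta0, xs) > log_lik(theta1, xs):
            wins += 1
    return wins / reps


freqs = {n: freq_true_beats_alt(n) for n in (5, 50, 500)}
print(freqs)
```

For this particular model the event reduces to $\bar x < (\theta_0 + \theta_1)/2$, so its exact probability is $\Phi(0.15\sqrt{n})$, about 0.63, 0.86 and 0.9996 for the three sample sizes; the simulated fractions should be close to these.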

MLE is consistent
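Theorem 5.1 (Dugué, 1937) If $\log L(\theta)$ is differentiable in an interval including the true value of $\theta$, say $\theta_0$, then under the assumptions of Lemma 5.1 the likelihood equation $\frac{\partial \log L(\theta)}{\partial \theta} = 0$ has, with probability tending to 1 as $n \to \infty$, a root which is consistent for $\theta_0$.
Proof: Let $\theta_0$ be the true value of $\theta$ and consider the interval $(\theta_0 - \delta, \theta_0 + \delta)$, $\delta > 0$. By Lemma 5.1,
$$P_{\theta_0}\left\{\frac{L(\theta_0)}{L(\theta_1)} > 1\right\} \to 1 \quad\text{as } n \to \infty, \text{ for } \theta_1 = \theta_0 - \delta \text{ and for } \theta_1 = \theta_0 + \delta,$$
so the events $L(\theta_0) > L(\theta_0 - \delta)$ and $L(\theta_0) > L(\theta_0 + \delta)$ occur together with probability tending to 1. On this event, since $L(\theta)$ is continuous on $[\theta_0 - \delta, \theta_0 + \delta]$ and larger at the interior point $\theta_0$ than at both end points, it has a relative maximum inside $(\theta_0 - \delta, \theta_0 + \delta)$; since $L(\theta)$ is differentiable there,
$$\frac{\partial \log L(\theta)}{\partial \theta} = 0 \text{ at some point } \hat\theta(x) \in (\theta_0 - \delta, \theta_0 + \delta).$$
Hence $\hat\theta(x)$ is a root of the likelihood equation, and
$$P_{\theta_0}\{\theta_0 - \delta < \hat\theta(X) < \theta_0 + \delta\} = P_{\theta_0}\{|\hat\theta(X) - \theta_0| < \delta\} \to 1 \quad\text{as } n \to \infty.$$
Since $\delta > 0$ is arbitrary, $\hat\theta(X) \xrightarrow{P} \theta_0$ as $n \to \infty$, i.e., $\hat\theta(X)$ is a consistent estimator of $\theta$.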
MLE maximizes the Likelihood
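Theorem 5.2 (Huzurbazar, 1948) If $\log L(\theta)$ is twice differentiable in an interval including the true value $\theta_0$ of the parameter, then the consistent solution of the likelihood equation (which exists with probability tending to one by Theorem 5.1) maximizes the likelihood with probability tending to one, i.e.,
$$P_{\theta_0}\left\{\left.\frac{\partial^2 \log L(\theta)}{\partial \theta^2}\right|_{\theta = \hat\theta(X)} < 0\right\} \to 1 \quad\text{as } n \to \infty.$$
Proof: Expanding $\frac{\partial^2 \log L(\theta)}{\partial \theta^2}$ in a Taylor series about $\theta_0$ and evaluating it at $\hat\theta(x)$,
$$\frac{\partial^2 \log L(\hat\theta(x))}{\partial \theta^2} = \frac{\partial^2 \log L(\theta_0)}{\partial \theta^2} + [\hat\theta(x) - \theta_0]\,\frac{\partial^3 \log L(\theta^*)}{\partial \theta^3}, \quad \theta^* = \theta_0 + \nu(\hat\theta(x) - \theta_0),\ 0 < \nu < 1.$$
Further, assume $\left|\frac{\partial^3 \log p_\theta(x)}{\partial \theta^3}\right| \le H(x)$ for all $\theta \in \Omega$, where $E_{\theta_0}[H(X)] < \infty$ does not depend on $\theta$. Then
$$\frac{1}{n}\left|\frac{\partial^2 \log L(\hat\theta(x))}{\partial \theta^2} - \frac{\partial^2 \log L(\theta_0)}{\partial \theta^2}\right| \le |\hat\theta(x) - \theta_0|\,\frac{1}{n}\sum_{i=1}^n H(x_i).$$
Since $\hat\theta(X) \xrightarrow{P} \theta_0$ and $\frac{1}{n}\sum_{i=1}^n H(X_i) \xrightarrow{P} E_{\theta_0}[H(X)] < \infty$, the right-hand side tends to 0 in probability, so for every $\epsilon > 0$
$$P_{\theta_0}\left\{\frac{1}{n}\left|\frac{\partial^2 \log L(\hat\theta(X))}{\partial \theta^2} - \frac{\partial^2 \log L(\theta_0)}{\partial \theta^2}\right| < \epsilon\right\} \to 1 \quad\text{as } n \to \infty.$$
Since $X_1, X_2, \cdots, X_n$ are iid, by Khintchin's Law of Large Numbers
$$\frac{1}{n}\frac{\partial^2 \log L(\theta_0)}{\partial \theta^2} = \frac{1}{n}\sum_{i=1}^n \frac{\partial^2 \log p_{\theta_0}(X_i)}{\partial \theta^2} \xrightarrow{P} E_{\theta_0}\left[\frac{\partial^2 \log p_\theta(X)}{\partial \theta^2}\right]_{\theta = \theta_0} = -I(\theta_0) < 0,$$
since the information $I(\theta_0) > 0$. Hence, with probability tending to one, $\frac{1}{n}\frac{\partial^2 \log L(\theta_0)}{\partial \theta^2} < -\frac{I(\theta_0)}{2}$; taking $\epsilon = \frac{I(\theta_0)}{2}$ in the previous display,
$$P_{\theta_0}\left\{\frac{1}{n}\left.\frac{\partial^2 \log L(\theta)}{\partial \theta^2}\right|_{\theta = \hat\theta(X)} < 0\right\} \to 1 \quad\text{as } n \to \infty.$$
Since $L(\theta) = \prod_{i=1}^n p_\theta(x_i)$ and $n > 0$,
$$P_{\theta_0}\left\{\left.\frac{\partial^2 \log L(\theta)}{\partial \theta^2}\right|_{\theta = \hat\theta(X)} < 0\right\} \to 1 \quad\text{as } n \to \infty.$$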

MLE is asymptotically Normal


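Let $X_1, X_2, \cdots, X_n$ be random observations on $X$ with pdf $p_\theta(x)$, $\theta \in \Omega$.
Assumptions:
(i) $\frac{\partial \log L(\theta)}{\partial \theta}$, $\frac{\partial^2 \log L(\theta)}{\partial \theta^2}$ and $\frac{\partial^3 \log L(\theta)}{\partial \theta^3}$ exist for all $x$ and over an interval containing the true value of $\theta$, say $\theta_0$.
(ii) At $\theta = \theta_0$, $E_{\theta_0}\left[\frac{\partial \log L(\theta)}{\partial \theta}\right] = 0$ and $E_{\theta_0}\left[\frac{\partial^2 \log L(\theta)}{\partial \theta^2}\right] = -nI(\theta_0) < 0$, where $I(\theta_0)$ is the amount of information in a single observation $x$ of $X$.
(iii) $\left|\frac{\partial^3 \log p_\theta(x)}{\partial \theta^3}\right| \le H(x)$ for all $\theta$ in that interval, where $E_{\theta_0}[H(X)] < \infty$ does not depend on $\theta$.
Theorem 5.3 (Cramér, 1946) Let $\hat\theta(X)$ be the consistent MLE of $\theta$. Then under the regularity conditions (i) to (iii), $\sqrt{nI(\theta_0)}\,(\hat\theta(X) - \theta_0)$ has an asymptotic normal distribution with mean zero and variance one.
Proof: Let $\hat\theta(X)$ be the solution of $\frac{\partial \log L(\theta)}{\partial \theta} = 0$ in an interval containing the true value $\theta_0$ of $\theta$. For any fixed $x$, expanding $\frac{\partial \log L(\theta)}{\partial \theta}$ in a Taylor series about $\theta_0$ and evaluating it at $\hat\theta(x)$,
$$\frac{\partial \log L(\hat\theta(x))}{\partial \theta} = \frac{\partial \log L(\theta_0)}{\partial \theta} + [\hat\theta(x) - \theta_0]\frac{\partial^2 \log L(\theta_0)}{\partial \theta^2} + \frac{[\hat\theta(x) - \theta_0]^2}{2!}\frac{\partial^3 \log L(\theta^*)}{\partial \theta^3},$$
where $\theta^* = \theta_0 + \nu(\hat\theta(x) - \theta_0)$, $0 < \nu < 1$. But $\frac{\partial \log L(\hat\theta(x))}{\partial \theta} = 0$, so
$$[\hat\theta(x) - \theta_0]\left[\frac{\partial^2 \log L(\theta_0)}{\partial \theta^2} + \frac{\hat\theta(x) - \theta_0}{2}\,\frac{\partial^3 \log L(\theta^*)}{\partial \theta^3}\right] = -\frac{\partial \log L(\theta_0)}{\partial \theta}.$$
Dividing by $n$,
$$\hat\theta(x) - \theta_0 = \frac{\frac{1}{n}\frac{\partial \log L(\theta_0)}{\partial \theta}}{-\frac{1}{n}\frac{\partial^2 \log L(\theta_0)}{\partial \theta^2} - \frac{\hat\theta(x) - \theta_0}{2}\,\frac{1}{n}\frac{\partial^3 \log L(\theta^*)}{\partial \theta^3}}.$$
Multiplying both sides by $\sqrt{nI(\theta_0)}$ and using $\frac{\sqrt{nI(\theta_0)}}{n} = \frac{I(\theta_0)}{\sqrt{nI(\theta_0)}}$,
$$\sqrt{nI(\theta_0)}\,[\hat\theta(x) - \theta_0] = \frac{\frac{1}{\sqrt{nI(\theta_0)}}\frac{\partial \log L(\theta_0)}{\partial \theta}}{\frac{1}{I(\theta_0)}\left[-\frac{1}{n}\frac{\partial^2 \log L(\theta_0)}{\partial \theta^2} - \frac{\hat\theta(x) - \theta_0}{2}\,\frac{1}{n}\frac{\partial^3 \log L(\theta^*)}{\partial \theta^3}\right]}.$$
By Khintchin's Law of Large Numbers
$$-\frac{1}{n}\frac{\partial^2 \log L(\theta_0)}{\partial \theta^2} = -\frac{1}{n}\sum_{i=1}^n \frac{\partial^2 \log p_{\theta_0}(X_i)}{\partial \theta^2} \xrightarrow{P} -E_{\theta_0}\left[\frac{\partial^2 \log p_\theta(X)}{\partial \theta^2}\right]_{\theta = \theta_0} = I(\theta_0) \quad\text{as } n \to \infty.$$
Also $\hat\theta(X) \xrightarrow{P} \theta_0$, so $\hat\theta(X) - \theta_0 \xrightarrow{P} 0$, while by (iii) $\left|\frac{1}{n}\frac{\partial^3 \log L(\theta^*)}{\partial \theta^3}\right| \le \frac{1}{n}\sum_{i=1}^n H(X_i) \xrightarrow{P} E_{\theta_0}[H(X)] = k < \infty$. Hence the denominator tends in probability to $\frac{1}{I(\theta_0)}[I(\theta_0) - 0] = 1$, and $\sqrt{nI(\theta_0)}\,[\hat\theta(X) - \theta_0]$ has the same limiting distribution as
$$\frac{1}{\sqrt{nI(\theta_0)}}\frac{\partial \log L(\theta_0)}{\partial \theta}.$$
Denote $Z_i = \left.\frac{\partial \log p_\theta(X_i)}{\partial \theta}\right|_{\theta = \theta_0}$, $i = 1, 2, \cdots, n$. The $Z_i$ are iid with $E_{\theta_0}[Z_i] = 0$ and $V_{\theta_0}[Z_i] = I(\theta_0)$. Let $S_n = Z_1 + \cdots + Z_n = \frac{\partial \log L(\theta_0)}{\partial \theta}$; then $E[S_n] = 0$ and $V[S_n] = I(\theta_0) + \cdots + I(\theta_0) = nI(\theta_0)$. By the Lindeberg–Lévy Central Limit Theorem
$$\frac{S_n - E[S_n]}{\sqrt{V[S_n]}} = \frac{1}{\sqrt{nI(\theta_0)}}\frac{\partial \log L(\theta_0)}{\partial \theta} \xrightarrow{d} N(0, 1) \quad\text{as } n \to \infty.$$
Therefore, by Slutsky's theorem, $\sqrt{nI(\theta_0)}\,[\hat\theta(X) - \theta_0] \xrightarrow{d} N(0, 1)$ as $n \to \infty$.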
Remark 5.2 If a consistent root $\hat\theta(X)$ of the likelihood equation satisfies $\sqrt{n}\,(\hat\theta(X) - \theta_0) \xrightarrow{d} N\!\left(0, \frac{1}{I(\theta_0)}\right)$, then $\hat\theta(X)$ is called an efficient likelihood estimator of $\theta$, or an asymptotically normal and efficient estimator of $\theta$.
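As an illustration of Theorem 5.3, take (as an assumed example, not one from the text) the exponential density $p_\theta(x) = \theta e^{-\theta x}$, $x > 0$, for which the MLE is $\hat\theta = 1/\bar x$ and $I(\theta) = 1/\theta^2$. The sketch simulates the standardized statistic $\sqrt{nI(\theta_0)}(\hat\theta - \theta_0) = \sqrt{n}(\hat\theta - \theta_0)/\theta_0$; its sample mean, variance and the proportion of values in $(-1.96, 1.96)$ should be close to 0, 1 and 0.95. The values $\theta_0 = 2$, $n = 200$ are arbitrary choices.

```python
import math
import random


def standardized_mle_draws(theta0=2.0, n=200, reps=2000, seed=7):
    """Simulate sqrt(n * I(theta0)) * (theta_hat - theta0) for the exponential rate.

    For p(x) = theta * exp(-theta * x): theta_hat = 1 / xbar and I(theta) = 1 / theta**2.
    """
    rng = random.Random(seed)
    scale = math.sqrt(n) / theta0          # sqrt(n * I(theta0))
    draws = []
    for _ in range(reps):
        xbar = sum(rng.expovariate(theta0) for _ in range(n)) / n
        theta_hat = 1.0 / xbar             # MLE of the rate
        draws.append(scale * (theta_hat - theta0))
    return draws


z = standardized_mle_draws()
mean = sum(z) / len(z)
var = sum((v - mean) ** 2 for v in z) / (len(z) - 1)
coverage = sum(abs(v) < 1.96 for v in z) / len(z)
print(round(mean, 3), round(var, 3), round(coverage, 3))
```

A small positive mean is expected at finite $n$, since $1/\bar x$ is slightly biased upwards; the bias vanishes as $n$ grows.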
MLE is unique
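Theorem 5.4 (Wald, 1949) The consistent solution of the likelihood equation is unique with probability tending to 1 as $n \to \infty$.
Proof: Let $\hat\theta_1(x)$ and $\hat\theta_2(x)$ be two consistent solutions of $\frac{\partial \log L(\theta)}{\partial \theta} = 0$ with $\hat\theta_1(x) \neq \hat\theta_2(x)$. By Huzurbazar's Theorem
$$P_{\theta_0}\left\{\frac{\partial^2 \log L(\hat\theta_1(X))}{\partial \theta^2} < 0\right\} \to 1 \quad\text{and}\quad P_{\theta_0}\left\{\frac{\partial^2 \log L(\hat\theta_2(X))}{\partial \theta^2} < 0\right\} \to 1 \quad\text{as } n \to \infty.$$
Applying Rolle's Theorem to the function $\frac{\partial \log L(\theta)}{\partial \theta}$, which vanishes at both $\hat\theta_1(x)$ and $\hat\theta_2(x)$, gives
$$\frac{\partial^2 \log L(\hat\theta_3(x))}{\partial \theta^2} = 0$$
for some $\hat\theta_3(x) = \lambda\hat\theta_1(x) + (1 - \lambda)\hat\theta_2(x)$, $0 < \lambda < 1$, within the interval between $\hat\theta_1(x)$ and $\hat\theta_2(x)$. Since $\hat\theta_3(X)$ lies between two consistent estimators, it is itself a consistent estimator of $\theta_0$, and the argument in the proof of Huzurbazar's Theorem (which uses only the consistency of the point at which the second derivative is evaluated) gives
$$P_{\theta_0}\left\{\frac{\partial^2 \log L(\hat\theta_3(X))}{\partial \theta^2} < 0\right\} \to 1 \quad\text{as } n \to \infty.$$
This contradicts the conclusion of Rolle's Theorem that $\frac{\partial^2 \log L(\hat\theta_3(x))}{\partial \theta^2} = 0$. The only possibility is $\hat\theta_1(x) = \hat\theta_2(x)$, with probability tending to 1. Thus the consistent solution of the likelihood equation is unique.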
Invariance Property of MLE
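Let $X \sim P_\theta$, $\theta \in \Omega$, where $\Omega$ is a $k$-dimensional parameter space. Consider $g(\theta) : \Omega \to O$, where $O$ is an $r$-dimensional space ($r \le k$). If $\hat\theta$ is the MLE of $\theta$, then $g(\hat\theta)$ is the MLE of $g(\theta)$.
Let $g : \Omega \to O$, i.e., $g(\theta) = \omega \in O$ for each $\theta \in \Omega$. For a fixed $\omega \in O$, let
$$A_\omega = \{\theta \mid g(\theta) = \omega\},$$
the set of all $\theta$'s mapped to $\omega$. These sets are disjoint and $\bigcup_{\omega \in O} A_\omega = \Omega$; in other words, for any given $\theta \in \Omega$ there is exactly one $\omega \in O$ such that $\theta \in A_\omega$. The likelihood induced on $O$ is
$$M(\omega) = \sup_{\theta \in A_\omega} L(\theta).$$
Let $\hat\theta$ be the MLE of $\theta$, i.e., $L(\theta)$ is maximized over $\Omega$ at $\theta = \hat\theta$, and put $\hat\omega = g(\hat\theta)$, so that $\hat\theta \in A_{\hat\omega}$. Then for every $\omega \in O$
$$M(\hat\omega) = \sup_{\theta \in A_{\hat\omega}} L(\theta) \ge L(\hat\theta) = \sup_{\theta \in \Omega} L(\theta) \ge \sup_{\theta \in A_\omega} L(\theta) = M(\omega).$$
Thus $\hat\omega = g(\hat\theta)$ maximizes the induced likelihood, i.e., $g(\hat\theta)$ is the MLE of $g(\theta)$.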
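The invariance property can be checked numerically. Assume, for illustration only, a Poisson($\theta$) sample and $g(\theta) = e^{-\theta} = P_\theta\{X = 0\}$. The MLE of $\theta$ is $\bar x$, so invariance predicts that $e^{-\bar x}$ is the MLE of $g(\theta)$. The sketch maximizes the likelihood written directly in terms of $\omega = e^{-\theta}$ over a fine grid in $(0, 1)$ and compares the maximizer with $e^{-\bar x}$; the counts are made up.

```python
import math


def log_lik_omega(omega, xs):
    """Poisson log-likelihood (up to a constant) written in terms of omega = exp(-theta)."""
    theta = -math.log(omega)
    return sum(x * math.log(theta) - theta for x in xs)


xs = [2, 0, 3, 1, 1, 4, 2, 0]                     # hypothetical counts, xbar = 1.625
grid = [k / 100000 for k in range(1, 100000)]     # omega in (0, 1)
omega_grid = max(grid, key=lambda w: log_lik_omega(w, xs))
omega_invariance = math.exp(-sum(xs) / len(xs))   # g(theta_hat) = exp(-xbar)
print(omega_grid, omega_invariance)
```

The two numbers agree to the grid resolution, as the argument above predicts.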

Relation between One Parameter Exponential Family and MLE


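Let $X_1, X_2, \cdots, X_n$ be a random sample on $X$ from a one-parameter exponential family with density
$$p_\theta(x) = c(\theta)\,e^{Q(\theta)t(x)}h(x) = e^{\theta t(x) - A(\theta)}h(x),$$
where $c(\theta) = e^{-A(\theta)}$ and $Q(\theta) = \theta$.
The likelihood function for $\theta$ of the sample of size $n$ is
$$L(\theta) = e^{\theta\sum_{i=1}^n t(x_i) - nA(\theta)}\,h(x), \quad\text{where } h(x) = h_1(x_1, x_2, \cdots, x_n) = \prod_{i=1}^n h(x_i),$$
$$\log L(\theta) = \theta\sum_{i=1}^n t(x_i) - nA(\theta) + \log h(x),$$
$$\frac{\partial \log L(\theta)}{\partial \theta} = \sum_{i=1}^n t(x_i) - nA'(\theta).$$
For a maximum,
$$\frac{\partial \log L(\theta)}{\partial \theta} = 0 \;\Rightarrow\; A'(\theta) = \frac{1}{n}\sum_{i=1}^n t(x_i) \tag{5.6}$$
and
$$\frac{\partial^2 \log L(\theta)}{\partial \theta^2} = -nA''(\theta) < 0.$$
Consider
$$\int e^{\theta t(x) - A(\theta)}h(x)\,dx = 1.$$
Assume that the integral is continuous and has derivatives of all orders with respect to $\theta$, and that it can be differentiated under the integral sign. Differentiating with respect to $\theta$,
$$\int t(x)\,e^{\theta t(x) - A(\theta)}h(x)\,dx - A'(\theta)\int e^{\theta t(x) - A(\theta)}h(x)\,dx = 0,$$
i.e., $E_\theta[T] = A'(\theta)\int e^{\theta t(x) - A(\theta)}h(x)\,dx$, so that
$$A'(\theta) = E_\theta[T]. \tag{5.7}$$
Using equations (5.6) and (5.7), the MLE is the solution of $E_\theta[T] = \frac{1}{n}\sum_{i=1}^n t(x_i)$. Since $\int e^{\theta t(x) - A(\theta)}h(x)\,dx = 1$, the first derivative reads
$$\int t(x)\,e^{\theta t(x) - A(\theta)}h(x)\,dx - A'(\theta) = 0.$$
Differentiating again with respect to $\theta$,
$$\int t^2(x)\,e^{\theta t(x) - A(\theta)}h(x)\,dx - A''(\theta) - A'(\theta)E_\theta[T] = 0,$$
$$E_\theta[T^2] = A''(\theta) + A'(\theta)E_\theta[T],$$
$$E_\theta[T^2] - (E_\theta[T])^2 = A''(\theta) \quad\text{since } A'(\theta) = E_\theta[T],$$
i.e.,
$$\frac{\partial^2 A(\theta)}{\partial \theta^2} = V_\theta[T].$$
Hence $\frac{\partial^2 \log L(\theta)}{\partial \theta^2} = -nV_\theta[T] < 0$ for every $\theta$ (when $T$ is non-degenerate), so $\log L(\theta)$ is strictly concave and the root of (5.6) is unique. The information in a single observation is $I(\theta) = -E_\theta\left[\frac{\partial^2 \log p_\theta(X)}{\partial \theta^2}\right] = A''(\theta) = V_\theta[T]$, so by Theorem 5.3
$$\sqrt{n}\,(\hat\theta(X) - \theta) \xrightarrow{d} N\!\left(0, \frac{1}{V_\theta[T]}\right),$$
i.e., $\hat\theta(X)$ is consistent, unique and asymptotically normal.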
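Equation (5.6) is usually solved numerically. A minimal sketch, assuming the user supplies $A'$ and $A''$ for the family in hand: Newton's method applied to $A'(\theta) = \bar t$, i.e., $\theta \leftarrow \theta - \frac{A'(\theta) - \bar t}{A''(\theta)}$, where the step is well defined because $A''(\theta) = V_\theta[T] > 0$. The two illustrative families (Poisson and Bernoulli in their natural parametrizations), the function name and the data are assumptions added here; their closed-form answers $\log\bar x$ and $\log\frac{\hat p}{1 - \hat p}$ serve as checks.

```python
import math


def natural_param_mle(t_bar, a1, a2, theta=0.0, tol=1e-12, max_iter=100):
    """Solve A'(theta) = t_bar by Newton's method, using A''(theta) = Var[T] > 0."""
    for _ in range(max_iter):
        step = (a1(theta) - t_bar) / a2(theta)
        theta -= step
        if abs(step) < tol:
            break
    return theta


# Poisson: t(x) = x, A(theta) = exp(theta), so A' = A'' = exp(theta)
counts = [2, 0, 3, 1, 1, 4, 2, 0]
eta_pois = natural_param_mle(sum(counts) / len(counts), math.exp, math.exp)


# Bernoulli: t(x) = x, A(theta) = log(1 + exp(theta)); A' = sigmoid, A'' = s (1 - s)
def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


flips = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
eta_bern = natural_param_mle(sum(flips) / len(flips), sigmoid,
                             lambda u: sigmoid(u) * (1.0 - sigmoid(u)))
print(eta_pois, math.log(1.625))
print(eta_bern, math.log(0.3 / 0.7))
```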

Relationship between Sufficient Statistic and MLE


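If a sufficient statistic exists, then the MLE is a function of the sufficient statistic.
Let $X_1, X_2, \cdots, X_n$ be an iid random sample with pdf $p_\theta(x)$, and let $T = t(X)$ be a sufficient statistic. By the factorization theorem, the likelihood function for $\theta$ of the sample of size $n$ is
$$L(\theta) = \prod_{i=1}^n p_\theta(x_i) = p_\theta(t)\,h(x),$$
where $p_\theta(t)$ depends on $x$ only through $t$ and $h(x) = h_1(x_1, x_2, \cdots, x_n)$ does not involve $\theta$. Then
$$\log L(\theta) = \log p_\theta(t) + \log h(x),$$
$$\frac{\partial \log L(\theta)}{\partial \theta} = \frac{\partial \log p_\theta(t)}{\partial \theta} \quad\text{and}\quad \frac{\partial^2 \log L(\theta)}{\partial \theta^2} = \frac{\partial^2 \log p_\theta(t)}{\partial \theta^2}.$$
For the MLE, the conditions $\frac{\partial \log L(\theta)}{\partial \theta} = 0$ and $\frac{\partial^2 \log L(\theta)}{\partial \theta^2} < 0$ at $\theta = \hat\theta(x)$ are equivalent to $\frac{\partial \log p_\theta(t)}{\partial \theta} = 0$ and $\frac{\partial^2 \log p_\theta(t)}{\partial \theta^2} < 0$ at $\theta = \hat\theta(x)$, which involve $x$ only through $t$. Thus the MLE is a function of the sufficient statistic.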

5.5 Method of Minimum Variance Bound Estimation


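A statistic $T = t(X)$ is said to be an MVBE if it attains the Cramér–Rao lower bound.
Theorem 5.5 A necessary and sufficient condition for a statistic $T = t(X)$ to be an MVBE of $\tau(\theta)$ is that $\frac{\partial \log L(\theta)}{\partial \theta}$ and $t(x) - \tau(\theta)$ are proportional.
Proof: Assume $\frac{\partial \log L(\theta)}{\partial \theta}$ and $t(x) - \tau(\theta)$ are proportional, i.e.,
$$\frac{\partial \log L(\theta)}{\partial \theta} = A(\theta)[t(x) - \tau(\theta)], \tag{5.8}$$
where $A(\theta)$ is a function of $\theta$ only.
To prove that $T = t(X)$ is an MVBE of $\tau(\theta)$, it is enough to prove that
$$V_\theta[T] = \frac{[\tau'(\theta)]^2}{E_\theta\left[\left(\frac{\partial \log L(\theta)}{\partial \theta}\right)^2\right]} \quad \forall\, \theta \in \Omega.$$
For an unbiased estimator $T$ of $\tau(\theta)$, under the regularity conditions of the Cramér–Rao inequality,
$$\mathrm{Cov}_\theta\left(T, \frac{\partial \log L(\theta)}{\partial \theta}\right) = E_\theta\left[T\,\frac{\partial \log L(\theta)}{\partial \theta}\right] = \tau'(\theta) \quad \forall\, \theta \in \Omega.$$
Since $E_\theta\left[\frac{\partial \log L(\theta)}{\partial \theta}\right] = 0$ for all $\theta \in \Omega$,
$$E_\theta\left[(T - \tau(\theta))\frac{\partial \log L(\theta)}{\partial \theta}\right] = \tau'(\theta),$$
and substituting (5.8),
$$A(\theta)E_\theta[T - \tau(\theta)]^2 = \tau'(\theta), \quad\text{i.e.,}\quad A(\theta)V_\theta[T] = \tau'(\theta), \quad A(\theta) = \frac{\tau'(\theta)}{V_\theta[T]}.$$
Squaring both sides of (5.8),
$$\left(\frac{\partial \log L(\theta)}{\partial \theta}\right)^2 = A^2(\theta)[t(x) - \tau(\theta)]^2,$$
$$E_\theta\left[\left(\frac{\partial \log L(\theta)}{\partial \theta}\right)^2\right] = A^2(\theta)V_\theta[T] = \frac{[\tau'(\theta)]^2}{\{V_\theta[T]\}^2}\,V_\theta[T] = \frac{[\tau'(\theta)]^2}{V_\theta[T]},$$
i.e.,
$$V_\theta[T] = \frac{[\tau'(\theta)]^2}{E_\theta\left[\left(\frac{\partial \log L(\theta)}{\partial \theta}\right)^2\right]} \quad \forall\, \theta \in \Omega.$$
Thus $T = t(X)$ attains the Cramér–Rao lower bound, i.e., $T = t(X)$ is an MVBE of $\tau(\theta)$.
Conversely, assume $T = t(X)$ is an MVBE of $\tau(\theta)$. To prove $\frac{\partial \log L(\theta)}{\partial \theta} = A(\theta)[t(x) - \tau(\theta)]$, define $A(\theta)$ by $\tau'(\theta) = A(\theta)V_\theta[T]$. Since $T$ attains the bound,
$$E_\theta\left[\left(\frac{\partial \log L(\theta)}{\partial \theta}\right)^2\right] = \frac{[\tau'(\theta)]^2}{V_\theta[T]} = \frac{A^2(\theta)V_\theta^2[T]}{V_\theta[T]} = A^2(\theta)V_\theta[T].$$
Using also $E_\theta\left[(T - \tau(\theta))\frac{\partial \log L(\theta)}{\partial \theta}\right] = \tau'(\theta) = A(\theta)V_\theta[T]$,
$$E_\theta\left[\left\{\frac{\partial \log L(\theta)}{\partial \theta} - A(\theta)(T - \tau(\theta))\right\}^2\right] = A^2(\theta)V_\theta[T] - 2A(\theta)\tau'(\theta) + A^2(\theta)V_\theta[T] = 0.$$
Therefore, with probability one,
$$\frac{\partial \log L(\theta)}{\partial \theta} = A(\theta)[t(x) - \tau(\theta)], \quad\text{i.e.,}\quad \frac{\partial \log L(\theta)}{\partial \theta} \propto [t(x) - \tau(\theta)].$$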


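Example 5.14 Let $X_1, X_2, \cdots, X_n$ be an iid random sample drawn from a population with density function
$$p_\theta(x) = \begin{cases} \theta x^{\theta - 1} & 0 < x < 1,\ \theta > 0 \\ 0 & \text{otherwise.} \end{cases}$$
Obtain the MVBE of $\theta$.
The likelihood function for $\theta$ of the sample of size $n$ is
$$L(\theta) = \theta^n\prod_{i=1}^n x_i^{\theta - 1},$$
$$\log L(\theta) = n\log\theta + (\theta - 1)\sum_{i=1}^n \log x_i,$$
$$\frac{\partial \log L(\theta)}{\partial \theta} = \frac{n}{\theta} + \sum_{i=1}^n \log x_i = \left[\sum_{i=1}^n \log x_i - \left(-\frac{n}{\theta}\right)\right].$$
This is of the form (5.8) with $t(x) = \sum_{i=1}^n \log x_i$, $\tau(\theta) = -\frac{n}{\theta}$ and $A(\theta) = 1$. Thus the MVBE of $\tau(\theta) = -\frac{n}{\theta}$ is $\sum_{i=1}^n \log X_i$, and the variance of this estimator is $\frac{\tau'(\theta)}{A(\theta)} = \frac{n}{\theta^2}$. The score is proportional to $t(x) - \tau(\theta)$ only for this $\tau(\theta)$ (up to linear transformations), so the MVB is attained for $-\frac{n}{\theta}$ rather than for $\theta$ itself. Solving $\sum_{i=1}^n \log x_i = -\frac{n}{\theta}$ gives the corresponding estimate of $\theta$,
$$\hat\theta(X) = \frac{-n}{\sum_{i=1}^n \log X_i},$$
which is also the MLE of $\theta$. Since $-\sum_{i=1}^n \log X_i$ has a gamma distribution with mean $\frac{n}{\theta}$, $E_\theta[\hat\theta(X)] = \frac{n\theta}{n-1}$, so $\hat\theta(X)$ is not itself unbiased for $\theta$.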
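A quick Monte Carlo check of Example 5.14, assuming the illustrative values $\theta = 2.5$ and $n = 10$ (not from the text). If $U$ is uniform on $(0, 1)$, then $X = U^{1/\theta}$ satisfies $P(X \le x) = x^\theta$, i.e., it has density $\theta x^{\theta - 1}$ on $(0, 1)$, so $T = \sum \log X_i$ can be simulated directly and its mean and variance compared with $\tau(\theta) = -n/\theta$ and the bound $n/\theta^2$.

```python
import math
import random


def simulate_T(theta=2.5, n=10, reps=20000, seed=3):
    """Draws of T = sum(log X_i) with X_i = U_i ** (1/theta) ~ theta * x**(theta-1)."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        # 1 - random() lies in (0, 1], so the logarithm is always defined
        out.append(sum(math.log((1.0 - rng.random()) ** (1.0 / theta)) for _ in range(n)))
    return out


theta, n = 2.5, 10
ts = simulate_T(theta, n)
mean_t = sum(ts) / len(ts)
var_t = sum((t - mean_t) ** 2 for t in ts) / (len(ts) - 1)
print(mean_t, -n / theta)      # compare with tau(theta) = -n/theta = -4.0
print(var_t, n / theta ** 2)   # compare with the MVB n/theta**2 = 1.6
```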
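Example 5.15 Let $X_1, X_2, \cdots, X_n$ be a random sample of size $n$ drawn from a population with pdf
$$p_\theta(x) = \begin{cases} \frac{1}{\theta^p\,\Gamma p}\,e^{-\frac{x}{\theta}}x^{p-1} & x > 0,\ \theta > 0 \\ 0 & \text{otherwise.} \end{cases}$$
Obtain the MVBE of $\theta$ when $p$ is known.
The likelihood function for $\theta$ of the sample of size $n$ is
$$L(\theta) = \frac{1}{(\Gamma p)^n\,\theta^{np}}\,e^{-\frac{\sum_{i=1}^n x_i}{\theta}}\left[\prod_{i=1}^n x_i\right]^{p-1},$$
$$\log L(\theta) = -n\log\Gamma p - np\log\theta - \frac{\sum x_i}{\theta} + (p - 1)\sum_{i=1}^n \log x_i,$$
$$\frac{\partial \log L(\theta)}{\partial \theta} = 0 - \frac{np}{\theta} + \frac{n\bar x}{\theta^2} = \frac{n}{\theta^2}[\bar x - p\theta] = \frac{np}{\theta^2}\left[\frac{\bar x}{p} - \theta\right].$$
Thus $\tau(\theta) = \theta$, $A(\theta) = \frac{np}{\theta^2}$ and $t(x) = \frac{\bar x}{p}$. The MVBE of $\tau(\theta) = \theta$ is $T = \frac{\bar X}{p}$ when $p$ is known, and $V_\theta[T] = \frac{\tau'(\theta)}{A(\theta)} = \frac{\theta^2}{np}$.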


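Theorem 5.6 A necessary and sufficient condition for a distribution to admit an estimator of a suitably chosen function of the parameter with variance equal to the information limit (MVB) is that the likelihood function has the form $L(\theta) = e^{\theta_1 t(x) + \theta_2}h(x)$, where $h(x)$ and $t(x)$ are functions of the observations only and $\theta_1$ and $\theta_2$ are functions of $\theta$ only. The parametric function to be estimated is
$$\tau(\theta) = -\frac{d\theta_2}{d\theta_1} = -\frac{d\theta_2/d\theta}{d\theta_1/d\theta},$$
and the variance of the estimator is
$$-\frac{d^2\theta_2}{d\theta_1^2} = \frac{d}{d\theta_1}\left(-\frac{d\theta_2}{d\theta_1}\right) = \frac{\frac{d}{d\theta}\left(-\frac{d\theta_2}{d\theta_1}\right)}{d\theta_1/d\theta}.$$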
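Proof: Let $T = t(X)$ be the MVBE of $\tau(\theta)$, where $\theta$ is the population parameter. For a single observation $x$ of $X$, the likelihood function for $\theta$ is $L(\theta) = p_\theta(x)$, and by Theorem 5.5, $t(x) - \tau(\theta)$ and $\frac{\partial \log L(\theta)}{\partial \theta}$ are proportional, i.e.,
$$\frac{\partial \log L(\theta)}{\partial \theta} = A(\theta)[t(x) - \tau(\theta)],$$
where $A(\theta)$ is a function of $\theta$ only. Integrating with respect to $\theta$,
$$\int \frac{\partial \log L(\theta)}{\partial \theta}\,d\theta = \int A(\theta)[t(x) - \tau(\theta)]\,d\theta + c,$$
where $c$ is a constant of integration, free from $\theta$ (it may depend on $x$). Writing $\theta_1 = \int A(\theta)\,d\theta$ and $\theta_2 = -\int A(\theta)\tau(\theta)\,d\theta$,
$$\log L(\theta) = t(x)\theta_1 + \theta_2 + c,$$
$$L(\theta) = e^{\theta_1 t(x) + \theta_2 + c} = e^{\theta_1 t(x) + \theta_2}e^c = e^{\theta_1 t(x) + \theta_2}h(x), \quad\text{where } e^c = h(x).$$
Thus the condition is necessary.
Conversely, let the likelihood function $L(\theta)$ be expressible in the form
$$L(\theta) = e^{\theta_1 t(x) + \theta_2}h(x).$$
Then
$$\int L(\theta)\,dx = \int h(x)e^{t(x)\theta_1 + \theta_2}\,dx = 1, \quad\text{i.e.,}\quad \int h(x)e^{t(x)\theta_1}\,dx = e^{-\theta_2}.$$
Further, assuming that differentiation with respect to $\theta_1$ under the integral sign is valid, and differentiating twice,
$$\int h(x)e^{t(x)\theta_1}t(x)\,dx = e^{-\theta_2}\left(-\frac{d\theta_2}{d\theta_1}\right), \tag{5.9}$$
$$\int h(x)e^{t(x)\theta_1}t^2(x)\,dx = e^{-\theta_2}\left(\frac{d\theta_2}{d\theta_1}\right)^2 - e^{-\theta_2}\frac{d^2\theta_2}{d\theta_1^2}. \tag{5.10}$$
From equation (5.9), $E_\theta[T] = -\frac{d\theta_2}{d\theta_1} = \tau(\theta)$.
From equation (5.10), $\int t^2(x)e^{t(x)\theta_1 + \theta_2}h(x)\,dx = \left(\frac{d\theta_2}{d\theta_1}\right)^2 - \frac{d^2\theta_2}{d\theta_1^2}$, i.e.,
$$E_\theta[T^2] = \left(\frac{d\theta_2}{d\theta_1}\right)^2 - \frac{d^2\theta_2}{d\theta_1^2},$$
$$V_\theta[T] = E_\theta[T^2] - (E_\theta[T])^2 = -\frac{d^2\theta_2}{d\theta_1^2}.$$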

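The Cramér–Rao bound for the variance of an unbiased estimator of $\tau(\theta)$ is
$$\frac{[\tau'(\theta)]^2}{E_\theta\left[\left(\frac{\partial \log L(\theta)}{\partial \theta}\right)^2\right]}, \quad\text{where}\quad \tau'(\theta) = \frac{d}{d\theta}\left(-\frac{d\theta_2}{d\theta_1}\right) = -\frac{d^2\theta_2}{d\theta_1^2}\,\frac{d\theta_1}{d\theta},$$
so that the bound equals
$$\frac{\left(\frac{d^2\theta_2}{d\theta_1^2}\right)^2\left(\frac{d\theta_1}{d\theta}\right)^2}{E_\theta\left[\left(\frac{\partial \log L(\theta)}{\partial \theta}\right)^2\right]}.$$
But $\log L(\theta) = t(x)\theta_1 + \theta_2 + \log h(x)$, so
$$\frac{\partial \log L(\theta)}{\partial \theta} = t(x)\frac{d\theta_1}{d\theta} + \frac{d\theta_2}{d\theta}, \qquad \frac{\partial^2 \log L(\theta)}{\partial \theta^2} = t(x)\frac{d^2\theta_1}{d\theta^2} + \frac{d^2\theta_2}{d\theta^2},$$
$$E_\theta\left[\frac{\partial^2 \log L(\theta)}{\partial \theta^2}\right] = E_\theta[T]\frac{d^2\theta_1}{d\theta^2} + \frac{d^2\theta_2}{d\theta^2} = -\frac{d\theta_2}{d\theta_1}\frac{d^2\theta_1}{d\theta^2} + \frac{d^2\theta_2}{d\theta^2}.$$
Since $\frac{d\theta_2}{d\theta} = \frac{d\theta_2}{d\theta_1}\frac{d\theta_1}{d\theta}$,
$$\frac{d^2\theta_2}{d\theta^2} = \frac{d}{d\theta}\left[\frac{d\theta_2}{d\theta_1}\right]\frac{d\theta_1}{d\theta} + \frac{d\theta_2}{d\theta_1}\frac{d^2\theta_1}{d\theta^2} = \frac{d^2\theta_2}{d\theta_1^2}\left(\frac{d\theta_1}{d\theta}\right)^2 + \frac{d\theta_2}{d\theta_1}\frac{d^2\theta_1}{d\theta^2},$$
and therefore
$$E_\theta\left[\frac{\partial^2 \log L(\theta)}{\partial \theta^2}\right] = \frac{d^2\theta_2}{d\theta_1^2}\left(\frac{d\theta_1}{d\theta}\right)^2.$$
But $E_\theta\left[\left(\frac{\partial \log L(\theta)}{\partial \theta}\right)^2\right] = -E_\theta\left[\frac{\partial^2 \log L(\theta)}{\partial \theta^2}\right]$. Hence the Cramér–Rao bound for $\tau(\theta) = -\frac{d\theta_2}{d\theta_1}$ is
$$\frac{\left(\frac{d^2\theta_2}{d\theta_1^2}\right)^2\left(\frac{d\theta_1}{d\theta}\right)^2}{-\frac{d^2\theta_2}{d\theta_1^2}\left(\frac{d\theta_1}{d\theta}\right)^2} = -\frac{d^2\theta_2}{d\theta_1^2}.$$
The variance of $T = t(X)$ is $-\frac{d^2\theta_2}{d\theta_1^2}$. Thus $T = t(X)$ attains the MVB for the parametric function $\tau(\theta)$.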
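Example 5.16 Let $X_1, X_2, \cdots, X_n$ be a random sample drawn from the population with pdf
$$p_\theta(x) = \begin{cases} \theta x^{\theta - 1} & 0 < x < 1,\ \theta > 0 \\ 0 & \text{otherwise.} \end{cases}$$
Find the MVBE of $\theta$.
The likelihood function for $\theta$ is
$$L(\theta) = \theta^n\left(\prod_{i=1}^n x_i\right)^{\theta - 1},$$
$$\log L(\theta) = n\log\theta + (\theta - 1)\sum_{i=1}^n \log x_i,$$
$$L(\theta) = e^{n\log\theta + \theta\sum_{i=1}^n \log x_i - \sum_{i=1}^n \log x_i} = e^{\theta_1 t(x) + \theta_2}h(x),$$
where $\theta_1 = \theta$, $\theta_2 = n\log\theta$, $h(x) = e^{-\sum \log x_i}$ and $t(x) = \sum \log x_i$. Then
$$\tau(\theta) = -\frac{d\theta_2}{d\theta_1} = -\frac{n}{\theta}, \qquad V_\theta[T] = -\frac{d^2\theta_2}{d\theta_1^2} = -\frac{d}{d\theta}\left(\frac{n}{\theta}\right) = \frac{n}{\theta^2}.$$
Thus $T = \sum \log X_i$ is the MVBE of $\tau(\theta) = -\frac{n}{\theta}$. Since $E_\theta[T] = \tau(\theta)$, equating $\sum \log x_i = -\frac{n}{\theta}$ gives the corresponding estimate of $\theta$,
$$\hat\theta(X) = \frac{-n}{\sum \log X_i};$$
because $-\sum \log X_i$ has a gamma distribution with mean $\frac{n}{\theta}$, $E_\theta[\hat\theta(X)] = \frac{n\theta}{n-1}$, so the MVB is attained for the function $-\frac{n}{\theta}$ and not by this estimator of $\theta$ itself.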

Relationship between MVBE and MLE


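If an MVBE exists, then the MLE is a function of the MVBE.
Assume that the MVBE $T = t(X)$ exists for the parameter $\theta$; then by Theorem 5.5
$$\frac{\partial \log L(\theta)}{\partial \theta} = A(\theta)[t(x) - \theta].$$
$L(\theta)$ attains its maximum if $\frac{\partial \log L(\theta)}{\partial \theta} = 0$ and $\frac{\partial^2 \log L(\theta)}{\partial \theta^2} < 0$ at $\theta = \hat\theta(x)$, i.e.,
$$A(\theta)[t(x) - \theta] = 0 \;\Rightarrow\; \hat\theta(x) = t(x)$$
and
$$\frac{\partial^2 \log L(\theta)}{\partial \theta^2} = A'(\theta)[t(x) - \theta] + A(\theta)(-1), \qquad \left.\frac{\partial^2 \log L(\theta)}{\partial \theta^2}\right|_{\theta = \hat\theta(x)} = -A(\hat\theta(x)) < 0,$$
since here $A(\theta) = \frac{\tau'(\theta)}{V_\theta[T]} = \frac{1}{V_\theta[T]} > 0$. Hence $\hat\theta(X) = t(X)$ is the MLE of $\theta$.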
Example 5.17 If $T = t(X)$ is an MVBE of $\tau(\theta)$ and $p_\theta(x_1, x_2, \cdots, x_n)$ is the joint density function corresponding to $n$ independent observations of a random variable $X$, show that the correlation between $T$ and $\frac{\partial \log p_\theta(x_1, x_2, \cdots, x_n)}{\partial \theta}$ is unity.
Write $S = \frac{\partial \log p_\theta(x_1, x_2, \cdots, x_n)}{\partial \theta}$. Given that $T = t(X)$ is the MVBE of $\tau(\theta)$, i.e., $T$ attains the Cramér–Rao lower bound,
$$V_\theta[T] = \frac{[\tau'(\theta)]^2}{V_\theta[S]} \quad \forall\,\theta \in \Omega,$$
i.e., $[\tau'(\theta)]^2 = V_\theta[T]\,V_\theta[S]$, so that $\tau'(\theta) = \sqrt{V_\theta[T]\,V_\theta[S]}$ when $\tau'(\theta) > 0$.
But
$$\tau(\theta) = E_\theta[T] = \int t\,p_\theta(x_1, x_2, \cdots, x_n)\,dx,$$
$$\tau'(\theta) = \int t\,\frac{\partial p_\theta(x_1, \cdots, x_n)}{\partial \theta}\,dx = \int t\,\frac{\partial p_\theta(x_1, \cdots, x_n)}{\partial \theta}\,\frac{p_\theta(x_1, \cdots, x_n)}{p_\theta(x_1, \cdots, x_n)}\,dx = \int t\,\frac{\partial \log p_\theta(x_1, \cdots, x_n)}{\partial \theta}\,p_\theta(x_1, \cdots, x_n)\,dx$$
$$= E_\theta[TS] = \mathrm{Cov}_\theta(T, S), \quad\text{since } E_\theta[S] = 0.$$
The correlation coefficient between $T$ and $S$ is
$$\rho = \frac{\mathrm{Cov}_\theta(T, S)}{\sqrt{V_\theta[T]\,V_\theta[S]}} = \frac{\tau'(\theta)}{\sqrt{V_\theta[T]\,V_\theta[S]}} = 1, \quad\text{since } \tau'(\theta) = \sqrt{V_\theta[T]\,V_\theta[S]}.$$
(If $\tau'(\theta) < 0$, the same argument gives $\rho = -1$; in either case $|\rho| = 1$.)


5.6 Method of Moment Estimation


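Let $X_1, X_2, \cdots, X_n$ be an iid random sample of size $n$ with pdf $p_\theta(x)$, where $\theta = (\theta_1, \theta_2, \cdots, \theta_k)$ consists of $k$ parameters. Define $\mu_r' = E_\theta[X^r]$, $r = 1, 2, \cdots, k$; each $\mu_r'$ is a function of $\theta_1, \theta_2, \cdots, \theta_k$. The method of moments solves this set of $k$ equations for $\theta_1, \theta_2, \cdots, \theta_k$, expressing the parameters as functions $\theta = \theta(\mu_1', \mu_2', \cdots, \mu_k')$ of the population moments. Replacing $\mu_r'$ by $m_r' = \frac{1}{n}\sum_{i=1}^n x_i^r$, the $r$th raw moment of the sample, gives the moment estimators of the parameters.
Remark 5.3 Moment estimators are consistent under suitable conditions. For an iid random sample $X_1, X_2, \cdots, X_n$ with pdf $p_\theta(x)$, $\theta \in \Omega$, if $E[|X|^r] < \infty$ then
$$\frac{1}{n}\sum_{i=1}^n X_i^r \xrightarrow{P} E[X^r] \quad\text{as } n \to \infty, \quad r = 1, 2, \cdots,$$
and a moment estimator which is a continuous function of the sample moments converges in probability to the same function of the population moments. This is not true when the moments of the distribution do not exist; for example, for the Cauchy distribution the moment estimators do not exist.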
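The Cauchy case in Remark 5.3 shows up clearly in a short simulation (illustrative, with an arbitrary seed and sample sizes). The mean of $n$ standard Cauchy observations is again standard Cauchy, so the proportion of sample means with $|\bar x| > 1$ stays near $1/2$ however large $n$ is, while for standard normal data it collapses towards 0. The helper names are hypothetical.

```python
import math
import random


def frac_large_means(sampler, n, reps=400, seed=5):
    """Proportion of sample means (samples of size n) with absolute value above 1."""
    rng = random.Random(seed)
    count = 0
    for _ in range(reps):
        xbar = sum(sampler(rng) for _ in range(n)) / n
        count += abs(xbar) > 1.0
    return count / reps


def cauchy(rng):
    # inverse-CDF draw from the standard Cauchy distribution
    return math.tan(math.pi * (rng.random() - 0.5))


def normal(rng):
    return rng.gauss(0.0, 1.0)


for n in (10, 1000):
    print(n, frac_large_means(cauchy, n), frac_large_means(normal, n))
```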
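Example 5.18 A random sample of size $n$ is taken from the log-normal distribution
$$p_{\theta,\sigma^2}(x) = \begin{cases} \frac{1}{\sqrt{2\pi}\,\sigma}\,\frac{1}{x}\,e^{-\frac{1}{2\sigma^2}(\log x - \theta)^2} & x > 0 \\ 0 & \text{otherwise.} \end{cases}$$
Find the moment estimates of $\theta$ and $\sigma^2$.
$$E[X^r] = \int_0^\infty \frac{1}{\sqrt{2\pi}\,\sigma}\,\frac{x^r}{x}\,e^{-\frac{1}{2\sigma^2}(\log x - \theta)^2}\,dx.$$
Take $y = \log x$, i.e., $x = e^y$, $dx = e^y\,dy$; as $x$ runs over $(0, \infty)$, $y$ runs over $(-\infty, \infty)$:
$$E[X^r] = \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{ry}\,e^{-\frac{1}{2\sigma^2}(y - \theta)^2}\,dy.$$
Let $\frac{y - \theta}{\sigma} = z$, i.e., $y = \sigma z + \theta$, $dy = \sigma\,dz$:
$$E[X^r] = \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}}\,e^{r\theta - \frac{1}{2}z^2 + r\sigma z}\,dz = \frac{e^{r\theta}}{\sqrt{2\pi}}\int_{-\infty}^\infty e^{-\frac{1}{2}[z^2 - 2r\sigma z]}\,dz = \frac{e^{r\theta + \frac{r^2\sigma^2}{2}}}{\sqrt{2\pi}}\int_{-\infty}^\infty e^{-\frac{1}{2}[z - r\sigma]^2}\,dz.$$
Since $\int_{-\infty}^\infty e^{-\frac{1}{2}[z - r\sigma]^2}\,dz = \sqrt{2\pi}$,
$$\mu_r' = e^{r\theta + \frac{r^2\sigma^2}{2}}, \quad r = 1, 2, \cdots.$$
When $r = 1$: $\log\mu_1' = \theta + \frac{\sigma^2}{2}$, i.e., $\log(\mu_1')^2 = 2\theta + \sigma^2$.
When $r = 2$: $\log\mu_2' = 2\theta + 2\sigma^2$, so $\log\mu_2' - \log(\mu_1')^2 = \sigma^2$, i.e., $\sigma^2 = \log\frac{\mu_2'}{(\mu_1')^2}$.
Replacing $\mu_r'$ by $m_r' = \frac{\sum x_i^r}{n}$, $r = 1, 2$,
$$\hat\sigma^2(x) = \log\frac{m_2'}{(m_1')^2}, \qquad \log(m_1')^2 = 2\hat\theta(x) + \log\frac{m_2'}{(m_1')^2},$$
$$\log(m_1')^2 - \log\frac{m_2'}{(m_1')^2} = 2\hat\theta(x), \quad\text{i.e.,}\quad \hat\theta(x) = \log\frac{(m_1')^2}{\sqrt{m_2'}}.$$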
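The two moment estimators of Example 5.18 translate directly into code. The sketch below (function name hypothetical) computes $m_1'$, $m_2'$ and then $\hat\sigma^2 = \log\frac{m_2'}{(m_1')^2}$ and $\hat\theta = \log\frac{(m_1')^2}{\sqrt{m_2'}}$. The simulated check uses Python's `random.lognormvariate(mu, sigma)`, whose logarithm is $N(\mu, \sigma^2)$, with illustrative values $\theta = 0.5$, $\sigma = 0.4$.

```python
import math
import random


def lognormal_moment_estimates(xs):
    """Moment estimates (theta, sigma^2) for the log-normal distribution."""
    n = len(xs)
    m1 = sum(xs) / n                    # first raw sample moment
    m2 = sum(x * x for x in xs) / n     # second raw sample moment
    sigma2_hat = math.log(m2 / m1 ** 2)
    theta_hat = math.log(m1 ** 2 / math.sqrt(m2))
    return theta_hat, sigma2_hat


rng = random.Random(2024)
sample = [rng.lognormvariate(0.5, 0.4) for _ in range(50000)]
print(lognormal_moment_estimates(sample))   # should be near (0.5, 0.16)
```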

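Example 5.19 Find the moment estimates of $\alpha$ and $\beta$ for the pdf
$$p_{\alpha,\beta}(x) = \begin{cases} \frac{\alpha^\beta}{\Gamma\beta}\,e^{-\alpha x}x^{\beta - 1} & x > 0,\ \beta > 0,\ \alpha > 0 \\ 0 & \text{otherwise} \end{cases}$$
by using a sample of size $n$.
$$E[X^r] = \int_0^\infty x^r\,\frac{\alpha^\beta}{\Gamma\beta}\,e^{-\alpha x}x^{\beta - 1}\,dx = \int_0^\infty \frac{\alpha^\beta}{\Gamma\beta}\,e^{-\alpha x}x^{r + \beta - 1}\,dx,$$
$$\mu_r' = \frac{\Gamma(\beta + r)}{\alpha^{\beta + r}}\,\frac{\alpha^\beta}{\Gamma\beta} = \frac{\Gamma(\beta + r)}{\alpha^r\,\Gamma\beta}, \quad r = 1, 2, \cdots.$$
When $r = 1$: $\mu_1' = \frac{\Gamma(\beta + 1)}{\alpha\Gamma\beta} = \frac{\beta\Gamma\beta}{\alpha\Gamma\beta} = \frac{\beta}{\alpha}$.
When $r = 2$: $\mu_2' = \frac{\Gamma(\beta + 2)}{\alpha^2\Gamma\beta} = \frac{(\beta + 1)\beta\Gamma\beta}{\alpha^2\Gamma\beta} = \frac{\beta^2 + \beta}{\alpha^2}$.
Hence
$$\frac{\mu_2'}{(\mu_1')^2} = 1 + \frac{1}{\beta} \;\Rightarrow\; \beta = \frac{(\mu_1')^2}{\mu_2' - (\mu_1')^2} = \frac{(\mu_1')^2}{\mu_2}, \qquad \alpha = \frac{\beta}{\mu_1'} = \frac{\mu_1'}{\mu_2},$$
where $\mu_2 = \mu_2' - (\mu_1')^2$. Thus
$$\hat\beta(x) = \frac{(m_1')^2}{m_2} \quad\text{and}\quad \hat\alpha(x) = \frac{m_1'}{m_2}, \quad\text{where } m_1' = \frac{\sum x_i}{n} \text{ and } m_2 = \frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2.$$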
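Example 5.19 in code, as a sketch (function name hypothetical). Note that $\alpha$ here is a rate, so when simulating with `random.gammavariate(alpha, beta)`, which takes shape and scale, the scale argument is $1/\alpha$. The illustrative values $\beta = 3$, $\alpha = 2$ are assumptions, not from the text.

```python
import random


def gamma_moment_estimates(xs):
    """Moment estimates (alpha, beta) for alpha**beta / Gamma(beta) * exp(-alpha x) x**(beta-1)."""
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(x * x for x in xs) / n - m1 ** 2   # second central sample moment
    return m1 / m2, m1 ** 2 / m2                # (alpha_hat, beta_hat)


print(gamma_moment_estimates([1.0, 2.0, 3.0, 4.0]))   # → (2.0, 5.0), since m1 = 2.5, m2 = 1.25

rng = random.Random(99)
sample = [rng.gammavariate(3.0, 1.0 / 2.0) for _ in range(50000)]   # shape 3, rate 2
print(gamma_moment_estimates(sample))   # should be near (2, 3)
```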
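Example 5.20 Obtain the moment estimate of the parameter $\theta$ of the pdf
$$p_\theta(x) = \frac{1}{2}e^{-|x - \theta|}, \quad -\infty < x < \infty,$$
by taking a sample of size $n$.
$$\mu_1' = E_\theta[X] = \int_{-\infty}^\infty \frac{x}{2}\,e^{-|x - \theta|}\,dx,$$
where $|x - \theta| = x - \theta$ if $x \ge \theta$ and $|x - \theta| = -(x - \theta)$ if $x \le \theta$. Thus
$$\mu_1' = \int_{-\infty}^\theta \frac{x}{2}\,e^{(x - \theta)}\,dx + \int_\theta^\infty \frac{x}{2}\,e^{-(x - \theta)}\,dx.$$
Putting $x - \theta = t$, i.e., $x = t + \theta$,
$$2\mu_1' = \int_{-\infty}^0 (t + \theta)e^t\,dt + \int_0^\infty (t + \theta)e^{-t}\,dt = \theta\int_{-\infty}^\infty e^{-|t|}\,dt + \int_{-\infty}^0 te^t\,dt + \int_0^\infty te^{-t}\,dt = 2\theta - 1 + 1 = 2\theta,$$
since $\int_{-\infty}^\infty \frac{1}{2}e^{-|t|}\,dt = 1$, $\int_0^\infty te^{-t}\,dt = 1$ and $\int_{-\infty}^0 te^t\,dt = -1$. Hence
$$\mu_1' = \theta \;\Rightarrow\; \hat\theta(x) = m_1', \quad\text{where } m_1' = \frac{\sum x_i}{n}.$$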
n


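Example 5.21 For a random sample $x_1, x_2, \cdots, x_n$ on $X$, obtain the moment estimates of the parameters $a$ and $b$ of the rectangular distribution
$$p_{a,b}(x) = \begin{cases} \frac{1}{b - a} & a < x < b \\ 0 & \text{otherwise.} \end{cases}$$
$$\mu_1' = E[X] = \int_a^b \frac{x}{b - a}\,dx = \frac{a + b}{2},$$
$$\mu_2' = E[X^2] = \int_a^b \frac{x^2}{b - a}\,dx = \frac{1}{b - a}\,\frac{b^3 - a^3}{3} = \frac{b^2 + ab + a^2}{3} = \frac{b^2 + 2ab + a^2 - ab}{3} = \frac{(2\mu_1')^2 - ab}{3}.$$
Thus $3\mu_2' = 4(\mu_1')^2 - ab$ with $b = 2\mu_1' - a$, so
$$3\mu_2' = 4(\mu_1')^2 - a(2\mu_1' - a) = 4(\mu_1')^2 - 2a\mu_1' + a^2,$$
$$a^2 - 2a\mu_1' + 4(\mu_1')^2 - 3\mu_2' = 0,$$
$$a = \frac{2\mu_1' \pm \sqrt{4(\mu_1')^2 - 4[4(\mu_1')^2 - 3\mu_2']}}{2} = \mu_1' \pm \sqrt{3[\mu_2' - (\mu_1')^2]} = \mu_1' \pm \sqrt{3\mu_2},$$
and $b = 2\mu_1' - a = \mu_1' \mp \sqrt{3\mu_2}$. Since $a < b$, the lower sign is taken for $a$. Thus the moment estimators of $a$ and $b$ are
$$\hat a(x) = m_1' - \sqrt{3m_2} \quad\text{and}\quad \hat b(x) = m_1' + \sqrt{3m_2}, \quad\text{where } m_1' = \frac{\sum x_i}{n} \text{ and } m_2 = \frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2.$$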
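The closed-form estimators of Example 5.21 as a sketch (function name hypothetical), using $m_2$ with divisor $n$ as in the text; the simulated check with $a = 2$, $b = 7$ is illustrative.

```python
import math
import random


def uniform_moment_estimates(xs):
    """Moment estimates (a, b) for the rectangular distribution on (a, b)."""
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(x * x for x in xs) / n - m1 ** 2   # second central sample moment
    half_width = math.sqrt(3.0 * m2)
    return m1 - half_width, m1 + half_width


print(uniform_moment_estimates([1, 2, 3, 4, 5]))   # (3 - sqrt(6), 3 + sqrt(6))

rng = random.Random(17)
sample = [rng.uniform(2.0, 7.0) for _ in range(20000)]
print(uniform_moment_estimates(sample))           # should be near (2, 7)
```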

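Example 5.22 $X$ has the following probability distribution:
$$\begin{array}{c|ccc} X = x & 0 & 1 & 2 \\ \hline P_\theta\{X = x\} & 1 - \theta - \theta^2 & \theta & \theta^2 \end{array}$$
Obtain the moment estimate of $\theta$ if, in a sample of 25 observations, there were 10 ones and 4 twos.
$$\begin{array}{c|c|c} X = x & P_\theta\{X = x\} & \text{Frequency } f \\ \hline 0 & 1 - \theta - \theta^2 & 11 \\ 1 & \theta & 10 \\ 2 & \theta^2 & 4 \\ \hline \text{Total} & 1 & \sum f_i = 25 \end{array}$$
Equating the population mean to the sample mean,
$$\mu_1' = E_\theta[X] = (1 - \theta - \theta^2)\times 0 + \theta\times 1 + \theta^2\times 2 = \frac{\sum f_ix_i}{\sum f_i},$$
$$\theta + 2\theta^2 = \frac{0 + 10 + 8}{25} \;\Rightarrow\; 50\theta^2 + 25\theta - 18 = 0,$$
$$\hat\theta(x) = \frac{-25 \pm \sqrt{625 + 4\times 50\times 18}}{2\times 50} = \frac{-25 \pm 65}{100} = 0.4,$$
taking the positive root (the root $-0.9$ is inadmissible since $\theta \ge 0$).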
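The arithmetic of Example 5.22 can be reproduced in a few lines. The sketch equates the population mean $\theta + 2\theta^2$ to the sample mean from the frequency table and keeps the admissible root of the resulting quadratic (this is $50\theta^2 + 25\theta - 18 = 0$ divided by 25).

```python
import math

values = [0, 1, 2]
freqs = [11, 10, 4]
sample_mean = sum(v * f for v, f in zip(values, freqs)) / sum(freqs)   # 18/25

# theta + 2 theta**2 = sample_mean  <=>  2 theta**2 + theta - sample_mean = 0
a, b, c = 2.0, 1.0, -sample_mean
disc = b * b - 4 * a * c
roots = [(-b + s * math.sqrt(disc)) / (2 * a) for s in (1, -1)]
theta_hat = max(roots)          # the negative root is inadmissible
print(sample_mean, roots, theta_hat)
```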
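Example 5.23 Let $X_1, X_2, \cdots, X_n$ be a random sample drawn from a population with pdf
$$p_{\alpha,\beta}(x) = \begin{cases} \frac{\alpha}{\beta}\,x^{\alpha - 1}e^{-\frac{x^\alpha}{\beta}} & x > 0,\ \alpha > 0,\ \beta > 0 \\ 0 & \text{otherwise.} \end{cases}$$
Obtain the moment estimates of $\alpha$ and $\beta$.
$$E[X^r] = \int_0^\infty \frac{\alpha}{\beta}\,x^{\alpha + r - 1}e^{-\frac{x^\alpha}{\beta}}\,dx.$$
Putting $y = x^\alpha$, i.e., $x = y^{1/\alpha}$, $dx = \frac{1}{\alpha}y^{\frac{1}{\alpha} - 1}\,dy$,
$$E[X^r] = \frac{\alpha}{\beta}\int_0^\infty y^{\frac{\alpha + r - 1}{\alpha}}e^{-\frac{y}{\beta}}\,\frac{1}{\alpha}y^{\frac{1}{\alpha} - 1}\,dy = \frac{1}{\beta}\int_0^\infty e^{-\frac{y}{\beta}}y^{\frac{r}{\alpha} + 1 - 1}\,dy,$$
$$\mu_r' = \frac{1}{\beta}\,\frac{\Gamma\left(\frac{r}{\alpha} + 1\right)}{\left(\frac{1}{\beta}\right)^{\frac{r}{\alpha} + 1}} = \beta^{\frac{r}{\alpha}}\,\Gamma\left(\frac{r}{\alpha} + 1\right).$$
Hence
$$\mu_1' = \beta^{\frac{1}{\alpha}}\,\Gamma\left(\frac{1}{\alpha} + 1\right), \qquad \mu_2' = \beta^{\frac{2}{\alpha}}\,\Gamma\left(\frac{2}{\alpha} + 1\right),$$
$$\mu_2 = \mu_2' - (\mu_1')^2 = \beta^{\frac{2}{\alpha}}\,\Gamma\left(\frac{2}{\alpha} + 1\right) - \beta^{\frac{2}{\alpha}}\left[\Gamma\left(\frac{1}{\alpha} + 1\right)\right]^2,$$
$$\frac{\mu_2}{(\mu_1')^2} = \frac{\Gamma\left(\frac{2}{\alpha} + 1\right) - \left[\Gamma\left(\frac{1}{\alpha} + 1\right)\right]^2}{\left[\Gamma\left(\frac{1}{\alpha} + 1\right)\right]^2},$$
which is the square of the coefficient of variation and does not involve $\beta$. The sample coefficient of variation is $\frac{S}{\bar X}$, where $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2$ and $\bar X = \frac{\sum X_i}{n}$. Equating
$$\frac{S^2}{\bar x^2} = \frac{\Gamma\left(\frac{2}{\alpha} + 1\right) - \left[\Gamma\left(\frac{1}{\alpha} + 1\right)\right]^2}{\left[\Gamma\left(\frac{1}{\alpha} + 1\right)\right]^2}$$
and using an iterative method gives the estimate $\hat\alpha(x)$ of $\alpha$. From the estimate $\hat\alpha(x)$ one can then obtain the estimate of $\beta$ from the equation for $\mu_1'$, namely $\hat\beta(x) = \left[\frac{\bar x}{\Gamma\left(\frac{1}{\hat\alpha} + 1\right)}\right]^{\hat\alpha}$.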
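The iterative step in Example 5.23 can be sketched with bisection, relying on the standard property that the squared coefficient of variation $\Gamma(2/\alpha + 1)/\Gamma(1/\alpha + 1)^2 - 1$ decreases as $\alpha$ increases. The bracket $[0.1, 50]$, the function names and the simulated values $\alpha = 2$, $\beta = 1$ are illustrative assumptions. Data are simulated by inverting $F(x) = 1 - e^{-x^\alpha/\beta}$.

```python
import math
import random


def cv2_model(alpha):
    """Squared coefficient of variation mu_2 / (mu_1')**2 as a function of alpha."""
    g1 = math.gamma(1.0 / alpha + 1.0)
    return math.gamma(2.0 / alpha + 1.0) / g1 ** 2 - 1.0


def weibull_moment_estimates(xs, lo=0.1, hi=50.0, iters=200):
    """Moment estimates (alpha, beta) for (alpha/beta) x**(alpha-1) exp(-x**alpha / beta)."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    target = s2 / xbar ** 2
    for _ in range(iters):               # bisection: cv2_model is decreasing in alpha
        mid = 0.5 * (lo + hi)
        if cv2_model(mid) > target:
            lo = mid
        else:
            hi = mid
    alpha_hat = 0.5 * (lo + hi)
    # from mu_1' = beta**(1/alpha) * Gamma(1/alpha + 1)
    beta_hat = (xbar / math.gamma(1.0 / alpha_hat + 1.0)) ** alpha_hat
    return alpha_hat, beta_hat


rng = random.Random(42)
alpha, beta = 2.0, 1.0
sample = [(-beta * math.log(1.0 - rng.random())) ** (1.0 / alpha) for _ in range(20000)]
print(weibull_moment_estimates(sample))   # should be near (2, 1)
```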

5.7 Method of Minimum Chi - Square Estimation


Suppose that a sample contains $r$ classes with observed frequencies $f_1, f_2, \cdots, f_r$ such that $\sum_{i=1}^r f_i = f$. Let $\pi_i(\theta)$ be the probability of an observation in the $i$th class, such that $\sum_{i=1}^r \pi_i(\theta) = 1$; each probability $\pi_i(\theta)$ is a function of $\theta$. Let $T = t(X)$ be any statistic for the parameter $\theta$, where $\theta$ is unknown. A statistic $T = t(X)$ is called the minimum $\chi^2$ estimator of the parameter $\theta$ if it is obtained by minimizing $\chi^2$ with respect to $\theta$, i.e.,
$$\chi^2 = \sum_{i=1}^r \frac{[f_i - f\pi_i(\theta)]^2}{f\pi_i(\theta)} = \sum_{i=1}^r \frac{f_i^2}{f\pi_i(\theta)} - f,$$
$$\frac{\partial\chi^2}{\partial\theta} = -\sum_{i=1}^r \frac{f_i^2}{f\pi_i^2(\theta)}\,\frac{d\pi_i(\theta)}{d\theta} = 0 \;\Rightarrow\; \sum_{i=1}^r \frac{f_i^2}{f\pi_i^2(\theta)}\,\frac{d\pi_i(\theta)}{d\theta} = 0.$$
A solution of this equation is called the minimum $\chi^2$ estimate of $\theta$.


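Remark 5.4 The minimum $\chi^2$ estimator is analogous to the MLE of $\theta$, and the asymptotic properties of minimum $\chi^2$ estimators are similar to those of the MLE.
A modified form of the minimum $\chi^2$ estimator is obtained by minimizing
$$\chi^2_{mod} = \sum_{i=1}^r \frac{[f\pi_i(\theta) - f_i]^2}{f_i} = \sum_{i=1}^r \frac{f^2\pi_i^2(\theta)}{f_i} - f,$$
$$\frac{\partial\chi^2_{mod}}{\partial\theta} = \sum_{i=1}^r \frac{2f^2}{f_i}\,\pi_i(\theta)\,\frac{d\pi_i(\theta)}{d\theta} = 0 \;\Rightarrow\; \sum_{i=1}^r \frac{f^2}{f_i}\,\frac{d\pi_i^2(\theta)}{d\theta} = 0.$$
Solving this equation for $\theta$ gives the modified minimum $\chi^2$ estimate of $\theta$.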


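Example 5.24 Find the minimum $\chi^2$ estimate of the parameter $\theta$ of the Poisson distribution.
Let
$$\pi_j(\theta) = \frac{e^{-\theta}\theta^j}{j!}, \quad j = 0, 1, \cdots.$$
Then
$$\frac{d\pi_j(\theta)}{d\theta} = \frac{e^{-\theta}j\theta^{j-1}}{j!} - \frac{e^{-\theta}\theta^j}{j!} = \frac{e^{-\theta}\theta^j}{j!}\left(\frac{j}{\theta} - 1\right) = \pi_j(\theta)\left(\frac{j}{\theta} - 1\right).$$
The minimizing equation
$$\frac{d\chi^2}{d\theta} = -\sum_j \frac{f_j^2}{f\pi_j^2(\theta)}\,\frac{d\pi_j(\theta)}{d\theta} = 0$$
becomes
$$\sum_j \frac{f_j^2}{\pi_j^2(\theta)}\,\pi_j(\theta)\left(\frac{j}{\theta} - 1\right) = 0, \quad\text{i.e.,}\quad \sum_j \frac{f_j^2}{\pi_j(\theta)}\left(1 - \frac{j}{\theta}\right) = 0.$$
An iterative method may be used to solve this equation for $\theta$. Alternatively, expand $g(\theta) = \sum_j \frac{f_j^2}{\pi_j(\theta)}\left(1 - \frac{j}{\theta}\right)$ in a Taylor series up to first order about the sample mean $\bar x$, taken as the trial value of $\theta$: $g(\theta) = g(\bar x) + (\theta - \bar x)g'(\bar x)$. Since
$$\frac{d}{d\theta}\left[\frac{1}{\pi_j(\theta)}\left(1 - \frac{j}{\theta}\right)\right] = \frac{1}{\pi_j(\theta)}\,\frac{j}{\theta^2} - \left(1 - \frac{j}{\theta}\right)\frac{1}{\pi_j^2(\theta)}\,\frac{d\pi_j(\theta)}{d\theta} = \frac{1}{\pi_j(\theta)}\left[\frac{j}{\theta^2} + \left(1 - \frac{j}{\theta}\right)^2\right],$$
writing $m_j = \pi_j(\bar x) = \frac{e^{-\bar x}\bar x^j}{j!}$ and setting the expansion equal to zero gives
$$\sum_j \frac{f_j^2}{m_j}\left(1 - \frac{j}{\bar x}\right) + (\theta - \bar x)\sum_j \frac{f_j^2}{m_j}\left[\frac{j}{\bar x^2} + \left(1 - \frac{j}{\bar x}\right)^2\right] = 0,$$
$$\theta - \bar x = -\frac{\sum_j \frac{f_j^2}{m_j}\left(1 - \frac{j}{\bar x}\right)}{\sum_j \frac{f_j^2}{m_j}\left[\frac{j}{\bar x^2} + \left(1 - \frac{j}{\bar x}\right)^2\right]} = \bar x\,\frac{\sum_j \frac{f_j^2}{m_j}(j - \bar x)}{\sum_j \frac{f_j^2}{m_j}\left[j + (j - \bar x)^2\right]}.$$
Let
$$\theta_1 = \bar x + \bar x\,\frac{\sum_j \frac{f_j^2}{m_j}(j - \bar x)}{\sum_j \frac{f_j^2}{m_j}\left[j + (j - \bar x)^2\right]}.$$
To improve the value of $\theta$, repeat the process with $\theta_1$ in place of $\bar x$ until the values of $\theta$ converge.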
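The iteration just described is Newton's method applied to $g(\theta) = \sum_j \frac{f_j^2}{\pi_j(\theta)}\left(1 - \frac{j}{\theta}\right)$, started at $\bar x$. Every term of $g'(\theta)$ is non-negative, so $g$ is increasing and has at most one root. A minimal sketch, using only the observed classes (classes with $f_j = 0$ contribute nothing to $g$); the frequency table and function names are hypothetical.

```python
import math


def poisson_pmf(j, theta):
    return math.exp(-theta) * theta ** j / math.factorial(j)


def chi2_statistic(freqs, theta):
    """Pearson chi-square sum f_j^2 / (f pi_j) - f, with unobserved classes having f_j = 0."""
    f = sum(freqs)
    return sum(fj ** 2 / (f * poisson_pmf(j, theta)) for j, fj in enumerate(freqs)) - f


def min_chi2_poisson(freqs, tol=1e-10, max_iter=50):
    """Minimum chi-square estimate of the Poisson mean; freqs[j] = frequency of value j."""
    f = sum(freqs)
    theta = sum(j * fj for j, fj in enumerate(freqs)) / f   # trial value: sample mean
    for _ in range(max_iter):
        g = dg = 0.0
        for j, fj in enumerate(freqs):
            w = fj ** 2 / poisson_pmf(j, theta)
            g += w * (1.0 - j / theta)
            dg += w * (j / theta ** 2 + (1.0 - j / theta) ** 2)   # g'(theta)
        step = g / dg
        theta -= step                 # theta_{k+1} = theta_k - g / g'
        if abs(step) < tol:
            break
    return theta


freqs = [12, 18, 12, 6, 2]          # hypothetical counts of the values 0, 1, 2, 3, 4
theta_hat = min_chi2_poisson(freqs)
print(theta_hat, chi2_statistic(freqs, theta_hat))
```

The first pass of the loop reproduces $\theta_1$ above; later passes are the repetitions the text recommends.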
Example 5.25 Show that, for large samples, maximizing the likelihood function is equivalent to minimizing the $\chi^2$ statistic.
Let $o_j$ be the observed frequency and $e_j$ the theoretical frequency of the $j$th class, $j = 1, 2, \cdots, r$. Then $\chi^2 = \sum_j \frac{(o_j - e_j)^2}{e_j}$. For a large fixed sample size $n$, the distribution of the quantities $o_j$, $j = 1, 2, \cdots, r$, is given by the multinomial likelihood function
$$L = \frac{n!}{o_1!\,o_2!\cdots o_r!}\left(\frac{e_1}{n}\right)^{o_1}\left(\frac{e_2}{n}\right)^{o_2}\cdots\left(\frac{e_r}{n}\right)^{o_r}, \quad o_1 + o_2 + \cdots + o_r = n,$$
$$= \frac{n!}{o_1!\,o_2!\cdots o_r!}\left(\frac{e_1}{o_1}\right)^{o_1}\left(\frac{e_2}{o_2}\right)^{o_2}\cdots\left(\frac{e_r}{o_r}\right)^{o_r}\left(\frac{o_1}{n}\right)^{o_1}\cdots\left(\frac{o_r}{n}\right)^{o_r},$$
so that
$$\log L = \text{constant} + \sum_{j=1}^r o_j\log\frac{e_j}{o_j},$$
where the constant does not involve the parameters. For a large fixed sample size write $e_j = o_j + a_jn^{1-\delta}$, $\delta > 0$; in particular, for $\delta = \frac{1}{2}$, $e_j = o_j + a_jn^{1/2}$, where $a_j$ is bounded in probability, so that $e_j - o_j$ is of order $n^{1/2}$ while $o_j$ is of order $n$. Since $\sum_j o_j = \sum_j e_j = n$, $\sum_j a_j = 0$. Then
$$\log L = \text{constant} + \sum_{j=1}^r o_j\log\left(1 + \frac{a_jn^{1/2}}{o_j}\right) = \text{constant} + \sum_j o_j\left[\frac{a_jn^{1/2}}{o_j} - \frac{a_j^2n}{2o_j^2} + \cdots\right]$$
$$= \text{constant} + n^{1/2}\sum_j a_j - \frac{1}{2}\sum_j \frac{a_j^2n}{o_j} + O(n^{-1/2}) = \text{constant} - \frac{1}{2}\sum_j \frac{(e_j - o_j)^2}{o_j} + O(n^{-1/2}).$$
If the modified $\chi^2$ statistic is defined as $\chi^2_{mod} = \sum_j \frac{(e_j - o_j)^2}{o_j}$, then
$$\log L = \text{constant} - \frac{1}{2}\chi^2_{mod} \quad\text{as } n \to \infty.$$
To prove that $\chi^2 - \chi^2_{mod} \to 0$ as $n \to \infty$, consider
$$\chi^2 - \chi^2_{mod} = \sum_j \frac{(o_j - e_j)^2}{e_j} - \sum_j \frac{(e_j - o_j)^2}{o_j} = \sum_j \frac{(e_j - o_j)^2}{o_j}\left[\frac{o_j}{e_j} - 1\right] = \sum_j \frac{(e_j - o_j)^2}{o_j}\left[\left(1 + \frac{a_jn^{1/2}}{o_j}\right)^{-1} - 1\right]$$
$$= \sum_j \frac{(e_j - o_j)^2}{o_j}\left[-\frac{a_jn^{1/2}}{o_j} + \frac{a_j^2n}{o_j^2} - \cdots\right] = -\sum_j \frac{a_j^3n^{3/2}}{o_j^2} + \sum_j \frac{a_j^4n^2}{o_j^3} - \cdots,$$
using $e_j - o_j = a_jn^{1/2}$. Since $o_j$ is of order $n$, $\frac{a_j^3n^{3/2}}{o_j^2} = O(n^{-1/2})$ and $\frac{a_j^4n^2}{o_j^3} = O(n^{-1})$, so
$$\chi^2 - \chi^2_{mod} = O(n^{-1/2}) \to 0 \quad\text{as } n \to \infty.$$
Thus
$$\log L = \text{constant} - \frac{1}{2}\chi^2 \quad\text{as } n \to \infty,$$
$$\max \log L = \text{constant} - \frac{1}{2}\min \chi^2 \quad\text{as } n \to \infty.$$
Hence, for large samples, maximizing the likelihood function is equivalent to minimizing the $\chi^2$ statistic.

5.8 Method of Least Square Estimation


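Consider the linear model $Y = X\theta + \epsilon$, where $\epsilon$ is a non-observable random vector such that $E[\epsilon_i] = 0$ and $V[\epsilon_i] = \sigma^2$ for all $i$, with uncorrelated components, i.e., $E[\epsilon\epsilon'] = \sigma^2I$; $Y$ is an observed vector and $\theta$ is the parameter vector to be estimated:
$$Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}_{n\times 1}, \quad \theta = \begin{pmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_m \end{pmatrix}_{m\times 1}, \quad \epsilon = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}_{n\times 1},$$
and $X$ is the coefficient matrix of the parameter $\theta$,
$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{pmatrix}_{n\times m}.$$
Definition 5.1 An estimate $\hat\theta(x)$ of $\theta$ which minimizes $(Y - X\theta)'(Y - X\theta)$ is called the Least Square Estimator (LSE) of $\theta$. Here $\epsilon = Y - X\theta$ and $\epsilon' = (Y - X\theta)'$.
Define $S = \epsilon'\epsilon = (Y - X\theta)'(Y - X\theta)$. The necessary condition for minimization of $S$ is $\frac{dS}{d\theta} = 0$, and the sufficient condition is that $\frac{d^2S}{d\theta\,d\theta'} = 2X'X$ is positive definite at $\theta = \hat\theta(x)$.
$$\frac{dS}{d\theta} = -2X'(Y - X\theta) = 0 \;\Rightarrow\; X'Y - X'X\theta = 0 \;\Rightarrow\; X'X\theta = X'Y,$$
$$\hat\theta(x) = (X'X)^{-1}X'Y, \quad\text{provided } (X'X)^{-1} \text{ exists}.$$
The LSE of $\theta$ is unbiased, i.e., $E_\theta[\hat\theta(X)] = \theta$:
$$\hat\theta(x) = (X'X)^{-1}X'Y = (X'X)^{-1}X'[X\theta + \epsilon] = (X'X)^{-1}X'X\theta + (X'X)^{-1}X'\epsilon,$$
$$E_\theta[\hat\theta(X)] = \theta + (X'X)^{-1}X'E_\theta[\epsilon] = \theta, \quad\text{since } E_\theta[\epsilon] = 0.$$
To find the variance of the LSE, consider
$$\hat\theta(X) - \theta = (X'X)^{-1}X'[X\theta + \epsilon] - \theta = \theta + (X'X)^{-1}X'\epsilon - \theta = (X'X)^{-1}X'\epsilon,$$
$$V_\theta[\hat\theta(X)] = E_\theta\{[\hat\theta(X) - \theta][\hat\theta(X) - \theta]'\} = E_\theta\{(X'X)^{-1}X'\epsilon\epsilon'X(X'X)^{-1}\} = (X'X)^{-1}X'E_\theta[\epsilon\epsilon']X(X'X)^{-1}$$
$$= \sigma^2(X'X)^{-1}(X'X)(X'X)^{-1} = \sigma^2(X'X)^{-1},$$
since $E[\epsilon_i] = 0$, $V[\epsilon_i] = \sigma^2$ and $E[\epsilon\epsilon'] = \sigma^2I$.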


Linear Estimation

Let $Y_1, Y_2, \cdots, Y_n$ be $n$ independent random variables with the same variance $\sigma^2$ and $E[Y] = X\theta$. A linear function $b'\theta = b_1\theta_1 + b_2\theta_2 + \cdots + b_m\theta_m$ of $\theta$ is unbiasedly estimable if there exists a linear function $c'Y = c_1y_1 + c_2y_2 + \cdots + c_ny_n$ such that $E_\theta[c'Y] = b'\theta$ for all $\theta$.
Theorem 5.7 A necessary and sufficient condition for the estimability of $b'\theta$ is
$$\rho(X') = \rho(X', b).$$
Proof: Let $b'\theta$ be estimable, i.e., $E_\theta[c'Y] = b'\theta$ for all $\theta$. Then $c'X\theta = b'\theta$ for all $\theta$, i.e., $c'X = b'$, so $X'c = b$ is solvable, i.e., $\rho(X') = \rho(X', b)$.
Conversely, suppose $\rho(X') = \rho(X', b)$. Then the equation $X'c = b$ is consistent, hence solvable, i.e., $c'X = b'$, and
$$c'X\theta = b'\theta \;\Rightarrow\; c'E_\theta[Y] = b'\theta \;\Rightarrow\; E_\theta[c'Y] = b'\theta,$$
so $b'\theta$ is estimable.
Remark 5.5 $\rho(X') = \rho(X', b) \Leftrightarrow \rho(X'X) = \rho(X'X, b)$.
Best Linear Unbiased Estimator
Definition 5.2 The unbiased linear estimate of an estimable linear parametric function $b'\theta = b_1\theta_1 + b_2\theta_2 + \cdots + b_m\theta_m$ with minimum variance is called the best linear unbiased estimator (BLUE).

5.9 Gauss Markoff Theorem


Theorem 5.8 Let Y1 , Y2 , · · · , Yn be n independent random variables with variance
σ 2 and Eθ [Y ] = Xθ , then every estimable parametric function b0 θ possesses an
unique minimum variance unbiased estimator which is a function of θ̂(X) , the LSE of
θ . Further, E[Y − X θ̂ ]0 [Y − X θ̂ ] = (n − r)σ 2 .
Proof: b0 θ is estimable if there exist c0 Y such that Eθ [c0 Y ] = b0 θ

c0 Xθ = b0 θ ⇒ X 0 c = b (5.11)
0 0 2
and V [c Y ] = c cσ (5.12)
Minimize equation (5.12) subject to equation ( 5.11)
Using the method of Lagrange multipliers , one determines the stationary points by
considering

L(λ) = c0 c − 2λ0 (X 0 c − b)
where λ is a vector of Lagrange multiplier
dL(λ)
= 2c0 − 2λ0 X 0
dc

186
Probability Models and their Parametric Estimation

The stationary points of the function L(λ) are given by the equation

dL(λ)
= 0
dc
⇒ c0 − λ0 X 0 = 0
⇒ c0 = λ0 X 0 i.e., c = Xλ
X 0 Xλ = b (5.13)
0
Since b θ is estimable, equation (5.11) is solvable, i.e.,
ρ(X 0 ) = ρ(X 0 , b) ↔ ρ(X 0 X) = ρ(X 0 X, b).
Thus equation (5.13) is solvable. Let c(1) and c(2) be two solutions for to λ(1) and
λ(2) of equation (5.13).
c(1) = Xλ(1)
c(2) = Xλ(2)
0 (1)
X Xλ = b
0 (2)
X Xλ = b
X 0 X(λ(1) − λ(2) ) = 0
c(1) − c(2) = X(λ(1) − λ(2) )
 0    0  
c(1) − c(2) c(1) − c(2) = λ(1) − λ(2) X 0 X λ(1) − λ(2)
 0  
⇒ c(1) − c(2) c(1) − c(2) = 0
(1)
⇒c = c(2)
Thus, whatever be the solution of λ of the equation (5.13) the values of c0 are the
same. Hence b0 θ possesses an unique minimum variance unbiased estimator.
Suppose that ρ(X) = r andthe first r columns of X are linearly independent.
b1
Let X = [X1 X2 ] , b = . Now the solution of the equation (5.13) is λ =
b2
−1
(X10 X1 ) b1
.˙. c = Xλ
−1
= X1 (X10 X1 ) b1
−1 −1
c0 c = b01 (X10 X1 ) X10 X1 (X10 X1 ) b1
−1
= b01 0
(X X1 ) b1
For every c satisfying X 0 c = b
c0 c = c0 [I − X1 (X10 X1 )−1 X10 ]c + c0 X1 (X10 X1 )−1 X10 ]c
= c0 [I − X1 (X10 X1 )−1 X10 ]c + b01 (X10 X1 )−1 (X10 X1 )(X10 X1 )−1 (X10 X1 )](X10 X1 )−1 b1
= c0 [I − X1 (X10 X1 )−1 X10 ]c + b01 (X10 X1 )−1 b1
Since [I − X1 (X10 X1 )−1 X10 ] is an idempotent matrix
= c0 [I − X1 (X10 X1 )−1 X10 ][I − X1 (X10 X1 )−1 X10 ]c + b01 (X10 X1 )−1 b1
≥ b01 (X10 X1 )−1 b1

187
A. Santhakumaran

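This shows that the minimum is actually attained. The LSE $\hat\theta(X)$ of $\theta$ is obtained by minimizing $(Y - X\theta)'(Y - X\theta)$. The normal equations are $X'X\hat\theta = X'Y$, so that

$$c'Y = \lambda'X'Y = \lambda'X'X\hat\theta = b'\hat\theta(X), \quad \text{since } b' = \lambda'X'X.$$

Hence the best linear unbiased estimator of $b'\theta$ is $c'Y = b'\hat\theta(X)$.

The matrix $I - X_1(X_1'X_1)^{-1}X_1'$ is a projection matrix and hence idempotent. It is a well known property that for an idempotent matrix $A$, $\rho(A) = \mathrm{Tr}(A)$. Since $X_1$ consists of the $r$ linearly independent columns of $X$,

$$\rho\left[I - X_1(X_1'X_1)^{-1}X_1'\right] = \mathrm{Tr}\left[I - X_1(X_1'X_1)^{-1}X_1'\right] = n - \mathrm{Tr}\left[(X_1'X_1)^{-1}X_1'X_1\right] = n - r.$$

Moreover $Y - X\hat\theta = \left[I - X_1(X_1'X_1)^{-1}X_1'\right]Y = \left[I - X_1(X_1'X_1)^{-1}X_1'\right]\epsilon$, where $\epsilon = Y - X\theta$ has mean $0$ and dispersion matrix $\sigma^2 I$. Therefore

$$E_\theta\left[(Y - X\hat\theta)'(Y - X\hat\theta)\right] = E_\theta\left[\epsilon'\left(I - X_1(X_1'X_1)^{-1}X_1'\right)\epsilon\right] = \sigma^2\,\mathrm{Tr}\left[I - X_1(X_1'X_1)^{-1}X_1'\right] = (n - r)\sigma^2.$$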
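The facts derived above can be checked numerically. The sketch below assumes NumPy. The design matrix, the vector $b$ and the observations $Y$ are made-up illustrative values, not data from the text. It forms $c = X_1(X_1'X_1)^{-1}b_1$ and confirms four things: $X'c = b$; $c'Y = b'\hat\theta(X)$ for a least squares solution $\hat\theta$; $c'c = b_1'(X_1'X_1)^{-1}b_1$; and $\mathrm{Tr}[I - X_1(X_1'X_1)^{-1}X_1'] = n - r$.

```python
import numpy as np

# Illustrative rank-deficient design (column 3 = column 1 + column 2), so r = 2.
X = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 2.0],
              [1.0, 2.0, 3.0],
              [1.0, 3.0, 4.0],
              [1.0, 5.0, 6.0]])
n = X.shape[0]
r = np.linalg.matrix_rank(X)
X1 = X[:, :r]                      # the first r columns are linearly independent

# An estimable b: it lies in the row space of X because b = X'c0 for some c0.
b = X.T @ np.array([1.0, 0.0, 0.0, 0.0, 1.0])
b1 = b[:r]

G = np.linalg.inv(X1.T @ X1)
c = X1 @ G @ b1                    # BLUE coefficients from the Lagrange solution

Y = np.array([3.1, 4.0, 6.2, 7.9, 11.8])            # arbitrary observations
theta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]    # one solution of the normal equations

M = np.eye(n) - X1 @ G @ X1.T      # I - X1 (X1'X1)^{-1} X1'

print(np.allclose(X.T @ c, b))     # unbiasedness condition X'c = b
print(c @ Y, b @ theta_hat)        # c'Y equals b' theta_hat
print(c @ c, b1 @ G @ b1)          # the minimum value of c'c
print(np.trace(M), n - r)          # trace of the residual projector is n - r
```

Because $b'\theta$ is estimable, $b'\hat\theta$ is the same for every solution of the normal equations. So the minimum-norm solution returned by `lstsq` is as good as any other.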
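Example 5.26 Let $E[Y_1] = \theta_1 + \theta_3$, $E[Y_2] = \theta_2 + \theta_3$ and $E[Y_3] = \theta_1 + \theta_3$. Show that $l_1\theta_1 + l_2\theta_2 + l_3\theta_3$ is estimable if $l_1 + l_2 = l_3$.
Here $E[Y] = X\theta$ with

$$X = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}, \quad X' = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix}, \quad l = \begin{pmatrix} l_1 \\ l_2 \\ l_3 \end{pmatrix}, \quad \theta = \begin{pmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \end{pmatrix}.$$

$l'\theta$ is estimable if and only if the system $X'c = l$ is consistent, i.e., $\rho(X') = \rho(X', l)$. Reduce the augmented matrix $(X', l)$ by the row operation $R_3 - R_1$ and then $R_3 - R_2$:

$$\left(\begin{array}{ccc|c} 1 & 0 & 1 & l_1 \\ 0 & 1 & 0 & l_2 \\ 1 & 1 & 1 & l_3 \end{array}\right) \to \left(\begin{array}{ccc|c} 1 & 0 & 1 & l_1 \\ 0 & 1 & 0 & l_2 \\ 0 & 1 & 0 & l_3 - l_1 \end{array}\right) \to \left(\begin{array}{ccc|c} 1 & 0 & 1 & l_1 \\ 0 & 1 & 0 & l_2 \\ 0 & 0 & 0 & l_3 - l_1 - l_2 \end{array}\right).$$

Hence $\rho(X') = \rho(X', l)$ if and only if $l_3 - l_1 - l_2 = 0$, i.e., $l_3 = l_1 + l_2$.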
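The rank condition $\rho(X') = \rho(X', l)$ is easy to check by machine. The following minimal sketch assumes NumPy, uses a hypothetical helper `is_estimable`, and takes the design matrix of Example 5.26:

```python
import numpy as np

# Design matrix of Example 5.26 (row i holds the coefficients of E[Y_i]).
X = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 0, 1]], dtype=float)

def is_estimable(l, X):
    """l'theta is estimable iff rank(X') equals rank of the augmented matrix [X' | l]."""
    Xt = X.T
    augmented = np.column_stack([Xt, l])
    return np.linalg.matrix_rank(Xt) == np.linalg.matrix_rank(augmented)

print(is_estimable(np.array([1.0, 2.0, 3.0]), X))  # l3 = l1 + l2: estimable
print(is_estimable(np.array([1.0, 0.0, 0.0]), X))  # theta_1 alone: not estimable
```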
Example 5.27 The feed intake $Y$ of a cow with weight $X_1$ and milk yield $X_2$ may be described by the linear model $Y = a + b_1X_1 + b_2X_2 + \epsilon$, where $\epsilon$ is called the random error or random residual. Let $y_i$, $x_{i1}$ and $x_{i2}$ be the values of $Y$, $X_1$ and $X_2$ for cow $i = 1, 2, 3, 4, 5$. The following observations are made on 5 cows:

i    Y    X1   X2
1    62   2    6
2    60   9    10
3    57   6    4
4    48   3    13
5    23   5    2

The estimate $\hat\theta(X) = (\hat a, \hat b_1, \hat b_2)'$ is calculated from $\hat\theta(X) = (X'X)^{-1}X'Y$, where

$$Y = \begin{pmatrix} 62 \\ 60 \\ 57 \\ 48 \\ 23 \end{pmatrix}_{5\times 1}, \quad X = \begin{pmatrix} 1 & 2 & 6 \\ 1 & 9 & 10 \\ 1 & 6 & 4 \\ 1 & 3 & 13 \\ 1 & 5 & 2 \end{pmatrix}_{5\times 3}, \quad X'Y = \begin{pmatrix} 250 \\ 1265 \\ 1870 \end{pmatrix}_{3\times 1},$$

$$X'X = \begin{pmatrix} 5 & 25 & 35 \\ 25 & 155 & 175 \\ 35 & 175 & 325 \end{pmatrix}, \quad (X'X)^{-1} = \frac{1}{480}\begin{pmatrix} 790 & -80 & -42 \\ -80 & 16 & 0 \\ -42 & 0 & 6 \end{pmatrix},$$

$$\hat\theta(X) = (X'X)^{-1}X'Y = \frac{1}{480}\begin{pmatrix} 790 & -80 & -42 \\ -80 & 16 & 0 \\ -42 & 0 & 6 \end{pmatrix}\begin{pmatrix} 250 \\ 1265 \\ 1870 \end{pmatrix} = \begin{pmatrix} 37 \\ 1/2 \\ 3/2 \end{pmatrix} = \begin{pmatrix} \hat a \\ \hat b_1 \\ \hat b_2 \end{pmatrix}.$$

The estimated linear model is $\hat Y = 37 + \frac{1}{2}X_1 + \frac{3}{2}X_2$.
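The arithmetic of Example 5.27 can be reproduced with NumPy, which this sketch assumes is available. It solves the normal equations $X'X\hat\theta = X'Y$ with `np.linalg.solve` instead of forming $(X'X)^{-1}$ explicitly. Both routes give the same estimates here, and solving is numerically preferable in general.

```python
import numpy as np

# Data of Example 5.27: feed intake Y, weight X1, milk yield X2 for 5 cows.
Y  = np.array([62, 60, 57, 48, 23], dtype=float)
X1 = np.array([2, 9, 6, 3, 5], dtype=float)
X2 = np.array([6, 10, 4, 13, 2], dtype=float)

X = np.column_stack([np.ones_like(X1), X1, X2])   # design matrix with intercept column

XtX = X.T @ X
XtY = X.T @ Y
theta_hat = np.linalg.solve(XtX, XtY)             # solves X'X theta = X'Y

print(XtX)
print(XtY)
print(theta_hat)                                  # estimates of a, b1, b2
```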
Problems
5.1 Define LSE. Show that under certain assumptions to be stated, the LSE’s are
minimum variance unbiased estimators.
5.2 Let $y_i = \beta x_i + \epsilon_i$, $i = 1, 2, 3, \cdots, n$, where $\epsilon_1, \epsilon_2, \cdots, \epsilon_n$ are uncorrelated random variables with mean 0 and variance $\sigma^2$. Find the LSE of $\beta$. Show that the LSE of $\beta$ is unbiased. Find the variance of the LSE of $\beta$. Also show that the LSE of $\beta$ is the best linear unbiased estimator of $\beta$.
5.3 Examine the sufficiency and unbiasedness of the MLE.
5.4 Independent random samples of sizes n1 , n2 , and n3 are available from three
normal populations with mean α+β +γ, α−β and β −γ respectively and with
a common variance σ 2 . Find the MLE of α, β, γ and σ 2 . Are they UMVUE’s?
5.5 Give the conditions for which
(a) the likelihood equation has a consistent estimator with probability approach-
ing one as n → ∞ .
(b) the consistent estimator of the likelihood equation is asymptotically normal.
5.6 Explain the principle of maximum likelihood estimation of a parameter $\theta$ of $p(x \mid \theta)$. Obtain the MLEs of the parameters of $N(\theta, \sigma^2)$. Also examine them for unbiasedness.
5.7 Show that, under regularity conditions to be stated, the MLE is asymptotically normally distributed.
5.8 Let $X_1, X_2, \cdots, X_n$ be a random sample drawn from a population with mean $\theta$ and finite variance. Let $T = t(X)$ be an unbiased estimator of $\theta$ with minimum variance, and let $T' = t'(X)$ be any other unbiased estimator of $\theta$. Show that $\mathrm{Cov}_\theta[T, T'] = V[T]$.


5.9 Derive the formula to calculate the MLE of $\theta$, using a random sample from the distribution with $P_\theta\{X = x\} = \frac{a_x\theta^x}{g(\theta)}$, $x = 1, 2, \cdots$, where $g(\theta) = \sum a_x\theta^x$. Also obtain the explicit expression for the case of the truncated Poisson distribution with $x = 1, 2, 3, \cdots$.
5.10 Show that MLE of θ based on n independent observations from a uniform
distribution in (0, θ) is consistent.
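5.11 Find the MLE of $\theta$ given the observations 0.8 and 0.3 on a random variable with pdf

$$p_\theta(x) = \begin{cases} \frac{2x}{\theta} & 0 < x < \theta \\ \frac{2(1-x)}{1-\theta} & \theta \le x < 1 \end{cases}, \quad 0 < \theta < 1.$$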
5.12 Let $X_1, X_2, \cdots, X_n$ be $n$ independent random observations from $N(0, \theta)$. Find the MLE of $\theta$.
5.13 Given a random sample from $N(\theta, 1)$, $\theta = 0, \pm 1, \pm 2, \cdots$, find the MLE of $\theta$.
5.14 Explain the method of scoring to obtain the MLE.
5.15 Obtain the MLE of $\theta$ based on random samples of sizes $n$ and $m$ from populations with respective frequency functions $\frac{1}{\theta}e^{-x/\theta}$ and $\theta e^{-x\theta}$, $x > 0$, $\theta > 0$.
5.16 What is MVBE? Obtain sufficient conditions for an estimator to be MVBE.
5.17 Give an account of estimation by the method of (i) Moments (ii) Minimum χ2 ,
giving one illustration in each case.
5.19 Examine the truth of the following statements
(i) MLE is unique
(ii) MLE is unbiased
(iii) If a sufficient statistic $T = t(X)$ exists for the parameter $\theta$, then the MLE is a function of $T$.
5.20 Show that under certain conditions to be stated MLE is consistent.
5.21 Examine whether MLE always exists.
5.22 Obtain the general form of distribution admitting MVBE’s.
5.23 A random sample of size $n$ is available from $p_\theta(x) = \theta x^{\theta-1}$, $0 < x < 1$, $\theta > 0$. Find the function of $\theta$ for which an MVBE exists. Also find the MVBE of this function and its variance.
5.24 Derive the MVUE of $\theta^2$ in $p_\theta(x) = \frac{e^{-\theta}\theta^x}{x!}$, $x = 0, 1, \cdots$, by taking a sample of size $n$, and show that it is not an MVBE of $\theta^2$.
5.25 Describe the minimum $\chi^2$ method of estimation. Show that, under certain conditions to be stated, the methods of minimum $\chi^2$ and maximum likelihood give equally efficient estimators.


5.26 Show that MVBE’s exist for the exponential family of densities.

5.27 Find MLE of β in Gamma(1, β ) based on a sample of size n where the actual
observations are not available but it is known that k of the observations are less
than or equal to a fixed positive number M .
5.28 Obtain the BLUE of θ for the normal distribution with mean θ and variance
σ 2 based on n observations x1 , x2 , · · · , xn .
5.29 Obtain the MLE for the coefficient of variation from a population with N (θ, σ 2 )
based on n observations.
5.30 Obtain the MLE of $\theta$ for the pdf

$$p_\theta(x) = \begin{cases} (1+\theta)x^\theta & 0 < x < 1,\ \theta > 0 \\ 0 & \text{otherwise} \end{cases}$$

based on an independent sample of size $n$.


5.31 Obtain the MLE of $\theta$ using a random sample of size $n$ from

$$p_\theta(x) = \begin{cases} \frac{1}{2\theta} & -\theta < x < \theta \\ 0 & \text{otherwise.} \end{cases}$$

5.32 Show that maximum likelihood estimation χ2 statistic and Minimum χ2 statis-
tic give the same results as n → ∞ .
5.33 Find the MLE of $N$ for

$$p_N(x) = \begin{cases} \frac{1}{N} & x = 1, 2, \cdots, N,\ N \in I^+ \\ 0 & \text{otherwise} \end{cases}$$

based on a random sample of size $n$.


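5.34 Suppose $X_1, X_2, \cdots, X_n$ are iid observations from the density

$$p_\theta(x) = \begin{cases} \frac{2x}{\theta^2}\exp\left\{-\frac{x^2}{\theta^2}\right\} & x > 0,\ \theta > 0 \\ 0 & \text{otherwise.} \end{cases}$$

Obtain the MLE of $\theta$.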


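5.35 If the random variable $X$ takes the value 0 or 1 with probability $1 - p$ and $p$ respectively, and $p \in [0.1, 0.9]$, then the maximum likelihood estimate of $p$ on the basis of a single observation $x$ would be
(a) $\frac{8x+1}{10}$ (b) $x$ (c) $\frac{9-8x}{10}$ (d) $x^2$
Hint: $L(p) = p^x(1-p)^{1-x}$ equals $1 - p$ if $x = 0$ and $p$ if $x = 1$. On $[0.1, 0.9]$ it is maximized at $\hat p = 0.1$ if $x = 0$ and at $\hat p = 0.9$ if $x = 1$.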


5.36 The maximum likelihood estimator of $\sigma^2$ in a normal population with mean zero is
(a) $\frac{1}{n}\sum(x_i - \bar x)^2$
(b) $\frac{1}{n-1}\sum(x_i - \bar x)^2$
(c) $\frac{1}{n}\sum x_i^2$
(d) $\frac{1}{n-1}\sum x_i^2$
5.37 Consider the following statements:
The maximum likelihood estimators
1. are consistent
2. have invariance property
3. can be made unbiased using an adjustment factor even if they are biased. Of
these statements:
(a) 1 and 3 are correct
(b) 1 and 2 are correct
(c) 2 and 3 are correct
(d) 1, 2 and 3 are correct
5.38 Which of the following statements are not correct?
1. From the Cramer - Rao inequality one can always find the lower bound of the
variance of an unbiased estimator.
2. If a sufficient statistic exists, then the maximum likelihood estimator is itself a sufficient statistic.
3. UMVUE and MVBE’s are same.
4. MLE’s may not be unique
Select the correct answer given below:
(a) 1 and 3 (b) 1 and 2 (c) 1 and 4 (d) 2 and 3
5.39 Which one of the following is not necessary for the UMVU estimation of θ by
T = t(X) ?
(a) E[T − θ] = 0
(b) E[T − θ]2 < ∞
(c) E[T − θ]2 is minimum
(d) T = t(X) is a linear function of observations
5.40 Consider the following statements:
If X1 , X2 , · · · , Xn are iid random variables with uniform distribution over
(0, θ) , then
1. $2\bar X$ is an unbiased estimator of $\theta$.
2. The largest among $X_1, X_2, \cdots, X_n$ is an unbiased estimator of $\theta$.
3. The largest among $X_1, X_2, \cdots, X_n$ is sufficient for $\theta$.
4. $\frac{n+1}{n}X_{(n)}$ is a minimum variance unbiased estimator of $\theta$.
Of these statements:
(a) 1 alone is correct
(b) 1 and 2 are correct
(c) 1, 3 and 4 are correct
(d) 1 and 4 are correct


5.41 LSE and MLE are the same if the sample comes from a population that is:
(a) Normal (b) Binomial (c) Cauchy ( d) Exponential
5.42 LSE of the parameters of a linear model are
(a) unbiased (b) BLUE (c) UMVU (d) all the above


6. INTERVAL ESTIMATION

6.1 Introduction
Let $X$ be a random sample drawn from a population with pdf $p_\theta(x)$, $\theta \in \Omega$. Each distinct value of $\theta \in \Omega$ corresponds to one member of the family of distributions, so one has a family of pdf's $\{p_\theta(x), \theta \in \Omega\}$. The experimenter needs to select a point estimate of $\theta$. Even though an estimator may have good statistical properties, a single estimate need not be close to the true value of the parameter, because of the randomness of the observations. Hence, as an alternative, one may look for an interval that contains the unknown parameter with a specified probability. This is interval estimation at a given confidence level, and this chapter deals with it.
Family of random sets
Let $\{P_\theta, \theta \in \Omega \subseteq \mathbb{R}^k\}$ be the set of probability distributions of the random variable $X$. A family of subsets $S(X)$ of $\Omega$ which depends on the observation $x$ of $X$ but not on $\theta$ is called a family of random sets.

6.2 Confidence Intervals


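The problem of interval estimation is that of finding a family of random sets $S(X)$ for the parameter $\theta$ such that, for a given $\alpha$, $0 < \alpha < 1$,

$$P_\theta\{S(X) \text{ contains } \theta\} \ge 1 - \alpha \quad \forall\,\theta \in \Omega.$$

Let $\theta \in \Omega \subseteq \mathbb{R}$ and $0 < \alpha < 1$. A function $\underline\theta(X)$ satisfying $P_\theta\{\underline\theta(X) \le \theta\} \ge 1 - \alpha$ for all $\theta$ is called a lower confidence bound of $\theta$ at confidence level $(1-\alpha)$. If the infimum of $P_\theta\{\underline\theta(X) \le \theta\}$ over all $\theta \in \Omega$ is $(1-\alpha)$, the quantity $(1-\alpha)$ is called the confidence coefficient.
A function $\bar\theta(X)$ satisfying $P_\theta\{\theta \le \bar\theta(X)\} \ge 1 - \alpha$ for all $\theta \in \Omega \subseteq \mathbb{R}$ is called an upper confidence bound of $\theta$ at confidence level $(1-\alpha)$.
If $S(x)$ is of the form $S(x) = (\underline\theta(x), \bar\theta(x))$, then it is called a confidence interval at confidence level $(1-\alpha)$, provided $P_\theta\{\underline\theta(X) \le \theta \le \bar\theta(X)\} \ge 1 - \alpha$ for all $\theta \in \Omega \subseteq \mathbb{R}$. The confidence coefficient $(1-\alpha)$ is associated with the random interval $(\underline\theta(X), \bar\theta(X))$.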
Let $X$ be a random variable with pdf $p_\theta(x)$, $\theta \in \Omega \subseteq \mathbb{R}$, and let $a, b \in \mathbb{R}$ be two given positive numbers such that $a < b$. Consider

$$\begin{aligned}
P_\theta\{a < X < b\} &= P_\theta\{a < X \text{ and } X < b\} = P_\theta\left\{1 < \frac{X}{a} \text{ and } X < b\right\} \\
&= P_\theta\left\{b < \frac{b}{a}X \text{ and } X < b\right\} = P_\theta\left\{X < b < \frac{b}{a}X\right\}.
\end{aligned}$$

The end points $X$ and $\frac{b}{a}X$ of this interval are functions of $X$. Hence $I(X) = \left(X, \frac{b}{a}X\right)$ is a random interval. It takes the value $\left(x, \frac{b}{a}x\right)$ when $X$ takes the value $x$, and it contains the fixed number $b$ with probability $P_\theta\{a < X < b\}$.


Let $\theta$ be an unknown parameter and let $(\underline\theta(X), \bar\theta(X))$ be a $(1-\alpha)$ level confidence interval for $\theta$. Suppose one desires confidence limits for $g(\theta)$, a strictly monotonic function of $\theta$. If $g(\theta)$ is monotonic increasing, the event $\{\underline\theta(X) < \theta < \bar\theta(X)\}$ is equivalent to the event $\{g(\underline\theta(X)) < g(\theta) < g(\bar\theta(X))\}$. Thus $(g(\underline\theta(X)), g(\bar\theta(X)))$ is a $(1-\alpha)$ level confidence interval for $g(\theta)$. If $g(\theta)$ is monotonic decreasing, then $(g(\bar\theta(X)), g(\underline\theta(X)))$ is a $(1-\alpha)$ level confidence interval for $g(\theta)$.
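Example 6.1 For a single observation $x$ of a random variable $X$ with density function

$$p_\theta(x) = \begin{cases} \frac{1}{\theta} & 0 < x < \theta,\ \theta > 0 \\ 0 & \text{otherwise,} \end{cases}$$

obtain the confidence coefficient of the random interval $(X, 10X)$ for $\theta$, $\theta \in \Omega$.
The confidence coefficient of the interval $(X, 10X)$ for $\theta$ is

$$P_\theta\{X < \theta < 10X\} = P_\theta\left\{1 < \frac{\theta}{X} < 10\right\} = P_\theta\left\{\frac{\theta}{10} < X < \theta\right\} = \int_{\theta/10}^{\theta}\frac{1}{\theta}\,dx = 0.9.$$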

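Example 6.2 Find the confidence coefficient of the confidence interval $\left(\frac{1}{19X}, \frac{19}{X}\right)$ for $\theta$ based on a single observation $x$ of a random variable $X$ with pdf

$$p_\theta(x) = \begin{cases} \frac{\theta}{(1+\theta x)^2} & 0 < x < \infty,\ \theta > 0 \\ 0 & \text{otherwise.} \end{cases}$$

The confidence coefficient of the interval $\left(\frac{1}{19X}, \frac{19}{X}\right)$ for $\theta$ is

$$\begin{aligned}
P_\theta\left\{\frac{1}{19X} < \theta < \frac{19}{X}\right\} &= P_\theta\left\{\frac{1}{19\theta} < X < \frac{19}{\theta}\right\} = \int_{1/(19\theta)}^{19/\theta}\frac{\theta}{(1+\theta x)^2}\,dx \\
&= \left[-\frac{1}{1+\theta x}\right]_{1/(19\theta)}^{19/\theta} = -\frac{1}{20} + \frac{19}{20} = 0.9.
\end{aligned}$$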


 
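Example 6.3 Compute the confidence coefficient of the interval $\left(\frac{X}{1+X}, \frac{2X}{1+2X}\right)$ for $\frac{\theta}{1+\theta}$, where $X$ has the pdf

$$p_\theta(x) = \begin{cases} \frac{1}{\theta} & 0 < x < \theta,\ \theta > 0 \\ 0 & \text{otherwise.} \end{cases}$$

The confidence coefficient of the interval is

$$\begin{aligned}
P_\theta\left\{\frac{X}{1+X} < \frac{\theta}{1+\theta} < \frac{2X}{1+2X}\right\} &= P_\theta\left\{\frac{1+2X}{2X} < \frac{1+\theta}{\theta} < \frac{1+X}{X}\right\} \\
&= P_\theta\left\{\frac{1}{2X} + 1 < \frac{1}{\theta} + 1 < \frac{1}{X} + 1\right\} = P_\theta\left\{\frac{1}{2X} < \frac{1}{\theta} < \frac{1}{X}\right\} \\
&= P_\theta\{X < \theta < 2X\} = P_\theta\left\{\frac{1}{2} < \frac{X}{\theta} < 1\right\} = P_\theta\left\{\frac{\theta}{2} < X < \theta\right\} \\
&= \int_{\theta/2}^{\theta}\frac{1}{\theta}\,dx = \frac{1}{\theta}\left(\theta - \frac{\theta}{2}\right) = 0.5.
\end{aligned}$$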
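Example 6.4 Let $T = t(X)$ be the maximum of two independent observations drawn from a population with uniform distribution over the interval $(0, \theta)$. Compute the confidence coefficient of the interval $(0, 2T)$.
Let $T = \max\{X_1, X_2\}$. The pdf of $T$ is

$$p_\theta(t) = \begin{cases} \frac{2}{\theta^2}\,t & 0 < t < \theta \\ 0 & \text{otherwise.} \end{cases}$$

The confidence coefficient of the interval $(0, 2T)$ is

$$\begin{aligned}
P_\theta\{0 < \theta < 2T\} &= P_\theta\left\{\frac{\theta}{2} < T < \infty\right\} = P_\theta\left\{\frac{\theta}{2} < T < \theta\right\} \\
&= \int_{\theta/2}^{\theta}\frac{2}{\theta^2}\,t\,dt = \frac{2}{\theta^2}\left[\frac{t^2}{2}\right]_{\theta/2}^{\theta} = 0.75.
\end{aligned}$$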
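The confidence coefficients of Examples 6.1 to 6.4 can be checked by simulation. The sketch below uses only Python's `random` module. The true value $\theta = 3$, the seed and the number of replications are arbitrary choices, and `coverage` and the `draw_*` functions are hypothetical helpers. Each estimate is the proportion of simulated observations whose interval covers the parameter. The four estimates should be close to 0.9, 0.9, 0.5 and 0.75.

```python
import random

def coverage(draw, covers, reps=100_000, seed=1):
    """Monte Carlo estimate of P{the random interval built from one draw contains theta}."""
    rng = random.Random(seed)
    return sum(covers(draw(rng)) for _ in range(reps)) / reps

theta = 3.0  # arbitrary true parameter; the exact coverages do not depend on it

def draw_uniform(rng):
    return rng.uniform(0, theta)                  # X ~ U(0, theta)

def draw_ex62(rng):
    u = rng.random()                              # u in [0, 1)
    return u / ((1 - u) * theta)                  # inverts F(x) = 1 - 1/(1 + theta*x)

def draw_max2(rng):
    return max(rng.uniform(0, theta), rng.uniform(0, theta))   # T = max of two U(0, theta)

results = {
    "6.1": coverage(draw_uniform, lambda x: x < theta < 10 * x),
    # 1/(19x) < theta < 19/x is the same event as 1/19 < theta*x < 19 for x > 0
    "6.2": coverage(draw_ex62, lambda x: 1 / 19 < theta * x < 19),
    "6.3": coverage(draw_uniform,
                    lambda x: x / (1 + x) < theta / (1 + theta) < 2 * x / (1 + 2 * x)),
    "6.4": coverage(draw_max2, lambda t: 0 < theta < 2 * t),
}
for name, value in results.items():
    print(name, round(value, 3))
```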


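Example 6.5 A single observation $x$ of $X$ is drawn from a population with pdf

$$p_\theta(x) = \begin{cases} \frac{2}{\theta^2}(\theta - x) & 0 < x < \theta,\ \theta > 0 \\ 0 & \text{otherwise.} \end{cases}$$

Find a $(1-\alpha)$ level confidence interval for $\theta$.
Consider $Y = X/\theta$. Its pdf is

$$p(y) = \begin{cases} 2(1-y) & 0 < y < 1 \\ 0 & \text{otherwise,} \end{cases}$$

which does not depend on $\theta$. Choose $\lambda_1$ and $\lambda_2$ such that $P_\theta\{\lambda_1 < Y < \lambda_2\} = 1 - \alpha$ with equal tails, i.e., $P_\theta\{Y \ge \lambda_2\} = \frac{\alpha}{2}$ and $P_\theta\{Y \le \lambda_1\} = \frac{\alpha}{2}$. Thus

$$\int_{\lambda_2}^{1}2(1-y)\,dy = (1-\lambda_2)^2 = \frac{\alpha}{2} \;\Rightarrow\; \lambda_2^2 - 2\lambda_2 + \left(1 - \frac{\alpha}{2}\right) = 0 \;\Rightarrow\; \lambda_2 = 1 - \sqrt{\alpha/2} = c_2,$$

$$\int_{0}^{\lambda_1}2(1-y)\,dy = 2\lambda_1 - \lambda_1^2 = \frac{\alpha}{2} \;\Rightarrow\; \lambda_1^2 - 2\lambda_1 + \frac{\alpha}{2} = 0 \;\Rightarrow\; \lambda_1 = 1 - \sqrt{1 - \alpha/2} = c_1,$$

taking in each case the root that lies in $(0, 1)$. The $(1-\alpha)$ level confidence interval for $\theta$ is then obtained from

$$P_\theta\{c_1 < Y < c_2\} = P_\theta\left\{c_1 < \frac{X}{\theta} < c_2\right\} = P_\theta\left\{\frac{X}{c_2} < \theta < \frac{X}{c_1}\right\} = 1 - \alpha,$$

so $\left(\frac{x}{c_2}, \frac{x}{c_1}\right)$ is the $(1-\alpha)$ level confidence interval for $\theta$.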
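Here is a short sketch of Example 6.5 that uses only the standard library. The values $\alpha = 0.10$ and $\theta = 5$ and the helper name `ex65_interval` are illustrative assumptions. The sketch computes $c_1$ and $c_2$, then estimates the coverage by simulating $X = \theta Y$, where $Y$ is drawn by inverting $F(y) = 2y - y^2$.

```python
import math
import random

def ex65_interval(x, alpha):
    """Equal-tail interval (x/c2, x/c1) of Example 6.5."""
    c1 = 1 - math.sqrt(1 - alpha / 2)
    c2 = 1 - math.sqrt(alpha / 2)
    return x / c2, x / c1

alpha, theta = 0.10, 5.0              # illustrative values, not from the text
rng = random.Random(7)
reps = 100_000
hits = 0
for _ in range(reps):
    u = rng.random()
    x = theta * (1 - math.sqrt(1 - u))    # inverse cdf of Y = X/theta, F(y) = 2y - y^2
    lo, hi = ex65_interval(x, alpha)
    hits += lo < theta < hi
print(ex65_interval(1.0, alpha), hits / reps)   # coverage should be near 0.90
```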
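Example 6.6 Obtain a $(1-\alpha)$ level confidence interval for $\theta$, using a random sample of size $n$ from a population with pdf

$$p_\theta(x) = \begin{cases} e^{-(x-\theta)} & x \ge \theta,\ \theta > 0 \\ 0 & \text{otherwise.} \end{cases}$$

Let $Y_1 = \min_{1\le i\le n}\{X_i\}$ be the first order statistic of the random sample $X_1, X_2, \cdots, X_n$. The pdf of $Y_1$ is

$$p_\theta(y_1) = \begin{cases} n e^{-n(y_1-\theta)} & \theta < y_1 < \infty \\ 0 & \text{otherwise.} \end{cases}$$

Denote $T = Y_1 - \theta$. Then the pdf of $T$ is

$$p(t) = \begin{cases} n e^{-nt} & 0 < t < \infty \\ 0 & \text{otherwise,} \end{cases}$$

which is free of $\theta$. The $(1-\alpha)$ level confidence interval for $\theta$ is obtained from

$$P_\theta\{\lambda_1 < T < \lambda_2\} = \int_{\lambda_1}^{\lambda_2}n e^{-nt}\,dt = e^{-n\lambda_1} - e^{-n\lambda_2} = 1 - \alpha.$$

This equation has infinitely many solutions. If one chooses $\lambda_1 = 0$, then $1 - e^{-n\lambda_2} = 1 - \alpha$, i.e., $e^{-n\lambda_2} = \alpha$, so $-n\lambda_2 = \log\alpha$ and $\lambda_2 = \frac{1}{n}\log\frac{1}{\alpha}$. Therefore

$$P_\theta\left\{0 < Y_1 - \theta < \frac{1}{n}\log\frac{1}{\alpha}\right\} = P_\theta\left\{Y_1 - \frac{1}{n}\log\frac{1}{\alpha} < \theta < Y_1\right\} = 1 - \alpha,$$

and $\left(y_1 - \frac{1}{n}\log\frac{1}{\alpha},\ y_1\right)$ is the $(1-\alpha)$ level confidence interval for $\theta$.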
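The interval of Example 6.6 can be checked the same way. This sketch assumes $\theta = 2$, $n = 8$ and $\alpha = 0.05$ purely for illustration, and `ex66_interval` is a hypothetical helper. It draws $X_i = \theta + E_i$ with $E_i$ standard exponential and reports the fraction of samples whose interval contains $\theta$; about 0.95 is expected.

```python
import math
import random

def ex66_interval(sample, alpha):
    """(y1 - log(1/alpha)/n, y1), where y1 is the smallest observation (Example 6.6)."""
    y1 = min(sample)
    n = len(sample)
    return y1 - math.log(1 / alpha) / n, y1

theta, n, alpha = 2.0, 8, 0.05        # illustrative values
rng = random.Random(11)
reps = 50_000
hits = 0
for _ in range(reps):
    sample = [theta + rng.expovariate(1.0) for _ in range(n)]   # X = theta + Exp(1)
    lo, hi = ex66_interval(sample, alpha)
    hits += lo < theta < hi
print(hits / reps)
```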




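Example 6.7 Given a sample of size $n$ from $U(0, \theta)$, show that the confidence interval for $\theta$ based on the sample range $R$ with confidence coefficient $(1-\alpha)$ and of the form $\left(R, \frac{R}{c}\right)$ has $c$ given as a root of the equation

$$c^{n-1}\left[n - (n-1)c\right] = \alpha.$$

Also give the case $n = 2$.

The pdf of the range $R$ of a sample of size $n$ is given by

$$p_\theta(R) = \begin{cases} n(n-1)\displaystyle\int_{-\infty}^{\infty}p(x \mid \theta)\,p(x+R \mid \theta)\left[\int_{x}^{x+R}p(u \mid \theta)\,du\right]^{n-2}dx & R > 0 \\ 0 & \text{otherwise.} \end{cases}$$

Here $p(x \mid \theta) = \frac{1}{\theta}$ for $0 < x < \theta$, and $p(x+R \mid \theta) = \frac{1}{\theta}$ for $0 < x + R < \theta$, i.e., $0 < x < \theta - R$. Hence

$$\begin{aligned}
p_\theta(R) &= n(n-1)\int_{0}^{\theta-R}\frac{1}{\theta}\cdot\frac{1}{\theta}\left[\int_{x}^{x+R}\frac{1}{\theta}\,du\right]^{n-2}dx = n(n-1)\int_{0}^{\theta-R}\frac{1}{\theta^2}\,\frac{R^{n-2}}{\theta^{n-2}}\,dx \\
&= \frac{n(n-1)}{\theta^n}R^{n-2}(\theta - R) = \frac{n(n-1)}{\theta}\left(\frac{R}{\theta}\right)^{n-2}\left(1 - \frac{R}{\theta}\right), \quad 0 < R < \theta.
\end{aligned}$$

If $Y = R/\theta$, then

$$p(y) = \begin{cases} n(n-1)y^{n-2}(1-y) & 0 < y < 1 \\ 0 & \text{otherwise.} \end{cases}$$

The $(1-\alpha)$ level confidence interval for $\theta$ is obtained from

$$P_\theta\left\{\lambda_1 < \frac{R}{\theta} < \lambda_2\right\} = P\{\lambda_1 < Y < \lambda_2\} = \int_{\lambda_1}^{\lambda_2}n(n-1)y^{n-2}(1-y)\,dy = 1 - \alpha,$$

$$n(n-1)\left[\frac{y^{n-1}}{n-1} - \frac{y^n}{n}\right]_{\lambda_1}^{\lambda_2} = n\lambda_2^{n-1} - (n-1)\lambda_2^n - n\lambda_1^{n-1} + (n-1)\lambda_1^n = 1 - \alpha.$$

This equation has infinitely many solutions. If one chooses $\lambda_1 = c$ and $\lambda_2 = 1$, then $1 - nc^{n-1} + (n-1)c^n = 1 - \alpha$, i.e., $c^{n-1}[n - (n-1)c] = \alpha$, and

$$P\{c < Y < 1\} = P\left\{c < \frac{R}{\theta} < 1\right\} = P\left\{R < \theta < \frac{R}{c}\right\} = 1 - \alpha.$$

Thus $\left(R, \frac{R}{c}\right)$ is the $(1-\alpha)$ level confidence interval for $\theta$, where $c$ is given by $c^{n-1}[n - (n-1)c] = \alpha$. For $n = 2$ the equation becomes $c^2 - 2c + \alpha = 0$, so $c = 1 - \sqrt{1-\alpha}$.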
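For general $n$ the equation $c^{n-1}[n - (n-1)c] = \alpha$ has no closed-form root. Its left side, however, is $P\{Y \le c\}$, which increases from 0 to 1 on $[0, 1]$, so bisection finds the root reliably. The minimal sketch below uses a hypothetical function name, `range_c`. For $n = 2$ it agrees with $c = 1 - \sqrt{1-\alpha}$.

```python
def range_c(n, alpha, tol=1e-12):
    """Root in (0, 1) of c^(n-1) * (n - (n-1) c) = alpha, found by bisection.

    The left side equals P{Y <= c} for Y = R/theta, which increases from 0 to 1 on [0, 1].
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid ** (n - 1) * (n - (n - 1) * mid) < alpha:
            lo = mid          # P{Y <= mid} too small: root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

for n in (2, 5, 10):
    print(n, range_c(n, 0.05))
```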

6.2 Alternative method of confidence intervals


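For large or small samples, Chebychev's inequality can be employed to find a confidence interval for a parameter $\theta$, $\theta \in \Omega$. For a random variable $X$ with $E_\theta[X] = \theta$ and $V_\theta[X] = \sigma^2$,

$$P_\theta\left\{|X - \theta| < \epsilon\sqrt{V_\theta[X]}\right\} \ge 1 - \frac{1}{\epsilon^2}, \quad \epsilon > 1.$$

If $\hat\theta(X)$ is an estimate of $\theta$ (not necessarily unbiased) with finite mean squared error, then by Chebychev's inequality

$$P_\theta\left\{|\hat\theta(X) - \theta| < \epsilon\sqrt{E_\theta[\hat\theta(X) - \theta]^2}\right\} \ge 1 - \frac{1}{\epsilon^2}$$

$$\Rightarrow\; \left(\hat\theta(x) - \epsilon\sqrt{E_\theta[\hat\theta(X) - \theta]^2},\ \hat\theta(x) + \epsilon\sqrt{E_\theta[\hat\theta(X) - \theta]^2}\right)$$

is a $\left(1 - \frac{1}{\epsilon^2}\right)$ level confidence interval for $\theta$, provided the mean squared error is known or can be bounded free of $\theta$.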
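Example 6.8 Let $X_1, X_2, \cdots, X_n$ be iid $b(1, \theta)$ random variables. Obtain a $(1-\alpha)$ level confidence interval for $\theta$ by using Chebychev's inequality.
Since each $X_i \sim b(1, \theta)$, $\sum_{i=1}^n X_i \sim b(n, \theta)$, $E_\theta[\bar X] = \theta$ and $V_\theta[\bar X] = \frac{V_\theta[X]}{n} = \frac{\theta(1-\theta)}{n}$. Now

$$P_\theta\left\{|\bar X - \theta| < \epsilon\sqrt{\frac{\theta(1-\theta)}{n}}\right\} \ge 1 - \frac{1}{\epsilon^2}.$$

Since $\theta(1-\theta) \le \frac{1}{4}$,

$$P_\theta\left\{|\bar X - \theta| < \frac{\epsilon}{2\sqrt n}\right\} \ge 1 - \frac{1}{\epsilon^2}, \quad\text{i.e.,}\quad P_\theta\left\{\bar X - \frac{\epsilon}{2\sqrt n} < \theta < \bar X + \frac{\epsilon}{2\sqrt n}\right\} \ge 1 - \frac{1}{\epsilon^2}.$$

For fixed $n$, choose $1 - \frac{1}{\epsilon^2} = 1 - \alpha$, i.e., $\epsilon^2 = \frac{1}{\alpha}$ and $\epsilon = \frac{1}{\sqrt\alpha}$. Thus the $(1-\alpha)$ level confidence interval for $\theta$ is

$$\left(\bar x - \frac{1}{2\sqrt{n\alpha}},\ \bar x + \frac{1}{2\sqrt{n\alpha}}\right).$$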
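The sketch below shows how conservative the Chebychev interval is by computing it and its simulated coverage. It uses only the standard library. The values $n = 50$, $\alpha = 0.10$ and $\theta = 0.3$ and the helper name are illustrative assumptions. Chebychev's inequality and the bound $\theta(1-\theta) \le 1/4$ are both crude, so the coverage is typically well above $1 - \alpha$.

```python
import math
import random

def chebychev_bernoulli_interval(xbar, n, alpha):
    """Conservative interval xbar -/+ 1/(2 sqrt(n alpha)) of Example 6.8."""
    half = 1 / (2 * math.sqrt(n * alpha))
    return xbar - half, xbar + half

n, alpha, theta = 50, 0.10, 0.3       # illustrative values
rng = random.Random(3)
reps = 20_000
hits = 0
for _ in range(reps):
    xbar = sum(rng.random() < theta for _ in range(n)) / n   # mean of n Bernoulli(theta)
    lo, hi = chebychev_bernoulli_interval(xbar, n, alpha)
    hits += lo < theta < hi
print(chebychev_bernoulli_interval(0.3, n, alpha), hits / reps)
```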

Example 6.9 Let $X$ be a Binomial random variable with parameters $n$ and $\theta$. Obtain a $(1-\alpha)$ level confidence interval for $\theta$.
For each $\theta$, one can find the largest integer $n_1(\theta)$ such that $P_\theta\{X \le n_1(\theta)\} \le \frac{\alpha}{2}$ and the smallest integer $n_2(\theta)$ such that $P_\theta\{X \ge n_2(\theta)\} \le \frac{\alpha}{2}$.
Because of the discreteness of the Binomial distribution, one cannot in general make these probabilities exactly equal to $\frac{\alpha}{2}$ for all $\theta$. Since the events $\{X \le n_1(\theta)\}$ and $\{X \ge n_2(\theta)\}$ are mutually exclusive,

$$P_\theta\{X \le n_1(\theta) \text{ or } X \ge n_2(\theta)\} \le \frac{\alpha}{2} + \frac{\alpha}{2} = \alpha, \quad\text{i.e.,}\quad P_\theta\{n_1(\theta) < X < n_2(\theta)\} \ge 1 - \alpha.$$

The two functions $n_1(\theta)$ and $n_2(\theta)$ are monotonic non-decreasing, discontinuous step functions, so that the $(1-\alpha)$ level confidence interval for $\theta$ is given by

$$P_\theta\{n_2^{-1}(X) < \theta < n_1^{-1}(X)\} \ge 1 - \alpha,$$

where $P_\theta\{X \le n_1(\theta)\} \le \frac{\alpha}{2}$, i.e., $\sum_{i=0}^{n_1(\theta)}\binom{n}{i}\theta^i(1-\theta)^{n-i} \le \frac{\alpha}{2}$.
If the observed value is $X = x$, then $n_1^{-1}(x)$ is the upper confidence limit for $\theta$ and $n_1(n_1^{-1}(x)) = x$. So the upper confidence limit is the solution $\theta$ of

$$\sum_{i=0}^{x}\binom{n}{i}\theta^i(1-\theta)^{n-i} = \frac{\alpha}{2}. \qquad (6.1)$$

Similarly, the lower confidence limit for $\theta$ is the solution of

$$\sum_{i=x}^{n}\binom{n}{i}\theta^i(1-\theta)^{n-i} = \frac{\alpha}{2}. \qquad (6.2)$$

Solving equations (6.1) and (6.2) for $\theta$ (when $n$ and $\alpha$ are known) gives the $(1-\alpha)$ level confidence interval $(\underline\theta(x), \bar\theta(x))$ for $\theta$. Here $\bar\theta(x)$ is the solution of equation (6.1) and $\underline\theta(x)$ is the solution of equation (6.2).
Example 6.10 Assume there is a constant probability $\theta$ that a person entering a supermarket will make a purchase, so that successive customers constitute a random sample of a Bernoulli random variable (success = purchase made, failure = no purchase made). If 10 persons were selected at random and 4 of them made a purchase, obtain a 90% confidence interval for $\theta$.
The 90% confidence limits for $\theta$ are the solutions of

$$\sum_{i=0}^{4}\binom{10}{i}\theta^i(1-\theta)^{10-i} = 0.05 \quad\text{and}\quad \sum_{i=4}^{10}\binom{10}{i}\theta^i(1-\theta)^{10-i} = 0.05.$$

Solving these equations for $\theta$ gives $\bar\theta(x) = 0.696$ and $\underline\theta(x) = 0.150$. Thus, if a random sample of 10 independent Bernoulli random variables gives $x = 4$ successes, the 90% confidence interval for $\theta$ is $(0.150, 0.696)$.
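Equations (6.1) and (6.2) define what is usually called the Clopper-Pearson interval. The left side of each is monotone in $\theta$, so plain bisection solves them. The minimal standard-library sketch below uses hypothetical helper names and handles $0 < x < n$ only. It reproduces the limits of Example 6.10:

```python
from math import comb

def binom_cdf(x, n, theta):
    """P{X <= x} for X ~ b(n, theta)."""
    return sum(comb(n, i) * theta**i * (1 - theta)**(n - i) for i in range(x + 1))

def bisect_decreasing(f, target, lo=0.0, hi=1.0, iters=100):
    """Solve f(theta) = target on [lo, hi] when f is decreasing in theta."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_binomial_interval(x, n, alpha):
    """Equal-tail limits from equations (6.1) and (6.2), for 0 < x < n."""
    # (6.1): P{X <= x} = alpha/2 gives the upper limit; this cdf decreases in theta.
    upper = bisect_decreasing(lambda t: binom_cdf(x, n, t), alpha / 2)
    # (6.2): P{X >= x} = alpha/2, i.e. P{X <= x-1} = 1 - alpha/2, gives the lower limit.
    lower = bisect_decreasing(lambda t: binom_cdf(x - 1, n, t), 1 - alpha / 2)
    return lower, upper

print(exact_binomial_interval(4, 10, 0.10))   # Example 6.10: about (0.150, 0.696)
```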
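Example 6.11 Let $X_1, X_2, \cdots, X_n$ be a random sample from a Poisson random variable $X$ with parameter $\theta$. Obtain a $(1-\alpha)$ level confidence interval for $\theta$.
Let $Y = \sum_{i=1}^n X_i$. Since each $X_i$ follows $P(\theta)$, $Y \sim P(n\theta)$. As in Example 6.9, the $(1-\alpha)$ level confidence interval for $\theta$ is obtained from

$$P_\theta\{\lambda_1(\theta) < Y < \lambda_2(\theta)\} \ge 1 - \alpha,$$

where $P_\theta\{Y \ge \lambda_2(\theta)\} \le \frac{\alpha}{2}$ and $P_\theta\{Y \le \lambda_1(\theta)\} \le \frac{\alpha}{2}$. For the observed value $Y = y$ this leads to the equations

$$\sum_{x=y}^{\infty}e^{-n\theta}\frac{(n\theta)^x}{x!} = \frac{\alpha}{2} \qquad (6.3)$$

$$\sum_{x=0}^{y}e^{-n\theta}\frac{(n\theta)^x}{x!} = \frac{\alpha}{2}. \qquad (6.4)$$

The $(1-\alpha)$ level confidence interval for $\theta$ is equivalent to

$$P_\theta\{\lambda_2^{-1}(Y) < \theta < \lambda_1^{-1}(Y)\} \ge 1 - \alpha.$$

Solving equations (6.3) and (6.4) for $\theta$ gives the $(1-\alpha)$ level confidence interval $(\underline\theta(x), \bar\theta(x))$ for $\theta$. Here $\underline\theta(x)$ is the solution of equation (6.3) and $\bar\theta(x)$ is the solution of equation (6.4).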
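Equations (6.3) and (6.4) can be solved the same way, because both Poisson tail probabilities are monotone in $\theta$. This sketch uses only the standard library. The helper names, the bracket for $n\theta$ and the sample values are illustrative assumptions. When $y = 0$ the lower limit is taken as 0, since (6.3) then has no solution.

```python
import math

def poisson_cdf(y, mu):
    """P{Y <= y} for Y ~ Poisson(mu)."""
    term, total = math.exp(-mu), 0.0
    for x in range(y + 1):
        total += term
        term *= mu / (x + 1)
    return total

def solve_decreasing(f, target, lo, hi, iters=200):
    """Bisection for f(t) = target when f is decreasing on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def poisson_interval(y, n, alpha):
    """Equal-tail limits for theta from (6.3) and (6.4); y is the sample total."""
    top = (y + 20 * math.sqrt(y + 1) + 20) / n     # generous upper bracket for theta
    # (6.4): P{Y <= y} = alpha/2 gives the upper limit (this cdf decreases in theta).
    upper = solve_decreasing(lambda t: poisson_cdf(y, n * t), alpha / 2, 0.0, top)
    if y == 0:
        return 0.0, upper                           # (6.3) has no solution when y = 0
    # (6.3): P{Y >= y} = alpha/2, i.e. P{Y <= y-1} = 1 - alpha/2, gives the lower limit.
    lower = solve_decreasing(lambda t: poisson_cdf(y - 1, n * t), 1 - alpha / 2, 0.0, top)
    return lower, upper

print(poisson_interval(0, 1, 0.05))
print(poisson_interval(12, 5, 0.05))
```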
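Example 6.12 Let $X_1, X_2, \cdots, X_n$ be a random sample of a Uniform random variable $X$ on $(0, \theta)$. Obtain a $(1-\alpha)$ level confidence interval for $\theta$.
Let $T = t(X) = \max_{1\le i\le n}\{X_i\}$. The pdf of $T$ is

$$p(t \mid \theta) = \begin{cases} \frac{n}{\theta^n}t^{n-1} & 0 < t < \theta \\ 0 & \text{otherwise.} \end{cases}$$

The $(1-\alpha)$ level confidence interval for $\theta$ is obtained from

$$P_\theta\{\lambda_1(\theta) < T < \lambda_2(\theta)\} = 1 - \alpha, \quad\text{with}\quad P_\theta\{T \le \lambda_1(\theta)\} = \frac{\alpha}{2} \;\text{ and }\; P_\theta\{T \ge \lambda_2(\theta)\} = \frac{\alpha}{2}.$$

Thus

$$P_\theta\{T \ge \lambda_2(\theta)\} = \int_{\lambda_2(\theta)}^{\theta}\frac{n}{\theta^n}t^{n-1}\,dt = 1 - \frac{[\lambda_2(\theta)]^n}{\theta^n} = \frac{\alpha}{2} \;\Rightarrow\; \lambda_2(\theta) = \theta\left(1 - \frac{\alpha}{2}\right)^{1/n}.$$

Similarly

$$P_\theta\{T \le \lambda_1(\theta)\} = \int_{0}^{\lambda_1(\theta)}\frac{n}{\theta^n}t^{n-1}\,dt = \frac{[\lambda_1(\theta)]^n}{\theta^n} = \frac{\alpha}{2} \;\Rightarrow\; \lambda_1(\theta) = \theta\left(\frac{\alpha}{2}\right)^{1/n}.$$

Thus

$$P_\theta\left\{\theta\left(\frac{\alpha}{2}\right)^{1/n} < T < \theta\left(1 - \frac{\alpha}{2}\right)^{1/n}\right\} = P_\theta\left\{\frac{T}{(1-\alpha/2)^{1/n}} < \theta < \frac{T}{(\alpha/2)^{1/n}}\right\} = 1 - \alpha,$$

and $\left(\frac{T}{(1-\alpha/2)^{1/n}}, \frac{T}{(\alpha/2)^{1/n}}\right)$ provides the $(1-\alpha)$ level confidence interval for $\theta$.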
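Example 6.13 Let $X_1, X_2, \cdots, X_n$ be iid random samples drawn from a normal population with mean $\theta$ and variance $\sigma^2$. Find a $(1-\alpha)$ level confidence interval for $\theta$ (i) when $\sigma^2$ is known and (ii) when $\sigma^2$ is unknown.
Case (i) When $\sigma^2$ is known, consider

$$P\{a < Z < b\} = 1 - \alpha, \quad\text{where}\quad Z = \frac{\bar X - \theta}{\sigma/\sqrt n} \sim N(0, 1),$$

$$P\left\{a < \frac{\bar X - \theta}{\sigma/\sqrt n} < b\right\} = P\left\{\bar X - b\frac{\sigma}{\sqrt n} < \theta < \bar X - a\frac{\sigma}{\sqrt n}\right\} = 1 - \alpha,$$

where $a$ is given by $\int_{-\infty}^{a}\phi(z)\,dz = \frac{\alpha}{2}$ and $b$ is given by $\int_{b}^{\infty}\phi(z)\,dz = \frac{\alpha}{2}$.

Case (ii) When $\sigma^2$ is unknown and the sample size $n \le 30$, the statistic

$$t = \frac{\bar X - \theta}{S/\sqrt n} \sim t \text{ distribution with } n - 1 \text{ d.f.},$$

where $S^2 = \frac{1}{n-1}\sum_{i=1}^n(X_i - \bar X)^2$. In this case

$$P\{t_1 < t < t_2\} = P\left\{t_1 < \frac{\bar X - \theta}{S/\sqrt n} < t_2\right\} = P\left\{\bar X - t_2\frac{S}{\sqrt n} < \theta < \bar X - t_1\frac{S}{\sqrt n}\right\} = 1 - \alpha,$$

where $t_1$ is given by $\int_{-\infty}^{t_1}p_{n-1}(t)\,dt = \frac{\alpha}{2}$ and $t_2$ is given by $\int_{t_2}^{\infty}p_{n-1}(t)\,dt = \frac{\alpha}{2}$.
If $n > 30$, then $t = \frac{\bar X - \theta}{S/\sqrt n}$ is approximately $N(0, 1)$. In such a case the $(1-\alpha)$ level confidence interval is

$$\left(\bar X - z_{\alpha/2}\frac{S}{\sqrt n},\ \bar X + z_{\alpha/2}\frac{S}{\sqrt n}\right), \quad\text{where}\quad \frac{\alpha}{2} = \int_{z_{\alpha/2}}^{\infty}\phi(z)\,dz.$$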
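Example 6.14 A random sample of size 50 taken from $N(\theta, \sigma = 5)$ has mean 40. Obtain a 95% confidence interval for $2\theta + 3$.
Given the sample mean $\bar x = 40$ and the population standard deviation $\sigma = 5$, the 95% confidence interval for $\theta$ is given by

$$P\left\{\bar X - 1.96\frac{\sigma}{\sqrt n} < \theta < \bar X + 1.96\frac{\sigma}{\sqrt n}\right\} = 0.95.$$

Since $2\theta + 3$ is an increasing function of $\theta$,

$$P\left\{2\left(\bar X - 1.96\frac{\sigma}{\sqrt n}\right) + 3 < 2\theta + 3 < 2\left(\bar X + 1.96\frac{\sigma}{\sqrt n}\right) + 3\right\} = 0.95.$$

The 95% confidence limits for $2\theta + 3$ are

$$2\bar x + 3 \pm 1.96\frac{5\times 2}{\sqrt{50}} = 83 \pm 1.96\frac{10}{\sqrt{50}} = 83 \pm 2.77,$$

i.e., the interval is approximately $(80.23, 85.77)$.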
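Here is the calculation of Example 6.14 in code. The sketch uses Python's `statistics.NormalDist` for the normal percentage point, so $z_{\alpha/2} \approx 1.95996$ rather than the rounded 1.96. The helper name is hypothetical. The end points are mapped through the increasing function $g(t) = 2t + 3$.

```python
from math import sqrt
from statistics import NormalDist

def mean_interval_known_sigma(xbar, sigma, n, level=0.95):
    """xbar -/+ z_{alpha/2} sigma / sqrt(n) for a normal mean with known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # upper alpha/2 point, about 1.96
    half = z * sigma / sqrt(n)
    return xbar - half, xbar + half

lo, hi = mean_interval_known_sigma(40, 5, 50)
# 2*theta + 3 is increasing in theta, so both end points map through g(t) = 2t + 3.
g_lo, g_hi = 2 * lo + 3, 2 * hi + 3
print(round(g_lo, 2), round(g_hi, 2))
```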

6.3 Shortest Length Confidence Intervals


Let $X_1, X_2, \cdots, X_n$ be a random sample from a pdf $p(x \mid \theta)$ and let $T_\theta = t(X; \theta)$ be a random variable whose distribution is independent of $\theta$. Suppose $\lambda_1(\alpha)$ and $\lambda_2(\alpha)$ are chosen such that

$$P_\theta\{\lambda_1(\alpha) < T_\theta < \lambda_2(\alpha)\} = 1 - \alpha. \qquad (6.5)$$

Equation (6.5) can also be written as

$$P_\theta\{\underline\theta(X) < \theta < \bar\theta(X)\} = 1 - \alpha.$$

For a given $T_\theta$, $\lambda_1(\alpha)$ and $\lambda_2(\alpha)$ can be chosen in a number of ways. One would like to choose them so that $\bar\theta(X) - \underline\theta(X)$ is minimum. The resulting interval is the $(1-\alpha)$ level shortest confidence interval based on $T_\theta$.
Let $T_\theta = t(X, \theta)$ be based on a sufficient statistic. A random variable $T_\theta$ which is a function of $(X_1, X_2, \cdots, X_n)$ and $\theta$, and whose distribution is independent of $\theta$, is called a pivot.
Example 6.15 Let $X_1, X_2, \cdots, X_n$ be a random sample from $N(\theta, \sigma^2)$ where $\sigma^2$ is known. Obtain the $(1-\alpha)$ level shortest confidence interval for $\theta$.
Consider the statistic $T_\theta = \frac{\bar X - \theta}{\sigma/\sqrt n}$, which is a pivot: $\bar X$ is sufficient and $T_\theta \sim N(0, 1)$, i.e., the distribution of $T_\theta$ is independent of $\theta$. The $(1-\alpha)$ level confidence interval for $\theta$ is obtained from

$$P_\theta\{a < T_\theta < b\} = P_\theta\left\{a < \frac{\bar X - \theta}{\sigma/\sqrt n} < b\right\} = P_\theta\left\{\bar X - b\frac{\sigma}{\sqrt n} < \theta < \bar X - a\frac{\sigma}{\sqrt n}\right\} = 1 - \alpha.$$

The length of this confidence interval is $\frac{\sigma}{\sqrt n}(b - a)$. Minimize $L = \frac{\sigma}{\sqrt n}(b - a)$ subject to

$$\int_a^b\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}\,dx = 1 - \alpha, \quad\text{i.e.,}\quad \int_a^b\phi(x)\,dx = 1 - \alpha, \qquad (6.6)$$

where $\phi(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}$ is the $N(0, 1)$ density. Regarding $b$ as a function of $a$, the necessary condition for a minimum of $L$ is

$$\frac{\partial L}{\partial a} = \frac{\sigma}{\sqrt n}\left(\frac{db}{da} - 1\right) = 0 \;\Rightarrow\; \frac{db}{da} - 1 = 0.$$

Differentiating equation (6.6) with respect to $a$ (Leibniz's rule; the integrand does not depend on $a$),

$$\int_a^b 0\,dx - \phi(a) + \phi(b)\frac{db}{da} = 0 \;\Rightarrow\; \frac{db}{da} = \frac{\phi(a)}{\phi(b)}.$$

Thus $\frac{\phi(a)}{\phi(b)} - 1 = 0$, i.e., $\phi(a) = \phi(b)$, which holds when $a = b$ or $a = -b$. If $a = b$, then $\int_a^b\phi(x)\,dx = 0$, which does not satisfy (6.6). If $a = -b$, then $\int_{-b}^{b}\phi(x)\,dx = 1 - \alpha$. Thus the shortest length confidence interval based on $T_\theta$ is the equal two tails confidence interval. The $(1-\alpha)$ level shortest confidence interval for $\theta$ is

$$\left(\bar X - \frac{\sigma}{\sqrt n}z_{\alpha/2},\ \bar X + \frac{\sigma}{\sqrt n}z_{\alpha/2}\right),$$

where $z_{\alpha/2}$ is the upper $\frac{\alpha}{2}$ point of the standard normal distribution, i.e., the point with area $\frac{\alpha}{2}$ to its right. The shortest length of this interval is $L = 2z_{\alpha/2}\frac{\sigma}{\sqrt n}$.
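Example 6.16 Let $X_1, X_2, \cdots, X_n$ be a sample from $U(0, \theta)$. Find the $(1-\alpha)$ level shortest confidence interval for $\theta$.
Let $T = \max_{1\le i\le n}\{X_i\}$. The pdf of $T$ is

$$p(t \mid \theta) = \begin{cases} \frac{n}{\theta^n}t^{n-1} & 0 < t < \theta \\ 0 & \text{otherwise.} \end{cases}$$

The pdf of $Y = \frac{T}{\theta}$ is

$$p(y) = \begin{cases} n y^{n-1} & 0 < y < 1 \\ 0 & \text{otherwise.} \end{cases}$$

The statistic $Y = \frac{T}{\theta}$ is a pivot. The $(1-\alpha)$ level confidence interval for $\theta$ is obtained from

$$P\{a < Y < b\} = P\left\{a < \frac{T}{\theta} < b\right\} = P\left\{\frac{T}{b} < \theta < \frac{T}{a}\right\} = 1 - \alpha.$$

The length of the interval is $L = \left(\frac{1}{a} - \frac{1}{b}\right)T$. To find the shortest confidence interval, minimize $L$ subject to

$$\int_a^b n y^{n-1}\,dy = [y^n]_a^b = b^n - a^n = 1 - \alpha.$$

Differentiating this with respect to $b$,

$$n b^{n-1} - n a^{n-1}\frac{da}{db} = 0, \quad\text{i.e.,}\quad \frac{da}{db} = \left(\frac{b}{a}\right)^{n-1}.$$

Here $(1-\alpha)^{1/n} < b \le 1$, and

$$\frac{dL}{db} = T\left(-\frac{1}{a^2}\frac{da}{db} + \frac{1}{b^2}\right) = T\,\frac{a^{n+1} - b^{n+1}}{b^2a^{n+1}} < 0,$$

since $a < b \le 1$. Hence $L$ decreases in $b$ and the minimum occurs at $b = 1$, i.e., $1 - a^n = 1 - \alpha$, so $a^n = \alpha$ and $a = \alpha^{1/n}$. Thus the $(1-\alpha)$ level shortest confidence interval for $\theta$ is

$$\left(T,\ \frac{T}{\alpha^{1/n}}\right).$$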
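The gain from the shortest interval can be seen numerically. The sketch below is plain Python; the helper names and the values $n = 5$, $\alpha = 0.05$ are illustrative. It compares the length, in units of $T$, of the equal-tail interval of Example 6.12 with that of $(T, T/\alpha^{1/n})$. It also confirms, by a grid search over $a$, that $1/a - 1/b$ is smallest at $a = \alpha^{1/n}$, $b = 1$.

```python
def equal_tail_factor(n, alpha):
    """Length / T of (T/(1-alpha/2)^(1/n), T/(alpha/2)^(1/n)), Example 6.12."""
    return (alpha / 2) ** (-1 / n) - (1 - alpha / 2) ** (-1 / n)

def shortest_factor(n, alpha):
    """Length / T of (T, T/alpha^(1/n)), Example 6.16."""
    return alpha ** (-1 / n) - 1

def length_factor(a, n, alpha):
    """Length / T, i.e. 1/a - 1/b, when b is fixed by b^n - a^n = 1 - alpha."""
    b = (1 - alpha + a ** n) ** (1 / n)
    return 1 / a - 1 / b

n, alpha = 5, 0.05
a_max = alpha ** (1 / n)                          # largest admissible a (then b = 1)
grid = [a_max * k / 1000 for k in range(1, 1001)]
best_a = min(grid, key=lambda a: length_factor(a, n, alpha))
print(equal_tail_factor(n, alpha), shortest_factor(n, alpha), best_a, a_max)
```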


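Example 6.17 Let $X_1, X_2, \cdots, X_n$ be a sample drawn from a Normal population $N(\theta, \sigma^2)$ where $\sigma^2$ is unknown. Obtain the $(1-\alpha)$ level shortest confidence interval for $\theta$.
The statistic $T_\theta = \frac{\bar X - \theta}{S/\sqrt n}$, where $S^2 = \frac{1}{n-1}\sum_{i=1}^n(X_i - \bar X)^2$, is a pivot. It is built on the sufficient statistic $\bar X$, and its distribution, the $t$ distribution with $(n-1)$ degrees of freedom, is independent of the parameters. The $(1-\alpha)$ level confidence interval for $\theta$ is given by

$$P_\theta\{a < T_\theta < b\} = P_\theta\left\{\bar X - b\frac{S}{\sqrt n} < \theta < \bar X - a\frac{S}{\sqrt n}\right\} = 1 - \alpha.$$

The length of the confidence interval is $L = (b - a)\frac{S}{\sqrt n}$. Minimize $L$ subject to $\int_a^b p_{n-1}(t)\,dt = 1 - \alpha$, where $p_{n-1}(t)$ is the pdf of the $t$ distribution with $n - 1$ degrees of freedom:

$$\frac{dL}{da} = \left(\frac{db}{da} - 1\right)\frac{S}{\sqrt n} \quad\text{and}\quad p_{n-1}(b)\frac{db}{da} - p_{n-1}(a) = 0$$

$$\Rightarrow\; \frac{dL}{da} = \left(\frac{p_{n-1}(a)}{p_{n-1}(b)} - 1\right)\frac{S}{\sqrt n},$$

where $p_{n-1}(a)$ and $p_{n-1}(b)$ are the values of the $t$ density at $a$ and $b$. As in Example 6.15, the $t$ density is symmetric about zero and unimodal, so the minimum occurs at $a = -b$. The $(1-\alpha)$ level shortest confidence interval for $\theta$ is therefore the equal two tails confidence interval

$$\left(\bar X - t_{\alpha/2}(n-1)\frac{S}{\sqrt n},\ \bar X + t_{\alpha/2}(n-1)\frac{S}{\sqrt n}\right).$$

Here $t_{\alpha/2}(n-1)$ is the upper $\frac{\alpha}{2}$ point of the $t$ distribution with $n-1$ degrees of freedom. That is, $a = -t_{\alpha/2}(n-1)$ with $\int_{-\infty}^{a}p_{n-1}(t)\,dt = \frac{\alpha}{2}$, and $b = -a$. The shortest length of this interval is $L = 2t_{\alpha/2}(n-1)\frac{S}{\sqrt n}$.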
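Example 6.18 Let $X_1, X_2, \cdots, X_n$ be iid random samples drawn from a Normal population with mean $\theta$ and variance $\sigma^2$. Find the $(1-\alpha)$ level shortest confidence interval for $\sigma^2$ when (i) $\theta$ is known and (ii) $\theta$ is unknown.
The statistic

$$T_{\sigma^2} = \frac{\sum_{i=1}^n(X_i - \theta)^2}{\sigma^2} \sim \chi^2 \text{ with } n \text{ df}$$

is a pivot, since its distribution is independent of $\sigma^2$.
Case (i) The $(1-\alpha)$ level confidence interval for $\sigma^2$ is obtained from

$$P\{a < T_{\sigma^2} < b\} = P\left\{a < \frac{\sum_{i=1}^n(X_i - \theta)^2}{\sigma^2} < b\right\} = P\left\{\frac{\sum_{i=1}^n(X_i - \theta)^2}{b} < \sigma^2 < \frac{\sum_{i=1}^n(X_i - \theta)^2}{a}\right\} = 1 - \alpha.$$

The length of this confidence interval is

$$L = \sum_{i=1}^n(X_i - \theta)^2\left(\frac{1}{a} - \frac{1}{b}\right).$$

To find the shortest length confidence interval, minimize $L$ subject to

$$\int_a^b p_n(\chi^2)\,d\chi^2 = 1 - \alpha,$$

where $p_n(\chi^2)$ is the pdf of the $\chi^2$ statistic with $n$ df. Regarding $b$ as a function of $a$,

$$\frac{dL}{da} = \left(-\frac{1}{a^2} + \frac{1}{b^2}\frac{db}{da}\right)\sum(X_i - \theta)^2 \quad\text{and}\quad p_n(b)\frac{db}{da} - p_n(a) = 0, \text{ i.e., } \frac{db}{da} = \frac{p_n(a)}{p_n(b)},$$

where $p_n(a)$ and $p_n(b)$ denote the $\chi^2$ density evaluated at $a$ and $b$. Hence

$$\frac{dL}{da} = \left(-\frac{1}{a^2} + \frac{1}{b^2}\frac{p_n(a)}{p_n(b)}\right)\sum(X_i - \theta)^2.$$

For a minimum, $\frac{dL}{da} = 0$:

$$\frac{1}{a^2} = \frac{1}{b^2}\frac{p_n(a)}{p_n(b)} \;\Rightarrow\; b^2p_n(b) = a^2p_n(a).$$

An iterative method is used to solve the two equations

$$a^2p_n(a) = b^2p_n(b) \quad\text{and}\quad \int_a^b p_n(\chi^2)\,d\chi^2 = 1 - \alpha, \quad a < b,$$

for $a$ and $b$. If $\hat a$ and $\hat b$ are the solutions, then the shortest confidence interval for $\sigma^2$ is

$$\left(\frac{\sum_{i=1}^n(X_i - \theta)^2}{\hat b},\ \frac{\sum_{i=1}^n(X_i - \theta)^2}{\hat a}\right).$$

Case (ii) If $\theta$ is unknown, then

$$T_{\sigma^2} = \frac{\sum_{i=1}^n(X_i - \bar X)^2}{\sigma^2} = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2 \text{ with } (n-1) \text{ df}, \quad\text{where } S^2 = \frac{1}{n-1}\sum_{i=1}^n(X_i - \bar X)^2.$$

In this case one solves

$$a^2p_{n-1}(a) = b^2p_{n-1}(b) \quad\text{together with}\quad \int_a^b p_{n-1}(\chi^2)\,d\chi^2 = 1 - \alpha,$$

where $p_{n-1}(\chi^2)$ is the pdf of the statistic $T_{\sigma^2} = \frac{(n-1)S^2}{\sigma^2}$ with $(n-1)$ df. The shortest confidence interval for $\sigma^2$ is

$$\left(\frac{(n-1)S^2}{\hat b},\ \frac{(n-1)S^2}{\hat a}\right).$$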
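The iterative solution of $a^2p_n(a) = b^2p_n(b)$ together with $\int_a^b p_n(\chi^2)\,d\chi^2 = 1 - \alpha$ can be sketched with the standard library alone. The chi-square density, distribution function (via the series for the regularized incomplete gamma function) and quantile are written out by hand, and all helper names are hypothetical. For each trial $a$, $b$ is fixed by the coverage condition. Bisection on $a$ then drives $a^2p_n(a) - b^2p_n(b)$ to zero; this quantity has the same sign as $dL/da$.

```python
import math

def chi2_pdf(x, k):
    """Density of the chi-square distribution with k degrees of freedom (x > 0)."""
    return math.exp((k / 2 - 1) * math.log(x) - x / 2 - (k / 2) * math.log(2) - math.lgamma(k / 2))

def chi2_cdf(x, k):
    """P{chi2_k <= x} via the series for the regularized lower incomplete gamma P(k/2, x/2)."""
    if x <= 0:
        return 0.0
    s, z = k / 2, x / 2
    term = total = 1 / s
    j = 0
    while term > total * 1e-16:
        j += 1
        term *= z / (s + j)
        total += term
    return total * math.exp(s * math.log(z) - z - math.lgamma(s))

def chi2_quantile(p, k):
    """Bisection for the p-quantile of chi2_k, 0 < p < 1."""
    lo, hi = 0.0, k + 20 * math.sqrt(2 * k) + 50
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def shortest_chi2_limits(k, alpha):
    """Solve a^2 p_k(a) = b^2 p_k(b) subject to P{a < chi2_k < b} = 1 - alpha."""
    def b_of(a):
        return chi2_quantile(chi2_cdf(a, k) + 1 - alpha, k)   # coverage condition

    def g(a):
        b = b_of(a)
        return a * a * chi2_pdf(a, k) - b * b * chi2_pdf(b, k)  # same sign as dL/da

    lo, hi = 1e-8, chi2_quantile(alpha, k) * (1 - 1e-9)       # g(lo) < 0 < g(hi)
    for _ in range(60):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    a = (lo + hi) / 2
    return a, b_of(a)

a_hat, b_hat = shortest_chi2_limits(10, 0.05)
print(a_hat, b_hat)
```

For case (ii), call the same function with $k = n - 1$.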

Example 6.19 Let $X$ and $Y$ be two independent random variables that are $N(\theta, \sigma_1^2)$ and $N(\theta, \sigma_2^2)$ respectively. Obtain a $(1-\alpha)$ level confidence interval for the ratio $\frac{\sigma_2^2}{\sigma_1^2}$. Use a random sample $X_1, X_2, \cdots, X_{n_1}$ of size $n_1 \ge 2$ from the distribution of $X$ and a random sample $Y_1, Y_2, \cdots, Y_{n_2}$ of size $n_2 \ge 2$ from the distribution of $Y$.
Let $s_1^2 = \frac{1}{n_1}\sum_{i=1}^{n_1}(X_i - \bar X)^2$ and $s_2^2 = \frac{1}{n_2}\sum_{i=1}^{n_2}(Y_i - \bar Y)^2$ be the variances of the two samples. The independent random variables $\frac{n_1s_1^2}{\sigma_1^2}$ and $\frac{n_2s_2^2}{\sigma_2^2}$ have $\chi^2$ distributions with $n_1 - 1$ and $n_2 - 1$ degrees of freedom respectively. By the definition of the F statistic,

$$F = \frac{\dfrac{n_1s_1^2}{\sigma_1^2(n_1-1)}}{\dfrac{n_2s_2^2}{\sigma_2^2(n_2-1)}} \sim F \text{ distribution with } n_1 - 1 \text{ and } n_2 - 1 \text{ degrees of freedom.}$$

The $(1-\alpha)$ level confidence interval for $\frac{\sigma_2^2}{\sigma_1^2}$ is obtained from

$$P\left\{a < \frac{\sigma_2^2}{\sigma_1^2}\cdot\frac{n_1s_1^2/(n_1-1)}{n_2s_2^2/(n_2-1)} < b\right\} = P\left\{a\,\frac{n_2s_2^2/(n_2-1)}{n_1s_1^2/(n_1-1)} < \frac{\sigma_2^2}{\sigma_1^2} < b\,\frac{n_2s_2^2/(n_2-1)}{n_1s_1^2/(n_1-1)}\right\} = 1 - \alpha.$$

Hence the $(1-\alpha)$ level confidence interval for $\frac{\sigma_2^2}{\sigma_1^2}$ is

$$\left(a\,\frac{n_2s_2^2(n_1-1)}{n_1s_1^2(n_2-1)},\ b\,\frac{n_2s_2^2(n_1-1)}{n_1s_1^2(n_2-1)}\right),$$

where $a$ and $b$ are given by

$$\int_0^a dF(n_1-1, n_2-1) = \frac{\alpha}{2}, \qquad \int_b^\infty dF(n_1-1, n_2-1) = \frac{\alpha}{2}.$$


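Example 6.20 Let $X_1, X_2, \cdots, X_n$ be a random sample of size $n$ from the exponential distribution with parameter $\theta$, whose pdf is

$$p(x \mid \theta) = \begin{cases} \theta e^{-\theta x} & x > 0,\ \theta > 0 \\ 0 & \text{otherwise.} \end{cases}$$

Obtain a $(1-\alpha)$ level confidence interval for $\theta$.
The joint pdf of the random sample $X_1, X_2, \cdots, X_n$ is

$$p(x_1, x_2, \cdots, x_n) = \theta^n e^{-\theta\sum x_i}.$$

Let $T = \sum_{i=1}^n X_i$. Then $T \sim G\left(n, \frac{1}{\theta}\right)$ with pdf

$$p_\theta(t) = \begin{cases} \frac{\theta^n}{\Gamma n}e^{-\theta t}t^{n-1} & 0 < t < \infty \\ 0 & \text{otherwise,} \end{cases}$$

and $Y = 2\theta T$ has pdf

$$p(y) = \begin{cases} \frac{1}{2^n\Gamma n}e^{-y/2}y^{\frac{2n}{2}-1} & 0 < y < \infty \\ 0 & \text{otherwise.} \end{cases}$$

That is, $Y = 2\theta\sum X_i$ follows the $\chi^2$ distribution with $2n$ degrees of freedom. The $(1-\alpha)$ level confidence interval for $\theta$ is obtained from

$$P_\theta\left\{a < 2\theta\sum X_i < b\right\} = P_\theta\left\{\frac{a}{2\sum X_i} < \theta < \frac{b}{2\sum X_i}\right\} = 1 - \alpha,$$

where $a$ is given by $\int_0^a p_{2n}(\chi^2)\,d\chi^2 = \frac{\alpha}{2}$ and $b$ is given by $\int_b^\infty p_{2n}(\chi^2)\,d\chi^2 = \frac{\alpha}{2}$.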
Example 6.21 The time to failure of an electronic component is assumed to follow an exponential distribution with unknown parameter $\theta$, i.e.,

$$p(x \mid \theta) = \begin{cases} \theta e^{-\theta x} & x > 0,\ \theta > 0 \\ 0 & \text{otherwise.} \end{cases}$$

Ten electronic components are placed on test and their observed times to failure are 607.5, 1947.0, 37.6, 129.9, 409.5, 529.5, 109.0, 582.4, 499.0 and 188.1 hours. Find the 90% confidence interval for $\theta$ and the 90% confidence interval for the mean time to failure. Also obtain the 90% confidence interval for the probability that a component survives a 100 hour period.
As in Example 6.20, with $\sum x_i = 5039.5$ and $2n = 20$ degrees of freedom. From the $\chi^2$ table, the lower and upper 5% points are $\chi^2_{0.05}(20) = 10.9$ and $\chi^2_{0.95}(20) = 31.4$. The 90% confidence interval for $\theta$ is

$$\left(\frac{10.9}{2\times 5039.5},\ \frac{31.4}{2\times 5039.5}\right) = (0.00108,\ 0.00312).$$

The mean time to failure is $\frac{1}{\theta}$. The 90% confidence interval for the mean time to failure lies between $\frac{1}{0.00312} = 320.5$ hours and $\frac{1}{0.00108} = 925.9$ hours.
The probability that one of these components will work at least $t$ hours without failure is $P\{X > t\} = e^{-\theta t}$. The 90% confidence interval for the probability that a component survives a 100 hour period lies between $e^{-100\times 0.00312} = 0.732$ and $e^{-100\times 0.00108} = 0.898$.
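Example 6.22 Explain a method of constructing a large-sample confidence interval for $\theta$ in Poisson $(\theta)$.

For large samples the variable

$$Z = \frac{\frac{\partial \log L}{\partial \theta}}{\sqrt{V\left[\frac{\partial \log L}{\partial \theta}\right]}} \sim N(0,1)$$

Hence, from the distribution of $Z$, one can easily construct confidence limits for $\theta$ for large samples. We have

$$\log L(\theta) = \sum x_i \log \theta - n\theta - \sum \log x_i!$$

$$\frac{\partial \log L(\theta)}{\partial \theta} = \frac{n\bar{x}}{\theta} - n$$

$$V\left[\frac{\partial \log L(\theta)}{\partial \theta}\right] = V\left[\frac{n\bar{X}}{\theta} - n\right] = \frac{1}{\theta^2} V\left[\sum_{i=1}^n X_i\right] = \frac{1}{\theta^2}\sum_{i=1}^n V[X_i] = \frac{1}{\theta^2}\, n\theta = \frac{n}{\theta}$$

Thus

$$Z = \frac{\frac{n\bar{x}}{\theta} - n}{\sqrt{n/\theta}} = \sqrt{\frac{n}{\theta}}\,(\bar{x} - \theta)$$

The 95% large-sample confidence interval for $\theta$ is obtained from

$$P\{-1.96 < Z < 1.96\} = .95$$

$$P\left\{-1.96 < \sqrt{\frac{n}{\theta}}\,(\bar{X} - \theta) < 1.96\right\} = .95$$

Hence the 95% confidence limits for $\theta$ satisfy

$$\sqrt{\frac{n}{\theta}}\,(\bar{x} - \theta) = \pm 1.96$$

Squaring gives $n(\bar{x}-\theta)^2 = 3.84\,\theta$ (since $1.96^2 \approx 3.84$), that is,

$$\theta^2 - \left(2\bar{x} + \frac{3.84}{n}\right)\theta + \bar{x}^2 = 0$$

whose roots are

$$\theta = \bar{x} + \frac{1.92}{n} \pm \sqrt{\frac{3.84\,\bar{x}}{n} + \frac{3.69}{n^2}}$$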


Problems

6.1 Distinguish between point estimation and interval estimation.

6.2 Explain the shortest confidence interval. Also obtain the $(1-\alpha)$ level shortest confidence interval for $\theta$, using a random sample of size $n$ from

$$p(x \mid \theta) = \begin{cases} e^{-(x-\theta)} & x \ge \theta,\ \theta > 0 \\ 0 & \text{otherwise} \end{cases}$$

6.3 Let $X_1, X_2, \cdots, X_n$ be a random sample from $U(0, \theta)$. Find the shortest-length confidence interval for $\theta$ at level $(1-\alpha)$.

6.4 Obtain the $(1-\alpha)$ level confidence interval for $\sigma^2$ when $\theta$ is known in $N(\theta, \sigma^2)$.

6.5 Suggest a $(1-\alpha)$ level shortest confidence interval for $\theta$ in $N(\theta, \sigma^2)$, where $\sigma^2$ is known. What is its length?

6.6 Obtain the $(1-\alpha)$ coefficient confidence interval for $\theta$ based on a random sample from

$$p(x \mid \theta) = \begin{cases} \frac{1}{\theta} e^{-\frac{1}{\theta}x} & x \ge 0,\ \theta > 0 \\ 0 & \text{otherwise} \end{cases}$$

6.7 Obtain the $(1-\alpha)$ level shortest confidence interval for $\theta$ using a random sample from $N(\theta, 1)$.

6.8 Given that $X_1, X_2, \cdots, X_n$ is a random sample from $N(\theta, \sigma^2)$, where $\sigma^2$ is known, find the $(1-\alpha)$ level upper confidence bound for $\theta$.

6.9 Obtain a confidence interval for the range of a rectangular distribution based on a random sample of size $n$.

6.10 The numbers of houses sold per week for 15 weeks by the Dinesh real estate firm were 3, 3, 4, 6, 2, 4, 4, 3, 1, 2, 0, 5, 7, 1, 4 respectively. Assuming these are the observed values of a random sample of size 15 from a Poisson random variable with parameter $\theta$, compute 95% confidence limits for $\theta$. Ans. (2.36, 4.18)

6.11 Show that in large samples, the 95% level confidence limits for the mean of a Poisson distribution are given by

$$\bar{X} + \frac{1.92}{n} \pm \sqrt{\frac{3.84\,\bar{X}}{n}}$$

where $n^{-2}$ is negligible.

6.12 Show that for the pdf

$$p(x \mid \theta) = \begin{cases} \theta e^{-\theta x} & x > 0,\ \theta > 0 \\ 0 & \text{otherwise} \end{cases}$$

the 95% level confidence limits for large samples are given by

$$\theta = \frac{1}{\bar{x}}\left(1 \pm \frac{1.96}{\sqrt{n}}\right)$$

6.13 Obtain the large-sample confidence interval with confidence coefficient $(1-\alpha)$ for the parameter of a Bernoulli distribution.

6.14 Examine the connection between the shortest confidence interval and sufficient statistics.

6.15 A 90% confidence interval for $\theta$ based on a single observation $X$ from the density function

$$p(x \mid \theta) = \begin{cases} \frac{1}{\theta} & 0 < x < \theta,\ \theta > 0 \\ 0 & \text{otherwise} \end{cases}$$

is
(a) $[X, 10X]$ (b) $\left[\frac{20X}{19}, 20X\right]$ (c) $\left[\frac{50X}{49}, 12.5X\right]$ (d) All the above

6.16 The correct interpretation of the confidence interval $(T_1, T_2)$ for the parameter $\theta$ of a distribution $F(x \mid \theta)$, $\theta \in \Re$, with confidence coefficient $1-\alpha$ is
(a) $\theta$ belongs to $(T_1, T_2)$ with probability $1-\alpha$
(b) $(T_1, T_2)$ covers the parameter $\theta$ with probability $1-\alpha$
(c) $(T_1, T_2)$ includes the parameter $\theta$ with confidence coefficient $1-\alpha$
(d) $\theta_0$ belongs to $(T_1, T_2)$ with confidence $\alpha$, where $\theta\ (\neq \theta_0)$ is the true value.

6.17 If a random sample of $n = 100$ voters in a community produced 59 votes in favour of candidate A, then the 95% confidence interval for the fraction $p$ of the voting population favouring A is
(a) $59 \pm 1.96\sqrt{\frac{59 \times 41}{100}}$
(b) $.59 \pm 1.96\sqrt{\frac{.59 \times .41}{100}}$
(c) $59 \pm 2.58\sqrt{\frac{.59 \times .41}{100}}$
(d) $59 \pm 2.58\sqrt{\frac{59 \times 41}{100}}$

6.18 Let $X_1, X_2, \cdots, X_n$ be a sample from $U(0, \theta)$. The equal two-tails $(1-\alpha)$ level confidence interval for $\theta$ is:
(a) $\left(\frac{X_{(n)}}{(1-\alpha/2)^{1/n}}, \frac{X_{(n)}}{(\alpha/2)^{1/n}}\right)$
(b) $\left(\frac{X_{(n)}}{(\alpha/2)^{1/n}}, \frac{X_{(n)}}{(1-\alpha/2)^{1/n}}\right)$
(c) $\left(\frac{X_{(n)}}{(1-\alpha/2)^{n}}, \frac{X_{(n)}}{(\alpha/2)^{n}}\right)$
(d) None of the above

7. BAYES ESTIMATION

7.1 Introduction
Bayes estimation treats the parameter $\theta$ of a statistical distribution as a realization of a random variable on $\Omega$ with a known distribution, rather than as an unknown constant. So far, the models have assumed that only the shape of the distribution is known, but not the values of its parameters. Bayes estimation uses prior information about the parameter to specify the distribution completely. This is the major difference in Bayes estimation. It may be quite reasonable if past experience is sufficiently extensive and relevant to the problem. The choice of the prior distribution is made, like that of the distribution $P_\theta$, by combining experience with convenience.

A number of observations are available from the distribution $P_\theta$, $\theta \in \Omega$, of a random variable $X$, and they may be used to check the assumed form of that distribution. In Bayes estimation, however, only a single value of $\theta$ on $\Omega$ is drawn from the prior distribution, and it cannot be used to check the assumed form of the prior. This calls for special care in Bayes estimation.

In the usual estimation, replicating a random experiment consists of drawing another set of observations from the distribution $P_\theta$ of the random variable $X$. In Bayes estimation, replication consists of first taking another value $\theta'$ on $\Omega$ from the prior distribution and then drawing a set of observations from the distribution $P_{\theta'}$ of $X$.

The determination of a Bayes estimator is quite simple in principle. Consider the situation before any observations are taken. The distribution of $\theta$ on $\Omega$ at that stage is known as the prior distribution.

A decision function $d(X)$ is a statistic that takes values in $\Omega$. A non-negative function $L(\theta, d(X))$, $\theta \in \Omega$, is called a loss function. The function $R$ defined by $R(\theta, d) = E_\theta[L(\theta, d(X))]$ is known as the risk function associated with the decision function $d(X)$ at $\theta$. For example, if $L(\theta, d) = [\theta - d]^2$, $\theta \in \Omega \subseteq \Re$, then the risk $R(\theta, d) = E_\theta[d(X) - \theta]^2$ is the mean squared error. It reduces to the variance of the estimator $d(X)$ when $E_\theta[d(X)] = \theta$.
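The sketch below makes the risk function concrete. It computes $R(\theta, d) = E_\theta[d(X) - \theta]^2$ exactly for a binomial $b(n, \theta)$ observation by summing $L(\theta, d(x))\,p(x \mid \theta)$ over $x = 0, \cdots, n$. The binomial model, the function-pointer type `Decision` and the two decision functions are illustrative assumptions.

```c
/* Hypothetical sketch: exact risk R(theta, d) = E_theta[d(X) - theta]^2
   for X ~ b(n, theta), 0 < theta < 1, under squared-error loss.
   The decision function d is passed as a pointer to a function of (x, n). */
typedef double (*Decision)(int x, int n);

double binomial_risk(Decision d, int n, double theta)
{
    double p = 1.0, risk = 0.0;
    for (int i = 0; i < n; i++)
        p *= 1.0 - theta;                     /* p = P{X = 0} = (1 - theta)^n */
    for (int x = 0; x <= n; x++) {
        double err = d(x, n) - theta;
        risk += err * err * p;                /* L(theta, d(x)) p(x | theta) */
        /* P{X = x+1} = P{X = x} (n - x)/(x + 1) theta/(1 - theta) */
        p *= (double)(n - x) / (x + 1) * theta / (1.0 - theta);
    }
    return risk;
}

/* Two illustrative decision functions for theta. */
double d_mle(int x, int n)   { return (double)x / n; }          /* X/n */
double d_bayes(int x, int n) { return (x + 1.0) / (n + 2.0); }  /* (X+1)/(n+2) */
```

For example, `binomial_risk(d_mle, n, theta)` equals $\theta(1-\theta)/n$, the variance of the unbiased estimator $X/n$. The risk of $(X+1)/(n+2)$ depends on $\theta$ in a different way.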
Bayes Risk Related to Prior
In Bayes estimation, the pdf (pmf) $\pi(\theta)$ of $\theta$ on $\Omega \subseteq \Re$ is known as the prior distribution. For a fixed $\theta \in \Omega$, the pdf (pmf) $p(x \mid \theta)$ represents the conditional pdf (pmf) of a random variable $X$ given $\theta$. If $\pi(\theta)$ is the pdf (pmf) of $\theta$ on $\Omega \subseteq \Re$, then the joint pdf (pmf) of $\theta$ on $\Omega$ and $X$ is given by $p(x, \theta) = \pi(\theta)\,p(x \mid \theta)$.
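The Bayes risk of a decision function $d$ with respect to the loss function $L(\theta, d)$ is defined by $R(\pi, d) = E[R(\theta, d)]$, the expectation being taken over the prior distribution of $\theta$. If $\theta$ on $\Omega$ is a continuous random variable and $X$ is of the continuous type, then the Bayes risk with respect to the loss function $L(\theta, d)$ is

$$\begin{aligned} R(\pi, d) &= E[R(\theta, d)] = \int R(\theta, d)\,\pi(\theta)\,d\theta \\ &= \int E_\theta[L(\theta, d(X))]\,\pi(\theta)\,d\theta \\ &= \int \left[\int L(\theta, d(x))\,p(x \mid \theta)\,dx\right] \pi(\theta)\,d\theta \\ &= \iint L[\theta, d(x)]\,p(x \mid \theta)\,\pi(\theta)\,dx\,d\theta \end{aligned}$$

If $\theta$ on $\Omega$ is a discrete variable with pmf $\pi(\theta)$ and $X$ is of the discrete type, then

$$R(\pi, d) = \sum_\theta \sum_x L[\theta, d(x)]\,p(x \mid \theta)\,\pi(\theta)$$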

7.2 Bayes Point Estimation


A decision function $d^\ast(X)$ is known as a Bayes estimator if it minimizes the Bayes risk, i.e., if $R(\pi, d^\ast) = \inf_d R(\pi, d)$.

$p(\theta \mid x)$ is the conditional distribution of the random variable $\theta$ on $\Omega$ given $X = x$. It is also called the a posteriori (posterior) probability distribution of $\theta$ on $\Omega$, given the sample. The joint pdf of $X$ and $\theta$ on $\Omega$ can be expressed in the form $p(x, \theta) = g(x)\,p(\theta \mid x)$, where $g(x)$ denotes the marginal pdf (pmf) of $X$. The a priori pdf (pmf) $\pi(\theta)$ gives the distribution of $\theta$ on $\Omega$ before the sample is taken. The posterior pdf (pmf) $p(\theta \mid x)$ gives the distribution of $\theta$ on $\Omega$ after sampling.

Bayes Risk Related to Posterior


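The Bayes risk of a decision function $d(X)$ with respect to a loss function $L(\theta, d(X))$, in terms of $p(\theta \mid x)$, is

$$\begin{aligned} R(\pi, d) &= E[R(\theta, d)] \\ &= \int g(x)\,E[L(\theta, d(x)) \mid X = x]\,dx \\ &= \int g(x) \left[\int L(\theta, d(x))\,p(\theta \mid x)\,d\theta\right] dx \end{aligned}$$

or, in the discrete case,

$$R(\pi, d) = \sum_x g(x) \sum_\theta L(\theta, d(x))\,p(\theta \mid x)$$

$E[R(\theta, d)]$ is the mean (expected) value of the risk $R(\theta, d)$ over the prior. A Bayes estimator $d^\ast(X)$ therefore minimizes this mean risk. Since $g(x) \ge 0$, it is enough to choose, for each $x$, the value $d^\ast(x)$ that minimizes the posterior expected loss $\int L(\theta, d(x))\,p(\theta \mid x)\,d\theta$.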

Theorem 7.1 Let $X_1, X_2, \cdots, X_n$ be a random sample from the pdf $p(x \mid \theta)$ and let $\pi(\theta)$ be a prior pdf of $\theta$ on $\Omega \subseteq \Re$. Let $L(\theta, d) = (\theta - d)^2$ be the loss function for estimating the parameter $\theta$. Then the Bayes estimator of $\theta$ is given by $d^\ast(X) = E[\theta \mid X = x]$.

Proof: The Bayes risk of a decision function $d(x)$ with respect to the loss function $L(\theta, d) = [\theta - d]^2$ is

$$R(\pi, d) = \int g(x) \left[\int [\theta - d(x)]^2\, p(\theta \mid x)\,d\theta\right] dx$$

The Bayes estimator is a function $d^\ast(X)$ that minimizes $R(\pi, d)$. Since $g(x) \ge 0$, minimizing $R(\pi, d)$ is the same as minimizing, for each $x$,

$$\int [\theta - d(x)]^2\, p(\theta \mid x)\,d\theta = V[\theta \mid X = x] + \{E[\theta \mid X = x] - d(x)\}^2$$

The first term does not depend on $d$, so the mean squared deviation is minimum iff

$$d^\ast(x) = E[\theta \mid X = x]$$
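Theorem 7.1 and Remark 7.1 can be checked numerically on a small discrete posterior. The sketch below scans a grid of decisions $d$ and returns the one with the smallest posterior expected loss. Under squared loss it should return the posterior mean, and under absolute loss a posterior median. The posterior values in the usage note are made up for illustration, and the function names are hypothetical.

```c
#include <math.h>

/* Hypothetical sketch: posterior expected loss of a decision d when the
   posterior puts probability post[j] on the value theta[j], j = 0..k-1.
   squared != 0 selects (theta - d)^2, otherwise |theta - d|. */
double expected_loss(const double *theta, const double *post, int k,
                     double d, int squared)
{
    double s = 0.0;
    for (int j = 0; j < k; j++) {
        double e = theta[j] - d;
        s += post[j] * (squared ? e * e : fabs(e));
    }
    return s;
}

/* Scan d = 0, 0.001, ..., 1 and return the decision with the smallest
   posterior expected loss (ties keep the first grid point found). */
double best_decision(const double *theta, const double *post, int k,
                     int squared)
{
    double best_d = 0.0, best = INFINITY;
    for (int i = 0; i <= 1000; i++) {
        double d = i / 1000.0;
        double r = expected_loss(theta, post, k, d, squared);
        if (r < best) {
            best = r;
            best_d = d;
        }
    }
    return best_d;
}
```

Take $\theta$ values 0.1, 0.4, 0.9 with posterior probabilities 0.2, 0.5, 0.3 (illustrative numbers). Squared loss then picks 0.49, the posterior mean. Absolute loss picks 0.4, the posterior median.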

Remark 7.1 If $L(\theta, d) = |\theta - d|$ is the loss function for estimating the parameter $\theta$, then the Bayes estimator of $\theta$ is the median of the posterior distribution of $\theta \in \Omega \subseteq \Re$. This is because $E|X - a|$, as a function of $a$, is minimized when $a^\ast$ is the median of the distribution of $X$. Also, a Bayes estimator need not be unbiased.
Minimax decision function
The principle of the minimax estimator is to choose $d^\ast$ so that $\max_\theta R(\theta, d^\ast) \le \max_\theta R(\theta, d)$ for all $d$. If such a function $d^\ast$ exists, it is a minimax estimator of $\theta \in \Omega \subseteq \Re$.
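Theorem 7.2 If $d^\ast(X)$ is a Bayes estimator having constant risk, that is, $R(\theta, d^\ast) =$ constant, then $d^\ast(X)$ is a minimax estimator.

Proof: Let $\pi^\ast(\theta)$ be the prior density with respect to which $d^\ast(X)$ is the Bayes estimator under the loss function $L(\theta, d)$. Then, for any other estimator $d(X)$ of the parameter $\theta$,

$$\begin{aligned} \sup_{\theta \in \Omega} R(\theta, d^\ast) &= \text{constant} = \int R(\theta, d^\ast)\,\pi^\ast(\theta)\,d\theta = R(\pi^\ast, d^\ast) \\ &\le R(\pi^\ast, d) = \int R(\theta, d)\,\pi^\ast(\theta)\,d\theta \\ &\le \sup_{\theta \in \Omega} R(\theta, d) \end{aligned}$$

The first inequality holds because $d^\ast$ minimizes the Bayes risk with respect to $\pi^\ast$. The second holds because an average of $R(\theta, d)$ cannot exceed its supremum. Thus $d^\ast(X)$ is a minimax estimator.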

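$$\begin{aligned} \text{Mean squared error of } d(X) &= E[d(X) - \theta]^2 \\ &= E[d(X) - E[d(X)] + E[d(X)] - \theta]^2 \\ &= E[d(X) - E[d(X)]]^2 + [E[d(X)] - \theta]^2 \\ &= V_\theta[d(X)] + [\text{bias}]^2 \end{aligned}$$

The cross term $2\,[E[d(X)] - \theta]\,E[d(X) - E[d(X)]]$ vanishes. Here $E_\theta[d(X)] - \theta$ is called the bias of the estimator $d(X)$.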


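Example 7.1 Let $X \sim b(n, \theta)$ and let the prior pdf of $\theta$ on $\Omega \subseteq \Re$ be $U(0,1)$. Find the Bayes estimate of $\theta$ using the quadratic loss function. Also find the minimax estimate of $\theta$.

The prior pdf of $\theta$ on $\Omega$ is

$$\pi(\theta) = \begin{cases} 1 & 0 < \theta < 1 \\ 0 & \text{otherwise} \end{cases}$$

The marginal pmf of $X$ is

$$\begin{aligned} g(x) &= \int p(x, \theta)\,d\theta = \int \pi(\theta)\,p(x \mid \theta)\,d\theta \\ &= \int_0^1 \binom{n}{x} \theta^x (1-\theta)^{n-x}\,d\theta = \binom{n}{x} \int_0^1 \theta^{x+1-1} (1-\theta)^{n-x+1-1}\,d\theta \\ &= \binom{n}{x} \frac{\Gamma(x+1)\Gamma(n-x+1)}{\Gamma(n-x+1+x+1)} = \frac{n!}{x!(n-x)!}\,\frac{x!(n-x)!}{(n+1)!} \end{aligned}$$

$$g(x) = \begin{cases} \frac{1}{n+1} & x = 0, 1, 2, \cdots, n \\ 0 & \text{otherwise} \end{cases}$$

The posterior pdf of $\theta$ on $\Omega$ is

$$p(\theta \mid x) = \frac{p(x, \theta)}{g(x)} = \frac{\pi(\theta)\,p(x \mid \theta)}{g(x)} = (n+1)\binom{n}{x} \theta^x (1-\theta)^{n-x}, \quad 0 < \theta < 1$$

The Bayes estimate $d^\ast(x)$ of the parameter $\theta$ is

$$\begin{aligned} d^\ast(x) &= E(\theta \mid X = x) = \int_0^1 \theta\,p(\theta \mid x)\,d\theta \\ &= (n+1)\binom{n}{x} \int_0^1 \theta^{x+2-1} (1-\theta)^{n-x+1-1}\,d\theta \\ &= (n+1)\frac{n!}{x!(n-x)!}\,\frac{(x+1)!(n-x)!}{(n+2)!} = \frac{x+1}{n+2} \end{aligned}$$

The Bayes estimator of the parameter $\theta$ is $d^\ast(X) = \frac{X+1}{n+2}$.

The Bayes risk of $d^\ast(X)$ with respect to the loss function $L(\theta, d^\ast(x)) = [d^\ast(x) - \theta]^2$ is

$$\begin{aligned} R(\pi, d^\ast) &= \iint L[\theta, d^\ast(x)]\,\pi(\theta)\,p(x \mid \theta)\,dx\,d\theta = \int \pi(\theta) \left\{ \sum_{x=0}^n [d^\ast(x) - \theta]^2\,p(x \mid \theta) \right\} d\theta \\ &= \int_0^1 \left\{ \sum_{x=0}^n \left( \frac{x+1}{n+2} - \theta \right)^2 p(x \mid \theta) \right\} d\theta = \int_0^1 E_\theta\left[ \frac{X+1}{n+2} - \theta \right]^2 d\theta \\ &= \frac{1}{(n+2)^2} \int_0^1 E_\theta\left[ (X+1)^2 + (n+2)^2\theta^2 - 2(X+1)(n+2)\theta \right] d\theta \\ &= \frac{1}{(n+2)^2} \int_0^1 \left[ E_\theta[X^2] + 2E_\theta[X] + 1 + \theta^2(n+2)^2 - 2\theta(n+2)E_\theta[X] - 2\theta(n+2) \right] d\theta \\ &= \frac{1}{(n+2)^2} \int_0^1 \left[ n(n-1)\theta^2 + n\theta + 2n\theta + 1 + \theta^2(n+2)^2 - 2\theta(n+2)n\theta - 2\theta(n+2) \right] d\theta \\ &= \frac{1}{(n+2)^2} \int_0^1 \left[ n\theta(1-\theta) + (1-2\theta)^2 \right] d\theta \\ &= \frac{1}{(n+2)^2} \left[ \frac{n}{6} + \frac{1}{3} \right] = \frac{1}{6(n+2)} \end{aligned}$$

This uses $E_\theta[X] = n\theta$ and $E_\theta[X^2] = n(n-1)\theta^2 + n\theta$.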

Example 7.2 Let $X_1, X_2, \cdots, X_n$ be a random sample drawn from a population with pmf

$$p(x \mid \theta) = \begin{cases} \theta^x (1-\theta)^{1-x} & x = 0, 1 \text{ and } 0 < \theta < 1 \\ 0 & \text{otherwise} \end{cases}$$

Assume that the prior distribution of $\theta$ on $\Omega$ is given by

$$\pi(\theta) = \begin{cases} 1 & 0 < \theta < 1 \\ 0 & \text{otherwise} \end{cases}$$

Find the Bayes estimates of $\theta$ and $\theta(1-\theta)$ using the quadratic loss function.

The marginal pmf of $X_1, X_2, \cdots, X_n$ is

$$\begin{aligned} g(x_1, x_2, \cdots, x_n) &= \int p(x_1, x_2, \cdots, x_n, \theta)\,d\theta = \int_0^1 \pi(\theta)\,p(x_1, x_2, \cdots, x_n \mid \theta)\,d\theta \\ &= \int_0^1 \theta^{\sum x_i} (1-\theta)^{n - \sum x_i}\,d\theta \\ &= \int_0^1 \theta^{t+1-1} (1-\theta)^{n-t+1-1}\,d\theta \quad \text{where } t = \sum x_i \end{aligned}$$

$$g(x_1, x_2, \cdots, x_n) = \begin{cases} \frac{t!(n-t)!}{(n+1)!} & t = 0, 1, 2, \cdots, n \\ 0 & \text{otherwise} \end{cases}$$

The posterior pdf of $\theta$ on $\Omega$ is

$$p(\theta \mid x_1, \cdots, x_n) = \frac{p(x_1, \cdots, x_n, \theta)}{g(x_1, \cdots, x_n)} = \frac{\pi(\theta)\,p(x_1, \cdots, x_n \mid \theta)}{g(x_1, \cdots, x_n)} = \begin{cases} \frac{(n+1)!}{t!(n-t)!} \theta^t (1-\theta)^{n-t} & 0 < \theta < 1 \\ 0 & \text{otherwise} \end{cases}$$

The Bayes estimate of the parameter $\theta$ is

$$\begin{aligned} d^\ast(x_1, x_2, \cdots, x_n) &= E[\theta \mid X_1 = x_1, \cdots, X_n = x_n] = \int_0^1 \theta\,\frac{(n+1)!\,\theta^t (1-\theta)^{n-t}}{t!(n-t)!}\,d\theta \\ &= \frac{(n+1)!}{t!(n-t)!} \int_0^1 \theta^{t+2-1} (1-\theta)^{n+1-t-1}\,d\theta = \frac{(n+1)!}{t!(n-t)!}\,\frac{(t+1)!(n-t)!}{(n+2)!} \\ &= \frac{t+1}{n+2} = \frac{\sum x_i + 1}{n+2} \end{aligned}$$

The Bayes estimate of the parameter $\theta(1-\theta)$ is

$$\begin{aligned} d^\ast(x_1, x_2, \cdots, x_n) &= \int_0^1 \theta(1-\theta)\,p(\theta \mid x_1, x_2, \cdots, x_n)\,d\theta = \frac{(n+1)!}{t!(n-t)!} \int_0^1 \theta^{t+2-1} (1-\theta)^{n+2-t-1}\,d\theta \\ &= \frac{(n-t+1)(t+1)}{(n+3)(n+2)} = \frac{(n - \sum x_i + 1)(\sum x_i + 1)}{(n+2)(n+3)} \end{aligned}$$

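Example 7.3 Let $X_1, X_2, \cdots, X_n$ be a random sample drawn from a Poisson population with parameter $\theta$. For estimating $\theta$, the quadratic error loss function is used, together with the prior distribution of $\theta$ on $\Omega$ given by the pdf

$$\pi(\theta) = \begin{cases} e^{-\theta} & \theta > 0 \\ 0 & \text{otherwise} \end{cases}$$

Find the Bayes estimates of (i) $\theta$ and (ii) $e^{-\theta}$.

The marginal pmf of $X_1, X_2, \cdots, X_n$ is

$$\begin{aligned} g(x_1, x_2, \cdots, x_n) &= \int_0^\infty p(x_1, x_2, \cdots, x_n, \theta)\,d\theta = \int_0^\infty \pi(\theta)\,p(x_1, x_2, \cdots, x_n \mid \theta)\,d\theta \\ &= \int_0^\infty e^{-\theta}\,\frac{e^{-n\theta}\theta^{\sum x_i}}{x_1! \cdots x_n!}\,d\theta \\ &= \frac{1}{\prod_{i=1}^n x_i!} \int_0^\infty e^{-(n+1)\theta}\theta^{t+1-1}\,d\theta \quad \text{where } t = \sum x_i \\ &= \frac{t!}{\prod_{i=1}^n x_i!\,(n+1)^{t+1}} \end{aligned}$$

The posterior pdf of $\theta$ on $\Omega$ is

$$p(\theta \mid x_1, \cdots, x_n) = \frac{p(x_1, \cdots, x_n, \theta)}{g(x_1, \cdots, x_n)} = \frac{\pi(\theta)\,p(x_1, \cdots, x_n \mid \theta)}{g(x_1, \cdots, x_n)} = \frac{e^{-(n+1)\theta}\theta^t}{t!}(n+1)^{t+1}, \quad 0 < \theta < \infty$$

Case (i) The Bayes estimate of $\theta$ is

$$\begin{aligned} d^\ast(x_1, x_2, \cdots, x_n) &= \int_0^\infty \theta\,\frac{e^{-(n+1)\theta}\theta^t}{t!}(n+1)^{t+1}\,d\theta = \frac{(n+1)^{t+1}}{t!} \int_0^\infty e^{-(n+1)\theta}\theta^{t+2-1}\,d\theta \\ &= \frac{(n+1)^{t+1}}{t!}\,\frac{\Gamma(t+2)}{(n+1)^{t+2}} = \frac{t!\,(t+1)}{t!\,(n+1)} = \frac{t+1}{n+1} \end{aligned}$$

Case (ii) The Bayes estimate of $e^{-\theta}$ is

$$\begin{aligned} d^\ast(x_1, x_2, \cdots, x_n) &= \int_0^\infty e^{-\theta}\,\frac{e^{-(n+1)\theta}\theta^t}{t!}(n+1)^{t+1}\,d\theta = \frac{(n+1)^{t+1}}{t!} \int_0^\infty e^{-(n+2)\theta}\theta^{t+1-1}\,d\theta \\ &= \frac{(n+1)^{t+1}}{t!}\,\frac{\Gamma(t+1)}{(n+2)^{t+1}} = \left( \frac{n+1}{n+2} \right)^{t+1} \end{aligned}$$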
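Example 7.4 Let $X \sim b(n, \theta)$ and suppose that the prior pdf of $\theta$ on $\Omega$ is $U(0,1)$. Find the Bayes estimate of $\theta$. Using the loss function $L(\theta, d) = \frac{(\theta - d)^2}{\theta(1-\theta)}$, find the minimax estimate of $\theta$.

As in Example 7.1, the Bayes estimate of $\theta$ under the quadratic loss function is $d^\ast(x) = \frac{x+1}{n+2}$. Its Bayes risk with respect to the loss function $L(\theta, d^\ast)$ is

$$\begin{aligned} R(\pi, d^\ast) &= \int_0^1 \pi(\theta) \left[ \sum_{x=0}^n L(\theta, d^\ast(x))\,p(x \mid \theta) \right] d\theta = \int_0^1 \left[ \sum_{x=0}^n \frac{[d^\ast(x) - \theta]^2}{\theta(1-\theta)}\,p(x \mid \theta) \right] d\theta \\ &= \int_0^1 E_\theta\left[ \frac{X+1}{n+2} - \theta \right]^2 \frac{1}{\theta(1-\theta)}\,d\theta \\ &= \frac{1}{(n+2)^2} \int_0^1 \left[ (n-4) + \frac{1}{\theta(1-\theta)} \right] d\theta \\ &= \frac{n-4}{(n+2)^2} + \frac{1}{(n+2)^2} \int_0^1 \left[ \frac{1}{\theta} + \frac{1}{1-\theta} \right] d\theta \end{aligned}$$

This uses $E_\theta\left[\frac{X+1}{n+2} - \theta\right]^2 = \frac{n\theta(1-\theta) + (1-2\theta)^2}{(n+2)^2}$ from Example 7.1, together with $(1-2\theta)^2 = 1 - 4\theta(1-\theta)$. The last integral diverges, so $\frac{X+1}{n+2}$ has infinite Bayes risk under this loss.

Under the weighted loss, the posterior expected loss $\int \frac{(\theta - d)^2}{\theta(1-\theta)}\,p(\theta \mid x)\,d\theta$ is minimized by $d(x) = \frac{E[\theta\,w(\theta) \mid x]}{E[w(\theta) \mid x]}$, where $w(\theta) = \frac{1}{\theta(1-\theta)}$. With the $U(0,1)$ prior and $0 < x < n$, the weighted posterior is proportional to $\theta^{x-1}(1-\theta)^{n-x-1}$. For $x = 0$ or $x = n$, the posterior expected loss is finite only at $d = 0$ or $d = 1$ respectively. In every case, therefore, the Bayes estimate is $d(x) = \frac{x}{n}$. Its risk is

$$R\left(\theta, \frac{X}{n}\right) = \frac{E_\theta\left[\frac{X}{n} - \theta\right]^2}{\theta(1-\theta)} = \frac{\theta(1-\theta)/n}{\theta(1-\theta)} = \frac{1}{n}$$

which does not depend on $\theta$. Hence, by Theorem 7.2, $\frac{X}{n}$ is the minimax estimate of $\theta$ under this loss.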
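Example 7.5 Let $X_1, X_2, \cdots, X_n$ be a random sample drawn from a distribution with pdf $G(1, \frac{1}{\theta})$, that is, $p(x \mid \theta) = \theta e^{-\theta x}$, $x > 0$. To estimate $\theta$, let the prior pdf of $\theta$ be $\pi(\theta) = e^{-\theta}$, $\theta > 0$, and let the loss function be squared error. Find the Bayes estimate of $\theta$.

The marginal pdf of $X_1, X_2, \cdots, X_n$ is

$$\begin{aligned} g(x_1, x_2, \cdots, x_n) &= \int_0^\infty p(x_1, x_2, \cdots, x_n, \theta)\,d\theta = \int_0^\infty \pi(\theta)\,p(x_1, x_2, \cdots, x_n \mid \theta)\,d\theta \\ &= \int_0^\infty e^{-\theta(1+t)}\theta^{n+1-1}\,d\theta \quad \text{where } t = \sum_{i=1}^n x_i \\ &= \frac{n!}{(1+t)^{n+1}}, \quad 0 < t < \infty \end{aligned}$$

The posterior pdf of $\theta$ on $\Omega$ is

$$p(\theta \mid x_1, \cdots, x_n) = \frac{p(x_1, \cdots, x_n, \theta)}{g(x_1, \cdots, x_n)} = \frac{\pi(\theta)\,p(x_1, \cdots, x_n \mid \theta)}{g(x_1, \cdots, x_n)} = \frac{(1+t)^{n+1}}{n!} e^{-\theta(1+t)}\theta^n, \quad 0 < \theta < \infty$$

The Bayes estimate of $\theta$ is

$$d^\ast(x) = \frac{(1+t)^{n+1}}{n!} \int_0^\infty e^{-\theta(1+t)}\theta^{n+2-1}\,d\theta = \frac{(1+t)^{n+1}}{n!}\,\frac{(n+1)!}{(1+t)^{n+2}} = \frac{n+1}{1+t} = \frac{n+1}{\sum x_i + 1}$$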
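Here is a minimal sketch of the estimate, computed from the raw observations. The function name is hypothetical.

```c
/* Hypothetical sketch for Example 7.5: Bayes estimate of the exponential
   rate theta under the prior e^{-theta} and squared-error loss, i.e. the
   mean (n + 1)/(1 + t) of the Gamma(n + 1, rate 1 + t) posterior. */
double bayes_exp_rate(const double *x, int n)
{
    double t = 0.0;                  /* t = sum of the observations */
    for (int i = 0; i < n; i++)
        t += x[i];
    return (n + 1.0) / (1.0 + t);
}
```

For comparison, the maximum likelihood estimate of $\theta$ in this model is $n/t$. The prior modifies it by adding one to both the numerator and the denominator.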
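Example 7.6 Let $X_1, X_2, \cdots, X_n$ be an iid random sample drawn from a population with pmf $b(1, \theta)$. Assume the prior pdf of $\theta$ on $\Omega$ is

$$\pi(\theta) = \begin{cases} \frac{\theta^{a-1}(1-\theta)^{b-1}}{\beta(a,b)} & 0 < \theta < 1 \\ 0 & \text{otherwise} \end{cases}$$

Find the Bayes estimate of $\theta$ using the quadratic loss function.

The marginal pmf of $X_1, X_2, \cdots, X_n$ is

$$g(x_1, x_2, \cdots, x_n) = \int_0^1 \pi(\theta)\,p(x_1, x_2, \cdots, x_n \mid \theta)\,d\theta = \int_0^1 \frac{\theta^{t+a-1}(1-\theta)^{n-t+b-1}}{\beta(a,b)}\,d\theta = \frac{1}{\beta(a,b)}\,\frac{\Gamma(a+t)\Gamma(n-t+b)}{\Gamma(n+a+b)}$$

where $t = \sum x_i$. The posterior pdf of $\theta$ on $\Omega$ is

$$p(\theta \mid x_1, x_2, \cdots, x_n) = \frac{\Gamma(n+a+b)}{\Gamma(a+t)\Gamma(n+b-t)}\,\theta^{a+t-1}(1-\theta)^{n+b-t-1}, \quad 0 < \theta < 1$$

The Bayes estimate of $\theta$ is

$$d^\ast(x) = \frac{\Gamma(n+a+b)}{\Gamma(a+t)\Gamma(n+b-t)} \int_0^1 \theta^{a+1+t-1}(1-\theta)^{n+b-t-1}\,d\theta = \frac{a+t}{n+b+a} = \frac{\sum x_i + a}{n+b+a}$$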
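The Beta prior leads to a one-line estimate. This sketch takes the 0/1 observations and the prior parameters $a$ and $b$; its name is hypothetical. With $a = b = 1$ (the uniform prior) it reproduces $(t+1)/(n+2)$ from Example 7.2.

```c
/* Hypothetical sketch for Example 7.6: posterior mean (a + t)/(n + a + b)
   of theta for Bernoulli data x[0..n-1] with a Beta(a, b) prior. */
double bayes_beta_bernoulli(const int *x, int n, double a, double b)
{
    int t = 0;                         /* t = number of successes */
    for (int i = 0; i < n; i++)
        t += x[i];
    return (a + t) / (n + a + b);
}
```

The estimate is a weighted average of the prior mean $a/(a+b)$ and the sample proportion $t/n$, with weights $(a+b)/(n+a+b)$ and $n/(n+a+b)$.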
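Example 7.7 Let the prior pdf of $\theta$ on $\Omega$ be $N(0,1)$. Let $X_1, X_2, \cdots, X_n$ be an iid random sample drawn from a normal population with mean $\theta$ and variance 1. Find the Bayes estimate of $\theta$ and the Bayes risk with respect to the loss function $L[\theta, d] = [\theta - d]^2$.

The prior pdf of $\theta$ on $\Omega$ is

$$\pi(\theta) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}\theta^2}, \quad -\infty < \theta < \infty$$

The pdf of $X$ given $\theta$ is

$$p(x \mid \theta) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(x-\theta)^2}, \quad -\infty < x < \infty$$

The marginal density of $X_1, X_2, \cdots, X_n$ is

$$\begin{aligned} g(x_1, x_2, \cdots, x_n) &= \int_{-\infty}^\infty p(x_1, \cdots, x_n, \theta)\,d\theta = \int_{-\infty}^\infty \pi(\theta)\,p(x_1, \cdots, x_n \mid \theta)\,d\theta \\ &= \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}\theta^2} \left( \frac{1}{\sqrt{2\pi}} \right)^n e^{-\frac{1}{2}\sum (x_i - \theta)^2}\,d\theta \\ &= \frac{e^{-\frac{1}{2}\sum x_i^2}}{(2\pi)^{\frac{n+1}{2}}} \int_{-\infty}^\infty e^{-\frac{1}{2}[n\theta^2 + \theta^2 - 2n\theta\bar{x}]}\,d\theta = \frac{e^{-\frac{1}{2}\sum x_i^2}}{(2\pi)^{\frac{n+1}{2}}} \int_{-\infty}^\infty e^{-\frac{n+1}{2}\left[\theta^2 - \frac{2n\bar{x}\theta}{n+1}\right]}\,d\theta \\ &= \frac{e^{-\frac{1}{2}\sum x_i^2}}{(2\pi)^{\frac{n+1}{2}}}\,e^{\frac{n^2\bar{x}^2}{2(n+1)}} \int_{-\infty}^\infty e^{-\frac{n+1}{2}\left[\theta - \frac{n\bar{x}}{n+1}\right]^2}\,d\theta \end{aligned}$$

Put $t = \sqrt{n+1}\left(\theta - \frac{n\bar{x}}{n+1}\right)$, so that $t^2 = (n+1)\left(\theta - \frac{n\bar{x}}{n+1}\right)^2$ and $dt = \sqrt{n+1}\,d\theta$. Then

$$g(x_1, x_2, \cdots, x_n) = \frac{e^{-\frac{1}{2}\sum x_i^2 + \frac{n^2\bar{x}^2}{2(n+1)}}}{(2\pi)^{\frac{n+1}{2}}\sqrt{n+1}} \int_{-\infty}^\infty e^{-\frac{1}{2}t^2}\,dt = \frac{e^{-\frac{1}{2}\sum x_i^2 + \frac{n^2\bar{x}^2}{2(n+1)}}}{(2\pi)^{\frac{n+1}{2}}\sqrt{n+1}}\sqrt{2\pi} = \frac{e^{-\frac{1}{2}\sum x_i^2 + \frac{n^2\bar{x}^2}{2(n+1)}}}{\sqrt{n+1}\,(2\pi)^{\frac{n}{2}}}$$

The posterior pdf of $\theta$ on $\Omega$ is

$$p(\theta \mid x_1, \cdots, x_n) = \frac{\pi(\theta)\,p(x_1, \cdots, x_n \mid \theta)}{g(x_1, \cdots, x_n)} = \frac{\frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}\theta^2}\,\frac{1}{(\sqrt{2\pi})^n} e^{-\frac{1}{2}\sum (x_i - \theta)^2}\,\sqrt{n+1}\,(2\pi)^{\frac{n}{2}}}{e^{-\frac{1}{2}\sum x_i^2 + \frac{n^2\bar{x}^2}{2(n+1)}}} = \frac{1}{\sqrt{\frac{2\pi}{n+1}}} e^{-\frac{n+1}{2}\left[\theta - \frac{n\bar{x}}{n+1}\right]^2}, \quad -\infty < \theta < \infty$$

that is, $\theta \mid x \sim N\left(\frac{n\bar{x}}{n+1}, \frac{1}{n+1}\right)$. The Bayes estimate of $\theta$ is

$$d^\ast(x) = E[\theta \mid X_1 = x_1, \cdots, X_n = x_n] = \int_{-\infty}^\infty \theta\,p(\theta \mid x_1, x_2, \cdots, x_n)\,d\theta = \int_{-\infty}^\infty \theta\,\frac{1}{\sqrt{2\pi}}(n+1)^{\frac{1}{2}} e^{-\frac{n+1}{2}\left[\theta - \frac{n\bar{x}}{n+1}\right]^2}\,d\theta$$

Put $t = \sqrt{n+1}\left(\theta - \frac{n\bar{x}}{n+1}\right)$, so that $\theta = \frac{t}{\sqrt{n+1}} + \frac{n\bar{x}}{n+1}$ and $dt = \sqrt{n+1}\,d\theta$. Then

$$d^\ast(x) = \frac{1}{\sqrt{n+1}} \int_{-\infty}^\infty \frac{t\,e^{-\frac{t^2}{2}}}{\sqrt{2\pi}}\,dt + \frac{n\bar{x}}{n+1} \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}} e^{-\frac{t^2}{2}}\,dt = 0 + \frac{n\bar{x}}{n+1} = \frac{n\bar{x}}{n+1}$$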

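The Bayes risk of $d^\ast$ is

$$\begin{aligned} R(\pi, d^\ast) &= \int_{-\infty}^\infty \int_{-\infty}^\infty \left( \frac{n\bar{x}}{n+1} - \theta \right)^2 p(\bar{x} \mid \theta)\,\pi(\theta)\,d\bar{x}\,d\theta = \int_{-\infty}^\infty \pi(\theta)\,E_\theta\left[ \frac{n\bar{X}}{n+1} - \theta \right]^2 d\theta \\ &= \frac{1}{(n+1)^2} \int_{-\infty}^\infty \pi(\theta)\,E_\theta[n\bar{X} - n\theta - \theta]^2\,d\theta = \frac{1}{(n+1)^2} \int_{-\infty}^\infty \pi(\theta) \left\{ E_\theta[n(\bar{X} - \theta)]^2 + \theta^2 \right\} d\theta \\ &= \frac{1}{(n+1)^2} \int_{-\infty}^\infty \pi(\theta)\left[ n^2 V_\theta[\bar{X}] + \theta^2 \right] d\theta = \frac{1}{(n+1)^2} \int_{-\infty}^\infty \pi(\theta)\,[n + \theta^2]\,d\theta \quad \text{since } V_\theta[\bar{X}] = \frac{1}{n} \\ &= \frac{n}{(n+1)^2} \int_{-\infty}^\infty \pi(\theta)\,d\theta + \frac{1}{(n+1)^2} \int_{-\infty}^\infty \theta^2 \pi(\theta)\,d\theta = \frac{n}{(n+1)^2} + \frac{1}{(n+1)^2} \quad \text{since } \pi(\theta) \sim N(0,1) \\ &= \frac{n+1}{(n+1)^2} = \frac{1}{n+1} \end{aligned}$$

The cross term $-2\theta\,E_\theta[n(\bar{X} - \theta)]$ vanishes because $E_\theta[\bar{X}] = \theta$.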

7.3 Bayes confidence intervals


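Bayes confidence interval estimation takes prior knowledge of the experiment into account when constructing the confidence interval for a parameter $\theta$. If the posterior pdf $p(\theta \mid x_1, x_2, \cdots, x_n)$ of $\theta$ on $\Omega$ is known, then one can find functions $l_1(x)$ and $l_2(x)$ such that

$$P\{l_1(X) < \theta < l_2(X)\} = 1-\alpha$$

the probability being computed from the posterior distribution of $\theta$ for the observed sample. This gives the $(1-\alpha)$ level Bayes confidence interval for $\theta$. Thus

$$P\{l_1(X) < \theta < l_2(X)\} = \int_{l_1(x)}^{l_2(x)} p(\theta \mid x_1, x_2, \cdots, x_n)\,d\theta$$

or, when $\theta$ is discrete,

$$P\{l_1(X) < \theta < l_2(X)\} = \sum_{l_1(x)}^{l_2(x)} p(\theta \mid x_1, x_2, \cdots, x_n)$$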

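Example 7.8 Let $X_1, X_2, \cdots, X_n$ be iid $b(1, \theta)$ random variables and let the prior pdf $\pi(\theta)$ of $\theta$ on $\Omega$ be $U(0,1)$. Find the $(1-\alpha)$ level Bayes confidence interval for $\theta$.

As in Example 7.2,

$$p(\theta \mid x_1, x_2, \cdots, x_n) = \begin{cases} \frac{1}{\beta(t+1, n-t+1)}\,\theta^t (1-\theta)^{n-t} & 0 < \theta < 1, \text{ where } t = \sum x_i \\ 0 & \text{otherwise} \end{cases}$$

The $(1-\alpha)$ level Bayes confidence interval for $\theta$ is obtained from

$$P_\theta\{l_1(X) < \theta < l_2(X)\} = 1-\alpha$$

With equal tails, i.e., $P_\theta\{\theta \ge l_2(x)\} = \frac{\alpha}{2}$ and $P_\theta\{\theta \le l_1(x)\} = \frac{\alpha}{2}$,

$$\int_{l_2(x)}^1 \frac{1}{\beta(t+1, n-t+1)}\,\theta^t (1-\theta)^{n-t}\,d\theta = \frac{\alpha}{2} \qquad (7.1)$$

$$\int_0^{l_1(x)} \frac{1}{\beta(t+1, n-t+1)}\,\theta^t (1-\theta)^{n-t}\,d\theta = \frac{\alpha}{2} \qquad (7.2)$$

Solving equations (7.1) and (7.2) for $l_2(x)$ and $l_1(x)$ gives the $(1-\alpha)$ level Bayes confidence interval $(l_1(x), l_2(x))$ for $\theta$.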
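Example 7.9 Let $X_1, X_2, \cdots, X_n$ be an iid random sample drawn from a normal population $N(\theta, 1)$, $\theta \in \Omega \subseteq \Re$, and let the prior pdf $\pi(\theta)$ of $\theta$ on $\Omega$ be $N(0,1)$. Find the $(1-\alpha)$ level Bayes confidence interval for $\theta$.

As in Example 7.7, the posterior distribution of $\theta$ on $\Omega$ is

$$p(\theta \mid x_1, x_2, \cdots, x_n) \sim N\left( \frac{n\bar{x}}{n+1}, \frac{1}{n+1} \right)$$

The $(1-\alpha)$ level Bayes confidence interval is obtained from

$$P_\theta\{l_1(X) < \theta < l_2(X)\} = 1-\alpha$$

Consider the statistic

$$Z = \frac{\theta - \frac{n\bar{x}}{n+1}}{\sqrt{\frac{1}{n+1}}} \sim N(0,1)$$

Here $\theta$ is a random variable. If one selects the equal-tails confidence interval, then

$$P_\theta\left\{ -z_{\frac{\alpha}{2}} < \left( \frac{n\bar{X}}{n+1} - \theta \right)\sqrt{n+1} < z_{\frac{\alpha}{2}} \right\} = 1-\alpha$$

$$P_\theta\left\{ \frac{n\bar{X}}{n+1} - \frac{z_{\frac{\alpha}{2}}}{\sqrt{n+1}} < \theta < \frac{n\bar{X}}{n+1} + \frac{z_{\frac{\alpha}{2}}}{\sqrt{n+1}} \right\} = 1-\alpha$$

Hence

$$\left( \frac{n\bar{x}}{n+1} - \frac{z_{\frac{\alpha}{2}}}{\sqrt{n+1}},\; \frac{n\bar{x}}{n+1} + \frac{z_{\frac{\alpha}{2}}}{\sqrt{n+1}} \right)$$

is the $(1-\alpha)$ level Bayes confidence interval for $\theta$.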
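Example 7.10 Let $X_1, X_2, \cdots, X_n$ be a random sample from a Poisson distribution with unknown parameter $\theta$. Assume that the prior pdf $\pi(\theta)$ of $\theta$ on $\Omega$ is

$$\pi(\theta) = \begin{cases} \frac{\alpha^\beta}{\Gamma\beta} e^{-\alpha\theta}\theta^{\beta-1} & \theta > 0,\ \alpha, \beta > 0 \\ 0 & \text{otherwise} \end{cases}$$

Find the $(1-\alpha)$ level Bayes confidence interval for $\theta$.

The pmf of $X_1, X_2, \cdots, X_n$ given $\theta$ is

$$p(x_1, x_2, \cdots, x_n \mid \theta) = \frac{e^{-n\theta}\theta^t}{\prod_{i=1}^n x_i!} \quad \text{where } t = \sum_{i=1}^n x_i$$

The marginal pmf of $X_1, X_2, \cdots, X_n$ is

$$g(x_1, x_2, \cdots, x_n) = \int_0^\infty \frac{\alpha^\beta}{\Gamma\beta} e^{-\alpha\theta}\theta^{\beta-1}\,\frac{e^{-n\theta}\theta^t}{\prod_{i=1}^n x_i!}\,d\theta = \frac{1}{\prod_{i=1}^n x_i!}\,\frac{\alpha^\beta}{\Gamma\beta}\,\frac{\Gamma(\beta+t)}{(\alpha+n)^{\beta+t}}$$

The posterior pdf of $\theta$ on $\Omega$ is

$$p(\theta \mid x_1, x_2, \cdots, x_n) = \frac{(\alpha+n)^{\beta+t}}{\Gamma(\beta+t)} e^{-(n+\alpha)\theta}\theta^{\beta+t-1}, \quad \theta > 0$$

The $(1-\alpha)$ level confidence interval for $\theta$ is obtained from

$$P\{l_1(X) < \theta < l_2(X)\} = 1-\alpha$$

With equal tails, i.e., $P_\theta\{\theta \ge l_2(x)\} = \frac{\alpha}{2}$ and $P_\theta\{\theta \le l_1(x)\} = \frac{\alpha}{2}$,

$$\int_{l_2(x)}^\infty p(\theta \mid x_1, x_2, \cdots, x_n)\,d\theta = \frac{\alpha}{2} \qquad (7.3)$$

$$\int_0^{l_1(x)} p(\theta \mid x_1, x_2, \cdots, x_n)\,d\theta = \frac{\alpha}{2} \qquad (7.4)$$

Solving equations (7.3) and (7.4) gives the $(1-\alpha)$ level Bayes confidence interval $(l_1(X), l_2(X))$ for $\theta$.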
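Example 7.11 Let $X_1, X_2, \cdots, X_n$ be a sample drawn from a normal population $N(\theta, 1)$. Assume that the prior pdf $\pi(\theta)$ on $\Omega$ is $U(-1, 1)$. Find the $(1-\alpha)$ level Bayes confidence interval for $\theta$.

The pdf of $X_1, X_2, \cdots, X_n$ is

$$p(x_1, x_2, \cdots, x_n \mid \theta) = \left( \frac{1}{\sqrt{2\pi}} \right)^n e^{-\frac{1}{2}\sum (x_i - \theta)^2}, \quad -\infty < x_i < \infty$$

The prior pdf of $\theta$ on $\Omega$ is

$$\pi(\theta) = \begin{cases} \frac{1}{2} & -1 < \theta < 1 \\ 0 & \text{otherwise} \end{cases}$$

The marginal pdf of $X_1, X_2, \cdots, X_n$ is

$$\begin{aligned} g(x_1, x_2, \cdots, x_n) &= \int_{-\infty}^\infty \pi(\theta)\,p(x_1, x_2, \cdots, x_n \mid \theta)\,d\theta = \int_{-1}^1 \frac{1}{2} \left( \frac{1}{\sqrt{2\pi}} \right)^n e^{-\frac{1}{2}\sum (x_i - \theta)^2}\,d\theta \\ &= \frac{e^{-\frac{1}{2}\sum x_i^2}}{2(2\pi)^{\frac{n}{2}}} \int_{-1}^1 e^{-\frac{n}{2}[\theta^2 - 2\theta\bar{x}]}\,d\theta = \frac{e^{-\frac{1}{2}\sum x_i^2}}{2(2\pi)^{\frac{n}{2}}} \int_{-1}^1 e^{-\frac{n}{2}\{[\theta - \bar{x}]^2 - \bar{x}^2\}}\,d\theta \\ &= \frac{e^{-\frac{1}{2}\sum x_i^2 + \frac{n\bar{x}^2}{2}}}{2(2\pi)^{\frac{n}{2}}} \int_{-1}^1 e^{-\frac{n}{2}[\theta - \bar{x}]^2}\,d\theta \end{aligned}$$

Put $t = (\theta - \bar{x})\sqrt{n}$. If the range of integration is treated as approximately $(-\infty, \infty)$, which is reasonable when $n$ is large and $\bar{x}$ lies well inside $(-1, 1)$, then

$$g(x_1, x_2, \cdots, x_n) \approx \frac{e^{-\frac{1}{2}\sum x_i^2 + \frac{n\bar{x}^2}{2}}}{2(2\pi)^{\frac{n}{2}}} \int_{-\infty}^\infty e^{-\frac{t^2}{2}}\,\frac{dt}{\sqrt{n}} = \frac{e^{-\frac{1}{2}\sum x_i^2 + \frac{n\bar{x}^2}{2}}}{2\sqrt{n}\,(2\pi)^{\frac{n}{2}}}\sqrt{2\pi}$$

The posterior pdf of $\theta$ on $\Omega$ is then

$$p(\theta \mid x_1, x_2, \cdots, x_n) = \frac{\pi(\theta)\,p(x_1, \cdots, x_n \mid \theta)}{g(x_1, \cdots, x_n)} \approx \frac{\frac{1}{2}\left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac{1}{2}\sum (x_i - \theta)^2}}{\frac{\sqrt{2\pi}}{2\sqrt{n}\,(2\pi)^{n/2}}\,e^{-\frac{1}{2}\sum x_i^2 + \frac{n\bar{x}^2}{2}}} = \frac{\sqrt{n}}{\sqrt{2\pi}} e^{-\frac{n}{2}[\theta - \bar{x}]^2}, \quad -\infty < \theta < \infty$$

that is, approximately $\theta \sim N\left(\bar{x}, \frac{1}{n}\right)$. (Strictly, the posterior is this normal distribution truncated to $-1 < \theta < 1$.) The $(1-\alpha)$ level Bayes confidence interval for $\theta$ is obtained from

$$P\{a < Z < b\} = 1-\alpha \quad \text{where } Z = \frac{\theta - \bar{x}}{1/\sqrt{n}} \sim N(0,1)$$

$$P\left\{ -z_{\frac{\alpha}{2}} < Z < z_{\frac{\alpha}{2}} \right\} = 1-\alpha$$

$$P\left\{ \bar{X} - \frac{z_{\alpha/2}}{\sqrt{n}} < \theta < \bar{X} + \frac{z_{\alpha/2}}{\sqrt{n}} \right\} = 1-\alpha$$

Thus the $(1-\alpha)$ level Bayes confidence interval for $\theta$ is

$$\left( \bar{x} - \frac{z_{\alpha/2}}{\sqrt{n}},\; \bar{x} + \frac{z_{\alpha/2}}{\sqrt{n}} \right)$$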


Problems

7.1 Given $n$ independent observations from a Poisson distribution with mean $\lambda$, find the Bayes estimate of $\lambda$, assuming the prior distribution $\pi(\lambda) = e^{-\lambda}$, $0 < \lambda < \infty$.

7.2 If $d$ is a Bayes estimator of $\theta$ relative to some prior distribution and the risk function does not depend on $\theta$, show that $d$ is minimax.

7.3 Define the terms: loss function, risk function and minimax estimator. Explain a procedure for computing the minimax estimator under the squared error loss function.

7.4 Explain the Bayes and minimax estimation procedures. Find the Bayes estimate of $\theta$ using the quadratic loss function, given a random sample from $p(x \mid \theta) = \theta^x (1-\theta)^{1-x}$, $x = 0, 1$, and the prior distribution $\pi(\theta) = 2\theta$, $0 \le \theta \le 1$.

7.5 Let $X_1, X_2, \cdots, X_n$ be a sample drawn from a normal population $N(\theta, 1)$. Assume that the prior pdf $\pi(\theta)$ on $\Omega$ is $U(-1, 1)$. Find the $(1-\alpha)$ level Bayesian confidence interval for $\theta$. Also comment on your confidence interval.

7.6 Explain the concepts of Bayes estimation.

7.7 Distinguish between interval estimation and Bayes interval estimation.

7.8 For a given value $\theta$ on $\Omega \subseteq \Re$ and the prior density $\pi(\theta)$, the joint pdf $p(x, \theta)$ can be expressed as
(a) $p(x, \theta) = p(x \mid \theta)\pi(\theta)$
(b) $p(x, \theta) = g(x)p(x \mid \theta)$
(c) $p(x, \theta) = \frac{g(\theta)}{p(\theta \mid x)}$
(d) $p(x, \theta) = \frac{\pi(\theta)}{p(x \mid \theta)}$

7.9 For a given value $X = x$, where $p(\theta \mid x)$ is the posterior pdf of $\theta$ on $\Omega \subseteq \Re$ and $g(x)$ is the marginal density of $X$, the joint pdf $p(x, \theta)$ can be expressed as
(a) $p(x, \theta) = g(x)p(\theta \mid x)$
(b) $p(x, \theta) = \frac{g(x)}{p(\theta \mid x)}$
(c) $p(x, \theta) = \frac{\pi(\theta)}{p(\theta \mid x)}$
(d) $p(x, \theta) = g(x)p(x \mid \theta)$

7.10 Which of the following statements are correct?
(1) Properties of Bayes estimators are given in terms of minimum risk.
(2) For large $n$, Bayes estimators tend to MLEs irrespective of the prior density $\pi(\theta)$ of $\theta$ on $\Omega$.
(3) Bayes estimators in many cases are asymptotically consistent.
(4) The goodness of a Bayes estimator is measured in terms of the mean squared error loss function.
State the correct answer given below:
(a) 1 and 2 (b) 2 and 3 (c) 3 and 4 (d) 1, 2, 3 and 4

7.11 A Bayes estimator is
(a) unbiased
(b) not unbiased
(c) asymptotically normal
(d) None of the above

7.12 Which of the following statements is true?
The main feature of the Bayes approach to the estimation of a parameter is
(a) to consider the parameter a random variable
(b) to specify the prior distribution
(c) to specify the posterior distribution
(d) All the above

7.13 A Bayes estimator is
(a) always asymptotically normal
(b) always a function of minimal sufficient statistics
(c) most efficient
(d) both (a) and (c)

7.14 Which of the following statements are true?
(1) Bayes estimation uses the prior information of the distribution to completely specify the realization of the distribution.
(2) Bayes estimation involves only a single observation from the distribution of $\theta$ on $\Omega$.
(3) In Bayes estimation, repeating a random experiment means taking another value $\theta'$ on $\Omega$ from the prior distribution, then drawing a set of observations from the distribution $P_{\theta'}$ of a random variable $X$.
Choose the correct answer given below:
(a) 1 and 2 (b) 1 and 3 (c) 2 and 3 (d) 1, 2 and 3

ANSWERS TO THE PROBLEMS


Chapter 1
1.1 b, 1.2 c, 1.3 b, 1.4 b, 1.5 d, 1.6 a, 1.7 c, 1.8 d, 1.9 b, 1.10 c, 1.11 d, 1.12 d, 1.13 d, 1.14 c, 1.15 d, 1.16 b, 1.17 d, 1.18 a, 1.19 c, 1.20 a, 1.21 a, 1.22 b, 1.23 c, 1.24 d, 1.25 c, 1.26 d, 1.27 c, 1.28 a

Chapter 2
2.26 b, 2.27 c, 2.28 a, 2.29 b, 2.30 b, 2.31 a

Chapter 3
3.11 a, 3.12 b, 3.13 b, 3.14 a

Chapter 4
4.14 c, 4.15 b, 4.16 c, 4.17 d, 4.18 b, 4.19 d, 4.20 a

Chapter 5
5.35 a, 5.36 c, 5.37 b, 5.38 a, 5.39 d, 5.40 c, 5.41 a, 5.42 d

Chapter 6
6.15 a, 6.16 c, 6.17 b, 6.18 a

Chapter 7
7.8 a, 7.9 a, 7.10 d, 7.11 b, 7.12 a, 7.13 b, 7.14 d

Glossary of Notation

N - Set of natural numbers


I+ - Set of positive integers
ℜ - Real number system
Ω - Parameter space
pdf - Probability density function
pmf - Probability mass function
p(x | θ) - Given parameter θ, the pdf
or pmf of the random variable X
π(θ) - prior pdf or pmf of θ on Ω
p(θ | x) - Posterior pdf or pmf of θ on Ω
p(x, θ) - Joint pdf or joint pmf of the random variable X
and the random variable θ
p(x, y) - Joint pdf or joint pmf of the random variables X and Y
T = t(X) - t(X1 , X2 , · · · , Xn ), n = 1, 2, · · · , a function
of the random sample
MLE - Maximum Likelihood Estimator
UMVUE - Uniformly Minimum Variance Unbiased Estimator
LMVUE - Locally Minimum Variance Unbiased Estimator
MVBE - Minimum Variance Bound Estimator
BLUE - Best Linear Unbiased Estimator
LSE - Least Squares Estimator
iid - Independent identically distributed
b(1, θ) - Bernoulli with parameter θ
b(n, θ) - Binomial with parameters n and θ
G(n, θ) - Gamma with parameters n and θ
exp(θ) - Exponential with parameter θ
P (θ) - Poisson with parameter θ
U(a, b) - Uniform on (a, b)
N (θ, σ 2 ) - Normal with mean θ, variance σ 2
df - degrees of freedom
tn - Student’s t distribution with n df
F (m, n) - F - distribution with (m, n) df

APPENDIX
Normal curve ordinate
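#include <stdio.h>
#include <math.h>

/* Standard normal density; 2.506628 approximates sqrt(2 * pi). */
static double phi(double x)
{
    return exp(-0.5 * x * x) / 2.506628;
}

/* Simpson's rule for the area under phi from a to b with n (even) strips. */
static double simpson_area(double a, double b, int n)
{
    double l = (b - a) / n, s1 = 0.0, s2 = 0.0;
    int i;

    for (i = 1; i <= n - 1; i += 2)     /* odd ordinates, weight 4 */
        s1 += phi(a + i * l);
    for (i = 2; i <= n - 2; i += 2)     /* even interior ordinates, weight 2 */
        s2 += phi(a + i * l);
    return l / 3.0 * (phi(a) + phi(b) + 4.0 * s1 + 2.0 * s2);
}

int main(void)
{
    double a, area, b;
    int n;

    printf("Enter the value of a and area\n");
    if (scanf("%lf %lf", &a, &area) != 2)
        return 1;
    printf("Enter the number of intervals n (even)\n");
    if (scanf("%d", &n) != 1 || n < 2 || n % 2 != 0)
        return 1;

    /* 0 <= a, b <= +3: increase b in steps of 0.01 until the
       area from a to b exceeds the given area. */
    for (b = a + 0.01; b <= 3.0; b += 0.01)
        if (simpson_area(a, b, n) - area >= 0.0001)
            break;

    if (b > 3.0)
        printf("The given area is not reached for b <= 3\n");
    else
        printf("The ordinate of the given area = %4.2f\n", b);
    return 0;
}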

Normal Curve Area


#include <stdio.h>
#include <math.h>

/* Standard normal density; 2.506628 approximates sqrt(2 * pi). */
static double phi(double x)
{
    return exp(-0.5 * x * x) / 2.506628;
}

int main(void)
{
    double a, b, l, s1 = 0.0, s2 = 0.0, area;
    int i, n;

    /* 0 <= a, b <= +3 */
    printf("Enter the value of a\n");
    if (scanf("%lf", &a) != 1)
        return 1;
    printf("Enter the value of b\n");
    if (scanf("%lf", &b) != 1)
        return 1;
    printf("Enter the value of n (even)\n");
    if (scanf("%d", &n) != 1 || n < 2 || n % 2 != 0)
        return 1;

    /* Simpson's rule with n strips of width l on [a, b]. */
    l = (b - a) / n;
    for (i = 1; i <= n - 1; i += 2)     /* odd ordinates, weight 4 */
        s1 += phi(a + i * l);
    for (i = 2; i <= n - 2; i += 2)     /* even interior ordinates, weight 2 */
        s2 += phi(a + i * l);
    area = l / 3.0 * (phi(a) + phi(b) + 4.0 * s1 + 2.0 * s2);

    printf("The area for the given ordinate = %4.5f\n", area);
    return 0;
}

