Beruflich Dokumente
Kultur Dokumente
NET/JRF/CSIR EXAMINATIONS
A. SANTHAKUMARAN
Dr. A. Santhakumaraan
Associate Professor and Head
Department of Statistics
Salem Sowdeswari College
Salem - 636010
Tamil - Nadu
E-mail: ask.stat @ yahoo.com
About the Author
Dr. A. Santhakumaran is an Associate Professor and Head Department of Statistics at
Salem Sowdeswari College, Slaem - 10, Tamil Nadu. He holds a Ph.D. in Statistics -
Mathematics from the Ramanujan Institute for Advanced Study in Mathematics, Univer-
sity of Madras. He has interests in Stochastic Processes and Their Applications. He has
to his credit over 31 research papers in Feedback Queues, Statistical Quality Control and
Reliability Theory. He is the authour of the book Fundamentals of Testing Statistical
Hypotheses and Research Methodology.
Acknowledgments
Even though the science of Statistics was originated more than 200 years ago ,
it was recognized as a separate discipline in the early 1940 in India. From then to
till now statistics is evolving as a versatile powerful and indispensable instrument for
analyzing the statistical data in real life problems. We have reached a stage where no
empirical science can afford to ignore the science of Statistics, since the diagnosis of
pattern of recognition can be achieved through the science of Statistics. Because of the
speedy growth of modern science and technology, one who learns statistics, he must
have capacity, knowledge and intellect. Bird has capacity to imitate when we taught.
The child is not born with a language. But it is born into an innate capacity to learn
language. So when we teach the child, the child manipulates the structure and creates
sentences. But a bird cannot do this. So the child has knowledge and capacity to create
new sentences. If a man has the ability and knowledge he can be inventiveness and
innovation constitute intellect.
If a student has ability, knowledge and intellect, then he will be able to learn and
implement statistics successfully. If these three faculties are lacking, learning of statis-
tics will not be possible. We shall give a number of examples drawn from the story of
improvement of natural knowledge and the success of decision making. It shows how
statistical ideas played an important role in scientific investigations and other decision
making processes. The most successful man in life is one who makes the best deci-
sion based on the available information. Practically it is a very difficult task to take a
decision on a real life problems. We illustrate this with the help of following examples.
One wants to know that how many ways a bread can be divided into two equivalent
parts. Immediately one reflects that it is divided into a finite number of ways. In fact
the bread is divided into two equivalent parts in infinite number of ways. Naturally
every article can have infinite dimension. Our interest of study may be one dimension
namely, length of the bread, Area ( = length × breath ) two dimension and Volume
( = length × height × breadth) three dimension and so on. Analogous to this are
the measures of average ( location), measures of variability ( scale) and measures of
skewness and kurtosis (shape).
Another example is that a new two wheeler is introduced by a manufacturer in the
market. The manufacturer wants to announce that the two wheeler gives how much
kilometer per litre on road. For this purpose, the manufacturer ride the two wheeler on
the road three times and observed that the two wheeler gives 50 km per litre, 55 km
per litre and 60 km per litre respectively. Suddenly one comes to the mind that the two
wheeler gives = 50+55+60 3 = 55 km per litre. This is absolutely wrong. Actually the
two wheeler gives 60 km per litre, the value of the maximum order statistic.
A cyclist pedals from his house to his college at a speed of 10 mph and returns back
his house from the college at a speed of 15 mph. He wants to know his average speed.
One assumes that the distance between the house and the college is x miles. Then the
average speed of the cyclist = TotalTotal distance = x 2x x = 12 mph which is the
time taken 10 + 15
Harmonic Mean.
Seven students and a master want to cross a river from one side to other side. The
students are not able to swim to cross the river. The master measures average height
of the students which is 5’.5”. He also measures the depth of the river from one side
5
to other side in 10 places 2’, 2’.5”, 4’, 5’.5”, 6’, 6’.5”, 10’, 2’.5”,1’.5”,1’which has
4’.15” average depth of the river. The master takes a decision to cross the river on foot,
since average height of the students is greater than the average depth of the river. The
students fail to cross the river, since some place the depth of the river is more than
5’.5”. The master is not happy for his decision. The master has succeeded to take a
decision if the minimum height of the students is greater than the maximum depth of
the river.
Keeping this in mind, the first chapter of the book deals with some of the well
known distributions he pattern of recognition of statistical distributions. Chapter 2
gives the criteria of point estimation. Chapter 3 focuses on the study of optimal estima-
tion. Chapter 4 illustrates the properties of complete family of distributions. Chapter 5
explains the methods of estimation. Chapter 6 discusses interval estimation. Chapter 7
consists of Bayesian estimation.
6
DISTINCTIVE FEATURES
• Care has been taken to provide conceptual clarity, simplicity and up to date ma-
terials.
• Properly graded and solved problems to illustrate each concept and procedure
are presented in the text.
• About 300 solved problems and 50 remarks.
• A chapter on complete family of distributions.
January 2010
7
CONTENTS
8
4.1 Introduction
4.2 Uniformly Minimum Variance Unbiased Estimator
4.3 Uncorrelatedness Approach
4.4 Rao - Balckwell Theorem
4.5 Lehman - Scheffe Theorem
4.6 Inequality Approach
4.7 Cramer Rao Inequality
4.8 Chapman - Robbin Inequality
4.9 Efficiency
4.10 Extension of Cramer- Rao Inequality
4.11 Cramer - Rao Inequality - Multiparameter case
4.12 Bhattacharya Inequality
9
7.1 Introduction
References
Glossary of Notation
Appendix
Answers to problems
Index
10
Probability Models and their Parametric Estimation
1.1 Introduction
Statistics is a decision making tool which aims to resolve the real life problems.
It originated more than 2000 years ago, but it was recognized as a separate discipline
from 1940 in India. From then till now , statistics is evolving as a versatile powerful and
indispensable instrument for investigation in all fields of real life problems. It provides
a wide variety of analytical tools. We have reached a stage where no empirical science
can afford to ignore the science of statistics since the diagnosis of pattern of recognition
can be achieved through the science of statistics.
Statistics is a method of obtaining and analyzing data in order to take decisions
on them. In India, during the period of Chandra Gupta Maurya there was an efficient
system of collecting official and administrative statistics. During Akbar’s reign ( 1556
- 1605AD) people maintained good records of land and agricultural statistics. Statistics
surveys were also conducted during his reign.
Sir Ronald A. Fisher known as Father of statistics placed statistics on a very
sound footing by applying it to various diversified fields. His contributions in statistics
led to a very responsible position of statistics among sciences
Professor P. C. Mahalanobis is the founder of statistics in India. He was a
physicist by training , a statistician by instinct and an economist by conviction. Gov-
ernment of India has observed on 29th June the birthday of Professor Prasanta Chan-
dra Mahalanobis as National Statistics Day. Professor C.R. Rao is an Indian legend
, whose career spans the history of modern statistics. He is considered by many to be
the greatest living statistician in the world to day.
There are many definitions of the term statistics . Some authors have defined
statistics as statistical data ( plural sense) and others as statistical methods ( singular
sense).
11
A. Santhakumaran
12
Probability Models and their Parametric Estimation
If the chosen distribution is not a good approximation of the data, then the analyst
goes to the second step, chooses a different family of distributions and repeats the
procedure.
If the several iterations of this procedure fail to give a fit between an assumed
distributional form and the collected data, then the empirical form of the distribution
may be used.
13
A. Santhakumaran
(iv) Beware of the possibility of data censoring, in which a quantity of interest is not
observed in its entirety. This problem most often occurs when the analyst is
interested in the time required to complete some process but the process begins
prior to or finishes after the completion of the observation period. Censoring can
result in especially long process times being left out of the data sample.
(v) One may use scatter diagram which indicates the relationship between the two
variables of interest.
(vi) Consider the possibility that a sequence of observations which appear to be in-
dependent may possess autocorrelation. Autocorrelation may exist in successive
time periods.
14
Probability Models and their Parametric Estimation
θ remains constant from trial to trial. For one trial the pmf
x
θ (1 − θ)1−x x = 0, 1, 0 < θ < 1
pθ (x) =
0 otherwise
It is the probability that the event {X = x} occurs, when there are x failures
followed by a success.
A couple decides to have any number of children until they have a male
child. If the probability of having a male child in their family is p , they have
to expect how many children they will have before the first male child is born.
X denotes the number of children of the couple. The probability that there are
x female children preceding the first male child is born, is a Geometric random
variable.
1.4.4 Negative Binomial Distribution
PnX1 , X2 , · · · , Xn are iid Geometric variables, then T
If = t(X) =
i=1 Xi ∼ a Negative Binomial variate whose pmf is
(
(t+n−1)! n t
pθ (t) = t!(n−1)! θ (1 − θ) t = 0, 1, · · ·
0 otherwise
15
A. Santhakumaran
This will happen if the last trial results in a success and among the previous
(n + x − 1) trials there are exactly x failures. Note that if n = 1 , then p(x)θ
is the Geometric distribution function. Negative Binomial distribution has Mean
< Variance . In a production process, the number of units that are required to
achieve nth defective in x + n units follow Negative Binomial distribution.
1.4.5 Multinomial Distribution
If the sample space of a random experiment has been split into more than two
mutually exclusive and exhaustive events then one can define a random vari-
able which leads to Multinomial distribution. Let E1 , E2 , · · · , Ek be k mu-
tually exclusive and exhaustive events of a random experiment with respec-
tive probabilities θ1 , θ2 , · · · , θk , such that θ1 + θ2 + · · · + θk = 1 and
0 < θi < 1, i = 1, 2, · · · , k, then the probability that E1 occurs x1 times, E2
occurs x2 times, · · · , Ek occurs xk times in n independent trials is known
as Multinomial distribution with pmf is given by
x
n!
θx1 θ2x2 where ki=1 xi = n
P
x1 !x2 !···xk ! 1
· · · θk k
pθ1 ,θ2 ,··· ,θk (x1 , x2 , · · · , xn ) =
0 otherwise
If k = 2 , that is, the number of mutually exclusive events is only two, then the
Multinomial distribution becomes a Binomial distribution as is given by
n! x1 x2
pθ1 ,θ2 (x1 , x2 ) = x1 !x2 ! θ1 θ2 where x1 + x2 = n and θ1 + θ2 = 1
0 otherwise
Consider two brands A and B. Each individual in the population prefers brand
A to brand B with probability θ1 , prefers B to A with probability θ2 and is
indifferent between brand A and B with probability θ3 = 1 − θ1 − θ2 . In
a random sample of n individuals X1 prefers brand A, X2 prefers brand B
and X3 prefers some other brand other than A and B. Then the three random
variables follow a Trinomial distribution, i.e.,
16
Probability Models and their Parametric Estimation
A random experiment with complete uncertainty but whose outcomes are equal
probabilities may describe Uniform distribution. In a finite population of N
units, one has to select any unit xi , i = 1, 2, · · · , N from the population with
simple random sampling technique which has a discrete uniform distribution.
1.4.7 Hypergeometric Distribution
One situation in which Bernoulli trials are encountered is that in which an ob-
ject is drawn at random from a collection of objects of two types in a box. In
order to repeat this experiment so that the results are independent and identically
distributed, it is necessary to replace each object drawn and to mix the objects
before the next one is drawn. This process is referred to as sampling with re-
placement. If the sampling is done no replacement of the objects drawn, the
resulting trial are still of the Bernoulli type but no longer independent.
For example, four balls are drawn one at a time, at random and no replace-
ment from 8 balls in a box, 3 black and 5 red. The probability that the third ball
drawn is black, i.e.,
Sn = X1 + X2 + · · · + Xn
17
A. Santhakumaran
One can observe first that the probability of a given sequence of N objects is
1 1 1
···
N N −1 N −n+1
The probability that an object of type 1 occurs in the ith position in the sequence
of N objects is
M (N − 1)(N − 2) · · · (N − n + 1)
P {Xi = 1} =
N (N − 1) · · · (N − n + 2)(N − n + 1)
M
= i = 1, 2, · · · , n
N
where M is the number of ways of selecting the ith position with an object
coded 1 and (N − 1)(N − 2) · · · (N − n + 1) is the number of ways of selecting
the remaining (n − 1) places in the sequence from the (N − 1) remaining
objects. It does not matter whether the number of success among the n objects
drawn, one at a time, at random or that of simultaneously drawing n at random.
The probability function of Sn is
M N-M
k n - k
P {Sn = k} = N
k = 0, 1, 2, · · · , min(n, M )
n
0
otherwise
The random variable Sn with the above probability function is said to have a
Hypergeometric distribution. The mean of the random variable Sn is easily
obtained from the representation of a Hypergeometric variable as a sum of the
Bernoulli trials. That is,
E[Sn ] = E[X1 + X2 + · · · + Xn ]
= E[X1 ] + E[Xn ] + · · · + E[Xn ]
= 1 × P {X1 = 1} + 0 × P {X1 = 0}
+ · · · + 1 × P {Xn = 1} + 0 × P {Xn = 0}
M M nM
= + ··· + =
N N N
M N −M N −n
Variance of Sn = n if N ∈ I+ (1.1)
N N N −1
The probability at each trial that the object drawn is of the type of which there
are initially M is p = MN , then
N −n
Variance of Sn = npq if N ∈ I+ (1.2)
N −1
18
Probability Models and their Parametric Estimation
−n
The above formula (1.2) differs from the formula (1.1) by the extra factor N
N −1 .
N −n
The variance of Sn = npq N −1 in the no replacement case and the variance
of Sn = npq in the replacement case for fixed p and fixed n , since the factor
N −n
N −1 → 1 as N becomes finitely many. Thus Hypergeometric distribution is
exact where as Binomial distribution is approximate one.
50 students of the M.Sc. Statistics in a certain college are divided at random
into 5 batches of 10 each for the annual practical examination in Statistics. The
class consists of 20 resident students and 30 non - resident students. X denotes
the number of students in the first batch who appear the practical examination.
The Hypergeometric distribution is apt to describe the random variable X and
has the pmf
20 30
x 10 - x
x = 0, 1, 2, · · · , 10
50
P {X = x} =
10
0 otherwise
19
A. Santhakumaran
20
Probability Models and their Parametric Estimation
21
A. Santhakumaran
The value of the intercept on the vertical axis is always equal to the value of θ .
Note that all pdf 0 s eventually intersect at θ , since the Exponential distribution
has its mode at the origin. The mean and standard deviation are equal in Ex-
ponential distribution. In a random phenomenon, the time between independent
events which have memory less property may appropriately follow Exponential
random variable. For example, the time between the arrivals of a large number
of customers who act independently of each other may fit adequately the data to
Exponential distribution.
1.5.4 Gamma Distribution
A function used to define the Gamma distribution is the Gamma function. A
random variable X follows a Gamma distribution, if
( β
θ −θx β−1
pθ,β (x) = Γβ e x x > 0, β > 0, θ > 0
0 otherwise
22
Probability Models and their Parametric Estimation
Duration
of Hours Frequency p(x) F (x) = P {X ≤ x}
0≤x≤1 30 .30 .30
1<x≤2 25 .25 .55
2<x≤3 20 .20 .75
3<x≤4 25 .25 1.00
(b) Empirical Discrete Distributions
At the end of the day, the number of shipments on the loading docks of an export
company are observed as 0, 1 , 2, 3, 4 and 5 with frequencies 23, 15, 12, 10, 25
and 15 respectively. Let X be the number of shipments on the loading docks of
the company at the end of the day. Then X is a discrete random variable which
takes the values 0 , 1, 2, 3, 4 and 5 with the distribution as given in Table 1.2.
Figure 1.1 is the Histogram of number of shipments on the loading docks of the
company.
23
A. Santhakumaran
Number of
shipments x Frequency P {X = x} F (x) = P {X ≤ x}
0 23 .23 .23
1 15 .15 .38
2 12 .12 .50
3 10 .10 .60
4 25 .25 .85
5 15 .15 1.00
F
R
E
Q 25
U
E 20
N
C 15
Y 10
5
0 1 2 3 4 5
Number of shipments
Figure 1.1 Histogram of shipments
24
Probability Models and their Parametric Estimation
large and short times. The Exponential distribution has its mode at the origin but the
Gamma and Weibull distributions have their modes at some point( ≥ 0 ) which is a
function of the parameters values selected. The tail of the Gamma distribution is long,
like an Exponential distribution while the tail of the Weibull distribution may decline
more rapidly or less rapidly than that of an Exponential distribution. In practice, if
there are higher value of the variable than an Exponential distribution, it can account
for a Weibull distribution which provides a better distribution of the data.
Illustration 1.6.1
Sixteen equipments were produced and placed on test and the Table 1.3 gives the
length of time intervals between failures in hours.
For the sake of simplicity in processing the data , one can set up the ordered set as
given blow:
On this basis, one may construct a Histogram to judge the pattern of the data in Table
1.4. An approximate value of the interval can be determined from the formula.
maximum value - minimum value
∆t =
1 + 3.3 log10 N
where the maximum and minimum are the values in the ordered set and N is the total
number of items of the order statistics. In this case maximum value is 46 , minimum
value is 1 and N is 16. Thus ∆t = 1+3.345 log10 16 = 9.05 ≈ 10 = width of the class
interval.
25
A. Santhakumaran
Histogram is drawn based on the frequency distribution in Table 1.5 and is given in
Figure 1.2.
9
Number
of
Equipment
4
1 1 1
0 10 20 30 40 50
Time interval
Figure 1.2 Histogram of time to failures
The Histogram reveals that the distribution could be Negative Exponential or the
right portion of the Normal distribution. Assume the time to failure follows Exponen-
tial distribution of the form,
−θx
θe θ > 0, x > 0
pθ (x) =
0 otherwise
How for the assumption is valid has to be verified? The validity of the assumption
is tested by the χ2 test of goodness of fit.
26
Probability Models and their Parametric Estimation
Rx
where pi = xii+1 θe−θx dx = e−θxi − e−θxi+1 , i = 0, 10, 20, · · · , 50. If the cell
frequencies are less than 5, then it can be made 5 or more than 5. One may get two
classes only, i.e, the expected frequencies are equal to 8 each and the corresponding
observed frequencies are 9 and 7 respectively. The χ2 test of goodness of fit fails
to test the validity of the assumption that the sample data come from an Exponential
1
distribution with parameter θ = 13.38 = .0747 = failure rate per unit hour where the
mean life time of the equipments = 214 16 = 13.38 hours. To test the validity of the
assumption that the time to failure follows an Exponential distribution, consider the
likelihood function of the cell frequencies of o1 = 9 and o2 = 7 is
e1 o1 e2 o2
n!
o !o ! n n o1 + o2 = n
L= 1 2
0 otherwise
Under H0 the likelihood function follows a Binomial probability law b(16, p) where
p = en1 . To test the hypothesis that H0 : the fit is the best one vs H1 : the fit is not the
best one. It is equivalent to test the hypothesis that H0 : p ≤ .5 vs H1 : p > .5 The
UMP level α = .05 test is given by
1 if x > 11
φ(x) = .17 if x = 11
0 otherwise
The observed value is 9 which is less than 11. There is no evidence to reject the
hypothesis H0 . The data come from an Exponential distribution with 5% level of
significance. Thus time to failure of the equipments follows an Exponential distribu-
tion. One may conclude that on an average the equipment would be operated for 13.38
hours without failure.
27
A. Santhakumaran
Illustration 1.7.1
A sample of 20 repairing times of electronic watch was considered. The repairing
time X is a random variable. The values are in seconds on the random variable X .
The values are arranged in the increasing order of magnitude as in Table 1.7.
28
Probability Models and their Parametric Estimation
?
? Normal
? quantile
yj
?
?
?
?
?
?
Time xj
Figure 1.3 q − q plot of the repairing times
Note: The diagnosis of statistical distributions of real life problems are not exact
but at best they represent reasonable approximations.
Problems
1.1 The mean and variance of the number of defective items drawn randomly one
by one with replacement from a lot are found to be 10 and 6 respectively. The
distribution of the number of defective items is:
(a) Poisson with mean 10
29
A. Santhakumaran
30
Probability Models and their Parametric Estimation
31
A. Santhakumaran
is :
(a) Bernoulli distribution
(b) Binomial distribution
(c) Multinomial distribution
(d) Geometric distribution
Pn
1.18 If X1 , X2 , · · · , Xn are iid Geometric variables, then i=1 Xi follows:
(a) Negative Binomial distribution
(b) Binomial distribution
(c) Multinomial distribution
(d) Geometric distribution
1.19 A random variable X is related to a sequence of Bernoulli trials in which x
failures preceding the nth success in (x + n) trials is a :
(a) Binomial distribution
(b) Multinomial distribution
(c) Negative Binomial distribution
(d) Geometric distribution
1.20 If a random experiment has only two mutually exclusive outcomes of a Bernoulli
trial, then the random variable leads to:
(a) Binomial distribution
(b) Multinomial distribution
(c) Negative Binomial distribution
(d) Geometric distribution
1.21 A box contains N balls M of which are white and N − M are red. If X
denotes the number of white balls in the sample contains n balls with replace-
ment, then X is a :
(a) Binomial variate
(b) Bernoulli variate
(c) Negative Binomial variate
(d) Hypergeometric variate
1.22 The number of independent events that occur in a fixed amount of time may
follow:
(a) Exponential distribution
(b) Poisson distribution
(c) Geometric distribution
(d) Gamma distribution
1.23 A power series distribution
ax θ x
f (θ) x ∈ S, ax ≥ 0
Pθ {X = x} =
0 otherwise
p
where f (θ) = (1 + θ)n , θ = (1−p) and S = {0, 1, 2, · · · } . Then the random
variable X has
32
Probability Models and their Parametric Estimation
It is known as
(a) Binomial ( b) Negative Binomial (c) Poisson (d) Geometric
33
A. Santhakumaran
2.1 Introduction
In real life applications, determining appropriate distributions from the random
sample is a major task. Faulty assumption of distributions will lead to misleading rec-
ommendations. As a family of distributions induced by a parameter has been selected,
the next step is to estimate the parameters of the distribution. The criteria of the point
estimators for many standard distributions are described in this chapter.
The set of all admissible values of parameters of a distribution is called the parame-
ter space Ω . Any member from the parameter space is called parameter. For example,
a random variable X is assumed to follow a normal distribution with mean θ and
variance σ 2 . The parameter space Ω = {(θ, σ) | −∞ < θ < ∞, 0 < σ 2 < ∞} .
Suppose a random sample X1 , X2 , X3 , · · · , Xn is taken on X . Here a statistic
T = t(X) from the sample X1 , X2 , · · · , Xn which gives the best value for the pa-
rameter θ . The particular value of the Statistic T = t(x) = x̄ based on the values
x1 , x2 , · · · , xn is called an estimate of θ . If the statistic T = X̄ is used to estimate
the unknown parameter θ, then the sample mean is called an estimator of θ . Thus an
estimator is a rule or a procedure to estimate the value of θ . The numerical value x̄ is
called an estimate of θ .
34
Probability Models and their Parametric Estimation
2.5 Consistency
Consistency is a convergence property of an estimator. It is an asymptotic or large
sample size property. Let X1 , X2 , · · · , Xn be iid random sample drawn from a pop-
ulation with common distribution Pθ , θ ∈ Ω. An estimator T = t(X) is consistent
for θ if for every > 0 and for each fixed θ ∈ Ω, Pθ {|T −θ| > } → θ as n → ∞ ,
P
i.e. T → θ as n → ∞ for fixed θ ∈ Ω .
Example 2.1 Let X1 , X2 , · · · , Xn be a random sample drawn from a normal
population with mean θ and known variance σ 2 . The statistic T = X̄ is chosen for
2
an estimator of the parameter θ . The statistic X̄ ∼ N( θ, σn ). To test the consistency
of the estimator, consider for every > 0 and fixed θ ∈ Ω,
Pθ {|X̄ − θ| > } = 1 − Pθ {|X̄ − θ| < }
= 1 − Pθ {− < X̄ − θ < }
√ X̄ − θ √
= 1 − Pθ {− n/σ < √ < n/σ}
σ/ n
√ √
= 1 − Pθ {− n/σ < Z < n/σ}
X̄ − θ
where Z = √
σ/ n
= 1 − Pθ {−∞ < Z < ∞} as n → ∞
= 1 − 1 = 0 as n → ∞
P
Thus X̄ → θ as n → ∞ . The sample mean X̄ of the normal population is a
consistent estimator of the population mean θ .
Remark 2.1 In general sample mean need not be a consistent estimator of the
population mean.
Example 2.2 Let X1 , X2 , X3 , · · · , Xn be iid random sample from a Cauchy
population with pdf
1 1
π 1+(x−θ)2 −∞ < x < ∞
pθ (x) =
0 otherwise
For every > 0 and fixed θ ∈ Ω,
Pθ {|X̄ − θ| > } = 1 − Pθ {− < X̄ − θ < }
= 1 − Pθ {θ − < X̄ < θ + }
Z θ+
1 1
= 1− 2
dx̄
θ− π 1 + (x̄ − θ)
since X̄ ∼ Cauchy distribution with parameter θ
Z
1 1
= 1− 2
dz where x̄ − θ = z
− π 1 + z
1
= 1 − [tan−1 (z)]−
π
2
= 1 − tan−1 () since tan−1 (−θ) = − tan−1 (θ)
π
35
A. Santhakumaran
By Chebychev’s inequality
1
Pθ {|Tn − θ| > } ≤ Eθ [Tn − θ]2
2
1 h 2
i
≤ Vθ [Tn ] + {Eθ [Tn − θ]}
2
→0 as n → ∞
36
Probability Models and their Parametric Estimation
The Cauchy Principle value 0 is taken as the mean of the Cauchy distribution. Thus the
Cauchy distribution has not the mean finitely exist. Hence for the Cauchy population,
the sample mean X̄ is not a consistent estimator of the parameter θ .
Example 2.3 If X1 , X2 , · · P · , Xn is a random sample drawn from a normal popu-
1 n
lation N( 0, σ 2 ). ShowPthat 3n 4 4
k=1 Xk is a consistent estimator of σ .
1 n 4
Let T = 3n k=1 Xk .
n
1 X
Eσ4 [T ] = Eσ4 [Xk4 ]
3n
k=1
n
1 X
= Eσ4 [Xk − 0]4 since E[Xk ] = 0 ∀ k = 1, 2, · · ·
3n
k=1
1 1
= nµ4 = 3nσ 4 since µ4 = 3σ 4 where
3n 3n
µ2n = 1 × 3 × 5 × · · · × (2n − 1)σ 2n n = 1, 2, · · ·
= σ4
n
1 X
Vσ4 [T ] = Vσ4 [X 4 ]
(3n)2
k=1
n
1 Xn 4 2 4 2
o
= Eσ 4 [Xk ] − E σ 4 [Xk ]
(3n)2
k=1
1
= n[µ8 − µ24 ]
(3n)2
1
= [105σ 8 − (3σ 4 )2 ] since µ8 = 1 × 3 × 5 × 7 × σ 8
32 n
1
= 96σ 8 → 0 as n → ∞.
32 n
Thus T is a consistent estimator of σ 4 .
Example 2.4 Let X1 , X2 , · · · Xn be a random sample drawn from a population
Qn 1
with rectangular distribution ∪(0, θ), θ > 0 . Show that ( i=1 Xi ) n is a consistent
estimator of θe−1 .
37
A. Santhakumaran
Qn 1
Let GM = ( i=1 Xi ) n ∀ Xi > 0, i = 1, 2, · · · , n .
n
1X
loge GM = log Xi
n i=1
Z θ
1
Eθ [log X] = log xdx
θ 0
( Z θ )
1 θ
= [x log x]0 − dx
θ 0
1h i
= θ log θ − lim x log x − θ
θ x→0
= log θ − 1
1
log x
Since lim x log x = lim 1 = lim x1 = 0
x→0 x→0 x→0 − 2
x x
Z θ
1
Eθ [log X]2 = (log x)2 dx
θ 0
Z θ
1 2 θ 1 log x
= [x(log x) ]0 − 2x dx
θ θ 0 x
1 2
= (log θ)2 − lim x(log x)2 − [θ log θ − θ]
θ x→0 θ
= (log θ)2 − 2 log θ + 2 since lim x(log x)2 = 0
x→0
Vθ [log X] = (log θ)2 − 2 log θ + 2 − (log θ − 1)2 = 1
n
1 X 1
Vθ [log GM ] = Vθ [log Xi ] =
n2 i=1 n
Vθ [log GM ] → 0 as n → ∞, ∀ θ > 0
38
Probability Models and their Parametric Estimation
2
Pn
n(n+1) i=1 iXi is a consistent estimator of θ .
" n #
X
Eθ iXi = Eθ [X1 + 2X2 + · · · + nXn ]
i=1
= θ + 2θ + · · · + nθ
= θ[1 + 2 + · · · + n]
n(n + 1)
= θ
" n # 2
2 X
Eθ iXi = θ, ∀ θ ∈ Ω
n(n + 1) i=1
" n # n
X X
Vθ iXi = i2 Vθ [Xi ]
i=1 i=1
n
X
= σ2 i2
i=1
2 n(n
+ 1)(2n + 1)
= σ
" # 6
n
2 X 2 (2n + 1) 2
Vθ iXi = σ → 0 as n → ∞
n(n + 1) i=1 3 n(n + 1)
2
Pn
Thus n(n+1) i=1 iXi is a consistent estimator of θ.
h Eθ [T ]i→ θ and Vθ [T ] → 0 as n → ∞.
Thus T is a consistent estimator of θ . Also
θ2
Eθ (n+1)
n T = θ and V θ [ (n+1)
n T ] = n(n+2) → 0 as n → ∞, i.e.,
(n+1)
n T is
39
A. Santhakumaran
T 0 = T 2 − 2 → T 2 as n → ∞ since → 0 as n → ∞
.. . Pθ {|T 2 − θ2 | < 0 } → 1 as → ∞. Thus T 2 is a consistent estimator of θ2 .
40
Probability Models and their Parametric Estimation
Otherwise, the statistic g(T ) is said to be a biased estimator of τ (θ) . The unbiased
estimator is also called zero bias estimator. A statistic g(T ) is said to be asymptotically
unbiased estimator if Eθ [g(T )] → τ (θ) as n → ∞, ∀ θ ∈ Ω .
Example 2.8 A random variable X has the pdf
2θx if 0 < x < 1
pθ (x) = (1 − θ) if 1 ≤ x < 2, 0 < θ < 1
0 otherwise
Eθ [g(X)] = θ
Z 1 Z 2
g(x)2θxdx + g(x)(1 − θ)dx = θ
0 1
Z 1 Z 2 Z 2
θ 2xg(x)dx − g(x)dx + g(x)dx = θ
0 1 1
Z 1 Z 2
⇒ 2xg(x)dx − g(x)dx = 1 and
0 1
Z 2
g(x)dx = 0
1
Z 1
1
i.e., xg(x)dx = and
0 2
Z 2
g(x)dx = 0
1
R 1 1
R 2
Conversely, 0
xg(x)dx = 2 and 1
g(x)dx = 0, then g(X) is an unbiased esti-
mator of θ .
Z 1 Z 2
Eθ [g(X)] = 2θxg(x)dx + (1 − θ)g(x)dx
0 1
Z 1 Z 2
= 2θ xg(x)dx + (1 − θ) g(x)dx
0 1
1
= 2θ + (1 − θ) × 0
2
= θ
41
A. Santhakumaran
T (n − T )
n = 2, 3, · · ·
n(n − 1)
42
Probability Models and their Parametric Estimation
Eθ [g ∗ (T )] = θ2
n t
X θ
g ∗ (t)cnt (1 − θ)n = θ2
t=0
1−θ
n
X
g ∗ (t)cnt ρt = ρ2 (1 + ρ)n−2
t=0
= ρ2 [1 + cn−2
1 ρ + · · · + cn−2
t ρt + · · · + ρn−2 ]
∵ g(t)∗ cnt = cn−2
t−2
∗ (n − 2)!t!(n − t)!
⇒ g (t) =
(t − 2)!(n − t)!n!
(n − 2)!t(t − 1)!(t − 2)!
=
(t − 2)!n(n − 1)(n − 2)!
t(t − 1)
= n = 2, 3, · · · · · ·
n(n − 1)
1
Eθ [g(X)] =
θ
∞
X 1
g(x)θ(1 − θ)x−1 =
x=1
θ
∞
X (1 − θ)
g(x)(1 − θ)x =
x=1
θ2
Take 1 − θ = ρ ⇒ θ = 1−ρ
X∞
g(x)ρx = ρ(1 − ρ)−2
x=1
= ρ(1 + 2ρ + 3ρ2 + · · · + xρx−1 + · · · )
⇒ g(x) = x ∀ x = 1, 2, 3, · · ·
1
Thus g(X) = X is the unbiased estimator of θ .
43
A. Santhakumaran
Eθ [g(X)] = θ2
1
X
g(x)θx (1 − θ)1−x = θ2
x=0
g(0)(1 − θ) + g(1)θ = θ2
1
Consider Eθ [g(X)] =
θ
n x
X n! θ 1
g(x) (1 − θ)n =
i=0
x!(n − x)! 1 − θ θ
n
X n! (1 + ρ)n+1
g(x) ρx =
i=0
x!(n − x)! ρ
θ
where ρ = 1−θ
n+1
n!
ρx → g(0) as θ → 0 and (1+ρ)
P
g(x) x!(n−x)! ρ → ∞ as ρ → 0 or θ → 0
Thus there is no unbiased estimator exist of the parameter θ1 .
Let Eθ [g(X)] = θ2
1
X
g(x)θx (1 − θ)1−x = θ2
x=0
1 1
When θ = ⇒ 3g(0) + g(1) = (2.1)
4 4
44
Probability Models and their Parametric Estimation
1 1
When θ = ⇒ g(0) + g(1) = (2.2)
2 2
Solving the equations (2.1) and (2.2) for g(0) and g(1) , one gets the values of g(0) =
− 81 and g(1) = 58 , 1
−8 for x = 0
i.e., g(x) = 5
8 for x=1
Thus the unbiased estimator of θ2 is g(X) = X which is unique.
Unbiased estimator is not unique
Z ∞
1 − 1 t n+1−1
Eθ [T ] = e θ t dt
0 θn Γn
" n
# = nθ
X
Eθ Xi = nθ ∀ θ > 0
i=1
Eθ [nX̄] nθ ∀ θ > 0
=
⇒ Eθ [X̄] θ∀θ>0
=
Eθ [T 2 ] n(n + 1)θ2 ∀ θ > 0
=
Vθ [T ] nθ2 ∀ θ > 0
=
Pn
. i=1 Xi
. . Vθ [X̄] = Vθ
n
1
= Vθ [T ]
n2
1 2 θ2
= nθ =
n2 n
45
A. Santhakumaran
Z ∞
1 1 n
E[Y ] = 1 n
e− 2 y y 2 +1−1 dy
0 2 2 Γ 2
1 Γ( n2 + 1)
= n n
2 2 Γ n2 ( 1 ) 2 +1
2
= n
2
E[Y ] = n2 + 2n
V [Y ] = 2n
ns2
But Y = 2
σ2
ns
.. . Eσ2 = n
σ2
⇒ Eσ2 [s2 ] = σ2
Xi2
P
Thus n is an unbiased estimator of σ 2 .
2
ns
Vσ 2 = 2n
σ2
n2
Vσ2 [s2 ] = 2n
σ4
2σ 4
Vσ2 [s2 ] =
n
Example 2.17 Let Y1 < Y2 < Y3 be the order statistics of a random sample of
size 3 drawn from an uniform population with pdf
1
θ 0<x<θ
pθ (x) =
0 otherwise
Show that 4Y1 and 2Y2 are unbiased estimators of θ . Also find the variance of these
estimators.
The pdf of Y1 is
( hR i2
3! 1 θ 1
pθ (y1 ) = 1!2! θ y1 θ
dx 0 < y1 < θ
0 otherwise
46
Probability Models and their Parametric Estimation
3 y1 2
θ [1 − θ ] 0 < y1 < θ
pθ (y1 ) =
0 otherwise
Z θ
3 y1 2
Eθ [Y1 ] = y1 (1 − ) dy1
θ 0 θ
Z 1
3 y1
= θt(1 − t)2 θdt where θ =t
θ 0
Z 1
= 3θ t2−1 (1 − t)3−1 dt
0
Γ2Γ3 θ
= 3θ = ∀θ>0
Γ5 4
θ2 3θ2
Similarly Eθ [Y12 ] = and Vθ [Y1 ] =
10 15
3θ2
.. . Vθ [4Y1 ] =
5
The pdf of Y2 is
!
Z y2 Z θ
3! 1 1 1
pθ (y2 ) = dx dx
1!1!1! 0 θ θ y2 θ
6 y2
θ 2 y2 [1 − θ ] 0 < y2 < θ
pθ (y2 ) =
0 otherwise
.˙. Eθ [Y2 ] = θ2
2
θ2
⇒ 2Y2 is an unbiased estimator of θ and Eθ [Y 2 ] = 3θ 10 and Vθ [Y2 ] = 20
2
⇒ Vθ [2Y2 ] = θ5
Example 2.18 Let Y1 and Y2 be two independent and unbiased estimators of θ .
If the variance of Y1 is twice the variance of Y2 , find the constant k1 and k2 so that
k1 Y1 + k2 Y2 is an unbiased estimator of θ with smaller possible variance for such a
linear combination.
Given Eθ [Y1 ] = θ ∀ θ and Eθ [Y2 ] = θ ∀ θ and Vθ [Y1 ] = 2σ 2 and
47
A. Santhakumaran
k1 Eθ [Y1 ] + k2 Eθ [Y2 ] = θ
⇒ k1 + k2 = 1
i.e., k2 = 1 − k1
Consider φ = Vθ [k1 Y1 + k2 Y2 ]
= k12 Vθ [Y1 ] + k22 Vθ [Y2 ]
= k12 2σ 2 + (1 − k1 )2 σ 2
= 3k12 σ 2 − 2k1 σ 2 + σ 2
Differentiate twice this with respective to k1
dφ
= 6k1 σ 2 − 2σ 2
dk1
d2 φ
= 6σ 2
dk12
dφ d2 φ
For minimum =0 and >0
dk1 dk12
⇒ 6k1 σ 2 − 2σ 2 = 0
1 2
i.e., k1 = and k2 =
3 3
1
Thus 3 Y1 + 23 Y2 has minimum variance.
Consistent estimator need not be unbiased
Example 2.19 Let X1 , X2 , · · · , Xn be a sample of size P
n drawn from a normal
n
population with mean θ and variance σ 2 . Define s2 = n1 i=1 (Xi − X̄)2 , then
2
Y = ns 2 n−1 1
σ 2 ∼ χ distribution with (n − 1) degrees of freedom and Y ∼ G( 2 , 2 ) .
It has the pdf
( 1 n−1
n−1
1
n−1
e− 2 y y 2 −1 0 < y < ∞
p(y) = 2 2 Γ 2
0 otherwise
48
Probability Models and their Parametric Estimation
Z ∞
1 1 n−1
E[Y ] r
= n−1
n−1
e− 2 y y 2 +r−1 dy
0 2 2Γ 2
Γ n−1
1 2 +r
= n−1 n−1
2 2 Γ n−1
2 ( 12 ) 2
+r
r
2 n−1
= Γ +r
Γ n−1
2
2
When r = 1
2 n−1 n−1
E[Y ] = Γ =n−1
Γ n−1
2
2 2
ns2
.
. . Eσ2 = n−1
σ2
n−1 2
⇒ Eσ2 [s2 ] σ=
n
2(n − 1) 4
and Vσ2 [s2 ] = σ
n2
Thus Eσ2 [s2 ] → σ 2 and Vσ2 [s2 ] → 0 as n → ∞
Pn
.˙. n1 i=1 (Xi − X̄)2 is aP consistent estimator of σ 2 .
1 n
But Eσ2 [s ] 6= σ . .˙. n i=1 (Xi − X̄)2 is not an unbiased estimator of σ 2 .
2 2
Example 2.20 Illustrate with an example that an estimator is both consistent and
unbiased.
Let X1 , X2 , · · · , Xn be a random sample of size n P
drawn from a normal
n
population with mean θ and variance σ 2 . Define s2 = n1 i=1 (Xi − X̄)2 and
n 2
1 ns
S 2 = n−1 2 2
P
i=1 (Xi − X̄) , then Y = σ 2 ∼ χ distribution with (n − 1) degrees
2(n−1) 4
of freedom and Y ∼ G( n−1 1 2
2 , 2 ) . with Eσ [s ] =
2
n−1 2
n σ and Vσ2 [s2 ] = n2 σ .
n 2
(n − 1)S 2 = ns2 → S 2 = s
n−1
n
Eσ2 [S 2 ] = Eσ2 [s2 ]
n−1
n n−1 2
= σ = σ2
n−1 n
n2
Vσ2 [S 2 ] = Eσ2 [s2 ]
(n − 1)2
n2 2(n − 1) 4
= 2
σ
(n − 1) n2
2σ 4
= → 0 as → ∞
(n − 1)
1
Pn
Thus S 2 = n−1 2
i=1 (Xi − X̄) is consistent and also unbiased estimator of σ .
2
Example 2.21 Give an example that an unbiased estimator need not be consistent.
Let X1 , X2 , · · · , Xn be a random sample drawn from a normal population
with mean θ and known variance σ 2 , then the estimator X1 ( first observation) of the
49
A. Santhakumaran
Thus Y1 the first order statistic is not consistent and not unbiased estimator of θ .
pθ (x1 , x2 , · · · , xn )
pθ (t)
Pθ {X1 = x1 , X2 = x2 , · · · | T = t}
50
Probability Models and their Parametric Estimation
51
A. Santhakumaran
Pθ {X1 = x1 , X2 = t − x1 }
Consider Pθ {X1 = x1 , X2 = x2 | T = t} =
Pθ {T = t}
Pθ {X1 = x1 }Pθ {X2 = t − x1 }
=
Pθ {T = t}
e−θ θ x1 e−θ θ t−x2
x1 ! (t−x2 )!
= e−2θ (2θ)t
t!
t!
= is independent of θ.
(t − x1 )!x1 !2t
.˙. X1 + X2 is a sufficient statistic.
52
Probability Models and their Parametric Estimation
Let T = X1 + X2 . Consider
Pθ {T = 1} = Pθ {X1 + X2 = 1}
= Pθ {X1 = 0, X2 = 1} + Pθ {X1 = 1, X2 = 0}
= (1 − θ)2θ + θ(1 − 2θ)
= θ(3 − 4θ)
Pθ {X1 = 0 ∩ X1 + X2 = 1}
.˙.Pθ {X1 = 0 | X1 + X2 = 1} =
Pθ {X1 + X2 = 1}
Pθ {X1 = 0, X2 = 1}
=
Pθ {X1 + X2 = 1}
(1 − θ)2θ
=
θ(3 − 4θ)
2(1 − θ)
= is dependent on θ.
(3 − 4θ)
. ˙. X1 + X2 is not a sufficient statistic.
Example 2.27 If X1 and X2 denote a random sample drawn from a normal popula-
tion N( θ, 1 ), −∞ < θ < ∞ . Show that T = X1 + X2 is a sufficient statistic.
The joint pdf of X1 and X2 is
53
A. Santhakumaran
P {Y = 1} = P {X1 = 1, X2 = 1}
= θ2
P {Y + X3 = 1} = P {Y = 0, X3 = 1} + P {Y = 1, X3 = 0}
= (1 − θ2 )θ + θ2 (1 − θ)
i.e., P {T = 1} = θ(1 − θ)(1 + 2θ)
Consider
P {Y = 1, T = 1}
P {Y = 1 | T = 1} =
P {T = 1}
P {Y = 1}P {X3 = 0}
=
P {T = 1}
θ2 θ
=
θ(1 − θ)(1 + 2θ)
θ2
=
(1 − θ)(1 + 2θ)
pθ (x1 , x2 , · · · , xn ) = pθ (t)h(x1 , x2 , · · · , xn )
54
Probability Models and their Parametric Estimation
pθ (x1 , x2 , · · · , xn ) = pθ (t)h(x1 , x2 , · · · , xn ).
Pθ {X1 = x1 , · · · , Xn = xn } = Pθ {X1 = x1 , X2 = x2 , · · · , Xn = xn , T = t}
= Pθ {T = t}P {X1 = x1 , · · · , Xn = xn | T = t}
Pθ {X1 = x1 , · · · , Xn = xn , T = t}
Pθ {X1 = x1 , · · · , Xn = xn | T = t} =
Pθ {T = t}
(
0 if T 6= t
= Pθ {X1 =x1 ,··· ,Xn =xn }
Pθ {T =t} if T =t
If T = t, then
Pθ {X1 = x1 , · · · , Xn = xn } pθ (t)h(x1 , x2 , · · · , xn )
= P
Pθ {T = t} pθ (t) t(x)=t h(x1 , x2 , · · · , xn )
h(x1 , x2 , · · · , xn )
= P
t(x)=t h(x1 , x2 , · · · , xn )
is independent of θ.
55
A. Santhakumaran
h(x1 , x2 , · · · , xn )
p(x1 , x2 , · · · , xn | θ) = p(α−1 (u) | θ)[α−1 (u)]0
[α−1 (u)]0
= p(u | θ)h1 (x1 , x2 , · · · , xn )
where p(u | θ) = p(α−1 (u) | θ)[α−1 (u)]0
h(x1 , x2 , · · · , xn )
h1 (x1 , x2 , · · · , xn ) =
[α−1 (u)]0
56
Probability Models and their Parametric Estimation
−n(y1 −θ)
ne θ < y1 < ∞
pθ (y1 ) =
0 otherwise
The definition of sufficient statistic gives
pθ (x1 , x2 , · · · , xn ) e−(x1 −θ) · · · e−(xn −θ)
=
pθ (y1 ) ne−n(y1 −θ)
−t+nθ
e Pn
= −ny +nθ
where t = i=1 xi
ne 1
e−t
= is independent of θ.
ne−ny1
.˙. Y1 = min1≤i≤n {Xi } is sufficient. Again
57
A. Santhakumaran
Can the joint pdf of X1 , X2 , · · · , Xn be written in the form given in Neyman Fac-
torization Theorem ? Does Cauchy distribution have a sufficient statistic ?
The joint pdf of X1 , X2 , · · · , Xn is
n
Y 1 1
pθ (x1 , x2 , · · · , xn ) =
i=1
π 1 + (x − θ)2
It cannot be written in the form of Neyman Factorization Theorem, hence it does not
have a single sufficient statistic.
58
Probability Models and their Parametric Estimation
Consider
n
Y
pθ (x1 , x2 , · · · , xn ) = pθ (xi )
i=1
n
1 1
Pn 2
= √ e− 2 i=1 (xi −iθ)
2π
n
1 1
Pn 2 Pn Pn 2 2
= √ e− 2 i=1 xi +θ i=1 ixi − i=1 i θ
2π
n
1 1
Pn 2 Pn n(n+1)(2n+1) 2
= √ e− 2 i=1 xi +θ i=1 ixi − 12 θ
2π
= c(θ)eQ(θ)t(x) h(x)
n Pn
X 1 2
where t(x) = ixi , h(x) = e− 2 i=1 xi
i=1
n
1 1 2
and c(θ) = √ e− 12 n(n+1)(2n+1)θ
2π
Pn
Thus T = i=1 iXi is a sufficient statistic.
Example 2.34 Given n independent observations on a random variable X with prob-
ability density function
1 −x
2θ e θ if x > 0, θ > 0
θ θx
pθ (x) = e if x ≤ 0
2
0 otherwise
Obtain a sufficient statistic.
Consider
( t(x)
1 n − θ
( 2θ ) e if x > 0
pθ (x1 , x2 , · · · , xn ) = Pn
( θ2 )n eθt(x) , if x ≤ 0, where t(x) = i=1 xi
59
A. Santhakumaran
pθ (x) = pθ (t)h(x)
60
Probability Models and their Parametric Estimation
∂ 2 log pθ (x)
∂x∂θ ∂Qθ (t)
∂t(x)
= (2.4)
∂k(t)
∂x
The left hand side of the equation (2.4) is the same for all x . It must depend on θ
alone so that ∂Q θ (t)
∂k(t) = λ(θ), i.e.,
61
A. Santhakumaran
Y1 = nX(1)
Y2 = (n − 1)[X(2) − X(1) ]
Y3 = (n − 2)[X(3) − X(2) ]
··· ······
Yn−1 = 2[X(n−1) − X(n−2) ]
n
X n
X
Yn−2 = [X(n) − X(n−1) ] so that Yi = X(i)
i=1 i=1
1
The Jacobian of the transformation is |J| = n! . QThe joint pdf of
n
X(1) , X(2) , · · · , X(n) is given by p(x(1) , x(2) , · · · , x(n) ) = n! i=1 p(x(i) ) The joint
62
Probability Models and their Parametric Estimation
pdf of Y1 , Y2 , · · · , Yn is given by
n
Y
pθ,σ (y1 , y2 , · · · , yn ) = n! p(yi ) × |J|
i=1
n
Y
= p(yi )
i=1
1 − 1 (P yi +nθ)
= e σ nθ < y1 < ∞, 0 ≤ y2 < · · · , < yn < ∞
σn
Consider a further transformation
U1 = Y2
U2 = Y2 + Y3
U3 = Y2 + Y3 + Y4
··· ······
Un−2 = Y1 + Y2 + · · · + Yn−1
T = Y2 + Y3 + · · · + Yn
i.e., Y2 = U1
Y3 = U2 − U1
Y4 = U3 − U2
··· ······
Yn−1 = Un−2 − Un−3
Yn = T − Un−2
63
A. Santhakumaran
(n − 2)
p(u1 , u2 , · · · , un−2 | y1 , t) = 0 < u1 < u2 < · · · < un−2 < t
tn−2
Thus (Y1 , T ) is jointly sufficient statistics, i.e., (X(1) , +i = 1n [X(i) − X(1) ]) is
P
jointly sufficient statistics.
Definition 2.5 Let θ = (θ1 , θ2 , · · · , θk ) is a vector of parameters and T =
(T1 , T2 , · · · , Tk ) is a random vector . The vector T is jointly sufficient statistics
if pθ (x) is expressed of the form
Pk
Qj (θ)tj (x)
pθ (x) = c(θ)e j=1 h(x) a<x<b
0 otherwise
64
Probability Models and their Parametric Estimation
αnβ −α P xi Y β−1
pα,β (x1 , x2 , · · · , xn ) = e ( xi )
(Γβ)n
αnβ −α P xi (β−1) log(Qni=1 xi )
= ne e
(Γβ)
αnβ −α P xi +(β−1) P log xi
= ne
(Γβ)
αnβ −α P xi +β P log xi −P log xi
= ne
(Γβ)
= c(α, β)eQ1 (α,β)t1 (x)+Q2 (α,β)t2 (x) h(x)
αnβ
Pn
where c(α, β) = (Γβ) n , Q1 (α, β) = −α , t1 (x) = i=1 xi , Q2 (α, β) = β ,
− i =1n log xi
Pn P
t2 (x) = i=1 log xP h(x) = e
i and P . It is a two parameter exponential
family. Therefore ( Xi , Xi2 ) is jointly sufficient statistic.
65
A. Santhakumaran
The pdf of Y3 is
2 "Z θ #2
Z y3
5! 1 1 1
pθ (y3 ) = dx dx
2!1!2! 0 θ θ y3 θ
30 2
= y [θ − y3 ]2 0 < y3 < θ
θ5 3
30 2 y3
= 5
y3 [1 − ]2 0 < y3 < θ
θ θ
Z θ
30 y3
Eθ [Y3 ] = y 3 [1 − ]2 dy3
θ3 0 3 θ
Z 1
30 y3
= θ4 t3 (1 − t)2 dt where t = θ
θ3 0
Z 1
= 30θ t4−1 (1 − t)3−1 dt
0
Γ4Γ3
= 30θ
Γ7
3! × 2! θ
= 30 =
6! 2
Eθ [2Y3 ] = θ
The pdf of Y5 is
5 4
pθ (y5 ) = θ 5 y5 0 < y5 < θ
0 otherwise
66
Probability Models and their Parametric Estimation
pθ (y3 , y5 )
pθ (y3 | y5 ) =
pθ (y5 )
60 y32 [y5 − y3 ]
= 0 < y3 < y5
5 y54
Z y5
12
Eθ [Y3 | Y5 ] = y33 [y5 − y3 ]dy3
y54 0
3
= y5
5
6
.. . Eθ [2Y3 | Y5 = y5 ] = y5
5
θ2 2θ2
Vθ [Y3 ] = since Eθ [Y32 ] =
28 7
θ2
Vθ [2Y3 ] =
7
Z θ
5 5
Eθ [Y5 ] = y55 dy5 = θ
θ5 0 6
5θ2
Eθ [Y52 ] =
7
5θ2
Vθ [Y5 ] =
5 × 36
θ2
6
Vθ Y5 =
5 35
6
The efficiency of 5 Y5 is relative to 2Y3 is
θ2
Vθ [ 65 Y5 ] 35 1
= θ2
= <1
Vθ [2Y3 ] 7
5
67
A. Santhakumaran
68
Probability Models and their Parametric Estimation
2.17 Let X be a single observation from a normal population N (2θ, 1) Pnand let
Y1 , Y2 , · · · , Yn be a normal population N (θ, 1) . Define T = 2X + k=1 Yk .
Show that T is sufficient statistic.
2.18 Let X1 , X2 , · · · , Xn be a random
Pnsample drawn from a normal population
2
N (0, θ) , 0 < θ < ∞ . Show that X
i=1 i is a sufficient statistic.
2.19 If T1 = 32 max{X1 , X2 } and T2 = 2(X1 + X2 ) are estimators of θ based
on two independent observations X1 and X2 on a random variable distributed
uniformly over (0, θ) . Which one do you prefer and why?
2.20 Let X1 , X2 , · · · , Xn be aPrandom sample drawn from a Poisson population with
Xi
parameter θ . Show that n+2 is not unbiased of θ but consistent of θ .
2.21 Distinguish between an Estimate and Estimator. Given three observations
X1 , X2 and X3 on a normal random variable X from N (θ, 1) , a person con-
structs the following estimators for θ
X1 + X2 + X3
T1 =
6
X1 + 2X2 + 3X3
T2 =
7
X1 + X2
T3 =
2
which one would you choose and why?
2.22 A random sample X1 , X2 , · · · , Xn drawn on XPwhichP takes 1 or 0 with re-
Xi ( Xi −1)
spective probabilities θ and (1 − θ) . Show that n(n−1) is an unbiased
estimator of θ2 .
2.23 Discuss whether an unbiased estimator exists for the parametric function τ (θ) =
θ2 of Binomial (1, θ) based on a sample of size one.
2.24 Obtain the sufficient statistic of the pdf
(1 + θ)xθ 0 < x < 1
pθ (x) =
0 otherwise
based on an independent sample of size n .
2.25 X1 , X2 , X3 and X4 constitute a random sample of size four from a Poisson
population with parameter θ . Show that (X1 + X2 + X3 + X4 ) and (X1 +
X2 , X3 + X4 ) are sufficient statistics. Which would you prefer ?
2.26 A statistic Tn such that Vθ [Tn ] → 0 ∀ θ is consistent as an estimator of θ as
n → ∞:
(a) if and only if Eθ [Tn ] → θ ∀ θ
(b) if, but not only if Eθ [Tn ] → θ ∀ θ
(c) if and only if Eθ [Tn ] = θ ∀ θ , for every n
(d) if and only if |Eθ [Tn ] − θ| and Vθ [Tn ] → 0 ∀ θ
69
A. Santhakumaran
2.28 X1 , XP2 , · · · , Xn are iid Bernoulli random variables with Eθ [Xi ] = θ and
n
Sn = i=1 Xi . Then, for a sequence of non - negative numbers {kn }, Tn =
Sn +kn
n+kn is a consistent estimator of θ :
(a) if knn → 0 as n → ∞
(b) if and if kn = 0 ∀ n
(c) if and only if kn is bounded as n → ∞
(d) whatever {kn } is
2.29 In tossing a coin the P {Head} = p2 . It is tossed n times to estimate the value
of p2 . X denotes the number of heads. One may use to estimate the unbiased
estimator
2 is 2
(a) X n (b) Xn (c) Xn (d) nX2
2.30 Which of the following statement is not correct for a consistent estimator?
1. If there exists one consistent estimator, then an infinite number of consistent
statistics may be constructed.
2. Unbiased estimators are always consistent.
3. A consistent estimator with finite mean value must tend to be unbiased in
large samples.
Select the correct answer given below:
(a) 1 (b) 2 (c) 1 and 3 (d) 1, 2 and 3
2.31 Consider the following type of population :
1. Normal 2 . Cauchy 3 . Poisson
Sample mean is the best estimator of population mean in case of
(a) 1 and 3 (b) 1 and 2 (c) 2 and 3 (d) 1 , 2 and 3
70
Probability Models and their Parametric Estimation
3.1 Introduction
If a given family of probability distributions that admits a non - trivial sufficient
statistic which results in the greatest reduction of data collection. The reduction of data
can be achieved through complete sufficient statistic. The existence of the mathemat-
ical expectation Eθ [g(X)] , θ ∈ Ω implies that the integral ( or sum) involves in
Eθ [g(X)] converges absolutely. This absolute convergence was tacitly assumed in the
definition of completeness.
3.2 Completeness
Definition 3.1 A family of distributions {Pθ , θ ∈ Ω} is said to be complete if
Eθ [g(X)] = 0 ∀ θ ∈ Ω ⇒ g(x) = 0 ∀ x.
Show that the family is not complete but the family of distributions Y = |X| is
complete.
Consider Eθ [g(X)] = 0
1
X
g(x)pθ (x) = 0
x=−1
1 1
g(−1) θ(1 − θ) + g(0)[1 − θ(1 − θ)] + g(1) θ(1 − θ) = 0
2 2
71
A. Santhakumaran
ConsiderEθ [g(Y )] = 0
1
X
g(y)[θ(1 − θ)]y [1 − θ(1 − θ)]1−y = 0
y=0
1
X θ(1−θ)
g(y)ρy = 0 where ρ = 1−θ(1−θ)
y=0
g(0) + g(1)ρ = 0 → g(0) = 0 andg(1) = 0
Eθ [g(T )] = 0
n
X
g(t)cnt (1 − θ)n−t = 0
t=0
n t
X θ
g(t)cnt (1 − θ)n = 0
t=0
1−θ
Here (1 − θ)n 6= 0
n
X θ
g(t)cnt ρt = 0 where ρ =
t=0
1−θ
g(0)cn0 + g(1)cn1 ρ + · · · + g(n)ρn = 0
= 0 coefficient of ρ0
g(0)
cn1 g(1)
= 0 coefficient of ρ1
⇒ g(1) = 0
······ ··· ······
g(n) = 0 coefficient of ρn
Thus g(t) = 0 ∀ t = 0, 1, 2, · · · , n.
Pn
Hence T = i=1 Xi is a complete statistic.
Example 3.3 Let X1 , X2 , · · · , Xn be iid random sample drawn from a Poisson
72
Probability Models and their Parametric Estimation
Pn
population with parameter λ > 0 . Show that T = i=1 Xi is a complete statistic.
n
X
T = Xi ∼ P (nλ)
i=1
(nλ)t
i.e., pλ (t) = e−nλ , t = 0, 1, 2, · · · , ∞
t!
Eλ [g(T )] = 0
∞
X (nλ)t
g(t)e−nλ = 0
t=0
t!
∞
X (nλ)t
g(t) = 0 since e−nλ 6= 0
t=0
t!
nλ (nλ)n
g(0) + g(1) + · · · + g(n) + ··· = 0
1! n!
By comparing the coefficients of λt on both sides,
g(0) = 0 coefficient of λ0
ng(1) = 0 coefficient of λ1
⇒ g(1) = 0
······ ··· ······
Thus g(t) = 0 ∀ t = 0, 1, 2, · · · , ∞
Pn
Hence T = i=1 Xi is a complete statistic.
Example 3.4 Let X ∼ ∪(0, θ), θ > 0 . Show that the family of distributions is
complete.
Consider Eθ [g(X)] = 0
Z θ
1
⇒ g(x) dx = 0
0 θ
Z θ
⇒ g(x)dx = 0
0
One can differentiate the above integral with respect to θ on both sides
Z θ
0dx + g(θ) × 1 − g(0) × 0 = 0
0
hR i
b(θ)
d a(θ) pθ (x)dx Z b(θ)
dpθ (x) db(θ)
since = dx + pθ [b(θ)]
dθ a(θ) dθ dθ
da(θ)
−pθ [a(θ)]
dθ
g(θ) = 0 ∀ θ > 0, i.e., g(x) = 0 ∀ 0 < x < θ, θ > 0
73
A. Santhakumaran
74
Probability Models and their Parametric Estimation
R ∞
This is same as the Laplace Transform of f (t) as 0 e−st f (t)dt.
Using the uniqueness property of Laplace Transform
1
g(t)t− 2 = 0 ∀ t > 0
i.e., g(t) = 0 ∀ t > 0 . Thus T = X 2 is a complete statistic .
Example 3.7 Examine whether the family of distributions
is complete.
Consider Eθ [g(X)] = 0
Z 1 Z 1
2
⇒ g(x)2θdx + g(x)2(1 − θ)dx = 0
1
0 2
1
Z 2
Z 1
2θ g(x)dx + 2(1 − θ) g(x)dx = 0
1
0 2
1
Z 2
Z 1 Z 1
θ g(x)dx − θ g(x)dx + g(x)dx = 0
1 1
0 2 2
"Z 1
#
2
Z 1 Z 1
θ g(x)dx − g(x)dx + g(x)dx = 0
1 1
0 2 2
"Z 1
#
2
Z 1
θ g(x)dx − g(x)dx = 0
1
0 2
Z 1
and g(x)dx = 0
1
2
1
Z 2
Z 1
g(x)dx = g(x)dx θ 6= 0
1
0 2
Z 1
2
⇒ g(x)dx = 0
0
75
A. Santhakumaran
choose
+1 if 0 < x < 14
−1 if 14 ≤ x < 21
g(x) =
+1 if 12 ≤ x < 43
−1 if 34 ≤ x < 1
Z 41 Z 12
Eθ [g(X)] = (+1)2θdx + (−1)2θdx
1
0 4
3
Z 4
Z 1
+ (+1)2(1 − θ)dx + (−1)2(1 − θ)dx
1 3
2 4
1 1 1 1
=2θ − 2θ + 2(1 − θ) − 2(1 − θ)
4 4 4 4
= 0
But g(x) 6= 0 for some x
+1 if 0 < x < 14
−1 if 14 ≤ x < 21
i.e., g(x) =
+1 if 12 ≤ x < 43
−1 if 34 ≤ x < 1
76
Probability Models and their Parametric Estimation
g − (t)eθt+s(t)
p− (t) = P − θt+s(t)
t g (t)e
77
A. Santhakumaran
complete.
2
n n
X X 2
Define g(X) = 2 − (n + 1)
Xi Xi , n = 2, 3, · · ·
i=1 i=1
2
n n
X X 2
Eθ [g(X)] = 2Eθ Xi − (n + 1)Eθ Xi
i=1 i=1
2
n
X 2 2 θ2
Xi = n X̄ and X̄ ∼ N (θ, )
i=1 n
√
2
Z ∞
2 n − n (x̄−θ)2
Eθ [X̄ ] = x̄ √ e 2θ 2 dx̄
−∞ 2πθ
x̄ − θ √ θ θ
If z = n, then x̄ − θ = z√ and dx̄ = √ dz
θ n n
√ 2
θ 2 n −z θ
Z ∞
. 2 2 √ dz
. . Eθ [X̄ ] = (θ + z √ ) √ e
−∞ n 2πθ n
z2
2
Z ∞ z2 2z 1 −
= θ 1 + + √ √ e 2 dz
−∞ n n 2π
2
2 1 Z ∞ 2 1 −z
= θ 1 + z √ e 2 dz + 0
n −∞ 2π
1 1 1 −1
One can take z 2 = t, then z = t 2 and dz = t2 dt
2
" #
2 − t 1 1 −1
Z ∞
2 2
i.e., Eθ [X̄ ] = θ √1+ te 2 t2 dt
2π 0 n 2
" #
1 − t 3 −1
Z ∞
2
= θ 1+ √ e 2 t2 dt
n 2π 0
2 1 Γ3
= θ
1 + 2
√
n 2π 1 3
( )2
2
√ √
2 1 1 π2 2
= θ 1 + 2
√
n 2π
2
1 n+1 2
= θ 1+ = θ
n n
2
n
. X 2 2 2 2 2n+1
. . Eθ Xi = Eθ [nX̄] = n Eθ [X̄] = n θ
i=1 n
2
= n(n + 1)θ
n n
X 2 X 2
Consider Xi = (Xi − θ + θ)
i=1 i=1
n n
X 2 2 X
= (Xi − θ) + nθ + 2θ (Xi − θ)
i=1 i=1
n
X 2 2
= (Xi − θ) + 2θnx̄ − nθ
i=1
n
X 2 2 2
Eθ Xi = Eθ [ns ] + 2θnEθ [X̄] − nθ
i=1
2 2 2 2 2
= Eθ [ns ] + 2nθ − nθ = Eθ [ns ] + nθ
2 1 X 2
where s = (xi − θ)
n
Pn 2
ns2 i=1 (Xi −θ)
Let Y = σ2 = θ2 ∼ χ2 distribution with n degrees of freedom. Y has
78
Probability Models and their Parametric Estimation
the pdf G( n2 , 12 )
1 n
(
1
1
e− 2 y y 2 −1 0<y<∞
p(y) = 22 Γn
2
0 otherwise
Z ∞
1 1 n
E[Y ] = n e− 2 y y 2 +1−1 dy = n
0 2 2 Γ n2
ns2
i.e., Eθ = n
σ2
Eθ [s2 ] = θ2 since σ 2 = θ2
n
X
Eθ [ Xi2 ] = nθ2 + nθ2 = 2nθ2
i=1
" n #2 " n #
X X
Eθ [g(X)] = 2Eθ Xi − (n + 1)Eθ Xi2
i=1 i=1
= 2n(n + 1)θ2 − (n + 1)2nθ2 = 0
→ g(x) = 0 not for all x
i.e., g(x) 6= 0 for some x
n
!2 n
!
X X
2
i.e., g(x) = 2 xi − (n + 1) xi 6= 0
i=1 i=1
n
!2 n
!
X X
i.e., 2 xi 6= (n + 1) x2i for some x, n = 2, 3, · · ·
i=1 i=1
79
A. Santhakumaran
is complete.
Consider Eθ [g(X)] = 0
Z θ Z 1
θg(x)dx + (1 + θ)g(x)dx = 0+0
0 θ
Z θ
⇒ g(x)dx = 0 and
0
Z 1
g(x)dx = 0
θ
One can differentiate the above integrals with respect to θ
Z θ
0dx + g(θ) × 1 − g(0) × 0 = 0 and
0
Z 1
0dx + g(1) × 0 − g(θ) × 1 = 0
θ
g(θ) = 0 and −g(θ) = 0 ∀ θ > 0
i.e., g(x) = 0 ∀ 0 < x < θ, 0 < θ < 1
Thus the family of distributions is complete.
Definition 3.3 A statistic T = t(X) is said to be bounded complete statistic, if
there exists a function |g(T )| ≤ M, M ∈ < such that E[g(T )] = 0 ⇒ g(t) =
0 ∀ t ∈ <.
Example 3.9 Show that Completeness implies bounded completeness, but
bounded completeness does not imply completeness.
Proof: Assume T = t(X) is a complete statistic. That is E[g(T )] = 0 ⇒
g(t) = 0 ∀ t ∈ < . Prove that g(T ) is bounded complete.
V [g(T )]
P {|g(T ) − E[g(T )]| < } ≥ 1− for every given > 0
2
V [g(T )]
P {|g(T )| < } ≥ 1− for every given > 0
2
⇒ |g(t)| < ∀ t ∈ <
at least with probability 1 − V [g(T
2
)]
. This means that g(T ) is bounded with
E[g(T )] = 0 ⇒ g(t) = 0 ∀ t ∈ < . i.e., T = t(X) is a bounded complete
statistic.
Assume T = t(X) is a bounded complete statistic. To prove that T is not a
complete statistic.
θ x = −1
Consider a family of density functions pθ (x) = (1 − θ)2 θx x = 0, 1, 2, · · ·
0 otherwise
80
Probability Models and their Parametric Estimation
Now the function g(x) = x is bounded. If the family is bounded complete, then
81
A. Santhakumaran
82
Probability Models and their Parametric Estimation
Consider Eθ [g(X)] = 0
g(0)[1 − θ − θ ] + g(1)θ + g(2)θ2 = 0
2
2
θ [g(2) − g(0)] + θ[g(1) − g(0)] + g(0) = 0
g(2) − g(0) = 0 coefficient of θ2
g(1) − g(0) = 0 coefficient of θ
g(0) = 0 coefficient of θ0
Hence g(0) = g(1) = g(2) = 0 , i.e., g(x) = 0 for x = 0, 1 and 2. Thus the family
of distributions is complete.
Example 3.13 X has the following distribution
X =x: 1 2 3 4 5 6
Pθ {X = x} 61 16 16 61 16 16
Examine whether the family of pmf ’s is complete.
Define
c when x = 1, 3, 5
g(x) =
−c when x = 2, 4, 6
Consider E[g(X)] = 0
3c 3c
⇒ − + = 0
6 6
But g(x) 6= 0 for x = 1, 2, 3, 4, 5, 6.
Consider EN g(X) = 0 ∀ N ∈ I+
PN
i.e., x=1 g(x) N1 = 0 ⇒ g(x) = 0 ∀ x and ∀ N
When N = 1 ⇒ g(1) = 0
When N = 2 ⇒ g(1) + g(2) = 0 ⇒ g(2) = 0 since g(1) = 0
When N = 3 ⇒ g(3) = 0 since g(1) + g(2) = 0 and so on.
83
A. Santhakumaran
84
Probability Models and their Parametric Estimation
pθ (x1 , x2 · · · , xn )
= k(y1 , · · · , yn ; x1 , x2 , · · · , xn )
pθ (y1 , y2 , · · · , yn )
The ratio is P xi =P yi
pθ (x1 , x2 , · · · , xn ) θ
= .
pθ (y1 , y2 , · · · , yn ) 1−θ
P P
The ratio is independent of θ iff xi = yi . Thus the points x1 , x2 , · · · , xn and
y1 , y2 , · · · ,P
yn whose coordinates have the same set of minimal sufficient partition.
Therefore Xi is a minimal sufficient statistic.
Example 3.17 Let X1 , X2 , · · · , Xn be iid PrandomP sample from N (θ, σ 2 ) . As-
2 2
sume θ and σ are unknown. Prove that ( Xi , Xi ) is a minimal sufficient
statistic.
85
A. Santhakumaran
86
Probability Models and their Parametric Estimation
pθ (x1 , x2 , · · · , xn ) h nX X oi Y h(x )
i
= exp Q(θ) t(xi ) − t(yi ) .
pθ (y1 , y2 , · · · , yn ) h(yi )
P P P
This is independent of θ iff t(xi ) = t(yi ) . Therefore T = t(Xi ) is a
minimal sufficient statistic.
Remark 3.3 A complete sufficient statistics is minimal sufficient whenever mini-
mal sufficient statistic exists.
Theorem 3.3 Let pθ0 (x) and pθ1 (x)) be the densities and they have the same
p (X)
support ( the range of the two densities are the same). Then the statistic T = pθθ1 (X)
0
is minimal sufficient.
Proof: The necessary and sufficient condition that T = t(X) is a sufficient
statistic for fixed θ1 and θ0 are
and
pθ0 (x1 , x2 · · · , xn ) = pθ0 (t)h(x1 , x2 , · · · , xn )
pθ1 (x1 ,x2 ,··· ,xn ) pθ1 (t)
respectively. Let the ratio pθ0 (x1 ,x2 ,··· ,xn ) = pθ0 (t) be a function of u(x) , then
p (X)
U = u(X1 , X2 , · · · , Xn ) is a sufficient statistic for pθθ1 (X)
iff T is a function of U .
0
This proves T = t(X) to be minimal sufficient statistic.
If P is a family of distributions with common support and P0 ⊂ P and if
T = t(X) is minimal sufficient statistic for P0 and sufficient for P , it is minimal
sufficient for P .
Example 3.18 Let P ∼ N (θ, 1) and P0 ∼ N (θ0 , 1) and P0 ⊂ P . Let
X1 , X2 , · · · , Xn be a random sample of size n . Then
1
(xi −θ)2
P
pθ (x1 , x2 , · · · , xn ) e− 2
= 1
P
(xi −θ0 )2
pθ (x1 , x2 · · · , xn ) e− 2
1 2 2
= e 2 [2n(θ−θ0 )x̄−n(θ −θ0 )]
Thus T = X̄ is the minimal sufficient statistic for N (θ, 1). Example 3.19 Let
X1 , X2 , · · · , Xn be a random sample from a population defined by the Cauchy density
with parameter θ :
1
π[1+(x−θ)2 ] −∞ < x < ∞
pθ (x) =
0 otherwise − ∞ < θ < ∞
87
A. Santhakumaran
N-D
D
x n-x
x = 0, 1, · · · , min(n, D)
N
PD {X = x} =
n
0 otherwise
is complete.
3.6 Let X1 , X2 , · · · , Xn be a sample from ∪(θ − 12 , θ + 12 ), θ ∈ < . Show that the
statistic T = (min1≤i≤n (Xi ), max1≤i≤n (Xi )) is not complete.
2
3.7 Let X1 , X2 , · · · , Xn be a sample of n independent
Pn observations
Pn from N (θ, σ )
2 2
−∞ < θ < ∞, 0 < σ < ∞ . Show that i=1 Xi , i=1 Xi is a sufficient
statistic. Is it complete? Justify?
88
Probability Models and their Parametric Estimation
then
(a) P is complete
(b) P is not complete
(c) P is bounded complete
(d) P is not bounded complete
3.13 If a complete sufficient statistic does not exist, then UMVUE
(a) may not exist
(b) may exist
(c) may unique
(d) none of the above
3.14 If a complete sufficient statistic exists, then UMVUE is
(a) unique
(b) not unique
(c) not exist
(d) none of the above
89
A. Santhakumaran
4. OPTIMAL ESTIMATION
4.1 Introduction
Let g(T ) be an unbiased estimator of τ (θ) and δ(T ) be an another unbiased
estimator of τ (θ) different from g(T ) . Then there always exists an infinite number of
unbiased estimators of τ (θ) such that λg(T ) + (1 − λ)δ(T ), 0 < λ < 1 . In this case
one can find the best estimator or optimal estimator among all the unbiased estimators.
The following procedures are used to identify the optimal estimator.
U0 = {V | Eθ [V ] = 0, Eθ [V 2 ] < ∞ ∀ θ ∈ Ω}
Eθ [T + λ V ] = τ (θ) + λEθ [V ]
= τ (θ) since Eθ [V ] = 0
90
Probability Models and their Parametric Estimation
Vθ [T ] + λ2 Vθ [V ] + 2λCovθ [V, T ] ≥ Vθ [T ]
i.e., 2λCovθ [T, V ] + λ2 Vθ [V ] ≥ 0 ∀ θ and ∀ λ
It is an quadratic equation in λ and it has two real roots λ = 0 and λ = − 2Cov θ [T,V ]
Vθ [V ] .
If λ = 0, trivially T is an UMVUE of τ (θ) .
For λ 6= 0, take λ0 = λ2 = − Cov θ [T,V ] 0
Eθ [V 2 ] , then one can define T ∈ U where T =
0
Thus λ0 = − EEθθ[T V]
[V 2 ] contradicts that T is the UMVUE of τ (θ) . If T is the UMVUE
of τ (θ) , then Covθ [T, V ] = 0, i.e., Eθ [T V ] = 0 ∀ θ ∈ Ω .
Conversely, assume Covθ [T, V ] = 0 for some θ ∈ Ω . To prove that T is a
UMVUE of τ (θ) . Let T 0 be another unbiased estimator of τ (θ) so that T 0 ∈ U,
then T 0 − T ∈ U0 . Since Eθ [T ] = τ (θ) and Eθ [T 0 ] = τ (θ) →
Eθ [T 0 − T ] = 0
⇒ Eθ [T (T 0 − T )] = 0
Eθ [T T 0 ] = Eθ [T 2 ]
Applying Cauchy Schwarz inequality to Eθ [T 0 T ]
2 2
{Eθ [T T 0 ]} ≤ Eθ [T 2 ]Eθ [T 0 ]
1 n o 12
2
Eθ [T T 0 ] ≤ Eθ [T 2 ] 2 Eθ [T 0 ]
Eθ [T 2 ] n
2
o 21
1 ≤ Eθ [T 0 ]
{Eθ [T 2 ]} 2
Vθ [T ] ≤ Vθ [T 0 ]
91
A. Santhakumaran
92
Probability Models and their Parametric Estimation
Eθ [T − Tn ]2 → 0 as n → ∞
.. . Eθ [T V ] → 0 as n → ∞
i.e., Covθ [T, V ] = 0 as n → ∞ ∀ θ ∈ Ω.
1
Vθ [T ] = Vθ [T1 + T2 ]
2
1
= {Vθ [T1 ] + Vθ [T2 ] + 2Covθ [T, T2 ]}
4
1n p o
= Vθ [T1 ] + Vθ [T2 ] + 2ρ Vθ [T1 ] + Vθ [T2 ]
4
1
= {2Vθ [T1 ] + +2ρVθ [T1 ]}
4
1
= Vθ [T1 ](1 + ρ)
2
⇒ Vθ [T ] ≥ Vθ [T1 ]
1
Vθ [T1 ](1 + ρ) ≥ Vθ [T1 ]
2
(1 + ρ) ≥ 2
ρ ≥1
93
A. Santhakumaran
Covθ (T, T 0 ) = Vθ [T ]
Pn
Given T = i=1 αi Xi is the unbiased estimator of θ , Eθ [T ] = θ .
Also T 0 is the unbiased estimator of θ , i.e., Eθ = [T 0 ].
Eθ [T ] = θ
Eθ [T 0 ] = θ
Eθ [T − T 0 ] = 0
Eθ [T [T − T 0 ] = 0
Eθ [T 2 − T T 0 ] = 0
Eθ [T 2 ] − Eθ [T T 0 ] = 0
Eθ [T T 2 ] = Eθ [T 2 ]
i.e., Covθ (T, T 0 ) = Vθ [T ]
1 1
Vθ [ (T1 + T2 )] = {Vθ [T1 ] + Vθ [T2 ] + 2Covθ (T1 , T2 )}
2 4
1n p o
= Vθ [T1 ] + Vθ [T2 ] + 2ρ Vθ [T1 ]Vθ [T2 ]
4
1
= [2Vθ [T1 ] + 2ρVθ [T2 ]]
4
1
= [Vθ [T1 ] + ρVθ [T1 ]
2
where ρ is the correlation coefficient between T1 and T2 Let T be the UMVUE of
94
Probability Models and their Parametric Estimation
95
A. Santhakumaran
96
Probability Models and their Parametric Estimation
.˙. The UMVUE g(T ) is unique, if the sufficient statistic T = t(X) is complete.
From the above Theorems 4.5 and 4.6 the UMVUE of τ (θ) is obtained by
solving a set of equations and conditioning on the sufficient statistic.
Solving a set of equations of the sufficient statistic
Let Pθ , θ ∈ Ω be a distribution of random variable X . If T is a complete suf-
ficient statistic, then the UMVUE g(T ) of any parametric function τ (θ) is uniquely
determined by solving the set of equations Eθ [g(T )] = τ (θ) ∀ θ ∈ Ω .
Conditioning on the sufficient statistic
If a random variable X has a distribution Pθ , θ ∈ Ω and δ(T ) is any unbiased
estimator of τ (θ) and T = t(X) is complete sufficient statistic, then the UMVUE
g(T ) can be obtained by conditional expectation of δ(T ) given T = t , i.e., g(t) =
E[δ(T ) | T = t].
Example 4.4 Obtain the UMVUE of θ + 2 for the pmf of the Poisson distribu-
tion −θ θx
e x! x = 0, 1, 2, · · ·
p(x | θ) =
0 otherwise
by taking a sample of size n .
n
X
Let T = Xi , thenT ∼ P (nθ)
i=1
−nθ
e (nθ)t
p(t | θ) = t = 0, 1, 2, · · ·
t!
= 0 otherwise
1
p(t | θ) = e−nθ et log nθ
t!
= c(θ)eQ(θ)t(x) h(x)
Pn
where c(θ) = e−nθ , Q(θ) = log nθ, t(x) = i=1 xi , h(x) = 1
t! . .˙. The statistic
97
A. Santhakumaran
Eθ [g(T )] = θ+2
∞
X 1
g(t)e−nθ (nθ)t = θ+2
t=0
t!
∞
X 1
g(t)nt θt = (θ + 2)enθ
t=0
t!
∞
X (nθ)t
= (θ + 2)
t=0
t!
∞ ∞
X
t t+1 1 X 1
= nθ +2 nt θ t
t=0
t! t=0
t!
Equivating the coefficient of θt on both sides
nt nt−1 nt
g(t) = +2
t! (t − 1)! t!
t
g(t) = +2
n
P
xi
= +2
n
= x̄ + 2
(n − r)t−r+1 t!
g(t) = t
, r = 1, 2, · · · and n > r
n (t − r + 1)!
Thus the UMVUE of θr−1 e−rθ is
(n − r)T −r+1 T!
T
, r = 1, 2, · · · and n > r.
n (T − r + 1)!
98
Probability Models and their Parametric Estimation
t t
Remark 4.2 When r = 1, g(t) = n−1 n = 1 − n1 , n = 2, 3, · · · , then
T
1 − n1 is the unbiased estimator of e−θ where T = Xi .
P
When r = 2, (n−2)T
[1 − n2 ]T , n = 3, 4, · · · is the UMVUE of e−2θ θ where
P
T = Xi .
Example 4.6 Obtain the UMVUE of θr + (r − 1)θ , r = 1, 2, · · · for the random
sample of size n from Poisson distribution
Pn with parameter θ .
As in the example 4.1, T = i=1 Xi is complete and sufficient. There exists a
UMVUE of τ (θ) = θr + (r − 1)θ , r = 1, 2, · · ·
∞
X (nθ)t
Eθ [g(T )] = g(t)e−nθ = θr + (r − 1)θ
t=0
t!
∞ t t
X nθ
g(t) = [θr + (r − 1)θ]enθ
t=0
t!
= θr enθ + (r − 1)θenθ
∞ ∞
X nt θ t X nt θ t
= θr + (r − 1)θ
t=0
t! t=0
t!
∞ ∞
X 1 X 1
= nt θt+r + (r − 1) nt θt+1
t=0
t! t=0
t!
Equivating the coefficient of θt on both sides
nt nt−r nt−1
g(t) = + (r − 1)
t! (t − r)! (t − 1)!
1 t! 1 (r − 1)
= + t!
nr (t − r)! n (t − 1)!
t(t − 1) · · · · · · (t − r + 1) (r − 1)
= + t
nr n
The UMVUE of θr + (r − 1)θ is
T (T − 1) · · · · · · (T − r + 1) (r − 1)
g(T ) = + T, r = 1, 2, · · ·
nr n
Remark 4.3 When r = 1, X̄ is the UMVUE of θ .
When r = 2, X̄(nX̄−1)
n + X̄ is the UMVUE of θ2 + θ .
Example 4.7 Obtain UMVUE of θ(1−θ) using a random sample of size n drawn
from a Bernoulli population with parameter θ.
x
θ (1 − θ)1−x x = 0, 1
Given pθ (x) =
0 otherwise
99
A. Santhakumaran
n
X
Let T = Xi , then T ∼ b(n, θ)
i=1
i.e., pθ (x) = cnt θt (1 − θ)n−t t = 0, 1, 2, · · · ,n
t
θ
= cnt (1 − θ)n
1−θ
θ
= (1 − θ)n et log( 1−θ ) cnt
= c(θ)eQ(θ)t(x) h(x)
θ X
where c(θ) = (1 − θ)n , Q(θ) = log , t(x) = xi and h(x) = cnt .
1−θ
P
It is an one parameter exponentially family. .˙. The statistic T = Xi is complete
and sufficient. The UMVUE of θ(1 − θ) is
Eθ [g(T )] = θ(1 − θ)
∞
X
g(t)cnt θt (1 − θ)n−t = θ(1 − θ)
t=0
∞ t
X θ
g(t)cnt = θ(1 − θ)(1 − θ)−n
t=0
1−θ
θ
One can take ρ = , then
1−θ
θ 1
1+ρ = 1+ =
1−θ 1−θ
1
Thus 1 − θ =
1+ρ
ρ
→ θ =
1+ρ
∞
X
g(t)ρt cnt = ρ(1 + ρ)n−2
t=0
= ρ[1 + cn−2
1 ρ + · · · + ρn−2 ]
= ρ + cn−2
1 ρ2 + · · · + ρn−1
n−1
!
X n-2
= t-1 ρt
t=1
g(t)cnt = cn−2
t
(n − 2)! t!(n − t)!
g(t) =
(t − 1)!(n − t − 1)! n!
(n − 2)!t(t − 1)!(n − t)(n − t − 1)!
=
(t − 1)!(n − t − 1)!n(n − 1)(n − 2)!
t(n − t)
= if n = 2, 3, · · ·
n(n − 1)
100
Probability Models and their Parametric Estimation
T (n−T )
i.e., n(n−1) is the UMVUE of θ(1 − θ).
Example 4.8 Obtain the UMVUE of p1 of the pmf
pq x
x = 0, 1, · · ·
pp (x) =
0 otherwise
This is an one parameter exponentially family which is complete and sufficient. Thus
there exist an unique UMVUE of p1 . It is given by Ep [g(T )] = p1 .
Pn
The statistic T = i=1 Xi is the sum of n iid Geometric variables with
same parameter p has the Negative Binomial distribution. The pmf of T is
!
n+t-1 n t
n-1 p q t = 0, 1, · · ·
pp (t) = P {T = t} =
0 otherwise
∞
!
X n+t-1 1
g(t) n-1 pn q t =
t=0
p
∞
n+t-1 t
X
g(t) t q = (1 − q)−(n+1)
t=0
∞
n+t t
X
= t q
t=0
n+t-1
Equivating the coefficient of q t on both sidesg(t) t
n+t
= t
101
A. Santhakumaran
Thus T +n
n is the UMVUE of p1 .
1
Example 4.9 For a single observation x of X , find the UMVUE of p of the
pmf x
pq x = 0, 1, · · ·
pp (x) =
0 otherwise
The pmf of the random variable is written as
1
Ep [g(X)] =
p
∞
X 1
g(x)pq x =
x=0
p
X∞ ∞
X
g(x)q x = (1 − q)−2 = (x + 1)q x
x=0 x=0
→ g(x) = x+1
102
Probability Models and their Parametric Estimation
Pn
and ρ is the correlation coefficient between X1 and Y = i=1 Xi
Cov[X1 , Y ]
ρ =
σX σY
X 1 √
Y = Xi ∼ N (nθ, n) σY = n, σX1 = 1
Covθ [X1 , Y ] = Eθ [X1 Y ] − Eθ [X1 ]Eθ [Y ]
" n
#
X
Eθ [X1 Y ] = Eθ X1 Xi
i=1
= Eθ [X12 ] + Eθ [X1 X2 + · · · + Xn X1 ]
= Eθ [X12 ] + Eθ [X1 ]Eθ [X2 ] + · · · + Eθ [X1 ]Eθ [Xn ]
= 1 + θ2 + (n − 1)θ2 where Vθ [X1 ] = Eθ [X12 ] − θ2
= nθ2 − θ2 + 1 + θ2
= nθ2 + 1
Covθ [X1 , Y ] = nθ2 + 1 − θnθ = 1
1 1
ρ = √ and bX1 Y =
n n
1
E[X1 | Y = y] = Eθ [X1 ] + [y − nθ]
n
y
= θ + − θ = x̄
n
E[X1 | Y = y] = x̄ and X̄ is the UMVUE of θ.
Example 4.11 Let X1 , X2 , · · · Xn be iid random sample with pdf
1
θ 0<x<θ
pθ (x) =
0 otherwise
Find the UMVUE of θ .
Let T = max {Xi }
1≤i≤n
The pdf of T is
Z t n−1
n! 1 1
pθ (t) = dx 0<t<θ
1!(n − 1)! 0 θ θ
n n−1
pθ (t) = θn t 0<t<θ
0 otherwise
The joint density of X1 , X2 , · · · , Xn is
n
1
pθ (x1 , x2 , · · · , xn ) =
θ
The conditional density of
1
pθ (x1 , x2 · · · xn ) θn 1
= n n−1 =
pθ (t) θn t ntn−1
103
A. Santhakumaran
104
Probability Models and their Parametric Estimation
If one can take δ(T ) = X1 − 1 , then the UMVUE of θ is given by g(T ) and
g(t) = E[(X1 − 1) | T = t].
When x1 = t, the conditional pmf of X1 given T = t is pθ (x1 | t) = n1 .
105
A. Santhakumaran
106
Probability Models and their Parametric Estimation
∂x1 ∂x1
and t1 = x2 , then x1 = t−t1 and x2 = t1 . ∂t = 1, ∂t1 = −1, ∂x ∂x2
∂t = 0, ∂t1 = 1
2
∂x1 ∂x1
∂t ∂t1
J =
∂x2 ∂x2
∂t ∂t1
1 −1
=
0 1
107
A. Santhakumaran
Example 4.14 The random variables X and Y have the joint pdf
2 − 1 (x+y)
θ2 e
θ 0<x<y<∞
p(x, y | θ) =
0 otherwise
Show that
(i) Eθ [Y | X = x] = x + θ
(ii) Eθ [Y ] = Eθ [X + θ] and
(iii) Vθ [X + θ] ≤ Vθ [Y ]
The marginal density of X is
Z ∞
2 x+y
p(x | θ) = e− θ dy
θ2 x
2 − 2x
θ2 e
θ 0<x<∞
=
0 otherwise
108
Probability Models and their Parametric Estimation
109
A. Santhakumaran
110
Probability Models and their Parametric Estimation
xn−1
+ n (2x − 1)
x − (x − 1)n
x−1
xn−1 X
= (2x1 − 1)
xn − (x − 1) x =1
n
1
n−1
x
+ (2x − 1)
xn − (x − 1)n
x−1
(x − 1)n−1 X
− (2x1 − 1)
xn − (x − 1)n x =1
1
xn−1
= [1 + 3 + 5 + · · · + (2x − 1)]
x − (x − 1)n
n
111
(x − 1)n−1
− [1 + 3 + · · · + (2x − 3)]
xn − (x − 1)n
A. Santhakumaran
U0 = {g(X) | c ∈ <}
where
c(−1)x−1
if x = 1, 2
g(x) =
0 x = 3, 4, · · · , N ; N = 2, 3, · · ·
By Theorem 4.7, CovN [δ(T ), g(X)] = 0 for N = 2, 3, · · · implies that δ(T ) is a
UMVUE of N where T = t(X) . That is
EN [δ(t(X))g(X)] = 0 N = 2, 3, · · · , ∀ c ∈ <
N
X 1
δ(t(x))g(x) = 0 N = 2, 3, · · · , ∀ c ∈ <
x=1
N
N
X
⇒ δ(t(x))g(x) = 0 N = 2, 3, · · · , ∀ c ∈ <
x=1
i.e., δ(t(1))c − δ(t(2))c = 0 ∀ c ∈ <
If one can take c = 1 , then δ(t(1)) = δ(t(2)).
.˙. Any estimator δ(T ) such that δ(t(1)) = δ(t(2)) is a UMVUE of N , provided
EN [δ 2 (T )] < ∞, for N = 2, 3, · · · . Thus a family of distributions is bounded com-
plete, then there is a class of UMVUE’s.
Example 4.16 Let X1 , X2 , · · · , Xn be a random sample of size n from a distri-
bution with pdf 1 −x
θe
θ 0 < x < ∞, θ > 0
pθ (x) =
0 otherwise
112
Probability Models and their Parametric Estimation
p(x1 , x2 , · · · , xn | θ) e = θ
θn
= c(θ)eQ(θ)t(x) h(x)
Pn
It is an one parameter exponential family. The statistic T = i=1 Xi is complete and
sufficient.
Pθ {X ≥ 2} = 1 − Pθ {X < 2}
Z 2
1 −x
= 1− e θ dx
0 θ
2
= eZ− θ
∞
1 − x 2−1
Eθ [X1 ] = e θ x1 dx1 = θ
0 θ
Xn
Let T = Xi , thenT ∼ G(n, θ)
i=1
− θ1 t n−1
1
pθ (t) = θ n Γn e t t>0
0 otherwise
n
X
Let y = xi , then
i=2
113
A. Santhakumaran
1 − θ1 t n−2
θ n Γ(n−1) e [t − x1 ]
pθ (x1 | t) = 1 − θ1 t tn−1
θ n Γn e
1
= (n − 1)[t − x1 ]n−2 n−1
t
(n − 1) 1t [1 − xt1 ]n−2
0 < x1 < t
=
0 otherwise
The UMVUE of θ is
t
n−1h
Z
x1 in−2
E[X1 | T = t] = x1 1− dx1
0 t t
Z t h
n−1 x1 in−2
= x1 1 − dx1
t 0 t
x1
One can take z = , then dx1 = tdz
t
When x1 = t ⇒ z = 1; when x1 = 0 ⇒ z = 0
Z 1
n−1
E[X1 | T = t] = (tz)[1 − z]n−2 tdz
t 0
Z 1
= (n − 1)t (1 − z)n−1−1 z 2−1 dz
0
Γ2Γ(n − 1) t nx̄
= (n − 1)t = = = x̄
Γ(n − 1 + 2) n n
2
The UMVUE of Pθ {X ≥ 2} is e− X̄
Example 4.17 Let X1 , X2 , · · · , Xn be a random sample from N (θ, σ 2 ) . Both
θ and σ are unknown. Find the UMVUE of σ and pth quantile.
(n−1)S 2 X̄)2
P
Let Y = σ2 = (Xσi − 2 ∼ χ2 distribution with (n − 1) degrees of
freedom. Y ∼ G( 12 , (n−1)
2 ).
( 1 n−1
n−1
1
e− 2 y y 2 −1 0<y<∞
p(y) = 2 2 Γ n−1
2
0 otherwise
√ Z ∞
1 1 n
E[ Y ] = n−1 e− 2 y y 2 −1 dy
0 2 Γ n−1
2
2
1 Γ n2
= n
( 12 ) 2
n−1
2 2 Γ n−1
2
"r #
n−1 2 Γ n2 √
i.e., Eσ S = 2
σ2 Γ n−1
2
Γ n2 √ σ
→ Eσ [S] = 2√
Γ n−1
2 n −1
1 Γ n−1
q
2
= σ where k(n) = Γn
2
n −1
k(n) 2
114
Probability Models and their Parametric Estimation
δp − θ
⇒ = z1−p ⇒ δp = z1−p σ + θ
σ
Thus the UMVUE of δp is Z1−p k(n)S + X̄ .
115
A. Santhakumaran
Likelihood Function
Definition 4.2 Consider a random sample X1 , X2 , · · · , Xn from a distribution
having pdf pθ (x), θ ∈ Ω . The joint probability density function of X1 , X2 , · · · , Xn
with a parameter θ is p(x1 , x2 , · · · , xn | θ) . The joint probability density function
may be regarded as a function of θ is called the likelihood function of the random
sample and is denoted by L(θ) = pθ (x1 , x2 , · · · , xn ) θ ∈ Ω.
Property 4.1 Let IX (θ) and IY (θ) be the amount of information of two inde-
pendent samples (X1 , X2 , · · · , Xn ) and (Y1 , Y2 , · · · Yn ) respectively. Let IXY (θ)
be the amount of information of the joint sample (X1 , Y1 )(X2 , Y2 ), · · · , (Xn , Yn ) .
Then IXY (θ) = IX (θ) + IY (θ) . This is known as additive property of Fisher Mea-
sure of Information.
116
Probability Models and their Parametric Estimation
Proof:
117
A. Santhakumaran
2
∂ log pθ (T )
IX (θ) + IT (θ) − 2Eθ ≥ 0
∂θ
IX (θ) + IT (θ) − 2IT (θ) ≥ 0
IX (θ) − IT (θ) ≥ 0
IX (θ) ≥ IT (θ)
Suppose T = t(X) is a sufficient statistic, then
pθ (x) = pθ (t)h(x)
log pθ (x) = log pθ (t) + log h(x)
Differentiate this with respect to θ
∂ log pθ (x) ∂ log pθ (t)
=
∂θ ∂θ
∂ log pθ (X) ∂ log pθ (T )
Vθ = Vθ
∂θ ∂θ
⇒ IX (θ) = IT (θ)
118
Probability Models and their Parametric Estimation
unbiased estimator with some lower bounds of the unbiased estimator which are not
sharp. The Cramer - Rao Inequality is very simple to calculate the lower bound for the
variance of an unbiased estimator. Also it provides asymptotically efficient estimators.
The assumptions of the Cramer - Rao Inequality are
(i) Ω is an open interval ( finite , infinite or semi infinite).
119
A. Santhakumaran
2
∂ 2 log pθ (X)
∂ log pθ (X)
Eθ + Eθ = 0
∂θ2 ∂θ
2
∂ 2 log pθ (X)
∂ log pθ (X)
Eθ = −Eθ
∂θ ∂θ2
2 2
∂ log pθ (X) ∂ log pθ (X)
But I(θ) = Eθ = −Eθ
∂θ ∂θ2
∂ log pθ (X)
= Vθ
∂θ
Z
Now Eθ [T ] = tpθ (x)dx
Differentiate this with respect to θ
Z
∂Eθ [T ] dpθ (x)
= t dx
∂θ dθ
Z
∂pθ (x) 1
= t pθ (x)dx
∂θ pθ (x)
Z
∂Eθ [T ] ∂ log pθ (x)
= t pθ (x)dx
∂θ ∂θ
∂ log pθ (X)
= Eθ T
∂θ
∂ log pθ (X)
= Covθ T,
∂θ
∂ log pθ (X)
∵ Eθ [ ]=0
∂θ
By Covariance Inequality
2
{Covθ [T, ψ(X, θ)]}
Vθ [T ] ≥ ∀θ∈Ω
Vθ [ψ(X, θ)]
∂ log pθ (x)
Take ψ(x, θ) =
∂θ
2
∂Eθ [T ]
∂θ
then Vθ [T ] ≥ ∀θ∈Ω
Vθ [ ∂ log∂θ
pθ (X)
]
2
∂Eθ [T ]
∂θ
i.e., Vθ [T ] ≥ ∀θ∈Ω
I(θ)
120
Probability Models and their Parametric Estimation
[τ 0 (θ) + b0 (θ)]2
Vθ [T ] ≥ ∀θ∈Ω
I(θ)
or
2
[τ 0 (θ)]
Vθ [T ] ≥ ∀θ∈Ω
I(θ)
h i Qn
where I(θ) = Vθ ∂ log∂θL(θ) and L(θ) = i=1 pθ (xi ).
121
A. Santhakumaran
2
∂ 2 log pθ (X)
∂ log pθ (X)
Eθ + Eθ = 0
∂θ2 ∂θ
2
∂ 2 log pθ (X)
∂ log pθ (X)
Eθ = −Eθ
∂θ ∂θ2
2 2
∂ log pθ (X) ∂ log pθ (X)
But I(θ) = Eθ = −Eθ
∂θ ∂θ2
∂ log pθ (X)
= Vθ
∂θ
Z
Now Eθ [T ] = tpθ (x)dx
Differentiate this with respect to θ
Z
∂Eθ [T ] dpθ (x)
= t dx
∂θ dθ
Z
∂pθ (x) 1
= t pθ (x)dx
∂θ pθ (x)
Z
∂Eθ [T ] ∂ log pθ (x)
= t pθ (x)dx
∂θ ∂θ
∂ log pθ (X)
= Eθ T
∂θ
∂ log pθ (X) ∂ log pθ (x)
= Covθ T, ∵ Eθ [ ]=0
∂θ ∂θ
By covariance inequality
2
{Covθ [T, ψ(X, θ)]}
Vθ [T ] ≥ ∀θ∈Ω
Vθ [ψ(X, θ)]
∂ log pθ (x)
Take ψ(x, θ) =
∂θ
2
∂Eθ [T ]
∂θ
then Vθ [T ] ≥ ∀θ∈Ω
Vθ [ ∂ log∂θ
pθ (X)
]
2
∂Eθ [T ]
∂θ
i.e., Vθ [T ] ≥ ∀θ∈Ω
I(θ)
122
Probability Models and their Parametric Estimation
since
∂ log pθ (x) ∂ log pθ (t)
=
∂θ ∂θ
2
∂ log pθ (X) ∂ log pθ (T )
Consider Eθ − ≥0
∂θ ∂θ
2 2
∂ log pθ (X) ∂ log pθ (T ) ∂ log pθ (X) ∂ log pθ (T )
Eθ + Eθ − 2Eθ ≥0
∂θ ∂θ ∂θ ∂θ
2
∂ log pθ (T )
IX (θ) + IT (θ) − 2Eθ ≥ 0
∂θ
IX (θ) + IT (θ) − 2IT (θ) ≥ 0
IX (θ) − IT (θ) ≥ 0
IX (θ) ≥ IT (θ)
Suppose T = t(X) is a sufficient statistic, then
pθ (x) = pθ (t)h(x)
log pθ (x) = log pθ (t) + log h(x)
Differentiate this with respect to θ
∂ log pθ (x) ∂ log pθ (t)
=
∂θ ∂θ
∂ log pθ (X) ∂ log pθ (T )
Vθ = Vθ
∂θ ∂θ
⇒ IX (θ) = IT (θ)
123
A. Santhakumaran
124
Probability Models and their Parametric Estimation
since Eθ [ ∂ log∂θ
pθ (X)
]=0 .
By covariance inequality
2
{Covθ [T, ψ(X, θ)]}
Vθ [T ] ≥ ∀θ∈Ω
Vθ [ψ(X, θ)]
∂ log pθ (x)
Take ψ(x, θ) =
∂θ
2
∂Eθ [T ]
∂θ
then Vθ [T ] ≥ ∀θ∈Ω
Vθ [ ∂ log∂θ
pθ (X)
]
2
∂Eθ [T ]
∂θ
i.e., Vθ [T ] ≥ ∀θ∈Ω
I(θ)
125
A. Santhakumaran
[τ 0 (θ) + b0 (θ)]2
Vθ [T ] ≥ ∀θ∈Ω
I(θ)
or
2
[τ 0 (θ)]
Vθ [T ] ≥ ∀θ∈Ω
I(θ)
h i Qn
where I(θ) = Vθ ∂ log∂θL(θ) and L(θ) = i=1 pθ (xi ).
126
Probability Models and their Parametric Estimation
Eθ [ψ(X, θ)] 0, ∀ θ ∈ Ω
=
Z
Eθ [ψ(X, θ)] = ψ(x, θ)pθ (x)dx
Z
pθ+∆ (x)
= − 1 pθ (x)dx
pθ (x)
Z
= [pθ+∆ (x) − pθ (x)]dx
= 1−1=0
Covθ [T, ψ(X, θ)] = Eθ [T ψ(X, θ)] − Eθ [T ]Eθ [ψ(X, θ)]
= Eθ [T ψ(X, θ)]
pθ+∆ (X)
= Eθ T −1
pθ (X)
pθ+∆ (x) − pθ (x)
Z
= t pθ (x)dx
pθ (x
Z Z
= tpθ+∆ (x)dx − tpθ (x)dx
= τ (θ + ∆) − τ (θ)
By covariance inequality
2
[τ (θ + ∆) − τ (θ)]
Vθ [T ] ≥ h i
Vθ pθ+∆ (X)
pθ (X) − 1
It is true for all values of ∆
[τ (θ + ∆) − τ (θ)]2
Vθ [T ] ≥ sup h
pθ+∆(X)
i
∆ V
θ pθ (X) − 1
127
A. Santhakumaran
Let y = φ(θ − φ)
Differentiate this with respect to φ
dy
= θ − 2φ
dφ
d2 y
= −2 < 0
dφ2
d2 y dy
For maximum of y , dφ2 < 0 at the value of φ for which dφ = 0 . At φ = θ2 , y has
2
maximum. The maximum value of y is θ4 . The Chapman - Robbin lower bound for
2
the variance of the unbiased estimator of θ is θ4 .
Remark 4.6 Chapman - Robbin bound becomes the Cramer - Rao lower bound
by allowing ∆ → 0 and assume the range of the distribution is independent of the
128
Probability Models and their Parametric Estimation
∂ log pθ (x)
parameter, and the derivative ∂θ exists and finite, then
[τ (θ + ∆) − τ (θ)]2
Vθ [T ] ≥ h i2
1
Eθ [pθ+∆ (X) − pθ (X)] pθ (X)
h i2
lim∆→0 [τ (θ+∆)−τ
∆
(θ)
≥ h i2
[pθ+∆ (X)−pθ (X)] 1
Eθ lim∆ →0 ∆ pθ (X)
[τ 0 (θ)]2
≥ h i2
1
Eθ p0 (X | θ) pθ (X)
[τ 0 ]2
≥ h i2
∂ log pθ (X)
Eθ ∂θ
[τ 0 (θ)]2
≥ ∀ θ∈Ω
I(θ)
Example 4.19 Obtain the Cramer - Rao lower bound for the variance of the unbi-
ased estimator of the parameter θ of the Cauchy distribution by considering a sample
of size n .
1 1
π 1+(x−θ)2
−∞ < x < ∞, −∞ < θ < ∞
pθ (x) =
0 otherwise
1 1
For a single observation x of X, L(θ) = pθ (x) =
π 1 + (x − θ)2
log L(θ) = − log π − log[1 + (x − θ)2 ]
∂ log pθ (x) 2(x − θ)
=
∂θ 1 + (x − θ)2
2
4(x − θ)2
∂ log pθ (x)
=
∂θ [1 + (x − θ)2 ]2
2
4(X − θ)2
∂ log pθ (X)
Eθ = Eθ
∂θ [1 + (X − θ)2 ]2
Z ∞
4 (x − θ)2
= dx
π −∞ [1 + (x − θ)2 ]3
Z ∞
4 t2
= dt since t = x − θ
π −∞ (1 + t2 )3
Z ∞
8 t2
= dt
π 0 (1 + t2 )3
Z ∞ 3
4 u 2 −1
= du since t2 = u
π 0 (1 + u) 23 + 32
3 3
4 Γ2Γ2
=
π Γ3
4 1√ 1√
π 2
π2 π 1
I(θ) = =
2 2
129
A. Santhakumaran
The Cramer - Rao lower bound from the sample of size n for the variance of the
0
(θ)]2
unbiased estimator of the parameter τ (θ) = θ is [τnI(θ) = n11 = n2 .
2
Example 4.20 Let X1 , X2 , · · · , Xn is a sample from N (θ, 1) . Obtain the
Cramer - Rao lower bound for the variance of (i) θ and (ii) θ2 . Also find the un-
biased estimator of θ2 . To verify that the actual variance of the unbiased estimator of
θ2 is same as Cramer - Rao lower bound.
(i) The likelihood function for θ is
n
Y
L(θ) = pθ (xi )
i=1
n
1 1
Pn 2
= e− 2 i=1 (xi −θ)
2π
n
√ 1X
log L(θ) = −n log 2π − (xi − θ)2
2 i=1
130
Probability Models and their Parametric Estimation
The Cramer - Rao lower bound for the variance of unbiased estimator of τ (θ) = θ2
0
(θ)]2 2
is [τI(θ) = 4θn where τ 0 (θ) = dτdθ(θ)
2 = 1.
Consider Eθ [X − θ]2 = 1
Eθ [X 2 ] − 1 = θ2
Pn 2
i=1 Xi
Eθ − 1 = θ2
n
Pn
Xi2
.. . i=1
n − 1 is the unbiased estimator of θ2 .
Pn
Xi2
Pn
(X −θ+θ)2
Pn
(X −θ)2 Pn
Consider i=1n = i=1 ni = i=1 n i + θ2 + 2θ
n i=1 (Xi − θ)
P 2 P 2
Xi Xi
Vθ −1 = Vθ
n n
2 X n
!
(Xi − θ)2
P
2θ
= Vθ + Vθ [Xi ] − 0
n n i=1
(Xi − θ)2 4θ2
P
= Vθ + 2 n since Vθ [Xi ] = 1 ∀ i = 1 to n
n n
2
4θ2
P
(Xi − θ)
= Vθ +
n n
2 2
P
ns (Xi − θ)
Define Y = 2 = 2
∼ χ2 distribution with n degrees of freedom
σ σ
n 1
The pdf of Y ∼ G ,
2 2
( 1 n
n
1
2 Γn
e− 2 y y 2 −1 0 < y < ∞
p(y) = 2 2
0 otherwise
Z ∞
1 − 21 y n
E [Y r ] = n ne y 2 +r−1 dy
0 2 2 Γ
2
1 Γ( n2 + r)
= n n
2 2 Γ n2 ( 12 ) 2 +r
2r Γ( n2 + r)
= r = 1, 2, · · ·
Γ n2
Γ( n + 1)
E[Y ] = 2 2 n =n
Γ2
E[Y 2 ] =
(n + 2)n and V [Y ] = 2n
ns2
But Y = and σ 2 = 1
σ 2
Y 2n 2
.. . Vθ [s2 ] = Vθ = 2 =
n n n
P 2 2
4θ2
Xi 4θ 2
Vθ −1 = Vθ [s2 ] + = +
n n n n
131
A. Santhakumaran
Xi2
P
4θ 2 2
The actual variance of n − 1 is n + n . Here the Cramer - Rao lower bound is
X2
P
less than the actual variance of the unbiased estimator n i − 1 of the parameter θ2 .
Note that the UMVUE of θ2 is X̄ 2 − n1 , since Eθ [X̄ 2 ] − {Eθ [X̄]}2 = n1
⇒ Eθ [X̄ 2 ] − n1 = θ2
i.e., X̄ 2 − n1 is unbiased estimator of θ2 .
Example 4.21 Given pθ (x) = θ1 , 0 < x < θ, θ > 0 . Compute the reciprocal
h i2
nEθ ∂ log∂θ pθ (X)
. Compare this with the variance of n+1
n T where T is the largest
observation of a random sample of size n for this distribution.
1
θ 0<x<θ
pθ (x) =
0 otherwise
1
log pθ (x = −
θ
∂ log pθ (x) 1
= −
∂θ θ
∂ log pθ (x) 1
=
∂θ θ2
2
∂ log pθ (X) 1
Eθ =
∂θ θ2
2
∂ log pθ (X) n
i.e., nEθ =
∂θ θ2
1 θ2
i2 =
n
h
nEθ ∂ log∂θ pθ (X)
132
Probability Models and their Parametric Estimation
Here the actual variance of the unbiased estimator of θ is less than the Cramer
- Rao lower bound of the estimator n+1 n T . Since the distribution is not satisfied the
assumptions of the Cramer - Rao Inequality . Note that n+1n T is the UMVUE of θ .
Example 4.22 Find the Cramer - Rao lower bound for the variance of the unbiased
estimator Pθ {X > 2} for a single observation x of X with pdf
1 −x
θe x>0θ>0
θ
pθ (x) =
0 otherwise
Z 2
1 −x
Consider τ (θ) = Pθ {X > 2} = 1 − e θ dx
0 θ
x 2
1 e− θ
= 1−
θ − θ1 0
2 2
= 1 + e− θ − 1 = e− θ
1
log pθ (x) = − log θ − x
θ
2
One can take λ = e− θ , then log λ = − θ2 i.e., θ = − log2 λ .
2 x
log pλ (x) = − log − + log λ
log λ 2
∂ log pλ (x) log λ −2 1 x1
= − (−2)(−1) (log λ) +
∂λ −2 λ 2λ
1 x
= +
λ log λ 2λ
∂ log pθ (x) θ x −2
2 = − 2 + e θ
∂ e θ− e θ 2
2
eθ
= [x − θ]
2
2
4
∂ log pθ (X) eθ
Eθ 2 = Eθ [X − θ]2
∂ e− θ 4
4
eθ 2
= θ since Eθ [X − θ]2 = θ2
4
The Cramer - Rao lower bound for the variance of the unbiased estimator of τ (θ) =
−2 4 2
e θ is θ42 e− θ , since τ 0 (θ) = ∂τ−(θ)2 = 1. The unbiased estimator of τ (θ) = e− θ
∂ e θ
is
1 if X > 2
T =
0 otherwise
4.9 Efficiency
As a consequence of Cramer - Rao Inequality, the efficient estimator is as follow:
133
A. Santhakumaran
n
X 1
Let T = Xi , thenT ∼ G n,
i=1
θ
θ n −θt n−1
Γn e t 0<t<∞
pθ (t) =
0 otherwise
Z ∞ n
1 θ −θt n−1−1
Eθ = e t dt
T 0 Γn
134
Probability Models and their Parametric Estimation
θn Γ(n − 1)
=
Γn θn−1
θ
=
n−1
n−1
Eθ = θ if n = 2, 3, · · ·
T
n−1
is the unbiased estimator of θ.
T
θ2
1
Eθ = if n = 3, 4, · · ·
T2 (n − 1)(n − 2)
θ2
1
Vθ =
T (n − 1)2 (n − 2)
θ2
n−1
Vθ = , if n = 3, 4, · · ·
T n−2
n−1 θ2
Actual variance of T is n−2 . Cramer - Rao lower bound of the unbiased estimator
n−1 θ2
T of θ is n.
θ2
n−2
Efficiency = θ2
n
n 1
= = 2 , n = 3, 4, · · ·
n−2 1− n
→ 1 as n → ∞
∂ log pθ (x)
i.e., t(x) − τ (θ) = A(θ)
∂θ
135
A. Santhakumaran
∂ log pθ (x)
t(x) − τ (θ) = A(θ)
∂θ
t(x) − τ (θ) ∂ log pθ (x)
=
A(θ) ∂θ
2 2
t(x) − τ (θ) ∂ log pθ (x)
=
A(θ) ∂θ
2
1 ∂ log pθ (X)
Eθ [T − τ (θ)]2 = Eθ
[A(θ)]2 ∂θ
2
Vθ [T ] ∂ log pθ (X)
= Eθ
[A(θ)]2 ∂θ
2
2 ∂ log pθ (X)
Vθ [T ] = [A(θ)] Eθ (4.2)
∂θ
136
Probability Models and their Parametric Estimation
∂ log pθ (X)
But Eθ T, = τ 0 (θ)
∂θ
∂ log pθ (x)
i.e, Eθ (T − τ (θ)) , = τ 0 (θ)
∂θ
since Eθ [ ∂ log∂θ
pθ (x)
]=0
" 2 #
∂ log pθ (X)
Eθ A(θ) = τ 0 (θ)
∂θ
since t(x) − τ (θ) = A(θ) ∂ log∂θ
pθ (x)
2
∂ log pθ (X)
A(θ)Eθ = τ 0 (θ)
∂θ
τ 0 (θ)
i.e., A(θ) = h i2
∂ log pθ (X)
Eθ ∂θ
[τ 0 (θ)]2
From equation (4.2) →Vθ [T ] = ∀θ∈Ω
Eθ [ ∂ log∂θ
pθ (X) 2
]
Thus the actual variance of T = t(X) is equal to the Cramer - Rao lower bound.
Remark 4.8 UMVUE may be most efficient estimator. As discussed in example
4.20, n−1
T , n = 3, 4, · · · is the UMVUE of θ but not most efficient estimator of θ .
C = [Cov(Xi , Xj )]r×r
137
A. Santhakumaran
138
Probability Models and their Parametric Estimation
Theorem 4.11 For any unbiased estimator T = t(X) of τ (θ) and any func-
tions ψi (x, θ) with finite second moments, then V [T ] ≥ ν 0 C −1 ν where ν 0 =
(ν1 , ν2 , · · · , νr ) and C = [cij ]r×r are defined by νi = Cov[T, ψi (X, θ)] and
cij = Cov[ψi (X, θ)ψj (X, θ)], i, j = 1, 2, · · · , r .
Proof: As in Lemma 4.2, replace Y by T and Xi by ψi (X, θ), then
ν 0 C −1 ν
ρ2 = ≤1
V [T ]
V [T ] ≥ ν 0 C −1 ν
where νi = Cov[T, ψi (X, θ)] = τi0 (θ), i = 1, 2, · · · , r, and C = Σ.
139
A. Santhakumaran
hTheorem i4.12 Suppose that assumptions (i) to (iii) and the relation
∂ log pθ (X)
Eθ ∂θi = 0, i = 1, 2, · · · , r hold and I(θ) is positive definite. Let
T = t(X) be any statistic with REθ [T 2 ] < ∞ for which the derivative with respect to
θi , i = 1, 2, · · · , r of Eθ [T ] = tpθ (x)dx exists for each i and can be obtained by
differentiating under the integral sign. Then Vθ [T ] ≥ α0 I −1 (θ)α, where α0 is the
row vector with ith element αi = ∂E∂θθ [T i
]
, i = 1, 2, · · · , r .
Proof: As in Theorem 4.11, replace ψi (x, θ) = ∂ log∂θpiθ (x) , i = 1, 2 · · · , r and
ν = α , C = I(θ) ⇒ Vθ [T ] ≥ α0 I −1 (θ)α.
Example 4.21 Let X1 , X2 , · · · , Xn iid N( θ, σ 2 ). Obtain the information in-
equality for the parameter θ = (θ, σ 2 ) .
140
Probability Models and their Parametric Estimation
2 4
i.e., Vθ [T1 ] ≥ σn and Vσ2 [T2 ] ≥ 2σn .
2
Remark 4.9 σn is the actual variance of the unbiased estimator T1 = X̄ for θ is
2σ 4
same as the Cramer - Rao lower bound of that estimator but n−1 is the actual variance
1
P n 2
of the unbiased estimator T2 = n−1 i=1 (Xi − X̄) is greater than the Cramer - Rao
lower bound of that estimator.
Theorem 4.13 Suppose that the assumptions (i) to (iv) hold and that the covariance
matrix K(θ) is positive definite. Let T = t(X) be any statistic with Eθ [T 2 ] < ∞ for
which the higher order derivative τ i1 +i2 +···+is (θ) exists for each i = 1, 2, · · · , s and
can be obtained by differentiating under the integral sign. Then Vθ [T ] ≥ α0 K −1 (θ)α,
where α0 is row vector with elements
∂ i1 +i2 +···+is Eθ [T ] ∂ i1 +i2 +···+is log L(θ)
= Covθ T,
∂θ1i1 · · · ∂θsis ∂θ1i1 · · · ∂θsis
= τ i1 +···+is (θ)
142
Probability Models and their Parametric Estimation
Example 4.25 Given that X ∼ b(n, θ) , 0 < θ < 1 . Obtain the Bhattacharya
bound for the unbiased estimator of the parameter τ (θ) = θ2 .
" #
∂ log L(θ) ∂ log L(θ) ∂ log L(θ) ∂ 2 log L(θ)
K(θ) = Eθ ∂θ ∂θ ∂θ ∂θ 2
∂ 2 log L(θ) ∂ log L(θ) ∂ 2 log L(θ) ∂ 2 log L(θ)
∂θ 2 ∂θ ∂θ 2 ∂θ 2
2
∂ log L(θ) ∂ log L(θ) ∂ 2 log L(θ)
∂θ ∂θ ∂θ 2
= Eθ
2 2
∂ 2 log L(θ) ∂ log L(θ)
∂ log L(θ)
∂θ 2 ∂θ ∂θ 2
143
A. Santhakumaran
n 0
θ(1−θ)
θ(1−θ) −1 0
K(θ) = , K (θ) = n
n2
θ 2 (1−θ)2
0 0
θ 2 (1−θ)2 n
2 0 00
τ (θ) = θ , τ (θ) = 2θ, τ (θ) = 2
θ(1−θ)
0
2θ
Vθ [T ] ≥
2θ, 2
n
θ 2 (1−θ)2 2
0
n
4θ 3 (1 − θ) 4θ 2 (1 − θ)2
≥ +
n n2
≥ Cramer - Rao lower bound of θ 2 + positive quantity
!
n! x 1
2 2
Since log L(θ) = log + log θ + (n − x) log[1 − (θ ) 2 ]
x!(n − x)! 2
nθ(1 − θ)
=
4θ 4 (1 − θ)2
n
I(θ) =
4θ 3 (1 − θ)
1
The Cramer - Rao lower bound for the variance of an unbiased estimator is I(θ) =
4θ 3 (1−θ)
n ,since τ 0 (θ) = 1.
Remark 4.10 (i) Bhattacharya Inequality becomes Cramer - Rao Inequality when
s = 1 , i.e., α1 = τ 0 (θ) and
∂ log L(θ) ∂ log L(θ)
K11 (θ) = Eθ
∂θ ∂θ
2
∂ log L(θ)
= Eθ = I(θ)
∂θ
Vθ [T ] ≥ α1 [I −1 (θ)]α1
α12
=
I(θ)
[τ 0 (θ)]2
= h i
Vθ ∂ log∂θL(θ)
(ii) When s = 2 Bhattacharya Inequality gives the non decreasing lower bound for the
variance of an unbiased estimator of τ (θ) .
The Bhattacharya Inequality is
Vθ [T ] ≥ α0 K −1 (θ)α
where α0 = (τ 0 (θ) τ 00 (θ)) and
K11 (θ) K12 (θ)
K(θ) =
K21 (θ) K22 (θ) 2×2
Vθ [T ] τ 0 (θ) τ 00 (θ)
144
Probability Models and their Parametric Estimation
2
Vθ [T ][K11 (θ)K22 (θ) − K12 (θ)] − τ 0 (θ)[τ 0 (θ)K22 (θ) − τ 00 (θ)K12 (θ)]
+ τ 00 (θ)[τ 0 (θ)K12 (θ) − τ 00 (θ)K11 (θ)] ≥ 0
2
Vθ [T ][K11 (θ)K22 (θ) − K12 (θ)] ≥ τ 0 (θ)[τ 0 (θ)K22 (θ) − τ 00 (θ)K12 (θ)] − τ 00 (θ)[τ 0 (θ)K12 (θ) −
τ 00 (θ)K11 (θ)]
≥ K 1 (θ) [τ 0 (θ)]2 K22 (θ)K11 (θ) − 2τ 0 (θ)τ 00 (θ)K11 (θ)K12 (θ) + [τ 00 (θ)]2 K11
2
(θ)
11
≥
1
[τ 0 (θ)]2 K12
2
(θ) + [τ 0 (θ)]2 K22 (θ)K11 (θ) − 2τ 0 (θ)τ 00 (θ)K11 (θ)K12 (θ) + [τ 00 (θ)]2 K11
2
(θ) − [τ 0 (θ)]2 K12
2
K11 (θ)
(θ)
≥ K 1 (θ) [τ 0 (θ)K12 (θ) − τ 00 (θ)K11 (θ)]2 + [τ 0 (θ)]2 [K11 (θ)K22 (θ) − K12 2
(θ)]
11
145
A. Santhakumaran
4.12 Stating the assumptions clearly, derive the Chapman - Robbin lower bound for
the variance of an unbiased estimator of a function of a real valued parameter θ .
4.13 A random sample X1 , X2 , · · · , Xn is available from a Poisson population with
mean λ . Using the unbiased estimator T = t(X1 , X2 ) = X12 − X2 . Obtain
the UMVUE of λ2 based on the sample.
4.14 State the Bhattacharya bound of order s . Also prove that it is a non - decreasing
function of s .
4.15 Define Bhattacharya bound. Show that it is sharper than the Cramer - Rao bound.
4.16 On the basis of a random sample of size n , the Cramer - Rao lower bound of
variance of an unbiased estimator of θ in
1
π[1+(x−θ)2 ] −∞ < x < ∞; −∞ < θ < ∞
pθ (x) =
0 otherwise
is equal to
( a) n1 (b) 1
n2 (c) 2
n (d) 2
n
146
Probability Models and their Parametric Estimation
147
A. Santhakumaran
5. METHODS OF ESTIMATION
5.1 Introduction
Chapters 2 , 3 and 4 disuse the properties of a good estimator. The methods of
obtaining such estimators are as follows:
(i) Method of Maximum Likelihood Estimation
(ii) Method of Minimum Variance Bound Estimation
(iii) Method of Moments Estimation
148
Probability Models and their Parametric Estimation
Pn
∂ log L(θ) 1
Pn 2 x2i
For maximum , ∂θ = 0 → −n + θ i=1 xi = 0 i. e., θ̂ =
i=1
n and
∂ 2 log L(θ) n
=− < 0 at θ = θ̂(x)
∂θ2 2θ̂
Pn
X2
The MLE of θ is θ̂(X) = i=1 n
i
.
Example 5.2 A random sample of size n is drawn from a population having
density function
θxθ−1 0 < x < 1, 0 < θ < ∞
pθ (x) =
0 otherwise
149
A. Santhakumaran
A sample of size n is taken and it is known that k of the observations are X > 2 and
(n − k) of the observation are X < 2 . The likelihood function for p of the sample
size n is
L(p) = pk (1 − p)n−k
log L(p) = k log p + (n − k) log(1 − p)
∂ log L(p) k (n − k)
= + (−1)
∂p p (1 − p)
k − np
=
p(1 − p)
∂ 2 log L(p) −np2 − k + 2pk
=
∂p2 [p(1 − p)]2
∂ log L(p)
For maximum, = 0
∂p
⇒ k − np = 0
k
i.e., p̂ = and
n
2
∂ 2 log L(p) −n nk 2 − k + 2 nk k
∂p2 k = k k 2
p̂= n n (1 − n )
k 1 − nk
k
= − < 0 since n < 1 for n = 1, 2, · · ·
k k 2
n (1 − n )
k
Thus the value of the MLE of p is p̂ = n . The value of the MLE of P {X > 2} =
2
− −2
e where θ̂(x) =
θ̂(x)
k .
log( n )
Example 5.4 Let X1 , X2 , · · · , Xn be a random sample drawn from a normal
population with mean θ and variance σ 2 . The density function
( 1 2
√ 1 e− 2σ2 (x−θ) −∞ < x < ∞, −∞ < θ < ∞, σ 2 > 0
pθ,σ2 (x) = 2πσ
0 otherwise
Find the MLE of
150
Probability Models and their Parametric Estimation
i=1 2πσ 2
− n − 1 P(xi −θ)2
= 2πσ 2 2 e 2σ2
n n 1 X
log L(θ) = − log 2π − log σ 2 − 2 (xi − θ)2
2 2 2σ
∂ log L(θ) 1 X
= (xi − θ)
∂θ σ2
∂ 2 log L(θ) n
= − 2 <0
∂θ2 σ
∂ log L(θ)
For maximum, = 0
∂θ
X ∂ 2 log L(θ)
⇒ (xi − θ) = 0 i.e., θ̂(x) = x̄ and <0
∂θ2
151
A. Santhakumaran
(xi − x̄)2
P
−n −n 2 2
− 0 > 0 at θ = θ̂(x) = x̄ and σ = σ̂ (x) =
σ̂ 2 (x) 2σ̂ 4 (x) n
∂ 2 log L(θ, σ 2 ) ∂ 2 log L(θ, σ 2 )
−n −n
since 2
= 2 <0 2 )2 2 2 = 2(σ̂ 2 (x))2 < 0
∂θ
θ=θ̂(x) σ̂ (x) ∂(σ σ =σ̂ (x)
2 2
∂ log L(θ, σ ) X −1
θ = θ̂(x) = (xi − x̄) 4 =0
∂θ∂σ 2 σ̂ (x)
σ 2 = σ̂ 2 (x)
P 2
.˙. The value of the MLE of θ and σ 2 are θ̂(x) = x̄ and σ̂ 2 (x) = (xni −x̄) .
Example 5.5 Find the MLE of the parameter α and λ ( λ being large) from a
sample of n independent observations from the population represented by the follow-
ing density function
( λ λ
(α) λ
−α x λ−1
pα,λ (x) = Γλ e x x > 0, λ > 0, α > 0
0 otherwise
Also obtain the asymptotic form of the covariance for the two parameters for large n .
Given that ∂ log
∂λ
Γλ 1
≈ log λ − 2λ .
Likelihood function for α and λ of the sample size n is
nλ Yn
1 λ
Pn λ
L(α, λ) = n
e− α i=1 xi xλ−1
i
(Γλ) α i=1
n n
λX X
log L(α, λ) = −n log Γλ + nλ log λ − nλ log α − xi + (λ − 1) log xi
α i=1 i=1
152
Probability Models and their Parametric Estimation
P
∂ log L(α, λ) nλ xi
=− +λ 2
∂α α α
∂ 2 log L(α, λ)
P
nλ xi
= 2 − 2λ 3
∂α2 α α
∂ 2 log L(α, λ)
P
n xi
=− + 2
∂λ∂α α α
Pn n
∂ log L(α, λ) ∂ log Γλ i=1 xi X
= −n + n(1 + log λ) − n log α − + log xi
∂λ ∂λ α i=1
P
∂ log L(α, λ) 1 xi X
= −n(log λ − ) + n + n log λ − n log α − + log xi
∂λ 2λ α
P
∂ log L(α, λ) n xi X
= + n − n log α − + log xi
∂λ 2λ α
∂ 2 log L(α, λ) n
2
=− 2
∂λ 2λ
∂ log L(α,λ)
For maximum of log L(α, λ), ∂α = 0 and ∂ log∂λ L(α,λ)
=0
P
λ xi
−n + λ 2 = 0 → α̂(x) = x̄ and
P α α
n xi X
+ n − n log α − + log xi = 0
2λ α
n
→ λ̂(x) = Pn
2 i=1 (log x̄ − log xi )
∂ 2 log L(α, λ)
n nx̄
Further =− + 2 =0
∂λ∂α
α=α̂(x),λ=λ̂(x) x̄ x̄
∂ 2 log L(α, λ)
< 0 and
∂λ2
λ=λ̂(x)
2
∂ 2 log L(α, λ) ∂ 2 log L(α, λ)
2
∂ log L(α, λ)
− > 0 at α = α̂(x) and λ = λ̂(x)
∂λ2 ∂α2 ∂λ∂α
" #
n nλ̂(x) 2λ̂(x)nx̄ n2 1
i.e., − − − 0 = >0
2λ̂2 (x) x̄2 x̄3 λ̂(x)x̄2 2
Thus the value of the MLE of α and λ are α̂(x) = x̄ and λ̂(x) = 2 P(log nx̄−log xi ) .
The asymptotic covariance matrix is
h 2 i h 2 i
−Eα,λ ∂ log∂αL(α,λ)
2 −Eα,λ ∂ log L(α,λ)
∂λ∂α
D= h 2 i h 2 i
−Eα,λ ∂ log L(α,λ)
∂α∂λ −Eα,λ
∂ log L(α,λ)
∂λ 2
153
A. Santhakumaran
" n #
∂ 2 log L(α, λ)
nλ 2λ X
−Eα,λ = − 2 + 3 Eα Xi
∂α2 α α i=1
nλ 2λ
= − + 3 nα
α2 α
nλ
= since Eα [Xi ] = α ∀ i
α2
∂ 2 log L(α, λ)
n
−Eα,λ =
∂λ2 2λ2
n
X
log L(α) = n log β − β (xi − α)
i=1
∂ log L(α)
= nβ
∂α
The direct method cannot help to estimate the MLE of α . Since α ≤ x(1) ≤ x(2) ≤
· · · ≤ x(n) < ∞ , i.e., the range of the distribution depends on the parameter α .
log L(α) = n log β − nβ x̄ + nβα
is maximum, if α is minimum , i.e., α̂ = x(1) = value of the minimum order statistic
of the sample. Thus the value of the MLE of α is the terminal value x(1) .
Example 5.7 Let X1 , X2 , · · · , Xn be a random sample drawn from a population
having density
1 −|x−θ|
2e −∞ < x < ∞, −∞ < θ < ∞
pθ (x) =
0 otherwise
154
Probability Models and their Parametric Estimation
P
L(θ) Pis maximum, if e |xi −θ| is minimum.
But e |xi −θ| is minimum if θ̂(x) = Median of the sample value, since mean devia-
tion is least when measured from the median. Thus the value of the MLE of θ is the
middle value of the sample.
Example 5.8 MLE is not unbiased
Let X1 , X2 , · · · , X5 be a random sample of size 5 from the uniform distribution hav-
ing pdf 1
θ 0 < x < θ, θ > 0
pθ (x) =
0 otherwise
Show that the MLE of θ is not unbiased.
The likelihood function for θ of the sample size n = 5 is
1
L(θ) = if 0 < xi < θ, i = 1, 2, 3, 4, 5.
θ5
L(θ) is maximum, the estimate of θ is minimum. If
If θ̂(x) = x(5) = max1≤i≤5 {xi }, then the value of the MLE of θ is θ̂(x) = x(5) .
Let Y = max1≤i≤5 {X5 } . The pdf of Y is
5 4
pθ (y) = θ5 t 0<y<θ
0 otherwise
Z θ
5 5
Eθ [Y ] = 5
t dt
0 θ
5
= θ 6= θ
6
The MLE θ̂(X) = X(5) is not an unbiased estimator.
Example 5.9 MLE is not unique and not sufficient statistic
Let X1 , X2 , · · · , Xn be iid with the pdf
1 θ ≤x≤θ+1
pθ (x) =
0 otherwise
155
A. Santhakumaran
Thus any point in [x(n) − 1, x(1) ] is a value of the MLE of θ . Thus the MLE of θ is
not unique and not sufficient statistic.
Example 5.10 MLE is not exist
Let X1 , X2 , · · · , Xn be a random sample drawn from a population with
pmf b(1, θ), 0 < θ < 1 both n and θ are unknown and the only sample values
(0, 0, 0, · · · , 0) or (1, 1, · · · , 1) is available.
The likelihood function for θ of the sample size n is
P P
= θ xi (1 − θ)n− xi
L(θ)
X X
log L(θ) = xi + n − xi log(1 − θ)
P P
∂ log L(θ) xi (n − xi )
= +
∂θ θ 1−θ
∂ log L(θ)
For maximum , = 0
∂θ
→ θ̂(x) = x̄ and
2
∂ log L(θ)
<0
∂θ2
θ=x̄
i=1
2πσ 2
156
Probability Models and their Parametric Estimation
1
Pn 1
Pn
log L(µi , σ 2 ) = −n log 2π − n log σ 2 − 2σ 2 i=1 (xi − µi )2 − 2σ 2 i=1 (yi − µi )2
∂ log L(µi , σ 2 )
= 0
∂µi
1 1
⇒ 2 (xi − µi ) + 2 (yi − µi ) = 0
σ σ
xi + yi
⇒ µ̂i = , i = 1, 2, · · · , n
2
" n n
#
∂ log L(µi , σ 2 ) −n 1 X 2
X
2
= 2 + 4 (xi − µi ) + (yi − µi ) = 0
∂σ 2 σ 2σ i=1 i=1
" n 2 X n 2 #
−n 1 X xi + yi xi + yi
+ 4 xi − + yi − =0
σ2 2σ i=1 2 i=1
2
" n n
#
−n 1 1X 2 1X 2
+ 4 (xi − yi ) + (xi − yi ) = 0
σ2 2σ 4 i= 4 i=1
n
1 X
⇒ σ̂ 2 (x, y) = (xi − yi )2
4n i=1
n
1 X
Thus σ̂ 2 (X, Y ) = (Xi − Yi )2 is not consistent estimator of σ 2 .
4n i=1
157
A. Santhakumaran
The use of successive iterations to solve the likelihood equations by assuming ∂ log∂θL(θ)
is continuous at θ for each xi , i = 1, 2, 3, · · · , n , where n is the sample size.
For example, a random variable has a Cauchy distribution depending on a location
parameter θ , i.e.,
1 1
π 1+(x−θ)2 −∞ < x < ∞
pθ (x) =
0 otherwise
Taking a sample of size n from the population, the log likelihood function for θ is
n
X
log L(θ) = −n log π − log[1 + (xi − θ)2 ]
i=1
n
∂ log L(θ) X 2(xi − θ)
= −
∂θ i=1
1 + (xi − θ)2
158
Probability Models and their Parametric Estimation
and so on. Starting from an initial solution θ0 , one can generate a sequence {θk , k =
0, 1, · · · } which is determined successively by the formula
∂ log L(θk )
∂θ
θk+1 = θk − ∂ 2 log L(θk )
, k = 0, 1, 2, · · · (5.4)
∂θ 2
If the initial solution θ0 was chosen, close to the root of the likelihood equations θ̂(x)
2
and if ∂ log L(θk )
∂θ 2 for k = 0, 1, · · · , is bounded away from zero, there is a good
chance that the sequence generated by equation (5.4) will converge to the root θ̂(x) .
The sequence {θk , k = 0, 1, · · · , } generated by equation (5.4) depends on the sample
values X1 , X2 , · · · Xn . If the chosen initial solution θ0 is a consistent estimator of θ ,
then the sequence obtained by the equation (5.4) will faster converge to the root θ̂(x)
and provide the best asymptotically normal estimator of θ .
In small sample situations the sequence {θk , k = 0, 1, · · · , } generated by
equation (5.4) may convey irregularities due to the particular sample values obtained
in the experiment. In order to avoid irregularities in the approximating sequence, two
methods are proposed. They are fixed derivative method and method of scoring.
(ii) The Method of Fixed derivative
2
In the fixed derivative method, the term ∂ log L(θk )
∂θ 2 in equation (5.4) is re-
placed by − ank where {ak , k = 0, 1, · · · } is a suitable chosen sequence of constants
and n is the sample size.
Now the sequence {θk , k = 0, 1, · · · } is generated by
ak ∂ log L(θk )
θk+1 = θk + , k = 0, 1, 2, · · · (5.5)
n ∂θ
The sequence {θk , k = 0, 1, · · · , } converge to the root θ̂(x) in a more regular fash-
ion rather than the equation (5.4) by the choice sequence {ak }∞ k=0
Fixed derivative method fails to converge in many cases, the method of scoring
may use to locate the local maximum, since the log likelihood curve is steep in the
neighbour hood of a local maximum equation (5.5).
(iii) The Method of Scoring
The method of scoring is a special case of the fixed derivative method. The
special sequence {ak , k = 0, 1, · · · , } is chosen by Fisher. It is ak = I(θnk ) , where
I(θk ) is the amount of Fisher Information of n observations x of X and θk is the
value of the approximation after the (k − 1)th iteration. Thus Fisher’s scoring method
generates the sequence
1 ∂ log L(θk )
θk+1 = θk +
I(θk ) ∂θ
159
A. Santhakumaran
Arrange the sample values in the incceasing order of magnitude. Let the first trial value
of θ is θ̂(x) = t1 = the value of the sample median. The first approximation value is
n
4X (xi − t1 )
t2 = t1 +
n i=1 1 + (xi − t1 )2
The successive iteration values are t3 , t4 , · · · . This procedure is continued until any
two successive iterations values are equal. The convergent value is the value of the
MLE of θ .
C programme for MLE of θ of Cauchy distribution
#include < stdio.h >
#include < math.h >
#include < conio.h >
void main()
{
int i,j,n;
float a[100], sum[100], t[100], temp;
clrscr();
printf( ˝ Enter the number of observations n: \ n”);
scanf( ˝ %d”, &n);
printf( ˝ Enter the observations a: \ n”);
for(i= 1; i < = n; i++)
scanf( ˝ % f”, &a[i]);
for(i=1; i < = n-1, i++)
{
for(j=i+1; j < = n; j++)
{
if(a[i] > = a[j])
{
temp=a[i];
a[i]= a[j];
a[j]= temp;
}
}
}
if(n % 2 = = 0)
t[1] = (a[n/2] + a[ n/2 + 1]) / 2 ;
160
Probability Models and their Parametric Estimation
else
t[1] = a[(n+1)/2];
printf( ˝ \ n OUT PUT \ n \ n ”);
printf( ˝ Value of the MLE of the Cauchy Distribution \ n”);
printf( ˝ \ n - - - - - - - - - - - - - - \ n”);
for(i=1:i < = n; i++)
printf( ˝ \ t %f \ n”, a[i]);
printf( ˝ \ n Result: \ n \ n”);
printf( ˝ Median = t[1] = %f \ n \ n”, t[1]);
for(j=1; j < =n; j++)
{
sum[j]= 0;
for(i =1; i < = n; i++)
{
sum[j] = sum[j] + (a[i] - t[j]) / (1 + (a[i] - t[j]) *(a[i] - t[j]) );
}
printf( ˝ Sum[%d] = % f \ t \ n”, j, sum[j]);
t[j+1] = t[j] + (4 / (float)n)*(sum[j]);
printf( ˝ t[%d] = %f \ n ”, j+1, t[j+1]);
if(abs(t[j] -t[j+1] ) > = .001 )
break;
}
printf( ˝ \ n Value of the MLE of theta = % f”,t[j] );
getch();
}
The value of MLE of θ = 6.013498.
Example 5.13 Obtain the values of the MLE’s of the parameters b and c of the
pdf
c
c c−1 − xb
x e x, b, c > 0
pb,c (x) = b
0 otherwise
based on a sample of size n .
The likelihood function for b and c of the sample size n is
n
c n Y Pn
1
xci
L(c, b) = xc−1
i e− b i=1
b i=1
n
X 1X c
log L(c, b) = n log c − n log b + (c − 1) log xi − x
b i=1 i
n
∂ log L(c, b) n X c X c−1
= + log xi − x
∂c c b i=1 i
n
∂ log L(c, b) n 1 X c
= − + 2 x
∂b b b i=1 i
161
A. Santhakumaran
∂ log L(c)
= 0
∂c
n n
n X c X
⇒ + log xi − xc−1
i = 0
c i=1
b i=1
n
X n
X
i.e., c2 xc−1
i − cb log xi − nb = 0
i=1 i=1
The estimates of c and b are obtained to solve the above equations for c and b by
iterative method.
162
Probability Models and their Parametric Estimation
By Jensen’s Inequality for the convex function f (X) ⇒ E[f (X)] ≤ f (E[X]). Here
p (x) p (x)
− log pθθ0 (x) = log pθθ1 (x) is strictly convex. 1
1 0
pθ1 (x)
For the convex function, log
pθ0 (x)
pθ1 (X) pθ1 (X)
Eθ0 log ≤ log Eθ0
pθ0 (X) pθ0 (X)
Z
pθ1 (X) pθ1 (x)
But Eθ0 = pθ (x)dx = 1
pθ0 (X) pθ0 (x) 0
1 dy 1
y = log x is a concave function and − log x is a convex function, since dx
= x
>0 ↑ ∀x>0
d2 y
and dx2
= − x12 <0
163
A. Santhakumaran
L(θ0 )
.˙. lim Pθ0 {Sn } = Pθ0 lim >1
n→∞ n→∞ L(θ1 )
( n )
1X pθ1 (Xi )
= Pθ0 lim log <0
n→∞ n pθ0 (Xi )
i=1
pθ1 (X)
= Pθ0 Eθ0 log < 0 → 1 as n → ∞
pθ0 (X)
pθ1 (X)
= Pθ0 log Eθ0 < 0 → 1 as n → ∞
pθ0 (X)
Pθ0 {L(θ0 ) > L(θ1 )} → 1 as n → ∞
MLE is consistent
Theorem 5.1 (Dugue, 1937) If log L(θ) is differentiable in an interval including
the true value of θ, say θ0 , then under the assumptions of Lemma 5.1, the likelihood
equation ∂ log∂θL(θ) = 0 has a root with probability 1 as n → ∞ which is consistent
for θ0 .
Proof: Let θ0 be o of θ and consider an interval (θ0 ± δ) , δ > 0 .
n the true value
L(θ0 )
By Lemma 5.1 Pθ0 L(θ1 ) > 1 → 1 as n → ∞, where θ1 = θ0 ± δ, since θ0 ∈
(θ0 − δ, θ0 + δ) and the likelihood function is continuous in (θ0 − δ, θ0 + δ) .
L(θ) should have a relative maximum within (θ0 − δ, θ0 + δ) with probability tends
to 1 as n → ∞ , since L(θ) is differentiable over (θ0 − δ, θ0 + δ) .
⇒ ∂ log∂θL(θ) = 0 at some point in (θ0 − δ, θ0 + δ)
⇒ θ̂(x) is a solution of ∂ log∂θL(θ) = 0 in (θ0 − δ, θ0 + δ)
⇒ θ̂(X) n ∈ [θ0 − δ, θ0 + δ] with probability
o tends to 1 as n → ∞
⇒ Pθ0 θ0 − δ < θ̂(X) < θ0 + δ → 1 as n → ∞
n o
⇒ Pθ0 θ̂(X) − θ0 < δ → 1as n → ∞
P
⇒ θ̂(X) → θ0 as n → ∞
⇒ θ̂(X) is a consistent estimator of θ .
MLE maximizes the Likelihood
Theorem 5.2 ( Huzurbazar, 1948) If log L(θ) is twice differentiable in an interval
including the true value of the parameter, than the consistent solution of the likelihood
equation [ which exists with probability one by Theorem 5.1 ] maximizes the likelihood
at the true value with probability tends to one, i.e.,
( )
∂ 2 log L(θ)
Pθ0 < 0 → 1 as n → ∞
∂θ2
θ=θ̂(x)
∂ 2 log L(θ)
Proof: Expanding ∂ 2 θ2 as Taylor’s series around θ̂(x) is
∂ 2 log L[θ̂(x)] ∂ 2 log L(θ0 ) 3
L(θ ? )
∂θ 2 = ∂θ 2 +[θ̂(x)−θ0 ] ∂ log
∂θ 3 where θ? = θ0 +ν(θ̂(x)−θ0 ), 0 <
ν<1
164
Probability Models and their Parametric Estimation
3
L(θ ? )
Further, assume ∂ log ≤ H(x) ∀ θ ∈ Ω and Eθ0 [H(X)] < ∞ is independent
∂θ 3
of θ0 .
∂ 2 log L[θ̂(x)] ∂ 2 log L(θ ) 3
∂ log L(θ? )
0
− ≤ |θ̂(x) − θ0 |
∂θ2 ∂θ2 ∂θ3
≤ |θ̂(x) − θ0 |H(x)
P P
|θ̂(X) − θ0 |H(X) → 0 as n → ∞ since θ̂(X) → θ0 as n → ∞
( )
∂ 2 log L[θ̂(X)] ∂ 2 log L(θ )
0
Pθ0 − < → 1 as n → ∞
∂θ2 ∂θ2
Each X1 , X2 , · · · , Xn is iid and by Khintchin’s Law of Large Numbers
n
1 X ∂ 2 log pθ (xi ) P
2
∂ log pθ (X)
→ Eθ 0 as n → ∞
n i=1 ∂θ2 ∂θ2
2
∂ log pθ (X)
Since I(θ0 ) ≥ 0 → Eθ0 = −I(θ0 ) < 0
∂θ2
( n )
. 1 X ∂ 2 log pθ (X)
. .Pθ0 <0 → 1 as n → ∞
n i=1 ∂θ2
n
( )
∂ 2 log L(θ)
Y
Since L(θ) = pθ (xi ) → Pθ0 <0 → 1 as n → ∞
i=1
∂θ2
θ=θ̂(x)
Theorem 5.3 ( Cramer p 1946) Let θ̂(X) be the MLE of θ , then under the regular-
ity conditions (i) to (iii) nI(θ0 )(θ̂(X) − θ0 ) has an asymptotic normal distribution
with mean zero and variance one
Proof: Let θ̂(X) be the solution of ∂ log∂θL(θ) = 0 in an interval containing the
true value θ0 of θ .
Expanding the function ∂ log∂θL(θ) around θ̂(x) by using Taylor’s series for any fixed
165
A. Santhakumaran
x,
2
∂ log L θ̂(x) ∂ log L(θ0 ) 2
∂ log L(θ0 ) θ̂(x) − θ 0 ∂ 3 log L(θ? )
i.e., = + θ̂(x) − θ0 +
∂θ ∂θ ∂θ2 2! ∂θ3
where θ? = θ0 + ν θ̂(x) − θ0 , 0 < ν < 1.
2
∂ log L(θ̂(x)) ∂ log L(θ0 ) ∂ 2 log L(θ ) θ̂(x) − θ0 ∂ 3 log L(θ? )
0
But =0 → + θ̂(x) − θ0 2
+ =0
∂θ ∂θ ∂θ 2 ∂θ3
2
∂ 2 log L(θ ) θ̂(x) − θ0 ∂ 3 log L(θ? ) ∂ log L(θ0 )
0
θ̂(x) − θ0 2
+ 3
=−
∂θ 2 ∂θ ∂θ
∂ 2 log L(θ ) θ̂(x) − θ0 ∂ 3 log L(θ? )
= − ∂ log L(θ0 )
0
θ̂(x) − θ0 +
∂θ2 2 ∂θ3 ∂θ
1 ∂ log L(θ0 )
n ∂θ
θ̂(x) − θ0 =
1 ∂ 2 log L(θ0 ) (θ̂(x)−θ0 ) 1 ∂ 3 log L(θ? )
−n ∂θ 2 − 2 n ∂θ 3
I(θ0 )
nI(θ0 ) n1 ∂ log∂θ
L(θ0 )
p
I(θ0 )
p
nI(θ0 ) θ̂(x) − θ0 =
2 (θ̂(x)−θ0 ) 1 ∂ 3 log L(θ ? )
− n1 ∂ log L(θ0 )
∂θ 2
− 2 n ∂θ 3
1 ∂ log L(θ0 )
√ ∂θ
nI(θ0 )
p
nI(θ0 ) θ̂(x) − θ0 =
2 (θ̂(x)−θ0 ) 1 ∂ 3 log L(θ? )
1
I(θ0 ) − n1 ∂ log L(θ0 )
∂θ 2 − 2 n ∂θ 3
n
1 X ∂ 2 log pθ (xi ) P
→ −I(θ0 ) as n → ∞
n i=1 ∂θ2
P P
Also θ̂(X) → θ0 as n → ∞ → θ̂(X) − θ0 → 0 as n → ∞ and
Eθ0 [H(X)]
h = ki as n → ∞. Denote Zi = ∂ log∂θ pθ (xi )
, i = 1, 2, · · · , n. Eθ0 [Zi ] =
∂ log pθ (Xi )
Eθ0 ∂θ = 0 ∀ i = 1, 2, · · · , n. Let Sn = Z1 + · · · + Zn , then E[Sn ] = 0
and V [Sn ] = I(θ0 ) + · · · + I(θ0 ) = nI(θ0 )
∂ log L(θ0 )
√ 1 ∂θ
nI(θ0 )
p
nI(θ0 ) θ̂(X) − θ0 = 1
1 as n → ∞
I(θ0 ) − n (−nI(θ0 )) − 0
166
Probability Models and their Parametric Estimation
p ∂ log L(θ0 )
nI(θ0 ) θ̂(X) − θ0 = p ∂θ as n → ∞
nI(θ0 )
n −E[Sn ] d
By Lindeberg - Levey Central Limit Theorem S√ → N (0, 1) as n → ∞ .
V [Sn ]
p
.˙. nI(θ0 ) θ̂(X) − θ0 ∼ N (0, 1) as n → ∞ .
Remark 5.2 Any consistent estimator θ̂(X) of roots of the likelihood equation
√
satisfies n(θ̂(X)−θ0 ) ∼ N (0, I(θ10 ) ), then θ̂(X) is an efficient likelihood estimator
of θ or asymptotically normal and efficient estimator of θ .
MLE is unique
Theorem 5.4 ( Wald 1949) Consistent solution of a likelihood equation is unique with
probability 1 as n → ∞
ˆ ˆ ∂ log L(θ)
Proof: Let θ1 (x) and θ2 (x) be two consistent solutions of ∂θ = 0 and
ˆ ˆ
θ1 (x) 6= θ2 (x) . By Huzurbazar’s Theorem
( )
∂ 2 log L(θˆ1 (X))
Pθ < 0 → 1 as n → ∞ and
∂θ2
( )
∂ 2 log L(θˆ2 (X))
Pθ < 0 → 1 as n → ∞
∂θ2
∂ log L(θ) 2 ˆ
∂ log L(θ3 (x))
Applying Rolle’s Theorem to the function
∂θ which gives ∂θ 2 = 0
ˆ ˆ ˆ ˆ ˆ
for some θ3 (x) within the interval θ1 (x), θ2 (x) where θ3 (x) = λθ1 (x) + (1 −
λ)θˆ2 (x), 0 < λ < 1. θˆ3 (x) is also a consistent solution of ∂ log∂θL(θ) = 0. Thus
( )
∂ 2 log L(θˆ3 (X))
Pθ < 0 → 1 as n → ∞
∂θ2
∂ 2 log L(θˆ3 (x)) 2
∂ log L(θ̂(x))
∂θ 2 < 0 is a contradiction to Rolle’s Theorem
property that ∂θ 2 =
ˆ ˆ ˆ ˆ
0 for some θ3 (x) within the interval θ1 (x), θ2 (x) . The only possibility is θ1 (x) =
θˆ2 (x) . Thus θˆ1 (x) = θˆ2 (x) is a consistent solution of the likelihood equation and is
unique.
Invariance Property of MLE
Let X ∼ Pθ , θ ∈ Ω, where Ω is a k dimensional parameter space. Consider
g(θ) : Ω → O where O is the r dimensional space (r ≤ k) . If θ̂ is the MLE of θ ,
then g(θ̂) is the MLE of g(θ) .
Let g(θ) be the function of θ from Ω to O , i.e., g : Ω → O ∀θ ∈ Ω
i.e., g(θ) = ω ∈ O . For a fixed ω ∈ O , let
Aω = [θ | g(θ) = ω]
= the set of all θ0 s such that g(θ) = ω fixed ∀ ω ∈ O
.. . ∩ω Aω = Ω
167
A. Santhakumaran
n
∂ log L(θ) 1X
For maximum, = 0 → A0 (θ) = t(xi ) (5.6)
∂θ n i=1
and
∂ 2 log L(θ)
= −nA00 (θ) < 0
∂θ2
Z
Consider eθt(x)−A(θ) h(x)dx = 1
Assume that the integral is continuous and has derivatives of all orders with re-
spect to θ and it can be differentiated under the integral sign.
Z Z
t(x)eθt(x)−A(θ) h(x)dx − A0 (θ)e−A(θ) eθt(x) h(x)dx = 0
Z
Eθ [T ] = A0 (θ) eθt(x)−A(θ) h(x)dx
A0 (θ) = Eθ [T ] (5.7)
Pn
Using equations (5.6) and (5.7), one may get Eθ [T ] = n1 i=1 t(xi )
Z Z
0
t(x)e θt(x)−A(θ)
h(x)dx − A (θ) = 0 since eθt(x) e−A(θ) h(x)dx = 1
168
Probability Models and their Parametric Estimation
[τ 0 (θ)]2
Vθ [T ] = h i2 ∀ θ ∈ Ω
∂ log L(θ)
Eθ ∂θ
169
A. Santhakumaran
∂ log L(θ)
Covθ T, = τ 0 (θ), ∀ θ ∈ Ω.
∂θ
∂ log L(θ)
i.e., Eθ T = τ 0 (θ), ∀ θ ∈ Ω.
∂θ
∂ logL(θ) ∂ log L(θ)
Eθ (T − τ (θ)) = τ 0 (θ), since Eθ =0∀θ∈Ω
∂θ ∂θ
∂ log L(θ)
A(θ)Eθ [T − τ (θ)]2 = τ 0 (θ) since = A(θ)[t(x) − τ 0 (θ)]
∂θ
A(θ)Vθ [T ] = τ 0 (θ)
τ 0 (θ)
A(θ) =
Vθ [T ]
Squaring both sides of (5.8), one can get
2
∂ log L(θ) 2
A2 (θ) t(x) − τ 0 (θ)
=
∂θ
2
∂ log L(θ)
Eθ = A2 (θ)Vθ [T ]
∂θ
2
[τ 0 (θ)]2 Vθ [T ]
∂ log L(θ)
i.e., Eθ =
∂θ {Vθ [T ]}2
[τ 0 (θ)]2
i.e., Vθ [T ] = h i2 ∀ θ ∈ Ω
∂ log L(θ)
Eθ ∂θ
T = t(X) attains the Cramer - Rao lower bound, i.e., T = t(X) is a MVBE of
τ (θ) .
Conversely, assume T = t(X) is a MVBE of τ (θ) . Now to prove ∂ log∂θL(θ) ∝
[t(x) − τ (θ)] , i.e., ∂ log∂θL(θ) = A(θ)[t(x) − τ (θ)] , τ 0 (θ) = A(θ)Vθ [T ] and
h i2 0 2
Eθ ∂ log∂θL(θ) = [τVθ(θ)]
[T ]
2
A2 (θ)Vθ2 [T ]
∂ log L(θ)
.˙. Eθ =
∂θ Vθ [T ]
2
∂ log L(θ)
Eθ = A2 (θ)Vθ [T ]
∂θ
2
∂ log L(θ)
Eθ = A2 (θ)Eθ [T − τ (θ)]2
∂θ
∂ log L(θ)
⇒ = A(θ)[t(x) − τ (θ)]
∂θ
∂ log L(θ)
i.e., ∝ [t(x) − τ (θ)]
∂θ
170
Probability Models and their Parametric Estimation
171
A. Santhakumaran
Theorem 5.6 The necessary and sufficient condition that distribution admits the
estimator of a suitable chosen function of a parameter with variance equal to the in-
formation limit ( MVB) is that the likelihood function L(θ) = eθ1 t(x)+θ2 h(x), where
h(x) and t(x) are functions of observations only and θ1 and θ2 are functions of θ
only. The parametric functions to be estimated is − dθ dθ2 dθ
dθ1 = − dθ dθ1 and the variance
2
2
h i
of the estimator is − ddθθ22 = dθ
d
− dθ
dθ1
2 1
dθ
1 dθ1
Proof: Let T = t(X) be the MVBE of τ (θ) where θ is the population parameter.
For a single observation x of X , the likelihood function for θ is L(θ) = pθ (x) , and
t(x) − τ (θ) and ∂ log∂θL(θ) are proportional, i.e.,
∂ log L(θ)
= A(θ)[t(x) − τ (θ)]
∂θ
where A(θ) is a function of θ only.
Integrating with respect to θ , one can get
Z Z
∂ log L(θ) = A(θ)[t(x) − τ (θ)]dθ + c
Further, assuming the differentiation with respect to θ1 under the integral sign is valid
and differentiate twice, one can get
Z
dθ2
h(x)et(x)θ1 t(x)dx = e−θ2 − (5.9)
dθ1
2
d2 θ2
Z
dθ2
h(x)et(x)θ1 [t2 (x)]dx = e−θ2 − e−θ2 (5.10)
dθ1 dθ12
172
Probability Models and their Parametric Estimation
2
dθ2 d2 θ2
R 2
From equation (5.10), t (x)et(x)θ1 +θ2 h(x)dx = dθ 1
− dθ12
2
d2 θ2
2 dθ2
Eθ [T ] = −
dθ1 dθ12
2 d2 θ 2
Vθ [T ] = Eθ [T 2 ] − (Eθ [T ]) = −
dθ12
173
A. Santhakumaran
The variance of the MVBE of τ (θ) = − dθ
dθ1
2
is
2
{ ddθθ22 }2 { dθ 1 2
dθ } d2 θ 2
1
=−
2
− ddθθ22 ( dθ 1 2
dθ )
dθ12
1
2
The variance of T = t(X) is − ddθθ22 . Thus T = t(X) attains the MVB of the
1
parametric function τ (θ).
Example 5.16 Let X1 , X2 , · · · , Xn be a random sample drawn from the popula-
tion with pdf
θxθ−1 0 < x < 1, θ > 0
pθ (x) =
0 otherwise
Find the MVBE of θ .
The likelihood function for θ is
n
!θ−1
Y
n
L(θ) = θ xi
i=1
n
X
log L(θ) = n log θ + (θ − 1) log xi
i=1
log xi − n
P P
n log θ+θ i=1 log xi
→ L(θ) = e
→ L(θ) = eθ1 t(x)+θ2 h(x)
where θ1 = θ, θ2 = n log θ,
P X
h(x) = e− log xi
, t(x) = log xi
dθ2 n
τ (θ) = − =−
dθ1 θ
d nθ
2
d θ2 n
Vθ [T ] = − 2 = − = 2
dθ1 dθ θ
∂ log L(θ)
= A(θ)[t(x) − θ]
∂θ
∂ log L(θ) ∂ 2 log L(θ)
L(θ) attains maximum, if ∂θ = 0 and ∂θ 2 < 0 at θ = θ̂(x) .
174
Probability Models and their Parametric Estimation
∂ 2 log L(θ)
= A0 (θ)[t(x) − θ] + A(θ)(−1)
∂θ2
∂ 2 log L(θ)
= −A(θ̂(x)) < 0 at θ = θ̂(x)
∂θ2
where θ(X)ˆ is MLE of θ .
Example 5.17 If T = t(X) is MVBE of τ (θ) and pθ (x1 , x2 , · · · , xn ) the joint
density function corresponding to n independent observations of a random variable
X , then show that correlation between T and ∂ log pθ (x∂θ 1 ,x2 ,··· ,xn )
is unity.
Given T = t(X) is the MVUE of τ (θ) , i.e., T attains the Cramer Rao lower
bound,
[τ 0 (θ)]2
⇒ Vθ [T ] = ] θ∈Ω
Vθ [ ∂ log pθ (x∂θ
1 ,x2 ,··· ,xn )
∂ log pθ (x1 , x2 , · · · , xn )
i.e., [τ 0 (θ)]2 = Vθ [T ]Vθ [ ]
∂θ
r
∂ log pθ (x1 , x2 , · · · , xn )
τ 0 (θ) = Vθ [T ]Vθ [ ]
∂θ
But τ (θ) = Eθ [T ]
Z
= tpθ (x1 , x2 , · · · , xn )dx
∂pθ (x1 , x2 , · · · , xn )
Z
τ 0 (θ) = t dx
∂θ
∂pθ (x1 , x2 , · · · , xn ) pθ (x1 , x2 , · · · , xn )
Z
= t dx
∂θ pθ (x1 , x2 , · · · , xn )
∂ log pθ (x1 , x2 , · · · , xn )
Z
= t pθ (x1 , x2 , · · · , xn )dx
∂θ
∂ log pθ (x1 , x2 , · · · , xn )
= Eθ T
∂θ
log pθ (x1 , x2 , · · · , xn )
= Covθ T,
∂θ
Correlation coefficient between T and log pθ (x1∂θ ,x2 ,··· ,xn )
is
h i
Covθ T, log pθ (x1∂θ,x2 ,··· ,xn )
ρ= r h i
Vθ [T ]Vθ ∂ log pθ (x∂θ
1 ,x2 ,··· ,xn )
τ 0 (θ)
ρ = r h i
∂ log pθ (x1 ,x2 ,··· ,xn )
Vθ [T ]Vθ ∂θ
= 1
r h i
∂ log pθ (x1 ,x2 ,··· ,xn )
Since τ 0 (θ) = Vθ [T ]Vθ ∂θ
175
A. Santhakumaran
This is not true when the moments of the distribution do not exist. For example in the
case of Cauchy distribution moment estimators do not exist.
Example 5.18 A random sample of size n is taken from the log normal distribu-
tion ( 2
√1 1 − 2σ12 (log x−θ)
x e x>0
pθ,σ2 (x) = 2πσ
0 otherwise
176
Probability Models and their Parametric Estimation
Example 5.19 Find the moment estimates of α and β for the pdf
( β
α −αx β−1
pα,β (x) = Γβ e x x > 0, β > 0, α > 0
0 otherwise
177
A. Santhakumaran
178
Probability Models and their Parametric Estimation
Z b
x a+b
µ01 = E[X] = dx =
a b−a 2
b
x2 b3 − a3 b2 + ab + a2
Z
1
µ02 = E[X ] =2
dx = =
a a−b b−a 3 3
2
b + 2ab + a2 − ab
2
(2µ01 ) − ab
µ02 = =
3 3
3µ02 = 4(µ01 )2 − ab and b = 2µ01 − a
.˙. 3µ02 = 4(µ01 )2 − a(2µ01 − a) ⇒ 3µ02 = 4(µ01 )2 − 2aµ01 + a2
2
a2 − 2aµ01 + 4µ01 − 3µ02 = 0
q
2µ01 ± 4µ01 2 − 4(4µ01 2 − 3µ02 )
a=
2
√ √ √
â(x) = m1 ± 3m2 . But 2µ1 = µ1 ± 3m2 + b ⇒ b̂(x) = m01√± 3m2 .
0 0 0
Thus the value of the moment estimators of a and b are â(x) = m01 − 3m2 and
√ P P 2
x
P 2
b̂(x) = m01 + 3m2 where m01 = nxi and m2 = n i − n
xi
179
A. Santhakumaran
180
Probability Models and their Parametric Estimation
Iterative method may be used to solve the equation for θ . Alternatively, expand
P fj2 j
f (θ) = j πj (θ) 1 − θ in a Taylor’s series as a function of θ upto first order
181
A. Santhakumaran
fj2
1 − x̄j
P
− j mj
θ − x̄ = fj2 j
+ (1 − x̄j )2 ]
P
j mj [ x̄2
fj2
− j] x̄1
P
− j mj [x̄
θ − x̄ = fj2
+ (x̄ − j)2 ] x̄12
P
j mj [j
P fj2
j mj [j − x̄]
Let θ1 = x̄ + x̄ P fj2
j mj [j + (j − x̄)2 ]
To improve the value of θ from x̄ , repeat the process until to get the convergent value
of θ .
Example 5.25 Show that for large sample size, maximizing the likelihood function
of the χ2 statistic is equal to minimizing the χ2 statistic.
Let oj be the observed frequency and ej be the theoretical frequency of the
P (o −e )2
j class. Then χ2 = j j ej j . For large fixed sample size n , the distribution of
th
182
Probability Models and their Parametric Estimation
n! e o1 e o2 e o r
1 2 r
L = ···
o1 !o2 ! · · · or ! n n n
such that o1 + o2 + · · · + or = n
o1 o2 or
n! e1 e2 er o1 o1 o o r
r
= ··· ···
o1 !o2 ! · · · or ! o1 o2 or n n
r
X ej
log L = constant + oj log
j=1
oj
1
For large fixed sample size, ej = oj + aj n1−δ , δ > 0, i.e., ej = oj + aj n 2 for δ = 12 ,
1 P P 1 P
where aj is finite and |aj n 2 | < and j oj = j ej = n so that n
2
j aj = 0
1
aj < 0(n− 2 ) .
P P
as n → ∞ and if n 6→ ∞ , then aj < 1 for every > 0 , i.e.,
n2
r 1
" #
X oj + aj n 2
log L = constant + oj log
j=1
oj
r
" 1
!#
X aj n 2
= constant + oj log 1 +
j=1
oj
" 1
!#
X aj n 2 a2j n 1
= constant + oj − 2 + ···
j
oj oj 2
X 1 1 X a2j n 1
= constant + aj n 2 − + 0(n− 2 )
j
2 j oj
1 X (ej − oj )2 1
= constant − + 0(n− 2 )
2 j oj
(ej −oj )2
If modified χ2 statistic is defined as χ2mod =
P
j oj , then
1
log L = constant − χ2mod as n → ∞.
2
183
A. Santhakumaran
3
1
3 i3
a3j n 2 a3j
h 1
3
= o(n− 2 =
P P
Since j o2j < for some n > N ⇒ j o2j < 3 = 1
n2 n2
1
1 4
1 a4j (n 2 )4 a4j 1 14
o(n− 2 ) and
P P
j o3j
< 1 for some n > N ⇒ j o3j < 1
4
< 1 =
(n 2 ) n2
1 1
(o(n− 2 ))4 = o(n− 2 ) where > 0 and 1 > 0.
1
χ2 − χ2mod = o(n− 2 ) = 0 as n → ∞
1
Thus log L = constant − χ2 as n → ∞
2
1
max log L = constant + {− max χ2 } as n → ∞
2
1
= constant + min χ2 as n → ∞
2
Maximizing the likelihood function of the χ2 statistic = Minimizing the χ2 statistic.
184
Probability Models and their Parametric Estimation
parameter to be estimated
y1 θ1 1
y2 θ2 2
Y = ···
θ =
··· =
···
yn n×1
θ m m×1
n n×1
X = coefficient matrix of the parameter θ
x11 x12 · · · x1m
x21 x22 · · · x2m
i.e., X =
··· ··· ··· ···
θ̂(x) = (X 0 X)−1 X 0 Y
= (X 0 X)−1 X 0 [Xθ + ]
= (X 0 X)−1 X 0 Xθ + (X 0 X)−1 X 0
Eθ [θ̂(X)] = θ + (X 0 X)−1 X 0 Eθ []
= θ since Eθ [] = 0
185
A. Santhakumaran
Linear Estimation
        ρ(X′) = ρ(X′, b)

        c′Xθ = b′θ ⇒ X′c = b                                              (5.11)

and     V[c′Y] = c′c σ²                                                   (5.12)

Minimize equation (5.12) subject to equation (5.11). Using the method of Lagrange multipliers, one determines the stationary points by considering

        L(λ) = c′c − 2λ′(X′c − b)

where λ is a vector of Lagrange multipliers.

        dL(λ)/dc = 2c′ − 2λ′X′
The stationary points of the function L(λ) are given by the equation

        dL(λ)/dc = 0
        ⇒ c′ − λ′X′ = 0
        ⇒ c′ = λ′X′,  i.e.,  c = Xλ

        X′Xλ = b                                                          (5.13)

Since b′θ is estimable, equation (5.11) is solvable, i.e.,

        ρ(X′) = ρ(X′, b) ↔ ρ(X′X) = ρ(X′X, b).

Thus equation (5.13) is solvable. Let c⁽¹⁾ and c⁽²⁾ be two solutions corresponding to λ⁽¹⁾ and λ⁽²⁾ of equation (5.13).

        c⁽¹⁾ = Xλ⁽¹⁾
        c⁽²⁾ = Xλ⁽²⁾
        X′Xλ⁽¹⁾ = b
        X′Xλ⁽²⁾ = b
        X′X(λ⁽¹⁾ − λ⁽²⁾) = 0
        c⁽¹⁾ − c⁽²⁾ = X(λ⁽¹⁾ − λ⁽²⁾)
        (c⁽¹⁾ − c⁽²⁾)′(c⁽¹⁾ − c⁽²⁾) = (λ⁽¹⁾ − λ⁽²⁾)′X′X(λ⁽¹⁾ − λ⁽²⁾)
        ⇒ (c⁽¹⁾ − c⁽²⁾)′(c⁽¹⁾ − c⁽²⁾) = 0
        ⇒ c⁽¹⁾ = c⁽²⁾

Thus, whatever the solution λ of equation (5.13), the value of c is the same. Hence b′θ possesses a unique minimum variance unbiased estimator.
Suppose that ρ(X) = r and the first r columns of X are linearly independent. Let X = [X₁ X₂] and b = (b₁′, b₂′)′. Now a solution of equation (5.13) is λ = ((X₁′X₁)⁻¹b₁, 0)′.

        .˙. c = Xλ = X₁(X₁′X₁)⁻¹b₁

        c′c = b₁′(X₁′X₁)⁻¹X₁′X₁(X₁′X₁)⁻¹b₁ = b₁′(X₁′X₁)⁻¹b₁

For every c satisfying X′c = b,

        c′c = c′[I − X₁(X₁′X₁)⁻¹X₁′]c + c′X₁(X₁′X₁)⁻¹X₁′c
            = c′[I − X₁(X₁′X₁)⁻¹X₁′]c + b₁′(X₁′X₁)⁻¹b₁

Since [I − X₁(X₁′X₁)⁻¹X₁′] is an idempotent matrix,

        c′c = c′[I − X₁(X₁′X₁)⁻¹X₁′][I − X₁(X₁′X₁)⁻¹X₁′]c + b₁′(X₁′X₁)⁻¹b₁
            ≥ b₁′(X₁′X₁)⁻¹b₁
This indicates that the minimum is actually attained. The LSE θ̂(X) of θ is obtained by minimizing (Y − Xθ̂)′(Y − Xθ̂). The normal equation is

        X′Xθ = X′Y ⇒ c′Y = λ′X′Y = λ′X′Xθ̂ = b′θ̂(X)   since b′ = λ′X′X.

Thus c′Y = b′θ̂(X) is the best linear unbiased estimator of b′θ.

Since I − X₁(X₁′X₁)⁻¹X₁′ is a projection matrix, it is an idempotent matrix. Further, it is a well known property that for an idempotent matrix A, ρ(A) = Tr(A)

        ⇒ ρ(I − X₁(X₁′X₁)⁻¹X₁′) = Tr(I − X₁(X₁′X₁)⁻¹X₁′) = n − r
        [1 0 1][θ₁]   [l₁]
        [0 1 0][θ₂] = [l₂]
        [0 1 0][θ₃]   [l₃ − l₁]

Subtracting the second row from the third,

        [1 0 1][θ₁]   [l₁]
        [0 1 0][θ₂] = [l₂]
        [0 0 0][θ₃]   [l₃ − l₁ − l₂]

Thus ρ(X′) = ρ(X′, l) if l₃ − l₁ − l₂ = 0, i.e., l₃ = l₁ + l₂.
Example 5.27 The feed intake of a cow with weight X₁ and yield of milk X₂ may follow the linear model Y = a + b₁X₁ + b₂X₂ + ε, where ε is called the random error or random residual. Let yᵢ, xᵢ₁ and xᵢ₂ be the values of Y, X₁ and X₂ for cow i = 1, 2, 3, 4, 5. The following observations are made on 5 cows (a numerical sketch of the least squares fit follows the table):

        i    Y    X₁    X₂
        1   62     2     6
        2   60     9    10
        3   57     6     4
        4   48     3    13
        5   23     5     2
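As an illustrative numerical sketch (my code, not the author's worked solution), the normal equations X′Xθ = X′Y for these five observations can be solved by Gaussian elimination:

#include <stdio.h>

/* Solve the normal equations X'X theta = X'Y for the model
   Y = a + b1*X1 + b2*X2 using the five observations of Example 5.27 */
int main(void)
{
    double y[5]  = {62, 60, 57, 48, 23};
    double x1[5] = {2, 9, 6, 3, 5};
    double x2[5] = {6, 10, 4, 13, 2};
    double A[3][4] = {{0}};                 /* augmented matrix [X'X | X'Y] */

    for (int i = 0; i < 5; i++) {
        double row[3] = {1.0, x1[i], x2[i]};    /* i-th row of X */
        for (int r = 0; r < 3; r++) {
            for (int c = 0; c < 3; c++) A[r][c] += row[r] * row[c];
            A[r][3] += row[r] * y[i];
        }
    }
    /* Gaussian elimination followed by back substitution */
    for (int k = 0; k < 3; k++)
        for (int r = k + 1; r < 3; r++) {
            double f = A[r][k] / A[k][k];
            for (int c = k; c < 4; c++) A[r][c] -= f * A[k][c];
        }
    double t[3];
    for (int k = 2; k >= 0; k--) {
        t[k] = A[k][3];
        for (int c = k + 1; c < 3; c++) t[k] -= A[k][c] * t[c];
        t[k] /= A[k][k];
    }
    printf("a = %f, b1 = %f, b2 = %f\n", t[0], t[1], t[2]);
    return 0;
}

Plain Gaussian elimination is adequate here because X′X is only 3 × 3 and positive definite for this data.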
5.9 Derive the formula to calculate the MLE of θ, using a random sample from the distribution with P_θ{X = x} = aₓθˣ/g(θ), x = 1, 2, ⋯, where g(θ) = Σ aₓθˣ. Also obtain the explicit expression for the case of the truncated Poisson distribution with x = 1, 2, 3, ⋯.
5.10 Show that MLE of θ based on n independent observations from a uniform
distribution in (0, θ) is consistent.
5.11 Find the MLE of θ given the observations .8 and .3 on a random variable with pdf

        p_θ(x) = 2x/θ              if 0 < x < θ
                 2(1 − x)/(1 − θ)  if θ ≤ x < 1,  0 < θ < 1
5.26 Show that MVBE’s exist for the exponential family of densities.
5.27 Find MLE of β in Gamma(1, β ) based on a sample of size n where the actual
observations are not available but it is known that k of the observations are less
than or equal to a fixed positive number M .
5.28 Obtain the BLUE of θ for the normal distribution with mean θ and variance
σ 2 based on n observations x1 , x2 , · · · , xn .
5.29 Obtain the MLE for the coefficient of variation from a population with N (θ, σ 2 )
based on n observations.
5.30 Obtain the MLE of θ for the pdf

        p_θ(x) = (1 + θ)x^θ   0 < x < 1 and θ > 0
                 0             otherwise
5.32 Show that maximum likelihood estimation and minimum χ² estimation give the same results as n → ∞.
5.33 Find the MLE of N of

        p_N(x) = 1/N   if x = 1, 2, ⋯, N,  N ∈ I₊
                 0      otherwise
5.36 The maximum likelihood estimator of σ² in a normal population with mean zero is
        (a) (1/n) Σ(xᵢ − x̄)²
        (b) (1/(n − 1)) Σ(xᵢ − x̄)²
        (c) (1/n) Σxᵢ²
        (d) (1/(n − 1)) Σxᵢ²
5.37 Consider the following statements:
The maximum likelihood estimators
1. are consistent
2. have invariance property
3. can be made unbiased using an adjustment factor even if they are biased.
Of these statements:
(a) 1 and 3 are correct
(b) 1 and 2 are correct
(c) 2 and 3 are correct
(d) 1, 2 and 3 are correct
5.38 Which of the following statements are not correct?
1. From the Cramer - Rao inequality one can always find the lower bound of the
variance of an unbiased estimator.
2. If a sufficient statistic exists, then the maximum likelihood estimator is itself a sufficient statistic.
3. UMVUE and MVBE's are the same.
4. MLE's may not be unique.
Select the correct answer given below:
(a) 1 and 3 (b) 1 and 2 (c) 1 and 4 (d) 2 and 3
5.39 Which one of the following is not necessary for the UMVU estimation of θ by
T = t(X) ?
(a) E[T − θ] = 0
(b) E[T − θ]2 < ∞
(c) E[T − θ]2 is minimum
(d) T = t(X) is a linear function of observations
5.40 Consider the following statements:
If X1 , X2 , · · · , Xn are iid random variables with uniform distribution over
(0, θ) , then
1. 2X̄ is an unbiased estimator of θ .
2. The largest among X₁, X₂, ⋯, Xₙ is an unbiased estimator of θ
3. The largest among X₁, X₂, ⋯, Xₙ is sufficient for θ
4. ((n + 1)/n) X₍ₙ₎ is a minimum variance unbiased estimator of θ
Of these statements:
(a) 1 alone is correct
(b) 1 and 2 are correct
(c) 1, 3 and 4 are correct
(d) 1 and 4 are correct
5.41 LSE and MLE are the same if the sample comes from a population that is:
        (a) Normal  (b) Binomial  (c) Cauchy  (d) Exponential
5.42 LSE of the parameters of a linear model are
(a) unbiased (b) BLUE (c) UMVU (d) all the above
6. INTERVAL ESTIMATION
6.1 Introduction
Let X be a random sample drawn from a population with pdf p_θ(x), θ ∈ Ω. For every distinct value of θ, θ ∈ Ω, there corresponds one member of the family of distributions. Thus one has a family of pdf's {p_θ(x), θ ∈ Ω}. The experimenter needs to select a point estimate of θ, θ ∈ Ω. Even though the estimator may have some valid statistical properties, it may not reflect the true value of the parameter, due to the randomness of the observations. One may therefore search for an alternative that measures the closeness of the estimates to the unknown parameters with certain probability values; as such an alternative, one may go for interval estimation with a certain level of significance. This chapter deals with interval estimation.
Family of random sets

Let P_θ, θ ∈ Ω ⊆ ℜᵏ, be the set of probability distributions of the random variable X. A family of subsets S(X) of Ω that depends on the observations x of X but not on θ is called a family of random sets.
The end points X and (a/b)X of the interval are functions of X. Hence I(X) = (X, (a/b)X) is a random interval; I(X) takes the value (x, (a/b)x) when X takes the value x.

Example 6.2 Find the confidence coefficient of the confidence interval (1/(19X), 19/X) for θ based on a single observation x of a random variable X with pdf

        p_θ(x) = θ/(1 + θx)²   0 < x < ∞, θ > 0
                 0              otherwise

The confidence coefficient of the interval (1/(19X), 19/X) for θ is

        P_θ{1/(19X) < θ < 19/X} = P_θ{1/(19θ) < X < 19/θ}

                = ∫ from 1/(19θ) to 19/θ of θ/(1 + θx)² dx

                = [−1/(1 + θx)] evaluated from 1/(19θ) to 19/θ

                = −1/20 + 19/20 = 18/20 = .90
Example 6.3 Compute the confidence coefficient of the interval (X/(1 + X), 2X/(1 + 2X)) for θ/(1 + θ), where X has the pdf

        p_θ(x) = 1/θ   0 < x < θ, θ > 0
                 0      otherwise

The confidence coefficient of the interval (X/(1 + X), 2X/(1 + 2X)) for θ/(1 + θ) is

        P_θ{X/(1 + X) < θ/(1 + θ) < 2X/(1 + 2X)}
                = P_θ{(1 + 2X)/(2X) < (1 + θ)/θ < (1 + X)/X}
                = P_θ{1/(2X) + 1 < 1/θ + 1 < 1/X + 1}
                = P_θ{1/(2X) < 1/θ < 1/X}
                = P_θ{X < θ < 2X}
                = P_θ{1 < θ/X < 2}
                = P_θ{1 > X/θ > 1/2}
                = P_θ{θ/2 < X < θ}
                = ∫ from θ/2 to θ of (1/θ) dx = (1/θ)(θ − θ/2) = .5
Example 6.4 Let T = t(X) be the maximum of two independent observations drawn from a population with uniform distribution over the interval (0, θ). Compute the confidence coefficient of the interval (0, 2T).

Let T = max{X₁, X₂}. The pdf of T is

        p_θ(t) = 2t/θ²   0 < t < θ
                 0        otherwise

The confidence coefficient of the interval (0, 2T) is

        P_θ{0 < θ < 2T} = P_θ{0 < θ/T < 2}
                        = P_θ{θ/2 < T < ∞}
                        = P_θ{θ/2 < T < θ}
                        = ∫ from θ/2 to θ of (2t/θ²) dt
                        = [t²/θ²] from θ/2 to θ = 1 − 1/4 = .75
        p(t) = ne^{−nt}   0 < t < ∞
               0           otherwise

This equation has infinitely many solutions. If one chooses λ₁ = 0, then 1 − e^{−nλ₂} = 1 − α, i.e., e^{−nλ₂} = α ⇒ −nλ₂ = log α. Thus λ₂ = (1/n) log(1/α).

.˙. The (1 − α) level confidence interval for θ is given by

        P_θ{0 < T < (1/n) log(1/α)} = 1 − α
        P_θ{0 < Y₁ − θ < (1/n) log(1/α)} = 1 − α
        P_θ{Y₁ − (1/n) log(1/α) < θ < Y₁} = 1 − α
Example 6.7 Given a sample of size n from U(0, θ), show that the confidence interval for θ based on the sample range R with confidence coefficient (1 − α) and of the form (R, R/c) has c given as a root of the equation

        c^{n−1}[n − (n − 1)c] = α.
Given p_θ(x) = 1/θ, 0 < x < θ, and p_θ(x + R) = 1/θ, 0 < x + R < θ, i.e., 0 < x < θ − R. The pdf of the sample range R is

        p_θ(R) = n(n − 1) ∫₀^{θ−R} (1/θ)(1/θ) [∫ from x to R+x of (1/θ) du]^{n−2} dx

               = n(n − 1) ∫₀^{θ−R} (1/θ²) (R^{n−2}/θ^{n−2}) dx

               = [n(n − 1)/θⁿ] R^{n−2}(θ − R)

               = [n(n − 1)/θ] (R/θ)^{n−2}(1 − R/θ),   0 < R < θ

If y = R/θ, then

        p(y) = n(n − 1)y^{n−2}(1 − y)   0 < y < 1
               0                         otherwise

This equation has infinitely many solutions. If one chooses λ₁ = c and λ₂ = 1, then

        P{c < Y < 1} = ∫ from c to 1 of n(n − 1)y^{n−2}(1 − y) dy = 1 − c^{n−1}[n − (n − 1)c] = 1 − α,

so that c is a root of c^{n−1}[n − (n − 1)c] = α, and the (1 − α) level confidence interval for θ is

        P_θ{c < R/θ < 1} = P_θ{R < θ < R/c} = 1 − α,   i.e., (R, R/c).
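For a given n and α the root c can be found numerically; a minimal sketch (illustrative code, not from the book; n and α are assumed values) uses bisection, which is valid because c^{n−1}[n − (n − 1)c] is increasing on (0, 1):

#include <stdio.h>
#include <math.h>

/* Root of g(c) = c^(n-1)*(n - (n-1)c) - alpha on (0,1):
   g(0+) = -alpha < 0, g(1) = 1 - alpha > 0, and g is increasing there */
int main(void)
{
    int n = 10;
    double alpha = 0.05, lo = 0.0, hi = 1.0;
    for (int k = 0; k < 60; k++) {
        double c = 0.5 * (lo + hi);
        double g = pow(c, n - 1) * (n - (n - 1) * c) - alpha;
        if (g < 0) lo = c; else hi = c;
    }
    printf("c = %f, interval (R, R/c)\n", 0.5 * (lo + hi));
    return 0;
}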
If θ̂(X) is an estimate of θ (not necessarily unbiased) with finite variance, then by Chebychev's Inequality

        P_θ{|θ̂(X) − θ| < ε√(E_θ[θ̂(X) − θ]²)} > 1 − 1/ε²

⇒ (θ̂(x) − ε√(E_θ[θ̂(X) − θ]²), θ̂(x) + ε√(E_θ[θ̂(X) − θ]²)) is a (1 − 1/ε²) level confidence interval for θ.
Example 6.8 Let X₁, X₂, ⋯, Xₙ be iid b(1, θ) random variables. Obtain a (1 − α) level confidence interval for θ by using Chebychev's Inequality.

Σᵢ₌₁ⁿ Xᵢ ∼ b(n, θ) since each Xᵢ ∼ b(1, θ). E_θ[X̄] = θ and V_θ[X̄] = V_θ[X]/n = θ(1 − θ)/n. Now

        P_θ{|X̄ − θ| < ε√(θ(1 − θ)/n)} > 1 − 1/ε²

Since θ(1 − θ) ≤ 1/4,

        P_θ{|X̄ − θ| < ε/(2√n)} > 1 − 1/ε²

        P_θ{X̄ − ε/(2√n) < θ < X̄ + ε/(2√n)} > 1 − 1/ε²

If n is kept constant, then one can choose 1 − 1/ε² = 1 − α ⇒ ε² = 1/α ⇒ ε = 1/√α. Thus the (1 − α) level confidence interval for θ is

        (x̄ − 1/(2√(nα)), x̄ + 1/(2√(nα)))
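As a numerical illustration (the values of x̄, n and α below are assumed, not from the text), a small C sketch computes the interval:

#include <stdio.h>
#include <math.h>

/* Chebychev (1 - alpha) level interval for theta:
   x_bar +/- 1/(2*sqrt(n*alpha)) */
int main(void)
{
    double xbar = 0.4, alpha = 0.05;   /* assumed values */
    int n = 100;
    double h = 1.0 / (2.0 * sqrt(n * alpha));
    printf("(%f, %f)\n", xbar - h, xbar + h);
    return 0;
}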
The two functions n₁(θ) and n₂(θ) are monotonic, non-decreasing and discontinuous step functions such that the (1 − α) level confidence interval for θ satisfies

        P_θ{n₂⁻¹(X) < θ < n₁⁻¹(X)} ≥ 1 − α

where   P_θ{X ≤ n₁(θ)} ≤ α/2

i.e.,   Σ from i = 0 to n₁(θ) of (n choose i) θⁱ(1 − θ)^{n−i} ≤ α/2        (6.1)

This gives the upper confidence limit for θ. Similarly, the lower confidence limit for θ is given by

        Σ from i = x to n of (n choose i) θⁱ(1 − θ)^{n−i} = α/2            (6.2)

Solving the equations (6.1) and (6.2) for θ (when n and α are known) gives the (1 − α) level confidence interval for θ, i.e., (θ(X), θ̄(X)) is the (1 − α) level confidence interval, where θ̄(x) is the solution of equation (6.1) and θ(x) is the solution of equation (6.2).
Example 6.10 Assume there is a constant probability θ that a person entering a supermarket will make a purchase, so that the customers constitute a random sample of a Bernoulli random variable (success = purchase made, failure = no purchase made). If 10 persons were selected at random and it was found that 4 made a purchase, obtain a 90% confidence interval for θ.

The 90% confidence limits for θ are given by

        Σ from i = 0 to 4 of (10 choose i) θⁱ(1 − θ)^{10−i} = .05

        Σ from i = 4 to 10 of (10 choose i) θⁱ(1 − θ)^{10−i} = .05

Solving these equations for θ, one gets θ̄(x) = .696 and θ(x) = .150. Thus, if a random sample of 10 independent Bernoulli random variables gives x = 4 successes, the 90% confidence interval for θ is (.150, .696). (A numerical sketch for solving such equations follows.)
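The two tail equations are monotone in θ, so each can be solved by bisection. A minimal sketch (my illustration, not the book's method of solution) reproduces the limits above:

#include <stdio.h>
#include <math.h>

static double choose(int n, int k)
{
    double c = 1.0;
    for (int i = 1; i <= k; i++) c = c * (n - k + i) / i;
    return c;
}

/* P(X <= k) for X ~ b(n, theta) */
static double lower_tail(int n, int k, double th)
{
    double s = 0.0;
    for (int i = 0; i <= k; i++)
        s += choose(n, i) * pow(th, i) * pow(1.0 - th, n - i);
    return s;
}

int main(void)
{
    int n = 10, x = 4;
    double a2 = 0.05, lo, hi;

    /* upper limit: P(X <= x | theta) = alpha/2; decreasing in theta */
    lo = 0.0; hi = 1.0;
    for (int k = 0; k < 60; k++) {
        double m = 0.5 * (lo + hi);
        if (lower_tail(n, x, m) > a2) lo = m; else hi = m;
    }
    double upper = 0.5 * (lo + hi);

    /* lower limit: P(X >= x | theta) = alpha/2; increasing in theta */
    lo = 0.0; hi = 1.0;
    for (int k = 0; k < 60; k++) {
        double m = 0.5 * (lo + hi);
        if (1.0 - lower_tail(n, x - 1, m) < a2) lo = m; else hi = m;
    }
    double lower = 0.5 * (lo + hi);

    printf("90%% interval: (%.3f, %.3f)\n", lower, upper);
    return 0;
}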
Example 6.11 Let X₁, X₂, ⋯, Xₙ be a random sample from a Poisson random variable X with parameter θ. Obtain a (1 − α) level confidence interval for θ.

Let Y = Σᵢ₌₁ⁿ Xᵢ. Given that each Xᵢ follows P(θ), Y ∼ P(nθ). The exact (1 − α) level confidence interval for θ satisfies

        P_θ{λ₁(θ) < Y < λ₂(θ)} = 1 − α

i.e.,   P_θ{Y ≥ λ₂(θ)} ≤ α/2

        ⇒ Σ from x = y to ∞ of e^{−nθ}(nθ)ˣ/x! = α/2                      (6.3)

and     P_θ{Y ≤ λ₁(θ)} ≤ α/2

        ⇒ Σ from x = 0 to y of e^{−nθ}(nθ)ˣ/x! = α/2                      (6.4)

        P_θ{λ₂⁻¹(Y) < θ < λ₁⁻¹(Y)} = 1 − α
Case (ii) When σ² is unknown and the sample size n ≤ 30, the statistic

        t = (X̄ − θ)/(S/√n) ∼ t distribution with n − 1 d.f.

where S² = (1/(n − 1)) Σᵢ₌₁ⁿ [Xᵢ − X̄]². In this case
If n > 30, then t = (X̄ − θ)/(S/√n) ∼ N(0, 1). In such a case the (1 − α) confidence interval is

        (X̄ − z_{α/2} S/√n, X̄ + z_{α/2} S/√n)

where α/2 = ∫ from z_{α/2} to ∞ of φ(z) dz.
Example 6.14 A random sample of size 50 taken from N(θ, σ = 5) has mean 40. Obtain a 95% confidence interval for 2θ + 3.

Given the sample mean x̄ = 40 and population standard deviation σ = 5. The 95% confidence interval for θ satisfies

        P{X̄ − 1.96σ/√n < θ < X̄ + 1.96σ/√n} = .95

        P{2(X̄ − 1.96σ/√n) < 2θ < 2(X̄ + 1.96σ/√n)} = .95

        P{2(X̄ − 1.96σ/√n) + 3 < 2θ + 3 < 2(X̄ + 1.96σ/√n) + 3} = .95

With x̄ = 40, σ = 5 and n = 50, 1.96σ/√n = 1.39, so the 95% confidence interval for θ is (38.61, 41.39) and that for 2θ + 3 is (80.23, 85.77).
For every T_θ, λ₁(α) and λ₂(α) can be chosen in a number of ways. However, one would like to choose λ₁(α) and λ₂(α) such that θ̄(X) − θ(X) is minimum, which gives the (1 − α) level shortest confidence interval based on T_θ.

Let T_θ = t(X, θ) be a function of the sufficient statistic. A random variable T_θ which is a function of (X₁, X₂, ⋯, Xₙ) and θ and whose distribution is independent of θ is called a pivot.
Example 6.15 Let X₁, X₂, ⋯, Xₙ be a random sample from N(θ, σ²), where σ² is known. Obtain the (1 − α) level shortest confidence interval for θ.

Consider the statistic T_θ = (X̄ − θ)/(σ/√n), which is a pivot, since X̄ is sufficient and T_θ ∼ N(0, 1), i.e., the distribution of T_θ is independent of θ.
The (1 − α) level shortest interval requires a = −b with ∫ from −b to b of φ(x) dx = 1 − α. Thus the shortest length confidence interval based on T_θ is the equal two tails confidence interval. The (1 − α) level confidence interval for θ is

        (X̄ − z_{α/2} σ/√n, X̄ + z_{α/2} σ/√n)

where z_{α/2} is the upper ordinate corresponding to the area α/2. The shortest length of this interval is L = 2z_{α/2} σ/√n.
Example 6.16 Let X₁, X₂, ⋯, Xₙ be a sample from U(0, θ). Find the (1 − α) level shortest confidence interval for θ.

Let T = max_{1≤i≤n}{Xᵢ}. The pdf of T is

        p(t | θ) = n t^{n−1}/θⁿ   0 < t < θ
                   0               otherwise

The pdf of Y = T/θ is given by

        p(y) = n y^{n−1}   0 < y < 1
               0            otherwise

The statistic Y = T/θ is a pivot. The (1 − α) level confidence interval for θ is

        P{a < Y < b} = 1 − α
        P{a < T/θ < b} = 1 − α
        P{T/b < θ < T/a} = 1 − α

The length of the interval is L = (1/a − 1/b)T. To find the shortest confidence interval, minimize L subject to

        ∫ from a to b of n y^{n−1} dy = 1 − α
The length of the interval is L = (1/a − 1/b) Σ(Xᵢ − θ)². Then

        dL/da = [−1/a² + (1/b²)(db/da)] Σ(Xᵢ − θ)²

and differentiating the constraint ∫ from a to b of pₙ(χ²) dχ² = 1 − α with respect to a gives

        pₙ(b)(db/da) − pₙ(a) = 0,   i.e.,   db/da = pₙ(a)/pₙ(b)

where pₙ(·) denotes the χ² density with n d.f. Hence

        dL/da = [−1/a² + (1/b²) pₙ(a)/pₙ(b)] Σ(Xᵢ − θ)²

For a minimum, dL/da = 0

        ⇒ 1/a² = (1/b²) pₙ(a)/pₙ(b)
        ⇒ b² pₙ(b) = a² pₙ(a)

An iterative method may be used to solve the equation b² pₙ(b) = a² pₙ(a) for a and b together with ∫ from a to b of pₙ(χ²) dχ² = 1 − α, where a < b and a ≠ b. If â and b̂ are the solutions of these equations, then the shortest confidence interval for σ² is

        (Σᵢ₌₁ⁿ (Xᵢ − θ)²/b̂,  Σᵢ₌₁ⁿ (Xᵢ − θ)²/â)
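These coupled equations have no closed form. A minimal numerical sketch (my illustration, not the author's program; the degrees of freedom, bracketing values and step counts are assumptions) solves them by nested bisection, reusing Simpson's rule in the spirit of the Appendix programs:

#include <stdio.h>
#include <math.h>

static int df = 10;                 /* degrees of freedom n (assumed) */

/* chi-square density with df degrees of freedom */
static double pdf(double x)
{
    return pow(x, df / 2.0 - 1.0) * exp(-x / 2.0)
           / (pow(2.0, df / 2.0) * tgamma(df / 2.0));
}

/* Simpson's rule for the area from a to b */
static double area(double a, double b)
{
    int m = 1000;
    double h = (b - a) / m, s = pdf(a) + pdf(b);
    for (int i = 1; i < m; i++)
        s += (i % 2 ? 4.0 : 2.0) * pdf(a + i * h);
    return h * s / 3.0;
}

/* b with area(a, b) = 1 - alpha, by bisection */
static double solve_b(double a, double alpha)
{
    double lo = a, hi = 200.0;
    for (int k = 0; k < 50; k++) {
        double mid = 0.5 * (lo + hi);
        if (area(a, mid) < 1.0 - alpha) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}

int main(void)
{
    double alpha = 0.05, lo = 0.01, hi = 5.0;   /* bracket for a */
    for (int k = 0; k < 40; k++) {              /* bisection on a^2 p(a) - b^2 p(b) */
        double a = 0.5 * (lo + hi), b = solve_b(a, alpha);
        if (a * a * pdf(a) < b * b * pdf(b)) lo = a; else hi = a;
    }
    double a = 0.5 * (lo + hi), b = solve_b(a, alpha);
    printf("a = %f, b = %f\n", a, b);
    return 0;
}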
Case (ii) If θ is unknown, then

        T_{σ²} = Σᵢ₌₁ⁿ (Xᵢ − X̄)²/σ² = (n − 1)S²/σ² ∼ χ² with (n − 1) d.f.

where S² = (1/(n − 1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)². In this case one solves the same pair of equations with the χ² density on (n − 1) d.f., using the pivot

        T_{σ²} = (n − 1)S²/σ²

The shortest confidence interval for σ² is

        ((n − 1)S²/b̂, (n − 1)S²/â)
Example 6.19 Let X and Y be two independent random variables that are N(θ, σ₁²) and N(θ, σ₂²) respectively. Obtain a (1 − α) level confidence interval for the ratio σ₂²/σ₁² < 1 by considering a random sample X₁, X₂, ⋯, X_{n₁} of size n₁ ≥ 2 from the distribution of X and a random sample Y₁, Y₂, ⋯, Y_{n₂} of size n₂ ≥ 2 from the distribution of Y.

Let s₁² = (1/n₁) Σᵢ₌₁^{n₁} (Xᵢ − X̄)² and s₂² = (1/n₂) Σᵢ₌₁^{n₂} (Yᵢ − Ȳ)² be the variances of the two samples. The independent random variables n₁s₁²/σ₁² and n₂s₂²/σ₂² have χ² distributions with n₁ − 1 and n₂ − 1 degrees of freedom respectively. By the definition of the F statistic,

        F = [n₁s₁²/(σ₁²(n₁ − 1))] / [n₂s₂²/(σ₂²(n₂ − 1))] ∼ F distribution with n₁ − 1 and n₂ − 1 degrees of freedom.

The (1 − α) level confidence interval for σ₂²/σ₁² is obtained from

        P{a < [n₁s₁²/(σ₁²(n₁ − 1))] / [n₂s₂²/(σ₂²(n₂ − 1))] < b} = 1 − α

        P{a [n₂s₂²/(n₂ − 1)] / [n₁s₁²/(n₁ − 1)] < σ₂²/σ₁² < b [n₂s₂²/(n₂ − 1)] / [n₁s₁²/(n₁ − 1)]} = 1 − α

Thus the (1 − α) level confidence interval for σ₂²/σ₁² is

        (a (n₁ − 1)n₂s₂²/((n₂ − 1)n₁s₁²),  b (n₁ − 1)n₂s₂²/((n₂ − 1)n₁s₁²))
Let T = Σᵢ₌₁ⁿ Xᵢ; then T ∼ G(n, 1/θ). Its pdf is

        p_θ(t) = (θⁿ/Γn) e^{−θt} t^{n−1}   0 < t < ∞
                 0                           otherwise

The pdf of Y = 2θT is

        p(y) = (1/(2ⁿ Γn)) e^{−y/2} y^{n−1}   0 < y < ∞
               0                                otherwise

That is, Y = 2θ ΣXᵢ follows a χ² distribution with 2n degrees of freedom. The (1 − α) level confidence interval for θ is

        P_θ{a < 2θ ΣXᵢ < b} = 1 − α

        P_θ{a/(2ΣXᵢ) < θ < b/(2ΣXᵢ)} = 1 − α

where a is given by ∫ from 0 to a of p₂ₙ(χ²) dχ² = α/2 and b is given by ∫ from b to ∞ of p₂ₙ(χ²) dχ² = α/2.
Example 6.21 The time to failure of an electronic component is assumed to have an Exponential distribution with unknown parameter θ, i.e.,

        p(x | θ) = θe^{−θx}   x > 0, θ > 0
                   0            otherwise

10 electronic components are placed on test and their observed times to failure are 607.5, 1947.0, 37.6, 129.9, 409.5, 529.5, 109.0, 582.4, 499.0, 188.1 hours respectively. Find the 90% confidence interval for θ and the 90% confidence interval for the mean time to failure. Also obtain the 90% confidence interval for the probability that the component survives a 100 hour period.

As in the preceding example, Σxᵢ = 5039.5 and 2n = 20 degrees of freedom. From the χ² table, χ²_{.05} = 10.9 and χ²_{.95} = 31.4. The 90% confidence interval for θ is

        (10.9/(2 × 5039.5), 31.4/(2 × 5039.5)) = (.00108, .00312)
The mean time to failure is 1/θ. The 90% confidence interval for the mean time to failure lies between 1/.00312 = 320.5 hours and 1/.00108 = 925.9 hours.

The probability that one of these components will work at least t hours without failure is P{X > t} = e^{−θt}. The 90% confidence interval for the probability that the component survives a 100 hour period lies between e^{−100 × .00312} = .732 and e^{−100 × .00108} = .898.
Example 6.22 Explain a method of construction of a large sample confidence interval for θ in Poisson(θ).

For large samples the variable

        Z = (∂ log L/∂θ) / √(V[∂ log L/∂θ]) ∼ N(0, 1)

Hence from the distribution of Z one can easily construct the confidence limits for θ for large samples. We have

        log L(θ) = Σxᵢ log θ − nθ − Σ log(xᵢ!)

        ∂ log L(θ)/∂θ = nx̄/θ − n

        V[∂ log L(θ)/∂θ] = V[nX̄/θ − n]
                         = (1/θ²) V[Σᵢ₌₁ⁿ Xᵢ]
                         = (1/θ²) Σᵢ₌₁ⁿ V[X]
                         = (1/θ²) nθ
                         = n/θ

Thus    Z = (nx̄/θ − n)/√(n/θ)

The 95% large sample confidence interval for θ satisfies

        P{−1.96 < Z < 1.96} = .95

        P{−1.96 < √(n/θ)(X̄ − θ) < 1.96} = .95

Hence the 95% confidence limits for θ are the roots of

        √(n/θ)(x̄ − θ) = ±1.96

        θ² − (2x̄ + 3.84/n)θ + x̄² = 0

        θ = x̄ + 1.92/n ± √(3.84x̄/n + 3.69/n²)
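A one-line numerical check (the values of x̄ and n are assumed for illustration):

#include <stdio.h>
#include <math.h>

/* Large-sample 95% confidence limits for a Poisson mean:
   theta = xbar + 1.92/n +/- sqrt(3.84*xbar/n + 3.69/n^2) */
int main(void)
{
    double xbar = 3.2;                  /* assumed sample mean */
    int n = 50;                         /* assumed sample size */
    double mid = xbar + 1.92 / n;
    double half = sqrt(3.84 * xbar / n + 3.69 / ((double)n * n));
    printf("(%f, %f)\n", mid - half, mid + half);
    return 0;
}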
Problems
6.7 Obtain (1 − α) level shortest confidence interval for θ using a random sample
from N (θ, 1) .
6.8 Given X1 , X2 , · · · , Xn is a random sample from N (θ, σ 2 ) , where σ 2 is known
. Find (1 − α) level upper confidence bound for θ .
6.9 Obtain a confidence interval for the range of a rectangular distribution based on a random sample of size n.
6.10 The numbers of houses sold per week for 15 weeks by the Dinesh real estate firm were 3, 3, 4, 6, 2, 4, 4, 3, 1, 2, 0, 5, 7, 1, 4 respectively. Assuming these are the observed values of a random sample of size 15 of a Poisson random variable with parameter θ, compute 95% confidence limits for θ. Ans. (2.36, 4.18)
6.11 Show that in large samples, the 95% level confidence limits for the mean of a Poisson distribution are given by

        X̄ + 1.92/n ± √(3.84X̄/n)

6.12 For a random sample of size n from the distribution with pdf

        p(x | θ) = θe^{−θx}   x > 0, θ > 0
                   0            otherwise

show that the 95% level confidence limits for large samples are given by

        θ = (1 ± 1.96/√n)/X̄
6.13 Obtain the large sample confidence interval with confidence coefficient (1 − α)
for the parameter of Bernoulli distribution.
6.14 Examine the connection between shortest confidence interval and sufficient
statistics.
6.15 A 90% confidence interval for θ based on a single observation X from the density function

        p(x | θ) = 1/θ   0 < x < θ, θ > 0
                   0      otherwise

is
        (a) (X, 10X)  (b) (20X/19, 20X)  (c) (50X/49, 12.5X)  (d) All the above
6.16 The correct interpretation regarding the confidence interval (T1 , T2 ) of the pa-
rameter θ for a distribution F (x | θ), θ ∈ < with confidence coefficient 1 − α
is
(a) θ belongs to (T1 , T2 ) with probability 1 − α
(b) (T1 , T2 ) covers the parameter θ with probability 1 − α
(c) (T1 , T2 ) includes the parameter θ with confidence coefficient 1 − α
(d) θ0 belongs to (T1 , T2 ) with confidence α where θ(6= θ0 ) is the true value.
6.17 If a random sample of n = 100 voters in a community produced 59 votes in favour of candidate A, then the 95% confidence interval of the fraction p of the voting population favouring A is
        (a) 59 ± 1.96 √(59 × 41/100)
        (b) .59 ± 1.96 √(.59 × .41/100)
        (c) 59 ± 2.58 √(.59 × .41/100)
        (d) 59 ± 2.58 √(59 × 41/100)
7. BAYES ESTIMATION

7.1 Introduction

Bayes estimation treats the parameter θ of a statistical distribution as the realization of a random variable on Ω with known distribution, rather than as an unknown constant. So far the specification of distributions has assumed only the shape of the distribution to be known, but not the value of the parameters. Bayes estimation uses the prior information on the distribution to completely specify the realization of the distributions. This is the major difference in Bayes estimation, and it may be quite reasonable if the past experience is sufficiently extensive and relevant to the problem. The choice of the prior distribution is made, like that of the distribution P_θ, by combining experience with convenience.

A number of observations are available from the distribution P_θ, θ ∈ Ω, of a random variable X, and they may be used to check the assumption of the form of the distribution. But in Bayes estimation only a single observation is available from the distribution of the parameter θ on Ω, and it cannot be used to check the assumption of the distribution. This needs special care in Bayes estimation.

In the usual estimation, replication of a random experiment consists of drawing another set of observations from the distribution P_θ of a random variable X. In Bayes estimation, replication of a random experiment consists of taking another value θ′ on Ω from the prior distribution, and then drawing a set of observations from the distribution P_{θ′} of a random variable X.

The determination of a Bayes estimator is quite simple in principle. Consider the situation before observations are taken: the distribution of θ on Ω is known as the prior distribution.

A decision function d(X) is a statistic that takes values in Ω. A non-negative function L(θ, d(X)), θ ∈ Ω, is called a loss function. The function R defined by R(θ, d) = E_θ[L(θ, d(X))] is known as the risk function associated with the decision function d(X) at θ. For example, if L(θ, d) = [θ − d]², θ ∈ Ω ⊆ ℜ, then the risk R(θ, d) = E_θ[d(X) − θ]² is the mean squared error; it is the variance of the estimator d(X) when E_θ[d(X)] = θ.
Bayes Risk Related to Prior

In Bayes estimation, the pdf (pmf) π(θ) of θ on Ω ⊆ ℜ is known as the prior distribution. For a fixed θ ∈ Ω, the pdf (pmf) p(x | θ) represents the conditional pdf (pmf) of a random variable X given θ. If π(θ) is the pdf (pmf) of θ on Ω ⊆ ℜ, then the joint pdf (pmf) of θ on Ω and X is given by p(x, θ) = π(θ)p(x | θ).

The Bayes risk of a decision function d with respect to the loss function L(θ, d) is defined by R(π, d) = E_θ[R(θ, d)]. If θ on Ω is a continuous random variable and X is of the continuous type, then the Bayes risk with respect to the loss function L(θ, d) is

        R(π, d) = E_θ[R(θ, d)]
                = ∫ R(θ, d)π(θ) dθ
                = ∫ E_θ[L(θ, d(X))]π(θ) dθ
                = ∫ [∫ L(θ, d(x))p(x | θ) dx] π(θ) dθ
                = ∫∫ L[θ, d(x)]p(x | θ)π(θ) dx dθ

E[R(θ, d)] is the mean value of the risk R(θ, d), or the expected value of the risk R(θ, d). It is evident that a Bayes estimator d*(X) minimizes the mean value of the risk R(θ, d).
The Bayes estimator is a function d*(X) that minimizes R(π, d). Minimization of R(π, d) is the same as the minimization of

        ∫ [θ − d(x)]² p(θ | x) dθ

for each x, which is achieved by the posterior mean

        d*(x) = E[θ | X = x]

Remark 7.1 If L(θ, d) = |θ − d| is the loss function for estimating the parameter θ, then the Bayes estimator of θ is the median of the posterior distribution of θ ∈ Ω ⊆ ℜ, since E|X − a| as a function of a is minimized when a* = median of the distribution of X. Also, a Bayes estimator need not be unbiased.
Minimax decision function

The principle of minimax estimation is to choose d* so that max_θ R(θ, d*) ≤ max_θ R(θ, d) for all d. If such a function d* exists, it is a minimax estimator of θ ∈ Ω ⊆ ℜ.

Theorem 7.2 If d*(X) is a Bayes estimator having constant risk, that is R(θ, d*) = constant, then d*(X) is a minimax estimator.

Proof: Let π*(θ) be the prior density corresponding to the Bayes estimator d*(X) with respect to the loss function L(θ, d). Since the risk of d*(X) is constant,

        sup_{θ∈Ω} R(θ, d*) = R(π*, d*) ≤ R(π*, d) ≤ sup_{θ∈Ω} R(θ, d)

for any other estimator d(X) of the parameter θ. Thus d*(X) is a minimax estimator.
        p(θ | x) = p(x, θ)/g(x) = π(θ)p(x | θ)/g(x)
                 = (n + 1) (n choose x) θˣ(1 − θ)^{n−x}
        d*(x) = E(θ | X = x)
              = ∫₀¹ θ p(θ | x) dθ
              = (n + 1) (n choose x) ∫₀¹ θ^{x+2−1}(1 − θ)^{n−x+1−1} dθ
              = (n + 1) [n!/(x!(n − x)!)] [(x + 1)!(n − x)!/(n + 2)!]
              = (x + 1)/(n + 2)

The Bayes risk of d*(X) = (X + 1)/(n + 2) is

        R(π, d*) = (1/(n + 2)²) ∫₀¹ {E_θ[X²] + 2E_θ[X] + 1 + θ²(n + 2)² − 2θ(n + 2)E_θ[X] − 2θ(n + 2)} dθ

                 = (1/(n + 2)²) ∫₀¹ {n(n − 1)θ² + nθ + 2nθ + 1 + θ²(n + 2)² − 2θ(n + 2)nθ − 2θ(n + 2)} dθ

                 = (1/(n + 2)²) ∫₀¹ [nθ(1 − θ) + (1 − 2θ)²] dθ

                 = (1/(n + 2)²) [n/6 + 1/3] = 1/(6(n + 2))
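The value 1/(6(n + 2)) can be checked empirically. A quick Monte Carlo sketch (illustrative code, not from the book; n, the trial count and the generator are assumptions) draws θ ∼ U(0, 1), then X ∼ b(n, θ), and averages the squared loss:

#include <stdio.h>
#include <stdlib.h>

/* Monte Carlo check of the Bayes risk 1/(6(n+2)) for d*(x) = (x+1)/(n+2) */
int main(void)
{
    int n = 10, trials = 1000000;
    double loss = 0.0;
    srand(1);
    for (int t = 0; t < trials; t++) {
        double theta = (rand() + 0.5) / ((double)RAND_MAX + 1.0);
        int x = 0;
        for (int i = 0; i < n; i++)             /* X ~ b(n, theta) */
            if ((rand() + 0.5) / ((double)RAND_MAX + 1.0) < theta) x++;
        double d = (x + 1.0) / (n + 2.0);
        loss += (d - theta) * (d - theta);
    }
    printf("estimated risk = %f, exact = %f\n",
           loss / trials, 1.0 / (6.0 * (n + 2)));
    return 0;
}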
Find the Bayes estimate of θ and θ(1 − θ) using the quadratic loss function.

The marginal pdf of X₁, X₂, ⋯, Xₙ is

        g(x₁, x₂, ⋯, xₙ) = ∫ p(x₁, x₂, ⋯, xₙ, θ) dθ
                         = ∫₀¹ π(θ)p(x₁, x₂, ⋯, xₙ | θ) dθ
                         = ∫₀¹ θ^{Σxᵢ}(1 − θ)^{n−Σxᵢ} dθ
                         = ∫₀¹ θ^{t+1−1}(1 − θ)^{n−t+1−1} dθ   where t = Σxᵢ
                         = t!(n − t)!/(n + 1)!   for t = 0, 1, 2, ⋯, n
                           0                     otherwise

The posterior pdf of θ on Ω is

        p(θ | x₁, x₂, ⋯, xₙ) = p(x₁, x₂, ⋯, xₙ, θ)/g(x₁, x₂, ⋯, xₙ)
                             = π(θ)p(x₁, x₂, ⋯, xₙ | θ)/g(x₁, x₂, ⋯, xₙ)
                             = [(n + 1)!/(t!(n − t)!)] θᵗ(1 − θ)^{n−t}   0 < θ < 1
                               0                                          otherwise
is used. Find the Bayes estimate of (i) θ and (ii) e^{−θ}.

The marginal pdf of X₁, X₂, ⋯, Xₙ is

        g(x₁, x₂, ⋯, xₙ) = ∫₀^∞ p(x₁, x₂, ⋯, xₙ, θ) dθ
                         = ∫₀^∞ π(θ)p(x₁, x₂, ⋯, xₙ | θ) dθ
                         = ∫₀^∞ e^{−θ} [e^{−nθ} θ^{Σxᵢ}/(x₁! ⋯ xₙ!)] dθ
                         = [1/Πᵢ₌₁ⁿ xᵢ!] ∫₀^∞ e^{−(n+1)θ} θ^{t+1−1} dθ   where t = Σxᵢ
                         = t!/[Πᵢ₌₁ⁿ xᵢ! (n + 1)^{t+1}]

The posterior pdf of θ on Ω is

        p(θ | x₁, x₂, ⋯, xₙ) = p(x₁, x₂, ⋯, xₙ, θ)/g(x₁, x₂, ⋯, xₙ)
                             = π(θ)p(x₁, x₂, ⋯, xₙ | θ)/g(x₁, x₂, ⋯, xₙ)
                             = e^{−(n+1)θ} θᵗ (n + 1)^{t+1}/t!   where t = Σxᵢ and 0 < θ < ∞
        p(θ | x₁, x₂, ⋯, xₙ) = [Γ(n + a + b)/(Γ(a + t)Γ(n + b − t))] θ^{a+t−1}(1 − θ)^{n+b−t−1}   0 < θ < 1

The Bayes estimate of θ is

        d*(x) = [Γ(n + a + b)/(Γ(a + t)Γ(n + b − t))] ∫₀¹ θ^{a+1+t−1}(1 − θ)^{n+b−t−1} dθ

              = (a + t)/(n + a + b) = (Σxᵢ + a)/(n + a + b)
Example 7.7 Let the a priori pdf of θ on Ω be N(0, 1), and let X₁, X₂, ⋯, Xₙ be a random sample drawn from a normal population with mean θ and variance 1. Find the Bayes estimate of θ and the Bayes risk with respect to the loss function L[θ, d] = [θ − d]².
The a priori pdf of θ on Ω is

        π(θ) = (1/√(2π)) e^{−θ²/2}   −∞ < θ < ∞
               0                       otherwise

Put the transformation √(n + 1)(θ − nx̄/(n + 1)) = t, so that (n + 1)(θ − nx̄/(n + 1))² = t². Then

        g(x₁, x₂, ⋯, xₙ) = [e^{−(1/2)Σxᵢ² + n²x̄²/(2(n+1))}/((2π)^{(n+1)/2} √(n + 1))] ∫ from −∞ to ∞ of e^{−t²/2} dt

                         = [e^{−(1/2)Σxᵢ² + n²x̄²/(2(n+1))}/((2π)^{(n+1)/2} √(n + 1))] √(2π)

                         = e^{−(1/2)Σxᵢ² + n²x̄²/(2(n+1))}/(√(n + 1) (2π)^{n/2})
The posterior pdf of θ on Ω is

        p(θ | x₁, x₂, ⋯, xₙ) = π(θ)p(x₁, x₂, ⋯, xₙ | θ)/g(x₁, x₂, ⋯, xₙ)

                             = [(1/√(2π))e^{−θ²/2} (1/(√(2π))ⁿ)e^{−(1/2)Σ(xᵢ−θ)²} √(n + 1)(2π)^{n/2}] / e^{−(1/2)Σxᵢ² + n²x̄²/(2(n+1))}

                             = √((n + 1)/(2π)) e^{−((n+1)/2)[θ − nx̄/(n+1)]²}   −∞ < θ < ∞
                               0                                                  otherwise

The Bayes estimate of θ is

        d*(x) = E[θ | X₁ = x₁, ⋯, Xₙ = xₙ]
              = ∫ from −∞ to ∞ of θ p(θ | x₁, x₂, ⋯, xₙ) dθ
              = ∫ from −∞ to ∞ of θ (1/√(2π)) (n + 1)^{1/2} e^{−((n+1)/2)[θ − nx̄/(n+1)]²} dθ

Put t = √(n + 1)(θ − nx̄/(n + 1)), so that θ = t/√(n + 1) + nx̄/(n + 1) and dt = √(n + 1) dθ. Then

        d*(x) = (1/√(n + 1)) ∫ from −∞ to ∞ of (t/√(2π)) e^{−t²/2} dt + (nx̄/(n + 1)) ∫ from −∞ to ∞ of (1/√(2π)) e^{−t²/2} dt

              = 0 + nx̄/(n + 1) = nx̄/(n + 1)
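Here the posterior variance is 1/(n + 1) whatever the data, so the Bayes risk under squared error loss equals 1/(n + 1). A quick Monte Carlo sketch (illustrative code, not from the book; n, the trial count and the Box-Muller generator are assumptions) checks this:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define TWO_PI 6.28318530717958648

/* Box-Muller standard normal variate */
static double gauss(void)
{
    double u = (rand() + 0.5) / ((double)RAND_MAX + 1.0);
    double v = (rand() + 0.5) / ((double)RAND_MAX + 1.0);
    return sqrt(-2.0 * log(u)) * cos(TWO_PI * v);
}

/* Monte Carlo estimate of the Bayes risk of d*(x) = n*xbar/(n+1):
   theta ~ N(0,1), X_i | theta ~ N(theta,1), squared error loss */
int main(void)
{
    int n = 10, trials = 200000;
    double loss = 0.0;
    srand(1);
    for (int t = 0; t < trials; t++) {
        double theta = gauss(), s = 0.0;
        for (int i = 0; i < n; i++) s += theta + gauss();
        double d = s / (n + 1.0);        /* n*xbar/(n+1) = (sum of x_i)/(n+1) */
        loss += (d - theta) * (d - theta);
    }
    printf("estimated Bayes risk = %f, exact 1/(n+1) = %f\n",
           loss / trials, 1.0 / (n + 1.0));
    return 0;
}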
Example 7.8 Let X₁, X₂, ⋯, Xₙ be iid b(1, θ) random variables and let the a priori pdf π(θ) of θ on Ω be U(0, 1). Find a (1 − α) level Bayes confidence interval for θ.

As in Example 7.2,

        p(θ | x₁, x₂, ⋯, xₙ) = [1/β(t + 1, n − t + 1)] θᵗ(1 − θ)^{n−t}   0 < θ < 1, where t = Σxᵢ
                               0                                          otherwise
and     P_θ{θ ≤ l₁(x)} = α/2

        ∫ from 0 to l₁(x) of [1/β(t + 1, n − t + 1)] θᵗ(1 − θ)^{n−t} dθ = α/2        (7.2)

Solving the equations (7.1) and (7.2) for θ, one may get the (1 − α) level Bayes confidence interval (θ(x), θ̄(x)) for θ.
Example 7.9 Let X₁, X₂, ⋯, Xₙ be a random sample drawn from a normal population N(θ, 1), θ ∈ Ω ⊆ ℜ, and let the a priori pdf π(θ) of θ on Ω be N(0, 1). Find a (1 − α) level Bayes confidence interval for θ.

As in Example 7.7, the posterior pdf of θ on Ω is

        p(θ | x₁, x₂, ⋯, xₙ) ∼ N(nx̄/(n + 1), 1/(n + 1))

Here θ is a random variable. If one selects the equal tails confidence interval, then

        P_θ{−z_{α/2} < (θ − nX̄/(n + 1))√(n + 1) < z_{α/2}} = 1 − α

        P_θ{nX̄/(n + 1) − z_{α/2}/√(n + 1) < θ < nX̄/(n + 1) + z_{α/2}/√(n + 1)} = 1 − α

        (nx̄/(n + 1) − z_{α/2}/√(n + 1), nx̄/(n + 1) + z_{α/2}/√(n + 1))

is the (1 − α) level Bayes confidence interval for θ.
Example 7.10 Let X₁, X₂, ⋯, Xₙ be a random sample from a Poisson distribution with unknown parameter θ. Assume that the a priori pdf π(θ) of θ on Ω is

        π(θ) = (α^β/Γβ) e^{−αθ} θ^{β−1}   θ > 0, α, β > 0
               0                            otherwise
        P_θ{θ ≤ l₁(x)} = α/2

        ∫ from 0 to l₁(x) of p(θ | x₁, x₂, ⋯, xₙ) dθ = α/2                           (7.4)

Solving the equations (7.3) and (7.4) for θ, one may get the (1 − α) level Bayes confidence interval (θ(X), θ̄(X)) for θ.
Example 7.11 Let X₁, X₂, ⋯, Xₙ be a sample drawn from a normal population N(θ, 1). Assume that the a priori pdf π(θ) on Ω is U(−1, 1). Find a (1 − α) level Bayes confidence interval for θ.

The pdf of X₁, X₂, ⋯, Xₙ is

        p(x₁, x₂, ⋯, xₙ | θ) = (1/√(2π))ⁿ e^{−(1/2)Σ(xᵢ−θ)²}   −∞ < xᵢ < ∞
                               0                                  otherwise
Extending the range of integration for θ over the whole line, the marginal density is

        g(x₁, x₂, ⋯, xₙ) = (1/2) √(2π/n) (1/(√(2π))ⁿ) e^{−(1/2)Σxᵢ² + (n/2)x̄²}

The posterior pdf of θ on Ω is

        p(θ | x₁, x₂, ⋯, xₙ) = π(θ)p(x₁, x₂, ⋯, xₙ | θ)/g(x₁, x₂, ⋯, xₙ)

                             = [(1/2) e^{−(1/2)Σ(xᵢ−θ)²} (2√n)(√(2π))ⁿ] / [√(2π)(√(2π))ⁿ e^{−(1/2)Σxᵢ² + (n/2)x̄²}]

                             = (√n/√(2π)) e^{−(n/2)[θ − x̄]²}   −∞ < θ < ∞

        θ ∼ N(x̄, 1/n)

The (1 − α) level Bayes confidence interval for θ is

        P{a < Z < b} = 1 − α,   where Z = (θ − x̄)/(1/√n) ∼ N(0, 1)

        P{−z_{α/2} < Z < z_{α/2}} = 1 − α

        P{X̄ − z_{α/2}/√n < θ < X̄ + z_{α/2}/√n} = 1 − α
Problems
7.1 Given n independent observations from a Poisson distribution with mean λ, find the Bayes estimate of λ, assuming the prior distribution π(λ) = e^{−λ}, 0 < λ < ∞.
7.2 If d is a Bayes estimator of θ relative to some prior distributions and the risk
function does not depend on θ , show that d is minimax.
7.3 Define the terms: loss function, risk function and minimax estimator. Explain a
procedure of computing the minimax estimator under squared error loss func-
tion.
7.4 Explain Bayes and minimax estimation procedures. Find the Bayes estimate of θ using the quadratic loss function, given a random sample from p(x | θ) = θˣ(1 − θ)^{1−x}, x = 0, 1, where the a priori distribution of θ is π(θ) = 2θ, 0 ≤ θ ≤ 1.
7.5 Let X₁, X₂, ⋯, Xₙ be a sample drawn from a normal population N(θ, 1). Assume that the a priori pdf π(θ) on Ω is U(−1, 1). Find a (1 − α) level Bayesian confidence interval for θ. Also comment on your confidence interval.
7.6 Explain the concepts of Bayes estimation.
7.7 Distinguish between interval estimation and Bayes interval estimation.
7.8 The joint pdf p(x, θ) can be expressed, for the given value θ on Ω ⊆ ℜ and the a priori density π(θ), as
        (a) p(x, θ) = p(x | θ)π(θ)
        (b) p(x, θ) = g(x)p(x | θ)
        (c) p(x, θ) = g(θ)/p(θ | x)
        (d) p(x, θ) = π(θ)/p(x | θ)

7.9 The joint pdf p(x, θ) can be expressed for the given value X = x, where p(θ | x) is the posterior pdf of θ on Ω ⊆ ℜ and g(x) is the marginal density of X, as
        (a) p(x, θ) = g(x)p(θ | x)
        (b) p(x, θ) = g(x)/p(θ | x)
        (c) p(x, θ) = π(θ)/p(θ | x)
        (d) p(x, θ) = g(x)p(x | θ)
APPENDIX
Normal curve ordinate
#include <stdio.h>
#include <math.h>

/* Given a lower limit a and a target area, search for the ordinate b
   such that the integral from a to b of the N(0,1) density equals the
   area, using Simpson's rule with n intervals (n even, n < 200) */
int main(void)
{
    float y[200], a, b, x, l, s1, s2, calarea, area;
    int i, n;

    printf("Enter the value of a and area\n");
    scanf("%f %f", &a, &area);
    printf("Enter the number of intervals n\n");
    scanf("%d", &n);
    /* 0 <= a, b <= +3 */
    b = a;
    do {
        l = (b - a) / n;
        x = a;
        for (i = 0; i <= n; i++) {
            y[i] = (1 / 2.506) * exp(-0.5 * x * x);  /* 2.506 = sqrt(2*pi) */
            x = x + l;
        }
        s1 = 0;
        s2 = 0;
        for (i = 1; i <= n - 1; i = i + 2)
            s1 = s1 + y[i];
        for (i = 2; i <= n - 2; i = i + 2)
            s2 = s2 + y[i];
        calarea = l / 3 * (y[0] + y[n] + 4 * s1 + 2 * s2);
        if (calarea >= area)          /* target area reached */
            break;
        b = b + 0.01;
    } while (b <= 3.0);
    printf("The ordinate of the given area = %4.2f\n", b);
    return 0;
}
/* Normal curve area: given ordinates a and b, compute the area
   from a to b under the N(0,1) density by Simpson's rule */
#include <stdio.h>
#include <math.h>

int main(void)
{
    float y[200], a, b, x, l, s1, s2, area;
    int i, n;

    printf("Enter the value of a\n");
    /* 0 <= a, b <= +3 */
    scanf("%f", &a);
    printf("Enter the value of b\n");
    scanf("%f", &b);
    printf("Enter the value of n\n");
    scanf("%d", &n);
    l = (b - a) / n;
    x = a;
    for (i = 0; i <= n; i++) {
        y[i] = (1 / 2.506) * exp(-0.5 * x * x);  /* 2.506 = sqrt(2*pi) */
        x = x + l;
    }
    s1 = 0;
    s2 = 0;
    for (i = 1; i <= n - 1; i = i + 2)
        s1 = s1 + y[i];
    for (i = 2; i <= n - 2; i = i + 2)
        s2 = s2 + y[i];
    area = l / 3 * (y[0] + y[n] + 4 * s1 + 2 * s2);
    printf("The area for the given ordinate = %4.5f\n", area);
    return 0;
}
INDEX
Assignable causes
Asymptotic covariance
Asymptotic efficient
Asymptotic normal
Asymptotically unbiased
Bayes confidence
Bayes estimator
Bayes risk
Bernoulli distribution
Best linear unbiased estimator
Bhattacharya inequality
Bilateral Laplace transform
Binomial distribution
Bounded completeness
Cauchy distribution
Cauchy principal value
Chapman - Robbin bound
Chebychev's inequality
Chi - square distribution
Complete family
Complete statistic
Completeness
Confidence intervals
Consistent estimator
Continuous distribution
Convex function
Covariance inequality
Covariance matrix
Cramer
Cramer - Rao inequality
Cramer - Rao lower bound
Discrete distributions
Discrete uniform distribution
Dugue
Efficiency
Efficient estimator
Empirical continuous distribution
Empirical discrete distribution
Erlang distribution
Estimate
Estimation
Estimator
Exponential distribution
Exponential family
F - distribution
Fisher measure
Fixed derivative
Gamma distribution
Gauss Markov theorem
Geometric distribution
Goodness of fit
Histogram
Huzurbazar
Hypergeometric distribution
Inequality approach
Information inequality
Information matrix
Interval estimation
Jacobian
Jensen's inequality
Joint complete statistics
Joint distribution
Joint sufficient statistics
Lagrange multipliers
Laplace transform
Law of large numbers
Least square estimator
Lehmann - Scheffe technique
Lehmann - Scheffe theorem
Likelihood function
Linear estimation
Locally MVUE
Location parameter
Lognormal distribution
Loss function
Lower confidence bound
Marginal distribution
Maximum likelihood
Maximum order statistic
Mean squared error
Method of scoring
Minimal sufficient
Minimax estimator
Minimum chi - square
Minimum order statistic
Minimum variance bound
MLE is consistent
MLE is not consistent
Modified minimum chi - square
Moments estimator
Multinomial distribution
Negative binomial
Newton - Raphson method
Neyman factorization theorem
Normal distribution
Optimal estimation
Optimal property of MLE
Parameter
Parameter space
Partition
Pivot
Poisson distribution
Posterior
Power series
Prior
Quantile
Random sets
Rao - Blackwell theorem
Relative efficiency
Sufficient statistic
Terminal value
Triangular distribution
Trinomial distribution
UMVUE
Unbiased
Uniform distribution
Upper confidence bound
Wald
Weibull distribution