0 Stimmen dafür0 Stimmen dagegen

186 Aufrufe237 SeitenJun 01, 2011

© Attribution Non-Commercial (BY-NC)

PDF, TXT oder online auf Scribd lesen

Attribution Non-Commercial (BY-NC)

Als PDF, TXT **herunterladen** oder online auf Scribd lesen

186 Aufrufe

Attribution Non-Commercial (BY-NC)

Als PDF, TXT **herunterladen** oder online auf Scribd lesen

- The Law of Explosive Growth: Lesson 20 from The 21 Irrefutable Laws of Leadership
- Hidden Figures: The American Dream and the Untold Story of the Black Women Mathematicians Who Helped Win the Space Race
- Hidden Figures Young Readers' Edition
- The E-Myth Revisited: Why Most Small Businesses Don't Work and
- Micro: A Novel
- The Wright Brothers
- The Other Einstein: A Novel
- State of Fear
- State of Fear
- The Power of Discipline: 7 Ways it Can Change Your Life
- The Kiss Quotient: A Novel
- Being Wrong: Adventures in the Margin of Error
- Algorithms to Live By: The Computer Science of Human Decisions
- The 6th Extinction
- The Black Swan
- The Art of Thinking Clearly
- The Last Battle
- Prince Caspian
- A Mind for Numbers: How to Excel at Math and Science Even If You Flunked Algebra
- The Theory of Death: A Decker/Lazarus Novel

Sie sind auf Seite 1von 237

NET/JRF/CSIR EXAMINATIONS

A. SANTHAKUMARAN

Dr. A. Santhakumaraan

Associate Professor and Head

Department of Statistics

Salem Sowdeswari College

Salem - 636010

Tamil - Nadu

E-mail: ask.stat @ yahoo.com

About the Author

Dr. A. Santhakumaran is an Associate Professor and Head Department of Statistics at

Salem Sowdeswari College, Slaem - 10, Tamil Nadu. He holds a Ph.D. in Statistics -

Mathematics from the Ramanujan Institute for Advanced Study in Mathematics, Univer-

sity of Madras. He has interests in Stochastic Processes and Their Applications. He has

to his credit over 31 research papers in Feedback Queues, Statistical Quality Control and

Reliability Theory. He is the authour of the book Fundamentals of Testing Statistical

Hypotheses and Research Methodology.

Acknowledgments

College , Salem and my colleagues for their enthusiastic and unstinted support ren-

dered for publishing this book. I am grateful to Professor V. Thangaraj, RIASM, Uni-

versity of Madras, for his encouragement for writing the book. My greatest debt is

Dr. J. Subramaniam, Professor of Mathematics, Bannari Amman Institute of Technol-

ogy, Sathyamangalam, who read most of the manuscript and whose critical comments

resulted in numerous significant improvements. My thanks to Mr. G. Narayanan, Ra-

manujan Institute Computer Centre, RIASM, University of Madras, for the suggestions

rendered by him towards the successful completion of the Latex typeset of the book.

Finally, I wish to express my gratitude to all my teachers under whose influ-

ence I have come to appreciate statistics as the science of winding and twisting net-

work, connecting Mathematics , Scientific Philosophy, Computer Software and other

intellectual sources of the Millennium. A.SANTHAKUMARAN

PREFACE

Even though the science of Statistics was originated more than 200 years ago ,

it was recognized as a separate discipline in the early 1940 in India. From then to

till now statistics is evolving as a versatile powerful and indispensable instrument for

analyzing the statistical data in real life problems. We have reached a stage where no

empirical science can afford to ignore the science of Statistics, since the diagnosis of

pattern of recognition can be achieved through the science of Statistics. Because of the

speedy growth of modern science and technology, one who learns statistics, he must

have capacity, knowledge and intellect. Bird has capacity to imitate when we taught.

The child is not born with a language. But it is born into an innate capacity to learn

language. So when we teach the child, the child manipulates the structure and creates

sentences. But a bird cannot do this. So the child has knowledge and capacity to create

new sentences. If a man has the ability and knowledge he can be inventiveness and

innovation constitute intellect.

If a student has ability, knowledge and intellect, then he will be able to learn and

implement statistics successfully. If these three faculties are lacking, learning of statis-

tics will not be possible. We shall give a number of examples drawn from the story of

improvement of natural knowledge and the success of decision making. It shows how

statistical ideas played an important role in scientific investigations and other decision

making processes. The most successful man in life is one who makes the best deci-

sion based on the available information. Practically it is a very difficult task to take a

decision on a real life problems. We illustrate this with the help of following examples.

One wants to know that how many ways a bread can be divided into two equivalent

parts. Immediately one reflects that it is divided into a finite number of ways. In fact

the bread is divided into two equivalent parts in infinite number of ways. Naturally

every article can have infinite dimension. Our interest of study may be one dimension

namely, length of the bread, Area ( = length × breath ) two dimension and Volume

( = length × height × breadth) three dimension and so on. Analogous to this are

the measures of average ( location), measures of variability ( scale) and measures of

skewness and kurtosis (shape).

Another example is that a new two wheeler is introduced by a manufacturer in the

market. The manufacturer wants to announce that the two wheeler gives how much

kilometer per litre on road. For this purpose, the manufacturer ride the two wheeler on

the road three times and observed that the two wheeler gives 50 km per litre, 55 km

per litre and 60 km per litre respectively. Suddenly one comes to the mind that the two

wheeler gives = 50+55+60 3 = 55 km per litre. This is absolutely wrong. Actually the

two wheeler gives 60 km per litre, the value of the maximum order statistic.

A cyclist pedals from his house to his college at a speed of 10 mph and returns back

his house from the college at a speed of 15 mph. He wants to know his average speed.

One assumes that the distance between the house and the college is x miles. Then the

average speed of the cyclist = TotalTotal distance = x 2x x = 12 mph which is the

time taken 10 + 15

Harmonic Mean.

Seven students and a master want to cross a river from one side to other side. The

students are not able to swim to cross the river. The master measures average height

of the students which is 5’.5”. He also measures the depth of the river from one side

5

to other side in 10 places 2’, 2’.5”, 4’, 5’.5”, 6’, 6’.5”, 10’, 2’.5”,1’.5”,1’which has

4’.15” average depth of the river. The master takes a decision to cross the river on foot,

since average height of the students is greater than the average depth of the river. The

students fail to cross the river, since some place the depth of the river is more than

5’.5”. The master is not happy for his decision. The master has succeeded to take a

decision if the minimum height of the students is greater than the maximum depth of

the river.

Keeping this in mind, the first chapter of the book deals with some of the well

known distributions he pattern of recognition of statistical distributions. Chapter 2

gives the criteria of point estimation. Chapter 3 focuses on the study of optimal estima-

tion. Chapter 4 illustrates the properties of complete family of distributions. Chapter 5

explains the methods of estimation. Chapter 6 discusses interval estimation. Chapter 7

consists of Bayesian estimation.

6

DISTINCTIVE FEATURES

• Care has been taken to provide conceptual clarity, simplicity and up to date ma-

terials.

• Properly graded and solved problems to illustrate each concept and procedure

are presented in the text.

• About 300 solved problems and 50 remarks.

• A chapter on complete family of distributions.

ence of Under - Graduate and Post - Graduate Statistics of Indian universities

and other Applicable Sciences, Allied Statistical Courses, Mathematical Sci-

ences and various Competitive Examinations like ISS, UGC Junior Fellowship,

SLET, NET etc.

January 2010

7

CONTENTS

1.1 Introduction

1.2 Collection of data

1.3 Diagnosing the Probability Models data

1.4 Discrete Probability Models

1.5 Continuous Probability Models

1.6 Diagnosis of Probability Models

1.7 Quantile - Quantile plot

2.1 Introduction

2.2 Point estimator

2.3 Problems of point estimation

2.4 Criteria of the point estimation

2.5 Consistency

2.6 Sufficient condition for consistency

2.7 Unbiased estimator

2.8 Sufficient Statistic

2.9 Neyman Factorizability Criterion

2.10 Exponential family of distributions

2.11 Distribution Admitting Sufficient Statistic

2.12 Joint Sufficient Statistics

2.13 Efficient estimator

3.1 Introduction

3.2 Completeness

3.3 Minimal Sufficient Statistic

8

4.1 Introduction

4.2 Uniformly Minimum Variance Unbiased Estimator

4.3 Uncorrelatedness Approach

4.4 Rao - Balckwell Theorem

4.5 Lehman - Scheffe Theorem

4.6 Inequality Approach

4.7 Cramer Rao Inequality

4.8 Chapman - Robbin Inequality

4.9 Efficiency

4.10 Extension of Cramer- Rao Inequality

4.11 Cramer - Rao Inequality - Multiparameter case

4.12 Bhattacharya Inequality

5.1 Introduction

5.2 Method of Maximum Likelihood Estimation

5.3 Numerical Methods of Maximum Likelihood Estimation

5.4 Optimum property of MLE

5.5 Method of Minimum Variance Bound Estimation

5.6 Method of Moment Estimation

5.7 Method of Minimum Chi - Square Estimation

5.8 Method of Least Square Estimation

5.9 Gauss Markoff Theorem

6.1 Introduction

6.2 Confidence Intervals

6.3 Alternative Method of Confidence Intervals

6.4 Shortest Length Confidence Intervals

9

7.1 Introduction

7.3 Bayes confidence intervals

References

Glossary of Notation

Appendix

Answers to problems

Index

10

Probability Models and their Parametric Estimation

1.1 Introduction

Statistics is a decision making tool which aims to resolve the real life problems.

It originated more than 2000 years ago, but it was recognized as a separate discipline

from 1940 in India. From then till now , statistics is evolving as a versatile powerful and

indispensable instrument for investigation in all fields of real life problems. It provides

a wide variety of analytical tools. We have reached a stage where no empirical science

can afford to ignore the science of statistics since the diagnosis of pattern of recognition

can be achieved through the science of statistics.

Statistics is a method of obtaining and analyzing data in order to take decisions

on them. In India, during the period of Chandra Gupta Maurya there was an efficient

system of collecting official and administrative statistics. During Akbar’s reign ( 1556

- 1605AD) people maintained good records of land and agricultural statistics. Statistics

surveys were also conducted during his reign.

Sir Ronald A. Fisher known as Father of statistics placed statistics on a very

sound footing by applying it to various diversified fields. His contributions in statistics

led to a very responsible position of statistics among sciences

Professor P. C. Mahalanobis is the founder of statistics in India. He was a

physicist by training , a statistician by instinct and an economist by conviction. Gov-

ernment of India has observed on 29th June the birthday of Professor Prasanta Chan-

dra Mahalanobis as National Statistics Day. Professor C.R. Rao is an Indian legend

, whose career spans the history of modern statistics. He is considered by many to be

the greatest living statistician in the world to day.

There are many definitions of the term statistics . Some authors have defined

statistics as statistical data ( plural sense) and others as statistical methods ( singular

sense).

Yule and Kendall state By statistics we mean quantitative data affected to a

marked extent by multiplicity of causes. Their definition point out the following char-

acteristics:

• Statistics are aggregates of facts.

• Statistics are affected to a marked extent by multiplicity of causes.

• Statistics are numerically expressed.

• Statistics are enumerated or estimated according to reasonable standards of ac-

curacy.

• Statistics are collected in a systematic manner.

• Statistics are collected for a pre - determined purpose and

• Statistics should be placed in relation to each other.

11

A. Santhakumaran

One of the best definitions of statistics is given by Croxton and Cowden. They

define statistics as the science which deals with collection, analysis and interpretation

of numerical data. This definition points out the scientific ways of :

• Data collection

• Data presentation

• data analysis

• Data interpretation

Statistics is an imposing form of Mathematics. The usage of statistical methods

has been briskly expanding in the late 20th century, because of the application value

of the statistical models and methods have greater implication in the applications of

many inter - disciplinary sciences. So we define Statistics as the science of winding

and twisting network connecting Mathematics, Scientific Philosophy, Computer

software and other intellectual sources of the millennium.

This definition reveals that statisticians work to translate real life problems

into mathematical models by using assumptions or axioms or principles. Then they

derive exact solutions by their knowledge and thereby intellectually validate the results

and express their merits in non-mathematical forms which make for the consistency of

real life problems.

In real life problems, there are many situations where the actions of the en-

tities within the system under study cannot be completely predicted with 100 percent

perfection . There is always some variation. The variation can be classified into two

categories, i.e., variation due to assignable causes which has to be identified and elim-

inated; and variation due to chance causes which is equal to 6σ values. This is also

called natural variation. In general, the reduction of natural variation is not necessary

and involves more cost. So it is not feasible to reduce the natural variation. However,

some appropriate statistical patterns of recognition may well describe the causes of

variations.

An appropriate statistical pattern of recognition can be diagnosed by repeated

sampling of phenomenon of interest. Then, through the systematic study of these data,

a statistician can obtain a known distribution suitable for the data and estimates the

parameters of the distribution. A statistician takes continuous efforts in the selection of

a distribution form.

There are four steps in the diagnosis of a statistical distribution. They are

(i) Data collection

Data collection for real life problems often requires a substantial knowledge on

the problems, planning time and resource commitment.

(ii) Identification of statistical pattern

When the data are available, identification of a probability distribution begins

12

Probability Models and their Parametric Estimation

pattern of frequency distribution and knowledge on the nature and behaviour of

the process, a family of distributions is chosen.

(iii) Parameters selection

Choose parameters that determine a specific instance of a distribution family

when the data are available. These parameters are estimated from the data.

The validity of the chosen distribution and the associated parameters are evalu-

ated with the help of statistical tests. The validity of various assumptions made

on parameter is achieved by certain level of significance only.

If the chosen distribution is not a good approximation of the data, then the analyst

goes to the second step, chooses a different family of distributions and repeats the

procedure.

If the several iterations of this procedure fail to give a fit between an assumed

distributional form and the collected data, then the empirical form of the distribution

may be used.

Collection of data is one of the important tasks in finding a solution for real life

problems. Even if the statistical pattern of the real life problems are valid, if the data

are inaccurately collected, inappropriately analyzed or not representative of the real life

problems, then the data will be misleading when used for decision making.

One can learn data collection from an actual experience. The following sug-

gestions may enhance and facilitate data collection. Data collection and analysis must

be tackled with great care.

(i) Before collecting data, planning is very important. It could commence by a prac-

tice of pre - observing experience. Try to collect the data while pre - observing.

Forms of the data are devised for due purposes. It is very likely that these forms

will have to be modified several times before the actual data collection begins.

Watch for unusual situations or normal circumstances and consider how they

will be handled. Planning is very important even if the data are collected au-

tomatically. After collecting the data, find out whether the collected data are

appropriate or not.

(ii) If the data being collected are adequate to diagnosize the statistical distributions,

then determine the apt distribution. If the data being used are useless to diagno-

size the statistical distribution, then there is no need to collect superfluous data.

(iii) Try to combine homogeneous data sets. Check data for homogeneity in suc-

cessive time periods and during the same time period on successive interval of

times.

13

A. Santhakumaran

(iv) Beware of the possibility of data censoring, in which a quantity of interest is not

observed in its entirety. This problem most often occurs when the analyst is

interested in the time required to complete some process but the process begins

prior to or finishes after the completion of the observation period. Censoring can

result in especially long process times being left out of the data sample.

(v) One may use scatter diagram which indicates the relationship between the two

variables of interest.

(vi) Consider the possibility that a sequence of observations which appear to be in-

dependent may possess autocorrelation. Autocorrelation may exist in successive

time periods.

The methods for selecting families of distributions are possible, if only the sta-

tistical data are available. The specific distribution within a family is specified by

estimating its parameters. Estimating the parameters of a family of distributions leads

to the theory of estimation.

The formation of frequency distribution or Histogram is useful in guessing the

shape of a distribution. Hines and Montgomery state that choosing the number of class

intervals approximately equals the square root of the sample size. If the intervals are too

long, the Histogram will be coarse or blocking and its shape and other details will not

smoothen the data. So one has to allow the interval sizes to change until a good choice

is found. The Histogram for continuous data corresponds to the probability density

function of a theoretical distribution. If continuous, a line drawn through the centre

point of each class interval frequency should result in a shape like that of probability

density function ( pdf )( see Figure 1.2).

Histogram for discrete data where there are a large number of data points,

should have a cell for each value in the range of the data. However if there are a few

data points, it may be necessary to combine adjacent cells to eliminate the ragged ap-

pearance of the Histogram. If the Histogram is associated with discrete data, it should

look like a probability mass function ( pmf ) ( see Figure 1.1).

Discrete random variables are used to describe the random phenomenon in which

only integer values can occur. The following are some important distributions.

1.4.1 Bernoulli distribution

An experiment consists of n trials, each trial has a success or a failure and each

trial is repeated under the same condition. Let Xj = 1 if the j th experiment

resulted in success and let Xj = 0 , if the j th experiment resulted in a failure,

the sample space has a value 0 and 1. If the trials are independent, each trial has

only two possible outcomes ( success or failure) and the probability of success

14

Probability Models and their Parametric Estimation

θ remains constant from trial to trial. For one trial the pmf

x

θ (1 − θ)1−x x = 0, 1, 0 < θ < 1

pθ (x) =

0 otherwise

From the above assumptions in a production process, X denotes the quality of

the produced item, then X follows the Bernoulli random variable.

1.4.2 Binomial Distribution

Let X be a random variable, denotes the number of success in n Bernoulli

trials. Then the random variable X is called a Binomial random variable with

parameters n and θ . Here the sample space is {0, 1, 2, · · · , n} and the pmf

is

n! x n−x

pθ (x) = x!(n−x)! θ (1 − θ) x = 0, 1, · · · , n, 0 < θ < 1

0 otherwise

X1 , X2 , · · · , XnPare independent and identically distributed Bernoulli random

n

variables, then i=1 Xi ∼ b(n, θ) . The problems relating to tossing a coin

or throwing dice lead to Binomial distribution . In a production process, the

number of x defective units in a random sample of n units follows Binomial

distribution.

1.4.3 Geometric Distribution

A random variable X is related to a sequence of Bernoulli trials in which the

number of trials (x + 1) to achieve the first success is

pθ (x) =

0 otherwise

It is the probability that the event {X = x} occurs, when there are x failures

followed by a success.

A couple decides to have any number of children until they have a male

child. If the probability of having a male child in their family is p , they have

to expect how many children they will have before the first male child is born.

X denotes the number of children of the couple. The probability that there are

x female children preceding the first male child is born, is a Geometric random

variable.

1.4.4 Negative Binomial Distribution

PnX1 , X2 , · · · , Xn are iid Geometric variables, then T

If = t(X) =

i=1 Xi ∼ a Negative Binomial variate whose pmf is

(

(t+n−1)! n t

pθ (t) = t!(n−1)! θ (1 − θ) t = 0, 1, · · ·

0 otherwise

15

A. Santhakumaran

failures preceding the nth success in (x + n) trials is given by

(

(x+n−1)! n x

pθ (x) = (n−1)!x! θ (1 − θ) x = 0, 1, 2, · · ·

0 otherwise

This will happen if the last trial results in a success and among the previous

(n + x − 1) trials there are exactly x failures. Note that if n = 1 , then p(x)θ

is the Geometric distribution function. Negative Binomial distribution has Mean

< Variance . In a production process, the number of units that are required to

achieve nth defective in x + n units follow Negative Binomial distribution.

1.4.5 Multinomial Distribution

If the sample space of a random experiment has been split into more than two

mutually exclusive and exhaustive events then one can define a random vari-

able which leads to Multinomial distribution. Let E1 , E2 , · · · , Ek be k mu-

tually exclusive and exhaustive events of a random experiment with respec-

tive probabilities θ1 , θ2 , · · · , θk , such that θ1 + θ2 + · · · + θk = 1 and

0 < θi < 1, i = 1, 2, · · · , k, then the probability that E1 occurs x1 times, E2

occurs x2 times, · · · , Ek occurs xk times in n independent trials is known

as Multinomial distribution with pmf is given by

x

n!

θx1 θ2x2 where ki=1 xi = n

P

x1 !x2 !···xk ! 1

· · · θk k

pθ1 ,θ2 ,··· ,θk (x1 , x2 , · · · , xn ) =

0 otherwise

If k = 2 , that is, the number of mutually exclusive events is only two, then the

Multinomial distribution becomes a Binomial distribution as is given by

n! x1 x2

pθ1 ,θ2 (x1 , x2 ) = x1 !x2 ! θ1 θ2 where x1 + x2 = n and θ1 + θ2 = 1

0 otherwise

n! x1 n−x1

pθ1 (x1 ) = x1 !(n−x1 )! θ1 (1 − θ1 ) 0 < θ1 < 1, x1 = 0, 1, · · · , n

0 otherwise

Consider two brands A and B. Each individual in the population prefers brand

A to brand B with probability θ1 , prefers B to A with probability θ2 and is

indifferent between brand A and B with probability θ3 = 1 − θ1 − θ2 . In

a random sample of n individuals X1 prefers brand A, X2 prefers brand B

and X3 prefers some other brand other than A and B. Then the three random

variables follow a Trinomial distribution, i.e.,

n! x1 x2 x3

= x1 !x2 !x3 ! θ1 θ2 θ3 x1 + x2 + x3 = n

0 otherwise

16

Probability Models and their Parametric Estimation

A random variable X is said to follow uniform distribution on N points

(x1 , x2 , · · · , xN ), if its pmf is given by

1

i = 1, 2, · · · , N and N ∈ I+

pN (x) = PN {X = xi } = N

0 otherwise

A random experiment with complete uncertainty but whose outcomes are equal

probabilities may describe Uniform distribution. In a finite population of N

units, one has to select any unit xi , i = 1, 2, · · · , N from the population with

simple random sampling technique which has a discrete uniform distribution.

1.4.7 Hypergeometric Distribution

One situation in which Bernoulli trials are encountered is that in which an ob-

ject is drawn at random from a collection of objects of two types in a box. In

order to repeat this experiment so that the results are independent and identically

distributed, it is necessary to replace each object drawn and to mix the objects

before the next one is drawn. This process is referred to as sampling with re-

placement. If the sampling is done no replacement of the objects drawn, the

resulting trial are still of the Bernoulli type but no longer independent.

For example, four balls are drawn one at a time, at random and no replace-

ment from 8 balls in a box, 3 black and 5 red. The probability that the third ball

drawn is black, i.e.,

5 4 3 5 3 2 3 5 2 3 2 1

= × × + × × + × × + × ×

8 7 6 8 7 6 8 7 6 8 7 6

3

=

8

which is the same as the probability that the first ball drawn is black. It should

not be surprising that this probability for black ball is the same on the third draw

as on the first draw.

In general case, n objects are to be drawn at random, one at a time, from

a collection of N objects, M of one kind and N − M of another kind. The

one kind and of object will be thought of as success and coded 1; the other kind

is coded 0. Let X1 , X2 , · · · , Xn denote the sequence of coded outcomes; that

is Xi is 1 or 0 according to whether the ith draw results in success or failure.

The total number of success in n trials is just the sum of the X 0 s ,

Sn = X1 + X2 + · · · + Xn

is, the probability of a 1 on the ith trial is the same at each trial:

M

P {Xi = 1} = i = 1, 2, · · · , n

N

17

A. Santhakumaran

One can observe first that the probability of a given sequence of N objects is

1 1 1

···

N N −1 N −n+1

The probability that an object of type 1 occurs in the ith position in the sequence

of N objects is

M (N − 1)(N − 2) · · · (N − n + 1)

P {Xi = 1} =

N (N − 1) · · · (N − n + 2)(N − n + 1)

M

= i = 1, 2, · · · , n

N

where M is the number of ways of selecting the ith position with an object

coded 1 and (N − 1)(N − 2) · · · (N − n + 1) is the number of ways of selecting

the remaining (n − 1) places in the sequence from the (N − 1) remaining

objects. It does not matter whether the number of success among the n objects

drawn, one at a time, at random or that of simultaneously drawing n at random.

The probability function of Sn is

M N-M

k n - k

P {Sn = k} = N

k = 0, 1, 2, · · · , min(n, M )

n

0

otherwise

The random variable Sn with the above probability function is said to have a

Hypergeometric distribution. The mean of the random variable Sn is easily

obtained from the representation of a Hypergeometric variable as a sum of the

Bernoulli trials. That is,

E[Sn ] = E[X1 + X2 + · · · + Xn ]

= E[X1 ] + E[Xn ] + · · · + E[Xn ]

= 1 × P {X1 = 1} + 0 × P {X1 = 0}

+ · · · + 1 × P {Xn = 1} + 0 × P {Xn = 0}

M M nM

= + ··· + =

N N N

M N −M N −n

Variance of Sn = n if N ∈ I+ (1.1)

N N N −1

The probability at each trial that the object drawn is of the type of which there

are initially M is p = MN , then

N −n

Variance of Sn = npq if N ∈ I+ (1.2)

N −1

18

Probability Models and their Parametric Estimation

−n

The above formula (1.2) differs from the formula (1.1) by the extra factor N

N −1 .

N −n

The variance of Sn = npq N −1 in the no replacement case and the variance

of Sn = npq in the replacement case for fixed p and fixed n , since the factor

N −n

N −1 → 1 as N becomes finitely many. Thus Hypergeometric distribution is

exact where as Binomial distribution is approximate one.

50 students of the M.Sc. Statistics in a certain college are divided at random

into 5 batches of 10 each for the annual practical examination in Statistics. The

class consists of 20 resident students and 30 non - resident students. X denotes

the number of students in the first batch who appear the practical examination.

The Hypergeometric distribution is apt to describe the random variable X and

has the pmf

20 30

x 10 - x

x = 0, 1, 2, · · · , 10

50

P {X = x} =

10

0 otherwise

Poisson random variable is used to describe rare events. For example number of

air crashes occurred on Monday in 3 pm to 5 pm. The pmf of Poisson random

variable given as

−θ θx

e x! θ > 0, x = 0, 1, 2, · · ·

pθ (x) =

0 otherwise

where θ is a parameter. One of the important properties of the Poisson dis-

tribution is that mean and variance are the same and are equal to θ . If

X1 , X2 , · · · , Xn are iid Poisson

Pn random variables with parameter θ , then the

sum of the random variables i=1 Xi follows a Poisson distribution with pa-

rameter nθ .

After correcting 50 pages of the proof of a book, the proof readers find

that there are, on the average 2 errors per 5 pages. One would like to know the

number of pages with errors 0 , 1, 2, 3 · · · in 10000 pages of the first print of

the book. X denotes the number of errors per page; then the random variable

X follows the Poisson distribution with parameter θ = 52 = .4.

1.4.9 Power series distribution

If a random variable X follows a Power series distribution, then its pmf is

ax θx

f (θ) x ∈ S; ax ≥ 0, θ > 0

Pθ {X = x} =

0 otherwise

P

f (θ) is positive, finite and differentiable and S is a non - empty countable

subset of non - negative integers.

19

A. Santhakumaran

(i) Binomial Distribution

p

Let θ = 1+p , f (θ) = (1 + θ)n and S = {0, 1, 2, 3, · · · , n} a set of non -

negative integers, then

X

f (θ) = ax θx

x∈S

Xn

(1 + θ)n = ax θ x

x=0

n

ax = x

n x

p

x 1−p

Pp {X = x} = p n

[1 + 1−p ]

( n

x px q n−x x = 0, 1, 2, · · · , n

=

0 otherwise

p

Let θ = 1−p , f (θ) = (1 − θ)−n and S = {0, 1, 2, · · · }, 0 ≤ θ ≤ 1 and n ∈

I+ . Now

X

f (θ) = ax θx

x∈S

X∞

(1 − θ)−n = ax θx

x=0

- n

n+x-1 n+x-1

ax = (−1)x x = (−1)x (−1)x x = x

n + x - 1 p x

x 1+p

P {X = x} = h i−n

p

1 − ( 1+p )

x

n+x-1 p

= x (1 + p)−n

1+p

n+x-1 x

= x p (1 + p)−(n+x)

-n

= x (−p)x (1 + p)−(n+x) x = 0, 1, 2, · · ·

20

Probability Models and their Parametric Estimation

X

f (θ) = ax θx

x∈S

X

θ

e = ax θx

x∈S

∞ x ∞

X θ X 1

= ax θx ⇒ ax =

x=0

x! x=0

x!

ax θ x 1 θx e−θ θx

Pθ {X = x} = = = x = 0, 1, 2, · · ·

f (θ) x! eθ x!

Continuous random variable can be used to describe random phenomena in which

the variable X of interest can take any value x in some interval which has P {X =

x} = 0 ∀ x in that interval.

1.5.1 Uniform Distribution

A random variable X is uniformly distributed at an interval [a, b], if its pdf is

given by 1

b−a a≤x≤b

pa,b (x) =

0 otherwise

2 −x1

Note that P {x1 < X < x2 } = F (x2 ) − F (x1 ) = xb−a is proportional to the

length of the interval for all x1 and x2 satisfying a ≤ x1 ≤ x2 ≤ b . If random

phenomenon has complete unpredictability, then it can be described as uniform

distribution.

1.5.2 Normal Distribution

A random variable X with mean θ (−∞ < θ < ∞) and variance σ 2 (> 0)

has a Normal distribution if it has the pdf

( 1 2

√ 1 e− 2σ2 [x−θ] −∞ < x < ∞

pθ,σ2 (x) = 2πσ

0 otherwise

as a Normal distribution. The time to assemble a product which is the sum of

the times required for each assembly operation may describe a Normal random

variable.

A random variable X is said to be Exponentially distributed with parameter

θ > 0 , if its pdf is given by

−θx

θe x>0

pθ (x) =

0 otherwise

21

A. Santhakumaran

The value of the intercept on the vertical axis is always equal to the value of θ .

Note that all pdf 0 s eventually intersect at θ , since the Exponential distribution

has its mode at the origin. The mean and standard deviation are equal in Ex-

ponential distribution. In a random phenomenon, the time between independent

events which have memory less property may appropriately follow Exponential

random variable. For example, the time between the arrivals of a large number

of customers who act independently of each other may fit adequately the data to

Exponential distribution.

1.5.4 Gamma Distribution

A function used to define the Gamma distribution is the Gamma function. A

random variable X follows a Gamma distribution, if

( β

θ −θx β−1

pθ,β (x) = Γβ e x x > 0, β > 0, θ > 0

0 otherwise

where

i=1 Xi ∼ G(n, θ ) , if each Xi ∼ exp( θ ) . The cumulative distribution

function F (x) = P {X ≤ x} of a random variable X is given by

R ∞ βθ

(βθt)β−1 e−βθt dt x > 0

1 − x Γβ

F (x) =

0 otherwise

The pdf of the Gamma distribution becomes Erlang distribution of order k

when β = k an integer. When β = k a positive integer, the cumulative distri-

bution function F (x) is given by

( Pk−1 −kθx (kθx)i

F (x) = 1 − i=0 e i! x>0

0 otherwise

which is the sum of Poisson terms with mean kθx .

1.5.6 Weibull Distribution

A random variable X has a Weibull distribution if it has pdf

(

β x−γ β−1

exp[−( x−γ

β

pβ,α,γ (x) = α α α ) ] x≥γ

0 otherwise

The three parameters of the Weibull distribution are γ (−∞ < γ < ∞) which is

the location parameter, α (α > 0) which is the scale parameter and β (β > 0)

which is the shape parameter. When γ = 0 the Weibull pdf becomes

β x β−1

α(α) exp[−( αx )β ] x ≥ 0

pβ,α (x) =

0 otherwise

When γ = 0 and β = 1 , the Weibull distribution is reduced to Exponential

distribution with pdf

1 −x

αe

α x≥0

pα (x) =

0 otherwise

22

Probability Models and their Parametric Estimation

A random variable X has a Triangular distribution if its pdf is given by

2(x−a)

(b−a)(c−a) a ≤ x ≤ b

pa,b,c (x) = 2(c−x)

b<x≤c

(c−b)(c−a)

0 otherwise

that 2a+c

3 ≤ E[X] ≤ a+2c 3 . The mode is used more often than the mean to

characterize the Triangular distribution.

1.5.8 Empirical Distribution

An empirical distribution may be either continuous or discrete in nature. It is

used to establish a statistical model for the available data whenever there is a

discrepancy in the aimed distribution or whenever one can unable to arrive at a

known distribution.

(a) Empirical Continuous Distributions

The time taken to install 100 machines is collected. The data are given in Table

1.1 which gives the number of machines together with time taken. For example,

30 machines have installed between 0 and 1 hour, 25 between 1 and 2 hour, 20

between 2 and 3 hour and 25 between 3 and 4 hour. X denotes time taken to

install the machines.

Duration

of Hours Frequency p(x) F (x) = P {X ≤ x}

0≤x≤1 30 .30 .30

1<x≤2 25 .25 .55

2<x≤3 20 .20 .75

3<x≤4 25 .25 1.00

(b) Empirical Discrete Distributions

At the end of the day, the number of shipments on the loading docks of an export

company are observed as 0, 1 , 2, 3, 4 and 5 with frequencies 23, 15, 12, 10, 25

and 15 respectively. Let X be the number of shipments on the loading docks of

the company at the end of the day. Then X is a discrete random variable which

takes the values 0 , 1, 2, 3, 4 and 5 with the distribution as given in Table 1.2.

Figure 1.1 is the Histogram of number of shipments on the loading docks of the

company.

23

A. Santhakumaran

Number of

shipments x Frequency P {X = x} F (x) = P {X ≤ x}

0 23 .23 .23

1 15 .15 .38

2 12 .12 .50

3 10 .10 .60

4 25 .25 .85

5 15 .15 1.00

F

R

E

Q 25

U

E 20

N

C 15

Y 10

5

0 1 2 3 4 5

Number of shipments

Figure 1.1 Histogram of shipments

The probability of an item whose value of the variable is constant increment, is

an Exponential distribution. This is apt to fit the data. If the probability of a variable

of an item whose value of the variable is either positive or negative, then a Normal

distribution is appropriate to the data. When the variable of interest seems to follow

the Normal probability distribution, the random variable is restricted to be greater than

or less than a certain value. The truncated Normal distribution will be adequate to fit

the data. The Gamma and Weibull distributions are also used to describe the data. The

Exponential distribution is a special case of both the Gamma and Weibull distributions.

The difference between the Exponential, Gamma and Weibull distributions involve the

location of modes of the pdf ’s and the shapes of their tails will be in proportion to

24

Probability Models and their Parametric Estimation

large and short times. The Exponential distribution has its mode at the origin but the

Gamma and Weibull distributions have their modes at some point( ≥ 0 ) which is a

function of the parameters values selected. The tail of the Gamma distribution is long,

like an Exponential distribution while the tail of the Weibull distribution may decline

more rapidly or less rapidly than that of an Exponential distribution. In practice, if

there are higher value of the variable than an Exponential distribution, it can account

for a Weibull distribution which provides a better distribution of the data.

Illustration 1.6.1

Sixteen equipments were produced and placed on test and the Table 1.3 gives the

length of time intervals between failures in hours.

Equipment

Number 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Time

between

failures 19 12 16 1 15 5 10 1 46 7 33 25 4 9 1 10

For the sake of simplicity in processing the data , one can set up the ordered set as

given blow:

Equipment

Number 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Time

between

failures 1 1 1 4 5 7 9 10 10 12 15 16 19 25 33 46

On this basis, one may construct a Histogram to judge the pattern of the data in Table

1.4. An approximate value of the interval can be determined from the formula.

maximum value - minimum value

∆t =

1 + 3.3 log10 N

where the maximum and minimum are the values in the ordered set and N is the total

number of items of the order statistics. In this case maximum value is 46 , minimum

value is 1 and N is 16. Thus ∆t = 1+3.345 log10 16 = 9.05 ≈ 10 = width of the class

interval.

Time

interval 0 - 10 10 - 20 20 - 30 30 - 40 40 - 50

Number of

Equipment 9 4 1 1 1

25

A. Santhakumaran

Histogram is drawn based on the frequency distribution in Table 1.5 and is given in

Figure 1.2.

9

Number

of

Equipment

4

1 1 1

0 10 20 30 40 50

Time interval

Figure 1.2 Histogram of time to failures

The Histogram reveals that the distribution could be Negative Exponential or the

right portion of the Normal distribution. Assume the time to failure follows Exponen-

tial distribution of the form,

−θx

θe θ > 0, x > 0

pθ (x) =

0 otherwise

How for the assumption is valid has to be verified? The validity of the assumption

is tested by the χ2 test of goodness of fit.

Expected Observed

frequency frequency

Interval pi E O

0 - 10 .5262 8.41 ≈ 8 9

10 - 20 .2493 3.98 ≈ 4 4

20 - 30 .1181 1.886 ≈ 2 1

30 - 40 .0559 .8944 ≈ 1 1

40 - 50 .0265 .454 ≈ 1 1

26

Probability Models and their Parametric Estimation

Rx

where pi = xii+1 θe−θx dx = e−θxi − e−θxi+1 , i = 0, 10, 20, · · · , 50. If the cell

frequencies are less than 5, then it can be made 5 or more than 5. One may get two

classes only, i.e, the expected frequencies are equal to 8 each and the corresponding

observed frequencies are 9 and 7 respectively. The χ2 test of goodness of fit fails

to test the validity of the assumption that the sample data come from an Exponential

1

distribution with parameter θ = 13.38 = .0747 = failure rate per unit hour where the

mean life time of the equipments = 214 16 = 13.38 hours. To test the validity of the

assumption that the time to failure follows an Exponential distribution, consider the

likelihood function of the cell frequencies of o1 = 9 and o2 = 7 is

e1 o1 e2 o2

n!

o !o ! n n o1 + o2 = n

L= 1 2

0 otherwise

Under H0 the likelihood function follows a Binomial probability law b(16, p) where

p = en1 . To test the hypothesis that H0 : the fit is the best one vs H1 : the fit is not the

best one. It is equivalent to test the hypothesis that H0 : p ≤ .5 vs H1 : p > .5 The

UMP level α = .05 test is given by

1 if x > 11

φ(x) = .17 if x = 11

0 otherwise

The observed value is 9 which is less than 11. There is no evidence to reject the

hypothesis H0 . The data come from an Exponential distribution with 5% level of

significance. Thus time to failure of the equipments follows an Exponential distribu-

tion. One may conclude that on an average the equipment would be operated for 13.38

hours without failure.

The construction of Histograms and the recognition of a distributional shape are

necessary ingredients for selecting a family of distributions to represent a sample data.

A Histogram is not useful for evaluating the fit of the chosen distribution. When there

are a small number of data points ( ≤ 30 ), a Histogram can be rather ragged. Further

perception of the fit depends on the width of the Histogram intervals. Even if the

intervals are well chosen, grouping the data into cells makes it difficult to compare a

Histogram to a continuous pdf . A quantile - quantile (q - q) plot is a useful tool for

evaluating distribution fit that does not suffer from these problems.

If X is a random variable with cumulative distribution F (x) , then q quantile of

X is that value y such that F (y) = P {X ≤ y} = q for 0 < q < 1 . When F (x)

has an inverse y = F −1 (q) . Let x1 , x2 , · · · , xn be a sample observations of X .

Order the observations from the smallest to the largest and denote these as yj , j = 1

to n where y1 ≤ y2 ≤ · · · ≤ yn . One can denote j the rank or order number.

Therefore j = 1 for the smallest and j = n for the largest. The q - q plot is based on

j− 21

that yj is an estimate of the ( n ) quantile of X , i.e, yj is approximately

the fact

j− 12

F −1 n .

27

A. Santhakumaran

tation of the random variable X . If F (x) is a member of an appropriate family of

j− 1

distributions, then a plot of yj versus F −1 n 2 will be approximately a straight

line.

If F (x) is from an appropriate family of distributions and also has appropriate

parameter values, then the line will have slope 1. On the other hand, if the assumed

distribution is inappropriate, the points will deviate from a straight line in a systematic

manner. The decision whether to accept or reject some hypothesized distribution is

subjective.

In the construction of q - q plot, the following should be borne in mind.

(i) The observed values will never fall exactly on a straight line. (ii) The ordered values

are not independent, since they have been ranked. (iii) The variances of the extremes

are much higher than the variances in the middle of the plot. Greater discrepancies can

be accepted at the extremes. The linearity of the points in the middle of the plot is more

important than the linearity at the extremes.

Illustration 1.7.1

A sample of 20 repairing times of electronic watch was considered. The repairing

time X is a random variable. The values are in seconds on the random variable X .

The values are arranged in the increasing order of magnitude as in Table 1.7.

j Value j Value j Value j Value

1 88.54 6 88.82 11 88.98 16 89.26

2 88.56 7 88.85 12 89.02 17 89.30

3 88.60 8 88.90 13 89.08 18 89.35

4 88.64 9 88.95 14 89.18 19 89.41

5 88.75 10 88.97 15 89.25 20 89.45

28

Probability Models and their Parametric Estimation

yj = yj =

xj = yj × .08 xj = yj × .08

j− 1 j− 1 j− 1 j− 1

−1

j 20

2

F ( 20

2

) + 88.993 j 20

2

F −1 ( 202 ) + 88.993

1 .025 -1.96 88.84 11 .525 .06 89.00

2 .075 - 1.41 88.88 12 .575 .18 89.01

3 .125 - 1.13 88.90 13 .625 .31 89.02

4 .175 - 0.93 88.92 14 .675 .45 89.03

5 .225 - 0.75 88.94 15 .725 .60 89.04

6 .275 -.60 88.95 16 .775 .75 89.05

7 .325 -.45 88.96 17 .825 .93 89.07

8 .375 -.31 88.97 18 .875 1.13 89.08

9 .425 - .18 88.98 19 .925 1.41 89.11

10 .475 -.06 88.99 20 .975 1.96 89.15

1

j−

The ordered observations in Table 1.8 are then plotted versus F −1 n 2 for

j = 1, 2, · · · , 20 where F (.) is the cumulative distribution function of the Normal

random variable X with mean 88.993 seconds, and standard deviation .08 seconds to

obtain the q - q plot. The plotted values are shown in Figure 1.3. The general per-

ception of a straight line is quite clear in the q - q plot, supporting the hypothesis of a

normal distribution.

?

? Normal

? quantile

yj

?

?

?

?

?

?

Time xj

Figure 1.3 q − q plot of the repairing times

Note: The diagnosis of statistical distributions of real life problems are not exact

but at best they represent reasonable approximations.

Problems

1.1 The mean and variance of the number of defective items drawn randomly one

by one with replacement from a lot are found to be 10 and 6 respectively. The

distribution of the number of defective items is:

(a) Poisson with mean 10

29

A. Santhakumaran

(c) Normal with mean 10 and variance 6

(d) None of the above

1.2 If X is a Poisson random variate with mean 3, then P {|X − 3| < 1} will be:

(a) 21 e−3 (b) 3e−3 (c) 4.5e−3 (d) 27e−3

1.3 Let U(1) , U(2) , · · · , U(n) be the order statistics of a random sample

U1 , U2 , · · · , Un of size n from the uniform (0, 1) distribution. Then the con-

ditional distribution of U1 given U(n) = u(n) is given by:

(a) Uniform on (0, u(n) )

(b) P {U1 = u(n) } = n1 and probability n−1 n is uniformly distributed over

(0, u(n) ) .

(c) Beta n1 , n−1

n

(d) Uniform (0, 1)

1.4 A biased coin is tested 4 times or until a head turns up, whichever occurs earlier.

The distribution of the number of tails turning up is:

(a) Binomial (b) Geometric (c) Negative Binomial (d) Hypergeometric

1.5 If X and Y are independent Exponential random variables with the same mean

λ , then the distribution of min(X, Y ) is :

(a) Exponential with mean λ2

(b) Exponential with mean λ

(c) not Exponential with mean λ

(d) Exponential with mean 2λ

1.6 The χ2 goodness of fit is based on the assumption that a character under study is

(a) Normal (b) Non - Normal (c) any distribution (d) not required

1.7 The exact distribution of χ2 goodness of fit for each experiment unit is classified

into one of more k categories of a random sample of size n depends on :

(a) Hypergeometric distribution

(b) Normal distribution

(c) Multinomial distribution

(d) Binomial distribution

1.8 If X1 ∼ b(n1 , θ1 ) , X2 ∼ b(n2 , θ2 ) and X1 , X2 are independent, then the

sum of the variates X1 + X2 is distributed as :

(a) Hypergeometric distribution

(b) Binomial distribution

(c) Poisson distribution

(d) None of the above

1.9 If X1 ∼ b(n1 , θ) , X2 ∼ b(n2 , θ) and X1 , X2 are independent, then the sum

of the variates X1 + X2 is distributed as :

(a) Hypergeometric distribution

(b) Binomial distribution

30

Probability Models and their Parametric Estimation

(d) None of the above

1.10 If X1 ∼ P (θ1 ), X2 ∼ P (θ2 ) and X1 , X2 are independent,then the sum of

the variates X1 + X2 is distributed as :

(a) Hypergeometric distribution

(b) Binomial distribution

(c) Poisson distribution

(d) None of the above

1.11 The skewness of a Binomial distribution will be zero if:

(a) p < .5 (b) p > .5 (c) p 6= .5 (d) p = .5

1.12 If the sample size n = 2 , the students’ t - distribution reduces to:

(a) Normal distribution

(b) F - distribution

(c) χ2 - distribution

(d) Cauchy distribution

1.13 The reciprocal property of Fn−1,n2 −1 distribution can be expressed as:

(a) Fn2 ,n1 (1 − α) = Fn ,n1 (α)

1 2

(b) P {Fn1 ,n2 (α) ≥ c} = P Fn2 ,n1 (α) ≤ 1c

1 2 2

(d) All the above

1.14 The distribution of which the moment generating function is not useful in finding

the moments is :

(a) Binomial distribution

(b) Negative Binomial distribution

(c) Hypergeometric distribution

(d) Geometric distribution

1.15 Probability of selecting a unit from a population of N units in a simple random

sampling technique is a :

(a) Bernoulli distribution

(b) Binomial distribution

(c) Geometric distribution

(d) discrete Uniform distribution

1.16 A production process is a sequence of Bernoulli trials, the number of x defective

units in a sample of n units is a:

(a) Bernoulli distribution

(b) Binomial distribution

(c) Multinomial distribution

(d) Hypergeometric distribution

1.17 A random variable X is related to a sequence of Bernoulli trials in which the

number of trials (x + 1) to achieve the first success, then the distribution of X

31

A. Santhakumaran

is :

(a) Bernoulli distribution

(b) Binomial distribution

(c) Multinomial distribution

(d) Geometric distribution

Pn

1.18 If X1 , X2 , · · · , Xn are iid Geometric variables, then i=1 Xi follows:

(a) Negative Binomial distribution

(b) Binomial distribution

(c) Multinomial distribution

(d) Geometric distribution

1.19 A random variable X is related to a sequence of Bernoulli trials in which x

failures preceding the nth success in (x + n) trials is a :

(a) Binomial distribution

(b) Multinomial distribution

(c) Negative Binomial distribution

(d) Geometric distribution

1.20 If a random experiment has only two mutually exclusive outcomes of a Bernoulli

trial, then the random variable leads to:

(a) Binomial distribution

(b) Multinomial distribution

(c) Negative Binomial distribution

(d) Geometric distribution

1.21 A box contains N balls M of which are white and N − M are red. If X

denotes the number of white balls in the sample contains n balls with replace-

ment, then X is a :

(a) Binomial variate

(b) Bernoulli variate

(c) Negative Binomial variate

(d) Hypergeometric variate

1.22 The number of independent events that occur in a fixed amount of time may

follow:

(a) Exponential distribution

(b) Poisson distribution

(c) Geometric distribution

(d) Gamma distribution

1.23 A power series distribution

ax θ x

f (θ) x ∈ S, ax ≥ 0

Pθ {X = x} =

0 otherwise

p

where f (θ) = (1 + θ)n , θ = (1−p) and S = {0, 1, 2, · · · } . Then the random

variable X has

32

Probability Models and their Parametric Estimation

(b) Bernoulli distribution

(c) Binomial distribution

(d) Negative Binomial distribution

2

1.24 The given probability function p(x) = 3x+1 for x = 0, 1, 2, 3, · · · , represents:

(a) Negative Binomial distribution

(b) Binomial distribution

(c) Bernoulli distribution

(d) Geometric distribution

1.25 Dinesh Kumar receives 2, 2, 4 and 4 telephone calls on 4 randomly selected days.

Assuming that the telephone calls follow Poisson distribution, the estimate of the

number of telephone calls in 8 days is:

(a) 12 (b) 3 (c) 24 (d) none of the above

1.26 The exact distribution of χ2 goodness of fit test for each experiment units is

classified into one of two categories of a random sample of size n depends on :

(a) Hypergeometric distribution

(b) Normal distribution

(c) Multinomial distribution

(d) Binomial distribution

1.27 The pmf of a random variable X is

!

k+x

P∞ θ x+k

k

k=0 (−1) k Γ(x+k+1) x = 0, 1, · · ·

pθ (x) =

=0 otherwise

It is known as

(a) Binomial ( b) Negative Binomial (c) Poisson (d) Geometric

33

A. Santhakumaran

2.1 Introduction

In real life applications, determining appropriate distributions from the random

sample is a major task. Faulty assumption of distributions will lead to misleading rec-

ommendations. As a family of distributions induced by a parameter has been selected,

the next step is to estimate the parameters of the distribution. The criteria of the point

estimators for many standard distributions are described in this chapter.

The set of all admissible values of parameters of a distribution is called the parame-

ter space Ω . Any member from the parameter space is called parameter. For example,

a random variable X is assumed to follow a normal distribution with mean θ and

variance σ 2 . The parameter space Ω = {(θ, σ) | −∞ < θ < ∞, 0 < σ 2 < ∞} .

Suppose a random sample X1 , X2 , X3 , · · · , Xn is taken on X . Here a statistic

T = t(X) from the sample X1 , X2 , · · · , Xn which gives the best value for the pa-

rameter θ . The particular value of the Statistic T = t(x) = x̄ based on the values

x1 , x2 , · · · , xn is called an estimate of θ . If the statistic T = X̄ is used to estimate

the unknown parameter θ, then the sample mean is called an estimator of θ . Thus an

estimator is a rule or a procedure to estimate the value of θ . The numerical value x̄ is

called an estimate of θ .

Let X1 , X2 , · · · , Xn be n independent identically distributed ( iid ) random

sample drawn from a population with probability density function ( pdf ) pθ (x) ,

θ ∈ Ω. The statistic T = t(X) is said to be a point estimator of θ , if the func-

tion T = t(X) has a single point θ̂(X) which maps to θ in the parameter space

Ω.

The problems involved in point estimation are

• to select or choose a statistic T = t(X) .

• to find the distribution function of the statistic T = t(X) .

• to verify the selected statistic satisfies the criteria of the point estimation .

The criteria of the point estimation are

(i) Consistency

(ii) Unbiasedness

(iii) Sufficiency and

(iv) Efficiency

34

Probability Models and their Parametric Estimation

2.5 Consistency

Consistency is a convergence property of an estimator. It is an asymptotic or large

sample size property. Let X1 , X2 , · · · , Xn be iid random sample drawn from a pop-

ulation with common distribution Pθ , θ ∈ Ω. An estimator T = t(X) is consistent

for θ if for every > 0 and for each fixed θ ∈ Ω, Pθ {|T −θ| > } → θ as n → ∞ ,

P

i.e. T → θ as n → ∞ for fixed θ ∈ Ω .

Example 2.1 Let X1 , X2 , · · · , Xn be a random sample drawn from a normal

population with mean θ and known variance σ 2 . The statistic T = X̄ is chosen for

2

an estimator of the parameter θ . The statistic X̄ ∼ N( θ, σn ). To test the consistency

of the estimator, consider for every > 0 and fixed θ ∈ Ω,

Pθ {|X̄ − θ| > } = 1 − Pθ {|X̄ − θ| < }

= 1 − Pθ {− < X̄ − θ < }

√ X̄ − θ √

= 1 − Pθ {− n/σ < √ < n/σ}

σ/ n

√ √

= 1 − Pθ {− n/σ < Z < n/σ}

X̄ − θ

where Z = √

σ/ n

= 1 − Pθ {−∞ < Z < ∞} as n → ∞

= 1 − 1 = 0 as n → ∞

P

Thus X̄ → θ as n → ∞ . The sample mean X̄ of the normal population is a

consistent estimator of the population mean θ .

Remark 2.1 In general sample mean need not be a consistent estimator of the

population mean.

Example 2.2 Let X1 , X2 , X3 , · · · , Xn be iid random sample from a Cauchy

population with pdf

1 1

π 1+(x−θ)2 −∞ < x < ∞

pθ (x) =

0 otherwise

For every > 0 and fixed θ ∈ Ω,

Pθ {|X̄ − θ| > } = 1 − Pθ {− < X̄ − θ < }

= 1 − Pθ {θ − < X̄ < θ + }

Z θ+

1 1

= 1− 2

dx̄

θ− π 1 + (x̄ − θ)

since X̄ ∼ Cauchy distribution with parameter θ

Z

1 1

= 1− 2

dz where x̄ − θ = z

− π 1 + z

1

= 1 − [tan−1 (z)]−

π

2

= 1 − tan−1 () since tan−1 (−θ) = − tan−1 (θ)

π

35

A. Santhakumaran

P

i.e., X̄ 6→ θ as n → ∞ . For Cauchy population the sample mean X̄ is not a

consistent estimator of the parameter θ .

Theorem 2.1 If {Tn }∞n=1 is a sequence of estimator such that Eθ [Tn ] → θ and

Vθ [Tn ] → 0 as n → ∞ , then the statistic Tn is a consistent estimator of the param-

eter θ .

2

Consider Eθ [Tn − θ]2 = Eθ (Tn − Eθ [Tn ] + Eθ [Tn ] − θ)

2 2

= Eθ (Tn − Eθ [Tn ]) + {Eθ [Tn − θ]}

2

= Vθ [Tn ] + {Eθ [Tn − θ]}

since Eθ (Tn − Eθ [Tn ]) = 0

By Chebychev’s inequality

1

Pθ {|Tn − θ| > } ≤ Eθ [Tn − θ]2

2

1 h 2

i

≤ Vθ [Tn ] + {Eθ [Tn − θ]}

2

→0 as n → ∞

∵ Tn is a consistent estimator of θ .

Remark 2.2 The conditions are only sufficient, but not necessary. Since if

{Xn }∞ n=1 is a sequence of iid random variables from a population with finite mean

θ = Eθ [X] , then X̄ converges to θ in probability for each fixed θ ∈ Ω. It is known

as Khintchin’s Weak Law of Large Numbers, i.e., sample mean X̄ finitely exists, is a

consistent estimator for the population mean θ which does not require the condition

Vθ [X̄] → 0 as n → ∞ for every fixed θ ∈ Ω . Thus consistency follows the ex-

istence of the expectation of the statistic and the assumption of finite variance of the

statistic is not needed.

For illustration the Cauchy pdf is

1 1

π 1+x2 −∞ < x < ∞

p(x) =

0 otherwise

Z ∞

1 x

E[X] = dx

−∞ π 1 + x2

36

Probability Models and their Parametric Estimation

Z t Z t

1 x 1 2x

lim dx = lim dx

π t→∞ −t 1 + x2 2π t→∞ −t 1 + x2

1 t

lim log(1 + x2 ) −t

=

2π t→∞

1

= lim [log(1 + t2 ) − log(1 + t2 )]

2π t→∞

= 0

The Cauchy Principle value 0 is taken as the mean of the Cauchy distribution. Thus the

Cauchy distribution has not the mean finitely exist. Hence for the Cauchy population,

the sample mean X̄ is not a consistent estimator of the parameter θ .

Example 2.3 If X1 , X2 , · · P · , Xn is a random sample drawn from a normal popu-

1 n

lation N( 0, σ 2 ). ShowPthat 3n 4 4

k=1 Xk is a consistent estimator of σ .

1 n 4

Let T = 3n k=1 Xk .

n

1 X

Eσ4 [T ] = Eσ4 [Xk4 ]

3n

k=1

n

1 X

= Eσ4 [Xk − 0]4 since E[Xk ] = 0 ∀ k = 1, 2, · · ·

3n

k=1

1 1

= nµ4 = 3nσ 4 since µ4 = 3σ 4 where

3n 3n

µ2n = 1 × 3 × 5 × · · · × (2n − 1)σ 2n n = 1, 2, · · ·

= σ4

n

1 X

Vσ4 [T ] = Vσ4 [X 4 ]

(3n)2

k=1

n

1 Xn 4 2 4 2

o

= Eσ 4 [Xk ] − E σ 4 [Xk ]

(3n)2

k=1

1

= n[µ8 − µ24 ]

(3n)2

1

= [105σ 8 − (3σ 4 )2 ] since µ8 = 1 × 3 × 5 × 7 × σ 8

32 n

1

= 96σ 8 → 0 as n → ∞.

32 n

Thus T is a consistent estimator of σ 4 .

Example 2.4 Let X1 , X2 , · · · Xn be a random sample drawn from a population

Qn 1

with rectangular distribution ∪(0, θ), θ > 0 . Show that ( i=1 Xi ) n is a consistent

estimator of θe−1 .

37

A. Santhakumaran

Qn 1

Let GM = ( i=1 Xi ) n ∀ Xi > 0, i = 1, 2, · · · , n .

n

1X

loge GM = log Xi

n i=1

Z θ

1

Eθ [log X] = log xdx

θ 0

( Z θ )

1 θ

= [x log x]0 − dx

θ 0

1h i

= θ log θ − lim x log x − θ

θ x→0

= log θ − 1

1

log x

Since lim x log x = lim 1 = lim x1 = 0

x→0 x→0 x→0 − 2

x x

Z θ

1

Eθ [log X]2 = (log x)2 dx

θ 0

Z θ

1 2 θ 1 log x

= [x(log x) ]0 − 2x dx

θ θ 0 x

1 2

= (log θ)2 − lim x(log x)2 − [θ log θ − θ]

θ x→0 θ

= (log θ)2 − 2 log θ + 2 since lim x(log x)2 = 0

x→0

Vθ [log X] = (log θ)2 − 2 log θ + 2 − (log θ − 1)2 = 1

n

1 X 1

Vθ [log GM ] = Vθ [log Xi ] =

n2 i=1 n

Vθ [log GM ] → 0 as n → ∞, ∀ θ > 0

of θe−1 .

Example 2.5 Let X1 , X2 , · · · , Xn be iid random sample drawn from a pop-

ulation with Eθ [Xi ] = θ and Vθ [Xi ] = σ 2 , ∀ i = 1, 2, · · · , n. Prove that

38

Probability Models and their Parametric Estimation

2

Pn

n(n+1) i=1 iXi is a consistent estimator of θ .

" n #

X

Eθ iXi = Eθ [X1 + 2X2 + · · · + nXn ]

i=1

= θ + 2θ + · · · + nθ

= θ[1 + 2 + · · · + n]

n(n + 1)

= θ

" n # 2

2 X

Eθ iXi = θ, ∀ θ ∈ Ω

n(n + 1) i=1

" n # n

X X

Vθ iXi = i2 Vθ [Xi ]

i=1 i=1

n

X

= σ2 i2

i=1

2 n(n

+ 1)(2n + 1)

= σ

" # 6

n

2 X 2 (2n + 1) 2

Vθ iXi = σ → 0 as n → ∞

n(n + 1) i=1 3 n(n + 1)

2

Pn

Thus n(n+1) i=1 iXi is a consistent estimator of θ.

Example 2.6 Let T = max1≤i≤n {Xi } be the nth order statistic of a random

sample of size n drawn from a population with a uniform distribution on the interval

( 0, θ ). The pdf of T is

n−1

nt

θn 0 < t < θ, θ > 0

pθ (t) =

0 otherwise

Z θ

n n

Eθ [T ] = tn dt = θ

θn 0 n+1

2

nθ nθ2

Eθ [T 2 ] = , Vθ [T ] =

(n + 2) (n + 2)(n + 1)2

h Eθ [T ]i→ θ and Vθ [T ] → 0 as n → ∞.

Thus T is a consistent estimator of θ . Also

θ2

Eθ (n+1)

n T = θ and V θ [ (n+1)

n T ] = n(n+2) → 0 as n → ∞, i.e.,

(n+1)

n T is

n T are the two consistent

estimators of the same parameter θ . Thus consistent estimator is not unique.

39

A. Santhakumaran

If T = t(X) is a consistent estimator of θ , then an T, T + cn , and an T + cn

are also consistent estimators of θ, where an = 1 + nk , k ∈ < and an → 1 and

cn → 0 as n → ∞ for every fixed θ ∈ Ω . In general, we have the Theorem 2.2.

Theorem 2.2 If Tn = tn (X) is a consistent estimator of τ (θ) and ψ(τ (θ)) is a

continuous function of τ (θ) , then ψ(Tn ) is a consistent estimator of ψ(τ (θ)) .

P

Proof Given Tn = tn (X) is a consistent estimator τ (θ) , i.e., Tn → τ (θ) as

n → ∞.

Therefore for given > 0, η > 0 , there exist a positive integer n ≥ N (, η) such

that

Also ψ(.) is a continuous function , i. e., For every such that

{|ψ(Tn ) − ψ(τ (θ))|} < 1 whenever |Tn − τ (θ)| <

i.e., |Tn − τ (θ)| < ⇒ |ψ(Tn ) − ψ(τ (θ))| < 1

For any two events A and B if A ⇒ B , then A ⊆ B .

Therefore P (A) ≤ P (B), i.e., P (B) ≥ P (A) . Let A = {|Tn − τ (θ)| < } and

B = {|ψ(Tn ) − ψ(τ (θ))| < 1 } then

P {ψ(Tn ) − ψ(τ (θ))| < 1 } ≥ P {|Tn − τ (θ)| < }

P

i.e., P {|ψ(Tn ) − ψ(τ (θ))| < 1 } ≥ 1 − η ∀ n ≥ N ⇒ ψ(Tn ) → ψ(τ (θ)) as

n → ∞.

i.e., ψ(Tn ) is a consistent estimator of ψ(τ (θ))

Example 2.7 Suppose T = t(X) is a statistic with pdf p(x)θ for θ > 0, θ ∈ Ω .

Prove that T 2 = t2 (X) is a consistent estimator of θ2 , if T = t(X) is a consistent

estimator of θ .

Given T = t(X) is a consistent estimator of θ .

By the definition of consistent estimator, Pθ {|T − θ| < } → 1 as n → ∞, for θ >

0, ∀ θ ∈ Ω , consider

Pθ {|T − θ| < } = Pθ {θ − < T < θ + }

= Pθ {(θ − )2 < T 2 < (θ + )2 }

= Pθ {−2θ < T 2 − θ2 − 2 < 2θ}

= Pθ {−0 < T 2 − θ2 − 2 < 0 }

where 0 = 2θ

= Pθ {−0 < T 0 − θ2 < 0 }

where T 0 = T 2 − 2

= Pθ {|T 0 − θ2 | < 0 } → 1 as n → ∞

T 0 = T 2 − 2 → T 2 as n → ∞ since → 0 as n → ∞

.. . Pθ {|T 2 − θ2 | < 0 } → 1 as → ∞. Thus T 2 is a consistent estimator of θ2 .

40

Probability Models and their Parametric Estimation

For any statistic g(T ) , if the mathematical expectation is equal to a parameter τ (θ) ,

then g(T ) is called an unbiased estimator of the parameter τ (θ) ,

Otherwise, the statistic g(T ) is said to be a biased estimator of τ (θ) . The unbiased

estimator is also called zero bias estimator. A statistic g(T ) is said to be asymptotically

unbiased estimator if Eθ [g(T )] → τ (θ) as n → ∞, ∀ θ ∈ Ω .

Example 2.8 A random variable X has the pdf

2θx if 0 < x < 1

pθ (x) = (1 − θ) if 1 ≤ x < 2, 0 < θ < 1

0 otherwise

R 1 R2

only if 0 xg(x)dx = 21 and 1 g(x)dx = 0.

Assume g(X) is an unbiased estimator of θ , i.e.,

Eθ [g(X)] = θ

Z 1 Z 2

g(x)2θxdx + g(x)(1 − θ)dx = θ

0 1

Z 1 Z 2 Z 2

θ 2xg(x)dx − g(x)dx + g(x)dx = θ

0 1 1

Z 1 Z 2

⇒ 2xg(x)dx − g(x)dx = 1 and

0 1

Z 2

g(x)dx = 0

1

Z 1

1

i.e., xg(x)dx = and

0 2

Z 2

g(x)dx = 0

1

R 1 1

R 2

Conversely, 0

xg(x)dx = 2 and 1

g(x)dx = 0, then g(X) is an unbiased esti-

mator of θ .

Z 1 Z 2

Eθ [g(X)] = 2θxg(x)dx + (1 − θ)g(x)dx

0 1

Z 1 Z 2

= 2θ xg(x)dx + (1 − θ) g(x)dx

0 1

1

= 2θ + (1 − θ) × 0

2

= θ

41

A. Santhakumaran

Example 2.9 If T denotes the number of successes in n independent and identical

trials of an experiment with probability of success θ . Obtain an unbiased estimator of

θ2 and θ(1 − θ), 0 < θ < 1. Pn

Let Xi ∼ b(1, θ), ∀ i = 1, 2, · · · , n , then T = i=1 Xi ∼ b(n, θ) . If g(T ) is

the unbiased estimator of τ (θ) = θ(1 − θ) , then Eθ [g(T )] = θ(1 − θ)

n

X

g(t)cn t

t θ (1 − θ)

n−t

= θ(1 − θ)

t=0

n t

X θ

g(t)cn

t = θ(1 − θ)1−n

t=0

1−θ

θ

Consider ρ =

1−θ

ρ

⇒θ =

1+ρ

n 1−n

X ρ 1

.. . g(t)cn

tρ

t

=

t=0

1+ρ 1+ρ

= ρ(1 + ρ)n−2

= ρ[1 + cn−2

1 ρ + cn−3

2 ρ2 + · · · + ρn ]

Equating the coefficient of ρt on both sides

g(t)cn

t = cn−2

t−1

(n − 2)! t!(n − t)!

g(t) =

(t − 1)!(n − t − 1)! n!

(n − 2)!t(t − 1)!(n − t)(n − t − 1)!

=

(t − 1)!n(n − 1)(n − 2)!(n − t − 1)!

t(n − t)

= , if n = 2, 3, · · ·

n(n − 1)

T (n − T )

n = 2, 3, · · ·

n(n − 1)

42

Probability Models and their Parametric Estimation

Eθ [g ∗ (T )] = θ2

n t

X θ

g ∗ (t)cnt (1 − θ)n = θ2

t=0

1−θ

n

X

g ∗ (t)cnt ρt = ρ2 (1 + ρ)n−2

t=0

= ρ2 [1 + cn−2

1 ρ + · · · + cn−2

t ρt + · · · + ρn−2 ]

∵ g(t)∗ cnt = cn−2

t−2

∗ (n − 2)!t!(n − t)!

⇒ g (t) =

(t − 2)!(n − t)!n!

(n − 2)!t(t − 1)!(t − 2)!

=

(t − 2)!n(n − 1)(n − 2)!

t(t − 1)

= n = 2, 3, · · · · · ·

n(n − 1)

T [T − 1]

g ∗ (T ) = n = 2, 3, · · ·

n(n − 1)

a Geometric population with pmf

pθ (x) =

0 otherwise

1

Eθ [g(X)] =

θ

∞

X 1

g(x)θ(1 − θ)x−1 =

x=1

θ

∞

X (1 − θ)

g(x)(1 − θ)x =

x=1

θ2

Take 1 − θ = ρ ⇒ θ = 1−ρ

X∞

g(x)ρx = ρ(1 − ρ)−2

x=1

= ρ(1 + 2ρ + 3ρ2 + · · · + xρx−1 + · · · )

⇒ g(x) = x ∀ x = 1, 2, 3, · · ·

1

Thus g(X) = X is the unbiased estimator of θ .

43

A. Santhakumaran

from a Bernoulli population, then there is no unbiased estimator exist for θ2 .

pθ (x) =

0 otherwise

Eθ [g(X)] = θ2

1

X

g(x)θx (1 − θ)1−x = θ2

x=0

g(0)(1 − θ) + g(1)θ = θ2

Thus the value of θ2 is 0 for x = 0 or x = 1 . But the value of θ2 lies between 0 to

1. ∴ The unbiased estimator of θ2 does not exist.

Example 2.12 If X ∼ b(n, θ) , then show that there exist no unbiased estimator

of the parameter θ1

1

Consider Eθ [g(X)] =

θ

n x

X n! θ 1

g(x) (1 − θ)n =

i=0

x!(n − x)! 1 − θ θ

n

X n! (1 + ρ)n+1

g(x) ρx =

i=0

x!(n − x)! ρ

θ

where ρ = 1−θ

n+1

n!

ρx → g(0) as θ → 0 and (1+ρ)

P

g(x) x!(n−x)! ρ → ∞ as ρ → 0 or θ → 0

Thus there is no unbiased estimator exist of the parameter θ1 .

Example 2.13 A random sample X is drawn from a Bernoulli population b(1, θ), θ =

{ 14 , 12 } . Then there exists an unique unbiased estimator of θ2 .

Let Eθ [g(X)] = θ2

1

X

g(x)θx (1 − θ)1−x = θ2

x=0

1 1

When θ = ⇒ 3g(0) + g(1) = (2.1)

4 4

44

Probability Models and their Parametric Estimation

1 1

When θ = ⇒ g(0) + g(1) = (2.2)

2 2

Solving the equations (2.1) and (2.2) for g(0) and g(1) , one gets the values of g(0) =

− 81 and g(1) = 58 , 1

−8 for x = 0

i.e., g(x) = 5

8 for x=1

Thus the unbiased estimator of θ2 is g(X) = X which is unique.

Unbiased estimator is not unique

Pnfrom a popu-

lation with Poisson distribution P (θ) . g1 (X) = X̄ and g2 (X) = n1 i=1 (Xi − X̄)2

are the two unbiased estimators of θ. Consider a statistic g(X) = αg1 (X) + (1 −

α)g2 (X), α ∈ <, 0 < θ < 1 . Then Eθ [g(X)] = θ ∀ θ ∈ Ω and α ∈ < which is not

unique. Thus unbiased estimator is not unique.

Example 2.15 Show that the mean X̄ of a random sample of size n drawn from

a population with probability density function

1 −x

θe

θ 0 < x < ∞, θ > 0

pθ (x) =

0 otherwise

2

θ

Pn of θ and has variance n .

is an unbiased estimator

Let T = i=1 Xi ∼ G(n, θ). The pdf of T is

1 − t n−1

θ n Γn e

θt 0 < t < ∞, θ > 0

pθ (t) =

0 otherwise

Z ∞

1 − 1 t n+1−1

Eθ [T ] = e θ t dt

0 θn Γn

" n

# = nθ

X

Eθ Xi = nθ ∀ θ > 0

i=1

Eθ [nX̄] nθ ∀ θ > 0

=

⇒ Eθ [X̄] θ∀θ>0

=

Eθ [T 2 ] n(n + 1)θ2 ∀ θ > 0

=

Vθ [T ] nθ2 ∀ θ > 0

=

Pn

. i=1 Xi

. . Vθ [X̄] = Vθ

n

1

= Vθ [T ]

n2

1 2 θ2

= nθ =

n2 n

45

A. Santhakumaran

n 2

i=1 Xi

ulation with mean zero and variance σ 2 , 0 < σ 2 < ∞. Show that n is an

2 2σ 4

unbiased estimator of σ and has variance n .

Pn 2

Define ns2 = i=1 Xi2 , then Y = ns 2

σ 2 ∼ χ distribution with n degrees

n 1

of freedom , i.e., Y ∼ G( 2 , 2 ).

( 1 n

n

1

e− 2 y y 2 −1 0 < y < ∞

p(y) = 2 2 Γn 2

0 otherwise

Z ∞

1 1 n

E[Y ] = 1 n

e− 2 y y 2 +1−1 dy

0 2 2 Γ 2

1 Γ( n2 + 1)

= n n

2 2 Γ n2 ( 1 ) 2 +1

2

= n

2

E[Y ] = n2 + 2n

V [Y ] = 2n

ns2

But Y = 2

σ2

ns

.. . Eσ2 = n

σ2

⇒ Eσ2 [s2 ] = σ2

Xi2

P

Thus n is an unbiased estimator of σ 2 .

2

ns

Vσ 2 = 2n

σ2

n2

Vσ2 [s2 ] = 2n

σ4

2σ 4

Vσ2 [s2 ] =

n

Example 2.17 Let Y1 < Y2 < Y3 be the order statistics of a random sample of

size 3 drawn from an uniform population with pdf

1

θ 0<x<θ

pθ (x) =

0 otherwise

Show that 4Y1 and 2Y2 are unbiased estimators of θ . Also find the variance of these

estimators.

The pdf of Y1 is

( hR i2

3! 1 θ 1

pθ (y1 ) = 1!2! θ y1 θ

dx 0 < y1 < θ

0 otherwise

46

Probability Models and their Parametric Estimation

3 y1 2

θ [1 − θ ] 0 < y1 < θ

pθ (y1 ) =

0 otherwise

Z θ

3 y1 2

Eθ [Y1 ] = y1 (1 − ) dy1

θ 0 θ

Z 1

3 y1

= θt(1 − t)2 θdt where θ =t

θ 0

Z 1

= 3θ t2−1 (1 − t)3−1 dt

0

Γ2Γ3 θ

= 3θ = ∀θ>0

Γ5 4

θ2 3θ2

Similarly Eθ [Y12 ] = and Vθ [Y1 ] =

10 15

3θ2

.. . Vθ [4Y1 ] =

5

The pdf of Y2 is

!

Z y2 Z θ

3! 1 1 1

pθ (y2 ) = dx dx

1!1!1! 0 θ θ y2 θ

6 y2

θ 2 y2 [1 − θ ] 0 < y2 < θ

pθ (y2 ) =

0 otherwise

.˙. Eθ [Y2 ] = θ2

2

θ2

⇒ 2Y2 is an unbiased estimator of θ and Eθ [Y 2 ] = 3θ 10 and Vθ [Y2 ] = 20

2

⇒ Vθ [2Y2 ] = θ5

Example 2.18 Let Y1 and Y2 be two independent and unbiased estimators of θ .

If the variance of Y1 is twice the variance of Y2 , find the constant k1 and k2 so that

k1 Y1 + k2 Y2 is an unbiased estimator of θ with smaller possible variance for such a

linear combination.

Given Eθ [Y1 ] = θ ∀ θ and Eθ [Y2 ] = θ ∀ θ and Vθ [Y1 ] = 2σ 2 and

47

A. Santhakumaran

k1 Eθ [Y1 ] + k2 Eθ [Y2 ] = θ

⇒ k1 + k2 = 1

i.e., k2 = 1 − k1

Consider φ = Vθ [k1 Y1 + k2 Y2 ]

= k12 Vθ [Y1 ] + k22 Vθ [Y2 ]

= k12 2σ 2 + (1 − k1 )2 σ 2

= 3k12 σ 2 − 2k1 σ 2 + σ 2

Differentiate twice this with respective to k1

dφ

= 6k1 σ 2 − 2σ 2

dk1

d2 φ

= 6σ 2

dk12

dφ d2 φ

For minimum =0 and >0

dk1 dk12

⇒ 6k1 σ 2 − 2σ 2 = 0

1 2

i.e., k1 = and k2 =

3 3

1

Thus 3 Y1 + 23 Y2 has minimum variance.

Consistent estimator need not be unbiased

Example 2.19 Let X1 , X2 , · · · , Xn be a sample of size P

n drawn from a normal

n

population with mean θ and variance σ 2 . Define s2 = n1 i=1 (Xi − X̄)2 , then

2

Y = ns 2 n−1 1

σ 2 ∼ χ distribution with (n − 1) degrees of freedom and Y ∼ G( 2 , 2 ) .

It has the pdf

( 1 n−1

n−1

1

n−1

e− 2 y y 2 −1 0 < y < ∞

p(y) = 2 2 Γ 2

0 otherwise

48

Probability Models and their Parametric Estimation

Z ∞

1 1 n−1

E[Y ] r

= n−1

n−1

e− 2 y y 2 +r−1 dy

0 2 2Γ 2

Γ n−1

1 2 +r

= n−1 n−1

2 2 Γ n−1

2 ( 12 ) 2

+r

r

2 n−1

= Γ +r

Γ n−1

2

2

When r = 1

2 n−1 n−1

E[Y ] = Γ =n−1

Γ n−1

2

2 2

ns2

.

. . Eσ2 = n−1

σ2

n−1 2

⇒ Eσ2 [s2 ] σ=

n

2(n − 1) 4

and Vσ2 [s2 ] = σ

n2

Thus Eσ2 [s2 ] → σ 2 and Vσ2 [s2 ] → 0 as n → ∞

Pn

.˙. n1 i=1 (Xi − X̄)2 is aP consistent estimator of σ 2 .

1 n

But Eσ2 [s ] 6= σ . .˙. n i=1 (Xi − X̄)2 is not an unbiased estimator of σ 2 .

2 2

Example 2.20 Illustrate with an example that an estimator is both consistent and

unbiased.

Let X1 , X2 , · · · , Xn be a random sample of size n P

drawn from a normal

n

population with mean θ and variance σ 2 . Define s2 = n1 i=1 (Xi − X̄)2 and

n 2

1 ns

S 2 = n−1 2 2

P

i=1 (Xi − X̄) , then Y = σ 2 ∼ χ distribution with (n − 1) degrees

2(n−1) 4

of freedom and Y ∼ G( n−1 1 2

2 , 2 ) . with Eσ [s ] =

2

n−1 2

n σ and Vσ2 [s2 ] = n2 σ .

n 2

(n − 1)S 2 = ns2 → S 2 = s

n−1

n

Eσ2 [S 2 ] = Eσ2 [s2 ]

n−1

n n−1 2

= σ = σ2

n−1 n

n2

Vσ2 [S 2 ] = Eσ2 [s2 ]

(n − 1)2

n2 2(n − 1) 4

= 2

σ

(n − 1) n2

2σ 4

= → 0 as → ∞

(n − 1)

1

Pn

Thus S 2 = n−1 2

i=1 (Xi − X̄) is consistent and also unbiased estimator of σ .

2

Example 2.21 Give an example that an unbiased estimator need not be consistent.

Let X1 , X2 , · · · , Xn be a random sample drawn from a normal population

with mean θ and known variance σ 2 , then the estimator X1 ( first observation) of the

49

A. Santhakumaran

and

= Pθ {θ − < X1 < θ + }

Z θ+

1 1 2

= √ e− 2σ2 (x1 −θ) dx1

2πσ θ−

6→ 1 as n → ∞

Example 2.22 Give an example that an estimator is not consistent and not unbi-

ased.

Let Y1 < Y2 < Y3 be the order statistics of a random sample of size 3 drawn

from a uniform population with pdf for given θ is

1

θ 0<x<θ

pθ (x) =

0 otherwise

θ

Eθ [Y1 ] = 6 θ ∀ θ ∈ Ω and

=

4

θ θ θ

Pθ Y1 − < = Pθ − < Y1 < +

4 4 4

Z θ4 + h

3 y1 2

i

= 1− dy1

θ θ4 − θ

6→ 1 as n → ∞

Thus Y1 the first order statistic is not consistent and not unbiased estimator of θ .

Sufficient statistic conveys as much as information about the distribution of a ran-

dom variable which is contained in the sample. It helps to identify a family of distribu-

tions only and not for the parameters of the distributions.

Definition 2.1 Let X1 , X2 , · · · , Xn be a random sample of size n drawn from a

population with pdf p(x | θ). Let T = t(X) be a statistic whose pdf is pθ (t) . For

a continuous random variable X , T = t(X) is said to be a sufficient statistic iff

pθ (x1 , x2 , · · · , xn )

pθ (t)

X , T = t(X) is said to be a sufficient statistic iff

Pθ {X1 = x1 , X2 = x2 , · · · | T = t}

50

Probability Models and their Parametric Estimation

Example 2.23 Let X be a single observation from a population with pmf

pθ (x), 0 < θ < 1 .

|x| |x|

θ (1−θ)

2 x = −1, 1

pθ (x) = 1 − θ(1 − θ) x=0

0 otherwise

Show that |X| is sufficient.

Let Y = |X| . Then P {Y = 0} = P {|X| = 0} = P {X = 0} = 1 − θ(1 − θ)

P {Y = 1} = P {|X| = 1} = P {X = 1orX = −1} = P {X = 1} + P {X = −1} =

θ(1 − θ)

Consider

P {X = 1 ∩ Y = 1}

P {X = 1 | Y = 1} =

P {Y = 1}

X = 1 ∩ |X| = 1}

=

P {Y = 1}

P {X = 1}

=

P {Y = 1}

θ(1−θ)

2 1

= = is independent of θ

θ(1 − θ) 2

Therefore Y = |X| is sufficient.

Example 2.24 Let X1 , X2 , · · · , Xn be independent random sample drawn from a

population with pdf

iθ−x

e x > iθ, i = 1, 2, 3 · · · , n

pθ (x) =

0 otherwise

Let y = xii , then dx = idy

x

Given pθ (x) = ei[θ− i ]

i[θ−y]

i.e., pθ (y) = ie ,y > θ

Take T = min1≤i≤n Yi . The pdf of T is

Z ∞ n−1

n!

pθ (t) = iei[θ−t] iθ−iy

ie dy

1!(n − 1)! t

= ineniθ−nit

P

θ<t<∞

inθ− xi

pθ (x1 , x2 , · · · , xn ) e

=

pθ (t) nieniθ−nit

1 nit−P xi

= e

ni

It is independent of θ . Thus T = min1≤i≤n Yi = min1≤i≤n Xii is sufficient.

Example 2.25 Let X1 and X2 be iid Poisson random variables with parameter

θ . Prove that

51

A. Santhakumaran

(ii) X1 + 2X2 is not a sufficient statistic.

(i) Given that (

e−θ θ x1

x1 = 0, 1, 2, · · ·

Pθ {X1 = x1 } = x1 !

0 otherwise

(

e−θ θ x2

x2 = 0, 1, 2, · · ·

and Pθ {X2 = x2 } = x2 !

0 otherwise

Let T = X1 + X2 , then

e−θ θ t

t = 0, 1, 2, · · ·

Pθ {T = t} = t!

0 otherwise

Pθ {X1 = x1 , X2 = t − x1 }

Consider Pθ {X1 = x1 , X2 = x2 | T = t} =

Pθ {T = t}

Pθ {X1 = x1 }Pθ {X2 = t − x1 }

=

Pθ {T = t}

e−θ θ x1 e−θ θ t−x2

x1 ! (t−x2 )!

= e−2θ (2θ)t

t!

t!

= is independent of θ.

(t − x1 )!x1 !2t

.˙. X1 + X2 is a sufficient statistic.

+ Pθ {X1 = 2, X2 = 0}

= Pθ {X1 = 0}Pθ {X2 = 1}

+ Pθ {X1 = 2}Pθ {X2 = 0}

θ2

= θe−2θ + e−2θ

2

−2θ θ

= θe [1 + ]

2

Pθ {X1 = 0, X2 = 1}

Therefore Pθ {X1 = 0, X2 = 1 | X1 + 2X2 = 2} =

Pθ {X1 + 2X2 = 2}

e−2θ θ

=

θe−2θ [1 + θ2 ]

2

= depends on θ.

2+θ

.˙. X1 + 2X2 is not a sufficient statistic.

Example 2.26 Let X1 and X2 be two independent Bernoulli random variables such

52

Probability Models and their Parametric Estimation

1 − Pθ {X2 = 0} = 2θ, 0 < θ ≤ 21 . Show that X1 + X2 is not a sufficient statistic.

Let T = X1 + X2 . Consider

Pθ {T = 1} = Pθ {X1 + X2 = 1}

= Pθ {X1 = 0, X2 = 1} + Pθ {X1 = 1, X2 = 0}

= (1 − θ)2θ + θ(1 − 2θ)

= θ(3 − 4θ)

Pθ {X1 = 0 ∩ X1 + X2 = 1}

.˙.Pθ {X1 = 0 | X1 + X2 = 1} =

Pθ {X1 + X2 = 1}

Pθ {X1 = 0, X2 = 1}

=

Pθ {X1 + X2 = 1}

(1 − θ)2θ

=

θ(3 − 4θ)

2(1 − θ)

= is dependent on θ.

(3 − 4θ)

. ˙. X1 + X2 is not a sufficient statistic.

Example 2.27 If X1 and X2 denote a random sample drawn from a normal popula-

tion N( θ, 1 ), −∞ < θ < ∞ . Show that T = X1 + X2 is a sufficient statistic.

The joint pdf of X1 and X2 is

1 − 1 (x1 −θ)2 − 1 (x2 −θ)2

= e 2 2

2π

Let T = X1 + X2 ∼ N (2θ, 2)

( 1 2

√ 1 √ e− 4 (t−2θ) −∞ < t < ∞

p(t)θ = 2π 2

0 otherwise

1 − 21 [x21 +x22 −2(x1 +x2 )θ+2θ 2 ]

pθ (x1 , x2 ) 2π e

= 1 1 2 2

pθ (t) √

2 π

e− 4 [t −4tθ+4θ ]

1 2 2 2

1 e− 2 (x1 +x2 )+(x1 +x2 )θ−θ

= √

π e− 41 (x1 +x2 )2 +(x1 +x2 )θ−θ2

1 1 2 2 1 2

= √ e− 2 (x1 +x2 )+ 4 (x1 +x2 ) is independent of θ.

π

. ˙. T = X1 + X2 is a sufficient statistic.

is not sufficient.

53

A. Santhakumaran

= (1 − θ)2 + θ(1 − θ) + θ(1 − θ)

= 1 − θ2

P {Y = 1} = P {X1 = 1, X2 = 1}

= θ2

P {Y + X3 = 1} = P {Y = 0, X3 = 1} + P {Y = 1, X3 = 0}

= (1 − θ2 )θ + θ2 (1 − θ)

i.e., P {T = 1} = θ(1 − θ)(1 + 2θ)

Consider

P {Y = 1, T = 1}

P {Y = 1 | T = 1} =

P {T = 1}

P {Y = 1}P {X3 = 0}

=

P {T = 1}

θ2 θ

=

θ(1 − θ)(1 + 2θ)

θ2

=

(1 − θ)(1 + 2θ)

Remark 2.3 The definition of sufficient statistic is not always useful to find a

sufficient statistic, since

(i) it does not reveal which statistic is to be sufficient.

(ii) even if it is known in some cases, it is tedious to find the pdf of the statistic.

(iii) it requires to derive a conditional density, which may not be easy, namely for

continuous random variables.

To avoid the above difficulties one may use the Neyman Factorization Theorem.

Theorem 2.3 Let X1 , X2 , · · · , Xn be discrete random variables with pmf

pθ (x1 , x2 , · · · , xn ), θ ∈ Ω. Then T = t(X) is sufficient statistic if and only if

pθ (x1 , x2 , · · · , xn ) = pθ (t)h(x1 , x2 , · · · , xn )

not depend on θ and pθ (t) is a non-negative function of θ and T = t only.

54

Probability Models and their Parametric Estimation

sufficient statistic that

pθ (x1 , x2 , · · · , xn ) = pθ (t)h(x1 , x2 , · · · , xn ).

adding the consistent restriction T = t does not alter the event X1 = x1 , X2 =

x2 , · · · , Xn = xn :

Pθ {X1 = x1 , · · · , Xn = xn } = Pθ {X1 = x1 , X2 = x2 , · · · , Xn = xn , T = t}

= Pθ {T = t}P {X1 = x1 , · · · , Xn = xn | T = t}

Choose Pθ {X1 = x1 , X2 = x2 , · · · , Xn = xn } > 0 for some θ .

Define h(x1 , x1 , · · · , xn ) = P {X1 = x1 , · · · , Xn = xn | T = t} and

pθ (t) = Pθ {T = t} , then Pθ {X1 = x1 , X2 = x2 , · · · , Xn = xn } =

pθ (t)h(x1 , x2 , · · · , xn ).

Conversely, Pθ {X1 = x1 , X2 = x2 , · · · , Xn = xn } = pθ (t)h(x1 , x2 , · · · , xn )

holds, then prove that T = t(X) is a sufficient statistic. The marginal pmf of

T = t(X) is

X

Pθ {T = t} = Pθ {X1 = x1 , · · · , Xn = xn , t(X) = t}

t(x)=t

X

= Pθ {X1 = x2 , · · · , Xn = xn }

t(x)=t

X

= pθ (t)h(x1 , x2 , · · · , xn )

t(x)=t

Pθ {X1 = x1 , · · · , Xn = xn , T = t}

Pθ {X1 = x1 , · · · , Xn = xn | T = t} =

Pθ {T = t}

(

0 if T 6= t

= Pθ {X1 =x1 ,··· ,Xn =xn }

Pθ {T =t} if T =t

If T = t, then

Pθ {X1 = x1 , · · · , Xn = xn } pθ (t)h(x1 , x2 , · · · , xn )

= P

Pθ {T = t} pθ (t) t(x)=t h(x1 , x2 , · · · , xn )

h(x1 , x2 , · · · , xn )

= P

t(x)=t h(x1 , x2 , · · · , xn )

is independent of θ.

Theorem 2.4 If T = t(X) is a sufficient statistic, then any one to one function of

the sufficient statistic is also a sufficient statistic.

55

A. Santhakumaran

Theorem p(x1 , x2 , · · · , xn | θ) = pθ (t)h(x1 , x2 , · · · , xn ). Let U be any one to one

function of T = t(X) , i.e., u = α(t) . Since u = α(t) → t = α−1 (u)

−1 0

dt

= dα du(u) = α−1 (u) .

.˙. du

h(x1 , x2 , · · · , xn )

p(x1 , x2 , · · · , xn | θ) = p(α−1 (u) | θ)[α−1 (u)]0

[α−1 (u)]0

= p(u | θ)h1 (x1 , x2 , · · · , xn )

where p(u | θ) = p(α−1 (u) | θ)[α−1 (u)]0

h(x1 , x2 , · · · , xn )

h1 (x1 , x2 , · · · , xn ) =

[α−1 (u)]0

any one to one function of T = t(X) is also a sufficient statistic.

Remark 2.4

(i) Sufficient statistic is not unique. If it is unique, there is no one to one function

exist.

(ii) Every function of a sufficient statistic is itself a sufficient statistic.

Example 2.29 Let X1 , X2 , · · · , Xn be a random sample drawn from a population

with pmf x

θ (1 − θ)1−x x = 0, 1

pθ (x) =

0 otherwise

Find the sufficient statistic.

n−t

Consider pθ (x1 , x2 , · · · , xn ) = θt (1 − θ)

Pn

where t = i=1 xi

t

θ

= (1 − θ)n

1−θ

= pθ (t)h(x1 , x2 , · · · , xn )

t

θ

where pθ (t) = (1 − θ)n and h(x1 , x2 , · · · , xn ) = 1

1−θ

Pn

.˙. T = i=1 Xi is a sufficient statistic.

Remark 2.5 If the range of the distribution depends on the parameter, the Ney-

man Factorization Theorem fails to find the sufficient statistic. For such cases of the

distributions the definition of sufficient statistic is useful to find the sufficient statistic.

Example 2.30 Let X1 , X2 , · · · , Xn be a random sample drawn from a population

with pdf −(x−θ)

e θ<x<∞

pθ (x) =

0 otherwise

Obtain a sufficient statistic.

56

Probability Models and their Parametric Estimation

The pdf of the statistic Y1 is

Z ∞ n−1

n!

pθ (y1 ) = e−(y1 −θ) e−(x−θ) dx

1!(n − 1)! y1

−n(y1 −θ)

ne θ < y1 < ∞

pθ (y1 ) =

0 otherwise

The definition of sufficient statistic gives

pθ (x1 , x2 , · · · , xn ) e−(x1 −θ) · · · e−(xn −θ)

=

pθ (y1 ) ne−n(y1 −θ)

−t+nθ

e Pn

= −ny +nθ

where t = i=1 xi

ne 1

e−t

= is independent of θ.

ne−ny1

.˙. Y1 = min1≤i≤n {Xi } is sufficient. Again

= e−yn +nθ−t+yn

n

X

where t = xi and Yn = max {Xi }

1≤i≤n

i=1

pθ (x1 , x2 , · · · , xn ) = e−yn +nθ e−t+yn

= pθ (yn )h(x1 , x2 , · · · , xn )

By Neyman Factorization Theorem, Yn = max1≤i≤n {Xi } is a sufficient statistic. But

if max1≤i≤n {Xi } = Y1 , then the range of the distribution θ < y1 < ∞ depends on

θ . Again if max1≤i≤n {Xi } = Y2 , then the range of the distribution θ < y2 < ∞

depends on θ and so on. Thus for each fixed Y1 = y1 , Y2 = y2 , · · · Yn = yn ,

h(x1 , x2 , · · · , xn ) depends on θ . h(x1 , x2 , · · · , xn ) depends on θ is a contradiction

to Neyman Factorization Theorem. Hence the Neyman Factorization Theorem fails

when the range of the distribution depends on the parameter θ .

Example 2.31 Show that the set of order statistic based on a random sample drawn

from a continuous population with pdf p(x | θ) is a sufficient statistic.

The order statistic Y1 ≤ Y2 ≤ · · · ≤ Yn are jointly sufficient statistic to iden-

tifying the distribution. If the order statistic is given by Y1 = y1 , Y2 = y2 , · · · , Yn =

yn , then X1 , X2 , · · · , Xn are taking the values equally. So the probability of the

random sample equals for a particular permutations of these given values of the order

1

statistic is n! which is independent of the parameter θ . . ˙. The set of order statistic

is a sufficient statistic.

Example 2.32 Let X1 , X2 , · · · , Xn be a random sample of size n drawn from a

population with pdf

1 1

π 1+(x−θ)2 −∞ < x < ∞

pθ (x) =

0 otherwise

57

A. Santhakumaran

Can the joint pdf of X1 , X2 , · · · , Xn be written in the form given in Neyman Fac-

torization Theorem ? Does Cauchy distribution have a sufficient statistic ?

The joint pdf of X1 , X2 , · · · , Xn is

n

Y 1 1

pθ (x1 , x2 , · · · , xn ) =

i=1

π 1 + (x − θ)2

It cannot be written in the form of Neyman Factorization Theorem, hence it does not

have a single sufficient statistic.

Definition 2.2 A family {pθ (x), θ ∈ Ω} of probability functions of the form

c(θ)eQ(θ)t(x) h(x) a < x < b

pθ (x) =

0 otherwise

• the range a < x < b of the distribution is independent of the parameter θ .

• Q(θ) is a non - trivial continuous function of θ .

• t(x) is a non-trivial function of x .

• h(x) is a continuous function of x in a < x < b .

If θ is a single value of the parameter space, then it is a single parameter expo-

nential family.

Definition 2.3 A pdf pθ (x) with single parameter θ is expressed as a single

parameter exponential family

c(θ)eQ(θ)t(x) h(x) a < x < b

pθ (x) =

0 otherwise

Remark 2.6 The simplicity of the definition is to determine the sufficient statistic

by inspection.

Example 2.33 Let Xi ’ s be independent and having N( iθ, 1 ), i = 1 to n , where

θ is unknown. Find a sufficient statistic for N( iθ, 1 ).

( 1 2

√1 e− 2 (xi −iθ) −∞ < xi < ∞

pθ (xi ) = 2π

0 otherwise

58

Probability Models and their Parametric Estimation

Consider

n

Y

pθ (x1 , x2 , · · · , xn ) = pθ (xi )

i=1

n

1 1

Pn 2

= √ e− 2 i=1 (xi −iθ)

2π

n

1 1

Pn 2 Pn Pn 2 2

= √ e− 2 i=1 xi +θ i=1 ixi − i=1 i θ

2π

n

1 1

Pn 2 Pn n(n+1)(2n+1) 2

= √ e− 2 i=1 xi +θ i=1 ixi − 12 θ

2π

= c(θ)eQ(θ)t(x) h(x)

n Pn

X 1 2

where t(x) = ixi , h(x) = e− 2 i=1 xi

i=1

n

1 1 2

and c(θ) = √ e− 12 n(n+1)(2n+1)θ

2π

Pn

Thus T = i=1 iXi is a sufficient statistic.

Example 2.34 Given n independent observations on a random variable X with prob-

ability density function

1 −x

2θ e θ if x > 0, θ > 0

θ θx

pθ (x) = e if x ≤ 0

2

0 otherwise

Obtain a sufficient statistic.

Consider

( t(x)

1 n − θ

( 2θ ) e if x > 0

pθ (x1 , x2 , · · · , xn ) = Pn

( θ2 )n eθt(x) , if x ≤ 0, where t(x) = i=1 xi

if x > 0, θ > 0

pθ (x) =

c2 (θ)eQ2 (θ)t(x) h(x) if x ≤ 0

1 n

wherePc(θ) = ( 2θ ) , Q1 (θ) = − θ1 , c2 (θ) = ( θ2 )n , Q2 (θ) = θ and h(x) = 1 . .˙.

n

T = i=1 Xi is a sufficient statistic.

Example 2.35 If X has a single observation from N (0, σ 2 ) , then show that |X|

is a sufficient statistic.

( 1 2

√ 1 e− 2σ2 x σ2 > 0 − ∞ < x < ∞

Given pσ (x) = 2πσ

0 otherwise

1 1

wherec(σ) = √ Q(σ) = − 2 , t(x) = x2 , h(x) = 1

2πσ 2σ

59

A. Santhakumaran

to T = |X| is sufficient.

Example 2.36 Let X1 , X2 , · · · , Xn be a random sample from N (θ, θ), θ > 0 .

Find the sufficient statistic for the random sample.

( 1 2

√ 1 e− 2θ (x−θ) −∞ < x < ∞

Given pθ (x) = 2πθ

0 otherwise

n

1 1

pθ (x1 , x2 , · · · , xn ) = √ e− 2θ (x−θ)

2πθ

n

1 1 X 2 X n

= e− xi + xi − θ

2θ 2θ 2

= c(θ)eQ(θ)t(x) h(x)

n P

√1 e− n2 θ, h(x) = e xi 1

x2i Q(θ) = − 2θ

P

where c(θ) = 2πθ

t(x) = It is

Xi2 is sufficient statistic.

P

an one parameter exponential family. Thus T =

Let X be a random sample drawn from a population with distribution Pθ , θ ∈ Ω ,

whose pdf is given by pθ (x) . Assume θ is a single value of the parameter space Ω

and the range of the distribution is independent of the parameter θ . Let T = t(X) be

a sufficient statistic. Using Neyman Factorization Theorem,

pθ (x) = pθ (t)h(x)

Assume that the function pθ (t) is partially differentiable with respect to θ , then

∂ log pθ (x) ∂ log pθ (t)

= = Qθ (t) (2.3)

∂θ ∂θ

Since the equation holds for all values of θ , it is also true for θ = 0. So one can obtain

the relation t(x) = k(t) where

∂ log pθ (x)

|θ=0 = t(x) and Q0 (t) = k(t)

∂θ

Suppose k(t) and t(x) are differentiable with respect to x , then

∂t(x) ∂k(t) ∂t

=

∂x ∂t ∂x

Again differentiate the equation (2.3) with respect to x

∂ 2 log pθ (x) ∂Qθ (t) ∂t

=

∂x∂θ ∂t ∂x

60

Probability Models and their Parametric Estimation

∂ 2 log pθ (x)

∂x∂θ ∂Qθ (t)

∂t(x)

= (2.4)

∂k(t)

∂x

The left hand side of the equation (2.4) is the same for all x . It must depend on θ

alone so that ∂Q θ (t)

∂k(t) = λ(θ), i.e.,

Z Z

∂Qθ (t) = λ(θ)∂k(t) + c1 (θ)

Again integrating with respective to θ ,

Z Z Z

Qθ (t)dθ = k(t) λ(θ)dθ + c1 (θ)dθ + c(x)

R ∂ log pθ (x) R

dθ dθ

= t(x) λ(θ)dθ + B(θ) + c(x)

R

since k(t) = t(x) for θ = 0 and B(θ) = c1 (θ)dθ

R

where Q(θ) = λ(θ)dθ

= c(θ)eQ(θ)t(x) h(x)

where c(θ) = eB(θ) , h(x) = ec(x)

Remark 2.7 The Neyman Factorization Theorem and the Exponential family of

distributions form are the two equivalent methods of identifying the sufficient Statistic.

Definition 2.4 Let X1 , X2 , · · · , Xn be a random sample of size n drawn from a

population with pdf pθ1 ,θ2 (x), θ1 , θ2 ∈ Ω. Let T1 = t1 (X), T2 = t2 (X) be two

statistics whose joint pdf is pθ1 ,θ2 (t1 t2 ) . The statistics T1 = t1 (X) and T2 =

t2 (X) are called jointly sufficient statistics iff

pθ1 ,θ2 (t1 , t2 )

61

A. Santhakumaran

with density function

1

2θ2 θ1 − θ2 < x < θ1 + θ2 where −∞ < θ1 < ∞, 0 < θ2 < ∞

pθ1 ,θ2 (x) =

0 otherwise

Consider Y1 ≤ Y2 ≤ · · · ≤ Yn be the order statistic of X1 , X2 , · · · , Xn .

The joint pdf of (Y1 , Yn ) is

Z yn n−2

n! 1 1 1

pθ1 ,θ2 (y1 , yn ) = dx

1!(n − 2)!1! 2θ2 y1 2θ2 2θ2

(

n(n−1) n−2

= (2θ2 )n (yn − y1 ) θ1 − θ2 < θ < θ1 + θ2

0 otherwise

pθ1 ,θ2 (x1 , x2 , · · · , xn ) ( 2θ12 )n

= n(n−1)

pθ1 ,θ2 (y1 , yn ) − y1 )n−2

(2θ2 )n (yn

1

=

n(n − 1)(yn − y1 )n−2

1 − 1 (x−θ)

σ e

σ θ < x < ∞, −∞ < θ < ∞

pθ,σ (x) =

0 otherwise 0 < σ < ∞

Consider a transformation

Y1 = nX(1)

Y2 = (n − 1)[X(2) − X(1) ]

Y3 = (n − 2)[X(3) − X(2) ]

··· ······

Yn−1 = 2[X(n−1) − X(n−2) ]

n

X n

X

Yn−2 = [X(n) − X(n−1) ] so that Yi = X(i)

i=1 i=1

1

The Jacobian of the transformation is |J| = n! . QThe joint pdf of

n

X(1) , X(2) , · · · , X(n) is given by p(x(1) , x(2) , · · · , x(n) ) = n! i=1 p(x(i) ) The joint

62

Probability Models and their Parametric Estimation

pdf of Y1 , Y2 , · · · , Yn is given by

n

Y

pθ,σ (y1 , y2 , · · · , yn ) = n! p(yi ) × |J|

i=1

n

Y

= p(yi )

i=1

1 − 1 (P yi +nθ)

= e σ nθ < y1 < ∞, 0 ≤ y2 < · · · , < yn < ∞

σn

Consider a further transformation

U1 = Y2

U2 = Y2 + Y3

U3 = Y2 + Y3 + Y4

··· ······

Un−2 = Y1 + Y2 + · · · + Yn−1

T = Y2 + Y3 + · · · + Yn

i.e., Y2 = U1

Y3 = U2 − U1

Y4 = U3 − U2

··· ······

Yn−1 = Un−2 − Un−3

Yn = T − Un−2

joint pdf of Y1 , U2 , · · · , Un−2 T is

(y1 −nθ) t

pθ,σ (y1 , u2 , · · · , un−2 , t) = σ1 e− σ σn−1 1

e− σ

nθ ≤ y1 , 0 ≤ u1 ≤ u2 ≤ · · · ≤ un−2 ≤ t < ∞

The marginal density of (Y1 T ) is

(y1 −nθ) t Rt Ru Ru Ru

pσ,θ (y1 , t) = σ1 e− σ σn−1 1

e− σ 0 0 n−2 0 n−1 · · · 0 2 du1 du2 · · · dun−2

(y1 −nθ) t Rt Ru Ru Ru

= σ1 e− σ σn−1 1

e− σ 0 0 n−2 0 n−1 · · · 0 3 u2 du2 du3 · · · dun−2

(y1 −nθ) t Rt Ru Ru R u u2

= σ1 e− σ σn−1 1

e− σ 0 0 n−2 0 n−1 · · · 0 4 2!3 du3 · · · dun−2

(y1 −nθ) t R t n−3

= σ1 e− σ σn−1 1

e− σ (n−3)!

1

u

0 n−2

dun−2

1 − (y1 −nθ) 1 − σt tn−2

= σe

σ

σ n−1 e (n−2)!

n

The first order statistic Y1 has the pdf pθ,σ (y1 ) = nσ e− σ (y1 −θ) θ < y1 < ∞

1

i.e., pθ,σ (y1 ) = σ1 e− σ (y1 −nθ) nθP< y1 < ∞

n

Thus Y1 + nθ ∼ eσ and T = i=2 Yi ∼ G(σ, n − 1) .

(

1 [− y1 −nθ ] tn−2 [− σt ]

σe

σ

(n−2)!σ n−1 e nθ < y1 < ∞, 0 < t < ∞

pθ,σ (y1 , t) =

0 otherwise

63

A. Santhakumaran

(n − 2)

p(u1 , u2 , · · · , un−2 | y1 , t) = 0 < u1 < u2 < · · · < un−2 < t

tn−2

Thus (Y1 , T ) is jointly sufficient statistics, i.e., (X(1) , +i = 1n [X(i) − X(1) ]) is

P

jointly sufficient statistics.

Definition 2.5 Let θ = (θ1 , θ2 , · · · , θk ) is a vector of parameters and T =

(T1 , T2 , · · · , Tk ) is a random vector . The vector T is jointly sufficient statistics

if pθ (x) is expressed of the form

Pk

Qj (θ)tj (x)

pθ (x) = c(θ)e j=1 h(x) a<x<b

0 otherwise

a random sample

Pn drawn from a population

n

N( θ, σ 2 ). Show that the statistic T = i=1 Xi , X

i=1 i

2

is jointly sufficient

statistics.

n

1 1

Pn 2

pθ,σ2 (x1 , x2 , · · · , xn ) = √ e− 2σ2 i=1 (xi −θ)

2πσ

n

1 Pn 2 Pn 2

e− 2σ2 [ i=1 xi −2θ i=1 xi +nθ ]

1

= √

2πσ

n

1 nθ 2

Pn 2 Pn

e− 2σ2 e− 2σ2 [ i=1 xi −2θ i=1 xi ]

1

= √

2πσ

2 2

= c(θ, σ 2 )eQ1 (θ,σ )t1 (x)+Q2 (θ,σ )t2 (x) h(x)

n

1 nθ 2

where c(θ, σ 2 ) = √ e− 2σ2 ,

2πσ

θ −1

Q1 (θ, σ 2 ) = 2 , Q2 (θ, σ 2 ) = ,

σ 2σ 2

Xn Xn

h(x) = 1, t1 (x) = xi , t2 (x) = x2i

i=1 i=1

Pn Pn 2

.˙. i=1 Xi , i=1 Xi is jointly sufficient statistics.

Example 2.40 Let X1 , X2 , · · · , Xn be a random sample from a Gamma (α, β)

population. Find a two dimensional sufficient statistics for the random sample.

( β

α −αx β−1

Given pα,β (x) = Γβ e x x > 0, α > 0, β > 0

0 otherwise

64

Probability Models and their Parametric Estimation

αnβ −α P xi Y β−1

pα,β (x1 , x2 , · · · , xn ) = e ( xi )

(Γβ)n

αnβ −α P xi (β−1) log(Qni=1 xi )

= ne e

(Γβ)

αnβ −α P xi +(β−1) P log xi

= ne

(Γβ)

αnβ −α P xi +β P log xi −P log xi

= ne

(Γβ)

= c(α, β)eQ1 (α,β)t1 (x)+Q2 (α,β)t2 (x) h(x)

αnβ

Pn

where c(α, β) = (Γβ) n , Q1 (α, β) = −α , t1 (x) = i=1 xi , Q2 (α, β) = β ,

− i =1n log xi

Pn P

t2 (x) = i=1 log xP h(x) = e

i and P . It is a two parameter exponential

family. Therefore ( Xi , Xi2 ) is jointly sufficient statistic.

There are two types of efficient estimators. One is relative efficient estimator and

the other one is efficient estimator. Efficient estimator due to Cramer - Rao lower bound

for the variance of an unbiased estimator. Relative efficient estimator is given below:

Definition 2.6 Let T1 = t1 (X) and T2 = t2 (X) be two unbiased estimators of

θ and Eθ [T12 ] < ∞ and Eθ [T22 ] < ∞. One may define the efficiency of T1 = t1 (X)

relative to T2 = t2 (X) is

Vθ [T1 ]

Efficiency = .

Vθ [T2 ]

Vθ [T1 ]

If < 1 , then T1 = t1 (X) is more efficient than T2 = t2 (X) .

Vθ [T2 ]

Example 2.41 Let Y1 < Y2 < Y3 < Y4 < Y5 be the order statistic of a random

sample of size 5 from a uniform population with pdf

1

θ 0 < x < θ, θ > 0

pθ (x) =

0 otherwise

Eθ [2Y3 | Y5 ] . Compare the variances of 2Y3 and the statistic T .

65

A. Santhakumaran

The pdf of Y3 is

2 "Z θ #2

Z y3

5! 1 1 1

pθ (y3 ) = dx dx

2!1!2! 0 θ θ y3 θ

30 2

= y [θ − y3 ]2 0 < y3 < θ

θ5 3

30 2 y3

= 5

y3 [1 − ]2 0 < y3 < θ

θ θ

Z θ

30 y3

Eθ [Y3 ] = y 3 [1 − ]2 dy3

θ3 0 3 θ

Z 1

30 y3

= θ4 t3 (1 − t)2 dt where t = θ

θ3 0

Z 1

= 30θ t4−1 (1 − t)3−1 dt

0

Γ4Γ3

= 30θ

Γ7

3! × 2! θ

= 30 =

6! 2

Eθ [2Y3 ] = θ

Z y3 2 Z y5

5! 1 1 1 1

pθ (y3 , y5 ) = dx dx

2!1!1!1! 0 θ θ y3 θ θ

60 2

= θ 5 y3 [y5 − y3 ] 0 < y3 < y5 < θ

0 otherwise

The pdf of Y5 is

5 4

pθ (y5 ) = θ 5 y5 0 < y5 < θ

0 otherwise

66

Probability Models and their Parametric Estimation

pθ (y3 , y5 )

pθ (y3 | y5 ) =

pθ (y5 )

60 y32 [y5 − y3 ]

= 0 < y3 < y5

5 y54

Z y5

12

Eθ [Y3 | Y5 ] = y33 [y5 − y3 ]dy3

y54 0

3

= y5

5

6

.. . Eθ [2Y3 | Y5 = y5 ] = y5

5

θ2 2θ2

Vθ [Y3 ] = since Eθ [Y32 ] =

28 7

θ2

Vθ [2Y3 ] =

7

Z θ

5 5

Eθ [Y5 ] = y55 dy5 = θ

θ5 0 6

5θ2

Eθ [Y52 ] =

7

5θ2

Vθ [Y5 ] =

5 × 36

θ2

6

Vθ Y5 =

5 35

6

The efficiency of 5 Y5 is relative to 2Y3 is

θ2

Vθ [ 65 Y5 ] 35 1

= θ2

= <1

Vθ [2Y3 ] 7

5

Problems

2.1 Give an example for each of the following cases:

(i) Estimator with non - zero bias.

(ii) Estimator with non zero bias

(iii) Consistent estimator with zero bias and

(iv) Consistent estimator with non zero bias.

2.2 Give a sufficient condition for an estimator to be consistent? Is the sample mean

a consistent estimator of the population mean?

2.3 If X1 , X2 , · · · , Xn is a random sample of size n drawn from a population with

uniform distribution ∪[−2θ, θ] , examine whether max1≤i≤n {Xi } is consistent

for θ ?

67

A. Santhakumaran

unique and (ii) unbiased.

2.5 Show that if the bias of an estimator and its variance approach zero, then the

estimator will be consistent.

2.6 When would you say that estimator of a parameter is good? In particular dis-

cuss the requirements of consistency and unbiasedness of an estimator. Give an

example to show that a consistent estimator need not be unbiased.

2.7 Let X1 , X2 , · · · , Xn be n independent random sample drawn from a normal

population with mean θ and variance σ 2 . Obtain the unbiased estimators of (i)

θ and (ii) σ 2 .

an experiment with probability of success p . Obtain an unbiased estimator of

p2 in the form aTn2 + bTn + c .

2.9 Obtain the unbiased estimator of θ(1 − θ) , where θ is the parameter of the

Binomial distribution.

2.10 Find the unbiased estimator of λ2 in a Poisson population with parameter λ

based on a random sample of size n .

2.11 Let X1 , X2 , · · · , Xn be iid random sample of size n drawn from a population

with common density

1 −x

θe θ > 0 and x > 0

θ

pθ (x) =

0 otherwise

P

Xi

(i) Show that TP

1 = n is the unbiased estimator of θ .

n

(ii) Let Tc = c i=1 Xi . Show that Ec [Tc − θ]2 = θ2 E1 [Tc − 1]2 .

2.12 Obtain the sufficient statistic, given a sample of size n from a uniform distribu-

tion ∪(−θ, θ) .

2.13 State two equivalent definition of sufficient statistic and obtain their equivalence.

2.14 Explain the concept of sufficiency. State the Factorization Theorem for a suffi-

cient statistic and indicate its importance.

2.15 Let X1 , X2 , · · · , Xn be a random sample

P drawn from a uniform population

in the interval [0, θ] . Define nT2 1 = n

Xi and n+1 T2 = max1≤i≤n {Xi } .

Evaluate their relative efficiency.

2.16 Let X1 , X2 , · · · , Xn be a random sample drawn from a population N (θ, σ 2 ) .

Prove that the sample mean is more efficient estimator as compared to the sample

median for the parameter θ .

68

Probability Models and their Parametric Estimation

2.17 Let X be a single observation from a normal population N (2θ, 1) Pnand let

Y1 , Y2 , · · · , Yn be a normal population N (θ, 1) . Define T = 2X + k=1 Yk .

Show that T is sufficient statistic.

2.18 Let X1 , X2 , · · · , Xn be a random

Pnsample drawn from a normal population

2

N (0, θ) , 0 < θ < ∞ . Show that X

i=1 i is a sufficient statistic.

2.19 If T1 = 32 max{X1 , X2 } and T2 = 2(X1 + X2 ) are estimators of θ based

on two independent observations X1 and X2 on a random variable distributed

uniformly over (0, θ) . Which one do you prefer and why?

2.20 Let X1 , X2 , · · · , Xn be aPrandom sample drawn from a Poisson population with

Xi

parameter θ . Show that n+2 is not unbiased of θ but consistent of θ .

2.21 Distinguish between an Estimate and Estimator. Given three observations

X1 , X2 and X3 on a normal random variable X from N (θ, 1) , a person con-

structs the following estimators for θ

X1 + X2 + X3

T1 =

6

X1 + 2X2 + 3X3

T2 =

7

X1 + X2

T3 =

2

which one would you choose and why?

2.22 A random sample X1 , X2 , · · · , Xn drawn on XPwhichP takes 1 or 0 with re-

Xi ( Xi −1)

spective probabilities θ and (1 − θ) . Show that n(n−1) is an unbiased

estimator of θ2 .

2.23 Discuss whether an unbiased estimator exists for the parametric function τ (θ) =

θ2 of Binomial (1, θ) based on a sample of size one.

2.24 Obtain the sufficient statistic of the pdf

(1 + θ)xθ 0 < x < 1

pθ (x) =

0 otherwise

based on an independent sample of size n .

2.25 X1 , X2 , X3 and X4 constitute a random sample of size four from a Poisson

population with parameter θ . Show that (X1 + X2 + X3 + X4 ) and (X1 +

X2 , X3 + X4 ) are sufficient statistics. Which would you prefer ?

2.26 A statistic Tn such that Vθ [Tn ] → 0 ∀ θ is consistent as an estimator of θ as

n → ∞:

(a) if and only if Eθ [Tn ] → θ ∀ θ

(b) if, but not only if Eθ [Tn ] → θ ∀ θ

(c) if and only if Eθ [Tn ] = θ ∀ θ , for every n

(d) if and only if |Eθ [Tn ] − θ| and Vθ [Tn ] → 0 ∀ θ

69

A. Santhakumaran

if as n → ∞ :

(a) P {|Xn − X| > } → 0 for some > 0

(b) P {|Xn − X| > } → 0 for some < 0

(c) P {|Xn − X| > } → 0 for every > 0

(d) P {|Xn − X| < } → 0 for every > 0

2.28 X1 , XP2 , · · · , Xn are iid Bernoulli random variables with Eθ [Xi ] = θ and

n

Sn = i=1 Xi . Then, for a sequence of non - negative numbers {kn }, Tn =

Sn +kn

n+kn is a consistent estimator of θ :

(a) if knn → 0 as n → ∞

(b) if and if kn = 0 ∀ n

(c) if and only if kn is bounded as n → ∞

(d) whatever {kn } is

2.29 In tossing a coin the P {Head} = p2 . It is tossed n times to estimate the value

of p2 . X denotes the number of heads. One may use to estimate the unbiased

estimator

2 is 2

(a) X n (b) Xn (c) Xn (d) nX2

2.30 Which of the following statement is not correct for a consistent estimator?

1. If there exists one consistent estimator, then an infinite number of consistent

statistics may be constructed.

2. Unbiased estimators are always consistent.

3. A consistent estimator with finite mean value must tend to be unbiased in

large samples.

Select the correct answer given below:

(a) 1 (b) 2 (c) 1 and 3 (d) 1, 2 and 3

2.31 Consider the following type of population :

1. Normal 2 . Cauchy 3 . Poisson

Sample mean is the best estimator of population mean in case of

(a) 1 and 3 (b) 1 and 2 (c) 2 and 3 (d) 1 , 2 and 3

70

Probability Models and their Parametric Estimation

3.1 Introduction

If a given family of probability distributions that admits a non - trivial sufficient

statistic which results in the greatest reduction of data collection. The reduction of data

can be achieved through complete sufficient statistic. The existence of the mathemat-

ical expectation Eθ [g(X)] , θ ∈ Ω implies that the integral ( or sum) involves in

Eθ [g(X)] converges absolutely. This absolute convergence was tacitly assumed in the

definition of completeness.

3.2 Completeness

Definition 3.1 A family of distributions {Pθ , θ ∈ Ω} is said to be complete if

Eθ [g(X)] = 0 ∀ θ ∈ Ω ⇒ g(x) = 0 ∀ x.

distributions of the statistic T = t(X) is complete, i.e.,

1

2 θ|x| (1 − θ)|x| x = −1, 1

pθ (x) = 1 − θ(1 − θ) x=0

0 otherwise

Show that the family is not complete but the family of distributions Y = |X| is

complete.

Consider Eθ [g(X)] = 0

1

X

g(x)pθ (x) = 0

x=−1

1 1

g(−1) θ(1 − θ) + g(0)[1 − θ(1 − θ)] + g(1) θ(1 − θ) = 0

2 2

Equivating the coefficient of θx on both sides

g(0) = 0 and g(−1) + g(1) − 2g(0) = 0

⇒ g(−1) + g(1) = 0andg(−1) = −g(1)

71

A. Santhakumaran

1 − θ(1 − θ) y = 0

pθ (y) = θ(1 − θ) y=1

0 otherwise

ConsiderEθ [g(Y )] = 0

1

X

g(y)[θ(1 − θ)]y [1 − θ(1 − θ)]1−y = 0

y=0

1

X θ(1−θ)

g(y)ρy = 0 where ρ = 1−θ(1−θ)

y=0

g(0) + g(1)ρ = 0 → g(0) = 0 andg(1) = 0

Example 3.2 Let X1 , X2 , · · · , Xn be iid random sample drawn from a Bernoulli

P(

θ), 0 < θ < 1 . Prove that T = i=1 n)Xi is a complete statistic.

population b(1, P

n

Given T = i=1 X1 ∼ b(n, θ)

Eθ [g(T )] = 0

n

X

g(t)cnt (1 − θ)n−t = 0

t=0

n t

X θ

g(t)cnt (1 − θ)n = 0

t=0

1−θ

Here (1 − θ)n 6= 0

n

X θ

g(t)cnt ρt = 0 where ρ =

t=0

1−θ

g(0)cn0 + g(1)cn1 ρ + · · · + g(n)ρn = 0

= 0 coefficient of ρ0

g(0)

cn1 g(1)

= 0 coefficient of ρ1

⇒ g(1) = 0

······ ··· ······

g(n) = 0 coefficient of ρn

Thus g(t) = 0 ∀ t = 0, 1, 2, · · · , n.

Pn

Hence T = i=1 Xi is a complete statistic.

Example 3.3 Let X1 , X2 , · · · , Xn be iid random sample drawn from a Poisson

72

Probability Models and their Parametric Estimation

Pn

population with parameter λ > 0 . Show that T = i=1 Xi is a complete statistic.

n

X

T = Xi ∼ P (nλ)

i=1

(nλ)t

i.e., pλ (t) = e−nλ , t = 0, 1, 2, · · · , ∞

t!

Eλ [g(T )] = 0

∞

X (nλ)t

g(t)e−nλ = 0

t=0

t!

∞

X (nλ)t

g(t) = 0 since e−nλ 6= 0

t=0

t!

nλ (nλ)n

g(0) + g(1) + · · · + g(n) + ··· = 0

1! n!

By comparing the coefficients of λt on both sides,

g(0) = 0 coefficient of λ0

ng(1) = 0 coefficient of λ1

⇒ g(1) = 0

······ ··· ······

Thus g(t) = 0 ∀ t = 0, 1, 2, · · · , ∞

Pn

Hence T = i=1 Xi is a complete statistic.

Example 3.4 Let X ∼ ∪(0, θ), θ > 0 . Show that the family of distributions is

complete.

Consider Eθ [g(X)] = 0

Z θ

1

⇒ g(x) dx = 0

0 θ

Z θ

⇒ g(x)dx = 0

0

One can differentiate the above integral with respect to θ on both sides

Z θ

0dx + g(θ) × 1 − g(0) × 0 = 0

0

hR i

b(θ)

d a(θ) pθ (x)dx Z b(θ)

dpθ (x) db(θ)

since = dx + pθ [b(θ)]

dθ a(θ) dθ dθ

da(θ)

−pθ [a(θ)]

dθ

g(θ) = 0 ∀ θ > 0, i.e., g(x) = 0 ∀ 0 < x < θ, θ > 0

73

A. Santhakumaran

Example 3.5 Let X ∼ N (0, θ) . Prove that the family of pdf ’s {N (0, θ), θ > 0}

is not complete.

Consider Eθ [X] = 0 for θ > 0

Z ∞

1 1 2

x √ √ e− 2θ x dx = 0

−∞ 2π θ

→x = 0 not for all x

i.e., for some value of x 6 = 0

since Eθ [X − θ] = 0

Z ∞

1 1 2

⇒ (x − θ) √ √ e− 2θ x dx = 0

−∞ 2π θ

Z ∞

1 1 2

⇒ t √ √ e− 2θ (t+θ) dt = 0 where t = x − θ

−∞ 2π θ

Z ∞

−t t 1 2

− 2θ t θ

e √ e dt = 0 since e− 2 6= 0

0 2πθ

R∞

This is same as the Bilateral Laplace Transform of f (t) as −∞

e−st f (t)dt . By the

uniqueness property of the Laplace Transform

Z ∞

e−st f (t)dt = 0

o

⇒ f (t) = 0 ∀ t ∈ (−∞, ∞)

t 1 2

i.e., √ e− 2θ t = 0

2πθ

⇒ t = 0 i.e., x − θ = 0

⇒x = θ >0

Thus x is not equal to zero for θ > 0 . The family X ∼ N (0, θ), θ > 0 is not

complete.

Example 3.6 If X ∼ N (0, θ), θ > 0 . Prove that T = X 2 is a complete statistic.

2

Let T = (X − 0)2 , then Tθ = Xθ ∼ χ2 distribution with one degree of

freedom. Tθ has the pdf of G( 21 , 12 ) .

1 t 1

( 1

1 e− 2 θ ( θt ) 2 −1 θ1 0 < t < ∞

pθ (t) = 2 2 Γ 12

0 otherwise

( 1 1

√ 1

e− 2θ t t 2 −1 0 < t < ∞

= 2πθ

0 otherwise

Eθ [g(T )] = 0

Z ∞

1 t 1

g(t) √ e− 2θ t 2 −1 dt = 0

0 2πθ

Z ∞

1 1

e− 2θ t [g(t)t− 2 ]dt = 0 ∀ θ > 0

0

74

Probability Models and their Parametric Estimation

R ∞

This is same as the Laplace Transform of f (t) as 0 e−st f (t)dt.

Using the uniqueness property of Laplace Transform

1

g(t)t− 2 = 0 ∀ t > 0

i.e., g(t) = 0 ∀ t > 0 . Thus T = X 2 is a complete statistic .

Example 3.7 Examine whether the family of distributions

pθ (x) =

2(1 − θ) if 21 ≤ x < 1

is complete.

Consider Eθ [g(X)] = 0

Z 1 Z 1

2

⇒ g(x)2θdx + g(x)2(1 − θ)dx = 0

1

0 2

1

Z 2

Z 1

2θ g(x)dx + 2(1 − θ) g(x)dx = 0

1

0 2

1

Z 2

Z 1 Z 1

θ g(x)dx − θ g(x)dx + g(x)dx = 0

1 1

0 2 2

"Z 1

#

2

Z 1 Z 1

θ g(x)dx − g(x)dx + g(x)dx = 0

1 1

0 2 2

"Z 1

#

2

Z 1

θ g(x)dx − g(x)dx = 0

1

0 2

Z 1

and g(x)dx = 0

1

2

1

Z 2

Z 1

g(x)dx = g(x)dx θ 6= 0

1

0 2

Z 1

2

⇒ g(x)dx = 0

0

i.e., g(x) 6= 0 for some x . Thus the family of distributions is not complete. Since

75

A. Santhakumaran

choose

+1 if 0 < x < 14

−1 if 14 ≤ x < 21

g(x) =

+1 if 12 ≤ x < 43

−1 if 34 ≤ x < 1

Z 41 Z 12

Eθ [g(X)] = (+1)2θdx + (−1)2θdx

1

0 4

3

Z 4

Z 1

+ (+1)2(1 − θ)dx + (−1)2(1 − θ)dx

1 3

2 4

1 1 1 1

=2θ − 2θ + 2(1 − θ) − 2(1 − θ)

4 4 4 4

= 0

But g(x) 6= 0 for some x

+1 if 0 < x < 14

−1 if 14 ≤ x < 21

i.e., g(x) =

+1 if 12 ≤ x < 43

−1 if 34 ≤ x < 1

distributions. Its pdf is given by

c(θ)eQ(θ)t(x) h(x) if a < x < b

pθ (x) =

0 otherwise

where a and b are independent of θ . Then the family of distributions is complete.

Proof: Assume pθ (x), θ ∈ Ω is a pmf.

X

Eθ [g(T )] = g(t)Pθ {T = t}

t

X

θt

g(t)c(θ)e h(x) = 0

t

X

then g(t)eθt+s(t) = 0

t

then g(t) = g + (t) − g − (t) and both g + (t) and g − (t) are non - negative functions

X

[g + (t) − g − (t)]eθt+s(t) = 0 ∀ θ ∈ Ω

t

X X

g + (t)eθt+s(t) = g − (t)eθt+s(t) ∀ θ ∈ Ω

t t

+ + θt+s(t)

P

Dividing g (t) by a constant t g (t)e and it is denoted by

+ θt+s(t)

g (t)e

p+ (t) = P + θt+s(t)

t g (t)e

76

Probability Models and their Parametric Estimation

P

t

g − (t)eθt+s(t)

p− (t) = P − θt+s(t)

t g (t)e

X X

p+ (t)eδt = p− (t)eδt ∀ δ ∈ Ω

t t

Hence g + (t) = g − (t) ∀ t

⇒ g(t) = 0 ∀ t . Thus the family of distributions is complete.

Example 3.8 Let X1 , X2 , · · · , XP

n be a random

Pn sample drawn from a population

n

with N (θ, θ2 ) . Define g(X) = i=1 Xi ,

2

i=1 Xi . Prove that g(X) is not

77

A. Santhakumaran

complete.

2

n n

X X 2

Define g(X) = 2 − (n + 1)

Xi Xi , n = 2, 3, · · ·

i=1 i=1

2

n n

X X 2

Eθ [g(X)] = 2Eθ Xi − (n + 1)Eθ Xi

i=1 i=1

2

n

X 2 2 θ2

Xi = n X̄ and X̄ ∼ N (θ, )

i=1 n

√

2

Z ∞

2 n − n (x̄−θ)2

Eθ [X̄ ] = x̄ √ e 2θ 2 dx̄

−∞ 2πθ

x̄ − θ √ θ θ

If z = n, then x̄ − θ = z√ and dx̄ = √ dz

θ n n

√ 2

θ 2 n −z θ

Z ∞

. 2 2 √ dz

. . Eθ [X̄ ] = (θ + z √ ) √ e

−∞ n 2πθ n

z2

2

Z ∞ z2 2z 1 −

= θ 1 + + √ √ e 2 dz

−∞ n n 2π

2

2 1 Z ∞ 2 1 −z

= θ 1 + z √ e 2 dz + 0

n −∞ 2π

1 1 1 −1

One can take z 2 = t, then z = t 2 and dz = t2 dt

2

" #

2 − t 1 1 −1

Z ∞

2 2

i.e., Eθ [X̄ ] = θ √1+ te 2 t2 dt

2π 0 n 2

" #

1 − t 3 −1

Z ∞

2

= θ 1+ √ e 2 t2 dt

n 2π 0

2 1 Γ3

= θ

1 + 2

√

n 2π 1 3

( )2

2

√ √

2 1 1 π2 2

= θ 1 + 2

√

n 2π

2

1 n+1 2

= θ 1+ = θ

n n

2

n

. X 2 2 2 2 2n+1

. . Eθ Xi = Eθ [nX̄] = n Eθ [X̄] = n θ

i=1 n

2

= n(n + 1)θ

n n

X 2 X 2

Consider Xi = (Xi − θ + θ)

i=1 i=1

n n

X 2 2 X

= (Xi − θ) + nθ + 2θ (Xi − θ)

i=1 i=1

n

X 2 2

= (Xi − θ) + 2θnx̄ − nθ

i=1

n

X 2 2 2

Eθ Xi = Eθ [ns ] + 2θnEθ [X̄] − nθ

i=1

2 2 2 2 2

= Eθ [ns ] + 2nθ − nθ = Eθ [ns ] + nθ

2 1 X 2

where s = (xi − θ)

n

Pn 2

ns2 i=1 (Xi −θ)

Let Y = σ2 = θ2 ∼ χ2 distribution with n degrees of freedom. Y has

78

Probability Models and their Parametric Estimation

the pdf G( n2 , 12 )

1 n

(

1

1

e− 2 y y 2 −1 0<y<∞

p(y) = 22 Γn

2

0 otherwise

Z ∞

1 1 n

E[Y ] = n e− 2 y y 2 +1−1 dy = n

0 2 2 Γ n2

ns2

i.e., Eθ = n

σ2

Eθ [s2 ] = θ2 since σ 2 = θ2

n

X

Eθ [ Xi2 ] = nθ2 + nθ2 = 2nθ2

i=1

" n #2 " n #

X X

Eθ [g(X)] = 2Eθ Xi − (n + 1)Eθ Xi2

i=1 i=1

= 2n(n + 1)θ2 − (n + 1)2nθ2 = 0

→ g(x) = 0 not for all x

i.e., g(x) 6= 0 for some x

n

!2 n

!

X X

2

i.e., g(x) = 2 xi − (n + 1) xi 6= 0

i=1 i=1

n

!2 n

!

X X

i.e., 2 xi 6= (n + 1) x2i for some x, n = 2, 3, · · ·

i=1 i=1

Example 3.9 Show that the family of distributions given by the pdf ’s

θ if 0 < x < θ

pθ (x) = (1 + θ) if θ ≤ x < 1, 0 < θ < 1

0 otherwise

79

A. Santhakumaran

is complete.

Consider Eθ [g(X)] = 0

Z θ Z 1

θg(x)dx + (1 + θ)g(x)dx = 0+0

0 θ

Z θ

⇒ g(x)dx = 0 and

0

Z 1

g(x)dx = 0

θ

One can differentiate the above integrals with respect to θ

Z θ

0dx + g(θ) × 1 − g(0) × 0 = 0 and

0

Z 1

0dx + g(1) × 0 − g(θ) × 1 = 0

θ

g(θ) = 0 and −g(θ) = 0 ∀ θ > 0

i.e., g(x) = 0 ∀ 0 < x < θ, 0 < θ < 1

Thus the family of distributions is complete.

Definition 3.3 A statistic T = t(X) is said to be bounded complete statistic, if

there exists a function |g(T )| ≤ M, M ∈ < such that E[g(T )] = 0 ⇒ g(t) =

0 ∀ t ∈ <.

Example 3.9 Show that Completeness implies bounded completeness, but

bounded completeness does not imply completeness.

Proof: Assume T = t(X) is a complete statistic. That is E[g(T )] = 0 ⇒

g(t) = 0 ∀ t ∈ < . Prove that g(T ) is bounded complete.

V [g(T )]

P {|g(T ) − E[g(T )]| < } ≥ 1− for every given > 0

2

V [g(T )]

P {|g(T )| < } ≥ 1− for every given > 0

2

⇒ |g(t)| < ∀ t ∈ <

at least with probability 1 − V [g(T

2

)]

. This means that g(T ) is bounded with

E[g(T )] = 0 ⇒ g(t) = 0 ∀ t ∈ < . i.e., T = t(X) is a bounded complete

statistic.

Assume T = t(X) is a bounded complete statistic. To prove that T is not a

complete statistic.

θ x = −1

Consider a family of density functions pθ (x) = (1 − θ)2 θx x = 0, 1, 2, · · ·

0 otherwise

Consider a function

x x = −1, 0, 1, 2, · · · , n and ∀ n ∈ N

g(x) =

0 x = n + 1, n + 2, · · ·

80

Probability Models and their Parametric Estimation

Now the function g(x) = x is bounded. If the family is bounded complete, then

∞

X

xpθ (x) = 0

x=−1

n

X

−1 × θ + x(1 − θ)2 θx = 0

0

n

X θ

xθx = = θ(1 − θ)−2

x=0

(1 − θ)2

Xn

xθx = θ[1 + 2θ + 3θ2 + · · · ]

x=0

= [θ + 2θ2 + 3θ3 + · · · ]

X∞ X∞

= xθx = xθx

x=1 x=0

Xn X∞

= xθx + xθx

x=0 x=n+1

∞

X

⇒ xθx = 0

x=n+1

81

A. Santhakumaran

Eθ [g(X)] = 0

∞

X

i.e., g(x)pθ (x) = 0

x=−1

∞

X

g(−1)θ + g(x)(1 − θ)2 θx = 0

x=0

X∞

g(x)(1 − θ)2 θx = −g(−1)θ

x=0

∞

X −g(−1)θ

g(x)θx = = −g(−1)θ(1 − θ)−2

x=0

(1 − θ)2

X∞

g(x)θx = −g(−1)θ[1 + 2θ + 3θ2 + · · · +

x=0

nθn−1 + (n + 1)θn + · · · ]

= −g(−1)[θ + 2θ2 + 3θ3 + · · · +

nθn + (n + 1)θn+1 + · · · ]

X∞

= −g(−1) xθx

x=1

X∞

= −g(−1) xθx

x=0

⇒ g(x) = −g(−1)x = cx where c = −g(−1) and c ∈ <

Thus the family of distributions is not complete.

Example 3.11 Examine the family of distributions given by Pθ {X = −1} =

θ2 , Pθ {X = 0} = 1 − θ and Pθ {X = 1} = θ(1 − θ), 0 < θ < 1 is complete.

Consider Eθ [g(X)] = 0

2

g(−1) × θ + g(0)θ(1 − θ) + g(1)(1 − θ) = 0

θ2 [g(−1) − g(0)] + θ[g(0) − g(1)] + g(1)] = 0

g(−1) − g(0) = 0 coefficient of θ2

⇒ g(−1) = g(0)

g(0) − g(1) = 0 coefficient of θ

⇒ g(0) = g(1)

g(1) = 0 coefficient of θ0

Hence g(−1) = g(1) = g(0) = 0 . Thus g(x) = 0 for x = −1, 0 and 1. .˙. The

family of distributions is complete.

Example 3.12 X has the following distribution

X =x: 0 1 2

Pθ {X = x} 1 − θ − θ2 θ θ2

82

Probability Models and their Parametric Estimation

Consider Eθ [g(X)] = 0

g(0)[1 − θ − θ ] + g(1)θ + g(2)θ2 = 0

2

2

θ [g(2) − g(0)] + θ[g(1) − g(0)] + g(0) = 0

g(2) − g(0) = 0 coefficient of θ2

g(1) − g(0) = 0 coefficient of θ

g(0) = 0 coefficient of θ0

Hence g(0) = g(1) = g(2) = 0 , i.e., g(x) = 0 for x = 0, 1 and 2. Thus the family

of distributions is complete.

Example 3.13 X has the following distribution

X =x: 1 2 3 4 5 6

Pθ {X = x} 61 16 16 61 16 16

Examine whether the family of pmf ’s is complete.

Define

c when x = 1, 3, 5

g(x) =

−c when x = 2, 4, 6

Consider E[g(X)] = 0

3c 3c

⇒ − + = 0

6 6

But g(x) 6= 0 for x = 1, 2, 3, 4, 5, 6.

Example 3.14

Show that the family of pmf ’s {pN (x), x = 1, 2, · · · , N and ∀ N = 1, 2, 3, · · · }

is complete.

The pmf of a random variable X is

1

pN (x) = x = 1, 2, · · · , N and ∀ N = 1, 2, · · ·

N

i.e., pN =1 (x) = 1 x = 1

1

pN =2 (x) = x = 1, 2

2

1

pN =3 (x) = x = 1, 2, 3

3

······ ··· ······

······ ··· ······

Consider EN g(X) = 0 ∀ N ∈ I+

PN

i.e., x=1 g(x) N1 = 0 ⇒ g(x) = 0 ∀ x and ∀ N

When N = 1 ⇒ g(1) = 0

When N = 2 ⇒ g(1) + g(2) = 0 ⇒ g(2) = 0 since g(1) = 0

When N = 3 ⇒ g(3) = 0 since g(1) + g(2) = 0 and so on.

83

A. Santhakumaran

Thus the discrete family of uniform distributions defined on the sample {x | x =

1, 2, 3, · · · , N and N ∈ I+ } is complete.

Example 3.15 Examine whether the family of pmf ’s {pN (x), x =

1, 2, · · · , N and N = 2, 3, · · · } is complete.

PN

Consider x=1 g(x) 1 = 0 when N = 2, 3, · · ·

P2 N

When N = 2 ⇒ x=1 g(x) = 0 ⇒ g(1) + g(2) = 0 i.e., g(2) = −g(1)

When N = 3 ⇒ g(1) + g(2) + g(3) = 0

⇒ g(3) = 0 and so on. Thus EN [g(x)] = 0 ⇒ g(x) 6= 0 for x = 1 and 2 ,

i.e., g(2) = −g(1) and

g(x) = 0 ∀ x = 3, 4, · · · , N and ∀ N = 2, 3, 4, · · ·

Thus the family of distributions is not complete.

Remark 3.1 Completeness is a property of a family of distributions. As in the

example 3.15, one can see that the exclusion of even one member from the family

{pN (x), x = 1, 2, · · · , N and N = 1, 2, · · · } destroys completeness.

Remark 3.2 For the example 3.15, define

c if x = 1

g(x) = −c if x = 2

0 if x = 3, 4, 5, · · ·

PN 1

then x=1 g(x) N = 0 when N = 2, 3, · · ·

⇒ g(x) = 0 ∀ x = 1, 2, 3, · · · , N and N = 2, 3, · · · . This means that the family of

distributions is bounded complete. Thus there exist is a class of unbiased estimators of

zero, i.e., U0 = {g(X) | c ∈ <} where

(−1)x−1 c X = 1, 2 and c ∈ <

g(x) =

0 otherwise

If the family of distributions is complete, then the unbiased estimator of zero is unique.

Consider a random sample (X1 , X2 , X3 , · · · , Xn ) from a iid discrete popula-

tion with probability function pθ (x), θ ∈ Ω. The statistic T = {X1 = x1 , x2 =

x2 , · · · , Xn−1 = xn−1 } is not sufficient. For

P {X1 = x1 , · · · , Xn = xn | X1 = x1 , X2 = x2 , · · · , Xn−1 = xn−1 } = P {Xn = xn }

= P {Xn = xn }

= pθ (xn )

This conditional probability pθ (xn ) given the value of T , i.e., X1 =

x1 , · · · , Xn−1 = xn−1 is just the probability function of the nth observation which

does depend on θ . One uses a statistic means that there is a reduction of a given sam-

ple. It usually simplifies the methodology and the theory, and how much data can be

reduced without sacrificing sufficiency.

84

Probability Models and their Parametric Estimation

reduction of the partition of the sample space defined by the statistic is not sufficient.

Lehmann and Scheffe technique for obtaining a minimal sufficient statistic is par-

tition of the sample space. Once the partition is obtained, a minimal sufficient statistic

can be defined by assigning distinct numbers to distinct partition sets.

In constructing sets of a partition that is to be sufficient for the family of den-

sities pθ (x), for θ ∈ Ω, there is two sets of sample points X1 = x1 , · · · , Xn = xn

and Y1 = y1 , Y2 = y2 , · · · , Yn = yn will lie on the same partition of the minimal

sufficient partition iff the ratio of x1 , x2 , · · · , xn to its value at y1 , y2 , · · · , yn :

pθ (x1 , x2 · · · , xn )

= k(y1 , · · · , yn ; x1 , x2 , · · · , xn )

pθ (y1 , y2 , · · · , yn )

dependent of θ, θ ∈ Ω

The reason for writing the definition in terms of a product rather than a ratio

is taken into account the points for which pθ (x1 , x2 , · · · , xn ) = 0 , i.e., all points

x1 , x2 , · · · , xn such that pθ (x1 , x2 , · · · , xn ) = 0 ∀ θ ∈ Ω will be equivalent, and

every x1 , x2 · · · , xn will be lie in some partition D , namely in D(x1 , x2 , · · · , xn )

and there is no overlapping of the D ’s, so that they constitute a partition of the sample

space. For if two D ’s, say D(x1 , x2 , · · · , xn ) and D(y1 , y2 , · · · , yn ) have a point

z1 , z2 , · · · , zn in common, then z1 , z2 , · · · , zn is equivalent to both x1 , x2 , · · · , xn

and y1 , y2 , · · · , yn which are then equivalent to each other and define the same D .

Thus the partition of the sample space D defines the minimal sufficient partition.

Example 3.16 Let X1 , X2 , · · · Xn be iid random sample drawn from a Binomial

population b(n, θ). Obtaining the minimal sufficient statistic by partition method.

The joint pdf at X1 = x1 , X2 = x2 , · · · , Xn = xn is

P P

xi

pθ (x1 , x2 , · · · , xn ) = θ (1 − θ)n− xi

P P

yi

pθ (y1 , y2 , · · · , yn ) = θ (1 − θ)n− yi

The ratio is P xi =P yi

pθ (x1 , x2 , · · · , xn ) θ

= .

pθ (y1 , y2 , · · · , yn ) 1−θ

P P

The ratio is independent of θ iff xi = yi . Thus the points x1 , x2 , · · · , xn and

y1 , y2 , · · · ,P

yn whose coordinates have the same set of minimal sufficient partition.

Therefore Xi is a minimal sufficient statistic.

Example 3.17 Let X1 , X2 , · · · , Xn be iid PrandomP sample from N (θ, σ 2 ) . As-

2 2

sume θ and σ are unknown. Prove that ( Xi , Xi ) is a minimal sufficient

statistic.

85

A. Santhakumaran

X i

pθ,σ2 (x1 , x2 , · · · , xn ) 1 hX 2 X 2 X

= exp − 2 xi − yi − 2θ xi − yi

pθ,σ2 (y1 , y2 , · · · , yn ) 2σ

2

P P P 2

The

P 2ratio is independent P of Pthe2parameters (θ, σ ) iff xi = yi and xi =

yi . Therefore ( Xi , Xi ) is a minimal sufficient statistic.

Example 3.18 Determine the minimal sufficient statistic based on a random sample

of size from each of the following:

(i) −θx

θe θ>0

pθ (x) =

0 otherwise

(ii)

x2

x

pθ (x) = θ exp[− 2θ ] x>0

0 otherwise

and (iii)

2

( q x

2 x2 − σ

π σ3 e x>0

2

pσ (x) =

0 otherwise

(i) Consider the ratio

pθ (x1 , x2 , · · · , xn ) h X X i

= exp −θ xi − yi .

pθ (y1 , y2 , · · · , yn )

P P P

The ratio is independent of θ iff xi = yi . Therefore Xi is a minimal

sufficient statistic.

(ii) Consider the ratio

pθ (x1 , x2 , · · · , xn ) Y xi 1 X 2 X 2

= exp − xi − yi .

pθ (y1 , y2 , · · · , yn ) yi 2θ

P 2 P 2 P 2

The ratio is independent of the parameter θ iff xi = yi . Therefore Xi is a

minimal sufficient statistic.

(iii) Consider the ratio

pσ (x1 , x2 , · · · , xn ) Y x2i

1 X 2 X 2

= 2 exp[− 2 ( xi − yi )]

pθ (y1 , y2 , · · · , yn ) yi 2σ

P 2 P 2 P 2

The ratio is independent of σ iff xi = yi . Therefore Xi is a minimal

sufficient statistic.

Theorem 3.2 The Exponential family of distributions consists of those distri-

butions with densities or probability functions expressible in the form: pθ (x) =

c(θ)eQ(θ)t(x) h(x), i.e., pθ (x) is a member of exponential family, then there exist is

a minimal sufficient statistic.

Proof: The joint density function of the random sample X1 , X2 , · · · , Xn for a

random variable X is

X Y

pθ (x1 , x2 , · · · , xn ) = [c(θ)]n exp[Q(θ) t(xi )] h(xi ).

86

Probability Models and their Parametric Estimation

pθ (x1 , x2 , · · · , xn ) h nX X oi Y h(x )

i

= exp Q(θ) t(xi ) − t(yi ) .

pθ (y1 , y2 , · · · , yn ) h(yi )

P P P

This is independent of θ iff t(xi ) = t(yi ) . Therefore T = t(Xi ) is a

minimal sufficient statistic.

Remark 3.3 A complete sufficient statistics is minimal sufficient whenever mini-

mal sufficient statistic exists.

Theorem 3.3 Let pθ0 (x) and pθ1 (x)) be the densities and they have the same

p (X)

support ( the range of the two densities are the same). Then the statistic T = pθθ1 (X)

0

is minimal sufficient.

Proof: The necessary and sufficient condition that T = t(X) is a sufficient

statistic for fixed θ1 and θ0 are

and

pθ0 (x1 , x2 · · · , xn ) = pθ0 (t)h(x1 , x2 , · · · , xn )

pθ1 (x1 ,x2 ,··· ,xn ) pθ1 (t)

respectively. Let the ratio pθ0 (x1 ,x2 ,··· ,xn ) = pθ0 (t) be a function of u(x) , then

p (X)

U = u(X1 , X2 , · · · , Xn ) is a sufficient statistic for pθθ1 (X)

iff T is a function of U .

0

This proves T = t(X) to be minimal sufficient statistic.

If P is a family of distributions with common support and P0 ⊂ P and if

T = t(X) is minimal sufficient statistic for P0 and sufficient for P , it is minimal

sufficient for P .

Example 3.18 Let P ∼ N (θ, 1) and P0 ∼ N (θ0 , 1) and P0 ⊂ P . Let

X1 , X2 , · · · , Xn be a random sample of size n . Then

1

(xi −θ)2

P

pθ (x1 , x2 , · · · , xn ) e− 2

= 1

P

(xi −θ0 )2

pθ (x1 , x2 · · · , xn ) e− 2

1 2 2

= e 2 [2n(θ−θ0 )x̄−n(θ −θ0 )]

Thus T = X̄ is the minimal sufficient statistic for N (θ, 1). Example 3.19 Let

X1 , X2 , · · · , Xn be a random sample from a population defined by the Cauchy density

with parameter θ :

1

π[1+(x−θ)2 ] −∞ < x < ∞

pθ (x) =

0 otherwise − ∞ < θ < ∞

Two sets of sample points x1 , x2 , · · · , xn and y1 , y2 , · · · , yn will lie on the same

partition of the minimal sufficient partition iff the ratio

n

1 + (yj − θ)2

pθ (x1 , x2 , · · · , xn ) Y

=

p(y1 ,2 , · · · , yn ) j=1

1 + (xj − θ)2

87

A. Santhakumaran

in θ The ratio is independent of θ , the two polynomials are identical ( the leading

coefficients being

√ equal). This means that the set of zeroes of the numerator polynomial

yj + i ( i = −1, j = 1, 2, · · · , n ) is the same as the set of zeroes of the polynomial,

xj + i ( j = 1, 2, · · · , n ). This is true iff the real numbers ( x1 , x2 , · · · , xn ) are a

permutaion of the numbers ( y1 , y2 , · · · , yn ). A partition set of the minimal sufficient

partition consists of the n! permutations of n real numbers. This minimal sufficient

partition is defined by the order statistic ( X(1) , X(2) , · · · , X(n) ).

Problems

3.1 X has the following distribution

X =x: 0 1 2

Pθ {X = x} 1 − θ − θ2 θ2 θ

Prove that the family of distributions is complete.

3.2 Let X1 , X2 , · · · , Xn be a sample from pmf

1

x = 1, 2, 3, · · · , N ; N ∈ I+

p(x | N ) = PN {X = x} = N

0 otherwise

3.3 Let X1 , X2 , · · · , Xn be iid random variables from ∪(0, θ) . Prove that the

statistic YN = max1≤i≤n {Xi } is complete.

3.4 Consider the class of Hypergeometric probability distributions, {PD , D =

0, 1, 2, · · · , N } where

N-D

D

x n-x

x = 0, 1, · · · , min(n, D)

N

PD {X = x} =

n

0 otherwise

3.5 Examine if the family of distributions

θ if 0 < x ≤ 1

pθ (x) =

1 − θ if 1 < x ≤ 2

is complete.

3.6 Let X1 , X2 , · · · , Xn be a sample from ∪(θ − 12 , θ + 12 ), θ ∈ < . Show that the

statistic T = (min1≤i≤n (Xi ), max1≤i≤n (Xi )) is not complete.

2

3.7 Let X1 , X2 , · · · , Xn be a sample of n independent

Pn observations

Pn from N (θ, σ )

2 2

−∞ < θ < ∞, 0 < σ < ∞ . Show that i=1 Xi , i=1 Xi is a sufficient

statistic. Is it complete? Justify?

88

Probability Models and their Parametric Estimation

exp[θx + w(x) + A(θ)] −∞ < x < ∞

pθ (x) =

0 otherwise

3.9 Prove that a complete sufficient statistics is minimal sufficient whenever minimal

sufficient statistic exists.

3.10 Explain the method of construction of minimal sufficient statistic.

3.11 If a family P is complete, then it is possible to conclude that completeness for

(a) a larger class

(b) an equal class

(c) a small class

(d) none of these

3.12 For a fixed n0 = 1, 2, · · · from the family of densities {pN : N ∈ I+ }. Let

P = {pN : N ∈ I+ and N 6= n0 } where

1

N x = 1, 2, · · · , N ; N ∈ I+

pN (x) =

0 otherwise

then

(a) P is complete

(b) P is not complete

(c) P is bounded complete

(d) P is not bounded complete

3.13 If a complete sufficient statistic does not exist, then UMVUE

(a) may not exist

(b) may exist

(c) may unique

(d) none of the above

3.14 If a complete sufficient statistic exists, then UMVUE is

(a) unique

(b) not unique

(c) not exist

(d) none of the above

89

A. Santhakumaran

4. OPTIMAL ESTIMATION

4.1 Introduction

Let g(T ) be an unbiased estimator of τ (θ) and δ(T ) be an another unbiased

estimator of τ (θ) different from g(T ) . Then there always exists an infinite number of

unbiased estimators of τ (θ) such that λg(T ) + (1 − λ)δ(T ), 0 < λ < 1 . In this case

one can find the best estimator or optimal estimator among all the unbiased estimators.

The following procedures are used to identify the optimal estimator.

Let U = {δi (T ), i = 1, 2, 3, · · · } be the set of all unbiased estimators of the

parameter τ (θ) ∀ θ ∈ Ω and Vθ [δi (T )] < ∞, i = 1, 2, 3, · · · ∀ θ ∈ Ω and g(T )

be a statistic with Vθ [g(T )] < ∞ . Then the estimator g(T ) is called an Uniformly

Minimum Variance Unbiased Estimator (UMVUE) of τ (θ) if

Eθ [g(T ) − τ (θ)]2 ≤ Eθ [δi (T ) − τ (θ)]2

i.e., Vθ [g(T )] ≤ Vθ [δi (T )] ∀ θ ∈ Ω and ∀ i = 1, 2, 3, · · ·

• Uncorrelatedness Approach.

• Rao Blackwell or Lehman Scheffe’s Approach.

• Inequality Approach.

It is a mathematical property based on the uncorrelatedness of estimators. The

following result gives a necessary and sufficient condition for an unbiased estimator to

be UMVUE.

Theorem 4.1 Let U be the class of all unbiased estimators T = t(X) of a

parameter τ (θ) ∀ θ ∈ Ω with Eθ [T 2 ] < ∞ for all θ . Suppose that U is a non-

empty set. Let U0 be the set of all unbiased estimators of V of zero, i.e.,

U0 = {V | Eθ [V ] = 0, Eθ [V 2 ] < ∞ ∀ θ ∈ Ω}

U0 .

Proof: Let T ∈ U and V ∈ U0 . Assume that T = t(X) is a UMVUE of τ (θ) .

Prove that Eθ [V T ] = 0 ∀ θ ∈ Ω. That is, Covθ [V, T ] = 0 ∀ θ ∈ Ω and V ∈ U0 .

Consider T + λ V ∈ U for some real λ (λ 6= 0) , then

Eθ [T + λ V ] = τ (θ) + λEθ [V ]

= τ (θ) since Eθ [V ] = 0

90

Probability Models and their Parametric Estimation

and ∀ λ .

Vθ [T ] + λ2 Vθ [V ] + 2λCovθ [V, T ] ≥ Vθ [T ]

i.e., 2λCovθ [T, V ] + λ2 Vθ [V ] ≥ 0 ∀ θ and ∀ λ

It is an quadratic equation in λ and it has two real roots λ = 0 and λ = − 2Cov θ [T,V ]

Vθ [V ] .

If λ = 0, trivially T is an UMVUE of τ (θ) .

For λ 6= 0, take λ0 = λ2 = − Cov θ [T,V ] 0

Eθ [V 2 ] , then one can define T ∈ U where T =

0

2

Vθ [T 0 ] = Eθ [T + λ0 V ]2 − {Eθ [T + λ0 V ]}

= Eθ [T + λ0 V ]2 − τ 2 (θ)

2

= Eθ [T 2 ] − τ 2 (θ) + λ0 Eθ [V 2 ] + 2λ0 Covθ [T, V ]

2 2

(Covθ [T, V ]) (Covθ [T, V ])

= Vθ [T ] + Eθ [V 2 ] − 2

(Eθ [V 2 ])2 Eθ [V 2 ]

(Covθ [T, V ])2

= Vθ [T ] −

Eθ [V 2 ]

(Covθ [T, V ])2

Vθ [T 0 ] = Vθ [T ] − ≤ Vθ [T ]

Eθ [V 2 ]

Thus λ0 = − EEθθ[T V]

[V 2 ] contradicts that T is the UMVUE of τ (θ) . If T is the UMVUE

of τ (θ) , then Covθ [T, V ] = 0, i.e., Eθ [T V ] = 0 ∀ θ ∈ Ω .

Conversely, assume Covθ [T, V ] = 0 for some θ ∈ Ω . To prove that T is a

UMVUE of τ (θ) . Let T 0 be another unbiased estimator of τ (θ) so that T 0 ∈ U,

then T 0 − T ∈ U0 . Since Eθ [T ] = τ (θ) and Eθ [T 0 ] = τ (θ) →

Eθ [T 0 − T ] = 0

⇒ Eθ [T (T 0 − T )] = 0

Eθ [T T 0 ] = Eθ [T 2 ]

Applying Cauchy Schwarz inequality to Eθ [T 0 T ]

2 2

{Eθ [T T 0 ]} ≤ Eθ [T 2 ]Eθ [T 0 ]

1 n o 12

2

Eθ [T T 0 ] ≤ Eθ [T 2 ] 2 Eθ [T 0 ]

Eθ [T 2 ] n

2

o 21

1 ≤ Eθ [T 0 ]

{Eθ [T 2 ]} 2

Vθ [T ] ≤ Vθ [T 0 ]

Theorem 4.2 Let U be the non-empty class of unbiased estimators as in the

Theorem 4.1, then there exists at most one UMVUE of τ (θ) .

91

A. Santhakumaran

UMVUE of τ (θ) .

Eθ [T 0 ] = τ (θ)

Eθ [T ] = τ (θ)

⇒ Eθ [T 0 − T ] = 0

⇒ Eθ [T (T 0 − T )] = 0

i.e., Eθ [T 2 ] = Eθ [T T 0 ]

Covθ [T, T 0 ] = Vθ [T ]

The correlation coefficient between T and T 0 is given by

Covθ [T, T 0 ] Vθ [T ]

= =1

Vθ [T 0 ]

p p

Vθ [T ] Vθ [T 0 ]

since Vθ [T ] = Vθ [T 0 ]

⇒ Pθ {aT + bT 0 = 0} = 1 ∀ a, b ∈ <

Choose a = 1 and b = −1

⇒ Pθ {T = T 0 } = 1, then T and T 0 are the same. .˙. The UMVUE T is unique.

Theorem 4.3 If UMVUE’s Ti = ti (X), i = 1, 2 exist for real function τ1 (θ)

and τ2 (θ) of θ , then aT1 + bT2 is also UMVUE of aτ1 (θ) + bτ2 (θ) .

Proof: Given T1 = t1 (X) is a UMVUE of τ1 (θ) , i.e., Eθ [T1 V ] = 0 ∀ θ ∈ Ω

and V ∈ U0 . Again Eθ [T2 V ] = 0, ∀ θ ∈ Ω and V ∈ U0 .

Prove that Eθ {[(aT1 + bT2 )V ]} = 0 ∀ θ ∈ Ω.

Covθ [(aT1 + bT2 )V ] = Eθ [(aT1 V ) + (bT2 V )]

−Eθ [aT1 + bT2 ]Eθ [V ] since Eθ [V ] = 0

= Eθ [aT1 V ] + Eθ [bT2 V ]

= aCovθ [T1 , V ] + bCovθ [T2 , V ]

= a×0+b×0=0

Thus aT1 + bT2 is a UMVUE of aτ1 (θ) + bτ2 (θ) .

Theorem 4.4 Let {Tn = tn (X)} be a sequence of UMVUE’s of τ (θ) and

T = t(X) be a statistic with Eθ [T 2 ] < ∞ and such that Eθ [Tn − T ]2 → 0 as

n → ∞ ∀ θ ∈ Ω . Then T = t(X) is also the UMVUE of τ (θ) .

Proof: Given {Tn }∞ n=1 is a UMVUE of τ (θ), i.e., Eθ [Tn V ] = 0 ∀ n =

1, 2, 3, · · · ∀ θ and Eθ [V ] = 0 ∀ θ . Prove that T is also an UMVUE of τ (θ) ,

i.e., Eθ [T V ] = 0 ∀ θ and Eθ [V ] = 0 ∀ θ .

Consider Eθ [T − τ (θ)] = Eθ [T − Tn + Tn − τ (θ)]

|Eθ [T − τ (θ)]| ≤ |Eθ [T − Tn ]| + |Eθ [Tn − τ (θ)]|

≤ Eθ |T − Tn | since Eθ |Tn − τ (θ)| ≥ 0

1 1

Eθ [T − Tn ]2 2 since |Eθ [T − Tn ]| ≤ Eθ [T − Tn ]2 2

i.e., |Eθ [T − τ (θ)]| ≤

Consider Covθ [T, V ] = Eθ [T V ] − 0

= Eθ [T V ] − Eθ [Tn V ]

Eθ [T V ] = Eθ [(T − Tn )V ]

92

Probability Models and their Parametric Estimation

1 1

|Eθ [(T − Tn )V ]| ≤ Eθ [V 2 ] 2 Eθ [T − Tn ]2 2

But Eθ [(T − Tn )V ] = Eθ [T V ]

1 1

.. . |Eθ [T V ]| ≤ Eθ [V 2 ] 2 Eθ [T − Tn ]2 2

1

But Eθ [T − Tn ]2 2 ≥ 0 and

Eθ [T − Tn ]2 → 0 as n → ∞

.. . Eθ [T V ] → 0 as n → ∞

i.e., Covθ [T, V ] = 0 as n → ∞ ∀ θ ∈ Ω.

Example 4.1 if T1 = t1 (X) and T2 = t2 (X) are UMVUE of τ (θ) , show that

the correlation coefficient between T1 and T2 is one.

Given Eθ [T1 ] = τ (θ) and Eθ [T2 ] = τ (θ) for θ ∈ Ω

Vθ [T1 ] = Vθ [T2 ] for θ ∈ Ω .

Consider a new estimator T = 12 [T1 + T2 ] which is also the unbiased estimator of

τ (θ) , i.e.,

1 1

Eθ [T ] = Eθ [T1 ] + Eθ [T2 ]

2 2

1 1

= τ (θ) + τ (θ)

2 2

= τ (θ)

1

Vθ [T ] = Vθ [T1 + T2 ]

2

1

= {Vθ [T1 ] + Vθ [T2 ] + 2Covθ [T, T2 ]}

4

1n p o

= Vθ [T1 ] + Vθ [T2 ] + 2ρ Vθ [T1 ] + Vθ [T2 ]

4

1

= {2Vθ [T1 ] + +2ρVθ [T1 ]}

4

1

= Vθ [T1 ](1 + ρ)

2

Since T1 is the UMVUE of τ (θ)

⇒ Vθ [T ] ≥ Vθ [T1 ]

1

Vθ [T1 ](1 + ρ) ≥ Vθ [T1 ]

2

(1 + ρ) ≥ 2

ρ ≥1

93

A. Santhakumaran

Example 4.2 Let X1 , X2 , · · · , Xn be a sample from a population with mean θ

Pn finite variance, and T be an estimate of θ of the form T (X1 , X2 , · · · , Xn )0 =

and

i=1 αi Xi . If T is an unbiased estimate of θ that has minimum variance and T is

another linear unbiased estimate of θ , show that

Covθ (T, T 0 ) = Vθ [T ]

Pn

Given T = i=1 αi Xi is the unbiased estimator of θ , Eθ [T ] = θ .

Also T 0 is the unbiased estimator of θ , i.e., Eθ = [T 0 ].

Eθ [T ] = θ

Eθ [T 0 ] = θ

Eθ [T − T 0 ] = 0

Eθ [T [T − T 0 ] = 0

Eθ [T 2 − T T 0 ] = 0

Eθ [T 2 ] − Eθ [T T 0 ] = 0

Eθ [T T 2 ] = Eθ [T 2 ]

i.e., Covθ (T, T 0 ) = Vθ [T ]

ασ 2 (α > 1) , where σ 2 is the variance of the UMVUE. Show that the correlation

coefficient between T1 and T2 is greater than or equal to 2−α α .

Given Eθ [T1 ] = τ (θ) and Vθ [T1 ] = ασ 2 .

Also Eθ [T2 ] = τ (θ) and Vθ [T2 ] = ασ 2 where α > 1

Since T1 and T2 are UMVUE’s of τ (θ) , We have Vθ [T1 ] = Vθ [T2 ] = ασ 2

Consider anestimator 12 [T1 + T2 ] . It is also an unbiased estimator of τ (θ) , i.e.,

Eθ [T1 ] = 21 Eθ [T1 ] + 21 Eθ [T2 ] = 12 τ (θ) + 12 τ (θ) = τ (θ)

1 1

Vθ [ (T1 + T2 )] = {Vθ [T1 ] + Vθ [T2 ] + 2Covθ (T1 , T2 )}

2 4

1n p o

= Vθ [T1 ] + Vθ [T2 ] + 2ρ Vθ [T1 ]Vθ [T2 ]

4

1

= [2Vθ [T1 ] + 2ρVθ [T2 ]]

4

1

= [Vθ [T1 ] + ρVθ [T1 ]

2

where ρ is the correlation coefficient between T1 and T2 Let T be the UMVUE of

94

Probability Models and their Parametric Estimation

1

Vθ { [T1 + T2 ]} ≥ Vθ [T ]

2

1

i.e., Vθ [T1 ](1 + ρ) ≥ Vθ [T ]

2

1 2

ασ (1 + ρ) ≥ σ2

2

α(1 + ρ) ≥ 2

2

(1 + ρ) ≥

α

2−α

ρ ≥

α

Rao - Blackwell Theorem helps to search for an UMVUE T = t(X) of a para-

metric function τ (θ). Let δ(T ) be another statistic and a function of the sufficient

statistic T = t(X) which is an unbiased estimator for the parametric function τ (θ) ,

i.e., Eθ [δ(T )] = τ (θ) . Rao - Blackwell Theorem improves on δ(T ) by conditioning

on the sufficient statistic T = t. That is, computing E[δ(T ) | T = t] = g(t) so

that Eθ [g(T )] = τ (θ) with smaller variance than that of δ(T ) . Also it states that the

conditioning on the sufficient statistic T = t(X) is made irrespective of any unbiased

estimator δ(T ) of τ (θ) .

Theorem 4.5 Let {Pθ , θ ∈ Ω} be a family of probability distributions and δ(T )

be any statistic in U where U is the non-empty class of all unbiased estimators of

τ (θ) with Eθ [δ 2 (T )] < ∞ . Let T = t(X) be a sufficient statistic for {Pθ , θ ∈

Ω} . Then the conditional expectation E[δ(T ) | T = t] = g(t) is independent of θ

and g(T ) is an unbiased estimator of τ (θ) . Also Eθ [g(T ) − τ (θ)]2 ≤ Eθ [δ(T ) −

τ (θ)]2 ∀ θ ∈ Ω.

Proof: Given that δ(T ) is a unbiased estimator of τ (θ) , ∀ θ ∈ Ω and δ(T ) is a

function of a sufficient statistic T . E[δ(T ) | T = t] = g(t) and the statistic g(T ) is

an unbiased estimator of τ (θ) , since Eθ [E {δ(T ) | T }] = Eθ [δ(T )] = τ (θ) ∀ θ ∈ Ω.

Now prove that Eθ [g(T ) − τ (θ)]2 ≤ Eθ [δ(T ) − τ (θ)]2 ∀ θ ∈ Ω .

It is enough to prove that Eθ [g 2 (T )] ≤ Eθ [δ 2 (T )] ∀ θ ∈ Ω.

95

A. Santhakumaran

{E[δ(T ) | T 1 | T ]}2 ≤ E[δ 2 (T ) | T ]E[12 | T ]

2

i.e., {E[δ(T ) | T ]} ≤ E[δ 2 (T ) | T ]

i.e., g 2 (t) ≤ E[δ 2 (T ) | T ]

2

Eθ E[δ 2 (T ) | T ] = Eθ [δ 2 (T )]

i.e., Eθ [g (T )] ≤

2

→ Eθ [g(T ) − τ (θ)] ≤ Eθ [δ(T ) − τ (θ)]2 ∀ θ ∈ Ω

The inequality becomes equality iff

Eθ [δ 2 (T )] = Eθ [g 2 (T )]

2

Eθ {E[δ(T ) | T ]}2

i.e., Eθ E[δ (T ) | T ] =

since E[E[X 2 | Y ]] = E[X 2 ] and g(t) = E[δ(T ) | T ]

E[δ 2 (T ) | T ] − {E[δ(T ) | T ]}2

Eθ = 0

Eθ [V ar[δ(T ) | T ]] = 0

V ar[δ(T ) | T ] = 0 iff E[δ 2 (T ) | T ] = {E[δ(T ) | T ]}2

If this is the case , then E[δ(T ) | T = t] = g(t) and the statistic g(T ) is a function

of T.

Remark 4.1 The Rao - Blackwell Theorem has the following limitations.

(i) If the unbiased estimator T = t(X) is already a function of only one sufficient

statistic, then the derived statistic is identical to T = t(X) . In this case there is

no improvement in the variance of the statistic T = t(X) .

(ii) If more than one sufficient statistic exists, then one can improve the variance of the

unbiased estimator by using minimal sufficient statistics, since the set of jointly

sufficient statistic is an arbitrary set. To add the concept of completeness to

derive the statistic which is unique and may identify the UMVUE’s. This leads

to Lehman - Scheffe Theorem.

The Theorem states that if a complete sufficient statistic exists, then the UMVUE

of τ (θ) is unique. But it does not mean that only the complete sufficient statistic has

UMVUE. Even if a complete sufficient statistic does not exist, an UMVUE may still

exist.

Theorem 4.6 If T = t(X) is a complete sufficient statistic and there exists an

unbiased estimator δ(T ) of τ (θ) , then there exists a unique UMVUE of τ (θ) which

is given by E[δ(T ) | T = t] = g(t) .

Proof: Rao - Blackwell Theorem gives E[δ(T ) | T = t] = g(t) and g(T )

is the UMVUE of τ (θ) . It is only to prove that g(T ) is unique. If δ1 (T ) ∈ U and

δ2 (T ) ∈ U , then Eθ [E[δ1 (T ) | T ]] = τ (θ) and Eθ [E[δ2 (T ) | T ]] = τ (θ) ∀ θ ∈ Ω.

Eθ [E[δ1 (T ) | T ] − E[δ2 (T ) | T ]] = 0 ∀ θ ∈ Ω

Since T = t(X) is a complete sufficient statistic

⇒ E[δ1 (T ) | T ] − E[δ2 (T ) | T ] = 0

E[δ1 (T ) | T ] = E[δ2 (T ) | T ]

96

Probability Models and their Parametric Estimation

.˙. The UMVUE g(T ) is unique, if the sufficient statistic T = t(X) is complete.

From the above Theorems 4.5 and 4.6 the UMVUE of τ (θ) is obtained by

solving a set of equations and conditioning on the sufficient statistic.

Solving a set of equations of the sufficient statistic

Let Pθ , θ ∈ Ω be a distribution of random variable X . If T is a complete suf-

ficient statistic, then the UMVUE g(T ) of any parametric function τ (θ) is uniquely

determined by solving the set of equations Eθ [g(T )] = τ (θ) ∀ θ ∈ Ω .

Conditioning on the sufficient statistic

If a random variable X has a distribution Pθ , θ ∈ Ω and δ(T ) is any unbiased

estimator of τ (θ) and T = t(X) is complete sufficient statistic, then the UMVUE

g(T ) can be obtained by conditional expectation of δ(T ) given T = t , i.e., g(t) =

E[δ(T ) | T = t].

Example 4.4 Obtain the UMVUE of θ + 2 for the pmf of the Poisson distribu-

tion −θ θx

e x! x = 0, 1, 2, · · ·

p(x | θ) =

0 otherwise

by taking a sample of size n .

n

X

Let T = Xi , thenT ∼ P (nθ)

i=1

−nθ

e (nθ)t

p(t | θ) = t = 0, 1, 2, · · ·

t!

= 0 otherwise

1

p(t | θ) = e−nθ et log nθ

t!

= c(θ)eQ(θ)t(x) h(x)

Pn

where c(θ) = e−nθ , Q(θ) = log nθ, t(x) = i=1 xi , h(x) = 1

t! . .˙. The statistic

97

A. Santhakumaran

Eθ [g(T )] = θ+2

∞

X 1

g(t)e−nθ (nθ)t = θ+2

t=0

t!

∞

X 1

g(t)nt θt = (θ + 2)enθ

t=0

t!

∞

X (nθ)t

= (θ + 2)

t=0

t!

∞ ∞

X

t t+1 1 X 1

= nθ +2 nt θ t

t=0

t! t=0

t!

Equivating the coefficient of θt on both sides

nt nt−1 nt

g(t) = +2

t! (t − 1)! t!

t

g(t) = +2

n

P

xi

= +2

n

= x̄ + 2

Example 4.5 Let Xi (i = 1 to n) be a sample from Poisson distribution with

parameter θ . Obtain the UMVUEPof θr−1 e−rθ , r = 1, 2, 3, · · · .

n

As in the example 4.1, T = i=1 Xi is complete and sufficient. Therefore there

exists a unique UMVUE of τ (θ) = θr−1 e−rθ , r = 1, 2, · · · .

∞

X 1

g(t)e−nθ (nθ)t = θr−1 enθ−rθ

t=0

t!

∞

X (n − r)t

= θt+r−1

t=0

t!

Equivating the coefficient of θr on both sides

nt (n − r)t−r+1

g(t) =

t! (t − r + 1)!

(n − r)t−r+1 t!

g(t) = t

, r = 1, 2, · · · and n > r

n (t − r + 1)!

Thus the UMVUE of θr−1 e−rθ is

(n − r)T −r+1 T!

T

, r = 1, 2, · · · and n > r.

n (T − r + 1)!

98

Probability Models and their Parametric Estimation

t t

Remark 4.2 When r = 1, g(t) = n−1 n = 1 − n1 , n = 2, 3, · · · , then

T

1 − n1 is the unbiased estimator of e−θ where T = Xi .

P

When r = 2, (n−2)T

[1 − n2 ]T , n = 3, 4, · · · is the UMVUE of e−2θ θ where

P

T = Xi .

Example 4.6 Obtain the UMVUE of θr + (r − 1)θ , r = 1, 2, · · · for the random

sample of size n from Poisson distribution

Pn with parameter θ .

As in the example 4.1, T = i=1 Xi is complete and sufficient. There exists a

UMVUE of τ (θ) = θr + (r − 1)θ , r = 1, 2, · · ·

∞

X (nθ)t

Eθ [g(T )] = g(t)e−nθ = θr + (r − 1)θ

t=0

t!

∞ t t

X nθ

g(t) = [θr + (r − 1)θ]enθ

t=0

t!

= θr enθ + (r − 1)θenθ

∞ ∞

X nt θ t X nt θ t

= θr + (r − 1)θ

t=0

t! t=0

t!

∞ ∞

X 1 X 1

= nt θt+r + (r − 1) nt θt+1

t=0

t! t=0

t!

Equivating the coefficient of θt on both sides

nt nt−r nt−1

g(t) = + (r − 1)

t! (t − r)! (t − 1)!

1 t! 1 (r − 1)

= + t!

nr (t − r)! n (t − 1)!

t(t − 1) · · · · · · (t − r + 1) (r − 1)

= + t

nr n

The UMVUE of θr + (r − 1)θ is

T (T − 1) · · · · · · (T − r + 1) (r − 1)

g(T ) = + T, r = 1, 2, · · ·

nr n

Remark 4.3 When r = 1, X̄ is the UMVUE of θ .

When r = 2, X̄(nX̄−1)

n + X̄ is the UMVUE of θ2 + θ .

Example 4.7 Obtain UMVUE of θ(1−θ) using a random sample of size n drawn

from a Bernoulli population with parameter θ.

x

θ (1 − θ)1−x x = 0, 1

Given pθ (x) =

0 otherwise

99

A. Santhakumaran

n

X

Let T = Xi , then T ∼ b(n, θ)

i=1

i.e., pθ (x) = cnt θt (1 − θ)n−t t = 0, 1, 2, · · · ,n

t

θ

= cnt (1 − θ)n

1−θ

θ

= (1 − θ)n et log( 1−θ ) cnt

= c(θ)eQ(θ)t(x) h(x)

θ X

where c(θ) = (1 − θ)n , Q(θ) = log , t(x) = xi and h(x) = cnt .

1−θ

P

It is an one parameter exponentially family. .˙. The statistic T = Xi is complete

and sufficient. The UMVUE of θ(1 − θ) is

Eθ [g(T )] = θ(1 − θ)

∞

X

g(t)cnt θt (1 − θ)n−t = θ(1 − θ)

t=0

∞ t

X θ

g(t)cnt = θ(1 − θ)(1 − θ)−n

t=0

1−θ

θ

One can take ρ = , then

1−θ

θ 1

1+ρ = 1+ =

1−θ 1−θ

1

Thus 1 − θ =

1+ρ

ρ

→ θ =

1+ρ

∞

X

g(t)ρt cnt = ρ(1 + ρ)n−2

t=0

= ρ[1 + cn−2

1 ρ + · · · + ρn−2 ]

= ρ + cn−2

1 ρ2 + · · · + ρn−1

n−1

!

X n-2

= t-1 ρt

t=1

g(t)cnt = cn−2

t

(n − 2)! t!(n − t)!

g(t) =

(t − 1)!(n − t − 1)! n!

(n − 2)!t(t − 1)!(n − t)(n − t − 1)!

=

(t − 1)!(n − t − 1)!n(n − 1)(n − 2)!

t(n − t)

= if n = 2, 3, · · ·

n(n − 1)

100

Probability Models and their Parametric Estimation

T (n−T )

i.e., n(n−1) is the UMVUE of θ(1 − θ).

Example 4.8 Obtain the UMVUE of p1 of the pmf

pq x

x = 0, 1, · · ·

pp (x) =

0 otherwise

If xi denotes the number of trials after the (i − 1)th success up to but not

including the ith success, the probability that Xi = x is pq x for x = 0, 1, · · · and

i = 1, 2, · · · , n.

The joint pmf of X1 , X2 , · · · , Xn is

Px

pq i xi = 0, 1, · · · ; i = 1, 2, · · · , n

pp (x1 , x2 , · · · , xn ) =

0 0otherwise

P

= pelog(1−p) xi

= c(p)eQ(p)t(x) h(x)

X

where c(p) = pn , Q(p) = log(1 − p), t(x) = xi , h(x) = 1.

This is an one parameter exponentially family which is complete and sufficient. Thus

there exist an unique UMVUE of p1 . It is given by Ep [g(T )] = p1 .

Pn

The statistic T = i=1 Xi is the sum of n iid Geometric variables with

same parameter p has the Negative Binomial distribution. The pmf of T is

!

n+t-1 n t

n-1 p q t = 0, 1, · · ·

pp (t) = P {T = t} =

0 otherwise

∞

!

X n+t-1 1

g(t) n-1 pn q t =

t=0

p

∞

n+t-1 t

X

g(t) t q = (1 − q)−(n+1)

t=0

∞

n+t t

X

= t q

t=0

n+t-1

Equivating the coefficient of q t on both sidesg(t) t

n+t

= t

g(t) =

t!n! (n + t − 1)!

(n + t)(n + t − 1)!(n − 1)! t+n

= =

n(n − 1)!(n + t − 1)! n

101

A. Santhakumaran

Thus T +n

n is the UMVUE of p1 .

1

Example 4.9 For a single observation x of X , find the UMVUE of p of the

pmf x

pq x = 0, 1, · · ·

pp (x) =

0 otherwise

The pmf of the random variable is written as

where c(p) = p, Q(p) = log(1 − p), t(x) = x, h(x) = 1

sufficient. The UMVUE of p1 is given by

1

Ep [g(X)] =

p

∞

X 1

g(x)pq x =

x=0

p

X∞ ∞

X

g(x)q x = (1 − q)−2 = (x + 1)q x

x=0 x=0

→ g(x) = x+1

ExampleP 4.10 Let X1 , X2 , · · · Xn be iid N (θ, 1) . Prove that E[X1 | Y ] = x̄

n

where Y = i=1 Xi .

The sample mean X̄ ∼ N (θ, n1 ) and Eθ [X1 ] = θ, ∀ θ ∈ Ω.

The pdf of the sample size n is

n

Y

pθ (x1 , x2 , · · · , xn ) = p(xi | θ)

i=1

n

1 1

P 2

= √ e− 2 (xi −θ)

2π

n

1 1

P 2 nθ2

= √ e− 2 xi − 2 +nx̄θ

2π

= c(θ)eQ(θ)t(x) h(x)

n

− nθ

2 1 1

P 2 X

where c(θ) = e 2 , h(x) = √ e− 2 xi , Q(θ) = θ and t(x) = xi .

2π

It is an one parameter exponential family. T = X̄ is complete and sufficient. The

UMVUE of θ is given by g(T ) and g(t) = E[X1 | Y ] where δ(T P ) = X1 is an

n

unbiased estimator of θ . The conditional expectation X1 on Y = i=1 Xi is a

regression line, i.e.,

σX1

E [X1 | Y ] = Eθ [X1 ] + bX1 Y (Y − Eθ [Y ]) where bX1 Y = ρ

σY

102

Probability Models and their Parametric Estimation

Pn

and ρ is the correlation coefficient between X1 and Y = i=1 Xi

Cov[X1 , Y ]

ρ =

σX σY

X 1 √

Y = Xi ∼ N (nθ, n) σY = n, σX1 = 1

Covθ [X1 , Y ] = Eθ [X1 Y ] − Eθ [X1 ]Eθ [Y ]

" n

#

X

Eθ [X1 Y ] = Eθ X1 Xi

i=1

= Eθ [X12 ] + Eθ [X1 X2 + · · · + Xn X1 ]

= Eθ [X12 ] + Eθ [X1 ]Eθ [X2 ] + · · · + Eθ [X1 ]Eθ [Xn ]

= 1 + θ2 + (n − 1)θ2 where Vθ [X1 ] = Eθ [X12 ] − θ2

= nθ2 − θ2 + 1 + θ2

= nθ2 + 1

Covθ [X1 , Y ] = nθ2 + 1 − θnθ = 1

1 1

ρ = √ and bX1 Y =

n n

1

E[X1 | Y = y] = Eθ [X1 ] + [y − nθ]

n

y

= θ + − θ = x̄

n

E[X1 | Y = y] = x̄ and X̄ is the UMVUE of θ.

Example 4.11 Let X1 , X2 , · · · Xn be iid random sample with pdf

1

θ 0<x<θ

pθ (x) =

0 otherwise

Find the UMVUE of θ .

Let T = max {Xi }

1≤i≤n

The pdf of T is

Z t n−1

n! 1 1

pθ (t) = dx 0<t<θ

1!(n − 1)! 0 θ θ

n n−1

pθ (t) = θn t 0<t<θ

0 otherwise

The joint density of X1 , X2 , · · · , Xn is

n

1

pθ (x1 , x2 , · · · , xn ) =

θ

The conditional density of

1

pθ (x1 , x2 · · · xn ) θn 1

= n n−1 =

pθ (t) θn t ntn−1

103

A. Santhakumaran

Z θ

n n−1

Eθ [g(T )] = g(t) t dt = 0

0 θn

Z θ

g(t)tn−1 dt = 0

0

Differentiate this with respect to θ

hR i

θ

∂ 0

g(t) θnn tn−1 dt Z θ

= 0dt + g(θ)θn−1 × 1 − 0 = 0

∂θ 0

→ g(θ) = 0 ∀ θ

→ g(t) = 0 ∀ t and 0 < t < θ

Rθ

estimator of θ , since Eθ [X1 ] = 0 x1 θ1 dx1 = θ2 . The UMVUE of θ is given by

g(T ) and g(t) = E[2X1 | T = t].

When x1 = t the conditional pmf of X1 given T = t is p(x1 | t) = n1 .

When 0 < x1 < t the conditional density of X1 given T = t is

1 (n−1) n−2

pθ (x1 , t) θ θ n−1 t

pθ (x1 | t) = = n n−1 0 < x1 < t

pθ (t) nt

θn−1 1

n t 0 < x1 < t

=

0 otherwise

Z t

1

E[2X1 | T = t] = 2x1 pθ (x1 | t)dx1 + 2t

0 n

Z t

n−11 2t

= 2 x1 dx1 +

n t 0 n

n − 1 1 t2 2t

= 2 +

n t 2 n

1

= (1 + )t

n

Thus the UMVUE of θ is ( n+1 n )T where T = max1≤i≤n {Xi }.

Example 4.12 Let X1 , X2 , · · · , Xn be a random sample drawn from a distribu-

tion with probability density function

−(x−θ)

e θ<x<∞

pθ (x) =

0 otherwise

104

Probability Models and their Parametric Estimation

1≤i≤n

The pdf of T is

Z ∞ n−1

n! −(t−θ) −(x−θ)

pθ (t) = e e dx

1!(n − 1)! θ

ne−n(t−θ) θ < t < ∞

=

0 otherwise

Eθ [g(T )] = 0

Z ∞

g(t)ne−n(t−θ) dt = 0

θ

Z ∞

g(t)e−n(t−θ) dt = 0

θ

z=∞

Z ∞

g(z + θ)e−nz dz = 0

0

Z ∞

It is same as f (t)e−st dt = 0

0

i.e., g(t) = 0 ∀ 0 < t − θ < ∞

= 0∀ θ<t<∞

Z ∞

Eθ [X1 ] = x1 e−(x1 −θ) dx1

θ

Z ∞

= (z + θ)e−z dz where z = x1 − θ

Z0 ∞ Z ∞

= e−z z 2−1 dz + θ e−z z 1−1 dz

0 0

= Γ2 + θΓ1

= 1+θ

Eθ [X1 − 1] = θ

If one can take δ(T ) = X1 − 1 , then the UMVUE of θ is given by g(T ) and

g(t) = E[(X1 − 1) | T = t].

When x1 = t, the conditional pmf of X1 given T = t is pθ (x1 | t) = n1 .

105

A. Santhakumaran

e−(x1 −θ) (n − 1)e−(n−1)(t−θ)

pθ (x1 | t) =

ne−n(t−θ)

n − 1 −(x1 −t)

= e

n Z ∞

(n − 1) 1

E[(X1 − 1) | T = t] = (x1 − 1)e−(x1 −t) dx1 + (t − 1)

n t n

Z ∞ Z ∞

n−1 n−1

= x1 e−(x1 −t) dx1 − e−(x1 −t) dx1

n t n t

1

+ (t − 1)

n Z

n−1 ∞ n − 1 ∞ −z

Z

= (z + t)e−z dz − e dz

n 0 n 0

1

+ (t − 1) where z = x1 − t

n Z

n − 1 ∞ −z 2−1

Z ∞

n−1

= e z dz + t e−z z 1−1 dz

n 0 n 0

n − 1 ∞ −z 1−1

Z

1

− e z dz + (t − 1)

n 0 n

n−1 n−1 n−1 1

= Γ2 + t− Γ1 + (t − 1)

n n n n

n−1 1

= t + (t − 1)

n n

1

= t−

n

1

The UMVUE of θ is T − n1 and the UMVUE of eθ is e{T − n }.

Example 4.13 Let X1 and X2 be a random sample drawn from a population with

pdf 1 −x

θe

θ 0<x<∞

pθ (x) =

0 otherwise

Obtain the UMVUE of θ .

1 − 1 (x1 +x2 )

pθ (x1 , x2 ) = e θ

θ2

1 −1t

= e θ

θ

= c(θ)eQ(θ)t(x) h(x)

2

1 1 X

where c(θ) = , Q(θ) = − , t(x) = xi , h(x) = 1

θ2 θ i=1

106

Probability Models and their Parametric Estimation

∂x1 ∂x1

and t1 = x2 , then x1 = t−t1 and x2 = t1 . ∂t = 1, ∂t1 = −1, ∂x ∂x2

∂t = 0, ∂t1 = 1

2

∂x1 ∂x1

∂t ∂t1

J =

∂x2 ∂x2

∂t ∂t1

1 −1

=

0 1

1 − θ1 t

θ2 e 0 < t1 < t < ∞

p(t, t1 | θ) =

0 otherwise

( 0 < t1 < t or

1 − θ1 t

= θ2 e t1 < t < ∞

0 otherwise

The pdf of T is

Z t

pθ (t) = p(t, t1 | θ)dt1

0

Z t

1 1

= e− θ t dt1

θ2 0

1 −1t

= e θ t 0<t<∞

θ2

− θ1 t 2−1

1

θ 2 Γ2 e t 0<t<∞

=

0 otherwise

The pdf of T1 is

Z ∞

1 −1t

pθ (t1 ) = e θ dt

t1 θ2

" 1 #∞

1 e− θ t

=

θ − θ1

t1

1 − 1 t1

= e θ 0 < t1 < ∞

θ

The conditional density of T1 given T = t is

1

0 < t1 < t

p(t1 | t) = t

0 otherwise

Eθ [X2 ] = θ .˙. δ(T ) = X2 = T1

is an unbiased estimator θ. Thus the UMVUE of θ is

Z t

1

E [T1 | T ] = t1 dt1

0 t

t

1 t21

=

t 2 0

t x1 + x2

= = = x̄

2 2

The UMVUE of θ is X̄.

107

A. Santhakumaran

Example 4.14 The random variables X and Y have the joint pdf

2 − 1 (x+y)

θ2 e

θ 0<x<y<∞

p(x, y | θ) =

0 otherwise

Show that

(i) Eθ [Y | X = x] = x + θ

(ii) Eθ [Y ] = Eθ [X + θ] and

(iii) Vθ [X + θ] ≤ Vθ [Y ]

The marginal density of X is

Z ∞

2 x+y

p(x | θ) = e− θ dy

θ2 x

2 − 2x

θ2 e

θ 0<x<∞

=

0 otherwise

Z y

2 x+y

pθ (y) = e− θ dy

θ2 0

2 −y 2

θe

θ − θ2 e− θ y 0<y<∞

=

0 otherwise

108

Probability Models and their Parametric Estimation

2 − x+y

θ2 e

θ

pθ (y | x) = 2 − θ2 x

e

θ 1 x − y

θe e

θ θ x<y<∞

=

0 otherwise

Z ∞

Eθ [Y | X = x] = ypθ (y | x)dy

x

x Z ∞

eθ y

= ye− θ dy

θ

Z x∞

x y

= e θ e− θ dy + x

x

= x+θ

Z ∞ Z ∞

2 y 2 2

Eθ [Y ] = ye− θ dy − ye− θ y dy

θ 0 θ 0

2 Γ2 2 Γ2

= 1 −

θ (θ) 2 θ ( θ2 )2

3

= θ

2

7θ2 5

Eθ [Y 2 ] = , Vθ (Y ) = θ2

2

Z ∞ 4

2 −2x θ

Eθ [X] = 2

e θ dx =

0 θ 2

θ 3

Eθ [X + θ] = + θ = θ = Eθ [Y ]

2 2

θ2

Vθ [X + θ] = Vθ [X] =

4

Thus Vθ [X + θ] ≤ Vθ [Y ].

1

x = 1, 2, · · · , N and N ∈ I+

p(x | N ) = N

0 otherwise

109

A. Santhakumaran

1≤i≤n

PN {X(n) ≤ x} = PN {X1 ≤ x1 , X2 ≤ x2 , · · · , Xn ≤ xn }

= PN {X1 ≤ x1 } · · · PN {Xn ≤ xn }

x x x n

= ··· =

N N N

n

x−1

PN {X(n) ≤ x − 1} =

N

PN {X(n) = x} = PN {X(n) ≤ x} − PN {X(n) ≤ x − 1}

x n x − 1 n−1

= −

N N

N

" n−1 #

n

X t t−1

EN [g(T )] = g(t) − =0

t=1

N N

g(t) = 0 ∀ t = 1, 2, · · · , N

n " n−1 #

1 1

When N = 2, g(1) − 0 + g(2) 1 − = 0

2 2

1

g(1) = 0 ⇒ g(2) 1 − n−1 = 0 ⇒ g(2) = 0 and so on

2

g(t) = 0 ∀ t = 1, 2, · · · , N

Thus the statistic X(n) is complete

110

Probability Models and their Parametric Estimation

PN {X1 = x1 ∩ X(n) = x}

=

PN {X(n) = x}

if x1 = 1, 2, · · · , (x − 1) and x1 6= x

x n−1

(N ) − ( x−1

N

)n−1 1

= x n

x−1 n

×

N

− ( N

) N

xn−1 − (x − 1)n−1

=

xn − (x − 1)n

if x1 = 1, 2, · · · , (x − 1) and x1 6= x

PN {X1 = x1 ∩ X(n) = x}

PN {X1 = x1 | X(n) = x} = if x1 = x

PN {X(n) = x}

x n−1

N 1

= x n ×

(N ) − ( x−1

N

)n N

xn−1

= if x1 = x

xn − (x − 1)n

Thus X(n) is a sufficient statistic.

N

X 1

EN [X1 ] = x1

x1 =1

N

1 N (N + 1) N +1

= =

N 2 2

EN [2X1 ] = N +1

EN [2X1 − 1] = N

.

. . δ(T ) = 2X1 − 1 is an unbiased estimator of N.

The UMVUE of N is given by

E[(2X1 − 1) | X(n) = x]

x−1

X

= (2x1 − 1)PN {X1 = x1 | X(n) = x}

x1 =1

x−1

xn−1 − (x − 1)n−1 X

= (2x1 − 1)

xn − (x − 1)n x =1

1

xn−1

+ n (2x − 1)

x − (x − 1)n

x−1

xn−1 X

= (2x1 − 1)

xn − (x − 1) x =1

n

1

n−1

x

+ (2x − 1)

xn − (x − 1)n

x−1

(x − 1)n−1 X

− (2x1 − 1)

xn − (x − 1)n x =1

1

xn−1

= [1 + 3 + 5 + · · · + (2x − 1)]

x − (x − 1)n

n

111

(x − 1)n−1

− [1 + 3 + · · · + (2x − 3)]

xn − (x − 1)n

A. Santhakumaran

−(2 + 4 + · · · + 2x)

2x(2x + 1) x(x + 1)

= −2×

2 2

= x(2x + 1) − x(x + 1) = x2

1 + 3 + · · · + (2x − 3) = 1 + 2 + · · · + (2x − 2) − [2 + 4 + · · · + (2x − 2)]

(2x − 2)(2x − 1) 2(x − 1)x

= −

2 2

= (2x − 1)(x − 1) − x(x − 1) = (x − 1)2

xn−1 (x − 1)n−1

x2 − n (x − 1)2

E 2X1 − 1 | X(n) = x = n n

x − (x − 1) x − (x − 1)n

xn+1 (x − 1)n+1

= −

xn − (x − 1)n xn − (x − 1)n

n+1 n+1

x − (x − 1)

=

xn − (x − 1)n

n+1 n+1

Thus the UMVUE of N is X X n −(X−1)

−(X−1)n .

Remark 4.4 In Chapter 3 , Example 3.14 is not complete, but it is bounded com-

plete. The class of unbiased estimators of zero is

U0 = {g(X) | c ∈ <}

where

c(−1)x−1

if x = 1, 2

g(x) =

0 x = 3, 4, · · · , N ; N = 2, 3, · · ·

By Theorem 4.7, CovN [δ(T ), g(X)] = 0 for N = 2, 3, · · · implies that δ(T ) is a

UMVUE of N where T = t(X) . That is

EN [δ(t(X))g(X)] = 0 N = 2, 3, · · · , ∀ c ∈ <

N

X 1

δ(t(x))g(x) = 0 N = 2, 3, · · · , ∀ c ∈ <

x=1

N

N

X

⇒ δ(t(x))g(x) = 0 N = 2, 3, · · · , ∀ c ∈ <

x=1

i.e., δ(t(1))c − δ(t(2))c = 0 ∀ c ∈ <

If one can take c = 1 , then δ(t(1)) = δ(t(2)).

.˙. Any estimator δ(T ) such that δ(t(1)) = δ(t(2)) is a UMVUE of N , provided

EN [δ 2 (T )] < ∞, for N = 2, 3, · · · . Thus a family of distributions is bounded com-

plete, then there is a class of UMVUE’s.

Example 4.16 Let X1 , X2 , · · · , Xn be a random sample of size n from a distri-

bution with pdf 1 −x

θe

θ 0 < x < ∞, θ > 0

pθ (x) =

0 otherwise

112

Probability Models and their Parametric Estimation

The joint pdf of the sample size n is

1 − ni=1 xi

P

p(x1 , x2 , · · · , xn | θ) e = θ

θn

= c(θ)eQ(θ)t(x) h(x)

Pn

It is an one parameter exponential family. The statistic T = i=1 Xi is complete and

sufficient.

Pθ {X ≥ 2} = 1 − Pθ {X < 2}

Z 2

1 −x

= 1− e θ dx

0 θ

2

= eZ− θ

∞

1 − x 2−1

Eθ [X1 ] = e θ x1 dx1 = θ

0 θ

Xn

Let T = Xi , thenT ∼ G(n, θ)

i=1

− θ1 t n−1

1

pθ (t) = θ n Γn e t t>0

0 otherwise

n

X

Let y = xi , then

i=2

1 − 1 x1 1 1

= e θ n−1

e− θ y y n−2

θ θ Γ(n − 1)

1 − θ1

Pn

i=1 xi y n−2

= e

θn Γ(n − 1)

1 1

= e− θ t [t − x1 ]n−2 where y = t − x1

θn Γ(n − 1)

(

1 − θ1 t n−2

θ n Γ(n−1) e [t − x1 ] 0 < x1 ≤ t < ∞

pθ (x1 , t) =

0 otherwise

113

A. Santhakumaran

1 − θ1 t n−2

θ n Γ(n−1) e [t − x1 ]

pθ (x1 | t) = 1 − θ1 t tn−1

θ n Γn e

1

= (n − 1)[t − x1 ]n−2 n−1

t

(n − 1) 1t [1 − xt1 ]n−2

0 < x1 < t

=

0 otherwise

The UMVUE of θ is

t

n−1h

Z

x1 in−2

E[X1 | T = t] = x1 1− dx1

0 t t

Z t h

n−1 x1 in−2

= x1 1 − dx1

t 0 t

x1

One can take z = , then dx1 = tdz

t

When x1 = t ⇒ z = 1; when x1 = 0 ⇒ z = 0

Z 1

n−1

E[X1 | T = t] = (tz)[1 − z]n−2 tdz

t 0

Z 1

= (n − 1)t (1 − z)n−1−1 z 2−1 dz

0

Γ2Γ(n − 1) t nx̄

= (n − 1)t = = = x̄

Γ(n − 1 + 2) n n

2

The UMVUE of Pθ {X ≥ 2} is e− X̄

Example 4.17 Let X1 , X2 , · · · , Xn be a random sample from N (θ, σ 2 ) . Both

θ and σ are unknown. Find the UMVUE of σ and pth quantile.

(n−1)S 2 X̄)2

P

Let Y = σ2 = (Xσi − 2 ∼ χ2 distribution with (n − 1) degrees of

freedom. Y ∼ G( 12 , (n−1)

2 ).

( 1 n−1

n−1

1

e− 2 y y 2 −1 0<y<∞

p(y) = 2 2 Γ n−1

2

0 otherwise

√ Z ∞

1 1 n

E[ Y ] = n−1 e− 2 y y 2 −1 dy

0 2 Γ n−1

2

2

1 Γ n2

= n

( 12 ) 2

n−1

2 2 Γ n−1

2

"r #

n−1 2 Γ n2 √

i.e., Eσ S = 2

σ2 Γ n−1

2

Γ n2 √ σ

→ Eσ [S] = 2√

Γ n−1

2 n −1

1 Γ n−1

q

2

= σ where k(n) = Γn

2

n −1

k(n) 2

114

Probability Models and their Parametric Estimation

n

1 1

P 2 P 2

p(x1 , x2 , · · · , xn | θ, σ) = √ e− 2σ2 [ xi −2θ xi +nθ ]

2πσ

n

1 1

P 2 θ

P nθ 2

= √ e− 2σ2 xi e σ2 xi e− 2σ2

2πσ

P2

= c(θ1 , θ2 )e j=1 Qj (θ1 ,θ2 )tj (x) h(x) where θ1 = θ, θ2 = σ

n

1 n 2 1 θ

and c(θ1 , θ2 ) = √ e− 2σ2 θ , Q1 (θ1 , θ2 ) = − 2 , Q2 (θ1 , θ2 ) = 2

2πσ 2σ σ

Hence T1 = Xi , T2 = Xi2 and T = (T1 , T2 ) is jointly sufficient and complete.

P P

But there is a one to one function also sufficient. .˙. T = (X̄, S 2 ) isP also sufficient

1

and complete. Thus the UMVUE of σ is k(n)S where S 2 = n−1 (Xi − X̄)2 .

th

The UMVUE of p quantile δp is given by

p = Pθ,σ {X ≤ δp }

X −θ δp − θ

= Pθ,σ ≤

σ σ

δp − θ X−θ

= Pθ,σ Z ≤ where Z = σ ∼ N (0, 1)

σ

Z δ−θσ

p = p(z)dz

0

Z ∞

i.e., 1 − p = δ −θ

p(z)dz

p

σ

δp − θ

⇒ = z1−p ⇒ δp = z1−p σ + θ

σ

Thus the UMVUE of δp is Z1−p k(n)S + X̄ .

Under some regularity conditions Cramer - Rao inequality provides a lower bound

for the variance of unbiased estimators. It may enable us to judge a given unbiased

estimator is an UMVUE or not. That is, the variance of an unbiased estimator coincides

with the Cramer - Rao lower bound, then the estimator is the UMVUE.

Covariance Inequality

Theorem 4.7 The covariance inequality between two functions T = t(X) and

ψ(X, θ) is defined as

2

{Covθ [T, ψ(X, θ)]}

Vθ [T ] ≥ ∀θ∈Ω

Vθ [ψ(X, θ)]

where ψ(X, θ) is a function of X and θ and T = t(X) is a statistic with pdf

pθ (t) .

115

A. Santhakumaran

2

{E[X − E[X]][Y − E[Y ]]} ≤ E[X − E[X]]2 E[Y − E[Y ]]2

2

(Cov[X, Y ]) ≤ V [X]V [Y ]

2

(Covθ [T, ψ(X, θ)]) ≤ Vθ [T ]Vθ [ψ(X, θ)]

2

{Covθ [T, ψ(X, θ)]}

Vθ [T ] ≥ ∀θ∈Ω

Vθ [ψ(X, θ)]

function

∂pθ (x) 1 ∂ log pθ (x)

ψ(x, θ) = =

∂θ pθ (x) ∂θ

is the relative rate at which the density pθ (x) changes at x . The average of the square

of this range is defined by

2 2

p0θ (x)

Z

∂ log pθ (X)

I(θ) = Eθ = pθ (x)dx

∂θ pθ (x)

Likelihood Function

Definition 4.2 Consider a random sample X1 , X2 , · · · , Xn from a distribution

having pdf pθ (x), θ ∈ Ω . The joint probability density function of X1 , X2 , · · · , Xn

with a parameter θ is p(x1 , x2 , · · · , xn | θ) . The joint probability density function

may be regarded as a function of θ is called the likelihood function of the random

sample and is denoted by L(θ) = pθ (x1 , x2 , · · · , xn ) θ ∈ Ω.

Property 4.1 Let IX (θ) and IY (θ) be the amount of information of two inde-

pendent samples (X1 , X2 , · · · , Xn ) and (Y1 , Y2 , · · · Yn ) respectively. Let IXY (θ)

be the amount of information of the joint sample (X1 , Y1 )(X2 , Y2 ), · · · , (Xn , Yn ) .

Then IXY (θ) = IX (θ) + IY (θ) . This is known as additive property of Fisher Mea-

sure of Information.

116

Probability Models and their Parametric Estimation

Proof:

= pθ (x1 )pθ (y1 ) · · · pθ (xn )pθ (yn )

Yn Y n

= pθ (xi ) pθ (yi )

i=1 i=1

= LX (θ)LY (θ)

log LXY (θ) = log LX (θ) + log LY (θ)

Differentiate this with respect to θ

∂ log LXY (θ) ∂ log LX (θ) ∂ log LY (θ)

= +

∂θ ∂θ ∂θ

∂ log LXY (θ) ∂ log LX (θ) ∂ log LY (θ)

Vθ = Vθ + Vθ

∂θ ∂θ ∂θ

IXY (θ) = IX (θ) + IY (θ)

tion with density function pθ (x), θ ∈ Ω . Let I(θ) be the amount of information for

each Xi , i = 1, 2, · · · , n . Then the amount of information of (X1 , X2 , · · · , Xn ) is

nI(θ) .

Proof:

The likelihood function for θ of the sample size n is

n

Y

L(θ) = pθ (xi )

i=1

Xn

log L( θ) = log pθ (xi )

i=1

n

∂ log L(θ) X ∂ log pθ (xi )

=

∂θ i=1

∂θ

" n #

∂ log L(θ) X ∂ log pθ (Xi )

Vθ = Vθ

∂θ i=1

∂θ

n

X ∂ log pθ (Xi )

= Vθ since Xi0 s iid

i=1

∂θ

n

X

= I(θ) = nI(θ)

i=1

The amount of information of X1 , X2 , · · · , Xn is = nI(θ)

∂ log pθ (X)

where I(θ) = Vθ ∀∈Ω

∂θ

Property 4.3 Let X1 , X2 , · · · , Xn be iid random sample drawn from a popula-

tion with density function p(x | θ), θ ∈ Ω . Let IX (θ) be the amount of information of

117

A. Santhakumaran

T = t(X) . Then IX (θ) ≥ IT (θ) . If T = t(X) is sufficient, then IX (θ) = IT (θ).

Proof: For a single observation x of X

Z

∂ log pθ (X) ∂ log pθ (x)

Eθ = pθ (x)dx

∂θ ∂θ

Z

∂pθ (x) 1

= pθ (x)dx

∂θ pθ (x)

Z

∂pθ (x)

= dx

∂θ

Z

∂

= pθ (x)dx

∂θ

Z Z

∂ ∂pθ (x)

Assume pθ (x)dx = dx and make the transformation T = X

∂θ ∂θ

∂ log pθ (X) ∂ log pθ (T ) ∂ log pθ (x) ∂ log pθ (t)

then Eθ = Eθ since =

∂θ ∂θ ∂θ ∂θ

2

∂ log pθ (X) ∂ log pθ (T )

Consider Eθ − ≥0

∂θ ∂θ

2 2

∂ log pθ (X) ∂ log pθ (T ) ∂ log pθ (X) ∂ log pθ (T )

Eθ + Eθ − 2Eθ ≥0

∂θ ∂θ ∂θ ∂θ

2

∂ log pθ (T )

IX (θ) + IT (θ) − 2Eθ ≥ 0

∂θ

IX (θ) + IT (θ) − 2IT (θ) ≥ 0

IX (θ) − IT (θ) ≥ 0

IX (θ) ≥ IT (θ)

Suppose T = t(X) is a sufficient statistic, then

pθ (x) = pθ (t)h(x)

log pθ (x) = log pθ (t) + log h(x)

Differentiate this with respect to θ

∂ log pθ (x) ∂ log pθ (t)

=

∂θ ∂θ

∂ log pθ (X) ∂ log pθ (T )

Vθ = Vθ

∂θ ∂θ

⇒ IX (θ) = IT (θ)

When a UMVUE does not exist, one may interest on a Locally Minimum Variance

Unbiased Estimator (LMVUE) which gives the smallest variance that an unbiased es-

timator can achieve at θ = θ0 . This is helpful to measure the performance of a given

118

Probability Models and their Parametric Estimation

unbiased estimator with some lower bounds of the unbiased estimator which are not

sharp. The Cramer - Rao Inequality is very simple to calculate the lower bound for the

variance of an unbiased estimator. Also it provides asymptotically efficient estimators.

The assumptions of the Cramer - Rao Inequality are

(i) Ω is an open interval ( finite , infinite or semi infinite).

∂pθ (x)

(iii) For any x and θ the derivative ∂θ exists and is finite.

Theorem 4.8 Under the assumptions (i) ,(ii) and (iii) and that I(θ) > 0 . Let

2

R statistic with Eθ [T ] < ∞ for which the derivative with respect to

T = t(X) be any

θ of Eθ [T ] = tpθ (x)dx exists can be obtained by differentiating under the integral

sign, then

h i2

∂Eθ [T ]

∂θ

Vθ [T ] ≥ ∀θ∈Ω

I(θ)

2 2

∂ log pθ (X) ∂ log pθ (X)

where I(θ) = Eθ = −Eθ

∂θ ∂θ2

pθ (x)dx = 1 is differentiated twice under the integral sign with respect to θ , then

Z

∂pθ (x)

dx = 0

∂θ

Z

∂pθ (x) 1

pθ (x)dx = 0

∂θ pθ (x)

Z

∂ log pθ (x)

pθ (x)dx = 0 (4.1)

∂θ

∂ log pθ (X)

⇒ Eθ = 0

∂θ

R ∂ 2 log pθ (x)

pθ (x)dx + ∂ log∂θpθ (x) ∂p∂θθ (x)

R

∂θ 2 dx = 0

R ∂ 2 log pθ (x) R ∂ log pθ (x) ∂ log pθ (x)

∂θ 2 pθ (x)dx + ∂θ ∂θ pθ (x)dx = 0

R ∂ 2 log pθ (x) R ∂ log pθ (x) 2

∂θ 2 pθ (x)dx + ∂θ pθ (x)dx = 0

119

A. Santhakumaran

2

∂ 2 log pθ (X)

∂ log pθ (X)

Eθ + Eθ = 0

∂θ2 ∂θ

2

∂ 2 log pθ (X)

∂ log pθ (X)

Eθ = −Eθ

∂θ ∂θ2

2 2

∂ log pθ (X) ∂ log pθ (X)

But I(θ) = Eθ = −Eθ

∂θ ∂θ2

∂ log pθ (X)

= Vθ

∂θ

Z

Now Eθ [T ] = tpθ (x)dx

Differentiate this with respect to θ

Z

∂Eθ [T ] dpθ (x)

= t dx

∂θ dθ

Z

∂pθ (x) 1

= t pθ (x)dx

∂θ pθ (x)

Z

∂Eθ [T ] ∂ log pθ (x)

= t pθ (x)dx

∂θ ∂θ

∂ log pθ (X)

= Eθ T

∂θ

∂ log pθ (X)

= Covθ T,

∂θ

∂ log pθ (X)

∵ Eθ [ ]=0

∂θ

By Covariance Inequality

2

{Covθ [T, ψ(X, θ)]}

Vθ [T ] ≥ ∀θ∈Ω

Vθ [ψ(X, θ)]

∂ log pθ (x)

Take ψ(x, θ) =

∂θ

2

∂Eθ [T ]

∂θ

then Vθ [T ] ≥ ∀θ∈Ω

Vθ [ ∂ log∂θ

pθ (X)

]

2

∂Eθ [T ]

∂θ

i.e., Vθ [T ] ≥ ∀θ∈Ω

I(θ)

120

Probability Models and their Parametric Estimation

[τ 0 (θ) + b0 (θ)]2

Vθ [T ] ≥ ∀θ∈Ω

I(θ)

τ (θ) ∀ θ ∈ Ω , then the Cramer - Rao Inequality is written as

2

[τ 0 (θ)]

Vθ [T ] ≥ ∀θ∈Ω

nI(θ)

h i

∂ log pθ (X)

where I(θ) = Vθ ∂θ of a single observation x of X.

or

2

[τ 0 (θ)]

Vθ [T ] ≥ ∀θ∈Ω

I(θ)

h i Qn

where I(θ) = Vθ ∂ log∂θL(θ) and L(θ) = i=1 pθ (xi ).

Chapman - Robbin Inequality is an improvement of Cramer - Rao Inequality, since

it does not involve regularity conditions as in Cramer - Rao Inequality. They also give

a lower bound for the variance of an unbiased estimator.

Theorem 4.9 Suppose X is distributed with density function pθ (x) and T =

t(X) is a statistic with Eθ [T ] = τ (θ) and Eθ [T 2 ] < ∞. Suppose pθ (x) > 0 ∀ x

. If θ and θ + ∆ are two values for which τ (θ) 6= τ (θ + ∆) and the function

ψ(x, θ) = pθ+∆ (x)

pθ (x) − 1 , then

[τ (θ + ∆) − τ (θ)]2

Vθ [T ] ≥ sup h i2 ∀ θ ∈ Ω

∆ Eθ pθ+∆ (X) − 1

pθ (X)

121

A. Santhakumaran

2

∂ 2 log pθ (X)

∂ log pθ (X)

Eθ + Eθ = 0

∂θ2 ∂θ

2

∂ 2 log pθ (X)

∂ log pθ (X)

Eθ = −Eθ

∂θ ∂θ2

2 2

∂ log pθ (X) ∂ log pθ (X)

But I(θ) = Eθ = −Eθ

∂θ ∂θ2

∂ log pθ (X)

= Vθ

∂θ

Z

Now Eθ [T ] = tpθ (x)dx

Differentiate this with respect to θ

Z

∂Eθ [T ] dpθ (x)

= t dx

∂θ dθ

Z

∂pθ (x) 1

= t pθ (x)dx

∂θ pθ (x)

Z

∂Eθ [T ] ∂ log pθ (x)

= t pθ (x)dx

∂θ ∂θ

∂ log pθ (X)

= Eθ T

∂θ

∂ log pθ (X) ∂ log pθ (x)

= Covθ T, ∵ Eθ [ ]=0

∂θ ∂θ

By covariance inequality

2

{Covθ [T, ψ(X, θ)]}

Vθ [T ] ≥ ∀θ∈Ω

Vθ [ψ(X, θ)]

∂ log pθ (x)

Take ψ(x, θ) =

∂θ

2

∂Eθ [T ]

∂θ

then Vθ [T ] ≥ ∀θ∈Ω

Vθ [ ∂ log∂θ

pθ (X)

]

2

∂Eθ [T ]

∂θ

i.e., Vθ [T ] ≥ ∀θ∈Ω

I(θ)

tion with density function pθ (x), θ ∈ Ω . Let IX (θ) be the amount of information of

the sample X1 , X2 , · · · , Xn and IT (θ) be the amount of information of the statistic

T = t(X) . Then IX (θ) ≥ IT (θ) . If T = t(X) is sufficient, then IX (θ) = IT (θ).

122

Probability Models and their Parametric Estimation

Z

∂ log pθ (X) ∂ log pθ (x)

Eθ = pθ (x)dx

∂θ ∂θ

Z

∂pθ (x) 1

= pθ (x)dx

∂θ pθ (x)

Z

∂pθ (x)

= dx

∂θ

Z

∂

= pθ (x)dx

∂θ

Assume

Z Z

∂ ∂pθ (x)

pθ (x)dx = dx

∂θ ∂θ

and make the transformation T = X then

∂ log pθ (X) ∂ log pθ (T )

Eθ = Eθ

∂θ ∂θ

since

∂ log pθ (x) ∂ log pθ (t)

=

∂θ ∂θ

2

∂ log pθ (X) ∂ log pθ (T )

Consider Eθ − ≥0

∂θ ∂θ

2 2

∂ log pθ (X) ∂ log pθ (T ) ∂ log pθ (X) ∂ log pθ (T )

Eθ + Eθ − 2Eθ ≥0

∂θ ∂θ ∂θ ∂θ

2

∂ log pθ (T )

IX (θ) + IT (θ) − 2Eθ ≥ 0

∂θ

IX (θ) + IT (θ) − 2IT (θ) ≥ 0

IX (θ) − IT (θ) ≥ 0

IX (θ) ≥ IT (θ)

Suppose T = t(X) is a sufficient statistic, then

pθ (x) = pθ (t)h(x)

log pθ (x) = log pθ (t) + log h(x)

Differentiate this with respect to θ

∂ log pθ (x) ∂ log pθ (t)

=

∂θ ∂θ

∂ log pθ (X) ∂ log pθ (T )

Vθ = Vθ

∂θ ∂θ

⇒ IX (θ) = IT (θ)

123

A. Santhakumaran

When a UMVUE does not exist, one may interest on a Locally Minimum Variance

Unbiased Estimator (LMVUE) which gives the smallest variance that an unbiased es-

timator can achieve at θ = θ0 . This is helpful to measure the performance of a given

unbiased estimator with some lower bounds of the unbiased estimator which are not

sharp. The Cramer - Rao Inequality is very simple to calculate the lower bound for the

variance of an unbiased estimator. Also it provides asymptotically efficient estimators.

The assumptions of the Cramer - Rao Inequality are

(ii) The range of the distribution Pθ (x) is independent of the parameter θ .

∂pθ (x)

(iii) For any x and θ the derivative ∂θ exists and is finite.

Theorem 4.8 Under the assumptions (i) ,(ii) and (iii) and that I(θ) > 0 . Let T =

2

R statistic with Eθ [T ] < ∞ for which the derivative with respect to θ of

t(X) be any

Eθ [T ] = tpθ (x)dx exists can be obtained by differentiating under the integral sign,

then

h i2

∂Eθ [T ]

∂θ

Vθ [T ] ≥ ∀θ∈Ω

I(θ)

2 2

∂ log pθ (X) ∂ log pθ (X)

where I(θ) = Eθ = −Eθ

∂θ ∂θ2

pθ (x)dx = 1 is differentiated twice under the integral sign with respect to θ , then

Z

∂pθ (x)

dx = 0

∂θ

Z

∂pθ (x) 1

pθ (x)dx = 0

∂θ pθ (x)

Z

∂ log pθ (x)

pθ (x)dx = 0 (4.1)

∂θ

∂ log pθ (X)

⇒ Eθ = 0

∂θ

R ∂ 2 log pθ (x)

pθ (x)dx + ∂ log∂θpθ (x) ∂p∂θθ (x)

R

∂θ 2 dx = 0

R ∂ 2 log pθ (x) R ∂ log pθ (x) ∂ log pθ (x)

∂θ 2 pθ (x)dx + ∂θ ∂θ pθ (x)dx = 0

124

Probability Models and their Parametric Estimation

∂θ 2 pθ (x)dx + ∂θ pθ (x)dx = 0

2

∂ 2 log pθ (X)

∂ log pθ (X)

Eθ + Eθ = 0

∂θ2 ∂θ

2

∂ 2 log pθ (X)

∂ log pθ (X)

Eθ = −Eθ

∂θ ∂θ2

2

∂ 2 log pθ (X)

∂ log pθ (X)

But I(θ) = Eθ = −Eθ

∂θ ∂θ2

∂ log pθ (X)

= Vθ

∂θ

Z

Now Eθ [T ] = tpθ (x)dx

Differentiate this with respect to θ

Z

∂Eθ [T ] dpθ (x)

= t dx

∂θ dθ

Z

∂pθ (x) 1

= t pθ (x)dx

∂θ pθ (x)

Z

∂Eθ [T ] ∂ log pθ (x)

= t pθ (x)dx

∂θ ∂θ

∂ log pθ (X)

= Eθ T

∂θ

∂ log pθ (X)

= Covθ T,

∂θ

since Eθ [ ∂ log∂θ

pθ (X)

]=0 .

By covariance inequality

2

{Covθ [T, ψ(X, θ)]}

Vθ [T ] ≥ ∀θ∈Ω

Vθ [ψ(X, θ)]

∂ log pθ (x)

Take ψ(x, θ) =

∂θ

2

∂Eθ [T ]

∂θ

then Vθ [T ] ≥ ∀θ∈Ω

Vθ [ ∂ log∂θ

pθ (X)

]

2

∂Eθ [T ]

∂θ

i.e., Vθ [T ] ≥ ∀θ∈Ω

I(θ)

125

A. Santhakumaran

τ (θ) + b(θ) , then the Cramer - Rao Inequality becomes

[τ 0 (θ) + b0 (θ)]2

Vθ [T ] ≥ ∀θ∈Ω

I(θ)

τ (θ) ∀ θ ∈ Ω , then the Cramer - Rao Inequality is written as

2

[τ 0 (θ)]

Vθ [T ] ≥ ∀θ∈Ω

nI(θ)

h i

∂ log pθ (X)

where I(θ) = Vθ ∂θ of a single observation x of X.

or

2

[τ 0 (θ)]

Vθ [T ] ≥ ∀θ∈Ω

I(θ)

h i Qn

where I(θ) = Vθ ∂ log∂θL(θ) and L(θ) = i=1 pθ (xi ).

Chapman - Robbin Inequality is an improvement of Cramer - Rao Inequality, since

it does not involve regularity conditions as in Cramer - Rao Inequality. They also give

a lower bound for the variance of an unbiased estimator.

Theorem 4.9 Suppose X is distributed with density function pθ (x) and T =

t(X) is a statistic with Eθ [T ] = τ (θ) and Eθ [T 2 ] < ∞. Suppose pθ (x) > 0 ∀ x

. If θ and θ + ∆ are two values for which τ (θ) 6= τ (θ + ∆) and the function

ψ(x, θ) = pθ+∆ (x)

pθ (x) − 1 , then

[τ (θ + ∆) − τ (θ)]2

Vθ [T ] ≥ sup h i2 ∀ θ ∈ Ω

∆ Eθ pθ+∆ (X) − 1

pθ (X)

126

Probability Models and their Parametric Estimation

Eθ [ψ(X, θ)] 0, ∀ θ ∈ Ω

=

Z

Eθ [ψ(X, θ)] = ψ(x, θ)pθ (x)dx

Z

pθ+∆ (x)

= − 1 pθ (x)dx

pθ (x)

Z

= [pθ+∆ (x) − pθ (x)]dx

= 1−1=0

Covθ [T, ψ(X, θ)] = Eθ [T ψ(X, θ)] − Eθ [T ]Eθ [ψ(X, θ)]

= Eθ [T ψ(X, θ)]

pθ+∆ (X)

= Eθ T −1

pθ (X)

pθ+∆ (x) − pθ (x)

Z

= t pθ (x)dx

pθ (x

Z Z

= tpθ+∆ (x)dx − tpθ (x)dx

= τ (θ + ∆) − τ (θ)

By covariance inequality

2

[τ (θ + ∆) − τ (θ)]

Vθ [T ] ≥ h i

Vθ pθ+∆ (X)

pθ (X) − 1

It is true for all values of ∆

[τ (θ + ∆) − τ (θ)]2

Vθ [T ] ≥ sup h

pθ+∆(X)

i

∆ V

θ pθ (X) − 1

S(θ), φ < θ, φ 6= θ , then the Chapman - Robbin Inequality becomes

[τ (φ) − τ (θ)]2

Vθ [T ] ≥ sup h i

pφ (X)

φ:S(φ)⊂S(θ) V θ pθ (X)−1

bound for the parameter θ of the pdf

1

θ 0<x<θ

pθ (x) =

0 otherwise

127

A. Santhakumaran

1

φ 0<x<φ

pφ (x) =

0 otherwise

Z φ Z θ

pφ (X) θ 0

Eθ = pθ (x)dx + 1 pθ (x)dx

pθ (X) 0 φ φ θ

Z φ

θ1

= dx = 1

0 φθ

2 Z φ 2

pφ (X) θ 1

Eθ = dx

pθ (X) 0 φ θ

θ2 1 θ

= φ=

φ2 θ φ

pφ (X) pφ (X)

Vθ −1 = Vθ

pθ (X) pθ (X)

θ θ−φ

= −1=

φ φ

The Chapman Robbin Inequality is

(φ − θ)2

Vθ [T ] ≥ sup φ

φ:S(φ)⊂S(θ) (θ − φ)

≥ sup {φ(θ − φ)}

φ:S(φ)⊂S(θ)

Let y = φ(θ − φ)

Differentiate this with respect to φ

dy

= θ − 2φ

dφ

d2 y

= −2 < 0

dφ2

d2 y dy

For maximum of y , dφ2 < 0 at the value of φ for which dφ = 0 . At φ = θ2 , y has

2

maximum. The maximum value of y is θ4 . The Chapman - Robbin lower bound for

2

the variance of the unbiased estimator of θ is θ4 .

Remark 4.6 Chapman - Robbin bound becomes the Cramer - Rao lower bound

by allowing ∆ → 0 and assume the range of the distribution is independent of the

128

Probability Models and their Parametric Estimation

∂ log pθ (x)

parameter, and the derivative ∂θ exists and finite, then

[τ (θ + ∆) − τ (θ)]2

Vθ [T ] ≥ h i2

1

Eθ [pθ+∆ (X) − pθ (X)] pθ (X)

h i2

lim∆→0 [τ (θ+∆)−τ

∆

(θ)

≥ h i2

[pθ+∆ (X)−pθ (X)] 1

Eθ lim∆ →0 ∆ pθ (X)

[τ 0 (θ)]2

≥ h i2

1

Eθ p0 (X | θ) pθ (X)

[τ 0 ]2

≥ h i2

∂ log pθ (X)

Eθ ∂θ

[τ 0 (θ)]2

≥ ∀ θ∈Ω

I(θ)

Example 4.19 Obtain the Cramer - Rao lower bound for the variance of the unbi-

ased estimator of the parameter θ of the Cauchy distribution by considering a sample

of size n .

1 1

π 1+(x−θ)2

−∞ < x < ∞, −∞ < θ < ∞

pθ (x) =

0 otherwise

1 1

For a single observation x of X, L(θ) = pθ (x) =

π 1 + (x − θ)2

log L(θ) = − log π − log[1 + (x − θ)2 ]

∂ log pθ (x) 2(x − θ)

=

∂θ 1 + (x − θ)2

2

4(x − θ)2

∂ log pθ (x)

=

∂θ [1 + (x − θ)2 ]2

2

4(X − θ)2

∂ log pθ (X)

Eθ = Eθ

∂θ [1 + (X − θ)2 ]2

Z ∞

4 (x − θ)2

= dx

π −∞ [1 + (x − θ)2 ]3

Z ∞

4 t2

= dt since t = x − θ

π −∞ (1 + t2 )3

Z ∞

8 t2

= dt

π 0 (1 + t2 )3

Z ∞ 3

4 u 2 −1

= du since t2 = u

π 0 (1 + u) 23 + 32

3 3

4 Γ2Γ2

=

π Γ3

4 1√ 1√

π 2

π2 π 1

I(θ) = =

2 2

129

A. Santhakumaran

The Cramer - Rao lower bound from the sample of size n for the variance of the

0

(θ)]2

unbiased estimator of the parameter τ (θ) = θ is [τnI(θ) = n11 = n2 .

2

Example 4.20 Let X1 , X2 , · · · , Xn is a sample from N (θ, 1) . Obtain the

Cramer - Rao lower bound for the variance of (i) θ and (ii) θ2 . Also find the un-

biased estimator of θ2 . To verify that the actual variance of the unbiased estimator of

θ2 is same as Cramer - Rao lower bound.

(i) The likelihood function for θ is

n

Y

L(θ) = pθ (xi )

i=1

n

1 1

Pn 2

= e− 2 i=1 (xi −θ)

2π

n

√ 1X

log L(θ) = −n log 2π − (xi − θ)2

2 i=1

n

∂ log L(θ) X

= (xi − θ) = n(x̄ − θ)

∂θ i=1

2

∂ log L(θ)

= n2 (x̄ − θ)2

∂θ

2

∂ log L(θ)

Eθ = n2 Eθ [X̄ − θ]2

∂θ

1

= n2 Vθ [X̄] = n2 = n = I(θ)

n

The Cramer - Rao lower bound for the variance of the unbiased estimator X̄ of τ (θ) =

0

(θ)]2

θ is [τI(θ) = n1 .

Remark 4.7 The actual variance of the statistic X̄ is Vθ [X̄] = n1 . It is same as

the Cramer - Rao lower bound. .˙. X̄ is UMVUE of θ .

(ii) The likelihood function for θ becomes

n

n 1 X √ 2

log L(θ) = − log 2π − xi − θ 2

2 2 i=1

Differentiate this with respect to θ2

n

∂ log L(θ) 1 X √

= xi − θ 2

∂θ2 2θ i=1

n

1 X 1

= (xi − θ) = n[x̄ − θ]

2θ i=1 2θ

2

n2 n2 1

∂ log L(θ) 1 2 n

Eθ = n Eθ [X̄ − θ]2 = 2 Vθ [X̄] = 2 = 2

∂θ2 4θ 2 4θ 4θ n 4θ

130

Probability Models and their Parametric Estimation

The Cramer - Rao lower bound for the variance of unbiased estimator of τ (θ) = θ2

0

(θ)]2 2

is [τI(θ) = 4θn where τ 0 (θ) = dτdθ(θ)

2 = 1.

Consider Eθ [X − θ]2 = 1

Eθ [X 2 ] − 1 = θ2

Pn 2

i=1 Xi

Eθ − 1 = θ2

n

Pn

Xi2

.. . i=1

n − 1 is the unbiased estimator of θ2 .

Pn

Xi2

Pn

(X −θ+θ)2

Pn

(X −θ)2 Pn

Consider i=1n = i=1 ni = i=1 n i + θ2 + 2θ

n i=1 (Xi − θ)

P 2 P 2

Xi Xi

Vθ −1 = Vθ

n n

2 X n

!

(Xi − θ)2

P

2θ

= Vθ + Vθ [Xi ] − 0

n n i=1

(Xi − θ)2 4θ2

P

= Vθ + 2 n since Vθ [Xi ] = 1 ∀ i = 1 to n

n n

2

4θ2

P

(Xi − θ)

= Vθ +

n n

2 2

P

ns (Xi − θ)

Define Y = 2 = 2

∼ χ2 distribution with n degrees of freedom

σ σ

n 1

The pdf of Y ∼ G ,

2 2

( 1 n

n

1

2 Γn

e− 2 y y 2 −1 0 < y < ∞

p(y) = 2 2

0 otherwise

Z ∞

1 − 21 y n

E [Y r ] = n ne y 2 +r−1 dy

0 2 2 Γ

2

1 Γ( n2 + r)

= n n

2 2 Γ n2 ( 12 ) 2 +r

2r Γ( n2 + r)

= r = 1, 2, · · ·

Γ n2

Γ( n + 1)

E[Y ] = 2 2 n =n

Γ2

E[Y 2 ] =

(n + 2)n and V [Y ] = 2n

ns2

But Y = and σ 2 = 1

σ 2

Y 2n 2

.. . Vθ [s2 ] = Vθ = 2 =

n n n

P 2 2

4θ2

Xi 4θ 2

Vθ −1 = Vθ [s2 ] + = +

n n n n

131

A. Santhakumaran

Xi2

P

4θ 2 2

The actual variance of n − 1 is n + n . Here the Cramer - Rao lower bound is

X2

P

less than the actual variance of the unbiased estimator n i − 1 of the parameter θ2 .

Note that the UMVUE of θ2 is X̄ 2 − n1 , since Eθ [X̄ 2 ] − {Eθ [X̄]}2 = n1

⇒ Eθ [X̄ 2 ] − n1 = θ2

i.e., X̄ 2 − n1 is unbiased estimator of θ2 .

Example 4.21 Given pθ (x) = θ1 , 0 < x < θ, θ > 0 . Compute the reciprocal

h i2

nEθ ∂ log∂θ pθ (X)

. Compare this with the variance of n+1

n T where T is the largest

observation of a random sample of size n for this distribution.

1

θ 0<x<θ

pθ (x) =

0 otherwise

1

log pθ (x = −

θ

∂ log pθ (x) 1

= −

∂θ θ

∂ log pθ (x) 1

=

∂θ θ2

2

∂ log pθ (X) 1

Eθ =

∂θ θ2

2

∂ log pθ (X) n

i.e., nEθ =

∂θ θ2

1 θ2

i2 =

n

h

nEθ ∂ log∂θ pθ (X)

1≤i≤n

The pdf of T is

n n−1

p(t | θ) = θn t 0<t<θ

0 otherwise

n

Eθ [T ] = θ

n+1

n+1

→ T is an unbiased estimator of θ

n

n 2

Eθ [T 2 ] = θ

n+2

2

n 2 n

Vθ [T ] = θ − θ2

n+1 n+1

nθ2

=

(n + 1)(n + 2)

θ2

n+1

Vθ T =

n n(n + 2)

n+1 θ2

The actual variance of the unbiased estimator n T is n(n+2)

132

Probability Models and their Parametric Estimation

Here the actual variance of the unbiased estimator of θ is less than the Cramer

- Rao lower bound of the estimator n+1 n T . Since the distribution is not satisfied the

assumptions of the Cramer - Rao Inequality . Note that n+1n T is the UMVUE of θ .

Example 4.22 Find the Cramer - Rao lower bound for the variance of the unbiased

estimator Pθ {X > 2} for a single observation x of X with pdf

1 −x

θe x>0θ>0

θ

pθ (x) =

0 otherwise

Z 2

1 −x

Consider τ (θ) = Pθ {X > 2} = 1 − e θ dx

0 θ

x 2

1 e− θ

= 1−

θ − θ1 0

2 2

= 1 + e− θ − 1 = e− θ

1

log pθ (x) = − log θ − x

θ

2

One can take λ = e− θ , then log λ = − θ2 i.e., θ = − log2 λ .

2 x

log pλ (x) = − log − + log λ

log λ 2

∂ log pλ (x) log λ −2 1 x1

= − (−2)(−1) (log λ) +

∂λ −2 λ 2λ

1 x

= +

λ log λ 2λ

∂ log pθ (x) θ x −2

2 = − 2 + e θ

∂ e θ− e θ 2

2

eθ

= [x − θ]

2

2

4

∂ log pθ (X) eθ

Eθ 2 = Eθ [X − θ]2

∂ e− θ 4

4

eθ 2

= θ since Eθ [X − θ]2 = θ2

4

The Cramer - Rao lower bound for the variance of the unbiased estimator of τ (θ) =

−2 4 2

e θ is θ42 e− θ , since τ 0 (θ) = ∂τ−(θ)2 = 1. The unbiased estimator of τ (θ) = e− θ

∂ e θ

is

1 if X > 2

T =

0 otherwise

4.9 Efficiency

As a consequence of Cramer - Rao Inequality, the efficient estimator is as follow:

133

A. Santhakumaran

T = t(X) is called an efficient estimator of θ iff the variance of T = t(X) attains

the Cramer - Rao lower bound.

Definition 4.4 The ratio of the actual variance of any unbiased estimator of a

parameter to the Cramer - Rao lower bound is called the efficiency of that estimator.

Actual Variance of the statistic

Efficiency =

Cramer - Rao lower bound of that statistic

Definition 4.5 An estimator is said to be efficient estimator if efficiency is one.

Definition 4.6 An estimator is said to be an asymptotic efficient estimator if effi-

ciency tends to one as n → ∞.

Using Cramer - Rao lower bound to find the efficient estimator has the follow-

ing limitations.

• UMVUE exists even the Cramer - Rao regularity conditions are not satisfied.

• UMVUE exists when the regularity conditions are satisfied but UMVUE’s are

not attained the Cramer - Rao lower bound.

Example 4.23 Let X1 , X2 , · · · , Xn be a random sample from

−θx

θe 0 < x < ∞, θ > 0

pθ (x) =

0 otherwise

L(θ) = pθ (x) = θe−θx

log L(θ) = log θ − θx

∂ log pθ (x) 1

= −x

∂θ θ

2

∂ log pθ (x) 1

= − 2

2 ∂θ2 θ

∂ log pθ (X) 1

Eθ = − 2

∂θ2 θ

The Cramer - Rao lower bound for the variance of the unbiased estimator of θ is

2

1

n 1

= θn .

θ2

n

X 1

Let T = Xi , thenT ∼ G n,

i=1

θ

θ n −θt n−1

Γn e t 0<t<∞

pθ (t) =

0 otherwise

Z ∞ n

1 θ −θt n−1−1

Eθ = e t dt

T 0 Γn

134

Probability Models and their Parametric Estimation

θn Γ(n − 1)

=

Γn θn−1

θ

=

n−1

n−1

Eθ = θ if n = 2, 3, · · ·

T

n−1

is the unbiased estimator of θ.

T

θ2

1

Eθ = if n = 3, 4, · · ·

T2 (n − 1)(n − 2)

θ2

1

Vθ =

T (n − 1)2 (n − 2)

θ2

n−1

Vθ = , if n = 3, 4, · · ·

T n−2

n−1 θ2

Actual variance of T is n−2 . Cramer - Rao lower bound of the unbiased estimator

n−1 θ2

T of θ is n.

θ2

n−2

Efficiency = θ2

n

n 1

= = 2 , n = 3, 4, · · ·

n−2 1− n

→ 1 as n → ∞

T is the UMVUE

of θ .

Theorem 4.10 A necessary and sufficient condition for an estimator to be the most

efficient is that T = t(X) is sufficient and t(x) − τ (θ) is proportional to ∂ log∂θpθ (x)

where Eθ [T ] = τ (θ) .

Proof: Assume T = t(X) is a most efficient estimator of τ (θ) and t(x)−τ (θ) ∝

∂ log pθ (x)

∂θ

∂ log pθ (x)

i.e., t(x) − τ (θ) = A(θ)

∂θ

135

A. Santhakumaran

=

A(θ) ∂θ

t(x) τ (θ) ∂ log pθ (x)

− =

A(θ) A(θ) ∂θ

Z Z Z

t(x) τ (θ)

dθ − dθ = d log pθ (x) + c(x)

A(θ) A(θ)

Z θ Z θ

1 τ (θ)

Choose dθ = Q(θ) and d(θ) = c1 (θ)

−∞ A(θ) −∞ A(θ)

Then t(x)Q(θ) − c1 (θ) − c(x) = log pθ (x)

eQ(θ)t(x)−c1 (θ)−c(x) = pθ (x)

pθ (x) = c(θ)eQ(θ)t(x) h(x)

where c(θ) = e−c1 (θ) and h(x) = e−c(x) .

Conversely, assume T = t(X) is sufficient and t(x) − τ (θ) = A(θ) ∂ log∂θpθ (x) .

Prove that T = t(X) is the most efficient estimator of τ (θ) .

∂ log pθ (x)

t(x) − τ (θ) = A(θ)

∂θ

t(x) − τ (θ) ∂ log pθ (x)

=

A(θ) ∂θ

2 2

t(x) − τ (θ) ∂ log pθ (x)

=

A(θ) ∂θ

2

1 ∂ log pθ (X)

Eθ [T − τ (θ)]2 = Eθ

[A(θ)]2 ∂θ

2

Vθ [T ] ∂ log pθ (X)

= Eθ

[A(θ)]2 ∂θ

2

2 ∂ log pθ (X)

Vθ [T ] = [A(θ)] Eθ (4.2)

∂θ

136

Probability Models and their Parametric Estimation

∂ log pθ (X)

But Eθ T, = τ 0 (θ)

∂θ

∂ log pθ (x)

i.e, Eθ (T − τ (θ)) , = τ 0 (θ)

∂θ

since Eθ [ ∂ log∂θ

pθ (x)

]=0

" 2 #

∂ log pθ (X)

Eθ A(θ) = τ 0 (θ)

∂θ

since t(x) − τ (θ) = A(θ) ∂ log∂θ

pθ (x)

2

∂ log pθ (X)

A(θ)Eθ = τ 0 (θ)

∂θ

τ 0 (θ)

i.e., A(θ) = h i2

∂ log pθ (X)

Eθ ∂θ

[τ 0 (θ)]2

From equation (4.2) →Vθ [T ] = ∀θ∈Ω

Eθ [ ∂ log∂θ

pθ (X) 2

]

Thus the actual variance of T = t(X) is equal to the Cramer - Rao lower bound.

Remark 4.8 UMVUE may be most efficient estimator. As discussed in example

4.20, n−1

T , n = 3, 4, · · · is the UMVUE of θ but not most efficient estimator of θ .

Cramer - Rao Inequality has been modified and extended in different directions. Con-

sider the first case, where θ is a vector. In second case, it may extend the inequality to

get better bounds for the variance of unbiased estimators. Bhattacharya gives a method

of having a whole sequence of non-decreasing lower bounds for the variance of an un-

biased estimator by successive differentiation of the likelihood function with respect to

the parametric function.

Lemma 4.1 For any random variables X1 , X2 , · · · , Xr with finite second mo-

ments, the covariance matrix

C = [Cov(Xi , Xj )]r×r

Pr

Proof: Assume Xi ’s are not independent. Consider the variance of i=1 ai Xi

" r # c11 · · · c1r a1

X c21 · · · c2r a2

i.e., V ai Xi = (a1 , a2 , · · · , ar )

··· ··· ··· ···

i=1

cr1 · · · crr ar

= a0 Ca ≥ 0 ∀ a0 = (a1 , a2 , · · · , ar )

→ C is positive semi definite.

137

A. Santhakumaran

" # c11 0 ··· 0 a1

r

X 0 c22 ··· 0 a2

V ai Xi = (a1 , a2 , · · · , ar )

···

· ··· · ·

i=1

0 · ··· crr ar

= a0 Ca > 0 ∀ a0 = (a1 , a2 , · · · , ar )

Lemma 4.2 Let ( X1 , X2 , · · · , Xr ) and Y have finite second moment, let

νi = Cov[Xi , Y ] and Σ be the covariance matrix of the Xi ’ s. Without loss of

0 −1

generality suppose Σ is positive definite, then ρ2 = ν VΣ[Y ] ν , ρ is the multiple corre-

lation coefficient between Y and the vector ( X1 , X2 , · · · , Xr ).

Proof: Define ρ is the correlation coefficient between a0 X and Y where a0 =

(a1 , a2 , · · · , ar ) and X0 = (X1 , X2 , · · · , Xr ),

Pr 2

{Cov [ i=1 ai Xi , Y ]}

i.e., ρ2 = Pr .

V [Y ]V [ i=1 ai Xi ]

of scale. Obtaining the unique maximum of ρ , one can impose the condition that

V [Σri=1 ai Xi ] = a0 Σa = 1 . Maximizing ρ subject to a0 Σa = 1 is equivalent

to maximizing a0 ν subject to a0 Σa = 1 . By Lagrangian multiplier method, the

Lagrangian equation is

1

L(a, λ) = a0 ν − λ[a0 Σa − 1]

2

∂L(a, λ)

= ν − λaΣ

∂a

∂L(a, λ)

The necessary condition for maximum is =0

∂a

1 −1

→ ν − λaΣ = 0 i.e., a = Σ ν

λ

1 0 −1

ν Σ ν = 1 since a0 Σa = 1

λ2 √

λ = ± ν 0 Σ−1 ν

Σ−1 ν

.. . a = √

ν 0 Σ−1 ν

" r #

X

Cov ai Xi , Y = a0 Cov [X, Y ] = a0 ν

i=1

a0 ν ν 0 Σ−1 ν

.. . ρ = p =√ p

V [Y ] ν 0 Σ−1 ν V [Y ]

ν 0 Σ−1 ν

ρ2 =

V [Y ]

138

Probability Models and their Parametric Estimation

Theorem 4.11 For any unbiased estimator T = t(X) of τ (θ) and any func-

tions ψi (x, θ) with finite second moments, then V [T ] ≥ ν 0 C −1 ν where ν 0 =

(ν1 , ν2 , · · · , νr ) and C = [cij ]r×r are defined by νi = Cov[T, ψi (X, θ)] and

cij = Cov[ψi (X, θ)ψj (X, θ)], i, j = 1, 2, · · · , r .

Proof: As in Lemma 4.2, replace Y by T and Xi by ψi (X, θ), then

ν 0 C −1 ν

ρ2 = ≤1

V [T ]

V [T ] ≥ ν 0 C −1 ν

where νi = Cov[T, ψi (X, θ)] = τi0 (θ), i = 1, 2, · · · , r, and C = Σ.

Let X be distributed with density pθ (x), θ ∈ Ω where θ is a vector, say θ =

(θ1 , θ2 , · · · , θr ).

Assumptions:

(i) Ω is an open interval ( finite or infinite or semi infinite).

(ii) The range of the distribution Pθ is independent of the parameter θ =

(θ1 , θ2 , · · · , θr ).

(iii) For any x and θ ∈ Ω and i = 1, 2, · · · , r the derivative exists and is finite.

Define the information matrix of order r

∂ log pθ (X) ∂ log p(X, θ)

I(θ) = [Iij (θ)]r×r where Iij (θ) = Eθ

∂θi ∂θj

139

A. Santhakumaran

Z

pθ (x)dx = 1

Differentiate partially with respect to θi

Z

∂pθ (x)

= 0

∂θi

Z

∂pθ (x)

pθ (x)dx = 0

∂θ

i

∂ log pθ (X)

Eθ = 0

∂θi

Z 2 Z

∂ log pθ (x) ∂ log pθ (x) 1 ∂pθ (x)

pθ (x)dx + dx = 0

∂θi ∂θj ∂θi pθ (x) ∂θj

2

∂ log pθ (X) ∂ log pθ (X) ∂ log pθ (X)

Eθ + Eθ = 0

∂θi ∂θj ∂θi ∂θj

∂ log pθ (X) ∂ log pθ (X)

Iij (θ) = Eθ

∂θi ∂θj

2

∂ log pθ (X)

= −Eθ for i 6= j and

∂θi ∂θj

2

∂ log pθ (X)

= −Eθ for i = j

∂θi2

hTheorem i4.12 Suppose that assumptions (i) to (iii) and the relation

∂ log pθ (X)

Eθ ∂θi = 0, i = 1, 2, · · · , r hold and I(θ) is positive definite. Let

T = t(X) be any statistic with REθ [T 2 ] < ∞ for which the derivative with respect to

θi , i = 1, 2, · · · , r of Eθ [T ] = tpθ (x)dx exists for each i and can be obtained by

differentiating under the integral sign. Then Vθ [T ] ≥ α0 I −1 (θ)α, where α0 is the

row vector with ith element αi = ∂E∂θθ [T i

]

, i = 1, 2, · · · , r .

Proof: As in Theorem 4.11, replace ψi (x, θ) = ∂ log∂θpiθ (x) , i = 1, 2 · · · , r and

ν = α , C = I(θ) ⇒ Vθ [T ] ≥ α0 I −1 (θ)α.

Example 4.21 Let X1 , X2 , · · · , Xn iid N( θ, σ 2 ). Obtain the information in-

equality for the parameter θ = (θ, σ 2 ) .

140

Probability Models and their Parametric Estimation

T = (T1 , T2 ) and

" #

∂Eθ [T]

τ 0 (θ1 )

α = ∂θ 1

=

∂Eθ [T] τ 0 (θ2 )

∂θ2

2

∂ log L(θ)

Iij (θ) = −Eθ i 6= j; i, j = 1, 2.

∂θi ∂θj

2

∂ log L(θ)

= −Eθ i=j

∂θi2

I11 (θ) I12 (θ)

I(θ) =

I21 (θ) I22 (θ)

2 2

∂ log L(θ) ∂ log L(θ)

I11 (θ) = −Eθ = −Eθ

∂θ1 ∂θ1 ∂θ2

" 2 #

∂ log L(θ)

= Eθ where θ1 = θ

∂θ

2

∂ log L(θ)

I12 (θ) = I21 (θ) = −Eθ

∂θ1 ∂θ2

2

∂ log L(θ)

= −Eθ where θ2 = σ 2

∂θ∂σ 2

2 2

∂ log L(θ) ∂ log L(θ)

I22 (θ) = −Eθ = −Eθ

∂θ2 ∂θ2 ∂θ∂σ 2

The likelihood function for θ is

Yn

L(θ) = pθ (xi )

i=1

n2

1 1

P 2

= 2

e− 2σ2 (xi −θ)

2πσ

n n 1 X

log L(θ) = − log 2π − log σ 2 − 2 (xi − θ)2

2 2 2σ

∂ log L(θ) 1 X

= 2 (xi − θ)

∂θ 2σ 2

n

= [x̄ − θ]

σ2

2

∂ log L(θ) 1 2

Eθ = n Eθ [X̄ − θ]2

∂θ σ4

n2 σ 2 n

I11 (θ) = =

σ 4 n σ 2

∂ 2 log L(θ)

I12 (θ) = I21 (θ) = −Eθ =0

∂θ∂σ 2

2

∂ log L(θ) n

since Eθ = − 4 Eθ [X̄ − θ] = 0

∂σ 2 ∂θ σ

∂ log L(θ) n 1 X

= − 2+ (xi − θ)2

∂σ 2 2σ

141 2(σ 2 )2

2

∂ log L(θ) n 1 X

= − 2 3 (xi − θ)2

∂(σ 2 )2 2σ 4 (σ )

2

nσ 2

∂ log L(θ) n

−Eσ2 = − 4+ 6

∂(σ 2 )2 2σ σ

n 1 n

I22 (θ) = 4

1 − =

σ 2 2σ 4

n " 2 #

σ

σ2 0 −1 n 0

I(θ) = n I (θ) = 2σ 4

0

A. Santhakumaran

2 4

i.e., Vθ [T1 ] ≥ σn and Vσ2 [T2 ] ≥ 2σn .

2

Remark 4.9 σn is the actual variance of the unbiased estimator T1 = X̄ for θ is

2σ 4

same as the Cramer - Rao lower bound of that estimator but n−1 is the actual variance

1

P n 2

of the unbiased estimator T2 = n−1 i=1 (Xi − X̄) is greater than the Cramer - Rao

lower bound of that estimator.

When the lower bound is not sharp, it can be improved by considering the higher

order derivatives of the likelihood function of the parameter θ .

Assumptions: Let X1 , X2 , · · · , Xn be distributed with pdf p(x | θ) , θ ∈ Ω .

(i) Ω is an open interval ( finite , infinite or semi infinite).

(ii) The range of the distribution Pθ , θ ∈ Ω is independent of the parameter θ .

(iii) For any x and θ ∈ Ω , the higher order derivatives

∂θ1i1 · · · ∂θsis

(vi) Define K(θ) = [Kij (θ)]s×s

" #

∂ i1 +i2 +···+is log L(θ) ∂ j1 +j2 +···+js log L(θ)

where Kij (θ) = Eθ

∂θ1i1 · · · ∂θsis ∂θ1j1 · · · ∂θsjs

Theorem 4.13 Suppose that the assumptions (i) to (iv) hold and that the covariance

matrix K(θ) is positive definite. Let T = t(X) be any statistic with Eθ [T 2 ] < ∞ for

which the higher order derivative τ i1 +i2 +···+is (θ) exists for each i = 1, 2, · · · , s and

can be obtained by differentiating under the integral sign. Then Vθ [T ] ≥ α0 K −1 (θ)α,

where α0 is row vector with elements

∂ i1 +i2 +···+is Eθ [T ] ∂ i1 +i2 +···+is log L(θ)

= Covθ T,

∂θ1i1 · · · ∂θsis ∂θ1i1 · · · ∂θsis

= τ i1 +···+is (θ)

ψi (x, θ) =

∂θ1i1 · · · ∂θsis

and C = K(θ) = [Kij (θ)]s×s and ν = α0 = ( τ 0 (θ) τ 00 (θ) · · · τ (s) (θ) ),

then Vθ [T ] ≥ α0 K −1 (θ)α

142

Probability Models and their Parametric Estimation

Example 4.25 Given that X ∼ b(n, θ) , 0 < θ < 1 . Obtain the Bhattacharya

bound for the unbiased estimator of the parameter τ (θ) = θ2 .

log L(θ) = log cnx + x log θ − (n − x) log(1 − θ)

" #

∂ i1 +i2 log L(θ) ∂ j1 +j2 log L(θ)

K(θ) = [Kij (θ)] = Eθ

∂θ1i1 ∂θ2i2 ∂θ1j1 ∂θ2j2

" #

∂ log L(θ) ∂ log L(θ) ∂ log L(θ) ∂ 2 log L(θ)

K(θ) = Eθ ∂θ ∂θ ∂θ ∂θ 2

∂ 2 log L(θ) ∂ log L(θ) ∂ 2 log L(θ) ∂ 2 log L(θ)

∂θ 2 ∂θ ∂θ 2 ∂θ 2

2

∂ log L(θ) ∂ log L(θ) ∂ 2 log L(θ)

∂θ ∂θ ∂θ 2

= Eθ

2 2

∂ 2 log L(θ) ∂ log L(θ)

∂ log L(θ)

∂θ 2 ∂θ ∂θ 2

= −

∂θ θ (1 − θ)

x − xθ − nθ + xθ (x − nθ)

= =

θ(1 − θ) θ(1 − θ)

2

(x − nθ)2

∂ log L(θ)

=

∂θ θ2 (1 − θ)2

2

(X − nθ)2

∂ log L(θ) nθ(1 − θ)

Eθ = Eθ 2 2

= 2

∂θ θ (1 − θ) θ (1 − θ)2

n

=

θ(1 − θ)

∂ log L(θ) ∂ 2 log L(θ)

2

∂ log L(θ) ∂ log L(θ)

Eθ = Eθ Eθ

∂θ ∂θ2 ∂θ ∂θ2

∂ log L(θ)

= 0 since Eθ = 0 and

∂θ

∂ 2 log L(θ)

∂ log L(θ)

Eθ E θ = 0

∂θ2 ∂θ

2 2

∂ 2 log L(θ)

2

∂ log L(θ) ∂ log L(θ) ∂ log L(θ)

Eθ = Eθ E θ

∂θ2 ∂θ2 ∂θ2 ∂θ2

2 2

∂ log L(θ) ∂ log L(θ) n

Eθ = −Eθ =−

∂θ2 ∂θ θ(1 − θ)

2 2

n2

∂ log L(θ) ∂ log L(θ)

Eθ Eθ =

∂θ2 ∂θ2 θ2 (1 − θ)2

143

A. Santhakumaran

n 0

θ(1−θ)

θ(1−θ) −1 0

K(θ) = , K (θ) = n

n2

θ 2 (1−θ)2

0 0

θ 2 (1−θ)2 n

2 0 00

τ (θ) = θ , τ (θ) = 2θ, τ (θ) = 2

θ(1−θ)

0

2θ

Vθ [T ] ≥

2θ, 2

n

θ 2 (1−θ)2 2

0

n

4θ 3 (1 − θ) 4θ 2 (1 − θ)2

≥ +

n n2

≥ Cramer - Rao lower bound of θ 2 + positive quantity

!

n! x 1

2 2

Since log L(θ) = log + log θ + (n − x) log[1 − (θ ) 2 ]

x!(n − x)! 2

∂ log L(θ) (x − nθ)

=

∂θ 2 2θ 2 (1 − θ)

∂ log L(θ) 2 (X − nθ)2

" #

Eθ = Eθ

∂θ 2 4

4θ (1 − θ) 2

nθ(1 − θ)

=

4θ 4 (1 − θ)2

n

I(θ) =

4θ 3 (1 − θ)

1

The Cramer - Rao lower bound for the variance of an unbiased estimator is I(θ) =

4θ 3 (1−θ)

n ,since τ 0 (θ) = 1.

Remark 4.10 (i) Bhattacharya Inequality becomes Cramer - Rao Inequality when

s = 1 , i.e., α1 = τ 0 (θ) and

∂ log L(θ) ∂ log L(θ)

K11 (θ) = Eθ

∂θ ∂θ

2

∂ log L(θ)

= Eθ = I(θ)

∂θ

Vθ [T ] ≥ α1 [I −1 (θ)]α1

α12

=

I(θ)

[τ 0 (θ)]2

= h i

Vθ ∂ log∂θL(θ)

(ii) When s = 2 Bhattacharya Inequality gives the non decreasing lower bound for the

variance of an unbiased estimator of τ (θ) .

The Bhattacharya Inequality is

Vθ [T ] ≥ α0 K −1 (θ)α

where α0 = (τ 0 (θ) τ 00 (θ)) and

K11 (θ) K12 (θ)

K(θ) =

K21 (θ) K22 (θ) 2×2

Vθ [T ] τ 0 (θ) τ 00 (θ)

τ 00 (θ) K21 (θ) K22 (θ)

144

Probability Models and their Parametric Estimation

2

Vθ [T ][K11 (θ)K22 (θ) − K12 (θ)] − τ 0 (θ)[τ 0 (θ)K22 (θ) − τ 00 (θ)K12 (θ)]

+ τ 00 (θ)[τ 0 (θ)K12 (θ) − τ 00 (θ)K11 (θ)] ≥ 0

2

Vθ [T ][K11 (θ)K22 (θ) − K12 (θ)] ≥ τ 0 (θ)[τ 0 (θ)K22 (θ) − τ 00 (θ)K12 (θ)] − τ 00 (θ)[τ 0 (θ)K12 (θ) −

τ 00 (θ)K11 (θ)]

≥ K 1 (θ) [τ 0 (θ)]2 K22 (θ)K11 (θ) − 2τ 0 (θ)τ 00 (θ)K11 (θ)K12 (θ) + [τ 00 (θ)]2 K11

2

(θ)

11

≥

1

[τ 0 (θ)]2 K12

2

(θ) + [τ 0 (θ)]2 K22 (θ)K11 (θ) − 2τ 0 (θ)τ 00 (θ)K11 (θ)K12 (θ) + [τ 00 (θ)]2 K11

2

(θ) − [τ 0 (θ)]2 K12

2

K11 (θ)

(θ)

≥ K 1 (θ) [τ 0 (θ)K12 (θ) − τ 00 (θ)K11 (θ)]2 + [τ 0 (θ)]2 [K11 (θ)K22 (θ) − K12 2

(θ)]

11

Vθ [T ] ≥ [τ 0 (θ)]2 + 2 (θ)]

K11 (θ) K11 (θ)[K11 (θ)K22 (θ) − K12

≥ Cramer - Rao Inequality + Positive quantity

2

h K(θ) iis positive definite so K11 (θ)K22 (θ) − K12 (θ) > 0 and K11 (θ) =

Since

∂ log L(θ)

Vθ ∂θ > 0 . Thus the Bhattacharya Inequality is more sharper than the Cramer

- Rao Inequality.

Problems

4.1 Let X1 , X2 , · · · , Xn be a random sample drawn from a normal population with

mean θ . Which among the two estimators T1 = X1 +X2n+···+Xn and T2 =

X1 +X2 +···+Xn

n is better? Why?

4.2 Show that, under some conditions to be stated there is a lower limit to the variance

of an unbiased estimator. How you modify the lower limit to a biased estimator?

4.3 Let X1 , X2 be independent random variables each having Poisson distribution

with mean θ . Show that Vθ X1 +X

2

2

≤ V θ [2X1 − X2 ] . Also justify the

inequality by Rao - Blackwell Theorem.

4.4 Show that Bhattacharya bound is better than Cramer - Rao bound.

4.5 Define Bhatttacharya bound of order r . Also obtain B(2) for estimating θ2 un-

biasedly, θ being the mean of a Bernoulli distribution from which a sample of

size n is available.

4.6 Let X and Y have a bivariate normal distribution with mean θ1 and θ2 with

positive variance σ12 and σ22 and with correlation coefficient ρ . Find Eθ2 [Y |

X = x] = φ(x) and variance of φ(X) .

4.7 Mention the significance of Rao - Blackwell Theorem.

4.8 In what way, Lehman - Scheffe’s Theorem different from Rao - Blackwell Theo-

rem.

4.9 Let X be a Hyper Geometric random variable with pmf

D N-D

x n-x

PD {X = x} =

N

n

145

A. Santhakumaran

N is assumed to be known.

4.10 Let X1 , X2 · · · , Xn be a random sample from a population with meanPn θ and fi-

nite variance and T = t(X) be an estimator of θ of the form T = i=1 αi Xi .

If T is an unbiased estimator of θ that has minimum variance and T 0 = t0 (X)

is another linear unbiased estimator of θ, then Covθ (T, T 0 ) = Vθ [T ].

4.11 Let X1 , X2 , · · · , Xn be a random sample from p(x | θ) = θe−θx , θ > 0, x >

0 . Show that Pn−1 n

Xi is the UMVUE of θ .

i=1

4.12 Stating the assumptions clearly, derive the Chapman - Robbin lower bound for

the variance of an unbiased estimator of a function of a real valued parameter θ .

4.13 A random sample X1 , X2 , · · · , Xn is available from a Poisson population with

mean λ . Using the unbiased estimator T = t(X1 , X2 ) = X12 − X2 . Obtain

the UMVUE of λ2 based on the sample.

4.14 State the Bhattacharya bound of order s . Also prove that it is a non - decreasing

function of s .

4.15 Define Bhattacharya bound. Show that it is sharper than the Cramer - Rao bound.

4.16 On the basis of a random sample of size n , the Cramer - Rao lower bound of

variance of an unbiased estimator of θ in

1

π[1+(x−θ)2 ] −∞ < x < ∞; −∞ < θ < ∞

pθ (x) =

0 otherwise

is equal to

( a) n1 (b) 1

n2 (c) 2

n (d) 2

n

V [Ti ] = vi , i = 1, 2. The best linear unbiased estimator (l1 T1 + l2 T2 ) of θ is

the one for which

(a) l1 = l2 = .5

(b) l1 = (v1v+v 2

2)

; l2 = (v1v+v

1

2)

v −1

(c) l1 = (v−11+v−1 )

1 2

(d) l1 = 0, l2 = 1 if v1 > v2 and vice versa

4.18 Consider the following statements:

If X1 , X2 , · · · , Xn are iid random variables with uniform distribution over

(0, θ) , then

1. 2X̄ is an unbiased estimator of θ .

2. The largest among X1 , X2 , · · · , Xn is an unbiased, estimator of θ .

3. The largest among X1 , X2 , · · · , Xn is sufficient for θ .

4. n+1

n X(n) is a minimum variance unbiased estimator of θ .

Of these statements :

(a) 1 and 3 are correct

146

Probability Models and their Parametric Estimation

(c) 1 and 2 are correct

(d) 1 , 3 and 4 are correct

4.19 Which one of the following is not necessary for the UMVU estimation of θ by

T = t(X) ?

(a) Eθ [T − θ] = 0

(b) Eθ [T − θ]2 < ∞

(c) Eθ [T − θ]2 is minimum

(d) T is a linear function of observations

4.20 If T1 = t1 (X) and T2 = t2 (X) are unbiased estimators of θ and θ2 (0 <

θ < 1) and T is a sufficient statistic, then E[T1 | T ] − E[T2 | T ] is :

(a) the minimum variance unbiased estimator of θ

(b) always an unbiased estimator of θ(1 − θ), which has variance not exceeding

that of θ(1 − θ)

(c) always the minimum variance unbiased estimator of θ(1 − θ)

(d) not an unbiased estimator of θ(1 − θ)

4.21 T 0 = t0 (X) and T = t(X) are two unbiased estimator of τ (θ) with variance

Vθ [T ] < ∞ and Vθ [T 0 ] < ∞ . The estimator T is said to be an efficient

estimator of τ θ) if:

(a) Vθ [T ] < Vθ [T 0 ]

(b) Vθ [T ] > Vθ [T 0 ]

(c) Vθ [T ] = Vθ [T 0 ]

(d) none of the above

4.22 T 0 = t0 (X) and T = t(X) are two unbiased estimator of τ (θ) with variance

Vθ [T ] < ∞ and Vθ [T 0 ] < ∞ . The estimator T is an efficient estimator relative

to T 0 of the parameter τ (θ) if:

(a) Vθ [T ] < Vθ [T 0 ]

(b) Vθ [T ] > Vθ [T 0 ]

(c) Vθ [T ] 6= Vθ [T 0 ]

(d) none of the above

147

A. Santhakumaran

5. METHODS OF ESTIMATION

5.1 Introduction

Chapters 2 , 3 and 4 disuse the properties of a good estimator. The methods of

obtaining such estimators are as follows:

(i) Method of Maximum Likelihood Estimation

(ii) Method of Minimum Variance Bound Estimation

(iii) Method of Moments Estimation

(iv) Method of Minimum Chi-Square Estimation

The Maximum Likelihood Estimation is a principle, states that an estimate of θ,

say θ̂(x) within the admissible range of θ which makes the likelihood function L(θ)

as large as possible. That is, for any admissible value θ̂(x), L(θ̂) ≥ L(θ) . Thus

2

θ̂(x) is the solution of ∂L(θ)

∂θ = 0 and ∂ ∂θ L(θ)

2 < 0 at θ = θ̂(x). It is equivalent

2

that ∂ log∂θL(θ) = 0 and ∂ log L(θ)

∂θ 2 < 0 at θ = θ̂(x). Thus any non - trivial solution

θ̂(X) of the equations which maximizes L(θ) is called Maximum Likelihood Esti-

mator (MLE) of θ .

tion with pdf N (0, θ) , θ > 0 . Find the MLE of θ .

The likelihood function for θ of the sample size n is

n

Y

L(θ) = pθ (xi )

i=1

n2

1 1

Pn 2

= e− 2θ i=1 (xi −θ)

2πθ

n

n n 1 X

log L(θ) = − log 2π − log θ − (xi − θ)2

2 2 2θ i=1

n

∂ log L(θ) n 1 X 2

= 0− + x

∂θ 2θ 2θ2 i=1 i

n

∂ 2 log L(θ) n 1 X 2

= − x

∂θ2 2θ2 θ3 i=1 i

148

Probability Models and their Parametric Estimation

Pn

∂ log L(θ) 1

Pn 2 x2i

For maximum , ∂θ = 0 → −n + θ i=1 xi = 0 i. e., θ̂ =

i=1

n and

∂ 2 log L(θ) n

=− < 0 at θ = θ̂(x)

∂θ2 2θ̂

Pn

X2

The MLE of θ is θ̂(X) = i=1 n

i

.

Example 5.2 A random sample of size n is drawn from a population having

density function

θxθ−1 0 < x < 1, 0 < θ < ∞

pθ (x) =

0 otherwise

The likelihood function for θ of the sample size n is

n

Y

L(θ) = pθ (xi )

i=1

n

Y

= θn xθ−1

i

i=1

n

X

log L(θ) = n log θ + (θ − 1) log xi

i=1

n

∂ log L(θ) n X

= + log xi

∂θ θ i=1

∂ 2 log L(θ) n

= −

∂θ2 θ2

∂ log L(θ)

For maximum, = 0

∂θ

n

n X

⇒ + log xi = 0

θ i=1

−n

i.e., θ̂(x) = Pn and

i=1 log xi

∂ 2 log L(θ)

−n X 2

2

θ = θ̂(x) = 2

log xi

∂θ n

P 2

( log xi )

= − <0

n

−n

Thus the MLE of θ is θ̂(X) = Pn .

i=1 log Xi

1 −x

θe

θ 0 < x < ∞, θ > 0

pθ (x) =

0 otherwise

149

A. Santhakumaran

Let p = Pθ {X > 2}

= 1 − Pθ {X ≤ 2}

Z 2

1 −x 2

= 1− e θ dx = e− θ

0 θ

2 1 2

log p = − ⇒ log =

θ p θ

2

⇒ θ =

log p1

A sample of size n is taken and it is known that k of the observations are X > 2 and

(n − k) of the observation are X < 2 . The likelihood function for p of the sample

size n is

L(p) = pk (1 − p)n−k

log L(p) = k log p + (n − k) log(1 − p)

∂ log L(p) k (n − k)

= + (−1)

∂p p (1 − p)

k − np

=

p(1 − p)

∂ 2 log L(p) −np2 − k + 2pk

=

∂p2 [p(1 − p)]2

∂ log L(p)

For maximum, = 0

∂p

⇒ k − np = 0

k

i.e., p̂ = and

n

2

∂ 2 log L(p) −n nk 2 − k + 2 nk k

∂p2 k = k k 2

p̂= n n (1 − n )

k 1 − nk

k

= − < 0 since n < 1 for n = 1, 2, · · ·

k k 2

n (1 − n )

k

Thus the value of the MLE of p is p̂ = n . The value of the MLE of P {X > 2} =

2

− −2

e where θ̂(x) =

θ̂(x)

k .

log( n )

Example 5.4 Let X1 , X2 , · · · , Xn be a random sample drawn from a normal

population with mean θ and variance σ 2 . The density function

( 1 2

√ 1 e− 2σ2 (x−θ) −∞ < x < ∞, −∞ < θ < ∞, σ 2 > 0

pθ,σ2 (x) = 2πσ

0 otherwise

Find the MLE of

150

Probability Models and their Parametric Estimation

(ii) σ 2 when θ is known.

(iii) both θ and σ 2 are not known.

Case (i) When σ 2 is known, the likelihood function for θ is

n

Y 1 − 12 (xi −θ)2

L(θ) = √ e 2σ

i=1 2πσ 2

− n − 1 P(xi −θ)2

= 2πσ 2 2 e 2σ2

n n 1 X

log L(θ) = − log 2π − log σ 2 − 2 (xi − θ)2

2 2 2σ

∂ log L(θ) 1 X

= (xi − θ)

∂θ σ2

∂ 2 log L(θ) n

= − 2 <0

∂θ2 σ

∂ log L(θ)

For maximum, = 0

∂θ

X ∂ 2 log L(θ)

⇒ (xi − θ) = 0 i.e., θ̂(x) = x̄ and <0

∂θ2

Case (ii) When θ is known, the likelihood function for σ 2 is

n n 1 X

log L(σ 2 ) = − log 2π − log σ 2 − 2 (xi − θ)2

2 2 2σ

∂ log L(σ 2 ) (xi − θ)2

P

n 1

= − +

∂σ 2 2σ 2 2(σ 2 )2

∂ 2 log L(σ 2 ) (xi − θ)2

P

n

= −

∂(σ 2 )2 2σ 4 σ6

∂ log L(σ 2 )

For maximum, = 0

∂σ 2

n 1 X

⇒ − 2+ 4 (xi − θ)2 = 0

2σ 2σ

(xi − θ)2

P

σ̂ 2 (x) = and

n

∂ 2 log L(σ 2 )

< 0

∂(σ 2 )2 σ2 =σ̂2 (x)

Pn

(x −θ)2

Thus the value of the MLE of σ 2 is σ̂ 2 (x) = i=1 n i .

Case (iii) When θ and σ 2 are unknown, the likelihood function for θ and σ 2 is

n n 1 X

log L(θ, σ 2 ) = − log 2π − log σ 2 − 2 (xi − θ)2

2 2 2σ

∂ log L(θ, σ 2 ) 1 X

= (xi − θ)

∂θ σ2

151

A. Santhakumaran

∂θ∂σ 2 = ∂σ 2 ∂θ since both the partial derivatives exist and are continu-

ous.

∂ 2 log L(θ, σ 2 ) X −1

= (xi − θ) 4

∂θ∂σ 2 σ

2

∂ log L(θ, σ ) n 1 X

2

=− 2 + 4 (xi − θ)2

∂σ 2σ 2σ

∂ 2 log L(θ, σ 2 ) n 1 X

2 2

= 4

− 6 (xi − θ)2

∂(σ ) 2σ σ

∂ 2 log L(θ, σ 2 ) n

2

=− 2

∂θ σ

For maximum of L(θ, σ 2 ),

=0 2

=0 <0

∂θ ∂σ ∂θ2

2

∂ 2 log L(θ, σ 2 ) ∂ 2 log L(θ, σ 2 ) ∂ 2 log L(θ, σ 2 )

and − > 0 at θ = θ̂(x) and σ 2 = σ̂ 2 (x)

∂θ2 ∂(σ 2 )2 ∂θ∂σ 2

(xi − x̄)2

P

−n −n 2 2

− 0 > 0 at θ = θ̂(x) = x̄ and σ = σ̂ (x) =

σ̂ 2 (x) 2σ̂ 4 (x) n

∂ 2 log L(θ, σ 2 ) ∂ 2 log L(θ, σ 2 )

−n −n

since 2

= 2 <0 2 )2 2 2 = 2(σ̂ 2 (x))2 < 0

∂θ

θ=θ̂(x) σ̂ (x) ∂(σ σ =σ̂ (x)

2 2

∂ log L(θ, σ ) X −1

θ = θ̂(x) = (xi − x̄) 4 =0

∂θ∂σ 2 σ̂ (x)

σ 2 = σ̂ 2 (x)

P 2

.˙. The value of the MLE of θ and σ 2 are θ̂(x) = x̄ and σ̂ 2 (x) = (xni −x̄) .

Example 5.5 Find the MLE of the parameter α and λ ( λ being large) from a

sample of n independent observations from the population represented by the follow-

ing density function

( λ λ

(α) λ

−α x λ−1

pα,λ (x) = Γλ e x x > 0, λ > 0, α > 0

0 otherwise

Also obtain the asymptotic form of the covariance for the two parameters for large n .

Given that ∂ log

∂λ

Γλ 1

≈ log λ − 2λ .

Likelihood function for α and λ of the sample size n is

nλ Yn

1 λ

Pn λ

L(α, λ) = n

e− α i=1 xi xλ−1

i

(Γλ) α i=1

n n

λX X

log L(α, λ) = −n log Γλ + nλ log λ − nλ log α − xi + (λ − 1) log xi

α i=1 i=1

152

Probability Models and their Parametric Estimation

P

∂ log L(α, λ) nλ xi

=− +λ 2

∂α α α

∂ 2 log L(α, λ)

P

nλ xi

= 2 − 2λ 3

∂α2 α α

∂ 2 log L(α, λ)

P

n xi

=− + 2

∂λ∂α α α

Pn n

∂ log L(α, λ) ∂ log Γλ i=1 xi X

= −n + n(1 + log λ) − n log α − + log xi

∂λ ∂λ α i=1

P

∂ log L(α, λ) 1 xi X

= −n(log λ − ) + n + n log λ − n log α − + log xi

∂λ 2λ α

P

∂ log L(α, λ) n xi X

= + n − n log α − + log xi

∂λ 2λ α

∂ 2 log L(α, λ) n

2

=− 2

∂λ 2λ

∂ log L(α,λ)

For maximum of log L(α, λ), ∂α = 0 and ∂ log∂λ L(α,λ)

=0

P

λ xi

−n + λ 2 = 0 → α̂(x) = x̄ and

P α α

n xi X

+ n − n log α − + log xi = 0

2λ α

n

→ λ̂(x) = Pn

2 i=1 (log x̄ − log xi )

∂ 2 log L(α, λ)

n nx̄

Further =− + 2 =0

∂λ∂α

α=α̂(x),λ=λ̂(x) x̄ x̄

∂ 2 log L(α, λ)

< 0 and

∂λ2

λ=λ̂(x)

2

∂ 2 log L(α, λ) ∂ 2 log L(α, λ)

2

∂ log L(α, λ)

− > 0 at α = α̂(x) and λ = λ̂(x)

∂λ2 ∂α2 ∂λ∂α

" #

n nλ̂(x) 2λ̂(x)nx̄ n2 1

i.e., − − − 0 = >0

2λ̂2 (x) x̄2 x̄3 λ̂(x)x̄2 2

Thus the value of the MLE of α and λ are α̂(x) = x̄ and λ̂(x) = 2 P(log nx̄−log xi ) .

The asymptotic covariance matrix is

h 2 i h 2 i

−Eα,λ ∂ log∂αL(α,λ)

2 −Eα,λ ∂ log L(α,λ)

∂λ∂α

D= h 2 i h 2 i

−Eα,λ ∂ log L(α,λ)

∂α∂λ −Eα,λ

∂ log L(α,λ)

∂λ 2

153

A. Santhakumaran

" n #

∂ 2 log L(α, λ)

nλ 2λ X

−Eα,λ = − 2 + 3 Eα Xi

∂α2 α α i=1

nλ 2λ

= − + 3 nα

α2 α

nλ

= since Eα [Xi ] = α ∀ i

α2

∂ 2 log L(α, λ)

n

−Eα,λ =

∂λ2 2λ2

" #

nλ̂(x)

α̂2 (x) 0

D= n

0 2λ̂2 (x)

∂ log L(θ)

∂θ = 0 has more than

one root and L(θ) is not differentiable everywhere in Ω , then the estimate of the MLE

may be a terminal value, middle value of a sample, need not be unbiased, not sufficient,

not unique and not consistent. The likelihood function L(θ) for θ is continuously

differentiable and is bounded above, then the likelihood equation has unique solution,

which maximizes L(θ) .

Example 5.6 MLE is a terminal value

The maximum likelihood estimate of the parameter α when β is known for the pdf

βe−β(x−α) α ≤ x < ∞, β > 0, α > 0

pα, β (x) =

0 otherwise

from a sample of size n is α̂ = x(1) .

When β is known, the likelihood function for α of the sample size n is

Pn

L(α) = β n e−β i=1 (xi −α)

n

X

log L(α) = n log β − β (xi − α)

i=1

∂ log L(α)

= nβ

∂α

The direct method cannot help to estimate the MLE of α . Since α ≤ x(1) ≤ x(2) ≤

· · · ≤ x(n) < ∞ , i.e., the range of the distribution depends on the parameter α .

log L(α) = n log β − nβ x̄ + nβα

is maximum, if α is minimum , i.e., α̂ = x(1) = value of the minimum order statistic

of the sample. Thus the value of the MLE of α is the terminal value x(1) .

Example 5.7 Let X1 , X2 , · · · , Xn be a random sample drawn from a population

having density

1 −|x−θ|

2e −∞ < x < ∞, −∞ < θ < ∞

pθ (x) =

0 otherwise

154

Probability Models and their Parametric Estimation

The likelihood funcion for θ of the sample size n is

n P

1

L(θ) = e− |xi −θ|

2

n

1 1

= P

|x

2 e i −θ|

P

L(θ) Pis maximum, if e |xi −θ| is minimum.

But e |xi −θ| is minimum if θ̂(x) = Median of the sample value, since mean devia-

tion is least when measured from the median. Thus the value of the MLE of θ is the

middle value of the sample.

Example 5.8 MLE is not unbiased

Let X1 , X2 , · · · , X5 be a random sample of size 5 from the uniform distribution hav-

ing pdf 1

θ 0 < x < θ, θ > 0

pθ (x) =

0 otherwise

Show that the MLE of θ is not unbiased.

The likelihood function for θ of the sample size n = 5 is

1

L(θ) = if 0 < xi < θ, i = 1, 2, 3, 4, 5.

θ5

L(θ) is maximum, the estimate of θ is minimum. If

1≤i≤5

( 5

1

L[θ̂(x)] = x(5) if 0 < x ≤ x(5)

0 otherwise

If θ̂(x) = x(5) = max1≤i≤5 {xi }, then the value of the MLE of θ is θ̂(x) = x(5) .

Let Y = max1≤i≤5 {X5 } . The pdf of Y is

5 4

pθ (y) = θ5 t 0<y<θ

0 otherwise

Z θ

5 5

Eθ [Y ] = 5

t dt

0 θ

5

= θ 6= θ

6

The MLE θ̂(X) = X(5) is not an unbiased estimator.

Example 5.9 MLE is not unique and not sufficient statistic

Let X1 , X2 , · · · , Xn be iid with the pdf

1 θ ≤x≤θ+1

pθ (x) =

0 otherwise

155

A. Santhakumaran

1 if θ ≤ xi ≤ θ + 1, i = 1, 2, · · · , n

L(θ) =

0 otherwise

1 if θ ≤ min{xi } ≤ max{xi } ≤ θ + 1, i = 1, 2, · · · , n

L(θ) =

0 otherwise

1 if θ ≤ min{xi } = x(1) ≤ max{xi } = x(n) ≤ θ + 1, i = 1, 2, · · · , n

L(θ) =

0 otherwise

1 if θ ∈ [x(n) − 1, x(1) ]

L(θ) =

0 otherwise

Thus any point in [x(n) − 1, x(1) ] is a value of the MLE of θ . Thus the MLE of θ is

not unique and not sufficient statistic.

Example 5.10 MLE is not exist

Let X1 , X2 , · · · , Xn be a random sample drawn from a population with

pmf b(1, θ), 0 < θ < 1 both n and θ are unknown and the only sample values

(0, 0, 0, · · · , 0) or (1, 1, · · · , 1) is available.

The likelihood function for θ of the sample size n is

P P

= θ xi (1 − θ)n− xi

L(θ)

X X

log L(θ) = xi + n − xi log(1 − θ)

P P

∂ log L(θ) xi (n − xi )

= +

∂θ θ 1−θ

∂ log L(θ)

For maximum , = 0

∂θ

→ θ̂(x) = x̄ and

2

∂ log L(θ)

<0

∂θ2

θ=x̄

the MLE of θ . It is not the admissible value of θ , since θ ∈ (0, 1) . Thus the MLE of

θ is not exist.

Example 5.11 MLE is not consistent

Xi µi

Let ∼N , σ 2 In i = 1, 2, · · · , n

Yi µi

be independent vectors, where µi , i = 1, 2, · · · , n and σ 2 are unknown.

Xi

" #

µi

E Yi = µi ∀ i = 1, 2, · · · , n andV [Xi ] = V [Yi ] = σ 2 ∀ i

n

Y 1 − 12 (xi −µi )2 − 12 (yi −µi )2

L(µi , σ 2 ) = e 2σ 2σ

i=1

2πσ 2

156

Probability Models and their Parametric Estimation

1

Pn 1

Pn

log L(µi , σ 2 ) = −n log 2π − n log σ 2 − 2σ 2 i=1 (xi − µi )2 − 2σ 2 i=1 (yi − µi )2

∂ log L(µi , σ 2 )

= 0

∂µi

1 1

⇒ 2 (xi − µi ) + 2 (yi − µi ) = 0

σ σ

xi + yi

⇒ µ̂i = , i = 1, 2, · · · , n

2

" n n

#

∂ log L(µi , σ 2 ) −n 1 X 2

X

2

= 2 + 4 (xi − µi ) + (yi − µi ) = 0

∂σ 2 σ 2σ i=1 i=1

" n 2 X n 2 #

−n 1 X xi + yi xi + yi

+ 4 xi − + yi − =0

σ2 2σ i=1 2 i=1

2

" n n

#

−n 1 1X 2 1X 2

+ 4 (xi − yi ) + (xi − yi ) = 0

σ2 2σ 4 i= 4 i=1

n

1 X

⇒ σ̂ 2 (x, y) = (xi − yi )2

4n i=1

= 1, 2, · · · , n, since Xi ∼ N (µi , σ 2 ), Yi ∼

1 n

2 2

N (µi , σ ), then the MLE of 2σ is n i=1 Vi 2 . V1 2 , V2 2 , · · · , Vn2 are iid ran-

2

dom variables each having χ variate with one degree of freedom. By Kolmogorov’s

Strong Law of Large Numbers

n

1 X 2 as

V → Eσ2 [Vi2 ] = 2σ 2 as n → ∞ sinceEσ2 [V 2 ] = 2σ 2

n i=1 i

n

1 X 2 as

i.e., V → σ2 as n → ∞

2n i=1 i

n

1 X 2 as σ2

i.e., V → 6= σ 2 as n → ∞

4n i=1 i 2

n

1 X

Thus σ̂ 2 (X, Y ) = (Xi − Yi )2 is not consistent estimator of σ 2 .

4n i=1

The likelihood equations are often difficult to solve explicitly for θ even in cases

where all the regularity conditions hold and the unique solution exist. Equations in the

exponential cases are very often non-linear and difficult to solve. It may difficult to

locate the global maximum of the likelihood function for the following cases,

(i) the family of distributions under consideration is not of the exponential type.

157

A. Santhakumaran

The use of successive iterations to solve the likelihood equations by assuming ∂ log∂θL(θ)

is continuous at θ for each xi , i = 1, 2, 3, · · · , n , where n is the sample size.

For example, a random variable has a Cauchy distribution depending on a location

parameter θ , i.e.,

1 1

π 1+(x−θ)2 −∞ < x < ∞

pθ (x) =

0 otherwise

Taking a sample of size n from the population, the log likelihood function for θ is

n

X

log L(θ) = −n log π − log[1 + (xi − θ)2 ]

i=1

n

∂ log L(θ) X 2(xi − θ)

= −

∂θ i=1

1 + (xi − θ)2

n

X 2(xi − θ)

=0

i=1

1 + (xi − θ)2

has no explicit solution. The log likelihood function of θ may have several local

2

maximum for a given sample X1 , X2 , · · · , hXn . Suppose i − log[1 + (xi − θ) ] has

Pn 2(xi −θ)

a maximum at θ = xi , then sum − i=1 1+(xi −θ)2 may have up to n different

local maxima and it depends on the sample values. Newton - Raphson method is used

to locate the local maxima.

(i) Newton - Raphson Method

The Newton - Raphson method on the expansion around θ̂(x) of the likeli-

hood equation ∂ log∂θL(θ) is

∂ log L(θ̂(x))

∂ 2 log L[θ +ν θ̂(x)−θ ]

( 0)

∂ log L(θ0 ) 0

∂θ = ∂θ + θ̂(x) − θ 0 ∂θ 2 for some 0 < ν <

1 (5.1) where θ̂(x) is the root

of the likelihood equation and θ0 is an initial solution or trial solution. Since θ̂(x) is

∂ log L(θ̂(x))

the solution of the equation ∂θ = 0 and if ν = 0 , then

0

+ θ̂(x) − θ0 =0

∂θ ∂θ2

∂ log L(θ0 )

∂θ

⇒ θ̂(x) = θ0 − ∂ 2 log L(θ0 )

= θ1 (say) (5.2)

∂θ 2

The value θ1 can be substituted in equation (5.1) for θ0 to obtain another value θ2 ,

so that

∂ log L(θ1 )

∂θ

θ2 = θ1 − ∂ 2 log L(θ1 )

(5.3)

∂θ 2

158

Probability Models and their Parametric Estimation

and so on. Starting from an initial solution θ0 , one can generate a sequence {θk , k =

0, 1, · · · } which is determined successively by the formula

∂ log L(θk )

∂θ

θk+1 = θk − ∂ 2 log L(θk )

, k = 0, 1, 2, · · · (5.4)

∂θ 2

If the initial solution θ0 was chosen, close to the root of the likelihood equations θ̂(x)

2

and if ∂ log L(θk )

∂θ 2 for k = 0, 1, · · · , is bounded away from zero, there is a good

chance that the sequence generated by equation (5.4) will converge to the root θ̂(x) .

The sequence {θk , k = 0, 1, · · · , } generated by equation (5.4) depends on the sample

values X1 , X2 , · · · Xn . If the chosen initial solution θ0 is a consistent estimator of θ ,

then the sequence obtained by the equation (5.4) will faster converge to the root θ̂(x)

and provide the best asymptotically normal estimator of θ .

In small sample situations the sequence {θk , k = 0, 1, · · · , } generated by

equation (5.4) may convey irregularities due to the particular sample values obtained

in the experiment. In order to avoid irregularities in the approximating sequence, two

methods are proposed. They are fixed derivative method and method of scoring.

(ii) The Method of Fixed derivative

2

In the fixed derivative method, the term ∂ log L(θk )

∂θ 2 in equation (5.4) is re-

placed by − ank where {ak , k = 0, 1, · · · } is a suitable chosen sequence of constants

and n is the sample size.

Now the sequence {θk , k = 0, 1, · · · } is generated by

ak ∂ log L(θk )

θk+1 = θk + , k = 0, 1, 2, · · · (5.5)

n ∂θ

The sequence {θk , k = 0, 1, · · · , } converge to the root θ̂(x) in a more regular fash-

ion rather than the equation (5.4) by the choice sequence {ak }∞ k=0

Fixed derivative method fails to converge in many cases, the method of scoring

may use to locate the local maximum, since the log likelihood curve is steep in the

neighbour hood of a local maximum equation (5.5).

(iii) The Method of Scoring

The method of scoring is a special case of the fixed derivative method. The

special sequence {ak , k = 0, 1, · · · , } is chosen by Fisher. It is ak = I(θnk ) , where

I(θk ) is the amount of Fisher Information of n observations x of X and θk is the

value of the approximation after the (k − 1)th iteration. Thus Fisher’s scoring method

generates the sequence

1 ∂ log L(θk )

θk+1 = θk +

I(θk ) ∂θ

stop when the sequence {θk , k = 0, 1, · · · , } converges on a local maximum.

Example 5.12 The following data represents a sample from a Cauchy population.

Obtain the maximum likelihood estimate for the parameter involved in the distribution

by the method of successive approximation.

159

A. Santhakumaran

4.444 7.784 10.844 8.604 6.334

5.998 4.406 6.394 5.006 9.582

The pdf of the Cauchy distribution is

1 1

π 1+(x−θ)2 −∞ < x < ∞

pθ (x) =

0 otherwise

Arrange the sample values in the incceasing order of magnitude. Let the first trial value

of θ is θ̂(x) = t1 = the value of the sample median. The first approximation value is

n

4X (xi − t1 )

t2 = t1 +

n i=1 1 + (xi − t1 )2

The successive iteration values are t3 , t4 , · · · . This procedure is continued until any

two successive iterations values are equal. The convergent value is the value of the

MLE of θ .

C programme for MLE of θ of Cauchy distribution

#include < stdio.h >

#include < math.h >

#include < conio.h >

void main()

{

int i,j,n;

float a[100], sum[100], t[100], temp;

clrscr();

printf( ˝ Enter the number of observations n: \ n”);

scanf( ˝ %d”, &n);

printf( ˝ Enter the observations a: \ n”);

for(i= 1; i < = n; i++)

scanf( ˝ % f”, &a[i]);

for(i=1; i < = n-1, i++)

{

for(j=i+1; j < = n; j++)

{

if(a[i] > = a[j])

{

temp=a[i];

a[i]= a[j];

a[j]= temp;

}

}

}

if(n % 2 = = 0)

t[1] = (a[n/2] + a[ n/2 + 1]) / 2 ;

160

Probability Models and their Parametric Estimation

else

t[1] = a[(n+1)/2];

printf( ˝ \ n OUT PUT \ n \ n ”);

printf( ˝ Value of the MLE of the Cauchy Distribution \ n”);

printf( ˝ \ n - - - - - - - - - - - - - - \ n”);

for(i=1:i < = n; i++)

printf( ˝ \ t %f \ n”, a[i]);

printf( ˝ \ n Result: \ n \ n”);

printf( ˝ Median = t[1] = %f \ n \ n”, t[1]);

for(j=1; j < =n; j++)

{

sum[j]= 0;

for(i =1; i < = n; i++)

{

sum[j] = sum[j] + (a[i] - t[j]) / (1 + (a[i] - t[j]) *(a[i] - t[j]) );

}

printf( ˝ Sum[%d] = % f \ t \ n”, j, sum[j]);

t[j+1] = t[j] + (4 / (float)n)*(sum[j]);

printf( ˝ t[%d] = %f \ n ”, j+1, t[j+1]);

if(abs(t[j] -t[j+1] ) > = .001 )

break;

}

printf( ˝ \ n Value of the MLE of theta = % f”,t[j] );

getch();

}

The value of MLE of θ = 6.013498.

Example 5.13 Obtain the values of the MLE’s of the parameters b and c of the

pdf

c

c c−1 − xb

x e x, b, c > 0

pb,c (x) = b

0 otherwise

based on a sample of size n .

The likelihood function for b and c of the sample size n is

n

c n Y Pn

1

xci

L(c, b) = xc−1

i e− b i=1

b i=1

n

X 1X c

log L(c, b) = n log c − n log b + (c − 1) log xi − x

b i=1 i

n

∂ log L(c, b) n X c X c−1

= + log xi − x

∂c c b i=1 i

n

∂ log L(c, b) n 1 X c

= − + 2 x

∂b b b i=1 i

161

A. Santhakumaran

n

∂ log L(b) n 1 X c

= − + 2 x =0

∂b b b i=1 i

n

X

⇒ nb = xci

i=1

Pn

xc

The value of the MLE of b is b̂(x) = i=1

n

i

∂ log L(c)

= 0

∂c

n n

n X c X

⇒ + log xi − xc−1

i = 0

c i=1

b i=1

n

X n

X

i.e., c2 xc−1

i − cb log xi − nb = 0

i=1 i=1

Case(iii) when both c and b are unknown, the maximum L(c, b) is obtained by

solving the following equations.

= 0 and =0

∂c ∂b

Xn

nb − xci = 0

i=1

n

X n

X

nb + cb log xi − c2 xc−1

i =0

i=1 i=1

n Pn c n

X n

i=1 xi

X X

i.e., xci + c log xi − c2 xc−1

i =0

i=1

n i=1 i=1

The estimates of c and b are obtained to solve the above equations for c and b by

iterative method.

Lemma 5.1 Denote X ∼ Pθ , θ ∈ Ω and it has pdf pθ (x)

(i) The probability distributions Pθ are distinct for distinct values of θ .

(ii) The range of the density functions p(x | θ) are independent of the parameter θ .

(iii) The random observations X1 , X2 , · · · , Xn on X are independent and identi-

cally distributed.

162

Probability Models and their Parametric Estimation

interior point in Ω .

Then Pθ0 {L(θ0 ) > L(θ1 )} → 1 as n → ∞ for θ0 and θ1 ∈ Ω.

Proof:

n

Y

Let L(θ1 ) = pθ (xi ) and

i=1

Yn

L(θ0 ) = pθ0 (xi )

i=1

Define Sn = {x : L(θ0 ) > L(θ1 )}

Prove that Pθ0 {Sn } → 1 as n → ∞

L(θ0 ) L(θ0 )

>1 ↔ log >0

L(θ1 ) L(θ1 )

n

X pθ0 (xi )

log > 0

i=1

pθ1 (xi )

n

X pθ1 (xi )

log < 0

i=1

pθ0 (xi )

n

1X pθ1 (xi )

log < 0

n i=1 pθ0 (xi )

( n )

n o 1X pθ1 (Xi )

lim Pθ0 {Sn } = Pθ0 lim Sn = Pθ0 lim log <0

n→∞ n→∞ n→∞ n pθ0 (Xi )

i=1

p (x )

Since X1 , X2 , · · · , Xn are iid ⇒ pθθ1 (xii ) are iid. By Khintchin’s Law of Large

0

Numbers

n

1X pθ1 (xi ) P pθ1 (X)

log → Eθ0 log as n → ∞

n i=1 pθ0 (xi ) pθ0 (X)

By Jensen’s Inequality for the convex function f (X) ⇒ E[f (X)] ≤ f (E[X]). Here

p (x) p (x)

− log pθθ0 (x) = log pθθ1 (x) is strictly convex. 1

1 0

pθ1 (x)

For the convex function, log

pθ0 (x)

pθ1 (X) pθ1 (X)

Eθ0 log ≤ log Eθ0

pθ0 (X) pθ0 (X)

Z

pθ1 (X) pθ1 (x)

But Eθ0 = pθ (x)dx = 1

pθ0 (X) pθ0 (x) 0

1 dy 1

y = log x is a concave function and − log x is a convex function, since dx

= x

>0 ↑ ∀x>0

d2 y

and dx2

= − x12 <0

163

A. Santhakumaran

L(θ0 )

.˙. lim Pθ0 {Sn } = Pθ0 lim >1

n→∞ n→∞ L(θ1 )

( n )

1X pθ1 (Xi )

= Pθ0 lim log <0

n→∞ n pθ0 (Xi )

i=1

pθ1 (X)

= Pθ0 Eθ0 log < 0 → 1 as n → ∞

pθ0 (X)

pθ1 (X)

= Pθ0 log Eθ0 < 0 → 1 as n → ∞

pθ0 (X)

Pθ0 {L(θ0 ) > L(θ1 )} → 1 as n → ∞

MLE is consistent

Theorem 5.1 (Dugue, 1937) If log L(θ) is differentiable in an interval including

the true value of θ, say θ0 , then under the assumptions of Lemma 5.1, the likelihood

equation ∂ log∂θL(θ) = 0 has a root with probability 1 as n → ∞ which is consistent

for θ0 .

Proof: Let θ0 be o of θ and consider an interval (θ0 ± δ) , δ > 0 .

n the true value

L(θ0 )

By Lemma 5.1 Pθ0 L(θ1 ) > 1 → 1 as n → ∞, where θ1 = θ0 ± δ, since θ0 ∈

(θ0 − δ, θ0 + δ) and the likelihood function is continuous in (θ0 − δ, θ0 + δ) .

L(θ) should have a relative maximum within (θ0 − δ, θ0 + δ) with probability tends

to 1 as n → ∞ , since L(θ) is differentiable over (θ0 − δ, θ0 + δ) .

⇒ ∂ log∂θL(θ) = 0 at some point in (θ0 − δ, θ0 + δ)

⇒ θ̂(x) is a solution of ∂ log∂θL(θ) = 0 in (θ0 − δ, θ0 + δ)

⇒ θ̂(X) n ∈ [θ0 − δ, θ0 + δ] with probability

o tends to 1 as n → ∞

⇒ Pθ0 θ0 − δ < θ̂(X) < θ0 + δ → 1 as n → ∞

n o

⇒ Pθ0 θ̂(X) − θ0 < δ → 1as n → ∞

P

⇒ θ̂(X) → θ0 as n → ∞

⇒ θ̂(X) is a consistent estimator of θ .

MLE maximizes the Likelihood

Theorem 5.2 ( Huzurbazar, 1948) If log L(θ) is twice differentiable in an interval

including the true value of the parameter, than the consistent solution of the likelihood

equation [ which exists with probability one by Theorem 5.1 ] maximizes the likelihood

at the true value with probability tends to one, i.e.,

( )

∂ 2 log L(θ)

Pθ0 < 0 → 1 as n → ∞

∂θ2

θ=θ̂(x)

∂ 2 log L(θ)

Proof: Expanding ∂ 2 θ2 as Taylor’s series around θ̂(x) is

∂ 2 log L[θ̂(x)] ∂ 2 log L(θ0 ) 3

L(θ ? )

∂θ 2 = ∂θ 2 +[θ̂(x)−θ0 ] ∂ log

∂θ 3 where θ? = θ0 +ν(θ̂(x)−θ0 ), 0 <

ν<1

164

Probability Models and their Parametric Estimation

3

L(θ ? )

Further, assume ∂ log ≤ H(x) ∀ θ ∈ Ω and Eθ0 [H(X)] < ∞ is independent

∂θ 3

of θ0 .

∂ 2 log L[θ̂(x)] ∂ 2 log L(θ ) 3

∂ log L(θ? )

0

− ≤ |θ̂(x) − θ0 |

∂θ2 ∂θ2 ∂θ3

≤ |θ̂(x) − θ0 |H(x)

P P

|θ̂(X) − θ0 |H(X) → 0 as n → ∞ since θ̂(X) → θ0 as n → ∞

( )

∂ 2 log L[θ̂(X)] ∂ 2 log L(θ )

0

Pθ0 − < → 1 as n → ∞

∂θ2 ∂θ2

Each X1 , X2 , · · · , Xn is iid and by Khintchin’s Law of Large Numbers

n

1 X ∂ 2 log pθ (xi ) P

2

∂ log pθ (X)

→ Eθ 0 as n → ∞

n i=1 ∂θ2 ∂θ2

2

∂ log pθ (X)

Since I(θ0 ) ≥ 0 → Eθ0 = −I(θ0 ) < 0

∂θ2

( n )

. 1 X ∂ 2 log pθ (X)

. .Pθ0 <0 → 1 as n → ∞

n i=1 ∂θ2

n

( )

∂ 2 log L(θ)

Y

Since L(θ) = pθ (xi ) → Pθ0 <0 → 1 as n → ∞

i=1

∂θ2

θ=θ̂(x)

Let X1 , X2 , · · · , Xn be random observations on X with pdf pθ (x), θ ∈ Ω .

Assumptions:

∂ log L(θ) ∂ 2 log L(θ) 3

(i) ∂θ , ∂θ 2and ∂ log

, L(θ)

∂θ 3 exist for all x and over an interval contain-

ing the true value of θ say θ0 .

h i h 2 i

(ii) Eθ0 ∂ log∂θL(θ) = 0, Eθ0 ∂ log∂θ 2

L(θ)

= −nI(θ0 ) < 0 ∀ θ ∈ Ω where I(θ0 ) is

the amount of information for a single observation x of X .

3

(iii) ∂ log L(θ)

≤ H(x) and Eθ0 [H(X)] < ∞ is independent of θ0 .

∂θ 3

Theorem 5.3 ( Cramer p 1946) Let θ̂(X) be the MLE of θ , then under the regular-

ity conditions (i) to (iii) nI(θ0 )(θ̂(X) − θ0 ) has an asymptotic normal distribution

with mean zero and variance one

Proof: Let θ̂(X) be the solution of ∂ log∂θL(θ) = 0 in an interval containing the

true value θ0 of θ .

Expanding the function ∂ log∂θL(θ) around θ̂(x) by using Taylor’s series for any fixed

165

A. Santhakumaran

x,

2

∂ log L θ̂(x) ∂ log L(θ0 ) 2

∂ log L(θ0 ) θ̂(x) − θ 0 ∂ 3 log L(θ? )

i.e., = + θ̂(x) − θ0 +

∂θ ∂θ ∂θ2 2! ∂θ3

where θ? = θ0 + ν θ̂(x) − θ0 , 0 < ν < 1.

2

∂ log L(θ̂(x)) ∂ log L(θ0 ) ∂ 2 log L(θ ) θ̂(x) − θ0 ∂ 3 log L(θ? )

0

But =0 → + θ̂(x) − θ0 2

+ =0

∂θ ∂θ ∂θ 2 ∂θ3

2

∂ 2 log L(θ ) θ̂(x) − θ0 ∂ 3 log L(θ? ) ∂ log L(θ0 )

0

θ̂(x) − θ0 2

+ 3

=−

∂θ 2 ∂θ ∂θ

∂ 2 log L(θ ) θ̂(x) − θ0 ∂ 3 log L(θ? )

= − ∂ log L(θ0 )

0

θ̂(x) − θ0 +

∂θ2 2 ∂θ3 ∂θ

1 ∂ log L(θ0 )

n ∂θ

θ̂(x) − θ0 =

1 ∂ 2 log L(θ0 ) (θ̂(x)−θ0 ) 1 ∂ 3 log L(θ? )

−n ∂θ 2 − 2 n ∂θ 3

I(θ0 )

nI(θ0 ) n1 ∂ log∂θ

L(θ0 )

p

I(θ0 )

p

nI(θ0 ) θ̂(x) − θ0 =

2 (θ̂(x)−θ0 ) 1 ∂ 3 log L(θ ? )

− n1 ∂ log L(θ0 )

∂θ 2

− 2 n ∂θ 3

1 ∂ log L(θ0 )

√ ∂θ

nI(θ0 )

p

nI(θ0 ) θ̂(x) − θ0 =

2 (θ̂(x)−θ0 ) 1 ∂ 3 log L(θ? )

1

I(θ0 ) − n1 ∂ log L(θ0 )

∂θ 2 − 2 n ∂θ 3

n

1 X ∂ 2 log pθ (xi ) P

2

∂ log pθ (X)

→ Eθ0 as n → ∞

n i=1 ∂θ2 ∂θ2

n

1 X ∂ 2 log pθ (xi ) P

→ −I(θ0 ) as n → ∞

n i=1 ∂θ2

P P

Also θ̂(X) → θ0 as n → ∞ → θ̂(X) − θ0 → 0 as n → ∞ and

Eθ0 [H(X)]

h = ki as n → ∞. Denote Zi = ∂ log∂θ pθ (xi )

, i = 1, 2, · · · , n. Eθ0 [Zi ] =

∂ log pθ (Xi )

Eθ0 ∂θ = 0 ∀ i = 1, 2, · · · , n. Let Sn = Z1 + · · · + Zn , then E[Sn ] = 0

and V [Sn ] = I(θ0 ) + · · · + I(θ0 ) = nI(θ0 )

∂ log L(θ0 )

√ 1 ∂θ

nI(θ0 )

p

nI(θ0 ) θ̂(X) − θ0 = 1

1 as n → ∞

I(θ0 ) − n (−nI(θ0 )) − 0

166

Probability Models and their Parametric Estimation

p ∂ log L(θ0 )

nI(θ0 ) θ̂(X) − θ0 = p ∂θ as n → ∞

nI(θ0 )

n −E[Sn ] d

By Lindeberg - Levey Central Limit Theorem S√ → N (0, 1) as n → ∞ .

V [Sn ]

p

.˙. nI(θ0 ) θ̂(X) − θ0 ∼ N (0, 1) as n → ∞ .

Remark 5.2 Any consistent estimator θ̂(X) of roots of the likelihood equation

√

satisfies n(θ̂(X)−θ0 ) ∼ N (0, I(θ10 ) ), then θ̂(X) is an efficient likelihood estimator

of θ or asymptotically normal and efficient estimator of θ .

MLE is unique

Theorem 5.4 ( Wald 1949) Consistent solution of a likelihood equation is unique with

probability 1 as n → ∞

ˆ ˆ ∂ log L(θ)

Proof: Let θ1 (x) and θ2 (x) be two consistent solutions of ∂θ = 0 and

ˆ ˆ

θ1 (x) 6= θ2 (x) . By Huzurbazar’s Theorem

( )

∂ 2 log L(θˆ1 (X))

Pθ < 0 → 1 as n → ∞ and

∂θ2

( )

∂ 2 log L(θˆ2 (X))

Pθ < 0 → 1 as n → ∞

∂θ2

∂ log L(θ) 2 ˆ

∂ log L(θ3 (x))

Applying Rolle’s Theorem to the function

∂θ which gives ∂θ 2 = 0

ˆ ˆ ˆ ˆ ˆ

for some θ3 (x) within the interval θ1 (x), θ2 (x) where θ3 (x) = λθ1 (x) + (1 −

λ)θˆ2 (x), 0 < λ < 1. θˆ3 (x) is also a consistent solution of ∂ log∂θL(θ) = 0. Thus

( )

∂ 2 log L(θˆ3 (X))

Pθ < 0 → 1 as n → ∞

∂θ2

∂ 2 log L(θˆ3 (x)) 2

∂ log L(θ̂(x))

∂θ 2 < 0 is a contradiction to Rolle’s Theorem

property that ∂θ 2 =

ˆ ˆ ˆ ˆ

0 for some θ3 (x) within the interval θ1 (x), θ2 (x) . The only possibility is θ1 (x) =

θˆ2 (x) . Thus θˆ1 (x) = θˆ2 (x) is a consistent solution of the likelihood equation and is

unique.

Invariance Property of MLE

Let X ∼ Pθ , θ ∈ Ω, where Ω is a k dimensional parameter space. Consider

g(θ) : Ω → O where O is the r dimensional space (r ≤ k) . If θ̂ is the MLE of θ ,

then g(θ̂) is the MLE of g(θ) .

Let g(θ) be the function of θ from Ω to O , i.e., g : Ω → O ∀θ ∈ Ω

i.e., g(θ) = ω ∈ O . For a fixed ω ∈ O , let

Aω = [θ | g(θ) = ω]

= the set of all θ0 s such that g(θ) = ω fixed ∀ ω ∈ O

.. . ∩ω Aω = Ω

167

A. Santhakumaran

Let θ̂ be the MLE of θ , i.e, L(θ̂) is maximized at θ = θ̂ , .. .θ̂ ∈ Ω .

⇒ given θ̂ , we can find ω̂ = g(θ̂) such that θ̂ ∈ Aω .

Thus θ̂ is the MLE of θ

⇒ g(θ̂) is the MLE of g(θ) .

Let X1 , X2 , · · · , Xn be a random sample on X according to a one parameter

exponential family with density

Q(θ)t(x)

pθ (x) = c(θ)e h(x)

θt(x)−A(θ)

= e h(x)

The likelihood function for θ of the sample size n is

Pn

θi=1 t(xi )−nA(θ) h(x) where h(x) = h1 (x1 , x2 , · · · , xn )

L(θ) = e

n

X

log L(θ) = θ t(xi ) − nA(θ) + log h(x)

i=1

∂ log L(θ) n

X 0

= t(xi ) − nA (θ)

∂θ i=1

n

∂ log L(θ) 1X

For maximum, = 0 → A0 (θ) = t(xi ) (5.6)

∂θ n i=1

and

∂ 2 log L(θ)

= −nA00 (θ) < 0

∂θ2

Z

Consider eθt(x)−A(θ) h(x)dx = 1

Assume that the integral is continuous and has derivatives of all orders with re-

spect to θ and it can be differentiated under the integral sign.

Z Z

t(x)eθt(x)−A(θ) h(x)dx − A0 (θ)e−A(θ) eθt(x) h(x)dx = 0

Z

Eθ [T ] = A0 (θ) eθt(x)−A(θ) h(x)dx

A0 (θ) = Eθ [T ] (5.7)

Pn

Using equations (5.6) and (5.7), one may get Eθ [T ] = n1 i=1 t(xi )

Z Z

0

t(x)e θt(x)−A(θ)

h(x)dx − A (θ) = 0 since eθt(x) e−A(θ) h(x)dx = 1

Z

t2 (x)eθt(x)−A(θ) h(x)dx − A00 (θ) − A0 (θ)Eθ [T ] = 0

168

Probability Models and their Parametric Estimation

2

Eθ [T 2 ] − (Eθ [T ]) = A00 (θ) since A0 (θ) = Eθ [T ]

∂ 2 A(θ)

i.e., = Vθ [T ]

∂θ2

√

Thus n θ̂(X) − θ ∼ N (0, Vθ [T ]) , i.e., θ̂(X) is consistent, unique and asymp-

totically normal.

If sufficient statistic exists, then the MLE is a function of sufficient statistics.

Let X1 , X2 , · · · , Xn be iid random sample with pdf pθ (x) . Let T =

t(X) be the sufficient statistic. The likelihood function for θ of the sample size n is

n

Y

L(θ) = pθ (xi )

i=1

= pθ (t)h(x) where h(x) = h1 (x1 , x2 , · · · , xn )

log L(θ) = log pθ (t) + log h(x)

∂ log L(θ) ∂ log pθ (t)

= and

∂θ ∂θ

∂ 2 log L(θ) 2

∂ log pθ (t)

=

∂θ2 ∂θ2

2

∂ log L(θ)

For MLE, ∂θ = 0 and ∂ log L(θ)

∂θ 2 < 0 are equivalent to ∂ log∂θpθ (t) = 0

θ=θ̂(x)

∂ 2 log pθ (t)

and ∂θ 2 < 0. Thus MLE is a function of the sufficient statistic.

θ=θ̂(x)

A statistic T = t(X) is said to be a MVBE if it attains the Cramer - Rao lower

bound.

Theorem 5.5 A necessary and sufficient condition for a statistic T = t(X) is a

MVBE of τ (θ) is ∂ log∂θL(θ) and [t(x) − τ (θ)] are proportional.

Proof: Assume ∂ log∂θL(θ) and t(x) − τ (θ) are proportional, i.e., ∂ log∂θL(θ) ∝

t(x) − τ (θ), i.e.,

∂ log L(θ)

= A(θ)[t(x) − τ (θ)] (5.8)

∂θ

where A(θ) is function of θ only.

To Prove T = t(X) is MVBE of τ (θ) , it is enough to prove

[τ 0 (θ)]2

Vθ [T ] = h i2 ∀ θ ∈ Ω

∂ log L(θ)

Eθ ∂θ

169

A. Santhakumaran

∂ log L(θ)

Covθ T, = τ 0 (θ), ∀ θ ∈ Ω.

∂θ

∂ log L(θ)

i.e., Eθ T = τ 0 (θ), ∀ θ ∈ Ω.

∂θ

∂ logL(θ) ∂ log L(θ)

Eθ (T − τ (θ)) = τ 0 (θ), since Eθ =0∀θ∈Ω

∂θ ∂θ

∂ log L(θ)

A(θ)Eθ [T − τ (θ)]2 = τ 0 (θ) since = A(θ)[t(x) − τ 0 (θ)]

∂θ

A(θ)Vθ [T ] = τ 0 (θ)

τ 0 (θ)

A(θ) =

Vθ [T ]

Squaring both sides of (5.8), one can get

2

∂ log L(θ) 2

A2 (θ) t(x) − τ 0 (θ)

=

∂θ

2

∂ log L(θ)

Eθ = A2 (θ)Vθ [T ]

∂θ

2

[τ 0 (θ)]2 Vθ [T ]

∂ log L(θ)

i.e., Eθ =

∂θ {Vθ [T ]}2

[τ 0 (θ)]2

i.e., Vθ [T ] = h i2 ∀ θ ∈ Ω

∂ log L(θ)

Eθ ∂θ

T = t(X) attains the Cramer - Rao lower bound, i.e., T = t(X) is a MVBE of

τ (θ) .

Conversely, assume T = t(X) is a MVBE of τ (θ) . Now to prove ∂ log∂θL(θ) ∝

[t(x) − τ (θ)] , i.e., ∂ log∂θL(θ) = A(θ)[t(x) − τ (θ)] , τ 0 (θ) = A(θ)Vθ [T ] and

h i2 0 2

Eθ ∂ log∂θL(θ) = [τVθ(θ)]

[T ]

2

A2 (θ)Vθ2 [T ]

∂ log L(θ)

.˙. Eθ =

∂θ Vθ [T ]

2

∂ log L(θ)

Eθ = A2 (θ)Vθ [T ]

∂θ

2

∂ log L(θ)

Eθ = A2 (θ)Eθ [T − τ (θ)]2

∂θ

∂ log L(θ)

⇒ = A(θ)[t(x) − τ (θ)]

∂θ

∂ log L(θ)

i.e., ∝ [t(x) − τ (θ)]

∂θ

170

Probability Models and their Parametric Estimation

lation with density function

θxθ−1 0 < x < 1, θ > 0

pθ (x) =

0 otherwise

Obtain the MVBE of θ .

The likelihood function for θ of the sample size n is

n

Y

L(θ) = θn xθ−1

i

i=1

n

X

log L(θ) = n log θ + (θ − 1) log xi

i=1

n

∂ log L(θ) n X

= + log xi

∂θ θ i=1

" n #

X −n

= log xi −

i=1

θ

Pn

t(x) = i=1 log xi , τ (θ) = −nθ and A(θ) = 1. Thus the MVBE of τ (θ)(= −n

θ ) is

Pn Pn τ 0 (θ) n

i=1 log Xi and the variance of the estimator i=1 log Xi is A(θ) = θ 2 .

The MVBE of θ is θ̂(X) = Pn −n .

i=1 log Xi

Example 5.15 Let X1 , X2 , · · · , Xn be a random sample of size n drawn from a

population with pdf

1 − x p−1

θ p Γp e

θ x x > 0, θ > 0

pθ (x) =

0 otherwise

Obtain the MVBE of θ when p is known.

The likelihood function for θ of the sample size n is

Pn ip−1

1 − i=1

xi hY

L(θ) = e θ x i

(Γp)n θnp

P n

xi X

log L(θ) = −n log Γp − np log θ − + (p − 1) log xi

θ i=1

∂ log L(θ) np nx̄

= 0− + 2

∂θ θ θ

n

= [x̄ − pθ]

θ2

np x̄

= − θ

θ2 p

np x̄

τ (θ) = θ, A(θ) = 2 , t(x) =

θ p

X̄ τ 0 (θ) x̄2

The MVBE of τ (θ) is T = p, when p is known and Vθ [T ] = A(θ) = np3 .

171

A. Santhakumaran

Theorem 5.6 The necessary and sufficient condition that distribution admits the

estimator of a suitable chosen function of a parameter with variance equal to the in-

formation limit ( MVB) is that the likelihood function L(θ) = eθ1 t(x)+θ2 h(x), where

h(x) and t(x) are functions of observations only and θ1 and θ2 are functions of θ

only. The parametric functions to be estimated is − dθ dθ2 dθ

dθ1 = − dθ dθ1 and the variance

2

2

h i

of the estimator is − ddθθ22 = dθ

d

− dθ

dθ1

2 1

dθ

1 dθ1

Proof: Let T = t(X) be the MVBE of τ (θ) where θ is the population parameter.

For a single observation x of X , the likelihood function for θ is L(θ) = pθ (x) , and

t(x) − τ (θ) and ∂ log∂θL(θ) are proportional, i.e.,

∂ log L(θ)

= A(θ)[t(x) − τ (θ)]

∂θ

where A(θ) is a function of θ only.

Integrating with respect to θ , one can get

Z Z

∂ log L(θ) = A(θ)[t(x) − τ (θ)]dθ + c

L(θ) = eθ1 t(x)+θ2 +c

= eθ1 t(x)+θ2 ec

= eθ1 t(x)+θ2 h(x) where ec = h(x)

Conversely, the likelihood function L(θ) is expressible in the form

Z Z

L(θ)dx = h(x)et(x)θ1 +θ2 dx = 1

Z

i.e., h(x)et(x)θ1 dx = e−θ2

Further, assuming the differentiation with respect to θ1 under the integral sign is valid

and differentiate twice, one can get

Z

dθ2

h(x)et(x)θ1 t(x)dx = e−θ2 − (5.9)

dθ1

2

d2 θ2

Z

dθ2

h(x)et(x)θ1 [t2 (x)]dx = e−θ2 − e−θ2 (5.10)

dθ1 dθ12

172

Probability Models and their Parametric Estimation

dθ1 = τ (θ)

2

2

dθ2 d2 θ2

R 2

From equation (5.10), t (x)et(x)θ1 +θ2 h(x)dx = dθ 1

− dθ12

2

d2 θ2

2 dθ2

Eθ [T ] = −

dθ1 dθ12

2 d2 θ 2

Vθ [T ] = Eθ [T 2 ] − (Eθ [T ]) = −

dθ12

n o

d dθ2

0

[τ (θ)] 2 dθ (− dθ1 )

h i2 = h i2

∂ log L(θ) ∂ log L(θ)

Eθ ∂θ Eθ ∂θ

dθ2 dθ1 2

[− dθd1 { dθ 1

} dθ ]

= h i2

Eθ ∂ log∂θL(θ)

2

( ddθθ22 )2 { dθ 1 2

dθ }

1

= h i2

Eθ ∂ log∂θL(θ)

= t(x) +

∂θ dθ dθ

∂ 2 log L(θ) d2 θ 1 d2 θ2

2

= t(x) 2 +

2 ∂θ dθ dθ2

2

d2 θ2

∂ log L(θ) d θ1

Eθ = E θ [T ] +

∂θ2 dθ2 dθ2

2

d2 θ2

d θ1 dθ2

= − +

dθ2 dθ1 dθ2

dθ2 dθ2 dθ1

=

dθ dθ1 dθ

2

d θ2 d dθ2 dθ1

=

dθ2 dθ dθ1 dθ

d[ dθ

dθ1 ] dθ1

2

dθ2 d2 θ1

= +

dθ dθ dθ1 dθ2

2

d2 θ2 dθ1 dθ2 d2 θ1

= 2 +

dθ1 dθ dθ1 dθ2

2 2

∂ log L(θ) ∂ log L(θ)

But Eθ = −Eθ

∂θ ∂θ2

173

A. Santhakumaran

The variance of the MVBE of τ (θ) = − dθ

dθ1

2

is

2

{ ddθθ22 }2 { dθ 1 2

dθ } d2 θ 2

1

=−

2

− ddθθ22 ( dθ 1 2

dθ )

dθ12

1

2

The variance of T = t(X) is − ddθθ22 . Thus T = t(X) attains the MVB of the

1

parametric function τ (θ).

Example 5.16 Let X1 , X2 , · · · , Xn be a random sample drawn from the popula-

tion with pdf

θxθ−1 0 < x < 1, θ > 0

pθ (x) =

0 otherwise

Find the MVBE of θ .

The likelihood function for θ is

n

!θ−1

Y

n

L(θ) = θ xi

i=1

n

X

log L(θ) = n log θ + (θ − 1) log xi

i=1

log xi − n

P P

n log θ+θ i=1 log xi

→ L(θ) = e

→ L(θ) = eθ1 t(x)+θ2 h(x)

where θ1 = θ, θ2 = n log θ,

P X

h(x) = e− log xi

, t(x) = log xi

dθ2 n

τ (θ) = − =−

dθ1 θ

d nθ

2

d θ2 n

Vθ [T ] = − 2 = − = 2

dθ1 dθ θ

P

θ ⇒ θ .

−n

Thus the MVBE of θ is θ̂(X) = log Xi .

P

If MVBE exists, then the MLE is a function of the MVBE.

Assume that the MVBE T = t(X) exists for the parameter θ , then

∂ log L(θ)

= A(θ)[t(x) − θ]

∂θ

∂ log L(θ) ∂ 2 log L(θ)

L(θ) attains maximum, if ∂θ = 0 and ∂θ 2 < 0 at θ = θ̂(x) .

174

Probability Models and their Parametric Estimation

∂ 2 log L(θ)

= A0 (θ)[t(x) − θ] + A(θ)(−1)

∂θ2

∂ 2 log L(θ)

= −A(θ̂(x)) < 0 at θ = θ̂(x)

∂θ2

where θ(X)ˆ is MLE of θ .

Example 5.17 If T = t(X) is MVBE of τ (θ) and pθ (x1 , x2 , · · · , xn ) the joint

density function corresponding to n independent observations of a random variable

X , then show that correlation between T and ∂ log pθ (x∂θ 1 ,x2 ,··· ,xn )

is unity.

Given T = t(X) is the MVUE of τ (θ) , i.e., T attains the Cramer Rao lower

bound,

[τ 0 (θ)]2

⇒ Vθ [T ] = ] θ∈Ω

Vθ [ ∂ log pθ (x∂θ

1 ,x2 ,··· ,xn )

∂ log pθ (x1 , x2 , · · · , xn )

i.e., [τ 0 (θ)]2 = Vθ [T ]Vθ [ ]

∂θ

r

∂ log pθ (x1 , x2 , · · · , xn )

τ 0 (θ) = Vθ [T ]Vθ [ ]

∂θ

But τ (θ) = Eθ [T ]

Z

= tpθ (x1 , x2 , · · · , xn )dx

∂pθ (x1 , x2 , · · · , xn )

Z

τ 0 (θ) = t dx

∂θ

∂pθ (x1 , x2 , · · · , xn ) pθ (x1 , x2 , · · · , xn )

Z

= t dx

∂θ pθ (x1 , x2 , · · · , xn )

∂ log pθ (x1 , x2 , · · · , xn )

Z

= t pθ (x1 , x2 , · · · , xn )dx

∂θ

∂ log pθ (x1 , x2 , · · · , xn )

= Eθ T

∂θ

log pθ (x1 , x2 , · · · , xn )

= Covθ T,

∂θ

Correlation coefficient between T and log pθ (x1∂θ ,x2 ,··· ,xn )

is

h i

Covθ T, log pθ (x1∂θ,x2 ,··· ,xn )

ρ= r h i

Vθ [T ]Vθ ∂ log pθ (x∂θ

1 ,x2 ,··· ,xn )

τ 0 (θ)

ρ = r h i

∂ log pθ (x1 ,x2 ,··· ,xn )

Vθ [T ]Vθ ∂θ

= 1

r h i

∂ log pθ (x1 ,x2 ,··· ,xn )

Since τ 0 (θ) = Vθ [T ]Vθ ∂θ

175

A. Santhakumaran

Let X1 , X2 , · · · , Xn be iid random sample of size n with pdf pθ (x) where

θ = (θ1 , θ2 , · · · , θk ) of k parameters. Define µ0r = Eθ [X r ], r = 1, 2, · · · , k .

The method of moments estimation is a principle of solving a set of k equa-

tions in θ1 , θ2 , · · · , θk to estimate the parameters θ1 , θ2 , · · · , θk , i.e., θ̂(µ0 ) =

µ01 , µ02 , · · · , µ0k . Replace µ0r by m0r , where m0r is the rth raw moment of the random

sample. It gives the moment estimators of the parameters.

Remark 5.3 Moment estimators are consistent under suitable conditions. For iid

random sample X1 , X2 , · · · , Xn with pdf pθ (x) ∀ θ ∈ Ω ,

n

1X r P

X → E[X r ] as n → ∞, r = 1, 2, · · ·

n i=1 i

This is not true when the moments of the distribution do not exist. For example in the

case of Cauchy distribution moment estimators do not exist.

Example 5.18 A random sample of size n is taken from the log normal distribu-

tion ( 2

√1 1 − 2σ12 (log x−θ)

x e x>0

pθ,σ2 (x) = 2πσ

0 otherwise

176

Probability Models and their Parametric Estimation

Z ∞

1 xr − 12 (log x−θ)2

E[X r ] = √ e 2σ dx

0 2πσ x

Take y = log x, i.e., ey = x → ey dy = dx

Z ∞

1 1 2

E[X r ] = √ ery e− 2σ2 (y−θ) dy

0 2πσ

y−θ

Let = z → y = σz + θ, dy = σdz

σ Z ∞

1 1 2

E[X r ] = √ erθ− 2 z +rσz dz

−∞ 2π

Z ∞

erθ 1 2

= √ e− 2 [z −2rσz] dz

2π −∞

2 2

rθ+ r 2σ Z ∞

e 1 2

= √ e− 2 [z−rσ] dz

2π −∞

2 2

rθ+ r 2σ √ Z ∞ √

0 e 1 2

µr = √ 2π since e− 2 [z−rσ] dz = 2π

2π −∞

r2 σ2

µr0 = erθ+ 2 r = 1, 2, · · ·

σ2 2

when r = 1 log µ10 = θ + , 2 log µ10 = 2θ + σ 2 , log (µ10 ) = 2θ + σ 2

2 !

0

2 µ

when r = 2 log µ20 = 2θ + 2σ 2 , log µ20 − log (µ10 ) = σ 2 , log 2

2 = σ2

(µ10 )

m02

P r

0 xi

⇒ σ̂ 2 (x) = log where mr = r = 1, 2, · · ·

(m01 )2 n

m02

log(m01 )2 = 2θ̂(x) + log

(m01 )2

m02

log(m01 )2 − log = 2θ̂(x)

(m01 )2

!

(m01 )2

i.e., θ̂(x) = log p 0

m2

Example 5.19 Find the moment estimates of α and β for the pdf

( β

α −αx β−1

pα,β (x) = Γβ e x x > 0, β > 0, α > 0

0 otherwise

177

A. Santhakumaran

∞

αβ −αx β−1

Z

r

E[X ] = xr e x dx

0 Γβ

∞

αβ −αx r+β−1

Z

= e x dx

0 Γβ

Γ(β + r) αβ Γ(β + r)

µr0 = = r = 1, 2, · · ·

αβ+r Γβ αr Γβ

Γ(β + 1) βΓβ β

when r = 1 µ01 = = =

αΓβ αΓβ α

Γ(β + 2) (β + 1)βΓβ (β 2 + β)

when r = 2 µ02 = 2

= 2

=

α Γβ α Γβ α2

µ20 1 (µ0 )2

= 1+ → β= 1

(µ10 )2 β µ2

h P i2

(m0 )2 m0

P P 2

x

Thus β̂(x) = m12 and α̂(x) = m12 where m01 = nxi and m2 = n i − nxi .

Example 5.20 Obtain the moment estimate of the parameter θ of the pdf

1 −|x−θ|

2e −∞ < x < ∞

pθ (x) =

0 otherwise

Z ∞

x −|x−θ|

µ01 = Eθ [X] = e dx

−∞ 2

|x − θ| = x − θ if x ≥ θ

= −(x − θ) if x ≤ θ

Z θ Z ∞

x (x−θ) x −(x−θ)

µ01 = e dx + e dx

−∞ 2 θ 2

when x − θ = t →x=t+θ

Z 0 Z ∞

2µ01 = (t + θ)et dt + (t + θ)e−t dt

−∞ 0

Z ∞ Z 0 Z ∞

= θ e−|t| dt + tet dt + te−t dt

−∞ −∞ 0

Z ∞ ∞

Z ∞

Z

−|t| −t

= θ dt − θ

e te dt + θ te−t dt

−∞ 0 0

Z ∞

1 −|t|

= θ since e dt = 1

2 −∞

P

xi

µ01 = θ → θ̂(x) = m01 where m01 = .

n

178

Probability Models and their Parametric Estimation

estimates of the parameters a and b of the rectangular distribution

1

b−a a<x<b

pa,b (x) =

0 otherwise

Z b

x a+b

µ01 = E[X] = dx =

a b−a 2

b

x2 b3 − a3 b2 + ab + a2

Z

1

µ02 = E[X ] =2

dx = =

a a−b b−a 3 3

2

b + 2ab + a2 − ab

2

(2µ01 ) − ab

µ02 = =

3 3

3µ02 = 4(µ01 )2 − ab and b = 2µ01 − a

.˙. 3µ02 = 4(µ01 )2 − a(2µ01 − a) ⇒ 3µ02 = 4(µ01 )2 − 2aµ01 + a2

2

a2 − 2aµ01 + 4µ01 − 3µ02 = 0

q

2µ01 ± 4µ01 2 − 4(4µ01 2 − 3µ02 )

a=

2

√ √ √

â(x) = m1 ± 3m2 . But 2µ1 = µ1 ± 3m2 + b ⇒ b̂(x) = m01√± 3m2 .

0 0 0

Thus the value of the moment estimators of a and b are â(x) = m01 − 3m2 and

√ P P 2

x

P 2

b̂(x) = m01 + 3m2 where m01 = nxi and m2 = n i − n

xi

X=x 0 1 2

Pθ {X = x} 1 − θ − θ2 θ θ2

Obtain the moment estimate of θ , if in a sample of 25 observation there were 10 ones

and 4 twos.

X=x Pθ {X = x} Frequency f

0 1 − θ − θ2 11

1 θ 10

2 θ2 P 4

Total 1 fi = 25 P

One can get, µ01 = Eθ [X] = (1 − θ − θ2 ) × 0 + θ × 1 + θ2 × 2 = Pfi xi

fi

0 + θ + 2θ2 = 0+10+8

25

50θ2 + √

25θ − 18 = 0

θ̂(x) = −25± 625+4×50×18

2×50 = .4

Example 5.23 Let X1 , X2 , · · · , Xn be a random sample drawn from a population

with pdf

α

(

α α−1 − xβ

pα,β (x) = βx e x, β, α > 0

0 otherwise

179

A. Santhakumaran

Z ∞

α xα

E[X r ] = xα+r−1 e− β dx

β

0 Z ∞

α 1 α+r−1 y 1 1

= yα e− β y α −1 dy where y = xα

β α

Z ∞0

1 y r

= e− β y α +1−1 dy

β 0

1 Γ αr + 1

r

r

µ0r = 1 r

+1

= βαΓ +1

β (β)α α

1 1 2 2

µ01 = β α Γ + 1 and µ02 = β α Γ +1

α α

2

2 2 2 1

µ2 = β α Γ +1 −βα Γ +1

α α

2

µ2 Γ( α2 + 1) − Γ( α1 + 1)

=

(µ01 )2

2

Γ( 1 + 1)α

Pn P

S 1 Xi

Coefficient of variation = X̄

where S 2 = n−1

2

i=1 (Xi − X̄) and X̄ = n .

Equivating

2

S2 Γ( α2 + 1) − Γ( α1 + 1)

= 2

x̄2 Γ( α1 + 1)

and using iterative method to estimate the value of α . From the estimate α̂(x) one

can obtain the estimate of β .

Prthat a sample contains r classes with observed frequency f1 , f2 , · · · , fthr

Suppose

such that fi = f . Let πi (θ) be the probability of an observation in the i

i=1 P

r

class such that i=1 πi (θ) = 1 . The probability πi (θ) is the function of θ . Let

T = t(X) be any statistic for the parameter θ where θ is unknown. A statistic

T = t(X) is called minimum χ2 estimator of the parameter θ if it is obtained by

minimizing χ2 with respect to θ ,i.e.,

r

X [fi − f πi (θ)]2

χ2 =

i=1

f πi (θ)

r

X fi2

= −f

i=1

f πi (θ)

2 r

∂χ X fi2 dπi (θ)

= − =0

∂θ i=1

f πi2 (θ) dθ

r

X fi2 dπi (θ)

→ = 0

i=1

f πi2 (θ) dθ

180

Probability Models and their Parametric Estimation

Remark 5.4 Minimum χ2 estimator is analogous to that of MLE of θ . The

asymptotic properties of Minimum χ2 estimators are similar to those of MLE.

A modified form of Minimum χ2 estimator is obtained by minimizing

r

X [f πi (θ) − fi ]2

χ2mod =

i=1

fi

r

X f 2 πi2 (θ)

= −f

i=1

fi

r

∂χ2mod X f2 dπi (θ)

= 2 πi (θ) =0

∂θ i=1

fi dθ

r

X f 2 dπ 2 (θ)

i

⇒ = 0

i=1

fi dθ

Example 5.24 Find minimum χ2 estimate of the parameter θ of the Poisson

distribution.

e−θ θj

Let πj (θ) = j = 0, 1, · · · ,

j!

dπj (θ) e−θ θj−1 j θj e−θ (−1)

= +

dθ j! j!

−θ j

e θ j

= −1

j! θ

dπj (θ) j

= πj −1

dθ θ

dχ2 X fj2 dπj (θ)

= − =0

dθ j

f πj2 (θ) dθ

X fj2

j

i.e., π j (θ) − 1 = 0

j

πj2 (θ) θ

X fj2 j

−1 = 0

j

πj (θ) θ

X fj2 j

1− = 0

j

πj (θ) θ

Iterative method may be used to solve the equation for θ . Alternatively, expand

P fj2 j

f (θ) = j πj (θ) 1 − θ in a Taylor’s series as a function of θ upto first order

181

A. Santhakumaran

" 2 #

X fj2 j

X fj2 j

X fj2 j

j

1− = 1− + (θ − x̄) + 1−

j

πj (θ) θ j

mj x̄ j

mj x̄2 x̄

e−x̄ x̄j

where πj (x̄) = mj = and f (θ) = f (x̄) + (θ − x̄)f 0 (x̄)

j!

d πj1(θ) (1 − θj ) 1

j

j

1 dπj (θ)

since = 0+ 2 − 1− 2

dθ πj (θ) θ θ πj (θ) dθ

1 j j 1 j

= − 1 − (−π j (θ)) 1 −

πj (θ) θ2 θ πj (θ) θ

" 2 #

1 j j

= 2

+ 1− = f 0 (θ)

πj (θ) θ θ

X fj2 j

But 1− = 0

j

πj (θ) θ

" 2 #

X fj2 j

X fj2 j

j

1− + (θ − x̄) + 1− =0

j

mj x̄ j

mj x̄2 x̄

fj2

1 − x̄j

P

− j mj

θ − x̄ = fj2 j

+ (1 − x̄j )2 ]

P

j mj [ x̄2

fj2

− j] x̄1

P

− j mj [x̄

θ − x̄ = fj2

+ (x̄ − j)2 ] x̄12

P

j mj [j

P fj2

j mj [j − x̄]

Let θ1 = x̄ + x̄ P fj2

j mj [j + (j − x̄)2 ]

To improve the value of θ from x̄ , repeat the process until to get the convergent value

of θ .

Example 5.25 Show that for large sample size, maximizing the likelihood function

of the χ2 statistic is equal to minimizing the χ2 statistic.

Let oj be the observed frequency and ej be the theoretical frequency of the

P (o −e )2

j class. Then χ2 = j j ej j . For large fixed sample size n , the distribution of

th

182

Probability Models and their Parametric Estimation

n! e o1 e o2 e o r

1 2 r

L = ···

o1 !o2 ! · · · or ! n n n

such that o1 + o2 + · · · + or = n

o1 o2 or

n! e1 e2 er o1 o1 o o r

r

= ··· ···

o1 !o2 ! · · · or ! o1 o2 or n n

r

X ej

log L = constant + oj log

j=1

oj

1

For large fixed sample size, ej = oj + aj n1−δ , δ > 0, i.e., ej = oj + aj n 2 for δ = 12 ,

1 P P 1 P

where aj is finite and |aj n 2 | < and j oj = j ej = n so that n

2

j aj = 0

1

aj < 0(n− 2 ) .

P P

as n → ∞ and if n 6→ ∞ , then aj < 1 for every > 0 , i.e.,

n2

r 1

" #

X oj + aj n 2

log L = constant + oj log

j=1

oj

r

" 1

!#

X aj n 2

= constant + oj log 1 +

j=1

oj

" 1

!#

X aj n 2 a2j n 1

= constant + oj − 2 + ···

j

oj oj 2

X 1 1 X a2j n 1

= constant + aj n 2 − + 0(n− 2 )

j

2 j oj

1 X (ej − oj )2 1

= constant − + 0(n− 2 )

2 j oj

(ej −oj )2

If modified χ2 statistic is defined as χ2mod =

P

j oj , then

1

log L = constant − χ2mod as n → ∞.

2

183

A. Santhakumaran

X (oj − ej )2 X (ej − oj )2

χ2 − χ2mod = −

j

ej j

oj

X (ej − oj )2 oj

= −1

j

oj ej

" #

X (ej − oj )2 oj

= 1 − 1

j

oj oj + aj n 2

!−1

1

X (ej − oj )2

1 + aj n

2

= − 1

j

oj oj

" 1

#

X (ej − oj )2 aj n 2 a2j n

= 1− + 2 − ······ − 1

j

oj oj oj

X (oj − ej )2 aj n1/2 X (oj − ej )2 a2j n

= − + − ···

j

o2j j

o3j

X a3j n3/2 X a4j n2

= − + − · · · where ej − oj = aj n1/2

j

o2j j

o3j

3

1

3 i3

a3j n 2 a3j

h 1

3

= o(n− 2 =

P P

Since j o2j < for some n > N ⇒ j o2j < 3 = 1

n2 n2

1

1 4

1 a4j (n 2 )4 a4j 1 14

o(n− 2 ) and

P P

j o3j

< 1 for some n > N ⇒ j o3j < 1

4

< 1 =

(n 2 ) n2

1 1

(o(n− 2 ))4 = o(n− 2 ) where > 0 and 1 > 0.

1

χ2 − χ2mod = o(n− 2 ) = 0 as n → ∞

1

Thus log L = constant − χ2 as n → ∞

2

1

max log L = constant + {− max χ2 } as n → ∞

2

1

= constant + min χ2 as n → ∞

2

Maximizing the likelihood function of the χ2 statistic = Minimizing the χ2 statistic.

Consider a linear model Y = Xθ + where is a non - observable random

vector such that E[i ] = 0 and V [i ] = σ 2 ∀i, Y is a known vector, θ is the

184

Probability Models and their Parametric Estimation

parameter to be estimated

y1 θ1 1

y2 θ2 2

Y = ···

θ =

··· =

···

yn n×1

θ m m×1

n n×1

X = coefficient matrix of the parameter θ

x11 x12 · · · x1m

x21 x22 · · · x2m

i.e., X =

··· ··· ··· ···

Definition 5.1 An estimate of θ say θ̂(x) which minimizes (Y − Xθ)0 (Y − Xθ) is

called the Least Square Estimator (LSE) of θ , i.e., = (Y − Xθ), 0 = (Y − Xθ)0 .

Define S = 0 = (Y − Xθ)0 (Y − Xθ) . The necessary condition is dS dθ = 0 and the

2

sufficient condition is ddθS2 > 0 at θ = θ̂(x) for minimization of S .

dS

⇒ = −2X 0 (Y − Xθ) = 0

dθ

X 0 Y − X 0 Xθ = 0

X 0 Xθ = X 0 Y

V θ̂(x) = (X 0 X)−1 X 0 Y provided (X 0 X)−1 exists

θ̂(x) = (X 0 X)−1 X 0 Y

= (X 0 X)−1 X 0 [Xθ + ]

= (X 0 X)−1 X 0 Xθ + (X 0 X)−1 X 0

Eθ [θ̂(X)] = θ + (X 0 X)−1 X 0 Eθ []

= θ since Eθ [] = 0

θ̂(X) − θ = (X 0 X)−1 X 0 Y − θ

= (X 0 X)−1 X 0 [Xθ + ] − θ

= (X 0 X)−1 X 0 Xθ + (X 0 X)−1 X 0 − θ

= θ + (X 0 X)−1 X 0 − θ

= (X 0 X)−1 X 0

Vθ [θ̂(X)] = Eθ [θ̂(X) − θ][θ̂(X) − θ]0

= Eθ [(X 0 X)−1 X 0 ][(X 0 X)−1 X 0 ]0

= Eθ [0 ](X 0 X)−1 (X 0 X)(X 0 X)−1

= σ 2 (X 0 X)−1

since E[i ] = 0, V [i ] = σ 2 and E[0 ] = σ 2 I

185

A. Santhakumaran

Linear Estimation

and E[Y ] = Xθ . Any linear function b0 θ = b1 θ1 + b2 θ2 + · · · + bm θm in θ is

unbiasedly estimable if there exists a linear function c0 Y = c1 y1 + c2 y2 + · · · + cn yn

such that Eθ [c0 Y ] = b0 θ .

Theorem 5.7 A necessary and sufficient condition for the ensilability of b0 θ is

ρ(X 0 ) = ρ(X 0 , b)

i.e., Eθ [c0 Y ] = b0 θ ∀ θ

⇒ c0 Xθ = b0 θ ∀θ, i.e., c0 X = b0

X 0 c = b is solvable, i.e., ρ(X 0 ) = ρ(X 0 , b)

Conversely, suppose ρ(X 0 ) = ρ(X 0 , b) ⇒ X 0 c = b , the equation is consistent.

.˙. X 0 c = b is solvable , i.e., c0 X = b0

c0 Xθ = b0 θ

c0 Eθ [Y ] = b0 θ

Eθ [c0 Y ] = b0 θ

⇒ b0 θ is estimable.

Remarks 5.5 ρ(X 0 ) = ρ(X 0 , b) ⇒ ρ(X 0 X) = ρ(X 0 X, b)

Best Linear Unbiased Estimator

Definition 5.2 The unbiased linear estimate of an estimable linear parametric

function b0 θ = b1 θ1 + b2 θ2 + · · · + bm θm with minimum variance is called the best

linear unbiased estimator or BLUE.

Theorem 5.8 Let Y1 , Y2 , · · · , Yn be n independent random variables with variance

σ 2 and Eθ [Y ] = Xθ , then every estimable parametric function b0 θ possesses an

unique minimum variance unbiased estimator which is a function of θ̂(X) , the LSE of

θ . Further, E[Y − X θ̂ ]0 [Y − X θ̂ ] = (n − r)σ 2 .

Proof: b0 θ is estimable if there exist c0 Y such that Eθ [c0 Y ] = b0 θ

c0 Xθ = b0 θ ⇒ X 0 c = b (5.11)

0 0 2

and V [c Y ] = c cσ (5.12)

Minimize equation (5.12) subject to equation ( 5.11)

Using the method of Lagrange multipliers , one determines the stationary points by

considering

L(λ) = c0 c − 2λ0 (X 0 c − b)

where λ is a vector of Lagrange multiplier

dL(λ)

= 2c0 − 2λ0 X 0

dc

186

Probability Models and their Parametric Estimation

The stationary points of the function L(λ) are given by the equation

dL(λ)

= 0

dc

⇒ c0 − λ0 X 0 = 0

⇒ c0 = λ0 X 0 i.e., c = Xλ

X 0 Xλ = b (5.13)

0

Since b θ is estimable, equation (5.11) is solvable, i.e.,

ρ(X 0 ) = ρ(X 0 , b) ↔ ρ(X 0 X) = ρ(X 0 X, b).

Thus equation (5.13) is solvable. Let c(1) and c(2) be two solutions for to λ(1) and

λ(2) of equation (5.13).

c(1) = Xλ(1)

c(2) = Xλ(2)

0 (1)

X Xλ = b

0 (2)

X Xλ = b

X 0 X(λ(1) − λ(2) ) = 0

c(1) − c(2) = X(λ(1) − λ(2) )

0 0

c(1) − c(2) c(1) − c(2) = λ(1) − λ(2) X 0 X λ(1) − λ(2)

0

⇒ c(1) − c(2) c(1) − c(2) = 0

(1)

⇒c = c(2)

Thus, whatever be the solution of λ of the equation (5.13) the values of c0 are the

same. Hence b0 θ possesses an unique minimum variance unbiased estimator.

Suppose that ρ(X) = r andthe first r columns of X are linearly independent.

b1

Let X = [X1 X2 ] , b = . Now the solution of the equation (5.13) is λ =

b2

−1

(X10 X1 ) b1

.˙. c = Xλ

−1

= X1 (X10 X1 ) b1

−1 −1

c0 c = b01 (X10 X1 ) X10 X1 (X10 X1 ) b1

−1

= b01 0

(X X1 ) b1

For every c satisfying X 0 c = b

c0 c = c0 [I − X1 (X10 X1 )−1 X10 ]c + c0 X1 (X10 X1 )−1 X10 ]c

= c0 [I − X1 (X10 X1 )−1 X10 ]c + b01 (X10 X1 )−1 (X10 X1 )(X10 X1 )−1 (X10 X1 )](X10 X1 )−1 b1

= c0 [I − X1 (X10 X1 )−1 X10 ]c + b01 (X10 X1 )−1 b1

Since [I − X1 (X10 X1 )−1 X10 ] is an idempotent matrix

= c0 [I − X1 (X10 X1 )−1 X10 ][I − X1 (X10 X1 )−1 X10 ]c + b01 (X10 X1 )−1 b1

≥ b01 (X10 X1 )−1 b1

187

A. Santhakumaran

This indicates that the minimum is actually obtained. The LSE θ̂(X) of θ is obtained

by minimizing (Y − X θ̂)0 (Y − X θ̂) . The normal equation is

X 0 Xθ = X 0 Y ⇒ c0 Y = λ0 X 0 Y = λ0 X 0 X θ̂ = b0 θ̂(X) since b0 = λ0 X 0 X.

The best

linear unbiased−1estimator of c0 Y is b0 θ̂(X) .

0

Since I − X1 (X1 X1 ) X1 is a projection matrix and hence it is an idempotent

matrix. Further, it is an well known property that for an idempotent matrix A, ρ(A) =

T r(A)

⇒ ρ I − X1 (X10 X1 )−1 X10 = T r I − X1 (X10 X1 )−1 X10 = (n − r)

.˙. Eθ [(Y −X θ̂ )0 (Y −X θ̂)] = Eθ [(Y −X θ̂ )0 (I−X1 (X10 X1 )−1 X10 )(Y −X θ̂)] = (n−r)σ 2 .

Example 5.26 E[Y1 ] = θ1 + θ2 , E[Y2 ] = θ2 + θ3 , E[Y3 ] = θ3 + θ1 . Show that

l1 θ1 + l2 θ2 + l3 θ3 is estimable if l1 + l2 = l3 .

Given

1 0 1 1 0 1 l1 θ1

X= 0 1 1 , X0 = 0 1 0 , l = l2 , θ = θ2

1 0 1 1 1 1 l3 θ3

l0 θ is estimable if and only if ρ(X 0 ) = ρ(X 0 , l) Consider X 0 θ = l , i.e.,

1 0 1 θ1 l1

0 1 0 θ 2 = l2

1 1 1 θ3 l3

1 0 1 θ1 l1

0 1 0 θ 2 = l2

0 1 0 θ3 l3 − l1

1 0 1 θ1 l1

0 1 0 θ2 = l2

0 0 0 θ3 l3 − l1 − l2

ρ(X 0 ) = ρ(X 0 , l) if l3 − l1 − l2 = 0 , i.e., l3 = l1 + l2 .

Example 5.27 The feed intake of a cow with weight X1 and yield of milk X2 may

be of the linear model Y = a + b1 X1 + b2 X2 + , where is called random error

or random residuals. If yi , xi1 and xi2 are the values of Y, X1 and X2 for cow

i = 1, 2, 3, 4 and 5 . The following observations are made on 5 cows:

i Y X1 X2

1 62 2 6

2 60 9 10

3 57 6 4

4 48 3 13

5 23 5 2

188

Probability Models and their Parametric Estimation

where

62 1 2 6

60 1 9 10 250

X 0 Y = 1265

Y = 57 X= 1 6 4

48 1 3 13 1870 3×1

23 5×1 1 5 2 5×3

5 25

35 790 −80 −42

1

(X 0 X)−1 = 25 155

175 = −80 16 0

480

35 175

325 −42 0 6

790 −80 −42 250 37 a

1

θ̂(X) = (X 0 X)−1 X 0 Y = −80 16 0 1265 = 21 = b1

480 3

−42 0 6 1870 2 b2

The estimated linear model is Y = 37 + 12 X1 + 23 X2 .

Problems

5.1 Define LSE. Show that under certain assumptions to be stated, the LSE’s are

minimum variance unbiased estimators.

5.2 Let yi = βxi + i , i = 1, 2, 3, · · · , n where 1 , 2 , · · · n are uncorrelated ran-

dom variables with mean 0 and σ 2 . Find the LSE of β . Show that the LSE of

β is unbiased. Find the variance of LSE of β . Also show that LSE of β is the

best Linear Unbiased Estimator of β .

5.3 Examine the sufficiency and unbiasedness of the MLE.

5.4 Independent random samples of sizes n1 , n2 , and n3 are available from three

normal populations with mean α+β +γ, α−β and β −γ respectively and with

a common variance σ 2 . Find the MLE of α, β, γ and σ 2 . Are they UMVUE’s?

5.5 Give the conditions for which

(a) the likelihood equation has a consistent estimator with probability approach-

ing one as n → ∞ .

(b) the consistent estimator of the likelihood equation is asymptotically normal.

5.6 Explain the principle of Maximum Likelihood of Estimation of parameter θ of

p(x | θ) . Obtain MLE of the parameters of N (θ, σ 2 ) . Also examine them for

unbiasedness.

5.7 Show that under what regularity conditions to be stated, the MLE is asymptotically

normally distributed.

5.8 Let X1 , X2 , · · · , Xn be a random sample drawn from a population with mean

θ and finite variance. Let T = t(X) be an estimator for θ and has min-

imum variance and T 0 = t0 (X) is any other unbiased estimator of θ , then

Covθ [T, T 0 ] = V [T ] .

189

A. Santhakumaran

5.9 Derive the formula to calculate the MLE of θ , using a random sampleP from the

θx

distribution with Pθ {X = x} = ax g(θ) , x = 1, 2, · · · where g(θ) = ax θx .

Also obtain the explicit expression for the case of truncated Poisson distribution

with x = 1, 2, 3, · · · .

5.10 Show that MLE of θ based on n independent observations from a uniform

distribution in (0, θ) is consistent.

5.11 Find the MLE of θ given the observations .8 and .3 on a random variable with

pdf 2x

θ 0<x<θ

pθ (x) = (1−x)

2 1−θ if θ ≤ x < 1, 0 < θ < 1

N (0, θ) . Find the MLE of θ .

5.13 Given a random sample from N (θ, 1), (θ = 0, ±1, ±2, · · · , ) . Find the MLE of

θ.

5.14 Explain the method of scoring to obtain the MLE.

5.15 Obtain the MLE of θ based on random samples of sizes n and m from popu-

x

lations with respective frequency functions θ1 e− θ and θe−xθ , x > 0, θ > 0 .

5.16 What is MVBE? Obtain sufficient conditions for an estimator to be MVBE.

5.17 Give an account of estimation by the method of (i) Moments (ii) Minimum χ2 ,

giving one illustration in each case.

5.19 Examine the truth of the following statements

(i) MLE is unique

(ii) MLE is unbiased

(iii) If sufficient statistics T = t(X) exists for parameter θ , then MLE is a

function of T .

5.20 Show that under certain conditions to be stated MLE is consistent.

5.21 Examine whether MLE always exists.

5.22 Obtain the general form of distribution admitting MVBE’s.

5.23 A random sample of size n is available from pθ (x) = θxθ−1 , 0 < x < 1, θ > 0.

Find that function of θ for which MVBE exists. Also find the MVBE of this

function and its variance.

−θ x

5.24 Derive the MVUE of θ2 in pθ(x) = e x!θ , x = 0, 1, · · · , by taking a sample

of size n and show that it is not a MVBE of θ2 .

5.25 Describe the Minimum χ2 method of estimation. Show that, under what certain

conditions to be stated, the methods of Minimum χ2 and Maximum likelihood

χ2 statistic are equally efficient estimators.

190

Probability Models and their Parametric Estimation

5.26 Show that MVBE’s exist for the exponential family of densities.

5.27 Find MLE of β in Gamma(1, β ) based on a sample of size n where the actual

observations are not available but it is known that k of the observations are less

than or equal to a fixed positive number M .

5.28 Obtain the BLUE of θ for the normal distribution with mean θ and variance

σ 2 based on n observations x1 , x2 , · · · , xn .

5.29 Obtain the MLE for the coefficient of variation from a population with N (θ, σ 2 )

based on n observations.

5.30 Obtain the MLE of θ for the pdf

(1 + θ)xθ 0 < x < 1 and θ > 0

pθ (x) =

0 otherwise

5.31 Obtain the MLE of θ using a random sample of size n from

1

2θ −θ < x < θ

pθ (x) =

0 otherwise

5.32 Show that maximum likelihood estimation χ2 statistic and Minimum χ2 statis-

tic give the same results as n → ∞ .

5.33 Find the MLE of N of

1

N if x = 1, 2, · · · , N, N ∈ I+

pN (x) =

0 otherwise

5.34 Suppose X1 , X2 , · · · , Xn are iid observations from the density

x2

2x

pθ (x) = θ 2 exp{− θ 2 } x > 0, θ > 0

0 otherwise

5.35 If the random variable X takes the value 0 or 1 with probability 1 − p and p

respectively and p ∈ [0.1, 0.9] , then maximum likelihood estimate of p on the

basis of a single observation x would be

(a) 8x+1

10 (b) x (c) 9−8x

10 (d) x2

Hint:

p̂ if x = 0

The maximum of L(p) =

1 − p̂ if x = 1

191

A. Santhakumaran

5.36 The maximum likelihood estimator of σ 2 in a normal population with mean zero

is

(a) n1 P(xi − x̄)2

P

1

(b) n−1 (x − x̄)2

1

P 2 i

(c) n P xi

1

(d) n−1 x2i

5.37 Consider the following statements:

The maximum likelihood estimators

1. are consistent

2. have invariance property

3. can be made unbiased using an adjustment factor even if they are biased. Of

these statements:

(a) 1 and 3 are correct

(b) 1 and 2 are correct

(c) 2 and 3 are correct

(d) 1, 2 and 3 are correct

5.38 Which of the following statements are not correct?

1. From the Cramer - Rao inequality one can always find the lower bound of the

variance of an unbiased estimator.

2. If sufficient statistic exits, then maximum likelihood estimator is itself a suffi-

cient statistic.

3. UMVUE and MVBE’s are same.

4. MLE’s may not be unique

Select the correct answer given below:

(a) 1 and 3 (b) 1 and 2 (c) 1 and 4 (d) 2 and 3

5.39 Which one of the following is not necessary for the UMVU estimation of θ by

T = t(X) ?

(a) E[T − θ] = 0

(b) E[T − θ]2 < ∞

(c) E[T − θ]2 is minimum

(d) T = t(X) is a linear function of observations

5.40 Consider the following statements:

If X1 , X2 , · · · , Xn are iid random variables with uniform distribution over

(0, θ) , then

1. 2X̄ is an unbiased estimator of θ .

2.The largest among X1 , X2 , · · · , Xn is an unbiased estimator θ

3. The largest among X1 , X2 · · · , Xn is sufficient for θ

4. n+1

n X(n) is a minimum unbiased estimator of θ

Of these statements:

(a) 1 alone is correct

(b) 1 and 2 are correct

(c) 1, 3 and 4 are correct

(d) 1 and 4 are correct

192

Probability Models and their Parametric Estimation

5.41 LSE and MLE are the same if the sample comes from the population is :

(a) Normal (b) Binomial (c) Cauchy ( d) Exponential

5.42 LSE of the parameters of a linear model are

(a) unbiased (b) BLUE (c) UMVU (d) all the above

193

A. Santhakumaran

6. INTERVAL ESTIMATION

6.1 Introduction

Let X be a random sample drawn from a population with pdf pθ (x), θ ∈ Ω.

For every distinct value of θ, θ ∈ Ω , there corresponds one member of the family of

distributions. Thus one has a family of pdf ’s {pθ (x), θ ∈ Ω} . The experimenter

needs to select a point estimate of θ, θ ∈ Ω . Even though the estimator may have

some valid statistical properties, the estimator may not reflect the true value of the

parameters, due to the randomness of the observations. Hence one may search for an

alternative to get the closeness of the estimates to the unknown parameters with certain

probability values. Hence as an alternative one may go for the interval estimation with

certain level of significance. This chapter deals with interval estimation.

Family of random sets

k

Let Pθ , θ ∈ Ω ⊆ < , be the set of probability distributions of the random variable

X . A family of subsets of S(X) of Ω depends on the observations x of X but not

θ , is called family of random sets.

The problem of interval estimation is that finding a family of random sets S(X)

of the parameter θ , such that for a given α, 0 < α < 1, Pθ {S(X) contains θ} ≥

1 − α, ∀ θ ∈ Ω .

Let θ ∈ Ω ⊆ < and 0 < α < 1 . A function θ(X) satisfying Pθ {θ(X) ≤

θ} ≥ 1−α ∀ θ is called lower confidence bound of θ at confidence level (1−α) . The

infiumum takes over all possible values of θ ∈ Ω ⊆ < of Pθ {θ(X) ≤ θ} is (1 − α) .

The quantity (1 − α) is called confidence coefficient.

A function of the form Pθ {θ ≤ θ̄(X)} ≥ 1 − α ∀ θ ∈ Ω ⊆ < is called upper

confidence bound of θ at confidence level (1 − α) .

If S(x) is of the form S(x) = θ(x), θ̄(x) , then it is called a confidence

interval at confidence level (1 − α) , provided Pθ {θ(X) ≤ θ ≤ θ̄(X)} ≥ 1 − α ∀ θ ∈

Ω ⊆ <. Theconfidence coefficient (1 − α) is associated with the random interval

θ(X), θ̄(X) ,

Let X be a random sample drawn from a population with pdf pθ (x), θ ∈

Ω ⊆ < and a, b be two given positive numbers such that a < b, a, b ∈ < . Consider

Pθ {a < X < b} = Pθ {a < X and X < b}

X

= Pθ 1 < and X < b

a

b

= Pθ b < X and X < b

a

b

= Pθ X < b and b < X

a

b

= Pθ X < b < X

a

194

Probability Models and their Parametric Estimation

The interval with end points X and ab X that are functions of X . Hence I(X) =

X, ab X is a random interval. Thus if I(X) takes a value x, ab x when X takes

Let θ be an unknown parameter and let (θ(X), θ̄(X)) be a (1 − α) level confi-

dence interval for θ . One desires

the confidence limit for g(θ) , a monotonic

function

of θ . The set θ(X), θ̄(X) is equivalent to the set g(θ(X)), g(θ̄(X)) as long

as g(θ) is a monotonic increasing function of θ . Thus g(θ(X)), g(θ̄(X)) is a

(1 − α) level confidence

interval for g(θ) . If g(θ) is monotonic decreasing, then

g(θ̄(X)), g(θ(X)) is a (1 − α) level confidence interval for g(θ) .

Example 6.1 For a single observation x of a random variable X with density

function 1

θ 0 < x < θ, θ > 0

pθ (x) =

0 otherwise

Obtain the probability of confidence of the random interval (X, 10X) for θ, θ ∈ Ω .

The probability of confidence of the interval (X, 10X) for θ is

θ

Pθ {X < θ < 10X} = Pθ 1 < < 10

X

θ

= Pθ <X<θ

10

Z θ

1

= dx = .9

θ

10

θ

1 19

Example 6.2 Find the confidence coefficient of the confidence interval 19X , X

for θ based on a single observation x of a random variable X with pdf

θ

(1+θx)2 0 < x < ∞, θ > 0

pθ (x) =

0 otherwise

1

, 19

The confidence coefficient of the interval 19X X for θ is

1 19 1 19

Pθ <θ< = Pθ <X<

19X X 19θ θ

Z 19 θ θ

= dx

1

19θ

(1 + θx)2

Z 19

1 θ 2θ

= dx

2 19θ (1 + θx)2

1

19

1 1 θ

= −

2 (1 + θx) 1

19θ

1 1 19

= − − = .45

2 20 20

195

A. Santhakumaran

X 2X

Example 6.3 Compute the confidence coefficient of the interval 1+X , 1+2X

θ

for 1+θ where X has the pdf

1

θ0 < x < θ, θ > 0

pθ (x) =

0otherwise

X 2X θ

The confidence coefficient of the interval 1+X , 1+2X for 1+θ is

X θ 2X 1 + 2X 1+θ 1+x

Pθ < < = Pθ < <

1+X 1+θ 1 + 2X 2X θ X

1 1 1

= Pθ +1< +1< +1

2X θ X

1 1 1

= Pθ < <

2X θ X

= Pθ {X < θ < 2X}

θ

= Pθ 1 < <2

X

X 1

= Pθ 1 > >

θ 2

θ

= Pθ <X<θ

2

Z θ

1 1 θ

= dx = θ− = .5

θ

2

θ θ 2

Example 6.4 Let T = t(X) be the maximum of two independent observations

drawn from a population with uniform distribution over the interval (0, θ) . Compute

the confidence coefficient of the interval (0, 2T ) .

Let T = max{X1 , X2 } . The pdf of T is

2

pθ (t) = θ2 t 0 < t < θ

0 otherwise

The confidence coefficient of the interval (0, 2T ) is

θ

Pθ {0 < θ < 2T } = Pθ 0 < <2

T

θ

= Pθ <T <∞

2

θ

= Pθ <T <θ

2

Z θ

2

= tdt

θ

2

θ2

θ

2 t2

= = .75

θ2 2 θ

2

196

Probability Models and their Parametric Estimation

2

θ 2 (θ − x) 0 < x < θ, θ > 0

pθ (x) =

0 otherwise

Find (1 − α) level confidence interval for θ.

Consider the pdf of Y = X θ . It is given by

2(1 − y) 0 < y < 1

p(y) =

0 otherwise

The (1 − α) level confidence interval for θ is

Pθ {λ1 < Y < λ2 } = 1−α

α

i.e., Pθ {Y ≥ λ2 } =

2

α

and Pθ {Y ≤ λ1 } =

2

Z 1

α

Thus 2(1 − y)dy =

λ2 2

λ22 − 2λ2 − (1 − α/2) = 0

p

⇒ λ2 = 1 − 2 − α/2 = c2

Pθ {Y ≤ λ1 } = α/2

Z λ1

2(1 − y)dy = α/2

0

⇒ λ21 − 2λ1 + α/2 = 0

p

λ1 = 1− 1 − α/2 = c1

The (1 − α) level confidence interval for θ is

Pθ {c1 < Y < c2 } = 1 − α

X

Pθ c1 < < c2 = 1−α

θ

X X

Pθ <θ< = 1−α

c2 c1

X X

is the (1 − α) level confidence interval for θ .

c2 , c1

Example 6.6 Obtain (1 − α) level confidence interval for θ , using a random

sample of size n from a population with pdf

−(x−θ)

e x ≥ θ, θ > 0

pθ (x) =

0 otherwise

Let Y1 = min1≤i≤n {Xi } be the first order statistic of random sample

X1 , X2 , · · · , Xn . The pdf of Y1 is given by

ne−n(y1 −θ) θ < y1 < ∞

pθ (y1 ) =

0 otherwise

197

A. Santhakumaran

ne−nt

0<t<∞

p(t) =

0 otherwise

Z λ2

ne−nt dt = 1 − α

λ1

e−nλ1 − e−nλ2 = 1−α

This equation has infinitely many solutions. If one can choose λ1 = 0, then 1 −

e−nλ2 = 1 − α , i.e., e−nλ2 = α → −nλ2 = log α . Thus λ2 = n1 log( α1 ) . .˙. The

(1 − α) level confidence interval for θ is

1 1

Pθ 0 < T < log = 1−α

n α

1 1

Pθ 0 < Y1 − θ < log = 1−α

n α

1 1

Pθ Y1 − log < θ < Y1 = 1−α

n α

Example 6.7 Given a sample of size n from U (0, θ) . Show that the confidence

interval for θ based on the sample range R with confidence coefficient (1 − α) and

of the form (R, Rc ) has c given as a root of the equation

cn−1 [n − (n − 1)c] = α.

The pdf of Range R of sample size n is given by

( R ∞ hR in−2

x+R

pθ (R) = n(n − 1) −∞

p(x | θ)p[(x + R) | θ] x

p(x | θ)dx dx

0 otherwise

198

Probability Models and their Parametric Estimation

Given pθ (x) = θ1 , 0 < x < θ and pθ (x+R) = θ1 , 0 < x+R < θ or 0 < x < θ −R .

"Z #n−2

Z θ−R R+x

11 1

pθ (R) = n(n − 1) dx dx

0 θθ x θ

Z θ−R

1 1

= n(n − 1) Rn−2 dx

0 θ2 θn−2

n(n − 1) n−2

= R (θ − R)

θn

n−2

n(n − 1) R R

= 1− , 0<R<θ

θ θ θ

R

If y = θ, then

n(n − 1)y n−2 (1 − y) 0 < y < 1

p(y) =

=0 otherwise

R

Pθ {λ1 < < λ2 } = 1 − α

θ

P {λ1 < Y < λ2 } = 1 − α

Z λ2

p(y)dy = 1 − α

λ1

Z λ2

n(n − 1)y n−2 (1 − y)dy = 1−α

λ1

λ2

y n−1 yn

n(n − 1) − = 1−α

n−1 n λ1

nλn−1

2 − (n − 1)λn2 − nλn−1

1 + (n − 1)λn1 = 1−α

This equation has infinitely many solutions. If one can choose λ1 = c and λ2 = 1 ,

then the confidence interval for θ is

R

P {c < < 1} = 1−α

θ

R

P {R < θ < } = 1−α

c

R, Rc is the (1 − α) level confidence interval for θ where c is given by cn−1 [n −

√

(n − 1)c] = α . For n = 2, c = 1 − 1 − α.

For large or small samples, the Chebychev’s inequality can be employed to find

the confidence interval for a parameter θ, θ ∈ Ω. For a random variable X with

199

A. Santhakumaran

n p o 1

Pθ |X − θ| < V [X] > 1 − 2 where > 1

If θ̂(X) is the estimate of θ ( not necessarily unbiased) with finite variance, then by

Chebychev’s Inequality

1

q

2

Pθ |θ̂(X) − θ| < Eθ [θ̂(X) − θ] > 1 − 2

q q

⇒ θ̂(x) − Eθ [θ̂(X) − θ]2 , θ̂(x) + Eθ [θ̂(X) − θ]2 is a 1 − 12 level con-

fidence interval for θ .

Example 6.8 Let X1 , X2 , · · · , Xn be iid b(1, θ) random variables. Obtain (1 −

α) level confidence interval for θ by using Chebychev’s Inequality.

Pn Vθ [X]

i=1 Xi ∼ b(n, θ) since each Xi ∼ b(1, θ) . Eθ [X̄] = θ and Vθ [X̄] = n =

θ(1−θ)

n . Now ( )

r

θ(1 − θ) 1

Pθ |X̄ − θ| < >1− 2

n

Since θ(1 − θ) ≤ 14 ,

1 1

Pθ |X̄ − θ| < √ >1− 2

2 n

1

Pθ X̄ − √ < θ < X̄ + √ >1− 2

2 n 2 n

If n is kept constant, then one can choose 1 − 12 = 1 − α ⇒ 2 = 1

α ⇒ = √1 .

α

Thus the (1 − α) level confidence interval for θ is

1 1

x̄ − √ , x̄ + √

2 nα 2 nα

Obtain (1 − α) level confidence interval for θ .

One can find the largest integer n2 (θ) such that Pθ {X ≤ n2 (θ)} ≥ α2 and

the smallest integer n1 (θ) such that Pθ {X ≥ n1 (θ)} ≥ α2 .

Because of the discreteness of the Binomial probability, one cannot make these

probabilities exactly equal α2 for all θ other than the symmetrical Binomial probabil-

ity. The events {X ≤ n1 (θ)} and {X ≥ n2 (θ)} are mutually exclusive, then

α α

Pθ {X ≤ n1 (θ) or X ≥ n2 (θ)} ≤ + =α

2 2

i.e., Pθ {n1 (θ) < X < n2 (θ)} ≥ 1−α

200

Probability Models and their Parametric Estimation

The two functions n1 (θ) and n2 (θ) are monotonic and non - decreasing and also

discontinuous step function such that the (1 − α) level confidence interval for θ is

Pθ {n−1 −1

2 (X) < θ < n1 (X)} ≥ 1−α

α

where Pθ {X ≤ n1 (θ)} ≤

2

n1 (θ)

X n α

i.e., i θi (1 − θ)n−i ≤

i=0

2

1 (x) is the upper confidence limit for θ and

n1 (n−1

1 (x)) = x so that

x n

X α

i θi (1 − θ)n−i = (6.1)

i=0

2

Thus the upper confidence limit for θ . Similarly the lower confidence limit for θ is

n n

X α

i θi (1 − θ)n−i = (6.2)

i=x

2

Solving the equations (6.1) and (6.2) for θ ( when n and α are known) gives the

(1 − α) level confidence interval for θ . i.e., (θ(X), θ̄(X)) is the (1 − α) level

confidence interval where θ̄(x) is the solution of the equation (6.1) and θ(x) is the

solution the equation (6.2).

Example 6.10 Assuming there is a constant probability θ , for a person entering a

supermarket will make a purchase. Constitute a random sample of a Bernoulli random

variable ( success = purchase made, failure = no purchase made). If 10 persons were

selected at random and it was found that 4 made a purchase. Obtain 90% confidence

interval for θ .

The 90 % confidence limits for θ is

4

!

X 10

i θi (1 − θ)10−i = .05

i=0

10

!

X 10

i θi (1 − θ)10−i = .05

i=4

Solving these equations for θ, one may get that θ̄(x) = .696 and θ(x) = .150.

Thus, if a random sample of 10 independent Bernoulli random variables gives x = 4

success, the 90 % confidence interval for θ is ( .150, .696).

Example 6.11 Let X1 , X2 , · · · , Xn be a random sample from a Poisson random

variable X with parameter

Pn θ . Obtain (1 − α) level confidence interval for θ .

Let Y = i=1 Xi . Given that each Xi follows P (θ) . Then Y ∼ P (nθ) .

The exact (1 − α) level confidence interval for θ is

Pθ {λ1 (θ) < Y < λ2 (θ)} = 1 − α

α

i.e., Pθ {Y ≥ λ2 (θ)} ≤

2

201

A. Santhakumaran

∞

X (nθ)x α

⇒ e−nθ = (6.3)

x=y

x! 2

α

and Pθ {Y ≤ λ1 (θ)} ≤

2

y

X (nθ)x α

⇒ e−nθ = (6.4)

x=0

x! 2

Pθ {λ−1 −1

2 (Y ) < θ < λ1 (Y )} = 1 − α

Solving the equations

θ is θ(X), θ̄(X) where θ̄(x) is the solution of the equation (6.3) and θ(x) is the

solution of the equation (6.4).

Example 6.12 Let X1 , X2 , · · · , Xn be a random sample of a Uniform random

variable X on (0, θ) . Obtain (1 − α) level confidence interval for θ .

Let T = t(X) = max1≤i≤n {Xi } . The pdf of T is

n n−1

p(t | θ) = θn t 0<t<θ

0 otherwise

α

Pθ {T ≤ λ1 (θ)} =

2

α

and Pθ {T ≥ λ2 (θ)} =

2

Z θ

n n−1 α

Thus Pθ {T ≥ λ2 (θ)} = n

t dt =

λ2 (θ) θ 2

Z λ2 (θ)

n n−1 α

⇒ 1− n

t dt =

0 θ 2

α [λ2 (θ)]n

⇒ 1− =

2 θn

α

i.e., θn 1 − = [λ2 (θ)]n

2

α n1

i.e., λ2 (θ) = θ 1 −

2

α

Similarly Pθ {T ≤ λ1 (θ)} =

2

Z λ1 (θ)

n n−1 α

⇒ n

t dt =

0 θ 2

α n1

i.e., λ1 (θ) = θ

2

202

Probability Models and their Parametric Estimation

1

α n1

α n

Pθ θ <T <θ 1− = 1−α

2 2

T T

Pθ < θ < = 1−α

1 − α n1 1

α n

2 2

T T

1 , 1 provides the (1 − α) level confidence interval for θ .

(1− α2 ) n ( α2 ) n

Example 6.13 Let X1 , X2 , · · · , Xn be iid random samples drawn from a normal

population with mean θ and variance σ 2 . Find (1 − α) level confidence interval for

θ , (i) when σ 2 is known and (ii) σ 2 is unknown.

Case (i) when σ 2 is known , consider

X̄ − θ

whereZ = √ ∼ N (0, 1)

σ/ n

X̄ − θ

√ < b} =

P {a < 1−α

σ/ n

√ √

P {X̄ − bσ/ n < θ < X̄ − aσ/ n} = 1−α

Z a

α

where a is given by φ(z)dz =

−∞ 2

Z ∞

α

and b is given by φ(z)dz =

b 2

Case (ii) When σ 2 is unknown and sample size n ≤ 30 then the statistic

X̄ − θ

t= √ ∼ t distribution with n − 1 d.f

S/ n

1

Pn

where S 2 = n−1 i=1 [Xi − X̄]2 . In this case

X̄ − θ

P {t1 < √ < t2 } = 1−α

S/ n

X̄ − θ

P {t1 < √ < t2 } = 1−α

S/ n

√ √

P {X̄ − t2 S/ n < θ < X̄ − t1 S/ n} = 1−α

Z t1

α

where t1 is given by pn−1 (t)dt =

−∞ 2

Z ∞

α

and t2 is given by pn−1 (t)dt =

t2 2

203

A. Santhakumaran

X̄−θ

If n > 30 , then t = √

S/ n

∼ N (0, 1) . Such a case the 1 − α confidence interval is

S S

(X̄ − zα/2 √ , X̄ − zalpha/2 √ )

n n

R∞

where α2 = zα/2 φ(z)dz

Example 6.14 A random sampling of size 50 taken from a N (θ, σ = 5) has a

mean 40. Obtain a 95% confidence interval for 2θ + 3

Given the sample mean x̄ = 40 and population standard deviation σ = 5 . The

95% confidence interval for θ is

σ σ

P {X̄ − 1.96 √ < θ < X̄ + 1.96 √ } = .95

n n

σ σ

P {2 X̄ − 1.96 √ < 2θ < 2 X̄ + 1.96 √ } = .95

n n

σ σ

P {2 X̄ − 1.96 √ + 3 < 2θ + 3 < 2 X̄ + 1.96 √ + 3} = .95

n n

5×2 5×2

2X̄ + 3 ± 1.96 √ = 83 ± 1.96 √

50 50

Let X1 , X2 , · · · , Xn be a random sample from a pdf p(x | θ) and t(X; θ) = Tθ

be an random variable with distribution independent of θ . Suppose λ1 (α) and λ2 (α)

are chosen such that

For every Tθ , λ1 (α) and λ2 (α) can be chosen in number of ways. However the choice

is one like to choose λ1 (α) and λ2 (α) , such that θ̄(X) − θ(X) is minimum which is

the (1 − α) level shortest confidence interval based on Tθ .

Let Tθ = t(X, θ) be sufficient statistic. A random variable Tθ is a function

of (X1 , X2 , · · · , Xn ) and θ whose distribution is independent of θ is called pivot.

Example 6.15 Let X1 , X2 , · · · , Xn be a random sample from N (θ, σ 2 ) where

2

σ is known. Obtain (1 − α) level shortest confidence interval for θ .

Consider the statistic Tθ = X̄−θ

√σ which is a pivot. Since X̄ is sufficient and

n

Tθ ∼ N (0, 1) , i.e, the distribution of Tθ is independent of θ . The (1 − α) level

204

Probability Models and their Parametric Estimation

Pθ {a < Tθ < b} = 1−α

( )

X̄ − θ

Pθ a< σ <b = 1−α

√

n

σ σ

Pθ X̄ − b √ < θ < X̄ − a √ = 1−α

n n

The length of this confidence interval is √σn (b − a) .

Minimize L = √σn (b − a) subject to

Z b

1 1 2

√ e− 2 x dx = 1 − α

a 2π

Z b

i.e., φ(x)dx = 1 − α (6.6)

a

where φ(x) ∼ N (0, 1). The necessary condition for minimum of L is

∂L σ db

= √ −1 =0

∂a n da

db

⇒ −1 = 0

da

Define φ(a) =

1 1 2

inta−∞ √ e− 2 z dz

2π

Differentiate equation (6.6) with respect to a

Z b

dφ(x) da db

dx − φ(a) + φ(b) = 0

a da da da

Z b

db

0 × dx − φ(a) + φ(b) = 0

a da

db

φ(b) − φ(a) = 0

da

db φ(a)

⇒ =

da φ(b)

φ(a)

Thus −1 = 0

φ(b)

dL

RIf a da = 0, then φ(a) = φ(b) , i.e., when R a a = b or a = −b . If a = b , then

φ(x)dx = 0 which does not satisfy φ(x)dx = 1 − α. If a = −b , then

Ra b a

−b

φ(x)dx = 1 − α . Thus the shortest length confidence interval based on Tθ is a

equal two tails confidence interval. The (1 − α) level confidence interval for θ is

σ σ

X̄ − √ z α2 , X̄ + √ z α2

n n

205

A. Santhakumaran

where z α2 is the upper ordinate corresponding to the area α2 . The shortest length of

this interval is L = 2z α2 √σn .

Example 6.16 Let X1 , X2 , · · · , Xn be a sample from U (0, θ). Find (1−α) level

shortest confidence interval for θ .

Let T = max1≤i≤n {Xi } . The pdf of T is

n n−1

p(t | θ) = θn t 0<t<θ

0 otherwise

T

The pdf of Y = θ is given by

ntn−1

0<y<1

p(y) =

0 otherwise

T

The statistic Y = θ is pivot. The (1 − α) level confidence interval for θ is

P {a < Y < b} = 1 − α

T

P {a < < b} = 1 − α

θ

T T

P{ < θ < } = 1 − α

b a

1 1

The length of the interval isL = ( − )T

a b

To find the shortest confidence interval, minimizing L subject to

Z b

ny n−1 dy = 1 − α

a

bn − an = 1−α

Differentiate this with respect to b

da

nbn−1 − nan−1 = 0

db

n−1

da b

i.e., =

db a

1

Now(1 − α) n < b ≤ 1

dL 1 da 1

= T − 2 + 2

db a db b

n+1 n+1

a −b

= T <0

b2 an+1

since a < b ≤ 1 . The minimum occurs at b = 1 , i.,e., 1 − an = 1 − α → an = α

1

and a = α n . Thus the (1 − α) shortest confidence interval for θ is

T

T, 1

αn

206

Probability Models and their Parametric Estimation

N (θ, σ 2 ) where σ 2 is unknown. Obtain (1 − α) level shortest confidence interval

for θ .

Pn

The statistic Tθ = (X̄−θ)

√S

1

is a pivot where S 2 = n−1 i=1 (Xi − X̄)

2

n

distribution with (n − 1) degrees of freedom. The (1 − α) level confidence interval

for θ is given by

Pθ {a < Tθ < b} = 1 − α

S S

Pθ X̄ − b √ < θ < X̄ − a √ = 1−α

n n

The length of the confidence interval L = (b − a) √Sn .

Rb

Minimizing L subject to a pn−1 (t)dt = (1 − α) where pn−1 (t) is the pdf of the t

distribution with n − 1 degrees of freedom.

dL db S

= − 1 √ and

da da n

db

pn−1 (b) − pn−1 (a) = 0

da

dL pn−1 (a) S

→ = −1 √

da pn−1 (b) n

Z a

where pn−1 (a) = pn−1 (t)dt

−∞

interval is a equal two tails confidence interval for θ is

S S

X̄ − t α2 (n − 1) √ , X̄ + t α2 (n − 1) √

n n

R a

where a = t α2 (n − 1) is given by −∞ pn−1 (t)dt = α2 and b = −a

This shortest length of this interval is L = 2t α2 (n − 1) √Sn .

Example 6.18 Let X1 , X2 , · · · Xn be iid random samples drawn from a Normal

population with mean θ and variance σ 2 . Find (1 − α) level shortest confidence

interval for σ 2 when (i) θ is known and (ii) θ is unknown.

The Statistic Pn

(Xi − θ)2

Tσ2 = i=1 2 ∼ χ2 (ndf )

σ

Tσ2 is a pivot, since the statistic Tσ2 is independent of σ 2

Case (i) The (1 − α) level confidential interval for σ 2 is

P {a < Tσ2 < b} = 1 − α

Pn

(Xi − θ)2

P {a < i=1 2 < b} = 1 − α

Pn σ

n

(Xi − θ)2 (Xi − θ)2

P

i.e., P { i=1 < σ 2 < i=1 } = 1−α

b a

207

A. Santhakumaran

n

2 1 1

X

L= (Xi − θ) −

i=1

a b

Z b

pn (χ2 )dχ2 = 1 − α

a

dL 1 1 X

= ( − ) (Xi − θ)2

da a b

Z b

db

and 0dχ2 + pn (b) − pn (a) = 0

a da

db pn (a)

i.e., =

da pn (b)

Z a

wherepn (a) = pn (χ2 )dχ2

0

X

dL 1 1 pn (a)

= − (Xi − θ)2

da a b pn (b)

dL

For minimum = 0

da

1 1 pn (a)

⇒ 2 =

a b2 pn (b)

⇒ b2 pn (b) = a2 pn (a)

Using iterative method to solve the equation b2 pn (b) = a2 pn (a) for a and b i.e., to

solve Z b Z a

b2 pn (χ2 )dχ2 = a2 pn (χ2 )dχ2 where a < b and a 6= b

0 0

If â and b̂ are the solution of the equation, then the shortest confidence interval for

σ 2 is Pn Pn

2

(Xi − θ)2

i=1 (Xi − θ)

, i=1

b̂ â

Case(ii) If θ is unknown, then

Pn

(Xi − X̄)2 (n − 1)S 2 Pn

Tσ2 = i=1 2 = ∼ χ2 (n−1)df where S 2 = 1

n−1 i=1 (Xi − X̄)2

σ σ2

In this case to solve the equation

208

Probability Models and their Parametric Estimation

(n − 1)S 2

Tσ2 =

σ2

with (n − 1)df

The shortest confidence interval for σ 2 is

(n − 1)S 2 (n − 1)S 2

,

b̂ â

Example 6.19 Let X and Y be two independent random variables that are

N (θ, σ12 ) and N (θ, σ22 ) respectively. Obtain (1 − α) level confidence interval for

σ2

the ratio σ22 < 1 by considering a random sample X1 , X2 , · · · , Xn1 of size n1 ≥ 2

1

from the distribution of X and a random sample Y1 , Y2 , · · · , Yn2 of size n2 ≥ 2

from the distribution ofPYn1 . Pn2

Let s21 = n11 i=1 (Xi − X̄)2 and s22 = n12 i=1 (Yi − Ȳ )2 be the variances

n s2 n s2

of the two samples. The independent random variables σ1 2 1 and σ2 2 2 have χ2

1 2

distribution with n1 − 1 and n2 − 2 degrees of freedom respectively. The definition

of the F statistic is

n1 s21

σ12 (n1 −1)

F = n2 s22

∼ F distribution with n1 − 1 and n2 − 1 degrees of freedom.

σ22 (n2 −1)

σ22

The (1 − α) level confidence interval for σ12

is

n1 s21

σ12 (n1 −1)

P σ22 a< 2

n2 s2

< b = 1−α

2

σ1

σ22 (n2 −1)

2

n2 s2 n2 s22

(n2 −1) σ22 (n2 −1)

P σ22 a < 2 < b n s2 = 1−α

2

σ1

n1 s21 σ1 1 1

(n1 −1) (n1 −1)

σ22

The (1 − α) level confidence interval for σ12

is

a ,b

n1 s21 (n2 − 1) n1 s21 (n2 − 1)

Z a

α

= dF (n1 − 1, n2 − 1)

2

Z0 ∞

α

= dF (n1 − 1, n2 − 1)

2 b

209

A. Santhakumaran

ponential family of distributions with parameter θ. Assume the pdf is

−θx

θe θ>0

p(x | θ) =

0 otherwise

The joint pdf of the random sample X1 , X2 , · · · , Xn is

P

p(x1 , x2 , · · · , xn ) = θn e−θ xi

Pn

Let T = i=1Xi , then T ∼ G(n, θ1 ). Its pdf is

θn −θt n−1

Γn e t 0<t<∞

pθ (t) =

0 otherwise

1 − y 2n −1

2n Γn e

2y 2 0<y<∞

pθ (y) =

0 otherwise

That is Y = 2θ Xi follows χ2 distribution with 2n degrees of freedom. The

P

(1 − α) level confidence interval for θ is

n X o

Pθ a < 2θ Xi < b = 1−α

a b

Pθ P <θ< P = 1−α

2 Xi 2 Xi

where a is given by Z a

α

p2n (χ2 )dχ2 =

0 2

and b is given by Z ∞

α

p2n (χ2 )dχ2 =

b 2

Example 6.21 The time to failure for an electronic component is assumed to be an

Exponential distribution with unknown parameter θ ,

−θx

θe x > 0, θ > 0

i.e., p(x | θ) =

0 otherwise

10 electronic components are place on test and their observed times to failure are

607.5, 1947.0, 37.6, 129.9, 409.5, 529.5, 109.0, 582.4, 499.0, 188.1 hours respectively.

Find the 90% confidence interval for θ and 90% confidence interval for mean time to

failure. Also obtain the 90% confidence interval for the probability of the component

for a 100 hours period.

xi = 5039.5, 2n = 20 degrees of freedom. From χ2

P

As in the Example 6.16,

2 2

table χ0.5 = 10.9 and χ.95 = 31.4 . 90% confidence interval for θ is

10.9 31.4

, = (.00108, .00312)

2 × 5039.5 2 × 5039.5

210

Probability Models and their Parametric Estimation

The mean time to failure is θ1 . The 90% confidence interval for mean time to failure

1 1

lies between .00312 = 320.5 hours and .00108 = 925.9 hours.

The probability that one of these components will work at least t hours with-

out failure is P {X > t} = e−θt . The 90% confidence interval for the probabil-

ity of the component for a 100 hours period lies between e−100×.00312 = .732 and

e−100×.00108 = .898.

Example 6.22 Explain a method of construction of large sample confidence inter-

val for θ in Poisson (θ) .

For large samples the variable

∂ log L

∂θ

Z=q ∼ N (0, 1)

V [ ∂ log L

∂θ ]

Hence the distribution of Z one can easily construct the confidence limits for θ for

large samples. We have

X X

log L(θ) = xi log θ − nθ − log xi

∂ log L(θ) nx̄

= −n

∂θ θ

∂ log L(θ) nX̄

V = V[ − θ]

∂θ θ

n

1 X

= V [ Xi ]

θ2 i=1

n

1 X

= V [X]

θ2 i=1

1

= nθ

θ2

n

=

θ

nx̄

θ −n

ThusZ = p

n/θ

The 95% large confidence interval for θ is

P {−1.96 < Z < 1.96} = .95

r

n

P {−1.96 < (X̄ − θ) < 1.96} = .95

θ

Hence the 95% confidence limits for θ are

r

n

(x̄ − θ) = ±1.96

θ

3.42

θ2 − (2x̄ + )θ + x̄2 = 0

n r

1.92 3.42 3.69

θ = x̄ + ± x̄ + 2

n n n

211

A. Santhakumaran

Problems

6.2 Explain the shortest confidence interval. Also obtain (1 − α) level shortest con-

fidence interval for θ , using a random sample of size n from

−(x−θ)

e x ≥ 0, θ > 0

p(x | θ) =

0 otherwise

length confidence interval for θ at level (1 − α) .

6.4 Obtain (1 − α) level confidence interval for σ 2 when θ is known in N (θ, σ 2 ) .

known. what is its length?

6.6 Obtain (1 − α) coefficient confidence interval for θ based on a random sample

from 1 −1x

θe

θ x ≥ 0, θ > 0

p(x | θ) =

0 otherwise

6.7 Obtain (1 − α) level shortest confidence interval for θ using a random sample

from N (θ, 1) .

6.8 Given X1 , X2 , · · · , Xn is a random sample from N (θ, σ 2 ) , where σ 2 is known

. Find (1 − α) level upper confidence bound for θ .

6.9 Obtain a confidence interval for the range of a rectangular distribution in random

sample of size n .

6.10 The number of houses sold per week for 15 weeks by Dinesh real estate firm

were 3 , 3, 4, 6, 2, 4, 4, 3, 1, 2, 0 , 5, 7, 1, 4 respectively. Assuming these are the

observed values for a random sample of size 15 of a Poisson random variable

with parameter θ . Compute 95 % confidence limits for θ . Ans.(2.36, 4.18)

6.11 Show that in large samples, the 95% level confidence limits for the means of a

Poisson distribution are given by

r

1.92 3.84

X̄ + ± X̄

n n

6.12 Show that for the pdf

θe−θx

x > 0, θ > 0

p(x | θ) =

0 otherwise

212

Probability Models and their Parametric Estimation

the 95% level confidence limits for large samples are given by

1 ± 1.96

√

n

θ=

X̄

6.13 Obtain the large sample confidence interval with confidence coefficient (1 − α)

for the parameter of Bernoulli distribution.

6.14 Examine the connection between shortest confidence interval and sufficient

statistics.

6.15 90 % confidence interval for θ based on a single observation X from the den-

sity function 1

0 < x < θ, θ > 0

p(x | θ) = θ

0 otherwise

is

(a) [X, 10X] (b) 20X (c) 50

19 , 20X 49 , 12.5 (d) All the above

6.16 The correct interpretation regarding the confidence interval (T1 , T2 ) of the pa-

rameter θ for a distribution F (x | θ), θ ∈ < with confidence coefficient 1 − α

is

(a) θ belongs to (T1 , T2 ) with probability 1 − α

(b) (T1 , T2 ) covers the parameter θ with probability 1 − α

(c) (T1 , T2 ) includes the parameter θ with confidence coefficient 1 − α

(d) θ0 belongs to (T1 , T2 ) with confidence α where θ(6= θ0 ) is the true value.

6.17 If a random sample of n = 100 voters in a community produced 59 votes in

favour of the candidate A , then 95 % confidence interval of fraction p of the

voting population

q favouring A is

59×41

(a) 59 ± 1.96

q 100

(b) .59 ± 1.96 .59×.41

q 100

(c) 59 ± 2.58 .59×.41

q 100

(d) 59 ± 2.58 59×41

100

levelconfidence intervalfor θ is:

X(n) X(n)

(a) 1 , 1

(1−α/2) n (α/2) n

X(n) X(n)

(b) 1 , 1

(α/2) n (1−α/2) n

X(n) X(n)

(c) (1−α/2) n , (α/2)n

213

A. Santhakumaran

7. BAYES ESTIMATION

7.1 Introduction

Bayes estimation treats the parameter θ of a statistical distribution as the re-

alizations of a random variable Ω with known distribution rather than a unknown

constant. So far the realization of distributions have assumed only the shape of the

distribution to be known but not the value of the parameters. Bayes estimation uses the

prior information of the distribution to completely specify the realization of the distri-

butions. This is the major difference in Bayes estimation and it may quite reasonable, if

the past experience is sufficiently extensive and revelant to the problem. The choice of

the prior distribution is made like that of the distribution Pθ by combining experience

with convenience.

A number of observations are available from the distribution Pθ , θ ∈ Ω of

a random variable X and it may be used to check the assumption of the form of the

distribution. But in Bayes estimation only a single observation is available from the

distribution of the parameter θ on Ω and it cannot be used to check the assumption

of the distribution. This needs a special care to use in the Bayes estimation.

Replication of a random experiment consists of drawing another set of ob-

servations from the distribution Pθ of a random variable X is possible in the usual

estimation. Replication of a random experiment results taking another value θ0 on

Ω from the prior distribution, then drawing a set of observations from the distribution

Pθ0 of a random variable X is possible in Bayes estimation.

The determination of a Bayes estimation is quite simple in principle. When

consider a situation before observations are taken and the distribution of θ on Ω is

known as prior distribution.

A decision function d(X) is a statistic that takes value in Ω . A non negative

function L(θ, d(X)), θ ∈ Ω is called a loss function. The function R defined by

R(θ, d) = Eθ [L(θ, d(X)] is known as the risk function associated with the decision

function d(X) at θ . For example L(θ, d) = [θ − d]2 , θ ∈ Ω ⊆ < , then the risk

R(θ, d) = Eθ [d(X) − θ]2 is a mean squared error. If it is known as the variance of

the estimator d(X) when Eθ [d(X)] = θ.

Bayes Risk Related to Prior

In Bayes estimation, the pdf (pmf ) π(θ) of θ on Ω ⊆ < is known as prior

distribution. For a fixed θ ∈ Ω , the pdf (pmf ) p(x | θ) represents the conditional

pdf (pmf ) of a random variable X given θ . If π(θ) is the pdf (pmf ) of θ on

Ω ⊆ < , then the joint pdf (pmf ) of θ on Ω and X is given by p(x, θ) = π(θ)p(x |

θ)

The Bayes risk of a decision function d with respect to the loss function

L(θ, d) is defined by R(π, d) = Eθ [R(θ, d)]. If θ on Ω is a continuous random

variable and X is of the continuous type, then bayes risk with respect to the loss

214

Probability Models and their Parametric Estimation

function L(θ, d) is

R(π, d) = Eθ [R(θ, d)]

Z

= R(θ, d)π(θ)dθ

Z

= Eθ [L(θ, d(X))]π(θ)dθ

Z Z

= L(θ, d(x))p(x | θ)dx π(θ)dθ

Z Z

= L[θ, d(x)]p(x | θ)π(θ)dxdθ

then XX

R(π, d) = L[θ, d(x)]p(x | θ)π(θ)

θ x

A decision function d? (X) is known as a Bayes estimator, if it minimizes the

Bayes risk, i.e., if R(π, d? ) = inf d R(π, d).

p(θ | x) is the conditional distribution of a random variable θ on Ω given

X = x and also called as the a posteriori probability distribution of θ on Ω ,

given the sample. The joint pdf of X and θ on Ω can be expressed in the form

p(x, θ) = g(x)p(θ | x) where g(x) denotes the marginal pdf (pmf ) of X . The a

priori pdf (pmf ) π(θ) gives the distribution of θ on Ω before the sample is taken and

the posterior pdf (pmf ) p(θ | x) gives the distribution of θ on Ω after the sampling.

The Bayes risk function of a decision function d(X) with respect to a loss

function L(θ, d(X)) in terms of p(θ | x) is

R(π, d) = Eθ [R(θ, d)]

Z

= R(θ, d(x))g(x)dx

Z

= g(x)Eθ [L(θ, d(x))]dx

Z Z

= g(x) L(θ, d(x))p(θ | x)dθ dx

or

X

R(π, d) = g(x) [L(θ, d(x))p(θ | x)]

x

E[R(θ, d)] is a mean value of the risk R(θ, d) or the expected value of the risk

R(θ, d) . It is evident that a Bayes estimator d? (X) minimizes the mean value of the

risk R(θ, d).

215

A. Santhakumaran

θ) and π(θ) be a priori pdf of θ on Ω ⊆ < . Let L(θ, d) = (θ − d)2 be the

loss function for estimating the parameter θ . The Bayes estimator of θ is given by

d? (X) = E [θ | X = x] .

Proof: The risk function of a decision function d(x) with respect to the loss function

L(θ, d) = [θ − d]2 is

Z Z

R(π, d) = g(x) [θ − d(x)]2 p(θ | x)dθ dx

The Bayes estimator is a function d? (X) that minimizes R(π, d). Minimization of

R(π, d) is same as the minimization of

Z

[θ − d(x)]2 p(θ | x)dθ

d? (x) = E[θ | X = x]

since Eθ [d? (X)] = Eθ {E[θ | X = x]} = θ

Remark 7.1 If L(θ, d) = |θ − d| is the loss function for estimating the parameter

θ , then Bayes estimator of θ is the median of the posterior distribution of θ ∈ Ω ⊆ < .

Since E|X − a| is minimized as a function of a , i.e., E|X − a| is minimized when

a? = median of the distribution of X . Also Bayes estimator is need not be unbiased.

Minimax decision function

The principle of minimax estimator is to choose d? so that max R(θ, d? ) ≤

max R(θ, d) ∀d . If such a function d? exists, is a minimax estimator of θ ∈ Ω ⊆ < .

Theorem 7.2 If d? (X) is a Bayes estimator having constant risk, that is R(θ, d? )

= constant, then d? (X) is a minimax estimator.

Proof: Let π ? (θ) be the prior density corresponding to the Bayes estimator d? (X)

with respect to the loss function L(θ, d) . Then

θ∈Ω

= Eθ L[θ, d? (X)]

Z

= L(θ, d? (x))π ? (θ)dθ

Z

≤ L(θ, d(x))π(θ)dθ

≤ sup R(θ, d)

θ∈Ω

for any other estimator d(X) of the parameter θ . Thus d? (X) is a minimax estima-

216

Probability Models and their Parametric Estimation

tor.

= E[d(X) − E[d(X)] + E[d(X)] − θ]2

= E[d(X) − Ed(X)]2 + [E[d(X) − θ]2

= Vθ [d(X)] + [bias]2

Example 7.1 Let X ∼ b(n, θ) and the a priori pdf of θ on Ω ⊆ < is U (0, 1) .

Find the Bayes estimate of θ using quadratic loss function. Also find the minimax

estimate of θ

The a priori pdf of θ on Ω is

1 0<θ<1

π(θ) =

0 otherwise

Z

g(x) = p(x, θ)dθ

Z

= π(θ)p(x | θ)dθ

Z 1

= cnx θx (1 − θ)n−x dθ

0

Z 1

= cnx θx+1−1 (1 − θ)n−x+1−1 dθ

0

Γ(x + 1)Γ(n − x + 1)

= cnx

Γ(n − x + 1 + x + 1)

n! x!(n − x)!

=

x!(n − x)! (n + 1)!

1

n+1 x = 0, 1, 2, · · · , n

g(x) =

0 otherwise

The posterior pdf of θ on Ω is

p(x, θ) π(θ)p(x | θ)

p(θ | x) = =

g(x) g(x)

= (n + 1)cnx θx (1 − θ)n−x

217

A. Santhakumaran

= E (θ | X = x)

Z 1

= θp(θ | x)dθ

0

Z 1

= (n + 1)cnx θx+2−1 (1 − θ)n−x+1−1 dθ

0

n! (x + 1)!(n − x)!

= (n + 1)

x!(n − x)! (n + 2)!

x+1

=

n+2

n+2 .

Bayes minimax estimator of the function d? (X) with respect to the loss function

L(θ, d? ) is

Z Z

?

R(π, d ) = L[θ, d? (x)]π(θ)p(x | θ)dxdθ

Z ( n )

X

? 2

= π(θ) [d (x) − θ] p(x | θ) dθ where L(θ, d? (x)) = [d? (x) − θ]2

x=0

( n 2 )

Z 1 X x+1

= −θ p(x | θ) dθ

0 x=0

n+2

Z 1 2

X +1

= Eθ − θ dθ

0 n+2

Z 1

1

Eθ (X + 1)2 + (n + 2)2 θ2 − 2(X + 1)(n + 2)θ dθ

= 2

(n + 2) 0

1

Z 1

R(π, d? ) = Eθ [X 2 ] + 2Eθ [X] + 1 + θ2 (n + 2)2 − 2θ(n + 2)Eθ [X] − 2θ(n + 2) dθ

(n + 2)2 0

1

Z 1

R(π, d? ) = n(n − 1)θ2 + nθ + 2nθ + 1 + θ2 (n + 2)2 − 2θ(n + 2)nθ − 2θ(n + 2) dθ

(n + 2)2 0

Z 1

? 1

R(π, d ) = [nθ(1 − θ) + (1 − 2θ)2 ]dθ

(n + 2)2 0

1 n 1 1

= + =

(n + 2)2 6 3 6(n + 2)

with pmf

x

θ (1 − θ)1−x x = 0, 1 and 0 < θ < 1

p(x | θ) =

0 otherwise

218

Probability Models and their Parametric Estimation

1 0<θ<1

π(θ) =

0 otherwise

Find the Bayes estimate of θ and θ(1 − θ) using the quadratic loss function.

The marginal pdf of X1 , X2 , · · · , Xn is

Z

g(x1 , x2 , · · · , xn ) = p(x1 , x2 , · · · , xn , θ)dθ

Z 1

= π(θ)p(x1 , x2 , · · · , xn | θ)dθ

0

Z 1 P

n− xi

P

x1

= θ (1 − θ) dθ

0

Z 1 X

= θt+1−1 (1 − θ)n−t+1−1 dθ where t = xi

0

(

t!(n−t)!

(n+1)! t = 0, 1, 2, · · · , n

=

0 otherwise

p(x1 , x2 , · · · , xn , θ)

p(θ | x1 , x2 , · · · , xn ) =

g(x1 , x2 , · · · , xn )

π(θ)p(x1 , x2 , · · · , xn | θ)

=

g(x1 , x2 , · · · , xn )

(

(n+1)! t

t!(n−t)! θ (1 − θ)n−t 0<θ<1

=

0 otherwise

d? (x1 , x2 , · · · , xn ) = E [θ | X1 = x1 , · · · , Xn = xn ]

Z 1

(n + 1)!θt (1 − θ)n−t

= θ dθ

0 t!(n − t)!

Z 1

(n + 1)!

= θt+2−1 (1 − θ)n+1−t−1 dθ

t!(n − t)! 0

(n + 1)! (t + 1)!(n − t)!

=

t!(n − t)! (n + 2)!

P

t+1 xi + 1

= =

n+2 n+2

219

A. Santhakumaran

Z 1

?

d (x1 , x2 , · · · , xn ) = θ(1 − θ)p(θ | x1 , x2 , · · · , xn )dθ

0

Z 1

(n + 1)!

= θt+2−1 (1 − θ)n+2−t−1 dθ

t!(n − t)! 0

(n − t + 1)(t + 1)

=

(n + 3)(n + 2)

P P

(n − xi + 1)( xi + 1)

=

(n + 2)(n + 3)

population with parameter θ . For estimating θ , using the quadratic error loss func-

tion and the a priori distribution of θ on Ω , given by pdf

−θ

e θ>0

π(θ) =

0 otherwise

is used. Find the Bayes estimate for (i) θ and (ii) e−θ

The marginal pdf of X1 , X2 , · · · , Xn is

Z ∞

g(x1 , x2 , · · · , xn ) = p(x1 , x2 , · · · , xn , θ)dθ

0

Z ∞

= π(θ)p(x1 , x2 , · · · , xn | θ)dθ

0

P

−nθ θ

xi

−θ e

Z ∞

= e dθ

0 x1 ! · · · xn !

1 Z ∞

−(n+1)θ t+1−1 X

= Qn e θ dθ where t = xi

x ! 0

i=1 i

t!

= Qn

x !(n + 1)t+1

i=1 i

The posterior pdf of θ on Ω is

p(x1 , x2 , · · · , xn , θ)

p(θ | x1 , x2 , · · · , xn ) =

g(x1 , x2 , · · · , xn )

π(θ)p(x1 , x2 , · · · , xn | θ)

=

g(x1 , x2 , · · · , xn )

e−(n+1)θ θ t t+1 X

= (n + 1) where t = xi and 0 < θ < ∞

t!

∞

e−(n+1)θ θt

Z

d? (x1 , x2 , · · · , xn ) = θ (n + 1)t+1 dθ

0 t!

(n + 1)t+1 ∞ −(n+1)θ t+2−1

Z

= e θ dθ

t! 0

(n + 1)t+1 Γt + 2

=

t! (n + 1)t+2

t!(t + 1) t+1

= =

t!(n + 1) (n + 1)

220

Probability Models and their Parametric Estimation

∞

e−(n+1)θ θt

Z

?

d (x1 , x2 , · · · , xn ) = e−θ (n + 1)t+1 dθ

0 t!

(n + 1)t+1 ∞ −(n+2)θ t+1−1

Z

= e θ dθ

t! 0

(n + 1)t+1 Γt + 1

=

t! (n + 2)t+1

t+1

n+1

=

n+2

Example 7.4 X ∼ b(n, θ) and suppose that a priori pdf of θ on Ω is U (0, 1) .

2

Find the Bayes estimate of θ . Using loss function L(θ, d) = (θ−d) θ(1−θ) , find the Bayes

minimax estimate of θ .

x+1

As in Example 7.1, the Bayes estimate of θ is d? (x) = n+2 . Minimax estimate of

?

θ with respect to the loss function L(θ, d ) is

Z 1 Z

? ?

R(π, d ) = π(θ) L(θ, d (x))p(x | θ)dx dθ

"0 n #

X [d? (x) − θ]2

= p(x | θ) dθ

x=0

θ(1 − θ)

Z 1 "X n 2 #

x+1 1

= −θ p(x | θ) dθ

0 x=0

n+2 θ(1 − θ)

Z 1 2

X +1 1

= Eθ −θ dθ

0 n + 2 θ(1 − θ)

Z 1

1 1

= (n − 4) + dθ

(n + 2)2 0 θ(1 − θ)

Z 1

(n − 4) 1 1 1

= + + dθ

(n + 2)2 (n + 2)2 0 θ 1−θ

Example 7.5 Let X1 , X2 , · · · , Xn be a random sample drawn from a distribution

with pdf G(1, θ1 ) . To estimate θ , let priori pdf on θ be π(θ) = e−θ , θ > 0 and let

the loss function be squared error. Find the Bayes estimate of θ .

The marginal pdf of X1 , X2 , · · · , Xn is

Z ∞

g(x1 , x2 , · · · , xn ) = p(x1 , x2 , · · · , xn , θ)dθ

0

Z ∞

= π(θ)p(x1 , x2 , · · · , xn | θ)dθ

0

Z ∞ n

X

= e−θ(1+t) θn+1−1 dθ where t = xi

0 i=0

n!

= , 0<t<∞

(1 + t)n+1

221

A. Santhakumaran

p(x1 , x2 , · · · , xn , θ)

p(θ | x1 , x2 , · · · , xn ) =

g(x1 , x2 , · · · , xn )

π(θ)p(x1 , x2 , · · · , xn | θ)

=

g(x1 , x2 , · · · , xn )

(1 + t)n+1 −θ(1+t) n

= e θ , 0<θ<∞

n!

Bayes estimate of θ is

∞

(1 + t)n+1 ∞ −θ(1+t) n+2−1

Z Z

?

d (x) = e θ dθ

0 n! 0

(1 + t)n (n + 1)!

=

n! (1 + t)n+2

n+1 n+1

= 2

= P 2

(1 + t) [ xi + 1]

Example 7.6 Let X1 , X2 , · · · , Xn be iid random sample drawn from a popula-

tion with pmf b(1, θ) . Assume the a priori pdf of θ on Ω is

( a−1

θ (1−θ)b−1

0<θ<1

π(θ) = β(a,b)

0 otherwise

The marginal pdf of X1 , X2 , · · · , Xn is

Z 1

g(x1 , x2 , · · · , xn ) = π(θ)p(x1 , x2 , · · · , xn | θ)dθ

0

1

θt+a−1 (1 − θ)n−t+b−1

Z

= dθ

0 β(a, b)

1 Γ(a + t)Γ(n − t + b)

=

β(a, b) Γ(n + a + b)

The posterior pdf of θ on Ω is

Γ(n + a + b)

p(θ | x1 , x2 , · · · , xn ) = θa+t−1 (1 − θ)n+b−t−1 0 < θ < 1

Γ(a + t)Γ(a + b − t)

Bayes estimate of θ is

Z 1

? Γ(n + a + b)

d (x) = θa+1+t−1 (1 − θ)n+b−t−1 dθ

Γ(a + t)Γ(n + b − t) 0

P

a+t xi + a

= =

n+b+a n+b+a

Example 7.7 Let the a priori pdf of θ on Ω be N (0, 1) . Let X1 , X2 , · · · , Xn

be iid random sample drawn from a normal population with mean θ and variance 1.

222

Probability Models and their Parametric Estimation

Find the Bayes estimate of θ and Bayes risk with respect to a loss function L[θ, d] =

[θ − d]2 .

The a priori pdf of θ on Ω is

( 1 2

√1 e− 2 θ −∞ < θ < ∞

π(θ) = 2π

0 otherwise

( 1 2

√1 e− 2 (x−θ) −∞ < x < ∞

p(x | θ) = 2π

0 otherwise

Z ∞

g(x1 , x2 , · · · , xn ) = p(x1 , x2 , · · · , xn , θ)dθ

−∞

Z ∞

= π(θ)p(x1 , x2 , · · · , xn | θ)dθ

−∞

Z ∞ n

1 − 1 θ2 1 1

P 2

= √ e 2 √ e− 2 (xi −θ) dθ

−∞ 2π 2π

P 2 Z

− 21 xi ∞

e 1 2 2

= (n+1)

e− 2 [nθ +θ −2nθx̄] dθ

(2π) 2 −∞

P 2 Z

− 12 xi ∞

e (n+1) 2

e− 2 [θ − n ] dθ

2nx̄θ

= (n+1)

(2π) 2 −∞

P 2

− 21 xi Z ∞

e n2 x̄2 n+1 nx̄ 2

= (n+1)

e 2(n+1) e− 2 [θ− n+1 ] dθ

(2π) 2 −∞

√ nx̄ nx̄ 2

Put the transformation n + 1(θ − n+1 ) = t → (n + 1)(θ − n+1 ) = t2

2 2

1

x2i + 2(n+1)

n x̄

P

e− 2 ∞

Z

1 2

g(x1 , x2 , · · · , xn ) = n+1 √ e− 2 t dt

(2π) 2 n+1 −∞

2 2

− 12 x2i + 2(n+1)

n x̄

P

e √

= n+1 √ 2π

(2π) 2 n+1

1 2 n2 x̄2

P

e− 2 xi + 2(n+1)

= √ n

n + 1(2π) 2

223

A. Santhakumaran

π(θ)p(x1 , x2 , · · · , xn | θ)

p(θ | x1 , x2 , · · · , xn ) =

g(x1 , x2 , · · · , xn )

1 2 1

P 2√ n

√1 e− 2 θ √ 1 e− 2 (xi −θ) n + 1(2π) 2

2π ( 2π)n

= n2 x̄

− 12

P 2

xi + 2(n+1)

e

1 (n+1) nx̄ 2

− 2 [θ− n+1 ]

= q e −∞<θ <∞

2π

n+1

= 0 otherwise

Bayes estimate of θ is

d? (x)= E[θ | X1 = x1 , · · · Xn = xn ]

Z ∞

= θp(θ | x1 , x2 · · · , xn )dθ

−∞

Z ∞

1 1 (n+1) nx̄ 2

= θ √ (n + 1) 2 e− 2 [θ− n+1 ] dθ

−∞ 2π

√ nx̄ t nx̄

√

Put t = n + 1(θ − n+1 ) → θ = √n+1 + n+1 , dt = n + 1dθ

∞ t2 ∞

1 te− 2

Z Z

1 nx̄ t2

d? (x) = √ √ dt + √ e− 2 dt

−∞ 2π n + 1 −∞ 2π n+1

nx̄ nx̄

= 0+ =

n+1 n+1

224

Probability Models and their Parametric Estimation

Z ∞Z ∞ 2

nx̄

= − θ p(x̄ | θ)π(θ)dθdx̄

−∞ −∞ n + 1

Z ∞ 2

nX̄

= π(θ)Eθ − θ dθ

−∞ n+1

Z ∞

1

= π(θ)Eθ [nX̄ − nθ − θ]2 dθ

(n + 1)2 −∞

Z ∞

1

π(θ) Eθ [n(X̄ − θ)]2 + θ2 dθ

= 2

(n + 1) −∞

Z ∞

1

= π(θ)[n2 Vθ [X̄] + θ2 ]dθ

(n + 1)2 −∞

Z ∞

1 1

= 2

π(θ)[n + θ]2 dθ where Vθ [X̄] =

(n + 1) −∞ n

Z ∞ Z ∞

n 1

= π(θ)dθ + θ2 π(θ)dθ

(n + 1)2 −∞ (n + 1)2 −∞

n 1

= + since π(θ) ∼ N (0, 1)

(n + 1)2 (n + 1)2

n+1 1

= 2

=

(n + 1) n+1

Bayes confidence interval estimation taking into account a prior knowledge of the

experiment and to construct the confidence interval for a parameter θ . The posterior

pdf p(θ | x1 , x2 , · · · , xn ) of θ on Ω is known, then one can easily find out the

function l1 (x) and l2 (x) such that

P {l1 (X) < θ < l2 (X)} = 1 − α

It gives the 1 − α level Bayes confidence interval for θ . Thus

Z l2 (θ)

P {l1 (X) < θ < l2 (X)} = p(θ | x1 , x2 , · · · , xn )dθ

l1 (θ)

or

l2 (x)

X

= p(θ | x1 , x2 , · · · xn )

l1 (x)

Example 7.8 Let X1 , X2 , · · · , Xn be iid b(1, θ) random variables and let the

a priori pdf π(θ) of θ on Ω be U (0, 1) . Find (1 − α) level Bayes confidence

interval for θ .

As in Example 7.2,

1

θt (1 − θ)n−t 0 < θ < 1 where t = xi

P

p(θ | x1 , x2 , · · · , xn ) = β(t+1,n−t+1)

0 otherwise

225

A. Santhakumaran

α

i.e., Pθ {θ ≥ l2 X} =

2

Z 1

1 α

θt (1 − θ)n−t dθ = (7.1)

l2 x β(t + 1, n − t + 1) 2

α

and Pθ {θ ≤ l1 x} =

2

Z l1 (x)

1 α

(θ) θt (1 − θ)n−t dθ = (7.2)

0 β(t + 1, n − t + 1) 2

Solving the equations (7.1) and (7.2) for θ , one may get the (1 − α) level Bayes

confidence interval (θ(x), θ̄(x)) for θ .

Example 7.9 Let X1 , X2 , · · · , Xn be iid random sample drawn from a normal

population N (θ, 1), θ ∈ Ω ⊆ < and let the a priori pdf π(θ) of θ on Ω be

N (0, 1) . Find (1 − α) level Bayes confidence interval for θ .

As in Example 7.7, the posterior pdf of θ on Ω is

nx̄ 1

p(θ | x1 , x2 , · · · , xn ) ∼ N ,

n+1 n+1

nx̄

θ− n+1

Z= 1 ∼ N (0, 1)

√

n+1

Here θ is random variable. If one selects the equal tails confidence interval, then

√

nX̄

Pθ −z α2 < −θ n + 1 < z α2 = 1−α

n+1

zα zα

nX̄ nX̄

Pθ −√ 2 <θ< √ +√ 2 = 1−α

n+1 n+1 n+1 n+1

zα zα

nx̄ nx̄

−√ 2 , +√ 2

n+1 n+1 n+1 n+1

is the (1 − α) level Bayes confidence interval for θ .

Example 7.10 Let X1 , X2 , · · · , Xn be a random sample from a Poisson distri-

bution with unknown parameter θ . Assume that the a priori pdf π(θ) of θ on Ω

is 1 −αθ β−1

αβ Γβ

e θ θ > 0, α, β > 0

π(θ) =

0 otherwise

226

Probability Models and their Parametric Estimation

The pdf of X1 , X2 , · · · , Xn given θ is

n

e−nθ θt X

p(x1 , x2 , · · · , xn | θ) = Qn where t = xi

i=1 xi ! i=1

Z ∞

1 e−nθ θt

g(x1 , x2 , · · · , xn ) = e−αθ θβ−1 Qn dθ

0 αβ Γβ i=1 xi !

1 1 1 Γ(β + t)

= Qn β Γβ (α + n)β+t

x

i=1 i ! α

p(θ | x1 , x2 , · · · , xn ) = e θ θ>0

Γ(β + t)

α

i.e., Pθ {θ ≥ l2 (x)} =

2

Z ∞

α

p(θ | x1 , x2 , · · · , xn )dθ = (7.3)

l2 (x) 2

α

Pθ {θ ≤ l1 (x)} =

2

Z l1 (x)

α

p(θ | x1 , x2 , · · · , xn )dθ = (7.4)

0 2

Solving the equations (7.3) and (7.4) for θ , one may get the (1 − α) level Bayes

confidence interval (θ(X), θ̄(X)) for θ .

Example 7.11 Let X1 , X2 , · · · , Xn be a sample drawn from a normal population

N (θ, 1) . Assume that the a priori pdf π(θ) on Ω is U (−1, 1) . Find (1 − α) level

Bayes confidence interval for θ.

The pdf of X1 , X2 , · · · , Xn is

( n 1 P 2

√1 e− 2 (xi −θ) −∞ < x < ∞

p(x1 , x2 , · · · , xn | θ) = 2π

0 otherwise

1

2 −1 < θ < 1

π(θ) =

0 otherwise

227

A. Santhakumaran

Z ∞

g(x1 , x2 , · · · , xn ) = p(x1 , x2 , · · · , xn , θ)dθ

−∞

Z ∞

= π(θ)p(x1 , x2 , · · · , xn | θ)dθ

−∞

Z 1 n

1 1 1

P 2

= √ e− 2 (xi −θ) dθ

−1 2 2π

P 2 Z

− 21 xi 1

e n 2

= n e− 2 [θ −θx̄] dθ

2(2π) 2 −1

1

P 2 Z

e− 2 xi 1 − n2 {[θ−x̄]2 −x̄2 }

= n e dθ

2(2π) 2 −1

1

P 2 nx̄2 Z

1

e− 2 xi + 2 n 2

= n e− 2 [θ−x̄] dθ

2(2π) 2 −1

1

P 2 nx̄2 Z

∞

e− 2 xi + 2 t2 dt √

= n e− 2 √ where t = (θ − x̄) n

2(2π) 2 −∞ n

1 2 nx̄ 2

e− 2 xi + 2 √

P

= √ n 2π

2 n(2π) 2

The posterior pdf of θ on Ω is

π(θ)p(x1 , x2 , · · · , xn | θ)

p(θ | x1 , x2 , · · · , xn ) =

g(x1 , x2 , · · · , xn )

1 − 12

P

(xi −θ)2

√ √

2e 2 n( 2π)n

= √ √ 1

P 2 n 2

2π( 2π)n e− 2 xi + 2 x̄

√

n 1 2

= √ e− 2 n[θ−x̄] − ∞ < θ < ∞

2π

1

θ ∼ N x̄,

n

The (1 − α) level Bayes confidence interval for θ is

P {a < Z < b} = 1 − α

where Z = θ−x̄

√1

∼ N (0, 1)

n

P −z α2 < Z < z α2 = 1−α

zα/2 zα/2

P X̄ − √ < θ < X̄ + √ = 1−α

n n

zα/2 zα/2

x̄ − √ , x̄ + √

n n

228

Probability Models and their Parametric Estimation

Problems

7.1 Given n independent observations from a Poisson distribution with mean λ , find

Bayes’ estimate of λ , assuming the prior distribution π(θ) = e−λ , 0 < λ <

∞.

7.2 If d is a Bayes estimator of θ relative to some prior distributions and the risk

function does not depend on θ , show that d is minimax.

7.3 Define the terms: loss function, risk function and minimax estimator. Explain a

procedure of computing the minimax estimator under squared error loss func-

tion.

7.4 Explain Bayes and Minimax estimation procedures. Find out the Bayes estimate

of θ by using the quadratic loss function. Given a random sample from p(x |

θ) = θx (1 − θ)1−x , x = 0, 1 . The a priori distribution of θ is π(θ) = 2θ, 0 ≤

θ ≤ 1.

7.5 Let X1 , X2 , · · · , Xn be a sample drawn from a normal population N (0, 1) .

Assume that the a priori pdf π(θ) on Ω is U (−1, 1) . Find (1 − α) level

Bayesian confidence interval for θ. Also comments on your confidence interval.

7.6 Explain the concepts of Baye’s estimation.

7.7 Distinguish between interval estimation and Bayes interval estimation.

7.8 The joint pdf p(x, θ) can be expressed for the given value θ on Ω ⊆ < and

the a prior density π(θ) as

(a) p(x, θ) = p(x | θ)π(θ)

(b) p(x, θ) = g(x)p(x | θ)

g(θ)

(c) p(x, θ) = p(θ|x)

π(θ)

(d) p(x, θ) = p(x|θ)

7.9 The joint pdf p(x, θ) can be expressed for the given value X = x . p(θ | x) is

the posterior pdf of θ on Ω ⊆ < and g(x) is the marginal density of X as

(a) p(x, θ) = g(x)p(θ | x)

g(x)

(b) p(x, θ) = p(θ|x)

π(θ)

(c) p(x, θ) = p(θ|x)

(d) p(x, θ) = g(x)p(x | θ)

(1) Properties of Bayes estimator are given in terms of minimum risk.

(2) For large n , Bayes estimators tend to MLE’s irrespective of prior density

π(θ) of θ on Ω .

(3) Bayes estimators in many cases are asymptotically consistent.

(4) Goodness of a Bayes estimator is measured in terms of mean squared error

loss funcion.

229

A. Santhakumaran

(a) 1 and 2 ( b) 2 and 3 (c) 3 and 4 (d) 1, 2, 3 and 4

7.11 Bayes estimator is

(a) unbiased

(b) not unbiased

(c) asymptoticaly normal

(d) None of the above

7.12 Which of the following statement is true?

Main feature of Bayes’ approach in the estimation of parameter is

(a) to consider the parameter a random variable

(b) to specify prior distribution

(c) to specify posterior distribution

(d) All the above

7.13 Bayes estimator is

(a) always asymptotically normal

(b) always a function of minimal sufficient statistics

(c) most efficient

(d) both (a) and (c)

7.14 Which of the following statements are true?

(1) Bayes estimation uses the prior information of the distribution to completely

specify the realization of the distribution.

(2) Bayes estimation involves only a single observation from the ditribution of θ

on Ω .

(3) Bayes estimation consists of repeating a random experiment means taking

another value θ0 on Ω from the prior distribution, then drawing a set of obser-

vations from the distribution Pθ0 of a random variable X .

Choose the correct answer given below:

(a) 1 and 2 (b) 1 and 3 (c) 2 and 3 (d) 1, 2 and 3

230

Probability Models and their Parametric Estimation

Chapter 1 Chapter 3

1.1 b 3.11 a

1.2 c 3.12 b

1.3 b 3.13 b

1.4 b 3.14 a

1.5 d Chapter 4

1.6 a 4.14 c

1.7 c 4.15 b

1.8 d 4.16 c

1.9 b 4.17 d

1.10 c 4.18 b

1.11 d 4.19 d

1.12 d 4.20 a

1.13 d Chapter 5

1.14 c 5.35 a

1.15 d 5.36 c

1.16 b 5.37 b

1.17 d 5.38 a

1.18 a 5.39 d

1.19 c 5.40 c

1.20 a 5.41 a

1.21 a 5.42 d

1.22 b Chapter 6

1.23 c 6. 11 a

1.24 d 6.12 c

1.25 c 6.13 b

1.26 d 6.14 a

1.27 c

1.28 a Chapter

Chapter 2 7.6 a

2.26 b 7.7 a

2.27 c 7.8 d

2.28 a 7.9 b

2.29 b 7.10 a

2.30 b 7.11 b

2.31 a 7.12 d

231

A. Santhakumaran

Glossary of Notation

I+ - Set of positive integers

< - Real number system

Ω - Parameter space

pdf - Probability density function

pmf - Probability mass function

p(x | θ) - Given parameter θ, the pdf

or pmf of the random variable X

π(θ) - prior pdf or pmf of θ on Ω

p(θ | x) - Posterior pdf or pmf of θ on Ω

p(x, θ) - Joint pdf or joint pmf of the random variable X

and the random variable θ

p(x, y) - Joint pdf or joint pmf of the random variables X and Y

T = t(X) - t(X1 , X2 , · · · , Xn ), n = 1, 2, · · · is a function

of random sample

MLE - Maximum Likelihood Estimator

UMVUE - Uniformly Minimum Variance Unbiased Estimator

LMVUE - Locally Minimum Variance Unbiased Estimator

MVBE - Minimum Variance Bound Estimator

BLUE - Best Linear Unbiased Estimator

LSE - Least Square estimator

iid - Independent identically distributed

b(1, θ) - Bernoulli with parameter θ

b(n, θ) - Binomial with parameter n and θ

G(n, θ) - Gamma with parameter n and θ

exp(θ) - Exponential with parameter θ

P (θ) - Poisson with parameter θ

∪(a, b) - Uniform on (a, b)

N (θ, σ 2 ) - Normal with mean θ, variance σ 2

df - degrees of freedom

tn - Student’s t distribution with n df

F (m, n) - F - distribution with (m, n) df

Probability Models and their Parametric Estimation

APPENDIX

Normal curve ordinate

#include < stdio.h >

void main()

{

float y[200],a,b,x,l,n,s1,s2,calarea,area;

int i;

clrscr();

printf( ˝ Enter the value of a and area \n” );

scanf( ˝ %f %f”, & a, & area);

printf( ˝ Enter the number of intervals n \n” );

scanf( ˝ %d”,& n);

/* 0 ≤ a, b ≤ +3 */

b = 0.0;

do

{

l = (b - a)/n;

x = l;

y[0] = 1/2.506;

for( i= 1; i < = n; i++)

{

y[i] = (1/2.506)*exp( -0.5*x*x);

x=x+l}

s1 = 0 ;

s2 = 0 ;

for(i = 1; i < = n-1; i=i+2)

{

s1 = s1 + y[i];

}

for( i = 2; i < = n-2; i = i+2)

{

s2 = s2 + y[i];

}

calarea = l/3*( y[0] + y[n] + 4*s1 +2*s2);

if(( calarea - area ) > = .0001)

break;

b = b + 0.01

}

while( b < = 3.0)

printf( ˝ The ordinate of the given area = %4.2f”, b);

getch();

}

#include < stdio.h >

void main()

A. Santhakumaran

{

float y[200],a,b,x,l,n,s1,s2,area;

int i;

clrscr();

printf( ˝ Enter the value of a \n” );

/* 0 ≤ a, b ≤ +3 */

scanf( ˝ %f”, & a);

printf( ˝ Enter the value of b \n” );

scanf( ˝ %f”, & b);

printf( ˝ Enter the value of n \n” );

scanf( ˝ %f”,& n);

l = (b - a)/n;

x = l;

y[0] = 1/2.506;

for( i= 1; i < = n; i++)

{

y[i] = (1/2.506)*exp( - 0.5*x*x);

x = x+l;

}

s1 = 0 ;

s2 = 0 ;

for(i = 1; i < = n-1; i=i+2)

{

s1 = s1 + y[i];

}

for( i = 2; i < = n-2; i = i+2)

{

s2 = s2 + y[i];

}

area = l/3*( y[0] + y[n] + 4*s1 +2*s2);

printf( ˝ The area for the given ordinate = %4.5f”,area);

getch();

}

234

Probability Models and their Parametric Estimation

BIBLIOGRAPHY

pany Limited, New Delhi, 1995.

3. Chernoff, H. and Lehmann, E. L. The use of the maximum likelihood estimates in

χ2 tests of goodness of fit, Ann. Math. Stat., 25, 579 1964.

4. Fisher, R. A. On the mathematical foundations of theoretical statistics, Phil. Trans.

Royal Soc. A, 222, 309 - 368, 1922.

5. Lehmann, E. L. Testing Statistical hypotheses, John Wiley, New York, 1959.

6. Lehmann, E. L. Theory of point estimation, John Wiley and Sons, 1983

## Viel mehr als nur Dokumente.

Entdecken, was Scribd alles zu bieten hat, inklusive Bücher und Hörbücher von großen Verlagen.

Jederzeit kündbar.