
ST 522: Statistical Theory II

Subhashis Ghoshal,
North Carolina State University
Useful Results from Calculus
We recapitulate some facts from calculus we need throughout.
Theorem (Binomial theorem)
$(a+b)^n = \binom{n}{0}a^n b^0 + \binom{n}{1}a^{n-1}b^1 + \cdots + \binom{n}{n-1}a^1 b^{n-1} + \binom{n}{n}a^0 b^n.$
Common infinite series
Geometric series
$a + ar + \cdots + ar^{n-1} = a\,\frac{r^n - 1}{r - 1} = a\,\frac{1 - r^n}{1 - r}, \quad r \neq 1.$
Infinite geometric series
$a + ar + ar^2 + \cdots = \frac{a}{1 - r}, \quad |r| < 1.$
$(1 - x)^{-1} = 1 + x + x^2 + \cdots, \quad |x| < 1$
$(1 + x)^{-1} = 1 - x + x^2 - x^3 + \cdots, \quad |x| < 1.$
Common infinite series (contd.)
Infinite binomial series
$(1 - x)^{-2} = 1 + 2x + 3x^2 + 4x^3 + \cdots, \quad |x| < 1,$
$(1 - x)^{-r} = 1 + \sum_{n=1}^{\infty} \binom{r+n-1}{n} x^n, \quad |x| < 1,$ where for any real number $\alpha$, $\binom{\alpha}{n} = \alpha(\alpha - 1)\cdots(\alpha - n + 1)/n!$, the generalized binomial coefficient. In particular, $\binom{r+n-1}{n} = r(r+1)\cdots(r+n-1)/n!$. Also note that for $\alpha > 0$, $\binom{-\alpha}{r} = (-1)^r \frac{\alpha(\alpha+1)\cdots(\alpha+r-1)}{r!}$.
Exponential series
$e^x = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \cdots$
Logarithmic series
$\log(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots, \quad |x| < 1$
Useful limits
$\lim_{n\to\infty} (1 + 1/n)^n = e.$
$\lim_{n\to\infty} (1 + \lambda_n/n)^n = e^{\lambda}$ for any $\lambda_n \to \lambda$.
$\lim_{x\to 0} (1 + ax)^{1/x} = e^a.$
$\lim_{x\to 0} \frac{\log(1 + x)}{x} = 1.$
$\lim_{x\to 0} \frac{\sin x}{x} = 1.$
Derivatives
$\frac{d}{dx} x^n = n x^{n-1}.$
$\frac{d}{dx} e^{ax} = a e^{ax}.$
$\frac{d}{dx} a^x = a^x \log a.$
$\frac{d}{dx} \log x = 1/x.$
$\frac{d}{dx} \sin x = \cos x.$
$\frac{d}{dx} \cos x = -\sin x.$
$\frac{d}{dx} \tan x = 1 + \tan^2 x.$
$\frac{d}{dx} \sin^{-1} x = 1/\sqrt{1 - x^2}.$
$\frac{d}{dx} \tan^{-1} x = \frac{1}{1 + x^2}.$
$\frac{d}{dx} (a f(x) + b g(x)) = a f'(x) + b g'(x).$
$\frac{d}{dx} f(x) g(x) = f'(x) g(x) + f(x) g'(x).$
$\frac{d}{dx} (f(x)/g(x)) = \frac{f'(x) g(x) - f(x) g'(x)}{g^2(x)}.$
$\frac{d}{dx} f(g(x)) = f'(g(x)) g'(x).$
Integration
$\int x^n \, dx = \frac{x^{n+1}}{n+1}, \quad n \neq -1.$
$\int x^{-1} \, dx = \log x.$
$\int e^{ax} \, dx = e^{ax}/a, \quad a \neq 0.$
$\int \frac{f'(x)}{f(x)} \, dx = \log f(x).$
Integration by substitution
$\int g(f(x)) f'(x) \, dx = \int g(y) \, dy, \quad y = f(x).$
Integration by parts
$\int u(x) v(x) \, dx = u(x) V(x) - \int V(x) u'(x) \, dx,$
where $V(x) = \int v(x) \, dx$; $u(x)$ is called the first function and $v(x)$ the second.
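These rules can be spot-checked with a computer algebra system. Below is a minimal sketch (assuming Python with the sympy package is available) verifying one derivative and one antiderivative from the tables above.

```python
import sympy as sp

x = sp.symbols('x')
print(sp.diff(sp.tan(x), x))          # tan(x)**2 + 1, i.e. 1 + tan^2(x)
print(sp.diff(sp.asin(x), x))         # 1/sqrt(1 - x**2)
print(sp.integrate(sp.exp(2*x), x))   # exp(2*x)/2, matching e^{ax}/a with a = 2
```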
Integration (contd.)
Integration by partial fractions
Applies while integrating the ratio of two polynomials P(x) and Q(x), where without loss of generality the degree of P is less than the degree of Q. Factorize Q(x) into linear and quadratic factors. The ratio can then be written uniquely as a linear combination of reciprocals of the linear factors and of linear-over-quadratic terms. The resulting expression can be integrated term by term. Consult any standard calculus text such as Apostol.
Definite integral
$\int_a^b f(x) \, dx = F(x) \big|_a^b = F(b) - F(a),$
where $F(x) = \int f(x) \, dx$.
Order Statistics
Given a random sample, we are interested in the smallest, largest, or middle observations.
the highest flood waters
the lowest winter temperature recorded in the last 50 years
the median price of houses sold in the last month
the median salary of NBA players
Definition: Given a random sample $X_1, \ldots, X_n$, the sample order statistics are the sample values placed in ascending order,
$X_{(1)} = \min_{1 \le i \le n} X_i,$
$X_{(2)} =$ second smallest $X_i$,
$\ldots$
$X_{(n)} = \max_{1 \le i \le n} X_i.$
Example: Suppose four numbers are observed as a sample of size 4. The sample values are $x_1 = 6$, $x_2 = 9$, $x_3 = 3$, $x_4 = 8$. What are the order statistics?
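As a quick illustration of the definition, the following sketch (assuming Python with numpy) sorts the example sample to obtain the realized order statistics.

```python
import numpy as np

x = np.array([6, 9, 3, 8])        # the sample from the example above
order_stats = np.sort(x)          # x_(1) <= x_(2) <= x_(3) <= x_(4)
print(order_stats)                # [3 6 8 9]
```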
Order Statistics (contd.)
Order statistics are random variables themselves (as functions of a random sample).
Order statistics satisfy $X_{(1)} \le \cdots \le X_{(n)}$.
Though the samples $X_1, \ldots, X_n$ are independently and identically distributed, the order statistics $X_{(1)}, \ldots, X_{(n)}$ are never independent because of the order restriction.
We will study their marginal distributions and joint distributions.
Order Statistics - Marginal distributions
Assume $X_1, \ldots, X_n$ are from a continuous population with cdf $F(x)$ and pdf $f(x)$.
The $n$th order statistic, or the sample maximum, $X_{(n)}$ has the pdf
$f_{X_{(n)}}(x) = n [F(x)]^{n-1} f(x).$
The first order statistic, or the sample minimum, $X_{(1)}$ has the pdf
$f_{X_{(1)}}(x) = n [1 - F(x)]^{n-1} f(x).$
More generally, the $j$th order statistic $X_{(j)}$ has the pdf
$f_{X_{(j)}}(x) = \frac{n!}{(j-1)!(n-j)!} f(x) [F(x)]^{j-1} [1 - F(x)]^{n-j}.$
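Integrating the pdf of the maximum gives $P(X_{(n)} \le x) = [F(x)]^n$, which can be checked by simulation. A minimal sketch for the Unif(0, 1) case, with an arbitrarily chosen $n$ and evaluation point:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 100_000
maxima = rng.uniform(size=(reps, n)).max(axis=1)

x = 0.8
print(np.mean(maxima <= x))   # empirical P(X_(n) <= x)
print(x**n)                   # theoretical [F(x)]^n = x^n for Unif(0, 1)
```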
Order Statistics - Joint distributions
For $1 \le i < j \le n$, the joint pdf of $X_{(i)}$ and $X_{(j)}$ is
$f_{X_{(i)}, X_{(j)}}(u, v) = \frac{n!}{(i-1)!(j-i-1)!(n-j)!} f(u) f(v) [F(u)]^{i-1} [F(v) - F(u)]^{j-i-1} [1 - F(v)]^{n-j}$
if $-\infty < u < v < \infty$; $= 0$ otherwise.
Special case: Joint pdf of $X_{(1)}$ and $X_{(n)}$.
The joint pdf of $X_{(1)}, \ldots, X_{(n)}$ is
$f_{X_{(1)}, \ldots, X_{(n)}}(u_1, \ldots, u_n) = n! \, f(u_1) \cdots f(u_n) \, 1\{-\infty < u_1 < \cdots < u_n < \infty\}.$
Illustration
Example: $X_1, \ldots, X_n$ are iid from unif$[0, 1]$.
Show that $X_{(j)} \sim \mathrm{Beta}(j, n + 1 - j)$.
Compute $E[X_{(j)}]$ and $\mathrm{Var}[X_{(j)}]$.
The joint pdf of $X_{(1)}$ and $X_{(n)}$.
Let $n = 5$. Derive the joint pdf of $X_{(2)}$ and $X_{(4)}$.
$X_{(1)} \,|\, X_{(n)} \sim X_{(n)} \, \mathrm{Beta}(1, n - 1)$.
For any $i < j$, $X_{(i)} \,|\, X_{(j)} \sim X_{(j)} \, \mathrm{Beta}(i, j - i)$.
Let $n = 5$. Derive the joint pdf of $X_{(1)}, \ldots, X_{(5)}$.
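The claim $X_{(j)} \sim \mathrm{Beta}(j, n+1-j)$ can be checked by comparing simulated moments with the Beta moments. A minimal sketch (the choices $n = 5$, $j = 2$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, j, reps = 5, 2, 200_000
xj = np.sort(rng.uniform(size=(reps, n)), axis=1)[:, j - 1]  # j-th order statistic

a, b = j, n + 1 - j                              # Beta(j, n+1-j) parameters
print(xj.mean(), a / (a + b))                    # both ~ 1/3
print(xj.var(), a * b / ((a + b)**2 * (a + b + 1)))
```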
Example
Compute $P(X_{(1)} > 1, X_{(n)} \le 2)$.
$P(X_{(1)} > x, X_{(n)} \le y) = \prod_{i=1}^n P(x < X_i \le y) = [F(y) - F(x)]^n.$
Common statistics based on order statistics
sample range: $R = X_{(n)} - X_{(1)}$
sample midrange: $V = (X_{(n)} + X_{(1)})/2$
sample median:
$M = X_{((n+1)/2)}$ if $n$ is odd; $M = (X_{(n/2)} + X_{(n/2+1)})/2$ if $n$ is even.
sample percentile: For any $0 < p < 1$, the $(100p)$th sample percentile is the observation such that about $np$ of the observations are less than this observation and $n(1-p)$ of the observations are larger.
the sample median $M$ is the 50th sample percentile (the second sample quartile)
denote by $Q_1$ the 25th sample percentile (the first sample quartile)
denote by $Q_3$ the 75th sample percentile (the third sample quartile)
interquartile range $\mathrm{IQR} = Q_3 - Q_1$ (describing the spread about the median)
Remarks
Sample Mean vs Sample Median
Sample Median vs Population Median
Principles of data reduction
Data $X = (X_1, \ldots, X_n)$: Probability distribution $P$ completely or partially unknown.
Distribution often modeled by standard ones such as Poisson, normal.
A few parameters control the distribution of the data: $P = P_\theta$.
Parameter $\theta$: unknown, object of interest.
Inference: Any conclusion about parameter values based on data.
Three main inference problems: point estimation, hypothesis testing, interval estimation.
Statistic $T = T(X)$: Any function of data. A summary measure of the data.
Statistics may be used as point estimators, test statistics, upper and lower confidence limits.
Inductive reasoning
Role of probability theory: Extent of randomness of $T$ controlled by $\theta$. Probabilistic characteristics such as expectation, variance, moments, distribution involve $\theta$.
Conversely, the value of $T$ reflects knowledge about $\theta$. For instance, if $T$ has expectation $\theta$ and $\theta$ is unknown, then $\theta$ can be estimated by $T$. Intuitively, if we observe a large value of $T$, we tend to conclude that $\theta$ must be large.
Need to assess the extent of the error.
Frequentist approach: Randomness of error means we need to judge based on average error over repeated sampling. Thus we need to study the sampling distribution of $T$.
Sufficiency
As $T$ summarizes the data $X$, the first natural question is whether there is any loss of information due to summarization.
Data contain many pieces of information, some relevant for $\theta$ and some not.
Dropping irrelevant information is desirable, but dropping relevant information is undesirable.
How to compare the amount of information about $\theta$ in the data and in $T$? Is it sufficient to consider only the reduced data $T$?
Definition (Sufficient statistic)
A statistic $T$ is called sufficient if the conditional distribution of $X$ given $T$ is free of $\theta$ (that is, the conditional distribution is completely known).
Example
Toss a coin 100 times. The probability of head p is unknown.
T = number of heads obtained.
Sufficiency principle
If $T$ is sufficient, the extra information carried by $X$ is worthless as far as $\theta$ is concerned. It is then only natural to consider inference procedures which do not use this extra irrelevant information. This leads to the principle of sufficiency.
Definition (Sufficiency principle)
Any inference procedure should depend on the data only through a sufficient statistic.
How to check sufficiency?
Theorem (Neyman-Fisher Factorization theorem)
$T$ is sufficient iff $f(x; \theta)$ can be written as the product $g(T(x); \theta) h(x)$, where the first factor depends on $x$ only through $T(x)$ and the second factor is free of $\theta$.
Example
$X_1, \ldots, X_n$ iid:
$N(\theta, 1)$.
$\mathrm{Bin}(1, \theta)$.
$\mathrm{Poi}(\lambda)$.
$N(\mu, \sigma^2)$, $\theta = (\mu, \sigma)$.
$\mathrm{Ga}(\alpha, \beta)$, $\theta = (\alpha, \beta)$. (Includes exponential.)
$U(0, \theta)$, range of $X$ depends on $\theta$.
Exponential family
$f(x; \theta) = c(\theta) h(x) \exp[\sum_{j=1}^k w_j(\theta) t_j(x)]$, $\theta = (\theta_1, \ldots, \theta_d)$, $d \le k$.
Theorem
Let $X_1, \ldots, X_n$ be iid observations from the above exponential family. Then $T(X) = (\sum_{i=1}^n t_1(X_i), \ldots, \sum_{i=1}^n t_k(X_i))$ is sufficient for $\theta = (\theta_1, \ldots, \theta_d)$.
Applications
beta$(\alpha, \beta)$.
Curved exponential family: $N(\mu, \mu^2)$.
Old examples revisited: binomial, Poisson, normal, exponential, gamma (except uniform). Exercise.
More applications
Discrete uniform. $P(X = x) = 1/\theta$, $x = 1, \ldots, \theta$, $\theta$ a positive integer.
$f(x, \theta) = e^{-(x - \theta)}$, $x > \theta$.
A universal example: iid with density $f$. The vector of order statistics $T = (X_{(1)}, \ldots, X_{(n)})$ is sufficient.
Remarks
In the order statistics example, the dimension of $T$ is the same as the dimension of the data. Still this is a nontrivial reduction, as $n!$ different values of the data correspond to one value of $T$.
Often one finds better reductions for specific parametric families, as seen in the many examples before.
Trivially $X$ is always sufficient for itself, but there is no gain.
When one statistic is a mathematical function of the other and vice versa (i.e., there is a one-to-one correspondence), then they carry exactly the same amount of information, so they are equivalent.
More generally, if $T$ is sufficient for $\theta$ and $T = c(U)$, a mathematical function of some other statistic $U$, then $U$ is also sufficient.
Examples of insufficiency
$X_1, X_2$ iid $\mathrm{Poi}(\lambda)$. $T = X_1 - X_2$ is not sufficient.
$X_1, \ldots, X_n$ iid pmf $f(x; \theta)$. $T = (X_1, \ldots, X_{n-1})$ is not sufficient.
Minimal sufficiency
Maximum possible reduction.
Definition (Minimal sufficient statistic)
$T$ is a minimal sufficient statistic if, given any other sufficient statistic $T'$, there is a function $c(\cdot)$ such that $T = c(T')$.
Equivalently, $T$ is minimal sufficient if, given any other sufficient statistic $T'$, whenever $x$ and $y$ are two data values such that $T'(x) = T'(y)$, then $T(x) = T(y)$.
Checking minimal sufficiency
Theorem (Lehmann-Scheffé Theorem)
A statistic $T$ is minimal sufficient if the following property holds: For any two sample points $x$ and $y$, $f(x; \theta)/f(y; \theta)$ does not depend on $\theta$ if and only if $T(x) = T(y)$.
Corollary
A minimal sufficient statistic is not unique. But any two are in one-to-one correspondence, so they are equivalent.
Examples
iid $N(\mu, \sigma^2)$.
iid $U(\theta, \theta + 1)$.
iid Cauchy$(\theta)$.
iid $U(-\theta, \theta)$.
Minimal sufficiency in exponential family
Theorem
For iid observations from an exponential family
$f(x; \theta) = c(\theta) h(x) \exp[\sum_{j=1}^k w_j(\theta) t_j(x)],$
such that no affine (linear plus constant) relationship exists between $w_1(\theta), \ldots, w_k(\theta)$, the statistic $T(X) = (\sum_{i=1}^n t_1(X_i), \ldots, \sum_{i=1}^n t_k(X_i))$ is minimal sufficient for $\theta = (\theta_1, \ldots, \theta_d)$.
Examples
$N(\mu, \sigma^2)$.
$\mathrm{Ga}(\alpha, \beta)$.
$\mathrm{Be}(\alpha, \beta)$.
$N(\mu, \mu^2)$.
$\mathrm{Be}(\theta, 1 - \theta)$, $0 < \theta < 1$.
Ancillary statistic
Definition
A statistic $T$ is called ancillary if its distribution does not depend on the parameter.
The induced family is a singleton, completely known, and contains no information about $\theta$. Opposite of sufficiency.
A function of an ancillary statistic is ancillary.
Examples
iid $U(\theta, \theta + 1)$.
Location family, iid $f(x - \theta)$.
Scale family, iid $\sigma^{-1} f(x/\sigma)$.
iid $N(\theta, 1)$.
$X_1, X_2$ iid $N(0, \sigma^2)$.
$X_1, \ldots, X_n$ iid $N(\mu, \sigma^2)$:
$T = ((X_1 - \bar{X})/S, \ldots, (X_n - \bar{X})/S)$, where $S$ is the sample standard deviation, is ancillary.
Results
Location family $f(x - \theta)$.
Let $T$ be a location invariant statistic, i.e., $T(x_1 + b, \ldots, x_n + b) = T(x_1, \ldots, x_n)$. Then $T$ is ancillary.
In particular, the sample sd $S$ is ancillary (and so are other estimates of scale).
Location-scale family $\sigma^{-1} f((x - \mu)/\sigma)$.
Let $T$ be a location-scale invariant statistic, i.e., $T(a x_1 + b, \ldots, a x_n + b) = T(x_1, \ldots, x_n)$. Then $T$ is ancillary.
If $T_1$ and $T_2$ are such that $T_1(a x_1 + b, \ldots, a x_n + b) = a T_1(x_1, \ldots, x_n)$ and $T_2(a x_1 + b, \ldots, a x_n + b) = a T_2(x_1, \ldots, x_n)$, then $T_1/T_2$ is ancillary.
Question. An ancillary statistic does not contain any information about $\theta$. Then why do we study it?
It indicates how good the given sample is.
Example
$X_1, \ldots, X_n$ iid $U(\theta - 1, \theta + 1)$. $\theta$ is estimated by the midrange $(X_{(1)} + X_{(n)})/2$. The range $R = X_{(n)} - X_{(1)}$ is ancillary.
Question. Can addition or removal of ancillary information change the information content about $\theta$?
Intuitively, one may think that an ancillary statistic contains no information about $\theta$, so it should not change the information content. But this interpretation is false.
$U(\theta, \theta + 1)$.
A more dramatic example: $(X, Y) \sim \mathrm{BVN}(0, 0, 1, 1, \rho)$.
Completeness
Let a parametric family $\{f(x, \theta), \theta \in \Theta\}$ be given. Let $T$ be a statistic, with induced family of distributions $f_T(t, \theta)$, $\theta \in \Theta$.
Definition
A statistic $T$ is called complete (for the family $\{f(x, \theta), \theta \in \Theta\}$), or equivalently the induced family $f_T(t, \theta)$, $\theta \in \Theta$, is called complete, if $E_\theta(g(T)) = 0$ for all $\theta$ implies $g(T) = 0$ a.s. $P_\theta$ for all $\theta$.
In other words, no non-constant function of $T$ can have constant expectation (in $\theta$).
Completeness not only depends on the statistic, but also on the family. For instance, no nontrivial statistic is complete if the family is a singleton.
In order to find optimal estimators and tests, one sometimes needs to find complete sufficient statistics.
Examples
$X \sim \mathrm{bin}(n, \theta)$, $0 < \theta < 1$.
$X \sim \mathrm{Poi}(\lambda)$, $0 < \lambda$.
$X \sim N(\theta, 1)$, $-\infty < \theta < \infty$.
Theorem
Let $X_1, \ldots, X_n$ be iid observations from the above exponential family. Then $T(X) = (\sum_{i=1}^n t_1(X_i), \ldots, \sum_{i=1}^n t_k(X_i))$ is complete if the parameter space contains an open set in $\mathbb{R}^k$ (i.e., $d = k$).
A non-exponential example: iid $U(0, \theta)$, $T = X_{(n)}$.
Useful facts
If $T$ is complete and $S = \psi(T)$ is a function of $T$, then $S$ is also complete.
The constant statistic is complete for any family.
A non-constant ancillary statistic cannot be complete.
A statistic is called first order ancillary if its expectation is free of $\theta$. If a non-constant function of a statistic $T$ is first order ancillary, then $T$ cannot be complete.
Connection with minimal sufficiency
Theorem
If $T$ is complete and sufficient, and a minimal sufficient statistic exists, then $T$ is also minimal sufficient.
As a consequence, in the search for complete sufficient statistics, it is enough to check completeness of a minimal sufficient statistic (if one exists and is easily found).
This implies no complete sufficient statistic exists for the $U(\theta, \theta + 1)$ family, or the Cauchy$(\theta)$ family.
Basu's theorem
$T$ complete sufficient carries all relevant information about $\theta$. $S$ ancillary carries no information about $\theta$. The following remarkable result shows that they are statistically independent.
Theorem (Basu's theorem)
A complete sufficient statistic is independent of all ancillary statistics.
Completeness cannot be dropped, even if $T$ is minimal sufficient: iid $U(\theta, \theta + 1)$.
Applications
iid exponential. Then $T = \sum_{i=1}^n X_i$ and $(W_1, \ldots, W_n)$ are independent, where $W_j = X_j/T$. Also calculate $E(W_j)$.
iid normal. $T = \bar{X}$ and the sample standard deviation $S$ are independent.
iid $U(0, \theta)$. Then $X_{(n)}$ and $X_{(1)}/X_{(n)}$ are independent. Also calculate $E(X_{(1)}/X_{(n)})$.
iid $\mathrm{Ga}(\alpha, \lambda)$, $\alpha > 0$ known. Let $U = (\prod_{i=1}^n X_i)^{1/n}$. Then $U/\bar{X}$ is ancillary, independent of $\bar{X}$. Also $E[(U/\bar{X})^k] = E(U^k)/E(\bar{X}^k)$.
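The uniform example can be checked by simulation. The sketch below (with an arbitrary $\theta$) shows the near-zero correlation between $X_{(n)}$ and the ancillary ratio, and uses the independence to read off $E(X_{(1)}/X_{(n)}) = E(X_{(1)})/E(X_{(n)}) = 1/n$.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 3.0, 5, 100_000
x = rng.uniform(0, theta, size=(reps, n))
xmax, xmin = x.max(axis=1), x.min(axis=1)
ratio = xmin / xmax                       # ancillary: distribution free of theta

print(np.corrcoef(xmax, ratio)[0, 1])     # ~ 0, consistent with independence
print(ratio.mean())                       # ~ 1/n = 0.2, by Basu's theorem
```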
Likelihood
$X \sim f(\cdot, \theta)$ pmf or pdf. $X = x$ is observed.
Definition
The likelihood function is a function of the parameter with an observed sample, and is given by $L(\theta | x) = f(x, \theta)$.
Same expression, but now $x$ is fixed and $\theta$ is variable.
Examples
Binomial experiment: decide to stop after 10 trials; 3 successes obtained.
Negative binomial experiment: decide to stop after 3 successes; 10 trials were needed.
Likelihood can be viewed as the degree of plausibility. An estimate of $\theta$ may be obtained by choosing the most plausible value, i.e., where the likelihood function is maximized. This leads to one of the most important methods of estimation: the maximum likelihood estimator (more details in Chapter 7).
For instance, in either example above, the likelihood function is maximized at 0.3.
More examples
iid Poisson$(\lambda)$
iid $N(\mu, \sigma^2)$
iid $U(0, \theta)$
Exponential family
Bayesian approach
Suppose that $\theta$ can be considered as a random quantity with some marginal distribution $\pi(\theta)$, a pre-experiment assessment called the prior distribution. Then we can legitimately calculate the posterior distribution of $\theta$ given the data by the Bayes theorem. This posterior distribution will be the source of any inference about $\theta$.
Theorem (Bayes theorem)
$\pi(\theta | X) = \frac{\pi(\theta) f(X, \theta)}{\int \pi(t) f(X, t) \, dt}.$
Examples
iid $\mathrm{Bin}(1, \theta)$, prior $U(0, 1)$.
iid $\mathrm{Poi}(\lambda)$, prior standard exponential.
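The Bayes theorem on the previous slide can be implemented numerically by evaluating prior times likelihood on a grid and normalizing. The sketch below does this for the Poisson example with a standard exponential prior (the data values are hypothetical) and checks the result against the known Gamma posterior.

```python
import numpy as np
from scipy.stats import poisson, gamma

x = np.array([2, 0, 3, 1, 2])                 # hypothetical Poisson data
lam = np.linspace(1e-4, 10, 4000)             # grid over the parameter
dlam = lam[1] - lam[0]
prior = np.exp(-lam)                          # standard exponential prior
like = np.prod(poisson.pmf(x[:, None], lam), axis=0)
post = prior * like
post /= post.sum() * dlam                     # normalize: Bayes theorem numerically

# closed form: posterior is Gamma(sum(x) + 1, rate = n + 1)
print((lam * post).sum() * dlam)                      # posterior mean, numeric
print(gamma.mean(a=x.sum() + 1, scale=1/(len(x)+1)))  # closed form, = 1.5 here
```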
Difficulty:
$\theta$ is fixed, nonrandom.
How to specify a prior?
Bayesian's response:
Probability is a quantification of uncertainty of any type.
The arbitrariness of prior choice can be rectified to some extent by the use of automatic priors which are non-informative. (More later.)
Point Estimation
Find estimators for the unknown parameter $\theta$ or its function $\psi(\theta)$.
Evaluate your estimators (are they good?)
Definition
A point estimator of $\theta$ is a function $\hat{\theta} = W(X_1, \ldots, X_n)$.
Given a sample of realized observations, the number $W(x_1, \ldots, x_n)$ is called a point estimate of $\theta$.
Methods of point estimation
method of moments
maximum likelihood estimator (MLE)
Bayes estimators
Method of Moments
Let $X_1, \ldots, X_n$ be a sample from a population with pdf or pmf $f(x | \theta_1, \ldots, \theta_k)$. Estimate $\theta = (\theta_1, \ldots, \theta_k)$ by solving the $k$ equations formed by matching the first $k$ sample and population raw moments:
$m_1 = \frac{1}{n} \sum_{i=1}^n X_i, \quad \mu_1' = E_\theta(X)$
$m_2 = \frac{1}{n} \sum_{i=1}^n X_i^2, \quad \mu_2' = E_\theta(X^2)$
$\ldots$
$m_k = \frac{1}{n} \sum_{i=1}^n X_i^k, \quad \mu_k' = E_\theta(X^k)$
Examples
$X_1, \ldots, X_n$ iid $N(\mu, \sigma^2)$, both $\mu$ and $\sigma^2$ unknown.
$X_1, \ldots, X_n$ iid $\mathrm{Bin}(1, p)$.
$X_1, \ldots, X_n$ iid $\mathrm{Ga}(\alpha, \beta)$, with $(\alpha, \beta)$ unknown.
$X_1, \ldots, X_n$ iid $\mathrm{Unif}(\theta_1, \theta_2)$, where $\theta_1 < \theta_2$, both unknown.
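For the gamma example, matching the first two moments gives $\hat{\alpha} = m_1^2/(m_2 - m_1^2)$ and $\hat{\beta} = (m_2 - m_1^2)/m_1$ (in the shape-scale parametrization with mean $\alpha\beta$ and variance $\alpha\beta^2$). A minimal sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, beta = 2.5, 1.5                    # shape, scale: mean = alpha*beta
x = rng.gamma(alpha, beta, size=10_000)

m1 = x.mean()                             # first sample raw moment
m2 = (x**2).mean()                        # second sample raw moment
v = m2 - m1**2                            # moment-based variance
alpha_hat, beta_hat = m1**2 / v, v / m1   # solve alpha*beta = m1, alpha*beta^2 = v
print(alpha_hat, beta_hat)
```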
Features
Easy to implement
Computationally cheap
Converges to the parameter with increasing probability (called consistency)
Does not necessarily give the asymptotically most efficient estimator
Often used as an initial estimator in iterative methods
Maximum Likelihood Estimator
Recall that the likelihood function is
$L(\theta | X) = L(\theta | X_1, \ldots, X_n) = \prod_{i=1}^n f(X_i | \theta).$
Definition
The maximum likelihood estimator (MLE) $\hat{\theta}$ of $\theta$ is the location at which $L(\theta | X)$ attains its maximum as a function of $\theta$. Its numerical value is often called the maximum likelihood estimate.
How to find the MLE?
We want to find the global maximum of $L(\theta | X)$.
If $L(\theta | X)$ is differentiable in $(\theta_1, \ldots, \theta_k)$, we solve
$\frac{\partial}{\partial \theta_j} L(\theta | X) = 0, \quad j = 1, \ldots, k.$
The solutions to these likelihood equations locate only extreme points in the interior of $\Theta$, and provide possible candidates for the MLE. They can be local or global minima, local or global maxima, or inflection points. Our job is to find a global maximum.
$((d^2/d\theta^2) L(\theta))|_{\theta = \hat{\theta}} < 0$ is sufficient for a local maximum. We also need to check the boundary points separately.
If there is only one local maximum, then that must be the unique global maximum.
Many examples fall in this category, so no further work will be needed then.
How to find the MLE? (contd.)
In practice, we often work with $\log L(\theta | X)$, i.e., solve
$\frac{\partial}{\partial \theta_j} \log L(\theta | X) = 0, \quad j = 1, \ldots, k.$
We consider several different situations:
one-parameter case
non-differentiable $L(\theta | X)$
restricted range MLE (e.g., $\Theta$ is not the whole real line)
discrete parameter space
two-parameter case
Examples: One-parameter case
$X_1, \ldots, X_n$ iid $N(\theta, 1)$, with $\theta$ unknown.
$X_1, \ldots, X_n$ iid $\mathrm{Poi}(\lambda)$.
$X_1, \ldots, X_n$ iid $\mathrm{Exp}(\lambda)$.
(numerical/iterative method): $X_1, \ldots, X_n$ iid Weibull$(\theta)$.
(numerical/iterative method): $X_1, \ldots, X_n$ iid gamma$(\alpha, 1)$.
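For the gamma$(\alpha, 1)$ case there is no closed form: the likelihood equation involves the digamma function, so a numerical method is needed. A minimal sketch (assuming Python with scipy) that maximizes the log-likelihood directly:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

rng = np.random.default_rng(4)
x = rng.gamma(3.0, 1.0, size=500)        # gamma(alpha, 1) data, alpha unknown

def neg_loglik(a):
    # -log L(a | x) for gamma(a, 1): sum[(a-1) log x - x - log Gamma(a)]
    return -np.sum((a - 1) * np.log(x) - x - gammaln(a))

res = minimize_scalar(neg_loglik, bounds=(0.01, 20), method="bounded")
print(res.x)                              # numerical MLE of alpha
```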
Restricted MLE
The parameter space $\Theta$ is a proper subset of the set of all possible values of the parameter. Special attention is needed to make sure $\hat{\theta} \in \Theta$.
$X_1, \ldots, X_n$ iid $N(\theta, 1)$, $\theta \ge 0$.
But what if $\theta > 0$?
$X_1, \ldots, X_n$ iid $N(\theta, \sigma^2)$, $a \le \theta \le b$.
Non-differentiable likelihood
$X_1, \ldots, X_n$ iid $\mathrm{Unif}(0, \theta]$, $\theta > 0$.
$X_1, \ldots, X_n$ iid exponential location family with pdf $f(x) = e^{-(x - \theta)}$, if $x \ge \theta$.
$X_1, \ldots, X_n$ iid $\mathrm{Unif}(\theta - \frac{1}{2}, \theta + \frac{1}{2})$.
Discrete parameter space
Example
Let $X$ be a single observation taking values in $\{0, 1, 2\}$ according to $P_\theta$, where $\theta = \theta_0$ or $\theta_1$. The probability of $X$ is summarized below:
                       x = 0   x = 1   x = 2
  theta = theta_0       0.8     0.1     0.1
  theta = theta_1       0.2     0.3     0.5
Examples: Two-parameter case
For a differentiable likelihood, one needs calculus of several variables in general, but often simple tricks help reduce to one dimension.
$X_1, \ldots, X_n$ iid $N(\mu, \sigma^2)$.
$X_1, \ldots, X_n$ iid location-scale exponential family, with pdf $f(x; \mu, \sigma) = \frac{1}{\sigma} e^{-(x - \mu)/\sigma}$ if $x \ge \mu$.
Remarks about the MLE
The MLE $\hat{\theta}$ is the value for which the observed sample $x$ is most likely; it possesses some optimal properties (discussed later).
In exponential families, it coincides with the method of moments estimator.
The MLE can be numerically sensitive to variation in the data if the likelihood function is discontinuous.
If $T$ is sufficient for $\theta$, then the MLE $\hat{\theta}$ must be a function of $T$.
The MLE is the value of $\theta$ that maximizes $g(T(X), \theta)$, where $g(t, \theta)$ is the pdf or pmf of $T = T(X)$ at $t$.
Induced likelihood
If $\eta = \psi(\theta)$ is a parametric function, then the likelihood for $\eta$ is defined by
$L^*(\eta | X) = \sup_{\theta: \psi(\theta) = \eta} L(\theta | X).$
Theorem (Invariance Principle)
If $\hat{\theta}$ is the MLE of $\theta$, then for any function $\psi(\theta)$, the MLE of $\psi(\theta)$ is $\psi(\hat{\theta})$.
Examples
$X_1, \ldots, X_n$ iid $\mathrm{Bin}(1, p)$. Find the MLE of $\sqrt{p(1 - p)}$.
$X_1, \ldots, X_n$ iid $\mathrm{Poi}(\lambda)$. Find the MLE of $P_\lambda(X \ge 1)$.
$X_1, \ldots, X_n$ iid $N(\mu, \sigma^2)$.
Find the MLE of $\mu/\sigma$.
Find the MLE of the population median.
Find the MLE for $c = c(\mu, \sigma)$ such that $P_{\mu, \sigma}(\bar{X} > c) = 0.025$ (the 97.5th percentile of the distribution of $\bar{X}$).
EM-algorithm
Useful numerical algorithm to compute the MLE with missing data.
Iterative method repeating an E-step (Expectation) and an M-step (Maximization).
Given data $Y$, missing vital $X$. Augmented data $(X, Y)$.
Actual likelihood $L(\theta | Y) = E_\theta[L(\theta | X, Y) | Y]$.
Start with an initial estimate $\theta_0$.
Calculate $E_{\theta = \theta_0}(\log L(\theta | X, Y) | Y)$.
Maximize with respect to $\theta$ to get the update $\theta_1$.
Repeat the procedure, replacing the old estimate by the new, until convergence.
Example
Multinomial$((\theta + 1)/2, \theta/4, \theta/4, 1/2 - \theta)$.
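A sketch of the EM iteration for the multinomial example as reconstructed above (cell probabilities $((\theta+1)/2, \theta/4, \theta/4, 1/2 - \theta)$; the counts are hypothetical). Splitting the first cell into a latent part of probability $1/2$ and a $\theta$-part of probability $\theta/2$ yields closed-form E- and M-steps.

```python
# Observed multinomial counts for cells with probabilities
# ((theta+1)/2, theta/4, theta/4, 1/2 - theta), 0 < theta < 1/2.
x1, x2, x3, x4 = 125, 18, 20, 34          # hypothetical counts

theta = 0.25                              # initial guess
for _ in range(50):
    # E-step: expected latent count from the theta-part of cell 1,
    # since (theta/2) / ((theta+1)/2) = theta / (1 + theta)
    v = x1 * theta / (1 + theta)
    # M-step: maximize (v+x2+x3) log(theta) + x4 log(1/2 - theta)
    theta = (v + x2 + x3) / (2 * (v + x2 + x3 + x4))
print(theta)
```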
Bayes Estimators
Recall, in the Bayesian approach $\theta$ is considered as a quantity whose variation can be described by a probability distribution (called the prior distribution). A sample is then taken from a population indexed by $\theta$, and the prior distribution is updated with this sample information. The updated prior is called the posterior distribution.
Prior distribution of $\theta$: $\pi(\theta)$
Posterior distribution of $\theta$: $\pi(\theta | X) = f(X | \theta) \pi(\theta)/m(X)$
Marginal distribution of $X$: $m(X) = \int f(X | \theta) \pi(\theta) \, d\theta$
The mean of the posterior distribution, $E(\theta | X)$, can be used as the Bayes estimator of $\theta$.
Examples
$X_1, \ldots, X_n$ iid $\mathrm{Bin}(1, p)$. Assume the prior distribution on $p$ is $\mathrm{Beta}(\alpha, \beta)$. Find the posterior distribution of $p$ and the Bayes estimator of $p$.
Special case: prior $\pi(p) \sim \mathrm{Unif}(0, 1)$.
$X_1, \ldots, X_n$ iid $N(0, \theta)$, $\theta \in [0, 1]$, prior $U[0, 1]$.
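For the first example, the Beta prior is conjugate: the posterior is $\mathrm{Beta}(\alpha + \sum x_i, \beta + n - \sum x_i)$, and the Bayes estimator is its mean. A minimal sketch with simulated data, using the uniform special case $\alpha = \beta = 1$:

```python
import numpy as np

rng = np.random.default_rng(5)
p_true, n = 0.3, 50
x = rng.binomial(1, p_true, size=n)             # iid Bin(1, p) sample

a, b = 1.0, 1.0                                 # Beta(1, 1) = Unif(0, 1) prior
a_post, b_post = a + x.sum(), b + n - x.sum()   # posterior Beta parameters
print(a_post / (a_post + b_post))               # Bayes estimator: posterior mean
```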
Conjugate family
Let $\mathcal{F}$ denote the class of pdfs or pmfs $f(x | \theta)$. A class $\Pi$ of prior distributions is a conjugate family for $\mathcal{F}$ if the posterior distribution is in the class $\Pi$ for all $f \in \mathcal{F}$, all priors in $\Pi$, and all observation values $x$.
Examples:
The beta family is conjugate for the binomial family.
The normal family is conjugate for the normal family.
Methods of Evaluating Estimators
Various criteria to evaluate $\hat{\theta}$ and compare different point estimators:
mean squared error
best unbiased estimators or UMVUE (Uniformly Minimum Variance Unbiased Estimator)
optimality for general loss functions and risks
Unbiasedness and Mean Squared Error
The bias of a point estimator $W$ of $\theta$ is $\mathrm{Bias}_\theta(W) = E_\theta W - \theta$.
An estimator whose bias is equal to 0 is called unbiased.
An unbiased estimator satisfies $E_\theta W = \theta$ for all $\theta$.
The mean squared error (MSE) of an estimator $W$ of $\theta$ is defined by $E_\theta(W - \theta)^2$.
The MSE is a function of $\theta$, and has the representation
$E_\theta(W - \theta)^2 = \mathrm{Var}_\theta W + (\mathrm{Bias}_\theta W)^2.$
The MSE incorporates two components, one measuring the variability of the estimator (precision) and the other measuring its bias (accuracy).
A small value of the MSE implies small combined variance and bias. Unbiased estimators do a good job of controlling bias.
A smaller MSE indicates a smaller probability for $W$ to be far from $\theta$, because
$P(|W - \theta| > \epsilon) \le \frac{1}{\epsilon^2} E_\theta(W - \theta)^2 = \frac{1}{\epsilon^2} \mathrm{MSE}_\theta(W)$
by the Chebyshev inequality.
In general, there will not be one best estimator. Often the MSEs of two estimators cross each other, showing that each estimator is better in only a portion of the parameter space.
Example
Let $X_1, X_2$ be iid from $\mathrm{Bin}(1, p)$ with $0 < p < 1$. Compare three estimators with respect to their MSE:
$\hat{p}_1 = X_1$
$\hat{p}_2 = \frac{X_1 + X_2}{2}$
$\hat{p}_3 = 0.5.$
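A sketch computing the three MSE curves, $p(1-p)$, $p(1-p)/2$, and $(0.5 - p)^2$, and locating where the constant estimator beats $\hat{p}_2$; no estimator wins for every $p$:

```python
import numpy as np

p = np.linspace(0, 1, 101)
mse1 = p * (1 - p)          # p1 = X1: unbiased, Var = p(1-p)
mse2 = p * (1 - p) / 2      # p2 = (X1+X2)/2: unbiased, Var = p(1-p)/2
mse3 = (0.5 - p) ** 2       # p3 = 0.5: zero variance, pure squared bias

# p3 beats p2 only for p roughly in (0.21, 0.79); the curves cross
print(p[mse3 < mse2])
```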
Illustration
Let $X_1, \ldots, X_n$ be iid $N(\mu, \sigma^2)$. Show $\bar{X}$ is unbiased for $\mu$ and $S^2$ is unbiased for $\sigma^2$, and compute their MSEs.
What about non-normal distributions with mean $\mu$ and variance $\sigma^2$?
Let $X_1, \ldots, X_n$ be iid $N(\mu, \sigma^2)$. Show the estimator $\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2$ is biased for $\sigma^2$, but it has a smaller MSE than $S^2$.
More generally, find the MSE of $c S^2$.
Uniformly Minimum Variance Unbiased Estimator
If the estimator $W$ is unbiased for $\psi(\theta)$, then its MSE is equal to $\mathrm{Var}_\theta(W)$. Therefore, choosing a better unbiased estimator is equivalent to choosing the one with smaller variance.
Definition
An estimator $W^*$ is a best unbiased estimator of $\psi(\theta)$ if it satisfies:
$E_\theta W^* = \psi(\theta)$ for all $\theta$;
For any other estimator $W$ with $E_\theta W = \psi(\theta)$, we have $\mathrm{Var}_\theta W^* \le \mathrm{Var}_\theta W$ for all $\theta$.
$W^*$ is also called a uniformly minimum variance unbiased estimator (UMVUE).
Example
$X_1, \ldots, X_n$ iid $\mathrm{Poi}(\lambda)$. Both $\bar{X}$ and $S^2$ are unbiased for $\lambda$.
How to find a best unbiased estimator?
If $B(\theta)$ is a lower bound on the variance of any unbiased estimator of $\psi(\theta)$, and if $W^*$ is unbiased and satisfies $\mathrm{Var}_\theta W^* = B(\theta)$, then $W^*$ is a UMVUE.
Cramer-Rao Inequality
Theorem
Let $X$ be a sample with pdf $f(x, \theta)$. Suppose $W(X)$ is an estimator satisfying
$E_\theta W(X) = \psi(\theta)$ for any $\theta$;
$\mathrm{Var}_\theta W(X) < \infty$.
If differentiation under the integral sign can be carried out, then
$\mathrm{Var}_\theta(W(X)) \ge \frac{[\psi'(\theta)]^2}{E_\theta[(\frac{\partial}{\partial\theta} \log f(X|\theta))^2]}.$
In the i.i.d. case, the bound reduces to $[\psi'(\theta)]^2/(n I(\theta))$, where
$I(\theta) = E_\theta[(\frac{\partial}{\partial\theta} \log f(X|\theta))^2]$
is called the Fisher information (per observation).
Score function: $s(X, \theta) = \frac{\partial}{\partial\theta} \log f(X|\theta) = \frac{1}{f(X|\theta)} \frac{\partial}{\partial\theta} f(X|\theta)$.
Lemma (Expressions for $I(\theta)$)
If differentiation and integration are interchangeable,
$I(\theta) = E_\theta(s(X, \theta))^2 = \mathrm{var}_\theta(s(X, \theta)) = -E_\theta\left[\frac{\partial^2}{\partial\theta^2} \log f(X, \theta)\right]$
$= \int \left(\frac{\partial}{\partial\theta} \log f(x, \theta)\right)^2 f(x, \theta) \, dx = \int \frac{(\frac{\partial}{\partial\theta} f(x, \theta))^2}{f(x, \theta)} \, dx = -\int \left(\frac{\partial^2}{\partial\theta^2} \log f(x, \theta)\right) f(x, \theta) \, dx.$
Examples
$X_1, \ldots, X_n$ iid $\mathrm{Poi}(\lambda)$. Find the Fisher information number and a UMVUE for $\lambda$.
$X_1, \ldots, X_n$ iid $N(\mu, \sigma^2)$, $\mu$ unknown but $\sigma^2$ known. Find a UMVUE for $\mu$ using the Cramer-Rao bound.
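For the Poisson family the score is $s(X, \lambda) = X/\lambda - 1$, so $I(\lambda) = \mathrm{Var}(X)/\lambda^2 = 1/\lambda$. A minimal simulation check:

```python
import numpy as np

rng = np.random.default_rng(6)
lam, reps = 4.0, 200_000
x = rng.poisson(lam, size=reps)

score = x / lam - 1.0         # d/d(lambda) log f(x | lambda) for Poisson
print(score.var())            # ~ I(lambda), by the variance expression
print(1 / lam)                # closed form: I(lambda) = 1/lambda
```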
When can we exchange differentiation and integration?
Yes, for the exponential family.
Not always true for non-exponential families. We have to check directly whether
$\frac{d}{d\theta} \int h(x) f(x, \theta) \, dx$ and $\int h(x) \frac{\partial}{\partial\theta} [f(x, \theta)] \, dx$ match.
Example
$X_1, \ldots, X_n$ iid from $\mathrm{Unif}(0, \theta)$.
The Cramer-Rao bound does not work here!
Attainability of Cramer-Rao bound
The Cramer-Rao inequality says that if $W^*$ achieves the variance bound, then it is a UMVUE. In the one-parameter exponential family case, we can find such an estimator. But there is no guarantee that this lower bound is sharp (attainable) in other situations. It is possible that the value of the Cramer-Rao bound may be strictly smaller than the variance of any unbiased estimator.
Corollary
Let $X_1, \ldots, X_n$ be iid with pdf $f(x, \theta)$, where $f(x, \theta)$ satisfies the assumptions of the Cramer-Rao bound theorem. Let $L(\theta | x) = \prod_{i=1}^n f(x_i, \theta)$ denote the likelihood function. If $W(X)$ is unbiased for $\psi(\theta)$, then $W(X)$ attains the Cramer-Rao Lower Bound if and only if
$a(\theta)[W(X) - \psi(\theta)] = \frac{\partial}{\partial\theta} \log L(\theta | X)$
for some function $a(\theta)$.
Attainability in one-parameter exponential family
Theorem
Let $X_1, \ldots, X_n$ be iid from a one-parameter exponential family with the pdf $f(x, \theta) = c(\theta) h(x) \exp\{w(\theta) T(x)\}$. Assume $E[T(X)] = \psi(\theta)$. Then $n^{-1} \sum_{i=1}^n T(X_i)$, as an unbiased estimator of $\psi(\theta)$, attains the Cramer-Rao Lower Bound, i.e.,
$\mathrm{Var}\left(n^{-1} \sum_{i=1}^n T(X_i)\right) = \frac{[\psi'(\theta)]^2}{n I(\theta)}.$
Examples
$X_1, \ldots, X_n$ iid from $\mathrm{Bin}(1, p)$. Find a UMVUE of $p$ and show it attains the Lower Bound.
$X_1, \ldots, X_n \sim N(\mu, \sigma^2)$, with $(\mu, \sigma^2)$ both unknown. Consider estimation of $\sigma^2$. What is the Cramer-Rao Lower Bound and is it attainable?
Constructing UMVUE using Rao-Blackwell Method
An important method of finding/constructing UMVUEs with the help of conditioning on a complete and sufficient statistic.
Review of conditional expectation:
$E(X) = E[E(X|Y)]$, for any $X, Y$.
$\mathrm{Var}(X) = \mathrm{Var}[E(X|Y)] + E[\mathrm{Var}(X|Y)]$, for any $X, Y$.
$E(g(X)|Y) = \int g(x) f_{X|Y}(x|y) \, dx$, and it is a function of $Y$.
$\mathrm{Cov}(E(X|Y), Y) = \mathrm{Cov}(X, Y)$.
Rao-Blackwell Theorem
Theorem
Let $W$ be unbiased for $\psi(\theta)$ and $T$ be a sufficient statistic for $\theta$. Define $\phi(T) = E(W|T)$. Then the following hold:
$E_\theta \phi(T) = \psi(\theta)$;
$\mathrm{Var}_\theta \phi(T) \le \mathrm{Var}_\theta W$ for all $\theta$.
Thus, $E(W|T)$ is a uniformly better unbiased estimator of $\psi(\theta)$ than $W$.
Conditioning any unbiased estimator on a sufficient statistic will result in a uniform improvement, so we need consider only statistics that are functions of a sufficient statistic in the search for best unbiased estimators.
Examples
Let $X_1, X_2$ be iid $N(\theta, 1)$. Show $X_1$ is unbiased for $\theta$ and $E(X_1 | \bar{X})$ is uniformly better.
Let $X_1, \ldots, X_n$ be iid $\mathrm{Unif}(0, \theta)$. Show $Y = (n + 1) X_{(1)}$ is unbiased for $\theta$ and $E(Y | X_{(n)})$ is uniformly better.
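The uniform example can be made concrete: given $X_{(n)}$, the remaining observations are iid Unif$(0, X_{(n)})$, which gives $E(Y | X_{(n)}) = (n+1) X_{(n)}/n$. The sketch below (arbitrary $\theta$ and $n$) confirms both estimators are unbiased and that the Rao-Blackwellized one has much smaller variance.

```python
import numpy as np

rng = np.random.default_rng(7)
theta, n, reps = 2.0, 5, 200_000
x = rng.uniform(0, theta, size=(reps, n))

y = (n + 1) * x.min(axis=1)                # unbiased: E[(n+1) X_(1)] = theta
phi = (n + 1) / n * x.max(axis=1)          # E(Y | X_(n)) = (n+1) X_(n) / n

print(y.mean(), phi.mean())                # both ~ theta
print(y.var(), phi.var())                  # phi has much smaller variance
```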
Uniqueness of UMVUE
Theorem
If $W$ is a UMVUE of $\psi(\theta)$, then $W$ is unique.
UMVUE and unbiased estimators of zero
Theorem
If $E_\theta W = \psi(\theta)$, then $W$ is the best unbiased estimator of $\psi(\theta)$ if and only if $W$ is uncorrelated with all unbiased estimators of 0.
Example
Let $X$ be an observation from $\mathrm{Unif}(\theta, \theta + 1)$.
Show that $X - \frac{1}{2}$ is unbiased for $\theta$.
Show that $h(X) = \sin(2\pi X)$ is an unbiased estimator of zero.
Show $X - \frac{1}{2}$ and $h(X)$ are correlated. So $X - \frac{1}{2}$ is not best.
Lehmann-Scheffé theorem
Theorem
Let $T$ be a complete sufficient statistic for a parameter $\theta$, and let $\phi(T)$ be any estimator based on $T$. Then $\phi(T)$ is the unique best unbiased estimator of its expected value.
Thus:
Find a complete sufficient statistic $T$ for the parameter $\theta$.
Find an unbiased estimator $h(X)$ of $\psi(\theta)$.
Then $\phi(T) = E(h(X) | T)$ is the best unbiased estimator of $\psi(\theta)$.
Examples
Let $X_1, \ldots, X_n$ be iid $\mathrm{Bin}(k, \theta)$.
$X_1, \ldots, X_n$ are iid from $\mathrm{Unif}(0, \theta)$.
Find the UMVUE of $\theta$.
Find the UMVUE of $g(\theta)$, where $g$ is differentiable on $(0, \infty)$.
Suppose $X_1, \ldots, X_n$ are iid from $\mathrm{Poi}(\lambda)$.
Find the UMVUE of $\lambda$.
Find the UMVUE of $g(\lambda) = \lambda^r$, $r \ge 1$ an integer.
Find the UMVUE of $g(\lambda) = e^{-\lambda}$.
More Examples
Suppose that the random variables $Y_1, \ldots, Y_n$ satisfy
$Y_i = \beta x_i + \epsilon_i, \quad i = 1, \ldots, n,$
where $x_1, \ldots, x_n$ are fixed constants, and $\epsilon_1, \ldots, \epsilon_n$ are iid $N(0, \sigma^2)$ with $\sigma^2$ known. Find the MLE of $\beta$ and show it is the UMVUE.
Suppose $X_1, \ldots, X_n$ are iid from $\exp(\theta)$, $\theta > 0$.
Find the UMVUE for $\theta$.
Find the UMVUE for $\psi(\theta) = 1 - F_\theta(s) = P_\theta(X_1 > s)$.
Find the UMVUE for $e^{-1/\theta}$.
More Examples (contd.)
Suppose $X_1, \ldots, X_n$ are iid from $N(\mu, \sigma^2)$, both $(\mu, \sigma^2)$ unknown.
Find the UMVUE for $\mu$.
Find the UMVUE for $\sigma^2$.
Find the UMVUE for $\mu^2$.
Normal probability. $X_1, \ldots, X_n$ iid $N(\theta, 1)$. $\psi(\theta) = P_\theta(X_1 \le c) = \Phi(c - \theta)$.
Ridiculous UMVUE. $X_1, \ldots, X_n$ iid $\mathrm{Poi}(\lambda)$. $\psi(\lambda) = e^{-3\lambda}$.
Loss Function Optimality
Observations $X_1, \ldots, X_n$ are iid with pdf $f(x, \theta)$, $\theta \in \Theta$. To evaluate an estimator $\hat{\theta}(X)$, various loss functions can be used. The loss function measures the closeness of $\theta$ and $\hat{\theta}$.
squared error loss: $L(\theta, \hat{\theta}) = (\hat{\theta} - \theta)^2$
absolute error loss: $L(\theta, \hat{\theta}) = |\hat{\theta} - \theta|$
a loss that penalizes overestimation more than underestimation is
$L(\theta, \hat{\theta}) = (\hat{\theta} - \theta)^2 I(\hat{\theta} < \theta) + 10 (\hat{\theta} - \theta)^2 I(\hat{\theta} \ge \theta).$
a loss that penalizes more if $\theta$ is near 0 than if $|\theta|$ is large:
$L(\theta, \hat{\theta}) = \frac{(\hat{\theta} - \theta)^2}{|\theta| + 1}$
Loss Function Optimality (contd.)
To compare estimators, we use the expected loss, called the risk function,
$R(\theta, \hat{\theta}) = E_\theta L(\theta, \hat{\theta}(X)).$
If $R(\theta, \hat{\theta}_1) < R(\theta, \hat{\theta}_2)$ for all $\theta$, then $\hat{\theta}_1$ is the preferred estimator because it performs better for all $\theta$. In particular, for the squared error loss, the risk function is the MSE.
Example
$X_1, \ldots, X_n$ iid from $\mathrm{Bin}(1, p)$. Compare two estimators in terms of their MSE:
MLE $\hat{p}_1 = \bar{X}$
Bayes estimator: prior $\pi(p) \sim \mathrm{Beta}(\alpha, \beta)$ with $\alpha = \beta = \sqrt{n/4}$,
$\hat{p}_B = \frac{\sum_{i=1}^n X_i + \sqrt{n/4}}{n + \sqrt{n}}.$
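A sketch comparing the two risk functions; with this particular prior, the Bayes estimator has constant risk $n/(4(n + \sqrt{n})^2)$, while the MLE's risk $p(1-p)/n$ peaks at $p = 1/2$ (the choice $n = 25$ is arbitrary):

```python
import numpy as np

n = 25
p = np.linspace(0, 1, 101)
mse_mle = p * (1 - p) / n                                      # risk of p1 = Xbar
mse_bayes = n / (4 * (n + np.sqrt(n)) ** 2) * np.ones_like(p)  # constant in p

# Bayes risk is flat; the MLE is better near the boundary, worse near p = 1/2
print(mse_mle.max(), mse_bayes[0])
```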
Minimaxity
Risk functions are generally overlapping. One estimator cannot beat everyone else.
Example
$X_1, \ldots, X_n$ iid $N(\mu, \sigma^2)$. Consider the estimators of the form $\hat{\sigma}^2_b(X) = b S^2$.
Minimaxity: Compare the worst case scenario, i.e., compare the maximum risks. Find the estimator which has the smallest maximum risk: the minimax estimator.
Downsides
Problems with unbounded risk: the maximum is infinity.
Not easy to find the minimax estimator.
Too pessimistic.
Bayes Rule
The Bayes risk is the average risk with respect to the prior $\pi$,
$\int_\Theta R(\theta, \hat{\theta}) \pi(\theta) \, d\theta.$
By definition, the Bayes risk can be written as
$\int_\Theta R(\theta, \hat{\theta}) \pi(\theta) \, d\theta = \int_\Theta \left[\int L(\theta, \hat{\theta}(x)) f(x|\theta) \, dx\right] \pi(\theta) \, d\theta.$
Note $f(x|\theta)\pi(\theta) = \pi(\theta|x) m(x)$, where $\pi(\theta|x)$ is the posterior distribution of $\theta$ and $m(x)$ is the marginal distribution of $X$; then the Bayes risk becomes
$\int_\Theta R(\theta, \hat{\theta}) \pi(\theta) \, d\theta = \int \left[\int_\Theta L(\theta, \hat{\theta}(x)) \pi(\theta|x) \, d\theta\right] m(x) \, dx.$
The quantity $\int_\Theta L(\theta, \hat{\theta}(x)) \pi(\theta|x) \, d\theta$ is called the posterior expected loss.
To minimize the Bayes risk, we only need to find $\hat{\theta}$ to minimize the posterior expected loss for each $x$.
Bayes Rule (contd.)
The Bayes rule with respect to a prior $\pi$ is an estimator that yields the smallest value of the Bayes risk.
For squared error loss, the posterior expected loss is
$\int_\Theta (\theta - a)^2 \pi(\theta|x) \, d\theta = E[(\theta - a)^2 | x],$
therefore the Bayes rule is $E(\theta|x)$.
For absolute error loss, the posterior expected loss is $E(|\theta - a| \,|\, x)$. The Bayes rule is the median of $\pi(\theta|x)$.
Examples
$X_1, \ldots, X_n$ are iid from $N(\theta, \sigma^2)$ and let $\pi(\theta)$ be $N(\mu, \tau^2)$. The values $\sigma^2, \mu, \tau^2$ are known.
$X_1, \ldots, X_n$ are iid from $\mathrm{Bin}(1, p)$ and let $\pi(p)$ be $\mathrm{Beta}(\alpha, \beta)$.
Hypothesis Testing
Point estimation: provide a single estimate of $\theta$.
Hypothesis testing: test a statement about $\theta$.
A hypothesis is a statement about a population parameter.
Two complementary hypotheses in a hypothesis testing problem are called the null hypothesis and the alternative hypothesis. Let $\Theta_0$ be a subset of the parameter space, called the null region. The hypotheses are denoted by $H_0$ and $H_1$:
$H_0: \theta \in \Theta_0$ vs $H_1: \theta \in \Theta_0^c.$
Illustration
Example
An ideal manufacturing process requires that all products are non-defective. This is very seldom achieved. The goal is to keep the proportion of defective items as low as possible. Let $\theta$ be the proportion of defective items, and 0.01 be the maximum acceptable proportion of defective items.
Statement 1: $\theta \ge 0.01$ (the proportion of defectives is unacceptably high)
Statement 2: $\theta < 0.01$ (acceptable quality)
Example
Let $\theta$ be the average change in a patient's blood pressure after taking a drug. An experimenter might be interested in testing
$H_0: \theta = 0$ (the drug has no effect on blood pressure)
$H_1: \theta \neq 0$ (there is some effect)
Different Types of Hypotheses
Simple hypotheses: Both $H_0$ and $H_1$ consist of only one probability distribution.
Composite hypotheses: Either $H_0$ or $H_1$ contains more than one possible distribution.
One-sided hypotheses: $H: \theta \ge \theta_0$ or $H: \theta < \theta_0$.
Two-sided hypotheses: $H_0: \theta = \theta_0$ vs $H_1: \theta \neq \theta_0$.
Rejection region
A hypothesis testing procedure or hypothesis test is a rule that specifies:
for which sample values the decision is made to accept $H_0$ as true;
for which sample values $H_0$ is rejected and $H_1$ is accepted as true.
The subset of the sample space for which $H_0$ will be rejected is $R$: the rejection region or critical region.
The complement of the rejection region is $R^c$: the acceptance region.
The rejection region $R$ of a hypothesis test is usually defined by a test statistic $W(X)$, a function of the sample:
$R = \{x : W(x) > c\}$: reject $H_0$.
$R^c = \{x : W(x) \le c\}$: accept $H_0$.
Methods of Evaluating Tests
In deciding to accept or reject the null hypothesis $H_0$, we might make a mistake no matter what the decision is. There are two types of errors:
Type I error: $H_0$ is actually true, i.e., $\theta \in \Theta_0$, but the test incorrectly decides to reject $H_0$.
Type II error: $H_0$ is actually false, i.e., $\theta \in \Theta_0^c$, but the test incorrectly decides to accept $H_0$.
                 Decision: Accept H_0    Decision: Reject H_0
  Truth H_0      Correct decision        Type I error
  Truth H_1      Type II error           Correct decision
Power Function
Definition
The power function of a hypothesis test with rejection region $R$ is the function of $\theta$ defined by
$\beta(\theta) = P_\theta(X \in R).$
$\beta(\theta)$ equals the probability of a Type I error if $\theta \in \Theta_0$, and $1 -$ the probability of a Type II error if $\theta \in \Theta_0^c$.
Note $P(\text{Type I error}) = \beta(\theta)$ for $\theta \in \Theta_0$, and $P(\text{Type II error}) = 1 - \beta(\theta)$ for $\theta \in \Theta_0^c$.
Ideal test: $\beta(\theta) = 0$ for all $\theta \in \Theta_0$; $\beta(\theta) = 1$ for all $\theta \in \Theta_0^c$.
Good test:
$\beta(\theta)$ is near 0 (small) for most $\theta \in \Theta_0$;
$\beta(\theta)$ is near 1 (large) for most $\theta \in \Theta_0^c$.
Example (Binomial power function)
$X \sim \mathrm{Bin}(5, \theta)$.
$H_0: \theta \le \frac{1}{2}$ versus $H_1: \theta > \frac{1}{2}$.
Test 1: reject $H_0$ if and only if all successes are observed, i.e., $R = \{5\}$.
Test 2: reject $H_0$ if $X = 3$, 4, or 5.
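The two power functions are $\beta_1(\theta) = \theta^5$ and $\beta_2(\theta) = P_\theta(X \ge 3)$. A minimal sketch evaluating both (assuming scipy):

```python
import numpy as np
from scipy.stats import binom

theta = np.linspace(0, 1, 101)
beta1 = binom.pmf(5, 5, theta)      # Test 1: reject iff X = 5, so beta = theta^5
beta2 = binom.sf(2, 5, theta)       # Test 2: reject iff X >= 3

# at the H0 boundary theta = 1/2: Test 1 has size 1/32, Test 2 has size 1/2
print(beta1[50], beta2[50])
```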
Likelihood Ratio Tests (LRT)
Definition
The likelihood ratio test statistic for testing $H_0: \theta \in \Theta_0$ vs $H_1: \theta \in \Theta_0^c$ is
$\lambda(x) = \frac{\sup_{\theta \in \Theta_0} L(\theta|x)}{\sup_{\theta \in \Theta} L(\theta|x)}.$
A likelihood ratio test (LRT) has a rejection region
$R: \{x : \lambda(x) \le c\},$
where $c$ is any number satisfying $0 \le c \le 1$.
This should be reduced to the simplest possible form.
Rationale of LRT
The numerator of $\lambda(x)$ is the maximum probability of the observed sample, computed over parameters in $H_0$. The denominator of $\lambda(x)$ is the maximum probability of the observed sample over all possible parameters.
The numerator says which $\theta \in \Theta_0$ makes the observation of the data most likely; the denominator says which $\theta \in \Theta$ makes the observation of the data most likely.
The ratio of these two maxima is small if there are parameter points in $H_1$ for which the observed sample is much more likely than for any parameter in $H_0$. In this situation, the LRT criterion says $H_0$ should be rejected and $H_1$ accepted as true.
Relation between LRT and MLE
Let $\hat{\theta}_0$ be the MLE of $\theta$ in the null set $\Theta_0$ (restricted maximization).
Let $\hat{\theta}$ be the MLE of $\theta$ in the full set $\Theta$ (unrestricted maximization). Then the LRT statistic, a function of $x$ (not $\theta$), is
$\lambda(x) = \frac{\sup_{\theta \in \Theta_0} L(\theta|x)}{\sup_{\theta \in \Theta} L(\theta|x)} = \frac{L(\hat{\theta}_0|x)}{L(\hat{\theta}|x)}.$
In $R: \{x : \lambda(x) \le c\}$, a different $c$ gives a different rejection region and hence a different test.
Examples
$X_1, \ldots, X_n$ iid $N(\theta, \sigma^2)$ with $\theta$ unknown ($\sigma^2$ known). Consider testing
$H_0: \theta = \theta_0$ versus $H_1: \theta \neq \theta_0,$
where $\theta_0$ is a number fixed by the experimenter prior to the experiment.
Find the LRT and its power function.
Comment on the decision rules given by different $c$'s.
Let $X_1, \ldots, X_n$ be a random sample from a location-exponential family
$f(x, \theta) = e^{-(x - \theta)}$ if $x \ge \theta,$
where $-\infty < \theta < \infty$. Consider testing $H_0: \theta \le \theta_0$ versus $H_1: \theta > \theta_0$. Find the LRT.
LRT and sufficiency
Theorem
If $T(X)$ is a sufficient statistic for $\theta$, $\lambda^*(t)$ is the LRT statistic based on $T$, and $\lambda(x)$ is the LRT statistic based on $x$, then
$\lambda^*(T(x)) = \lambda(x)$
for every $x$ in the sample space.
Thus the simplified expression for $\lambda(x)$ should depend on $x$ only through $T(x)$ if $T(X)$ is a sufficient statistic for $\theta$.
Examples
$X_1, \ldots, X_n$ iid $N(\theta, \sigma^2)$ with $\sigma^2$ known. Test $H_0: \theta = \theta_0$ versus $H_1: \theta \neq \theta_0$.
Let $X_1, \ldots, X_n$ be a random sample from a location-exponential family. Test $H_0: \theta \le \theta_0$ versus $H_1: \theta > \theta_0$.
Nuisance parameter case
Likelihood ratio tests are also useful when there are nuisance parameters, which are present in the model but not of direct interest.
Example
$X_1, \ldots, X_n$ iid $N(\mu, \sigma^2)$, both $\mu$ and $\sigma^2$ unknown. Test $H_0: \mu \le \mu_0$ versus $H_1: \mu > \mu_0$.
Specify $\Theta$ and $\Theta_0$.
Find the LRT and the power function.
Bayesian Tests
Using the posterior density $\pi(\theta|x)$, compute
$P(\theta \in \Theta_0 | x) = P(H_0 \text{ is true} \,|\, x)$
$P(\theta \in \Theta_0^c | x) = P(H_1 \text{ is true} \,|\, x)$
Decide in favor of the hypothesis which has the greater posterior probability: Accept $H_0$ if $P(\theta \in \Theta_0 | x) \ge \frac{1}{2}$.
This does not work if $\Theta_0$ is a point and $\theta$ is given a prior density. One will need to put a prior mass at the point.
Example
Let $X_1, \ldots, X_n$ be iid $N(\theta, \sigma^2)$ and the prior distribution on $\theta$ be $N(\mu, \tau^2)$, where $\sigma^2, \mu, \tau^2$ are known. Test $H_0: \theta \le \theta_0$ against $H_1: \theta > \theta_0$.
Unbiased Test
Definition
A test with power function $\beta(\theta)$ is unbiased if $\beta(\theta') \ge \beta(\theta'')$ for every $\theta' \in \Theta_0^c$ and $\theta'' \in \Theta_0$.
In most problems, there are many unbiased tests.
Recall $\beta(\theta) = P_\theta(\text{reject } H_0)$. An unbiased test says that the probability of rejecting $H_0$ when $H_0$ is true is smaller than the probability of rejecting $H_0$ when $H_0$ is false.
Examples
$X \sim \mathrm{Bin}(5, \theta)$. Consider testing $H_0: \theta \le \frac{1}{2}$ versus $H_1: \theta > \frac{1}{2}$, and reject $H_0$ if $X = 5$.
$X_1, \ldots, X_n \sim N(\mu, \sigma^2)$, with $\sigma^2$ known. Consider testing $H_0: \mu \le \mu_0$ versus $H_1: \mu > \mu_0$.
The LRT is unbiased.
Draw the graph of the power function.
Controlling Type I error
For a fixed sample size, it is usually impossible to make both types of error arbitrarily small.
Common approach:
Control the Type I error probability at a specified level $\alpha$.
Within this class of tests, make the Type II error probability as small as possible; equivalently, maximize the power.
Size and level of a test
Definition
For $0 \le \alpha \le 1$, a test with power function $\beta(\theta)$ is a size $\alpha$ test if $\sup_{\theta \in \Theta_0} \beta(\theta) = \alpha$.
Definition
For $0 \le \alpha \le 1$, a test with power function $\beta(\theta)$ is a level $\alpha$ test if $\sup_{\theta \in \Theta_0} \beta(\theta) \le \alpha$.
If these relations hold only in the limit as $n \to \infty$, we call the tests respectively asymptotically size (level) $\alpha$. [More details in the final chapter]
Notations and remarks
Typical choices of $\alpha$ are: 0.01, 0.05, 0.10.
We use $z_{\alpha/2}$ to denote the point having probability $\alpha/2$ to the right of it for a standard normal pdf. By convention, we have
$P(Z > z_\alpha) = \alpha$, where $Z \sim N(0, 1)$
$P(T_{n-1} > t_{n-1, \alpha/2}) = \alpha/2$, where $T_{n-1} \sim t_{n-1}$
$P(\chi^2_p > \chi^2_{p, 1-\alpha}) = 1 - \alpha$, chi square with d.f. $p$
Note $z_\alpha = -z_{1-\alpha}$.
Commonly used cutoffs:
$z_{0.05} = 1.645$, $z_{0.025} = 1.96$, $z_{0.01} = 2.33$, $z_{0.005} = 2.58$.
How to specify $H_0$ and $H_1$?
If an experimenter expects an experiment will indicate a phenomenon, he or she should choose $H_1$ to be the theory being proposed.
$H_1$ is sometimes called the researcher's hypothesis. By using a level $\alpha$ test with small $\alpha$, the experimenter is guarding against saying the data support the research hypothesis when it is false.
Announcing a new phenomenon when in fact nothing has happened is usually more serious than missing something new that has in fact occurred.
Similarly, in the judicial system evidence is collected to decide whether the accused is innocent or guilty. To prevent the possibility of penalizing an innocent person incorrectly, the test should be set up as $H_0$: innocent versus $H_1$: guilty.
Example
Let $X \sim \mathrm{Bin}(5, \theta)$. Consider testing $H_0: \theta \le \frac{1}{2}$ versus $H_1: \theta > \frac{1}{2}$ with the procedure: reject $H_0$ if $X = 5$.
Is this test a level 0.05 test?
Is this test a size 0.05 test?
What is the size of the test?
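A one-line check (assuming scipy): the power $\theta^5$ is increasing, so the size is attained at the boundary $\theta = 1/2$.

```python
from scipy.stats import binom

# sup over H0: theta <= 1/2 of the power theta^5 is at theta = 1/2
size = binom.pmf(5, 5, 0.5)
print(size, size <= 0.05)   # 0.03125: a level-0.05 test, of size 1/32, not 0.05
```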
How to choose the critical value of the LRT
In order to make an LRT a size $\alpha$ test, we choose $c$ such that
$\sup_{\theta \in \Theta_0} P_\theta(\lambda(X) \le c) = \alpha.$
iid $N(\mu, \sigma^2)$, $\sigma^2$ known. $H_0: \mu \le \mu_0$ vs $H_1: \mu > \mu_0$.
iid $N(\mu, \sigma^2)$, $\sigma^2$ known. Consider testing $H_0: \mu = \mu_0$ vs $H_1: \mu \neq \mu_0$.
Let $X_1, \ldots, X_n$ be iid from $N(\mu, \sigma^2)$, $\sigma^2$ unknown. Consider testing $H_0: \mu = \mu_0$ versus $H_1: \mu \neq \mu_0$. Show that the LRT that rejects $H_0$ if $|\bar{X} - \mu_0| > t_{n-1, \alpha/2} S/\sqrt{n}$ is a test of size $\alpha$.
iid location-exponential distribution. Consider testing $H_0: \theta \le \theta_0$ vs $H_1: \theta > \theta_0$. Find the size $\alpha$ LRT.
Sample size calculation
For a fixed sample size, it is usually impossible to make both types of error probabilities arbitrarily small. But if we can choose the sample size, it is possible to achieve a desired power level.
Example
iid $N(\mu, \sigma^2)$, $\sigma^2$ known. Test $H_0: \mu \le \mu_0$ vs $H_1: \mu > \mu_0$. The LRT that rejects $H_0$ if $(\bar{X} - \mu_0)/(\sigma/\sqrt{n}) > C$ has the power function
$\beta(\mu) = 1 - \Phi\left(C + \frac{\mu_0 - \mu}{\sigma}\sqrt{n}\right).$
Note $\beta(\mu)$ is increasing in $\mu$.
Notes
The maximum Type I error is
$\sup_{\mu \le \mu_0} \beta(\mu) = \beta(\mu_0) = 1 - \Phi(C).$
For the size $\alpha$ test, $C = z_\alpha$.
After $C$ is chosen, it is possible to increase $\beta(\mu)$ for $\mu > \mu_0$ by increasing the sample size $n$. Thus we can minimize the Type II error (remember: the Type I error is under control already).
Draw the picture of the power function for small $n$ and large $n$.
Assume $C = z_\alpha$. How to choose $n$ such that the maximum Type II error is at most 0.2 if $\mu \ge \mu_0 + \sigma$?
Compute $n$ if $\alpha = 0.05$ in (3).
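Under the reading that the power requirement is imposed at $\mu = \mu_0 + \sigma$ (as reconstructed above), the requirement $1 - \Phi(z_\alpha - \sqrt{n}) \ge 0.8$ gives $\sqrt{n} \ge z_\alpha + z_{0.2}$. A minimal sketch:

```python
import numpy as np
from scipy.stats import norm

alpha, beta_target = 0.05, 0.2            # size and maximum Type II error
z_a, z_b = norm.ppf(1 - alpha), norm.ppf(1 - beta_target)

# power at mu = mu0 + sigma is 1 - Phi(z_a - sqrt(n)); require >= 0.8
n = int(np.ceil((z_a + z_b) ** 2))
print(n)                                  # 7
print(1 - norm.cdf(z_a - np.sqrt(n)))     # achieved power, a bit above 0.8
```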
Example
Let $X \sim \mathrm{Bin}(n, \theta)$. Testing:
$H_0: \theta \ge 3/4$ vs $H_1: \theta < 3/4.$
The LRT for this problem is to reject $H_0$ if $X \le c$.
Choose $c$ and $n$ such that the following are satisfied simultaneously:
If $\theta = \frac{3}{4}$, we have $\Pr(\text{reject } H_0 | \theta) = 0.01$ (control Type I error);
If $\theta = \frac{1}{2}$, we have $\Pr(\text{reject } H_0 | \theta) = 0.99$ (control Type II error).
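Exact equality is generally impossible for a discrete distribution, so a natural reading is to search for the smallest $n$ with a cutoff $c$ meeting both constraints as inequalities. A brute-force sketch:

```python
from scipy.stats import binom

# smallest n admitting a cutoff c with
# P(X <= c | theta = 3/4) <= 0.01 and P(X <= c | theta = 1/2) >= 0.99
for n in range(1, 200):
    for c in range(n + 1):
        if binom.cdf(c, n, 0.75) <= 0.01 and binom.cdf(c, n, 0.5) >= 0.99:
            print(n, c)
            break
    else:
        continue
    break
```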
Most Powerful Tests
Given that the maximum probability of Type I error is less than or equal to $\alpha$, the most powerful level $\alpha$ test minimizes the probability of Type II error, or, equivalently, maximizes the power function at a $\theta \in \Theta_0^c$.
If this occurs for all $\theta \in \Theta_0^c$, such a test is called the uniformly most powerful (UMP) level $\alpha$ test.
Test function
Given a rejection region $R$, define a test function on the sample space to be
$\phi(x) = 1$ if $x \in R$; $\phi(x) = 0$ if $x \notin R$.
Interpret $\phi(X)$ as the probability of rejecting the null hypothesis given the sample $X$.
This also opens doors for randomized tests, where $\phi(X)$ can even take values strictly between 0 and 1.
Note the expected value of $\phi$ is the power function:
$E_\theta[\phi(X)] = P_\theta(X \in R) = \beta(\theta).$
Existence of UMP tests
Lemma (Neyman-Pearson)
Consider testing $H_0: \theta = \theta_0$ versus $H_1: \theta = \theta_1$, where the pdf or pmf corresponding to $\theta_i$ is $f(x, \theta_i)$, $i = 0, 1$. Consider any test function $\phi$ satisfying
$\phi(x) = 1$ if $f(x, \theta_1) > k f(x, \theta_0)$,
$\phi(x) = 0$ if $f(x, \theta_1) < k f(x, \theta_0)$,
for some $k \ge 0$, and $E_{\theta_0} \phi(X) = \alpha$. Then
$\phi(X)$ is a UMP size $\alpha$ test;
if $k > 0$, any other UMP level $\alpha$ test $\phi'$ must have size $\alpha$ and can differ from $\phi$ only on the set $\{x : f(x, \theta_1) = k f(x, \theta_0)\}$.
Examples
$X \sim \mathrm{Bin}(2, \theta)$, one observation. $H_0: \theta = \frac{1}{2}$ versus $H_1: \theta = \frac{3}{4}$. Obtain the UMP level 1/8 test and the UMP level 1/2 test.
$X \sim \mathrm{Exp}(\lambda)$, $H_0: \lambda = 1$ versus $H_1: \lambda = 2$.
$X \sim \mathrm{Cauchy}(\theta)$, $H_0: \theta = 0$ versus $H_1: \theta = 1$.
$X \sim \mathrm{Un}(0, \theta)$, $H_0: \theta = 1$ versus $H_1: \theta = 2$.
$X \sim \mathrm{Un}(\theta, \theta + 1)$, $H_0: \theta = 0$ versus $H_1: \theta = 2$.
Sucient statistic and UMP test
Let $T(X)$ be a sufficient statistic for $\theta$ and $g(t, \theta)$ the pdf or pmf of $T$ corresponding to $\theta$. Then a UMP level $\alpha$ test $\psi(T)$ based on $T$ is given by
$\psi(t) = 1$ if $g(t, \theta_1) > k g(t, \theta_0)$,
$\psi(t) = 0$ if $g(t, \theta_1) < k g(t, \theta_0)$,
for some $k \ge 0$, where $\alpha = E_{\theta_0} \psi(T)$.
Examples
UMP normal test for the mean: let $X_1, \ldots, X_n$ be iid from $N(\mu, \sigma^2)$ with $\sigma^2$ known. $H_0: \mu = \mu_0$ versus $H_1: \mu = \mu_1$, where $\mu_1 > \mu_0$.
UMP normal test for the variance: let $X_1, \ldots, X_n$ be iid from $N(0, \sigma^2)$ with $\sigma^2$ unknown. $H_0: \sigma^2 = \sigma_0^2$ versus $H_1: \sigma^2 = \sigma_1^2$, where $\sigma_1^2 > \sigma_0^2$.
Comments
Discrete case: Suppose $\theta$ has only two possible values $\theta_0$ or $\theta_1$, and $X$ is a discrete variable taking finitely many values $a_1, \ldots, a_k$ with probabilities $P_{\theta_i}(X = a_j)$, $j = 1, \ldots, k$; $i = 0, 1$. $H_0: \theta = \theta_0$ vs $\theta = \theta_1$. The rejection region $R$ of the UMP level $\alpha$ test satisfies
$\max_R \sum_{a_j \in R} P_{\theta_1}(X = a_j)$ subject to $\sum_{a_j \in R} P_{\theta_0}(X = a_j) \le \alpha.$
The N-P test is the LRT for $H_0: \theta = \theta_0$ vs $\theta = \theta_1$.
For simple hypotheses, the UMP level $\alpha$ test is unbiased, i.e., $\beta(\theta_1) > \beta(\theta_0) = \alpha$.
UMP test for one-sided composite alternative
iid $N(\theta, 1)$.
$H_0: \theta = \theta_0$ vs $H_1: \theta > \theta_0$.
Monotone Likelihood Ratio (MLR)
Definition
A family of pdfs or pmfs {g(t, θ) : θ ∈ Θ} for a univariate random variable T with real-valued parameter θ has a monotone likelihood ratio (MLR) if, for every θ₂ > θ₁, g(t, θ₂)/g(t, θ₁) is an increasing function of t on {t : g(t, θ₁) > 0 or g(t, θ₂) > 0}.
Examples
- Normal, Poisson, Binomial all have the MLR property.
- If T is from an exponential family with density f(t, θ) = h(t) c(θ) e^{w(θ)t}, then T has an MLR if w(θ) is a nondecreasing function of θ.
- If X₁, …, Xₙ are iid from N(μ, σ²) with σ known, then X̄ has an MLR.
- If X₁, …, Xₙ are iid from N(μ, σ²) with μ known, then Σ_{i=1}^n (Xᵢ − μ)² has an MLR.
- iid Unif(0, θ): T = X₍ₙ₎ has the MLR property.
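A quick numerical check of the MLR property (a sketch assuming numpy and scipy): for the N(θ, 1) family the ratio g(t, θ₂)/g(t, θ₁) should be increasing in t whenever θ₂ > θ₁.

    # Check that g(t, theta2)/g(t, theta1) is increasing in t for N(theta, 1).
    import numpy as np
    from scipy.stats import norm

    t = np.linspace(-4, 4, 200)
    ratio = norm.pdf(t, loc=2.0) / norm.pdf(t, loc=0.0)  # theta2 = 2 > theta1 = 0
    print(np.all(np.diff(ratio) > 0))                    # True: monotone increasing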
Stochastically increasing
Definition
A statistic T with family of pdfs {f(t, θ) : θ ∈ Θ} is called stochastically increasing in θ if θ₁ < θ₂ implies that
P_{θ₁}(T > c) ≤ P_{θ₂}(T > c) for every c,
or equivalently, F_{θ₂}(c) ≤ F_{θ₁}(c), where F is the cdf.
Useful facts
Lemma
- If a family T has the MLR property, then it is stochastically increasing in its parameter.
- A location family T is stochastically increasing in its location parameter.
- Let a test have rejection region R = {T > c}. If T has the MLR property, then the power function β(θ) = P_θ(T ∈ R) = P_θ(T > c) is non-decreasing in θ.
Karlin-Rubin Theorem
Theorem
Let T(X) be a sufficient statistic for θ and suppose the family {g(t, θ) : θ ∈ Θ} has the MLR property. Then
- For testing H₀: θ ≤ θ₀ vs H₁: θ > θ₀, the UMP level α test rejects H₀ if and only if T > t₀, where α = P_{θ₀}(T > t₀).
- For testing H₀: θ ≥ θ₀ vs H₁: θ < θ₀, the UMP level α test rejects H₀ if and only if T < t₀, where α = P_{θ₀}(T < t₀).
Examples
- Let X₁, …, Xₙ be iid from N(μ, σ²), σ² known.
  Find the UMP level α test for testing H₀: μ ≤ μ₀ vs H₁: μ > μ₀.
  Find the UMP level α test for testing H₀: μ ≥ μ₀ vs H₁: μ < μ₀.
- Let X₁, …, Xₙ be iid from N(μ₀, σ²), σ² unknown, μ₀ known.
  Find the UMP level α test for testing H₀: σ² ≤ σ₀² vs H₁: σ² > σ₀².
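A minimal sketch of the last test (assuming numpy and scipy; the function name is ours): T = Σ (Xᵢ − μ₀)² is sufficient and has MLR in σ², and T/σ₀² has a χ²ₙ distribution at σ² = σ₀², so Karlin-Rubin rejects for large T.

    # UMP test for H0: sigma^2 <= sigma0^2 vs H1: sigma^2 > sigma0^2, mu0 known.
    import numpy as np
    from scipy.stats import chi2

    def ump_var_test(x, mu0, sigma0_sq, alpha=0.05):
        T = np.sum((x - mu0) ** 2)
        return T > sigma0_sq * chi2.ppf(1 - alpha, df=len(x))  # True = reject H0

    rng = np.random.default_rng(2)
    print(ump_var_test(rng.normal(0.0, 1.5, size=30), mu0=0.0, sigma0_sq=1.0))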
Nonexistence of UMP test
- For many problems with a two-sided alternative, there is no UMP level α test, because the class of level α tests is so large that no single test dominates all the others in terms of power.
- Instead, search for a UMP test within some subset of the class of level α tests, for example, the subset of all unbiased tests.
Example
Let X₁, …, Xₙ be iid from N(μ, σ²), σ² known. Consider testing H₀: μ = μ₀ vs H₁: μ ≠ μ₀.
- There is no UMP level α test.
- Find the UMP level α test within the class of unbiased tests.
p-value
- The choice of α is subjective. Different people may have different tolerance levels α.
- If α is small, the decision is conservative.
- If α is large, the decision is overly liberal.
- If you reject (or accept) H₀, is it a strong or a borderline rejection (acceptance)?
p-value (contd.)
Definition
A p-value is the smallest possible level α at which H₀ would be rejected.
Note
- The p-value is a test statistic, taking values 0 ≤ p(x) ≤ 1 for the sample x.
- Small values of p(X) give evidence that H₁ is true.
- The smaller the p-value, the stronger the evidence for rejecting H₀.
- Rejecting H₀ at level α is equivalent to the p-value being less than α.
p-value for composite null
A p-value is called valid if, for every θ ∈ Θ₀ and every 0 ≤ α ≤ 1, we have P_θ(p(X) ≤ α) ≤ α.
Theorem
Let W(X) be a test statistic such that large values of W give evidence that H₁ is true. For each sample point x, define
p(x) = sup_{θ ∈ Θ₀} P_θ(W(X) ≥ W(x)).
Then p(X) is a valid p-value.
Examples
- Two-sided normal p-value: let X₁, …, Xₙ be iid from N(μ, σ²), σ² unknown. For testing H₀: μ = μ₀ versus H₁: μ ≠ μ₀, use the LRT statistic W(X) = |X̄ − μ₀|/(S/√n).
  Let μ₀ = 1, n = 16, observed x̄ = 1.5, s² = 1. Do you reject the hypothesis μ = 1 at level 0.05? At level 0.1?
- One-sided normal p-value: in the above example, consider testing H₀: μ ≤ μ₀ versus H₁: μ > μ₀.
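A sketch of the computation for these numbers (assuming scipy): under H₀ the t-statistic has a t_{n−1} distribution, so the two-sided p-value is 2 P(T₁₅ ≥ W).

    # p-values for mu0 = 1, n = 16, xbar = 1.5, s^2 = 1.
    from math import sqrt
    from scipy.stats import t

    n, xbar, s2, mu0 = 16, 1.5, 1.0, 1.0
    W = abs(xbar - mu0) / (sqrt(s2) / sqrt(n))                  # = 2.0
    print(2 * t.sf(W, df=n - 1))                                # two-sided: about 0.064
    print(t.sf((xbar - mu0) / (sqrt(s2) / sqrt(n)), df=n - 1))  # one-sided: about 0.032
    # Reject mu = 1 at level 0.1 (p < 0.1) but not at level 0.05.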
p-value and sufficient statistic
Sometimes there is a non-trivial sufficient statistic for the null model. Then defining a p-value through conditioning on a sufficient statistic effectively reduces the composite null to a point null:
p(x) = P(W(X) ≥ W(x) | S = S(x)).
Fisher's Exact Test
Let S₁ and S₂ be independent observations with S₁ ~ Bin(n₁, p₁) and S₂ ~ Bin(n₂, p₂). Consider testing H₀: p₁ = p₂ versus H₁: p₁ > p₂.
Goal: form an exact (non-asymptotic) level α test.
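A minimal sketch (assuming scipy; the counts below are hypothetical): conditionally on S = S₁ + S₂ = s, under H₀ the count S₁ is hypergeometric and free of the common p, so an exact conditional p-value is P(S₁ ≥ s₁ | S = s).

    # Fisher's exact test for H1: p1 > p2 via the hypergeometric conditional law.
    from scipy.stats import hypergeom

    def fisher_pvalue(s1, n1, s2, n2):
        s = s1 + s2
        # hypergeom(M, n, N): M = population size, n = successes, N = draws
        return hypergeom.sf(s1 - 1, M=n1 + n2, n=s, N=n1)  # P(S1 >= s1 | S = s)

    print(fisher_pvalue(s1=9, n1=12, s2=3, n2=12))  # hypothetical counts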
Interval Estimation
- Interval estimate: (L(X), U(X)).
- Confidence coefficient: min_θ P_θ(θ ∈ (L(X), U(X))) = 1 − α.
Method of inversion
There is a one-to-one correspondence between tests and confidence intervals.
- Hypothesis testing: fixing the parameter asks what sample values (in the acceptance region) are consistent with that fixed value.
- Confidence set: fixing the sample value asks what parameter values make this sample most plausible.
For each θ₀, let A(θ₀) be the acceptance region of a level α test of H₀: θ = θ₀. Define the set C(x) = {θ₀ : x ∈ A(θ₀)}. Then C(x) is a (1 − α)-confidence set.
Example
iid N(μ, σ²), σ unknown, μ is the parameter of interest.
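A sketch of the inversion in this example (assuming numpy and scipy): the level α two-sided t-test of H₀: μ = μ₀ accepts when |x̄ − μ₀| ≤ t_{α/2, n−1} s/√n, and collecting the accepted μ₀ gives the usual t-interval.

    # Invert the two-sided t-test acceptance region into a confidence interval.
    import numpy as np
    from scipy.stats import t

    def t_interval(x, alpha=0.05):
        n, xbar, s = len(x), np.mean(x), np.std(x, ddof=1)
        half = t.ppf(1 - alpha / 2, df=n - 1) * s / np.sqrt(n)
        return xbar - half, xbar + half   # {mu0 : x in A(mu0)}

    rng = np.random.default_rng(3)
    print(t_interval(rng.normal(5.0, 2.0, size=20)))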
Method of inversion (contd.)
In general, inverting the acceptance region of a two-sided test gives a two-sided interval, and inverting the acceptance region of a one-sided test gives an interval open at one end.
Theorem
Let the acceptance region of a two-sided test be of the form A(θ) = {x : c₁(θ) ≤ T(x) ≤ c₂(θ)} and let the cutoffs be symmetric, that is, P_θ(T(X) > c₂(θ)) = α/2 and P_θ(T(X) < c₁(θ)) = α/2.
If T has the MLR property, then both c₁(θ) and c₂(θ) are increasing in θ.
Examples
- X₁, …, Xₙ ~ N(μ, σ²), both parameters unknown.
  Upper confidence bound for μ.
  Lower confidence bound for μ.
- X₁, …, Xₙ ~ Exp(λ). Invert the LRT.
- Discrete: X₁, …, Xₙ ~ Bin(1, θ). Obtain a lower confidence bound.
Pivot
Definition
A random quantity Q(X, θ) is called a pivotal quantity (or a pivot) if the distribution of Q(X, θ) is independent of θ.
Note this is different from an ancillary statistic, since Q(X, θ) depends also on θ and hence is not a statistic.
Examples
- Location family.
- Scale family.
- Location-scale family.
- iid exponential: gamma pivot.
- A statistic T has density f(t, θ) = g(Q(t, θ)) |(∂/∂t) Q(t, θ)|. Then Q(T, θ) is a pivot.
Method of pivot
How to construct a confidence set using a pivotal quantity?
- Find a, b such that P_θ(a ≤ Q(X, θ) ≤ b) = 1 − α.
- Define C(x) = {θ : a ≤ Q(x, θ) ≤ b}.
- Then P_θ(θ ∈ C(X)) = P_θ(a ≤ Q(X, θ) ≤ b) = 1 − α.
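A sketch for the iid exponential case (assuming numpy and scipy, and the rate parameterization Xᵢ ~ Exp(λ) with mean 1/λ): Q = λ Σ Xᵢ ~ Gamma(n, 1) no matter what λ is, so the pivot inverts to an interval.

    # Gamma-pivot confidence interval for the exponential rate lambda.
    import numpy as np
    from scipy.stats import gamma

    def exp_rate_ci(x, alpha=0.05):
        n, total = len(x), np.sum(x)
        lo, hi = gamma.ppf([alpha / 2, 1 - alpha / 2], a=n)  # Gamma(n, 1) quantiles
        return lo / total, hi / total   # {lambda : lo <= lambda * sum(x) <= hi}

    rng = np.random.default_rng(4)
    print(exp_rate_ci(rng.exponential(scale=1 / 2.0, size=40)))  # true rate 2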
Method of pivot (contd.)
When will C(x) be an interval?
- If Q(x, θ) is monotone in θ, then C(x) is an interval.
Examples:
- iid exponential.
- iid N(μ, σ²), σ known. Interval for μ.
- iid N(μ, σ²), σ unknown. Interval for μ.
- iid N(μ, σ²), μ known. Interval for σ.
- iid N(μ, σ²), μ unknown. Interval for σ.
Method of pivot (contd.)
- If F(t, θ) is decreasing in θ for all t, define θ_L, θ_U by
F(t, θ_L) = 1 − α₂, F(t, θ_U) = α₁, α₁ + α₂ = α.
Then [θ_L(T), θ_U(T)] is a (1 − α) CI for θ.
- Similarly, if F(t, θ) is increasing in θ for all t, define θ_L, θ_U by
F(t, θ_L) = α₂, F(t, θ_U) = 1 − α₁, α₁ + α₂ = α.
Then [θ_L(T), θ_U(T)] is a (1 − α) CI for θ.
Examples:
- iid from f(x, θ) = e^{−(x−θ)} I(x > θ); X₍₁₎ is sufficient.
- A (1 − α) CI is not unique. Among the many choices, one may want to minimize the expected length.
- iid N(μ, σ²), σ known.
- iid N(μ, σ²), σ unknown.
- iid exponential.
Asymptotic Evaluation
X₁, …, Xₙ i.i.d. f(x, θ), n large. Mathematically, n → ∞.
- The assumption n → ∞ makes life easier. Dependence of optimality on models or loss functions becomes less pronounced.
- Because limit theorems become available, distributions can be found approximately. Limiting distributions are much simpler than actual distributions.
Convergence in probability
Definition
We say that Yₙ →ᵖ c (Yₙ converges in probability to the constant c) if P(|Yₙ − c| > ε) → 0 as n → ∞ for all ε > 0.
- Usual calculus applies for convergence in probability.
- A possible method of showing this is Chebyshev's inequality:
P(|Yₙ − c| > ε) ≤ ε⁻² E(Yₙ − c)² = ε⁻² [var(Yₙ) + (E(Yₙ) − c)²],
so it is enough to show that the right-hand side goes to 0.
- If Yₙ = X̄ₙ, then X̄ₙ →ᵖ E(X) by the law of large numbers.
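A quick simulation illustrating the Chebyshev bound (a sketch assuming numpy): the empirical P(|X̄ₙ − μ| > ε) shrinks with n and stays below σ²/(n ε²).

    # Empirical P(|Xbar_n| > eps) versus the Chebyshev bound 1/(n * eps^2).
    import numpy as np

    rng = np.random.default_rng(5)
    eps, reps = 0.5, 10000
    for n in (10, 100, 1000):
        xbar = rng.normal(0.0, 1.0, size=(reps, n)).mean(axis=1)  # mu = 0, sigma = 1
        print(n, np.mean(np.abs(xbar) > eps), 1.0 / (n * eps ** 2))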
Convergence in distribution
Definition
If Yₙ is a sequence of random variables and F is a continuous cdf, we say that Yₙ converges in distribution to F if P(Yₙ ≤ x) → F(x) for all x. We also say that Yₙ →ᵈ Y, where Y is a random variable having cdf F.
- The central limit theorem states that √n (X̄ₙ − E(X)) converges in distribution to N(0, var(X)), i.e.,
P(√n (X̄ₙ − E(X))/√var(X) ≤ x) → Φ(x)
for all x, where Φ stands for the standard normal cdf.
- Another important result is Slutsky's theorem: if Yₙ →ᵈ Y and Zₙ →ᵖ c, then Yₙ + Zₙ →ᵈ Y + c, Yₙ Zₙ →ᵈ cY, and Yₙ/Zₙ →ᵈ Y/c if c ≠ 0.
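A simulation sketch of the CLT and Slutsky combined (assuming numpy and scipy): √n (X̄ − μ)/S, with the consistent S in place of the true standard deviation, is still approximately standard normal.

    # CLT + Slutsky: studentized means of exponential data are nearly N(0, 1).
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(6)
    n, reps = 200, 20000
    x = rng.exponential(scale=1.0, size=(reps, n))   # mean 1, sd 1
    z = np.sqrt(n) * (x.mean(axis=1) - 1.0) / x.std(axis=1, ddof=1)
    print(np.mean(z <= 1.0), norm.cdf(1.0))          # the two should be close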
Consistency
Definition
Let Wₙ = Wₙ(X₁, …, Xₙ) be a sequence of estimators for τ(θ). We say that Wₙ is consistent for estimating τ(θ) if Wₙ →ᵖ τ(θ) under P_θ for all θ.
Theorem
If E_θ(Wₙ) → τ(θ) (in which case Wₙ is called asymptotically unbiased for τ(θ)) and var_θ(Wₙ) → 0 for all θ, then Wₙ is consistent for τ(θ).
Examples
- If X₁, …, Xₙ are i.i.d. f with E(X) = μ and var(X) = σ², then X̄ₙ is consistent for μ and Sₙ² = Σ_{i=1}^n (Xᵢ − X̄ₙ)²/(n − 1) is consistent for σ². Σ_{i=1}^n (Xᵢ − X̄ₙ)²/n is consistent for σ² too.
- (Invariance principle of consistency) If Wₙ is consistent for θ and g is a continuous function, then g(Wₙ) is consistent for g(θ).
- Method of moments estimators are generally consistent.
- UMVUE is consistent: let X₁, …, Xₙ be i.i.d. f(x, θ) and let Wₙ be the UMVUE of τ(θ). Then Wₙ is consistent for τ(θ).
- Consistency of MLE: let X₁, …, Xₙ be i.i.d. f(x, θ), a parametric family satisfying some regularity conditions. Then the MLE θ̂ₙ is consistent for θ.
Asymptotic normality
Definition
A statistic Tₙ is called asymptotically normal with mean τ(θ) and variance v(θ)/n, written Tₙ ~ AN(τ(θ), v(θ)/n), if √n (Tₙ − τ(θ))/√v(θ) →ᵈ N(0, 1) for all θ.
τ(θ) is called the asymptotic mean and v(θ) is called the asymptotic variance.
Note: Tₙ is then consistent for τ(θ).
Example
If X₁, …, Xₙ is i.i.d. f(x, θ), then Tₙ = X̄ₙ is AN(μ(θ), σ²(θ)/n), where μ(θ) = E_θ(X) and σ²(θ) = var_θ(X), by the central limit theorem.
Delta method
Theorem
If Tₙ is AN(θ, σ²(θ)/n), then g(Tₙ) is AN(g(θ), (g′(θ))² σ²(θ)/n).
- A multivariate version is also true.
- The CLT and the delta method in combination give the asymptotic normality of many statistics of interest.
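A numerical check of the delta method (a sketch assuming numpy): with g(x) = x², the variance of g(X̄) should be close to (g′(θ))² σ²/n = (2θ)² σ²/n.

    # Delta method check: var(Xbar^2) vs (2*theta)^2 * sigma^2 / n.
    import numpy as np

    rng = np.random.default_rng(7)
    theta, sigma, n, reps = 2.0, 1.0, 500, 20000
    xbar = rng.normal(theta, sigma, size=(reps, n)).mean(axis=1)
    print(np.var(xbar ** 2), (2 * theta) ** 2 * sigma ** 2 / n)  # both about 0.032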
Efficiency
- How to distinguish between consistent estimators?
- Let the estimators be asymptotically normal with the same asymptotic mean. Then we can compare asymptotic variances.
- Often one variance is smaller than another throughout.
- If there is a lower bound, and that lower bound is attained, then the estimator making that happen is called asymptotically efficient. Clearly such an estimator is impossible to beat asymptotically: it is the best.
Efficiency bound
- Cramer-Rao bound for the MSE of Tₙ in estimating τ(θ):
(τ′(θ) + bₙ′(θ))² / (n I(θ)),
where I(θ) is the Fisher information and bₙ(θ) the bias.
- So if bₙ′(θ) → 0, then the bound for the asymptotic variance should be (τ′(θ))²/I(θ).
- In particular, if τ(θ) = θ, the bound for the asymptotic variance is 1/I(θ).
- Strictly speaking, this bound is not valid, although it is nearly correct.
- Then we can define an estimator to be asymptotically efficient if its asymptotic variance is 1/I(θ).
Attaining efficiency bound
Theorem
The MLE θ̂ is AN(θ, 1/(n I(θ))). More generally, τ(θ̂) is AN(τ(θ), (τ′(θ))²/(n I(θ))).
- The MLE is not the only possible asymptotically efficient estimator.
- Any Bayes estimator is asymptotically efficient.
- Method of moments estimators are asymptotically normal, but need not be asymptotically efficient.
- Define the asymptotic efficiency of θ̂ₙ ~ AN(θ, v(θ)/n) by I⁻¹(θ)/v(θ).
Examples
Cauchy
Logistic
Mean versus median
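A simulation sketch for the mean-versus-median comparison (assuming numpy): for normal data the sample median is AN(μ, (π/2) σ²/n), so its asymptotic efficiency relative to the mean is 2/π ≈ 0.64.

    # Relative efficiency of mean vs median for N(0, 1) data.
    import numpy as np

    rng = np.random.default_rng(8)
    n, reps = 400, 20000
    x = rng.normal(0.0, 1.0, size=(reps, n))
    v_mean = np.var(x.mean(axis=1))
    v_med = np.var(np.median(x, axis=1))
    print(v_mean / v_med)   # close to 2/pi, about 0.64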
Asymptotic distribution of likelihood ratio statistic
Theorem (Point null case)
Let X₁, …, Xₙ be i.i.d. f(x, θ) and let λₙ(X) be the likelihood ratio for testing H₀: θ = θ₀ vs H₁: θ ≠ θ₀, where θ is d-dimensional. Then
−2 log λₙ(X) →ᵈ χ²_d.
Example: Poisson
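A simulation sketch of the Poisson example (assuming numpy and scipy): for H₀: θ = θ₀ the statistic is −2 log λₙ = 2n [θ₀ − X̄ + X̄ log(X̄/θ₀)], and its null distribution approaches χ²₁.

    # Null distribution of -2 log lambda_n for Poisson, H0: theta = theta0.
    import numpy as np
    from scipy.stats import chi2

    rng = np.random.default_rng(9)
    theta0, n, reps = 3.0, 100, 20000
    xbar = rng.poisson(theta0, size=(reps, n)).mean(axis=1)
    stat = 2 * n * (theta0 - xbar + xbar * np.log(xbar / theta0))
    print(np.mean(stat > chi2.ppf(0.95, df=1)))   # rejection rate close to 0.05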
Asymptotic distribution of likelihood ratio statistic
Theorem (General case)
Let X₁, …, Xₙ be i.i.d. f(x, θ) and let λₙ(X) be the likelihood ratio for testing H₀: θ ∈ Θ₀ vs H₁: θ ∉ Θ₀. Then −2 log λₙ(X) →ᵈ χ²_d, where d is the difference between the number of free parameters in the model and the number of free parameters in the null.
Example: Multinomial
Common large sample tests
- Two-sample normal, variances unknown.
- Two-sample binomial.
Common large sample condence intervals
- Asymptotic normality of a moment or a function of moments, via the central limit theorem and the delta method, often gives a confidence interval.
- A confidence interval can also be obtained from the distribution of the MLE: θ̂ₙ ~ AN(θ, 1/(n I(θ))), i.e., √(n I(θ)) (θ̂ₙ − θ) ~ AN(0, 1), an asymptotic pivot.
- One can possibly invert this to get a CI for θ, but it may be complicated.
- By Slutsky, √(n I(θ̂ₙ)) (θ̂ₙ − θ) ~ AN(0, 1). This is easy to invert (see the sketch after the examples below).
Examples
Poisson
Binomial
Correlation coefficient
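For the Poisson example, a minimal sketch (assuming numpy and scipy): the MLE is X̄, I(θ) = 1/θ, and the Slutsky-type pivot inverts to the Wald interval X̄ ± z_{α/2} √(X̄/n).

    # Wald confidence interval for a Poisson mean based on the MLE pivot.
    import numpy as np
    from scipy.stats import norm

    def poisson_wald_ci(x, alpha=0.05):
        n, xbar = len(x), np.mean(x)
        half = norm.ppf(1 - alpha / 2) * np.sqrt(xbar / n)
        return xbar - half, xbar + half

    rng = np.random.default_rng(10)
    print(poisson_wald_ci(rng.poisson(4.0, size=60)))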
Variance stabilizing transformation
- If Tₙ ~ AN(θ, v(θ)/n), then g(Tₙ) ~ AN(g(θ), (g′(θ))² v(θ)/n).
- Choose g such that the variance does not depend on θ, that is, g′(θ) = c v(θ)^{−1/2}, or g(x) = c ∫₀ˣ v(u)^{−1/2} du.
- Large sample CI:
g(Tₙ) − z_{α/2} c n^{−1/2} < g(θ) < g(Tₙ) + z_{α/2} c n^{−1/2}, or
[g⁻¹(g(Tₙ) − z_{α/2} c n^{−1/2}), g⁻¹(g(Tₙ) + z_{α/2} c n^{−1/2})].
- This has some advantages, and usually much better accuracy (see the Poisson sketch after the examples below).
Examples
Poisson
Binomial
Chi-square
Correlation coefficient
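For the Poisson case, a minimal sketch (assuming numpy and scipy): v(θ) = θ gives g(x) = 2√x (so c = 1), 2√X̄ is AN(2√θ, 1/n), and the interval is mapped back through g⁻¹(y) = (y/2)².

    # Variance-stabilized (square-root) confidence interval for a Poisson mean.
    import numpy as np
    from scipy.stats import norm

    def poisson_vst_ci(x, alpha=0.05):
        n, xbar = len(x), np.mean(x)
        y = 2 * np.sqrt(xbar)                  # g(Xbar), with variance about 1/n
        z = norm.ppf(1 - alpha / 2)
        lo, hi = y - z / np.sqrt(n), y + z / np.sqrt(n)
        return (lo / 2) ** 2, (hi / 2) ** 2    # map back via g^{-1}

    rng = np.random.default_rng(11)
    print(poisson_vst_ci(rng.poisson(4.0, size=60)))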