
STAT 426

Lecture 20
Fall 2012
Arash A. Amini

November 20, 2012

1 / 37

Outline

- Sufficiency
  - What does it really mean?
  - Sufficient statistic is not unique
  - Minimal sufficient
  - Order statistics are sufficient for iid data
- Exponential families
  - More examples
  - Some properties
  - MLE in exponential families
- Back to sufficiency
  - MLE and sufficiency
  - Rao-Blackwell theorem

What does sufficiency mean?

Here is an example to give you some intuition:
- Consider iid Bernoulli trials first:
  X_1, ..., X_n ~ iid Ber(θ)
- Let us think of coin tossing:
  X_i = 1 if heads is observed in the ith throw,
  X_i = 0 if tails is observed in the ith throw.
- Let x = (x_1, ..., x_n). The joint pmf is
  p_θ(x) = ∏_{i=1}^n θ^{x_i} (1 − θ)^{1 − x_i} = θ^{Σ_i x_i} (1 − θ)^{n − Σ_i x_i} = g(Σ_i x_i, θ).
- By the factorization theorem, T(X) = Σ_{i=1}^n X_i is sufficient for θ.

What does sufficiency mean?

- Suppose you observe the sequence (n = 20)
  X = (0 1 1 1 1 0 0 1 0 1 1 1 0 0 1 1 1 0 0 0)
- Since T(X) = 11 is sufficient for θ, we can keep 11 and discard the sequence.
- Any estimator that we can form based on the information in X can be matched in performance by an estimator based on T(X).
  (This is formalized in the Rao-Blackwell theorem.)
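The reduction above is easy to check numerically; a minimal sketch (the sequence is from the slide, the variable names are mine):

```python
# Sufficiency reduction for iid Bernoulli: keep T = sum(x), discard the order.
x = [0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0]  # n = 20

T = sum(x)              # sufficient statistic
theta_hat = T / len(x)  # the MLE depends on the data only through T

print(T)          # 11
print(theta_hat)  # 0.55
```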

What does sufficiency mean?

Now consider the following modification:
- {X_i} are still independent, but they are not identically distributed.
- Say n = 2k is even.
- Assume that X_i ~ Ber(θ) for i = 1, 3, 5, ..., n − 1.
- For i = 2, 4, ..., n, in each trial we toss the coin twice and record 1 if we observe at least one heads:
  X_i = 1 if HH, HT, or TH is observed in the ith throw.
- P(X_i = 1) = 1 − (1 − θ)² = θ(2 − θ).
- We have X_i ~ Ber(θ(2 − θ)) for i = 2, 4, ..., n.
- The joint pmf is
  p_θ(x) = ∏_{i odd} θ^{x_i} (1 − θ)^{1 − x_i} · ∏_{i even} [θ(2 − θ)]^{x_i} [(1 − θ)²]^{1 − x_i}.
- Let t_1 := Σ_{i odd} x_i and t_2 := Σ_{i even} x_i. Then
  p_θ(x) = θ^{t_1 + t_2} (2 − θ)^{t_2} (1 − θ)^{(k − t_1) + 2(k − t_2)} = θ^{t_1 + t_2} (2 − θ)^{t_2} (1 − θ)^{3k − t_1 − 2t_2}.

What does sufficiency mean?

- We have
  p_θ(x) = θ^{t_1 + t_2} (2 − θ)^{t_2} (1 − θ)^{3k − t_1 − 2t_2} = g(t_1, t_2, θ).
- We conclude that (T_1, T_2) is sufficient, where T_1 = Σ_{i odd} X_i and T_2 = Σ_{i even} X_i.
  (Note: (T, T_1) is also sufficient, where T = T_1 + T_2.)
- For example, if we observe the same sequence
  X = (0 1 1 1 1 0 0 1 0 1 1 1 0 0 1 1 1 0 0 0)
  it is enough to keep (T_1, T_2) = (5, 6) and discard the sequence, if we only care about estimating θ.
- In an intuitive sense, with this model there is more information in the same sequence for estimating θ than just its number of ones.

Exercise. Find the MLE of θ in this problem.
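As a numerical check on the exercise, the log-likelihood from the slide can be maximized directly; a sketch assuming the observed values (t_1, t_2) = (5, 6) with k = 10 from the slide (the grid-search approach is mine, a root-finder would also work):

```python
import math

# Log-likelihood from the slide:
# l(theta) = (t1 + t2) log(theta) + t2 log(2 - theta) + (3k - t1 - 2*t2) log(1 - theta)
t1, t2, k = 5, 6, 10

def loglik(theta):
    return ((t1 + t2) * math.log(theta)
            + t2 * math.log(2 - theta)
            + (3 * k - t1 - 2 * t2) * math.log(1 - theta))

# Grid search over the open interval (0, 1).
grid = [i / 10000 for i in range(1, 10000)]
theta_hat = max(grid, key=loglik)
print(round(theta_hat, 2))  # compare with T/n = 11/20 = 0.55 under the iid model
```

The maximizer differs from the plain proportion of ones, reflecting that this model extracts more information from the same sequence.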


Sufficient statistic is not unique

- If T is sufficient and T = h(T̃), then T̃ is also sufficient.
  This follows easily from the factorization theorem. (Exercise)
- Example: In the Bernoulli problem, T = Σ_{i=1}^n X_i is sufficient.
- Since T = X_1 + Σ_{i=2}^n X_i, the pair (X_1, Σ_{i=2}^n X_i) is also sufficient. [Take h(u, v) = u + v.]
- Similarly, (X_1, X_2, Σ_{i=3}^n X_i) is also sufficient,
- ... and also (X_1, X_2, X_3, Σ_{i=4}^n X_i).

Minimal sufficiency

- If T is sufficient and T = h(T̃), then T̃ is also sufficient.
- Example: In the Bernoulli problem, T = Σ_{i=1}^n X_i is sufficient.
- Since T = X_1 + Σ_{i=2}^n X_i, the pair (X_1, Σ_{i=2}^n X_i) is also sufficient.
- Similarly, (X_1, X_2, Σ_{i=3}^n X_i) is also sufficient, ... and also (X_1, X_2, X_3, Σ_{i=4}^n X_i).
- These progressively carry more information.
  (They are in a sense more than sufficient.)
- There is a hierarchy (ordering) of sufficient statistics.
- Intuitively, Σ_{i=1}^n X_i carries the minimal amount of information sufficient for estimating θ. (Minimal sufficient)


Sufficiency of order statistic in iid data

- The entire data (X_1, ..., X_n) is always sufficient.
  (In a sense, this is the maximal sufficient statistic.)
- For iid data, we can discard the order information:
  the order statistic is sufficient for iid data.
- To obtain the order statistic of (X_1, ..., X_n), we sort them and denote them as
  X_(1) ≤ X_(2) ≤ ··· ≤ X_(n).
- Example: (X_1, X_2, X_3) = (2, 0.4, 1.5) has order statistic
  (X_(1), X_(2), X_(3)) = (0.4, 1.5, 2),
  as does (X_1, X_2, X_3) = (1.5, 2, 0.4).

Sufficiency of order statistic in iid data

- Example: (X_1, X_2, X_3) = (2, 0.4, 1.5) has order statistic
  (X_(1), X_(2), X_(3)) = (0.4, 1.5, 2),
  as does (X_1, X_2, X_3) = (1.5, 2, 0.4).
- Why is the order statistic enough? By the factorization theorem,
  f_θ(x) = ∏_{i=1}^n f_θ(x_i) = ∏_{i=1}^n f_θ(x_(i)).
- Concrete example:
  f_θ(2, 0.4, 1.5) = f_θ(2) f_θ(0.4) f_θ(1.5) = f_θ(0.4) f_θ(1.5) f_θ(2).
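The permutation invariance of the iid likelihood can be verified directly; a sketch assuming a standard normal density purely for illustration (the helper names are mine):

```python
import math

# The iid likelihood depends on the sample only through its sorted values:
# reordering the data leaves the product of densities unchanged.
def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def likelihood(xs, mu):
    p = 1.0
    for x in xs:
        p *= normal_pdf(x, mu)
    return p

data = [2.0, 0.4, 1.5]        # the sample from the slide
permuted = [1.5, 2.0, 0.4]    # same order statistic
print(math.isclose(likelihood(data, 1.0), likelihood(permuted, 1.0)))
```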


More on exponential families

Recall: a 1-parameter exponential family in canonical form is
  f_η(x) = exp[η T(x) − A(η)] h(x).
- T(x) is sufficient by factorization.
- η is the canonical parameter.
- e^{A(η)} = ∫ exp[η T(x)] h(x) dx, since we need ∫ f_η(x) dx = 1.
- E = {η : A(η) < ∞} is the natural parameter space.
- We can cook up different distributions by choosing T(x) and h(x).

Cooking up exponential families (1-param)

- Say we want a distribution on [−1, 1].
- We can set up a model by taking
  T(x) = x,  h(x) = 1{−1 ≤ x ≤ 1}.
- We can compute A(η) explicitly:
  e^{A(η)} = ∫_{−1}^{1} e^{ηx} dx = (1/η)(e^η − e^{−η}) = 2 sinh(η)/η.
- The density is
  f_η(x) = exp[ηx − log(2 sinh(η)/η)] 1{−1 ≤ x ≤ 1} ∝ exp(ηx) 1{−1 ≤ x ≤ 1}.
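The closed form for the normalizer agrees with direct numerical integration; a quick sketch (the choice η = 1.2 and the midpoint rule are mine):

```python
import math

# Check the normalizer: e^{A(eta)} = integral of e^{eta*x} over [-1, 1]
#                                  = 2*sinh(eta)/eta.
eta = 1.2  # arbitrary parameter value

closed_form = 2 * math.sinh(eta) / eta

# Midpoint-rule integration over [-1, 1].
m = 100000
h = 2.0 / m
numeric = sum(math.exp(eta * (-1 + (i + 0.5) * h)) * h for i in range(m))

print(abs(closed_form - numeric) < 1e-6)
```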

Cooking up exponential families (1-param)

[Plot of f_η(x) ∝ exp(ηx) 1{−1 ≤ x ≤ 1} for several values of η.]

Cooking up exponential families (2-param)

- We can add more parameters = more modelling ability.
- Say we take
  T(x) = (x, x²),  h(x) = 1{−1 ≤ x ≤ 1}.
- Let η = (η_1, η_2). The density is given by (up to a constant)
  f_η(x) ∝ exp(η_1 x + η_2 x²) 1{−1 ≤ x ≤ 1}.
- It is more cumbersome to compute the normalizing constant e^{A(η)} in this case.

Cooking up exponential families (2-param)

[Plots of f_η(x) ∝ exp(η_1 x + η_2 x²) 1{−1 ≤ x ≤ 1} for several values of η = (η_1, η_2).]

Cooking up exponential families (2-D, 5-param)

- Consider now two random variables (U, V) ∈ [0, 1]².
- Let x = (u, v) and
  T(x) = (u², v², u, v, uv),  h(x) = 1{0 ≤ u, v ≤ 1}.
- The density is
  f_η(x) ∝ exp(η_1 u² + η_2 v² + η_3 u + η_4 v + η_5 uv) 1{0 ≤ u, v ≤ 1}.
- Computing A(η) explicitly is going to cause a headache.
  (For fixed η, it can be done by numerical integration.)
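The numerical-integration remark can be sketched as follows, using one of the parameter settings from the plots on the following slides (the grid size is mine):

```python
import math

# For fixed eta, the normalizer e^{A(eta)} of the 5-parameter family on [0,1]^2
# can be approximated by a 2-D midpoint rule.
eta = (2.0, 2.0, -0.5, -0.5, -1.0)  # setting (a) from the plots

def unnormalized(u, v):
    e1, e2, e3, e4, e5 = eta
    return math.exp(e1*u*u + e2*v*v + e3*u + e4*v + e5*u*v)

m = 400          # grid points per axis
h = 1.0 / m
Z = sum(unnormalized((i + 0.5) * h, (j + 0.5) * h) * h * h
        for i in range(m) for j in range(m))
print(Z > 0 and math.isfinite(Z))
```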

Cooking up exponential families (2-D, 5-param)

[Surface plots of f_η with h(x) = 1{0 ≤ u, v ≤ 1}: (a) η = (2, 2, −0.5, −0.5, −1); (b) η = (−0.5, −0.5, 0.5, 0.5, −4).]

Cooking up exponential families (2-D, 5-param)

[Surface plots of f_η with h(x) = 1{0 ≤ u, v ≤ 1}: (c) η = (2, −2, 2, 2, 1); (d) η = (−2, −2, 2, 2, 1).]

Cooking up exponential families (2-D, 5-param)

[Surface plots of f_η with h(x) = 1{0 ≤ u, v ≤ 1}: (e) η = (2, −2, 2, −2, 1); (f) η = (0.1, −3, −4, −3, 10).]

Cooking up exponential families (2-D, 5-param)

[Surface plots of f_η with h(x) = 1{−1 ≤ u, v ≤ 1}: (g) η = (1, 0, 0, 0, −0.5); (h) η = (2, −1, 0.1, 0.1, −0.1).]

Cooking up exponential families (2-D, 5-param)

[Surface plots of f_η with h(x) = 1{−1 ≤ u, v ≤ 1}: (i) η = (0, 0, 0, 0, −1.5); (j) η = (0, −1, 0, 0, −0.5).]

Cooking up exponential families (2-D, 5-param)

[Surface plots of f_η with h(x) = 1{−1 ≤ u, v ≤ 1}: (k) η = (0.5, −0.5, 0.5, −0.5, 1); (l) η = (0, −0.5, 0, 0, 1).]

More examples of well-known exponential families

- The Poisson distribution has pmf p_λ(x) = e^{−λ} λ^x / x!, for x = 0, 1, 2, ...
- It can be rewritten as
  p_λ(x) = exp(x log λ − λ) · (1/x!),  with h(x) = 1/x!.
- The canonical parameter is η = log λ, with A(η) = e^η and T(x) = x. After reparametrization,
  p_η(x) = (1/x!) exp(ηx − e^η).
- The natural parameter space is E = (−∞, ∞).


Properties of exponential families (mean)

- Consider the 1-parameter case. Then we have
  E_η[T(X)] = A′(η).
- Easy to prove by differentiating e^{A(η)} = ∫ e^{η T(x)} h(x) dx.
- Example: for X ~ Poi(λ), we get E[X] = (e^η)′ = e^η = λ.
- The mgf is easy to obtain too:
  M_T(t) := E_η[e^{t T(X)}] = e^{A(t + η) − A(η)}.
- The cumulant generating function (cgf) is
  K_T(t) := log M_T(t) = A(t + η) − A(η).

Properties of exponential families (cgf)

- The cumulant generating function (cgf) is
  K_T(t) = log M_T(t) = A(t + η) − A(η).
- Cumulants: κ_m(T) = (d^m/dt^m) K_T(t) |_{t=0}.
- One can show that
  κ_1(T) = E[T] = A′(η),
  κ_2(T) = var(T) = A″(η).
- For X ~ Poi(λ), this gives an easy proof of
  var(X) = (e^η)″ = e^η = λ.
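These identities are easy to check numerically in the Poisson case, since A(η) = e^η; a sketch using central finite differences for A′ and A″ (the step size and tolerances are mine):

```python
import math

# For Poisson in canonical form, A(eta) = e^eta, so A'(eta) = A''(eta) = e^eta = lam.
# Compare finite differences of A against the exact mean and variance.
def A(eta):
    return math.exp(eta)

eta = math.log(3.0)
lam = math.exp(eta)  # = 3.0
h = 1e-4

A1 = (A(eta + h) - A(eta - h)) / (2 * h)             # ~ A'(eta)  = E[T]
A2 = (A(eta + h) - 2 * A(eta) + A(eta - h)) / h**2   # ~ A''(eta) = var(T)

print(abs(A1 - lam) < 1e-6, abs(A2 - lam) < 1e-5)  # True True
```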


MLE in exponential families

- Consider the 1-param canonical case f_η(x) = h(x) exp[η T(x) − A(η)].
- The joint density based on n samples is
  f_η(x) = exp{η Σ_{i=1}^n T(x_i) − n A(η)} ∏_{i=1}^n h(x_i).
- Let ℓ_x(η) = log f_η(x). Differentiating,
  ∂ℓ_x(η)/∂η = 0  ⟹  Σ_i T(x_i) − n A′(η) = 0.
- The MLE is obtained by solving A′(η) = (1/n) Σ_i T(X_i),
  equivalently E_η[T(X)] = (1/n) Σ_i T(X_i).
- The MLE coincides with the MOM.
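For the Poisson family this equation has a closed form, since A′(η) = e^η; a sketch (the data are made up for illustration):

```python
import math

# For Poisson, A(eta) = e^eta, so the MLE solves e^eta = (1/n) * sum(x_i),
# i.e. eta_hat = log(x_bar), equivalently lam_hat = x_bar (MLE = MOM).
data = [2, 0, 3, 1, 4, 2, 2, 5, 1, 0]
x_bar = sum(data) / len(data)

eta_hat = math.log(x_bar)
lam_hat = math.exp(eta_hat)

print(math.isclose(lam_hat, x_bar))  # True
```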

What is not an example of an exponential family?

- The shifted exponential f_{θ,λ}(x) = λ e^{−λ(x−θ)} 1{x ≥ θ}.
- Also the double exponential (Laplace) with shift,
  f_{θ,λ}(x) = (λ/2) exp(−λ|x − θ|).
  (It is one if you take θ = 0.)
- Uniform[0, θ].


Sufficiency and MLE

- For any sufficient statistic T = T(X), the MLE is a function of T(X).
- Immediate from the factorization theorem:
  argmax_θ f_θ(X) = argmax_θ [g(T, θ) h(X)] = argmax_θ g(T, θ).
- Knowing a sufficient statistic is enough for computing the MLE.
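A small illustration in the Bernoulli problem: two samples with the same value of T necessarily give the same MLE (the samples are made up):

```python
# Two Bernoulli samples with the same sufficient statistic T = sum(x)
# yield the same MLE theta_hat = T/n, whatever the ordering of the data.
x1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
x2 = [0, 0, 0, 1, 1, 1, 0, 1, 0, 0]

def mle(x):
    return sum(x) / len(x)

print(sum(x1) == sum(x2))   # same T
print(mle(x1) == mle(x2))   # hence same MLE
```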

Rao-Blackwell Theorem

Formalizing: a sufficient statistic T captures all the information for estimating θ.

Theorem (Rao-Blackwell)
- Let θ̂ be some estimator with E[θ̂²] < ∞ (finite 2nd moment).
- Let T be sufficient for θ.
- Let θ̃ = E[θ̂ | T].
- Then
  E(θ̃ − θ)² ≤ E(θ̂ − θ)².
- The inequality is strict unless θ̃ = θ̂ (a.s. P_θ).

Proof of Rao-Blackwell

- Let θ̃ = E[θ̂ | T]. We want to show that MSE(θ̃) ≤ MSE(θ̂).
- Here is the proof:
  - E[θ̃] = E[θ̂] (by the tower property).
  - Since MSE = var + (bias)², we need to compare variances only.
  - The law of total variance implies
    var(θ̂) = var(E(θ̂ | T)) + E[var(θ̂ | T)] ≥ var(E(θ̂ | T)) = var(θ̃).
- We are done.
- Where did we use that T is sufficient, then?

Rao-Blackwell example

- Bernoulli problem as usual: X_1, ..., X_n ~ iid Ber(θ). (n ≥ 5)
- For some reason, someone proposes θ̂ = X_1 + 2X_5.
- T = Σ_{i=1}^n X_i is sufficient.
- Rao-Blackwell says that θ̃ = E[θ̂ | T] does better.
- By symmetry, E[X_j | T] does not depend on j; hence E[X_j | T] = h(T).
- Now T = E[T | T] = Σ_{j=1}^n E[X_j | T] = n h(T).
- We have shown that h(T) = T/n = X̄_n.
- We have
  θ̃ = E[X_1 | T] + 2 E[X_5 | T] = 3 h(T) = (3/n) T.
- Does this really have a lower MSE? Exercise.
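A Monte Carlo check of the exercise; a sketch comparing the two estimators' MSE against θ (the values of n, θ, the seed, and the number of replications are mine):

```python
import random

# Monte Carlo comparison: the Rao-Blackwellized estimator 3T/n should have
# lower MSE about theta than the proposed estimator X1 + 2*X5.
random.seed(0)
n, theta, reps = 20, 0.3, 20000

mse_hat = mse_tilde = 0.0
for _ in range(reps):
    x = [1 if random.random() < theta else 0 for _ in range(n)]
    t = sum(x)
    hat = x[0] + 2 * x[4]   # proposed estimator X1 + 2*X5
    tilde = 3 * t / n       # its conditional expectation given T
    mse_hat += (hat - theta) ** 2 / reps
    mse_tilde += (tilde - theta) ** 2 / reps

print(mse_tilde < mse_hat)
```

Note that both estimators share the same bias (each has expectation 3θ), so the improvement comes entirely from the variance reduction in the proof.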
