
Supplementary material for Lesson 10

1 Conjugate Posterior for the Normal Mean


Here we derive the update formula from Lesson 10.1, where the likelihood is normal and we use a conjugate normal prior on the mean. Specifically, the model is $x_1, x_2, \ldots, x_n \overset{iid}{\sim} N(\mu, \sigma_0^2)$, $\mu \sim N(m_0, s_0^2)$ with $\sigma_0^2$, $m_0$, and $s_0^2$ known. First consider the case in which we have only one data point $x$. The posterior is then

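$$
\begin{aligned}
f(\mu \mid x) &= \frac{f(x \mid \mu)\, f(\mu)}{\int f(x \mid \mu)\, f(\mu)\, d\mu} \\
&\propto f(x \mid \mu)\, f(\mu) \\
&= \frac{1}{\sqrt{2\pi\sigma_0^2}} \exp\left[-\frac{1}{2\sigma_0^2}(x-\mu)^2\right] \frac{1}{\sqrt{2\pi s_0^2}} \exp\left[-\frac{1}{2s_0^2}(\mu-m_0)^2\right] \\
&\propto \exp\left[-\frac{1}{2\sigma_0^2}(x-\mu)^2\right] \exp\left[-\frac{1}{2s_0^2}(\mu-m_0)^2\right] \\
&= \exp\left[-\frac{1}{2\sigma_0^2}(x-\mu)^2 - \frac{1}{2s_0^2}(\mu-m_0)^2\right] \\
&= \exp\left[-\frac{1}{2}\left(\frac{1}{\sigma_0^2}(x^2 - 2x\mu + \mu^2) + \frac{1}{s_0^2}(\mu^2 - 2m_0\mu + m_0^2)\right)\right] \\
&= \exp\left[-\frac{1}{2}\left(\frac{x^2}{\sigma_0^2} + \frac{-2\mu x}{\sigma_0^2} + \frac{\mu^2}{\sigma_0^2} + \frac{\mu^2}{s_0^2} + \frac{-2\mu m_0}{s_0^2} + \frac{m_0^2}{s_0^2}\right)\right] \\
&= \exp\left[-\frac{1}{2}\left(\left(\frac{1}{s_0^2} + \frac{1}{\sigma_0^2}\right)\mu^2 - 2\mu\left(\frac{x}{\sigma_0^2} + \frac{m_0}{s_0^2}\right) + \frac{x^2}{\sigma_0^2} + \frac{m_0^2}{s_0^2}\right)\right] \\
&= \exp\left[-\frac{1}{2}\left(\left(\frac{1}{s_0^2} + \frac{1}{\sigma_0^2}\right)\mu^2 - 2\mu\left(\frac{x}{\sigma_0^2} + \frac{m_0}{s_0^2}\right)\right)\right] \exp\left[-\frac{1}{2}\left(\frac{x^2}{\sigma_0^2} + \frac{m_0^2}{s_0^2}\right)\right] \\
&\propto \exp\left[-\frac{1}{2}\left(\left(\frac{1}{s_0^2} + \frac{1}{\sigma_0^2}\right)\mu^2 - 2\mu\left(\frac{x}{\sigma_0^2} + \frac{m_0}{s_0^2}\right)\right)\right] \\
&= \exp\left[-\frac{1}{2}\left(\frac{1}{s_1^2}\mu^2 - 2\mu\left(\frac{x}{\sigma_0^2} + \frac{m_0}{s_0^2}\right)\right)\right] \quad \text{where } s_1^2 = \left(\frac{1}{s_0^2} + \frac{1}{\sigma_0^2}\right)^{-1} \\
&= \exp\left[-\frac{1}{2}\left(\frac{1}{s_1^2}\mu^2 - 2\mu\,\frac{s_1^2}{s_1^2}\left(\frac{x}{\sigma_0^2} + \frac{m_0}{s_0^2}\right)\right)\right] \\
&= \exp\left[-\frac{1}{2s_1^2}\left(\mu^2 - 2\mu m_1\right)\right] \quad \text{where } m_1 = s_1^2\left(\frac{m_0}{s_0^2} + \frac{x}{\sigma_0^2}\right).
\end{aligned}
$$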

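The next step is to complete the square in the exponent:

$$
\begin{aligned}
f(\mu \mid x) &\propto \exp\left[-\frac{1}{2s_1^2}\left(\mu^2 - 2\mu m_1\right)\right] \\
&= \exp\left[-\frac{1}{2s_1^2}\left(\mu^2 - 2\mu m_1 + m_1^2 - m_1^2\right)\right] \\
&= \exp\left[-\frac{1}{2s_1^2}\left(\mu^2 - 2\mu m_1 + m_1^2\right)\right] \exp\left[\frac{m_1^2}{2s_1^2}\right] \\
&\propto \exp\left[-\frac{1}{2s_1^2}\left(\mu^2 - 2\mu m_1 + m_1^2\right)\right] \\
&= \exp\left[-\frac{1}{2s_1^2}(\mu - m_1)^2\right]
\end{aligned}
$$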

which, except for a normalizing constant not involving $\mu$, is the PDF of a normal distribution with mean $m_1$ and variance $s_1^2$.
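
The single-observation result can be checked numerically. The following is a minimal sketch, not part of the lesson: it computes $s_1^2 = (1/s_0^2 + 1/\sigma_0^2)^{-1}$ and $m_1 = s_1^2 (m_0/s_0^2 + x/\sigma_0^2)$, then compares them with the mean and variance obtained by normalizing likelihood times prior on a grid of $\mu$ values. The function names, the grid limits, and the input values are illustrative assumptions.

```python
import math


def normal_posterior_single(x, sigma0_sq, m0, s0_sq):
    """Closed-form posterior (m1, s1_sq) for mu given one observation x,
    with x | mu ~ N(mu, sigma0_sq) and prior mu ~ N(m0, s0_sq)."""
    s1_sq = 1.0 / (1.0 / s0_sq + 1.0 / sigma0_sq)
    m1 = s1_sq * (m0 / s0_sq + x / sigma0_sq)
    return m1, s1_sq


def grid_posterior_moments(x, sigma0_sq, m0, s0_sq, lo=-20.0, hi=20.0, steps=40001):
    """Brute-force check: evaluate likelihood * prior (up to a constant)
    on an evenly spaced grid of mu values and normalize the weights."""
    h = (hi - lo) / (steps - 1)
    mus = [lo + i * h for i in range(steps)]
    weights = [
        math.exp(-(x - mu) ** 2 / (2 * sigma0_sq) - (mu - m0) ** 2 / (2 * s0_sq))
        for mu in mus
    ]
    total = sum(weights)
    mean = sum(wt * mu for wt, mu in zip(weights, mus)) / total
    var = sum(wt * (mu - mean) ** 2 for wt, mu in zip(weights, mus)) / total
    return mean, var


# Illustrative inputs (assumed, not from the lesson).
m1, s1_sq = normal_posterior_single(x=2.0, sigma0_sq=1.0, m0=0.0, s0_sq=4.0)
grid_mean, grid_var = grid_posterior_moments(x=2.0, sigma0_sq=1.0, m0=0.0, s0_sq=4.0)
print(m1, s1_sq)
print(grid_mean, grid_var)
```

With these inputs the closed form gives $s_1^2 = 1/(0.25 + 1) = 0.8$ and $m_1 = 0.8 \times 2 = 1.6$, and the grid approximation should agree to several decimal places.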
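The final step is to extend this result to accommodate $n$ independent data points. The likelihood in this case is

$$
\begin{aligned}
f(x \mid \mu) &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma_0^2}} \exp\left[-\frac{1}{2\sigma_0^2}(x_i - \mu)^2\right] \\
&= (2\pi\sigma_0^2)^{-n/2} \exp\left\{-\frac{1}{2\sigma_0^2} \sum_{i=1}^{n} (x_i - \mu)^2\right\} \\
&\propto \exp\left\{-\frac{1}{2\sigma_0^2}\left[\sum_{i=1}^{n} x_i^2 - 2\mu \sum_{i=1}^{n} x_i + n\mu^2\right]\right\} \\
&\propto \exp\left[-\frac{1}{2\sigma_0^2}\left(-2\mu n\bar{x} + n\mu^2\right)\right].
\end{aligned}
$$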
We can repeat the steps above or notice that the data contribute only through the sample mean $\bar{x}$ (and $n$, which we assume is known). This means that $\bar{x}$ is a sufficient statistic for $\mu$, allowing us to use the distribution of $\bar{x}$ as the likelihood (analogous to using a binomial likelihood in place of a sequence of Bernoullis). The model then becomes

$$
\bar{x} \mid \mu \sim N\left(\mu, \frac{\sigma_0^2}{n}\right), \qquad \mu \sim N(m_0, s_0^2).
$$

We now apply our result derived above, replacing $x$ with $\bar{x}$ and $\sigma_0^2$ with $\sigma_0^2/n$. This yields the update equation presented in Lesson 10.1.
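
As a sketch of how this substitution works in practice, the code below applies the single-observation formulas with $\bar{x}$ in place of $x$ and $\sigma_0^2/n$ in place of $\sigma_0^2$. As a sanity check that is not part of the lesson, it also applies the single-observation update one data point at a time, using each posterior as the next prior; the two routes should agree. The function names and the data values are made up for illustration.

```python
def normal_posterior(xs, sigma0_sq, m0, s0_sq):
    """Posterior (m1, s1_sq) for mu from n iid observations, obtained by
    treating the sample mean as one observation with variance sigma0_sq / n."""
    n = len(xs)
    xbar = sum(xs) / n
    like_var = sigma0_sq / n  # variance of the sample mean
    s1_sq = 1.0 / (1.0 / s0_sq + 1.0 / like_var)
    m1 = s1_sq * (m0 / s0_sq + xbar / like_var)
    return m1, s1_sq


def sequential_posterior(xs, sigma0_sq, m0, s0_sq):
    """Apply the one-observation update repeatedly, feeding each
    posterior back in as the prior for the next data point."""
    m, s_sq = m0, s0_sq
    for x in xs:
        new_s_sq = 1.0 / (1.0 / s_sq + 1.0 / sigma0_sq)
        m = new_s_sq * (m / s_sq + x / sigma0_sq)
        s_sq = new_s_sq
    return m, s_sq


# Made-up data and hyperparameters, for illustration only.
data = [1.1, 0.4, 2.3, 1.7, 0.9]
batch = normal_posterior(data, sigma0_sq=1.0, m0=0.0, s0_sq=4.0)
seq = sequential_posterior(data, sigma0_sq=1.0, m0=0.0, s0_sq=4.0)
print(batch)
print(seq)
```

Here $n = 5$ and $\bar{x} = 1.28$, so $s_1^2 = 1/(0.25 + 5) = 1/5.25$ and $m_1 = 6.4/5.25$, up to floating-point rounding.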

2 Marginal Distribution of Normal Mean in Conjugate Model
Consider again the model $x \mid \mu \sim N(\mu, \sigma_0^2)$, $\mu \sim N(m_0, s_0^2)$ with $\sigma_0^2$ known. Here we derive the marginal distribution for data given by $\int f(x \mid \mu) f(\mu)\, d\mu$. This is the prior predictive distribution for a new data point $x^*$.

To do so, re-write the model in an equivalent, but more convenient form: $x = \mu + \epsilon$ where $\epsilon \sim N(0, \sigma_0^2)$ and $\mu = m_0 + \eta$ where $\eta \sim N(0, s_0^2)$, with $\epsilon$ and $\eta$ independent. Now substitute $\mu$ into the first equation to get $x = m_0 + \eta + \epsilon$. Recall that adding two independent normal random variables results in another normal random variable, so $x$ is normal with

$$
E(x) = E(m_0 + \eta + \epsilon) = m_0 + E(\eta) + E(\epsilon) = m_0 + 0 + 0
$$

and

$$
Var(x) = Var(m_0 + \eta + \epsilon) = Var(m_0) + Var(\eta) + Var(\epsilon) = 0 + s_0^2 + \sigma_0^2
$$

(note that we can add variances because of the independence of $\eta$ and $\epsilon$). Therefore the marginal distribution for $x$ is normal with mean $m_0$ and variance $s_0^2 + \sigma_0^2$. The posterior predictive distribution is the same, but with $m_0$ and $s_0^2$ replaced by the posterior updates given in Lesson 10.1.
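
The two-stage form of the model suggests a quick simulation check. The sketch below, with an assumed function name and arbitrary hyperparameter values, draws $\mu$ from the prior, then draws $x$ given $\mu$, and compares the sample mean and variance of the simulated $x$ values with $m_0$ and $s_0^2 + \sigma_0^2$.

```python
import random


def simulate_prior_predictive(m0, s0_sq, sigma0_sq, n_draws, seed=0):
    """Draw x from the marginal (prior predictive) distribution in two stages:
    first mu ~ N(m0, s0_sq), then x | mu ~ N(mu, sigma0_sq)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        mu = rng.gauss(m0, s0_sq ** 0.5)          # gauss takes a standard deviation
        draws.append(rng.gauss(mu, sigma0_sq ** 0.5))
    return draws


# Arbitrary illustrative values (assumed, not from the lesson).
m0, s0_sq, sigma0_sq = 1.0, 4.0, 1.0
draws = simulate_prior_predictive(m0, s0_sq, sigma0_sq, n_draws=200_000)

sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)

print(sample_mean, "vs theoretical mean", m0)
print(sample_var, "vs theoretical variance", s0_sq + sigma0_sq)
```

The simulated moments will differ from the theoretical ones by Monte Carlo error, which shrinks as the number of draws grows.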

3 Inverse-Gamma Distribution
The inverse-gamma distribution is the conjugate prior for $\sigma^2$ in the normal likelihood with known mean. It is also the marginal prior/posterior for $\sigma^2$ in the model of Lesson 10.2.

As the name implies, the inverse-gamma distribution is related to the gamma distribution. If $X \sim \text{Gamma}(\alpha, \beta)$, then the random variable $Y = 1/X \sim \text{Inverse-Gamma}(\alpha, \beta)$ where

$$
f(y) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, y^{-(\alpha+1)} \exp\left(-\frac{\beta}{y}\right) I_{\{y > 0\}}
$$

$$
E(Y) = \frac{\beta}{\alpha - 1} \quad \text{for } \alpha > 1.
$$

The relationship between gamma and inverse-gamma suggests a simple method for simulating draws from the inverse-gamma distribution. First draw $X$ from the $\text{Gamma}(\alpha, \beta)$ distribution and take $Y = 1/X$, which corresponds to a draw from the $\text{Inverse-Gamma}(\alpha, \beta)$.
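
A minimal sketch of this simulation method using Python's standard library follows. One detail needs care: the density above treats $\beta$ as a rate parameter, whereas `random.gammavariate` takes a shape and a scale, so the scale passed in is $1/\beta$. The helper name and the parameter values are illustrative assumptions.

```python
import random


def draw_inverse_gamma(alpha, beta, rng):
    """One draw from Inverse-Gamma(alpha, beta), with beta as in the density above.

    random.gammavariate(shape, scale) is parameterized by scale, so a
    Gamma(alpha, rate=beta) draw uses scale = 1 / beta.
    """
    x = rng.gammavariate(alpha, 1.0 / beta)
    return 1.0 / x


# Illustrative parameter values (assumed, not from the lesson).
alpha, beta = 5.0, 4.0
rng = random.Random(42)
draws = [draw_inverse_gamma(alpha, beta, rng) for _ in range(200_000)]

sample_mean = sum(draws) / len(draws)
theoretical_mean = beta / (alpha - 1.0)  # E(Y) = beta / (alpha - 1) for alpha > 1

print(sample_mean, "vs", theoretical_mean)
```

The sample mean should be close to $\beta/(\alpha - 1) = 1$ for these values, up to Monte Carlo error.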

4 Marginal Posterior Distribution for the Normal Mean when the Variance is Unknown
If we are not interested in inference for an unknown $\sigma^2$, we can integrate it out of the joint posterior in Lesson 10.2. This results in a t-distributed marginal posterior as noted at the end of the lesson. This t distribution has $\nu = 2\alpha + n$ degrees of freedom and two additional parameters, a scale $\gamma$ and a location $m^*$ given by

$$
m^* = \frac{n\bar{x} + wm}{n + w} \quad \text{(the mean of the conditional posterior for } \mu\text{)}
$$

$$
\gamma = \sqrt{\frac{\beta + \frac{n-1}{2}s^2 + \frac{wn}{2(w+n)}(\bar{x} - m)^2}{(n + w)(\alpha + n/2)}} \quad \text{(modified scale of the updated inverse-gamma for } \sigma^2\text{)}
$$

where $s^2 = \sum (x_i - \bar{x})^2/(n - 1)$, the sample variance.

This t distribution can be used to create a credible interval for $\mu$ by multiplying the appropriate quantiles of the standard t distribution by the scale $\gamma$ and adding the location $m^*$.

Example: Suppose we have normal data with unknown mean $\mu$ and variance $\sigma^2$. We use the model from Lesson 10.2 with $m = 0$, $w = 0.1$, $\alpha = 3/2$, and $\beta = 1$. The data are $n = 20$ independent observations with $\bar{x} = 1.2$ and $s^2 = 0.7$. Then we have

$$
\begin{aligned}
\sigma^2 \mid x &\sim \text{Inverse-Gamma}(11.5, 7.72) \\
\mu \mid \sigma^2, x &\sim N\left(1.19, \frac{\sigma^2}{20.1}\right) \\
m^* &= 1.19 \\
\gamma &= 0.183
\end{aligned}
$$

and $\mu \mid x$ is distributed t with 23 degrees of freedom, location 1.19 and scale 0.183. To produce a 95% equal-tailed credible interval for $\mu$, we first need the 0.025 and 0.975 quantiles of the standard t distribution with 23 degrees of freedom. These are $-2.07$ and $2.07$. The 95% credible interval is then $m^* \pm \gamma(2.07) = 1.19 \pm 0.183(2.07) = 1.19 \pm 0.38$.
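
The arithmetic in this example can be reproduced with a short script. The sketch below recomputes $\nu$, $m^*$, $\gamma$, and the interval from the stated inputs, taking the t quantile 2.07 from the text rather than computing it. As an additional check that is not part of the lesson, it simulates $\sigma^2$ from the updated inverse-gamma and then $\mu$ given $\sigma^2$, and reads off empirical 2.5% and 97.5% quantiles of the $\mu$ draws. Variable names, the seed, and the number of draws are illustrative choices.

```python
import math
import random

# Inputs from the example in the text.
n, xbar, s_sq = 20, 1.2, 0.7
m, w, alpha, beta = 0.0, 0.1, 1.5, 1.0

# Updated parameters, following the formulas given above.
nu = 2 * alpha + n                      # degrees of freedom
m_star = (n * xbar + w * m) / (n + w)   # location
alpha_post = alpha + n / 2
beta_post = beta + (n - 1) / 2 * s_sq + w * n / (2 * (w + n)) * (xbar - m) ** 2
gamma = math.sqrt(beta_post / ((n + w) * alpha_post))  # scale

# 0.975 quantile of the standard t with 23 degrees of freedom, as quoted in the text.
t_quantile = 2.07
lower = m_star - gamma * t_quantile
upper = m_star + gamma * t_quantile

# Monte Carlo check: sigma^2 ~ Inverse-Gamma(alpha_post, beta_post),
# then mu | sigma^2 ~ N(m_star, sigma^2 / (n + w)).
rng = random.Random(1)
mu_draws = []
for _ in range(200_000):
    # gammavariate takes (shape, scale); the rate beta_post becomes scale 1 / beta_post.
    sigma_sq = 1.0 / rng.gammavariate(alpha_post, 1.0 / beta_post)
    mu_draws.append(rng.gauss(m_star, math.sqrt(sigma_sq / (n + w))))
mu_draws.sort()
mc_lower = mu_draws[int(0.025 * len(mu_draws))]
mc_upper = mu_draws[int(0.975 * len(mu_draws))]

print(nu, round(m_star, 3), round(beta_post, 2), round(gamma, 3))
print("formula interval:", round(lower, 2), round(upper, 2))
print("simulated interval:", round(mc_lower, 2), round(mc_upper, 2))
```

The simulated interval should match the formula-based one up to Monte Carlo error, which illustrates that integrating $\sigma^2$ out of the joint posterior yields the t marginal described above.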