Conjugate Posterior for the Normal Mean

Supplementary material for Lesson 10
1 Conjugate Posterior for the Normal Mean

Here we derive the update formula from Lesson 10.1 where the likelihood is normal and we
use a conjugate normal prior on the mean. Specifically, the model is
$x_1, x_2, \ldots, x_n \overset{iid}{\sim} N(\mu, \sigma_0^2)$, $\mu \sim N(m_0, s_0^2)$,
with $\sigma_0^2$, $m_0$, and $s_0^2$ known. First consider the case in which we have only
one data point $x$. The posterior is then
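$$
\begin{aligned}
f(\mu \mid x) &= \frac{f(x \mid \mu)\, f(\mu)}{\int f(x \mid \mu)\, f(\mu)\, d\mu} \\
&\propto f(x \mid \mu)\, f(\mu) \\
&= \frac{1}{\sqrt{2\pi\sigma_0^2}} \exp\left[-\frac{1}{2\sigma_0^2}(x-\mu)^2\right] \frac{1}{\sqrt{2\pi s_0^2}} \exp\left[-\frac{1}{2s_0^2}(\mu-m_0)^2\right] \\
&\propto \exp\left[-\frac{1}{2\sigma_0^2}(x-\mu)^2\right] \exp\left[-\frac{1}{2s_0^2}(\mu-m_0)^2\right] \\
&= \exp\left[-\frac{1}{2\sigma_0^2}(x-\mu)^2 - \frac{1}{2s_0^2}(\mu-m_0)^2\right] \\
&= \exp\left\{-\frac{1}{2}\left[\frac{1}{\sigma_0^2}\left(x^2 - 2x\mu + \mu^2\right) + \frac{1}{s_0^2}\left(\mu^2 - 2m_0\mu + m_0^2\right)\right]\right\} \\
&= \exp\left\{-\frac{1}{2}\left[\frac{x^2}{\sigma_0^2} - \frac{2x\mu}{\sigma_0^2} + \frac{\mu^2}{\sigma_0^2} + \frac{\mu^2}{s_0^2} - \frac{2m_0\mu}{s_0^2} + \frac{m_0^2}{s_0^2}\right]\right\} \\
&= \exp\left\{-\frac{1}{2}\left[\left(\frac{1}{s_0^2} + \frac{1}{\sigma_0^2}\right)\mu^2 - 2\mu\left(\frac{x}{\sigma_0^2} + \frac{m_0}{s_0^2}\right) + \frac{x^2}{\sigma_0^2} + \frac{m_0^2}{s_0^2}\right]\right\} \\
&= \exp\left\{-\frac{1}{2}\left[\left(\frac{1}{s_0^2} + \frac{1}{\sigma_0^2}\right)\mu^2 - 2\mu\left(\frac{x}{\sigma_0^2} + \frac{m_0}{s_0^2}\right)\right]\right\} \exp\left\{-\frac{1}{2}\left(\frac{x^2}{\sigma_0^2} + \frac{m_0^2}{s_0^2}\right)\right\} \\
&\propto \exp\left\{-\frac{1}{2}\left[\left(\frac{1}{s_0^2} + \frac{1}{\sigma_0^2}\right)\mu^2 - 2\mu\left(\frac{x}{\sigma_0^2} + \frac{m_0}{s_0^2}\right)\right]\right\} \\
&= \exp\left\{-\frac{1}{2}\left[\frac{1}{s_1^2}\mu^2 - 2\mu\left(\frac{x}{\sigma_0^2} + \frac{m_0}{s_0^2}\right)\right]\right\} \quad \text{where } s_1^2 = \frac{1}{\frac{1}{s_0^2} + \frac{1}{\sigma_0^2}} \\
&= \exp\left\{-\frac{1}{2}\left[\frac{1}{s_1^2}\mu^2 - 2\mu\,\frac{s_1^2}{s_1^2}\left(\frac{x}{\sigma_0^2} + \frac{m_0}{s_0^2}\right)\right]\right\} \\
&= \exp\left\{-\frac{1}{2s_1^2}\left[\mu^2 - 2\mu m_1\right]\right\} \quad \text{where } m_1 = s_1^2\left(\frac{m_0}{s_0^2} + \frac{x}{\sigma_0^2}\right).
\end{aligned}
$$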
The next step is to complete the square in the exponent:

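$$
\begin{aligned}
f(\mu \mid x) &\propto \exp\left\{-\frac{1}{2s_1^2}\left[\mu^2 - 2\mu m_1\right]\right\} \\
&= \exp\left\{-\frac{1}{2s_1^2}\left[\mu^2 - 2\mu m_1 + m_1^2 - m_1^2\right]\right\} \\
&= \exp\left\{-\frac{1}{2s_1^2}\left[\mu^2 - 2\mu m_1 + m_1^2\right]\right\} \exp\left\{\frac{m_1^2}{2s_1^2}\right\} \\
&\propto \exp\left\{-\frac{1}{2s_1^2}\left[\mu^2 - 2\mu m_1 + m_1^2\right]\right\} \\
&= \exp\left\{-\frac{1}{2s_1^2}(\mu - m_1)^2\right\}
\end{aligned}
$$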
which, except for a normalizing constant not involving $\mu$, is the PDF of a normal distribution
with mean $m_1$ and variance $s_1^2$.
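
As a quick numerical sanity check of the single-observation result, the sketch below computes $m_1$ and $s_1^2$ from the formulas above and compares them with the mean and variance obtained by normalizing likelihood times prior on a grid. The function names, the grid limits, and the input values are illustrative assumptions, not part of the lesson.

```python
import math

def normal_posterior_one_point(x, sigma0_sq, m0, s0_sq):
    """Posterior (m1, s1_sq) for a N(m0, s0_sq) prior on mu and one x ~ N(mu, sigma0_sq)."""
    s1_sq = 1.0 / (1.0 / s0_sq + 1.0 / sigma0_sq)
    m1 = s1_sq * (m0 / s0_sq + x / sigma0_sq)
    return m1, s1_sq

def grid_posterior_moments(x, sigma0_sq, m0, s0_sq, lo=-20.0, hi=20.0, steps=40001):
    """Brute-force check: normalize likelihood * prior on a grid of mu values."""
    h = (hi - lo) / (steps - 1)
    grid = [lo + i * h for i in range(steps)]
    # Unnormalized posterior; constants not involving mu are dropped, as in the derivation.
    weights = [
        math.exp(-(x - mu) ** 2 / (2 * sigma0_sq) - (mu - m0) ** 2 / (2 * s0_sq))
        for mu in grid
    ]
    total = sum(weights)
    mean = sum(w * mu for w, mu in zip(weights, grid)) / total
    var = sum(w * (mu - mean) ** 2 for w, mu in zip(weights, grid)) / total
    return mean, var

# Hypothetical inputs chosen only for illustration.
m1, s1_sq = normal_posterior_one_point(x=2.0, sigma0_sq=1.0, m0=0.0, s0_sq=4.0)
grid_mean, grid_var = grid_posterior_moments(x=2.0, sigma0_sq=1.0, m0=0.0, s0_sq=4.0)
print("closed form:", m1, s1_sq)
print("grid check: ", grid_mean, grid_var)
```

The two lines of output should agree to several decimal places, which is what the completed square predicts.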
The final step is to extend this result to accommodate n independent data points. The
likelihood in this case is
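$$
\begin{aligned}
f(\mathbf{x} \mid \mu) &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma_0^2}} \exp\left[-\frac{1}{2\sigma_0^2}(x_i - \mu)^2\right] \\
&= (2\pi\sigma_0^2)^{-n/2} \exp\left\{-\frac{1}{2\sigma_0^2} \sum_{i=1}^{n} (x_i - \mu)^2\right\} \\
&\propto \exp\left\{-\frac{1}{2\sigma_0^2}\left[\sum_{i=1}^{n} x_i^2 - 2\mu \sum_{i=1}^{n} x_i + n\mu^2\right]\right\} \\
&\propto \exp\left\{-\frac{1}{2\sigma_0^2}\left[-2n\bar{x}\mu + n\mu^2\right]\right\}.
\end{aligned}
$$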
We can repeat the steps above or notice that the data contribute only through the sample
mean $\bar{x}$ (and $n$, which we assume is known). This means that $\bar{x}$ is a sufficient
statistic for $\mu$, allowing us to use the distribution of $\bar{x}$ as the likelihood
(analogous to using a binomial likelihood in place of a sequence of Bernoullis). The model
then becomes

$$
\bar{x} \mid \mu \sim N\left(\mu, \frac{\sigma_0^2}{n}\right), \qquad \mu \sim N(m_0, s_0^2).
$$

We now apply our result derived above, replacing $x$ with $\bar{x}$ and $\sigma_0^2$ with
$\sigma_0^2/n$. This yields the update equation presented in Lesson 10.1.
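
Carrying out that substitution gives $s_1^2 = 1/(1/s_0^2 + n/\sigma_0^2)$ and $m_1 = s_1^2 (m_0/s_0^2 + n\bar{x}/\sigma_0^2)$. The sketch below implements this batch update and, as an added check that is not part of the lesson, compares it with applying the single-observation update once per data point. Function names and data values are hypothetical.

```python
def normal_posterior(data, sigma0_sq, m0, s0_sq):
    """Batch update: single-point formulas with x -> xbar and sigma0_sq -> sigma0_sq / n."""
    n = len(data)
    xbar = sum(data) / n
    s1_sq = 1.0 / (1.0 / s0_sq + n / sigma0_sq)
    m1 = s1_sq * (m0 / s0_sq + n * xbar / sigma0_sq)
    return m1, s1_sq

def sequential_posterior(data, sigma0_sq, m0, s0_sq):
    """Apply the one-observation update repeatedly, using each posterior as the next prior."""
    m, s_sq = m0, s0_sq
    for x in data:
        new_s_sq = 1.0 / (1.0 / s_sq + 1.0 / sigma0_sq)
        m = new_s_sq * (m / s_sq + x / sigma0_sq)
        s_sq = new_s_sq
    return m, s_sq

# Hypothetical data, for illustration only.
data = [1.1, 0.4, 2.3, 1.7]
print(normal_posterior(data, sigma0_sq=2.0, m0=0.0, s0_sq=1.0))
print(sequential_posterior(data, sigma0_sq=2.0, m0=0.0, s0_sq=1.0))
```

Both calls should print the same posterior mean and variance up to floating-point rounding, since for independent observations the order and grouping of the updates do not matter.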
2 Marginal Distribution of Normal Mean in Conjugate Model
Consider again the model $x \mid \mu \sim N(\mu, \sigma_0^2)$, $\mu \sim N(m_0, s_0^2)$ with
$\sigma_0^2$ known. Here we derive the marginal distribution for the data, given by
$\int f(x \mid \mu)\, f(\mu)\, d\mu$. This is the prior predictive distribution for a new
data point $x^*$.

To do so, re-write the model in an equivalent, but more convenient form: $x = \mu + \epsilon$
where $\epsilon \sim N(0, \sigma_0^2)$, and $\mu = m_0 + \eta$ where $\eta \sim N(0, s_0^2)$,
with $\epsilon$ and $\eta$ independent. Now substitute $\mu$ into the first equation to get
$x = m_0 + \eta + \epsilon$. Recall that the sum of two independent normal random variables
is again a normal random variable, so $x$ is normal with

$$
\begin{aligned}
E(x) &= E(m_0 + \eta + \epsilon) = m_0 + E(\eta) + E(\epsilon) = m_0 + 0 + 0 = m_0, \\
Var(x) &= Var(m_0 + \eta + \epsilon) = Var(m_0) + Var(\eta) + Var(\epsilon) = 0 + s_0^2 + \sigma_0^2
\end{aligned}
$$

(note that we can add the variances because $\eta$ and $\epsilon$ are independent). Therefore
the marginal distribution for $x$ is normal with mean $m_0$ and variance $s_0^2 + \sigma_0^2$.
The posterior predictive distribution is the same, but with $m_0$ and $s_0^2$ replaced by the
posterior updates given in Lesson 10.1.
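
The two-stage representation translates directly into a simulation. The sketch below draws the mean from its prior, then a data point given that mean, and compares the sample mean and variance of the simulated data with $m_0$ and $s_0^2 + \sigma_0^2$. The function name, seed, and parameter values are assumptions made for illustration.

```python
import math
import random
import statistics

def simulate_prior_predictive(m0, s0_sq, sigma0_sq, draws, seed=0):
    """Draw mu ~ N(m0, s0_sq), then x | mu ~ N(mu, sigma0_sq); return the x draws."""
    rng = random.Random(seed)
    xs = []
    for _ in range(draws):
        mu = rng.gauss(m0, math.sqrt(s0_sq))             # random.gauss takes a standard deviation
        xs.append(rng.gauss(mu, math.sqrt(sigma0_sq)))
    return xs

# Hypothetical parameter values.
xs = simulate_prior_predictive(m0=1.0, s0_sq=4.0, sigma0_sq=1.0, draws=100_000)
print("sample mean:    ", statistics.fmean(xs), "(theory: 1.0)")
print("sample variance:", statistics.pvariance(xs), "(theory: 4.0 + 1.0 = 5.0)")
```

The simulated moments should land close to the theoretical values, with the remaining gap due to Monte Carlo error.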
3 Inverse-Gamma Distribution
The inverse-gamma distribution is the conjugate prior for $\sigma^2$ in the normal likelihood
with known mean. It is also the marginal prior/posterior for $\sigma^2$ in the model of
Lesson 10.2.

As the name implies, the inverse-gamma distribution is related to the gamma distribution.
If $X \sim \text{Gamma}(\alpha, \beta)$, with $\beta$ the rate parameter, then the random
variable $Y = 1/X \sim \text{Inverse-Gamma}(\alpha, \beta)$ where

$$
f(y) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, y^{-(\alpha+1)} \exp\left(-\frac{\beta}{y}\right) I_{\{y > 0\}},
\qquad
E(Y) = \frac{\beta}{\alpha - 1} \quad \text{for } \alpha > 1.
$$

The relationship between gamma and inverse-gamma suggests a simple method for simulating
draws from the inverse-gamma distribution. First draw $X$ from the
$\text{Gamma}(\alpha, \beta)$ distribution and take $Y = 1/X$, which corresponds to a draw
from the $\text{Inverse-Gamma}(\alpha, \beta)$ distribution.
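
A minimal sketch of this simulation recipe using Python's standard library follows. One detail to watch: `random.gammavariate` is parameterized by shape and scale, so a gamma with rate $\beta$ corresponds to scale $1/\beta$. The helper name and the parameter values are illustrative assumptions.

```python
import random
import statistics

def draw_inverse_gamma(alpha, beta, rng):
    """One Inverse-Gamma(alpha, beta) draw, taken as the reciprocal of a Gamma(alpha, rate=beta) draw."""
    # gammavariate(shape, scale): a rate of beta means a scale of 1 / beta.
    return 1.0 / rng.gammavariate(alpha, 1.0 / beta)

rng = random.Random(42)
alpha, beta = 5.0, 4.0                      # hypothetical values with alpha > 1 so the mean exists
ys = [draw_inverse_gamma(alpha, beta, rng) for _ in range(100_000)]
print("sample mean:     ", statistics.fmean(ys))
print("theoretical mean:", beta / (alpha - 1))
```

The sample mean should be close to $\beta/(\alpha - 1)$. Passing the rate directly as the second argument of `gammavariate` would silently sample from a different distribution, which is the most common mistake with this recipe.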
4 Marginal Posterior Distribution for the Normal Mean when the Variance is Unknown
If we are not interested in inference for an unknown $\sigma^2$, we can integrate it out of
the joint posterior in Lesson 10.2. This results in a t-distributed marginal posterior for
$\mu$, as noted at the end of the lesson. This t distribution has $\nu = 2\alpha + n$ degrees
of freedom and two additional parameters, a scale $\gamma$ and a location $m^*$ given by

$$
m^* = \frac{n\bar{x} + wm}{n + w} \qquad \text{(the mean of the conditional posterior for } \mu\text{)}
$$

$$
\gamma = \sqrt{\frac{\beta + \frac{n-1}{2}s^2 + \frac{wn}{2(w+n)}(\bar{x} - m)^2}{(n + w)(\alpha + n/2)}} \qquad \text{(modified scale of the updated inverse-gamma for } \sigma^2\text{)}
$$

where $s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2/(n - 1)$ is the sample variance.

This t distribution can be used to create a credible interval for $\mu$ by multiplying the
appropriate quantiles of the standard t distribution by the scale $\gamma$ and adding the
location $m^*$.
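
The formulas above can be collected into a small helper, sketched here under a few assumptions: the function and argument names are invented for illustration, the input values are hypothetical, and the standard t quantile is supplied by the caller (for example from a t table) because Python's standard library has no t quantile function.

```python
import math

def t_marginal_parameters(n, xbar, s_sq, m, w, alpha, beta):
    """Degrees of freedom, location, and scale of the marginal t posterior for mu."""
    nu = 2 * alpha + n
    m_star = (n * xbar + w * m) / (n + w)
    # Updated inverse-gamma scale parameter for sigma^2.
    beta_post = beta + (n - 1) / 2 * s_sq + w * n / (2 * (w + n)) * (xbar - m) ** 2
    gamma = math.sqrt(beta_post / ((n + w) * (alpha + n / 2)))
    return nu, m_star, gamma

def credible_interval(m_star, gamma, t_quantile):
    """Equal-tailed interval: location plus or minus scale times the upper standard t quantile."""
    return m_star - gamma * t_quantile, m_star + gamma * t_quantile

# Hypothetical inputs; 2.145 is the 0.975 quantile of a standard t with 14 degrees of freedom.
nu, m_star, gamma = t_marginal_parameters(n=10, xbar=2.0, s_sq=1.5, m=1.0, w=1.0, alpha=2.0, beta=1.0)
print(nu, m_star, gamma)
print(credible_interval(m_star, gamma, t_quantile=2.145))
```

Keeping the quantile as an argument makes the dependence on the degrees of freedom explicit: a different $\nu$ requires looking up a different quantile.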
Example: Suppose we have normal data with unknown mean $\mu$ and variance $\sigma^2$. We use
the model from Lesson 10.2 with $m = 0$, $w = 0.1$, $\alpha = 3/2$, and $\beta = 1$. The data
are $n = 20$ independent observations with $\bar{x} = 1.2$ and $s^2 = 0.7$. Then we have

$$
\begin{aligned}
\sigma^2 \mid \mathbf{x} &\sim \text{Inverse-Gamma}(11.5,\ 7.72) \\
\mu \mid \sigma^2, \mathbf{x} &\sim N\left(1.19,\ \frac{\sigma^2}{20.1}\right) \\
m^* &= 1.19 \\
\gamma &= 0.183
\end{aligned}
$$

and $\mu \mid \mathbf{x}$ is distributed t with 23 degrees of freedom, location 1.19 and
scale 0.183. To produce a 95% equal-tailed credible interval for $\mu$, we first need the
0.025 and 0.975 quantiles of the standard t distribution with 23 degrees of freedom. These
are $-2.07$ and $2.07$. The 95% credible interval is then
$m^* \pm \gamma(2.07) = 1.19 \pm 0.183(2.07) = 1.19 \pm 0.38$.
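
The interval in this example can be cross-checked by simulation without using the t distribution at all: draw the variance from its inverse-gamma posterior, draw the mean from its conditional normal posterior, and read off empirical quantiles of the mean draws. The sketch below does this with the example's numbers. The function names, seed, and number of draws are assumptions, and the result is subject to Monte Carlo error.

```python
import math
import random

def posterior_mu_draws(n, xbar, s_sq, m, w, alpha, beta, draws, seed=0):
    """Sample mu from the joint posterior of the Lesson 10.2 model by composition."""
    rng = random.Random(seed)
    alpha_post = alpha + n / 2
    beta_post = beta + (n - 1) / 2 * s_sq + w * n / (2 * (w + n)) * (xbar - m) ** 2
    m_star = (n * xbar + w * m) / (n + w)
    out = []
    for _ in range(draws):
        # Inverse-gamma draw as the reciprocal of a gamma draw (gammavariate uses shape, scale).
        sigma_sq = 1.0 / rng.gammavariate(alpha_post, 1.0 / beta_post)
        out.append(rng.gauss(m_star, math.sqrt(sigma_sq / (n + w))))
    return out

def equal_tailed_interval(samples, level=0.95):
    """Empirical equal-tailed interval from sorted draws."""
    ordered = sorted(samples)
    tail = (1 - level) / 2
    lower = ordered[int(tail * len(ordered))]
    upper = ordered[int((1 - tail) * len(ordered)) - 1]
    return lower, upper

mu_draws = posterior_mu_draws(n=20, xbar=1.2, s_sq=0.7, m=0.0, w=0.1,
                              alpha=1.5, beta=1.0, draws=100_000)
print(equal_tailed_interval(mu_draws))  # compare with 1.19 - 0.38 and 1.19 + 0.38
```

The empirical endpoints should fall near 0.81 and 1.57, matching the t-based interval up to rounding and simulation noise.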
