
The Exponential Family of Distributions

p(x) = h(x)\, e^{\theta^\top T(x) - A(\theta)}

θ      vector of parameters
T(x)   vector of "sufficient statistics"
A(θ)   cumulant generating function
h(x)   base measure

Key point: x and θ only "mix" in e^{\theta^\top T(x)}.

1
The Exponential Family of Distributions

p(x) = h(x)\, e^{\theta^\top T(x) - A(\theta)}

To get a normalized distribution, for any θ,

\int p(x)\, dx \;=\; e^{-A(\theta)} \int h(x)\, e^{\theta^\top T(x)}\, dx \;=\; 1

so

e^{A(\theta)} \;=\; \int h(x)\, e^{\theta^\top T(x)}\, dx,

i.e., when T(x) = x, A(θ) is the log of the Laplace transform of h(x).
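As a quick numerical sanity check (a sketch of mine, not from the slides), this identity can be verified for the Exponential distribution, where h(x) = 1 on x ≥ 0, T(x) = x, θ = −λ, and A(θ) = −log(−θ); the parameter value below is arbitrary:

```python
# Sketch: check that exp(A(theta)) equals the integral of h(x) exp(theta * T(x)) dx
# for the Exponential distribution (h(x) = 1 on x >= 0, T(x) = x, theta = -lambda).
import numpy as np
from scipy.integrate import quad

theta = -2.5                                    # natural parameter (lambda = 2.5)
A = -np.log(-theta)                             # closed-form log-normalizer
integral, _ = quad(lambda x: np.exp(theta * x), 0, np.inf)
print(np.exp(A), integral)                      # both ~0.4 = 1/lambda
```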

2
Examples

Gaussian      p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2/(2\sigma^2)}                            x ∈ ℝ
Bernoulli     p(x) = \alpha^x (1-\alpha)^{1-x}                                                             x ∈ {0, 1}
Binomial      p(x) = \binom{n}{x} \alpha^x (1-\alpha)^{n-x}                                                x ∈ {0, 1, 2, ..., n}
Multinomial   p(x) = \frac{n!}{x_1!\, x_2! \cdots x_n!} \prod_{i=1}^{n} \alpha_i^{x_i}                     x_i ∈ {0, 1, 2, ..., n}, \sum_i x_i = n
Exponential   p(x) = \lambda\, e^{-\lambda x}                                                              x ∈ ℝ_+
Poisson       p(x) = \frac{e^{-\lambda} \lambda^x}{x!}                                                     x ∈ {0, 1, 2, ...}
Dirichlet     p(x) = \frac{\Gamma(\sum_i \alpha_i)}{\prod_i \Gamma(\alpha_i)} \prod_i x_i^{\alpha_i - 1}   x_i ∈ [0, 1], \sum_i x_i = 1

(don’t need to memorize these except for Gaussian)

3
Natural Parameter Form for Bernoulli

p(x) = h(x)\, e^{\theta^\top T(x) - A(\theta)}

p(x) = \alpha^x (1-\alpha)^{1-x}
     = \exp\big[ \log\big( \alpha^x (1-\alpha)^{1-x} \big) \big]
     = \exp\big[ x \log\alpha + (1-x)\log(1-\alpha) \big]
     = \exp\Big[ x \log\tfrac{\alpha}{1-\alpha} + \log(1-\alpha) \Big]
     = \exp\big[ x\,\theta - \log\big(1 + e^{\theta}\big) \big]

so

T(x) = x,   \theta = \log\tfrac{\alpha}{1-\alpha},   A(\theta) = \log\big(1 + e^{\theta}\big)
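As a quick check (my own sketch, not part of the slides), the natural-parameter form can be compared against the standard Bernoulli pmf; the value of α below is arbitrary:

```python
# Sketch: the natural-parameter form of the Bernoulli should reproduce
# alpha^x * (1 - alpha)^(1 - x) for x in {0, 1}.
import numpy as np

alpha = 0.3
theta = np.log(alpha / (1 - alpha))            # natural parameter
A = np.log(1 + np.exp(theta))                  # log-normalizer

for x in (0, 1):
    standard = alpha**x * (1 - alpha)**(1 - x)
    natural = np.exp(x * theta - A)            # h(x) = 1 here
    print(x, standard, natural)                # the two columns should match
```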
4
Natural Parameter Form for Gaussian

p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2/(2\sigma^2)}
     = \frac{1}{\sqrt{2\pi}} \exp\!\Big( -\log\sigma - \frac{x^2}{2\sigma^2} + \frac{\mu x}{\sigma^2} - \frac{\mu^2}{2\sigma^2} \Big)
     = \underbrace{\frac{1}{\sqrt{2\pi}}}_{h(x)} \exp\!\Big( \theta^\top T(x) - \underbrace{\big( \log\sigma + \mu^2/(2\sigma^2) \big)}_{A(\theta)} \Big)

where

T(x) = \begin{pmatrix} x \\ x^2 \end{pmatrix},
\quad
\theta = \begin{pmatrix} \mu/\sigma^2 \\ -1/(2\sigma^2) \end{pmatrix},
\quad
A(\theta) = \frac{\mu^2}{2\sigma^2} + \log\sigma
          = -\frac{[\theta]_1^2}{4[\theta]_2} - \tfrac{1}{2}\log\big(-2[\theta]_2\big)
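A similar sketch (not from the slides) checks the Gaussian case numerically; µ, σ, and x below are arbitrary choices:

```python
# Sketch: verify the Gaussian natural-parameter form against the standard density.
import numpy as np

mu, sigma = 1.2, 0.7
theta = np.array([mu / sigma**2, -1.0 / (2 * sigma**2)])          # natural parameters
A = -theta[0]**2 / (4 * theta[1]) - 0.5 * np.log(-2 * theta[1])   # log-normalizer

x = 0.4
T = np.array([x, x**2])                                           # sufficient statistics
standard = np.exp(-(x - mu)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
natural = (1 / np.sqrt(2 * np.pi)) * np.exp(theta @ T - A)        # h(x) = 1/sqrt(2*pi)
print(standard, natural)                                          # should agree
```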

5
Natural Parameter Form for Multivariate Gaussian

p(x) = h(x)\, e^{\theta^\top T(x) - A(\theta)}

p(x) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}}\, e^{-(x-\mu)^\top \Sigma^{-1} (x-\mu)/2}

h(x) = (2\pi)^{-D/2},
\quad
T(x) = \begin{pmatrix} x \\ x x^\top \end{pmatrix},
\quad
\theta = \begin{pmatrix} \Sigma^{-1}\mu \\ -\tfrac{1}{2}\Sigma^{-1} \end{pmatrix}
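A small sketch (not from the slides): recovering the moment parameters from the natural parameters, using the relations Σ = −½ θ₂⁻¹ and µ = Σ θ₁ implied above (the particular µ and Σ below are arbitrary):

```python
# Sketch: natural parameters of a multivariate Gaussian determine the moment
# parameters: Sigma = -1/2 * inv(theta2), mu = Sigma @ theta1.
import numpy as np

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])

theta1 = np.linalg.solve(Sigma, mu)        # Sigma^{-1} mu
theta2 = -0.5 * np.linalg.inv(Sigma)       # -1/2 Sigma^{-1}

Sigma_back = -0.5 * np.linalg.inv(theta2)  # recover Sigma
mu_back = Sigma_back @ theta1              # recover mu
print(mu_back, Sigma_back)
```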

6
The first derivative of A(θ)

A(\theta) = \log \underbrace{\Big[ \int h(x)\, e^{\theta^\top T(x)}\, dx \Big]}_{Q(\theta)}

\frac{dA(\theta)}{d\theta}
  = \frac{1}{Q(\theta)} \frac{dQ(\theta)}{d\theta} = \frac{Q'(\theta)}{Q(\theta)}
  = \frac{\int h(x)\, e^{\theta^\top T(x)}\, T(x)\, dx}{\int h(x)\, e^{\theta^\top T(x)}\, dx}
  = \frac{\int h(x)\, e^{\theta^\top T(x) - A(\theta)}\, T(x)\, dx}{\int h(x)\, e^{\theta^\top T(x) - A(\theta)}\, dx}
  = E_{p_\theta}[T(x)].
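A finite-difference sketch (not from the slides) of this identity for the Bernoulli, where A(θ) = log(1 + e^θ) and E[T(x)] = α = σ(θ); the θ below is arbitrary:

```python
# Sketch: check dA/dtheta = E[T(x)] for the Bernoulli by finite differences.
import numpy as np

def A(theta):
    return np.log(1 + np.exp(theta))       # Bernoulli log-normalizer

theta, eps = 0.8, 1e-6
dA = (A(theta + eps) - A(theta - eps)) / (2 * eps)   # numerical derivative
alpha = 1 / (1 + np.exp(-theta))                     # E[T(x)] = E[x] = alpha
print(dA, alpha)                                     # should agree
```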

7
The second derivative of A(θ)

A(\theta) = \log \underbrace{\Big[ \int h(x)\, e^{\theta^\top T(x)}\, dx \Big]}_{Q(\theta)}

\frac{d^2 A(\theta)}{d\theta^2}
  = \frac{d}{d\theta}\Big[ \frac{Q'(\theta)}{Q(\theta)} \Big]
  = \frac{Q''(\theta)}{Q(\theta)} - \frac{\big(Q'(\theta)\big)^2}{\big(Q(\theta)\big)^2}
  = \frac{\int h(x)\, e^{\theta^\top T(x)}\, T^2(x)\, dx}{\int h(x)\, e^{\theta^\top T(x)}\, dx} - \big(E_{p_\theta}[T(x)]\big)^2
  = \frac{\int h(x)\, e^{\theta^\top T(x) - A(\theta)}\, T^2(x)\, dx}{\int h(x)\, e^{\theta^\top T(x) - A(\theta)}\, dx} - \big(E_{p_\theta}[T(x)]\big)^2
  = E_{p_\theta}\big[T^2(x)\big] - \big(E_{p_\theta}[T(x)]\big)^2 = \mathrm{Cov}_{p_\theta}[T(x)] \succeq 0.

=⇒ A(θ) is convex. (\succeq means positive semi-definite)
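The same kind of finite-difference sketch (not from the slides) for the second derivative, where Var[x] = α(1 − α) for the Bernoulli:

```python
# Sketch: check d^2A/dtheta^2 = Var[T(x)] for the Bernoulli by finite differences.
import numpy as np

def A(theta):
    return np.log(1 + np.exp(theta))       # Bernoulli log-normalizer

theta, eps = 0.8, 1e-4
d2A = (A(theta + eps) - 2 * A(theta) + A(theta - eps)) / eps**2  # numerical 2nd derivative
alpha = 1 / (1 + np.exp(-theta))
print(d2A, alpha * (1 - alpha))                                  # should agree
```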

8
Maximum Likelihood
\ell(\theta) = \sum_{i=1}^{N} \log p(x_i \mid \theta)
             = \sum_{i=1}^{N} \big[ \log h(x_i) + \theta^\top T(x_i) - A(\theta) \big]

To find the maximum likelihood solution, set

\ell'(\theta) = \sum_{i=1}^{N} T(x_i) - N\, A'(\theta) = 0

So the ML solution satisfies

A'(\hat{\theta}_{ML}) = \frac{1}{N} \sum_{i=1}^{N} T(x_i)

(is \hat{\theta}_{ML} a consistent estimator then?)

The sufficient statistics \frac{1}{N} \sum_{i=1}^{N} T(x_i) summarize the data.
When we can't solve this analytically: convexity =⇒ a unique global ML solution for θ, so numerical optimization will find it.
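A sketch (not from the slides) of this moment-matching condition for the Bernoulli: A'(θ) is the sigmoid, so θ̂_ML is obtained by inverting the sigmoid at the sample mean (synthetic data, arbitrary seed):

```python
# Sketch: ML for the Bernoulli via A'(theta_hat) = (1/N) sum_i T(x_i).
# Here A'(theta) = sigmoid(theta), so sigmoid(theta_hat) = sample mean.
import numpy as np

rng = np.random.default_rng(0)
x = rng.binomial(1, 0.3, size=10_000)          # synthetic Bernoulli(0.3) data

mean_T = x.mean()                              # (1/N) sum_i T(x_i)
theta_hat = np.log(mean_T / (1 - mean_T))      # invert A'(theta) = sigmoid(theta)
alpha_hat = 1 / (1 + np.exp(-theta_hat))
print(alpha_hat, mean_T)                       # alpha_hat equals the sample mean
```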

9
Products

Products of E-family distributions are E-family distributions


\Big( h(x)\, e^{\theta_1^\top T(x) - A(\theta_1)} \Big) \times \Big( h(x)\, e^{\theta_2^\top T(x) - A(\theta_2)} \Big)
   \;=\; \tilde{h}(x)\, e^{(\theta_1 + \theta_2)^\top T(x) - \tilde{A}(\theta_1, \theta_2)}

but might not have a nice parametric form any more.

But the product of two Gaussians is always a Gaussian.
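A sketch (not from the slides) of the Gaussian case: multiplying two 1-D Gaussian densities by adding their natural parameters, then mapping back to the mean and variance of the (unnormalized) product:

```python
# Sketch: product of two 1-D Gaussians via addition of natural parameters.
import numpy as np

def to_natural(mu, var):
    return np.array([mu / var, -1.0 / (2 * var)])    # [mu/sigma^2, -1/(2 sigma^2)]

def from_natural(theta):
    var = -1.0 / (2 * theta[1])
    return theta[0] * var, var                       # recover (mu, sigma^2)

theta_prod = to_natural(0.0, 1.0) + to_natural(2.0, 0.5)   # add natural parameters
mu_prod, var_prod = from_natural(theta_prod)
print(mu_prod, var_prod)   # precision-weighted mean ~1.333, variance ~0.333
```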

10
Conjugate Priors in Bayesian Statistics

p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{\int p(x \mid \theta)\, p(\theta)\, d\theta}
Note: denominator not a function of θ ⇒ just normalizing term

p(θ)          −→   p(x|θ) p(θ)     −→   p(θ|x) ∝ p(x|θ) p(θ)
parametric          parametric            mess?

Conjugacy: require p(θ) and p(θ|x) to be of the same form. E.g.

p(θ)          −→   p(x|θ) p(θ)     −→   p(θ|x)
Dirichlet            Multinomial          Dirichlet

p(θ) and p(x|θ) are then called conjugate distributions.

11
Example: Dirichlet and Multinomial

p(\theta) = \frac{\Gamma(\sum_i \alpha_i)}{\prod_i \Gamma(\alpha_i)} \prod_i \theta_i^{\alpha_i - 1}        Dirichlet in θ   (Γ(x) = (x − 1)! for integer x)

p(x \mid \theta) = \frac{(\sum_i x_i)!}{x_1!\, x_2! \cdots x_n!} \prod_{i=1}^{n} \theta_i^{x_i}             Multinomial in x

p(\theta \mid x) \propto p(x \mid \theta)\, p(\theta) = \text{junk} \times \prod_i \theta_i^{x_i + \alpha_i - 1}

which is again Dirichlet, so we must have

p(\theta \mid x) = \frac{\Gamma\big(\sum_i (\alpha_i + x_i)\big)}{\prod_i \Gamma(\alpha_i + x_i)} \prod_i \theta_i^{x_i + \alpha_i - 1}.

Remember pseudocount of 1? That was just a Dirichlet prior.
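A sketch (not from the slides) of the resulting posterior update: the posterior Dirichlet parameters are just the prior pseudocounts plus the observed counts (the numbers below are arbitrary):

```python
# Sketch: Dirichlet-Multinomial conjugacy -- the posterior is a Dirichlet whose
# parameters are the prior pseudocounts plus the observed counts.
import numpy as np

alpha_prior = np.array([1.0, 1.0, 1.0])       # Dirichlet prior (pseudocount of 1 per outcome)
counts = np.array([7, 2, 1])                  # observed multinomial counts x_i

alpha_post = alpha_prior + counts             # posterior Dirichlet parameters
post_mean = alpha_post / alpha_post.sum()     # posterior mean of theta
print(alpha_post, post_mean)
```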

12
Conjugate Pairs

Prior                                                                                                    Conditional
Gaussian     e^{-\|\mu - \mu_0\|^2/(2\sigma^2)}                                                          Gaussian      e^{-\|x - \mu\|^2/(2\sigma^2)}
Beta         \frac{\Gamma(r+s)}{\Gamma(r)\Gamma(s)}\, \alpha^{r-1} (1-\alpha)^{s-1}                      Bernoulli     \alpha^x (1-\alpha)^{1-x}
Dirichlet    \frac{\Gamma(\sum_i \alpha_i)}{\prod_i \Gamma(\alpha_i)} \prod_i \theta_i^{\alpha_i - 1}    Multinomial   \frac{(\sum_i x_i)!}{\prod_i x_i!} \prod_i \theta_i^{x_i}
Inverse Wishart                                                                                          Gaussian (covariance)

Note: Conjugacy is mutual, e.g.

Dirichlet → Multinomial → Dirichlet

Multinomial → Dirichlet → Multinomial

13
