Complete conditionals of the latent variables
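$$
\begin{aligned}
p(z_{d,m} \mid Z_{\neg d,m}, W, \Theta, \Phi, \alpha, \beta)
&= p(z_{d,m} \mid Z_{\neg d,m}, w_{d,m}, W_{\neg d,m}, \Theta, \Phi, \alpha, \beta) \\
&= p(z_{d,m} \mid w_{d,m}, \theta_d, \phi_{z_{d,m}}) \\
&\propto \theta_{d,z_{d,m}} \times \phi_{z_{d,m},w_{d,m}}
\end{aligned}
$$
Normalizing over the $K$ topics gives the exponential-family form
$$
p(z_{d,m} \mid w_{d,m}, \theta_d, \phi_{z_{d,m}})
= \exp\left( \begin{bmatrix} \eta_1 \\ \vdots \\ \eta_K \end{bmatrix}^{T} \begin{bmatrix} [z_{d,m} = 1] \\ \vdots \\ [z_{d,m} = K] \end{bmatrix} \right)
$$
where
$$
h(z_{d,m}) = 1
$$
$$
\eta(w_{d,m}, \theta_d, \phi_{z_{d,m}})
= \begin{bmatrix} \eta_1 \\ \vdots \\ \eta_K \end{bmatrix}
= \begin{bmatrix}
\ln \dfrac{\theta_{d,1} \times \phi_{1,w_{d,m}}}{\sum_{k=1}^{K} \theta_{d,k} \times \phi_{k,w_{d,m}}} \\
\vdots \\
\ln \dfrac{\theta_{d,K} \times \phi_{K,w_{d,m}}}{\sum_{k=1}^{K} \theta_{d,k} \times \phi_{k,w_{d,m}}}
\end{bmatrix}
$$
$$
t(z_{d,m}) = \begin{bmatrix} [z_{d,m} = 1] \\ \vdots \\ [z_{d,m} = K] \end{bmatrix}
$$
$$
a(\eta(w_{d,m}, \theta_d, \phi_{z_{d,m}})) = 0
$$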
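As a small numerical illustration of this complete conditional, the sketch below normalizes $\theta_{d,k} \times \phi_{k,w_{d,m}}$ over the topics and takes logs to obtain $\eta$. The function name `topic_conditional`, the 0-based indexing, and the toy values are assumptions made for the example, not part of the notes.

```python
import math


def topic_conditional(theta_d, phi, w):
    """Return p(z = k | w, theta_d, phi) for k = 0..K-1 (hypothetical helper).

    theta_d: list of K topic proportions for document d.
    phi:     K x V nested list, phi[k][v] = probability of word v under topic k.
    w:       0-based index of the observed word w_{d,m}.
    """
    weights = [theta_d[k] * phi[k][w] for k in range(len(theta_d))]
    total = sum(weights)
    return [x / total for x in weights]


# Toy values (assumed): K = 2 topics, V = 2 words.
theta_d = [0.5, 0.5]
phi = [[0.9, 0.1], [0.3, 0.7]]

probs = topic_conditional(theta_d, phi, w=0)
# Natural parameter eta_k = ln of the normalized probability, so a(eta) = 0.
eta = [math.log(p) for p in probs]
```

Because the probabilities are already normalized, `sum(math.exp(e) for e in eta)` equals 1, which is why the log-normalizer $a(\eta)$ can be taken as 0.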
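$$
\begin{aligned}
p(\theta_d \mid W, Z, \Theta_{\neg d}, \Phi, \alpha, \beta)
&= p(\theta_d \mid Z_d, \alpha) \\
&= \frac{\Gamma\left(\sum_{k=1}^{K} (N_{d,k} + \alpha_k)\right)}{\prod_{k=1}^{K} \Gamma(N_{d,k} + \alpha_k)} \prod_{k=1}^{K} \theta_{d,k}^{N_{d,k} + \alpha_k - 1} \\
&= \frac{1}{\prod_{k=1}^{K} \theta_{d,k}} \exp\left( \begin{bmatrix} \eta_1 \\ \vdots \\ \eta_K \end{bmatrix}^{T} \begin{bmatrix} \ln \theta_{d,1} \\ \vdots \\ \ln \theta_{d,K} \end{bmatrix} - \sum_{k=1}^{K} \ln \Gamma(\eta_k) + \ln \Gamma\left(\sum_{k=1}^{K} \eta_k\right) \right)
\end{aligned}
$$
where
$$
h(\theta_d) = \frac{1}{\prod_{k=1}^{K} \theta_{d,k}}
$$
$$
\eta(Z_d, \alpha) = \begin{bmatrix} \eta_1 \\ \vdots \\ \eta_K \end{bmatrix} = \begin{bmatrix} N_{d,1} + \alpha_1 \\ \vdots \\ N_{d,K} + \alpha_K \end{bmatrix}
$$
$$
t(\theta_d) = \begin{bmatrix} \ln \theta_{d,1} \\ \vdots \\ \ln \theta_{d,K} \end{bmatrix}
$$
$$
\begin{aligned}
a(\eta(Z_d, \alpha)) &= \sum_{k=1}^{K} \ln \Gamma(\eta_k) - \ln \Gamma\left(\sum_{k=1}^{K} \eta_k\right) \\
&= \sum_{k=1}^{K} \ln \Gamma(N_{d,k} + \alpha_k) - \ln \Gamma\left(\sum_{k=1}^{K} (N_{d,k} + \alpha_k)\right)
\end{aligned}
$$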
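The equality between the usual Dirichlet density and its exponential-family form can be checked numerically. The sketch below evaluates the log-density both ways for one point on the simplex. The function names, the counts, and the value of $\alpha$ are illustrative assumptions.

```python
import math


def dirichlet_log_pdf(theta, conc):
    """Standard form: ln Gamma(sum c) - sum ln Gamma(c) + sum (c - 1) ln theta."""
    log_norm = math.lgamma(sum(conc)) - sum(math.lgamma(c) for c in conc)
    return log_norm + sum((c - 1.0) * math.log(t) for c, t in zip(conc, theta))


def dirichlet_log_pdf_expfam(theta, eta):
    """Exponential-family form: ln h(theta) + eta^T t(theta) - a(eta)."""
    log_h = -sum(math.log(t) for t in theta)                 # h = 1 / prod theta_k
    dot = sum(e * math.log(t) for e, t in zip(eta, theta))   # eta^T ln(theta)
    a = sum(math.lgamma(e) for e in eta) - math.lgamma(sum(eta))
    return log_h + dot - a


# Toy values (assumed): N_{d,k} counts for K = 3 topics and a symmetric alpha.
counts = [3, 0, 1]
alpha = [0.5, 0.5, 0.5]
eta = [n + a for n, a in zip(counts, alpha)]   # eta_k = N_{d,k} + alpha_k
theta = [0.6, 0.1, 0.3]

standard = dirichlet_log_pdf(theta, eta)
expfam = dirichlet_log_pdf_expfam(theta, eta)
```

Both values agree up to floating-point rounding, since $\sum_k (\eta_k - 1) \ln \theta_{d,k} = \eta^{T} t(\theta_d) - \sum_k \ln \theta_{d,k}$.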
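$$
\begin{aligned}
p(\phi_k \mid W, Z, \Theta, \Phi_{\neg k}, \alpha, \beta)
&= p(\phi_k \mid W_k, Z_k, \beta) \\
&= \frac{\Gamma\left(\sum_{v=1}^{V} (N_{k,v} + \beta_v)\right)}{\prod_{v=1}^{V} \Gamma(N_{k,v} + \beta_v)} \prod_{v=1}^{V} \phi_{k,v}^{N_{k,v} + \beta_v - 1} \\
&= \frac{1}{\prod_{v=1}^{V} \phi_{k,v}} \exp\left( \begin{bmatrix} \eta_1 \\ \vdots \\ \eta_V \end{bmatrix}^{T} \begin{bmatrix} \ln \phi_{k,1} \\ \vdots \\ \ln \phi_{k,V} \end{bmatrix} - \sum_{v=1}^{V} \ln \Gamma(\eta_v) + \ln \Gamma\left(\sum_{v=1}^{V} \eta_v\right) \right)
\end{aligned}
$$
where
$$
h(\phi_k) = \frac{1}{\prod_{v=1}^{V} \phi_{k,v}}
$$
$$
\eta(W_k, Z_k, \beta) = \begin{bmatrix} \eta_1 \\ \vdots \\ \eta_V \end{bmatrix} = \begin{bmatrix} N_{k,1} + \beta_1 \\ \vdots \\ N_{k,V} + \beta_V \end{bmatrix}
$$
$$
t(\phi_k) = \begin{bmatrix} \ln \phi_{k,1} \\ \vdots \\ \ln \phi_{k,V} \end{bmatrix}
$$
$$
\begin{aligned}
a(\eta(W_k, Z_k, \beta)) &= \sum_{v=1}^{V} \ln \Gamma(\eta_v) - \ln \Gamma\left(\sum_{v=1}^{V} \eta_v\right) \\
&= \sum_{v=1}^{V} \ln \Gamma(N_{k,v} + \beta_v) - \ln \Gamma\left(\sum_{v=1}^{V} (N_{k,v} + \beta_v)\right)
\end{aligned}
$$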
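To make the counts $N_{k,v}$ concrete, the sketch below tallies them from a toy corpus with hard topic assignments and forms the natural parameter $N_{k,v} + \beta_v$ of each topic's conditional. The helper name `topic_word_counts`, the 0-based word and topic indices, and the data are assumptions for illustration.

```python
def topic_word_counts(docs, assignments, num_topics, vocab_size):
    """Count N[k][v]: how often word v is assigned to topic k (hypothetical helper)."""
    counts = [[0] * vocab_size for _ in range(num_topics)]
    for words, topics in zip(docs, assignments):
        for w, z in zip(words, topics):
            counts[z][w] += 1
    return counts


# Toy corpus (assumed): two documents over a vocabulary of V = 3 words.
docs = [[0, 2, 2], [1, 2]]          # docs[d][m] = w_{d,m}
assignments = [[0, 1, 1], [0, 1]]   # assignments[d][m] = z_{d,m}
beta = [0.1, 0.1, 0.1]

counts = topic_word_counts(docs, assignments, num_topics=2, vocab_size=3)
# Natural parameter of p(phi_k | W_k, Z_k, beta): eta[k][v] = N_{k,v} + beta_v.
eta = [[n + b for n, b in zip(row, beta)] for row in counts]
```

Here topic 0 receives one occurrence each of words 0 and 1, and topic 1 receives all three occurrences of word 2.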
Mean-field family of the latent variables
Mean-field factorization of the joint distribution of the latent variables
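$$
q(Z, \Theta, \Phi) = \prod_{d=1}^{D} \prod_{m=1}^{N_d} q(z_{d,m} \mid \nu_{d,m}) \times \prod_{d=1}^{D} q(\theta_d \mid \gamma_d) \times \prod_{k=1}^{K} q(\phi_k \mid \lambda_k)
$$
$$
q(\theta_d \mid \gamma_d) = \frac{1}{\prod_{k=1}^{K} \theta_{d,k}} \exp\left( \begin{bmatrix} \gamma_{d,1} \\ \vdots \\ \gamma_{d,K} \end{bmatrix}^{T} \begin{bmatrix} \ln \theta_{d,1} \\ \vdots \\ \ln \theta_{d,K} \end{bmatrix} - \sum_{k=1}^{K} \ln \Gamma(\gamma_{d,k}) + \ln \Gamma\left(\sum_{k=1}^{K} \gamma_{d,k}\right) \right)
$$
$$
q(\phi_k \mid \lambda_k) = \frac{1}{\prod_{v=1}^{V} \phi_{k,v}} \exp\left( \begin{bmatrix} \lambda_{k,1} \\ \vdots \\ \lambda_{k,V} \end{bmatrix}^{T} \begin{bmatrix} \ln \phi_{k,1} \\ \vdots \\ \ln \phi_{k,V} \end{bmatrix} - \sum_{v=1}^{V} \ln \Gamma(\lambda_{k,v}) + \ln \Gamma\left(\sum_{v=1}^{V} \lambda_{k,v}\right) \right)
$$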
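Likewise, $q(z_{d,m} \mid \nu_{d,m})$ and $p(z_{d,m} \mid w_{d,m}, \theta_d, \phi_{z_{d,m}})$ belong to the same exponential family, namely the categorical distribution. With $\nu_{d,m,k} = q(z_{d,m} = k \mid \nu_{d,m})$ and $\sum_{k=1}^{K} \nu_{d,m,k} = 1$, the natural parameter is $\ln \nu_{d,m}$:
$$
q(z_{d,m} \mid \nu_{d,m}) = \exp\left( \begin{bmatrix} \ln \nu_{d,m,1} \\ \vdots \\ \ln \nu_{d,m,K} \end{bmatrix}^{T} \begin{bmatrix} [z_{d,m} = 1] \\ \vdots \\ [z_{d,m} = K] \end{bmatrix} \right)
$$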
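According to the notes "VI 简明推导" (a concise derivation of VI), coordinate ascent gives a closed-form expression for the optimal parameter of each latent variable's variational factor: the expectation, under the variational factors of all other latent variables, of the natural parameter of that variable's complete conditional.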
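Derivation of $\nu_{d,m}$
$$
q(\neg z_{d,m}) = \prod_{(d',m') \neq (d,m)} q(z_{d',m'} \mid \nu_{d',m'}) \times \prod_{d'=1}^{D} q(\theta_{d'} \mid \gamma_{d'}) \times \prod_{k=1}^{K} q(\phi_k \mid \lambda_k)
$$
Here $\nu_{d,m,k}$ denotes the probability $q(z_{d,m} = k)$, so the natural parameter of $q(z_{d,m} \mid \nu_{d,m})$ is $\ln \nu_{d,m}$. Up to an additive constant shared by all $K$ components (the normalizer, which does not depend on $k$), it is
$$
\begin{aligned}
\ln \nu_{d,m} + \text{const}
&= \mathbb{E}_{q(\theta_d \mid \gamma_d) \prod_{k} q(\phi_k \mid \lambda_k)} \begin{bmatrix} \ln (\theta_{d,1} \times \phi_{1,w_{d,m}}) \\ \vdots \\ \ln (\theta_{d,K} \times \phi_{K,w_{d,m}}) \end{bmatrix} \\
&= \mathbb{E}_{q(\theta_d \mid \gamma_d) \prod_{k} q(\phi_k \mid \lambda_k)} \begin{bmatrix} \ln \theta_{d,1} + \ln \phi_{1,w_{d,m}} \\ \vdots \\ \ln \theta_{d,K} + \ln \phi_{K,w_{d,m}} \end{bmatrix} \\
&= \mathbb{E}_{q(\theta_d \mid \gamma_d) \prod_{k} q(\phi_k \mid \lambda_k)} \begin{bmatrix} \ln \theta_{d,1} \\ \vdots \\ \ln \theta_{d,K} \end{bmatrix} + \mathbb{E}_{q(\theta_d \mid \gamma_d) \prod_{k} q(\phi_k \mid \lambda_k)} \begin{bmatrix} \ln \phi_{1,w_{d,m}} \\ \vdots \\ \ln \phi_{K,w_{d,m}} \end{bmatrix} \\
&= \mathbb{E}_{q(\theta_d \mid \gamma_d)} \begin{bmatrix} \ln \theta_{d,1} \\ \vdots \\ \ln \theta_{d,K} \end{bmatrix} + \mathbb{E}_{\prod_{k} q(\phi_k \mid \lambda_k)} \begin{bmatrix} \ln \phi_{1,w_{d,m}} \\ \vdots \\ \ln \phi_{K,w_{d,m}} \end{bmatrix}
\end{aligned}
$$
For the first term, with $\psi$ the digamma function,
$$
\mathbb{E}_{q(\theta_d \mid \gamma_d)} \begin{bmatrix} \ln \theta_{d,1} \\ \vdots \\ \ln \theta_{d,K} \end{bmatrix} = \begin{bmatrix} \psi(\gamma_{d,1}) - \psi\left(\sum_{k=1}^{K} \gamma_{d,k}\right) \\ \vdots \\ \psi(\gamma_{d,K}) - \psi\left(\sum_{k=1}^{K} \gamma_{d,k}\right) \end{bmatrix}
$$
That is,
$$
\mathbb{E}_{q(\theta_d \mid \gamma_d)} [\ln \theta_{d,i}] = \psi(\gamma_{d,i}) - \psi\left(\sum_{k=1}^{K} \gamma_{d,k}\right)
$$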
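Similarly,
$$
\mathbb{E}_{\prod_{k} q(\phi_k \mid \lambda_k)} \begin{bmatrix} \ln \phi_{1,w_{d,m}} \\ \vdots \\ \ln \phi_{K,w_{d,m}} \end{bmatrix} = \begin{bmatrix} \psi(\lambda_{1,w_{d,m}}) - \psi\left(\sum_{v=1}^{V} \lambda_{1,v}\right) \\ \vdots \\ \psi(\lambda_{K,w_{d,m}}) - \psi\left(\sum_{v=1}^{V} \lambda_{K,v}\right) \end{bmatrix}
$$
That is,
$$
\mathbb{E}_{q(\phi_k \mid \lambda_k)} [\ln \phi_{k,j}] = \psi(\lambda_{k,j}) - \psi\left(\sum_{v=1}^{V} \lambda_{k,v}\right)
$$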
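The two expectations above share one pattern: for a Dirichlet with parameter vector $c$, the expected log of component $i$ is $\psi(c_i) - \psi(\sum_j c_j)$. The sketch below evaluates it with the Python standard library only. Since `math` has no digamma function, `digamma` here is a hand-written approximation (recurrence plus an asymptotic series), and it and `expected_log_dirichlet` are assumptions of this example. In practice a library routine such as `scipy.special.digamma` would normally be used.

```python
import math


def digamma(x):
    """Approximate psi(x) for x > 0 (illustrative, accurate to roughly 1e-8).

    Uses psi(x) = psi(x + 1) - 1/x to shift x up to at least 6, then the
    asymptotic series ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6).
    """
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 * inv - series


def expected_log_dirichlet(conc):
    """E[ln x_i] for x ~ Dirichlet(conc): psi(conc_i) - psi(sum(conc))."""
    psi_total = digamma(sum(conc))
    return [digamma(c) - psi_total for c in conc]


# With gamma_d = [1, 1], theta_{d,1} is uniform on (0, 1), so E[ln theta_{d,1}] = -1.
elog_theta = expected_log_dirichlet([1.0, 1.0])
```

The same function applies to $q(\phi_k \mid \lambda_k)$ by passing $\lambda_k$ and reading off the entry for word $w_{d,m}$.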
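Combining the two terms, with $\nu_{d,m,k}$ denoting the probability $q(z_{d,m} = k)$, and absorbing $\psi\left(\sum_{k=1}^{K} \gamma_{d,k}\right)$ into the constant because it is the same for every component:
$$
\begin{aligned}
\begin{bmatrix} \ln \nu_{d,m,1} \\ \vdots \\ \ln \nu_{d,m,K} \end{bmatrix} + \text{const}
&= \begin{bmatrix} \psi(\gamma_{d,1}) - \psi\left(\sum_{k=1}^{K} \gamma_{d,k}\right) \\ \vdots \\ \psi(\gamma_{d,K}) - \psi\left(\sum_{k=1}^{K} \gamma_{d,k}\right) \end{bmatrix} + \begin{bmatrix} \psi(\lambda_{1,w_{d,m}}) - \psi\left(\sum_{v=1}^{V} \lambda_{1,v}\right) \\ \vdots \\ \psi(\lambda_{K,w_{d,m}}) - \psi\left(\sum_{v=1}^{V} \lambda_{K,v}\right) \end{bmatrix} \\
&= \begin{bmatrix} \psi(\gamma_{d,1}) + \psi(\lambda_{1,w_{d,m}}) - \psi\left(\sum_{v=1}^{V} \lambda_{1,v}\right) \\ \vdots \\ \psi(\gamma_{d,K}) + \psi(\lambda_{K,w_{d,m}}) - \psi\left(\sum_{v=1}^{V} \lambda_{K,v}\right) \end{bmatrix} + \text{const}'
\end{aligned}
$$
so that
$$
\nu_{d,m,k} \propto \exp\left( \psi(\gamma_{d,k}) + \psi(\lambda_{k,w_{d,m}}) - \psi\left(\sum_{v=1}^{V} \lambda_{k,v}\right) \right), \qquad \sum_{k=1}^{K} \nu_{d,m,k} = 1
$$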
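Numerically, this update amounts to a softmax over the summed expected logs. The sketch below takes the two expectation vectors as inputs, so it does not depend on any particular digamma routine. The function name `update_nu` and the toy inputs are assumptions, and subtracting the maximum score before exponentiating is a standard stability trick rather than something taken from the notes.

```python
import math


def update_nu(elog_theta_d, elog_phi_w):
    """Return nu_{d,m}: normalized exp(E[ln theta_{d,k}] + E[ln phi_{k,w}]).

    elog_theta_d: list of K values E[ln theta_{d,k}].
    elog_phi_w:   list of K values E[ln phi_{k, w_{d,m}}] for the observed word.
    """
    scores = [a + b for a, b in zip(elog_theta_d, elog_phi_w)]
    top = max(scores)                        # shift for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [x / total for x in weights]


# Toy inputs (assumed): logs of point values stand in for the expected logs.
nu_dm = update_nu(
    [math.log(0.5), math.log(0.5)],
    [math.log(0.9), math.log(0.3)],
)
```

Any term that is constant across $k$, such as $-\psi(\sum_k \gamma_{d,k})$, cancels in the normalization, which is why it can be dropped from the scores.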
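Derivation of $\gamma_d$
$$
q(\neg \theta_d) = \prod_{d'=1}^{D} \prod_{m=1}^{N_{d'}} q(z_{d',m} \mid \nu_{d',m}) \times \prod_{d' \neq d} q(\theta_{d'} \mid \gamma_{d'}) \times \prod_{k=1}^{K} q(\phi_k \mid \lambda_k)
$$
$$
\begin{aligned}
\gamma_d
&= \mathbb{E}_{\prod_{m=1}^{N_d} q(z_{d,m} \mid \nu_{d,m})} \begin{bmatrix} N_{d,1} + \alpha_1 \\ \vdots \\ N_{d,K} + \alpha_K \end{bmatrix} \\
&= \begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_K \end{bmatrix} + \mathbb{E}_{\prod_{m=1}^{N_d} q(z_{d,m} \mid \nu_{d,m})} \begin{bmatrix} N_{d,1} \\ \vdots \\ N_{d,K} \end{bmatrix} \\
&= \begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_K \end{bmatrix} + \mathbb{E}_{\prod_{m=1}^{N_d} q(z_{d,m} \mid \nu_{d,m})} \left[ \sum_{m=1}^{N_d} \begin{bmatrix} [z_{d,m} = 1] \\ \vdots \\ [z_{d,m} = K] \end{bmatrix} \right] \\
&= \begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_K \end{bmatrix} + \sum_{m=1}^{N_d} \left( \mathbb{E}_{\prod_{m'=1}^{N_d} q(z_{d,m'} \mid \nu_{d,m'})} \begin{bmatrix} [z_{d,m} = 1] \\ \vdots \\ [z_{d,m} = K] \end{bmatrix} \right) \\
&= \begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_K \end{bmatrix} + \sum_{m=1}^{N_d} \left( \mathbb{E}_{q(z_{d,m} \mid \nu_{d,m})} \begin{bmatrix} [z_{d,m} = 1] \\ \vdots \\ [z_{d,m} = K] \end{bmatrix} \right) \\
&= \begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_K \end{bmatrix} + \sum_{m=1}^{N_d} \begin{bmatrix} \nu_{d,m,1} \\ \vdots \\ \nu_{d,m,K} \end{bmatrix} \\
&= \alpha + \sum_{m=1}^{N_d} \nu_{d,m}
\end{aligned}
$$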
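where
$$
\begin{aligned}
\mathbb{E}_{q(z_{d,m} \mid \nu_{d,m})} \begin{bmatrix} [z_{d,m} = 1] \\ \vdots \\ [z_{d,m} = K] \end{bmatrix}
&= \sum_{i=1}^{K} q(z_{d,m} = i \mid \nu_{d,m}) \begin{bmatrix} [i = 1] \\ \vdots \\ [i = K] \end{bmatrix} \\
&= \sum_{i=1}^{K} \nu_{d,m,i} \begin{bmatrix} [i = 1] \\ \vdots \\ [i = K] \end{bmatrix} \\
&= \begin{bmatrix} \nu_{d,m,1} \times 1 \\ \vdots \\ 0 \end{bmatrix} + \cdots + \begin{bmatrix} 0 \\ \vdots \\ \nu_{d,m,K} \times 1 \end{bmatrix} \\
&= \begin{bmatrix} \nu_{d,m,1} \\ \vdots \\ \nu_{d,m,K} \end{bmatrix} \\
&= \nu_{d,m}
\end{aligned}
$$
Therefore,
$$
\gamma_d = \alpha + \sum_{m=1}^{N_d} \nu_{d,m}
$$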
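The $\gamma_d$ update is a plain vector sum, sketched below for one document. The function name `update_gamma` and the toy values are assumptions for illustration.

```python
def update_gamma(alpha, nu_d):
    """Return gamma_d = alpha + sum over tokens m of nu_{d,m}.

    alpha: list of K prior parameters.
    nu_d:  list of N_d rows, nu_d[m][k] = q(z_{d,m} = k).
    """
    gamma_d = list(alpha)
    for nu_dm in nu_d:
        for k, prob in enumerate(nu_dm):
            gamma_d[k] += prob
    return gamma_d


# Toy values (assumed): K = 2 topics, a document with N_d = 3 tokens.
alpha = [0.1, 0.1]
nu_d = [[0.75, 0.25], [0.5, 0.5], [0.0, 1.0]]
gamma_d = update_gamma(alpha, nu_d)
```

Since each row of `nu_d` sums to 1, the entries of `gamma_d` sum to $\sum_k \alpha_k + N_d$, which gives a quick sanity check on an implementation.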
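Derivation of $\lambda_k$
$$
q(\neg \phi_k) = \prod_{d=1}^{D} \prod_{m=1}^{N_d} q(z_{d,m} \mid \nu_{d,m}) \times \prod_{d=1}^{D} q(\theta_d \mid \gamma_d) \times \prod_{k' \neq k} q(\phi_{k'} \mid \lambda_{k'})
$$
$$
\begin{aligned}
\lambda_k
&= \mathbb{E}_{\prod_{d=1}^{D} \prod_{m=1}^{N_d} q(z_{d,m} \mid \nu_{d,m})} \begin{bmatrix} N_{k,1} + \beta_1 \\ \vdots \\ N_{k,V} + \beta_V \end{bmatrix} \\
&= \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_V \end{bmatrix} + \mathbb{E}_{\prod_{d=1}^{D} \prod_{m=1}^{N_d} q(z_{d,m} \mid \nu_{d,m})} \begin{bmatrix} N_{k,1} \\ \vdots \\ N_{k,V} \end{bmatrix} \\
&= \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_V \end{bmatrix} + \mathbb{E}_{\prod_{d=1}^{D} \prod_{m=1}^{N_d} q(z_{d,m} \mid \nu_{d,m})} \left[ \sum_{d=1}^{D} \sum_{m=1}^{N_d} \left( [z_{d,m} = k] \times \begin{bmatrix} [w_{d,m} = 1] \\ \vdots \\ [w_{d,m} = V] \end{bmatrix} \right) \right] \\
&= \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_V \end{bmatrix} + \sum_{d=1}^{D} \sum_{m=1}^{N_d} \left( \mathbb{E}_{\prod_{d'=1}^{D} \prod_{m'=1}^{N_{d'}} q(z_{d',m'} \mid \nu_{d',m'})} \left( [z_{d,m} = k] \times \begin{bmatrix} [w_{d,m} = 1] \\ \vdots \\ [w_{d,m} = V] \end{bmatrix} \right) \right) \\
&= \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_V \end{bmatrix} + \sum_{d=1}^{D} \sum_{m=1}^{N_d} \left( \left( \mathbb{E}_{q(z_{d,m} \mid \nu_{d,m})} [z_{d,m} = k] \right) \times \begin{bmatrix} [w_{d,m} = 1] \\ \vdots \\ [w_{d,m} = V] \end{bmatrix} \right) \\
&= \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_V \end{bmatrix} + \sum_{d=1}^{D} \sum_{m=1}^{N_d} \left( \nu_{d,m,k} \times \begin{bmatrix} [w_{d,m} = 1] \\ \vdots \\ [w_{d,m} = V] \end{bmatrix} \right) \\
&= \beta + \sum_{d=1}^{D} \sum_{m=1}^{N_d} \left( \nu_{d,m,k} \times \begin{bmatrix} [w_{d,m} = 1] \\ \vdots \\ [w_{d,m} = V] \end{bmatrix} \right)
\end{aligned}
$$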
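Because each $w_{d,m}$ is an observed, fixed value, each $\begin{bmatrix} [w_{d,m} = 1] & \cdots & [w_{d,m} = V] \end{bmatrix}^{T}$ is a fixed one-hot vector.
The earlier derivation already established that $\mathbb{E}_{q(z_{d,m} \mid \nu_{d,m})} [z_{d,m} = k] = \nu_{d,m,k}$.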
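Because the one-hot vector only selects the entry for the observed word, the $\lambda_k$ update reduces to adding $\nu_{d,m,k}$ to position $w_{d,m}$ of $\lambda_k$ for every token. The sketch below does this for all topics at once. The function name `update_lambda`, the 0-based word indices, and the toy data are assumptions for illustration.

```python
def update_lambda(beta, docs, nu, num_topics):
    """Return lam[k][v] = beta_v + sum over tokens of nu_{d,m,k} * [w_{d,m} = v].

    beta: list of V prior parameters.
    docs: docs[d][m] = 0-based word index w_{d,m}.
    nu:   nu[d][m][k] = q(z_{d,m} = k).
    """
    lam = [list(beta) for _ in range(num_topics)]
    for words, nu_d in zip(docs, nu):
        for w, nu_dm in zip(words, nu_d):
            for k in range(num_topics):
                lam[k][w] += nu_dm[k]   # the one-hot vector picks out entry w
    return lam


# Toy values (assumed): V = 3 words, K = 2 topics, two short documents.
beta = [0.1, 0.1, 0.1]
docs = [[0, 2], [2]]
nu = [[[1.0, 0.0], [0.25, 0.75]], [[0.5, 0.5]]]
lam = update_lambda(beta, docs, nu, num_topics=2)
```

Summed over all topics and words, `lam` exceeds the prior mass by exactly the number of tokens in the corpus, since each token contributes a total weight of 1.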
Coordinate-ascent VI algorithm
···
Initialize lambda randomly
···