Introduction.— Machine learning gathers considerable attention in a wide range of fields, and much effort is devoted to developing effective algorithms. Variational Bayes (VB) inference [1–6] is one of the most fundamental methods in machine learning and is widely used for parameter estimation and model selection. In particular, VB compensates for some disadvantages of the expectation-maximization (EM) algorithm [5–7], which is a widely used approach for maximum likelihood estimation. For example, overfitting, which often occurs in EM, is greatly moderated in VB. Furthermore, a variant of VB based on classical statistical mechanics, which we call simulated annealing variational Bayes (SAVB) inference in this paper, was proposed [8] and has become popular in many fields due to its effectiveness. However, it is also known that VB and SAVB often fail to estimate appropriate parameters of an assumed model, depending on prior distributions and initial conditions.

In the field of physics, the study of quantum computation and of how to exploit it for machine learning is gaining popularity. For example, while experimentalists are intensively developing quantum machines [9–13], theorists are developing quantum error correction schemes [14–18] and quantum algorithms [19–29]. In particular, the study of quantum annealing (QA) has a history of more than two decades [22–25] and is still progressing [26].

In this Letter, by focusing on QA and VB, we devise a quantum-mechanically inspired algorithm that works on a classical computer in practical time and achieves a considerable improvement over VB and SAVB. More specifically, we introduce the mathematical mechanism of quantum fluctuations into VB and propose a new algorithm, which we call quantum annealing variational Bayes (QAVB) inference. To demonstrate the performance of QAVB, we consider a clustering problem and employ a Gaussian mixture model, which is one of the important applications of VB. We then see that QAVB succeeds in estimation with high probability while VB and SAVB do not. This fact is noteworthy because our algorithm is one of the few that can obtain a global optimum of a non-convex optimization in practical computational time without using random numbers.

Problem setting of VB.— In preparation for a quantum extension of VB, we briefly review the problem setting of VB [1–6]. First, we summarize the definitions of variables. Suppose that we have $N$ data points $Y^{\mathrm{obs}} = \{y_i^{\mathrm{obs}}\}_{i=1}^N$, which are independent and identically distributed according to the conditional distribution $p^{y,\sigma|\theta}(y_i, \sigma_i|\theta)$, where $y_i$, $\sigma_i$, and $\theta$ are an observable variable, a hidden variable, and a parameter, respectively. Thus, we have $p^{Y,\Sigma|\theta}(Y, \Sigma|\theta) = \prod_{i=1}^N p^{y,\sigma|\theta}(y_i, \sigma_i|\theta)$, where $Y = \{y_i\}_{i=1}^N$ and $\Sigma = \{\sigma_i\}_{i=1}^N$. The joint distribution is also given by $p^{Y,\Sigma,\theta}(Y, \Sigma, \theta) = p^{Y,\Sigma|\theta}(Y, \Sigma|\theta)\, p^{\theta}_{\mathrm{pr}}(\theta)$, where $p^{\theta}_{\mathrm{pr}}(\theta)$ denotes the prior distribution of $\theta$. Furthermore, we define the domains of $\Sigma$ and $\theta$ as $S^{\Sigma} := \bigotimes_{i=1}^{N} S^{\sigma}$ and $S^{\theta}$, respectively.

The goal of VB is to approximate the posterior distribution given by $p^{\Sigma,\theta|Y}(\Sigma, \theta|Y^{\mathrm{obs}}) = p^{Y,\Sigma,\theta}(Y^{\mathrm{obs}}, \Sigma, \theta)/p^{Y}(Y^{\mathrm{obs}})$, with $p^{Y}(Y^{\mathrm{obs}}) = \sum_{\Sigma \in S^{\Sigma}} \int_{\theta \in S^{\theta}} d\theta\, p^{Y,\Sigma,\theta}(Y^{\mathrm{obs}}, \Sigma, \theta)$, in the mean field approximation. Here, we have used Bayes' theorem for the derivation of the posterior distribution. Using a variational function $q^{\Sigma,\theta}(\Sigma, \theta)$ that satisfies $\sum_{\Sigma \in S^{\Sigma}} \int_{\theta \in S^{\theta}} d\theta\, q^{\Sigma,\theta}(\Sigma, \theta) = 1$, the objective function of VB is given by

$$\mathrm{KL}\Big(q^{\Sigma,\theta}(\cdot,\cdot) \,\Big\|\, p^{\Sigma,\theta|Y}(\cdot,\cdot|Y^{\mathrm{obs}})\Big) := -\sum_{\Sigma \in S^{\Sigma}} \int_{\theta \in S^{\theta}} d\theta\, q^{\Sigma,\theta}(\Sigma, \theta) \ln \frac{p^{\Sigma,\theta|Y}(\Sigma, \theta|Y^{\mathrm{obs}})}{q^{\Sigma,\theta}(\Sigma, \theta)}, \qquad (1)$$

which is the KL divergence [30, 31]. In VB, we minimize Eq. (1) in the mean field approximation given by

$$q^{\Sigma,\theta}(\Sigma, \theta) = q^{\Sigma}(\Sigma)\, q^{\theta}(\theta). \qquad (2)$$
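The mean-field minimization of Eq. (1) under the factorization of Eq. (2) can be illustrated with a minimal numerical sketch. Everything below is an illustrative assumption, not part of the Letter's setup: a fully discrete $4 \times 3$ toy posterior `p` (so that a sum stands in for the integral over $\theta$) and uniform initial factors. Alternating the standard closed-form coordinate updates of the two factors never increases the KL divergence.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical discrete toy posterior p(sigma, theta) on a 4 x 3 grid
# (theta discretized, so a sum stands in for the integral over theta)
p = rng.random((4, 3))
p /= p.sum()

# Mean-field factors of Eq. (2): q(sigma, theta) = q_sigma(sigma) * q_theta(theta)
q_sigma = np.full(4, 1 / 4)
q_theta = np.full(3, 1 / 3)

def kl(q_s, q_t):
    """Eq. (1) evaluated for the factorized trial distribution."""
    q = np.outer(q_s, q_t)
    return float(np.sum(q * np.log(q / p)))

history = [kl(q_sigma, q_theta)]
for _ in range(20):
    # Coordinate update of q_theta: proportional to exp(sum_sigma q_sigma ln p)
    q_theta = np.exp(q_sigma @ np.log(p))
    q_theta /= q_theta.sum()
    # Coordinate update of q_sigma: proportional to exp(sum_theta q_theta ln p)
    q_sigma = np.exp(np.log(p) @ q_theta)
    q_sigma /= q_sigma.sum()
    history.append(kl(q_sigma, q_theta))

# Each sweep is exact coordinate descent, so the KL divergence never increases.
assert all(b <= a + 1e-12 for a, b in zip(history, history[1:]))
```

The fixed point of these alternating updates is exactly the mean-field solution; with several random restarts one can also observe the dependence on initial conditions discussed below.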
Thus, by setting the functional derivatives of Eq. (1) under Eq. (2) with respect to $q^{\Sigma}(\Sigma)$ and $q^{\theta}(\theta)$ equal to 0 and solving for $q^{\Sigma}(\Sigma)$ and $q^{\theta}(\theta)$, we obtain the update equations for $\Sigma$ and $\theta$:

$$q^{\Sigma}_{t+1}(\Sigma) \propto \exp\left(\int_{\theta \in S^{\theta}} d\theta\, q^{\theta}_{t+1}(\theta) \ln p^{Y,\Sigma,\theta}(Y^{\mathrm{obs}}, \Sigma, \theta)\right), \qquad (3)$$

$$q^{\theta}_{t+1}(\theta) \propto \exp\left(\sum_{\Sigma \in S^{\Sigma}} q^{\Sigma}_{t}(\Sigma) \ln p^{Y,\Sigma,\theta}(Y^{\mathrm{obs}}, \Sigma, \theta)\right), \qquad (4)$$

where $q^{\Sigma}_{t}(\Sigma)$ and $q^{\theta}_{t}(\theta)$ are the distributions of $\Sigma$ and $\theta$ at the $t$-th iteration [5, 6].

VB is widely used due to its effectiveness. In some cases, the performance of VB is much better than that of EM [5–7], and VB can be directly used for model selection [1–6]. However, it is also known that the performance of VB heavily depends on initial conditions. To relax this problem, we introduce quantum fluctuations into VB in the rest of this Letter.

Quantum annealing variational Bayes inference.— Here, we formulate a quantum extension of VB. We first define the classical Hamiltonians via $p^{Y,\Sigma|\theta}(Y^{\mathrm{obs}}, \Sigma|\theta)$ and $p^{\theta}_{\mathrm{pr}}(\theta)$:

$$H^{\Sigma|\theta}_{\mathrm{cl}} := -\ln p^{Y,\Sigma|\theta}(Y^{\mathrm{obs}}, \Sigma|\theta), \qquad (5)$$
$$H^{\theta}_{\mathrm{pr}} := -\ln p^{\theta}_{\mathrm{pr}}(\theta). \qquad (6)$$

Next, we define operators $\hat\sigma_i$ and $\hat\theta$ whose eigenvalues are $\sigma_i$ and $\theta$, respectively; that is, $\hat\sigma_i$ and $\hat\theta$ satisfy $\hat\sigma_i |\sigma_i\rangle = \sigma_i |\sigma_i\rangle$ and $\hat\theta |\theta\rangle = \theta |\theta\rangle$, where $|\sigma_i\rangle$ and $|\theta\rangle$ are eigenstates of $\hat\sigma_i$ and $\hat\theta$, respectively. In this paper, we assume that $\hat\sigma_i$ and $\hat\theta$ commute with each other. Using the above definition of $|\sigma_i\rangle$, we also define $|\Sigma\rangle := \bigotimes_{i=1}^{N} |\sigma_i\rangle$. Then, we replace $\Sigma = \{\sigma_i\}_{i=1}^N$ and $\theta$ in Eqs. (5) and (6) by $\bigotimes_{j=1}^{i-1} \hat I^{\sigma_j} \otimes \hat\sigma_i \otimes \bigotimes_{j=i+1}^{N} \hat I^{\sigma_j}$ and $\hat\theta$, respectively, where $\hat I^{\sigma_i}$ denotes the identity operator for the space spanned by $|\sigma_i\rangle$. That is, we define

$$\hat H^{\Sigma|\theta}_{\mathrm{cl}} := \sum_{\Sigma \in S^{\Sigma}} \int_{\theta \in S^{\theta}} d\theta\, H^{\Sigma|\theta}_{\mathrm{cl}}\, \hat P^{\Sigma,\theta}, \qquad (7)$$
$$\hat H^{\theta}_{\mathrm{pr}} := \sum_{\Sigma \in S^{\Sigma}} \int_{\theta \in S^{\theta}} d\theta\, H^{\theta}_{\mathrm{pr}}\, \hat P^{\Sigma,\theta}, \qquad (8)$$

where $\hat P^{\Sigma,\theta} := \hat P^{\Sigma} \otimes \hat P^{\theta}$, $\hat P^{\Sigma} := \bigotimes_{i=1}^{N} \hat P^{\sigma_i}$, $\hat P^{\sigma_i} := |\sigma_i\rangle\langle\sigma_i|$, and $\hat P^{\theta} := |\theta\rangle\langle\theta|$. To introduce quantum fluctuations into VB, we define a Gibbs operator that involves a non-commutative term:

$$\hat f(\beta, s) := \exp\Big(-\hat H^{\theta}_{\mathrm{pr}} - \beta(1-s)\hat H^{\Sigma|\theta}_{\mathrm{cl}} - \beta s\, \hat H^{\Sigma}_{\mathrm{qu}}\Big), \qquad (9)$$

where $\hat H^{\Sigma}_{\mathrm{qu}}$ is a non-commutative term, defined as $\hat H^{\Sigma}_{\mathrm{qu}} := \sum_{i=1}^{N} \hat H^{\sigma_i}_{\mathrm{qu}}$, and $\hat H^{\sigma_i}_{\mathrm{qu}}$ is defined such that

$$\Big[\hat H^{\sigma_i}_{\mathrm{qu}},\; \bigotimes_{j=1}^{i-1} \hat I^{\sigma_j} \otimes \hat\sigma_i \otimes \bigotimes_{j=i+1}^{N} \hat I^{\sigma_j} \otimes \hat I^{\theta}\Big] \neq 0, \qquad (10)$$

for any $i$ [32]. Here, $\hat I^{\theta}$ represents the identity operator for the space spanned by $|\theta\rangle$. This Gibbs operator, Eq. (9), involves two annealing parameters $\beta$ and $s$, where, in terms of physics, $\beta$ is regarded as the inverse temperature and $s$ represents the strength of quantum fluctuations. Thus, when $s = 0$ and $\beta = 1$, we recover $\langle\Sigma, \theta|\hat f(\beta = 1, s = 0)|\Sigma, \theta\rangle = p^{Y,\Sigma,\theta}(Y^{\mathrm{obs}}, \Sigma, \theta)$. Although we consider only the quantization of $\Sigma$, the quantization of $\theta$ is almost straightforward [33].

Using Eq. (9), we define a quantum extension of the KL divergence [34] by

$$S\left(\hat\rho^{\Sigma,\theta} \,\middle\|\, \frac{\hat f(\beta, s)}{Z(\beta, s)}\right) := -\mathrm{Tr}_{\Sigma,\theta}\left[\hat\rho^{\Sigma,\theta}\left(\ln \frac{\hat f(\beta, s)}{Z(\beta, s)} - \ln \hat\rho^{\Sigma,\theta}\right)\right], \qquad (11)$$

where $Z(\beta, s) := \mathrm{Tr}_{\Sigma,\theta}\big[\hat f(\beta, s)\big]$ and $\mathrm{Tr}_{\Sigma,\theta}[\cdot] := \sum_{\Sigma \in S^{\Sigma}} \int_{\theta \in S^{\theta}} d\theta\, \langle\Sigma, \theta| \cdot |\Sigma, \theta\rangle$. Also, $\hat\rho^{\Sigma,\theta}$ denotes a density operator over $\Sigma$ and $\theta$ that satisfies $\mathrm{Tr}_{\Sigma,\theta}\big[\hat\rho^{\Sigma,\theta}\big] = 1$. In particular, when $\beta = 1$, $s = 0$, and $\hat\rho^{\Sigma,\theta}$ is diagonal, the quantum relative entropy, Eq. (11), reduces to the classical KL divergence, Eq. (1).

To derive the update equations, we repeat almost the same procedure as in VB; that is, we employ the mean field approximation $\hat\rho^{\Sigma,\theta} = \hat\rho^{\Sigma} \otimes \hat\rho^{\theta}$, where $\hat\rho^{\Sigma}$ and $\hat\rho^{\theta}$ represent the density operators for $\Sigma$ and $\theta$, respectively; then Eq. (11) can be reduced to [35]

$$\begin{aligned}
S\left(\hat\rho^{\Sigma} \otimes \hat\rho^{\theta} \,\middle\|\, \frac{\hat f(\beta, s)}{Z(\beta, s)}\right)
=\;& -\sum_{\Sigma \in S^{\Sigma}} \sum_{\Sigma' \in S^{\Sigma}} \int_{\theta \in S^{\theta}} d\theta \int_{\theta' \in S^{\theta}} d\theta'\, \langle\Sigma|\hat\rho^{\Sigma}|\Sigma'\rangle \langle\theta|\hat\rho^{\theta}|\theta'\rangle \big[\langle\Sigma'| \otimes \langle\theta'|\big] \ln \hat f(\beta, s) \big[|\Sigma\rangle \otimes |\theta\rangle\big] \\
& + \sum_{\Sigma \in S^{\Sigma}} \sum_{\Sigma' \in S^{\Sigma}} \langle\Sigma|\hat\rho^{\Sigma}|\Sigma'\rangle \langle\Sigma'|\ln \hat\rho^{\Sigma}|\Sigma\rangle + \int_{\theta \in S^{\theta}} d\theta \int_{\theta' \in S^{\theta}} d\theta'\, \langle\theta|\hat\rho^{\theta}|\theta'\rangle \langle\theta'|\ln \hat\rho^{\theta}|\theta\rangle \\
& + \ln Z(\beta, s). \qquad (12)
\end{aligned}$$

Next, by setting the functional derivatives of Eq. (12) with respect to $\langle\Sigma|\hat\rho^{\Sigma}|\Sigma'\rangle$ and $\langle\theta|\hat\rho^{\theta}|\theta'\rangle$ equal to 0 and solving for $\hat\rho^{\Sigma}$ and $\hat\rho^{\theta}$, we obtain the update equations [36]:

$$\hat\rho^{\Sigma}_{t+1} \propto \exp\Big(\mathrm{Tr}_{\theta}\big[\hat\rho^{\theta}_{t+1} \ln \hat f(\beta, s)\big]\Big), \qquad (13)$$
$$\hat\rho^{\theta}_{t+1} \propto \exp\Big(\mathrm{Tr}_{\Sigma}\big[\hat\rho^{\Sigma}_{t} \ln \hat f(\beta, s)\big]\Big), \qquad (14)$$

where $\mathrm{Tr}_{\Sigma}[\cdot] := \sum_{\Sigma \in S^{\Sigma}} \langle\Sigma| \cdot |\Sigma\rangle$, $\mathrm{Tr}_{\theta}[\cdot] := \int_{\theta \in S^{\theta}} d\theta\, \langle\theta| \cdot |\theta\rangle$, and $t$ stands for the number of iterations. We mention that $\mathrm{Tr}_{\Sigma}[\cdot]$ and $\mathrm{Tr}_{\theta}[\cdot]$ represent partial traces, and they yield operators on the spaces spanned by $|\theta\rangle$ and $|\Sigma\rangle$, respectively. We also note that the subscripts $t$ and $t+1$ on the right-hand sides of Eqs. (13) and (14) depend on the implementation of QAVB, and the normalization factors of Eqs. (13) and (14) are determined by the conditions on density operators $\mathrm{Tr}_{\Sigma}\big[\hat\rho^{\Sigma}\big] = 1$ and $\mathrm{Tr}_{\theta}\big[\hat\rho^{\theta}\big] = 1$. In QAVB, we iterate these two update equations, changing the annealing parameters $\beta$ and $s$, until a termination condition is satisfied. In this algorithm, we obtain the density operators $\hat\rho^{\Sigma}_{t}$ and $\hat\rho^{\theta}_{t}$ at each step, and their diagonal elements $\langle\Sigma|\hat\rho^{\Sigma}_{t}|\Sigma\rangle$ and $\langle\theta|\hat\rho^{\theta}_{t}|\theta\rangle$ represent the distributions of $\Sigma$ and $\theta$, respectively. In practical applications, we may use the mean $\mathrm{Tr}_{\theta}\big[\hat\theta \hat\rho^{\theta}\big]$ or the mode $\arg\max_{\theta} \langle\theta|\hat\rho^{\theta}|\theta\rangle$. Note that, when $\beta = 1$ and $s = 0$, Eqs. (13) and (14) exactly reduce to the update equations of VB, Eqs. (3) and (4). Finally, we summarize this algorithm in Algorithm 1.

ALGORITHM 1: Quantum annealing variational Bayes (QAVB) inference
1: set $\hat\rho^{\theta}_{\mathrm{pr}}$ and $t \leftarrow 0$, and initialize $\hat\rho^{\Sigma}_{0}$
2: set $\beta \leftarrow \beta_0$ and $s \leftarrow s_0$
3: while convergence criterion is not satisfied do
4:   compute $\hat\rho^{\theta}_{t+1}$ in Eq. (14)
5:   compute $\hat\rho^{\Sigma}_{t+1}$ in Eq. (13) for $i = 1, 2, \ldots, N$
6:   change $\beta$ and $s$
7:   $t \leftarrow t + 1$
8: end while
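One sweep of Eqs. (13) and (14), as in lines 4–5 of Algorithm 1, can be sketched numerically on a toy model. The sketch assumes a fully discretized joint space of dimension $3 \times 2$ with arbitrary illustrative Hamiltonians (not the Letter's GMM setup); since the exponent of Eq. (9) is Hermitian, $\ln \hat f(\beta, s)$ is available in closed form.

```python
import numpy as np

def expm_sym(A):
    """Matrix exponential of a symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.T

rng = np.random.default_rng(3)
dS, dT = 3, 2  # hypothetical dimensions of the discretized Sigma and theta spaces

# Diagonal classical Hamiltonians on the joint space, cf. Eqs. (5)-(8)
H_pr = np.kron(np.eye(dS), np.diag(-np.log(rng.random(dT))))  # prior term (theta part)
H_cl = np.diag(-np.log(rng.random(dS * dT)))                  # likelihood term
# Off-diagonal quantum term acting on the Sigma part, cf. Eq. (10)
H_qu = np.kron(np.ones((dS, dS)) - np.eye(dS), np.eye(dT))

def log_f(beta, s):
    """ln f(beta, s) from Eq. (9); exact because the exponent is Hermitian."""
    return -(H_pr + beta * (1 - s) * H_cl + beta * s * H_qu)

def qavb_step(rho_sigma, beta, s):
    """One sweep of Eqs. (14) then (13), as in lines 4-5 of Algorithm 1."""
    L = log_f(beta, s)
    # Eq. (14): rho_theta ∝ exp(Tr_Sigma[rho_Sigma ln f]); einsum takes the partial trace
    M = (np.kron(rho_sigma, np.eye(dT)) @ L).reshape(dS, dT, dS, dT)
    rho_theta = expm_sym(np.einsum('abad->bd', M))
    rho_theta /= np.trace(rho_theta)
    # Eq. (13): rho_Sigma ∝ exp(Tr_theta[rho_theta ln f])
    M = (np.kron(np.eye(dS), rho_theta) @ L).reshape(dS, dT, dS, dT)
    rho_sigma = expm_sym(np.einsum('abcb->ac', M))
    rho_sigma /= np.trace(rho_sigma)
    return rho_sigma, rho_theta

rho_sigma, rho_theta = qavb_step(np.eye(dS) / dS, beta=1.0, s=0.5)
assert np.isclose(np.trace(rho_sigma), 1.0) and np.isclose(np.trace(rho_theta), 1.0)
# At s = 0 every operator here is diagonal, so the update reduces to classical VB:
rho_sigma0, _ = qavb_step(np.eye(dS) / dS, beta=1.0, s=0.0)
assert np.allclose(rho_sigma0, np.diag(np.diag(rho_sigma0)))
```

The final assertion makes the limit explicit: switching off the quantum term ($s = 0$) leaves diagonal density operators, i.e., the classical VB updates of Eqs. (3) and (4).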
Gaussian mixture models.— To see the performance of QAVB, we consider the problem of estimating the parameters and the number of clusters of a GMM, as studied in Refs. [2, 5, 6]. The joint probability distribution of the GMM over an observable variable $y_i$ and a hidden variable $\sigma_i$, conditioned on a set of parameters $\theta$, is given by

$$p^{y,\sigma|\theta}(y_i, \sigma_i|\theta) = \sum_{k=1}^{K} \pi^k\, \mathcal{N}\big(y_i \,\big|\, \mu^k, (\Lambda^k)^{-1}\big)\, \delta_{k,\sigma_i}, \qquad (15)$$

where $\delta_{k,\sigma_i}$ is the Kronecker delta, $\{\pi^k\}_{k=1}^K$ are the mixing coefficients of the GMM, and $\mathcal{N}(y_i|\mu^k, (\Lambda^k)^{-1})$ is a Gaussian distribution whose mean and precision (the inverse of the covariance) are $\mu^k$ and $\Lambda^k$, respectively [37]. Here, we have assumed that each hidden variable $\sigma_i$ takes values $1, \ldots, K$; that is, $S^{\sigma} = \{k\}_{k=1}^K$ [38]. To simplify the notation, we denote $\{\pi^k\}_{k=1}^K$, $\{\mu^k\}_{k=1}^K$, and $\{\Lambda^k\}_{k=1}^K$ by $\pi$, $\mu$, and $\Lambda$, respectively, and we refer to $\{\pi, \mu, \Lambda\}$ collectively as $\theta$.

Taking the logarithm of Eq. (15), we define the Hamiltonian of the GMM for $\sigma_i$ with $y = y_i^{\mathrm{obs}}$ as

$$H^{\sigma_i|\theta}_{\mathrm{cl}} = -\ln p^{y,\sigma|\theta}(y_i^{\mathrm{obs}}, \sigma_i|\theta). \qquad (16)$$

Then the Hamiltonian of the GMM for $\Sigma = \{\sigma_i\}_{i=1}^N$ with $Y = Y^{\mathrm{obs}}$ is given by $H^{\Sigma|\theta}_{\mathrm{cl}} = \sum_{i=1}^{N} H^{\sigma_i|\theta}_{\mathrm{cl}}$. Using Eq. (7), we can also define the quantum representation of $H^{\Sigma|\theta}_{\mathrm{cl}}$ as $\hat H^{\Sigma|\theta}_{\mathrm{cl}}$.

To introduce quantum fluctuations into $\hat H^{\Sigma|\theta}_{\mathrm{cl}}$, a non-commutative term $\hat H^{\Sigma}_{\mathrm{qu}} = \sum_{i=1}^{N} \hat H^{\sigma_i}_{\mathrm{qu}}$ that satisfies Eq. (10) should be added. In this Letter, we adopt

$$\hat H^{\sigma_i}_{\mathrm{qu}} = \bigotimes_{j=1}^{i-1} \hat I^{\sigma_j} \otimes \Bigg(\sum_{\substack{k = 1, \ldots, K \\ l = k \pm 1}} |\sigma_i = l\rangle \langle\sigma_i = k|\Bigg) \otimes \bigotimes_{j=i+1}^{N} \hat I^{\sigma_j} \otimes \hat I^{\theta}, \qquad (17)$$

where $|\sigma_i = 0\rangle = |\sigma_i = K\rangle$ and $|\sigma_i = K+1\rangle = |\sigma_i = 1\rangle$. We note that the form of $\hat H^{\sigma_i}_{\mathrm{qu}}$ is not limited to the above definition and is in general arbitrary.

Numerical setup and results.— We assess the performance of three algorithms: QAVB, VB, and SAVB. In this numerical simulation, we use the data set shown in Fig. 1(a). The number of Gaussian mixtures of the generating model is $K_{\mathrm{gen}} = 10$. The means and covariances of the Gaussians are depicted by green crosses and blue lines in Fig. 1(a), respectively.

There are many candidates for annealing schedules, so we limit ourselves to the following. Let $\beta_t$ and $s_t$ be $\beta$ and $s$ at the $t$-th iteration, respectively. For QAVB, we vary $s_t$ and $\beta_t$ as $s_t = s_0 \times \max(1 - t/\tau_{\mathrm{QA1}}, 0.0)$ and

$$\beta_t = \begin{cases} \beta_0 & (t \le \tau_{\mathrm{QA1}}) \\ 1 + \dfrac{(\beta_0 - 1)(\tau_{\mathrm{QA2}} - t)}{\tau_{\mathrm{QA2}} - \tau_{\mathrm{QA1}}} & (\tau_{\mathrm{QA1}} \le t \le \tau_{\mathrm{QA2}}) \\ 1.0 & (t \ge \tau_{\mathrm{QA2}}) \end{cases}, \qquad (18)$$

respectively, where $s_0$ and $\beta_0$ are the initial values of the annealing schedules, $\tau_{\mathrm{QA1}}$ and $\tau_{\mathrm{QA2}}$ specify the time scales of the annealing schedules, and $\max(x, y)$ gives the maximum of $x$ and $y$. To visualize how $T_t := 1/\beta_t$ and $s_t$ behave in the above annealing schedules, we illustrate them in Fig. 1(b). The reason why we adopt the above annealing schedules will be discussed later. Note that QAVB with $s_0 = 0$ corresponds to SAVB, and SAVB with $\beta_0 = 1$ is identical to VB.

We show the numerical results of the three algorithms [39]. We set $K = 15$ hereafter. In Fig. 2(a), we first compare QAVB and VB by plotting the estimated
number of clusters and the posterior log-likelihood, which is given by

$$\mathcal{L}\big[q^{\Sigma}(\cdot)\, q^{\theta}(\cdot)\big] = -\mathrm{KL}\Big(q^{\Sigma}(\cdot)\, q^{\theta}(\cdot) \,\Big\|\, p^{\Sigma,\theta|Y}(\cdot, \cdot|Y^{\mathrm{obs}})\Big).$$

FIG. 1: (a) Data set generated by 10 Gaussian functions ($K_{\mathrm{gen}} = 10$). The means and covariances of the Gaussians are depicted by green crosses and blue lines, respectively. (b) Annealing schedules of QAVB. The red line represents $T_t = 1/\beta_t$ with $\beta_0 = 30.0$, and the green line depicts $s_t$ with $s_0 = 1.0$. We set $\tau_{\mathrm{QA1}} = 450$ and $\tau_{\mathrm{QA2}} = 500$.

FIG. 2: (a) Relation between the number of estimated clusters and the posterior log-likelihood for QAVB and VB, and (b) that for QAVB and SAVB. We set $s_0 = 1.0$ and $\beta_0 = 30.0$ for QAVB, and $\beta_0 = 0.9$ for SAVB. The horizontal axis represents the number of estimated clusters, while the vertical axis depicts the posterior log-likelihood. The error bars along the horizontal axis represent frequency, normalized to ten for VB and SAVB and to unity for QAVB.
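The annealing schedule of Eq. (18), together with $s_t = s_0 \max(1 - t/\tau_{\mathrm{QA1}}, 0)$, can be sketched directly; the parameter values below are those quoted in the caption of Fig. 1(b). The endpoint checks confirm that the quantum fluctuations vanish at $\tau_{\mathrm{QA1}}$ and that the final iterations run at $\beta = 1$, $s = 0$, i.e., as ordinary VB updates.

```python
# Schedule parameters taken from the caption of Fig. 1(b)
s0, beta0 = 1.0, 30.0
tau1, tau2 = 450, 500  # tau_QA1, tau_QA2

def schedule(t):
    """Return (beta_t, s_t): s_t = s0 * max(1 - t/tau_QA1, 0) and beta_t from Eq. (18)."""
    s_t = s0 * max(1.0 - t / tau1, 0.0)
    if t <= tau1:
        beta_t = beta0
    elif t <= tau2:
        beta_t = 1.0 + (beta0 - 1.0) * (tau2 - t) / (tau2 - tau1)
    else:
        beta_t = 1.0
    return beta_t, s_t

# Endpoint checks: s reaches 0 at tau_QA1, beta reaches 1 at tau_QA2
assert schedule(0) == (beta0, s0)
assert schedule(tau1) == (beta0, 0.0)
assert schedule(tau2) == (1.0, 0.0)
assert schedule(600) == (1.0, 0.0)
```

Setting `s0 = 0.0` here reproduces the SAVB schedule, and additionally fixing `beta0 = 1.0` reproduces plain VB, matching the correspondences stated above.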
A. Functional derivatives of G with respect to $\hat\rho^{\Sigma}$ and $\hat\rho^{\theta}$

Here, we derive Eq. (12) in the main text. By substituting the mean field approximation $\hat\rho^{\Sigma,\theta} = \hat\rho^{\Sigma} \otimes \hat\rho^{\theta}$ into Eq. (11), we obtain

$$\begin{aligned}
S\left(\hat\rho^{\Sigma} \otimes \hat\rho^{\theta} \,\middle\|\, \frac{\hat f(\beta, s)}{Z(\beta, s)}\right)
=\;& -\sum_{\Sigma \in S^{\Sigma}} \sum_{\Sigma' \in S^{\Sigma}} \int_{\theta \in S^{\theta}} d\theta \int_{\theta' \in S^{\theta}} d\theta'\, \big[\langle\Sigma| \otimes \langle\theta|\big] \hat\rho^{\Sigma} \otimes \hat\rho^{\theta} \big[|\Sigma'\rangle \otimes |\theta'\rangle\big] \big[\langle\Sigma'| \otimes \langle\theta'|\big] \ln \hat f(\beta, s) \big[|\Sigma\rangle \otimes |\theta\rangle\big] \\
& + \sum_{\Sigma \in S^{\Sigma}} \sum_{\Sigma' \in S^{\Sigma}} \int_{\theta \in S^{\theta}} d\theta \int_{\theta' \in S^{\theta}} d\theta'\, \big[\langle\Sigma| \otimes \langle\theta|\big] \hat\rho^{\Sigma} \otimes \hat\rho^{\theta} \big[|\Sigma'\rangle \otimes |\theta'\rangle\big] \big[\langle\Sigma'| \otimes \langle\theta'|\big] \ln\big(\hat\rho^{\Sigma} \otimes \hat\rho^{\theta}\big) \big[|\Sigma\rangle \otimes |\theta\rangle\big] \\
& + \ln Z(\beta, s), \qquad (22)
\end{aligned}$$

where $|\Sigma, \theta\rangle = |\Sigma\rangle \otimes |\theta\rangle$. This expression can be simplified further using the following identities:

$$\ln\big(\hat\rho^{a} \otimes \hat\rho^{b}\big) = \ln \hat\rho^{a} \otimes \hat I^{b} + \hat I^{a} \otimes \ln \hat\rho^{b}, \qquad (23)$$
$$\big[\langle a| \otimes \langle b|\big] \hat\rho^{a} \otimes \hat\rho^{b} \big[|a'\rangle \otimes |b'\rangle\big] = \langle a|\hat\rho^{a}|a'\rangle \langle b|\hat\rho^{b}|b'\rangle, \qquad (24)$$

where $\hat I^{a}$ and $\hat I^{b}$ are identity operators in the Hilbert spaces spanned by $|a\rangle$ and $|b\rangle$, respectively. Then we obtain

$$\begin{aligned}
S\left(\hat\rho^{\Sigma} \otimes \hat\rho^{\theta} \,\middle\|\, \frac{\hat f(\beta, s)}{Z(\beta, s)}\right)
=\;& -\sum_{\Sigma \in S^{\Sigma}} \sum_{\Sigma' \in S^{\Sigma}} \int_{\theta \in S^{\theta}} d\theta \int_{\theta' \in S^{\theta}} d\theta'\, \langle\Sigma|\hat\rho^{\Sigma}|\Sigma'\rangle \langle\theta|\hat\rho^{\theta}|\theta'\rangle \big[\langle\Sigma'| \otimes \langle\theta'|\big] \ln \hat f(\beta, s) \big[|\Sigma\rangle \otimes |\theta\rangle\big] \\
& + \sum_{\Sigma \in S^{\Sigma}} \sum_{\Sigma' \in S^{\Sigma}} \langle\Sigma|\hat\rho^{\Sigma}|\Sigma'\rangle \langle\Sigma'|\ln \hat\rho^{\Sigma}|\Sigma\rangle + \int_{\theta \in S^{\theta}} d\theta \int_{\theta' \in S^{\theta}} d\theta'\, \langle\theta|\hat\rho^{\theta}|\theta'\rangle \langle\theta'|\ln \hat\rho^{\theta}|\theta\rangle \\
& + \ln Z(\beta, s), \qquad (25)
\end{aligned}$$

which is identical to Eq. (12).

B. Derivation of update equations

We derive the update equations of QAVB, Eqs. (13) and (14), from Eq. (12). In preparation for the derivation, we prove the following equality:

$$\frac{\delta}{\delta \langle\Sigma|\hat\rho|\Sigma'\rangle} \mathrm{Tr}\big[\hat X \ln \hat\rho\big] = \big\langle\Sigma'\big|\hat X \hat\rho^{-1}\big|\Sigma\big\rangle, \qquad (26)$$

for any density operator $\hat\rho$ and any $\hat X$ that commutes with $\hat\rho$. The proof is as follows.

Proof. When $\hat 0 \prec \hat\rho \prec 2\hat I$, the definitions of the logarithm and the inverse are given by

$$\ln \hat\rho := \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} \big(\hat\rho - \hat I\big)^{n}, \qquad (27)$$
$$\hat\rho^{-1} := \sum_{n=1}^{\infty} (-1)^{n+1} \big(\hat\rho - \hat I\big)^{n-1}. \qquad (28)$$

By substituting Eq. (27) into the left-hand side of Eq. (26), we get

$$\frac{\delta}{\delta \langle\Sigma|\hat\rho|\Sigma'\rangle} \mathrm{Tr}\big[\hat X \ln \hat\rho\big] = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} \frac{\delta}{\delta \langle\Sigma|\hat\rho|\Sigma'\rangle} \mathrm{Tr}\Big[\hat X \big(\hat\rho - \hat I\big)^{n}\Big]. \qquad (29)$$

Each term in the summation in Eq. (29) can be calculated as

$$\frac{(-1)^{n+1}}{n} \frac{\delta}{\delta \langle\Sigma|\hat\rho|\Sigma'\rangle} \mathrm{Tr}\Big[\hat X \big(\hat\rho - \hat I\big)^{n}\Big] = \frac{(-1)^{n+1}}{n} \sum_{i=1}^{n} \Big\langle\Sigma'\Big| \big(\hat\rho - \hat I\big)^{n-i} \hat X \big(\hat\rho - \hat I\big)^{i-1} \Big|\Sigma\Big\rangle \qquad (30)$$
$$= (-1)^{n+1} \Big\langle\Sigma'\Big| \hat X \big(\hat\rho - \hat I\big)^{n-1} \Big|\Sigma\Big\rangle. \qquad (31)$$

We have used $[\hat X, \hat\rho] = 0$ in Eq. (31). By summing Eq. (31) over $n$, we have

$$\frac{\delta}{\delta \langle\Sigma|\hat\rho|\Sigma'\rangle} \mathrm{Tr}\big[\hat X \ln \hat\rho\big] = \sum_{n=1}^{\infty} (-1)^{n+1} \Big\langle\Sigma'\Big| \hat X \big(\hat\rho - \hat I\big)^{n-1} \Big|\Sigma\Big\rangle \qquad (32)$$
$$= \big\langle\Sigma'\big|\hat X \hat\rho^{-1}\big|\Sigma\big\rangle. \qquad (33)$$

Here, we note the definition of $\hat\rho^{-1}$, Eq. (28).
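The series definitions (27) and (28) can be checked numerically. The sketch below uses an arbitrary illustrative symmetric matrix with spectrum inside $(0, 2)$ (the convergence condition of the proof); the truncated series reproduce the eigendecomposition logarithm and the exact inverse.

```python
import numpy as np

rng = np.random.default_rng(2)
# Random symmetric density-like matrix with eigenvalues in (0, 2), so that the
# series (27) and (28) converge; the spectrum is an arbitrary illustrative choice.
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
rho = Q @ np.diag([0.4, 0.3, 0.2, 0.1]) @ Q.T

def series_log(rho, n_terms=400):
    """Eq. (27): ln rho = sum_n (-1)^{n+1}/n (rho - I)^n."""
    out, P = np.zeros_like(rho), np.eye(len(rho))
    X = rho - np.eye(len(rho))
    for n in range(1, n_terms + 1):
        P = P @ X                          # (rho - I)^n
        out += ((-1) ** (n + 1) / n) * P
    return out

def series_inv(rho, n_terms=400):
    """Eq. (28): rho^{-1} = sum_n (-1)^{n+1} (rho - I)^{n-1}."""
    out, P = np.zeros_like(rho), np.eye(len(rho))
    X = rho - np.eye(len(rho))
    for n in range(1, n_terms + 1):
        out += ((-1) ** (n + 1)) * P       # (rho - I)^{n-1}
        P = P @ X
    return out

# Reference logarithm via eigendecomposition, reference inverse via np.linalg.inv
w, V = np.linalg.eigh(rho)
log_ref = (V * np.log(w)) @ V.T
assert np.allclose(series_log(rho), log_ref)
assert np.allclose(series_inv(rho), np.linalg.inv(rho))
```

With the largest eigenvalue of $\hat\rho - \hat I$ at $0.9$ in magnitude, 400 terms put the truncation error far below floating-point tolerance.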
Next, by using Eq. (26), we derive the update equations of QAVB, Eqs. (13) and (14). The functional derivative of Eq. (12) with respect to $\langle\Sigma|\hat\rho^{\Sigma}|\Sigma'\rangle$ under the constraint $\mathrm{Tr}_{\Sigma}\big[\hat\rho^{\Sigma}\big] = 1$ is given by

$$\begin{aligned}
& \frac{\delta}{\delta \langle\Sigma|\hat\rho^{\Sigma}|\Sigma'\rangle} \left[ S\left(\hat\rho^{\Sigma} \otimes \hat\rho^{\theta} \,\middle\|\, \frac{\hat f(\beta, s)}{Z(\beta, s)}\right) - \alpha\Big(\mathrm{Tr}_{\Sigma}\big[\hat\rho^{\Sigma}\big] - 1\Big) \right] \\
&= -\int_{\theta \in S^{\theta}} d\theta \int_{\theta' \in S^{\theta}} d\theta'\, \langle\theta|\hat\rho^{\theta}|\theta'\rangle \big[\langle\Sigma'| \otimes \langle\theta'|\big] \ln \hat f(\beta, s) \big[|\Sigma\rangle \otimes |\theta\rangle\big] + \langle\Sigma'|\ln \hat\rho^{\Sigma}|\Sigma\rangle - (\alpha - 1)\langle\Sigma'|\hat I^{\Sigma}|\Sigma\rangle \qquad (34) \\
&= -\Big\langle\Sigma'\Big| \mathrm{Tr}_{\theta}\big[\hat\rho^{\theta} \ln \hat f(\beta, s)\big] \Big|\Sigma\Big\rangle + \langle\Sigma'|\ln \hat\rho^{\Sigma}|\Sigma\rangle - (\alpha - 1)\langle\Sigma'|\hat I^{\Sigma}|\Sigma\rangle, \qquad (35)
\end{aligned}$$

where $\alpha$ is a Lagrange multiplier. By solving

$$\frac{\delta}{\delta \langle\Sigma|\hat\rho^{\Sigma}|\Sigma'\rangle} \left[ S\left(\hat\rho^{\Sigma} \otimes \hat\rho^{\theta} \,\middle\|\, \frac{\hat f(\beta, s)}{Z(\beta, s)}\right) - \alpha\Big(\mathrm{Tr}_{\Sigma}\big[\hat\rho^{\Sigma}\big] - 1\Big) \right] = 0, \qquad (36)$$

we obtain

$$\langle\Sigma'|\ln \hat\rho^{\Sigma}|\Sigma\rangle = \Big\langle\Sigma'\Big| \mathrm{Tr}_{\theta}\big[\hat\rho^{\theta} \ln \hat f(\beta, s)\big] \Big|\Sigma\Big\rangle + (\alpha - 1)\langle\Sigma'|\hat I^{\Sigma}|\Sigma\rangle. \qquad (37)$$

Taking into account that $|\Sigma\rangle$ and $\langle\Sigma'|$ are arbitrary vectors, we obtain

$$\ln \hat\rho^{\Sigma} = \mathrm{Tr}_{\theta}\big[\hat\rho^{\theta} \ln \hat f(\beta, s)\big] + (\alpha - 1)\hat I^{\Sigma}. \qquad (38)$$

Hence we have the update equation for $\Sigma$, Eq. (13), where $\alpha$ contributes as a normalization factor. On the other hand, by using the same procedure, we obtain the update equation for $\theta$, Eq. (14).
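The key identity Eq. (26) itself admits a direct finite-difference check. In the sketch below, $\hat\rho$ and $\hat X = \hat\rho^2 + \hat\rho$ are arbitrary commuting test matrices (illustrative choices, not from the Letter), and the logarithm of the slightly non-symmetric perturbed matrix is evaluated with the series of Eq. (27), which still converges for a small perturbation.

```python
import numpy as np

rng = np.random.default_rng(4)
# Arbitrary symmetric test density matrix with spectrum inside (0, 2)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
rho = Q @ np.diag([0.4, 0.3, 0.2, 0.1]) @ Q.T
X = rho @ rho + rho  # commutes with rho by construction

def series_log(M, n_terms=500):
    """ln M via the series of Eq. (27); converges while the spectrum of M - I stays in the unit disk."""
    out, P = np.zeros_like(M), np.eye(len(M))
    D = M - np.eye(len(M))
    for n in range(1, n_terms + 1):
        P = P @ D
        out += ((-1) ** (n + 1) / n) * P
    return out

# Left-hand side of Eq. (26): numerical derivative of Tr[X ln rho] with respect
# to the single matrix element <Sigma| rho |Sigma'> = rho[i, j], at fixed X
i, j, eps = 0, 2, 1e-5
E = np.zeros((4, 4))
E[i, j] = 1.0
num = (np.trace(X @ series_log(rho + eps * E))
       - np.trace(X @ series_log(rho - eps * E))) / (2 * eps)
# Right-hand side of Eq. (26): <Sigma'| X rho^{-1} |Sigma>
ana = (X @ np.linalg.inv(rho))[j, i]
assert np.isclose(num, ana)
```

Note that the derivative is taken at fixed $\hat X$, exactly as in the functional derivatives leading to Eqs. (34)–(38); the agreement illustrates why the commutator condition $[\hat X, \hat\rho] = 0$ is needed in the proof above.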