
A limit relation for entropy and channel capacity per unit cost

arXiv:0704.0046v1 [quant-ph] 1 Apr 2007

Imre Csiszár^{1,4}, Fumio Hiai^{2,5} and Dénes Petz^{3,4}

^4 Alfréd Rényi Institute of Mathematics, H-1364 Budapest, POB 127, Hungary

^5 Graduate School of Information Sciences, Tohoku University, Aoba-ku, Sendai 980-8579, Japan

Abstract: In a quantum mechanical model, Diósi, Feldmann and Kosloff arrived at a conjecture stating that the limit of the entropy of certain mixtures is the relative entropy as system size goes to infinity. The conjecture is proven in this paper for density matrices. The first proof is analytic and uses the quantum law of large numbers. The second one clarifies the relation to channel capacity per unit cost for classical-quantum channels. Both proofs lead to generalizations of the conjecture.

Key words: Shannon entropy, von Neumann entropy, relative entropy, capacity per unit cost, Holevo bound.

^1 E-mail: csiszar@renyi.hu. Partially supported by the Hungarian Research Grant OTKA T068258.
^2 E-mail: hiai@math.is.tohoku.ac.jp. Partially supported by Grant-in-Aid for Scientific Research (B)17340043.
^3 E-mail: petz@math.bme.hu. Partially supported by the Hungarian Research Grant OTKA T068258.

Introduction

It was conjectured by Diósi, Feldmann and Kosloff in [4], based on thermodynamical considerations, that the von Neumann entropy of a quantum state equal to the mixture

    R_n := (1/n) ( ω ⊗ ρ^⊗(n−1) + ρ ⊗ ω ⊗ ρ^⊗(n−2) + · · · + ρ^⊗(n−1) ⊗ ω )

exceeds the entropy of a component asymptotically by the Umegaki relative entropy S(ω‖ρ), that is,

    S(R_n) − (n−1)S(ρ) − S(ω) → S(ω‖ρ)                                (1)

as n → ∞. Here ρ and ω are density matrices acting on a finite dimensional Hilbert space. Recall that S(ρ) = −Tr ρ log ρ and

    S(ω‖ρ) = Tr ω (log ω − log ρ)   if supp ω ≤ supp ρ,
    S(ω‖ρ) = +∞                     otherwise.
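The limit (1) is easy to probe numerically in low dimension. The following sketch is only an illustration (the particular qubit states ρ and ω, the eigenvalue cut-off, and all function names are ad hoc choices, not taken from the paper); it builds R_n for moderate n and compares S(R_n) − (n−1)S(ρ) − S(ω) with S(ω‖ρ).

```python
import numpy as np
from scipy.linalg import logm

def von_neumann_entropy(rho):
    # S(rho) = -Tr rho log rho, computed from the eigenvalues
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log(ev)))

def relative_entropy(omega, rho):
    # Umegaki relative entropy S(omega||rho), assuming supp omega <= supp rho
    return float(np.trace(omega @ (logm(omega) - logm(rho))).real)

def mixture_R(omega, rho, n):
    # R_n = (1/n) sum_k rho^{(x)k} (x) omega (x) rho^{(x)(n-1-k)}
    d = rho.shape[0]
    total = np.zeros((d ** n, d ** n), dtype=complex)
    for k in range(n):
        factors = [rho] * n
        factors[k] = omega
        term = factors[0]
        for f in factors[1:]:
            term = np.kron(term, f)
        total += term
    return total / n

# two fixed (non-commuting) qubit density matrices as a test case
rho = np.array([[0.7, 0.2], [0.2, 0.3]])
omega = np.array([[0.4, -0.1], [-0.1, 0.6]])

for n in range(2, 9):
    Rn = mixture_R(omega, rho, n)
    lhs = von_neumann_entropy(Rn) - (n - 1) * von_neumann_entropy(rho) \
          - von_neumann_entropy(omega)
    print(n, lhs)
print("S(omega||rho) =", relative_entropy(omega, rho))
```

The convergence in (1) is slow, but the printed values drift towards the relative entropy as n grows.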
Concerning the background of quantum entropy quantities, we refer to [10, 12]. Apparently no exact proof of (1) has been published even for the classical case, although for that case a heuristic proof is offered in [4].

In this paper, first an analytic proof of (1) is given for the case supp ω ≤ supp ρ, using an inequality between the Umegaki and the Belavkin-Staszewski relative entropies and the weak law of large numbers in the quantum case. In the second part of the paper it is clarified that the problem is related to the theory of classical-quantum channels. The essential observation is that S(R_n) − (n−1)S(ρ) − S(ω) in the conjecture is a Holevo quantity (classical-quantum mutual information) for a certain channel for which the relative entropy emerges as the capacity per unit cost.

The two different proofs lead to two different generalizations of the conjecture.

An analytic proof of the conjecture

In this section we assume that supp ω ≤ supp ρ for the support projections of ω and ρ. One can simply compute

    S(R_n ‖ ρ^⊗n) = Tr (R_n log R_n − R_n log ρ^⊗n)
                  = −S(R_n) − (n−1) Tr ρ log ρ − Tr ω log ρ.

Hence the identity

    S(R_n ‖ ρ^⊗n) = −S(R_n) + (n−1)S(ρ) + S(ω‖ρ) + S(ω)

holds. It follows that the conjecture (1) is equivalent to the statement

    S(R_n ‖ ρ^⊗n) → 0   as n → ∞

when supp ω ≤ supp ρ.


Recall the Belavkin-Staszewski relative entropy

    S_BS(ω‖ρ) = Tr ( ω log(ω^1/2 ρ^−1 ω^1/2) ) = −Tr ( ρ η(ρ^−1/2 ω ρ^−1/2) )

if supp ω ≤ supp ρ, where η(t) := −t log t, see [1, 10]. It was proved by Hiai and Petz that

    S(ω‖ρ) ≤ S_BS(ω‖ρ),                                               (2)

see [6], or Proposition 7.11 in [10].
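Inequality (2) can be sanity-checked on random states. The sketch below is again only an illustration (random full-rank states and ad hoc function names); it evaluates the Umegaki and Belavkin-Staszewski quantities with standard matrix functions.

```python
import numpy as np
from scipy.linalg import logm, sqrtm, inv

def umegaki(omega, rho):
    # S(omega||rho) = Tr omega (log omega - log rho)
    return float(np.trace(omega @ (logm(omega) - logm(rho))).real)

def belavkin_staszewski(omega, rho):
    # S_BS(omega||rho) = Tr omega log(omega^{1/2} rho^{-1} omega^{1/2})
    w = sqrtm(omega)
    return float(np.trace(omega @ logm(w @ inv(rho) @ w)).real)

def random_state(d, rng):
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    s = a @ a.conj().T
    return s / np.trace(s).real

rng = np.random.default_rng(0)
for _ in range(5):
    omega, rho = random_state(3, rng), random_state(3, rng)
    print(umegaki(omega, rho) <= belavkin_staszewski(omega, rho) + 1e-9)
```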
Theorem 1. If supp ω ≤ supp ρ, then S(R_n) − (n−1)S(ρ) − S(ω) → S(ω‖ρ) as n → ∞.

Proof: We want to use the quantum law of large numbers, see Proposition 1.17 in [10]. Assume that ω and ρ are d × d density matrices and we may suppose that ρ is invertible. Due to the GNS-construction with respect to the limit φ of the product states φ_n(A) = Tr ρ^⊗n A on the n-fold tensor products M_d(C)^⊗n, n ∈ N, all finite tensor products M_d(C)^⊗n are embedded into a von Neumann algebra M acting on a Hilbert space H. If γ denotes the right shift and X := ρ^−1/2 ω ρ^−1/2, then R_n is written as

    R_n = (ρ^1/2)^⊗n ( (1/n) Σ_{i=0}^{n−1} γ^i(X) ) (ρ^1/2)^⊗n.

By inequality (2), we get

    0 ≤ S(R_n ‖ ρ^⊗n) ≤ S_BS(R_n ‖ ρ^⊗n)
      = −Tr ρ^⊗n η( (ρ^−1/2)^⊗n R_n (ρ^−1/2)^⊗n )
      = −⟨Φ, η( (1/n) Σ_{i=0}^{n−1} γ^i(X) ) Φ⟩,                      (3)

where Φ is the cyclic vector in the GNS-construction.

The law of large numbers gives

    (1/n) Σ_{i=0}^{n−1} γ^i(X) → I

in the strong operator topology in B(H), since φ(X) = Tr ρ ρ^−1/2 ω ρ^−1/2 = 1. Since the continuous functional calculus preserves strong convergence (simply due to approximation by polynomials on a compact set), we obtain

    η( (1/n) Σ_{i=0}^{n−1} γ^i(X) ) → η(I) = 0   strongly.

This shows that the upper bound (3) converges to 0 and the proof is complete. □
By the same proof one can obtain that for

    R_{m,n} := (1/n) ( ω^⊗m ⊗ ρ^⊗(n−1) + ρ ⊗ ω^⊗m ⊗ ρ^⊗(n−2) + · · · + ρ^⊗(n−1) ⊗ ω^⊗m ),

the limit relation

    S(R_{m,n}) − (n−1)S(ρ) − m S(ω) → m S(ω‖ρ)                        (4)

holds as n → ∞ when m is fixed.
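The generalized mixture R_{m,n} can be examined with the same elementary machinery. A minimal sketch, assuming the qubit states and the entropy helpers of the earlier illustration are available under the names used there:

```python
import numpy as np

def kron_all(factors):
    out = factors[0]
    for f in factors[1:]:
        out = np.kron(out, f)
    return out

def mixture_R_m(omega, rho, m, n):
    # R_{m,n} = (1/n) sum_k rho^{(x)k} (x) omega^{(x)m} (x) rho^{(x)(n-1-k)}
    omega_m = kron_all([omega] * m)
    terms = [kron_all([rho] * k + [omega_m] + [rho] * (n - 1 - k)) for k in range(n)]
    return sum(terms) / n

# With the qubit rho, omega and the entropy helpers of the earlier sketch,
# S(R_{m,n}) - (n-1) S(rho) - m S(omega) can be compared with m S(omega||rho).
```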


In the next theorem we treat the probabilistic case in a matrix language. The proof includes the case when supp ω ≤ supp ρ is not true. Readers who are not familiar with the quantum setting of the previous theorem are advised to follow the arguments below.

Theorem 2. Assume that ρ and ω are commuting density matrices. Then S(R_n) − (n−1)S(ρ) − S(ω) → S(ω‖ρ) as n → ∞.

Proof: We may assume that ρ = Diag(λ_1, . . . , λ_ℓ, 0, . . . , 0) and ω = Diag(μ_1, . . . , μ_d) are d × d diagonal matrices, λ_1, . . . , λ_ℓ > 0 and ℓ < d. (We may consider ρ and ω in a matrix algebra of bigger size if ρ is invertible.) If supp ω ≤ supp ρ, then μ_{ℓ+1} = · · · = μ_d = 0; this will be called the regular case. When supp ω ≤ supp ρ is not true, we may assume that μ_d > 0, and we refer to this as the singular case.
The eigenvalues of R_n correspond to elements (i_1, . . . , i_n) of {1, . . . , d}^n:

    (1/n) ( μ_{i_1} λ_{i_2} · · · λ_{i_n} + λ_{i_1} μ_{i_2} λ_{i_3} · · · λ_{i_n} + · · · + λ_{i_1} · · · λ_{i_{n−1}} μ_{i_n} ).        (5)

We divide the eigenvalues into three different groups as follows:

(a) A corresponds to (i_1, . . . , i_n) ∈ {1, . . . , d}^n with 1 ≤ i_1, . . . , i_n ≤ ℓ,

(b) B corresponds to (i_1, . . . , i_n) ∈ {1, . . . , d}^n which contain exactly one coordinate larger than ℓ,

(c) C is the rest of the eigenvalues.

If the eigenvalue (5) is in group A, then it is

    [ (μ_{i_1}/λ_{i_1}) + · · · + (μ_{i_n}/λ_{i_n}) ] / n · λ_{i_1} λ_{i_2} · · · λ_{i_n}.
First we compute the contribution of the group A eigenvalues to the entropy S(R_n) = Σ η(eigenvalues of R_n), with η(t) := −t log t as before:

    Σ_A η(·) = Σ_{i_1,...,i_n} η( [ (μ_{i_1}/λ_{i_1}) + · · · + (μ_{i_n}/λ_{i_n}) ] / n · λ_{i_1} · · · λ_{i_n} ).

Below the summations are over 1 ≤ i_1, . . . , i_n ≤ ℓ:

    Σ_{i_1,...,i_n} η( [ (μ_{i_1}/λ_{i_1}) + · · · + (μ_{i_n}/λ_{i_n}) ] / n · λ_{i_1} · · · λ_{i_n} )

    = − Σ_{i_1,...,i_n} [ (μ_{i_1}/λ_{i_1}) + · · · + (μ_{i_n}/λ_{i_n}) ] / n · λ_{i_1} · · · λ_{i_n} log(λ_{i_1} · · · λ_{i_n}) + Q_n

    = − (1/n) Σ_{k=1}^n ( Σ_{i_1,...,i_n} μ_{i_k} λ_{i_1} · · · λ_{i_{k−1}} λ_{i_{k+1}} · · · λ_{i_n} ( log λ_{i_1} + · · · + log λ_{i_n} ) ) + Q_n

    = − (1/n) Σ_{k=1}^n ( Σ_{i=1}^ℓ μ_i log λ_i + (n−1) ( Σ_{i=1}^ℓ μ_i ) ( Σ_{i=1}^ℓ λ_i log λ_i ) ) + Q_n

    = (n−1) ( Σ_{i=1}^ℓ μ_i ) S(ρ) − Σ_{i=1}^ℓ μ_i log λ_i + Q_n,

where

    Q_n := Σ_{i_1,...,i_n} ( λ_{i_1} · · · λ_{i_n} ) η( [ (μ_{i_1}/λ_{i_1}) + · · · + (μ_{i_n}/λ_{i_n}) ] / n ).

Consider a probability space

    (Ω, P) := ( {1, . . . , ℓ}^N, (λ_1, . . . , λ_ℓ)^N ),

where (λ_1, . . . , λ_ℓ)^N is the product of the measure on {1, . . . , ℓ} with the distribution (λ_1, . . . , λ_ℓ). For each n ∈ N let X_n be a random variable on Ω depending on the nth coordinate in {1, . . . , ℓ} so that the value of X_n at i ∈ {1, . . . , ℓ} is μ_i/λ_i. Then X_1, X_2, . . . are identically distributed independent random variables and Q_n is the expectation value of

    η( (X_1 + · · · + X_n)/n ).

The strong law of large numbers says that

    (X_1 + · · · + X_n)/n → E(X_1) = Σ_{i=1}^ℓ λ_i (μ_i/λ_i) = Σ_{i=1}^ℓ μ_i   almost surely.

Since η( (X_1 + · · · + X_n)/n ) is uniformly bounded, the Lebesgue bounded convergence theorem implies that

    Q_n → η( Σ_{i=1}^ℓ μ_i )

as n → ∞.
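The behaviour of Q_n is easy to visualise in a small classical example. The sketch below is an illustration only (the vectors λ and μ are ad hoc; μ is chosen with Σ_i μ_i < 1 so that the singular situation is visible); it estimates Q_n = E η((X_1 + · · · + X_n)/n) by Monte Carlo and compares it with η(Σ_i μ_i).

```python
import numpy as np

def eta(t):
    # eta(t) = -t log t, with eta(0) = 0
    t = np.asarray(t, dtype=float)
    safe = np.where(t > 0, t, 1.0)
    return np.where(t > 0, -t * np.log(safe), 0.0)

# lambda_i: eigenvalues of rho on its support; mu_i: the corresponding entries of omega
lam = np.array([0.5, 0.3, 0.2])
mu = np.array([0.2, 0.1, 0.4])        # sums to 0.7 < 1, as in the singular case
ratios = mu / lam                     # the values mu_i / lambda_i of X_k

rng = np.random.default_rng(1)
samples = 5000
for n in [10, 100, 1000]:
    idx = rng.choice(len(lam), size=(samples, n), p=lam)
    Qn = eta(ratios[idx].mean(axis=1)).mean()
    print(n, float(Qn))
print("eta(sum mu_i) =", float(eta(mu.sum())))
```

In the regular case Σ_i μ_i = 1 and the same experiment shows Q_n tending to η(1) = 0.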

In the regular case Σ_{i=1}^ℓ μ_i = 1, hence Q_n → η(1) = 0, and all non-zero eigenvalues of R_n are in group A. Hence we have

    S(R_n) − (n−1)S(ρ) − S(ω) = − Σ_{i=1}^ℓ μ_i log λ_i + Σ_{i=1}^ℓ μ_i log μ_i + Q_n = S(ω‖ρ) + Q_n

and the statement is clear.


Next we consider the singular case, when we have

    Σ_A η(·) = (n−1) ( Σ_{i=1}^ℓ μ_i ) S(ρ) + O(1),

and we turn to the eigenvalues in B. If the eigenvalue corresponding to (i_1, . . . , i_n) ∈ {1, . . . , d}^n is in group B and the exceptional coordinate is i_1 = j > ℓ, then the eigenvalue is

    (1/n) μ_j λ_{i_2} · · · λ_{i_n}.

It follows that

    Σ_{i_2,...,i_n ≤ ℓ} η( (μ_j/n) λ_{i_2} · · · λ_{i_n} )
    = − (μ_j/n) Σ_{i_2,...,i_n ≤ ℓ} λ_{i_2} · · · λ_{i_n} log(λ_{i_2} · · · λ_{i_n}) − (μ_j/n) log(μ_j/n)
    = (μ_j/n)(n−1)S(ρ) − (μ_j/n) log(μ_j/n).

When the exceptional coordinate is i_2, . . . , i_n, we get the same quantity, so this should be multiplied by n; summing also over j = ℓ+1, . . . , d gives

    Σ_B η(·) = Σ_{j=ℓ+1}^d ( μ_j (n−1)S(ρ) − μ_j log(μ_j/n) )
             = ( 1 − Σ_{i=1}^ℓ μ_i ) (n−1)S(ρ) + ( 1 − Σ_{i=1}^ℓ μ_i ) log n + O(1).

We make a lower estimate on the entropy of R_n in such a way that we add up η(·) when the eigenvalue runs over A and B. It is clear now that

    S(R_n) − (n−1)S(ρ) − S(ω) ≥ Σ_A η(·) + Σ_B η(·) − (n−1)S(ρ) − S(ω)
    = ( 1 − Σ_{i=1}^ℓ μ_i ) log n + O(1) ≥ μ_d log n + O(1) → +∞

as n → ∞, which is the claimed limit, since S(ω‖ρ) = +∞ in the singular case. □

Interpretation as capacity

A classical-quantum channel with classical input alphabet X transfers the input x ∈ X into the output W(x) ≡ ρ_x, which is a density matrix acting on a Hilbert space K. We restrict ourselves to the case when X is finite and K is finite dimensional.

If a classical random variable X is chosen to be the input, with probability distribution P = {p(x) : x ∈ X}, then the corresponding output is the quantum state ρ_X := Σ_{x∈X} p(x) ρ_x. When a measurement is performed on the output quantum system, it gives rise to an output random variable Y which is jointly distributed with the input X. If a partition of unity {F_y : y ∈ X} in B(K) describes the measurement, then

    Prob(Y = y | X = x) = Tr ρ_x F_y   (x, y ∈ X).                    (6)
According to the Holevo bound, we have

    I(X ∧ Y) := H(Y) − H(Y|X) ≤ I(X, W) := S(ρ_X) − Σ_{x∈X} p(x) S(ρ_x),     (7)

which is actually a simple consequence of the monotonicity of the relative entropy under state transformation [7], see also [11]. I(X, W) is the so-called Holevo quantity or classical-quantum mutual information, and it satisfies the identity

    Σ_{x∈X} p(x) S(ρ_x ‖ σ) = I(X, W) + S(ρ_X ‖ σ),                   (8)

where σ is an arbitrary density matrix.
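Identity (8) is straightforward to verify numerically. In the following sketch (an illustration; the ensemble, the reference density σ and the function names are arbitrary choices) the two sides of (8) agree up to rounding.

```python
import numpy as np
from scipy.linalg import logm

def S(rho):
    # von Neumann entropy from the eigenvalues
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log(ev)))

def D(a, b):
    # Umegaki relative entropy S(a||b)
    return float(np.trace(a @ (logm(a) - logm(b))).real)

rng = np.random.default_rng(2)
def random_state(d):
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    s = a @ a.conj().T
    return s / np.trace(s).real

states = [random_state(2) for _ in range(3)]   # rho_x for x = 0, 1, 2
p = np.array([0.5, 0.3, 0.2])                  # input distribution
sigma = random_state(2)                        # arbitrary reference density

rho_X = sum(px * rx for px, rx in zip(p, states))
holevo = S(rho_X) - sum(px * S(rx) for px, rx in zip(p, states))

lhs = sum(px * D(rx, sigma) for px, rx in zip(p, states))
rhs = holevo + D(rho_X, sigma)
print(lhs, rhs)       # identity (8): both sides coincide
```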


The channel is used to transfer sequences from the classical alphabet; x = (x_1, x_2, . . . , x_n) ∈ X^n is transferred into the quantum state W^n(x) = ρ_x := ρ_{x_1} ⊗ ρ_{x_2} ⊗ · · · ⊗ ρ_{x_n}. A code for the channel W^n is defined by a subset A_n ⊂ X^n, which is called the codeword set. The decoder is a measurement {F_y : y ∈ X^n}. The probability of error is Prob(X ≠ Y), where X is the input random variable uniformly distributed on A_n and the output random variable Y is determined by (6), with x and y replaced by x and y.
The essential observation is the fact that S(R_n) − (n−1)S(ρ) − S(ω) in the conjecture is a Holevo quantity in the case of a channel with input sequences (x_1, x_2, . . . , x_n) ∈ {0, 1}^n and outputs ρ_{x_1} ⊗ ρ_{x_2} ⊗ · · · ⊗ ρ_{x_n}, where ρ_0 = ω, ρ_1 = ρ and the codewords are all sequences containing exactly one 0. More generally, we shall consider Holevo quantities

    I(A, ρ_0, ρ_1) := S( (1/|A|) Σ_{x∈A} ρ_x ) − (1/|A|) Σ_{x∈A} S(ρ_x)

defined for any set A ⊂ {0, 1}^n of binary sequences of length n.

The concept related to the conjecture we study is the channel capacity per unit cost, which is defined next, for simplicity only in the case where X = {0, 1}, the cost of the character 0 ∈ X is 1, and the cost of 1 ∈ X is 0.

For a memoryless channel with a binary input alphabet X = {0, 1} and an ε > 0, a number R > 0 is called an ε-achievable rate per unit cost if for every δ > 0 and for any sufficiently large T, there exists a code of length n > T with at least e^{T(R−δ)} codewords such that each of the codewords contains at most T 0's and the error probability is at most ε. The largest R which is an ε-achievable rate per unit cost for every ε > 0 is the channel capacity per unit cost.

Lemma 1. For an arbitrary A ⊂ {0, 1}^n,

    I(A, ρ_0, ρ_1) ≤ c(A) S(ρ_0 ‖ ρ_1)

holds, where

    c(A) := (1/|A|) Σ_{x∈A} |{i : x_i = 0}|.

Proof: Let c(x) := |{i : x_i = 0}| for x ∈ A. Since I(A, ρ_0, ρ_1) is a particular Holevo quantity I(X, W^n), we can use the identity (8) to get the upper bound

    (1/|A|) Σ_{x∈A} S(ρ_x ‖ ρ_1^⊗n) = (1/|A|) Σ_{x∈A} c(x) S(ρ_0 ‖ ρ_1) = c(A) S(ρ_0 ‖ ρ_1)

for I(A, ρ_0, ρ_1). □
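Lemma 1 can be checked directly on small codeword sets. A hedged sketch (the particular ρ_0, ρ_1 and the set A, here the length-4 words with exactly one 0, are ad hoc; building ρ_x as an explicit tensor product keeps n small):

```python
import numpy as np
from scipy.linalg import logm

def S(rho):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log(ev)))

def D(a, b):
    # Umegaki relative entropy S(a||b)
    return float(np.trace(a @ (logm(a) - logm(b))).real)

rho0 = np.array([[0.4, -0.1], [-0.1, 0.6]])   # output for input 0
rho1 = np.array([[0.7, 0.2], [0.2, 0.3]])     # output for input 1

def rho_of(word):
    out = np.ones((1, 1))
    for bit in word:
        out = np.kron(out, rho0 if bit == 0 else rho1)
    return out

A = [(0, 1, 1, 1), (1, 0, 1, 1), (1, 1, 0, 1), (1, 1, 1, 0)]   # exactly one 0
avg = sum(rho_of(x) for x in A) / len(A)
holevo = S(avg) - sum(S(rho_of(x)) for x in A) / len(A)
cost = sum(x.count(0) for x in A) / len(A)
print(holevo, "<=", cost * D(rho0, rho1))
```

This particular A is exactly the codeword set behind the conjecture for n = 4, so the printed Holevo quantity is S(R_4) − 3 S(ρ) − S(ω) when ρ_0 = ω and ρ_1 = ρ.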
Lemma 2. If A ⊂ {0, 1}^n is a code for the channel W^n whose probability of error (for some decoding scheme) does not exceed a given 0 < ε < 1, then

    (1 − ε) log |A| − log 2 ≤ I(A, ρ_0, ρ_1).

Proof: The right-hand side is a bound for the classical mutual information I(X ∧ Y) = H(Y) − H(Y|X), where Y is the channel output, see (7). Since the error probability Prob(X ≠ Y) is smaller than ε, application of the Fano inequality (see [3]) gives

    H(X|Y) ≤ ε log |A| + log 2.

Therefore

    I(X ∧ Y) = H(X) − H(X|Y) ≥ (1 − ε) log |A| − log 2,

and the proof is complete. □
The above two lemmas show that the relative entropy S(ρ_0 ‖ ρ_1) is an upper bound for the channel capacity per unit cost of the channel W(0) = ρ_0, W(1) = ρ_1 with binary input alphabet. In fact, assume that R > 0 is an ε-achievable rate. For every δ > 0 and T > 0 there is a code A ⊂ {0, 1}^n for which we get by Lemmas 1 and 2

    T S(ρ_0 ‖ ρ_1) ≥ c(A) S(ρ_0 ‖ ρ_1) ≥ I(A, ρ_0, ρ_1)
                   ≥ (1 − ε) log |A| − log 2
                   ≥ (1 − ε) T (R − δ) − log 2.

Since T is arbitrarily large and ε, δ are arbitrarily small, R ≤ S(ρ_0 ‖ ρ_1) follows. That S(ρ_0 ‖ ρ_1) equals the channel capacity per unit cost will be verified below.
Theorem 3. Let the classical-quantum channel W : X = {0, 1} → B(K) be defined as W(0) = ρ_0 and W(1) = ρ_1. Assume that A_n ⊂ {0, 1}^n is chosen such that

(a) each element x = (x_1, x_2, . . . , x_n) ∈ A_n contains at most κ copies of 0,

(b) log |A_n| / log n → c as n → ∞,

(c) c(A_n) := (1/|A_n|) Σ_{x∈A_n} |{i : x_i = 0}| → c as n → ∞

for some real number c > 0 and for some natural number κ. If the random variable X_n has a uniform distribution on A_n, then

    lim_{n→∞} ( S(ρ_{X_n}) − (1/|A_n|) Σ_{x∈A_n} S(ρ_x) ) = c S(ρ_0 ‖ ρ_1).
The proof of the theorem is divided into lemmas. We need the direct part of the so-called quantum Stein lemma obtained in [6], see also [2, 5, 9, 12].

Lemma 3. Let ρ_0 and ρ_1 be density matrices. For every ε > 0 and 0 < R < S(ρ_0 ‖ ρ_1), if N is sufficiently large, then there is a projection E ∈ B(K^⊗N) such that

    α_N[E] := Tr ρ_0^⊗N (I − E) < ε

and for β_N[E] := Tr ρ_1^⊗N E the estimate

    (1/N) log β_N[E] < −R

holds.

Note that α_N is called the error of the first kind, while β_N is the error of the second kind.
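The content of Lemma 3 can be seen numerically in the commuting (classical) case, where the test E is a likelihood-ratio test with threshold N(S(ρ_0 ‖ ρ_1) − δ). The sketch below is an illustration only (two ad hoc binary distributions standing for the diagonals of ρ_0 and ρ_1); it computes the two error probabilities exactly and compares −(1/N) log β_N with S(ρ_0 ‖ ρ_1).

```python
import numpy as np
from scipy.stats import binom

# two commuting "states": the diagonal densities Diag(p, 1-p) and Diag(q, 1-q)
p, q = 0.8, 0.3
D = p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))   # S(rho_0||rho_1)
delta = 0.05                                                   # target rate R = D - delta

for N in [25, 50, 100, 200, 400]:
    k = np.arange(N + 1)                          # number of 0-symbols observed
    llr = k * np.log(p / q) + (N - k) * np.log((1 - p) / (1 - q))
    accept = llr > N * (D - delta)                # test E for the 0-hypothesis
    alpha = binom.pmf(k, N, p)[~accept].sum()     # error of the first kind
    beta = binom.pmf(k, N, q)[accept].sum()       # error of the second kind
    print(N, float(alpha), float(-np.log(beta) / N), float(D))
```

Here α_N tends to 0 by the law of large numbers, while the change-of-measure bound guarantees β_N ≤ e^{−N(D−δ)}, the behaviour asserted by the lemma.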
Lemma 4. Assume that ε > 0, 0 < R < S(ρ_0 ‖ ρ_1), κ is a positive integer and the sequences x in A_n ⊂ {0, 1}^n contain at most κ copies of 0. Let the codewords be the N-fold repetitions x^N = (x, x, . . . , x) of the sequences x ∈ A_n. If N is the integer part of

    (1/R) log (2n/ε)

and n is large enough, then there is a decoding scheme such that the error probability is smaller than ε.
Proof: We follow the probabilistic construction in [13]. Let the codewords be the N-fold repetitions x^N = (x, x, . . . , x) of the sequences x ∈ A_n. The corresponding output density matrices act on the Hilbert space K^⊗Nn ≡ (K^⊗n)^⊗N. We decompose this Hilbert space into an N-fold product in a different way. For each 1 ≤ i ≤ n, let K_i be the tensor product of the factors i, i + n, i + 2n, . . . , i + (N−1)n. So K^⊗Nn is identified with K_1 ⊗ K_2 ⊗ · · · ⊗ K_n.

For each 1 ≤ i ≤ n we perform a hypothesis test on the Hilbert space K_i. The 0-hypothesis is that the ith component of the actually chosen x ∈ A_n is 0. Based on the channel outputs at the time instances i, i + n, . . . , i + (N−1)n, the 0-hypothesis is tested against the alternative hypothesis that the ith component of x is 1. According to the quantum Stein lemma (Lemma 3), given any δ > 0 and 0 < R < S(ρ_0 ‖ ρ_1), for N sufficiently large there exists a test E_i such that the probability of error of the first kind is smaller than δ, while the probability of error of the second kind is smaller than e^{−NR}.

The projections E_i and I − E_i form a partition of unity in the Hilbert space K_i, and the n-fold tensor product of these commuting projections gives a partition of unity in K^⊗Nn. Let y ∈ {0, 1}^n and set F_y := ⊗_{i=1}^n F_{y_i}, where F_{y_i} = E_i if y_i = 0 and F_{y_i} = I − E_i if y_i = 1. Therefore, the result of decoding can be an arbitrary 0-1 sequence in {0, 1}^n. The decoding scheme gives y ∈ {0, 1}^n in such a way that y_i = 0 if the test accepted the 0-hypothesis for i and y_i = 1 if the alternative was accepted. The error probability should be estimated:

    Prob(Y ≠ X | X = x) = Σ_{y : y ≠ x} Tr ρ_x^⊗N F_y
                        ≤ Σ_{i=1}^n Σ_{y : y_i ≠ x_i} Π_{j=1}^n Tr ρ_{x_j}^⊗N F_{y_j}
                        = Σ_{i=1}^n Tr ρ_{x_i}^⊗N (I − F_{x_i}).

If x_i = 0, then

    Tr ρ_{x_i}^⊗N (I − F_{x_i}) = Tr ρ_0^⊗N (I − E_i) < δ,

because it is an error of the first kind. When x_i = 1,

    Tr ρ_{x_i}^⊗N (I − F_{x_i}) = Tr ρ_1^⊗N E_i ≤ e^{−RN}

from the error of the second kind. It follows that κδ + n e^{−NR} is a bound for the error probability, since at most κ components of x are 0. The first term will be small if δ is small. The second term will be small if N is large enough; with the choice of N in the statement, n e^{−NR} ≤ ε/2. If both terms are majorized by ε/2, then the statement of the lemma holds. We can choose n so large that the N defined in the statement is large enough. □
Proof of Theorem 3: Since Lemma 1 gives an upper bound, that is,

    limsup_n ( S(ρ_{X_n}) − (1/|A_n|) Σ_{x∈A_n} S(ρ_x) ) ≤ c S(ρ_0 ‖ ρ_1),

it remains to prove that

    liminf_n ( S(ρ_{X_n}) − (1/|A_n|) Σ_{x∈A_n} S(ρ_x) ) ≥ c S(ρ_0 ‖ ρ_1).

Lemma 4 is about the N-times repeated input X_n^N and describes a decoding scheme with error probability at most ε. According to Lemma 2 we have

    (1 − ε) log |A_n| − log 2 ≤ S(ρ_{X_n^N}) − (1/|A_n|) Σ_{x∈A_n} S(ρ_{x^N}).

From the subadditivity of the entropy we have

    S(ρ_{X_n^N}) ≤ N S(ρ_{X_n}),

and

    S(ρ_{x^N}) = N S(ρ_x)

holds due to additivity for product states. It follows that

    (1 − ε) (log |A_n|)/N − (log 2)/N ≤ S(ρ_{X_n}) − (1/|A_n|) Σ_{x∈A_n} S(ρ_x).

From the choice of N in Lemma 4 we have

    (log |A_n|)/N ≥ R ( log |A_n| / log n ) ( log n / (log n + log 2 − log ε) ),

and the lower bound is arbitrarily close to cR. Since R < S(ρ_0 ‖ ρ_1) was arbitrary, the proof is complete. □

References

[1] V. P. Belavkin and P. Staszewski, C*-algebraic generalization of relative entropy and entropy, Ann. Inst. Henri Poincaré, Sec. A 37(1982), 51–58.

[2] I. Bjelaković, J. Deuschel, T. Krüger, R. Seiler, R. Siegmund-Schultze and A. Szkoła, A quantum version of Sanov's theorem, Comm. Math. Phys. 260(2005), 659–671.

[3] T. M. Cover and J. A. Thomas, Elements of Information Theory, Second edition, Wiley-Interscience, Hoboken, NJ, 2006.

[4] L. Diósi, T. Feldmann and R. Kosloff, On the exact identity between thermodynamic and informatic entropies in a unitary model of friction, Int. J. Quantum Information 4(2006), 99–104.

[5] M. Hayashi, Quantum Information: An Introduction, Springer, 2006.

[6] F. Hiai and D. Petz, The proper formula for relative entropy and its asymptotics in quantum probability, Comm. Math. Phys. 143(1991), 99–114.

[7] A. S. Holevo, Some estimates for the amount of information transmittable by a quantum communication channel (in Russian), Problemy Peredachi Informacii 9(1973), 3–11.

[8] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, Cambridge, 2000.

[9] T. Ogawa and H. Nagaoka, Strong converse and Stein's lemma in quantum hypothesis testing, IEEE Trans. Inf. Theory 46(2000), 2428–2433.

[10] M. Ohya and D. Petz, Quantum Entropy and Its Use, Springer, 1993.

[11] M. Ohya, D. Petz and N. Watanabe, On capacities of quantum channels, Prob. Math. Stat. 17(1997), 179–196.

[12] D. Petz, Lectures on quantum information theory and quantum statistics, book manuscript in preparation.

[13] S. Verdú, On channel capacity per unit cost, IEEE Trans. Inform. Theory 36(1990), 1019–1030.

