DETERMINANT INEQUALITIES VIA INFORMATION THEORY

by

THOMAS M. COVER and JOY A. THOMAS

STANFORD UNIVERSITY

TECHNICAL REPORT NO. 65

NOVEMBER 1987

PREPARED UNDER THE AUSPICES OF
NATIONAL SCIENCE FOUNDATION GRANT ECS85-20136

DEPARTMENT OF STATISTICS
STANFORD UNIVERSITY
STANFORD, CALIFORNIA

Determinant Inequalities via Information Theory

Thomas M. Cover*    Joy A. Thomas†

* Departments of Electrical Engineering and Statistics, Stanford University.
† Department of Electrical Engineering, Stanford University.
This work was partially supported by Bell Communications Research and by NSF contract number ECS85-20136. A preliminary version of this is appearing as Bellcore Technical Memo TM-ARH-010-203.

Abstract

We use simple inequalities from information theory to prove Hadamard's inequality and some of its generalizations. We also prove that the determinant of a positive definite matrix is log-concave and that the ratio of the determinant of the matrix to the determinant of its principal minor, $|K_n|/|K_{n-1}|$, is concave, establishing the concavity of minimum mean squared error in linear prediction. We also show for Toeplitz matrices that the normalized determinant $|K_n|^{1/n}$ decreases with $n$.

1 Introduction.

The entropy inequalities of information theory have obvious intuitive meaning. For example, the entropy (or uncertainty) of a collection of random variables is less than or equal to the sum of their entropies. Letting the random variables be multivariate normal yields Hadamard's inequality [1],[2]. We shall find many such determinant inequalities using this technique. We use throughout the fact that if

$$\phi_K(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}|K|^{1/2}} \, e^{-\frac{1}{2}\mathbf{x}^t K^{-1}\mathbf{x}} \qquad (1)$$

is the multivariate normal density with mean 0 and covariance matrix $K$, then the entropy $h(X_1, X_2, \ldots, X_n)$ is given by

$$h(X_1, X_2, \ldots, X_n) = -\int \phi_K \ln \phi_K \, d\mathbf{x} = \frac{1}{2}\ln(2\pi e)^n |K|, \qquad (2)$$

where $|K|$ denotes the determinant of $K$, and $\ln$ denotes the natural logarithm. This equality is verified by direct computation with the use of

$$\int \phi_K(\mathbf{x}) \, \mathbf{x}^t K^{-1}\mathbf{x} \, d\mathbf{x} = \sum_{i,j} K^{-1}_{ij} K_{ij} = n = \ln e^n. \qquad (3)$$

First some information theory preliminaries, then the determinant inequalities.
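As a quick numerical illustration of Eq. (2), the following Python sketch estimates $h(X) = E[-\ln \phi_K(X)]$ by Monte Carlo and compares it with the closed form $\frac{1}{2}\ln(2\pi e)^n|K|$. This is an illustrative sketch only; the covariance matrix, sample size, and seed below are arbitrary choices.

```python
# Monte Carlo sanity check of Eq. (2): h(X) = (1/2) ln((2*pi*e)^n |K|) for X ~ N(0, K).
# Illustrative sketch; the covariance matrix K below is an arbitrary example.
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
K = A @ A.T + n * np.eye(n)                 # an arbitrary positive definite covariance

# Closed-form differential entropy (in nats) from Eq. (2).
h_formula = 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(K))

# Monte Carlo estimate of h(X) = E[-ln phi_K(X)].
x = rng.multivariate_normal(np.zeros(n), K, size=200_000)
K_inv = np.linalg.inv(K)
quad = np.einsum('ij,jk,ik->i', x, K_inv, x)          # x^t K^{-1} x for each sample
log_phi = -0.5 * quad - 0.5 * np.log((2 * np.pi) ** n * np.linalg.det(K))
h_monte_carlo = -log_phi.mean()

print(h_formula, h_monte_carlo)             # the two values agree within Monte Carlo error
```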
2 Information Inequalities.

In this section, we introduce some of the basic information theoretic quantities and prove a few simple inequalities using convexity. We assume throughout that the vector $(X_1, X_2, \ldots, X_n)$ has a probability density $f(x_1, x_2, \ldots, x_n)$. We need the following definitions.

Definition: The entropy $h(X_1, X_2, \ldots, X_n)$, sometimes written $h(f)$, is defined by

$$h(X_1, X_2, \ldots, X_n) = -\int f \ln f. \qquad (4)$$

Definition: The functional $D(f\|g) = \int f(\mathbf{x}) \ln\left(f(\mathbf{x})/g(\mathbf{x})\right) d\mathbf{x}$ is called the relative entropy, where $f$ and $g$ are probability densities. The relative entropy $D(f\|g)$ is also known as the Kullback-Leibler information number, information for discrimination, and information distance. We also note that $D(f\|g)$ is the error exponent in the hypothesis test of $f$ versus $g$.

Definition: The conditional entropy $h(X|Y)$ of $X$ given $Y$ is defined by

$$h(X|Y) = -\int f(x,y) \ln f(x|y) \, dx \, dy. \qquad (5)$$

We now observe certain natural properties of these information quantities.

Lemma 1 $D(f\|g) \ge 0$.

Proof: Let $A$ be the support set of $f$. Then by Jensen's inequality,

$$-D(f\|g) = \int_A f \ln(g/f) \le \ln \int_A g \le \ln 1 = 0.$$

Lemma 2 $h(X,Y) = h(Y) + h(X|Y)$.

Proof: Write $f(x,y) = f(y) f(x|y)$, take logarithms, and integrate against $f(x,y)$.

Lemma 3 $h(X|Y) \le h(X)$, with equality iff $X$ and $Y$ are independent.

Proof: $h(X) - h(X|Y) = D(f(x,y)\|f(x)f(y)) \ge 0$, by Lemma 1. Equality implies $f(x,y) = f(x)f(y)$, a.e., by strict concavity of the logarithm.

Lemma 4 (Chain Rule) $h(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} h(X_i \mid X_{i-1}, \ldots, X_1) \le \sum_{i=1}^{n} h(X_i)$, with equality iff $X_1, X_2, \ldots, X_n$ are independent.

Proof: The equality is the chain rule for entropies, which we get by repeatedly applying Lemma 2. The inequality follows from Lemma 3, and we have equality iff $X_1, X_2, \ldots, X_n$ are independent.

Lemma 5 If $X$ and $Y$ are independent, then $h(X+Y) \ge h(X)$.

Proof: $h(X+Y) \ge h(X+Y \mid Y) = h(X \mid Y) = h(X)$.

We will also need the entropy maximizing property of the multivariate normal.

Lemma 6 Let the random vector $X \in \mathbb{R}^n$ have zero mean and covariance $K = EXX^t$, i.e., $K_{ij} = EX_iX_j$, $1 \le i, j \le n$. Then $h(X) \le \frac{1}{2}\ln(2\pi e)^n |K|$, with equality iff $f(\mathbf{x}) = \phi_K(\mathbf{x})$.

Proof: Let $g(\mathbf{x})$ be any density satisfying $\int g(\mathbf{x}) x_i x_j \, d\mathbf{x} = K_{ij}$, for all $i, j$. Then

$$0 \le D(g\|\phi_K) = \int g \ln(g/\phi_K) = -h(g) - \int g \ln \phi_K = -h(g) - \int \phi_K \ln \phi_K = -h(g) + h(\phi_K), \qquad (6)$$

where the substitution $\int g \ln \phi_K = \int \phi_K \ln \phi_K$ follows from the fact that $g$ and $\phi_K$ yield the same moments of the quadratic form $\ln \phi_K(\mathbf{x})$.

Motivated by a desire to prove Szász's generalization of Hadamard's inequality in the next section, we develop a new inequality on the entropy rates of random subsets of random variables. Let $(X_1, X_2, \ldots, X_n)$ have a density, and for every $S \subseteq \{1, 2, \ldots, n\}$, denote by $X(S)$ the subset $\{X_i : i \in S\}$.

Definition: Let

$$h_k^{(n)} = \frac{1}{\binom{n}{k}} \sum_{S : |S| = k} \frac{h(X(S))}{k}. \qquad (7)$$

Here $h_k^{(n)}$ is the average entropy per symbol of a randomly drawn $k$-element subset of $\{X_1, X_2, \ldots, X_n\}$. The following lemma says that the average entropy decreases monotonically in the size of the subset.

Lemma 7

$$h_1^{(n)} \ge h_2^{(n)} \ge \cdots \ge h_n^{(n)}. \qquad (8)$$

Proof: We will first prove the last inequality, i.e., $h_n^{(n)} \le h_{n-1}^{(n)}$. We write

$$h(X_1, \ldots, X_n) = h(X_1, \ldots, X_{n-1}) + h(X_n \mid X_1, \ldots, X_{n-1}),$$
$$h(X_1, \ldots, X_n) = h(X_1, \ldots, X_{n-2}, X_n) + h(X_{n-1} \mid X_1, \ldots, X_{n-2}, X_n) \le h(X_1, \ldots, X_{n-2}, X_n) + h(X_{n-1} \mid X_1, \ldots, X_{n-2}),$$
$$\vdots$$
$$h(X_1, \ldots, X_n) \le h(X_2, X_3, \ldots, X_n) + h(X_1).$$

Adding these $n$ inequalities and using the chain rule, we obtain

$$n \, h(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^{n} h(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n) + h(X_1, X_2, \ldots, X_n) \qquad (9)$$

or

$$\frac{1}{n} h(X_1, X_2, \ldots, X_n) \le \frac{1}{n} \sum_{i=1}^{n} \frac{h(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n)}{n-1}, \qquad (10)$$

which is the desired result $h_n^{(n)} \le h_{n-1}^{(n)}$. We now prove that $h_k^{(n)} \le h_{k-1}^{(n)}$ for all $k \le n$: the same argument applied to a fixed $k$-element subset gives $h_k^{(k)} \le h_{k-1}^{(k)}$, and the inequality is preserved when we average over all $k$-element subsets of $\{1, 2, \ldots, n\}$.

Corollary Let $r > 0$, and define

$$t_k^{(n)} = \frac{1}{\binom{n}{k}} \sum_{S : |S| = k} e^{r h(X(S))/k}. \qquad (11)$$

Then

$$t_1^{(n)} \ge t_2^{(n)} \ge \cdots \ge t_n^{(n)}. \qquad (12)$$

Proof: Starting from Eq. (10) in Lemma 7, we multiply both sides by $r$, exponentiate, and then apply the arithmetic mean–geometric mean inequality to obtain

$$e^{\frac{r}{n} h(X_1, \ldots, X_n)} \le \left[\prod_{i=1}^{n} e^{\frac{r}{n-1} h(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n)}\right]^{1/n} \le \frac{1}{n} \sum_{i=1}^{n} e^{\frac{r}{n-1} h(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n)}, \qquad (13)$$

which is equivalent to $t_n^{(n)} \le t_{n-1}^{(n)}$. Now we use the same arguments as in Lemma 7, taking an average over all subsets, to prove the result for all $k \le n$.

Lemma 8 (Entropy power inequality) If $X$ and $Y$ are independent random $n$-vectors with densities, then

$$e^{\frac{2}{n} h(X+Y)} \ge e^{\frac{2}{n} h(X)} + e^{\frac{2}{n} h(Y)}. \qquad (14)$$

Proof: See Shannon [3] for the statement and Stam [4] and Blachman [5] for the proof. Unlike the previous results, the proof is not elementary.

3 Determinant Inequalities.

Throughout we will assume that $K$ is a nonnegative definite symmetric $n \times n$ matrix. Let $|K|$ denote the determinant of $K$. We first prove a result due to Ky Fan [6].

Theorem 1 $\ln|K|$ is concave.

Proof: Let $X_1$ and $X_2$ be normally distributed $n$-vectors, $X_i \sim \phi_{K_i}(\mathbf{x})$, $i = 1, 2$. Let the random variable $\theta$ have distribution $\Pr\{\theta = 1\} = \lambda$, $\Pr\{\theta = 2\} = 1 - \lambda$, $0 \le \lambda \le 1$. Let $\theta$, $X_1$, and $X_2$ be independent, and let $Z = X_\theta$. Then $Z$ has covariance $K_Z = \lambda K_1 + (1-\lambda)K_2$. However, $Z$ will not be multivariate normal. By first using Lemma 6, followed by Lemma 3, we have

$$\frac{1}{2}\ln(2\pi e)^n |\lambda K_1 + (1-\lambda)K_2| \ge h(Z) \ge h(Z \mid \theta) = \lambda \, \frac{1}{2}\ln(2\pi e)^n |K_1| + (1-\lambda) \, \frac{1}{2}\ln(2\pi e)^n |K_2|. \qquad (15)$$

Thus

$$|\lambda K_1 + (1-\lambda)K_2| \ge |K_1|^\lambda |K_2|^{1-\lambda}, \qquad (16)$$

as desired.
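The concavity statement (16) is easy to spot-check numerically. The Python sketch below is illustrative only; the random positive definite matrices, the number of trials, and the tolerance are arbitrary choices.

```python
# Numerical spot check of Theorem 1 / Eq. (16):
#   |a*K1 + (1-a)*K2| >= |K1|^a * |K2|^(1-a)  for positive definite K1, K2 and 0 <= a <= 1.
import numpy as np

rng = np.random.default_rng(1)
n = 4

def random_pd(rng, n):
    """Return an arbitrary symmetric positive definite n x n matrix."""
    A = rng.standard_normal((n, n))
    return A @ A.T + 1e-3 * np.eye(n)

for _ in range(1000):
    K1, K2 = random_pd(rng, n), random_pd(rng, n)
    lam = rng.uniform()
    lhs = np.linalg.det(lam * K1 + (1 - lam) * K2)
    rhs = np.linalg.det(K1) ** lam * np.linalg.det(K2) ** (1 - lam)
    assert lhs >= rhs * (1 - 1e-9)      # holds up to floating-point rounding
```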
The next theorem, used in [7], is too easy to require a new proof, but here it is anyway.

Theorem 2 $|K_1 + K_2| \ge |K_1|$.

Proof: Let $X$, $Y$ be independent random vectors with $X \sim \phi_{K_1}$ and $Y \sim \phi_{K_2}$. Then $X + Y \sim \phi_{K_1 + K_2}$, and hence $\frac{1}{2}\ln(2\pi e)^n |K_1 + K_2| = h(X+Y) \ge h(X) = \frac{1}{2}\ln(2\pi e)^n |K_1|$, by Lemma 5.

We now give Hadamard's inequality using the proof in [2]. See also [1] for an alternative proof.

Theorem 3 (Hadamard) $|K| \le \prod_{i=1}^{n} K_{ii}$, with equality iff $K_{ij} = 0$, $i \ne j$.

Proof: Let $X \sim \phi_K$. Then by Lemma 4,

$$\frac{1}{2}\ln(2\pi e)^n |K| = h(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^{n} h(X_i) = \sum_{i=1}^{n} \frac{1}{2}\ln 2\pi e K_{ii}, \qquad (17)$$

with equality iff $X_1, X_2, \ldots, X_n$ are independent, i.e., $K_{ij} = 0$, $i \ne j$.

Szász's generalization of Hadamard's inequality follows in the same way from Lemma 7.

Theorem 4 (Szász) Let $K$ be a positive definite $n \times n$ matrix, and let $P_k$ denote the product of the determinants of all the $k \times k$ principal minors of $K$, i.e.,

$$P_k = \prod_{S : |S| = k} |K(S)|. \qquad (18)$$

Then

$$P_1 \ge P_2^{1/\binom{n-1}{1}} \ge P_3^{1/\binom{n-1}{2}} \ge \cdots \ge P_n^{1/\binom{n-1}{n-1}}. \qquad (19)$$

Proof: Let $X \sim \phi_K$. The theorem follows from Lemma 7, with the identification $h_k^{(n)} = \frac{1}{2}\ln 2\pi e + \frac{1}{2k\binom{n}{k}} \ln P_k$ and the observation that $k\binom{n}{k} = n\binom{n-1}{k-1}$.

Theorem 5 Let $K$ be a positive definite $n \times n$ matrix, and define

$$S_k^{(n)} = \frac{1}{\binom{n}{k}} \sum_{S : |S| = k} |K(S)|^{1/k}. \qquad (20)$$

Then

$$\frac{1}{n}\sum_{i=1}^{n} K_{ii} = S_1^{(n)} \ge S_2^{(n)} \ge \cdots \ge S_n^{(n)} = |K|^{1/n}. \qquad (21)$$

Proof: This follows directly from the corollary to Lemma 7, with the identification $t_k^{(n)} = (2\pi e) S_k^{(n)}$ and $r = 2$ in (11) and (12).

We now prove a property of Toeplitz matrices, which are important as the covariance matrices of stationary random processes. A Toeplitz matrix $K$ is characterized by the property that $K_{ij} = K_{rs}$ if $|i - j| = |r - s|$. Let $K_k$ denote the principal minor $K(1, 2, \ldots, k)$. For such a matrix, the following property can be proved easily from the properties of the entropy function.

Theorem 6 If the positive definite $n \times n$ matrix $K$ is Toeplitz, then

$$|K_1| \ge |K_2|^{1/2} \ge \cdots \ge |K_{n-1}|^{1/(n-1)} \ge |K_n|^{1/n}, \qquad (22)$$

and $|K_k|/|K_{k-1}|$ is decreasing in $k$.

Proof: Let $(X_1, X_2, \ldots, X_n) \sim \phi_{K_n}$. Then the quantities $h(X_k \mid X_{k-1}, \ldots, X_1)$ are decreasing in $k$, since

$$h(X_k \mid X_{k-1}, \ldots, X_1) = h(X_{k+1} \mid X_k, \ldots, X_2) \ge h(X_{k+1} \mid X_k, \ldots, X_2, X_1), \qquad (23)$$

where the equality follows from the Toeplitz assumption and the inequality from the fact that conditioning reduces entropy. Since $h(X_k \mid X_{k-1}, \ldots, X_1) = \frac{1}{2}\ln 2\pi e |K_k|/|K_{k-1}|$, this shows that $|K_k|/|K_{k-1}|$ is decreasing in $k$. Moreover, the running averages

$$\frac{1}{k} h(X_1, \ldots, X_k) = \frac{1}{k} \sum_{i=1}^{k} h(X_i \mid X_{i-1}, \ldots, X_1) \qquad (24)$$

are decreasing in $k$, and (22) then follows from $h(X_1, X_2, \ldots, X_k) = \frac{1}{2}\ln(2\pi e)^k |K_k|$.

Since $h(X_n \mid X_{n-1}, \ldots, X_1)$ is a decreasing sequence, it has a limit. Hence by the Cesàro mean theorem,

$$\lim_{n \to \infty} \frac{h(X_1, X_2, \ldots, X_n)}{n} = \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} h(X_k \mid X_{k-1}, \ldots, X_1) = \lim_{n \to \infty} h(X_n \mid X_{n-1}, \ldots, X_1). \qquad (25)$$

Translating this to determinants, one obtains the following result:

$$\lim_{n \to \infty} |K_n|^{1/n} = \lim_{n \to \infty} \frac{|K_n|}{|K_{n-1}|}, \qquad (26)$$

which is one of the simple limit theorems for determinants that can be proved using information theory.

In problems connected with maximum entropy spectrum estimation, one would like to maximize the value of the determinant of a Toeplitz matrix, subject to constraints on the values in a band around the main diagonal. Choi and Cover [10] use information theoretic arguments to show that the matrix maximizing the determinant under these constraints is the Yule-Walker extension of the values along the band.

The proof of the next inequality (Oppenheim [11], Marshall and Olkin [12, p. 475]) follows immediately from the entropy power inequality; but because of the complexity of the proof of the entropy power inequality, it is not offered as a simpler proof.

Theorem 7 (Minkowski inequality [13])

$$|K_1 + K_2|^{1/n} \ge |K_1|^{1/n} + |K_2|^{1/n}. \qquad (27)$$

Proof: Let $X_1, X_2$ be independent with $X_i \sim \phi_{K_i}$. Noting that $X_1 + X_2 \sim \phi_{K_1 + K_2}$ and using the entropy power inequality (Lemma 8) yields

$$(2\pi e)|K_1 + K_2|^{1/n} = e^{\frac{2}{n} h(X_1 + X_2)} \ge e^{\frac{2}{n} h(X_1)} + e^{\frac{2}{n} h(X_2)} = (2\pi e)|K_1|^{1/n} + (2\pi e)|K_2|^{1/n}. \qquad (28)$$

4 Inequalities for Ratios of Determinants.

We first prove a stronger version of Hadamard's theorem due to Ky Fan [8].

Theorem 8 For all $1 \le p \le n$,

$$\frac{|K_n|}{|K_{n-p}|} \le \prod_{i=n-p+1}^{n} K_{ii}.$$

Proof: Let $X = (X_1, \ldots, X_n) \sim \phi_{K_n}$. Then

$$\frac{1}{2}\ln(2\pi e)^p \frac{|K_n|}{|K_{n-p}|} = h(X_{n-p+1}, \ldots, X_n \mid X_1, \ldots, X_{n-p}) \le \sum_{i=n-p+1}^{n} h(X_i) = \sum_{i=n-p+1}^{n} \frac{1}{2}\ln 2\pi e K_{ii},$$

where the equality on the left follows from Lemma 2 and Eq. (2), and the inequality from Lemmas 3 and 4.

Theorem 9 $|K_n|/|K_{n-1}|$ is concave; that is, for positive definite $S_n$ and $T_n$ and $0 \le \lambda \le 1$,

$$\frac{|\lambda S_n + (1-\lambda) T_n|}{|\lambda S_{n-1} + (1-\lambda) T_{n-1}|} \ge \lambda \frac{|S_n|}{|S_{n-1}|} + (1-\lambda) \frac{|T_n|}{|T_{n-1}|}. \qquad (35)$$

Since $|K_n|/|K_{n-1}|$ is the minimum mean squared error of the best linear predictor of $X_n$ given $X_1, \ldots, X_{n-1}$ when $X \sim \phi_{K_n}$, this establishes the concavity of the minimum mean squared error in linear prediction. Simple examples show that $|K_n|/|K_{n-p}|$ is not necessarily concave for $p \ge 2$.
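A small numerical sanity check of Theorem 3 (Hadamard), Theorem 7 (Minkowski), and the concavity in Eq. (35) is given below. This is an illustrative sketch only; the dimensions, random positive definite matrices, trial count, and tolerance are arbitrary choices.

```python
# Numerical spot checks of Theorem 3 (Hadamard), Theorem 7 (Minkowski), and Eq. (35).
# Illustrative sketch; the random positive definite matrices are arbitrary examples.
import numpy as np

rng = np.random.default_rng(2)
n = 5
eps = 1e-9                                   # slack for floating-point rounding

def random_pd(rng, n):
    """Return an arbitrary symmetric positive definite n x n matrix."""
    A = rng.standard_normal((n, n))
    return A @ A.T + 1e-3 * np.eye(n)

def ratio(K):
    """|K_n| / |K_{n-1}|: determinant over the determinant of the leading principal minor."""
    return np.linalg.det(K) / np.linalg.det(K[:-1, :-1])

for _ in range(1000):
    K1, K2 = random_pd(rng, n), random_pd(rng, n)
    lam = rng.uniform()
    # Hadamard: |K| <= prod_i K_ii.
    assert np.linalg.det(K1) <= np.prod(np.diag(K1)) * (1 + eps)
    # Minkowski: |K1 + K2|^(1/n) >= |K1|^(1/n) + |K2|^(1/n).
    assert (np.linalg.det(K1 + K2) ** (1 / n)
            >= (np.linalg.det(K1) ** (1 / n) + np.linalg.det(K2) ** (1 / n)) * (1 - eps))
    # Concavity of |K_n|/|K_{n-1}| (Eq. (35)).
    assert (ratio(lam * K1 + (1 - lam) * K2)
            >= (lam * ratio(K1) + (1 - lam) * ratio(K2)) * (1 - eps))
```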
5 Remarks.

Concavity and Jensen's inequality play a role in all the proofs. The inequality $D(f\|g) = \int f \ln(f/g) \ge 0$ is at the root.

Acknowledgment

We wish to thank A. Bernardi and S. Pombra for their contributions.

References

[1] A. Marshall and I. Olkin, "A Convexity Proof of Hadamard's Inequality," Am. Math. Monthly, Vol. 89, No. 9, pp. 687-688, November 1982.

[2] T. Cover and A. El Gamal, "An Information Theoretic Proof of Hadamard's Inequality," IEEE Trans. on Information Theory, Vol. IT-29, No. 6, pp. 930-931, November 1983.

[3] C. E. Shannon, "A Mathematical Theory of Communication," Bell Syst. Tech. J., Vol. 27, pp. 623-656, October 1948.

[4] A. Stam, "Some Inequalities Satisfied by the Quantities of Information of Fisher and Shannon," Information and Control, Vol. 2, pp. 101-112, June 1959.

[5] N. Blachman, "The Convolution Inequality for Entropy Powers," IEEE Trans. on Information Theory, Vol. IT-11, pp. 267-271, April 1965.

[6] Ky Fan, "On a Theorem of Weyl Concerning the Eigenvalues of Linear Transformations, II," Proc. Natl. Acad. Sci. U.S., Vol. 36, pp. 31-35, 1950.

[7] S. Pombra and T. Cover, "Gaussian Feedback Capacity," Stanford Statistics Dept. Technical Report No. 63, September 1987; submitted to IEEE Trans. on Information Theory.

[8] Ky Fan, "Some Inequalities Concerning Positive-Definite Matrices," Proc. Cambridge Phil. Soc., Vol. 51, pp. 414-421, 1955.

[9] L. Mirsky, "On a Generalization of Hadamard's Determinantal Inequality due to Szász," Arch. Math., Vol. VIII, pp. 274-275, 1957.

[10] B. S. Choi and T. M. Cover, "An Information-Theoretic Proof of Burg's Maximum Entropy Spectrum," Proc. IEEE, Vol. 72, No. 8, pp. 1094-1096, August 1984.

[11] A. Oppenheim, "Inequalities Connected with Definite Hermitian Forms," J. London Math. Soc., Vol. 5, pp. 114-119, 1930.

[12] A. Marshall and I. Olkin, Inequalities: Theory of Majorization and its Applications, Academic Press, 1979.

[13] H. Minkowski, "Diskontinuitätsbereich für arithmetische Äquivalenz," Journal für Math., Vol. 129, pp. 220-274, 1905.

[14] R. Bellman, "Notes on Matrix Theory—IV: An Inequality Due to Bergstrom," Am. Math. Monthly, Vol. 62, pp. 172-173, 1955.
