
© 2011 by Armand M. Makowski

ENEE 627
SPRING 2011
INFORMATION THEORY
CONVEXITY

Convex sets
A subset $K$ of $\mathbb{R}^d$ is said to be convex if for any elements $x$ and $y$ of $K$, we have
\[
\lambda x + (1 - \lambda) y \in K, \qquad \lambda \in [0, 1].
\]

It is a simple exercise to show the following by induction:


Lemma 0.1 A set $K$ in $\mathbb{R}^d$ is convex if and only if for any integer $p = 2, 3, \ldots$, and any collection $x_1, \ldots, x_p$ in $K$, we have
\[
\lambda_1 x_1 + \ldots + \lambda_p x_p \in K
\]
for arbitrary $\lambda_1, \ldots, \lambda_p$ in $[0, 1]$ such that
\[
\lambda_1 + \ldots + \lambda_p = 1.
\]

We refer to the linear combination $\lambda_1 x_1 + \ldots + \lambda_p x_p$ with $x_1, \ldots, x_p$ in $\mathbb{R}^d$ and $\lambda_1, \ldots, \lambda_p$ in $[0, 1]$ such that
\[
\lambda_1 + \ldots + \lambda_p = 1
\]
as a convex combination.
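
As an illustration only, the following sketch forms a convex combination of random points and checks that it stays in a convex set. The choice of the closed unit disk in the plane and all function names are assumptions made for this example; they do not come from the notes.

```python
import math
import random


def convex_combination(points, weights):
    """Return sum_i weights[i] * points[i] for points given as tuples in R^d."""
    if any(w < 0.0 or w > 1.0 for w in weights):
        raise ValueError("weights must lie in [0, 1]")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    dim = len(points[0])
    return tuple(
        sum(w * p[k] for w, p in zip(weights, points)) for k in range(dim)
    )


def in_unit_disk(x, tol=1e-12):
    """Membership test for the closed unit disk (assumed example of a convex set)."""
    return math.hypot(x[0], x[1]) <= 1.0 + tol


def random_point_in_disk(rng):
    """Draw a point of the unit disk using polar coordinates."""
    radius = math.sqrt(rng.random())
    angle = 2.0 * math.pi * rng.random()
    return (radius * math.cos(angle), radius * math.sin(angle))


def random_weights(rng, p):
    """Draw p positive weights and normalize them so that they sum to 1."""
    raw = [rng.random() + 1e-9 for _ in range(p)]
    total = sum(raw)
    return [r / total for r in raw]


rng = random.Random(0)
points = [random_point_in_disk(rng) for _ in range(5)]
weights = random_weights(rng, 5)
combo = convex_combination(points, weights)
print(combo, in_unit_disk(combo))
```

A numerical check of this kind can only fail to find a counterexample; it does not replace the induction argument.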
Convex functions
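Consider a convex set $K$ in $\mathbb{R}^d$. A function $\varphi : K \to \mathbb{R}$ is said to be convex if for any elements $x$ and $y$ of $K$, we have
\[
\varphi(\lambda x + (1 - \lambda) y) \leq \lambda \varphi(x) + (1 - \lambda) \varphi(y), \qquad \lambda \in [0, 1].
\]

A function $\varphi : K \to \mathbb{R}$ is said to be concave if $-\varphi$ is a convex function.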


It is also a simple exercise to show the following by induction:


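Lemma 0.2 Consider a convex set $K$ in $\mathbb{R}^d$. A function $\varphi : K \to \mathbb{R}$ is convex if and only if for any integer $p = 2, 3, \ldots$, and any collection $x_1, \ldots, x_p$ in $K$, we have
\[
\varphi(\lambda_1 x_1 + \ldots + \lambda_p x_p) \leq \lambda_1 \varphi(x_1) + \ldots + \lambda_p \varphi(x_p)
\]
for arbitrary $\lambda_1, \ldots, \lambda_p$ in $[0, 1]$ such that
\[
\lambda_1 + \ldots + \lambda_p = 1.
\]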
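
The finite inequality of the lemma can be probed numerically. The sketch below is a minimal example, assuming scalar points and two convex test functions chosen for illustration (the square and the exponential); the helper name `jensen_gap` is hypothetical.

```python
import math
import random


def jensen_gap(phi, points, weights):
    """Return sum_i w_i * phi(x_i) - phi(sum_i w_i * x_i).

    For a convex phi and weights in [0, 1] summing to 1, the lemma says
    this quantity is nonnegative.
    """
    mean = sum(w * x for w, x in zip(weights, points))
    return sum(w * phi(x) for w, x in zip(weights, points)) - phi(mean)


def square(x):
    """Assumed example of a convex function on the real line."""
    return x * x


rng = random.Random(0)
for _ in range(1000):
    p = rng.randint(2, 6)
    points = [rng.uniform(-2.0, 2.0) for _ in range(p)]
    raw = [rng.random() + 1e-9 for _ in range(p)]
    weights = [r / sum(raw) for r in raw]
    # A small tolerance absorbs floating-point rounding.
    assert jensen_gap(square, points, weights) >= -1e-12
    assert jensen_gap(math.exp, points, weights) >= -1e-12
```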
Strictly convex functions
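A function $\varphi : K \to \mathbb{R}$ is said to be strictly convex if it is convex and whenever the equality
\[
\varphi(\lambda x + (1 - \lambda) y) = \lambda \varphi(x) + (1 - \lambda) \varphi(y), \qquad x, y \in K, \ \lambda \in (0, 1)
\]
holds, we necessarily have $x = y$. As expected, a function $\varphi : K \to \mathbb{R}$ is said to be strictly concave if $-\varphi$ is a strictly convex function.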
Of great usefulness in many arguments is the following observation: Consider a strictly convex $\varphi : K \to \mathbb{R}$. Suppose that for some $p = 2, 3, \ldots$, with $x_1, \ldots, x_p$ in $K$, we have the equality
\[
\varphi(\lambda_1 x_1 + \ldots + \lambda_p x_p) = \lambda_1 \varphi(x_1) + \ldots + \lambda_p \varphi(x_p) \tag{1}
\]
with $\lambda_1, \ldots, \lambda_p$ in $(0, 1)$ such that
\[
\lambda_1 + \ldots + \lambda_p = 1.
\]
Under such circumstances, what can we say about $x_1, \ldots, x_p$? We shall show that we must necessarily have
\[
x_1 = \ldots = x_p. \tag{2}
\]
If $p = 2$, since $0 < \lambda_1, \lambda_2 < 1$, by definition of strict convexity we automatically have the conclusion $x_1 = x_2$. If $p > 2$, the matter is more involved. To proceed, with any subset $I$ of $\{1, \ldots, p\}$ such that $1 \leq |I| < p$ we define
\[
\lambda_I = \sum_{i \in I} \lambda_i.
\]
Under the foregoing assumptions we have $0 < \lambda_I < 1$, so that the definition
\[
x_I = \sum_{i \in I} \frac{\lambda_i}{\lambda_I} x_i
\]

is well posed and yields an element of $K$. We also note that
\[
\lambda_I x_I + \lambda_{I^c} x_{I^c} = \lambda_1 x_1 + \ldots + \lambda_p x_p
\]
with
\[
\lambda_I + \lambda_{I^c} = 1.
\]
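Using the convexity of $\varphi$ twice we get
\begin{align}
\varphi(\lambda_1 x_1 + \ldots + \lambda_p x_p)
&= \varphi(\lambda_I x_I + \lambda_{I^c} x_{I^c}) \notag \\
&\leq \lambda_I \varphi(x_I) + \lambda_{I^c} \varphi(x_{I^c}) \notag \\
&\leq \lambda_I \left( \sum_{i \in I} \frac{\lambda_i}{\lambda_I} \varphi(x_i) \right) + \lambda_{I^c} \left( \sum_{j \notin I} \frac{\lambda_j}{\lambda_{I^c}} \varphi(x_j) \right) \tag{3} \\
&= \lambda_1 \varphi(x_1) + \ldots + \lambda_p \varphi(x_p). \notag
\end{align}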

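Moreover, convexity again gives
\[
\varphi(x_I) \leq \sum_{i \in I} \frac{\lambda_i}{\lambda_I} \varphi(x_i) \tag{4}
\]
and
\[
\varphi(x_{I^c}) \leq \sum_{j \notin I} \frac{\lambda_j}{\lambda_{I^c}} \varphi(x_j). \tag{5}
\]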

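However, because of (1) the inequalities leading to (3) must necessarily hold as equalities, and this implies
\[
\varphi(\lambda_I x_I + \lambda_{I^c} x_{I^c}) = \lambda_I \varphi(x_I) + \lambda_{I^c} \varphi(x_{I^c}), \tag{6}
\]
\[
\varphi(x_I) = \sum_{i \in I} \frac{\lambda_i}{\lambda_I} \varphi(x_i) \tag{7}
\]
and
\[
\varphi(x_{I^c}) = \sum_{j \notin I} \frac{\lambda_j}{\lambda_{I^c}} \varphi(x_j) \tag{8}
\]
as we make use of the fact that
\[
0 < \lambda_I, \lambda_{I^c} < 1.
\]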


By strict convexity it follows from (6) that
\[
x_I = x_{I^c}.
\]
With (7) and (8) as points of departure, in lieu of (1), we can repeat the arguments above with $I$ and $I^c$, respectively, instead of $\{1, \ldots, p\}$. Upon doing this as many times as needed we can eventually conclude that
\[
x_i = x_j, \qquad i, j = 1, \ldots, p, \ i \neq j
\]
and this completes the proof of (2).
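
The grouping step of the proof can be made concrete with a small numerical sketch. It assumes scalar points, the square function as an example of a strictly convex function, and 0-based indices; the helper names `group` and `jensen_gap` are hypothetical. It checks that the two group averages, weighted by the two group weights, recombine to the full convex combination, and that the gap between the two sides of the convexity inequality vanishes when all points coincide but not when they differ.

```python
import math


def group(points, weights, index_set):
    """Return (total weight of the group, weighted average of the group)."""
    lam = sum(weights[i] for i in index_set)
    avg = sum(weights[i] * points[i] for i in index_set) / lam
    return lam, avg


def jensen_gap(phi, points, weights):
    """Return sum_i w_i * phi(x_i) - phi(sum_i w_i * x_i)."""
    mean = sum(w * x for w, x in zip(weights, points))
    return sum(w * phi(x) for w, x in zip(weights, points)) - phi(mean)


def square(x):
    """Assumed example of a strictly convex function on the real line."""
    return x * x


points = [0.0, 1.0, 3.0]
weights = [0.2, 0.3, 0.5]
subset = [0, 1]      # the index set I (0-based)
complement = [2]     # its complement

lam_i, x_i = group(points, weights, subset)
lam_c, x_c = group(points, weights, complement)
full = sum(w * x for w, x in zip(weights, points))

# The two group averages recombine to the full convex combination.
recombined = lam_i * x_i + lam_c * x_c
print(math.isclose(recombined, full), math.isclose(lam_i + lam_c, 1.0))

# Distinct points give a strictly positive gap; identical points give none.
gap_distinct = jensen_gap(square, points, weights)
gap_equal = jensen_gap(square, [1.5, 1.5, 1.5], weights)
print(gap_distinct, gap_equal)
```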

Kullback-Leibler distance
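Consider a set $\mathcal{X}$ of finite cardinality. With $\mu$ and $\nu$ pmfs on $\mathcal{X}$, define
\[
D(\mu \| \nu) = \sum_{x} \mu(x) \log \left( \frac{\mu(x)}{\nu(x)} \right)
\]
with the conventions
\[
0 \log \left( \frac{0}{0} \right) = 0, \qquad p \log \left( \frac{p}{0} \right) = \infty \ \text{ if } p > 0
\]
and
\[
0 \log \left( \frac{0}{q} \right) = 0 \ \text{ if } q > 0.
\]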
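
A minimal sketch of this definition, including the three conventions, might look as follows. It assumes that the pmfs are given as dictionaries keyed by the elements of the finite set and that the logarithm is natural (the notes do not fix a base); the function name `kl_divergence` is hypothetical.

```python
import math


def kl_divergence(first, second):
    """Kullback-Leibler distance of pmf `first` from pmf `second`, in nats.

    Both pmfs are dicts mapping elements of the same finite set to
    probabilities; a missing key is treated as probability 0.
    """
    total = 0.0
    for x, p in first.items():
        if p == 0.0:
            # Conventions 0 log(0/0) = 0 and 0 log(0/q) = 0.
            continue
        q = second.get(x, 0.0)
        if q == 0.0:
            # Convention p log(p/0) = infinity when p > 0.
            return math.inf
        total += p * math.log(p / q)
    return total


uniform = {"a": 0.5, "b": 0.5}
point_mass = {"a": 1.0, "b": 0.0}
print(kl_divergence(point_mass, uniform))   # equals log 2
print(kl_divergence(uniform, point_mass))   # infinite
```

The two calls also show that the quantity is not symmetric in its arguments.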
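The proof of Theorem 2.6.3 revisited: We have
\begin{align}
-D(\mu \| \nu)
&= -\sum_{x} \mu(x) \log \left( \frac{\mu(x)}{\nu(x)} \right) \notag \\
&= -\sum_{x: \mu(x) > 0} \mu(x) \log \left( \frac{\mu(x)}{\nu(x)} \right) \notag \\
&= \sum_{x: \mu(x) > 0} \mu(x) \log \left( \frac{\nu(x)}{\mu(x)} \right) \notag \\
&\leq \log \left( \sum_{x: \mu(x) > 0} \mu(x) \frac{\nu(x)}{\mu(x)} \right) \tag{9} \\
&= \log \left( \sum_{x: \mu(x) > 0} \nu(x) \right) \notag \\
&\leq \log 1 = 0 \tag{10}
\end{align}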


whence $-D(\mu \| \nu) \leq 0$, or equivalently, $D(\mu \| \nu) \geq 0$.

The equality $D(\mu \| \nu) = 0$ occurs if and only if equality occurs at both (9) and (10). By the strict concavity of $t \to \log t$, equality occurs at (9) if and only if there exists $c > 0$ such that
\[
\frac{\nu(x)}{\mu(x)} = c, \qquad x \in \mathcal{X}, \ \mu(x) > 0.
\]
As a result,
\[
\sum_{x: \mu(x) > 0} \nu(x) = c \sum_{x: \mu(x) > 0} \mu(x) = c
\]
since
\[
\sum_{x: \mu(x) > 0} \mu(x) = \sum_{x} \mu(x) = 1.
\]
On the other hand, equality occurs at (10) if and only if
\[
\sum_{x: \mu(x) > 0} \nu(x) = 1.
\]
Consequently, $c = 1$ and
\[
\sum_{x: \mu(x) = 0} \nu(x) = 0,
\]
whence $\mu(x) = 0$ if and only if $\nu(x) = 0$. In sum, $\mu(x) = \nu(x)$ for all $x$ in $\mathcal{X}$.
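
The two inequalities used in the proof can be traced on a small example. The sketch below assumes pmfs given as lists over an index set, natural logarithms, and a second pmf that is positive wherever the first one is, so that every term is finite; the function name `proof_chain` and the numbers are made up for illustration. It returns the negative of the Kullback-Leibler distance and the logarithm of the mass that the second pmf puts on the support of the first, which are the quantities compared at steps (9) and (10).

```python
import math


def proof_chain(first, second):
    """Return (-D, log of the mass of `second` on the support of `first`).

    Assumes second[x] > 0 whenever first[x] > 0, so all terms are finite.
    """
    support = [x for x in range(len(first)) if first[x] > 0.0]
    neg_d = sum(first[x] * math.log(second[x] / first[x]) for x in support)
    log_mass = math.log(sum(second[x] for x in support))
    return neg_d, log_mass


# The second pmf puts mass 0.25 outside the support of the first,
# so both inequalities are strict in this example.
first = [0.5, 0.5, 0.0]
second = [0.25, 0.5, 0.25]
neg_d, log_mass = proof_chain(first, second)
print(neg_d, log_mass)

# With identical pmfs both quantities are zero.
print(proof_chain([0.5, 0.5], [0.5, 0.5]))
```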
