
Assignment 1

Suggested Solutions


Contributors:
Raymond W. Yeung
Ning Cai
Terence H. Chan
Siu Wai Ho
Shenghao Yang
Zhixue Zhang

Copyright 2014 by Raymond W. Yeung













Chapter 2
Information Measures
1. Let $X$ and $Y$ be random variables with alphabets $\mathcal{X} = \mathcal{Y} = \{1, 2, 3, 4, 5\}$
   and joint distribution $p(x, y)$ given by
   \[
   \frac{1}{25}
   \begin{bmatrix}
   1 & 1 & 1 & 1 & 1 \\
   2 & 1 & 2 & 0 & 0 \\
   2 & 0 & 1 & 1 & 1 \\
   0 & 3 & 0 & 2 & 0 \\
   0 & 0 & 1 & 1 & 3
   \end{bmatrix}.
   \]
   Determine $H(X)$, $H(Y)$, $H(X|Y)$, $H(Y|X)$, and $I(X; Y)$.

   Solution:
   Since both marginal distributions are uniform on $\{1, \ldots, 5\}$,
   \[
   H(X) = H(Y) = \log 5.
   \]
   \[
   H(X, Y) = 2 \log 5 - \tfrac{8}{25} \log 2 - \tfrac{6}{25} \log 3.
   \]
   \[
   H(X|Y) = H(X, Y) - H(Y) = \log 5 - \tfrac{8}{25} \log 2 - \tfrac{6}{25} \log 3.
   \]
   \[
   H(Y|X) = H(X, Y) - H(X) = \log 5 - \tfrac{8}{25} \log 2 - \tfrac{6}{25} \log 3.
   \]
   \[
   I(X; Y) = H(X) + H(Y) - H(X, Y) = \tfrac{8}{25} \log 2 + \tfrac{6}{25} \log 3.
   \]
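   Remark (not part of the required solution): the values above can be checked
   numerically. The short Python sketch below is our own illustration; the
   variable names are ours, and it uses base-2 logarithms, so the printed
   numbers are the expressions above evaluated with $\log = \log_2$.

   ```python
   import numpy as np

   # Joint distribution p(x, y) from the problem, as a 5x5 array.
   P = np.array([[1, 1, 1, 1, 1],
                 [2, 1, 2, 0, 0],
                 [2, 0, 1, 1, 1],
                 [0, 3, 0, 2, 0],
                 [0, 0, 1, 1, 3]]) / 25.0

   def entropy(p):
       """Entropy in bits, summing only over the support (p > 0)."""
       p = p[p > 0]
       return -np.sum(p * np.log2(p))

   H_X = entropy(P.sum(axis=1))    # marginal of X (rows): log2(5)
   H_Y = entropy(P.sum(axis=0))    # marginal of Y (columns): log2(5)
   H_XY = entropy(P.flatten())     # 2*log2(5) - 8/25 - (6/25)*log2(3)

   print(H_X, H_Y, H_XY)
   print(H_XY - H_Y)               # H(X|Y)
   print(H_X + H_Y - H_XY)         # I(X;Y) = 8/25 + (6/25)*log2(3)
   ```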
2. Prove Propositions 2.8, 2.9, 2.10, 2.19, 2.21, and 2.22.
Solution:
Proof of Proposition 2.8

We will first prove the 'only if' part by induction on $n$. The claim is
true for $n = 3$. Assume it is true for all $n \le m$, where $m \ge 3$, and
consider the Markov chain $X_1 \to X_2 \to \cdots \to X_m \to X_{m+1}$. Then by
(2.13),
\[
p(x_2)p(x_3)\cdots p(x_m)\,p(x_1, x_2, \cdots, x_m, x_{m+1})
= p(x_1, x_2)\cdots p(x_{m-1}, x_m)\,p(x_m, x_{m+1}).
\]
Summing over all $x_{m+1}$, we have
\[
p(x_2)\cdots p(x_{m-1})p(x_m)\,p(x_1, x_2, \cdots, x_m)
= p(x_1, x_2)\cdots p(x_{m-1}, x_m)\,p(x_m).
\]
If $p(x_m) > 0$, then cancel $p(x_m)$ on both sides to obtain
\[
p(x_2)\cdots p(x_{m-1})\,p(x_1, x_2, \cdots, x_m)
= p(x_1, x_2)\cdots p(x_{m-1}, x_m). \tag{A2.1}
\]
Otherwise, $p(x_1, x_2, \cdots, x_m) \le p(x_m) = 0$ implies
$p(x_1, x_2, \cdots, x_m) = 0$. Similarly, we see that $p(x_{m-1}, x_m) = 0$.
Thus (A2.1) continues to be valid for $p(x_m) = 0$. By Definition 2.4, we have
$X_1 \to X_2 \to \cdots \to X_m$, and so by the induction hypothesis,
\begin{align*}
& X_1 \to X_2 \to X_3 \\
& (X_1, X_2) \to X_3 \to X_4 \\
& \qquad\vdots \\
& (X_1, X_2, \cdots, X_{m-2}) \to X_{m-1} \to X_m.
\end{align*}
It remains to show that
\[
(X_1, \cdots, X_{m-2}, X_{m-1}) \to X_m \to X_{m+1}. \tag{A2.2}
\]
Toward this end, we write
\[
p(x_1, \cdots, x_m, x_{m+1}) =
\begin{cases}
\dfrac{p(x_1, x_2)\cdots p(x_{m-1}, x_m)\,p(x_m, x_{m+1})}{p(x_2)\cdots p(x_m)}
& \text{if } p(x_2), \cdots, p(x_m) > 0 \\
0 & \text{otherwise.}
\end{cases}
\]
Define
\[
f(x_1, \cdots, x_m) =
\begin{cases}
\dfrac{p(x_1, x_2)\cdots p(x_{m-1}, x_m)}{p(x_2)\cdots p(x_m)}
& \text{if } p(x_2), \cdots, p(x_m) > 0 \\
0 & \text{otherwise}
\end{cases}
\]
and
\[
g(x_m, x_{m+1}) = p(x_m, x_{m+1}).
\]
If $p(x_m) > 0$ and $p(x_2), \cdots, p(x_{m-1}) > 0$, then
\[
p(x_1, \cdots, x_m, x_{m+1}) = f(x_1, \cdots, x_m)\,g(x_m, x_{m+1}). \tag{A2.3}
\]
If $p(x_m) > 0$ and $p(x_i) = 0$ for some $2 \le i \le m-1$, then
$p(x_1, \cdots, x_m, x_{m+1}) = 0$ and $f(x_1, \cdots, x_m) = 0$, so that (A2.3)
again holds. Thus, (A2.3) holds whenever $p(x_m) > 0$. By Proposition 2.5,
the Markov chain in (A2.2) is established, completing the proof for the
'only if' part.

We now prove the 'if' part. Assume that
\begin{align*}
& X_1 \to X_2 \to X_3 \\
& (X_1, X_2) \to X_3 \to X_4 \\
& \qquad\vdots \\
& (X_1, X_2, \cdots, X_{m-2}) \to X_{m-1} \to X_m.
\end{align*}
If $p(x_2), p(x_3), \cdots, p(x_{m-1}) > 0$, then
\begin{align*}
p(x_1, x_2, \cdots, x_m)
&= p(x_1, x_2, \cdots, x_{m-1})\,p(x_m | x_{m-1}) \\
&= p(x_1, x_2, \cdots, x_{m-2})\,p(x_{m-1} | x_{m-2})\,p(x_m | x_{m-1}) \\
&\;\;\vdots \\
&= p(x_1, x_2)\,p(x_3 | x_2) \cdots p(x_m | x_{m-1}).
\end{align*}
On the other hand, if $p(x_i) = 0$ for some $2 \le i \le m-1$, then
$p(x_1, x_2, \cdots, x_m) \le p(x_i) = 0$, which implies
$p(x_1, x_2, \cdots, x_m) = 0$. Thus we have shown that
$X_1 \to X_2 \to \cdots \to X_m$, proving the 'if' part of the proposition.
Hence, the proposition is proven.
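Remark (not part of the required solution): the factorization in
Definition 2.4 can be verified numerically for a concrete chain. The Python
sketch below is our own illustration; the chain length, alphabet size, and
random transition matrices are arbitrary choices.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# A 3-state chain X1 -> X2 -> X3 -> X4: initial distribution and transition matrices.
n_states = 3
p1 = rng.dirichlet(np.ones(n_states))
# T[k][i][j] = P(X_{k+2} = j | X_{k+1} = i)
T = [rng.dirichlet(np.ones(n_states), size=n_states) for _ in range(3)]

# Tabulate the joint distribution p(x1, x2, x3, x4).
P = np.zeros((n_states,) * 4)
for x1, x2, x3, x4 in itertools.product(range(n_states), repeat=4):
    P[x1, x2, x3, x4] = p1[x1] * T[0][x1][x2] * T[1][x2][x3] * T[2][x3][x4]

# Marginals needed for the check.
p12 = P.sum(axis=(2, 3)); p23 = P.sum(axis=(0, 3)); p34 = P.sum(axis=(0, 1))
p2 = P.sum(axis=(0, 2, 3)); p3 = P.sum(axis=(0, 1, 3))

# Definition 2.4: p(x1,x2,x3,x4) p(x2) p(x3) = p(x1,x2) p(x2,x3) p(x3,x4).
for x1, x2, x3, x4 in itertools.product(range(n_states), repeat=4):
    lhs = P[x1, x2, x3, x4] * p2[x2] * p3[x3]
    rhs = p12[x1, x2] * p23[x2, x3] * p34[x3, x4]
    assert np.isclose(lhs, rhs)
print("Definition 2.4 holds for this chain.")
```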
Proof of Proposition 2.9

It suffices to show that
\[
p(x_1, \cdots, x_n) = f_1(x_1, x_2) \cdots f_{n-1}(x_{n-1}, x_n) \tag{A2.4}
\]
whenever $p(x_2), \cdots, p(x_{n-1}) > 0$ if and only if
\[
p(x_1, \cdots, x_n) =
\begin{cases}
\dfrac{p(x_1, x_2)\cdots p(x_{n-1}, x_n)}{p(x_2)\cdots p(x_{n-1})}
& \text{if } p(x_2), \cdots, p(x_{n-1}) > 0 \\
0 & \text{otherwise.}
\end{cases}
\]
The 'if' part is trivial and its proof is omitted. We now prove the 'only
if' part. Define for $1 \le i \le n$,
\begin{align*}
Q(i) &= \sum_{x_1} \cdots \sum_{x_{i-1}} f_1(x_1, x_2) \cdots f_{i-1}(x_{i-1}, x_i) \\
S(i) &= \sum_{x_{i+1}} \cdots \sum_{x_n} f_i(x_i, x_{i+1}) \cdots f_{n-1}(x_{n-1}, x_n)
\end{align*}
with the convention that $Q(1) = S(n) = 1$. Then by summing over all
$x_j$ for $j \ne i-1, i$ in (A2.4), it is not difficult to show that
\[
p(x_{i-1}, x_i) = f_{i-1}(x_{i-1}, x_i)\,Q(i-1)\,S(i) \tag{A2.5}
\]
for $2 \le i \le n$. Summing over all $x_{i-1}$ in the above, we can further
obtain
\[
p(x_i) = Q(i)\,S(i). \tag{A2.6}
\]
Hence, by using the expressions for $p(x_{i-1}, x_i)$ and $p(x_i)$, and
cancelling the corresponding terms, we obtain
\begin{align*}
\frac{p(x_1, x_2) \cdots p(x_{n-1}, x_n)}{p(x_2) \cdots p(x_{n-1})}
&= f_1(x_1, x_2) \cdots f_{n-1}(x_{n-1}, x_n)\,Q(1)\,S(n) \\
&= f_1(x_1, x_2) \cdots f_{n-1}(x_{n-1}, x_n) \cdot 1 \cdot 1 \\
&= p(x_1, \cdots, x_n).
\end{align*}
Proof of Proposition 2.10

[Note: There is no need to prove Proposition 2.10 for Assignment 1. This
proof is included for your self-study only.]

Let $i_j$ be the largest element in $\alpha_j$, $1 \le j \le m$, and $i_0 = 0$. Define
$\bar{\alpha}_j = \{i_{j-1}+1, \cdots, i_j\}$, so that $\alpha_j \subseteq \bar{\alpha}_j$, and let
$\beta_j = \bar{\alpha}_j \setminus \alpha_j$. Consider a Markov chain
$X_1 \to X_2 \to \cdots \to X_n$. By Proposition 2.9,
\[
p(x_1, \cdots, x_n) = f_1(x_1, x_2) \cdots f_{n-1}(x_{n-1}, x_n) \tag{A2.7}
\]
for all $x_1, x_2, \cdots, x_n$ such that $p(x_2), \cdots, p(x_{n-1}) > 0$. By defining
\[
f^*_k(x_k, x_{k+1}) =
\begin{cases}
f_k(x_k, x_{k+1}) & \text{if } p(x_{k+1}) > 0 \\
0 & \text{otherwise}
\end{cases}
\]
for $1 \le k \le n-1$, we have
\[
p(x_1, \cdots, x_n) = f^*_1(x_1, x_2) \cdots f^*_{n-1}(x_{n-1}, x_n) \tag{A2.8}
\]
for all $x_1, \cdots, x_n$. Note that $f^*_k(x_k, x_{k+1})$ is well-defined because if
$p(x_{k+1}) > 0$, then $p(x_1, \cdots, x_n) > 0$ for some
$x_1, \cdots, x_k, x_{k+2}, \cdots, x_n$, which implies that
$p(x_2), \cdots, p(x_{n-1}) > 0$. For notational convenience, we will let $X_0$
be a constant and define the function
\[
f^*_0(x_0, x_1) =
\begin{cases}
1 & \text{if } p(x_1) > 0 \\
0 & \text{otherwise.}
\end{cases}
\]
Denote $(x_l, l \in \beta_j)$ by $x_{\beta_j}$. For $0 \le j \le m-1$, let
\[
g_j(x_{i_j}, x_{\bar{\alpha}_{j+1}})
= f^*_{i_j}(x_{i_j}, x_{i_j+1})\,f^*_{i_j+1}(x_{i_j+1}, x_{i_j+2}) \cdots
  f^*_{i_{j+1}-1}(x_{i_{j+1}-1}, x_{i_{j+1}}).
\]
We also let
\[
G(x_{i_m}, \cdots, x_n) = f^*_{i_m}(x_{i_m}, x_{i_m+1}) \cdots f^*_{n-1}(x_{n-1}, x_n).
\]
Then (A2.8) can be written as
\[
p(x_1, \cdots, x_n) =
\left[ \prod_{j=0}^{m-1} g_j(x_{i_j}, x_{\bar{\alpha}_{j+1}}) \right]
G(x_{i_m}, \cdots, x_n). \tag{A2.9}
\]
Denote $\prod_{l \in A} \mathcal{X}_l$ by $\mathcal{X}_A$ and fix $x_{\alpha_j}$, $1 \le j \le m$. Summing over all
vectors $(x'_1, \cdots, x'_n)$ such that $x'_{\alpha_j} = x_{\alpha_j}$ for $1 \le j \le m$ in (A2.9), we
have
\[
p(x_{\alpha_1}, \cdots, x_{\alpha_m}) =
\left[ \prod_{j=0}^{m-1} \sum g_j(x_{i_j}, x_{\bar{\alpha}_{j+1}}) \right]
\sum G(x_{i_m}, \cdots, x_n),
\]
where the summation inside the square brackets is taken over all the
vectors in $\mathcal{X}_{\beta_{j+1}}$, while the other summation is taken over all the
vectors in $\prod_{l=i_m+1}^{n} \mathcal{X}_l$. For $j = 0$, the summation
$\sum g_j(x_{i_j}, x_{\bar{\alpha}_{j+1}}) = \sum g_0(x_0, x_{\bar{\alpha}_1})$ depends only on
$x_{\alpha_1}$ because $x_0$ is a constant, and hence we can write it as
$f'_0(x_{\alpha_1})$. For $1 \le j \le m-1$, $\sum g_j(x_{i_j}, x_{\bar{\alpha}_{j+1}})$ depends only on
$x_{i_j}$ and $x_{\alpha_{j+1}}$, and hence we can write it as
$f'_j(x_{\alpha_j}, x_{\alpha_{j+1}})$. Finally, $\sum G(x_{i_m}, \cdots, x_n)$ depends only on
$x_{i_m}$, and hence we can write it as $G'(x_{\alpha_m})$. Therefore, we have
\[
p(x_{\alpha_1}, \cdots, x_{\alpha_m}) =
f'_0(x_{\alpha_1})\,f'_1(x_{\alpha_1}, x_{\alpha_2}) \cdots
f'_{m-1}(x_{\alpha_{m-1}}, x_{\alpha_m})\,G'(x_{\alpha_m}).
\]
Then apply Proposition 2.9 to see that
$X_{\alpha_1} \to X_{\alpha_2} \to \cdots \to X_{\alpha_m}$ forms a Markov chain.
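Remark (not part of the required solution): a concrete instance of Proposition
2.10 can be checked numerically. The Python sketch below is our own
illustration; it takes a chain $X_1 \to \cdots \to X_5$ with arbitrary random
transitions, the subsets $\alpha_1 = \{1, 2\}$, $\alpha_2 = \{3\}$, $\alpha_3 = \{5\}$, and verifies that
$(X_1, X_2) \to X_3 \to X_5$ is a Markov chain.

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)
S = 2  # binary chain X1 -> X2 -> X3 -> X4 -> X5

p1 = rng.dirichlet(np.ones(S))
T = [rng.dirichlet(np.ones(S), size=S) for _ in range(4)]

# Tabulate p(x1, ..., x5).
P = np.zeros((S,) * 5)
for x in itertools.product(range(S), repeat=5):
    pr = p1[x[0]]
    for k in range(4):
        pr *= T[k][x[k]][x[k + 1]]
    P[x] = pr

# Joint of (X1, X2, X3, X5) and the marginals needed for the 3-term chain check.
Q = P.sum(axis=3)                 # sum out X4; indexed [x1, x2, x3, x5]
q123 = Q.sum(axis=3)              # p(x1, x2, x3)
q3 = Q.sum(axis=(0, 1, 3))        # p(x3)
q35 = Q.sum(axis=(0, 1))          # p(x3, x5)

# Markov condition: p(x1,x2,x3,x5) p(x3) = p(x1,x2,x3) p(x3,x5).
for x1, x2, x3, x5 in itertools.product(range(S), repeat=4):
    assert np.isclose(Q[x1, x2, x3, x5] * q3[x3], q123[x1, x2, x3] * q35[x3, x5])
print("(X1, X2) -> X3 -> X5 is a Markov chain, as Proposition 2.10 asserts.")
```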
Proof of Propositions 2.19, 2.21, and 2.22

Consider
\begin{align*}
H(X) - H(X|Y) &= -E \log p(X) + E \log p(X|Y) \\
&= E \log \frac{p(X|Y)}{p(X)} \\
&= E \log \frac{p(X, Y)}{p(X)p(Y)} \\
&= I(X; Y).
\end{align*}
This proves the first part of Proposition 2.19. The rest of the proposition
as well as Propositions 2.21 and 2.22 can be proved likewise.
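Remark (not part of the required solution): the identity just derived can be
checked numerically. The Python sketch below is our own; it draws an arbitrary
strictly positive joint distribution and confirms that $H(X) - H(X|Y)$ equals
$E \log \frac{p(X,Y)}{p(X)p(Y)}$.

```python
import numpy as np

rng = np.random.default_rng(1)
P = rng.random((4, 5)) + 0.1   # strictly positive entries, so all logs are finite
P /= P.sum()                   # a random joint distribution p(x, y)

px = P.sum(axis=1, keepdims=True)   # p(x), shape (4, 1)
py = P.sum(axis=0, keepdims=True)   # p(y), shape (1, 5)

H_X = -np.sum(px * np.log2(px))
H_X_given_Y = -np.sum(P * np.log2(P / py))    # H(X|Y) = -E log p(X|Y)
I_XY = np.sum(P * np.log2(P / (px * py)))     # E log [p(X,Y) / (p(X)p(Y))]

print(H_X - H_X_given_Y, I_XY)  # the two numbers coincide
```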
3. Give an example which shows that pairwise independence does not
imply mutual independence.
Solution:
   xyz         000   001   010   011   100   101   110   111
   p(x,y,z)    1/4    0     0    1/4    0    1/4   1/4    0

For this joint distribution, X, Y, and Z are pairwise independent
but not mutually independent. Alternatively, the joint distribution
for X, Y, and Z can be described by Z = X + Y mod 2, where X and
Y are independent and identically distributed, each uniform on {0, 1}.
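Remark (not part of the required solution): the claimed properties of this
distribution can be checked directly. The Python sketch below is our own
illustration of that check.

```python
import numpy as np

# p(x, y, z) with Z = X + Y mod 2, X and Y i.i.d. uniform on {0, 1}.
P = np.zeros((2, 2, 2))
for x in (0, 1):
    for y in (0, 1):
        P[x, y, (x + y) % 2] = 0.25

pxy = P.sum(axis=2); pxz = P.sum(axis=1); pyz = P.sum(axis=0)
px = P.sum(axis=(1, 2)); py = P.sum(axis=(0, 2)); pz = P.sum(axis=(0, 1))

print(np.allclose(pxy, np.outer(px, py)))   # True: X and Y independent
print(np.allclose(pxz, np.outer(px, pz)))   # True: X and Z independent
print(np.allclose(pyz, np.outer(py, pz)))   # True: Y and Z independent

# Mutual independence would require p(x,y,z) = p(x)p(y)p(z) everywhere.
prod = px[:, None, None] * py[None, :, None] * pz[None, None, :]
print(np.allclose(P, prod))                 # False: not mutually independent
```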
4. Verify that $p(x, y, z)$ as defined in Definition 2.4 is a probability
   distribution. You should exclude all the zero probability masses from the
   summation carefully.

   Solution:
   Consider
   \begin{align*}
   \sum_{x, y, z} p(x, y, z)
   &= \sum_{y \in S_Y} \sum_{x, z} \frac{p(x, y)\,p(y, z)}{p(y)} \\
   &= \sum_{y \in S_Y} \sum_{x, z} p(x, y)\,p(z|y) \\
   &= \sum_{y \in S_Y} \left[ \sum_{x} p(x, y) \right] \left[ \sum_{z} p(z|y) \right] \\
   &= \sum_{y \in S_Y} \sum_{x} p(x, y) \\
   &= \sum_{y \in S_Y} p(y) \\
   &= 1.
   \end{align*}
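   Remark (not part of the required solution): the computation above can be
   mirrored in code. The Python sketch below is our own illustration; it builds
   $p(x, y)$ and $p(y, z)$ from an arbitrary marginal $p(y)$ (deliberately
   including a zero mass) and checks that $p(x, y, z) = p(x, y)p(y, z)/p(y)$,
   summed over the support of $Y$ only, totals one.

   ```python
   import numpy as np

   rng = np.random.default_rng(2)

   # Marginal p(y) on 4 symbols, with one zero mass to exercise the support rule.
   py = np.array([0.3, 0.0, 0.5, 0.2])
   px_given_y = rng.dirichlet(np.ones(3), size=4)   # p(x|y), rows indexed by y
   pz_given_y = rng.dirichlet(np.ones(5), size=4)   # p(z|y), rows indexed by y

   pxy = py[:, None] * px_given_y                   # p(x, y), indexed [y, x]
   pyz = py[:, None] * pz_given_y                   # p(y, z), indexed [y, z]

   total = 0.0
   for y in range(4):
       if py[y] == 0:                               # exclude zero probability masses
           continue
       total += np.sum(pxy[y][:, None] * pyz[y][None, :]) / py[y]
   print(total)                                      # 1.0 (up to floating point)
   ```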
5. Linearity of expectation. It is well known that expectation is linear,
   i.e., $E[f(X) + g(Y)] = Ef(X) + Eg(Y)$, where the summation in an
   expectation is taken over the corresponding alphabet. However, we
   adopt in information theory the convention that the summation in an
   expectation is taken over the corresponding support. Justify carefully
   the linearity of expectation under this convention.

   Solution:
   Consider
   \begin{align*}
   E[f(X) + g(Y)]
   &= \sum_{(x, y) \in S_{XY}} p(x, y)\,(f(x) + g(y)) \\
   &= \sum_{(x, y) \in S_{XY}} p(x, y)\,f(x) + \sum_{(x, y) \in S_{XY}} p(x, y)\,g(y) \\
   &= \sum_{x \in S_X} \sum_{y : (x, y) \in S_{XY}} p(x, y)\,f(x)
      + \sum_{y \in S_Y} \sum_{x : (x, y) \in S_{XY}} p(x, y)\,g(y) \\
   &= \sum_{x \in S_X} p(x)\,f(x) + \sum_{y \in S_Y} p(y)\,g(y) \\
   &= Ef(X) + Eg(Y),
   \end{align*}
   where the second last step holds because for $x \in S_X$, the terms with
   $(x, y) \notin S_{XY}$ have $p(x, y) = 0$, so that
   $\sum_{y : (x, y) \in S_{XY}} p(x, y) = p(x)$, and similarly for $y \in S_Y$.
   Thus the linearity of the information-theoretic expectation operator
   is justified no matter what values $f(x)$ and $g(y)$ may take (possibly
   $+\infty$ or $-\infty$) for $x \notin S_X$ and $y \notin S_Y$, respectively.
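   Remark (not part of the required solution): to see why the support
   convention matters, the Python sketch below (our own example) takes
   $f(x) = -\log p(x)$ and $g(y) = -\log p(y)$, which are $+\infty$ off the
   supports, and confirms that the two sides of the linearity identity agree
   when each sum is restricted to the corresponding support.

   ```python
   import numpy as np

   # A joint distribution with zero masses, so the supports are proper subsets.
   P = np.array([[0.2, 0.0, 0.1],
                 [0.0, 0.3, 0.0],
                 [0.4, 0.0, 0.0],
                 [0.0, 0.0, 0.0]])   # the last row puts x = 3 outside S_X

   px = P.sum(axis=1)
   py = P.sum(axis=0)

   def f(x):
       return -np.log(px[x]) if px[x] > 0 else np.inf

   def g(y):
       return -np.log(py[y]) if py[y] > 0 else np.inf

   # Left-hand side: sum over the support S_XY only.
   lhs = sum(P[x, y] * (f(x) + g(y))
             for x in range(4) for y in range(3) if P[x, y] > 0)

   # Right-hand side: sums over S_X and S_Y separately.
   rhs = sum(px[x] * f(x) for x in range(4) if px[x] > 0) + \
         sum(py[y] * g(y) for y in range(3) if py[y] > 0)

   print(lhs, rhs)   # equal, even though f(3) = +inf outside the support
   ```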
8. Let $p_k$ and $p$ be probability distributions defined on a common finite
   alphabet. Show that as $k \to \infty$, if $p_k \to p$ in variational distance,
   then $p_k \to p$ in $\mathcal{L}^2$, and vice versa.

   Solution:
   Note that the variational distance is exactly the $\mathcal{L}^1$ distance. Thus it
   suffices to show that in $\Re^{|\mathcal{X}|}$, where $|\mathcal{X}|$ is finite, $\mathcal{L}^1$ convergence is
   equivalent to $\mathcal{L}^2$ convergence. Toward this end, consider any
   $u = (u(x), x \in \mathcal{X}) \in \Re^{|\mathcal{X}|}$. Then for all $\epsilon > 0$,
   \begin{align*}
   \sqrt{\sum_x u(x)^2} < \epsilon
   &\;\Rightarrow\; \sum_x u(x)^2 < \epsilon^2 \\
   &\;\Rightarrow\; u(x)^2 < \epsilon^2 \quad \text{for all } x \in \mathcal{X} \\
   &\;\Rightarrow\; |u(x)| < \epsilon \quad \text{for all } x \in \mathcal{X} \\
   &\;\Rightarrow\; \sum_x |u(x)| < |\mathcal{X}|\,\epsilon.
   \end{align*}
   Thus we have shown that $u \to 0$ (the zero vector) in $\mathcal{L}^2$ implies
   $u \to 0$ in $\mathcal{L}^1$. Similarly, it can be shown that $u \to 0$ in $\mathcal{L}^1$ implies
   $u \to 0$ in $\mathcal{L}^2$. The proof is completed upon letting
   $u(x) = p_k(x) - p(x)$ for $x \in \mathcal{X}$ and $k \to \infty$.
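   Remark (not part of the required solution): the two norm bounds underlying
   the argument can be observed numerically. The Python sketch below is our own
   illustration; the alphabet size is an arbitrary choice, and $u = p_k - p$ is
   simulated by differences of random distributions.

   ```python
   import numpy as np

   rng = np.random.default_rng(3)
   alphabet_size = 6                       # |X|, chosen arbitrarily for the check

   for k in range(1, 6):
       p = rng.dirichlet(np.ones(alphabet_size))
       q = rng.dirichlet(np.ones(alphabet_size))
       u = p - q                           # plays the role of u(x) = p_k(x) - p(x)
       l1 = np.sum(np.abs(u))              # variational (L1) distance
       l2 = np.sqrt(np.sum(u ** 2))        # L2 distance
       # On a finite alphabet each norm controls the other, which is what
       # makes L1 and L2 convergence equivalent.
       assert l2 <= l1
       assert l1 <= alphabet_size * l2
       print(f"trial {k}: L1 = {l1:.4f}, L2 = {l2:.4f}")
   ```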