
Harvard SEAS

ES250 Information Theory

Final Exam
Solutions
1. (15 points) Let X, Y, Z be three binary Bernoulli(1/2) random variables that are pairwise independent;
that is, I(X; Y) = I(X; Z) = I(Y; Z) = 0.
(a) (10 points) Under this constraint, what is the minimum value for H(X, Y, Z)?
(b) (5 points) Give an example achieving this minimum.
Solution :
(a) By the chain rule,

    H(X, Y, Z) = H(X, Y) + H(Z|X, Y) >= H(X, Y) = 2,

where H(X, Y) = 2 because X and Y are independent Bernoulli(1/2) random variables. So the minimum value of H(X, Y, Z) is at least 2; we show in part (b) that this bound is attainable, so the minimum is exactly 2.
(b) Let X and Y be i.i.d. Bernoulli(1/2) and let Z = X ⊕ Y, where ⊕ denotes addition mod 2 (XOR). Then X, Y, Z are pairwise independent and H(X, Y, Z) = H(X, Y) = 2.
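
As a quick numerical check of this construction, here is a minimal Python sketch (the helper functions H and marginal are ours, not part of the original solution):

    from itertools import product
    from math import log2

    # X, Y i.i.d. Bernoulli(1/2), Z = X xor Y: joint distribution over (X, Y, Z).
    joint = {}
    for x, y in product([0, 1], repeat=2):
        joint[(x, y, x ^ y)] = 0.25

    def H(dist):
        # Entropy in bits of a distribution given as {outcome: probability}.
        return -sum(p * log2(p) for p in dist.values() if p > 0)

    def marginal(dist, idx):
        # Marginal distribution of the coordinates listed in idx.
        out = {}
        for outcome, p in dist.items():
            key = tuple(outcome[i] for i in idx)
            out[key] = out.get(key, 0.0) + p
        return out

    # Pairwise mutual informations I(A;B) = H(A) + H(B) - H(A,B): all zero.
    for i, j in [(0, 1), (0, 2), (1, 2)]:
        mi = H(marginal(joint, [i])) + H(marginal(joint, [j])) - H(marginal(joint, [i, j]))
        print("I between coordinates", i, j, "=", round(mi, 10))

    print("H(X, Y, Z) =", H(joint))   # 2.0 bits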
2. (20 points) Let {Xn } be a stationary Markov process.
(a) (10 points) For what index k is
    H(X−n|X0, X1) = H(Xk|X0, X1)?
Give the argument.
(b) (10 points) Show that
    I(X1; X3) + I(X2; X4) <= I(X1; X4) + I(X2; X3).
Solution :
(a) The trivial solution is k = −n. To find other possible values of k, we expand

    H(X−n|X0, X1) = H(X−n, X0, X1) − H(X0, X1)
                  = H(X−n) + H(X0, X1|X−n) − H(X0, X1)
                  = H(X−n) + H(X0|X−n) + H(X1|X0, X−n) − H(X0, X1)
              (a) = H(X−n) + H(X0|X−n) + H(X1|X0) − H(X0, X1)
                  = H(X−n) + H(X0|X−n) − H(X0)
              (b) = H(X0) + H(X0|X−n) − H(X0)
                  = H(X0|X−n)
              (c) = H(Xn|X0)
              (d) = H(Xn|X0, X−1)
              (e) = H(Xn+1|X1, X0),

where (a) and (d) follow from Markovity and (b), (c), and (e) follow from stationarity. Hence k = n + 1 is also a solution. There is no other solution, since for any other k we can construct a periodic Markov process as a counterexample. Therefore k ∈ {−n, n + 1}.

(b)

    I(X1; X4) + I(X2; X3) − I(X1; X3) − I(X2; X4)
      = H(X1) − H(X1|X4) + H(X2) − H(X2|X3) − (H(X1) − H(X1|X3)) − (H(X2) − H(X2|X4))
      = H(X1|X3) − H(X1|X4) + H(X2|X4) − H(X2|X3)
      = H(X1, X2|X3) − H(X2|X1, X3) − H(X1, X2|X4) + H(X2|X1, X4)
          + H(X1, X2|X4) − H(X1|X2, X4) − H(X1, X2|X3) + H(X1|X2, X3)
      = −H(X2|X1, X3) + H(X2|X1, X4)
      = H(X2|X1, X4) − H(X2|X1, X3, X4)
      = I(X2; X3|X1, X4)
      >= 0,

where H(X1|X2, X3) = H(X1|X2, X4) (both equal H(X1|X2)) and H(X2|X1, X3) = H(X2|X1, X3, X4) by Markovity of the random variables.
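
As a sanity check of the inequality, the sketch below evaluates both sides on a small stationary Markov chain (Python; the binary symmetric chain and its flip probability are illustrative choices, not part of the original solution):

    from itertools import product
    from math import log2

    p = 0.1   # illustrative flip probability of a binary symmetric Markov chain
    T = {(a, b): 1 - p if a == b else p for a, b in product([0, 1], repeat=2)}

    # Joint distribution of (X1, X2, X3, X4); the stationary distribution is uniform.
    joint = {}
    for x in product([0, 1], repeat=4):
        joint[x] = 0.5 * T[(x[0], x[1])] * T[(x[1], x[2])] * T[(x[2], x[3])]

    def H(dist):
        return -sum(q * log2(q) for q in dist.values() if q > 0)

    def marginal(dist, idx):
        out = {}
        for outcome, q in dist.items():
            key = tuple(outcome[i] for i in idx)
            out[key] = out.get(key, 0.0) + q
        return out

    def I(i, j):
        # Mutual information between coordinates i and j (0-based), in bits.
        return H(marginal(joint, [i])) + H(marginal(joint, [j])) - H(marginal(joint, [i, j]))

    lhs = I(0, 2) + I(1, 3)   # I(X1;X3) + I(X2;X4)
    rhs = I(0, 3) + I(1, 2)   # I(X1;X4) + I(X2;X3)
    print(lhs, "<=", rhs, ":", lhs <= rhs + 1e-12)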


3. (10 points) A random variable X takes on three values with probabilities 0.6, 0.3, and 0.1.
(a) (5 points) What are the lengths of the binary Huffman codewords for X? What are the lengths of the binary Shannon codewords l(x) = ⌈log 1/p(x)⌉ for X?

(b) (5 points) What is the smallest integer D such that the expected Shannon codeword length with
a D-ary alphabet equals the expected Huffman codeword length with a D-ary alphabet?
Solution :
(a) A binary Huffman code for the distribution (0.6, 0.3, 0.1) is (1, 01, 00), with codeword lengths (1, 2, 2). The Shannon code uses lengths ⌈log 1/p⌉, which gives lengths (1, 2, 4) for the three symbols.
(b) For any D >= 3, the Huffman codewords for the three symbols are each a single character, so the expected Huffman length is 1. The Shannon code lengths ⌈log_D(1/p)⌉ are all equal to 1 exactly when ⌈log_D(1/0.1)⌉ = 1, i.e., when D >= 10. Hence the smallest such integer is D = 10, and for D >= 10 the Shannon code is also optimal.
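
The binary lengths in part (a) can be reproduced with a small sketch (Python; a hand-rolled Huffman merge rather than any particular library routine):

    import heapq
    from math import ceil, log

    probs = [0.6, 0.3, 0.1]

    # Binary Huffman lengths: repeatedly merge the two least likely nodes and
    # count how many merges each original symbol takes part in.
    heap = [(p, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, s1 = heapq.heappop(heap)
        p2, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, s1 + s2))
    print("Huffman lengths:", lengths)                               # [1, 2, 2]

    # Binary Shannon lengths: ceil(log2(1/p)).
    print("Shannon lengths:", [ceil(log(1 / p, 2)) for p in probs])  # [1, 2, 4]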
4. (15 points) Minimum expected value
(a) (10 points) Find the minimum value of E[X] over all probability density functions f(x) satisfying the following three constraints:
i. f(x) = 0 for x <= 0,
ii. ∫ f(x) dx = 1,
iii. h(f) = h.
(b) (5 points) Solve the same problem if constraint (i.) is replaced by
i. f(x) = 0 for x <= a.
Solution :
(a) Let X be any positive random variable with mean μ. Then, from the result on the maximum-entropy distribution under a mean constraint, h(X) <= h(X*) = log(eμ), where X* has the exponential distribution with mean μ. Exponentiating both sides of this inequality, we get

    E[X] = μ >= (1/e) 2^h.

This bound holds for any distribution with support on the positive real line (whether or not it has a density), but it is tight for the exponential distribution (which has a density). Hence we can conclude that (1/e) 2^h is the minimum value of E[X] over all probability density functions satisfying Conditions (i), (ii), and (iii).
(b) Although we could derive the maximum-entropy distribution for random variables supported on {x >= a} with a mean constraint and repeat the previous argument, we simply reuse the result of part (a) as follows.
Consider any random variable X satisfying Conditions (i), (ii), and (iii) of part (b). Since differential entropy is translation invariant, i.e., h(X) = h(X + b) for any b, the random variable Y = X − a satisfies Conditions (i), (ii), and (iii) of part (a), and thus E[Y] >= (1/e) 2^h. This implies

    E[X] = E[Y] + a >= a + (1/e) 2^h.
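
A quick arithmetic check of part (a) (Python sketch; the mean mu = 2.5 is an illustrative value): the exponential density meets the bound with equality, while a uniform density of the same differential entropy has a strictly larger mean.

    from math import e, log2

    mu = 2.5                           # illustrative mean for the exponential density
    h = log2(e * mu)                   # differential entropy of Exp(mean mu), in bits
    print("h =", h, "bits")
    print("minimum mean 2^h / e =", 2 ** h / e)     # equals mu: the bound is tight

    # Any other density with the same entropy has a larger mean; e.g. the uniform
    # density on (0, L) has h(f) = log2(L), so matching entropies means L = 2^h = e*mu:
    L = 2 ** h
    print("uniform-density mean L / 2 =", L / 2)    # e*mu/2 > mu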
5. (25 points) Consider the two discrete memoryless channels (X, p1(y|x), Y) and (Y, p2(z|y), Z). Let the channel transition matrices for the cascade channels be

    p1(y|x):
    X \ Y   1   e   0
      1     0   1   0
      e     0   1   0
      0     0   0   1

    p2(z|y):
    Y \ Z   1   e   0
      1     1   0   0
      e     0   1   0
      0     0   1   0
(a) (5 points) What is the capacity C1 of p1 (y|x)?


(b) (5 points) What is the capacity C2 of p2 (z|y)?
(c) (5 points) We now cascade these channels. Thus p3(z|x) = Σ_y p1(y|x) p2(z|y). What is the capacity C3 of p3(z|x)?

(d) (5 points) Now let us actively intervene between channels 1 and 2, rather than passively transmit y^n. What is the capacity of channel 1 followed by channel 2 if you are allowed to decode the output y^n of channel 1 and then re-encode it as ỹ^n(y^n) for transmission over channel 2? (Think W → x^n(W) → y^n → ỹ^n(y^n) → z^n → Ŵ.)

(e) (5 points) What is the capacity of the cascade in part (c) if the receiver can view both Y and Z?

Solution :
(a) Since H(Y |X) = 0 and Y can be only 0 or e, C1 = max I(X; Y ) = max H(Y ) = 1, which is
attained by any distribution p(x) with p(1) + p(e) = 1/2 and p(0) = 1/2.
(b) Similarly, C2 = 1.
(c) The transition matrix for the cascaded channel is:

    p3(z|x):
    X \ Z   1   e   0
      1     0   1   0
      e     0   1   0
      0     0   1   0

Since Z = e for any input X, 0 <= C3 <= max H(Z) = 0, so that C3 = 0.


(d) Since we are allowed to decode the intermediate outputs and re-encode them prior to the second transmission, any rate less than both C1 and C2 is achievable, while any rate greater than either C1 or C2 will cause P_e^(n) → 1. Hence, the overall capacity is the minimum of the two capacities, min(C1, C2) = 1.
(e) Note that Z becomes irrelevant once we observe Y . Thus, the capacity of this channel is just
C1 = 1.
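
Parts (a) and (c) are easy to verify by brute force, since the channels are deterministic; the sketch below (Python; the symbol ordering (1, e, 0) is our convention) forms the cascade matrix and the optimal output distribution of channel 1.

    from math import log2

    # Rows and columns indexed by the symbols (1, e, 0), in that order.
    p1 = [[0, 1, 0],   # X = 1 -> Y = e
          [0, 1, 0],   # X = e -> Y = e
          [0, 0, 1]]   # X = 0 -> Y = 0
    p2 = [[1, 0, 0],   # Y = 1 -> Z = 1
          [0, 1, 0],   # Y = e -> Z = e
          [0, 1, 0]]   # Y = 0 -> Z = e

    # Cascade: p3(z|x) = sum_y p1(y|x) p2(z|y).
    p3 = [[sum(p1[x][y] * p2[y][z] for y in range(3)) for z in range(3)]
          for x in range(3)]
    print("p3 rows:", p3)   # every row is [0, 1, 0]: Z = e whatever X is, so C3 = 0

    # Channel 1 is deterministic, so C1 = max over p(x) of H(Y).  With
    # p(1) + p(e) = 1/2 and p(0) = 1/2, Y is e or 0 with probability 1/2 each:
    py = [0.0, 0.5, 0.5]
    print("H(Y) =", -sum(q * log2(q) for q in py if q > 0), "bit")   # 1.0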
6. (15 points) Consider a Gaussian noise channel with power constraint P , where the signal takes two
different paths and the received noisy signals are added together at the antenna.

(a) (10 points) Find the capacity of this channel if Z1 and Z2 are jointly normal with covariance matrix

    K_Z = [ σ²    ρσ²
            ρσ²   σ²  ].

(b) (5 points) Compare with the capacity of a single-path Gaussian channel, i.e., with Z ~ N(0, σ²).
Solution :
(a) The channel reduces to the equivalent single channel Y = 2X + Z1 + Z2 (the same input X is sent over both paths and the two noisy received signals are added at the antenna). The power constraint on the input 2X is 4P. Z1 and Z2 are zero mean, and therefore

    Var(Z1 + Z2) = E[(Z1 + Z2)²] = E[Z1² + Z2² + 2 Z1 Z2] = 2σ² + 2ρσ².

Thus the noise distribution is N(0, 2σ²(1 + ρ)). Plugging the noise and power values into the formula for the one-dimensional (P, N) channel capacity, C = (1/2) log(1 + P/N), we get

    C = (1/2) log(1 + 4P/(2σ²(1 + ρ))) = (1/2) log(1 + 2P/(σ²(1 + ρ))).

(b) The capacity of a single-path Gaussian channel is C' = (1/2) log(1 + P/σ²). Since 2/(1 + ρ) >= 1 whenever ρ <= 1,

    C = (1/2) log(1 + 2P/(σ²(1 + ρ))) >= (1/2) log(1 + P/σ²) = C'.

Equality holds when ρ = 1.
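
For concreteness, the two capacities can be compared numerically (Python sketch; P, sigma2, and the values of rho are illustrative):

    from math import log2

    P, sigma2 = 1.0, 1.0              # illustrative power constraint and noise variance

    for rho in (-0.5, 0.0, 0.5, 1.0):
        c_two_path = 0.5 * log2(1 + 2 * P / (sigma2 * (1 + rho)))
        c_single = 0.5 * log2(1 + P / sigma2)
        print("rho = %+.1f: two-path C = %.4f, single-path C = %.4f"
              % (rho, c_two_path, c_single))
    # The two-path capacity is never smaller, and the two agree at rho = 1.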


7. (15 points) Consider frequency-division multiple access (FDMA) in a Gaussian channel.
(a) (10 points) Maximize the throughput

    R1 + R2 = W1 log(1 + P1/(N W1)) + (W − W1) log(1 + P2/(N(W − W1)))

over W1 to show that bandwidth should be proportional to transmitted power for FDMA.
(b) (5 points) Compare the result of part (a) to the sum capacity of the Gaussian multiple access
channel.
Solution :
(a) Allocating bandwidths W1 and W2 = W − W1 to the two senders, we can achieve the following rates:

    R1 = W1 log(1 + P1/(N W1)),
    R2 = W2 log(1 + P2/(N W2)).

To maximize the sum of the rates, we write

    R = R1 + R2 = W1 log(1 + P1/(N W1)) + (W − W1) log(1 + P2/(N(W − W1)))          (1)

and, differentiating with respect to W1, we obtain

    log(1 + P1/(N W1)) − (P1/(N W1)) / (1 + P1/(N W1))
      − log(1 + P2/(N(W − W1))) + (P2/(N(W − W1))) / (1 + P2/(N(W − W1))) = 0.      (2)

Instead of solving this equation, we can verify that if we set

    W1 = (P1/(P1 + P2)) W                                                           (3)

so that

    P1/(N W1) = P2/(N W2) = (P1 + P2)/(N W),                                        (4)

then (2) is satisfied. Hence using bandwidth proportional to the power optimizes the total rate for frequency-division multiple access.

(b) The sum capacity of the Gaussian multiple-access channel is

    R* = W log(1 + (P1 + P2)/(N W)).

Plugging (3) and (4) into (1), we can easily show that R = R*. Thus, the optimal bandwidth-proportional FDMA achieves the sum capacity.
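
The optimality of the proportional split is easy to confirm numerically; the sketch below (Python, with illustrative values of P1, P2, N, and W) scans W1 and compares the maximum throughput with the sum capacity R*.

    from math import log2

    P1, P2, N, W = 3.0, 1.0, 1.0, 1.0   # illustrative powers, noise density, bandwidth

    def R(W1):
        # FDMA throughput R1 + R2 for the split (W1, W - W1).
        W2 = W - W1
        return W1 * log2(1 + P1 / (N * W1)) + W2 * log2(1 + P2 / (N * W2))

    # Brute-force scan over the split.
    best_W1 = max((i / 10000 * W for i in range(1, 10000)), key=R)
    print("best W1              :", best_W1)
    print("proportional split   :", P1 / (P1 + P2) * W)                 # 0.75
    print("maximum throughput   :", R(best_W1))
    print("sum capacity R*      :", W * log2(1 + (P1 + P2) / (N * W)))  # log2(5)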
8. (15 points) Multiple access channel (MAC)
(a) (7 points) Consider the channel Y = X1 + X2 with finite alphabets X1 ∈ {1, 2, . . . , m} and X2 ∈ {√2, 2√2, . . . , m√2}. Find the capacity region.

(b) (10 points) Consider the channel Y = X1 + X2 (mod 4), where X1 ∈ {0, 1, 2, 3} and X2 ∈ {0, 1}. Find the capacity region.
Solution :

(a) Once the receiver gets Y = a + b√2 for some a, b ∈ {1, 2, . . . , m}, it can immediately nail down the input signals as X1 = a and X2 = b√2 (since √2 is irrational, this decomposition is unique). Hence, the capacity region is

    R1 <= log m
    R2 <= log m

(b) The MAC capacity region is given by the standard set of inequalities, which reduce as follows since there is no noise:

    R1 <= I(X1; Y|X2) = H(Y|X2) − H(Y|X1, X2) = H(Y|X2) = H(X1)
    R2 <= I(X2; Y|X1) = H(Y|X1) − H(Y|X1, X2) = H(Y|X1) = H(X2)
    R1 + R2 <= I(X1, X2; Y) = H(Y) − H(Y|X1, X2) = H(Y)

Since entropy is maximized by a uniform distribution over the finite alphabet, R1 <= H(X1) <= 2, R2 <= H(X2) <= 1, and R1 + R2 <= H(Y) <= 2. Further, if X1 ~ Unif{0, 1, 2, 3} and X2 ~ Unif{0, 1}, then Y ~ Unif{0, 1, 2, 3}, so the upper bounds are achieved. This gives the capacity region {(R1, R2) : R1 <= 2, R2 <= 1, R1 + R2 <= 2}.
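
The corner point in part (b) can be checked directly (Python sketch): with independent uniform inputs, Y = X1 + X2 (mod 4) is uniform on {0, 1, 2, 3}, so all three bounds are met.

    from itertools import product
    from math import log2

    # Y = X1 + X2 (mod 4) with X1 uniform on {0,1,2,3} and X2 uniform on {0,1}.
    py = {}
    for x1, x2 in product(range(4), range(2)):
        y = (x1 + x2) % 4
        py[y] = py.get(y, 0.0) + (1 / 4) * (1 / 2)

    print("p(y):", py)                                           # each value 0.25
    print("H(Y)  =", -sum(q * log2(q) for q in py.values()))     # 2.0
    print("H(X1) =", log2(4), " H(X2) =", log2(2))               # 2.0, 1.0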

9. (20 points) A speaker of Dutch, Spanish, and French wishes to communicate simultaneously to three people: D, S, and F. D knows only Dutch but can distinguish when a Spanish word is being spoken as distinguished from a French word; similarly for the other two, who know only Spanish and French, respectively, but can distinguish when a foreign word is spoken and which language is being spoken. Suppose that each language, Dutch, Spanish, and French, has M words: M words of Dutch, M words of French, and M words of Spanish.
(a) (10 points) What is the maximum rate at which the trilingual speaker can speak to D?
(b) (5 points) If he speaks to D at the maximum rate, what is the maximum rate at which he can
speak simultaneously to S?
(c) (5 points) If he is speaking to D and S at the joint rates of part (b), can he also speak to F at some positive rate? If so, what is it? If not, why not?
Solution :
(a) Speaking Dutch gives M words, and in addition two "words" for the distinguishability of Spanish and French from Dutch; thus the maximum rate is log(M + 2) bits per word.
(b) Transmitting log M bits for a fraction 1/(M + 2) of the time gives R = log M / (M + 2).
(c) Yes; the same reasoning as in (b) gives R = log M / (M + 2).
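
For a concrete feel for these rates (Python sketch; M = 100 is an illustrative vocabulary size):

    from math import log2

    M = 100                          # illustrative vocabulary size per language
    rate_D = log2(M + 2)             # bits per word deliverable to D
    rate_S = log2(M) / (M + 2)       # bits per word deliverable to S (and to F)
    print("rate to D: %.3f bits/word" % rate_D)    # ~6.67
    print("rate to S: %.4f bits/word" % rate_S)    # ~0.065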
