
Harvard SEAS ES250 Information Theory

Homework 3
Solutions
1. Let $X \sim p(x)$, $x = 1, 2, \ldots, m$, denote the winner of a horse race. Suppose the odds $o(x)$ are fair with respect to $p(x)$, i.e., $o(x) = \frac{1}{p(x)}$. Let $b(x)$ be the amount bet on horse $x$, with $b(x) \geq 0$ and $\sum_{x=1}^{m} b(x) = 1$.
Then the resulting wealth factor is $S(x) = b(x)o(x)$, with probability $p(x)$.
(a) Find the expected wealth $E\,S(X)$.
(b) Find $W^*$, the optimal growth rate of wealth.
(c) Suppose
$$Y = \begin{cases} 1, & X = 1 \text{ or } 2 \\ 0, & \text{otherwise.} \end{cases}$$
If this side information is available before the bet, how much does it increase the growth rate $W^*$?
(d) Find $I(X; Y)$.
Solution :
(a) The expected wealth $E\,S(X)$ is
\begin{align*}
E\,S(X) &= \sum_{x=1}^{m} S(x)\,p(x) \\
        &= \sum_{x=1}^{m} b(x)\,o(x)\,p(x) \\
        &= \sum_{x=1}^{m} b(x) \qquad (\text{since } o(x) = 1/p(x)) \\
        &= 1.
\end{align*}
(b) The optimal growth rate of wealth, $W^*$, is achieved when $b(x) = p(x)$ for all $x$, in which case
\begin{align*}
W^* &= E[\log S(X)] \\
    &= \sum_{x=1}^{m} p(x) \log\bigl(b(x)\,o(x)\bigr) \\
    &= \sum_{x=1}^{m} p(x) \log\bigl(p(x)/p(x)\bigr) \\
    &= \sum_{x=1}^{m} p(x) \log 1 \\
    &= 0,
\end{align*}
so we maintain our current wealth.
(c) The increase in our growth rate due to the side information is given by $I(X; Y)$. Let $q = \Pr(Y = 1) = p(1) + p(2)$. Then
\begin{align*}
I(X; Y) &= H(Y) - H(Y \mid X) \\
        &= H(Y) \qquad (\text{since } Y \text{ is a deterministic function of } X) \\
        &= H(q).
\end{align*}
(d) $I(X; Y) = H(q)$, as already computed in part (c).
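To make parts (b) and (c) concrete, here is a small Python sketch. The four-outcome distribution $p = (1/2, 1/4, 1/8, 1/8)$ is an assumption chosen only for illustration, since the problem leaves $p(x)$ general; the sketch computes the growth rate of proportional betting without side information, and then with the bet allowed to depend on $Y = 1\{X \in \{1, 2\}\}$, so the increase equals $H(q) = I(X; Y)$.

```python
import numpy as np

# Illustrative distribution (an assumption; the problem leaves p(x) general).
p = np.array([0.5, 0.25, 0.125, 0.125])
o = 1.0 / p                                   # fair odds o(x) = 1/p(x)

# Without side information: bet proportionally, b(x) = p(x).
W_no_side = np.sum(p * np.log2(p * o))        # = 0, as in part (b)

# With Y = 1{X in {1,2}} revealed first: bet b(x|y) = p(x|y) on the horses
# consistent with the observed y.
q = p[0] + p[1]
W_side = 0.0
for idx, py in (([0, 1], q), ([2, 3], 1.0 - q)):
    p_cond = p[idx] / py
    W_side += py * np.sum(p_cond * np.log2(p_cond * o[idx]))

H = lambda t: -t * np.log2(t) - (1 - t) * np.log2(1 - t)
print(W_no_side)          # 0.0
print(W_side, H(q))       # both ≈ 0.811: the increase equals H(q) = I(X; Y)
```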
2. Many years ago in ancient St. Petersburg the following gambling proposition caused great consternation. For an entry fee of $c$ units, a gambler receives a payoff of $2^k$ units with probability $2^{-k}$, $k = 1, 2, \ldots$.
(a) Show that the expected payoff for this game is infinite. For this reason, it was argued that $c = \infty$ was a fair price to pay to play this game. Most people find this answer absurd.
(b) Suppose that the gambler can buy a share of the game. For example, if he invests $c/2$ units in the game, he receives $\frac{1}{2}$ a share and a return $X/2$, where $\Pr(X = 2^k) = 2^{-k}$, $k = 1, 2, \ldots$. Suppose that $X_1, X_2, \ldots$ are i.i.d. according to this distribution and that the gambler reinvests all his wealth each time. Thus, his wealth $S_n$ at time $n$ is given by
$$S_n = \prod_{i=1}^{n} \frac{X_i}{c}.$$
Show that this limit is $\infty$ or $0$, with probability 1, according as $c < c^*$ or $c > c^*$. Identify the fair entry fee $c^*$.
More realistically, the gambler should be allowed to keep a proportion $\bar{b} = 1 - b$ of his money in his pocket and invest the rest in the St. Petersburg game. His wealth at time $n$ is then
$$S_n = \prod_{i=1}^{n} \left( \bar{b} + \frac{b X_i}{c} \right).$$
Let
$$W(b, c) = \sum_{k=1}^{\infty} 2^{-k} \log \left( 1 - b + \frac{b\,2^k}{c} \right).$$
We have
$$S_n \doteq 2^{n W(b, c)}.$$
Let
$$W^*(c) = \max_{0 \leq b \leq 1} W(b, c).$$
Here are some questions about $W^*(c)$.
(c) For what value of the entry fee $c$ does the optimizing value $b^*$ drop below 1?
(d) How does $b^*$ vary with $c$?
(e) How does $W^*(c)$ fall off with $c$?
Figure 1: St. Petersburg: $W(b, c)$ as a function of $b$ and $c$.
Note that since $W^*(c) > 0$ for all $c$, we can conclude that any entry fee $c$ is fair.
Solution :
(a) The expected return is
$$E X = \sum_{k=1}^{\infty} \Pr(X = 2^k)\, 2^k = \sum_{k=1}^{\infty} 2^{-k}\, 2^k = \sum_{k=1}^{\infty} 1 = \infty.$$
Thus the expected return on the game is infinite.
(b) By the strong law of large numbers, we see that
$$\frac{1}{n} \log S_n = \frac{1}{n} \sum_{i=1}^{n} \log X_i - \log c \;\to\; E[\log X] - \log c \quad \text{w.p. 1},$$
and therefore $S_n$ goes to infinity or to 0 according to whether $E[\log X]$ is greater or less than $\log c$. Therefore,
$$\log c^* = E[\log X] = \sum_{k=1}^{\infty} k\, 2^{-k} = 2.$$
Therefore a fair entry fee is $c^* = 4$ units if the gambler is forced to invest all his money.
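As a quick Monte Carlo illustration of this threshold (a sketch assuming NumPy is available), the empirical average of $\log_2(X_i/c)$ settles near $E[\log_2 X] - \log_2 c = 2 - \log_2 c$, which is positive for $c < 4$ and negative for $c > 4$:

```python
import numpy as np

rng = np.random.default_rng(1)

def log2_growth_rate(c, n=100_000):
    # X = 2^K with P(K = k) = 2^{-k}, i.e. K ~ Geometric(1/2) on {1, 2, ...}.
    K = rng.geometric(0.5, size=n)
    return float(np.mean(K - np.log2(c)))     # (1/n) log2 S_n

print(log2_growth_rate(3.5))   # > 0: below the fair fee, wealth grows
print(log2_growth_rate(4.5))   # < 0: above the fair fee, wealth shrinks
```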
(c) If the gambler is not required to invest all his money, then the growth rate is
$$W(b, c) = \sum_{k=1}^{\infty} 2^{-k} \log \left( 1 - b + \frac{b\,2^k}{c} \right).$$
Figure 2: St. Petersburg: $b^*$ as a function of $c$.
For $b = 0$, $W = 0$, and for $b = 1$, $W = E[\log X] - \log c = 2 - \log c$. Differentiating to find the optimum value of $b$, we obtain
$$\frac{\partial W(b, c)}{\partial b} = \sum_{k=1}^{\infty} 2^{-k}\, \frac{\frac{2^k}{c} - 1}{1 - b + \frac{b\,2^k}{c}}.$$
Unfortunately, there is no explicit solution for the $b$ that maximizes $W$ for a given value of $c$, and we have to solve this numerically on the computer, for example as in the sketch below.
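A minimal numerical sketch of that computation, assuming NumPy and SciPy are available: truncate the series at a large $k_{\max}$ and maximize $W(b, c)$ over $b \in [0, 1]$ for a few values of $c$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def W(b, c, kmax=60):
    """Growth rate W(b, c) = sum_k 2^{-k} log2(1 - b + b 2^k / c), truncated at kmax."""
    k = np.arange(1, kmax + 1)
    return float(np.sum(2.0 ** -k * np.log2(1 - b + b * 2.0 ** k / c)))

def b_star(c):
    res = minimize_scalar(lambda b: -W(b, c), bounds=(0.0, 1.0), method="bounded")
    return res.x

for c in (2.5, 3.0, 3.5, 5.0, 8.0):
    b = b_star(c)
    print(f"c = {c:4.1f}   b* = {b:.4f}   W*(c) = {W(b, c):.4f}")
# b* stays (numerically) at 1 for c < 3 and drops into the interior for c > 3.
```

Sweeping $c$ over a fine grid in this way produces curves of the kind shown in Figures 2 and 3.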
We have illustrated the results with three plots. The first (Figure 1) shows $W(b, c)$ as a function of $b$ and $c$, the second (Figure 2) shows $b^*$ as a function of $c$, and the third (Figure 3) shows $W^*$ as a function of $c$.
From Figure 2, it is clear that $b^*$ is less than 1 for $c > 3$. We can also see this analytically, by calculating the slope $\frac{\partial W(b,c)}{\partial b}$ at $b = 1$:
$$\left.\frac{\partial W(b, c)}{\partial b}\right|_{b=1} = \sum_{k=1}^{\infty} 2^{-k}\, \frac{\frac{2^k}{c} - 1}{\frac{2^k}{c}} = \sum_{k=1}^{\infty} 2^{-k} - \sum_{k=1}^{\infty} c\, 2^{-2k} = 1 - \frac{c}{3},$$
which is positive for $c < 3$. Thus for $c < 3$, the optimal value of $b$ lies on the boundary of the region of $b$'s, and for $c > 3$, the optimal value of $b$ lies in the interior.
Figure 3: St. Petersburg: $W^*(b^*, c)$ as a function of $c$.
(d) The variation of $b^*$ with $c$ is shown in Figure 2. As $c \to \infty$, $b^* \to 0$. We have a conjecture (based on numerical results) about the rate at which $b^*$ decays as $c \to \infty$, but we do not have a proof.
(e) The variation of $W^*$ with $c$ is shown in Figure 3.
3. Consider the random variable
$$X = \begin{pmatrix} x_1 & x_2 & x_3 & x_4 & x_5 & x_6 & x_7 \\ 0.49 & 0.26 & 0.12 & 0.04 & 0.04 & 0.03 & 0.02 \end{pmatrix}$$
(a) Find a binary Huffman code for $X$.
(b) Find the expected code length for this encoding.
(c) Find a ternary Huffman code for $X$.
Solution :
(a) A binary Huffman code for this distribution has codeword lengths $(1, 2, 3, 5, 5, 5, 5)$; one such code is $x_1 \to 0$, $x_2 \to 10$, $x_3 \to 110$, $x_4 \to 11100$, $x_5 \to 11101$, $x_6 \to 11110$, $x_7 \to 11111$.
(b) The expected length of the codewords for the binary Huffman code is 2.02 bits ($H(X) = 2.01$ bits).
(c) A ternary Huffman code for this distribution has codeword lengths $(1, 1, 2, 2, 3, 3, 3)$; one such code is $x_1 \to 0$, $x_2 \to 1$, $x_3 \to 20$, $x_4 \to 22$, $x_5 \to 210$, $x_6 \to 211$, $x_7 \to 212$. This code has an expected length of 1.34 ternary symbols ($H_3(X) = 1.27$ ternary symbols).
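The Huffman constructions above can be checked mechanically. The following Python sketch builds a $D$-ary Huffman code by repeatedly merging the $D$ least probable nodes and reports the codeword lengths and expected length for $D = 2$ and $D = 3$; it is offered only as a verification aid.

```python
import heapq

def huffman_lengths(probs, D=2):
    """Optimal D-ary Huffman codeword lengths for the given probabilities."""
    # Pad with zero-probability dummies so every merge combines exactly D nodes.
    n = len(probs)
    while (n - 1) % (D - 1) != 0:
        n += 1
    # Heap entries: (probability, tiebreaker, list of (symbol_index, depth)).
    heap = [(p, i, [(i, 0)]) for i, p in enumerate(probs)]
    heap += [(0.0, len(probs) + j, []) for j in range(n - len(probs))]
    heapq.heapify(heap)
    counter = n
    while len(heap) > 1:
        merged_p, merged_leaves = 0.0, []
        for _ in range(D):
            prob, _, leaves = heapq.heappop(heap)
            merged_p += prob
            merged_leaves += [(sym, depth + 1) for sym, depth in leaves]
        heapq.heappush(heap, (merged_p, counter, merged_leaves))
        counter += 1
    lengths = [0] * len(probs)
    for sym, depth in heap[0][2]:
        lengths[sym] = depth
    return lengths

p = [0.49, 0.26, 0.12, 0.04, 0.04, 0.03, 0.02]
for D in (2, 3):
    L = huffman_lengths(p, D)
    print(D, L, round(sum(pi * li for pi, li in zip(p, L)), 3))
# Expected averages: 2.02 bits (D = 2) and 1.34 ternary symbols (D = 3).
```

Any optimal Huffman code has the same expected length, so the particular tie-breaking used by the heap does not affect the check.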
4. One is given 6 bottles of wine. It is known that precisely one bottle has gone bad (tastes terrible). From inspection of the bottles it is determined that the probability $p_i$ that the $i$th bottle is bad is given by $(p_1, p_2, \ldots, p_6) = \left(\frac{7}{26}, \frac{5}{26}, \frac{4}{26}, \frac{4}{26}, \frac{3}{26}, \frac{3}{26}\right)$. Tasting will determine the bad wine.
Suppose you taste the wines one at a time. Choose the order of tasting to minimize the expected number of tastings required to determine the bad bottle. Remember, if the first 5 wines pass the test you don't have to taste the last.
(a) What is the expected number of tastings required?
(b) Which bottle should be tasted first?
Now you get smart. For the first sample, you mix some of the wines in a fresh glass and sample the mixture. You proceed, mixing and tasting, stopping when the bad bottle has been determined.
(c) What is the minimum expected number of tastings required to determine the bad wine?
(d) What mixture should be tasted first?
Solution :
(a) If we taste one bottle at a time, to minimize the expected number of tastings the order of tasting should be from the most likely wine to be bad to the least. The expected number of tastings required is
$$\sum_{i=1}^{6} p_i l_i = 1 \cdot \frac{7}{26} + 2 \cdot \frac{5}{26} + 3 \cdot \frac{4}{26} + 4 \cdot \frac{4}{26} + 5 \cdot \frac{3}{26} + 5 \cdot \frac{3}{26} = \frac{75}{26} = 2.88.$$
(b) The first bottle to be tasted should be the one with probability $\frac{7}{26}$.
(c) The idea is to use Huffman coding. With Huffman coding, we get codeword lengths of $(2, 2, 3, 3, 3, 3)$. The expected number of tastings required is
$$\sum_{i=1}^{6} p_i l_i = 2 \cdot \frac{7}{26} + 2 \cdot \frac{5}{26} + 3 \cdot \frac{4}{26} + 3 \cdot \frac{4}{26} + 3 \cdot \frac{3}{26} + 3 \cdot \frac{3}{26} = \frac{66}{26} = 2.54.$$
(d) The mixture of the first, third and fourth bottles should be tasted first.
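A short arithmetic check of parts (a) and (c), using the Huffman codeword lengths $(2, 2, 3, 3, 3, 3)$ from part (c); the final line prints the entropy $H(p) \approx 2.52$, which lower-bounds the expected number of yes/no tastings.

```python
from fractions import Fraction
from math import log2

p = [Fraction(k, 26) for k in (7, 5, 4, 4, 3, 3)]

# One bottle at a time, most likely first; the last bottle never needs tasting.
seq_lengths = [1, 2, 3, 4, 5, 5]
# Mixture tastings: binary Huffman codeword lengths from part (c).
huff_lengths = [2, 2, 3, 3, 3, 3]

def expected(lengths):
    return sum(pi * li for pi, li in zip(p, lengths))

print(expected(seq_lengths), float(expected(seq_lengths)))    # 75/26 ≈ 2.88
print(expected(huff_lengths), float(expected(huff_lengths)))  # 66/26 ≈ 2.54
print(-sum(float(pi) * log2(float(pi)) for pi in p))          # H(p) ≈ 2.52
```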
5. Let $X$ be a random variable with alphabet $\{1, 2, 3\}$ and distribution
$$X = \begin{cases} 1, & \text{with probability } 1/2; \\ 2, & \text{with probability } 1/4; \\ 3, & \text{with probability } 1/4. \end{cases}$$
The data compression code for $X$ assigns codewords
$$C(x) = \begin{cases} 0, & \text{if } x = 1; \\ 10, & \text{if } x = 2; \\ 11, & \text{if } x = 3. \end{cases}$$
Let $X_1, X_2, \ldots$ be independent and identically distributed according to this distribution and let $Z_1 Z_2 Z_3 \cdots = C(X_1) C(X_2) \cdots$ be the string of binary symbols resulting from concatenating the corresponding codewords. For example, 122 becomes 01010.
(a) Find the entropy rate $H(X)$ and the entropy rate $H(Z)$ in bits per symbol. Note that $Z$ is not compressible further.
(b) Now let the code be
$$C(x) = \begin{cases} 00, & \text{if } x = 1; \\ 10, & \text{if } x = 2; \\ 01, & \text{if } x = 3. \end{cases}$$
and find the entropy rate $H(Z)$.
(c) Finally, let the code be
$$C(x) = \begin{cases} 00, & \text{if } x = 1; \\ 1, & \text{if } x = 2; \\ 01, & \text{if } x = 3. \end{cases}$$
and find the entropy rate $H(Z)$.
Solution :
(a) First, since the $X_i$'s are independent,
$$H(X) = H(X_1) = \frac{1}{2} \log 2 + 2 \cdot \frac{1}{4} \log 4 = \frac{3}{2}.$$
Now we observe that this is an optimal code for the given distribution on $X$, and since the probabilities are dyadic there is no gain in coding in blocks. So the resulting process has to be i.i.d. Bern(1/2) (for otherwise we could get further compression from it).
Therefore $H(Z) = H\!\left(\frac{1}{2}\right) = 1$.
(b) Here it is easy:
$$H(Z) = \lim_{n \to \infty} \frac{H(Z_1, Z_2, \ldots, Z_n)}{n} = \lim_{n \to \infty} \frac{H(X_1, X_2, \ldots, X_{n/2})}{n} = \lim_{n \to \infty} \frac{\frac{n}{2} H(X)}{n} = \frac{3}{4}.$$
(We are being a little sloppy and ignoring the fact that $n$ above may not be even, but in the limit as $n \to \infty$ this doesn't make a difference.)
(c) If $H'(Z) = \lim_{n \to \infty} H(Z_n \mid Z_1, \ldots, Z_{n-1})$ exists, then by the Cesáro mean lemma, $H(Z) = \lim_{n \to \infty} \frac{1}{n} H(Z_1, \ldots, Z_n)$ exists and $H(Z) = H'(Z)$. Hence, it suffices to show that $H(Z_n \mid Z_1, \ldots, Z_{n-1})$ converges to a limit.
Define a random sequence $Y_n$ as
$$Y_n = \begin{cases} 1, & \text{if } Z_1, \ldots, Z_{n-1} \text{ is a complete sequence of codewords, i.e., } Z_n \text{ is the beginning of a new codeword,} \\ 2, & \text{otherwise.} \end{cases}$$
Note that $Y_n$ is a function of $(Z_1, \ldots, Z_{n-1})$. Also note that $Z_n$ and $(Z_1, \ldots, Z_{n-1})$ are conditionally independent given $Y_n$. In other words, $Y_n$ is a sufficient statistic in $(Z_1, \ldots, Z_{n-1})$ about $Z_n$. Hence,
\begin{align*}
H(Z_n \mid Z_1, \ldots, Z_{n-1}) &= H(Z_n \mid Y_n) \\
&= \sum_{y} p(Y_n = y) H(Z_n \mid Y_n = y) \\
&= p(Y_n = 1) H(Z_n \mid Y_n = 1) + p(Y_n = 2) H(Z_n \mid Y_n = 2) \\
&= p(Y_n = 1) H(1/4) + p(Y_n = 2) H(1/3).
\end{align*}
Although $p(Y_n)$ changes with $n$, $p(Y_n)$ converges to a unique stationary distribution $\pi$ since $Y_n$ is an irreducible and aperiodic Markov chain. The standard analysis shows that $\pi = (4/7, 3/7)$. Therefore,
$$\lim_{n \to \infty} H(Z_n \mid Z_1, \ldots, Z_{n-1}) = \frac{4}{7} H\!\left(\frac{1}{4}\right) + \frac{3}{7} H\!\left(\frac{1}{3}\right) = \frac{6}{7} = H(Z).$$
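The stationary distribution and the limiting conditional entropy can be verified numerically. Here is a sketch assuming NumPy: state 1 means $Z_n$ begins a new codeword and state 2 means $Z_n$ is the second bit of a two-bit codeword, with transition probabilities read off from the code $\{00, 1, 01\}$ and $p = (1/2, 1/4, 1/4)$.

```python
import numpy as np

# From state 1 (start of a codeword) we stay in state 1 only if X = 2
# (codeword "1"), which has probability 1/4; from state 2 we always return
# to state 1.
P = np.array([[0.25, 0.75],
              [1.00, 0.00]])

# Stationary distribution: left eigenvector of P with eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()
print(pi)                                # ≈ [0.5714, 0.4286] = (4/7, 3/7)

def H(t):                                # binary entropy in bits
    return -t * np.log2(t) - (1 - t) * np.log2(1 - t)

# Given Y_n = 1, Z_n = 1 iff X = 2 (probability 1/4); given Y_n = 2, Z_n = 1
# iff the codeword is "01", i.e. X = 3, with conditional probability 1/3.
print(pi[0] * H(0.25) + pi[1] * H(1/3))  # ≈ 0.857 = 6/7
```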
6. Which of the following codes are
$C_1 = \{11, 01, 0\}$
$C_2 = \{00, 01, 100, 101, 11\}$
$C_3 = \{0, 10, 110, 1110, \ldots\}$
$C_4 = \{0, 00, 000, 0000\}$
(a) Uniquely decodable?
(b) Instantaneous?
Solution :
(a) $C_1 = \{11, 01, 0\}$ is uniquely decodable (suffix-free) but not instantaneous.
(b) $C_2 = \{00, 01, 100, 101, 11\}$ is prefix-free, hence instantaneous (and therefore uniquely decodable).
(c) $C_3 = \{0, 10, 110, 1110, \ldots\}$ is instantaneous (and therefore uniquely decodable).
(d) $C_4 = \{0, 00, 000, 0000\}$ is neither uniquely decodable nor instantaneous.
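These classifications can be checked mechanically with the prefix-free and suffix-free tests used above. The sketch below is illustrative only: $C_3$ is truncated to its first few codewords, and failing both tests does not by itself prove a code is not uniquely decodable (a complete test would use, e.g., the Sardinas-Patterson procedure).

```python
def is_prefix_free(code):
    return not any(a != b and b.startswith(a) for a in code for b in code)

codes = {
    "C1": ["11", "01", "0"],
    "C2": ["00", "01", "100", "101", "11"],
    "C3": ["0", "10", "110", "1110"],        # first few codewords only
    "C4": ["0", "00", "000", "0000"],
}

for name, cw in codes.items():
    instantaneous = is_prefix_free(cw)                     # prefix-free
    suffix_free = is_prefix_free([w[::-1] for w in cw])    # sufficient for UD
    print(name, "instantaneous:", instantaneous, "suffix-free:", suffix_free)
```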
7. Three horses run a race. A gambler offers 3-for-1 odds on each of the horses. These are fair odds under the assumption that all horses are equally likely to win the race. The true probabilities are known to be
$$p = (p_1, p_2, p_3) = \left( \frac{1}{2}, \frac{1}{4}, \frac{1}{4} \right).$$
Let $b = (b_1, b_2, b_3)$, $b_i \geq 0$, $\sum_i b_i = 1$, be the amount invested on each of the horses. The expected log wealth is thus
$$W(b) = \sum_{i=1}^{3} p_i \log 3 b_i.$$
(a) Maximize this over $b$ to find $b^*$ and $W^*$. Thus the wealth achieved in repeated horse races should grow to infinity like $2^{n W^*}$ with probability one.
(b) Show that if instead we put all of the current wealth on horse 1, the most likely winner, on each
race, we will eventually go broke with probability one.
Solution :
(a) The doubling rate is
\begin{align*}
W(b) &= \sum_i p_i \log b_i o_i = \sum_i p_i \log 3 b_i \\
     &= \sum_i p_i \log 3 + \sum_i p_i \log p_i - \sum_i p_i \log \frac{p_i}{b_i} \\
     &= \log 3 - H(p) - D(p \,\|\, b) \\
     &\leq \log 3 - H(p),
\end{align*}
with equality iff $p = b$. Hence $b^* = p = \left( \frac{1}{2}, \frac{1}{4}, \frac{1}{4} \right)$ and
$$W^* = \log 3 - H\!\left( \frac{1}{2}, \frac{1}{4}, \frac{1}{4} \right) = \frac{1}{2} \log \frac{9}{8} = 0.085.$$
When $b = b^*$, $W(b) = W^*$ and $S_n \doteq 2^{n W^*} = 2^{0.085 n} = (1.06)^n$.
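As a numerical sanity check on part (a) (a sketch assuming NumPy), the doubling rate at proportional betting evaluates to about 0.085 bits per race, and randomly drawn alternative bets never exceed it:

```python
import numpy as np

p = np.array([0.5, 0.25, 0.25])
odds = 3.0

def W(b):
    """Doubling rate W(b) = sum_i p_i log2(3 b_i)."""
    return float(np.sum(p * np.log2(odds * b)))

print(W(p))        # ≈ 0.085 = log 3 - H(1/2, 1/4, 1/4)
print(2 ** W(p))   # ≈ 1.06, the per-race growth factor

# Proportional betting should beat any other bet vector (up to floating point).
rng = np.random.default_rng(0)
best_random = max(W(rng.dirichlet(np.ones(3))) for _ in range(10_000))
print(best_random <= W(p) + 1e-12)    # True
```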
(b) If we put all the money on the first horse, then the probability that we do not go broke in $n$ races is $\left( \frac{1}{2} \right)^n$. Since this probability goes to zero with $n$, the probability of the set of outcomes where we do not ever go broke is zero, and we will go broke with probability 1.
Alternatively, if $b = (1, 0, 0)$, then $W(b) = -\infty$ and
$$S_n \doteq 2^{n W(b)} = 0 \quad \text{w.p. 1}$$
by the strong law of large numbers.