Eric Zhang (ez2232)

HW 1, Information Theory
(1)
Proof:
$\mathrm{Var}(Y) = \mathbb{E}[Y^2] - (\mathbb{E}[Y])^2 = \mathbb{E}\big[\mathbb{E}[Y^2 \mid X]\big] - \big(\mathbb{E}[\mathbb{E}[Y \mid X]]\big)^2$
$= \mathbb{E}\big[\mathrm{Var}(Y \mid X) + (\mathbb{E}[Y \mid X])^2\big] - \big(\mathbb{E}[\mathbb{E}[Y \mid X]]\big)^2$
$= \mathbb{E}[\mathrm{Var}(Y \mid X)] + \mathbb{E}\big[(\mathbb{E}[Y \mid X])^2\big] - \big(\mathbb{E}[\mathbb{E}[Y \mid X]]\big)^2$
$= \mathbb{E}[\mathrm{Var}(Y \mid X)] + \mathrm{Var}\big(\mathbb{E}[Y \mid X]\big)$
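The identity can also be spot-checked numerically. Below is a minimal Python sketch; the joint distribution of $(X, Y)$ is an arbitrary illustrative choice, not taken from any problem statement.

```python
import numpy as np

# Monte Carlo check of Var(Y) = E[Var(Y|X)] + Var(E[Y|X]).
# Arbitrary illustrative distribution:
# X ~ Bernoulli(0.3), and Y | X = x ~ Normal(mean = 2x, std = 1 + x).
rng = np.random.default_rng(0)
n = 1_000_000

x = rng.binomial(1, 0.3, size=n)
y = rng.normal(loc=2 * x, scale=1 + x)

lhs = y.var()  # Var(Y)

# E[Var(Y|X)] + Var(E[Y|X]), using the known conditional moments
# Var(Y | X = x) = (1 + x)^2 and E[Y | X = x] = 2x.
cond_var = (1.0 + x) ** 2
cond_mean = 2.0 * x
rhs = cond_var.mean() + cond_mean.var()

print(lhs, rhs)  # the two estimates agree up to Monte Carlo error
```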

(2)
First, one may note that given the joint pmf of $(X, Y)$, namely $p(0, 0) = 1/4$, $p(0, 1) = 1/2$, $p(1, 0) = 0$, $p(1, 1) = 1/4$, the marginal pmfs are as follows:
$p(x = 0) = 1/4 + 1/2 = 3/4$, $p(x = 1) = 0 + 1/4 = 1/4$
$p(y = 0) = 1/4 + 0 = 1/4$, $p(y = 1) = 1/2 + 1/4 = 3/4$
(a)
$H(X) = -\sum_{x} p(x) \log_2 p(x) = -\tfrac{3}{4}\log_2\tfrac{3}{4} - \tfrac{1}{4}\log_2\tfrac{1}{4} \approx 0.811$ bits
$H(Y) = -\sum_{y} p(y) \log_2 p(y) = -\tfrac{1}{4}\log_2\tfrac{1}{4} - \tfrac{3}{4}\log_2\tfrac{3}{4} \approx 0.811$ bits

(b)
$H(X \mid Y) = -\sum_{x, y} p(x, y) \log_2 p(x \mid y) = -\sum_{x, y} p(x, y) \log_2 \frac{p(x, y)}{p(y)}$
$= -\left[\tfrac{1}{4}\log_2\tfrac{1/4}{1/4} + \tfrac{1}{2}\log_2\tfrac{1/2}{3/4} + 0\log_2\tfrac{0}{1/4} + \tfrac{1}{4}\log_2\tfrac{1/4}{3/4}\right] \approx 0.689$ bits, adopting the convention that $0 \log_2 0 = 0$

$H(Y \mid X) = -\sum_{x, y} p(x, y) \log_2 p(y \mid x) = -\sum_{x, y} p(x, y) \log_2 \frac{p(x, y)}{p(x)}$
$= -\left[\tfrac{1}{4}\log_2\tfrac{1/4}{3/4} + \tfrac{1}{2}\log_2\tfrac{1/2}{3/4} + 0\log_2\tfrac{0}{1/4} + \tfrac{1}{4}\log_2\tfrac{1/4}{1/4}\right] \approx 0.689$ bits, adopting the convention that $0 \log_2 0 = 0$


(c)
$H(X, Y) = -\sum_{x, y} p(x, y) \log_2 p(x, y) = -\left[\tfrac{1}{4}\log_2\tfrac{1}{4} + \tfrac{1}{2}\log_2\tfrac{1}{2} + 0\log_2 0 + \tfrac{1}{4}\log_2\tfrac{1}{4}\right]$
$= 1.5$ bits, adopting the convention that $0 \log_2 0 = 0$

(d)
$I(X; Y) = H(X) - H(X \mid Y) \approx 0.811 \text{ bits} - 0.689 \text{ bits} = 0.122$ bits

(e)
[Venn diagram: $H(X)$ and $H(Y)$ are drawn as overlapping regions; their intersection is $I(X; Y)$, the parts of each region outside the overlap are $H(X \mid Y)$ and $H(Y \mid X)$, and their union is $H(X, Y)$.]
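As an optional cross-check of parts (a)-(d), here is a short Python sketch (standard library only) that recomputes these quantities directly from the joint pmf given above:

```python
from math import log2

# Joint pmf of (X, Y) from the problem statement: p[(x, y)].
p = {(0, 0): 1/4, (0, 1): 1/2, (1, 0): 0.0, (1, 1): 1/4}

def H(probs):
    """Entropy in bits, with the convention 0 log2 0 = 0."""
    return -sum(q * log2(q) for q in probs if q > 0)

px = [sum(v for (x, _), v in p.items() if x == i) for i in (0, 1)]
py = [sum(v for (_, y), v in p.items() if y == j) for j in (0, 1)]

H_X, H_Y, H_XY = H(px), H(py), H(p.values())
print(H_X, H_Y)          # ~0.811 bits each
print(H_XY)              # 1.5 bits
print(H_XY - H_Y)        # H(X|Y) ~0.689 bits
print(H_XY - H_X)        # H(Y|X) ~0.689 bits
print(H_X + H_Y - H_XY)  # I(X;Y) ~0.12 bits
```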

(3)
(a)
Recall that
$H(g(X), X) = H(X) + H(g(X)) - I(X; g(X))$ and
$H(g(X), X) = H(X) + H(g(X) \mid X)$
Since $H(g(X) \mid X) = 0$, it follows that
$H(g(X)) - I(X; g(X)) = 0 \;\Rightarrow\; H(g(X)) = I(X; g(X))$
Then, from the data processing inequality,
$I(X; X) \ge I(X; g(X))$
Noting that $I(X; X) = H(X)$, it follows that
$H(X) \ge H(g(X))$
Due to the chain rule for mutual information,
$I(X; X) = I(X; g(X)) + I(X; X \mid g(X))$
Therefore, equality occurs only if $I(X; X \mid g(X)) = 0$, i.e.,
$0 = I(X; X \mid g(X)) = H(X \mid g(X)) - H(X \mid X, g(X)) = H(X \mid g(X))$, i.e., $H(X \mid g(X)) = 0$
$H(X \mid g(X)) = 0$ only if $g(\cdot)$ is invertible, so that $X$ can be perfectly reconstructed from $g(X)$
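As a quick numerical illustration of the inequality (an arbitrary toy example, not from the problem statement), take $X$ uniform on $\{0, 1, 2, 3\}$ and the non-invertible map $g(x) = x \bmod 2$:

```python
from math import log2
from collections import defaultdict

# Toy illustration of H(g(X)) <= H(X) for a non-invertible g:
# X uniform on {0, 1, 2, 3}, g(x) = x % 2.
px = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}

def H(dist):
    return -sum(q * log2(q) for q in dist.values() if q > 0)

# Push the pmf of X forward through g to obtain the pmf of g(X).
pg = defaultdict(float)
for x, q in px.items():
    pg[x % 2] += q

print(H(px), H(pg))  # 2.0 bits vs 1.0 bit, so H(g(X)) < H(X)
```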

(b)
Consider H(X, g(X), Y). By the chain rule for entropy,

$H(X, g(X), Y) = H(g(X), Y) + H(X \mid g(X), Y)$
$H(X, g(X), Y) = H(X, Y) + H(g(X) \mid X, Y)$
Since $H(g(X) \mid X, Y) = 0$, it follows that
$H(g(X), Y) + H(X \mid g(X), Y) = H(X, Y)$
$\Rightarrow H(g(X), Y) \le H(X, Y)$, since $H(X \mid g(X), Y) \ge 0$
Equality occurs only if $H(X \mid g(X), Y) = 0$, that is, $X$ can always be perfectly reconstructed given $g(X)$ and $Y$
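In the same toy-example spirit, a short Python check of $H(g(X), Y) \le H(X, Y)$ that reuses the joint pmf from problem (2) together with the (non-invertible) constant map $g(x) = 0$:

```python
from math import log2
from collections import defaultdict

# Check H(g(X), Y) <= H(X, Y) on the problem-(2) joint pmf with the
# non-invertible constant map g(x) = 0.
pxy = {(0, 0): 1/4, (0, 1): 1/2, (1, 0): 0.0, (1, 1): 1/4}

def H(probs):
    return -sum(q * log2(q) for q in probs if q > 0)

pgy = defaultdict(float)
for (x, y), q in pxy.items():
    pgy[(0, y)] += q  # g(x) = 0 for every x

print(H(pxy.values()), H(pgy.values()))  # 1.5 bits vs ~0.811 bits
```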

(4)
According to the Information Inequality, $D\big(p(x_1, \ldots, x_n) \,\|\, p(x_1)\cdots p(x_n)\big) \ge 0$, with equality
iff $p(x_1, \ldots, x_n) = p(x_1)\cdots p(x_n)$ for all possible $x_i$ values (i.e., the $X_i$ are independent). Then,
from the definition of relative entropy,
$0 \le D\big(p(x_1, \ldots, x_n) \,\|\, p(x_1)\cdots p(x_n)\big) = \sum_{x_1, \ldots, x_n} p(x_1, \ldots, x_n) \log \frac{p(x_1, \ldots, x_n)}{p(x_1)\cdots p(x_n)}$
$= -\sum_{x_1, \ldots, x_n} p(x_1, \ldots, x_n) \log\big(p(x_1)\cdots p(x_n)\big) + \sum_{x_1, \ldots, x_n} p(x_1, \ldots, x_n) \log p(x_1, \ldots, x_n)$
$= -\sum_{x_1, \ldots, x_n} p(x_1, \ldots, x_n) \sum_{i=1}^{n} \log p(x_i) - H(X_1, \ldots, X_n)$
$= -\sum_{i=1}^{n} \sum_{x_1, \ldots, x_n} p(x_1, \ldots, x_n) \log p(x_i) - H(X_1, \ldots, X_n)$
$= -\sum_{i=1}^{n} \sum_{x_i} p(x_i) \log p(x_i) - H(X_1, \ldots, X_n) = \sum_{i=1}^{n} H(X_i) - H(X_1, \ldots, X_n)$
$\Rightarrow \; H(X_1, \ldots, X_n) \le \sum_{i=1}^{n} H(X_i)$
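A small numerical illustration of this bound, using a randomly generated joint pmf over three binary variables (NumPy assumed; the distribution is arbitrary):

```python
import numpy as np

# Check H(X1, X2, X3) <= H(X1) + H(X2) + H(X3) on a random joint pmf
# over {0, 1}^3.
rng = np.random.default_rng(1)
p = rng.random((2, 2, 2))
p /= p.sum()

def H(q):
    q = q[q > 0]
    return -(q * np.log2(q)).sum()

H_joint = H(p)
H_marginals = sum(
    H(p.sum(axis=tuple(j for j in range(3) if j != i))) for i in range(3)
)
print(H_joint, H_marginals, H_joint <= H_marginals)
```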

(5)
Given $a_1, \ldots, a_n \ge 0$ and $b_1, \ldots, b_n \ge 0$, define probability distributions $p(x)$, $q(x)$ on $\{1, \ldots, n\}$ such that
$p(x = i) = p_i = \dfrac{a_i}{\sum_{j=1}^{n} a_j}$ and $q(x = i) = q_i = \dfrac{b_i}{\sum_{j=1}^{n} b_j}$ for $i \in \{1, 2, \ldots, n\}$.

Then, according to the Information Inequality,
$0 \le D(p \,\|\, q) = \sum_{i=1}^{n} p_i \log \frac{p_i}{q_i} = \sum_{i=1}^{n} (p_i \log p_i - p_i \log q_i)$
$= \sum_{i=1}^{n} \frac{a_i}{\sum_j a_j}\Big(\log a_i - \log \textstyle\sum_j a_j\Big) - \sum_{i=1}^{n} \frac{a_i}{\sum_j a_j}\Big(\log b_i - \log \textstyle\sum_j b_j\Big)$
$= \frac{1}{\sum_j a_j}\left[\sum_{i=1}^{n} a_i \log \frac{a_i}{b_i} - \Big(\sum_{i=1}^{n} a_i\Big) \log \frac{\sum_j a_j}{\sum_j b_j}\right]$
$\Rightarrow \; \sum_{i=1}^{n} a_i \log \frac{a_i}{b_i} \ge \Big(\sum_{i=1}^{n} a_i\Big) \log \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i}$
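A quick numerical spot-check of this log-sum inequality for randomly drawn positive $a_i$, $b_i$ (NumPy assumed):

```python
import numpy as np

# Spot-check of the log-sum inequality
#   sum_i a_i log(a_i / b_i) >= (sum_i a_i) log(sum_i a_i / sum_i b_i)
# for randomly drawn positive a_i, b_i.
rng = np.random.default_rng(2)
a = rng.random(10) + 0.01  # strictly positive
b = rng.random(10) + 0.01

lhs = np.sum(a * np.log(a / b))
rhs = np.sum(a) * np.log(np.sum(a) / np.sum(b))
print(lhs, rhs, lhs >= rhs)
```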

(6)
(a)
$H(X_1, X_2, X_3) - H(X_1, X_2) = H(X_3 \mid X_1, X_2)$
From the chain rule for entropy,
$H(X_1, X_2, X_3) = H(X_1) + H(X_2 \mid X_1) + H(X_3 \mid X_1, X_2) = H(X_1, X_2) + H(X_3 \mid X_1, X_2)$ and
$H(X_1, X_3) = H(X_1) + H(X_3 \mid X_1)$
Therefore, $H(X_1, X_2, X_3) - H(X_1, X_2) = H(X_3 \mid X_1, X_2)$ and $H(X_1, X_3) - H(X_1) = H(X_3 \mid X_1)$.
Then, since $0 \le I(X_3; X_2 \mid X_1) = H(X_3 \mid X_1) - H(X_3 \mid X_1, X_2)$, it follows that
$H(X_1, X_2, X_3) - H(X_1, X_2) \le H(X_1, X_3) - H(X_1)$


(b)
By applying the chain rule for mutual information,
$I(X_1; X_3) + I(X_2; X_3 \mid X_1) = I(X_1, X_2; X_3) = I(X_2; X_3) + I(X_1; X_3 \mid X_2)$
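Both parts can be sanity-checked numerically on a randomly generated joint pmf for $(X_1, X_2, X_3)$; a minimal sketch with binary alphabets (NumPy assumed):

```python
import numpy as np

# Numerical sanity check of (a) and (b) on a random joint pmf of
# (X1, X2, X3) over {0, 1}^3.
rng = np.random.default_rng(3)
p = rng.random((2, 2, 2))
p /= p.sum()

def H(q):
    q = q[q > 0]
    return -(q * np.log2(q)).sum()

def Hm(axes):
    """Entropy of the marginal pmf of the variables indexed by `axes`."""
    drop = tuple(i for i in range(3) if i not in axes)
    return H(p.sum(axis=drop) if drop else p)

# (a): H(X1,X2,X3) - H(X1,X2) <= H(X1,X3) - H(X1)
print(Hm((0, 1, 2)) - Hm((0, 1)) <= Hm((0, 2)) - Hm((0,)))

# (b): I(X1;X3) + I(X2;X3|X1) equals I(X2;X3) + I(X1;X3|X2)
I_13 = Hm((0,)) + Hm((2,)) - Hm((0, 2))
I_23_g1 = Hm((0, 1)) + Hm((0, 2)) - Hm((0,)) - Hm((0, 1, 2))
I_23 = Hm((1,)) + Hm((2,)) - Hm((1, 2))
I_13_g2 = Hm((0, 1)) + Hm((1, 2)) - Hm((1,)) - Hm((0, 1, 2))
print(np.isclose(I_13 + I_23_g1, I_23 + I_13_g2))
```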

(7)
In the proof of Fano's Inequality, it is shown that $H(P_e) + H(X \mid E, \hat{X}) \ge H(X \mid Y)$, where $E$ is the indicator of the error event $\{\hat{X} \ne X\}$.
Therefore, it suffices to bound $H(X \mid E, \hat{X})$. From the definition of conditional entropy,
$H(X \mid E, \hat{X}) = (1 - P_e)\, H(X \mid E = 0, \hat{X}) + P_e\, H(X \mid E = 1, \hat{X}) = P_e\, H(X \mid E = 1, \hat{X})$, since $H(X \mid E = 0, \hat{X}) = 0$.
Then,
$H(X \mid E = 1, \hat{X}) = -\sum_{\hat{x}} p(\hat{x} \mid E = 1) \sum_{x} p(x \mid E = 1, \hat{x}) \log p(x \mid E = 1, \hat{x})$
Given that $\hat{X}$ takes values in the same alphabet $\mathcal{X}$ as $X$, and $X \ne \hat{X}$ when $E = 1$, each $p(\,\cdot \mid E = 1, \hat{x})$ is a distribution on the set $\mathcal{X} \setminus \{\hat{x}\}$, a set with $|\mathcal{X}| - 1$ elements. The maximum entropy distribution on $\mathcal{X} \setminus \{\hat{x}\}$ is the uniform distribution on $\mathcal{X} \setminus \{\hat{x}\}$, with entropy $\log(|\mathcal{X}| - 1)$. Therefore, $H(X \mid E = 1, \hat{X}) \le \log(|\mathcal{X}| - 1)$, and
$H(X \mid Y) \le H(P_e) + H(X \mid E, \hat{X}) = H(P_e) + P_e\, H(X \mid E = 1, \hat{X}) \le H(P_e) + P_e \log(|\mathcal{X}| - 1)$
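A numerical spot-check of the resulting bound, using an arbitrary random joint pmf on a 4-symbol alphabet and the MAP estimator (NumPy assumed):

```python
import numpy as np

# Spot-check of Fano's inequality H(X|Y) <= H(Pe) + Pe*log2(|X|-1)
# for a random joint pmf on a 4-symbol alphabet, using the MAP
# estimator Xhat(y) = argmax_x p(x, y).
rng = np.random.default_rng(4)
p = rng.random((4, 4))  # p[x, y]
p /= p.sum()

def H(q):
    q = q[q > 0]
    return -(q * np.log2(q)).sum()

H_X_given_Y = H(p) - H(p.sum(axis=0))            # H(X, Y) - H(Y)

xhat = p.argmax(axis=0)                          # MAP estimate for each y
Pe = 1.0 - sum(p[xhat[y], y] for y in range(4))  # error probability
bound = H(np.array([Pe, 1.0 - Pe])) + Pe * np.log2(4 - 1)

print(H_X_given_Y, bound, H_X_given_Y <= bound)  # Fano's bound holds
```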
