Eric Zhang (ez2232)

HW 1, Information Theory
(1)
Proof:
$\mathrm{Var}(Y) = \mathbb{E}[Y^2] - (\mathbb{E}[Y])^2 = \mathbb{E}\big[\mathbb{E}[Y^2 \mid X]\big] - \big(\mathbb{E}[\mathbb{E}[Y \mid X]]\big)^2$
$= \mathbb{E}\big[\mathrm{Var}(Y \mid X) + (\mathbb{E}[Y \mid X])^2\big] - \big(\mathbb{E}[\mathbb{E}[Y \mid X]]\big)^2$
$= \mathbb{E}[\mathrm{Var}(Y \mid X)] + \mathbb{E}\big[(\mathbb{E}[Y \mid X])^2\big] - \big(\mathbb{E}[\mathbb{E}[Y \mid X]]\big)^2$
$= \mathbb{E}[\mathrm{Var}(Y \mid X)] + \mathrm{Var}\big(\mathbb{E}[Y \mid X]\big)$
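The identity can also be spot-checked numerically. Below is a minimal Python sketch; the joint distribution of $(X, Y)$ is an arbitrary illustrative choice, not taken from any problem statement.

```python
import numpy as np

# Monte Carlo check of Var(Y) = E[Var(Y|X)] + Var(E[Y|X]).
# Arbitrary illustrative distribution:
# X ~ Bernoulli(0.3), and Y | X = x ~ Normal(mean = 2x, std = 1 + x).
rng = np.random.default_rng(0)
n = 1_000_000

x = rng.binomial(1, 0.3, size=n)
y = rng.normal(loc=2 * x, scale=1 + x)

lhs = y.var()  # Var(Y)

# E[Var(Y|X)] + Var(E[Y|X]), using the known conditional moments
# Var(Y | X = x) = (1 + x)^2 and E[Y | X = x] = 2x.
cond_var = (1.0 + x) ** 2
cond_mean = 2.0 * x
rhs = cond_var.mean() + cond_mean.var()

print(lhs, rhs)  # the two estimates agree up to Monte Carlo error
```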

(2)
First, one may note that given the joint pmf of $(X, Y)$, namely $p(0, 0) = 1/4$, $p(0, 1) = 1/2$, $p(1, 0) = 0$, $p(1, 1) = 1/4$, the marginal pmfs are as follows:
$p(x = 0) = 1/4 + 1/2 = 3/4$, $p(x = 1) = 0 + 1/4 = 1/4$
$p(y = 0) = 1/4 + 0 = 1/4$, $p(y = 1) = 1/2 + 1/4 = 3/4$
(a)
$H(X) = -\sum_{x} p(x) \log_2 p(x) = -\tfrac{3}{4}\log_2\tfrac{3}{4} - \tfrac{1}{4}\log_2\tfrac{1}{4} \approx 0.811$ bits
$H(Y) = -\sum_{y} p(y) \log_2 p(y) = -\tfrac{1}{4}\log_2\tfrac{1}{4} - \tfrac{3}{4}\log_2\tfrac{3}{4} \approx 0.811$ bits

(b)
$H(X \mid Y) = -\sum_{x, y} p(x, y) \log_2 p(x \mid y) = -\sum_{x, y} p(x, y) \log_2 \frac{p(x, y)}{p(y)}$
$= -\left[\tfrac{1}{4}\log_2\tfrac{1/4}{1/4} + \tfrac{1}{2}\log_2\tfrac{1/2}{3/4} + 0\log_2\tfrac{0}{1/4} + \tfrac{1}{4}\log_2\tfrac{1/4}{3/4}\right] \approx 0.689$ bits, adopting the convention that $0 \log_2 0 = 0$

$H(Y \mid X) = -\sum_{x, y} p(x, y) \log_2 p(y \mid x) = -\sum_{x, y} p(x, y) \log_2 \frac{p(x, y)}{p(x)}$
$= -\left[\tfrac{1}{4}\log_2\tfrac{1/4}{3/4} + \tfrac{1}{2}\log_2\tfrac{1/2}{3/4} + 0\log_2\tfrac{0}{1/4} + \tfrac{1}{4}\log_2\tfrac{1/4}{1/4}\right] \approx 0.689$ bits, adopting the convention that $0 \log_2 0 = 0$


(c)
$H(X, Y) = -\sum_{x, y} p(x, y) \log_2 p(x, y) = -\left[\tfrac{1}{4}\log_2\tfrac{1}{4} + \tfrac{1}{2}\log_2\tfrac{1}{2} + 0\log_2 0 + \tfrac{1}{4}\log_2\tfrac{1}{4}\right]$
$= 1.5$ bits, adopting the convention that $0 \log_2 0 = 0$

(d)
$I(X; Y) = H(X) - H(X \mid Y) \approx 0.811 \text{ bits} - 0.689 \text{ bits} = 0.122$ bits

(e)
[Venn diagram: $H(X)$ and $H(Y)$ are drawn as overlapping regions; their intersection is $I(X; Y)$, the parts of each region outside the overlap are $H(X \mid Y)$ and $H(Y \mid X)$, and their union is $H(X, Y)$.]
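As an optional cross-check of parts (a)-(d), here is a short Python sketch (standard library only) that recomputes these quantities directly from the joint pmf given above:

```python
from math import log2

# Joint pmf of (X, Y) from the problem statement: p[(x, y)].
p = {(0, 0): 1/4, (0, 1): 1/2, (1, 0): 0.0, (1, 1): 1/4}

def H(probs):
    """Entropy in bits, with the convention 0 log2 0 = 0."""
    return -sum(q * log2(q) for q in probs if q > 0)

px = [sum(v for (x, _), v in p.items() if x == i) for i in (0, 1)]
py = [sum(v for (_, y), v in p.items() if y == j) for j in (0, 1)]

H_X, H_Y, H_XY = H(px), H(py), H(p.values())
print(H_X, H_Y)          # ~0.811 bits each
print(H_XY)              # 1.5 bits
print(H_XY - H_Y)        # H(X|Y) ~0.689 bits
print(H_XY - H_X)        # H(Y|X) ~0.689 bits
print(H_X + H_Y - H_XY)  # I(X;Y) ~0.12 bits
```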

(3)
(a)
Recall that
$H(g(X), X) = H(X) + H(g(X)) - I(X; g(X))$ and
$H(g(X), X) = H(X) + H(g(X) \mid X)$
Since $H(g(X) \mid X) = 0$, it follows that
$H(g(X)) - I(X; g(X)) = 0 \;\Rightarrow\; H(g(X)) = I(X; g(X))$
Then, from the data processing inequality,
$I(X; X) \ge I(X; g(X))$
Noting that $I(X; X) = H(X)$, it follows that
$H(X) \ge H(g(X))$
Due to the chain rule for mutual information,
$I(X; X) = I(X; g(X)) + I(X; X \mid g(X))$
Therefore, equality occurs only if $I(X; X \mid g(X)) = 0$, i.e.,
$0 = I(X; X \mid g(X)) = H(X \mid g(X)) - H(X \mid X, g(X)) = H(X \mid g(X))$, i.e., $H(X \mid g(X)) = 0$
$H(X \mid g(X)) = 0$ only if $g(\cdot)$ is invertible, so that $X$ can be perfectly reconstructed from $g(X)$
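As a quick numerical illustration of the inequality (an arbitrary toy example, not from the problem statement), take $X$ uniform on $\{0, 1, 2, 3\}$ and the non-invertible map $g(x) = x \bmod 2$:

```python
from math import log2
from collections import defaultdict

# Toy illustration of H(g(X)) <= H(X) for a non-invertible g:
# X uniform on {0, 1, 2, 3}, g(x) = x % 2.
px = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}

def H(dist):
    return -sum(q * log2(q) for q in dist.values() if q > 0)

# Push the pmf of X forward through g to obtain the pmf of g(X).
pg = defaultdict(float)
for x, q in px.items():
    pg[x % 2] += q

print(H(px), H(pg))  # 2.0 bits vs 1.0 bit, so H(g(X)) < H(X)
```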

(b)
Consider H(X, g(X), Y). By the chain rule for entropy,

$H(X, g(X), Y) = H(g(X), Y) + H(X \mid g(X), Y)$
$H(X, g(X), Y) = H(X, Y) + H(g(X) \mid X, Y)$
Since $H(g(X) \mid X, Y) = 0$, it follows that
$H(g(X), Y) + H(X \mid g(X), Y) = H(X, Y)$
$\Rightarrow H(g(X), Y) \le H(X, Y)$, since $H(X \mid g(X), Y) \ge 0$
Equality occurs only if $H(X \mid g(X), Y) = 0$, that is, $X$ can always be perfectly reconstructed given $g(X)$ and $Y$
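In the same toy-example spirit, a short Python check of $H(g(X), Y) \le H(X, Y)$ that reuses the joint pmf from problem (2) together with the (non-invertible) constant map $g(x) = 0$:

```python
from math import log2
from collections import defaultdict

# Check H(g(X), Y) <= H(X, Y) on the problem-(2) joint pmf with the
# non-invertible constant map g(x) = 0.
pxy = {(0, 0): 1/4, (0, 1): 1/2, (1, 0): 0.0, (1, 1): 1/4}

def H(probs):
    return -sum(q * log2(q) for q in probs if q > 0)

pgy = defaultdict(float)
for (x, y), q in pxy.items():
    pgy[(0, y)] += q  # g(x) = 0 for every x

print(H(pxy.values()), H(pgy.values()))  # 1.5 bits vs ~0.811 bits
```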

(4)
According to the Information Inequality, $D\big(p(x_1, \ldots, x_n) \,\|\, p(x_1)\cdots p(x_n)\big) \ge 0$, with equality
iff $p(x_1, \ldots, x_n) = p(x_1)\cdots p(x_n)$ for all possible $x_i$ values (i.e., the $X_i$ are independent). Then,
from the definition of relative entropy,
$0 \le D\big(p(x_1, \ldots, x_n) \,\|\, p(x_1)\cdots p(x_n)\big) = \sum_{x_1, \ldots, x_n} p(x_1, \ldots, x_n) \log \frac{p(x_1, \ldots, x_n)}{p(x_1)\cdots p(x_n)}$
$= -\sum_{x_1, \ldots, x_n} p(x_1, \ldots, x_n) \log\big(p(x_1)\cdots p(x_n)\big) + \sum_{x_1, \ldots, x_n} p(x_1, \ldots, x_n) \log p(x_1, \ldots, x_n)$
$= -\sum_{x_1, \ldots, x_n} p(x_1, \ldots, x_n) \sum_{i=1}^{n} \log p(x_i) - H(X_1, \ldots, X_n)$
$= -\sum_{i=1}^{n} \sum_{x_1, \ldots, x_n} p(x_1, \ldots, x_n) \log p(x_i) - H(X_1, \ldots, X_n)$
$= -\sum_{i=1}^{n} \sum_{x_i} p(x_i) \log p(x_i) - H(X_1, \ldots, X_n) = \sum_{i=1}^{n} H(X_i) - H(X_1, \ldots, X_n)$
$\Rightarrow \; H(X_1, \ldots, X_n) \le \sum_{i=1}^{n} H(X_i)$
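A small numerical illustration of this bound, using a randomly generated joint pmf over three binary variables (NumPy assumed; the distribution is arbitrary):

```python
import numpy as np

# Check H(X1, X2, X3) <= H(X1) + H(X2) + H(X3) on a random joint pmf
# over {0, 1}^3.
rng = np.random.default_rng(1)
p = rng.random((2, 2, 2))
p /= p.sum()

def H(q):
    q = q[q > 0]
    return -(q * np.log2(q)).sum()

H_joint = H(p)
H_marginals = sum(
    H(p.sum(axis=tuple(j for j in range(3) if j != i))) for i in range(3)
)
print(H_joint, H_marginals, H_joint <= H_marginals)
```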

(5)
Given $a_1, \ldots, a_n \ge 0$ and $b_1, \ldots, b_n \ge 0$, define probability distributions $p(x)$, $q(x)$ on $\{1, \ldots, n\}$ such that
$p(x = i) = p_i = \dfrac{a_i}{\sum_{j=1}^{n} a_j}$ and $q(x = i) = q_i = \dfrac{b_i}{\sum_{j=1}^{n} b_j}$ for $i \in \{1, 2, \ldots, n\}$.

Then, according to the Information Inequality,
$0 \le D(p \,\|\, q) = \sum_{i=1}^{n} p_i \log \frac{p_i}{q_i} = \sum_{i=1}^{n} (p_i \log p_i - p_i \log q_i)$
$= \sum_{i=1}^{n} \frac{a_i}{\sum_j a_j}\Big(\log a_i - \log \textstyle\sum_j a_j\Big) - \sum_{i=1}^{n} \frac{a_i}{\sum_j a_j}\Big(\log b_i - \log \textstyle\sum_j b_j\Big)$
$= \frac{1}{\sum_j a_j}\left[\sum_{i=1}^{n} a_i \log \frac{a_i}{b_i} - \Big(\sum_{i=1}^{n} a_i\Big) \log \frac{\sum_j a_j}{\sum_j b_j}\right]$
$\Rightarrow \; \sum_{i=1}^{n} a_i \log \frac{a_i}{b_i} \ge \Big(\sum_{i=1}^{n} a_i\Big) \log \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i}$
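A quick numerical spot-check of this log-sum inequality for randomly drawn positive $a_i$, $b_i$ (NumPy assumed):

```python
import numpy as np

# Spot-check of the log-sum inequality
#   sum_i a_i log(a_i / b_i) >= (sum_i a_i) log(sum_i a_i / sum_i b_i)
# for randomly drawn positive a_i, b_i.
rng = np.random.default_rng(2)
a = rng.random(10) + 0.01  # strictly positive
b = rng.random(10) + 0.01

lhs = np.sum(a * np.log(a / b))
rhs = np.sum(a) * np.log(np.sum(a) / np.sum(b))
print(lhs, rhs, lhs >= rhs)
```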

(6)
(a)
$H(X_1, X_2, X_3) - H(X_1, X_2) = H(X_3 \mid X_1, X_2)$
From the chain rule for entropy,
$H(X_1, X_2, X_3) = H(X_1) + H(X_2 \mid X_1) + H(X_3 \mid X_1, X_2) = H(X_1, X_2) + H(X_3 \mid X_1, X_2)$ and
$H(X_1, X_3) = H(X_1) + H(X_3 \mid X_1)$
Therefore, $H(X_1, X_2, X_3) - H(X_1, X_2) = H(X_3 \mid X_1, X_2)$ and $H(X_1, X_3) - H(X_1) = H(X_3 \mid X_1)$.
Then, since $0 \le I(X_3; X_2 \mid X_1) = H(X_3 \mid X_1) - H(X_3 \mid X_1, X_2)$, it follows that
$H(X_1, X_2, X_3) - H(X_1, X_2) \le H(X_1, X_3) - H(X_1)$


(b)
By applying the chain rule for mutual information,
$I(X_1; X_3) + I(X_2; X_3 \mid X_1) = I(X_1, X_2; X_3) = I(X_2; X_3) + I(X_1; X_3 \mid X_2)$
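Both parts can be sanity-checked numerically on a randomly generated joint pmf for $(X_1, X_2, X_3)$; a minimal sketch with binary alphabets (NumPy assumed):

```python
import numpy as np

# Numerical sanity check of (a) and (b) on a random joint pmf of
# (X1, X2, X3) over {0, 1}^3.
rng = np.random.default_rng(3)
p = rng.random((2, 2, 2))
p /= p.sum()

def H(q):
    q = q[q > 0]
    return -(q * np.log2(q)).sum()

def Hm(axes):
    """Entropy of the marginal pmf of the variables indexed by `axes`."""
    drop = tuple(i for i in range(3) if i not in axes)
    return H(p.sum(axis=drop) if drop else p)

# (a): H(X1,X2,X3) - H(X1,X2) <= H(X1,X3) - H(X1)
print(Hm((0, 1, 2)) - Hm((0, 1)) <= Hm((0, 2)) - Hm((0,)))

# (b): I(X1;X3) + I(X2;X3|X1) equals I(X2;X3) + I(X1;X3|X2)
I_13 = Hm((0,)) + Hm((2,)) - Hm((0, 2))
I_23_g1 = Hm((0, 1)) + Hm((0, 2)) - Hm((0,)) - Hm((0, 1, 2))
I_23 = Hm((1,)) + Hm((2,)) - Hm((1, 2))
I_13_g2 = Hm((0, 1)) + Hm((1, 2)) - Hm((1,)) - Hm((0, 1, 2))
print(np.isclose(I_13 + I_23_g1, I_23 + I_13_g2))
```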

(7)
In the proof of Fano's Inequality, it is shown that $H(P_e) + H(X \mid E, \hat{X}) \ge H(X \mid Y)$, where $E$ is the indicator of the error event $\{\hat{X} \ne X\}$.
Therefore, it suffices to bound $H(X \mid E, \hat{X})$. From the definition of conditional entropy,
$H(X \mid E, \hat{X}) = (1 - P_e)\, H(X \mid E = 0, \hat{X}) + P_e\, H(X \mid E = 1, \hat{X}) = P_e\, H(X \mid E = 1, \hat{X})$, since $H(X \mid E = 0, \hat{X}) = 0$.
Then,
$H(X \mid E = 1, \hat{X}) = -\sum_{\hat{x}} p(\hat{x} \mid E = 1) \sum_{x} p(x \mid E = 1, \hat{x}) \log p(x \mid E = 1, \hat{x})$
Given that $\hat{X}$ takes values in the same alphabet $\mathcal{X}$ as $X$, and $X \ne \hat{X}$ when $E = 1$, each $p(\,\cdot \mid E = 1, \hat{x})$ is a distribution on the set $\mathcal{X} \setminus \{\hat{x}\}$, a set with $|\mathcal{X}| - 1$ elements. The maximum entropy distribution on $\mathcal{X} \setminus \{\hat{x}\}$ is the uniform distribution on $\mathcal{X} \setminus \{\hat{x}\}$, with entropy $\log(|\mathcal{X}| - 1)$. Therefore, $H(X \mid E = 1, \hat{X}) \le \log(|\mathcal{X}| - 1)$, and
$H(X \mid Y) \le H(P_e) + H(X \mid E, \hat{X}) = H(P_e) + P_e\, H(X \mid E = 1, \hat{X}) \le H(P_e) + P_e \log(|\mathcal{X}| - 1)$
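A numerical spot-check of the resulting bound, using an arbitrary random joint pmf on a 4-symbol alphabet and the MAP estimator (NumPy assumed):

```python
import numpy as np

# Spot-check of Fano's inequality H(X|Y) <= H(Pe) + Pe*log2(|X|-1)
# for a random joint pmf on a 4-symbol alphabet, using the MAP
# estimator Xhat(y) = argmax_x p(x, y).
rng = np.random.default_rng(4)
p = rng.random((4, 4))  # p[x, y]
p /= p.sum()

def H(q):
    q = q[q > 0]
    return -(q * np.log2(q)).sum()

H_X_given_Y = H(p) - H(p.sum(axis=0))            # H(X, Y) - H(Y)

xhat = p.argmax(axis=0)                          # MAP estimate for each y
Pe = 1.0 - sum(p[xhat[y], y] for y in range(4))  # error probability
bound = H(np.array([Pe, 1.0 - Pe])) + Pe * np.log2(4 - 1)

print(H_X_given_Y, bound, H_X_given_Y <= bound)  # Fano's bound holds
```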
