HW 1, Information Theory
(1)
Proof: Var(Y) = E[Y^2] - (E[Y])^2 = E[E[Y^2 | X]] - (E[E[Y | X]])^2
= E[Var(Y | X) + (E[Y | X])^2] - (E[E[Y | X]])^2
= E[Var(Y | X)] + E[(E[Y | X])^2] - (E[E[Y | X]])^2
= E[Var(Y | X)] + Var(E[Y | X])
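As a sanity check on this decomposition (not part of the original solution), the identity can be verified numerically on a small, arbitrarily chosen discrete joint pmf:

```python
import numpy as np

# Arbitrary joint pmf p(x, y); rows index x in {0, 1}, columns index y.
p = np.array([[0.10, 0.25, 0.05],
              [0.20, 0.10, 0.30]])
ys = np.array([0.0, 2.0, 5.0])   # values taken by Y

px = p.sum(axis=1)               # marginal pmf of X
py = p.sum(axis=0)               # marginal pmf of Y

# Left-hand side: Var(Y) computed directly from the marginal of Y.
EY = (py * ys).sum()
VarY = (py * ys**2).sum() - EY**2

# Right-hand side: E[Var(Y|X)] + Var(E[Y|X]) from the conditional pmfs p(y|x).
p_y_given_x = p / px[:, None]
E_Y_given_x = (p_y_given_x * ys).sum(axis=1)
Var_Y_given_x = (p_y_given_x * ys**2).sum(axis=1) - E_Y_given_x**2
rhs = (px * Var_Y_given_x).sum() + (px * E_Y_given_x**2).sum() - ((px * E_Y_given_x).sum())**2

print(VarY, rhs)   # the two values agree
```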
(2)
First, note that the joint pmf of (X, Y) is p(x=0, y=0) = 1/4, p(x=0, y=1) = 1/2, p(x=1, y=0) = 0, p(x=1, y=1) = 1/4, so the marginal pmfs are as follows:
p(x = 0) = 1/4 + 1/2 = 3/4, p(x = 1) = 0 + 1/4 = 1/4
p(y = 0) = 1/4 + 0 = 1/4, p(y = 1) = 1/2 + 1/4 = 3/4
(a)
H(X) = -(3/4) log2(3/4) - (1/4) log2(1/4) = (3/4) log2(4/3) + (1/4) log2 4 ≈ 0.811 bits
H(Y) = -(1/4) log2(1/4) - (3/4) log2(3/4) = (1/4) log2 4 + (3/4) log2(4/3) ≈ 0.811 bits
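These binary-entropy values are easy to confirm numerically; the small helper below is an illustrative sketch, not part of the assignment:

```python
import numpy as np

def H(probs):
    """Shannon entropy in bits, with 0 log 0 treated as 0."""
    probs = np.asarray(probs, dtype=float)
    nz = probs[probs > 0]
    return float(-(nz * np.log2(nz)).sum())

print(H([3/4, 1/4]))   # H(X) ~ 0.811 bits
print(H([1/4, 3/4]))   # H(Y) ~ 0.811 bits
```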
(b)
H(X | Y) = -Σ_{x,y} p(x, y) log2 [ p(x, y) / p(y) ]
= -[ (1/4) log2((1/4)/(1/4)) + (1/2) log2((1/2)/(3/4)) + 0 · log2(0/(1/4)) + (1/4) log2((1/4)/(3/4)) ]
= (1/2) log2(3/2) + (1/4) log2 3 ≈ 0.689 bits

H(Y | X) = -Σ_{x,y} p(x, y) log2 [ p(x, y) / p(x) ]
= -[ (1/4) log2((1/4)/(3/4)) + (1/2) log2((1/2)/(3/4)) + 0 · log2(0/(1/4)) + (1/4) log2((1/4)/(1/4)) ]
= (1/4) log2 3 + (1/2) log2(3/2) ≈ 0.689 bits

H(X, Y) = -Σ_{x,y} p(x, y) log2 p(x, y)
= -[ (1/4) log2(1/4) + (1/2) log2(1/2) + 0 · log2 0 + (1/4) log2(1/4) ] = 1.5 bits
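The same numbers can be recovered from the joint pmf via the chain-rule identities H(X|Y) = H(X,Y) - H(Y) and H(Y|X) = H(X,Y) - H(X); the snippet below is an illustrative check (the helper H and the array layout are assumptions of this sketch):

```python
import numpy as np

def H(probs):
    """Shannon entropy in bits of a pmf given as an array of probabilities."""
    probs = np.asarray(probs, dtype=float).ravel()
    nz = probs[probs > 0]
    return float(-(nz * np.log2(nz)).sum())

# Joint pmf from the problem: rows are x in {0, 1}, columns are y in {0, 1}.
pxy = np.array([[1/4, 1/2],
                [0.0, 1/4]])
px = pxy.sum(axis=1)
py = pxy.sum(axis=0)

H_XY = H(pxy)                   # 1.5 bits
H_X_given_Y = H_XY - H(py)      # ~0.689 bits
H_Y_given_X = H_XY - H(px)      # ~0.689 bits
I_XY = H(px) + H(py) - H_XY     # ~0.123 bits
print(H_XY, H_X_given_Y, H_Y_given_X, I_XY)
```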
(e)
[Venn diagram relating H(X), H(Y), H(X | Y), H(Y | X), I(X; Y), and H(X, Y).]
(3)
(a)
Recall that
H(g(X), X) = H(X) + H(g(X)) - I(X; g(X)) and
H(g(X), X) = H(X) + H(g(X) | X).
Since g(X) is a deterministic function of X, H(g(X) | X) = 0, i.e., H(X, g(X)) = H(X), and comparing the two expansions gives I(X; g(X)) = H(g(X)).
Because I(X; g(X)) = H(X) - H(X | g(X)) ≤ H(X), it follows that H(g(X)) ≤ H(X), with equality iff H(X | g(X)) = 0.
H(X | g(X)) = 0 only if g(·) is invertible, so that X can be perfectly reconstructed from g(X).
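A quick numerical illustration of H(g(X)) ≤ H(X); the uniform distribution and the choice g(x) = x mod 2 are arbitrary examples, not from the problem:

```python
import numpy as np

def H(probs):
    probs = np.asarray(probs, dtype=float)
    nz = probs[probs > 0]
    return float(-(nz * np.log2(nz)).sum())

# X uniform on {0, 1, 2, 3}; g(x) = x mod 2 is not invertible, so H(g(X)) < H(X).
px = np.full(4, 0.25)
g = np.arange(4) % 2

# pmf of g(X): merge the probabilities of all x that map to the same value.
pg = np.array([px[g == v].sum() for v in np.unique(g)])
print(H(px), H(pg))   # 2.0 bits vs 1.0 bit

# An invertible g (e.g. a permutation of {0, 1, 2, 3}) would give equality instead.
```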
(b)
Consider H(X, g(X), Y). By the chain rule for entropy,
H(X, g(X), Y) = H(X, Y) + H(g(X) | X, Y) = H(X, Y), since H(g(X) | X, Y) = 0, and
H(X, g(X), Y) = H(g(X), Y) + H(X | g(X), Y) ≥ H(g(X), Y), since H(X | g(X), Y) ≥ 0.
Therefore H(g(X), Y) ≤ H(X, Y).
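The same kind of numeric check works for H(g(X), Y) ≤ H(X, Y); the joint pmf below is an arbitrary example chosen only so that g merges distinct x-values:

```python
import numpy as np

def H(probs):
    probs = np.asarray(probs, dtype=float).ravel()
    nz = probs[probs > 0]
    return float(-(nz * np.log2(nz)).sum())

# Arbitrary joint pmf over x in {0, 1, 2, 3} (rows) and y in {0, 1} (columns).
pxy = np.array([[0.05, 0.20],
                [0.10, 0.15],
                [0.25, 0.05],
                [0.10, 0.10]])
g = np.arange(4) % 2   # g merges x-values 0,2 and 1,3

# Joint pmf of (g(X), Y): add the rows of pxy that share the same value of g.
pgy = np.array([pxy[g == v].sum(axis=0) for v in np.unique(g)])
print(H(pxy), H(pgy))   # H(X, Y) >= H(g(X), Y)
```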
(4)
According to the Information Inequality, D(p(x1, ..., xn) || p(x1) ... p(xn)) ≥ 0, with equality
iff p(x1, ..., xn) = p(x1) ... p(xn) for all possible xi values (i.e., the Xi are independent). Then,
from the definition of relative entropy,

0 ≤ D(p(x1, ..., xn) || p(x1) ... p(xn)) = Σ_{x1,...,xn} p(x1, ..., xn) log [ p(x1, ..., xn) / (p(x1) ... p(xn)) ]
= -Σ_{x1,...,xn} p(x1, ..., xn) log [p(x1) ... p(xn)] + Σ_{x1,...,xn} p(x1, ..., xn) log p(x1, ..., xn)
= -Σ_{i=1}^{n} Σ_{xi} p(xi) log p(xi) - H(X1, ..., Xn)
= Σ_{i=1}^{n} H(Xi) - H(X1, ..., Xn).

Therefore H(X1, ..., Xn) ≤ Σ_{i=1}^{n} H(Xi), with equality iff the Xi are independent.
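A numerical spot-check of this independence bound on a randomly generated three-variable pmf (the random joint and the helper H are assumptions of this sketch):

```python
import numpy as np

def H(probs):
    probs = np.asarray(probs, dtype=float).ravel()
    nz = probs[probs > 0]
    return float(-(nz * np.log2(nz)).sum())

rng = np.random.default_rng(0)

# Random joint pmf over three binary variables (X1, X2, X3).
p = rng.random((2, 2, 2))
p /= p.sum()

joint = H(p)
# Marginal of X_i: sum the joint over the other two axes.
marginals = sum(H(p.sum(axis=tuple(j for j in range(3) if j != i))) for i in range(3))
print(joint, marginals)   # joint entropy <= sum of marginal entropies
```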
(5)
Given a1, ..., an ≥ 0 and b1, ..., bn ≥ 0, define probability distributions p(x), q(x) on {1, ..., n} such that

p(x = i) = pi = ai / (Σ_{j=1}^{n} aj) and q(x = i) = qi = bi / (Σ_{j=1}^{n} bj).

By the Information Inequality, D(p || q) ≥ 0, so

0 ≤ D(p || q) = Σ_{i=1}^{n} pi log (pi / qi)
= Σ_{i=1}^{n} [ ai / (Σ_{j} aj) ] log [ (ai / Σ_{j} aj) / (bi / Σ_{j} bj) ]
= [ 1 / (Σ_{j} aj) ] Σ_{i=1}^{n} ai [ log(ai / bi) - log(Σ_{j} aj / Σ_{j} bj) ]
= [ 1 / (Σ_{j} aj) ] [ Σ_{i=1}^{n} ai log(ai / bi) - (Σ_{i=1}^{n} ai) log(Σ_{j} aj / Σ_{j} bj) ].

Multiplying through by Σ_{j} aj > 0 gives the log-sum inequality

Σ_{i=1}^{n} ai log(ai / bi) ≥ (Σ_{i=1}^{n} ai) log( Σ_{i=1}^{n} ai / Σ_{i=1}^{n} bi ),

with equality iff pi = qi for all i, i.e., iff ai / bi is constant.
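The log-sum inequality can likewise be spot-checked on arbitrary positive sequences (natural log here; the inequality is base-independent):

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary positive sequences a_i and b_i.
a = rng.random(5) + 0.1
b = rng.random(5) + 0.1

lhs = np.sum(a * np.log(a / b))
rhs = a.sum() * np.log(a.sum() / b.sum())
print(lhs >= rhs, lhs, rhs)   # sum a_i log(a_i/b_i) >= (sum a_i) log(sum a_i / sum b_i)
```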
(6)
(a)
By the chain rule for entropy,
H(X1, X2, X3) - H(X1, X2) = H(X3 | X1, X2) and
H(X1, X3) - H(X1) = H(X3 | X1),
since H(X1, X2, X3) = H(X1, X2) + H(X3 | X1, X2) and H(X1, X3) = H(X1) + H(X3 | X1).
Because I(X3; X2 | X1) = H(X3 | X1) - H(X3 | X1, X2) ≥ 0, it follows that
H(X1, X2, X3) - H(X1, X2) ≤ H(X1, X3) - H(X1).
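A quick random check that conditioning on the additional variable X2 does not increase entropy, phrased as the entropy differences above (the random pmf is an arbitrary example):

```python
import numpy as np

def H(probs):
    probs = np.asarray(probs, dtype=float).ravel()
    nz = probs[probs > 0]
    return float(-(nz * np.log2(nz)).sum())

rng = np.random.default_rng(2)

# Random joint pmf over binary (X1, X2, X3); axis order is (x1, x2, x3).
p = rng.random((2, 2, 2))
p /= p.sum()

lhs = H(p) - H(p.sum(axis=2))                   # H(X1,X2,X3) - H(X1,X2)
rhs = H(p.sum(axis=1)) - H(p.sum(axis=(1, 2)))  # H(X1,X3) - H(X1)
print(lhs <= rhs, lhs, rhs)
```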
(7)
In the proof of Fano's Inequality, it is shown that H(Pe) + H(X | E, X̂) ≥ H(X | X̂) ≥ H(X | Y), where E is the error indicator (E = 1 if X̂ ≠ X, E = 0 otherwise). Therefore, it suffices to consider H(X | E, X̂):

H(X | E, X̂) = (1 - Pe) H(X | E = 0, X̂) + Pe H(X | E = 1, X̂)
= Pe H(X | E = 1, X̂), since H(X | E = 0, X̂) = 0.

Then,

H(X | E = 1, X̂) = -Σ_x p(x | E = 1, x̂) log p(x | E = 1, x̂) ≤ log(|𝒳| - 1),

since p(x | E = 1, x̂) is a distribution on the |𝒳| - 1 values x ≠ x̂, where 𝒳 denotes the alphabet of X. Combining these bounds gives

H(Pe) + Pe log(|𝒳| - 1) ≥ H(X | Y).
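Finally, a numerical illustration of the resulting bound H(Pe) + Pe log2(|𝒳| - 1) ≥ H(X | Y), using a random joint pmf and the MAP estimator X̂(y) = argmax_x p(x | y); both choices are assumptions of this sketch, not from the problem:

```python
import numpy as np

def H(probs):
    probs = np.asarray(probs, dtype=float).ravel()
    nz = probs[probs > 0]
    return float(-(nz * np.log2(nz)).sum())

rng = np.random.default_rng(3)

# Random joint pmf over X and Y, each on a 4-letter alphabet; rows are x, columns are y.
pxy = rng.random((4, 4))
pxy /= pxy.sum()
py = pxy.sum(axis=0)

# MAP estimator X_hat(y) = argmax_x p(x|y), and its error probability Pe.
xhat = pxy.argmax(axis=0)
Pe = 1.0 - sum(pxy[xhat[y], y] for y in range(4))

H_X_given_Y = H(pxy) - H(py)
bound = H([Pe, 1.0 - Pe]) + Pe * np.log2(4 - 1)   # H(Pe) + Pe * log2(|alphabet| - 1)
print(H_X_given_Y <= bound, H_X_given_Y, bound)
```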