
Chapter 7

Channel Capacity
Peng-Hua Wang
Graduate Inst. of Comm. Engineering
National Taipei University

Chapter Outline
Chap. 7 Channel Capacity
7.1 Examples of Channel Capacity
7.2 Symmetric Channels
7.3 Properties of Channel Capacity
7.4 Preview of the Channel Coding Theorem
7.5 Definitions
7.6 Jointly Typical Sequences
7.7 Channel Coding Theorem
7.8 Zero-Error Codes
7.9 Fano's Inequality and the Converse to the Coding Theorem


7.1 Examples of Channel Capacity


Channel Model

Operational channel capacity: the number of bits needed to index the maximum number of distinguishable signals for n uses of a communication channel.

If we can send M distinguishable signals without error in n uses of the channel, the operational capacity is (log M)/n bits per transmission.

Information channel capacity: the maximum mutual information between the channel input and output.

The operational channel capacity is equal to the information channel capacity.

This equality is a fundamental theorem and a central success of information theory.


Channel capacity
Definition 1 (Discrete Channel) A system consisting of an input alphabet X, an output alphabet Y, and a probability transition matrix p(y|x).

Definition 2 (Channel capacity) The information channel capacity of a discrete memoryless channel is

C = max_{p(x)} I(X; Y),

where the maximum is taken over all possible input distributions p(x).

Operational definition of channel capacity: the highest rate in bits per channel use at which information can be sent with arbitrarily low probability of error.

Shannon's second theorem: the information channel capacity is equal to the operational channel capacity.
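To make the definition concrete, here is a minimal numerical sketch (added for illustration, not part of the original slides): it evaluates I(X; Y) for a given transition matrix and brute-forces the maximum over binary input distributions. The channel matrix and grid size are assumed, illustrative values; for larger input alphabets the standard tool is the Blahut-Arimoto algorithm, which these slides do not cover.

```python
import numpy as np

def mutual_information(p_x, P):
    """I(X;Y) in bits; p_x is the input pmf, P[x, y] = p(y|x) is the transition matrix."""
    p_xy = p_x[:, None] * P            # joint pmf p(x, y)
    p_y = p_xy.sum(axis=0)             # output pmf p(y)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x[:, None] * p_y)[mask])))

def capacity_binary_input(P, grid=1001):
    """Brute-force C = max_{p(x)} I(X;Y) over binary input distributions."""
    return max(mutual_information(np.array([a, 1.0 - a]), P)
               for a in np.linspace(0.0, 1.0, grid))

# Example: a binary symmetric channel with crossover probability 0.1 (assumed value).
P_bsc = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
print(capacity_binary_input(P_bsc))    # about 0.531 bits per use, i.e. 1 - H(0.1)
```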


Example 1

Noiseless binary channel

p(Y=0) = p(X=0) = π₀,   p(Y=1) = p(X=1) = π₁ = 1 − π₀

I(X;Y) = H(Y) − H(Y|X) = H(Y) ≤ 1

C = max I(X;Y) = 1 bit, achieved when π₀ = π₁ = 1/2.


Example 2

Noisy channel with non-overlapping outputs

p(X=0) = π₀,   p(X=1) = π₁ = 1 − π₀
p(Y=1) = π₀ p,   p(Y=2) = π₀ (1 − p),   p = 1/2
p(Y=3) = π₁ q,   p(Y=4) = π₁ (1 − q),   q = 1/3

I(X;Y) = H(Y) − H(Y|X) = H(Y) − π₀ H(p) − π₁ H(q)

Since the outputs of the two inputs do not overlap, X can always be determined from Y, and C = max I(X;Y) = 1 bit.

Noisy Typewriter

[Figure: noisy typewriter channel — each input letter is received either as itself or as the next letter, each with probability 1/2]

Noisy Typewriter

I(X;Y) = H(Y) − H(Y|X)
       = H(Y) − Σ_x p(x) H(Y|X=x)
       = H(Y) − Σ_x p(x) H(1/2)
       = H(Y) − 1
       ≤ log 26 − 1 = log 13

C = max I(X;Y) = log 13
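As a quick numeric check (a sketch added here, not in the original slides), the standard noisy-typewriter model — each of the 26 letters is received as itself or as the next letter, with probability 1/2 each, consistent with the H(1/2) term above — gives I(X;Y) = log 13 at the uniform input distribution:

```python
import numpy as np

A = 26
P = np.zeros((A, A))                 # P[x, y] = p(y|x)
for x in range(A):
    P[x, x] = 0.5                    # letter received unchanged
    P[x, (x + 1) % A] = 0.5          # or received as the next letter (cyclically)

p_x = np.full(A, 1.0 / A)            # uniform input distribution
p_xy = p_x[:, None] * P
p_y = p_xy.sum(axis=0)
mask = p_xy > 0
I = np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x[:, None] * p_y)[mask]))
print(I, np.log2(13))                # both about 3.7004 bits
```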


Binary Symmetric Channel (BSC)

[Figure: binary symmetric channel — each input bit is flipped with crossover probability p]



Binary Symmetric Channel (BSC)

I(X;Y) = H(Y) − H(Y|X)
       = H(Y) − Σ_x p(x) H(Y|X=x)
       = H(Y) − Σ_x p(x) H(p)
       = H(Y) − H(p)
       ≤ 1 − H(p)

C = max I(X;Y) = 1 − H(p)
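A few sample values of C = 1 − H(p) (a small sketch, not from the slides); note that p = 1/2 gives zero capacity, since the output is then independent of the input:

```python
import numpy as np

def H2(p):                            # binary entropy in bits
    return 0.0 if p in (0.0, 1.0) else float(-p*np.log2(p) - (1-p)*np.log2(1-p))

for p in (0.0, 0.1, 0.25, 0.5):
    print(f"p = {p:4.2f}   C = 1 - H(p) = {1 - H2(p):.4f} bits per use")
```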


Binary Erasure Channel


Binary Erasure Channel

Let α be the erasure probability and π₀ = Pr(X = 0).

I(X;Y) = H(Y) − H(Y|X)
       = H(Y) − Σ_x p(x) H(Y|X=x)
       = H(Y) − Σ_x p(x) H(α)
       = H(Y) − H(α)

H(Y) = (1 − α) H(π₀) + H(α), so I(X;Y) = (1 − α) H(π₀) ≤ 1 − α.

C = max I(X;Y) = 1 − α, achieved when π₀ = 1/2.
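The same conclusion can be checked numerically (an illustrative sketch with an assumed erasure probability α = 0.3): maximizing I(X;Y) over π₀ gives 1 − α, attained at π₀ = 1/2.

```python
import numpy as np

alpha = 0.3                                   # erasure probability (assumed value)
P = np.array([[1 - alpha, alpha, 0.0],        # rows: x in {0, 1}; columns: y in {0, e, 1}
              [0.0, alpha, 1 - alpha]])

def I(pi0):
    p_x = np.array([pi0, 1.0 - pi0])
    p_xy = p_x[:, None] * P
    p_y = p_xy.sum(axis=0)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x[:, None] * p_y)[mask])))

grid = np.linspace(0.001, 0.999, 999)
vals = [I(t) for t in grid]
print(max(vals), 1 - alpha)                   # both about 0.7 bits per use
print(grid[int(np.argmax(vals))])             # maximizing pi0 is 0.5
```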


7.3 Properties of Channel Capacity


Properties of Channel Capacity

C ≥ 0.

C ≤ log |X|.

C ≤ log |Y|.

I(X;Y) is a continuous function of p(x).

I(X;Y) is a concave function of p(x).


7.4 Preview of the Channel Coding Theorem


Preview of the Channel Coding Theorem

For each input n-sequence, there are approximately 2^{nH(Y|X)} possible Y sequences.

The total number of possible (typical) Y sequences is approximately 2^{nH(Y)}.

This set has to be divided into sets of size 2^{nH(Y|X)} corresponding to the different input X sequences.

The total number of disjoint sets is less than or equal to 2^{nH(Y)} / 2^{nH(Y|X)} = 2^{n(H(Y) − H(Y|X))} = 2^{nI(X;Y)}.

Hence we can send at most 2^{nI(X;Y)} distinguishable sequences of length n.


Example

6 typical sequences for X^n; 4 typical sequences for Y^n; 12 jointly typical sequences for (X^n, Y^n).

For every typical X^n, there are 2^{nH(X,Y)} / 2^{nH(X)} = 2^{nH(Y|X)} = 2 jointly typical Y^n.

e.g., for X^n = 001100, the jointly typical Y^n are 010100 and 101011.

Example

Since we have 2^{nH(Y)} = 4 typical Y^n in total, to how many typical X^n can these typical Y^n be assigned?

2^{nH(Y)} / 2^{nH(Y|X)} = 2^{n(H(Y) − H(Y|X))} = 2^{nI(X;Y)} = 2.

Can we assign more typical X^n? No. For some received Y^n we could not determine which X^n was sent. e.g., if we use 001100, 101101, and 101000 as codewords, we cannot determine which codeword was sent when we receive 101011.


7.5 Definitions


Communication Channel

Message W ∈ {1, 2, ..., M}.

Encoder: input W, output X^n(W) ∈ X^n.

n is the length of the signal. We transmit the signal by using the channel n times, sending one symbol per channel use.

Channel: input X^n, output Y^n with distribution p(y^n | x^n).

Decoder: input Y^n, output Ŵ = g(Y^n), where g is a deterministic decoding rule.

If Ŵ ≠ W, an error occurs.


Definitions
Definition 3 (Discrete Channel) A discrete channel, denoted by (X, p(y|x), Y), consists of two finite sets X and Y and a collection of probability mass functions p(y|x).

X: input, Y: output; for every input x ∈ X, Σ_y p(y|x) = 1.

Definition 4 (Discrete Memoryless Channel, DMC) The nth extension of the discrete memoryless channel is the channel (X^n, p(y^n|x^n), Y^n), where

p(y_k | x^k, y^{k−1}) = p(y_k | x_k),   k = 1, 2, . . . , n.

Without feedback: p(x_k | x^{k−1}, y^{k−1}) = p(x_k | x^{k−1}).

For the nth extension of a DMC without feedback,

p(y^n | x^n) = ∏_{i=1}^{n} p(y_i | x_i).
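The memoryless, no-feedback property is easy to exercise in code: n uses of the channel amount to drawing each y_i from p(·|x_i) independently, and log p(y^n|x^n) is a sum of per-symbol terms. Below is a small sketch (added for illustration) with an arbitrary 2-input, 3-output transition matrix whose values are assumed.

```python
import numpy as np

rng = np.random.default_rng(1)

P = np.array([[0.8, 0.1, 0.1],        # P[x, y] = p(y|x), an arbitrary example DMC
              [0.1, 0.1, 0.8]])

def use_channel(x_seq):
    """n uses of the DMC: each y_i is drawn from p(.|x_i), independently of the past."""
    return np.array([rng.choice(P.shape[1], p=P[x]) for x in x_seq])

def log2_prob(y_seq, x_seq):
    """log2 p(y^n | x^n) = sum_i log2 p(y_i | x_i) for a DMC used without feedback."""
    return float(np.sum(np.log2(P[x_seq, y_seq])))

x = np.array([0, 1, 1, 0, 1])
y = use_channel(x)
print(y, log2_prob(y, x))
```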


Definitions
Definition 5 ((M, n) code) An (M, n) code for the channel (X, p(y|x), Y) consists of the following:

1. An index set {1, 2, . . . , M}.

2. An encoding function X^n : {1, 2, . . . , M} → X^n. The codewords are x^n(1), x^n(2), . . . , x^n(M). The set of codewords is called the codebook.

3. A decoding function g : Y^n → {1, 2, . . . , M}.

Definitions
Definition 6 (Conditional probability of error)

λ_i = Pr(g(Y^n) ≠ i | X^n = x^n(i)) = Σ_{y^n : g(y^n) ≠ i} p(y^n | x^n(i)) = Σ_{y^n} p(y^n | x^n(i)) I(g(y^n) ≠ i),

where I(·) is the indicator function.

Definitions
Definition 7 (Maximal probability of error)

λ^{(n)} = max_{i ∈ {1, 2, . . . , M}} λ_i

Definition 8 (Average probability of error)

P_e^{(n)} = (1/M) Σ_{i=1}^{M} λ_i

The decoding error is

Pr(g(Y^n) ≠ W) = Σ_{i=1}^{M} Pr(W = i) Pr(g(Y^n) ≠ i | W = i).

If the index W is chosen uniformly from {1, 2, . . . , M}, then

P_e^{(n)} = Pr(g(Y^n) ≠ W).

Definitions
Definition 9 (Rate) The rate R of an (M, n) code is

R = (log M) / n   bits per transmission.

Definition 10 (Achievable rate) A rate R is said to be achievable if there exists a sequence of (2^{nR}, n) codes such that the maximal probability of error λ^{(n)} tends to 0 as n → ∞.

Definition 11 (Channel capacity) The capacity of a channel is the supremum of all achievable rates.


7.6 Jointly Typical Sequences


Definitions
Definition 12 (Jointly typical sequences) The set A_ε^{(n)} of jointly typical sequences {(x^n, y^n)} with respect to the distribution p(x, y) is defined by

A_ε^{(n)} = { (x^n, y^n) ∈ X^n × Y^n :
    | −(1/n) log p(x^n) − H(X) | < ε,
    | −(1/n) log p(y^n) − H(Y) | < ε,
    | −(1/n) log p(x^n, y^n) − H(X, Y) | < ε },

where

p(x^n, y^n) = ∏_{i=1}^{n} p(x_i, y_i).
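The definition translates directly into a membership test. The sketch below (added for illustration; the joint pmf and parameters are assumed values) checks the three conditions for a pair of sequences drawn i.i.d. from p(x, y); by part 1 of the joint AEP on the following slides, the test passes with probability approaching 1 as n grows.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def jointly_typical(xs, ys, p_xy, eps):
    """Is (x^n, y^n) in A_eps^(n) for the joint pmf p_xy[x, y]?"""
    n = len(xs)
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
    HX, HY, HXY = entropy(p_x), entropy(p_y), entropy(p_xy.ravel())
    lx  = -np.sum(np.log2(p_x[xs])) / n            # -(1/n) log p(x^n)
    ly  = -np.sum(np.log2(p_y[ys])) / n            # -(1/n) log p(y^n)
    lxy = -np.sum(np.log2(p_xy[xs, ys])) / n       # -(1/n) log p(x^n, y^n)
    return abs(lx - HX) < eps and abs(ly - HY) < eps and abs(lxy - HXY) < eps

# Example joint pmf (assumed values) and one i.i.d. draw of length n.
p_xy = np.array([[0.45, 0.05],
                 [0.05, 0.45]])
rng = np.random.default_rng(0)
n = 5000
idx = rng.choice(4, size=n, p=p_xy.ravel())        # draw (x_i, y_i) pairs i.i.d. ~ p(x, y)
xs, ys = idx // 2, idx % 2
print(jointly_typical(xs, ys, p_xy, eps=0.05))     # True with high probability
```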

Joint AEP
Theorem 1 (Joint AEP) Let (X^n, Y^n) be sequences of length n drawn i.i.d. according to p(x^n, y^n) = ∏_{i=1}^{n} p(x_i, y_i). Then:

1. Pr((X^n, Y^n) ∈ A_ε^{(n)}) → 1 as n → ∞.

2. |A_ε^{(n)}| ≤ 2^{n(H(X,Y)+ε)}.

3. If (X̃^n, Ỹ^n) ~ p(x^n) p(y^n) [i.e., X̃^n and Ỹ^n are independent with the same marginals], then

   Pr((X̃^n, Ỹ^n) ∈ A_ε^{(n)}) ≤ 2^{−n(I(X;Y) − 3ε)}.

   Also, for sufficiently large n,

   Pr((X̃^n, Ỹ^n) ∈ A_ε^{(n)}) ≥ (1 − ε) 2^{−n(I(X;Y) + 3ε)}.

Joint AEP


Theorem 2 (Joint AEP, part 1) Pr((X^n, Y^n) ∈ A_ε^{(n)}) → 1 as n → ∞.

Proof. Given ε > 0, define the events

A ≜ { X^n : | −(1/n) log p(X^n) − H(X) | ≥ ε },

B ≜ { Y^n : | −(1/n) log p(Y^n) − H(Y) | ≥ ε },

C ≜ { (X^n, Y^n) : | −(1/n) log p(X^n, Y^n) − H(X, Y) | ≥ ε }.

Joint AEP
Then, by the weak law of large numbers, there exist n₁, n₂, n₃ such that

Pr(A) < ε/3 for n > n₁,   Pr(B) < ε/3 for n > n₂,   Pr(C) < ε/3 for n > n₃.

Thus,

Pr((X^n, Y^n) ∈ A_ε^{(n)}) = Pr(A^c ∩ B^c ∩ C^c)
                           = 1 − Pr(A ∪ B ∪ C)
                           ≥ 1 − (Pr(A) + Pr(B) + Pr(C))
                           ≥ 1 − ε

for all n > max{n₁, n₂, n₃}. ∎

Joint AEP


Theorem 3 (Joint AEP, part 2) |A_ε^{(n)}| ≤ 2^{n(H(X,Y)+ε)}.

Proof.

1 = Σ_{(x^n, y^n)} p(x^n, y^n) ≥ Σ_{(x^n, y^n) ∈ A_ε^{(n)}} p(x^n, y^n) ≥ |A_ε^{(n)}| 2^{−n(H(X,Y)+ε)}.

Thus, |A_ε^{(n)}| ≤ 2^{n(H(X,Y)+ε)}. ∎

Joint AEP
Theorem 4 (Joint AEP, part 3) If (X̃^n, Ỹ^n) ~ p(x^n) p(y^n) [i.e., X̃^n and Ỹ^n are independent with the same marginals], then

Pr((X̃^n, Ỹ^n) ∈ A_ε^{(n)}) ≤ 2^{−n(I(X;Y) − 3ε)}.

Also, for sufficiently large n,

Pr((X̃^n, Ỹ^n) ∈ A_ε^{(n)}) ≥ (1 − ε) 2^{−n(I(X;Y) + 3ε)}.

Joint AEP
Proof.

Pr((X̃^n, Ỹ^n) ∈ A_ε^{(n)}) = Σ_{(x^n, y^n) ∈ A_ε^{(n)}} p(x^n) p(y^n)
                            ≤ 2^{n(H(X,Y)+ε)} · 2^{−n(H(X)−ε)} · 2^{−n(H(Y)−ε)}
                            = 2^{−n(I(X;Y) − 3ε)}.

For sufficiently large n, Pr(A_ε^{(n)}) ≥ 1 − ε, and therefore

1 − ε ≤ Σ_{(x^n, y^n) ∈ A_ε^{(n)}} p(x^n, y^n) ≤ |A_ε^{(n)}| 2^{−n(H(X,Y)−ε)},

and

|A_ε^{(n)}| ≥ (1 − ε) 2^{n(H(X,Y)−ε)}.

Joint AEP


Pr((X̃^n, Ỹ^n) ∈ A_ε^{(n)}) = Σ_{(x^n, y^n) ∈ A_ε^{(n)}} p(x^n) p(y^n)
                            ≥ (1 − ε) 2^{n(H(X,Y)−ε)} · 2^{−n(H(X)+ε)} · 2^{−n(H(Y)+ε)}
                            = (1 − ε) 2^{−n(I(X;Y) + 3ε)}. ∎

Joint AEP: Conclusion

There are about 2^{nH(X)} typical X sequences, and about 2^{nH(Y)} typical Y sequences.

There are about 2^{nH(X,Y)} jointly typical sequences.

If we choose a typical X^n and a typical Y^n independently at random, the probability that the pair is jointly typical is about 2^{−nI(X;Y)}.

7.7 Channel Coding Theorem


Channel Coding Theorem


Theorem 5 (Channel coding theorem) For every rate R < C, there exists a sequence of (2^{nR}, n) codes with maximum probability of error λ^{(n)} → 0.

Conversely, any sequence of (2^{nR}, n) codes with λ^{(n)} → 0 must have R ≤ C.

We have to prove two parts:

R < C ⇒ achievable.

Achievable ⇒ R ≤ C.

Main ideas:

Random encoding (random code)

Jointly typical decoding


Random Code

Generate a (2^{nR}, n) code at random according to a fixed distribution p(x). That is, the 2^{nR} codewords are drawn according to

p(x^n) = ∏_{i=1}^{n} p(x_i).

A particular code C is the matrix with the 2^{nR} codewords as its rows:

C = [ x_1(1)      x_2(1)      ...  x_n(1)
      x_1(2)      x_2(2)      ...  x_n(2)
      ...
      x_1(2^{nR}) x_2(2^{nR}) ...  x_n(2^{nR}) ]

The code C is revealed to both sender and receiver. Both sender and receiver are also assumed to know the channel transition matrix p(y|x).


Random Code
There are (|X|^n)^{2^{nR}} different codes.

The probability of a particular code C is

Pr(C) = ∏_{w=1}^{2^{nR}} ∏_{i=1}^{n} p(x_i(w)).

Transmission and Channel

A message W is chosen according to a uniform distribution:

Pr[W = w] = 2^{−nR},   w = 1, 2, . . . , 2^{nR}.

The wth codeword X^n(w), corresponding to the wth row of C, is sent over the channel.

The receiver receives a sequence Y^n according to the distribution

P(y^n | x^n(w)) = ∏_{i=1}^{n} p(y_i | x_i(w)).

That is, the DMC is used n times.

Jointly Typical Decoding

The receiver declares that the message Ŵ was sent if

(X^n(Ŵ), Y^n) is jointly typical, and

there is no other jointly typical pair for Y^n; that is, there is no other index W′ ≠ Ŵ such that (X^n(W′), Y^n) is jointly typical.

If no such Ŵ exists, or if there is more than one, an error is declared (set Ŵ = 0).

There is a decoding error if Ŵ ≠ W. Let E be the event {Ŵ ≠ W}.
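The random-coding argument on the following slides is a proof device, but the encoding and decoding rules themselves can be simulated. The sketch below (added for illustration, with assumed parameters) draws a random codebook for a BSC with crossover probability p = 0.1 and uniform codeword symbols, sends a uniformly chosen message, and decodes by joint typicality. The block length is short and ε deliberately loose, so the residual error is clearly visible; at a fixed rate R < C it shrinks as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)

p, n, R, eps = 0.1, 60, 0.2, 0.3           # assumed parameters; C = 1 - H(0.1) ~ 0.531 > R
M = int(2 ** (n * R))                      # number of codewords

H = lambda q: -q*np.log2(q) - (1-q)*np.log2(1-q)
HXY = 1.0 + H(p)                           # H(X,Y) for a uniform input through BSC(p)

def decode(code, y):
    """Jointly typical decoding: return the unique w with (x^n(w), y^n) typical, else None.
    For uniform inputs, -(1/n) log p(x^n) and -(1/n) log p(y^n) both equal 1 = H(X) = H(Y),
    so only the joint condition can fail."""
    d = np.count_nonzero(code != y, axis=1)                     # Hamming distances to y^n
    neg_log_joint = 1.0 - ((n - d)/n)*np.log2(1 - p) - (d/n)*np.log2(p)
    hits = np.flatnonzero(np.abs(neg_log_joint - HXY) < eps)
    return int(hits[0]) if len(hits) == 1 else None

errors, trials = 0, 100
for _ in range(trials):
    code = rng.integers(0, 2, size=(M, n))                      # random codebook, Bernoulli(1/2)
    w = int(rng.integers(M))                                    # message, uniform on {0,...,M-1}
    y = code[w] ^ (rng.random(n) < p).astype(int)               # BSC output for codeword w
    errors += (decode(code, y) != w)
print("empirical error rate:", errors / trials)
```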

Proof of R < C Achievable

The average probability of error, averaged over all codewords in the codebook and over all codebooks, is

Pr(E) = Σ_C Pr(C) P_e^{(n)}(C) = Σ_C Pr(C) (1/2^{nR}) Σ_{w=1}^{2^{nR}} λ_w(C) = (1/2^{nR}) Σ_{w=1}^{2^{nR}} Σ_C Pr(C) λ_w(C),

where λ_w(C) is defined for jointly typical decoding.

By the symmetry of the code construction, Σ_C Pr(C) λ_w(C) does not depend on w.

Proof of R < C Achievable

Therefore,

Pr(E) = (1/2^{nR}) Σ_{w=1}^{2^{nR}} Σ_C Pr(C) λ_w(C)
      = Σ_C Pr(C) λ_w(C)   for any w
      = Σ_C Pr(C) λ_1(C) = Pr(E | W = 1).

Define E_i = {(X^n(i), Y^n) is a jointly typical pair} for i = 1, 2, . . . , 2^{nR}, where Y^n is the channel output when the first codeword X^n(1) was sent. Then a decoding error is declared if

E_1^c: the transmitted codeword and the received sequence are not jointly typical, or

E_2 ∪ E_3 ∪ · · · ∪ E_{2^{nR}}: a wrong codeword is jointly typical with the received sequence.

Proof of R < C Achievable

Y^n is the channel output when the first codeword X^n(1) was sent.

E_1^c: the transmitted codeword and the received sequence are not jointly typical.

E_2, E_3, . . . , E_{2^{nR}}: wrong codewords that are jointly typical with the received sequence.

Proof of R < C Achievable

The average error satisfies

Pr(E | W = 1) = P(E_1^c ∪ E_2 ∪ E_3 ∪ · · · ∪ E_{2^{nR}} | W = 1)
             ≤ P(E_1^c | W = 1) + Σ_{i=2}^{2^{nR}} P(E_i | W = 1).

By the joint AEP,

P(E_1^c | W = 1) ≤ ε for n sufficiently large,

P(E_i | W = 1) ≤ 2^{−n(I(X;Y) − 3ε)} for i ≥ 2 (Y^n and X^n(i) are independent, so the probability that they are jointly typical is bounded by part 3 of the joint AEP).

Proof of R < C Achievable

We have

Pr(E | W = 1) ≤ ε + (2^{nR} − 1) 2^{−n(I(X;Y) − 3ε)}
             ≤ ε + 2^{nR} 2^{−n(I(X;Y) − 3ε)}
             = ε + 2^{−n(I(X;Y) − R − 3ε)}.

If I(X;Y) − R − 3ε > 0, then 2^{−n(I(X;Y) − R − 3ε)} < ε for n sufficiently large, and

Pr(E | W = 1) ≤ 2ε.

So far we have proved: for any ε > 0, if R < I(X;Y) − 3ε and n is sufficiently large, the average decoding error satisfies Pr(E) = Pr(E | W = 1) < 2ε.

What do we need? If R < C, the maximum error probability λ^{(n)} → 0. (We are almost there. Almost...)

Proof of R < C Achievable, final part

Choose p(x) such that I(X;Y) is maximized, i.e., such that I(X;Y) achieves the channel capacity C. Then the condition R < I(X;Y) − 3ε can be replaced by the achievability condition R < C − 3ε.

Since the average probability of error over codebooks is less than 2ε, there exists at least one codebook C* such that Pr(E | C*) < 2ε. (C* can be found by an exhaustive search over all codes.)

Since W is chosen uniformly, we have

Pr(E | C*) = (1/2^{nR}) Σ_{i=1}^{2^{nR}} λ_i(C*) < 2ε,

which implies that the maximal error probability of the better half of the codewords is less than 4ε. (Compare: if 10 students have an average score of 40, then at least half of them score no more than 80.)

Proof of R < C Achievable, final part

We throw away the worst half of the codewords in the best codebook C*. The new code has a maximal probability of error less than 4ε.

However, we now have a (2^{nR}/2, n) = (2^{n(R − 1/n)}, n) code. The rate of the new code is R − 1/n.

Summary: if R − 1/n < C − 3ε for any ε > 0, then λ^{(n)} ≤ 4ε for n sufficiently large.

7.8 Zero-Error Codes


No Error ⇒ R ≤ C

Assume that we have a (2^{nR}, n) code with zero probability of error.

Then W is determined by Y^n: p(g(Y^n) = W) = 1, so H(W | Y^n) = 0.

To obtain a strong bound, assume that W is uniformly distributed over {1, 2, . . . , 2^{nR}}.

nR = H(W) = H(W | Y^n) + I(W; Y^n) = I(W; Y^n)
   ≤ I(X^n; Y^n)   (data processing inequality: W → X^n(W) → Y^n)
   ≤ Σ_{i=1}^{n} I(X_i; Y_i)   (see next page)
   ≤ nC   (definition of channel capacity)

That is, no error ⇒ R ≤ C.

No Error ⇒ R ≤ C

Lemma 1 Let Y^n be the result of passing X^n through a discrete memoryless channel of capacity C. Then for all p(x^n),

I(X^n; Y^n) ≤ nC.

Proof.

I(X^n; Y^n) = H(Y^n) − H(Y^n | X^n)
            = H(Y^n) − Σ_{i=1}^{n} H(Y_i | Y_1, . . . , Y_{i−1}, X^n)
            = H(Y^n) − Σ_{i=1}^{n} H(Y_i | X_i)   (definition of DMC)
            ≤ Σ_{i=1}^{n} H(Y_i) − Σ_{i=1}^{n} H(Y_i | X_i) = Σ_{i=1}^{n} I(X_i; Y_i) ≤ nC. ∎

7.9 Fano's Inequality and the Converse to the Coding Theorem


Fano's Inequality

Theorem 6 (Fano's inequality) Let X and W have the same sample space X = {1, 2, . . . , M} and have the joint p.m.f. p(x, w). Let

Pe = Pr[X ≠ W] = Σ_{x ∈ X} Σ_{w ∈ X, w ≠ x} p(x, w).

Then

Pe log(M − 1) + H(Pe) ≥ H(X | W),

where

H(Pe) = −Pe log Pe − (1 − Pe) log(1 − Pe).
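A direct numerical spot-check of the inequality on a randomly generated joint pmf (a sketch with assumed values, added for illustration):

```python
import numpy as np

def H(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

rng = np.random.default_rng(3)
M = 4
p_xw = rng.random((M, M))
p_xw /= p_xw.sum()                                   # a random joint pmf p(x, w)

Pe = 1.0 - np.trace(p_xw)                            # Pr[X != W]
H_X_given_W = H(p_xw.ravel()) - H(p_xw.sum(axis=0))  # H(X|W) = H(X,W) - H(W)
H_Pe = float(-Pe*np.log2(Pe) - (1-Pe)*np.log2(1-Pe))
lhs = Pe*np.log2(M - 1) + H_Pe
print(lhs, ">=", H_X_given_W, lhs >= H_X_given_W)    # Fano's inequality holds
```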

Fano's Inequality

Proof. We will prove that H(X|W) − H(Pe) − Pe log(M − 1) ≤ 0.

H(X|W) = Σ_x Σ_w p(x, w) log(1/p(x|w))
       = Σ_x Σ_{w ≠ x} p(x, w) log(1/p(x|w)) + Σ_x Σ_{w = x} p(x, w) log(1/p(x|w))

−Pe log(M − 1) = Σ_x Σ_{w ≠ x} p(x, w) log(1/(M − 1))

−H(Pe) = Pe log Pe + (1 − Pe) log(1 − Pe)
       = Σ_x Σ_{w ≠ x} p(x, w) log Pe + Σ_x Σ_{w = x} p(x, w) log(1 − Pe)

Add the above three terms together.

Fano's Inequality

Proof (cont.)

H(X|W) − Pe log(M − 1) − H(Pe)
= Σ_x Σ_{w ≠ x} p(x, w) log( Pe / ((M − 1) p(x|w)) ) + Σ_x Σ_{w = x} p(x, w) log( (1 − Pe) / p(x|w) )
≤ log[ Σ_x Σ_{w ≠ x} p(x, w) Pe / ((M − 1) p(x|w)) + Σ_x Σ_{w = x} p(x, w) (1 − Pe) / p(x|w) ]   (by concavity of the logarithm)
= log[ (Pe / (M − 1)) Σ_x Σ_{w ≠ x} p(w) + (1 − Pe) Σ_x Σ_{w = x} p(w) ]
= log[ Pe + (1 − Pe) ] = 0 ∎

Fano's Inequality

Corollary 1

1. Pe log M + H(Pe) ≥ H(X|W), where Pe = Pr[X ≠ W].

2. 1 + Pe log M ≥ H(X|W), where Pe = Pr[X ≠ W].

3. If X → Y → X̂ and Pe = Pr[X ≠ X̂], then H(Pe) + Pe log M ≥ H(X|X̂) ≥ H(X|Y).

Remark.

1. H(X|W) ≤ Pe log(M − 1) + H(Pe) ≤ Pe log M + H(Pe).

2. H(X|W) ≤ Pe log(M − 1) + H(Pe) ≤ Pe log M + 1.

3. The second inequality can be obtained by the data processing inequality.

Data Processing Inequality


Lemma 2 (Data processing inequality) If X → Y → Z, then

I(X; Z) ≤ I(X; Y).

Proof.

I(X; Z) − I(X; Y)
= H(X) − H(X|Z) − [H(X) − H(X|Y)] = H(X|Y) − H(X|Z)
= Σ_x Σ_y p(x, y) log(1/p(x|y)) − Σ_x Σ_z p(x, z) log(1/p(x|z))
= Σ_x Σ_y Σ_z p(x, y, z) log(1/p(x|y)) − Σ_x Σ_y Σ_z p(x, y, z) log(1/p(x|z))
= Σ_x Σ_y Σ_z p(x, y, z) log( p(x|z) / p(x|y) )
≤ log( Σ_x Σ_y Σ_z p(x, y, z) p(x|z) / p(x|y) )   (by concavity of the logarithm)

Data Processing Inequality


Proof (cont.) Since X → Y → Z, we have

p(x, y, z) = p(x, y) p(z|x, y) = p(x, y) p(z|y) = p(x, y) p(y, z) / p(y)

and

p(x, y, z) · p(x|z)/p(x|y) = [p(x, y) p(y, z)/p(y)] · [p(x, z) p(y)/(p(z) p(x, y))] = p(x, z) p(y, z) / p(z).

Therefore,

Σ_x Σ_y Σ_z p(x, y, z) p(x|z)/p(x|y) = Σ_x Σ_y Σ_z p(x, z) p(y, z)/p(z)
= Σ_x Σ_z [p(x, z)/p(z)] Σ_y p(y, z) = Σ_x Σ_z p(x, z) = 1,

so I(X; Z) − I(X; Y) ≤ log 1 = 0. ∎

Data Processing Inequality (Summary)


Lemma 3

1. If X → Y → Z, then I(X; Z) ≤ min{ I(X; Y), I(Y; Z) } and H(X|Y) ≤ H(X|Z).

2. If X → Y → Z → W, then I(X; Z) + I(Y; W) ≤ I(X; W) + I(Y; Z), and I(X; W) ≤ I(Y; Z).
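These relations are easy to spot-check numerically. The sketch below (added for illustration; the distributions are assumed, random values) builds a Markov chain X → Y → Z from random transition matrices and verifies I(X;Z) ≤ I(X;Y) and I(X;Z) ≤ I(Y;Z).

```python
import numpy as np

rng = np.random.default_rng(7)

def normalize_rows(A):
    return A / A.sum(axis=1, keepdims=True)

def mi(p_ab):
    """I(A;B) in bits from a joint pmf p_ab[a, b]."""
    pa, pb = p_ab.sum(axis=1), p_ab.sum(axis=0)
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log2(p_ab[mask] / np.outer(pa, pb)[mask])))

# Markov chain X -> Y -> Z with random pmfs (example values).
p_x = rng.random(3); p_x /= p_x.sum()
P_yx = normalize_rows(rng.random((3, 3)))        # p(y|x)
P_zy = normalize_rows(rng.random((3, 3)))        # p(z|y)

p_xy = p_x[:, None] * P_yx                       # p(x, y)
p_yz = p_xy.sum(axis=0)[:, None] * P_zy          # p(y, z)
p_xz = p_xy @ P_zy                               # p(x, z) = sum_y p(x, y) p(z|y)

print(mi(p_xz) <= mi(p_xy), mi(p_xz) <= mi(p_yz))   # both True (data processing inequality)
```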

Achievable ⇒ R ≤ C

Theorem 7 (Converse to the channel coding theorem) Any sequence of (2^{nR}, n) codes with λ^{(n)} → 0 must have R ≤ C.

Proof.

For a fixed encoding rule X^n(W) and a fixed decoding rule Ŵ = g(Y^n), we have W → X^n(W) → Y^n → Ŵ.

For each n, let W be drawn according to a uniform distribution over {1, 2, . . . , 2^{nR}}.

Since W has a uniform distribution,

Pr[Ŵ ≠ W] = P_e^{(n)} = (1/2^{nR}) Σ_{i=1}^{2^{nR}} λ_i.

Achievable ⇒ R ≤ C

Proof (cont.)

nR = H(W)   (W is uniformly distributed)
   = H(W | Ŵ) + I(W; Ŵ)
   ≤ 1 + P_e^{(n)} nR + I(W; Ŵ)   (Fano's inequality)
   ≤ 1 + P_e^{(n)} nR + I(X^n; Y^n)   (data processing inequality)
   ≤ 1 + P_e^{(n)} nR + nC   (Lemma 1)

Hence

P_e^{(n)} ≥ 1 − C/R − 1/(nR).

That is, if R > C, the probability of error is bounded away from zero for sufficiently large n; the error probability cannot be made arbitrarily small. ∎
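A quick numeric reading of this bound (illustrative, assumed values):

```python
C, R, n = 0.5, 0.75, 1000    # assumed values: a rate 50% above capacity, block length 1000
print(1 - C/R - 1/(n*R))     # ~0.332: the average error probability is at least about a third
```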
