Information Theory and Channel Coding
Wadih Sawaya
General Introduction.
Communication systems
• There exists between the source and the destination a communication channel affected by various disturbances.

[Figure: block diagram of a communication system: Source → Source Coding → Channel Coding → Channel (subject to disturbances) → Channel Decoding → Source Decoding → User]
As in any mathematical theory, this theory deals only with mathematical models and not with physical sources and physical channels. To proceed we will study the simplest classes of mathematical models of sources and channels. Naturally, the choice of these models is influenced by the most important real physical sources and channels.
Each symbol from this sequence is thus a random outcome taking value from the
finite alphabet {x1, x2 ,…xM}.
P_k = P(s = x_k),  1 ≤ k ≤ M ;   ∑_{k=1}^{M} P_k = 1
P (xi , x j ) = P( xi )P (x j ) ⇔ Q (xi ; x j ) = Q ( xi ) + Q (x j )
Q(x_i) ≜ log_a (1 / P_i)
• The base (a) of the logarithm determines the unit of the measure assigned to
the information content. When the base a = 2, the unit is the “bit” measure.
1) The correct identification of one of two equally likely symbols, that is, P(x_1) = P(x_2), conveys an amount of information equal to Q(x_1) = Q(x_2) = log_2 2 = 1 bit of information.
2) The information content of each outcome when tossing a fair coin is Q("Head") = Q("Tail") = log_2 2 = 1 bit of information.
H(X) = ∑_{k=1}^{M} P_k Q(x_k) = ∑_{k=1}^{M} P_k log (1 / P_k)
Example 1:
Alphabet : {x1, x2 , x3 , x4 }
Probabilities: P_1 = 1/2 ;  P_2 = 1/4 ;  P_3 = P_4 = 1/8

⇒ Entropy  H(X) = ∑_{k=1}^{4} P_k log_2 (1 / P_k) = 1.75 bits/symbol
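As a quick numerical check, the following minimal Python sketch (function name illustrative, not from the course) computes the entropy of a finite distribution and reproduces the 1.75 bits/symbol of Example 1:

```python
import math

def entropy(probs, base=2):
    """H(X) = sum_k P_k * log(1/P_k); zero-probability symbols contribute nothing."""
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

# Example 1: P1 = 1/2, P2 = 1/4, P3 = P4 = 1/8
print(entropy([0.5, 0.25, 0.125, 0.125]))  # -> 1.75 bits/symbol
```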
• Example 2:
Alphabet of M equally likely symbols: P_k = 1/M  ∀k ∈ {1, ..., M}

⇒ Entropy:  H(X) = ∑_{k=1}^{M} (1/M) log_2 M = log_2 M bits/symbol
• Example 3:
Binary alphabet : {0, 1}
p_0 = P_x ;  p_1 = 1 − P_x

⇒ Entropy:  H(X) = P_x log_2 (1/P_x) + (1 − P_x) log_2 (1/(1 − P_x)) ≜ H_f(P_x)
[Figure: entropy of a binary alphabet: H_f(P_x) in bits/symbol versus the probability P_x ∈ [0, 1]]
• The maximum occurs for P_x = 0.5, that is, when the two symbols are equally likely. This result is in fact general:

H(X) ≤ log M
D(p‖q) = ∑_{x∈𝒳} p(x) log ( p(x) / q(x) )

Example: Determine D(p‖q) for p(0) = p(1) = 1/2 and q(0) = 3/4, q(1) = 1/4.
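A minimal sketch of this computation (helper name illustrative); the result is D(p‖q) = (1/2) log_2(2/3) + (1/2) log_2 2 ≈ 0.2075 bits:

```python
import math

def kl_divergence(p, q, base=2):
    """D(p||q) = sum_x p(x) log(p(x)/q(x)); assumes q(x) > 0 wherever p(x) > 0."""
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)

print(kl_divergence([0.5, 0.5], [0.75, 0.25]))  # -> about 0.2075 bits
```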
[Figure: Venn diagram relating H(X), H(Y), the conditional entropies H(X|Y) and H(Y|X), the mutual information I(X;Y), and the joint entropy H(X,Y)]
22. Mutual Information
H(X,Y) = H(X) + H(Y|X)

• and then:

I(X;Y) ≥ 0
H(X|Y) ≤ H(X)
• We have studied until now the average information content of a set of all possible outcomes, recognized as the alphabet of the discrete source.
• We are now interested in the information content per symbol in a long sequence of symbols delivered by the discrete source, whether or not the emitted symbols are correlated in time.
• The source can be identified as a stochastic process. A source is stationary if it has the same statistics regardless of the time origin.
• Let (X_1, X_2, ..., X_k) be a sequence of k non-independent random variables emitted by the source. The entropy per symbol of a sequence of k symbols is naturally defined as:

H_k(X) = (1/k) H(X_1, X_2, ..., X_k)
• Theorem 6: For a stationary source this limit exists and is equal to the limit of the conditional entropy:

lim_{k→∞} H(X_k | X_{k−1}, ..., X_1)
For a discrete memoryless source (DMS), each symbol emitted is independent from
all previous ones and the entropy rate of the source is equivalent to the entropy of
the alphabet of the source:
H∞ (X ) = H ( X )
For a continuous random variable, the (differential) entropy is:

H(X) = −∫_{−∞}^{+∞} p(x) log_2 p(x) dx
The average codeword length is:

n̄ ≜ ∑_{k=1}^{M} P_k n_k

where n_k is the length of the codeword assigned to symbol x_k.
• Example:

Symbol   Codeword
x1       0
x2       01
x3       10
x4       100
• Theorem 8 (Kraft Inequality): If the integers n_1, n_2, ..., n_K satisfy the inequality

∑_{k=1}^{K} 2^{−n_k} ≤ 1

then a prefix binary code exists with these integers as codeword lengths.
Note: The theorem does not say that any code whose lengths satisfy this inequality is a prefix code.
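The Kraft sum is easy to check numerically. Applied to the lengths {1, 2, 2, 3} of the example code above, it gives 1.125 > 1, so no prefix code exists with those lengths (indeed, 0 is a prefix of 01 in that code). A one-function sketch:

```python
def kraft_sum(lengths):
    """Kraft sum: sum_k 2^(-n_k); a binary prefix code with these lengths exists iff <= 1."""
    return sum(2.0 ** -n for n in lengths)

print(kraft_sum([1, 2, 2, 3]))  # -> 1.125 > 1: no prefix code with these lengths
print(kraft_sum([1, 2, 3, 3]))  # -> 1.0: a prefix code exists, e.g. {0, 10, 110, 111}
```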
H(X) ≤ n̄ < H(X) + 1

• We can define the efficiency of a code as:  ε ≜ H(X) / n̄
1. Order the symbols by decreasing probability.
2. Group the last two symbols x_M and x_{M−1} into an equivalent symbol, with probability P_M + P_{M−1}.
3. Repeat steps 1 and 2 until a single node of probability 1 remains, building a binary tree.
4. Associate the binary digits 0 and 1 to each pair of branches in the tree departing from intermediate nodes (see the sketch after the figure below).
[Figure: Huffman tree example: branches labeled 0 and 1, intermediate node probabilities 0.35 and 0.55; resulting average length n̄ = 2 digits/symbol]
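A compact implementation of these four steps using Python's heapq, applied to the alphabet of Example 1 (a sketch; names are illustrative). The resulting average length is 1.75 digits/symbol, equal to H(X), so the efficiency is ε = 1:

```python
import heapq
import itertools

def huffman_code(probs):
    """Build a binary Huffman code; returns {symbol: codeword string}."""
    tiebreak = itertools.count()  # avoids comparing dicts when probabilities tie
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)  # group the two least probable nodes
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

code = huffman_code({"x1": 0.5, "x2": 0.25, "x3": 0.125, "x4": 0.125})
print(code)  # -> {'x1': '0', 'x2': '10', 'x3': '110', 'x4': '111'} (up to relabeling)
```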
−(1/n) log p(x_1, x_2, ..., x_n) → H(X)   in probability

A_ε^(n) = { (x_1, x_2, ..., x_n) : 2^{−n(H(X)+ε)} ≤ p(x_1, x_2, ..., x_n) ≤ 2^{−n(H(X)−ε)} }
38. The Asymptotic Equipartition Property (AEP)
• Theorem 11: If (x_1, x_2, ..., x_n) ∈ A_ε^(n), then:

1. H(X) − ε ≤ −(1/n) log p(x_1, x_2, ..., x_n) ≤ H(X) + ε
2. Pr{ A_ε^(n) } > 1 − ε for n sufficiently large
3. |A_ε^(n)| ≤ 2^{n(H(X)+ε)}

• Typical set A_ε^(n) ⇒ indexing its elements requires no more than n(H+ε) + 1 binary elements, prefixed by 0.
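The AEP is easy to observe numerically: for an i.i.d. binary source, −(1/n) log_2 p(x_1, ..., x_n) concentrates around H(X) as n grows. A sketch (the source parameter and sequence length are arbitrary illustrative choices):

```python
import math
import random

def empirical_aep(p1=0.2, n=10000, trials=5):
    """Draw i.i.d. Bernoulli(p1) sequences; compare -(1/n) log2 p(x^n) with H(X)."""
    h = -(p1 * math.log2(p1) + (1 - p1) * math.log2(1 - p1))
    for _ in range(trials):
        ones = sum(random.random() < p1 for _ in range(n))
        logp = ones * math.log2(p1) + (n - ones) * math.log2(1 - p1)
        print(f"-(1/n) log2 p(x^n) = {-logp / n:.4f}   (H(X) = {h:.4f})")

empirical_aep()
```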
Theorem 12: Let X_1, X_2, ..., X_n be i.i.d. ~ p(x), and let ε > 0. There exists a code which maps sequences (x_1, ..., x_n) into binary strings such that the average length per symbol satisfies:

n̄ = ∑_{X^n} P(x_1, ..., x_n) n(x_1, ..., x_n) ≤ H(X) + ε
• Increasing the block length k makes the code more efficient, and thus for any δ > 0 it is possible to choose k large enough so that n̄ satisfies:

H_∞(X) ≤ n̄ < H_∞(X) + δ
Between the channel encoder output and the channel decoder input, we may consider a discrete channel.
• The input and output of the channel are discrete alphabets. A practical example is the binary channel.
Between the channel encoder output and the input of the demodulator, we may consider a continuous channel with a discrete input alphabet.
• As a practical example, the AWGN (Additive White Gaussian Noise) channel is well known; it is completely characterized by the probability distribution of the noise.
[Figure: discrete channel: input symbols x_1, ..., x_{N_X}, output symbols y_1, ..., y_{N_Y}, connected by transition probabilities p_11, p_12, p_21, p_22, ..., p_{N_X N_Y}]

• p_ij ≜ P(y_j | x_i) represents the probability of receiving the symbol y_j, given that the symbol x_i has been transmitted.
Obviously we have the relationship:  ∑_{j=1}^{N_Y} p_ij = 1

p_11 + p_12 = 1  and  p_21 + p_22 = 1
When p_12 = p_21 = p the channel is called a binary symmetric channel (BSC).

[Figure: BSC transition diagram: x_1 → y_1 and x_2 → y_2 with probability 1−p; x_1 → y_2 and x_2 → y_1 with probability p]
The sum of the elements in each row of P is 1:  ∑_{j=1}^{N_Y} p_ij = 1
• Example 2:
The symbols of the input alphabet are in one-to-one correspondence with the
symbols of the output alphabet.
The useless channel:  P(y_j | x_i) = P(y_j)  ∀ j, i

The matrix P has identical rows. The useless channel completely scrambles all input symbols, so that the received symbol gives no useful information for deciding upon the transmitted one.

P(y_j | x_i) = P(y_j)  ⇔  P(x_i | y_j) = P(x_i)
• This conditional entropy represents the average amount of information that has been
lost in the channel, and it is called equivocation.
• Examples:
The noiseless channel: H(X|Y) = 0 (no loss in the channel).
The useless channel: H(X|Y) = H(X) (all transmitted information is lost in the channel).
• A basic point is the knowledge of the average information flow that can reliably pass
through the channel.
[Figure: emitted message → CHANNEL → received message, with information lost in the channel]
• Remark: We can define the average information at the output end of the channel:

H(Y) ≜ ∑_{j=1}^{N_Y} P(y_j) log (1 / P(y_j))  bits/sym
I(X;Y) ≜ H(X) − H(X|Y)  bits/sym

Note that:  I(X;Y) = H(X) − H(X|Y) = H(Y) − H(Y|X)

Remark: The mutual information has a more general meaning than an "information flow". It is the average information provided about the set X by the set Y, excluding all average information about X from X itself (the average self-information is H(X)).
H(Y|X) = P(x_1, y_1) log_2 (1/P(y_1|x_1)) + P(x_1, y_2) log_2 (1/P(y_2|x_1)) + P(x_2, y_1) log_2 (1/P(y_1|x_2)) + P(x_2, y_2) log_2 (1/P(y_2|x_2))

With p_12 = p_21 = p, the joint probabilities are:

P(y_j, x_i) = P(y_j | x_i) P(x_i) = p × P(x_i) for i ≠ j, and (1 − p) × P(x_i) for i = j

⇒ H(Y|X) = p log_2 (1/p) + (1 − p) log_2 (1/(1 − p)) = H_f(p)
With P(x_1) = 1 − P(x_2):

2. H(Y) = P(y_1) log_2 (1/P(y_1)) + P(y_2) log_2 (1/P(y_2))

P(y_j) = ∑_i P(y_j, x_i) = ∑_i P(y_j | x_i) P(x_i)

⇒ P(y_1) = p + P(x_1)(1 − 2p) ;  P(y_2) = (1 − p) − P(x_1)(1 − 2p)
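Collecting the pieces above, I(X;Y) = H(Y) − H(Y|X) = H_f(P(y_1)) − H_f(p) for the BSC. A sketch (function names illustrative); at P(x_1) = 0.5 it returns the capacity 1 − H_f(p):

```python
import math

def hf(x):
    """Binary entropy function H_f(x) in bits."""
    return 0.0 if x in (0.0, 1.0) else -(x * math.log2(x) + (1 - x) * math.log2(1 - x))

def bsc_mutual_information(px1, p):
    """I(X;Y) for a BSC with crossover p and input probabilities (px1, 1 - px1)."""
    py1 = p + px1 * (1 - 2 * p)   # P(y1), as derived above
    return hf(py1) - hf(p)        # I(X;Y) = H(Y) - H(Y|X)

print(bsc_mutual_information(0.5, 0.1))  # -> 1 - Hf(0.1), about 0.531 bits/symbol
```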
[Figure: I(X;Y) in bits/symbol versus P(x_1) for the BSC; the curves are concave with a maximum at P(x_1) = 0.5]
57. Capacity of a discrete memoryless channel.
• Considering the set of curves of I(X;Y) as a function of P(x_1), we can observe that the maximum of I(X;Y) is always obtained for P(x_1) = P(x_2) = 0.5, that is, when the input symbols are equally likely.

P(x_i) = 1/N_X  for all i = 1, ..., N_X
[Figure: BPSK transmission chain: discrete source → source encoder ({1, 0}) → BPSK modulator with pulse h(t) and carrier cos(2πf_0 t) → AWGN channel → matched filter h(−t) and carrier cos(2πf_0 t) → source decoder → user ({1, 0}); the overall chain behaves as a BSC]
p = Q( √(2 E_b / N_0) ) ,   P(0) = P(1) = 0.5
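Numerically, the BPSK/AWGN chain with hard decisions is a BSC with crossover p = Q(√(2E_b/N_0)), and its capacity with equally likely inputs is 1 − H_f(p). A sketch using the identity Q(x) = 0.5·erfc(x/√2):

```python
import math

def q_func(x):
    """Gaussian tail function: Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def hf(x):
    """Binary entropy function in bits."""
    return 0.0 if x in (0.0, 1.0) else -(x * math.log2(x) + (1 - x) * math.log2(1 - x))

def bsc_capacity_from_ebn0(ebn0_db):
    """Capacity (bits/symbol) of the BSC induced by hard-decision BPSK over AWGN."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    p = q_func(math.sqrt(2.0 * ebn0))  # crossover probability
    return 1.0 - hf(p)

for snr_db in (0, 5, 10):
    print(snr_db, "dB ->", round(bsc_capacity_from_ebn0(snr_db), 4), "bits/symbol")
```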
[Figure: capacity of the BSC, I(X;Y) in bits/symbol with P(x_1) = P(x_2) = 0.5, versus SNR (dB); the capacity approaches 1 bit/symbol at high SNR]
• In order to study the capacity of the AWGN channel, we drop the hypothesis of a discrete input alphabet and consider the input X as a continuous random variable with variance σ_X².

[Figure: AWGN channel: Y = X + ν, with input density p_X(x), output density p_Y(y), and noise ν ~ N(0, σ_ν²)]
I(X;Y) = H(Y) − H(Y|X)

C = (1/2) log_2 (1 + σ_X² / σ_ν²)
• The noise is white and Gaussian with two-sided power spectral density N_0/2. In the band (−B, +B), the noise mean power is σ_ν² = (N_0/2)·(2B) = N_0 B.
• For a zero-mean stationary input, each sample has a variance σ_X² equal to the signal power P, i.e. σ_X² = P.
• Using the sampling theorem we can represent the signal with 2B samples per second. Transmitting at this sample rate, we express the capacity in bits/sec as:

C_s = B log_2 (1 + P / (N_0 B))  bits/sec
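A direct numerical reading of this formula, with illustrative values (a 3 kHz channel at 30 dB SNR):

```python
import math

def awgn_capacity_bps(bandwidth_hz, power_w, n0):
    """Shannon-Hartley capacity Cs = B log2(1 + P / (N0 B)) in bits/sec."""
    return bandwidth_hz * math.log2(1.0 + power_w / (n0 * bandwidth_hz))

B = 3000.0             # bandwidth in Hz
P = 1.0                # signal power in watts
N0 = P / (1000.0 * B)  # chosen so that P / (N0 B) = 1000 (30 dB SNR)
print(awgn_capacity_bps(B, P, N0))  # -> about 29,900 bits/sec
```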
Per channel use (per sample):  C = (1/2) log_2 (1 + SNR)

[Figure: AWGN capacity in bits/symbol versus SNR (dB)]
• The noisy channel coding theorem introduced by C. E. Shannon in 1948 is one of the
most important results in information theory.
• In imprecise terms, this theorem states that if a noisy channel has capacity Cs in bits
per second, and if binary data enters the channel encoder at a rate Rs < Cs , then by an
appropriate design of the encoder and decoder, it is possible to reproduce the emitted
data after decoding with a probability of error as small as desired.
I(X;Y) ≜ H(X) − H(X|Y)  bits/sym
• Theorem 16: Let a discrete channel have a capacity C and a discrete source have an
entropy rate R. If R ≤ C there exists a coding system such that the output of the source
can be transmitted over the channel with an arbitrarily small frequency of errors (or an
arbitrarily small equivocation). If R > C there is no method of encoding which gives an
equivocation less than R − C (Shannon 1948).
• Hence the noisy channel coding theorem asserts the existence of such a code but does not exhibit a way of constructing it.
• The average error probability of randomly chosen codes is bounded as P̄_e ≤ 2^{−n E(R)}, where E(R) is a convex ∪, decreasing function of R, with 0 < R < C, and n is the length of the emitted sequences.

[Figure: error exponent E(R) versus rate R, for two channels with capacities C_1 and C_2 and rates R_1, R_2]
Thus the average error probability can be rendered arbitrarily small by choosing long codeword sequences (large n).

In addition, the theorem considers randomly chosen codewords. In practice this appears incompatible with reality, unless a genius observer delivers to the user of the information the rule of coding (the mapping) for each received sequence.

The number of codewords and the number of possible received sequences are exponentially increasing functions of n. Thus for large n, it is impractical to store the codewords in the encoder and the decoder when a deterministic mapping rule is adopted.

We shall continue our study of channel coding by discussing techniques that avoid these difficulties, and we hope that progressively, after introducing simple coding techniques, we can put the emphasis on concatenated codes (known as turbo codes), which approach capacity limits as they behave like random codes.
[Figure: error probability P_e (10^−1 down to 10^−6) versus E_b/N_0 (dB) for QPSK, TCM, and 6D TCM; G_c (2.5 dB) is the coding gain over QPSK at P_e = 10^−4 for the first code]
[Figure: receiving chain: channel output y → demodulator → decoder → user]

u = [u_1 u_2 ⋯ u_k]   information sequence of k digits
x = [x_1 x_2 ⋯ x_n]   codeword of n digits

Code rate:  ρ = k/n < 1
• In a BSC (binary symmetric channel) the received n-sequence is:

y = x ⊕ e ,   e = [e_1 e_2 ⋯ e_n]

• e is a binary n-sequence representing the error vector. If e_i = 1 then an error has occurred at digit i.
Repetition Code (3, 1):  x_1 = u_1 , x_2 = u_1 , x_3 = u_1

Parity-Check Code (3, 2):  x_1 = u_1 , x_2 = u_2 , x_3 = u_1 ⊕ u_2
G = [ I_k  P ]

(x_1 ⊕ x_2) ⊕ x_3 = x_1 ⊕ (x_2 ⊕ x_3)
x_1 ⊕ x_2 = x_2 ⊕ x_1
x_1 ⊕ x_2 = 0 ⇒ x_1 = x_2
One can verify that the Hamming distance is a metric that indeed satisfies the
triangle inequality dH(x1,x3) ≤ dH(x1,x2) + dH(x2,x3)
x_i = u_i   1 ≤ i ≤ k

x_i = ∑_{j=1}^{k} g_ij u_j   k+1 ≤ i ≤ n   ((n−k) parity equations)
• Using the first k received symbols y_1, ..., y_k, an algebraic decoder computes the (n−k) parity equations and compares them to the last (n−k) received symbols y_{k+1}, ..., y_n:

y'_i = ∑_{j=1}^{k} g_ij y_j   k+1 ≤ i ≤ n

y_i ⊕ y'_i   k+1 ≤ i ≤ n
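A sketch of this systematic encode-and-check procedure, shown for the (3, 2) single parity-check code introduced earlier (matrix and function names are illustrative):

```python
def encode_systematic(u, parity_rows):
    """x_i = u_i for i <= k, followed by (n-k) parity digits sum_j g_ij u_j (mod 2)."""
    return u + [sum(g * uj for g, uj in zip(row, u)) % 2 for row in parity_rows]

def syndrome(y, k, parity_rows):
    """Recompute the parity digits from y_1..y_k and compare with y_{k+1}..y_n."""
    recomputed = [sum(g * yj for g, yj in zip(row, y[:k])) % 2 for row in parity_rows]
    return [yi ^ yc for yi, yc in zip(y[k:], recomputed)]  # all zero if no error detected

P_rows = [[1, 1]]                      # (3,2) code: x3 = u1 xor u2
x = encode_systematic([1, 0], P_rows)
print(x, syndrome(x, 2, P_rows))       # -> [1, 0, 1] [0]   (no error detected)
print(syndrome([1, 1, 1], 2, P_rows))  # one digit flipped -> [1] (error detected)
```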
[Figure: soft-decision receiver on the AWGN channel: matched filter h(−t) and carrier cos(2πf_0 t) produce the sequence r, passed to maximum-likelihood detection and delivered to the user]
P_e ≥ N_min Q( d_{E,min} / √(2 N_0) )

where d_{E,min} is the minimum Euclidean distance between two sequences and where N_min is the average number of nearest neighbors in the code separated by d_{E,min}.

P_e ≥ N_min Q( √( d_{H,min} × ρ × 2 E_b / N_0 ) ) ,   ρ = k/n
[Figure: error probability P_e (10^−2 down to 10^−6) versus E_b/N_0 (dB)]
x̂ = x^(l)  ⇔  P(y | x^(l)) = Max_{(m)} P(y | x^(m))

For the BSC:  P(y | x^(l)) = p^{d_l} (1 − p)^{n − d_l}

where d_l = d_H(y, x^(l)) is the Hamming distance between the received sequence and the codeword x^(l). Since p < 1/2, maximizing the likelihood amounts to minimizing this distance:

x̂ = x^(l)  ⇔  d_H(y, x^(l)) = Min_{(m)} d_H(y, x^(m))
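The resulting decoder is a nearest-codeword search in Hamming distance. A minimal sketch, with the (3, 1) repetition code as codebook:

```python
def hamming_distance(a, b):
    """Number of positions in which two binary sequences differ."""
    return sum(ai != bi for ai, bi in zip(a, b))

def ml_decode_bsc(y, codebook):
    """Minimum Hamming distance decoding, optimal on a BSC with p < 1/2."""
    return min(codebook, key=lambda x: hamming_distance(y, x))

codebook = [(0, 0, 0), (1, 1, 1)]          # repetition code (3, 1)
print(ml_decode_bsc((1, 0, 1), codebook))  # -> (1, 1, 1): one error corrected
```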
In designing a good linear binary code, one must search for codes maximizing the minimum Hamming distance while having a small average number of nearest neighbors.

The receiver operating on received binary sequences (after demodulation, i.e. a BSC channel) is known as a hard decision decoder.
[Figure: error probability P_e versus E_b/N_0 (dB) for hard decoding and soft decoding]
89. Hard v/s Soft decoding
[Figure: capacity of BPSK in AWGN, in bits/symbol, versus E_b/N_0 (dB), for the soft-decision and hard-decision channels]
The Hamming code (7, 4) is a cyclic code. For instance, there are
six different cyclic shifts of the code word 0111010:
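The six shifts are easy to enumerate (a small sketch):

```python
def cyclic_shifts(word):
    """All distinct cyclic shifts of a binary word, excluding the word itself."""
    shifts = {word[i:] + word[:i] for i in range(1, len(word))}
    shifts.discard(word)
    return sorted(shifts)

print(cyclic_shifts("0111010"))
# -> ['0011101', '0100111', '1001110', '1010011', '1101001', '1110100']
```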
• Bose-Chaudhuri-Hocquenghem codes.
• This class of cyclic codes is one of the most useful
for correcting random errors mainly because the
decoding algorithms can be implemented with an
acceptable amount of complexity.
• A sequential machine:

[Figure: rate-1/2 convolutional encoder: shift register holding u_i, u_{i−1}, u_{i−2}; two modulo-2 adders produce the outputs x_1 and x_2]
[Figure: state diagram of the encoder: states S_0, S_1, S_2, S_3; each branch labeled (input / outputs x_1, x_2), e.g. (0 / 0,0) from S_0 to itself and (1 / 1,1) leaving S_0]
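A sketch of a rate-1/2, memory-2 convolutional encoder of this kind. The exact tap connections are not recoverable from the figure, so the generators below (the classical octal (7, 5) pair, x_1 = u_i ⊕ u_{i−1} ⊕ u_{i−2} and x_2 = u_i ⊕ u_{i−2}) are an assumption:

```python
def conv_encode(bits, g1=(1, 1, 1), g2=(1, 0, 1)):
    """Rate-1/2 convolutional encoder with memory 2.
    Taps g1, g2 are assumed (octal 7 and 5); the slide's exact taps are unknown."""
    state = [0, 0]  # shift register contents: u_{i-1}, u_{i-2}
    out = []
    for u in bits:
        window = [u] + state
        x1 = sum(g * b for g, b in zip(g1, window)) % 2
        x2 = sum(g * b for g, b in zip(g2, window)) % 2
        out.extend([x1, x2])
        state = [u, state[0]]  # shift the register
    return out

print(conv_encode([1, 0, 1, 1]))  # 8 coded digits for 4 information digits
```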
P_e ≥ N_free Q( d_free / (2σ) ) = N_free Q( √( 2 E_b k d_free / (N_0 n) ) )
η ≜ R/B  bit/sec/Hz

• Let  C_s = B log_2 (1 + E_b R / (N_0 B))  bits/sec  and define  η_c ≜ C_s / B  bit/sec/Hz

R ≤ C_s  ⇒  E_b/N_0 ≥ (2^{η_c} − 1) / η
Letting B → ∞:  C_∞ = (E_b R / N_0) log_2 e  bits/s

R ≤ C_∞  ⇒  E_b/N_0 ≥ 1 / log_2 e = 0.693

E_b/N_0 ≥ −1.6 dB
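Evaluated at η = η_c (transmission at capacity), the bound is easy to compute numerically: the minimum E_b/N_0 grows with the spectral efficiency and tends to −1.59 dB as η → 0 (a sketch):

```python
import math

def min_ebn0_db(eta):
    """Minimum Eb/N0 in dB for reliable transmission at spectral efficiency eta."""
    return 10.0 * math.log10((2.0 ** eta - 1.0) / eta)

for eta in (0.001, 0.5, 1.0, 2.0, 4.0):
    print(f"eta = {eta}: Eb/N0 >= {min_ebn0_db(eta):.2f} dB")
# As eta -> 0, the bound tends to 10*log10(ln 2) = -1.59 dB (the Shannon limit)
```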
[Figure: spectral efficiency (bit/s/Hz, log scale) versus E_b/N_0 (dB) at P_e = 10^−5: channel capacity bound, QAM constellations up to 64QAM, and orthogonal signals with coherent detection (M = 32, M = 64)]
P{ at least one x^(j), j ≠ i, belongs to S } ≤ ∑_{j=1, j≠i}^{2^{nR}} P{ x^(j) ∈ S }

≤ 2^{nR} · 2^{nH(X|Y)} / 2^{nH(X)} = 2^{nR} / 2^{nC} → 0  as n → ∞, provided R < C