
# Chapter 4 Source coding

4.1 Discrete source coding
4.2 Instantaneous decodable code
4.3 Coding efficiency
4.4 Shannon-Fano algorithm
4.5 Huffman coding
Harbin Institute of Technology

## 4.1 Discrete source coding

The most intuitive measure of the effectiveness of a communication
system is the information transmission rate, which expresses the amount of
information transmitted per unit time.
The information transmission rate mainly depends on three factors:
(1) Symbol rate:
According to communication theory, the symbol rate is related to the
channel bandwidth and is generally determined by the application
requirements.
(2) Noise entropy:
Noise entropy is related to the channel condition and is determined by the
application environment. Both the symbol rate and the noise entropy are
also related to the system cost.
(3) Source entropy:
Increasing the source entropy is an important way to improve the
effectiveness of the communication system. Here, increasing the source
entropy means increasing the information carried by each source symbol.

## 4.1 Discrete source coding

According to information theory, encoding (some transformation of the
source symbols) can achieve the purpose of improving the source entropy,
and we call this kind of encoding source coding.
This chapter introduces the principles and methods of source coding,
including distortion-free source coding, limited-distortion source coding,
and the source coding theorem.

## 4.1 Discrete source coding

The encoder can be regarded as the following system: the input of the encoder is the
original source S:{s1, s2, …, sn}, but the symbol set the channel can transmit is
A:{a1, a2, …, aq}; the function of the encoder is to convert each original symbol si into a
corresponding code word Wi (i = 1, 2, …, n) by using the elements of set A. So, the
output of the encoder is W:{W1, W2, …, Wn}.

    S: {s1, s2, …, sn}  →  [ encoder ]  →  W: {W1, W2, …, Wn}
                                ↑
                        A: {a1, a2, …, aq}

It can be seen that the encoder establishes a one-to-one mapping that converts the
symbols of the original source set S into the code words of W, each of which is
composed of symbols from the channel symbol set A.
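As a minimal sketch, the encoder is just a one-to-one lookup table from source symbols to code words over the channel alphabet (the symbol names and code words here are illustrative assumptions):

```python
# Minimal encoder sketch: a one-to-one mapping from source symbols
# to code words built from the channel symbol set A = {'0', '1'}.
# The symbol names and code words are illustrative assumptions.

encoder = {"s1": "0", "s2": "10", "s3": "11"}   # S -> W
decoder = {w: s for s, w in encoder.items()}    # W -> S (one-to-one)

def encode(symbols):
    """Concatenate the code word of each source symbol."""
    return "".join(encoder[s] for s in symbols)

print(encode(["s1", "s2", "s3"]))  # -> 01011
```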

## 4.2 Instantaneous decodable code

Definition 4-1:
If any code sequence of finite length formed from a code group can
be decoded into only one series of code words, then we call the code
group a uniquely decodable code.
For example: S:{s1, s2, s3}; A:{0, 1}; W:{w1=0, w2=10, w3=11} is
a uniquely decodable code. If the transmitted code
sequence is [w1, w2, w1, w3, w1, w1, w3, w3], its error-free
reception sequence is 010011001111, which can be
uniquely decoded as [w1, w2, w1, w3, w1, w1, w3, w3].
If the code word set is W:{w1=0, w2=01, w3=10}, then it is
not a uniquely decodable code. If the transmitted code
sequence is [w1, w2, w1, w3, w1, w1, w3, w3], its error-free
reception sequence is 001010001010, which can be
decoded as [w1, w2, w1, w3, w1, w1, w3, w3]
or as [w1, w1, w3, w3, w1, w2, w2, w1]. Since it admits different
decodings, it is not a uniquely decodable code.
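The ambiguity above can be checked mechanically: a brute-force parser counts how many ways a received string splits into code words (a sketch; the two code word sets are those from the example):

```python
from functools import lru_cache

def count_parses(seq, codewords):
    """Count the distinct ways `seq` can be split into code words."""
    @lru_cache(maxsize=None)
    def ways(i):
        if i == len(seq):
            return 1  # reached the end: one complete parse
        return sum(ways(i + len(w)) for w in codewords
                   if seq.startswith(w, i))
    return ways(0)

good = ["0", "10", "11"]   # w1, w2, w3: uniquely decodable
bad  = ["0", "01", "10"]   # w1, w2, w3: ambiguous

print(count_parses("010011001111", good))  # -> 1
print(count_parses("001010001010", bad))   # -> more than 1
```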
Harbin Institute of Technology

## 4.2 Instantaneous decodable code

Definition 4-2: If appending code elements to any code word
cannot generate another code word in the code group,
then we call it an instantaneous decodable code.
For example: W:{0, 10, 100, 111} is not an instantaneous
decodable code, because 100 can be generated by appending 0
to 10.
For example: W:{0, 01} is uniquely decodable, but it is not an
instantaneous decodable code.

Obviously, the set of uniquely decodable codes contains the set of
instantaneous decodable codes. That is to say, an instantaneous
decodable code must be uniquely decodable, but a uniquely decodable
code may not be instantaneously decodable.
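Equivalently, an instantaneous code is a prefix code: no code word is a prefix of another. A sketch of the check:

```python
def is_instantaneous(codewords):
    """True iff no code word is a prefix of another (prefix code)."""
    for i, u in enumerate(codewords):
        for j, v in enumerate(codewords):
            if i != j and v.startswith(u):
                return False
    return True

print(is_instantaneous(["0", "10", "110", "111"]))  # -> True
print(is_instantaneous(["0", "10", "100", "111"]))  # -> False ("10" prefixes "100")
print(is_instantaneous(["0", "01"]))                # -> False
```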

## 4.2 Instantaneous decodable code

Code lengths: for a code word set W:{W1, W2, …, Wn} over a code element set of
size q, let the code lengths be L1, L2, …, Ln respectively.
Kraft inequality:

$$\sum_{i=1}^{n} q^{-L_i} \le 1$$

The Kraft inequality is the necessary and sufficient condition for the existence of
instantaneous decodable codes: if the code lengths satisfy the Kraft
inequality, an instantaneous decodable code with those lengths can certainly be
constructed.
The code lengths of some code words may satisfy the Kraft inequality
while the code words themselves are not uniquely decodable, because of an
incorrect coding method.
Obviously, the set of codes whose lengths satisfy the Kraft inequality contains
the set of uniquely decodable codes. That is to say, uniquely decodable codes
must satisfy the Kraft inequality, but a code that satisfies the Kraft inequality
may not be a uniquely decodable code.
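The inequality itself is a one-line check (a sketch using exact fractions to avoid rounding):

```python
from fractions import Fraction

def satisfies_kraft(lengths, q=2):
    """Check the Kraft inequality: sum over i of q^(-L_i) <= 1."""
    total = sum(Fraction(1, q**L) for L in lengths)
    return total <= 1

print(satisfies_kraft([2, 2, 2, 2]))  # -> True  (e.g. 00, 01, 10, 11)
print(satisfies_kraft([1, 2, 3, 3]))  # -> True  (e.g. 0, 10, 110, 111)
print(satisfies_kraft([1, 1, 2]))     # -> False (no such prefix code exists)
```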

## 4.2 Instantaneous decodable code

We can construct an instantaneous decodable code by using a code tree diagram.
Example 4-2: Assume the original source W:{w1, w2, w3, w4} has 4 symbols, and we
construct a binary instantaneous decodable code by using a code tree diagram. The
specific coding method is shown in the figures:

[Figure: two binary code trees. Starting from the root, each branch is labeled 0 or 1,
and each leaf corresponds to a code word W1–W4; the two trees assign different code
words to the same four symbols.]

It can be seen that the instantaneous decodable code constructed by this method is not
unique.

## 4.3 Coding efficiency

Definition 4-3: Assume the original source space is

$$S:\{s_1, s_2, \ldots, s_n\}, \qquad P(S):\{p(s_1), p(s_2), \ldots, p(s_n)\}$$

and encode it using the code element set A:{a1, a2, …, aq} to obtain a uniquely
decodable code W:{W1, W2, …, Wn}. If the corresponding code lengths are Li
(i = 1, 2, …, n), the average code length of this source coding is

$$\bar{L} = \sum_{i=1}^{n} p(s_i) L_i$$

The information transmission rate R (the entropy rate) is equal to the source
entropy H(S) per symbol.
Passing through the encoder, the original source is converted to the code word set
W:{W1, W2, …, Wn}, and the information transmission rate per code element is

$$R = \frac{H(W)}{\bar{L}} = \frac{H(S)}{\bar{L}} = H(A)$$

When the original source is fixed, the shorter the average code length, the higher the
information transmission rate, that is, the higher the coding efficiency. So, the coding
efficiency can be described by the average code length.

## 4.3 Coding efficiency

For a discrete noiseless channel, if the maximum source entropy is Hmax(S) and the
symbol transmission rate is r, then the channel capacity is

$$C = r \cdot H_{\max}(S)$$

If the actual entropy per code element is H(A), then the actual entropy rate of this
discrete noiseless channel is

$$R = r \cdot H(A)$$

Here we define the ratio of the entropy rate to the channel capacity as the transmission
efficiency. It can be seen that the transmission efficiency is the ratio of the actual
information transmission capability to the maximum information transmission
capability of a communication system (channel), that is

$$\eta = \frac{R}{C} = \frac{H(A)}{H_{\max}(S)}$$

## 4.3 Coding efficiency

For an original source S with n symbols, transmitting it without coding is equivalent
to n-ary encoding. Its maximum entropy is Hmax(S) = log n, and the transmission
efficiency is

$$\eta = \frac{R}{C} = \frac{H(S)}{\log n}$$

However, n-ary encoding is usually impossible; the source usually needs to be q-ary
encoded, and then the transmission efficiency is

$$\eta = \frac{R}{C} = \frac{H(A)}{H_{\max}(A)} = \frac{H(S)}{\bar{L}\log q}$$

In particular, for binary coding (q = 2),

$$\eta = \frac{H(S)}{\bar{L}}$$

## 4.3 Coding efficiency

For example: for a discrete source with S:{s1, s2}, P(S):{0.2, 0.8}, calculate the
coding efficiency after binary encoding with one code element per symbol (so L̄ = 1).

$$H(S) = -\sum_{i=1}^{2} p_i \log p_i = \frac{1}{5}\log 5 + \frac{4}{5}\log\frac{5}{4} \approx 0.7219 \text{ bit}$$

$$\eta = \frac{H(S)}{\bar{L}} = 72.19\%$$
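The numbers above can be reproduced directly (a sketch using base-2 logarithms):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum p * log2(p)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def avg_length(probs, lengths):
    """Average code length: L = sum p_i * L_i."""
    return sum(p * L for p, L in zip(probs, lengths))

H = entropy([0.2, 0.8])
L = avg_length([0.2, 0.8], [1, 1])   # one code element per symbol
print(round(H, 4))       # -> 0.7219
print(round(H / L, 4))   # coding efficiency, -> 0.7219 (i.e. 72.19%)
```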

## 4.4 Shannon-Fano algorithm

(1) The idea of Shannon coding
Analysis shows that an uneven prior probability distribution of the source symbols
decreases the encoding efficiency. So the length of each code word can be determined
according to the prior probability of the source symbols: a symbol with larger
probability gets a shorter code word, and a symbol with smaller probability gets a
longer code word.
In fact, this approach is a direct application of satisfying the Kraft inequality.

## 4.4 Shannon-Fano algorithm

(2) The steps of the Shannon-Fano algorithm
The Shannon-Fano algorithm proceeds with the following steps:
1. Rearrange the original source symbols in descending order of probability;
2. Divide all source symbols into q groups with the sum of the symbol
probabilities in each group as equal as possible, and assign the code
elements a1, a2, …, aq to the groups respectively;
3. Continue dividing each group according to the rule above until the end points
are reached (each group contains one source symbol);
4. Arrange the assigned code elements from left to right to form each code
word Wi.
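The steps above can be sketched as a recursive split (a sketch: ties are broken by minimising the probability difference between the two halves, so the individual code words may differ from the slides' tables, but the code lengths and the average length come out the same for the examples below):

```python
def shannon_fano(symbols):
    """Binary Shannon-Fano coding.

    `symbols` is a list of (name, probability) pairs.
    Returns {name: codeword}.
    """
    symbols = sorted(symbols, key=lambda sp: sp[1], reverse=True)
    codes = {}

    def split(group, prefix):
        if len(group) == 1:
            codes[group[0][0]] = prefix or "0"
            return
        total = sum(p for _, p in group)
        acc, best_k, best_diff = 0.0, 1, float("inf")
        for k in range(1, len(group)):           # find the most even split
            acc += group[k - 1][1]
            diff = abs(2 * acc - total)          # |first half - second half|
            if diff < best_diff:
                best_diff, best_k = diff, k
        split(group[:best_k], prefix + "0")
        split(group[best_k:], prefix + "1")

    split(symbols, "")
    return codes

# Source of Example 4-5 (probabilities from the slide)
source = [("s1", 0.1), ("s2", 0.18), ("s3", 0.4), ("s4", 0.05),
          ("s5", 0.06), ("s6", 0.1), ("s7", 0.07), ("s8", 0.04)]
codes = shannon_fano(source)
avg = sum(dict(source)[s] * len(w) for s, w in codes.items())
print(codes, round(avg, 2))   # average code length -> 2.64
```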

Example 4-5: Assume a single-symbol discrete memoryless source S; its source
space is:

| s_i    | s1  | s2   | s3  | s4   | s5   | s6  | s7   | s8   |
|--------|-----|------|-----|------|------|-----|------|------|
| p(s_i) | 0.1 | 0.18 | 0.4 | 0.05 | 0.06 | 0.1 | 0.07 | 0.04 |

Solution:
The entropy of the original source is

$$H(S) = -\sum_{i=1}^{8} p(s_i)\log p(s_i) = 2.55 \text{ bit}$$

and the maximum entropy is

$$H_{\max}(S) = \log 8 = 3 \text{ bit}$$

So, without coding, the transmission efficiency is

$$\eta = \frac{R}{C} = \frac{H(S)}{H_{\max}(S)} = \frac{2.55}{3} = 85\%$$

## 4.4 Shannon-Fano algorithm

The encoding steps using the Shannon-Fano algorithm are as follows (the four
grouping columns of the original table record the successive binary splits):

| s_i | p(s_i) | W_i  | L_i |
|-----|--------|------|-----|
| s3  | 0.40   | 00   | 2   |
| s2  | 0.18   | 01   | 2   |
| s1  | 0.10   | 110  | 3   |
| s6  | 0.10   | 111  | 3   |
| s7  | 0.07   | 1000 | 4   |
| s5  | 0.06   | 1001 | 4   |
| s4  | 0.05   | 1010 | 4   |
| s8  | 0.04   | 1011 | 4   |

The average code length is

$$\bar{L} = \sum_{i=1}^{8} p(s_i)L_i = 2.64$$

so the coding efficiency is

$$\eta = \frac{R}{C} = \frac{H(A)}{H_{\max}(A)} = \frac{H(S)}{\bar{L}} = \frac{2.55}{2.64} = 96.6\%$$

[Figure: the corresponding binary code tree, with branches labeled 0/1 and the
code words W1–W8 at the leaves.]


## 4.4 Shannon-Fano algorithm

Now we change the original source of the previous example; the source space is:

| s_i    | s1  | s2  | s3  | s4  | s5   | s6   | s7   | s8   |
|--------|-----|-----|-----|-----|------|------|------|------|
| p(s_i) | 1/4 | 1/4 | 1/8 | 1/8 | 1/16 | 1/16 | 1/16 | 1/16 |

The entropy of the original source is

$$H(S) = -\sum_{i=1}^{8} p(s_i)\log p(s_i) = 2.75 \text{ bit}$$

and the maximum entropy is $H_{\max}(S) = \log 8 = 3$ bit, so without coding the
transmission efficiency is

$$\eta = \frac{R}{C} = \frac{H(S)}{H_{\max}(S)} = \frac{2.75}{3} = 91.7\%$$

The Shannon-Fano encoding gives:

| s_i | p(s_i) | W_i  | L_i |
|-----|--------|------|-----|
| s1  | 1/4    | 00   | 2   |
| s2  | 1/4    | 01   | 2   |
| s3  | 1/8    | 100  | 3   |
| s4  | 1/8    | 101  | 3   |
| s5  | 1/16   | 1100 | 4   |
| s6  | 1/16   | 1101 | 4   |
| s7  | 1/16   | 1110 | 4   |
| s8  | 1/16   | 1111 | 4   |

The average code length after encoding is $\bar{L} = 2.75$, so the coding efficiency is

$$\eta = \frac{R}{C} = \frac{H(A)}{H_{\max}(A)} = \frac{H(S)}{\bar{L}} = \frac{2.75}{2.75} = 1 = 100\%$$

When every symbol probability is a negative power of 2, the code lengths can match
the self-information of each symbol exactly, and the average code length reaches the
source entropy.

## 4.5 Huffman algorithm

Through further analysis and research, an algorithm more efficient than the
Shannon-Fano algorithm has been proposed: the Huffman algorithm. It is
also known as an optimal coding algorithm.

## The steps of binary Huffman algorithm

1. Arrange the n states {s1, s2, …, sn} of the original source S in descending
order of probability, as the leaves of the code tree;
2. Assign "0" and "1" to the two symbols with the smallest probabilities
respectively, then sum their probabilities and merge them into a new
symbol. Rearrange the new symbol sequence in descending order of probability;
3. Repeat the previous step until all the states have been processed;
4. Arrange the assigned code elements from right to left to form each code
word Wi.
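The steps above can be sketched with a heap-based implementation (a sketch: the individual code words may differ from the slides depending on how probability ties are broken, but Huffman coding is optimal, so the average code length is the same — 2.61 for the source of Example 4-6 below):

```python
import heapq
import itertools

def huffman(symbols):
    """Binary Huffman coding.

    `symbols` is a list of (name, probability) pairs.
    Returns {name: codeword}.
    """
    counter = itertools.count()  # tie-breaker so dicts are never compared
    # Heap entries: (probability, tiebreak, {name: partial codeword})
    heap = [(p, next(counter), {name: ""}) for name, p in symbols]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # smallest probability  -> bit "0"
        p1, _, c1 = heapq.heappop(heap)   # second smallest       -> bit "1"
        merged = {n: "0" + w for n, w in c0.items()}
        merged.update({n: "1" + w for n, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(counter), merged))
    return heap[0][2]

# Source of Examples 4-5 / 4-6 (probabilities from the slide)
source = [("s1", 0.1), ("s2", 0.18), ("s3", 0.4), ("s4", 0.05),
          ("s5", 0.06), ("s6", 0.1), ("s7", 0.07), ("s8", 0.04)]
codes = huffman(source)
avg = sum(dict(source)[s] * len(w) for s, w in codes.items())
print(codes, round(avg, 2))   # average code length -> 2.61
```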

## 4.5 Huffman algorithm

Example 4-6: Assume a source S the same as that in the previous example; its
source space is also:

| s_i    | s1  | s2   | s3  | s4   | s5   | s6  | s7   | s8   |
|--------|-----|------|-----|------|------|-----|------|------|
| p(s_i) | 0.1 | 0.18 | 0.4 | 0.05 | 0.06 | 0.1 | 0.07 | 0.04 |



## 4.5 Huffman algorithm

Solution: The entropy of the original source is

$$H(S) = -\sum_{i=1}^{8} p(s_i)\log p(s_i) = 2.55 \text{ bit}$$

and the maximum entropy is $H_{\max}(S) = \log 8 = 3$ bit, so without coding the
transmission efficiency is

$$\eta = \frac{R}{C} = \frac{H(S)}{H_{\max}(S)} = \frac{2.55}{3} = 85\%$$

Building the Huffman code tree by repeatedly merging the two least probable
symbols (intermediate merged probabilities 0.09, 0.13, 0.19, 0.23, 0.37, 0.6, 1.0)
gives the code words:

| s_i | p(s_i) | W_i   |
|-----|--------|-------|
| s3  | 0.4    | 1     |
| s2  | 0.18   | 001   |
| s1  | 0.1    | 011   |
| s6  | 0.1    | 0000  |
| s7  | 0.07   | 0100  |
| s5  | 0.06   | 0101  |
| s4  | 0.05   | 00010 |
| s8  | 0.04   | 00011 |

The average code length is

$$\bar{L} = \sum_{i=1}^{8} p(s_i)L_i = 2.61$$

so the coding efficiency is

$$\eta = \frac{R}{C} = \frac{H(A)}{H_{\max}(A)} = \frac{H(S)}{\bar{L}} = \frac{2.55}{2.61} = 97.8\%$$

Homework
[Homework 4-1]
A single-symbol discrete memoryless source is as follows:

| x_i    | x1   | x2   | x3   | x4   | x5   | x6   | x7   |
|--------|------|------|------|------|------|------|------|
| P(x_i) | 0.20 | 0.19 | 0.18 | 0.17 | 0.15 | 0.10 | 0.01 |

Please show both the Shannon-Fano and the Huffman coding results and
calculate the coding efficiency.

# Chapter 5 Channel coding theory

5.1 Basic Concepts
5.2 Decoding Criterion
5.3 Channel Coding Theorem
5.4 Error Control Method

## 5.1.1 Meanings of channel coding

The main purpose: improve transmission reliability and increase noise immunity.
So it is also called error-correction coding or anti-interference coding.

    Input → Source coding (compress) → Channel coding → Channel with noise
          → Channel decoding → Source decoding (decompress) → Output

## 5.1.1 Meanings of channel coding

For a BSC channel with equiprobable input X = {0, 1}, the channel model is:

    0 --(0.99)--> 0          crossover probability p1 = 0.01
    0 --(0.01)--> 1
    1 --(0.01)--> 0
    1 --(0.99)--> 1

When p1 = 10⁻², the average probability of error decoding is

$$P_e = \frac{1}{2}(p_1 + p_1) = \frac{1}{2}(0.01 + 0.01) = 10^{-2}$$

## 5.1.1 Meanings of channel coding

For the 3-fold repetition code over the same BSC (p1 = 0.01):
The inputs are X1 = 000 and X2 = 111.
The outputs are
Y1=000, Y2=001, Y3=010, Y4=011, Y5=100, Y6=101, Y7=110, Y8=111.
Decoding (majority vote):
F(Y1) = F(Y2) = F(Y3) = F(Y5) = X1 = 000
F(Y4) = F(Y6) = F(Y7) = F(Y8) = X2 = 111
The probability of error decoding is

$$P_{e,\min} = 3p_1^2(1-p_1) + p_1^3 \approx 3 \times 10^{-4}$$
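The ≈3×10⁻⁴ figure can be checked by enumerating the 8 received words (a sketch):

```python
from itertools import product

p1 = 0.01  # BSC crossover probability

def majority(bits):
    """Majority-vote decoding of a 3-bit received word."""
    return 1 if sum(bits) >= 2 else 0

def error_probability(sent_bit):
    """Probability that majority decoding differs from the sent bit."""
    pe = 0.0
    for y in product([0, 1], repeat=3):
        # probability of receiving y given 'sent_bit' repeated 3 times
        prob = 1.0
        for b in y:
            prob *= p1 if b != sent_bit else (1 - p1)
        if majority(y) != sent_bit:
            pe += prob
    return pe

print(error_probability(0))   # ≈ 2.98e-4, i.e. about 3e-4
```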

## 5.1.2 Hamming distance

(1) Code word space
The original source space has M code words. After q-ary coding with equal
code length N, the channel code word space has q^N code words.
We choose M code words out of the q^N to represent the original code words
respectively. These M code words are called allowable code words,
and the other q^N − M are called forbidden code words.
For error correcting, q^N > M must be satisfied. These M code words are
called a code group.

## 5.1.2 Hamming distance

(2) Hamming distance
In a code group, if two equal-length code words differ in D
corresponding code elements, then D is called the Hamming
distance between the two code words.

## 5.1.2 Hamming distance

(2) Hamming distance
<Example> α and β are two code words in group X:

$$\alpha = [a_1, a_2, \ldots, a_N], \; a_i \in \{0,1\}; \qquad \beta = [b_1, b_2, \ldots, b_N], \; b_i \in \{0,1\}$$

$$d(\alpha, \beta) = \sum_{i=1}^{N} (a_i \oplus b_i), \qquad 0 \le d \le N$$

d = 0 indicates that the two code words are identical;
d = N indicates that the code words differ in every position.

## 5.1.2 Hamming distance

(2) Hamming distance
For binary coding, the ⊕ above is modulo-2 addition:

$$d(\alpha, \beta) = \sum_{i=1}^{N} (a_i \oplus b_i)$$

## 5.1.2 Hamming distance

(3) Minimum code distance
The Hamming distances between all pairs of distinct code words constitute a set
of values d(α, β). The minimum value in this set is called the minimum code
distance of the code group, denoted dmin:

$$d_{\min} = \min\{d(\alpha, \beta) : \alpha, \beta \in X, \; \alpha \ne \beta\}$$

## 5.1.2 Hamming distance

(4) Hamming weight
In binary coding, the number of 1s in a code word is called its Hamming weight,
denoted W(α). Distance and weight are related by

$$d(\alpha, \beta) = W(\alpha \oplus \beta)$$
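Both notions can be sketched in a few lines (code words as lists of bits):

```python
def hamming_distance(alpha, beta):
    """d(alpha, beta): modulo-2 add corresponding bits, then count the 1s."""
    assert len(alpha) == len(beta), "code words must have equal length"
    return sum(a ^ b for a, b in zip(alpha, beta))

def hamming_weight(alpha):
    """W(alpha): number of 1s in the code word."""
    return sum(alpha)

a = [1, 0, 1, 1, 0]
b = [0, 0, 1, 0, 1]
print(hamming_distance(a, b))                            # -> 3
print(hamming_weight([x ^ y for x, y in zip(a, b)]))     # same value: -> 3
```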

## 5.1.2 Hamming distance

<Example>

[Figure: the 3-bit code words 000–111 drawn as the vertices of a cube, each labeled
with its Hamming weight (000: 0; 001, 010, 100: 1; 011, 101, 110: 2; 111: 3).
The Hamming distance between two code words is the number of cube edges on a
shortest path between the corresponding vertices; the Hamming weight is the
number of 1s.]

<Example>

|                       | Code A   | Code B             | Code C             | Code D        |
|-----------------------|----------|--------------------|--------------------|---------------|
| Allowable code words  | 000, 111 | 000, 011, 101, 110 | 000, 001, 100, 010 | 000, 001, 111 |
| Code number M         | 2        | 4                  | 4                  | 3             |
| Minimum code distance | 3        | 2                  | 1                  | 1             |

## 5.2.1 Meanings of decoding criterion

(1) Example
The decoding method affects the reliability.
A BSC channel is shown in the following figure: the transition probabilities are
p(0/0) = p(1/1) = 1/4 and p(1/0) = p(0/1) = 3/4.

## 5.2.1 Meanings of decoding criterion

Assume p(0) = p(1) = 1/2.
If each received symbol is decoded as itself, i.e. F(0)=0 and F(1)=1, then the
probability of correct decoding is 1/4 and the probability of error decoding is 3/4.
If instead each received symbol is decoded as its opposite, i.e. F(0)=1 and F(1)=0,
then the probability of correct decoding is 3/4 and the probability of error decoding
is 1/4: the reliability is improved.

## 5.2.1 Meanings of decoding criterion

(2) Decoding criterion
For a channel with input X:{x1, x2, …, xn}, output Y:{y1, y2, …, ym}, and transition
probabilities P(Y/X):{p(yj/xi); i = 1, 2, …, n; j = 1, 2, …, m},
define a single-valued function

$$F(y_j) = x_i \qquad (i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, m)$$

as the decoding function.
The group of decoding functions constitutes a decoding criterion.

## 5.2.1 Meanings of decoding criterion

If a channel has n inputs and m outputs, there will be n^m different decoding
criterions.
For example, the channel in the previous example has 2² = 4 decoding criterions:
A: {F(0)=0; F(1)=0}
B: {F(0)=0; F(1)=1}
C: {F(0)=1; F(1)=0}
D: {F(0)=1; F(1)=1}

## 5.2.2 Probability of error decoding

According to the decoding criterion, yj is decoded into F(yj) = xi. If xi is what was
sent, the decoding is called correct decoding; if not, it is called error decoding.
After receiving yj, the probability of correct decoding is the posterior probability
of xi at the sender:

$$P_{rj} = P\{F(y_j) = x_i / y_j\}$$

## 5.2.2 Probability of error decoding

Probability of error decoding:
after receiving yj, the probability that a symbol other than xi was sent is

$$P_{ej} = P\{e/y_j\} = 1 - P\{F(y_j) = x_i / y_j\}$$

where e denotes the collection of all source symbols other than xi.
Then, averaging over all yj, the average probability of correct decoding is

$$P_r = \sum_{j=1}^{m} p(y_j) P_{rj} = \sum_{j=1}^{m} p(y_j) P\{F(y_j) = x_i / y_j\}$$

## 5.2.2 Probability of error decoding

The average probability of error decoding is

$$P_e = \sum_{j=1}^{m} p(y_j) P\{e/y_j\} = \sum_{j=1}^{m} p(y_j)\{1 - P[F(y_j) = x_i / y_j]\}$$

All communication systems regard the average decoding error probability
as an important indicator of system reliability. Its minimum value is

$$P_{e,\min} = \sum_{j=1}^{m} \sum_{x_i \ne F(y_j)} p(y_j)\, p(x_i / y_j) = \sum_{j=1}^{m} \sum_{x_i \ne F(y_j)} p(x_i)\, p(y_j / x_i)$$

## 5.2.3 Maximum likelihood criterion

If p(yj/x*) is the maximum of the n channel transition probabilities for yj,

$$p(y_j/x_1), \; p(y_j/x_2), \; \ldots, \; p(y_j/x_n)$$

then decode yj into x*. This method is called the maximum likelihood
decoding criterion.
When p(xi) = 1/n, the maximum a posteriori probability criterion is equivalent to
the maximum likelihood decoding criterion.
The maximum likelihood decoding criterion uses only the channel transition
probabilities, so it can be applied without knowing the source's prior distribution.

Example: consider the channel with transition probabilities
p(y1/x1) = 3/4, p(y2/x1) = 1/4, p(y1/x2) = 1/3, p(y2/x2) = 2/3.
By the maximum likelihood criterion, F(y1) = x1 (since 3/4 > 1/3) and
F(y2) = x2 (since 2/3 > 1/4).

## 5.2.3 Maximum likelihood criterion

Set the source space of X as:

$$[X, P]: \quad X: x_1, x_2, \ldots, x_n; \qquad P(X): p(x_1), p(x_2), \ldots, p(x_n)$$

The transfer matrix of the channel is:

$$[P] = \begin{bmatrix} p(y_1/x_1) & p(y_2/x_1) & \cdots & p(y_m/x_1) \\ p(y_1/x_2) & p(y_2/x_2) & \cdots & p(y_m/x_2) \\ \vdots & \vdots & & \vdots \\ p(y_1/x_n) & p(y_2/x_n) & \cdots & p(y_m/x_n) \end{bmatrix}$$

For the example channel above,

$$[P(Y/X)] = \begin{bmatrix} 3/4 & 1/4 \\ 1/3 & 2/3 \end{bmatrix}$$

## 5.2.4 Maximum a posterior criterion

After receiving each yj (j = 1, 2, …, m), among the posterior probabilities

$$p(x_1/y_j), \; p(x_2/y_j), \; \ldots, \; p(x_n/y_j)$$

there will be a maximum; denote it p(x*/yj),
so that p(x*/yj) ≥ p(xi/yj) for each i.
This indicates that the input is considered to be x* after receiving
the symbol yj, so the decoding function is

$$F(y_j) = x^* \qquad (j = 1, 2, \ldots, m)$$

## 5.2.4 Maximum a posterior criterion

Using this criterion makes each term {1 − P[F(yj) = x*/yj]} in the sum of the
average probability of error decoding reach its minimum.
The minimum value of the average probability of error decoding is then

$$P_{e,\min} = \sum_{j=1}^{m} \sum_{x_i \ne x^*} p(y_j)\, p(x_i / y_j)$$

In this expression, the minimum value is associated with the prior probabilities of
the information source and the channel transition probabilities, especially the
channel transition probabilities: if all terms other than p(yj/x*) are small, the
decoding error probability is small.

## 5.2.4 Maximum a posterior criterion

Example: consider again the channel with

$$[P(Y/X)] = \begin{bmatrix} 3/4 & 1/4 \\ 1/3 & 2/3 \end{bmatrix}$$

If the prior probabilities are {1/2, 1/2}, the joint probability matrix is

$$[P(X;Y)] = \begin{bmatrix} 3/8 & 1/8 \\ 1/6 & 2/6 \end{bmatrix}$$

so the output distribution is

$$[P(Y)] = [13/24, \; 11/24]$$

and the posterior probability matrix is

$$[P(X/Y)] = \begin{bmatrix} 9/13 & 3/11 \\ 4/13 & 8/11 \end{bmatrix}$$

With equal priors, choosing the biggest transition probability is equivalent to
choosing the maximum a posteriori probability.
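The matrices above can be reproduced with a short computation (a sketch using exact fractions; it also checks that ML and MAP decoding agree here because the priors are equal):

```python
from fractions import Fraction as F

# Channel transfer matrix p(y_j / x_i) and equal priors p(x_i) = 1/2
P_Y_given_X = [[F(3, 4), F(1, 4)],
               [F(1, 3), F(2, 3)]]
prior = [F(1, 2), F(1, 2)]

# Joint probabilities p(x_i, y_j) = p(x_i) * p(y_j / x_i)
joint = [[prior[i] * P_Y_given_X[i][j] for j in range(2)] for i in range(2)]

# Output distribution p(y_j) = sum_i p(x_i, y_j)
P_Y = [joint[0][j] + joint[1][j] for j in range(2)]
print(P_Y)  # -> [Fraction(13, 24), Fraction(11, 24)]

# Posterior probabilities p(x_i / y_j) = p(x_i, y_j) / p(y_j)
posterior = [[joint[i][j] / P_Y[j] for j in range(2)] for i in range(2)]
print(posterior)  # fractions 9/13, 3/11; 4/13, 8/11

# MAP: for each y_j pick the x_i maximising p(x_i / y_j);
# ML:  pick the x_i maximising p(y_j / x_i).
map_decode = [max(range(2), key=lambda i: posterior[i][j]) for j in range(2)]
ml_decode = [max(range(2), key=lambda i: P_Y_given_X[i][j]) for j in range(2)]
print(map_decode == ml_decode)  # -> True (equal priors: F(y1)=x1, F(y2)=x2)
```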

Homework 5
For a channel with input X:{x1, x2, x3}, output Y:{y1, y2, y3}, and transfer matrix

$$[P] = \begin{bmatrix} 0.5 & 0.3 & 0.2 \\ 0.2 & 0.3 & 0.5 \\ 0.3 & 0.3 & 0.4 \end{bmatrix}$$

decode using the maximum likelihood criterion and the maximum a posteriori one
respectively, then calculate the average probabilities of error decoding
when the prior probability is {1/4, 1/4, 1/2}.