
Chapter 4 Source coding

4.1 Discrete source coding


4.2 Instantaneous decodable code
4.3 Coding efficiency

4.4 Shannon-Fano algorithm


4.5 Huffman coding
Harbin Institute of Technology

4.1 Discrete source coding


The most intuitive measure of the effectiveness of a communication system is the information transmission rate, i.e. the amount of information transmitted per unit time.
The information transmission rate mainly depends on three factors:
(1) Symbol rate: according to communication theory, the symbol rate is related to the channel bandwidth and is generally determined by the application requirements.
(2) Source entropy: increasing the source entropy, i.e. the information carried by each source symbol, is an important way to improve the effectiveness of the communication system.
(3) Noise entropy: the noise entropy is related to the channel condition and is determined by the application environment.
In addition, the symbol rate and the noise entropy are also related to the system cost.

4.1 Discrete source coding


According to information theory, encoding (some transformation of the source) can increase the source entropy. We call this kind of encoding source coding.
The main content of this chapter is to introduce the basic principles and methods of source coding, including distortion-free source coding, limited-distortion source coding, and the source coding theorems.

4.1 Discrete source coding


The encoder can be regarded as the following system: the input of the encoder is the original source S:{s1, s2, …, sn}, while the symbol set the channel can transmit is A:{a1, a2, …, aq}. The function of the encoder is to convert each original symbol si into a corresponding code word Wi (i = 1, 2, …, n) composed of elements of the set A. So the output of the encoder is W:{W1, W2, …, Wn}.

    S:{s1, s2, …, sn} → [encoder] → W:{W1, W2, …, Wn},  using code elements A:{a1, a2, …, aq}

It can be seen that the encoder establishes a one-to-one mapping that converts the symbols of the original source set S into the code words of W, each of which is composed of symbols from the channel symbol set A.

4.2 Instantaneous decodable code


Definition 4-1:
If any code sequence of finite length formed from a code group can be decoded into only one series of code words, then we call the code a uniquely decodable code.
For example: S:{s1, s2, s3}; A:{0, 1}; W:{w1 = 0, w2 = 10, w3 = 11} is a uniquely decodable code. If the transmitted code word sequence is [w1, w2, w1, w3, w1, w1, w3, w3], its error-free reception sequence is [010011001111], which can be uniquely decoded as [w1, w2, w1, w3, w1, w1, w3, w3].
If the code word set is W:{w1 = 0, w2 = 01, w3 = 10}, then it is not uniquely decodable. If the transmitted code word sequence is [w1, w2, w1, w3, w1, w1, w3, w3], its error-free reception sequence is [001010001010], which can be decoded either as [w1, w2, w1, w3, w1, w1, w3, w3] or as [w1, w1, w3, w3, w1, w2, w2, w1]. Since it has different valid decodings, it is not a uniquely decodable code.
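The ambiguity in the second example can be checked mechanically by enumerating every way a received sequence splits into code words. A minimal sketch in Python (the function name is illustrative, not from the text):

```python
def all_parses(bits, codebook):
    """Return every way to split `bits` into code words from `codebook`."""
    results = []

    def walk(pos, acc):
        if pos == len(bits):
            results.append(list(acc))
            return
        for word in codebook:
            if bits.startswith(word, pos):
                acc.append(word)
                walk(pos + len(word), acc)
                acc.pop()

    walk(0, [])
    return results

# The code {0, 10, 11} decodes the received sequence in exactly one way.
print(len(all_parses("010011001111", ["0", "10", "11"])))   # 1
# The code {0, 01, 10} is ambiguous: the same sequence has several parses.
print(len(all_parses("001010001010", ["0", "01", "10"])))
```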

4.2 Instantaneous decodable code


Definition 4-2: If appending code elements to any code word can never produce another code word of the code group (i.e. no code word is a prefix of another), then we call it an instantaneous decodable code.
For example: W:{0, 10, 100, 111} is not an instantaneous decodable code, because 100 can be generated by appending 0 to 10.
For example: W:{0, 01} is uniquely decodable, but it is not an instantaneous decodable code.
Obviously, the set of uniquely decodable codes contains the set of instantaneous decodable codes. That is to say, an instantaneous decodable code must be uniquely decodable, but a uniquely decodable code may not be an instantaneous decodable code.
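The prefix condition of Definition 4-2 is easy to test: sort the code words, then a prefix relation, if any exists, must appear between lexicographically adjacent words. A small sketch (function name illustrative):

```python
def is_instantaneous(codebook):
    """A code is instantaneous (prefix-free) iff no code word
    is a prefix of another code word."""
    words = sorted(codebook)
    # After sorting, any prefix relation shows up between neighbours.
    for a, b in zip(words, words[1:]):
        if b.startswith(a):
            return False
    return True

print(is_instantaneous(["0", "10", "11"]))          # True
print(is_instantaneous(["0", "10", "100", "111"]))  # False: 10 is a prefix of 100
print(is_instantaneous(["0", "01"]))                # False: uniquely decodable, not instantaneous
```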

4.2 Instantaneous decodable code


Code lengths: for the code word set W:{W1, W2, …, Wn}, the code lengths are L1, L2, …, Ln respectively.
Kraft inequality:

    ∑_{i=1}^{n} q^(−Li) ≤ 1

The Kraft inequality is the necessary and sufficient condition for the existence of an instantaneous decodable code with the given code lengths: if the code lengths satisfy the Kraft inequality, an instantaneous decodable code with those lengths can certainly be constructed.
The code lengths of some code words may satisfy the Kraft inequality while the code itself is not uniquely decodable, because of an incorrect coding method.
Obviously, the set of codes whose lengths satisfy the Kraft inequality contains the set of uniquely decodable codes. That is to say, uniquely decodable codes must satisfy the Kraft inequality, but a code that satisfies the Kraft inequality may not be uniquely decodable.
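The Kraft sum ∑ q^(−Li) can be evaluated directly; a small sketch using exact rational arithmetic (function name illustrative):

```python
from fractions import Fraction

def kraft_sum(lengths, q=2):
    """Compute the Kraft sum  sum_i q**(-L_i)  exactly."""
    return sum(Fraction(1, q ** L) for L in lengths)

# Lengths {1, 3, 3, 4, 4, 4, 5, 5}: the sum is exactly 1,
# so a binary instantaneous code with these lengths exists.
print(kraft_sum([1, 3, 3, 4, 4, 4, 5, 5]) <= 1)   # True
# Lengths {1, 1, 2}: the sum is 5/4 > 1, so no binary instantaneous
# (or uniquely decodable) code can have these lengths.
print(kraft_sum([1, 1, 2]) <= 1)                  # False
```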

4.2 Instantaneous decodable code


We can construct an instantaneous decodable code by using a code tree diagram.
Example 4-2: Assume the original source has 4 symbols, and we construct a binary instantaneous decodable code W:{w1, w2, w3, w4} by using a code tree diagram. The specific coding method is shown in the figures:

[Figure: two different binary code trees; starting from the root, each branch is labeled 0 or 1, and the leaves are assigned the code words W1, W2, W3, W4. The two trees yield two different codes.]

It can be seen that the instantaneous decodable code constructed by this method is not unique.
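The code-tree construction can be mechanized: walk the tree from shorter to longer code words, always taking the next free subtree. A minimal sketch of this idea, assuming the lengths already satisfy the Kraft inequality (function name illustrative):

```python
def code_from_lengths(lengths):
    """Build one binary instantaneous code with the given code lengths
    (assumed to satisfy the Kraft inequality), by walking the code tree
    from shorter to longer code words."""
    words = []
    code = 0          # index of the next free node at the current depth
    prev_len = 0
    for L in sorted(lengths):
        code <<= (L - prev_len)                   # descend to depth L
        words.append(format(code, "0{}b".format(L)))
        code += 1                                 # move to the next free subtree
        prev_len = L
    return words

print(code_from_lengths([1, 2, 3, 3]))   # ['0', '10', '110', '111']
```

Other assignments (e.g. swapping 0 and 1 branches) give different but equally valid codes, which is the non-uniqueness noted above.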

4.3 Coding efficiency


Definition 4-3: Assume the original source space is

    S: {s1, s2, …, sn},  P(S): {p(s1), p(s2), …, p(sn)}

and encode it using the code element set A:{a1, a2, …, aq} to obtain a uniquely decodable code W:{W1, W2, …, Wn}. If the corresponding code lengths are Li (i = 1, 2, …, n), the average code length of this source coding is

    L̄ = ∑_{i=1}^{n} p(si) Li

The information transmission rate R (the entropy rate) is equal to the source entropy H(S). After passing through the encoder, the original source is converted to the code word set W:{W1, W2, …, Wn}, and the information transmission rate per code element is

    R = H(W)/L̄ = H(S)/L̄ = H(A)

When the original source is fixed, the shorter the average code length, the higher the information transmission rate, that is, the higher the coding efficiency. So the coding efficiency can be described by the average code length.

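The two quantities of Definition 4-3 translate directly into code; a minimal sketch (the example source and code are hypothetical, chosen so the numbers are easy to check):

```python
import math

def average_length(probs, lengths):
    """Average code length  L-bar = sum_i p(s_i) * L_i."""
    return sum(p * L for p, L in zip(probs, lengths))

def entropy(probs):
    """Source entropy H(S) in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Code W:{0, 10, 11} for a source with P(S) = {0.5, 0.25, 0.25}:
probs, lengths = [0.5, 0.25, 0.25], [1, 2, 2]
L_bar = average_length(probs, lengths)
print(L_bar)                    # 1.5
print(entropy(probs) / L_bar)   # information per code element: 1.0 bit
```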

4.3 Coding efficiency


For a discrete noiseless channel, if the maximum source entropy is Hmax(S) and the symbol transmission rate is r, then the channel capacity is

    C = r · Hmax(S)

If the actual entropy per transmitted symbol is H(A), then the actual entropy rate of this discrete noiseless channel is

    R = r · H(A)

Here we define the ratio of the entropy rate to the channel capacity as the transmission efficiency. It can be seen that the transmission efficiency is the ratio of the actual information transmission capability to the maximum information transmission capability of a communication system (channel), that is

    η = R/C = H(A)/Hmax(S)


4.3 Coding efficiency


For an original source S with n symbols, transmitting it without coding is equivalent to n-ary encoding. Its maximum entropy is Hmax(S) = log n, and the transmission efficiency is

    η = R/C = H(A)/log n

However, n-ary encoding is usually not practical; the source usually needs to be q-ary encoded, and then the transmission efficiency is

    η = R/C = H(A)/Hmax(A) = H(S)/(L̄ · log q)

When q = 2, the transmission efficiency is

    η = H(S)/L̄


4.3 Coding efficiency


For example: for a discrete source S:{s1, s2} with P(S):{0.2, 0.8}, calculate the coding efficiency after binary encoding (one code element per symbol, so L̄ = 1).

    H(S) = −∑_{i=1}^{2} pi log pi = (1/5) log 5 + (4/5) log(5/4) = 0.7219 bit/symbol

With q = 2, the coding efficiency is

    η = H(S)/L̄ = 72.19%

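The entropy value 0.7219 used above can be verified numerically; a minimal sketch:

```python
import math

def entropy(probs):
    """H(S) = -sum p * log2(p), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs)

H = entropy([0.2, 0.8])
L_bar = 1.0                        # one binary code element per source symbol
print(round(H, 4))                 # 0.7219
print("eta = {:.2%}".format(H / L_bar))
```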

4.4 Shannon-Fano algorithm


(1) The idea of Shannon coding
Analysis shows that an uneven prior probability distribution of the source symbols decreases the encoding efficiency, so the length of each code word can be chosen according to the prior probability of the corresponding source symbol: symbols with larger probability get shorter code words, and symbols with smaller probability get longer code words.
In fact, this approach is a direct application of the Kraft inequality.


4.4 Shannon-Fano algorithm


(2) The steps of the Shannon-Fano algorithm
The Shannon-Fano algorithm proceeds with the following steps:
Rearrange the original source symbols in descending order of probability;
Divide all source symbols into q groups with the sums of the symbol probabilities in the groups as equal as possible, and assign the code elements a1, a2, …, aq to the groups respectively;
Divide each group again according to the rule above until every group contains exactly one source symbol;
Arrange the assigned code elements from left to right to form the code word Wi.

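The steps above can be sketched in Python. This is a minimal binary (q = 2) implementation; the rule "as equal as possible" is read here as choosing the split that minimizes the difference between the two group probabilities, which is one reasonable interpretation of the text (function names are illustrative):

```python
def shannon_fano(symbols):
    """Binary Shannon-Fano coding.
    `symbols` is a list of (name, probability) pairs; returns {name: code word}."""
    symbols = sorted(symbols, key=lambda sp: sp[1], reverse=True)
    codes = {name: "" for name, _ in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(p for _, p in group)
        acc, k, best = 0.0, 0, None
        # choose the split point that best balances the two halves
        for i in range(1, len(group)):
            acc += group[i - 1][1]
            diff = abs(total - 2 * acc)
            if best is None or diff < best:
                best, k = diff, i
        for name, _ in group[:k]:
            codes[name] += "0"
        for name, _ in group[k:]:
            codes[name] += "1"
        split(group[:k])
        split(group[k:])

    split(symbols)
    return codes

src = [("s1", 1/4), ("s2", 1/4), ("s3", 1/8), ("s4", 1/8),
       ("s5", 1/16), ("s6", 1/16), ("s7", 1/16), ("s8", 1/16)]
codes = shannon_fano(src)
print(codes["s1"], codes["s3"], codes["s5"])   # 00 100 1100
```

On this dyadic source the sketch reproduces the code words of the example later in this section, with average length 2.75.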

4.4 Shannon-Fano algorithm

Example 4-5: Assume a single-symbol discrete memoryless source S whose source space is:

    si:    s1    s2    s3    s4    s5    s6    s7    s8
    p(si): 0.1   0.18  0.4   0.05  0.06  0.1   0.07  0.04

Solution:
The entropy of the original source is

    H(S) = −∑_{i=1}^{8} p(si) log p(si) = 2.55 bit/symbol

and the maximum entropy is

    Hmax(S) = log 8 = 3 bit/symbol

So the transmission efficiency without coding is

    η = R/C = H(S)/Hmax(S) = 2.55/3 = 85%


4.4 Shannon-Fano algorithm


The encoding steps using the Shannon-Fano algorithm are as follows (four successive groupings):

    si   p(si)   Wi     Li
    s3   0.40    00     2
    s2   0.18    01     2
    s1   0.10    110    3
    s6   0.10    111    3
    s7   0.07    1000   4
    s5   0.06    1001   4
    s4   0.05    1010   4
    s8   0.04    1011   4

4.4 Shannon-Fano algorithm


Through calculation, the average code length after encoding is L̄ = 2.64, so the coding efficiency is

    η = R/C = H(A)/Hmax(A) = H(S)/L̄ = 2.55/2.64 = 96.6%

[Figure: the corresponding binary code tree with leaves W1–W8.]

It can be seen that the coding efficiency has been improved.


4.4 Shannon-Fano algorithm


Now we change the original source in the previous example; the source space is:

    si:    s1    s2    s3    s4    s5    s6    s7    s8
    p(si): 1/4   1/4   1/8   1/8   1/16  1/16  1/16  1/16

The entropy of the original source is

    H(S) = −∑_{i=1}^{8} p(si) log p(si) = 2.75 bit/symbol

and the maximum entropy is Hmax(S) = log 8 = 3 bit/symbol, so the transmission efficiency without coding is

    η = R/C = H(S)/Hmax(S) = 2.75/3 = 91.7%

The Shannon-Fano encoding steps give:

    si   p(si)   Wi     Li
    s1   1/4     00     2
    s2   1/4     01     2
    s3   1/8     100    3
    s4   1/8     101    3
    s5   1/16    1100   4
    s6   1/16    1101   4
    s7   1/16    1110   4
    s8   1/16    1111   4

The average code length after encoding is L̄ = 2.75, so the coding efficiency is

    η = R/C = H(A)/Hmax(A) = H(S)/L̄ = 2.75/2.75 = 100%

4.5 Huffman algorithm


Through further analysis and research, an algorithm more efficient than the Shannon-Fano algorithm has been proposed: the Huffman algorithm. It is also known as the optimal coding algorithm.

The steps of the binary Huffman algorithm:
Arrange the n states {s1, s2, …, sn} of the original source S in descending order of probability, as leaves of the code tree;
Assign "0" and "1" to the two symbols with the smallest probabilities respectively, then sum their probabilities and merge them into a new symbol, and rearrange the new symbol sequence in descending order of probability;
Repeat this step until all the states have been processed (only one symbol, with probability 1, remains);
Arrange the assigned code elements from right to left (i.e. from the last merge back to the leaf) to form the code word Wi.

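The merging steps above can be sketched with a min-heap; this is a minimal implementation, not the text's tabular procedure (names illustrative). Note that tie-breaking between equal probabilities may produce different code words than the worked example, but the average code length is the same, since it equals the sum of the merged-node probabilities:

```python
import heapq
import itertools

def huffman(probs):
    """Binary Huffman coding: `probs` maps symbol -> probability,
    returns symbol -> code word. A tie-breaking counter keeps
    heap entries comparable when probabilities are equal."""
    counter = itertools.count()
    heap = [(p, next(counter), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, tree0 = heapq.heappop(heap)   # smallest probability -> bit 0
        p1, _, tree1 = heapq.heappop(heap)   # second smallest -> bit 1
        merged = {}
        for s, code in tree0.items():
            merged[s] = "0" + code
        for s, code in tree1.items():
            merged[s] = "1" + code
        heapq.heappush(heap, (p0 + p1, next(counter), merged))
    return heap[0][2]

probs = {"s1": 0.1, "s2": 0.18, "s3": 0.4, "s4": 0.05,
         "s5": 0.06, "s6": 0.1, "s7": 0.07, "s8": 0.04}
codes = huffman(probs)
avg = sum(p * len(codes[s]) for s, p in probs.items())
print(round(avg, 2))   # 2.61
```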

4.5 Huffman algorithm


Example 4-6: Assume a source S the same as that in the previous example; its source space is:

    si:    s1    s2    s3    s4    s5    s6    s7    s8
    p(si): 0.1   0.18  0.4   0.05  0.06  0.1   0.07  0.04




4.5 Huffman algorithm


Solution: The entropy of the original source is

    H(S) = −∑_{i=1}^{8} p(si) log p(si) = 2.55 bit/symbol

and the maximum entropy is Hmax(S) = log 8 = 3 bit/symbol, so the transmission efficiency without coding is

    η = R/C = H(S)/Hmax(S) = 2.55/3 = 85%

According to the Huffman algorithm, we obtain the following code words (the merged-node probabilities in the code tree are 0.09, 0.13, 0.19, 0.23, 0.37, 0.6 and 1.0):

    si   p(si)   Wi
    s3   0.4     1
    s2   0.18    001
    s1   0.1     011
    s6   0.1     0000
    s7   0.07    0100
    s5   0.06    0101
    s4   0.05    00010
    s8   0.04    00011

The average code length is

    L̄ = ∑_{i=1}^{8} p(si) Li = 2.61

so the coding efficiency is

    η = R/C = H(A)/Hmax(A) = H(S)/L̄ = 2.55/2.61 = 97.8%

Homework
[Homework 4-1]
A single-symbol discrete memoryless source is as follows:

    X:    x1    x2    x3    x4    x5    x6    x7
    P:    0.20  0.19  0.18  0.17  0.15  0.10  0.01

Please show both the Shannon-Fano and the Huffman coding results and calculate the coding efficiencies.


Chapter 5 Channel coding theory


5.1 Basic Concepts
5.2 Decoding Criterion
5.3 Channel Coding Theorem
5.4 Error Control Method


5.1 Basic Concepts


5.1.1 Meanings of channel coding


The main purpose: improve transmission reliability and increase noise immunity. So channel coding is also called error-correction coding or anti-interference coding.

    Input → source coding (compression) → channel coding → send → channel with noise → receive → channel decoding → source decoding (decompression) → Output


5.1.1 Meanings of channel coding


For a BSC channel: its input is X = {0, 1} with equiprobable distribution. The channel model is:

    0 → 0 and 1 → 1 with probability p = 1 − p1 = 0.99;  0 → 1 and 1 → 0 with probability p1 = 0.01

When p1 = 10^−2, the average probability of error decoding is

    Pe = (1/2)(p1 + p1) = (1/2)(0.01 + 0.01) = 10^−2

5.1.1 Meanings of channel coding


For the 3-fold repetition code over the same BSC (p1 = 0.01):
The inputs are X1 = 000, X2 = 111.
The outputs are Y1 = 000, Y2 = 001, Y3 = 010, Y4 = 011, Y5 = 100, Y6 = 101, Y7 = 110, Y8 = 111.
Decoding (majority rule):

    F(Y1) = F(Y2) = F(Y3) = F(Y5) = X1 = 000
    F(Y4) = F(Y6) = F(Y7) = F(Y8) = X2 = 111

The probability of error decoding is

    Pe,min = 3·p1^2·(1 − p1) + p1^3 ≈ 3 × 10^−4
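The improvement from 10^−2 to about 3 × 10^−4 can be checked numerically; a minimal sketch for majority decoding of an n-fold repetition code (function name illustrative):

```python
from math import comb

def repetition_error_prob(n, p1):
    """Probability that majority decoding of an n-fold repetition code
    fails on a BSC with crossover probability p1 (n odd): more than
    n//2 of the n transmitted bits must be flipped."""
    t = n // 2
    return sum(comb(n, k) * p1**k * (1 - p1)**(n - k)
               for k in range(t + 1, n + 1))

print(repetition_error_prob(1, 0.01))   # 0.01: the uncoded BSC
print(repetition_error_prob(3, 0.01))   # 0.000298, i.e. about 3e-4
```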

5.1.2 Hamming distance


(1) Code word space
The original source space has M code words. After q-ary coding with equal code length N, the channel code word space has q^N code words.
Choose M of the q^N code words to represent the original code words respectively. These M code words are called allowable code words, and the other q^N − M are called forbidden code words.
For error correction, q^N > M must be satisfied. These M code words are called a code group.


5.1.2 Hamming distance


(2) Hamming distance
If two equal-length code words in a code group differ in D corresponding code elements, then D is called the Hamming distance between the two code words.


5.1.2 Hamming distance


(2) Hamming distance
<Example> α and β are two code words in the code group X:

    α = [a1, a2, …, aN],  ai ∈ {0, 1}
    β = [b1, b2, …, bN],  bi ∈ {0, 1}

The Hamming distance between α and β is

    d(α, β) = ∑_{i=1}^{N} (ai ⊕ bi),  0 ≤ d ≤ N

d = 0 indicates that the two code words are identical; d = N indicates that they differ in every position.

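The definitions of distance and weight translate directly into code; a minimal sketch on code words written as bit strings (function names illustrative):

```python
def hamming_weight(word):
    """W(alpha): the number of 1s in a binary code word."""
    return word.count("1")

def hamming_distance(a, b):
    """d(alpha, beta) = sum of a_i XOR b_i over all positions."""
    assert len(a) == len(b), "Hamming distance needs equal-length words"
    return sum(x != y for x, y in zip(a, b))

print(hamming_distance("000", "111"))   # 3
print(hamming_distance("011", "101"))   # 2
print(hamming_weight("00010011"))       # 3
```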

5.1.2 Hamming distance


(2) Hamming distance
For binary coding, we use modulo-2 addition:

    d(α, β) = ∑_{i=1}^{N} (ai ⊕ bi)


5.1.2 Hamming distance


(3) Minimum code distance
The Hamming distances between all pairs of code words constitute a set D(α, β). The minimum value in this set is called the minimum code distance of the code group, denoted dmin:

    dmin = min{ d(α, β) : α, β ∈ X, α ≠ β }


5.1.2 Hamming distance


(4) Hamming weight
In binary coding, the number of 1s in a code word α is called its Hamming weight, denoted W(α).
So the Hamming distance can also be expressed as:

    d(α, β) = W(α ⊕ β)


5.1.2 Hamming distance


< Example >
[Figure: the 3-bit code cube with vertices 000, 001, 010, 011, 100, 101, 110, 111 and their Hamming weights 0, 1, 1, 2, 1, 2, 2, 3.]
Hamming distance: the number of edges on a shortest path from one vertex to another.
Hamming weight: the number of 1s in the code word.


5.1.2 Hamming distance


< Example >

    Code A: {000, 111}             allowable code words: 2,  dmin = 3
    Code B: {000, 011, 101, 110}   allowable code words: 4,  dmin = 2
    Code C: {000, 001, 100, 010}   allowable code words: 4,  dmin = 1
    Code D: {000, 001, 111}        allowable code words: 3,  dmin = 1


5.2 Decoding criterion


5.2.1 Meanings of decoding criterion


(1) Example
The decoding method affects the reliability.
A BSC channel is shown in the following figure:

    0 → 0 and 1 → 1 with probability 1 − p = 1/4;  0 → 1 and 1 → 0 with probability p = 3/4


5.2.1 Meanings of decoding criterion


Assume p(0) = p(1) = 1/2 for the channel above (crossover probability p = 3/4).

1. Decode a received 0 as 0 and a received 1 as 1:
   probability of correct decoding = 1/4, probability of error decoding = 3/4.
2. Decode a received 0 as 1 and a received 1 as 0:
   probability of correct decoding = 3/4, probability of error decoding = 1/4.

The second criterion improves the reliability.

5.2.1 Meanings of decoding criterion


(2) Decoding criterion
Consider a channel with input X:{x1, x2, …, xn}, output Y:{y1, y2, …, ym} and transition probabilities P(Y/X):{p(yj/xi); i = 1, 2, …, n; j = 1, 2, …, m}.
Define a single-valued function

    F(yj) = xi  (i = 1, 2, …, n; j = 1, 2, …, m)

as the decoding function. The group of decoding functions constitutes a decoding criterion.


5.2.1 Meanings of decoding criterion


If a channel has n inputs and m outputs, there will be n^m different decoding criteria.
For example, the binary channel in the previous example has 2^2 = 4 decoding criteria:

    A: {F(0) = 0; F(1) = 0}
    B: {F(0) = 0; F(1) = 1}
    C: {F(0) = 1; F(1) = 0}
    D: {F(0) = 1; F(1) = 1}

5.2.2 Probability of error decoding


After determining the decoding criterion, when the receiver receives a yj it decodes it into F(yj) = xi according to the criterion. If xi is what was sent, the decoding is called correct decoding; if not, it is called error decoding.
Given a received yj, the probability of correct decoding is the posterior probability of xi at the sender:

    Pr,j = P{F(yj) = xi / yj}


5.2.2 Probability of error decoding


Probability of error decoding:
After receiving yj, the probability that a symbol other than xi was sent is:

    Pe,j = P{e/yj} = 1 − P{F(yj) = xi / yj}

where e denotes the collection of all source symbols other than xi.
Averaging over all yj, the average probability of correct decoding is:

    Pr = ∑_{j=1}^{m} p(yj) Pr,j = ∑_{j=1}^{m} p(yj) P{F(yj) = xi / yj}

5.2.2 Probability of error decoding


The average probability of error decoding is

    Pe = ∑_{j=1}^{m} p(yj) P{e/yj} = ∑_{j=1}^{m} p(yj) {1 − P[F(yj) = xi / yj]}

Every communication system regards the average decoding error probability as an important indicator of system reliability. Its minimum value is

    Pe,min = ∑_{j=1}^{m} ∑_{xi ≠ x*} p(yj) p(xi / yj) = ∑_{j=1}^{m} ∑_{xi ≠ x*} p(xi) p(yj / xi)

5.2.3 Maximum likelihood criterion


If p(yj/x*) is the maximum of the n channel transition probabilities of yj,

    p(yj/x1), p(yj/x2), …, p(yj/xn),

then decode yj into x*. This method is called the maximum likelihood decoding criterion.
When p(xi) = 1/n, the maximum a posteriori probability criterion is equivalent to the maximum likelihood decoding criterion.
The maximum likelihood decoding criterion uses the channel transition probabilities instead of the posterior probabilities, which is more convenient.

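The criterion can be sketched directly: for each output, pick the input with the largest transition probability (function name illustrative; the matrix is the one from the example that follows):

```python
def ml_decode(trans):
    """Maximum likelihood decoding.
    trans[i][j] = p(y_j / x_i).  Returns, for each output index j,
    the input index i maximizing the transition probability."""
    m = len(trans[0])
    return [max(range(len(trans)), key=lambda i: trans[i][j]) for j in range(m)]

# Channel with p(y1/x1)=3/4, p(y2/x1)=1/4, p(y1/x2)=1/3, p(y2/x2)=2/3:
P = [[3/4, 1/4],
     [1/3, 2/3]]
print(ml_decode(P))   # [0, 1]: F(y1) = x1, F(y2) = x2
```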

5.2.3 Maximum likelihood criterion


Example: a channel with transitions x1 → y1 (3/4), x1 → y2 (1/4), x2 → y1 (1/3), x2 → y2 (2/3). The prior probability is {1/2, 1/2}.


5.2.3 Maximum likelihood criterion


Let the source space of X be:

    X:    x1      x2      …  xn
    P(X): p(x1)   p(x2)   …  p(xn)

The transfer matrix of the channel is:

    [P] = | p(y1/x1)  p(y2/x1)  …  p(ym/x1) |
          | p(y1/x2)  p(y2/x2)  …  p(ym/x2) |
          | …         …         …  …        |
          | p(y1/xn)  p(y2/xn)  …  p(ym/xn) |


5.2.3 Maximum likelihood criterion


Example: for the channel x1 → y1 (3/4), x1 → y2 (1/4), x2 → y1 (1/3), x2 → y2 (2/3) with prior probability {1/2, 1/2}, the transfer matrix is

    [P(Y/X)] = | 3/4  1/4 |
               | 1/3  2/3 |

Under the maximum likelihood criterion, F(y1) = x1 (since 3/4 > 1/3) and F(y2) = x2 (since 2/3 > 1/4).


5.2.4 Maximum a posterior criterion


After receiving each yj (j = 1, 2, …, m), among the posterior probabilities

    p(x1/yj), p(x2/yj), …, p(xn/yj)

there is a maximum; denote it p(x*/yj), so that p(x*/yj) ≥ p(xi/yj) for each i.
This indicates that the input is considered to be x* after receiving the symbol yj, so the decoding function is

    F(yj) = x*  (j = 1, 2, …, m)

This decoding rule is called the maximum a posteriori criterion.


5.2.4 Maximum a posterior criterion


Using this criterion makes each term {1 − P[F(yj) = x*/yj]} in the average error decoding probability reach its minimum. The minimum average probability of error decoding is therefore

    Pe,min = ∑_{j=1}^{m} ∑_{xi ≠ x*} p(yj) p(xi / yj)

In this expression, the minimum value is associated with the prior probabilities of the information source and the channel transition probabilities, especially the latter: if all transition probabilities other than p(yj/x*) are small, the decoding error probability is small.





5.2.4 Maximum a posterior criterion


Example: for the channel x1 → y1 (3/4), x1 → y2 (1/4), x2 → y1 (1/3), x2 → y2 (2/3) with prior probability {1/2, 1/2}:

    [P(Y/X)] = | 3/4  1/4 |
               | 1/3  2/3 |

    [P(X;Y)] = | 3/8  1/8 |
               | 1/6  2/6 |

    [P(Y)] = [13/24, 11/24]

    [P(X/Y)] = | 9/13  3/11 |
               | 4/13  8/11 |

Here decoding by the biggest transition probability is equivalent to decoding by the maximum a posteriori probability.
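The posterior matrix above can be reproduced from the priors and the transfer matrix; a minimal sketch with exact fractions (function name illustrative):

```python
from fractions import Fraction as F

def map_decode(priors, trans):
    """Maximum a posteriori decoding.
    priors[i] = p(x_i); trans[i][j] = p(y_j / x_i).
    Returns (decisions, posterior) where posterior[i][j] = p(x_i / y_j)."""
    n, m = len(trans), len(trans[0])
    joint = [[priors[i] * trans[i][j] for j in range(m)] for i in range(n)]
    p_y = [sum(joint[i][j] for i in range(n)) for j in range(m)]
    post = [[joint[i][j] / p_y[j] for j in range(m)] for i in range(n)]
    decisions = [max(range(n), key=lambda i: post[i][j]) for j in range(m)]
    return decisions, post

priors = [F(1, 2), F(1, 2)]
trans = [[F(3, 4), F(1, 4)],
         [F(1, 3), F(2, 3)]]
decisions, post = map_decode(priors, trans)
print(decisions)                  # [0, 1]: F(y1) = x1, F(y2) = x2
print(post[0][0], post[1][1])     # 9/13 8/11
```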

Homework 5
Consider a channel with input X:{x1, x2, x3}, output Y:{y1, y2, y3}, and channel transmission matrix:

    [P] = | 0.5  0.3  0.2 |
          | 0.2  0.3  0.5 |
          | 0.3  0.3  0.4 |

Decode using the maximum likelihood criterion and the maximum a posteriori criterion respectively, then calculate the average probabilities of error decoding when the prior probability is {1/4, 1/4, 1/2}.
