
Information Theory and Coding

Tutor: Engr. Tasleem Dehraj

Tasleemdehraj@gmail.com

Lecture No. 4-5-6


November 05, 2015
• Overview of Digital Communication and Storage Systems
• Basic Information Processing System
• Discrete Information Sources and Entropy
• Difference between information and knowledge
• Source Alphabet and Entropy
• Distinction between data and information
• Examples on How to find Entropy
• Joint and Conditional Entropy
• Source Coding
Joint and Conditional Entropy
• Most communication systems are designed to be used by a large number of users.

• The designers of such systems are concerned with maximizing the total information-carrying capacity of the system.

• Likewise, many computer systems support multiple users, and the designers of those systems are equally concerned with the total data storage requirements of the system.
Joint and Conditional Entropy
• Let us consider a situation where we have two information sources, A and B.

• Also, let us assume the cardinality of each source alphabet is |A| = M_A and |B| = M_B.

• If sources A and B are statistically independent, the total entropy of this system is simply

H(A,B) = H(A) + H(B)

where the entropy of each source is given by

H(A) = \sum_{m=0}^{M-1} P_m \log_2\left(\frac{1}{P_m}\right)
Joint and Conditional Entropy
• On the other hand, if the information sent by B is statistically dependent on the information sent by A, the situation is less obvious.

• Let the joint probability that A sends symbol a_i and B sends symbol b_j be written as

P_{i,j} = Pr(a_i, b_j)

• If A and B are statistically independent, then

P_{i,j} = Pr(a_i) Pr(b_j) = P_i P_j
Joint and Conditional Entropy
• This will not be true if the two sources are statistically dependent.

• Let us consider the combined emission of symbols a_i and b_j as a compound symbol c_{i,j} = <a_i, b_j> having probability P_{i,j}.

• If C is the set of all compound symbols c_{i,j}, the entropy of C is calculated by applying the following equation to all elements of C:

H(A) = \sum_{m=0}^{M-1} P_m \log_2\left(\frac{1}{P_m}\right)
Joint and Conditional Entropy
• and we have

H(C) = \sum_{c_{i,j} \in C} P_{i,j} \log_2\left(\frac{1}{P_{i,j}}\right)

H(C) = \sum_{i=0}^{M_A-1} \sum_{j=0}^{M_B-1} P_{i,j} \log_2\left(\frac{1}{P_{i,j}}\right)    (A)

• Now, a joint probability P_{i,j} may be written in terms of a conditional probability

P_{j|i} = Pr(b_j | a_i)

as

P_{i,j} = P_{j|i} P_i
Joint and Conditional Entropy
• Using this and the property log(ab) = log(a) + log(b), Equation (A) above becomes

H(C) = \sum_{i=0}^{M_A-1} P_i \log_2\left(\frac{1}{P_i}\right) + \sum_{i=0}^{M_A-1} \sum_{j=0}^{M_B-1} P_{i,j} \log_2\left(\frac{1}{P_{j|i}}\right)    (B)

• The first term on the right-hand side of Equation (B) is simply H(A). The second term is called the conditional entropy.

• It is the uncertainty (entropy) of B given A and is written H(B|A). Thus we may write
Joint and Conditional Entropy
H(C) = H(A,B) = H(A) + H(B|A)
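The chain rule can be verified numerically. The following Python sketch is illustrative only (the joint probabilities are invented for the example); it computes H(A,B) directly from a joint distribution and compares it with H(A) + H(B|A).

import math

# Invented joint probabilities P[i][j] = Pr(a_i, b_j); rows index symbols of A, columns symbols of B
P = [[0.30, 0.10],
     [0.20, 0.20],
     [0.05, 0.15]]

# H(A,B): sum over all pairs of P_ij * log2(1/P_ij)
H_AB = sum(p * math.log2(1.0 / p) for row in P for p in row if p > 0)

# Marginal probabilities P_i = sum_j P_ij, giving H(A)
P_i = [sum(row) for row in P]
H_A = sum(p * math.log2(1.0 / p) for p in P_i if p > 0)

# Conditional entropy H(B|A) = sum_ij P_ij * log2(1/P_{j|i}), where P_{j|i} = P_ij / P_i
H_B_given_A = sum(p * math.log2(P_i[i] / p)
                  for i, row in enumerate(P) for p in row if p > 0)

print(H_AB, H_A + H_B_given_A)  # the two values agree, verifying H(A,B) = H(A) + H(B|A)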


Example #2
• Many computer backplanes and memory systems employ a parity bit as a simple means of error detection. Let A be an information source with alphabet A = {0, 1, 2, 3}. Let each symbol a be equally probable, and let B = {0, 1} be a parity generator with

b = \begin{cases} 0 & \text{if } a = 0 \text{ or } a = 3 \\ 1 & \text{if } a = 1 \text{ or } a = 2 \end{cases}

• What are H(A), H(B), and H(A,B)?
Solution of Example #2
• We know that

H(A) = \sum_{m=0}^{M-1} P_m \log_2\left(\frac{1}{P_m}\right)

and

H(C) = H(A,B) = H(A) + H(B|A)

• Work this out yourself; a sketch for checking your answer follows.
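As one way to check your working (an added illustration, not part of the original solution slide), the joint distribution of the parity example can be built directly in Python; since b is completely determined by a, the conditional entropy H(B|A) comes out to zero.

import math

def entropy(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Parity example: a in {0,1,2,3} equally likely; b = 0 if a is 0 or 3, else b = 1
parity = {0: 0, 1: 1, 2: 1, 3: 0}

# Joint distribution Pr(a, b): each a has probability 1/4 and determines b completely
P_joint = {(a, parity[a]): 0.25 for a in range(4)}

H_A  = entropy([0.25] * 4)             # 2 bits
H_B  = entropy([0.5, 0.5])             # 1 bit (a = 0 or 3 gives b = 0; a = 1 or 2 gives b = 1)
H_AB = entropy(P_joint.values())       # 2 bits: b adds no uncertainty beyond a
print(H_A, H_B, H_AB, H_AB - H_A)      # H(B|A) = H(A,B) - H(A) = 0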
Source Coding
• The entropy of the source is the average information carried per symbol.

• Since each symbol will either be transmitted (in the case of a communication system) or stored (in the case of a storage system), and since each use of the channel (or each unit of storage) has some associated cost, it is clearly desirable to obtain the most information possible per symbol (on average).
Source Coding
• If we have an inefficient source, our system can be made more cost-effective through the use of a source encoder.

• A source encoder can be viewed as a data-processing element that takes an input sequence s_0, s_1, … of symbols s_t ∈ A from the information source and produces an output sequence s'_0, s'_1, … using symbols s'_t drawn from a code alphabet B.
Source Coding
• These output symbols are called code words.

• The objective of the encoder is to process the input in such a way that the average information transmitted (or stored) per channel use closely approaches H(A).

• In its simplest form, the encoder can be viewed as a mapping of the source alphabet A to a code alphabet B.

• Mathematically this is represented as

C : A → B
Source Coding
• Since the encoded sequence must eventually be decoded, the function C must be invertible.

• This means there exists another function C^{-1} such that if

C(a) = b

then

C^{-1}(b) = a
Source Coding
• This is possible only if the assignment C(a) = b is unique,

• i.e., for every b ∈ B there is exactly one a ∈ A such that C(a) = b, and for every a ∈ A there is exactly one b ∈ B such that C^{-1}(b) = a.
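As an illustration (not from the lecture), an invertible code and its inverse can be written down explicitly; the dictionary below happens to use the binary code that appears in Example 3 on the next slide.

# A code can be represented as a one-to-one table from source symbols to code words
C = {'a0': '0', 'a1': '10', 'a2': '110', 'a3': '111'}

# Because C maps distinct symbols to distinct code words, the inverse C^-1 exists
C_inv = {codeword: symbol for symbol, codeword in C.items()}

assert len(C_inv) == len(C)      # no two symbols share a code word
assert C_inv[C['a2']] == 'a2'    # C^-1(C(a)) = a for every symbol a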
Example 3
• Let A be a 4-ary source with symbol probabilities

P_A = {0.5, 0.3, 0.15, 0.05}

• Let C be an encoder which maps the symbols in A into strings of binary digits, as follows:

p_0 = 0.5     C(a_0) = 0
p_1 = 0.3     C(a_1) = 10
p_2 = 0.15    C(a_2) = 110
p_3 = 0.05    C(a_3) = 111
Source Coding
• Let L_m be the number of binary digits in code word b_m. If the code words are transmitted one binary digit at a time, the average number of transmitted binary digits per code word is given by

\bar{L} = \sum_{m=0}^{3} p_m L_m = 0.5(1) + 0.3(2) + 0.15(3) + 0.05(3) = 1.70
Recalling Example-1
• What is the entropy of a 4-ary source having symbol probabilities P_A = {0.5, 0.3, 0.15, 0.05}?

• Solution:

H(A) = 0.5\log_2(2) + 0.3\log_2(10/3) + 0.15\log_2(100/15) + 0.05\log_2(20) = 1.6477 bits
Source Coding
• The efficiency of this encoder is

H(A)/\bar{L} = 1.6477/1.70 = 0.96924

• If the source encoder were not used, we would need two binary digits to represent each source symbol.

• The efficiency of the uncoded source would be H(A)/2 = 0.82385.
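These figures can be reproduced with a short sketch (added for illustration; it simply re-implements the average length and efficiency formulas for the Example 3 code):

import math

# Example 3: symbol probabilities and the lengths of the code words 0, 10, 110, 111
probs   = [0.5, 0.3, 0.15, 0.05]
lengths = [1, 2, 3, 3]

H_A  = sum(p * math.log2(1.0 / p) for p in probs)    # ~1.6477 bits per symbol
Lbar = sum(p * L for p, L in zip(probs, lengths))    # 1.70 binary digits per symbol

print(H_A / Lbar)   # coded efficiency, ~0.969
print(H_A / 2)      # uncoded efficiency, ~0.824 (two binary digits per symbol without the encoder)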
Source Coding
• More sophisticated data processing by the encoder can also be carried out.

• Suppose that the symbols emitted by source A were first grouped into ordered pairs <a_i, a_j> and the code words produced by the encoder were based on these pairs.

• The set of all possible pairs <a_i, a_j> is called the Cartesian product of set A with itself and is denoted by A × A.
Source Coding
• The encoding process then becomes a function of two variables and can be denoted by the mapping

C : A × A → B

• or by the function

C(a_i, a_j) = b
Example 4
• Let A be a 4-ary memoryless source with symbol probabilities

P_A = {0.5, 0.3, 0.15, 0.05}

• Since A is a memoryless source, the probability of any given pair of symbols is given by Pr(a_i, a_j) = Pr(a_i) Pr(a_j).

• Let the encoder map pairs of symbols into the code words shown in the following table:
Example 4 (Contd.)
<a_i,a_j>   Pr(a_i,a_j)   b_m        <a_i,a_j>   Pr(a_i,a_j)   b_m
a_0,a_0     0.25          00         a_2,a_0     0.075         1101
a_0,a_1     0.15          100        a_2,a_1     0.045         0111
a_0,a_2     0.075         1100       a_2,a_2     0.0225        111110
a_0,a_3     0.025         11100      a_2,a_3     0.0075        1111110
a_1,a_0     0.15          101        a_3,a_0     0.025         11101
a_1,a_1     0.09          010        a_3,a_1     0.015         111101
a_1,a_2     0.045         0110       a_3,a_2     0.0075        11111110
a_1,a_3     0.015         111100     a_3,a_3     0.0025        11111111
Example 4 (Contd.)
• Since the symbols from A are independent,

H(A × A) = 2H(A) = 3.2954 bits

• Since P_m = Pr(b_m) = Pr(a_i, a_j) for each compound symbol, the average number of bits per transmitted code word is

\bar{L} = \sum_{m=0}^{15} p_m L_m = 3.3275

• The efficiency of the encoder is therefore

H(A × A)/\bar{L} = 0.99035

• Less than 1% of the transmitted bits are redundant.
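These numbers can likewise be checked programmatically. The sketch below is an added illustration; it uses the code word lengths from the table to compute H(A × A), the average code word length, and the efficiency.

import math

probs = [0.5, 0.3, 0.15, 0.05]      # single-symbol probabilities of the memoryless source A

# Code word lengths for the pairs <a_i, a_j> from the table, indexed as lengths[i][j]
lengths = [[2, 3, 4, 5],
           [3, 3, 4, 6],
           [4, 4, 6, 7],
           [5, 6, 8, 8]]

H_A  = sum(p * math.log2(1.0 / p) for p in probs)
H_AA = 2 * H_A                                       # ~3.2954 bits, since A is memoryless

# Average code word length: sum over pairs of Pr(a_i)Pr(a_j) times the pair's code word length
Lbar = sum(probs[i] * probs[j] * lengths[i][j]
           for i in range(4) for j in range(4))      # ~3.3275

print(H_AA / Lbar)                                   # efficiency ~0.99035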
