
Information Theory and Coding

Tutor: Engr. Tasleem Dehraj

Tasleemdehraj@gmail.com

Lecture No. 4-5-6


November 05, 2015
• Overview of Digital Communication and Storage Systems
• Basic Information Processing System
• Discrete Information Sources and Entropy
• Difference between information and knowledge
• Source Alphabet and Entropy
• Distinction between data and information
• Examples on How to find Entropy
• Joint and Conditional Entropy
• Source Coding
Joint and Conditional Entropy
• Most communication systems are designed to be used by a large number of users.

• The designers of such systems are concerned with maximizing the total information-carrying capacity of the system.

• Likewise, many computer systems support multiple users, and the designers of those systems are equally concerned with the total data storage requirements of the system.
Joint and Conditional Entropy
• Let us consider a situation where we have two information sources, A and B.

• Also, let us assume the cardinality of each source alphabet is |A| = M_A and |B| = M_B.

• If sources A and B are statistically independent, the total entropy of this system is simply

H(A,B) = H(A) + H(B)

where the entropy of each source is given by

H(A) = \sum_{m=0}^{M-1} P_m \log_2\left(\frac{1}{P_m}\right)
Joint and Conditional Entropy
• On the other hand, if the information sent by B is statistically dependent on the information sent by A, the situation is less obvious.

• Let the joint probability that A sends symbol a_i and B sends symbol b_j be written as

P_{i,j} = Pr(a_i, b_j)

• If A and B are statistically independent, then

P_{i,j} = Pr(a_i) Pr(b_j) = P_i P_j
Joint and Conditional Entropy
• This will not be true if the two sources are statistically dependent.

• Let us consider the combined emission of symbols a_i and b_j as a compound symbol c_{i,j} = <a_i, b_j> having probability P_{i,j}.

• If C is the set of all compound symbols c_{i,j}, the entropy of C is calculated by applying the following equation to all elements of C:

H(A) = \sum_{m=0}^{M-1} P_m \log_2\left(\frac{1}{P_m}\right)
Joint and Conditional Entropy
• and we have

H(C) = \sum_{c_{i,j} \in C} P_{i,j} \log_2\left(\frac{1}{P_{i,j}}\right)

H(C) = \sum_{i=0}^{M_A-1} \sum_{j=0}^{M_B-1} P_{i,j} \log_2\left(\frac{1}{P_{i,j}}\right)    (A)

• Now, a joint probability P_{i,j} may be written in terms of a conditional probability

P_{j|i} = Pr(b_j | a_i)

as

P_{i,j} = P_{j|i} P_i
Joint and Conditional Entropy
• Using this and the property log(ab) = log(a) + log(b), Equation (A) above becomes

H(C) = \sum_{i=0}^{M_A-1} P_i \log_2\left(\frac{1}{P_i}\right) + \sum_{i=0}^{M_A-1} \sum_{j=0}^{M_B-1} P_{i,j} \log_2\left(\frac{1}{P_{j|i}}\right)    (B)

• The first term on the right-hand side of Equation (B) is simply H(A). The second term is called the conditional entropy.

• It is the uncertainty (entropy) of B given A and is written H(B|A). Thus we may write
Joint and Conditional Entropy
H(C) = H(A,B) = H(A) + H(B|A)
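The chain rule can be verified numerically. The following Python sketch is illustrative only (the joint probabilities are invented for the example); it computes H(A,B) directly from a joint distribution and compares it with H(A) + H(B|A).

import math

# Invented joint probabilities P[i][j] = Pr(a_i, b_j); rows index symbols of A, columns symbols of B
P = [[0.30, 0.10],
     [0.20, 0.20],
     [0.05, 0.15]]

# H(A,B): sum over all pairs of P_ij * log2(1/P_ij)
H_AB = sum(p * math.log2(1.0 / p) for row in P for p in row if p > 0)

# Marginal probabilities P_i = sum_j P_ij, giving H(A)
P_i = [sum(row) for row in P]
H_A = sum(p * math.log2(1.0 / p) for p in P_i if p > 0)

# Conditional entropy H(B|A) = sum_ij P_ij * log2(1/P_{j|i}), where P_{j|i} = P_ij / P_i
H_B_given_A = sum(p * math.log2(P_i[i] / p)
                  for i, row in enumerate(P) for p in row if p > 0)

print(H_AB, H_A + H_B_given_A)  # the two values agree, verifying H(A,B) = H(A) + H(B|A)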


Example #2
• Many computer backplanes and memory systems employ a parity bit as a simple means of error detection. Let A be an information source with alphabet A = {0, 1, 2, 3}. Let each symbol a be equally probable, and let B = {0, 1} be a parity generator with

b = \begin{cases} 0 & \text{if } a = 0 \text{ or } a = 3 \\ 1 & \text{if } a = 1 \text{ or } a = 2 \end{cases}

• What are H(A), H(B), and H(A,B)?
Solution of Example #2
• We know that

H(A) = \sum_{m=0}^{M-1} P_m \log_2\left(\frac{1}{P_m}\right)

and

H(C) = H(A,B) = H(A) + H(B|A)

• Work this out yourself; a sketch for checking your answer follows.
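As one way to check your working (an added illustration, not part of the original solution slide), the joint distribution of the parity example can be built directly in Python; since b is completely determined by a, the conditional entropy H(B|A) comes out to zero.

import math

def entropy(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Parity example: a in {0,1,2,3} equally likely; b = 0 if a is 0 or 3, else b = 1
parity = {0: 0, 1: 1, 2: 1, 3: 0}

# Joint distribution Pr(a, b): each a has probability 1/4 and determines b completely
P_joint = {(a, parity[a]): 0.25 for a in range(4)}

H_A  = entropy([0.25] * 4)             # 2 bits
H_B  = entropy([0.5, 0.5])             # 1 bit (a = 0 or 3 gives b = 0; a = 1 or 2 gives b = 1)
H_AB = entropy(P_joint.values())       # 2 bits: b adds no uncertainty beyond a
print(H_A, H_B, H_AB, H_AB - H_A)      # H(B|A) = H(A,B) - H(A) = 0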
Source Coding
• The entropy of the source is the average information carried per symbol.

• Since each symbol will either be transmitted (in the case of a communication system) or stored (in the case of a storage system), and since each use of the channel (or each unit of storage) has some associated cost, it is clearly desirable to obtain the most information possible per symbol (on average).
Source Coding
• If we have an inefficient source, our system can be made more cost-effective through the use of a source encoder.

• A source encoder can be viewed as a data-processing element that takes an input sequence s_0, s_1, … of symbols s_t ∈ A from the information source and produces an output sequence s'_0, s'_1, … using symbols s'_t drawn from a code alphabet B.
Source Coding
• These output symbols are called code words.

• The objective of the encoder is to process the input in such a way that the average information transmitted (or stored) per channel use closely approaches H(A).

• In its simplest form, the encoder can be viewed as a mapping of the source alphabet A to a code alphabet B.

• Mathematically this is represented as

C : A → B
Source Coding
• Since the encoded sequence must eventually be decoded, the function C must be invertible.

• This means there exists another function C^{-1} such that if

C(a) = b

then

C^{-1}(b) = a
Source Coding
• This is possible only if the assignment C(a) = b is unique,

• i.e., for every b ∈ B there is exactly one a ∈ A such that C(a) = b, and for every a ∈ A there is exactly one b ∈ B such that C^{-1}(b) = a.
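As an illustration (not from the lecture), an invertible code and its inverse can be written down explicitly; the dictionary below happens to use the binary code that appears in Example 3 on the next slide.

# A code can be represented as a one-to-one table from source symbols to code words
C = {'a0': '0', 'a1': '10', 'a2': '110', 'a3': '111'}

# Because C maps distinct symbols to distinct code words, the inverse C^-1 exists
C_inv = {codeword: symbol for symbol, codeword in C.items()}

assert len(C_inv) == len(C)      # no two symbols share a code word
assert C_inv[C['a2']] == 'a2'    # C^-1(C(a)) = a for every symbol a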
Example 3
• Let A be a 4-ary source with symbol probabilities

P_A = {0.5, 0.3, 0.15, 0.05}

• Let C be an encoder which maps the symbols in A into strings of binary digits, as follows:

p_0 = 0.5     C(a_0) = 0
p_1 = 0.3     C(a_1) = 10
p_2 = 0.15    C(a_2) = 110
p_3 = 0.05    C(a_3) = 111
Source Coding
• Let L_m be the number of binary digits in code word b_m. If the code words are transmitted one binary digit at a time, the average number of transmitted binary digits per code word is given by

\bar{L} = \sum_{m=0}^{3} p_m L_m = 0.5(1) + 0.3(2) + 0.15(3) + 0.05(3) = 1.70
Recalling Example-1
• What is the entropy of a 4-ary source having symbol probabilities P_A = {0.5, 0.3, 0.15, 0.05}?

• Solution:

H(A) = 0.5\log_2(2) + 0.3\log_2(10/3) + 0.15\log_2(100/15) + 0.05\log_2(20) = 1.6477 bits
Source Coding
• The efficiency of this encoder is

H(A)/\bar{L} = 1.6477/1.70 = 0.96924

• If the source encoder were not used, we would need two binary digits to represent each source symbol.

• The efficiency of the uncoded source would be H(A)/2 = 0.82385.
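These figures can be reproduced with a short sketch (added for illustration; it simply re-implements the average length and efficiency formulas for the Example 3 code):

import math

# Example 3: symbol probabilities and the lengths of the code words 0, 10, 110, 111
probs   = [0.5, 0.3, 0.15, 0.05]
lengths = [1, 2, 3, 3]

H_A  = sum(p * math.log2(1.0 / p) for p in probs)    # ~1.6477 bits per symbol
Lbar = sum(p * L for p, L in zip(probs, lengths))    # 1.70 binary digits per symbol

print(H_A / Lbar)   # coded efficiency, ~0.969
print(H_A / 2)      # uncoded efficiency, ~0.824 (two binary digits per symbol without the encoder)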
Source Coding
• More sophisticated data processing by the encoder can also be carried out.

• Suppose that the symbols emitted by source A were first grouped into ordered pairs <a_i, a_j> and the code words produced by the encoder were based on these pairs.

• The set of all possible pairs <a_i, a_j> is called the Cartesian product of set A with itself and is denoted by A × A.
Source Coding
• The encoding process then becomes a function of two variables and can be denoted by the mapping

C : A × A → B

• or by the function

C(a_i, a_j) = b
Example 4
• Let A be a 4-ary memoryless source with symbol probabilities

P_A = {0.5, 0.3, 0.15, 0.05}

• Since A is a memoryless source, the probability of any given pair of symbols is given by Pr(a_i, a_j) = Pr(a_i) Pr(a_j).

• Let the encoder map pairs of symbols into the code words shown in the following table:
Example 4 (Contd.)
<a_i,a_j>   Pr(a_i,a_j)   b_m        <a_i,a_j>   Pr(a_i,a_j)   b_m
a_0,a_0     0.25          00         a_2,a_0     0.075         1101
a_0,a_1     0.15          100        a_2,a_1     0.045         0111
a_0,a_2     0.075         1100       a_2,a_2     0.0225        111110
a_0,a_3     0.025         11100      a_2,a_3     0.0075        1111110
a_1,a_0     0.15          101        a_3,a_0     0.025         11101
a_1,a_1     0.09          010        a_3,a_1     0.015         111101
a_1,a_2     0.045         0110       a_3,a_2     0.0075        11111110
a_1,a_3     0.015         111100     a_3,a_3     0.0025        11111111
Example 4 (Contd.)
• Since the symbols from A are independent,

H(A × A) = 2H(A) = 3.2954 bits

• Since P_m = Pr(b_m) = Pr(a_i, a_j) for each compound symbol, the average number of bits per transmitted code word is

\bar{L} = \sum_{m=0}^{15} p_m L_m = 3.3275

• The efficiency of the encoder is therefore

H(A × A)/\bar{L} = 0.99035

• Less than 1% of the transmitted bits are redundant.
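These numbers can likewise be checked programmatically. The sketch below is an added illustration; it uses the code word lengths from the table to compute H(A × A), the average code word length, and the efficiency.

import math

probs = [0.5, 0.3, 0.15, 0.05]      # single-symbol probabilities of the memoryless source A

# Code word lengths for the pairs <a_i, a_j> from the table, indexed as lengths[i][j]
lengths = [[2, 3, 4, 5],
           [3, 3, 4, 6],
           [4, 4, 6, 7],
           [5, 6, 8, 8]]

H_A  = sum(p * math.log2(1.0 / p) for p in probs)
H_AA = 2 * H_A                                       # ~3.2954 bits, since A is memoryless

# Average code word length: sum over pairs of Pr(a_i)Pr(a_j) times the pair's code word length
Lbar = sum(probs[i] * probs[j] * lengths[i][j]
           for i in range(4) for j in range(4))      # ~3.3275

print(H_AA / Lbar)                                   # efficiency ~0.99035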
