Source coding theorem
◦ It describes the efficient representation of the symbols
generated by a source
◦ The main motivation is data compression
◦ A discrete memoryless source (DMS) outputs a symbol
every T seconds
◦ Each symbol is selected from a finite set of symbols
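◦ The source alphabet is $X = \{x_1, x_2, \ldots, x_L\}$
◦ The symbols occur with probabilities $P(X = x_i) = p_i$, $i = 1, 2, \ldots, L$, where $\sum_{i=1}^{L} p_i = 1$
The entropy of this DMS in bits per source
symbol is
$H(X) = -\sum_{i=1}^{L} p_i \log_2 p_i \le \log_2 L$
The equality holds when the symbols are equally
likely.
The entropy is the average information per
symbol, measured in bits.
The source rate is H(X)/T bits per second.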
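The entropy and source-rate formulas above can be checked numerically. This is a minimal sketch; the helper names `entropy` and `source_rate`, the four-symbol probabilities and T = 0.5 s are hypothetical choices for illustration, not taken from the slides.

```python
import math

def entropy(probs):
    """H(X) = -sum p_i log2 p_i in bits per symbol (terms with p_i = 0 contribute 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def source_rate(probs, T):
    """Source rate H(X)/T in bits per second, one symbol every T seconds."""
    return entropy(probs) / T

# Hypothetical four-symbol source (L = 4), not from the slides.
probs = [0.5, 0.25, 0.125, 0.125]
print(entropy(probs))               # 1.75 bits/symbol
print(math.log2(len(probs)))        # 2.0, the upper bound log2 L
print(entropy([0.25] * 4))          # 2.0, equality for equally likely symbols
print(source_rate(probs, T=0.5))    # 3.5 bits/s
```

Unequal probabilities give an entropy below log2 L, which is what makes variable length coding worthwhile.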
Suppose we need to represent the 26 letters of
the English alphabet using bits.
We know that $2^4 = 16 < 26 < 2^5 = 32$
So each letter must be represented
by at least 5 bits.
The number of binary digits (bits) R required
for unique coding of L symbols is:
When L is a power of 2: $R = \log_2 L$
When L is not a power of 2: $R = \lfloor \log_2 L \rfloor + 1$
Here we can conclude that, since $H(X) \le \log_2 L \le R$, we have $R \ge H(X)$
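The two cases for R can be sketched with integer arithmetic only. The function name `fixed_length_bits` is hypothetical; it relies on Python's `int.bit_length()`, which equals $\lfloor \log_2 L \rfloor + 1$ for $L \ge 1$.

```python
def fixed_length_bits(L):
    """Bits R per symbol for a fixed length binary code over L >= 2 symbols."""
    if L < 2:
        raise ValueError("need at least two symbols")
    if L & (L - 1) == 0:          # L is a power of 2
        return L.bit_length() - 1  # R = log2 L exactly
    return L.bit_length()          # R = floor(log2 L) + 1

print(fixed_length_bits(26))   # 5, the English alphabet
print(fixed_length_bits(32))   # 5, power of 2
print(fixed_length_bits(128))  # 7, the 7-bit ASCII code space
```

Using integers avoids floating point rounding in `log2` near exact powers of two.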
A fixed length code treats every letter of the
alphabet as equally important (equally probable),
so each one requires 5 bits for its representation.
We know that some letters are used less often
(e.g. x, q, z)
and some letters are used more frequently (e.g. s, t, a,
e).
Representing every letter with the same number of bits
is therefore not an efficient way of coding.
◦ Such a code is known as a Fixed Length Code (FLC),
for example ASCII.
A better way of coding is:
◦ More frequent letters are represented by fewer
bits
◦ Less frequent letters are represented by more
bits
This is known as a Variable Length Code (VLC).
Fixed Length codes
Variable Length codes
Distinct codes
Uniquely decodable codes
Prefix free codes
Instantaneous codes
Optimal codes
Entropy coding
A code whose codeword length is fixed is a fixed length code.
It assigns a fixed number of bits to every source
symbol, irrespective of how often each symbol
appears.
◦ ASCII codes: uppercase A to Z, lowercase a to z,
digits 0 to 9, punctuation marks, commas, etc.
each have a 7-bit codeword
If there are L source symbols:
If L is a power of 2, the codeword length is $R = \log_2 L$
If L is not a power of 2, the codeword length is $R = \lfloor \log_2 L \rfloor + 1$
In a variable length code, the codeword length is not fixed:
◦ More frequent symbols get fewer bits
◦ Less frequent symbols get more bits
◦ When the symbol probabilities are unequal, it can require
fewer bits on average than a fixed length code to encode the same information
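The saving can be made concrete by computing the average codeword length $\sum_i p_i n_i$. This is a sketch under assumptions: the probabilities are hypothetical, the helper `average_length` is illustrative, and the variable length codeword set is the one listed as code 4 in the later comparison table.

```python
def average_length(probs, codewords):
    """Average codeword length sum_i p_i * n_i, in bits per symbol."""
    return sum(p * len(c) for p, c in zip(probs, codewords))

# Hypothetical source: x1 is much more frequent than x3 and x4.
probs = [0.5, 0.25, 0.125, 0.125]
flc = ["00", "01", "10", "11"]    # fixed length: 2 bits each
vlc = ["0", "10", "110", "111"]   # variable length: short words for frequent symbols

print(average_length(probs, flc))  # 2.0
print(average_length(probs, vlc))  # 1.75
```

For this particular distribution the VLC average equals the entropy (1.75 bits/symbol); for equally likely symbols the VLC would do no better than the FLC.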
A code is called distinct if each codeword is
distinguishable from every other codeword (no two symbols share a codeword).
Xj Codeword
X1 00
X2 01
X3 10
X4 11
The coded source symbols are transmitted as a
stream of bits.
The code must satisfy certain properties so
that the receiver can identify the
symbols from the stream of bits.
A distinct code is said to be uniquely
decodable if the original source sequence can
be reconstructed perfectly from the received
encoded binary sequence.
Symbol Code 1 Code 2
A 00 0
B 01 1
C 10 00
D 11 01
Code 1 is fixed length code
Code 2 is variable length code
The message ‘A BAD CAB’ can be encoded
using the two codes above.
In Code 1 format it appears as
00 010011 100001
In Code 2 format it appears as
0 1001 0001
Here code 1 requires 14 bits to encode the message
Here code 2 requires 9 bits to encode the message
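The bit counts above can be reproduced by concatenating codewords. This is a minimal sketch; the dictionaries `CODE1`, `CODE2` and the helper `encode` are illustrative names for the two codes in the table.

```python
CODE1 = {"A": "00", "B": "01", "C": "10", "D": "11"}  # fixed length
CODE2 = {"A": "0", "B": "1", "C": "00", "D": "01"}    # variable length

def encode(message, code):
    """Concatenate the codeword of each letter; spaces are skipped (readability only)."""
    return "".join(code[ch] for ch in message if ch != " ")

bits1 = encode("A BAD CAB", CODE1)
bits2 = encode("A BAD CAB", CODE2)
print(bits1, len(bits1))  # 00010011100001 14
print(bits2, len(bits2))  # 010010001 9
```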
Although code 2 uses fewer bits, it is not
a valid code, because the received bit stream
cannot be decoded unambiguously.
The bit stream 0 1001 0001 can be grouped in
different ways, for example:
[0][1][0][0][1][0][0][0][1], which means ABAABAAAB
[0][1][00][1][00][01], which means ABCBCD
[01][00][1][00][01], which means DCBCD
This happens because the receiver does not know where one
codeword ends and the next codeword
starts.
In this case, code 1 is uniquely decodable but code 2 is not.
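The ambiguity can be demonstrated by brute force: list every way a bit stream splits into codewords. This is a sketch (exponential in the worst case, fine for short streams); `all_parses` is a hypothetical helper name.

```python
CODE1 = {"A": "00", "B": "01", "C": "10", "D": "11"}
CODE2 = {"A": "0", "B": "1", "C": "00", "D": "01"}

def all_parses(bits, code):
    """Return every way of splitting `bits` into codewords of `code` (symbol -> codeword)."""
    if bits == "":
        return [""]
    parses = []
    for symbol, word in code.items():
        if bits.startswith(word):
            for rest in all_parses(bits[len(word):], code):
                parses.append(symbol + rest)
    return parses

print(len(all_parses("010010001", CODE2)))    # 30 different decodings
print(all_parses("00010011100001", CODE1))    # ['ABADCAB'], exactly one
```

A uniquely decodable code yields exactly one parse for every valid bit stream; code 2 already yields 30 for this 9-bit message.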
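A code in which no codeword forms the prefix of
any other codeword is called a prefix free code.
An example of a prefix free code is:
Symbol Codeword
A 0
B 10
C 110
D 1110
If a codeword were a prefix of another, for example if the codewords
for ‘C’ and ‘D’ also began with 0, then on receiving zero (0) the receiver could
not decide, without reading further bits, whether it is the entire codeword for ‘A’
or part of the codeword for ‘C’ or ‘D’.
Hence no codeword should be the prefix of any
other codeword. Such a code is called a prefix free code.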
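The prefix free condition can be checked directly by comparing every pair of codewords. A minimal sketch; `is_prefix_free` is a hypothetical helper name.

```python
def is_prefix_free(codewords):
    """True if no codeword is a prefix of a different entry in the list.

    Identical codewords at different positions also count as a violation.
    """
    for i, a in enumerate(codewords):
        for j, b in enumerate(codewords):
            if i != j and b.startswith(a):
                return False
    return True

print(is_prefix_free(["0", "10", "110", "1110"]))  # True, the table above
print(is_prefix_free(["0", "1", "00", "01"]))      # False: 0 is a prefix of 00 and 01
```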
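A uniquely decodable code is said to be an
instantaneous code if the end of any codeword is
recognizable without examining subsequent
code symbols.
A code is instantaneous if and only if it is prefix free, so instantaneous codes are exactly the prefix free codes (also called prefix codes).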
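Because no codeword is a prefix of another, a decoder can emit each symbol the moment its buffered bits match a codeword. This sketch assumes the code passed in is prefix free (for other codes it may decode wrongly); `decode_instantaneous` and `PREFIX_CODE` are illustrative names, and the code is the prefix free example above.

```python
def decode_instantaneous(bits, code):
    """Decode a bit string with a prefix free code (symbol -> codeword).

    Each symbol is emitted as soon as the buffer matches a codeword,
    without looking at any later bits.
    """
    lookup = {word: symbol for symbol, word in code.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in lookup:  # end of codeword recognised immediately
            out.append(lookup[buffer])
            buffer = ""
    if buffer:
        raise ValueError("bit stream ends in the middle of a codeword: " + buffer)
    return "".join(out)

PREFIX_CODE = {"A": "0", "B": "10", "C": "110", "D": "1110"}
print(decode_instantaneous("0101110110", PREFIX_CODE))  # ABDC
```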
A code is called an optimal code if it is
instantaneous and has the minimum average
codeword length for a given probability
assignment to the source
symbols.
When a variable length code is designed such
that its average codeword length approaches
the entropy of the DMS (discrete memoryless
source), it is known as entropy coding.
◦ Shannon-Fano coding and Huffman coding are examples.
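As an illustration of entropy coding, here is a minimal sketch of binary Huffman coding built on Python's `heapq`: repeatedly merge the two least probable groups, prefixing 0 and 1. The function name and the dyadic probabilities are hypothetical; with ties, other valid Huffman codes with the same average length also exist.

```python
import heapq
import itertools
import math

def huffman_code(probs):
    """Binary Huffman code for a dict symbol -> probability; returns symbol -> codeword."""
    if len(probs) == 1:
        return {s: "0" for s in probs}  # a lone symbol still needs one bit
    counter = itertools.count()         # tie-breaker so heapq never compares dicts
    heap = [(p, next(counter), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)  # two least probable groups
        p1, _, group1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in group0.items()}
        merged.update({s: "1" + w for s, w in group1.items()})
        heapq.heappush(heap, (p0 + p1, next(counter), merged))
    return heap[0][2]

probs = {"x1": 0.5, "x2": 0.25, "x3": 0.125, "x4": 0.125}  # hypothetical
code = huffman_code(probs)
avg = sum(probs[s] * len(w) for s, w in code.items())
H = -sum(p * math.log2(p) for p in probs.values())
print(code)    # {'x1': '0', 'x2': '10', 'x3': '110', 'x4': '111'}
print(avg, H)  # 1.75 1.75
```

For these power-of-two probabilities the average length equals the entropy exactly; in general a Huffman code satisfies $H(X) \le \bar{L} < H(X) + 1$.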
Xj Code 1 Code 2 Code 3 Code 4 Code 5 Code 6
X1 00 00 0 0 0 1
X2 01 01 1 10 01 01
X3 00 10 00 110 011 001
X4 11 11 11 111 0111 0001
Codes 1 and 2 are fixed length codes.
Codes 3, 4, 5 and 6 are variable length codes.
All codes are distinct except code 1 (X1 and X3 share the codeword 00).
Codes 2, 4 and 6 are prefix free (instantaneous) codes.
Codes 2, 4, 5 and 6 are uniquely decodable codes.
Code 5 is not a prefix free code, yet it is uniquely decodable, since the bit 0
marks the beginning of each codeword.
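The classification of the six codes can be checked mechanically. Distinctness and the prefix free property are direct comparisons; for unique decodability this sketch uses the Sardinas-Patterson test, a standard algorithm that is not part of these slides. It repeatedly collects "dangling suffixes" and reports ambiguity if a suffix is itself a codeword. All function names are illustrative.

```python
def dangling_suffixes(A, B):
    """Nonempty w such that a + w == b for some a in A, b in B."""
    return {b[len(a):] for a in A for b in B if len(b) > len(a) and b.startswith(a)}

def is_distinct(codewords):
    return len(set(codewords)) == len(codewords)

def is_prefix_free(codewords):
    return is_distinct(codewords) and not dangling_suffixes(codewords, codewords)

def is_uniquely_decodable(codewords):
    """Sardinas-Patterson test for unique decodability."""
    if not is_distinct(codewords):
        return False
    C = set(codewords)
    S = dangling_suffixes(C, C)
    seen = set()
    while S:
        if S & C:            # a dangling suffix is a codeword: ambiguous
            return False
        key = frozenset(S)
        if key in seen:      # the sets now repeat, so no codeword can appear
            return True
        seen.add(key)
        S = dangling_suffixes(C, S) | dangling_suffixes(S, C)
    return True

CODES = {
    1: ["00", "01", "00", "11"],
    2: ["00", "01", "10", "11"],
    3: ["0", "1", "00", "11"],
    4: ["0", "10", "110", "111"],
    5: ["0", "01", "011", "0111"],
    6: ["1", "01", "001", "0001"],
}
for n, words in CODES.items():
    print(n, is_distinct(words), is_prefix_free(words), is_uniquely_decodable(words))
```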
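Let X be a discrete memoryless source with
alphabet $\{x_1, x_2, \ldots, x_L\}$.
Let the length of the binary codeword
corresponding to $x_i$ be $n_i$, $i = 1, 2, \ldots, L$.
A necessary and sufficient condition for the
existence of an instantaneous binary code with these codeword lengths is
$\sum_{i=1}^{L} 2^{-n_i} \le 1$
This is known as the Kraft inequality.
It guarantees that an instantaneously
decodable code with codeword lengths $n_i$ exists whenever the lengths
satisfy the inequality; it does not guarantee that every code with such lengths is instantaneous.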
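The existence part of the inequality is constructive. One standard way to build a prefix free code from lengths that satisfy it is a canonical assignment: sort by length, and give each symbol the next binary value, padded or shifted to its length. This is a sketch; `kraft_sum` and `prefix_code_from_lengths` are hypothetical names, and exact arithmetic uses Python's `fractions.Fraction`.

```python
from fractions import Fraction

def kraft_sum(lengths):
    """Left-hand side of the Kraft inequality, sum of 2^-n_i, computed exactly."""
    return sum(Fraction(1, 2 ** n) for n in lengths)

def prefix_code_from_lengths(lengths):
    """Canonical binary prefix code whose i-th codeword has length lengths[i]."""
    if any(n < 1 for n in lengths):
        raise ValueError("codeword lengths must be at least 1")
    if kraft_sum(lengths) > 1:
        raise ValueError("lengths violate the Kraft inequality")
    codewords = [""] * len(lengths)
    value, prev_len = 0, 0
    for i in sorted(range(len(lengths)), key=lambda k: lengths[k]):
        value <<= lengths[i] - prev_len  # extend the running value to the new length
        codewords[i] = format(value, "0{}b".format(lengths[i]))
        value += 1                       # next unused codeword of this length
        prev_len = lengths[i]
    return codewords

print(prefix_code_from_lengths([1, 2, 3, 3]))  # ['0', '10', '110', '111']
print(prefix_code_from_lengths([1, 3, 3, 3]))  # ['0', '100', '101', '110']
```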
Xj Code 1 Code 2 Code 3 Code 4
X1 00 0 0 0
X2 01 10 11 100
X3 10 11 100 110
X4 11 110 110 111
For code 1: $\sum_{i=1}^{4} 2^{-n_i} = 2^{-2} + 2^{-2} + 2^{-2} + 2^{-2} = 1 \le 1$
Hence it satisfies the Kraft inequality.
For code 2: $\sum_{i=1}^{4} 2^{-n_i} = 2^{-1} + 2^{-2} + 2^{-2} + 2^{-3} = \frac{9}{8} > 1$
Hence it does not satisfy the Kraft inequality.
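For code 3: $\sum_{i=1}^{4} 2^{-n_i} = 2^{-1} + 2^{-2} + 2^{-3} + 2^{-3} = 1 \le 1$
Hence it satisfies the Kraft inequality. Note that code 3 itself is not prefix free (11 is a prefix of 110); the inequality only guarantees that some instantaneous code with these lengths exists.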
For code 4: $\sum_{i=1}^{4} 2^{-n_i} = 2^{-1} + 2^{-3} + 2^{-3} + 2^{-3} = \frac{7}{8} \le 1$
Hence it satisfies the Kraft inequality.
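The four Kraft sums can be verified with exact rational arithmetic. A minimal sketch using Python's `fractions.Fraction`; the dictionary `CODES` holds the four codes of the table above, and `kraft_sum` is an illustrative helper name.

```python
from fractions import Fraction

CODES = {
    1: ["00", "01", "10", "11"],
    2: ["0", "10", "11", "110"],
    3: ["0", "11", "100", "110"],
    4: ["0", "100", "110", "111"],
}

def kraft_sum(codewords):
    """Sum of 2^-n_i over the codeword lengths n_i, as an exact fraction."""
    return sum(Fraction(1, 2 ** len(w)) for w in codewords)

for n, words in CODES.items():
    s = kraft_sum(words)
    print(n, s, "satisfies" if s <= 1 else "violates")
# 1 1 satisfies
# 2 9/8 violates
# 3 1 satisfies
# 4 7/8 satisfies
```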