Chapter 3:
Compression Coding
Lectures 10-11
Variable length codes
Assume that there is no channel noise: source coding
Define a source S with symbols $s_1, \dots, s_q$ and probabilities $p_1, \dots, p_q$, and a code C with codewords $c_1, \dots, c_q$ of lengths $\ell_1, \dots, \ell_q$ and radix r.
A code C is
- uniquely decodable (UD) if it can always be decoded unambiguously,
- instantaneous if no codeword is the prefix of another codeword; such a code is an I-code.
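To make the prefix condition concrete, here is a small Python sketch (mine, not from the notes; the function name is arbitrary) that tests whether a set of codewords forms an I-code:

def is_instantaneous(codewords):
    """Return True if no codeword is a prefix of another (I-code test)."""
    return not any(c != d and d.startswith(c)
                   for c in codewords for d in codewords)

print(is_instantaneous(["0", "10", "110", "1110", "11110", "11111"]))  # True (comma code)
print(is_instantaneous(["0", "01", "11", "00"]))  # False: 0 is a prefix of 01 and 00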
Example
Morse code is an I-code (due to the stop p).
[Morse code table omitted: codewords for A-Z and 0-9, each ending with the stop p; sample letter frequencies include E 12.5%, T 9.25%, Q 0.11%.]
(See Appendix 1 for full list)
Example
The standard comma code of length 5 is
s1: c1 = 0
s2: c2 = 10
s3: c3 = 110
s4: c4 = 1110
s5: c5 = 11110
s6: c6 = 11111
Example
Consider the code C:
s1: c1 = 0
s2: c2 = 01
s3: c3 = 11
s4: c4 = 00
This code is not UD: for instance, 00 decodes both as c4 and as c1 c1.
Example
Consider the code C:
s1: c1 = 0
s2: c2 = 01
s3: c3 = 011
s4: c4 = 0111
s5: c5 = 1111
This code is not instantaneous, since c1 = 0 is a prefix of c2 = 01. It is, however, UD: each 0 starts a new codeword, so the codeword boundaries can be recovered by looking ahead.
Example
Consider the code C:
s1: c1 = 0
s2: c2 = 100
s3: c3 = 1011
s4: c4 = 110
s5: c5 = 111
No codeword is a prefix of another, so C is an I-code.
The comma code
c1 = 0, c2 = 10, c3 = 110, c4 = 1110, c5 = 11110, c6 = 11111
is likewise instantaneous. [Decision tree omitted: from the root, branch 0 leads to s1 and branch 1 continues; after i ones, a 0 leads to s_{i+1}, and five ones give s6.]
Example
Consider the block code C:
s1: c1 = 00
s2: c2 = 01
s3: c3 = 10
s4: c4 = 11
Since all codewords have equal length, no codeword can be a proper prefix of another, so every block code is instantaneous. [Decision tree omitted.]
Example
Consider again the code C with c1 = 0, c2 = 100, c3 = 1011, c4 = 110, c5 = 111. [Decision tree omitted: 0 leads to s1; 10 branches to s2 (100) and s3 (1011); 11 branches to s4 (110) and s5 (111).]
Example
Consider the radix 3 code C:
s1: c1 = 00
s2: c2 = 01
s3: c3 = 02
s4: c4 = 1
s5: c5 = 20
s6: c6 = 21
An equivalent code, obtained by relabelling the branches of the decision tree, is
10, 11, 12, 0, 21, 20.
[Decision tree omitted.]
Example
Is there a radix 2 UD-code with codeword lengths 1, 2, 2, 3?
No, by the Kraft-McMillan Theorem:
$$K = \frac{1}{2^1} + \frac{1}{2^2} + \frac{1}{2^2} + \frac{1}{2^3} = \frac{9}{8} > 1.$$
Example
Is there a radix 3 I-code with codeword lengths 1, 2, 2, 2, 2, 3?
Yes, by the Kraft-McMillan Theorem:
$$K = \frac{1}{3^1} + \frac{1}{3^2} + \frac{1}{3^2} + \frac{1}{3^2} + \frac{1}{3^2} + \frac{1}{3^3} = \frac{22}{27} \le 1.$$
For instance,
s1: c1 = 0
s2: c2 = 10
s3: c3 = 11
s4: c4 = 12
s5: c5 = 20
s6: c6 = 210
[Decision tree omitted.]
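Such checks amount to computing the Kraft sum; a small sketch (mine, using exact fractions) reproduces both examples:

from fractions import Fraction

def kraft_sum(lengths, radix):
    # Kraft sum K = sum over codeword lengths l of r^(-l)
    return sum(Fraction(1, radix ** l) for l in lengths)

print(kraft_sum([1, 2, 2, 3], 2))        # 9/8 > 1: no radix 2 UD-code
print(kraft_sum([1, 2, 2, 2, 2, 3], 3))  # 22/27 <= 1: a radix 3 I-code exists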
Proof
That every I-code is UD is trivial; we must show that every UD-code satisfies $K \le 1$ and that, conversely, lengths with $K \le 1$ admit an I-code.
Suppose that a radix r UD-code has codeword lengths $\ell_1 \le \ell_2 \le \dots \le \ell_q$, and let $K = \sum_{i=1}^{q} r^{-\ell_i}$. Note that
$$K^n = \Big( \sum_{i=1}^{q} \frac{1}{r^{\ell_i}} \Big)^n = \sum_{j=1}^{n \ell_q} \frac{N_j}{r^j},$$
where $N_j$ is the number of strings of n codewords whose lengths sum to j. For instance, for lengths 2 and 3 and n = 3,
$$K^3 = \frac{1}{r^{2+2+2}} + \frac{1}{r^{2+2+3}} + \frac{1}{r^{2+3+2}} + \frac{1}{r^{3+2+2}} + \frac{1}{r^{2+3+3}} + \frac{1}{r^{3+2+3}} + \frac{1}{r^{3+3+2}} + \frac{1}{r^{3+3+3}} = \frac{1}{r^6} + \frac{3}{r^7} + \frac{3}{r^8} + \frac{1}{r^9}.$$
Since the code is UD, distinct strings of n codewords give distinct strings of j code symbols, so $N_j \le r^j$. Hence
$$K^n = \sum_{j=1}^{n \ell_q} \frac{N_j}{r^j} \le \sum_{j=1}^{n \ell_q} \frac{r^j}{r^j} = \sum_{j=1}^{n \ell_q} 1 = n \ell_q.$$
If K > 1, then $K^n$ grows exponentially in n while $n \ell_q$ grows only linearly, a contradiction. Hence $K \le 1$.
Conversely, suppose that $K = \sum_{j=1}^{q} 2^{-\ell_j} \le 1$ for lengths $\ell_1 \le \ell_2 \le \dots \le \ell_q$ (radix 2 shown; general radix r is analogous). Set $c_1 = 0 \cdots 0$ ($\ell_1$ zeros) and, for $i \ge 2$, set
$$c_i = c_{i1} c_{i2} \cdots c_{i \ell_i},$$
where the binary digits $c_{i1}, c_{i2}, \dots, c_{i \ell_i}$ satisfy
$$\sum_{j=1}^{i-1} \frac{1}{2^{\ell_j}} = \sum_{k=1}^{\ell_i} \frac{c_{ik}}{2^k} = \frac{c_{i1}}{2} + \frac{c_{i2}}{2^2} + \dots + \frac{c_{i \ell_i}}{2^{\ell_i}};$$
that is, $c_i$ is the $\ell_i$-digit binary expansion of the partial Kraft sum (which needs no more than $\ell_i$ digits, since $\ell_j \le \ell_i$ for all $j < i$, and is less than 1, since $K \le 1$).
For instance, for $\ell_1 = 2$, $\ell_2 = 3$, $\ell_3 = 3$, $\ell_4 = 4$:
c1 = 00
c2 = 010, since $\sum_{j=1}^{1} 2^{-\ell_j} = \frac{1}{4} = \frac{0}{2} + \frac{1}{2^2} + \frac{0}{2^3}$
c3 = 011, since $\frac{1}{4} + \frac{1}{8} = \frac{0}{2} + \frac{1}{2^2} + \frac{1}{2^3}$
c4 = 1000, since $\frac{1}{4} + \frac{1}{8} + \frac{1}{8} = \frac{1}{2}$
This code is instantaneous. Suppose, to the contrary, that $c_u$ is a prefix of $c_v$ for some $u < v$. Then the expansions of $\sum_{j=1}^{u-1} 2^{-\ell_j}$ and $\sum_{j=1}^{v-1} 2^{-\ell_j}$ agree in their first $\ell_u$ digits, so
$$\frac{1}{2^{\ell_u}} \le \sum_{j=u}^{v-1} \frac{1}{2^{\ell_j}} = \sum_{k=\ell_u+1}^{\ell_v} \frac{c_{vk}}{2^k} < \sum_{k=\ell_u+1}^{\infty} \frac{1}{2^k} = \frac{1}{2^{\ell_u}}.$$
This is a contradiction, so the proof is finished.
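The construction in the second half of the proof is mechanical; here is a sketch (mine) that builds the binary I-code from a sorted list of lengths:

from fractions import Fraction

def kraft_code(lengths):
    """Build a binary I-code from sorted lengths with Kraft sum <= 1.

    Codeword i is the l_i-digit binary expansion of the partial
    Kraft sum 2^-l_1 + ... + 2^-l_(i-1), as in the proof above.
    """
    assert lengths == sorted(lengths)
    assert sum(Fraction(1, 2 ** l) for l in lengths) <= 1
    codewords, partial = [], Fraction(0)
    for l in lengths:
        # the numerator of the partial sum over denominator 2^l gives the digits
        codewords.append(format(int(partial * 2 ** l), "b").zfill(l))
        partial += Fraction(1, 2 ** l)
    return codewords

print(kraft_code([2, 3, 3, 4]))  # ['00', '010', '011', '1000'], as in the example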
Chapter 3:
Compression Coding
Lecture 12
Recall the setting: a source S with symbols $s_1, \dots, s_q$ and probabilities $p_1, \dots, p_q$, and a code C with codewords $c_1, \dots, c_q$ of lengths $\ell_1, \dots, \ell_q$ and radix r.
The average codeword length and the variance of C are
$$L = \sum_{i=1}^{q} p_i \ell_i \qquad\text{and}\qquad V = \sum_{i=1}^{q} p_i \ell_i^2 - L^2.$$
A code is minimal with respect to $p_1, \dots, p_q$ if no code of the same radix has smaller average length L.
Example
A code C has the codewords 0, 10, 11 with probabilities 1/2, 1/4, 1/4.
Its average length and variance are
$$L = \frac{1}{2} \cdot 1 + \frac{1}{4} \cdot 2 + \frac{1}{4} \cdot 2 = \frac{3}{2},$$
$$V = \frac{1}{2} \cdot 1^2 + \frac{1}{4} \cdot 2^2 + \frac{1}{4} \cdot 2^2 - L^2 = \frac{5}{2} - \Big(\frac{3}{2}\Big)^2 = \frac{1}{4}.$$
It is easy to see that C is minimal with respect to 1/2, 1/4, 1/4.
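These computations are mechanical; a short sketch (mine) for checking them:

def average_length_and_variance(lengths, probs):
    """Average codeword length L = sum(p*l) and variance V = sum(p*l^2) - L^2."""
    L = sum(p * l for p, l in zip(probs, lengths))
    V = sum(p * l ** 2 for p, l in zip(probs, lengths)) - L ** 2
    return L, V

print(average_length_and_variance([1, 2, 2], [0.5, 0.25, 0.25]))  # (1.5, 0.25)
print(average_length_and_variance([2, 1, 2], [0.5, 0.25, 0.25]))  # L = 1.75: not minimal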
Example
A code C has the codewords 10, 0, 11 with probabilities 1/2, 1/4, 1/4.
Its average length is
$$L = \frac{1}{2} \cdot 2 + \frac{1}{4} \cdot 1 + \frac{1}{4} \cdot 2 = \frac{7}{4} > \frac{3}{2}.$$
We see that C is not minimal with respect to 1/2, 1/4, 1/4.
Theorem
If a binary UD-code has minimal average length L with respect to $p_1, \dots, p_q$, then, possibly after permuting codewords of equally likely symbols,
1. $\ell_1 \le \ell_2 \le \dots \le \ell_q$
2. The code may be assumed to be instantaneous.
3. $K = \sum_{i=1}^{q} 2^{-\ell_i} = 1$
4. $\ell_{q-1} = \ell_q$
5. $c_{q-1}$ and $c_q$ differ only in their last place.
Proof
1. Suppose that $p_m > p_n$ and $\ell_m > \ell_n$. Swapping $c_m$ and $c_n$ gives a new code with smaller L, a contradiction.
2. Use the Kraft-McMillan Theorem.
3. If K < 1, then the code can be shortened, reducing L, a contradiction.
4. We know that $\ell_{q-1} \le \ell_q$. If $\ell_{q-1} < \ell_q$, then there must be nodes in the decision tree where no choice is made, implying K < 1, a contradiction.
5. The tree must end with a simple fork whose two leaves are $s_{q-1}$ and $s_q$. [Figure omitted.]
Example
In the place-low strategy, we place the combined symbol $s_{a,b}$ as low as possible.
Consider a source s1, . . . , s6 with probabilities 0.3, 0.2, 0.2, 0.1, 0.1, 0.1.
[Huffman tree omitted.] The resulting codewords are
s1: 00
s2: 10
s3: 11
s4: 011
s5: 0100
s6: 0101
$$L = 0.3 \cdot 2 + 0.2 \cdot 2 + 0.2 \cdot 2 + 0.1 \cdot 3 + 0.1 \cdot 4 + 0.1 \cdot 4 = 2.5$$
$$V = 0.3 \cdot 2^2 + 0.2 \cdot 2^2 + 0.2 \cdot 2^2 + 0.1 \cdot 3^2 + 0.1 \cdot 4^2 + 0.1 \cdot 4^2 - L^2 = 0.65$$
Example
In the place-high strategy, we place the combined symbol $s_{a,b}$ as high as possible.
Consider a source s1, . . . , s6 with probabilities 0.3, 0.2, 0.2, 0.1, 0.1, 0.1.
[Huffman tree omitted.] The resulting codewords are
s1: 01
s2: 11
s3: 000
s4: 001
s5: 100
s6: 101
$$L = 0.3 \cdot 2 + 0.2 \cdot 2 + 0.2 \cdot 3 + 0.1 \cdot 3 + 0.1 \cdot 3 + 0.1 \cdot 3 = 2.5$$
$$V = 0.3 \cdot 2^2 + 0.2 \cdot 2^2 + 0.2 \cdot 3^2 + 0.1 \cdot 3^2 + 0.1 \cdot 3^2 + 0.1 \cdot 3^2 - L^2 = 0.25$$
The average length is the same as for the place-low strategy, but the variance is smaller. It turns out that this is always the case, so we will only use the place-high strategy.
The Huffman Code Theorem
For any given source S and corresponding probabilities, the Huffman algorithm yields an instantaneous UD-code of minimal average length.
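The algorithm itself is short; below is a sketch of binary Huffman coding (mine, not the notes' pseudocode). Its tie-breaking is arbitrary rather than place-high, so codeword lengths may differ from the trees above, but the average length is the same:

import heapq
from itertools import count

def huffman(probs):
    """Binary Huffman code: repeatedly merge the two least likely nodes."""
    tick = count()  # tie-breaker so the heap never compares the leaf lists
    heap = [(p, next(tick), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    codes = [""] * len(probs)
    while len(heap) > 1:
        p1, _, leaves1 = heapq.heappop(heap)
        p2, _, leaves2 = heapq.heappop(heap)
        for i in leaves1:  # prepend a branch digit to everything under each node
            codes[i] = "0" + codes[i]
        for i in leaves2:
            codes[i] = "1" + codes[i]
        heapq.heappush(heap, (p1 + p2, next(tick), leaves1 + leaves2))
    return codes

print(huffman([0.3, 0.2, 0.2, 0.1, 0.1, 0.1]))  # average length 2.5, as above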
Chapter 3:
Compression Coding
Lectures 13-14
The Huffman Code Theorem
For any given source S and corresponding probabilities, the Huffman algorithm yields an instantaneous UD-code of minimal average length.
Proof
We proceed by induction on q = |S|. For q = 2, each Huffman code is an instantaneous UD-code with minimum average length L = 1.
Now assume that each Huffman code on q − 1 symbols is an instantaneous UD-code with minimum average length.
Let C be a Huffman code on q symbols with average length L, and let C′ be any UD-code on q symbols with minimal average length L′. Denote the codeword lengths of C and C′ by $\ell_1, \dots, \ell_q$ and $\ell'_1, \dots, \ell'_q$.
By construction, $c_q$ and $c_{q-1}$ in C differ only in their last place. By minimality, C′ has codewords $c'_q$, $c'_{q-1}$ differing only in their last place.
Combine $c_q$ and $c_{q-1}$ in C to get a Huffman code on q − 1 symbols, and combine $c'_q$ and $c'_{q-1}$ in C′ to get a UD-code on q − 1 symbols. Denote the average lengths of these codes by M and M′, respectively. By the induction hypothesis, $M \le M'$; so, using $\ell_{q-1} = \ell_q$ and $\ell'_{q-1} = \ell'_q$,
$$L - L' = \sum_{i=1}^{q} \ell_i p_i - \sum_{i=1}^{q} \ell'_i p_i = \Big( \sum_{i=1}^{q-2} \ell_i p_i + \ell_q (p_{q-1} + p_q) \Big) - \Big( \sum_{i=1}^{q-2} \ell'_i p_i + \ell'_q (p_{q-1} + p_q) \Big) = (M + p_{q-1} + p_q) - (M' + p_{q-1} + p_q) = M - M' \le 0.$$
Hence $L \le L'$, so C has minimal average length.
Theorem (Knuth)
The average codeword length L of each Huffman code is the sum of all child node probabilities.
For the place-high code above (codewords 01, 11, 000, 001, 100, 101), the child nodes of the Huffman tree carry the probabilities 1.0, 0.6, 0.4, 0.3, 0.2, and indeed
$$L = 0.3 \cdot 2 + 0.2 \cdot 2 + 0.2 \cdot 3 + 0.1 \cdot 3 + 0.1 \cdot 3 + 0.1 \cdot 3 = 2.5,$$
$$L = 1.0 + 0.6 + 0.4 + 0.3 + 0.2 = 2.5.$$
[Huffman tree omitted.]
Proof
The tree-path for symbol $s_i$ passes through exactly $\ell_i$ child nodes, and $p_i$ occurs as part of the sum in each of these child nodes. So adding all child node probabilities adds $\ell_i$ copies of $p_i$ for each $s_i$; this is L.
Example
Consider a source s1, . . . , s6 with probabilities 0.3, 0.2, 0.2, 0.1, 0.1, 0.1.
A radix r Huffman code needs $q \equiv 1 \pmod{r-1}$ symbols; with radix r = 3 we have $6 \equiv 0 \pmod 2$, so we need to add 1 dummy symbol of probability 0.
[Huffman tree omitted.] The resulting codewords are
s1: 1
s2: 00
s3: 01
s4: 02
s5: 20
s6: 21
plus a codeword for the dummy symbol, which is never used.
L = 1.0 + 0.5 + 0.2 = 1.7 (by Knuth's theorem)
$$V = 0.3 \cdot 1^2 + 0.2 \cdot 2^2 + 0.2 \cdot 2^2 + 0.1 \cdot 2^2 + 0.1 \cdot 2^2 + 0.1 \cdot 2^2 - L^2 = 0.21$$
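The dummy-symbol count follows from the requirement $q \equiv 1 \pmod{r-1}$; a one-liner (mine) for it:

def dummy_symbols(q, r):
    # number of probability-0 symbols to add so that q + d = 1 (mod r - 1)
    return (1 - q) % (r - 1)

print(dummy_symbols(6, 3))  # 1 dummy symbol, as in the example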
Extensions
Given a source S = {s1, . . . , sq} with associated probabilities p1, . . . , pq, the nth extension is the Cartesian product
$$S^n = S \times \cdots \times S = \{ \sigma_1 \cdots \sigma_n : \sigma_1, \dots, \sigma_n \in S \},$$
with $q^n$ symbols, where the probability of a string is the product of the probabilities of its symbols.
Example
Let S = {a, b} with probabilities 3/4, 1/4, and Huffman-code S, S² and S³:

S¹ = S:  a: 3/4, c = 0;  b: 1/4, c = 1
S²:  aa: 9/16, c = 0;  ab: 3/16, c = 11;  ba: 3/16, c = 100;  bb: 1/16, c = 101
S³:  aaa: 27/64, c = 1;  aab: 9/64, c = 001;  aba: 9/64, c = 010;  baa: 9/64, c = 011;
     abb: 3/64, c = 00000;  bab: 3/64, c = 00001;  bba: 3/64, c = 00010;  bbb: 1/64, c = 00011

The average lengths per source symbol are
$$L^{(1)} = L = 1, \qquad \frac{L^{(2)}}{2} = \frac{27/16}{2} = \frac{27}{32} \approx 0.84, \qquad \frac{L^{(3)}}{3} = \frac{158/64}{3} = \frac{158}{192} \approx 0.82.$$
Coding longer extensions thus compresses the source further.
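A sketch (mine) that recomputes these per-symbol averages, using Knuth's observation that L is the sum of all merged node probabilities:

import heapq
import math
from itertools import product

def huffman_average_length(probs):
    """Average length of a binary Huffman code via Knuth's theorem:
    L is the sum of all merged (child) node probabilities."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        p = heapq.heappop(heap) + heapq.heappop(heap)
        total += p
        heapq.heappush(heap, p)
    return total

base = {"a": 0.75, "b": 0.25}
for n in (1, 2, 3):
    # probability of a string in S^n is the product of its symbols' probabilities
    ext = [math.prod(base[s] for s in w) for w in product("ab", repeat=n)]
    print(n, huffman_average_length(ext) / n)  # 1.0, 0.84375, 0.8229...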
Markov sources
A k-memory source S is one whose symbols each depend on the previous k.
If k = 0, then no symbol depends on any other, and S is memoryless.
If k = 1, then S is a Markov source.
pij = P (si |sj ) is the probability of si occurring right after a given sj .
The matrix M = (pij ) is the transition matrix.
Entry pij is the probability of getting from state sj to state si .
A Markov process is a set of states (the source S)
and probabilities pij = P (si |sj ) of getting from state sj to state si .
Example
Consider Sydney, Melbourne, and Elsewhere in Australia. A simple Markov model for their populations assumes that net population growth from births, deaths, and emigration/immigration is 0%, and that each year:
- of people living in Sydney, 5% move to Melbourne and 3% move Elsewhere;
- of people living in Melbourne, 4% move to Sydney and 2% move Elsewhere;
- of people living Elsewhere, 7% move to Sydney and 6% move to Melbourne.
S = {Sydney, Melbourne, Elsewhere}, and, with columns indexed by From (S, M, E) and rows by To,
$$M = \begin{pmatrix} 0.92 & 0.04 & 0.07 \\ 0.05 & 0.94 & 0.06 \\ 0.03 & 0.02 & 0.87 \end{pmatrix}.$$
[Transition diagram omitted.]
Lemma
The sum of entries in any column of M is 1.
Let $x_k = (s_k, m_k, e_k)^T$ denote the population distribution after k years. Then
$$x_{k+1} = M x_k \qquad\text{and}\qquad x_k = M^k x_0.$$
Suppose that the initial population distribution is $x_0 = (4.5\text{M}, 4\text{M}, 14\text{M})^T$. After k = 20 years, the population distribution is then
$$x_{20} = M^{20} x_0 = \begin{pmatrix} 0.41 & 0.34 & 0.38 \\ 0.42 & 0.52 & 0.44 \\ 0.16 & 0.15 & 0.19 \end{pmatrix} \begin{pmatrix} 4.5\text{M} \\ 4\text{M} \\ 14\text{M} \end{pmatrix} \approx \begin{pmatrix} 8.5\text{M} \\ 10.1\text{M} \\ 4.0\text{M} \end{pmatrix}.$$
Note that S and $M^{20}$ also form a Markov chain. E.g., after 20 years, most people will have left Sydney (41% remain), whereas most people will have stayed in Melbourne (52%).
To find a stable population distribution, we need to find a state $x_0$ for which $x_k = x_{k-1} = \dots = x_1 = x_0$; that is, $M x_0 = x_0$. In other words, we need an eigenvector $x_0$ of M for the eigenvalue 1; e.g., $x_0 = (0.6, 0.76, 0.26)^T$. If we want an eigenvector with actual population numbers, then we must scale by the total population:
$$x_0 = \frac{4.5\text{M} + 4\text{M} + 14\text{M}}{0.6 + 0.76 + 0.26} \begin{pmatrix} 0.6 \\ 0.76 \\ 0.26 \end{pmatrix} \approx \begin{pmatrix} 8.3\text{M} \\ 10.6\text{M} \\ 3.6\text{M} \end{pmatrix}.$$
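These computations are easy to reproduce; a sketch (assuming numpy) for the example above:

import numpy as np

# Transition matrix of the Sydney/Melbourne/Elsewhere example (columns = "from")
M = np.array([[0.92, 0.04, 0.07],
              [0.05, 0.94, 0.06],
              [0.03, 0.02, 0.87]])
x0 = np.array([4.5, 4.0, 14.0])  # initial populations in millions

print(np.linalg.matrix_power(M, 20) @ x0)  # population after 20 years

# Equilibrium: eigenvector of M for eigenvalue 1, scaled to the total population
w, v = np.linalg.eig(M)
p = np.real(v[:, np.argmin(np.abs(w - 1))])
print(p / p.sum() * x0.sum())  # approximately (8.3, 10.6, 3.6)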
A Markov process M is in equilibrium p if p = M p. In this case, p is an eigenvector of M for the eigenvalue 1.
We will assume that
- M is ergodic: we can get from any state j to any state i;
- M is aperiodic: the gcd of the cycle lengths is 1.
Theorem
Under the above assumptions, M has a non-zero equilibrium state.
We will only consider equilibria p with |p| = 1.
Chapter 3:
Compression Coding
Lecture 15
Huffman coding for stationary Markov sources
Consider a Markov source S = {s1, . . . , sq} with probabilities p1, . . . , pq, transition matrix M, and equilibrium p.
Define
- HuffE: the binary Huffman code on p (ordered);
- Huff(i): the binary Huffman code on the (ordered) ith column of M;
- HuffM: the first symbol of a message is encoded by HuffE; every later symbol is encoded by Huff(i), where si is the symbol immediately before it.
This gives average lengths $L_E$ and $L^{(1)}, \dots, L^{(q)}$, and
$$L_M = p_1 L^{(1)} + \dots + p_q L^{(q)}.$$
Importantly, $L_M \le L_E$.
Example
The transition matrix
$$M = \begin{pmatrix} 0.3 & 0.1 & 0.1 \\ 0.5 & 0.1 & 0.55 \\ 0.2 & 0.8 & 0.35 \end{pmatrix} \qquad\text{has equilibrium}\qquad p = \frac{1}{8} \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix}.$$
The four Huffman codes and their average lengths are:

HuffE, on p = (1/8, 3/8, 1/2):            s1 = 01, s2 = 00, s3 = 1;   L_E = 1.5
Huff(1), on column 1 = (0.3, 0.5, 0.2):   s1 = 01, s2 = 1, s3 = 00;   L(1) = 1.5
Huff(2), on column 2 = (0.1, 0.1, 0.8):   s1 = 10, s2 = 11, s3 = 0;   L(2) = 1.2
Huff(3), on column 3 = (0.1, 0.55, 0.35): s1 = 11, s2 = 0, s3 = 10;   L(3) = 1.45

$$L_M = \frac{1}{8} L^{(1)} + \frac{3}{8} L^{(2)} + \frac{1}{2} L^{(3)} \approx 1.36 < L_E = 1.5$$
Therefore, compared to a 2-bit block code C, this Huffman code compresses the message length to $L_M / L_C \approx 1.36 / 2 = 68\%$.
Example
Let us encode and decode the message s1 s2 s3 s3 s2 s1 s2. Each symbol is encoded with the code determined by its predecessor:

symbol:         s1     s2       s3       s3       s2       s1       s2
code to use:    HuffE  Huff(1)  Huff(2)  Huff(3)  Huff(3)  Huff(2)  Huff(1)
encoded symbol: 01     1        0        10       0        10       1

Decoding reverses this: the first codeword is decoded with HuffE, and each decoded symbol si selects Huff(i) for the next codeword:

encoded symbol: 01  1   0   10  0   10  1
decoded symbol: s1  s2  s3  s3  s2  s1  s2
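A sketch (mine) of encoding and decoding with per-state Huffman codes, using the codeword tables as reconstructed in the example above:

HUFF_E = {"s1": "01", "s2": "00", "s3": "1"}
HUFF = {
    "s1": {"s1": "01", "s2": "1", "s3": "00"},   # Huff(1)
    "s2": {"s1": "10", "s2": "11", "s3": "0"},   # Huff(2)
    "s3": {"s1": "11", "s2": "0", "s3": "10"},   # Huff(3)
}

def encode(message):
    """First symbol via HuffE, then the code of the preceding symbol."""
    bits, prev = [], None
    for sym in message:
        table = HUFF_E if prev is None else HUFF[prev]
        bits.append(table[sym])
        prev = sym
    return "".join(bits)

def decode(bits):
    out, buf = [], ""
    table = {v: k for k, v in HUFF_E.items()}
    for b in bits:
        buf += b
        if buf in table:  # prefix-freeness: the first match is the codeword
            out.append(table[buf])
            table = {v: k for k, v in HUFF[out[-1]].items()}
            buf = ""
    return out

msg = ["s1", "s2", "s3", "s3", "s2", "s1", "s2"]
print(encode(msg))                 # 0110100101
print(decode(encode(msg)) == msg)  # True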
Compression Coding
- Huffman coding
- Huffman coding of extensions
- Huffman coding of Markov sources
- Arithmetic coding
- Dictionary methods
- Lossy compression
- and much more
Arithmetic coding
Arithmetic coding encodes an entire message as a single number in [0, 1) by nesting subintervals; the following examples show encoding and decoding.
Example
Consider symbols s1, s2, s3, s4 = • (the stop symbol) with probabilities 0.4, 0.3, 0.15, 0.15, so that [0, 1) splits into [0, .4), [.4, .7), [.7, .85), [.85, 1).
Let us encode the message s2 s1 s3 •:

step     | subinterval start         | subinterval width
(start)  | 0                         | 1
s2       | 0 + .4 · 1 = .4           | .3 · 1 = .3
s1       | .4 + 0 · .3 = .4          | .4 · .3 = .12
s3       | .4 + .7 · .12 = .484      | .15 · .12 = .018
•        | .484 + .85 · .018 = .4993 | .15 · .018 = .0027

Any number in the final subinterval [.4993, .502) encodes the message; for instance, 0.5.
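A sketch (mine) of the encoder from this example; INTERVALS maps each symbol to (subinterval start, probability):

INTERVALS = {"s1": (0.0, 0.4), "s2": (0.4, 0.3), "s3": (0.7, 0.15), "stop": (0.85, 0.15)}

def arith_encode(message):
    """Narrow [start, start+width) once per symbol; any point of the result works."""
    start, width = 0.0, 1.0
    for sym in message + ["stop"]:
        lo, p = INTERVALS[sym]
        start, width = start + lo * width, p * width
    return start, width

start, width = arith_encode(["s2", "s1", "s3"])
print(start, start + width)  # [0.4993, 0.502): e.g. 0.5 encodes the message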
Example
Consider symbols s1, s2, s3, s4 = • with probabilities 0.4, 0.3, 0.15, 0.15, with subintervals [0, .4), [.4, .7), [.7, .85), [.85, 1).
Let us decode the number 0.5:

number                         | in interval | decoded symbol
0.5                            | [.4, .7)    | s2
(0.5 − 0.4)/.3 = 0.33333       | [0, .4)     | s1
(0.33333 − 0)/.4 = 0.83333     | [.7, .85)   | s3
(0.83333 − 0.7)/.15 = 0.88889  | [.85, 1)    | • (stop: decoding ends)

The decoded message is s2 s1 s3.
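The matching decoder sketch (mine), reusing the same INTERVALS table:

INTERVALS = {"s1": (0.0, 0.4), "s2": (0.4, 0.3), "s3": (0.7, 0.15), "stop": (0.85, 0.15)}

def arith_decode(x):
    """Repeatedly find the subinterval containing x and rescale it to [0, 1)."""
    out = []
    while True:
        for sym, (lo, p) in INTERVALS.items():
            if lo <= x < lo + p:
                if sym == "stop":
                    return out
                out.append(sym)
                x = (x - lo) / p  # rescale the chosen subinterval to [0, 1)
                break

print(arith_decode(0.5))  # ['s2', 's1', 's3']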
Dictionary methods
LZ77, LZ78, LZW, and others; used, for instance, in gzip, gif, and ps.
LZ78
Input: a message r = r1 · · · rn
Output: the message encoded, given by a dictionary
Algorithm:
- Begin with an empty dictionary D.
- Find the longest prefix s of r that is in D (possibly the empty word), say at entry ℓ; the empty word has entry 0.
- Find the symbol c just after s.
- Append sc to D, remove sc from r, and output (ℓ, c).
- Repeat in this way until the whole message has been encoded.
Example
Let us encode the message abbcbcababcaa:

remaining message | longest prefix s in D | output (ℓ, c) | new dictionary entry
abbcbcababcaa     | (empty)               | (0, a)        | 1: a
bbcbcababcaa      | (empty)               | (0, b)        | 2: b
bcbcababcaa       | b                     | (2, c)        | 3: bc
bcababcaa         | bc                    | (3, a)        | 4: bca
babcaa            | b                     | (2, a)        | 5: ba
bcaa              | bca                   | (4, a)        | 6: bcaa
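An encoder sketch (mine) that reproduces this table's output column; it emits a final (index, "") pair if the message ends inside a known phrase, a case the worked example does not hit:

def lz78_encode(msg):
    """LZ78 sketch: emit (dictionary index, next symbol); index 0 is the empty word."""
    D, out = {}, []  # D maps phrase -> 1-based dictionary index
    while msg:
        s = ""
        while msg and s + msg[0] in D:  # greedily extend the longest known prefix
            s, msg = s + msg[0], msg[1:]
        if not msg:  # message ended inside a known phrase
            out.append((D[s], ""))
            break
        c, msg = msg[0], msg[1:]
        D[s + c] = len(D) + 1
        out.append((D[s] if s else 0, c))
    return out

print(lz78_encode("abbcbcababcaa"))
# [(0, 'a'), (0, 'b'), (2, 'c'), (3, 'a'), (2, 'a'), (4, 'a')]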
Example
Let us decode the message (0,c)(0,a)(2,a)(3,b)(4,c)(4,b):

output (ℓ, c) | decoded phrase | new dictionary entry
(0, c)        | c              | 1: c
(0, a)        | a              | 2: a
(2, a)        | aa             | 3: aa
(3, b)        | aab            | 4: aab
(4, c)        | aabc           | 5: aabc
(4, b)        | aabb           | 6: aabb

The decoded message is c a aa aab aabc aabb = caaaaabaabcaabb.
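The decoder is even simpler; a sketch (mine):

def lz78_decode(pairs):
    """Rebuild the dictionary on the fly: entry i is phrase(j) + c for pair (j, c)."""
    D = {0: ""}
    for i, (j, c) in enumerate(pairs, start=1):
        D[i] = D[j] + c
    return "".join(D[i] for i in range(1, len(pairs) + 1))

pairs = [(0, "c"), (0, "a"), (2, "a"), (3, "b"), (4, "c"), (4, "b")]
print(lz78_decode(pairs))  # caaaaabaabcaabb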