
Chapter 3:

Compression Coding

Lectures 10-11
Compression coding
Variable length codes
Assume that there is no channel noise: source coding.
Define
  a source S with symbols s1, ..., sq and probabilities p1, ..., pq
  a code C with codewords c1, ..., cq of lengths ℓ1, ..., ℓq and radix r

A code C is
uniquely decodeable (UD) if it can always be decoded unambiguously
instantaneous if no codeword is the prefix of another
Such a code is an I-code.
Example
Morse code is an I-code (due to the stop/pause symbol p that ends each codeword).
[Table of Morse codewords for the letters A-Z and the digits 0-9, with letter frequencies such as E 12.5%, T 9.25%, Q 0.11%; see Appendix 1 for the full list.]

Example
The standard comma code of length 5 is
  c1 = 0,  c2 = 10,  c3 = 110,  c4 = 1110,  c5 = 11110,  c6 = 11111
This code is an I-code.
Decode 1100111101101011111110 as s3 s1 s5 s3 s2 s6 s3.

Example
Consider the code C:
  c1 = 0,  c2 = 01,  c3 = 11,  c4 = 00
This code is not uniquely decodable since, for example,
0011 can be decoded both as s1 s1 s3 and as s4 s3.

Example
Consider the code C:
  c1 = 0,  c2 = 01,  c3 = 011,  c4 = 0111,  c5 = 1111
This code is uniquely decodable but is not instantaneous.


Example
Consider the code C:
  c1 = 00,  c2 = 01,  c3 = 10,  c4 = 11
This code is a block code and is thus an I-code.

Example
Consider the code C:
  c1 = 0,  c2 = 100,  c3 = 1011,  c4 = 110,  c5 = 111
This code is an I-code.
Decode 0011001011111 as s1 s1 s4 s1 s3 s5.
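Decoding an I-code can be mechanised by greedy matching against the codeword table, since no codeword is a prefix of another. A minimal Python sketch using the code from this example (the function and variable names are illustrative, not part of the lecture):

# Codeword table from the example above.
code = {"s1": "0", "s2": "100", "s3": "1011", "s4": "110", "s5": "111"}
decode_table = {cw: sym for sym, cw in code.items()}

def decode(bits):
    # Read bits until the buffer matches a codeword; prefix-freeness makes this unambiguous.
    symbols, buffer = [], ""
    for b in bits:
        buffer += b
        if buffer in decode_table:
            symbols.append(decode_table[buffer])
            buffer = ""
    if buffer:
        raise ValueError("message does not end on a codeword boundary")
    return symbols

print(decode("0011001011111"))  # ['s1', 's1', 's4', 's1', 's3', 's5']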

Decision trees can represent I-codes.

Example
The standard comma code of length 5 is
  c1 = 0,  c2 = 10,  c3 = 110,  c4 = 1110,  c5 = 11110,  c6 = 11111
[Decision tree: from each internal node, the branch labelled 0 leads to a leaf s1, ..., s5 in turn, and the branch labelled 1 continues to the next node; the final 1-branch leads to s6.]

Example
Consider the block code C:
  c1 = 00,  c2 = 01,  c3 = 10,  c4 = 11
[Decision tree: a complete binary tree of depth 2 with leaves s1, s2, s3, s4.]

Example
Consider the code C:
  c1 = 0,  c2 = 100,  c3 = 1011,  c4 = 110,  c5 = 111
[Decision tree: branch 0 from the root is the leaf s1; branch 1 leads to a subtree containing s2 (100), s3 (1011), s4 (110) and s5 (111).]

Decision trees can represent I-codes.


Branches are numbered from the top down.
Any radix r is allowed.
Two codes are equivalent if their decision trees are isomorphic.
By shuffling source symbols, we may assume that ℓ1 ≤ ℓ2 ≤ ... ≤ ℓq.
Example
Code:
  c1 = 00,  c2 = 01,  c3 = 02,  c4 = 1,  c5 = 20,  c6 = 21
Equivalent code:
  c1 = 10,  c2 = 11,  c3 = 12,  c4 = 0,  c5 = 21,  c6 = 20
[Both codes have isomorphic radix-3 decision trees: one branch from the root is the leaf s4, and the other two branches lead to internal nodes holding the remaining leaves s1, s2, s3 and s5, s6.]

The Kraft-McMillan Theorem
The following are equivalent:
1  There is a radix r UD-code with codeword lengths ℓ1 ≤ ℓ2 ≤ ... ≤ ℓq.
2  There is a radix r I-code with codeword lengths ℓ1 ≤ ℓ2 ≤ ... ≤ ℓq.
3  K = Σ_{i=1}^{q} (1/r)^ℓi ≤ 1.

Example
Is there a radix 2 UD-code with codeword lengths 1, 2, 2, 3?
No, by the Kraft-McMillan Theorem:
  (1/2)^1 + (1/2)^2 + (1/2)^2 + (1/2)^3 = 9/8 > 1

Example
Is there a radix 3 I-code with codeword lengths 1, 2, 2, 2, 2, 3?
Yes, by the Kraft-McMillan Theorem:
  (1/3)^1 + (1/3)^2 + (1/3)^2 + (1/3)^2 + (1/3)^2 + (1/3)^3 = 22/27 ≤ 1
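The Kraft sum itself is trivial to evaluate mechanically; a small sketch (the helper name is illustrative) reproducing both computations:

from fractions import Fraction

def kraft_sum(lengths, r):
    # K = sum over the codeword lengths of (1/r)^length, as an exact fraction
    return sum(Fraction(1, r**l) for l in lengths)

print(kraft_sum([1, 2, 2, 3], 2))        # 9/8  > 1: no radix-2 UD-code exists
print(kraft_sum([1, 2, 2, 2, 2, 3], 3))  # 22/27 <= 1: a radix-3 I-code exists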
For instance,
  c1 = 0,  c2 = 10,  c3 = 11,  c4 = 12,  c5 = 20,  c6 = 210
This is a standard I-code.
[Decision tree: branch 0 from the root is the leaf s1; branch 1 leads to the leaves s2, s3, s4; branch 2 leads to the leaf s5 and, one level deeper, to s6.]
Proof
2 ⟹ 1 is trivial. We now prove that 1 ⟹ 3.
Suppose that a radix r UD-code has codeword lengths ℓ1 ≤ ℓ2 ≤ ... ≤ ℓq.
Note that
  K^n = ( Σ_{i=1}^{q} (1/r)^ℓi )^n = Σ_j Nj / r^j
where Nj = |{ (i1, ..., in) : ℓi1 + ... + ℓin = j }| counts the n-tuples of codewords whose lengths sum to j.

For instance, if (ℓ1, ℓ2) = (2, 3) and n = 3, then
  K^3 = ( 1/r^2 + 1/r^3 )^3
      = 1/r^(2+2+2) + 1/r^(2+2+3) + 1/r^(2+3+2) + 1/r^(2+3+3)
        + 1/r^(3+2+2) + 1/r^(3+2+3) + 1/r^(3+3+2) + 1/r^(3+3+3)
      = 1/r^6 + 3/r^7 + 3/r^8 + 1/r^9
Here, N6 = N9 = 1 and N7 = N8 = 3, and Nj = 0 if j ≠ 6, 7, 8, 9.

Now, Nj counts the ways to write n-codeword messages of length j.
Since the code is UD, each such message can only be written in one way, so Nj ≤ r^j. Therefore,
  K^n = Σ_{j=1}^{n·ℓq} Nj / r^j ≤ Σ_{j=1}^{n·ℓq} r^j / r^j = Σ_{j=1}^{n·ℓq} 1 = n·ℓq
This inequality holds for all n = 1, 2, ...,
but if K > 1 the left-hand side is exponential in n whereas the right-hand side is linear.
We conclude that K ≤ 1.
We have proved that 2 ⟹ 1 and that 1 ⟹ 3.
To conclude the proof, let us also prove that 3 ⟹ 2 (just for r = 2).
Therefore, suppose that K ≤ 1; we wish to construct an I-code.
Set
  c1 = 0...0 (ℓ1 zeros)   and   c2 = 0...01 (ℓ1 digits, ending in 1) followed by ℓ2 − ℓ1 zeros.
For i ≥ 3, set
  ci = ci1 ci2 ... ciℓi
where the binary digits ci1, ci2, ..., ciℓi satisfy
  ci1/2 + ci2/2^2 + ... + ciℓi/2^ℓi = Σ_{k=1}^{ℓi} cik/2^k = Σ_{j=1}^{i−1} 1/2^ℓj
Such ci1, ..., ciℓi exist since Σ_{j=1}^{q} 1/2^ℓj = K ≤ 1.

For instance, for the lengths
  ℓ1 = 2   c1 = 00
  ℓ2 = 3   c2 = 010
  ℓ3 = 3   c3 = 011
  ℓ4 = 4   c4 = 1000
since, for c4, Σ_{j=1}^{3} 1/2^ℓj = 1/2^2 + 1/2^3 + 1/2^3 = 1/2 = 1/2^1 + 0/2^2 + 0/2^3 + 0/2^4.

These binary expansions are unique, so the code is UD.


Assume that the code is not instantaneous.
Then some cu is a prefix of some cv where u < v. Then ℓu ≤ ℓv, so
  Σ_{j=1}^{v−1} 1/2^ℓj − Σ_{j=1}^{u−1} 1/2^ℓj = Σ_{j=u}^{v−1} 1/2^ℓj ≥ 1/2^ℓu
However, since cu is a prefix of cv, we also have
  Σ_{j=1}^{v−1} 1/2^ℓj − Σ_{j=1}^{u−1} 1/2^ℓj = Σ_{k=1}^{ℓv} cvk/2^k − Σ_{k=1}^{ℓu} cuk/2^k
                                             = Σ_{k=ℓu+1}^{ℓv} cvk/2^k
                                             < Σ_{k=ℓu+1}^{∞} 1/2^k = 1/2^ℓu
This is a contradiction, so the proof is finished.
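The construction used in the last part of the proof is easy to run mechanically: each codeword ci is the binary expansion, to ℓi places, of the Kraft sum of the earlier lengths. A Python sketch, assuming the lengths are already sorted and r = 2 (the function name is illustrative):

from fractions import Fraction

def kraft_code(lengths):
    # Binary I-code built as in the proof: c_i is the binary expansion of
    # sum_{j<i} 2^(-l_j), written to l_i places (exact, since every earlier l_j <= l_i).
    assert lengths == sorted(lengths)
    assert sum(Fraction(1, 2**l) for l in lengths) <= 1
    codewords, total = [], Fraction(0)
    for l in lengths:
        codewords.append(format(int(total * 2**l), "b").zfill(l))
        total += Fraction(1, 2**l)
    return codewords

print(kraft_code([2, 3, 3, 4]))  # ['00', '010', '011', '1000'], as in the worked instance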

Chapter 3:

Compression Coding

Lecture 12
Define
  a source S with symbols s1, ..., sq and probabilities p1, ..., pq
  a code C with codewords c1, ..., cq of lengths ℓ1, ..., ℓq and radix r
By shuffling source symbols, we may assume that p1 ≥ p2 ≥ ... ≥ pq.


The (expected or) average length and variance of codewords in C are
  L = Σ_{i=1}^{q} pi·ℓi        V = Σ_{i=1}^{q} pi·ℓi^2 − L^2
A UD-code is minimal with respect to p1, ..., pq if it has minimal average length.


Example
A code C has the codewords 0, 10, 11 with probabilities 1/2, 1/4, 1/4.
Its average length and variance are
  L = (1/2)·1 + (1/4)·2 + (1/4)·2 = 3/2
  V = (1/2)·1^2 + (1/4)·2^2 + (1/4)·2^2 − L^2 = 5/2 − (3/2)^2 = 1/4
It is easy to see that C is minimal with respect to 1/2, 1/4, 1/4.
Example
A code C has the codewords 10, 0, 11 with probabilities 1/2, 1/4, 1/4.
Its average length is
  L = (1/2)·2 + (1/4)·1 + (1/4)·2 = 7/4 > 3/2
We see that C is not minimal with respect to 1/2, 1/4, 1/4.
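Both quantities are one-line computations; a small Python sketch (function names illustrative):

def average_length(probs, lengths):
    return sum(p * l for p, l in zip(probs, lengths))

def variance(probs, lengths):
    L = average_length(probs, lengths)
    return sum(p * l**2 for p, l in zip(probs, lengths)) - L**2

# Codewords 0, 10, 11 with probabilities 1/2, 1/4, 1/4:
print(average_length([0.5, 0.25, 0.25], [1, 2, 2]))  # 1.5
print(variance([0.5, 0.25, 0.25], [1, 2, 2]))        # 0.25
# The non-minimal code 10, 0, 11 just swaps the first two lengths:
print(average_length([0.5, 0.25, 0.25], [2, 1, 2]))  # 1.75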

Theorem
If a binary UD-code has minimal average length L with respect to p1, ..., pq,
then, possibly after permuting codewords of equally likely symbols,
1  ℓ1 ≤ ℓ2 ≤ ... ≤ ℓq
2  The code may be assumed to be instantaneous.
3  K = Σ_{i=1}^{q} 2^(−ℓi) = 1
4  ℓ_{q−1} = ℓq
5  c_{q−1} and cq differ only in their last place.

Proof
1  Suppose that pm > pn and ℓm > ℓn.
   Swapping cm and cn gives a new code with smaller L, a contradiction.
2  Use the Kraft-McMillan Theorem.
3  If K < 1, then the code can be shortened, reducing L, a contradiction.
4  We know that ℓ_{q−1} ≤ ℓq. If ℓ_{q−1} < ℓq, then there must be nodes in the
   decision tree where no choice is made, implying K < 1, a contradiction.
5  The tree must end with a simple fork whose two leaves are s_{q−1} and sq.
   Therefore, c_{q−1} and cq differ only in their last place.


Huffman's Algorithm (binary)

Input: a source S = {s1, ..., sq} and probabilities p1, ..., pq
Output: a code C for S, given by a decision tree
Combining phase
  Replace the last two symbols s_{q−1} and sq
  by a new symbol s_{q−1,q} with probability p_{q−1} + pq.
  Reorder the symbols s1, ..., s_{q−2}, s_{q−1,q} by their probabilities.
  Repeat until there is only one symbol left.
Splitting phase
  Root-label this symbol.
  Draw edges from each combined symbol s_{a,b} to the symbols sa and sb.
  Label the edge between s_{a,b} and sa by 0 and the edge between s_{a,b} and sb by 1.
The resulting code depends on the reordering of the symbols.
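A minimal Python sketch of the two phases (a plain sorted list rather than a heap; all names are illustrative). Ties are broken by re-inserting the merged symbol as high as possible, i.e. the place-high strategy discussed below, and with that choice the sketch happens to reproduce the place-high codewords of the example two slides further on:

def huffman(probs):
    # Binary Huffman code: returns one codeword per input probability.
    # Each node is (probability, list of original symbol indices in that subtree).
    nodes = sorted([(p, [i]) for i, p in enumerate(probs)], key=lambda t: -t[0])
    codes = [""] * len(probs)
    while len(nodes) > 1:
        # Combining phase: merge the two least likely nodes.
        p1, group1 = nodes.pop()            # least likely
        p0, group0 = nodes.pop()            # second least likely
        # Splitting phase, recorded on the fly: 0 towards the upper group, 1 towards the lower.
        for i in group0:
            codes[i] = "0" + codes[i]
        for i in group1:
            codes[i] = "1" + codes[i]
        merged = (p0 + p1, group0 + group1)
        # Place-high: insert the merged node above any node of equal probability.
        pos = 0
        while pos < len(nodes) and nodes[pos][0] > merged[0]:
            pos += 1
        nodes.insert(pos, merged)
    return codes

print(huffman([0.3, 0.2, 0.2, 0.1, 0.1, 0.1]))
# ['01', '11', '000', '001', '100', '101'] -- the place-high code derived below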

Example
In the place-low strategy, we place s_{a,b} as low as possible.
Consider a source s1, ..., s6 with probabilities 0.3, 0.2, 0.2, 0.1, 0.1, 0.1.
The resulting codewords are
  s1 = 00,  s2 = 10,  s3 = 11,  s4 = 011,  s5 = 0100,  s6 = 0101
[Decision tree omitted; the intermediate node probabilities are 0.2, 0.3, 0.4, 0.6, 1.0.]
  L = 0.3·2 + 0.2·2 + 0.2·2 + 0.1·3 + 0.1·4 + 0.1·4 = 2.5
  V = 0.3·2^2 + 0.2·2^2 + 0.2·2^2 + 0.1·3^2 + 0.1·4^2 + 0.1·4^2 − L^2 = 0.65

Example
In the place-high strategy, we place s_{a,b} as high as possible.
Consider a source s1, ..., s6 with probabilities 0.3, 0.2, 0.2, 0.1, 0.1, 0.1.
The resulting codewords are
  s1 = 01,  s2 = 11,  s3 = 000,  s4 = 001,  s5 = 100,  s6 = 101
[Decision tree omitted; the intermediate node probabilities are 0.2, 0.3, 0.4, 0.6, 1.0.]
  L = 0.3·2 + 0.2·2 + 0.2·3 + 0.1·3 + 0.1·3 + 0.1·3 = 2.5
  V = 0.3·2^2 + 0.2·2^2 + 0.2·3^2 + 0.1·3^2 + 0.1·3^2 + 0.1·3^2 − L^2 = 0.25
The average length is the same as for the place-low strategy,
but the variance is smaller. It turns out that this is always the case,
so we will only use the place-high strategy.
The Huffman Code Theorem
For any given source S and corresponding probabilities,
the Huffman algorithm yields an instantaneous UD-code of minimum average length.


Chapter 3:

Compression Coding

Lectures 13-14
The Huffman Code Theorem
For any given source S and corresponding probabilities,
the Huffman algorithm yields an instantaneous UD-code of minimum average length.
Proof
We proceed by induction on q = |S|. For q = 2, each Huffman code is
an instantaneous UD-code with minimum average length L = 1.
Now assume that each Huffman code on q − 1 symbols is an instantaneous
UD-code with minimum average length.
Let C be a Huffman code on q symbols with average length L and
let C' be any UD-code on q symbols with minimum average length L'.
Denote the codeword lengths of C and C' by ℓ1, ..., ℓq and ℓ'1, ..., ℓ'q.
By construction, cq and c_{q−1} in C differ only in their last place.
By minimality, C' has codewords c'q, c'_{q−1} differing only in their last place.
Combine cq and c_{q−1} in C to get a Huffman code on q − 1 symbols and
combine c'q and c'_{q−1} in C' to get a UD-code on q − 1 symbols.
Denote the average lengths of these codes by M and M', respectively.
By the induction hypothesis, M ≤ M', so
  L − L' = Σ_{i=1}^{q} ℓi·pi − Σ_{i=1}^{q} ℓ'i·pi
         = ( Σ_{i=1}^{q−2} ℓi·pi + ℓ_{q−1}·p_{q−1} + ℓq·pq ) − ( Σ_{i=1}^{q−2} ℓ'i·pi + ℓ'_{q−1}·p_{q−1} + ℓ'q·pq )
         = ( Σ_{i=1}^{q−2} ℓi·pi + ℓq·(p_{q−1} + pq) ) − ( Σ_{i=1}^{q−2} ℓ'i·pi + ℓ'q·(p_{q−1} + pq) )
         = ( Σ_{i=1}^{q−2} ℓi·pi + (ℓq − 1)·(p_{q−1} + pq) ) − ( Σ_{i=1}^{q−2} ℓ'i·pi + (ℓ'q − 1)·(p_{q−1} + pq) )
         = M − M' ≤ 0
Thus L ≤ L', so the Huffman code has minimum average length.
The code is created using a decision tree, so it is instantaneous.
The proof follows by induction.


Theorem (Knuth)
The average codeword length L of each Huffman code is
the sum of all child node probabilities.

For the place-high code from above (codewords 01, 11, 000, 001, 100, 101
with probabilities 0.3, 0.2, 0.2, 0.1, 0.1, 0.1):
  L = 0.3·2 + 0.2·2 + 0.2·3 + 0.1·3 + 0.1·3 + 0.1·3 = 2.5
  L = 1.0 + 0.6 + 0.4 + 0.3 + 0.2 = 2.5

Proof
The tree-path for symbol si passes through exactly ℓi child nodes.
But pi occurs as part of the sum in each of these child nodes.
So, adding all child node probabilities adds ℓi copies of pi for each si;
this sum is L.

Huffman's Algorithm also works for radix r:
just combine r symbols at each step instead of 2.
However, there are at least two ways to do this:
1  Combine the last r symbols at each combining step.
2  First add dummy symbols; then combine the last r symbols at each step.
It turns out that 2 is the best strategy.
If there are k combining steps, then we need
  |S| = k(r − 1) + r = (k + 1)(r − 1) + 1
initial symbols. In other words, we must have |S| ≡ 1 (mod r − 1).
We can ensure this by adding dummy symbols of probability 0.
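The number of dummy symbols required follows directly from this congruence; a one-line Python check (the name is illustrative):

def dummies_needed(q, r):
    # Smallest number of dummy symbols making q congruent to 1 modulo r - 1.
    return (1 - q) % (r - 1)

print(dummies_needed(6, 3))  # 1, as in the radix-3 example below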


Huffman's Algorithm (radix r)

Input: a source S = {s1, ..., sq} and probabilities p1, ..., pq
Output: a code C for S, given by a decision tree
Add dummy symbols until q = |S| ≡ 1 (mod r − 1).
Combining phase
  Replace the last r symbols s_{q−r+1}, ..., sq
  by a new symbol with probability p_{q−r+1} + ... + pq.
  Reorder the symbols by their probabilities.
  Repeat until there is only one symbol left.
Splitting phase
  Root-label this symbol.
  Draw edges from each child node to the r preceding nodes.
  Label these edges from top to bottom by 0, ..., r − 1.

Example
Consider a source s1, ..., s6 with probabilities 0.3, 0.2, 0.2, 0.1, 0.1, 0.1.
With radix r = 3, we have 6 ≡ 0 (mod r − 1), so we need to add 1 dummy symbol.
The resulting codewords are
  s1 = 1,  s2 = 00,  s3 = 01,  s4 = 02,  s5 = 20,  s6 = 21  (the dummy symbol gets 22)
[Decision tree omitted; the intermediate node probabilities are 0.2, 0.5, 1.0.]
  L = 1.0 + 0.5 + 0.2 = 1.7
  V = 0.3·1^2 + 0.2·2^2 + 0.2·2^2 + 0.1·2^2 + 0.1·2^2 + 0.1·2^2 − L^2 = 0.21

Extensions
Given a source S = {s1, ..., sq} with associated probabilities p1, ..., pq,
the nth extension is the Cartesian product
  S^n = S × ... × S = { s_{i1} ... s_{in} : s_{i1}, ..., s_{in} ∈ S } = { σ1, ..., σ_{q^n} }
together with the following probability for each new symbol σi ∈ S^n:
  p_i^(n) = P(σi) = P(s_{i1} ... s_{in}) = P(s_{i1}) ··· P(s_{in})
Note that we implicitly assume that successive source symbols are independent.
We usually order the symbols σi so that p_1^(n), ..., p_{q^n}^(n) are non-increasing.
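Building an extension is a direct Cartesian-product computation. A Python sketch (names illustrative) that reproduces the S^2 column of the example below and, using the tabulated codeword lengths, its average length per source symbol:

from itertools import product
from fractions import Fraction
from math import prod

def extension(symbols, probs, n):
    # n-th extension S^n: all length-n strings, with product probabilities,
    # ordered by non-increasing probability.
    p = dict(zip(symbols, probs))
    ext = [("".join(w), prod(p[s] for s in w)) for w in product(symbols, repeat=n)]
    return sorted(ext, key=lambda t: -t[1])

S2 = extension("ab", [Fraction(3, 4), Fraction(1, 4)], 2)
print(S2)  # aa: 9/16, ab: 3/16, ba: 3/16, bb: 1/16

# Average length per S-symbol, using the Huffman codeword lengths 1, 2, 3, 3 from the table:
L2 = sum(q * l for (_, q), l in zip(S2, [1, 2, 3, 3]))
print(L2 / 2)  # 27/32, roughly 0.84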
Example
Consider the source S = {a, b} with associated probabilities 3/4, 1/4.
We apply the (binary) Huffman algorithm to S, S^2 and S^3:

  S^1  p_i   c_i      S^2  p_i^(2)  c_i      S^3  p_i^(3)  c_i
  a    3/4   0        aa   9/16     0        aaa  27/64    1
  b    1/4   1        ab   3/16     11       aab  9/64     001
                      ba   3/16     100      aba  9/64     010
                      bb   1/16     101      baa  9/64     011
                                             abb  3/64     00000
                                             bab  3/64     00001
                                             bba  3/64     00010
                                             bbb  1/64     00011

  L^(1) = L = 1,   L^(2) = 27/16,   L^(3) = 158/64

Average length per S-symbol for S:    1
Average length per S-symbol for S^2:  L^(2)/2 = 27/32 ≈ 0.84
Average length per S-symbol for S^3:  L^(3)/3 = 158/192 ≈ 0.82


Markov sources
A k-memory source S is one whose symbols each depend on the previous k.
If k = 0, then no symbol depends on any other, and S is memoryless.
If k = 1, then S is a Markov source.
pij = P (si |sj ) is the probability of si occurring right after a given sj .
The matrix M = (pij ) is the transition matrix.
Entry pij is the probability of getting from state sj to state si .
A Markov process is a set of states (the source S)
and probabilities pij = P (si |sj ) of getting from state sj to state si .

Example
Consider Sydney, Melbourne, and Elsewhere in Australia.
A simple Markov model for the populations of these is that, each year,
  population growth by births, deaths, and emigration/immigration is 0%;
  of people living in Sydney, 5% move to Melbourne and 3% move Elsewhere;
  of people living in Melbourne, 4% move to Sydney and 2% move Elsewhere;
  of people living Elsewhere, 7% move to Sydney and 6% move to Melbourne.
S = {Sydney, Melbourne, Elsewhere}
The transition matrix M (columns indexed by the current state, rows by the next state) is

              From
             S     M     E
  To   S    0.92  0.04  0.07
       M    0.05  0.94  0.06
       E    0.03  0.02  0.87

[State diagram omitted: each state has a self-loop (0.92, 0.94, 0.87) and arrows carrying the transition probabilities above.]

Lemma
The sum of entries in any column of M is 1.


Let xk = (sk, mk, ek) denote the population distribution after k years. Then
  x_{k+1} = M·xk   and   xk = M^k·x0
Suppose that the initial population distribution is x0 = (4.5M, 4M, 14M).
After k = 20 years, the population distribution is then
  x20 = M^20·x0,   where   M^20 ≈
            0.41  0.34  0.38
            0.42  0.52  0.44
            0.16  0.15  0.19
so x20 ≈ (8.5M, 10.1M, 4.0M) (computed with the rounded entries of M^20 shown).
Note that S and M^20 also form a Markov chain.
E.g., after 20 years, most people will have left Sydney (only 41% remain),
whereas most people will have stayed in Melbourne (52%).
To find a stable population distribution, we need to find a state x0
for which xk = x_{k−1} = ... = x1 = x0; that is, M·x0 = x0.
In other words, we need an eigenvector x0 of M for the eigenvalue 1;
e.g., x0 = (0.6, 0.76, 0.26). If we want an eigenvector with actual population numbers,
then we must scale x0 by (4.5M + 4M + 14M)/(0.6 + 0.76 + 0.26), giving
  x0 ≈ (8.3M, 10.6M, 3.6M).
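These computations are routine with numpy. A sketch with the numbers from the example (the eigenvector returned by numpy may come back with the opposite sign, so it is normalised before scaling):

import numpy as np

M = np.array([[0.92, 0.04, 0.07],
              [0.05, 0.94, 0.06],
              [0.03, 0.02, 0.87]])      # columns sum to 1
x0 = np.array([4.5, 4.0, 14.0])         # initial populations, in millions

print(np.linalg.matrix_power(M, 20) @ x0)   # population distribution after 20 years

# Equilibrium: the eigenvector of M for the eigenvalue 1, rescaled to the total population.
vals, vecs = np.linalg.eig(M)
v = np.real(vecs[:, np.argmin(abs(vals - 1))])
print(v / v.sum() * x0.sum())               # roughly (8.3, 10.6, 3.6) million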
A Markov process M is in equilibrium p if p = M·p.
In this case, p is an eigenvector of M for the eigenvalue 1.
We will assume that
  M is ergodic: we can get from any state j to any state i;
  M is aperiodic: the gcd of its cycle lengths is 1.
Theorem
Under the above assumptions, M has a non-zero equilibrium state.
We will only consider equilibria p with |p| = 1.


Chapter 3:

Compression Coding

Lecture 15
Huffman coding for stationary Markov sources
Consider a Markov source S = {s1, ..., sq} with probabilities p1, ..., pq,
transition matrix M and equilibrium p.
Define
  HuffE: the binary Huffman code on p (ordered)
  Huff(i): the binary Huffman code on the (ordered) ith column of M
  HuffM: the first symbol of a message is encoded with HuffE; each later symbol
         is encoded with Huff(j), where sj is the symbol immediately preceding it.
This gives average lengths
  LE,   L(1), ..., L(q),   LM = p1·L(1) + ... + pq·L(q).
Importantly, LM ≤ LE.

Example
The transition matrix
        0.3  0.1  0.1
  M =   0.5  0.1  0.55
        0.2  0.8  0.35
has equilibrium p = (1/8, 3/8, 1/2).

      p_i  HuffE     p_i  Huff(1)    p_i  Huff(2)    p_i   Huff(3)
  s1  1/8  01        0.3  00         0.1  10         0.10  11
  s2  3/8  00        0.5  1          0.1  11         0.55  0
  s3  1/2  1         0.2  01         0.8  0          0.35  10

  LE = 1.5    L(1) = 1.5    L(2) = 1.2    L(3) = 1.45

  LM = (1/8)·L(1) + (3/8)·L(2) + (1/2)·L(3) ≈ 1.36 < LE

Therefore, compared to a 2-bit block code C,
this Huffman code compresses the message length to LM/LC = 1.36/2 = 68%.

Let us now encode the message s1 s2 s3 s3 s2 s1 s2:

  symbol          s1     s2       s3       s3       s2       s1       s2
  code to use     HuffE  Huff(1)  Huff(2)  Huff(3)  Huff(3)  Huff(2)  Huff(1)
  encoded symbol  01     1        0        10       0        10       1

The message is encoded as 0110100101.
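Mechanically, the encoder is a table lookup driven by the previous symbol. A Python sketch using the code tables transcribed from this example (the dictionary layout is illustrative):

huff_E = {"s1": "01", "s2": "00", "s3": "1"}          # code for the first symbol
huff = {                                              # huff[j] is used right after symbol s_j
    "s1": {"s1": "00", "s2": "1",  "s3": "01"},
    "s2": {"s1": "10", "s2": "11", "s3": "0"},
    "s3": {"s1": "11", "s2": "0",  "s3": "10"},
}

def encode(message):
    bits, previous = "", None
    for sym in message:
        table = huff_E if previous is None else huff[previous]
        bits += table[sym]
        previous = sym
    return bits

print(encode(["s1", "s2", "s3", "s3", "s2", "s1", "s2"]))  # 0110100101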

Let us now decode the message 0110100101:

  code to use     HuffE  Huff(1)  Huff(2)  Huff(3)  Huff(3)  Huff(2)  Huff(1)
  encoded symbol  01     1        0        10       0        10       1
  decoded symbol  s1     s2       s3       s3       s2       s1       s2

The message is decoded as s1 s2 s3 s3 s2 s1 s2.


Compression Coding
Huffman coding
Huffman coding of extensions
Huffman coding of Markov sources
Arithmetic coding
Dictionary methods
Lossy compression

... and much more

Arithmetic coding
Input:
  a source S = {s1, ..., sq} where sq = • is a stop symbol
  probabilities p1, ..., pq
  a message s_{i1} ... s_{in} where s_{in} = sq = •
Output: the message encoded, given by a number between 0 and 1

Algorithm:
  Partition the interval [0, 1) into sub-intervals of lengths p1, ..., pq.
  Crop to the i1-th sub-interval.
  Partition this sub-interval according to relative lengths p1, ..., pq.
  Crop to the i2-th sub-sub-interval.
  Repeat in this way until the whole message has been encoded.


Example
Consider symbols s1, s2, s3, s4 = • with probabilities 0.4, 0.3, 0.15, 0.15,
so [0, 1) is split at 0.4, 0.7 and 0.85 into the sub-intervals for s1, s2, s3, •.
Let us encode the message s2 s1 s3 •:

        sub-interval start            sub-interval width
        0                             1
  s2    0 + 0.4·1 = 0.4               0.3·1 = 0.3
  s1    0.4 + 0·0.3 = 0.4             0.4·0.3 = 0.12
  s3    0.4 + 0.7·0.12 = 0.484        0.15·0.12 = 0.018
  •     0.484 + 0.85·0.018 = 0.4993   0.15·0.018 = 0.0027

We must therefore choose a number in the interval
  [0.4993, 0.4993 + 0.0027) = [0.4993, 0.5020)
For instance, we may simply choose the number 0.5.

Example
Consider symbols s1, s2, s3, s4 = • with probabilities 0.4, 0.3, 0.15, 0.15,
so [0, 1) is split at 0.4, 0.7 and 0.85 into the sub-intervals for s1, s2, s3, •.
Let us decode the number 0.5:

  code number, rescaled              in interval   decoded symbol
  0.5                                [0.4, 0.7)    s2
  (0.5 − 0.4)/0.3 = 0.33333          [0, 0.4)      s1
  (0.33333 − 0)/0.4 = 0.83333        [0.7, 0.85)   s3
  (0.83333 − 0.7)/0.15 = 0.88889     [0.85, 1)     • (stop)

The decoded message is then s2 s1 s3.
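Both directions follow the interval bookkeeping in the two tables above. A floating-point Python sketch with the same symbols and probabilities ("stop" stands for the stop symbol •; real implementations use scaled integer arithmetic to avoid rounding problems):

probs = {"s1": 0.4, "s2": 0.3, "s3": 0.15, "stop": 0.15}
starts, acc = {}, 0.0                       # left end of each symbol's sub-interval of [0, 1)
for sym, p in probs.items():
    starts[sym], acc = acc, acc + p

def encode(message):                        # the message must end with "stop"
    start, width = 0.0, 1.0
    for sym in message:
        start += starts[sym] * width
        width *= probs[sym]
    return start, width                     # any number in [start, start + width) will do

def decode(x):
    message = []
    while True:
        sym = max((s for s in probs if starts[s] <= x), key=lambda s: starts[s])
        message.append(sym)
        if sym == "stop":
            return message
        x = (x - starts[sym]) / probs[sym]  # rescale back into [0, 1)

print(encode(["s2", "s1", "s3", "stop"]))   # approximately (0.4993, 0.0027); choose e.g. 0.5
print(decode(0.5))                          # ['s2', 's1', 's3', 'stop']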



Dictionary methods
LZ77, LZ78, LZW, and others; used, for instance, in gzip, gif and ps.
LZ78
Input: a message r = r1 ... rn
Output: the message encoded, given by a dictionary
Algorithm:
  Begin with an empty dictionary D = {}.
  Find the longest prefix s of r that already occurs in D (possibly the empty string),
  say as entry number ℓ (with ℓ = 0 for the empty string).
  Find the symbol c just after s.
  Append sc to D as the next entry, remove sc from the front of r, and output (ℓ, c).
  Repeat in this way until the whole message has been encoded.

Loosely speaking, LZ78 encodes by finding new codewords, adding them
to a dictionary, and recognising them subsequently.
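A compact Python sketch of the LZ78 encoder just described (illustrative; the dictionary is kept as a map from phrases to entry numbers):

def lz78_encode(message):
    # Output a list of (dictionary entry, next character) pairs; entry 0 is the empty phrase.
    dictionary, output, i = {}, [], 0
    while i < len(message):
        # Longest prefix of the remaining message that is already a dictionary phrase.
        phrase = ""
        while i < len(message) and phrase + message[i] in dictionary:
            phrase += message[i]
            i += 1
        entry = dictionary.get(phrase, 0)
        char = message[i] if i < len(message) else ""   # "" only if the message ends on a known phrase
        dictionary[phrase + char] = len(dictionary) + 1
        output.append((entry, char))
        i += 1
    return output

print(lz78_encode("abbcbcababcaa"))
# [(0, 'a'), (0, 'b'), (2, 'c'), (3, 'a'), (2, 'a'), (4, 'a')] -- as in the example below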


Example
Let us encode the message abbcbcababcaa:

  remaining message   longest prefix   entry   new dictionary entry   output
  abbcbcababcaa       (empty)          0       1. a                   (0,a)
  bbcbcababcaa        (empty)          0       2. b                   (0,b)
  bcbcababcaa         b                2       3. bc                  (2,c)
  bcababcaa           bc               3       4. bca                 (3,a)
  babcaa              b                2       5. ba                  (2,a)
  bcaa                bca              4       6. bcaa                (4,a)

The message is encoded as (0,a)(0,b)(2,c)(3,a)(2,a)(4,a).

Example
Let us decode the message (0,c)(0,a)(2,a)(3,b)(4,c)(4,b):

  input   new dictionary entry   output
  (0,c)   1. c                   c
  (0,a)   2. a                   a
  (2,a)   3. aa                  aa
  (3,b)   4. aab                 aab
  (4,c)   5. aabc                aabc
  (4,b)   6. aabb                aabb

The message is decoded as caaaaabaabcaabb.
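The decoder rebuilds the same dictionary as it goes; a short sketch matching the example above (name illustrative):

def lz78_decode(pairs):
    # Rebuild the message from (dictionary entry, character) pairs; entry 0 is the empty phrase.
    dictionary = {0: ""}
    message = ""
    for entry, char in pairs:
        phrase = dictionary[entry] + char
        dictionary[len(dictionary)] = phrase      # becomes the next numbered entry
        message += phrase
    return message

print(lz78_decode([(0, "c"), (0, "a"), (2, "a"), (3, "b"), (4, "c"), (4, "b")]))
# caaaaabaabcaabb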

