
Ch4. Zero-Error Data Compression


Yuan Luo

Content

Ch4. Zero-Error Data Compression

4.1 The Entropy Bound


4.2 Prefix Codes

4.2.1 Definition and Existence


4.2.2 Huffman Codes

4.3 Redundancy of Prefix Codes

4.1 The Entropy Bound

Definition 4.1. A D-ary source code $C$ for a source random variable $X$ is a mapping from $\mathcal{X}$ to $\mathcal{D}^*$, the set of all finite-length sequences of symbols taken from a D-ary code alphabet $\mathcal{D}$.

4.1 The Entropy Bound

Definition 4.2. A code C is uniquely decodable if for any finite source sequence, the sequence of code symbols corresponding to this source sequence is different from the sequence of code symbols corresponding to any other (finite) source sequence.

4.1 The Entropy Bound


Example 1. Let $\mathcal{X} = \{A, B, C, D\}$. Consider the code C defined by $C(A) = 0$, $C(B) = 1$, $C(C) = 01$, $C(D) = 10$.

Then all the three source sequences AAD, ACA, and AABA produce the code sequence 0010. Thus from the code sequence 0010, we cannot tell which of the three source sequences it comes from. Therefore, C is not uniquely decodable.
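The ambiguity can be checked directly by encoding the three sequences; a minimal Python sketch, using the codeword assignment from the example above:

```python
# Code C from Example 1: C(A) = 0, C(B) = 1, C(C) = 01, C(D) = 10.
code = {"A": "0", "B": "1", "C": "01", "D": "10"}

def encode(source_sequence):
    """Concatenate the codewords of the source symbols."""
    return "".join(code[symbol] for symbol in source_sequence)

# All three source sequences produce the same code sequence 0010,
# so a decoder cannot distinguish them: C is not uniquely decodable.
for seq in ("AAD", "ACA", "AABA"):
    print(seq, "->", encode(seq))   # each line ends with 0010
```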

4.1 The Entropy Bound


Theorem (Kraft Inequality)
Let C be a D-ary source code, and let $l_1, l_2, \ldots, l_m$ be the lengths of the codewords. If C is uniquely decodable, then
$$\sum_{k=1}^{m} D^{-l_k} \le 1.$$

4.1 The Entropy Bound


Example 2. Let $\mathcal{X} = \{A, B, C, D\}$. Consider the code C of Example 1, defined by $C(A) = 0$, $C(B) = 1$, $C(C) = 01$, $C(D) = 10$. We know $|\mathcal{X}| = 4$ and the codeword lengths are 1, 1, 2, 2, so
$$\sum_{k=1}^{4} 2^{-l_k} = 2^{-1} + 2^{-1} + 2^{-2} + 2^{-2} = \frac{3}{2} > 1.$$
By the Kraft inequality, C is not uniquely decodable.
4.1 The Entropy Bound


Let X be a source random variable with probability distribution $\{p_1, p_2, \ldots, p_m\}$, where $m \ge 2$. When we use a uniquely decodable code C with codeword lengths $l_1, l_2, \ldots, l_m$ to encode the outcome of X, the expected length of a codeword is given by
$$L = \sum_{k=1}^{m} p_k l_k.$$

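As a small numerical illustration with a hypothetical source (the distribution and codeword lengths below are not from the slides):

```python
# Expected length L = sum_k p_k * l_k for a hypothetical source and prefix code.
probabilities = [0.5, 0.25, 0.125, 0.125]   # assumed source distribution
lengths       = [1,   2,    3,     3]       # codeword lengths of some prefix code

L = sum(p * l for p, l in zip(probabilities, lengths))
print(L)   # 1.75 code symbols per source symbol
```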
4.1 The Entropy Bound


Theorem (Entropy Bound)
Let C be a D-ary uniquely decodable code for a source random variable X with entropy $H_D(X)$. Then the expected length of C is lower bounded by $H_D(X)$, i.e.,
$$L \ge H_D(X).$$
This lower bound is tight if and only if $l_k = -\log_D p_k$ for all $k$.
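A quick check of the bound on the hypothetical dyadic source used above, comparing the length-matching code with a longer fixed-length code:

```python
import math

# Entropy bound check: L >= H_D(X), with equality iff l_k = -log_D p_k for all k.
D = 2
probabilities = [0.5, 0.25, 0.125, 0.125]      # hypothetical dyadic source
H = -sum(p * math.log(p, D) for p in probabilities)

matching_lengths = [1, 2, 3, 3]                # l_k = -log_2 p_k: meets the bound with equality
fixed_lengths    = [2, 2, 2, 2]                # a fixed-length prefix code: strictly longer
for lengths in (matching_lengths, fixed_lengths):
    L = sum(p * l for p, l in zip(probabilities, lengths))
    print(L, ">=", H)                          # 1.75 >= 1.75, then 2.0 >= 1.75
```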

4.1 The Entropy Bound

Definition 4.8. The redundancy R of a D-ary uniquely decodable code is the difference between the expected length of the code and the entropy of the source, i.e., $R = L - H_D(X)$.

4.2 Prefix Codes

4.2.1 Definition and Existence

Definition 4.9. A code is called a prefix-free code if no codeword is a prefix of any other codeword. For brevity, a prefix-free code will be referred to as a prefix code.

4.2.1 Definition and Existence


Theorem
There exists a D-ary prefix code with codeword lengths $l_1, l_2, \ldots, l_m$ if and only if the Kraft inequality
$$\sum_{k=1}^{m} D^{-l_k} \le 1$$
is satisfied.
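The "if" direction can be illustrated by a well-known construction (not necessarily the proof used in these slides): sort the lengths, and take codeword k to be the binary expansion of the partial Kraft sum. A sketch for D = 2:

```python
from fractions import Fraction

# Build a binary prefix code from codeword lengths satisfying the Kraft inequality
# (sketch of a standard constructive argument, D = 2 assumed).
def prefix_code_from_lengths(lengths):
    assert sum(Fraction(1, 2 ** l) for l in lengths) <= 1, "Kraft inequality violated"
    codewords, partial_sum = [], Fraction(0)
    for l in sorted(lengths):
        # The l-digit binary expansion of the running Kraft sum is the next codeword.
        codewords.append(format(int(partial_sum * 2 ** l), f"0{l}b"))
        partial_sum += Fraction(1, 2 ** l)
    return codewords

print(prefix_code_from_lengths([1, 2, 3, 3]))   # ['0', '10', '110', '111']
```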

4.2.1 Definition and Existence

A probability distribution $\{p_1, p_2, \ldots, p_m\}$ such that $p_k = D^{-q_k}$ for all $k$, where $q_k$ is a positive integer, is called a D-adic distribution. When $D = 2$, it is called a dyadic distribution.

4.2.1 Definition and Existence

Corollary 4.12. There exists a D-ary prefix code which achieves the entropy bound for a distribution $\{p_k\}$ if and only if $\{p_k\}$ is D-adic.

4.2.2 Huffman Codes


As we have mentioned, the efficiency of a uniquely
decodable code is measured by its expected length.
Thus for a given source X, we are naturally
interested in prefix codes which have the minimum
expected length. Such codes, called optimal codes,
can be constructed by the Huffman procedure, and
these codes are referred to as Huffman codes.

4.2.2 Huffman Codes

The Huffman procedure is to form a code tree such that the expected length is minimum. The procedure is described by a very simple rule:
Keep merging the two smallest probability masses until one probability mass (i.e., 1) is left.
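A compact way to realize this rule for a binary alphabet, using a priority queue (a sketch; the source distribution in the usage line is hypothetical):

```python
import heapq, itertools

# Binary Huffman procedure: keep merging the two smallest probability masses
# until a single mass (probability 1) remains; prepend the branch symbol at each merge.
def huffman_code(probabilities):
    tie = itertools.count()   # tie-breaker so equal masses never compare the dicts
    heap = [(p, next(tie), {symbol: ""}) for symbol, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)    # smallest mass
        p1, _, code1 = heapq.heappop(heap)    # second smallest mass
        merged = {s: "0" + w for s, w in code0.items()}
        merged.update({s: "1" + w for s, w in code1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

print(huffman_code({"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}))
# e.g. {'A': '0', 'B': '10', 'D': '110', 'C': '111'} with expected length 1.9
```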

4.2.2 Huffman Codes

Lemma 4.15. In an optimal code, shorter codewords are assigned to larger probabilities.
Lemma 4.16. There exists an optimal code in which the codewords assigned to the two smallest probabilities are siblings, i.e., the two codewords have the same length and they differ only in the last symbol.

4.2.2 Huffman Codes

Theorem
The Huffman procedure produces an optimal prefix
code.

4.2.2 Huffman Codes


Theorem
The expected length of a Huffman code, denoted by $L_{\mathrm{Huff}}$, satisfies
$$L_{\mathrm{Huff}} < H_D(X) + 1.$$
This bound is the tightest among all the upper bounds on $L_{\mathrm{Huff}}$ which depend only on the source entropy. From the entropy bound and the above theorem, we have
$$H_D(X) \le L_{\mathrm{Huff}} < H_D(X) + 1.$$
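Continuing the hypothetical four-symbol source from the Huffman sketch above, the two-sided bound can be checked numerically:

```python
import math

# Check H_2(X) <= L_Huff < H_2(X) + 1 for the hypothetical source used earlier.
probabilities = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
lengths = {"A": 1, "B": 2, "C": 3, "D": 3}      # lengths of the Huffman code sketched above

H = -sum(p * math.log2(p) for p in probabilities.values())
L_huff = sum(probabilities[s] * lengths[s] for s in probabilities)
print(H, "<=", L_huff, "<", H + 1)              # roughly 1.846 <= 1.9 < 2.846
```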

4.3 Redundancy of Prefix Codes


Let X be a source random variable with probability distribution $\{p_1, p_2, \ldots, p_m\}$, where $m \ge 2$. A D-ary prefix code for X can be represented by a D-ary code tree with m leaves, where each leaf corresponds to a codeword.

4.3 Redundancy of Prefix Codes


$\mathcal{I}$: the index set of all the internal nodes (including the root) in the code tree.
$q_k$: the probability of reaching an internal node $k$ during the decoding process.
The probability $q_k$ is called the reaching probability of internal node $k$. Evidently, $q_k$ is equal to the sum of the probabilities of all the leaves descending from node $k$.

4.3 Redundancy of Prefix Codes

$p_{k,j}$: the probability that the $j$-th branch of node $k$ is taken during the decoding process. The probabilities $p_{k,j}$, $0 \le j \le D-1$, are called the branching probabilities of node $k$, and
$$q_k = \sum_{j=0}^{D-1} p_{k,j}.$$

4.3 Redundancy of Prefix Codes


Once node k is reached, the conditional branching distribution is
$$\left\{ \frac{p_{k,0}}{q_k}, \frac{p_{k,1}}{q_k}, \ldots, \frac{p_{k,D-1}}{q_k} \right\}.$$
Then define the conditional entropy of node k by
$$h_k = H_D\!\left( \frac{p_{k,0}}{q_k}, \frac{p_{k,1}}{q_k}, \ldots, \frac{p_{k,D-1}}{q_k} \right).$$

4.3 Redundancy of Prefix Codes

Lemma 4.19. $H_D(X) = \sum_{k \in \mathcal{I}} q_k h_k$.

Lemma 4.20. $L = \sum_{k \in \mathcal{I}} q_k$.

4.3 Redundancy of Prefix Codes


Define the local redundancy of an internal node k by
$$r_k = q_k (1 - h_k).$$

Theorem (Local Redundancy Theorem)
Let L be the expected length of a D-ary prefix code for a source random variable X, and R be the redundancy of the code. Then
$$R = \sum_{k \in \mathcal{I}} r_k.$$
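The theorem can be verified numerically on a hypothetical example: the binary code {0, 10, 110, 111} with the source probabilities 0.4, 0.3, 0.2, 0.1 used earlier.

```python
import math
from collections import defaultdict

# Verify R = sum over internal nodes of r_k = q_k (1 - h_k) on a hypothetical example.
code = {"A": "0", "B": "10", "C": "111", "D": "110"}
prob = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}

# p_{k,j}: probability that branch j of internal node k (a proper codeword prefix) is taken.
branch = defaultdict(lambda: defaultdict(float))
for symbol, word in code.items():
    for i in range(len(word)):
        branch[word[:i]][word[i]] += prob[symbol]

R_local = 0.0
for node, branches in branch.items():
    q = sum(branches.values())                                        # reaching probability q_k
    h = -sum((p / q) * math.log2(p / q) for p in branches.values())   # conditional entropy h_k
    R_local += q * (1 - h)                                            # local redundancy r_k

L = sum(prob[s] * len(code[s]) for s in code)       # expected length
H = -sum(p * math.log2(p) for p in prob.values())   # source entropy
print(L - H, R_local)                               # both approximately 0.0535
```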

4.3 Redundancy of Prefix Codes

Corollary 4.22 (Entropy Bound). Let R be the redundancy of a prefix code. Then $R \ge 0$, with equality if and only if all the internal nodes in the code tree are balanced.

Thank you!
