sequences of data bytes occur. There are different techniques based on such statistical
criteria, the most prominent of which are Huffman coding and arithmetic coding.
7.4.7 Huffman Coding
Given the characters that must be encoded, together with their probabilities of
occurrence, the Huffman coding algorithm determines the optimal coding using the
minimum number of bits [Huf52]. Hence, the length (number of bits) of the coded
characters will differ. The most frequently occurring characters are assigned to the
shortest code words. A Huffman code can be determined by successively constructing a
binary tree, whereby the leaves represent the characters that are to be encoded. Every
node contains the relative probability of occurrence of the characters belonging to the
subtree beneath the node. The edges are labeled with the bits 0 and 1.
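The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `huffman_code`, the use of a priority queue, and the rule that the less probable subtree receives the bit 0 are all assumptions made for this sketch. Ties between equal probabilities are broken simply by insertion order.

```python
import heapq
from itertools import count


def huffman_code(probabilities):
    """Return a dict mapping each symbol to a Huffman code word (bit string).

    `probabilities` maps symbols to their relative probabilities of occurrence.
    """
    if not probabilities:
        return {}

    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    # A tree is a 1-tuple (symbol,) for a leaf or a 2-tuple (left, right)
    # for an inner node. Heap entries are (probability, tiebreak, tree).
    heap = [(p, next(tiebreak), (symbol,)) for symbol, p in probabilities.items()]
    heapq.heapify(heap)

    # Successively combine the two trees with the lowest probabilities.
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)
        p2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (t1, t2)))

    codes = {}

    def assign(tree, prefix):
        if len(tree) == 2:
            # Assumption of this sketch: left edge is 0, right edge is 1.
            assign(tree[0], prefix + "0")
            assign(tree[1], prefix + "1")
        else:
            # A one-symbol alphabet still needs a code word of one bit.
            codes[tree[0]] = prefix or "0"

    assign(heap[0][2], "")
    return codes


if __name__ == "__main__":
    example = {"A": 0.16, "B": 0.51, "C": 0.09, "D": 0.13, "E": 0.11}
    for symbol, word in sorted(huffman_code(example).items()):
        print(symbol, word)
```

The probabilities in the usage part are those of the worked example in this section. With them, the sketch gives B a one-bit code word and each of the other four characters a three-bit code word. Because the assignment of 0 and 1 to the edges is arbitrary, the exact bit patterns may differ from those obtained by hand.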
The following brief example illustrates this process:
1. The letters A, B, C, D, and E are to be encoded and have relative probabilities of
occurrence as follows:
p(A)=0.16, p(B)=0.51, p(C)=0.09, p(D)=0.13, p(E)=0.11
2. The two characters with the lowest probabilities, C and E, are combined in the
first binary tree, which has the characters as leaves. The combined probability of
their root node CE is 0.20. The edge from node CE to C is assigned a 1 and the
edge from CE to E is assigned a 0. This assignment is arbitrary; thus, different
Huffman codes can result from the same data.
Nodes with the following relative probabilities remain:
p(A)=0.16, p(B)=0.51, p(CE)=0.20, p(D)=0.13
3. The two nodes with the lowest probabilities are D and A. These nodes are
combined to form the leaves of a new binary tree. The combined probability of the
root node AD is 0.29. The edge from AD to A is assigned a 1 and the edge from
AD to D is assigned a 0.
If root nodes of different trees have the same probability, then trees having the
shortest maximal path between their root and their nodes should be combined
first. This keeps the length of the code words roughly constant.
4. Nodes with the following relative probabilities remain:
p(AD)=0.29, p(B)=0.51, p(CE)=0.20
The two nodes with the lowest probabilities are AD and CE. These are combined
into a binary tree. The combined probability of their root node ADCE is 0.49. The
edge from ADCE to AD is assigned a 0 and the edge from ADCE to CE is assigned
a 1.
5. Two nodes remain with the following relative probabilities:
p(ADCE)=0.49, p(B)=0.51