
Huffman and Arithmetic Coding

Coding and Its Application


Introduction
• Huffman codes are instantaneous codes.
• They have the following property:
  • Huffman codes are compact codes, i.e. they produce a code whose
    average length is the smallest achievable for the given number of
    source symbols, code alphabet, and source statistics.
• Huffman coding operates by reducing a source with q symbols to a
  source with r symbols, where r is the size of the code alphabet.
Introduction
• Consider the source $S$ with $q$ symbols $\{s_i : i = 1, 2, \ldots, q\}$ and
  associated probabilities $\{P(s_i) : i = 1, 2, \ldots, q\}$.
• Let the symbols be renumbered so that $P(s_1) \ge P(s_2) \ge \cdots \ge P(s_q)$.
• By combining the last $r$ symbols of $S$, $\{s_{q-r+1}, s_{q-r+2}, \ldots, s_q\}$,
  into one symbol $\hat{s}_{q-r+1}$ with probability
  $$P(\hat{s}_{q-r+1}) = \sum_{i=1}^{r} P(s_{q-r+i})$$
  we obtain a reduced source with $q - r + 1$ symbols.
• The trivial r-ary compact code for the reduced source with $r$ symbols
  is used to design the compact code for the preceding reduced source.
Binary Huffman Coding
• The algorithm:
  • Re-order the source symbols in decreasing order of symbol
    probability.
  • Reduce the source by combining the last two symbols and re-ordering
    the new set in decreasing order. Repeat until only two symbols remain.
  • Assign a compact code to the final reduced source. For a two-symbol
    source the trivial code is {0, 1}.
  • Backtrack to the original source S, assigning a compact code to each
    preceding reduced source in turn.
• Example:
  Consider a 5-symbol source with the following probabilities.
Binary Huffman Coding

The average length is 2.2 bits/symbol


The efficiency is 96.5%
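The reduce-and-backtrack procedure can be sketched in a few lines. This is a minimal illustration, not the slide's own construction: the probability table of the example did not survive, so the values 0.4, 0.2, 0.2, 0.1, 0.1 are an assumption chosen because they reproduce the stated average length and efficiency. The function name `huffman_lengths` is hypothetical, and a heap stands in for the repeated re-ordering step.

```python
import heapq
from math import log2

def huffman_lengths(probs):
    """Return the codeword length of each symbol in a binary Huffman code."""
    # Heap entries: (probability, unique tie-breaker, indices of merged symbols).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)   # the two least probable symbols
        p2, _, b = heapq.heappop(heap)
        for i in a + b:                  # each merge adds one bit to every member
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, counter, a + b))
        counter += 1
    return lengths

probs = [0.4, 0.2, 0.2, 0.1, 0.1]        # assumed values, not taken from the slide
lengths = huffman_lengths(probs)
avg_len = sum(p * l for p, l in zip(probs, lengths))   # average length in bits/symbol
entropy = -sum(p * log2(p) for p in probs)             # source entropy in bits/symbol
efficiency = entropy / avg_len
```

Under these assumed probabilities the average length comes out at 2.2 bits/symbol and the efficiency (entropy divided by average length) at roughly 96.5%.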

Is the Huffman code unique?


r-ary Huffman Codes
• Calculate $\alpha = \frac{q - r}{r - 1}$. If $\alpha$ is a non-integer value then
  append “dummy” symbols to the source with zero probability until
  there are $q = r + \lceil \alpha \rceil (r - 1)$ symbols.
• Re-order the source symbols in decreasing order of symbol
  probability.
• Reduce the source $S$ to $S_1$, then $S_2$, and so on by combining
  the last $r$ symbols of $S_j$ into a combined symbol and re-ordering
  the new set of symbol probabilities for $S_{j+1}$ in decreasing order.
  For each source keep track of the position of the combined symbol
  $\hat{s}_{q-r+1}$.
• Terminate the source reduction when a source with exactly $r$
  symbols is produced. For a source with $q$ symbols the reduced
  source with $r$ symbols will be $S_\alpha$.
r-ary Huffman Codes
• Assign a compact r-ary code for the final reduced source.
  For a source with $r$ symbols the trivial code is $\{0, 1, \ldots, r - 1\}$.
• Backtrack to the original source, assigning a compact code
  for the j-th reduced source. The compact code assigned
  to $S$, minus the code words assigned to any “dummy”
  symbols, is the r-ary Huffman code.
r-ary Huffman Codes
• Example: we want to design a compact quaternary code
  for a source with 11 symbols.
• First, we calculate $\alpha = \frac{11 - 4}{4 - 1} = 2.33$, which is not an integer.
• We need to append “dummy” symbols, so that we have a
  source with $q = 4 + \lceil 2.33 \rceil (4 - 1) = 13$ symbols.
• The appended symbols are $s_{12}, s_{13}$ with $P(s_{12}) = P(s_{13}) = 0.00$.
r-ary Huffman Codes
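The padding rule and the r-ary reduction can be sketched as follows. This is a rough illustration under stated assumptions: the function names `padded_size` and `rary_huffman_lengths` are hypothetical, the symbol probabilities of the 11-symbol example are not in the surviving text (equal probabilities are used purely as a stand-in), and a heap replaces the explicit re-ordering of each reduced source.

```python
import heapq
from fractions import Fraction
from math import ceil

def padded_size(q, r):
    """Number of symbols after appending zero-probability dummy symbols."""
    alpha = Fraction(q - r, r - 1)        # exact, avoids floating-point rounding
    return r + ceil(alpha) * (r - 1)

def rary_huffman_lengths(probs, r):
    """Codeword lengths of an r-ary Huffman code for the real (non-dummy) symbols."""
    q = len(probs)
    n_dummy = padded_size(q, r) - q
    # Heap entries: (probability, unique tie-breaker, indices of real symbols merged).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heap += [(0.0, q + k, []) for k in range(n_dummy)]   # dummies carry no real symbol
    heapq.heapify(heap)
    lengths = [0] * q
    counter = q + n_dummy
    while len(heap) > 1:
        total, members = 0.0, []
        for _step in range(r):            # combine the r least probable symbols
            p, _tie, m = heapq.heappop(heap)
            total += p
            members += m
        for i in members:                 # one more code digit for each merged symbol
            lengths[i] += 1
        heapq.heappush(heap, (total, counter, members))
        counter += 1
    return lengths

size = padded_size(11, 4)                                # quaternary example, q = 11
lengths = rary_huffman_lengths([1 / 11] * 11, 4)         # assumed equal probabilities
```

For the quaternary example, `padded_size(11, 4)` gives 13, so two dummy symbols are appended before the reduction starts.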
RUN LENGTH CODING
Run-Length Encoding: RLE
• Replaces sequences of the same data value within a file by
  • a count number
  • and a single value.
• Also needs a special byte value to indicate when a count number
  follows.
  • E.g., ASCII does not use the high-order bit of a byte, so the special
    byte value can be 1000 0000 in binary (written ⟨flag⟩ here).
• Suppose the following string of ASCII data has to be
  compressed:
  ABBBBBBBBBCDEEEEF
• Using RLE compression, the compressed file takes up 10 bytes and
  could look like this:
  A⟨flag⟩9BCD⟨flag⟩4EF
• Data size before compression: 17 bytes
• Data size after compression: 10 bytes
• Compression ratio: 17/10 = 1.7
• To achieve a saving, a run must contain at least 4 identical
  characters, because every encoded run costs 3 bytes (flag, count, value).
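The flag-byte scheme can be sketched as below. This is an illustration under assumptions, not a reference implementation: the names `FLAG`, `rle_encode` and `rle_decode` are hypothetical, the count is stored as one raw byte (so 9 becomes 0x09 rather than the character '9'), runs are capped at 255, and the input is assumed to be 7-bit ASCII so that 0x80 never occurs as data.

```python
FLAG = 0x80  # 1000 0000 in binary; assumed never to appear in 7-bit ASCII input

def rle_encode(data: bytes, min_run: int = 4) -> bytes:
    """Replace runs of at least min_run equal bytes by (FLAG, count, value)."""
    out = bytearray()
    i = 0
    while i < len(data):
        j = i
        # Extend the run while the byte repeats (a count must fit in one byte).
        while j < len(data) and data[j] == data[i] and j - i < 255:
            j += 1
        run = j - i
        if run >= min_run:
            out += bytes([FLAG, run, data[i]])
        else:
            out += data[i:j]              # short runs are cheaper left as they are
        i = j
    return bytes(out)

def rle_decode(data: bytes) -> bytes:
    """Invert rle_encode."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == FLAG:
            out += bytes([data[i + 2]]) * data[i + 1]   # value repeated count times
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

original = b"ABBBBBBBBBCDEEEEF"
encoded = rle_encode(original)
```

With the 17-byte example string, `encoded` is 10 bytes long and `rle_decode(encoded)` returns the original string.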
Run-length coding
• Every code word is made up of a pair (g, l), where g is the
  gray level and l is the number of pixels with that gray
  level (the length, or “run”).
• E.g.,
  56 56 56 82 82 82 83 80 80 80 80 56 56 56 56 56
  creates the run-length code (56, 3)(82, 3)(83, 1)(80, 4)(56, 5).
• The code is calculated row by row.
• Very efficient coding for binary data.
• Important to know position, and the image dimensions
  must be stored with the coded image.
• Used in most fax machines.

Compression Achieved
Original image requires 3 bits per pixel (in total - 8x8x3=192 bits).
Compressed image has 29 runs and needs 3+3=6 bits per
run (in total - 174 bits or 2.72 bits per pixel).
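The (g, l) pairs and the bit count above can be reproduced with a short sketch. The function names `run_length_pairs` and `coded_bits` are hypothetical; the pixel list is the sixteen gray levels of the earlier example taken in the order implied by its run-length code, and the 8x8 image itself is not available here, so only its stated run count of 29 is used.

```python
from itertools import groupby

def run_length_pairs(pixels):
    """Return (gray level, run length) pairs for a sequence of pixels."""
    return [(g, len(list(run))) for g, run in groupby(pixels)]

def coded_bits(n_runs, level_bits, length_bits):
    """Total bits when every run stores a gray level and a run length."""
    return n_runs * (level_bits + length_bits)

pixels = [56, 56, 56, 82, 82, 82, 83, 80, 80, 80, 80, 56, 56, 56, 56, 56]
pairs = run_length_pairs(pixels)

original_bits = 8 * 8 * 3                    # 8x8 image at 3 bits per pixel
compressed_bits = coded_bits(29, 3, 3)       # 29 runs at 3 + 3 bits each
bits_per_pixel = compressed_bits / (8 * 8)
```

Here `pairs` matches the run-length code of the gray-level example, and the bit counts come out as 192 and 174, i.e. about 2.72 bits per pixel.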
LZW - CODING
Lempel-Ziv-Welch
The History of LZW
• 1977: LZ77 is published; it is improved in 1978 (LZ78).
• 1981: Lempel and Ziv file for US patent 4,464,650 on LZ78 (granted 1984)
  for Sperry Corp.
• 1983: Welch improves on LZ78 before leaving Sperry, who file
  for US patent 4,558,302 on June 20, 1983 (granted Dec. 10, 1985).
• 1984: Welch publishes "A Technique for High Performance
  Data Compression," IEEE Computer, vol. 17, no. 6, June 1984.
• 1986: Sperry and Burroughs form Unisys, who assume ownership
  of US 4,558,302.

Lempel-Ziv-Welch (LZW) Compression: http://netghost.narod.ru/gff/graphics/book/ch09_04.htm
LZW Compression
• Works by building a dictionary of phrases from the input stream.
• A token or an index is used to identify each distinct phrase.
• Character sequences in the original text are replaced by
  codes that are dynamically determined.
• The code table is not encoded into the compressed text,
  because it may be reconstructed from the compressed text
  during decompression.
LZW Compression
• Assume the letters in the text are limited to {a, b}.
  • In practice, the alphabet may be the 256-character (8-bit) extended
    ASCII set.
• The characters in the alphabet are assigned code numbers
  beginning at 0.
• The initial code table is:

code 0 1
key a b
LZW Compression
code 0 1 2
key a b ab
 Original text = abababbabaabbabbaabba
 p=a
 pCode = 0
 c=b
 Represent a by 0 and enter ab into the code table.
 Compressed text = 0
LZW Compression
code 0 1 2 3
key a b ab ba

 Original text = abababbabaabbabbaabba


 Compressed text = 0
• p=b
• pCode = 1
• c=a
• Represent b by 1 and enter ba into the code table.
• Compressed text = 01
LZW Compression
code 0 1 2 3 4
key a b ab ba aba

 Original text = abababbabaabbabbaabba


 Compressed text = 01
• p = ab
• pCode = 2
• c=a
• Represent ab by 2 and enter aba into the code
table.
• Compressed text = 012
LZW Compression
code 0 1 2 3 4 5
key a b ab ba aba abb

 Original text = abababbabaabbabbaabba


 Compressed text = 012
• p = ab
• pCode = 2
• c=b
• Represent ab by 2 and enter abb into the code
table.
• Compressed text = 0122
LZW Compression
code 0 1 2 3 4 5 6
key a b ab ba aba abb bab
 Original text = abababbabaabbabbaabba
 Compressed text = 0122
• p = ba
• pCode = 3
• c=b
• Represent ba by 3 and enter bab into the code
table.
• Compressed text = 01223
LZW Compression
code 0 1 2 3 4 5 6 7
key a b ab ba aba abb bab baa
 Original text = abababbabaabbabbaabba
 Compressed text = 01223
• p = ba
• pCode = 3
• c=a
• Represent ba by 3 and enter baa into the code
table.
• Compressed text = 012233
LZW Compression
code 0 1 2 3 4 5 6 7 8
key a b ab ba aba abb bab baa abba
 Original text = abababbabaabbabbaabba
 Compressed text = 012233
• p = abb
• pCode = 5
• c=a
• Represent abb by 5 and enter abba into the code
table.
• Compressed text = 0122335
LZW Compression
code 0 1 2 3 4 5 6 7 8 9
key a b ab ba aba abb bab baa abba abbaa
 Original text = abababbabaabbabbaabba
 Compressed text = 0122335
• p = abba
• pCode = 8
• c=a
• Represent abba by 8 and enter abbaa into the code
table.
• Compressed text = 01223358
LZW Compression
code 0 1 2 3 4 5 6 7 8 9
key a b ab ba aba abb bab baa abba abbaa
 Original text = abababbabaabbabbaabba
 Compressed text = 01223358

• p = abba
• pCode = 8
• c = null
• Represent abba by 8.
• Compressed text = 012233588
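The step-by-step walkthrough can be condensed into a short sketch. It is one possible reading of the slides rather than a definitive implementation: the function name `lzw_compress` is hypothetical, the code table is a plain Python dict keyed by phrase strings, and no limit is placed on the number of codes.

```python
def lzw_compress(text, alphabet):
    """Return (list of output codes, final code table) for the given text."""
    # Initial table: one code per alphabet character, numbered from 0.
    table = {ch: code for code, ch in enumerate(alphabet)}
    codes = []
    p = ""                                 # longest phrase matched so far
    for c in text:
        if p + c in table:
            p += c                         # keep extending the match
        else:
            codes.append(table[p])         # represent p by pCode
            table[p + c] = len(table)      # enter pc under the next free code
            p = c
    if p:
        codes.append(table[p])             # flush the last phrase (c = null)
    return codes, table

codes, table = lzw_compress("abababbabaabbabbaabba", "ab")
```

For the example text, `codes` is `[0, 1, 2, 2, 3, 3, 5, 8, 8]`, matching the compressed text 012233588, and the table ends with `abbaa` under code 9.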
Code Table Representation
code 0 1 2 3 4 5 6 7 8 9
key a b ab ba aba abb bab baa abba abbaa

• Dictionary.
  • Pairs are (key, element) = (key, code).
  • Operations are: get(key) and put(key, code).
• Limit the number of codes to 2^12 = 4096.
• Use a hash table.
  • Convert variable-length keys into fixed-length keys.
  • Each key has the form pc, where the string p is a key that is already in
    the table.
  • Replace pc with (pCode)c.
Code Table Representation

code 0 1 2 3 4 5 6 7 8 9
key a b ab ba aba abb bab baa abba abbaa

code 0 1 2 3 4 5 6 7 8 9
key a b 0b 1a 2a 2b 3b 3a 5a 8a
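The fixed-length key idea can be sketched with a Python dict, which is itself a hash table. Several details here are assumptions made for illustration: the name `lzw_compress_fixed_keys`, the convention of keying single characters as `(None, ch)`, and the choice to simply stop adding entries once the 2^12 limit is reached.

```python
MAX_CODES = 2 ** 12   # limit on the number of codes, as stated above

def lzw_compress_fixed_keys(text, alphabet):
    """LZW compression where each key pc is stored as the pair (pCode, c)."""
    # Assumed convention: single characters are keyed as (None, ch).
    table = {(None, ch): code for code, ch in enumerate(alphabet)}
    next_code = len(alphabet)
    codes = []
    p_code = None                          # code of the phrase matched so far
    for c in text:
        key = (p_code, c)                  # fixed-length stand-in for the string pc
        if key in table:
            p_code = table[key]
        else:
            codes.append(p_code)
            if next_code < MAX_CODES:      # table full: stop adding entries
                table[key] = next_code
                next_code += 1
            p_code = table[(None, c)]
    if p_code is not None:
        codes.append(p_code)
    return codes, table

codes, table = lzw_compress_fixed_keys("abababbabaabbabbaabba", "ab")
```

On the example text this yields the same code sequence as before, and the entries for codes 2 to 9 are the pairs 0b, 1a, 2a, 2b, 3b, 3a, 5a, 8a shown in the second table.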
LZW Decompression
code 0 1
key a b
 Original text = abababbabaabbabbaabba
 Compressed text = 012233588

• Convert codes to text from left to right.


• 0 represents a.
• Decompressed text = a
• pCode = 0 and p = a.
• p = a followed by the next text character (c) will be entered
into the code table once that character is known.
LZW Decompression
code 0 1 2
key a b ab
 Original text = abababbabaabbabbaabba
 Compressed text = 012233588

• 1 represents b.
• Decompressed text = ab
• pCode = 1 and p = b.
• lastP = a followed by first character of p is entered
into the code table.
LZW Decompression
code 0 1 2 3
key a b ab ba
 Original text = abababbabaabbabbaabba
 Compressed text = 012233588

• 2 represents ab.
• Decompressed text = abab
• pCode = 2 and p = ab.
• lastP = b followed by first character of p is entered
into the code table.
LZW Decompression
code 0 1 2 3 4
key a b ab ba aba
 Original text = abababbabaabbabbaabba
 Compressed text = 012233588

• 2 represents ab
• Decompressed text = ababab.
• pCode = 2 and p = ab.
• lastP = ab followed by first character of p is entered
into the code table.
LZW Decompression
code 0 1 2 3 4 5
key a b ab ba aba abb
 Original text = abababbabaabbabbaabba
 Compressed text = 012233588

• 3 represents ba
• Decompressed text = abababba.
• pCode = 3 and p = ba.
• lastP = ab followed by first character of p is entered
into the code table.
LZW Decompression
code 0 1 2 3 4 5 6
key a b ab ba aba abb bab
 Original text = abababbabaabbabbaabba
 Compressed text = 012233588

• 3 represents ba
• Decompressed text = abababbaba.
• pCode = 3 and p = ba.
• lastP = ba followed by first character of p is entered
into the code table.
LZW Decompression
code 0 1 2 3 4 5 6 7
key a b ab ba aba abb bab baa
 Original text = abababbabaabbabbaabba
 Compressed text = 012233588

• 5 represents abb
• Decompressed text = abababbabaabb.
• pCode = 5 and p = abb.
• lastP = ba followed by first character of p is entered
into the code table.
LZW Decompression
code 0 1 2 3 4 5 6 7 8
key a b ab ba aba abb bab baa abba
 Original text = abababbabaabbabbaabba
 Compressed text = 012233588

• 8 represents ???
• When a code is not in the table, its key is
lastP followed by first character of lastP.
• lastP = abb
• So 8 represents abba.
LZW Decompression
code 0 1 2 3 4 5 6 7 8 9
key a b ab ba aba abb bab baa abba abbaa
 Original text = abababbabaabbabbaabba
 Compressed text = 012233588

• 8 represents abba
• Decompressed text = abababbabaabbabbaabba.
• pCode = 8 and p = abba.
• lastP = abba followed by first character of p is
entered into the code table.
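The decompression steps, including the case where a code is not yet in the table, can be sketched as follows. The function name `lzw_decompress` is hypothetical, and the sketch assumes well-formed input, i.e. an unknown code is always the next code about to be assigned.

```python
def lzw_decompress(codes, alphabet):
    """Return (decompressed text, final code table) for a list of LZW codes."""
    table = list(alphabet)                 # table[code] = phrase
    if not codes:
        return "", table
    last_p = table[codes[0]]               # first code is always in the table
    out = [last_p]
    for code in codes[1:]:
        if code < len(table):
            p = table[code]
        else:
            # Code not in the table yet: its key is lastP + first char of lastP.
            p = last_p + last_p[0]
        table.append(last_p + p[0])        # enter lastP + first character of p
        out.append(p)
        last_p = p
    return "".join(out), table

text, table = lzw_decompress([0, 1, 2, 2, 3, 3, 5, 8, 8], "ab")
```

Decoding 012233588 this way returns the original text abababbabaabbabbaabba and rebuilds the same ten-entry code table that compression produced.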
Code Table Representation
code 0 1 2 3 4 5 6 7 8 9
key a b ab ba aba abb bab baa abba abbaa

• Dictionary.
  • Pairs are (key, element) = (code, what the code represents) = (code,
    codeKey).
  • Operations are: get(key) and put(key, element), i.e. put(code, codeKey).
  • Keys are integers 0, 1, 2, …
• Use a 1D array codeTable.
  • codeTable[code] = codeKey.
  • Each code key has the form pc, where the string p is a code key that
    is already in the table.
  • Replace pc with (pCode)c.
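The array representation can be sketched with a Python list whose entries are (pCode, c) pairs. The names `expand` and `lzw_decompress_array` are hypothetical, single characters are stored as `(None, ch)` by assumption, and the input is again assumed to be well formed.

```python
def expand(code_table, code):
    """Rebuild the phrase for a code by following pCode links back to a single character."""
    chars = []
    while code is not None:
        p_code, c = code_table[code]
        chars.append(c)
        code = p_code
    return "".join(reversed(chars))

def lzw_decompress_array(codes, alphabet):
    """LZW decompression with codeTable[code] = (pCode, c)."""
    code_table = [(None, ch) for ch in alphabet]   # assumed form for single characters
    out = []
    last_code = None
    for code in codes:
        if last_code is not None:
            # First character of the current phrase; for a code not yet in the
            # table this is the first character of lastP.
            source = code if code < len(code_table) else last_code
            first = expand(code_table, source)[0]
            code_table.append((last_code, first))  # enter lastP + first character
        out.append(expand(code_table, code))
        last_code = code
    return "".join(out), code_table

text, code_table = lzw_decompress_array([0, 1, 2, 2, 3, 3, 5, 8, 8], "ab")
```

Each table entry now has a fixed size regardless of phrase length, at the cost of walking the pCode chain whenever a phrase is expanded.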
Time Complexity
 Compression.
 O(n) expected time, where n is the length of the text that is
being compressed.
 Decompression.
 O(n) time, where n is the length of the decompressed text.
