Topics:
- Audio: PCM, DPCM
- Image: hierarchical coding, subband coding, MPEG, JPEG, DCT, wavelet and Haar transforms
Introduction
A key problem with multimedia is the huge quantity of data that results from raw digitized audio, image or video sources. The main goal of coding and compression is to reduce the storage, processing and transmission costs for these data. A variety of coding and compression techniques are commonly used in the Internet and other systems.
The components of such a system are capturing, transforming, coding and transmitting:

    Sample -> Transform -> Coding -> Transmit
Sampling --- Analog to Digital Conversion.
An input signal is converted from some continuously varying physical value (e.g. pressure in air, or frequency or wavelength of light) into a continuously varying electrical signal by some electro-mechanical device. This electrical signal can then be converted to a sequence of digital values, called samples, by an analog-to-digital conversion circuit.
Two factors determine how accurately the samples represent the original continuous signal:
Sampling and the Nyquist theorem
- The maximum rate at which we sample. By the Nyquist theorem, the digital sampling rate must be at least twice the highest frequency present in the continuous signal.
- The number of bits used in each sample (known as the quantization level).
However, it is often not necessary to capture all frequencies in the original signal. For example, voice is comprehensible with a much smaller range of frequencies than we can actually hear.
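The Nyquist limit can be checked numerically. A minimal sketch (the helper name is illustrative): a component above half the sampling rate folds back, or aliases, into the band below it.

```python
def alias_frequency(f_signal, f_sample):
    """Frequency at which f_signal appears after sampling at f_sample."""
    f = f_signal % f_sample              # fold into one sampling period
    return min(f, f_sample - f)          # reflect into [0, f_sample/2]

# Sampling at 8000 samples/sec captures anything up to 4000 Hz faithfully:
assert alias_frequency(3000, 8000) == 3000   # below Nyquist: preserved
assert alias_frequency(5000, 8000) == 3000   # above Nyquist: aliased down
```

This is why the band-pass filter described later must run before sampling: once a 5 kHz tone has aliased to 3 kHz, no later processing can tell it apart from a real 3 kHz tone.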
Introduction - Transform
The goal of a transform is to decorrelate the original signal; this decorrelation results in the signal energy being redistributed among only a small set of transform coefficients. The original data can be transformed in a number of ways to make it easier to apply certain compression techniques. The most common transforms in current techniques are the discrete cosine transform (DCT) and the wavelet transform.
- Compression techniques were developed early in the life of computers, to cope with the problems of limited memory and storage capacity.
- Hardware advances have limited the requirement for such techniques in desktop applications.
- Network and communication capacity restrictions have resulted in continuing work on compression.
- The advent of distributed multimedia has resulted in considerable developments in compression.
- Problem: real-time, or timely, transmission of audio and video over communications networks.
Media data rates

Media              Parameters                               Data Rate
Speech             8000 samples/sec
CD Audio           44,100 samples/sec, 2 bytes/sample       172 KB/s
Satellite Images   180x180 km2, 30 m2 resolution            1030 MB/image
VGA Video          640x480 pixels, 3 bytes/pixel, 25 fps    22 MB/s
Another View

Data Rate    Size/Hour
128 Kb/s     60 MB
384 Kb/s     170 MB
1.5 Mb/s     680 MB
3.0 Mb/s     1.4 GB
6.0 Mb/s     2.7 GB
25 Mb/s      11.0 GB
1280x720 (1.77)
640x480 (1.33)
320x240
160x120
- 250 KB/image
- Digital Cameras: 18-150 MB/image
- Digital Television: 166 MB/second
Image compression standards are needed so that compressed images can be exchanged between different software and hardware.
Standards are developed by different standards bodies: ISO, ITU, ANSI, etc. Some popular image and video compression standards are JPEG, MPEG-1, MPEG-2, MPEG-4, etc. It is important to note that there are also many proprietary compression codecs!
The uncompressed video data rate is determined by:
- width, in pixels (160, 320, 640, 720, 1280, 1920, ...)
- height, in pixels (120, 240, 480, 485, 720, 1080, ...)
- depth, in bits (1, 4, 8, 15, 16, 24, ...)
- fps, in frames per second (5, 15, 20, 24, 30, ...)
- compression factor (1, 6, 24, ...)
Effects of Compression

(table: storage in megabytes for 1 hour of compressed video, at compression ratios from 1:1 up to 100:1, for frame sizes 1920x1080, 1280x720, 640x480, 320x240 and 160x120; assuming 3 bytes/pixel, 30 frames/sec)
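The megabyte figures in such a table follow directly from width x height x depth x fps; a minimal sketch (the function name and the intermediate compression ratios are illustrative):

```python
def hour_of_video_mb(width, height, bytes_per_pixel=3, fps=30, ratio=1):
    """Megabytes needed for one hour of video at a given compression ratio."""
    bytes_per_second = width * height * bytes_per_pixel * fps
    return bytes_per_second * 3600 / ratio / 1e6

for w, h in [(1920, 1080), (1280, 720), (640, 480), (320, 240), (160, 120)]:
    row = [round(hour_of_video_mb(w, h, ratio=r)) for r in (1, 6, 24, 100)]
    print(f"{w}x{h}: {row} MB/hour")
```

Uncompressed 1920x1080 video needs roughly 672 GB per hour; even 100:1 compression still leaves about 6.7 GB.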
(diagram: a video compression pipeline combining several families of techniques -- predictive coding, transform coding (DCT), and statistical coding (Huffman) -- together with subsampling, CLUT and run-length coding, and fixed or adaptive bit assignment, producing the compressed bit-stream)
As can be seen from the diagram, the majority of video compression algorithms use a combination of compression techniques to produce the bit-stream. We will consider each of the individual techniques identified in the diagram. We assume that all input to the system is in the form of a PCM (Pulse Code Modulation - we will discuss this later when considering Sound sampling) digitised signal in colour component (RGB, YUV) form. Selection of colour component form can be important, where there are differences in colour processing between compression and decompression. Techniques can be made adaptive to the image content.
CLUT

Colour Lookup Table:
- pixel values in the bitmap represent an index into a table of colours
- usually 8 bpp, so the image is limited to 256 colours
- a unique CLUT can be created for each image, but this results in non-trivial preprocessing
- bpp can be increased for better quality, but once you reach 16 bpp truncation is better and simpler
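A minimal sketch of CLUT encoding and decoding (the helper names are illustrative):

```python
def clut_encode(pixels):
    """Build a colour table plus per-pixel indices for a list of RGB tuples."""
    table = sorted(set(pixels))                    # one entry per unique colour
    assert len(table) <= 256, "an 8 bpp CLUT holds at most 256 colours"
    index = {colour: i for i, colour in enumerate(table)}
    return table, [index[p] for p in pixels]       # 1 byte per pixel + table

def clut_decode(table, indices):
    return [table[i] for i in indices]

pixels = [(255, 0, 0), (0, 0, 255), (255, 0, 0), (255, 0, 0)]
table, coded = clut_encode(pixels)
assert clut_decode(table, coded) == pixels   # lossless while colours <= 256
```

The per-image table is the "non-trivial preprocessing" mentioned above; images with more than 256 distinct colours would first have to be quantised down to 256.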
Run-length Encoding

- blocks of repeated pixels are replaced with a single value plus a count
- works well on images with large repeated blocks of solid colours, and can achieve compression rates below 1 bpp
- good for computer-generated images, cartoons, etc.
- poor for real images, video, etc.
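Run-length encoding can be sketched in a few lines (helper names are illustrative):

```python
def rle_encode(values):
    """Replace each run of identical values with a (value, count) pair."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1                 # extend the current run
        else:
            runs.append([v, 1])              # start a new run
    return [tuple(r) for r in runs]

def rle_decode(runs):
    return [v for v, n in runs for _ in range(n)]

row = [7, 7, 7, 7, 0, 0, 9]                  # solid blocks compress well
assert rle_encode(row) == [(7, 4), (0, 2), (9, 1)]
assert rle_decode(rle_encode(row)) == row
```

On noisy, real-world images most runs have length 1, so the (value, count) pairs can end up larger than the raw data -- which is exactly why RLE suits cartoons but not photographs.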
Interpolative Techniques

Interpolative encoding works at the pixel level by transmitting a subset of the pixels and using interpolation to reconstruct the intervening pixels.
- not really compression, as we are reducing the number of pixels rather than the size of their representation
- validly used in colour subsampling: working with luminance-chrominance component images (YUV), it can reduce 24 bpp to 9 bpp
- also used in motion video compression (e.g. MPEG)
Predictive Techniques

Based on the fact that we can store the previous item (frame, line, pixel, etc.) and use it to help build the next item, allowing us to transmit only that part of the item that has changed.

DPCM

Compare adjacent pixels and only transmit the difference between them. Because adjacent pixels are likely to be similar, the difference values have a high probability of being small and can safely be transmitted with fewer bits; hence we can use 4-bit difference values for 8-bit pixels. In decompression the difference value is used to modify the previous pixel to get the new one, which works well as long as the amplitude change is small. If the change is full amplitude, say from black to white, it overloads the DPCM system, requiring a number of pixel times to make the change and causing smearing of the edges in high-contrast images (slope overload).
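The slope-overload behaviour can be demonstrated with a minimal DPCM sketch (4-bit differences, first pixel sent at full precision; the helper names are illustrative):

```python
def dpcm_encode(pixels, bits=4):
    """Send the first pixel in full, then clamped differences."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1   # -8..7 for 4 bits
    first = pixels[0]
    prev, diffs = first, []
    for p in pixels[1:]:
        d = max(lo, min(hi, p - prev))
        diffs.append(d)
        prev += d          # track the decoder's reconstruction, not the input
    return first, diffs

def dpcm_decode(first, diffs):
    out, prev = [first], first
    for d in diffs:
        prev += d
        out.append(prev)
    return out

smooth = [10, 12, 13, 13, 15]
assert dpcm_decode(*dpcm_encode(smooth)) == smooth   # small steps: exact
edge = [0, 255, 255, 255]                            # black to white
assert dpcm_decode(*dpcm_encode(edge)) == [0, 7, 14, 21]
```

On the smooth row the reconstruction is exact; on the black-to-white edge the decoder can climb only 7 levels per pixel time, which is precisely the edge smearing (slope overload) described above.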
ADPCM

Adaptive DPCM can adapt the step size of the difference values to cope with full-amplitude changes, at the cost of some extra overhead in data and processing to achieve the adaptation. It replaces slope overload with quantisation noise at high-contrast edges.
Since predictive encoding depends on previous pixels to predict future ones, any errors are likely to be exacerbated. To avoid this, predictive schemes typically make differential restarts, often at the beginning of each scan line or each frame.
In image and video compression, the bundle of data transformed together is usually a two-dimensional array of pixels, e.g. 8x8.
2x2 array of pixels:

    A B
    C D

Transform:        Inverse transform:
X0 = A            A' = X0
X1 = B - A        B' = X1 + X0
X2 = C - A        C' = X2 + X0
X3 = D - A        D' = X3 + X0
In the simple example shown, if the pixels were 8 bits each then the block would use 32 bits.
Using the transform, we could assign 4 bits each for the difference values and 8 bits for the base pixel, A. This would reduce the data to 8 + (3x4) = 20 bits for the 2x2 block, compressing from 8 bpp to 5 bpp. This example is too small to be useful; typically transforms operate on 8x8 blocks, and the trick is to develop good transforms whose calculations are easy to implement in hardware or software.
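The 2x2 transform above is easy to verify in code (a sketch):

```python
def transform_2x2(a, b, c, d):
    """Forward transform: one base pixel plus three differences."""
    return a, b - a, c - a, d - a

def inverse_2x2(x0, x1, x2, x3):
    """Inverse transform: add the base pixel back onto each difference."""
    return x0, x1 + x0, x2 + x0, x3 + x0

block = (100, 102, 99, 101)                       # neighbouring pixels are similar
assert transform_2x2(*block) == (100, 2, -1, 1)   # differences fit in 4 bits
assert inverse_2x2(*transform_2x2(*block)) == block
```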
As adjacent pixel values tend to be similar or vary slowly from one to another, the DCT processing provides the opportunity for compression by forcing most of the signal energy into the lower spatial frequency components. In most cases, many of the higher-frequency coefficients will have zero or near-zero values and can be ignored.
Statistical Coding

Uses the statistical distribution of the pixel values in an image, or of the data created by one of the techniques already described. Also known as entropy encoding. Can be used in bit assignment as well as part of the compression algorithm itself. Due to the non-uniform distribution of pixel values, we can set up a coding technique where the more frequently occurring values are encoded using fewer bits.
A codebook is created which sets out the encodings for the pixel values; this is transmitted separately from the image data and can apply to part of an image, a single image or a sequence of images. Because the most frequently occurring values are transmitted using fewer bits, high compression ratios can be achieved. One of the most widely used forms of statistical coding is Huffman encoding.
Motion Compensation

If we are transmitting video frames by describing the difference between one frame and the next, how do we describe motion?
- Compare frames for differences
- Set a threshold value for motion
- Use a DPCM approach to encode the data
- Use a block structure to determine motion in parts of the image (similar to the transform approach)
- In sophisticated compression systems, motion vectors can be developed to ensure fidelity of reproduction
Lossy compression

- the image shows degradation from the original
- high rates of compression (up to 200:1)
- objective: achieve the highest possible rate of compression while maintaining image quality that is virtually lossless
What is JPEG / MPEG / H.26x?

- JPEG (Joint Photographic Experts Group): still image compression, intraframe picture technology; MJPEG is a sequence of images coded with JPEG
- MPEG (Moving Picture Experts Group): many standards -- MPEG-1, MPEG-2, and MPEG-4; very sophisticated technology involving intra- and interframe picture coding and many other optimizations, giving high quality at a cost in time/computation
- H.261/H.263/H.263+: video conferencing; low to medium bit rate, quality, and computational cost; used in the H.320 and H.323 video conferencing standards
JPEG lossy compression

- the image is broken down into 8x8 blocks
- apply the DCT and then quantize the coefficients
- encode using the same system as for lossless
Quality                  Bits per pixel
Moderate to good         0.25 - 0.5
Good to very good        0.5 - 0.75
Excellent                0.75 - 1.5
Near original quality    1.5 - 2.0
JPEG can be used for video information (Motion JPEG), but it makes no concession to the nature of video, maintaining the same structure and, more importantly, the same bit rate for each frame of the video.
JBIG

- lossless compression of one-bit-per-pixel (binary, or bi-level) images
- based on a template structure to model redundancy within the image
- uses arithmetic encoding
- intended primarily for use with fax
MPEG

MPEG-1: data rate 1-1.5 Mbps. Features:

- random access to frames: allow starting the video sequence at any point
- fast forward and reverse searches: view video in either direction at more than the original speed
- reverse playback: permit a reverse play mode (not appropriate in situations such as video telephony)
- audio-video synchronisation: manage lip-synch
- robustness to errors: should be able to recover from errors, and not propagate errors through frames; particularly important when dealing with non-error-free communication channels
- adjustable delay time for real-time operation: not a factor in normal video playback, but of particular importance in video telephony
- editability: permit inclusion of other video in encoded sections
- flexible format: permit different window sizes and frame rates
- implementable in hardware: a dedicated chipset for decoding is desirable
Algorithm

Based on 3 types of frame:
- I-frame: similar to a JPEG still image; the basis of the encoding, as it contains the maximum amount of information.
- P-frame: contains less information than an I-frame and is obtained by motion-compensated prediction from past I- or P-frames.
- B-frame: has the greatest level of compression, i.e. the least information; obtained by interpolation between an I-frame and a P-frame.
Audio
Various elements of audio capture are defined in the MPEG standard; see the handout.
The input audio signal from a microphone is passed through several stages:

- first, a band-pass filter is applied, eliminating frequencies in the signal that we are not interested in
- the signal is then sampled, converting the analog signal into a sequence of values
- the samples are then quantised, i.e. mapped onto a set of fixed values
- these values are then coded for storage or transmission
Some techniques for audio compression:
- ADPCM
- LPC
- CELP
ADPCM -- Adaptive Differential Pulse Code Modulation

ADPCM allows for the compression of PCM-encoded input whose power varies with time. A reconstructed version of the input signal is fed back and subtracted from the actual input signal, and the difference is quantised to give a 4-bit output value. At an 8 kHz sampling rate this gives a 32 kbit/s output rate.
(block diagram: in the transmitter, the predictor output Xm* is subtracted from the original signal Xm to form the error Em, which is sent over the channel; in the receiver, Em is added to the local predictor output to give the reconstructed signal Xm')
LPC -- Linear Predictive Coding

The encoder fits speech to a simple, analytic model of the vocal tract; only the parameters describing the best-fit model are transmitted to the decoder. An LPC decoder uses those parameters to generate synthetic speech that is usually very similar to the original. LPC is used to compress audio at 16 kbit/s and below.
CELP -- Code Excited Linear Predictor

CELP does the same LPC modelling but then computes the errors between the original speech and the synthetic model, and transmits both the model parameters and a very compressed representation of the errors. The result is much higher-quality speech at a low data rate.
CODING
Uncompressed images, audio, and video data require considerable storage capacity. Data transfer of uncompressed video data over digital networks requires very high bandwidth to be provided for a single point-to-point communication. Without compression, a CD with a storage capacity of approximately 600 million bytes would only be able to store about 260 pictures (1024x768 true color) or at the 25 frames per second rate of a motion picture, about 10 seconds of a movie.
Compression Terminology

Compression Ratio

The ratio of raw data to compressed data. It is computed by dividing the original number of bits or bytes by the number remaining after compression, or expressed as a percentage of compressed/original. For lossless compression, compression ratios of 2:1 (50%) or 3:1 (33%) are typical. For lossy compression of video, compression ratios of more than 100:1 may be achievable, depending on the effectiveness of the compression algorithm and the acceptable information loss.
Image Compression

2-stage coding technique:
1. A linear predictor such as DPCM, or some other linear predicting function, to decorrelate the raw image data.
2. A standard coding technique, such as Huffman coding or arithmetic coding.

Lossless JPEG:
- version 1: DPCM with arithmetic coding
- version 2: DPCM with Huffman coding
Entropy Encoding

Used regardless of the medium's specific characteristics. The data stream to be compressed is considered a simple digital sequence, and the semantics of the data are ignored. It is a lossless process.
Source Encoding
Takes into account the semantics of the data. The degree of compression that can be reached by source encoding depends on the data contents. It is usually a lossy process.
A simple example

Suppose we have a message built from 5 distinct symbols, e.g. [ ]. How can we code this message using 0/1 so that the coded message has minimum length (for transmission or saving)? With 5 symbols, a fixed-length code needs at least 3 bits per symbol; for a simple encoding of a 10-symbol message, the length is 10 x 3 = 30 bits.
Definitions

An ensemble X is a triple (x, Ax, Px):
- x: the value of a random variable
- Ax: the set of possible values for x, Ax = {a1, a2, ..., aI}
- Px: the probability of each value, Px = {p1, p2, ..., pI}, where P(x) = P(x = ai) = pi, with pi > 0 and sum_i pi = 1

For example, i = 1, 2, 3, ..., 26 with ai = a, b, c, ..., z.

Source coding theorem: there exists a variable-length encoding C of an ensemble X such that the average length of an encoded symbol, L(C,X), satisfies

    L(C,X) in [H(X), H(X)+1)

where H(X) is the entropy of X.
Symbol Codes

Notations:
- A^N: all strings of length N; A+: all strings of finite length
- {0,1}^3 = {000, 001, 010, ..., 111}
- {0,1}+ = {0, 1, 00, 01, 10, 11, 000, 001, ...}

A symbol code C for an ensemble X is a mapping from Ax (the range of x values) to {0,1}+; c(x) is the codeword for x, and l(x) is the length of that codeword.
Example

Ensemble X: Ax = {a, b, c, d}, Px = {1/2, 1/4, 1/8, 1/8}

Code C0:

ai   c(ai)   li
a    1000    4
b    0100    4
c    0010    4
d    0001    4
The expected length of a code is

    L(C,X) = sum_{x} P(x) l(x) = sum_{i=1}^{|Ax|} pi li
Example

Ensemble X: Ax = {a, b, c, d}, Px = {1/2, 1/4, 1/8, 1/8}

Code C1:

ai   c(ai)   li
a    0       1
b    10      2
c    110     3
d    111     3

c+(acd) = 0110111 (7 bits, compared with 12 under C0)

Is C1 a prefix code?
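The entropy and expected length for this ensemble can be checked numerically (a sketch):

```python
from math import log2

Px = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = {"a": 1, "b": 2, "c": 3, "d": 3}       # code C1: 0, 10, 110, 111

H = -sum(p * log2(p) for p in Px.values())       # entropy of the ensemble
L = sum(Px[x] * lengths[x] for x in Px)          # expected codeword length
assert H == 1.75 and L == 1.75                   # C1 meets the theorem's lower bound
```

Because every probability here is a power of 1/2, the codeword lengths can exactly equal the symbol information contents, so L(C,X) = H(X).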
Example

Ax = {a, b, c, d, e}, Px = {0.25, 0.25, 0.2, 0.15, 0.15}

(figure: the Huffman tree built for this ensemble; the surviving fragments show codewords a -> 00 and b -> 10, and internal node weights 0.55 and 1.0)
Create a binary tree whose children are the encoding units with the smallest frequencies
The frequency of the root is the sum of the frequencies of the leaves
Repeat this procedure until all the encoding units are in the binary tree
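The procedure above can be sketched with a min-heap. Tie-breaking between equal frequencies is arbitrary, so the exact codewords may differ from run to run, but the total coded length is always optimal (the helper name is illustrative):

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman code from a {symbol: frequency} mapping."""
    tick = count()  # unique tie-breaker so the heap never compares tree nodes
    heap = [(f, next(tick), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # two smallest frequencies
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tick), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):           # internal node
            walk(node[0], prefix + "0")       # 0 on left branches
            walk(node[1], prefix + "1")       # 1 on right branches
        else:
            codes[node] = prefix or "0"       # leaf: record the codeword
    walk(heap[0][2], "")
    return codes

freqs = {"A": 40, "B": 20, "C": 10, "D": 10, "R": 20}
codes = huffman_codes(freqs)
cost = sum(freqs[s] * len(codes[s]) for s in freqs)
assert cost == 220   # same total as A=0, B=100, C=1010, D=1011, R=11
```

With these frequencies several distinct trees are optimal (the worked example below is one of them); all of them give a total of 220 bits for the 100 symbols.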
Example, step I
Assume that the relative frequencies are:
A: 40 B: 20 C: 10 D: 10 R: 20
Example, step II
C and D have already been combined, and the new node above them (call it C+D) has value 20. The smallest values are now B, C+D, and R, all of which have value 20.
Connect any two of these
Example, step IV
Connect the final two nodes
Example, step V

Assign 0 to left branches, 1 to right branches. Each encoding is a path from the root:

A = 0, B = 100, C = 1010, D = 1011, R = 11
Example 2

Suppose we want to compress the message ABEACADABEA. The count table is:

character   frequency
A           5
B           2
C           1
D           1
E           2

The following table is used to encode the characters:

character   representation
A           1
B           01
C           0010
D           0011
E           000
The message ABEACADABEA can then be encoded as the string 10100010010100111010001.

Total number of bits to code the string = 5x1 + 2x2 + 1x4 + 1x4 + 2x3 = 23 bits. If the original message uses 8 bits per character, its length is 11x8 = 88 bits, giving a compression ratio of 88/23 = 3.83 (26.14%). If the original message uses 3 bits per character, its length is 11x3 = 33 bits, giving a compression ratio of 33/23 = 1.43 (69.70%).
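The encoding and the bit counts can be verified mechanically (a sketch):

```python
codes = {"A": "1", "B": "01", "C": "0010", "D": "0011", "E": "000"}
message = "ABEACADABEA"

# Concatenate the codeword for each character of the message.
encoded = "".join(codes[ch] for ch in message)
assert encoded == "10100010010100111010001"
assert len(encoded) == 23
assert round(88 / len(encoded), 2) == 3.83    # vs an 8-bit/character original
assert round(33 / len(encoded), 2) == 1.43    # vs a 3-bit/character original
```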
Example 3
a) Using the Huffman coding algorithm, compress a string with the following probabilities of occurrence:

character     A      B      C      D      E      F      G
probability   0.325  0.250  0.150  0.125  0.050  0.075  0.025

b) Assuming that the above string was originally represented by a 3-bit code, calculate the compression ratio achieved.
(figure: the Huffman tree built from these probabilities; the surviving fragments show leaf weights A 0.325, B 0.250, C 0.150, D 0.125, E 0.050, F 0.075, G 0.025 and internal node weights 0.275, 0.6 and 1.0)
The resulting code: A(00), B(10), C(11), D(011), E(01010), F(0100), G(01011).

Original expected length: 3 bits. Compressed expected length: 0.325(2) + 0.250(2) + 0.150(2) + 0.125(3) + 0.050(5) + 0.075(4) + 0.025(5) = 2.5 bits/character. Compression ratio = 3/2.5 = 1.2 (83.33%).
LZW (Lempel-Ziv-Welch) is a dictionary-based compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch. It was published by Welch in 1984 as an improved implementation of the LZ78 algorithm published by Lempel and Ziv in 1978. The algorithm is designed to be fast to implement, but not necessarily optimal, since it does not perform any analysis on the data.
LZW
The principle of encoding
The algorithm is surprisingly simple. It replaces strings of characters with single codes. It does not do any analysis of the incoming text. Instead, it just adds every new string of characters it sees to a table of strings. Compression occurs when a single code is output instead of a string of characters. It became very widely used after it became part of the GIF image format in 1987.
81
Algorithm

The compression algorithm is as follows:

    initialize table with single-character strings
    STRING = first input character
    WHILE not end of input stream
        CHARACTER = next input character
        IF STRING + CHARACTER is not in the string table
            add STRING + CHARACTER to the string table
            output the code for STRING
            STRING = CHARACTER
        ELSE
            STRING = STRING + CHARACTER   // keep extending the current match
        END IF
    END WHILE
    output the code for STRING
(worked trace: starting from a table of single characters 0..255 including ..ABC.., the codes 65 (A), 256 (BA), 257 (AB), 65 (A), 260 (AA) are processed, adding dictionary entries 256 BA, 257 AB, 258 BAA, 259 ABA, 260 AA)
1-D DCT (N = 8):

    F(u) = a(u) sqrt(2/N) sum_{n=0}^{N-1} f(n) cos((2n+1) u pi / 2N)

    f(n) = sqrt(2/N) sum_{u=0}^{N-1} a(u) F(u) cos((2n+1) u pi / 2N)

    where a(0) = 1/sqrt(2) and a(u) = 1 for u != 0