
Compression

  Why compression?
  Lossy or Lossless compression
  Entropy encoding and source encoding
  Source encoding
  Suppression of repetitive sequences
  Statistical encoding
  Pattern substitution
  Huffman encoding
  Transform encoding
  Transformations
  Differential or predictive encoding
  Different types of differential encoding
  Vector quantization
  Errors in vector quantization
  Vector quantization
  Fractal compression
  Asymmetry in compression/decompression

Why compression?
Minimize the bit-rate!
A CD-ROM holds 648 MB: 72 minutes of uncompressed stereophonic CD-quality sound, BUT only 30 seconds of uncompressed studio-quality digital TV.
A 90-minute movie would take about 120 GB, which is about 189 CD-ROMs.
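The figures above can be checked with a little arithmetic. The sketch below uses only the numbers on the slide; treating 1 GB as 1024 MB is an assumption, and the variable names are illustrative.

```python
# Figures taken from the slide.
CD_ROM_MB = 648            # capacity of one CD-ROM
TV_SECONDS_PER_CD = 30     # studio-quality TV that fits on one CD-ROM
MOVIE_MINUTES = 90

# Implied data rate of uncompressed studio-quality TV.
tv_rate_mb_per_s = CD_ROM_MB / TV_SECONDS_PER_CD

# Size of a 90-minute movie at that rate.
movie_mb = tv_rate_mb_per_s * MOVIE_MINUTES * 60
movie_gb = movie_mb / 1024          # assumption: 1 GB = 1024 MB

# Number of CD-ROMs needed for the rounded figure of 120 GB.
cds_for_120_gb = 120 * 1024 / CD_ROM_MB

print(f"TV rate: {tv_rate_mb_per_s:.1f} MB/s")
print(f"Movie:   {movie_gb:.1f} GB")
print(f"CD-ROMs for 120 GB: {cds_for_120_gb:.1f}")
```

The movie comes out slightly below the slide's rounded 120 GB, and 120 GB divided by 648 MB gives the quoted figure of about 189 discs.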

We need compression!!

Lossy or Lossless compression


Some information contains redundancy.
Is all information needed? That depends on the target and the final "viewer".

Lossless: all information is saved and the compression is reversible.
Lossy: some information is "thrown away", based on the perceptual response of an observer. The compression is irreversible.

Entropy encoding and source encoding


Entropy encoding
The source of the data is NOT taken into account when doing the compression.
  Remove repetitive sequences
  Statistical encoding
  Lossless and reversible

Source encoding
The data is transformed based on the source and its known characteristics.
  Lossy or lossless

Source encoding
Examples:
  Remove silent parts in an audio sequence
  Find common blocks in two video frames
Classification:
  Transform encoding
  Differential encoding
  Vector quantization

Entropy and source encoding can be combined!!

Suppression of repetitive sequences


Zero (or blank) suppression:
A series of n successive zeroes is replaced by a special character followed by the number n.

Run-length encoding:
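Same as above, but the replaced character is also entered into the code, so any repeated character can be suppressed.
The number of sequential occurrences must be higher than 3: a run is coded as three symbols (special character, replaced character, count), so shorter runs gain nothing.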
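A minimal Python sketch of run-length encoding as described above. The marker character "!" and the decimal count are illustrative assumptions, and the data is assumed to contain neither the marker nor digits. Zero suppression is the special case where only runs of zeroes are replaced and the character itself is left out of the code.

```python
MARKER = "!"  # hypothetical special character, assumed absent from the data


def rle_encode(text, min_run=4):
    """Replace runs of at least min_run equal characters by MARKER + char + count."""
    out = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i]:
            j += 1
        run = j - i
        if run >= min_run:
            out.append(f"{MARKER}{text[i]}{run}")
        else:
            out.append(text[i] * run)  # short runs are copied unchanged
        i = j
    return "".join(out)


def rle_decode(code):
    """Invert rle_encode (assumes the original data held no digits or MARKER)."""
    out = []
    i = 0
    while i < len(code):
        if code[i] == MARKER:
            ch = code[i + 1]
            j = i + 2
            while j < len(code) and code[j].isdigit():
                j += 1
            out.append(ch * int(code[i + 2:j]))
            i = j
        else:
            out.append(code[i])
            i += 1
    return "".join(out)


original = "AB" + "C" * 8 + "DEFGGG"
packed = rle_encode(original)
print(packed)                      # the run of eight C's is shortened, GGG is not
print(rle_decode(packed) == original)
```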

Statistical encoding
The sequences of data that occur most frequently use the shortest codes.
A code-book is generated either in advance (for example, the Morse alphabet) or dynamically during the encoding.
Two groups of encodings:
  Pattern substitution
  Huffman encoding

Pattern substitution
Used when encoding text.
Frequent words are replaced with shorter codewords: "Multimedia" is replaced with "*M" and "Network" with "*N".
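A small sketch of pattern substitution with the two codewords from the slide. It assumes the escape character "*" never occurs in the original text; the function names are illustrative.

```python
# Code-book from the slide: frequent word -> shorter codeword.
CODE_BOOK = {"Multimedia": "*M", "Network": "*N"}


def substitute(text, code_book=CODE_BOOK):
    """Replace every code-book word by its codeword."""
    for word, code in code_book.items():
        text = text.replace(word, code)
    return text


def restore(text, code_book=CODE_BOOK):
    """Expand codewords back into the original words."""
    for word, code in code_book.items():
        text = text.replace(code, word)
    return text


sentence = "Multimedia traffic on the Network"
short = substitute(sentence)
print(short)
print(restore(short) == sentence)
```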

Huffman encoding
The code-book is created dynamically and is sent with the compressed information.
Used for images, movies and program files.
In the case of movies, the code-book can be calculated per frame or per movie.
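A compact sketch of how such a code-book can be built from symbol frequencies with the standard Huffman construction. Bit strings are kept as text for readability, and the function names are illustrative rather than taken from any particular codec.

```python
import heapq
from collections import Counter


def huffman_code_book(data):
    """Build a prefix-free code-book {symbol: bit string} from symbol counts."""
    freq = Counter(data)
    if len(freq) == 1:                       # degenerate case: one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, partial code-book of the subtree).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # the two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def encode(data, book):
    return "".join(book[s] for s in data)


def decode(bits, book):
    inverse = {code: sym for sym, code in book.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:               # prefix-free: first match is a symbol
            out.append(inverse[current])
            current = ""
    return "".join(out)


text = "abracadabra"
book = huffman_code_book(text)
bits = encode(text, book)
print(book)
print(len(bits), "bits instead of", 8 * len(text))
```

The decoder needs the same code-book, which is why it travels with the compressed data.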

Transform encoding
Some data is easier to compress in the frequency domain.
The data is translated from the spatial or temporal domain to the frequency domain.

Transformations
Mathematical transformations: Fourier, cosine
Lossy or lossless
Some frequencies are coded with lower precision or are removed completely.
Discrete Cosine Transform - DCT: used when coding images
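A one-dimensional sketch of the idea, using an orthonormal DCT-II and its inverse written out directly from the definition. Removing the high-frequency coefficients is the lossy step. The helper names and the sample values are illustrative assumptions.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out


def idct(c):
    """Inverse of dct (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1 / n)
        s += sum(c[k] * math.sqrt(2 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out


def drop_high_frequencies(coeffs, keep):
    """Lossy step: zero every coefficient after the first `keep`."""
    return coeffs[:keep] + [0.0] * (len(coeffs) - keep)


samples = [52, 55, 61, 66, 70, 61, 64, 73]      # made-up sample values
coeffs = dct(samples)
approx = idct(drop_high_frequencies(coeffs, keep=4))
print([round(c, 1) for c in coeffs])
print([round(a, 1) for a in approx])
```

Keeping all coefficients reproduces the input up to rounding, which corresponds to the lossless case.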

Differential or predictive encoding


Instead of storing the complete value, the difference to the preceding value (prediction difference or error term) is stored.
Good when the information contains low variation.
With the same number of bits, a wider range of values can be covered than with absolute encoding.
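A minimal sketch of differential encoding, using the preceding value as the prediction and an assumed start value of 0. The small differences are what a later entropy coder can store in few bits.

```python
def diff_encode(samples):
    """Store each sample as the difference to the preceding one."""
    previous = 0                 # assumed start value, known to the decoder
    out = []
    for s in samples:
        out.append(s - previous)
        previous = s
    return out


def diff_decode(diffs):
    """Rebuild the samples by accumulating the differences."""
    value = 0
    out = []
    for d in diffs:
        value += d
        out.append(value)
    return out


signal = [100, 102, 103, 103, 101]
print(diff_encode(signal))
```

After the first value, every stored number is small even though the samples themselves are around 100.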

Different types of differential encoding

Differential pulse code modulation - DPCM
Delta modulation
Adaptive differential pulse code modulation - ADPCM
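Delta modulation is the extreme case: only one bit per sample is sent, saying whether the decoder's running approximation should go up or down by a fixed step. The sketch below assumes a step of 1 and a start value of 0. ADPCM differs in that the step size adapts to the signal, which is not shown here.

```python
def delta_modulate(samples, step=1, start=0):
    """Emit 1 if the sample is at or above the running approximation, else 0."""
    bits = []
    approx = start
    for s in samples:
        if s >= approx:
            bits.append(1)
            approx += step
        else:
            bits.append(0)
            approx -= step
    return bits


def delta_demodulate(bits, step=1, start=0):
    """Rebuild the staircase approximation from the bit stream."""
    approx = start
    out = []
    for b in bits:
        approx += step if b else -step
        out.append(approx)
    return out


bits = delta_modulate([1, 2, 3, 3, 2])
print(bits)
print(delta_demodulate(bits))
```

The reconstruction only approximates the input: a signal that changes faster than one step per sample cannot be followed.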

Vector quantization
Special case of pattern substitution.
The data is divided into vectors.
These vectors are compared to a table (the code-book).
The best matching pattern in the code-book is used.
The code-book can be constructed in advance or dynamically.

Errors in vector quantization


An error (distortion) occurs when no precise match can be found:
  Lossy: ignore the error
  Lossless: send the difference (error) with the code-word
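A small sketch of both variants with a made-up two-dimensional code-book. The best match is found by an exhaustive squared-distance search, which is just one possible answer to the matching problem. All names and values are illustrative.

```python
# Hypothetical code-book, assumed to be known to encoder and decoder.
CODE_BOOK = [(0, 0), (10, 10), (20, 0), (0, 20)]


def nearest_index(vector, code_book=CODE_BOOK):
    """Index of the code-book entry with the smallest squared distance."""
    def dist2(entry):
        return sum((a - b) ** 2 for a, b in zip(vector, entry))
    return min(range(len(code_book)), key=lambda i: dist2(code_book[i]))


def vq_encode(vectors, lossless=False):
    """Return (index, residual) pairs; the residual is None in lossy mode."""
    out = []
    for v in vectors:
        i = nearest_index(v)
        residual = tuple(a - b for a, b in zip(v, CODE_BOOK[i])) if lossless else None
        out.append((i, residual))
    return out


def vq_decode(code):
    """Look up each index and add the residual when one was sent."""
    out = []
    for i, residual in code:
        entry = CODE_BOOK[i]
        if residual is not None:
            entry = tuple(a + b for a, b in zip(entry, residual))
        out.append(entry)
    return out


data = [(1, 1), (9, 12), (19, -1)]
print(vq_decode(vq_encode(data)))                 # lossy: code-book entries only
print(vq_decode(vq_encode(data, lossless=True)))  # lossless: original vectors
```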

Vector quantization
Works well with data with known characteristics, where the code-book can be generated in advance.
Useful with speech.
General problems are:
  How to construct an optimal code-book?
  Which algorithm to use to find the best matching index?

Fractal compression
Fractals have normally been used to create images, not to compress them.
Fractal transformations:
  Divide the image into several small parts.
  Compare each individual part to other parts within the same image, which may be translated, shrunk, slanted, rotated or mirrored.

A virtual code-book is created that depends on each coded image.
Requires a large amount of computing power!

Audio compression
  ITU-T G.7XX Recommendations
  GSM compression standard
  Code excited linear prediction (CELP) voice coder
  VAT
  Higher-quality audio compression standards
  Compression techniques used by MPEG-audio
  Performance and quality
  Objective of each MPEG-layer

ITU-T G.7XX Recommendations


Recommendation  Bit-rate             Bandwidth      Coding          Notes
G.711           64 Kbps              3.4 kHz        PCM             -
G.722           64 Kbps              50 Hz - 7 kHz  SB-ADPCM        -
G.726           40, 32, 24, 16 Kbps  -              ADPCM           Replaces G.721, G.723
G.727           40, 32, 24, 16 Kbps  -              Embedded ADPCM  To be used in packetised speech
G.728           16 Kbps              3.4 kHz        LD-CELP         -

GSM compression standard


Used for mobile telephony in Europe
Bit-rate: 13.2 Kbps
Sampling rate: 8 kHz
Quality inferior to G.711 or G.722

Code excited linear prediction (CELP) voice coder


US Federal Standard 1016
  Compresses speech down to 4.8 Kbps

US Federal Standard 1015
  Simplified version of CELP, called linear predictive coding (LPC)
  LPC-10E can operate at 2.4 Kbps

How it works:
  Both methods use a form of vector quantization with predefined code-books.
  In 1016, the error is transmitted with the code-word.
  The resulting quality of 1016 is equivalent to that of the 32 Kbps ADPCM algorithm used in G.721.

VAT
pcm    78 Kb/s  8-bit μ-law encoded 8 kHz PCM (20 ms frames)
pcm2   71 Kb/s  8-bit μ-law encoded 8 kHz PCM (40 ms frames)
pcm4   68 Kb/s  8-bit μ-law encoded 8 kHz PCM (80 ms frames)
dvi    46 Kb/s  Intel DVI ADPCM (20 ms frames)
dvi2   39 Kb/s  Intel DVI ADPCM (40 ms frames)
dvi4   36 Kb/s  Intel DVI ADPCM (80 ms frames)
gsm    17 Kb/s  GSM (80 ms frames)
lpc4    9 Kb/s  Linear Predictive Coder (80 ms frames)

Higher-quality audio compression standards
Moving Pictures Expert Group (MPEG) family of compression techniques
Targets not only speech but sound in general
MPEG-1 is described in IS 11172-3
MPEG-1 contains a family of three audio encoding and compression schemes
All three layers are hierarchically compatible

Compression techniques used by MPEG-audio


In principle, all layers consist of a combination of transform encoding and sub-band division.
  The spectrum is divided into 32 sub-bands.
  A fast Fourier transform is applied to represent the signal in the frequency domain.
  A psycho-acoustic model is applied to the transformed signal to estimate the just noticeable noise level.
  This is the stage that differs between the layers.

Performance and quality
The target bit-rate ranges between 32 and 448 Kbps per monophonic channel.
The sampling rate for MPEG-1 is 32, 44.1 or 48 kHz; MPEG-2 adds 16, 22.05 and 24 kHz.
Two audio channels for MPEG-1.
Five audio channels for MPEG-2, plus a low-frequency enhancement channel.
MPEG-audio compression schemes are lossy, but they can achieve perceptually lossless quality.

Objective of each MPEG-layer
Target bit-rate per audio channel:
  Layer 1: 192 or 256 Kbps
  Layer 2: 96 or 128 Kbps
  Layer 3: 64 Kbps (nickname mp3)
The quality is very close to CD quality!
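To put the target rates in perspective, the sketch below compares them with one channel of uncompressed CD audio. Two things are assumptions not stated on the slide: that CD audio is 16-bit samples at 44.1 kHz, and that the three rate groups listed above belong to layers 1, 2 and 3 in that order.

```python
# Assumption: one CD-quality channel is 16-bit samples at 44.1 kHz.
CD_KBPS_PER_CHANNEL = 44.1 * 16          # 705.6 Kbps

# Assumed mapping of the listed target rates (Kbps per channel) to layers.
TARGET_KBPS = {
    "Layer 1": (192, 256),
    "Layer 2": (96, 128),
    "Layer 3": (64,),
}

ratios = {
    layer: [CD_KBPS_PER_CHANNEL / rate for rate in rates]
    for layer, rates in TARGET_KBPS.items()
}

for layer, values in ratios.items():
    print(layer, ", ".join(f"{v:.1f}:1" for v in values))
```

Under these assumptions the third layer works at roughly an 11:1 reduction per channel.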

Check out mIR - multicast Interactive Radio

Image and TV compression

  JPEG
  MPEG
  Achievements MPEG
  MPEG-1
  MPEG-2
  MPEG-2 Scalable Extensions
  MPEG-4
  MPEG-7
  H.320 - H.261
  H.320 - H.221, H.230
  H.320 - H.231, H.242, H.233, G.7XX
  H.261
  H.261 vs. MPEG Data Rates
  H.263
  PNG, SVG, JPEG-2000

JPEG
Joint Photographic Expert Group
A standard for compression of both bitonal and continuous-tone images
Either lossy or lossless!

MPEG
ISO/IEC Joint Technical Committee 1, Sub Committee 29, Work Group 11
MPEG means Moving Picture Experts Group
  Meets worldwide 3-5 times a year, one week per meeting, with 300-400 people

Coding of Moving Video and Audio

Achievements MPEG
Built MPEG-1, MPEG-2
  Widely adopted in the audiovisual industry
  Digital TV, DVD, Video-on-Demand, archiving, music on the Internet

Now working on MPEG-4
  First version ready early 1999
  Second version early 2000

Identified new work item: MPEG-7 'Multimedia Content Description Interface'
  Work is in start-up phase

MPEG-1
MPEG-1 is a standard for storage and retrieval of moving pictures and audio on storage media.
Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to 1.5 Mbps
VCR-quality video
Standard Interchange Format (SIF)
  4:1:1 subsampling
  352 x 240 or 288, but can deal with images up to 4095 x 4095
Progressive, in contrast to broadcast TV (interlaced)
Compression ratio: 26:1
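The quoted compression figure of 26:1 can be roughly reproduced from the SIF parameters. The sketch rests on three assumptions that are not on the slide: 30 frames per second for the 352 x 240 format, 12 bits per pixel on average after chroma subsampling, and about 1.15 Mbps of the 1.5 Mbps budget going to video.

```python
WIDTH, HEIGHT = 352, 240       # SIF resolution from the slide
FPS = 30                       # assumption: frame rate of the 240-line format
BITS_PER_PIXEL = 12            # assumption: 8-bit luma + subsampled chroma
VIDEO_BPS = 1_150_000          # assumption: video share of the 1.5 Mbps stream

raw_bps = WIDTH * HEIGHT * FPS * BITS_PER_PIXEL
ratio = raw_bps / VIDEO_BPS

print(f"Uncompressed: {raw_bps / 1e6:.1f} Mbps")
print(f"Compression ratio: about {ratio:.0f}:1")
```

With the full 1.5 Mbps counted as video, the same calculation gives about 20:1 instead.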

MPEG-2

MPEG-2 is a standard for digital television.


Generic Coding of Moving Pictures and Associated Audio
Studio-quality video
Target rate: initially 4-6 Mbps, but now below 10 Mbps
Progressive or interlaced
MPEG-2 is actually a family of compression schemes:
  Low level - CIF (352 x 288)
  Main level - R.601 (720 x 480)
  High-1440 level - consumer HDTV (1440 x 1152)
  High level - HDTV (1920 x 1080)

Five profiles also exist

MPEG-3 was abandoned and incorporated into MPEG-2

MPEG-2 Scalable Extensions
Support for more than one layer, with layers of more or less complexity.
A lower layer and an enhanced layer are supported for each of:
  Data partitioning
  SNR scalability
  Spatial scalability
  Temporal scalability

MPEG-4
Note: Changed focus for MPEG-4!
MPEG-4 is a standard for multimedia applications.
Very important for the future!
Earlier:
  Very low bit rate Audio-Visual Coding
  4.8-64 kbps, QCIF, 10 frames/second

MPEG-7
MPEG-7 is a content representation standard for information search.

H.320 - H.261
ITU standard family for videophony
56 - 1930 Kbps
H.320:
  The main document for the whole standard, with references to other documents

H.261:
  Encoding and compression of video
  p x 64
H.263

H.320 - H.221, H.230
H.221:
  The framing protocol
  How bits are structured into streams and how they should be multiplexed over ISDN

H.230:
  Same as H.221 but for non-audio/video data

H.320 - H.231, H.242, H.233, G.7XX
H.231:
  Multipoint Control Units (MCUs)
  How bridges and mixers should work

H.242:
  How to make a connection

H.233 and H-Key:
  Encryption of data and distribution of keys

G.711, G.722, G.726, G.727, G.728

H.261
Optimized for p*64 Kbps, where p=1..30 (ISDN)
Three picture components:
  Y:Cb:Cr with 4:1:1 subsampling

QCIF and optional CIF
30 frames/s
Compression:
  DPCM
  Predicted frames based on only previous frames (like P-frames)
  No B-frames
  DCT
  Quantization is linear, in contrast to MPEG, which uses a table
  Entropy encoding

H.261 vs. MPEG Data Rates
Note the MPEG I-frame peaks!
Losing an I-frame packet is severe.

H.263
Extension of H.261
Supports higher resolutions
  Up to 16CIF (1408x1152)
  Can compete with MPEG

Better optimization
H.263+, Video Coding for Low Bit Rate Communication

PNG, SVG, JPEG-2000
PNG, Portable Network Graphics
JPEG-2000
SVG, Scalable Vector Graphics

JPEG

  JPEG
  Encoding-modes of JPEG
  Progressive example
  Encoding-modes ...
  Overview JPEG
  Preparation of the data-blocks
  Discrete Cosine Transform
  More DCT
  Why?
  Quantization step
  DPCM encoding
  Run-length encoding
  Notes on JPEG
  Overview JPEG
  Compression of moving images
  Implementations

JPEG
Joint Photographic Expert Group
A standard for compression of both bitonal and continuous-tone images
DCT + quantization + run-length + Huffman
Either lossy or lossless!

Encoding-modes of JPEG
Sequential encoding
  One scan, left to right, top to bottom
  Lossy

Progressive encoding
  Allows the image to be rebuilt in multiple coarse-to-clear passes
  Lossy

Encoding-modes ...
Lossless encoding
  Reversible encoding (lossless)

Hierarchical encoding
  Includes multiple resolution levels which can be decompressed separately.

We will concentrate on Sequential lossy.

Preparation of the data-blocks
Different types of picture components:
  Matrices of RGB, YUV, YIQ, YCrCb
  A total of 255 different components is allowed in a picture, but normally there are three
  Subsampling is allowed

The components are divided into blocks to make the transform easier:
  8x8 samples
  Non-interleaved ordering
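A short sketch of the block preparation for one component, read left to right and top to bottom. It assumes the component dimensions are multiples of 8; padding is not shown, and the function name is illustrative. In non-interleaved ordering, all blocks of one component are processed before the next component starts.

```python
def split_into_blocks(component, size=8):
    """Cut a component (a list of rows) into size x size blocks in scan order.

    Assumes that height and width are multiples of `size`.
    """
    height, width = len(component), len(component[0])
    blocks = []
    for top in range(0, height, size):
        for left in range(0, width, size):
            blocks.append([row[left:left + size]
                           for row in component[top:top + size]])
    return blocks


# Made-up 16 x 24 component whose sample value encodes its position.
component = [[y * 24 + x for x in range(24)] for y in range(16)]
blocks = split_into_blocks(component)
print(len(blocks), "blocks of", len(blocks[0]), "x", len(blocks[0][0]))
```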

Translation from the spatial domain to the frequency domain