
http://www.kom.e-technik.tu-darmstadt.de http://www.tk.informatik.tu-darmstadt.de R. Steinmetz, M. Mühlhäuser

Multimedia-Systems: Compression
Prof. Dr.-Ing. Ralf Steinmetz, Prof. Dr. Max Mühlhäuser
MM: TU Darmstadt - Darmstadt University of Technology, Dept. of Computer Science, TK - Telecooperation, Tel. +49 6151 16-3709, Fax +49 6151 16-3052, Alexanderstr. 6, D-64283 Darmstadt, Germany, max@informatik.tu-darmstadt.de
RS: TU Darmstadt - Darmstadt University of Technology, Dept. of Electrical Engineering and Information Technology, Dept. of Computer Science, KOM - Industrial Process and System Communications, Tel. +49 6151 166151, Fax +49 6151 166152, Merckstr. 25, D-64283 Darmstadt, Germany, Ralf.Steinmetz@KOM.tu-darmstadt.de
GMD - German National Research Center for Information Technology
httc - Hessian Telemedia Technology Competence-Center e.V.


05A-compression.fm 1 15.March.01

Scope

Usage

Applications, Learning & Teaching, Design, User Interfaces, Synchronization, Group Communications

Services

Content Processing

Documents

Security

...

Systems

Databases Media-Server Opt. Memories Computer Architectures

Programming Communications Networks

Operating Systems Quality of Service Compression

Basics


Image & Graphics

Animation

Video

Audio


Contents
1. Motivation
2. Requirements - General
3. Fundamentals - Categories
4. Source Coding
5. Entropy Coding
6. Hybrid Coding: Basic Encoding Steps
7. JPEG
8. H.261 and related ITU Standards
9. MPEG-1
10. MPEG-2
11. MPEG-4
12. Wavelets
13. Fractal Image Compression
14. Basic Audio and Speech Coding Schemes
15. Conclusion

1. Motivation
Uncompressed digital media in computing means:
- Text: 1 page with 80 char/line, 64 lines/page and 2 Byte/char
    80 x 64 x 2 x 8 = 80 kBit/page
- Image: 24 Bit/pixel, 512 x 512 pixel/image
    512 x 512 x 24 = 6 MBit/image
- Audio: CD quality, sample rate 44.1 kHz, 16 Bit/sample
    Mono: 44.1 x 16 = 706 kBit/s
    Stereo: 1.412 MBit/s
- Video: full frames with 1024 x 1024 pixel/frame, 24 Bit/pixel, 30 frames/s
    1024 x 1024 x 24 x 30 = 720 MBit/s
    more realistic: 360 x 240 pixel/frame = 60 MBit/s
Hence compression is NECESSARY
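
The figures above can be re-derived with a few multiplications. The sketch below is only an illustration; the variable names are made up here, and it assumes that the slide's "k" and "M" mean 1024 and 1024 x 1024 for text, image and video (the audio figures match decimal units instead).

```python
# Raw (uncompressed) sizes and rates for the four media on the slide.
BITS_PER_BYTE = 8

text_bits_per_page = 80 * 64 * 2 * BITS_PER_BYTE   # chars/line * lines * bytes/char
image_bits = 512 * 512 * 24                        # pixels * bits/pixel
audio_mono_bps = 44_100 * 16                       # samples/s * bits/sample
audio_stereo_bps = 2 * audio_mono_bps
video_full_bps = 1024 * 1024 * 24 * 30             # pixels * bits/pixel * frames/s
video_small_bps = 360 * 240 * 24 * 30


def kbit(bits):
    """Convert bits to kBit, assuming 1 kBit = 1024 bit."""
    return bits / 1024


def mbit(bits):
    """Convert bits to MBit, assuming 1 MBit = 1024 * 1024 bit."""
    return bits / (1024 * 1024)


if __name__ == "__main__":
    print("text  :", kbit(text_bits_per_page), "kBit/page")
    print("image :", mbit(image_bits), "MBit/image")
    print("audio :", audio_mono_bps / 1000, "kBit/s mono (decimal units)")
    print("video :", mbit(video_full_bps), "MBit/s full frame")
    print("video :", round(mbit(video_small_bps), 1), "MBit/s at 360 x 240")
```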

2. Requirements - General
Requirements on compression:
- low delay
- intrinsic scalability
- high quality
- low complexity (e.g., ease of decoding)
- efficient implementation (e.g., memory req.)

Requirements
DIALOGUE AND RETRIEVAL mode requirements:
- Independence of frame size and video frame rate
- Synchronization of audio, video, and other media
DIALOGUE mode requirements:
- Compression and decompression in real-time (e.g. 25 frames/s)
- End-to-end delay < 150 ms
RETRIEVAL mode requirements:
- Fast forward and backward data retrieval
- Random access within 1/2 s
Software and/or hardware-assisted implementation requirements

3. Fundamentals - Categories
entropy coding
- ignoring semantics of the data
- lossless
source coding
- based on semantics of the data
- often lossy
channel coding
- adaptation to communication channel
- introduction of redundancy
hybrid coding
- entropy and source coding

Categories and Techniques


  Entropy Coding   Run-Length Coding
                   Huffman Coding
                   Arithmetic Coding
  Source Coding    Prediction: DPCM, DM
                   Transformation: FFT, DCT
                   Layered Coding: Bit Position, Subsampling, Sub-Band Coding
                   Vector Quantization
  Hybrid Coding    JPEG
                   MPEG
                   H.261, H.263
                   proprietary: Quicktime, ...

Categories & Techniques, Cont.


(1) Two principal possibilities
1. Entropy Coding: eliminate redundancy (thus, lossless)
2. Reduction Coding: eliminate irrelevance / low relevance (lossy)
Preparatory step: Decorrelation - eliminate interdependencies
- this is the essence of source coding
- changes "representation" of media
- goal usually: reduce dependencies between data
- as such, is a preparatory step!! usually, does not compress
Steps in hybrid coding (often): decorrelation - reduction - entropy coding
- often: reduction by quantization
- last step: additional compression without harm
note: literature usually uses terms as in last slide!!
note: reduction coding is "smart deletion", not really "compression"

Categories and Techniques, Cont.


(2) Major distinction: Symmetric / Asymmetric
- Asymmetric (usually): more effort for compression
  - o.k. if compression is non real-time, "only once" (movie!)
  - may involve number-crunchers (...owned by content provider)
- Symmetric: "required" for real-time, e.g., videoconferencing
  - in reality, often not 100% symmetric
Further considerations include, e.g.,
- Adjustable compression rate? ...quality?
- "smooth" bit stream ("isochronous")?
  - terms: CBR (constant bit rate) vs. VBR (variable bit rate)
  - may be "over time": e.g., packet size Big-Small-Small Big-Small-Small ...
  - may be simulated w/ loop-back filter plus buffer
- "progressive" (mainly: non-continuous media): display-while-download
- "streaming": ~ same for video (here, rather an issue of software)
More subtle issues
- "open" standard?
- good "performance" (ratio, speed) for all kinds of media?
- bullet-proof, well-understood? ...

4. Source Coding: DPCM

DPCM = Differential Pulse-Code Modulation
Assumptions:
- Consecutive samples or frames have similar values
- Prediction is possible due to existing correlation
Fundamental steps:
- Incoming sample or frame (pixel or block) is predicted by means of previously processed data
- Difference between incoming data and prediction is determined
- Difference is quantized
Challenge: optimal predictor
Further predictive coding technique:
- Delta modulation (DM): 1 bit as difference signal
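
The three DPCM steps (predict, take the difference, quantize) can be sketched in a few lines. This is a minimal illustration, not a codec from any standard: it assumes the simplest possible predictor ("the previous reconstructed sample") and a uniform quantizer with a hypothetical `step` parameter.

```python
def dpcm_encode(samples, step=1):
    """Return quantized prediction differences for a list of integer samples."""
    codes = []
    prediction = 0  # assumed start value, known to encoder and decoder
    for sample in samples:
        diff = sample - prediction        # difference to the prediction
        level = int(round(diff / step))   # uniform quantization of the difference
        codes.append(level)
        # Track what the decoder will reconstruct, so errors do not accumulate.
        prediction += level * step
    return codes


def dpcm_decode(codes, step=1):
    """Rebuild samples by summing up the dequantized differences."""
    samples = []
    value = 0
    for level in codes:
        value += level * step
        samples.append(value)
    return samples
```

With `step=1` the scheme is lossless for integer input; with a larger step the reconstruction error per sample stays within half a step, because the encoder predicts from the decoder's reconstruction rather than from the original samples.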

Source Coding: Transformation


Assumptions:
- Data in the transformed domain is easier to compress
- Related processing is feasible
Example: Fourier Transformation
- time domain --> frequency domain: Fourier Transformation
- frequency domain --> time domain: Inverse Fourier Transformation
- FFT: Fast Fourier Transformation
- DCT: Discrete Cosine Transformation

Source Coding: Sub-Band


Assumption:
- Some frequency ranges are more important than others
Example:
- [Figure: frequency spectrum of the signal over the frequency axis, with bands passed to transformation / coding]
Application:
- vocoder for speech communication
- MPEG audio

Entropy Coding: Principle


Entropy (in information theory): information content / "density"
- symbols/words equally likely: high entropy (full of information)
- otherwise: lower entropy (suboptimal representation of info, less dense)

[Figure: two histograms of probability over grey levels, one labelled "high entropy", one labelled "low entropy"]
- note: a picture may seem "little information" to us since it is very regular; this is not covered by the entropy formula, yet may be used for compression (e.g. run length)
- here (low entropy): "little info" because "most of picture is in same gray"

Entropy formula:

$$H(P) = -\sum_{s} p(s) \log_B p(s)$$

Example: given 4 possible symbols (words) in source code
- i) IF all equal, p = 1/4: H(P) = 2
- ii) IF p = 1/2, 1/4, 1/8, 1/8 --> H(P) = 1 6/8
"Entropy coding" means: mean code length per symbol equals (~almost) the entropy
- in ii) above, with B=2 (binary): p = 1/2 --> code length -log2(1/2) = -(-1) = 1 bit; p = 1/4 --> 2 bits, etc.
GOAL: find code w/ symbol length as close as possible to -log_B p(s)
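
The two example values can be checked numerically. The helper below is a small sketch; the function name `entropy` and its `base` parameter are chosen here for illustration.

```python
import math


def entropy(probabilities, base=2):
    """Entropy H(P) = -sum p * log_base(p); symbols with p = 0 contribute nothing."""
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)


if __name__ == "__main__":
    print(entropy([0.25, 0.25, 0.25, 0.25]))    # case i) on the slide
    print(entropy([0.5, 0.25, 0.125, 0.125]))   # case ii) on the slide
```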

5. Entropy Coding: Run-Length (only marginal relation to entropy)


Assumption:
- Long sequences of identical symbols
Example:
  ... A B C E E E E E E D A C B ...
      --> compression -->
  ... A B C E ! 6 D A C B ...
  where "E" is the symbol, "!" the special flag and "6" the number of occurrences
Special variant: zero-length encoding
- only repetitions of zeroes are counted
- in the "symbol, flag, count" part above, the "symbol" is not needed (i.e. "pays" for >2 repetitions)
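
A sketch of the flag-based scheme from the example follows. It assumes that the flag "!" never occurs in the data itself and that a run is only replaced when it is at least `min_run` symbols long (4 by default, since the replacement costs three tokens); both are assumptions made for this illustration.

```python
from itertools import groupby


def rle_encode(symbols, flag="!", min_run=4):
    """Replace long runs by the triple (symbol, flag, count)."""
    out = []
    for symbol, group in groupby(symbols):
        length = len(list(group))
        if length >= min_run:
            out.extend([symbol, flag, length])
        else:
            out.extend([symbol] * length)   # short runs are copied unchanged
    return out


def rle_decode(tokens, flag="!"):
    """Expand (symbol, flag, count) triples back into runs."""
    out = []
    i = 0
    while i < len(tokens):
        if i + 2 < len(tokens) and tokens[i + 1] == flag:
            out.extend([tokens[i]] * tokens[i + 2])
            i += 3
        else:
            out.append(tokens[i])
            i += 1
    return out
```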

Entropy Coding: Huffman


Basics:
- Assumption: some symbols occur more often than others, e.g., character frequencies of the English language
- Idea: frequent symbols --> shorter bit strings (cf. Entropy!)
Example:
- Characters to be encoded: A, B, C, D, E
- Probability to occur: p(A)=0.3, p(B)=0.3, p(C)=0.1, p(D)=0.15, p(E)=0.15
Coding tree:
- step 1: scan all leaves, assign (1,0) to the two with lowest probability -> intermediate root
- steps 2-n: scan current "tops" (intermediate roots or leaves), assign (1,0) to the two with lowest probability -> ...
- end: assign codes by descending the tree to the leaves; the bits encountered represent the code

  step  merged nodes (bit 1 / bit 0)   new intermediate root
  1     C (10%) / D (15%)              25%
  2     C+D (25%) / E (15%)            40%
  3     A (30%) / B (30%)              60%
  4     A+B (60%) / C+D+E (40%)        100%

  symbol  probability  code
  A       30%          11
  B       30%          10
  C       10%          011
  D       15%          010
  E       15%          00
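
The tree construction can be sketched with a priority queue. This is one possible implementation, not the slide authors' code: which of the two merged nodes receives "0" or "1" is an arbitrary choice here, so the resulting bit patterns may differ from the slide's table, while the code lengths and the mean length of 2.25 bits per symbol agree.

```python
import heapq
from itertools import count


def huffman_code(weights):
    """Build a prefix code from {symbol: weight}; returns {symbol: bit string}."""
    tie_breaker = count()  # keeps heap entries comparable when weights are equal
    heap = [(w, next(tie_breaker), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w_low, _, low = heapq.heappop(heap)     # the two "tops" with lowest weight
        w_high, _, high = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in low.items()}
        merged.update({sym: "1" + code for sym, code in high.items()})
        heapq.heappush(heap, (w_low + w_high, next(tie_breaker), merged))
    return heap[0][2]


if __name__ == "__main__":
    percent = {"A": 30, "B": 30, "C": 10, "D": 15, "E": 15}
    code = huffman_code(percent)
    mean_length = sum(percent[s] * len(code[s]) for s in percent) / 100
    print(code, mean_length)
```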

Entropy Coding: Huffman


Table and example of application to data stream

  symbol  A   B   C    D    E
  code    11  10  011  010  00

  B  A  C   D   A  B  E  B  A  E
  10 11 011 010 11 10 00 10 11 00

note: decoder may auto-detect end-of-symbol in bit-stream

Other types of Entropy encoding


Arithmetic Encoding
- most direct application of entropy principles!
- symbols occupy sub-intervals of [0,1) according to their probabilities
- successive coding of symbols cuts out the corresponding sub-interval in the sub-(sub-sub-...) interval chosen so far
- last symbol --> last sub-interval
- choose an "arbitrary" ("short") number in this sub-interval --> transmit/store
- requires consideration of an "additional" symbol "end-of-word" (below: "!")

Arithmetic Coding Example


symbols a,b,c,d,!; how to encode "bbd!"??

  symbol       a       b        c        d        !
  probability  .1      .5       .1       .2       .1
  interval     [0,.1)  [.1,.6)  [.6,.7)  [.7,.9)  [.9,1)

- divide [0,1) according to probabilities
- b: restrict to [.1,.6)
- b: sub-interval [.1,.6) of [.1,.6), i.e. [.15,.4)
- d: sub-interval [.7,.9) of [.15,.4), i.e. [.325,.375)
- !: sub-interval [.9,1) of [.325,.375), i.e. [.37,.375)

transmit/store one (arbitrary) value X in the final sub-interval
decoder: X lies in [.1,.6) --> 1st symbol is "b"; X in [.15,.4) --> 2nd symbol "b"; process continues until "!" is decoded
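
The interval narrowing for "bbd!" can be reproduced with exact fractions. This is a didactic sketch only (real arithmetic coders work with finite-precision integers); the table `BOUNDS` and the function names are introduced here for illustration.

```python
from fractions import Fraction

# Sub-intervals of [0, 1) for the symbols a, b, c, d and the end marker "!".
BOUNDS = {
    "a": (Fraction(0), Fraction(1, 10)),
    "b": (Fraction(1, 10), Fraction(6, 10)),
    "c": (Fraction(6, 10), Fraction(7, 10)),
    "d": (Fraction(7, 10), Fraction(9, 10)),
    "!": (Fraction(9, 10), Fraction(1)),
}


def encode_interval(message):
    """Return the final [low, high) interval for a message ending in "!"."""
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        width = high - low
        s_low, s_high = BOUNDS[symbol]
        low, high = low + width * s_low, low + width * s_high
    return low, high


def decode(x, end="!"):
    """Recover the message from any value x inside the final interval."""
    message = []
    low, high = Fraction(0), Fraction(1)
    while not message or message[-1] != end:
        width = high - low
        for symbol, (s_low, s_high) in BOUNDS.items():
            lo, hi = low + width * s_low, low + width * s_high
            if lo <= x < hi:
                message.append(symbol)
                low, high = lo, hi
                break
        else:
            raise ValueError("x must lie in [0, 1)")
    return "".join(message)
```

Any value in the final interval, for example 0.371, decodes back to "bbd!".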

LZW, Code Books, VLWs, VLIs


LZW (Lempel-Ziv-Welch): e.g.: how to code "whoiswho"?
- might be thought of as follows:
  - start: create table entry for w
  - then: create table entry for h, but also for "wh" (multi-symbol)
  - then: create table entry for o, but also for "who" (etc. until max-length)
  - multi-symbol entries which repeat "often" survive, others are overwritten
Code-Books: used in many compression schemes; 3 basic possibilities (imagine in above example):
- "fixed": all implementations of the codec have (1..n) pre-defined code-books
- "pre-computed":
  - encoder pass 1: compute codebook, store/transmit upfront
  - encoder pass 2: encode (compress) data using codebook
- "dynamic": code-book grows / changes during compression (LZW)
  - needs "same procedure" for encoder, decoder
  - either: "pieces" of code-book are intertwined with code as they are generated / changed
  - or: rules are such that (dynamic) codebook contents can be derived from encoded (compressed) data by decoder
VLWs / VLIs: variable length words / integers
- similar to Huffman, but decoder cannot detect end-of-symbol
- e.g., 1="0", 2="01", 3="11", ... (useful?? see JPEG etc.)
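
For comparison with the informal description above, here is a sketch of the common textbook form of LZW, in which the decoder derives the dynamic code-book from the compressed data. It is not a transcription of the slide's procedure: it assumes an initial table of the 256 single characters with code points below 256, an unbounded table, and no overwriting of entries.

```python
def lzw_encode(text):
    """Return a list of integer codes for a string of characters below code point 256."""
    table = {chr(i): i for i in range(256)}
    current = ""
    codes = []
    for ch in text:
        candidate = current + ch
        if candidate in table:
            current = candidate              # keep extending the known phrase
        else:
            codes.append(table[current])
            table[candidate] = len(table)    # new multi-symbol entry
            current = ch
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes):
    """Rebuild the text; the table is reconstructed on the fly."""
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:
            # The code refers to the entry the encoder created in this very step.
            entry = previous + previous[0]
        out.append(entry)
        table[len(table)] = previous + entry[0]
        previous = entry
    return "".join(out)
```

For "whoiswho" the second "wh" is emitted as a single code (256), so eight characters become seven codes.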

6. Hybrid Coding: Basic Encoding Steps


source data --> data preparation --> data processing --> quantization --> entropy encoding --> compressed data

- data preparation: e.g. resolution, frame rate
- data processing: e.g. DCT, sub-band coding
- quantization: e.g. linear; DC, AC values
- entropy encoding: e.g. run-length, Huffman

Annotations on the steps in the figure: "video: lossy, audio: lossless", "lossy (sometimes lossless)", "lossless"

7. JPEG
JPEG: Joint Photographic Experts Group
International Standard:
- For digital compression and coding of continuous-tone still images: gray-scale, color
- Since 1992
Joint effort of:
- ISO/IEC JTC1/SC2/WG10
- Commission Q.16 of CCITT SGVIII
Compression rate of 1:10 yields reasonable results

JPEG
Very general compression scheme
Independence of:
- Image resolution
- Image and pixel aspect ratio
- Color representation
- Image complexity and statistical characteristics
Well-defined interchange format of encoded data
Implementation in:
- Software only
- Software and hardware
MOTION JPEG for video compression:
- Sequence of JPEG-encoded images

JPEG - Compression Steps


source image --> image preparation --> image processing --> quantization --> entropy encoding --> compressed image

- image preparation: pixel, block, MCU
- image processing: predictor, FDCT
- entropy encoding: run-length, Huffman, arithmetic

MCU: Minimum Coded Unit
FDCT: Forward Discrete Cosine Transformation

JPEG - Image Preparation


[Figure: image made of components C1, C2, ..., CN; each component is a rectangle of data units with Xi columns (left to right) and Yi lines (top to bottom)]

data units: samples in lossless mode, blocks with 8x8 pixels in other modes
Planes:
- 1 <= N <= 255 components Ci (e.g., one plane per color)
- Different resolution of individual components possible
Pixel resolution:
- 8 or 12 bit per pixel in lossy modes
- 2 to 16 bit per pixel in lossless mode

JPEG - Image Preparation


Example: 4:2:2 YUV, 4:1:1 YUV, and YUV9 coding
- Luminance (Y): brightness, sampling frequency 13.5 MHz
- Chrominance (U, V): color differences, sampling frequency 6.75 MHz

JPEG - Image Preparation


Non-interleaved encoding:
- [Figure: data units of one component processed from left to right, top to bottom]
Interleaved encoding:
- [Figure: data units of the components C1, C2, C3 combined in turn]
Minimum Coded Unit (MCU): combination of interleaved data units of different components

JPEG - Baseline Mode


source image --> 1. image preparation (8x8 blocks) --> 2. image processing (FDCT) --> 3. quantization (tables) --> 4. entropy encoding (tables) --> compressed image

Baseline mode is mandatory for all JPEG implementations:
- Often restricted to certain resolution
- Often only three planes with predefined color set-up
Image preparation:
- Step 1a: pixel resolution --> multiples of p=8 bit; yields 8 x 8 pixel blocks (data units)
- Step 1b: unsigned --> signed integer (prepare for "oscillation" --> sin/cos)
- ... other steps see below
Entropy encoding:
- Step 4a: zigzag linearization (see below)
- Steps 4b, c, ...: several entropy coding algorithms applied

Intuitive Understanding of DCT


Fourier Transform (& FFT "fast" algorithm) known from the 1-dimensional case:
- cut waveform into pieces (blocks of samples)
- for each block: interpret as periodic (infinite) oscillating waveform
- represent as sum of sin/cos waves a_i sin(ω_i t), i = 0...(N-1); same for cos
  - a_i: coefficients; a_0 = DC (direct current = shift wrt. 0-axis); others: how much of the respective sin or cos wave is part of the waveform
  - i: increasing frequencies (usually N = no. of samples in block)
DCT in JPEG etc.: same idea, but 2-dimensional cos waves
- cut out square blocks from picture (NxN)
- cos waves all have independent frequencies in horizontal/vertical direction
  - comparable to smooth hills, # of valleys may differ horiz./vert.
- again: interpret sample as periodic (2D) waveform --> represent as sum of (2D) cos wave "hill areas"
- why only cos??
  - trick: picture swapped around axes --> 4-fold size --> picture symmetric to axes --> sin parts become zero
  - 4-fold size no problem: 3 parts redundant
  - axes have double "weight" (pixel row/col. "0") --> factor Cu/Cv in formula

JPEG - Baseline Mode: Image Processing


Forward Discrete Cosine Transformation (FDCT):

$$S_{vu} = \frac{1}{4} C_u C_v \sum_{x=0}^{7} \sum_{y=0}^{7} s_{yx} \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}$$

with: $C_u, C_v = \frac{1}{\sqrt{2}}$ for u, v = 0; else $C_u, C_v = 1$

Formula applied to each block for all 0 <= u, v <= 7:
- Blocks with 8x8 pixel result in 64 DCT coefficients:
  - 1 DC-coefficient S00: basic color of the block
  - 63 AC-coefficients: (likely) zero or near-zero values
Different significance of the coefficients:
- DC: most important
- AC: less important
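
The FDCT formula translates directly into four nested loops. The sketch below is a literal, unoptimized transcription for illustration (production codecs use fast factorizations); the function name `fdct_8x8` is made up here, and the input is assumed to be already level-shifted to signed values.

```python
import math

N = 8


def fdct_8x8(block):
    """block[y][x]: 8x8 signed samples s_yx; returns coefficients S[v][u]."""
    def c(k):
        # Normalization factor C_u / C_v from the formula.
        return 1 / math.sqrt(2) if k == 0 else 1.0

    coeffs = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            coeffs[v][u] = 0.25 * c(u) * c(v) * total
    return coeffs
```

For a block in which all samples have the same value c, only the DC coefficient is non-zero and equals 8c, which matches the interpretation of S00 as the "basic color" of the block.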

JPEG - Baseline Mode: Image Processing


FDCT transforms:
- blocks into blocks
- not pixels into pixels
Example: calculation of S00
- [Figure: an 8x8 block of input samples and the 8x8 block of coefficients; all 64 samples contribute to S00]

JPEG - Baseline Mode: Quantization


Use of quantization tables for the DCT coefficients:
- Map an interval of real numbers to one integer number
- Allows a different granularity for each coefficient
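
Quantization of a single coefficient amounts to a division by its table entry followed by rounding. The sketch below assumes rounding to the nearest integer; the function names are illustrative only.

```python
import math


def quantize(coefficient, step):
    """Map a real DCT coefficient to an integer level (round to nearest)."""
    return math.floor(coefficient / step + 0.5)


def dequantize(level, step):
    """Decoder side: the level selects the centre of its interval."""
    return level * step
```

All coefficients in the interval [step * (level - 0.5), step * (level + 0.5)) map to the same level, which is where information is lost; a larger table entry means a coarser granularity for that coefficient.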


JPEG: Quantization Effect

[Figure: panels (a) and (b) illustrating the effect of quantization]

JPEG - Baseline Mode: Entropy Encoding


DC-coefficients:
- Compute the differences between the DC coefficients of consecutive blocks:
    ... | block i-1 (DC_{i-1}) | block i (DC_i) | ...
    DIFF = DC_i - DC_{i-1}
- Use the differences instead of the DC_i values

JPEG - Baseline Mode: Entropy Coding


63 AC coefficients:
- Ordering in zig-zag form
- [Figure: 8x8 coefficient block traversed in zig-zag order, starting at DC and AC01, passing the corners AC07 and AC70, ending at AC77]
- reason: coefficients in lower right corner are likely to be zero
Huffman coding of all coefficients:
- Transformation into a code where the number of bits depends on the frequency of the respective value
- Subsequent runlength coding of zeros
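
The zig-zag order can be generated by walking the anti-diagonals of the block and alternating the direction. The following sketch returns (row, column) positions; the function name is chosen here for illustration.

```python
def zigzag_order(n=8):
    """Return the (row, column) positions of an n x n block in zig-zag order."""
    order = []
    for diagonal in range(2 * n - 1):
        cells = [(row, diagonal - row) for row in range(n)
                 if 0 <= diagonal - row < n]
        # Even diagonals are walked from bottom-left to top-right,
        # odd diagonals from top-right to bottom-left.
        if diagonal % 2 == 0:
            cells.reverse()
        order.extend(cells)
    return order


def linearize(block):
    """Flatten a square block into its zig-zag sequence."""
    return [block[row][col] for row, col in zigzag_order(len(block))]
```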

JPEG: Details of (one possible) Entropy Coding

Treatment of "zig-zag sequence":
- differential coding of DC: DC_i stored as "change wrt. DC_{i-1}"
- assumption: there will rarely be two non-zero AC values in sequence
  --> regard sequence as iteration of non-zero AC values and zero-runlengths
  --> sometimes, the zero-runlength will have "length zero"
- code non-zero AC values as VLIs
  --> need to transmit VLI lengths (remember: this is not Huffman --> end of code not found by decoder)
- create pairs (zero-runlength, VLI-length-of-following-non-zero-AC-value); these pairs are Huffman encoded
- the very first "pair" is not a pair, but the VLI length of the (diff.) DC value
- the block is finally represented as iteration
    Huffman-encoded pair / VLI-encoded non-zero AC / Huffman-... / VLI-... / ...
  preceded by "Huffman-encoded VLI-length / VLI-encoded diff. DC"
The next two slides give an example of the DCT coding of an 8x8 block

JPEG: Sample Compression of 1 Block: 8x8 Matrices


1. Typical Pixel Block:
  139 144 149 153 155 155 155 155
  144 151 153 156 159 156 156 156
  150 155 160 163 158 156 156 156
  159 161 162 160 160 159 159 159
  159 160 161 162 162 155 155 155
  161 161 161 161 160 157 157 157
  162 162 161 163 162 157 157 157
  162 162 161 161 163 158 158 158

2. DCT Coefficients:
  235.6  -1.0 -12.1  -5.2   2.1  -1.7  -2.7   1.3
  -22.6 -17.5  -6.2  -3.2  -2.9  -0.1   0.4  -1.2
  -10.9  -9.3  -1.6   1.5   0.2  -0.9  -0.6  -0.1
   -7.1  -1.9   0.2   1.5   0.9  -0.1   0.0   0.3
   -0.6  -0.8   1.5   1.6  -0.1  -0.7   0.6   1.3
    1.8  -0.2   1.6  -0.3  -0.8   1.5   1.0  -1.0
   -1.3  -0.4  -0.3  -1.5  -0.5   1.7   1.1  -0.8
   -2.6   1.6  -3.8  -1.8   1.9   1.2  -0.6  -0.4

3. Quantization Matrix:
  16  11  10  16  24  40  51  61
  12  12  14  19  26  58  60  55
  14  13  16  24  40  57  69  56
  16  17  22  29  51  87  80  62
  18  22  37  56  68 109 103  77
  24  35  55  64  81 104 113  92
  49  64  78  87 103 121 120 101
  72  92  95  98 112 100 103  99

4. Quantized Result:
  15   0  -1   0   0   0   0   0
  -2  -1   0   0   0   0   0   0
  -1  -1   0   0   0   0   0   0
   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0

JPEG: Sample Compression (contd.)


assume: last DC value was 12 --> encoded difference is 3 (DIFF = 15 - 12)
--> only 3, -2, -1 occur as non-zero values. Their VLI-encoding is as follows:

  value  VLI
  3      11
  -2     01
  -1     0

This makes the iteration look as follows (VLIs still represented as integers):
(2)(3), (1,2)(-2), (0,1)(-1), (0,1)(-1), (0,1)(-1), (2,1)(-1), (0,0) (<-- abbreviation for "til end")

The following Huffman encoding is defined:

  symbol  code
  (2)     011
  (0,0)   1010
  (0,1)   00
  (1,2)   11011
  (2,1)   11100

...so that the bitstream finally consists of the following 31 bits (for 64 coefficients!):
0111111011010000000001110001010

...btw., the decoded (dequantized) coefficient matrix looks like this:
  240   0 -10   0   0   0   0   0
  -24 -12   0   0   0   0   0   0
  -14 -13   0   0   0   0   0   0
    0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0
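
The bitstream of this example can be reproduced step by step. The sketch below uses only the five Huffman codes given for this block, so it is not a general JPEG entropy coder: zero runs longer than 15 and the standard's full tables are deliberately left out. The names `vli`, `encode_block` and `HUFFMAN` are introduced here for illustration; the DC difference of 3 assumes a previous DC value of 12.

```python
# Huffman codes for this block: (VLI length) for DC, (zero run, VLI length) for AC.
HUFFMAN = {
    (2,): "011",
    (0, 0): "1010",   # "til end": only zeros follow
    (0, 1): "00",
    (1, 2): "11011",
    (2, 1): "11100",
}


def vli(value):
    """Variable-length integer: positive values in binary, negative ones offset."""
    if value == 0:
        return ""
    size = abs(value).bit_length()
    if value < 0:
        value += (1 << size) - 1      # e.g. -2 -> 01, -1 -> 0
    return format(value, "0{}b".format(size))


def encode_block(dc_diff, ac_zigzag):
    """Encode one block: DC difference first, then (run, length) pairs with VLIs."""
    bits = HUFFMAN[(len(vli(dc_diff)),)] + vli(dc_diff)
    run = 0
    for coefficient in ac_zigzag:
        if coefficient == 0:
            run += 1
            continue
        bits += HUFFMAN[(run, len(vli(coefficient)))] + vli(coefficient)
        run = 0
    if run:                            # trailing zeros are replaced by (0,0)
        bits += HUFFMAN[(0, 0)]
    return bits


# The 63 AC values of the quantized block in zig-zag order.
AC = [0, -2, -1, -1, -1, 0, 0, -1] + [0] * 55
BITSTREAM = encode_block(15 - 12, AC)
```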

JPEG - 4 Modes of Compression


- lossy sequential DCT-based mode (baseline mode)
- extended lossy DCT-based mode
- lossless mode
- hierarchical mode

JPEG - Extended Lossy DCT-Based Mode


Pixel resolution 8 to 12 bit
Sequential image display:
- Top to bottom
- Good for small images and fast processing
Progressive image display:
- Coarse to fine
- Good for large and complicated images

JPEG - Extended Lossy DCT-Based Mode


Principle:
- Coefficients stored in buffer after quantization
- Order of pixel/block processing changed
By spectral selection:
- Selection according to importance of DC, AC values
- All DC values of whole image first
- All AC values in order of importance subsequently
By successive approximation:
- Selection according to position of bits
- First the most significant bit of all blocks
- Then the second most significant bit of all blocks
- Until the least significant bit of all blocks

JPEG - Lossless Mode


Image preparation:
- On pixel basis (2-16 bit/pixel)
Image processing:
- Selection of a predictor for each pixel x from its neighbours (A: left, B: above, C: above left):

    C B
    A x

  code  prediction
  0     no prediction
  1     x=A
  2     x=B
  3     x=C
  4     x=A+B-C
  5     x=A+((B-C)/2)
  6     x=B+((A-C)/2)
  7     x=(A+B)/2

Entropy coding:
- Same as lossy mode
- Code of chosen predictor and its difference to the actual value
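
The predictor table maps directly onto a small lookup. In this sketch the division by 2 is implemented as an integer shift, which is an assumption about the rounding; the function names are illustrative.

```python
def predict(selector, a, b, c):
    """Predict pixel x from a (left), b (above) and c (above left)."""
    table = {
        0: 0,                       # no prediction
        1: a,
        2: b,
        3: c,
        4: a + b - c,
        5: a + ((b - c) >> 1),
        6: b + ((a - c) >> 1),
        7: (a + b) >> 1,
    }
    return table[selector]


def residual(x, selector, a, b, c):
    """Difference between the actual pixel and its prediction; this is what gets coded."""
    return x - predict(selector, a, b, c)
```

The decoder computes the same prediction from already decoded neighbours and adds the residual, so the pixel is recovered exactly.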

JPEG - Hierarchical Mode


Coding of each image with several resolutions:
- Image scaling
- Differential encoding:
  - First, coded with lowest resolution - image A
  - Coded with increased horizontal & vertical resolution - image A'
  - Difference between both images is computed - B = A' - A (*)
  - Iteration for higher resolutions
Features:
- Requires more storage and higher data rate
- Fast decoding process
- Used for scalable video
- Similar to Photo-CD (Kodak, proprietary)
(*) note for all scalable approaches: relate the higher-res version to the receiver's decoded lower-res version A (to avoid accumulation of quantization errors)

8. H.261 and related ITU Standards


Video codec for audiovisual services at p x 64 kbit/s ("p-times-sixtyfour", where p means "multiples-of"):
- CCITT standard from 1990
- For ISDN
- With p = 1, ..., 30
Technical issues:
- Real-time encoding/decoding
- Max. signal delay of 150 ms
- Constant data rate
- Implementation in hardware (main goal) and software

H.261 - Image Preparation


Fixed source image format
Image components:
- Luminance signal (Y)
- Two color difference signals (Cb, Cr)
- Subsampling according to CCIR 601 (4:1:1)
Quarter Common Intermediate Format (QCIF) resolution:
- Mandatory
- Y: 176 x 144 pixel ("pruning" 180 --> 176; for CIF: 360 --> 352)
- At 29.97 frames/s appr. 9.115 Mbps (uncompressed)
- but: encoder may leave out up to 3 frames (--> ~8 fps)
Common Intermediate Format (CIF) resolution:
- Optional
- Y: 352 x 288 pixel
- At 29.97 frames/s appr. 36.46 Mbps (uncompressed), i.e. ~ 570 * 64 kbps
Layered structure:
- Block of 8 x 8 pixels
- Macroblock of: 4 Y blocks, 1 Cr block, 1 Cb block
- Group of blocks (GOB) of 3 x 11 macroblocks
- Picture: QCIF picture: 3 GOBs; CIF picture: 12 GOBs
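
The uncompressed rates and the GOB counts follow from the picture dimensions. The sketch below assumes 8 bit per sample and chrominance planes with half the luminance resolution in both directions (consistent with the macroblock of 4 Y blocks plus 1 Cr and 1 Cb block); under these assumptions it reproduces the quoted figures. The function names are illustrative.

```python
def raw_bitrate(y_width, y_height, fps=29.97, bits_per_sample=8):
    """Uncompressed bit rate in bit/s for Y plus subsampled Cb and Cr."""
    luminance = y_width * y_height
    chrominance = 2 * (y_width // 2) * (y_height // 2)   # Cb and Cr planes
    return (luminance + chrominance) * bits_per_sample * fps


def gob_count(y_width, y_height, macroblock=16, macroblocks_per_gob=3 * 11):
    """Number of groups of blocks (GOBs) in one picture."""
    macroblocks = (y_width // macroblock) * (y_height // macroblock)
    return macroblocks // macroblocks_per_gob


if __name__ == "__main__":
    for name, (w, h) in {"QCIF": (176, 144), "CIF": (352, 288)}.items():
        rate = raw_bitrate(w, h)
        print(name, round(rate / 1e6, 3), "Mbps,",
              round(rate / 64000), "x 64 kbps,", gob_count(w, h), "GOBs")
```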

H.261 - Image Compression


Intraframe coding: yields "reference frame" f0
- DCT w/ same quantization factor for all AC values
- this factor may be adjusted by loopback filter (see below)
Interframe coding, motion estimation:
- [Figure: Frame 1 and Frame 2 with the motion vector between matching macroblocks]
- note about motion vector mv:
  - mv points "backwards" in time (pos. of object in f
  - mv related to block, not moving object
- interframes: f1, f2, f3, ... relative to f0 (differential encoding)
- in H.261: intraframes rare (bandwidth!, main application videophone)
- Search of similar macroblock (16x16) in previous image
- Position of this macroblock defines motion vector
- Search range is up to the implementation: max. 15 pixel
- but: motion vector may also always be 0 ("bad" software encoder)
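
H.261 leaves the search strategy to the encoder. One simple possibility is an exhaustive search that minimizes the sum of absolute differences (SAD); the sketch below shows this for frames stored as lists of rows. The SAD criterion, the function names and the parameters are assumptions made for illustration, not part of the standard.

```python
def sad(current, reference, cy, cx, ry, rx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(current[cy + j][cx + i] - reference[ry + j][rx + i])
               for j in range(size) for i in range(size))


def motion_vector(current, reference, y, x, size=16, search=15):
    """Find (dy, dx, cost) pointing from the block at (y, x) in the current
    frame to the most similar block in the previous (reference) frame."""
    height, width = len(reference), len(reference[0])
    best = (sad(current, reference, y, x, y, x, size), 0, 0)   # zero vector
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = y + dy, x + dx
            if 0 <= ry <= height - size and 0 <= rx <= width - size:
                cost = sad(current, reference, y, x, ry, rx, size)
                if cost < best[0]:
                    best = (cost, dy, dx)
    return best[1], best[2], best[0]
```

The returned vector points backwards in time, from the block position in the current frame to the matching position in the previous frame; an encoder that skips the search simply keeps the zero vector.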

H.261 - Image Compression


Interframe coding, further steps:
- Results:
  - Difference between similar macroblocks
  - Motion vector
- Difference of macroblocks:
  - DCT if value higher than a specific threshold (hybrid DPCM/DCT!)
  - No further processing if value less than this threshold
- Motion vector:
  - Components are coded yielding code words of variable length
Quantization:
- Linear
- Adaptation of step size ("loopback filter") => ~ constant data rate
  ("leaky bucket": constant 64 kbps "drop out"; loopback filter: adjust quantization factor if bucket is filled above threshold 1 or below threshold 2, respectively)

Further ITU Video Schemes (H.263, H.3xx)


H.263: extension to H.261
- max. bitrate: H.263 approx. 2.5 x H.261; lowest bitrates suitable f. modem
Source image formats (support by encoder and decoder):

  Format  Pixels       H.261        H.263
  SQCIF   128 x 96     optional     required
  QCIF    176 x 144    required     required
  CIF     352 x 288    optional     optional
  4CIF    704 x 576    not defined  optional
  16CIF   1408 x 1152  not defined  optional

H.263
Differences of H.263 compared to H.261
- mv may point forward in time (future interframe), cf. MPEG, for video
- optional PB-frames (2 combined pictures: 1 B- & 1 P-frame)
- optional overlapped block motion compensation
- optional motion vector pointing outside image
- half pel motion compensation (instead of full pel)
- JPEG is the still picture mode
- no included error detection and correction
- unlimited search space for motion vector --> fast encoder can do better

H.320, H.32x Family


- H.320: specifies (as overview) videophone for ISDN
- H.310: adapts MPEG-2 for communication over B-ISDN (ATM)
- H.321: defines videoconferencing terminal for B-ISDN (instead of N-ISDN)
- H.322: adapts H.320 for guaranteed QoS LANs (like ISO-Ethernet)
- H.323: videoconferencing over non-guaranteed LANs
- H.324: terminal for low bit rate communication (over V.34 modems)

9. MPEG-1
http://www.kom.e-technik.tu-darmstadt.de http://www.tk.informatik.tu-darmstadt.de R. Steinmetz, M. Mhlhuser

Motion Picture Expert Group (MPEG) ISO/IEC working group(s) ISO/IEC JTC1/SC29/WG11 ISO IS 11172 since 3/93 Starting point: MPEG-1 Audio/video at about 1.5 Mbit/s Based on experiences with JPEG and H.261 Follow-up standards MPEG-2 MPEG-4 MPEG-7 MPEG-21

MPEG - Features

MPEG parts:
  Audio: coding, data stream
  Video: coding, data stream
  System: combined stream, common buffer management
Consideration of other standards:
  JPEG
  H.261
Symmetric and asymmetric compression
Constant data rate, should be < 1856 kbit/s
Original target rate ~ 1.2 Mbit/s including audio (= 1x CD-ROM: 150 kByte/s)

MPEG - Video: Preparation Step

Fixed image format:
  Color subsampling: Y, Cr, Cb 4:2:0
  Resolution: should be at most 768 x 576 pixels
  8 bit/pixel in each layer (i.e., for Y, Cr, Cb)
  14 pixel aspect ratios
  8 frame rates
No user-defined MCU as in JPEG
No progressive mode as in JPEG

MPEG - Video: Processing Step

4 types of frames:
  I-frames (intra-coded frames):
    Like JPEG
    Real-time decoding demands
  P-frames (predictive coded frames):
    Reference to previous I- or P-frames
    Motion vector: MPEG does not define how to determine the motion vector
    Difference of similar macroblocks is DCT coded
    DC and AC coefficients are run-length coded
  B-frames (bi-directional predictive coded frames):
    Reference to previous and subsequent (I or P) frames
    Interpolation between macroblocks
  D-frames (DC-coded frames):
    Only the DC coefficients of the DCT are coded
    For fast forward and rewind

MPEG - Video Coding

Sequence of I-, P-, and B-frames:
  [Figure: frame sequence I B B P B B P I along the time axis t, with the references between I-frames (intra-coded), P-frames (predictive coded), and B-frames (bidirectionally coded); D-frames (DC coded) listed separately]
Sequence:
  Defined by application
  E.g., I B B P B B P B B I B B P B B P B B
  Order of transmission is different: I P B B ...
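The difference between display order and transmission order can be illustrated with a short sketch. It assumes the simple rule that every B-frame is sent after the next I- or P-frame it references; the function name `transmission_order` and the handling of trailing B-frames (sent last) are choices made for this example only.

```python
def transmission_order(display_types):
    """Reorder frames so that each B-frame follows both of its references.

    display_types: frame types in display order, e.g. "IBBPBBP".
    Returns a list of (display index, frame type) in transmission order.
    """
    out = []
    pending_b = []                      # B-frames waiting for their next reference
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append((index, frame_type))
        else:
            out.append((index, frame_type))   # send the I- or P-frame first
            out.extend(pending_b)             # then the B-frames that needed it
            pending_b = []
    out.extend(pending_b)               # assumption: leftover B-frames go last
    return out

order = transmission_order("IBBPBBP")
# order == [(0, 'I'), (3, 'P'), (1, 'B'), (2, 'B'), (6, 'P'), (4, 'B'), (5, 'B')]
```

The decoder has to buffer the reordered reference frames, which is one source of the additional delay introduced by B-frames.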

MPEG - Video: Implications

Random access:
  at I-frames
  at P-frames: i.e. decode previous I-frame first
  at B-frames: i.e. decode I- and P-frames first
Editing:
  decoded data:
    loss of quality (encode -> decode -> encode -> ...)
    application of all video editing functions
  encoded data (previous to entropy encoding):
    preservation of quality
    transition effects as function in the DCT domain
    morphing, non-block-conforming overlay very difficult
  encoded data:
    preservation of quality
    today: too complex, if possible at all, i.e. need for entropy decoding

MPEG - Audio Coding: Fundamentals

[Figure: sound pressure level (dB, 0 to 80) over frequency (0.02 to 20 kHz), showing the absolute threshold of hearing and the masking patterns of maskers at fm = 0.25 ... 4 kHz]

Masking threshold in the frequency domain:
  narrowband random noise
  depends on frequency
MPEG - Audio Coding: Fundamentals

[Figure: level SLT (dB, 0 to 60) over time (ms) around a masker, showing the pre-masking, simultaneous masking, and post-masking regions]

Masking in the time domain:
  after and before the event
  depends (to some extent) on amplitude

MPEG - Audio Coding

[Figure: encoder block diagram: sub-band coding (32 sub-bands) -> quantization -> entropy coder & frame packing; a psychoacoustical model controls how many bits are reserved for which sub-band]

Yields: heavily asymmetric codecs!!
Audio channel:
  Between 32 and 448 kbit/s
  In steps of 16 kbit/s
Definition of 3 "layers" of quality: "higher layer" means "more complex" & "can handle lower layers"
  Layer 1: max. 448 kbit/s (ca. 1:4 compression, e.g. used as PASC in DCC)
  Layer 2: max. 384 kbit/s (ca. 1:6-8, common, e.g. as MUSICAM in DAB)
  Layer 3: max. 320 kbit/s (ca. 1:10-12, the famous MP3)
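How a psychoacoustical model can control the bits reserved per sub-band may be sketched with a greedy allocation. This is a simplified illustration, not the normative MPEG procedure: the function name `allocate_bits`, the rule of thumb of about 6.02 dB less quantization noise per additional bit, and the stop criterion are assumptions made for the example.

```python
def allocate_bits(smr_db, total_bits, max_bits_per_band=15):
    """Greedy bit allocation driven by signal-to-mask ratios (SMR).

    smr_db: SMR per sub-band in dB, as delivered by a psychoacoustical model.
    total_bits: number of bits available for distribution.
    Returns the number of bits assigned to each sub-band.
    """
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        # Noise-to-mask ratio: each bit lowers the noise by about 6.02 dB (assumed).
        nmr = [smr - 6.02 * b if b < max_bits_per_band else float("-inf")
               for smr, b in zip(smr_db, bits)]
        worst = max(range(len(nmr)), key=nmr.__getitem__)
        if nmr[worst] <= 0:
            break            # all remaining noise is masked: stop spending bits
        bits[worst] += 1     # give the next bit to the most audible band
    return bits

few = allocate_bits([20.0, 5.0, -10.0], total_bits=4)    # [3, 1, 0]
many = allocate_bits([20.0, 5.0, -10.0], total_bits=10)  # [4, 1, 0]
```

The third band receives no bits at all because its signal lies below the masking threshold, which is where the irrelevance reduction comes from.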

MPEG - Audio Coding

Sampling compatible to encoding of CD-DA and DAT:
  Sampling rates: 32 kHz, 44.1 kHz, 48 kHz
  Sampling precision: 16 bit/sample
Audio channels:
  Mono (single, 1 channel)
  Stereo (2 channels)
  Dual channel mode (independent, e.g., bilingual)
  Optional: joint stereo (exploits redundancy and irrelevancy)
Application example: DAB (Digital Audio Broadcasting)
  uses MPEG Layer 2 (compression also known as MUSICAM = Masking pattern adapted Universal Subband Integrated Coding And Multiplexing)
Delays, for VLSI implementation:
  max. 30 ms encoding
  max. 10 ms decoding
SW codec delays vary for different layers, implementations, computers (rule of thumb may be 50/100/150 ms for layer 1/2/3, which makes MP3 rather inappropriate for real-time conversation)

MPEG - Audio and Video Data Streams

Audio Data Stream Layers:
  1. Frames
  2. Audio access units
  3. Slots
Video Data Stream Layers:
  1. Video sequence layer
  2. Group of pictures layer
  3. Single picture layer
  4. Slice layer
  5. Macroblock layer
  6. Block layer

10. MPEG-2
Follow-Up MPEG Standards

MPEG-2:
  Higher data rates for high-quality audio/video
  Multiple layers and profiles
MPEG-3:
  Initially HDTV
  MPEG-2 scaled up to subsume MPEG-3
MPEG-4:
  Initially, lower data rates for e.g. mobile communication
  then: focus on coding & additional functionalities based on image contents
MPEG-7 (EC = "experimental core" status):
  Content description
  Basis for search and retrieval
  See section on databases
MPEG-21 (upcoming):
  Framework for multimedia business, delivery...
  what's missing? maybe eCommerce focus --> e.g., security, watermarking?

MPEG-2

From MPEG-1 to MPEG-2:
  Improvement in quality from VCR to TV to HDTV
  No CD-ROM based constraints: higher data rates
    MPEG-1: about 1.5 Mbit/s
    MPEG-2: 2-100 Mbit/s
Evolution:
  1994: International Standard
  Also later known as H.262
  Prominent role for digital TV in DVB (Digital Video Broadcasting)
  Commercial MPEG-2 realizations available

MPEG-2 Video

Inclusion of interlaced video format
Increased resolution, more than CCIR 601
Defined as:
  5 profiles (simple, main, ...)
  4 levels (with increasing resolution, ...)
Other additional features:
  DCT coefficients may be coded with a non-linear quantization function

MPEG-2 Video: Scaling

Motivation:
  analog: continuous decrease in quality if errors occur
  digital: need for tolerance whenever errors occur, i.e. scaling
Option: Spatial scaling
  reduction of resolution
  approach:
    Base layer: image sampled with half resolution, then MPEG algorithms applied, output processed with better FEC
    Enhanced layer: base-layer image decoded and subtracted from the original; MPEG algorithms applied to the difference, output processed with worse FEC
Option: Signal-to-Noise (SNR) scaling
  noise introduced by quantization errors and visible block structures
  approach:
    Base layer: DCT output, more significant bits encoded with better FEC
    Enhanced layer: DCT output, less significant bits encoded with worse FEC
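The SNR scaling idea (more significant bits in the base layer, less significant bits in the enhanced layer) can be sketched on integer DCT coefficients. The function names, the choice of 3 enhancement bits, and the mid-point reconstruction for a missing enhanced layer are assumptions for illustration; the standard's actual layering works on requantized coefficients.

```python
def split_snr_layers(coeffs, low_bits=3):
    """Split integer coefficients into a base layer and an enhanced layer.

    The base layer keeps the more significant bits (floor division by 2**low_bits),
    the enhanced layer keeps the remaining low_bits bits of each coefficient.
    """
    base = [c >> low_bits for c in coeffs]                      # arithmetic shift
    enhanced = [c - (b << low_bits) for c, b in zip(coeffs, base)]
    return base, enhanced

def merge_snr_layers(base, enhanced=None, low_bits=3):
    """Rebuild coefficients; without the enhanced layer use the interval mid-point."""
    if enhanced is None:
        return [(b << low_bits) + (1 << (low_bits - 1)) for b in base]
    return [(b << low_bits) + e for b, e in zip(base, enhanced)]

coeffs = [100, -37, 5, 0]
base, enhanced = split_snr_layers(coeffs)     # base = [12, -5, 0, 0]
exact = merge_snr_layers(base, enhanced)      # equals coeffs
coarse = merge_snr_layers(base)               # [100, -36, 4, 4]
```

If only the better protected base layer arrives, the reconstruction error per coefficient stays within half the base-layer step size, so quality degrades gradually instead of failing outright.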

MPEG-2 Video Profiles and Levels

Maximum bit rate per level and profile ("-" = combination not defined):

Level (max. resolution)                    Simple      Main        SNR Scalable   Spatial Scalable   High
High (1920 pixels/line, 1152 lines)        -           80 Mbit/s   -              -                  100 Mbit/s
High-1440 (1440 pixels/line, 1152 lines)   -           60 Mbit/s   -              60 Mbit/s          80 Mbit/s
Main (720 pixels/line, 576 lines)          15 Mbit/s   15 Mbit/s   15 Mbit/s      -                  20 Mbit/s
Low (352 pixels/line, 288 lines)           -           4 Mbit/s    4 Mbit/s       -                  -

Profile characteristics:

Profile            B-frames   Color subsampling   Scalability
Simple             no         4:2:0               not scalable
Main               yes        4:2:0               not scalable
SNR Scalable       yes        4:2:0               SNR scalable
Spatial Scalable   yes        4:2:0               SNR scalable or spatial scalable
High               yes        4:2:0 or 4:2:2      SNR scalable or spatial scalable

MPEG-2 Audio

Two modest extensions to MPEG-1 audio:
1) "Low sample rate extension" (LSE):
  1/2 of all MPEG-1 rates: 16, 22.05, 24 kHz
  quantization down to 8 bits/sample
2) "Multichannel extension":
  more channels, i.e. up to 5 full-bandwidth channels (surround system):
    left and right front
    center (in front)
    left and right back
  "matrixing": rule for backward-compatible conversion --> stereo (x, y = 0.71)
    Left for stereo = Left_f + x * Center + y * Left_b
    Right for stereo = Right_f + x * Center + y * Right_b
  option: +1 "low frequency enhancement" (LFE) channel for subwoofer
  "multilingual extension": 7 more, i.e. up to 12 channels (multiple languages, commentary)
Compatibility with MPEG-1:
  all MPEG-1 audio formats can be processed by MPEG-2
  only 3 MPEG-2 audio codecs do not provide backward compatibility
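The matrixing rule above translates directly into a per-sample downmix. The following sketch uses the factors x = y = 0.71 from the slide; the function name `downmix_to_stereo` and the representation of channels as lists of float samples are assumptions for illustration.

```python
def downmix_to_stereo(left_f, right_f, center, left_b, right_b, x=0.71, y=0.71):
    """Matrix five surround channels into a backward-compatible stereo pair.

    Each argument is a list of samples of equal length.
    Left  = Left_f  + x * Center + y * Left_b
    Right = Right_f + x * Center + y * Right_b
    """
    left = [lf + x * c + y * lb for lf, c, lb in zip(left_f, center, left_b)]
    right = [rf + x * c + y * rb for rf, c, rb in zip(right_f, center, right_b)]
    return left, right

# One sample per channel: signal on front left, center, and back right only.
left, right = downmix_to_stereo([1.0], [0.0], [1.0], [0.0], [1.0])
```

Note that the sums can exceed the range of a single channel, so a real implementation would have to scale or limit the result; that step is left out here.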

MPEG-2 System

Steps:
  1. Audio and video combined to Packetized Elementary Stream (PES)
  2. PES(es) combined to Program Stream or Transport Stream
Program stream:
  Error-free environment
  Packets of variable length
  One single stream with one timing reference
Transport stream:
  Designed for noisy (lossy) media channels
  Multiplex of various programs with one or more time bases
  Packets of 188 byte length
Conversion between Program and Transport Streams possible
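The fixed 188-byte transport stream packets can be illustrated with a small packetizer. This is a simplified sketch: the 4-byte header layout (sync byte 0x47, 13-bit PID, 4-bit continuity counter) is taken from the MPEG-2 Systems specification and is not stated on the slide; padding the last packet with trailing 0xFF bytes is a shortcut, since real streams use adaptation-field stuffing. The function names are hypothetical.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
PAYLOAD_SIZE = TS_PACKET_SIZE - 4   # 4-byte header, 184 bytes of payload

def packetize(pes_data: bytes, pid: int):
    """Cut one PES packet into fixed-size transport stream packets."""
    packets = []
    for count, offset in enumerate(range(0, len(pes_data), PAYLOAD_SIZE)):
        chunk = pes_data[offset:offset + PAYLOAD_SIZE]
        start = 1 if offset == 0 else 0            # payload_unit_start_indicator
        header = bytes([
            SYNC_BYTE,
            (start << 6) | ((pid >> 8) & 0x1F),    # flags + upper 5 PID bits
            pid & 0xFF,                            # lower 8 PID bits
            0x10 | (count & 0x0F),                 # payload only + continuity counter
        ])
        # Simplification: pad the last packet with 0xFF instead of stuffing.
        packets.append(header + chunk.ljust(PAYLOAD_SIZE, b"\xff"))
    return packets

def parse_header(packet: bytes):
    """Extract PID, start flag, and continuity counter from one packet."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a transport stream packet")
    return {
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        "payload_unit_start": bool(packet[1] & 0x40),
        "continuity_counter": packet[3] & 0x0F,
    }

packets = packetize(bytes(400), pid=0x101)   # 400 bytes -> 3 packets
```

The short, constant packet length is what makes resynchronization after transmission errors straightforward: a receiver only has to look for the sync byte every 188 bytes.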

11. MPEG-4
Goals

MPEG-4 (ISO 14496) originally:
  Targeted at systems with very scarce resources
  To support applications like:
    Mobile communication
    Videophone and e-mail
  Max. data rates and dimensions (roughly):
    Between 4800 and 64000 bit/s
    176 columns x 144 lines x 10 frames/s
Largely covered by H.263, therefore re-orientation:
  Goal: to provide enhanced functionality to allow for analysis and manipulation of image contents
MPEG-4: Schedule for Standardization
  1993: Work started
  1997: Committee Draft
  1998: Final Committee Draft
  1998: Draft International Standard
  1999-2000: International Standard

MPEG-4: Goals (cont.)

1: Support composite multimedia, i.e. find standardized ways to
  Represent units of aural, visual or audiovisual content: "audio/visual objects" or AVOs
    [Figure: example scene with video objects 1 to 3 and audio objects 1 and 2 (speech "Rhubarb Rhubarb")]
    object coding independent of other objects, surroundings and background
  Compose these objects together, i.e. creation of compound objects that form audiovisual scenes
  Multiplex and synchronize the data associated with AVOs for transportation over network channels providing QoS (Quality of Service)
2: Support synthetic objects
  computer-generated (VR), synthesized (text-to-speech), model-based ("face")
3: Support truly interactive applications (more than play/pause/rewind ...)
  Interact with the audiovisual scene generated at the decoder's site

MPEG-4: Scope

Definition of:
  System Decoder Model: specification for decoder implementations
  Description language:
    binary syntax of an AVO's bitstream representation
    scene description information
  Corresponding concepts, tools and algorithms, especially for:
    content-based compression of simple and compound audiovisual objects
    manipulation of objects
    transmission of objects
    random access to objects
    animation
    scaling
    error robustness
MPEG-4: Scope (cont.)

Targeted bit rates for video and audio:
  VLBV core (Very Low Bit-rate Video):
    5 - 64 kbit/s
    image sequences with up to CIF resolution and up to 15 frames/s
  Higher-quality video:
    64 kbit/s - 4 Mbit/s
    quality like digital TV
  Natural audio coding:
    2 - 64 kbit/s

MPEG-4: Video and Image Encoding

Encoding / decoding of:
  Rectangular images and video:
    coding similar to MPEG-1/2
    motion prediction
    texture coding
  Images and video of arbitrary shape:
    as done in conventional approach: 8x8 DCT or shape-adaptive DCT
    plus coding of shape and transparency information
Encoder:
  Must generate timing information:
    speed of the encoder clock = time base
    desired decoding times and/or expiration times, by using time stamps attached to the stream
  Can specify the minimum buffer resources needed for decoding
MPEG-4: Composition of Scenes

Scene description includes:
  Tree to define hierarchical relationships between objects
    [Figure: scene tree in which primitive AVOs (e.g. the speech "Rhubarb Rhubarb") are grouped into compound objects]
  Objects' positions in space and time: by converting the object's local coordinate system into a global coordinate system
  Attribute value selection: e.g. pitch of sound, color, texture, animation parameters
Description based on some VRML concepts (VRML = Virtual Reality Modeling Language)
Interaction with scenes: e.g. change viewing point, drag object, start/stop streams, select language
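The tree structure and the conversion from local to global coordinates can be sketched as follows. This is an illustration only and not the MPEG-4 scene description format: the class `SceneNode`, the restriction to 2D translation plus uniform scaling, and all object names are assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class SceneNode:
    """A primitive or compound audio/visual object in a scene tree."""
    name: str
    offset: tuple = (0.0, 0.0)    # position in the parent's local coordinates
    scale: float = 1.0            # scaling applied to everything below this node
    children: list = field(default_factory=list)

def global_positions(node, origin=(0.0, 0.0), scale=1.0):
    """Walk the tree and convert every local offset into global coordinates."""
    gx = origin[0] + scale * node.offset[0]
    gy = origin[1] + scale * node.offset[1]
    positions = {node.name: (gx, gy)}
    for child in node.children:
        # Children are placed relative to this node's global position and scale.
        positions.update(global_positions(child, (gx, gy), scale * node.scale))
    return positions

voice = SceneNode("voice", offset=(1, 1))
person = SceneNode("person", offset=(10, 5), scale=2.0, children=[voice])
scene = SceneNode("scene", children=[person, SceneNode("background")])
positions = global_positions(scene)
# positions["voice"] == (12.0, 7.0)
```

Moving or scaling the compound object "person" automatically moves everything attached to it, which is what makes interaction such as dragging an object cheap to express.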

MPEG-4: Example of a Composition

[Figure: example of a composed MPEG-4 scene]

MPEG-4: Scaling

Three approaches:
  Spatial scalability:
    decoder displays textures and visual objects at a reduced spatial resolution by decoding only a subset of the total bit stream
    32 levels max. for textures and still images
    3 levels max. for video sequences
  Temporal scalability:
    decoder displays video at a reduced temporal resolution by decoding only a subset of the total bit stream
    3 levels max.
  Quality scalability:
    bit stream is parsed into a number of bit stream layers of different bit rates, either during transmission or in the decoder
    a subset of the layers still yields a meaningful signal
Spatial and temporal scaling both for:
  Conventional rectangular display
  Objects with arbitrary shape
MPEG-4: Synthetic Objects

Visual objects:
  Human face:
    start object: neutral-expression face
    animated via FDPs and/or FAPs
      FAP (facial animation parameter): animate current display
      FDP (facial definition parameter): alternative shape/texture
  Mesh + texture mapping: for 2D & 3D meshes
    2D mesh may also be used for human face animation, see above
    only triangular 2D meshes; vertices may be moved (motion vectors!), texture is warped
    e.g. virtual background
  Texture coding for view-dependent applications:
    texture, e.g. virtual background; decoder/encoder loop for "minimal" transmission
Audio objects:
  Text-to-speech:
    speech generation from given text and prosodic parameters
    face animation control
  Score-driven synthesis:
    music generation from a score
    more general than MIDI
  Special effects
MPEG-4: Layered Networking Architecture

Layers, from top to bottom (with the data exchanged between them):
  Display / Recording
  Media CoDecs: coding / decoding, e.g. video or audio frames or scene description commands
    (Access Units)
  Adaptation Layer: A/V object data + stream type info, sync. info, QoS req., ...
    (Elementary Streams)
  FlexMux Layer: flexible multiplexing, e.g. multiple elementary streams with similar QoS requirements
    (Multiplexed Streams)
  TransMux Layer: transport multiplexing
    only the interface is specified
    the layer itself can be any network, e.g. RTP/UDP/IP, AAL5/ATM
  Network or Local Storage

MPEG-4: Layered Networking Architecture (cont.)

DMIF (Delivery Multimedia Integration Framework):
  Allows to establish multiple-party sessions
  Interaction with:
    remote interactive peers
    broadcast systems
    storage systems
  Establishment of channels with specific QoS and bandwidths
  Controls:
    FlexMux layer
    TransMux layer

MPEG-4: Error Handling

Mobile communication:
  Low bit rate (< 64 kbit/s)
  Error-prone
MPEG-4 concepts for error handling:
  Resynchronization:
    enables receiver to tune in again
    based on markers within the bitstream
  Data recovery:
    enables receiver to reconstruct lost data
    encode data in an error-resilient manner
  Error concealment:
    enables receiver to bridge gaps in data
    e.g. by repeating parts of old frames
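Error concealment by repeating parts of old frames can be shown in a few lines. The sketch assumes a frame is a list of blocks and that a lost block is marked with `None`; both the representation and the function name `conceal` are hypothetical.

```python
def conceal(previous_frame, received_frame):
    """Fill every lost block (None) with the co-located block of the old frame."""
    return [old if new is None else new
            for old, new in zip(previous_frame, received_frame)]

previous = ["a0", "b0", "c0", "d0"]
received = ["a1", None, "c1", None]     # blocks 1 and 3 were lost in transit
shown = conceal(previous, received)     # ["a1", "b0", "c1", "d0"]
```

This works well for static regions and poorly for fast motion, which is why it is combined with resynchronization markers that keep the damaged area small.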

12. Wavelets
Motivation

JPEG / DCT problems:
  DCT not applicable to whole image, but only to small blocks: block structure becomes visible at high compression ratios
  Scaling as add-on: additional effort
  DCT function is fixed: cannot be adapted to source data
Improvements by using wavelets:
  Transformation of the whole image: overcomes visible block structures and introduces inherent scaling
  Better identification of which data is relevant to human perception: higher compression ratio

Wavelets: Compression / Decompression

[Figure: Compressor: Forward Wavelet Transformation -> Quantizer -> Encoder; Decompressor: Decoder -> DeQuantizer -> Inverse Wavelet Transformation]

The same overall structure as for DCT-based algorithms
But: important differences in the transformation step
Wavelets: Fundamental Idea

Image is transformed into the frequency domain (as in JPEG)
But: based on wavelet functions instead of cosine functions
  [Figure: a cosine function next to an example wavelet]
Advantage: wavelets are 0 outside a limited interval
  A wavelet automatically relates only to a part of the image
  Image need not be split into blocks
"Frequencies"???: Use a wavelet family
  $\{\, 2^{-j/2}\,\psi(2^{-j}x - k) \,\},\quad j,k \in \mathbb{Z}$, with $\psi$ being a wavelet

Wavelets: Transformation Steps

"Discrete Wavelet Transformation" (Mallat, 1989)
Split image recursively by using high- and low-pass filters
  [Figure: filter bank; the image is read by line (horizontal operation) and by column (vertical operation), each time passing through L (low pass + downsampling) and H (high pass + downsampling); the result is a transformed image of reduced size c1 (lower frequencies) and three images d11, d12, d13 (higher frequencies)]

Wavelets: Transformation Steps (cont.)

In each step i:
  Three images dxi (x = 1,2,3):
    containing the high-frequency parts of the image
    representing "details" of the image
    submitted to wavelet transformation or thrown away in case of scaling
  One image ci:
    containing the lower-frequency parts of the image
    representing the original image with less detail / at a lower resolution
    submitted to step i+1
Up to here: 4 images with 1/4 resolution each --> no compression!
  but again: decorrelation: many coefficients in d-images (close to) zero
Afterwards (as with DCT):
  Quantization
  Entropy encoding
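One transformation step can be sketched with the simplest wavelet. The example below assumes the Haar wavelet in its averaging (unnormalized) form, an image given as a list of rows with even width and height, and the naming c, d1, d2, d3 from the slides; none of this is prescribed by the slides, which leave the wavelet function open.

```python
def haar_pairs(values):
    """Low pass (average) and high pass (half difference), both with downsampling."""
    low = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]
    high = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values), 2)]
    return low, high

def transpose(matrix):
    return [list(column) for column in zip(*matrix)]

def filter_columns(matrix):
    """Apply haar_pairs to every column; return (low image, high image)."""
    pairs = [haar_pairs(column) for column in transpose(matrix)]
    return transpose([lo for lo, _ in pairs]), transpose([hi for _, hi in pairs])

def haar_step(image):
    """One DWT step: returns c (1/4 size) and the detail images (d1, d2, d3)."""
    rows = [haar_pairs(row) for row in image]        # horizontal operation
    low_rows = [lo for lo, _ in rows]
    high_rows = [hi for _, hi in rows]
    c, d1 = filter_columns(low_rows)                 # vertical operation
    d2, d3 = filter_columns(high_rows)
    return c, (d1, d2, d3)

flat_c, flat_d = haar_step([[7] * 4 for _ in range(4)])
c, details = haar_step([[1, 3], [5, 7]])
# c == [[4.0]]; details == ([[-2.0]], [[-1.0]], [[0.0]])
```

For the uniform image all detail coefficients are zero, which illustrates the decorrelation mentioned above: the four sub-images hold as many values as the original, but most of them compress very well after quantization.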

Wavelets: DWT compared with DCT

Advantages of DWT over DCT:
  No block artefacts
  Inherent scaling based on the dxi for i = 1,2,3,...
  Lower time complexity for the transformation: DCT: O(n log n), DWT: O(n) (n = number of values to be transformed)
  Higher flexibility: wavelet function can be freely chosen (but: how to choose?)

Wavelets: Further Issues

Edge detection reduces high frequencies:
  First extract detected edges
  Then apply wavelets to such a filtered image
Application to video:
  [Figure: for successive images ..., I(n-2), I(n-1), I(n), the differences ..., I(n-1) - I(n-2), I(n) - I(n-1) are computed and passed to the wavelet compressor]

13. Fractal Image Compression

Fractal geometry was first applied to image generation:
  remember "Mandelbrot" images
  recursive construction of images
  infinite granularity (i.e. zoom-in), but compact "image data" (formula)
  (such forms are called fractals)
  $Z_i = \text{RealConst} \cdot Z_{i-1} + \text{ComplexConst}$

Use of Fractals for Compression??? Overview (1)

Observation: self-similarities in natural images
  (clouds, dunes, beaches: zoom-in reveals similar forms as the large image)
Idea: can natural images be described with fractal geometry??
  first published by Barnsley & Sloan (88), first implementation 89 by Arnaud Jacquin
Key #1: Iterated Function Systems (IFS):
  an input (sub-)picture is subject to a mathematical transformation of type
    $W_i\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} e \\ f \end{pmatrix}$
  picture moved, rotated / mirrored, and contracted --> all transformations are "contractions"
Key #2: Banach's Fixed Point Theorem:
  apply a set Wimg = {Wi} of contractions to an image
  after infinitely many applications, a specific image appears ... called "attractor" or "fractal"
  this process is independent of the initial "start" image!!
  human perception: iteration can stop "pretty soon" (finite no. of iterations)
  Q: how to find Wimg such that the attractor is the image-to-be-compressed?

Use of Fractals for Compression??? Overview (2)

Key #3: Collage Theorem:
  in order to find Wimg as above: search Wimg such that the image is (almost) transformed into itself!
First algorithm published (Jacquin):
  partition image into (small, non-overlapping) "range blocks"
  search (larger, overlapping) "domain blocks" which can be "contracted" into range blocks
  for each range block, find domain block and contraction (lots of possibilities!!)
  details / simplifications of Jacquin's approach: see below

To apply self-similarity: Image Generation

Examples (from TUD + Univ. Bochum):
  [Figure: recursive construction of the Sierpinski triangle]
  recursive construction of images to produce self-similar structures
  infinitely many steps applied to different source images lead to the same result, known as the Sierpinski triangle
  this limit is also known as the attractor
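That different source images lead to the same attractor can be checked numerically. The sketch below assumes the usual three contractions for the Sierpinski triangle (each halves the distance to one corner), represents an "image" as a set of points, and measures how far one result lies from the other; the corner coordinates and all function names are choices made for this example.

```python
import math

# Corners of the triangle; each contraction halves the distance to one corner.
VERTICES = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]

def apply_ifs(points):
    """One application of W = W1 u W2 u W3 to a set of points."""
    return {((x + vx) / 2, (y + vy) / 2)
            for (x, y) in points for (vx, vy) in VERTICES}

def iterate(start, steps):
    """Apply the IFS repeatedly, starting from a single point."""
    points = {start}
    for _ in range(steps):
        points = apply_ifs(points)
    return points

def one_sided_distance(a, b):
    """Largest distance from a point of a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

set_a = iterate((0.3, 0.3), 5)
set_b = iterate((5.0, -2.0), 5)      # a very different start "image"
gap = one_sided_distance(set_a, set_b)
```

The two start points are about 5.2 apart, yet after five steps the resulting point sets differ by at most that distance divided by 2^5, because every step halves the gap. This is the contraction property that Banach's theorem relies on.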

To Find Self-Similarities

Affine function allows for:
  translation
  rotation
  scaling
  brightness (/color) adaptation
IFS: Iterated Function System
  ideally completely self-similar
  [Figure: example of a completely self-similar image]
PIFS: Partitioned Iterated Function System
  real images are not completely self-similar
  Wimg?

Theoretical Basis

Banach's Fixed Point Theorem:
  Let $F$ be a complete metric space
  Let $W: F \to F$ be a contractive mapping, i.e. there exists an $s$, $0 < s < 1$, with $|W(x) - W(y)| \le s\,|x - y|$ for all $x, y \in F$
  Then $W$ has exactly one fixed point $x_f$, i.e. $W(x_f) = x_f$
  $x_f$ can be computed as $x_f = \lim_{n \to \infty} W^n(x)$ with any $x \in F$
Application to image compression:
  Let img be the image to be compressed
  Regard the set of all possible images as a metric space (metric e.g.: maximum difference between the pixels of two pictures)
  Goal: construct Wimg such that img is the fixed point of Wimg

Fractal Image Compression and Decompression

Compression: find appropriate Wimg (difficult)
Decompression: apply Wimg iteratively to any image (easy)
  Stop when the error falls below some bound
  Error can be calculated by the "Collage Theorem"
How to Find Wimg? Jacquin's Approach

Systematic search based on "Partitioned Iterated Function System (PIFS)":
  Partition image into "range blocks" Ri: 8*8 pixel blocks, non-overlapping
  Consider all "domain blocks" Dj of double size: 16*16 pixel blocks, overlapping
  Find for each Ri the most similar Dj:
    consider rotations (0°/90°/180°/270°) and mirroring
    adapt brightness and contrast of Dj to that of Ri
    translation, rotation, mirroring, brightness/contrast adaptation define a (partial) affine function
  Combine partial functions to Wimg
Compression rate? Example: for each (8*8) range block:
  contraction factor fixed
  3 bit for transformation
  16 bit for domain block coordinates
  12 bit for brightness/contrast adaptation
  --> factor is 8x8x8 : 31 = 512 : 31, i.e. about 16.5 : 1 (cf. JPEG example)
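The range/domain block search can be sketched in a strongly simplified form. The example is not Jacquin's published algorithm: it omits rotations and mirroring, uses tiny blocks so that it runs on small test images, fits contrast and brightness by least squares, and does not limit the contrast factor (which a real coder must do to keep the mapping contractive). All function names and parameters are assumptions.

```python
def block(img, top, left, size):
    """Cut a size x size block out of an image given as a list of rows."""
    return [row[left:left + size] for row in img[top:top + size]]

def shrink(dom):
    """Contract a domain block to half its size by averaging 2x2 pixel groups."""
    n = len(dom) // 2
    return [[(dom[2*i][2*j] + dom[2*i][2*j+1] + dom[2*i+1][2*j] + dom[2*i+1][2*j+1]) / 4
             for j in range(n)] for i in range(n)]

def fit(dom_small, rng):
    """Least-squares contrast s and brightness o with s*d + o ~ r; returns (s, o, error)."""
    d = [v for row in dom_small for v in row]
    r = [v for row in rng for v in row]
    mean_d, mean_r = sum(d) / len(d), sum(r) / len(r)
    var = sum((x - mean_d) ** 2 for x in d)
    s = sum((x - mean_d) * (y - mean_r) for x, y in zip(d, r)) / var if var else 0.0
    o = mean_r - s * mean_d
    err = sum((s * x + o - y) ** 2 for x, y in zip(d, r))
    return s, o, err

def encode(img, rsize=2, step=2):
    """For each range block find the best domain block and its (s, o) adaptation."""
    height, width, dsize = len(img), len(img[0]), 2 * rsize
    domains = [(t, l, shrink(block(img, t, l, dsize)))
               for t in range(0, height - dsize + 1, step)
               for l in range(0, width - dsize + 1, step)]
    code = []
    for top in range(0, height, rsize):
        for left in range(0, width, rsize):
            rng = block(img, top, left, rsize)
            best = min(((dt, dl) + fit(small, rng) for dt, dl, small in domains),
                       key=lambda cand: cand[4])
            code.append(((top, left), best[:2], best[2], best[3]))
    return code

code = encode([[100] * 8 for _ in range(8)])   # uniform 8x8 test image
```

Each entry of `code` holds the range block position, the chosen domain block position, and the contrast and brightness parameters, which corresponds to the 31 bits per range block counted above. The exhaustive search over all domain blocks is what makes compression so much slower than decompression.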
Further Improvements

Quadtree partitioning:
  Problem: fixed 8*8 blocks do not reflect image properties
  Solution: flexible partition of image into larger or smaller squares, driven by image structure
Partitioning into rectangles and triangles
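A quadtree partition driven by image structure might look as follows. In a fractal coder the split decision would normally depend on how well a range block can be matched by a domain block; this sketch substitutes a simpler criterion, the spread of pixel values inside the square, and all names and thresholds are assumptions.

```python
def quadtree(img, top=0, left=0, size=None, min_size=2, max_spread=10):
    """Split a square image region into smaller squares where it is not uniform.

    Returns a list of (top, left, size) squares covering the region.
    Assumes a square image whose side length is a power of two.
    """
    if size is None:
        size = len(img)
    pixels = [img[r][c] for r in range(top, top + size)
              for c in range(left, left + size)]
    # Stand-in criterion: keep the square if its pixel values are similar enough.
    if size <= min_size or max(pixels) - min(pixels) <= max_spread:
        return [(top, left, size)]
    half = size // 2
    squares = []
    for dt in (0, half):
        for dl in (0, half):
            squares += quadtree(img, top + dt, left + dl, half, min_size, max_spread)
    return squares

image = [[0] * 8 for _ in range(8)]
image[0][0] = 255                      # one bright pixel in the top-left corner
squares = quadtree(image)              # 3 squares of size 4, 4 squares of size 2
```

Uniform areas stay as a few large squares, while the area around the bright pixel is refined, so the number of transformations to store follows the image content.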

Advantages & Drawbacks

+ High quality at high compression rates
    At least for images with self-similarities
    Here: better than JPEG ("cross-over point" at about 1:10 to 1:30)
+ Zooming into image supported
    detailed view possible, interpolation instead of "pixelization"
+ Scalability
    decompression steps yield iteratively improving image
- Long compression times
    asymmetric mechanisms
    improving search techniques for range & domain block pairs
- Blockwise artifacts with information losses
    Wimg is only approximate
- Not well applicable to images of non-fractal nature
    e.g. texts, sharp lines
    no quality guarantee possible
- Lower quality than JPEG at low compression rates
- Error propagation

14. Basic Audio and Speech Coding Schemes

Voice encoder/decoder: "vocoder"
Background: ITU-driven activities
  G.711: PCM with 64 kbps
  G.722: differential PCM (DPCM), 48, 56, 64 kbps
  G.723:
    Multipulse Maximum Likelihood Quantizer (MP-MLQ): 6.3 kbps
    Algebraic Code-Excited Linear Prediction (ACELP): 5.3 kbps
    application: speech

Schemes for Speech Coding

G.728: Low Delay Code Excited Linear Prediction (LD-CELP)
  used in audio/video conferencing
  16 kbps
  one-way end-to-end delay less than 2 ms (due to CODEC algorithm)
  complex algorithm:
    16-18 MIPS in floating point required
    approx. 40 MIPS for whole encoding and decoding
AV.253:
  still under consideration at ITU
  32 kbps
IS-54 VSELP:
  good for voice, bad for music
  13 kbps (approx. 8 kbps voice + 5.05 kbps forward error correction, FEC)
  driving force: Motorola (similar developments in Japan)

Speech Coding in Mobile Telephone Networks

RPE-LTP (GSM): Regular Pulse Excitation - Long-Term Predictor
  used in European GSM: speech
  13 kbps
GSM Half-Rate Coders:
  5.6 - 6.25 kbps
  quality and characteristics similar to RPE-LTP

Vocoder: e.g. Inmarsat IMBE Coder

Improved Multiband Excitation Coder (IMBE)
  application: maritime satellite communications
  4.15 kbps for voice (plus 2.25 kbps for channel coding)
Principle: Vocoder (IMBE: voiced and unvoiced individually for each frequency band)
  [Figure: vocoder principle. Analysis: the speech input passes a pitch analysis and a bank of band filters (200-300 Hz, 300-450 Hz, ..., 2,800-3,400 Hz), each followed by DC + lowpass, yielding the encoded speech. Synthesis: a switch selects between pulse generator and noise generator (replicated for each frequency band), feeding one modulator and band filter (200-300 Hz, 300-450 Hz, ..., 2,800-3,400 Hz) per band]

15. Conclusion

JPEG:
  Very general format with high compression ratio
  SW and HW for baseline mode available
H.261 / H.263:
  Established standard by telecom world
  Preferable hardware realization
MPEG family of standards:
  Video and audio compression for different data rates
  Asymmetric (focus) and symmetric
Proprietary systems: e.g. QuickTime
  Product
  Migration to the use of standards
Next steps: wavelets, fractals, models of objects