05A Compression

http://www.kom.e-technik.tu-darmstadt.de http://www.tk.informatik.tu-darmstadt.de R. Steinmetz, M. Mühlhäuser
Multimedia-Systems: Compression
Prof. Dr.-Ing. Ralf Steinmetz, Prof. Dr. Max Mühlhäuser
MM: TU Darmstadt - Darmstadt University of Technology, Dept. of Computer Science, TK - Telecooperation, Tel. +49 6151 16-3709, Fax +49 6151 16-3052, Alexanderstr. 6, D-64283 Darmstadt, Germany, max@informatik.tu-darmstadt.de
RS: TU Darmstadt - Darmstadt University of Technology, Dept. of Electrical Engineering and Information Technology, Dept. of Computer Science, KOM - Industrial Process and System Communications, Tel. +49 6151 166151, Fax +49 6151 166152, Merckstr. 25, D-64283 Darmstadt, Germany, Ralf.Steinmetz@KOM.tu-darmstadt.de
GMD - German National Research Center for Information Technology
httc - Hessian Telemedia Technology Competence-Center e.V.
05A-compression.fm 1 15.March.01
Scope
[Figure: scope of the lecture as a layered overview. Layers: Usage, Services, Systems, Basics. Topics shown: Applications, Learning & Teaching, Design, User Interfaces, Group Communications, Synchronization, Content Processing, Documents, Security, ..., Databases, Media-Server, Opt. Memories, Computer Architectures, Programming, Communications, Networks, Operating Systems, Quality of Service, Compression, Image & Graphics, Animation, Video, Audio]
Contents
1. Motivation
2. Requirements - General
3. Fundamentals - Categories
4. Source Coding
5. Entropy Coding
6. Hybrid Coding: Basic Encoding Steps
7. JPEG
8. H.261 and related ITU Standards
9. MPEG-1
10. MPEG-2
11. MPEG-4
12. Wavelets
13. Fractal Image Compression
14. Basic Audio and Speech Coding Schemes
15. Conclusion
1. Motivation
Uncompressed digital media in computing means:
- Text: 1 page with 80 char/line, 64 lines/page and 2 Byte/char: 80 x 64 x 2 x 8 = 80 kBit/page
- Image: 24 Bit/pixel, 512 x 512 pixel/image: 512 x 512 x 24 = 6 MBit/image
- Audio: CD quality, sample rate 44.1 kHz, 16 Bit/sample: mono 44.1 x 16 = 706 kBit/s; stereo 1.412 MBit/s
- Video: full frames with 1024 x 1024 pixel/frame, 24 Bit/pixel, 30 frames/s: 1024 x 1024 x 24 x 30 = 720 MBit/s; more realistic: 360 x 240 pixel/frame = 60 MBit/s
Hence compression is NECESSARY
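The arithmetic behind these figures can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and it assumes that the slide's "k" and "M" for text, image and video mean 2^10 and 2^20, while the audio figures use decimal units.

```python
# Raw (uncompressed) data volumes for the examples on this slide.
KBIT = 1024           # assumption: binary prefix, which reproduces "80 kBit"
MBIT = 1024 * 1024    # assumption: binary prefix, which reproduces "6 MBit" and "720 MBit/s"

def text_page_bits(chars_per_line=80, lines=64, bytes_per_char=2):
    return chars_per_line * lines * bytes_per_char * 8

def image_bits(width=512, height=512, bits_per_pixel=24):
    return width * height * bits_per_pixel

def audio_bits_per_second(sample_rate_hz=44_100, bits_per_sample=16, channels=1):
    return sample_rate_hz * bits_per_sample * channels

def video_bits_per_second(width, height, bits_per_pixel=24, frames_per_second=30):
    return width * height * bits_per_pixel * frames_per_second

print(text_page_bits() / KBIT, "kBit/page")
print(image_bits() / MBIT, "MBit/image")
print(audio_bits_per_second() / 1000, "kBit/s mono")
print(audio_bits_per_second(channels=2) / 1e6, "MBit/s stereo")
print(video_bits_per_second(1024, 1024) / MBIT, "MBit/s")
print(video_bits_per_second(360, 240) / MBIT, "MBit/s")
```

The last line gives roughly 59.3, which the slide rounds to 60 MBit/s.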
2. Requirements - General
Requirements on compression:
- low delay
- intrinsic scalability
- high quality
- low complexity (e.g., ease of decoding)
- efficient implementation (e.g., memory req.)
DIALOGUE AND RETRIEVAL mode requirements:
- Independence of frame size and video frame rate
- Synchronization of audio, video, and other media
DIALOGUE mode requirements:
- Compression and decompression in real-time (e.g. 25 frames/s)
- End-to-end delay < 150 ms
RETRIEVAL mode requirements:
- Fast forward and backward data retrieval
- Random access within 1/2 s
Software and/or hardware-assisted implementation requirements

3. Fundamentals - Categories
entropy coding (entropy encoding)
- ignoring semantics of the data
- lossless
source coding
- based on semantics of the data
- often lossy
channel coding
- adaptation to communication channel
- introduction of redundancy
hybrid coding
- entropy and source coding
Categories and Techniques

Entropy Coding:
- Run-Length Coding
- Huffman Coding
- Arithmetic Coding
Source Coding:
- Prediction: DPCM, DM
- Transformation: FFT, DCT
- Layered Coding: Bit Position, Subsampling, Sub-Band Coding
- Vector Quantization
Hybrid Coding:
- JPEG
- MPEG
- H.261, H.263
- proprietary: Quicktime, ...
Categories & Techniques, Cont.

(1)
Two principal possibilities:
1. Entropy Coding: eliminate redundancy (thus, lossless)
2. Reduction Coding: eliminate irrelevance / low relevance (lossy)
Preparatory step: Decorrelation - eliminate interdependencies
- this is the essence of source coding
- changes "representation" of media
- goal usually: reduce dependencies between data
- as such, is a preparatory step!! usually, does not compress
Steps in hybrid coding (often): decorrelation - reduction - entropy coding
- often: reduction by quantization
- last step: additional compression without harm
note: literature usually uses terms as in last slide!!
note: reduction coding is "smart deletion", not really "compression"
Categories and Techniques, Cont.

(2)
Major distinction: Symmetric / Asymmetric
- Asymmetric (usually): more effort for compression
  - o.k. if compression is non real-time, "only once" (movie!)
  - may involve number-crunchers (...owned by content provider)
- Symmetric: "required" for real-time, e.g., videoconferencing
  - in reality, often not 100% symmetric
Further considerations include, e.g.,
- Adjustable compression rate? ...quality?
- "smooth" bit stream ("isochronous")?
  - terms: CBR (const. bit rate) vs. VBR (variable bit rate)
  - may be "over time": e.g., packet size Big Small Small Big Small Small ...
  - may be simulated w/ loop-back filter plus buffer
- "progressive" (mainly: non-continuous media): display-while-download
- "streaming": ~ same for video (here, rather an issue of software)
More subtle issues:
- "open" standard?
- good "performance" (ratio, speed) for all kinds of media?
- bullet-proof, well-understood? ...
4. Source Coding DPCM

DPCM = Differential Pulse-Code Modulation
Assumptions:
- Consecutive samples or frames have similar values
- Prediction is possible due to existing correlation
Fundamental steps:
- Incoming sample or frame (pixel or block) is predicted by means of previously processed data
- Difference between incoming data and prediction is determined
- Difference is quantized
Challenge: optimal predictor
Further predictive coding technique:
- Delta modulation (DM): 1 bit as difference signal
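The three fundamental steps can be sketched as follows. This is a minimal illustration, not a standardized codec: it assumes the simplest possible predictor (the previously reconstructed sample) and a uniform quantizer with a hypothetical step size `step`.

```python
def dpcm_encode(samples, step=4):
    """Predict each sample from the previous reconstruction, quantize the difference."""
    prediction = 0
    codes = []
    for sample in samples:
        difference = sample - prediction          # step 2: difference to prediction
        quantized = round(difference / step)      # step 3: quantize the difference
        codes.append(quantized)
        prediction += quantized * step            # predictor follows what the decoder sees
    return codes

def dpcm_decode(codes, step=4):
    """Rebuild the samples by accumulating the dequantized differences."""
    value = 0
    samples = []
    for quantized in codes:
        value += quantized * step
        samples.append(value)
    return samples

original = [100, 102, 105, 104, 101, 99]
codes = dpcm_encode(original)
restored = dpcm_decode(codes)
print(codes)
print(restored)
```

Because the encoder predicts from the reconstructed value rather than from the original one, the quantization error does not accumulate: each restored sample stays within half a step of the original.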
Source Coding: Transformation

Assumptions:
- Data in the transformed domain is easier to compress
- Related processing is feasible
Example:
- Fourier Transformation: time domain --> frequency domain
- Inverse Fourier Transformation: frequency domain --> time domain
FFT: Fast Fourier Transformation
DCT: Discrete Cosine Transformation
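As a small illustration of the time domain / frequency domain round trip, here is a direct (slow, O(N^2)) discrete Fourier transform and its inverse in plain Python. It is only a sketch to make the idea concrete; an FFT computes the same result faster.

```python
import cmath

def dft(samples):
    """Time domain -> frequency domain (direct evaluation of the DFT sum)."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def inverse_dft(spectrum):
    """Frequency domain -> time domain."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

signal = [1, 2, 3, 4, 0, 0, 0, 0]
spectrum = dft(signal)
restored = inverse_dft(spectrum)
constant_spectrum = dft([1] * 8)   # a constant signal has only a DC component
```

The transform itself does not compress anything; it only changes the representation so that later steps (quantization, entropy coding) have an easier job.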
Source Coding: Sub-Band

Assumption: Some frequency ranges are more important than others
Example:
[Figure: frequency spectrum of the signal over the frequency axis, split into bands; transformation / coding is applied per band]
Application:
- vocoder for speech communication
- MPEG audio
Entropy Coding: Principle

Entropy (in information theory): information content / "density"
- symbols/words equally likely: high entropy (full of information)
- otherwise: lower entropy (suboptimal representation of info, less dense)
[Figure: two example images with histograms (probability over grey levels), one with high entropy and one with low entropy]
- note (high entropy example): seems "little information" to us since it is very regular; this is not covered by the entropy formula, yet may be used for compression (e.g. run length)
- low entropy example: "little info" because "most of picture is in same gray"
Entropy formula:
$$H(P) = -\sum_{s} p(s) \log_B p(s)$$
example: given 4 possible symbols (words) in source code
i) if all are equally likely, p = 1/4: H(P) = 2
ii) if p = 1/2, 1/4, 1/8, 1/8 --> H(P) = 1 6/8
"Entropy coding" means: mean code length per symbol (almost) equals the entropy
in ii) above, with B = 2 (binary): p = 1/2 --> code length -log2(1/2) = -(-1) = 1 bit; p = 1/4 --> 2 bits, etc.
GOAL: find code w/ symbol length as close as possible to -logB p(s)
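The two example values can be reproduced directly from the entropy formula. The helper names below are illustrative only.

```python
from math import log2

def entropy(probabilities):
    """H(P) = -sum p * log2(p), in bits per symbol (B = 2)."""
    return -sum(p * log2(p) for p in probabilities if p > 0)

def ideal_code_lengths(probabilities):
    """Code length an ideal entropy coder would assign to each symbol."""
    return [-log2(p) for p in probabilities]

uniform = [1 / 4] * 4
skewed = [1 / 2, 1 / 4, 1 / 8, 1 / 8]

print(entropy(uniform))             # case i)
print(entropy(skewed))              # case ii)
print(ideal_code_lengths(skewed))   # 1, 2, 3 and 3 bits
```

In case ii) the ideal code lengths are whole numbers, so a code such as 0, 10, 110, 111 reaches the entropy exactly.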
5. Entropy Coding: Run-Length (only marginal relation to entropy)

Assumption: Long sequences of identical symbols
Example:
... A B C E E E E E E D A C B ...
--> compression -->
... A B C E ! 6 D A C B ...
where "E" is the symbol, "!" is a special flag and "6" is the number of occurrences
Special variant: zero-length encoding
- only repetitions of zeroes count
- in the encoded part above ("E ! 6"), the "symbol" is not needed (i.e. "pays" for > 2 repetitions)
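A minimal sketch of the symbol / flag / count scheme from the example. It assumes that the flag character never occurs in the data and that a run is only replaced when it is longer than the three tokens it costs (`min_run=4`); both are choices made for this illustration.

```python
def rle_encode(symbols, flag="!", min_run=4):
    """Replace runs of at least min_run identical symbols by: symbol, flag, count."""
    tokens = []
    i = 0
    while i < len(symbols):
        j = i
        while j < len(symbols) and symbols[j] == symbols[i]:
            j += 1
        run = j - i
        if run >= min_run:
            tokens.extend([symbols[i], flag, str(run)])
        else:
            tokens.extend(symbols[i] * run)   # short runs are copied unchanged
        i = j
    return tokens

def rle_decode(tokens, flag="!"):
    """Expand symbol, flag, count triples back into runs."""
    out = []
    i = 0
    while i < len(tokens):
        if i + 2 < len(tokens) and tokens[i + 1] == flag:
            out.append(tokens[i] * int(tokens[i + 2]))
            i += 3
        else:
            out.append(tokens[i])
            i += 1
    return "".join(out)

encoded = rle_encode("ABCEEEEEEDACB")
print(" ".join(encoded))
```

For the zero-only variant the symbol token can be dropped, so a run costs two tokens instead of three and already pays off for more than two repetitions.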
Entropy Coding: Huffman

Basics:
- Assumption: some symbols occur more often than others, e.g., character frequencies of the English language
- Idea: frequent symbols --> shorter bit strings (cf. Entropy!)
Example:
- Characters to be encoded: A, B, C, D, E
- Probability to occur: p(A)=0.3, p(B)=0.3, p(C)=0.1, p(D)=0.15, p(E)=0.15
Coding tree:
- step 1: scan all leaves, assign (1,0) to the two with lowest probability --> intermediate root
- steps 2-n: scan current "tops" (intermediate roots or leaves), assign (1,0) to the two with lowest probability --> ...
- end: assign codes by descending the tree until the leaves; the bits encountered represent the code
[Figure: coding tree. Step 1: C (10%) and D (15%) --> 25%; step 2: 25% and E (15%) --> 40%; step 3: A (30%) and B (30%) --> 60%; step 4: 60% and 40% --> 100%]

symbol  probability  code
A       30%          11
B       30%          10
C       10%          011
D       15%          010
E       15%          00
Entropy Coding: Huffman

Table and example of application to data stream

symbol  A   B   C    D    E
code    11  10  011  010  00

data:  B  A  C   D   A  B  E  B  A  E
code:  10 11 011 010 11 10 00 10 11 00
note: decoder may auto-detect end-of-symbol in bit-stream
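The auto-detection of symbol boundaries follows from the prefix property of the code table: no code word is the beginning of another one. A short sketch using the table above (function names are illustrative):

```python
CODE_TABLE = {"A": "11", "B": "10", "C": "011", "D": "010", "E": "00"}

def huffman_encode(text, table=CODE_TABLE):
    """Concatenate the code words without any separators."""
    return "".join(table[symbol] for symbol in text)

def huffman_decode(bits, table=CODE_TABLE):
    """Read bit by bit; a symbol ends as soon as the bits read so far form a code word."""
    inverse = {code: symbol for symbol, code in table.items()}
    symbols = []
    current = ""
    for bit in bits:
        current += bit
        if current in inverse:
            symbols.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit stream ends inside a code word")
    return "".join(symbols)

bits = huffman_encode("BACDABEBAE")
print(bits, len(bits))
print(huffman_decode(bits))
```

The ten symbols take 22 bits here, compared with 30 bits for a fixed 3-bit code.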
Other types of Entropy encoding

Arithmetic Encoding (1)
- most direct application of entropy principles!
- symbols occupy sub-intervals of [0,1) according to their probabilities
- successive coding of symbols cuts out the corresponding sub-interval in the sub-(sub-sub-...) interval chosen so far
- last symbol --> last sub-interval
- choose "arbitrary" ("short") no. in this sub-interval --> transmit/store
- requires consideration of "additional" symbol "end-of-word" (below: "!")
Arithmetic Coding Example

symbols a, b, c, d, !; how to encode "bbd!"?

[Figure: the interval [0,1) divided according to the symbol probabilities a = 0.1, b = 0.5, c = 0.1, d = 0.2, ! = 0.1, i.e. with boundaries at 0, .1, .6, .7, .9, 1; the chosen sub-interval is subdivided again in the same proportions after each symbol]

- divide [0,1) according to probabilities
- b: restrict to [.1,.6)
- b: sub-interval [.1,.6) of [.1,.6), i.e. [.15,.4)
- d: sub-interval [.7,.9) of [.15,.4), i.e. [.325,.375)
- !: sub-interval [.9,1) of [.325,.375), i.e. [.37,.375)
transmit/store one (arbitrary) value X in the final ("red") sub-interval
decoder: X lies in [.1,.6) --> 1st symbol is "b"; X in [.15,.4) --> 2nd symbol is "b"; the process continues until "!" is decoded
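The interval narrowing can be traced with exact fractions. The sketch below assumes the partition of [0,1) with boundaries 0, .1, .6, .7, .9, 1 for a, b, c, d, ! and works out the nested sub-intervals for "bbd!" by plain proportional subdivision; practical arithmetic coders use finite-precision integer arithmetic instead.

```python
from fractions import Fraction as F

# Sub-interval of [0,1) for each symbol (boundaries as in the example).
INTERVALS = {
    "a": (F(0), F(1, 10)),
    "b": (F(1, 10), F(6, 10)),
    "c": (F(6, 10), F(7, 10)),
    "d": (F(7, 10), F(9, 10)),
    "!": (F(9, 10), F(1)),
}

def arithmetic_encode(message):
    """Return the final interval [low, high); any number inside it encodes the message."""
    low, high = F(0), F(1)
    for symbol in message:
        width = high - low
        symbol_low, symbol_high = INTERVALS[symbol]
        low, high = low + width * symbol_low, low + width * symbol_high
    return low, high

def arithmetic_decode(x, end_symbol="!"):
    """Find the sub-interval containing x (0 <= x < 1), emit its symbol, rescale, repeat."""
    message = ""
    while not message.endswith(end_symbol):
        for symbol, (symbol_low, symbol_high) in INTERVALS.items():
            if symbol_low <= x < symbol_high:
                message += symbol
                x = (x - symbol_low) / (symbol_high - symbol_low)
                break
    return message

low, high = arithmetic_encode("bbd!")
print(float(low), float(high))
print(arithmetic_decode(F(372, 1000)))
```

Any value in the final interval, for example 0.372, decodes back to "bbd!".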
LZW, Code Books, VLWs, VLIs

LZW (Lempel-Ziv-Welch): e.g.: how to code "whoiswho"? might be thought of as follows:
- start: create table entry for w
- then: create table entry for h, but also for "wh" (multi-symbol)
- then: create table entry for o, but also for "who" (etc. until max-length)
- multi-symbol entries which repeat "often" survive, others are over-written
Code-Books: used in many compression schemes; 3 basic possibilities (imagine in above example):
- "fixed": all implementations of the codec have (1..n) pre-defined code-books
- "pre-computed": encoder pass 1: compute codebook, store/transmit upfront; encoder pass 2: encode (compress) data using the codebook
- "dynamic": code-book grows / changes during compression (LZW)
  - needs "same procedure" for encoder and decoder
  - either: "pieces" of the code-book are intertwined with the code as they are generated / changed
  - or: rules are such that the (dynamic) codebook contents can be derived from the encoded (compressed) data by the decoder
VLWs / VLIs: variable length words / integers
- similar to Huffman, but the decoder cannot detect end-of-symbol
- e.g., 1="0", 2="01", 3="11", ... (useful?? see JPEG etc.)
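For comparison with the informal description, here is a sketch of the common textbook form of LZW, applied to "whoiswho". It differs from the narrative above in two assumed details: the table starts with all 256 single-byte characters, and entries are never over-written. The decoder rebuilds the same table from the codes alone, which is the second "dynamic" option.

```python
def lzw_compress(text):
    """Emit one code per longest already-known string; learn string + next char."""
    table = {chr(i): i for i in range(256)}
    next_code = 256
    current = ""
    codes = []
    for char in text:
        candidate = current + char
        if candidate in table:
            current = candidate
        else:
            codes.append(table[current])
            table[candidate] = next_code
            next_code += 1
            current = char
    if current:
        codes.append(table[current])
    return codes

def lzw_decompress(codes):
    """Rebuild the table while decoding; no code-book is transmitted."""
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:            # code defined by the step being decoded
            entry = previous + previous[0]
        else:
            raise ValueError("invalid LZW code")
        pieces.append(entry)
        table[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(pieces)

codes = lzw_compress("whoiswho")
print(codes)
print(lzw_decompress(codes))
```

The second "wh" is sent as the single code 256, so the eight characters become seven codes; longer inputs with more repetition benefit more.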
6. Hybrid Coding: Basic Encoding Steps

[Figure: processing chain source data --> data preparation --> data processing --> quantization --> entropy encoding --> compressed data]

step              examples
data preparation  resolution, frame rate
data processing   DCT, sub-band coding
quantization      linear; DC, AC values
entropy encoding  runlength, Huffman

Loss annotations in the figure (left to right): video: lossy, audio: lossless; lossy (sometimes lossless); lossless
7. JPEG
JPEG: Joint Photographic Experts Group
International Standard:
- For digital compression and coding of continuous-tone still images: gray-scale, color
- Since 1992
Joint effort of:
- ISO/IEC JTC1/SC2/WG10
- Commission Q.16 of CCITT SGVIII
Compression rate of 1:10 yields reasonable results
JPEG
Very general compression scheme
Independence of:
- Image resolution
- Image and pixel aspect ratio
- Color representation
- Image complexity and statistical characteristics
Well-defined interchange format of encoded data
Implementation in:
- Software only
- Software and hardware
MOTION JPEG for video compression:
- Sequence of JPEG-encoded images
JPEG - Compression Steps

[Figure: source image --> image preparation (pixel, block, MCU) --> image processing (predictor, FDCT) --> quantization --> entropy encoding (runlength, Huffman, arithmetic) --> compressed image]
MCU: Minimum Coded Unit
FDCT: Forward Discrete Cosine Transformation
JPEG - Image Preparation

[Figure: image consisting of planes C1, C2, ..., CN; each plane has Xi columns (left to right) and Yi lines (top to bottom) of data units]
data units: samples in lossless mode, blocks with 8x8 pixels in other modes
Planes:
- 1 <= N <= 255 components Ci (e.g., one plane per color)
- Different resolution of individual components possible
Pixel resolution:
- 8 or 12 bit per pixel in lossy modes
- 2 to 16 bit per pixel in lossless mode

Example: 4:2:2 YUV, 4:1:1 YUV, and YUV9 coding
- Luminance (Y): brightness; sampling frequency 13.5 MHz
- Chrominance (U, V): color differences; sampling frequency 6.75 MHz

Non-interleaved encoding:
[Figure: data units of one component are processed line by line, from left to right and top to bottom]
Interleaved encoding:
[Figure: data units of the components C1, C2, C3 are combined alternately]
Minimum Coded Unit (MCU): Combination of interleaved data units of different components
JPEG - Baseline Mode

[Figure: source image --> 1: image preparation (8x8 blocks) --> 2: image processing (FDCT) --> 3: quantization --> 4: entropy encoding --> compressed image; quantization and entropy encoding use tables]
Baseline mode is mandatory for all JPEG implementations:
- Often restricted to certain resolution
- Often only three planes with predefined color set-up
Image preparation:
- Step 1a: pixel resolution --> multiples of p = 8 bit; yields 8 x 8 pixel blocks (data units)
- Step 1b: unsigned --> signed integer (prepare for "oscillation" --> sin/cos)
- ... other steps see below
- Step 4a: zigzag linearization (see below)
- Steps 4b, c, ...: several entropy coding algorithms applied
Intuitive Understanding of DCT

Fourier transform (& FFT "fast" algorithm), known from the 1-dimensional case:
- cut waveform into pieces (blocks of samples)
- for each block: interpret as periodic (infinite) oscillating waveform
- represent as sum of sin/cos waves: a_i sin(i ω t), i = 0...(N-1); same for cos
- a_i: coefficients; a_0 = DC (direct current = shift wrt. 0-axis); others: how much of the respective sin or cos wave is part of the waveform
- i: increasing frequencies (usually N = no. of samples in block)
DCT in JPEG etc.: same idea, but 2-dimensional cos waves
- cut out square blocks from picture (NxN)
- cos waves all have independent frequencies in horizontal/vertical direction; comparable to smooth hills, # of valleys may differ horiz./vert.
- again: interpret sample as periodic (2D) waveform --> represent as sum of (2D) cos wave "hill areas"
- why only cos?? trick: picture mirrored around the axes --> 4-fold size --> picture symmetric to axes --> sin parts become zero
- 4-fold size is no problem: 3 parts are redundant
- axes have double "weight" (pix. row/col. "0") --> factor Cu/Cv in formula
JPEG - Baseline Mode: Image Processing

Forward Discrete Cosine Transformation (FDCT):
$$S_{vu} = \frac{1}{4} C_u C_v \sum_{x=0}^{7} \sum_{y=0}^{7} s_{yx} \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}$$
with: $C_u, C_v = \frac{1}{\sqrt{2}}$ for $u, v = 0$; else $C_u, C_v = 1$
Formula applied to each block for all 0 <= u, v <= 7:
- Blocks with 8x8 pixel result in 64 DCT coefficients:
  - 1 DC-coefficient S00: basic color of the block
  - 63 AC-coefficients: (likely) zero or near-zero values
- Different significance of the coefficients:
  - DC: most important
  - AC: less important
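A direct (unoptimized) evaluation of the FDCT formula, applied to the 8x8 pixel block used in the sample compression later in these slides. The sketch assumes a level shift by 128 for 8-bit samples (the "unsigned --> signed" preparation step); real codecs use fast factorized DCTs rather than this quadruple loop.

```python
from math import cos, pi, sqrt

PIXEL_BLOCK = [
    [139, 144, 149, 153, 155, 155, 155, 155],
    [144, 151, 153, 156, 159, 156, 156, 156],
    [150, 155, 160, 163, 158, 156, 156, 156],
    [159, 161, 162, 160, 160, 159, 159, 159],
    [159, 160, 161, 162, 162, 155, 155, 155],
    [161, 161, 161, 161, 160, 157, 157, 157],
    [162, 162, 161, 163, 162, 157, 157, 157],
    [162, 162, 161, 161, 163, 158, 158, 158],
]

def scale(k):
    """C_u / C_v from the formula: 1/sqrt(2) for index 0, else 1."""
    return 1 / sqrt(2) if k == 0 else 1.0

def fdct_8x8(block, level_shift=128):
    """Return S[v][u] for an 8x8 block given as block[y][x]."""
    s = [[pixel - level_shift for pixel in row] for row in block]
    coefficients = [[0.0] * 8 for _ in range(8)]
    for v in range(8):
        for u in range(8):
            total = 0.0
            for y in range(8):
                for x in range(8):
                    total += (s[y][x]
                              * cos((2 * x + 1) * u * pi / 16)
                              * cos((2 * y + 1) * v * pi / 16))
            coefficients[v][u] = 0.25 * scale(u) * scale(v) * total
    return coefficients

S = fdct_8x8(PIXEL_BLOCK)
print(round(S[0][0], 1), round(S[1][0], 1), round(S[0][1], 1))
```

The DC coefficient S00 is eight times the mean of the level-shifted block, here about 235.6; the AC coefficients shrink quickly towards the lower right corner.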
JPEG - Baseline Mode: Image Processing

FDCT transforms:
- blocks into blocks
- not pixels into pixels
Example: Calculation of S00
[Figure: 8x8 input block and 8x8 output block]
JPEG - Baseline Mode: Quantization

Use of quantization tables for the DCT coefficients:
- Maps an interval of real numbers to one integer number
- Allows a different granularity for each coefficient
JPEG: Quantization Effect
[Figure: quantization effect, panels (a) and (b)]
JPEG - Baseline Mode: Entropy Encoding

DC-coefficients: Compute the differences
[Figure: sequence of blocks ..., block i-1 with DC_{i-1}, block i with DC_i, ...]
DIFF = DC_i - DC_{i-1}
Use differences instead of the DC_i values
JPEG - Baseline Mode: Entropy Coding

63 AC coefficients:
- Ordering in zig-zag form
[Figure: 8x8 coefficient block with DC in the upper left corner, AC01 next to it, AC07 in the upper right, AC70 in the lower left and AC77 in the lower right corner, traversed in zig-zag order]
- reason: coefficients in the lower right corner are likely to be zero
- Huffman coding of all coefficients: transformation into a code where the number of bits depends on the frequency of the respective value
- Subsequent runlength coding of zeros
JPEG: Details of (one possible) Entropy coding

Treatment of the "zig-zag sequence":
- differential coding of DC: DC_i stored as "change wrt. DC_{i-1}"
- assumption: there will rarely be two non-zero AC values in sequence
  --> regard the sequence as iteration of non-zero AC values and zero-runlengths
  --> sometimes, the zero-runlength will have "length zero"
- code non-zero AC values as VLIs
  --> need to transmit VLI lengths (remember: this is not Huffman --> end of code not found by decoder)
- create pairs (zero-runlength, VLI-length-of-following-non-zero-AC-value)
- these pairs are Huffman encoded
- the very first "pair" is not a pair, but the VLI length of the (diff.) DC value
- the block is finally represented as iteration Huffman-encoded pair / VLI-encoded non-zero AC / Huffman-... / VLI... / ..., preceded by "Huffman-encoded VLI-length / VLI-encoded diff.-DC"
The next two slides give an example of the DCT coding of an 8x8 block
JPEG: Sample Compression of 1 Block: 8x8 Matrices

1. Typical Pixel Block:
139 144 149 153 155 155 155 155
144 151 153 156 159 156 156 156
150 155 160 163 158 156 156 156
159 161 162 160 160 159 159 159
159 160 161 162 162 155 155 155
161 161 161 161 160 157 157 157
162 162 161 163 162 157 157 157
162 162 161 161 163 158 158 158

2. DCT Coefficients:
235.6  -1.0 -12.1  -5.2   2.1  -1.7  -2.7   1.3
-22.6 -17.5  -6.2  -3.2  -2.9  -0.1   0.4  -1.2
-10.9  -9.3  -1.6   1.5   0.2  -0.9  -0.6  -0.1
 -7.1  -1.9   0.2   1.5   0.9  -0.1   0.0   0.3
 -0.6  -0.8   1.5   1.6  -0.1  -0.7   0.6   1.3
  1.8  -0.2   1.6  -0.3  -0.8   1.5   1.0  -1.0
 -1.3  -0.4  -0.3  -1.5  -0.5   1.7   1.1  -0.8
 -2.6   1.6  -3.8  -1.8   1.9   1.2  -0.6  -0.4

3. Quantization Matrix:
16  11  10  16  24  40  51  61
12  12  14  19  26  58  60  55
14  13  16  24  40  57  69  56
16  17  22  29  51  87  80  62
18  22  37  56  68 109 103  77
24  35  55  64  81 104 113  92
49  64  78  87 103 121 120 101
72  92  95  98 112 100 103  99

4. Quantized Result:
15   0  -1   0   0   0   0   0
-2  -1   0   0   0   0   0   0
-1  -1   0   0   0   0   0   0
 0   0   0   0   0   0   0   0
 0   0   0   0   0   0   0   0
 0   0   0   0   0   0   0   0
 0   0   0   0   0   0   0   0
 0   0   0   0   0   0   0   0
JPEG: Sample Compression (contd.)

assume: last DC value was 12 --> encoded difference is 15 - 12 = 3
--> only 3, -2, -1 occur as non-zero values. Their VLI-encoding is as follows:

value  VLI
3      11
-2     01
-1     0

This makes the iteration look as follows (VLIs still represented as integers):
(2)(3), (1,2)(-2), (0,1)(-1), (0,1)(-1), (0,1)(-1), (2,1)(-1), (0,0) (<-- abbreviation for "til end")
The following Huffman encoding is defined:

symbol  code
(2)     011
(0,0)   1010
(0,1)   00
(1,2)   11011
(2,1)   11100

...so that the bitstream finally consists of the following 31 bits (for 64 coefficients!):
0111111011010000000001110001010
...btw., the decoded matrix looks like this:

240   0 -10   0   0   0   0   0
-24 -12   0   0   0   0   0   0
-14 -13   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
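The 31-bit stream can be reproduced with a short sketch. It assumes a previous DC value of 12 (so that the DC difference is the 3 that appears in the bit stream), a quantized block whose first rows read 15 0 -1 / -2 -1 0 / -1 -1 0, and only the five Huffman codes listed for this example; a real JPEG coder uses complete Huffman tables.

```python
def zigzag_order(n=8):
    """(row, column) positions of an n x n block in zig-zag order."""
    order = []
    for d in range(2 * n - 1):
        diagonal = [(r, d - r) for r in range(n) if 0 <= d - r < n]
        order.extend(diagonal if d % 2 else diagonal[::-1])
    return order

def vli(value):
    """Variable-length integer, e.g. 3 -> '11', -2 -> '01', -1 -> '0'."""
    size = abs(value).bit_length()
    if value < 0:
        value += (1 << size) - 1
    return format(value, "b").zfill(size) if size else ""

DC_CODES = {2: "011"}                       # Huffman code for "VLI length 2"
AC_CODES = {(0, 0): "1010", (0, 1): "00",   # (zero run, VLI length) pairs
            (1, 2): "11011", (2, 1): "11100"}

def encode_block(block, previous_dc):
    difference = block[0][0] - previous_dc
    bits = DC_CODES[len(vli(difference))] + vli(difference)
    ac = [block[r][c] for r, c in zigzag_order()[1:]]
    while ac and ac[-1] == 0:               # trailing zeros are covered by (0,0)
        ac.pop()
    run = 0
    for value in ac:
        if value == 0:
            run += 1
        else:
            bits += AC_CODES[(run, len(vli(value)))] + vli(value)
            run = 0
    return bits + AC_CODES[(0, 0)]          # "til end" marker

quantized = [[0] * 8 for _ in range(8)]
quantized[0][:3] = [15, 0, -1]
quantized[1][:3] = [-2, -1, 0]
quantized[2][:3] = [-1, -1, 0]
bitstream = encode_block(quantized, previous_dc=12)
print(bitstream, len(bitstream))
```

The zig-zag scan visits the values 0, -2, -1, -1, -1, 0, 0, -1 before the trailing zeros, which yields exactly the pairs (1,2), (0,1), (0,1), (0,1), (2,1), (0,0) of the example.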

JPEG - 4 Modes of Compression

- lossy sequential DCT-based mode (baseline mode)
- expanded lossy DCT-based mode
- lossless mode
- hierarchical mode
JPEG - Extended Lossy DCT-Based Mode

Pixel resolution 8 to 12 bit
Sequential image display: Top to bottom Good for small images and fast processing
Progressive image display: Coarse to fine Good for large and complicated images
JPEG - Extended Lossy DCT-Based Mode

Principle:
- Coefficients stored in buffer after quantization
- Order of pixel/block processing changed
By spectral selection:
- Selection according to importance of DC, AC value
- All DC values of whole image first
- All AC values in order of importance subsequently
By successive approximation:
- Selection according to position of bits
- First the most significant bit of all blocks
- Then the second significant bit of all blocks
- Until the least significant bit of all blocks
JPEG - Lossless Mode

Image preparation:
- On pixel basis (2-16 bit/pixel)
Image processing:
- Selection of a predictor for each pixel

code  prediction
0     no prediction
1     x = A
2     x = B
3     x = C
4     x = A + B - C
5     x = A + ((B - C) / 2)
6     x = B + ((A - C) / 2)
7     x = (A + B) / 2

Neighborhood of the pixel x to be predicted:
c b
a x

Entropy coding:
- Same as lossy mode
- Code of chosen predictor and its difference to the actual value
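The predictor selection can be written as a small lookup. In this sketch A is the left neighbor, B the neighbor above and C the upper-left neighbor of x, selection 4 is taken as A + B - C as in the JPEG standard, and the halving uses integer floor division; the function names are made up for this illustration.

```python
def predict(code, a, b, c):
    """Predicted value for pixel x from its neighbors (a: left, b: above, c: upper left)."""
    predictors = {
        0: 0,                      # no prediction
        1: a,
        2: b,
        3: c,
        4: a + b - c,
        5: a + (b - c) // 2,
        6: b + (a - c) // 2,
        7: (a + b) // 2,
    }
    return predictors[code]

def residual(code, x, a, b, c):
    """Difference between actual and predicted value; this is what gets entropy coded."""
    return x - predict(code, a, b, c)

print([predict(code, a=100, b=104, c=98) for code in range(8)])
print(residual(4, x=107, a=100, b=104, c=98))
```

Since the decoder knows the same neighbors and the chosen code, it can add the residual back and recover x exactly, which is why the mode is lossless.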
JPEG - Hierarchical Mode

Coding of each image with several resolutions:
- Image scaling
- Differential encoding
  - First, coded with lowest resolution: image A
  - Coded with increased horizontal & vertical resolution: image A'
  - Difference between both images is computed: B = A' - A (*)
  - Iteration for higher resolutions
Features:
- Requires more storage and higher data rate
- Fast decoding process
- Used for scalable video
- Similar to Photo-CD (Kodak, proprietary)
(*) note for all scalable approaches: relate the higher-resolution version to the receiver's decoded lower-resolution version of A (to avoid accumulation of quantization errors)
8. H.261 and related ITU Standards

Video codec for audiovisual services at p x 64 kbit/s ("p-times-sixtyfour", where p means "multiples-of"):
- CCITT standard from 1990
- For ISDN
- With p = 1, ..., 30
Technical issues:
- Real-time encoding/decoding
- Max. signal delay of 150 ms
- Constant data rate
- Implementation in hardware (main goal) and software
H.261 - Image Preparation

Fixed source image format
Image components:
- Luminance signal (Y)
- Two color difference signals (Cb, Cr)
- Subsampling according to CCIR 601 (4:1:1)
Quarter Common Intermediate Format (QCIF) resolution: mandatory
- Y: 176 x 144 pixel ("pruning" 180 --> 176; CIF: 360 x 288)
- At 29.97 frames/s appr. 9.115 Mbps (uncompressed)
- but: encoder may leave out up to 3 frames (--> ~8 fps)
Common Intermediate Format (CIF) resolution: optional
- Y: 352 x 288 pixel
- At 29.97 frames/s appr. 36.46 Mbps (uncompressed), i.e. ~ 570 * 64 kbps
Layered structure:
- Block of 8 x 8 pixels
- Macroblock of: 4 Y blocks, 1 Cr block, 1 Cb block
- Group of blocks (GOB) of 3 x 11 macroblocks
- Picture: QCIF picture: 3 GOBs; CIF picture: 12 GOBs
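The uncompressed rates and the GOB counts can be checked as follows. The sketch assumes 8 bit per sample and chrominance planes with half the luminance resolution in both directions (i.e. 12 bit per pixel on average), which is what reproduces the 9.115 and 36.46 Mbps figures.

```python
def raw_bitrate(width, height, frames_per_second=29.97, bits_per_sample=8):
    """Uncompressed bit rate of Y plus two subsampled color difference planes."""
    luminance = width * height
    chrominance = 2 * (width // 2) * (height // 2)   # Cb and Cr, each a quarter of Y
    return (luminance + chrominance) * bits_per_sample * frames_per_second

def macroblocks(width, height):
    """Number of 16x16 macroblocks (4 Y blocks + 1 Cb + 1 Cr block each)."""
    return (width // 16) * (height // 16)

qcif = raw_bitrate(176, 144)
cif = raw_bitrate(352, 288)
print(round(qcif / 1e6, 3), "Mbps for QCIF")
print(round(cif / 1e6, 2), "Mbps for CIF =", round(cif / 64_000), "x 64 kbps")
print(macroblocks(176, 144) // 33, "GOBs in QCIF,", macroblocks(352, 288) // 33, "GOBs in CIF")
```

With p limited to 30, even the maximum ISDN rate is far below the raw CIF rate, which gives a feeling for the compression ratios H.261 has to reach.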
H.261 - Image Compression

Intraframe coding:
- yields "reference frame" f0
- DCT w/ same quantization factor for all AC values
- this factor may be adjusted by loopback filter (see below)
Interframe coding, motion estimation:
[Figure: Frame 1 and Frame 2 with a motion vector between similar macroblocks]
- note about motion vector mv: mv points "backwards" in time (pos. of object in f
- mv is related to the block, not to the moving object
- interframes: f1, f2, f3, ... relative to f0 (differential encoding)
- in H.261: intraframes rare (bandwidth!, main application videophone)
- Search of similar macroblock (16x16) in previous image
- Position of this macroblock defines motion vector
- Search range is up to the implementation: max. 15 pixel
- but: motion vector may also always be 0 ("bad" software encoder)
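Since the search strategy is left to the implementation, the following is only one possible sketch: an exhaustive block search that minimizes the sum of absolute differences (SAD) within +/-15 pixels. The SAD criterion, the function names and the synthetic test frames are assumptions of this example, not part of H.261.

```python
import random

def sad(previous, current, bx, by, dx, dy, n):
    """Sum of absolute differences between a block of current and a displaced block of previous."""
    return sum(abs(current[by + j][bx + i] - previous[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))

def motion_vector(previous, current, bx, by, n=16, search=15):
    """Full search; the vector points from the block to its best match in the previous frame."""
    height, width = len(previous), len(previous[0])
    best_mv, best_cost = (0, 0), sad(previous, current, bx, by, 0, 0, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = 0 <= bx + dx <= width - n and 0 <= by + dy <= height - n
            if inside:
                cost = sad(previous, current, bx, by, dx, dy, n)
                if cost < best_cost:
                    best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost

# Synthetic test: frame2 is frame1 moved 3 pixels right and 2 pixels down.
rng = random.Random(0)
size = 48
frame1 = [[rng.randrange(256) for _ in range(size)] for _ in range(size)]
frame2 = [[frame1[y - 2][x - 3] if y >= 2 and x >= 3 else 0 for x in range(size)]
          for y in range(size)]
mv, cost = motion_vector(frame1, frame2, bx=16, by=16)
print(mv, cost)
```

The result (-3, -2) points backwards in time, from the macroblock in the current frame to where its content was in the previous frame. An encoder that always returns (0, 0) is still valid, it just produces larger differences.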
H.261 - Image Compression

Interframe coding, further steps:
- Results:
  - Difference between similar macroblocks
  - Motion vector
- Difference of macroblocks:
  - DCT if value higher than a specific threshold (hybrid DPCM/DCT!)
  - No further processing if value less than this threshold
- Motion vector:
  - Components are coded, yielding code words of variable length
- Quantization:
  - Linear
  - Adaptation of step size ("loopback filter") => ~ constant data rate
  ("leaky bucket": constant 64 kbps "drop out"; loopback filter: adjust quantization factor if bucket is filled above threshold 1 or below threshold 2, respectively)
Further ITU Video Schemes (H.263, H.3xx)

H.263: extension to H.261
- max. bitrate: H.263 approx. 2.5 x H.261; lowest bitrates suitable f. modem
Source Image Formats (support by encoder and decoder):

Format  Pixels       H.261        H.263
SQCIF   128 x 96     optional     required
QCIF    176 x 144    required     required
CIF     352 x 288    optional     optional
4CIF    704 x 576    not defined  optional
16CIF   1408 x 1152  not defined  optional
H.263
Differences of H.263 compared to H.261:
- mv may point forward in time (future interframe), cf. MPEG, for video
- optional PB-frames (2 combined pictures: 1 B- & 1 P-frame)
- optional overlapped block motion compensation
- optional motion vector pointing outside image
- half pel motion compensation (instead of full pel)
- JPEG is the still picture mode
- no included error detection and correction
- unlimited search space for motion vector --> fast encoder can do better
- ...
H.320, H.32x Family

- H.320: specifies (as overview) videophone for ISDN
- H.310: adapt MPEG-2 for communication over B-ISDN (ATM)
- H.321: define videoconferencing terminal for B-ISDN (instead of N-ISDN)
- H.322: adapt H.320 for guaranteed QoS LANs (like ISO-Ethernet)
- H.323: videoconferencing over non-guaranteed LANs
- H.324: terminal for low bit rate communication (over V.34 modems)
9. MPEG-1
Moving Picture Experts Group (MPEG)
- ISO/IEC working group(s): ISO/IEC JTC1/SC29/WG11
- ISO IS 11172 since 3/93
Starting point: MPEG-1
- Audio/video at about 1.5 Mbit/s
- Based on experiences with JPEG and H.261
Follow-up standards:
- MPEG-2
- MPEG-4
- MPEG-7
- MPEG-21
MPEG - Features
[Figure: parts of MPEG: audio (coding, data stream), video (coding, data stream), system (combined stream, common buffer management)]
Consideration of other standards:
- JPEG
- H.261
Symmetric and asymmetric compression
Constant data rate, should be < 1856 kbit/s
Original target rate ~ 1.2 Mbps including audio (= 1x CD-ROM: 150 kBps)
MPEG - Video: Preparation Step

Fixed image format
Color subsampling:
- Y, Cr, Cb
- 4:2:0
Resolution:
- Should be at most 768 x 576 pixel
- 8 bit/pixel in each layer (i.e., for Y, Cr, Cb)
- 14 pixel aspect ratios
- 8 frame rates
No user-defined MCU like JPEG
No progressive mode like JPEG
MPEG - Video: Processing Step

4 types of frames:
I-frames (intra-coded frames):
- Like JPEG
- Real-time decoding demands
P-frames (predictive coded frames):
- Reference to previous I- or P-frames
- Motion vector
- MPEG does not define how to determine the motion vector
- Difference of similar macroblocks is DCT coded
- DC and AC coefficients are runlength coded
B-frames (bi-directional predictive coded frames):
- Reference to previous and subsequent (I or P) frames
- Interpolation between macroblocks
D-frames (DC-coded frames):
- Only DC-coefficients are DCT coded
- For fast forward and rewind
MPEG - Video Coding

Sequence of I-, P-, and B-frames:
[Figure: frame sequence I B B P B B P I over time t, with arrows showing the references between frames]
Frame types: I-frames (intracoded), P-frames (predictive coded), B-frames (bidirectionally coded), (D-frames (DC coded))
Sequence:
- Defined by application
- E.g., I B B P B B P B B I B B P B B P B B
- Order of transmission is different: I P B B ...
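Why the transmission order differs: a B-frame can only be decoded once both of its reference frames have arrived, so each I- or P-frame is sent before the B-frames that precede it in display order. A minimal sketch of this reordering, with frames represented as hypothetical (display index, type) pairs:

```python
def transmission_order(display_sequence):
    """Move every reference frame (I or P) in front of the B-frames that depend on it."""
    ordered = []
    waiting_b_frames = []
    for index, frame_type in enumerate(display_sequence):
        if frame_type == "B":
            waiting_b_frames.append((index, frame_type))
        else:
            ordered.append((index, frame_type))
            ordered.extend(waiting_b_frames)
            waiting_b_frames = []
    ordered.extend(waiting_b_frames)   # trailing B-frames wait for the next reference
    return ordered

order = transmission_order("IBBPBBPBBI")
print(" ".join(frame_type + str(index) for index, frame_type in order))
```

For the display sequence I B B P B B P B B I this gives I0 P3 B1 B2 P6 B4 B5 I9 B7 B8, i.e. the "I P B B ..." pattern mentioned above.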
MPEG - Video: Implications

Random access:
- at I-frames
- at P-frames: i.e. decode previous I-frame first
- at B-frames: i.e. decode I- and P-frames first
Editing:
- decoded data
  - loss of quality (encode -> decode -> encode -> ...)
  - application of all video editing functions
- encoded data (previous to entropy encoding)
  - preservation of quality
  - transition effects as function in the DCT domain
  - morphing, non-block-conform overlay very difficult
- encoded data
  - preservation of quality
  - today: too complex, if possible, i.e. need for entropy decoding
MPEG - Audio Coding: Fundamentals

[Figure: sound pressure level (dB, 0 to 80) over frequency (0.02 to 20 kHz), showing the absolute threshold of hearing and masking patterns for maskers from fm = 0.25 kHz to 4 kHz]
Masking threshold in the frequency domain:
- narrowband random noise
- depends on frequency
MPEG - Audio Coding: Fundamentals

[Figure: level (dB, 0 to 60) over time (ms) around a masker, with the regions pre-masking, simultaneous masking and post-masking]
Masking in the time domain:
- after and before the event
- depends (to some extent) on amplitude
MPEG - Audio Coding

[Figure: MPEG audio encoder: sub-band coding (32 sub-bands) --> quantization --> entropy coder & frame packing; a psychoacoustical model controls how many bits are reserved for which sub-band]
Yields: heavily asymmetric codecs!!
Audio channel:
- Between 32 and 448 kbit/s
- In steps of 16 kbit/s
Definition of 3 "layers" of quality: "higher layer" means "more complex" & "can handle lower layers"
- Layer 1: max. 448 kbit/s (ca. 1:4 compression, e.g. used as PASC in DCC)
- Layer 2: max. 384 kbit/s (ca. 1:6-8, common, e.g. as MUSICAM in DAB)
- Layer 3: max. 320 kbit/s (ca. 1:10-12, the famous MP3)
MPEG - Audio Coding

Sampling compatible to encoding of CD-DA and DAT:
- Sampling rates: 32 kHz, 44.1 kHz, 48 kHz
- Sampling precision: 16 bit/sample
Audio channels:
- Mono (single, 1 channel)
- Stereo (2 channels)
- dual channel mode (independent, e.g., bilingual)
- optional: joint stereo (exploits redundancy and irrelevancy)
Application example: DAB
- Digital Audio Broadcasting uses MPEG layer 2 (compression also known as MUSICAM = Masking pattern adapted Universal Subband Integrated Coding And Multiplexing)
Delays, for VLSI implementation:
- max. 30 ms encoding
- max. 10 ms decoding
SW codec delays vary for different layers, implementations, computers (rule-of-thumb may be 50/100/150 ms for layer 1/2/3, which makes MP3 rather inappropriate for real-time conversation)
MPEG - Audio and Video Data Streams

Audio Data Stream Layers:
1. Frames
2. Audio access units
3. Slots
Video Data Stream Layers:
1. Video sequence layer
2. Group of pictures layer
3. Single picture layer
4. Slice layer
5. Macroblock layer
6. Block layer
10. MPEG-2 Follow-Up MPEG Standards

MPEG-2:
- Higher data rates for high-quality audio/video
- Multiple layers and profiles
MPEG-3:
- Initially HDTV
- MPEG-2 scaled up to subsume MPEG-3
MPEG-4:
- Initially, lower data rates for e.g. mobile communication
- then: focus on coding & additional functionalities based on image contents
MPEG-7 (EC = "experimental core" status):
- Content description
- Basis for search and retrieval
- See section on databases
MPEG-21 (upcoming):
- Framework for multimedia business, delivery ...
- what's missing? maybe eCommerce focus --> e.g., security, watermarking?
MPEG-2
From MPEG-1 to MPEG-2:
- Improvement in quality: from VCR to TV to HDTV
- No CD-ROM based constraints
- Higher data rates: MPEG-1 about 1.5 Mbit/s; MPEG-2 2-100 Mbit/s
Evolution:
- 1994: International Standard
- Also later known as H.262
- Prominent role for digital TV in DVB (digital video broadcasting)
- Commercial MPEG-2 realizations available
MPEG-2 Video
Inclusion of interlaced video format
Increased resolution, more than CCIR 601
Defined as:
- 5 profiles (simple, main, ...)
- 4 levels (with increasing resolution, ...)
Other additional features:
- DCT coefficients may be coded with a non-linear quantization function
MPEG-2 Video: Scaling

Motivation:
- analog: continuous decrease in quality if errors occur
- digital: need for tolerance whenever errors occur, i.e. scaling
Option: Spatial scaling
- reduction of resolution
- approach:
  - image sampled with half resolution, then MPEG algorithms applied, output protected with better FEC (base layer)
  - image decoded and subtracted from the original; MPEG algorithms applied to the difference, output protected with worse FEC (enhanced layer)
Option: Signal to Noise (SNR) scaling
- noise introduced by quantization errors and visible block structures
- approach:
  - Base layer: DCT output, more significant bits encoded with better FEC
  - Enhanced layer: DCT output, less significant bits encoded with worse FEC
MPEG-2 Video Profiles and Levels

Profiles:
- Simple Profile: no B-frames, 4:2:0, not scalable
- Main Profile: B-frames, 4:2:0, not scalable
- SNR Scalable Profile: B-frames, 4:2:0, SNR scalable
- Spatial Scalable Profile: B-frames, 4:2:0, SNR scalable or spatial scalable
- High Profile: B-frames, 4:2:0 or 4:2:2, SNR scalable or spatial scalable

Maximum data rate per level and profile:

Level (pixels/line x lines)     Simple     Main       SNR Scalable  Spatial Scalable  High
High Level (1920 x 1152)        -          80 Mbit/s  -             -                 100 Mbit/s
High-1440 Level (1440 x 1152)   -          60 Mbit/s  -             60 Mbit/s         80 Mbit/s
Main Level (720 x 576)          15 Mbit/s  15 Mbit/s  15 Mbit/s     -                 20 Mbit/s
Low Level (352 x 288)           -          4 Mbit/s   4 Mbit/s      -                 -
MPEG-2 Audio
Two modest extensions to MPEG-1 audio:

1) "Low sample rate extension" (LSE):
   - half of all MPEG-1 sampling rates: 16, 22.05, 24 kHz
   - quantization down to 8 bits/sample
2) "Multichannel extension":
   - more channels, i.e. up to 5 full-bandwidth channels (surround system):
     left and right front, center (in front), left and right back
   - "matrixing": rule for backward-compatible conversion to stereo (x = y = 0.71):
       Left for stereo  = Left_f  + x * Center + y * Left_b
       Right for stereo = Right_f + x * Center + y * Right_b
   - option: +1 "low frequency extension" (LFE) channel for subwoofer
   - "multilingual extension": 7 more, i.e. up to 12 channels (multiple languages, commentary)

Compatibility with MPEG-1:
- all MPEG-1 audio formats can be processed by MPEG-2
- only 3 MPEG-2 audio codecs do not provide backward compatibility
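The matrixing rule can be written as a small per-sample function. This is a sketch in Python; the function and parameter names are illustrative assumptions, and only the two formulas and the factor 0.71 come from the slide.

```python
def downmix_to_stereo(left_f, right_f, center, left_b, right_b, x=0.71, y=0.71):
    """Fold one sample of a five-channel signal into two stereo samples
    using the matrixing rule with weights x (center) and y (back)."""
    left = left_f + x * center + y * left_b
    right = right_f + x * center + y * right_b
    return left, right


# One sample: signal only in left front, center and right back.
left, right = downmix_to_stereo(1.0, 0.0, 1.0, 0.0, 1.0)
```

A practical decoder would also have to guard against clipping, since the sum of three weighted channels can exceed the range of a single channel.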
MPEG-2 System
Steps:
1. Audio and video are packetized into Packetized Elementary Streams (PES)
2. PES(es) are combined into a Program Stream or a Transport Stream

Program Stream:
- error-free environment
- packets of variable length
- one single stream with one timing reference

Transport Stream:
- designed for noisy (lossy) media channels
- multiplex of various programs with one or more time bases
- packets of 188 bytes length

Conversion between Program and Transport Streams is possible.
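To make the fixed 188-byte packet size concrete, the following Python sketch cuts a PES byte string into Transport Stream packets with a 4-byte header. The function name and the 0xFF padding of the last packet are simplifying assumptions; a conforming multiplexer would use adaptation-field stuffing instead and would also carry program tables and timing information.

```python
TS_PACKET_SIZE = 188  # packet length stated on the slide
HEADER_SIZE = 4
SYNC_BYTE = 0x47      # first byte of every Transport Stream packet


def packetize(pes: bytes, pid: int) -> list[bytes]:
    """Cut one PES byte string into fixed-size packets (simplified)."""
    payload_size = TS_PACKET_SIZE - HEADER_SIZE
    packets = []
    for counter, start in enumerate(range(0, len(pes), payload_size)):
        chunk = pes[start:start + payload_size]
        unit_start = 1 if start == 0 else 0  # marks the packet where the PES begins
        header = bytes([
            SYNC_BYTE,
            (unit_start << 6) | ((pid >> 8) & 0x1F),  # flag + upper 5 bits of the 13-bit PID
            pid & 0xFF,                               # lower 8 bits of the PID
            0x10 | (counter & 0x0F),                  # payload only + 4-bit continuity counter
        ])
        # Assumption for brevity: pad the last chunk with 0xFF bytes.
        packets.append(header + chunk.ljust(payload_size, b"\xff"))
    return packets


packets = packetize(bytes(400), pid=0x100)  # 400 payload bytes -> 3 packets
```

The fixed length and the recurring sync byte are what allow a receiver on a lossy channel to find packet boundaries again after an error.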
11. MPEG-4 Goals

MPEG-4 (ISO 14496), originally:
- Targeted at systems with very scarce resources
- To support applications like:
  - mobile communication
  - videophone and e-mail
- Max. data rates and dimensions (roughly):
  - between 4800 and 64000 bit/s
  - 176 columns x 144 lines x 10 frames/s

Largely covered by H.263, therefore re-orientation:
- Goal: provide enhanced functionality to allow for analysis and manipulation of image contents

MPEG-4: Schedule for Standardization
- 1993: Work started
- 1997: Committee Draft
- 1998: Final Committee Draft
- 1998: Draft International Standard
- 1999-2000: International Standard
MPEG-4: Goals (cont.)

1: Support composite multimedia, i.e. find standardized ways to:
- Represent units of aural, visual or audiovisual content: "audio/visual objects" or AVOs
  - object coding independent of other objects, surroundings and background
  [Figure: example scene with video objects 1, 2, 3 and audio objects 1 and 2]
- Compose these objects together, i.e. create compound objects that form audiovisual scenes
- Multiplex and synchronize the data associated with AVOs for transport over network channels providing QoS (Quality of Service)

2: Support synthetic objects:
- computer-generated (VR), synthesized (text-to-speech), model-based ("face")

3: Support truly interactive applications (more than play/pause/rewind, ...):
- interact with the audiovisual scene generated at the decoder's site
MPEG-4: Scope
Definition of:
- System Decoder Model
  - specification for decoder implementations
- Description language
  - binary syntax of an AVO's bitstream representation
  - scene description information
- Corresponding concepts, tools and algorithms, especially for:
  - content-based compression of simple and compound audiovisual objects
  - manipulation of objects
  - transmission of objects
  - random access to objects
  - animation
  - scaling
  - error robustness
MPEG-4: Scope (cont.)

Targeted bit rates for video and audio:
- VLBV core (Very Low Bit-rate Video):
  - 5-64 kbit/s
  - image sequences with up to CIF resolution and up to 15 frames/s
- Higher-quality video:
  - 64 kbit/s - 4 Mbit/s
  - quality like digital TV
- Natural audio coding:
  - 2-64 kbit/s
MPEG-4: Video and Image Encoding

Encoding / decoding of:
- Rectangular images and video
  - coding similar to MPEG-1/2
  - motion prediction
  - texture coding
- Images and video of arbitrary shape
  - as done in the conventional approach: 8x8 DCT or shape-adaptive DCT
  - plus coding of shape and transparency information

Encoder:
- Must generate timing information
  - speed of the encoder clock = time base
  - desired decoding times and/or expiration times, by using time stamps attached to the stream
- Can specify the minimum buffer resources needed for decoding
MPEG-4: Composition of Scenes

Scene description includes:
- Tree to define hierarchical relationships between objects
  [Figure: scene tree with compound objects as inner nodes and primitive AVOs as leaves]
- Objects' positions in space and time
  - by converting each object's local coordinate system into a global coordinate system
- Attribute value selection
  - e.g. pitch of sound, color, texture, animation parameters
- Description based on some VRML concepts
  - VRML = Virtual Reality Modelling Language
- Interaction with scenes
  - e.g. change viewing point, drag object, start/stop streams, select language
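The scene tree and the conversion from local to global coordinates can be sketched in a few lines of Python. The class, the field names and the restriction to 2D translation are illustrative assumptions; the actual MPEG-4 scene description also covers rotation, scaling and timing.

```python
from dataclasses import dataclass, field


@dataclass
class SceneNode:
    """A primitive AVO (no children) or a compound object (with children)."""
    name: str
    offset: tuple[float, float] = (0.0, 0.0)  # position in the parent's coordinates
    children: list["SceneNode"] = field(default_factory=list)


def global_positions(node, origin=(0.0, 0.0)):
    """Walk the tree and convert every local offset into global coordinates."""
    x = origin[0] + node.offset[0]
    y = origin[1] + node.offset[1]
    positions = {node.name: (x, y)}
    for child in node.children:
        positions.update(global_positions(child, (x, y)))
    return positions


scene = SceneNode("scene", children=[
    SceneNode("person", (10.0, 5.0), [
        SceneNode("voice"),
        SceneNode("sprite", (1.0, 2.0)),
    ]),
    SceneNode("background"),
])
positions = global_positions(scene)  # "sprite" ends up at (11.0, 7.0)
```

Moving the compound object "person" moves all of its children with it, which is the point of describing positions relative to the parent.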
MPEG-4: Example of a Composition
MPEG-4: Scaling
Three approaches:
- Spatial scalability
  - decoder displays textures and visual objects at a reduced spatial resolution by decoding only a subset of the total bit stream
  - 32 levels max. for textures and still images
  - 3 levels max. for video sequences
- Temporal scalability
  - decoder displays video at a reduced temporal resolution by decoding only a subset of the total bit stream
  - 3 levels max.
- Quality scalability
  - bit stream is parsed into a number of bit stream layers of different bit rates, either during transmission or in the decoder
  - a subset of the layers still yields a meaningful signal

Spatial and temporal scaling are available both for:
- conventional rectangular display
- objects with arbitrary shape
MPEG-4: Synthetic Objects

Visual objects:
- Human face
  - start object: neutral-expression face
  - animated via FDPs and/or FAPs
    - FAP (facial animation parameter): animates the current display
    - FDP (facial definition parameter): alternative shape/texture
- Mesh + texture mapping: for 2D & 3D meshes
  - 2D mesh may also be used for human face animation, see above
  - only triangular 2D meshes; vertices may be moved (motion vectors), texture is warped
  - e.g. virtual background
- Texture coding for view-dependent applications
  - texture, e.g. virtual background; decoder/encoder loop for "minimal" transmission

Audio objects:
- Text-to-speech
  - speech generation from given text and prosodic parameters
  - face animation control
- Score-driven synthesis
  - music generation from a score
  - more general than MIDI
- Special effects
MPEG-4: Layered Networking Architecture

Layers, from top to bottom (data exchanged between layers in brackets):

Display / Recording
Media CoDecs: coding / decoding, e.g. of video or audio frames or scene description commands
  [Access Units]
Adaptation Layer: A/V object data + stream type info, sync. info, QoS requirements, ...
  [Elementary Streams]
FlexMux Layer: flexible multiplexing, e.g. of multiple elementary streams with similar QoS requirements
  [Multiplexed Streams]
TransMux Layer: transport multiplexing
  - only the interface is specified
  - the layer itself can be any network, e.g. RTP/UDP/IP, AAL5/ATM
Network or Local Storage
MPEG-4: Layered Networking Architecture (cont.)

DMIF: Delivery Multimedia Integration Framework
- Allows to establish multi-party sessions
- Interaction with:
  - remote interactive peers
  - broadcast systems
  - storage systems
- Establishment of channels with specific QoS and bandwidth
- Controls:
  - FlexMux layer
  - TransMux layer
MPEG-4: Error Handling

Mobile communication:
- low bit rate (below 64 kbit/s)
- error-prone

MPEG-4 concepts for error handling:
- Resynchronization
  - enables the receiver to tune in again
  - based on markers within the bitstream
- Data recovery
  - enables the receiver to reconstruct lost data
  - data is encoded in an error-resilient manner
- Error concealment
  - enables the receiver to bridge gaps in the data
  - e.g. by repeating parts of old frames
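Error concealment by repeating parts of old frames can be sketched as follows in Python. The representation of a frame as a flat list of blocks, with None marking a lost block, is an assumption made purely for illustration.

```python
def conceal(previous_frame, received_frame):
    """Replace every lost block (None) with the block at the same
    position in the previously decoded frame."""
    return [prev if cur is None else cur
            for prev, cur in zip(previous_frame, received_frame)]


old = ["a0", "b0", "c0", "d0"]
new = ["a1", None, "c1", None]  # blocks 1 and 3 were lost in transmission
shown = conceal(old, new)       # ["a1", "b0", "c1", "d0"]
```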
12. Wavelets Motivation

JPEG / DCT problems:
- DCT is not applicable to the whole image, but only to small blocks
  - block structure becomes visible at high compression ratios
- Scaling as an add-on
  - additional effort
- DCT function is fixed
  - cannot be adapted to the source data

Improvements by using wavelets:
- Transformation of the whole image
  - overcomes visible block structures and introduces inherent scaling
- Better identification of which data is relevant to human perception
  - higher compression ratio
Wavelets: Compression / Decompression

Compressor:   Forward Wavelet Transformation -> Quantizer -> Encoder
Decompressor: Decoder -> DeQuantizer -> Inverse Wavelet Transformation

The same overall structure as for DCT-based algorithms.
But: important differences in the transformation step.
Wavelets: Fundamental Idea

Image is transformed into the frequency domain (as in JPEG).
But: based on wavelet functions instead of cosine functions.
[Figure: a cosine function compared with an example wavelet function]

Advantage: wavelets are 0 outside a limited interval
- a wavelet automatically relates only to a part of the image
- the image need not be split into blocks

"Frequencies"? Use a wavelet family:
$\{\, 2^{-j/2}\, \psi(2^{-j} x - k) \,\}, \quad j, k \in \mathbb{Z}$, with $\psi$ being a wavelet.
Wavelets: Transformation Steps

"Discrete Wavelet Transformation" (Mallat, 1989)
Split the image recursively by using high-pass and low-pass filters:
- L: low pass + downsampling
- H: high pass + downsampling
- the image is read line by line (horizontal operation) and column by column (vertical operation)
[Figure: filter tree; the low-pass branch yields the transformed image c1 with reduced size (lower frequencies), the other branches yield d11, d12, d13 (higher frequencies)]
Wavelets: Transformation Steps (cont.)

In each step i:
- Three images dxi (x = 1, 2, 3):
  - contain the high-frequency parts of the image
  - represent "details" of the image
  - submitted to wavelet transformation, or thrown away in case of scaling
- One image ci:
  - contains the lower-frequency parts of the image
  - represents the original image with less detail / at a lower resolution
  - submitted to step i+1

Up to here: 4 images with 1/4 resolution each --> no compression!
- but again, decorrelation: many coefficients in the d-images are (close to) zero

Afterwards, as with DCT:
- Quantization
- Entropy encoding
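One transformation step can be tried out with the simplest wavelet, the Haar wavelet. The Python sketch below uses plain averages and differences as low-pass and high-pass filters (an assumption; the slides do not fix a particular wavelet), requires even image dimensions, and its assignment of the three detail images to d1, d2, d3 is arbitrary.

```python
def haar_rows(img):
    """Filter every row: pairwise averages (low pass) and pairwise
    differences (high pass), each followed by downsampling by 2."""
    low = [[(r[i] + r[i + 1]) / 2 for i in range(0, len(r), 2)] for r in img]
    high = [[(r[i] - r[i + 1]) / 2 for i in range(0, len(r), 2)] for r in img]
    return low, high


def transpose(m):
    return [list(col) for col in zip(*m)]


def haar_step(img):
    """One 2D step: returns the coarse image c and three detail images."""
    low, high = haar_rows(img)  # horizontal operation on the lines
    # vertical operation: filter the columns of both intermediate images
    c, d1 = (transpose(m) for m in haar_rows(transpose(low)))
    d2, d3 = (transpose(m) for m in haar_rows(transpose(high)))
    return c, (d1, d2, d3)


img = [[10, 10, 20, 20],
       [10, 10, 20, 20],
       [30, 30, 40, 40],
       [30, 30, 40, 40]]
c, details = haar_step(img)
```

For this piecewise-constant image, c is the 2x2 image [[10, 20], [30, 40]] and all detail coefficients are zero, which illustrates the decorrelation that makes the subsequent quantization and entropy coding effective.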
Wavelets: DWT compared with DCT

Advantages of DWT over DCT:
- No block artefacts
- Inherent scaling
  - based on the dxi for i = 1, 2, 3, ...
- Lower time complexity for the transformation
  - DCT: O(n log n), DWT: O(n) (n = number of values to be transformed)
- Higher flexibility: the wavelet function can be freely chosen
  - (but: how to choose?)
Wavelets: Further Issues

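Edge detection reduces high frequencies:
- First extract the detected edges
- Then apply wavelets to the image filtered in this way

Application to video:
- Compute the differences between successive images: ..., I(n-1) - I(n-2), I(n) - I(n-1)
- Feed the difference images into the wavelet compressor
[Figure: images I(n-2), I(n-1), I(n) -> compute differences -> wavelet compressor]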
13. Fractal Image Compression

Fractal geometry was first applied to image generation:
- remember "Mandelbrot" images
- recursive construction of images
- infinite granularity (i.e. zoom-in), but compact "image data" (formula); such forms are called fractals
- Z_i = RealConst * Z_{i-1} + ComplexConst
Use of Fractals for Compression??? Overview (1)

Observation: self-similarities in natural images
- (clouds, dunes, beaches: zoom-in reveals similar forms as the large image)

Idea: can natural images be described with fractal geometry?
- first published by Barnsley & Sloan (1988), first implementation 1989 by Arnaud Jacquin

Key #1: Iterated Function Systems (IFS)
- an input (sub-)picture is subject to a mathematical transformation of type
  $\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} e \\ f \end{pmatrix}$
- picture is moved, rotated / mirrored, and contracted --> all transformations are "contractions"

Key #2: Banach's Fixed Point Theorem
- apply a set Wimg = {Wi} of contractions to an image
- after infinitely many applications, a specific image appears, called "attractor" or "fractal"
- this process is independent of the initial "start" image!
- human perception: iteration can stop "pretty soon" (finite number of iterations)
- Q: how to find Wimg such that the attractor is the image to be compressed?
Use of Fractals for Compression??? Overview (2)

Key #3: Collage Theorem
- in order to find Wimg as above: search for Wimg such that the image is (almost) transformed into itself!

First algorithm published (Jacquin):
- partition the image into (small, non-overlapping) "range blocks"
- search (larger, overlapping) "domain blocks" which can be "contracted" into range blocks
- for each range block, find a domain block and a contraction (lots of possibilities!)
- for details / simplifications of Jacquin's approach see below
To apply self-similarity: Image Generation

Examples (from TUD + Univ. Bochum) for the recursive construction of images:
- Sierpinski triangle, to produce self-similar structures
- infinitely many steps applied to different source images lead to the same result
  - known as the Sierpinski triangle
  - this limit is also known as the attractor
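That different source images lead to the same attractor can be checked numerically. The Python sketch below uses the usual three contractions of the Sierpinski triangle (scale by 1/2 towards each corner); the corner coordinates, the two start points and the number of steps are arbitrary choices made for illustration.

```python
CORNERS = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]


def sierpinski_step(points):
    """Apply all three contractions to every point of the current image."""
    return [((x + cx) / 2, (y + cy) / 2)
            for cx, cy in CORNERS
            for x, y in points]


def iterate(start, steps):
    """Start from a one-point 'image' and apply the IFS repeatedly."""
    points = [start]
    for _ in range(steps):
        points = sierpinski_step(points)
    return points


a = iterate((0.0, 0.0), 8)
b = iterate((5.0, -3.0), 8)  # a very different start image
# Largest distance (sum of coordinate differences) between corresponding points.
gap = max(abs(xa - xb) + abs(ya - yb) for (xa, ya), (xb, yb) in zip(a, b))
```

The start points are 8 apart in this metric, and every step halves the distance, so after 8 steps the two point sets differ by only 8/2^8 = 0.03125: both converge to the same triangle.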
To Find Self-Similarities
An affine function allows for:
- translation
- rotation
- scaling
- brightness (/color) adaptation

IFS: Iterated Function System
- ideally completely self-similar
[Figure: example of a completely self-similar image]

PIFS: Partitioned Iterated Function System
- real images are not completely self-similar
- Wimg?
Theoretical Basis
Banach's Fixed Point Theorem:
- Let $F$ be a (complete) metric space
- Let $W: F \to F$ be a contractive mapping,
  i.e. there exists an $s$, $0 < s < 1$, with $|W(x) - W(y)| \le s\,|x - y|$ for all $x, y \in F$
- Then $W$ has exactly one fixed point $x_f$, i.e. $W(x_f) = x_f$
- $x_f$ can be computed as $x_f = \lim_{n \to \infty} W^n(x)$ with any $x \in F$

Application to image compression:
- Let img be the image to be compressed
- Regard the set of all possible images as a metric space
  - metric e.g.: maximum difference between the pixels of two pictures
- Goal: construct Wimg such that img is the fixed point of Wimg
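The theorem can be tried out on the real numbers before applying it to images. In the Python sketch below, the mapping W(x) = 0.5x + 1 is an arbitrary example of a contraction with s = 0.5 and fixed point 2; the function name, tolerance and iteration limit are likewise assumptions.

```python
def fixed_point(w, x, tol=1e-12, max_iter=1000):
    """Iterate x, W(x), W(W(x)), ... until successive values differ by
    less than tol (or max_iter applications have been made)."""
    for _ in range(max_iter):
        nxt = w(x)
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    return x


def w(x):
    return 0.5 * x + 1.0  # contraction: |W(x) - W(y)| = 0.5 * |x - y|


from_zero = fixed_point(w, 0.0)
from_far = fixed_point(w, 1000.0)  # both are approximately 2.0
```

The start value only influences how many iterations are needed, not the result, which is why a fractal decoder may start from any image.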
Fractal Image Compression and Decompression

Compression:
- Find an appropriate Wimg
- difficult

Decompression:
- Apply Wimg iteratively to any image
- easy
- Stop when the error falls below some bound
- Error can be calculated by the "Collage Theorem"
How to Find Wimg? Jacquin's Approach

Systematic search based on a "Partitioned Iterated Function System" (PIFS):
- Partition the image into "range blocks" Ri
  - 8*8 pixel blocks
  - non-overlapping
- Consider all "domain blocks" Dj of double size
  - 16*16 pixel blocks
  - overlapping
- Find for each Ri the most similar Dj
  - consider rotations (0°/90°/180°/270°) and mirroring
  - adapt brightness and contrast of Dj to that of Ri
  - translation, rotation, mirroring, brightness/contrast adaptation define a (partial) affine function
- Combine the partial functions to Wimg

Compression rate? Example: for each (8*8) range block:
- contraction factor fixed
- 3 bit for the transformation
- 16 bit for the domain block coordinates
- 12 bit for the brightness/contrast adaptation
--> the factor is (8 x 8 pixels x 8 bit) : (3 + 16 + 12 bit) = 512 : 31, i.e. about 16.5 : 1 (cf. JPEG example)
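The core of the search, matching one range block against candidate domain blocks, can be sketched in Python as follows. The sketch is simplified: rotations and mirroring are left out, the domain block is contracted by averaging 2x2 pixel groups, and contrast s and brightness o are fitted by least squares. All function names and these choices are assumptions for illustration, not Jacquin's published algorithm in detail.

```python
def shrink(domain):
    """Contract a 2n x 2n domain block to n x n by averaging 2x2 groups."""
    n = len(domain) // 2
    return [[(domain[2 * r][2 * c] + domain[2 * r][2 * c + 1]
              + domain[2 * r + 1][2 * c] + domain[2 * r + 1][2 * c + 1]) / 4
             for c in range(n)] for r in range(n)]


def fit(domain_small, range_block):
    """Least-squares contrast s and brightness o with s * D + o ~ R,
    plus the remaining squared error."""
    d = [v for row in domain_small for v in row]
    r = [v for row in range_block for v in row]
    mean_d, mean_r = sum(d) / len(d), sum(r) / len(r)
    var_d = sum((v - mean_d) ** 2 for v in d)
    cov = sum((dv - mean_d) * (rv - mean_r) for dv, rv in zip(d, r))
    s = cov / var_d if var_d else 0.0
    o = mean_r - s * mean_d
    err = sum((s * dv + o - rv) ** 2 for dv, rv in zip(d, r))
    return s, o, err


def best_domain(range_block, domains):
    """Return (index, s, o, err) of the best-matching domain block."""
    fits = [(i, *fit(shrink(dom), range_block)) for i, dom in enumerate(domains)]
    return min(fits, key=lambda t: t[3])


flat = [[7] * 4 for _ in range(4)]                       # featureless candidate
ramp = [[4 * r + c for c in range(4)] for r in range(4)]  # structured candidate
target = [[0.5 * v + 10 for v in row] for row in shrink(ramp)]
index, s, o, err = best_domain(target, [flat, ramp])
```

Here the 2x2 range block target is found to be a darker, lower-contrast copy of the contracted block ramp (index 1, s = 0.5, o = 10, error 0). Per range block only the index and the two parameters need to be stored, which is where the 31 bits in the example above come from.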
Further Improvements
Quadtree partitioning:
- Problem: fixed 8*8 blocks do not reflect image properties
- Solution: flexible partition of the image into larger or smaller squares, driven by the image structure

Partitioning into rectangles and triangles
Advantages & Drawbacks

+ High quality at high compression rates
    at least for images with self-similarities
    here: better than JPEG ("cross-over point" at about 1:10 to 1:30)
+ Zooming into the image supported
    detailed view possible, interpolation instead of "pixelization"
+ Scalability
    decompression steps yield an iteratively improving image
- Long compression times
    asymmetric mechanisms
    improving search techniques for range & domain block pairs
- Blockwise artifacts, with information losses
    Wimg is only approximative
- Not well applicable to images of non-fractal nature
    e.g. texts, sharp lines & no quality guarantee possible
- Lower quality than JPEG at low compression rates
- Error propagation
14. Basic Audio and Speech Coding Schemes

Voice encoder/decoder: "vocoder"

Background: ITU-driven activities
- G.711: PCM with 64 kbps
- G.722: differential PCM (DPCM), 48, 56, 64 kbps
- G.723:
  - Multipulse Maximum Likelihood Quantizer (MP-MLQ): 6.3 kbps
  - Algebraic Codebook Excitation Linear Prediction (ACELP): 5.3 kbps
  - application: speech
Schemes for Speech Coding

G.728: Low Delay Code Excited Linear Prediction (LD-CELP)
- used in audio/video conferencing
- 16 kbps
- one-way end-to-end delay less than 2 ms (due to the codec algorithm)
- complex algorithm:
  - 16-18 MIPS in floating point required
  - approx. 40 MIPS for the whole encoding and decoding

AV.253
- still under consideration at ITU
- 32 kbps

IS-54 VSELP
- good for voice, bad for music
- 13 kbps (approx. 8 kbps voice + 5.05 kbps forward error correction, FEC)
- driving force: Motorola (similar developments in Japan)
Speech Coding in Mobile Telephone Networks

RPE-LTP (GSM): Regular Pulse Excitation - Long-Term Predictor
- used in European GSM: speech
- 13 kbps

GSM Half-Rate Coders
- 5.6 - 6.25 kbps
- quality and characteristics similar to RPE-LTP
Vocoder: e.g. Inmarsat IMBE Coder

Improved Multiband Excitation Coder (IMBE)
- application: maritime satellite communications
- 4.15 kbps for voice (plus 2.25 kbps for channel coding)
- Principle: vocoder (in IMBE, voiced and unvoiced individually for each frequency band)

[Figure: vocoder principle. Analysis: the speech input is split into frequency bands (200-300 Hz, 300-450 Hz, ..., 2800-3400 Hz), each followed by DC + lowpass, together with a pitch analysis; the result is the encoded speech. Synthesis: a switch between pulse generator and noise generator (replicated for each frequency band) drives one modulator per band (200-300 Hz, 300-450 Hz, ..., 2800-3400 Hz).]
15. Conclusion
JPEG:
- Very general format with high compression ratio
- SW and HW for baseline mode available

H.261 / H.263:
- Established standard by the telecom world
- Preferable hardware realization

MPEG family of standards:
- Video and audio compression for different data rates
- Asymmetric (focus) and symmetric

Proprietary systems, e.g. QuickTime:
- Product
- Migration to the use of standards

Next steps: wavelets, fractals, models of objects
