
A/V COMPRESSION TECHNIQUES

& COMPRESSION STANDARDS


Journée TELECOMS & MULTIMEDIA. 14 May 2010 – ENSA TANGER

Pr. Zouhair GUENNOUN


Ecole Mohammadia d’ingénieurs – EMI
Laboratoire d’Electronique et Communications – LEC
zouhair@emi.ac.ma
Communication Model

[Block diagram: Information Source → Data reduction → Channel → Receiver;
the transmitted information is corrupted by a Noise source in the channel before it reaches the receiver as received information]

2
Audio/Video Coding Applications

3
Detailed Communication Model

[Block diagram:
Transmitter: Information Source → Data Reduction → Source Coding → Encrypt → Channel Coding → Channel (Noise)
Receiver: Channel Decoding → Decrypt → Source Decoding → Data Reconstruction → Information Destination]

4
Agenda
• Introduction

• Audio & Video compression principles

• A/V Compression standards

• Conclusion

5
Agenda
• Introduction
– Why compressing?
– Audio & Video basics
– MPEGx, & H.26x Compression Standards Overview

• Audio & Video compression principles

• A/V Compression standards

• Conclusion

6
Why compressing?

7
The need for compression
• Audio: Compression needed in spectral domain

• Bit rate of a stereo audio source (CD-DA encoding)


– Sampling frequency : 44.1kHz
– Stereo: 16 bits per sample

[Figure: audio waveform, amplitude versus time]

– Bit rate = 44100 * 2 * 16 = 1.41Mbits/sec

8
Digital Audio

Type                                     Sampling Freq. (kHz)   Bits/Sample   Channels   Bit Rate (Mbps)
Telephone signal (G.711, ISDN)           8                      8             1          0.064
CD-DA (Compact Disc – Digital Audio,     44.1                   16            2          1.411
  i.e. CD-ROM 1x)
DAT (Digital Audio Tape)                 48                     16            2          1.536

9
The need for compression
• Video: Compression needed in spatial domain

• Bit rate of a video source (CCIR 601 - 50Hz countries)


[Figure: video image, 720 samples × 576 lines]

– 25 images per second
– YUV colour coding (Y: luminance – U,V: chrominance)
  • Y: 8 bits per pixel
  • U,V: 1 pixel out of 2 coded, 8 bits per pixel

Bit rate = (576 × 720) × 25 × 16 = 166 Mbit/s

10
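The two uncompressed bit-rate figures above (1.41 Mbit/s for CD-DA audio, 166 Mbit/s for CCIR 601 video) follow directly from the sampling parameters. A minimal Python sketch of the arithmetic (function and parameter names are illustrative, not from the slides):

```python
def audio_bitrate(fs_hz, bits_per_sample, channels):
    """Uncompressed PCM audio bit rate in bits per second."""
    return fs_hz * bits_per_sample * channels

def video_bitrate(width, height, fps, bits_per_pixel):
    """Uncompressed digital video bit rate in bits per second."""
    return width * height * fps * bits_per_pixel

# CD-DA: 44.1 kHz, 16 bits, stereo  ->  ~1.41 Mbit/s
print(audio_bitrate(44_100, 16, 2) / 1e6)        # 1.4112

# CCIR 601 (50 Hz countries): 720x576, 25 fps, 4:2:2 -> 16 bits/pixel -> ~166 Mbit/s
print(video_bitrate(720, 576, 25, 16) / 1e6)     # 165.888
```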
Bit Rate versus Spatial Resolution

[Figure: uncompressed bit rate (in Mbps) per picture format]

SQCIF (128×96):          3.69
QCIF (176×144):          7.52
CIF (352×288):           30.41
4CIF (704×576):          162.20
16CIF 4:3 (1408×1152):   648.81
16CIF 16:9 (1920×1152):  1061.68

12
The need for compression
• Channels available for A/V transmission
– Analog television channel (compatibility)
• Cable (bandwidth = 8MHz)
• Satellite (Bandwidth = 30-40MHz)
Capacity around 40Mbits/sec

– Compact Disc (CD – 650MB)


For 74 min. play time : 1.41Mbits/sec

– Digital Versatile Disc (DVD – 4.7GB)


For 135 min. play time : 4.6Mbits/sec

13
Illustrative example

• PSTN modem - maximum bit rate: 56kbps

• Video frame sequence -


– Resolution: 288x352 (CIF format)
– RGB colors: 8x3 bits per pixel
– Frame rate transmission: 30 frames per second

• Required bit rate: 288x352x8x3x30 = 72.99Mbps

• Ratio between the required bit rate and the largest possible bit
rate: 72.99 Mbps / 56 kbps ≈ 1300
– To accomplish the transmission over the PSTN, the data must be compressed by
a factor of at least about 1300.

15
The need for compression

• MPEG-1 target (Video-CD : 74 min. constraints)

Video (166 Mbit/s) + Audio (1.4 Mbit/s)  →  Compression  →  1.4 Mbit/s

But quality was judged too poor (about VHS quality)

16
The need for compression
• MPEG-2 target
– Program stream (DVD)

1 program (video, multichannel audio, ...) → Compression → 3-9 Mbit/s (variable bit rate),
but higher quality than MPEG-1
= motivation for the capacity increase of the CD (→ DVD)

– Transport stream (DVB)

n programs (video, multichannel audio, ...) → Compression → about 40 Mbit/s (constant bit rate)
(DVB-Satellite & DVB-Cable)
17
The need for compression
• Compression extends the playing time of a given storage
device.

• Compression allows a reduction in bandwidth

• For the same bandwidth, compression allows faster


transmission, and better quality.

• Compression removes redundancy from signals.


– Redundancy is however essential to making data more resistant to
errors.
– Compressed data are more sensitive to errors than uncompressed
data.

18
Principles of Compression
• Compression (or Source Coding) is achieved by
suppressing information:
– redundant information
– irrelevant information

• Suppression of redundant information


lossless compression
Fc(x,y,t) at Rc (bps)  →  Compression  →  Ri < Rc  →  Decompression  →  Fp(x,y,t) = Fc(x,y,t) at Rp = Rc

The original signal and the one obtained after


encoding and decoding are identical

19
Principles of Compression
• Suppression of irrelevant information
lossy compression (Perceptive Coding)
Example: bandwidth limitation, masking in audio

Fc(x,y,t) at Rc (bps)  →  Compression  →  Ri < Rc  →  Decompression  →  Fp(x,y,t) ≠ Fc(x,y,t) at Rp = Rc

The original signal and the one obtained after encoding and decoding
are different but are perceived as identical

20
Principles of Compression
• Lossless vs. lossy data compression
  – Source entropy H(X)
  – Rate-Distortion function R(D) or D(R)

• Probabilistic modeling is at the heart of data compression
  – What is P(X) for video source X?
  – Is video coding more difficult than image coding?

[Figure: rate versus distortion. Lossless methods operate at zero distortion, at rates between the source entropy H(S) and L0; lossy methods follow the D(R) curve, reaching zero rate at Dmax]

22
Principles of Compression
• Reversible (lossless): data files (i.e.: V.42bis standard in
modems, zip files)

• Non-reversible (lossy): audio & video signals

• More compression usually means lower quality and higher CPU consumption.

  – Compression algorithms also differ in their computational complexity; for the same bit rate,
    more complex techniques generally achieve better quality at the expense of more CPU.

  – Compression algorithms designed for telephony must introduce very little delay; otherwise
    interactivity is lost, and echoes and poor sound quality result.

23
Principles of Compression

For a Gaussian source N(0, σ²), the rate–distortion function is D(R) = σ² · 2^(−2R).

[Figure: bit rate versus distortion curves for a complex scene and a simple scene; a horizontal line marks constant bit rate, a vertical line marks constant quality]

• A more complex scene → higher bit rate for the same quality
• CBR → variable quality (example: Video CD artefacts)
• Constant quality → VBR necessary (e.g.: DVD-Video)

24
Principles of Compression
• Constant Bit Rate systems –
CBR (G.711, G.722, G.729) are better suited for
connection-oriented services.

• Variable Bit Rate systems –


VBR (MPEG, G.723.1) are best suited to networks
without constant bit rate reserve.

– MPEG compression is the most efficient and gives the best quality, but it consumes much
  CPU and introduces so much delay that it cannot be used in interactive applications
  (video conferencing or telephony).

26
Principles of Compression

• Video codec key issues:


– Compression efficiency and image quality
– Computational complexity
– Frame rate

Encoder Channel Decoder

28
Principles of Compression

• General-purpose compression: Entropy encoding


– Remove statistical redundancy from data
– E.g. encode common values with short codes, uncommon
values with longer codes

– Good for text files, poor for images/video

Source Data → Entropy Encoder → Channel → Entropy Decoder → Decoded Data

29
Principles of Compression

• Add a model that attempts to represent the image/video


signal in a form that can be easily compressed by the entropy
encoder
• Model exploits the subjective redundancy of images and
video (Spatial, Temporal, Chromatic redundancies)
• Decoded image may not be identical to original image
• Image properties that are useful for compression:
– Many of the pixels of a typical photographic image contain little or no
« useful » detail (e.g. flat area)
– The eye is insensitive to « high frequency » image information

Image Model → Entropy Encoder → Channel → Entropy Decoder → Image Model

30
Principles of Compression
• Trade-off Complexity/Quality/Bit Rate

• New technique may result in new trade-off


[Figure: quality versus bit rate for audio coding techniques of different complexities —
speech coding, MPEG Layer 1, MPEG Layer 2, MPEG Layer 3, MPEG AAC and other techniques
occupy different points in the complexity/quality/bit-rate trade-off]
32
Principles of Compression

Redundancies

• Statistical Redundancy
  – Interpixel Redundancy
    • Spatial (intraframe) Redundancy
    • Temporal (interframe) Redundancy
  – Coding Redundancy
    • Variable-Length Coding (Huffman, Arithmetic), Run-Length Coding, …

• Psychological Redundancy (HVS)
  – Luminance (Contrast) Masking
  – Texture Masking
  – Color Masking
  – Frequency Masking
  – Temporal Masking

33
Quality Measurements
• Objective
– Mean Square Error (MSE)
– Peak Signal-to-Noise-Ratio (PSNR)
– Measure the fidelity to original video

• Subjective
– Human Vision System (HVS) based
– Emphasize audiovisual quality rather than fidelity

34
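A short numpy sketch of the objective measures above (an 8-bit greyscale image is assumed, so the peak value is 255; the test data are synthetic):

```python
import numpy as np

def mse(original, decoded):
    """Mean Square Error between two images."""
    return np.mean((original.astype(np.float64) - decoded.astype(np.float64)) ** 2)

def psnr(original, decoded, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB."""
    e = mse(original, decoded)
    return float('inf') if e == 0 else 10 * np.log10(peak ** 2 / e)

ref = np.random.randint(0, 256, (288, 352), dtype=np.uint8)              # CIF-sized image
noisy = np.clip(ref + np.random.normal(0, 5, ref.shape), 0, 255).astype(np.uint8)
print(psnr(ref, noisy))   # roughly 34 dB for noise with standard deviation 5
```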
Quality Measurements

• Signal distortion is not a good measure of the performance of a lossy compression method;
  another method is necessary: the MOS scale (Mean Opinion Score)

• The five-grade CCIR impairment scale (Rec.562)


– 1 – unsatisfactory (Very annoying),
– 2 – poor (Annoying),
– 3 – satisfactory (Slightly annoying),
– 4 – good (Perceptible but not annoying),
– 5 – Excellent (Imperceptible)

• Example: Double blind test

36
Quality Measurements
Speech Coding - Compression vs quality

Standard (bit rate)      MOS
G.711   (64 kb/s)        4.10
G.729   (8 kb/s)         3.92
G.726   (32 kb/s)        3.85
G.729a  (8 kb/s)         3.70
G.723.1 (5.3 kb/s)       3.65
G.728   (16 kb/s)        3.61

[Figure: bit rate (kb/s) versus MOS (Mean Opinion Score) for speech codecs — PCM 64 (G.711),
ADPCM 32 (G.726), ADPCM 24 (G.725), ADPCM 16 (G.726), LD-CELP 16 (G.728),
CS-ACELP 8 (G.729 / G.729a), MP-MLQ 6.4 (G.723.1), LPC 4.8; the low bit-rate codecs
require special hardware (DSP)]

38
Audio & Video Basics

39
Audio Basics
• Analog signal sampled at constant rate
  – telephone: 8,000 samples/sec
  – CD music: 44,100 samples/sec
• Each sample quantized, i.e., rounded
  – e.g., 2^8 = 256 possible quantized values
• Each quantized value represented by bits
  – 8 bits for 256 values
  – 16 bits for 65,536 values
• Mono, stereo, or surround?
  – 1, 2 or more channels

• Example: 8,000 mono samples/sec, 256 quantized values --> 64 kbps
• Receiver converts it back to an analog signal:
  – some quality reduction

Example rates
• CD: 1.411 Mbps
• MP3: 96, 128, 160 kbps
• Internet telephony: 5.3 - 13 kbps (G.723.1, G.729, and GSM – Global System for Mobile communication)

40
Audio Basics:
Speech Coding and compression
• 5 quality ranges (human ear sensitivity: 20Hz to 20kHz):

Range                  Frequency Bandwidth   Quality and Applications
Telephone channel      300 Hz – 3.4 kHz      intelligible speech, natural but noisy
Expanded bandwidth     50 Hz – 7 kHz         speech with naturalness preserved
Hi-Fi bandwidth        20 Hz – 15 kHz        excellent speech and music
Stereo bandwidth       20 Hz – 20 kHz        CD quality
Stereo bandwidth       20 Hz – 48 kHz        perfect quality: studio, cinema, DVD

43
Video Basics
• Operation of analogue television: The image captured by the camera lens
is converted into three monochrome images obtained by applying filters of
the three fundamental (primary) colors –
R (Red), G (Green), B (Blue).

– All kinds of colors are produced by using different proportions of these primary
colors
• Additive Color Mixing on a black surface
• Subtractive Color Mixing on a white surface

– The correct combination of the three monochrome images can reconstruct


the original image.

– RGB signals thus obtained are available in some cameras, though it is unusual
to work with them

44
Video Basics: Digital Video & Pixels
• Digital video is a sequence of frames, each consisting
of a rectangular grid of picture elements or pixels.

– For purely black-and-white video, each pixel is


represented as a single bit, 0 for black or 1 for white.

– For grey-scale video, 8 bits per pixel can be used to


represent 256 levels of grey … good enough for most
cases.

– For good colour video, 8 bits are used per pixel for each of
the RGB colours, resulting in 24 bits per pixel.

46
Video Basics : Digital Video & Pixels

[Figure: image acquisition by the eye, by film, and by a digital camera.
Source: Digital Image Processing – Gonzalez, Woods. Prentice Hall]

47
Video Basics: Sampling & Quantization

Sampling & Quantization

Source: Digital Image Processing – Gonzalez, Woods. Prentice Hall

48
Video Basics: Scanning
• When an image (frame) appears on the retina of the human
eye, the image is retained for several milliseconds before
decaying.

• Consequently, if a sequence of images is displayed at the


appropriate rate, the eye does not notice that it is looking at
discrete images.
– This is how you get smooth motion in videos!

• What that rate is depends on the eye in question and how


the images are displayed.

49
Video Basics: Scanning

51
Spatial and Temporal Sampling of a Video Sequence

Source: H.264 and MPEG-4 Video Compression. Video Coding for next generation multimedia. I.E.G. Richardson. John Wiley & Sons, Ltd. 2003. Chapter 2.

53
Video Basics: Color Format
• RGB is not efficient since it uses equal bandwidth for each
color component.

• R,G,B components are correlated


– Transmitting R,G,B components separately is redundant
– More efficient use of bandwidth is desired

• To store or transmit video signals (sequence of images –


frames at constant rate), RGB signals are transformed into
three linear combination of such signals.

55
Video Basics: Color Format
• The combination is performed such that:
– One of the new signals, Y, collects all the light/brightness information of the image;
  this signal is called luminance.
– The other two signals, called U and V, correspond to different combinations of the three
  original signals, chosen so that they capture all the color information; these two signals
  are generically referred to as chrominance.

• Various formulae have been devised to convert RGB values to


chrominance and luminance values, depending on the format: YUV, YIQ,
YCbCr, …

• Consider switching from RGB to YUV as a change of a coordinate system to


one that maintains the same number of degrees of freedom but can solve
the problem more easily.

• For backward compatibility, colour signals had to be receivable and


watchable on a black-and-white set.

56
Color Formats Conversion
Simple colour differences:
  Y  = kr·R + kg·G + kb·B
  Cr = R − Y,   Cg = G − Y,   Cb = B − Y    (Cr + Cg + Cb = constant, so only two are transmitted)

• kr, kg, kb are weighting factors

Forward transform:
  Y  = kr·R + (1 − kr − kb)·G + kb·B
  Cr = (0.5 / (1 − kr)) · (R − Y)
  Cb = (0.5 / (1 − kb)) · (B − Y)

Inverse transform:
  R = Y + ((1 − kr) / 0.5) · Cr
  G = Y − (2·kr·(1 − kr) / (1 − kr − kb)) · Cr − (2·kb·(1 − kb) / (1 − kr − kb)) · Cb
  B = Y + ((1 − kb) / 0.5) · Cb
58
Color Formats Conversion

• ITU-R recommendation BT.601 defines


kr = 0.299 and kb = 0.114.

Y  = 0.299·R + 0.587·G + 0.114·B

Cr = 0.713·(R − Y)
Cb = 0.564·(B − Y)

R = Y + 1.402·Cr
G = Y − 0.714·Cr − 0.344·Cb
B = Y + 1.772·Cb

59
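A minimal Python sketch of the BT.601 conversion above (pure illustration: component values are assumed to lie in the 0–1 range, with no clipping or quantization to studio levels):

```python
KR, KB = 0.299, 0.114   # ITU-R BT.601 weighting factors

def rgb_to_ycbcr(r, g, b):
    """Forward BT.601 conversion (analog form, components in [0, 1])."""
    y  = KR * r + (1 - KR - KB) * g + KB * b
    cb = 0.5 / (1 - KB) * (b - y)      # = 0.564 (B - Y)
    cr = 0.5 / (1 - KR) * (r - y)      # = 0.713 (R - Y)
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    """Inverse conversion."""
    r = y + 1.402 * cr
    g = y - 0.344 * cb - 0.714 * cr
    b = y + 1.772 * cb
    return r, g, b

print(rgb_to_ycbcr(1.0, 0.0, 0.0))   # pure red -> (0.299, -0.169, 0.5)
```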
Video Basics: Color Format

http://www.yorku.ca/eye/photopik.htm
61
Video Basics: Color Format

• Human eye is more sensitive to the luminance


(brightness) component than the color component: the
latter need not be transmitted as accurately.

– The luminance is broadcast at the same frequency as a black-


and-white signal, and the chrominance is ignored on black-
and-white sets.

– The two chrominance signals are broadcast in narrow bands


at higher frequencies.
• Called hue and saturation or tint and colour
62
Video Basics:
Chrominance Downsampling
• The reduced resolution in the chroma components is called
downsampling (subsampling).

• The subsampling is based on the human eye being less sensitive to chrominance than to
  luminance.

• (Y, Cr, Cb) may use different resolutions 4:n:m: The numbers
indicate the relative sampling rate of each component in the
horizontal direction.

63
Video Basics:
Chrominance Downsampling
• 4:4:4 sampling: the three components
have the same resolution (3n bits per
pixel)
– a sample of each component exists at
every pixel position.
– Preservation of the full fidelity of the
chrominance components.

• 4:2:2 sampling: Cb and Cr have the


same vertical resolution as Y, but half
the horizontal resolution
(2n bits per pixel).
– 4:2:2 video is used for high-quality
color reproduction.

64
Video Basics:
Chrominance Downsampling
• 4:1:1 sampling: Cb and Cr have the
same vertical resolution as Y, but
quarter the horizontal resolution (1.5n
bits per pixel).

• 4:2:0 sampling: Cb and Cr each have


half the horizontal and vertical
resolution of Y (1.5n bits per pixel).
– 4:2:0 video requires exactly half as
many samples as 4:4:4 video
– 4:2:0 is widely used for consumer
applications such as video
conferencing.

65
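A small numpy sketch of 4:2:0 chroma downsampling (simple 2×2 averaging is assumed here; real codecs specify particular filters and chroma sample positions):

```python
import numpy as np

def downsample_420(plane):
    """Average each 2x2 block -> half horizontal and half vertical resolution."""
    h, w = plane.shape
    return plane.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

cb = np.random.rand(288, 352)          # full-resolution chroma plane (CIF size)
cb_420 = downsample_420(cb)            # 144 x 176 -> one quarter of the samples
print(cb.shape, cb_420.shape)
# Bits per pixel: Y (8) + Cb/4 (2) + Cr/4 (2) = 12 = 1.5 x 8, as on the slide
```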
Video Basics: Spatial Resolution Formats
• CIF: Common Interchange (Intermediate) Format - Intermediate format used
in videoconferencing (communication between US & Europe)

– Luma resolution: 352x288 (360x288) pixels


– Sampling frequency: 30Hz (30 frames/second - fps),
non-interlaced, sampling rate 4:2:0

• QCIF:176x144 pixels, 30fps (Quarter CIF) –


used in Video Telephony applications

• SQCIF: 128x96 pixels, 30fps (Sub QCIF),


mobile multimedia applications

• 4CIF: 704x576 pixels, 30fps, appropriate for


standard-definition television and DVD-video

• 16CIF: 1408x1152 pixels, 50fps


70
Spatial Resolution Formats

[Figure: relative frame sizes of SQCIF, QCIF, CIF, SCIF (4CIF), 16CIF 4:3 and 16CIF 16:9]

71
Video Basics: Spatial Resolution Formats

• SIF: Simple Input Format (Source Intermediate Format) - Half the


vertical & horizontal resolution of 4:2:0. Used in Video Cassette
Recorders (VCRs)

– 360x242 (352x240) pixels, 30 frames/second for NTSC,


sampling rate 4:2:0

– 360x288 (352x288) pixels, 25 frames/second for PAL, SECAM, sampling


rate 4:2:0

• CCIR-601 (ITU-R 601 or BT 601)


– 720x525 pixels, 30 frames/second, sampling rate 4:4:4 & 4:2:2
– 720x625 pixels, 25 frames/second, sampling rate 4:4:4 & 4:2:2

72
MPEG, what is it?

76
International Organizations
•ISO (1947): International Organization for Standardization;

•IEC (1906): International Electrotechnical Commission,

•ISO/IEC JTC 1 (1987): Joint Technical Committee 1 of the ISO and the
IEC. It deals with all matters of information technology.

•ITU-T : Telecommunication Standardization Sector coordinates


standards for telecommunications on behalf of the International
Telecommunication Union (ITU 1993 – 1956 CCITT).

77
International Organizations (Cont’d)
• JPEG - ITU-T T.81, ISO/IEC IS 10918-1 : Joint Photographic Experts Group one of
two sub-groups of ISO/IEC Joint Technical Committee 1, Subcommittee 29,
Working Group 1 (ISO/IEC JTC 1/SC 29/WG 1) - titled as Coding of still pictures.

• MPEG: Moving Picture Experts Group (ISO/IEC JTC 1/SC 29/WG 11) - a working
group of ISO/IEC in charge of the development of standards for coded
representation of digital audio and video and related data.

• ITU-T SG15 : H26x – Videophone & Videoconference standards

• JVT: Joint Video Team - a group of video coding experts from ITU-T Study Group
16 (VCEG) and ISO/IEC JTC 1 SC 29 / WG 11 (MPEG), created to develop an
advanced video coding specification.
•Formed in 2001, the JVT’s main result has been ITU-T Rec. H.264 | ISO/IEC 14496-10,
commonly referred to as H.264/MPEG-4-AVC, H.264/AVC, or MPEG-4 Part 10 AVC.

78
MPEG: Moving Picture Experts Group
• Moving Picture Expert Group established in 1988 for the
development of digital video
– Still active (MPEG-21 is currently in development)

• International standard (ISO/IEC)


Interoperability & economy of scale

• Compression of audio and video and multiplexing in a single


stream

• Definition of the interface not of the codecs


room for improvement

79
MPEG: Moving Picture Experts Group

• Official home page of the Moving Picture Experts


Group (MPEG):
www.chiariglione.org/mpeg/

• In charge of the development of standards for coded


representation of digital audio and video and related
data.

• The group produces standards that help the industry


offer end users an ever more enjoyable digital media
experience.

80
List of MPEG standards
• MPEG-1 (ISO 11172)
The standard on which such products as Video CD and MP3 are based
(approved in Nov. 1992)

– Video-oriented CD-ROM, SIF format (video progressive)

– Objective: VHS quality. Typical bit rate 1.5Mb/s

– Useful for tele-education, enterprise applications, business, etc.

81
List of MPEG standards (Cont’d)
• MPEG-2 (ISO 13818)
The standard on which such products as Digital Television set top boxes and DVD
are based (approved in 1994, 1996);
– 'Upward'-compatible extension of MPEG-1
– Oriented to broadcast (interlaced video)
– Multiple resolutions standardized, from SIF (compatible with MPEG-1) up to
  high-definition formats for DVDs and so on.
– Intended for studio-quality audio and video. Broadcast-quality HDTV also.
– Various bit rates, 4-100 Mb/s (CBR & VBR)
– Useful for all types of applications (business, entertainment, etc.).

• MPEG-3: Originally designed for HDTV, finally resolved by reparameterization of


MPEG-2.

82
List of MPEG standards (cont’d)
• MPEG-4 (ISO 14496)
The standard for multimedia for the fixed and mobile web (Version 1 -
approved in Oct. 1998, Version 2 - approved in Dec. 1999, Versions 3, 4, 5)
– Computer Graphics Applications;

– Originally intended to similar applications as H.263, but expanded to cover a


wider range of multimedia applications.

– 'Downward' extension of MPEG-1. Oriented to Internet video.

– Useful in the range 28.8-500 kb/s. New compression algorithms. Typically less
  than 1 Mbps but could be as high as tens of Mbps.

83
List of MPEG standards (cont’d)
• MPEG-4 (ISO 14496) …

– Coding of Audiovisual Objects - Standard for audio, video and graphics in


interactive 2D and 3D multimedia communication - MPEG-4 v.2 & 3

– Supports scene composition and content-based functionalities, in which


scenes are expressed in terms of multiple audio-visual objects (AVOs) that can
be manipulated together or individually.

– Supports layering/scaling: multiple versions of AVOs can be provided and


matched against needs and available resources.
• For example, a base level AVO can be provided to give the bare essentials, with
multiple optional AVOs that provide levels of enhancement details.
• If we don’t have enough network resources, drop the enhancements and stick with
the basics!

84
List of MPEG standards (cont’d)

• MPEG-7 (ISO 15938) The standard for description and search of audio and
visual content (approved in Jul. 2001);

– Audiovisual content description (indexing, searching, databases, etc.)..


Interprets semantics of audiovisual information

– More to do with structuring, and describing and searching through


multimedia content

• MPEG-21 (21000) The Multimedia Framework.

– Focus on multimedia distribution and on DRM aspects;

85
List of MPEG standards (cont’d)
• MPEG-A (23000) – Application-specific formats, integrating multiple MPEG technologies

• MPEG-B (23001) – Systems specific standards

• MPEG-C (23002) – Video specific standards

• MPEG-D (23003) – Audio specific standards

• MPEG-E (23004) – MPEG multimedia Middleware - support to download and


execution of multimedia applications

• MPEG-V (23005) – Context and media control - interchange with virtual worlds

• MPEG-M (23006) – MPEG extensible Middleware - packaging and reusability


of MPEG technologies

• MPEG-U (23007) – MPEG Rich Media User Interface

86
List of ITU-T Standards
• H.261 (1983-1990)
– A standard for video telephony and video conferencing
over the PSTN (Public Switched Telephone Network) and wireless
networks.
– Uses either the CIF or QCIF format.
– Uses p x 64kbps where p can be between 1 and 30.
– Originally designed for ISDN usage (Integrated Services Digital
Network).
– Still in use
• Low complexity, low latency
• Mostly as a backward-compatibility feature
• Overtaken by H.263

87
List of ITU-T Standards (cont’d)
• H.263, H.263+, H.263++ (1993-1999)
– Based on H.261 but offers significant improvement on
coding efficiency, employs advanced coding options and
lower resolutions to preserve quality over lower bit rates
channels.
– Uses either the QCIF or S-QCIF formats.
– Uses less than 64kbps.
– PSTN and mobile network: 10 to 24kbps
– Adopted by several videophone terminal standards:
H.324 (PSTN), H.320 (ISDN), H.310 (B-ISDN)

• H.264/AVC (1999-2003)
– Double the coding efficiency in comparison to any other
existing video coding standards
88
Chronological Table of Video Coding Standards

[Timeline of video coding standards, 1990-2003:
ITU-T VCEG:   H.261 (1990), H.263 (1995/96), H.263+ (1997/98), H.263++ (2000)
Joint:        MPEG-2 / H.262 (1994/95), H.264 / MPEG-4 Part 10 (2002)
ISO/IEC MPEG: MPEG-1 (1991), MPEG-4 v1 (1998/99), MPEG-4 v2 (1999/00), MPEG-4 v3 (2001)]


92
Agenda
• Introduction

• Audio & Video compression principles


– Audio compression
– Video compression
– Audio/Video synchronisation

• A/V Compression standards

• Conclusion

94
Audio Compression principles

95
Speech Coding and Compression

• Waveform coding (PCM, DPCM, ADPCM)


– Samples coding (G.711, G.721, G.722, G.723,
G.725, G.726, …)

• Source Coding
  – Speech modeling and transmission of the model parameters (G.728, G.729, …)

• Hybrid Coding
96
Audio compression

• By identifying what can and, more important what


cannot be heard, the schemes described obtain
much of their compression by discarding
information that cannot be perceived.

• Over the course of our evolutionary history we have


developed limitations on what we can hear.
– Some of these limitations are physiological, based on the
machinery of hearing.
– Others are psychological, based on how our brain
processes auditory stimuli.

98
Audio Compression

• Sub-band Coding
– Techniques used in Layer I and II of MPEG audio are based
on sub-band coding.

• Transform Coding
– DCT is used in Layer III of MPEG audio.

• Predictive Coding
– Frequency prediction is used in AC-3 and MPEG AAC.

100
Common Audio Formats and Standards

 Pulse Code Modulation (PCM)


– Differential Pulse Code Modulation (DPCM)
– Adaptive Differential Pulse Code Modulation (ADPCM)

• Compact Disc Digital Audio (CD-DA)

• MPEG Audio
– Layer I
– Layer II
– Layer III

104
Audio compression

• Based on psycho-acoustics
  – Compress the bit rate without affecting the quality perceived by the human ear
    (based on the imperfections of human hearing)
  – Removal of irrelevancies

• 4 main principles:
  – Threshold of audibility
  – Frequency masking
  – Critical bands
  – Temporal masking

112
Audio compression
• Principle 1: Threshold of audibility
Not all frequency components need to be encoded with the same resolution:
Nr_bits(f) ≈ (signal level above the audibility threshold at f, in dB) / 6
(each bit contributes roughly 6 dB of signal-to-noise ratio)

http://www.audiodesignline.com
113
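A toy Python sketch of this rule (the 6 dB-per-bit figure comes from the slide; the example signal and threshold levels are illustrative assumptions):

```python
import math

def bits_needed(signal_db, threshold_db):
    """Bits required so that quantization noise stays below the audibility threshold
    (about 6 dB of SNR gained per bit)."""
    margin = signal_db - threshold_db
    return max(0, math.ceil(margin / 6))

# A component 48 dB above its threshold needs ~8 bits; one below threshold needs none.
print(bits_needed(60, 12), bits_needed(20, 35))   # 8 0
```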
Audio compression
• Principle 2: Frequency masking
Analysis of the incoming signal

http://www.audiodesignline.com
114
Audio compression
• Principle 3: Critical bands
– Human ear may be modelled as a collection of narrow band filters
– Bandwidth of these filters = critical band
– Critical band width: < 100 Hz for the lowest audible frequencies,
  up to ≈ 4 kHz for the highest audible frequencies
– The human ear cannot distinguish between two sounds having different frequencies
  within the same critical band.
  Example: when we hear 50 Hz and 60 Hz at the same time we cannot distinguish them.
– Consequence:
  The noise masking threshold depends solely on the signal energy within a limited bandwidth.
  The loudest sound is taken as the representative of the critical band.
  Hence the need to analyse the signal at ~100 Hz resolution at low frequencies.

115
Audio compression
• Principle 4: Temporal masking
A sound raises the audibility threshold for a brief interval preceding and following it;
this masking effect drives the selection of the frame duration used for frequency analysis
and encoding.

http://www.audiodesignline.com
116
The MPEG encoder

http://www.audiodesignline.com
117
Audio features in MPEG

• MPEG1 :
– Mono/stereo/dual/joint stereo (Possibility Dolby surround)
– Sampling frequencies : 32, 44.1 & 48 kHz
– 3 layers : trade-off complexity/delay versus coding
efficiency of compression
– Various bit rate : trade-off quality versus bit rate

• MPEG2 :
– 5.1 channels
– Sampling frequencies extended to 16, 22.05 & 24 kHz

122
Layer I coding
• The Layer I coding scheme provides a 4:1 compression.

• In Layer I coding the time frequency mapping is accomplished


using a bank of 32 subband filters.

• The output of the subband filters is critically sampled. That is,


the output of each filter is down-sampled by 32.

• The samples are divided into groups of 12 samples each.


– Twelve samples from each of the 32 subband filters, or a total of 384
samples, make up one frame of the Layer I coder.

123
Layer II Coding
• The Layer II coder provides a higher compression rate by
making some relatively minor modifications to the Layer I
coding scheme.

• The compression ratio in Layer II coding can be increased from


4:1 to 8:1 or 6:1.

• These modifications include:


– how the samples are grouped together,
– the representation of the scale factors, and
– the quantization strategy.

130
Layer III Coding - MP3
• One of the problems with the Layer I and Layer II coding
schemes was that with the 32-band decomposition, the
bandwidth of the subbands at lower frequencies is
significantly larger than the critical bands.

• This makes it difficult to make an accurate judgment of the


mask-to-signal ratio.
– If we get a high amplitude tone within a subband and if the subband
was narrow enough, we could assume that it masked other tones in
the band.
– However, if the bandwidth of the subband is significantly higher than
the critical bandwidth at that frequency, it becomes more difficult to
determine whether other tones in the subband will be masked.

131
Layer III Coding - MP3
• Layer III offers almost CD quality with less than 2 bits/sample (enables
transferring music files via Internet over 28.8kbps modems)

• A simple way to increase the spectral resolution would be to decompose


the signal directly into a higher number of bands.

• However, one of the requirements on the Layer III algorithm is that it be


backward compatible with Layer I and Layer II coders.

• To satisfy this backward compatibility requirement, the spectral


decomposition in the Layer III algorithm is performed in two stages.

132
Layer III Coding - MP3
• First the 32-band subband decomposition used in Layer I and
Layer II is employed.

• The output of each subband is transformed using a modified


discrete cosine transform (MDCT) with a 50% overlap.

• The Layer III algorithm specifies two sizes for the MDCT, 6 or
18. This means that the output of each subband can be
decomposed into 18 frequency coefficients or 6 frequency
coefficients.

133
Advanced Audio Coding

• AAC (Advanced Audio Coding): an audio compression format defined by the MPEG-2 standard.

• AAC was known as NBC (Non-Backward-Compatible),


non compatible with MPEG-1 audio formats.

134
Advanced Audio Coding

• AAC can manipulate more channels than MP2 or


MP3 (48 full audio channels and 16 enhanced low-
frequency channels compared to 5 full audio
channels and 1 enhanced low-frequency channel for
MP2 or MP3),

• AAC can handle higher sampling frequencies than MP3 (up to 96 kHz compared to 48 kHz).

135
Video Compression principles

136
Video Compression
• Two applied techniques for video compression:

– Spatial or intraframe compression: removal of intra-


picture redundancy in the image of each frame as in JPEG
images

– Temporal or interframe compression: removal of inter-picture redundancy (between
  consecutive frames). Coding of the difference with an interpolated picture
  (motion vectors).

137
Video compression
• Result
– 4:2:0 SIF resolution : 30 Mbps
(= 25images/sec * 288lines * 352pixels * 1.5(lum & chrom) * 8bits)

±1.2 Mbps (CBR) in video CD (MPEG1)

– 4:2:2 CCIR 601 resolution : 166 Mbps


(= 25images/sec * 576lines * 720pixels * 2(lum & chrom) * 8bits)

± 3-4 Mbps (mean) in MPEG2

138
Image Codec (e.g. JPEG)
Encoder: Block → DCT → Quantize → Zigzag → RLE → VLC → Transmit / Store
Decoder: VLD → RLD → IZigzag → IQuantize → IDCT → IBlock

• Process the data in blocks (sub-images) of 8x8 samples

• Convert Red-Green-Blue into Luminance (grayscale) and


chrominance (Blue color difference and Red color difference)

• Use half resolution for chrominance (because eye is more


sensitive to grayscale than to color)

• Each block contains redundant information.

140
Discrete Cosine Transform
DCT

• DCT transformation (in frequency domain)


decorrelates the input signal.

• Transform each block of 8x8 samples into a block of


8x8 spatial frequency coefficients.

• Most image blocks only contain a few significant


coefficients (usually the lowest “frequencies”)
– Energy tends to be concentrated into a few significant
coefficients (most energy in low spatial frequencies)
– Other coefficients are close to zero / insignificant

141
Discrete Cosine Transform
• Any 8x8 block of pixels can be
represented as a sum of 64 basis
patterns (black and white patterns)

• Output of the DCT is the set of


weights for these basis patterns (The
DCT coefficients)
– Multiply each basis pattern by its weight
and add them together
– Result is the original image block

142
Quantize and zig-zag scanning
Quantize Zigzag

• Divide each DCT coefficient by an integer, discard


remainder

• High spatial frequencies are quantized with lower resolution than low ones
  (removes irrelevancy)
  - Result: loss of precision. Typically, only a few non-zero coefficients are left

• Scan quantized coefficients in a zig-zag order: Non-


zero coefficients tend to be grouped together

143
Video compression
• Spatial redundancy reduction (DCT example)

[Worked example: an 8×8 block of luminance samples (values around 139-163) is transformed
by the DCT; the energy concentrates in a few low-frequency coefficients. After quantisation
only a handful of non-zero coefficients remain:

  158   0  -1   0   0   0   0   0
   -1  -1   0   0   0   0   0   0
   -1   0   0   0   0   0   0   0
   (all remaining coefficients are zero)

The zig-zag scan then produces: 158, 0, -1, -1, -1, -1, EOB]

144
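A minimal numpy sketch of the DCT → quantise → zig-zag chain described above (the orthonormal DCT-II matrix is standard; the flat quantisation step of 16 and the synthetic test block are illustrative assumptions, not the JPEG tables):

```python
import numpy as np

N = 8
# 8x8 orthonormal DCT-II basis matrix
k = np.arange(N).reshape(-1, 1)
n = np.arange(N).reshape(1, -1)
C = np.sqrt(2 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
C[0, :] = np.sqrt(1 / N)

def dct2(block):   return C @ block @ C.T      # forward 2-D DCT
def idct2(coef):   return C.T @ coef @ C       # inverse 2-D DCT

def zigzag(a):
    """Scan an 8x8 array in zig-zag order (anti-diagonals, alternating direction)."""
    idx = sorted(((i, j) for i in range(N) for j in range(N)),
                 key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [a[i, j] for i, j in idx]

block = np.full((8, 8), 155.0)        # a nearly flat block, like the example above
block[0, :4] -= 12                    # a little structure in the first row
coeffs = dct2(block - 128)            # DC level shift, then DCT
q = np.round(coeffs / 16)             # coarse uniform quantisation (step 16, illustrative)
print(zigzag(q))                      # only the first few coefficients are non-zero
```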
Run-Length Encoding
RLE

• Encode each coefficient value as a (run, level) pair:


– Run = number of zeros preceding value
– Level = non-zero value

• Usually, the block data is reduced to a short sequence of (run,


level) pairs
– This is now easy to compress using an entropy encoder

145
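A small Python sketch of the (run, level) encoding of a zig-zag scanned coefficient list (simplified: no end-of-block codeword and no code-table lookup):

```python
def run_level_encode(coeffs):
    """Turn a list of quantised coefficients into (run-of-zeros, non-zero level) pairs."""
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs           # trailing zeros are implicitly dropped (would be an EOB code)

print(run_level_encode([158, 0, -1, -1, -1, -1, 0, 0, 0, 0]))
# [(0, 158), (1, -1), (0, -1), (0, -1), (0, -1)]
```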
Variable-Length Encoding
VLC

• Encode each (run, level) pair using a variable-length code


– Frequently occurring groups – assign a short code
– Infrequently occurring groups – assign a long code

• Result: compressed version of the image

146
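A sketch of variable-length coding, using Python's heapq to build a Huffman code over (run, level) symbols (toy symbol frequencies; real codecs use predefined code tables rather than building the code on the fly):

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman code: frequent symbols get short codewords."""
    freq = Counter(symbols)
    heap = [[w, i, [s, ""]] for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        for pair in lo[2:]:
            pair[1] = "0" + pair[1]        # prefix the lighter subtree with 0
        for pair in hi[2:]:
            pair[1] = "1" + pair[1]        # and the heavier subtree with 1
        heapq.heappush(heap, [lo[0] + hi[0], next_id] + lo[2:] + hi[2:])
        next_id += 1
    return dict(heap[0][2:])

symbols = [(0, -1)] * 6 + [(1, -1)] * 3 + [(0, 158)]   # (run, level) pairs
print(huffman_code(symbols))   # the most frequent pair gets the shortest codeword
```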
Image decoding
• Reverse the stages to recover the image

• Information was thrown away during quantization


– Decoded image will not be identical to the original

• In general: more compression = more quality loss

• Too much compression:


– Block edges start to show (“blockiness”)
– High-frequency patterns start to appear (“mosquito noise”)

147
Video coding
• Moving images contain significant temporal redundancy
– Successive frames are very similar

• Add an extra “motion model” at the “front end” of the image


encoder

• The amount of data to be coded can be reduced significantly


if the previous frame is subtracted from the current frame.

148
Video Encoder
• Video frames → Motion Model → image encoder

[Encoder block diagram:
Motion Comp. → DCT → Quantize → Zigzag → RLE → VLC → Buffer
Motion Estim. → Motion Vectors (sent to the buffer together with headers)
Local decoding loop: Rescale → IDCT → Recon. (reference frame for motion estimation/compensation)]

Video Decoder

[Decoder block diagram:
Buffer → VLD → RLD → IZigzag → Rescale → IDCT → Recon. (using the received Motion Vectors and headers)]
Motion Estimation
• Process 16x16 luminance samples at a time (“macroblock”)

• Compare with neighboring area in previous frame

• Find the closest matching area

  – Prediction reference

• Calculate offset between current macroblock and prediction


reference area
– Motion vector

151
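A brute-force block-matching sketch in numpy (full search minimising the sum of absolute differences over a ±15 sample window; real encoders use fast search strategies, and the synthetic frames here are just for illustration):

```python
import numpy as np

def motion_estimate(cur, ref, bx, by, bsize=16, search=15):
    """Full-search block matching: return the motion vector (dy, dx) minimising the
    sum of absolute differences between the current macroblock and the reference frame."""
    block = cur[by:by + bsize, bx:bx + bsize]
    best, best_mv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + bsize > ref.shape[0] or x + bsize > ref.shape[1]:
                continue                                   # skip positions outside the frame
            sad = np.abs(block - ref[y:y + bsize, x:x + bsize]).sum()
            if sad < best:
                best, best_mv = sad, (dy, dx)
    return best_mv, best

ref = np.random.randint(0, 256, (144, 176)).astype(np.int32)   # QCIF-sized frames
cur = np.roll(ref, (2, 3), axis=(0, 1))                         # shift content down 2, right 3
print(motion_estimate(cur, ref, bx=64, by=64))
# ((-2, -3), 0): the best match in the reference sits 2 samples up and 3 to the left
```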
Motion Estimation

152
Motion Compensation
• Subtract the reference area from the current macroblock
– Difference macroblock

• Encode the difference macroblock with an image encoder

• If motion estimation was effective


– Little data left in difference macroblock
– More efficient compression

153
Motion Compensation
– In Motion Estimation (ME), each macroblock (MB) of
the Target P-frame is assigned a best matching MB
from the previously coded I or P frame - prediction.

– prediction error: The difference between the MB and


its matching MB, sent to DCT and its subsequent
encoding steps.

– The prediction is from a previous frame — forward


prediction.

154
Motion Compensation
• MPEG introduces a third frame type — B-frames, and its
accompanying bi-directional motion compensation.

– Each MB from a B-frame will have up to two motion vectors


(MVs) (one from the forward and one from the backward
prediction).

– If matching in both directions is successful, then two MVs will be sent


and the two corresponding matching MBs are averaged before
comparing to the Target MB for generating the prediction error.

– If an acceptable match can be found in only one of the reference


frames, then only one MV and its corresponding MB will be used from
either the forward or backward prediction.

155
B-frame Coding Based on Bidirectional Motion Compensation.

156
Motion Compensation

The Need for Bidirectional Search.

The MB containing part of a ball in the Target frame


cannot find a good matching MB in the previous frame
because half of the ball was occluded by another object.
A match however can readily be obtained from the next
frame.
157
Video MPEG - Frame Types
• I (Intra): self-contained, only spatial compression (like JPEG)

• P (Predictive): refers to the preceding I/P frame. Temporal compression by
  extrapolation using macroblocks. A macroblock can be:

  • Same: no change with respect to the reference frame

  • Moved: (e.g. a ball in motion) described by a motion vector and possibly a
    correction (difference from the original)
  • New: (e.g. what appears behind a door that opens) described by spatial
    compression (like an I-frame)

• B (Bidirectional): temporal compression with interpolation; refers to the preceding
  and the following I/P frame. Maximum compression, maximum computational complexity.
  It softens the image, reducing noise.

158
I Frames (Intra)
Intra frames are coded as self-contained,
without reference to other frames

[Figure: a sequence of I-frames only, 18 KBytes each]

25 frames per second: 72 × 1024 × 8 / 0.16 = 3.7 Mbps

159
P frames (Predictive)
Predictive frames are encoded using
motion compensation based on
previous I or P frame

[Figure: frame pattern I P P I P P I … with I = 18 KB and P = 6 KB]

60 × 1024 × 8 / 0.24 = 2.0 Mbps


160
B frames (Bidirectional)
Bidirectional frames are encoded using motion compensation based on the nearest
previous and subsequent I or P frame

[Figure: frame pattern I B B P B B P B B I … (frames 1-10) with I = 18 KB, P = 6 KB and B = 4 KB]

54 × 1024 × 8 / 0.36 = 1.2 Mbps

Transmission order: 1,4,2,3,7,5,6,10,8,9,…


161
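A small Python sketch reproducing the 3.7 / 2.0 / 1.2 Mbps figures above from the frame sizes and GOP patterns (the frame sizes in KB are the illustrative values from the slides):

```python
SIZE_KB = {"I": 18, "P": 6, "B": 4}     # typical frame sizes from the slides
FPS = 25

def gop_bitrate(pattern):
    """Average bit rate (Mbps) for a repeating frame pattern at 25 fps."""
    bits = sum(SIZE_KB[f] for f in pattern) * 1024 * 8
    return bits / (len(pattern) / FPS) / 1e6

print(gop_bitrate("IIII"))        # ~3.7 Mbps, I-frames only
print(gop_bitrate("IPP"))         # ~2.0 Mbps
print(gop_bitrate("IBBPBBPBB"))   # ~1.2 Mbps
```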
Group of Picture Structure
[Figure: Group of Picture structure — Intra pictures at positions 0 and 9, Predicted pictures
at 3, 6 and 12, B-pictures in between; bidirectional motion compensation operates on 16×16
macroblocks, which can be Intra, Forward, Reverse or Bidirectional]

• I-frames: for random access
  – intraframe coded; lowest compression
• P-frames: predictive encoded
  – from the most recent I- or P-frame; medium compression
• B-frames: interpolation
  – from the most recent and the subsequent I- or P-frame; highest compression

162
Video compression
• Temporal redundancy reduction

I: Intra-coded picture, P: Predicted picture, B: Bi-directionally interpolated picture
(the compression rate increases from I to P to B; prediction and bi-directional prediction
link the pictures)

Order of presentation:  0 1 2 3 4 5 6 7 8 9   →   I B B P B B P B B P (I B …)
Order of transmission:  0 3 1 2 6 4 5 9 7 8   →   I P B B P B B P B B (I P …)

163
Synchronisation - Getting data on time
• Synchronisation in the multimedia context refers to the mechanism that ensures a
  temporally consistent presentation of the audio-visual information to the user

• “On time” Not too late, not too early


No buffer over- or underflow
• Flow control : not applicable in broadcasting
• Common time base and Definition of a standard target
decoder that describes the data consumption pattern of the
receiver.
– Remark: Direct MPEG (Microsoft) does not use time information for
clock recovery but relies on flow control

164
Streams
• Idea of continuity (pipelining): Carry time information for
clock recovery

• No flow control (allows broadcasting): The emitter must have


a precise knowledge of the receiver data consumption
pattern (explicit in MPEG STD)

• Just-in-time: Shorter delay and smaller buffer size than with


flow control

• Two aspects in synchronisation :


Clock recovery & timing control (model & buffering)

165
Requirements for stream transport

• Data information:
  BER (Bit Error Rate) requirement; no repetition of frames is possible → FEC (Forward
  Error Correction)

• Time information: no jitter

166
Agenda
• Introduction

• Audio & Video compression principles

• A/V Compression standards


– The MPEG model and its situation in a communication context
– JPEG & MJPEG
– H.261 & MPEG-1
– H.263 & MPEG-2
– Videoconferencing

• Conclusion

167
MPEG Versions
• MPEG-1
– For video storage in CD-ROM & transmission over T-1 lines (1.5Mbps)

• MPEG-2
– Many options: 352x240 pixel; 720x480 pixel; 1440x1152 pixel;
1920x1080 pixel
– Many profiles (set of coding tools & parameters)
• Main Profile
– I, P & B frames; 720x480 conventional TV
– Very good quality @ 4-6 Mbps

• MPEG-4
– <64kbps to 4Mbps
– Designed to enable viewing, access & manipulation of objects, not only
pixels
– For digital TV, streaming video, mobile multimedia & games
168
MPEG Coding Standard
• Motion Picture Expert Group (MPEG)
– Video and audio compression & multiplexing
– Video display controls
• Fast forward, reverse, random access

• Elements of encoding
– Intra- and inter-frame coding using DCT
– Bidirectional motion compensation
– Group of Picture structure
– Scalability options

• MPEG only standardizes the decoder

169
Video H.26x
• ITU-T video standards for video conferencing: low bit rate, low motion
  (less action than in movies).
  – H.261: developed in the late 80s for ISDN (constant bit rate).
  – H.263, H.263+, H.264: more modern and efficient.

• Simplified MPEG compression algorithms:

  – More restricted motion vectors (less motion)
  – In H.261: no B-frames (excessive latency and complexity)

• Less CPU intensive. Feasible real-time software codec

170
Video H.26x (Cont’d)
• Subsampling 4:1:1

• Resolutions:
– CIF (Common Interchange Format): 352 x 288
– QCIF (Quarter CIF): 176 x 144
– SCIF (Super CIF): 704 x 576

• Independent Audio: G.722 (quality), G.723.1, G.728, G.729

• Audio-video synchronization using H.320 (ISDN) and H.323


(Internet)

171
The MPEG model

Captured signals:  Audio signal → Audio encoder;  Video signal → Video encoder
→ Multiplexer → Transmission channel (digital storage medium or network) → Demultiplexer
→ Audio decoder → Audio signal;  Video decoder → Video signal  (Presented signals)

172
Components of the MPEG standard
• The MPEG standard is composed of 3 main parts :
– Audio : Specifies the compression of audio signals
– Video : Specifies the compression of video signals
– System : specifies how the compressed audio and video signals are
combined in the multiplexed stream (program stream or transport
stream).

• Each part specifies :


– The bitstream syntax
– The timing requirement and the related information (bit rate, buffer
needs)

174
MPEG in a communication context
• A simple view of MPEG in the communication context
ES (Elementary Stream), PS (Program Stream), TS (Transport Stream)

[Diagram: audio and video sources → Video encoder / Audio encoder → elementary streams →
Multiplexing into a TS (n programs) or a PS (1 program) → Adaptation to the channel:
Cable or Satellite for the TS, Disc for the PS.
MPEG-2 compression layer | MPEG-2 system layer | DVB, DVD, ...]


178
JPEG & MJPEG

179
JPEG Coding Standard
• Key Components:
– Transform:
• 8×8 DCT
• boundary padding
– Quantization:
• uniform quantization
• DC/AC coefficients
– Coding:
• Zigzag scan
• run length/Huffman coding

180
JPEG Baseline Coder

Tour Example
183 160 94 153 194 163 132 165
183 153 116 176 187 166 130 169
179 168 171 182 179 170 131 167
177 177 179 177 179 165 131 167
178 178 179 176 182 164 130 171
179 180 180 179 183 169 132 169
179 179 180 182 183 170 129 173
180 179 181 179 181 170 130 169

181
Step 1: Transform
• DC level shifting: subtract 128 from every 8-bit sample, so the block values become
  signed values centred around zero

• 2D DCT: transform the shifted 8×8 block; the result has a large DC coefficient (≈ 313)
  in the top-left corner and mostly small AC coefficients elsewhere
182
Step 2: Quantization

Q-table:
16  11  10  16  24  40  51  61        Why do the quantization steps
12  12  14  19  26  58  60  55        increase from top-left to
14  13  16  24  40  57  69  56        bottom-right?
14  17  22  29  51  87  80  62
18  22  37  56  68 109 103  77
24  35  55  64  81 104 113  92
49  64  78  87 103 121 120 101
72  92  95  98 112 100 103  99

[The DCT coefficient block is divided element-wise by the Q-table and rounded; only a
dozen or so non-zero coefficients remain, concentrated in the top-left corner]

183
Step 3: Entropy Coding
[The quantised 8×8 coefficient block is scanned in zig-zag order, giving:]

(20, 5, -3, -1, -2, -3, 1, 1, -1, -1,
 0, 0, 1, 2, 3, -2, 1, 1, 0, 0, 0, 0, 0,
 0, 1, 1, 0, 1, EOB)

Zigzag Scan — EOB (End Of the Block): all following coefficients are zero
184
Video M-JPEG (Motion JPEG)
• The simplest approach: treat the video as a sequence of JPEG pictures,
  without taking advantage of redundancy between frames.
• DCT algorithm (Discrete Cosine Transform)
• Less efficient, but low delay.
• Used in:
– Some digital recording systems and nonlinear
editing (editing independent of each frame)
– Some videoconferencing systems (low delay).
• It does not include standard audio support. The audio
has been encoded by some other means (eg CD-DA)
and synchronized by non-standard mechanisms.

185
H.261 & MPEG1

186
H.261 Coding Standard
• Background:
– Facilitate video conferencing and videophone service over
ISDN
– p×64 kbps
• p=1: videophone;
• p>5: videoconference;
• p=30: VHS-quality;
– Basis of MPEG-1 and MPEG-2

• Features
– Maximum coding delay of 150 ms
– Amenable to low-cost VLSI implementation

187
Input Image Formats
CIF QCIF

# of pels/line (Y) 360(352) 180(176)


# of pels/line (U/V) 180(176) 90(88)
# of lines/pic (Y) 288 144
# of lines/pic (U/V) 144 72
Interlacing 1:1 1:1
Temporal rate 30,15,10,7.5 30,15,10,7.5
Aspect ratio 4:3 4:3

188
Video Multiplex
• It defines a data structure so that a decoder can
interpret the received bit stream without any
ambiguity

• Hierarchical data structure


– Picture layer
– Group of blocks (GOB) layer
– Macroblock (MB) layer
– Block layer

– Each layer has a distinct header

189
Picture and GOB Layers

• Picture layer consists of picture header followed by


the data for GOBs
– Picture header contains data such as picture format (CIF or
QCIF)

• GOB layer is always composed of 33 MBs


– GOB header contains a MB address and compression mode
followed by the data for the blocks

190
Macroblock and Block Layers

Macroblock: the smallest unit to select the compression mode

Y1 Y2
Cr Cb

Y3 Y4

A MB always consists of 6 blocks (Y1 – Y4, Cr, Cb)

MBA MTYPE MQUANT MVD CBP Block Data

191
Compression Modes
• Intra Mode
– Similar to JPEG coding
– Support two compression modes

• Inter Mode
– ME is not specified (MC is optional)
– Usually, 16-by-16 BMA, integer-pel accuracy,
search range [-15,15]
– Support various compression modes

192
H.261 Encoder
[H.261 encoder block diagram: 8×8 block → Intra/Inter switch → DCT → Q → Huffman VLC →
fixed-length p×64 output with CRC error control; local decoding loop Q⁻¹ → I-DCT →
Frame Memory → (loop) Filter → Motion Estimation → Motion Vector]

• Intended for videoconferencing applications


• Bit rates = p x 64 kbps, p = 2, 6, 24 common
197
MPEG-1
• MPEG-1 adopts the SIF (Source Input Format) digital TV format.

• MPEG-1 supports only non-interlaced video. Normally, its picture resolution


is:
– 352×240 for NTSC video at 30 fps
– 352×288 for PAL video at 25 fps
– It uses 4:2:0 chroma subsampling

• The MPEG-1 standard is also referred to as ISO/IEC 11172. It has five parts:
– 11172-1 Systems,
– 11172-2 Video,
– 11172-3 Audio,
– 11172-4 Conformance, and
– 11172-5 Software.

198
Hierarchical Data Structure
• Sequences are formed by Group Of Pictures (GOP)

• GOP are made up of pictures (frames)

• Pictures consist of slices

• Slices are made up of macro-blocks (MB)

• Macro-blocks consist of blocks

• Blocks are 8×8 pixels arrays

200
Hierarchical Data Structure

Layers of MPEG-1 Video Bitstream.


201
Example of temporal picture structure

202
Slices in an MPEG-1 Picture.

203
Video MPEG (MPEG-1)
• Subsampling 4:2:0 (25% more savings than 4:2:2)

• Two possible formats:


– SIF (Standard Interchange Format) - in PAL (396 MBs):
– Y: 352x288 pixels,
– Cr & Cb: 176x144 pixels
– QSIF (Quarter SIF) (99 MBs):
– Y: 176 x 144;
– Cr & Cb : 88 x 72

• Two compression types (simultaneously):


– Spatial: as in JPEG
– Temporal: takes advantage of each frame having similarity with
those around.

204
MPEG-1 Video
• Typical Sequence (360ms): I1 B2 B3 P4 B5 B6 P7 B8 B9 I10
• Order of encoding / decoding : I1 P4 B2 B3 P7 B5 B6 I10 B8 B9

• Typical size of frames (SIF, 352x288):


– I: 18kBytes (7:1)
– P: 6kBytes (20:1)
– B: 2.5 - 4kBytes (50:1)

– Average bit rate (IBBPBBPBBI): 1.2Mbps


– With QSIF the bit rate is reduced to 300kbps

• Compression Latency (Typical values):


– M-JPEG: 45 ms
– MPEG frames I: 200 - 400 ms
– MPEG frames I & P: 200 - 500 ms
– MPEG frames I, P & B: 400 - 850 ms
206
MB Types in MPEG-I
I-pictures:  Intra, Intra-A
P-pictures:  Intra, Intra-A, Inter-D, Inter-DA, Inter-F, Inter-FD, Inter-FDA, Skipped
B-pictures:  Intra, Intra-A, Inter-F, Inter-FD, Inter-FDA, Inter-B, Inter-BD, Inter-BDA,
             Inter-I, Inter-ID, Inter-IDA, Skipped

A – adaptive quantization
F – forward prediction with MC
D – DCT of prediction error will be coded
B – backward prediction with MC
I – interpolated prediction with MC

210
Audio MPEG-1
• Mono or stereo sampling to 32, 44.1 (CD) or 48 (DAT) kHz. If you are
using a reduced bit rate it is desirable to sample at 32 kHz.
• Psychoacoustic compression (with losses) asymmetric.
• From 32 to 448 kbps per audio channel
• Three layers in ascending order of complexity/quality:
  – Layer I: good quality at 192-256 kbps per channel; little used
  – Layer II: CD quality at 96-128 kbps per channel
  – Layer III: CD quality at 64 kbps per channel
• Each layer introduces new algorithms, and includes those of the
above.
• Layer III used in DAB (Digital Audio Broadcast) and MP3

214
MPEG-1System
• Responsible for ensuring the synchronization between audio and video through a system
  of time stamps ('timestamps') based on a 90 kHz clock.

• It is only necessary if using audio and video


simultaneously (not for MP3 streams for example)

• Requires a small flow (5-50kbps)

216
Synchronization of audio and video MPEG

Analog audio signal → Audio encoder → digital audio stream with timestamps
Analog video signal → Video encoder → digital video stream with timestamps
System clock (90 kHz) → both encoders
Audio and video streams → Multiplexer → MPEG-1 stream

During the decoding the reverse process is performed

217
Prototypical Decoder
ISO/IEC 11172

219
Major Differences from H.261

• Source formats supported:


– H.261 only supports CIF (352 × 288) and QCIF (176 × 144) source formats,
MPEG-1 supports SIF (352 × 240 for NTSC, 352 × 288 for PAL).
– MPEG-1 also allows specification of other formats as long as the Constrained
Parameter Set (CPS) as shown in the following Table is satisfied:

The MPEG-1 Constrained Parameter Set


Parameter Value
Horizontal size of picture ≤ 768
Vertical size of picture ≤ 576
No. of MBs / picture ≤ 396
No. of MBs / second ≤ 9,900
Frame rate ≤ 30 fps
Bit-rate ≤ 1,856 kbps
220
MPEG-I vs. H.261
H.261 MPEG-1
Sequential access Random access
One basic frame rate Flexible frame rate
CIF and QCIF images only Flexible image size
I and P frames only I, P and B frames
MC over 1 frame MC over 1 or more frames
Integer-pel MV accuracy Half-pel MV accuracy
Spatial filtering in the loop No filter
Variable threshold+uniform Quantization matrix
quantization
No GOP structure GOP structure
GOB structure Slice structure

225
H.263/H.263+ & MPEG2

226
Video Codecs: H.263
• Frame-based coding
• Low Bit rate Coding:
– < 64 kbps (typical)

• H.261 coding with improvements


– I/P/B frames
– Additional Image formats: 4CIF, 16CIF

• Suitable for desktop video conferencing over


low-speed links
227
H.263 Baseline Coding Algorithm
• Video Frame Structure
– support sub-QCIF, QCIF, CIF, 4CIF and 16CIF

• Video Coding Tools


– Motion estimation and compensation
• range : [-16,15.5] accuracy : half-pel
– Transform: 8×8 DCT
– Quantization: c_q(m,n) = c(m,n) / (2·Q),  0 ≤ m,n ≤ 7,  where Q is the quantization factor
– Entropy Coding: 3D VLC (LAST,RUN,LEVEL)

• Coding Control
– Intra/Inter switch

230
Advanced Coding Modes in H.263

Unrestricted motion vector mode


• range : [-31.5,31.5]
• Allow MV to point outside the picture boundaries
• Syntax-based arithmetic coding mode
• About 5% savings over VLC
• Advanced prediction mode

Overlapped Block Motion Compensation (OBMC)


• PB-frame mode

I B P B P …

231
H.263+
• Advanced intra coding mode
• Deblocking filter mode
• Slice structure mode
• Supplemental enhancement information mode
• Improved PB-frame mode
• Reference picture selection mode
• Temporal, SNR and Spatial scalability mode
• Reference picture resampling mode
• Reduced resolution update mode
• Independently segmented decoding mode
• Alternative Inter VLC mode
• Modified quantization mode

235
MPEG-2
• MPEG-2: For higher quality video at a bit-rate of more than 4
Mbps.

• Defined seven profiles aimed at different applications


(toolboxes) :
– Simple profile (No B picture),
– Main profile (=MPEG1+interlaced, Does not support scalability),
– SNR scalable profile (allows graceful degradation: quality improvement at the
  same resolution),
– Spatial scalable profile (hierarchical coding : improvement at higher
resolution),
– High profile.
– 4:2:2 Profile,
– Multiview Profile.

244
Video MPEG-2
• Compatible extension of MPEG-1

• Designed for digital TV:


– Optimized for transmission, not storage
– Provides interlaced video (TV) as well as progressive (MPEG-1 was
only progressive)

• According to the values of the sampling parameters used, four levels are defined
  in MPEG-2:
– Low: 352x288 (supports MPEG-1)
– Main: 720x576 (equivalent CCIR 601)
– High-1440: 1440x1152 (HDTV 4:3)
– High: 1920x1152 (HDTV 16:9)
246
Profiles and Levels in MPEG-2
Level       Simple   Main   SNR Scalable   Spatially Scalable   High   4:2:2   Multiview
High                  *                                          *
High 1440             *                     *                    *
Main          *       *      *                                   *      *       *
Low                   *      *

Four Levels in the Main Profile of MPEG-2

Level       Max. Resolution   Max fps   Max pixels/sec   Max coded Data Rate (Mbps)   Application
High        1,920 × 1,152     60        62.7 × 10^6      80                           film production
High 1440   1,440 × 1,152     60        47.0 × 10^6      60                           consumer HDTV
Main        720 × 576         30        10.4 × 10^6      15                           studio TV
Low         352 × 288         30        3.0 × 10^6       4                            consumer tape equiv.

247
Bit rates of Levels and Profiles MPEG-2
Profile:                       Simple    Main      SNR          Spatial      High          4:2:2
                                                   Scalability  Scalability                (Studio)
Subsampling:                   4:2:0     4:2:0     4:2:0        4:2:0        4:2:0/4:2:2   4:2:2

Levels:
High 1920×1152 (HDTV 16:9)               80 Mbps                             100 Mbps
High-1440 1440×1152 (HDTV 4:3)           60 Mbps                60 Mbps      80 Mbps
Main 720×576 (CCIR 601)        15 Mbps   15 Mbps   15 Mbps                   20 Mbps       50 Mbps
Low 352×288 (MPEG-1)                     4 Mbps    4 Mbps

The peak rates are shown under the standard for each combination of profile and level.

248
Five Modes of Predictions
• MPEG-2 defines Frame Prediction and Field Prediction as well
as five prediction modes:

1. Frame Prediction for Frame-pictures:


Identical to MPEG-1 MC-based prediction methods in both P-frames
and B-frames.

2. Field Prediction for Field-pictures:


A macroblock size of 16×16 from Field-pictures is used.

249
Five Modes of Predictions
3. Field Prediction for Frame-pictures:
The top-field and bottom-field of a Frame-picture are treated
separately. Each 16×16 macroblock (MB) from the target Frame-
picture is split into two 16×8 parts, each coming from one field. Field
prediction is carried out for these 16×8 parts.

4. 16×8 MC for Field-pictures:


Each 16×16 macroblock (MB) from the target Field-picture is split into
top and bottom 16×8 halves. Field prediction is performed on each
half. This generates two motion vectors for each 16×16 MB in the P-
Field-picture, and up to four motion vectors for each MB in the B-
Field-picture.

This mode is good for a finer MC when motion is rapid and irregular.
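A small sketch (Python/NumPy, illustrative function names) of how the 16×16 macroblock is split in modes 3 and 4 above:

import numpy as np

def split_for_field_prediction_in_frame_picture(mb):
    # Mode 3: a 16x16 frame macroblock is separated into its two fields,
    # even scanlines (top field) and odd scanlines (bottom field), giving
    # two 16-wide x 8-high parts that are predicted independently.
    return mb[0::2, :], mb[1::2, :]

def split_for_16x8_mc_in_field_picture(mb):
    # Mode 4: a 16x16 macroblock of a field picture is simply cut into an
    # upper and a lower 16-wide x 8-high half, each with its own motion
    # vector (two MVs per MB in a P-field-picture).
    return mb[:8, :], mb[8:, :]

mb = np.arange(256, dtype=np.uint8).reshape(16, 16)
top_field, bottom_field = split_for_field_prediction_in_frame_picture(mb)
upper, lower = split_for_16x8_mc_in_field_picture(mb)
print(top_field.shape, upper.shape)   # (8, 16) (8, 16)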

250
Five Modes of Predictions
5. Dual-Prime for P-pictures:
First, Field prediction from each previous field with the same parity
(top or bottom) is made. Each motion vector mv is then used to derive
a calculated motion vector cv in the field with the opposite parity
taking into account the temporal scaling and vertical shift between
lines in the top and bottom fields. For each MB the pair mv and cv
yields two preliminary predictions. Their prediction errors are
averaged and used as the final prediction error.

This mode mimics B-picture prediction for P-pictures without adopting


backward prediction (and hence with less encoding delay).

This is the only mode that can be used for either Frame-pictures or
Field-pictures.
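A very simplified sketch of the Dual-Prime idea follows; the scaling and rounding are illustrative only, not the normative MPEG-2 computation:

import numpy as np

def derive_opposite_parity_vector(mv, temporal_scale, vertical_shift, dmv=(0, 0)):
    # Scale the transmitted vector mv to the temporal distance of the
    # opposite-parity field, correct for the half-line offset between the
    # top and bottom fields, and add a small transmitted differential dmv.
    cv_y = round(mv[0] * temporal_scale) + vertical_shift + dmv[0]
    cv_x = round(mv[1] * temporal_scale) + dmv[1]
    return cv_y, cv_x

def dual_prime_prediction(pred_same_parity, pred_opposite_parity):
    # The two preliminary field predictions are averaged to form the
    # final prediction for the macroblock.
    return ((pred_same_parity.astype(np.int32)
             + pred_opposite_parity.astype(np.int32) + 1) // 2).astype(np.uint8)

mv = (2, -6)
cv = derive_opposite_parity_vector(mv, temporal_scale=0.5, vertical_shift=1)
p1 = np.full((16, 16), 120, dtype=np.uint8)   # prediction fetched with mv
p2 = np.full((16, 16), 124, dtype=np.uint8)   # prediction fetched with cv
final = dual_prime_prediction(p1, p2)
print(cv, int(final[0, 0]))                   # (2, -3) 122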

251
Supporting Interlaced Video
• MPEG-2 must support interlaced video as well since this is one of
the options for digital broadcast TV and HDTV.

• In interlaced video each frame consists of two fields, referred to


as the top-field and the bottom-field.

– In a Frame-picture, all scanlines from both fields are interleaved to form


a single frame, then divided into 16×16 macroblocks and coded using
MC.

– If each field is treated as a separate picture, then it is called Field-


picture.

252
Audio MPEG-2
• Algorithms:
  – A version backward compatible with MPEG-1 Layers I, II and III
  – An improved compression system, Advanced Audio Coding (AAC): quality comparable to
    MPEG-1 Layer III at 50-70% of the bit rate. Not compatible with MPEG-1.

• Channels:
  – Stereo version compatible with MPEG-1
    • Independent (each channel coded separately)
    • Joint (exploits the redundancy between channels)
  – Support for multi-channel audio (e.g. several languages) and 5.1 (five surround
    channels plus a low-frequency effects channel)

259
MPEG-2 Scalabilities
• The MPEG-2 scalable coding: A base layer and one or more enhancement
layers can be defined — also known as layered coding.

– The base layer can be independently encoded, transmitted and decoded to


obtain basic video quality.

– The encoding and decoding of the enhancement layer is dependent on the


base layer or the previous enhancement layer.

• Scalable coding is especially useful for MPEG-2 video transmitted over
  networks with the following characteristics:
  – Networks with very different bit-rates.
  – Networks with variable bit rate (VBR) channels.
  – Networks with noisy connections.

261
MPEG-2 Scalabilities (Cont’d)
• MPEG-2 supports the following scalabilities:

  1. SNR Scalability — the enhancement layer provides a higher SNR (different quality
     levels); the base/enhancement layer uses a coarse/fine quantizer for the DCT
     coefficients (see the sketch after this list).

  2. Spatial Scalability — the enhancement layer provides a higher spatial resolution
     (different resolutions); the base/enhancement layer carries a low/high spatial
     resolution version of the video.

  3. Temporal Scalability — the enhancement layer provides a higher frame rate
     (different frame rates), allowing decoding at several frame rates.

  4. Hybrid Scalability — combination of any two of the above three scalabilities.

  5. Data Partitioning — the quantized DCT coefficients are split into partitions
     carried separately (separate headers and payloads).

• Limited scalability capabilities: three layers only.
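Below is a minimal two-layer sketch of the SNR scalability idea, assuming simple uniform quantizers and illustrative step sizes (the real MPEG-2 syntax and quantization are more involved):

import numpy as np

def snr_scalable_encode(dct_coeffs, q_base=16, q_enh=4):
    # Base layer: coarse quantizer on the DCT coefficients.
    base = np.round(dct_coeffs / q_base).astype(int)
    base_rec = base * q_base
    # Enhancement layer: fine quantizer on the base-layer reconstruction error.
    enh = np.round((dct_coeffs - base_rec) / q_enh).astype(int)
    return base, enh

def snr_scalable_decode(base, enh=None, q_base=16, q_enh=4):
    rec = base * q_base              # base-quality reconstruction
    if enh is not None:              # add the refinement if it was received
        rec = rec + enh * q_enh
    return rec

coeffs = np.array([312.0, -45.0, 18.0, -7.0, 3.0, 0.0, -1.0, 2.0])
b, e = snr_scalable_encode(coeffs)
print(snr_scalable_decode(b))      # base only: [320 -48 16 0 0 0 0 0]
print(snr_scalable_decode(b, e))   # with enhancement: closer to the original coefficients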

262
Non-Scalable

(Figure: a single non-scalable bit stream delivered to Decoder 1, Decoder 2 and Decoder 3.)


264
Spatial Scalability

(Figure: one spatially scalable bit stream serving Decoder 1 to Decoder 4 at different spatial resolutions.)

265
PSNR Scalability (Quality)

(Figure: one quality-scalable bit stream delivered to Decoder 1, Decoder 2 and Decoder 3 at different quality levels.)


268
Temporal scalability

(Figure: decoding the base layer alone yields frames 0, 4, 8, 12, … at 7.5 Hz;
base + first enhancement layer yields frames 0, 2, 4, 6, 8, … at 15 Hz;
all layers yield frames 0, 1, 2, 3, 4, 5, … at 30 Hz.)
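A small sketch reproducing the layer and frame-rate assignment of this example (illustrative code, not MPEG-2 syntax):

def temporal_layer(frame_index):
    # Assign each frame of a 30 Hz source to one of three temporal layers.
    if frame_index % 4 == 0:
        return 0          # base layer            -> 7.5 Hz
    if frame_index % 2 == 0:
        return 1          # first enhancement     -> 15 Hz
    return 2              # second enhancement    -> 30 Hz

def frames_for_layers(layers, n=16):
    # Frames a decoder can display when it receives the given set of layers.
    return [i for i in range(n) if temporal_layer(i) in layers]

print(frames_for_layers({0}))        # [0, 4, 8, 12]                  -> 7.5 Hz
print(frames_for_layers({0, 1}))     # [0, 2, 4, 6, 8, 10, 12, 14]    -> 15 Hz
print(frames_for_layers({0, 1, 2}))  # [0, 1, 2, ..., 15]             -> 30 Hz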

272
Hybrid Scalability
• Any two of the above three scalabilities can be combined
to form hybrid scalability:
1. Spatial and Temporal Hybrid Scalability.
2. SNR and Spatial Hybrid Scalability.
3. SNR and Temporal Hybrid Scalability.

• Usually, a three-layer hybrid coder will be adopted which


consists of:
– Base Layer,
– Enhancement Layer 1, and
– Enhancement Layer 2.

276
Data Partitioning
• The base partition contains the lower-frequency DCT coefficients; the
  enhancement partition contains the high-frequency DCT coefficients.

• Strictly speaking, data partitioning is not layered coding, since a


single stream of video data is simply divided up and there is no
further dependence on the base partition in generating the
enhancement partition.

• Useful for transmission over noisy channels and for progressive


transmission.
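A minimal sketch of the data-partitioning idea on one block; the break-point name and the simple list representation are illustrative, not the MPEG-2 bit stream syntax:

import numpy as np

def partition_coefficients(zigzag_coeffs, priority_break_point):
    # Split the zigzag-ordered quantized DCT coefficients at the break point:
    # the low-frequency part goes into the base partition, the remainder
    # into the enhancement partition.
    base = zigzag_coeffs[:priority_break_point]
    enhancement = zigzag_coeffs[priority_break_point:]
    return base, enhancement

# 64 quantized coefficients of an 8x8 block, already in zigzag order.
coeffs = np.zeros(64, dtype=int)
coeffs[:10] = [52, -18, 9, 4, -3, 2, 1, -1, 1, 1]

base, enh = partition_coefficients(coeffs, priority_break_point=6)
print(base)                    # [ 52 -18   9   4  -3   2]  -> base partition
print(np.count_nonzero(enh))   # remaining high-frequency nonzero coefficients: 4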

277
Major Differences from MPEG-1
• Better resilience to bit-errors: In addition to Program Stream, a
Transport Stream is added to MPEG-2 bit streams.

• Support of 4:2:2 and 4:4:4 chroma subsampling.

• More restricted slice structure: MPEG-2 slices must start and end in
the same macroblock row. In other words, the left edge of a picture
always starts a new slice and the longest slice in MPEG-2 can have
only one row of macroblocks.

• More flexible video formats: It supports various picture resolutions


as defined by DVD, ATV and HDTV.

278
Major Differences from MPEG-1 (Cont’d)

• Nonlinear quantization — two types of quantizer scales are allowed:

  1. For the first type, the scale is the same as in MPEG-1: an integer in the
     range [1, 31], with scale_i = i.

  2. For the second type, a nonlinear relationship exists, i.e. scale_i ≠ i.
     The i-th scale value is looked up from the following table.

Table: Possible Nonlinear Scale in MPEG-2
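Since the table itself is not reproduced on the slide, the sketch below contrasts the two scale types; the nonlinear values listed only illustrate the shape of the mapping (fine steps at small indices, coarse steps at large ones), the normative values being those of ISO/IEC 13818-2:

# Linear scale (type 1): scale_i = i for i in 1..31.
linear_scale = {i: i for i in range(1, 32)}

# Illustrative nonlinear scale (type 2): closely spaced values at the fine end,
# widely spaced values at the coarse end.
nonlinear_values = [1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16, 18, 20, 22, 24,
                    28, 32, 36, 40, 44, 48, 52, 56, 64, 72, 80, 88, 96, 104, 112]
nonlinear_scale = {i + 1: v for i, v in enumerate(nonlinear_values)}

print(linear_scale[5], nonlinear_scale[5])     # 5 5    (same at the fine end)
print(linear_scale[31], nonlinear_scale[31])   # 31 112 (much coarser at the top)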

279
Other Improvements

                       MPEG-1          MPEG-2
Intra MB DC Coeff.     8 bits          11 bits
Intra MB AC Coeff.     [-256, 255]     [-2048, 2047]
Non-intra MB Coeff.    [-256, 255]     [-2048, 2047]

Finer Quantization of the DCT Coefficients
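As a quick check of the DC-coefficient entry, the number of quantization levels implied by each precision:

for bits in (8, 11):
    print(f"{bits}-bit intra DC precision -> {2 ** bits} quantization levels")
# 8 bits  -> 256 levels   (MPEG-1)
# 11 bits -> 2048 levels  (MPEG-2, allowing much finer DC quantization)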

280
Videoconference
• Interactive communication through audio, video and
data sharing

• It can be:
– Point to point
– Point to multipoint
– Multipoint to multipoint

282
Requirements / Features of the
videoconference
• Compression / decompression in real time.

• 200-400 ms maximum delay.

• Little or no mobility.

• Audio quality normally comparable to telephone quality.

• Need to synchronize audio and video.

• Need for a signalling protocol (call set-up over a connectionless network service).

283
Videoconference Standards
• Videoconferencing systems have been standardized by the ITU-T
  (International Telecommunication Union – Telecommunication
  Standardization Sector) in the H series of recommendations
  (audiovisual and multimedia systems).

• The H.32x recommendations are the videoconferencing standards;
  the 'x' depends on the type of network used.

284
H.32x Standards
Standard   Physical environment   Service Type                              Year of approval
H.320      ISDN                   Circuit, streaming a/v, 128 to 384 kb/s   1990
H.321      ATM                    Circuit
H.322      IsoEthernet            TDM
H.323      Ethernet               Packet, streaming a/v, 14.4 - 512 kb/s    1996
H.324      Analog modem           Circuit

The H.32x recommendations are umbrella standards: each one references a set of existing
standards to specify all the services needed in a videoconference.
e.g., G.711 audio coding
285
H.320 Standard

286
H.323 Standard
• Packet-based multimedia communications systems

287
H.320 & H.323 Standards

H.320 over ISDN:
  – H.261: video coding
  – H.221: binary train (frame) conversion
  – H.243: multipoint
  – H.230 / H.242: signalization and control protocol
  – G.711: 3.1 kHz audio, 64/56 kbps
  – G.722: 7 kHz audio, 64/56/48 kbps
  – G.728: 16 kbps
  – T.120: data protocols

H.323 over IP:
  – H.261/H.263: video coding
  – H.245: control protocol
  – Q.931: call signalization
  – H.225: RAS gatekeeper signalization and packetization
  – G.711: 3.1 kHz audio, 64/56 kbps
  – G.723: 3.1 kHz audio, 5.3 kbps
  – T.120: multimedia communication (data)

288
H.320 & H.323 Standards

                      H.323               H.320
Control
  Call Control        H.225.0             Q.931
  System Control      H.245               H.242
  Multiplexing        H.225.0             H.221
Media
  Audio               G.711, G.722,       G.711, G.722,
                      G.723.1, G.728      G.728
  Video               H.261, H.263        H.261, H.263
  Data                T.120               T.120

289
H.32x audio Formats

Codec     Original bandwidth (kbps)   Compression Ratio   Compressed Bandwidth (kbps)
G.711     64                          1:1                 64
G.722     224                         3.5-4.6:1           48-64
G.723.1   64                          10:1                6.4
G.728     64                          4:1                 16
G.729     64                          8:1                 8
MPEG      706                         3-11:1              64-256

MPEG is not an H.323 audio format; it appears here only for comparison.
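The compression-ratio column follows directly from the original and compressed rates; a quick check (illustrative script):

codecs = {
    # codec: (original kbps, (lowest compressed kbps, highest compressed kbps))
    "G.711":   (64,  (64, 64)),
    "G.722":   (224, (48, 64)),
    "G.723.1": (64,  (6.4, 6.4)),
    "G.728":   (64,  (16, 16)),
    "G.729":   (64,  (8, 8)),
}
for name, (original, (low, high)) in codecs.items():
    # Compression ratio = original rate / compressed rate.
    print(f"{name}: {original / high:.1f}:1 to {original / low:.1f}:1")
# e.g. G.722 gives 3.5:1 to 4.7:1 (rounded to 3.5-4.6:1 in the table),
# G.723.1 gives 10:1, G.728 4:1 and G.729 8:1.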

290
Agenda
• Introduction

• Audio & Video compression principles

• A/V Compression standards

• Conclusion

294
Some Digital Audio Formats
Format                   Sampling Freq. (kHz)   # Channels   Capacity per Channel (kb/s)   Application

Low delay:
PCM (G.711)              8                      1            64                            Telephony
ADPCM (G.721)            8                      1            32                            Telephony
SB-ADPCM (G.722)         16                     1            48/56/64                      Videoconferencing
MP-MLQ (G.723.1)         8                      1            6.3/5.3 variable              Internet telephony
ADPCM (G.726)            8                      1            16/24/32/40                   Telephony
E-ADPCM (G.727)          8                      1            16/24/32/40                   Telephony
LD-CELP (G.728)          8                      1            16                            Telephony / Videoconf.
CS-ACELP (G.729)         8                      1            8                             Internet telephony
RPE-LTP (GSM 06.10)      8                      1            13.2                          GSM telephony
CELP (FS 1016)           8                      1            4.8
LPC-10E (FS 1015)        8                      1            2.4

CD-DA / DAT              44.1/48                2            705.6/768                     Hi-Fi audio

High delay:
MPEG-1 Layer I           32/44.1/48             2            192-256 variable
MPEG-1 Layer II          32/44.1/48             2            96-128 variable
MPEG-1 Layer III (MP3)   32/44.1/48             2            64 variable                   Hi-Fi Internet
MPEG-2 AAC               32/44.1/48             5.1          32-44 variable                Hi-Fi Internet

295
Digital Video Formats
Video Format   Y Size        Color Sampling   Frame Rate (Hz)   Raw Data Rate (Mbps)

HDTV (over air, cable, satellite; MPEG-2 video, 20-45 Mbps):
SMPTE 296M     1280x720      4:2:0            24P/30P/60P       265/332/664
SMPTE 295M     1920x1080     4:2:0            24P/30P/60I       597/746/746

Video production (MPEG-2, 15-50 Mbps):
BT.601         720x480/576   4:4:4            60I/50I           249
BT.601         720x480/576   4:2:2            60I/50I           166

High-quality video distribution (DVD, SDTV; MPEG-2, 4-10 Mbps):
BT.601         720x480/576   4:2:0            60I/50I           124

Intermediate-quality video distribution (VCD, WWW; MPEG-1, 1.5 Mbps):
SIF            352x240/288   4:2:0            30P/25P           30

Video conferencing over ISDN/Internet (H.261/H.263, 128-384 kbps):
CIF            352x288       4:2:0            30P               37

Video telephony over wired/wireless modem (H.263, 20-64 kbps):
QCIF           176x144       4:2:0            30P               9.1
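The raw-data-rate column follows from the resolution, the chroma subsampling and the frame rate; a small sketch (illustrative helper function):

def raw_rate_mbps(width, height, frame_rate, chroma="4:2:0", bits=8):
    # Raw (uncompressed) rate in Mbit/s: luma samples plus chroma samples
    # scaled by the subsampling factor.
    samples_per_pixel = {"4:4:4": 3.0, "4:2:2": 2.0, "4:2:0": 1.5}[chroma]
    return width * height * samples_per_pixel * bits * frame_rate / 1e6

print(round(raw_rate_mbps(352, 288, 30, "4:2:0"), 1))    # 36.5  -> the table's 37 (CIF)
print(round(raw_rate_mbps(176, 144, 30, "4:2:0"), 1))    # 9.1   (QCIF)
print(round(raw_rate_mbps(720, 576, 25, "4:2:2"), 1))    # 165.9 -> the table's 166 (BT.601 4:2:2)
print(round(raw_rate_mbps(1280, 720, 60, "4:2:0"), 1))   # 663.6 -> the table's 664 (SMPTE 296M, 60P)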

296
Compressed video standard resolutions

Format       SQCIF     QCIF       CIF        4CIF or SCIF        16CIF 4:3               16CIF 16:9
Resolution   128x96    176x144    352x288    704x576 / 720x576   1408x1152 / 1440x1152   1920x1152
H.261                  Standard   Op.
H.263        Op.                  Op.
MPEG-4
MPEG-1
MPEG-2                            Low        Principal           High 1440               High

297
Video compression formats

System      Spatial Compression (DCT)   Temporal Compression   Complexity   Efficiency   Delay
M-JPEG      Yes                         No                     Medium       Low          Very small
H.261       Yes                         Limited (I & P)        High         Medium       Small
MPEG-1/2    Yes                         Extended (I, P & B)    Very high    Large        High
H.263       Yes                         Extended (I, P & B)    Enormous     Large        Medium
MPEG-4      Yes                         Extended (I, P & B)    Enormous     Large        High

298
Video compression formats Bit rates

Standard/Format   Typical Bandwidth   Compression Ratio
CCIR 601          170 Mbps            1:1 (reference)
M-JPEG            10-20 Mbps          7-27:1

Low delay:
H.261             64-2000 kbps        24:1
H.263             28.8-768 kbps       50:1

High delay:
MPEG-1            0.4-2.0 Mbps        100:1
MPEG-2            1.5-60 Mbps         30-100:1
MPEG-4            28.8-500 kbps       100-200:1

299
Video compression formats
Type               Method   Format                        Original    Compressed
Video conference   H.261    176x144 or 352x288            2-36 Mbps   64-1544 kbps
                            @ 10-30 fr/sec
Full motion        MPEG-2   720x480 @ 30 fr/sec           249 Mbps    2-6 Mbps
HDTV               MPEG-2   1920x1080 @ 30 fr/sec         1.6 Gbps    19-38 Mbps

300
Agenda
• Introduction

• Audio & Video compression principles

• A/V Compression standards

• Conclusion

303
References
• Yun Q. Shi, Huifang Sun, 2008. Image and Video Compression for Multimedia Engineering:
Fundamentals, Algorithms, and Standards. CRC Press.
• Gonzalez, Woods, 2008. Digital Image Processing. Prentice Hall.
• Jae-Beom Lee, Hari Kalva, 2008. The VC-1 and H.264 Video Compression Standards for
Broadband Video Services. Springer.
• H.R. Wu & K.R. Rao, 2006. Digital Video Image Quality and Perceptual Coding. Taylor & Francis
Group, LLC.
• Khalid Sayood, 2005. An introduction to data compression. Morgan Kaufmann Publishers.
• I.E.G. Richardson, 2003. H.264 and MPEG-4 Video Compression. Video Coding for next
generation multimedia. John Wiley & Sons, Ltd.
• Richardson, 2002. Video Codec Design. John Wiley & Sons.
• John Watkinson, 2001. The MPEG Handbook: MPEG-1, MPEG-2, MPEG-4. Focal Press.
• Ghanbari, 1999. Video coding: an introduction to standard codecs. IEE Press.
• Riley and Richardson, 1997. Digital Video Communications. pub. Artech House.
• V. Bhaskaran & K. Konstantinides, 1996. Image and Video Compression Standards: Algorithms and
Architectures. Kluwer Academic Publishers.
• Netravali, A N and Haskell, B G, 1995. Digital pictures: Representation, Compression and
Standards. 2nd Edition, Plenum Press.

304
References
• www.chiariglione.org/mpeg/
• http://www.mpeg.org
• http://jura1.eng.rgu.ac.uk/ (Digital Video pages)
• http://www.vcodex.com

305
