B. Prabhakaran 1
MPEG Audio Features
MPEG Audio Layers
Layer 1 allows a maximal bit rate of 448 kbit/s.
Layer 2 allows 384 kbit/s.
Layer 3 allows 320 kbit/s.
MPEG Audio Coding
MPEG Audio Coding…
Uncompressed audio is transformed into 32 non-interleaved sub-bands by a filter bank; a Fast Fourier Transform (FFT) is used for the spectral analysis.
The amplitude and noise level of the signal in each sub-band are determined using a psycho-acoustic model.
Next, the transformed signals are quantized.
MPEG audio layers 1 and 2 are PCM-encoded. Layer
3: Huffman coded.
Types of channels
Single channel, two independent channels, or one stereo
channel.
Stereo channels: processed independently or jointly. Joint stereo exploits the redundancy between the two channels.
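As an illustration of how joint stereo can exploit inter-channel redundancy, the sketch below uses mid/side coding, i.e., transmitting the sum and difference of the two channels instead of left and right. This is a simplified, hypothetical illustration of the idea, not the exact MPEG bit-stream procedure.

```python
# Mid/side joint stereo sketch (illustrative, not the MPEG bit stream).
# When left and right are similar, the side channel is near zero and
# cheap to code; the transform is perfectly invertible.

def ms_encode(left, right):
    """Transform L/R samples into mid (average) and side (half difference)."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Recover the original L/R samples from mid/side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```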
MPEG-2 Audio
Different channels:
5 full bandwidth channels (left, right, center, and two
surround channels).
Additional low frequency enhancement (LFE) channel.
Up to seven commentary or multilingual channels.
Sampling rates defined for MPEG-2 audio include 16 kHz, 22.05 kHz, and 24 kHz.
JPEG Compression
Modes of compression
Lossy Sequential DCT-based Mode: also known as the baseline process; must be supported by every JPEG implementation.
Expanded Lossy DCT-based Mode: Provides an
additional set of further enhancements to the
baseline mode.
Lossless Mode: Allows perfect reconstruction of
the original image; lower compression ratio.
Hierarchical Mode: consists of images with
different resolutions generated using the methods
described above.
Image Preparation
Source image can have up to 255 components (instead of only the three components Y, U, and V).
E.g., components of an image can be the color
components (Red R, Green G, Blue B) or luminance
components (Luminance Y, chrominance U and V).
Each component of the image may have the same or
different resolution, in terms of the number of pixels
in the horizontal and vertical axis.
Image Preparation …
Each pixel is represented by p bits, with values in the range 0 to 2^p - 1.
The value of p depends on the mode of compression.
Lossy modes use either 8 or 12 bits per pixel.
Lossless modes use 2 up to 12 bits per pixel.
All pixels of all components within the same image are coded
with the same number of bits.
An application can use a different number of bits per pixel, provided the image is first transformed to one of the well-defined bit depths of the JPEG standard.
Non-Interleaved Ordering
Lossy Sequential Mode
The value of each pixel is first shifted into the range -128 to 127, with zero as the center; the shifted block is then transformed by the Discrete Cosine Transform (DCT).
For a block of 8 X 8 pixels, the shifted values are denoted Sxy, 0 ≤ x ≤ 7, 0 ≤ y ≤ 7.
These values are transformed using the Forward DCT (FDCT).
The FDCT has to be evaluated 64 times per block, resulting in 64 coefficients Suv per block.
Cosine expressions depend on x, y, u, and v, but not
on Sxy. Computation can be optimized to take
advantage of this fact.
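The FDCT described above can be sketched directly from its definition. The following is a straightforward, unoptimized Python version of the 8 X 8 JPEG forward DCT (a real codec would precompute the cosine terms, as noted above):

```python
import math

def fdct_8x8(s):
    """JPEG forward DCT: s[x][y] are level-shifted samples of an 8x8
    block; returns the 64 coefficients S[u][v]."""
    def c(k):
        # Normalization term: 1/sqrt(2) for the zero-frequency index.
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

    S = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            acc = 0.0
            for x in range(8):
                for y in range(8):
                    # Cosine terms depend only on x, y, u, v, not on s[x][y],
                    # so they could be tabulated once.
                    acc += (s[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / 16)
                            * math.cos((2 * y + 1) * v * math.pi / 16))
            S[u][v] = 0.25 * c(u) * c(v) * acc
    return S
```

For a constant block (one predominant color), only the DC-coefficient S00 is non-zero, which is what the quantization phase then exploits.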
Lossy Sequential Mode ..
• FDCT maps the values from the spatial domain to the frequency domain.
• Each coefficient of Suv can be regarded as a two-dimensional frequency.
• Coefficient S00, the DC-coefficient, corresponds to the lowest frequency in both dimensions. It also describes the fundamental color of the 8 X 8 pixel block.
• Rest of the coefficients known as AC-coefficients.
Quantization Phase
FDCT coefficients in a block may have low or zero
values, if the block has only one predominant color.
Entropy encoding is used for further compression
Each of the 64 coefficient values is scaled by a factor Q, the quantization factor.
E.g., the quantized value of the DC-coefficient, SQ00, is :
SQ00 = S00 / Q.
In most cases, quantization is not done in a uniform
manner.
Low frequencies of the FDCT coefficients describe the
boundaries among regions in the image being compressed.
If low frequency coefficients are quantized in a very coarse
manner (i.e., with high values of Q), boundaries in the
reconstructed image may not be as sharp.
Low frequency coefficients quantized in finer manner
(i.e., with lower values of Q) than higher frequency
ones.
Table with 64 entries used for representing the
values of the quantization factor Q, for each of the 64
FDCT coefficients.
Quantization of each coefficient is: SQuv = Suv / Quv, where Quv is the quantization factor for the uv-th coefficient.
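The table-driven quantization step can be sketched as follows (a simplified illustration assuming rounding to the nearest integer, as in the baseline process):

```python
def quantize(S, Q):
    """Quantize an 8x8 block of FDCT coefficients S using the 64-entry
    quantization table Q: SQ[u][v] = round(S[u][v] / Q[u][v])."""
    return [[int(round(S[u][v] / Q[u][v])) for v in range(8)]
            for u in range(8)]
```

With finer factors (small Quv) for low frequencies and coarser factors for high frequencies, most high-frequency coefficients quantize to zero while the region boundaries stay sharp.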
Entropy Encoding
Lower-frequency AC-coefficients have higher values than higher-frequency ones (which are usually very small or zero).
Hence, zig-zag ordering of AC-coefficients produces
a sequence where similar values will be together.
Such a sequence is highly suitable for efficient
entropy encoding.
Next step: run-length encoding of zero values. JPEG
specifies Huffman and Arithmetic encoding.
(Arithmetic encoding is protected by a patent).
For the lossy JPEG mode (i.e., the baseline process), only Huffman encoding is allowed.
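A minimal sketch of the zig-zag ordering and the run-length encoding of zero-valued AC-coefficients (the final Huffman/arithmetic coding of the resulting pairs is omitted):

```python
def zigzag_order():
    """Return the 64 (u, v) index pairs of an 8x8 block in zig-zag order:
    diagonals of constant u+v, alternating traversal direction."""
    return sorted(((u, v) for u in range(8) for v in range(8)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def rle_ac(block):
    """Run-length encode the 63 AC-coefficients of a quantized block as
    (zero_run, value) pairs, ending with a (0, 0) end-of-block marker."""
    seq = [block[u][v] for (u, v) in zigzag_order()][1:]  # skip the DC term
    pairs, run = [], 0
    for coeff in seq:
        if coeff == 0:
            run += 1          # count consecutive zeros
        else:
            pairs.append((run, coeff))
            run = 0
    pairs.append((0, 0))      # end-of-block
    return pairs
```

Because the zig-zag sequence pushes the (mostly zero) high-frequency coefficients to the end, long zero runs collapse into very few pairs.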
Image Reconstruction
Decompress the data in Huffman/Arithmetic coded
form.
Dequantization is then performed: Ruv = SQuv X Quv.
Must use the same table as the one used in the
quantization process.
Dequantized DCT coefficients are then subject to
IDCT (Inverse DCT).
If FDCT and IDCT could determine the values with full precision, reconstruction would be lossless (assuming lossless quantization).
However, precision is restricted and hence the
reconstruction process is lossy.
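The dequantization and inverse transform steps can be sketched as below (same unoptimized style as the forward direction; the decoder must use the encoder's quantization table):

```python
import math

def dequantize(SQ, Q):
    """R[u][v] = SQ[u][v] * Q[u][v], using the same table Q as the encoder."""
    return [[SQ[u][v] * Q[u][v] for v in range(8)] for u in range(8)]

def idct_8x8(R):
    """JPEG inverse DCT: reconstruct the 8x8 level-shifted samples s[x][y]
    from the dequantized coefficients R[u][v]."""
    def c(k):
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

    s = [[0.0] * 8 for _ in range(8)]
    for x in range(8):
        for y in range(8):
            acc = 0.0
            for u in range(8):
                for v in range(8):
                    acc += (c(u) * c(v) * R[u][v]
                            * math.cos((2 * x + 1) * u * math.pi / 16)
                            * math.cos((2 * y + 1) * v * math.pi / 16))
            s[x][y] = 0.25 * acc
    return s
```

The loss comes from the rounding in quantization, not from the transform pair itself: a block whose coefficients survived quantization exactly is reconstructed (up to floating-point precision).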
Expanded Lossy Mode
Progressive representation realized by expansion of
quantization.
Expansion is done by adding an output buffer to the quantizer that stores all coefficients of the quantized DCT.
Encoding process follows either:
Encoder processes DCT-coefficients of low frequencies
successively (low frequencies describe border outlines).
Hence, encoding the low-frequency coefficients first decodes the boundaries of the various objects successively.
Another approach: use all the DCT-coefficients in the
encoding process but single bits are differentiated according
to their significance (i.e., most significant bit first and then
the least significant bits are encoded).
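The second approach (successive approximation) can be illustrated by splitting coefficient values into bit planes and sending the most significant plane first. A simplified sketch, assuming non-negative n-bit magnitudes (real JPEG handles signs and point transforms, which are omitted here):

```python
def bit_planes(coeffs, n):
    """Split non-negative n-bit coefficient values into bit planes,
    most significant plane first."""
    return [[(c >> b) & 1 for c in coeffs] for b in range(n - 1, -1, -1)]

def refine(planes):
    """Progressively recombine planes: each additional plane received
    doubles the precision of the approximation."""
    value = [0] * len(planes[0])
    for plane in planes:
        value = [2 * v + bit for v, bit in zip(value, plane)]
    return value
```

Decoding only the first few planes already yields a coarse version of every coefficient, which is exactly the progressive effect described above.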
Progressive Spectral Selection
DCT coefficients are grouped into several spectral
bands.
Low-frequency DCT coefficients sent first, and then
higher-frequency coefficients. E.g.,
Band 1: DC coefficients only
Band 2: AC1 and AC2 coefficients
Band 3: AC3, AC4, AC5, and AC6 coefficients
Band 4: AC7 … AC63 coefficients
[Figure: spectral selection — bit planes (n-1 down to 0) of the zig-zag coefficients DC, AC1, AC2, ..., AC63, grouped into bands 1-4]
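The example grouping above can be sketched as a simple partition of one block's zig-zag-ordered coefficients (band boundaries taken from the example, not mandated by the standard):

```python
def spectral_bands(zigzag_coeffs):
    """Split one block's 64 zig-zag-ordered coefficients into the four
    example bands: DC; AC1-AC2; AC3-AC6; AC7-AC63."""
    cuts = [(0, 1), (1, 3), (3, 7), (7, 64)]
    return [zigzag_coeffs[a:b] for a, b in cuts]
```

Each band is then transmitted as its own scan, low-frequency bands first.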
Combined Progressive …
Combines both spectral & successive approximations
Scan 1: DC band 1; Scan 2: AC band 1;
Scan 3: AC band 2; Scan 4: AC band 3; Scan 5: AC band 4;
Scan 6: AC band 5; Scan 7: DC band 2; Scan 8: AC band 6
[Figure: combined progressive — the (bit plane) x (zig-zag coefficient) plane, DC through AC63, partitioned into the eight scans listed above]
Lossless Mode
Hierarchical JPEG
Progressive JPEG at multiple resolutions
[Figure: pyramid of the same image at increasing resolutions: Image Res 1, Image Res 2, ..., Res n]
Video Compression
Asymmetric Applications
Compression process is performed only once and at the
time of storage. E.g., on-Demand servers (such as Video-
on-Demand and News-on-Demand) and electronic
publishing (travel guides, shopping guides, and educational
materials).
Symmetric Applications
Equal use of compression and decompression process.
E.g., information generated through video cameras or by
editing pre-recorded material.
Video conferencing, video telephone applications involve
generation, compression, and decompression of information
generated through video cameras.
Desktop video publishing applications require edit
operations on pre-recorded material.
Desirable Features for Video
Compression
Random Access
Fast Forward / Rewind
Reverse Playback
Audio-Visual Synchronization
Robustness to Errors
Coding / Decoding Delay
Editability
Format Flexibility
Cost Tradeoffs
MPEG Standard
MPEG-Video: compression of video signals at about
1.5 Mbits per second.
MPEG-Audio: compression of digital audio signal at
the rates of 64, 128, and 192 kbps per channel.
MPEG-System deals with the issues relating to
audio-visual synchronization.
Also handles multiplexing of multiple compressed
audio and video bit streams.
MPEG Video
Primary aim of MPEG-Video is to compress a video
signal to a bit rate of about 1.5 Mbits/s with an
acceptable quality.
MPEG is often termed a generic standard, implying
that it is independent of a particular application.
Benefited from the following existing standards:
JPEG
H.261: standard was already available during MPEG
standardization. MPEG technique is more advanced than
H.261.
MPEG Video
Two nearly conflicting requirements.
Random access requirements for MPEG video are best
satisfied with pure intra-frame coding.
High compression rates are not possible unless a fair
amount of inter-frame coding is done.
Intra-frame coding: targets spatial redundancy
reduction.
Inter-frame coding: targets temporal redundancy
reduction.
Delicate balance between inter- and intra-frame
coding.
Temporal Redundancy Reduction
Temporal redundancies present in video when
subsequent frames carry similar but slightly varying
content.
E.g., video frames of a person walking in a street show a
gradual variation in the contents based on the walking speed
of the person.
The most widely used technique for achieving temporal redundancy reduction is motion compensation.
Motion information comprises the amplitude and the direction of displacement of the contents.
MPEG uses block-based motion compensation
technique.
Temporal Redundancy Reduction
Significant cost associated with motion information
coding.
Hence, 16 X 16 pixel blocks are chosen as motion compensation units (MCUs), called macro-blocks.
Two types of motion compensation are applied over
these macro-blocks.
Causal predictors (Predictive coding): i.e., generate the
contents of a subsequent frame based on the motion
information and the contents of the current one.
Non-causal predictors (interpolative coding): Frame coded
based on both a previous and a successive frame.
Interpolative Coding
Predictive Coding
MPEG Frame Sequence
Motion Estimation
[Figure: motion estimation — regions A, B, and C tracked across the previous, current, and future frames]
Motion Estimation …
MPEG does not specify the motion estimation
technique.
Block-matching techniques are likely to be used. Goal:
Estimate the motion of an n X m block in the present frame with respect to a previous or a future frame.
The block is compared with corresponding blocks within a search area G of size (m + 2p) X (n + 2p) in the previous/future frame.
Typical values: n = m = 16 (16 X 16 pixels) and parameter p = 6.
[Figure: an n X m block extended by p pixels on each side, forming the search area G]
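Since MPEG leaves the estimation technique open, one common choice is an exhaustive full search over the area G, minimizing the sum of absolute differences (SAD). A small sketch with toy n and p values for brevity (square n x n blocks; frames as nested lists):

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def full_search(cur, ref, bx, by, n=4, p=2):
    """Exhaustively match the n x n block at (bx, by) of the current
    frame against every candidate within +/- p pixels in the reference
    frame; return the best motion vector (dx, dy)."""
    target = [row[by:by + n] for row in cur[bx:bx + n]]
    best, best_mv = float('inf'), (0, 0)
    for dx in range(-p, p + 1):
        for dy in range(-p, p + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + n > len(ref) or y + n > len(ref[0]):
                continue  # candidate block falls outside the frame
            cand = [row[y:y + n] for row in ref[x:x + n]]
            cost = sad(target, cand)
            if cost < best:
                best, best_mv = cost, (dx, dy)
    return best_mv
```

The full search examines (2p + 1)^2 candidates per macro-block, which motivates the faster block-matching approaches discussed next.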
Block Matching Approaches
MPEG Layers
Sequence Layer: context unit
Group of Pictures Layer: random access unit
Picture Layer: primary coding unit
Slice Layer: resynchronization unit
Macro-block Layer: motion compensation unit
Block Layer: DCT unit