
Digital Audio Compression

4/5/2004

Nguyen Chan Hung - Hanoi University of Technology

MPEG Audio: Specifications

MPEG-1 (ISO/IEC 11172-3) provides:


Single-channel ('mono') and two-channel ('stereo' or 'dual mono')
coding of digitized sound waves at 32, 44.1, and 48 kHz
sampling rates.
The predefined bit-rates range from 32 to 448 kbit/s for Layer I,
from 32 to 384 kbit/s for Layer II, and from 32 to 320 kbit/s for
Layer III.
MPEG-2 BC (ISO/IEC 13818-3) provides:
A backwards compatible (BC) multi-channel extension to
MPEG-1


Up to 5 main channels plus a 'low-frequency enhancement' (LFE)
channel can be coded.
The bit-rate range is extended up to about 1 Mbit/s.

An extension of MPEG-1 towards lower sampling rates (16,
22.05, and 24 kHz) for bitrates from 32 to 256 kbit/s (Layer I)
and from 8 to 160 kbit/s (Layer II & Layer III).

MPEG Audio: Specifications (2)

MPEG-2 AAC (ISO/IEC 13818-7) provides:

A very high-quality audio coding standard for 1 to 48 channels at sampling
rates of 8 to 96 kHz, with multichannel, multilingual, and multiprogram
capabilities.
AAC works at bitrates from 8 kbit/s for a monophonic speech signal up to
in excess of 160 kbit/s per channel for very-high-quality coding that
permits multiple encode/decode cycles.
Three profiles of AAC provide varying levels of complexity and scalability.

MPEG-4 (ISO/IEC 14496-3) provides:

Coding and composition of natural and synthetic audio objects,
Scalability of the bitrate of an audio bitstream,
Scalability of encoder or decoder complexity,
Structured Audio: a universal language for score-driven sound synthesis,
TTSI: an interface for text-to-speech conversion systems.

MPEG-7 (ISO/IEC 15938) will provide


Standardized descriptions and description schemes of audio structures
and sound content,
A language to specify such descriptions and description schemes.


Related specifications

MUSICAM

Masking pattern adapted Universal Sub-band Integrated
Coding And Multiplexing
Designed to be suitable for DAB (Digital Audio Broadcasting)

ASPEC

Adaptive Spectral Perceptual Entropy Coding
Designed for high degrees of compression to allow audio
transmission on ISDN

NICAM 728

Used for European PAL television audio

Dolby AC-3

Designed for ATSC Digital TV



Background of audio compression

Audio compression takes advantage of two facts:

First, in typical audio signals, not all frequencies are
simultaneously present.
Second, because of the phenomenon of masking, human
hearing cannot perceive every detail of an audio signal.

Audio compression splits the audio spectrum into bands by
filtering or transforms, and includes less data when describing
bands in which the level is low.
Where masking prevents or reduces audibility of a particular
band, even less data needs to be sent.


Background of audio compression (2)

Audio compression is relatively harder than video compression
because of the acuity of hearing.

1. Masking:
Masking only works properly when the masking and the masked
sounds coincide spatially.
Spatial coincidence is always the case in mono recordings but
not in stereo recordings, where low-level signals can still be
heard if they are in a different part of the soundstage.
Consequently, in stereo and surround-sound systems, a lower
compression factor is allowable for a given quality.

2. Loudspeaker quality:
Delayed resonances in poor loudspeakers actually mask
compression artifacts.
Testing a compressor with poor speakers gives a false result:
signals that are apparently satisfactory may be disappointing
when heard on good equipment.


The characteristics of human hearing

The top figure shows that the threshold of hearing is a function
of frequency.
Naturally, the greatest sensitivity is in the speech range.
The bottom figure shows the hearing threshold in the presence
of a single tone.

Note that the threshold is raised for tones at higher frequencies,
and to some extent at lower frequencies (the masking effect).

A complex input spectrum, such as music, raises the threshold at
nearly all frequencies.
As a result, the hiss from an analog audio cassette is only audible
during quiet passages in music.


The characteristics of human hearing (2)

A sound must be present for at least about 1 millisecond before
it becomes audible.
Because of this slow response, masking can still take place even
when the two signals involved are not simultaneous.
Forward and backward masking occur when the masking sound
continues to mask sounds at lower levels before and after the
masking sound's actual duration.


Masking

Masking raises the threshold of hearing; compressors take
advantage of this effect by raising the noise floor, which allows
the audio waveform to be expressed with fewer bits.
The noise floor can only be raised at frequencies at which there
is effective masking.
To maximize effective masking, it is necessary to split the audio
spectrum into different frequency bands to allow introduction of
different amounts of companding and noise in each band.


MPEG Audio: General encoder model


[Block diagram: Input -> Sub-band Filter -> Bit Allocation -> Bit-stream
Generation -> Output, with a Compute Masking block driving the Bit
Allocation.]


MPEG Audio encoding algorithm

Use sub-band filters to divide the audio signal into 32 frequency
sub-bands that approximate the 32 critical bands.
Determine the amount of masking for each band caused by
nearby bands using the psychoacoustic model.
If the power in a band is below the masking threshold, ignore it.
Otherwise, determine the number of bits needed to represent the
coefficient such that the noise introduced by quantization is
below the masking effect (1 bit ≈ 6 dB).
Generate the bitstream.
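As a rough illustration of that per-band decision (not from the slides; a minimal sketch assuming a uniform quantizer gains roughly 6 dB of signal-to-noise ratio per bit):

```python
import math

def bits_for_band(level_db: float, mask_db: float, max_bits: int = 15) -> int:
    """Illustrative only: 0 bits if the band is fully masked, otherwise just
    enough bits that quantization noise (~6 dB per bit) stays below the mask."""
    if level_db <= mask_db:
        return 0                               # inaudible band: send nothing
    snr_needed = level_db - mask_db            # noise must sit below the mask
    return min(math.ceil(snr_needed / 6), max_bits)

# A 35 dB band masked at 15 dB needs ceil(20 / 6) = 4 bits.
print(bits_for_band(35, 15))
```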


MPEG Audio: Coding example


After analysis, the levels of the first 16 of the 32 bands are:

Band         1    2    3    4    …    7    8    9   10   11    …   16
Level (dB)   0    8   12   10    …   10   60   35   20   15    …


The level of the 8th band is 60 dB. If it gives a masking of 12 dB
in the 7th band and 15 dB in the 9th band, then:

Level in the 7th band is 10 dB ( < 12 dB ), so ignore it.
Level in the 9th band is 35 dB ( > 15 dB ), so encode it.
Can encode with up to 2 bits (= 12 dB) of quantization error.


Sub-band coding (SBC) - Companding

The Figure shows a band-splitting compandor.


The band-splitting filter is a set of narrow-band,
linear-phase filters that overlap and all have the
same bandwidth.
The output in each band consists of samples
representing a waveform.
In each frequency band, the audio input is
amplified up to maximum level prior to
transmission.
Afterwards, each level is returned to its correct
value.
Noise picked up in the transmission is reduced in
each band.
If the noise reduction is compared with the
threshold of hearing, it can be seen that greater
noise can be tolerated in some bands because
of masking.
Consequently, in each band after companding,
it is possible to reduce the wordlength of
samples.
This technique achieves a compression
because the noise introduced by the loss of
resolution is masked.


MPEG Audio Layer I

The figure shows a simplified band-splitting coder used in MPEG Layer I.

The digital audio input is fed to a band-splitting filter that divides the
spectrum of the signal into a number of bands (32 bands).
The time axis is divided into blocks of equal length.
In MPEG Layer I, this is 384 input samples, so in the output of the filter
there are 12 samples in each of the 32 bands.
Within each band, the level is amplified by multiplication to bring the
level up to maximum.
The gain required is constant for the duration of a block.
A single scale factor is transmitted with each block for each band in
order to allow the process to be reversed at the decoder.
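To make the 12-sample block and scale factor idea concrete, here is a minimal sketch (not from the slides; the array shape and normalization by the block peak are illustrative assumptions, whereas the real coder picks the scale factor from a fixed table and sends a 6-bit index):

```python
import numpy as np

def scale_blocks(subband_samples: np.ndarray):
    """subband_samples: shape (32, 12) - one Layer I block: 12 consecutive
    samples from each of the 32 sub-bands. Returns (scale_factors, normalized),
    where each band is divided by its block peak so samples fit [-1, 1]."""
    peaks = np.abs(subband_samples).max(axis=1)    # one value per band
    peaks = np.where(peaks == 0.0, 1.0, peaks)     # avoid division by zero
    return peaks, subband_samples / peaks[:, None]

# The decoder multiplies by the transmitted scale factor to restore the level.
sf, norm = scale_blocks(np.random.randn(32, 12))
```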


MPEG Audio Layer I (cont.)

The filter bank output is also analyzed to determine the


spectrum of the input signal.
This analysis drives a masking model that determines the
degree of masking that can be expected in each band.
The more masking available, the less accurate the samples in
each band can be.
The sample accuracy is reduced by requantizing to reduce
wordlength.
This reduction is also constant for every word in a band, but
different bands can use different wordlengths.
The wordlength needs to be transmitted as a bit allocation
code for each band to allow the decoder to deserialize the bit
stream properly.


MPEG Layer I audio bit stream

The top figure shows an MPEG Layer I audio bit stream, which includes:

Synchronizing pattern and the header.
32 bit allocation codes of four bits each.
These codes describe the wordlength of samples in each sub-band.
32 scale factors used in the companding of each band.
These scale factors determine the gain needed in the decoder to
return the audio to the correct level.
Audio data in each band.
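A rough bit-budget tally for one such Layer I frame (not from the slides; the 32-bit header, optional 16-bit CRC and 6-bit scale factors are assumptions based on the usual layout):

```python
def layer1_frame_bits(wordlengths, crc=False):
    """wordlengths: 32 per-band sample wordlengths (0 = band not transmitted)."""
    assert len(wordlengths) == 32
    bits = 32 + (16 if crc else 0)             # header (+ optional CRC)
    bits += 32 * 4                             # one 4-bit allocation code per band
    bits += 32 * 6                             # 32 scale factors of 6 bits each
                                               # (the real codec skips empty bands)
    bits += sum(12 * b for b in wordlengths)   # 12 samples per band per frame
    return bits

# e.g. keeping 8 bands at 8 bits each: 32 + 128 + 192 + 768 = 1120 bits
print(layer1_frame_bits([8] * 8 + [0] * 24))
```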



MPEG Layer I decoder

The synchronization pattern is detected by the timing generator,
which deserializes the bit allocation and scale factor data.
The bit allocation data then allows deserialization of the
variable-length samples.
The requantizing is reversed, and the companding is reversed by
the scale factor data to put each band back to the correct level.
These 32 separate bands are then combined in a combiner filter,
which produces the audio output.
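A minimal per-band sketch of those last two steps (illustrative only; the real dequantizer uses tables from the standard, for which this simple midrise mapping stands in):

```python
def decode_band(codes, wordlength, scale_factor):
    """Reverse requantization and companding for one sub-band block.
    codes: 12 unsigned integers of `wordlength` bits each."""
    steps = 1 << wordlength                    # number of quantizer levels
    # map each code back into (-1, 1), then restore the original band level
    return [((2 * c + 1) / steps - 1.0) * scale_factor for c in codes]

# The 32 reconstructed bands then feed the synthesis (combiner) filter bank.
print(decode_band([0, 3, 7], 3, 0.5))
```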


MPEG Audio: The concept of Layers


Compression ratios (original bitrate is 1.4 Mbps for CD-quality audio):

1:4          by Layer I   (corresponds to 384 kbps for a stereo signal),
1:6...1:8    by Layer II  (corresponds to 256...192 kbps for a stereo signal),
1:10...1:12  by Layer III (corresponds to 128...112 kbps for a stereo signal).

Three layers in MPEG audio: Layer I, II, III

Basic model is similar.
CODEC complexity increases with each layer.
A decoder for a higher layer can decode streams of the lower layers
(e.g. a Layer III decoder can decode a Layer II stream, etc.).
Psychoacoustic model is used to determine bit allocation to each
subband.
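A quick check of where those ratios come from (not from the slides; it assumes the 1.4 Mbps source is 16-bit stereo at 44.1 kHz, i.e. about 1.41 Mbit/s):

```python
cd_bitrate = 2 * 16 * 44100                        # ~1.41 Mbit/s for CD audio
for layer, kbps in [("Layer I", 384), ("Layer II", 192), ("Layer III", 128)]:
    print(f"{layer}: about 1:{cd_bitrate / (kbps * 1000):.0f}")
# -> Layer I: about 1:4, Layer II: about 1:7, Layer III: about 1:11
```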


MPEG Audio: Filter type

Layer I: DCT-type filter with one frame and equal frequency
spread per band.
Psychoacoustic model only uses frequency masking.

Layer II: Uses three frames in the filter (a total of 1152 samples).
Psychoacoustic model adds a little bit of temporal masking.

Layer III: Better critical-band filter is used (non-equal frequencies).
Psychoacoustic model includes temporal masking effects.
Takes into account stereo redundancy.
Uses Huffman coder.

MPEG-1 Audio Encoder (Layer I & II)


[Block diagram: PCM input -> Analysis filter bank (32 subbands, 0 to 31) ->
Scaler -> Quantizer -> Quantized sample encoder -> Multiplexer -> Output.
The Psychoacoustic model produces SMRn for the Bit-rate allocation block,
which supplies Rn to the quantizer; the Scale factor encoder (SFn) and the
Bit-rate allocation encoder also feed the Multiplexer.
Legend: SF = Scale factor, R = Rate, SMR = Signal-to-Mask Ratio.]


MPEG-1 Audio Encoder (cont.)

The input audio stream passes through a filter bank that divides
the input into multiple subbands of frequency.
The input audio stream simultaneously passes through a
psychoacoustic model that determines the ratio of the signal
energy to the masking threshold for each subband.
The bit- or noise allocation block uses the Signal-to-Mask
Ratios to decide how to apportion the total number of code
bits available for the quantization of the subband signals to
minimize the audibility of the quantization noise.
Finally, the multiplexer takes the representation of the quantized
subband samples and formats this data and side information into
a coded bitstream.
Ancillary data not necessarily related to the audio stream can
be inserted within the coded bitstream.
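One common way to realize that apportioning step is a greedy loop that keeps giving a bit to whichever sub-band's quantization noise is currently most audible; this is only a sketch of the idea, not the procedure the slides (or the standard's tables) prescribe:

```python
def allocate_bits(smr_db, total_bits, max_bits=15, db_per_bit=6.0):
    """smr_db: signal-to-mask ratio per sub-band in dB (from the model)."""
    alloc = [0] * len(smr_db)
    for _ in range(total_bits):
        # noise-to-mask ratio: positive means quantization noise is audible
        nmr = [s - db_per_bit * a for s, a in zip(smr_db, alloc)]
        open_bands = [b for b in range(len(alloc)) if alloc[b] < max_bits]
        if not open_bands:
            break
        worst = max(open_bands, key=lambda b: nmr[b])
        if nmr[worst] <= 0:
            break                              # all noise is already masked
        alloc[worst] += 1
    return alloc

print(allocate_bits([30, 12, 0, 20], 100))     # -> [5, 2, 0, 4]
```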


MPEG Audio: Subband sample grouping


[Figure: the audio samples feed the 32 sub-band filters; each filter outputs
groups of 12 samples. A Layer I frame spans one group of 12 samples per
sub-band; a Layer II or III frame spans three such groups per sub-band.]

Layer I: 12 * 32 = 384 samples,
Layer II, III: 12 * 3 * 32 = 1152 samples


Psychoacoustic model: Layer I & II


[Block diagram: the PCM input is analyzed by an FFT (512 or 1024 frequencies).
From the spectrum, the signal power Sn per sub-band is computed and a
tonal/non-tonal separator splits the components; tonal and non-tonal masking
threshold functions are computed and combined with the quiet threshold into
the masking threshold function. The minimum masking level Mn in each sub-band
then gives SMRn.]

The separator identifies and separates the tonal and noiselike components (non-tonal) of the audio signal because the
masking abilities of the two types of signal differ.
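Condensing the last stage of that pipeline into code (illustrative only; a real model works per critical band with spreading functions, which this per-band minimum glosses over):

```python
import numpy as np

def smr_for_band(signal_power_db, masking_db, quiet_db):
    """Inputs: per-frequency-line levels (dB) falling inside one sub-band.
    The effective mask is the larger of the masking curve and the quiet
    threshold; SMRn compares the band's peak against the weakest mask."""
    mask = np.maximum(masking_db, quiet_db)
    Sn = signal_power_db.max()                 # band signal level
    Mn = mask.min()                            # minimum masking in the band
    return Sn - Mn                             # SMRn, passed to bit allocation

print(smr_for_band(np.array([55.0, 62.0, 58.0]),
                   np.array([45.0, 48.0, 40.0]),
                   np.array([20.0, 20.0, 20.0])))   # -> 22.0 dB
```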


MPEG-1 Audio Layer III Encoder (mp3)


[Block diagram: PCM input -> Analysis filter bank (32 subbands, 0 to 31) ->
MDCT (sub-subbands) -> Scaler (scale factors) -> Quantizer -> Huffman encoding
of the quantized samples -> Buffer -> Multiplexer -> Output. The Psychoacoustic
model (SMRn) drives a block that calculates window sizes, scale factor bands,
bit-rate allocation and quantization, taking buffer fullness into account;
scale factors and side information are encoded and multiplexed into the
stream.]

Frame formats of 3 layers


Layer I:   Header (32) | CRC (0,16) | Bit Allocation (128,256) | Scale factors (0-384) | Samples | Ancillary data

Layer II:  Header (32) | CRC (0,16) | Bit Allocation (128,256) | SCFSI (0-60) | Scale factors (0-384) | Samples | Ancillary data

Layer III: Header (32) | CRC (0,16) | Side information (136,256) | Main Data (may belong to other frames) | Ancillary data

(Field sizes in bits.)

SCFSI = Scale Factor Selection Information
Side information of an mp3 frame = 17 bytes (136 bits) in single-channel
mode and 32 bytes (256 bits) in dual-channel mode.
CRC is optional.
While a Layer I frame contains only 384 samples, Layer II and Layer III
frames contain 1152 samples.
Main data of mp3 may contain data of neighboring frames (see next slides).
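As a side note (not from the slides; these are the commonly quoted MPEG-1 frame-length formulas), the byte length of each frame follows directly from the bitrate and sampling rate carried in the header:

```python
def frame_length_bytes(layer, bitrate_bps, sample_rate_hz, padding=0):
    """Commonly used MPEG-1 audio frame lengths, in bytes per frame."""
    if layer == 1:                             # 384 samples/frame, 4-byte slots
        return (12 * bitrate_bps // sample_rate_hz + padding) * 4
    return 144 * bitrate_bps // sample_rate_hz + padding   # Layers II and III

print(frame_length_bytes(3, 128_000, 44_100))  # a 128 kbps, 44.1 kHz mp3: 417
```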


MP3 frame

The main data section contains the coded scale factor values
and the Huffman coded frequency lines
Its length depends on the bitrate and the length of the ancillary
data.
The length of the scale factor part depends on whether scale
factors are reused, and also on the window length (short or long).
The scale factors are used in the requantization of the
samples
The demand for Huffman code bits varies with time during the
coding process.
The variable bitrate format can be used to handle this, but a fixed
bitrate is often required for an application such as broadcasting
Therefore there is also a bit reservoir technique that allows
unused main data storage in one frame to be used by up to
two consecutive frames


MP3 frame Bit Reservoir

The design of the Layer III bitstream better fits the encoder's time
varying demand on code bits.
As with Layer II, Layer III processes the audio data in frames of
1,152 samples.
Unlike Layer II, the coded data representing these samples do not
necessarily fit into a fixed length frame in the code bitstream.
The encoder can donate bits to a reservoir when it needs fewer
than the average number of bits to code a frame.


MP3 frame Bit Reservoir (2)

Later, when the encoder needs more than


the average number of bits to code a frame,
it can borrow bits from the reservoir.
The encoder can only borrow bits donated
from past frames; it cannot borrow from
future frames.
MP3 bitstream includes a 9-bit pointer,
"main_data_begin," with each frame's side
information pointing to the location of the
starting byte of the audio data for that frame.
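A toy view of the encoder-side bookkeeping described above (illustrative; everything here except the main_data_begin name and its 9-bit range is an assumption made for the sketch):

```python
class BitReservoir:
    """Tracks bytes donated by past frames that later frames may borrow."""
    MAX_POINTER = 511                  # main_data_begin is a 9-bit byte offset

    def __init__(self):
        self.stored = 0                # donated, still-unused bytes

    def code_frame(self, frame_budget_bytes, needed_bytes):
        """Returns main_data_begin: how far back this frame's main data starts."""
        borrow = max(0, needed_bytes - frame_budget_bytes)
        borrow = min(borrow, self.stored, self.MAX_POINTER)   # past frames only
        leftover = max(0, frame_budget_bytes - needed_bytes)
        self.stored = min(self.stored - borrow + leftover, self.MAX_POINTER)
        return borrow

reservoir = BitReservoir()
print(reservoir.code_frame(417, 300))  # easy frame: donates 117 bytes -> 0
print(reservoir.code_frame(417, 500))  # hard frame: borrows 83 bytes -> 83
```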


MP3: Hybrid frequency analysis

Purpose

Increase the frequency resolution in subbands for better
perceptual coding.
Allow for some cancellation of aliasing caused by the
polyphase analysis subband filters.

MDCT (Modified Discrete Cosine Transform)

50% overlapped transform.
Short-window MDCT: 6 sub-subbands (12-point DCT) in
each subband. Better time resolution.
Long-window MDCT: 18 sub-subbands (36-point DCT) in
each subband. Better frequency resolution.
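For reference, a plain (unwindowed) MDCT sketch matching those sizes; this is not from the slides, and the real Layer III coder applies sine or block-switching windows before this step. A 2N-sample input yields N outputs, so 36 samples give 18 sub-subbands and 12 samples give 6:

```python
import numpy as np

def mdct(x):
    """Modified DCT: 2N overlapping input samples -> N spectral values."""
    N = len(x) // 2
    n = np.arange(2 * N)
    k = np.arange(N)[:, None]
    basis = np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
    return basis @ x

print(mdct(np.random.randn(36)).shape)   # long window:  (18,)
print(mdct(np.random.randn(12)).shape)   # short window: (6,)
```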


MP3 Decoder


MP3 Performance
Sound quality     Bandwidth   Mode     Bitrate          Reduction ratio
Telephone sound   2.5 kHz     mono     8 kbps *         96:1
Short wave        4.5 kHz     mono     16 kbps          48:1
AM radio          7.5 kHz     mono     32 kbps          24:1
FM radio          11 kHz      stereo   56...64 kbps     26...24:1
Near-CD           15 kHz      stereo   96 kbps          16:1
CD                >15 kHz     stereo   112...128 kbps   14...12:1


MPEG-2 Audio

Differences between MPEG-1 and MPEG-2 audio for two-channel stereo:

Initial PCM sampling rate extends to include 16, 22.05, and 24 kHz.
Pre-assigned bitrates are extended to as low as 8 kbit/s.
Better quantization tables are provided for the lower rates.
The coding efficiency of the coding of scale_factor and
intensity_mode stereo in Layer III is improved.

MPEG-2 Audio: Backward Compatible (BC)

Defines a five-channel surround sound:

Front left (L), front right (R), front center (C), side/rear left (LS),
side/rear right (RS), and an optional low-frequency enhancement (LFE)
channel.
3/2 stereo: L, R, C, LS, RS
5.1-channel stereo: L, R, C, LS, RS, LFE

An MPEG-1 decoder can decode the L and R signals.

Coding method:

L and R channels are coded as in MPEG-1.
Additional channels are coded as ancillary data in the MPEG-1
audio stream.


MPEG-2 Audio frame


[Figure: an MPEG-1 frame — Header | CRC | Bit Allocation | SCFSI |
Scale factor | Samples — whose ancillary-data region carries the
multi-channel extension: Header | CRC | Bit Allocation | SCFSI | Predictor |
Ancillary data 1 | MC Samples | Ancillary data 2, together with
multi-lingual commentary.
MC = Multi-Channel audio data information.]

As can be seen in the figure, the MPEG-2 audio frame is an extension of
the MPEG-1 frame that supports multi-channel and multi-lingual audio.


MPEG-2 BC and MPEG-1 compatibility


[Figure: MPEG-1 covers mono & stereo at 32, 44.1, 48 kHz with Layers I, II
and III. MPEG-2 extends this in two directions: a low-frequency extension
(mono & stereo at 16, 22.05, 24 kHz, Layers I, II, III) and a multi-channel
extension (5 channels at 32, 44.1, 48 kHz, Layers I, II, III).]

MPEG-2 Audio: Advanced Audio Coding (AAC)

To further improve the quality of compressed audio using
state-of-the-art technologies.
It was designated as MPEG-2 NBC (Non Backward Compatible).
Initial PCM sampling rate: 8 kHz to 96 kHz.
Supports from mono up to 48 audio channels.


Key Points

MPEG Audio Specifications


MPEG Audio mechanism

MPEG-1 Audio encoding/decoding

Human hearing & Audio masking


Sub-band coding (SBC) mechanism
Psychoacoustic model
The concept of layers
Layer I / Layer II / Layer III differences

MPEG-2 Audio BC
MPEG-2 AAC (NBC)

