• MPEG-1
  • Layer 1
  • Layer 2
  • Layer 3 (MP3)
The Characteristics of Sound Waves
• Frequency
  • The rate at which the sound wave oscillates
  • Number of cycles per second, measured in Hertz (Hz)
  • Determines the pitch of the sound as heard by our ears
  • The higher the frequency, the clearer and sharper the sound => the higher the pitch of the sound
• Amplitude
  • Sound's intensity or loudness
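These two parameters can be seen directly in a generated sinusoid; the function name and the tone values below are illustrative, not from the slides:

```python
import math

def sinusoid(freq_hz, amplitude, sample_rate_hz, n_samples):
    """Generate n_samples of a sine wave: amplitude controls loudness,
    freq_hz controls pitch (cycles per second, Hz)."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

# A 440 Hz tone (concert A) sampled at 8000 Hz, at half amplitude:
tone = sinusoid(440.0, 0.5, 8000, 8000)

# Peak value is bounded by the chosen amplitude:
assert max(abs(s) for s in tone) <= 0.5
```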
Audio Digitization
[Figure: a voice signal (sinusoid wave) passes through an Analog-to-Digital Converter (ADC), producing a digital bit stream, e.g. 10010010000100]
How to Digitize Audio Data
To decide how to digitize audio data, we need to answer the following questions:
Audio Digitization
• Step 1. Sampling
Nyquist Theorem
Nyquist Theorem (cont.)
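The Nyquist theorem states that the sampling rate must exceed twice the highest frequency in the signal; below that rate, distinct tones alias onto each other and become indistinguishable after sampling. A minimal demonstration (the frequencies are arbitrary examples):

```python
import math

fs = 8.0                      # sampling rate (Hz): below the Nyquist rate for a 5 Hz tone
f_high, f_alias = 5.0, 3.0    # a 5 Hz tone aliases to |5 - 8| = 3 Hz

samples_high = [math.cos(2 * math.pi * f_high * n / fs) for n in range(16)]
samples_alias = [math.cos(2 * math.pi * f_alias * n / fs) for n in range(16)]

# The two tones produce identical samples, so after sampling they cannot
# be told apart -- which is why fs must exceed 2 * f_max.
assert all(abs(a - b) < 1e-9 for a, b in zip(samples_high, samples_alias))
```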
Quantization
Reconstruction
Quantization Error (Noise)
• The Signal-to-Noise Ratio (SNR) is the ratio of the relative sizes of the signal values and the errors.
• The higher the SNR, the smaller the average error is with respect to the signal value, and the better the fidelity.
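The SNR in decibels can be computed as 20·log10 of the ratio of the RMS signal value to the RMS error; this small sketch uses illustrative signal and noise values:

```python
import math

def snr_db(signal, noise):
    """SNR in decibels: 20 * log10(RMS(signal) / RMS(noise)).
    Higher SNR means smaller errors relative to the signal, i.e. better fidelity."""
    rms = lambda xs: math.sqrt(sum(x * x for x in xs) / len(xs))
    return 20.0 * math.log10(rms(signal) / rms(noise))

signal = [1.0, -1.0, 1.0, -1.0]   # RMS = 1.0
noise  = [0.1, -0.1, 0.1, -0.1]   # RMS = 0.1 (quantization error)

# An amplitude ratio of 10 corresponds to 20 dB:
assert abs(snr_db(signal, noise) - 20.0) < 1e-9
```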
Quantization (An Example)
µ-law, A-law (cont.)
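The µ-law companding curve (as standardized in ITU-T G.711, with µ = 255 in North America and Japan) compresses the signal logarithmically so that quiet samples get finer resolution before uniform quantization. A sketch (function names are illustrative):

```python
import math

MU = 255.0  # mu-law parameter from ITU-T G.711

def mu_law_compress(x):
    """Compress x in [-1, 1]: small amplitudes are boosted, so a uniform
    quantizer afterwards gives them finer effective resolution."""
    return math.copysign(math.log(1 + MU * abs(x)) / math.log(1 + MU), x)

def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

# The round trip is (numerically) lossless before any quantization:
for x in (-1.0, -0.25, 0.0, 0.01, 0.5, 1.0):
    assert abs(mu_law_expand(mu_law_compress(x)) - x) < 1e-9

# Quiet samples are boosted by the compressor:
assert mu_law_compress(0.01) > 0.01
```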
µ-law vs. A-law
• Differences in signals between the present and a past time can reduce the size of signal values and also concentrate the histogram of sample values (now differences) into a much smaller range.
• Reducing the variance of the values lets lossless compression methods produce a bit stream with shorter bit lengths for the more likely values.
Audio Coding (cont.)
• In general, producing quantized sampled output for audio is called PCM (Pulse Code Modulation). The difference version is called DPCM (and a crude but efficient variant is called DM). The adaptive version is called ADPCM.
• Coding
  • Assign a codeword (thus forming a binary bit stream) to each output level or symbol.
Pulse-code Modulation (PCM)
• µ-law and A-law PCM have already reduced the data sent.
• However, each sample is still independently encoded.
[Figure: the original analog signal and its corresponding PCM signals; the decoded staircase signal; the reconstructed signal after low-pass filtering]
• Parameters:
  • Sampling Rate (Samples per Second)
• Makes a simple prediction of the next sample, based on the weighted previous n samples.
• Normally the difference between samples is relatively small and can be coded with fewer than 8 bits.
• Typically 6 bits are used for the difference, rather than 8 bits for the absolute value.
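A minimal DPCM-style difference coder (without the quantization step) can be sketched as follows; the first sample is sent as-is and every later sample as a small difference:

```python
def dpcm_encode(samples):
    """Send the first sample, then only successive differences,
    which are typically small and need fewer bits."""
    diffs, prev = [], 0
    for s in samples:
        diffs.append(s - prev)
        prev = s
    return diffs

def dpcm_decode(diffs):
    """Rebuild samples by accumulating the differences."""
    out, prev = [], 0
    for d in diffs:
        prev += d
        out.append(prev)
    return out

samples = [100, 102, 101, 105, 104]
diffs = dpcm_encode(samples)
assert diffs == [100, 2, -1, 4, -1]      # small differences after the first
assert dpcm_decode(diffs) == samples     # lossless round trip (no quantizer)
```

With a quantizer inserted between encoder and decoder the scheme becomes lossy, which is where the 6-bit difference budget above comes in.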
• 2 Methods
  • Adaptive Quantization
    • Quantization levels are adaptive, based on the content of the audio.
  • The receiver runs the same prediction algorithm and adaptive quantization levels to reconstruct the speech.
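One simple illustration of this idea (not the exact scheme of any standard codec) is delta modulation with an adaptive step size: one bit per sample, with the step growing on consecutive equal bits and shrinking otherwise. Because encoder and decoder run the same adaptation rule, no side information needs to be sent:

```python
def adm_encode(samples, step=1.0):
    """Adaptive delta modulation sketch: 1 bit per sample; the step
    size doubles on repeated bits and halves on a bit change."""
    bits, est, last = [], 0.0, None
    for s in samples:
        bit = 1 if s >= est else 0
        step = step * 2 if bit == last else max(step / 2, 0.25)
        est += step if bit else -step
        bits.append(bit)
        last = bit
    return bits

def adm_decode(bits, step=1.0):
    """Receiver repeats the identical step adaptation, staying in sync."""
    out, est, last = [], 0.0, None
    for bit in bits:
        step = step * 2 if bit == last else max(step / 2, 0.25)
        est += step if bit else -step
        out.append(est)
        last = bit
    return out

bits = adm_encode([0, 2, 4, 6, 8])
assert bits == [1, 1, 1, 1, 1]                      # input keeps rising
assert adm_decode(bits) == [0.5, 1.5, 3.5, 7.5, 15.5]  # growing step chases it
```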
• PCM, DPCM and ADPCM directly code the received audio signal.
• At the receiver, synthesize the voice from the model and the received parameters.
• Voiced sounds: a series of pulses of air as the larynx opens and closes. The basic tone is then shaped by the changing resonance of the vocal tract.
• Unvoiced sounds: the larynx is held open, and turbulent noise is made in the mouth.
• Most modern voice codecs (e.g. GSM) are based on enhanced LPC encoders.
• For this to work, the vocal tract "tube" must not have any side branches (these would require a more complex model).
  • OK for vowels (a tube is a reasonable model)
• The goal is to efficiently encode the residue signal, improving speech quality over LPC, but without increasing the bit rate too much.
• The receiver looks up the code in its codebook, retrieves the residue, and uses it to excite the LPC formant filter.
• The problem is that the codebook would require different residue values for every possible voice pitch.
  • The codebook search would be slow, and each code would require many bits to send.
• One codebook is dynamically filled with copies of the previous residue delayed by various amounts (the delay provides the pitch).
• A CELP algorithm using these techniques can provide fairly good quality at 4.8 kb/s.
• LD-CELP
  • Low-Delay Code-Excited Linear Prediction (G.728)
  • 16 kb/s
• CS-ACELP
  • Conjugate-Structure Algebraic CELP (G.729)
  • 8 kb/s
• MP-MLQ
  • Multi-Pulse Maximum Likelihood Quantization (G.723.1)
  • 6.3 kb/s
• LPC-based codecs model the sound source to achieve good compression.
  • Works well for voice.
• Not all sounds in the sampled audio can actually be heard.
  • Analyze the audio and send only the sounds that can be heard.
  • Quantize more coarsely where noise will be less audible.
[Figure: sensitivity of human ears as a function of frequency]
• Frequencies just above and below the frequency of a loud sound need to be louder than the normal minimum amplitude before they can be heard.
• After hearing a loud sound, the ear is deaf to quieter sounds in the same frequency range for a short time.
• MPEG (Moving Picture Experts Group) and ISO (International Organization for Standardization) have published several standards for digital audio coding.
  • MPEG-1 Layers 1, 2 and 3 (MP3)
  • They have been widely used in consumer electronics, digital audio broadcasting, DVDs, movies, etc.
• The PCM samples are divided into 32 frequency sub-bands (sub-band filtering using the DFT).
• If the amplitude of the signal in a band is below the masking threshold, don't encode it.
• Multiplex the output of the 32 bands into one bit stream.
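The masking decision above can be sketched as a simple filter over per-band amplitudes; the band count is reduced to 4 and the threshold values are illustrative, not outputs of the real MPEG psychoacoustic model:

```python
def encode_subbands(band_amplitudes, masking_threshold):
    """Keep only sub-bands loud enough to be heard; quieter bands are
    masked and need not be encoded at all (illustrative thresholds,
    not the MPEG psychoacoustic model)."""
    return {band: amp
            for band, amp in enumerate(band_amplitudes)
            if amp >= masking_threshold[band]}

amps      = [0.9, 0.05, 0.6, 0.01]   # amplitudes in 4 example sub-bands
threshold = [0.1, 0.1,  0.1, 0.1]    # per-band audibility threshold

kept = encode_subbands(amps, threshold)
assert kept == {0: 0.9, 2: 0.6}      # masked bands 1 and 3 are dropped
```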
• Layer 3 is the most common form for audio files on the Web.
  • Strictly speaking, these files should be called MPEG-1 Layer 3 files.
• MPEG-1 audio allows sampling rates of 44.1, 48, 32, 22.05, 24 and 16 kHz.
• Imposes some restrictions on bit allocation in the middle and high sub-bands.
  • Better audio quality, since bits saved here can be used for the quantized sub-band values.
[Figure: encoder pipeline: audio samples are filtered and downsampled into groups of 36 samples per sub-band, normalized by a scale factor, and passed to the perceptual coder]