Speech and Audio Signal Processing ECE554 - Lec - 3 Waveform Coding v2.0

9/25/2013
Speech and Audio Signal Processing
ECE554
Nikesh Bajaj
nikesh.14730@lpu.co.in
Asst. Prof. DSP, SECE
Lovely Professional University

Overview
Sampling
CODING OF SPEECH SIGNALS

Speech Coding: Waveform Coding, Hybrid Coding, Analysis-Synthesis or Vocoders
[Figure: Speech Quality vs. Bit Rate for Codecs]
Waveform Coding: an attempt is made to preserve the original waveform.
Vocoders: a theoretical model of the speech production mechanism is considered.
Hybrid Coding: uses techniques from the other two.

Sampling Theorem
Theorem: If the highest frequency contained in an analog signal $x_a(t)$ is $F_{max} = B$, and the signal is sampled at a frequency $F_s > 2B$, then the analog signal can be exactly recovered from its samples using the following reconstruction formula:

$$x_a(t) = \sum_{n=-\infty}^{\infty} x_a(nT)\, \frac{\sin\left[(\pi/T)(t - nT)\right]}{(\pi/T)(t - nT)}$$

where $T = 1/F_s$ is the sampling period.
Note that at the original sample instances (t = nT), the
reconstructed analog signal is equal to the value of the original
analog signal. At times between the sample instances, the
signal is the weighted sum of shifted sinc functions.
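The reconstruction formula can be tried numerically. The following is a minimal sketch, not part of the original slides: the sum is truncated to a finite block of samples, and the 8 kHz rate and 1 kHz test tone are assumed values chosen only for illustration.

```python
import math

def sinc_reconstruct(samples, T, t):
    """Evaluate x_a(t) from samples x_a(nT), n = 0 .. len(samples)-1.

    The infinite sum of the theorem is truncated to the samples given,
    so values between samples are only approximate.
    """
    total = 0.0
    for n, x_n in enumerate(samples):
        arg = math.pi * (t - n * T) / T
        # sin(arg)/arg tends to 1 as arg tends to 0
        kernel = 1.0 if abs(arg) < 1e-12 else math.sin(arg) / arg
        total += x_n * kernel
    return total

Fs = 8000.0   # assumed sampling rate in Hz
T = 1.0 / Fs
f0 = 1000.0   # assumed tone frequency, below Fs/2
samples = [math.sin(2 * math.pi * f0 * n * T) for n in range(400)]

# At a sample instant the reconstruction returns the sample itself.
at_sample = sinc_reconstruct(samples, T, 200 * T)
# Between samples it is a weighted sum of shifted sinc functions.
between = sinc_reconstruct(samples, T, 200.5 * T)
true_between = math.sin(2 * math.pi * f0 * 200.5 * T)
```

Because only 400 samples are used, `between` matches `true_between` closely but not exactly; the residual shrinks as more samples are included on both sides of t.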
Sampling Theorem (1920)
Reconstruction of Signal

Sampling
Prove the Theorem
Fs = 2*Fc
TYPICAL SAMPLING FREQUENCIES IN SPEECH RECOGNITION
8 kHz: Popular in digital telephony. Provides coverage of first three formants for most speakers and most sounds.
16 kHz: Popular in speech research.
Sub 8 kHz Sampling:

Problems
Sampling theorem for bandlimited signals
How to change the sample rate of a signal?
How can this be implemented using time domain interpolation (based on the Sampling Theorem)?
How can this be implemented efficiently using digital filters?
PCM

Speech Probability Density Function
Probability density function for x(n) is the same as for xa(t) since x(n) = xa(nT); the mean and variance are the same for both x(n) and xa(t).
Need to estimate probability density and power spectrum from speech waveforms.
Probability density is estimated from a long-term histogram of amplitudes.
A good approximation is a gamma distribution of the form:
$$p(x) = \left[\frac{\sqrt{3}}{8\pi\sigma_x |x|}\right]^{1/2} e^{-\sqrt{3}\,|x| / (2\sigma_x)}$$
Simpler approximation is Laplacian density, of the form:
$$p(x) = \frac{1}{\sqrt{2}\,\sigma_x}\, e^{-\sqrt{2}\,|x| / \sigma_x}$$

Measured Speech Densities
Distribution normalized so mean is 0 and variance is 1 ($\bar{x} = 0$, $\sigma_x = 1$).
Gamma density more closely approximates measured distribution for speech than Laplacian.
Laplacian is still a good model and is used in analytical studies.
Small amplitudes are much more likely than large amplitudes, by a 100:1 ratio.
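To make the two amplitude models concrete, here is a small sketch that evaluates the standard gamma and Laplacian speech densities for a given standard deviation. The function names are illustrative and not from the slides, and the gamma form is singular at x = 0, so it is only evaluated away from the origin.

```python
import math

def laplacian_pdf(x, sigma=1.0):
    """Laplacian density with zero mean and standard deviation sigma."""
    return math.exp(-math.sqrt(2.0) * abs(x) / sigma) / (math.sqrt(2.0) * sigma)

def gamma_pdf(x, sigma=1.0):
    """Gamma speech density; singular at x = 0, so pass x != 0."""
    a = math.sqrt(3.0)
    scale = math.sqrt(a / (8.0 * math.pi * sigma * abs(x)))
    return scale * math.exp(-a * abs(x) / (2.0 * sigma))

# Near zero the gamma model puts more weight on small amplitudes.
small_gamma = gamma_pdf(0.1)
small_laplacian = laplacian_pdf(0.1)

# Ratio of the Laplacian density at 0 to its value at 3 sigma.
ratio = laplacian_pdf(0.0) / laplacian_pdf(3.0)
```

With unit variance, `small_gamma` exceeds `small_laplacian`, which is consistent with the gamma density being the more sharply peaked of the two models.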
Correlation of Speech
Can estimate long-term autocorrelation and power spectrum using time-series analysis methods, averaging over L samples, where L is a large integer.
8 kHz sampled speech for several speakers:
high correlation between adjacent samples
lowpass speech more highly correlated than bandpass speech

PCM
Sampling and Quantization
Separating the processes of sampling and quantization.
Assume x(n) obtained by sampling a bandlimited signal at a rate at or above the Nyquist rate.
Assume x(n) is known to infinite precision in amplitude.
Need to quantize x(n) in some suitable manner.
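The long-term autocorrelation estimate mentioned under Correlation of Speech can be sketched as a time average over L samples. The exact normalization used on the original slide is not recoverable, so the 1/L scaling here is an assumption, and the 200 Hz sinusoid merely stands in for a lowpass signal; it is not speech data.

```python
import math

def autocorr(x, m):
    """Time-average estimate: (1/L) * sum of x(n) * x(n+|m|)."""
    L = len(x)
    m = abs(m)
    return sum(x[n] * x[n + m] for n in range(L - m)) / L

def normalized_autocorr(x, m):
    """Autocorrelation scaled so that lag 0 equals 1."""
    return autocorr(x, m) / autocorr(x, 0)

# Hypothetical slowly varying test signal sampled at 8 kHz.
Fs = 8000
x = [math.sin(2 * math.pi * 200 * n / Fs) for n in range(8000)]
rho1 = normalized_autocorr(x, 1)   # correlation between adjacent samples
```

For a slowly varying signal `rho1` is close to 1, which is the property that the predictive coders later in the lecture exploit.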
Quantization and Encoding
Coding is a two-stage process
quantization process: $x(n) \rightarrow \hat{x}(n)$
encoding process: $\hat{x}(n) \rightarrow c(n)$
where $\Delta$ is the (assumed fixed) quantization step size
Decoding is a single-stage process
decoding process: $c'(n) \rightarrow \hat{x}'(n)$
if $c'(n) = c(n)$ (no errors in transmission) then $\hat{x}'(n) = \hat{x}(n)$
$\hat{x}'(n) \ne x(n)$: coding and quantization lose information.

n-bit Quantization
Use n-bit binary numbers to represent the quantized samples => $2^n$ quantization levels
Information Rate of Coder: $I = n F_S$ = total bit rate in bits/second
n=16, FS= 8 kHz => I=128 kbps
n=8, FS= 8 kHz => I=64 kbps
n=4, FS= 8 kHz => I=32 kbps
Goal of waveform coding is to get the highest quality at a fixed value of I (kbps), or equivalently to get the lowest value of I for a fixed quality.
Since FS is fixed, need most efficient quantization methods to minimize I.
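The information-rate arithmetic from the n-bit Quantization slide can be checked in a few lines. The helper name `bit_rate_kbps` is illustrative only.

```python
def bit_rate_kbps(n_bits, fs_hz):
    """Information rate I = n * Fs, expressed in kbps."""
    return n_bits * fs_hz / 1000.0

# Reproduce the three cases listed on the slide for Fs = 8 kHz.
rates = {n: bit_rate_kbps(n, 8000) for n in (16, 8, 4)}
levels = {n: 2 ** n for n in (16, 8, 4)}   # number of quantization levels
```

Halving the word length halves the bit rate but squares down the number of levels (65536, 256, 16), which is why the choice of quantizer matters so much at low n.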
Quantization Basics
Assume $|x(n)| \le X_{max}$ (possibly $\infty$).
For Laplacian density (where $X_{max} = \infty$), can show that 0.35% of the samples fall outside the range $-4\sigma_x \le x(n) \le 4\sigma_x$ => large quantization errors for 0.35% of the samples.
Can safely assume that $X_{max}$ is proportional to $\sigma_x$.

[Figure: Quantization Process]
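The 0.35% figure quoted under Quantization Basics follows from integrating the Laplacian tails: for a zero-mean Laplacian density with standard deviation $\sigma_x$, $P(|x| > k\sigma_x) = e^{-\sqrt{2}\,k}$. A short check, with an illustrative function name:

```python
import math

def laplacian_tail(k):
    """P(|x| > k * sigma_x) for a zero-mean Laplacian density."""
    return math.exp(-math.sqrt(2.0) * k)

p4 = laplacian_tail(4.0)       # fraction of samples outside +/- 4 sigma_x
percent_outside = 100.0 * p4   # about 0.35 percent
```

The same function shows how quickly the overload probability rises if the range is narrowed, for example to three standard deviations.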

Uniform Quantizer
The quantization range and levels are chosen such that the signal can easily be processed digitally.

Mid-Riser and Mid-Tread Quantizers
Mid-riser
origin (x=0) in middle of rising part of the staircase
same number of positive and negative levels
symmetrical around origin.
Mid-tread
origin (x=0) in middle of quantization level
one more negative level than positive
one quantization level of 0 (where a lot of activity occurs)
Code words have direct numerical significance (sign-magnitude representation for mid-riser, two's complement for mid-tread).
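The two staircase characteristics can be sketched as follows. This is an illustrative implementation under assumed conventions (outputs clipped to the 2^n available levels, integer index k as the code word), not the exact characteristic drawn on the slide.

```python
import math

def mid_riser(x, delta, n_bits):
    """Mid-riser: the origin sits on a riser, outputs are +/-(k + 1/2) * delta."""
    levels = 2 ** n_bits
    k = math.floor(x / delta)                        # interval index
    k = max(-levels // 2, min(levels // 2 - 1, k))   # clip to 2^n intervals
    return (k + 0.5) * delta                         # midpoint of the interval

def mid_tread(x, delta, n_bits):
    """Mid-tread: zero is an output level, with one extra negative level."""
    levels = 2 ** n_bits
    k = math.floor(x / delta + 0.5)                  # nearest level index
    k = max(-levels // 2, min(levels // 2 - 1, k))   # range -2^(n-1) .. 2^(n-1)-1
    return k * delta
```

With 3 bits and a unit step, `mid_riser` produces the eight levels from -3.5 to 3.5 (four on each side of the origin) and `mid_tread` produces the levels from -4 to 3, so a small input such as 0.2 maps to 0.5 in the first case and to 0 in the second.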
Quantizer
[Figure: Quantizer]

Uniform Quantization and SNR
Uniform Quantizers characterized by:
number of levels: $2^n$ (n bits)
quantization step size: $\Delta$
if $|x(n)| \le X_{max}$ and x(n) has a symmetric density, then
$$\Delta\, 2^n = 2 X_{max}$$
$$\Delta = 2 X_{max} / 2^n$$
if we let
$$\hat{x}(n) = x(n) + e(n)$$
with x(n) the unquantized speech sample, and e(n) the quantization error, then
$$-\Delta/2 \le e(n) \le \Delta/2$$

Quantization Noise Model
quantization noise is a zero-mean, stationary white noise process:
$$E[e(n)e(n+m)] = \sigma_e^2, \quad m = 0$$
$$E[e(n)e(n+m)] = 0 \quad \text{otherwise}$$
quantization noise is uncorrelated with the input signal:
$$E[x(n)e(n+m)] = 0 \quad \forall m$$
Distribution of quantization errors is uniform over each quantization interval:
$$p_e(e) = 1/\Delta, \quad -\Delta/2 \le e \le \Delta/2$$
$$p_e(e) = 0 \quad \text{otherwise}$$
$$\bar{e} = 0, \quad \sigma_e^2 = \Delta^2/12$$

SNR for Quantization (Ref: 5.3.1)
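Combining $\Delta = 2X_{max}/2^n$ with $\sigma_e^2 = \Delta^2/12$ gives $SNR = \sigma_x^2/\sigma_e^2 = 3 \cdot 2^{2n} / (X_{max}/\sigma_x)^2$, or in decibels $SNR \approx 6.02\,n + 4.77 - 20\log_{10}(X_{max}/\sigma_x)$. The sketch below compares that expression with a simulated mid-riser quantizer. The uniformly distributed test input (for which $X_{max}/\sigma_x = \sqrt{3}$), the sample count, and the seed are assumptions made only to keep the example self-contained.

```python
import math
import random

def snr_formula_db(n_bits, xmax_over_sigma):
    """SNR in dB predicted by the uniform-noise model."""
    return 6.02 * n_bits + 4.77 - 20.0 * math.log10(xmax_over_sigma)

def measured_snr_db(n_bits, xmax=1.0, count=50000, seed=0):
    """Quantize uniform random samples with a mid-riser quantizer and measure SNR."""
    rng = random.Random(seed)
    delta = 2.0 * xmax / 2 ** n_bits
    top = 2 ** (n_bits - 1) - 1          # highest interval index
    signal_power = noise_power = 0.0
    for _ in range(count):
        x = rng.uniform(-xmax, xmax)
        k = min(math.floor(x / delta), top)
        xq = (k + 0.5) * delta           # quantized value
        signal_power += x * x
        noise_power += (xq - x) ** 2
    return 10.0 * math.log10(signal_power / noise_power)

predicted = snr_formula_db(8, math.sqrt(3.0))
measured = measured_snr_db(8)
```

For the speech-like choice $X_{max} = 4\sigma_x$ the same formula reduces to roughly $6.02\,n - 7.27$ dB, so each extra bit buys about 6 dB.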

Instantaneous Companding [R:5.3.2]
LOG: Not Practical

μ-Law: Companding/Expanding
Pseudo Logarithmic

μ-Law: Companding/Expanding
Cases
y(n) = 0 for x(n) = 0
If μ = 0 then y(n) = x(n)
For large μ, y(n) approaches a logarithmic characteristic
SNR: Smith [10]

μ-Law: Example
μ = 40 and L = 8
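The companding formula itself did not survive extraction. The sketch below uses the standard μ-law characteristic, $y(n) = X_{max}\,\frac{\ln(1 + \mu |x(n)|/X_{max})}{\ln(1 + \mu)}\,\mathrm{sign}(x(n))$, together with its inverse, and it reproduces the two special cases listed on the slide. The function names and the normalization $X_{max} = 1$ are assumptions.

```python
import math

def mu_compress(x, mu, xmax=1.0):
    """mu-law compressor; mu = 0 is treated as the identity (its limit)."""
    if mu == 0:
        return x
    y = xmax * math.log1p(mu * abs(x) / xmax) / math.log1p(mu)
    return math.copysign(y, x)

def mu_expand(y, mu, xmax=1.0):
    """Inverse of mu_compress."""
    if mu == 0:
        return y
    x = xmax / mu * math.expm1(abs(y) / xmax * math.log1p(mu))
    return math.copysign(x, y)

mu = 40.0                              # value from the slide's example
compressed = mu_compress(0.1, mu)      # small amplitudes are boosted
restored = mu_expand(compressed, mu)   # expander undoes the compressor
```

In a complete coder a uniform quantizer sits between `mu_compress` and `mu_expand`, so small amplitudes effectively see finer quantization steps than large ones.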
A-Law
Delta Modulation
At high sampling rate, signal samples are highly correlated.
Autocorrelation
Linear Delta Modulation
Coder
Decoder
Characteristics ??

Linear Delta Modulation
Optimum Prediction Gain

LDM
Slope overload
Avoid
Slope overload distortion (noise)

LDM
When input is zero or constant
Noise with peak to peak variation (Qnoise)
Granular Noise
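Both LDM impairments named above can be reproduced with a very small coder. This is a sketch under simple assumptions (fixed step size, accumulator starting at zero, one bit per sample); the function names are illustrative.

```python
def ldm_encode(x, delta):
    """1-bit coder: send 1 if the input is at or above the running approximation."""
    bits, approx = [], 0.0
    for sample in x:
        bit = 1 if sample >= approx else 0
        approx += delta if bit else -delta   # staircase tracks the input
        bits.append(bit)
    return bits

def ldm_decode(bits, delta):
    """Decoder: accumulate +delta or -delta for each received bit."""
    out, approx = [], 0.0
    for bit in bits:
        approx += delta if bit else -delta
        out.append(approx)
    return out

# Granular noise: a constant (zero) input produces an alternating bit pattern.
idle_bits = ldm_encode([0.0] * 6, 0.1)

# Slope overload: a ramp rising 0.5 per sample outruns a step of 0.1.
ramp = [0.5 * n for n in range(20)]
ramp_bits = ldm_encode(ramp, 0.1)
ramp_out = ldm_decode(ramp_bits, 0.1)
```

A larger step reduces slope overload but increases the granular noise, which is the trade-off that motivates adapting the step size.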
LDM
Bit Rate: ??
Advantages
Simplicity
No synchronization required
Simple circuit cond.

Adaptive Delta Modulation
Encoder
Bit-Pattern

Adaptive Delta Modulation
Decoder
Characteristics?

Adaptive DM

ADM
Parameters P, Q, Dmn, Dmx (Dmn and Dmx are the minimum and maximum step sizes)
Ratio Dmx/Dmn = large for high SNR
PQ <= 1
Another: Continuously variable slope delta modulation (CVSD)

Comparison
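The ADM slides name the parameters P, Q, Dmn and Dmx but the adaptation rule was in a figure that is not recoverable. The sketch below assumes the common multiplier rule: the step is multiplied by P when the current bit equals the previous one, by Q when it differs, and is then limited to the range Dmn to Dmx. The numerical values are hypothetical and chosen so that PQ <= 1.

```python
def next_step(delta, bit, prev_bit, p, q, d_min, d_max):
    """Assumed rule: grow the step on repeated bits, shrink it on alternation."""
    if prev_bit is None:
        return delta
    delta *= p if bit == prev_bit else q
    return min(d_max, max(d_min, delta))

def adm_encode(x, p=1.5, q=0.66, d_min=0.01, d_max=1.0):
    """Adaptive delta modulation coder producing one bit per sample."""
    bits, approx, delta, prev = [], 0.0, d_min, None
    for sample in x:
        bit = 1 if sample >= approx else 0
        delta = next_step(delta, bit, prev, p, q, d_min, d_max)
        approx += delta if bit else -delta
        bits.append(bit)
        prev = bit
    return bits

def adm_decode(bits, p=1.5, q=0.66, d_min=0.01, d_max=1.0):
    """Decoder repeats the step adaptation from the bit pattern alone."""
    out, steps, approx, delta, prev = [], [], 0.0, d_min, None
    for bit in bits:
        delta = next_step(delta, bit, prev, p, q, d_min, d_max)
        approx += delta if bit else -delta
        out.append(approx)
        steps.append(delta)
        prev = bit
    return out, steps

ramp_out, ramp_steps = adm_decode(adm_encode([float(n) for n in range(30)]))
idle_out, idle_steps = adm_decode(adm_encode([0.0] * 10))
```

On the ramp the step grows until it reaches Dmx, while on the idle input it stays at Dmn, so a large Dmx/Dmn ratio lets the same coder handle both situations.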

DPCM
Quantize difference rather than sample
DM is 1-bit DPCM
6 dB improvement on SNR
No single predictor can be optimal
Need of Adaptive DPCM

ADPCM: Adaptive Quantization
Adaptive step size
Coder: Feed Forward
5 dB + 6 dB improvement
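The DPCM idea of quantizing the difference rather than the sample can be sketched with a fixed first-order predictor. The predictor coefficient 0.9, the step size, the word length, and the sinusoidal test input are all hypothetical values chosen for illustration; the slides do not specify them.

```python
import math

def dpcm_encode(x, delta, n_bits, a=0.9):
    """Quantize the prediction error d(n) = x(n) - a * xhat(n-1)."""
    codes, recon = [], 0.0
    lo, hi = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    for sample in x:
        pred = a * recon                      # prediction from the decoded past
        d = sample - pred
        code = max(lo, min(hi, round(d / delta)))
        recon = pred + code * delta           # same value the decoder will form
        codes.append(code)
    return codes

def dpcm_decode(codes, delta, a=0.9):
    """Rebuild the signal by adding each decoded difference to the prediction."""
    out, recon = [], 0.0
    for code in codes:
        recon = a * recon + code * delta
        out.append(recon)
    return out

Fs = 8000
x = [math.sin(2 * math.pi * 200 * n / Fs) for n in range(400)]
codes = dpcm_encode(x, delta=0.05, n_bits=4)
y = dpcm_decode(codes, delta=0.05)
max_error = max(abs(u - v) for u, v in zip(x, y))
```

Because the encoder predicts from its own reconstruction, the quantization error does not accumulate: as long as the difference stays inside the quantizer range, `max_error` is bounded by half a step even though only 4 bits per sample are sent.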

ADPCM
Decoder

ADPCM: Feedback

ADPCM: Adaptive Predictor
ADPCM

PCM Speech (2)
Companding Example: 5 bits per sample (1-bit polarity, 2-bit segment code, and 2-bit quantization code)
[Figure: linear quantization intervals between -V and +V. Polarity 1 for the positive half and polarity 0 for the negative half; each half has segment codes 00 to 11, and each segment has quantization codes 00 to 11. Label: narrower intervals for smaller amplitude.]

PCM Speech (3)
Companding Example: 5 bits per sample (1-bit polarity, 2-bit segment code, and 2-bit quantization code)
[Figure: linear quantization intervals between -V and +V with the same polarity, segment, and quantization codes. Label: wider intervals for smaller amplitude.]