Lecture 2 Acoustic Theory

Lecture # 2
Session 2003
Acoustic Theory of Speech Production

Overview
Sound sources
Vocal tract transfer function
Wave equations
Sound propagation in a uniform acoustic tube
Representing the vocal tract with simple acoustic tubes
Estimating natural frequencies from area functions
Representing the vocal tract with multiple uniform tubes
6.345 Automatic Speech Recognition
Acoustic Theory of Speech Production 1
Anatomical Structures for Speech Production
Phonemes in American English

PHONEME  EXAMPLE    PHONEME  EXAMPLE    PHONEME  EXAMPLE
/i/      beat       /s/      see        /w/      wet
/ɪ/      bit        /š/      she        /r/      red
/e/      bait       /f/      fee        /l/      let
/ɛ/      bet        /θ/      thief      /y/      yet
/æ/      bat        /z/      z          /m/      meet
/a/      Bob        /ž/      Gigi       /n/      neat
/ɔ/      bought     /v/      v          /ŋ/      sing
/ʌ/      but        /ð/      thee       /č/      church
/o/      boat       /p/      pea        /ǰ/      judge
/ʊ/      book       /t/      tea        /h/      heat
/u/      boot       /k/      key
/ɝ/      Burt       /b/      bee
/ay/     bite       /d/      Dee
/ɔy/     Boyd       /g/      geese
/aw/     bout
/ə/      about
Places of Articulation for Speech Sounds
[Figure: vocal tract with the places of articulation labelled, front to back: Labial, Dental, Alveolar, Palato-Alveolar, Palatal, Velar, Uvular]
Speech Waveform: An Example
Two plus seven is less than ten

A Wideband Spectrogram
Two plus seven is less than ten

Acoustic Theory of Speech Production

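The acoustic characteristics of speech are usually modelled as a
sequence of source, vocal tract filter, and radiation characteristics

[Figure: vocal tract schematic labelled with $U_G$, $U_L$, $P_r$, and $r$]

\[ P_r(j\omega) = S(j\omega)\, T(j\omega)\, R(j\omega) \]

For vowel production:

\[ S(j\omega) = U_G(j\omega) \]
\[ T(j\omega) = U_L(j\omega) / U_G(j\omega) \]
\[ R(j\omega) = P_r(j\omega) / U_L(j\omega) \]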
Sound Source: Vocal Fold Vibration

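Modelled as a volume velocity source at the glottis, $U_G(j\omega)$

[Figure: glottal volume velocity waveform $U_G(t)$ with period $T_0 = 1/F_0$, its spectrum $U_G(f)$ with a $1/f^2$ falloff, and the radiated pressure waveform $P_r(t)$]

            F0 ave (Hz)   F0 min (Hz)   F0 max (Hz)
Men         125           80            200
Women       225           150           350
Children    300           200           500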
Sound Source: Turbulence Noise

Turbulence noise is produced at a constriction in the vocal tract
Aspiration noise is produced at glottis
Frication noise is produced above the glottis
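Modelled as a series pressure source at the constriction, $P_S(j\omega)$

[Figure: source spectrum $P_s(f)$ with the frequency $0.2\,V/D$ marked]

$V$: velocity at constriction
$D$: critical dimension $= \sqrt{4A/\pi} \approx \sqrt{A}$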
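As a rough numerical illustration of the quantities on this slide, the sketch below evaluates the marked frequency 0.2 V/D. It assumes that D is the diameter of a circular opening of area A, so that D = sqrt(4A/pi), and that 0.2 V/D is where the source spectrum peaks. The function names and the example constriction (0.1 cm^2, 3000 cm/s) are hypothetical and chosen only for illustration.

```python
import math

def critical_dimension(area_cm2):
    """Diameter of a circular opening with the given area: D = sqrt(4A / pi)."""
    return math.sqrt(4.0 * area_cm2 / math.pi)

def noise_peak_hz(velocity_cm_per_s, area_cm2):
    """Frequency 0.2 V / D, assumed here to mark the peak of the source spectrum."""
    return 0.2 * velocity_cm_per_s / critical_dimension(area_cm2)

# Hypothetical constriction: 0.1 cm^2 opening with a 3000 cm/s flow velocity.
example_peak = noise_peak_hz(3000.0, 0.1)
print(f"D = {critical_dimension(0.1):.3f} cm, peak near {example_peak:.0f} Hz")
```

Under these assumptions a narrower constriction or a faster flow moves the peak to a higher frequency.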
Vocal Tract Wave Equations

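Define:
$u(x,t)$ = particle velocity
$U(x,t)$ = volume velocity ($U = uA$)
$p(x,t)$ = sound pressure variation ($P = P_O + p$)
$\rho$ = density of air
$c$ = velocity of sound

Assuming plane wave propagation (for a cross dimension $\ll \lambda$),
and a one-dimensional wave motion, it can be shown that

\[ -\frac{\partial p}{\partial x} = \rho \frac{\partial u}{\partial t} \qquad -\frac{\partial u}{\partial x} = \frac{1}{\rho c^2} \frac{\partial p}{\partial t} \qquad \frac{\partial^2 u}{\partial x^2} = \frac{1}{c^2} \frac{\partial^2 u}{\partial t^2} \]

Time and frequency domain solutions are of the form

\[ u(x,t) = u^+\!\left(t - \frac{x}{c}\right) - u^-\!\left(t + \frac{x}{c}\right) \qquad u(x,s) = \frac{1}{\rho c}\left[ P_+ e^{-sx/c} - P_- e^{sx/c} \right] \]
\[ p(x,t) = \rho c \left[ u^+\!\left(t - \frac{x}{c}\right) + u^-\!\left(t + \frac{x}{c}\right) \right] \qquad p(x,s) = P_+ e^{-sx/c} + P_- e^{sx/c} \]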
Propagation of Sound in a Uniform Tube

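[Figure: uniform tube of area $A$ spanning $x = -\ell$ to $x = 0$, driven by $U_G$]

The vocal tract transfer function of volume velocities is

\[ T(j\omega) = \frac{U_L(j\omega)}{U_G(j\omega)} = \frac{U(-\ell, j\omega)}{U(0, j\omega)} \]

Using the boundary conditions $U(0,s) = U_G(s)$ and $P(-\ell,s) = 0$

\[ T(s) = \frac{2}{e^{s\ell/c} + e^{-s\ell/c}} \qquad T(j\omega) = \frac{1}{\cos(\omega\ell/c)} \]

The poles of the transfer function $T(j\omega)$ are where $\cos(\omega\ell/c) = 0$

\[ \frac{(2\pi f_n)\,\ell}{c} = \frac{(2n-1)\,\pi}{2} \qquad f_n = \frac{c}{4\ell}(2n-1) \qquad \lambda_n = \frac{4\ell}{(2n-1)} \qquad n = 1, 2, \ldots \]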
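The step from the boundary conditions to T(s) can be filled in as follows. This is a sketch under stated assumptions: it uses the frequency-domain solutions of the wave equations with U = uA, takes the glottis at x = 0, and takes the open end at x = -l. That sign convention is an assumption; placing the open end at x = +l leads to the same T(s).

```latex
\begin{align*}
P(x,s) &= P_+ e^{-sx/c} + P_- e^{sx/c}, &
U(x,s) &= \frac{A}{\rho c}\left[ P_+ e^{-sx/c} - P_- e^{sx/c} \right] \\
P(-\ell,s) = 0 &\;\Rightarrow\; P_- = -P_+ e^{2s\ell/c} \\
U(0,s) &= \frac{A}{\rho c}\, P_+ \left( 1 + e^{2s\ell/c} \right), &
U(-\ell,s) &= \frac{A}{\rho c}\, 2 P_+ e^{s\ell/c} \\
T(s) &= \frac{U(-\ell,s)}{U(0,s)} = \frac{2 e^{s\ell/c}}{1 + e^{2s\ell/c}}
      = \frac{2}{e^{s\ell/c} + e^{-s\ell/c}}, &
T(j\omega) &= \frac{1}{\cos(\omega\ell/c)}
\end{align*}
```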
Propagation of Sound in a Uniform Tube (cont)
For $c = 34{,}000$ cm/sec and $\ell = 17$ cm, the natural frequencies (also
called the formants) are at 500 Hz, 1500 Hz, 2500 Hz, ...

[Figure: $20 \log_{10} |T(j\omega)|$ in dB versus frequency (kHz), with the poles marked by x]

The transfer function of a tube with no side branches, excited at
one end and with the response measured at the other, has only poles
The formant frequencies will have finite bandwidth when vocal
tract losses are considered (e.g., radiation, walls, viscosity, heat)
The length of the vocal tract, $\ell$, corresponds to $\frac{1}{4}\lambda_1$, $\frac{3}{4}\lambda_2$, $\frac{5}{4}\lambda_3$, ...,
where $\lambda_i$ is the wavelength of the $i$th natural frequency
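The formant values quoted for the 17 cm tube can be checked numerically. The sketch below is a minimal illustration, assuming the lossless relations f_n = (2n - 1)c / (4l) and T(jw) = 1 / cos(wl/c); the function names are hypothetical.

```python
import math

def uniform_tube_formants(length_cm, c_cm_per_s=34000.0, count=3):
    """Natural frequencies f_n = (2n - 1) c / (4 l) of a lossless uniform tube
    closed at the glottis and open at the lips."""
    return [(2 * n - 1) * c_cm_per_s / (4.0 * length_cm)
            for n in range(1, count + 1)]

def gain_db(freq_hz, length_cm, c_cm_per_s=34000.0):
    """20 log10 |T(jw)| for T(jw) = 1 / cos(w l / c); grows without bound
    as the frequency approaches a pole of the lossless model."""
    x = math.cos(2.0 * math.pi * freq_hz * length_cm / c_cm_per_s)
    return -20.0 * math.log10(abs(x))

formants = uniform_tube_formants(17.0)
print(formants)                 # the first three natural frequencies in Hz
print(gain_db(499.0, 17.0))     # large gain just below the first formant
```

Because the model has no losses, the gain is unbounded at each formant; with losses the peaks would have finite height and bandwidth.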
Standing Wave Patterns in a Uniform Tube
A uniform tube closed at one end and open at the other is often
referred to as a quarter wavelength resonator
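[Figure: standing wave patterns (SWP) of $|U(x)|$ from glottis to lips for F1, F2, and F3; the interior zeros of $|U(x)|$ lie at $\frac{2}{3}\ell$ for F2 and at $\frac{2}{5}\ell$ and $\frac{4}{5}\ell$ for F3, measured from the glottis]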
Natural Frequencies of Simple Acoustic Tubes
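[Figure: two uniform tubes of area $A$, each spanning $x = -\ell$ to $x = 0$ and viewed from $x = -\ell$]

Quarter wavelength resonator

\[ P(x, j\omega) = 2P_+ \cos\frac{\omega x}{c} \]
\[ U(x, j\omega) = -j\,\frac{A}{\rho c}\, 2P_+ \sin\frac{\omega x}{c} \]
\[ Y_{-\ell} = j\,\frac{A}{\rho c} \tan\frac{\omega\ell}{c} \approx j\omega\,\frac{A\ell}{\rho c^2} = j\omega C_A \qquad (\omega\ell/c \ll 1) \]
\[ f_n = \frac{c}{4\ell}(2n-1) \qquad n = 1, 2, \ldots \]

Half-wavelength resonator

\[ P(x, j\omega) = -j\,2P_+ \sin\frac{\omega x}{c} \]
\[ U(x, j\omega) = \frac{A}{\rho c}\, 2P_+ \cos\frac{\omega x}{c} \]
\[ Y_{-\ell} = -j\,\frac{A}{\rho c} \cot\frac{\omega\ell}{c} \approx -j\,\frac{A}{\omega\rho\ell} = \frac{1}{j\omega M_A} \qquad (\omega\ell/c \ll 1) \]
\[ f_n = \frac{c}{2\ell}\, n \qquad n = 0, 1, 2, \ldots \]

$C_A = A\ell/\rho c^2$ = acoustic compliance, $M_A = \rho\ell/A$ = acoustic mass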
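The low-frequency approximations on this slide (a short closed tube acting as an acoustic compliance, a short open tube acting as an acoustic mass) can be checked numerically. The sketch below assumes the admittances Y = j(A/rho c) tan(wl/c) and Y = -j(A/rho c) cot(wl/c). The air density, sound speed, and 2 cm tube are illustrative assumptions, and the function names are hypothetical.

```python
import math

RHO = 0.00114  # g/cm^3, assumed density of air
C = 35000.0    # cm/s, assumed speed of sound

def susceptance_closed(area, length, freq_hz):
    """B in Y = jB = j (A / rho c) tan(w l / c), far end closed."""
    w = 2.0 * math.pi * freq_hz
    return area / (RHO * C) * math.tan(w * length / C)

def susceptance_open(area, length, freq_hz):
    """B in Y = jB = -j (A / rho c) cot(w l / c), far end open."""
    w = 2.0 * math.pi * freq_hz
    return -area / (RHO * C) / math.tan(w * length / C)

def acoustic_compliance(area, length):
    """C_A = A l / (rho c^2)."""
    return area * length / (RHO * C ** 2)

def acoustic_mass(area, length):
    """M_A = rho l / A."""
    return RHO * length / area

# Compare exact and approximate values for a 1 cm^2, 2 cm tube at 100 Hz.
w = 2.0 * math.pi * 100.0
exact_closed = susceptance_closed(1.0, 2.0, 100.0)
approx_closed = w * acoustic_compliance(1.0, 2.0)    # from Y ~ j w C_A
exact_open = susceptance_open(1.0, 2.0, 100.0)
approx_open = -1.0 / (w * acoustic_mass(1.0, 2.0))   # from Y ~ 1 / (j w M_A)
print(exact_closed / approx_closed, exact_open / approx_open)
```

Both ratios are close to 1 here because wl/c is about 0.036, well inside the small-argument range.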
Approximating Vocal Tract Shapes
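[Figure: two-tube approximations of the vocal tract shapes for [i], [a], and [u], with tube areas $A_1$, $A_2$ and lengths $\ell_1$, $\ell_2$]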
Estimating Natural Resonance Frequencies

Resonance frequencies occur where impedance (or admittance)
function equals natural (e.g., open circuit) boundary conditions
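[Figure: two-tube model with source $U_G$, back tube ($A_1$, $\ell_1$), front tube ($A_2$, $\ell_2$), and output $U_L$; the condition $Y_1 + Y_2 = 0$ is marked at the junction]

For a two tube approximation it is easiest to solve for $Y_1 + Y_2 = 0$

\[ j\,\frac{A_1}{\rho c} \tan\frac{\omega\ell_1}{c} - j\,\frac{A_2}{\rho c} \cot\frac{\omega\ell_2}{c} = 0 \]
\[ \sin\frac{\omega\ell_1}{c} \sin\frac{\omega\ell_2}{c} - \frac{A_2}{A_1} \cos\frac{\omega\ell_1}{c} \cos\frac{\omega\ell_2}{c} = 0 \]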
Decoupling Simple Tube Approximations

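If $A_1 \gg A_2$, or $A_1 \ll A_2$, the tubes can be decoupled and natural
frequencies of each tube can be computed independently
For the vowel /i/, the formant frequencies are obtained from:

[Figure: wide back tube ($A_1$, $\ell_1$) followed by a narrow front tube ($A_2$, $\ell_2$)]

\[ f_n = \frac{c}{2\ell_1}\, n \qquad \text{plus} \qquad f_n = \frac{c}{2\ell_2}\, n \]

At low frequencies:

\[ f = \frac{c}{2\pi} \left( \frac{A_2}{A_1 \ell_1 \ell_2} \right)^{1/2} = \frac{1}{2\pi} \left( \frac{1}{C_{A_1} M_{A_2}} \right)^{1/2} \]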
This low resonance frequency is called the Helmholtz resonance
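The decoupled estimate for an /i/-like shape can be sketched as follows: half-wavelength resonances for each tube plus the Helmholtz resonance at low frequency. The dimensions (back tube 8 cm^2 by 9 cm, front tube 1 cm^2 by 6 cm) and the sound speed of 35,000 cm/s are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

C = 35000.0  # cm/s, assumed speed of sound

def half_wave_resonances(length_cm, count=2):
    """f_n = n c / (2 l) for n = 1, 2, ..."""
    return [n * C / (2.0 * length_cm) for n in range(1, count + 1)]

def helmholtz_resonance(a1, l1, a2, l2):
    """f = (c / 2 pi) sqrt(A2 / (A1 l1 l2)): the compliance of the back
    cavity resonating with the acoustic mass of the narrow front tube."""
    return C / (2.0 * math.pi) * math.sqrt(a2 / (a1 * l1 * l2))

# Assumed /i/-like dimensions: wide back tube, narrow front tube.
A1, L1, A2, L2 = 8.0, 9.0, 1.0, 6.0
f_helmholtz = helmholtz_resonance(A1, L1, A2, L2)
back = half_wave_resonances(L1)
front = half_wave_resonances(L2)
estimates = sorted([f_helmholtz] + back + front)[:3]
print([round(f) for f in estimates])
```

With these assumed values the three lowest estimates come from the Helmholtz resonance, the back tube, and the front tube, in that order.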

Vowel Production Example

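Two-tube model with a narrow back tube and a wide front tube:
$A_1 = 1$ cm$^2$, $\ell_1 = 9$ cm, $A_2 = 7$ cm$^2$, $\ell_2 = 8$ cm
Decoupled tube resonances (Hz): 972, 2917, ... and 1093, ...

Formant   Actual   Estimated
F1        789      972
F2        1276     1093
F3        2808     2917

Two-tube model with a wide back tube and a narrow front tube:
$A_1 = 8$ cm$^2$, $\ell_1 = 9$ cm, $A_2 = 1$ cm$^2$, $\ell_2 = 6$ cm
Decoupled tube resonances (Hz): 268, 1944, ... and 2917, ...

Formant   Actual   Estimated
F1        256      268
F2        1905     1944
F3        2917     2917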
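The "Actual" formant values in this example appear to be the roots of the coupled two-tube condition Y1 + Y2 = 0, i.e. sin(wl1/c) sin(wl2/c) - (A2/A1) cos(wl1/c) cos(wl2/c) = 0. The sketch below finds those roots with a coarse scan followed by bisection. The tube dimensions (1 cm^2 by 9 cm with 7 cm^2 by 8 cm, and 8 cm^2 by 9 cm with 1 cm^2 by 6 cm) and c = 35,000 cm/s are assumptions inferred from the listed estimates, and the function name is hypothetical.

```python
import math

def two_tube_roots(a1, l1, a2, l2, c=35000.0, f_max=3000.0, step=1.0):
    """Frequencies (Hz) below f_max that satisfy
    sin(w l1/c) sin(w l2/c) - (a2/a1) cos(w l1/c) cos(w l2/c) = 0."""
    def g(f):
        t1 = 2.0 * math.pi * f * l1 / c
        t2 = 2.0 * math.pi * f * l2 / c
        return (math.sin(t1) * math.sin(t2)
                - (a2 / a1) * math.cos(t1) * math.cos(t2))

    roots = []
    f_lo = step
    while f_lo < f_max:
        f_hi = f_lo + step
        # A sign change across the interval brackets a root.
        if g(f_lo) != 0.0 and g(f_lo) * g(f_hi) <= 0.0:
            lo, hi = f_lo, f_hi
            for _ in range(60):          # bisection refinement
                mid = 0.5 * (lo + hi)
                if g(lo) * g(mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
        f_lo = f_hi
    return roots

narrow_back = two_tube_roots(1.0, 9.0, 7.0, 8.0)   # narrow back, wide front
wide_back = two_tube_roots(8.0, 9.0, 1.0, 6.0)     # wide back, narrow front
print([round(f) for f in narrow_back])
print([round(f) for f in wide_back])
```

Comparing these roots with the decoupled estimates shows how much the coupling between the tubes shifts each formant.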
Example of Vowel Spectrograms

[Figure: analysis displays for /bit/ and /bat/, each with panels for Zero Crossing Rate, Total Energy (dB), Energy -- 125 Hz to 750 Hz (dB), Wide Band Spectrogram (0 to 8 kHz), and Waveform, plotted against Time (seconds) from 0.0 to 0.7]
Estimating Anti-Resonance Frequencies (Zeros)
Zeros occur at frequencies where there is no measurable output

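[Figure: branched tube model for nasal consonants, driven by $U_G$, with sections labelled ($A_p$, $\ell_p$, $Y_p$), ($A_n$, $\ell_n$, $Y_n$) and ($A_o$, $\ell_o$, $Y_o$) and output $U_N$; and a tube model for fricatives and stops with back cavity ($A_b$, $\ell_b$), constriction ($A_c$, $\ell_c$), front cavity ($A_f$, $\ell_f$), source $P_s$, and output $U_L$]

For nasal consonants, zeros in $U_N$ occur where $Y_O = \infty$

For fricatives or stop consonants, zeros in $U_L$ occur where the
impedance behind the source is infinite (i.e., a hard wall at the source)

[Figure annotations: $Y_1 = 0$ and $Y_3 + Y_4 = 0$]

Zeros occur when measurements are made in the vocal tract interior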

Consonant Production
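[Figure: tube model with back cavity ($A_b$, $\ell_b$), constriction ($A_c$, $\ell_c$), front cavity ($A_f$, $\ell_f$), and pressure source $P_s$; pole and zero plots for [g] and [s]]

       A_b    A_c    A_f    l_b    l_c    l_f
[g]    5      0.2    4      9      3      5
[s]    5      0.5    4      11     3      2.5

       [g]                  [s]
       poles    zeros       poles    zeros
       215      0           306      0
       1750     1944        1590     1590
       1944     2916        3180     2916
       3888     3888        3500     3180
       ...      ...         ...      ...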
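One reading of the listed pole values is that they come from decoupled pieces of the tube model: a Helmholtz resonance of the back cavity with the constriction, a quarter-wavelength resonance of the front cavity, and half-wavelength resonances of the back cavity. This attribution is an inference from the numbers rather than something the slide states. The sketch below reproduces the values under that assumption, taking c = 35,000 cm/s, areas in cm^2, and lengths in cm; the function name is hypothetical.

```python
import math

C = 35000.0  # cm/s, assumed speed of sound

def consonant_pole_estimates(ab, lb, ac, lc, lf):
    """Decoupled pole estimates (Hz), assuming:
    - Helmholtz resonance of back cavity (ab, lb) with constriction (ac, lc),
    - quarter-wavelength resonance of the front cavity (length lf),
    - first two half-wavelength resonances of the back cavity."""
    helmholtz = C / (2.0 * math.pi) * math.sqrt(ac / (ab * lb * lc))
    front = C / (4.0 * lf)
    back = [n * C / (2.0 * lb) for n in (1, 2)]
    return sorted([helmholtz, front] + back)

g_poles = consonant_pole_estimates(ab=5.0, lb=9.0, ac=0.2, lc=3.0, lf=5.0)
s_poles = consonant_pole_estimates(ab=5.0, lb=11.0, ac=0.5, lc=3.0, lf=2.5)
print([round(f) for f in g_poles])
print([round(f) for f in s_poles])
```

The computed values agree with the listed poles to within a few hertz, which is consistent with rounding in the original table.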
Example of Consonant Spectrograms

[Figure: analysis displays for /kip/ and /si/, each with panels for Zero Crossing Rate, Total Energy (dB), Wide Band Spectrogram (0 to 8 kHz), and Waveform, plotted against Time (seconds) from 0.0 to 0.7 for /kip/ and 0.0 to 0.8 for /si/]
Perturbation Theory
Consider a uniform tube, closed at one end and open at the other

[Figure: uniform tube of area $A$ with a short section of length $\Delta\ell$ at the open end, labelled $Y_\ell$]

\[ Y \approx -j\,\frac{A}{\omega\rho\,\Delta\ell} \qquad \text{for small } \Delta\ell \]

Reducing the area of a small piece of the tube near the opening
(where U is max) has the same effect as keeping the area fixed
and lengthening the tube
Since lengthening the tube lowers the resonant frequencies,
narrowing the tube near points where U(x) is maximum in the
standing wave pattern for a given formant decreases the value of
that formant
Perturbation Theory (contd)

[Figure: uniform tube of area $A$ with a short section of length $\Delta\ell$ at the closed end, labelled $Y_\ell$]

\[ Y \approx j\omega\,\frac{A\,\Delta\ell}{\rho c^2} \qquad \text{for small } \Delta\ell \]

Reducing the area of a small piece of the tube near the closure
(where p is max) has the same effect as keeping the area fixed and
shortening the tube
Since shortening the tube will increase the values of the formants,
narrowing the tube near points where p(x) is maximum in the
standing wave pattern for a given formant will increase the value
of that formant
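Both perturbation results can be illustrated with the two-tube resonance condition sin(wl1/c) sin(wl2/c) - (A2/A1) cos(wl1/c) cos(wl2/c) = 0, where tube 1 is closed at the glottis and tube 2 is open at the lips. The sketch below narrows a 1 cm piece of a 17 cm tube by 20 percent, first at the lips and then at the glottis, and finds F1 by bisection. The dimensions, c = 34,000 cm/s, the search bracket, and the function name are illustrative assumptions.

```python
import math

def first_formant(a1, l1, a2, l2, c=34000.0):
    """Lowest root (Hz) of the two-tube resonance condition, found by
    bisection on a bracket assumed to contain only F1."""
    def g(f):
        t1 = 2.0 * math.pi * f * l1 / c
        t2 = 2.0 * math.pi * f * l2 / c
        return (math.sin(t1) * math.sin(t2)
                - (a2 / a1) * math.cos(t1) * math.cos(t2))

    lo, hi = 1.0, 700.0   # g(lo) < 0 and g(hi) > 0 for the cases below
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

uniform = first_formant(1.0, 8.5, 1.0, 8.5)        # uniform 17 cm tube
near_lips = first_formant(1.0, 16.0, 0.8, 1.0)     # narrowed where U is max
near_glottis = first_formant(0.8, 1.0, 1.0, 16.0)  # narrowed where p is max
print(round(uniform, 1), round(near_lips, 1), round(near_glottis, 1))
```

Narrowing near the lips lowers F1 below the uniform-tube value, and narrowing near the glottis raises it, in line with the two statements above.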
Summary of Perturbation Theory Results

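[Figure: standing wave patterns (SWP) of $|U(x)|$ from glottis to lips for F1, F2, and F3, with regions marked + or - for the change in each formant as a consequence of decreasing A; the interior zeros of $|U(x)|$ lie at $\frac{2}{3}\ell$ for F2 and at $\frac{2}{5}\ell$ and $\frac{4}{5}\ell$ for F3]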
Illustration of Perturbation Theory
The ship was torn apart on the sharp (reef)

(The ship was torn apart on the sh)arp reef

Multi-Tube Approximation of the Vocal Tract
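We can represent the vocal tract as a concatenation of $N$ lossless
tubes with constant area $\{A_k\}$ and equal length $\Delta x = \ell/N$
The wave propagation time through each tube is $\tau = \dfrac{\Delta x}{c} = \dfrac{\ell}{Nc}$

[Figure: vocal tract approximated by a sequence of uniform tube sections, e.g. $A_7$]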
Wave Equations for Individual Tube

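The wave equations for the $k$th tube have the form

\[ p_k(x,t) = \frac{\rho c}{A_k} \left[ U_k^+\!\left(t - \frac{x}{c}\right) + U_k^-\!\left(t + \frac{x}{c}\right) \right] \]
\[ U_k(x,t) = U_k^+\!\left(t - \frac{x}{c}\right) - U_k^-\!\left(t + \frac{x}{c}\right) \]

where $x$ is measured from the left-hand side ($0 \le x \le \Delta x$)

[Figure: adjacent tubes of areas $A_k$ and $A_{k+1}$, each of length $\Delta x$; the forward waves are $U_k^+(t)$, $U_k^+(t - \tau)$, $U_{k+1}^+(t)$, $U_{k+1}^+(t - \tau)$ and the backward waves are $U_k^-(t)$, $U_k^-(t + \tau)$, $U_{k+1}^-(t)$, $U_{k+1}^-(t + \tau)$]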
Update Expression at Tube Boundaries

We can solve update expressions using continuity constraints at
tube boundaries, e.g., $p_k(\Delta x, t) = p_{k+1}(0, t)$ and $U_k(\Delta x, t) = U_{k+1}(0, t)$

[Figure: flow graph for the junction between the $k$th tube and the $(k+1)$st tube; the forward and backward paths each pass through a DELAY of $\tau$ per tube, with junction gains $1 + r_k$, $1 - r_k$, $r_k$, and $-r_k$]

\[ U_{k+1}^+(t) = (1 + r_k)\, U_k^+(t - \tau) + r_k\, U_{k+1}^-(t) \]
\[ U_k^-(t + \tau) = -r_k\, U_k^+(t - \tau) + (1 - r_k)\, U_{k+1}^-(t) \]
\[ r_k = \frac{A_{k+1} - A_k}{A_{k+1} + A_k} \qquad \text{note } |r_k| \le 1 \]
Digital Model of Multi-Tube Vocal Tract

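Updates at tube boundaries occur synchronously every $2\tau$
If excitation is band-limited, inputs can be sampled every $T = 2\tau$
Each tube section has a delay of $z^{-1/2}$

[Figure: z-domain flow graph of one junction; $U_k^+(z)$ passes through a delay $z^{-1/2}$ and gain $1 + r_k$ to $U_{k+1}^+(z)$, $U_{k+1}^-(z)$ passes through gain $1 - r_k$ and a delay $z^{-1/2}$ to $U_k^-(z)$, and the two paths are cross-coupled with gains $r_k$ and $-r_k$]

The choice of $N$ depends on the sampling period $T$

\[ T = 2\tau = \frac{2\ell}{Nc} \;\Longrightarrow\; N = \frac{2\ell}{cT} \]

Series and shunt losses can also be introduced at tube junctions
Bandwidths are proportional to the ratio of energy loss to energy storage
Stored energy is proportional to tube length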
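The relations for the multi-tube model can be sketched in a few lines: the number of sections N = 2l / (cT), the reflection coefficients r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k), and the junction update for the forward and backward volume-velocity waves. The 17 cm length, c = 34,000 cm/s, 10 kHz sampling rate, and the area values are illustrative assumptions, and the function names are hypothetical.

```python
def num_sections(length_cm, c_cm_per_s, sample_rate_hz):
    """N = 2 l / (c T) with sampling period T = 1 / sample_rate_hz."""
    return 2.0 * length_cm * sample_rate_hz / c_cm_per_s

def reflection_coefficients(areas):
    """r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k) for each adjacent pair."""
    return [(nxt - cur) / (nxt + cur) for cur, nxt in zip(areas, areas[1:])]

def junction_update(r, u_plus_in, u_minus_in):
    """Scatter the waves arriving at one junction.
    u_plus_in  = U_k^+(t - tau), forward wave arriving from tube k
    u_minus_in = U_{k+1}^-(t),   backward wave arriving from tube k+1
    Returns (U_{k+1}^+(t), U_k^-(t + tau))."""
    u_plus_out = (1.0 + r) * u_plus_in + r * u_minus_in
    u_minus_out = -r * u_plus_in + (1.0 - r) * u_minus_in
    return u_plus_out, u_minus_out

n_sections = num_sections(17.0, 34000.0, 10000.0)
r = reflection_coefficients([1.0, 7.0])[0]   # assumed areas in cm^2
fwd, bwd = junction_update(r, 1.0, 0.5)

# Continuity check at the junction: equal volume velocity and pressure.
flow_left, flow_right = 1.0 - bwd, fwd - 0.5
pressure_left, pressure_right = (1.0 + bwd) / 1.0, (fwd + 0.5) / 7.0
print(n_sections, r, (flow_left, flow_right), (pressure_left, pressure_right))
```

The continuity check at the end confirms that the update equations preserve both volume velocity and pressure (up to the common factor rho c) across the junction.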
Assignment 1
References
Zue, 6.345 Course Notes
Stevens, Acoustic Phonetics, MIT Press, 1998.
Rabiner & Schafer, Digital Processing of Speech Signals,
Prentice-Hall, 1978.
