Session 2003
Anatomical Structures for Speech Production
EXAMPLE
beat, bit, bait, bet, bat, Bob, bought, but, boat, book, boot, Burt,
bite, Boyd, bout, about
PHONEME   EXAMPLE
/s/       see
/S/       she
/f/       fee
/T/       thief
/z/       z
/Z/       Gigi
/v/       v
/D/       thee
/p/       pea
/t/       tea
/k/       key
/b/       bee
/d/       Dee
/g/       geese
PHONEME   EXAMPLE
/w/       wet
/r/       red
/l/       let
/y/       yet
/m/       meet
/n/       neat
/ŋ/       sing
/C/       church
/J/       judge
/h/       heat
[Figure: places of articulation: Labial, Dental, Alveolar, Palato-Alveolar, Palatal, Velar, Uvular]
A Wideband Spectrogram
[Figure: glottal volume velocity $U_G(t)$, periodic with period $T_o = 1/F_o$, and its spectrum $U_G(f)$, which falls off as $1/f^2$]
            F0 ave (Hz)   F0 min (Hz)   F0 max (Hz)
Men         125           80            200
Women       225           150           350
Children    300           200           500
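The relation $T_o = 1/F_o$ turns the average F0 values in the table into pitch periods. A minimal sketch follows; the dictionary and function names are illustrative, not part of the lecture.

```python
# Average F0 values (Hz) copied from the table above.
F0_AVE_HZ = {"men": 125.0, "women": 225.0, "children": 300.0}


def pitch_period_ms(f0_hz):
    """Return the pitch period T0 = 1/F0 in milliseconds."""
    return 1000.0 / f0_hz


# Pitch period for each speaker group, in milliseconds.
periods_ms = {group: pitch_period_ms(f0) for group, f0 in F0_AVE_HZ.items()}
for group, period in periods_ms.items():
    print(f"{group}: {period:.2f} ms")
```

For the average male F0 of 125 Hz this gives a period of 8 ms.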
[Figure: turbulence noise source spectrum, with a broad peak near frequency $0.2\,V/D$]
V: Velocity at constriction
6.345 Automatic Speech Recognition
D: Critical dimension = $\sqrt{4A/\pi} \approx \sqrt{A}$
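$u(x,t)$ = particle velocity
$U(x,t)$ = volume velocity ($U = uA$)
$p(x,t)$ = sound pressure
$\rho$ = density of air
$c$ = velocity of sound

$-\dfrac{\partial p}{\partial x} = \rho\,\dfrac{\partial u}{\partial t}$
$-\dfrac{\partial u}{\partial x} = \dfrac{1}{\rho c^2}\,\dfrac{\partial p}{\partial t}$
$\dfrac{\partial^2 u}{\partial x^2} = \dfrac{1}{c^2}\,\dfrac{\partial^2 u}{\partial t^2}$

$u(x,t) = u^+(t - x/c) - u^-(t + x/c)$
$p(x,t) = \rho c\,\left[u^+(t - x/c) + u^-(t + x/c)\right]$
$u(x,s) = \dfrac{1}{\rho c}\left[P^+ e^{-sx/c} - P^- e^{sx/c}\right]$
$p(x,s) = P^+ e^{-sx/c} + P^- e^{sx/c}$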
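As a quick check, the traveling-wave forms can be substituted back into the two first-order equations. The sketch below assumes $u = u^+(t - x/c) - u^-(t + x/c)$ and $p = \rho c\,[u^+(t - x/c) + u^-(t + x/c)]$; the prime notation (derivative of each wave with respect to its own argument) is introduced here for convenience.

```latex
% Assumed: u = u^+(t - x/c) - u^-(t + x/c),  p = \rho c [u^+(t - x/c) + u^-(t + x/c)]
% u^{+\prime}, u^{-\prime}: derivative of each wave with respect to its argument
\begin{align*}
-\frac{\partial p}{\partial x}
  &= -\rho c\left[-\tfrac{1}{c}\,u^{+\prime} + \tfrac{1}{c}\,u^{-\prime}\right]
   = \rho\left(u^{+\prime} - u^{-\prime}\right)
   = \rho\,\frac{\partial u}{\partial t} \\
-\frac{\partial u}{\partial x}
  &= \tfrac{1}{c}\left(u^{+\prime} + u^{-\prime}\right)
   = \frac{1}{\rho c^2}\,\rho c\left(u^{+\prime} + u^{-\prime}\right)
   = \frac{1}{\rho c^2}\,\frac{\partial p}{\partial t}
\end{align*}
```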
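[Figure: uniform tube of length $l$, driven by the source $U_G$ at $x = -l$ and open at $x = 0$]
$T(s) = \dfrac{2}{e^{sl/c} + e^{-sl/c}}$
$T(j\omega) = \dfrac{1}{\cos(\omega l/c)}$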
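The transfer function follows from the boundary conditions in a few steps. The sketch below assumes $T$ is the volume-velocity ratio $U_L/U_G$, with $U_L$ taken at the open end $x = 0$ (where $p = 0$) and $U_G$ at $x = -l$; the tube area $A$ cancels.

```latex
% Assumed: p(0, s) = 0 at the open end, so P^- = -P^+ in
% p(x,s) = P^+ e^{-sx/c} + P^- e^{sx/c}
\begin{align*}
U(x,s) &= \frac{A}{\rho c}\,P^+\left[e^{-sx/c} + e^{sx/c}\right] \\
U_L &= U(0,s) = \frac{A}{\rho c}\,2P^+ \\
U_G &= U(-l,s) = \frac{A}{\rho c}\,P^+\left[e^{sl/c} + e^{-sl/c}\right] \\
T(s) &= \frac{U_L}{U_G} = \frac{2}{e^{sl/c} + e^{-sl/c}},
\qquad T(j\omega) = \frac{1}{\cos(\omega l/c)}
\end{align*}
```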
$f_n = \dfrac{c}{4l}\,(2n-1)$
$\lambda_n = \dfrac{4l}{2n-1}$
$n = 1, 2, \ldots$
[Figure: transfer function magnitude versus Frequency (kHz), with the poles marked by x]
A uniform tube closed at one end and open at the other is often
referred to as a quarter-wavelength resonator.
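The quarter-wavelength formula $f_n = (2n-1)\,c/(4l)$ is easy to evaluate numerically. This sketch assumes $c = 35000$ cm/s and a 17.5 cm tube; both values are illustrative assumptions, and the function name is hypothetical.

```python
C_CM_PER_S = 35000.0  # assumed speed of sound in cm/s


def quarter_wave_formants(length_cm, count=3, c=C_CM_PER_S):
    """Resonances f_n = (2n - 1) c / (4 l) of a tube closed at one end, open at the other."""
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, count + 1)]


# Assumed tube length of 17.5 cm.
formants = quarter_wave_formants(17.5)
print(formants)  # [500.0, 1500.0, 2500.0]
```

The resonances are odd multiples of the lowest one, so they are spaced $c/(2l)$ apart.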
[Figure: standing wave patterns (SWP) of $|U(x)|$ between glottis and lips for F1, F2, and F3; the zeros of $|U(x)|$ lie at $2/3$ of the tube length for F2 and at $2/5$ and $4/5$ for F3]
Acoustic Theory of Speech Production 13
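[Figure: two uniform tubes of area $A$ extending from $x = -l$ to $x = 0$, one closed and one open at $x = 0$, each with admittance $Y$ seen at $x = -l$]

Tube closed at $x = 0$:
$P(x, j\omega) = 2P^+ \cos\dfrac{\omega x}{c}$
$U(x, j\omega) = -j\,\dfrac{A}{\rho c}\,2P^+ \sin\dfrac{\omega x}{c}$
$Y = j\,\dfrac{A}{\rho c}\tan\dfrac{\omega l}{c} \approx j\omega\,\dfrac{Al}{\rho c^2} = j\omega C_A$ for $\omega l/c \ll 1$
$f_n = \dfrac{c}{4l}\,(2n-1)$, $n = 1, 2, \ldots$

Tube open at $x = 0$ (half-wavelength resonator):
$P(x, j\omega) = -j\,2P^+ \sin\dfrac{\omega x}{c}$
$U(x, j\omega) = \dfrac{A}{\rho c}\,2P^+ \cos\dfrac{\omega x}{c}$
$Y = -j\,\dfrac{A}{\rho c}\cot\dfrac{\omega l}{c} \approx \dfrac{1}{j\omega\,\rho l/A} = \dfrac{1}{j\omega M_A}$ for $\omega l/c \ll 1$
$f_n = \dfrac{c}{2l}\,n$, $n = 0, 1, 2, \ldots$

$C_A = Al/\rho c^2$ = acoustic compliance
$M_A = \rho l/A$ = acoustic mass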
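The low-frequency approximations can be checked numerically by comparing the exact tube admittances with the lumped compliance and mass. In this sketch the constants (`RHO`, `C`) and the short-section dimensions are assumed values chosen for illustration.

```python
import math

RHO = 1.14e-3  # assumed density of air, g/cm^3
C = 35000.0    # assumed speed of sound, cm/s


def y_closed_imag(area, length, f):
    """Imaginary part of Y = j (A / rho c) tan(w l / c), tube closed at the far end."""
    w = 2.0 * math.pi * f
    return area / (RHO * C) * math.tan(w * length / C)


def y_open_imag(area, length, f):
    """Imaginary part of Y = -j (A / rho c) cot(w l / c), tube open at the far end."""
    w = 2.0 * math.pi * f
    return -area / (RHO * C) / math.tan(w * length / C)


def acoustic_compliance(area, length):
    return area * length / (RHO * C ** 2)


def acoustic_mass(area, length):
    return RHO * length / area


# Assumed short section: 3 cm^2 area, 2 cm long, evaluated at 100 Hz.
area, length, f = 3.0, 2.0, 100.0
w = 2.0 * math.pi * f
ratio_closed = y_closed_imag(area, length, f) / (w * acoustic_compliance(area, length))
ratio_open = y_open_imag(area, length, f) / (-1.0 / (w * acoustic_mass(area, length)))
print(ratio_closed, ratio_open)
```

Both ratios stay close to 1 while $\omega l/c \ll 1$, so a short closed section acts as a compliance and a short open section as a mass.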
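[Figure: vocal tract shapes for [ i ], [ a ], and [ u ], approximated by tube sections with areas $A_1$, $A_2$ and lengths $l_1$, $l_2$]
[Figure: two-tube model driven by $U_G$, with back tube ($A_1$, $l_1$), front tube ($A_2$, $l_2$), and output $U_L$]

Resonances satisfy $Y_1 + Y_2 = 0$:
$\dfrac{A_1}{\rho c}\tan\dfrac{\omega l_1}{c} - \dfrac{A_2}{\rho c}\cot\dfrac{\omega l_2}{c} = 0$
or equivalently
$A_1 \sin\dfrac{\omega l_1}{c}\,\sin\dfrac{\omega l_2}{c} - A_2 \cos\dfrac{\omega l_1}{c}\,\cos\dfrac{\omega l_2}{c} = 0$

For $A_1 \gg A_2$ the two tubes decouple:
$f_n = \dfrac{c}{2 l_1}\,n$
$f_n = \dfrac{c}{2 l_2}\,n$
plus, at low frequencies:
$f = \dfrac{c}{2\pi}\left(\dfrac{A_2}{A_1 l_1 l_2}\right)^{1/2} = \dfrac{1}{2\pi}\left(\dfrac{1}{C_{A_1} M_{A_2}}\right)^{1/2}$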
Two-tube model for [ a ]: $A_1$ = 1 cm^2, $l_1$ = 9 cm; $A_2$ = 7 cm^2, $l_2$ = 8 cm

Formant   Actual (Hz)   Estimated (Hz)
F1        789           972
F2        1276          1093
F3        2808          2917

Two-tube model for [ i ]: $A_1$ = 8 cm^2, $l_1$ = 9 cm; $A_2$ = 1 cm^2, $l_2$ = 6 cm

Formant   Actual (Hz)   Estimated (Hz)
F1        256           268
F2        1905          1944
F3        2917          2917
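The estimated formants are consistent with treating the two tubes as decoupled resonators. The sketch below assumes $c = 35000$ cm/s (a value inferred from the estimates, not stated in the slides), quarter-wavelength resonances for both tubes of the [ a ] model, and half-wavelength resonances plus a low-frequency (Helmholtz) resonance for the [ i ] model. Function names are illustrative.

```python
import math

C = 35000.0  # assumed speed of sound in cm/s


def quarter_wave(length, fmax):
    """Resonances (2n - 1) c / (4 l) up to fmax."""
    out, n = [], 1
    while (2 * n - 1) * C / (4.0 * length) <= fmax:
        out.append((2 * n - 1) * C / (4.0 * length))
        n += 1
    return out


def half_wave(length, fmax):
    """Resonances n c / (2 l), n = 1, 2, ..., up to fmax."""
    out, n = [], 1
    while n * C / (2.0 * length) <= fmax:
        out.append(n * C / (2.0 * length))
        n += 1
    return out


def helmholtz(a_back, l_back, a_front, l_front):
    """Low-frequency resonance (c / 2 pi) sqrt(A2 / (A1 l1 l2))."""
    return C / (2.0 * math.pi) * math.sqrt(a_front / (a_back * l_back * l_front))


# [ a ]: narrow back tube (l1 = 9 cm), wide front tube (l2 = 8 cm).
est_a = sorted(quarter_wave(9.0, 4000.0) + quarter_wave(8.0, 4000.0))[:3]

# [ i ]: wide back tube (8 cm^2, 9 cm), narrow front tube (1 cm^2, 6 cm).
est_i = sorted([helmholtz(8.0, 9.0, 1.0, 6.0)]
               + half_wave(9.0, 4000.0) + half_wave(6.0, 4000.0))[:3]

print([round(f, 1) for f in est_a])
print([round(f, 1) for f in est_i])
```

Under these assumptions the results agree with the Estimated columns to within about 1 Hz.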
[Figure: spectrographic displays of /bit/ and /bat/ against Time (seconds), each with panels for Zero Crossing Rate, Total Energy (dB), Energy -- 125 Hz to 750 Hz (dB), Waveform, and a spectrogram with frequency in kHz]
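[Figure: tube model with a nasal branch: source $U_G$; sections labeled $A_p$, $l_p$, $Y_p$; $A_n$, $Y_n$ with output $U_N$; and $A_o$, $l_o$, $Y_o$]
[Figure: tube model with sections ($A_b$, $l_b$), ($A_c$, $l_c$), and ($A_f$, $l_f$), pressure source $P_s$, and output $U_L$]
$Y_1 = 0$
$Y_3 + Y_4 = 0$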
Consonant Production
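[Figure: consonant production model: back cavity ($A_b$, $l_b$), constriction ($A_c$, $l_c$) with pressure source $P_s$, and front cavity ($A_f$, $l_f$)]

Model dimensions (areas in cm^2, lengths in cm):

         [g]     [s]
A_b      5       5
A_c      0.2     0.5
A_f      4       4
l_b      9       11
l_c      3       3
l_f      5       2.5

Poles and zeros (Hz):

[g]                  [s]
POLES    ZEROS       POLES    ZEROS
215      0           306      0
1750     1944        1590     1590
1944     2916        3180     2916
3888     3888        3500     3180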
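The tabulated poles and zeros can be reproduced approximately by treating each section as a decoupled resonator. Everything in this sketch is an assumption inferred from the numbers rather than a method stated in the slides: $c = 35000$ cm/s; poles from the Helmholtz resonance of the back cavity and constriction, the half-wavelength resonances of the back cavity, and the quarter-wavelength resonances of the front cavity; zeros at 0 Hz, at the half-wavelength resonances of the back cavity, and at the quarter-wavelength resonances of the constriction.

```python
import math

C = 35000.0  # assumed speed of sound in cm/s


def half_wave(length, fmax):
    """Resonances n c / (2 l), n = 1, 2, ..., up to fmax."""
    return [n * C / (2.0 * length)
            for n in range(1, int(2.0 * length * fmax / C) + 1)]


def quarter_wave(length, fmax):
    """Resonances (2n - 1) c / (4 l) up to fmax."""
    out, n = [], 1
    while (2 * n - 1) * C / (4.0 * length) <= fmax:
        out.append((2 * n - 1) * C / (4.0 * length))
        n += 1
    return out


def helmholtz(a_b, l_b, a_c, l_c):
    """Low-frequency resonance of the back cavity with the constriction as neck."""
    return C / (2.0 * math.pi) * math.sqrt(a_c / (a_b * l_b * l_c))


def poles(a_b, l_b, a_c, l_c, l_f, fmax=4000.0):
    return sorted([helmholtz(a_b, l_b, a_c, l_c)]
                  + half_wave(l_b, fmax) + quarter_wave(l_f, fmax))


def zeros(l_b, l_c, fmax=4000.0):
    return sorted([0.0] + half_wave(l_b, fmax) + quarter_wave(l_c, fmax))


# Dimensions from the table above (areas in cm^2, lengths in cm).
g_poles, g_zeros = poles(5.0, 9.0, 0.2, 3.0, 5.0), zeros(9.0, 3.0)
s_poles, s_zeros = poles(5.0, 11.0, 0.5, 3.0, 2.5), zeros(11.0, 3.0)
print([round(f) for f in g_poles], [round(f) for f in g_zeros])
print([round(f) for f in s_poles], [round(f) for f in s_zeros])
```

Under these assumptions every computed value falls within a few Hz of the corresponding table entry.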
[Figure: spectrographic displays of /kip/ and /si/ against Time (seconds), each with panels for Zero Crossing Rate, Total Energy (dB), Energy -- 125 Hz to 750 Hz (dB), Waveform, and a spectrogram with frequency in kHz]
Perturbation Theory
Consider a uniform tube, closed at one end and open at the other.
For a short section of length $l$ and area $A$ near the opening:
$Y \approx -j\,\dfrac{A}{\omega \rho l}$ for small $l$
Reducing the area of a small piece of the tube near the opening
(where U is max) has the same effect as keeping the area fixed
and lengthening the tube.
Since lengthening the tube lowers the resonant frequencies,
narrowing the tube near points where U(x) is maximum in the
standing wave pattern for a given formant decreases the value of
that formant.
For a short section of length $l$ and area $A$ near the closure:
$Y \approx j\omega\,\dfrac{A l}{\rho c^2}$ for small $l$
Reducing the area of a small piece of the tube near the closure
(where p is max) has the same effect as keeping the area fixed and
shortening the tube.
Since shortening the tube will increase the values of the formants,
narrowing the tube near points where p(x) is maximum in the
standing wave pattern for a given formant will increase the value
of that formant.
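Both perturbation rules can be checked numerically on a lossless tube built from concatenated sections. The sketch below is an illustration under stated assumptions, not the lecture's method: it multiplies lossless chain matrices from glottis to lips, takes the lips as an ideal open end ($p = 0$), and finds the frequencies where the matrix element relating glottal to lip volume velocity crosses zero. The value $c = 35000$ cm/s, the 17.5 cm length, and the areas are assumed.

```python
import math

C = 35000.0  # assumed speed of sound in cm/s


def chain_d(sections, f):
    """D element of the chain matrix product, glottis to lips (rho*c factored out).

    sections: list of (area_cm2, length_cm), ordered from glottis to lips.
    With p = 0 at the lips, U_glottis = D * U_lips, so resonances are zeros of D.
    """
    k = 2.0 * math.pi * f / C
    a, b, c, d = 1.0, 0.0, 0.0, 1.0  # matrix [[a, j b], [j c, d]]
    for area, length in sections:
        cs, sn = math.cos(k * length), math.sin(k * length)
        a, b, c, d = (a * cs - b * area * sn,
                      a * sn / area + b * cs,
                      c * cs + d * area * sn,
                      d * cs - c * sn / area)
    return d


def bisect_root(sections, lo, hi, iters=60):
    """Refine a sign change of D between lo and hi."""
    d_lo = chain_d(sections, lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        d_mid = chain_d(sections, mid)
        if d_lo * d_mid <= 0.0:
            hi = mid
        else:
            lo, d_lo = mid, d_mid
    return 0.5 * (lo + hi)


def formants(sections, fmax=3000.0, step=1.0):
    """Scan D(f) on a grid and refine each zero crossing."""
    out, prev_f, prev_d = [], 0.0, 1.0  # D(0) = 1
    for i in range(1, int(fmax / step) + 1):
        f = i * step
        cur = chain_d(sections, f)
        if cur == 0.0:
            out.append(f)
        elif prev_d != 0.0 and prev_d * cur < 0.0:
            out.append(bisect_root(sections, prev_f, f))
        prev_f, prev_d = f, cur
    return out


uniform = formants([(5.0, 17.5)])
narrow_at_lips = formants([(5.0, 16.5), (2.5, 1.0)])     # constriction where U is max
narrow_at_glottis = formants([(2.5, 1.0), (5.0, 16.5)])  # constriction where p is max
print(uniform[0], narrow_at_lips[0], narrow_at_glottis[0])
```

With these assumed dimensions, F1 of the uniform tube is 500 Hz; narrowing the last centimeter at the lips lowers it, and narrowing the first centimeter at the glottis raises it.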
[Figure: standing wave patterns (SWP) of $|U(x)|$ between glottis and lips for F1, F2, and F3, with markers along the tube showing where a constriction changes each formant]
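[Figure: vocal tract approximated by $N$ concatenated lossless tube sections of length $\Delta x$ with areas $A_k$; propagation delay per section $\tau = \Delta x / c = l/(Nc)$]
[Figure: junction between the k th tube (area $A_k$) and the (k + 1) st tube (area $A_{k+1}$), with forward waves $U_k^+$, $U_{k+1}^+$ and backward waves $U_k^-$, $U_{k+1}^-$; signal flow graph with DELAY elements and gains $1 + r_k$, $1 - r_k$, $r_k$, $-r_k$]

$U_{k+1}^+(t) = (1 + r_k)\,U_k^+(t - \tau) + r_k\,U_{k+1}^-(t)$
$U_k^-(t + \tau) = -r_k\,U_k^+(t - \tau) + (1 - r_k)\,U_{k+1}^-(t)$
$r_k = \dfrac{A_{k+1} - A_k}{A_{k+1} + A_k}$, note $|r_k| \le 1$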
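The reflection coefficient at each junction depends only on the two adjacent areas, $r_k = (A_{k+1} - A_k)/(A_{k+1} + A_k)$. A minimal sketch, using a hypothetical area function chosen only for illustration:

```python
def reflection_coefficients(areas):
    """Return r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k) for each junction."""
    return [(a_next - a) / (a_next + a) for a, a_next in zip(areas, areas[1:])]


# Hypothetical area function in cm^2, ordered from glottis to lips.
areas = [1.0, 1.0, 7.0, 7.0]
r = reflection_coefficients(areas)
print(r)  # [0.0, 0.75, 0.0]
```

Junctions between equal areas give $r_k = 0$ (no reflection), and any pair of positive areas gives $|r_k| < 1$.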
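[Figure: z-domain signal flow graph of the junction between $U_k(z)$ and $U_{k+1}(z)$, with half-sample delays $z^{-1/2}$ and gains $1 + r_k$, $1 - r_k$, $r_k$, $-r_k$]
$\tau = \dfrac{\Delta x}{c} = \dfrac{l}{Nc}$
$T = 2\tau = \dfrac{2l}{Nc}$, so $N = \dfrac{2l}{cT}$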
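The relation $T = 2\tau = 2l/(Nc)$ fixes the number of tube sections once the tract length and sampling rate are chosen. A small sketch, assuming $c = 35000$ cm/s, a 17.5 cm tract, and a 10 kHz sampling rate (all illustrative values; the function name is hypothetical):

```python
def num_sections(length_cm, sample_rate_hz, c=35000.0):
    """Return N = 2 l / (c T), with T = 1 / sample_rate_hz."""
    return 2.0 * length_cm * sample_rate_hz / c


# Assumed: 17.5 cm vocal tract sampled at 10 kHz.
n_sections = num_sections(17.5, 10000.0)
print(n_sections)  # 10.0
```

Doubling the sampling rate doubles the number of sections needed for the same tract length.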
Series and shunt losses can also be introduced at tube junctions.
Bandwidths are proportional to the ratio of energy loss to energy storage.
Stored energy is proportional to tube length.
Assignment 1
References
Zue, 6.345 Course Notes
Stevens, Acoustic Phonetics, MIT Press, 1998.
Rabiner & Schafer, Digital Processing of Speech Signals,
Prentice-Hall, 1978.