Figure 9.1 Cross-section view of the human head. Acoustically significant features are labeled on the left side, and speech articulators are labeled on the right.
Figure 9.2 Various views of vocal fold vibration, with time proceeding left to right. (a) vocal folds from the top, looking down the throat; (b) cross section as viewed through the neck wall; (c) area of the vocal fold opening; (d) flow through the vocal folds; (e) electroglottograph (EGG), which measures the electrical admittance between the vocal folds.
9.2 Voice Physiology

Figure 9.1 shows a simple cross-sectional view of the human head, with the acoustically significant features labeled on the left side and the speech articulators labeled on the right side.

It is clear from figure 9.2b that the vocal folds are indeed folds, not cords, as they are often called. The vibration actually takes place in (at least) two dimensions: there is an oval-shaped oscillation as viewed from the top looking down the throat, and a vertical opening and closing motion as viewed from the front, looking inward through the neck wall.
As shown in figure 9.3, the main physiological components responsible for control of the vocal folds are the arytenoid cartilage and muscles, to which the vocal folds are attached. These structures are responsible for bringing the vocal folds together to cause them to oscillate (beat together). This bringing together is called "adduction." Pulling the vocal folds apart, called "abduction," causes the voice source to cease oscillation. These terms can be remembered by their relation to the words "addiction" and "abstinence" (from the Latin addicere and abstinere). The vocal folds oscillate when they are adducted: the breath pressure from the lungs forces them apart, and the resulting flow and the accompanying decrease in breath pressure between the vocal folds allow them to move back together.
It was once thought that the brain sent specific instructions to the vocal folds each cycle, to open and close them. This is a flawed notion when viewed in comparison with musical instrument oscillators, such as the lips of the brass player or the reed of the clarinet. The brain of a clarinetist doesn't send explicit instructions to the wooden reed to move each cycle. Both the vocal folds and the clarinet reed oscillate because of basic physics and the constant energy provided by the breath pressure of the player/singer.
The other physiological components that are important to speech will be covered from the standpoint of their functional articulatory use. The main articulators are the following:

The tongue
The lips
The jaw
The velum
The position (height) of the larynx.
Most people are fairly familiar with manipulations of their tongue, lips, and jaw in speech, but much less so with the control of their velum and larynx height. The velum is a small flap of skin in the back of the throat that controls the amount of sound and air that is allowed to enter the nasal passages. If you say the sound 'ng' (as in sing) with the back of your tongue as far back as possible, your velum and tongue will touch, and all of the sound will come from your nose.
Control of larynx height allows human speakers to change the effective length of the vocal tract, thus giving a perception of a smaller or larger head on the part of the speaker. The voices of many cartoon characters have a characteristic high tenor quality, produced by raising the larynx. Large, intimidating vocal qualities like that of Star Wars' Darth Vader are produced with a lowered larynx. Most people are not aware of their control over larynx height, but when we talk to babies or pets, we often raise our larynx and talk high and fast to show approval or excitement, and we lower our larynx to show disapproval and become more intimidating. Good singers and professional actors with flexible control over their voice can produce a large range of sounds and voices by varying, among other things, the height of their larynx.
9.3 Vocal Tract Acoustics

The vocal folds behave as an oscillator driven by the breath pressure. The force that causes the vocal folds to close, the Bernoulli force, is the same force that lifts airplanes and causes a piece of paper to rise when you blow across it. In the voice, the high flow rate of the air rushing between the vocal folds causes the pressure to drop, and the vocal folds are sucked shut.
Different types of voiced vocal sounds (called phonation qualities) are distinguished by how closely the vocal folds are held together. The more tightly they are pressed together, the brighter the sound produced by the vocal fold oscillator. Also, in general, the louder the phonation (higher airflow), the brighter the sound produced by the vocal folds, as shown in figure 9.4. This is due to the increased velocity of vocal fold closure.
The vocal folds are often compared to the reed of the clarinet. There is one important difference between the function and behavior of the vocal folds and the clarinet reed, however, with resulting differences in the behaviors of the clarinet and the voice. Before discussing that difference, some fundamentals about the acoustics of tubes and pipes need to be covered.
Figure 9.4 Spectra of soft (left) and loud (right) sung tones. The spectrum decreases less with frequency in the loud sung tone, making it sound brighter as well as louder.
Figure 9.5 A tube, closed at both ends, and the first three modes of oscillation.
As was discussed in section 4.4 for the case of vibrating strings, pipes and tubes also exhibit modes, or "favored frequencies" of oscillation. A tube that is closed at both ends is analogous to the vibrating string anchored at both ends. The flow velocity must be zero at a closed end (no air can flow through the closed end), like the displacement at both ends of a vibrating string. Such points are called "nodes."
Any frequency that exhibits the correct nodal pattern (at the end points of a closed tube, for example) will be supported by the tube, and is thus a mode of that tube. Thus, as shown in figure 9.5, the modes of a pipe with two closed ends are integer multiples of a fundamental, which is one half-cycle of a sine wave whose wavelength is twice the length of the tube.
A tube that is open at both ends will exhibit the same modes, but with antinodes at both ends. The fundamental mode can be computed by the simple relationship

F1 = c/2l, (9.1)

where c = the speed of sound (about 1100 feet/second in air) and l is the length of the tube.
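Equation 9.1 is easy to check numerically. The sketch below (function names are illustrative, not from the text) lists the first few modal frequencies of a tube with like end conditions, using the chapter's value of c = 1100 feet/second:

```python
# Modal frequencies of a uniform tube with LIKE end conditions
# (closed at both ends, or open at both ends): f_n = n * c / (2l),
# per equation 9.1. A sketch with illustrative names.

C_FEET_PER_SEC = 1100.0  # approximate speed of sound in air (chapter's value)

def like_ended_tube_modes(length_ft, n_modes=3, c=C_FEET_PER_SEC):
    """First n_modes modal frequencies (Hz) of a tube open at both
    ends or closed at both ends."""
    f1 = c / (2.0 * length_ft)
    return [n * f1 for n in range(1, n_modes + 1)]

# The 4-foot tube (open at both ends) of figure 9.6:
print(like_ended_tube_modes(4.0))  # [137.5, 275.0, 412.5]
```

Note that the formula depends only on the tube's length and the speed of sound, which is why the singing-through-a-tube experiment described below works with any tube of the right length.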
Another important analytic result related to modal analysis is the transfer function. The transfer function of a linear system describes what happens to a signal traveling through that system. In our case the system is an acoustic tube, and our interest is eventually to describe what happens to the waveform coming from the vocal folds as it proceeds through the vocal tract acoustic tube and out through the lips. A common and powerful representation of the transfer function is a graph of gain versus frequency, representing how much a sine wave at each frequency would be amplified or attenuated in going through the system.
Figure 9.6 Experimentally measured transfer function of a 4-foot tube open at both ends.
The most important perceptual features are the peaks in the transfer function, since these are the features to which our ears are most sensitive. The modal frequencies of an acoustic tube exhibit the most gain in the transfer function, with the regions between exhibiting lower gains. The modal resonance peaks in the transfer function are called formants.
To investigate resonant formants experimentally, find a tube about 4 feet long, and sing through it. As you sweep your voice frequency slowly up and down, you will find frequencies that are reinforced, or easier to sing, and others that seem difficult to sing. The easy frequencies are the modal formants, or peaks in the transfer function, as shown in figure 9.6.
The oral portion of the vocal tract is essentially a tube that is closed at the vocal fold end and open at the lip end. Assuming that this tube is straight and has uniform shape, the modal solutions are those that exhibit a node at one end and an antinode at the other, as shown in figure 9.7. The modes that satisfy those conditions are all odd multiples of one quarter-cycle of a sine wave, and thus the frequencies are all odd multiples of a fundamental mode F1, computed by

F1 = c/4l. (9.2)

Using l = 9 in. as a human vocal tract length in equation 9.2, the modal frequencies are 375 Hz, 1125 Hz, 1875 Hz, and so on.
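The quarter-wave series of equation 9.2 can be sketched the same way (illustrative names again; with c = 1100 ft/s the exact values come out slightly below the chapter's rounded figures):

```python
# Modal frequencies of a tube closed at one end and open at the other:
# odd multiples of f1 = c / (4l), per equation 9.2. Illustrative sketch.

C_FEET_PER_SEC = 1100.0

def closed_open_tube_modes(length_ft, n_modes=3, c=C_FEET_PER_SEC):
    """First n_modes modal frequencies (Hz) of a closed-open tube."""
    f1 = c / (4.0 * length_ft)
    return [(2 * n - 1) * f1 for n in range(1, n_modes + 1)]

# A 9-inch (0.75 ft) vocal tract:
print([round(f) for f in closed_open_tube_modes(0.75)])
# about [367, 1100, 1833]; the chapter quotes odd multiples
# of a fundamental rounded to 375 Hz
```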
A transfer function of such a tube is shown in figure 9.7.
The vibrational modes of an acoustic tube that are computed from the end conditions (open or closed) are called longitudinal modes.

Figure 9.7 Upper: a tube, closed at one end and open at the other, and the first three modes of vibration. Lower: magnitude transfer function of the tube.
Figure 9.8 The first three cross modes of a rectangular tube. If our vocal tract were in this shape, and 2 in. wide, the frequencies of these first three modes would be 3000, 6000, and 9000 Hz.
The assumption that the longitudinal modes and transfer function are the most significant is based on two things: (1) the tube is much longer than it is wide, and (2) the lower-frequency modes are perceptually dominant. The next significant modes would be the cross modes, but how significant are they?
For simplicity, if we assume that the tube cross section is a square, as shown in figure 9.8, 2 inches on each side, the cross-modal frequencies, as computed using equation 9.1, are 3000 Hz, 6000 Hz, and so on. Even assuming a 3-inch cross-sectional dimension, the lowest cross-modal frequency is 2000 Hz, which is near the third longitudinal mode frequency of 1875 Hz. The tube diameter in normal speech never approaches 3 in., and thus the first cross mode is always significantly above the first three longitudinal resonances of the vocal tract, which are considered to be the most important for understanding speech.
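A cross mode behaves like the mode of a tube closed at both ends across the width, so equation 9.1 applies with the width in place of the length. A small sketch (names are illustrative; with c = 1100 ft/s the exact values land somewhat above the chapter's rounded 3000/6000/9000 Hz figures, which does not change the conclusion):

```python
# Cross modes of a tube of width w, computed as the like-ended modes
# across that width: f_m = m * c / (2w). Illustrative sketch.

C_INCHES_PER_SEC = 1100.0 * 12.0  # speed of sound in inches/second

def cross_modes(width_in, n_modes=3, c=C_INCHES_PER_SEC):
    """First n_modes cross-mode frequencies (Hz) for a tube of given width."""
    return [m * c / (2.0 * width_in) for m in range(1, n_modes + 1)]

print([round(f) for f in cross_modes(2.0)])  # about [3300, 6600, 9900]
print(round(cross_modes(3.0)[0]))            # about 2200: even an unrealistically
                                             # wide tract keeps the first cross mode
                                             # above the first two formants
```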
The modes of an acoustic tube can be changed in frequency by changing the shape of the tube. Some simple rules for modal behavior as a function of tube manipulation are the following:
1. Narrowing the tube at a point
   a. Raises the frequency of any mode that exhibits an antinode at that point, and
   b. Lowers the frequency of any mode that exhibits a node at that point.
2. Widening the tube at a point
   a. Lowers the frequency of any mode that exhibits an antinode at that point, and
   b. Raises the frequency of any mode that exhibits a node at that point.
These rules form the acoustical basis for the human ability to form vowels, and thus our ability to convey information through speech. By changing the shape of the vocal tract acoustic tube, we move the natural resonant frequencies of the tube.
The tube acts as a filter for the voice source, shaping the spectrum of the source according to the gain-versus-frequency relationship described by the transfer function.
In the experiment of singing through a 4-foot-long tube, we found that while we could sing any frequency, some were easier to sing and some were harder. The vocal tract is much shorter than 4 feet (for all of the humans I know), and thus most or all of the formant frequencies lie above the frequency of the speaking voice source (about 100 Hz in males and about 150 Hz in females).
In the voice, the vocal folds are free to oscillate at any frequency (within a reasonable range of 50-1500 Hz), and the transfer function acts to shape the spectrum of the harmonics of the voice source. The resulting sound coming from the lips is modified by the vocal tract transfer function, often called the vocal tract filter. The basis of the source/filter model of the vocal mechanism is depicted in figure 9.9.
The voice source generates a complex waveform at some frequency that is controlled by the human speaker/singer. The vocal tract filter shapes the spectrum of the source according to a transfer function determined by the shape of the vocal tract acoustic tube, also under control of the human speaker/singer. The result is a harmonic spectrum with the overall shape of the transfer function determined by the shape of the vocal tract.
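The source/filter idea can be sketched in a few lines: a harmonic source spectrum with a simple rolloff is multiplied by a transfer function built from a few formant peaks. Everything here is an assumed, idealized shape for illustration (the rolloff, the resonance curve, and the formant values, which are the rounded quarter-wave resonances of a 9-inch neutral tract), not measured voice data:

```python
import numpy as np

def source_spectrum(f0, n_harmonics=40):
    """Harmonic frequencies and amplitudes of an idealized glottal source."""
    n = np.arange(1, n_harmonics + 1)
    return f0 * n, 1.0 / n  # simple 1/n spectral rolloff (assumed)

def transfer_gain(freqs, formants, bandwidth=100.0):
    """Stand-in transfer function: a sum of simple resonance peaks."""
    gain = np.zeros_like(freqs, dtype=float)
    for fc in formants:
        gain += 1.0 / (1.0 + ((freqs - fc) / bandwidth) ** 2)
    return gain

# A 100 Hz source shaped by neutral-tract formants near 375/1125/1875 Hz:
freqs, amps = source_spectrum(100.0)
output = amps * transfer_gain(freqs, [375.0, 1125.0, 1875.0])
print(freqs[np.argmax(output)])  # 400.0: the harmonic nearest the
                                 # first formant comes out strongest
```

Changing `f0` moves the harmonics under a fixed formant envelope, while changing the formant list reshapes the envelope over a fixed set of harmonics, which is exactly the independence the source/filter model captures.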
Figure 9.9 Three vocal tract shapes, transfer functions, and voice output spectra.
With what we now know about voice acoustics, we can point out a fundamental difference between the human voice and instruments like the clarinet. In the case of the voice, the source has the ability to oscillate at essentially any frequency, and the vocal tract tube can change shape, grossly modifying the final spectrum coming from the lips.
The clarinetist can modify the resonant structure of the instrument by changing the configurations of the tone holes, but the source follows along by oscillating at a new frequency. In such a system, it is said that the coupling between the reed source and bore filter is strong, meaning that the player has trouble controlling the source independently of the filter.
In the voice the coupling is weak, so we are able to sing arbitrary pitches on arbitrary vowels, in arbitrary combinations, to make speech and song.
Of course, we are able to make sounds other than vowels, such as nasalized sounds, consonants, and the like. By lowering the velum to open the nasal passage, another tube is opened, another set of resonances becomes active, and a nasalized sound results.
Another important phenomenon related to resonance, called cancellation, becomes active when a vocal sound is nasalized. This will be covered in a later chapter.
Many consonant sounds are caused by noise in the vocal tract, as shown in figure 9.10. When a constriction is formed and air travels through it, there is a possibility of turbulence. The likelihood of turbulence increases with increasing airflow, and it also increases as the constriction narrows. The final spectrum of the noise is related to two things: the turbulence itself, and the vocal tract tube downstream from the source of noise.
Figure 9.10 Vocal tract shapes and spectra for four unvoiced fricatives.
The high-frequency content of the turbulent noise increases with increasing airflow and with decreasing constriction. This is easily tested by forming the sound /s/ (as in sing) and first modulating your breath pressure up and down, then moving your tongue slightly up and down.
The effects of the acoustic tube on the final spectrum are described by the same transfer function concept used for voiced vowel sounds. To test this, form a /sh/ sound (as in shift) and move your lips around. You are not changing the turbulence much, just the resonance properties of the tube downstream.
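The noise-source-plus-downstream-filter idea can be sketched numerically: the same broadband "turbulence" shaped by two different downstream resonances yields two differently colored spectra, the way lip movement recolors /sh/. All parameters here are assumed for illustration; the single-resonance gain curve is a stand-in for a real tract transfer function:

```python
import numpy as np

rng = np.random.default_rng(0)
sample_rate = 16000
noise = rng.standard_normal(sample_rate)  # 1 second of turbulence-like noise

def downstream_filter(signal, resonance_hz, sr, bandwidth_hz=500.0):
    """Emphasize spectral energy near one downstream-tube resonance."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    gain = 1.0 / (1.0 + ((freqs - resonance_hz) / bandwidth_hz) ** 2)
    return np.fft.irfft(spectrum * gain, n=len(signal))

# Moving the lips changes the downstream resonance, not the turbulence:
sh_like = downstream_filter(noise, 2500.0, sample_rate)  # longer front cavity
s_like = downstream_filter(noise, 5000.0, sample_rate)   # shorter front cavity
```

The higher resonance pushes the noise energy upward in frequency, giving the brighter /s/-like quality relative to the /sh/-like one.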
Sounds such as /s/ and /sh/ are called unvoiced fricatives, and sounds such as /z/ and /v/ are called voiced fricatives.
9.4 Voice System Neurology

The two areas of the brain that have well-documented specific speech functions are Broca's area in the frontal lobe and Wernicke's area in the temporal lobe, as shown in figure 9.11. In general, the left half of the brain is dominant for language functions. This is most so in right-handed males, and least so in left-handed females.
Much of what we know about the function of the brain comes from case studies of people who experienced some type of nonfatal brain injury, so we can speak about how an injury to a specific region of the brain causes a specific impairment of function. Brain injuries resulting in speech or language impairments are called aphasias.
Figure 9.11 Areas of the human brain related to speech and language.
Damage to Broca's area leads to impaired grammatical speech production but relatively unimpaired comprehension. Damage to Wernicke's area causes impaired comprehension but possibly fluent speech with grammatically correct inflections. Of course, this specialization is not clear-cut: case studies of injuries are individual and as unpredictable as the accidents that caused the injuries.
Reading seems to piggyback on oral speech production, as shown in some case studies of injuries where phonetic, syllabary-based reading (English, or the Japanese written language kana) was impaired in addition to speech. In ideographic or pictographic written languages like Chinese or the Japanese kanji, the symbols are related to nonphonetic higher-level concepts (a symbol for the entire concept of sunrise, for example), and thus comprehension can be impaired without affecting reading ability.
A related but different phenomenon of aphasia is amusia, which is a brain injury resulting in impairment of musical ability, either comprehension or production. The French composer Maurice Ravel experienced both aphasia and amusia from brain disease in 1932, and remained both musically unproductive and speech-impaired until his death in 1937.
It would seem natural that the brain functions for language and musical processing could be quite closely linked, and indeed they are in many ways. They are quite independent in many other ways, however. The composer Vissarion Shebalin became aphasic from a stroke but composed quite well in his own style for many years thereafter.
Of course, the question of ability and competency, both linguistic and musical, has a lot to do with exposure, training, and memory. Chapter 17 will discuss more on the topic of memory for musical events, and case studies of brain injuries as they affect language and musical abilities.
Many interesting distinctions can be found in the processing of language in speech versus music. Sometimes singers perform songs with words in languages with which they are generally unfamiliar, and thus the performance is more like playing an instrument than speaking a language. In these cases the singer is often just rerunning a rehearsed motor program, not spontaneously trying to convey a new abstract meaning to another person on a personal level.
Singing has been used successfully as a therapy for persons with profound stuttering disorders; it seems that the rhythm allows the stutterer to form the words more fluently than if the rhythmic constraint were not present. Some aphasics have lost speech but retained their ability to sing songs they knew before their brain injury, and some are even able to learn and sing new songs while unable to speak in a fluent manner.
9.5 Conclusion

We use our voices to communicate with both words and descriptive sounds. A fundamental understanding of how the voice works, both acoustically and neurologically, allows us to study aspects of perception and production of sound later in this book.