Voice Analysis 2012minimodule - Lecture1

Voice analysis and
resynthesis for Psychologists

Summer 2012
Lecture 1
Introduction
Essential definitions
Basic acoustics
Course schedule
Lecture 1: voice production
Lecture 2: voice structure
Workshop 1: voice analysis
Workshop 2: speech analysis
Lecture 3: Voice & Psychology
Workshop 3: voice synthesis
Course Assessment
Exercise (100%, 1000 words):
produce a spectrogram of your voice, analyse/report values and discuss their relevance.
See instructions for more details.
Deadline: week 10 (check SD)
Language / Natural language
Language:
Open communication system that uses a set of written, gestural, or spoken
symbols that refer to people, objects or ideas.
Open-ended system of communication in which the grammatical structure allows
information of great cognitive complexity to be passed from one individual (the
speaker) to another (the listener).
Natural Language:
Spoken or signed language (as opposed to written languages or computer
programming languages). ASL, Spanish, English and French are natural
languages.
Linguistics:
The study of human language.
Speech:
Human spoken language (as opposed to sign language).
Voice
The sounds made by a person using the vocal folds for talking,
singing, screaming or crying.
The voice results from the act of phonation = the use of the
laryngeal system to generate an audible source of acoustic
energy.
Not just speech - but more generally vocal communication,
including animal vocal communication.
The term voice refers to the form and to the quality of the vocal
signal rather than to its content.
Phonetics & Phonology
Phonetics is about the physical production and
perception of speech sounds.
How vowels and consonants are produced, their acoustic structure, and
how they are perceived.
Phonology describes the way sounds function - within a
given language or across languages.
Textbooks:
A Course in Phonetics, Ladefoged, 2000
Speech Physiology, Speech Perception, and Acoustic Phonetics,
Lieberman & Blumstein, 1988
Principles of Voice Production, Titze, 1994
Bioacoustics
Bioacoustics: how animals use sound for communication
and echolocation.
How animals produce sounds, the physical structure of these sounds, how
animals perceive them, what their function is and how they evolved.
Textbooks:
The Evolution of Communication, Hauser, 1996
The Principles of Animal Communication, Bradbury & Vehrencamp, 1998
Animal Signals, Maynard Smith & Harper, 2003
Psychoacoustics
Psychoacoustics: the psychology of acoustical perception.
The study of subjective human perception of sounds.
Study of the relations between sound stimuli and their auditory
perception in terms of hearing sensations.
These relationships are neither simple nor linear.
Different people will hear different things when they listen to the
same sound.
Textbooks:
Speech Physiology, Speech Perception, and Acoustic Phonetics,
Lieberman & Blumstein, 1988
Introduction to Acoustics:
What is sound?
Vibration as perceived by the sense of hearing (Wikipedia, psychoacoustics definition)
A disturbance of the equilibrium of density (or pressure of a gas,
liquid or solid) (Titze, physics definition)
A local pressure disturbance in a continuous medium that contains
frequencies in the range of 20 to 20,000 Hz (the audible
range) (Titze, a compromise between physics and psychophysics)
Small variations in air pressure that occur very rapidly one after
another (Ladefoged)
A sound wave is caused by an increase in pressure at a certain point which
causes a "domino effect" outward.
If the perturbation is repeated periodically, then it
generates a series of sound waves.
Propagation and speed of Sound
In a homogeneous medium (~ the atmosphere), sound
propagates from the source at equal speed in all three
dimensions, therefore sound waves are spherical waves.
The speed at which sound propagates depends on the type,
temperature and pressure of the medium through which it
propagates.
[Figure: pressure as a function of space around a vibrating source.
The crests correspond to the high pressure points and the troughs
correspond to the low pressure points.]
In dry air at 20 °C the speed of sound is approximately 343 m/s.
That is approximately 1 meter every 2.9 milliseconds.
In the human vocal tract, which is more humid and warmer,
the speed of sound is higher, at 355 m/s.
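The dependence of the speed of sound on temperature can be illustrated with a minimal sketch. It assumes the standard ideal-gas approximation for dry air, c = 331.3 * sqrt(1 + T/273.15) with T in degrees Celsius, which is not given on the slides; the function names are hypothetical. Humidity is not modelled, so the result for a warm vocal tract will sit slightly below the 355 m/s quoted above.

```python
import math


def speed_of_sound_dry_air(temp_celsius):
    """Approximate speed of sound (m/s) in dry air at the given temperature."""
    # Ideal-gas approximation (an assumption, not taken from the lecture).
    return 331.3 * math.sqrt(1.0 + temp_celsius / 273.15)


def ms_per_metre(speed_m_per_s):
    """Time in milliseconds that sound needs to travel one metre."""
    return 1000.0 / speed_m_per_s


c_20 = speed_of_sound_dry_air(20.0)   # room temperature
c_37 = speed_of_sound_dry_air(37.0)   # body temperature, humidity ignored

print(f"Dry air at 20 C: {c_20:.1f} m/s, {ms_per_metre(c_20):.2f} ms per metre")
print(f"Dry air at 37 C: {c_37:.1f} m/s")
```

At 20 °C the approximation lands within 1 m/s of the 343 m/s figure used in this lecture.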
Waveform / Oscillogram
Sound waves can be represented as the temporal variation of sound
pressure at a fixed point in space - for example the membrane of
a microphone.
When we record a sound - we record (analogically or digitally) this
temporal variation.
Periodic Sounds
Most sounds are generated by oscillators (strings, vocal folds,
resonators, etc.).
Why oscillators?
Therefore most natural sounds are periodic (or quasi-periodic).
The pressure variation of a periodic sound is an oscillation with a
given period and a given amplitude.
This can be represented as a waveform or oscillogram.
Period
The period of a sound wave is the duration of
an oscillation cycle.
It can be measured as the time between two peaks.
Frequency
The frequency of a sound is the number of air
pressure oscillation cycles per second. It is
the multiplicative inverse (or reciprocal) of the
period: F = 1/T
Example: T = 7.5 ms,
F = 1/0.0075 s ≈ 133 Hz
One single oscillatory cycle per second corresponds to 1 Hz. This is not audible.
125 oscillations per second (the fundamental frequency of a typical male voice) is 125 Hz.
200 oscillations per second (the fundamental frequency of a typical female voice) is 200 Hz.
2000 oscillations per second (some bird calls) is 2000 Hz.
15000 oscillations per second (some bat calls) is 15000 Hz, etc.
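The reciprocal relation F = 1/T can be checked with a few lines of code. This is only a sketch: the function names are hypothetical, and the frequencies are the example values listed above.

```python
def frequency_from_period(period_s):
    """Frequency in Hz for a period given in seconds (F = 1/T)."""
    return 1.0 / period_s


def period_from_frequency(frequency_hz):
    """Period in seconds for a frequency given in Hz (T = 1/F)."""
    return 1.0 / frequency_hz


examples = [
    ("male voice F0", 125.0),
    ("female voice F0", 200.0),
    ("bird call", 2000.0),
    ("bat call", 15000.0),
]

for label, freq in examples:
    period_ms = period_from_frequency(freq) * 1000.0
    print(f"{label}: {freq:.0f} Hz -> period of {period_ms:.3f} ms")

# A period of 7.5 ms corresponds to roughly 133 Hz.
print(f"{frequency_from_period(0.0075):.1f} Hz")
```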
Wavelength
The wavelength of a periodic sound is the distance (in
space) between two successive crests (and is the
distance that a wave travels in the time of one
oscillatory cycle).
It is a function of the frequency of the sound and of the speed of
sound in the medium in which the sound is propagated.
The wavelength of a sound of frequency F traveling at speed c is
given by d = c/F.
For c = 343 m/s (speed of sound in the atmosphere):
a 20 kHz sound wave has a wavelength of 343/20000 m = 17 mm,
a 440 Hz wave has a wavelength of 343/440 m = 78 cm,
a 20 Hz wave (an elephant rumble) has a wavelength of 343/20 m = 17 m.
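The relation d = c/F can be reproduced numerically. The sketch below uses the lecture's value c = 343 m/s as the default; the function and constant names are hypothetical.

```python
SPEED_OF_SOUND_AIR = 343.0  # m/s, dry air at 20 C (value used in the lecture)


def wavelength_m(frequency_hz, speed_m_per_s=SPEED_OF_SOUND_AIR):
    """Wavelength in metres of a sound of the given frequency (d = c / F)."""
    return speed_m_per_s / frequency_hz


for freq in (20000.0, 440.0, 20.0):
    print(f"{freq:.0f} Hz -> {wavelength_m(freq):.4f} m")
```

Passing a different speed (for example 355 m/s for the vocal tract) shows how the same frequency has a longer wavelength in a faster medium.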
Amplitude, SPL and loudness
The amplitude is the magnitude of the change in sound pressure
within the wave. It corresponds to the maximum amount of
pressure at any point in the sound wave.
It is usually expressed as a Sound Pressure Level (SPL), measured in
decibels (dB), a logarithmic (perceptual) scale.
Examples of dB(SPL) levels: ambient speech in an office/restaurant:
60 dB, vacuum cleaner at 1 m: 80 dB, red deer roar at 1 m: 104 dB,
jet aircraft at 100 m: 120 dB, blue whale at 1 m: 180 dB.
Loudness is the perceptual correlate of amplitude: it is a
subjective, non-linear perceptual attribute of sound (it varies with
people, frequency, distance).
The intensity contour
The amplitude envelope of a sound is the
smooth curve that passes through the
peaks of the amplitude.
It is also called the intensity contour.
It determines the temporal structure of
animal calls / speech.
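To make the logarithmic nature of the decibel scale concrete, here is a small sketch of the usual dB(SPL) conversion. It assumes the standard definition SPL = 20 * log10(p / p0) with the conventional reference pressure p0 = 20 µPa in air; neither the formula nor the function names appear on the slides.

```python
import math

P_REF = 20e-6  # reference pressure in pascals (20 micropascals, assumed convention for air)


def spl_db(pressure_pa):
    """Sound pressure level in dB(SPL) for an RMS pressure in pascals."""
    return 20.0 * math.log10(pressure_pa / P_REF)


def pressure_from_spl(level_db):
    """RMS pressure in pascals for a level given in dB(SPL)."""
    return P_REF * 10.0 ** (level_db / 20.0)


# Pressure ratio between a 120 dB source and a 60 dB source.
ratio = pressure_from_spl(120.0) / pressure_from_spl(60.0)
print(f"120 dB vs 60 dB: pressure ratio of about {ratio:.0f}")

# Doubling the pressure adds about 6 dB.
print(f"{spl_db(0.04) - spl_db(0.02):.2f} dB")
```

Under these assumptions, a 60 dB difference between two levels corresponds to a thousandfold difference in sound pressure.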
How can we study the frequency structure
of sounds?
Spectrums: frequency / amplitude representation. The time
dimension is removed.
Spectrograms:
They make it possible to visualise the distribution
of the energy (amplitude) in two
dimensions: time (s) and
frequency (Hz).
Time is on the x axis, frequency on
the y axis, and the energy is
represented by different shades
of grey.
[Figure: waveform, spectrum and spectrogram panels; axes: Time (s), Amplitude (dB)]
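The difference between a spectrum (time removed) and a spectrogram (one spectrum per time frame) can be sketched in code. This is only an illustration: it uses a naive discrete Fourier transform from the standard library rather than the optimised FFT that analysis software uses, and the sample rate, frame length and test signal are all assumptions chosen so that the tones fall exactly on analysis bins.

```python
import cmath
import math


def magnitude_spectrum(samples, sample_rate):
    """Return (frequencies_hz, magnitudes) from 0 Hz up to half the sample rate."""
    n = len(samples)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        # Correlate the signal with a complex sinusoid at frequency k * sample_rate / n.
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        freqs.append(k * sample_rate / n)
        mags.append(abs(acc) / n)
    return freqs, mags


def spectrogram(samples, sample_rate, frame_length, hop):
    """Return one (start_time_s, frequencies_hz, magnitudes) tuple per frame."""
    frames = []
    for start in range(0, len(samples) - frame_length + 1, hop):
        frame = samples[start:start + frame_length]
        freqs, mags = magnitude_spectrum(frame, sample_rate)
        frames.append((start / sample_rate, freqs, mags))
    return frames


SAMPLE_RATE = 8000  # Hz (assumed)
# 0.1 s test signal: a 200 Hz tone followed by a 1000 Hz tone.
signal = [
    math.sin(2 * math.pi * (200.0 if i < 400 else 1000.0) * i / SAMPLE_RATE)
    for i in range(800)
]

# Spectrum of the first half: a single frequency/amplitude picture, no time axis.
freqs, mags = magnitude_spectrum(signal[:400], SAMPLE_RATE)
peak_hz = freqs[mags.index(max(mags))]

# Spectrogram: the strongest frequency is tracked frame by frame.
frame_peaks = [
    f[m.index(max(m))] for _, f, m in spectrogram(signal, SAMPLE_RATE, 200, 200)
]
print(peak_hz, frame_peaks)
```

The spectrum of the first half peaks at 200 Hz, while the frame-by-frame peaks show the change from 200 Hz to 1000 Hz over time, which is the information a spectrum alone discards.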
Simple sound-waves: pure tones
Pure tones are single frequency tones with no harmonic content (no
overtones). This corresponds to a sine wave.
Examples of pure tones: whistles, scops owl hoots, most electronic beeps.
[Figure: waveform and spectrogram of a 1.5 kHz pure tone; axes: Time (s), Frequency (kHz), dB]
Complex sounds
Animals, humans and most
musical instruments usually
generate periodic sounds
which have energy at more
than one frequency.
These sounds are called
complex sounds.
These sounds are composed of
more than one pure tone
(more than one sinusoidal
wave).
Examples:
Red deer roar, herring gull call.
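A pure tone and a complex sound can be synthesised in a few lines to see the difference. In this sketch the component frequencies and amplitudes of the complex tone, the sample rate and the function name are all illustrative assumptions; only the 1.5 kHz pure tone echoes a value from the slides.

```python
import math


def synthesise(components, duration_s, sample_rate):
    """Sum sinusoids given as (frequency_hz, amplitude) pairs into a list of samples."""
    n = int(round(duration_s * sample_rate))
    return [
        sum(amp * math.sin(2 * math.pi * freq * i / sample_rate) for freq, amp in components)
        for i in range(n)
    ]


SAMPLE_RATE = 8000  # Hz (assumed)

# Pure tone: a single sine wave.
pure_tone = synthesise([(1500.0, 1.0)], 0.05, SAMPLE_RATE)

# Complex sound: several sine waves added together.
complex_tone = synthesise([(200.0, 1.0), (400.0, 0.5), (600.0, 0.25)], 0.05, SAMPLE_RATE)

# The complex tone still repeats every 1/200 s, i.e. every 40 samples at 8000 Hz.
period_samples = SAMPLE_RATE // 200
max_diff = max(
    abs(complex_tone[i] - complex_tone[i + period_samples])
    for i in range(len(complex_tone) - period_samples)
)
print(f"largest difference between successive cycles: {max_diff:.2e}")
```

Although it contains three frequencies, the complex tone remains periodic, with the period set by its lowest component.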
Fundamental frequency and harmonics in
complex periodic sounds
Typical vocal sounds are composed of several sinusoidal waves which appear
on spectrograms as evenly spaced, parallel, narrow frequency components.
The lowest of these parallel
frequency components is called
the fundamental frequency
(F0).
The harmonics are integer
multiples of the fundamental
frequency: H1 = 2F0, H2 = 3F0,
etc.
The fundamental frequency
determines the pitch of the tone
(how high or low it is perceived
to be).
The variation of F0 with time
determines the fundamental
frequency contour. In speech it
affects the intonation.
[Figure: spectrogram showing F0 and harmonics; x axis: Time (s)]
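The harmonic series and the recovery of F0 from a waveform can be sketched as follows. The estimator here is a plain autocorrelation search, chosen for brevity; it is an assumption for illustration and not necessarily what voice-analysis software does. The search range, sample rate, harmonic amplitudes and function names are also hypothetical.

```python
import math


def harmonic_series(f0_hz, n_harmonics):
    """Return F0 followed by its harmonics H1 = 2*F0, H2 = 3*F0, ..."""
    return [f0_hz * k for k in range(1, n_harmonics + 2)]


def estimate_f0(samples, sample_rate, min_hz=75.0, max_hz=400.0):
    """Estimate F0 as the lag (within the search range) that maximises the autocorrelation."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    n = len(samples) - max_lag  # same number of products for every lag
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


SAMPLE_RATE = 8000  # Hz (assumed)
F0 = 125.0          # Hz, the male-voice example value

# Complex periodic sound: F0 plus four harmonics with decreasing amplitude.
tone = [
    sum(math.sin(2 * math.pi * F0 * k * i / SAMPLE_RATE) / k for k in range(1, 6))
    for i in range(800)
]

print(harmonic_series(F0, 3))
print(estimate_f0(tone, SAMPLE_RATE))
```

Restricting the search to 75-400 Hz keeps the estimator from locking onto a multiple of the true period, a common failure of simple autocorrelation methods.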
The pitch
What is the Pitch of a voice?
The pitch is the perceived height of a voice (Titze).
It is mainly determined by the fundamental frequency of the sound.
Pitch goes up to 1.4 kHz (whistle register - female singers only).
[Figure: pitch scale with example voices; recoverable labels: 60 Hz, 140 Hz, 200 Hz, 500 Hz, very low male (early morning), female, child]
Noise
Noise is sound that is made of an aperiodic series of waves, corresponding
to irregular and disordered vibrations that include all possible
frequencies (e.g. waves breaking on shore, wind).
It can play a role in speech: e.g. whisper - see next lecture.
[Figure: white noise; x axis: Time (s)]
Formants
The distribution of the energy is not
uniform across frequencies:
The peaks and valleys represent the
resonances that take place in the
cavities of the vocal tract.
They are called formants (from Latin formare
= to shape) because they shape
the spectral structure of the
speech signal.
Formants are central to human
speech as they provide the
acoustic variation at the basis of
vowels and consonants
(see next lectures!).
[Figure: spectrogram showing formants; x axis: Time (s)]
Summary
A vocal sound can be
described in terms of
its:
- intensity contour
- periodicity: pure tone? F0 + harmonics? F0 contour? noise?
- spectral envelope: resonance frequencies (formants)
How vertebrates make sounds
Anurans use their
larynx; they often use
two sets of folds (AM).
Birds use their syrinx,
located at the base of
the trachea.
Mammals use their
larynx, at the top of the
trachea.
From Fitch & Hauser 2002
What is the vocal apparatus
The lungs (generate
the power)
The trachea
The larynx
The supralaryngeal
vocal tract:
- the pharynx
- the mouth
- the nasal cavity
From Titze, 1994
The two functional components: the
source and the filter
Speech (and most mammal) sounds result from a two-stage process:
- a periodic wave (called the glottal wave) is generated
in the larynx (= the source). Its fundamental frequency
determines the pitch of the voice.
- this wave is then filtered in the supralaryngeal
cavities of the vocal tract (= the filter), creating broad
bands of energy called vocal tract resonances or
formants.
Defined by Fant, G. (1960). Acoustic Theory of Speech Production.
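The two-stage process can be sketched as a toy simulation: a periodic source is generated, then passed through a resonant filter. Everything specific here is an assumption for illustration: the source is a bare impulse train (a crude stand-in for the glottal wave), the filter is a single two-pole digital resonator standing in for one formant, and the F0, formant frequency, bandwidth and function names are hypothetical.

```python
import cmath
import math


def glottal_pulse_train(f0_hz, duration_s, sample_rate):
    """Source: one unit impulse per cycle (a crude stand-in for the glottal wave)."""
    period = int(round(sample_rate / f0_hz))
    n = int(round(duration_s * sample_rate))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def resonator(samples, centre_hz, bandwidth_hz, sample_rate):
    """Filter: a two-pole resonator that boosts energy around centre_hz."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    theta = 2.0 * math.pi * centre_hz / sample_rate
    a1, a2 = 2.0 * r * math.cos(theta), -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in samples:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def magnitude_at(samples, frequency_hz, sample_rate):
    """Magnitude of the component of the signal at one frequency."""
    acc = sum(
        x * cmath.exp(-2j * math.pi * frequency_hz * i / sample_rate)
        for i, x in enumerate(samples)
    )
    return abs(acc) / len(samples)


SAMPLE_RATE = 8000  # Hz (assumed)
F0 = 100.0          # Hz (assumed)

source = glottal_pulse_train(F0, 0.1, SAMPLE_RATE)
voiced = resonator(source, centre_hz=500.0, bandwidth_hz=80.0, sample_rate=SAMPLE_RATE)

# Analyse the last 400 samples (five whole cycles) to skip the filter's onset.
harmonics = [F0 * k for k in range(1, 11)]
source_levels = [magnitude_at(source[400:], h, SAMPLE_RATE) for h in harmonics]
voiced_levels = [magnitude_at(voiced[400:], h, SAMPLE_RATE) for h in harmonics]

strongest = harmonics[voiced_levels.index(max(voiced_levels))]
print(f"strongest harmonic after filtering: {strongest:.0f} Hz")
```

In this sketch the source has the same energy at every harmonic, and the filter reshapes that flat spectrum so that the harmonic nearest the resonance dominates. The harmonic spacing is still set by the source, which is the separation of roles that the source-filter description relies on.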
Illustration of the source filter theory
Beheaded speaker: glottal wave only
Speaker with an anesthetized vocal tract: glottal wave filtered by a uniform tube
Speaker with a normal vocal tract: glottal wave filtered by a non-uniform, changing vocal tract
