Voice analysis and resynthesis for Psychologists


Summer 2012
Lecture 1
Introduction
Essential definitions
Basic acoustics

Course schedule
Lecture 1: voice production
Lecture 2: voice structure
Workshop 1: voice analysis
Workshop 2: speech analysis
Lecture 3: Voice & Psychology
Workshop 3: voice synthesis

Course Assessment
Exercise (100%, 1000 words):
produce a spectrogram of your voice, analyse/report values and discuss their relevance.
See instructions for more details.
Deadline: week 10 (check SD)

Language / Natural language

Language:
Open communication system that uses a set of written, gestural, or spoken
symbols that refer to people, objects or ideas.
Open-ended system of communication in which the grammatical structure allows
information of great cognitive complexity to be passed from one individual (the
speaker) to another (the listener).

Natural Language:
Spoken or signed language (as opposed to written languages or computer
programming languages). ASL, Spanish, English and French are natural
languages.

Linguistics:
The study of human language.

Speech:
Human spoken language (as opposed to sign language).

Voice:
The sounds made by a person using the vocal folds for talking,
singing, screaming or crying.
The voice results from the act of phonation, i.e. the use of the
laryngeal system to generate an audible source of acoustic
energy.

Not just speech, but more generally vocal communication,
including animal vocal communication.

The term voice refers to the form and to the quality of the vocal
signal rather than to its content.

Phonetics & Phonology


Phonetics is about the physical production and
perception of speech sounds.
How vowels and consonants are produced, their acoustic structure, and
how they are perceived.
Textbooks:
A Course in Phonetics, Ladefoged, 2000
Speech Physiology, Speech Perception, and Acoustic Phonetics, Lieberman & Blumstein, 1988
Principles of Voice Production, Titze, 1994

Phonology describes the way sounds function within a
given language or across languages.

Bioacoustics

Bioacoustics: how animals use sound for communication and echolocation.
How animals produce sounds, the physical structure of these sounds, how
animals perceive them, what their function is and how they evolved.

Textbooks:
The Evolution of Communication, Hauser, 1996
Principles of Animal Communication, Bradbury & Vehrencamp, 1998
Animal Signals, Maynard Smith & Harper, 2003

Psychoacoustics

Psychoacoustics: the psychology of acoustical perception.
The study of subjective human perception of sounds.
Study of the relations between sound stimuli and their auditory
perception in terms of hearing sensations.
These relationships are not simple and linear.
Different people may hear different things when they listen to the
same sound.

Textbooks:
Speech Physiology, Speech Perception, and Acoustic Phonetics, Lieberman & Blumstein, 1988

Introduction to Acoustics:

A sound wave is caused by an increase in pressure at a certain point which
causes a "domino effect" outward.

What is sound?
- Vibration as perceived by the sense of hearing (Wikipedia, psychoacoustics definition)
- A disturbance of the equilibrium of density (or pressure of a gas,
  liquid or solid) (Titze, physics definition)
- A local pressure disturbance in a continuous medium that contains
  frequencies in the range of 20 to 20,000 Hz (the audible
  range) (Titze, a compromise between physics and psychophysics)
- Small variations in air pressure that occur very rapidly one after
  another (Ladefoged)

If the perturbation is repeated periodically, then it
generates a series of sound waves.

[Figure: a vibrating source and the resulting pressure variation across space.]
The crests correspond to the high pressure points and the troughs
correspond to the low pressure points.

Propagation and speed of Sound

In a homogeneous medium (~ the atmosphere), sound
propagates from the source at equal speed in all three
dimensions, therefore sound waves are spherical waves.

The speed at which sound propagates depends on the type,
temperature and pressure of the medium through which it
propagates.

In dry air at 20 °C the speed of sound is approximately 343 m/s.
That's approximately 1 meter every 2.9 milliseconds.
In the human vocal tract, which is more humid and warmer,
the speed of sound is higher, at 355 m/s.
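
The "1 meter every 2.9 milliseconds" figure follows directly from the speed of sound. A minimal sketch of that arithmetic, using the two speeds quoted in the slides; the function name and the 0.17 m vocal tract length are illustrative assumptions, not values from the lecture:

```python
def travel_time_ms(distance_m: float, speed_m_per_s: float) -> float:
    """Time in milliseconds for sound to cover distance_m at the given speed."""
    return 1000.0 * distance_m / speed_m_per_s

# Speeds quoted in the lecture
SPEED_DRY_AIR_20C = 343.0   # m/s
SPEED_VOCAL_TRACT = 355.0   # m/s

print(round(travel_time_ms(1.0, SPEED_DRY_AIR_20C), 1))   # 2.9

# 0.17 m is an assumed, illustrative vocal tract length (not from the slides)
print(round(travel_time_ms(0.17, SPEED_VOCAL_TRACT), 2))
```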

Waveform / Oscillogram
Sound waves can be represented as the temporal variation of sound
pressure at a fixed point in space, for example at the membrane of
a microphone.
When we record a sound, we record (in analogue or digital form) this
temporal variation.
This can be represented as a waveform or oscillogram.

Periodic Sounds
Most sounds are generated by oscillators (strings, vocal folds,
resonators, etc.).
Why oscillators?
Therefore most natural sounds are periodic (or quasi-periodic).
The pressure variation of a periodic sound is an oscillation with a
given period and a given amplitude.

Period
The period of a sound wave is the duration of
an oscillation cycle.
It can be measured as the time between two peaks.

Frequency
The frequency of a sound is the number of air
pressure oscillation cycles per second. It is
the multiplicative inverse (or reciprocal) of the
period: F = 1/T
Example: T = 7.5 ms,
F = 1/0.0075 s ≈ 133 Hz

One single oscillatory cycle per second corresponds to 1 Hz. This is not audible.
125 oscillations per second (the fundamental frequency in male voice) is 125 Hz.
200 oscillations per second (the fundamental frequency in female voice) is 200 Hz.
2000 oscillations per second (some bird calls) is 2000 Hz.
15000 oscillations per second (some bat calls) is 15000 Hz, etc.
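
The reciprocal relation F = 1/T can be checked numerically. The sketch below converts in both directions, using the 7.5 ms period and the 125 Hz and 200 Hz voice frequencies mentioned above; the function names are illustrative:

```python
def frequency_hz(period_s: float) -> float:
    """Frequency is the reciprocal of the period: F = 1/T."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    return 1.0 / period_s

def period_ms(freq_hz: float) -> float:
    """Duration of one oscillation cycle, in milliseconds."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1000.0 / freq_hz

print(round(frequency_hz(0.0075)))  # 133 (a 7.5 ms period)
print(period_ms(125))               # 8.0 ms per cycle
print(period_ms(200))               # 5.0 ms per cycle
```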

Wavelength

The wavelength of a periodic sound is the distance (in
space) between two successive crests (and is the
distance that a wave travels in the time of one
oscillatory cycle).

It is a function of the frequency of the sound and of the speed of
sound in the medium in which the sound is propagated.
The wavelength of a sound of frequency F traveling at speed c is
given by d = c/F.

For c = 343 m/s (speed of sound in the atmosphere):
- a 20 kHz sound wave has a wavelength of 343/20000 ≈ 17 mm,
- a 440 Hz wave has a wavelength of 343/440 ≈ 78 cm,
- a 20 Hz wave (an elephant rumble) has a wavelength of 343/20 ≈ 17 m.
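
The three wavelengths above can be reproduced with d = c/F. This is a short sketch; the function name and the default argument are illustrative, and the default speed is the 343 m/s quoted in the slides:

```python
def wavelength_m(freq_hz: float, speed_m_per_s: float = 343.0) -> float:
    """Wavelength d = c / F, in meters."""
    return speed_m_per_s / freq_hz

for freq in (20000, 440, 20):
    print(f"{freq} Hz -> {wavelength_m(freq):.3f} m")
# 20000 Hz -> 0.017 m
# 440 Hz -> 0.780 m
# 20 Hz -> 17.150 m
```

Passing `speed_m_per_s=355.0` gives the corresponding wavelengths inside the vocal tract.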

Amplitude, SPL and loudness

The amplitude is the magnitude of the change in sound pressure
within the wave. It corresponds to the maximum amount of
pressure at any point in the sound wave.
It is usually reported as a Sound Pressure Level (SPL), measured in
decibels, dB(SPL), a logarithmic (perceptual) scale.
Examples of dB levels: ambient speech in an office/restaurant:
60 dB, vacuum cleaner at 1 m: 80 dB, red deer roar at 1 m: 104 dB,
jet aircraft at 100 m: 120 dB, blue whale at 1 m: 180 dB.
Loudness is the perceptual correlate of amplitude. It is a
subjective, non-linear perceptual attribute of sound (it varies with
people, frequency and distance).

The intensity contour

The amplitude envelope of a sound is the
smooth curve that passes through the
peaks of the amplitude.
It is also called the intensity contour.
It determines the temporal structure of
animal calls / speech.
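
The decibel scale and the amplitude envelope can both be sketched in a few lines. The conversion below assumes the conventional 20 µPa reference pressure for dB SPL in air, which the slides do not state; levels measured underwater are conventionally referenced to 1 µPa, so the same conversion would not apply to them directly. The envelope is approximated here by the peak absolute value in consecutive frames, which is one simple choice among several; all signal parameters are illustrative:

```python
import math

P_REF = 20e-6  # Pa; assumed standard reference pressure for dB SPL in air

def spl_db(pressure_pa: float) -> float:
    """Sound pressure level in dB re 20 µPa."""
    return 20.0 * math.log10(pressure_pa / P_REF)

def amplitude_envelope(samples, frame):
    """Peak absolute amplitude in each consecutive block of `frame` samples."""
    return [max(abs(s) for s in samples[i:i + frame])
            for i in range(0, len(samples), frame)]

print(round(spl_db(0.02)))  # 60: a pressure of 0.02 Pa is about 60 dB SPL

# A 200 Hz tone fading out linearly over 0.5 s (illustrative values)
sample_rate = 8000
n = sample_rate // 2
tone = [(1 - i / n) * math.sin(2 * math.pi * 200 * i / sample_rate)
        for i in range(n)]

# 80 samples = 10 ms = two full cycles of the 200 Hz tone per frame
envelope = amplitude_envelope(tone, frame=80)
```

Because each frame spans whole cycles, `envelope` traces the fade-out of the tone rather than its individual oscillations.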

How can we study the frequency structure of sounds?

Spectra: frequency / amplitude representation. The time
dimension is removed.

Spectrograms:
They make it possible to visualise the distribution
of the energy (amplitude) in two
dimensions: time (s) and
frequency (Hz).
Time is on the x axis, frequency on
the y axis, and the energy is
represented by different shades
of grey.
[Figure: waveform and spectrum panels. Axes: Time (s), 0 to 0.5; Amplitude (dB).]
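
To make the frequency / amplitude representation concrete, here is a minimal sketch of a spectrum computed with a naive discrete Fourier transform. It is written for readability, not speed, and is not the method of any particular analysis package. The test signal (200 Hz plus a weaker 400 Hz component) and all parameter values are illustrative assumptions:

```python
import cmath
import math

def magnitude_spectrum(samples, sample_rate):
    """Naive DFT: list of (frequency_hz, magnitude) for bins 0 .. N/2."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        spectrum.append((k * sample_rate / n, abs(coeff) / n))
    return spectrum

sample_rate = 8000
n = 400  # 50 ms of signal, which gives bins spaced 20 Hz apart
signal = [math.sin(2 * math.pi * 200 * t / sample_rate)
          + 0.5 * math.sin(2 * math.pi * 400 * t / sample_rate)
          for t in range(n)]

spectrum = magnitude_spectrum(signal, sample_rate)
peak_freq = max(spectrum, key=lambda pair: pair[1])[0]
print(peak_freq)  # 200.0
```

A spectrogram repeats this computation on successive short frames of the signal and plots each frame's magnitudes as one vertical strip of grey levels.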

Simple sound-waves: pure tones

Pure tones are single frequency tones with no harmonic content (no
overtones). This corresponds to a sine wave.
Examples of pure tones: whistles, scops owl hoots, most electronic beeps.

Complex sounds

Animals, humans and most
musical instruments usually
generate periodic sounds
which have energy at more
than one frequency.
These sounds are called
complex sounds.
These sounds are composed of
more than one pure tone
(more than one sinusoidal
wave).
Examples:
Red deer roar, herring gull call.

[Figure: pure tone and complex sound panels. Axes: Time (s), 0 to 0.5; Frequency (kHz); dB. Label: 1.5 kHz.]
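
The difference between a pure tone and a complex sound can be sketched by synthesis: a pure tone is one sine wave, and a complex sound is a sum of several. The function names, the 8 kHz sample rate and the chosen component frequencies are illustrative assumptions:

```python
import math

def pure_tone(frequency_hz, duration_s, sample_rate=8000, amplitude=1.0):
    """Samples of a single sine wave (a pure tone)."""
    n = round(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * frequency_hz * i / sample_rate)
            for i in range(n)]

def complex_tone(frequencies_hz, duration_s, sample_rate=8000):
    """Sum of several equal-amplitude pure tones (a complex sound)."""
    tones = [pure_tone(f, duration_s, sample_rate) for f in frequencies_hz]
    return [sum(values) for values in zip(*tones)]

beep = pure_tone(1500, 0.5)                   # one component at 1.5 kHz
call = complex_tone([500, 1000, 1500], 0.5)   # three components
```

Since 1000 Hz and 1500 Hz are integer multiples of 500 Hz, `call` repeats every 1/500 s (16 samples at this sample rate), so it is still periodic even though it contains energy at more than one frequency.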

Fundamental frequency and harmonics in complex periodic sounds

Typical vocal sounds are composed of several sinusoidal waves which appear
on spectrograms as evenly spaced, parallel, narrow frequency components.

The lowest of these parallel
frequency components is called
the fundamental frequency
(F0).

The harmonics are integer
multiples of the fundamental
frequency: H1 = 2F0, H2 = 3F0,
etc.
The fundamental frequency
determines the pitch of the tone
(how high or low it is perceived
to be).
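
The integer-multiple relation can be written out directly. This sketch follows the labelling used in the slides, where H1 = 2F0 (other texts count the fundamental itself as the first harmonic); the function name is illustrative:

```python
def harmonic_frequencies(f0_hz, count):
    """Map labels H1..Hcount to frequencies, with H1 = 2 * F0 as in the slides."""
    return {f"H{k}": (k + 1) * f0_hz for k in range(1, count + 1)}

print(harmonic_frequencies(125, 3))  # {'H1': 250, 'H2': 375, 'H3': 500}
```

The spacing between neighbouring components equals F0, which is why harmonics appear as evenly spaced parallel bands on a spectrogram.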

The variation of F0 with time
determines the fundamental
frequency contour. In speech it
affects the intonation.

[Figure: spectrogram with F0 and harmonics labelled. x axis: Time (s).]

The pitch
What is the pitch of a voice?
The pitch is the perceived height of a voice (Titze).
It is mainly determined by the fundamental frequency of the sound.
Pitch goes up to 1.4 kHz (whistle register, female singers only).

[Figure: voices of different pitch. Recoverable labels: 60 Hz, 140 Hz, 200 Hz, 500 Hz; very low male (early morning), female, child.]

Noise

White noise:
Noise is sound that is made of an aperiodic series of waves, corresponding
to irregular and disordered vibrations that include all possible
frequencies (e.g. waves breaking on the shore, wind).
It can play a role in speech, e.g. whisper (see next lecture).

[Figure: noise. x axis: Time (s), 0 to 0.5.]

The spectral envelope: resonance frequencies (formants)

The distribution of the energy is not
uniform across frequencies:
The peaks and valleys represent the
resonances that take place in the
cavities of the vocal tract.
They are called formants (in Latin formare
= to shape) because they shape
the spectral structure of the
speech signal.

Formants are central to human
speech as they provide the
acoustic variation at the basis of
vowels and consonants
(see next lectures!).

[Figure: formants. x axis: Time (s), 0 to 0.5.]

Summary
A vocal sound can be
described in terms of
its:
- intensity contour
- periodicity: pure tone? F0 + harmonics? F0 contour? noise?
- spectral envelope: formants?

How vertebrates make sounds

Anurans use their
larynx; they often use
two sets of folds (AM).

Birds use their syrinx,
located at the base of
the trachea.

Mammals use their
larynx, at the top of the
trachea.

From Fitch & Hauser 2002

What is the vocal apparatus?

- The lungs (generate the power)
- The trachea
- The larynx
- The supralaryngeal vocal tract:
  - the pharynx
  - the mouth
  - the nasal cavity

From Titze, 1994

The two functional components: the source and the filter

Speech (and most mammal) sounds result from a two-stage process:
- a periodic wave (called the glottal wave) is generated
  in the larynx (= the source). Its fundamental frequency
  determines the pitch of the voice.
- this wave is then filtered in the supralaryngeal
  cavities of the vocal tract (= the filter), creating broad
  bands of energy called vocal tract resonances or
  formants.

Defined by Fant, G. (1960). Acoustic Theory of Speech Production.
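
The two-stage source and filter process can be illustrated with a toy simulation. This is a rough sketch under strong simplifying assumptions, not Fant's formulation: an impulse train stands in for the glottal wave, and a single two-pole digital resonator stands in for one vocal tract resonance. All names and parameter values (100 Hz source, 500 Hz resonance, 80 Hz bandwidth, 8 kHz sample rate) are illustrative:

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz (assumed)

def glottal_source(f0_hz, duration_s):
    """Impulse train at F0: a crude stand-in for the glottal wave."""
    period = round(SAMPLE_RATE / f0_hz)      # samples per cycle
    n = round(duration_s * SAMPLE_RATE)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def formant_filter(samples, centre_hz, bandwidth_hz):
    """Two-pole resonator standing in for one vocal tract resonance."""
    r = math.exp(-math.pi * bandwidth_hz / SAMPLE_RATE)
    a1 = 2 * r * math.cos(2 * math.pi * centre_hz / SAMPLE_RATE)
    a2 = -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in samples:
        y = x + a1 * y1 + a2 * y2            # y[n] = x[n] + a1*y[n-1] + a2*y[n-2]
        out.append(y)
        y1, y2 = y, y1
    return out

def magnitude_at(samples, freq_hz):
    """Magnitude of the signal's component at one frequency (single DFT term)."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * i / SAMPLE_RATE)
              for i, x in enumerate(samples))
    return abs(acc) / len(samples)

source = glottal_source(100, 0.5)                        # F0 = 100 Hz
voiced = formant_filter(source, centre_hz=500, bandwidth_hz=80)

harmonics = range(100, 1100, 100)                        # 100, 200, ..., 1000 Hz
strongest = max(harmonics, key=lambda f: magnitude_at(voiced, f))
print(strongest)  # 500
```

The source has equal energy at every harmonic of 100 Hz; after filtering, the harmonic spacing (and hence the pitch) is unchanged, but the energy is concentrated around the resonance.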

Illustration of the source filter theory

- Beheaded speaker: glottal wave only
- Speaker with an anesthetized vocal tract: glottal wave filtered by a uniform tube
- Speaker with a normal vocal tract: glottal wave filtered by a non-uniform, changing vocal tract
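
For the uniform tube case, the resonance frequencies can be estimated with the standard quarter-wavelength formula for a tube closed at one end and open at the other, F_n = (2n - 1) c / (4 L). The slides do not state this formula; it is introduced here as a textbook acoustics result. The sketch uses the 355 m/s vocal tract speed of sound quoted earlier and an assumed, illustrative tube length of 0.17 m:

```python
def uniform_tube_resonances(length_m, speed_m_per_s=355.0, count=3):
    """Resonances of a tube closed at one end: F_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * speed_m_per_s / (4.0 * length_m)
            for n in range(1, count + 1)]

# 0.17 m is an assumed vocal tract length, not a value from the slides
print([round(f) for f in uniform_tube_resonances(0.17)])  # [522, 1566, 2610]
```

A shorter tube gives higher resonances, which is one reason formant frequencies differ between speakers with different vocal tract lengths.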
