
Perspectives of New Music

Speech Extrapolated
Author(s): David Evan Jones
Source: Perspectives of New Music, Vol. 28, No. 1 (Winter, 1990), pp. 112-142
Published by: Perspectives of New Music
Stable URL: http://www.jstor.org/stable/833346
Accessed: 21-10-2015 01:20 UTC

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at http://www.jstor.org/page/
info/about/policies/terms.jsp
JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content
in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship.
For more information about JSTOR, please contact support@jstor.org.

Perspectives of New Music is collaborating with JSTOR to digitize, preserve and extend access to Perspectives of New Music.

http://www.jstor.org

This content downloaded from 169.229.11.216 on Wed, 21 Oct 2015 01:20:27 UTC
All use subject to JSTOR Terms and Conditions

SPEECH EXTRAPOLATED

DAVID EVAN JONES
IT IS LARGELY timbral qualities and timbral transitions which form the acoustic basis of phonetic communication. It is not surprising, then, that those timbres which cue the perception of speech are often taken as points of departure in efforts to structure a wider timbral vocabulary as a vehicle of purely musical communication. Wayne Slawson (1985), for example, has developed a detailed and rigorous approach to the organization of vowels and vowel-like resonance patterns. Fred Lerdahl (1987) asserts that vowels are "central to human timbre perception" and incorporates them prominently in his efforts to construct a hierarchical system of timbral organization.1 Prior to these theoretical efforts, poets such as Filippo Marinetti, Hugo Ball, Kurt Schwitters, Maurice Lemaitre, and composers such as Karlheinz Stockhausen, Herbert Eimert, Luciano Berio, György Ligeti, Kenneth Gaburo, Charles Dodge, Roger Reynolds, Paul Lansky, and many


others have composed with speech sound in such a way as to focus the attention of listeners on the sounds of speech not only as carriers of information (verbal meaning), but also as structures of (timbral) information. Many of these composers and poets have utilized speech sound in association with timbrally similar nonspeech sounds in an effort to structure a larger and more varied timbral vocabulary.

I have addressed elsewhere (1987) some of the theoretical issues underlying the strategies composers have utilized to focus listeners' attention on "the sounds of speech-not only as cues for phonetically coded information, but as timbres, pitches, durations." As a composer, I have explored ways of making the structure of speech sound serve as a basis for important aspects of my musical structures. My approach has generally involved organizing speech sounds according to their timbral characteristics (rather than their morphemic function) and highlighting this timbral organization by my use of related instrumental timbres and associated pitch structures. By these means, I have found ways of "mapping" or "extrapolating" aspects of speech structure into a musical domain.2
In this paper I will describe some aspects of the compositional language developed for my Still Life Dancing for four percussion players and computer tape. In particular, I will focus upon several families of "percussion/vowels" I synthesized for this piece: sounds identifiable as percussion instruments but with identifiable vowel resonances. Along the way, I will address several theoretical issues related to the musical organization of speech sounds and of vowels in particular. In the first part of this paper, I will address some of the assumptions underlying the internal organization of the families of percussion/vowels and I will outline the procedures by which these sounds were synthesized. In the second part, I will discuss the musical functions of the percussion/vowels within the overall timbral organization of Still Life Dancing, and I will outline the effects I hope to achieve by the integration of speech and nonspeech timbres.
I. PERCUSSION/VOWELS

OVERVIEW

Utilizing the CHANT synthesizer (Rodet, Potard, and Barriere 1984) and an analysis/synthesis technique for impulsive sounds (Potard, Baisnee, and Barriere 1986) at l'Institut de Recherche et de Coordination Acoustique/Musique (IRCAM), I synthesized familiar-sounding sources (percussion timbres) such that each source displayed the characteristics of a set of nine familiar resonances (vowels) to form the matrix in Example 1.


[EXAMPLE 1: matrix of sound sources (Struck Wood, Struck Metal, Bowed Metal, etc.) crossed with the nine vowel resonances]

The percussion/vowel sounds were synthesized with both wide and narrow bands at the formant center frequencies to produce perception of both vowel (by virtue of the wideband resonances) and pitch (the narrow bands) at these frequencies. The formant structure of the vowel "scale" from /u/ to /i/ is thus closely and consistently associated with a system of pitch relations.

Vowel qualities were thus given potential musical functions as

1. Points of cognitive unison between timbres which, in other respects, are very different, and which are identified by listeners as issuing from categorically different sources (e.g. metal percussion, wood percussion, etc.).

2. Points of direct association between pitch information (the formant structure audible as pitches) and timbral information (the vowels themselves).3

COGNITIVE UNISONS

I regard two musical features as being in cognitive unison when they effectively represent the same functional unit (or category) in musical context. Two pitches or two timbres (or two independent aspects of timbre) or two vowels may be in cognitive unison when, despite perceptible differences between them along the dimension in question, they are functionally equivalent at the lowest hierarchical level along which that feature is varied, that is, when they are tokens of the same functional category along that dimension. The idea of


a cognitive unison is thus an attempt to distinguish between those differences along a given dimension that give rise to the musical structure (functional differences) and those differences that do not.4

Perception of differences along any musical dimension is strongly influenced by the listeners' experience in their everyday acoustic world. Even if harmonies, durations, and aspects of timbre were varied along carefully controlled continua, it is highly unlikely that the listeners' perceptual responses would be continuous: listeners make use of familiar categories to discriminate and identify points along the continuum. (For related experiments, see Rosch 1973, 1975.) In regard to speech sounds in particular, see the excellent review provided by Liberman and Studdert-Kennedy (1978). Composers attempting to create musically effective categories (and therefore effective points of cognitive unison) along any musical "dimension" must take into account the categories already familiar to the listener. Composers must either make use of these familiar categories or take special care to convincingly override them for the purposes of a particular piece.5

The percussion/vowels were synthesized in an effort to make use of familiar categories: to cue the perception of a familiar source (metal or wood percussion instruments) and a familiar resonance (one of nine selected vowels) in the same sound and to vary these two features independently in musically meaningful ways.

INDEPENDENCE OF SOURCE AND RESONANCE

Because timbre involves a complex of perceptually separable acoustic characteristics, each of which can be varied independently, the notion of cognitive unisons in the domain of timbre is, at least potentially, a compound issue. Matrices of timbral characteristics can be constructed within which individual dimensions can be independently varied. Following Rodet (1984), Rodet, Potard, and Barriere (1984), Slawson (1985), Potard et al. (1986), and many others, I have selected "source" and "resonance" as timbral dimensions which can be varied independently, and which can be parsed independently into musically functional categories.

When controlled independently, changes in the "source" and "resonance" of a sound can often be discriminated independently. Spoken and whispered vowel sounds provide a clear example of this independence: changes in the "source" (glottal oscillations or air friction) and the vowel (associated with the formant pattern) can be discriminated separately. Experimental evidence for more general examples of the perceptual independence of source and resonance is extensively reviewed by Slawson (1985), who also proposes an elaborate network of rules according to which the dimension of "sound color" (vowel-like resonance patterns) can be varied.6


Potard, Baisnee, and Barriere (1986) have developed an approach to synthesis using extremely detailed models of the resonances of impulsive sounds. I found their (1986) demonstrations of the independence and perceptual saliency of "source" and "resonance" to be remarkably convincing. Grey's (1977) multidimensional perceptual scaling experiments have also demonstrated that dimensions related to source ("instrument family") and resonance ("spectral energy distribution") are amongst the most perceptually salient features of musical timbre.
VOWEL UNISONS

Vowels (and other "steady-state" voiced speech sounds) are unique among all possible patterns of formant resonance in that listeners have codes, cognitive prototypes, by means of which they identify and remember them. Two of the three formant structures in Example 2 describe vowels. Example 2a is a schematic formant structure of the vowel /u/. Example 2b represents the vowel /i/. Example 2c is not a vowel. Because the resonance pattern described by 2c would not be easily produced by a human vocal tract, it has not become part of a vocabulary of coded resonances associated with a natural language. The nonvowel resonance might (or might not) be remembered if it occurred twice in a musical composition but, because it cannot be related to a prototype and coded in memory, it is likely to be a much less effective category than the vowels, and therefore less effective as a musical form-building element.

Identification and retention of sequences of vowels would be only marginally affected if a listener's native language divides the vowel continuum at boundaries different from those chosen by a composer. While the specific prototypes around which listeners organize vowel percepts may vary depending upon the language they speak, the tendency to group vowel resonances according to these prototypes persists even when the vowels presented do not conform to the prototype. If a listener's language has no /ʌ/, for example, s/he might remember it as an "open /œ/." While this variant might not form as strong a perceptual category as a "cardinal" vowel in the listener's language, it would be still more memorable than a resonance pattern which fell outside the region of the vowel continuum structured by the listener's language.7 The vowels at the high and low formant frequency extremes of the vowel continuum (/u/, /a/, and /i/) appear almost universally in languages of the world (see Disner 1983).
The attempt to construct an instrument/vowel matrix is complicated by the fact that listeners are usually cued to listen for phonetic information in a signal when they identify the source of the sounds as a human voice.8 That is not to say, however, that listeners cannot extract phonetic information


[EXAMPLE 2: VOWEL AND NONVOWEL RESONANCES. Three schematic amplitude-vs.-frequency plots (0-4000 Hz): 2a, the vowel /u/; 2b, the vowel /i/; 2c, a nonvowel resonance pattern.]

(when phonetic information is available) from "un-voice-like" signals. Experiments such as Bailey et al. (1977)9 suggest, however, that un-voice-like speech sounds must be presented in such a way as to draw the listeners' attention to the phonetic features of the sounds.10 Rapid diphthongs and juxtapositions of sounds with markedly different formant structures tend to


highlight the formant structure in ways which draw listeners' attention to the phonetic message. Presentation of sounds which clearly carry phonetic messages can also help to influence listeners to attend to associated sounds as speech (Tsunoda 1971).

SYNTHESIS OF PERCUSSION/VOWEL SOUNDS

The percussion/vowels were synthesized in an effort to cue the perception of two very different, and normally incompatible, sets of categories within the same sound: familiar sources (metal and wood percussion instruments) with familiar resonances (vowel sounds). For this purpose, I made use of the CHANT synthesizer (Rodet, Potard, and Barriere 1984) and a technique developed by Potard, Baisnee, and Barriere (1986) for modeling and synthesizing the resonances of impulsive sounds.
CHANT utilizes formant-wave-function (FOF) synthesis (Rodet 1984) to enable the user to dynamically control the center frequencies, bandwidths, and amplitudes of up to two hundred or more time-varying resonances in a synthesized sound. The resonances may be produced either by synthesis or filtering. The program was initially designed for speech synthesis, but has proven to be extremely flexible and effective in the synthesis of a variety of timbres.
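The FOF principle can be sketched in a few lines of code. The following Python fragment is a rough illustration only, not CHANT's actual implementation: each formant is rendered as a damped sinusoid "grain" repeated at the fundamental period, with the grain's decay rate standing in for its bandwidth. The raised-cosine attack of true FOF grains is omitted, and all parameter values are placeholders.

```python
import math

def fof_grain(freq_hz, bandwidth_hz, length, sr=44100):
    """One formant-wave-function grain: a sinusoid at the formant center
    frequency under an exponentially decaying envelope whose decay rate
    sets the effective bandwidth."""
    alpha = math.pi * bandwidth_hz  # decay constant derived from bandwidth
    return [math.exp(-alpha * n / sr) * math.sin(2 * math.pi * freq_hz * n / sr)
            for n in range(length)]

def fof_tone(f0_hz, formants, dur_s=0.5, sr=44100):
    """Overlap-add one grain per formant at every fundamental period."""
    n_samples = int(dur_s * sr)
    period = int(sr / f0_hz)
    out = [0.0] * n_samples
    for start in range(0, n_samples, period):  # one excitation per period
        for freq, bw, amp in formants:
            grain = fof_grain(freq, bw, min(period * 4, n_samples - start), sr)
            for i, s in enumerate(grain):
                out[start + i] += amp * s
    return out

# Illustrative /a/-like formants (center Hz, bandwidth Hz, amplitude);
# these values are placeholders, not those used in the article.
tone = fof_tone(110.0, [(730, 90, 1.0), (1090, 110, 0.5), (2440, 170, 0.25)])
```

The grain-per-period structure is what lets a FOF synthesizer behave like a bank of resonant filters excited by a pulse train, while keeping every resonance individually addressable.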
Potard, Baisnee, and Barriere (1986) have described techniques by means of which a digital recording of an impulsive sound can be analyzed and modeled as a set of resonances, each resonance with a given center frequency, bandwidth, and amplitude. In the form of a text file, each model can then be altered and manipulated by any algorithm the researcher may develop in the UNIX environment or in FORMES (Rodet and Cointe 1984) and synthesized using CHANT. Several models require more than a hundred individual resonances, but the number of resonances can often be systematically reduced without a marked loss in "fidelity." Potard et al. have made a number of models of impulsive sounds available in IRCAM's on-line library.
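A resonance model stored as a text file lends itself to simple list processing. The Python sketch below, with invented model data and an invented API (it does not reflect the actual IRCAM tools), shows two such manipulations: transposing a model by a frequency ratio and systematically reducing its resonance count by keeping only the strongest components.

```python
def transpose(model, ratio):
    """Shift every (freq_hz, bandwidth_hz, amplitude) resonance by a
    frequency ratio (e.g. 1.5 = up a perfect fifth)."""
    return [(f * ratio, bw, a) for f, bw, a in model]

def reduce_model(model, keep):
    """Reduce the resonance count by keeping the strongest components,
    as the text notes can often be done without a marked loss in fidelity."""
    return sorted(model, key=lambda r: r[2], reverse=True)[:keep]

# Hypothetical bell-like model: (center Hz, bandwidth Hz, linear amplitude).
bell = [(523.0, 3.0, 1.0), (1046.0, 5.0, 0.6), (1570.0, 8.0, 0.3),
        (2090.0, 12.0, 0.1), (2615.0, 20.0, 0.02)]

# Retune the strongest partial to 440 Hz, then keep the three strongest.
small = reduce_model(transpose(bell, 440.0 / 523.0), keep=3)
```

Amplitude-ranked pruning is only one plausible reduction criterion; perceptual masking or bandwidth could equally be used.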
Using CHANT and the techniques described by Potard, I synthesized the percussion/vowel sounds with three active formants. Only the first two formants were associated with audible pitches (described below); the fourth and fifth formants were static.

1. I selected a model of a given instrument originally played at a fundamental frequency around 440 Hz, near the center of the range of first formant vowel frequencies with which I was working. (Specific formant frequencies for the nine selected vowels are given below.)


2. I transposed this model so that the fundamental frequency (F0) of the model of the instrument corresponded to the center frequency of the first formant (F1) of the desired vowel.

3. I transposed the same model (or another model of the same instrument played at a higher pitch) so that the fundamental frequency (F0) of the instrumental model corresponded to the center frequency of the second formant (F2) of the desired vowel.

4. I combined the two models.

5. I attenuated or eliminated all resonances which fell between ca. 300 Hz and 3000 Hz (the frequency range within which F1 and F2 vary) except for the resonances within a few Hz of the fundamental frequencies of the two combined percussion models (which had been tuned to F1 and F2 of the desired vowel).
6. At F1 and F2 of the desired vowel, I added a number of additional resonances which varied from wide bands (perhaps 50 Hz or more) to very narrow bands (ca. 1 Hz or less).

7. Proceeding empirically, I balanced the relative amplitudes of the wide and narrow bands at each of the formant frequencies to produce both a perception of vowel (the wide bands) and pitch (the narrow bands). I also balanced the relative amplitudes of F1 and F2 (to produce an optimally intelligible vowel) and the relative amplitudes of the formants vs. the remaining percussion resonances (to produce a percept of both vowel and instrument). Amplitudes and resonance times were altered freely to these ends.

8. Having produced individual vowel/percussion "objects," I produced diphthongs by using tools in the FORMES environment to control continuous interpolations between series of two or more such "objects."
The wide bands at F1 and F2 were designed to convey the vowel. The transients below 300 Hz, the narrow-band resonances at the formant center frequencies (the fundamental frequencies of the two original models), and the transients and harmonics above 3000 Hz were left unaltered to convey the identity (and sense of pitch) of the original percussion instruments. Thus each percussion/vowel is pitched as a dyad at the formant center frequencies (F1 and F2).
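The transpose-combine-filter core of these steps amounts to ordinary list processing on resonance models. The following Python sketch is schematic: the model data, the 5 Hz tolerance, and the band widths and amplitudes are all invented placeholders, and the empirical balancing of steps 7 and 8 is not modeled.

```python
def make_percussion_vowel(model, f0_model, f1, f2):
    """Sketch of steps 2-6: tune two copies of a percussion resonance
    model to F1 and F2 of the desired vowel, combine them, clear the
    300-3000 Hz region except near the formants, and add wide and
    narrow bands at each formant center frequency."""
    # Steps 2-4: transpose two copies so each F0 lands on a formant, combine.
    tuned = ([(f * (f1 / f0_model), bw, a) for f, bw, a in model]
             + [(f * (f2 / f0_model), bw, a) for f, bw, a in model])
    # Step 5: eliminate resonances in the F1/F2 range, except near the formants.
    kept = [(f, bw, a) for f, bw, a in tuned
            if not (300.0 < f < 3000.0)
            or abs(f - f1) < 5.0 or abs(f - f2) < 5.0]
    # Step 6: wide bands cue the vowel, narrow bands cue the pitch dyad.
    for fc in (f1, f2):
        kept.append((fc, 60.0, 0.5))  # wide band -> vowel percept
        kept.append((fc, 1.0, 0.5))   # narrow band -> pitch percept
    return kept

# Hypothetical glockenspiel-like model played near 440 Hz, tuned to
# illustrative /a/-like formants at 850 and 1220 Hz.
glock = [(440.0, 2.0, 1.0), (1170.0, 4.0, 0.4), (3920.0, 10.0, 0.2)]
pv = make_percussion_vowel(glock, 440.0, 850.0, 1220.0)
```

The returned list is simply a new resonance model; in the workflow described above it would then be written back to a text file and synthesized.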


VOWEL SCALE

I think of the extremes of the vowel space, /u/, /a/, and /i/, as defining the extremes of a range of articulations with which listeners can identify. If a speaker or singer were to continue the articulatory gesture from /a/ to /u/ in an attempt to find a vowel with lower first and second formants than /u/, she would arrive at the articulatory place for the stop consonant /b/ or (if the sound is nasalized) the nasal /m/. If she attempts to continue the articulatory gesture from /a/ to /i/ in an attempt to find a vowel with a higher second formant than /i/, she will arrive at the articulatory place for the stop consonant /d/ or (if the sound is nasalized) the nasal /n/. Similarly, attempts to continue the transitions from /u/ to /a/ or from /i/ to /a/ to produce still more "open" vowels result only in vowels closely related to /a/.

The vowels /u/, /a/, and /i/ are thus not arbitrarily selected as extremes of the vowel "scale"; they represent articulatory extremes along the vowel continuum. Their unique articulatory positions may be the reason that the vowels /u/ and /i/ appear almost universally in natural languages. In Still Life Dancing, /u/, /a/, and /i/ are the most common vowels of reference or orientation: /u/ often serves as a point of resolution; /a/ often serves as a secondary (more temporary) "tonic," and /i/ often serves as a high point.
I divided the vowel continuum from /u/ to /i/ into a scale of nine individual vowels roughly organized according to their second formant frequencies.11 The vowel continuum could, of course, be divided into an infinite number of individual vowels, just as the octave can be divided into an infinite number of pitches. I selected nine as a number which would facilitate a sufficient variety of vowel patterns and still permit the discrimination and identification of the individual vowel qualities.12 These particular vowels were selected because they are familiar to me as a speaker of English and because their formant frequencies are spaced fairly evenly in the vowel space. In Example 3, a range of possible center frequencies is given for F1 and F2 of each of the nine vowels.13
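As an illustration of how such a scale is ordered, rough textbook second-formant values (not the article's Example 3 figures, and for a hypothetical nine-vowel set rather than the author's exact choices) can simply be sorted:

```python
# Illustrative F2 values in Hz for a nine-vowel scale from /u/ to /i/.
# Both the vowel set and the numbers are placeholders.
f2_hz = {"u": 800, "o": 900, "ɔ": 1000, "ʌ": 1200, "a": 1300,
         "æ": 1700, "ɛ": 1900, "e": 2100, "i": 2300}

# Ordering the vowels by ascending F2 yields the scale the text describes.
vowel_scale = sorted(f2_hz, key=f2_hz.get)
```

Sorting by F2 alone is the simplification the text itself makes ("roughly organized according to their second formant frequencies"); a full vowel chart would also track F1.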

VOWEL/PITCH ARRAY

As the formant center frequencies determine both vowel qualities and perceived pitch dyads, I designed a pitch array based upon the vowel scale described above.

The pitch array shown in Example 4 was composed intuitively as a succession of dyads, roughly within the constraints of the average formant frequencies given in Example 3. (The phonetic symbol for each vowel in Example 3 is placed exactly at the point in the matrix representing the intersection of the F1 and F2 frequencies selected for that vowel.) The


[EXAMPLE 3: FIRST AND SECOND VOWEL FORMANTS (FEMALE) FOR NINE VOWELS, MAPPED AGAINST THE TWELVE-TONE TEMPERED SCALE (C4, 261.6 Hz, through A5, 880.0 Hz)]

[EXAMPLE 4: VOWEL FORMANT CENTER-FREQUENCY/PITCH ARRAY]

pitches shown in Example 4 were consistently used as F1 and F2 in the synthesis of the indicated vowels in all families of percussion/vowel timbres. Each percussion/vowel is thus consistently associated with a specific pitch interval. (While the F2 pitch is not easily "heard out" in the vowels /u/ through /a/, it is nonetheless audible in the "character" of the timbres.)


As I wished to distinguish the vowel sounds in pitch as well as in formant pattern, I chose a different pitch class from the range of pitches available for the first formant of each of the nine vowels. The first formant pitch stands out most prominently as a pitch in the vowels from /u/ to /a/. The second formant pitch becomes increasingly prominent in the vowels from /æ/ to /i/, particularly in the metal vowels. Because the percussion/vowels can be transposed only about a major second in either direction without seriously impairing the vowel percept, the vowel scale from /u/ to /i/ is thus roughly associated with a somewhat variable "scale" of formant pitches. Within the limits given, any sequence of vowels or diphthongs thus involves a concomitant pitch gesture.
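The mapping of formant center frequencies onto the tempered scale, as in Example 3, is a simple quantization. A minimal Python sketch, assuming A4 = 440 Hz and sharp-only spellings:

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_tempered_pitch(freq_hz, a4=440.0):
    """Snap a formant center frequency to the nearest pitch of the
    twelve-tone equal-tempered scale."""
    semis = round(12 * math.log2(freq_hz / a4))  # semitones above/below A4
    midi = 69 + semis                            # MIDI note number of A4 is 69
    name = NOTE_NAMES[midi % 12]
    octave = midi // 12 - 1                      # MIDI octave convention: C4 = 60
    return f"{name}{octave}"

# A first formant near 830 Hz lies closest to G#5, consistent with the
# text's placement of the first formant of /a/ in this system.
print(nearest_tempered_pitch(830.6))  # prints G#5
```

Run in reverse (pitch to frequency), the same relation constrains how far a percussion/vowel can be transposed before its formants leave the plausible region for that vowel.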

DISCRIMINATION AND IDENTIFICATION OF PITCH AND VOWELS

Listeners bring very different perceptual processes and capabilities to bear in the perception of pitch and vowels. Individual pitches can (within limits) be discriminated from other pitches, and can be placed relative to other pitches (by interval, higher or lower). Most listeners, however, cannot absolutely identify individual pitches. (That is to say, most listeners do not have absolute pitch.) Presumably for this reason, appreciation of most musical systems requires only the identification of relative pitch patterns.

Individual vowels, on the other hand, are both discriminated and identified in normal speech perception. Whereas most listeners cannot consistently identify the note G#5, for example, in a non-tonal musical context (or even in a tonal one if they are not first told the key), even musically untrained listeners can consistently identify the vowel /a/ (whose first formant is around G#5 in this system).
Moreover, the capability of listeners to label and code vowel qualities in memory may be a potentially powerful form-building tool. Research has shown that musically trained listeners retain in memory not only a "recording" of diatonic pitch patterns, but also a coded representation of that pattern. The coded representation is retained in memory more accurately and for a longer time than the analog "recording." Phonetic perception entails a similar coding of information. (See Liberman and Studdert-Kennedy (1978), for example.) Listeners' ability to retain the speech information in nonsense syllables suggests that even a non-morphemic organization of vowels (in a musical grammar) might be coded and retained in similar ways.
The concomitance in the organization of vowel and pitch in the percussion/vowels has the effect of reinforcing the memorability of particular
pitch/vowel areas as points of departure and arrival between which the
large gestures of Still Life Dancing were constructed.


FAMILIES OF PERCUSSION/VOWELS

I produced severalfamilies of percussion/vowels at IRCAM. Each family


consists of one source realized as each of the nine individual vowels and (in
most families) ten to thirty selected diphthongs. The families include
percussion/vowels utilizing models of cowbell, glockenspiel, and piano
sounds. Wood/vowels were synthesized empirically("by ear") without use
of a percussion model. Both metal/vowels (glockenspiel)and wood/vowels
were realized both by FOF synthesis and, using the same resonances, by
filtered white noise. The filtered white noise vowels were altered to produce sustained wideband resonances which I call wood-noise/vowels and
metal-noise/vowels. "Bowed metal" vowels were produced by repeatedly
exciting the glockenspiel model (as if the glockenspiel bar were being gently
struck repeatedlyat a frequency audible as a pitch) to give the impressionof
"bowing."
Of all the percussion/vowels, only cowbell/vowels, glockenspiel/vowels,
wood/vowels, wood-noise.vowels, and metal-noise/vowels are used in the
composition Still Life Dancing. I plan to use other combinations of families
in other pieces. The sound files were recorded at IRCAM on PCM digital
tape and sampled onto the Synclavier II at Bregman Electronic Music
Studio, Dartmouth College, where they were edited and assembled into
the completed tape part.

II. TIMBRAL ORGANIZATION IN STILL LIFE DANCING


INSTRUMENTAL GROUPINGS

The "instrumental" resources used in my composition Still Life Dancing for three percussion players and tape are presented in the matrix in Example 5. This figure displays the "instruments" within a configuration of variables arranged so as to suggest some of the ways in which the "instruments," and the variables, were deployed in the piece.14

Along the horizontal axis of Example 5, the "instruments" of the piece are arranged in three categories representing a stepped transition between speech and percussion sounds (and, coincidentally, according to the means by which the sounds were produced). Because they possess the "source" characteristics of pitched wood and pitched metal percussion instruments and the resonance characteristics of the (unpitched) sampled vowels, the percussion/vowels help to unify the sound world of the piece. They very often function as a hinge or a bridge between the unpitched sampled speech and the pitched percussion or as part of a complex unified texture. (See Example 6, discussed below.)


[EXAMPLE 5: "INSTRUMENTS" USED IN Still Life Dancing. A matrix of sampled-speech and live-percussion sounds by source:
Wood: wood vowels, wood-noise vowels (sampled); marimba, xylophone (pitched); wood blocks, temple blocks (unpitched).
Vocal tract: vocal fry vowels, whisper vowels, fricatives (sampled); drums, hand percussion (unpitched).
Metal: metal-noise vowels, metal vowels (sampled); cowbells, cymbals (unpitched); vibraphone, orchestral bells (pitched).]

Along the vertical axis of Example 5, the sound world of the piece is divided according to the "source" of the sounds (wood, vocal tract, metal) and according to specificity of pitch ("pitched" vs. "unpitched").15 At various points in the piece, "unpitched" instruments, "pitched" instruments, "wood" instruments, and "metal" instruments (as defined in Example 5) are projected as individual timbral groupings. Drums, which are neither wood nor metal of course, are rarely projected in Still Life Dancing as a separate "skins" category; instead they serve as a lower-frequency component of the "unpitched" category which also includes metal, wood, and the sampled speech.

Although it is not indicated in the diagram, the "unpitched" category is further divided into vowel-related and fricative-related instruments. Because of their indefinite pitch and short decay, wood blocks, temple blocks, and cowbells are often grouped with vocal fry vowels to create an ambiguous speech/nonspeech texture. In Examples 7 and 8 in particular, the wood blocks and whisper/fry are hocketed in interlocking, often imitative patterns. Cymbals and whisper-sung vowels are often grouped because they are both wideband sounds with indefinite pitch and comparatively long decay. Hand percussion such as maracas, shekere, and tambourine, which can be sustained by rolls, are often grouped with the fricatives (/s/ and /ʃ/)16 and the fricative-vowel transitions (/su/, /sa/, /si/, /ʃu/, /ʃa/, /ʃi/). In Example 2, bar 1, the fricative-vowel transition /ʃi/ comes out of the sound of the shekere. In bars 2-3 of the same example, an extended /ʃ/ joins rolled tambourine and maracas as part of a high-pitched rustle-noise texture.


[EXAMPLE 6: score excerpt, staves for Wh/Fry, P/Dipth., P/Vowels, Xylophone, Marimba, Cymbals, and Drums]

[EXAMPLE 7: score excerpt, staves for Whisper/Fry, Wood Blocks, Temple Blocks, Hand Percussion, Cymbals, and Drums]

[EXAMPLE 8: score excerpt, staves for Wh/Fry, P/Vowels, Marimba, Wood Blocks, Temple Blocks, Cowbell, Cymbals, and Drums]

The piece evolves from a focus on unpitched sounds at the outset (Example 7) to include more and more pitched wood sounds (Example 8). After a brief concentration on the ensemble of "wood" instruments, the piece evolves to a focus on pitched metal (Example 9), and to a complex mixture of timbral groups.

VOWEL QUALITIES

As indicated at the beginning of this paper, vowel qualities are integrated with other musical features in two different ways:

1. Vowels function as points of direct association between pitch information (the formant structure of the vowels audible as pitches) and timbre (the formant structure audible as vowels).

Pitch height and vowel quality could be schematically represented as different but associated aspects of a third dimension (not shown) of Example 5. Vowel quality and pitch are associated along this dimension because the


EXAMPLE 9: [musical score; the notation is not recoverable from this scan.]


percussion/vowel scale from /u/ to /i/ entails a concomitant (though somewhat variable) set of pitches associated with the formant frequencies. Vowel quality and pitch represent different aspects of this dimension because of the differences (some of which are outlined above) between phonetic perception and pitch perception.

Within the context of Still Life Dancing, the pitches from around D4 (the first formant of /u/) to around F7 (the second formant of /i/) are in a privileged position because they fall within the range of vowel formant frequencies. The memorability of pitches and patterns of pitches within this range can be reinforced and "colored" by direct association with specific vowels. This will be illustrated below in a discussion of Example 6 (bars 93-106).

The whisper and vocal fry vowels are organized in a scale of ascending second formants, as are the percussion/vowels. These sounds are sometimes organized according to their specific vowel qualities (see below) and sometimes simply as another unpitched percussion instrument (a percussion instrument the vowel qualities of which can become functional at any time). The percussionists are asked to select the range of wood and temple blocks to approximate the pitch of /i/ at the high end and /u/ at the low end in order to facilitate perceptual associations and interactions between the two sets of "instruments."
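The "scale of ascending second formants" and its association with specific pitches can be sketched numerically. The snippet below is an illustration only: the formant values are rough averages in the spirit of Peterson and Barney (1952), not the data used in Still Life Dancing, and the mapping simply names the equal-tempered pitch nearest each formant frequency.

```python
import math

# Rough average formant frequencies (F1, F2) in Hz -- illustrative
# values only, not the figures used in the piece.
VOWELS = {
    "u": (300, 870),
    "o": (450, 880),
    "a": (730, 1090),
    "e": (530, 1840),
    "i": (270, 2790),
}

def nearest_pitch(freq_hz):
    """Name of the equal-tempered pitch nearest freq_hz (A4 = 440 Hz)."""
    midi = round(69 + 12 * math.log2(freq_hz / 440.0))
    names = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
    return f"{names[midi % 12]}{midi // 12 - 1}"

# The "vowel scale": vowels ordered by ascending second formant.
scale = sorted(VOWELS, key=lambda v: VOWELS[v][1])

for v in scale:
    f1, f2 = VOWELS[v]
    print(f"/{v}/  F1 {f1} Hz ~ {nearest_pitch(f1)},  F2 {f2} Hz ~ {nearest_pitch(f2)}")
```

Sorting by F2 yields the /u/-to-/i/ ordering described above, and with these illustrative values the F1 of /u/ and the F2 of /i/ land near D4 and F7, the bounding pitches cited in the text.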
2. Vowel qualities function as points of cognitive unison between timbres which, in other respects, are very different, and which are identified by listeners as issuing from categorically different sources.
Because the vocal fry vowels and whisper vowels are vaguely and complexly pitched, it is usually the vowel quality and the noise content, not the pitch, that stand out most clearly when a vocal fry vowel or whisper vowel is sounded. Moreover, because the sampled speech sounds were produced by a male vocal tract while the percussion vowels were based upon average formant frequencies for a smaller (female) vocal tract, the pitches associated with each vocal fry vowel are not identical to the formant frequencies of the percussion/vowels. The inclusion of vocal fry vowels and whisper vowels in the musical materials of Still Life Dancing thus gives vowel quality a function separate from and more independent of pitch.

The vocal fry and whisper vowels can be used to match, extend, and imitate patterns of vowel quality heard in the percussion/vowels independently of the pitch or other aspects of the timbre. The fry and whisper vowel patterns often follow the pitched vowels (either as extensions of an individual vowel or as an imitation of a vowel pattern) to convey the impression that the pitch of the sound has decayed, leaving the unpitched vowel. The intent of these imitations and extensions is to persuade listeners


to focus upon individual vowels as categories (points of cognitive unison) which can include sounds of remarkably different timbre, and thereby to integrate the pitched and unpitched timbres.

Both the association of vowels with specific pitch ranges and the use of vowel qualities as cognitive unisons between two categorically different sources (one of them unpitched) are illustrated in Example 6 (bars 93-106). Each of the pitched segments in this excerpt focusses upon at least one clearly audible vowel: /a/ ends the first pitched segment (bars 93-96); /E/ ends the second (bars 98-100). The third pitched segment (bars 101-104) begins with /I/ and descends to /o/ before repeating, in elaborated form, the motion from /a/ (bar 103, bar 104) to // (bar 104). The high point of the phrase is reached at the /i/ of bar 105.

Each of these structural vowels appears in both the (pitched) percussion/vowels and in the (unpitched) whisper/fry vowels. (Note that the percussion/vowel line is itself a hocket of different sources.) Vowel patterns which appear as percussion/vowel sounds are sometimes imitated in the whisper/fry. (E.g., the descent from /i/ to /u/ in bar 105 is telescoped in the whisper/fry in bars 105-106.) Similar imitations and interactions can be found in Example 9, bars 259-61, and Example 8, bar 43.
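The notion of a vowel category serving as a "point of cognitive unison" between categorically different sources can be caricatured as nearest-prototype classification in (F1, F2) space: two sounds with different spectra, pitches, and sources count as "in unison" when their resonance peaks fall into the same vowel category. A minimal sketch, with illustrative prototype values (not the piece's data):

```python
# Illustrative (F1, F2) prototypes in Hz -- rough averages, not the
# formant data used in Still Life Dancing.
PROTOTYPES = {"u": (300, 870), "a": (730, 1090), "i": (270, 2290)}

def classify_vowel(f1, f2, prototypes=PROTOTYPES):
    """Return the vowel whose (F1, F2) prototype is nearest (Euclidean)."""
    return min(
        prototypes,
        key=lambda v: (f1 - prototypes[v][0]) ** 2 + (f2 - prototypes[v][1]) ** 2,
    )

# A marimba-like resonance and a whispered vowel may have different
# absolute peaks yet land in the same category -- a "cognitive unison."
assert classify_vowel(320, 900) == classify_vowel(280, 820) == "u"
```

The point of the caricature is that the unison is categorical, not acoustic: the two inputs above match in no single measured value, yet they are functionally identical at the level of the vowel label.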
CONCLUSION

The vowel unisons between the sampled speech and the percussion/vowels point to an interesting and important ambiguity in Example 5. The source category "vocal tract" should certainly include the sampled speech which issued (quite audibly) from a human vocal tract. But should it include the percussion/vowels which seem to emanate from wood or metal instruments but behave (in their changing resonances) as if produced by a vocal tract? This ambiguity is maintained throughout the piece by the continual return of sampled speech and percussion/vowels. The speech and/or timbral similarities of the sampled speech, the percussion/vowels, and the live percussion are intended to extend this ambiguity to include as many of the sounds of the piece as possible.

The compositional intent is to invite the listener:

1. To attend, at times, to the texture of this piece in a speech mode, interpreting (or attempting to interpret) timbres which would normally be thought of as non-speech in terms of speech sounds which share timbral characteristics, and, (conversely) ...

2. To attend to the sounds of speech not only as cues for phonetically coded information, but as timbres, pitches, durations.


The perception of "vocal tract behavior" and the "extrapolation" of this behavior onto other sounds require only that the listener recognize characteristically "vocal" formant transitions and other speech qualities as inextricably related elements of the texture. This effect is intensified, however, to whatever extent specific vowel qualities are made identifiable and functional parts of the musical language.

As Fred Lerdahl (1987) has remarked, the tremendous timbral flexibility which is provided by electro-acoustic instruments has made the need for a greater understanding of timbral organization extremely acute. I am not as optimistic as Lerdahl, however, about the development, in the foreseeable future, of broad principles of hierarchical timbral organization. In the recent past and for the foreseeable future, I see a variety of "situational" approaches to timbre: approaches unique to individual composers and to individual pieces, approaches which involve elaborate timbral structures which are nonetheless mutually dependent with associated pitch structures. The central problem confronting these "situational" understandings concerns the unfamiliarity of audiences with the assumptions upon which each new system is based. In this regard, the sounds of speech recommend themselves as fascinating, diverse, familiar, and therefore useful points of departure.

ACKNOWLEDGMENTS

It is a great pleasure to acknowledge the assistance I received in this project from IRCAM researchers Pierre-Francois Baisnee, Xavier Rodet, and Jean-Baptiste Barriere, who offered instruction and assistance in the use of CHANT and who (with others cited in the bibliography) developed all of the synthesis tools with which I worked. I am grateful, as well, to my colleague Jamshed Bharucha at Dartmouth College, who read and commented upon an earlier version of this manuscript. Any errors which may remain are my own.

Funding for this project was generously provided in the form of a Research Fellowship by the Dartmouth Class of 1962. I greatly appreciate their assistance.


NOTES

1. While Lerdahl organizes timbre according to judgements of "consonance" and "dissonance" along several timbral dimensions (an approach explicitly related to the structure of tonal music), Slawson organizes his system of "sound colors" according to serial principles.
2. I have utilized speech sounds as the basis for compositional control of a wide variety of timbres in various contexts. Two examples: In Passages (1981) for chamber choir, two pianos, two percussion, and organ, I utilized a phonetically constrained text composed for the piece by poet Michael Davidson in conjunction with a phonetically constrained artificial language of my own design. The artificial language entailed the rule-based construction of "nonsense" syllables utilizing three classes of phonemes: fricatives, stop consonants, and vowels, the timbral qualities and organization of which were mapped onto the organization of the entire ensemble and the entire piece. In Scritto (1986) for computer tape, I utilized sampled whispers (a vowel scale, two fricatives, three stop/vowel syllables), vocal fry (the same vowel scale), and sung aggregates with percussion attacks. The ambitus of the pitch aggregates consistently associated with the sung vowels varied directly with the height of the second formant frequencies of the vowels to reinforce the sense of "opening" from /u/ through the "vowel scale" to /a/ to /I/.
3. J. K. Randall (1972-74) proposed a similar association of pitch and vowel as early as 1972 in "Part II: 6 Stimulating Speculations" of his article "Compose Yourself-A Manual for the Young." Randall described a system of "musical-intervallic relations (specifically, the intervallic relations formed by center-pitches of R1 and R2 [the first and second vowel formants]) between and among the timbres ..." which would be "... reproducible at various places" in the vowel square. Slawson (1985) cites a passage in Stockhausen's Kontakte (1963) in which filters set at wide bands to produce a vowel are gradually narrowed until the pitch of the center frequency is audible.

Some psycho-linguists regard speech perception as fundamentally different from perception of timbre in other contexts, and indeed from all other types of perception: "speech is special," they say. I discuss phonetic labeling here as if it were an independent aspect of timbre perception, but I do not mean by that to deny the claim of "specialness" for speech. I am agnostic on most aspects of the controversy. I do maintain, however, that (as experiments in "duplex

perception" demonstrate (see, for example, Liberman (1979)), listeners can attend simultaneously to both the timbral quality and phonetic identity of speech sounds.
4. An example: In the domain of pitch, the simplest, strongest, most elemental intra-category relationship is the unison or prime. Within limits, pitches may differ perceptibly and still be identified by listeners as functionally identical in context. This is reflected in the fact that "bad intonation" or expressive "pitch inflection" is generally distinguished from "wrong notes." (In music which succeeds in establishing distinct functions for, say, thirty-one tones in an octave, the threshold between "bad intonation" and "wrong notes" would be virtually eliminated.) Pitch unisons or primes represent functional relationships in a musical context, rather than absolute identities of pitch or frequency. This is not to say that two C#s cannot play different functions at a higher hierarchical level; they certainly can. The point here is that, at a lower level of functional interpretation, the note C# can be effectively represented by sounds with slightly (but perceptibly) different pitch heights.
5. Erickson (1975) discusses the importance of recognition and identification at some length. He asserts that "Even if we try very hard we find it difficult to attend to any single parameter of a timbre" and he quotes Schouten (1968) as follows: "Evidently our auditory system does carry out an extremely subtle and multi-varied analysis of these elements, but our perception is cued to the resulting overall pattern. Acute observers may bring some of these elements to conscious perception, like intonation patterns, onsets, harshness, etc., even so, minute differences may remain unobservable in terms of their auditory quality and yet be highly distinctive in terms of recognizing one out of a multitude of potential sound sources."

This bears directly upon Lerdahl's (1987) efforts to construct timbral hierarchies. My strongest objection to Lerdahl's otherwise intriguing approach is that he oversimplifies the powerful associations (gained by everyday experience) that listeners bring to bear in their listening to synthesized sounds. In a discussion of "timbral prototypes," he refers only to "prototypes" of vibrato, tremolo, amplitude envelope, and other independent variables and makes no reference to traditional instruments or other environmental sounds. Presumably, his definitions of prototypical vibrato and prototypical amplitude envelope are somehow abstracted from listeners' experiences, but it seems clear to me that if such prototypical features of timbre can be said to exist, they are interdependent and strongly related to the listener's judgement as to the source of the sound. I would


think, for example, that a wide vibrato (pitch variation, not tremolo) would sound more cognitively "dissonant" when applied to a vibraphone-like sound than to a voice-like sound because listeners know from previous experience that metal cannot change pitch as flexibly as vocal cords. Lerdahl postulates a prototypically "consonant" vibrato (and tremolo, amplitude envelope, etc.) independently of other aspects of the timbre. A more elaborate version of Lerdahl's initial study may prove fruitful, but any such elaboration must take account of the fact that timbre perception is much more context-dependent and less linear than Lerdahl seems to imply.
6. Slawson (1985) has given us one of the most thorough theoretical reviews of issues concerning the physical and perceptual relationships between "source and resonance."
7. The memorability of a set of resonance patterns might be explained in terms of the memorability of the location of a set of visual points on a page. It is easier to categorize and remember the location of points which fall within a familiar grid (the vowel continuum) than the location of points which fall outside any known grid (outside the vowel continuum), even if the points within the grid do not fall exactly in the center of the spaces or on the lines. Differing grids applied to the same set of points will result in the points being positioned and recalled relative to different lines, but it will not change the fact that the locations can be more easily memorized when a grid is applied.
8. For a discussion of "voice-likeness" and phonetic perception, see
Jones (1987).
9. Bailey et al. (1977) synthesized sine-wave analogues of speech sounds by varying the frequency of two sine waves to correspond to the center frequencies of the first two vowel formants in spoken syllables. Most of the listeners presented with these rapidly varying sine waves did not at first identify the sounds as speech. When they were asked to listen to the sounds as speech, however, all were easily able to decode the phonetic message.
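The procedure can be sketched as follows: each of the first two formants is replaced by a single sine wave whose frequency follows that formant's center-frequency track, and the two sines are mixed. Everything below is illustrative: the sample rate, the breakpoint values, and the /da/-like contour are invented for the example, not taken from Bailey et al.

```python
import math

SR = 8000  # sample rate in Hz (illustrative)

def sine_track(freqs_hz, dur_s, sr=SR):
    """Phase-continuous sine wave whose frequency is linearly
    interpolated along the breakpoint list freqs_hz."""
    n = int(dur_s * sr)
    out, phase = [], 0.0
    for i in range(n):
        t = i / (n - 1)  # position 0..1 along the track
        seg = min(int(t * (len(freqs_hz) - 1)), len(freqs_hz) - 2)
        frac = t * (len(freqs_hz) - 1) - seg
        f = freqs_hz[seg] * (1 - frac) + freqs_hz[seg + 1] * frac
        phase += 2 * math.pi * f / sr
        out.append(math.sin(phase))
    return out

# Invented first- and second-formant tracks for a /da/-like syllable.
f1 = sine_track([200, 700, 730], 0.3)
f2 = sine_track([1700, 1200, 1090], 0.3)

# The sine-wave "analogue" of the syllable: just the two sines mixed.
analogue = [0.5 * a + 0.5 * b for a, b in zip(f1, f2)]
```

Such a signal preserves only the formant-frequency trajectories; everything else (glottal source, noise, higher formants) is discarded, which is precisely why listeners may or may not hear it as speech depending on how they attend.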
10. This corresponds to the argument above that each of the independent features of any timbre matrix must be presented in such a way as to draw the listeners' attention to the musical function of changes along that dimension.
11. The organization of vowels into a scale of ascending second formants from /u/ through /a/ to /i/ dates back at least as far as the Seventeenth Century. Ladefoged (1967) cites Robinson's 1617 manuscript "The Art of Pronuntiation" as organizing the vowels /u/, /o/, /ɔ/, /ɛ/, /i/ along an articulatory continuum represented by tongue positions. Stockhausen organized the vowels between /u/ and /i/ into a "vowel-square" in Stimmung (1967): essentially two scales of harmonics associated with vowels he indicates. (The "notes" of the scales are extremely focussed second-formant frequencies; the two scales delineate the front and back vowels respectively.) Slawson (1985) develops his elaborate system of "sound color" variation around traditional phonetic dimensions within the F1 vs. F2 vowel space. (He adds the dimension of "smallness" not used by phoneticians.) "Openness," "acuteness," "laxness," and "smallness" can all be described acoustically and all (except "smallness") have proven to be useful to phoneticians. But Slawson's sound examples did not convince me that variation along all of these dimensions will function perceptibly in a musical context. Both Stockhausen's and Slawson's organizations of the vowel space are more complex than the simple "scale" I outline in Example 4. I choose this simpler organization of vowels because I find it to be the clearest and most compelling for my purposes.
12. Disner (1983) discusses evidence that the distribution of vowels in natural languages "... is best accounted for by a principle of maximum dispersion ... that is, that they tend to be arranged so as to be maximally far from one another in the available phonetic space." See also Stevens's (1972) article on "The Quantal Nature of Speech ...."
13. The ranges of possible center frequencies for F1 and F2 of the nine vowels given in Example 3 were derived from a number of different sources. The most useful of these sources were Peterson and Barney (1952), whose reported formant frequencies have worked well in a vowel synthesizer I developed in 1979 at the Center for Music Experiment at U.C. San Diego, reported in Jones (1984), and Ladefoged (1967), who gives formant frequencies reported by several different researchers. The orientation of the formants in Example 3 is a traditional presentation of the vowel space.
14. Matrices such as the one in Example 5 have become familiar compositional tools for many composers in this century. A sophisticated theoretical discussion of the use of matrices is found in Lewin's (1987) Generalized Musical Intervals and Transformations. My own compositional uses of the above matrix were, however, much more intuitive than the rigorous approaches described by Lewin. I found it useful to think of the dimensions in Example 5 as fields within which gestures (directed motion, discernible shapes) may be constructed.


15. The term "unpitched" appears in this paper in its "traditional" usage: as a convenient way to refer to indefinitely and/or complexly pitched sounds. Many sounds can serve as "unpitched" or "pitched" depending upon the context. The whisper vowels and vocal fry, for example, sometimes seem to project audible pitch centers (which were occasionally exploited in Still Life Dancing). These sounds, however, were more generally organized according to their relative pitch and are accordingly labeled "unpitched."
16. Similar instrumental associations with fricatives are also found in Berio's Circles (see Jones 1988), Ligeti's Aventures and Nouvelles Aventures, and many other pieces.

BIBLIOGRAPHY

Bailey, Peter J., Quentin Summerfield, and Michael Dorman. 1977. "On the Identification of Sine-wave Analogues of Certain Speech Sounds." Haskins Laboratories Status Report on Speech Research SR-51/52: 1-25.
Berio, Luciano. 1961. Circles, for female voice, harp, and two percussion players. London: Universal Edition.
Disner, S. F. 1983. "Vowel Quality: The Relation Between Universal and Language Specific Factors." U.C.L.A. Working Papers in Phonetics no. 58.
Erickson, Robert. 1975. Sound Structure in Music. Berkeley and Los Angeles: University of California Press.
Grey, John M. 1977. "Multidimensional Perceptual Scaling of Musical Timbres." Journal of the Acoustical Society of America 61, no. 5 (May): 1270-77.
Jones, David Evan. 1981. Passages, for chamber choir, two pianos, two percussion, and organ (score forthcoming from American Composers Editions).

Jones, David Evan. 1984. "A Composer's View." Electro-Acoustic Music (The Journal of the Electro-Acoustic Music Association of Great Britain) 1, no. 1 (May-June).

Jones, David Evan. [1986]. Scritto, for computer tape (compact disc Wergo Records on Digital Music Digital, vol. 4, WER 2024-50).


Jones, David Evan. 1987. "Compositional Control of Phonetic/Non-Phonetic Perception." Perspectives of New Music 25, nos. 1 & 2: 138-55.

Jones, David Evan. 1988. "Text and Music in Berio's Circles." Ex Tempore 4, no. 2 (Spring-Summer): 108-14.
Ladefoged, Peter. 1967. "The Nature of Vowel Quality." Part two of Three Areas of Experimental Phonetics: Stress and Respiratory Activity; The Nature of Vowel Quality; Units in the Perception and Production of Speech. London: Oxford University Press.
Lerdahl, Fred. 1987. "Timbral Hierarchies." Contemporary Music Review 2: 135-60.
Lerdahl, Fred, and Ray Jackendoff. 1983. A Generative Theory of Tonal Music. Cambridge, Mass.: M.I.T. Press.
Lewin, David. 1987. Generalized Musical Intervals and Transformations. New Haven: Yale University Press.
Liberman, Alvin M. 1979. "Duplex Perception and Integration of Cues: Evidence that Speech is Different from Nonspeech and Similar to Language." In Ninth International Congress of Phonetic Sciences, Symposium, vol. 8. Copenhagen: Institute of Phonetics, University of Copenhagen.

Liberman, Alvin M., and Michael Studdert-Kennedy. 1978. "Phonetic Perception." In Handbook of Sensory Physiology, vol. 8: Perception, edited by Richard Held, Herschel W. Leibowitz, and Hans-Lukas Teuber, 143-78. Berlin and Heidelberg: Springer-Verlag.
Ligeti, Gyorgy. 1966. Nouvelles Aventures, for three singers and seven instrumentalists. New York: C. F. Peters.
McAdams, Stephen. 1982. "Spectral Fusion and the Creation of Auditory Images." In Music, Mind and Brain: The Neuropsychology of Music, edited by Manfred Clynes, 279-98. New York: Plenum Press.

McAdams, Stephen, and Kaija Saariaho. 1985. "Qualities and Functions of Musical Timbre." In Proceedings of the International Computer Music Conference, 1985, edited by Barry Truax, 367-74. San Francisco, CA: Computer Music Association.
Peterson, Gordon E., and Harold L. Barney. 1952. "Control Methods Used in a Study of the Vowels." Journal of the Acoustical Society of America 24, no. 2 (March): 175-84.

Plomp, Reinier. 1970. "Timbre as a Multidimensional Attribute of Complex Tones." In Frequency Analysis and Periodicity Detection in Hearing, edited by Reinier Plomp and Guido F. Smoorenburg. Leiden: A. W. Sijthoff.


Potard, Yves, Pierre-François Baisnee, and Jean-Baptiste Barriere. 1986. "Experimenting with Models of Resonance Produced by a New Technique for the Analysis of Impulsive Sounds." In Proceedings of the International Computer Music Conference 1986, edited by Paul Berg. San Francisco, CA: Computer Music Association.
Randall, James K. 1972-74. "Compose Yourself-A Manual for the Young." Parts 1-3. Perspectives of New Music 10, no. 2 (Spring-Summer 1972): 1-12; 11, no. 1 (Fall-Winter 1972): 77-91; 12, nos. 1 & 2 (Fall-Winter 1973/Spring-Summer 1974): 233-81.
Rodet, Xavier. 1984. "Time-Domain Formant-Wave-Function Synthesis." Computer Music Journal 8, no. 3 (Fall): 9-14.

Rodet, Xavier, and Pierre Cointe. 1984. "FORMES: Composition and Scheduling of Processes." Computer Music Journal 8, no. 3 (Fall): 32-50.

Rodet, Xavier, Yves Potard, and Jean-Baptiste Barriere. 1984. "The CHANT Project: From Synthesis of the Singing Voice to Synthesis in General." Computer Music Journal 8, no. 3 (Fall): 15-31.
Rosch, Eleanor H. 1973. "On the Internal Structure of Perceptual and Semantic Categories." In Cognitive Development and the Acquisition of Language, edited by Timothy E. Moore, 111-44. New York: Academic Press.

Rosch, Eleanor H. 1975. "Cognitive Reference Points." Cognitive Psychology 7: 532-47.

Schouten, J. F. 1968. "The Perception of Timbre." In Reports of the 6th International Congress on Acoustics, edited by Y. Kohasi. 6 vols. Vol. 3, GP-6-2: 35-44, 90. Tokyo: Maruzen Company; Amsterdam: Elsevier.
Shepard, R. N. 1982. "Geometrical Approximations to the Structure of Musical Pitch." Psychological Review 89: 305-33.
Slawson, Wayne. 1985. Sound Color. Berkeley and Los Angeles: University of California Press.
Stevens, Kenneth N. 1972. "The Quantal Nature of Speech: Evidence from Articulatory-Acoustic Data." In Human Communication: A Unified View. Inter-University Electronics Series, vol. 15, edited by Edward E. David and Peter B. Denes, 51-66. New York: McGraw-Hill.

Stockhausen, Karlheinz. 1966. Kontakte. London: Universal Edition.

Stockhausen, Karlheinz. 1967. Stimmung. London: Universal Edition.
Tsunoda, Tadanobu. 1971. "The Difference of the Cerebral Dominance of Vowel Sounds among Different Languages." The Journal of Auditory Research 11: 305-14.

Wessel, David. 1983. "Timbral Control as a Musical Control Structure." Computer Music Journal 3, no. 2 (Summer): 45-52.
