Speech Extrapolated
Author(s): David Evan Jones
Source: Perspectives of New Music, Vol. 28, No. 1 (Winter, 1990), pp. 112-142
Published by: Perspectives of New Music
Stable URL: http://www.jstor.org/stable/833346
Accessed: 21-10-2015 01:20 UTC
Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at http://www.jstor.org/page/
info/about/policies/terms.jsp
JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content
in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship.
For more information about JSTOR, please contact support@jstor.org.
Perspectives of New Music is collaborating with JSTOR to digitize, preserve and extend access to Perspectives of New Music.
http://www.jstor.org
This content downloaded from 169.229.11.216 on Wed, 21 Oct 2015 01:20:27 UTC
All use subject to JSTOR Terms and Conditions
SPEECH EXTRAPOLATED
DAVID EVAN JONES
IT IS LARGELY timbral qualities and timbral transitions which form the
acoustic basis of phonetic communication. It is not surprising, then, that
those timbres which cue the perception of speech are often taken as points
of departure in efforts to structure a wider timbral vocabulary as a vehicle of
purely musical communication. Wayne Slawson (1985), for example, has
developed a detailed and rigorous approach to the organization of vowels
and vowel-like resonance patterns. Fred Lerdahl (1987) asserts that vowels
are "central to human timbre perception" and incorporates them prominently in his efforts to construct a hierarchical system of timbral organization.1 Prior to these theoretical efforts, poets such as Filippo Marinetti,
Hugo Ball, Kurt Schwitters, Maurice Lemaitre, and composers such as
Karlheinz Stockhausen, Herbert Eimert, Luciano Berio, Gyorgy Ligeti,
Kenneth Gaburo, Charles Dodge, Roger Reynolds, Paul Lansky, and many
others have composed with speech sound in such a way as to focus the
attention of listeners on the sounds of speech not only as carriers of
information (verbal meaning), but also as structures of (timbral) information. Many of these composers and poets have utilized speech sound in
association with timbrally similar nonspeech sounds in an effort to structure a larger and more varied timbral vocabulary.
I have addressed elsewhere (1987) some of the theoretical issues underlying the strategies composers have utilized to focus listeners' attention on
"the sounds of speech not only as cues for phonetically coded information, but as timbres, pitches, durations." As a composer, I have explored
ways of making the structure of speech sound serve as a basis for important
aspects of my musical structures. My approach has generally involved
organizing speech sounds according to their timbral characteristics (rather
than their morphemic function) and highlighting this timbral organization
by my use of related instrumental timbres and associated pitch structures.
By these means, I have found ways of "mapping" or "extrapolating"
aspects of speech structure into a musical domain.2
In this paper I will describe some aspects of the compositional language
developed for my Still Life Dancing for four percussion players and computer
tape. In particular, I will focus upon several families of "percussion/vowels"
I synthesized for this piece: sounds identifiable as percussion
instruments but with identifiable vowel resonances. Along the way, I will
address several theoretical issues related to the musical organization of
speech sounds and of vowels in particular. In the first part of this paper, I
will address some of the assumptions underlying the internal organization of
the families of percussion/vowels and I will outline the procedures by which
these sounds were synthesized. In the second part, I will discuss the
musical functions of the percussion/vowels within the overall timbral
organization of Still Life Dancing, and I will outline the effects I hope to
achieve by the integration of speech and nonspeech timbres.
I. PERCUSSION/VOWELS
OVERVIEW
[EXAMPLE (figure lost in extraction; surviving labels: Struck Wood, Struck Metal, Bowed Metal, etc.)]
COGNITIVE UNISONS
a cognitive unison is thus an attempt to distinguish between those differences along a given dimension that give rise to the musical structure
(functional differences) and those differences that do not.4
Perception of differences along any musical dimension is strongly influenced by the listeners' experience in their everyday acoustic world. Even if
harmonies, durations, and aspects of timbre were varied along carefully controlled continua, it is highly unlikely that the listeners' perceptual responses
would be continuous: listeners make use of familiar categories to discriminate and identify points along the continuum. (For related experiments, see
Rosch 1973, 1975. In regard to speech sounds in particular, see the
excellent review provided by Liberman and Studdert-Kennedy 1978.)
Composers attempting to create musically effective categories (and therefore effective points of cognitive unison) along any musical "dimension"
must take into account the categories already familiar to the listener.
Composers must either make use of these familiar categories or take special
care to convincingly override them for the purposes of a particular piece.5
The percussion/vowels were synthesized in an effort to make use of
familiar categories: to cue the perception of a familiar source (metal or
wood percussion instruments) and a familiar resonance (one of nine
selected vowels) in the same sound and to vary these two features independently in musically meaningful ways.
Potard, Baisnee, and Barriere (1986) have developed an approach to synthesis using extremely detailed models of the resonances of impulsive
sounds. I found their (1986) demonstrations of the independence and
perceptual saliency of "source" and "resonance" to be remarkably convincing. Grey's (1977) multidimensional perceptual scaling experiments
have also demonstrated that dimensions related to source ("instrument
family") and resonance ("spectral energy distribution") are amongst the
most perceptually salient features of musical timbre.
VOWEL UNISONS
[EXAMPLE 2: VOWEL AND NONVOWEL RESONANCES. Three spectra, amplitude (Amp.) against frequency (Freq.), with frequency axes marked from 1000 to 4000.]
The percussion/vowels were synthesized in an effort to cue the perception of two very different (and normally incompatible) sets of categories
within the same sound: familiar sources (metal and wood percussion
instruments) with familiar resonances (vowel sounds). For this purpose, I
made use of the CHANT synthesizer (Rodet, Potard, and Barriere 1984)
and a technique developed by Potard, Baisnee, and Barriere (1986) for
modeling and synthesizing the resonances of impulsive sounds.
CHANT utilizes formant-wave-function (FOF) synthesis (Rodet 1984)
to enable the user to dynamically control the center frequencies, bandwidths, and amplitudes of up to two hundred or more time-varying resonances in a synthesized sound. The resonances may be produced either by
synthesis or filtering. The program was initially designed for speech synthesis, but has proven to be extremely flexible and effective in the synthesis
of a variety of timbres.
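To make the idea of a formant wave function concrete, the following is a minimal sketch in Python, not CHANT itself and not the code used for the piece. It assumes the commonly described form of a FOF grain: a sinusoid at the formant's center frequency, with an exponential decay whose rate is set by the bandwidth and a short raised-cosine attack, overlap-added once per period of the fundamental. The names `Formant`, `fof_grain`, and `fof_tone`, the grain and attack durations, and the two formant values are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Formant:
    center_hz: float     # center frequency of the resonance
    bandwidth_hz: float  # bandwidth; controls how fast each grain decays
    amplitude: float     # linear amplitude


def fof_grain(formant, duration_s=0.02, attack_s=0.003, sample_rate=44100):
    """Return one grain: a damped sinusoid with a raised-cosine attack."""
    alpha = math.pi * formant.bandwidth_hz  # exponential decay rate
    beta = math.pi / attack_s               # attack ends when beta * t == pi
    omega = 2.0 * math.pi * formant.center_hz
    samples = []
    for i in range(round(duration_s * sample_rate)):
        t = i / sample_rate
        envelope = math.exp(-alpha * t)
        if t < attack_s:
            envelope *= 0.5 * (1.0 - math.cos(beta * t))
        samples.append(formant.amplitude * envelope * math.sin(omega * t))
    return samples


def fof_tone(formants, fundamental_hz, duration_s, sample_rate=44100):
    """Overlap-add one grain per formant at every period of the fundamental."""
    total = round(duration_s * sample_rate)
    out = [0.0] * total
    grains = [fof_grain(f, sample_rate=sample_rate) for f in formants]
    period = sample_rate / fundamental_hz
    k = 0
    while round(k * period) < total:
        start = round(k * period)
        for grain in grains:
            # Truncate grains that would run past the end of the buffer.
            for i, value in enumerate(grain[: total - start]):
                out[start + i] += value
        k += 1
    return out


# Illustrative formant values only; they are not taken from the article.
formants = [Formant(300.0, 80.0, 1.0), Formant(900.0, 100.0, 0.5)]
tone = fof_tone(formants, fundamental_hz=220.0, duration_s=0.1)
```

In this sketch a narrower bandwidth gives a longer-ringing grain, which is one way to hear why the same set of controls can describe both vowel formants and the resonances of struck objects.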
Potard, Baisnee, and Barriere (1986) have described techniques by means
of which a digital recording of an impulsive sound can be analyzed and
modeled as a set of resonances, each resonance with a given center frequency, bandwidth, and amplitude. In the form of a text file, each model
can then be altered and manipulated by any algorithm the researcher may
develop in the UNIX environment or in FORMES (Rodet and Cointe
1984) and synthesized using CHANT. Several models require more than a
hundred individual resonances, but the number of resonances can often be
systematically reduced without a marked loss in "fidelity." Potard et al.
have made a number of models of impulsive sounds available in IRCAM's
on-line library.
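As a rough illustration of what manipulating such a model might look like, the sketch below treats a model as a plain-text list of resonances and reduces it to its strongest members. Everything here is an assumption made for the example: the one-resonance-per-line file layout, the names `Resonance`, `parse_model`, `reduce_model`, and `format_model`, the numeric values, and the choice of amplitude as the reduction criterion. The source does not specify the file format or the reduction method actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resonance:
    center_hz: float
    bandwidth_hz: float
    amplitude: float


def parse_model(text):
    """Parse one resonance per line: center_hz bandwidth_hz amplitude."""
    model = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        center, bandwidth, amplitude = (float(x) for x in line.split())
        model.append(Resonance(center, bandwidth, amplitude))
    return model


def reduce_model(model, keep):
    """Keep the `keep` strongest resonances, returned in frequency order."""
    strongest = sorted(model, key=lambda r: r.amplitude, reverse=True)[:keep]
    return sorted(strongest, key=lambda r: r.center_hz)


def format_model(model):
    """Write the model back out in the same hypothetical text layout."""
    return "\n".join(
        f"{r.center_hz:g} {r.bandwidth_hz:g} {r.amplitude:g}" for r in model
    )


# Made-up values in a hypothetical layout, for illustration only.
EXAMPLE_MODEL = """\
# center_hz bandwidth_hz amplitude
440 4 1.0
1210 9 0.35
2385 15 0.6
3890 30 0.05
"""

reduced = reduce_model(parse_model(EXAMPLE_MODEL), keep=3)
print(format_model(reduced))
```

Dropping the weakest resonances is only one possible reading of "systematically reduced"; a perceptually motivated criterion (for example, one that accounts for masking) would be a natural alternative.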
Using CHANT and the techniques described by Potard, I synthesized
the percussion/vowel sounds with three active formants. Only the first two
formants were associated with audible pitches (described below); the fourth
and fifth formants were static.
1. I selected a model of a given instrument originally played at a
fundamental frequency around 440 Hz, near the center of the
range of first formant vowel frequencies with which I was working. (Specific formant frequencies for the nine selected vowels are
given below.)
VOWEL SCALE
I think of the extremes of the vowel space (/u/, /a/, and /i/) as defining
the extremes of a range of articulations with which listeners can identify. If a
speaker or singer were to continue the articulatory gesture from /a/ to /u/
in an attempt to find a vowel with lower first and second formants than /u/,
she would arrive at the articulatory place for the stop consonant /b/ or (if the
sound is nasalized) the nasal /m/. If she attempted to continue the articulatory gesture from /a/ to /i/ in an attempt to find a vowel with a higher
second formant than /i/, she would arrive at the articulatory place for the stop
consonant /d/ or (if the sound is nasalized) the nasal /n/. Similarly, attempts
to continue the transitions from /u/ to /a/ or from /i/ to /a/ to produce
still more "open" vowels result only in vowels closely related to /a/.
The vowels /u/, /a/, and /i/ are thus not arbitrarily selected as extremes of
the vowel "scale"; they represent articulatory extremes along the vowel
continuum. Their unique articulatory positions may be the reason that the
vowels /u/ and /i/ appear almost universally in natural languages. In Still
Life Dancing, /u/, /a/, and /i/ are the most common vowels of reference or
orientation: /u/ often serves as a point of resolution; /a/ often serves as a
secondary (more temporary) "tonic"; and /i/ often serves as a high point.
I divided the vowel continuum from /u/ to /i/ into a scale of nine
individual vowels roughly organized according to their second formant
frequencies.11 The vowel continuum could, of course, be divided into an
infinite number of individual vowels, just as the octave can be divided into
an infinite number of pitches. I selected nine as a number which would
facilitate a sufficient variety of vowel patterns and still permit the discrimination and identification of the individual vowel qualities.12 These particular vowels were selected because they are familiar to me as a speaker of
English and because their formant frequencies are spaced fairly evenly in the
vowel space. In Example 3, a range of possible center frequencies is given
for F1 and F2 of each of the nine vowels.13
VOWEL/PITCH ARRAY
[EXAMPLE: VOWEL FORMANTS (FEMALE) MAPPED AGAINST THE TWELVE-TONE TEMPERED SCALE; CENTER-FREQUENCY/PITCH ARRAY. The placement of the vowel symbols against the scale is not recoverable from the extraction. The tempered-scale column reads as follows.]

Pitch   Freq. (Hz)
C4      261.6
C#4     277.2
D4      293.7
D#4     311.1
E4      329.6
F4      349.2
F#4     370.0
G4      392.0
G#4     415.3
A4      440.0
A#4     466.2
B4      493.9
C5      523.2
C#5     554.4
D5      587.3
D#5     622.3
E5      659.3
F5      698.5
F#5     740.0
G5      784.0
G#5     830.6
A5      880.0
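The tempered-scale frequencies listed in this example are consistent with the standard equal-temperament relation f = 440 * 2^((n - 69) / 12), where n is a MIDI-style note number and A4 = 440 Hz. The short sketch below, offered only as an illustration, uses that relation to find the tempered pitch nearest a given formant center frequency and the deviation in cents. The function names and the use of MIDI note numbers are conveniences introduced here, not part of the original procedure.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_to_hz(midi, a4=440.0):
    """Equal-tempered frequency of a MIDI note number (A4 = note 69)."""
    return a4 * 2.0 ** ((midi - 69) / 12.0)


def hz_to_nearest_pitch(freq_hz, a4=440.0):
    """Return (name, tempered frequency, deviation in cents) for freq_hz."""
    midi = round(69 + 12.0 * math.log2(freq_hz / a4))
    tempered = midi_to_hz(midi, a4)
    cents = 1200.0 * math.log2(freq_hz / tempered)
    name = NOTE_NAMES[midi % 12] + str(midi // 12 - 1)
    return name, tempered, cents


# A hypothetical first-formant center frequency of 300 Hz, for illustration.
name, tempered, cents = hz_to_nearest_pitch(300.0)
print(f"{name}: {tempered:.1f} Hz ({cents:+.0f} cents)")
```

A center frequency rarely coincides exactly with a tempered pitch, so the cents value gives a sense of how much a formant would have to be shifted to sit on the scale.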
DISCRIMINATION AND IDENTIFICATION
FAMILIES OF PERCUSSION/VOWELS
GROUPINGS
[EXAMPLE 5 (diagram; layout lost in extraction). Surviving labels: Sampled Speech; Vowels; Live Percussion. Wood: wood vowels, wood-noise vowels; marimba, xylophone (pitched); wood blocks, temple blocks. Vocal Tract. Unpitched: drums, hand percussion, cowbells, cymbals. Metal: metal-noise vowels, metal vowels; vibraphone, orchestral bells (pitched).]
Along the vertical axis of Example 5, the sound world of the piece is
divided according to the "source" of the sounds (wood, vocal tract, metal)
and according to specificity of pitch ("pitched" vs. "unpitched").15 At
various points in the piece, "unpitched" instruments, "pitched" instruments, "wood" instruments, and "metal" instruments (as defined in
Example 5) are projected as individual timbral groupings. Drums, which
are neither wood nor metal of course, are rarely projected in Still Life
Dancing as a separate "skins" category; instead they serve as a lower-frequency component of the "unpitched" category which also includes
metal, wood, and the sampled speech.
Although it is not indicated in the diagram, the "unpitched" category is
further divided into vowel-related and fricative-related instruments.
Because of their indefinite pitch and short decay, wood blocks, temple
blocks, and cowbells are often grouped with vocal fry vowels to create an
ambiguous speech/nonspeech texture. In Examples 7 and 8 in particular,
the wood blocks and whisper/fry are hocketed in interlocking, often imitative patterns. Cymbals and whisper-sung vowels are often grouped because
they are both wideband sounds with indefinite pitch and comparatively
long decay. Hand percussion such as maracas, shekere, and tambourine,
which can be sustained by rolls, are often grouped with the fricatives (/s/
and /ʃ/)16 and the fricative-vowel transitions (/su/, /sa/, /si/, /ʃu/, /ʃa/,
/ʃi/). In Example 2, bar 1, the fricative-vowel transition /ʃi/ comes out
of the sound of the shekere. In bars 2-3 of the same example, an extended /ʃ/
joins rolled tambourine and maracas as part of a high-pitched rustle-noise texture.
[EXAMPLE 6: score excerpt, continued over several pages. Staff labels: Wh/Fry, P/Diphth., P/Vowels, Xylophone, Marimba, Cymbals, Drums.]
[EXAMPLE 7: score excerpt. Staff labels: Whisper/Fry, Wood Blocks, Temple Blocks, Hand Percussion, Cymbals, Drums.]
[EXAMPLE 8: score excerpt, continued. Staff labels: Wh/Fry, P/Vowels, Marimba, Wd. Blks, Temp. Blks, Cowbell, Cymbals, Drums.]
VOWEL QUALITIES
percussion/vowel scale from /u/ to /i/ entails a concomitant (though somewhat variable) set of pitches associated with the formant frequencies. Vowel
quality and pitch represent different aspects of this dimension because of the
differences (some of which are outlined above) between phonetic perception and pitch perception.
Within the context of Still Life Dancing, the pitches from around D4 (the
first formant of /u/) to around F7 (the second formant of /i/) are in a
privileged position because they fall within the range of vowel formant
frequencies. The memorability of pitches and patterns of pitches within this
range can be reinforced and "colored" by direct association with specific
vowels. This will be illustrated below in a discussion of Example 6 (bars 93-106).
The whisper and vocal fry vowels are organized in a scale of ascending
second formants, as are the percussion/vowels. These sounds are sometimes
organized according to their specific vowel qualities (see below) and sometimes simply as another unpitched percussion instrument (a percussion
instrument the vowel qualities of which can become functional at any
time). The percussionists are asked to select the range of wood and temple
blocks to approximate the pitch of /i/ at the high end and /u/ at the low end
in order to facilitate perceptual associations and interactions between the
two sets of "instruments."
2. Vowel qualities function as points of cognitive unison between timbres which,
in other respects, are very different, and which are identified by listeners as
issuing from categorically different sources.
Because the vocal fry vowels and whisper vowels are vaguely and complexly
pitched, it is usually the vowel quality and the noise content, not the
pitch, that stand out most clearly when a vocal fry vowel or whisper vowel
is sounded. Moreover, because the sampled speech sounds were produced
by a male vocal tract while the percussion/vowels were based upon average
formant frequencies for a smaller (female) vocal tract, the pitches associated
with each vocal fry vowel are not identical to the formant frequencies of the
percussion/vowels. The inclusion of vocal fry vowels and whisper vowels in
the musical materials of Still Life Dancing thus gives vowel quality a function
separate from and more independent of pitch.
The vocal fry and whisper vowels can be used to match, extend, and
imitate patterns of vowel quality heard in the percussion/vowels independently of the pitch or other aspects of the timbre. The fry and whisper
vowel patterns often follow the pitched vowels (either as extensions of an
individual vowel or as an imitation of a vowel pattern) to convey the
impression that the pitch of the sound has decayed, leaving the unpitched
vowel. The intent of these imitations and extensions is to persuade listeners
The vowel unisons between the sampled speech and the percussion/vowels
point to an interesting and important ambiguity in Example 5. The source
category "vocal tract" should certainly include the sampled speech which
issued (quite audibly) from a human vocal tract. But should it include the
percussion/vowels which seem to emanate from wood or metal instruments
but behave (in their changing resonances) as if produced by a vocal tract?
This ambiguity is maintained throughout the piece by the continual return
of sampled speech and percussion/vowels. The speech and/or timbral similarities of the sampled speech, percussion/vowels, and the live percussion are
intended to extend this ambiguity to include as many of the sounds of the
piece as possible.
The compositional intent is to invite the listener:
1. To attend, at times, to the texture of this piece in a speech mode, interpreting (or attempting to interpret) timbres which would normally be thought of as nonspeech in terms of speech sounds which
share timbral characteristics, and (conversely)...
2. To attend to the sounds of speech not only as cues for phonetically
coded information, but as timbres, pitches, durations.
ACKNOWLEDGMENTS
NOTES
1. While Lerdahl organizes timbre according to judgements of "consonance" and "dissonance" along several timbral dimensions (an
approach explicitly related to the structure of tonal music), Slawson
organizes his system of "sound colors" according to serial principles.
2. I have utilized speech sounds as the basis for compositional control of
a wide variety of timbres in various contexts. Two examples: In
Passages (1981) for chamber choir, two pianos, two percussion, and
organ, I utilized a phonetically constrained text composed for the
piece by poet Michael Davidson in conjunction with a phonetically
constrained artificial language of my own design. The artificial language entailed the rule-based construction of "nonsense" syllables
utilizing three classes of phonemes: fricatives, stop consonants, and
vowels, the timbral qualities and organization of which were mapped
onto the organization of the entire ensemble and the entire piece. In
Scritto (1986) for computer tape, I utilized sampled whispers (a vowel
scale, two fricatives, three stop/vowel syllables), vocal fry (the same
vowel scale), and sung aggregates with percussion attacks. The
ambitus of the pitch aggregates consistently associated with the sung
vowels varied directly with the height of the second formant frequencies of the vowels to reinforce the sense of "opening" from /u/
through the "vowel scale" to /a/ to /i/.
3. J.K. Randall (1972-74) proposed a similar association of pitch and
vowel as early as 1972 in "Part II: 6 Stimulating Speculations" of his
article "Compose Yourself-A Manual for the Young." Randall
described a system of "musical-intervallic relations (specifically, the
intervallic relations formed by center-pitches of R1 and R2 [the first
and second vowel formants]) between and among the timbres ..."
which would be "... reproducible at various places" in the vowel
square. Slawson (1985) cites a passage in Stockhausen's Kontakte
(1963) in which filters set at wide bands to produce a vowel are
gradually narrowed until the pitch of the center frequency is audible.
Some psycho-linguists regard speech perception as fundamentally
different from perception of timbre in other contexts, and indeed
from all other types of perception: "speech is special," they say. I
discuss phonetic labeling here as if it were an independent aspect of
timbre perception, but I do not mean by that to deny the claim of
"specialness" for speech. I am agnostic on most aspects of the
controversy. I do maintain, however, that (as experiments in "duplex
perception" demonstrate (see, for example, Liberman 1979)), listeners can attend simultaneously to both the timbral quality and
phonetic identity of speech sounds.
4. An example: In the domain of pitch, the simplest, strongest, most
elemental intra-category relationship is the unison or prime. Within
limits, pitches may differ perceptibly and still be identified by
listeners as functionally identical in context. This is reflected in the fact
that "bad intonation" or expressive "pitch inflection" is generally
distinguished from "wrong notes." (In music which succeeds in
establishing distinct functions for, say, thirty-one tones in an octave,
the threshold between "bad intonation" and "wrong notes" would be
virtually eliminated.) Pitch unisons or primes represent functional
relationships in a musical context, rather than absolute identities of
pitch or frequency. This is not to say that two C#s cannot play
different functions at a higher hierarchical level; they certainly can.
The point here is that, at a lower level of functional interpretation, the
note C# can be effectively represented by sounds with slightly (but
perceptibly) different pitch heights.
5. Erickson (1975) discusses the importance of recognition and identification at some length. He asserts that "Even if we try very hard we
find it difficult to attend to any single parameter of a timbre" and he
quotes Schouten (1968) as follows: "Evidently our auditory system
does carry out an extremely subtle and multi-varied analysis of these
elements, but our perception is cued to the resulting overall pattern.
Acute observers may bring some of these elements to conscious
perception, like intonation patterns, onsets, harshness, etc., even so,
minute differences may remain unobservable in terms of their auditory quality and yet be highly distinctive in terms of recognizing one
out of a multitude of potential sound sources."
This bears directly upon Lerdahl's (1987) efforts to construct
timbral hierarchies. My strongest objection to Lerdahl's otherwise
intriguing approach is that he oversimplifies the powerful associations
(gained by everyday experience) that listeners bring to bear in their
listening to synthesized sounds. In a discussion of "timbral prototypes," he refers only to "prototypes" of vibrato, tremolo, amplitude envelope, and other independent variables and makes no reference to traditional instruments or other environmental sounds.
Presumably, his definitions of prototypical vibrato and prototypical
amplitude envelope are somehow abstracted from listeners' experiences, but it seems clear to me that if such prototypical features of
timbre can be said to exist, they are interdependent and strongly related
to the listener's judgement as to the source of the sound. I would
BIBLIOGRAPHY
Bailey, Peter J., Quentin Summerfield, and Michael Dorman. 1977. "On
the Identification of Sine-wave Analogues of Certain Speech Sounds."
Haskins Laboratories Status Report on Speech Research SR-51/52:1-25.
Berio, Luciano. 1961. Circles, for female voice, harp, and two percussion
players. London: Universal Edition.
Disner, S. F. 1983. "Vowel Quality: The Relation Between Universal and
Language Specific Factors." U.C.L.A. Working Papers in Phonetics no.
58.
Erickson, Robert. 1975. Sound Structure in Music. Berkeley and Los
Angeles: University of California Press.
Grey, John M. 1977. "Multidimensional Perceptual Scaling of Musical
Timbres." Journal of the Acoustical Society of America 61, no. 5 (May):
1270-77.
Jones, David Evan. 1981. Passages, for chamber choir, two pianos, two
percussion, and organ (score forthcoming from American Composers
Editions).
Jones, David Evan. 1984. "A Composer's View." Electro-Acoustic Music (The Journal of
the Electro-Acoustic Music Association of Great Britain) 1, no. 1 (May-June).
Jones, David Evan. [1986]. Scritto, for computer tape (compact disc Wergo Records on
Digital Music Digital, vol. 4, WER 2024-50).
Rosch, Eleanor. 1975. "Cognitive Reference Points." Cognitive Psychology 7:532
Schouten, J. F. 1968. "The Perception of Timbre." In Reports of the 6th
International Congress on Acoustics, edited by Dr. Y Kohasi. 6 vols. Vol. 3,
GP-6-2:35-44, 90. Tokyo: Mauruzen Company; Amsterdam: Elsevier.
Shepard, R. N. 1982. "Geometrical Approximations to the Structure of
Musical Pitch." Psychological Review 89:305-33.
Slawson, Wayne. 1985. Sound Color. Berkeley and Los Angeles: University
of California Press.
Stevens, Kenneth N. 1972. "The Quantal Nature of Speech: Evidence
from Articulatory-Acoustic Data." In Human Communication: A Unified
View. Inter-University Electronics Series, vol. 15, edited by Edward E.
David and Peter B. Denes, 51-66. New York: McGraw-Hill.
Stockhausen, Karlheinz. 1966. Kontakte. London: Universal Edition.
Stockhausen, Karlheinz. 1967. Stimmung. London: Universal Edition.
Tsunoda, Tadanobu. 1971. "The Difference of the Cerebral Dominance of