CURRENT DIRECTIONS IN PSYCHOLOGICAL SCIENCE 53

Vocal Expression and Perception of Emotion

Jo-Anne Bachorowski1
Department of Psychology, Vanderbilt University, Nashville, Tennessee

Abstract
Speech is an acoustically rich signal that provides considerable personal information about talkers. The expression of emotions in speech sounds and corresponding abilities to perceive such emotions are both fundamental aspects of human communication. Findings from studies seeking to characterize the acoustic properties of emotional speech indicate that speech acoustics provide an external cue to the level of nonspecific arousal associated with emotional processes and, to a lesser extent, the relative pleasantness of experienced emotions. Outcomes from perceptual tests show that listeners are able to accurately judge emotions from speech at rates far greater than expected by chance. More detailed characterizations of these production and perception aspects of vocal communication will necessarily involve knowledge about differences among talkers, such as those components of speech that provide comparatively stable cues to individual talkers’ identities.

Keywords
emotion; speech acoustics; vocal communication

The speech stream is a highly complex and variable signal that is most directly studied by analyzing its acoustic properties, or sound patterns. We know from everyday experience that talkers provide information about their emotional states through the acoustic properties of their speech. For instance, many of us have experienced talking in an unwittingly loud voice when feeling gleeful, speaking in an uncharacteristically high-pitched voice when greeting a sexually desirable person, or talking with marked vocal tremor while giving a public speech. In turn, listeners are seemingly adept at making accurate evaluations of emotional states, even in the absence of visual cues, as routinely occurs during telephone conversations.

Production and perception phenomena are both facets of a broad research area concerned with understanding the ways in which speech acoustics provide personal information about talkers, such as gender and individual identity, independent of linguistic content. This article provides an overview of the links between speech acoustics and emotions (for more detailed reviews, see Pittam & Scherer, 1993, and Scherer, 1989). Some limitations of traditional approaches to this research area, and alternative ways of thinking about enduring problems, are also discussed.

Copyright © 1999 American Psychological Society


54 VOLUME 8, NUMBER 2, APRIL 1999

SPEECH ACOUSTICS

The source-filter theory of speech production is helpful for understanding the ways in which speech acoustics might provide information about emotional state (see Kent, 1997, for a thorough introduction to speech acoustics). In this framework, speech sounds result from the combination of source energy, produced by vibration of the vocal folds (formerly referred to as the vocal cords), and the subsequent filtering of that energy by the vocal tract above the larynx.

Source-related acoustic cues refer to those aspects of speech sounds that are primarily associated with vocal-fold vibration. In emotions research, measures associated with F0 (i.e., the fundamental frequency of speech, which corresponds to the rate of vocal-fold vibration and is perceived as vocal pitch) are the most commonly used. Other potentially important source measures include jitter and shimmer, which correspond to variability in the frequency and amplitude of vocal-fold vibration, respectively. Filter-related cues are examined less often by emotions researchers. However, these cues may be important for understanding emotional speech because facial expression (e.g., lip position) can influence filtering effects. Thus, a sentence spoken while smiling can sound different from the same sentence spoken while frowning. These kinds of acoustic differences are reflected in formants, which are vocal-tract resonances that correspond to the frequencies amplified through vocal-tract filtering. Another way of thinking about resonances is that they are the natural frequencies that are selectively reinforced because of the size and shape of the vocal tract (see Kent, 1997). Both source- and filter-related cues are sensitive to changes in vocal-production-related physiology, such as the fluctuations in respiration and muscle tension that can occur in conjunction with some emotions (Scherer, 1989).

VOCAL EMOTION FROM A PRODUCTION STANDPOINT

Most production-related investigations have been guided by the assumption that distinct patterns of acoustic cues will be found to be associated with discrete emotional states. Largely for practical reasons, these investigations have typically analyzed the emotional speech produced by small numbers of actors or naive subjects asked to portray various emotions while producing linguistically neutral utterances. For both theoretical and practical reasons, most analyses of emotional speech have focused on source-related acoustic cues. For these cues, a restricted yet fairly reliable pattern of findings has emerged. For example, Scherer, Banse, Wallbott, and Goldbeck (1991; also see Banse & Scherer, 1996; Leinonen, Hiltunen, Linnankoski, & Laakso, 1997) examined the acoustic features of neutral and emotional nonsense sentences spoken by four actors. In comparison with neutral speech, portrayals of fear, joy, and anger were each associated with a higher mean F0, whereas portrayals of sadness were associated with a lower mean F0. A corresponding pattern was observed for vocal intensity, or amplitude.

Across studies, portrayals of emotions associated with high levels of physiological arousal (e.g., anger, fear, anxiety, and joy) have been associated with increases in mean F0, F0 variability, and vocal intensity. Some acoustic differentiation among these emotions has been found by examining F0 contours, or the pattern of F0 changes over the course of an utterance. For example, F0 has been noted to decrease over time during portrayals of anger, but to increase over time during portrayals of joy. In contrast, emotions associated with low levels of physiological arousal (e.g., sadness) are consistently associated with lower mean F0, F0 variability, and vocal intensity, as well as decreases in F0 over time.

Rather than relying on acted portrayals, my colleague and I have studied the acoustic properties of natural speech produced by naive participants in the context of controlled emotion-induction procedures.2 We have focused our acoustic analysis on very short vowel segments both because detailed measurement of source- and filter-related cues is possible with these sounds and because these speech samples are less likely than sentence-length utterances to be influenced by demand characteristics, such as culturally prescribed rules about how particular emotions ought to be conveyed in speech. In one such study (Bachorowski & Owren, 1995), positive and negative emotions were induced by giving participants “Good Job” and “Try Harder” feedback as they performed a difficult computerized spelling task. In reality, this feedback was not contingent on participants’ performance. After each feedback presentation, subjects’ speech was recorded as they announced the number (n) of the upcoming block of trials using the phrase “test n test.” F0, jitter, and shimmer were measured from 30 instances of the “eh” sound that occurred in the first utterance of “test” in each stock phrase. The results indicated that emotion expressed through the vocal channel depended not only on the valence (i.e., the relative pleasantness or unpleasantness) of the elicited emotion, but also on differences in the self-reported intensity with

Published by Blackwell Publishers, Inc.



which emotions were typically experienced (Bachorowski & Braaten, 1994).

We observed similar outcomes in an unpublished study that used a more standard emotion-induction paradigm in which naive participants described the thoughts and feelings evoked by emotion-eliciting slides. Notably, efforts to link both source- and filter-related acoustic cues with discrete emotions were largely unsuccessful. Instead, the overall pattern of results indicated that values of acoustic parameters were associated with nonspecific arousal and, to a lesser extent, emotional valence. Again, differences in emotional intensity mediated the relationships between acoustic measures and emotional states.

Although the expression and perception of emotion are salient aspects of human vocal communication, researchers have yet to fully characterize the ways in which speech acoustics provide cues to emotional states. The most parsimonious interpretation of production-related data is that speech acoustics provide an external cue to the level of nonspecific arousal associated with emotional processes. Less reliable differentiations are found when researchers look for associations between acoustic measures and either emotional valence or discrete emotion categories. Moreover, potentially important individual differences, including the identity of the talker and emotional intensity, are routinely found to mediate vocal expression of emotion. As Scherer (1986) has pointed out, there is an apparent contradiction between the difficulty in finding acoustic differentiation of emotional states and the comparative ease with which listeners are able to judge emotions from speech. Resolving this contradiction will likely involve an explicit understanding of the role that individual difference variables play in the production of emotional speech.

PERCEPTION OF VOCAL EMOTION

Tests of listeners’ abilities to infer emotion from speech are critical for evaluating the perceptual importance of acoustic cues shown to be important from a production perspective, and help to inform research aimed at developing an acoustic typology of emotional speech. The standard perception paradigm is to have listeners choose which one of several emotion words best characterizes linguistically neutral utterances made by actors attempting to portray various emotions (e.g., Leinonen et al., 1997; Scherer, Banse, & Wallbott, 1998). Listeners are usually able to perceive the intended emotions at rates significantly better than those expected by chance. This general success in identifying emotions is typically interpreted to indicate that listeners associate particular patterns of acoustic cues with various discrete emotional states. Evidence for cross-cultural similarities in both perceptual accuracy and error patterns (Scherer et al., 1998) further suggests that the ability to infer emotion from speech is a fundamental component of human vocal communication.

In light of these findings, it is also important to note that error rates are also often quite high. A hint about the basis of detection failures comes from the fact that listeners are more accurate in inferring emotion from particular voices. Furthermore, for any given actor, listeners typically perceive some emotions more accurately than others. Although it is likely that some emotions may simply be more difficult to infer from voice than others, and that actors vary in the quality of their emotion portrayals, these effects also suggest that characteristic acoustic differences between voices play a role in perceptual evaluations of emotion from speech.

TOWARD A BROADER FRAMEWORK

A number of constraints have impeded the development of a detailed account of vocal-emotion-related phenomena. For instance, speech is complex, both in the number of potentially relevant acoustic cues related to emotional expression and in the multiplicity of other factors that influence the speech signal at any moment in time. More pragmatically, accurate and detailed acoustic analysis is time-consuming. From a methodological standpoint, the small number of participants typically studied and the reliance on acted portrayals have limited the generality of findings. Paradigms that involve collecting speech samples during the controlled induction of emotional states best balance the need for methodological rigor and real-world validity.

Although investigators have typically sought to identify invariant patterns of acoustic cues for various discrete emotional experiences, this strategy may be problematic for a number of reasons. For instance, this tactic generally fails to consider the talker–listener relationship and the “intended” impact of vocal signals on the listener’s affective states. Some cues, especially those associated with the rate of vocal-fold vibration, are readily modifiable. They can be used, for example, to signal communicative intent or be recruited for the purposes of affective persuasion. Thus, treating these cues as honest readouts of emotional states ignores their other potential functions in emotion-related communication.


Incorporating a more talker-centered (i.e., idiographic) perspective may also help advance our understanding of emotional speech. Evaluations of emotional state are necessarily made against an acoustic backdrop of individually distinctive voice characteristics, and yet differences among talkers are usually treated as uninteresting variability in vocal-emotion research. However, everyday experience suggests that more accurate and detailed perceptual judgments of emotional state can be made for familiar than for unfamiliar talkers. For example, discriminations between related emotions, such as amusement and joy, are probably more accurate for speech samples from a close friend than those from a more casual acquaintance. Suggestive empirical support for the importance of talker characteristics comes from studies indicating that acoustic differences among talkers exert a powerful influence on cognitive operations such as linguistic processing and memory (e.g., Palmeri, Goldinger, & Pisoni, 1993). Thus, more detailed characterizations of the acoustic features of emotional speech might be found by examining fluctuations in acoustic cues against comparatively more stable but individually distinctive talker characteristics (see Bachorowski & Owren, 1998).

Research in vocal-emotion phenomena might also benefit from a reinterpretation of findings based on Russell and Feldman Barrett’s (1998) distinction between affect and emotion. In their account, affect is always present and is best described by bipolar dimensions of arousal and valence. In contrast, prototypical emotion episodes happen more rarely and are associated with identifiable neurophysiological, behavioral, and cognitive processes. This distinction certainly sheds new light on vocal production and perception phenomena. In that most studies have arguably examined affect rather than emotion, it may have been unreasonable to expect that distinct acoustic patterns could be identified. Instead, there is remarkable consistency in support of the notion that the acoustic features of “emotional” speech are best described using dimensions of nonspecific arousal and affective valence, and that most vocal productions index affective rather than emotional experience.

The expression and perception of emotional states in speech acoustics are fundamental aspects of human communication. In fact, disturbances in either of these communication components can contribute to profound deficits in social relationships. By its very nature, research in vocal expression and perception of emotion is richly interdisciplinary, a circumstance that gives rise to both its inherent complexities and its considerable intellectual appeal. As a result of improved digital processing techniques as well as advances in the related disciplines of speech science, cognitive science, and acoustic primatology, findings obtained in the coming years should prove especially informative for our understanding of emotional expression through the vocal channel.

Recommended Reading

Bachorowski, J.-A., & Owren, M.J. (1995). (See References)
Kent, R.D. (1997). (See References)
Murray, I.R., & Arnott, J.L. (1993). Toward the simulation of emotion in synthetic speech: A review of the literature on human vocal emotion. Journal of the Acoustical Society of America, 93, 1097–1108.
Pittam, J., & Scherer, K.R. (1993). (See References)

Acknowledgments: Work on this article was completed while the author was generously hosted as a Visiting Scholar by the Department of Psychology at Cornell University. Funds in support of this work came from National Science Foundation (POWRE) and National Institute of Mental Health (B/START) awards, and from Vanderbilt University. Michael J. Owren provided valuable comments on an earlier version of this manuscript, and our collaborative work led to some of the ideas presented here.

Notes

1. Address correspondence to Jo-Anne Bachorowski, Department of Psychology, Wilson Hall, Vanderbilt University, Nashville, TN 37240; e-mail: j.a.bachorowski@vanderbilt.edu.
2. Preliminary results from work being conducted in other laboratories demonstrate that both standard emotion-induction paradigms and playful, gamelike paradigms are successful for eliciting speech samples that can be used to study vocal expression of emotion. Some investigators using these kinds of strategies include Arvid Kappas (arvid@psy.ulaval.ca), Gary Katz (gary.katz@csun.edu), and Tom Johnstone in Klaus Scherer’s lab (johnstone@fapse.unige.ch).

References

Bachorowski, J.-A., & Braaten, E.B. (1994). Emotional intensity: Measurement and theoretical implications. Personality and Individual Differences, 17, 191–199.
Bachorowski, J.-A., & Owren, M.J. (1995). Vocal expression of emotion: Acoustic properties of speech are associated with emotional intensity and context. Psychological Science, 6, 219–224.
Bachorowski, J.-A., & Owren, M.J. (1998). Acoustic cues to gender and talker identity are present in a short vowel segment produced in running speech. Manuscript submitted for publication.
Banse, R., & Scherer, K.R. (1996). Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology, 70, 614–636.
Kent, R.D. (1997). The speech sciences. San Diego: Singular Publishing.
Leinonen, L., Hiltunen, T., Linnankoski, I., & Laakso, M.-L. (1997). Expression of emotional-motivational connotations with a one-word utterance. Journal of the Acoustical Society of America, 102, 1853–1863.
Palmeri, T.J., Goldinger, S.D., & Pisoni, D.B. (1993). Episodic encoding of voice attributes and recognition memory for spoken words. Journal of Experimental Psychology, 19, 309–328.
Pittam, J., & Scherer, K.R. (1993). Vocal expression and communication of emotion. In M. Lewis & J.M. Haviland (Eds.), Handbook of emotions (pp. 185–197). New York: Guilford Press.
Russell, J.A., & Feldman Barrett, L. (1998). Affect and prototypical emotional episodes. Manuscript submitted for publication.


Scherer, K.R. (1986). Vocal affect expression: A review and model for future research. Psychological Bulletin, 99, 143–165.
Scherer, K.R. (1989). Vocal measurement of emotion. In R. Plutchik & H. Kellerman (Eds.), Emotion: Theory, research, and experience: Vol. 4. The measurement of emotions (pp. 233–259). New York: Academic Press.
Scherer, K.R., Banse, R., & Wallbott, H.G. (1998). Emotion inferences from vocal expression correlate across languages and cultures. Manuscript submitted for publication.
Scherer, K.R., Banse, R., Wallbott, H.G., & Goldbeck, T. (1991). Vocal cues in emotion encoding and decoding. Motivation and Emotion, 15, 123–148.
