The Cortical Organization of Speech Processing
Gregory Hickok and David Poeppel (David Poeppel: New York University)
Article in Nature Reviews Neuroscience · June 2007 · DOI: 10.1038/nrn2113 · Source: PubMed


PERSPECTIVES

OPINION

The cortical organization of speech processing

Gregory Hickok and David Poeppel

Abstract | Despite decades of research, the functional neuroanatomy of speech processing has been difficult to characterize. A major impediment to progress may have been the failure to consider task effects when mapping speech-related processing systems. We outline a dual-stream model of speech processing that remedies this situation. In this model, a ventral stream processes speech signals for comprehension, and a dorsal stream maps acoustic speech signals to frontal lobe articulatory networks. The model assumes that the ventral stream is largely bilaterally organized — although there are important computational differences between the left- and right-hemisphere systems — and that the dorsal stream is strongly left-hemisphere dominant.

Understanding the functional anatomy of speech perception has been a topic of intense investigation for more than 130 years, and interest in the basis of speech and language dates back to the earliest recorded medical writings. Despite this attention, the neural organization of speech perception has been surprisingly difficult to characterize, even in gross anatomical terms.

The first hypothesis, dating back to the 1870s1, was straightforward and intuitive: speech perception is supported by the auditory cortex. Evidence for this claim came from patients with auditory language comprehension disorders (today's Wernicke's aphasics) who typically had lesions of the left superior temporal gyrus (STG). Accordingly, the left STG in particular was thought to support speech perception. This position was challenged by two discoveries in the 1970s and 1980s. The first was that deficits in the ability to perceive speech sounds contributed minimally to the auditory comprehension deficit in Wernicke's aphasia2–7. The second was that destruction of the left STG does not lead to deficits in the auditory comprehension of speech, but instead causes deficits in speech production8. These findings do not rule out a role for the left STG in speech perception, but make it clear that additional regions participate in the process.

Around the same time, neuropsychological experiments showed that damage to frontal or inferior parietal areas in the left hemisphere caused deficits in tasks that required the discrimination or identification of speech syllables3,4,9. These findings highlighted the possible role of a fronto-parietal circuit in the perception of speech. However, the relevance of these findings in mapping the neural circuits for speech perception was questionable, given that the ability to perform syllable discrimination and identification tasks doubly dissociated from the ability to comprehend aurally presented words: that is, there are patients who have impaired syllable discrimination but good word comprehension, and vice versa7,10. Some authors took this to indicate that auditory comprehension deficits in Wernicke's aphasia that are secondary to left temporal lobe lesions resulted from disruptions of semantic rather than phonological processes11, whereas others postulated that the mapping between phonological and semantic representations was disrupted10. Proposed explanations for deficits in syllable discrimination associated with non-temporal lobe lesions included generalized attentional deficits11 and phonological working memory deficits3.

The clinical picture regarding the neural basis of speech perception was, therefore, far from clear when functional imaging methods arrived on the scene in the 1980s. Unfortunately, the first imaging studies failed to clarify the situation: studies presenting speech stimuli in passive listening tasks highlighted superior temporal regions bilaterally12,13, and studies that used tasks similar to syllable discrimination or identification found prominent activations in the left STG14 and left inferior frontal lobe15. The paradox remained: damage to either of these left-hemisphere regions primarily produced speech production deficits (or at most some mild auditory comprehension deficits affecting predominantly post-phonemic processing), not the impaired auditory comprehension problems one might expect if the main substrate for speech perception had been destroyed. Although several authors discussed the possibility that left inferior frontal areas might be activated in these studies as a result of task-related phonological working memory processes14,16, thus addressing the paradox for one region, there was little discussion in the imaging literature of the paradox in connection with the left STG.

The goal of this article is to describe and extend a dual-stream model of speech processing that resolves this paradox. The concept of dual processing streams dates back at least to the 1870s, when Wernicke proposed his now famous model of speech processing, which distinguished between two pathways leading from the auditory system1. A dual-stream model has been well accepted in the visual domain since the 1980s17, and the concept of a similar arrangement in the auditory system has gained recent empirical support18,19. The model described in this article builds on this earlier work. We will outline the central components and assumptions of the model and discuss the relevant evidence.

Task dependence and definitions

From the studies described above, it is clear that the neural organization of speech processing is task dependent. Therefore, when attempting to map the neural systems

NATURE REVIEWS | NEUROSCIENCE VOLUME 8 | MAY 2007 | 393


© 2007 Nature Publishing Group

supporting speech processing, one must carefully define the computational process (task) of interest. This is not always clearly done in the literature. Many studies using the term 'speech perception' to describe the process of interest employ sublexical speech tasks, such as syllable discrimination, to probe that process. In fact, speech perception is sometimes interpreted as referring to the perception of speech at the sublexical level. However, the ultimate goal of these studies is presumably to understand the neural processes supporting the ability to process speech sounds under ecologically valid conditions, that is, situations in which successful speech sound processing ultimately leads to contact with the mental lexicon (BOX 1) and auditory comprehension. Thus, the implicit goal of speech perception studies is to understand sublexical stages in the process of speech recognition (auditory comprehension). This is a perfectly reasonable goal, and the use of sublexical tasks would seem to be a logical choice for assessing these sublexical processes, except for the empirical observation that speech perception and speech recognition doubly dissociate. The result is that many studies of speech perception have only a tenuous connection to their implicit target of investigation, speech recognition.

Box 1 | Linguistic organization of speech

Linguistic research demonstrates that there are multiple levels of representation in mapping sound to meaning. The distinct levels conjectured to form the basis for speech include 'distinctive features', the smallest building blocks of speech that also have an acoustic interpretation; as such they provide a connection between action and perception in speech. Distinctive features provide the basic inventory characterizing the sounds of all languages97–100. Bundles of coordinated distinctive features overlapping in time constitute segments (often called phonemes, although the terms have different technical meanings). Languages have their own phoneme inventories, and the sequences of these are the building blocks of words101. Segments are organized into syllables, which have language-specific structure. Some languages permit only a few syllable types, such as consonant (C)–vowel (V), whereas others allow for complex syllable structure, such as CCVCC102. In recent research, syllables are proposed to be central to parsing the speech stream into manageable chunks for analysis103. Featural, segmental and syllabic levels of representation provide the infrastructure for prelexical phonological analysis104,105. The smallest building blocks mediating the representation of meaning are morphemes. Psycholinguistic and neurolinguistic evidence suggests that morphemic structure has an active role in word recognition and is not just a theoretical construct106. There are further levels that come into play (such as syntactic information and compositional semantics, including sentence- and discourse-level information), but those mentioned above are the representations that must be accounted for in speech perception in the service of lexical access. At that point one can make contact with existing detailed models of lexical access and representation, including versions of the cohort model107, the neighbourhood activation model108 and others. These provide further lexical-level (as well as prelexical) constraints to be accounted for.

In this article we use the term 'speech processing' to refer to any task involving aurally presented speech. We will use speech perception to refer to sublexical tasks (such as syllable discrimination), and speech recognition to refer to the set of computations that transform acoustic signals into a representation that makes contact with the mental lexicon. This definition of speech recognition does not require that there be only one route to lexical access: recognition could be achieved via parallel, computationally differentiated mappings. According to these definitions, and the dissociations described above, it follows that speech perception does not necessarily correlate with, or predict, speech recognition. This presumably results from the fact that the computational end points of these two abilities are distinct. We have suggested that there is overlap between these two classes of tasks in the computational operations leading up to and including the generation of sublexical representations, but that different neural systems are involved beyond this stage6. Speech recognition tasks involve lexical access processes, whereas speech perception tasks do not need lexical access but instead require processes that allow the listener to maintain sublexical representations in an active state during the performance of the task, as well as the recruitment of task-specific operations. Thus, speech perception tasks involve some degree of executive control and working memory, which might explain the association with frontal lobe lesions and activations3,14,16.

Dual-stream model of speech processing

With these comments in mind, we can summarize the central claims of the dual-stream model (FIG. 1). Similar to previous hypotheses regarding the auditory 'what' stream18, this model proposes that a ventral stream, which involves structures in the superior and middle portions of the temporal lobe, is involved in processing speech signals for comprehension (speech recognition). A dorsal stream, which involves structures in the posterior frontal lobe and the posterior dorsal-most aspect of the temporal lobe and parietal operculum, is involved in translating acoustic speech signals into articulatory representations in the frontal lobe, which is essential for speech development and normal speech production. The suggestion that the dorsal stream has an auditory–motor integration function differs from earlier arguments for a dorsal auditory 'where' system18, but is consistent with recent conceptualizations of the dorsal visual stream (BOX 2) and has gained support in recent years20–22. We propose that speech perception tasks rely to a greater extent on dorsal stream circuitry, whereas speech recognition tasks rely more on ventral stream circuitry (with shared neural tissue in the left STG), thus explaining the observed double dissociations. In addition, in contrast to the typical view that speech processing is mainly left-hemisphere dependent, the model suggests that the ventral stream is bilaterally organized (although with important computational differences between the two hemispheres); so, the ventral stream itself comprises parallel processing streams. This would explain the failure to find substantial speech recognition deficits following unilateral temporal lobe damage. The dorsal stream, however, is strongly left-dominant, which explains why production deficits are prominent sequelae of dorsal temporal and frontal lesions, and why left-hemisphere injury can substantially impair performance in speech perception tasks.

Ventral stream: sound to meaning

Mapping acoustic speech input onto conceptual and semantic representations involves multiple levels of computation and representation. These levels may include the representation of distinctive features, segments (phonemes), syllabic structure, phonological word forms, grammatical features and semantic information (BOX 1). However, it is unclear whether the neural computations that underlie speech recognition in particular involve each of these processing levels, and whether the levels involved are serially organized and immutably applied or involve parallel computational pathways and allow for some flexibility in processing. These questions cannot be answered definitively given the existing evidence. However, some progress has been made in understanding the functional organization of the pathways that map between sound and meaning.




[Figure 1 diagram. a | Boxes: spectrotemporal analysis (dorsal STG, bilateral) feeding a phonological network (mid-post STS, bilateral); dorsal stream (left dominant): sensorimotor interface (parietal–temporal Spt, with input from other sensory modalities) linked via higher-order frontal networks to an articulatory network (pIFG, PM, anterior insula; left dominant); ventral stream: lexical interface (pMTG, pITS; weak left-hemisphere bias) and combinatorial network (aMTG, aITS; left dominant?) linked to a widely distributed conceptual network. b | Anatomical rendering of these components.]

Figure 1 | The dual-stream model of the functional anatomy of language. a | Schematic diagram of the dual-stream model. The earliest stage of cortical speech processing involves some form of spectrotemporal analysis, which is carried out in auditory cortices bilaterally in the supratemporal plane. These spectrotemporal computations appear to differ between the two hemispheres. Phonological-level processing and representation involves the middle to posterior portions of the superior temporal sulcus (STS) bilaterally, although there may be a weak left-hemisphere bias at this level of processing. Subsequently, the system diverges into two broad streams, a dorsal pathway (blue) that maps sensory or phonological representations onto articulatory motor representations, and a ventral pathway (pink) that maps sensory or phonological representations onto lexical conceptual representations. b | Approximate anatomical locations of the dual-stream model components, specified as precisely as available evidence allows. Regions shaded green depict areas on the dorsal surface of the superior temporal gyrus (STG) that are proposed to be involved in spectrotemporal analysis. Regions shaded yellow in the posterior half of the STS are implicated in phonological-level processes. Regions shaded pink represent the ventral stream, which is bilaterally organized with a weak left-hemisphere bias. The more posterior regions of the ventral stream, posterior middle and inferior portions of the temporal lobes, correspond to the lexical interface, which links phonological and semantic information, whereas the more anterior locations correspond to the proposed combinatorial network. Regions shaded blue represent the dorsal stream, which is strongly left dominant. The posterior region of the dorsal stream corresponds to an area in the Sylvian fissure at the parietotemporal boundary (area Spt), which is proposed to be a sensorimotor interface, whereas the more anterior locations in the frontal lobe, probably involving Broca's region and a more dorsal premotor site, correspond to portions of the articulatory network. aITS, anterior inferior temporal sulcus; aMTG, anterior middle temporal gyrus; pIFG, posterior inferior frontal gyrus; PM, premotor cortex.

Parallel computations and bilateral organization. Models of speech recognition typically assume that a single computational pathway exists. In general, the prevailing models (including the TRACE23, cohort24 and neighbourhood activation25 models) assume that various stages occur in series that map from sounds to meaning, typically incorporating some acoustic-, followed by some phonetic- and finally some lexical-level representations. Although activation of representations within and across stages can be parallel and interactive, there is nonetheless only one computational route from sound to meaning. By contrast, the model we suggest here proposes that there are multiple routes to lexical access, which are implemented as parallel channels. We further propose that this system is organized bilaterally, in contrast to many neural accounts of speech processing26–31.

From a behavioural standpoint, it is clear that the speech signal contains multiple, partially redundant spectral and temporal cues that can be exploited by listeners and that allow speech perception to tolerate a range of signal degradation conditions32–36. This supports the idea that redundant computational mechanisms — that is, parallel processing — might exist to exploit these cues.

There is also strong neural evidence that parallel pathways are involved in speech recognition. Specifically, evidence from patients with unilateral damage to either hemisphere, split-brain patients37 (who have undergone sectioning of the corpus callosum) and individuals undergoing Wada procedures38 (a presurgical procedure in which one or the other cerebral hemisphere is selectively anaesthetized to assess language and memory lateralization patterns) indicates that there is probably at least one pathway in each hemisphere that can process speech sounds sufficiently well to access the mental lexicon5,6. Furthermore, bilateral damage to superior temporal lobe regions is associated with severe deficits in speech recognition (word deafness)39, consistent with the idea that speech recognition systems are bilaterally organized40. In rare cases, word deafness can also result from focal unilateral lesions41; however, the frequency of such an occurrence is exceedingly small relative to the frequency of occurrence of unilateral lesions generally, suggesting that such cases are the exception rather than the rule.

Functional imaging evidence is also consistent with bilateral organization of speech recognition processes. A consistent and uncontroversial finding is that, when contrasted with a resting baseline, listening to speech activates the STG bilaterally, including the dorsal STG and superior temporal sulcus (STS). Many studies have attempted to identify 'phonemic processing' more specifically by contrasting speech stimuli with various non-speech controls (BOX 3). Most studies find bilateral activation for speech, typically in the STS, even after subtracting out the non-speech controls (TABLE 1). However, in many of these studies, the activation is more extensive, or, in a few studies, is solely found in the left hemisphere. Nevertheless, we do not believe that this constitutes evidence against a bilateral organization of speech perception, because a region activated by speech that is also activated by acoustically similar non-speech stimuli could still be involved in, or capable of, speech processing. Specificity is not a prerequisite for functional effectiveness. For example, the vocal tract is highly effective (even specialized) for speech production, but is far from speech-specific as it is also functionally effective for digestion. Furthermore, it is not clear exactly what is being isolated in these 'phoneme-specific' areas. It has been hard to identify differential




activations in response to words versus pseudowords42, even though the latter are presumably not represented in one's mental lexicon. An accepted explanation is that pseudowords activate lexical networks via features (phonemes or syllables) that are shared with real words. Likewise, it is possible that a similar phenomenon is occurring at sublexical processing levels: non-phonemic acoustic signals might activate phonological networks because they share acoustic features with signals that contain phonemic information. Thus, phonological networks in both the left and right hemispheres might be subtracted out of many of these studies. Finally, even if left-dominant phoneme-specific networks have been isolated by these studies, it remains possible that other networks (the right hemisphere, for example) represent speech as non-categorical acoustic signals that can be used to access the mental lexicon. Thus, even taking into account the most extreme interpretation of the imaging data, the claim that the speech recognition system is bilaterally organized (accounting for the lesion data), but with important computational differences (accounting for asymmetries), remains the most viable position.

Box 2 | Sensorimotor integration in the parietal lobe

The dorsal stream processing system in vision was first conceptualized as subserving a spatial 'where' function17. However, more recent work has suggested a more general functional role in visuomotor integration72,73,93. Single-unit recordings in the primate parietal lobe have identified cells that are sensitive not only to visual stimulation, but also to action towards the visual stimulation. For example, a unit may respond when an object is presented, but also when the monkey reaches for that object in an appropriate way, even if the object is no longer in view92,93. A number of visuomotor areas have been identified that appear to be organized around different motor effector systems93,94. These visuomotor areas are densely connected with frontal lobe regions involved in controlling motor behaviour for the various effectors93. In humans, dissociations have been observed between the conscious perception of visual information (ventral stream function) and the ability to act on that information appropriately (dorsal stream function). For example, in optic ataxia, patients can judge the location and orientation of visual stimuli, but have substantial deficits in reaching for those same stimuli; parietal lesions are associated with optic ataxia100. The reverse dissociation has also been reported in the form of a case of visual agnosia in which perceptual judgements of object orientation and shape were severely impaired, yet reaching behaviour towards those same stimuli showed normal shape- and orientation-dependent anticipatory movements101.

Box 3 | Is speech special?

The question of whether the machinery to analyse speech is 'special' has a contentious history in the behavioural literature43,102–104, and has migrated into neuroimaging as an area of research105,106. The concept of specialization presumably means that some aspect of the substrate involved in speech analysis is dedicated, that is, optimized for the perceptual analysis of speech as opposed to other classes of acoustic information. Whether the debate has led to substantive insight remains unclear, but the issues have stimulated valuable research.

It may or may not be the case that there are populations of neurons whose response properties encode speech in a preferential manner. In central auditory areas, information in the time domain is especially salient, and perhaps some neuronal populations are optimized for dealing with temporal information commensurate with information in speech signals. These are open questions, but with little effect on the framework presented here.

There is, however, an 'endgame' for the perceptual process, and that is to make contact with lexical representations to mediate intelligibility. We propose that at the sound–word interface there must be some specialization, for the following reason: lexical representations are used for subsequent processing, entering into phonological, morphological, syntactic and compositional semantic computations. For this, the representation has to be in the correct format, which appears to be unique to speech. For example, whistles, birdsong or clapping do not enter into subsequent linguistic computation, although they receive a rich analysis. It therefore stands to reason that lexical items have some representational property that sets them apart from other auditory information. There must be a stage in speech recognition at which this format is constructed, and if that format is of a particular type, there is necessarily specialization.

Multi-time resolution processing. Mapping from sound to an interpretable representation involves integrating information on different timescales. Determining the order of segments that constitute a lexical item (such as the difference between 'pets' and 'pest') requires information encoded in temporal windows of ~20–50 ms. To achieve successful lexical access (and to succeed on numerous other temporal order tasks in auditory perception), the input signal must be analysed at this scale. Suprasegmental information carried on the syllable occurs over longer intervals, roughly 150–300 ms. This longer scale, roughly commensurate with the acoustic envelope of a spoken utterance, carries syllable-boundary and syllabic-rate cues as well as (lexical) tonal information, prosodic cues (for interpretation) and stress cues.

Although the presupposition typically remains unstated, previous models have assumed that either hierarchical processing (a segmental analysis followed by syllable construction on the basis of smaller units)43,44 or parsing at the syllabic rate45,46 takes place, with no privileged analysis at the segmental or phonemic level.

Another way to integrate these differing requirements is to posit a multi-time resolution model in which speech is processed concurrently on these two timescales by two separate streams, and the information that is extracted is combined for subsequent computations at the lexical level47,48 (FIG. 2). It is possible that lexical access is initiated by information from each individual stream, but that optimal lexical access occurs when segmental-rate and syllabic-rate information is combined. Determining whether such a model is correct requires evidence for short-term integration, long-term integration and perceptual interaction. Although still tentative, there is now evidence supporting each of these conjectures47,48. Perceptually meaningful interaction at these two timescales has recently been demonstrated in a study showing that stimuli that selectively filter out or preserve these specific modulation frequencies lead to performance changes (D.P., unpublished observations). Functional MRI data also support the hypothesis that there is multi-time resolution processing, and that this processing is hemispherically asymmetrical, with the right hemisphere showing selectivity for long-term integration. The left hemisphere seems less selective in its response to different integration timescales47. This finding differs from the view that the left hemisphere is dominant for processing (fast) temporal information, and the right hemisphere is dominant for processing spectral information49,50. Instead, we propose that neural mechanisms for integrating information over longer timescales are predominantly located in the right hemisphere, whereas mechanisms for integrating over shorter timescales might be represented more bilaterally. Thus, we are suggesting that the traditional view of the left hemisphere




being uniquely specialized for processing fast temporal information is incorrect, or at most weakly supported by existing data.

Another recently discussed possibility26 is that the left hemisphere might be predisposed to processing or representing acoustic information more categorically than the right hemisphere. This could explain some of the asymmetries found in functional activation studies of speech perception, and might also accommodate the lesion data on the assumption that less categorical representations of speech in the right hemisphere are sufficient for lexical access.
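The two-timescale analysis at the heart of the multi-time resolution proposal can be made concrete with a small signal-processing sketch. The example below is illustrative only: the function name, filter design and exact band edges are our assumptions for demonstration, not an implementation from the article. It extracts a sound's amplitude envelope and splits it into a slow, theta-range modulation band (roughly the 150–300 ms syllabic scale) and a faster, low-gamma band (roughly the 20–50 ms segmental scale):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert


def two_timescale_envelopes(signal, fs):
    """Split a sound's amplitude envelope into two modulation bands:
    a slow 'syllabic' band (~3-7 Hz, i.e. ~150-300 ms windows) and a
    faster 'segmental' band (~20-50 Hz, i.e. ~20-50 ms windows).

    Band edges and filter order are illustrative assumptions.
    """
    # Amplitude envelope via the analytic (Hilbert) signal.
    envelope = np.abs(hilbert(signal))

    def bandpass(x, lo, hi):
        # Second-order Butterworth band-pass, applied forwards and
        # backwards (zero phase) so the two envelopes stay aligned.
        sos = butter(2, [lo, hi], btype="bandpass", fs=fs, output="sos")
        return sosfiltfilt(sos, x)

    slow = bandpass(envelope, 3.0, 7.0)    # theta-range (syllabic scale)
    fast = bandpass(envelope, 20.0, 50.0)  # low-gamma (segmental scale)
    return slow, fast
```

Applied to a speech recording, comparing the energy in the two filtered envelopes is one simple way to quantify how much syllabic-rate versus segment-rate modulation a stimulus carries, which is the kind of manipulation the filtering study mentioned above exploits.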

Table 1 | A sample of recent functional imaging studies of sublexical speech processing

1. CVCs vs sinewave analogues; oddball detection; Talairach; ref. 31
   L: (–60, –16, –8), (–64, –36, 0), (–64, –44, 12)
   R: (56, –28, –4), (52, –20, –16), (52, –16, 0)

2. CVs vs tones; target detection; MNI; ref. 102
   L: (–64, –12, –8), (–56, –16, –12), (–60, –24, 8)
   R: (44, –24, 8), (52, –28, 8)

3. CVs vs tones + noise; passive listening; MNI
   L: (–64, –20, 0), (–64, –32, 4), (–64, –28, 8)
   R: (56, –12, –8), (52, –20, –8)

4. CVs vs noise; passive listening; MNI
   R: (56, –8, –4), (64, –20, 0), (60, –24, –8)
   L: (–64, –16, 0), (–68, –28, –4), (–64, –36, 4), (–44, –28, 12), (–64, –28, 8)

5. CVs vs sinewave analogues; AX discrimination; Talairach; ref. 103
   L: (–56, –22, 3), (–51, –14, –4)
   R: (54, –46, 9), (52, –25, 2), (62, 1, –12)

6. Sinewave CVs perceived as speech vs sinewave CVs perceived as non-speech; oddball detection; Talairach; ref. 101
   L: (–56, –40, 0), (–60, –24, 4)

7. Synthesized CV continuum vs spectrally rotated synthesized CVs; ABX discrimination; Talairach; ref. 26
   L: (–60, –8, –3), (–56, –31, 3)

8. CVs vs noise; detect repeating sounds; MNI; ref. 104
   L: (–59, –27, –2), (–63, –16, –6)
   R: (59, –4, –10)

9. Sinewave CVCs vs sinewave non-speech analogues + chord progressions; passive listening; MNI; ref. 100
   R: (64, –12, –16)
   L: (–64, –32, –8)

ABX discrimination, judge whether a third stimulus is the same as the first or second stimulus; AX discrimination, same–different judgments on pairs of stimuli; CVs, consonant–vowel syllables; CVCs, consonant–vowel–consonant syllables; L, left; MNI, Montreal Neurological Institute; R, right.




[Figure 2 schematic: acoustic input feeds a gamma-range sampling network (bilateral?) and a theta-range sampling network (right dominant); these map onto segment-level and syllable-level representations, respectively, which converge on a lexical phonological network (bilateral).]

Figure 2 | Parallel routes in the mapping from acoustic input to lexical phonological representations. The figure depicts one characterization of the computational properties of the pathways. One pathway samples acoustic input at a relatively fast rate (gamma range) that is appropriate for resolving segment-level information, and may be instantiated in both hemispheres. The other pathway samples acoustic input at a slower rate (theta range) that is appropriate for resolving syllable-level information, and may be more strongly represented in the right hemisphere. Under normal circumstances these pathways interact, both within and between hemispheres, yet each appears capable of separately activating lexical phonological networks. Other hypotheses for the computational properties of the two routes exist.

Phonological processing and the STS. Beyond the earliest stages of speech recognition, there is accumulating evidence and a convergence of opinion that portions of the STS are important for representing and/or processing phonological information6,16,26,42,51. The STS is activated by language tasks that require access to phonological information, including both the perception and production of speech51, and during active maintenance of phonemic information52,53. Portions of the STS seem to be relatively selective for acoustic signals that contain phonemic information when compared with complex non-speech signals (FIG. 3a). STS activation can be modulated by the manipulation of psycholinguistic variables that tap phonological networks54, such as phonological neighbourhood density (the number of words that sound similar to a target word) (FIG. 3b). Thus, a range of studies converge on the STS as a site that is crucial to phonological-level processes. Although many authors consider this system to be strongly left dominant, both lesion and imaging (FIG. 3) evidence suggest a bilateral organization with perhaps a mild leftward bias.

A number of studies have found activation during speech processing in anterior portions of the STS13,27,29,30, leading to suggestions that these regions have an important and, according to some papers, exclusive role in ventral stream phonological processes55. This is in contrast to the typical view that posterior areas form the primary projection targets of the ventral stream5,6. However, many of the studies that highlighted anterior regions used sentence- or narrative-level stimuli contrasted against a low-level auditory control. It is therefore impossible to determine which levels of language processing underlie these activations. Furthermore, several recent functional imaging studies have implicated anterior temporal regions in sentence-level processing56–60, suggesting that syntactic or combinatorial processes might drive much of the anterior temporal activation. In addition, the claim that posterior STS regions are not part of the ventral stream is dubious given the extensive evidence that left posterior temporal lobe disruption leads to auditory comprehension deficits6,61,62. It is possible that the ventral projection pathways extend both posteriorly and anteriorly30. We suggest that the crucial portion of the STS that is involved in phonological-level processes is bounded anteriorly by the most anterolateral aspect of Heschl's gyrus and posteriorly by the posterior-most extent of the Sylvian fissure. This corresponds to the distribution of activation for 'phonological' processing, depicted in FIG. 3.

Lexical, semantic and grammatical linkages. Much research on speech perception seeks to understand processes that lead to the access of phonological codes. However, during auditory comprehension, the goal of speech processing is to use these codes to access higher-level representations that are vital to comprehension. There is strong evidence that posterior middle temporal regions are involved in accessing lexical and semantic information. However, there is debate about whether other anterior temporal lobe (ATL) regions also participate in lexical and semantic processing, and whether they might contribute to grammatical or compositional aspects of speech processing.

Damage to posterior temporal lobe regions, particularly along the middle temporal gyrus, has long been associated with auditory comprehension deficits61,63,64, an effect that was recently confirmed in a large-scale study involving 101 patients61. Data from direct cortical stimulation studies corroborate the involvement of the middle temporal gyrus in auditory comprehension, but also indicate a much broader network involving most of the superior temporal lobe (including anterior portions), and the inferior frontal lobe65. Functional imaging studies have also implicated posterior middle temporal regions in lexical semantic processing66–68. These findings do not preclude the involvement of more anterior regions in lexical semantic access, but they do make a strong case for the dominance of posterior regions in these processes. We suggested previously that posterior middle temporal regions supported lexical and semantic access in the form of a sound-to-meaning interface network5,6. According to this hypothesis, semantic information is represented in a highly distributed fashion throughout the cortex69, and middle posterior temporal regions are involved in the mapping between phonological representations in the STS and widely distributed semantic representations. Most of the evidence reviewed above indicates a left-dominant organization for this middle temporal gyrus network. However, the finding that the right hemisphere can comprehend words reasonably well suggests that there is some degree of bilateral capability in lexical and semantic access, but that there are perhaps some differences in the computations that are carried out in each hemisphere.

ATL regions have been implicated in both lexical/semantic and sentence-level processing (syntactic and semantic integration processes). Patients with semantic dementia have atrophy involving the ATL bilaterally, along with deficits on lexical tasks such as naming, semantic association and single-word comprehension70, which has been used to argue for a lexical or semantic function for the ATL29,30. However, these deficits might be more general, given that the atrophy involves a number of regions in addition to the lateral ATL, including the bilateral inferior and medial temporal lobe, bilateral caudate nucleus and right posterior thalamus, among
others70. This renders the suggested link between lexical deficits and the ATL tenuous.

Higher-level syntactic and compositional semantic processing might involve the ATL. Functional imaging studies have found portions of the ATL to be more active while individuals listen to or read sentences rather than unstructured lists of words or sounds56–58,60. This structured-versus-unstructured effect is independent of the semantic content of the stimuli, although semantic manipulations can modulate the ATL response somewhat60. Damage to the ATL has also been linked to deficits in comprehending complex syntactic structures71. However, data from semantic dementia is contradictory, as these patients are reported to have good sentence-level comprehension70.

In summary, there is strong evidence that lexical semantic access from auditory input involves the posterior lateral temporal lobe. In terms of syntactic and compositional semantic operations, neuroimaging evidence is converging on the ATL as an important component of the computational network58–60; however, the neuropsychological evidence remains equivocal.

[Figure 3 image: two panels (a, b) of lateral brain renderings.]

Figure 3 | Lexical phonological networks in the superior temporal sulcus. a | Distribution of activation foci in seven recent studies of speech processing using sublexical stimuli contrasted with non-speech controls26,31,107–111. All coordinates are plotted in Montreal Neurological Institute (MNI) space (coordinates originally reported in Talairach space were transformed to MNI space). Study details and coordinates are presented in TABLE 1. b | Highlighted areas represent sites of activation in a functional MRI study contrasting high neighbourhood density words (those with many similar sounding neighbours) with low neighbourhood density words (those with few similar sounding neighbours). The middle to posterior portions of the superior temporal sulcus in both hemispheres (arrows) showed greater activation for high density than low density words (coloured blobs), presumably reflecting the partial activation of larger neural networks when high density words are processed. Neighbourhood density is a property of lexical phonological networks112; thus, modulation of neural activity by density manipulation probably highlights such networks in the brain. Reproduced, with permission, from REF. 54 © (2006) Lippincott Williams & Wilkins.

Dorsal stream: sound to action
It is generally agreed that the auditory ventral stream supports the perception and recognition of auditory objects such as speech; although, as outlined above, there are a number of outstanding issues concerning the precise mechanisms and regions involved5,6,18,19,55. There is less agreement regarding the functional role of the auditory dorsal stream. The earliest proposals argued for a role in spatial hearing, a 'where' function18, similar to the dorsal 'where' stream proposal in the cortical visual system17. More recently we, along with others, have suggested5,6,22,55 that the auditory dorsal stream supports an interface with the motor system, a proposal that is similar to recent claims that the dorsal visual stream has a sensorimotor integration function72,73 (BOX 2).

The need for auditory–motor integration. The idea of auditory–motor interaction in speech is not new. Wernicke's classic model of the neural circuitry of language incorporated a direct link between sensory and motor representations of speech, and argued explicitly that sensory systems participated in speech production1. Motor theories of speech perception also assume a link between sensory input and motor speech systems43. However, the simplest argument for the necessity of auditory–motor interaction in speech comes from development. Learning to speak is essentially a motor learning task. The primary input to this is sensory, speech in particular. So, there must be a neural mechanism that both codes and maintains instances of speech sounds, and can use these sensory traces to guide the tuning of speech gestures so that the sounds are accurately reproduced1,74.

We have suggested that speech development is a primary and crucial function of the proposed dorsal auditory–motor integration circuit, and that it also continues to function in adults5,6. Evidence for the latter includes the disruptive effects of altered auditory feedback on speech production75,76, articulatory decline in late-onset deafness77 and the ability to acquire new vocabulary. We also propose that this auditory–motor circuit provides the basic neural mechanisms for phonological short-term memory53,78,79.

We suggest that there are at least two levels of auditory–motor interaction — one involving speech segments and the other involving sequences of segments. Segmental-level processes would be involved in the acquisition and maintenance of basic articulatory phonetic skills. Auditory–motor processes at the level of sequences of segments would be involved in the acquisition of new vocabulary, and in the online guidance of speech sequences80. We propose that auditory–motor interactions in the acquisition of new vocabulary involve generating a sensory representation of the new word that codes the sequence of segments or syllables. This sensory representation can then be used to guide motor articulatory sequences. This might involve true feedforward mechanisms (whereby sensory codes for a speech sequence are translated into a motor speech sequence), feedback
monitoring mechanisms, or both. As the word becomes familiar, the nature of the sensory–motor interaction might change. New, low-frequency or more complex words might require incremental motor coding and thus more sensory guidance than known, high-frequency or more simple words, which might become 'automated' as motor chunks that require little sensory guidance. This hypothesis is consistent with a large motor learning literature showing shifts in the mechanisms of motor control as a function of learning81–83.

Lesion evidence for a sensorimotor dorsal stream. Damage to auditory-related regions in the left hemisphere often results in speech production deficits63,84, demonstrating that sensory systems participate in motor speech. More specifically, damage to the left dorsal STG or the temporoparietal junction is associated with conduction aphasia, a syndrome that is characterized by good comprehension but frequent phonemic errors in speech production8,85. Conduction aphasia has classically been considered to be a disconnection syndrome involving damage to the arcuate fasciculus. However, there is now good evidence that this syndrome results from cortical dysfunction86,87. The production deficit is load-sensitive: errors are more likely on longer, lower-frequency words and verbatim repetition of strings of speech with little semantic constraint85,88. Functionally, conduction aphasia has been characterized as a deficit in the ability to encode phonological information for production89.

We have suggested that conduction aphasia represents a disruption of the auditory–motor interface system6,90, particularly at the segment sequence level. Comprehension of speech is preserved because the lesion does not disrupt ventral stream pathways and/or because right-hemisphere speech systems can compensate for disruption of left-hemisphere speech perception systems. Phonological errors occur because sensory representations of speech are prevented from providing online guidance of speech sound sequencing; this effect is most pronounced for longer, lower-frequency or novel words, because these words rely on sensory involvement to a greater extent than shorter, higher-frequency words, as discussed above. Directly relevant to this claim, a recent functional imaging study showed that activity in the region that is often affected in conduction aphasia is modulated by word length in a covert naming task91.

Functional imaging evidence for a sensorimotor dorsal stream. Recent functional imaging studies have identified a neural circuit that seems to support auditory–motor interaction52,53. Individuals were asked to listen to pseudowords and then subvocally reproduce them. On the assumption that areas involved in integrating sensory and motor processes would have both sensory- and motor-response properties92,93 (BOX 2), the analyses focused on regions that were active during both the perceptual and motor-related phases of the trial. A network of regions was identified, including the posterior STS bilaterally, a left-dominant site in the Sylvian fissure at the boundary between the parietal and temporal lobes (area Spt), and left posterior frontal regions (FIG. 4)52,53. We propose that the posterior STS (bilaterally) supports sensory coding of speech, and that area Spt is involved in translation between those sensory codes and the motor system53. This hypothesis is motivated by several considerations: the participation of bilateral areas, including the STS, in sensory and comprehension aspects of speech perception; the left-dominant organization of speech production; the fact that Spt and inferior frontal areas are more tightly correlated in their activation timecourse than STS and inferior frontal areas52; and the lesions associated with conduction aphasia that coincide with the location of Spt63.

Follow-up studies have shed more light on the functional properties of area Spt. Spt activity is not specific to speech: it is activated equally well by the perception and reproduction (via humming) of tonal sequences53. Spt was also equally active during the reproduction of sensory stimuli that were perceived aurally (spoken words) or visually (written words)78, indicating that access to this network is not necessarily restricted to the auditory modality. However, Spt activity is modulated by a manipulation of output modality. Skilled pianists were asked to listen to novel melodies and then to covertly reproduce them by either humming or imagining playing them on a keyboard. Spt was significantly more active during the humming condition than the playing condition, despite the fact that the playing condition is more difficult (G.H., unpublished observations). These results indicate that Spt might be more tightly coupled to a specific motor effector system, the vocal tract, than to a specific sensory system. It might therefore be more accurate to view area Spt as part of a sensorimotor integration circuit for the vocal tract, rather than specifically as part of an auditory–motor integration circuit. This places area Spt within a network of parietal lobe sensorimotor interface regions, each of which seems to be organized primarily around a different motor effector system72,94, rather than particular sensory systems.

Area Spt is located within the planum temporale (PT), a region that has been at the centre of much recent functional–anatomical debate. Although the PT has traditionally been associated with speech processing, this view has been challenged on the basis of functional imaging studies that find activation in the PT for various acoustic signals (tone sequences, music, spatial signals), as well as some non-acoustic signals (visual speech, sign language, visual motion)95. One recent proposal95 is that the PT functions as a computational hub that takes input from the primary auditory cortex and performs a segregation and spectrotemporal pattern-matching operation; this leads to the output of sound object information, which is processed further in lateral temporal lobe areas, and spatial position information, which is processed further in parietal structures. This proposal is squarely within the framework of a dual-stream model of auditory processing, but differs prima facie from our view in terms of the proposed function of both the PT region (computational hub versus sensorimotor integration) and the dorsal stream (spatial versus sensorimotor integration). It is worth noting that the PT is probably not homogeneous. In fact, four different cytoarchitectonic fields have been observed in this region96. It is therefore conceivable that one region of the PT serves as a computational hub (or some other function) and another portion computes sensorimotor transformations. Within-subject experiments with multiple stimulus and task conditions are needed to sort out a possible parcellation of the PT. Furthermore, the concept that the auditory dorsal stream supports sensorimotor integration is not incompatible with it also supporting a spatial hearing function (that is, both 'how' and 'where' streams could coexist). Finally, the fact that the PT region receives various inputs, both acoustic and non-acoustic, is perfectly consistent with our proposed model. Speech, tones, music, environmental sounds and visual speech can all be linked to vocal tract gestures (such as speech, humming and naming). Spatial signals, however, should not activate the same sensory–vocal tract network, unless the response task involves speech. It will be instructive to determine whether sensory–vocal tract tasks and
spatial hearing tasks activate the same PT region. Finally, the dorsal stream circuit that we are proposing is strongly left dominant, and, within the left hemisphere, involves only a portion of the PT. Consequently, our proposal is not intended as a general model of cortical auditory processing or PT function, which leaves plenty of cortical regions for other functional circuits to occupy.

[Figure 4 image: left (L) and right (R) lateral brain views marking the STS bilaterally and area Spt in the left hemisphere.]

Figure 4 | Regions activated in an auditory–motor task for speech. Coloured regions demonstrated sensorimotor responses: they were active during both the perception of speech and the subvocal reproduction of speech. Note the bilateral activation in the superior temporal sulcus (STS), and the unilateral activation of area Spt and frontal regions. Reproduced, with permission, from REF. 113 © (2003) Cambridge Univ. Press.

Summary and future perspectives
The dual-stream model that is outlined in this article aims to integrate a wide range of empirical observations, including basic perceptual processes48, aspects of speech development6 and speech production91,97, linguistic and psycholinguistic facts48,54,98, verbal working memory5,6,53, task-related effects5,6, sensorimotor integration circuits6,53,90, and neuropsychological facts including patterns of sparing and loss in aphasia6,90. The basic concept, that the acoustic speech network must interface with conceptual systems on the one hand, and motor–articulatory systems on the other, has proven its utility in accounting for an array of fundamental observations, and fits with similar proposals in the visual73 — and now somatosensory99 — domains. This similarity in organization across sensory systems suggests that the present dual-stream proposal for cortical speech processing might be a reflection of a more general principle of sensory system organization.

Assuming that the basic dual-stream framework is correct, the main task for future research will be to specify the details of the within-stream organization and computational operations. We have made some specific hypotheses in this regard; for example, that the ventral stream is itself composed of parallel pathways that could differ in terms of their sampling rate, and that the dorsal stream might also involve parallel systems that differ in scale (segments versus sequences of segments). Further empirical work is required to test these hypotheses, and to develop more explicit models of the architecture and computations involved. Finally, a major challenge will be to understand how these neural models relate to linguistic and psycholinguistic models of language structure and processing.

Gregory Hickok is at the Department of Cognitive Sciences and the Center for Cognitive Neuroscience, University of California, Irvine, California 92697-5100, USA.

David Poeppel is at the Departments of Linguistics and Biology, University of Maryland, College Park, Maryland 20742, USA.

Correspondence to G.H.
e-mail: greg.hickok@uci.edu

doi:10.1038/nrn2113
Published online 13 April 2007

1. Wernicke, C. in Boston Studies in the Philosophy of Science (eds Cohen, R. S. & Wartofsky, M. W.) 34–97 (D. Reidel, Dordrecht, 1874/1969).
2. Basso, A., Casati, G. & Vignolo, L. A. Phonemic identification defects in aphasia. Cortex 13, 84–95 (1977).
3. Blumstein, S. E., Baker, E. & Goodglass, H. Phonological factors in auditory comprehension in aphasia. Neuropsychologia 15, 19–30 (1977).
4. Blumstein, S. E., Cooper, W. E., Zurif, E. B. & Caramazza, A. The perception and production of voice-onset time in aphasia. Neuropsychologia 15, 371–383 (1977).
5. Hickok, G. & Poeppel, D. Towards a functional neuroanatomy of speech perception. Trends Cogn. Sci. 4, 131–138 (2000).
6. Hickok, G. & Poeppel, D. Dorsal and ventral streams: A framework for understanding aspects of the functional anatomy of language. Cognition 92, 67–99 (2004).
7. Miceli, G., Gainotti, G., Caltagirone, C. & Masullo, C. Some aspects of phonological impairment in aphasia. Brain Lang. 11, 159–169 (1980).
8. Damasio, H. & Damasio, A. R. The anatomical basis of conduction aphasia. Brain 103, 337–350 (1980).
9. Caplan, D., Gow, D. & Makris, N. Analysis of lesions by MRI in stroke patients with acoustic-phonetic processing deficits. Neurology 45, 293–298 (1995).
10. Baker, E., Blumstein, S. E. & Goodglass, H. Interaction between phonological and semantic factors in auditory comprehension. Neuropsychologia 19, 1–15 (1981).
11. Gainotti, G., Miceli, G., Silveri, M. C. & Villa, G. Some anatomo-clinical aspects of phonemic and semantic comprehension disorders in aphasia. Acta Neurol. Scand. 66, 652–665 (1982).
12. Binder, J. R. et al. Functional magnetic resonance imaging of human auditory cortex. Ann. Neurol. 35, 662–672 (1994).
13. Mazoyer, B. M. et al. The cortical representation of speech. J. Cogn. Neurosci. 5, 467–479 (1993).
14. Demonet, J.-F. et al. The anatomy of phonological and semantic processing in normal subjects. Brain 115, 1753–1768 (1992).
15. Zatorre, R. J., Evans, A. C., Meyer, E. & Gjedde, A. Lateralization of phonetic and pitch discrimination in speech processing. Science 256, 846–849 (1992).
16. Price, C. J. et al. Hearing and saying: The functional neuro-anatomy of auditory word processing. Brain 119, 919–931 (1996).
17. Ungerleider, L. G. & Mishkin, M. in Analysis of Visual Behavior (eds Ingle, D. J., Goodale, M. A. & Mansfield, R. J. W.) 549–586 (MIT Press, Cambridge, Massachusetts, 1982).
18. Rauschecker, J. P. Cortical processing of complex sounds. Curr. Opin. Neurobiol. 8, 516–521 (1998).
19. Scott, S. K. Auditory processing — speech, space and auditory objects. Curr. Opin. Neurobiol. 15, 197–201 (2005).
20. Scott, S. K. & Johnsrude, I. S. The neuroanatomical and functional organization of speech perception. Trends Neurosci. 26, 100–107 (2003).
21. Warren, J. E., Wise, R. J. & Warren, J. D. Sounds do-able: auditory–motor transformations and the posterior temporal plane. Trends Neurosci. 28, 636–643 (2005).
22. Wise, R. J. S. et al. Separate neural sub-systems within 'Wernicke's area'. Brain 124, 83–95 (2001).
23. McClelland, J. L. & Elman, J. L. The TRACE model of speech perception. Cognit. Psychol. 18, 1–86 (1986).
24. Marslen-Wilson, W. D. Functional parallelism in spoken word-recognition. Cognition 25, 71–102 (1987).
25. Luce, P. A. & Pisoni, D. B. Recognizing spoken words: the neighborhood activation model. Ear Hear. 19, 1–36 (1998).
26. Liebenthal, E., Binder, J. R., Spitzer, S. M., Possing, E. T. & Medler, D. A. Neural substrates of phonemic perception. Cereb. Cortex 15, 1621–1631 (2005).
27. Narain, C. et al. Defining a left-lateralized response specific to intelligible speech using fMRI. Cereb. Cortex 13, 1362–1368 (2003).
28. Obleser, J., Zimmermann, J., Van Meter, J. & Rauschecker, J. P. Multiple stages of auditory speech perception reflected in event-related fMRI. Cereb. Cortex 5 Dec 2006 (doi:10.1093/cercor/bhl133).
29. Scott, S. K., Blank, C. C., Rosen, S. & Wise, R. J. S. Identification of a pathway for intelligible speech in the left temporal lobe. Brain 123, 2400–2406 (2000).
30. Spitsyna, G., Warren, J. E., Scott, S. K., Turkheimer, F. E. & Wise, R. J. Converging language streams in the human temporal lobe. J. Neurosci. 26, 7328–7336 (2006).
31. Vouloumanos, A., Kiehl, K. A., Werker, J. F. & Liddle, P. F. Detection of sounds in the auditory stream: event-related fMRI evidence for differential activation to speech and nonspeech. J. Cogn. Neurosci. 13, 994–1005 (2001).
32. Shannon, R. V., Zeng, F.-G., Kamath, V., Wygonski, J. & Ekelid, M. Speech recognition with primarily temporal cues. Science 270, 303–304 (1995).
33. Remez, R. E., Rubin, P. E., Pisoni, D. B. & Carrell, T. D. Speech perception without traditional speech cues. Science 212, 947–950 (1981).
34. Saberi, K. & Perrott, D. R. Cognitive restoration of reversed speech. Nature 398, 760 (1999).
35. Drullman, R., Festen, J. M. & Plomp, R. Effect of reducing slow temporal modulations on speech reception. J. Acoust. Soc. Am. 95, 2670–2680 (1994).
36. Drullman, R., Festen, J. M. & Plomp, R. Effect of temporal envelope smearing on speech reception. J. Acoust. Soc. Am. 95, 1053–1064 (1994).
37. Zaidel, E. in The Dual Brain: Hemispheric Specialization in Humans (eds Benson, D. F. & Zaidel, E.) 205–231 (Guilford, New York, 1985).
38. McGlone, J. Speech comprehension after unilateral injection of sodium amytal. Brain Lang. 22, 150–157 (1984).
39. Poeppel, D. Pure word deafness and the bilateral processing of the speech code. Cogn. Sci. 25, 679–693 (2001).
40. Stefanatos, G. A., Gershkoff, A. & Madigan, S. On pure word deafness, temporal processing, and the left hemisphere. J. Int. Neuropsychol. Soc. 11, 456–470 (2005).
41. Buchman, A. S., Garron, D. C., Trost-Cardamone, J. E., Wichter, M. D. & Schwartz, M. Word deafness: One hundred years later. J. Neurol. Neurosurg. Psychiatr. 49, 489–499 (1986).
42. Binder, J. R. et al. Human temporal lobe activation by speech and nonspeech sounds. Cereb. Cortex 10, 512–528 (2000).
43. Liberman, A. M. & Mattingly, I. G. The motor theory of speech perception revised. Cognition 21, 1–36 (1985).
44. Stevens, K. N. Toward a model for lexical access based on acoustic landmarks and distinctive features. J. Acoust. Soc. Am. 111, 1872–1891 (2002).
45. Dupoux, E. in Cognitive Models of Speech Processing (eds Altmann, G. & Shillcock, R.) 81–114 (Erlbaum, Hillsdale, New Jersey, 1993).
46. Greenberg, S. & Arai, T. What are the essential cues for understanding spoken language? IEICE Trans. Inf. Syst. E87-D, 1059–1070 (2004).
47. Boemio, A., Fromm, S., Braun, A. & Poeppel, D. Hierarchical and asymmetric temporal sensitivity in human auditory cortices. Nature Neurosci. 8, 389–395 (2005).
48. Poeppel, D., Idsardi, W. & van Wassenhove, V. Speech perception at the interface of neurobiology and linguistics. Phil. Trans. Roy. Soc. (in the press).
49. Zatorre, R. J., Belin, P. & Penhune, V. B. Structure and function of auditory cortex: music and speech. Trends Cogn. Sci. 6, 37–46 (2002).
50. Schonwiesner, M., Rubsamen, R. & von Cramon, D. Y. Hemispheric asymmetry for spectral and temporal processing in the human antero-lateral auditory belt cortex. Eur. J. Neurosci. 22, 1521–1528 (2005).
51. Indefrey, P. & Levelt, W. J. The spatial and temporal signatures of word production components. Cognition 92, 101–144 (2004).
52. Buchsbaum, B., Hickok, G. & Humphries, C. Role of left posterior superior temporal gyrus in phonological
65. Miglioretti, D. L. & Boatman, D. Modeling variability in cortical representations of human complex sound perception. Exp. Brain Res. 153, 382–387 (2003).
66. Binder, J. R. et al. Human brain language areas identified by functional magnetic resonance imaging. J. Neurosci. 17, 353–362 (1997).
67. Rissman, J., Eliassen, J. C. & Blumstein, S. E. An event-related FMRI investigation of implicit semantic priming. J. Cogn. Neurosci. 15, 1160–1175 (2003).
68. Rodd, J. M., Davis, M. H. & Johnsrude, I. S. The neural mechanisms of speech comprehension: fMRI studies of semantic ambiguity. Cereb. Cortex 15, 1261–1269 (2005).
69. Damasio, A. R. & Damasio, H. in Large-Scale Neuronal Theories of the Brain (eds Koch, C. & Davis, J. L.) 61–74 (MIT Press, Cambridge, Massachusetts, 1994).
70. Gorno-Tempini, M. L. et al. Cognition and anatomy in three variants of primary progressive aphasia. Ann. Neurol. 55, 335–346 (2004).
71. Dronkers, N. F., Wilkins, D. P., Van Valin, R. D. Jr, Redfern, B. B. & Jaeger, J. J. in The New Functional Anatomy of Language: A Special Issue of Cognition (eds Hickok, G. & Poeppel, D.) 145–177 (Elsevier Science, 2004).
72. Andersen, R. Multimodal integration for the representation of space in the posterior parietal cortex. Philos. Trans. R. Soc. Lond. B Biol. Sci. 352, 1421–1428 (1997).
73. Milner, A. D. & Goodale, M. A. The Visual Brain in Action (Oxford Univ. Press, Oxford, 1995).
74. Doupe, A. J. & Kuhl, P. K. Birdsong and human speech: common themes and mechanisms. Ann. Rev. Neurosci. 22, 567–631 (1999).
75. Houde, J. F. & Jordan, M. I. Sensorimotor adaptation in speech production. Science 279, 1213–1216 (1998).
76. Yates, A. J. Delayed auditory feedback. Psychol. Bull. 60, 213–251 (1963).
77. Waldstein, R. S. Effects of postlingual deafness on speech production: implications for the role of auditory feedback. J. Acoust. Soc. Am. 88, 2099–2144 (1989).
78. Buchsbaum, B. R., Olsen, R. K., Koch, P. & Berman, K. F. Human dorsal and ventral auditory streams subserve rehearsal-based and echoic processes during verbal working memory. Neuron 48, 687–697 (2005).
79. Jacquemot, C. & Scott, S. K. What is the relationship
92. Murata, A., Gallese, V., Kaseda, M. & Sakata, H. Parietal neurons related to memory-guided hand manipulation. J. Neurophysiol. 75, 2180–2186 (1996).
93. Rizzolatti, G., Fogassi, L. & Gallese, V. Parietal cortex: from sight to action. Curr. Opin. Neurobiol. 7, 562–567 (1997).
94. Colby, C. L. & Goldberg, M. E. Space and attention in parietal cortex. Ann. Rev. Neurosci. 22, 319–349 (1999).
95. Griffiths, T. D. & Warren, J. D. The planum temporale as a computational hub. Trends Neurosci. 25, 348–353 (2002).
96. Galaburda, A. & Sanides, F. Cytoarchitectonic organization of the human auditory cortex. J. Comp. Neurol. 190, 597–610 (1980).
97. Okada, K. & Hickok, G. An fMRI study investigating posterior auditory cortex activation in speech perception and production: evidence of shared neural substrates in superior temporal lobe. Soc. Neurosci. Abstr. 288.11 (2003).
98. Hickok, G. Functional anatomy of speech perception and speech production: Psycholinguistic implications. J. Psycholinguist. Res. 30, 225–234 (2001).
99. Dijkerman, H. C. & de Haan, E. H. F. Somatosensory processes subserving perception and action. Behav. Brain Sci. (in the press).
100. Perenin, M.-T. & Vighetto, A. Optic ataxia: A specific disruption in visuomotor mechanisms. I. Different aspects of the deficit in reaching for objects. Brain 111, 643–674 (1988).
101. Goodale, M. A., Milner, A. D., Jakobson, L. S. & Carey, D. P. A neurological dissociation between perceiving objects and grasping them. Nature 349, 154–156 (1991).
102. Darwin, C. J. in Attention and Performance X: Control of Language Processes (eds Bouma, J. & Bouwhuis, D. G.) 197–209 (Lawrence Erlbaum Associates, London, 1984).
103. Diehl, R. L. & Kluender, K. R. On the objects of speech perception. Ecol. Psychol. 1, 121–144 (1989).
104. Liberman, A. M. & Mattingly, I. G. A specialization for speech perception. Science 243, 489–494 (1989).
105. Price, C., Thierry, G. & Griffiths, T. Speech-specific auditory processing: where is it? Trends Cogn. Sci. 9, 271–276 (2005).
106. Whalen, D. H. et al. Differentiation of speech and
left posterior superior temporal gyrus in phonological 79. Jacquemot, C. & Scott, S. K. What is the relationship 106. Whalen, D. H. et al. Differentiation of speech and
processing for speech perception and production. between phonological short-term memory and nonspeech processing within primary auditory cortex.
Cogn. Sci. 25, 663–678 (2001). speech processing? Trends Cogn. Sci. 10, 480–486 J. Acoust. Soc. Am. 119, 575–581 (2006).
53. Hickok, G., Buchsbaum, B., Humphries, C. & Muftuler, T. (2006). 107. Benson, R. R., Richardson, M., Whalen, D. H. & Lai, S.
Auditory–motor interaction revealed by fMRI: Speech, 80. Bohland, J. W. & Guenther, F. H. An fMRI investigation Phonetic processing areas revealed by sinewave
music, and working memory in area Spt. J. Cogn. of syllable sequence production. Neuroimage 32, speech and acoustically similar non-speech.
Neurosci. 15, 673–682 (2003). 821–841 (2006). Neuroimage 31, 342–353 (2006).
54. Okada, K. & Hickok, G. Identification of lexical- 81. Doyon, J., Penhune, V. & Ungerleider, L. G. Distinct 108. Dehaene-Lambertz, G. et al. Neural correlates of
phonological networks in the superior temporal sulcus contribution of the cortico-striatal and cortico- switching from auditory to speech perception.
using fMRI. Neuroreport 17, 1293–1296 (2006). cerebellar systems to motor skill learning. Neuroimage 24, 21–33 (2005).
55. Scott, S. K. & Wise, R. J. The functional neuroanatomy Neuropsychologia 41, 252–262 (2003). 109. Jancke, L., Wustenberg, T., Scheich, H. & Heinze, H. J.
of prelexical processing in speech perception. 82. Lindemann, P. G. & Wright, C. E. in Methods, Models, Phonetic perception and the temporal cortex.
Cognition 92, 13–45 (2004). and Conceptual Issues: An Invitation to Cognitive Neuroimage 15, 733–746 (2002).
56. Friederici, A. D., Meyer, M. & von Cramon, D. Y. Science. Vol. 4 (eds Scarborough, D. & Sternberg, E.) 110. Joanisse, M. F. & Gati, J. S. Overlapping neural
Auditory language comprehension: an event-related 523–584 (MIT Press, Cambridge, Massachusetts, regions for processing rapid temporal cues in speech
fMRI study on the processing of syntactic and lexical 1998). and nonspeech signals. Neuroimage 19, 64–79
information. Brain Lang. 74, 289–300 (2000). 83. Schmidt, R. A. A schema theory of discrete motor skill (2003).
57. Humphries, C., Willard, K., Buchsbaum, B. & Hickok, G. learning. Psychol. Rev. 82, 225–260 (1975). 111. Rimol, L. M., Specht, K., Weis, S., Savoy, R. &
Role of anterior temporal cortex in auditory sentence 84. Damasio, A. R. Aphasia. N. Engl. J. Med. 326, Hugdahl, K. Processing of sub-syllabic speech units in
comprehension: an fMRI study. Neuroreport 12, 531–539 (1992). the posterior temporal lobe: an fMRI study.
1749–1752 (2001). 85. Goodglass, H. in Conduction Aphasia (ed. Kohn, S. E.) Neuroimage 26, 1059–1067 (2005).
58. Humphries, C., Love, T., Swinney, D. & Hickok, G. 39–49 (Lawrence Erlbaum Associates, Hillsdale, New 112. Vitevitch, M. S. & Luce, P. A. Probabilistic
Response of anterior temporal cortex to syntactic and Jersey, 1992). phonotactics and neighborhood activation in spoken
prosodic manipulations during sentence processing. 86. Anderson, J. M. et al. Conduction aphasia and the word recognition. J. Mem. Lang. 40, 374–408
Hum. Brain Mapp. 26, 128–138 (2005). arcuate fasciculus: A reexamination of the Wernicke– (1999).
59. Humphries, C., Binder, J. R., Medler, D. A. & Liebenthal, Geschwind model. Brain Lang. 70, 1–12 (1999). 113. Hickok, G. & Buchsbaum, B. Temporal lobe speech
E. Syntactic and semantic modulation of neural activity 87. Hickok, G. et al. A functional magnetic resonance perception systems are part of the verbal working
during auditory sentence comprehension. J. Cogn. imaging study of the role of left posterior superior memory circuit: evidence from two recent fMRI
Neurosci. 18, 665–679 (2006). temporal gyrus in speech production: implications for studies. Behav. Brain Sci. 26, 740–741 (2003).
60. Vandenberghe, R., Nobre, A. C. & Price, C. J. The the explanation of conduction aphasia. Neurosci. Lett.
response of left temporal cortex to sentences. J. Cogn. 287, 156–160 (2000). Acknowledgements
Neurosci. 14, 550–560 (2002). 88. Goodglass, H. Understanding Aphasia (Academic, San Supported by grants from the National Institutes of Health.
61. Bates, E. et al. Voxel-based lesion-symptom mapping. Diego, 1993).
Nature Neurosci. 6, 448–450 (2003). 89. Wilshire, C. E. & McCarthy, R. A. Experimental Competing interests statement
62. Boatman, D. Cortical bases of speech perception: investigations of an impairement in phonological The authors declare no competing financial interests.
evidence from functional lesion studies. Cognition 92, encoding. Cogn. Neuropsychol. 13, 1059–1098 (1996).
47–65 (2004). 90. Hickok, G. in Language and the Brain (eds Grodzinsky, Y.,
63. Damasio, H. in Acquired aphasia (ed. Sarno, M.) Shapiro, L. & Swinney, D.) 87–104 (Academic, San
45–71 (Academic, San Diego, 1991). Diego, 2000). FURTHER INFORMATION
64. Dronkers, N. F., Redfern, B. B. & Knight, R. T. in The 91. Okada, K., Smith, K. R., Humphries, C. & Hickok, G. Hickok’s laboratory: http://lcbr.ss.uci.edu
New Cognitive Neurosciences (ed. Gazzaniga, M. S.) Word length modulates neural activity in auditory Poeppel’s laboratory: http://www.ling.umd.edu/cnl
949–958 (MIT Press, Cambridge, Massachusetts, cortex during covert object naming. Neuroreport 14, Access to this interactive links box is free online.
2000). 2323–2326 (2003).

402 | MAY 2007 | VOLUME 8 www.nature.com/reviews/neuro


© 2007 Nature Publishing Group