DOI 10.1007/s00221-012-3140-6
RESEARCH ARTICLE
Received: 26 January 2012 / Accepted: 2 June 2012 / Published online: 17 June 2012
© Springer-Verlag 2012
speaking to the automatic detection of crossmodal correspondences. These results are further discussed in terms of
the advantages of the IAT over traditional techniques for
assessing the strength and symmetry of various crossmodal
correspondences.
Keywords Multisensory perception · Audition · Vision · Crossmodal correspondences · Sound symbolism · Implicit association test
Introduction
Human observers can readily match apparently unrelated
stimuli from different sensory modalities with a striking
degree of consistency. For example, most people unquestioningly agree that an object called mil is likely going to be
smaller than an object called mal (e.g. Newman 1933; Sapir
1929), and that a lemon is fast rather than slow. For
more than a century now cognitive scientists have been
studying such examples of crossmodal correspondences
(Stumpf 1883), which, over the years, have been labelled
using a wide variety of terms such as crossmodal similarities, synaesthetic associations, weak synaesthesia,
and in the case of linguistic stimuli, sound symbolic associations (see Spence 2011 for a review). Broadly speaking,
crossmodal correspondences can be defined as congruency
effects between stimuli presented (either physically present,
or else merely imagined) in different sensory modalities that
result from the expected (i.e. a priori) mapping between
those sensory cues (Parise and Spence in press).
One of the most famous crossmodal correspondences
was proposed back in 1929 by Wolfgang Köhler. He gave
participants two nonsense words, takete and baluma (later
baluma was renamed maluma; Köhler 1947), and two
observers were explicitly required to match pairs of auditory and visual stimuli (see Davis 1961; Köhler 1929; Sapir
1929; Zigler 1930), cognitive scientists switched to more
sophisticated paradigms that allow for repeated measures
(rather than just a single measure, as in many early studies)
within a single observer, often without relying on introspection. The most common technique over the last two or
three decades has relied on the modulation of reaction times
(RTs) in speeded classification tasks in which participants
have to respond to stimuli on a target sensory modality,
while trying to ignore task-irrelevant distractor stimuli
presented in a different sensory modality (see Marks 2004;
Spence 2011, for reviews). Despite the fact that the stimuli
presented in one sensory modality are completely task-irrelevant, participants' RTs are often faster for certain
combinations of (relevant and irrelevant) stimuli and slower
for others. Based on this crossmodal interference on
response latencies, stimulus combinations leading to faster
RTs (and more correct responses) are considered to be
compatible, while those leading to longer RTs (and more
incorrect responses) are considered as being incompatible.
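The compatibility logic just described can be sketched in a few lines. The snippet below is a toy illustration with made-up trial data (the stimulus names, RTs and accuracy values are hypothetical, not from the present study): it averages the correct-response RTs for each combination of target and task-irrelevant distractor and ranks the pairings, with the faster pairings being the ones that would be classed as compatible.

```python
from statistics import mean

# Hypothetical trials: (target stimulus, irrelevant distractor, RT in s, correct?)
trials = [
    ("high_pitch", "small_circle", 0.52, True),
    ("high_pitch", "large_circle", 0.61, True),
    ("low_pitch",  "small_circle", 0.63, True),
    ("low_pitch",  "large_circle", 0.54, True),
    ("high_pitch", "small_circle", 0.50, True),
    ("high_pitch", "large_circle", 0.64, False),
    ("low_pitch",  "small_circle", 0.66, True),
    ("low_pitch",  "large_circle", 0.55, True),
]

# Mean RT of correct responses for every (target, distractor) combination
rt_by_pair = {}
for target, distractor, rt, correct in trials:
    if correct:
        rt_by_pair.setdefault((target, distractor), []).append(rt)
means = {pair: mean(rts) for pair, rts in rt_by_pair.items()}

# Pairings with faster mean RTs are taken to be crossmodally compatible
for pair, m in sorted(means.items(), key=lambda kv: kv[1]):
    print(pair, round(m, 3))
```

In this toy data set, high pitch paired with a small circle yields the fastest responses, mirroring the pitch–size correspondence discussed in the text.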
Other techniques instead rely on explicit measures of
similarity. This is the case of the crossmodal matching task
(Stevens and Marks 1965), where participants have to
adjust the magnitude of a stimulus along a given sensory
dimension (e.g. loudness) until it matches the magnitude
of a stimulus on another sensory dimension in a different
modality (e.g. brightness). A more constrained variant of
this technique was later proposed by Marks (1989). He had
participants select which of two stimuli (whose properties
were parametrically manipulated on a trial-by-trial basis) in
a given modality better matched a target stimulus in a
different sensory modality. In addition to these techniques,
many other approaches have also been used in the study of
crossmodal correspondences over the years. They include
the use of the semantic differential technique (Bozzi and
Flores D'Arcais 1967; Osgood 1960; Osgood et al. 1957;
Oyama et al. 1998; see also Poffenberger and Barrows
1924), preferential looking (Walker et al. 2010), cuing
(whereby the crossmodal congruence or incongruence of a
cue preceding the target stimulus has been shown to
modulate RTs to the target stimulus; Melara and O'Brien
1990; see also Klein et al. 1987; Chiou and Rich 2012),
analysis of speech sounds (Parise and Pavani 2011) and
EEG (Bien et al. 2012; Kovic et al. 2010; Seo et al. 2010)
to name but a few.
In spite of providing important insights into the underlying nature of crossmodal correspondences, most of these
techniques suffer from various methodological limitations
that potentially compromise the interpretation of many of
these empirical results. Explicit measures of association,
such as the crossmodal matching task in its various forms
or the semantic differential technique, rely on observers
Methods
Participants
Fifty participants (twenty-six females) took part in the
present study (ten participants for each of the five experiments). Their mean age was twenty-three years (range,
18–35 years), and all of the participants reported normal or
corrected-to-normal vision and audition. The gender and
Table 1 Stimuli and results of the statistical analysis of the five experiments reported in the present study (see main text for further details regarding the stimuli and the analysis). In the original table, light grey double-headed arrows connect compatible audiovisual pairs of stimuli; the visual stimuli were pictorial and are not reproduced here.

Exp  Auditory stimuli          IAT results
1    /mil/ vs /mal/            Congruency: F(1,9)=23.84, p<.001; Modality: F(1,9)=33.42, p<.001; Compatibility × Modality: F<1, n.s.
2    /takete/ vs /maluma/      Congruency: F(1,9)=22.08, p=.001; Modality: F(1,9)=38.26, p<.001; Compatibility × Modality: F(1,9)=2.45, p=.15
3    4500 Hz vs 300 Hz         Congruency: F(1,9)=11.07, p=.009; Modality: F(1,9)=12.92, p=.006; Compatibility × Modality: F<1, n.s.
4    4500 Hz vs 300 Hz         Congruency: F(1,9)=16.54, p=.003; Modality: F(1,9)=13.42, p<.006; Compatibility × Modality: F<1, n.s.
5    square wave vs sine wave  Congruency: F(1,9)=5.71, p=.041; Modality: F(1,9)=21.45, p=.001; Compatibility × Modality: F(1,9)=2.45, p=.15
Results
The first four trials of each block, in which the participants
were presumably still learning the new stimulus–response
mapping, were not included in the data analysis. In order to
normalize the RT distributions, the RT data were log-transformed, and responses that fell three standard deviations above or below the individual means were excluded
from further analyses. Overall, less than 1 % of the trials
were removed from the analysis. The RTs from those trials
in which participants responded correctly were submitted
to a repeated-measures analysis of variance (ANOVA) with
the within-participants factors of compatibility (compatible
versus incompatible) and stimulus type (words versus
pictures). The results of the analysis are reported in
Tables 1 and 2 (see Figs. 1, 2, 3, 4, 5). Note that all of the
statistical inferences are fully replicated when analysing
untransformed data after the removal of outliers, but
without discarding the first four trials of each block.¹
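The exclusion procedure just described can be sketched as follows. This is a minimal illustration with made-up RTs (not the authors' analysis code): the first four trials are dropped, the remaining RTs are log-transformed, and values more than three standard deviations from the individual mean are excluded.

```python
import math
from statistics import mean, stdev

def clean_rts(rts, n_practice=4, sd_cut=3.0):
    """Drop the first n_practice trials, log-transform the remaining RTs,
    and exclude values more than sd_cut standard deviations from the mean.
    Returns the retained log-RTs."""
    log_rts = [math.log(rt) for rt in rts[n_practice:]]
    m, s = mean(log_rts), stdev(log_rts)
    return [x for x in log_rts if abs(x - m) <= sd_cut * s]

# Hypothetical RTs (s) for one participant and block; the 3.5 s trial is an outlier
rts = [0.9, 0.85, 0.8, 0.75,
       0.62, 0.58, 0.61, 0.57, 0.60, 0.59, 0.63, 0.56,
       0.64, 0.60, 3.5, 0.58, 0.62, 0.61, 0.59, 0.60]
kept = clean_rts(rts)
print(len(kept))  # 15: the 3.5 s trial is excluded
```

Log-transforming first matters: RT distributions are right-skewed, so a cut-off applied to raw RTs would flag slow responses far more often than fast ones.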
Overall, a significant crossmodal congruency effect was
observed in all five experiments, indicating that all of the
crossmodal correspondences investigated here significantly
modulated the latency of participants' behavioural
responses. Moreover, in all five experiments, there was also
a significant effect of stimulus modality, showing that
participants responded more rapidly to visual than to
auditory stimuli overall (see also Evans and Treisman
2010). There was no significant interaction between compatibility and stimulus modality in any of the experiments,
though in Experiments 2 and 5 the interaction term
approached statistical significance.
Table 2 Mean RTs (s) and accuracy for congruent and incongruent trials in vision and audition in each experiment

                     Vision                    Audition
                     Congruent  Incongruent    Congruent  Incongruent
Exp 1    RT (s)      0.56       0.65           0.68       0.80
         Accuracy    0.94       0.88           0.97       0.85
Exp 2    RT (s)      0.54       0.61           0.66       0.72
         Accuracy    0.95       0.92           0.97       0.93
Exp 3    RT (s)      0.55       0.60           0.60       0.68
         Accuracy    0.97       0.96           0.95       0.90
Exp 4    RT (s)      0.56       0.62           0.59       0.67
         Accuracy    0.92       0.88           0.92       0.85
Exp 5    RT (s)      0.59       0.65           0.67       0.75
         Accuracy    0.93       0.94           0.93       0.90
Overall  RT (s)      0.56       0.62           0.64       0.72
         Accuracy    0.94       0.91           0.95       0.88

In order to study the build-up of the compatibility effect,
and thus to determine at which stage of information processing it was taking place, we ran a bin analysis of RTs
(see De Jong et al. 1994; Vallesi et al. 2005) by dividing
the RT data into 5 bins, from fastest to slowest. This procedure was performed separately for each participant,
modality and stimulus–response compatibility. This analysis revealed that participants' RTs were slower in
incompatible than in compatible trials irrespective of the
bin. To further highlight this difference, we then calculated
the effect size (d-score; Cohen 1988) of compatibility for
each bin by dividing the RT difference between incompatible and compatible trials by the overall standard deviation of that bin (calculated by pooling together compatible
and incompatible responses for each bin). Overall, the
effect size was higher in the central bins. However, for all
five of the experiments reported here, the d-scores were
positive even in the first bin, indicating that stimulus
compatibility modulated response latencies even when RTs
were very fast and arguing for an early onset of the
compatibility effect.
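The bin analysis can be sketched as follows. This is a toy illustration with assumed data (not the authors' code): each condition's RTs are sorted from fastest to slowest, cut into quintiles, and a per-bin d-score is computed as the incompatible-minus-compatible mean difference divided by the pooled standard deviation of that bin.

```python
from statistics import mean, stdev

def bin_dscores(compatible, incompatible, n_bins=5):
    """Per-bin effect size: (mean incompatible - mean compatible) / pooled SD,
    after sorting each condition's RTs from fastest to slowest."""
    comp, incomp = sorted(compatible), sorted(incompatible)
    def bins(xs):
        k = len(xs) // n_bins
        return [xs[i * k:(i + 1) * k] for i in range(n_bins)]
    scores = []
    for c_bin, i_bin in zip(bins(comp), bins(incomp)):
        pooled_sd = stdev(c_bin + i_bin)  # SD over both conditions' RTs in this bin
        scores.append((mean(i_bin) - mean(c_bin)) / pooled_sd)
    return scores

# Hypothetical RTs (s): incompatible trials shifted roughly 60 ms slower
compatible = [0.45, 0.48, 0.50, 0.52, 0.55, 0.57, 0.60, 0.63, 0.68, 0.75]
incompatible = [0.50, 0.54, 0.56, 0.58, 0.61, 0.63, 0.67, 0.70, 0.76, 0.82]
print([round(d, 2) for d in bin_dscores(compatible, incompatible)])
```

With a uniform compatibility shift like this, the d-scores come out positive in every bin, which is the pattern reported above as evidence for an early onset of the effect.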
In order to compare the size of the congruency effects
for the visual and auditory targets across the five experiments, the overall d-score for visual and auditory responses
for each participant and experiment was calculated. An
ANOVA on the d-scores, with stimulus modality as a
within-participants factor and experiment as a between-participants factor, revealed no main effect of experiment
(F < 1, n.s.), no main effect of modality (F(1,4) = 1.053,
p = .31) and no interaction (F(4,45) = 1.075, p = .38; see
Fig. 6).
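The between-experiments part of this comparison amounts to a one-way between-participants ANOVA on the overall d-scores (the full analysis above also included modality as a within-participants factor). A self-contained sketch, with made-up d-scores for illustration, computes the F-ratio by hand:

```python
from statistics import mean

def one_way_F(groups):
    """F-ratio for a one-way between-participants ANOVA:
    between-group mean square over within-group mean square."""
    grand = mean(x for g in groups for x in g)
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical overall d-scores for participants in three experiments
exp1 = [0.55, 0.62, 0.48, 0.70, 0.58]
exp2 = [0.50, 0.66, 0.59, 0.61, 0.53]
exp3 = [0.57, 0.49, 0.64, 0.60, 0.67]
F = one_way_F([exp1, exp2, exp3])
print(round(F, 3))  # small F: no evidence the effect size differs across experiments
```

Group means that sit close together relative to the within-group spread, as here, yield F < 1, the pattern reported for the experiment factor above.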
[Fig. 1 (panels not recoverable from the extraction): stimulus–response key assignments for congruent blocks (mal with large circles, mil with small circles) and incongruent blocks (reversed); mean RTs, scatterplots/bagplots of congruent versus incongruent RTs, and per-bin RTs and d-scores for visual and auditory trials]
Discussion
The results of the five experiments reported in the present
study demonstrate the existence of several crossmodal
associations between auditory and visual stimuli. In
[Fig. 2 (panels not recoverable from the extraction): stimulus–response key assignments for takete and maluma with the corresponding visual shapes in congruent and incongruent blocks; mean RTs, scatterplots/bagplots of congruent versus incongruent RTs, and per-bin RTs and d-scores for visual and auditory trials]
we have demonstrated the existence of two new crossmodal correspondences, namely associations between
auditory pitch and the visual size of angles, and between the
waveform of auditory stimuli and the roundedness of visual
shapes.
[Fig. 3 (panels not recoverable from the extraction): stimulus–response key assignments for low/high pitch with large/small circles in congruent and incongruent blocks; mean RTs, scatterplots/bagplots of congruent versus incongruent RTs, and per-bin RTs and d-scores for visual and auditory trials]
Fig. 4 Pitch–angle compatibility modulates observers' RTs. a Examples of stimulus–response key assignment (top congruent; bottom incongruent). b Mean RTs for congruent and incongruent trials on visual and auditory trials. Error bars represent the standard error of the mean across participants, and the asterisks indicate a statistical difference (p < .05). c Scatter and bagplot of participants' mean visual RTs on congruent versus incongruent trials. d Scatter and bagplot of participants' mean auditory RTs on congruent versus incongruent trials. The cross at the centre of the bagplot represents the centre of mass of the bivariate distribution of empirical data (i.e. the halfspace depth), the dark grey area (i.e. the bag) includes the 50 % of the data with the largest depth, the light grey polygon contains all the non-outlier data points, and the stars represent the outliers (Rousseeuw et al. 1999). e, f Mean RTs of congruent (black) and incongruent (grey) visual (e) and auditory (f) trials for each bin. Mean effect size of the RT difference between incongruent and congruent RTs for each bin on visual (g) and auditory (h) trials. In all four panels, error bars represent the standard error of the mean
(see Fig. 6). This result suggests that crossmodal correspondences involving elementary stimulus features, such as
pitch and size, and those involving more complex stimuli,
such as nonsense words and line drawings, are equally
effective in inducing crossmodal compatibility effects.
Fig. 5 Waveform–roundedness compatibility modulates observers' RTs. a Examples of stimulus–response key assignment (top congruent; bottom incongruent). b Mean RTs for congruent and incongruent trials on visual and auditory trials. Error bars represent the standard error of the mean across participants, and the asterisks indicate a statistical difference (p < .05). c Scatter and bagplot of participants' mean visual RTs on congruent versus incongruent trials. d Scatter and bagplot of participants' mean auditory RTs on congruent versus incongruent trials. The cross at the centre of the bagplot represents the centre of mass of the bivariate distribution of empirical data (i.e. the halfspace depth), the dark grey area (i.e. the bag) includes the 50 % of the data with the largest depth, the light grey polygon contains all the non-outlier data points, and the stars represent the outliers (Rousseeuw et al. 1999). e, f Mean RTs of congruent (black) and incongruent (grey) visual (e) and auditory (f) trials for each bin. Mean effect size of the RT difference between incongruent and congruent RTs for each bin on visual (g) and auditory (h) trials. In all four panels, error bars represent the standard error of the mean
Moreover, our results demonstrate that all of the crossmodal correspondences tested here affected participants'
responses to visual and auditory stimuli in much the same
way. This also suggests that crossmodal compatibility
effects are neither modality-specific nor modality-
Fig. 6 Comparison of the effect size (d-score) for vision and audition
between the five experiments. Note that all of the crossmodal
correspondences had a very similar effect size. Error bars represent
the standard error of the mean
References
Bernstein IH, Edelstein BA (1971) Effects of some variations in auditory input upon visual choice reaction time. J Exp Psychol 87(2):241–247
Bien N, ten Oever S, Goebel R, Sack AT (2012) The sound of size: crossmodal binding in pitch-size synesthesia: a combined TMS, EEG and psychophysics study. Neuroimage 59:663–672
Blair IV (2002) The malleability of automatic stereotypes and prejudice. Personal Soc Psychol Rev 6:242–261
Bozzi P, Flores D'Arcais G (1967) Experimental research on the intermodal relationships between expressive qualities. Arch Psicol Neurol Psichiatr 28(5):377–420
Brang D, Rouw R, Ramachandran VS, Coulson S (2011) Similarly shaped letters evoke similar colors in grapheme-color synesthesia. Neuropsychologia 49:1355–1358
Bremner A, Caparos S, Davidoff J, de Fockert J, Linnell K, Spence C (submitted) Bouba and Kiki in Namibia? Western shape symbolism does not extend to taste in a remote population. Cognition
Chen Y-C, Spence C (2011) Crossmodal semantic priming by naturalistic sounds and spoken words enhances visual sensitivity. J Exp Psychol Hum Percept Perform 37:1554–1568
Chiou R, Rich AN (2012) Cross-modality correspondence between pitch and spatial location modulates attentional orienting. Perception 41:339–353
Cohen J (1988) Statistical power analysis for the behavioral sciences. Lawrence Erlbaum Associates, Hillsdale, NJ
Cohen Kadosh R, Henik A (2007) Can synaesthesia research inform cognitive science? Trends Cogn Sci 11(4):177–184
Cohen Kadosh R, Cohen Kadosh K, Henik A (2007a) The neuronal correlate of bidirectional synesthesia: a combined event-related potential and functional magnetic resonance imaging study. J Cogn Neurosci 19(12):2050–2059
Cohen Kadosh R, Henik A, Walsh V (2007b) Small is bright and big is dark in synaesthesia. Curr Biol 17(19):R834–R835
Cowey A, Weiskrantz L (1975) Demonstration of cross-modal matching in rhesus monkeys, Macaca mulatta. Neuropsychologia 13(1):117–120
Crisinel AS, Spence C (2009) Implicit association between basic tastes and pitch. Neurosci Lett 464(1):39–42
Crisinel AS, Spence C (2010) A sweet sound? Food names reveal implicit associations between taste and pitch. Perception 39(3):417–425
Rousseeuw PJ, Ruts I, Tukey JW (1999) The bagplot: a bivariate boxplot. Am Stat 53(4):382–387
Rudmin F, Cappelli M (1983) Tone-taste synesthesia: a replication. Percept Mot Skills 56:118
Sapir E (1929) A study in phonetic symbolism. J Exp Psychol 12(3):225–239
Seo H-S, Arshamian A, Schemmer K, Scheer I, Sander T, Ritter G, Hummel T (2010) Cross-modal integration between odors and abstract symbols. Neurosci Lett 478:175–178
Shepherd GM (2012) Neurogastronomy: how the brain creates flavor and why it matters. Columbia University Press, New York
Simpson RH, Quinn M, Ausubel DP (1956) Synesthesia in children: association of colors with pure tone frequencies. J Genet Psychol Res Theory Hum Dev 89(1):95–103
Soto-Faraco S, Lyons J, Gazzaniga M, Spence C, Kingstone A (2002) The ventriloquist in motion: illusory capture of dynamic information across sensory modalities. Cogn Brain Res 14(1):139–146
Spence C (2011) Crossmodal correspondences: a tutorial review. Atten Percept Psychophys 73(4):971–995
Spence C, Deroy O (2012) Are chimpanzees really synaesthetic? i-Perception 3:316–318
Spence C, Squire S (2003) Multisensory integration: maintaining the perception of synchrony. Curr Biol 13(13):R519–R521
Stevens JC, Marks LE (1965) Cross-modality matching of brightness and loudness. Proc Natl Acad Sci USA 54(2):407–411
Stumpf K (1883) Tonpsychologie. S. Hirzel, Leipzig
Svartdal F, Iversen T (1989) Consistency in synesthetic experience to vowels and consonants: five case studies. Scand J Psychol 30:220–227
Vallesi A, Mapelli D, Schiff S, Amodio P, Umiltà C (2005) Horizontal and vertical Simon effect: different underlying mechanisms? Cognition 96(1):B33–B43
Van den Doel K, Pai DK (1998) The sounds of physical shapes. Presence 7(4):382–395
Walker P, Smith S (1985) Stroop interference based on the multimodal correlates of haptic size and auditory pitch. Perception 14(6):729–736
Walker P, Bremner J, Mason U, Spring J, Mattock K, Slater A, Johnson S (2010) Preverbal infants' sensitivity to synaesthetic cross-modality correspondences. Psychol Sci 21(1):21–25
Ward J, Huckstep B, Tsakanikos E (2006) Sound-colour synaesthesia: to what extent does it use cross-modal mechanisms common to us all? Cortex 42(2):264–280
Watson AB, Pelli DG (1983) QUEST: a Bayesian adaptive psychometric method. Percept Psychophys 33(2):113–120
Weiskrantz L, Cowey A (1975) Cross-modal matching in the rhesus monkey using a single pair of stimuli. Neuropsychologia 13(3):257–261
Westbury C (2005) Implicit sound symbolism in lexical access: evidence from an interference task. Brain Lang 93(1):10–19
Xu J, Yu L, Rowland BA, Stanford TR, Stein BE (2012) Incorporating cross-modal statistics in the development and maintenance of multisensory integration. J Neurosci 32:2287–2298
Zigler MJ (1930) Tone shapes: a novel type of synaesthesia. J Gen Psychol 3:276–287