
Exp Brain Res (2012) 220:319–333

DOI 10.1007/s00221-012-3140-6

RESEARCH ARTICLE

Audiovisual crossmodal correspondences and sound symbolism:
a study using the implicit association test
Cesare V. Parise · Charles Spence

Received: 26 January 2012 / Accepted: 2 June 2012 / Published online: 17 June 2012
© Springer-Verlag 2012

Abstract A growing body of empirical research on the topic of multisensory
perception now shows that even non-synaesthetic individuals experience
crossmodal correspondences, that is, apparently arbitrary compatibility
effects between stimuli in different sensory modalities. In the present
study, we replicated a number of classic results from the literature on
crossmodal correspondences and highlighted the existence of two new
crossmodal correspondences using a modified version of the implicit
association test (IAT). Given that only a single stimulus was presented on
each trial, these results rule out selective attention and multisensory
integration as possible mechanisms underlying the reported compatibility
effects on speeded performance. The crossmodal correspondences examined in
the present study all gave rise to very similar effect sizes, and the
compatibility effect had a very rapid onset, thus speaking to the automatic
detection of crossmodal correspondences. These results are further discussed
in terms of the advantages of the IAT over traditional techniques for
assessing the strength and symmetry of various crossmodal correspondences.

Keywords Multisensory perception · Audition · Vision · Crossmodal
correspondences · Sound symbolism · Implicit association test

Electronic supplementary material The online version of this
article (doi:10.1007/s00221-012-3140-6) contains supplementary
material, which is available to authorized users.

C. V. Parise · C. Spence
Department of Experimental Psychology,
University of Oxford, Oxford, UK

C. V. Parise
Max Planck Institute for Biological Cybernetics,
Tübingen, Germany

C. V. Parise
Bernstein Centre for Computational Neuroscience,
Tübingen, Germany

C. V. Parise (✉)
Department of Cognitive Neuroscience and Center
of Excellence Cognitive Interaction Technology (CITEC),
University of Bielefeld, Universitaetstr. 25, W3-246,
33615 Bielefeld, Germany
e-mail: cesare.parise@tuebingen.mpg.de

Introduction
Human observers can readily match apparently unrelated
stimuli from different sensory modalities with a striking
degree of consistency. For example, most people unquestioningly agree that an object called "mil" is likely going to be
smaller than an object called "mal" (e.g. Newman 1933; Sapir
1929), and that a lemon is "fast" rather than "slow". For
more than a century now cognitive scientists have been
studying such examples of crossmodal correspondences
(Stumpf 1883), which, over the years, have been labelled
using a wide variety of terms such as "crossmodal similarities", "synaesthetic associations", "weak synaesthesia",
and, in the case of linguistic stimuli, "sound symbolic associations" (see Spence 2011 for a review). Broadly speaking,
crossmodal correspondences can be defined as congruency
effects between stimuli presented (either physically present,
or else merely imagined) in different sensory modalities that
result from the expected (i.e. a priori) mapping between
those sensory cues (Parise and Spence in press).
One of the most famous crossmodal correspondences
was proposed back in 1929 by Wolfgang Köhler. He gave
participants two nonsense words, takete and baluma (later
baluma was renamed maluma, Köhler 1947), and two
outline drawings, a spiky and a rounded one. Köhler had
his participants match the words and drawings in the most
natural way. Surprisingly, most observers immediately
matched the word takete with the spiky figure and the word
baluma with the rounded shape. In the following years,
these original observations have been replicated by many
researchers in different ethnic populations providing virtually identical results (e.g. Bremner et al., submitted;
Davis 1961; Rogers and Ross 1968; see also Hinton et al.
2006), thus speaking to the robustness of the underlying
phenomenon (Robson 2011).
In the wake of Köhler's (1929) and Sapir's (1929) early
observations, the study of crossmodal correspondences
gained in popularity and many other examples have subsequently come to light. In the most basic case, crossmodal
correspondences involve a mapping between basic unimodal sensory dimensions present in different sensory
modalities: So, for example, a crossmodal correspondence
has been demonstrated between auditory pitch and the size
of objects presented either visually or haptically. That is,
smaller objects are typically matched with higher pitched
sounds and larger objects with lower pitched sounds (Evans
and Treisman 2010; Gallace and Spence 2006; Parise and
Spence 2008, 2009; Walker and Smith 1985). Such correspondences between crossmodal sensory dimensions
sometimes appear to reflect the natural correlation between
the physical properties of the external world. Pitch-size
correspondence, for example, might mirror the properties
of acoustic resonance, whereby, ceteris paribus, larger
bodies resonate at lower frequencies than smaller bodies.
Other examples of crossmodal correspondences involving basic audiovisual dimensions include the following:
pitch (high vs. low) and shape (angular vs. rounded, Marks
1987), pitch and elevation (high vs. low spatial positions,
Bernstein and Edelstein 1971; Chiou and Rich 2012) and
loudness and brightness (Marks 1987; see Marks 2004;
Parise and Spence in press; Spence 2011, for reviews of the
crossmodal correspondences that have been documented to
date). Once again, it could be argued that such correspondences might rely on the acoustic properties of physical
bodies: Harder objects, for example, tend to resonate at
higher frequencies and break into sharper pieces than softer
bodies (e.g. Freed 1990; Klatzky et al. 2000; Van den Doel
and Pai 1998; Walker et al. 2010), hence perhaps explaining
the crossmodal correspondence between pitch and the
angularity of objects (Walker et al. 2010). Moreover, larger
objects (that normally resonate at lower frequencies) are
generally also heavier and are therefore unlikely to fly or be
found at a higher elevation, hence perhaps explaining the
correspondence between pitch and elevation.
Over the years, researchers have utilized a number of
experimental techniques in order to measure such crossmodal congruency effects. From the early studies, in which
observers were explicitly required to match pairs of auditory and visual stimuli (see Davis 1961; Köhler 1929; Sapir
1929; Zigler 1930), cognitive scientists switched to more
sophisticated paradigms that allow for repeated measures
(rather than just a single measure as in many early studies)
within a single observer and often not relying on introspection. The most common technique over the last two or
three decades has relied on the modulation of reaction times
(RTs) in speeded classification tasks in which participants
have to respond to stimuli on a target sensory modality,
while trying to ignore task-irrelevant distractor stimuli
presented in a different sensory modality (see Marks 2004;
Spence 2011, for reviews). Despite the fact that the stimuli
presented in one sensory modality are completely task-irrelevant, participants' RTs are often faster for certain
combinations of (relevant and irrelevant) stimuli and slower
for others. Based on this crossmodal interference on
response latencies, stimulus combinations leading to faster
RTs (and more correct responses) are considered to be
compatible, while those leading to longer RTs (and more
incorrect responses) are considered as being incompatible.
Other techniques instead rely on explicit measures of
similarity. This is the case of the crossmodal matching task
(Stevens and Marks 1965), where participants have to
adjust the magnitude of a stimulus along a given sensory
dimension (e.g. loudness) until it matches the magnitude
of a stimulus on another sensory dimension in a different
modality (e.g. brightness). A more constrained variant of
this technique was later proposed by Marks (1989). He had
participants select which of two stimuli (whose properties
were parametrically manipulated on a trial-by-trial basis) in
a given modality better matched a target stimulus in a
different sensory modality. In addition to these techniques,
many other approaches have also been used in the study of
crossmodal correspondences over the years. They include
the use of the semantic differential technique (Bozzi and
Flores D'Arcais 1967; Osgood 1960; Osgood et al. 1957;
Oyama et al. 1998; see also Poffenberger and Barrows
1924), preferential looking (Walker et al. 2010), cuing
(whereby the crossmodal congruence or incongruence of a
cue preceding the target stimulus has been shown to
modulate RT to the target stimulus, Melara and O'Brien
1990; see also Klein et al. 1987; Chiou and Rich 2012),
analysis of speech sounds (Parise and Pavani 2011) and
EEG (Bien et al. 2012; Kovic et al. 2010; Seo et al. 2010)
to name but a few.
In spite of providing important insights into the underlying nature of crossmodal correspondences, most of these
techniques suffer from various methodological limitations
that potentially compromise the interpretation of many of
these empirical results. Explicit measures of association,
such as the crossmodal matching task in its various forms
or the semantic differential technique, rely on observers'
introspection. Therefore, the results critically depend on
observers' ability (and/or willingness) to report on their
introspections. Such limitations have been overcome by
indirect techniques based on RTs, such as the speeded
classification task. Nevertheless, these tasks also exhibit a
number of further limitations. First, given that two stimuli
in different modalities have often been presented at the
same time in each trial, any stimulus-dependent modulation of response latencies might reflect some form of failure of selective attention (e.g. Gallace and Spence 2006;
Melara and O'Brien 1987), with participants being unable
to fully focus their attention on the target stimuli and
ignore the distracting stimuli. Moreover, while the speeded
classification paradigm provides evidence that compatibility between, say, auditory and visual stimuli affects the
processing of visual information, it does not address the
reciprocal effects on audition within the same experiment.
On top of the various limitations of the traditional techniques, such methodological fragmentation inevitably leads
to further difficulties when it comes to trying to compare
the results from different studies.
In order to study the build-up of crossmodal correspondences and rule out selective attention as a possible
explanation, in the present paper, we wanted to measure the
compatibility between crossmodal stimuli using a variant
of the implicit association test (IAT, Greenwald et al.
1998). Over recent years, the IAT has proved to be one of
the most popular tools with which to study the association
(both implicit and explicit) between different items, and it
overcomes all of the above-mentioned issues. In the simplified version of the task used here, participants respond as
rapidly as possible to a series of stimuli, taken from a set of
four stimuli (i.e. two auditory stimuli, mil and mal, and two
visual stimuli, a small and a large circle). They use just two
response keys, with two stimuli (i.e. one auditory and one
visual) being assigned to the same response key in a given
block of trials. Previously, it has been demonstrated that
participants' performance improves when the set of stimuli
assigned to a given response key are also associated with
each other (the compatible conditions), as compared with
conditions in which a set of unrelated (or incompatible)
stimuli are assigned to the same response key (the
incompatible conditions; Greenwald et al. 1998).
In the present study, we experimentally manipulated the
assignment of the four stimuli to each response key from
block to block during the course of the experiment, so that
half of the blocks were assumed to be compatible and the
other half, incompatible. Discrepancies in RT between
different stimulus–response key assignments are taken to
provide evidence of the existence of a compatibility effect:
Shorter RTs indicate associations between the stimuli
assigned to the same response key, while longer RTs
indicate weaker associations. One important feature
associated with the use of such a technique (over, say, the
traditional speeded classification task) is that it provides
evidence of associations between items from both visual
and auditory trials within a single experimental session.
Also, given that on each trial only one stimulus is presented, the IAT rules out possible accounts in terms of
selective attention (where selective attention is what participants need in order to choose between two simultaneously presented, but incongruent, competing stimuli).
Moreover, the present task is based on a standard technique
that has proven to be very sensitive to associations between
stimuli from a variety of categories, and it is flexible
enough to be adapted to crossmodal settings (Crisinel and
Spence 2009, 2010; Demattè et al. 2006, 2007; Parise and
Spence 2012). In spite of its widely agreed upon name, it
should be noted that the IAT compatibility effect reflects
the outcome of both explicit and implicit associations
(Blair 2002; Fiedler et al. 2006).
In the present study, we wanted to use the IAT in order
to replicate some well-known examples of crossmodal
correspondence (including classic examples from the literature on sound symbolism) never studied using the IAT
before, namely takete-maluma, and mil-mal, and the
association between auditory pitch and the size of visual
objects. Moreover, we also wanted to investigate the
existence of two additional postulated crossmodal correspondences, namely that between auditory pitch and the
size of visual angles, and that between the waveform of
auditory stimuli and the spikiness–roundedness of visual
stimuli. Given that only a single stimulus was presented on
any given trial, providing evidence of crossmodal correspondences using the IAT would rule out any potential
account of compatibility effects in terms of selective
attention or multisensory integration. Furthermore, we
wanted to investigate the effects of crossmodal correspondences in more detail and to compare the compatibility
effect size across conditions and to study their build-up
with a bin analysis of RTs in order to see how long it takes
for the compatibility effect to emerge. Since we used the
very same methods in all five of the experiments reported
in the present study, the description of the experiments is
combined into a single methods section.

Methods
Participants
Fifty participants (twenty-six females) took part in the
present study (ten participants for each of the five experiments). Their mean age was twenty-three years (range,
18–35 years), and all of the participants reported normal or
corrected-to-normal vision and audition. The gender and
age of the participants were roughly matched across
experiments. Each session lasted for approximately 35 min,
and participants received a £5 (UK Sterling) voucher in return
for taking part in the study. The experimental procedure was
approved by the Ethics Committee of the Department of
Experimental Psychology, University of Oxford.
Apparatus and materials
The presentation of the stimuli and the collection of the
responses were controlled by a personal computer running
the Psychtoolbox v.2.54 (Brainard 1997; Pelli 1997). Each
participant was seated in front of a 21-inch CRT computer
monitor with a resolution of 1280 × 1024 pixels (75 Hz
refresh rate) flanked by a pair of loudspeakers. Participants
responded to the target stimuli by pressing a key on a
computer keyboard. The experiment was conducted in a
dark and quiet room.
Stimuli
Two visual stimuli and two auditory stimuli were used in
each experiment. Details of the stimuli used in each experiment are reported below (see Table 1), and the auditory
stimuli can be found online as supplemental material.
Experiment 1: The visual stimuli consisted of two light
grey circles, one subtending 5 cm and the other subtending
2 cm (5.2° vs. 2.1° of visual angle, respectively), presented
at the centre of the screen against a white background. The
auditory stimuli consisted of the words "mil" and "mal"
pronounced by a female voice.
Experiment 2: The visual stimuli consisted of two
shapes, one spiky, the other curved (Köhler 1929, see
Fig. 3), respectively, subtending 6.24° × 3.12° and 4.16° ×
4.68° of visual angle, and presented at the centre of the
screen against a white background. The auditory stimuli
consisted of the words "takete" and "maluma" pronounced by a female voice.
Experiment 3: The visual stimuli consisted of two light
grey circles, one subtending 5 cm and the other subtending
2 cm (5.2° vs. 2.1° of visual angle, respectively), presented
at the centre of the screen against a white background. The
auditory stimuli consisted of two pure tones, a high and a
low pitched one (4,500 and 300 Hz, respectively). The
perceived intensities (loudness) of the 300 ms tones were
individually matched for each participant with a brief
preliminary psychophysical experiment based on the
QUEST procedure (Watson and Pelli 1983).
Experiment 4: The visual stimuli consisted of two
angles (i.e. arrowheads), one acute and the other obtuse
(42° and 126°, respectively) subtended by two segments,
each segment subtending 4.3° of visual angle. The auditory
stimuli were the same as those used in Experiment 3.

Experiment 5: The visual stimuli consisted of an angle
and a curve, both subtending a visual angle of 6.8° × 2.9°,
presented at the centre of the screen against a white
background. The auditory stimuli consisted of two tones
with a frequency of 440 Hz and varying in waveform, with
one being sinusoidal and the other being square. The perceived intensities (loudness) of the two tones were individually matched for each participant with a brief
preliminary psychophysical experiment based on the
QUEST procedure (Watson and Pelli 1983).
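
The correspondence between physical size and visual angle reported for the circles can be checked with the standard formula, angle = 2·atan(size / (2·distance)). The sketch below is illustrative only: the viewing distance is not reported in the text, so `ASSUMED_DISTANCE_CM` is a hypothetical value chosen because it is consistent with the reported 5.2° and 2.1° for the 5 cm and 2 cm circles.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Full visual angle (in degrees) subtended by an object of a given size."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


# Hypothetical viewing distance: NOT stated in the paper, back-calculated
# here so that the reported angles for the two circles are reproduced.
ASSUMED_DISTANCE_CM = 55.0

large = visual_angle_deg(5.0, ASSUMED_DISTANCE_CM)
small = visual_angle_deg(2.0, ASSUMED_DISTANCE_CM)
print(f"large circle: {large:.1f} deg, small circle: {small:.1f} deg")
```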
Procedure
The participants were instructed to maintain their fixation
on the centre of the screen and to respond to the stimuli
as rapidly and accurately as possible, by pressing one of
two keys on a computer keyboard. Two patches, representing an arrow pointing either to the left or to the
right, marked the relevant response keys. Each trial
began with the presentation of a red fixation point at
the centre of the screen for a randomized interval of
500–600 ms. After the removal of the fixation point,
there followed a random interstimulus interval of
300–400 ms. Next, the target stimulus, either visual or
auditory, was presented. The visual stimulus remained on
the screen for 300 ms before being removed. The auditory stimuli, also lasting for 300 ms (or approximately
300 ms in Experiments 1 and 2), were repeated only
once on each trial. Feedback in the form of a red cross
was provided after each incorrect response and remained
on the screen for 500 ms.
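
The trial structure described above can be sketched as a small timing generator. This is a minimal illustration, not the authors' Psychtoolbox code: the function name, the dictionary keys and the use of uniform sampling for the two randomized intervals are all assumptions (the text only says the intervals were randomized within the stated ranges).

```python
import random


def make_trial(rng: random.Random, stimulus: str, modality: str) -> dict:
    """Return the event durations (in ms) for one single-stimulus trial."""
    return {
        "fixation_ms": rng.uniform(500, 600),  # red fixation point
        "blank_ms": rng.uniform(300, 400),     # interval after fixation offset
        "modality": modality,                  # "visual" or "auditory"
        "stimulus": stimulus,
        "stimulus_ms": 300,                    # stimulus duration
        "error_feedback_ms": 500,              # red cross, shown only after errors
    }


rng = random.Random(1)  # fixed seed so the sketch is reproducible
trial = make_trial(rng, "mil", "auditory")
pre_stimulus_ms = trial["fixation_ms"] + trial["blank_ms"]
print(f"stimulus onset {pre_stimulus_ms:.0f} ms after fixation onset")
```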
At the beginning of each block of trials, the participants received new instructions about the mapping
between the stimuli and the appropriate response for the
upcoming block of experimental trials. On each block of
trials, two of the four stimuli, one figure and one word,
were assigned to either the left or the right key and the
remaining stimuli to the other response key. The
instructions remained visible on the screen until the
participants initiated the new block of trials by pressing
the space bar. The mapping of the stimuli onto the
response keys was manipulated during the experiment
thus generating four different pairings of which two were
hypothesized to be compatible (e.g. in Experiment 1, the
small circle and the word "mil" associated with the same
key; Sapir 1929) while the remaining two were judged as
being incompatible (e.g. in Experiment 1, the large circle
and the word "mil" were associated with the same key).
Note that a block of trials was considered as being
compatible when the two stimuli associated with a
given response key were hypothesized to be associated
with one another. Conversely, a block of trials was
considered as being incompatible when the hypothetically
associated stimuli were mapped onto different response
keys. Each of the four pairings was repeated six times for
a total of 24 randomly alternating blocks. Each block
consisted of 20 trials (each stimulus repeated four times)
giving rise to a total of 480 trials. Participants were
allowed to take a pause at the end of each block.
Reaction times (RTs) and the accuracy of participants'
responses were collected.

Table 1 Stimuli and results of the statistical analysis of the five experiments reported in the present study (see main text for further details
regarding the stimuli and the analysis). The light grey double-headed arrows connect compatible audiovisual pairs of stimuli
[Visual stimuli and arrows appear as images in the original table; see the Stimuli section for descriptions.]

Exp   Auditory stimuli      IAT results
1     /mil/, /mal/          Congruency: F(1,9) = 23.84, p < .001
                            Modality: F(1,9) = 33.42, p < .001
                            Compatibility × Modality: F < 1, n.s.
2     /takete/, /maluma/    Congruency: F(1,9) = 22.08, p = .001
                            Modality: F(1,9) = 38.26, p < .001
                            Compatibility × Modality: F(1,9) = 2.45, p = .15
3     4500 Hz, 300 Hz       Congruency: F(1,9) = 11.07, p = .009
                            Modality: F(1,9) = 12.92, p = .006
                            Compatibility × Modality: F < 1, n.s.
4     4500 Hz, 300 Hz       Congruency: F(1,9) = 16.54, p = .003
                            Modality: F(1,9) = 13.42, p < .006
                            Compatibility × Modality: F < 1, n.s.
5     square wave,          Congruency: F(1,9) = 5.71, p = .041
      sine wave             Modality: F(1,9) = 21.45, p = .001
                            Compatibility × Modality: F(1,9) = 2.45, p = .15

Instructions about the mapping between the stimuli and
the relevant response keys consisted of a schematic representation of the two response keys with the corresponding visual stimuli displayed next to them. The participants
were required to press two keys ("1" and "2") on the keyboard to listen to the stimulus associated, respectively, with
the left and right response keys. There were no time
limits to learn the new stimulus–response mapping, and
participants were encouraged to listen to the auditory
stimuli as much as they wanted, until they were sure that
they had learnt the new assignment.
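
The block structure of the procedure (four stimulus-to-key pairings, six repetitions each, 20 trials per block) can be sketched as follows, using the Experiment 1 stimuli. All names are hypothetical, and two points are assumptions rather than facts from the text: blocks and trials are ordered by simple shuffling, and each stimulus is presented five times per block, which is what 20 trials with four stimuli implies.

```python
import random

VISUAL = ("small circle", "large circle")
AUDITORY = ("mil", "mal")
# Hypothesized compatible pairs for Experiment 1 (Sapir 1929).
COMPATIBLE = {("small circle", "mil"), ("large circle", "mal")}


def make_pairings():
    """Enumerate the four ways of assigning one visual and one auditory
    stimulus to the left key (the remaining two go to the right key)."""
    pairings = []
    for i, left_visual in enumerate(VISUAL):
        for j, left_audio in enumerate(AUDITORY):
            pairings.append({
                "left": (left_visual, left_audio),
                "right": (VISUAL[1 - i], AUDITORY[1 - j]),
                "compatible": (left_visual, left_audio) in COMPATIBLE,
            })
    return pairings


def make_session(rng, reps_per_pairing=6, presentations_per_stimulus=5):
    """Build 24 shuffled blocks of 20 shuffled single-stimulus trials."""
    blocks = [dict(p) for p in make_pairings() for _ in range(reps_per_pairing)]
    rng.shuffle(blocks)  # assumption: plain random block order
    for block in blocks:
        trials = list(VISUAL + AUDITORY) * presentations_per_stimulus
        rng.shuffle(trials)
        block["trials"] = trials
    return blocks


session = make_session(random.Random(0))
n_trials = sum(len(block["trials"]) for block in session)
n_compatible = sum(block["compatible"] for block in session)
print(len(session), n_trials, n_compatible)
```

Because the mapping changes from block to block while only one stimulus appears per trial, compatibility is a property of the block's key assignment rather than of any individual stimulus pair on screen.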

Results
The first four trials of each block, in which the participants
were presumably still learning the new stimulus–response
mapping, were not included in the data analysis. In order to
normalize the RT distributions, the RT data were log-transformed, and responses that fell more than three standard deviations above or below the individual means were excluded
from further analyses. Overall, less than 1 % of the trials
were removed from the analysis. The RTs from those trials
in which participants responded correctly were submitted
to a repeated-measure analysis of variance (ANOVA) with
the within-participants factors of compatibility (compatible
versus incompatible) and stimulus type (words versus
pictures). The results of the analysis are reported in
Tables 1 and 2 (see Figs. 1, 2, 3, 4, 5). Note that all of the
statistical inferences are fully replicated when analysing
untransformed data after the removal of outliers, but
without discarding the first four trials of each block.¹
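
The preprocessing steps described above can be sketched for a single participant as follows. This is an illustrative reading, not the authors' code: the field names are hypothetical, trial indices are assumed to start at 0 within each block, the individual mean is taken over all of the participant's retained trials, and outliers are removed before selecting correct responses (the text does not specify the order).

```python
import math
import statistics


def preprocess(trials, n_practice=4, sd_cutoff=3.0):
    """Drop the first trials of each block, log-transform RTs, remove
    responses beyond the SD cutoff, and keep correct responses only."""
    kept = [t for t in trials if t["trial_index"] >= n_practice]
    log_rts = [math.log(t["rt_s"]) for t in kept]
    mean = statistics.fmean(log_rts)
    sd = statistics.stdev(log_rts)
    cleaned = []
    for trial, log_rt in zip(kept, log_rts):
        if abs(log_rt - mean) <= sd_cutoff * sd and trial["correct"]:
            cleaned.append({**trial, "log_rt": log_rt})
    return cleaned


# Made-up data: one block with one very slow response and one error.
demo = [
    {"block": 0, "trial_index": i, "rt_s": 0.5 + 0.01 * (i % 5), "correct": True}
    for i in range(24)
]
demo[10]["rt_s"] = 20.0
demo[15]["correct"] = False
clean = preprocess(demo)
print(len(clean), "trials retained")
```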
Overall, a significant crossmodal congruency effect was
observed in all five experiments, indicating that all of the
crossmodal correspondences investigated here significantly
modulated the latency of participants' behavioural
responses. Moreover, in all five experiments, there was also
a significant effect of stimulus modality, showing that
participants responded more rapidly to visual than to
auditory stimuli overall (see also Evans and Treisman
2010). There was no significant interaction between compatibility and stimulus modality in any of the experiments,
although in Experiments 2 and 5 the interaction term
approached statistical significance.
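
As a reading aid for the F(1,9) values: for a two-level within-participants factor, the main-effect F statistic equals the square of the paired t statistic computed on each participant's condition means (collapsed over the other factor), with 1 and n − 1 degrees of freedom. The sketch below illustrates that identity; the function name and the participant means are invented for illustration and are not data from the study.

```python
import math
import statistics


def compatibility_f(compatible_means, incompatible_means):
    """Main-effect F for a two-level within-participants factor,
    computed as the squared paired t statistic. Returns (F, (df1, df2))."""
    diffs = [i - c for c, i in zip(compatible_means, incompatible_means)]
    n = len(diffs)
    t = statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t ** 2, (1, n - 1)


# Hypothetical per-participant mean RTs (s), ten participants as in each experiment.
compatible = [0.55, 0.60, 0.58, 0.62, 0.57, 0.61, 0.59, 0.63, 0.56, 0.60]
incompatible = [0.61, 0.66, 0.60, 0.70, 0.63, 0.64, 0.68, 0.69, 0.60, 0.67]
f_value, df = compatibility_f(compatible, incompatible)
print(f"F{df} = {f_value:.2f}")
```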
¹ Results of the supplemental analysis on untransformed data after the
removal of outliers (i.e. responses more than 3 SD from the individual
means; overall, less than 1 % of the data were removed), but without
discarding the first four trials of each block:
Experiment 1: Congruency: F(1,9) = 17.49, p = .002; Modality:
F(1,9) = 29.41, p < .001; Interaction: F(1,9) = 1.41, p = .26.
Experiment 2: Congruency: F(1,9) = 16.37, p = .003; Modality:
F(1,9) = 38.21, p < .001; Interaction: F(1,9) < 1, n.s.
Experiment 3: Congruency: F(1,9) = 7.36, p = .024; Modality:
F(1,9) = 12.07, p = .007; Interaction: F(1,9) = 2.01, p = .19.
Experiment 4: Congruency: F(1,9) = 7.43, p = .023; Modality:
F(1,9) = 5.586, p = .042; Interaction: F(1,9) < 1, n.s.
Experiment 5: Congruency: F(1,9) = 5.44, p = .045; Modality:
F(1,9) = 20.22, p = .001; Interaction: F(1,9) = 4.17, p = .072.

Table 2 Mean RT in seconds (s) and accuracy (probability of correct
responses) for Experiments 1–5

                      Vision                      Audition
                      Congruent   Incongruent     Congruent   Incongruent
Exp 1     RT (s)      0.56        0.65            0.68        0.80
          Accuracy    0.94        0.88            0.97        0.85
Exp 2     RT (s)      0.54        0.61            0.66        0.72
          Accuracy    0.95        0.92            0.97        0.93
Exp 3     RT (s)      0.55        0.60            0.60        0.68
          Accuracy    0.97        0.96            0.95        0.90
Exp 4     RT (s)      0.56        0.62            0.59        0.67
          Accuracy    0.92        0.88            0.92        0.85
Exp 5     RT (s)      0.59        0.65            0.67        0.75
          Accuracy    0.93        0.94            0.93        0.90
Overall   RT (s)      0.56        0.62            0.64        0.72
          Accuracy    0.94        0.91            0.95        0.88

In order to study the build-up of the compatibility effect
and thus to determine at which stage of information processing it was taking place, we ran a bin analysis of RTs
(see De Jong et al. 1994; Vallesi et al. 2005) by dividing
the RT data into 5 bins, from fastest to slowest. This procedure was performed separately for each participant,
modality and stimulus–response compatibility. This analysis revealed that participants' RTs were slower in
incompatible than in compatible trials irrespective of the
bin. To further highlight this difference, we then calculated
the effect size (d-score, Cohen 1988) of compatibility for
each bin by dividing the RT difference between incompatible and compatible trials by the overall standard deviation of that bin (calculated by pooling together compatible
and incompatible responses for each bin). Overall, the
effect size was higher in the central bins. However, for all
five of the experiments reported here, the d-scores were
positive even in the first bin, indicating that
stimulus compatibility modulated response latencies even
when RTs were very fast, which argues for an early onset of
the compatibility effect.
In order to compare the size of the congruency effects
for the visual and auditory targets across the five experiments, the overall d-score for visual and auditory responses
for each participant and experiment was calculated. An
ANOVA on the d-scores, with stimulus modality as a
within-participants factor and experiment as a between-participants factor, revealed no main effect of experiment
(F < 1, n.s.), no main effect of modality (F(1,45) = 1.053,
p = .31) and no interaction (F(4,45) = 1.075, p = .38, see
Fig. 6).
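
The bin analysis and per-bin d-score described in the Results can be sketched as below for one participant and one modality. This is an illustration under stated assumptions, not the authors' implementation: bins are formed as equal-count quantiles of the sorted RTs, the pooled standard deviation is the sample standard deviation of the compatible and incompatible RTs of a bin taken together, and all function names and data are hypothetical.

```python
import statistics


def split_into_bins(rts, n_bins=5):
    """Sort RTs from fastest to slowest and cut them into equal-count bins."""
    ordered = sorted(rts)
    n = len(ordered)
    return [ordered[i * n // n_bins:(i + 1) * n // n_bins] for i in range(n_bins)]


def bin_d_scores(compatible_rts, incompatible_rts, n_bins=5):
    """Per-bin effect size: (mean incompatible - mean compatible) divided by
    the SD of the pooled compatible and incompatible RTs of that bin."""
    scores = []
    comp_bins = split_into_bins(compatible_rts, n_bins)
    incomp_bins = split_into_bins(incompatible_rts, n_bins)
    for comp, incomp in zip(comp_bins, incomp_bins):
        pooled_sd = statistics.stdev(comp + incomp)
        scores.append((statistics.fmean(incomp) - statistics.fmean(comp)) / pooled_sd)
    return scores


# Made-up RTs (s): incompatible trials are uniformly 50 ms slower.
compatible_rts = [0.40 + 0.01 * i for i in range(20)]
incompatible_rts = [rt + 0.05 for rt in compatible_rts]
d_scores = bin_d_scores(compatible_rts, incompatible_rts)
print([round(d, 2) for d in d_scores])
```

A positive score in the first bin corresponds to the pattern the authors interpret as an early onset of the compatibility effect.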

Fig. 1 The mil-mal effect modulates observers' RTs. a Examples of
stimulus–response key assignment (top congruent; bottom incongruent).
b Mean RTs for congruent and incongruent trials on visual and auditory
trials. Error bars represent the standard error of the mean across
participants and the asterisks indicate statistical difference (p < .05).
c Scatter and bagplot of participants' mean visual RTs on congruent versus
incongruent trials. d Scatter and bagplot of participants' mean auditory RTs
on congruent versus incongruent trials. The cross at the centre of the
bagplot represents the centre of mass of the bivariate distribution of
empirical data (i.e. the halfspace depth), the dark grey area (i.e. the bag)
includes the 50 % of the data with the largest depth, the light grey polygon
contains all the non-outlier data points and the stars represent the
outliers (Rousseeuw et al. 1999). e, f Mean RTs of congruent (black) and
incongruent (grey) visual (e) and auditory (f) trials for each bin. g, h Mean
effect size of the RT difference between incongruent and congruent RTs for
each bin on visual (g) and auditory (h) trials. In all four panels, error
bars represent the standard error of the mean
Discussion
The results of the five experiments reported in the present
study demonstrate the existence of several crossmodal
associations between auditory and visual stimuli. In
particular, we have replicated several of the traditional
results from the literature on sound symbolism (i.e. takete/
maluma and mil/mal) together with a finding from the
literature on crossmodal correspondences (i.e. the association between auditory pitch and visual size). Moreover,
we have demonstrated the existence of two new crossmodal correspondences, namely an association between
auditory pitch and the visual size of angles and between the
waveform of auditory stimuli and the roundedness of visual
shapes.

123

800

gr
.

maluma

In

takete

900

RTs (ms)

Fig. 2 The takete–maluma effect modulates observers' RTs. a Examples of stimulus–response key
assignment (top congruent; bottom incongruent). b Mean RTs for congruent and incongruent trials
on visual and auditory trials. Error bars represent the standard error of the mean across
participants and the asterisks indicate a statistically significant difference (p < .05).
c Scatter and bagplot of participants' mean visual RTs on congruent versus incongruent trials.
d Scatter and bagplot of participants' mean auditory RTs on congruent versus incongruent trials.
The cross at the centre of the bagplot represents the centre of mass of the bivariate
distribution of empirical data (i.e. the point with the largest halfspace depth), the dark grey
area (i.e. the bag) includes the 50 % of the data with the largest depth, the light grey polygon
contains all the non-outlier data points and the stars represent the outliers (Rousseeuw et al.
1999). e, f Mean RTs of congruent (black) and incongruent (grey) visual (e) and auditory (f)
trials for each bin. g, h Mean effect size of the RT difference between incongruent and congruent
RTs for each bin on visual (g) and auditory (h) trials. In all four panels, error bars represent
the standard error of the mean


Given that in the IAT, only one stimulus is presented at any given time, and given that both modalities were
equally relevant to the task, the present results cannot,
unlike previous findings, be interpreted in terms of the
costs and/or benefits associated with the simultaneous

presentation of certain combinations of stimuli, nor in terms of a failure of selective attention (see Marks 2004;
Spence 2011), nor in terms of costs and benefits of multisensory integration (see Parise and Spence 2008, 2009;
Parise et al. 2012). In classic interference tasks, two stimuli

Fig. 3 Pitch-size compatibility modulates observers' RTs. a Examples of stimulus–response key
assignment (top congruent; bottom incongruent). b Mean RTs for congruent and incongruent trials
on visual and auditory trials. Error bars represent the standard error of the mean across
participants and the asterisks indicate a statistically significant difference (p < .05).
c Scatter and bagplot of participants' mean visual RTs on congruent versus incongruent trials.
d Scatter and bagplot of participants' mean auditory RTs on congruent versus incongruent trials.
The cross at the centre of the bagplot represents the centre of mass of the bivariate
distribution of empirical data (i.e. the point with the largest halfspace depth), the dark grey
area (i.e. the bag) includes the 50 % of the data with the largest depth, the light grey polygon
contains all the non-outlier data points and the stars represent the outliers (Rousseeuw et al.
1999). e, f Mean RTs of congruent (black) and incongruent (grey) visual (e) and auditory (f)
trials for each bin. g, h Mean effect size of the RT difference between incongruent and congruent
RTs for each bin on visual (g) and auditory (h) trials. In all four panels, error bars represent
the standard error of the mean


are always presented at more or less the same time with participants being instructed to respond to only one of
them. It is therefore unclear how much of the reported
effects are due to the presence of an irrelevant stimulus and
how much to the effect of stimulus compatibility. Given


that these interpretational uncertainties are not present in the IAT, the present results qualify as a more genuine effect
of stimulus compatibility.
Interestingly, all of the crossmodal correspondences
studied here had effect sizes that were similar in magnitude


Fig. 4 Pitch-angle compatibility modulates observers' RTs. a Examples of stimulus–response key
assignment (top congruent; bottom incongruent). b Mean RTs for congruent and incongruent trials
on visual and auditory trials. Error bars represent the standard error of the mean across
participants and the asterisks indicate a statistically significant difference (p < .05).
c Scatter and bagplot of participants' mean visual RTs on congruent versus incongruent trials.
d Scatter and bagplot of participants' mean auditory RTs on congruent versus incongruent trials.
The cross at the centre of the bagplot represents the centre of mass of the bivariate
distribution of empirical data (i.e. the point with the largest halfspace depth), the dark grey
area (i.e. the bag) includes the 50 % of the data with the largest depth, the light grey polygon
contains all the non-outlier data points and the stars represent the outliers (Rousseeuw et al.
1999). e, f Mean RTs of congruent (black) and incongruent (grey) visual (e) and auditory (f)
trials for each bin. g, h Mean effect size of the RT difference between incongruent and congruent
RTs for each bin on visual (g) and auditory (h) trials. In all four panels, error bars represent
the standard error of the mean


(see Fig. 6). This result suggests that crossmodal correspondences involving elementary stimulus features, such as
pitch and size, and those involving more complex stimuli,
such as nonsense words and line drawings, are equally
effective in inducing crossmodal compatibility effects.


Fig. 5 Waveform-roundedness compatibility modulates observers' RTs. a Examples of
stimulus–response key assignment (top congruent; bottom incongruent). b Mean RTs for congruent
and incongruent trials on visual and auditory trials. Error bars represent the standard error of
the mean across participants and the asterisks indicate a statistically significant difference
(p < .05). c Scatter and bagplot of participants' mean visual RTs on congruent versus incongruent
trials. d Scatter and bagplot of participants' mean auditory RTs on congruent versus incongruent
trials. The cross at the centre of the bagplot represents the centre of mass of the bivariate
distribution of empirical data (i.e. the point with the largest halfspace depth), the dark grey
area (i.e. the bag) includes the 50 % of the data with the largest depth, the light grey polygon
contains all the non-outlier data points and the stars represent the outliers (Rousseeuw et al.
1999). e, f Mean RTs of congruent (black) and incongruent (grey) visual (e) and auditory (f)
trials for each bin. g, h Mean effect size of the RT difference between incongruent and congruent
RTs for each bin on visual (g) and auditory (h) trials. In all four panels, error bars represent
the standard error of the mean


Moreover, our results demonstrate that all of the crossmodal correspondences tested here affected participants'
responses to visual and auditory stimuli in much the same
way. This also suggests that crossmodal compatibility
effects are neither modality-specific nor modality-dependent (see Evans and Treisman 2010) and are hence
consistent with there being a unique supramodal mechanism coding for crossmodal correspondences. Taken
together, these results would appear to indicate that all
crossmodal correspondences may be equally effective in


Fig. 6 Comparison of the effect size (d-score) for vision and audition
between the five experiments. Note that all of the crossmodal
correspondences had a very similar effect size. Error bars represent
the standard error of the mean

terms of modulating participants' RTs, and thus suggest that those compatibility effects are indeed all-or-none
effects.
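As a rough illustration of the effect-size comparison summarised in Fig. 6, the sketch below computes one D-score per participant and then the group mean with its standard error. It assumes a D-score defined as the difference between the mean incongruent and the mean congruent RT, divided by the standard deviation of all RTs pooled over both block types. That definition, the function names and the RT values are illustrative assumptions, not the authors' analysis code or data.

```python
from math import sqrt
from statistics import mean, stdev


def d_score(congruent_rts, incongruent_rts):
    """Mean RT difference (incongruent - congruent) in units of the pooled SD.

    Assumed definition: the SD is taken over all RTs from both block types.
    """
    pooled_sd = stdev(list(congruent_rts) + list(incongruent_rts))
    return (mean(incongruent_rts) - mean(congruent_rts)) / pooled_sd


def mean_and_sem(values):
    """Group mean and standard error of the mean across participants."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical RTs (ms) for three made-up participants; not data from the study.
participants = {
    "p1": {"congruent": [520, 540, 560, 580], "incongruent": [600, 640, 660, 700]},
    "p2": {"congruent": [610, 650, 700, 720], "incongruent": [690, 720, 760, 800]},
    "p3": {"congruent": [480, 500, 530, 550], "incongruent": [540, 560, 600, 640]},
}

# One D-score per participant, then a group summary as plotted per experiment.
scores = {
    pid: d_score(rts["congruent"], rts["incongruent"])
    for pid, rts in participants.items()
}
group_mean, group_sem = mean_and_sem(list(scores.values()))
print(scores, group_mean, group_sem)
```

Because each score is computed from a single observer's responses, the same function would also yield the within-participant measure of association strength; comparing experiments then amounts to comparing the group means and their error bars.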
Traditionally, crossmodal correspondences have often
been considered to be the outcome of so-called weak
synaesthesia shared by all humans (see Martino and Marks
2001; Mulvenna and Walsh 2006; Rader and Tellegen
1987; Rudmin and Cappelli 1983; Simpson et al. 1956;
Ward et al. 2006). Sensory (as opposed to more conceptual) synaesthesia is a condition whereby stimulation in a
given sensory modality automatically elicits additional
idiosyncratic sensations, often in another unstimulated
sensory modality (Grossenbacher and Lovelace 2001). Due
to the involvement of multiple senses in this condition and
due to the fact that in synaesthesia, too, certain combinations of crossmodal stimuli lead to behavioural facilitation
and others to interference (Dixon et al. 2000; Mills 1999),
it has repeatedly been proposed that crossmodal correspondences and full-blown synaesthesia may represent two
extremes of the same continuum (Martino and Marks 2001;
Rader and Tellegen 1987; Svartdal and Iversen 1989).
However, a common trait of synaesthesia is its unidirectionality, that is, stimulation in one sensory modality (the
inducer) elicits a concurrent sensation in another sensory
modality (the concurrent) but not the other way round
(though see Cohen Kadosh et al. 2007a, b; Cohen Kadosh
and Henik 2007; Johnson et al. 2007; Meier and Rothen
2007, for rare exceptions). With regard to this point,
demonstrating that crossmodal correspondences similarly

influence a participant's responses to visual and auditory stimuli, the present data seem to argue against the weak
synaesthesia account. Instead, the present results suggest
that crossmodal correspondences and synaesthesia might
rather be two distinct empirical phenomena that just so
happen to share certain superficial similarities (Parise and
Spence in press). More likely, crossmodal correspondences
might reflect a tuning of the perceptual systems to the
statistical properties of the environment achieved through
evolution (Ludwig et al. 2011) and perceptual learning
(Ernst 2007; Xu et al. 2012).
Revealing a reliable compatibility effect even in the
fastest responses, the bin analysis of RTs demonstrates that
the crossmodal compatibility effects reported here have a
very rapid onset. The presence of an RT modulation in
responses faster than 400 ms would appear to rule out any
possible explanation of the results in terms of explicit
cognitive strategies and suggests that compatibility effects
due to crossmodal correspondences might be the outcome
of automatic processes. Nevertheless, it should be noted
that the responses falling in the slowest bins are likely to
reflect the joint contribution of both automatic processes
and cognitive strategies (Chiou and Rich 2012; Klapetek
et al. in press). These results are compatible with those
of another recent study by Kovic et al. (2010) on shape–word
crossmodal correspondences. There, an effect of
compatibility on evoked potentials was found as early as
140–180 ms after stimulus onset. Based on the sites and
latencies of the ERP components modulated by crossmodal
correspondences, Kovic and colleagues suggested that their
effect could be due to audiovisual integration occurring
during early sensory processing. Given that, in the present
study, only a single stimulus was presented on each trial,
however, multisensory integration cannot play a role in the
early onset of the compatibility effect reported here. More
generally, the present results question whether multisensory integration played any role at all in the compatibility
effects found by Kovic et al. (2010). In other words,
although it is known that crossmodal correspondences can
have an effect on multisensory integration (Parise and
Spence 2008, 2009), multisensory integration itself is not
necessary for crossmodal correspondences to induce reliable effects on behaviour.
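The bin analysis referred to above can be sketched as follows. This is a minimal illustration that assumes the common quantile-binning form of the analysis: RTs within each condition are ranked from fastest to slowest, cut into equal-sized bins, and the congruency effect is computed bin by bin. The number of bins (five here), the function names and the use of a raw RT difference rather than a standardised effect size are assumptions for illustration only.

```python
from statistics import mean


def bin_means(rts, n_bins=5):
    """Rank RTs from fastest to slowest and return the mean RT of each bin."""
    ordered = sorted(rts)
    n = len(ordered)
    if n < n_bins:
        raise ValueError("need at least one RT per bin")
    # Integer bin edges give near-equal bin sizes and never an empty bin.
    edges = [i * n // n_bins for i in range(n_bins + 1)]
    return [mean(ordered[edges[i]:edges[i + 1]]) for i in range(n_bins)]


def binned_effect(congruent_rts, incongruent_rts, n_bins=5):
    """Incongruent-minus-congruent mean RT per bin (first bin = fastest responses)."""
    congruent = bin_means(congruent_rts, n_bins)
    incongruent = bin_means(incongruent_rts, n_bins)
    return [i - c for c, i in zip(congruent, incongruent)]
```

A positive value in the first bin corresponds to the claim made in the text, namely that the compatibility effect is already present in the fastest responses.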
So far, many other studies have demonstrated the effects
of crossmodal correspondences on RTs (Marks 2004);
however, in most cases, the experimental paradigms did
not allow researchers to assess whether those effects
occurred at a perceptual or at a response selection level. In
a recent RT study, Evans and Treisman (2010) ruled out
the effects of response selection and found significant
effects of crossmodal correspondences on speeded classification tasks. This finding suggests that crossmodal correspondences might operate at a perceptual level and

modulate the speed of perceptual processing. Conversely, in the present study, only one stimulus was presented at a
time and the only variable that was manipulated experimentally was the response assignment. Therefore, we can
exclude any effect of crossmodal correspondences on
perceptual processing and argue that the effects reported
here likely occur at the level of response selection. Taken
together, the present results and those reported by Evans
and Treisman (2010) complement each other by demonstrating that crossmodal correspondences operate both at a
perceptual level and at the level of response selection and
highlight the wide-ranging effects of crossmodal correspondences on information processing.
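To make concrete the point that only the response assignment differs between conditions, the sketch below builds trial lists for a congruent and an incongruent block of the pitch-size experiment. The stimulus labels follow Fig. 3 (low pitch paired with large circles in congruent blocks); the key names, the number of repetitions and the function names are hypothetical and are not taken from the authors' procedure.

```python
import random

AUDITORY = ("low pitch", "high pitch")
VISUAL = ("large circle", "small circle")


def key_assignment(congruent):
    """Map each of the four stimuli to one of two response keys for one block."""
    low, high = AUDITORY
    large, small = VISUAL
    if congruent:
        left, right = (low, large), (high, small)
    else:
        left, right = (low, small), (high, large)
    mapping = {stimulus: "left" for stimulus in left}
    mapping.update({stimulus: "right" for stimulus in right})
    return mapping


def make_block(congruent, repetitions=10, seed=None):
    """Shuffled list of single-stimulus trials, each with its correct key."""
    mapping = key_assignment(congruent)
    trials = [
        {
            "stimulus": stimulus,
            "modality": "auditory" if stimulus in AUDITORY else "visual",
            "correct_key": key,
            "block": "congruent" if congruent else "incongruent",
        }
        for stimulus, key in mapping.items()
        for _ in range(repetitions)
    ]
    # Only one stimulus is presented per trial; the order is randomised.
    random.Random(seed).shuffle(trials)
    return trials
```

The stimulus set is identical in both block types, so any RT difference between them can only reflect the change in stimulus-to-key mapping.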
To the best of our knowledge, this is the first study specifically to have investigated the famous takete/maluma and
mil/mal effects using an indirect performance measure with
auditory verbal (rather than written) stimuli. Our results
therefore demonstrate that such effects cannot simply be
attributed to some kind of similarity between the shape of the
visual stimuli and the visual appearance of the written words
(see also Bremner et al., submitted; Brang et al. 2011;
Westbury 2005). Rather, the results reported here demonstrate that the similarity involves, at least in the early stages,
the physical features of the visual and the auditory stimuli.
These results move beyond simply replicating previous
findings showing that crossmodal compatibility can
speed up the processing of unimodal sensory stimuli: they
do so with a single technique that has been specifically
designed to measure associations
between stimuli. As mentioned in the Introduction, the
modified IAT utilized here has several advantages over
other traditional techniques. First, the IAT provides an
indirect measure of association; taken together,
the present results therefore suggest that all of the crossmodal correspondences investigated here are automatically encoded
by participants. This conclusion is further supported by the
bin analysis (see De Jong et al. 1994; Vallesi et al. 2005),
demonstrating that congruency effects modulated RTs even
in the fastest responses, which are supposedly less influenced by top-down cognitive control (though surely the effects in the
slowest bins are more likely to reflect the contribution of
both automatic and controlled processes). Second, given
that both modalities are task relevant, the IAT allows one
to easily measure how crossmodal compatibility affects the
processing of both visual and auditory stimuli.
Additionally, by ensuring that only a single (unimodal)
stimulus is presented at any one time, the IAT overcomes
every issue concerning potential spatiotemporal inconsistencies in the combined presentation of audiovisual signals.
When auditory and visual stimuli are jointly presented, any
offset in their relative position, such as when the visual
stimuli are presented on the screen while the auditory
stimuli are played over headphones, might alter

multisensory processing and hence interfere with the crossmodal congruency effects that are observed (e.g. see
Soto-Faraco et al. 2002). Similar problems also occur in the
temporal domain, where asynchronies between auditory
and visual stimuli might occur due to physical and neural
delays. Both physical delays (e.g. due to timing inaccuracies in the experimental set-up) and neural delays (e.g. due
to the auditory system being generally faster at transducing
signals than the visual system, see Spence and Squire 2003)
can underlie potential asymmetries in the effect of compatibility, whereby stimuli in a given modality can alter the
processing in a second modality (Chen and Spence 2011),
but not vice versa (see Evans and Treisman 2010).
Previous claims that sound symbolic effects are strong
have typically been based on the consistency of the
responses provided by a large number of participants (see
Robson 2011). In this regard, the IAT allows one to assess
the strength of crossmodal correspondences and sound
symbolic associations in a more subtle way than traditional
techniques. Being based on a large number of responses
from a single observer, the IAT also allows one to measure
the strength of crossmodal correspondences within individual participants, hence providing a measure of individual differences. Moreover, not relying on explicit
responses, the IAT might be suitable for investigating crossmodal correspondences and sound symbolism in special
populations, such as autistic individuals (who, according to
previous research, do not show direct evidence of sound
symbolism; Oberman and Ramachandran 2008; Ramachandran and Oberman 2007) or even non-human primates (e.g., see
Cowey and Weiskrantz 1975; Ludwig et al. 2011; Parker
and Easton 2004; Spence and Deroy 2012; Weiskrantz and
Cowey 1975; see also Premack and Premack 2003).
Nevertheless, it should be noted that the IAT also has
some drawbacks. For example, the frequent changes in
response assignment introduce noise in the data due to
learning and practice effects. Moreover, it is not clear on
which dimension the IAT operates. The IAT compatibility
effect can indeed arise not just due to the relevant features
themselves, but also due to the internal response that they
generate (i.e. people may associate two stimuli because of
any feeling of familiarity that they both engender, or because
of the hedonic response they elicit), though this is also
the case for other speeded classification paradigms. Nevertheless, together with the fact that the IAT provides a
standard method for measuring (implicit and explicit)
associations between a wide range of items, our results
suggest that the IAT should be used more extensively in
order to measure correspondence between crossmodal and
unimodal sensory signals and might be a key technique for
discovering novel correspondences.
All of the associations previously reported in the literature, and investigated in the present study, have been

successfully replicated using a modified version of the IAT. This procedure enabled us to demonstrate that the
compatibility effects elicited by crossmodal correspondences build up very quickly (as recently suggested by
ERP results where the crossmodal correspondence between
abstract visual shapes and words was modulated; see Kovic
et al. 2010) and are stable in size across modalities and a wide range
of stimuli, thus suggesting the existence of a single underlying automatic mechanism that deals with crossmodal
compatibility.
Acknowledgments Cesare Parise was supported by the Bernstein
Center for Computational Neuroscience, Tübingen, funded by the
German Federal Ministry of Education and Research (BMBF; FKZ:
01GQ1002).

References
Bernstein IH, Edelstein BA (1971) Effects of some variations in
auditory input upon visual choice reaction time. J Exp Psychol
87(2):241–247
Bien N, ten Oever S, Goebel R, Sack AT (2012) The sound of size:
crossmodal binding in pitch-size synesthesia: a combined TMS,
EEG and psychophysics study. Neuroimage 59:663–672
Blair IV (2002) The malleability of automatic stereotypes and
prejudice. Personal Soc Psychol Rev 6:242–261
Bozzi P, Flores D'Arcais G (1967) Experimental research on the
intermodal relationships between expressive qualities. Arch
Psicol Neurol Psichiatr 28(5):377–420
Brang D, Rouw R, Ramachandran VS, Coulson S (2011) Similarly
shaped letters evoke similar colors in grapheme–color synesthesia. Neuropsychologia 49:1355–1358
Bremner A, Caparos S, Davidoff J, de Fockert J, Linnell K, Spence C
(submitted) Bouba and Kiki in Namibia? Western shape-symbolism does not extend to taste in a remote population.
Cognition
Chen Y-C, Spence C (2011) Crossmodal semantic priming by
naturalistic sounds and spoken words enhances visual sensitivity.
J Exp Psychol Hum Percept Perform 37:1554–1568
Chiou R, Rich AN (2012) Cross-modality correspondence between
pitch and spatial location modulates attentional orienting.
Perception 41:339–353
Cohen J (1988) Statistical power analysis for the behavioral sciences.
Lawrence Erlbaum Associates, Hillsdale, NJ
Cohen Kadosh R, Henik A (2007) Can synaesthesia research inform
cognitive science? Trends Cogn Sci 11(4):177–184
Cohen Kadosh R, Cohen Kadosh K, Henik A (2007a) The neuronal
correlate of bidirectional synesthesia: a combined event-related
potential and functional magnetic resonance imaging study.
J Cogn Neurosci 19(12):2050–2059
Cohen Kadosh R, Henik A, Walsh V (2007b) Small is bright and big
is dark in synaesthesia. Curr Biol 17(19):R834–R835
Cowey A, Weiskrantz L (1975) Demonstration of cross-modal
matching in rhesus monkeys, Macaca mulatta. Neuropsychologia 13(1):117–120
Crisinel AS, Spence C (2009) Implicit association between basic
tastes and pitch. Neurosci Lett 464(1):39–42
Crisinel AS, Spence C (2010) A sweet sound? Food names reveal
implicit associations between taste and pitch. Perception
39(3):417–425

Davis R (1961) The fitness of names to drawings: a cross-cultural
study in Tanganyika. Br J Psychol 52:259–268
De Jong R, Liang CC, Lauber E (1994) Conditional and unconditional
automaticity: a dual-process model of effects of spatial stimulus–response correspondence. J Exp Psychol Hum Percept Perform
20(4):731–750
Demattè M, Sanabria D, Spence C (2006) Cross-modal associations
between odors and colors. Chem Senses 31(6):531–538
Demattè M, Sanabria D, Spence C (2007) Olfactory-tactile compatibility effects demonstrated using a variation of the implicit
association test. Acta Psychol 124(3):332–343
Dixon MJ, Smilek D, Cudahy C, Merikle PM (2000) Five plus two
equals yellow. Nature 406(6794):365
Ernst MO (2007) Learning to integrate arbitrary signals from vision
and touch. J Vis 7(5):1–14
Evans KK, Treisman A (2010) Natural cross-modal mappings
between visual and auditory features. J Vis 10(1):1–12
Fiedler K, Messner C, Bluemke M (2006) Unresolved problems with
the "I", the "A", and the "T": a logical and psychometric
critique of the implicit association test (IAT). Eur Rev Soc
Psychol 17:74–147
Freed DJ (1990) Auditory correlates of perceived mallet hardness for
a set of recorded percussive sound events. J Acoust Soc Am
87(1):311–322
Gallace A, Spence C (2006) Multisensory synesthetic interactions in
the speeded classification of visual size. Percept Psychophys
68(7):1191–1203
Greenwald AG, McGhee DE, Schwartz JLK (1998) Measuring
individual differences in implicit cognition: the implicit association test. J Pers Soc Psychol 74(6):1464–1480
Grossenbacher PG, Lovelace CT (2001) Mechanisms of synesthesia:
cognitive and physiological constraints. Trends Cogn Sci
5(1):36–41
Hinton L, Nichols J, Ohala JJ (eds) (2006) Sound symbolism.
Cambridge University Press, Cambridge
Johnson A, Jepma M, De Jong R (2007) Colours sometimes count:
awareness and bidirectionality in grapheme-colour synaesthesia.
Q J Exp Psychol 60(10):1406–1422
Klapetek A, Ngo MK, Spence C (in press) Do crossmodal
correspondences enhance the facilitatory effect of auditory cues
on visual search? Atten Percept Psychophys
Klatzky RL, Pai DK, Krotkov EP (2000) Perception of material from
contact sounds. Presence Teleoper Virtual Environ 9(4):399–410
Klein R, Brennan M, Gilani A (1987) Covert cross-modality orienting
of attention in space. Paper presented at the Annual meeting of
the Psychonomic Society, Seattle, WA
Köhler W (1929) Gestalt psychology. Liveright, New York
Köhler W (1947) Gestalt psychology: an introduction to new concepts in
modern psychology. Liveright Publ. Corporation, New York, NY
Kovic V, Plunkett K, Westermann G (2010) The shape of words in the
brain. Cognition 114(1):19–28
Ludwig VU, Adachi I, Matsuzawa T (2011) Visuoauditory mappings
between high luminance and high pitch are shared by chimpanzees (Pan troglodytes) and humans. Proc Natl Acad Sci USA
108:20661–20665
Marks LE (1987) On cross-modal similarity: auditory–visual interactions in speeded discrimination. J Exp Psychol Hum Percept
Perform 13(3):384–394
Marks LE (1989) On cross-modal similarity: the perceptual structure
of pitch, loudness, and brightness. J Exp Psychol Hum Percept
Perform 15(3):586–602
Marks LE (2004) Cross-modal interactions in speeded classification.
In: Calvert GA, Spence C, Stein BE (eds) The handbook of
multisensory processes. MIT Press, Cambridge, MA, pp 85–106
Martino G, Marks LE (2001) Synesthesia: strong and weak. Curr Dir
Psychol Sci 10(2):61–65

Meier B, Rothen N (2007) When conditioned responses fire back:
bidirectional cross-activation creates learning opportunities in
synesthesia. Neuroscience 147(3):569–572
Melara RD, O'Brien TP (1987) Interaction between synesthetically
corresponding dimensions. J Exp Psychol Gen 116(4):323–336
Melara RD, O'Brien TP (1990) Effects of cuing on cross-modal
congruity. J Mem Lang 29(6):655–686
Mills CB (1999) Digit synaesthesia: a case study using a Stroop-type
test. Cogn Neuropsychol 16(2):181–191
Mulvenna CM, Walsh V (2006) Synaesthesia: supernormal integration? Trends Cogn Sci 10(8):350–352
Newman S (1933) Further experiments in phonetic symbolism. Am J
Psychol 45(1):53–75
Oberman LM, Ramachandran VS (2008) Preliminary evidence for
deficits in multisensory integration in autism spectrum disorders:
the mirror neuron hypothesis. Soc Neurosci 3(3–4):348–355
Osgood CE (1960) The cross-cultural generality of visual–verbal
synesthetic tendencies. Behav Sci 5(2):146–169
Osgood CE, Suci G, Tannenbaum P (1957) The measurement of
meaning. University of Illinois Press, Urbana
Oyama T, Yamada H, Iwasawa H (1998) Synesthetic tendencies as
the basis of sensory symbolism: a review of a series of
experiments by means of semantic differential. Psychologia
41:203–215
Parise CV, Pavani F (2011) Evidence of sound symbolism in simple
vocalizations. Exp Brain Res 214(3):373–380
Parise CV, Spence C (2008) Synesthetic congruency modulates the
temporal ventriloquism effect. Neurosci Lett 442(3):257–261
Parise CV, Spence C (2009) When birds of a feather flock together:
synesthetic correspondences modulate audiovisual integration in
non-synesthetes. PLoS One 4(5):e5664
Parise CV, Spence C (2012) Assessing the associations between brand
packaging and brand attributes using an indirect performance
measure. Food Qual Prefer 24:17–23
Parise CV, Spence C (in press) Audiovisual crossmodal correspondences. In: Simner J, Hubbard EM (eds) Oxford handbook of
synaesthesia. Oxford University Press, Oxford
Parise CV, Spence C, Ernst MO (2012) When correlation implies
causation in multisensory integration. Curr Biol 22:46–49
Parker A, Easton A (2004) Cross-modal memory in primates: the
neural basis of learning about the multisensory properties of
objects and events. In: Calvert GA, Spence C, Stein BE (eds)
The handbook of multisensory processes. MIT Press, Cambridge, MA, pp 333–342
Poffenberger A, Barrows B (1924) The feeling value of lines. J Appl
Psychol 8(2):187–205
Premack D, Premack AJ (2003) Original intelligence: unlocking the
mystery of who we are. McGraw-Hill, New York
Rader C, Tellegen A (1987) An investigation of synesthesia. J Pers
Soc Psychol 52(5):981–987
Ramachandran VS, Oberman LM (2007) Broken mirrors: a theory of
autism. Sci Am Spec Ed 17(2):20–29
Robson D (2011) Language's missing link. New Sci 211(2821):30–33
Rogers SK, Ross AS (1968) A cross-cultural test of the Maluma–Takete phenomenon. Perception 4(1):105–106

Rousseeuw PJ, Ruts I, Tukey JW (1999) The bagplot: a bivariate
boxplot. Am Stat 53(4):382–387
Rudmin F, Cappelli M (1983) Tone-taste synesthesia: a replication.
Percept Mot Skills 56:118
Sapir E (1929) A study in phonetic symbolism. J Exp Psychol
12(3):225–239
Seo H-S, Arshamian A, Schemmer K, Scheer I, Sander T, Ritter G,
Hummel T (2010) Cross-modal integration between odors and
abstract symbols. Neurosci Lett 478:175–178
Shepherd GM (2012) Neurogastronomy: how the brain creates flavor
and why it matters. Columbia University Press, New York
Simpson RH, Quinn M, Ausubel DP (1956) Synesthesia in children:
association of colors with pure tone frequencies. J Genet Psychol
Res Theory Hum Dev 89(1):95–103
Soto-Faraco S, Lyons J, Gazzaniga M, Spence C, Kingstone A (2002)
The ventriloquist in motion: illusory capture of dynamic information across sensory modalities. Cogn Brain Res 14(1):139–146
Spence C (2011) Crossmodal correspondences: a tutorial review.
Atten Percept Psychophys 73(4):1–25
Spence C, Deroy O (2012) Are chimpanzees really synaesthetic?
i-Perception 3:316–318
Spence C, Squire S (2003) Multisensory integration: maintaining the
perception of synchrony. Curr Biol 13(13):R519–R521
Stevens JC, Marks LE (1965) Cross-modality matching of brightness
and loudness. Proc Natl Acad Sci USA 54(2):407–411
Stumpf K (1883) Tonpsychologie. S. Hirzel, Leipzig
Svartdal F, Iversen T (1989) Consistency in synesthetic experience to
vowels and consonants: five case studies. Scand J Psychol
30:220–227
Vallesi A, Mapelli D, Schiff S, Amodio P, Umiltà C (2005)
Horizontal and vertical Simon effect: different underlying
mechanisms? Cognition 96(1):B33–B43
Van den Doel K, Pai DK (1998) The sounds of physical shapes.
Presence 7(4):382–395
Walker P, Smith S (1985) Stroop interference based on the
multimodal correlates of haptic size and auditory pitch. Perception 14(6):729–736
Walker P, Bremner J, Mason U, Spring J, Mattock K, Slater A,
Johnson S (2010) Preverbal infants' sensitivity to synaesthetic
cross-modality correspondences. Psychol Sci 21(1):21–25
Ward J, Huckstep B, Tsakanikos E (2006) Sound-colour synaesthesia:
to what extent does it use cross-modal mechanisms common to
us all? Cortex 42(2):264–280
Watson AB, Pelli DG (1983) QUEST: a Bayesian adaptive psychometric method. Percept Psychophys 33(2):113–120
Weiskrantz L, Cowey A (1975) Cross-modal matching in the rhesus
monkey using a single pair of stimuli. Neuropsychologia
13(3):257–261
Westbury C (2005) Implicit sound symbolism in lexical access:
evidence from an interference task. Brain Lang 93(1):10–19
Xu J, Yu L, Rowland BA, Stanford TR, Stein BE (2012) Incorporating cross-modal statistics in the development and maintenance
of multisensory integration. J Neurosci 32:2287–2298
Zigler MJ (1930) Tone shapes: a novel type of synaesthesia. J Gen
Psychol 3:276–287
