
Exp Brain Res (2012) 220:319–333 DOI 10.1007/s00221-012-3140-6

RESEARCH ARTICLE


Audiovisual crossmodal correspondences and sound symbolism: a study using the implicit association test

Cesare V. Parise · Charles Spence

Received: 26 January 2012 / Accepted: 2 June 2012 / Published online: 17 June 2012
© Springer-Verlag 2012

Abstract A growing body of empirical research on the topic of multisensory perception now shows that even non-synaesthetic individuals experience crossmodal correspondences, that is, apparently arbitrary compatibility effects between stimuli in different sensory modalities. In the present study, we replicated a number of classic results from the literature on crossmodal correspondences and highlight the existence of two new crossmodal correspondences using a modified version of the implicit association test (IAT). Given that only a single stimulus was presented on each trial, these results rule out selective attention and multisensory integration as possible mechanisms underlying the reported compatibility effects on speeded performance. The crossmodal correspondences examined in the present study all gave rise to very similar effect sizes, and the compatibility effect had a very rapid onset, thus speaking to the automatic detection of crossmodal correspondences. These results are further discussed in terms of the advantages of the IAT over traditional techniques for assessing the strength and symmetry of various crossmodal correspondences.

Electronic supplementary material The online version of this article (doi:10.1007/s00221-012-3140-6) contains supplementary material, which is available to authorized users.

C. V. Parise · C. Spence
Department of Experimental Psychology, University of Oxford, Oxford, UK

C. V. Parise
Max Planck Institute for Biological Cybernetics, Tübingen, Germany

C. V. Parise
Bernstein Centre for Computational Neuroscience, Tübingen, Germany

C. V. Parise (✉)
Department of Cognitive Neuroscience and Center of Excellence Cognitive Interaction Technology (CITEC), University of Bielefeld, Universitaetstr. 25, W3-246, 33615 Bielefeld, Germany
e-mail: cesare.parise@tuebingen.mpg.de

Keywords Multisensory perception · Audition · Vision · Crossmodal correspondences · Sound symbolism · Implicit association test

Introduction

Human observers can readily match apparently unrelated stimuli from different sensory modalities with a striking degree of consistency. For example, most people unquestioningly agree that an object called mil is likely going to be smaller than an object called mal (e.g. Newman 1933; Sapir 1929), and that a lemon is “fast” rather than “slow”. For more than a century now, cognitive scientists have been studying such examples of crossmodal correspondences (Stumpf 1883), which, over the years, have been labelled using a wide variety of terms such as “crossmodal similarities”, “synaesthetic associations”, “weak synaesthesia”, and, in the case of linguistic stimuli, “sound symbolic associations” (see Spence 2011 for a review). Broadly speaking, crossmodal correspondences can be defined as congruency effects between stimuli presented (either physically present, or else merely imagined) in different sensory modalities that result from the “expected” (i.e. a priori) mapping between those sensory cues (Parise and Spence in press).

One of the most famous crossmodal correspondences was proposed back in 1929 by Wolfgang Köhler. He gave participants two nonsense words, takete and baluma (later baluma was renamed maluma, Köhler 1947), and two outline drawings, a spiky and a rounded one. Köhler had his participants match the words and drawings in the most natural way. Surprisingly, most observers immediately matched the word takete with the spiky figure and the word baluma with the rounded shape. In the following years, these original observations were replicated by many researchers in different ethnic populations, providing virtually identical results (e.g. Bremner et al., submitted; Davis 1961; Rogers and Ross 1968; see also Hinton et al. 2006), thus speaking to the robustness of the underlying phenomenon (Robson 2011).

In the wake of Köhler’s (1929) and Sapir’s (1929) early observations, the study of crossmodal correspondences gained in popularity, and many other examples have subsequently come to light. In the most basic case, crossmodal correspondences involve a mapping between basic unimodal sensory dimensions present in different sensory modalities: so, for example, a crossmodal correspondence has been demonstrated between auditory pitch and the size of objects presented either visually or haptically. That is, smaller objects are typically matched with higher pitched sounds and larger objects with lower pitched sounds (Evans and Treisman 2010; Gallace and Spence 2006; Parise and Spence 2008, 2009; Walker and Smith 1985). Such correspondences between crossmodal sensory dimensions sometimes appear to reflect the natural correlation between the physical properties of the external world. The pitch–size correspondence, for example, might mirror the properties of acoustic resonance, whereby, ceteris paribus, larger bodies resonate at lower frequencies than smaller bodies.

Other examples of crossmodal correspondences involving basic audiovisual dimensions include the following: pitch (high vs. low) and shape (angular vs. rounded, Marks 1987), pitch and elevation (high vs. low spatial positions, Bernstein and Edelstein 1971; Chiou and Rich 2012), and loudness and brightness (Marks 1987; see Marks 2004; Parise and Spence in press; Spence 2011, for reviews of the crossmodal correspondences that have been documented to date). Once again, it could be argued that such correspondences might rely on the acoustic properties of physical bodies: harder objects, for example, tend to resonate at higher frequencies and break into sharper pieces than softer bodies (e.g. Freed 1990; Klatzky et al. 2000; Van den Doel and Pai 1998; Walker et al. 2010), hence perhaps explaining the crossmodal correspondence between pitch and the angularity of objects (Walker et al. 2010). Moreover, larger objects (that normally resonate at lower frequencies) are generally also heavier and are therefore unlikely to fly or be found at a higher elevation, hence perhaps explaining the correspondence between pitch and elevation.

Over the years, researchers have utilized a number of experimental techniques in order to measure such crossmodal congruency effects. From the early studies, in which observers were explicitly required to match pairs of auditory and visual stimuli (see Davis 1961; Köhler 1929; Sapir 1929; Zigler 1930), cognitive scientists switched to more sophisticated paradigms that allow for repeated measures (rather than just a single measure, as in many early studies) within a single observer, often without relying on introspection. The most common technique over the last two or three decades has relied on the modulation of reaction times (RTs) in speeded classification tasks in which participants have to respond to stimuli in a target sensory modality, while trying to ignore task-irrelevant distractor stimuli presented in a different sensory modality (see Marks 2004; Spence 2011, for reviews). Despite the fact that the stimuli presented in one sensory modality are completely task-irrelevant, participants’ RTs are often faster for certain combinations of (relevant and irrelevant) stimuli and slower for others. Based on this crossmodal interference on response latencies, stimulus combinations leading to faster RTs (and more correct responses) are considered to be compatible, while those leading to longer RTs (and more incorrect responses) are considered as being incompatible.

Other techniques instead rely on explicit measures of similarity. This is the case of the crossmodal matching task (Stevens and Marks 1965), where participants have to adjust the magnitude of a stimulus along a given sensory dimension (e.g. loudness) until it “matches” the magnitude of a stimulus on another sensory dimension in a different modality (e.g. brightness). A more constrained variant of this technique was later proposed by Marks (1989). He had participants select which of two stimuli (whose properties were parametrically manipulated on a trial-by-trial basis) in a given modality better matched a target stimulus in a different sensory modality. In addition to these techniques, many other approaches have also been used in the study of crossmodal correspondences over the years. They include the use of the semantic differential technique (Bozzi and Flores D’Arcais 1967; Osgood 1960; Osgood et al. 1957; Oyama et al. 1998; see also Poffenberger and Barrows 1924), preferential looking (Walker et al. 2010), cuing (whereby the crossmodal congruence or incongruence of a cue preceding the target stimulus has been shown to modulate RTs to the target stimulus, Melara and O’Brien 1990; see also Klein et al. 1987; Chiou and Rich 2012), analysis of speech sounds (Parise and Pavani 2011), and EEG (Bien et al. 2012; Kovic et al. 2010; Seo et al. 2010), to name but a few.

In spite of providing important insights into the underlying nature of crossmodal correspondences, most of these techniques suffer from various methodological limitations that potentially compromise the interpretation of many of these empirical results. Explicit measures of association, such as the crossmodal matching task in its various forms or the semantic differential technique, rely on observers’ introspection. Therefore, the results critically depend on observers’ ability (and/or willingness) to report on their introspections. Such limitations have been overcome by indirect techniques based on RTs, such as the speeded classification task. Nevertheless, these tasks also exhibit a number of further limitations. First, given that two stimuli in different modalities have often been presented at the same time in each trial, any stimulus-dependent modulation of response latencies might reflect some form of failure of selective attention (e.g. Gallace and Spence 2006; Melara and O’Brien 1987), with participants being unable to fully focus their attention on the target stimuli and ignore the distracting stimuli. Moreover, while the speeded classification paradigm provides evidence that compatibility between, say, auditory and visual stimuli affects the processing of visual information, it does not address the reciprocal effects on audition within the same experiment. On top of the various limitations of the traditional techniques, such methodological fragmentation inevitably leads to further difficulties when it comes to trying to compare the results from different studies.

In order to study the build-up of crossmodal correspondences and rule out selective attention as a possible explanation, in the present paper, we wanted to measure the compatibility between crossmodal stimuli using a variant of the implicit association test (IAT, Greenwald et al. 1998). Over recent years, the IAT has proved to be one of the most popular tools with which to study the association (both implicit and explicit) between different items, and it overcomes all of the above-mentioned issues. In the simplified version of the task used here, participants respond as rapidly as possible to a series of stimuli, taken from a set of four stimuli (i.e. two auditory stimuli, mil and mal, and two visual stimuli, a small and a large circle). They use just two response keys, with two stimuli (i.e. one auditory and one visual) being assigned to the same response key in a given block of trials. Previously, it has been demonstrated that participants’ performance improves when the set of stimuli assigned to a given response key are also associated with each other (the compatible conditions), as compared with conditions in which a set of unrelated (or incompatible) stimuli are assigned to the same response key (the incompatible conditions; Greenwald et al. 1998). In the present study, we experimentally manipulated the assignment of the four stimuli to each response key from block to block during the course of the experiment, so that half of the blocks were assumed to be compatible and the other half, incompatible. Discrepancies in RT between different stimulus–response key assignments are taken to provide evidence of the existence of a compatibility effect: shorter RTs indicate associations between the stimuli assigned to the same response key, while longer RTs indicate weaker associations. One important feature associated with the use of such a technique (over, say, the traditional speeded classification task) is that it provides evidence of associations between items from both visual and auditory trials within a single experimental session. Also, given that on each trial only one stimulus is presented, the IAT rules out possible accounts in terms of selective attention (where selective attention is what participants need in order to choose between two simultaneously presented, but incongruent, competing stimuli). Moreover, the present task is based on a standard technique that has proven to be very sensitive to associations between stimuli from a variety of categories, and it is flexible enough to be adapted to crossmodal settings (Crisinel and Spence 2009, 2010; Demattè et al. 2006, 2007; Parise and Spence 2012). In spite of its widely agreed-upon name, it should be noted that the IAT compatibility effect reflects the outcome of both explicit and implicit associations (Blair 2002; Fiedler et al. 2006).

In the present study, we wanted to use the IAT in order to replicate some well-known examples of crossmodal correspondence (including classic examples from the literature on sound symbolism) never before studied using the IAT, namely takete–maluma and mil–mal, and the association between auditory pitch and the size of visual objects. Moreover, we also wanted to investigate the existence of two additional postulated crossmodal correspondences, namely that between auditory pitch and the size of visual angles, and that between the waveform of auditory stimuli and the spikiness–roundedness of visual stimuli. Given that only a single stimulus was presented on any given trial, providing evidence of crossmodal correspondences using the IAT would rule out any potential account of compatibility effects in terms of selective attention or multisensory integration. Furthermore, we wanted to investigate the effects of crossmodal correspondences in more detail, comparing the compatibility effect size across conditions and studying their build-up with a bin analysis of RTs in order to see how long it takes for the compatibility effect to emerge. Since we used the very same methods in all five of the experiments reported in the present study, the description of the experiments is combined into a single Methods section.

Methods

Participants

Fifty participants (twenty-six females) took part in the present study (ten participants for each of the five experiments). Their mean age was twenty-three years (range, 18–35 years), and all of the participants reported normal or corrected-to-normal vision and audition. The gender and age of the participants were roughly matched across experiments. Each session lasted for approximately 35 min, and participants received a £5 (UK Sterling) voucher in return for taking part in the study. The experimental procedure was approved by the Ethics Committee of the Department of Experimental Psychology, University of Oxford.

Apparatus and materials

The presentation of the stimuli and the collection of the responses were controlled by a personal computer running the Psychtoolbox v.2.54 (Brainard 1997; Pelli 1997). Each participant was seated in front of a 21″ CRT computer monitor with a resolution of 1280 × 1024 pixels (75 Hz refresh rate), flanked by a pair of loudspeakers. Participants responded to the target stimuli by pressing a key on a computer keyboard. The experiment was conducted in a dark and quiet room.

Stimuli

Two visual stimuli and two auditory stimuli were used in each experiment. Details of the stimuli used in each experiment are reported below (see Table 1), and the auditory stimuli can be found online as supplemental material.

Experiment 1: The visual stimuli consisted of two light grey circles, one subtending 5 cm and the other subtending 2 cm (5.2° vs. 2.1° of visual angle, respectively), presented at the centre of the screen against a white background. The auditory stimuli consisted of the words “mil” and “mal” pronounced by a female voice.

Experiment 2: The visual stimuli consisted of two shapes, one spiky, the other curved (Köhler 1929, see Fig. 3), respectively subtending 6.24° × 3.12° and 4.16° × 4.68° of visual angle, and presented at the centre of the screen against a white background. The auditory stimuli consisted of the words “takete” and “maluma” pronounced by a female voice.

Experiment 3: The visual stimuli consisted of two light grey circles, one subtending 5 cm and the other subtending 2 cm (5.2° vs. 2.1° of visual angle, respectively), presented at the centre of the screen against a white background. The auditory stimuli consisted of two pure tones, a high- and a low-pitched one (4,500 and 300 Hz, respectively). The perceived intensities (loudness) of the 300 ms tones were individually matched for each participant with a brief preliminary psychophysical experiment based on the QUEST procedure (Watson and Pelli 1983).

Experiment 4: The visual stimuli consisted of two angles (i.e. arrowheads), one acute and the other obtuse (42° and 126°, respectively), subtended by two segments, each segment subtending 4.3° of visual angle. The auditory stimuli were the same as those used in Experiment 3.

Experiment 5: The visual stimuli consisted of an angle and a curve, both subtending a visual angle of 6.8° × 2.9°, presented at the centre of the screen against a white background. The auditory stimuli consisted of two tones with a frequency of 440 Hz and varying in waveform, with one being sinusoidal and the other being square. The perceived intensities (loudness) of the two tones were individually matched for each participant with a brief preliminary psychophysical experiment based on the QUEST procedure (Watson and Pelli 1983).
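For readers who wish to reproduce comparable auditory stimuli, the pure tones and the square wave can be synthesized digitally. The sketch below is illustrative only: the 44.1 kHz sampling rate and unit amplitude are assumptions (the loudness matching performed with QUEST in the original study is not reproduced here).

```python
import math

def make_tone(freq_hz, dur_s=0.3, sr=44100, waveform="sine"):
    """Return a 300 ms mono tone as a list of floats in [-1, 1].

    waveform="sine" gives a pure tone (cf. the 4,500 and 300 Hz tones
    of Experiments 3-4); waveform="square" gives a square wave
    (Experiment 5), derived here as the sign of the sine.
    """
    n = int(dur_s * sr)  # number of samples
    out = []
    for i in range(n):
        s = math.sin(2 * math.pi * freq_hz * i / sr)
        if waveform == "square":
            s = 1.0 if s >= 0 else -1.0
        out.append(s)
    return out

high_tone = make_tone(4500)                     # Experiments 3-4
low_tone = make_tone(300)                       # Experiments 3-4
square_440 = make_tone(440, waveform="square")  # Experiment 5
sine_440 = make_tone(440, waveform="sine")      # Experiment 5
```

In practice, such samples would be written to a sound buffer (e.g. via Psychtoolbox or a wave-file writer) and attenuated to the individually matched loudness.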

Procedure

The participants were instructed to maintain their fixation on the centre of the screen and to respond to the stimuli as rapidly and accurately as possible by pressing one of two keys on a computer keyboard. Two patches, representing an arrow pointing either to the left or to the right, marked the relevant response keys. Each trial began with the presentation of a red fixation point at the centre of the screen for a randomized interval of 500–600 ms. After the removal of the fixation point, there followed a random interstimulus interval of 300–400 ms. Next, the target stimulus, either visual or auditory, was presented. The visual stimulus remained on the screen for 300 ms before being removed. The auditory stimuli, also lasting for 300 ms (or approximately 300 ms in Experiments 1 and 2), were presented only once on each trial. Feedback in the form of a red cross was provided after each incorrect response and remained on the screen for 500 ms.

At the beginning of each block of trials, the participants received new instructions about the mapping between the stimuli and the appropriate response for the upcoming block of experimental trials. On each block of trials, two of the four stimuli, one figure and one word, were assigned to either the left or the right key and the remaining stimuli to the other response key. The instructions remained visible on the screen until the participants initiated the new block of trials by pressing the space bar. The mapping of the stimuli onto the response keys was manipulated during the experiment, thus generating four different pairings, of which two were hypothesized to be compatible (e.g. in Experiment 1, the small circle and the word “mil” associated with the same key; Sapir 1929), while the remaining two were judged as being incompatible (e.g. in Experiment 1, the large circle and the word “mil” associated with the same key). Note that a block of trials was considered as being compatible when the two stimuli associated with a given response key were hypothesized to be associated with one another. Conversely, a block of trials was considered as being incompatible when the hypothetically associated stimuli were mapped onto different response keys. Each of the four pairings was repeated six times for a total of 24 randomly alternating blocks. Each block consisted of 20 trials (each stimulus repeated five times), giving rise to a total of 480 trials. Participants were allowed to take a pause at the end of each block. Reaction times (RTs) and the accuracy of participants’ responses were collected.

Table 1 Stimuli and results of the statistical analysis of the five experiments reported in the present study (see main text for further details regarding the stimuli and the analysis; in the original table, light grey double-headed arrows connect compatible audiovisual pairs of stimuli)

Exp  Visual stimuli           Auditory stimuli       IAT results
1    Small vs. large circle   /mil/ vs. /mal/        Congruency: F(1,9) = 23.84, p < .001; Modality: F(1,9) = 33.42, p < .001; Compatibility × Modality: F < 1, n.s.
2    Spiky vs. rounded shape  /takete/ vs. /maluma/  Congruency: F(1,9) = 22.08, p = .001; Modality: F(1,9) = 38.26, p < .001; Compatibility × Modality: F(1,9) = 2.45, p = .15
3    Small vs. large circle   4,500 vs. 300 Hz tone  Congruency: F(1,9) = 11.07, p = .009; Modality: F(1,9) = 12.92, p = .006; Compatibility × Modality: F < 1, n.s.
4    Acute vs. obtuse angle   4,500 vs. 300 Hz tone  Congruency: F(1,9) = 16.54, p = .003; Modality: F(1,9) = 13.42, p = .006; Compatibility × Modality: F < 1, n.s.
5    Angle vs. curve          Square vs. sine wave   Congruency: F(1,9) = 5.71, p = .041; Modality: F(1,9) = 21.45, p = .001; Compatibility × Modality: F(1,9) = 2.45, p = .15
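The block structure described above (four stimulus–response pairings, each repeated six times, with 20 trials per block) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the stimulus labels and the seeded shuffle are assumptions made for the example.

```python
import random

def make_blocks(visual, auditory, seed=0):
    """Build a 24-block IAT session: 4 key assignments x 6 repetitions,
    20 trials per block, 480 trials in total."""
    rng = random.Random(seed)  # seeded only for reproducibility here
    v_small, v_large = visual
    a_small, a_large = auditory
    pairings = [
        # (left-key pair, right-key pair, hypothesized status)
        ((a_small, v_small), (a_large, v_large), "compatible"),
        ((a_large, v_large), (a_small, v_small), "compatible"),
        ((a_small, v_large), (a_large, v_small), "incompatible"),
        ((a_large, v_small), (a_small, v_large), "incompatible"),
    ]
    blocks = []
    for left, right, status in pairings * 6:  # each pairing six times
        trials = (list(left) + list(right)) * 5  # each stimulus 5x = 20 trials
        rng.shuffle(trials)
        blocks.append({"left": left, "right": right,
                       "status": status, "trials": trials})
    rng.shuffle(blocks)  # 24 randomly alternating blocks
    return blocks

# Experiment 1 stimulus set (the names are purely illustrative labels)
blocks = make_blocks(("small_circle", "large_circle"), ("mil", "mal"))
```

Half of the resulting blocks are hypothesized compatible and half incompatible, matching the design described in the text.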

Instructions about the mapping between the stimuli and the relevant response keys consisted of a schematic representation of the two response keys with the corresponding visual stimuli displayed next to them. The participants were required to press two keys, “1” and “2”, on the keyboard to listen to the stimulus associated, respectively, with the left and right response keys. There were no time limits to learn the new stimulus–response mapping, and participants were encouraged to listen to the auditory stimuli as much as they wanted, until they were sure that they had learnt the new assignment.

Results

The first four trials of each block, in which the participants were presumably still learning the new stimulus–response mapping, were not included in the data analysis. In order to normalize the RT distributions, the RT data were log-transformed, and responses that fell three standard deviations above or below the individual means were excluded from further analyses. Overall, less than 1 % of the trials were removed from the analysis. The RTs from those trials in which participants responded correctly were submitted to a repeated-measures analysis of variance (ANOVA) with the within-participants factors of compatibility (compatible versus incompatible) and stimulus type (words versus pictures). The results of the analysis are reported in Tables 1 and 2 (see Figs. 1, 2, 3, 4, 5). Note that all of the statistical inferences are fully replicated when analysing untransformed data after the removal of outliers, but without discarding the first four trials of each block.¹
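A minimal sketch of this preprocessing step follows. It assumes, as a simplification, that the 3 SD exclusion threshold is computed over the trials passed in; in the paper, the threshold was based on each individual participant's mean.

```python
import math
import statistics

def preprocess_rts(rts, correct, first_n_to_drop=4):
    """Log-transform RTs, drop the first trials of a block, drop
    incorrect responses, and exclude responses more than 3 SD above
    or below the mean of the log-transformed RTs.

    rts: reaction times in seconds; correct: parallel booleans.
    Returns the retained log-RTs.
    """
    log_rts = [math.log(rt) for rt in rts[first_n_to_drop:]]
    flags = correct[first_n_to_drop:]
    m = statistics.mean(log_rts)
    sd = statistics.stdev(log_rts)
    return [lr for lr, ok in zip(log_rts, flags)
            if ok and abs(lr - m) <= 3 * sd]
```

The log-transform reduces the right skew typical of RT distributions, so that a symmetric 3 SD cut-off is more defensible than on the raw scale.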

Overall, a significant crossmodal congruency effect was observed in all five experiments, indicating that all of the crossmodal correspondences investigated here significantly modulated the latency of participants’ behavioural responses. Moreover, in all five experiments, there was also a significant effect of stimulus modality, showing that participants responded more rapidly to visual than to auditory stimuli overall (see also Evans and Treisman 2010). There was no significant interaction between compatibility and stimulus modality in any of the experiments, though in Experiments 2 and 5 the interaction term approached statistical significance.

In order to study the build-up of the compatibility effect, and thus to determine at which stage of information processing it was taking place, we ran a bin analysis of RTs (see De Jong et al. 1994; Vallesi et al. 2005) by dividing the RT data into 5 bins, from fastest to slowest. This procedure was performed separately for each participant, modality, and stimulus–response compatibility. This analysis revealed that participants’ RTs were slower in incompatible than in compatible trials irrespective of the bin. To further highlight this difference, we then calculated the effect size (d-score, Cohen 1988) of compatibility for each bin by dividing the RT difference between incompatible and compatible trials by the overall standard deviation of that bin (calculated by pooling together compatible and incompatible responses for each bin). Overall, the effect size was higher in the central bins. However, for all five of the experiments reported here, the d-scores were positive even in the first bin, thus indicating that stimulus compatibility modulated response latencies even when RTs were very fast, thus arguing for an early onset of the compatibility effect.

In order to compare the size of the congruency effects for the visual and auditory targets across the five experiments, the overall d-score for visual and auditory responses for each participant and experiment was calculated. An ANOVA on the d-scores, with stimulus modality as a within-participants factor and experiment as a between-participants factor, revealed no main effect of experiment (F < 1, n.s.), no main effect of modality (F(1,45) = 1.053, p = .31), and no interaction (F(4,45) = 1.075, p = .38, see Fig. 6).

Table 2 Mean RT in seconds (s) and accuracy (probability of correct responses) for Experiments 1–5

                  Vision                   Audition
                  Congruent  Incongruent   Congruent  Incongruent
Exp 1   RT (s)    0.56       0.65          0.68       0.80
        Accuracy  0.94       0.88          0.97       0.85
Exp 2   RT (s)    0.54       0.61          0.66       0.72
        Accuracy  0.95       0.92          0.97       0.93
Exp 3   RT (s)    0.55       0.60          0.60       0.68
        Accuracy  0.97       0.96          0.95       0.90
Exp 4   RT (s)    0.56       0.62          0.59       0.67
        Accuracy  0.92       0.88          0.92       0.85
Exp 5   RT (s)    0.59       0.65          0.67       0.75
        Accuracy  0.93       0.94          0.93       0.90
Overall RT (s)    0.56       0.62          0.64       0.72
        Accuracy  0.94       0.91          0.95       0.88

¹ Results of the supplemental analysis on untransformed data after the removal of outliers (i.e. responses above 3 SD from the individual means; overall, less than 1 % of the data were removed), but without discarding the first four trials of each block:
Experiment 1: Congruency: F(1,9) = 17.49, p = .002; Modality: F(1,9) = 29.41, p < .001; Interaction: F(1,9) = 1.41, p = .26.
Experiment 2: Congruency: F(1,9) = 16.37, p = .003; Modality: F(1,9) = 38.21, p < .001; Interaction: F(1,9) < 1, n.s.
Experiment 3: Congruency: F(1,9) = 7.36, p = .024; Modality: F(1,9) = 12.07, p = .007; Interaction: F(1,9) = 2.01, p = .19.
Experiment 4: Congruency: F(1,9) = 7.43, p = .023; Modality: F(1,9) = 5.586, p = .042; Interaction: F(1,9) < 1, n.s.
Experiment 5: Congruency: F(1,9) = 5.44, p = .045; Modality: F(1,9) = 20.22, p = .001; Interaction: F(1,9) = 4.17, p = .072.
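The bin analysis and the per-bin effect size can be sketched as below. This is an illustrative reimplementation under simplifying assumptions (equal-sized quantile bins, one participant's data per condition and call); it is not the authors' code.

```python
import statistics

def bin_dscores(compatible_rts, incompatible_rts, n_bins=5):
    """Sort each condition's RTs from fastest to slowest, split them
    into n_bins bins, and compute a d-score per bin: the mean RT
    difference (incompatible - compatible) divided by the SD of the
    pooled responses of that bin."""
    def to_bins(xs):
        xs = sorted(xs)
        size = len(xs) // n_bins  # assumes len(xs) divisible by n_bins
        return [xs[i * size:(i + 1) * size] for i in range(n_bins)]

    scores = []
    for c_bin, i_bin in zip(to_bins(compatible_rts),
                            to_bins(incompatible_rts)):
        diff = statistics.mean(i_bin) - statistics.mean(c_bin)
        pooled_sd = statistics.stdev(c_bin + i_bin)
        scores.append(diff / pooled_sd)
    return scores
```

On this logic, a positive d-score already in the first (fastest) bin is what the analysis takes as evidence for an early onset of the compatibility effect.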


Fig. 1 The mil–mal effect modulates observers’ RTs. a Examples of stimulus–response key assignment (top: congruent; bottom: incongruent). b Mean RTs for congruent and incongruent trials on visual and auditory trials. Error bars represent the standard error of the mean across participants and the asterisks indicate statistical difference (p < .05). c Scatter and bagplot of participants’ mean visual RTs on congruent versus incongruent trials. d Scatter and bagplot of participants’ mean auditory RTs on congruent versus incongruent trials. The cross at the centre of the bagplot represents the centre of mass of the bivariate distribution of empirical data (i.e. the halfspace depth), the dark grey area (i.e. the bag) includes the 50 % of the data with the largest depth, the light grey polygon contains all the non-outlier data points and the stars represent the outliers (Rousseeuw et al. 1999). e, f Mean RTs of congruent (black) and incongruent (grey) visual (e) and auditory (f) trials for each bin. g, h Mean effect size of the RT difference between incongruent and congruent RTs for each bin on visual (g) and auditory (h) trials. In all four panels, error bars represent the standard error of the mean


Discussion

The results of the five experiments reported in the present study demonstrate the existence of several crossmodal associations between auditory and visual stimuli. In

particular, we have replicated several of the traditional results from the literature on sound symbolism (i.e. takete/ maluma and mil/mal) together with a finding from the literature on crossmodal correspondences (i.e. the associ- ation between auditory pitch and visual size). Moreover,

123

326

Exp Brain Res (2012) 220:319–333

Fig. 2 The takete–maluma effect modulates observers’ RTs. a Examples of stimulus– response key assignment ( top congruent; bottom incongruent). b Mean RTs for congruent and incongruent trials on visual and auditory trials. Error bars represent the standard error of the mean across participants and the asterisks indicate statistical difference ( p \ .05). c Scatter and bagplot of participants’ mean visual RTs on congruent versus incongruent trials. d Scatter and bagplot of participants’ mean auditory RTs on congruent versus incongruent trials. The cross at the centre of the bagplot represents the centre of mass of the bivariate distribution of empirical data (i.e. the halfspace depth), the dark grey area (i.e. the bag) includes the 50 % of the data with the largest depth, the light grey polygon contains all the non-outliers data points and the stars represent the outliers (Rousseeuw et al. 1999 ). e , f Mean RTs of congruent ( black ) and incongruent (grey ) visual ( e ) and auditory ( f) trials for each bin. Mean effect size of the RT difference between incongruent and congruent RTs for each bin on visual ( g ) and auditory ( h ) trials. In all four panels , error bars represent the standard error of the mean


we have demonstrated the existence of two new crossmodal correspondences, namely an association between auditory pitch and the visual size of angles, and between the waveform of auditory stimuli and the roundedness of visual shapes.


Given that, in the IAT, only one stimulus is presented at any given time, and given that both modalities were equally relevant to the task, the present results cannot, unlike previous findings, be interpreted in terms of the costs and/or benefits associated with the simultaneous


Fig. 3 Pitch–size compatibility modulates observers' RTs. a Examples of stimulus–response key assignment (top congruent; bottom incongruent). b Mean RTs for congruent and incongruent trials on visual and auditory trials. Error bars represent the standard error of the mean across participants and the asterisks indicate a statistical difference (p < .05). c Scatter and bagplot of participants' mean visual RTs on congruent versus incongruent trials. d Scatter and bagplot of participants' mean auditory RTs on congruent versus incongruent trials. The cross at the centre of the bagplot represents the centre of mass of the bivariate distribution of empirical data (i.e. the halfspace depth), the dark grey area (i.e. the bag) includes the 50 % of the data with the largest depth, the light grey polygon contains all the non-outlier data points, and the stars represent the outliers (Rousseeuw et al. 1999). e, f Mean RTs of congruent (black) and incongruent (grey) visual (e) and auditory (f) trials for each bin. Mean effect size of the RT difference between incongruent and congruent RTs for each bin on visual (g) and auditory (h) trials. In all four panels, error bars represent the standard error of the mean


presentation of certain combinations of stimuli, nor in terms of a failure of selective attention (see Marks 2004; Spence 2011), nor in terms of the costs and benefits of multisensory integration (see Parise and Spence 2008, 2009; Parise et al. 2012). In classic interference tasks, two stimuli are always presented at more or less the same time, with participants being instructed to respond to only one of them. It is therefore unclear how much of the reported effects is due to the presence of an irrelevant stimulus and how much to the effect of stimulus compatibility. Given


Fig. 4 Pitch–angle compatibility modulates observers' RTs. a Examples of stimulus–response key assignment (top congruent; bottom incongruent). b Mean RTs for congruent and incongruent trials on visual and auditory trials. Error bars represent the standard error of the mean across participants and the asterisks indicate a statistical difference (p < .05). c Scatter and bagplot of participants' mean visual RTs on congruent versus incongruent trials. d Scatter and bagplot of participants' mean auditory RTs on congruent versus incongruent trials. The cross at the centre of the bagplot represents the centre of mass of the bivariate distribution of empirical data (i.e. the halfspace depth), the dark grey area (i.e. the bag) includes the 50 % of the data with the largest depth, the light grey polygon contains all the non-outlier data points, and the stars represent the outliers (Rousseeuw et al. 1999). e, f Mean RTs of congruent (black) and incongruent (grey) visual (e) and auditory (f) trials for each bin. Mean effect size of the RT difference between incongruent and congruent RTs for each bin on visual (g) and auditory (h) trials. In all four panels, error bars represent the standard error of the mean


that these interpretational uncertainties are not present in the IAT, the present results qualify as a more genuine effect of stimulus compatibility. Interestingly, all of the crossmodal correspondences studied here had effect sizes that were similar in magnitude


(see Fig. 6). This result suggests that crossmodal correspondences involving elementary stimulus features, such as pitch and size, and those involving more complex stimuli, such as nonsense words and line drawings, are equally effective in inducing crossmodal compatibility effects.
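The D-scores compared across experiments are standardized effect sizes in the IAT tradition: roughly, the difference in mean RT between incongruent and congruent blocks divided by the pooled standard deviation of the responses. A minimal sketch with made-up RTs is given below; the exact scoring algorithm used in the paper (e.g. any error penalties or RT trimming) is not reproduced here and may differ.

```python
import statistics

def iat_d_score(congruent_rts, incongruent_rts):
    """Compute a simple IAT D-score: the mean RT difference between
    incongruent and congruent blocks, divided by the standard deviation
    of all responses pooled across the two blocks. Positive values
    indicate faster responses under the congruent mapping."""
    diff = statistics.mean(incongruent_rts) - statistics.mean(congruent_rts)
    pooled_sd = statistics.stdev(congruent_rts + incongruent_rts)
    return diff / pooled_sd

# Simulated RTs (in ms) for one hypothetical participant
congruent = [520, 560, 540, 580, 550, 530]
incongruent = [600, 640, 620, 660, 630, 610]
print(round(iat_d_score(congruent, incongruent), 2))
```

On these simulated data the D-score is positive, indicating faster responding under the congruent response mapping; a score near zero would indicate no association between the paired stimulus categories.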


Fig. 5 Waveform–roundedness compatibility modulates observers' RTs. a Examples of stimulus–response key assignment (top congruent; bottom incongruent). b Mean RTs for congruent and incongruent trials on visual and auditory trials. Error bars represent the standard error of the mean across participants and the asterisks indicate a statistical difference (p < .05). c Scatter and bagplot of participants' mean visual RTs on congruent versus incongruent trials. d Scatter and bagplot of participants' mean auditory RTs on congruent versus incongruent trials. The cross at the centre of the bagplot represents the centre of mass of the bivariate distribution of empirical data (i.e. the halfspace depth), the dark grey area (i.e. the bag) includes the 50 % of the data with the largest depth, the light grey polygon contains all the non-outlier data points, and the stars represent the outliers (Rousseeuw et al. 1999). e, f Mean RTs of congruent (black) and incongruent (grey) visual (e) and auditory (f) trials for each bin. Mean effect size of the RT difference between incongruent and congruent RTs for each bin on visual (g) and auditory (h) trials. In all four panels, error bars represent the standard error of the mean


Moreover, our results demonstrate that all of the crossmodal correspondences tested here affected participants' responses to visual and auditory stimuli in much the same way. This also suggests that crossmodal compatibility effects are neither modality-specific nor modality-dependent (see Evans and Treisman 2010) and are hence consistent with there being a unique supramodal mechanism coding for crossmodal correspondences. Taken together, these results would appear to indicate that all crossmodal correspondences may be equally effective in


Fig. 6 Comparison of the effect size (D-score) for vision and audition across the five experiments. Note that all of the crossmodal correspondences had a very similar effect size. Error bars represent the standard error of the mean

terms of modulating participants' RTs, and thus suggest that those compatibility effects are indeed all-or-none effects.

Traditionally, crossmodal correspondences have often been considered to be the outcome of a so-called weak synaesthesia shared by all humans (see Martino and Marks 2001; Mulvenna and Walsh 2006; Rader and Tellegen 1987; Rudmin and Cappelli 1983; Simpson et al. 1956; Ward et al. 2006). Sensory (as opposed to more conceptual) synaesthesia is a condition whereby stimulation in a given sensory modality automatically elicits additional idiosyncratic sensations, often in another, unstimulated, sensory modality (Grossenbacher and Lovelace 2001). Given that this condition involves multiple senses, and that in synaesthesia too certain combinations of crossmodal stimuli lead to behavioural facilitation while others lead to interference (Dixon et al. 2000; Mills 1999), it has repeatedly been proposed that crossmodal correspondences and full-blown synaesthesia may represent two extremes of the same continuum (Martino and Marks 2001; Rader and Tellegen 1987; Svartdal and Iversen 1989). However, a common trait of synaesthesia is its unidirectionality, that is, stimulation in one sensory modality (the inducer) elicits a concurrent sensation in another sensory modality (the concurrent), but not the other way round (though see Cohen Kadosh et al. 2007a, b; Cohen Kadosh and Henik 2007; Johnson et al. 2007; Meier and Rothen 2007, for rare exceptions). With regard to this point, by demonstrating that crossmodal correspondences similarly


influence a participant's responses to visual and auditory stimuli, the present data seem to argue against the "weak synaesthesia" account. Instead, the present results suggest that crossmodal correspondences and synaesthesia might rather be two distinct empirical phenomena that just so happen to share certain superficial similarities (Parise and Spence in press). More likely, crossmodal correspondences might reflect a tuning of the perceptual systems to the statistical properties of the environment achieved through evolution (Ludwig et al. 2011) and perceptual learning (Ernst 2007; Xu et al. 2012).

By revealing a reliable compatibility effect even in the fastest responses, the bin analysis of the RTs demonstrates that the crossmodal compatibility effects reported here have a very rapid onset. The presence of an RT modulation in responses faster than 400 ms would appear to rule out any possible explanation of the results in terms of explicit cognitive strategies and suggests that compatibility effects due to crossmodal correspondences might be the outcome of automatic processes. Nevertheless, it should be noted that the responses falling in the slowest bins are likely to reflect the joint contribution of both automatic processes and cognitive strategies (Chiou and Rich 2012; Klapetek et al. in press). These results are compatible with the results of another recent study by Kovic et al. (2010) on shape–word crossmodal correspondences. There, an effect of compatibility on evoked potentials was found as early as 140–180 ms after stimulus onset. Based on the sites and latencies of the ERP components modulated by crossmodal correspondences, Kovic and colleagues suggested that their effect could be due to audiovisual integration occurring during early sensory processing. Given that, in the present study, only a single stimulus was presented on each trial, however, multisensory integration cannot play a role in the early onset of the compatibility effect reported here. More generally, the present results question whether multisensory integration played any role at all in the compatibility effects found by Kovic et al. (2010). In other words, although it is known that crossmodal correspondences can have an effect on multisensory integration (Parise and Spence 2008, 2009), multisensory integration itself is not necessary for crossmodal correspondences to induce reliable effects on behaviour.

So far, many other studies have demonstrated the effects of crossmodal correspondences on RTs (Marks 2004); however, in most cases, the experimental paradigms did not allow researchers to assess whether those effects occurred at a perceptual or at a response selection level. In a recent RT study, Evans and Treisman (2010) ruled out the effects of response selection and found significant effects of crossmodal correspondences on speeded classification tasks. This finding suggests that crossmodal correspondences might operate at a perceptual level and modulate the speed of perceptual processing. Conversely, in the present study, only one stimulus was presented at a time and the only variable that was manipulated experimentally was the response assignment. Therefore, we can exclude any effect of crossmodal correspondences on perceptual processing and argue that the effects reported here likely occur at the level of response selection. Taken together, the present results and those reported by Evans and Treisman (2010) complement each other by demonstrating that crossmodal correspondences operate both at a perceptual level and at the level of response selection, and they highlight the wide-ranging effects of crossmodal correspondences on information processing.

To the best of our knowledge, this is the first study specifically to have investigated the famous takete/maluma and mil/mal effects using an indirect performance measure with auditory verbal (rather than written) stimuli. Our results therefore demonstrate that such effects cannot simply be attributed to some kind of similarity between the shape of the visual stimuli and the visual appearance of the written words (see also Bremner et al., submitted; Brang et al. 2011; Westbury 2005). Rather, the results reported here demonstrate that the similarity involves, at least in the early stages, the physical features of the visual and the auditory stimuli. These results move beyond simply replicating previous findings by showing that crossmodal compatibility can speed up the processing of unimodal sensory stimuli, and they do so with a single technique that has been specifically designed by researchers in order to measure associations between stimuli. As mentioned in the Introduction, the modified IAT utilized here has several advantages over other traditional techniques.
First, the IAT provides an indirect measure of association; taken together, therefore, the present results suggest that all of the crossmodal correspondences investigated here are automatically encoded by participants. This conclusion is further supported by the bin analysis (see De Jong et al. 1994; Vallesi et al. 2005), demonstrating that congruency effects modulated RTs even in the fastest responses, which are supposedly less influenced by top-down cognitive control (though the effects in the slowest bins are surely more likely to reflect the contribution of both automatic and controlled processes). Second, given that both modalities are task relevant, the IAT allows one to easily measure how crossmodal compatibility affects the processing of both visual and auditory stimuli. Additionally, by ensuring that only a single (unimodal) stimulus is presented at any one time, the IAT overcomes any issues concerning potential spatiotemporal inconsistencies in the combined presentation of audiovisual signals. When auditory and visual stimuli are jointly presented, any offset in their relative position, such as when the visual stimuli are presented on a screen while the auditory stimuli are played over headphones, might alter

multisensory processing and hence interfere with the crossmodal congruency effects that are observed (e.g. see Soto-Faraco et al. 2002). Similar problems also occur in the temporal domain, where asynchronies between auditory and visual stimuli might arise due to physical and neural delays. Both physical delays (e.g. due to timing inaccuracies in the experimental set-up) and neural delays (e.g. due to the auditory system generally being "faster" at transducing signals than the visual system; see Spence and Squire 2003) can underlie potential asymmetries in the effect of compatibility, whereby stimuli in a given modality can alter the processing of a second modality (Chen and Spence 2011), but not vice versa (see Evans and Treisman 2010). Previous claims that sound symbolic effects are strong have typically been based on the consistency of the responses provided by a large number of participants (see Robson 2011). In this regard, the IAT allows one to assess the strength of crossmodal correspondences and sound symbolic associations in a more subtle way than traditional techniques. Being based on a large number of responses from a single observer, the IAT also allows one to measure the strength of crossmodal correspondences within individual participants, hence providing a measure of individual differences. Moreover, not relying on explicit responses, the IAT might be suitable for investigating crossmodal correspondences and sound symbolism in special populations, such as autistic individuals (who, according to previous research, do not show direct evidence of sound symbolism; Oberman and Ramachandran 2008; Ramachandran and Oberman 2007) or even primates (e.g., see Cowey and Weiskrantz 1975; Ludwig et al. 2011; Parker and Easton 2004; Spence and Deroy 2012; Weiskrantz and Cowey 1975; see also Premack and Premack 2003).
Nevertheless, it should be noted that the IAT also has some drawbacks: for example, the frequent changes in response assignment introduce noise into the data due to learning and practice effects. Moreover, it is not clear on which dimension the IAT operates. The IAT compatibility effect can indeed arise not just from the relevant stimulus features themselves, but rather from the internal responses that they generate (i.e. people may associate two stimuli because of a feeling of familiarity that they both engender, or because of the hedonic response that they elicit), though this is also the case for other speeded classification paradigms. Nevertheless, together with the fact that the IAT provides a standard method for measuring (implicit and explicit) associations between a wide range of items, our results suggest that the IAT should be used more extensively in order to measure correspondences between crossmodal and unimodal sensory signals, and that it might be a key technique for discovering novel correspondences. All of the associations previously reported in the literature, and investigated in the present study, have been


successfully replicated using a modified version of the IAT. This procedure enabled us to demonstrate that the compatibility effects elicited by crossmodal correspondences build up very quickly (as also suggested by recent ERP results in which the crossmodal correspondence between abstract visual shapes and words modulated early components; see Kovic et al. 2010) and are stable across modalities and a wide range of stimuli, thus suggesting the existence of a single underlying automatic mechanism that deals with crossmodal compatibility.
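The bin analysis invoked above to show this rapid build-up can be sketched as follows: each participant's RTs are rank-ordered and split into equal-sized bins (quintiles), and the congruency effect is then computed bin by bin, so that an effect in the first bin reflects the fastest responses. Below is a minimal illustration with simulated data; the authors' exact procedure (e.g. any per-bin standardization into D-scores) may differ.

```python
import statistics

def bin_means(rts, n_bins=5):
    """Sort one participant's RTs and split them into n_bins
    equal-sized bins (quintiles by default), returning the mean RT
    of each bin, from fastest to slowest."""
    rts = sorted(rts)
    size = len(rts) // n_bins
    return [statistics.mean(rts[i * size:(i + 1) * size]) for i in range(n_bins)]

# Simulated RTs (ms): incongruent responses are uniformly 40 ms slower
congruent = [430, 450, 470, 500, 520, 560, 590, 640, 700, 800]
incongruent = [470, 490, 510, 540, 560, 600, 630, 680, 740, 840]

cong_bins = bin_means(congruent)
incong_bins = bin_means(incongruent)
# Per-bin congruency effect; a positive value in the first bin means the
# compatibility effect is already present in the fastest responses
effects = [i - c for c, i in zip(cong_bins, incong_bins)]
print(effects)
```

With these simulated data the effect is constant across bins; in the reported experiments the key observation is simply that it is reliably greater than zero already in the first (fastest) bin.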

Acknowledgments Cesare Parise was supported by the Bernstein Center for Computational Neuroscience, Tübingen, funded by the German Federal Ministry of Education and Research (BMBF; FKZ: 01GQ1002).

References

Bernstein IH, Edelstein BA (1971) Effects of some variations in auditory input upon visual choice reaction time. J Exp Psychol 87(2):241–247
Bien N, ten Oever S, Goebel R, Sack AT (2012) The sound of size: crossmodal binding in pitch-size synesthesia: a combined TMS, EEG and psychophysics study. Neuroimage 59:663–672
Blair IV (2002) The malleability of automatic stereotypes and prejudice. Personal Soc Psychol Rev 6:242–261
Bozzi P, Flores D'Arcais G (1967) Experimental research on the intermodal relationships between expressive qualities. Arch Psicol Neurol Psichiatr 28(5):377–420
Brang D, Rouw R, Ramachandran VS, Coulson S (2011) Similarly shaped letters evoke similar colors in grapheme–color synesthesia. Neuropsychologia 49:1355–1358
Bremner A, Caparos S, Davidoff J, de Fockert J, Linnell K, Spence C (submitted) Bouba and Kiki in Namibia? Western shape-symbolism does not extend to taste in a remote population. Cognition
Chen Y-C, Spence C (2011) Crossmodal semantic priming by naturalistic sounds and spoken words enhances visual sensitivity. J Exp Psychol Hum Percept Perform 37:1554–1568
Chiou R, Rich AN (2012) Cross-modality correspondence between pitch and spatial location modulates attentional orienting. Perception 41:339–353
Cohen J (1988) Statistical power analysis for the behavioral sciences. Lawrence Erlbaum Associates, Hillsdale, NJ
Cohen Kadosh R, Henik A (2007) Can synaesthesia research inform cognitive science? Trends Cogn Sci 11(4):177–184
Cohen Kadosh R, Cohen Kadosh K, Henik A (2007a) The neuronal correlate of bidirectional synesthesia: a combined event-related potential and functional magnetic resonance imaging study. J Cogn Neurosci 19(12):2050–2059
Cohen Kadosh R, Henik A, Walsh V (2007b) Small is bright and big is dark in synaesthesia. Curr Biol 17(19):R834–R835
Cowey A, Weiskrantz L (1975) Demonstration of cross-modal matching in rhesus monkeys, Macaca mulatta. Neuropsychologia 13(1):117–120
Crisinel AS, Spence C (2009) Implicit association between basic tastes and pitch. Neurosci Lett 464(1):39–42
Crisinel AS, Spence C (2010) A sweet sound? Food names reveal implicit associations between taste and pitch. Perception 39(3):417–425
Davis R (1961) The fitness of names to drawings: a cross-cultural study in Tanganyika. Br J Psychol 52:259–268
De Jong R, Liang CC, Lauber E (1994) Conditional and unconditional automaticity: a dual-process model of effects of spatial stimulus-response correspondence. J Exp Psychol Hum Percept Perform 20(4):731–750
Demattè M, Sanabria D, Spence C (2006) Cross-modal associations between odors and colors. Chem Senses 31(6):531–538
Demattè M, Sanabria D, Spence C (2007) Olfactory-tactile compatibility effects demonstrated using a variation of the implicit association test. Acta Psychol 124(3):332–343
Dixon MJ, Smilek D, Cudahy C, Merikle PM (2000) Five plus two equals yellow. Nature 406(6794):365
Ernst MO (2007) Learning to integrate arbitrary signals from vision and touch. J Vis 7(5):1–14
Evans KK, Treisman A (2010) Natural cross-modal mappings between visual and auditory features. J Vis 10(1):1–12
Fiedler K, Messner C, Bluemke M (2006) Unresolved problems with the "I", the "A", and the "T": a logical and psychometric critique of the implicit association test (IAT). Eur Rev Soc Psychol 17:74–147
Freed DJ (1990) Auditory correlates of perceived mallet hardness for a set of recorded percussive sound events. J Acoust Soc Am 87(1):311–322
Gallace A, Spence C (2006) Multisensory synesthetic interactions in the speeded classification of visual size. Percept Psychophys 68(7):1191–1203
Greenwald AG, McGhee DE, Schwartz JLK (1998) Measuring individual differences in implicit cognition: the implicit association test. J Pers Soc Psychol 74(6):1464–1480
Grossenbacher PG, Lovelace CT (2001) Mechanisms of synesthesia: cognitive and physiological constraints. Trends Cogn Sci 5(1):36–41
Hinton L, Nichols J, Ohala JJ (eds) (2006) Sound symbolism. Cambridge University Press, Cambridge
Johnson A, Jepma M, De Jong R (2007) Colours sometimes count: awareness and bidirectionality in grapheme-colour synaesthesia. Q J Exp Psychol 60(10):1406–1422
Klapetek A, Ngo MK, Spence C (in press) Do crossmodal correspondences enhance the facilitatory effect of auditory cues on visual search? Atten Percept Psychophys
Klatzky RL, Pai DK, Krotkov EP (2000) Perception of material from contact sounds. Presence Teleoper Virtual Environ 9(4):399–410
Klein R, Brennan M, Gilani A (1987) Covert cross-modality orienting of attention in space. Paper presented at the annual meeting of the Psychonomic Society, Seattle, WA
Köhler W (1929) Gestalt psychology. Liveright, New York
Köhler W (1947) Gestalt psychology: an introduction to new concepts in modern psychology. Liveright Publ. Corporation, New York, NY
Kovic V, Plunkett K, Westermann G (2010) The shape of words in the brain. Cognition 114(1):19–28
Ludwig VU, Adachi I, Matzuzawa T (2011) Visuoauditory mappings between high luminance and high pitch are shared by chimpanzees (Pan troglodytes) and humans. Proc Natl Acad Sci USA 108:20661–20665
Marks LE (1987) On cross-modal similarity: auditory–visual interactions in speeded discrimination. J Exp Psychol Hum Percept Perform 13(3):384–394
Marks LE (1989) On cross-modal similarity: the perceptual structure of pitch, loudness, and brightness. J Exp Psychol Hum Percept Perform 15(3):586–602
Marks LE (2004) Cross-modal interactions in speeded classification. In: Calvert GA, Spence C, Stein BE (eds) The handbook of multisensory processes. MIT Press, Cambridge, MA, pp 85–106
Martino G, Marks LE (2001) Synesthesia: strong and weak. Curr Dir Psychol Sci 10(2):61–65
Meier B, Rothen N (2007) When conditioned responses "fire back": bidirectional cross-activation creates learning opportunities in synesthesia. Neuroscience 147(3):569–572
Melara RD, O'Brien TP (1987) Interaction between synesthetically corresponding dimensions. J Exp Psychol Gen 116(4):323–336
Melara RD, O'Brien TP (1990) Effects of cuing on cross-modal congruity. J Mem Lang 29(6):655–686
Mills CB (1999) Digit synaesthesia: a case study using a Stroop-type test. Cogn Neuropsychol 16(2):181–191
Mulvenna CM, Walsh V (2006) Synaesthesia: supernormal integration? Trends Cogn Sci 10(8):350–352
Newman S (1933) Further experiments in phonetic symbolism. Am J Psychol 45(1):53–75
Oberman LM, Ramachandran VS (2008) Preliminary evidence for deficits in multisensory integration in autism spectrum disorders: the mirror neuron hypothesis. Soc Neurosci 3(3–4):348–355
Osgood CE (1960) The cross-cultural generality of visual–verbal synesthetic tendencies. Behav Sci 5(2):146–169
Osgood CE, Suci G, Tannenbaum P (1957) The measurement of meaning. University of Illinois Press, Urbana
Oyama T, Yamada H, Iwasawa H (1998) Synesthetic tendencies as the basis of sensory symbolism: a review of a series of experiments by means of semantic differential. Psychologia 41:203–215
Parise CV, Pavani F (2011) Evidence of sound symbolism in simple vocalizations. Exp Brain Res 214(3):373–380
Parise CV, Spence C (2008) Synesthetic congruency modulates the temporal ventriloquism effect. Neurosci Lett 442(3):257–261
Parise CV, Spence C (2009) When birds of a feather flock together: synesthetic correspondences modulate audiovisual integration in non-synesthetes. PLoS One 4(5):e5664
Parise CV, Spence C (2012) Assessing the associations between brand packaging and brand attributes using an indirect performance measure. Food Qual Prefer 24:17–23
Parise CV, Spence C (in press) Audiovisual crossmodal correspondences. In: Simner J, Hubbard EM (eds) Oxford handbook of synaesthesia. Oxford University Press, Oxford
Parise CV, Spence C, Ernst MO (2012) When correlation implies causation in multisensory integration. Curr Biol 22:46–49
Parker A, Easton A (2004) Cross-modal memory in primates: the neural basis of learning about the multisensory properties of objects and events. In: Calvert GA, Spence C, Stein BE (eds) The handbook of multisensory processes. MIT Press, Cambridge, MA, pp 333–342
Poffenberger A, Barrows B (1924) The feeling value of lines. J Appl Psychol 8(2):187–205
Premack D, Premack AJ (2003) Original intelligence: unlocking the mystery of who we are. McGraw-Hill, New York
Rader C, Tellegen A (1987) An investigation of synesthesia. J Pers Soc Psychol 52(5):981–987
Ramachandran VS, Oberman LM (2007) Broken mirrors: a theory of autism. Sci Am Spec Ed 17(2):20–29
Robson D (2011) Language's missing link. New Sci 211(2821):30–33
Rogers SK, Ross AS (1968) A cross-cultural test of the Maluma-Takete phenomenon. Perception 4(1):105–106
Rousseeuw PJ, Ruts I, Tukey JW (1999) The bagplot: a bivariate boxplot. Am Stat 53(4):382–387
Rudmin F, Cappelli M (1983) Tone-taste synesthesia: a replication. Percept Mot Skills 56:118
Sapir E (1929) A study in phonetic symbolism. J Exp Psychol 12(3):225–239
Seo H-S, Arshamian A, Schemmer K, Scheer I, Sander T, Ritter G, Hummel T (2010) Cross-modal integration between odors and abstract symbols. Neurosci Lett 478:175–178
Shepherd GM (2012) Neurogastronomy: how the brain creates flavor and why it matters. Columbia University Press, New York
Simpson RH, Quinn M, Ausubel DP (1956) Synesthesia in children: association of colors with pure tone frequencies. J Genet Psychol Res Theory Hum Dev 89(1):95–103
Soto-Faraco S, Lyons J, Gazzaniga M, Spence C, Kingstone A (2002) The ventriloquist in motion: illusory capture of dynamic information across sensory modalities. Cogn Brain Res 14(1):139–146
Spence C (2011) Crossmodal correspondences: a tutorial review. Atten Percept Psychophys 73(4):1–25
Spence C, Deroy O (2012) Are chimpanzees really synaesthetic? i-Perception 3:316–318
Spence C, Squire S (2003) Multisensory integration: maintaining the perception of synchrony. Curr Biol 13(13):R519–R521
Stevens JC, Marks LE (1965) Cross-modality matching of brightness and loudness. Proc Natl Acad Sci USA 54(2):407–411
Stumpf K (1883) Tonpsychologie. S. Hirzel, Leipzig
Svartdal F, Iversen T (1989) Consistency in synesthetic experience to vowels and consonants: five case studies. Scand J Psychol 30:220–227
Vallesi A, Mapelli D, Schiff S, Amodio P, Umiltà C (2005) Horizontal and vertical Simon effect: different underlying mechanisms? Cognition 96(1):B33–B43
Van den Doel K, Pai DK (1998) The sounds of physical shapes. Presence 7(4):382–395
Walker P, Smith S (1985) Stroop interference based on the multimodal correlates of haptic size and auditory pitch. Perception 14(6):729–736
Walker P, Bremner J, Mason U, Spring J, Mattock K, Slater A, Johnson S (2010) Preverbal infants' sensitivity to synaesthetic cross-modality correspondences. Psychol Sci 21(1):21–25
Ward J, Huckstep B, Tsakanikos E (2006) Sound-colour synaesthesia:

to what extent does it use cross-modal mechanisms common to us all? Cortex 42(2):264–280 Watson AB, Pelli DG (1983) QUEST-a Bayesian adaptive psycho- metric method. Percept Psychophys 33(2):113–120 Weiskrantz L, Cowey A (1975) Cross-modal matching in the rhesus monkey using a single pair of stimuli. Neuropsychologia

13(3):257–261

Westbury C (2005) Implicit sound symbolism in lexical access:

evidence from an interference task. Brain Lang 93(1):10–19 Xu J, Yu L, Rowland BA, Stanford TR, Stein BE (2012) Incorpo- rating cross-modal statistics in the development and maintenance of multisensory integration. J Neurosci 32:2287–2298 Zigler MJ (1930) Tone shapes: a novel type of synaesthesia. J Gen Psychol 3:276–287
