
Journal of Neuropsychology (2014), 8, 94-106. © 2012 The British Psychological Society. www.wileyonlinelibrary.com

Reduced audiovisual integration in synesthesia: Evidence from bimodal speech perception


Christopher Sinke1,2, Janina Neufeld1,2, Markus Zedler1, Hinderk M. Emrich1,2, Stefan Bleich1,2, Thomas F. Münte3 and Gregor R. Szycik1*

1 Department of Psychiatry, Social Psychiatry and Psychotherapy, Hannover Medical School, Hanover, Germany
2 Center of Systems Neuroscience, Hanover, Germany
3 Department of Neurology, University of Lübeck, Germany
Recent research suggests that synesthesia results from a hypersensitive multimodal binding mechanism. To address the question of whether multimodal integration is altered in synesthetes in general, grapheme-colour and auditory-visual synesthetes were investigated using speech-related stimulation in two behavioural experiments. First, we used the McGurk illusion to test the strength and number of illusory perceptions in synesthesia. In a second step, we analysed the gain in speech perception coming from seen articulatory movements under acoustically noisy conditions. We used disyllabic nouns as stimulation and varied the signal-to-noise ratio of the auditory stream presented concurrently with a matching video of the speaker. We hypothesized that if synesthesia is due to a general hyperbinding mechanism, this group of subjects should be more susceptible to McGurk illusions and profit more from the visual information during audiovisual speech perception. The results indicate that there are differences between synesthetes and controls concerning multisensory integration, but in the opposite direction to that hypothesized. Synesthetes showed a reduced number of illusions and had a reduced gain in comprehension from viewing matching articulatory movements in comparison to control subjects. Our results indicate that, rather than having a hypersensitive binding mechanism, synesthetes show weaker integration of vision and audition.

Synesthesia refers to the uncommon ability to perceive an internally generated sensation in one sensory modality triggered by a stimulus coming from another sensory modality. Thus, an external stimulus, in the synesthesia literature often called the inducer, leads to an additional percept called the concurrent (Grossenbacher & Lovelace, 2001). The type of synesthesia is named according to the inducer-concurrent pair: in auditory-visual synesthesia, for example, acoustic stimulation leads to a visual experience, whereas in

*Correspondence should be addressed to Gregor R. Szycik, Department of Psychiatry, Social Psychiatry and Psychotherapy, Medical School Hannover, Carl-Neuberg-Straße 1, Hanover 30625, Germany (e-mail: szycik.gregor@mh-hannover.de).
DOI:10.1111/jnp.12006


linguistic-colour synesthesia speech-related stimuli lead to a visual experience. Synesthesia has been estimated to affect about 4% of the population (Simner et al., 2006). The most investigated form of synesthesia is grapheme-colour synesthesia, with affected subjects perceiving written and heard letters in different colours (Simner et al., 2006). Usually, synesthetic inducer-concurrent couplings are stable (Baron-Cohen, Wyke, & Binnie, 1987; Simner & Logie, 2007), automatic (Mills, Boteler, & Oliver, 1999) and idiosyncratic, so that the same inducer always triggers the same concurrent for a given synesthete. For example, a particular grapheme-colour synesthete may perceive the letter A as blue, whereas another grapheme-colour synesthete may perceive it as red. Synesthesia is context dependent (Dixon, Smilek, Duffy, Zanna, & Merikle, 2006), attention dependent (Mattingley, Rich, Yelland, & Bradshaw, 2001; Sagiv, Heer, & Robertson, 2006) and also dependent on the interpretation rather than on the direct sensory input (Bargary, Barnett, Mitchell, & Newell, 2009). Theoretically, synesthesia has been thought of as a hyperbinding phenomenon (Esterman, Verstynen, Ivry, & Robertson, 2006; Hubbard, 2007; Robertson, 2003) in the sense that synesthetes have an overactive multimodal integration mechanism, leading to the unusual synesthetic inducer-concurrent coupling. Whether this hyperbinding is achieved via direct connections (Ramachandran & Hubbard, 2001), disinhibited feedback (Grossenbacher & Lovelace, 2001), reentrant processing (Smilek, Dixon, Cudahy, & Merikle, 2001) or a mixture of these mechanisms (Hubbard, 2007) is not known. A recent investigation even points to a general hyperconnectivity in synesthetes (Hänggi, Wotruba, & Jäncke, 2011). Whether this overactive binding mechanism affects only the inducer-concurrent pairing or extends to multisensory integration processes in general is not clear so far.
Two studies addressing this issue have been published recently (Brang, Williams, & Ramachandran, 2012; Neufeld, Sinke, Zedler, Emrich, & Szycik, 2012). They present opposite results using the double-flash illusion as an indicator for multisensory integration processes. While Brang et al. (2012) report stronger susceptibility to double-flash illusions in the synesthesia group, Neufeld et al. (2012) found both a weaker susceptibility to the illusion and a relation between the degree of illusions and the age of the synesthesia subjects. Thus, more evidence is needed to clarify whether synesthesia is brought about by a more sensitive binding mechanism. If this were the case, synesthetes should show behavioural effects besides the unusual inducer-concurrent coupling. To address this problem, we conducted two experiments relying on audiovisual integration mechanisms in different ways. In the first experiment, the McGurk illusion (McGurk & MacDonald, 1976) was assessed in synesthetes and control participants. This illusion pertains to the fact that divergent auditory and visual information may sometimes be fused into a new percept. For example, if the viseme of a face pronouncing AGA is dubbed onto an audio stream containing ABA, an observer will often perceive the fused percept of ADA. The McGurk illusion has already been used to study audiovisual integration processes in grapheme-colour synesthesia (Bargary et al., 2009). In this study, Bargary et al. were able to show the dependence of the synesthetic percept on relatively late perceptual integration processes. In contrast with the study of Bargary et al., we focused on the overall proportion of audiovisual fusions in the McGurk experiment. In the second experiment, speech comprehension in a noisy environment was analysed. Varying the signal-to-noise ratio (SNR) of the auditory input, it has been shown that even normal, non-hearing-impaired comprehenders take advantage of concurrent visual input. Also, an SNR can be found for which comprehension benefits most from the visual information (Ross, Saint-Amour, Leavitt, Javitt, & Foxe, 2007). Here, we tested


whether synesthetes and controls benefit similarly from visual information during the perception of speech in a noisy environment. We predicted that if synesthetes have a generally overactive binding mechanism, they should report more fused syllables in the illusion experiment and outperform controls in the speech comprehension task. If, on the other hand, binding is restricted to the inducer-concurrent pairing, no differences should be observed in either experiment.

Methods
All procedures had been approved by the local Ethics Committee. All subjects gave informed consent and participated for a small monetary compensation. Participants were matched for age and gender. We divided our subjects into groups depending on the self-reported synesthetic experience. After an extensive interview, all synesthetes were classified by self-reported localization of concurrent perception as associators according to Dixon, Smilek, and Merikle (2004), that is, perceiving the synesthetic sensations in their mind's eye. In addition, our subjects were characterized by a modified offline version of the synesthesia battery (Eagleman, Kagan, Nelson, Sagaram, & Sarma, 2007) in which subjects have to indicate a colour related both to the presentation of tones of different instruments and different pitches and to the presentation of the letters from A to Z and the numbers from 0 to 9. Control subjects were tested with the complete battery, whereas synesthesia subjects were tested only on those parts of the battery relevant for their self-reported inducer-concurrent pair (subjects showing both grapheme-colour and auditory-visual synesthesia performed the corresponding parts of the battery). Thus, synesthetes were asked to choose the colour which best matched the synesthetic colour they experienced for the tone (letter, number), and non-synesthetes were asked to select the colour which they thought fitted the presented item best. After three presentations of the stimuli in a randomized order, the geometric distance in RGB (red, green, blue) colour space between the subject's colour choices for each item during the three runs was calculated. The mean values were then compared between groups. More consistent colour choices lead to a lower consistency score, as they mean more similar RGB values and thus a smaller difference between them. For grapheme-colour synesthetes, a threshold value of 1 was chosen, as suggested by Eagleman et al. (2007). As a similar threshold has not been defined for auditory-visual synesthesia, we merely show that the group of auditory-visual synesthetes was more consistent than the control group, as suggested by Ward, Huckstep, and Tsakanikos (2006).
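The consistency measure described above can be sketched as follows. This is a minimal illustration only, not the battery's actual implementation: the pairwise-distance formula and the 0-1 scaling of the RGB channels are assumptions based on the description by Eagleman et al. (2007).

```python
import math

def consistency_score(runs):
    """Consistency score for one subject.

    `runs` is a list of three dicts mapping item (letter, number or
    tone) -> (r, g, b), with each channel scaled to 0..1. For each
    item we sum the Euclidean distances in RGB space between the
    three run pairs, then average over items. Lower scores mean more
    consistent colour choices.
    """
    items = runs[0].keys()
    total = 0.0
    for item in items:
        picks = [run[item] for run in runs]
        # Sum of distances between the three run pairs (1-2, 1-3, 2-3).
        total += sum(
            math.dist(picks[i], picks[j])
            for i in range(3) for j in range(i + 1, 3)
        )
    return total / len(items)
```

Under this scoring, a subject who picks the identical colour for an item in all three runs contributes a distance of zero for that item, which is why synesthetes (threshold below 1 for graphemes) end up well below controls.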

Experiment 1: McGurk audiovisual fusion

Participants
Nineteen synesthetes (Mage = 35.0 ± 14.9, 14 women) and 24 non-synesthetic controls (Mage = 34.6 ± 14.0, 18 women) participated. Synesthetes differed significantly from controls with regard to the synesthesia battery consistency score (graphemes: grapheme-colour synesthetes: 0.60 ± 0.19, range: 0.28-0.94; controls: 2.2 ± 0.6, range: 1.1-3.08, p < .01; tones: auditory-visual synesthetes: 1.16 ± 0.47, range: 0.74-2.3; controls: 1.91 ± 0.53, range: 0.91-3.03, p < .05). Of the 19 synesthetes, four had auditory-visual synesthesia, eight had grapheme-colour synesthesia and seven had


grapheme-colour and auditory-visual synesthesia; 12 reported concurrent perception for words and three for voices.

Stimuli and task
We used self-prepared short (2 s duration) video sequences presented at a resolution of 640 × 512 pixels (covering 23 degrees vertically and 18 degrees horizontally of visual angle). The video sequences comprised the frontal view of a male speaker pronouncing four kinds of syllables. Three of them were audiovisually congruent, that is, the auditory stream matched the vocalization movements (syllables: ADA, ABA, and AGA). The fourth stimulus was prepared to elicit the McGurk effect (McGurk & MacDonald, 1976) by combining the visual information of the syllable AGA with the auditory ABA (henceforth: M-ADA). Often, this combination leads to the fused percept of the syllable ADA. The videos were edited using VirtualDub 1.9.9 (www.virtualdub.org). ADA, AGA, and ABA syllables were presented four times each, whereas M-ADA stimuli were presented 28 times. Thus, each subject watched 40 videos presented in randomized order. The stimuli were presented on a 21-inch Sony Trinitron Multiscan G520 monitor (Sony Electronics Inc., San Diego, CA, USA) with a resolution of 1024 × 768 pixels and a refresh rate of 150 Hz. Subjects were seated 60 cm from the monitor. Acoustical stimuli were presented at a comfortable loudness via AKG K121 Studio headphones. All stimuli were presented using Presentation software (Neurobehavioral Systems, Inc., Albany, CA). Subjects watched the stimuli and had to indicate the perceived syllable by pressing the keys D (for ADA), G (AGA) or B (ABA) on a standard computer keyboard. Thus, the answer D could occur (1) for the audiovisually congruent syllable ADA; and (2) for the audiovisually incongruent McGurk syllable (M-ADA), but only in the case of successful bimodal fusion. For the M-ADA stimuli, two more answers were possible: (1) G for perception driven by the visual modality; and (2) B for perception driven by the auditory modality. Each video stimulus was followed by a response prompt. After the button-press, a fixation cross was presented in the centre of the screen for 1 s, followed by the next stimulus. In the analysis we focused on the proportion of fusions as indicated by the D responses for M-ADA stimuli.

Experiment 2: Enhancement of speech perception by visual information

Participants
Fourteen synesthetes (Mage = 35.4 ± 13.7, 9 women) and 14 non-synesthetic controls (Mage = 36.8 ± 14, 9 women) participated. Synesthetes differed significantly from controls in their consistency score as measured with the synesthesia battery (graphemes: grapheme-colour synesthetes: 0.64 ± 0.19, range: 0.35-0.94; controls: 2.09 ± 0.69, range: 1.27-3.08, p < .01; tones: auditory-visual synesthetes: 0.98 ± 0.27, range: 0.82-2.09; controls: 2.03 ± 0.46, range: 1.27-2.74, p < .05). Three synesthetes had auditory-visual synesthesia, eight had grapheme-colour synesthesia and three had grapheme-colour and auditory-visual synesthesia; 11 reported concurrent perception for words and four for voices.

Stimuli and task
German high-frequency disyllabic lemmas derived from the CELEX database (Baayen, Piepenbrock, & Gulikers, 1995) with a Mannheim frequency per 1,000,000 words (MannMln) of one


or more were used for stimulation. The MannMln frequency indicates the down-scaled occurrence of the selected word per one million words, taken from the Mannheim 6.0 million word corpus. Stimuli, spoken by a male native speaker of German with linguistic experience, were recorded with a digital camera and a microphone. The recorded video was cut into segments of two-second length (720 × 576 pixel resolution) showing the frontal view of the whole face of the speaker as he pronounced one word per segment. The audio stream was in mono and was presented via two speakers situated on the left and right side of the video monitor (19-inch flat panel with 1280 × 1024 pixel resolution). The video segments were randomly assigned to the experimental conditions and prepared accordingly. For the auditory-alone condition (A), the video stream was replaced with a freeze image of the speaker's face. We used 175 stimuli for the A condition. The audiovisual condition (AV) comprised 175 stimuli with synchronous auditory and visual speech information. In addition, the audio stream of both conditions was mixed with white noise of different loudness levels impairing comprehension. The intensity of the white noise was adjusted such that it was 0, 4, 8, 12, 16, 20, or 24 dB louder than the audio stream containing the presented word. This leads to stimuli with signal-to-noise ratios (SNR) in the auditory stream of 0, -4, -8, -12, -16, -20 and -24 dB, respectively. The sound intensity was adjusted separately for each participant to a good audibility at an SNR of 0 dB. Twenty-five stimuli were used for each SNR. All stimuli were presented in a random order using Presentation software (Neurobehavioral Systems, Inc.). The experimental procedure was designed according to Ross et al. (2007). Participants were instructed to watch the screen, listen to the stimuli and report verbally which word they understood. If a word was not clearly understood, they were instructed to guess the word. Otherwise they were to report "I did not understand anything". The answer was recorded by the experimenter. Any answer different from the presented stimulus was counted as false, no matter whether the participant had indicated not to have understood anything or had reported a wrong word. When the answer was given, the experimenter triggered the next trial, which began with a fixation cross of one-second duration followed by the video stimulus.
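The noise mixing described above can be sketched as follows. This is a hypothetical reconstruction for illustration (the function name and the RMS-based scaling are assumptions), not the authors' actual stimulus-preparation code.

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, snr_db: float, seed: int = 0) -> np.ndarray:
    """Mix a speech waveform with white noise at a target SNR in dB.

    A negative snr_db (e.g. -12) means the noise RMS ends up 12 dB
    louder than the speech RMS, matching the stimuli described above.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(len(speech))
    speech_rms = np.sqrt(np.mean(speech ** 2))
    noise_rms = np.sqrt(np.mean(noise ** 2))
    # Scale the noise so that 20*log10(speech_rms / noise_rms) == snr_db.
    noise *= speech_rms / (noise_rms * 10 ** (snr_db / 20))
    return speech + noise
```

For example, `mix_at_snr(word_audio, -12.0)` would produce a stimulus in which the noise is 12 dB louder than the word, i.e. the condition at which the visual gain peaks in Ross et al. (2007).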

Statistical analysis
Following Ross et al. (2007), the gain in comprehension brought about by the visual information can be calculated by subtracting the performance in the auditory-alone condition from the performance in the audiovisual condition (AV - A). Gain is maximal at -12 dB SNR for normal subjects and decreases with changes in SNR in both directions (Ross et al., 2007). Performance itself was maximal at 0 dB SNR. Thus, we decided to group stimuli with an SNR around the maximal gain of integration (-8, -12 and -16 dB, henceforth inner stimuli) and stimuli with less expected gain (0, -4, -20 and -24 dB, henceforth outer stimuli). For the first class of stimuli we expected large differences between the experimental groups, with better performance for synesthesia subjects if they indeed have a more sensitive binding mechanism. For the latter class of stimuli, we expected no large differences between groups. The data were analysed with two repeated-measures ANOVAs, one calculated for the inner and one for the outer SNR stimuli, with the factors STIMULATION (auditory vs. audiovisual), SNR (inner and outer range stimuli; 3 and 4 levels, respectively) and GROUP (control vs. synesthesia).
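The gain measure and the inner/outer grouping can be sketched as follows. The per-SNR accuracies are placeholder values for one hypothetical subject, not the paper's data.

```python
import numpy as np

# Percentage of correctly identified words per SNR (dB) for one
# hypothetical subject; placeholder values, not the paper's data.
snrs = np.array([0, -4, -8, -12, -16, -20, -24])
av = np.array([99., 90., 83., 75., 50., 27., 11.])  # audiovisual condition
a = np.array([97., 60., 51., 30., 12., 1., 0.])     # auditory-alone condition

# Visual gain: audiovisual minus auditory-alone performance (AV - A).
gain = av - a

# SNRs around the expected maximal gain (inner) vs. the rest (outer);
# each subset would enter its own repeated-measures ANOVA.
inner = np.isin(snrs, [-8, -12, -16])
outer = np.isin(snrs, [0, -4, -20, -24])
print("inner gains:", gain[inner])
print("outer gains:", gain[outer])
```

Note that the gain in this toy profile peaks at -12 dB, mirroring the inverted-U shape reported for normal subjects by Ross et al. (2007).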


Figure 1. Experiment 1. Audiovisual fusion as identified by the McGurk illusion: Performance in the incongruent and congruent conditions. Synesthetes show fewer fusion percepts. Error bars represent the standard error of the mean.

Results
Experiment 1
In all three audiovisually congruent trial types, accuracy levels were at ceiling (synesthetes: 98.7% ± 4.2%, controls: 96.2% ± 5.5%), as depicted in Figure 1. A t-test between groups did not show any differences. For the M-ADA stimuli, synesthetes had significantly fewer fusion responses (answer D; synesthesia: 22.4% ± 35.3%; control: 46.6% ± 39.7%; two-sided t-test, p < .05). Synesthetes thus perceived ADA less often, and their answers were driven mainly by the auditory information (answer B; synesthesia: 75.6% ± 38.3%; control: 52.1% ± 40%). The visually driven answer G was very rare in both groups (synesthesia: 2% ± 8.2%; control: 1.3% ± 5.8%).

Experiment 2
A 2 × 4 × 2 repeated-measures ANOVA (STIMULATION, SNR, GROUP) for the outer conditions revealed an effect of STIMULATION [F(1, 26) = 179.5, p < .001], an effect of SNR [F(3, 78) = 2433.0, p < .001] and an interaction of SNR and STIMULATION [F(3, 78) = 53.1, p < .001]. No differences between groups were detected in the outer conditions, indicating similar audiovisual processing in both groups for these SNRs. The effect of STIMULATION showed that additional visual information indeed leads to better comprehension of speech, while the effect of SNR shows that performance decreases with increasing noise intensity (see Figure 2A and Table 1). In the next step, the analysis focused on the inner SNRs close to the optimum for visual enhancement. A 2 × 3 × 2 repeated-measures ANOVA (STIMULATION, SNR, GROUP) revealed an effect of STIMULATION [F(1, 26) = 355.7, p < .001], an effect of SNR [F(2, 52) = 339.6, p < .001], an interaction of STIMULATION and SNR [F(2, 52) = 4.1, p < .05] and an interaction of STIMULATION and GROUP [F(1, 26) = 5.1, p < .05]. As in the analysis of the outer conditions, the effect of STIMULATION shows that visual information about the vocalization movements helps to understand speech. Again, the effect of SNR is expected, as understandability decreases with more noise in the signal (see Figure 2A and Table 1). Interestingly, the interaction between STIMULATION and GROUP showed that




Figure 2. Experiment 2. (A) Performance of both groups for the auditory-alone (A) and audiovisual (AV) conditions. (B) Gain in performance (difference between the performance in the AV and A conditions). Controls show a maximal benefit from additional visual information at -12 dB, which is significantly stronger than in the synesthetic group.

Table 1. Mean performance (% of correctly identified words) ± SD of both groups for the different SNRs in the auditory-alone and audiovisual stimulation

                          0 dB     -4 dB    -8 dB    -12 dB   -16 dB   -20 dB   -24 dB
Synaesthete: audiovisual  99 ± 2   90 ± 5   83 ± 9   67 ± 14  48 ± 15  26 ± 15  11 ± 10
Control: audiovisual      99 ± 1   91 ± 5   84 ± 6   76 ± 11  52 ± 13  28 ± 11  11 ± 11
Synaesthete: audio        97 ± 3   63 ± 13  52 ± 10  34 ± 11  14 ± 8    0 ± 1    0 ± 1
Control: audio            97 ± 4   59 ± 10  50 ± 6   28 ± 11  10 ± 5    1 ± 2    0 ± 1

audiovisual stimuli are processed differently in the two groups. To better understand this interaction effect, it is worth taking a look at the gain in performance (difference in performance between the audiovisual and auditory-alone conditions, Figure 2B). A post-hoc t-test [t(26) = 2.73; p < .05] on the gain in performance (AV - A) reveals significant differences at an SNR of -12 dB. Here, the group of synesthetes profits less from the visual information compared with controls, while the control population profits most from the


visual information in this condition. When performance in this condition is compared to the flanking conditions with a post-hoc t-test, a significant increase in controls at -12 dB compared with -8 and -16 dB is found [t(40) = 2.44; p < .05], which is missing in synesthetes [t(40) = 0.16; p = .87]. Thus, synesthetes show a different audiovisual integration behaviour. While non-synesthetes profit most from visual information at an SNR of -12 dB, the visual gain in synesthetes does not peak but is rather flat, being roughly equal from the -4 dB condition to the -20 dB condition. This indicates that synesthetes integrate visual and acoustical information, but this behaviour is slightly impaired, as the synesthesia subjects could not profit from visual information to the same degree as non-synesthetes.

Discussion
Two audiovisual experiments were performed to investigate multimodal integration mechanisms in synesthesia. Based on the hypothesis of an overactive integration mechanism in synesthesia (Esterman et al., 2006; Hubbard, 2007; Robertson, 2003), we had predicted a greater number of audiovisual fusion percepts in the first experiment and a better comprehension performance of these subjects for audiovisual speech in noisy situations. Contrary to these expectations, however, both experiments showed that audiovisual integration was weaker in the synesthetes. Synesthesia subjects showed fewer fusions in the McGurk illusion experiment and benefited less from visual information during audiovisual speech comprehension. Our results thus speak against a generally overactive binding mechanism as the core mechanism underlying synesthetic perception. Synesthesia is therefore a phenomenon which features increased multimodal integration not across the board but only in a restricted fashion, that is, for the inducer-concurrent coupling. First, with regard to the McGurk illusion, successful integration of visual and auditory stimuli leads to a specific illusory percept. Speech perception thus draws on both auditory and visual information when visual information is available. Recent research shows that brain regions usually associated with relatively late, high-level processes are implicated in this illusion (Jones & Callan, 2003). In particular, the left posterior superior temporal sulcus (STS) and adjacent brain areas have been found in relation to the McGurk illusion (Nath & Beauchamp, 2012; Szycik, Stadler, Tempelmann, & Münte, 2012). The STS has also been found in studies investigating other aspects of audiovisual speech perception (Szycik, Tausche, & Münte, 2008; Wright, Pelphrey, Allison, McKeown, & McCarthy, 2003). Another region important for audiovisual integration is the inferior frontal gyrus (IFG; Ojanen et al., 2005; Szycik, Jansma, & Münte, 2009). The interplay of these late regions has been sketched in the audiovisual-motor model of speech perception (Skipper, van Wassenhove, Nusbaum, & Small, 2007). As synesthetic experience has also been suggested to be driven by higher-level processes such as attention, semantic information or feature binding (Dixon et al., 2006; Esterman et al., 2006; Mattingley, Payne, & Rich, 2006; Mattingley et al., 2001), the McGurk illusion appears to be a good test case to investigate synesthesia. So far, however, only a single publication has dealt with the McGurk illusion in synesthesia (Bargary et al., 2009). Bargary et al. argued that synesthetic experience is a relatively late process. Using audiovisually incongruent whole-word stimulation, they focused on the perceptual processes in synesthesia after audiovisual fusion has happened. They were able to show that concurrent colours induced by spoken words are related to what is perceived rather than to the auditory input.


In contrast with Bargary et al., this study focused on the question of whether susceptibility to the McGurk illusion is altered in synesthesia compared with non-synesthetic participants. Surprisingly, a reduced susceptibility to the illusion was revealed, which suggests reduced audiovisual integration except for the specific inducer-concurrent pairing. One limitation of this study could be the fact that for the McGurk experiment 28 audiovisually incongruent (illusory) stimuli and 12 audiovisually congruent stimuli serving as controls were used. Thus, there was considerable imbalance in the design between the answer categories. It might be asked whether the group difference found in our experiment could be due to a different susceptibility of synesthesia subjects to such a design imbalance. Control subjects showed about 47% illusory fusion responses and were nearly perfect in the non-illusory trials. The pertinent literature has reported a wide range of fusion percepts, varying between 28 and 98% of fusions (Baum, Martin, Hamilton, & Beauchamp, 2012; Gentilucci & Cattaneo, 2005; Keil, Müller, Ihssen, & Weisz, 2012; McGurk & MacDonald, 1976). The answers of the control subjects for the illusory stimuli when fusion failed were driven mainly by the auditory information, attesting to the well-known auditory dominance in McGurk illusion experiments (Campbell et al., 1990). Thus, the imbalance in design had no influence on the response properties of the control subjects, in the sense that their performance conformed to the expected pattern. The synesthesia subjects showed the same response pattern as the control subjects for non-illusory stimuli (performance at ceiling, no significant difference to the control group) and for illusory stimuli where the fusion failed (auditory dominance). We therefore assume that the identified group difference regarding the number of fusions reflects differences in multisensory processing rather than a differential susceptibility to the design imbalance.
Whereas the McGurk illusion covers a rather unnatural aspect of audiovisual integration and thus might constitute a special case, everyday life features multiple situations in which multisensory facilitation occurs. Already in the fifth decade of the last century, it was shown that the presence of additional visual information leads to a considerable improvement in the intelligibility of auditory input under noisy conditions (Sumby & Pollack, 1954). The comprehension benefit afforded by visual information in the form of vocalization movements is particularly strong for specific SNRs (McGettigan et al., 2012; Ross et al., 2007). At an intermediate SNR of about -12 dB, multisensory integration is most evident in normal subjects. The general hyperbinding hypothesis of synesthesia (Hänggi et al., 2011) suggests that subjects affected by synesthesia should show either an additional gain in perception with concurrent audiovisual stimulation and/or a widening of the special zone of SNRs in such situations. The current study revealed marked differences between synesthesia subjects and normal participants even in this quasi-natural experimental situation. While synesthetic participants benefited from visual information, a specific additional enhancement was missing. Again, this suggests that enhanced audiovisual integration is restricted to the inducer-concurrent pairing. Multisensory integration processes outside this special situation are reduced rather than enhanced in synesthesia. This pattern speaks against the hyperbinding hypothesis. Obviously, more evidence should be gathered before the hyperbinding hypothesis is put to a final rest. As synesthesia is a heterogeneous phenomenon affecting different sensory modalities (Day, 2004), the current study can only speak to auditory-visual and grapheme-colour synesthetes.
Also, alternative classifications of synesthesia have been proposed, for example, using the self-reported localization of the concurrent perception (Dixon et al., 2004): so-called associators perceive the synesthetic sensations in their mind's eye, whereas projectors see synesthetic concurrents outside, for example, on


the page where the inducing letter is printed. These different groups may well have at least partially different processes underlying their experience and should be considered separately in future studies. The current study used only complex speech-related stimuli, which may engage top-down attentional processes to a greater extent than more basic stimuli. Thus, experiments with more basic stimuli could be helpful to investigate the hyperconnectivity/hyperbinding hypothesis of synesthesia. An initial effort in this direction was made by Brang, Williams, and Ramachandran (2011), who used simple auditory (sine tones) and visual (light points) stimulation to investigate the double-flash illusion (Shams, Kamitani, & Shimojo, 2000) in a rather small sample (n = 7). Synesthetes reported more illusory flashes than control subjects, from which the authors inferred that synesthesia is related to hyperbinding between the sensory modalities. Recently, Neufeld et al. (2012) used the same illusion in 18 synesthesia subjects. In contrast with Brang et al. (2012), a reduced number of illusions and, additionally, a reduced time window of the illusory double flash was revealed in synesthetes. Whether these differences can be explained by differences in the location of the synesthetic percept remains to be seen. The reduced multisensory integration of synesthetes in this study may alternatively be explained by the increased processing effort related to the increased information load induced by the synesthetic concurrent percept. Thus, the weaker performance of our synesthesia subjects might have been due to the fact that they had to integrate three sensory qualia instead of two (as the control subjects did). The fact that only few of our subjects reported synesthetic concurrents induced by heard voices (only three subjects in the McGurk experiment and only four subjects in the speech perception experiment) speaks against this explanation.
To test this hypothesis, we conducted the analysis again after removing the affected synesthesia subjects, with no considerable changes in the results. The reduced multisensory integration of synesthetes might derive directly from their special ability. Synesthetes usually report that they have no trouble identifying the synesthetic and real parts of their perception. To keep track of which perception is synesthetic and which is real (i.e., stimulated by the outer world), synesthetes have to separate the senses and perform a reality check. This increased separation of the senses might then extend to other, non-synesthetic situations as well, leading, as demonstrated by our experiments, to a decreased susceptibility to the McGurk illusion and a decreased benefit from multimodal information. It is also possible that these multisensory integration deficits cause the synesthetic perception. It could be that subjects with deficits in multisensory integration develop synesthesia to compensate for these deficits. This could explain why one of the most common forms is grapheme-colour synesthesia. When children learn to read and write, it is important that the auditory and visual senses work together properly, as acquiring reading/writing skills is mainly a transfer of information from the auditory domain (phonological information) to the visual domain. Thus, synesthesia might be a useful implicit strategy to overcome multisensory integration deficits. To test this idea, it would be useful to screen other types of synesthesia for multisensory integration deficits and to look whether the deficits match the involved senses. Our results shed new light on the definition of synesthesia as a "mingling of the senses". Mingled are only the synesthetic parts of synesthetes' experience, but not the normal parts of their sensory experience. Normal auditory-visual integration is even weaker. Thus, it would be equally appropriate to speak of synesthesia as a separation of the senses.


Acknowledgement
TFM has been supported by the DFG (a.o. SFB TR31, TP A7).

References
Baayen, R. H., Piepenbrock, R., & Gulikers, L. (1995). The CELEX lexical database [CD-ROM]. Retrieved from http://www.kun.nl/celex/
Bargary, G., Barnett, K. J., Mitchell, K. J., & Newell, F. N. (2009). Colored-speech synaesthesia is triggered by multisensory, not unisensory, perception. Psychological Science, 20, 529–533. doi:10.1111/j.1467-9280.2009.02338.x
Baron-Cohen, S., Wyke, M. A., & Binnie, C. (1987). Hearing words and seeing colours: An experimental investigation of a case of synaesthesia. Perception, 16, 761–767. doi:10.1068/p160761
Baum, S. H., Martin, R. C., Hamilton, A. C., & Beauchamp, M. S. (2012). Multisensory speech perception without the left superior temporal sulcus. Neuroimage, 62, 1825–1832. doi:10.1016/j.neuroimage.2012.05.034
Brang, D., Williams, L. E., & Ramachandran, V. S. (2012). Grapheme-color synesthetes show enhanced crossmodal processing between auditory and visual modalities. Cortex, 48, 630–637. doi:10.1016/j.cortex.2011.06.008
Campbell, R., Garwood, J., Franklin, S., Howard, D., Landis, T., & Regard, M. (1990). Neuropsychological studies of auditory-visual fusion illusions. Four case studies and their implications. Neuropsychologia, 28, 787–802. doi:10.1016/0028-3932(90)90003-7
Day, S. (2004). Some demographical and socio-cultural aspects of synesthesia. In L. C. Robertson & N. Sagiv (Eds.), Synesthesia: Perspectives from cognitive neuroscience (pp. 11–33). New York: Oxford University Press.
Dixon, M. J., Smilek, D., Duffy, P. L., Zanna, M. P., & Merikle, P. M. (2006). The role of meaning in grapheme-colour synaesthesia. Cortex, 42, 243–252. doi:10.1016/S0010-9452(08)70349-6
Dixon, M. J., Smilek, D., & Merikle, P. M. (2004). Not all synaesthetes are created equal: Projector versus associator synaesthetes. Cognitive, Affective, & Behavioral Neuroscience, 4, 335–343. doi:10.3758/CABN.4.3.335
Eagleman, D. M., Kagan, A. D., Nelson, S. S., Sagaram, D., & Sarma, A. K. (2007). A standardized test battery for the study of synesthesia. Journal of Neuroscience Methods, 159, 139–145. doi:10.1016/j.jneumeth.2006.07.012
Esterman, M., Verstynen, T., Ivry, R. B., & Robertson, L. C. (2006). Coming unbound: Disrupting automatic integration of synesthetic color and graphemes by transcranial magnetic stimulation of the right parietal lobe. Journal of Cognitive Neuroscience, 18, 1570–1576. doi:10.1162/jocn.2006.18.9.1570
Gentilucci, M., & Cattaneo, L. (2005). Automatic audiovisual integration in speech perception. Experimental Brain Research, 167, 66–75. doi:10.1007/s00221-005-0008-z
Grossenbacher, P. G., & Lovelace, C. T. (2001). Mechanisms of synesthesia: Cognitive and physiological constraints. Trends in Cognitive Sciences, 5, 36–41. doi:10.1016/S1364-6613(00)01571-0
Hänggi, J., Wotruba, D., & Jäncke, L. (2011). Globally altered structural brain network topology in grapheme-color synesthesia. Journal of Neuroscience, 31, 5816–5828. doi:10.1523/JNEUROSCI.0964-10.2011
Hubbard, E. M. (2007). Neurophysiology of synesthesia. Current Psychiatry Reports, 9, 193–199. doi:10.1007/s11920-007-0018-6


Jones, J. A., & Callan, D. E. (2003). Brain activity during audiovisual speech perception: An fMRI study of the McGurk effect. NeuroReport, 14, 1129–1133. doi:10.1097/01.wnr.0000074343.81633.2a
Keil, J., Müller, N., Ihssen, N., & Weisz, N. (2012). On the variability of the McGurk effect: Audiovisual integration depends on prestimulus brain states. Cerebral Cortex, 22, 221–231. doi:10.1093/cercor/bhr125
Mattingley, J. B., Payne, J. M., & Rich, A. N. (2006). Attentional load attenuates synaesthetic priming effects in grapheme-colour synaesthesia. Cortex, 42, 213–221. doi:10.1016/S0010-9452(08)70346-0
Mattingley, J. B., Rich, A. N., Yelland, G., & Bradshaw, J. L. (2001). Unconscious priming eliminates automatic binding of colour and alphanumeric form in synaesthesia. Nature, 410, 580–582. doi:10.1038/35069062
McGettigan, C., Faulkner, A., Altarelli, I., Obleser, J., Baverstock, H., & Scott, S. K. (2012). Speech comprehension aided by multiple modalities: Behavioural and neural interactions. Neuropsychologia, 50, 762–776. doi:10.1016/j.neuropsychologia.2012.01.010
McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746–748. doi:10.1038/264746a0
Mills, C. B., Boteler, E. H., & Oliver, G. K. (1999). Digit synaesthesia: A case study using a Stroop-type test. Cognitive Neuropsychology, 16, 181–191. doi:10.1080/026432999380951
Nath, A. R., & Beauchamp, M. S. (2012). A neural basis for interindividual differences in the McGurk effect, a multisensory speech illusion. Neuroimage, 59, 781–787. doi:10.1016/j.neuroimage.2011.07.024
Neufeld, J., Sinke, C., Zedler, M., Emrich, H. M., & Szycik, G. R. (2012). Reduced audio-visual integration in synaesthetes indicated by the double-flash illusion. Brain Research, 1473, 78–86. doi:10.1016/j.brainres.2012.07.011
Ojanen, V., Möttönen, R., Pekkola, J., Jääskeläinen, I. P., Joensuu, R., Autti, T., & Sams, M. (2005). Processing of audiovisual speech in Broca's area. Neuroimage, 25, 333–338. doi:10.1016/j.neuroimage.2004.12.001
Ramachandran, V., & Hubbard, E. M. (2001). Synaesthesia - A window into perception, thought and language. Journal of Consciousness Studies, 8, 3–34.
Robertson, L. C. (2003). Binding, spatial attention and perceptual awareness. Nature Reviews Neuroscience, 4, 93–102. doi:10.1038/nrn1030
Ross, L. A., Saint-Amour, D., Leavitt, V. M., Javitt, D. C., & Foxe, J. J. (2007). Do you see what I am saying? Exploring visual enhancement of speech comprehension in noisy environments. Cerebral Cortex, 17, 1147–1153. doi:10.1093/cercor/bhl024
Sagiv, N., Heer, J., & Robertson, L. (2006). Does binding of synesthetic color to the evoking grapheme require attention? Cortex, 42, 232–242. doi:10.1016/S0010-9452(08)70348-4
Shams, L., Kamitani, Y., & Shimojo, S. (2000). Illusions. What you see is what you hear. Nature, 408, 788. doi:10.1038/35048669
Simner, J., & Logie, R. H. (2007). Synaesthetic consistency spans decades in a lexical-gustatory synaesthete. Neurocase, 13, 358–365. doi:10.1080/13554790701851502
Simner, J., Mulvenna, C., Sagiv, N., Tsakanikos, E., Witherby, S. A., Fraser, C., … Ward, J. (2006). Synaesthesia: The prevalence of atypical cross-modal experiences. Perception, 35, 1024–1033. doi:10.1068/p5469
Skipper, J. I., van Wassenhove, V., Nusbaum, H. C., & Small, S. L. (2007). Hearing lips and seeing voices: How cortical areas supporting speech production mediate audiovisual speech perception. Cerebral Cortex, 17, 2387–2399. doi:10.1093/cercor/bhl147
Smilek, D., Dixon, M. J., Cudahy, C., & Merikle, P. M. (2001). Synaesthetic photisms influence visual perception. Journal of Cognitive Neuroscience, 13, 930–936. doi:10.1162/089892901753165845
Sumby, W. H., & Pollack, I. (1954). Visual contribution to speech intelligibility in noise. Journal of the Acoustical Society of America, 26, 212–215.


Szycik, G. R., Jansma, H., & Münte, T. F. (2009). Audiovisual integration during speech comprehension: An fMRI study comparing ROI-based and whole brain analyses. Human Brain Mapping, 30, 1990–1999. doi:10.1002/hbm.20640
Szycik, G. R., Stadler, J., Tempelmann, C., & Münte, T. F. (2012). Examining the McGurk illusion using high-field 7 Tesla functional MRI. Frontiers in Human Neuroscience, 6, 95. doi:10.3389/fnhum.2012.00095
Szycik, G. R., Tausche, P., & Münte, T. F. (2008). A novel approach to study audiovisual integration in speech perception: Localizer fMRI and sparse sampling. Brain Research, 1220, 142–149. doi:10.1016/j.brainres.2007.08.027
Ward, J., Huckstep, B., & Tsakanikos, E. (2006). Sound-colour synaesthesia: To what extent does it use cross-modal mechanisms common to us all? Cortex, 42, 264–280. doi:10.1016/S0010-9452(08)70352-6
Wright, T. M., Pelphrey, K. A., Allison, T., McKeown, M. J., & McCarthy, G. (2003). Polysensory interactions along lateral temporal regions evoked by audiovisual speech. Cerebral Cortex, 13, 1034–1043. doi:10.1093/cercor/13.10.1034

Received 31 May 2012; revised version received 24 October 2012
