
Naveen K. Nagaraj: JASA Express Letters

[http://dx.doi.org/10.1121/1.4962374]

Published Online 9 September 2016

Amplitude modulation detection with concurrent frequency modulation
Naveen K. Nagaraj
Cognitive Hearing Science Lab, Department of Audiology and Speech Pathology,
University of Arkansas for Medical Sciences/University of Arkansas at Little Rock,
2801 South University Avenue, Little Rock, Arkansas 72204, USA
nknagaraj@uams.edu

Abstract: Human speech consists of concomitant temporal modulations
in amplitude and frequency that are crucial for speech perception. In this
study, amplitude modulation (AM) detection thresholds were measured
for 550 and 5000 Hz carriers with and without concurrent frequency
modulation (FM), at AM rates crucial for speech perception. Results
indicate that adding 40 Hz FM interferes with AM detection, more so for
the 5000 Hz carrier and for frequency deviations exceeding the critical
bandwidth of the carrier frequency. These findings suggest that future cochlear
implant processors encoding speech fine structure may consider limiting
the FM to narrow bandwidth and to low frequencies.
© 2016 Acoustical Society of America

[QJF]
Date Received: May 5, 2016

Date Accepted: July 29, 2016

1. Introduction
Temporal modulations in amplitude and frequency are the fundamental building
blocks of complex communication signals produced by humans, animals, and birds.
Human speech is rich in dynamic amplitude modulation (AM) and frequency modulation
(FM) cues that are decoded in our auditory system. Speech contains distinct
AM near the syllabic rate of 3 to 4 Hz, and speech perception studies suggest that AM
frequencies below 20 Hz are crucial and sufficient for accurate speech recognition in
quiet (Drullman et al., 1994; Shannon et al., 1995). Most current cochlear implant
(CI) processors mainly extract low frequency AM information and transmit it to specific
electrodes depending on the spectral channel, achieving great success in speech
recognition. However, research evidence suggests that AM cues alone cannot support good
speech recognition in noise (Fu and Shannon, 1999; Stickney et al., 2004). FM cues
such as formant transitions, changes in pitch, and the vibratory pattern of an individual's
vocal folds also help in speech recognition. Studies designed to understand the relative
importance of AM and FM cues suggest that FM information extracted from the fine
structure of the speech signal is important for speech recognition in noise, speaker
identification, music perception, and tonal language perception (Smith et al., 2002;
Zeng et al., 2005). Concurrent AM and FM are the hallmark of speech, and these
findings imply that AM and FM cues offer distinct, complementary information about
the speech signal and are crucial for segmenting and understanding target speech in
complex listening situations. Based on these results, speech coding strategies for CI
processors have been proposed to encode FM cues along with AM cues (Nie et al., 2005).
Mixed modulation (simultaneous AM and FM) perception studies in normal
hearing listeners have demonstrated constructive summation of subthreshold FM and
AM cues at low modulation rates (Moore and Sek, 1992; Ozimek and Sek, 1987).
However, these studies have used the same modulation rates for both AM and FM,
whereas speech contains simultaneous temporal modulation in the range of 1 to 8 Hz
corresponding to syllabic rate and spectral modulation <4 cycles/octave corresponding
to formant transitions that are most important for speech intelligibility (Pasley et al.,
2012). Magnetoencephalography (MEG) evidence suggests that both envelope cues
and fine structure cues interact in our auditory cortex (Ding and Simon, 2009; Luo
et al., 2006). For example, Ding and Simon (2009) measured the cortical MEG response in
ten young adults using a 550 Hz carrier tone frequency modulated at a rate of 37.7 Hz with
concurrent AM varying from 0.3 to 13.8 Hz. They found that the auditory steady state
response (ASSR) to the fast FM was concurrently amplitude and phase modulated by the
slow AM. This suggests that slow temporal envelope cues are encoded in the auditory
cortex in the carrier amplitude dynamics as well as in the carrier frequency dynamics.
In a study by Hsieh and Saberi (2010), detection of AM was measured in stationary and dynamic sweep frequency carriers. They found that compared to
stationary pure tone carriers, AM detection was harder when the carrier consisted of
dynamic frequency sweeps, especially for AM rates of 8 Hz. However, no other
psychoacoustic studies have measured the detection of slow AM (< 8 Hz), which is
important for speech recognition, with concurrent fast FM of the carrier. Clearly, there
is a lack of perceptual studies that have investigated the role of concurrent FM in the
perception of slow temporal modulation. The purpose of the current study is to seek
the perceptual correlate of the ASSR findings for mixed modulation stimuli (Ding and
Simon, 2009). Cortical ASSRs to concurrent fast FM (40 Hz) and slow AM
(<15 Hz) indicate that temporal envelope information is simultaneously coded in the
neural response at the AM rate and in the neural response at the fast FM rate. Based on
these findings, it can be hypothesized that our auditory system may utilize multiple cues to
detect slow temporal modulation. Specifically, in this study, listeners' ability to detect
AM with and without concurrent 40 Hz FM was measured using sinusoidal carrier
stimuli where neural phase locking is present (550 Hz) and most likely absent
(5000 Hz). The main research question of this study is: Do multiple neural
representations of the slow temporal envelopes of mixed modulation stimuli found in ASSR
studies contribute to detection of slow rate AM, which is crucial for speech perception?
2. Method
2.1 Stimuli
Concurrent amplitude and frequency modulation signals were generated using the following mathematical equation (Ozimek and Sek, 1987):

$$
x(t) = A\,\bigl[1 + m\cos(2\pi f_{am} t)\bigr]\,\cos\!\left[2\pi f_c t + \frac{\Delta f}{f_{fm}}\sin(2\pi f_{fm} t + \theta)\right], \tag{1}
$$

where A denotes the amplitude of the signal, fc is the carrier frequency, fam is the AM
rate, ffm is the FM rate, θ represents the phase of AM and FM, respectively, Δf is
the frequency deviation around fc, and m is the modulation depth (0 ≤ m ≤ 1) for
AM.
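As an illustration only, Eq. (1) can be sampled directly. The sketch below is not the MATLAB code used in the study; it is a minimal standard-library Python version in which the function name `mixed_modulation` and its argument names are hypothetical. The default sampling rate (48 kHz), interval duration (750 ms), and raised cosine ramps (5 ms) are the values reported in the Method section; presentation level and calibration are not modeled.

```python
import math


def mixed_modulation(fc, fam, m, delta_f, ffm=40.0, theta=0.0,
                     amplitude=1.0, dur=0.75, fs=48000, ramp=0.005):
    """Sample x(t) of Eq. (1) and gate it with raised-cosine ramps.

    fc, fam, ffm, delta_f are in Hz; m is the AM depth (0 <= m <= 1).
    Setting delta_f = 0 gives the AM-only (no FM) condition.
    """
    n = int(round(dur * fs))        # total number of samples
    n_ramp = int(round(ramp * fs))  # samples in each onset/offset ramp
    beta = delta_f / ffm            # FM modulation index, delta_f / ffm
    x = []
    for i in range(n):
        t = i / fs
        envelope = 1.0 + m * math.cos(2 * math.pi * fam * t)
        phase = (2 * math.pi * fc * t
                 + beta * math.sin(2 * math.pi * ffm * t + theta))
        # Distance (in samples) to the nearest end of the stimulus.
        edge = min(i, n - 1 - i)
        if edge < n_ramp:
            gate = 0.5 * (1.0 - math.cos(math.pi * edge / n_ramp))
        else:
            gate = 1.0
        x.append(amplitude * envelope * math.cos(phase) * gate)
    return x


# Example: 5000 Hz carrier, 5 Hz AM at m = 0.79, 40 Hz FM with 2700 Hz deviation.
signal = mixed_modulation(fc=5000, fam=5, m=0.79, delta_f=2700)
```

A vectorized array library would be the more usual choice for this; plain `math` is used here only so the sketch runs without extra dependencies.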
The AM detection threshold (AMDT) was measured for sinusoidal carrier signals
(fc = 550 and 5000 Hz) with and without concurrent FM (ffm = 40 Hz). An FM rate of
40 Hz was used because it corresponds to the formant transition frequency in speech and
because the cortical ASSR is robust at this frequency. The phase of AM and FM
was set to zero (θ = 0). The FM ranges were selected to cover frequency
deviations within and beyond the critical bandwidth for each fc. Hence, Δf of ±33
and ±330 Hz were selected for the 550 Hz carrier, and Δf of ±330 and ±2700 Hz were
selected for the 5000 Hz carrier. For conditions with no FM, Δf was set to 0. AMDT were
measured for each AM rate, fam = 2, 5, 10, or 20 Hz. These values of fam were selected
because these rates have been found to be crucial for speech recognition. All the stimuli were
generated digitally using MATLAB (MathWorks) and a 24 bit real-time signal processor
(Tucker Davis Technologies RZ-6) at a sampling rate of 48 kHz. The stimuli were
presented monaurally at 70 dB sound pressure level (SPL) to the listener's left ear through
an ER-2A insert earphone.
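The text does not state how the critical bandwidth was computed. As a rough plausibility check of the chosen deviations, the sketch below makes two assumptions that are not from the paper: it uses the Zwicker and Terhardt approximation of the critical bandwidth, and it compares the peak-to-peak frequency excursion (2Δf) against that bandwidth. The function names are hypothetical.

```python
def critical_bandwidth_hz(fc_hz):
    """Critical bandwidth in Hz (Zwicker-Terhardt approximation, assumed here)."""
    f_khz = fc_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69


def classify(fc_hz, delta_f_hz):
    """Return 'within' or 'beyond' for a deviation of +/- delta_f_hz around fc_hz.

    Assumption: the relevant quantity is the peak-to-peak excursion 2 * delta_f.
    """
    excursion = 2.0 * delta_f_hz
    return "beyond" if excursion > critical_bandwidth_hz(fc_hz) else "within"


# Deviations reported in the Stimuli section, keyed by carrier frequency (Hz).
conditions = {550: (33, 330), 5000: (330, 2700)}
for fc, deviations in conditions.items():
    for delta_f in deviations:
        print(fc, delta_f, round(critical_bandwidth_hz(fc)), classify(fc, delta_f))
```

Under these assumptions the smaller deviation at each carrier falls within the critical band and the larger one falls beyond it, which matches the stated design. A different bandwidth estimate (for example, an equivalent rectangular bandwidth) could move the boundary.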
2.2 Procedure
Four normal hearing adults (age range: 23 to 37 years) participated in this study. All
participants had hearing thresholds less than 20 dB hearing level (HL) between 250
and 8000 Hz. Participants were trained on all the conditions before the actual data
were collected. AMDT was measured using a three-interval forced-choice adaptive
procedure that estimated the 79.4% point on the psychometric function (Levitt, 1971). Before each
observation trial, a visual cue "ready" was displayed for 300 ms. The duration of each
observation interval stimulus was 750 ms including 5 ms raised cosine onset and offset
ramps. Each stimulus interval was also marked by visual cues and the inter-stimulus
interval was set to 500 ms. In one of the randomly selected intervals, the carrier signal
was modulated in amplitude. Listeners were asked to select the interval that was
different using a keyboard, and correct-interval feedback was provided following the
response. The AM depth (m) was initially set to 0.79 (20 log m = −2 dB) so that it was
clearly detected. Subsequently, m was adaptively varied following a 3-down, 1-up rule
using a step size of 2 dB during the first four reversals, reduced to 1 dB for
the remaining eight reversals. After each run, the modulation depths at the last eight
reversals were averaged to obtain a threshold estimate. For each condition, two runs
were completed before selecting the next condition. A total of four runs per condition
were completed and the geometric mean of threshold estimates was computed to
obtain the final AMDT for each condition. The order of experimental conditions was
randomly selected, but all four participants completed them in the same order.
Listeners typically completed a 2 hour session per day in a double walled, sound
treated booth and took approximately 8 to 9 sessions to complete the experiment.

Fig. 1. (Color online) Thresholds for detection of AM for 550 and 5000 Hz carrier frequencies (fc) as a function
of AM rate (fam) and frequency deviation (Δf). Error bars represent the standard error of the geometric mean.
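The adaptive track described above can be sketched as follows. A 3-down, 1-up rule converges on the stimulus level at which three consecutive correct responses are as likely as not, that is, where p^3 = 0.5 and p ≈ 0.794, which is the 79.4% point cited from Levitt (1971). This is a minimal sketch, not the software used in the study: the `staircase` function and the `respond` callback are hypothetical, reversal levels are averaged in dB (the text does not say whether m or 20 log m was averaged), and the depth is capped at 0 dB because m cannot exceed 1.

```python
TARGET = 0.5 ** (1 / 3)  # proportion correct tracked by a 3-down, 1-up rule


def staircase(respond, start_db=-2.0, n_reversals=12, n_average=8):
    """Run one 3-down, 1-up track on AM depth expressed as 20*log10(m) in dB.

    respond(depth_db) must return True when the listener picks the AM interval.
    The step is 2 dB until four reversals have occurred, then 1 dB.
    Returns the mean depth (dB) over the last n_average reversals.
    """
    depth_db = start_db
    correct_in_row = 0
    direction = 0   # -1 after a downward move, +1 after an upward move
    reversals = []
    while len(reversals) < n_reversals:
        if respond(depth_db):
            correct_in_row += 1
            if correct_in_row < 3:
                continue          # need three correct in a row to step down
            correct_in_row = 0
            move = -1
        else:
            correct_in_row = 0
            move = +1             # a single error steps the depth up
        if direction != 0 and move != direction:
            reversals.append(depth_db)
        direction = move
        step = 2.0 if len(reversals) < 4 else 1.0
        depth_db = min(depth_db + move * step, 0.0)
    return sum(reversals[-n_average:]) / n_average


# Idealized listener who is always correct at or above -20 dB and always
# wrong below it; the track ends up alternating between -20 and -21 dB.
estimate_db = staircase(lambda depth_db: depth_db >= -20.0)
```

Because thresholds are expressed in dB, the geometric mean of m across the four runs of a condition is the same as the arithmetic mean of the per-run dB estimates.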
3. Results
The data from individual participants showed a similar trend; hence the average data for
each carrier frequency are displayed in Fig. 1. Repeated measures analysis of variance
(ANOVA) was performed with AMDT (20 log m) as the dependent variable for each carrier,
with frequency deviation (Δf) and AM rate (fam) as factors. ANOVA revealed a
significant effect of Δf for the 550 Hz [F(2, 6) = 18.79, p = 0.002] and 5000 Hz
[F(2, 6) = 40.13, p < 0.001] carrier frequencies. The main effect of fam was also
significant for 550 Hz [F(2, 6) = 18.79, p = 0.002] and 5000 Hz [F(3, 9) = 17.57,
p < 0.001]. There was also a significant interaction between Δf and fam for the 550 Hz
[F(6, 18) = 5.27, p = 0.002] and 5000 Hz [F(6, 18) = 2.99, p = 0.033] carrier
frequencies. Planned post hoc Tukey analysis was performed to study the effect of 40 Hz
FM on AMDT at each fam (as shown in the two panels of Fig. 1). Adding concurrent 40 Hz
FM at the 550 Hz carrier frequency (left panel) did not significantly affect the AMDT
for fam of 2 and 5 Hz. AMDT was not significantly different between Δf of ±33 Hz and
0 Hz. For fam of 10 and 20 Hz, AMDT was substantially impaired when Δf of ±330 Hz was
presented simultaneously. However, for the 5000 Hz carrier frequency (right panel),
there was a significant effect of concurrent 40 Hz FM with Δf of ±2700 Hz on AMDT for
all the AM rates. At Δf of ±330 Hz, AMDT was impaired only for fam of 10 and 20 Hz.
4. Discussion
The aim of this study was to measure the participants' AMDT with and without concurrent
FM to examine whether the multiple neural representations of slow AM found
in an ASSR study using mixed modulation stimuli (Ding and Simon, 2009) aid in detection
of AM crucial for speech perception. The results suggest that, in
general, adding FM to the AM carrier disturbed listeners' ability to detect AM when
the frequency deviation exceeded the critical bandwidth for fam of 10 and 20 Hz.
However, at 550 Hz, concurrent FM did not significantly affect AMDT for fam ≤ 5 Hz.
These results are comparable to second order temporal modulation detection thresholds
(Lorenzi et al., 2001). This is because concurrent FM creates additional AM at
the output of the cochlear filter, creating secondary modulation over the existing AM.
Figure 2 shows the relation between the input stimulus parameters fc, Δf, and fam
with AM depth (m = 0.79) and the output of the auditory nerve (AN) based on
the phenomenological auditory model (Zilany et al., 2009; Zilany et al., 2014). This
auditory model consists of functional components from the human middle ear to the
AN and has been found to accurately predict AM transfer function data (Zilany et al.,
2009). The AN firing rate shown in Fig. 2 represents the response obtained from low
spontaneous rate fibers of the AN tuned to the carrier frequency of the stimulus presented at
70 dB SPL. The top row of each panel of Fig. 2 shows the AN firing for AM carrier
signals with no FM. Firing patterns of the AN represent the modulation of the signal.
Fig. 2. Response of the phenomenological auditory model, which simulates human peripheral auditory processing from the middle ear to the auditory nerve output. The model provides a realistic response from the population of AN fibers to simple and complex sounds. The output from the model shows the effect of frequency
deviation (Δf) on the average firing rate of low spontaneous rate AN fibers tuned to the stimulus carrier frequency
(fc) with AM depth (m) of −2 dB at 70 dB SPL. Each panel shows the AN response for pure tone carrier frequencies fc = 550 or 5000 Hz with AM rates (fam) of 5 or 20 Hz.

The bottom two rows of each panel show the firing pattern of AN fibers for the AM carrier
with concurrent FM having Δf within and beyond the critical bandwidth of
the carrier frequency, respectively. Note that the AN output for the concurrent FM
suggests that the neural firing rate is further modulated in amplitude by the concurrent
40 Hz FM. Also, as Δf increased beyond the critical bandwidth of the carrier (bottom
row of each panel in Fig. 2), modulation depth at the neural output also
increased. AN output data from the model are consistent with the perceptual data,
suggesting that adding concurrent FM to the carrier has an effect similar to modulation
masking (Füllgrabe et al., 2005; Lorenzi et al., 2001), making it harder to detect the
primary AM. This masking effect is more pronounced at fc = 5000 Hz, where the phase
locking mechanism is likely absent, and when the frequency deviation exceeds the
critical bandwidth of fc. Neural recordings from the awake primate auditory cortex
also suggest that spectral context affects temporal envelope encoding (Malone
et al., 2013). They found that a change in the carrier type of AM stimuli, from tone to
narrow band noise, significantly affected the firing rate and temporal modulation transfer
function. Based on the firing pattern, they predicted better discrimination for AM
tones compared to AM noise. This indicates that cortical neurons do not encode
envelope modulation independently of spectral content.
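The link between frequency deviation and the secondary modulation discussed above can be made explicit from Eq. (1). The short derivation below is a sketch; the symbols φ(t), f_i(t), and β are introduced here as shorthand and do not appear in the original text.

$$
\phi(t) = 2\pi f_c t + \frac{\Delta f}{f_{fm}}\sin(2\pi f_{fm} t + \theta),
\qquad
f_i(t) = \frac{1}{2\pi}\frac{d\phi}{dt} = f_c + \Delta f\,\cos(2\pi f_{fm} t + \theta)
$$

$$
\beta = \frac{\Delta f}{f_{fm}}:\qquad
\frac{33}{40} = 0.825,\qquad
\frac{330}{40} = 8.25,\qquad
\frac{2700}{40} = 67.5
$$

The instantaneous frequency therefore sweeps between fc − Δf and fc + Δf at the 40 Hz FM rate. When that sweep stays inside the passband of the auditory filter centered on fc, the filter output changes little in level; when it extends beyond the passband, the carrier moves in and out of the filter and the output level fluctuates, which is the additional AM described in the text.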
Results from this study are also consistent with AMDT found in simultaneous
FM glides by Hsieh and Saberi (2010). They measured AMDT for sweep frequency
carriers and stationary carriers for AM rates from 8 to 128 Hz and found that AMDT
were poorer for the sweep frequency carrier. Previous studies on mixed modulation
perception also suggest that the FM detection threshold gets worse in the presence of
simultaneous AM for normal hearing (Moore and Sek, 1996) and hearing impaired (Moore
and Skrodzka, 2002) listeners. This is also true for listeners with CIs. Luo and Fu
(2007) measured FM detection thresholds in CI users using 10 Hz sinusoidal FM and
upward frequency sweeps with and without simultaneous AM. Their
results also suggest that FM detection thresholds were impaired in CI users
when concurrent AM was added. The present data indicate that adding FM may interfere
with the perception of low rate AM cues at high carrier frequencies. Although more
work is clearly needed to understand the complex interaction of mixed modulation in the
auditory system, existing data do suggest the possibility of FM interfering with AM cues.
However, FM interference is smaller for low carrier frequencies, where the phase locking
mechanism is present, and at low AM rates (≤5 Hz). This suggests that limiting the encoding of
fine structure cues in future CI speech processors to low frequencies may reduce the
potential interaction with AM cues that are crucial for speech perception.
Acknowledgments
The author would like to thank Dr. Samuel Atcherson for helpful comments on AN
modelling and Tia McDonald for assistance with data collection.
References and links
Ding, N., and Simon, J. Z. (2009). "Neural representations of complex temporal modulations in the human
auditory cortex," J. Neurophysiol. 102(5), 2731–2743.
Drullman, R., Festen, J. M., and Plomp, R. (1994). "Effect of temporal envelope smearing on speech
reception," J. Acoust. Soc. Am. 95(2), 1053–1064.
Fu, Q. J., and Shannon, R. V. (1999). "Phoneme recognition by cochlear implant users as a function of signal-to-noise ratio and nonlinear amplitude mapping," J. Acoust. Soc. Am. 106(2), L18–L23.
Füllgrabe, C., Moore, B. C. J., Demany, L., Ewert, S. D., Sheft, S., and Lorenzi, C. (2005). "Modulation
masking produced by second-order modulators," J. Acoust. Soc. Am. 117(4), 2158–2168.
Hsieh, I. H., and Saberi, K. (2010). "Detection of sinusoidal amplitude modulation in logarithmic frequency sweeps across wide regions of the spectrum," Hear. Res. 262(1-2), 9–18.
Levitt, H. (1971). "Transformed up-down methods in psychoacoustics," J. Acoust. Soc. Am. 49(2),
467–477.
Lorenzi, C., Soares, C., and Vonner, T. (2001). "Second-order temporal modulation transfer functions,"
J. Acoust. Soc. Am. 110(2), 1030–1038.
Luo, H., Wang, Y., Poeppel, D., and Simon, J. Z. (2006). "Concurrent encoding of frequency and amplitude modulation in human auditory cortex: MEG evidence," J. Neurophysiol. 96(5), 2712–2723.
Luo, X., and Fu, Q. J. (2007). "Frequency modulation detection with simultaneous amplitude modulation
by cochlear implant users," J. Acoust. Soc. Am. 122(2), 1046–1054.
Malone, B. J., Beitel, R. E., Vollmer, M., Heiser, M. A., and Schreiner, C. E. (2013). "Spectral context
affects temporal processing in awake auditory cortex," J. Neurosci. 33(22), 9431–9450.
Moore, B. C., and Sek, A. (1992). "Detection of combined frequency and amplitude modulation,"
J. Acoust. Soc. Am. 92(6), 3119–3131.
Moore, B. C., and Sek, A. (1996). "Detection of frequency modulation at low modulation rates: Evidence
for a mechanism based on phase locking," J. Acoust. Soc. Am. 100(4), 2320–2331.
Moore, B. C., and Skrodzka, E. (2002). "Detection of frequency modulation by hearing-impaired listeners:
Effects of carrier frequency, modulation rate, and added amplitude modulation," J. Acoust. Soc. Am.
111(1), 327–335.
Nie, K., Stickney, G., and Zeng, F. G. (2005). "Encoding frequency modulation to improve cochlear
implant performance in noise," IEEE Trans. Bio-Med. Eng. 52(1), 64–73.
Ozimek, E., and Sek, A. (1987). "Perception of amplitude and frequency modulated signals (mixed modulation)," J. Acoust. Soc. Am. 82(5), 1598–1603.
Pasley, B. N., David, S. V., Mesgarani, N., Flinker, A., Shamma, S. A., Crone, N. E., Knight, R. T., and
Chang, E. F. (2012). "Reconstructing speech from human auditory cortex," PLOS Biol. 10(1), e1001251.
Shannon, R. V., Zeng, F. G., Kamath, V., Wygonski, J., and Ekelid, M. (1995). "Speech recognition with
primarily temporal cues," Science 270(5234), 303–304.
Smith, Z. M., Delgutte, B., and Oxenham, A. J. (2002). "Chimaeric sounds reveal dichotomies in auditory
perception," Nature 416(6876), 87–90.
Stickney, G. S., Zeng, F. G., Litovsky, R., and Assmann, P. (2004). "Cochlear implant speech recognition
with speech maskers," J. Acoust. Soc. Am. 116(2), 1081–1091.
Zeng, F. G., Nie, K., Stickney, G. S., Kong, Y. Y., Vongphoe, M., Bhargave, A., Wei, C., and Cao, K.
(2005). "Speech recognition with amplitude and frequency modulations," Proc. Natl. Acad. Sci. U.S.A.
102(7), 2293–2298.
Zilany, M. S. A., Bruce, I. C., and Carney, L. H. (2014). "Updated parameters and expanded simulation
options for a model of the auditory periphery," J. Acoust. Soc. Am. 135(1), 283–286.
Zilany, M. S. A., Bruce, I. C., Nelson, P. C., and Carney, L. H. (2009). "A phenomenological model of the
synapse between the inner hair cell and auditory nerve: Long-term adaptation with power-law dynamics," J. Acoust. Soc. Am. 126(5), 2390–2412.

J. Acoust. Soc. Am. 140 (3), September 2016, EL251–EL255
