1 The Perception of Musical Tones

Andrew J. Oxenham
Department of Psychology, University of Minnesota, Minneapolis

I. Introduction
A. What Are Musical Tones?
The definition of a tone (a periodic sound that elicits a pitch sensation) encompasses
the vast majority of musical sounds. Tones can be either pure (sinusoidal variations
in air pressure at a single frequency) or complex. Complex tones can be divided into
two categories, harmonic and inharmonic. Harmonic complex tones are periodic,
with a repetition rate known as the fundamental frequency (F0), and are composed of
a sum of sinusoids with frequencies that are all integer multiples, or harmonics,
of the F0. Inharmonic complex tones are composed of multiple sinusoids that are
not simple integer multiples of any common F0. Most musical instrumental or
vocal tones are more or less harmonic but some, such as bell chimes, can be
inharmonic.

B. Measuring Perception
The physical attributes of a sound, such as its intensity and spectral content, can be
readily measured with modern technical instrumentation. Measuring the perception
of sound is a different matter. Gustav Fechner, a 19th-century German scientist,
is credited with founding the field of psychophysics, the attempt to establish a
quantitative relationship between physical variables (e.g., sound intensity and fre-
quency) and the sensations they produce (e.g., loudness and pitch; Fechner, 1860).
The psychophysical techniques that have been developed since Fechner's time to
tap into our perceptions and sensations (involving hearing, vision, smell, touch, and
taste) can be loosely divided into two categories of measures, subjective and objec-
tive. The subjective measures typically require participants to estimate or produce
magnitudes or ratios that relate to the dimension under study. For instance, in
establishing a loudness scale, participants may be presented with a series of tones
at different intensities and then asked to assign a number to each tone, correspond-
ing to its loudness. This method of magnitude estimation thus produces a psycho-
physical function that directly relates loudness to sound intensity. Ratio estimation
follows the same principle, except that participants may be presented with two
The Psychology of Music. DOI:
2013 Elsevier Inc. All rights reserved.

sounds and then asked to judge how much louder (e.g., twice or three times) one
sound is than the other. The complementary methods are magnitude production and
ratio production. In these production techniques, the participants are required to
vary the relevant physical dimension of a sound until it matches a given magnitude
(number), or until it matches a specific ratio with respect to a reference sound.
In the latter case, the instructions may be something like "adjust the level of the
second sound until it is twice as loud as the first sound." All four techniques have
been employed numerous times in attempts to derive appropriate psychophysical
scales (e.g., Buus, Muesch, & Florentine, 1998; Hellman, 1976; Hellman &
Zwislocki, 1964; Stevens, 1957; Warren, 1970). Other variations on these methods
include categorical scaling and cross-modality matching. Categorical scaling involves
asking participants to assign the auditory sensation to one of a number of fixed
categories; following our loudness example, participants might be asked to select a
category ranging from very quiet to very loud (e.g., Mauermann, Long, & Kollmeier,
2004). Cross-modality matching avoids the use of numbers by, for instance, asking
participants to adjust the length of a line, or a piece of string, to match the perceived
loudness of a tone (e.g., Epstein & Florentine, 2005). Although all these methods
have the advantage of providing a more-or-less direct estimate of the relationship
between the physical stimulus and the sensation, they have a number of disadvan-
tages also. First, they are subjective and rely on introspection on the part of the
subject. Perhaps because of this they can be somewhat unreliable, variable across
and within participants, and prone to various biases (e.g., Poulton, 1977).
The other approach is to use an objective measure, where a right and wrong
answer can be verified externally. This approach usually involves probing the limits
of resolution of the sensory system, by measuring absolute threshold (the smallest
detectable stimulus), relative threshold (the smallest detectable change in a stimulus),
or masked threshold (the smallest detectable stimulus in the presence of another
stimulus). There are various ways of measuring threshold, but most involve a forced-
choice procedure, where the subject has to pick the interval that contains the target
sound from a selection of two or more. For instance, in an experiment measuring
absolute threshold, the subject might be presented with two successive time intervals,
marked by lights; the target sound is played during one of the intervals, and the
subject has to decide which one it was. One would expect performance to change
with the intensity of the sound: at very low intensities, the sound will be completely
inaudible, and so performance will be at chance (50% correct in a two-interval task);
at very high intensities, the sound will always be clearly audible, so performance will
be near 100%, assuming that the subject continues to pay attention. A psychometric
function can then be derived, which plots the performance of a subject as a function
of the stimulus parameter. An example of a psychometric function is shown in
Figure 1, which plots percent correct as a function of sound pressure level. This type
of forced-choice paradigm is usually preferable (although often more time-consuming)
to more subjective measures, such as the method of limits, which is often used today
to measure audiograms. In the method of limits, the intensity of a sound is decreased
until the subject reports no longer being able to hear it, and then the intensity
of the sound is increased until the subject again reports being able to hear it.

Figure 1 A schematic example of a psychometric function, plotting percent correct in a two-alternative forced-choice task against the sound pressure level of a test tone. Axes: signal level in dB SPL (horizontal) and percent correct (vertical).

The trouble with such measures is that they rely not just on sensitivity but also on
criterion, that is, how willing the subject is to report having heard a sound if he or she is
not sure. A forced-choice procedure eliminates that problem by forcing participants
to guess, even if they are unsure which interval contained the target sound. Clearly,
testing the perceptual limits by measuring thresholds does not tell us everything
about human auditory perception; a primary concern is that these measures are
typically indirect: the finding that people can detect less than a 1% change in frequency
does not tell us much about the perception of much larger musical intervals, such as
an octave. Nevertheless it has proved extremely useful in helping us to gain a deeper
understanding of perception and its relation to the underlying physiology of the
ear and brain.
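
The two-interval forced-choice logic described above can be illustrated with a minimal Python sketch. It assumes a logistic detection curve, and the names `percent_correct_2afc`, `midpoint_db`, and `slope_per_db` and their default values are hypothetical choices for illustration only; they are not taken from the chapter or from Figure 1. The point is simply that guessing between two intervals puts the floor of the function at 50% rather than 0%.

```python
import math

def percent_correct_2afc(level_db, midpoint_db=5.0, slope_per_db=0.8):
    """Percent correct in a two-interval forced-choice task.

    Assumption: detection probability follows a logistic curve in dB.
    midpoint_db and slope_per_db are hypothetical illustration values.
    """
    p_detect = 1.0 / (1.0 + math.exp(-slope_per_db * (level_db - midpoint_db)))
    # When the tone is not detected, the listener guesses and is right half the time.
    return 100.0 * (p_detect + 0.5 * (1.0 - p_detect))

for level in (-5, 0, 5, 10, 15):
    print(f"{level:>3} dB SPL: {percent_correct_2afc(level):5.1f}% correct")
```

At very low levels the function sits near chance (50%), and at high levels it approaches 100%, matching the qualitative description in the text.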
Measures of reaction time, or response time (RT), have also been used to probe
sensory processing. The two basic forms of response time are simple response time
(SRT), where participants are instructed to respond as quickly as possible by push-
ing a single button once a stimulus is presented, and choice response time (CRT),
where participants have to categorize the stimulus (usually into one of two catego-
ries) before responding (by pressing button 1 or 2).
Although RT measures are more common in cognitive tasks, they also depend
on some basic sound attributes, such as sound intensity, with higher intensity
sounds eliciting faster reactions, measured using both SRTs (Kohfeld, 1971;
Luce & Green, 1972) and CRTs (Keuss & van der Molen, 1982).
Finally, measures of perception are not limited to the quantitative or numerical
domain. It is also possible to ask participants to describe their percepts in words.
This approach has clear applications when dealing with multidimensional attributes,
such as timbre (see below, and Chapter 2 of this volume), but also has some inherent
difficulties, as different people may use descriptive words in different ways.
To sum up, measuring perception is a thorny issue that has many solutions, all
with their own advantages and shortcomings. Perceptual measures remain a crucial
systems-level analysis tool that can be combined in both human and animal stud-
ies with various physiological and neuroimaging techniques, to help us discover
more about how the ears and brain process musical sounds in ways that elicit
music's powerful cognitive and emotional effects.

II. Perception of Single Tones

Although a single tone is a far cry from the complex combinations of sound that
make up most music, it can be a useful place to start in order to make sense of how
music is perceived and represented in the auditory system. The sensation produced
by a single tone is typically divided into three categories: loudness, pitch, and
timbre.

A. Loudness
The most obvious physical correlate of loudness is sound intensity (or sound pres-
sure) measured at the eardrum. However, many other factors also influence the
loudness of a sound, including its spectral content, its duration, and the context in
which it is presented.

1. Dynamic Range and the Decibel

The human auditory system has an enormous dynamic range, with the lowest-inten-
sity sound that is audible being about a factor of 1,000,000,000,000 less intense
than the loudest sound that does not cause immediate hearing damage. This very
large range is one reason why a logarithmic scale (the decibel, or dB) is used to
describe sound level. In these units, the dynamic range of hearing corresponds to
about 120 dB. Sound intensity is proportional to the square of sound pressure,
which is often described in terms of sound pressure level (SPL) using a pressure, $P_0$,
of $2 \times 10^{-5}$ N·m$^{-2}$, or 20 μPa (micropascals), as the reference, which is close to the
average absolute threshold for medium-frequency pure tones in young
normal-hearing individuals. The SPL of a given sound pressure, $P_1$, is then defined as
$20\log_{10}(P_1/P_0)$. A similar relationship exists between sound intensity and sound
level, such that the level is given by $10\log_{10}(I_1/I_0)$. (The multiplier is now 10 instead
of 20 because of the square-law relationship between intensity and pressure.) Thus,
a sound level in decibels is always a ratio and not an absolute value.
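
The decibel definitions above can be checked with a short Python sketch. The function names are hypothetical; the reference pressure of 20 micropascals and the two formulas are the ones given in the text.

```python
import math

P0_PA = 20e-6  # reference pressure from the text: 20 micropascals

def spl_from_pressure(p1_pa, p0_pa=P0_PA):
    """Sound pressure level in dB SPL: 20 * log10(P1 / P0)."""
    return 20.0 * math.log10(p1_pa / p0_pa)

def level_from_intensity_ratio(intensity_ratio):
    """Sound level in dB for an intensity ratio I1 / I0: 10 * log10(I1 / I0)."""
    return 10.0 * math.log10(intensity_ratio)

# Dynamic range of hearing: an intensity ratio of 1,000,000,000,000 (10**12).
print(level_from_intensity_ratio(1e12))

# The same range as a pressure ratio is 10**6, because intensity goes as pressure squared.
print(spl_from_pressure(P0_PA * 1e6))

# A pressure of 1 Pa expressed in dB SPL.
print(round(spl_from_pressure(1.0), 1))
```

Both of the first two lines come out at about 120 dB, which is the dynamic range quoted in the text; the factor of 20 for pressure and 10 for intensity describe the same level.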
The dynamic range of music depends on the music style. Modern classical
music can have a very large dynamic range, from pianissimo passages on a solo
instrument (roughly 45 dB SPL) to a full orchestra playing fortissimo (about 95 dB
SPL), as measured in concert halls (Winckel, 1962). Pop music, which is often
listened to in less-than-ideal conditions, such as in a car or on a street, generally
has a much smaller dynamic range. Radio broadcast stations typically reduce the
dynamic range even further using compression to make their signal as consistently
loud as possible without exceeding the maximum peak amplitude of the broadcast
channel, so that the end dynamic range is rarely more than about 10 dB.
Our ability to discriminate small changes in level has been studied in great depth
for a wide variety of sounds and conditions (e.g., Durlach & Braida, 1969; Jesteadt,
Wier, & Green, 1977; Viemeister, 1983). As a rule of thumb, we are able to
discriminate changes on the order of 1 dB, corresponding to a change in sound
pressure of about 12%. The fact that the size of the just-noticeable difference (JND) of
broadband sounds remains roughly constant when expressed as a ratio or in
decibels is in line with the well-known Weber's law, which states that the JND between
two stimuli is proportional to the magnitude of the stimuli.
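
As a worked check of the 12% figure, the definition of level in decibels can be inverted for a 1-dB step. The second ratio, for intensity, is not stated in the text and is included only for comparison:

```latex
\[
20\log_{10}\frac{P_1}{P_0} = 1\ \text{dB}
\;\Rightarrow\;
\frac{P_1}{P_0} = 10^{1/20} \approx 1.122,
\qquad
\frac{I_1}{I_0} = 10^{1/10} \approx 1.259
\]
```

A fixed step in decibels is therefore a fixed percentage change in pressure (about 12%) or in intensity (about 26%), whatever the starting level, which is the sense in which a constant JND in decibels agrees with Weber's law.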
In contrast to our ability to judge differences in sound level between two sounds
presented one after another, our ability to categorize or label sound levels is rather
poor. In line with Miller's (1956) famous "7 plus or minus 2" postulate for
information processing and categorization, our ability to categorize sound levels
accurately is fairly limited and is subject to a variety of influences, such as the context
of the preceding sounds. This may explain why the musical notation of loudness
(in contrast to pitch) has relatively few categories between pianissimo and
fortissimo, typically just six (pp, p, mp, mf, f, and ff).

2. Equal Loudness Contours and the Loudness Weighting Curves

There is no direct relationship between the physical sound level (in dB SPL) and
the sensation of loudness. There are many reasons for this, but an important one is
that loudness depends heavily on the frequency content of the sound. Figure 2
shows what are known as equal loudness contours. The basic concept is that two
pure tones with different frequencies, but with levels that fall on the same loudness
contour, have the same loudness. For instance, as shown in Figure 2, a pure tone
with a frequency of 1 kHz and a level of 40 dB SPL has the same loudness as a
pure tone with a frequency of 100 Hz and a level of about 64 dB SPL; in other words,
a 100-Hz tone has to be 24 dB higher in level than a 40-dB SPL 1-kHz tone in order

Figure 2 The equal-loudness contours, taken from ISO 226:2003. Axes: frequency in Hz (horizontal) and sound pressure level in dB (vertical); contours are labeled in phons, with the hearing threshold as the lowest curve.

Original figure kindly provided by Brian C. J. Moore.

to be perceived as being equally loud. The equal loudness contours are incorporated
into an international standard (ISO 226) that was initially established in 1961 and was
last revised in 2003.
These equal loudness contours have been derived several times from painstaking
psychophysical measurements, not always with identical outcomes (Fletcher &
Munson, 1933; Robinson & Dadson, 1956; Suzuki & Takeshima, 2004). The mea-
surements typically involve either loudness matching, where a subject adjusts the
level of one tone until it sounds as loud as a second tone, or loudness comparisons,
where a subject compares the loudness of many pairs of tones and the results are
compiled to derive points of subjective equality (PSE). Both methods are highly
susceptible to nonsensory biases, making the task of deriving a definitive set of
equal loudness contours a challenging one (Gabriel, Kollmeier, & Mellert, 1997).
The equal loudness contours provide the basis for the measure of loudness
level, which has units of phons. The phon value of a sound is the dB SPL value
of a 1-kHz tone that is judged to have the same loudness as the sound. So, by defi-
nition, a 40-dB SPL tone at 1 kHz has a loudness level of 40 phons. Continuing the
preceding example, the 100-Hz tone at a level of about 64 dB SPL also has a loud-
ness level of 40 phons, because it falls on the same equal loudness contour as the
40-dB SPL 1-kHz tone. Thus, the equal loudness contours can also be termed the
equal phon contours.
Although the actual measurements are difficult, and the results somewhat conten-
tious, there are many practical uses for the equal loudness contours. For instance, in
issues of community noise annoyance from rock concerts or airports, it is more use-
ful to know about the perceived loudness of the sounds in question, rather than just
their physical level. For this reason, an approximation of the 40-phon equal loudness
contour is built into most modern sound level meters and is referred to as the
A-weighted curve. A sound level that is quoted in dB (A) is an overall sound level
that has been filtered with the inverse of the approximate 40-phon curve. This means
that very low and very high frequencies, which are perceived as being less loud, are
given less weight than the middle of the frequency range.
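
To make the shape of the A-weighted curve concrete, the sketch below evaluates the closed-form approximation that is commonly quoted for A-weighting in the sound level meter standard (IEC 61672). The chapter itself gives no formula, so the function name `a_weighting_db` and the constants (20.6, 107.7, 737.9, and 12194 Hz, plus the 2.00-dB normalization) are assumptions brought in from that standard and should be checked against it before any serious use.

```python
import math

def a_weighting_db(freq_hz):
    """Approximate A-weighting gain in dB at freq_hz (0 dB at 1 kHz).

    Assumption: the analog approximation commonly quoted from IEC 61672;
    it is not taken from the chapter.
    """
    f2 = freq_hz ** 2
    numerator = (12194.0 ** 2) * f2 ** 2
    denominator = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(numerator / denominator) + 2.00

for freq in (31.5, 100, 1000, 4000, 16000):
    print(f"{freq:>7} Hz: {a_weighting_db(freq):6.1f} dB")
```

The weighting is close to 0 dB around 1 kHz and strongly negative at very low and very high frequencies, so those regions contribute less to a dB (A) reading, as described above.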
As with all useful tools, the A-weighted curve can be misused. Because it is
based on the 40-phon curve, it is most suitable for low-level sounds; however, that
has not prevented it from being used in measurements of much higher-level sounds,
where a flatter filter would be more appropriate, such as that provided by the
much-less-used C-weighted curve. The ubiquitous use of the dB (A) scale for all
levels of sound therefore provides an example of a case where the convenience of a
single-number measure (and one that minimizes the impact of difficult-to-control
low frequencies) has outweighed the desire for accuracy.

3. Loudness Scales
Equal loudness contours and phons tell us about the relationship between loudness
and frequency. They do not, however, tell us about the relationship between loud-
ness and sound level. For instance, the phon, based as it is on the decibel scale at
1 kHz, says nothing about how much louder a 60-dB SPL tone is than a 30-dB
SPL tone. The answer, according to numerous studies of loudness, is not "twice as
loud." There have been numerous attempts since Fechner's day to relate the physical
sound level to loudness. Fechner (1860), building on Weber's law, reasoned that if
JNDs were constant on a logarithmic scale, and if equal numbers of JNDs reflected
an equal change in loudness, then loudness must be related logarithmically to sound
intensity. Harvard psychophysicist S. S. Stevens disagreed, claiming that JNDs
reflected noise in the auditory system, which did not provide direct insight into
the function relating loudness to sound intensity (Stevens, 1957). Stevens's
approach was to use magnitude and ratio estimation and production techniques, as
described in Section I of this chapter, to derive a relationship between loudness and
sound intensity. He concluded that loudness (L) was related to sound intensity (I)
by a power law:

$L = kI^{\alpha}$ (Eq. 1)

where the exponent, $\alpha$, has a value of about 0.3 at medium frequencies and for
moderate and higher sound levels. This law implies that a 10-dB increase in level
results in a doubling of loudness. At low levels, and at lower frequencies, the expo-
nent is typically larger, leading to a steeper growth-of-loudness function. Stevens
used this relationship to derive loudness units, called sones. By definition, 1 sone
is the loudness of a 1-kHz tone presented at a level of 40 dB SPL; 2 sones is twice
as loud, corresponding roughly to a 1-kHz tone presented at 50 dB SPL, and 4
sones corresponds to the same tone at about 60 dB SPL.
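
The sone values quoted above follow directly from the power law with an exponent of about 0.3. The Python sketch below is a minimal illustration for a 1-kHz tone at moderate and higher levels only; the function name is hypothetical, and the constant k is fixed by the definition that 40 dB SPL corresponds to 1 sone.

```python
def sones_for_1khz_tone(level_db_spl, exponent=0.3):
    """Loudness in sones of a 1-kHz tone under the power law of Eq. 1.

    Assumption: a single exponent applies across the level range; the text notes
    that the exponent is larger at low levels and low frequencies.
    """
    # Intensity relative to the 40-dB SPL reference tone (1 sone by definition).
    intensity_ratio = 10.0 ** ((level_db_spl - 40.0) / 10.0)
    return intensity_ratio ** exponent

for level in (40, 50, 60, 70):
    print(f"{level} dB SPL -> {sones_for_1khz_tone(level):.2f} sones")
```

Each 10-dB step multiplies intensity by 10 and loudness by 10 to the power 0.3, which is very nearly 2, hence the doubling of loudness per 10 dB.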
Numerous studies have supported the basic conclusion that loudness can be
related to sound intensity by a power law. However, in part because of the variability
of loudness judgments, and the substantial effects of experimental methodology
(Poulton, 1979), different researchers have found different values for the best-fitting
exponent. For instance, Warren (1970) argued that presenting participants with sev-
eral sounds to judge invariably results in bias. He therefore presented each subject
with only one trial. Based on these single-trial judgments, Warren also derived a
power law, but he found an exponent value of 0.5. This exponent value is what one
might expect if the loudness of sound were inversely proportional to its distance from the
receiver, leading to a 6-dB decrease in level for every doubling of distance. Yet
another study, which tried to avoid bias effects by using the entire (100-dB) level
range within each experiment, derived an exponent of only 0.1, implying a doubling
of loudness for every 30-dB increase in sound level (Viemeister & Bacon, 1988).
Overall, it is generally well accepted that the relationship between loudness and
sound intensity can be approximated as a power law, although methodological issues
and intersubject and intrasubject variability have made it difficult to derive a defini-
tive and uncontroversial function relating the sensation to the physical variable.
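
The level increase needed to double loudness under each of the exponents mentioned above can be derived from the power law. In this worked equation the exponent is written as $\alpha$, a symbol assumed here for convenience:

```latex
\[
L = kI^{\alpha}
\;\Rightarrow\;
2 = \left(\frac{I_2}{I_1}\right)^{\alpha}
\;\Rightarrow\;
10\log_{10}\frac{I_2}{I_1} = \frac{10\log_{10}2}{\alpha} \approx \frac{3.01}{\alpha}\ \text{dB}
\]
```

An exponent of 0.3 gives about 10 dB per doubling of loudness, 0.5 gives about 6 dB, and 0.1 gives about 30 dB, in agreement with the values cited for Stevens (1957), Warren (1970), and Viemeister and Bacon (1988).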

4. Partial Loudness and Context Effects

Most sounds that we encounter, particularly in music, are accompanied by other
sounds. This fact makes it important to understand how the loudness of a sound is
affected by the context in which it is presented. In this section, we deal with two
such situations, the first being when sounds are presented simultaneously, the
second when they are presented sequentially.
When two sounds are presented together, as in the case of two musical instru-
ments or voices, they may partially mask each other, and the loudness of each may
not be as great as if each sound were presented in isolation. The loudness of a par-
tially masked sound is termed partial loudness (Moore, Glasberg, & Baer, 1997;
Scharf, 1964; Zwicker, 1963). When a sound is completely masked by another, its
loudness is zero, or a very small quantity. As its level is increased to above its
masked threshold, it becomes audible, but its loudness is low, similar to that of
the same sound presented in isolation but just a few decibels above its absolute
threshold. As the level is increased further, the sound's loudness increases rapidly,
essentially catching up with its unmasked loudness once it is about 20 dB or
more above its masked threshold.
The loudness of a sound is also affected by the sounds that precede it. In some
cases, loud sounds can enhance the loudness of immediately subsequent sounds
(e.g., Galambos, Bauer, Picton, Squires, & Squires, 1972; Plack, 1996); in other
cases, the loudness of the subsequent sounds can be reduced (Mapes-Riordan &
Yost, 1999; Marks, 1994). There is still some debate as to whether separate
mechanisms are required to explain these two phenomena (Arieh & Marks, 2003b;
Oberfeld, 2007; Scharf, Buus, & Nieder, 2002). Initially, it was not clear whether
the phenomenon of "loudness recalibration" (a reduction in the loudness of
moderate-level sounds following a louder one) reflected a change in the way participants
assigned numbers to the perceived loudness, or reflected a true change in the loud-
ness sensation (Marks, 1994). However, more recent work has shown that choice
response times to recalibrated stimuli change in a way that is consistent with
physical changes in the intensity, suggesting a true sensory phenomenon (Arieh &
Marks, 2003a).

5. Models of Loudness
Despite the inherent difficulties in measuring loudness, a model that can predict
the loudness of arbitrary sounds is still a useful tool. The development of models
of loudness perception has a long history (Fletcher & Munson, 1937; Moore &
Glasberg, 1996, 1997; Moore et al., 1997; Moore, Glasberg, & Vickers, 1999;
Zwicker, 1960; Zwicker, Fastl, & Dallmayr, 1984). Essentially all are based on
the idea that the loudness of a sound reflects the amount of excitation it produces
within the auditory system. Although a direct physiological test, comparing the
total amount of auditory nerve activity in an animal model with the predicted
loudness based on human studies, did not find a good correspondence between
the two (Relkin & Doucet, 1997), the psychophysical models that relate predicted
excitation patterns, based on auditory filtering and cochlear nonlinearity, to loudness
generally provide accurate predictions of loudness in a wide variety of conditions
(e.g., Chen, Hu, Glasberg, & Moore, 2011).
Some models incorporate partial loudness predictions (Chen et al., 2011; Moore
et al., 1997), others predict the effects of cochlear hearing loss on loudness
(Moore & Glasberg, 1997), and others have been extended to explain the loudness
of sounds that fluctuate over time (Chalupper & Fastl, 2002; Glasberg & Moore,
2002). However, none has yet attempted to incorporate context effects, such as
loudness recalibration or loudness enhancement.

B. Pitch
Pitch is arguably the most important dimension for conveying music. Sequences of
pitches form a melody, and simultaneous combinations of pitches form harmony,
two foundations of Western music. There is a vast body of literature devoted to
pitch research, from both perceptual and neural perspectives (Plack, Oxenham,
Popper, & Fay, 2005). The clearest physical correlate of pitch is the periodicity, or
repetition rate, of sound, although other dimensions, such as sound intensity, can
have small effects (e.g., Verschuure & van Meeteren, 1975). For young people
with normal hearing, pure tones with frequencies between about 20 Hz and 20 kHz
are audible. However, only sounds with repetition rates between about 30 Hz and
5 kHz elicit a pitch percept that can be called musical and is strong enough to carry
a melody (e.g., Attneave & Olson, 1971; Pressnitzer, Patterson, & Krumbholz,
2001; Ritsma, 1962). Perhaps not surprisingly, these limits, which were determined
through psychoacoustical investigation, correspond quite well to the lower and
upper limits of pitch found on musical instruments: the lowest and highest notes of
a modern grand piano, which covers the ranges of all standard orchestral instru-
ments, correspond to 27.5 Hz and 4186 Hz, respectively.
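
The two piano frequencies quoted above can be reproduced with a small Python sketch. It assumes 12-tone equal temperament, a tuning reference of 440 Hz for the A above middle C, and the conventional numbering of the 88 keys from 1 (lowest) to 88 (highest); none of these conventions is stated at this point in the text, and the function name is hypothetical.

```python
def piano_key_frequency(key_number, a4_hz=440.0):
    """Frequency in Hz of piano key 1..88, assuming equal temperament.

    Assumption: key 49 is the A tuned to a4_hz, and each key is one semitone
    (a frequency ratio of 2 ** (1/12)) above the previous one.
    """
    return a4_hz * 2.0 ** ((key_number - 49) / 12.0)

print(f"Lowest key (1):   {piano_key_frequency(1):.1f} Hz")
print(f"Highest key (88): {piano_key_frequency(88):.0f} Hz")
```

Under these assumptions the lowest and highest keys come out at 27.5 Hz and roughly 4186 Hz, close to the 30-Hz and 5-kHz limits of musical pitch described above.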
We tend to recognize patterns of pitches that form melodies (see Chapter 7 of
this volume). We do this presumably by recognizing the musical intervals between
successive notes (see Chapters 4 and 7 of this volume), and most of us seem rela-
tively insensitive to the absolute pitch values of the individual note, so long as the
pitch relationships between notes are correct. However, exactly how the pitch is
extracted from each note and how it is represented in the auditory system remain
unclear, despite many decades of intense research.

1. Pitch of Pure Tones

Pure tones produce a clear, unambiguous pitch, and we are very sensitive to
changes in their frequency. For instance, well-trained listeners can distinguish
between two tones with frequencies of 1000 and 1002 Hz, a difference of only
0.2% (Moore, 1973). A semitone, the smallest step in the Western scale system,
is a difference of about 6%, or about a factor of 30 greater than the JND of
frequency for pure tones. Perhaps not surprisingly, musicians are generally better
than nonmusicians at discriminating small changes in frequency; what is more
surprising is that it does not take much practice for people with no musical train-
ing to catch up with musicians in terms of their performance. In a recent study,
frequency discrimination abilities of trained classical musicians were compared
with those of untrained listeners with no musical background, using both pure
tones and complex tones (Micheyl, Delhommeau, Perrot, & Oxenham, 2006).
Initially thresholds were about a factor of 6 worse for the untrained listeners.
However, it took only between 4 and 8 hours of practice for the thresholds of the
untrained listeners to match those of the trained musicians, whereas the trained
musicians did not improve with practice. This suggests that most people are able
to discriminate very fine differences in frequency with very little in the way of
specialized training.
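
The comparison made earlier in this section between a semitone and the frequency JND for pure tones can be verified with a few lines of Python. The sketch assumes an equal-tempered semitone (a frequency ratio of 2 to the power 1/12), which the text does not state explicitly; the variable names are illustrative.

```python
# Equal-tempered semitone as a percentage change in frequency (assumed tuning system).
semitone_ratio = 2.0 ** (1.0 / 12.0)
semitone_percent = (semitone_ratio - 1.0) * 100.0

# JND from the example in the text: 1000 Hz versus 1002 Hz.
jnd_percent = (1002.0 - 1000.0) / 1000.0 * 100.0

semitones_per_jnd = semitone_percent / jnd_percent
print(f"Semitone: {semitone_percent:.2f}%  JND: {jnd_percent:.1f}%  ratio: {semitones_per_jnd:.1f}")
```

The semitone comes out just under 6%, about 30 times the 0.2% JND, consistent with the figures given in the text.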
Two representations of a pure tone at 440 Hz (the orchestral A) are shown in
Figure 3. The upper panel shows the waveform (variations in sound pressure as a
function of time) that repeats 440 times a second, and so has a period of 1/440 s,
or about 2.27 ms. The lower panel provides the spectral representation, showing
that the sound has energy only at 440 Hz. This spectral representation is for an
ideal pure tone, one that has no beginning or end. In practice, spectral energy
spreads above and below the frequency of the pure tone, reflecting the effects of
onset and offset. These two representations (spectral and temporal) provide a good
introduction to two ways in which pure tones are represented in the peripheral
auditory system.
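
The temporal and spectral descriptions of the 440-Hz tone can be reproduced numerically. The following Python sketch uses only the standard library; the sampling rate, the number of samples, and the helper name `dft_magnitude` are assumptions chosen for illustration. The segment length is picked so that it contains a whole number of cycles, which mimics the ideal tone with no onset or offset.

```python
import cmath
import math

SAMPLE_RATE_HZ = 8000  # assumed sampling rate
FREQ_HZ = 440.0        # the orchestral A from the text
N = 800                # 0.1 s of samples: exactly 44 cycles, 10-Hz bin spacing

# Temporal representation: sampled sinusoidal pressure waveform.
samples = [math.sin(2.0 * math.pi * FREQ_HZ * n / SAMPLE_RATE_HZ) for n in range(N)]
period_ms = 1000.0 / FREQ_HZ

def dft_magnitude(x, k):
    """Magnitude of DFT bin k, normalized by the number of samples."""
    total = sum(v * cmath.exp(-2j * math.pi * k * n / len(x)) for n, v in enumerate(x))
    return abs(total) / len(x)

# Spectral representation: magnitudes for bins from 0 Hz up to just below half the sampling rate.
magnitudes = [dft_magnitude(samples, k) for k in range(N // 2)]
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE_HZ / N

print(f"Period: {period_ms:.2f} ms, spectral peak at {peak_hz:.0f} Hz")
```

With a whole number of cycles, essentially all of the energy falls in the single bin at 440 Hz. Changing `N` so that the segment ends partway through a cycle spreads energy into neighboring bins, a simple analogue of the onset and offset effects mentioned above.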
The first potential code, known as the place code, reflects the mechanical fil-
tering that takes place in the cochlea of the inner ear. The basilar membrane, which
runs the length of the fluid-filled cochlea from the base to the apex, vibrates in

Figure 3 Schematic diagram of the time waveform (upper panel) and power spectrum (lower panel) of a pure tone with a frequency of 440 Hz. Upper panel axes: time in ms and pressure in arbitrary units. Lower panel axes: frequency in Hz and magnitude in arbitrary units.

response to sound. The responses of the basilar membrane are sharply tuned and
highly specific: a certain frequency will cause only a local region of the basilar
membrane to vibrate. Because of its structural properties, the apical end of the basi-
lar membrane responds best to low frequencies, while the basal end responds best
to high frequencies. Thus, every place along the basilar membrane has its own
"best frequency" or characteristic frequency (CF): the frequency to which that
place responds most strongly. This frequency-to-place mapping, or tonotopic orga-
nization, is maintained throughout the auditory pathways up to primary auditory
cortex, thereby providing a potential neural code for the pitch of pure tones.
The second potential code, known as the temporal code, relies on the fact that
action potentials, or spikes, generated in the auditory nerve tend to occur at a cer-
tain phase within the period of a sinusoid. This property, known as phase locking,
means that the brain could potentially represent the frequency of a pure tone by
way of the time intervals between spikes, when pooled across the auditory nerve.
No data are available from the human auditory nerve, because of the invasive
nature of the measurements, but phase locking has been found to extend to between
2 and 4 kHz in other mammals, depending somewhat on the species. Unlike tonoto-
pic organization, phase locking up to high frequencies is not preserved in higher
stations of the auditory pathways. At the level of the auditory cortex, the limit of
phase locking reduces to at best 100 to 200 Hz (Wallace, Rutkowski, Shackleton, &
Palmer, 2000). Therefore, most researchers believe that the timing code found in
the auditory nerve must be transformed to some form of place or population code at a
relatively early stage of auditory processing.
There is some psychoacoustical evidence for both place and temporal codes.
One piece of evidence in favor of a temporal code is that pitch discrimination
abilities deteriorate at high frequencies: the JND between two frequencies becomes
considerably larger at frequencies above about 4 to 5 kHz, the same frequency
range above which listeners' ability to recognize familiar melodies (Attneave &
Olson, 1971), or to notice subtle changes in unfamiliar melodies (Oxenham,
Micheyl, Keebler, Loper, & Santurette, 2011), degrades. This frequency is similar
to the one just described in which phase locking in the auditory nerve is strongly
degraded (e.g., Palmer & Russell, 1986; Rose, Brugge, Anderson, & Hind, 1967),
suggesting that the temporal code is necessary for accurate pitch discrimination
and for melody perception. It might even be taken as evidence that the upper pitch
limits of musical instruments were determined by the basic physiological limits of
the auditory nerve.
Evidence for the importance of place information comes first from the fact that
some form of pitch perception remains possible even with pure tones of very high
frequency (Henning, 1966; Moore, 1973), where it is unlikely that phase locking
information is useful (e.g., Palmer & Russell, 1986). Another line of evidence indi-
cating that place information may be important comes from a study that used so-
called transposed tones (van de Par & Kohlrausch, 1997) to present the temporal
information that would normally be available only to a low-frequency region in the
cochlea to a high-frequency region, thereby dissociating temporal from place cues
(Oxenham, Bernstein, & Penagos, 2004). In that study, pitch discrimination was
considerably worse when the low-frequency temporal information was presented to

the wrong place in the cochlea, suggesting that place information is important.
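The transposed-tone manipulation can be pictured with a short Python sketch. It assumes the commonly described construction, in which a half-wave rectified low-frequency sinusoid modulates a high-frequency carrier. The low-pass filtering of the rectified envelope that is used in practice is omitted here, and the function name, the 100-Hz modulator, and the 4-kHz carrier are illustrative choices rather than the parameters of the study.

```python
import math

def transposed_tone_sample(t, f_low, f_carrier):
    """One sample of a simplified transposed tone at time t (seconds)."""
    # Assumed envelope: half-wave rectified low-frequency sinusoid.
    envelope = max(0.0, math.sin(2 * math.pi * f_low * t))
    # The envelope is imposed on a high-frequency carrier, so the
    # low-frequency timing pattern is delivered to a basal cochlear place.
    return envelope * math.sin(2 * math.pi * f_carrier * t)

FS = 48000  # sampling rate in Hz (illustrative)
# One 10-ms period of a 100-Hz modulator on a 4-kHz carrier.
samples = [transposed_tone_sample(n / FS, 100.0, 4000.0) for n in range(FS // 100)]
```

In this simplified version the signal is silent during the negative half-cycle of the modulator, so the temporal pattern of a 100-Hz tone is carried by energy near 4 kHz.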
In light of this mixed evidence, it may be safest to assume that the auditory sys-
tem uses both place and timing information from the auditory nerve in order to
extract the pitch of pure tones. Indeed some theories of pitch explicitly require both
accurate place and timing information (Loeb, White, & Merzenich, 1983). Gaining
a better understanding of how the information is extracted remains an important
research goal. The question is of particular clinical relevance, as deficits in pitch
perception are a common complaint of people with hearing loss and people with
cochlear implants. A clearer understanding of how the brain uses information from
the cochlea will help researchers to improve the way in which auditory prostheses,
such as hearing aids and cochlear implants, present sound to their users.

2. Pitch of Complex Tones

A large majority of musical sounds are complex tones of one form or another, and
most have a pitch associated with them. Most common are harmonic complex
tones, which are composed of the F0 (corresponding to the repetition rate of the
entire waveform) and upper partials, harmonics, or overtones, spaced at integer
multiples of the F0. The pitch of a harmonic complex tone usually corresponds to
the F0. In other words, if a subject is asked to match the pitch of a complex tone to
the pitch of a single pure tone, the best match usually occurs when the frequency
of the pure tone is the same as the F0 of the complex tone. Interestingly, this is
true even when the complex tone has no energy at the F0 or the F0 is masked
(de Boer, 1956; Licklider, 1951; Schouten, 1940; Seebeck, 1841). This phenome-
non has been given various terms, including pitch of the missing fundamental, peri-
odicity pitch, residue pitch, and virtual pitch. The ability of the auditory system to
extract the F0 of a sound is important from the perspective of perceptual constancy:
imagine a violin note being played in a quiet room and then again in a room with a
noisy air-conditioning system. The low-frequency noise of the air-conditioning sys-
tem might well mask some of the lower-frequency energy of the violin, including
the F0, but we would not expect the pitch (or identity) of the violin to change
because of it.
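A minimal Python sketch can make the missing-fundamental case concrete. It assumes equal-amplitude, sine-phase harmonics 2 to 4 of a 440-Hz F0 (all names and values are illustrative) and checks that the waveform still repeats every 1/F0, even though it contains no energy at the F0 itself.

```python
import math

def harmonic_complex(t, f0, harmonics):
    """Sum of equal-amplitude, sine-phase harmonics of f0 at time t (seconds)."""
    return sum(math.sin(2 * math.pi * h * f0 * t) for h in harmonics)

F0 = 440.0
HARMONICS = [2, 3, 4]  # no component at the F0 itself
times = [n / 20000 for n in range(200)]

def repeats_with_period(period):
    """True if shifting the waveform by `period` leaves it unchanged."""
    return all(
        abs(harmonic_complex(t + period, F0, HARMONICS)
            - harmonic_complex(t, F0, HARMONICS)) < 1e-9
        for t in times
    )

repeats_at_f0_period = repeats_with_period(1 / F0)
repeats_at_lowest_component_period = repeats_with_period(1 / (2 * F0))
```

The waveform repeats at the period of the absent F0 but not at the period of its lowest component, which is the periodicity that the reported pitch follows.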
Although the ability to extract the periodicity pitch is clearly an important one,
and one that is shared by many different species (Shofner, 2005), exactly how the
auditory system extracts the F0 remains for the most part unknown. The initial
stages in processing a harmonic complex tone are shown in Figure 4. The upper
two panels show the time waveform and the spectral representation of a harmonic
complex tone. The third panel depicts the filtering that occurs in the cochlea: each
point along the basilar membrane can be represented as a band-pass filter that
responds to only those frequencies close to its center frequency. The fourth panel
shows the excitation pattern produced by the sound. This is the average response
of the bank of band-pass filters, plotted as a function of the filters' center frequency
(Glasberg & Moore, 1990). The fifth panel shows an excerpt of the time waveform
at the output of some of the filters along the array. This is an approximation of the
waveform that drives the inner hair cells in the cochlea, which in turn synapse with
the auditory nerve fibers to produce the spike trains that the brain must interpret.

[Figure 4: five stacked panels. Time waveform: pressure (arbitrary units) versus time (0 to 12 ms). Spectrum: level (dB) versus frequency (0 to 8000 Hz). Auditory filterbank: response (dB) versus frequency (0 to 8000 Hz). Excitation pattern: excitation (dB) versus center frequency (0 to 8000 Hz). BM vibration: filter outputs versus time (0 to 12 ms).]

Figure 4 Representations of a harmonic complex tone with a fundamental frequency (F0)
of 440 Hz. The upper panel shows the time waveform. The second panel shows the power
spectrum of the same waveform. The third panel shows the auditory filter bank, representing
the filtering that occurs in the cochlea. The fourth panel shows the excitation pattern, or the
time-averaged output of the filter bank. The fifth panel shows some sample time waveforms
at the output of the filter bank, including filters centered at the F0 and the fourth harmonic,
illustrating resolved harmonics, and filters centered at the 8th and 12th harmonic of the
complex, illustrating harmonics that are less well resolved and show amplitude modulations
at a rate corresponding to the F0.

Considering the lower two panels of Figure 4, it is possible to see a transition
as one moves from the low-numbered harmonics on the left to the high-
numbered harmonics on the right: The first few harmonics generate distinct peaks
in the excitation pattern, because the filters in that frequency region are narrower
than the spacing between successive harmonics. Note also that the time waveforms
at the outputs of filters centered at the low-numbered harmonics resemble pure
tones. At higher harmonic numbers, the bandwidths of the auditory filters become
wider than the spacing between successive harmonics, and so individual peaks in
the excitation pattern are lost. Similarly, the time waveform at the output of higher-
frequency filters no longer resembles a pure tone, but instead reflects the interac-
tion of multiple harmonics, producing a complex waveform that repeats at a rate
corresponding to the F0.
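The interaction of neighboring harmonics within one filter can be illustrated numerically. The Python sketch below makes an idealized assumption that is not part of the text: a filter centered on harmonic 12 of a 440-Hz F0 passes exactly three equal-amplitude harmonics (11, 12, and 13) and nothing else. Under that assumption the filter output equals a carrier at the center harmonic multiplied by a modulating term that repeats every 1/F0.

```python
import math

F0 = 440.0
CENTER_HARMONIC = 12  # illustrative choice of an unresolved harmonic

def three_harmonic_sum(t):
    """Idealized filter output: harmonics 11, 12, and 13 at equal amplitude."""
    return sum(
        math.sin(2 * math.pi * h * F0 * t)
        for h in (CENTER_HARMONIC - 1, CENTER_HARMONIC, CENTER_HARMONIC + 1)
    )

def modulated_carrier(t):
    """Same signal written as a carrier times a modulator with period 1/F0."""
    carrier = math.sin(2 * math.pi * CENTER_HARMONIC * F0 * t)
    modulator = 1.0 + 2.0 * math.cos(2 * math.pi * F0 * t)
    return modulator * carrier

times = [n / 100000 for n in range(500)]
max_difference = max(abs(three_harmonic_sum(t) - modulated_carrier(t)) for t in times)
```

The two expressions agree to within rounding error, which follows from the identity sin(a - b) + sin(a) + sin(a + b) = sin(a)(1 + 2 cos(b)).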
Harmonics that produce distinct peaks in the excitation pattern and/or produce
quasi-sinusoidal vibrations on the basilar membrane are referred to as being
resolved. Phenomenologically, resolved harmonics are those that can be heard
out as separate tones under certain circumstances. Typically, we do not hear the
individual harmonics when we listen to a musical tone, but our attention can be
drawn to them in various ways, for instance by amplifying them or by switching
them on and off while the other harmonics remain continuous (e.g., Bernstein &
Oxenham, 2003; Hartmann & Goupell, 2006). The ability to resolve or hear out
individual low-numbered harmonics as pure tones was already noted by Hermann
von Helmholtz in his classic work, On the Sensations of Tone
(Helmholtz, 1885/1954).
The higher-numbered harmonics, which do not produce individual peaks of
excitation and cannot typically be heard out, are often referred to as being unre-
solved. The transition between resolved and unresolved harmonics is thought to
lie somewhere between the 5th and 10th harmonic, depending on various factors,
such as the F0 and the relative amplitudes of the components, as well as on how
resolvability is defined (e.g., Bernstein & Oxenham, 2003; Houtsma &
Smurzynski, 1990; Moore & Gockel, 2011; Shackleton & Carlyon, 1994).
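One rough way to see why the transition falls in this range is to compare auditory filter bandwidths with the harmonic spacing. The Python sketch below uses the equivalent rectangular bandwidth (ERB) formula of Glasberg and Moore (1990), ERB = 24.7(4.37F + 1) Hz with F in kHz. The rule that a harmonic counts as resolved when the ERB at its frequency is narrower than the F0 is a deliberately crude assumption made for illustration, not a definition taken from the studies cited above.

```python
def erb_hz(frequency_hz):
    """ERB of the auditory filter in Hz (Glasberg & Moore, 1990 formula)."""
    return 24.7 * (4.37 * frequency_hz / 1000.0 + 1.0)

def highest_resolved_harmonic(f0_hz, max_harmonic=30):
    """Highest harmonic whose ERB is narrower than the harmonic spacing.

    Assumed criterion: harmonic n is 'resolved' if erb_hz(n * f0_hz) < f0_hz.
    """
    resolved = 0
    for n in range(1, max_harmonic + 1):
        if erb_hz(n * f0_hz) < f0_hz:
            resolved = n
        else:
            break
    return resolved

limit_100 = highest_resolved_harmonic(100.0)
limit_440 = highest_resolved_harmonic(440.0)
```

With this criterion the limit is the 6th harmonic for a 100-Hz F0 and the 8th for a 440-Hz F0, within the 5th-to-10th range quoted above and dependent on F0. A stricter or looser criterion would shift these numbers.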
Numerous theories and models have been devised to explain how pitch is extracted
from the information present in the auditory periphery (de Cheveigne, 2005). As with
pure tones, the theories can be divided into two basic categories: place and temporal
theories. The place theories generally propose that the auditory system uses the
lower-order, resolved harmonics to calculate the pitch (e.g., Cohen, Grossberg, &
Wyse, 1995; Goldstein, 1973; Terhardt, 1974b; Wightman, 1973). This could be
achieved by way of a template-matching process, with either hard-wired harmonic
templates or templates that develop through repeated exposure to harmonic series,
which eventually become associated with the F0. Temporal theories typically involve
evaluating the time intervals between auditory-nerve spikes, using a form of autocor-
relation or all-interval spike histogram (Cariani & Delgutte, 1996; Licklider, 1951;
Meddis & Hewitt, 1991; Meddis & O'Mard, 1997; Schouten, Ritsma, & Cardozo,
1962). This information can be obtained from both resolved and unresolved harmonics.
Pooling these spikes from across the nerve array results in a dominant interval
emerging that corresponds to the period of the waveform (i.e., the reciprocal of the
F0). A third alternative involves using both place and temporal information. In one
version, coincident timing between neurons with harmonically related CFs is postulated
to lead to a spatial network of coincidence detectors, that is, a place-based template
that emerges through coincident timing information (Shamma & Klein, 2000). In
another version, the impulse-response time of the auditory filters, which depends on
the CF, is postulated to determine the range of periodicities that a certain tonotopic
location can code (de Cheveigne & Pressnitzer, 2006). Recent physiological studies
have supported at least the plausibility of place-time mechanisms to code pitch
(Cedolin & Delgutte, 2010).
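The temporal approach can be sketched in a few lines of Python. As a simplifying assumption, the autocorrelation is applied directly to the stimulus waveform rather than to simulated auditory-nerve spike trains, so it stands in for the pooled interval histogram and is not one of the published models. The stimulus, sampling rate, and search range are illustrative.

```python
import math

def harmonic_complex(n_samples, fs, f0, harmonics):
    """Sampled sum of equal-amplitude, sine-phase harmonics of f0."""
    return [
        sum(math.sin(2 * math.pi * h * f0 * n / fs) for h in harmonics)
        for n in range(n_samples)
    ]

def autocorrelation(x, lag):
    """Unnormalized autocorrelation of x at an integer lag (in samples)."""
    return sum(x[i] * x[i + lag] for i in range(len(x) - lag))

def estimate_f0(x, fs, f_min, f_max):
    """Return fs / lag for the lag with the largest autocorrelation."""
    shortest_lag = int(fs / f_max)
    longest_lag = int(fs / f_min)
    best_lag = max(range(shortest_lag, longest_lag + 1),
                   key=lambda lag: autocorrelation(x, lag))
    return fs / best_lag

FS = 16000
# Harmonics 3 to 5 of 200 Hz: the fundamental itself is absent.
tone = harmonic_complex(800, FS, 200.0, [3, 4, 5])
estimated_f0 = estimate_f0(tone, FS, 100.0, 1000.0)
```

The dominant lag corresponds to the 5-ms period of the waveform, so the estimate is the 200-Hz F0 even though no component is present at that frequency.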
Distinguishing between place and temporal (or place-time) models of pitch has
proved very difficult. In part, this is because spectral and temporal representations
of a signal are mathematically equivalent: any change in the spectral representation
will automatically lead to a change in the temporal representation, and vice versa.
Psychoacoustic attempts to distinguish between place and temporal mechanisms
have focused on the limits imposed by the peripheral physiology in the cochlea and
auditory nerve. For instance, the limits of frequency selectivity can be used to test
the place theory: if all harmonics are clearly unresolved (and therefore providing
no place information) and a pitch is still heard, then pitch cannot depend solely on
place information. Similarly, the putative limits of phase-locking can be used: if
the periodicity of the waveform and the frequencies of all the resolved harmonics
are all above the limit of phase locking in the auditory nerve and a pitch is still
heard, then temporal information is unlikely to be necessary for pitch perception.
A number of studies have shown that pitch perception is possible even when
harmonic tone complexes are filtered to remove all the low-numbered, resolved
harmonics (Bernstein & Oxenham, 2003; Houtsma & Smurzynski, 1990;
Kaernbach & Bering, 2001; Shackleton & Carlyon, 1994). A similar conclusion
was reached by studies that used amplitude-modulated broadband noise, which has
no spectral peaks in its long-term spectrum (Burns & Viemeister, 1976, 1981).
These results suggest that pitch can be extracted from temporal information alone,
thereby ruling out theories that consider only place coding. However, the pitch sen-
sation produced by unresolved harmonics or modulated noise is relatively weak
compared with the pitch of musical instruments, which produce full harmonic
complex tones.
The more salient pitch that we normally associate with music is provided by
the lower-numbered resolved harmonics. Studies that have investigated the
relative contributions of individual harmonics have found that harmonics 3 to 5
(Moore, Glasberg, & Peters, 1985), or frequencies around 600 Hz (Dai, 2000),
seem to have the most influence on the pitch of the overall complex. This is where
current temporal models also encounter some difficulty: they are able to extract the
F0 of a complex tone as well from unresolved harmonics as from resolved harmo-
nics, and therefore they do not predict the large difference in pitch salience and
accuracy between low- and high-numbered harmonics that is observed in psycho-
physical studies (Carlyon, 1998). In other words, place models do not predict good
enough performance with unresolved harmonics, whereas temporal models predict
performance that is too good. The apparently qualitative and quantitative difference
in the pitch produced by low-numbered and high-numbered harmonics has led to the
suggestion that there may be two pitch mechanisms at work, one to code the tem-
poral envelope repetition rate from high-numbered harmonics and one to code the
F0 from the individual low-numbered harmonics (Carlyon & Shackleton, 1994),
although subsequent work has questioned some of the evidence proposed for the two
mechanisms (Gockel, Carlyon, & Plack, 2004; Micheyl & Oxenham, 2003).
The fact that low-numbered, resolved harmonics are important suggests that
place coding may play a role in everyday pitch. Further evidence comes from a
variety of studies. The study mentioned earlier that used tones with low-frequency
temporal information transposed into a high-frequency range (Oxenham et al.,
2004) studied complex-tone pitch perception by transposing the information from
harmonics 3, 4, and 5 of a 100-Hz F0 to high-frequency regions of the cochlea
(roughly 4 kHz, 6 kHz, and 10 kHz). If temporal information was sufficient to elicit
a periodicity pitch, then listeners should have been able to hear a pitch correspond-
ing to 100 Hz. In fact, none of the listeners reported hearing a low pitch or was
able to match the pitch of the transposed tones to that of the missing fundamental.
This suggests that, if temporal information is used, it may need to be presented to
the correct place along the cochlea.
Another line of evidence has come from revisiting early conclusions that no
pitch is heard when all the harmonics are above about 5 kHz (Ritsma, 1962). The
initial finding led researchers to suggest that timing information was crucial and
that at frequencies above the limits of phase locking, periodicity pitch was not per-
ceived. A recent study revisited this conclusion and found that, in fact, listeners
were well able to hear pitches between 1 and 2 kHz, even when all the harmonics
were filtered to be above 6 kHz, and were sufficiently resolved to ensure that no
temporal envelope cues were available (Oxenham et al., 2011). This outcome leads
to an interesting dissociation: tones above 6 kHz on their own do not produce a
musically useful pitch; however, those same tones when combined with others in a
harmonic series can produce a musical pitch sufficient to convey a melody. The
results suggest that the upper limit of musical pitch may not in fact be explained by
the upper limit of phase locking: the fact that pitch can be heard even when all
tones are above 5 kHz suggests either that temporal information is not necessary
for musical pitch or that usable phase locking in the human auditory nerve extends
to much higher frequencies than currently believed (Heinz, Colburn, & Carney,
2001; Moore & Sek, 2009).
A further line of evidence for the importance of place information has come from
studies that have investigated the relationship between pitch accuracy and auditory
filter bandwidths. Moore and Peters (1992) investigated the relationship between
auditory filter bandwidths, measured using spectral masking techniques (Glasberg &
Moore, 1990), pure-tone frequency discrimination, and complex-tone F0 discrimi-
nation in young and elderly people with normal and impaired hearing. People
with hearing impairments were tested because they often have auditory filter bandwidths
that are broader than normal. A wide range of results were found: some
participants with normal filter bandwidths showed impaired pure-tone and
complex-tone pitch discrimination thresholds; others with abnormally wide filters
still had relatively normal pure-tone pitch discrimination thresholds. However,
none of the participants with broadened auditory filters had normal F0 discrimina-
tion thresholds, suggesting that perhaps broader filters resulted in fewer or no
resolved harmonics and that resolved harmonics are necessary for accurate F0 dis-
crimination. This question was pursued later by Bernstein and Oxenham (2006a,
2006b), who systematically increased the lowest harmonic present in a harmonic
complex tone and measured the point at which F0 discrimination thresholds wors-
ened. In normal-hearing listeners, there is quite an abrupt transition from good
to poor pitch discrimination as the lowest harmonic present is increased from the
9th to the 12th (Houtsma & Smurzynski, 1990). Bernstein and Oxenham reasoned
that if the transition point is related to frequency selectivity and the resolvability of
the harmonics, then the transition point should decrease to lower harmonic numbers
as the auditory filters become wider. They tested this in hearing-impaired listeners
and found a significant correlation between the transition point and the estimated
bandwidth of the auditory filters (Bernstein & Oxenham, 2006b), suggesting that
harmonics may need to be resolved in order to elicit a strong musical pitch.
Interestingly, even though resolved harmonics may be necessary for accurate pitch
perception, they may not be sufficient. Bernstein and Oxenham (2003) increased
the number of resolved harmonics available to listeners by presenting alternating
harmonics to opposite ears. In this way, the spacing between successive compo-
nents in each ear was doubled, thereby doubling the number of peripherally
resolved harmonics. Listeners were able to hear out about twice as many harmonics
in this new condition, but that did not improve their pitch discrimination thresholds
for the complex tone. In other words, providing access to harmonics that are
not normally resolved does not improve pitch perception abilities. These results are
consistent with theories that rely on pitch templates. If harmonics are not normally
available to the auditory system, they would be unlikely to be incorporated
into templates and so would not be expected to contribute to the pitch percept
when presented by artificial means, such as presenting them to alternate ears.
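The prediction that the transition point should move to lower harmonic numbers as the auditory filters widen can be illustrated with a small Python sketch. It assumes the Glasberg and Moore (1990) ERB formula for normal hearing, models hearing impairment as a single multiplicative broadening factor, and treats a harmonic as resolved when the (broadened) ERB is narrower than the F0. All three are simplifying assumptions for illustration, not the analysis used in the studies cited.

```python
def erb_hz(frequency_hz, broadening=1.0):
    """Normal-hearing ERB in Hz, scaled by an assumed broadening factor."""
    return broadening * 24.7 * (4.37 * frequency_hz / 1000.0 + 1.0)

def transition_harmonic(f0_hz, broadening=1.0, max_harmonic=30):
    """Highest harmonic n with erb_hz(n * f0_hz) < f0_hz (assumed criterion)."""
    highest = 0
    for n in range(1, max_harmonic + 1):
        if erb_hz(n * f0_hz, broadening) < f0_hz:
            highest = n
        else:
            break
    return highest

normal = transition_harmonic(200.0)
broadened = transition_harmonic(200.0, broadening=2.0)
```

For a 200-Hz F0, doubling the assumed filter bandwidth moves the estimated limit from the 8th harmonic down to the 3rd, which is the direction of the effect that the hearing-impaired data were used to test.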
Most sounds in our world, including those produced by musical instruments,
tend to have more energy at low frequencies than at high; on average, spectral
amplitude decreases at a rate of about 1/f, or -6 dB/octave. It therefore makes sense
that the auditory system would rely on the lower numbered harmonics to determine
pitch, as these are the ones that are most likely to be audible. Also, resolved harmonics
(ones that produce a peak in the excitation pattern and elicit a sinusoidal temporal
response) are much less susceptible to the effects of room reverberation than
are unresolved harmonics. Pitch discrimination thresholds for unresolved harmonics
are relatively good (~2%) when all the components have the same starting phase
(as in a stream of pulses). However, thresholds are much worse when the phase
relationships are scrambled, as they would be in a reverberant hall or church, and
listeners' discrimination thresholds can be as poor as 10%, more than a musical
semitone. In contrast, the response to resolved harmonics is not materially affected
by reverberation: changing the starting phase of a single sinusoid does not affect its
waveshape (it still remains a sinusoid), with frequency discrimination thresholds
of considerably less than 1%.
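Two of the figures quoted in this paragraph can be checked with simple arithmetic. The Python sketch below converts the 1/f amplitude slope to decibels per octave and expresses a 10% frequency difference in semitones. Equal temperament, in which a semitone is a frequency ratio of 2^(1/12), is assumed, and the function names are illustrative.

```python
import math

def amplitude_ratio_to_db(ratio):
    """Level difference in dB for a given amplitude ratio."""
    return 20.0 * math.log10(ratio)

def percent_to_semitones(percent):
    """Size of a frequency difference, given in percent, in equal-tempered semitones."""
    return 12.0 * math.log2(1.0 + percent / 100.0)

# With a 1/f amplitude spectrum, doubling the frequency halves the amplitude.
slope_db_per_octave = amplitude_ratio_to_db(0.5)

# One equal-tempered semitone expressed as a percentage change in frequency.
semitone_percent = (2.0 ** (1.0 / 12.0) - 1.0) * 100.0

ten_percent_in_semitones = percent_to_semitones(10.0)
```

Halving the amplitude corresponds to about -6 dB, a semitone is a change of just under 6%, and a 10% threshold therefore spans roughly one and two-thirds semitones.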
A number of physiological and neuroimaging studies have searched for represen-
tations of pitch beyond the cochlea (Winter, 2005). Potential correlates of periodicity
have been found in single- and multi-unit studies of the cochlear nucleus (Winter,
Wiegrebe, & Patterson, 2001), in the inferior colliculus (Langner & Schreiner,
1988), and auditory cortex (Bendor & Wang, 2005). Human neuroimaging studies
have also found correlates of periodicity in the brainstem (Griffiths, Uppenkamp,
Johnsrude, Josephs, & Patterson, 2001) as well as in auditory cortical structures
(Griffiths, Buchel, Frackowiak, & Patterson, 1998). More recently, Penagos,
Melcher, and Oxenham (2004) identified a region in human auditory cortex that
seemed sensitive to the degree of pitch salience, as opposed to physical parameters,
such as F0 or spectral region. However, these studies are also not without some con-
troversy. For instance, Hall and Plack (2009) failed to find any single region in the
human auditory cortex that responded to pitch, independent of other stimulus parameters.
Similarly, in a physiological study of the ferret's auditory cortex, Bizley,
Walker, Silverman, King, and Schnupp (2009) found interdependent coding of pitch,
timbre, and spatial location and did not find any pitch-specific region.
In summary, the pitch of single harmonic complex tones is determined primarily
by the first 5 to 8 harmonics, which are also those thought to be resolved in the
peripheral auditory system. To extract the pitch, the auditory system must somehow
combine and synthesize information from these harmonics. Exactly how this occurs
in the auditory system remains a matter of ongoing research.

C. Timbre
The official ANSI definition of timbre is: "That attribute of auditory sensation
which enables a listener to judge that two nonidentical sounds, similarly presented
and having the same loudness and pitch, are dissimilar" (ANSI, 1994). The
standard goes on to note that timbre depends primarily on the frequency spectrum of
the sound, but can also depend on the sound pressure and temporal characteristics.
In other words, anything that is not pitch or loudness is timbre. As timbre has its
own chapter in this volume (Chapter 2), it will not be discussed further here.
However, timbre makes an appearance in the next section, where its influence on
pitch and loudness judgments is addressed.

D. Sensory Interactions and Cross-Modal Influences

The auditory sensations of loudness, pitch, and timbre are for the most part studied
independently. Nevertheless, a sizeable body of evidence suggests that these sen-
sory dimensions are not strictly independent. Furthermore, other sensory modali-
ties, in particular vision, can have sizeable effects on auditory judgments of
musical sounds.

[Figure 5: four schematic spectra (level in dB) arranged in a grid. Pitch increases from the bottom row (low F0) to the top row (high F0); brightness increases from the left column (low spectral peak) to the right column (high spectral peak).]

Figure 5 Representations of F0 and spectral peak, which primarily affect the sensations of
pitch and timbre, respectively.

1. Pitch and Timbre Interactions

Pitch and timbre are the two dimensions most likely to be confused, particularly by
people without any musical training. Increasing the F0 of a complex tone results in
an increase in pitch, whereas increasing the spectral center of gravity of a tone increases
its brightness, one aspect of timbre (Figure 5). In both cases, when asked to describe
the change, many listeners would simply say that the sound was "higher."
In general, listeners find it hard to ignore changes in timbre when making pitch
judgments. Numerous studies have shown that the JND for F0 increases when
the two sounds to be compared also vary in spectral content (e.g., Borchert,
Micheyl, & Oxenham, 2011; Faulkner, 1985; Moore & Glasberg, 1990). In principle,
this could be because the change in spectral shape actually affects pitch or because
listeners have difficulty ignoring timbre changes and concentrating solely on pitch.
Studies using pitch matching have generally found that harmonic complex tones are
best matched with a pure-tone frequency corresponding to the F0, regardless of
the spectral content of the complex tone (e.g., Patterson, 1973), which means that the
detrimental effects of differing timbre may be related more to a distraction effect
than to a genuine change in pitch (Moore & Glasberg, 1990).
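The distinction between F0 and spectral center of gravity can be made concrete with a short Python sketch. It assumes an amplitude-weighted mean of the component frequencies as the measure of center of gravity (power weighting is another common choice), and the two amplitude profiles are invented for illustration.

```python
def spectral_centroid(f0_hz, amplitudes):
    """Amplitude-weighted mean frequency of a harmonic complex.

    amplitudes[k] is the linear amplitude of harmonic k + 1.
    """
    weighted_sum = sum((k + 1) * f0_hz * a for k, a in enumerate(amplitudes))
    return weighted_sum / sum(amplitudes)

F0 = 200.0
DULL = [1.0, 0.5, 0.25, 0.125]    # energy concentrated in low harmonics
BRIGHT = [0.125, 0.25, 0.5, 1.0]  # energy concentrated in high harmonics

dull_centroid = spectral_centroid(F0, DULL)
bright_centroid = spectral_centroid(F0, BRIGHT)
```

Both tones share the same 200-Hz F0, and so would be matched to the same pure-tone frequency, but the second has a much higher center of gravity and would be expected to sound brighter.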

2. Effects of Pitch or Timbre Changes on the Accuracy of Loudness Judgments

Just as listeners have more difficulty judging pitch in the face of varying timbre,
loudness comparisons between two sounds become much more challenging when
either the pitch or timbre of the two sounds differs. Examples include the difficulty
of making loudness comparisons between two pure tones of different frequency
(Gabriel et al., 1997; Oxenham & Buus, 2000), and the difficulty of making loud-
ness comparisons between tones of differing duration, even when they have the
same frequency (Florentine, Buus, & Robinson, 1998).

3. Visual Influences on Auditory Attributes

As anyone who has watched a virtuoso musician will know, visual input affects the
aesthetic experience of the audience. More direct influences of vision on auditory
sensations, and vice versa, have also been reported in recent years. For instance,
noise that is presented simultaneously with a light tends to be rated as louder than
noise presented without light (Odgaard, Arieh, & Marks, 2004). Interestingly, this
effect appears to be sensory in nature, rather than a late-stage decisional effect,
or shift in criterion; in contrast, similar effects of noise on the apparent brightness
of light (Stein, London, Wilkinson, & Price, 1996) seem to stem from higher-level
decisional and criterion-setting mechanisms (Odgaard, Arieh, & Marks, 2003).
On the other hand, recent combinations of behavioral and neuroimaging techniques
have suggested that the combination of sound with light can result in increased sen-
sitivity to low-level light, which is reflected in changes in activation of sensory cor-
tices (Noesselt et al., 2010).
Visual cues can also affect other attributes of sound. For instance, Schutz and
colleagues (Schutz & Kubovy, 2009; Schutz & Lipscomb, 2007) have shown that
the gestures made in musical performance can affect the perceived duration of a
musical sound: a short or staccato gesture by a marimba player led to shorter
judged durations of the tone than a long gesture by the player, even though the
tone itself was identical. Interestingly, this did not hold for sustained sounds, such
as a clarinet, where visual information had much less impact on duration judg-
ments. The difference may relate to the exponential decay of percussive sounds,
which have no clearly defined end, allowing the listeners to shift their criterion for
the end point to better match the visual information.

III. Perception of Sound Combinations

A. Object Perception and Grouping
When a musical tone, such as a violin note or a sung vowel, is presented, we normally
hear a single sound with a single pitch, even though the note actually consists of
many different pure tones, each with its own frequency and pitch. This perceptual
fusion is partly because all the pure tones begin and end at roughly the same time,
and partly because they form a single harmonic series (Darwin, 2005). The impor-
tance of onset and offset synchrony can be demonstrated by delaying one of the
components relative to all the others. A delay of only a few tens of milliseconds is
sufficient for the delayed component to pop out and be heard as a separate
object. Similarly, if one component is mistuned compared to the rest of the com-
plex, it will be heard out as a separate object, provided the mistuning is sufficiently
large. For low-numbered harmonics, mistuning a harmonic by between 1 and 3% is
sufficient for it to pop out (Moore, Glasberg, & Peters, 1986). Interestingly, a
mistuned harmonic can be heard separately, but can still contribute to the overall
pitch of the complex; in fact a single mistuned harmonic continues to contribute to
the overall pitch of the complex, even when it is mistuned by as much as 8%,
well above the threshold for hearing it out as a separate object (Darwin & Ciocca,
1992; Darwin, Hukin, & al-Khatib, 1995; Moore et al., 1985). This is an example
of a failure of "disjoint allocation": a single component is not disjointly allocated
to just a single auditory object (Liberman, Isenberg, & Rakerd, 1981; Shinn-
Cunningham, Lee, & Oxenham, 2007).

B. Perceiving Multiple Pitches

How many tones can we hear at once? Considering all the different instruments in
an orchestra, one might expect the number to be quite high, and a well-trained con-
ductor will in many cases be able to hear a wrong note played by a single instru-
ment within that orchestra. But are we aware of all the pitches being presented at
once, and can we count them? Huron (1989) suggested that the number of indepen-
dent voices we can perceive and count is actually rather low. Huron (1989) used
sounds of homogenous timbre (organ notes) and played participants sections from a
piece of polyphonic organ music by J. S. Bach with between one and five voices
playing simultaneously. Despite the fact that most of the participants were musi-
cally trained, their ability to judge accurately the number of voices present
decreased dramatically when the number of voices actually present exceeded three.
Using much simpler stimuli, consisting of several simultaneous pure tones,
Demany and Ramos (2005) made the interesting discovery that participants could
not tell whether a certain tone was present or absent from the chord, but they
noticed if its frequency was changed in the next presentation. In other words, lis-
teners detected a change in the frequency of a tone that was itself undetected.
Taken together with the results of Huron (1989), the data suggest that the pitches
of many tones can be processed simultaneously, but that listeners may only be consciously
aware of a subset of three or four at any one time.

C. The Role of Frequency Selectivity in the Perception of Multiple Tones

1. Roughness
When two pure tones of differing frequency are added, the resulting waveform
fluctuates in amplitude at a rate corresponding to the difference of the two frequen-
cies. These amplitude fluctuations, or beats, are illustrated in Figure 6, which
shows how the two tones are sometimes in phase, and add constructively (A), and
sometimes out of phase, and so cancel (B). At beat rates of less than about 10 Hz,
we hear the individual fluctuations, but once the rate increases above about 12 Hz,
we are no longer able to follow the individual fluctuations and instead perceive a
rough sound (Daniel & Weber, 1997; Terhardt, 1974a).
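The origin of the beats can be verified numerically. The Python sketch below assumes two equal-amplitude, sine-phase tones (440 and 444 Hz are illustrative values) and checks that their sum equals a tone at the mean frequency multiplied by a slowly varying cosine, whose magnitude rises and falls at the difference frequency.

```python
import math

def two_tone_sum(t, f1, f2):
    """Sum of two equal-amplitude sinusoids at time t (seconds)."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)

def product_form(t, f1, f2):
    """Same signal as a mean-frequency tone times a slow cosine envelope."""
    mean_frequency = (f1 + f2) / 2.0
    half_difference = (f2 - f1) / 2.0
    return (2.0 * math.cos(2 * math.pi * half_difference * t)
            * math.sin(2 * math.pi * mean_frequency * t))

def beat_rate(f1, f2):
    """Rate of amplitude fluctuation in Hz: the difference frequency."""
    return abs(f2 - f1)

times = [n / 8000 for n in range(4000)]  # 0.5 s at 8 kHz
max_difference = max(
    abs(two_tone_sum(t, 440.0, 444.0) - product_form(t, 440.0, 444.0))
    for t in times
)
rate = beat_rate(440.0, 444.0)
```

The envelope term is a cosine at half the difference frequency, but its magnitude peaks twice per cycle, so the audible fluctuation occurs at the full difference frequency (4 Hz in this example).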

Figure 6 Illustration of the beats created by the summation of two sinusoids with
slightly different frequencies. At some points in time, the two waveforms are in
phase and so add constructively (A); at other points in time, the two waveforms
are in antiphase and their waveforms cancel (B). The resulting waveform
fluctuates at a rate corresponding to the difference of the two frequencies.

According to studies of roughness, the percept is maximal at rates of around
70 Hz and then decreases. The decrease in perceived roughness with increasing
rate is in part because the auditory system becomes less sensitive to modulation
above about 100 to 150 Hz, and in part due to the effects of auditory filtering
(Kohlrausch, Fassel, & Dau, 2000): If the two tones do not fall within the same
auditory filter, the beating effect is reduced because the tones do not interact to
form the complex waveform; instead (as with resolved harmonics) each tone is
represented separately in the auditory periphery. Therefore, the perception of beats
depends to a large extent on peripheral interactions in the ear. (Binaural beats also
occur between sounds presented to opposite ears, but they are much less salient and
are heard over a much smaller range of frequency differences; see Licklider,
Webster, & Hedlun, 1950.)
The percept of roughness that results from beats has been used to explain a
number of musical phenomena. First, chords played in the lower registers typically
sound muddy, and music theory calls for notes within a chord to be spaced fur-
ther apart than in higher registers. This may be in part because the auditory filters
are relatively wider at low frequencies (below about 500 Hz), leading to stronger
peripheral interactions, and hence greater roughness, for tones that are spaced a
constant musical interval apart. Second, it has been hypothesized that roughness
underlies in part the attribute of dissonance that is used to describe unpleasant com-
binations of notes. The relationship between dissonance and beating is considered
further in Section III,D.
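The argument about chords in low registers can be illustrated by expressing a fixed musical interval in units of auditory filter bandwidth. The Python sketch below assumes the ERB formula of Glasberg and Moore (1990), equal-tempered intervals, and a filter centered at the geometric mean of the two tones. The choice of a minor third and of the two registers is illustrative.

```python
import math

def erb_hz(frequency_hz):
    """ERB of the auditory filter in Hz (Glasberg & Moore, 1990 formula)."""
    return 24.7 * (4.37 * frequency_hz / 1000.0 + 1.0)

def separation_in_erbs(lower_hz, semitones):
    """Frequency separation of an equal-tempered interval, in ERB units."""
    upper_hz = lower_hz * 2.0 ** (semitones / 12.0)
    center_hz = math.sqrt(lower_hz * upper_hz)  # assumed filter center
    return (upper_hz - lower_hz) / erb_hz(center_hz)

MINOR_THIRD = 3  # interval size in semitones
low_register = separation_in_erbs(110.0, MINOR_THIRD)
high_register = separation_in_erbs(880.0, MINOR_THIRD)
```

Under these assumptions the two notes of a minor third above 110 Hz lie well within one ERB of each other, whereas the same interval above 880 Hz spans more than one ERB, consistent with stronger peripheral interaction in the lower register.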
2. Pitch Perception of Multiple Sounds

Despite the important role of tone combinations or chords in music, relatively few
psychoacoustic studies have examined their perception. Beerends and Houtsma
(1989) used complex tones consisting of just two consecutive harmonics each.
Although the pitch of these two-component complexes is relatively weak, with prac-
tice, listeners can learn to accurately identify the F0 of such complexes. Beerends
and Houtsma found that listeners were able to identify the pitches of the two com-
plex tones, even if the harmonics from one sound were presented to different ears.
The only exception was when all the components were presented to one ear and
none of the four components was deemed to be resolved. In that case, listeners
were not able to identify either pitch accurately.
Carlyon (1996) used harmonic tone complexes with more harmonics and filtered
them so that they had completely overlapping spectral envelopes. He found that
when both complexes were composed of resolved harmonics, listeners were able to
hear out the pitch of one complex in the presence of the other. However, the sur-
prising finding was that when both complexes comprised only unresolved harmo-
nics, then listeners did not hear a pitch at all, but described the percept as an
unmusical "crackle." To avoid ambiguity, Carlyon (1996) used harmonics that
were either highly resolved or highly unresolved. Because of this, it remained
unclear whether it is the resolvability of the harmonics before or after the two
sounds are mixed that determines whether each tone elicits a clear pitch. Micheyl
and colleagues addressed this issue, using a variety of combinations of spectral
region and F0 to vary the relative resolvability of the components (Micheyl,
Bernstein, & Oxenham, 2006; Micheyl, Keebler, & Oxenham, 2010). By compar-
ing the results to simulations of auditory filtering, they found that good pitch dis-
crimination was only possible when at least two of the harmonics from the target
sound were deemed resolved after being mixed with the other sound (Micheyl
et al., 2010). The results are consistent with place theories of pitch that rely on
resolved harmonics; however, it may be possible to adapt timing-based models of
pitch to similarly explain the phenomena (e.g., Bernstein & Oxenham, 2005).
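The idea of judging resolvability after the two sounds are mixed can be sketched as follows. This is a deliberately crude stand-in for the auditory-filter simulations mentioned above, not the procedure used by Micheyl et al.: it assumes the Glasberg and Moore (1990) ERB approximation, treats a target component as resolved when no other component in the mixture lies within a chosen number of ERBs (the `min_sep_erbs` parameter is hypothetical), and uses made-up F0s and band edges.

```python
def erb_hz(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore, 1990 approximation)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def components_in_band(f0_hz, lo_hz, hi_hz):
    """Harmonics of f0 that fall inside the band [lo_hz, hi_hz]."""
    k_max = int(hi_hz // f0_hz)
    return [k * f0_hz for k in range(1, k_max + 1) if k * f0_hz >= lo_hz]


def resolved_target_components(target_f0, masker_f0, lo_hz, hi_hz, min_sep_erbs=1.0):
    """Target harmonics with no neighbour (target or masker) closer than min_sep_erbs ERBs."""
    target = components_in_band(target_f0, lo_hz, hi_hz)
    masker = components_in_band(masker_f0, lo_hz, hi_hz)
    resolved = []
    for i, f in enumerate(target):
        others = target[:i] + target[i + 1:] + masker
        if all(abs(f - g) >= min_sep_erbs * erb_hz(f) for g in others):
            resolved.append(f)
    return resolved


if __name__ == "__main__":
    # Hypothetical low-region and high-region mixtures.
    print(resolved_target_components(400.0, 530.0, 300.0, 1200.0))
    print(resolved_target_components(100.0, 130.0, 3000.0, 4000.0))
```

In this toy version, the low-region mixture leaves two target harmonics resolved and the high-region mixture leaves none, mirroring the qualitative criterion that at least two resolved target harmonics are needed after mixing.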

D. Consonance and Dissonance

The question of how certain combinations of tones sound when played together
is central to many aspects of music theory. Combinations of two tones that form
certain musical intervals, such as the octave and the fifth, are typically deemed as
sounding pleasant or consonant, whereas others, such as the augmented fourth (tritone),
are often considered unpleasant or dissonant. These types of percepts involving
tones presented in isolation from a musical context have been termed sensory
consonance or dissonance. The term musical consonance (Terhardt, 1976, 1984)
subsumes sensory factors, but also includes many other factors that contribute to
whether a sound combination is judged as consonant or dissonant, including the
context (what sounds preceded it), the style of music (e.g., jazz or classical), and
presumably also the personal taste and musical history of the individual listener.
There has been a long-standing search for acoustic and physiological correlates
of consonance and dissonance, going back to the observations of Pythagoras that
strings whose lengths had a small-number ratio relationship (e.g., 2:1 or 3:2)
sounded pleasant together. Helmholtz (1885/1954) suggested that consonance may
be related to the absence of beats (perceived as roughness) in musical sounds.
Plomp and Levelt (1965) developed the idea further by showing that the ranking by
consonance of musical intervals within an octave was well predicted by the number
of component pairs within the two complex tones that fell within the same auditory
filters and therefore caused audible beats (see also Kameoka & Kuriyagawa,
1969a, 1969b). When two complex tones form a consonant interval, such as an
octave or a fifth, the harmonics are either exactly coincident, and so do not produce
beats, or are spaced so far apart as to not produce strong beats. In contrast, when
the tones form a dissonant interval, such as a minor second, none of the compo-
nents are coincident, but many are close enough to produce beats.
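The counting argument above can be sketched in a few lines of Python. This is a simplified illustration rather than Plomp and Levelt's actual model: it assumes six equal-amplitude harmonics per tone, uses the Glasberg and Moore (1990) ERB approximation as the auditory-filter bandwidth, and counts a pair as beating when the two components differ in frequency but lie within `max_erbs` ERBs of each other. The function names and parameter values are hypothetical.

```python
def erb_hz(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore, 1990 approximation)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def harmonics(f0_hz, n):
    """First n harmonics of a complex tone with fundamental f0_hz."""
    return [f0_hz * k for k in range(1, n + 1)]


def count_beating_pairs(f0_a, f0_b, n_harmonics=6, max_erbs=1.0):
    """Count cross-tone component pairs that are close but not coincident."""
    count = 0
    for fa in harmonics(f0_a, n_harmonics):
        for fb in harmonics(f0_b, n_harmonics):
            diff = abs(fa - fb)
            if diff < 1e-9:
                continue  # coincident components do not beat
            centre = (fa + fb) / 2.0
            if diff < max_erbs * erb_hz(centre):
                count += 1
    return count


if __name__ == "__main__":
    semitone = 2 ** (1 / 12)
    print("octave:", count_beating_pairs(440.0, 880.0))
    print("fifth:", count_beating_pairs(440.0, 660.0))
    print("minor second:", count_beating_pairs(440.0, 440.0 * semitone))
```

With these assumed settings, the octave yields no beating pairs and the minor second yields more than the fifth, in line with the ordering described in the text.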
An alternative theory of consonance is based on the harmonicity of the
sound combination, or how closely it resembles a single harmonic series. Consider,
for instance, two complex tones that form the interval of a perfect fifth, with F0s of
440 and 660 Hz. All the components from both tones are multiples of a single
F0 (220 Hz) and so, according to the harmonicity account of consonance, should
sound consonant. In contrast, the harmonics of two tones that form an augmented
fourth, with F0s of 440 Hz and 622 Hz, do not approximate any single harmonic
series within the range of audible pitches and so should sound dissonant, as found
empirically. The harmonicity theory of consonance can be implemented by using a
spectral template model (Terhardt, 1974b) or by using temporal information,
derived for instance from spikes in the auditory nerve (Tramo, Cariani, Delgutte, &
Braida, 2001).
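The arithmetic behind the two examples above can be checked with a greatest-common-divisor calculation. The sketch below is only a rough illustration of the harmonicity idea, not the template or temporal models cited: it assumes F0s given as whole numbers of hertz and an assumed lower limit of musical pitch of roughly 30 Hz, and the function names are hypothetical.

```python
import math

# Assumed rough lower limit of musical pitch, in Hz (an illustrative value).
LOWEST_PITCH_HZ = 30


def implied_f0_hz(f0_a_hz, f0_b_hz):
    """Largest frequency of which both (integer) F0s are exact multiples."""
    return math.gcd(f0_a_hz, f0_b_hz)


def resembles_single_series(f0_a_hz, f0_b_hz, lowest_pitch_hz=LOWEST_PITCH_HZ):
    """True if the common F0 falls within the assumed range of audible pitches."""
    return implied_f0_hz(f0_a_hz, f0_b_hz) >= lowest_pitch_hz


if __name__ == "__main__":
    print(implied_f0_hz(440, 660), resembles_single_series(440, 660))
    print(implied_f0_hz(440, 622), resembles_single_series(440, 622))
```

For the perfect fifth the common F0 is 220 Hz, whereas for the augmented fourth the only common divisor is 2 Hz, far below any audible pitch.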
Because the beating and harmonicity theories of consonance and dissonance pro-
duce very similar predictions, it has been difficult to distinguish between them
experimentally. A recent study took a step toward this goal by examining individual
differences in a large group (>200) of participants (McDermott, Lehr, &
Oxenham, 2010). First, listeners were asked to provide preference ratings for diag-
nostic stimuli that varied in beating but not harmonicity, or vice versa. Next,
listeners were asked to provide preference ratings for various musical sound
combinations, including dyads (two-note chords) and triads (three-note chords),
using natural and artificial musical instruments and voices. When the ratings in the
two types of tasks were compared, the correlations between the ratings for the har-
monicity diagnostic tests and the musical sounds were significant, but the correla-
tions between the ratings for the beating diagnostic tests and the musical sounds
were not. Interestingly, the number of years of formal musical training also corre-
lated with both the harmonicity and musical preference ratings, but not with the
beating ratings. Overall, the results suggested that harmonicity, rather than lack of
beating, underlies listeners' consonance preferences and that musical training may
amplify the preference for harmonic relationships.
Developmental studies have shown that infants as young as 3 or 4 months show
a preference for consonant over dissonant musical intervals (Trainor & Heinmiller,
1998; Zentner & Kagan, 1996, 1998). However, it is not yet known whether infants
are responding more to beats or inharmonicity, or both. It would be interesting to
discover whether the adult preferences for harmonicity revealed by McDermott
et al. (2010) are shared by infants, or whether infants initially base their preferences
on acoustic beats.

IV. Conclusions and Outlook

Although the perception of musical tones should be considered primarily in musical
contexts, much about the interactions between acoustics, auditory physiology, and
perception can be learned through psychoacoustic experiments using relatively
simple stimuli and procedures. Recent findings using psychoacoustics, alone or in
combination with neurophysiology and neuroimaging, have extended our knowl-
edge of how pitch, timbre, and loudness are perceived and represented neurally,
both for tones in isolation and in combination. However, much still remains to be
discovered. Important trends include the use of more naturalistic stimuli in experi-
ments and for testing computational models of perception, as well as the simulta-
neous combination of perceptual and neural measures when attempting to elucidate
the underlying neural mechanisms of auditory perception. Using the building
blocks provided by the psychoacoustics of individual and simultaneous musical
tones, it is possible to proceed to answering much more sophisticated questions
regarding the perception of music as it unfolds over time. These and other issues
are tackled in the remaining chapters of this volume.

Acknowledgments

Emily Allen, Christophe Micheyl, and John Oxenham provided helpful comments on an
earlier version of this chapter. The work from the author's laboratory is supported by funding
from the National Institutes of Health (Grants R01 DC 05216 and R01 DC 07657).

References

American National Standards Institute. (1994). Acoustical terminology. ANSI S1.1-1994.
New York, NY: Author.
Arieh, Y., & Marks, L. E. (2003a). Recalibrating the auditory system: A speed-accuracy
analysis of intensity perception. Journal of Experimental Psychology: Human
Perception and Performance, 29, 523–536.
Arieh, Y., & Marks, L. E. (2003b). Time course of loudness recalibration: Implications for
loudness enhancement. Journal of the Acoustical Society of America, 114, 1550–1556.
Attneave, F., & Olson, R. K. (1971). Pitch as a medium: A new approach to psychophysical
scaling. American Journal of Psychology, 84, 147–166.
Beerends, J. G., & Houtsma, A. J. M. (1989). Pitch identification of simultaneous diotic and
dichotic two-tone complexes. Journal of the Acoustical Society of America, 85,
813–819.
Bendor, D., & Wang, X. (2005). The neuronal representation of pitch in primate auditory
cortex. Nature, 436, 1161–1165.
Bernstein, J. G., & Oxenham, A. J. (2003). Pitch discrimination of diotic and dichotic tone
complexes: Harmonic resolvability or harmonic number? Journal of the Acoustical
Society of America, 113, 3323–3334.
Bernstein, J. G., & Oxenham, A. J. (2005). An autocorrelation model with place dependence
to account for the effect of harmonic number on fundamental frequency discrimination.
Journal of the Acoustical Society of America, 117, 3816–3831.
Bernstein, J. G., & Oxenham, A. J. (2006a). The relationship between frequency selectivity
and pitch discrimination: Effects of stimulus level. Journal of the Acoustical Society of
America, 120, 3916–3928.
Bernstein, J. G., & Oxenham, A. J. (2006b). The relationship between frequency selectivity
and pitch discrimination: Sensorineural hearing loss. Journal of the Acoustical Society
of America, 120, 3929–3945.
Bizley, J. K., Walker, K. M., Silverman, B. W., King, A. J., & Schnupp, J. W. (2009).
Interdependent encoding of pitch, timbre, and spatial location in auditory cortex.
Journal of Neuroscience, 29, 2064–2075.
Borchert, E. M., Micheyl, C., & Oxenham, A. J. (2011). Perceptual grouping affects pitch
judgments across time and frequency. Journal of Experimental Psychology: Human
Perception and Performance, 37, 257–269.
Burns, E. M., & Viemeister, N. F. (1976). Nonspectral pitch. Journal of the Acoustical
Society of America, 60, 863–869.
Burns, E. M., & Viemeister, N. F. (1981). Played again SAM: Further observations on the
pitch of amplitude-modulated noise. Journal of the Acoustical Society of America, 70,
1655–1660.
Buus, S., Muesch, H., & Florentine, M. (1998). On loudness at threshold. Journal of the
Acoustical Society of America, 104, 399–410.
Cariani, P. A., & Delgutte, B. (1996). Neural correlates of the pitch of complex tones.
I. Pitch and pitch salience. Journal of Neurophysiology, 76, 1698–1716.
Carlyon, R. P. (1996). Encoding the fundamental frequency of a complex tone in the
presence of a spectrally overlapping masker. Journal of the Acoustical Society of America,
99, 517–524.
Carlyon, R. P. (1998). Comments on "A unitary model of pitch perception" [Journal of the
Acoustical Society of America, 102, 1811–1820 (1997)]. Journal of the Acoustical
Society of America, 104, 1118–1121.
Carlyon, R. P., & Shackleton, T. M. (1994). Comparing the fundamental frequencies of
resolved and unresolved harmonics: Evidence for two pitch mechanisms? Journal of the
Acoustical Society of America, 95, 3541–3554.
Cedolin, L., & Delgutte, B. (2010). Spatiotemporal representation of the pitch of harmonic
complex tones in the auditory nerve. Journal of Neuroscience, 30, 12712–12724.
Chalupper, J., & Fastl, H. (2002). Dynamic loudness model (DLM) for normal and
hearing-impaired listeners. Acta Acustica united with Acustica, 88, 378–386.
Chen, Z., Hu, G., Glasberg, B. R., & Moore, B. C. (2011). A new method of calculating
auditory excitation patterns and loudness for steady sounds. Hearing Research,
282(1–2), 204–215.
Cohen, M. A., Grossberg, S., & Wyse, L. L. (1995). A spectral network model of pitch
perception. Journal of the Acoustical Society of America, 98, 862–879.
Dai, H. (2000). On the relative influence of individual harmonics on pitch judgment. Journal
of the Acoustical Society of America, 107, 953–959.
Daniel, P., & Weber, R. (1997). Psychoacoustical roughness: Implementation of an
optimized model. Acustica, 83, 113–123.
Darwin, C. J. (2005). Pitch and auditory grouping. In C. J. Plack, A. J. Oxenham, R. Fay, &
A. N. Popper (Eds.), Pitch: Neural coding and perception (pp. 278–305). New York,
NY: Springer Verlag.
Darwin, C. J., & Ciocca, V. (1992). Grouping in pitch perception: Effects of onset
asynchrony and ear of presentation of a mistuned component. Journal of the Acoustical
Society of America, 91, 3381–3390.
Darwin, C. J., Hukin, R. W., & al-Khatib, B. Y. (1995). Grouping in pitch perception:
Evidence for sequential constraints. Journal of the Acoustical Society of America, 98,
880–885.
de Boer, E. (1956). On the residue in hearing (Unpublished doctoral dissertation). The
Netherlands: University of Amsterdam.
de Cheveigné, A. (2005). Pitch perception models. In C. J. Plack, A. J. Oxenham,
A. N. Popper, & R. Fay (Eds.), Pitch: Neural coding and perception
(pp. 169–233). New York, NY: Springer Verlag.
de Cheveigné, A., & Pressnitzer, D. (2006). The case of the missing delay lines: Synthetic
delays obtained by cross-channel phase interaction. Journal of the Acoustical Society of
America, 119, 3908–3918.
Demany, L., & Ramos, C. (2005). On the binding of successive sounds: Perceiving
shifts in nonperceived pitches. Journal of the Acoustical Society of America, 117,
833–841.
Durlach, N. I., & Braida, L. D. (1969). Intensity perception. I. Preliminary theory of intensity
resolution. Journal of the Acoustical Society of America, 46, 372–383.
Epstein, M., & Florentine, M. (2005). A test of the equal-loudness-ratio hypothesis using
cross-modality matching functions. Journal of the Acoustical Society of America, 118,
907–913.
Faulkner, A. (1985). Pitch discrimination of harmonic complex signals: Residue pitch or
multiple component discriminations. Journal of the Acoustical Society of America, 78,
1993–2004.
Fechner, G. T. (1860). Elemente der Psychophysik (Vol. 1). Leipzig, Germany: Breitkopf
und Härtel.
Fletcher, H., & Munson, W. A. (1933). Loudness, its definition, measurement and calculation.
Journal of the Acoustical Society of America, 5, 82–108.
Fletcher, H., & Munson, W. A. (1937). Relation between loudness and masking. Journal of
the Acoustical Society of America, 9, 1–10.
Florentine, M., Buus, S., & Robinson, M. (1998). Temporal integration of loudness under
partial masking. Journal of the Acoustical Society of America, 104, 999–1007.
Gabriel, B., Kollmeier, B., & Mellert, V. (1997). Influence of individual listener,
measurement room and choice of test-tone levels on the shape of equal-loudness level contours.
Acustica, 83, 670–683.
Galambos, R., Bauer, J., Picton, T., Squires, K., & Squires, N. (1972). Loudness
enhancement following contralateral stimulation. Journal of the Acoustical Society of America,
52(4), 1127–1130.
Glasberg, B. R., & Moore, B. C. J. (1990). Derivation of auditory filter shapes from
notched-noise data. Hearing Research, 47, 103–138.
Glasberg, B. R., & Moore, B. C. J. (2002). A model of loudness applicable to time-varying
sounds. Journal of the Audio Engineering Society, 50, 331–341.
Gockel, H., Carlyon, R. P., & Plack, C. J. (2004). Across-frequency interference effects in
fundamental frequency discrimination: Questioning evidence for two pitch mechanisms.
Journal of the Acoustical Society of America, 116, 1092–1104.
Goldstein, J. L. (1973). An optimum processor theory for the central formation of
the pitch of complex tones. Journal of the Acoustical Society of America, 54,
1496–1516.
Griffiths, T. D., Büchel, C., Frackowiak, R. S., & Patterson, R. D. (1998). Analysis of
temporal structure in sound by the human brain. Nature Neuroscience, 1, 422–427.
Griffiths, T. D., Uppenkamp, S., Johnsrude, I., Josephs, O., & Patterson, R. D. (2001).
Encoding of the temporal regularity of sound in the human brainstem. Nature
Neuroscience, 4, 633–637.
Hall, D. A., & Plack, C. J. (2009). Pitch processing sites in the human auditory brain.
Cerebral Cortex, 19, 576–585.
Hartmann, W. M., & Goupell, M. J. (2006). Enhancing and unmasking the harmonics of a
complex tone. Journal of the Acoustical Society of America, 120, 2142–2157.
Heinz, M. G., Colburn, H. S., & Carney, L. H. (2001). Evaluating auditory performance
limits: I. One-parameter discrimination using a computational model for the auditory
nerve. Neural Computation, 13, 2273–2316.
Hellman, R. P. (1976). Growth of loudness at 1000 and 3000 Hz. Journal of the Acoustical
Society of America, 60, 672–679.
Hellman, R. P., & Zwislocki, J. (1964). Loudness function of a 1000-cps tone in the presence
of a masking noise. Journal of the Acoustical Society of America, 36, 1618–1627.
Helmholtz, H. L. F. (1885/1954). On the sensations of tone (A. J. Ellis, Trans.). New York,
NY: Dover.
Henning, G. B. (1966). Frequency discrimination of random amplitude tones. Journal of the
Acoustical Society of America, 39, 336–339.
Houtsma, A. J. M., & Smurzynski, J. (1990). Pitch identification and discrimination for complex
tones with many harmonics. Journal of the Acoustical Society of America, 87, 304–310.
Huron, D. (1989). Voice denumerability in polyphonic music of homogenous timbres. Music
Perception, 6, 361–382.
Jesteadt, W., Wier, C. C., & Green, D. M. (1977). Intensity discrimination as a function of
frequency and sensation level. Journal of the Acoustical Society of America, 61,
169–177.
Kaernbach, C., & Bering, C. (2001). Exploring the temporal mechanism involved in the
pitch of unresolved harmonics. Journal of the Acoustical Society of America, 110,
1039–1048.
Kameoka, A., & Kuriyagawa, M. (1969a). Consonance theory part I: Consonance of dyads.
Journal of the Acoustical Society of America, 45, 1451–1459.
Kameoka, A., & Kuriyagawa, M. (1969b). Consonance theory part II: Consonance of
complex tones and its calculation method. Journal of the Acoustical Society of America, 45,
1460–1469.
Keuss, P. J., & van der Molen, M. W. (1982). Positive and negative effects of stimulus
intensity in auditory reaction tasks: Further studies on immediate arousal. Acta
Psychologica, 52, 61–72.
Kohfeld, D. L. (1971). Simple reaction time as a function of stimulus intensity in decibels of
light and sound. Journal of Experimental Psychology, 88, 251–257.
Kohlrausch, A., Fassel, R., & Dau, T. (2000). The influence of carrier level and frequency
on modulation and beat-detection thresholds for sinusoidal carriers. Journal of the
Acoustical Society of America, 108, 723–734.
Langner, G., & Schreiner, C. E. (1988). Periodicity coding in the inferior colliculus of the
cat. I. Neuronal mechanisms. Journal of Neurophysiology, 60, 1799–1822.
Liberman, A. M., Isenberg, D., & Rakerd, B. (1981). Duplex perception of cues for stop
consonants: Evidence for a phonetic mode. Perception & Psychophysics, 30, 133–143.
Licklider, J. C., Webster, J. C., & Hedlun, J. M. (1950). On the frequency limits of binaural
beats. Journal of the Acoustical Society of America, 22, 468–473.
Licklider, J. C. R. (1951). A duplex theory of pitch perception. Experientia, 7, 128–133.
Loeb, G. E., White, M. W., & Merzenich, M. M. (1983). Spatial cross correlation: A
proposed mechanism for acoustic pitch perception. Biological Cybernetics, 47, 149–163.
Luce, R. D., & Green, D. M. (1972). A neural timing theory for response times and the
psychophysics of intensity. Psychological Review, 79, 14–57.
Mapes-Riordan, D., & Yost, W. A. (1999). Loudness recalibration as a function of level.
Journal of the Acoustical Society of America, 106, 3506–3511.
Marks, L. E. (1994). Recalibrating the auditory system: The perception of loudness.
Journal of Experimental Psychology: Human Perception and Performance, 20,
382–396.
Mauermann, M., Long, G. R., & Kollmeier, B. (2004). Fine structure of hearing threshold and
loudness perception. Journal of the Acoustical Society of America, 116, 1066–1080.
McDermott, J. H., Lehr, A. J., & Oxenham, A. J. (2010). Individual differences reveal the
basis of consonance. Current Biology, 20, 1035–1041.
Meddis, R., & Hewitt, M. (1991). Virtual pitch and phase sensitivity of a computer
model of the auditory periphery. I: Pitch identification. Journal of the Acoustical
Society of America, 89, 2866–2882.
Meddis, R., & O'Mard, L. (1997). A unitary model of pitch perception. Journal of the
Acoustical Society of America, 102, 1811–1820.
Micheyl, C., Bernstein, J. G., & Oxenham, A. J. (2006). Detection and F0 discrimination of
harmonic complex tones in the presence of competing tones or noise. Journal of the
Acoustical Society of America, 120, 1493–1505.
Micheyl, C., Delhommeau, K., Perrot, X., & Oxenham, A. J. (2006). Influence of musical
and psychoacoustical training on pitch discrimination. Hearing Research, 219,
36–47.
Micheyl, C., Keebler, M. V., & Oxenham, A. J. (2010). Pitch perception for mixtures of
spectrally overlapping harmonic complex tones. Journal of the Acoustical Society of
America, 128, 257–269.
Micheyl, C., & Oxenham, A. J. (2003). Further tests of the "two pitch mechanisms"
hypothesis. Journal of the Acoustical Society of America, 113, 2225.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our
capacity for processing information. Psychological Review, 63, 81–96.
Moore, B. C. J. (1973). Frequency difference limens for short-duration tones. Journal of the
Acoustical Society of America, 54, 610–619.
Moore, B. C. J., & Glasberg, B. R. (1990). Frequency discrimination of complex tones with
overlapping and non-overlapping harmonics. Journal of the Acoustical Society of
America, 87, 2163–2177.
Moore, B. C. J., & Glasberg, B. R. (1996). A revision of Zwicker's loudness model.
Acustica, 82, 335–345.
Moore, B. C. J., & Glasberg, B. R. (1997). A model of loudness perception applied to
cochlear hearing loss. Auditory Neuroscience, 3, 289–311.
Moore, B. C. J., Glasberg, B. R., & Baer, T. (1997). A model for the prediction of thresholds,
loudness, and partial loudness. Journal of the Audio Engineering Society, 45, 224–240.
Moore, B. C. J., Glasberg, B. R., & Peters, R. W. (1985). Relative dominance of individual
partials in determining the pitch of complex tones. Journal of the Acoustical Society of
America, 77, 1853–1860.
Moore, B. C. J., Glasberg, B. R., & Peters, R. W. (1986). Thresholds for hearing mistuned
partials as separate tones in harmonic complexes. Journal of the Acoustical Society of
America, 80, 479–483.
Moore, B. C. J., Glasberg, B. R., & Vickers, D. A. (1999). Further evaluation of a model of
loudness perception applied to cochlear hearing loss. Journal of the Acoustical Society
of America, 106, 898–907.
Moore, B. C. J., & Gockel, H. E. (2011). Resolvability of components in complex tones and
implications for theories of pitch perception. Hearing Research, 276, 88–97.
Moore, B. C. J., & Peters, R. W. (1992). Pitch discrimination and phase sensitivity in young
and elderly subjects and its relationship to frequency selectivity. Journal of the
Acoustical Society of America, 91, 2881–2893.
Moore, B. C. J., & Sek, A. (2009). Sensitivity of the human auditory system to temporal fine
structure at high frequencies. Journal of the Acoustical Society of America, 125,
3186–3193.
Noesselt, T., Tyll, S., Boehler, C. N., Budinger, E., Heinze, H. J., & Driver, J. (2010).
Sound-induced enhancement of low-intensity vision: Multisensory influences on human
sensory-specific cortices and thalamic bodies relate to perceptual enhancement of visual
detection sensitivity. Journal of Neuroscience, 30, 13609–13623.
Oberfeld, D. (2007). Loudness changes induced by a proximal sound: Loudness
enhancement, loudness recalibration, or both? Journal of the Acoustical Society of America,
121, 2137–2148.
Odgaard, E. C., Arieh, Y., & Marks, L. E. (2003). Cross-modal enhancement of perceived
brightness: Sensory interaction versus response bias. Perception & Psychophysics, 65,
123–132.
Odgaard, E. C., Arieh, Y., & Marks, L. E. (2004). Brighter noise: Sensory enhancement of
perceived loudness by concurrent visual stimulation. Cognitive, Affective, & Behavioral
Neuroscience, 4, 127–132.
Oxenham, A. J., Bernstein, J. G. W., & Penagos, H. (2004). Correct tonotopic representation
is necessary for complex pitch perception. Proceedings of the National Academy of
Sciences USA, 101, 1421–1425.
Oxenham, A. J., & Buus, S. (2000). Level discrimination of sinusoids as a function of
duration and level for fixed-level, roving-level, and across-frequency conditions. Journal of
the Acoustical Society of America, 107, 1605–1614.
Oxenham, A. J., Micheyl, C., Keebler, M. V., Loper, A., & Santurette, S. (2011). Pitch
perception beyond the traditional existence region of pitch. Proceedings of the National
Academy of Sciences USA, 108, 7629–7634.
Palmer, A. R., & Russell, I. J. (1986). Phase-locking in the cochlear nerve of the guinea-pig
and its relation to the receptor potential of inner hair-cells. Hearing Research, 24,
1–15.
Patterson, R. D. (1973). The effects of relative phase and the number of components on
residue pitch. Journal of the Acoustical Society of America, 53, 1565–1572.
Penagos, H., Melcher, J. R., & Oxenham, A. J. (2004). A neural representation of pitch
salience in non-primary human auditory cortex revealed with fMRI. Journal of
Neuroscience, 24, 6810–6815.
Plack, C. J. (1996). Loudness enhancement and intensity discrimination under forward and
backward masking. Journal of the Acoustical Society of America, 100, 1024–1030.
Plack, C. J., Oxenham, A. J., Popper, A. N., & Fay, R. (Eds.). (2005). Pitch: Neural coding
and perception. New York, NY: Springer Verlag.
Plomp, R., & Levelt, W. J. M. (1965). Tonal consonance and critical bandwidth. Journal of
the Acoustical Society of America, 38, 548–560.
Poulton, E. C. (1977). Quantitative subjective assessments are almost always biased,
sometimes completely misleading. British Journal of Psychology, 68, 409–425.
Poulton, E. C. (1979). Models for the biases in judging sensory magnitude. Psychological
Bulletin, 86, 777–803.
Pressnitzer, D., Patterson, R. D., & Krumbholz, K. (2001). The lower limit of melodic pitch.
Journal of the Acoustical Society of America, 109, 2074–2084.
Relkin, E. M., & Doucet, J. R. (1997). Is loudness simply proportional to the auditory nerve
spike count? Journal of the Acoustical Society of America, 101, 2735–2741.
Ritsma, R. J. (1962). Existence region of the tonal residue. I. Journal of the Acoustical
Society of America, 34, 1224–1229.
Robinson, D. W., & Dadson, R. S. (1956). A re-determination of the equal-loudness relations
for pure tones. British Journal of Applied Physics, 7, 166–181.
Rose, J. E., Brugge, J. F., Anderson, D. J., & Hind, J. E. (1967). Phase-locked response to
low-frequency tones in single auditory nerve fibers of the squirrel monkey. Journal of
Neurophysiology, 30, 769–793.
Scharf, B. (1964). Partial masking. Acustica, 14, 16–23.
Scharf, B., Buus, S., & Nieder, B. (2002). Loudness enhancement: Induced loudness
reduction in disguise? (L). Journal of the Acoustical Society of America, 112, 807–810.
Schouten, J. F. (1940). The residue and the mechanism of hearing. Proceedings of the
Koninklijke Nederlandse Academie van Wetenschappen, 43, 991–999.
Schouten, J. F., Ritsma, R. J., & Cardozo, B. L. (1962). Pitch of the residue. Journal of the
Acoustical Society of America, 34, 1418–1424.
Schutz, M., & Kubovy, M. (2009). Causality and cross-modal integration. Journal of
Experimental Psychology: Human Perception and Performance, 35, 1791–1810.
Schutz, M., & Lipscomb, S. (2007). Hearing gestures, seeing music: Vision influences
perceived tone duration. Perception, 36, 888–897.
Seebeck, A. (1841). Beobachtungen über einige Bedingungen der Entstehung von Tönen.
Annals of Physical Chemistry, 53, 417–436.
Shackleton, T. M., & Carlyon, R. P. (1994). The role of resolved and unresolved harmonics
in pitch perception and frequency modulation discrimination. Journal of the Acoustical
Society of America, 95, 3529–3540.
Shamma, S., & Klein, D. (2000). The case of the missing pitch templates: How harmonic
templates emerge in the early auditory system. Journal of the Acoustical Society of
America, 107, 2631–2644.
Shinn-Cunningham, B. G., Lee, A. K., & Oxenham, A. J. (2007). A sound element gets lost
in perceptual competition. Proceedings of the National Academy of Sciences USA, 104,
12223–12227.
Shofner, W. P. (2005). Comparative aspects of pitch perception. In C. J. Plack, A. J. Oxenham,
R. Fay, & A. N. Popper (Eds.), Pitch: Neural coding and perception (pp. 56–98). New
York, NY: Springer Verlag.
Stein, B. E., London, N., Wilkinson, L. K., & Price, D. D. (1996). Enhancement of perceived
visual intensity by auditory stimuli: A psychophysical analysis. Journal of Cognitive
Neuroscience, 8, 497–506.
Stevens, S. S. (1957). On the psychophysical law. Psychological Review, 64, 153–181.
Suzuki, Y., & Takeshima, H. (2004). Equal-loudness-level contours for pure tones. Journal
of the Acoustical Society of America, 116, 918–933.
Terhardt, E. (1974a). On the perception of periodic sound fluctuations (roughness). Acustica,
30, 201–213.
Terhardt, E. (1974b). Pitch, consonance, and harmony. Journal of the Acoustical Society of
America, 55, 1061–1069.
Terhardt, E. (1976). Psychoakustisch begründetes Konzept der musikalischen Konsonanz.
Acustica, 36, 121–137.
Terhardt, E. (1984). The concept of musical consonance, a link between music and
psychoacoustics. Music Perception, 1, 276–295.
Trainor, L. J., & Heinmiller, B. M. (1998). The development of evaluative responses to
music: Infants prefer to listen to consonance over dissonance. Infant Behavior and
Development, 21, 77–88.
Tramo, M. J., Cariani, P. A., Delgutte, B., & Braida, L. D. (2001). Neurobiological
foundations for the theory of harmony in western tonal music. Annals of the New York
Academy of Sciences, 930, 92–116.
van de Par, S., & Kohlrausch, A. (1997). A new approach to comparing binaural masking
level differences at low and high frequencies. Journal of the Acoustical Society of
America, 101, 1671–1680.
Verschuure, J., & van Meeteren, A. A. (1975). The effect of intensity on pitch. Acustica, 32,
33–44.
Viemeister, N. F. (1983). Auditory intensity discrimination at high frequencies in the
presence of noise. Science, 221, 1206–1208.
Viemeister, N. F., & Bacon, S. P. (1988). Intensity discrimination, increment detection, and
magnitude estimation for 1-kHz tones. Journal of the Acoustical Society of America, 84,
172–178.
Wallace, M. N., Rutkowski, R. G., Shackleton, T. M., & Palmer, A. R. (2000). Phase-locked
responses to pure tones in guinea pig auditory cortex. Neuroreport, 11, 3989–3993.
Warren, R. M. (1970). Elimination of biases in loudness judgements for tones. Journal of the
Acoustical Society of America, 48, 1397–1403.
Wightman, F. L. (1973). The pattern-transformation model of pitch. Journal of the
Acoustical Society of America, 54, 407–416.
Winckel, F. W. (1962). Optimum acoustic criteria of concert halls for the performance of
classical music. Journal of the Acoustical Society of America, 34, 81–86.
Winter, I. M. (2005). The neurophysiology of pitch. In C. J. Plack, A. J. Oxenham, R. Fay,
& A. N. Popper (Eds.), Pitch: Neural coding and perception (pp. 99–146). New York,
NY: Springer Verlag.
Winter, I. M., Wiegrebe, L., & Patterson, R. D. (2001). The temporal representation of the
delay of iterated rippled noise in the ventral cochlear nucleus of the guinea-pig. Journal
of Physiology, 537, 553–566.
Zentner, M. R., & Kagan, J. (1996). Perception of music by infants. Nature, 383, 29.
Zentner, M. R., & Kagan, J. (1998). Infants' perception of consonance and dissonance in
music. Infant Behavior and Development, 21, 483–492.
Zwicker, E. (1960). Ein Verfahren zur Berechnung der Lautstärke. Acustica, 10, 304–308.
Zwicker, E. (1963). Über die Lautheit von ungedrosselten und gedrosselten Schallen.
Acustica, 13, 194–211.
Zwicker, E., Fastl, H., & Dallmayr, C. (1984). BASIC-Program for calculating the loudness
of sounds from their 1/3-oct. band spectra according to ISO 532 B. Acustica, 55, 63–67.