

Free-Field Equivalent Localization of Virtual Audio*


RUSSELL L. MARTIN AND KEN I. MCANALLY

Air Operations Division, Aeronautical and Maritime Research Laboratory, Melbourne, 3001, Australia
AND
MELIS A. SENOVA

Department of Biophysical Sciences and Electrical Engineering, Swinburne University of Technology, Hawthorn, 3122, Australia

* Manuscript received 2000 March 1; revised 2000 September 28.

Virtual audio has great potential for conveying spatial information and could be applied
to advantage in several environments. Previously implemented virtual audio systems,
however, have been shown to be less than perfect with respect to front-back confusion
rate and average localization error. A system from this laboratory has been evaluated by
comparing, for three participants, virtual and free-field localization performance across
a wide range of sound-source locations. For each participant, virtual localization was
found to be as good as free-field localization, as measured by both front-back confusion
rate and average localization error. The feasibility of achieving free-field equivalent
localization of virtual audio should encourage the more widespread use of this relatively
new technology.

0 INTRODUCTION
Three-dimensional audio displays have great potential
for conveying spatial information and enhancing virtual
environments. They could be applied to advantage in
domains as diverse as home entertainment and the military. The ability to synthesize sound that is heard in
three-dimensional space also provides the auditory scientist with a powerful tool for examining the cues and
mechanisms involved in sound localization.
High-fidelity virtual audio can be generated by reproducing the at-eardrum signals associated with natural,
free-field sound presentation. This can be achieved by
measuring the way an individual's head and ears filter
sound presented from different directions and then constructing a set of digital filters that modify sound as the
head or ears would. Typically, measurements are made
using small microphones placed within the individual's
ear canals or coupled to them via probe tubes. Filters
constructed from these measurements can be convolved
with any sound to impart directionality to it (see, for
example, [1]).
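As a concrete illustration of this filtering step, the following sketch convolves a monophonic signal with the left- and right-ear HRIRs measured for one source direction to yield a two-channel (binaural) signal. It is a minimal example rather than the authors' implementation; the function and variable names are illustrative.

    import numpy as np
    from scipy.signal import fftconvolve

    def spatialize(mono, hrir_left, hrir_right):
        """Impart directionality to a monophonic signal by convolving it
        with the pair of head-related impulse responses measured for a
        single source direction."""
        left = fftconvolve(mono, hrir_left)
        right = fftconvolve(mono, hrir_right)
        return np.stack([left, right], axis=1)  # columns: left- and right-ear signals

    # Hypothetical usage, assuming hrirs[(azimuth, elevation)] holds a (left, right) pair:
    # binaural = spatialize(noise_burst, *hrirs[(30, 0)])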
Implementations of this technique have produced impressive results. Wightman and Kistler [2], McKinley


et al. [3], Bronkhorst [4], and Carlile et al. [5] have
described systems capable of generating virtual sound
that can be localized with a high degree of accuracy.
Where thoroughly evaluated, however, the fidelity of
these systems has been found to be less than perfect [2],
[4], [5]. Localization of both free-field and virtual sound
is commonly affected by a phenomenon known as
front-back confusion, where the source of a sound is
perceived to be in the incorrect hemifield (front or back).
Confusion of this type, which can result in particularly
large localization errors, occurs on about 5% of trials
in which free-field sound is localized [6], [7]. Where
virtual and free-field localization have been compared
under conditions in which cues associated with movement of the head were minimized or excluded, the front-back confusion rate for virtual sound has been found to be double that for the free field [2], [4], [5]. This implies that the cues used by listeners to distinguish front and back have been reproduced with insufficient fidelity. Even when front-back confusions were corrected for or excluded from the analysis, average localization errors for virtual sound were found to be 1-2° greater than those for the free field [2], [5]. This was also the case where sound-source locations were restricted to the frontal hemifield and front-back confusions could be resolved by head movement [4]. The presence of these


shortcomings is widely recognized (see, for example, [8]) and is an impediment to the application of three-dimensional audio technology where accurate reproduction of auditory space is required.
In this paper a virtual audio system is described for
which localization performance is as good as that for
free-field sound with respect to both front-back confusion rate and average localization error. The feasibility
of achieving free-field equivalent localization of virtual
audio, as demonstrated here, should encourage the more
widespread use of this relatively new technology.

1 METHOD
1.1 Participants

Three volunteers (one male aged 41 years and two


females aged 22 and 25 years) participated in this study.
Two were coauthors of this paper. Each participant's
hearing was tested by measuring his or her absolute
thresholds for 1-, 2-, 4-, 8-, 10-, 12-, 14-, and 16-kHz
tones using a two-interval forced-choice task combined
with the two-down one-up adaptive procedure (see [9]
for details). For each participant all thresholds were
lower than the relevant age-specific norm [10], [11].
Two participants had taken part in several previous studies involving the localization of free-field sound. Informed consent was obtained from all.

1.2 Design
The localization of free-field and virtual sound was
compared on a participant-by-participant basis using a
randomized-block design. Each participant took part in
16 experimental sessions, eight in which localization of
free-field sound was tested and eight in which localization of virtual sound was tested. Each session comprised 42 localization trials. The localization performance of each participant, therefore, was compared across
a total of 336 free-field and 336 virtual trials.
All participants were allowed to practice free-field
and virtual localization during several training sessions
prior to the experimental phase of this study. The main
purpose of these sessions was to ensure that participants
had comparable training for free-field and virtual localization. The procedures followed during these training
sessions were identical to those followed during the subsequent experimental sessions. The performance of each
participant was observed to stabilize during the training
period. The levels at which performance stabilized in
the case of free-field sound indicated that all participants
were proficient at sound localization.

1.3 Measurement of Head-Related Impulse Responses (HRIRs)
A set of HRIRs comprising responses for 354 locations
was generated for each participant using a Golay code
stimulus [12] and a "blocked-ear-canal" measurement
technique [13]. Miniature microphones (Sennheiser KE4-211-2) encased in foam ear protectors (Earlink, Cabot Safety Corp.) or swimmer's ear putty (Antinois, Panmedica) were placed in the participants' left and right


ear canals. Ear putty was used in place of ear protectors
in later measurement sessions, as it was thought to be
likely to ensure greater microphone stability. The use of
ear putty also facilitates microphone insertion into the
ear canal. Care was taken to ensure that the microphones
were positionally stable and their diaphragms were at
least 1 mm inside the entrance of the ear canal.
The participant was seated in a 3- by 3-m sound-attenuated anechoic chamber at the center of a 1-m-radius hoop on which a loudspeaker (Bose FreeSpace
tweeter) was mounted. Background noise levels within
this chamber were less than 10 dB (sound-pressure level)
in all one-third-octave bands with center frequencies
from 0.5 to 16.0 kHz. The internal walls and the ceiling
of the chamber were lined with material having an absorption coefficient greater than or equal to 0.99 down
to 800 Hz.
Prior to the measurement of each pair of HRIRs the
participant placed his or her chin on a rest that helped
to position the head at the center of rotation of the hoop.
Head position was tracked magnetically via a receiver (3
Space Fastrak, Polhemus) attached to a plastic headband
worn by the participant. The head's position relative to
the center of the hoop was displayed on a bank of light-emitting diodes (LEDs) mounted within the participant's
field of view. Measurements were not made unless the
participant's head was positioned within 10 mm of the
hoop center (with respect to the x, y, and z axes) and
oriented within 1° of straight and level on two successive
readings of the head tracker separated by an interval of
500 ms.
HRIRs were measured at elevations ranging from
-40° to +70° in steps of 10°. For each elevation the number of locations was set such that the vectors extending from the center of the hoop to adjacent locations subtended an angle of approximately 10°. This resulted in measurements being made for between 12 and 36 locations (or azimuths) at any given elevation. The convention of measuring azimuth in the clockwise direction and describing elevations below the interaural horizontal plane as negative was followed. For each location, two 8192-point Golay codes were generated at a rate of 50 kHz (Tucker-Davis Technologies System II), amplified (Hafler Pro 1200), and played at 75 dB (A-weighted)
through the hoop-mounted loudspeaker. The signal from
each microphone was low-pass filtered at 20 kHz and
sampled at 50 kHz (Tucker-Davis Technologies System
II) for 327.7 ms following initiation of the Golay codes.
This sampling period was more than sufficient to allow
for the acoustic travel time and any ringing in the response. Impulse responses were derived from each sampled signal [12], truncated to 512 points, and stored.
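The Golay technique of [12] uses a complementary pair of codes whose autocorrelations sum to an impulse, so the system's impulse response can be recovered by cross-correlating the recorded response to each code with that code and summing. A minimal sketch of that recovery step is given below; the 8192-point code length follows the text, while the remaining details are illustrative assumptions.

    import numpy as np

    def golay_pair(order):
        """Complementary Golay codes of length 2**order."""
        a, b = np.array([1.0]), np.array([1.0])
        for _ in range(order):
            a, b = np.concatenate([a, b]), np.concatenate([a, -b])
        return a, b

    def impulse_response(recorded_a, recorded_b, a, b, n_points=512):
        """Cross-correlate each recorded response with its own code and sum;
        the complementary autocorrelation sidelobes cancel, leaving the
        system impulse response scaled by 2 * len(a)."""
        n = len(a)
        ha = np.correlate(recorded_a, a, mode="full")[n - 1:]
        hb = np.correlate(recorded_b, b, mode="full")[n - 1:]
        return ((ha + hb) / (2 * n))[:n_points]

    a, b = golay_pair(13)  # 8192-point codes, as described above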
As the HRIRs determined following this procedure
were contaminated by the transfer functions of the loudspeaker and microphones, their impulse responses were
also determined. The impulse response of the loudspeaker was derived from its response to the Golay code
stimulus as measured using a microphone with a flat frequency response (Brüel and Kjær 4003). This impulse
response was truncated to 256 points. The impulse responses of the two miniature microphones were determined together with those of the headphones (Sennheiser
HD520 II) that were subsequently used to present virtual
sound. These responses were determined immediately
following measurement of the HRIRs by playing Golay
codes through the headphones and sampling the responses of the microphones. Care was taken not to move
the microphones when the headphones were donned.
These impulse responses were truncated to 128 points.
The responses of the loudspeaker, microphones, and
headphones were deconvolved from the HRIRs by division in the frequency domain. The resulting corrected
HRIRs were truncated to 1024 points to accommodate
ringing in the inverse headphone responses.
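A minimal sketch of this correction step is shown below, with the equipment response removed by division in the frequency domain. The FFT length and the small regularization term guarding near-zero spectral bins are assumptions; the paper does not state how such bins were handled.

    import numpy as np

    def correct_hrir(hrir, equipment_ir, n_fft=8192, eps=1e-8, n_out=1024):
        """Deconvolve an equipment impulse response (loudspeaker, microphone,
        or headphone chain) from a measured HRIR by spectral division."""
        H = np.fft.rfft(hrir, n_fft)
        E = np.fft.rfft(equipment_ir, n_fft)
        corrected = np.fft.irfft(H * np.conj(E) / (np.abs(E) ** 2 + eps), n_fft)
        return corrected[:n_out]  # corrected HRIRs are truncated to 1024 points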
The magnitude transfer functions of the loudspeaker and left microphone are shown together with representative headphone and head-related magnitude transfer functions in Fig. 1. The headphone transfer function is that appropriate for the left ear of participant R. M. The head-related transfer function is that appropriate for the left ear of participant R. M. for the 0° azimuth and elevation location. It has been corrected for the loudspeaker and microphone responses but not for the headphone response.

1.4 Localization Procedure


The participant was seated on a swiveling chair at the
center of the loudspeaker hoop in the same anechoic
chamber in which his or her HRIRs had been measured.
The participant's view of the hoop and loudspeaker was
obscured by an acoustically transparent cloth sphere supported by thin fiberglass rods. The inside of the sphere
was dimly lit to allow visual orientation. Participants
wore the same headband they had worn during HRIR
measurement. The same magnetic-tracker receiver used
during HRIR measurement and a laser pointer were
mounted on this headband in such a way that their relative position and orientation were constant. For virtual
trials, the headphones for which transfer functions had
been measured during the HRIR measurement procedure
were also worn.
At the beginning of each trial the participant placed
his or her chin on the rest and fixated on an LED at 0° azimuth and elevation. When ready, he or she pressed a handheld button.

Fig. 1. Magnitude transfer functions, plotted as magnitude (dB) against frequency (kHz): (a) loudspeaker, (b) microphone, (c) headphone, (d) head-related. The headphone transfer function is that appropriate for the left ear of participant R. M. The head-related transfer function is that appropriate for the left ear of participant R. M. for the 0° azimuth and elevation location. It has been corrected for loudspeaker and microphone responses but not for headphone response.
An acoustic stimulus was then presented, provided the participant's head was stationary (its azimuth, elevation, and roll did not vary by more than 0.2° over three successive readings of the head tracker made at 20-ms intervals), positioned within 50 mm of the hoop center, and oriented within 5° of straight and level.
For both free-field and virtual trials each stimulus
consisted of an independent sample of Gaussian noise.
This sample was 328 ms in duration and incorporated
20-ms cosine-shaped rise and fall times. For free-field
trials, each sample was filtered with the inverse of the
transfer function of the loudspeaker, low-pass filtered
at 20 kHz, and presented through the loudspeaker at 60
dB (A-weighted). For virtual trials, each sample was
convolved with the location-appropriate pair of corrected HRIRs derived from measurements for the particular participant, low-pass filtered at 20 kHz, and presented via the headphones at 60 dB (A-weighted).
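A sketch of this stimulus construction follows. The raised-cosine shape of the onset and offset ramps is an assumption; the text specifies only 20-ms cosine-shaped rise and fall times, and the 50-kHz rate is carried over from the HRIR measurements.

    import numpy as np

    FS = 50_000  # Hz, assumed playback rate

    def noise_burst(duration_s=0.328, ramp_s=0.020, fs=FS, rng=None):
        """Independent Gaussian-noise burst with raised-cosine onset and
        offset ramps, as used for both free-field and virtual trials."""
        rng = rng if rng is not None else np.random.default_rng()
        n = int(round(duration_s * fs))
        burst = rng.standard_normal(n)
        n_ramp = int(round(ramp_s * fs))
        ramp = 0.5 * (1.0 - np.cos(np.pi * np.arange(n_ramp) / n_ramp))
        burst[:n_ramp] *= ramp
        burst[-n_ramp:] *= ramp[::-1]
        return burst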
Participants were instructed to keep their heads stationary during stimulus presentation. Following stimulus
presentation, the participant turned his or her head (and
body, if necessary) to direct the laser pointer's beam at
the point on the surface of the cloth sphere from which
he or she perceived the stimulus to come. The location
and orientation of the laser pointer were measured by
the magnetic tracker, and the point where the beam intersected the sphere was calculated geometrically. An LED
attached to the center of the loudspeaker, which in the
case of virtual trials had been moved to the location
from which the stimulus should have appeared to come,
was then illuminated and the participant directed the
laser pointer's beam at the LED. The location and orientation of the laser pointer were measured again, and
the point where the beam intersected the sphere was
calculated geometrically. Prior calibration established
that the error associated with this pointing and measurement system averaged across the 354 locations for which
HRIRs had been measured was less than 2°. The unsigned angle between the two vectors extending from
the hoop's center to the perceived location of the stimulus source and the loudspeaker LED was defined as the
localization error.
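The two geometric steps in this paragraph, intersecting the pointer's beam with the response sphere and taking the unsigned angle between the perceived and true directions, can be sketched as follows. Vectors are expressed in a hoop-centered Cartesian frame; the sphere radius is illustrative.

    import numpy as np

    def beam_sphere_intersection(origin, direction, radius=1.0):
        """Point where the laser pointer's beam, starting at `origin` and heading
        along `direction`, meets a sphere of the given radius centered on the
        hoop center (the coordinate origin)."""
        d = direction / np.linalg.norm(direction)
        b = np.dot(origin, d)
        c = np.dot(origin, origin) - radius ** 2
        t = -b + np.sqrt(b * b - c)  # positive root; the origin lies inside the sphere
        return origin + t * d

    def localization_error_deg(perceived, target):
        """Unsigned angle between the vectors from the hoop center to the
        perceived and true source locations."""
        u = perceived / np.linalg.norm(perceived)
        v = target / np.linalg.norm(target)
        return np.degrees(np.arccos(np.clip(np.dot(u, v), -1.0, 1.0)))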
The stimulus location for each trial was chosen following a pseudorandom procedure from the set of 354 for
which HRIRs had been measured. (This was the case
for free-field as well as virtual trials.) The part sphere
from -47.6° to +80° elevation was divided into 42 sectors of equal area. Twelve of these sectors contained seven locations each for which HRIRs had been measured and the other sectors each contained nine. One sector was chosen randomly without replacement on each trial,
and a location within it was then chosen at random. This
procedure ensured that no particular location was chosen
more than once per session and that the spread of locations within each session was reasonably even.
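A sketch of this selection scheme is given below, assuming the 42 equal-area sectors have already been built as lists of candidate (azimuth, elevation) locations drawn from the 354 measured positions.

    import random

    def session_locations(sectors):
        """Pick one measured location from each sector, visiting the sectors in
        random order without replacement, so that no location repeats within a
        session and coverage of the sphere stays roughly even."""
        order = random.sample(range(len(sectors)), len(sectors))
        return [random.choice(sectors[i]) for i in order]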
For both free-field and virtual trials, the loudspeaker
was moved to the target location before the trial began.
Loudspeaker movement was driven by programmable
stepping motors and occurred in two steps to reduce the
likelihood of participants discerning the target location
from the duration of movement. During the first step the


loudspeaker was moved to a randomly chosen location.
This location was constrained to be at least 30° in azimuth and elevation away from the previous and next
target locations. During the second step the loudspeaker
was moved to the target location.
Free-field stimuli were calibrated using a microphone (Brüel and Kjær 4003) and a sound-level meter (Brüel and Kjær 2209). Virtual stimuli were set to the same level as free-field ones, using an acoustic mannequin incorporating a sound-level meter (Head Acoustics HMS II.3).

2 RESULTS
Average free-field and virtual localization errors for
each participant are plotted in Fig. 2. Following Carlile et al. [5], trials in which a front-back confusion occurred were excluded before these averages were calculated. A localization was regarded as a front-back confusion if two conditions were met. The first was that both the true and the perceived locations of the sound source not fall within a narrow exclusion zone symmetrical about the vertical plane dividing the front and back hemispheres of the hoop. The width of this exclusion zone, in degrees of azimuth, was adjusted as a function of elevation to allow for the convergence of lines of equal azimuth at the coordinate system's poles. At 0° elevation it was set at 15°, and at all elevations it was equal to 15° divided by the elevation's cosine. The second condition was that the true and perceived locations of the sound source be in different front-back hemifields.
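This classification rule can be stated compactly in code. The sketch below assumes azimuth is measured clockwise from straight ahead in degrees, so the plane dividing the front and back hemifields passes through ±90° azimuth; treating the stated 15° as the full width of the exclusion zone is an interpretation of the text.

    import numpy as np

    def in_exclusion_zone(az, el, width_at_horizon=15.0):
        """True if a location lies within the exclusion zone straddling the
        front-back dividing plane; the zone widens as width / cos(elevation)."""
        width = width_at_horizon / np.cos(np.radians(el))
        az = az % 360.0
        return min(abs(az - 90.0), abs(az - 270.0)) < width / 2.0

    def in_front(az):
        az = az % 360.0
        return az < 90.0 or az > 270.0

    def is_front_back_confusion(true_az, true_el, resp_az, resp_el):
        """A trial counts as a confusion only if neither location falls within
        the exclusion zone and the two locations lie in different hemifields."""
        if in_exclusion_zone(true_az, true_el) or in_exclusion_zone(resp_az, resp_el):
            return False
        return in_front(true_az) != in_front(resp_az)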
Average free-field errors ranged from 8.8° to 11.0°, reflecting the high level of aptitude of these participants for the localization task. Average virtual errors ranged from 9.6° to 9.7°.

Fig. 2. Average free-field and virtual localization errors for each participant. Error bars show one standard error.
For one participant (R. M.) the average
error was slightly smaller for the virtual stimulus, and
for the two others (K. S. and M. S.) it was slightly
smaller for the free field. Each of the error bars shown
in this figure represents one standard error of the mean
of the average localization errors for the eight sessions
in which the relevant stimulus was tested. The small
magnitudes of these standard errors indicate that the
performance of all participants was highly reproducible.
Each participant's average errors for the free-field and
the virtual stimuli were compared by performing a
repeated-measures ANOVA on the average errors for
the individual sessions in which the two types of stimuli
were tested. For R. M. the average error was significantly smaller for the virtual stimulus [F(1, 7) = 58.29, p < 0.01]. For K. S. and M. S. average errors did not differ significantly [K. S., F(1, 7) = 2.47, p = 0.16; M. S., F(1, 7) = 0.76, p = 0.41]. Power analyses following the procedures outlined by Keppel [14] revealed the presence of sufficient power (≥0.8) to reliably detect effects of 1.31° and 1.03° for K. S. and M. S., respectively. We can be reasonably confident, therefore, that free-field and virtual errors for these participants differ by no more than 1-1.3°.
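With only two stimulus conditions, each F(1, 7) above is numerically the square of a paired t statistic computed over the eight session means, so the comparison can be reproduced from per-session averages. A sketch with placeholder values follows; the paper does not report the individual session means.

    import numpy as np
    from scipy import stats

    # Hypothetical per-session average localization errors (degrees).
    free_field = np.array([9.1, 8.7, 9.4, 8.9, 9.2, 8.8, 9.0, 8.6])
    virtual = np.array([8.5, 8.2, 8.8, 8.4, 8.6, 8.3, 8.5, 8.1])

    t, p = stats.ttest_rel(free_field, virtual)
    print(f"F(1, {len(free_field) - 1}) = {t ** 2:.2f}, p = {p:.3f}")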
The number of front-back confusions made by each participant for free-field and virtual stimuli is shown in Table 1. For R. M. the numbers of free-field and virtual front-back confusions were identical. For K. S. the front-back confusion rate was a little higher for the virtual stimulus, and for M. S. it was a little higher for the free field. A chi-square test for each participant indicated that the number of front-back confusions for the two types of stimuli did not differ significantly (R. M., χ²(1) = 0, p = 1; K. S., χ²(1) = 0.24, p = 0.62; M. S., χ²(1) = 0.45, p = 0.50).
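Statistics of this size are what a 2 x 2 contingency test (confusions versus non-confusions out of the 336 trials per condition) with Yates' continuity correction yields for the counts in Table 1. Whether this is precisely the test the authors applied is an assumption; the sketch below reproduces values of the reported magnitude.

    import numpy as np
    from scipy.stats import chi2_contingency

    def compare_confusions(confusions_free_field, confusions_virtual, n_trials=336):
        """Chi-square test on a 2 x 2 table of confusions versus non-confusions
        for the free-field and virtual conditions (Yates-corrected by default)."""
        table = np.array([
            [confusions_free_field, n_trials - confusions_free_field],
            [confusions_virtual, n_trials - confusions_virtual],
        ])
        chi2, p, dof, _ = chi2_contingency(table)
        return chi2, p

    print(compare_confusions(7, 10))  # K. S.: approximately chi-square(1) = 0.24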
As visual feedback concerning the location of the
stimulus was provided during all trials in this study, it
is possible that participants learned to localize virtual
stimuli as accurately as free-field stimuli during the
study's training phase. Evidence arguing against this
possibility is provided by the session-by-session performance of individual participants for free-field and
virtual stimuli throughout the training phase (Fig. 3).
Fig. 3(a) shows the average localization errors of individual participants for most (R. M.) or all (K. S. and
M. S.) of the training sessions. (R. M. took part in a
further three free-field and 17 virtual sessions during
which additional pilot data were collected.) Sessions in
which free-field stimuli were presented are represented
by filled circles. Those in which virtual stimuli were
presented are represented by open circles.

Table 1. Total number of front-back confusions made by each participant for free-field and virtual stimuli. Numbers in parentheses show totals as percentages of all localizations.

Participant    Free Field    Virtual
R.M.           16 (4.8%)     16 (4.8%)
K.S.            7 (2.1%)     10 (3.0%)
M.S.            6 (1.8%)      3 (0.9%)

For each participant there were two groups of virtual sessions. For


R. M. these groups were differentiated by the headphones used for stimulus presentation. (During the earlier sessions a pair of Sennheiser Linear 265 headphones
were used instead of the HD520 II headphones. HRIRs
were appropriately corrected.) For K. S. and M. S. the
two groups of virtual sessions were differentiated by the
particular set of corrected HRIRs employed. (The HRIRs
of these participants were remeasured after 12 and 13
virtual sessions, respectively. HRIRs were remeasured
via microphones held in place by ear putty rather than
ear protectors. A new set of corrected HRIRs was derived for each participant and used to generate stimuli
in all subsequent virtual sessions.) For all participants,
the particular group to which a given virtual session
belongs is indicated in the figure by the continuous solid
line that links the session to all other sessions in the group.
Fig. 3(b) shows the number of front-back confusions
made by individual participants in the same training
sessions for which average localization errors are shown
in Fig. 3(a). The conventions for data presentation are
the same as those for Fig. 3(a).
R. M. was the first individual for whom HRIRs were
measured in our laboratory and provided the initial feedback bearing on the quality of our virtual audio system.
He had extensive experience at free-field localization prior to this study. The average localization error for this participant for the eight sessions comprising the first group of virtual sessions was 10.4°. This is a little smaller than the corresponding value of 10.8° for the four free-field sessions. The average localization error for this participant for the four sessions comprising the second group of virtual sessions was 9.8°. The average number of front-back confusions for this participant was 3.3 for the free-field sessions, and 3.4 and 3.8 for the first and second groups of virtual sessions, respectively. On the basis of these averages it is difficult to distinguish between this participant's performance for free-field and
virtual stimuli. There was no evidence of improvement
in R. M.'s localization performance across the first group
of virtual sessions. This participant localized virtual
stimuli as accurately as free-field ones on first exposure
to the virtual stimuli.
For K. S. both the average localization error (13.1°) and the average number of front-back confusions (2.7) for the 12 sessions comprising the first group of virtual sessions were a little greater than the corresponding values for the five free-field sessions (12.1° and 1.2, respectively). There was no indication that this participant's localization performance improved across the first group of virtual sessions. Clear improvement did occur, however, immediately following remeasurement of K. S.'s HRIRs. For the three sessions comprising the second group of virtual sessions the average localization error decreased to 11.6° and the average number of front-back confusions dropped to 0.3. Both these values are
smaller than the corresponding values for the free-field
sessions, suggesting that the localization of virtual stimuli had reached free-field equivalent levels. That this
was the case is indicated by the analysis of data collected
during the study's experimental phase, which began immediately thereafter.


The average localization error for M. S. for the 13
sessions comprising the first group of virtual sessions
was 12.6°. This is larger than the corresponding value of 10.4° for the six free-field sessions. Similarly, the average number of front-back confusions for the first
group of virtual sessions (1.5) was larger than that for
the free-field sessions (0.5). For this participant there
was some suggestion of an improvement in localization
performance across the first group of virtual sessions.

Nevertheless, performance appeared to improve further


immediately after HRIR remeasurement. For the six sessions comprising the second group of virtual sessions the average localization error fell to 10.3° and the average number of front-back confusions decreased to 1.2.

3 DISCUSSION
Evaluations of previously implemented virtual audio systems have shown them to be less than perfect with respect to front-back confusion rate and average localization error.

Fig. 3. (a) Average localization errors and (b) numbers of front-back confusions for individual participants for most (R. M.) or all (K. S. and M. S.) training sessions. Filled circles: free-field sessions; open circles: virtual sessions. For each participant, two groups of virtual sessions were distinguished. The particular group to which a given virtual session belongs is indicated by the continuous solid line that links the session to all other sessions in the group.
Wightman and Kistler [2], Bronkhorst [4], and Carlile et al. [5] reported average front-back confusion rates for virtual sound that exceeded those for the free field by a factor of about 2. For example, Wightman and Kistler [2] reported an average front-back confusion rate for the eight participants in their study of 10.9% for virtual sound and 5.6% for free-field sound. The smallest relative increase in the front-back confusion rate for virtual compared with free-field sound for any participant in their study was 1.5. The average localization error for sound generated by each of these systems was found to be 1-2° greater than that for free-field sound [2], [4], [5].
We have described a system for which localization
performance is as good as that for free-field sound. The
front-back confusion rate averaged across the three participants in our study was 2.9% for both virtual and freefield sound. The largest relative increase in the frontback confusion rate for virtual compared with free-field
sound for any participant in our study was 1.4. Furthermore, the average localization error for the three participants in our study was 9.7 ~ for both virtual and freefield sound. To our knowledge this is the first report of
such high-fidelity reproduction of auditory space. No
previous study has demonstrated free-field equivalent
localization of virtual audio for any of its participants.
It is interesting that one participant in this study
(R. M.) performed significantly better, with respect to
the average localization error, in the virtual relative to
the free-field condition. This surprising result indicates
that virtual, as opposed to free-field, stimuli provide
localization cues that are either more accurate or more
salient for this participant. It is possible that truncating
HRIRs to 512 points in the process of generating virtual stimuli removes low-amplitude echoes, which interfere with localization in the free field.
As any virtual audio system could be shown to produce free-field equivalent performance if evaluated using
a method that is sufficiently insensitive to changes in
localization error, it is important to note that the method
used here was a highly sensitive one. For a method to
be sensitive to changes in localization error, it must
incorporate an accurate measure of that parameter. As
demonstrated by Perrett and Noble [15], one factor that
can affect the accuracy with which the localization error
is measured is the distribution of target locations. Where
participants have knowledge of that distribution, the
measurement accuracy can be expected to decrease as
the number and dispersion of the target locations is reduced. For example, participants who know that target
locations are restricted to the front (or back) hemifield
will not make as many front-back confusions as they
would otherwise.
The study described here involved a large number of
widely dispersed target locations. For both free-field and
virtual stimuli, target locations were chosen from a set of
354 evenly dispersed across the part sphere that extended
from -40° to +70° elevation. For virtual stimuli, the
effective number of target locations was even greater
because the hoop-referenced location associated with
any of the 354 head-referenced locations varied slightly


from presentation to presentation as a function of the
precise location and orientation of the participant's head.
The sensitivity of this study's method was enhanced
further by having participants indicate where they perceived stimuli to originate from using a technique that
did not restrict the locations to which they could point.
Restricting the response options available to participants
could reduce the accuracy with which the localization
error is measured by, for example, forcing participants
to choose the option nearest the perceived location. By
providing participants with a laser pointer to indicate
where they perceived stimuli to originate from, we ensured that the perceived location of stimuli could be
indicated with accuracy. This enhanced the accuracy
with which perceived location and, therefore, localization error were measured.
As noted before, the provision of feedback regarding
the true stimulus location on all trials in this study raises
the possibility that participants learned to localize virtual
stimuli as accurately as free-field ones during the study's
training phase. Although this possibility cannot be dismissed in the absence of a no-feedback condition, the
performance of participants during the training phase
indicates that it is unlikely. One participant (R. M.)
demonstrated free-field equivalent localization during
the first few sessions in which virtual stimuli were presented. It is highly unlikely that this was a consequence
of location-specific training. As stimulus locations in
this study were chosen from a large set of 354, only a
small proportion of those from which a stimulus was
presented during these first few sessions are likely to
have been familiar to the participant. (The expected proportion for the first three sessions is less than 0.125.)
Another participant (K. S.) showed no sign of improvement across the first 12 virtual sessions but improved to
free-field equivalent levels immediately after her HRIRs
were remeasured. The third participant (M. S.) showed
some improvement across the first 13 virtual sessions
but, like K. S., improved further after her HRIRs were
remeasured. The first two participants' data argue
strongly against the possibility that free-field equivalent
localization of virtual stimuli was attained through learning. The third participant's data are equivocal in that
respect.
The exceptionally accurate localization of virtual
stimuli observed in this study could not have resulted
from the use of cues made available to participants
through movements of their heads. Participants in this
study were instructed to keep their heads stationary during stimulus presentation. Any head movements made
while virtual stimuli were presented could only have
reduced localization accuracy, as the location of these
stimuli was not updated with respect to head position
during the presentation. It is possible that head movements helped participants localize free-field stimuli, but
if that was the case, the free-field equivalent localization
of virtual stimuli in this study is even more impressive.
It is also unlikely that participants in this study used
information derived from the sound of the hoop motors
to localize virtual and free-field sound with equivalent


accuracy. Tests carried out following this study indicated that the localization performance of all three participants was extremely poor when the loudspeaker was
moved to a new location before every trial, but neither
a virtual nor a free-field stimulus was presented. As the
loudspeaker was moved to a randomly chosen location
before it was moved to each target location, the sound of
the motors rarely provided helpful information regarding
target location. In addition, the most recent studies in
our laboratory (presently unpublished) have shown that
participants can localize virtual sound with the same
level of accuracy in the absence of loudspeaker movement.
The acoustic stimulus used in this study, a 328-ms
burst of Gaussian noise, is similar to that used in many
previous studies of free-field sound localization (for example, [6], [7]). It is also similar to that used by Carlile
et al. [5] to evaluate their virtual audio system. Like
us, Carlile et al. [5] reported an average front-back
confusion rate for free-field sound of around 3%. Their
average localization error for free-field sound cannot be
compared with ours, as they did not report that value.
Wightman and Kistler [2] and Bronkhorst [4], on the
other hand, used stimuli that differed from ours. They
reported average front-back confusion rates and localization errors for free-field sound that are higher than
those reported here. The stimulus in Wightman and Kistler's study [2] was a train of spectrally roved noise
bursts, whereas that in Bronkhorst's [4] was a pulsed
60-component complex tone. It appears that these stimuli are more difficult to localize than a burst of Gaussian
noise. Whether the use of these difficult-to-localize stimuli contributed to the difference between virtual and freefield localization in the studies by Wightman and Kistler
[2] and by Bronkhorst [4] is unclear.
One possible reason for the superior performance of
the virtual audio system described here concerns our use
of blocked-ear-canal rather than deep-canal measurement techniques. Wightman and Kistler [1], [2], Bronkhorst [4], and Carlile et al. [5] all measured HRIRs
through probe tubes inserted deep within their participants' ear canals. The inclusion of a probe tube in a
measurement system will attenuate the acoustic signal
and therefore reduce the system's dynamic range. It is
also difficult to place a probe tube close enough to the
tympanic membrane to avoid standing-wave nulls that
could attenuate the acoustic signal even further. By measuring HRIRs directly through microphones at the entrances of our participants' blocked ear canals, as recommended by Møller et al. [13], we optimized the dynamic
range of our HRIR measurement system.
It is also possible that the high fidelity of our virtual
audio system resulted from the procedures we followed
to compensate for the filtering characteristics of our
stimulus delivery and HRIR measurement systems. As
described earlier, headphone-to-microphone transfer functions were measured for each participant following measurement of their HRIRs. Extreme care was taken when
fitting the headphones to ensure that the microphones
remained positionally stable. Microphone movement


between HRIR and headphone-to-microphone transfer
function measurement will result in inappropriate
compensation. Optimal compensation for headphone-to-microphone transfer functions was achieved by correcting HRIRs using 1024-point inverse headphone-to-microphone impulse responses.
The results of this study are important despite the
relatively small number of participants on which they
were based. The participants involved in this study are,
to date, the only individuals for whom our virtual audio
system has been evaluated. The results they provided,
which were analyzed on a participant-by-participant basis, are compelling for each participant and consistent
across all three. They provide a clear demonstration of
the feasibility of achieving free-field equivalent localization of virtual audio.

4 CONCLUSION
This paper has demonstrated the feasibility of achieving free-field equivalent localization of virtual audio.
This demonstration should encourage the more widespread application of three-dimensional audio technology in environments in which accurate spatial information needs to be conveyed. Environments of this kind
include commercial and military aircraft cockpits, teleoperator stations, entertainment and training facilities,
and auditory science laboratories (see [16]-[18] for
reviews).

5 ACKNOWLEDGMENT
The authors thank Gavan Lintern and two anonymous
reviewers for providing comments on previous versions
of this manuscript.

6 REFERENCES
[1] F. L. Wightman and D. J. Kistler, "Headphone
Simulation of Free-Field Listening: I. Stimulus Synthesis," J. Acoust. Soc. Am., vol. 85, pp. 858-867
(1989 Feb.).
[2] F. L. Wightman and D. J. Kistler, "Headphone
Simulation of Free-Field Listening: II. Psychophysical
Validation," J. Acoust. Soc. Am., vol. 85, pp. 868-878
(1989 Feb.).
[3] R. L. McKinley, M. A. Ericson, and W. R.
D'Angelo, "3-Dimensional Auditory Displays: Development, Applications, and Performance," Av. Space Environ. Med., vol. 65, pp. A31-38 (1994 May).
[4] A. W. Bronkhorst, "Localization of Real and Virtual Sound Sources," J. Acoust. Soc. Am., vol. 98, pp.
2542-2553 (1995 Nov.).
[5] S. Carlile, P. Leong, D. Pralong, R. Boden, and
S. Hyams, "High Fidelity Virtual Auditory Space: An
Operational Definition," in Proc. Simtect 96, S. Sestito,
P. Beckett, G. Tudor, and T. Triggs, Eds. (Simtect
Organising Committee, Melbourne, 1996), pp. 79-84.
[6] S. R. Oldfield and S. P. A. Parker, "Acuity of Sound Localisation: A Topography of Auditory Space. I. Normal Hearing Conditions," Perception, vol. 13, pp. 581-600 (1984 Sept.).
[7] J. C. Makous and J. C. Middlebrooks, "Two-Dimensional Sound Localization by Human Listeners,"
J. Acoust. Soc. Am., vol. 87, pp. 2188-2200 (1990
May).
[8] M. N. Semple, "Sounds in a Virtual World," Nature, vol. 396, pp. 721-724 (1998 Dec.).
[9] D. B. Watson, R. L. Martin, K. I. McAnally,
S. E. Smith, and D. L. Emonson, "Effect of Normobaric
Hypoxia on Auditory Sensitivity," Av. Space Environ.
Med., vol. 71, pp. 791-797 (2000 Aug.).
[10] J. F. Corso, "Age and Sex Differences in Pure-Tone Thresholds," Arch. Otolaryngol., vol. 77, pp. 385-405 (1963 Apr.).
[11] P. G. Stelmachowicz, K. A. Beauchaine, A.
Kalberer, and W. Jesteadt, "Normative Thresholds in
the 8- to 20-kHz Range as a Function of Age," J. Acoust.
Soc. Am., vol. 86, pp. 1384-1391 (1989 Oct.).
[12] B. Zhou, D. M. Green, and J. C. Middlebrooks,
"Characterization of External Ear Impulse Responses

Using Golay Codes," J. Acoust. Soc. Am., vol. 92, pp.


1169-1171 (1992 Aug.).
[13] H. Møller, M. F. Sørensen, D. Hammershøi, and C. B. Jensen, "Head-Related Transfer Functions
of Human Subjects," J. Audio Eng. Soc., vol. 43, pp.
300-321 (1995 May).
[14] G. Keppel, Design and Analysis : A Researcher's
Handbook (Prentice-Hall, Englewood Cliffs, NJ, 1991).
[15] S. Perrett and W. Noble, "Available Response
Choices Affect Localization of Sound," Percept. Psychophys., vol. 57, pp. 150-158 (1995 Feb.).
[16] D. P. Morgan and B. Gehring, "Applications of
Binaural Sound in the Cockpit," Speech Technol., pp.
46-51 (1990 Apr./May).
[17] B. Shinn-Cunningham and A. Kulkarni, "Recent
Developments in Virtual Auditory Space," in Virtual
Auditory Space: Generation and Applications, S. Carlile, Ed. (R. G. Landes, Austin, TX, 1996), pp.
185-243.
[18] D. R. Begault, "Virtual Acoustics, Aeronautics,
and Communications," J. Audio Eng. Soc. (Engineering
Reports), vol. 46, pp. 520-530 (1998 June).

THE AUTHORS

R. L. Martin

K. I. McAnally

Russell Martin graduated with a Ph.D. in psychology


from Monash University, Australia, in 1989. He has
held postdoctorate research positions at the Vision,
Touch and Hearing Research Centre, University of
Queensland, Australia; the Laboratory of Physiology,
Oxford University, U.K.; and the Department of Otolaryngology, University of Melbourne, Australia. In 1994
he was appointed lecturer in the School of Psychology
at Deakin University, Australia. He is currently employed as a senior research scientist in Air Operations
Division at the Defence Science and Technology Organisation of Australia. His research interests include threedimensional audio display design and application, audio
warning symbology, auditory psychophysics, and auditory neuroscience.
Ken McAnally received a Ph.D. in physiology and
pharmacology from the University of Queensland in
1990. He has held postdoctorate research positions at
the Department of Otolaryngology, University of Melbourne; the Laboratory of Psychoacoustics, University

22

M. A. Senova

of Bordeaux; and the Laboratory of Physiology, Oxford


University. He is currently employed as a senior research
scientist in the Air Operations Division of the Defence
Science and Technology Organisation of Australia. His
research interests include auditory neuroscience, psychophysics, and three-dimensional audio display design
and application.
Melis Senova graduated with a Bachelor of Applied
Science (Biophysics and Instrumentation) from Swinburne University of Technology, Australia, in 1998. She
spent one year at the Research Institute for Brain and
Blood Vessels (Noken) in Akita, Japan, conducting research using multimodal brain imaging techniques to study
the effect of nicotine on brain dynamics. She is currently
studying for a Ph.D. degree in three-dimensional audio
technology in the Air Operations Division of the Defence
Science and Technology Organisation of Australia and
Swinburne University of Technology. Her research interests include three-dimensional audio technology, speech
intelligibility, and auditory psychophysics.
