Air Operations Division, Aeronautical and Maritime Research Laboratory, Melbourne, 3001, Australia
AND
MELIS A. SENOVA
Virtual audio has great potential for conveying spatial information and could be applied
to advantage in several environments. Previously implemented virtual audio systems,
however, have been shown to be less than perfect with respect to front-back confusion
rate and average localization error. A system from this laboratory has been evaluated by
comparing, for three participants, virtual and free-field localization performance across
a wide range of sound-source locations. For each participant, virtual localization was
found to be as good as free-field localization, as measured by both front-back confusion
rate and average localization error. The feasibility of achieving free-field equivalent
localization of virtual audio should encourage the more widespread use of this relatively
new technology.
0 INTRODUCTION
Three-dimensional audio displays have great potential
for conveying spatial information and enhancing virtual
environments. They could be applied to advantage in
domains as diverse as home entertainment and the military. The ability to synthesize sound that is heard in
three-dimensional space also provides the auditory scientist with a powerful tool for examining the cues and
mechanisms involved in sound localization.
High-fidelity virtual audio can be generated by reproducing the at-eardrum signals associated with natural,
free-field sound presentation. This can be achieved by
measuring the way an individual's head and ears filter
sound presented from different directions and then constructing a set of digital filters that modify sound as the
head or ears would. Typically, measurements are made
using small microphones placed within the individual's
ear canals or coupled to them via probe tubes. Filters
constructed from these measurements can be convolved
with any sound to impart directionality to it (see, for
example, [1]).
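The filtering step described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `spatialize` and the toy impulse responses are assumptions made here, and measured head-related impulse responses would be used in practice.

```python
import numpy as np


def spatialize(mono, hrir_left, hrir_right):
    """Filter a mono signal with a left/right pair of impulse responses.

    Returns an array of shape (len(mono) + len(hrir_left) - 1, 2),
    with the left-ear signal in column 0 and the right-ear signal in column 1.
    """
    if len(hrir_left) != len(hrir_right):
        raise ValueError("impulse responses for the two ears must have equal length")
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=1)


# Made-up impulse responses (assumption): the right-ear response is delayed
# and attenuated, as it might be for a source on the listener's left.
hrir_left = np.array([1.0, 0.5, 0.0, 0.0])
hrir_right = np.array([0.0, 0.0, 0.6, 0.3])

# A unit click as the input sound.
click = np.zeros(8)
click[0] = 1.0

binaural = spatialize(click, hrir_left, hrir_right)
```

Because the input is a unit click, each output channel simply reproduces the corresponding impulse response, which makes the sketch easy to check by eye.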
Implementations of this technique have produced im
* Manuscript received 2000 March 1; revised 2000 September 28.
PAPERS
1 METHOD
1.1 Participants
1.2 Design
The localization of free-field and virtual sound was
compared on a participant-by-participant basis using a
randomized-block design. Each participant took part in
16 experimental sessions, eight in which localization of
free-field sound was tested and eight in which localization of virtual sound was tested. Each session comprised 42 localization trials. The localization performance of each participant, therefore, was compared across
a total of 336 free-field and 336 virtual trials.
All participants were allowed to practice free-field
and virtual localization during several training sessions
prior to the experimental phase of this study. The main
purpose of these sessions was to ensure that participants
had comparable training for free-field and virtual localization. The procedures followed during these training
sessions were identical to those followed during the subsequent experimental sessions. The performance of each
participant was observed to stabilize during the training
period. The levels at which performance stabilized in
the case of free-field sound indicated that all participants
were proficient at sound localization.
MARTIN ET AL
response was truncated to 256 points. The impulse responses of the two miniature microphones were determined together with those of the headphones (Sennheiser
HD520 II) that were subsequently used to present virtual
sound. These responses were determined immediately
following measurement of the HRIRs by playing Golay
codes through the headphones and sampling the responses of the microphones. Care was taken not to move
the microphones when the headphones were donned.
These impulse responses were truncated to 128 points.
The responses of the loudspeaker, microphones, and
headphones were deconvolved from the HRIRs by division in the frequency domain. The resulting corrected
HRIRs were truncated to 1024 points to accommodate
ringing in the inverse headphone responses.
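Deconvolution by division in the frequency domain can be sketched as below. This is an illustration under stated assumptions, not the procedure used in the study: the function name `deconvolve`, the toy responses, and the FFT length are hypothetical, and the plain division assumes the system spectrum has no zeros (measured data may need regularization).

```python
import numpy as np


def deconvolve(measured, system, n_fft, n_keep):
    """Remove `system` from `measured` by spectral division.

    n_fft must be at least len(measured) so that circular and linear
    convolution agree; the result is truncated to n_keep points.
    """
    spectrum = np.fft.rfft(measured, n_fft) / np.fft.rfft(system, n_fft)
    return np.fft.irfft(spectrum, n_fft)[:n_keep]


# Made-up responses (assumption): a short "true" response and the response
# of the measurement chain that has been convolved with it.
true_response = np.array([1.0, 0.5, -0.2, 0.1])
system = np.array([1.0, 0.3, 0.1])
measured = np.convolve(true_response, system)  # 6 points

recovered = deconvolve(measured, system, n_fft=8, n_keep=4)
```

In this toy case the recovered response matches `true_response` to within floating-point error, because the system spectrum is bounded away from zero.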
The magnitude transfer functions of the loudspeaker
and left microphone are shown together with representative headphone and head-related magnitude transfer
functions in Fig. 1. The headphone transfer function is
that appropriate for the left ear of participant R. M. The
head-related transfer function is that appropriate for the
left ear of participant R. M. for the 0° azimuth and
elevation location. It has been corrected for the loudspeaker
and microphone responses but not for the headphone response.
[Fig. 1 plots not reproduced: panels (a)-(d); horizontal axis Frequency (kHz).]
Fig. 1. Magnitude transfer functions. Headphone transfer function is appropriate for left ear of participant R. M. Head-related
transfer function is appropriate for left ear of participant R. M. for the 0° azimuth and elevation location. It has been corrected
for loudspeaker and microphone responses but not for headphone response.
2 RESULTS
Average free-field and virtual localization errors for
each participant are plotted in Fig. 2. Following Carlile
et al. [5], trials in which a front-back confusion occurred were excluded before these averages were calculated. A localization was regarded as a front-back confusion if two conditions were met. The first was that
both the true and the perceived locations of the sound
source not fall within a narrow exclusion zone symmetrical about the vertical plane dividing the front and back
hemispheres of the hoop. The width of this exclusion
zone, in degrees of azimuth, was adjusted as a function
of elevation to allow for the convergence of lines of
equal azimuth at the coordinate system's poles. At 0°
elevation it was set at 15°, and at all elevations it was
equal to 15° divided by the elevation's cosine. The second condition was that the true and perceived locations
of the sound source be in different front-back hemifields.
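The two-condition rule can be sketched as follows. Several details are assumptions made for this illustration and are not stated in the text: azimuth is taken as 0° straight ahead and 180° directly behind, so the front-back dividing plane lies at ±90°; the 15° figure is treated as the total width of the zone, giving 7.5° on either side of the plane at 0° elevation; and the function names are hypothetical.

```python
import math


def abs_azimuth(az_deg):
    """Fold any azimuth into the range 0..180 degrees (0 = front, 180 = back)."""
    return abs((az_deg + 180.0) % 360.0 - 180.0)


def in_exclusion_zone(az_deg, el_deg, width_at_zero_elevation=15.0):
    """True if a location lies inside the zone straddling the front-back plane."""
    cos_el = math.cos(math.radians(el_deg))
    if cos_el < 1e-9:
        return True  # at a pole every azimuth lies on the dividing plane
    width = width_at_zero_elevation / cos_el  # zone widens toward the poles
    return abs(abs_azimuth(az_deg) - 90.0) < width / 2.0  # assumed: total width


def is_front_back_confusion(true_az, true_el, resp_az, resp_el):
    """Apply both conditions: outside the zone, and in different hemifields."""
    if in_exclusion_zone(true_az, true_el) or in_exclusion_zone(resp_az, resp_el):
        return False
    true_in_front = abs_azimuth(true_az) < 90.0
    resp_in_front = abs_azimuth(resp_az) < 90.0
    return true_in_front != resp_in_front


# A frontal source heard behind counts; a source near the dividing plane does not.
example_confusion = is_front_back_confusion(30.0, 0.0, 150.0, 0.0)
example_near_plane = is_front_back_confusion(85.0, 0.0, 150.0, 0.0)
```

Under these assumptions, a response is scored as a confusion only when both locations lie clearly on one side of the dividing plane, so small errors near the plane are not counted as confusions.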
Average free-field errors ranged from 8.8 to 11.0°,
reflecting the high level of aptitude of these participants
for the localization task. Average virtual errors ranged
from 9.6 to 9.7°. For one participant (R. M.) the average
error was slightly smaller for the virtual stimulus, and
for the two others (K. S. and M. S.) it was slightly
smaller for the free field. Each of the error bars shown
in Fig. 2 represents one standard error of the mean
of the average localization errors for the eight sessions
in which the relevant stimulus was tested. The small
magnitudes of these standard errors indicate that the
performance of all participants was highly reproducible.
[Fig. 2 plot not reproduced: legend Free-field, Virtual; horizontal axis Subject (R. M., K. S., M. S.).]
Each participant's average errors for the free-field and
the virtual stimuli were compared by performing a
repeated-measures ANOVA on the average errors for
the individual sessions in which the two types of stimuli
were tested. For R. M. the average error was significantly smaller for the virtual stimulus [F(1, 7) = 58.29,
p < 0.01]. For K. S. and M. S. average errors did not
differ significantly [K. S., F(1, 7) = 2.47, p = 0.16;
M. S., F(1, 7) = 0.76, p = 0.41]. Power analyses
following the procedures outlined by Keppel [14] revealed the presence of sufficient power (≥ 0.8) to reliably detect effects of 1.31° and 1.03° for K. S. and M. S.,
respectively. We can be reasonably confident, therefore,
that free-field and virtual errors for these participants
differ by no more than 1-1.3°.
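For a factor with only two levels, a repeated-measures ANOVA reduces to the square of a paired t statistic, with 1 and n - 1 degrees of freedom. The sketch below illustrates that relationship for eight paired sessions. The per-session errors are made-up numbers, not data from this study, and the pairing of free-field with virtual sessions is an assumption of the illustration.

```python
import math
from statistics import mean, stdev


def two_level_rm_anova(cond_a, cond_b):
    """F statistic and degrees of freedom for a two-level repeated-measures design.

    Equivalent to the squared paired t statistic on the differences.
    """
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t * t, (1, n - 1)


# Hypothetical average errors (degrees) for eight paired sessions.
free_field = [11.0, 12.0, 11.0, 12.0, 11.0, 12.0, 11.0, 12.0]
virtual = [10.0] * 8

f_value, dof = two_level_rm_anova(free_field, virtual)
```

With eight sessions per condition the degrees of freedom are (1, 7), the same as those of the F ratios reported above.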
The number of front-back confusions made by each
participant for free-field and virtual stimuli is shown in
Table 1. For R. M. the numbers of free-field and virtual
front-back confusions were identical. For K. S. the
front-back confusion rate was a little higher for the
virtual stimulus and for M. S. it was a little higher
for the free field. A chi-square test for each participant
indicated that the number of front-back confusions for
the two types of stimuli did not differ significantly
(R. M., χ²(1) = 0, p = 1; K. S., χ²(1) = 0.24, p =
0.62; M. S., χ²(1) = 0.45, p = 0.50).
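A chi-square test on a 2 x 2 table of confusions and non-confusions can be sketched as below, using the counts read from Table 1 and 336 trials per stimulus type. The Yates continuity correction is an assumption of this sketch, as the text does not say whether a correction was applied. The function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square statistic and p value (1 degree of freedom) for [[a, b], [c, d]]."""
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    n = a + b + c + d
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            deviation = abs(observed[i][j] - expected)
            if yates:
                # Continuity correction (assumed), clamped so it cannot go negative.
                deviation = max(0.0, deviation - 0.5)
            chi2 += deviation ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


TRIALS = 336  # localization trials per stimulus type for each participant

# (free-field confusions, virtual confusions) as read from Table 1
counts = {"R. M.": (16, 16), "K. S.": (7, 10), "M. S.": (6, 3)}

results = {
    name: chi_square_2x2(free, TRIALS - free, virt, TRIALS - virt)
    for name, (free, virt) in counts.items()
}
```

With the correction applied, the statistics for K. S. and M. S. come out near 0.24 and 0.45, in line with the values reported above. Without it they would be larger.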
As visual feedback concerning the location of the
stimulus was provided during all trials in this study, it
is possible that participants learned to localize virtual
stimuli as accurately as free-field stimuli during the
study's training phase. Evidence arguing against this
possibility is provided by the session-by-session performance of individual participants for free-field and
virtual stimuli throughout the training phase (Fig. 3).
Fig. 3(a) shows the average localization errors of individual participants for most (R. M.) or all (K. S. and
M. S.) of the training sessions. (R. M. took part in a
further three free-field and 17 virtual sessions during
which additional pilot data were collected.) Sessions in
which free-field stimuli were presented are represented
by filled circles. Those in which virtual stimuli were
presented are represented by open circles. For each par

Table 1. Total number of front-back confusions made by each
participant for free-field and virtual stimuli. Numbers in
parentheses show totals as percentages of all localizations.

Participant   Free Field    Virtual
R. M.         16 (4.8%)     16 (4.8%)
K. S.          7 (2.1%)     10 (3.0%)
M. S.          6 (1.8%)      3 (0.9%)
3 DISCUSSION
Evaluations of previously implemented virtual audio
systems have shown them to be less than perfect with
respect to front-back confusion rate and average localization
[Fig. 3 plots not reproduced: panels for R. M., K. S., and M. S.; horizontal axis Session.]
Fig. 3. (a) Average localization errors and (b) numbers of front-back confusions for individual participants for most (R. M.)
or all (K. S. and M. S.) training sessions. ●: free-field sessions; ○: virtual sessions. For each participant, two groups of virtual
sessions were distinguished. The particular group to which a given virtual session belongs is indicated by the continuous solid
line that links the session to all other sessions in the group.
J Audio Eng Soc., Vol 49, No. 1/2, 2001 January/February
4 CONCLUSION
This paper has demonstrated the feasibility of achieving free-field equivalent localization of virtual audio.
This demonstration should encourage the more widespread application of three-dimensional audio technology in environments in which accurate spatial information needs to be conveyed. Environments of this kind
include commercial and military aircraft cockpits, teleoperator stations, entertainment and training facilities,
and auditory science laboratories (see [16]-[18] for
reviews).
5 ACKNOWLEDGMENT
The authors thank Gavan Lintern and two anonymous
reviewers for providing comments on previous versions
of this manuscript.
6 REFERENCES
[1] F. L. Wightman and D. J. Kistler, "Headphone
Simulation of Free-Field Listening: I. Stimulus Synthesis," J. Acoust. Soc. Am., vol. 85, pp. 858-867
(1989 Feb.).
[2] F. L. Wightman and D. J. Kistler, "Headphone
Simulation of Free-Field Listening: II. Psychophysical
Validation," J. Acoust. Soc. Am., vol. 85, pp. 868-878
(1989 Feb.).
[3] R. L. McKinley, M. A. Ericson, and W. R.
D'Angelo, "3-Dimensional Auditory Displays: Development, Applications, and Performance," Av. Space Environ. Med., vol. 65, pp. A31-38 (1994 May).
[4] A. W. Bronkhorst, "Localization of Real and Virtual Sound Sources," J. Acoust. Soc. Am., vol. 98, pp.
2542-2553 (1995 Nov.).
[5] S. Carlile, P. Leong, D. Pralong, R. Boden, and
S. Hyams, "High Fidelity Virtual Auditory Space: An
Operational Definition," in Proc. Simtect 96, S. Sestito,
P. Beckett, G. Tudor, and T. Triggs, Eds. (Simtect
Organising Committee, Melbourne, 1996), pp. 79-84.
[6] S. R. Oldfield and S. P. A. Parker, "Acuity of
THE AUTHORS
R. L. Martin
K. I. McAnally
M. A. Senova