Audio Engineering Society
Convention Paper
Presented at the 130th Convention
2011 May 13–16 London, UK
The papers at this Convention have been selected on the basis of a submitted abstract and extended precis that have
been peer reviewed by at least two qualified anonymous reviewers. This convention paper has been reproduced from
the author’s advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes
no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio
Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights
reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the
Journal of the Audio Engineering Society.
Prediction of perceived width of stereo microphone setups
Hans Riekehof-Boehmer¹, Helmut Wittek²
¹ HAW-Hamburg
² SCHOEPS Mikrofone
Correspondence should be addressed to Hans Riekehof-Boehmer (h.riekehof@gmail.com)
ABSTRACT
The diffuse-field correlation of the two signals generated by a stereophonic microphone setup has an effect
on the perception of spatial width. A correlation meter is often used to measure the correlation coefficient.
However, due to the frequency dependence of the correlation function, the correlation coefficient is not an
appropriate value for predicting the perceived width when it comes to time-delay stereophony.
By using the newly defined “Diffuse-Field Image Predictor” (DFI Predictor) presented in this paper, an
attempt is made to reliably predict perceived width. Listening tests show that the DFI Predictor is fairly
suitable for this task. The aim of the study is to compare the spatial properties of different stereophonic
microphone techniques by one calculated value.
1. INTRODUCTION
The resulting directional image of a stereo microphone setup can be predicted by calculating the so-called localization curve. This can easily be done by means of the Image Assistant [1] or the Williams Curves [2]. It is, however, not as easy to predict the spatial quality of a stereo microphone setup.
The “DFI Predictor” introduced in this paper is designed to support this type of prediction. It is known that the interchannel correlation of the diffuse sound field in a recording is important for the perception of spatial width and envelopment. The diffuse sound should be reproduced as being decorrelated, so as to get a wide image in two-channel stereo or a sensation of envelopment in surround [3]. However, other sources suggest that the diffuse sound should have a certain level of correlation, and that complete decorrelation leads to undefined imaging [4], [5].
For now, the DFI (Diffuse-Field Image) Predictor is defined only for two-channel microphone setups.
The DFI Predictor is calculated by using the coherence function for differential microphones in diffuse fields [6]. The correlation of the diffuse sound also has an effect on sound color. The greater the correlation in the reproduction of a diffuse sound field, the more coloration will be perceived.
Differences in diffuse-field correlation between the two channels are likely to significantly influence the preference for a specific recording technique. The diffuse-field correlation of spaced microphones is dependent on frequency. Thus, the correlation coefficient as well as the degree of coherence of the microphone setup in a diffuse sound field can vary depending on the power spectrum of the excitation signal.
The DFI Predictor can be calculated for every two-channel microphone setup and includes a frequency-dependent weighting function.
The weighting function is applied to the coherence function [6]. This frequency-dependent weighting function is the main difference between the DFI Predictor and the correlation coefficient or degree of coherence. The results of the listening test show that the DFI Predictor is able to give a rather good prediction of the perceived width.
The DFI Predictor, however, is meant only as a first step in the search for a reliable prediction of perceived width and envelopment.

2. SIGNAL CORRELATION MEASUREMENT
In many psychoacoustic studies the correlation coefficient or the degree of coherence is used to determine signal “similarity”. The degree of coherence and the correlation coefficient are both calculated in the time domain. When it comes to spaced microphones in a diffuse field, the signal correlation differs for each frequency and has a sin(x)/x characteristic (see figure 2). This frequency dependence is not included in the correlation coefficient or the degree of coherence. The DFI Predictor, however, is based on the coherence function [6] and therefore includes the frequency-dependent correlation of spaced microphones [7].

2.1. The correlation coefficient
The correlation coefficient is certainly the most common measurement for signal similarity. It is defined at m = 0 of the normalized cross-correlation function $p^{E}_{sg}(m)$. The discrete type of the normalized cross-correlation function is defined:
$$ p^{E}_{sg}(m) = \frac{\sum_{n=-\infty}^{+\infty} s^{*}(n) \cdot g(n+m)}{\sqrt{E_s \cdot E_g}} \qquad (1) $$
with s(n), g(n) being the input signals and $E_s$, $E_g$ being the corresponding signal energy. The correlation coefficient of the following example is zero because the input signals are orthogonal (see figure 1).
[Figure 1. Top panel: “Sine and cosine function of 400Hz”, magnitude (-1 to 1) versus time in s. Bottom panel: “Normalized cross-correlation function of the sine and cosine signal”, correlation (-1 to 1) versus time in s.]
Fig. 1: Top: Sine and cosine signals. Bottom: normalized cross-correlation function of the two signals.
The correlation coefficient describes the phase shift between two signals. The real “similarity” of the signals is described inaccurately.
Due to the scaling on the overall energy (see equation 1), the spectral power density is important for the value of the correlation coefficient. Two input signals with the same phase shift but different spectral power densities result in a different correlation coefficient (see figure 7).

2.2. The degree of coherence
The degree of coherence is also based on the normalized cross-correlation function. It is defined as [8]:
$$ k = \max_{m} \left| p^{E}_{sg}(m) \right| \qquad (2) $$
Because the maximum absolute value is used, the phase shift between the two signals cannot be described any more. The degree of coherence for a sine and cosine signal of the same frequency would always be 1 (see figure 1) regardless of the phase shift. The degree of coherence is also dependent on the spectral power density of the input signals.

2.3. The coherence function
The coherence function is able to give a more precise description of the “similarity” of two signals.
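To make the limits of the two time-domain measures concrete, the following Python sketch evaluates equations (1) and (2) for the 400 Hz sine and cosine of figure 1. The sampling rate, the signal length and the helper name `normalized_xcorr` are assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np

FS = 48_000   # sampling rate in Hz (assumed)
F0 = 400.0    # signal frequency in Hz, as in figure 1
N = 4_800     # 0.1 s, i.e. exactly 40 periods of 400 Hz (assumed length)

t = np.arange(N) / FS
s = np.sin(2 * np.pi * F0 * t)
g = np.cos(2 * np.pi * F0 * t)


def normalized_xcorr(s, g):
    """Normalized cross-correlation of equation (1) for finite-length signals.

    Returns the lags m and p(m) = sum_n conj(s(n)) * g(n + m) / sqrt(Es * Eg).
    """
    energy = np.sqrt(np.sum(np.abs(s) ** 2) * np.sum(np.abs(g) ** 2))
    # np.correlate(g, s, "full")[i] = sum_n g(n + m) * conj(s(n)), m = i - (len(s) - 1)
    p = np.correlate(g, s, mode="full") / energy
    lags = np.arange(-(len(s) - 1), len(g))
    return lags, p


lags, p = normalized_xcorr(s, g)
corr_coeff = p[lags == 0][0]              # correlation coefficient: value at m = 0
degree_of_coherence = np.max(np.abs(p))   # equation (2): maximum absolute value

print(f"correlation coefficient: {corr_coeff:+.4f}")
print(f"degree of coherence:     {degree_of_coherence:.4f}")
```

With these settings the correlation coefficient comes out at essentially zero, while the degree of coherence stays close to 1. It falls slightly short of 1 only because the finite-length signals overlap partially at nonzero lags.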
[Figure 2. Plot “Coherence function $\gamma_{xy}(f)$ of different microphone setups”: coherence (-1 to 1) versus frequency in Hz (logarithmic axis, 10^0 to 10^4). Curves: A/B with omnis with 10 cm spacing; A/B with omnis with 50 cm spacing; X/Y with cardioids and 90° offset angle; ORTF with supercardioids.]
Fig. 2: Coherence functions of different stereo microphone setups in a diffuse sound field.
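The A/B and X/Y curves of Fig. 2 can be approximated from well-known closed-form diffuse-field results. The Python sketch below assumes an ideal isotropic (three-dimensional) diffuse field, ideal omnidirectional and first-order microphones, and a speed of sound of 343 m/s. The function names are hypothetical, and the formulas are textbook expressions in the spirit of [6], not code from the paper. ORTF is left out because it combines spacing and angle and needs the more general expressions of [6].

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed)


def coherence_spaced_omnis(freq_hz, spacing_m, c=SPEED_OF_SOUND):
    """sin(x)/x coherence of two spaced omnis, with x = 2*pi*f*d/c."""
    # np.sinc(u) is sin(pi*u)/(pi*u), so u = 2*f*d/c yields sin(x)/x.
    return np.sinc(2.0 * np.asarray(freq_hz) * spacing_m / c)


def coherence_coincident_first_order(angle_deg, a=0.5):
    """Frequency-independent coherence of two coincident first-order mics.

    Each pattern is a + (1 - a)*cos(theta); a = 0.5 is a cardioid.
    angle_deg is the angle between the two main axes.
    """
    b = 1.0 - a
    cos_angle = np.cos(np.radians(angle_deg))
    return (a**2 + b**2 * cos_angle / 3.0) / (a**2 + b**2 / 3.0)


freqs = np.array([100.0, 400.0, 1000.0, 4000.0])
ab_10cm = coherence_spaced_omnis(freqs, 0.10)
ab_50cm = coherence_spaced_omnis(freqs, 0.50)
xy_90 = coherence_coincident_first_order(90.0)

for f, g10, g50 in zip(freqs, ab_10cm, ab_50cm):
    print(f"{f:6.0f} Hz  A/B 10 cm: {g10:+.3f}  A/B 50 cm: {g50:+.3f}  X/Y 90 deg: {xy_90:+.3f}")
```

Under these assumptions the 10 cm pair is still above 0.9 at 400 Hz, whereas the 50 cm pair has already crossed zero at that frequency. The coincident cardioid pair keeps the same value at all frequencies.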
The advantage of the coherence function is that it gives the “similarity” of the input signals at every frequency. In [9] the coherence function is defined as:
$$ C_{xy}(f) = \frac{|P_{xy}(f)|^{2}}{P_{xx}(f) \cdot P_{yy}(f)} \qquad (3) $$
with $P_{xy}(f)$ being the cross power density spectrum and $P_{xx}(f)$ and $P_{yy}(f)$ being the power density spectra. This type of coherence function is called the magnitude squared coherence. In [6] another version of the coherence function is defined (see figure 2):
$$ \gamma_{xy}(f) = \frac{P_{xy}(f)}{\sqrt{P_{xx}(f) \cdot P_{yy}(f)}} \qquad (4) $$
again with $P_{xy}(f)$ being the cross power density spectrum and $P_{xx}(f)$ and $P_{yy}(f)$ being the power density spectra.
The complex coherence function $\gamma_{xy}(f)$ is able to describe the phase shift between the two input signals. In a nutshell it can be said that the complex coherence function $\gamma_{xy}(f)$ denotes a correlation coefficient for every frequency. Thus, the “similarity” of two broadband input signals can be described more precisely than with the correlation coefficient or the degree of coherence. Furthermore, the coherence function is not dependent on the power density spectrum of the input signals. In [10] it is shown that low frequencies have a strong effect on the spatial impression. In figure 2 the coherence function for different stereo microphone setups in a diffuse field is shown.
An A/B setup with omnidirectional microphones with a spacing of 10 cm is almost completely mono below 400 Hz. For coincident setups the signal correlation is the same for every frequency. In this case, the correlation coefficient leads to a better conclusion on signal correlation.

3. THE DFI PREDICTOR
The DFI Predictor is based on the complex coherence function for microphones in the diffuse sound field. A weighting function $\chi(f)$ is applied which describes a 3 dB per octave attenuation of the coherence function (see figure 3). By summing up the weighted coherence function, the DFI Predictor represents the frequency-dependent correlation of microphones in a diffuse field with a single value.

[Figure 3. Plot “Weighting function $\chi(f)$”: magnitude in dB (-20 to 20) versus frequency in Hz (logarithmic axis, 10^0 to 10^4).]
Fig. 3: Weighting of the coherence function.
The DFI Predictor is defined as:
$$ \mathrm{DFI} = \frac{1}{n} \cdot \sum_{f=100\,\mathrm{Hz}}^{6000\,\mathrm{Hz}} \left[ \gamma_{xy}(f) \cdot \chi(f) \right]^{2} \qquad (5) $$
with n being the FFT length, $\gamma_{xy}(f)$ being the complex coherence function and $\chi(f)$ being the weighting function.
In [11], [12] a frequency range from 40 Hz to 1.5 kHz was used to calculate the DFI Predictor. However, the frequency range most likely has to match the stimulus used in the listening test.
In the listening test described below, the stimulus was female speech. The effect on the results of the listening test is shown in section 4. The DFI Predictor can be considered as a correlation coefficient. A small DFI Predictor value indicates a low correlation in the frequency range 100 Hz to 6 kHz. A high DFI Predictor value indicates a high correlation in that frequency range.
The definition of the DFI Predictor is a first approach until further studies can be performed to check and refine this definition.

4. LISTENING TEST
The stimuli for the listening test were created in Matlab. The aim was to create different stimuli with a defined coherence. The basis is an arbitrary stereo microphone setup and its theoretical coherence function in the diffuse sound field [6].
Based on this coherence function two noise signals are created having approximately the same coherence as predicted by this calculation. These two noise signals were separately convolved with the diffuse part of a mono room impulse response. To isolate the diffuse part of the impulse response, the first 100 ms were cut off.
After the convolution a stereo diffuse room impulse response is created which has a defined coherence. This impulse response is then convolved with a dry mono recording. The result is the stimulus. Figure 4 shows an example of a noise signal with a defined coherence. The subjects listened to the stimuli via headphones. The test software used for the listening test is based on a MUSHRA test (see figure 5). After a short introduction the subjects were able to run the listening test with the test software by themselves.
[Figure 4. Top panel: “Coherence function of noise1,noise2 and noise1,noise3”, curves $\gamma_{noise1,noise2}(f)$ and $\gamma_{noise1,noise3}(f)$, coherence (-1 to 1) versus frequency in Hz (logarithmic axis, 10^0 to 10^4). Bottom panel: “Coherence function of the two noise signals noise-left, noise-right and the theoretical coherence at low frequencies”, curves $\gamma_{noiseL,noiseR}(f)$ and $\gamma_{xy}(f)$, same axes.]
Fig. 4: Top diagram: Coherence of the initial noise signals. Bottom diagram: Result for an A/B setup with omnis at 0.2 m distance. DFI Predictor = 0.2957
Fig. 5: The test software was specially designed for the listening test
In this MUSHRA-like test design, the subjects compared the stimuli to the reference and all other stimuli. A stimulus that is perceived as wider gets a higher quantitative rating on a scale from -4 to 4. If no difference is perceived between a stimulus and the reference, the rating is zero.
In the perfect case there should be a gradual characteristic from narrow to wide after quantifying all stimuli. The reference in the listening test was an X/Y setup with cardioids and a 90-degree offset angle (DFI Predictor = 0.5607). The microphone setups simulated for the listening test are shown in figure 6. A shuffle function was integrated in the test software. As a result, every subject listened to a different order of stimuli. The dry recording used for all stimuli was female speech (SQAM-CD). Eight subjects participated in the listening test.
[Figure 6. Plot “Results of the listening test”: quantitation (-4 = narrow to 4 = wide) versus DFI-Predictor (0 to 1). Legend: coincident, time delay, spaced cardioids, MV with confint. Data point labels: 100cm, 100cm/30°, 50cm, 50cm/30°, 35cm, 20cm, 20cm/30°, 10cm, 10cm/30°, 2cm, 2cm/30°, and several points labelled 0cm/45°.]
Fig. 6: Relationship between the DFI Predictor and the perceived spatial width.
The results of the listening test are also shown in figure 6. All curves show a similar behavior. All stimuli with a smaller DFI Predictor value than the reference were perceived as narrower and vice versa. The hidden reference was well recognized by all subjects and as expected was positioned in the middle of the scale. These results already show that the DFI Predictor can be used to predict perceived width.
At the moment the definition of the DFI Predictor is fairly rough. In further studies the stimuli for the listening test should be expanded with regard to frequency range to improve the significance of the DFI Predictor.
Figure 7 shows the results of the listening test plotted against the correlation coefficient of the two noise signals used for the decorrelation of the impulse response. Depending on whether the diffuse sound field was simulated with pink noise or white noise, the results vary. The graphs show that the correlation coefficient is less able to lead to conclusions about perceived spatial width.
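One common way to obtain two noise signals with a prescribed coherence, as required for the stimulus generation described in this section, is to mix two independent noise spectra bin by bin. The paper does not state which method was used in Matlab, so the following Python sketch is only an illustration: the mixing rule, the block-wise processing, the function names and all parameter values are assumptions. The target is the sin(x)/x coherence of an A/B omni pair at 0.2 m spacing.

```python
import numpy as np

FS = 48_000             # sampling rate in Hz (assumed)
BLOCK_LEN = 512         # samples per block (assumed)
N_BLOCKS = 4_000        # number of blocks to generate and average (assumed)
SPACING_M = 0.2         # A/B spacing in m
SPEED_OF_SOUND = 343.0  # m/s (assumed)


def noise_pair_with_coherence(gamma, n_blocks, rng):
    """Create two noise signals whose coherence follows the real target gamma.

    gamma holds one target value per rfft bin of a block.
    """
    shape = (n_blocks, gamma.size)
    n1 = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    n2 = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    left_spec = n1
    # Share a fraction gamma of the first noise, fill up with independent noise.
    independent_gain = np.sqrt(np.clip(1.0 - gamma**2, 0.0, None))
    right_spec = gamma * n1 + independent_gain * n2
    left = np.fft.irfft(left_spec, axis=1).ravel()
    right = np.fft.irfft(right_spec, axis=1).ravel()
    return left, right


def complex_coherence(x, y, block_len):
    """Block-averaged estimate of the complex coherence of equation (4)."""
    X = np.fft.rfft(x.reshape(-1, block_len), axis=1)
    Y = np.fft.rfft(y.reshape(-1, block_len), axis=1)
    pxy = np.mean(np.conj(X) * Y, axis=0)
    pxx = np.mean(np.abs(X) ** 2, axis=0)
    pyy = np.mean(np.abs(Y) ** 2, axis=0)
    return pxy / np.sqrt(pxx * pyy)


freqs = np.fft.rfftfreq(BLOCK_LEN, d=1.0 / FS)
target = np.sinc(2.0 * freqs * SPACING_M / SPEED_OF_SOUND)

rng = np.random.default_rng(0)
noise_left, noise_right = noise_pair_with_coherence(target, N_BLOCKS, rng)
estimate = complex_coherence(noise_left, noise_right, BLOCK_LEN)
max_error = np.max(np.abs(estimate.real - target))

print(f"largest deviation from the target coherence: {max_error:.3f}")
```

In the procedure described in this section, each of the two noise signals would then be convolved with the diffuse part of the mono room impulse response, and the resulting stereo impulse response would be convolved with the dry recording.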

[Figure 7. Two plots, each titled “Results of the listening test”: quantitation (-4 to 4) versus correlation coefficient (0 to 1), determined with white noise (left) and with pink noise (right). Legend: coincident, time delay, equivalence, MV.]
Fig. 7: The results of the correlation coefficient vary, depending on whether the diffuse sound field was simulated with pink or white noise.
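The spectrum dependence shown in Fig. 7 can be reproduced qualitatively with a short calculation. For two channels with the same power density spectrum S(f) and a real-valued coherence γ(f), the correlation coefficient at lag zero is the S(f)-weighted mean of γ(f). The Python sketch below applies this to the sin(x)/x coherence of an A/B omni pair and to a coincident pair with constant coherence. The 0.2 m spacing, the 20 Hz to 20 kHz band, the speed of sound, the value 0.75 and the function name are assumptions for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed)


def correlation_coefficient(gamma, psd):
    """Lag-zero correlation coefficient for equal channel spectra.

    gamma: real-valued coherence per frequency bin
    psd:   power density spectrum shared by both channels
    """
    return np.sum(gamma * psd) / np.sum(psd)


freqs = np.arange(20.0, 20_000.0, 1.0)   # 20 Hz to 20 kHz, 1 Hz steps (assumed band)

white = np.ones_like(freqs)   # flat power density
pink = 1.0 / freqs            # power density falling with 1/f

gamma_ab = np.sinc(2.0 * freqs * 0.2 / SPEED_OF_SOUND)  # A/B omnis, 0.2 m
gamma_xy = np.full_like(freqs, 0.75)                    # coincident pair, constant

rho_ab_white = correlation_coefficient(gamma_ab, white)
rho_ab_pink = correlation_coefficient(gamma_ab, pink)
rho_xy_white = correlation_coefficient(gamma_xy, white)
rho_xy_pink = correlation_coefficient(gamma_xy, pink)

print(f"A/B 0.2 m   white: {rho_ab_white:.3f}   pink: {rho_ab_pink:.3f}")
print(f"coincident  white: {rho_xy_white:.3f}   pink: {rho_xy_pink:.3f}")
```

For the spaced pair the pink-noise value is much larger than the white-noise value, because pink noise emphasizes the low frequencies where the coherence is high. For the coincident pair the two values agree, since the coherence does not depend on frequency.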
5. CONCLUSIONS
By the method presented here, it is possible to predict the perceived spatial width of an arbitrary stereo microphone setup. The results of a listening test show that the DFI Predictor is already able to give a fairly good prediction of perceived width.
The frequency dependency of the correlation is important and should be considered for studies on spatial perception of spaced microphone setups.
The results show that the DFI Predictor offers a direct comparison of coincident and spaced microphone techniques with regard to their spatial imaging properties. This can be particularly important for the design of multichannel microphone setups.

6. REFERENCES
[1] Wittek, H.: The “Image Assistant 2.1” is an interactive JAVA Applet that calculates the localization curves of arbitrary 2- and 3-channel stereo microphone configurations. www.hauptmikrofon.de, 2008
[2] Williams, M.: Microphone Arrays for Stereo and Multichannel Sound Recordings: a variable two channel microphone array; The training of the ear: experimental recording and listening tests; More on microphones: a short tutorial. Editrice Il Rostro, 2004. ISBN 8873650732
[3] Griesinger, D.: The psychoacoustics of listening area, depth, and envelopment in surround recordings, and their relationship to microphone technique. In: AES 19th International Conference, 2001, pp. 182–200
[4] Theile, G.: Multichannel natural music recording based on psychoacoustic principles. In: AES 19th International Conference, 2001
[5] Damaske, P.: Subjektive Untersuchung von Schallfeldern. In: Acustica 19 (1967), No. 199, p. 68
[6] Elko, G.W.: 4 Spatial Coherence Functions for Differential Microphones in Isotropic Noise Fields. In: Microphone arrays: signal processing techniques and applications (2001), p. 61
[7] Drews, M.: Mikrofonarrays und mehrkanalige Signalverarbeitung zur Verbesserung gestoerter Sprache. Dissertation, Technische Universität Berlin, Germany, 1999
[8] Blauert, J.: Spatial hearing. MIT Press, Cambridge, Mass., 1997
[9] Bendat, J.S.; Piersol, A.G.: Engineering applications of correlation and spectral analysis. 1980
[10] Griesinger, D.: General overview of spatial impression, envelopment, localization, and externalization. In: Proceedings of the 15th International Audio Engineering Society Conference, Copenhagen, Denmark, 1998, pp. 136–149
[11] Riekehof-Boehmer, H.; Wittek, H.; Mores, R.: Voraussage der wahrgenommenen Breite einer beliebigen stereofonen Mikrofonanordnung. In: 26. Tonmeistertagung (2010)
[12] Riekehof-Boehmer, H.: Auswirkungen der Diffusfeldkorrelation auf die raeumliche Wahrnehmung. BSc thesis, HAW-Hamburg, 2010
