
Audio Engineering Society

Convention Paper
Presented at the 127th Convention
2009 October 9-12, New York, NY, USA
The papers at this Convention have been selected on the basis of a submitted abstract and extended précis that have been peer reviewed by at least two qualified anonymous reviewers. This convention paper has been reproduced from the author's advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.
Mixing-Console Design Considerations for
Telematic Music Applications
Jonas Braasch1, Chris Chafe2, Pauline Oliveros1, and Doug Van Nort1
1 Rensselaer Polytechnic Institute, Troy, NY 12180, USA
2 CCRMA, Stanford University, Stanford, CA 94305, USA
Correspondence should be addressed to Jonas Braasch (braasj@rpi.edu)
ABSTRACT
This paper describes the architecture for a new mixing console that was especially designed for telematic
live-music collaborations. The prototype mixer is software-based and programmed in Max/MSP. It has many
traditional features but also a number of extra modules that are important for telematic projects: latency
meter, remote data link, auralization unit, remote sound level calibration unit, remote monitoring, and a
synchronized remote audio-recording unit.
1. INTRODUCTION

The internet has revolutionized the way music is recorded nowadays, and services such as E-session have enabled musicians to record with remotely-located peers. Contrary to Rock and Pop Music, where the sequential recording approach that these internet services were designed for is standard, other types of music such as Jazz and Classical typically require the whole ensemble to perform live together. This requirement imposes numerous challenges for internet collaborations that have been addressed in previous work by the authors among others [23, 17, 13, 21, 15, 22]. Among these challenges are echo feedback avoidance, low latency, and large bandwidth requirements. While those issues are currently being addressed by several individuals and research teams, the recording gear that is being used still consists of traditional systems, with the exception of low-latency transmission servers. In a long series of experimental sessions that involved weekly music rehearsals between Rensselaer Polytechnic Institute and Stanford University with participation from other institutions (including the University of California at San Diego and the Sonic Arts Research Centre (SARC) in Belfast), it was concluded that the development of a specialized mixing console was the most important feature to include in future system designs for telematic music recording and performance.

Fig. 1: Feedback loop in a telematic transmission environment.

The special requirements that were identified in the weekly sessions and the resulting design solutions are the topic of this paper. The prototype mixer is software-based and programmed in Max/MSP. It includes a number of C-externals that were programmed especially for this project. The console has many traditional features like channel strips, auxiliary busses, and equalizers, but also a number of modules that are not typically used in traditional mixing consoles:

- a latency meter to measure the transmission delay between two remote sites.
- a remote data link based on OpenSound Control (OSC) [25] so that the distributed mixing console elements can communicate with each other.
- an auralization unit to spatialize the participating performers' musical instruments and virtually place them into a homogeneous acoustical environment.
- a remote sound-level calibration unit to balance the instruments to their original acoustic levels.
- a remote monitoring unit that allows the onsite mixing engineer to experience as natural an impression of the remote room acoustics as possible.
- a synchronized transport panel to enable simultaneous recordings at different sites.

Some of the ideas that were incorporated into the proposed mixing console are certainly not new, but we thought it would still be interesting to view them in the context of a holistic design.

Before describing the design elements of the new console in detail, we will discuss the main problems that we were confronted with during our experimental music sessions. In this paper, we will focus on traditional and experimental acoustic music and acknowledge that the requirements and challenges are often different for Electronic Music or other forms of Media Art.

Telematic systems for acoustic music projects usually follow the rationalistic or physicalistic tradition of virtual reality design. In this tradition, the goal is to minimize the perceptual differences between the physical and the virtual environment. Consequently, the typical overarching ideal of virtual-reality system design is to create a copy or plausible alternative of a real environment in a computer-generated space. With respect to telematic music applications, the ideal system would show no signal degradation over the transmission pathway. According to this standard, the telematic music system cannot surpass the quality of traditional venues, which provide the reference by which the telematic venues are judged. Obviously, other design criteria apply to Electroacoustic Music projects, where the real space as such does not exist. This is the main reason that this article restricts its focus to acoustic music.

Fig. 2: Time course for a telematic transmission of a synchronous pizzicato tone played at two different locations at Sites A and B. Top panel: Musician A recorded at Site A; center panel: Musician B recorded at Site B; bottom panel: Musicians A and B recorded at Site B.

Fig. 3: Same as Fig. 2, but for a main microphone recording as shown in the bottom panel.
2. TRANSMISSION DELAY

From a virtual reality point of view, one of the biggest challenges for telematic music performances is transmission delay, which is usually unavoidable in telematic environments. The transmission delay between two telematic sites typically consists of two elements: the propagation delay and the signal-processing delay of the telematic apparatus (system latency). While much has been achieved to reduce the system latency of the underlying transmission systems to almost negligible values, it is the physical distance between two collaborators that determines the achievable minimal propagation delay. Even though electric signals travel with the speed of light, the resulting delays often exceed several tens of milliseconds. A simple calculation shows that a signal traveling on a direct route between Rensselaer Polytechnic Institute in Troy, NY and Stanford University in Palo Alto, CA needs about 14 ms for the distance of 4,111 km (direct line at the speed of light). The signal-processing delay, on the other hand, is determined by processes such as analog-to-digital conversion, data packaging, routing processes, and digital-to-analog conversion. With adequate hard- and software, these processes altogether can take only a few milliseconds.
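The quoted figure follows directly from the path length. As a minimal illustrative sketch in Python (our own illustration, not part of the console software), using the vacuum speed of light as the best case:

```python
# Best-case one-way propagation delay for the RPI-Stanford link.
SPEED_OF_LIGHT = 299_792_458.0  # m/s (vacuum; real network paths are slower)

def propagation_delay_ms(distance_m: float, speed_m_s: float) -> float:
    """One-way propagation delay in milliseconds."""
    return 1000.0 * distance_m / speed_m_s

print(propagation_delay_ms(4_111_000, SPEED_OF_LIGHT))  # ~13.7 ms
```

Actual routes add cable distance and switching time, so measured values are higher than this lower bound.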
When we speak about distributed music performances, we tend to forget that every performance involving more than one musician is a distributed performance. Even if the musicians share the same physical location, the acoustic information needs time to propagate from the musical instrument to the ears of the participating musician(s). It travels with the speed of sound at only 343 m/s at room temperature. Hence, two musicians that are located 6 m apart on a concert stage face a comparable communication delay (about 17 ms per direction) to two closely captured musicians that perform via the internet between RPI and Stanford, assuming that the signal-processing delay is negligible.

However, 6 meters would be an unusual distance between musicians of smaller ensembles. Since the transmission delay in telematic music performances typically exceeds the acoustic transmission delays that we are used to experiencing in traditional venues, several studies have investigated the impairment musicians face when they are exposed to large transmission delays. Basically, there are two significant side effects: (i) temporal misalignment between musical instruments located at different sites [12, 1], and (ii) the possibility of slowing down in tempo when playing rhythmical patterns across sites [10, 11]. For traditional types of music, performers tend to agree that the threshold above which it is very difficult to play in sync between two remotely-located sites is about 50-100 milliseconds [12, 1].

There are both technical and musical solutions to reduce the latency problem. From a technical standpoint, the reduction of system delays achieved with systems such as JackTrip [7] or Ultravideo Conferencing [15] was probably the biggest single factor in getting the situation under control. Another technical solution is the implementation of a small self-monitoring delay [14]. Such a delay can induce the musician to play ahead in time, from which the remote sites will benefit, if the signal of the musician is sent directly without the self-monitoring delay.
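The following sketch illustrates the self-monitoring delay idea under stated assumptions (block-based processing; the class and variable names are ours, not from [14] or the console):

```python
import numpy as np

class SelfMonitoringDelay:
    """The musician's own monitor signal is delayed by a few milliseconds,
    while the dry signal is sent to the remote site without this delay,
    which can induce the player to play slightly ahead in time."""

    def __init__(self, delay_samples: int, block_size: int):
        self.block_size = block_size
        # FIFO delay line: one block plus the monitoring delay.
        self.buffer = np.zeros(delay_samples + block_size)

    def process(self, dry_block: np.ndarray):
        """Return (send_block, monitor_block) for one audio block."""
        assert len(dry_block) == self.block_size
        self.buffer = np.roll(self.buffer, -self.block_size)
        self.buffer[-self.block_size:] = dry_block
        monitor_block = self.buffer[:self.block_size]  # delayed by delay_samples
        return dry_block, monitor_block

# Example: 10 ms of self-monitoring delay at 48 kHz with 128-sample blocks.
smd = SelfMonitoringDelay(delay_samples=480, block_size=128)
```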
Musical solutions can be divided into solutions through musical training and solutions that involve the careful selection of adequate material and performing styles, for example by using music material with rhythmical concepts that work well in a telematic performance. We also noted that experience in frequently performing in telematic venues helps to cope with the situation, but formal psychoacoustic evidence has not been collected yet to test this hypothesis. In practice, we found that one can learn not to slow down in high-latency situations and instead maintain the desired tempo. The temporal misalignment that results from the latency between sites can have an interesting effect, which then characterizes the groove of the telematic performance.

Another side effect of the transmission delay is that it can lead to audible echoes, unless the microphones are placed very close to the sound sources at all but one location at least. Figure 1 depicts the situation for a typical two-way transmission scheme. Here, the signal of Microphone A at Site A is reproduced via one or more loudspeakers at Site B (T1). This signal is then picked up by Microphone B (T2) and broadcast back to the original Site A (T3). From there, it is re-captured by Microphone A (T4) to complete the feedback loop. Due to the transmission latency, the feedback becomes audible as echo at much lower feedback gains compared to the feedback situation known from local public address systems.

For the given reasons, the use of main microphone systems (microphone arrays that are placed in the far-field of the instruments to capture the whole ensemble, e.g., Blumlein, ORTF, Polyhymnia surround) is problematic for bi-directional use in telematic music performances. Theoretically, we can use a main microphone system at a single location, if the microphones in the other venues are placed in the near-field to prevent audible feedback loops. If an audience is present at only one site, a main microphone system can be used at the remote site to optimize the transmission quality for the concert goers.

Time alignment issues that are a product of the transmission latency can also pose a significant challenge, especially when recording a telematic performance. To illustrate the case, let us proceed with a little thought experiment. Let us assume that two musicians at different sites A and B play a pizzicato tone at the same time, $t_0 = 0$ s, on a hypothetical absolute time scale. At Site A, we record the onsite Musician A with a spot microphone at $t_0 = 0$ s, neglecting the small acoustic transmission delay from the instrument to the spot microphone (Fig. 2, top panel). A similar spot-microphone signal will be recorded for Musician B at Site B (Fig. 2, center panel). However, the sound of Musician A will occur later at this site due to the electric transmission delay between both remote sites, $\tau_e$, when recorded at Site B (Fig. 2, bottom panel).

Now let us assume that we record the event at Site B with a main microphone (Fig. 3). Obviously, both events are not recorded in sync. The sound of Musician B will be delayed by the time $\tau_{a2}$ the acoustic signal needs to travel from the instrument to the main microphone (Fig. 3, bottom panel). Also for Musician A, we will find an acoustic delay $\tau_{a1}$ from the broadcasting loudspeaker to the main microphone, and of course the delay of the electric signal, $\tau_e$, factors in as well, so we find for the total delay:

$$\Delta t_1 = \tau_e + \tau_{a1} \qquad (1)$$

For live-mixing and recording purposes, we can mix the recorded event around the captured main-microphone signals. Spot Microphone B can be added using traditional techniques. Spot Microphone A can be mixed in the same way, if the signal was captured at Site B. In case it was recorded at Site A, a post-processing delay $\tau_p$ can be adjusted afterwards to match the timing of the corresponding signal that was captured with the main microphone (Fig. 4):

$$\tau_p = \tau_e + \tau_a, \qquad (2)$$

or

$$\tau_p = \tau_e, \qquad (3)$$

if the acoustic delay between spot and main microphone should be preserved.

Fig. 4: Time course for a telematic transmission of a synchronous pizzicato tone played at two different locations at Sites A and B. Top panel: Musician A recorded at Site A; center panel: Musician B recorded at Site B, but processed with an additional time delay to match the transmission latency that occurs in the main microphone recording located at Site A (bottom panel).
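In a digital audio workstation, applying Eq. (2) or (3) simply means padding the spot track with $\tau_p \cdot f_s$ samples of silence. A minimal sketch (function name and interface are our own):

```python
import numpy as np

def align_spot_track(spot: np.ndarray, fs: float,
                     tau_e: float, tau_a: float = 0.0) -> np.ndarray:
    """Delay a spot-microphone track so it matches the timing of the
    remote main-microphone recording. tau_e is the one-way electric
    transmission delay in seconds; set tau_a to the loudspeaker-to-main-
    microphone delay for Eq. (2), or leave it at 0.0 for Eq. (3)."""
    tau_p = tau_e + tau_a
    pad = np.zeros(int(round(tau_p * fs)))
    return np.concatenate([pad, spot])
```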
Next, we assume the hypothetical case of combining the main microphone recordings from two different sites. For the pizzicato case, we find the two patterns shown in Fig. 5. Obviously, it becomes impossible to align both recordings in time, since the remote signal is always delayed compared to the onsite signal. For example, if we shift the main-microphone recording for Site B so that the events for Musician A coincide, we effectively double the mismatch for Musician B, which makes matters worse in most cases. Then, of course, there is the option of operating with headphones or in-ear monitoring systems. In this case the sounds are isolated from each other for the different sites and they can be perfectly aligned. However, one has to live with the circumstance that the instruments have different room sounds, unless all recording venues have similar sound characteristics.

Fig. 5: Attempt to correct for transmission latency to match main microphone recordings captured at different sites. The top panel shows the time course of the simultaneous pizzicato recorded at Site A, the bottom panel the same situation but for Site B. Here, the signal was delayed by a post-processing delay, $\tau_p$, which results in an additional delay for Musician A's pizzicato.

We also need to consider feedback delays in a bi-directional main microphone scenario as shown in Fig. 6. Unlike the case shown in the figure, the amplitudes of the instruments do not necessarily decrease over time, especially if the instruments are calibrated for sound-pressure level. On an interesting sidenote, one of the authors suggested a method to embed the audible feedback echoes into a virtual auditory space by disguising them as room reflections, which often have delay times similar to the double transmission delay [9].

Fig. 6: Feedback pattern which occurs if the musicians are only captured using a main microphone system for transmission purposes. Top panel: Site A; bottom panel: Site B. The fed-back signals do not necessarily decline in amplitude, especially if calibrated signal levels are used.
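The behavior sketched in Fig. 6 can be summarized by the net loop gain: every complete A-to-B-and-back round trip multiplies the fed-back signal by the product of all electric and acoustic gains in the loop. A toy calculation (our own illustration):

```python
def echo_levels_db(loop_gain_db: float, n_round_trips: int = 5) -> list:
    """Level of each successive echo relative to the original signal,
    given the net gain of one complete A -> B -> A loop in dB."""
    return [k * loop_gain_db for k in range(1, n_round_trips + 1)]

print(echo_levels_db(-6.0))  # [-6, -12, -18, ...]: echoes decay only slowly
print(echo_levels_db(0.0))   # [0, 0, 0, ...]: calibrated levels, stable echo
```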
3. SPATIAL PLACEMENT OF THE DISTRIBUTED ENSEMBLE

The practical restriction to using main microphone configurations in bi-directional music transmission set-ups makes virtual spatialization techniques unavoidable, if the spatial placement of the individual musicians is anticipated. These tools are especially useful if the acoustic signals are to be aligned with the visual image of the corresponding performers. In this context, two different visualization concepts will be discussed.

Traditionally, several cameras are often used at each site and the transmitted video signal is frequently faded between these cameras for a change in perspective. Obviously, it is impractical to continuously match the positions of the acoustic signals to the video image, but it certainly would be useful to align the sound spatially with the total view captured by the main camera. This practice is even more desirable if only one camera is used at each site. In fact, this is often the case, and with the recent increase in video resolution, the spectators are now more flexible in focusing on visual details themselves.

Before going into the details of spatialization techniques, it would be useful to discuss the general strategies to position the onsite and remote ensembles physically and virtually in the concert venue. In this paper, we restrict ourselves to scenarios with two ensemble locations and the use of a single video screen for each site to project the remote ensemble, but the concepts described here can be easily extended to multicast connections and multichannel video projections.

The vast majority of current music performances are stage based, which is reflected by nearly all current main microphone and spatial audio coding techniques, with the exception of Ambisonics-related techniques. Usually, the microphones for the left and right channels, or the left, center, and right channels, are positioned to cover the whole stage spatially. Several possibilities exist to create a synthetic venue for a unified telematic performance.

A common practice in rehearsal situations is to place the two remote ensembles facing each other (Fig. 7). However, this set-up is problematic during a performance with an audience. Since the latter would then be facing the back of one of the two ensembles, the video screen to display the remote ensemble is often placed behind the onsite ensemble (Fig. 8). Unfortunately, this practice restricts visual contact between both ensembles. One can work with additional video monitors that are placed in front of the musicians to solve this problem. The monitor image can be mirrored to maintain the spatial coding in the left/right dimension to avoid mismatch with the spatial audio signal.

Fig. 7: Telematic ensemble set-up with both ensembles facing each other. This type of placement is often used for rehearsal situations. Shown left is the onsite ensemble. The remote ensemble, which is virtually placed into the space, is shown to the right. The depicted calibration point is an anchor point that can be used to calibrate the sound pressure levels and spatial position of the individual musicians.

Fig. 8: Telematic ensemble set-up for a concert situation with audience. Both ensembles, the onsite one and the projected one, face the audience. An additional display of the remote ensemble can be used for monitoring purposes. In the sketch shown in this figure, the remote ensemble is mirrored in the monitor screen to preserve the left/right arrangement of the individual musicians as indicated by the numbers displayed on the heads of the musicians. This procedure avoids left/right reversals for loudspeaker reproduction, which would lead to an audio/visual mismatch.

Another solution to this problem is to place both ensembles in a V-angle (Fig. 9). This way, both ensembles have eye contact, and can be properly viewed by the audience.

Fig. 9: Telematic ensemble set-up in V configuration for a concert situation with audience.

4. VIRTUAL ACOUSTIC SPATIALIZATION TECHNIQUES

Virtual auralization techniques have proven to be a successful tool to resynthesize the spatial information of spot-microphone recordings. For this purpose, several panning algorithms are presently available, such as the Spatialisateur [16], Vector-Based Amplitude Panning (VBAP) [20], and Virtual Microphone Control (ViMiC) [2, 4, 3].
While all of the aforementioned algorithms work very well, in contrast to main microphone techniques they all require the coordinates of the sound sources in addition to the near-field microphone signals. While the positions of the sound sources can be adjusted manually by the mixing engineer (e.g., using a pan-pot), an automated sound source tracking system would simplify the use of the mixing console, especially since the recording location is partly off-site and not easily accessible.

The basic design of the mixing console's spatialization unit was adapted from a video installation with the New York City based Wooster Theatre Group. The installation was commissioned by Rensselaer Polytechnic Institute's Experimental Media and Performing Arts Center (EMPAC), and the first author was involved in the design of the audio engine that was custom built for this installation. One of the artistic requirements of the 18-minute piece was to capture the performance with actors simultaneously speaking and later allowing the audience to choose what to listen for. For this purpose a rotatable chair was built at the Zentrum für Kunst und Medientechnologie (ZKM) in Karlsruhe, Germany to select a section of the 360° movie, while the remaining screen was kept blurred. In general, the audio engine only broadcasts the audio sources that fall within this visual window, but a complex rendering system also allows exceptions to follow the narrative requirements of the piece. To be able to isolate the actors from each other, lavalier microphones had to be used during the recording process. Unfortunately, the spatial attributes of the recorded actors are lost with spot microphones. To preserve the information about the location of each actor, which could vary during the piece, a microphone array was temporarily installed in the Performing Garage of the Wooster Group in Soho, New York (see Fig. 10). Using this array, the position of each actor was tracked using custom-designed software which will be described in Section 4.1. The data of the actors' positions were stored with the closely captured audio tracks to be spatialized in real-time during the showing of the installation. An array of 32 loudspeakers was used for this purpose.

Fig. 10: Microphone array.

For the design of the mixing console, several elements were kept from the Wooster/EMPAC installation, for example the concept of a central calibration point. The calibration point falls together with the location of the microphone array. For telematic ensembles, we propose to use a calibration point for each physical location (see Fig. 11). The calibration points can be used to present a remote ensemble virtually at another location. To achieve this, the calibration point of the remote ensemble is virtually placed at the same location as the calibration point at the reproduction site. Consequently, the levels for the remote ensemble will have to match at both calibration points for a calibrated reproduction. The automated calibration can be fine-tuned using the faders of the mixing console, but the system at least avoids a drastic level mismatch at the beginning of the mixing process. To speak from experience, we had cases where an energetic solo was heard faintly at the remote end, and vice versa, where instruments that were currently playing a side role were suddenly found to dominate the acoustic scene. In this context, it is noteworthy that contemporary improvised music often poses a greater mixing challenge than traditional classical music. In the latter, the sound of an ensemble is usually known, since this music is centered around a few standardized ensembles (e.g., string quartet, symphony orchestra). Furthermore, a score is generally available for the mixing engineer to serve as a guideline.

For the Wooster/EMPAC installation, a custom VST plug-in was designed in the Max/MSP environment to capture spatial positioning data via OSC. The ReWire interface was used to control the transport panel of the Digital Audio Workstation (Steinberg Cubase 4) from the Max/MSP environment in order to synchronize the audio workstation to the high-definition video during the post-production process. The digital transport of the workstations was constant enough to avoid major drift-offs, and the concept is now being used for telematic applications to start and end local recordings simultaneously (see Section 8).
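For illustration, positioning data of the kind described above can be sent with any OSC library. The sketch below uses the python-osc package, and the address pattern is a hypothetical example (the paper does not specify the actual OSC name space):

```python
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("192.0.2.10", 9001)  # address of the rendering engine

def send_source_position(source_id: int, azimuth_deg: float,
                         elevation_deg: float, distance_m: float) -> None:
    """Transmit one tracked source position as a single OSC message."""
    client.send_message(f"/source/{source_id}/position",
                        [azimuth_deg, elevation_deg, distance_m])

send_source_position(1, azimuth_deg=-30.0, elevation_deg=0.0, distance_m=2.5)
```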
Fig. 11: Sketch of the recording and reproduction set-up.

4.1. Sound Source Tracking System

The sound-localization system that is integrated into the proposed console is based on the pyramidal five-microphone array shown in Fig. 10, and is described in part in [5, 6, 3]. The five omni-directional microphones are arranged in a square-based pyramid with the following dimensions: base side, 14 cm; triangular side, 14 cm. Traditional microphone-array based systems work well to localize an isolated sound source by utilizing arrival-time differences of the sound source between the individual array microphones. In multiple-sound-source scenarios (e.g., a music ensemble), however, determining the sound-source positions from the mixed signal and assigning them to the corresponding source is still a real challenge.

A solution for this challenge is to include the lavalier microphone signals in conjunction with a traditional microphone-array based localization system. The lavalier microphone signals are then used to determine the signal-to-noise ratios (SNRs) between a number of sound sources, for example concurrent musicians, while still serving their main purpose of capturing the audio signals. The SNR is calculated frequency-wise from the acoustic energy recorded in a certain time interval:
$$\mathrm{SNR}_{i,m} = 10 \log_{10} \left( \frac{1}{a} \int_{t_m}^{t_m+\Delta t} p_i^2 \, dt \right) \qquad (4)$$

with:

$$a = \sum_{n=1}^{i-1} \int_{t_m}^{t_m+\Delta t} p_n^2 \, dt \; + \sum_{n=i+1}^{N} \int_{t_m}^{t_m+\Delta t} p_n^2 \, dt \qquad (5)$$

and $p_i$ the sound pressure captured with the $i$-th lavalier microphone, $t_m$ the beginning of the measured time interval $m$, $\Delta t$ its duration, and $N$ the number of lavalier microphones.
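In a discrete-time implementation the integrals become sums over the samples of one analysis interval. A compact sketch of Eqs. (4) and (5) (our own condensation, not the console's code):

```python
import numpy as np

def snr_db(p: np.ndarray, i: int) -> float:
    """SNR of lavalier channel i against all remaining channels for one
    analysis interval, following Eqs. (4) and (5).
    p: array of shape (N, samples) with one interval of the N signals."""
    energies = np.sum(p ** 2, axis=1)   # discrete counterpart of the integrals
    a = np.sum(energies) - energies[i]  # Eq. (5): energy of all other channels
    return float(10.0 * np.log10(energies[i] / a))
```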
Basically, the SNRs are measured for each time interval between each observed sound source and the remaining sound sources. The data can then be used to select and weight those time slots in which the sound source dominates the scene, assuming that in this case the SNR is high enough for the microphone array to provide stable localization cues. Figure 12 depicts the core idea. In this example, a good time slot is found in the third time frame for Sound Source 1, which has a large amount of energy in this frame, because the recorded energy for Sound Source 2 is very low. Time Slot 6 depicts an example where a high SNR is found for the second sound source.

Fig. 12: Estimation of the signal-to-noise ratios for each sound source.

To improve the quality of the algorithms, all data are analyzed frequency-wise. For this purpose the signals are sent through an octave-band filter bank before the SNR is determined. Basically, the SNR is now a function of frequency $f$, time interval $t$, and the index of the sound source $i$: $\mathrm{SNR} = \mathrm{SNR}(f, t, i)$. The sound source position is determined for each time/frequency slot by analyzing the time delays between the microphone signals of the microphone array. The position of the sound source is estimated using the cross-correlation technique, which is used to determine the direction of arrival (DOA) from the measured internal delay (peak position of the maximum of the cross-correlation function) via this equation (e.g., see [26]):
$$\alpha = \arcsin \left( \frac{c \, \tau}{f_s \, d} \right), \qquad (6)$$

with the speed of sound $c$, the sampling frequency $f_s$, the internal delay $\tau$ (in samples), and the distance between both microphones $d$.

Since this technique cannot resolve two sound sources within one time-frequency bin, the estimated position is assigned to the sound source with the highest SNR. Alternatively, the information in each band can be weighted with the SNR in this band. To save computational cost, a minimum SNR threshold can be determined, below which the localization algorithm will not be activated for the corresponding time/frequency slot.
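A sketch of the DOA estimate of Eq. (6) for a single microphone pair follows (the actual system combines several pairs of the pyramidal array and works per frequency band; names are ours):

```python
import numpy as np

def doa_deg(x1: np.ndarray, x2: np.ndarray, fs: float, d: float,
            c: float = 343.0) -> float:
    """Direction of arrival for one microphone pair from the peak of the
    cross-correlation function, following Eq. (6). d: spacing in metres."""
    xcorr = np.correlate(x1, x2, mode="full")
    tau = int(np.argmax(np.abs(xcorr))) - (len(x2) - 1)  # delay in samples
    sin_alpha = np.clip(c * tau / (fs * d), -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_alpha)))
```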
Figure 11 depicts the whole transmission chain, which includes the sonification system. At the recording site, the raw sound signals are captured through the lavalier microphones, which also feed the localization algorithm with information to calculate the instantaneous SNR. Both the audio data and the control data, which contain information on the estimated sound source position, are transmitted live to the co-located site(s). Here, the sound field is resynthesized from the near-field audio signals and the control data using rendering techniques such as ViMiC.

The sound source tracking unit is currently implemented in Matlab. This allows easier prototyping than an implementation in Max/MSP, and also saves computational resources for the main units of the mixing console, because it can run on a different processor. The Matlab module runs in real-time using the Data Acquisition Toolbox. The module receives multichannel audio input (see Fig. 13) and returns the calculated results (positions of individual sound sources) via OSC.

Fig. 13: Block diagram and data flow of the Telematic Mixing Console and periphery.

4.2. Sound Spatialization

In the proposed mixing console the spot-microphone recordings are spatialized using Virtual Microphone Control (ViMiC). The system, which was introduced at a previous AES Convention [2], is based on the simulation of microphone techniques and acoustic enclosures. The system basically simulates a multichannel main-microphone signal from the near-field recordings and the positioning data provided by the sound-localization microphone array. In the ViMiC environment, the microphones, with adjustable directivity patterns and axis orientations, can be spatially placed as desired to simulate a main-microphone set-up. Each virtual microphone signal is then fed to a separate (real) loudspeaker for sound projection. The system architecture was designed for maximum flexibility in the creation of spatial imagery. Despite its flexibility, the system is intuitive to use because it is based on the geometrical and physical principles of microphone techniques. It is also consistent with the expectations of audio engineers to create sound imagery similar to that associated with standard sound-recording practice.

5. REMOTE SOUND LEVEL CALIBRATION UNIT

In this section, we would like to discuss the implementation of the sound-level calibration unit. The system is basically an extension of the sound source tracking system described in Subsection 4.1. The extended system is shown in Fig. 14. The sound-pressure level of each sound source can be measured with the onsite microphone array that is used to track this source. The near-field microphone data is used to select those time/frequency frames in which this source has a sufficient SNR. To achieve this, the center microphone array has to be calibrated to measure the absolute sound pressure level. When the signal is broadcast at the remote site, the gain in the audio processing unit can now be set such that the sound pressure level of the broadcast signal at Site B matches the original sound pressure level of the sound at Site A. The advantage of the system is that it can be calibrated adaptively by tracking the sound pressure level at the Reproduction Site B using a microphone array identical to that at Site A. Otherwise, each musician would have to be calibrated individually in the absence of other sound sources. Each site has a Gain Processing Unit in the form of a linear amplifier array to adjust the gain of a near-field microphone signal to match its level at the remote site(s) to the level at the original site.
An adaptive algorithm is used to automatically adjust the gains in real-time to compensate for manipulations in the microphone adjustment (e.g., re-alignment of a lavalier microphone by an actor, or a musician moving closer to or away from the near-field microphone). One of the main advantages of the system is that it is robust against competing sound sources at the reproduction site, since it calculates the calibration data only if the SNR is sufficiently large. Each sound source can be spatialized after gain adjustment using a spatialization tool such as ViMiC [2].

Fig. 14: Sketch of the recording and reproduction set-up.

In the following, the detailed calibration procedure will be described. Three different sets of microphones are needed for the sound-pressure-level calibration system, and the general procedure is given here:

1. The near-field microphones with sound-pressure signals $p_{c,n}$, with index $c$ (close microphone) to label this group, and $n$ the index of the microphone ($1 \le n \le N$). The variable $N$ is the total number of near-field microphones.

2. The localization microphone array at the recording site with sound-pressure signals $p_{r,k}$, with index $r$ (recording microphone array) to label this group, and $k$ the index of the microphone ($1 \le k \le K$). $K$ is the total number of array microphones.

3. An identical localization microphone array to Set 2, but located at the reproduction site, with sound-pressure signals $p_{p,k}$, with index $p$ (playback microphone array) to label this group, and $k$ the index of the microphone ($1 \le k \le K$).

During the recording process the following data need to be recorded (a compact sketch of such a record follows after this list):

1. The audio data $p_{c,n}(t)$ for all $N$ near-field microphones.

2. The audio data $p_{r,k}(t)$ for all $K$ microphones of the sound-localization array.

3. The position trajectory (azimuth $\varphi(t)$ and elevation $\vartheta(t)$) for all $N$ near-field microphones. The sensitivity of these microphones needs to be calibrated to an absolute sound pressure level (e.g., in dB SPL).

4. The microphone index $i$ for each time/frequency interval of the near-field microphone with the highest SNR. The parameter $i(t, f)$ is set to zero if the SNRs for all microphones fall below a threshold.
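The items above can be thought of as one record per analysis frame. The following data-structure sketch (field names are our own shorthand, not from the paper) summarizes what has to be stored:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class CalibrationFrame:
    """Data recorded for one analysis frame (illustrative only)."""
    p_close: np.ndarray       # (N, samples) near-field audio, p_c,n(t)
    p_array: np.ndarray       # (K, samples) localization-array audio, p_r,k(t)
    azimuth: np.ndarray       # (N,) azimuth trajectory per near-field source
    elevation: np.ndarray     # (N,) elevation trajectory per near-field source
    best_channel: np.ndarray  # (F,) index i(t, f) per band; 0 = below threshold
```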
For sound reproduction, the system comprises the following units: an amplified loudspeaker array to reproduce the recorded sound events, and a signal-processing computer to play back the recorded audio files, spatialize these files, and adjust the gain of each individual sound file that was recorded with the near-field microphones. For spatialization, a number of known methods can be used (e.g., amplitude panning, or ViMiC). In the center of the loudspeaker array, a microphone array is placed that is identical to the array for the recording process. Both microphone arrays have to be calibrated to the same sound pressure level.

Using the recorded data, the following method is used to calibrate the audio signals to the original sound pressure level, starting with the first time interval $\Delta t$ and default gain factors of $g_{i,\mathrm{old}} = g_{i,\mathrm{new}} = 1$ (as defined later in the description); a condensed code sketch of Steps 3-6 follows after the step list:

Step 1: The audio data $p_{c,n}(t)$ for all $N$ near-field microphones are multiplied with the current gain factors and then spatialized for the current time interval $\Delta t$ (e.g., $\Delta t$: 10-ms duration) using the position data $\varphi(t)$ and $\vartheta(t)$. The signals are then presented to the loudspeaker array.

Step 2: During playback the reproduced sound is captured using the onsite microphone array. The signals for both microphone arrays, $p_{p,k}(t)$ and $p_{r,k}(t)$, are sent through a band-pass filterbank. Then, the recorded bandpass-filtered signals $p_{p,k,f}(t)$ are compared channel-wise to the bandpass-filtered microphone array signals $p_{r,k,f}(t)$ using this equation:

$$LD_{m,f} = 10 \log_{10} \left( \sum_{k=1}^{K} \int_{t_m}^{t_m+\Delta t} p_{r,k,f}^2 \, dt \right) - 10 \log_{10} \left( \sum_{k=1}^{K} \int_{t_m}^{t_m+\Delta t} p_{p,k,f}^2 \, dt \right), \qquad (7)$$

with $LD_{m,f}$ the level difference for the time interval $m$ and the frequency band $f$.

Step 3: Next, the level difference is analyzed for each near-field microphone. For this purpose a weighting vector $w$ is determined for each near-field microphone. The vector has $F$ elements, which corresponds to the number of frequency bands. The values of $w_i(m, f)$ are set to one for all time/frequency intervals where the $i$-th microphone channel had the highest SNR. Otherwise, the weighting factors are set to zero.

Step 4: Then, the overall level difference $LD$ is determined for each near-field microphone that has at least one weighting coefficient $w_i(m, f) \neq 0$, for example according to this equation:

$$LD_{i,m} = \frac{\sum_{f=1}^{F} LD_{f,m} \, w_{i,f}}{\sum_{f=1}^{F} w_{i,f}} \qquad (8)$$

Step 5: The parameter $LD$ is then used to adjust the gain for all microphones with at least one weighting coefficient $w_i(m, f) \neq 0$:

$$g_{i,\mathrm{new}} = g_{i,\mathrm{old}} \left( (1 - a) + a \cdot 10^{LD_{i,m}/20} \right), \qquad (9)$$

with the new gain factor $g_{i,\mathrm{new}}$ and the old gain factor $g_{i,\mathrm{old}}$. The variable $a$ ($0 \le a \le 1$) determines how fast the gain factor will be adapted, with $a = 1$ giving an instantaneous update and $a = 0$ no update.

Step 6: The gain factors are set to $g_{i,\mathrm{old}} = g_{i,\mathrm{new}}$ for all near-field microphones for which the gain has been updated in this time frame.

Step 7: The system progresses to the next time frame and continues with Step 1, unless the algorithm is terminated by the user.
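Steps 3 to 6 condense into a few lines. The sketch below uses Eq. (9) in the reconstructed form given above and binary weights, under which Eq. (8) reduces to a mean over the active bands (our own condensation, not the console's Matlab code):

```python
import numpy as np

def update_gains(LD_f: np.ndarray, w: np.ndarray,
                 g_old: np.ndarray, a: float) -> np.ndarray:
    """One calibration pass for a single time interval.
    LD_f:  (F,) per-band level differences from Eq. (7).
    w:     (N, F) binary weights; w[i, f] = 1 where channel i had the
           highest SNR in band f (Step 3).
    g_old: (N,) current gain factors; a: adaptation speed, 0 <= a <= 1."""
    g_new = g_old.copy()
    for i in range(len(g_old)):
        active = w[i] > 0
        if not np.any(active):
            continue  # no reliable band for this channel: keep the old gain
        LD_i = LD_f[active].mean()                                     # Eq. (8)
        g_new[i] = g_old[i] * ((1.0 - a) + a * 10.0 ** (LD_i / 20.0))  # Eq. (9)
    return g_new
```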

6. LATENCY METER

The transmission latency can be easily determined using the cross-correlation algorithm. The implemented procedure is based on impulse-response measurement using Least Mean Square (LMS) signals. The measurement is carried out by the Matlab program that also handles sound-source tracking and level calibration. The signal pathway is shown in Fig. 15.
Fig. 15: Impulse response for a telematic transmission system. The top curve shows the condition where the signal was looped through the telematic transmission system. The bottom curve shows the results for a signal that was measured within the measurement system.

The measured impulse response shows one peak, and the location of the peak in time, $\tau_2$, represents the time the signal takes on a whole round trip between both transmission sites. To exclude the latency of the measurement system, $\tau_{ms}$, a second signal is routed directly from the output of the measurement system to a second input, and the calculated delay $\tau_{ms}$ is subtracted from the round-trip delay measurement. The transmission delay $t_d$ from one site to another can now be approximated as:

$$t_d = 0.5 \, (\tau_2 - \tau_{ms}). \qquad (10)$$

The measurement is an approximation, since we assume that it takes the same time for the signal to travel the distance in each direction. This assumption can be made because the network is set up symmetrically. Practically, differences in traffic can lead to different transmission speeds in the two directions, but such asymmetries are difficult to measure and probably not very important. In practice, we have not noticed changes in transmission latency during a single session, and it was found to be sufficient to measure the latency for one audio channel before the start of a session.
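The round-trip estimate can be reproduced with a plain cross-correlation. A minimal sketch (our own illustration, independent of the Matlab implementation):

```python
import numpy as np

def round_trip_delay_s(reference: np.ndarray, looped: np.ndarray,
                       fs: float) -> float:
    """Round-trip time tau_2 from the cross-correlation peak between the
    test signal and its copy looped through the remote site."""
    xcorr = np.correlate(looped, reference, mode="full")
    lag = int(np.argmax(np.abs(xcorr))) - (len(reference) - 1)
    return lag / fs

def one_way_delay_s(tau_2: float, tau_ms: float) -> float:
    """Eq. (10): halve the round trip after removing the
    measurement-system latency tau_ms."""
    return 0.5 * (tau_2 - tau_ms)
```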
7. ACCESSIBILITY OF THE REMOTE MIXING UNIT AND RECORDING SPACE

The next topic is dedicated to issues related to virtual accessibility to the remote sites. Especially for improvised music it can be difficult to follow what is going on at the remote end, since a music notation does not exist as a guideline. The underlying problem is twofold. On the one hand, it is important that the mixing engineer (or group of engineers) has access to the controls of the remote units and can read off their displays. On the other hand, it would often be helpful if the engineer had intuitive perceptual access to the remote sites, instead of being solely confronted with a number of unprocessed audio channels that he or she will need to organize.

7.1. Accessibility to the mixing console

The concept of the proposed mixing console is based on two identical consoles with mirrored features, which gives the engineer access to the remote site through his/her own interface (see Fig. 13). Figure 16 depicts a channel strip of the console to highlight the concept. The faders of the corresponding channel strips of two or more remote units are
synchronized via OSC. The level meters are also controlled via OSC, and the update rate can be adjusted manually for the best compromise between meter sluggishness and data rate. The channels can be labeled electronically, and the labels are updated automatically on the remote site using OSC. This way, the mixing engineer has a good overview of what to expect from the remote site even if the channel arrangement is improvised or changed. Actually, even missing a complete instrument in the final mix is not that unusual in our experience. One has to keep in mind that it is much more difficult to get a good overview of the ensemble at the remote end as compared to the traditional one-venue situation.

Fig. 16: Channel strip of the telematic mixer. The annotated elements are, from top to bottom: a text field to label the channel strip (the label is transferred to the remote sites via OSC); gain adjustment, pre-transmission; reverb level for the ViMiC system; a button to calibrate the sound source in level and position (an LED lights up if calibrated); a button that couples the sound source settings across sites; a dial to manually adjust the direction of the sound source; a solo button; and a button to solo the dummy head signal and orient the dummy head toward the sound source for this channel strip.

7.2. Perceptual accessibility to the remote concert hall

Head-related recording devices have been proven to be successful tools to render acoustic scenes. The dummy head, which was introduced in the 1930s at Bell Laboratories, became popular in the 1970s for radio broadcast, and has also been used for mixing purposes. Pellegrini describes a method that was proposed by Studer, where a dummy head is placed in a concert hall to transmit a binaural signal to a broadcast vehicle located outside the concert hall [19]. In the described project, the reproduction quality of the dummy-head recording could be improved significantly by tracking the head movements of the mixing engineer in real-time to steer a motorized dummy head to follow the left/right movements of the engineer's head. In particular, front/back confusions could be reduced significantly this way.

In another study, a dummy head has also been used in conjunction with a video camera that is mounted on the same head [18]. This multi-modal recording device provides a very natural perspective. A similar technique is used by the authors. To improve the dummy head's performance, a dummy head with stereoscopic vision was designed as a master's thesis project by one of the first author's students ([24], Fig. 17). The dummy head uses silicone ears that were molded after Umile's ears, with miniature capsules (Sennheiser KE-4). For vision, two Apple iSight cameras were used. The resolution of 640x480 pixels is good enough for monitoring purposes, and the cameras are small enough to fit into the dummy head without altering the acoustical properties of the head too much. It is also worth mentioning that relatively inexpensive motorized cameras are available that can be controlled remotely (e.g., Sony EVI D30).

Fig. 17: Stereoscopic binaural head.

The control for the steerable dummy head can be integrated into the mixing console interface, and it can also be connected to the acoustic sound source tracking device, for example by a button on each channel strip to orient the dummy head toward a specific source at the click of a button (see Figs. 16 and 13).

7.3. Bandwidth restrictions

Bandwidth restrictions are an important topic in telematic music which we have not addressed in great detail in this paper. The main reason is that most of our partners have access to INET2, which allows data rates of up to a Gigabit/s or more. Practically, we have been working with connections of close to 100 Megabits per second in both directions simultaneously. The transmission of DV-quality video requires a bandwidth of 25 Megabits/s in one direction, and 8 channels of CD-quality audio about 5.5 Megabits/s.

Most private residences, however, currently access the internet with bandwidths of less than 1 Megabit/s. It should be mentioned that a new class of compression algorithms has been developed by other parties that allows compression rates similar to the widely spread MPEG or AAC standards with much lower latencies of about 5 ms for the coding/decoding process [8]. Once personal computers become powerful enough to handle the en- and decoding process for multichannel files, it would be feasible to integrate these systems with the telematic mixing console concepts described here.
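The audio figure quoted above follows directly from the raw PCM rate; a one-line check (our own illustration):

```python
def pcm_bitrate_mbit_s(channels: int, fs_hz: int, bits: int) -> float:
    """Raw PCM bit rate in Mbit/s, before container or network overhead."""
    return channels * fs_hz * bits / 1e6

print(pcm_bitrate_mbit_s(8, 44_100, 16))  # ~5.6 Mbit/s for 8 CD-quality channels
```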
8. SYNCHRONIZED TRANSPORT CONTROL

For telematic recordings it is often advantageous to record the audio separately at each local site, with the master site also recording the audio from the remote site as a reference for later time alignment. The required post-processing time alignment can be based on the suggestions made in Section 2. Alternatively, one could synchronize the remote machine(s) by sending a word-clock signal over one of the audio channels. Synchronizing two audio machines more efficiently at a lower data rate would in theory also work using OSC. However, the timing of the control data streams of the Max/MSP platform (Version 5) that we are currently using is not accurate enough for this task. Aligning the remote tracks by ear and visual inspection within an audio editor has the advantage that the best perceptual solution can be used. It should not be forgotten that absolute time practically does not exist in distributed performances.

Currently, we use OSC commands as a compromise to start and end the recording process between remote machines, using the ReWire interface between the local Digital Audio Workstations and our mixing console in Max/MSP. The time alignment between two digital recorders has proven to be quite stable in our practical work.
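As an illustration of such a transport command, the sketch below broadcasts a start/stop message to all remote machines with the python-osc package; the host names and the OSC address are hypothetical, since the paper does not document the console's actual name space:

```python
from pythonosc.udp_client import SimpleUDPClient

# One client per remote site (hypothetical host names).
remote_daws = [SimpleUDPClient(host, 9000)
               for host in ("siteB.example.edu", "siteC.example.edu")]

def set_transport(recording: bool) -> None:
    """Start or stop the recorders at all remote sites at (nearly) the
    same time; fine alignment is still done in post-production."""
    for daw in remote_daws:
        daw.send_message("/transport/record", 1 if recording else 0)

set_transport(True)  # punch in everywhere
```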
Schuller. Network music performance (NMP)
9. ACKNOWLEDGMENT in narrow band networks. In Proc. of the
This project received support from the National Sci- 120th Convention of the Audio Eng. Soc., Paris,
ence Foundation (#0757454). France, May 2006. Audio Engineering Society.
Paper Number 6724.
10. REFERENCES [9] C. Chafe. Distributed internet reverberation for
[1] Alvaro Barbosa, Jorge Cardoso, and Gunter audio collaboration. In Proc. of 24th Int. Conf.
Geiger. Network latency adaptive tempo in the of the Audio Eng. Soc., Ban, Canada, 2003.
public sound objects system. In Proceedings Paper Number 13.
of the 2005 International Conference on New
[10] C. Chafe and M. Gurevich. Network time de-
Interfaces for Musical Expression (NIME05),
lay and ensemble accuracy: Eects of latency,
pages 184187, Vancouver, BC, Canada, 2005.
asymmetry. In Proc. of the 117th Convention
[2] J. Braasch. A loudspeaker-based 3D sound of the Audio Eng. Soc., San Francisco, CA,
projection using virtual microphone control 2004. Audio Engineering Society. Paper Num-
(ViMiC). In Proc. of the 118th Convention of ber 6208.
the Audio Eng. Soc., Barcelona, Spain, May 28
31 2005. Paper Number 6430. [11] C. Chafe, M. Gurevich, G. Leslie, and S. Tyan.
Eect of time delay on ensemble accuracy. In
[3] J. Braasch, N. Peters, and D. Valente. A Proceedings of the International Symposium on
loudspeaker-based projection technique for spa- Musical Acoustics (ISMA04), Nara, Japan,
tial music application using virtual microphone 2004.
AES 127th Convention, New York NY, USA, 2009 October 912
Page 16 of 17
Braasch et al. Telematic Mixing Console
[12] E. Chew, A. Sawchuk, C. Tanoue, and R. Zimmermann. Segmental tempo analysis of performances in performer-centered experiments in the distributed immersive performance project. In Proceedings of the International Conference on Sound and Music Computing 05 (SMC05), Salerno, Italy, November 2005.

[13] E. Chew, A. Sawchuk, R. Zimmerman, V. Stoyanova, I. Toshe, C. C. Kyriakakis, C. Papadopoulos, A. Francois, and A. Volk. Distributed immersive performance. In Proceedings of the 2004 Annual National Association of the Schools of Music (NASM) Meeting, San Diego, CA, November 2004.

[14] E. Chew, R. Zimmermann, A. A. Sawchuk, C. Papadopoulos, C. Kyriakakis, C. Tanoue, D. Desai, M. Pawar, R. Sinha, and W. Meyer. A second report on the user experiments in the distributed immersive performance project. In Proceedings of the 5th Open Workshop of MUSICNETWORK: Integration of Music in Multimedia Applications (MUSICNETWORK 2005), Vienna, Austria, July 2005.

[15] J. R. Cooperstock, J. Roston, and W. Woszczyk. Broadband networked audio: Entering the era of multisensory data distribution. In 18th International Congress on Acoustics, Kyoto, April 2004.

[16] J.-M. Jot. Étude et réalisation d'un spatialisateur de sons par modèles physiques et perceptifs. PhD thesis, Télécom Paris, 1992.

[17] P. Oliveros, J. Watanabe, and B. Lonsway. A collaborative Internet2 performance. Technical report, Offering Research In Music and Art, Orima Inc., Oakland, CA, 2003.

[18] Moonho Park, Laehyun Kim, Heedong Ko, and Hyeran Byun. A mixed environment for tele-meeting between real and virtual human. In International Conference on Intelligent Robots and Systems, IEEE/RSJ, volume 3, pages 1368-1373, 1999.

[19] Renato S. Pellegrini. A Virtual Reference Listening Room as an Application of Auditory Virtual Environments. PhD thesis, Ruhr-University Bochum, Bochum, Germany, 2001.

[20] V. Pulkki. Virtual sound source positioning using vector base amplitude panning. J. Audio Eng. Soc., 45:456-466, 1997.

[21] R. Rowe and N. Rolnick. The technophobe and the madman: an Internet2 distributed musical. In Proc. of the Int. Computer Music Conf., Miami, Florida, November 2004.

[22] F. Schroeder, A. Renaud, P. Rebelo, and F. Gualdas. Addressing the network: Performative strategies for playing apart. In Proc. of the 2007 International Computer Music Conference (ICMC 07), pages 133-140, 2007.

[23] J. C. Steinberg and W. B. Snow. Auditory perspective: physical factors. Electrical Engineering, Jan:12-17, 1934.

[24] Michael T. Umile. Design of a binaural and stereoscopic dummy head. Master's thesis, Rensselaer Polytechnic Institute, Troy, New York, 2009.

[25] M. Wright, A. Freed, and A. Momeni. OpenSound Control: State of the art 2003. In Proceedings of the 2003 Conference on New Interfaces for Musical Expression (NIME-03), pages 154-159, Montreal, Canada, 2003.

[26] W. Würfel. Passive akustische Lokalisation [Passive acoustical localization]. Master's thesis, Technical University Graz, Graz, Austria, 1997.
