
Acoustic and Psychoacoustic issues in Room Correction

James D. (jj) Johnston and Serge Smirnov, Microsoft Corporation

This talk is in two parts:


First, JJ will discuss some basic acoustics and some psychoacoustic issues, and explain how they impact the idea of room correction. Then, Serge will explain how we actually implement these principles in the Vista room correction algorithm.

Acoustics: What does a room do?


[Figure: room impulse response showing the direct signal, early reflections, an unpleasantly large late reflection, and the diffuse tail.]

Early reflections are those more or less under the 10 msec mark. Late (specular) reflections create a problem with perception; more on that later. The example here is egregious.

What else does a room do?


That diffuse field
It is not frequency-flat
Almost always, high frequencies roll off much faster (lower T60) than lower frequencies. It is (mostly) uncorrelated at the two ears, even taking ITDs into account.

A point to recall:
Because high frequencies decay faster than low frequencies (even on a cold, dry day in the desert):
If you measure the early-arrival frequency response, it will show a different frequency balance than that of the entire tail. If you compare the early and late responses, the difference will be even bigger. We're used to listening to things that way, too, because it's what we grow up with.

And a loudspeaker
Radiation patterns of loudspeakers are quite different at different frequencies
Typically, there is little directivity at bass frequencies. As frequency goes up, there is more directivity. Many (consumer) speakers have fairly narrow high-frequency radiation patterns.

So, what does that mean?


Many speakers, both consumer and professional, are not power flat in terms of polar response.
The total radiation from the speaker, not the front radiation, is what is added to the reverberant field. This means that the reverberant field almost always gets proportionally less energy injected at high frequencies than low.

Well, now we combine the two


Several things happen
Due to both the lower T60 at higher frequencies and the radiation pattern of the loudspeaker, there is less energy in the diffuse field at high frequencies.
So, what do we equalize? The first arrival at high frequencies, or the whole thing? What happens if we get that wrong?

There is a first-arrival
There may be a delayed reflection of a first arrival.

There are a variety of early reflections

So, we equalize what?


The long-term frequency response? The short-term frequency response? Some combination of both? How exactly do we equalize the frequency response? How important is inter-channel matching vs. flattening all responses?

Some other acoustical issues


What do we measure?
If we use an omni, we record only pressure.
There are also three other variables at the same point: the volume velocities in each of X, Y, and Z.

If we use a cardioid, we record one combination of volume velocity and pressure


Specifically, we record half of the volume velocity in the direction of the microphone plus half of the pressure at the front of the microphone.

So what do we correct?
Good question

Some things to remember:


1. The eardrum converts PRESSURE into mechanical movement.
2. The head, to some extent, converts velocity to pressure at the ear canal.
3. Our head affects the measurement when it's there listening.

Even more acoustical issues


Sharp zeros in frequency response
This does not mean signal is absent.
It means that there is no PRESSURE (presuming an omni measurement mic) at the point in question. It means that volume velocity is at a peak at that point. The ENERGY in the room is there, but it's in the (mostly) wrong form for the ear at THAT POINT IN THE ROOM.

Adding more energy, therefore, is not a very good solution.

The only time a zero is not a room storage issue is when the loudspeaker has a zero at that frequency.
So fix it, already!

Once more, with feeling: adding more energy to the room while it's storing energy at that frequency is not a solution!

Finally, a note about speakers and linear systems


Speakers are not linear devices. Speakers really aren't linear devices. Speakers, in fact, are rather far from anything approximating a linear device. So, it is a good idea to keep the energy at any one frequency low.
Sweeps don't do that. Allpass sequences spread out the energy at any one frequency across time. This is a good thing.

NOW WHAT?

No, don't abandon ship, the water is only up to your beltline!

For useful answers, we look to the perceptual issues

What do the ears care about?


With the ear, both monaurally and binaurally, FIRST ARRIVAL rules.
The precedence effect, which goes by any number of other names, shows that arrivals on the cochlea just after an attack are masked, even if they are quite a bit larger.
They do contribute to overall timbre.

This means that most really early arrivals are masked

Ear Continued
The first arrival provides a very strong localization effect binaurally.
This localization applies to anything that is correlated at the two ears, including with ITD-range delays. Signals that are not correlated at the two ears are not localized, and are, rather, heard as envelopment.

Localization vs. Intensity


After the time cues are considered, intensity provides us with a variety of spatial cues.
First, HRTFs provide a variety of front/back and up/down cues. Mismatched intensity at the two ears at higher frequencies moves the stereo image.

Remember, though, first arrival rules.

Remember: Specular reflections are correlated at the two ears. The diffuse tail is not.
Some rooms are far, far, far from satisfactorily diffuse, hence flutter echo and like problems. This is not an easy problem to fix.

In the diffuse tail, bass hangs over much more strongly than high frequencies, both initially (due to loudspeaker radiation pattern) and more so later, due to lossy transmission and reflection of sound.

Diffuse perception
Signals that are not correlated (either by waveform at low frequencies or envelope at high frequencies) at the two ears are heard as diffuse or surrounding. This means that we hear the diffuse response of the room as a different (set of) auditory objects than the direct sounds. We are USED to the diffuse sounds being heavily colored in timbre.

Low frequencies
We live, day in and day out, in environments that provide a huge variation in the low-frequency environment.
We're used to it. Nonetheless, huge excursions, especially peaks, are very annoying. Again, remember the rule: don't add energy if there's already too much stored.

So, the message is?


Equalize the direct arrival at high frequencies.
Since we are also used to hearing bumps and dips at low frequencies, equalize the overall frequency response at low frequencies; don't invert the whole thing.

Whatever you do, don't try to completely invert the system, i.e., correct both phase and magnitude.

Why not?
First, what are you inverting? Pressure? Volume velocity? Some of each? Does it relate to what your head/ear does in the sound field? (Hint: NO.) Second, if you try to invert phase, you'll introduce pre-echo unless your fit and inversion are good to 60 dB.
Even if the inversion was that good when you did it, it won't be when you exhale and change the humidity in front of your head.

What matters most to the ear?


First-arrival timbre. Large peaks should be equalized. Large, sharp dips are not to be touched; remember the energy-storage issue. Broad dips can be equalized out for a broader listening area.

Where are we?


Obviously, you need to equalize:
1. Gain from each speaker
2. Delay from each speaker
3. Frequency response, but within limits
   But in what way? Exact? Relative?
4. Try to cancel, to some extent, that single first reflection (the delayed copy of the first arrival), but only at low frequencies.

Why only at low frequencies?


As the listener moves, the mic moves, etc., that delay will change.
If you equalize at high frequencies, a mic in the center of your head will be wrong for both ears. If you equalize only below 500 Hz or so, you get a 0.5-foot-radius space, give or take, where the cancellation makes some sense. You only do SOME cancellation. Even some cancellation removes the boxiness, and does not provide a bizarre experience out of the sweet spot.
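As a rough sanity check on that half-foot figure (this arithmetic is added here, not on the slide), a quarter wavelength at 500 Hz is

$$ \frac{\lambda}{4} = \frac{c}{4f} \approx \frac{343~\mathrm{m/s}}{4 \times 500~\mathrm{Hz}} \approx 0.17~\mathrm{m} \approx 0.56~\mathrm{ft}, $$

so within roughly half a foot of the measurement point the cancelled reflection stays close enough in phase for the cancellation to help rather than hurt.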

The practical outcome


At low frequencies, you're adjusting the overall response of the room, not the details. At high frequencies, you're concerned only with the direct signal and the early reflections. This is almost speaker-plus-speaker-stand correction. In any case, you correct whatever's most egregious: speaker, room, whatever. Fix what you can, and don't touch the rest.

Relative vs. Flat correction


Relative correction:
1. Reduces the image shift and spread
2. Fixes first-arrival (time, frequency response, gain) cues in the soundfield
3. Does not require a calibrated microphone
4. Provides very good stereo imaging

Flattening each channel individually:
1. Requires a calibrated microphone
2. Does not assure channel matching; in fact, the best flattening solution for each speaker will not in general assure the best relative match
3. Fixes first-arrival cues for gain and time just like relative systems
4. Does provide the measurably flattest response

Relative or Flat
Flat costs more for equipment. Flat requires more CPU if done accurately. Flat doesn't fix imaging as well, unless relative is also added, in which case you need even more CPU. Relative is cheaper, both in equipment and CPU. Relative corrects the most obvious defects.

First Reflection Cancellation


This is an individual adjustment for each channel
It removes the boxy sound to some extent. Fixing this for the listening location means that we do put more impairments elsewhere in the room. It can be adjusted to avoid obvious impairments and still have some productive effect. It can clean up boom to some extent as well.

Conclusions:
At low frequencies, correct the overall room response. At high frequencies, correct the first arrival. Always, obviously, correct gain and delay between channels. Relative correction between channels does more perceptually than the same amount of CPU applied to flattening the system analytically. Too much correction is bad: long-window corrections at high frequencies cause the dentist-drill experience, because the system will be equalized to provide way, way too much correction at high frequencies for the first-arrival signal.

After the break:


Serge Smirnov tells us
How to implement a room correction that addresses the perceptual issues, and how to keep the CPU load down at the same time.

The Break
We'll do door prizes after the break. Please take a 15-minute stretch.

Sequence of operations
1. Generate probe signals
2. Measure delays
3. Measure gains
4. Measure frequency response
5. Identify first reflection
(Delays are measured from one set of captures; the rest are measured from a second set of captures.)

Probe generation
Synthesized in the frequency (discrete Fourier) domain:
- Magnitude is the same at all frequencies*
- Phase is continuous across frequency, including at pi and zero
- The extent of the time spread is limited by the phase change; no window is necessary
- DFT values at negative and positive frequencies are complex conjugates, to generate a real signal
Transformed to the time domain using an inverse complex FFT:
- The imaginary part of the complex time-domain signal is zero
- The real part of the complex time-domain signal is the probe signal
(A sketch of this construction appears below.)
*but see next slide
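Here is a minimal sketch of that construction in Python/NumPy. The quadratic (Schroeder-style) phase law is an assumption made for illustration; the slide only says the phase is continuous and controls the time spread.

```python
import numpy as np

def make_flat_probe(n_fft=65536, spread=0.5):
    """Sketch: flat-magnitude probe built in the DFT domain.
    The quadratic phase law below is an assumed stand-in for the
    actual (unspecified) continuous phase used in the product."""
    k = np.arange(n_fft // 2 + 1)                   # non-negative frequency bins
    phase = -np.pi * spread * k * k / (n_fft // 2)  # assumed, continuous phase
    half = np.exp(1j * phase)                       # unit magnitude everywhere
    half[0] = 1.0                                   # DC bin must be real
    half[-1] = 1.0                                  # Nyquist bin must be real
    # Negative-frequency bins are complex conjugates of the positive ones,
    # so the inverse complex FFT comes out (numerically) real.
    spectrum = np.concatenate([half, np.conj(half[-2:0:-1])])
    probe = np.fft.ifft(spectrum)
    return probe.real                               # imaginary part is ~0
```

Because the magnitude is flat and the phase spreads the energy out in time, no window is needed and no single frequency ever carries much instantaneous energy, which is the point made earlier about not stressing the (nonlinear) loudspeaker.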

Narrowband vs. Wideband Probe


We actually generate two probes
The wideband probe is used for identifying the system impulse response. A narrowband probe is used as a matched filter to capture time and delay, while rejecting low- and high-frequency interference (noise).

Probe Generation

Characteristics of the Probe Signal

[Plots: time domain, autocorrelation, spectrum, unwrapped phase]

Cross-channel delay probing

- Silence between probes (for the room to settle)
- Extra marker probe at the end to detect timing glitches in audio capture/playback
- LS/RS could be LR/RR
- Can also do a 7-channel or other arrangement, using the same method

Capture from mic for delay probing

Can you find the pulses?

Delay analysis: Hilbert (aka analytic) envelope
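A minimal sketch of that delay measurement, assuming the matched-filter-plus-envelope reading of these slides (the peak-picking details are an assumption):

```python
import numpy as np
from scipy.signal import fftconvolve, hilbert

def estimate_delay(capture, narrow_probe, fs):
    """Sketch: matched-filter the mic capture with the narrowband probe,
    take the analytic (Hilbert) envelope, and locate its peak."""
    # Correlation with the probe = convolution with the time-reversed probe.
    matched = fftconvolve(capture, narrow_probe[::-1], mode="full")
    envelope = np.abs(hilbert(matched))      # analytic envelope, as on the slide
    peak = int(np.argmax(envelope))          # strongest arrival
    offset = peak - (len(narrow_probe) - 1)  # remove the correlation lag offset
    return offset / fs                       # delay in seconds
```

The band-limited probe plus the envelope operation is what produces the noise rejection visible in the following plot.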

What comes out


Note the noise rejection

Probe Autocorrelation

Gain, Freq response, etc. measurements (this happens for each channel separately)

N takes are used for wide probe in case sporadic room noises interfere

Gain analysis (per channel, per take)

Gain is derived from the 800-2000 Hz average of the power spectrum coefficients. Only the first N (128) samples of the impulse response are used. (A sketch follows after this list.)

Then, for each channel, throw away outliers and average the rest
Finally, normalize all gains relative to the channel with the highest/lowest gain

Reject the results if there is too much variation
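A sketch of that per-take gain estimate; the FFT size and the choice of the loudest channel as the reference are assumptions (the slide says only "highest/lowest"):

```python
import numpy as np

def channel_gain(impulse_response, fs, n_head=128, f_lo=800.0, f_hi=2000.0):
    """Sketch: average the power spectrum of the first 128 samples of the
    impulse response over the 800-2000 Hz band, as described on the slide."""
    head = impulse_response[:n_head]
    n_fft = 1024                                   # assumed FFT size
    power = np.abs(np.fft.rfft(head, n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, 1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return float(np.mean(power[band]))

def normalize_gains(band_powers):
    """Sketch: normalize all channels relative to the loudest one (assumed reference)."""
    ref = max(band_powers)
    return [p / ref for p in band_powers]
```

Outlier rejection across the N takes, and rejecting the whole result if the takes vary too much, wrap around these per-take numbers as the slide describes.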

Frequency domain deconvolution

1. Power spectrum of the captured signal
2. Power spectrum of the captured signal complex-divided by the FFT of the probe
3. First 400 samples of IFFT( FFT(capture) / FFT(probe) ) (sketched below)
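The core of step 3 in code, as a minimal sketch; the small regularizing epsilon is an addition for numerical safety, not something stated on the slide:

```python
import numpy as np

def deconvolve(capture, probe, keep=400, eps=1e-12):
    """Sketch: frequency-domain deconvolution, IFFT(FFT(capture)/FFT(probe)),
    keeping the first 400 samples of the resulting impulse response."""
    n = len(capture)
    h = np.fft.ifft(np.fft.fft(capture, n) / (np.fft.fft(probe, n) + eps))
    return np.real(h[:keep])
```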

Frequency response analysis (per channel, per take)

Deconvolution by way of division in the frequency domain. Then, for each channel, throw away outliers and average the rest. Finally, if relative response correction is specified, normalize all responses relative to the average of all channels.

Computation of FIR correction filter (with apologies)

Separate correction filters are computed for low vs. high frequencies. Each filter assumes that the part it doesn't handle is flat. Durbin LPC is used to obtain all-zero inverse filters (normally Durbin LPC is used to obtain all-pole direct filters); a sketch follows below. The transition between low and high is done in the log(power spectrum) domain. The low- and high-frequency correction filters are convolved to obtain the final filter. The final filter is then (not shown) normalized for unity average gain over 800-2000 Hz.
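A sketch of the Durbin LPC step, assuming the usual route of taking the autocorrelation from the measured power spectrum; the model order here is an arbitrary illustration value:

```python
import numpy as np

def lpc_inverse_fir(power_spectrum, order=32):
    """Sketch: Levinson-Durbin recursion on the autocorrelation implied by
    the measured power spectrum.  The resulting A(z) coefficients, used
    directly as FIR taps, form the all-zero inverse filter, because
    1/A(z) is the conventional all-pole model of the measured response."""
    # Autocorrelation = inverse FFT of the (full, two-sided) power spectrum.
    r = np.real(np.fft.ifft(power_spectrum))
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                         # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]    # update earlier coefficients
        a[i] = k
        err *= (1.0 - k * k)                   # prediction-error update
    return a                                   # FIR taps of the inverse filter
```

Two such filters, one fit below the crossover and one above (each assuming the other band is flat), are then convolved to form the final FIR, as the slide states.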

Location of First reflection


Computed from analytic envelope

Denominator of all-pole reflection cancellation filter

The reflection correction filter has a trivial numerator (1). The denominator uses (upside down) the coefficients of a specially crafted M-tap symmetric low-pass FIR, positioned at a distance determined by the reflection delay. I.e., it recursively subtracts a low-pass-filtered version of the echo.
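A sketch of that all-pole canceller; the tap count, cutoff, and the use of scipy's firwin to build the symmetric low-pass are assumptions for illustration (the actual filter is described only as "specially crafted"):

```python
import numpy as np
from scipy.signal import firwin, lfilter

def cancel_first_reflection(x, delay_samples, reflection_gain, fs,
                            m_taps=31, cutoff_hz=500.0):
    """Sketch: H(z) = 1 / (1 + g * z^-D * LP(z)).  The numerator is 1; the
    denominator places a symmetric low-pass FIR at the reflection delay,
    so the recursion subtracts a low-pass-filtered copy of the echo."""
    lp = firwin(m_taps, cutoff_hz, fs=fs)          # M-tap symmetric low-pass FIR
    a = np.zeros(delay_samples + m_taps)
    a[0] = 1.0
    a[delay_samples:delay_samples + m_taps] = reflection_gain * lp
    b = np.array([1.0])                            # trivial numerator
    # A sufficient stability condition is |gain| * sum(|lp|) < 1, which can be
    # checked from the stored gain and delay alone (see the next slide).
    return lfilter(b, a, x)
```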

The Low Pass filter


Provides a wider sweet spot. Avoids flutter-echo problems off-axis. Ensures 100.00% filter stability.
The data for the filter is internal to the rendering engine. The stored information is only gain and delay, which can be trivially tested for stability at startup time.

The Rendering Engine


- Applies per-channel gain, delay, an FIR filter (frequency response correction), and an IIR filter (reflection cancellation); see the sketch below
- Is low complexity in CPU, RAM, and ROM
- Allows application of a partial profile (say, to 2 channels of a 7.1 profile)
- Allows limited application of a profile generated for a different sample rate
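A minimal per-channel sketch of that chain; the ordering of the four operations is an assumption, since the slide only lists them:

```python
import numpy as np
from scipy.signal import lfilter

def render_channel(x, gain, delay_samples, fir_taps, iir_b, iir_a):
    """Sketch: per-channel gain, integer-sample delay, FIR frequency-response
    correction, then IIR reflection cancellation."""
    y = gain * np.asarray(x, dtype=float)
    y = np.concatenate([np.zeros(delay_samples), y])  # per-channel delay
    y = lfilter(fir_taps, [1.0], y)                   # FIR correction filter
    y = lfilter(iir_b, iir_a, y)                      # reflection cancellation
    return y
```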

Questions?
