AudioCourse 2011

1
Audio Signal Processing

(EE4475)
Prepared by Dr. Farook Sattar
Taught by Dr. Soon Ing Yann
Division of Information Engineering
Room: S2-B2c-114
Tel: 67905638
Email: eiysoon@ntu.edu.sg
Revised : 2011
2
Course Syllabus (Part A)
Fundamentals of Hearing (Part I)

3-D Sound Localization and Synthesis (Part II)
3
Course Contents
Fundamentals of Hearing
Sound, Power, Intensity and Decibels,
Loudness Perception, Equal Loudness
Curves, Sound Masking, Thresholds of
Hearing, Engineering Models of Auditory
System, Critical Band
4
Course Contents
3-D Sound Localization and Synthesis
Auditory Localization Cues, Interaural
Transfer Function (ITF), Head-Related
Transfer Function (HRTF), 3-D Sound
Synthesis, Crosstalk Cancellation
5
References for Fundamentals of
Hearing (Part I)

William A. Yost,
Fundamentals of Hearing An Introduction,
Academic Press, 2000.

Udo Zölzer,
Digital Audio Signal Processing,
John Wiley & Sons, 1997.

6
References for 3-D sound
localization and synthesis (Part II)
William G. Gardner,
3-D Audio Using Loudspeakers,
Kluwer Academic Publisher, 1998

John Garas,
Adaptive 3-D Sound Systems,
Kluwer Academic Publisher, 2000

Durand R. Begault,
3-D Sound for Virtual Reality and Multimedia,
Academic Press, 1994

7
Reference web-sites
Physics of sound
http://interface.cipic.ucdavis.edu/sound/tutorial/physics.html#sines

Sound and Hearing
http://hyperphysics.phy-astr.gsu.edu/hbase/hframe.html

Matlab codes
http://sound.media.mit.edu/ica-bench/

3-D audio synthesis
http://www.hitl.washington.edu/scivw/EVE/I.B.1.3DSoundSynthesis.html

3-D audio & Demos
http://interface.cipic.ucdavis.edu/sound/tutorial/

8

Part I
Fundamentals of Hearing
1.1 Physics of Sound
9
10
Sound: How is sound created and
how does it propagate?
Sound originates from a disturbance of the air
by any object.
For example, two hands clapping cause a
disturbance of the air around the hands:
the hands are the source of the sound.
The local region of air has gained energy
caused by the motion of the air molecules. This
energy spreads outwards in sound waves.

http://interface.cipic.ucdavis.edu/sound/tutorial/physics.html#sines
http://hyperphysics.phy-astr.gsu.edu/hbase/hframe.html

How do we produce sound?
12
Sound
13
Sound
An interesting source of acoustic energy today is
the loudspeaker: the cone of the loudspeaker
vibrates in the air causing disturbances
dependent on the electrical signals reaching the
loudspeaker from the sound system.
Effectively, the loudspeaker converts electrical
energy into sound energy, which travels through
the air as waves radiating from the loudspeaker.
Sound travels through the air at about 340
metres (m) per second (sec).

14
Sound

A microphone converts variations in acoustical
sound pressure into variations in electrical
voltage.
15
Sound - Representation
Sound is often represented
diagrammatically as a sine wave.

The wave crests can be considered as the
pressure maxima, whilst the troughs
represent the pressure minima.
16
Sound - Representation

Put another way, a compression of the air results in
a positive voltage and a rarefaction results in a
negative voltage.
17
Sound - Physical Variables
Two physical measurements describe this sound wave
completely: they are the amplitude and the period.

We usually speak, not of the period, but of the
frequency of vibrations, which is simply the inverse of
the period (i.e. 1/period).

The frequency simply measures the number of waves
that travel by in each unit of time. It is usually measured
in Hertz (Hz), 1 Hz is 1 cycle per second. (1 kHz = 1000
Hz).

The range of human hearing is from 20 Hz to 20,000
Hz.

18
For instance, if the source is vibrating at
100 Hz, it completes one cycle every one
hundredth of a second.

The sound will then travel 340 (m/sec) x
1/100 (sec/cycle) = 3.4 m/cycle.

19

This distance per cycle is called the wavelength
of the sound wave (λ):
$\lambda = c / f$    (1)
where
c: sound velocity
f: frequency
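Eq. (1) can be checked with a few lines of Python. This is a minimal sketch, assuming the speed of sound of 340 m/s quoted earlier; the function name `wavelength` is only illustrative.

```python
def wavelength(frequency_hz, speed_of_sound=340.0):
    """Wavelength in metres from Eq. (1): lambda = c / f."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return speed_of_sound / frequency_hz


# The limits of hearing quoted above, plus the 100 Hz worked example.
for f in (20, 100, 1000, 20000):
    print(f"{f:>6} Hz -> {wavelength(f):.3g} m")
```

Across the audible range the wavelength spans roughly three orders of magnitude, from several metres down to a couple of centimetres.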
20

Wavelength
21

22
Types of Sound Waves
Spherical Wave: In practice, most sound
sources are quite small and, therefore, the
sound is produced in the form of spherical
wave, in which sound waves travel from
the source in every direction.
Plane Wave: A plane surface (theoretically
infinite plane) will produce a plane wave.

23
Analogy Between Electrical &
Acoustical Cases
The parallel between the acoustical and electrical
cases should be clear:
In both cases there is a variable that represents
the fundamental disturbance:
- A voltage x(t) that can be either positive or
negative, or
- A pressure x(t) that can be either a compression
(positive) or a rarefaction (negative).
24
Analogy Between Electrical &
Acoustical Cases
By squaring x(t), we obtain an energy-like
quantity, a power or an intensity, that can
only be positive or zero.

Voltage <-> Pressure
Power <-> Intensity

25
Power
If the signal x(t) represents a voltage, the power
delivered is simple to describe when the voltage
drives a load that is purely resistive, i.e., there
is no reactive (capacitive or inductive) part to
the load.

If the resistance of the load is R then, by
Ohm's law, the current in amperes is
i(t) = x(t)/R, and the instantaneous power at
time t is the product of the instantaneous voltage
and the instantaneous current:
$P(t) = x(t)\, i(t) = x^2(t) / R$    (2)
26
Average Power
The average power, $\bar{P}$, averaged over a
duration of time $T_D$, is given by
$\bar{P} = \frac{1}{T_D} \int_0^{T_D} P(t)\, dt = \frac{1}{T_D} \int_0^{T_D} \frac{x^2(t)}{R}\, dt$    (3)
27
Average Power
For a quasi-periodic signal the averaging
duration $T_D$ must be long compared to the
period T, or it must be equal to an
integral number of periods.
For a periodic signal the duration $T_D$ can
be a single period T.
28

Average Power in terms of RMS-
Value
The root-mean-square (rms) value of the signal is
defined as the numerical value of a constant
which would give the same power as the average
power of the signal, $\bar{P}$.

In other words, by definition
$\frac{x_{RMS}^2}{R} = \bar{P} = \frac{1}{T_D} \int_0^{T_D} P(t)\, dt$    (4)
29
Average Power in terms of RMS-
Value
Therefore,
$x_{RMS} = \sqrt{\frac{1}{T_D} \int_0^{T_D} x^2(t)\, dt}$    (5)
30
Example:
For a sine wave signal, $x(t) = A \sin(\omega t)$, we
can find the average power $\bar{P}$ as
$\bar{P} = \frac{1}{T} \int_0^T A^2 \sin^2(\omega t)\, dt,$    (6)
where T is the period (assuming R = 1). After
simple manipulations, the average power for a
sine wave of amplitude A is:
$\bar{P} = \frac{A^2}{2}$    (7)
31
Proof:
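$\bar{P} = \frac{1}{T} \int_0^T A^2 \sin^2(\omega t)\, dt$
$= \frac{A^2}{T} \int_0^T \frac{1}{2} \left[ 1 - \cos 2\omega t \right] dt$
$= \frac{A^2}{2T} \left[ \int_0^T dt - \int_0^T \cos 2\omega t\, dt \right]$
$= \frac{A^2}{2T} \left[ T - \frac{\sin 2\omega T}{2\omega} \right]$
$= \frac{A^2}{2} - \frac{A^2}{2T} \cdot \frac{\sin 2\omega T}{2\omega}$
$= \frac{A^2}{2}$, since $\sin 2\omega T = \sin 4\pi = 0$.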
32
Proof:
The root-mean-square (RMS) value is the square
root of the average power and, therefore,
$x_{RMS} = \sqrt{\frac{A^2}{2}} = \frac{A}{\sqrt{2}} = 0.707 A$    (8)
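The results in Eqs. (7) and (8) can be verified numerically. The sketch below assumes R = 1 and replaces the integral over one period by an average over equally spaced samples; the helper names `average_power` and `rms` are illustrative only.

```python
import math


def average_power(samples, resistance=1.0):
    """Discrete version of Eq. (3): mean of x^2 / R over the samples."""
    return sum(x * x for x in samples) / (len(samples) * resistance)


def rms(samples):
    """Discrete version of Eq. (5): square root of the mean of x^2."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


A = 2.0   # amplitude of the sine wave (arbitrary choice)
N = 1000  # number of samples spanning exactly one period
one_period = [A * math.sin(2.0 * math.pi * n / N) for n in range(N)]

print("average power:", average_power(one_period), "expected A^2/2 =", A * A / 2)
print("rms value    :", rms(one_period), "expected 0.707 A =", A / math.sqrt(2))
```

Averaging over exactly one period matters here: a window that is neither long nor a whole number of periods would bias both values.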
33
Sound Intensity
Sound intensity is a measure of the power of a
sound as it contacts an area, such as the
eardrum, and it is directly proportional to the square
of the amplitude of the waveform.

The instantaneous intensity of a sound I(t) is
expressed as power per unit area. It is the
product of the instantaneous values of the
pressure, x(t) and velocity, u(t).
34
Sound Intensity
$I(t) = x(t)\, u(t)$    (9)
In MKS units, the intensity is measured in
watts/m^2 and the velocity in m/s, while the
pressure is measured in Newtons/m^2, or Pascals.

1 Pascal = 1 Pa = 1 Newton/m^2
35
Relation Between Intensity & Pressure
For a sound wave, the velocity u(t) is related to the
pressure x(t) as
$u(t) = x(t) / (\rho c)$    (10)
where:
ρc: specific impedance of the medium in which the sound
wave propagates, i.e. the product of the density ρ (units of
kg/m^3) and the speed of sound c (units of m/s).

The expression for the intensity as a function of the pressure
is:
$I(t) = x^2(t) / (\rho c)$    (11)
36
Inverse square law
The law of decreasing power per unit area
(intensity) of a wavefront with increasing
distance from the source is known as the
inverse square law, because intensity drops in
proportion to the inverse square of the distance
from the source:
$I \propto 1 / r^2$    (12)
I: sound intensity; r: distance
Why is this?
37
Inverse square law
It is because the sound power from a point
source is spread over the surface area of a
sphere (S), which is given by:
$S = 4 \pi r^2$    (13)
where r is the distance from the source, or the
radius of the sphere.
38
Inverse square law

39
Inverse square law
If the original power of the source is W watts,
then the intensity I (i.e. power per unit area)
at distance r is:
$I = W / (4 \pi r^2)$    (14)
40
Inverse square law
For example, if the power of a source is
0.1 watt, the intensity at 4 m distance
would be:
$I = 0.1 / (4 \times 3.14 \times 16) \approx 0.0005\ \text{W/m}^2$    (14a)
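The worked example in Eq. (14a) can be reproduced as follows. This is a small sketch of Eq. (14) for an ideal point source in free space; the function name `intensity_at_distance` is an illustrative choice.

```python
import math


def intensity_at_distance(source_power_w, distance_m):
    """Eq. (14): power spread evenly over a sphere of radius r, in W/m^2."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return source_power_w / (4.0 * math.pi * distance_m ** 2)


i_4m = intensity_at_distance(0.1, 4.0)
i_8m = intensity_at_distance(0.1, 8.0)
print(f"I at 4 m: {i_4m:.2e} W/m^2")
print(f"I at 8 m: {i_8m:.2e} W/m^2 (ratio {i_4m / i_8m:.1f})")
```

Doubling the distance divides the intensity by four, which is the inverse square law in its simplest form.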
41
42
Inverse square law
1.2 DECIBEL SCALE
43
44
Sound Measurement in
Decibel (dB) Scale
It is common to refer to the power of a
signal, or the intensity of a sound, as a sound
level.
Sound levels are measured in Decibel
(dB). It is a logarithmic measure.
45
Decibel Scale
Acousticians use the dB scale for the
following reasons:
1) Quantities of interest often exhibit such
huge ranges of variation that a dB scale is
more convenient than a linear scale.
2) The human ear interprets loudness in a way
that is more easily represented with a logarithmic
scale than with a linear scale.

Decibel Scale demo
Broadband noise is reduced in 10 steps of 6
decibels. Demonstrations are repeated once.
Free-field speech of constant power at various
distances from the microphone
47
Decibel Scale
In fact, the decibel is a relative unit showing
the change in power or intensity.
Suppose $P_1$ and $P_2$ are the powers
of two signals; then the level difference in decibels (dB) is:
$\text{dB} = 10 \log_{10}(P_2 / P_1)$    (15)
where $P_1$ is the power of the reference signal, or
reference power.
48
Decibel Scale
Example:
If the power ratio is 2:1, then
$\text{dB} = 10 \log_{10} 2 = 10 \times 0.301 = 3.01 \approx 3$

If the power ratio is 4:1, then
$\text{dB} = 10 \log_{10} 4 = 10 \times 0.602 = 6.02 \approx 6$

If the ratio is inverted (1:2 or 1:4), the number of decibels is
unchanged in magnitude; only a minus sign (-) is put in front.
49
Decibel Scale
The sound intensity level (SIL) of the
signal in decibels can be calculated by
comparing it with the accepted reference
level of $10^{-12}$ W/m^2.
Example:
Referring to Eq. (14a) on page (38),
$\text{SIL(dB)} = 10 \log_{10}[(5 \times 10^{-4}) / 10^{-12}] = 87$ dB
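The same SIL calculation can be written as a short helper. This is a sketch using the reference intensity of 10^-12 W/m^2 stated above; the name `sound_intensity_level_db` is illustrative.

```python
import math

I_REF = 1e-12  # reference intensity in W/m^2, as stated above


def sound_intensity_level_db(intensity, reference=I_REF):
    """Sound intensity level in dB relative to the reference intensity."""
    if intensity <= 0:
        raise ValueError("intensity must be positive")
    return 10.0 * math.log10(intensity / reference)


# Intensity from Eq. (14a): 0.1 W source measured at 4 m.
print(f"SIL = {sound_intensity_level_db(5e-4):.1f} dB")
```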
50
Decibel Scale
If the decibel is used to compare values other
than signal powers, the relationship to signal
power must be taken into account.
Voltage has a square relationship to power (from
Ohm's law: W = V^2/R); thus to compare two
voltages:

$\text{dB} = 10 \log_{10}(V_2^2 / V_1^2) = 10 \log_{10}(V_2 / V_1)^2 = 20 \log_{10}(V_2 / V_1)$    (16)
51
Decibel Scale
For example, the difference in decibels between
a signal with a voltage of 1 volt and one of 2
volts is:
$20 \log_{10}(2/1) = 6$ dB

whereas a doubling in power gives rise to an
increase of only 3 dB.

A similar relationship applies to sound pressure
(analogous to electrical voltage) and sound
intensity (analogous to electrical power).
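The difference between the "10 log" and "20 log" forms of Eqs. (15) and (16) is easy to see side by side. The two helper names below are illustrative, not part of any library.

```python
import math


def db_from_power_ratio(ratio):
    """Eq. (15): level difference for a ratio of powers or intensities."""
    return 10.0 * math.log10(ratio)


def db_from_amplitude_ratio(ratio):
    """Eq. (16): level difference for a ratio of voltages or pressures."""
    return 20.0 * math.log10(ratio)


for r in (0.5, 2.0, 4.0, 10.0):
    print(f"ratio {r:>4}: power {db_from_power_ratio(r):+6.2f} dB, "
          f"amplitude {db_from_amplitude_ratio(r):+6.2f} dB")
```

A ratio below 1 produces the same magnitude with a negative sign, and an amplitude ratio always gives twice the decibel value of the same power ratio.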

52
Observations with the dB scale
The ear can just detect a change of 1 dB for a
steady tone (i.e. a sinusoidal signal).
It can just detect a change of 3 dB for speech
or music (i.e. a complex signal consisting of a
number of sinusoidal signal components).
The human auditory system is capable of
functioning over a wide range of intensities,
e.g., 60-120 dB.
53
Absolute Acoustical decibel
An absolute acoustical decibel (dB) scale is
referenced to the minimal threshold of hearing.
There are two such references, one with pressure
and one with intensity.
The reference pressure, $x_0$, is 20 micropascals (20 μPa),
1 Pa being a pressure of 1 newton per square
meter; 20 μPa is the minimal threshold of hearing
at 1,000 Hz for most people, so it is a convenient
reference.
54
Therefore, a signal with RMS pressure $x_{RMS}$ has
a sound pressure level (SPL):

$\text{SPL} = 20 \log_{10}(x_{RMS} / x_0) = 20 \log_{10}[x_{RMS} / (2 \times 10^{-5})]$    (17)

where $x_0 = 2 \times 10^{-5}$ Pa $= 20$ μPa
55
For a sound wave the reference intensity
$I_0$ is obtained from the reference pressure,
$x_0$, by

$I_0 = x_0^2 / (\rho c)$    (18)

where $\rho c$ is the impedance of air, which is
415 Rayls in MKS units.
56
Thus, the value of $I_0$ is

$I_0 = (2 \times 10^{-5})^2 / 415 = 0.964 \times 10^{-12} \approx 10^{-12}$ W/m^2

Therefore, a signal with intensity I has a sound
intensity level (SIL):

$\text{SIL(dB)} = 10 \log_{10}(I / I_0) = 10 \log_{10}(I / 10^{-12})$    (19)
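Eqs. (17) to (19) can be tied together numerically. The sketch below uses the reference pressure of 20 μPa and the impedance of 415 rayls quoted above; the function names are illustrative. It also shows the small gap between SPL and SIL that comes from rounding the reference intensity to 10^-12 W/m^2.

```python
import math

P_REF = 2e-5       # reference pressure x_0 in Pa
RHO_C_AIR = 415.0  # specific impedance of air in rayls, as quoted above
I_REF = 1e-12      # rounded reference intensity in W/m^2


def reference_intensity(p_ref=P_REF, rho_c=RHO_C_AIR):
    """Eq. (18): intensity corresponding to the reference pressure."""
    return p_ref ** 2 / rho_c


def spl_db(p_rms, p_ref=P_REF):
    """Eq. (17): sound pressure level of an RMS pressure in Pa."""
    return 20.0 * math.log10(p_rms / p_ref)


def sil_from_pressure_db(p_rms, rho_c=RHO_C_AIR, i_ref=I_REF):
    """Eq. (11) followed by Eq. (19): SIL of a wave with the given RMS pressure."""
    intensity = p_rms ** 2 / rho_c
    return 10.0 * math.log10(intensity / i_ref)


print(f"I_0 from Eq. (18): {reference_intensity():.3e} W/m^2")
print(f"SPL of 0.02 Pa   : {spl_db(0.02):.2f} dB")
print(f"SIL of 0.02 Pa   : {sil_from_pressure_db(0.02):.2f} dB")
```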
57
SIL(dB) & SPL(dB)
$\text{SIL(dB)} = 10 \log \left( \frac{I}{I_0} \right) = 10 \log \left( \frac{C P^2}{C P_0^2} \right) = 20 \log \left( \frac{P}{P_0} \right)$
where
I: sound intensity
I_0: reference intensity
P: sound pressure
P_0: reference pressure
C: constant
SPL & SIL
SPL and SIL are in practice meant to be
equivalent.
http://www.pmel.noaa.gov/vents/acoustics/tutorial/7-SPL-SIL.html
Some very minor numerical differences
exist due to rounding (slide 53)
Nowadays the term SPL is more
commonly used than SIL.
For this course, they will be treated as the
SAME.

58
59
Intensity
Absolute vs Relative Scale
Intensity: amount of energy transmitted per second over an area of one square meter
Softest sound: $10^{-12}$ w/m^2; loudest sound: $10^{2}$ w/m^2
Intensity reference: $I_r = 10^{-12}$ watts/m^2
Ratio, $I_x / I_r$: range 1 to 100,000,000,000,000
Logarithmic, $\log(I_x / I_r)$: range 0 to 14
dB IL $= 10 \log(I_x / I_r)$: range 0 to 140
60
Adding Decibels (of equal intensity)
Sound A is 60 dB IL ($10^{-6}$ w/m^2).
Sound B is 60 dB IL ($10^{-6}$ w/m^2).
What is the combined dB IL (when both sounds are on)?
dB IL $= 10 \log \left( \frac{10^{-6} + 10^{-6}}{10^{-12}} \right)$
dB IL $= 10 \log (2 \times 10^{6})$
dB IL $= 10 \times 6.3$
$= 63$ dB IL
61
Adding Decibels (of equal pressure)
Sound A is 60 dB SPL (0.02 Pa).
Turn up sound A so that it has twice the pressure (0.04 Pa).
How many dB SPL is sound A now?
dB SPL $= 20 \log \left( \frac{0.02 + 0.02}{0.00002} \right)$
dB SPL $= 20 \log (2000)$
dB SPL $= 20 \times 3.3$
$= 66$ dB SPL
62
Adding Decibels
Cannot directly add dB
e.g., 72 dB + 72 dB is NOT 144 dB
Can directly add intensities (w/m^2) or pressure (Pa)
e.g., $10^{-7}$ w/m^2 + $10^{-7}$ w/m^2 IS $2 \times 10^{-7}$ w/m^2
Therefore, when adding, convert dB into w/m^2 or Pa,
and then convert back to dB
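The convert, add, convert-back procedure can be sketched as below. It assumes the sources are independent, so that their intensities (not their pressures) add; the function name `combine_levels_db` is illustrative. Because only intensity ratios matter, the reference intensity cancels and does not appear in the code.

```python
import math


def combine_levels_db(levels_db):
    """Combined level of independent sources: sum the relative intensities."""
    total = sum(10.0 ** (level / 10.0) for level in levels_db)
    return 10.0 * math.log10(total)


print(f"60 dB + 60 dB -> {combine_levels_db([60.0, 60.0]):.1f} dB")
print(f"72 dB + 72 dB -> {combine_levels_db([72.0, 72.0]):.1f} dB (not 144 dB)")
print(f"80 dB + 60 dB -> {combine_levels_db([80.0, 60.0]):.2f} dB")
```

Two equal sources always give an increase of about 3 dB, while a much weaker second source barely changes the total.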
63
Adding Decibels
The JMU music school has a brass ensemble that consists of a
trumpet (83.4 dB SPL), french horn (78.1 dB SPL), trombone (85.0
dB SPL), and a tuba (86.3 dB SPL). What is the level of the
music when the entire ensemble is playing together?
trumpet: 83.4 dB SPL = 2.19 x 10^-4 w/m^2
french horn: 78.1 dB SPL = 6.46 x 10^-5 w/m^2
trombone: 85.0 dB SPL = 3.16 x 10^-4 w/m^2
tuba: 86.3 dB SPL = 4.27 x 10^-4 w/m^2
All together: 1.03 x 10^-3 w/m^2 = 90.1 dB SPL

64
Adding Decibels
You are attending an outdoor lecture held in a temporary shelter with a tin
roof. Of course, you take your sound level meter with you. You measure
the speaker at 77.3 dB SPL. It then starts to rain, which makes a terrible
racket. You measure the speaker together with the rain at 78.9 dB SPL.
What was the level of the rain?
speaker: 77.3 dB SPL = 5.37 x 10^-5 w/m^2
speaker + rain: 78.9 dB SPL = 7.76 x 10^-5 w/m^2
rain alone: 2.39 x 10^-5 w/m^2 = 73.8 dB SPL
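The rain example runs the same procedure in reverse: subtract the known intensity from the total, then convert back to dB. A minimal sketch, assuming independent sources; the name `remove_level_db` is illustrative.

```python
import math


def remove_level_db(total_db, known_db):
    """Level of the unknown source, given the total and one known contributor."""
    if known_db >= total_db:
        raise ValueError("the known source cannot be as loud as the total")
    remainder = 10.0 ** (total_db / 10.0) - 10.0 ** (known_db / 10.0)
    return 10.0 * math.log10(remainder)


rain_db = remove_level_db(78.9, 77.3)
print(f"rain alone: {rain_db:.1f} dB SPL")
```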

65
Example
# For an omnidirectional point source in an
anechoic room (a room with essentially no
reflective boundaries) the received sound
pressure level (SPL) at a microphone placed
3 m away is 80 dB.
(i) What is the SPL if the microphone is
placed 6 m away?

66
Example (contd)
(ii) If the same source and microphone are
put in a highly reverberant room in which
the energy from reflections in the first 40
ms is much smaller than the energy from
later reflections, what is the corresponding
change in SPL when the distance between
the source and microphone is doubled?
67
We know that
$I \propto P^2$    (1)
where I is the sound intensity and P is
the sound pressure.
Considering the assumptions of a point
source and lack of reflective boundaries,
we also know that
$I \propto 1 / r^2$    (2)
where r is the distance from the source.
68
From (1) and (2), we find
$P \propto 1 / r$
So, in the anechoic room, if we double the
distance from 3 m to 6 m, the pressure
becomes half and thus the SPL goes down by
$20 \log(1/2) = -6$ dB to 74 dB.
69
(ii) In the case of highly reverberant room,
since direct sound and the sound from
the first reflections are low in energy in
comparison with later reflections, to a
good approximation the sound is diffuse
(largely due to many reflections) and the
SPL is essentially unchanged with
doubling of distance.
70
Inverse Square Law (Example)
Marv is standing 8 feet from a sound source and
measures a sound to be 80 dB SPL. Del is standing
32 feet from the sound source. How many dB SPL
would the sound be when it arrives at Del?
$\Delta\,\text{dB SPL} = 20 \log d$, where d is the ratio of the two distances
$\Delta\,\text{dB SPL} = 20 \log (32 / 8) = 12$ dB
$\text{dB SPL} = 80 - 12 = 68$ dB SPL
Marv (8 feet): 80 dB SPL
Del (32 feet): 68 dB SPL
71
Inverse Square Law (Example)
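$\Delta\,\text{dB SPL} = 20 \log d$
For a distance ratio of 10,560 : 200, starting from 80 dB SPL and moving closer to the source:
$\Delta\,\text{dB SPL} = 20 \log (10{,}560 / 200) = +34.5$ dB
$\text{dB SPL} = 80 + 34.5 = 114.5$ dB SPL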
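The inverse square law examples all follow one formula: the SPL changes by 20 log of the distance ratio. The sketch below assumes an ideal point source with no reflections; the function name `spl_at_distance` is illustrative.

```python
import math


def spl_at_distance(spl_ref_db, r_ref, r_new):
    """SPL at r_new given the SPL at r_ref (same units for both distances)."""
    if r_ref <= 0 or r_new <= 0:
        raise ValueError("distances must be positive")
    return spl_ref_db - 20.0 * math.log10(r_new / r_ref)


print(f"3 m -> 6 m        : {spl_at_distance(80.0, 3.0, 6.0):.1f} dB SPL")
print(f"8 ft -> 32 ft     : {spl_at_distance(80.0, 8.0, 32.0):.1f} dB SPL")
print(f"10,560 -> 200     : {spl_at_distance(80.0, 10560.0, 200.0):.1f} dB SPL")
```

Moving away from the source lowers the level; moving closer raises it by the same rule.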
1.3 Loudness, Phons and
Equal Loudness Curves
72
73
Loudness Perception
At the simplest level, loudness relates to
intensity, and pitch relates to frequency.
Thus, a loud sound is one of high intensity
(corresponding to a substantial flow of energy),
while a sound of high pitch is one of high
frequency.
In practice, however, the two factors of intensity
and frequency interact and the loudness of a
sound depends on both.
74
Loudness Perception
The loudness level of a sound is defined as the
level (in dB SPL) of a 1000 Hz tone judged to be as
loud as the sound under test.
Thus, if a test tone of 100 Hz is
considered then a listener is asked to
adjust the level of a 1000 Hz tone until it
sounds equally loud. The level of the 1000
Hz tone (in dB) is then called the loudness
level (in phons) of the 100 Hz test tone.
75
Loudness Perception
Loudness is a subjective quantity and
therefore, it can not be measured directly.
However, in practice, it is useful to be
able to assign numerical values to the
perception of loudness.
76
Phon and Equal Loudness Curves
Phon:
A unit that depends only on the judgement
of equality between two sounds; its average
value over a group of listeners is found to
be a consistent measure of loudness level.
77
Phon
The phon level can be found, in this way, for
any continuous sound, sine wave, or complex
wave.
But as a unit, it only makes possible
comparisons.
It does not tell anything about the loudness of
the sound, except that more phons means
louder. For example, 80 phons is louder than 40
phons, but it is not twice as loud.
Frequency response of the
ear
You will now hear tones (10 ms) at
several frequencies (123, 250, 500, 1000,
2000, 4000, 8000 Hz), presented in 10
decreasing steps of 5 decibels. Count the
number of steps you hear at each
frequency. Frequency staircases are
presented twice.
79
The curves in the following figure are called
equal loudness curves or Fletcher-Munson
curves, since they were first derived by
Fletcher and Munson.
Each curve represents one loudness level in phons
and shows the sensitivity of the ear at different
frequencies across the audible range.
80

81

82
Loudness level comparisons have been made
over the normal range of audible frequencies
(20 Hz to about 15000 Hz), and at various SPL,
leading to the generation of Equal loudness
contours.

Fletcher and Munson derived their results
from tests on a large number of subjects
who were asked to adjust the level of test tones
until they appeared as loud as the
reference 1-kHz tone.
83
From these results, one could draw curves
of average equal loudness, indicating the
SIL/SPL required at each frequency for a
sound to be perceived at a particular
loudness level, or phons.
84
The 10-phon curve is the contour which
passes through 10 dB SIL/SPL at 1 kHz.
All points along a phon curve will sound
equally loud, although clearly a higher
SIL/SPL is required at the extremes of the
spectrum than in the middle.

85
Since all points on a given contour have equal
loudness, a sound intensity level of 92 dB at 20
Hz will sound as loud as 40 dB at 1000
Hz.
The main features of equal loudness contours
are that they rise steeply at low frequencies and less
steeply at high frequencies.
They become flatter as the level rises, which
indicates that the ear's frequency response also
changes with the signal level.

86
Equal Loudness Curves Implications
Consider the following two cases to monitor the sound
levels during recording:

Case 1: If a sound is reproduced at a higher level than
that at which it was recorded, the low frequencies will
be relatively louder and the sound will become boomy.

Case 2: If the sound is reproduced at a lower level than
that at which it was recorded, the sound will lack
bass and will become thin.
That is why some amplifiers include a loudness
control, which attempts a degree of compensation by
boosting bass or possibly treble at low listening levels.

87
Implication of Equal Loudness
Curves
88
Remarks on Equal Loudness
Curves
The shape of the equal loudness curves
depends on the type of the reference
sound used in the test. Fletcher and
Munson have used 1-kHz tone as
reference test sound. If a filtered noise is
used as reference test sound, slightly
different results can be found.

89
Equal Loudness Curves-Example
# Three tones (100 Hz, 2000 Hz, and 7000 Hz)
are presented monaurally over wideband
headphones (40 Hz-16 kHz) to a young adult
subject with normal hearing.
In each case, the sound pressure level (SPL) at
the subject's ear is 40 dB.
What would be the expected loudness ordering of the
tones, going from the loudest to the least
loud?

90
Equal Loudness Curves-Example
91
Equal loudness curves -
Comparison
92
http://hyperphysics.phy-astr.gsu.edu/hbase/sound/phon.html
1.4 Loudness and Sones
93
94
Loudness and Sone
One needs loudness values in which the
numbers represent the magnitude
of the sensation. It is then necessary to
carry out experiments in which listeners
judge how many times
louder sound A is than sound B.

95
Loudness and Sone
In practice, there is now an established
unit of loudness, called the sone.
A loudness of one sone is that of a pure
tone of 40 dB SIL at a frequency of 1 kHz,
i.e., a loudness level of 40 phons.
1 sone corresponds to 40 phons

96
Loudness and Sone
The sensation of loudness is directly
proportional to the number of sones, e.g.
80 sones is twice as loud as 40 sones.

It is found that every addition of 10
phons corresponds to doubling of
loudness. So, 50 phons is twice as loud as
40 phons.

97
98
Sone and Phon
99
Sone and Phon
The sone is derived from psychophysical measurements
in which volunteers adjusted sounds until they
judged them to be twice as loud. This allows one to relate
perceived loudness to phons. One sone is defined to
correspond to 40 phons. Experimentally it was found that a
10 dB increase in sound level corresponds approximately
to a perceived doubling of loudness, so that
approximation is used in the definition of the sone:
0.5 sone = 30 phon, 1 sone = 40 phon, 2 sone =
50 phon, 4 sone = 60 phon, etc.

100
Psychophysics and Loudness
Psychophysics may be said to be the study of
the relationship between the magnitude of a
sensation and the magnitude of a stimulus as
measured in conventional physical units.
Suppose you go to the emergency room with
a heart attack; the staff may ask you to
describe your pain by a numerical value. This is
an example of a psychophysical method in which
observers report the magnitude of a sensation by
assigning a number.
101
Measurement of loudness
using psychophysical method
A magnitude estimation experiment for
loudness presents listeners with a series
of stimuli. The stimuli have the same
spectrum, but different intensities.
After many trials, it was found that the
estimates are proportional to a power of
the intensity.
102
Measurement of loudness
using psychophysical method
That is, according to the power law,
the psychophysical magnitude, $\Psi$, is measured
as
$\Psi = K I^p$    (20)
where I is the intensity in watts/m^2, while K
and p are constants.
103
Measurement of loudness using
psychophysical method
Taking the logarithms of the magnitude
estimates, from (20),
$\log \Psi = \log K + p \log I$    (21)
Because the sound level in dB is $L = 10 \log(I / I_0)$,
$\log \Psi = \log(K I_0^p) + p (L / 10)$    (22)
104
Measurement of loudness using
psychophysical method
Eq. (22) shows that the log of the loudness
estimates should be a linear function of
the sound level in dB. The slope of the
line, p/10, gives the exponent p in the power
law.
105
Determination of the exponent p
in the power law
Suppose loudness $\Psi_2$ is twice $\Psi_1$; then
$\Psi_2 / \Psi_1 = 2 = (I_2 / I_1)^p$    (23)
Taking the logarithms, we find
$\log 2 = p \log(I_2 / I_1)$    (24)
106
Determination of the exponent p
in the power law
or,
$0.3 = p (\Delta L / 10)$    (25)
so that
$\Delta L\ \text{(for loudness doubling)} = 3 / p$    (26)
Setting p = 0.3 in (26) requires $\Delta L$ (i.e. the change
in intensity levels) to be 10 dB for loudness doubling.
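Eqs. (24) to (26) generalise to any loudness ratio. The sketch below keeps log 2 unrounded (the slide rounds it to 0.3), so the result for doubling comes out marginally above 10 dB; the function name is illustrative.

```python
import math


def level_change_for_loudness_ratio(loudness_ratio, exponent=0.3):
    """Change in level (dB) needed for a given loudness ratio under the power law."""
    if loudness_ratio <= 0 or exponent <= 0:
        raise ValueError("ratio and exponent must be positive")
    return 10.0 * math.log10(loudness_ratio) / exponent


print(f"doubling   : {level_change_for_loudness_ratio(2.0):.2f} dB")
print(f"quadrupling: {level_change_for_loudness_ratio(4.0):.2f} dB")
print(f"halving    : {level_change_for_loudness_ratio(0.5):.2f} dB")
```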
107
Sone Scale
The exponent $p = 0.3$ is the basis for the currently
used international loudness scale, or sone scale.
The reference for this absolute scale is that a
1-kHz sine tone having 40 dB SIL would have a
loudness of 1 sone.
The above reference sets the value of k
in Eq. (20). Therefore, we found
$\Psi\,(\text{Sones}) = \frac{1}{15.849} \left( \frac{I}{I_0} \right)^{0.3}$    (27)
108
Sone Scale
Proof:

$\Psi = k \left( \frac{I}{I_0} \right)^{0.3}$
$\log \Psi = \log k + 0.3 \log \left( \frac{I}{I_0} \right)$
$\log \Psi = \log k + 0.03 \cdot 10 \log \left( \frac{I}{I_0} \right)$
$\log(1) = \log k + 0.03 \times 40\ \text{(dB)}$
$\log k = -1.2$
$k = 10^{-1.2} = \frac{1}{10^{1.2}} = \frac{1}{15.8489} \approx \frac{1}{15.849}$
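Eq. (27) can be evaluated directly from an intensity. This is a sketch for a 1-kHz tone, using the constant 1/15.849 and the exponent 0.3 derived above; the function name `loudness_sones` is illustrative.

```python
SONE_SCALE_K = 1.0 / 15.849  # constant k from the proof above
EXPONENT = 0.3               # power-law exponent p
I_REF = 1e-12                # reference intensity I_0 in W/m^2


def loudness_sones(intensity, i_ref=I_REF):
    """Eq. (27): loudness in sones of a 1-kHz tone of the given intensity."""
    if intensity <= 0:
        raise ValueError("intensity must be positive")
    return SONE_SCALE_K * (intensity / i_ref) ** EXPONENT


for level_db in (30, 40, 50, 60):
    intensity = I_REF * 10.0 ** (level_db / 10.0)
    print(f"{level_db} dB SIL -> {loudness_sones(intensity):.2f} sones")
```

A 40 dB tone comes out at 1 sone, and each further 10 dB very nearly doubles the value.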
109
Sone Scale
It follows by definition that any tone with a
loudness level of 40 phons has a loudness of 1
sone.
And using (27), sones can be calculated from
phons by the following equation:
$\log \Psi = -\log(15.849) + 0.03 \cdot 10 \log \left( \frac{I}{I_0} \right) = -1.2 + 0.03 L_\phi$    (28)
where $L_\phi$ is the loudness level in phons.
110
Sone and Phon
Sone, $\Psi$, is related to phon, $L_\phi$, by the following
equation:
$\Psi = 2^{(L_\phi - 40)/10}$    (29)
Conversely, phon is related to sone as
follows:
$L_\phi = 40 + 10 \log_2(\Psi)$    (30)
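The two conversions in Eqs. (29) and (30) are inverses of each other, which a short sketch makes easy to confirm. The function names `phon_to_sone` and `sone_to_phon` are illustrative.

```python
import math


def phon_to_sone(loudness_level_phon):
    """Eq. (29): loudness in sones from loudness level in phons."""
    return 2.0 ** ((loudness_level_phon - 40.0) / 10.0)


def sone_to_phon(loudness_sone):
    """Eq. (30): loudness level in phons from loudness in sones."""
    if loudness_sone <= 0:
        raise ValueError("loudness in sones must be positive")
    return 40.0 + 10.0 * math.log2(loudness_sone)


for phon in (30, 40, 50, 60, 80):
    sone = phon_to_sone(phon)
    print(f"{phon} phon = {sone:g} sone (back to {sone_to_phon(sone):.1f} phon)")
```

On this scale 80 phons corresponds to 16 sones, which is why 80 phons is far more than twice as loud as 40 phons.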
111
Power law vs. linear law and logarithmic law
Figure 1: panels (a), (b) and (c)
112
Power law vs. linear law and
logarithmic law
Figure 1(a) tests the linear law, which says that
loudness is proportional to sound pressure. The
pronounced curvature of this plot shows that
the linear law fails.
Figure 1(b) tests the logarithmic law, which
says that loudness should be proportional to
level as measured on the decibel scale. This
plot is also curved, but in the opposite way.
113
Power law vs. linear law and
logarithmic law
Figure 1(c) tests the power law. It is evident
that power law gives the best straight line.
The slope of the straight line corresponds to
the value of the exponent p in power law.

1.5 Human Ear
114
Anatomy of the Ear
Anatomy of the Ear
(contd)
Semicircular Canal: provides the sense of balance.

Auditory Canal: acts like an acoustic tube with some
amplification at 3 kHz.

Eardrum, Hammer, Anvil, Stirrup: convert the acoustic wave
into mechanical vibrations. The incoming acoustic wave
vibrates the stapes. There is an impedance mismatch between
the eardrum and the fluid of the inner ear. The bones
compensate for this to reduce energy loss.

Round and Oval Window: convert mechanical vibrations
into traveling waves in fluid. The vibrating stapes shakes the
oval membrane, which presses upon the fluid of the inner ear.
The resulting motion of the fluid follows that of a traveling
wave.

Anatomy of the Ear
(contd)
Basilar Membrane: responds to the traveling wave and excites the hair
cells. The basilar membrane is a spiraling membrane that is
submerged in the fluid of the inner ear. It responds to the traveling
wave by vibrating. The basilar membrane follows a gradient of
stiffness and is stiffer at the base than at the apex. The result is that
the base has a greater response to the higher frequency components
of the traveling wave, while the apex has a greater response to the
lower frequency components.

Hair Cell: produces a receptor potential. The hair cells are
sandwiched between the basilar and tectorial membranes. As the
basilar membrane vibrates, it creates a shearing effect on the hairs of
the cell, which sets off a series of reactions. First, this opens the
channels in the hair cells. Once open, the hair cells depolarize and
release neurotransmitters which, in turn, depolarize the auditory fibers
innervating the cell.

Neurons: produce and propagate an action potential. The neurons
surround each of the hair cells and respond to the receptor potential
by transmitting an action potential down a network of neurons.
Activity along the Basilar
membrane
Neural spiking by Hair Cell
Schematic of the ear
121
Engineering Model of Auditory
System/Human Perception
The figure on the next page shows an engineering
model of the auditory system, where the
cochlea, which consists of the Basilar Membrane
(BM) and hair cells, is represented by a
filter bank (i.e. a bank of bandpass filters).
These filters are called auditory filters.
Fletcher also thought that the BM provides
the basis for auditory filters, with
overlapping passbands.
122
Each location on the BM responds to a limited
range of frequencies, so each different point
corresponds to a filter with a different center
frequency.
At the entrance of the cochlea the BM and the
associated hair cells respond more to higher
frequencies.
As the vibrations penetrate more deeply into the
cochlea, the BM response becomes more sluggish,
corresponding to filters with lower central
frequencies.
123
Figure 6.
124
Basilar Membrane (BM)
125
Basilar Membrane (BM)
126
Traveling wave envelope at BM
127
Critical Bands (CBs)
128
Critical Bands (CBs)
The loudness of several tones, if they lie within
one critical band, can be predicted by integrating
the power of all the components.
If, however, the several tones are in different
critical bands, then each tone will have a
particular loudness, and these loudnesses will sum
to give the combined loudness.
Thus, components that are more widely spaced will
sound louder than if they are close together in
frequency.

Critical Bands by
Loudness Comparison
The bandwidth of a noise burst is increased while its
amplitude is decreased to keep the power constant.
When the bandwidth is greater than a critical band, the
subjective loudness increases above that of a reference
noise burst, because the stimulus now extends over
more than one critical band
130
Critical Band and
subjective loudness
Suppose that two tones, each 60 dB above
threshold, are presented simultaneously. In
case (1) the frequencies are 2,000 and 2,200
Hz and in case (2) they are 2,000 and 3,000
Hz.
(i) Which sound pair, (1), or (2) will be
subjectively louder?
(ii) What is the reasoning behind your answer,
that is, what data and concepts lead you to
this conclusion (aside from actually running
the experiments itself)?
131
Critical Band and
subjective loudness
(i) The sound pair of 2000 Hz and 3000
Hz in case (2) will be louder.
(ii) Because those in (1) lie in the same
critical band, they interact and produce a
level of activity less than the sum of the two
individual tones, whereas those in (2) lie in
different critical bands and so additivity is
to be expected.
Bark scale
Two nonlinear frequency-mapping scales have been
proposed: the Greenwood scale and the Bark
scale.
Example: The relationship between the hertz (f)
and bark (b) scales is given below
$f = 600 \sinh(b / 6)$
Try computing the equivalent frequencies for
bark values of 1 to 20.
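The suggested exercise can be carried out with the sketch below, which tabulates f = 600 sinh(b/6) for bark values 1 to 20. The inverse mapping `hz_to_bark` is obtained by solving the same formula for b and is included only as a consistency check; both function names are illustrative.

```python
import math


def bark_to_hz(bark):
    """Frequency in Hz for a bark value, using f = 600 sinh(b / 6)."""
    return 600.0 * math.sinh(bark / 6.0)


def hz_to_bark(frequency_hz):
    """Inverse of the formula above: b = 6 asinh(f / 600)."""
    return 6.0 * math.asinh(frequency_hz / 600.0)


for b in range(1, 21):
    print(f"{b:>2} bark -> {bark_to_hz(b):8.1f} Hz")
```

The spacing between successive bark values is roughly 100 Hz at the low end and grows to more than 1 kHz at the top of the table, reflecting the widening of the critical bands with frequency.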

134
135
136
1.6 Threshold of Hearing
137
138
Thresholds of Audibility/Hearing
The threshold of audibility is defined as the
smallest amount of pressure to which the
auditory system is sensitive.
A listener who has a low threshold can be
described as being very sensitive.
High sensitivity or low threshold means the
same thing.
The thresholds of audibility are plotted as the
threshold in dB SPL versus frequency.
139
Absolute Threshold
The absolute threshold of a sound is the
minimum detectable level of sound in the
absence of any external sounds.

It depends on the way in which the
output sound pressure is measured for an
input stimulus.
140
Absolute Threshold
One method of sound pressure
measurement is to probe a small
microphone close to the ear canal or
inside the ear canal.
The threshold determined by this method
is called the minimum audible pressure
(MAP).
141
Absolute Threshold
MAP thresholds describe the thresholds in
terms of the sound pressure level at the
observer's ear canal.
The listeners listen to sounds presented
over earphones, and various procedures
are used to determine the sound pressure
occurring at the ear canal.
142
Absolute Threshold
The absolute threshold can also be determined by
another method, called the minimum audible field
(MAF).
This method uses tones delivered by a
loudspeaker in an anechoic room, and the
sound pressure level is measured
for listeners facing the source.
The listeners listen with both ears
(binaurally) at 1 m from the sound source.

143
Absolute Threshold
Figure 5 shows the minimum audible sound level
by the MAP and MAF methods.

Figure 5.
144
Absolute Threshold
MAF thresholds are always lower than MAP
thresholds.

The shapes of the MAP and MAF threshold curves are
approximately the same.

Both of them show the loss of sensitivity below
approx. 1000 Hz and above approx. 4000 Hz.

MAF thresholds are considered to be an ISO
standard.

145
Threshold of Discomfort
Threshold of discomfort is the upper limit
of the pressure level that the auditory
system can tolerate.
This can be measured by asking the
listeners to say when the sound is felt or
pain is experienced.
These experiences indicate that the sound
pressure level is reaching a maximum,
which is approx. 120-130 dB SPL.
146
Threshold of Discomfort
The maximum SPL remains relatively
unchanged with respect to frequency.
The dynamic range, which is the
difference between the threshold in dB
SPL and the maximum limit in dB SPL,
varies with the frequency.
The dynamic range is approx. 125-135 dB
at 1000 Hz, but only 80-90 dB at 100 Hz.
(Please see also the Equal Loudness Curves.)
1.7 Auditory Masking
147
148
"I cannot hear you when the water is running"
is a statement about sound masking.
149
Auditory/Sound Masking
Auditory/Sound masking can be defined in the
following two ways:

A process by which the threshold of audibility of a
sound is raised by the presence of another
masking sound.

An amount by which the threshold of audibility of
a sound is raised by the presence of another
masking sound.

150
Sound Masking
The effect of masking plays a very important
role in hearing.

It can be differentiated into two forms:
- Simultaneous masking/
Frequency masking
- Non-simultaneous masking/
Temporal masking

Temporal Masking
Forward Masking: Masking of a signal by
a masker that precedes the masked
(probe) signal.
Backward Masking: Masking of a probe
by a masker that comes after the probe.
151
152
Sound Masking
Simultaneous Masking

An example of simultaneous masking is the
case where a person is having a conversation
with another person while a truck passes by.

The conversation is severely disturbed, and
to continue the conversation the speaker
has to raise his/her voice to overcome the
masking noise.

153
Sound Masking

- Simultaneous masking is a frequency-domain
phenomenon where a low-level signal ($S_1$)
can be made inaudible by a simultaneously
occurring stronger signal ($S_0$) when both
signals are close enough to each other in
frequency (see Figure 1).

154
Sound Masking

Figure 2: Threshold in quiet and masking threshold.
155
Sound Masking
In Figure 2, the masker is the signal $S_0$,
which produces a masking threshold that is
similar in shape to a Gaussian distribution.

Any signal within the skirt of this masking
threshold will be masked by the presence
of $S_0$.

156
Sound Masking

The weaker signals $S_1$ and $S_2$ are
completely inaudible, because their
individual sound pressure levels are
below the masking threshold.

The signal $S_L$ is only partially masked;
the perceivable portion of the signal lies
above the masking curve.

157
Sound Masking
The distance between the level of the masker and the
masking threshold is called the signal-to-mask ratio
(SMR).

SMR = SNR - MNR    (30)
where
SMR: Signal-to-Mask Ratio
SNR: Signal-to-Noise Ratio
MNR: Mask-to-Noise Ratio
In simultaneous/frequency masking, the higher the SMR,
the less masking occurs.
158
TMN: Tone Masking Noise (i.e., the tone is the masker and the noise
is the maskee)
Simultaneous Masking Experiment
Put a person in a quiet room. Raise the level of a 1
kHz tone until it is just barely audible. Vary the
frequency and plot the threshold:

159
Play a 1 kHz tone (masking tone) at a fixed level (60 dB).
Play a test tone at a different frequency (e.g., 1.1 kHz), and
raise its level until it is just distinguishable.
Vary the frequency of the test tone and plot the
threshold at which it becomes audible:
160
Repeat for various frequencies of masking
tones
161
Frequency Masking on
critical band scale
162
163
Sound Masking (Simultaneous Masking)
A low-frequency tone is better at masking a tone of higher
frequency than vice versa. Why?
By way of transduction, the basilar membrane in our inner ear
vibrates in response to sound. Low frequencies displace the basilar
membrane much more: the distance from stapes (one of the three
bones in the middle ear) is about 30mm at 25 Hz compared to
20mm at 800 Hz. Additionally, as frequency increases, the location
of maximum displacement along the basilar membrane moves from
the farthest section of the inner ear toward the middle ear (to the
stapes and the oval window). Higher frequencies must therefore be
of greater intensity to overcome the dominance, both spatially and
quantitatively, of the low notes over the basilar membrane.
The spatial representation of frequency on the basilar membrane is
perhaps the single most important piece of physiological
information about the auditory system, clarifying many
psychophysical data, including the masking data and their
asymmetry (Scharf 1975).

164
Sound Masking (Simultaneous Masking)
Broad-band noise is a better masker than a tone
signal. Justify this statement.
A tone is a narrow-band signal, whereas
broad-band noise covers a wide
frequency band consisting of a number of
overlapping critical bands, so it can mask
signals over that whole range.
(A critical band acts something like a one-third octave
filter whose center frequency can be any place along the
audible frequency axis.)
165
Sound Masking
Non-simultaneous masking/Temporal masking

- Non-simultaneous masking is described as
temporal masking.
- It occurs when two sounds appear within a short
interval of time.
- It can be classified into pre-masking and post-
masking (See Figure 3).

166
Sound Masking
Non-simultaneous masking

Figure 3: Non-simultaneous masking (Acoustic
events in the dark areas will be masked).

167
Sound Masking

168
Sound Masking

Pre-masking: It occurs when the signal
precedes the masker in time. A strong signal
can mask a weaker signal that occurs before
it.
Post-masking: It occurs when the signal
follows the masker in time. A strong signal
can mask a weaker signal that occurs after
it.

169
Sound Masking

In Figure 3, post-masking uses a different
time-origin than pre-masking and
simultaneous masking.

The post-masking lasts longer than pre-
masking.

Post-masking results from the gradual release
of the effect of the masker.
That is, masking does not immediately stop
when the masker is removed, but rather
continues for a period of time following this
removal.

170
Sound Masking
The duration of the post-masking depends on
the duration of the masker.

Figure 4: Dependence of post-masking on masker duration.

171
Sound Masking

In Figure 4, the dashed line indicates post-
masking for a masker duration of 200 ms.
The degree of post-masking decreases from the
value for simultaneous masking as a function of
the delay time.
However, post-masking produced by a very
short burst, such as 5 ms (see Figure 4),
behaves quite differently.
172
Sound Masking

In Figure 4, the post-masking in such case decays
much faster so that after only 50 ms the threshold
in quiet is reached.

This implies that post-masking strongly depends
on the duration of the masker.
173
Masking -Example
# A minimum signal-to-mask ratio (SMR)
of 2 dB is observed for a noise masker
having a 250 Hz center frequency, 3 dB
for 1 kHz masker, and 5 dB for 4 kHz
masker. Which is the better noise masker?
And why?
174
Masking - Example
In frequency masking, the higher the
SMR, the less masking occurs.
Therefore, the better masker is the one centered at
250 Hz (lowest minimum SMR, 2 dB), which allows
more of the tonal signal to be masked than the
maskers centered at 1 kHz and 4 kHz.
175
Masking - Example
# The minimum SMR for a tone-masking-
tone experiment is found to be 15 dB.
(i) How is this value compared to that of
the noise-masking tone experiment?
(ii) Can a tone mask narrow band noise?
(iii) What happens if the masking signal
increases its SPL from 60 to 80 to 100 dB?

176
Masking - Example
(i) The minimum SMR for a tone-masking-tone
experiment is ~15 dB.
For a noise-masking-tone experiment, the
minimum SMR is less than 15 dB, which
implies that noise is a better masker than a tone.
(ii) A tone can mask narrow-band noise, but
the minimum SMR is higher, between 20 and
30 dB.
177
Masking - Example
(iii) When the masking signal increases its
SPL from 60 to 80 to 100 dB, the slope of the
masking curve towards higher frequencies
becomes shallower.
This implies that the masking curves cover
a larger upper frequency region.
178
179
Critical Band and Masking
# At 2 kHz, and for a noise masker with
intensity (per Hz), $I'_{masker}$, of 40 dB,
the signal intensity required for detection,
$I_{signal}$, is 60 dB SPL.
Calculate the critical bandwidth from the
above data in the noise masking
experiment.
180
Fletcher's observation concerning the
masked signal intensity ($I_{signal}$) and the
intensity of the noise in the critical band
($I_{noise}$):

$$I_{signal} = I_{noise}, \quad \text{where } I_{noise} = I'_{masker} \times \mathrm{CBW}$$
181
Expressed in dB, we have

$$10\log_{10}\mathrm{CBW} = I_{signal}\ \text{(in dB)} - I'_{masker}\ \text{(in dB)} = 60 - 40 = 20$$

$$\mathrm{CBW} = 10^{2} = 100\ \text{Hz}$$
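The calculation above can be checked numerically. This is a small sketch assuming Fletcher's relation in dB form, 10 log10(CBW) = signal level minus masker spectral level; the function name `critical_bandwidth_hz` is hypothetical.

```python
def critical_bandwidth_hz(signal_db, masker_density_db):
    """Critical bandwidth from Fletcher's relation.

    signal_db: signal level at masked threshold (dB SPL)
    masker_density_db: noise masker level per Hz (dB)
    """
    # 10*log10(CBW) = signal_db - masker_density_db
    return 10.0 ** ((signal_db - masker_density_db) / 10.0)

# Values from the example: 60 dB SPL signal, 40 dB per-Hz masker at 2 kHz.
cbw = critical_bandwidth_hz(60.0, 40.0)
print(f"CBW = {cbw:.0f} Hz")
```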
1.8 Pitch Perception
182
183
Pitch Perception
Pitch perception is the brain's way of
interpreting the frequency of the
vibrations.
184
Pitch Perception
Models for Pitch Perception:

- Pattern Recognition model (or place
model)
- Temporal model
185
Pitch Perception
tone
186
Pitch Perception
187
Pitch Perception
188
Pitch Perception
189
Pitch Perception
190
Pitch Perception
191
Pitch Perception
192
Pitch Perception
193
Pitch Perception
194
Pitch Perception- Residual
Pitch
195
Pitch Perception
196
Pitch Perception-Temporal
model
197
Pitch Perception
198
Pitch Perception
199
Relevant Websites /Demos
http://www.lifesci.sussex.ac.uk/home/Chris_Darwin/
Perception/Lecture_Notes/Hearing4/hearing4.html
#RTFToC1

http://www.ece.uvic.ca/~aupward/p/demos.htm

200
Part II
3-D Sound Localization and
Synthesis
201
2.1 3-D Sound Localization
The goal of 3-D sound is to provide
spatialization, i.e. the sense that
the sound originates outside your head
the sound has a direction

202
3-D Sound - position of the
source in 3-D space
203
3-D Sound Coordinate
System
204
3-D Sound Hearing Basic
Concept
Put simply, the human auditory system can be considered as a
computer with two input ports. The input signals are the
sound waves which reach the left and right eardrums.

On their way to the eardrums, these signals pass
through the external ears. First they are diffracted and
partly shadowed by the skull before entering the pinna-ear
canal system, where their spectrum is specifically
modified by resonances. These spectral
modifications/distortions of the incoming signals are
particularly important for 3-D sound hearing.
205
3-D Sound Localization
206

3-D Sound Localization Cues:
Interaural Time Difference
Interaural Intensity Difference

The above cues are the primary
localization cues and are called interaural cues
207
Interaural Time Difference (ITD)
The further the source is to the left or right, the
greater the difference
Interaural Intensity Difference (IID)
The head absorbs and reflects sound energy
The ear that receives the sound first also receives
the loudest sound
Head shadow

208
Cone of Confusion: The cone of confusion
is an area where it is difficult to locate a
source.

It is due to the following reasons:
Interaural Time difference is similar
Interaural Intensity Difference is also similar

209

3-D Sound Localization Cues:
Pinnae filtering
Body filtering

The above two cues are the
secondary localization cues and are
called spectral cues
210
Sound source or
auditory event location
Azimuth angle θ
Elevation angle φ
Distance r

Relative position
to the ear
Ipsilateral
Contralateral
211
Sound Localization Cues
Human auditory system makes use of several cues to
locate an auditory event

Interaural cues
Spectral cues
Distance cues

A virtual source synthesis system
must emulate these cues
212
Interaural Cues
Difference between the sound signals
at the left ear and the right ear
Also called binaural cues
Determined by the horizontal
component (azimuth angle θ)

Interaural Time Difference (ITD)
Time delay between the sound arrival
at the ipsilateral and contralateral ear
Interaural Intensity Difference (IID)
Difference of sound intensity level
at the ipsilateral and contralateral ear
213
Rayleigh's Duplex Theory
ITD and IID are the dominant primary cues

Frequencies below ~1500 Hz
IID gives little information
ITD can be easily detected

Frequencies above ~1500 Hz
Ambiguity in the ITD due to several cycles of shift
IID solves this ambiguity (Head-shadow effect)

IID and ITD are complementary cues

214
ITD as function of azimuth
angle for three elevations
(a) 0 deg elevation
(b) 30 deg elevation
(c) 60 deg elevation

Computing methods
Linear regression (dashed)
Cross correlation (dotted)
Spherical head model (solid)
by Gardner & Martin, MIT Media Lab
215
Calculation of ITD
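Linear regression based approach:

A phase difference of $2\pi$ (radian) corresponds to one period $T$ (sec), so a phase difference $\Delta\phi$ (radian) corresponds to a time difference $\Delta t$ (sec):

$$\Delta t = \frac{\Delta\phi}{2\pi}\,T = \frac{\Delta\phi}{2\pi f}, \quad \text{where } f \text{ : frequency}$$

Then

$$\mathrm{ITD} = \frac{\Delta\phi}{2\pi f}\ \text{(sec)}$$

where $\Delta\phi$ can be calculated as

$$\Delta\phi = \angle\left(\sum_{k=0}^{N/2-1} X_1(k)\,X_2^{*}(k)\right)\ \text{(radian)}$$

Here, $X_1(k) = \mathrm{FFT}_N\{x_1\}$, $X_2(k) = \mathrm{FFT}_N\{x_2\}$, and $*$ denotes
complex conjugation.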
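One way to turn the phase relation ITD = Δφ/(2πf) into an estimator is to fit a straight line to the interaural phase across frequency bins, since the phase of X1(k)·X2*(k) grows as 2πf·ITD. The sketch below is a variant under stated assumptions, not the course's exact procedure: it uses the per-bin phase rather than the angle of the summed cross-spectrum, restricts the fit to bins below an assumed limit `f_max` of 1500 Hz, and uses NumPy. The function name, sampling rate and test signal are illustrative.

```python
import numpy as np

def itd_phase_regression(x1, x2, fs, f_max=1500.0):
    """Estimate the delay of x2 relative to x1 (in seconds).

    Fits a line to the unwrapped phase of X1(k) * conj(X2(k)) versus
    frequency; the slope of that line equals 2*pi*ITD.
    """
    n = len(x1)
    X1 = np.fft.rfft(x1)
    X2 = np.fft.rfft(x2)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    band = (freqs > 0) & (freqs <= f_max)       # low-frequency bins only
    dphi = np.unwrap(np.angle(X1[band] * np.conj(X2[band])))
    slope = np.polyfit(freqs[band], dphi, 1)[0]  # radians per Hz
    return slope / (2.0 * np.pi)

# Illustrative test signal: noise, and the same noise circularly delayed
# by 20 samples (assumed values, not measured HRIRs).
rng = np.random.default_rng(0)
fs = 44100.0
x1 = rng.standard_normal(1024)
x2 = np.roll(x1, 20)
itd = itd_phase_regression(x1, x2, fs)
print(f"estimated ITD: {itd * 1e3:.3f} ms")
```

Restricting the fit to low frequencies follows the duplex-theory observation that the ITD is unambiguous only below roughly 1500 Hz.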
216
Calculation of ITD
Cross-correlation based approach:

$$r_{xy}(\Delta) = E[x(n)\,y(n + \Delta)],$$

where $\Delta = 0, 1, \ldots, K$ is the discrete time-lag and $E(\cdot)$ is the expectation operator.

$$\mathrm{ITD} = \arg\max_{\Delta}\,\{r_{xy}(\Delta)\}$$
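The cross-correlation approach can be sketched as follows: the expectation E[x(n) y(n+Δ)] is approximated by a sample mean, and the lag with the largest value is taken as the ITD. The function name `itd_cross_correlation`, the sampling rate and the noise test signal are assumptions made for illustration.

```python
import numpy as np

def itd_cross_correlation(x, y, fs, max_lag):
    """Return the lag in 0..max_lag (converted to seconds) that maximizes
    r_xy(lag) = mean(x[n] * y[n + lag])."""
    n = len(x)
    lags = np.arange(0, max_lag + 1)
    # Sample-mean estimate of the expectation for each candidate lag.
    r = np.array([np.mean(x[: n - lag] * y[lag:]) for lag in lags])
    return lags[np.argmax(r)] / fs

# Illustrative test: y is x delayed by 13 samples.
rng = np.random.default_rng(1)
fs = 44100.0
x = rng.standard_normal(2000)
y = np.concatenate([np.zeros(13), x[:-13]])
itd = itd_cross_correlation(x, y, fs, max_lag=40)
print(f"estimated ITD: {itd * 1e6:.1f} microseconds")
```

The resolution of this estimator is one sample period, so a higher sampling rate (or interpolation of the correlation peak) gives a finer ITD estimate.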
217
Calculation of ITD
Spherical-model based approach

$$\mathrm{ITD} = \frac{D}{2c}\left(\sin^{-1}(\cos\phi\,\sin\theta) + \cos\phi\,\sin\theta\right)$$

where:
D : Diameter of the head = 17.5 cm
c : Speed of sound = 344 m/sec
θ : Azimuth angle (radian)
φ : Elevation angle (radian)

If φ = 0 and θ = 90°:

$$\mathrm{ITD} = \frac{0.175}{2 \times 344}\left(\sin^{-1}\left(\sin\frac{\pi}{2}\right) + \sin\frac{\pi}{2}\right) = \frac{0.175}{2 \times 344}\left(\frac{\pi}{2} + 1\right) \approx 0.654\ \text{msec}$$
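The spherical-head expression ITD = (D/2c)(arcsin(cos φ sin θ) + cos φ sin θ), with θ the azimuth and φ the elevation, is easy to evaluate for any direction. A minimal sketch, using the head diameter (17.5 cm) and speed of sound (344 m/s) given on the slide as defaults; the function name is illustrative.

```python
import math

def itd_spherical_head(azimuth_rad, elevation_rad,
                       head_diameter_m=0.175, c=344.0):
    """ITD in seconds from the spherical-head model."""
    s = math.cos(elevation_rad) * math.sin(azimuth_rad)
    return head_diameter_m / (2.0 * c) * (math.asin(s) + s)

# Source directly to the side in the horizontal plane.
itd = itd_spherical_head(math.pi / 2, 0.0)
print(f"ITD at 90 deg azimuth, 0 deg elevation: {itd * 1e3:.3f} ms")

# The ITD shrinks as the source moves up, since cos(elevation) scales the argument.
for elev_deg in (0, 30, 60):
    value = itd_spherical_head(math.pi / 2, math.radians(elev_deg))
    print(f"elevation {elev_deg:2d} deg -> {value * 1e3:.3f} ms")
```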
218
Interaural Intensity Difference
(IID)
IID as function of azimuth angle for three elevations
0 deg elevation (solid)
30 deg elevation (dashed)
60 deg elevation (dash-dotted)
Computed as energy ratios that take all frequencies into account
219
Example: IID
Suppose $x_1 = [1\ \ 0.5\ \ 0.25]$ and $x_2 = [0.5\ \ 0.2\ \ 0.1]$
are the two HRIRs. Then

$$\mathrm{IID} = 10\log_{10}\frac{\sum_{k=0}^{N/2-1} X_1(k)\,X_1^{*}(k)}{\sum_{k=0}^{N/2-1} X_2(k)\,X_2^{*}(k)}\ \text{(dB)}$$

where $X_1(k) = \mathrm{FFT}_N\{x_1\}$, $X_2(k) = \mathrm{FFT}_N\{x_2\}$, and $*$
denotes complex conjugation.
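The IID example can be evaluated numerically. This sketch assumes an FFT length of N = 8 (the slide does not state N) and sums the energy over bins k = 0 to N/2 - 1, as in the formula; the function name `iid_db` is illustrative.

```python
import numpy as np

def iid_db(x1, x2, n_fft=8):
    """IID in dB as the ratio of spectral energies of two HRIRs,
    summed over bins k = 0 .. N/2 - 1."""
    X1 = np.fft.fft(x1, n_fft)   # zero-padded to n_fft points
    X2 = np.fft.fft(x2, n_fft)
    half = n_fft // 2
    e1 = np.sum((X1[:half] * np.conj(X1[:half])).real)
    e2 = np.sum((X2[:half] * np.conj(X2[:half])).real)
    return 10.0 * np.log10(e1 / e2)

x1 = np.array([1.0, 0.5, 0.25])
x2 = np.array([0.5, 0.2, 0.1])
print(f"IID = {iid_db(x1, x2):.2f} dB")
```

Because only half of the bins are summed, the result depends slightly on the chosen N; summing over all N bins would make the ratio equal to the time-domain energy ratio by Parseval's theorem.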
220
Spectral Cues

Due to multipath reflections of the torso and
pinnae
Spectral peaks and notches due to constructive
and destructive interference
Used mainly for elevation localization and
front-back discrimination
221
Distance Cues
Loudness
Based on the inverse square law: $I \propto \dfrac{1}{r^2}$

Nearby sources are perceived as louder than distant sources
Only valid in echo-free (anechoic) environments

Direct-to-Reverberant Ratio
Ratio between the intensity of the direct sound and the reverberation
Used in enclosed environments
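As a quick illustration of the inverse square law used as a loudness cue, the level change between two distances follows from I ∝ 1/r². A minimal sketch for anechoic conditions; the function name `level_change_db` and the reference distance of 1 m are assumptions.

```python
import math

def level_change_db(r_ref, r):
    """Change in intensity level (dB) when moving from distance r_ref to r,
    assuming intensity proportional to 1/r^2."""
    return 10.0 * math.log10((r_ref / r) ** 2)

# Each doubling of distance lowers the level by about 6 dB.
for r in (1.0, 2.0, 4.0, 8.0):
    print(f"r = {r:3.0f} m -> {level_change_db(1.0, r):6.2f} dB re 1 m")
```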
222
Interaural Transfer Function (ITF)
Ratio of the frequency responses at the two ears:

$$\mathrm{ITF} = \frac{H_c}{H_i}$$

Convolution of contralateral response
with inverse ipsilateral response
Describes diffraction of sound
around the head
(Head Shadowing)

ITF at 30 deg horizontal incidence
223
Invertibility Considerations
Stability and causality of the ITF depend
on the invertibility of the ipsilateral HRIR

HRIR has to be minimum-phase

HRIRs are in general non-minimum-phase for high frequencies
Approximation for low frequencies (~ below 2000 Hz)
Decomposition into minimum-phase and allpass system:

$$H(z) = H_{minp}(z)\,H_{allp}(z)$$

Figure: Pole-zero diagram of a non-minimum phase system (axes Re and Im; zeros and poles marked)
224
Minimum-phase filter (Example)
$$H_{min}(z) = \frac{1 + z^{-2}}{1 - z^{-1} + 0.5z^{-2}} = \frac{(z + j)(z - j)}{(z - 0.5 + 0.5j)(z - 0.5 - 0.5j)}$$

Pole-zero diagram (X: pole, o: zero)

225
Non-minimum phase filter (Example)
) 75 . 0 )( 5 . 0 5 . 0 )( 5 . 0 5 . 0 (
) 1 )( 2 (
) (
+
+
=
z j z j z
z z z
z H
226
Allpass filter (Example)
First-order allpass filter $A_1(z)$:

$$A_1(z) = \frac{d_1 + z^{-1}}{1 + d_1 z^{-1}}$$

Second-order allpass filter $A_2(z)$:

$$A_2(z) = \frac{d_2 + d_1 z^{-1} + z^{-2}}{1 + d_1 z^{-1} + d_2 z^{-2}}$$

Figure: Second-order allpass filter response: (a) magnitude; (b) phase (coefficient values 0.9 and 0.95). Legend: first-order filter, second-order filter.
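The defining property of an allpass filter, unit magnitude at every frequency with only the phase varying, can be verified numerically for the second-order form A2(z) = (d2 + d1 z^-1 + z^-2)/(1 + d1 z^-1 + d2 z^-2). The sketch below assumes the coefficient assignment d1 = 0.9, d2 = 0.95, which is a guess at how the two plotted values map onto the coefficients; the function name is illustrative.

```python
import numpy as np

def allpass2_response(d1, d2, w):
    """Frequency response of the second-order allpass filter at radian
    frequencies w (z evaluated on the unit circle)."""
    z1 = np.exp(-1j * w)                     # z^-1 on the unit circle
    num = d2 + d1 * z1 + z1 ** 2
    den = 1.0 + d1 * z1 + d2 * z1 ** 2
    return num / den

w = np.linspace(0.0, np.pi, 512)
A = allpass2_response(0.9, 0.95, w)          # assumed d1, d2 assignment

magnitude = np.abs(A)
phase = np.unwrap(np.angle(A))
print("max deviation of |A| from 1:", np.max(np.abs(magnitude - 1.0)))
print("phase at w = 0 and w = pi (rad):", phase[0], phase[-1])
```

The magnitude stays at 1 because the numerator polynomial is the mirror image of the denominator, which is also why an allpass section can carry the excess phase of an HRTF without changing its magnitude response.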
227
Simplification
ITF expressed as the ratio of the minimum-phase
systems cascaded with an allpass system phase response:

$$\mathrm{ITF}(e^{j\omega}) = \frac{H_{minp,c}(e^{j\omega})}{H_{minp,i}(e^{j\omega})}\,e^{\,j\left(\angle H_{allp,c}(e^{j\omega}) - \angle H_{allp,i}(e^{j\omega})\right)}$$

For low frequencies the excess phase difference is
approximately linear with frequency:

$$\mathrm{ITF}(e^{j\omega}) \approx \frac{H_{minp,c}(e^{j\omega})}{H_{minp,i}(e^{j\omega})}\,e^{-j\omega\,\mathrm{ITD}/T}$$
228
229
Calculation of IID from ITF (Example)
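Let

$$\mathrm{ITF}(e^{j\omega}) = L(e^{j\omega})\,e^{-j\omega(\mathrm{ITD}/T)}$$

where $L(e^{j\omega})$ : Minimum-phase filter
$T$ : Sampling period

Given that

$$L(e^{j\omega}) = \frac{1}{1 - 0.9\,e^{-j\omega}}; \quad -\pi \le \omega \le \pi$$

we can find IID as

$$\begin{aligned}
\mathrm{IID} &= \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{1}{L(e^{j\omega})\,L^{*}(e^{j\omega})}\,d\omega \quad \text{(Parseval's theorem)} \\
&= \frac{1}{2\pi}\int_{-\pi}^{\pi}(1 - 0.9e^{-j\omega})(1 - 0.9e^{j\omega})\,d\omega \\
&= \frac{1}{2\pi}\int_{-\pi}^{\pi}\left[1.81 - 1.8\cos\omega\right]d\omega \\
&= 1.81 - \frac{1.8}{2\pi}\cdot 2\sin\pi = 1.81 \approx 2.58\ \text{dB} \quad (= 10\log_{10}(1.81))
\end{aligned}$$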
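The integral in this example can be checked by numerical integration. The sketch assumes the minimum-phase filter L(e^jω) = 1/(1 - 0.9 e^-jω) and approximates (1/2π)∫ 1/|L|² dω by the mean over a uniform grid covering one period; the grid size is an arbitrary choice.

```python
import numpy as np

# Uniform grid over one period [-pi, pi); the mean over this grid
# approximates (1 / 2*pi) times the integral over the period.
w = np.linspace(-np.pi, np.pi, 20000, endpoint=False)
L = 1.0 / (1.0 - 0.9 * np.exp(-1j * w))      # assumed minimum-phase filter

iid_linear = np.mean(1.0 / np.abs(L) ** 2)   # should approach 1.81
iid_db = 10.0 * np.log10(iid_linear)
print(f"IID = {iid_linear:.4f} (linear) = {iid_db:.2f} dB")
```

The cosine term integrates to zero over a full period, which is why only the constant 1 + 0.9² = 1.81 survives.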
230
Applications of ITF
The Interaural Transfer Function (ITF) is used to
design an acoustic crosstalk canceller (to be
discussed later).

Low-pass filtering of ITF is required for
crosstalk cancellation. It is a good
approximation, since crosstalk is dominant at
low frequencies.

2.2 3-D Sound Synthesis
231
232
Introduction to 3-D Sound Synthesis
3-D Sound Synthesis

Creating an auditory event (the
sound perceived) at an
arbitrary position in a virtual 3-D space

http://www.hitl.washington.edu/scivw/EVE/I.B.1.3DSoundSynthesis.html
http://interface.cipic.ucdavis.edu/sound/demos/

233
3-D Sound Synthesis Basic
Concept
The signals are usually obtained from an
acoustically similar imitation of a human head,
a so-called dummy head. If the outer ears of the
dummy are not shaped exactly like those of
real subjects, there will be errors in locating the
positions of the auditory events. It is found that
the distortions which a sound signal experiences
when passing through the external ears (incl.
the skull) depend in a characteristic way on the
direction of incidence of the sound and the distance
to the sound source.
234
3-D Sound Synthesis Basic
Concept
The external ears thus become a special
coding machine which transforms directions
and distances into spectral information.

The auditory system, on the other hand,
decodes this information contained in the
ear signals concerning the directions and
distances of the sound sources in order to
locate the auditory events.

235
Methods of Sound Synthesis
Monaural sound
One microphone used for sound recording
Sound source and auditory event locations coincide
Stereo sound
Two microphones used for sound recording
Auditory event location between the loudspeakers
Adjusting delay and/or amplitude (Time- and Intensity-panning)
Surround sound
Multiple loudspeakers surrounding the listener
3-D Sound (Binaural sound) synthesis
Two microphones placed in the ears of a person or a dummy
More realistic by resembling the Human Acoustic System

236
Surround Sound

Multiple loudspeakers required
Allows for multiple listeners
237
3-D Sound (Binaural Sound)
Synthesis
Only acoustic pressures at eardrums
of the listener are considered
Microphones positioned in the ears
Listener has to be in sweet spot

Obtain the Head-Related Impulse
Response (HRIR)

3-D (Binaural) Synthesis
$$x_L(t) = \int h_L(\tau)\,x(t - \tau)\,d\tau$$

$$x_R(t) = \int h_R(\tau)\,x(t - \tau)\,d\tau$$
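In discrete time, binaural synthesis amounts to convolving the mono source signal with the left-ear and right-ear HRIRs. The sketch below uses the short three-tap HRIRs from the IID example as stand-ins for measured responses; the function name `binaural_synthesis` and the test signal are illustrative.

```python
import numpy as np

def binaural_synthesis(x, h_left, h_right):
    """Filter a mono signal x with a pair of HRIRs to obtain the
    left-ear and right-ear signals (discrete convolution)."""
    return np.convolve(x, h_left), np.convolve(x, h_right)

# Toy HRIR pair (stand-ins for measured HRIRs) and a short test signal.
h_left = np.array([1.0, 0.5, 0.25])
h_right = np.array([0.5, 0.2, 0.1])
x = np.array([1.0, 0.0, 0.0, -1.0])

x_left, x_right = binaural_synthesis(x, h_left, h_right)
print("left :", x_left)
print("right:", x_right)
```

With measured HRIRs of a few hundred taps and long input signals, FFT-based (fast) convolution would normally replace the direct form used here.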
238
Multiple Source 3-D (Binaural)
Sound Synthesis
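Figure: block diagram in which each source $X_1, X_2, \ldots, X_N$ is filtered by its HRTF pair ($H_{L1}, H_{R1}$; $H_{L2}, H_{R2}$; ...; $H_{LN}, H_{RN}$) to give $Y_{L1}, Y_{R1}, \ldots, Y_{LN}, Y_{RN}$, which are combined into $Y_L$ and $Y_R$.

$$\begin{bmatrix} Y_L(\omega) \\ Y_R(\omega) \end{bmatrix} = \begin{bmatrix} H_{L1}(\omega) & \cdots & H_{LN}(\omega) \\ H_{R1}(\omega) & \cdots & H_{RN}(\omega) \end{bmatrix} \begin{bmatrix} X_1(\omega) \\ \vdots \\ X_N(\omega) \end{bmatrix}$$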
239
3-D Sound Synthesis
240
Head-Related Transfer Function
(HRTF)
The Head-Related Transfer Function (HRTF) is the
Fourier Transform of the HRIR
Each HRTF is realized by an FIR (finite impulse response)
filter.
Function of 4 parameters
Three spatial coordinates: r, θ and φ
Frequency f
Accuracy vs processing power
Limited to a single listener
241

HRTFs -- Head-Related Transfer
Functions

The key to this binaural approach to generating
synthetic spatial sound is the so-called Head-Related
Transfer Function (HRTF).

The HRTF captures the location-dependent spectral
changes that occur when a sound wave is coming from
a sound source to the listener's ear drum.

These spectral changes are due to the diffraction of
sound wave by the torso, head, and outer ears or
pinnae, and their characteristics depend on the azimuth,
elevation, and range/distance from the listener to the
source.

242
HRTFs
In general, the HRTF is a complex function of
the location of the source relative to the
listener, as well as the physical size and shape
of the particular listener.
When a sound signal is filtered by accurate
HRTFs and sent to the listener's two ears (for
example, over headphones), the synthesized
sound is experienced as a virtual source at the
desired location in space.

243
HRIR & HRTF A simple
illustration
Suppose $h_{HRIR} = \{1, 0.5, 0.25\}$. Then the HRTF can be calculated as the discrete-time Fourier transform (DTFT) of $h_{HRIR}$, i.e.,

$$H_{HRIR}(e^{j\omega}) = \sum_{n=0}^{2} h_{HRIR}(n)\,e^{-j\omega n} = 1 + 0.5\,e^{-j\omega} + 0.25\,e^{-j2\omega}$$
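The DTFT of the three-tap HRIR {1, 0.5, 0.25} can be evaluated directly from its definition, H(e^jω) = Σ h(n) e^(-jωn). A minimal sketch; the function name `dtft` and the chosen evaluation frequencies are illustrative.

```python
import numpy as np

def dtft(h, w):
    """Evaluate the DTFT of a finite sequence h at radian frequencies w."""
    n = np.arange(len(h))
    return np.array([np.sum(h * np.exp(-1j * wk * n))
                     for wk in np.atleast_1d(w)])

h = np.array([1.0, 0.5, 0.25])
w = np.array([0.0, np.pi / 2, np.pi])
H = dtft(h, w)

for wk, Hk in zip(w, H):
    print(f"w = {wk:.3f} rad: |H| = {abs(Hk):.4f}, "
          f"phase = {np.angle(Hk):.4f} rad")
```

In practice the HRTF is sampled on a uniform frequency grid with an FFT; the direct sum is used here only to mirror the definition.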
244
HRTF Measurements
http://sound.media.mit.edu/KEMAR.html
245
HRTF Measurements
246
HRTF Measurements
247
HRTF Measurements
248
HRTF Measurements
Thus one can obtain the HRTF of an
individual or of a dummy head by playing
an input signal at a desired position (at
least 1 meter distant)
and measuring the impulse response with
probe microphones placed in the vicinity
of the ear canals.
249
Measured HRTF An Example
250
According to Fig. 3.1 (as shown previously):

- At low frequencies the responses are similar; the
difference between the responses increases with
frequency.
- The high-frequency responses contain sharp
features attributed to interactions of the incident
sound with the external ear.
- For instance, the distinctive notches at 8-9 kHz
are caused by a concha reflection.
- The broad peak at 2-3 kHz is caused by the ear
canal resonance.

251
Measured HRIR An Example
252
Implementing HRTF for 3-D
Sound Synthesis
253
Implementing HRTF for 3-D
Sound Synthesis
254
Magnitude Responses of the
HRTFs
Function of elevation angle
Three fixed azimuthal angles
Spectral peaks (white) and
spectral notches (black).
Ear canal resonance at about
2-3 kHz remains unchanged.
The first notch, N1, due to
concha reflection is
dependent on elevation.
N1 major cue for
elevation localization
255
Magnitude Responses of the
HRTFs
Function of azimuth angle

Ipsilateral and contralateral
ear

N1 is largely independent of the
azimuthal angle
256
Problems with 3-D Sound Synthesis
Microphones in the ear canals
Fixed head location (head tracking needed)
Human (manikin) specific
HRIRs are usually measured in
anechoic environments; HRIRs measured in
reverberant environments are
Very long impulse responses
Valid only for a particular room
2.3 Cross-talk Cancellation
257
258
Applications of 3-D Sound
Synthesis

Surround sound reproduction using fewer
loudspeakers

Virtual Reality System

Human-Machine Interaction

259
The cue-disabling effect with
crosstalk
260
Crosstalk Cancellation
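Figure: crosstalk cancellation block diagram (input $x$; binaural synthesizer $H_L$, $H_R$; signals $y_L$, $y_R$; crosstalk canceller $C$; signals $\tilde{y}_L$, $\tilde{y}_R$; acoustic paths $A_{LL}$, $A_{LR}$, $A_{RL}$, $A_{RR}$; ear signals $e_L$, $e_R$).

C: Crosstalk Canceller
$A_{XY}$: Transfer function from loudspeaker $X \in \{L, R\}$
to ear $Y \in \{L, R\}$
$H_L$ and $H_R$ are related to the binaural synthesizer.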
261
262
263
264
265
266
267
Crosstalk Cancellation using
ITFs
(47)
268
Crosstalk Cancellation using ITFs
269
Crosstalk Cancellation using ITFs
270
Crosstalk Cancellation Symmetric
Solution
271
Solution
272
Solution
273
Solution
274
Solution
275
Solution
276
Solution
277
Solution
278
Crosstalk Cancellation Stability
Issues
279
Crosstalk Cancellation Stability
Issues
280
Crosstalk Cancellation Stability Issues
281
Head Rotation vs. Crosstalk
Cancellation
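$$E_L = A_{LL}X_L + A_{RL}X_R$$

$$E_R = A_{LR}X_L + A_{RR}X_R$$

$$\begin{bmatrix} E_L \\ E_R \end{bmatrix} = \begin{bmatrix} A_{LL} & A_{RL} \\ A_{LR} & A_{RR} \end{bmatrix}\begin{bmatrix} X_L \\ X_R \end{bmatrix}, \quad \text{i.e.}\quad \tilde{E} = \tilde{A}\,\tilde{X}$$

Since $A_{LL} = A_{RL}$ and $A_{LR} = A_{RR}$, both columns of the acoustic matrix $\tilde{A}$ are the same, making the matrix $\tilde{A}$ singular (i.e. the matrix cannot have a valid inverse).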
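The loss of invertibility can be illustrated at a single frequency with made-up gains. In the sketch below, the values 0.8 and 0.3 (and 1.0 and 0.3 for the symmetric case) are hypothetical path gains, not measured transfer functions; the matrix layout follows E_L = A_LL X_L + A_RL X_R and E_R = A_LR X_L + A_RR X_R.

```python
import numpy as np

# Normal listening position (hypothetical gains): direct paths stronger
# than crosstalk paths, so the acoustic matrix is invertible.
A_normal = np.array([[1.0, 0.3],    # [A_LL, A_RL]
                     [0.3, 1.0]])   # [A_LR, A_RR]

# Degenerate case described above: A_LL = A_RL and A_LR = A_RR,
# so both columns are identical.
A_rotated = np.array([[0.8, 0.8],
                      [0.3, 0.3]])

print("rank, normal :", np.linalg.matrix_rank(A_normal))
print("rank, rotated:", np.linalg.matrix_rank(A_rotated))

# The crosstalk canceller is the inverse of the acoustic matrix,
# which exists only in the full-rank case.
C = np.linalg.inv(A_normal)
print("A_normal @ C =\n", A_normal @ C)
```

A rank of 1 means the two loudspeakers can no longer deliver independent signals to the two ears, so no canceller can restore the binaural cues.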
282
Crosstalk Cancellation (Recursive Topologies)
283
284
285
286
287
