
Robust Pitch Detection Using DCT-Based Spectral Autocorrelation

Under the guidance of
Dr. Rajesh Kumar Dubey

Submitted by:
Sudhakar Rai (15316004)
Nikhil Singh Gaur (15316024)

Pitch can be defined as the extent to which a sound is high or low.

Pitch is the perceived fundamental frequency of sound.

Pitch detection is the task of estimating the fundamental frequency of a voice signal. It is very important in several voice processing tasks, and it is a crucial step in singing voice separation. Pitch detection also plays an important role in music information retrieval, singer identification, and lyric recognition.

Pitch can help identify the gender of a singing voice.

Pitch can also help determine the time or time slot of a voice recording.

Techniques in pitch extraction

Time domain approaches
(1) ACF (autocorrelation function) and MACF (modified autocorrelation function)
(2) NCCF (normalized cross-correlation function)
(3) AMDF (average magnitude difference function)

Frequency domain approaches
(4) CPD (cepstrum pitch determination)
(5) DCT (discrete cosine transform) based spectral autocorrelation

Method 1: Autocorrelation function (ACF)

By definition, the autocorrelation is

$$R(m) = \lim_{N \to \infty} \frac{1}{2N+1} \sum_{n=-N}^{N} x(n)\, x(n+m), \qquad 0 \le m \le M_0$$

R is symmetric for positive and negative n, so only n ≥ 0 is used:

$$R(m) = \frac{1}{N} \sum_{n=0}^{N-1-m} x(n)\, x(n+m), \qquad 0 \le m \le M_0$$

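What is autocorrelation, R(m)?

E.g. x = [1 5 7 1 4], N = 5 (the 1/N scale factor is omitted here):

R(0) = x(0)*x(0) + x(1)*x(1) + x(2)*x(2) + x(3)*x(3) + x(4)*x(4)
R(0) = 1 + 25 + 49 + 1 + 16 = 92

R(1) = x(0)*x(1) + x(1)*x(2) + x(2)*x(3) + x(3)*x(4)

This multiplies x by a copy of itself shifted by one sample and sums the overlapping products:

x         = [1 5 7 1 4]
x shifted =   [1 5 7 1 4]

R(1) = 5 + 35 + 7 + 4 = 51

And so on:
R = [92 51 40 21 4]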
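The worked example above can be reproduced with a few lines of Python. This is a minimal sketch, not the presenters' code: the function name `autocorrelation` and the `normalize` flag are illustrative assumptions, and the 1/N factor is left out by default to match the numbers in the example.

```python
def autocorrelation(x, normalize=False):
    """Short-time autocorrelation R(m) = sum_{n=0}^{N-1-m} x(n) * x(n+m).

    With normalize=True each value is divided by N, as in the framed
    definition; the worked example omits that factor.
    """
    n_samples = len(x)
    r = []
    for m in range(n_samples):
        # Sum the products of the overlapping samples at lag m.
        total = sum(x[n] * x[n + m] for n in range(n_samples - m))
        r.append(total / n_samples if normalize else total)
    return r


x = [1, 5, 7, 1, 4]
R = autocorrelation(x)
print(R)  # [92, 51, 40, 21, 4]
```

For a periodic signal, R(m) has a strong peak at the lag equal to the period, which is what makes the ACF usable as a time domain pitch detector.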

Importance of linear prediction analysis in speech

A speech signal is produced by the convolution of the excitation source and the time-varying vocal tract system components.

To study these components independently, the excitation and vocal tract components must be separated from the available speech signal. Linear prediction analysis was developed as a method for deconvolving the given speech into its excitation and vocal tract system components.

The speech samples s(n) are related to the excitation u(n) by the simple difference equation

$$s(n) = \sum_{k=1}^{p} a_k\, s(n-k) + G\, u(n)$$

(the standard linear prediction model, with prediction order p, coefficients a_k, and gain G).

Between the pitch pulses the excitation is zero, so Gu(n) = 0 there, and the present speech sample can be predicted as a linear weighted sum of the past speech samples.

We process the speech signal through the linear predictor with predictor coefficients a_k, and the output is:

$$\hat{s}(n) = \sum_{k=1}^{p} a_k\, s(n-k)$$

The error between the actual signal and the predicted value is given by:

$$e(n) = s(n) - \hat{s}(n) = s(n) - \sum_{k=1}^{p} a_k\, s(n-k)$$

For voiced speech, e(n) consists of a train of impulses. Linear prediction analysis is performed before the spectral autocorrelation function so that the LP residual is a train of impulses, whose Fourier transform has a flat spectral envelope.
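To make the residual idea concrete, here is a hedged sketch in plain Python. It assumes the autocorrelation method with the Levinson-Durbin recursion for finding the predictor coefficients, which the slides do not specify; the function names, the synthetic second-order vocal tract filter (coefficients 1.3 and -0.6), and the 80-sample pitch period are all illustrative choices.

```python
def autocorr(s, max_lag):
    """Unnormalized autocorrelation for lags 0..max_lag."""
    return [sum(s[n] * s[n + m] for n in range(len(s) - m))
            for m in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations; returns predictor coefficients a_1..a_p."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def lp_residual(s, a):
    """Prediction error e(n) = s(n) - sum_k a_k * s(n-k)."""
    e = []
    for n in range(len(s)):
        pred = sum(a[k] * s[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        e.append(s[n] - pred)
    return e


# Synthetic "voiced" signal: an impulse train (period 80 samples) driving
# a stable second-order all-pole filter. Both are assumptions for the demo.
period, n_samples = 80, 800
s = [0.0] * n_samples
for n in range(n_samples):
    u = 1.0 if n % period == 0 else 0.0
    s1 = s[n - 1] if n >= 1 else 0.0
    s2 = s[n - 2] if n >= 2 else 0.0
    s[n] = 1.3 * s1 - 0.6 * s2 + u

a = levinson_durbin(autocorr(s, 2), 2)
e = lp_residual(s, a)
peaks = [n for n, v in enumerate(e) if abs(v) > 0.5]
```

In this idealized case the estimated coefficients come out very close to the filter that generated the signal, and the residual is essentially the original impulse train, with one pulse per pitch period. Real speech is only approximately all-pole, so the residual there is noisier.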

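Example 2: Discrete Cosine Transform (DCT)

The N × N DCT matrix is

$$A_{N \times N} = [a_{kl}] = \begin{bmatrix} a_{11} & \cdots & a_{1N} \\ \vdots & \ddots & \vdots \\ a_{N1} & \cdots & a_{NN} \end{bmatrix}, \qquad a_{kl} = \begin{cases} \sqrt{\dfrac{1}{N}}, & k = 1,\ 1 \le l \le N \\[2ex] \sqrt{\dfrac{2}{N}} \cos\dfrac{(2l-1)(k-1)\pi}{2N}, & 2 \le k \le N,\ 1 \le l \le N \end{cases}$$

The DCT matrix C is real and orthogonal:

$$C = C^{*}, \qquad C^{-1} = C^{T}$$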
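The orthogonality property can be checked numerically. The sketch below builds the matrix directly from the definition of a_kl using only the standard library; the function names `dct_matrix` and `matmul_transpose` are illustrative, not taken from the slides.

```python
import math


def dct_matrix(N):
    """Build the N x N DCT matrix with 1-based indices k (row) and l (column)."""
    C = []
    for k in range(1, N + 1):
        row = []
        for l in range(1, N + 1):
            if k == 1:
                row.append(math.sqrt(1.0 / N))
            else:
                angle = (2 * l - 1) * (k - 1) * math.pi / (2 * N)
                row.append(math.sqrt(2.0 / N) * math.cos(angle))
        C.append(row)
    return C


def matmul_transpose(C):
    """Return C * C^T as a list of lists."""
    N = len(C)
    return [[sum(C[i][m] * C[j][m] for m in range(N)) for j in range(N)]
            for i in range(N)]


C = dct_matrix(8)
product = matmul_transpose(C)
# product is the identity matrix up to floating-point rounding,
# which is equivalent to C^{-1} = C^T.
```

Because the inverse is just the transpose, the inverse DCT costs no more than the forward transform, and all entries stay real.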

However, the Fourier transform has strong disadvantages for some applications:
- it is complex-valued
- it has poor energy compaction

Energy compaction is the ability to pack the energy of the spatial sequence into as few frequency coefficients as possible. If compaction is high, only a few coefficients have to be transmitted.

Algorithm

1. Record a speech signal.
2. Preprocess the speech signal through linear prediction analysis to flatten the spectrum.
3. Use a frame size of 20 ms with an overlap of 10 ms to get a pitch contour.
4. Compute the DCT magnitude spectrum for each analysis frame. The DCT spectrum is smoothed by the following window:
   W(k) = 1                          for 0 ≤ k ≤ N/2
   W(k) = 0.5*(1 - cos(2*pi*k/N))    for N/2 < k < N
5. Apply the spectral autocorrelation function (SAF) to the smoothed DCT spectrum.
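Steps 3 to 5 can be sketched in plain Python as follows. Several details here are assumptions that the slides do not state: the pitch is taken from the lag of the largest SAF value inside a 60 to 400 Hz search range, DCT bin k is mapped to the frequency k*fs/(2N), and the LP preprocessing of step 2 is skipped. All function names are illustrative. The test frame is idealized, built from ten harmonics of 200 Hz that line up exactly with DCT basis functions.

```python
import math


def split_frames(signal, fs, frame_ms=20, hop_ms=10):
    """Step 3: 20 ms frames overlapping by 10 ms."""
    size = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def dct_magnitude(frame):
    """Step 4: magnitude of the orthonormal DCT of one frame."""
    N = len(frame)
    mags = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        coeff = scale * sum(frame[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                            for n in range(N))
        mags.append(abs(coeff))
    return mags


def smoothing_window(N):
    """Step 4: flat up to N/2, then a raised-cosine taper down to zero."""
    return [1.0 if k <= N / 2 else 0.5 * (1 - math.cos(2 * math.pi * k / N))
            for k in range(N)]


def spectral_autocorrelation(spec, max_lag):
    """Step 5: autocorrelation of the spectrum along the frequency axis."""
    K = len(spec)
    return [sum(spec[k] * spec[k + lag] for k in range(K - lag))
            for lag in range(max_lag + 1)]


def estimate_pitch(frame, fs, f_min=60.0, f_max=400.0):
    """Pick the SAF peak in an assumed pitch range and convert it to Hz."""
    N = len(frame)
    spec = [w * m for w, m in zip(smoothing_window(N), dct_magnitude(frame))]
    bin_hz = fs / (2.0 * N)  # frequency spacing of DCT bins
    lo = max(1, int(math.ceil(f_min / bin_hz)))
    hi = int(f_max / bin_hz)
    saf = spectral_autocorrelation(spec, hi)
    best_lag = max(range(lo, hi + 1), key=lambda lag: saf[lag])
    return best_lag * bin_hz


fs, N = 8000, 160  # 20 ms at 8 kHz, so each DCT bin spans 25 Hz
frame = [sum(math.cos((2 * n + 1) * (8 * h) * math.pi / (2 * N))
             for h in range(1, 11)) for n in range(N)]
f0 = estimate_pitch(frame, fs)
print(f0)  # 200.0
```

The harmonics of a voiced frame are spaced by the fundamental frequency, so the smoothed spectrum correlates strongly with itself at a lag equal to that spacing. Running `estimate_pitch` over every frame returned by `split_frames` gives a pitch contour.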


Result:
This algorithm was tested on a number of speech segments and was found to be a robust tool for obtaining a good estimate of the fundamental frequency, as shown in the accompanying graph.

THANK YOU
ANY QUESTIONS?
