
Proceedings of the 7th Student Scientific Research Conference, University of Danang, 2010

PERFORMANCE ASSESSMENT OF VOICE ACTIVITY DETECTION ALGORITHMS USING ADAPTIVE THRESHOLDS AND NEURAL NETWORKS IN THE WAVELET DOMAIN

Students: Nguyn Tr Phc, Trn L Anh Th, Nguyn Ngc Nh Trang
Classes 05DT2 - 05DT3, Faculty of Electronics and Telecommunications, University of Technology

Supervisor: Dr. Pham Van Tuan
Faculty of Electronics and Telecommunications, University of Technology
ABSTRACT
The objective of this paper is to study voice activity detection (VAD) algorithms based on the wavelet transform. Features extracted in the wavelet domain are either compared with adaptive thresholds or classified by a neural network (NN). These VAD algorithms are evaluated on the noise-corrupted TIMIT corpus and compared with other VAD methods standardized by ITU-T and ETSI. The experimental results show that the wavelet-based approaches lead to superior classification performance and offer much lower computational complexity than the other VAD methods.

1. Introduction
Voice activity detection plays an important role in speech processing methods and in communication applications such as coding, transmission, and recognition [1]. Because real-world noise is complex, it is very difficult to build VAD algorithms that remain robust across many environments. Many methods have been proposed to improve VAD performance, such as combining several features from the time, spectral, and wavelet domains; designing decision rules that adapt to the noise level; and training statistical models such as neural networks and hidden Markov models.

Figure 1. VAD result for a noisy speech signal

In this paper, we study VAD algorithms based on the discrete wavelet transform (DWT) and evaluate their performance on the TIMIT database mixed with noise. Section 2 presents the DWT-based VAD algorithms and the standard VAD methods of ITU-T and ETSI. Simulations and an analysis of the results are presented in Section 3. The final section gives conclusions and directions for future work.

2. VAD algorithms
The speech signal is first divided into frames, and features that characterize the speech and non-speech parts are then extracted (Figure 2). The decision is made either against a threshold or by a trained model.

Figure 2. Block diagram of the VAD procedure

2.1. VAD using the wavelet transform
2.1.1. Subband distance feature and adaptive threshold
The Wavelet Subband Distance Measure (WSDM) algorithm of [2] computes the distance between two subbands, based on the difference in subband energy distribution between the speech and non-speech parts. The feature is defined by the formulas in (1).
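$$D(i) = \frac{1}{N_a}\sum_{n=1}^{N_a} X_{m,i}^{2}(n) - \frac{1}{N-N_a}\sum_{n=N_a+1}^{N} X_{m,i}^{2}(n), \qquad D_w(i) = D(i)\left(\frac{1}{2} + \frac{16}{\log(2)}\,\log\Bigl(1 + 2\sum_{k=1}^{N} x_i^{2}(k)\Bigr)\right) \tag{1}$$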
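As a rough illustration of (1), the sketch below computes D(i) and D_w(i) for a single frame. It assumes that the wavelet coefficients of the frame are already available as a list in which the first N_a values belong to the low-frequency subband and the remaining N - N_a values to the high-frequency subband, and that x_i(k) denotes the time-domain samples of the same frame, which the text does not state explicitly. The function name `wsdm_feature` and the toy values are hypothetical. Since the logarithm is divided by log(2), the weighting term is a base-2 logarithm whatever base is used.

```python
import math


def wsdm_feature(coeffs, n_low, frame):
    """Return (D, D_w) of equation (1) for one frame.

    coeffs : wavelet coefficients of the frame, low-frequency subband
             first (n_low values), high-frequency subband after it
    n_low  : N_a, the number of low-frequency coefficients
    frame  : samples x_i(k) of the frame (assumed to be time-domain samples)
    """
    n_total = len(coeffs)
    if not 0 < n_low < n_total:
        raise ValueError("n_low must satisfy 0 < n_low < len(coeffs)")
    # Mean power of each subband.
    low_power = sum(c * c for c in coeffs[:n_low]) / n_low
    high_power = sum(c * c for c in coeffs[n_low:]) / (n_total - n_low)
    distance = low_power - high_power
    # Frame-energy weighting; log(.) / log(2) is a base-2 logarithm.
    energy = sum(s * s for s in frame)
    weight = 0.5 + 16.0 * math.log(1.0 + 2.0 * energy) / math.log(2.0)
    return distance, distance * weight


# Toy frame: strong low-band coefficients, no high-band content.
d, d_w = wsdm_feature([2.0, 2.0, 0.0, 0.0], n_low=2, frame=[1.0, 1.0, 1.0, 0.5])
```

For this toy frame the low-band mean power is 4 and the high-band mean power is 0, so D(i) = 4, and the weighting only rescales that value according to the frame energy.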

In (1), $N$ is the number of samples in a frame, and $N_a$ and $N - N_a$ are the lengths of the sets of wavelet coefficients $X_{m,i}(n)$ in the low-frequency and high-frequency subbands, respectively. $X_{m,i}(n)$ is obtained by applying the DWT at the $m$-th scale to the $i$-th windowed frame. A percentile filter (PF) is used to determine the adaptive noise threshold; its design rests on the principle that speech information does not usually occupy all frequency channels at the same time.

2.1.2. Detail-coefficient energy feature and adaptive decision threshold
The Wavelet Details Coefficients Energy (WDCE) feature of [3] relies on the following property: at large DWT scales, the detail component of a noisy signal is determined mainly by the speech part, while the amplitude of the noise is very small. VAD is therefore performed by comparing the energy of the detail components obtained from the wavelet transform of the current frame with the energy of the detail components of the four most recent preceding noise frames.

2.1.3. WSDM feature and neural network (NN) model
In [2], a three-layer NN is trained to classify each input audio frame as speech or non-speech. The network consists of an input layer, a hidden layer (which analyzes the data), and an output layer. The NN is trained with the Levenberg-Marquardt algorithm [2] on the noise-corrupted TIMIT Test data. The WSDM feature and its first- and second-order derivatives are fed to the input layer. The two signals at the output layer are compared with each other to make the decision.

2.2. Other methods
2.2.1. ITU-T G.729 Annex B VAD
The G729B VAD algorithm in [4] was developed for multimedia communication and fixed telephony. The speech signal is divided into frames of 10 ms.

Four features are used for the VAD decision: the low-band (0-1 kHz) energy difference, the full-band energy difference, the zero-crossing rate difference, and the spectral distortion.

2.2.2. ETSI ES 202 050 VAD
In the ETSI ES 202 050 standard [5], the VAD is integrated in the pre-processing block for noise estimation. ETSI-Nest (noise estimation) computes the short-term energy (80 samples per frame). This energy is used to update the mean energy level; the difference between the two energy levels is then computed and compared with the decision threshold.

2.2.3. Mel-Frequency Cepstral Coefficients (MFCCs) feature and NN model
In [6], a VAD algorithm is designed by training a neural network on MFCC spectral features, in the same way as in Section 2.1.3. MFCC extraction is based on how the human ear perceives sound frequency. After the Fourier transform (FFT), each speech frame is passed through a bank of overlapping triangular Mel-scale filters. The filter-bank outputs are compressed and their logarithm is taken. Finally, the discrete cosine transform (DCT) is applied to compute 13 MFCC coefficients. The first- and second-order derivatives are also extracted and fed to the input layer of the NN.

3. Simulation and analysis of results
The VAD algorithms are evaluated on the TIMIT Test database mixed with four types of noise: factory, babble, car, and white noise. Each noise type is mixed with the clean signal at signal-to-noise ratios (SNR) of -5, 0, 5, 10, 15, 20, 25, and 30 dB. The performance of each VAD algorithm is assessed with the Recall (RC), Precision (PR), and Fscore measures.
$$RC = \frac{t_1}{t_1 + f_0}, \qquad PR = \frac{t_1}{t_1 + f_1}, \qquad F_{score} = \frac{2 \cdot PR \cdot RC}{PR + RC} \tag{3}$$

Here, $t_1$ is the number of speech frames detected correctly, $t_0$ is the number of non-speech frames detected correctly, $f_1$ is the number of frames wrongly detected as speech, and $f_0$ is the number of frames wrongly detected as non-speech.

3.1. Classification performance of the VAD algorithms under study
The curves in the RC-PR plane in Figure 3 show the following:
G729B: gives good results at high SNR. As the SNR decreases, PR stays fairly stable while RC drops sharply, which shows that the algorithm detects fewer speech frames.
ETSI-Nest: at high SNR, RC is very high and PR is good. Under strong babble, factory, and white noise, RC drops sharply while PR rises slightly. The VAD works well with car noise.
WDCE: in most cases PR is fairly stable and RC decreases as the SNR decreases. With car noise, RC remains high even at very low SNR.
WSDM-PF: at high SNR, both RC and PR are high. As the SNR decreases, RC increases and PR decreases slightly, which means that speech information is not lost. Overall it performs well in all cases and has the lowest complexity.
MFCC-NN, WSDM-NN: performance is good and fairly stable in all cases. However, the complexity is higher because 39 MFCC coefficients are used (13 coefficients with their first- and second-order derivatives).
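The RC, PR, and Fscore measures of (3) can be sketched as follows, assuming per-frame labels in which 1 marks speech and 0 marks non-speech. The function name `vad_scores`, the toy label sequences, and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration, not part of the evaluation described above.

```python
def vad_scores(reference, decision):
    """Return (RC, PR, Fscore) for per-frame labels (1 = speech, 0 = non-speech)."""
    if len(reference) != len(decision):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(reference, decision))
    # t1: speech frames detected as speech.
    t1 = sum(1 for ref, dec in pairs if ref == 1 and dec == 1)
    # f0: speech frames wrongly detected as non-speech.
    f0 = sum(1 for ref, dec in pairs if ref == 1 and dec == 0)
    # f1: non-speech frames wrongly detected as speech.
    f1 = sum(1 for ref, dec in pairs if ref == 0 and dec == 1)
    # Assumed convention: a measure with an empty denominator is reported as 0.0.
    recall = t1 / (t1 + f0) if t1 + f0 else 0.0
    precision = t1 / (t1 + f1) if t1 + f1 else 0.0
    total = precision + recall
    fscore = 2.0 * precision * recall / total if total else 0.0
    return recall, precision, fscore


# Toy example: one missed speech frame and one false alarm.
reference = [1, 1, 1, 1, 0, 0, 0, 0]
decision = [1, 1, 1, 0, 1, 0, 0, 0]
rc, pr, fs = vad_scores(reference, decision)
```

In this toy case t1 = 3, f0 = 1, and f1 = 1, so RC, PR, and Fscore all equal 0.75. A detector that misses more speech frames lowers RC without necessarily changing PR, which is the pattern reported for G729B at low SNR.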

Figure 3. RC-PR curves of the different VAD algorithms (axes: Precision and Recall; one curve per algorithm, WSDM-NN, G729B, MFCC-NN, WSDM-PF, ETSI-Nest, and WDCE, for each of factory, babble, white, and car noise, plus the equal line)

Besides the recall and precision of each algorithm, classification performance can also be assessed through the Fscore. The results in Figure 4 show that G729B yields the lowest Fscore, whereas MFCC-NN always achieves the highest performance among the algorithms. The ETSI-Nest VAD has a high and fairly stable Fscore, but at very low SNR its Fscore drops sharply, except for car noise. The WDCE VAD has a relatively stable Fscore and is highly effective for car and babble noise. WSDM-PF achieves a very high Fscore in most cases; at SNR = 0 dB it still maintains robust performance for car and white noise. WSDM-NN gives a good and stable Fscore in all cases.

Figure 4. Fscore of the different VAD algorithms as a function of SNR (dB), with one panel each for factory, babble, white, and car noise

3.2. Impact of different wavelet families
Based on the analysis above, the WSDM-PF VAD algorithm, which performs well, is stable under noise, and has the lowest complexity, is analyzed further with different wavelet families. The Fscore results in Figure 5 show that the wavelet families give broadly similar Fscores when the SNR varies from 30 to 5 dB. Under harsher noise conditions, however, the Battle wavelet family gives better results than the other families.
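Changing the wavelet family amounts to changing the filter coefficients used by the DWT. The sketch below illustrates this with a single-level DWT and two filters whose coefficients have simple closed forms, Haar and the 4-tap Daubechies filter. It is an illustration under stated assumptions: periodic boundary extension, one decomposition level, and the names `LOWPASS` and `dwt_level` are hypothetical, and the Daubechies order used in the experiments is not given in the text. The other families in the study (Vaidyanathan, Coiflet, Beylkin, Battle, Symmlet) would need their own coefficient tables, which are not reproduced here.

```python
import math

SQRT2 = math.sqrt(2.0)
SQRT3 = math.sqrt(3.0)

# Orthonormal low-pass (scaling) filters. Only two families are listed;
# the 4-tap Daubechies filter is an assumed example of that family.
LOWPASS = {
    "haar": [1.0 / SQRT2, 1.0 / SQRT2],
    "daubechies4": [
        (1.0 + SQRT3) / (4.0 * SQRT2),
        (3.0 + SQRT3) / (4.0 * SQRT2),
        (3.0 - SQRT3) / (4.0 * SQRT2),
        (1.0 - SQRT3) / (4.0 * SQRT2),
    ],
}


def dwt_level(signal, family):
    """One DWT level with periodic extension; returns (approximation, detail)."""
    h = LOWPASS[family]
    # Quadrature mirror high-pass filter: g[k] = (-1)^k * h[L - 1 - k].
    g = [(-1) ** k * h[len(h) - 1 - k] for k in range(len(h))]
    n = len(signal)
    if n % 2:
        raise ValueError("signal length must be even")
    approx, detail = [], []
    for start in range(0, n, 2):  # filter, then keep every second output
        approx.append(sum(h[k] * signal[(start + k) % n] for k in range(len(h))))
        detail.append(sum(g[k] * signal[(start + k) % n] for k in range(len(g))))
    return approx, detail


# Mean subband powers of a slowly varying toy frame for each family.
frame = [math.sin(2.0 * math.pi * n / 16.0) for n in range(16)]
powers = {}
for family in LOWPASS:
    approx, detail = dwt_level(frame, family)
    powers[family] = (
        sum(a * a for a in approx) / len(approx),
        sum(d * d for d in detail) / len(detail),
    )
```

Because both filters are orthonormal, the approximation and detail energies add up to the frame energy in both cases; only the split between the low and high subbands depends on the family, and that split is what a subband distance feature measures.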
Figure 5. Fscore of the different wavelet families as a function of SNR (dB). Legend: Haar, Vaidyanathan, Coiflet, Beylkin, Daubechies, Battle, and Symmlet, each under factory, babble, white, and car noise

4. Conclusion
In this work, different VAD algorithms were studied and their stability was evaluated in different noise environments. The evaluation shows that the VAD algorithms with features extracted in the wavelet domain give the best results at low SNR. The experiments also show that the WSDM-PF algorithm has lower computational complexity than the other VAD groups because it uses a single feature. The evaluation also makes it possible to select the wavelet families that give the most stable performance across SNRs. In future work, we will evaluate the effectiveness of the VAD algorithms when they are integrated into the pre-processing block of a speech recognition system in order to improve recognition performance.

REFERENCES
[1] Peter Vary, Rainer Martin (2006), Digital Speech Transmission, Wiley.
[2] Tuan V. Pham et al. (2008), Voice Activity Detection Algorithms Using Subband Power Distance Feature For Noisy Environments, Proc. Interspeech, pp. 2586-2589.
[3] Jiang Shaojun et al. (2004), A new algorithm for voice activity detection based on Wavelet transform, Proc. IEEE IMVSP, pp. 222-225.
[4] A. Benyassine et al. (1997), ITU-T Recommendation G.729 Annex B: A silence compression scheme for use with G.729 optimized for V.70 digital simultaneous voice and data applications, IEEE Communications Magazine, vol. 35, no. 9, pp. 64-73.
[5] ETSI (2003), Speech Processing, Transmission and Quality Aspects (STQ), Distributed speech recognition, Advanced front-end feature extraction algorithm, Compression algorithms, ETSI ES 202 050 V1.1.3, pp. 14-15 and pp. 40-41.
[6] Tuan V. Pham et al. (2009), Using artificial neural network for robust voice activity detection under adverse conditions, Proc. IEEE RIVF, pp. 1-8.
