
MINISTRY OF EDUCATION AND TRAINING

LẠC HỒNG UNIVERSITY
***

HUỲNH THANH GIÀU

A STUDY OF VIETNAMESE SPEECH RECOGNITION
AND ITS EXPERIMENTAL APPLICATION TO COMPUTER CONTROL

MASTER'S THESIS IN INFORMATION TECHNOLOGY

Đồng Nai, 2012

Major: Information Technology

Code: 60.48.02.01

MASTER'S THESIS IN INFORMATION TECHNOLOGY

SCIENTIFIC ADVISOR:

Dr. VŨ ĐỨC LUNG

Đồng Nai, 2012

ACKNOWLEDGEMENTS

First of all, I would like to express my sincere gratitude to my advisor, Dr. Vũ Đức Lung, who guided me wholeheartedly and created every favorable condition for me to complete this thesis.
I would also like to thank all the lecturers of Lạc Hồng University for their teaching and devoted help. The knowledge they passed on to me will be a valuable asset in my future study, work, and research.
I am deeply grateful to all of them.
Đồng Nai, August 2012
Student
Huỳnh Thanh Giàu


THESIS ABSTRACT

Speech recognition has been studied around the world for many years and has achieved a measure of success. Vietnam has also produced many research works and experiments; however, the results remain limited and more research is needed in this area.
To investigate methods for recognizing Vietnamese speech and to make a small contribution to that body of work, this thesis studies Vietnamese speech recognition and applies it experimentally to human-computer interaction, using hidden Markov models on top of the CMUSphinx platform from Carnegie Mellon University.
The thesis mainly covers speech, speech processing methods, speech feature extraction with MFCC (Mel-scale Frequency Cepstral Coefficient) and LPC (Linear Predictive Coding), hidden Markov models, acoustic models, and phonemes, and applies them to Vietnamese. It also examines the architecture of a speech recognition system through the Sphinx toolkit and uses that toolkit to experiment with Vietnamese speech recognition.
Through this research, the thesis identifies which speech processing approach, models, and methods are comparatively the best suited to Vietnamese speech recognition. In addition, a demo program was built to illustrate the author's understanding of Vietnamese speech recognition.
Given the limited time and the complexity of the problem, this thesis is only an initial step in the study of Vietnamese speech recognition.


TABLE OF CONTENTS
ACKNOWLEDGEMENTS
THESIS ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
INTRODUCTION
CHAPTER 1: OVERVIEW
1.1. OVERVIEW OF RESEARCH IN VIETNAM AND ABROAD
1.2. OBJECTIVES OF THE THESIS
1.3. SCOPE AND LIMITATIONS
CHAPTER 2: THEORETICAL FOUNDATIONS OF SPEECH PROCESSING
2.1. FUNDAMENTALS OF DIGITAL SIGNAL PROCESSING
2.1.1. Digital signals and systems
2.1.1.1. Sinusoidal signals
2.1.1.2. Digital systems
2.1.2. Continuous-frequency transforms
2.1.2.1. Fourier transform
2.1.2.2. Z-transform
2.1.2.3. Relationship between the Fourier transform and the Z-transform
2.1.3. Discrete-frequency transforms
2.1.3.1. Discrete Fourier transform (DFT)
2.1.3.2. Fast Fourier transform
2.1.3.3. Discrete cosine transform
2.1.4. Digital filters and windows
2.1.4.1. Ideal low-pass filter
2.1.4.2. Window methods
2.1.4.3. FIR and IIR filters
2.1.5. Probability and random processes
2.1.5.1. Fundamentals of probability
2.1.5.2. Random variables
2.2. REPRESENTATION OF SPEECH SIGNALS
2.2.1. Short-time Fourier transform
2.2.2. Short-time Fourier analysis
2.3. SPEECH FEATURE EXTRACTION
2.3.1. MFCC (Mel-scale Frequency Cepstral Coefficient) feature extraction
2.3.1.1. Pre-emphasis
2.3.1.2. Windowing
2.3.1.3. Fast Fourier transform (FFT)
2.3.1.4. Mel-scale filter bank
2.3.1.5. Log spectral energy
2.3.1.6. Discrete cosine transform
2.3.2. Linear predictive coding (LPC)
2.3.2.1. Autocorrelation analysis
2.3.2.2. LPC analysis
2.3.2.3. Cepstral analysis
2.3.2.4. Weighting the cepstral coefficients
CHAPTER 3: SPEECH RECOGNITION
3.1. HIDDEN MARKOV MODELS
3.1.1. Discrete Markov chains
3.1.2. Definition of the hidden Markov model
3.1.2.1. Dynamic programming and DTW
3.1.2.2. Evaluating an HMM - the forward algorithm
3.1.2.3. Decoding an HMM - the Viterbi algorithm
3.1.2.4. Estimating HMM parameters - the Baum-Welch algorithm
3.1.3. Practical issues in using HMMs
3.1.3.1. Initial estimates
3.1.3.2. Model topology
3.1.3.3. Training criteria
3.1.3.4. Deleted interpolation
3.1.3.5. Operator optimization
3.1.3.6. Probability representations
3.1.4. Limitations of HMMs
3.1.4.1. Duration modeling
3.1.4.2. First-order assumption
3.1.4.3. Conditional independence assumption
3.2. ACOUSTIC MODELS
3.2.1. Choosing appropriate units for acoustic modeling
3.2.1.1. Comparison of different units
3.2.1.2. Choosing training units for Vietnamese
3.2.2. Acoustic feature evaluation
3.2.2.1. Choice of HMM output distributions
3.2.2.2. Isolated versus continuous speech training
3.2.3. Error calculation method
3.3. LANGUAGE MODELS
3.3.1. Formal language theory
3.3.1.1. The Chomsky hierarchy
3.3.1.2. Chart parsing for context-free grammars (CFG)
3.3.2. Stochastic language models
3.3.2.1. Probabilistic context-free grammars (CFG)
3.3.2.2. n-gram language models
3.3.3. Complexity of language models
CHAPTER 4: SPEECH RECOGNITION TOOLS
4.1. INTRODUCTION TO SPHINX
4.2. SPHINX ARCHITECTURE
4.2.1. The front end - FrontEnd
4.2.2. The linguist - Linguist
4.2.2.1. Language model
4.2.2.2. Dictionary
4.2.2.3. Acoustic model
4.2.2.4. Search graph - SearchGraph
4.2.3. The decoder - Decoder
4.3. SPHINX CONFIGURATION MANAGEMENT
CHAPTER 5: DEMO PROGRAM
5.1. PROGRAM INSTALLATION
5.1.1. Downloading the required Sphinx packages
5.1.2. Installation
5.1.2.1. Installing SphinxBase
5.1.2.2. Installing Sphinxtrain
5.1.2.3. Installing PocketSphinx
5.2. BUILDING THE LINGUIST
5.2.1. Building the dictionary
5.2.2. Building the language model
5.2.2.1. Preparing the text file
5.2.2.2. Generating the vocabulary
5.2.2.3. Generating the language model
5.2.3. Building the acoustic model
5.3. CONFIGURING SPHINX TRAINING
5.3.1. Adjusting parameters
5.3.1.1. Setting up the training directory
5.3.1.2. Adjusting the parameters
5.3.2. Running the training
5.3.2.1. Generating feature vectors
5.3.2.2. Training
5.4. EXPERIMENTAL RESULTS
CONCLUSION
REFERENCES
APPENDIX

LIST OF TABLES
Table 2.1. Properties of the Fourier transform
Table 2.2. Properties of the Z-transform
Table 2.3. Properties of the DFT for periodic sequences of period N
Table 3.1. The Chomsky hierarchy and the corresponding machines that accept each language
Table 4.1. Formatting tags in the configuration file
Table 5.1. Configuration parameters

LIST OF FIGURES
Figure 2.1. An analog signal and its corresponding digital signal
Figure 2.2. A sinusoid with a period of 25 samples
Figure 2.3. The sum of two sinusoids of the same frequency
Figure 2.4. Block diagram of a digital system
Figure 2.5. Graph of the function $X(e^{j\omega})$
Figure 2.6. Representation by real and imaginary parts
Figure 2.7. Representation of z in the complex plane
Figure 2.8. The unit circle
Figure 2.9. Evaluating the z-transform on the unit circle
Figure 2.10. 8-point radix-2 FFT, decimation in frequency
Figure 2.11. The sinc function
Figure 2.12. Plot of $A_R(e^{j\omega})$
Figure 2.13. Distribution function
Figure 2.14. Short-time spectrum of male speech
Figure 2.15. Mapping from log-energy values (x-axis) to gray scale (y-axis)
Figure 2.16. General feature extraction diagram
Figure 2.17. Steps for computing MFCC features
Figure 2.18. Relationship between Mel and Hz
Figure 2.19. Block diagram of the LPC processor for speech feature extraction
Figure 3.1. Illustration of a Markov model
Figure 3.2. Direct comparison of two speech templates
Figure 3.3. Forward trellis computation for the Dow Jones Industrial HMM
Figure 3.4. Viterbi trellis computation for the Dow Jones Industrial HMM
Figure 3.5. Relationship between $\alpha_{t-1}$ & $\alpha_t$ and $\beta_t$ & $\beta_{t+1}$ in the forward-backward algorithm
Figure 3.6. Illustration of the operations required to compute $\gamma_t(i, j)$
Figure 3.7. A typical hidden Markov model used for phoneme modeling
Figure 3.8. A standard HMM
Figure 3.9. Word error rates across models
Figure 3.10. Structure of an isolated-word model
Figure 3.11. A composite sentence hidden Markov model
Figure 3.12. A tree representation of a sentence and its corresponding grammar
Figure 3.13. The inside probability, computed recursively as the sum over all derivations
Figure 3.14. Definition of the outside probability
Figure 4.1. General architecture of Sphinx
Figure 4.2. Feature extraction in the front end using MFCC
Figure 4.3. A chain of DataProcessors
Figure 4.4. An example search graph
Figure 5.1. Installing Sphinx
Figure 5.2. Language model creation process using the CMUclmtk tool
Figure 5.3. Operation diagram of the demo program

INTRODUCTION
Speech is the most basic means of human communication, and speaking is the simplest and most effective way to express oneself. People have long dreamed of automatic machines that can communicate in natural human speech. Today, with the development of science and technology, particularly in computing, automated systems are gradually replacing humans in many tasks. Communicating with machines by voice has therefore become a real need; it is the most civilized and natural mode of interaction.
Speech recognition is not a new problem. A great deal of research has been and is being carried out worldwide, using many different recognition methods, and it has achieved considerable success. Examples include IBM's ViaVoice English recognition system, the Spoken Toolkit from CSLU (Center for Spoken Language Understanding), Microsoft's Speech Recognition Engine, the Hidden Markov Model Toolkit of the University of Cambridge, and CMU Sphinx of Carnegie Mellon University; recognition systems for French, German, Chinese, and other languages are also fairly well developed. For Vietnamese there are works by groups such as AILab, Vietvoice, and Vspeech. In Vietnam, however, speech recognition is still a fairly new field. Although there have been many studies of Vietnamese speech recognition with some achievements, overall the results are not yet good enough to produce highly practical products.
With the aim of understanding how humans and computers can communicate, this thesis studies speech recognition methods and, on that basis, builds a demo program that recognizes spoken Vietnamese so that the computer can understand what a person says.

CHAPTER 1: OVERVIEW

1.1. OVERVIEW OF RESEARCH IN VIETNAM AND ABROAD
Research on speech recognition methods has attracted, and continues to attract, substantial investment and attention from scientists around the world. The idea of building speech recognition systems dates back to the 1950s, and considerable results have been achieved since then.
Many English speech recognition systems exist and are used effectively, such as IBM's ViaVoice, the Spoken Toolkit from CSLU (Center for Spoken Language Understanding), Microsoft's Speech Recognition Engine, the Hidden Markov Model Toolkit of the University of Cambridge, and CMU Sphinx of Carnegie Mellon University; recognition systems for French, German, Chinese, and other languages are also fairly well developed.
In Vietnam, speech recognition is still a fairly new field. Although there have been many studies of Vietnamese speech recognition with some achievements, overall the results are not yet good enough to produce highly practical products. The following works are worth mentioning:
- AILab: This work was produced by the Artificial Intelligence Laboratory (AILab) of the University of Science, using state-of-the-art speech recognition and synthesis technology to meet user needs. Building on its Vietnamese speech processing technology, AILab developed iSago, a software package dedicated to voice-based information search.
With this application, users can interact with a mobile phone directly by voice and search for information about restaurants, bars, and cafes in Ho Chi Minh City.
When the user asks a question by voice, iSago sends the query to a server for processing and returns the search results as a list of restaurant names and addresses.
The software can also display a found address on a map or read it aloud using speech synthesis. It is available free of charge at www.ailab.hcmus.edu.vn (http://www.ailab.hcmus.edu.vn)
- Vietvoice: This software was written by a Vietnamese national living in Canada. It can read Vietnamese text from files aloud. Running the program requires the Microsoft Visual C++ 2005 Redistributable Package (x86). For visually impaired users, the software supports keyboard shortcuts (Ctrl plus a letter) to select one of the features shown on screen. Users can update the dictionary of abbreviations and foreign words.
- Vspeech: This is voice-controlled computer software written by a group of students at Ho Chi Minh City University of Technology. It uses the Microsoft Speech SDK library, which recognizes English, and maps the recognition onto Vietnamese. The group was fairly successful with this idea: because it reuses an existing recognition engine, development time was shortened while recognition accuracy remained fairly good. Vspeech provides simple system commands such as opening My Computer or the Start button. The latest version interacts with MS Word 2003 and supports web browsing with Internet Explorer. It offers no way to customize commands or define application shortcuts. The software runs on Windows XP with a standard microphone and sound card.
However, the application of speech recognition to computer control is still very limited. In Vietnam, Vspeech from the student group at Ho Chi Minh City University of Technology is practically the only such package; other software has only been tested in the laboratory, has not been used in practice, and does not yet handle more than 100 words. Vspeech is built on the Microsoft Speech SDK, which recognizes English; Vietnamese is recognized indirectly, through intermediate data and methods that map the English recognition results onto Vietnamese.
1.2. OBJECTIVES OF THE THESIS
The thesis studies the basic ideas and methods used in speech recognition and, from them, builds a demo program that recognizes about 20 Vietnamese words for controlling a computer by voice.
The thesis consists of 5 chapters:

Chapter 1: An overview of the state of speech recognition research in Vietnam and abroad, together with the objectives and scope of the thesis.
Chapter 2: Basic knowledge of digital signal processing, the representation of speech on a spectrogram, and speech feature extraction using MFCC (Mel-scale Frequency Cepstral Coefficient) and LPC (Linear Predictive Coding).
Chapter 3: The hidden Markov model approach to speech recognition, including its concepts, practical use, and some of its limitations. The chapter also covers the two important models that make up the linguist of a recognition system: the acoustic model and the language model.
Chapter 4: An introduction to the CMUSphinx speech recognition toolkit from Carnegie Mellon University and the components of its architecture, giving an overall view of a speech recognition system and supporting the construction of the demo program.
Chapter 5: Construction of the Vietnamese speech recognition demo program using Sphinx, describing how the language model is built and how the acoustic model is trained.
Appendix: The phoneme-level Vietnamese transcription table in ASCII form, based on the International Phonetic Alphabet (IPA), as used in the program.
1.3. SCOPE AND LIMITATIONS
The thesis is limited to the study of speech, speech processing methods, and speech feature extraction; hidden Markov models, acoustic models, and phonemes, and their application to Vietnamese; and the architecture of a speech recognition system as realized in the Sphinx toolkit. The demo program only goes as far as recognizing about 100 basic commands for controlling a computer (listed in Chapter 5). When a person reads a control command, the computer recognizes it and the command text appears on the program's screen.

CHAPTER 2: THEORETICAL FOUNDATIONS OF SPEECH PROCESSING


2.1. FUNDAMENTALS OF DIGITAL SIGNAL PROCESSING
Define an analog signal $x_a(t)$ as a function that varies continuously in time. If we sample it with sampling period $T$ (that is, at $t = nT$), we obtain a discrete-time signal $x(n) = x_a(nT)$. We can further define the sampling frequency $F_s = 1/T$, the reciprocal of the sampling period $T$.

Figure 2.1. An analog signal and its corresponding digital signal
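The sampling relation above can be illustrated with a short sketch. The 200 Hz cosine, the 8000 Hz sampling rate, and the names `sample_signal` and `x_a` are illustrative assumptions, not values taken from the thesis.

```python
import math

def sample_signal(analog, sampling_rate_hz, num_samples):
    """Return x(n) = x_a(nT) for n = 0..num_samples-1, where T = 1/Fs."""
    period = 1.0 / sampling_rate_hz  # sampling period T
    return [analog(n * period) for n in range(num_samples)]

def x_a(t):
    """Hypothetical analog signal: a 200 Hz cosine."""
    return math.cos(2.0 * math.pi * 200.0 * t)

# With Fs = 8000 Hz, one period of the 200 Hz cosine spans 40 samples.
x = sample_signal(x_a, sampling_rate_hz=8000.0, num_samples=80)
```

Under these assumptions the discrete signal repeats every 40 samples, since 8000 / 200 = 40.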

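2.1.1. Digital signals and systems:


2.1.1.1. Sinusoidal signals:
One of the most important signals is the sine wave, or sinusoid:

$$x_0[n] = A_0 \cos(\omega_0 n + \phi_0) \tag{2.1}$$

where $A_0$ is the amplitude of the sinusoid, $\omega_0$ is the angular frequency, and $\phi_0$ is the phase.

Angles are measured in radians, so the angular frequency $\omega_0$ is related to the frequency $f_0$ by $\omega_0 = 2\pi f_0$, with $0 \le f_0 \le 1$.
This signal is periodic with period $T_0 = 1/f_0$.

Figure 2.2. A sinusoid with a period of 25 samples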

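Sinusoids are important because speech signals can be decomposed into sums of sinusoids. The sum of two sinusoids $x_0[n]$ and $x_1[n]$ with the same angular frequency $\omega_0$ but different amplitudes $A_0$, $A_1$ and phases $\phi_0$, $\phi_1$ is another sinusoid with the same frequency but a different amplitude $A$ and phase $\phi$.
The sinusoid in (2.1) can be expressed as the real part of the corresponding complex exponential:

$$x_0[n] = A_0 \cos(\omega_0 n + \phi_0) = \mathrm{Re}\left\{ A_0 e^{j(\omega_0 n + \phi_0)} \right\} \tag{2.2}$$

where $j = \sqrt{-1}$.
The sum of two complex exponential signals is therefore:

$$A_0 e^{j(\omega_0 n + \phi_0)} + A_1 e^{j(\omega_0 n + \phi_1)} = e^{j\omega_0 n}\left(A_0 e^{j\phi_0} + A_1 e^{j\phi_1}\right) = e^{j\omega_0 n} A e^{j\phi} = A e^{j(\omega_0 n + \phi)} \tag{2.3}$$

Taking the real part of both sides gives:

$$A_0 \cos(\omega_0 n + \phi_0) + A_1 \cos(\omega_0 n + \phi_1) = A \cos(\omega_0 n + \phi) \tag{2.4}$$

Figure 2.3. The sum of two sinusoids of the same frequency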

To compute $A$ and $\phi$, we use the formulas:

$$A^2 = A_0^2 + A_1^2 + 2 A_0 A_1 \cos(\phi_0 - \phi_1) \tag{2.5}$$

$$\tan\phi = \frac{A_0 \sin\phi_0 + A_1 \sin\phi_1}{A_0 \cos\phi_0 + A_1 \cos\phi_1} \tag{2.6}$$
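As a numerical check of the amplitude and phase formulas for the sum of two sinusoids of the same frequency, the following sketch adds the two phasors directly and compares the result with the closed-form expressions. The amplitudes, phases, and the function name `add_sinusoids` are arbitrary illustrative choices.

```python
import cmath
import math

def add_sinusoids(a0, phi0, a1, phi1):
    """Amplitude A and phase phi of A0*cos(w*n + phi0) + A1*cos(w*n + phi1)."""
    phasor = a0 * cmath.exp(1j * phi0) + a1 * cmath.exp(1j * phi1)
    return abs(phasor), cmath.phase(phasor)

# Illustrative values (assumptions): period of 25 samples, arbitrary A and phi.
w0 = 2.0 * math.pi / 25.0
a0, phi0, a1, phi1 = 1.0, 0.3, 0.5, -1.1
A, phi = add_sinusoids(a0, phi0, a1, phi1)

# Closed-form expressions for A^2 and tan(phi).
A_sq_formula = a0 ** 2 + a1 ** 2 + 2.0 * a0 * a1 * math.cos(phi0 - phi1)
tan_phi_formula = (a0 * math.sin(phi0) + a1 * math.sin(phi1)) / (
    a0 * math.cos(phi0) + a1 * math.cos(phi1)
)

# Largest sample-by-sample difference between the sum and the single sinusoid.
max_error = max(
    abs(a0 * math.cos(w0 * n + phi0) + a1 * math.cos(w0 * n + phi1)
        - A * math.cos(w0 * n + phi))
    for n in range(50)
)
```

The tangent formula alone determines the phase only up to a multiple of pi; `cmath.phase` resolves the quadrant from the real and imaginary parts of the phasor.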

2.1.1.2. Digital systems:
A digital system is a system that, given an input signal $x(n)$, produces an output signal $y(n)$:

$$y(n) = T\{x(n)\} \tag{2.7}$$

Figure 2.4. Block diagram of a digital system

In general, a digital system is defined to be linear if and only if:

$$T\{a_1 x_1(n) + a_2 x_2(n)\} = a_1 T\{x_1(n)\} + a_2 T\{x_2(n)\} \tag{2.8}$$

for any values $a_1$, $a_2$ and any signals $x_1(n)$, $x_2(n)$.

A linear time-invariant (LTI) system is described as follows:

$$y(n) = \sum_{k=-\infty}^{\infty} x(k)\, h(n-k) = x(n) * h(n) \tag{2.9}$$

where $*$ denotes convolution. This is the most important operation in signal processing: it determines the output $y(n)$ of the system from the input $x(n)$ and the impulse response $h(n)$. Convolution is commutative, distributive, and associative.
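The convolution sum can be sketched for finite-length sequences as follows. The helper name `convolve` and the example sequences are illustrative assumptions; both sequences are taken to start at n = 0.

```python
def convolve(x, h):
    """y(n) = sum_k x(k) * h(n - k) for finite sequences starting at n = 0."""
    y = [0.0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for m, hm in enumerate(h):
            # x(k) contributes to output index n = k + m through h(m) = h(n - k).
            y[k + m] += xk * hm
    return y

x = [1.0, 2.0, 3.0]   # example input
h = [1.0, -1.0]       # example impulse response (a first difference)
y = convolve(x, h)
y_swapped = convolve(h, x)  # commutativity: h * x equals x * h
```

With these values the output is `[1.0, 1.0, 1.0, -3.0]`, and swapping the two arguments gives the same result, illustrating the commutative property.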
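2.1.2. Continuous-frequency transforms:
2.1.2.1. Fourier transform:
The Fourier transform of a signal $x(n)$ is defined as:

$$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x(n)\, e^{-j\omega n} \tag{2.10}$$

Figure 2.5. Graph of the function $X(e^{j\omega})$

Because $e^{j\omega} = \cos\omega + j\sin\omega$ is periodic with period $2\pi$, it suffices to represent $X(e^{j\omega})$ over the range $0$ to $2\pi$, or $-\pi$ to $\pi$, and then extend it periodically.
$X(e^{j\omega})$ can be represented in several ways:
+ By real and imaginary parts Re, Im:

$$X(e^{j\omega}) = \mathrm{Re}[X(e^{j\omega})] + j\,\mathrm{Im}[X(e^{j\omega})] \tag{2.11}$$

+ By modulus and argument:

$$X(e^{j\omega}) = \lvert X(e^{j\omega}) \rvert\, e^{j \arg[X(e^{j\omega})]} \tag{2.12}$$

+ By magnitude and phase, where the magnitude $A(e^{j\omega})$ may take both negative and positive values:

$$X(e^{j\omega}) = A(e^{j\omega})\, e^{j\theta(\omega)} \tag{2.13}$$

Existence of the Fourier transform: the transform maps the discrete time domain $n$ entirely onto the frequency domain $\omega$ (that is, in the frequency domain only the variable $\omega$ remains, not $n$), so existence depends on the convergence of the series. A sufficient condition for the Fourier transform of a sequence $x(n)$ to exist is:

$$\sum_{n=-\infty}^{\infty} \lvert x(n) \rvert < \infty$$

that is, the series $\sum_{n=-\infty}^{\infty} \lvert x(n) \rvert$ converges.

Inverse Fourier transform (IFT):
The inverse Fourier transform of the spectrum $X(e^{j\omega})$ is defined as:

$$x(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{j\omega})\, e^{j\omega n}\, d\omega \tag{2.14}$$

The inverse Fourier transform lets us recover $x(n)$ from $X(e^{j\omega})$.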


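Table 2.1. Properties of the Fourier transform

| Domain n | Domain ω |
|---|---|
| $x(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(e^{j\omega}) e^{j\omega n}\, d\omega$ | $X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x(n) e^{-j\omega n}$ |
| $a x_1(n) + b x_2(n)$ (a, b: constants) | $a X_1(e^{j\omega}) + b X_2(e^{j\omega})$ |
| $x(n - n_0)$ | $e^{-j\omega n_0} X(e^{j\omega})$ |
| $x(n)$ real (symmetry) | $X^*(e^{j\omega}) = X(e^{-j\omega})$; $\mathrm{Re}[X(e^{j\omega})] = \mathrm{Re}[X(e^{-j\omega})]$; $\mathrm{Im}[X(e^{j\omega})] = -\mathrm{Im}[X(e^{-j\omega})]$; $\lvert X(e^{j\omega}) \rvert = \lvert X(e^{-j\omega}) \rvert$; $\arg[X(e^{j\omega})] = -\arg[X(e^{-j\omega})]$ |
| $x^*(n)$ | $X^*(e^{-j\omega})$ |
| $x(-n)$ | $X(e^{-j\omega})$ |
| $x_1(n) * x_2(n)$ | $X_1(e^{j\omega}) \cdot X_2(e^{j\omega})$ |
| $x_1(n) \cdot x_2(n)$ | $\frac{1}{2\pi}\int_{-\pi}^{\pi} X_1(e^{j(\omega - \omega')})\, X_2(e^{j\omega'})\, d\omega'$ |
| $n\, x(n)$ | $j \frac{dX(e^{j\omega})}{d\omega}$ |
| $e^{j\omega_0 n} x(n)$ | $X(e^{j(\omega - \omega_0)})$ |
| $x(n) \cos \omega_0 n$ | $\frac{1}{2} X(e^{j(\omega - \omega_0)}) + \frac{1}{2} X(e^{j(\omega + \omega_0)})$ |
| Parseval's relation: $\sum_{n=-\infty}^{\infty} x_1(n)\, x_2^*(n)$ | $\frac{1}{2\pi}\int_{-\pi}^{\pi} X_1(e^{j\omega})\, X_2^*(e^{j\omega})\, d\omega$ |
| $\sum_{n=-\infty}^{\infty} \lvert x(n) \rvert^2$ | $\frac{1}{2\pi}\int_{-\pi}^{\pi} \lvert X(e^{j\omega}) \rvert^2\, d\omega$ |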

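2.1.2.2. Z-transform:
Definition: the z-transform of a sequence $x(n)$ is defined as:

$$X(z) = \sum_{n=-\infty}^{\infty} x(n)\, z^{-n} \tag{2.15}$$

This definition is also called the two-sided z-transform.

We obtain the one-sided z-transform if the lower limit is changed so that $n$ runs from $0$ to $+\infty$:

$$X(z) = \sum_{n=0}^{\infty} x(n)\, z^{-n} \tag{2.16}$$

Note that $z$ is a complex variable, which can be represented in two forms:

+ By real and imaginary parts Re[z], Im[z]:

$$z = \mathrm{Re}[z] + j\,\mathrm{Im}[z] \tag{2.17}$$

Figure 2.6. Representation by real and imaginary parts

+ In polar coordinates:

$$z = r e^{j\omega} = r(\cos\omega + j\sin\omega) = r\cos\omega + j\, r\sin\omega = \mathrm{Re}[z] + j\,\mathrm{Im}[z] \tag{2.18}$$

Figure 2.7. Representation of z in the complex plane

- Special case: $\lvert z \rvert = r = 1$ gives the unit circle.

Figure 2.8. The unit circle

Region of convergence of the z-transform: the set of all values of $z$ for which the series

$$X(z) = \sum_{n=-\infty}^{\infty} x(n)\, z^{-n} \tag{2.19}$$

converges is called the region of convergence of the z-transform, denoted RC (Region of Convergence).

Inverse z-transform (IZT):

$$\mathrm{IZT}[X(z)] = x(n)$$

The inverse z-transform is defined as:

$$x(n) = \frac{1}{2\pi j} \oint_C X(z)\, z^{n-1}\, dz \tag{2.20}$$

where $C$ is a closed contour that encloses the origin and lies within the region of convergence; the contour integral is taken in the positive (counterclockwise) direction.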


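Table 2.2. Properties of the Z-transform

| Domain n | Domain Z |
|---|---|
| $x(n) = \frac{1}{2\pi j} \oint_C X(z)\, z^{n-1}\, dz$ | $X(z) = \sum_{n=-\infty}^{\infty} x(n)\, z^{-n}$ |
| $a x_1(n) + b x_2(n)$ (a, b: constants) | $a X_1(z) + b X_2(z)$ |
| $x(n - n_0)$ | $z^{-n_0} X(z)$ |
| $a^n x(n)$ | $X(a^{-1} z)$ |
| $n\, x(n)$ | $-z \frac{dX(z)}{dz}$ |
| $x^*(n)$ (*: complex conjugate) | $X^*(z^*)$ |
| $x(-n)$ | $X\left(\frac{1}{z}\right)$ |
| $x_1(n) * x_2(n)$ | $X_1(z) \cdot X_2(z)$ |
| $x_1(n) \cdot x_2(n)$ | $\frac{1}{2\pi j} \oint_C X_1(v)\, X_2\left(\frac{z}{v}\right) v^{-1}\, dv$ |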

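2.1.2.3. Relationship between the Fourier transform and the Z-transform

By the definition of the z-transform:

$$X(z) = \sum_{n=-\infty}^{\infty} x(n)\, z^{-n} \tag{2.21}$$

On the other hand, $z$ is a complex variable that can be written in polar form in the complex plane as $z = r e^{j\omega}$.
If we evaluate the z-transform on the unit circle ($r = 1$), we obtain:

$$X(z)\big|_{z = e^{j\omega}} = \sum_{n=-\infty}^{\infty} x(n)\, e^{-j\omega n} = X(e^{j\omega}) \tag{2.22}$$

Figure 2.9. Evaluating the z-transform on the unit circle

From this we can make the following observations:

- The Fourier transform is the z-transform evaluated on the unit circle.
- The Fourier transform is a special case of the z-transform.
- The Fourier transform can be obtained from the z-transform by evaluating it on the unit circle, provided that the unit circle lies within the region of convergence of the z-transform.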
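The relationship between the two transforms can be checked numerically for a short causal sequence, for which the region of convergence includes the unit circle. The sequence values, the frequency, and the function names below are illustrative assumptions.

```python
import cmath
import math

def z_transform(x, z):
    """X(z) = sum_n x(n) * z^(-n) for a finite causal sequence x(0..L-1)."""
    return sum(xn * z ** (-n) for n, xn in enumerate(x))

def fourier_transform(x, omega):
    """X(e^{j*omega}) = sum_n x(n) * e^{-j*omega*n} for the same sequence."""
    return sum(xn * cmath.exp(-1j * omega * n) for n, xn in enumerate(x))

x = [1.0, 0.5, 0.25, 0.125]  # example finite sequence
omega = 0.7                  # arbitrary frequency in radians

on_unit_circle = z_transform(x, cmath.exp(1j * omega))  # z = e^{j*omega}, r = 1
direct = fourier_transform(x, omega)
shifted = fourier_transform(x, omega + 2.0 * math.pi)   # periodicity in omega
```

The two evaluations agree to within rounding error, and shifting the frequency by 2π leaves the Fourier transform unchanged.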
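2.1.3. Discrete-frequency transforms:
2.1.3.1. Discrete Fourier transform (DFT):
If a signal $x_N(n)$ is periodic with period $N$, then:

$$x_N(n) = x_N(n + N) \tag{2.23}$$

The discrete Fourier transform of a periodic sequence $x_N(n)$ with period $N$ is defined as:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j\omega_k n} = \sum_{n=0}^{N-1} x(n)\, e^{-j\frac{2\pi}{N} k n} \tag{2.24}$$

where $\omega_k = \frac{2\pi}{N} k$, with $k = 0, \dots, N-1$ and $n = 0, \dots, N-1$.

$x(n)$ is periodic with period $N$, so it satisfies $x(n) = x(n + lN)$.

Let

$$W_N = e^{-j\frac{2\pi}{N}}, \qquad W_N^{kn} = e^{-j\frac{2\pi}{N} k n}, \qquad W_N^{-kn} = e^{j\frac{2\pi}{N} k n}, \qquad W_N^{0} = 1$$

With this notation, the discrete Fourier transform of a periodic sequence with period $N$ can be rewritten as:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{kn} \tag{2.25}$$

Inverse discrete Fourier transform (IDFT):

$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, e^{j\frac{2\pi}{N} k n} \tag{2.26}$$

Or, written more compactly:

$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, W_N^{-kn} \tag{2.27}$$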
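The DFT and its inverse can be written directly from their defining sums. The following is a minimal sketch using only the Python standard library; the function names and the 4-point test sequence are illustrative assumptions, and the direct evaluation costs on the order of N² operations.

```python
import cmath
import math

def dft(x):
    """X(k) = sum_{n=0}^{N-1} x(n) * W_N^(k*n), with W_N = exp(-j*2*pi/N)."""
    N = len(x)
    W = cmath.exp(-2j * math.pi / N)
    return [sum(x[n] * W ** (k * n) for n in range(N)) for k in range(N)]

def idft(X):
    """x(n) = (1/N) * sum_{k=0}^{N-1} X(k) * W_N^(-k*n)."""
    N = len(X)
    W = cmath.exp(-2j * math.pi / N)
    return [sum(X[k] * W ** (-k * n) for k in range(N)) / N for n in range(N)]

x = [1.0, 2.0, 0.0, -1.0]  # example real sequence, N = 4
X = dft(x)
x_back = idft(X)           # should reproduce x up to rounding error
```

For this real input the spectrum shows the conjugate symmetry X(N - k) = X*(k), and the inverse transform recovers the original samples.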

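Table 2.3. Properties of the DFT for periodic sequences of period N

| Domain n | Domain k |
|---|---|
| $x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, W_N^{-kn}$ | $X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{kn}$ |
| $a x_1(n) + b x_2(n)$ | $a X_1(k) + b X_2(k)$ |
| $x(n - n_0)$ | $W_N^{k n_0} X(k)$ |
| $W_N^{ln} x(n)$ | $X(k + l)$ |
| $x_1(n) \,(*)\, x_2(n)$ (circular convolution) | $X_1(k)\, X_2(k)$ |
| $x_1(n)\, x_2(n)$ | $\frac{1}{N} \sum_{l=0}^{N-1} X_1(l)\, X_2(k - l) = \frac{1}{N} X_1(k) \,(*)\, X_2(k)$ |
| $x(n)$ real | $X(k) = X^*(-k)$; $\mathrm{Re}[X(k)] = \mathrm{Re}[X(-k)]$; $\mathrm{Im}[X(k)] = -\mathrm{Im}[X(-k)]$; $\lvert X(k) \rvert = \lvert X(-k) \rvert$; $\arg[X(k)] = -\arg[X(-k)]$ |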

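2.1.3.2. Fast Fourier transform:

The fast Fourier transform (FFT) is a very efficient algorithm for computing the DFT of a sequence. Its advantage comes from the fact that many computations are repeated because of the periodicity of the Fourier term $e^{-j\frac{2\pi}{N} kn}$. The DFT has the form:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk} \tag{2.28}$$

Because of the periodicity of the Fourier term, $W_N^{(n+qN)(k+rN)} = W_N^{nk}$ for all integers $q$, $r$. Split the DFT into two parts:

$$X(k) = \sum_{n=0}^{\frac{N}{2}-1} x(2n)\, W_N^{2nk} + \sum_{n=0}^{\frac{N}{2}-1} x(2n+1)\, W_N^{(2n+1)k} \tag{2.29}$$

The subscript $N$ of the Fourier term indicates the size of the sequence. If we denote the even-indexed part of the sequence $x(n)$ by $x_{ev}$ and the odd-indexed part by $x_{od}$, the equation can be rewritten as:

$$X(k) = \sum_{n=0}^{\frac{N}{2}-1} x_{ev}(n)\, W_{N/2}^{nk} + W_N^{k} \sum_{n=0}^{\frac{N}{2}-1} x_{od}(n)\, W_{N/2}^{nk} \tag{2.30}$$

These are two DFT expressions, so we can write:

$$X(k) = X_{ev}(k) + W_N^{k}\, X_{od}(k) \tag{2.31}$$

The index $k$ runs up to $N-1$, but because the even and odd DFTs are periodic, only $N/2$-point DFTs need to be computed to obtain the values of $X(k)$:

$$X_{ev}(k) = X_{ev}\left(k - \frac{N}{2}\right), \qquad \frac{N}{2} \le k \le N-1 \tag{2.32}$$

We continue to split each resulting DFT into even and odd halves until only two-point DFTs remain:

$$X(k) = x(0) + x(1)\, e^{-j\frac{2\pi k}{2}}, \qquad X(0) = x(0) + x(1), \qquad X(1) = x(0) - x(1) \tag{2.33}$$

A two-point DFT requires only an addition and a subtraction, with no multiplication. To compute the full DFT, we multiply the two-point DFTs by the appropriate factors $W$, from $W^0$ to $W^{N/2}$. The figure below shows the graph of an 8-point FFT.

Figure 2.10. 8-point radix-2 FFT, decimation in frequency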

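We can compare the DFT and the FFT directly as follows:

When the DFT is computed directly, each value of $k$ requires $N$ complex multiplications and $N-1$ complex additions. In the FFT, every operation has the form $x(0) \pm W^p x(1)$ (called a butterfly, because its graph resembles a butterfly's wings) and requires one multiplication and two additions. From the graph in Figure 2.10, the number of butterflies generalizes to:

$$\text{Number of butterflies} = \frac{N}{2} \log_2 N$$

This is because there are $N/2$ rows of butterflies (each butterfly has two inputs) and $\log_2 N$ columns of butterflies.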
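The even/odd splitting described above leads naturally to a recursive implementation. The sketch below is one possible radix-2 formulation, not the structure of any particular library; the function names and the test sequence are illustrative assumptions, and the input length must be a power of two.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT: X(k) = X_ev(k) + W_N^k * X_od(k)."""
    N = len(x)
    if N == 0 or N & (N - 1):
        raise ValueError("length must be a power of two")
    if N == 1:
        return [complex(x[0])]
    X_ev = fft(x[0::2])  # DFT of the even-indexed samples
    X_od = fft(x[1::2])  # DFT of the odd-indexed samples
    X = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * math.pi * k / N) * X_od[k]  # W_N^k * X_od(k)
        X[k] = X_ev[k] + t            # one butterfly: a sum ...
        X[k + N // 2] = X_ev[k] - t   # ... and a difference, as W_N^(k+N/2) = -W_N^k
    return X

def direct_dft(x):
    """Reference O(N^2) DFT used only to check the FFT result."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def butterfly_count(N):
    """(N/2) * log2(N) butterflies for a power-of-two N."""
    return (N // 2) * (N.bit_length() - 1)

x8 = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, -2.0, -3.0]
X_fast = fft(x8)
X_slow = direct_dft(x8)
```

For N = 8 the count is 4 × 3 = 12 butterflies, compared with 64 complex multiplications for the direct DFT.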
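2.1.3.3. Discrete cosine transform:
The discrete cosine transform (DCT) is widely used in speech processing. It is a transform that maps a signal into the frequency domain.
Forward transform:

$$C(k) = \alpha(k) \sum_{n=0}^{N-1} x(n) \cos\left[\frac{\pi (2n+1) k}{2N}\right], \qquad k = 0, 1, 2, \dots, N-1 \tag{2.34}$$

Inverse transform:

$$x(n) = \sum_{k=0}^{N-1} \alpha(k)\, C(k) \cos\left[\frac{\pi (2n+1) k}{2N}\right], \qquad n = 0, 1, 2, \dots, N-1 \tag{2.35}$$

where $\alpha(k)$ is a normalization factor (commonly $\alpha(0) = \sqrt{1/N}$ and $\alpha(k) = \sqrt{2/N}$ for $k \ge 1$).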
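The forward and inverse cosine transforms can be sketched directly from their sums. The normalization used here, α(0) = √(1/N) and α(k) = √(2/N) for k ≥ 1, is the common orthonormal convention and is an assumption; the function names and sample values are likewise illustrative.

```python
import math

def alpha(k, N):
    """Orthonormal scaling factor (assumed convention)."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

def dct(x):
    """C(k) = alpha(k) * sum_n x(n) * cos(pi * (2n + 1) * k / (2N))."""
    N = len(x)
    return [alpha(k, N) * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                              for n in range(N))
            for k in range(N)]

def idct(C):
    """x(n) = sum_k alpha(k) * C(k) * cos(pi * (2n + 1) * k / (2N))."""
    N = len(C)
    return [sum(alpha(k, N) * C[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N))
            for n in range(N)]

x = [4.0, 3.0, 5.0, 10.0]  # example samples
C = dct(x)
x_back = idct(C)           # should reproduce x up to rounding error
```

With this scaling the transform is orthonormal, so the signal energy is the same in both domains and C(0) equals the sum of the samples divided by √N.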

2.1.4. Digital filters and windows:
A digital filter is a digital system used to reshape the frequency distribution of the components of a signal according to given specifications.
Digital filtering is the processing operation, carried out by a digital system, that reshapes the frequency distribution of the components of a signal according to given specifications.
This section describes the basic principles of digital filter design and examines finite-impulse response (FIR) filters and infinite-impulse response (IIR) filters, two special classes of linear digital filters.
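2.1.4.1. Ideal low-pass filter:

$$H(e^{j\omega}) = \begin{cases} 1, & |\omega| < \omega_0 \\ 0, & \omega_0 < |\omega| < \pi \end{cases} \qquad (2.36)$$

Using the definition of the inverse Fourier transform, we have:

$$h(n) = \frac{1}{2\pi}\int_{-\omega_0}^{\omega_0} e^{j\omega n}\,d\omega = \frac{e^{j\omega_0 n} - e^{-j\omega_0 n}}{2\pi j n} = \frac{\sin\omega_0 n}{\pi n} = \left(\frac{\omega_0}{\pi}\right)\mathrm{sinc}(\omega_0 n) \qquad (2.37)$$

Here the sinc function is defined as:

$$\mathrm{sinc}(x) = \frac{\sin x}{x} \qquad (2.38)$$

It is a real, even function of x, shown in the figure below.

Figure 2.11. The sinc function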

2.1.4.2. Window methods:

Windows are signals concentrated in a limited time interval. Common choices include the rectangular window, the triangular (Bartlett) window, the Kaiser window, and the Hanning and Hamming windows; of these, Hanning and Hamming are widely used in speech processing.
Rectangular window: in the n domain, the rectangular window is defined as:

$$w_R(n) = \begin{cases} 1, & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases} \qquad (2.39)$$

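In the frequency domain, the rectangular window becomes:

$$W_R(e^{j\omega}) = \sum_{n=0}^{N-1} e^{-j\omega n} = \frac{1 - e^{-j\omega N}}{1 - e^{-j\omega}} = \frac{e^{-j\omega N/2}\left(e^{j\omega N/2} - e^{-j\omega N/2}\right)}{e^{-j\omega/2}\left(e^{j\omega/2} - e^{-j\omega/2}\right)} = e^{-j\omega(N-1)/2}\,\frac{\sin(\omega N/2)}{\sin(\omega/2)} = e^{-j\omega(N-1)/2}A_R(e^{j\omega}) \qquad (2.40)$$

Figure 2.12. Plot of $A_R(e^{j\omega})$

Two parameters are used to evaluate a window:
- The width of the main lobe, $\Delta\omega$.
- The ratio of the amplitude of the first side lobe to the amplitude of the main lobe:

$$\lambda = 20\lg\frac{\left|W(e^{j\omega_s})\right|}{\left|W(e^{j0})\right|}$$

where $\omega_s$ is the frequency of the first side-lobe peak.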

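Hanning and Hamming windows: in the n domain, the Hanning and Hamming windows are defined as:

$$w_H(n) = \begin{cases} \alpha - (1-\alpha)\cos\dfrac{2\pi n}{N-1}, & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases} \qquad (2.41)$$

The two windows differ in the coefficient $\alpha$:
+ $\alpha = 0.5$: Hanning window

$$w_{Han}(n) = \begin{cases} 0.5 - 0.5\cos\dfrac{2\pi n}{N-1}, & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases} \qquad (2.42)$$

+ $\alpha = 0.54$: Hamming window

$$w_{Ham}(n) = \begin{cases} 0.54 - 0.46\cos\dfrac{2\pi n}{N-1}, & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases} \qquad (2.43)$$

The parameters of the Hanning window are:
+ $\Delta\omega_{Han} = 8\pi/N$
+ $\lambda_{Han} \approx -32$ dB
The parameters of the Hamming window are:
+ $\Delta\omega_{Ham} = 8\pi/N$
+ $\lambda_{Ham} \approx -43$ dB
Thus $\Delta\omega_T = \Delta\omega_{Han} = \Delta\omega_{Ham} = 8\pi/N$ and $\lambda_T > \lambda_{Han} > \lambda_{Ham}$, where the subscript T denotes the triangular window. The three windows therefore have the same main-lobe width, but the pass-band and stop-band ripple is smallest when the filter is designed with the Hamming window.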
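The comparison between the two raised-cosine windows can be reproduced numerically. The sketch below builds the window of (2.41) for a given $\alpha$ and estimates the peak side-lobe level by sampling the spectrum directly; the helper names, the window length, and the frequency grid size are assumptions made for this illustration.

```python
import math

def cosine_window(n_points, alpha):
    """w(n) = alpha - (1 - alpha) * cos(2*pi*n / (N - 1)), equation (2.41)."""
    return [alpha - (1 - alpha) * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]

def peak_side_lobe_db(window, n_freq=2048):
    """Peak side-lobe level relative to the main-lobe peak, in dB."""
    mags = []
    for k in range(n_freq):
        w = math.pi * k / n_freq                      # 0 <= omega < pi
        re = sum(v * math.cos(w * n) for n, v in enumerate(window))
        im = sum(v * math.sin(w * n) for n, v in enumerate(window))
        mags.append(math.hypot(re, im))
    i = 1
    while i < n_freq - 1 and mags[i + 1] < mags[i]:   # walk down the main lobe
        i += 1
    return 20 * math.log10(max(mags[i:]) / mags[0])

hanning = cosine_window(32, 0.5)
hamming = cosine_window(32, 0.54)
```

Calling `peak_side_lobe_db` on both windows gives a lower (more negative) level for the Hamming window, in line with $\lambda_{Han} > \lambda_{Ham}$.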
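2.1.4.3. FIR and IIR filters:
Systems whose impulse response has finite length are called FIR systems:

$$h(n) \begin{cases} \ne 0, & N_1 \le n \le N_2 \\ = 0, & n < N_1 \text{ or } n > N_2 \end{cases} \qquad (2.44)$$

Suppose an FIR system has:

$$h(n) = \begin{cases} b_n / a_0, & 0 \le n \le M \\ 0, & \text{otherwise} \end{cases} \qquad (2.45)$$

Then the impulse response of the system can be written as:

$$h(n) = \frac{1}{a_0}\sum_{r=0}^{M} b_r\,\delta(n-r) \qquad (2.46)$$

so each output sample is a finite weighted sum of input samples.

Systems whose impulse response has infinite length are called IIR systems. They are described by the difference equation:

$$\sum_{k=0}^{N} a_k\,y(n-k) = \sum_{r=0}^{M} b_r\,x(n-r) \qquad (2.47)$$

This is a recursive equation. Normalizing $a_0 = 1$:

$$y(n) = \sum_{r=0}^{M} b_r\,x(n-r) - \sum_{k=1}^{N} a_k\,y(n-k) \qquad (2.48)$$

which is why IIR filters are also called recursive filters and FIR filters non-recursive filters.

The system then has the following transfer function in the Z plane:

$$H(z) = \frac{Y(z)}{X(z)} = \frac{\sum_{r=0}^{M} b_r z^{-r}}{1 + \sum_{k=1}^{N} a_k z^{-k}} = \frac{\sum_{r=0}^{M} b_r z^{-r}}{\sum_{k=0}^{N} a_k z^{-k}} \qquad (2.49)$$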
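The difference between the two filter classes shows up directly in their impulse responses. The sketch below evaluates the recursive difference equation (2.48) with $a_0 = 1$; the function name and the example coefficients are hypothetical, chosen only to show a finite and an infinite (decaying) response.

```python
def difference_equation(b, a, x):
    """y(n) = sum_r b[r] x(n-r) - sum_{k>=1} a[k] y(n-k), with a[0] == 1."""
    y = []
    for n in range(len(x)):
        acc = sum(b[r] * x[n - r] for r in range(len(b)) if n - r >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y

impulse = [1.0] + [0.0] * 7
# FIR: no feedback terms, so the response ends after len(b) samples.
fir_h = difference_equation([0.5, 0.3, 0.2], [1.0], impulse)
# IIR: y(n) = x(n) + 0.5 * y(n-1), so the response never reaches exactly zero.
iir_h = difference_equation([1.0], [1.0, -0.5], impulse)
```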

2.1.5. Probability and random processes:

The speech signal is a statistical process, so the correlation of the sample sequence plays an important role in analyzing and predicting the characteristics of speech. Moreover, in speech recognition, probabilistic models are among the models that give the best results. This section presents the notions of probability and random processes that underlie feature extraction, training, and recognition in the next chapter.
2.1.5.1. Basic probability:
The probability of an event A is denoted P(A). P(A) has three properties:
- $P(A) \ge 0$
- P(all possible outcomes) = 1
- For mutually exclusive events $\{A_i\}$, with $A_i \cap A_j = \emptyset$, $P(A_i \cup A_j) = P(A_i) + P(A_j)$
We further define joint probability and conditional probability. The joint probability is the probability that two or more events occur together in one trial. The probability that two events A and B occur together is written P(AB).
The conditional probability is the probability that event A occurs given that event B has occurred. It is written P(A | B) and computed as:

$$P(A \mid B) = \frac{P(AB)}{P(B)} \qquad (2.50)$$

Bayes' formula, for two events A and B with P(B) > 0:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \qquad (2.51)$$

2.1.5.2. Random variables:

In signal processing we are often interested in a signal taking a specific value or falling within a specific range. In that case the signal is treated as a random variable. A discrete random variable takes specific values, while a continuous random variable takes values within some range. The probability distribution function of a random variable is defined as:
$$F(x) = P(X \le x)$$
The distribution function is a non-decreasing function of the independent variable x and is specific to the particular random variable X.
Differentiating F(x) with respect to x gives the probability density function (PDF) of X:

$$p(x) = \frac{dF(x)}{dx} \qquad (2.52)$$

Integrating p(x) recovers the distribution function:

$$F(x) = \int_{-\infty}^{x} p(t)\,dt \qquad (2.53)$$

We can then find the probability that the random variable X lies between a and b:

$$P(X \le b) = P(a < X \le b) + P(X \le a)$$
Rewriting this in terms of the distribution function:

$$P(a < X \le b) = F(b) - F(a) = \int_{a}^{b} p(x)\,dx \qquad (2.54)$$

Figure 2.13. Distribution function

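In other words, if the distribution function or the density function is known, we can compute the probability that the random variable X lies in a given range.
2.1.5.3. Expectation and variance:
The expected value of x, written E(x), is the probability-weighted average of the values x can take. E(x) is also called the mean, and is computed from the density function as:

$$E(x) = \int_{-\infty}^{\infty} x\,p(x)\,dx \qquad (2.55)$$

For a discrete variable the integral becomes a sum of the values weighted by their probabilities, for example:

$$E(X) = 2 \times 0.05 + 3 \times 0.10 + \cdots + 9 \times 0.05 = 5.35$$

The variance of the random variable x is defined as:
$$\sigma^2 = \mathrm{Var}(x) = E[(x - E[x])^2] \qquad (2.56)$$

that is, the mean squared deviation of the variable from its mean; its square root $\sigma$ is the standard deviation.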

2.2. REPRESENTING THE SPEECH SIGNAL

This section represents the speech signal as a spectrogram. Spectrograms are very useful for analyzing phonemes and the transitions between them. The spectrogram of a time signal is a special two-dimensional representation that shows time on the horizontal axis and frequency on the vertical axis. A gray scale is typically used to indicate the energy at each point (t, f), with white for low energy and black for high energy. This section examines short-time Fourier analysis, the basic tool for computing spectrograms.
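2.2.1. Short-time Fourier transform:
The Fourier transform cannot be applied directly to a non-stationary signal, because its frequency content is not stable over time. However, if we divide the non-stationary signal into segments that are short enough in time, the signal within each segment can be regarded as stationary, and the Fourier transform can be taken segment by segment. The short-time Fourier transform (STFT) is therefore localized in frequency, by the nature of the Fourier transform, and also localized in time, because it is computed over short intervals. This is the principle of the STFT, also called the windowed Fourier transform.
In the STFT, the signal f(t) is first multiplied by a window function $w(t-\tau)$ to select the signal in a short interval around time $\tau$. The ordinary Fourier transform is then computed on this segment. The result is a function of two variables, $STFT_f(\omega, \tau)$, given by:

$$STFT_f(\omega, \tau) = \int_{-\infty}^{\infty} f(t)\,w^{*}(t-\tau)\,e^{-j\omega t}\,dt \qquad (2.57)$$

The STFT at time $\tau$ is the Fourier transform of the signal f(t) multiplied by $w(t-\tau)$, a copy of the basic window shifted in time by $\tau$ and so centered around $\tau$. The STFT is therefore localized in time, and the narrower the window, the better the time localization.
To see the frequency localization more clearly, we apply Parseval's theorem and rewrite (2.57) as:

$$\begin{aligned} STFT_f(\omega, \tau) &= \int \left(w(t-\tau)e^{j\omega t}\right)^{*} f(t)\,dt \\ &= \frac{1}{2\pi}\int \left[\mathcal{F}\{w(t-\tau)e^{j\omega t}\}\right]^{*}\,\mathcal{F}\{f(t)\}\,d\omega' \\ &= \frac{1}{2\pi}\int \left[W(\omega'-\omega)\,e^{-j(\omega'-\omega)\tau}\right]^{*} F(\omega')\,d\omega' \\ &= \frac{e^{-j\omega\tau}}{2\pi}\int W^{*}(\omega'-\omega)\,F(\omega')\,e^{j\omega'\tau}\,d\omega' \end{aligned} \qquad (2.58)$$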

where W and F are the spectra of the window w(t) and of the signal f(t), respectively.

$W^{*}(\omega'-\omega)$ acts as a band-pass filter centered at the frequency $\omega$, with bandwidth equal to that of w(t), which limits the spectrum $F(\omega')$ of the signal to a neighborhood of the analysis frequency $\omega$. The STFT is therefore localized in frequency, and this localization improves as the bandwidth of the analysis window narrows.
The STFT can also be seen as a measure of similarity between the signal and shifted, modulated copies of the basic window, because (2.57) can be rewritten as:

$$STFT_f(\omega, \tau) = \int \left(w(t-\tau)e^{j\omega t}\right)^{*} f(t)\,dt = \langle g_{\omega,\tau}(t), f(t) \rangle \qquad (2.59)$$

where $g_{\omega,\tau}(t) = w(t-\tau)e^{j\omega t}$ is the shifted and modulated version of w(t).

Shifting in time by $\tau$ translates the window by $\tau$ along the time axis, and modulating the window by $e^{j\omega t}$ translates it by $\omega$ along the frequency axis, so the size of the window does not change; it only moves to a new position around $(\tau, \omega)$. Every basis window used in this transform therefore has the same time-frequency resolution and differs only in its position in the time-frequency plane. As a result, the STFT is easy to discretize on a rectangular grid $(m\omega_0, n\tau_0)$. If the window function is a low-pass filter with cut-off frequency $\omega_b$, that is, bandwidth $2\omega_b$, then $\omega_0$ is chosen smaller than $\omega_b$ and $\tau_0$ smaller than $\pi/\omega_0$ so that sampling loses no information. The windows at all sampling points then cover the time-frequency plane of the transform.
The time-frequency resolution of the STFT depends on the window function. Good time resolution requires an analysis window that is narrow in time, whereas good frequency resolution requires a window with narrow bandwidth. By the uncertainty principle, however, no window can have both arbitrarily short duration and arbitrarily narrow bandwidth: the two must be traded off, because their product is bounded below. A window with narrow bandwidth gives good frequency resolution but is long in time, which degrades time resolution, and vice versa. This is the main weakness of the STFT.

2.2.2. Short-time Fourier analysis:

The idea behind the spectrogram is to compute a Fourier transform every 5 ms or so and display the energy at each time/frequency point. Since some regions of the speech signal shorter than about 100 ms often appear periodic, the techniques described in the digital signal processing section can be used on them. However, the signal is no longer periodic when longer segments are analyzed, so the exact definition of the Fourier transform cannot be used. Moreover, that definition requires knowledge of the signal over infinite time. For these two reasons, a set of techniques called short-time analysis was proposed. These techniques decompose the speech signal into a sequence of short segments, called frames, and analyze each frame independently.
Let $x_m(n)$ be the short-time signal of frame m and $w_m(n)$ the window function, which is zero everywhere except in a small region. Then:
$$x_m(n) = x(n)\,w_m(n)$$
The window function could take different values for each frame m; keeping it the same for all frames gives:
$$w_m(n) = w(m-n)$$
The short-time Fourier representation of frame m is defined as:

$$X_m(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x_m(n)\,e^{-j\omega n} = \sum_{n=-\infty}^{\infty} w(m-n)\,x(n)\,e^{-j\omega n} \qquad (2.60)$$

Figure 2.14. Short-time spectrum of male speech

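Because the spectrogram displays only the energy and not the phase of the Fourier transform, the energy level is computed as:
$$\log|X(k)|^2 = \log\left(X_r^2(k) + X_i^2(k)\right) \qquad (2.61)$$
where $X_r(k)$ and $X_i(k)$ are the real and imaginary parts of X(k). This value is converted to a gray level as in Figure 2.15. Pixels whose values have not been computed are interpolated. The slope controls the contrast of the spectrogram, while the saturation points for white and black control the dynamic range.

Figure 2.15. Mapping from log-energy values (x axis) to gray level (y axis)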
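The short-time analysis and the log-energy computation of (2.61) can be sketched as follows. This is an illustrative routine using a direct DFT per frame; the frame length, hop size, Hamming window, and the small floor added before the logarithm are assumptions of the sketch, not values taken from the thesis.

```python
import cmath
import math

def log_energy_spectrogram(x, frame_len=64, hop=32):
    """Return one row per frame, each holding log|X(k)|^2 for k = 0..frame_len/2."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    rows = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):
            X = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len))
            energy = X.real ** 2 + X.imag ** 2     # X_r^2(k) + X_i^2(k)
            row.append(math.log(energy + 1e-12))   # small floor avoids log(0)
        rows.append(row)
    return rows

# A pure tone centred on DFT bin 8 should light up row index 8 in every frame.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = log_energy_spectrogram(tone)
```

Mapping each row to gray levels, with time on the horizontal axis and k on the vertical axis, gives the spectrogram image.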

2.3. SPEECH FEATURE EXTRACTION

Pattern recognition, in both the training phase and the recognition phase, goes through a feature extraction stage. This step performs spectral analysis to identify the important, characteristic, and stable information in the speech signal; to minimize the influence of noise and of the speaker's emotion, state, and pronunciation; and to reduce the amount of data to be processed.

Figure 2.16. General feature extraction diagram

2.3.1. MFCC (Mel-scale Frequency Cepstral Coefficient) feature extraction

MFCC is a feature extraction method based on how the human ear perceives frequency: linearly for frequencies below 1 kHz and non-linearly for frequencies above 1 kHz (following the Mel frequency scale rather than Hz).
In the MFCC method, features are computed according to the following diagram:

Figure 2.17. Steps in computing MFCC features

2.3.1.1. Pre-emphasis:

The spectrum of voiced speech tends to fall off at about -6 dB/octave overall as frequency increases. This results from a -12 dB/octave fall-off in the voiced excitation source combined with a +6 dB/octave rise due to radiation at the mouth. A boost of +6 dB/octave over the whole band is therefore needed to compensate, which is called pre-emphasis of the signal. In digital signal processing, this is done with a high-pass filter whose 3 dB cut-off frequency lies between 100 Hz and 1 kHz. Its difference equation is:
$$y(n) = x(n) - a\,x(n-1) \qquad (2.62)$$

where y(n) is the current output sample of the pre-emphasis filter, x(n) is the current input sample, x(n-1) is the previous input sample, and a is a constant usually chosen between 0.9 and 1.
Taking the z-transform of the equation above:
$$Y(z) = X(z) - a z^{-1}X(z) = (1 - a z^{-1})X(z) \qquad (2.63)$$

where $z^{-1}$ is the unit sample delay operator. The transfer function H(z) of the filter follows:

$$H(z) = \frac{Y(z)}{X(z)} = 1 - a z^{-1} \qquad (2.64)$$
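The difference equation (2.62) is a one-line filter. In the sketch below the default `a = 0.97` is only an assumed value inside the 0.9 to 1 range mentioned above, and the sample before the first one is taken as zero.

```python
def pre_emphasis(x, a=0.97):
    """y(n) = x(n) - a * x(n-1); the sample before x(0) is taken as 0."""
    return [x[n] - a * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]

# A constant (low-frequency) input is almost removed,
# while a sample-to-sample alternating (high-frequency) input is boosted.
flat = pre_emphasis([1.0, 1.0, 1.0, 1.0])
alternating = pre_emphasis([1.0, -1.0, 1.0, -1.0])
```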

2.3.1.2. Windowing:
The speech signal x(n) is first divided into frames, with partial overlap between consecutive frames, giving T frames $x_t(n)$. Windowing is then done by multiplying each frame by a window function. Let w(n) be the window function ($0 \le n \le N-1$, where N is the number of samples in one frame); the windowed signal $X_t(n)$ is then:
$$X_t(n) = x_t(n)\,w(n)$$
The window most commonly used is the Hamming window:

$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \quad n = 0, \ldots, N-1 \qquad (2.65)$$
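Framing with overlap and the Hamming window of (2.65) can be sketched together. The function name and the way trailing samples that do not fill a whole frame are dropped are assumptions of this illustration.

```python
import math

def frame_and_window(x, frame_len, hop):
    """Split x into overlapping frames and multiply each by a Hamming window."""
    w = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
         for n in range(frame_len)]
    return [[x[start + n] * w[n] for n in range(frame_len)]
            for start in range(0, len(x) - frame_len + 1, hop)]

# 100 samples, 40-sample frames, 20-sample hop: consecutive frames overlap by half.
frames = frame_and_window([1.0] * 100, frame_len=40, hop=20)
```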

2.3.1.3. Fast Fourier transform (FFT):

After multiplication by the Hamming window, each frame is passed through the fast Fourier transform, which yields a magnitude spectrum carrying the useful information of the speech signal. The FFT is a very efficient algorithm for computing the DFT of a sequence: many of the computations repeat because the Fourier term $W_N = e^{-j2\pi/N}$ is periodic. The DFT has the form:

$$X(k) = \sum_{n=0}^{N-1} x(n)\,W_N^{nk} \qquad (2.66)$$

2.3.1.4. Mel-scale filter bank:

Studies of the human auditory system show that the ear's perception of frequency does not follow a linear scale. The ear receives the spectral characteristics of speech as if they were the output of a bank of filters whose center frequencies are not distributed linearly along the frequency axis. More filters are concentrated on the part of the spectrum below 1 kHz, because it carries more information about the sound. At low frequencies, narrow-band filters are used to increase frequency resolution and capture the fundamental frequency and harmonics, which are stable; at high frequencies, wide-band filters are used to capture the high-frequency components, which vary very quickly.
To describe the ear's frequency perception accurately, a frequency scale was built from listening experiments: the Mel scale. The frequency 1 kHz is defined as 1000 Mel. The relation between the real (physical) frequency scale and the Mel (perceptual) scale is given by:

$$F_{Mel} = 2595\log_{10}\left(1 + \frac{F_{Hz}}{700}\right) \qquad (2.67)$$

where $F_{Mel}$ is the perceptual frequency in Mel and $F_{Hz}$ is the real frequency in Hz.

Figure 2.18. Relation between Mel and Hz

Figure 2.18 shows that for frequencies below 1 kHz the relation between the Mel scale and real frequency is nearly linear, while above 1 kHz it is logarithmic. So instead of building the filters on the real frequency scale, we can build them with center frequencies spaced uniformly on the Mel scale.
The center frequency of the m-th filter is given by:
$$f_m = f_{m-1} + \Delta f_m \qquad (2.68)$$
where:
$f_m$ is the center frequency of the m-th filter,
$f_{m-1}$ is the center frequency of the (m-1)-th filter,
$\Delta f_m$ is the bandwidth of the m-th filter.

$\Delta f_m$ is determined as follows. Below 1 kHz, $\Delta f_m$ is chosen so that about 10 filters are evenly spaced over that range. Above 1 kHz, $\Delta f_m$ is usually computed as $\Delta f_m = 1.2\,\Delta f_{m-1}$.
Passing the signal spectrum $X_t(k)$ through the filter bank gives $Y_t(m)$.
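The Mel mapping (2.67) and the center-frequency rule (2.68) can be sketched as below. The inverse mapping `mel_to_hz` is obtained by solving (2.67) for the frequency in Hz; the function names, the stopping rule at `f_max`, and the exact count of 10 linear filters are assumptions of this illustration.

```python
import math

def hz_to_mel(f_hz):
    """Equation (2.67)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(f_mel):
    """Inverse of (2.67), solved for the frequency in Hz."""
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)

def center_frequencies(f_max, n_linear=10, growth=1.2):
    """Centers from (2.68): evenly spaced up to 1 kHz, then bandwidth * growth."""
    delta = 1000.0 / n_linear
    centers = [delta * (m + 1) for m in range(n_linear)]
    while centers[-1] + delta * growth <= f_max:
        delta *= growth
        centers.append(centers[-1] + delta)
    return centers
```

With these definitions `hz_to_mel(1000.0)` comes out very close to 1000 Mel, matching the anchor point of the scale.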
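2.3.1.5. Log spectral energy:
After the Mel filter bank, the logarithm (base 10) of the energy of $Y_t(m)$ is taken:
$$\log\{|Y_t(m)|^2\} \qquad (2.69)$$

2.3.1.6. Discrete cosine transform:

The last step in obtaining the MFCC coefficients is to take the discrete cosine transform of the result of (2.69):

$$y_t^{(m)}(k) = \sum_{m=1}^{M} \log\{|Y_t(m)|^2\}\cos\left(k\left(m - \frac{1}{2}\right)\frac{\pi}{M}\right) \qquad (2.70)$$

where M is the number of Mel filters. The index k of this transform is usually chosen in the range $1 \le k \le 12$. The MFCC coefficients are the values at these points, so between 1 and 12 MFCC coefficients are obtained.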
2.3.2. Linear predictive coding (LPC)
The basic idea of linear predictive coding (LPC) is that the speech sample s(n) at time n can be approximated by a linear combination of the p preceding samples:
$$s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \cdots + a_p s(n-p) \qquad (2.71)$$

where $a_1, a_2, \ldots, a_p$ are assumed constant over the frame being analyzed. The relation becomes an equality when the term G u(n), called the excitation, is added:

$$s(n) = \sum_{k=1}^{p} a_k\,s(n-k) + G\,u(n) \qquad (2.72)$$

where u(n) is the normalized excitation and G is its gain. Taking the z-transform of both sides gives:

$$S(z) = \sum_{k=1}^{p} a_k z^{-k} S(z) + G\,U(z) \qquad (2.73)$$

which leads to the transfer function:
$$H(z) = \frac{S(z)}{G\,U(z)} = \frac{1}{1 - \sum_{k=1}^{p} a_k z^{-k}} = \frac{1}{A(z)} \qquad (2.74)$$

Let $\tilde{s}(n)$ denote the linear prediction of s(n):

$$\tilde{s}(n) = \sum_{k=1}^{p} a_k\,s(n-k)$$
The prediction error e(n) is then defined as:

$$e(n) = s(n) - \tilde{s}(n) = s(n) - \sum_{k=1}^{p} a_k\,s(n-k) = G\,u(n) \qquad (2.75)$$

To find the coefficients $a_k$, k = 1, 2, ..., p, for the frame being analyzed, the basic approach is to minimize the mean squared error. This leads to a system of equations in p unknowns. There are several ways to solve this system; in practice, the autocorrelation method is the one commonly used.

Figure 2.19. Block diagram of the LPC processor for speech feature extraction

2.3.2.1. Autocorrelation analysis:

After windowing, each frame goes through autocorrelation analysis, which produces (p + 1) autocorrelation coefficients:
$$r(m) = \sum_{n=0}^{N-1-m} \tilde{x}(n)\,\tilde{x}(n+m), \quad m = 0, 1, \ldots, p \qquad (2.76)$$

where $\tilde{x}(n)$ is the windowed frame and the highest autocorrelation lag, p, is called the order of the LPC analysis. Values of p between 8 and 16 are typical.
2.3.2.2. LPC analysis:
In this step, each frame of (p + 1) autocorrelation coefficients is converted into p LPC coefficients using the Levinson-Durbin algorithm. The input is the (p + 1) autocorrelation coefficients stored in r; the output is the p LPC coefficients stored in a.
At this point, the LPC coefficients could be used as the feature vector of each frame. However, a further transformation derives from the LPC coefficients a set of coefficients that is more compact: cepstral analysis.
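The Levinson-Durbin recursion can be sketched as follows. This is the standard textbook form of the algorithm, written for the sign convention of (2.71), and is offered as an illustration rather than as the pseudocode of the thesis; the function names are chosen here.

```python
def autocorrelation(frame, p):
    """r(m) for m = 0..p, equation (2.76)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + m] for i in range(n - m)) for m in range(p + 1)]

def levinson_durbin(r, p):
    """Solve sum_k a_k r(|i-k|) = r(i), i = 1..p. Returns ([a_1..a_p], error)."""
    a = [0.0] * (p + 1)          # a[0] is unused
    err = r[0]                   # prediction error of the order-0 model
    for i in range(1, p + 1):
        # Reflection coefficient for order i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)     # error shrinks with each order
    return a[1:], err
```

The recursion needs O(p²) operations, compared with O(p³) for a general linear solver, because it exploits the Toeplitz structure of the autocorrelation matrix.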

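2.3.2.3. Cepstral analysis:

From the p LPC coefficients of each frame, Q cepstral coefficients c(m) are derived by the following recursion:
$$c_0 = \ln\sigma^2$$
$$c_m = a_m + \sum_{k=1}^{m-1}\left(\frac{k}{m}\right)c_k\,a_{m-k}, \quad 1 \le m \le p$$
$$c_m = \sum_{k=1}^{m-1}\left(\frac{k}{m}\right)c_k\,a_{m-k}, \quad p < m \le Q$$

where $\sigma^2$ is the gain of the LPC model and $a_j = 0$ for $j > p$. Typically $Q \approx (3/2)p$.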
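The cepstral recursion translates directly into code. In the sketch below the list `a` holds $a_1, \ldots, a_p$ and terms with $a_j$, $j > p$, are skipped; the function name and argument layout are assumptions of this illustration.

```python
import math

def lpc_to_cepstrum(a, gain_sq, q):
    """a[0..p-1] holds a_1..a_p; returns [c_0, c_1, ..., c_q]."""
    p = len(a)
    c = [math.log(gain_sq)] + [0.0] * q
    for m in range(1, q + 1):
        acc = a[m - 1] if m <= p else 0.0
        for k in range(1, m):
            if m - k <= p:                       # a_j = 0 for j > p
                acc += (k / m) * c[k] * a[m - k - 1]
        c[m] = acc
    return c
```

For a first-order model with a single coefficient a, the recursion gives $c_m = a^m / m$, which matches the series expansion of $-\ln(1 - a z^{-1})$ and is a convenient sanity check.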


2.3.2.4. Weighting the cepstral coefficients:
Low-order cepstral coefficients are sensitive to the overall spectral slope and high-order cepstral coefficients are sensitive to noise, so a weighting technique is usually applied to reduce these sensitivities:
$$\hat{c}(m) = c(m)\,w(m)$$
where w(m) is the weighting function. A suitable weighting function is usually the band-pass lifter:

$$w(m) = \left[1 + \frac{Q}{2}\sin\left(\frac{\pi m}{Q}\right)\right], \quad 1 \le m \le Q \qquad (2.77)$$

For voiced speech regions, which are close to stationary, the all-pole LPC model gives a good approximation of the spectral envelope. For unvoiced speech the LPC model is less effective than for voiced speech, but it is still a useful model for speech recognition purposes. The LPC model is simple and easy to implement in both hardware and software.

CHAPTER 3:

SPEECH RECOGNITION

3.1. HIDDEN MARKOV MODELS:

An effective method for modeling the dynamic structure of speech is the hidden Markov model, abbreviated HMM (Hidden Markov Model). This is a probabilistic pattern-matching approach, which assumes that the time-ordered speech patterns are the result of a parametric statistical (random) process whose parameters can be estimated. After feature extraction, each word yields a sequence of P-dimensional vectors (P = the number of MFCC coefficients), denoted $t_1, t_2, \ldots, t_i, \ldots, t_l$. Vector quantization then converts this sequence of feature vectors into observations (the symbols assigned by the vector quantizer), denoted $o_1, o_2, \ldots, o_t, \ldots, o_T$. The basic assumption of the HMM is that the data samples can be well characterized as a parametric random process, and that the parameters of this stochastic process can be estimated in a precise, well-defined framework. The basic theory of HMMs was published in a series of papers by Baum and his colleagues.
3.1.1. Discrete Markov chains:
Consider a system with the following property: at any time, the system is in one of N states, as in the figure below. At regular time intervals, the system moves to a new state or stays in its previous state. We denote the times of the state transitions by t = 1, 2, ... and the state of the system at time t by $q_t$, which takes the values 1, 2, ..., N. Each state corresponds to one event. Such a process is called a Markov process.

Figure 3.1. Illustration of a Markov model

In the figure, $a_{ij}$ is the probability of moving from state i to state j, and satisfies:
$$a_{ij} \ge 0 \;\; \forall i, j, \qquad \sum_{j=1}^{N} a_{ij} = 1 \;\; \forall i \qquad (3.1)$$

We consider only first-order Markov chains, that is, systems in which the current state depends only on the immediately preceding state:
$$a_{ij} = P[q_t = j \mid q_{t-1} = i], \quad 1 \le i, j \le N \qquad (3.2)$$

The components of a Markov model are:

- N states of the model. The state at time t is denoted $q_t$.
- N events: $E = \{e_1, e_2, e_3, \ldots, e_N\}$. Each event corresponds to one state. At each time t, the current state emits the event that corresponds to it.
- $A = \{a_{ij}\}$, the state transition probability matrix, where $a_{ij}$ is the probability of moving from state i at time t-1 to state j at time t:
$$a_{ij} = P[q_t = j \mid q_{t-1} = i], \quad 1 \le i, j \le N$$

- $\pi = \{\pi_i\}$, the initial state distribution, where $\pi_i$ is the probability that the model is in state i at the initial time t = 1:
$$\pi_i = P[q_1 = i], \quad 1 \le i \le N$$

Example 1: Tossing a coin.

There are two states here: S1 corresponds to the event e1 = Tails, and S2 corresponds to the event e2 = Heads.
The elements of the matrix A are:
a11 = 0.5  a12 = 0.5
a21 = 0.5  a22 = 0.5
The sequence of events:
Tails Tails Heads Heads Tails Heads Tails
corresponds to the sequence of states:
S1 S1 S2 S2 S1 S2 S1
Example 2: The weather of a region, with the following probability model.

- The probability of the observation sequence {rain, rain, rain, clouds, sun, clouds, rain} under the Markov model above is:
Observations = {r, r, r, c, s, c, r}
States = {S1, S1, S1, S2, S3, S2, S1}
Time = {1, 2, 3, 4, 5, 6, 7} (days)
P = P[S1] P[S1|S1] P[S1|S1] P[S2|S1] P[S3|S2] P[S2|S3] P[S1|S2]
  = 0.5 * 0.7 * 0.7 * 0.25 * 0.1 * 0.7 * 0.4
  = 0.001715

- The probability of the sequence {sun, sun, sun, rain, clouds, sun, sun} under the Markov model above is:
Observations = {s, s, s, r, c, s, s}
States = {S3, S3, S3, S1, S2, S3, S3}
Time = {1, 2, 3, 4, 5, 6, 7} (days)
P = P[S3] P[S3|S3] P[S3|S3] P[S1|S3] P[S2|S1] P[S3|S2] P[S3|S3]
  = 0.1 * 0.1 * 0.1 * 0.2 * 0.25 * 0.1 * 0.1
  = 5.0 * 10^-7
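The two products above can be checked with a few lines of code. The sketch uses only the initial and transition probabilities that are quoted in the worked example (transitions not quoted there are left out); the dictionary layout and function name are choices made for this illustration.

```python
# Probabilities quoted in the worked example; keys are (previous state, next state).
INITIAL = {"S1": 0.5, "S3": 0.1}
TRANSITION = {
    ("S1", "S1"): 0.7, ("S1", "S2"): 0.25,
    ("S2", "S1"): 0.4, ("S2", "S3"): 0.1,
    ("S3", "S1"): 0.2, ("S3", "S2"): 0.7, ("S3", "S3"): 0.1,
}

def sequence_probability(states):
    """P[q_1] * product over t of P[q_t | q_(t-1)] for a first-order chain."""
    prob = INITIAL[states[0]]
    for prev, cur in zip(states, states[1:]):
        prob *= TRANSITION[(prev, cur)]
    return prob

p_rainy = sequence_probability(["S1", "S1", "S1", "S2", "S3", "S2", "S1"])
p_sunny = sequence_probability(["S3", "S3", "S3", "S1", "S2", "S3", "S3"])
```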

3.1.2. Definition of the hidden Markov model:

Unlike the Markov chain described above, a hidden Markov model has the following characteristics:
- One state can emit more than one event (also called an observation).
- The observation sequence is a probabilistic function of the state.
- The probability of different state sequences can be computed from an observation sequence.
An HMM thus still generates observations, but the number of states usually differs from the number of observation symbols. In state $S_i$, there is a probability $p(o_1)$ of emitting event 1, a probability $p(o_2)$ of emitting event 2, and so on.
The components of a hidden Markov model are:
- N, the number of states of the model. The states are {1, 2, ..., N}, and the state at time t is denoted $q_t$.
- M, the number of distinct observation symbols. The observation symbols correspond to the physical signal that the system is modeling. The set of observation symbols is denoted $V = \{v_1, v_2, \ldots, v_M\}$. For speech signals, M is the codebook size and $v_i$ is the code of each vector.
- $A = \{a_{ij}\}$, the state transition probability matrix, where $a_{ij}$ is the probability of moving from state i at time t-1 to state j at time t:
$$a_{ij} = P[q_t = j \mid q_{t-1} = i], \quad 1 \le i, j \le N$$

- $B = \{b_j(k)\}$, the observation symbol probability matrix, where $b_j(k)$ is the probability of observing the symbol $v_k$ in state j:
$$b_j(k) = P[o_t = v_k \mid q_t = j], \quad 1 \le k \le M, \; j = 1, 2, \ldots, N$$

- $\pi = \{\pi_i\}$, the initial state distribution, where $\pi_i$ is the probability that the model is in state i at the initial time t = 1:
$$\pi_i = P[q_1 = i], \quad 1 \le i \le N$$

A complete specification of an HMM therefore requires the number of states N, the set V of M observation symbols, the state transition probability matrix A, the observation symbol probability matrix B, and the initial state distribution $\pi$.
Example: Weather and air humidity.

Consider the Markov model shown in the following figure:

This model has:
- Number of states N = 3 (High, Medium, Low)
- Number of observation symbols M = 3 (Rain, Cloud, Sun)
- The elements of A, B, and $\pi$ as given in the figure.

Given the observation sequence O = {sun, sun, cloud, rain, cloud, sun} and the hidden Markov model in the figure, what is the probability of these observations when the state sequence is {H, M, M, L, L, M}?
Required probability
= bH(sun) * bM(sun) * bM(cloud) * bL(rain) * bL(cloud) * bM(sun)
= 0.8 * 0.3 * 0.4 * 0.6 * 0.3 * 0.3
≈ 5.2 * 10^-3

For the same model, compute the probability of obtaining both the observation sequence O = {sun, sun, cloud, rain, cloud, sun} and the state sequence {H, M, M, L, L, M}.
Required probability
= πH * bH(s) * aHM * bM(s) * aMM * bM(c) * aML * bL(r) * aLL * bL(c) * aLM * bM(s)
= 0.4 * 0.8 * 0.3 * 0.3 * 0.2 * 0.4 * 0.5 * 0.6 * 0.4 * 0.3 * 0.7 * 0.3
≈ 1.74 * 10^-5

For the same model, compute the probability of obtaining both the observation sequence O = {sun, sun, cloud, rain, cloud, sun} and the state sequence {H, H, M, L, M, H}.
Required probability
= πH * bH(s) * aHH * bH(s) * aHM * bM(c) * aML * bL(r) * aLM * bM(c) * aMH * bH(s)
= 0.4 * 0.8 * 0.6 * 0.8 * 0.3 * 0.4 * 0.5 * 0.6 * 0.7 * 0.4 * 0.4 * 0.6
≈ 3.72 * 10^-4

With the definition of the HMM above, three basic problems must be addressed before HMMs can be applied in practice.
i. The evaluation problem: given a model $\Phi$ and a sequence of observations $X = (X_1, X_2, \ldots, X_T)$, what is the probability $P(X \mid \Phi)$ that the model generates the observations?
ii. The decoding problem: given a model $\Phi$ and a sequence of observations $X = (X_1, X_2, \ldots, X_T)$, what is the most likely state sequence $S = (s_0, s_1, s_2, \ldots, s_T)$ in the model that produces the observations?
iii. The learning problem: given a model $\Phi$ and a set of observations, how can we adjust the model parameters $\hat{\Phi}$ to maximize the joint probability (likelihood) $\prod_X P(X \mid \Phi)$?
If we can solve the evaluation problem, we have a way of measuring how well a given HMM matches an observation sequence. We can therefore use HMMs for pattern recognition, since the likelihood $P(X \mid \Phi)$ can be used to compute the posterior probability $P(\Phi \mid X)$, and the HMM with the highest posterior probability can be chosen as the desired pattern for the observation sequence. If we can solve the decoding problem, we can find the state sequence that best matches a given observation sequence, or in other words, uncover the hidden state sequence; this is the basis of decoding in continuous speech recognition. Finally, if we can solve the learning problem, we can automatically estimate the model parameters $\Phi$ from a set of training data. These three problems are closely linked under the same probabilistic framework, and efficient implementations of the algorithms that solve them share the principle of dynamic programming, discussed next.
3.1.2.1. Dynamic programming and DTW:
The concept of dynamic programming, also known as DTW (Dynamic Time Warping) in speech recognition, is widely used to derive the overall distortion between two speech templates. DTW can warp two speech templates $(x_1, x_2, \ldots, x_N)$ and $(y_1, y_2, \ldots, y_M)$ along the time dimension to reduce their non-linear mismatch, as illustrated in the figure below:

Figure 3.2. Direct comparison between two speech templates X = (x1, x2, ..., xN) and Y = (y1, y2, ..., yM)

This is equivalent to finding the minimum distance in the trellis between the two templates. Associated with every pair (i, j) is a distance d(i, j) between the two speech vectors $x_i$ and $y_j$. To find the optimal path from the starting point (1, 1) to the end point (N, M), moving from left to right, we need to compute the accumulated distance D(N, M). We could enumerate all possible accumulated distances from (1, 1) to (N, M) and pick the smallest. But since there are M possible moves at each step from left to right in the figure above, the number of possible paths from (1, 1) to (N, M) grows exponentially. The dynamic programming principle drastically reduces the amount of computation by avoiding the enumeration of sequences that cannot be optimal. Since the optimal path after each step must build on the optimal path of the previous step, the minimum distance D(i, j) must satisfy:
$$D(i, j) = \min_{k}\left[D(i-1, k) + d(k, j)\right] \qquad (3.3)$$

Equation (3.3) says that, for each pair, only the best move needs to be considered and kept, even though there are M possible moves. The recursion allows the search for the optimal path to proceed incrementally from left to right. In essence, dynamic programming delegates the solution recursively to its own sub-problems: the computation proceeds from the smaller sub-problem D(i-1, k) to the larger sub-problem D(i, j). We can identify the $y_j$ that best matches $x_i$ and save its index in a back-pointer table B(i, j) as we move forward. Once the end point is reached, the optimal path can be traced back through this table.
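The accumulated-distance recursion can be sketched with a small table. Note the assumption: instead of the general minimum over all k in (3.3), this sketch uses the common local-path constraint in which each cell may be reached only from its three neighbours (i-1, j), (i, j-1), and (i-1, j-1). The function name and the absolute-difference distance for scalar templates are also illustrative choices.

```python
def dtw_distance(x, y, dist=lambda a, b: abs(a - b)):
    """Accumulated distance D(N, M) with steps (i-1, j), (i, j-1), (i-1, j-1)."""
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0                                   # empty prefixes match at no cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
            D[i][j] = dist(x[i - 1], y[j - 1]) + best_prev
    return D[n][m]
```

Two templates that differ only by a repeated sample, such as [1, 2, 3] and [1, 2, 2, 3], have zero DTW distance, which is exactly the time-warping effect described above. The table has N·M cells, so the cost is polynomial rather than exponential.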

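3.1.2.2. Evaluating an HMM - the forward algorithm:

The forward variable $\alpha_t(i)$ is the probability of the partial observation sequence $X_1^t = (X_1, X_2, \ldots, X_t)$ together with being in state i at time t, given the HMM $\Phi$:
$$\alpha_t(i) = P(X_1^t, s_t = i \mid \Phi)$$
The forward algorithm:
Step 1: Initialization
$$\alpha_1(i) = \pi_i\,b_i(X_1), \quad 1 \le i \le N$$

Step 2: Induction
$$\alpha_t(j) = \left[\sum_{i=1}^{N} \alpha_{t-1}(i)\,a_{ij}\right] b_j(X_t), \quad 2 \le t \le T; \; 1 \le j \le N$$

Step 3: Termination
$$P(X \mid \Phi) = \sum_{i=1}^{N} \alpha_T(i)$$
If the model is required to end in a final state $s_F$, then
$$P(X \mid \Phi) = \alpha_T(s_F)$$
The complexity of the forward algorithm is easily seen to be $O(N^2 T)$, far better than the exponential complexity of direct enumeration. This is because the partial probabilities already computed are fully reused.

Figure 3.3. Forward trellis computation for the Dow Jones Industrial HMM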
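The three steps of the forward algorithm can be sketched as follows, together with a brute-force sum over all state paths that serves only as a reference. States and observation symbols are assumed to be numbered from 0; `A[i][j]` and `B[j][k]` follow the definitions of $a_{ij}$ and $b_j(k)$, and the function names are illustrative.

```python
from itertools import product

def forward(pi, A, B, obs):
    """P(X | model) by the forward recursion, O(N^2 T)."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]            # step 1
    for o in obs[1:]:                                           # step 2
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)                                           # step 3

def brute_force(pi, A, B, obs):
    """Same probability by summing over every state path, O(N^T)."""
    n, total = len(pi), 0.0
    for path in product(range(n), repeat=len(obs)):
        p = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total
```

Both functions return the same value on small models; only the forward recursion remains practical as T grows.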

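3.1.2.3. Decoding an HMM - the Viterbi algorithm:

The forward algorithm of the previous section computes the probability that an HMM generates an observation sequence by summing the probabilities of all possible paths, so it does not provide the best path (or state sequence). In many applications it is desirable to find such a path. Finding the best path (state sequence) is the cornerstone of search in continuous speech recognition. Since the state sequence is hidden (unobserved) in the HMM framework, the most widely used criterion is to find the state sequence that has the highest probability of being taken while generating the observation sequence. In other words, we look for the state sequence $S = (s_1, s_2, \ldots, s_T)$ that maximizes $P(S, X \mid \Phi)$. This problem is very similar to the optimal-path problem in dynamic programming. As a consequence, a formal technique based on dynamic programming, known as the Viterbi algorithm, can be used to find the best state sequence for an HMM. In practice, the same method can also be used to evaluate HMMs, giving an approximate solution close to the one obtained with the forward algorithm described above.
The Viterbi algorithm can be regarded as the dynamic programming algorithm applied to the HMM, or as a modified forward algorithm. Instead of summing the probabilities of the different paths that arrive at the same destination state, the Viterbi algorithm picks and remembers the best path. The best-path probability is defined as:
$$V_t(i) = P(X_1^t, S_1^{t-1}, s_t = i \mid \Phi) \qquad (3.4)$$

$V_t(i)$ is the probability of the most likely state sequence at time t that has generated the observations $X_1^t$ (up to time t) and ends in state i. A similar inductive procedure for the Viterbi algorithm can be described as follows:
The Viterbi algorithm:
Step 1: Initialization
$$V_1(i) = \pi_i\,b_i(X_1), \quad 1 \le i \le N$$

Step 2: Induction
$$V_t(j) = \max_{1 \le i \le N}\left[V_{t-1}(i)\,a_{ij}\right] b_j(X_t), \quad 2 \le t \le T; \; 1 \le j \le N$$

$$B_t(j) = \arg\max_{1 \le i \le N}\left[V_{t-1}(i)\,a_{ij}\right], \quad 2 \le t \le T; \; 1 \le j \le N$$

Step 3: Termination
$$\text{Best score} = \max_{1 \le i \le N}\left[V_T(i)\right]$$
$$s_T^{*} = \arg\max_{1 \le i \le N}\left[V_T(i)\right]$$
Step 4: Backtracking
$$s_t^{*} = B_{t+1}(s_{t+1}^{*}), \quad t = T-1, T-2, \ldots, 1$$

$S^{*} = (s_1^{*}, s_2^{*}, \ldots, s_T^{*})$ is the best sequence.
The complexity of the Viterbi algorithm is $O(N^2 T)$.

Figure 3.4. Viterbi trellis computation for the Dow Jones Industrial HMM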
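The four steps of the Viterbi algorithm can be sketched as below. States and observation symbols are assumed to be numbered from 0, ties are broken in favour of the lowest state index, and the function name is illustrative.

```python
def viterbi(pi, A, B, obs):
    """Return (best score, best state path) for a discrete HMM."""
    n = len(pi)
    V = [pi[i] * B[i][obs[0]] for i in range(n)]                  # step 1
    back = []                                                     # B_t(j) tables
    for o in obs[1:]:                                             # step 2
        prev = [max(range(n), key=lambda i, j=j: V[i] * A[i][j]) for j in range(n)]
        V = [V[prev[j]] * A[prev[j]][j] * B[j][o] for j in range(n)]
        back.append(prev)
    state = max(range(n), key=lambda i: V[i])                     # step 3
    best_score = V[state]
    path = [state]
    for prev in reversed(back):                                   # step 4
        state = prev[state]
        path.append(state)
    return best_score, path[::-1]
```

The only difference from the forward recursion is that the sum over predecessor states is replaced by a maximum, with the maximizing predecessor stored for backtracking.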

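3.1.2.4. Estimating the HMM parameters - the Baum-Welch algorithm:

It is very important to estimate the model parameters $\Phi = (A, B, \pi)$ so that they accurately describe the observation sequences. This is the hardest of the three problems, because no analytical method is known that maximizes the joint probability of the training data in closed form. Instead, the problem can be solved by the iterative Baum-Welch algorithm, also known as the forward-backward algorithm. The HMM learning problem is a typical case of unsupervised learning, in which the data is incomplete because the state sequence is hidden.
Before describing the Baum-Welch algorithm, we first define a few necessary terms. In a manner similar to the forward probability, we define the backward probability as:

$$\beta_t(i) = P(X_{t+1}^T \mid s_t = i, \Phi) \qquad (3.5)$$

where $\beta_t(i)$ is the probability of generating the partial observation $X_{t+1}^T$ (from t+1 to the end), given that the HMM is in state i at time t. $\beta_t(i)$ can then be computed inductively:
Initialization:
$$\beta_T(i) = 1, \quad 1 \le i \le N$$

Induction:
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\,b_j(X_{t+1})\,\beta_{t+1}(j), \quad t = T-1, \ldots, 1; \; 1 \le i \le N$$

The relationship between adjacent $\alpha$ and $\beta$ values ($\alpha_{t-1}$ and $\alpha_t$, $\beta_t$ and $\beta_{t+1}$) is illustrated in the figure below. $\alpha$ is computed recursively from left to right, and $\beta$ recursively from right to left.

Figure 3.5. Relationship of $\alpha_{t-1}$ and $\alpha_t$, and of $\beta_t$ and $\beta_{t+1}$, in the forward-backward algorithm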

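Next, we define $\gamma_t(i, j)$ as the probability of making the transition from state i to state j at time t, given the model and the observation sequence:
$$\gamma_t(i, j) = P(s_{t-1} = i, s_t = j \mid X_1^T, \Phi) = \frac{\alpha_{t-1}(i)\,a_{ij}\,b_j(X_t)\,\beta_t(j)}{\sum_{k=1}^{N} \alpha_T(k)}$$

We iteratively refine the HMM parameter vector $\Phi = (A, B, \pi)$ by maximizing the likelihood $P(X \mid \Phi)$ at each iteration. We use $\hat{\Phi}$ to denote the new parameter vector derived from the parameter vector $\Phi$ of the previous iteration. The maximization is equivalent to maximizing the following Q function:
$$Q(\Phi, \hat{\Phi}) = \sum_{\text{all } S} \frac{P(X, S \mid \Phi)}{P(X \mid \Phi)}\log P(X, S \mid \hat{\Phi})$$

where:
$$P(X, S \mid \Phi) = \prod_{t=1}^{T} a_{s_{t-1}s_t}\,b_{s_t}(X_t)$$
$$\log P(X, S \mid \Phi) = \sum_{t=1}^{T}\log a_{s_{t-1}s_t} + \sum_{t=1}^{T}\log b_{s_t}(X_t)$$

Figure 3.6. Illustration of the operations required to compute $\gamma_t(i, j)$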

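Since the Q function separates into independent terms, the maximization of $Q(\Phi, \hat{\Phi})$ can be carried out by maximizing the individual terms separately, subject to the probability constraints. This gives the following re-estimates:

$$\hat{a}_{ij} = \frac{\sum_{t=1}^{T}\gamma_t(i, j)}{\sum_{t=1}^{T}\sum_{k=1}^{N}\gamma_t(i, k)} \qquad (3.6)$$

$$\hat{b}_j(k) = \frac{\sum_{t \in \{t:\,X_t = o_k\}}\sum_{i}\gamma_t(i, j)}{\sum_{t=1}^{T}\sum_{i}\gamma_t(i, j)} \qquad (3.7)$$

The forward-backward (Baum-Welch) algorithm can be described as follows:

The Baum-Welch algorithm:
Step 1: Initialization: choose an initial estimate $\Phi$.
Step 2: E-step: compute the auxiliary function $Q(\Phi, \hat{\Phi})$ based on $\Phi$.
Step 3: M-step: compute $\hat{\Phi}$ from the re-estimation formulas (3.6) and (3.7) to maximize the auxiliary function Q.
Step 4: Iteration: set $\Phi = \hat{\Phi}$ and repeat from step 2 until convergence.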
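One Baum-Welch iteration for a discrete HMM and a single observation sequence can be sketched as below. The sketch uses the common textbook indexing, in which the transition posterior is taken between times t and t+1 and the initial distribution is re-estimated from the state occupancy at t = 1; this is equivalent in spirit to (3.6) and (3.7) but is an assumption of the illustration, as are the function names. All probabilities are assumed strictly positive so that no denominator is zero.

```python
def forward_backward(pi, A, B, obs):
    """Return (alpha, beta, likelihood) with beta_T(i) = 1."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
    return alpha, beta, sum(alpha[T - 1])

def baum_welch_step(pi, A, B, obs):
    """One E-step plus M-step; returns (new_pi, new_A, new_B, old likelihood)."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta, like = forward_backward(pi, A, B, obs)
    # occ[t][i]: posterior probability of being in state i at time t.
    occ = [[alpha[t][i] * beta[t][i] / like for i in range(n)] for t in range(T)]
    new_A = [[sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                  for t in range(T - 1)) / like
              / sum(occ[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(occ[t][j] for t in range(T) if obs[t] == k)
              / sum(occ[t][j] for t in range(T))
              for k in range(m)] for j in range(n)]
    return occ[0], new_A, new_B, like
```

Calling `baum_welch_step` repeatedly and feeding each result back in implements step 4; the returned likelihood should never decrease from one iteration to the next.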
3.1.3. Practical issues in using HMMs:
3.1.3.1. Initial estimates:
In theory, the re-estimation algorithm of the HMM should reach a local maximum of the likelihood function. A key question is how to choose the initial estimates of the HMM parameters so that the local maximum becomes the global maximum.

In a discrete HMM, if a probability is initialized to zero, it remains zero forever. It is therefore important to have a reasonable set of initial estimates. Empirical studies show that, for discrete HMMs, a uniform distribution can be used as the initial estimate. This works reasonably well for most speech applications, although good initial estimates are always helpful for computing the output probabilities.
3.1.3.2. Model topology:
Speech is a non-stationary signal that evolves over time. Each HMM state can capture a quasi-stationary segment of the non-stationary speech signal. A left-to-right topology is a natural candidate for modeling the speech signal. Each state has a self-transition, which can be used to model consecutive speech features that belong to the same state. When the quasi-stationary speech segment evolves, the left-to-right transition allows a natural progression of that evolution. In such a topology, each state has a state-dependent output probability distribution, which can be used to interpret the observed speech signal. This topology is the most popular HMM structure in state-of-the-art speech recognition systems.
The state-dependent output probability distribution can be either a discrete distribution or a mixture of continuous density functions. This is a special case of transition-dependent output probability distributions: the state-dependent output probabilities can be regarded as transition-dependent output probability distributions that are tied together for each state.
For a state-dependent left-to-right HMM, the most important parameter in determining the topology is the number of states. The choice of model topology depends on the available training data and on what the model is used for. If each HMM represents a phoneme, at least three to five output distributions are needed. If such a model represents a word, more states are generally required, depending on the pronunciation and duration of the word. For example, the word tetrahydrocannabino should have many more states than the word a: at least 24 states might be used for the former and three for the latter. If the number of states depends on the duration of the signal, 15 to 25 states might be used for each second of speech. One exception is silence, for which a simpler topology may be preferable: silence is stationary, so one or two states are sufficient.

Figure 3.7. A typical hidden Markov model used to model phonemes

There are three states (0-2), and each state has an associated output probability distribution.
3.1.3.3. Training criteria:
The argument for maximum likelihood estimation (MLE) rests on the assumption that the true distribution of speech is a member of the family of distributions used. This amounts to asserting that the observed speech is genuinely produced by the HMM being used, and that the only unknowns are the parameter values. This assumption can be challenged, however. Typical HMMs make many inaccurate assumptions about the speech production process, such as the output-independence assumption, the Markov assumption, and the continuous probability density assumption. Such inaccurate assumptions weaken the rationale for the maximum likelihood criterion. For instance, although maximum likelihood estimation is consistent (it converges to the true value), that property is meaningless if the wrong model is used: the true parameters in that case are the true parameters of the wrong model. An estimation criterion that works well despite these inaccurate assumptions should therefore offer better recognition accuracy than the maximum likelihood criterion.
3.1.3.4. Deleted interpolation:
For improved robustness, it is often necessary to combine well-trained general models (such as speaker-independent models) with models that are less well trained but more detailed (such as speaker-dependent models). For example, speech recognition accuracy can usually be improved with speaker-dependent training. However, there may not be enough data for a particular speaker, so it is desirable to use a speaker-independent model, which is more general but less accurate, to smooth the speaker-dependent model. One effective way to achieve robustness is to interpolate the two models with a technique called deleted interpolation, in which the interpolation weights are estimated on cross-validation (held-out) data. The objective is to maximize the probability that the model generates the held-out data.
Now suppose we want to interpolate two sets of models [$P_A(x)$ and $P_B(x)$, which can be either discrete probability distributions or continuous density functions] to form an interpolated model $P_{DI}(x)$. The interpolation can be expressed as:
$$P_{DI}(x) = \lambda P_A(x) + (1-\lambda)P_B(x)$$
where $0 \le \lambda \le 1$ is the interpolation weight.
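One way to estimate the interpolation weight on held-out data is a simple EM iteration, sketched below for discrete distributions. The update rule is the standard EM step for a two-component mixture weight and is an assumption of this illustration (the text above only states that the weight is estimated by cross-validation); the distributions and held-out samples are hypothetical.

```python
def estimate_lambda(p_a, p_b, held_out, iterations=50, lam=0.5):
    """EM estimate of lambda in P_DI(x) = lam * P_A(x) + (1 - lam) * P_B(x)."""
    for _ in range(iterations):
        # Posterior probability that each held-out sample came from model A.
        post = [lam * p_a[x] / (lam * p_a[x] + (1 - lam) * p_b[x]) for x in held_out]
        lam = sum(post) / len(post)
    return lam

# Hypothetical models over three symbols, and held-out data that resembles P_A.
P_A = [0.7, 0.2, 0.1]
P_B = [0.1, 0.2, 0.7]
HELD_OUT = [0] * 7 + [1] * 2 + [2] * 1
weight = estimate_lambda(P_A, P_B, HELD_OUT)
```

Because the held-out data here follows $P_A$ closely, the estimated weight moves toward 1, which gives more trust to model A.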
3.1.3.5. Parameter smoothing:
A simple fact of probabilistic modeling is that as many observations as possible are needed to estimate the model parameters reliably. In reality, however, only a limited amount of training data is available. If the training data are limited, some parameters will be inadequately trained, and classification based on poorly trained models leads to a higher recognition error rate. There are several reasonable solutions to the problem of insufficient training data:
- We can increase the size of the training data.
- We can reduce the number of free parameters to be re-estimated. This has its limits, because a significant number of parameters is always needed to model the events.
- We can interpolate one set of parameter estimates with another set for which an adequate amount of training data exists. Deleted interpolation, discussed above, can be used effectively. In a discrete HMM, a simple approach is to set a floor on both the transition probabilities and the output probabilities in order to eliminate zero estimates.
- We can tie parameters together to reduce the number of free parameters.
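The flooring idea mentioned in the list above can be illustrated with a short sketch. The floor value and the function name `floor_and_renormalize` are assumptions chosen for illustration only.

```python
def floor_and_renormalize(probs, floor=1e-5):
    """Replace estimates below `floor` by `floor`, then renormalize.

    `probs` is one discrete distribution, for example one row of the
    transition matrix or the output distribution of one state.
    """
    floored = [max(p, floor) for p in probs]
    total = sum(floored)
    return [p / total for p in floored]
```

After flooring, no symbol or transition has probability exactly zero, so a single unseen event can no longer force the likelihood of a whole utterance to zero.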
For the continuous mixture HMM, particular attention must be paid to smoothing the covariance matrices. Several techniques can be used:
- We can interpolate the covariance matrix with those that are better trained.
- We can tie the Gaussian covariance matrices across different mixture components or across different Markov states.
- We can use diagonal covariance matrices if the correlation among the feature coefficients is weak, which is the case when uncorrelated features such as MFCC are used.
- We can combine these methods.
In practice, the speech recognition error rate can be reduced by about 5-20% with various smoothing techniques, depending on the amount of training data available.
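3.1.3.6. Probability representations:
When the forward and backward probabilities are computed in the Forward-Backward algorithm, they approach zero exponentially as the length of the observation sequence, T, becomes large. For large T, the dynamic range of these probabilities exceeds the precision range of essentially any machine. In practice, therefore, representing the probabilities directly leads to underflow on the computer. This implementation problem can be solved by scaling the probabilities with scaling coefficients so that they stay within the dynamic range of the computer. All the scaling coefficients can be removed at the end of the computation without affecting the overall precision.
For example, let $\alpha_t(i)$ be multiplied by a scaling coefficient $S_t$:
$$S_t = \frac{1}{\sum_i \alpha_t(i)} \tag{3.8}$$
so that $\sum_i S_t \alpha_t(i) = 1$ for $1 \le t \le T$; $\beta_t(i)$ can also be multiplied by $S_t$, $1 \le t \le T$. The recursion used to compute the forward and backward variables can be scaled at each time step t by $S_t$. Note that $\alpha_t(i)$ and $\beta_t(i)$ are computed recursively, so the scaling accumulates. At time t, the total scale factor applied to the forward variable $\alpha_t(i)$ is therefore:
$$Scale_\alpha(t) = \prod_{k=1}^{t} S_k \tag{3.9}$$
and the total scale factor applied to the backward variable $\beta_t(i)$ is:
$$Scale_\beta(t) = \prod_{k=t}^{T} S_k \tag{3.10}$$
This is because the individual scaling factors are multiplied together in the forward and backward recursions. Let $\alpha'_t(i)$, $\beta'_t(i)$, and $\gamma'_t(i,j)$ denote the corresponding scaled variables. Note that:
$$\sum_i \alpha'_T(i) = Scale_\alpha(T) \sum_i \alpha_T(i) = Scale_\alpha(T)\, P(X|\Phi) \tag{3.11}$$
The scaled intermediate probability $\gamma'_t(i,j)$ can then be written as:
$$\gamma'_t(i,j) = \frac{Scale_\alpha(t-1)\,\alpha_{t-1}(i)\,a_{ij}\,b_j(X_t)\,\beta_t(j)\,Scale_\beta(t)}{Scale_\alpha(T)\sum_{i=1}^{N}\alpha_T(i)} = \gamma_t(i,j) \tag{3.12}$$
Thus the scaled intermediate probabilities can be used in the same way as the unscaled ones, because the scaling factors cancel in the expression above. The re-estimation formulas can therefore be kept exactly as they are, except that $P(X|\Phi)$ should be computed as:
$$P(X|\Phi) = \frac{\sum_i \alpha'_T(i)}{Scale_\alpha(T)} \tag{3.13}$$
In practice, the scaling operation does not have to be performed at every observation time. It can be applied at any interval at which underflow is likely to occur. In the unscaled intervals, $Scale_\alpha$ can be kept at unity.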
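The scaling procedure of equations (3.8) to (3.13) can be sketched for a discrete HMM as follows. This is a minimal illustration under stated assumptions: the argument layout (`pi` for initial probabilities, `A` for transitions, `B[j][o]` for the output probability of symbol `o` in state `j`) and the function names are hypothetical, and the scale factor is computed at every step from the already rescaled forward variables. Because the rescaled variables sum to one at the final step, (3.13) reduces to $\log P(X|\Phi) = -\sum_t \log S_t$.

```python
import math


def forward_plain(pi, A, B, obs):
    """Unscaled forward algorithm; underflows for long sequences."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


def forward_scaled(pi, A, B, obs):
    """Scaled forward algorithm; returns log P(X | model)."""
    n = len(pi)
    log_prob = 0.0
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for t, o in enumerate(obs):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                     for j in range(n)]
        s_t = 1.0 / sum(alpha)             # scaling coefficient, as in (3.8)
        alpha = [a * s_t for a in alpha]   # rescaled variables sum to 1
        log_prob -= math.log(s_t)          # accumulates -log Scale_alpha(t)
    return log_prob
```

On a short sequence both functions agree (after exponentiating the scaled result). On a sequence of several thousand symbols the unscaled version underflows to zero, while the scaled version still returns a finite log-probability.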
An alternative way to avoid underflow is to use a logarithmic representation for all probabilities. This not only makes scaling unnecessary, since underflow cannot occur, but also has the advantage that integers can be used to represent the logarithmic values.
The Forward-Backward algorithm requires probabilities to be added. We can keep a table indexed by $\log_b P_2 - \log_b P_1$. If a probability P is represented by $\log_b P$, more precision is obtained by choosing b closer to unity. Suppose we want to add $P_1$ and $P_2$, with $P_1 \ge P_2$. We have:
$$\log_b(P_1 + P_2) = \log_b P_1 + \log_b\left(1 + b^{\log_b P_2 - \log_b P_1}\right) \tag{3.14}$$
If $P_2$ is many orders of magnitude smaller than $P_1$, adding the two numbers simply yields $P_1$. We can store all the values of $(\log_b P_2 - \log_b P_1)$. Using logarithms introduces errors in the addition operation. In practice, a double-precision floating-point representation can be used to minimize the effect of this precision problem.
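Equation (3.14) can be illustrated with a short sketch. It assumes natural logarithms (b = e) and computes the correction term directly instead of reading it from a table; the function name `log_add` is hypothetical.

```python
import math


def log_add(log_p1, log_p2):
    """Return log(P1 + P2) given log P1 and log P2 (natural logs)."""
    if log_p2 > log_p1:
        # Make sure log_p1 holds the larger value, as (3.14) assumes.
        log_p1, log_p2 = log_p2, log_p1
    if log_p2 == -math.inf:
        return log_p1  # adding a zero probability changes nothing
    return log_p1 + math.log1p(math.exp(log_p2 - log_p1))
```

When the second probability is many orders of magnitude smaller than the first, the correction term vanishes and the result is simply the larger log-probability, as noted above.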


3.1.4. Limitations of HMMs:

There are several limitations in conventional HMMs. For example, HMMs assume that state durations follow an exponential distribution, that the transition probability depends only on the origin and the destination, and that every observation frame depends only on the state that generated it, not on neighboring observation frames. Researchers have proposed a number of techniques to address these limitations, although these solutions have not significantly improved speech recognition accuracy in practical applications.
3.1.4.1. Duration modeling:
A major weakness of conventional HMMs is that they do not provide an adequate representation of the temporal structure of speech. This is because the probability of state occupancy decreases exponentially with time, as shown in the equation below. The probability of t consecutive observations in state i is the probability of taking the self-loop at state i t times, which can be written as:
$$d_i(t) = a_{ii}^{t}\,(1 - a_{ii}) \tag{3.15}$$
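The shape of the implicit duration distribution in (3.15) can be checked numerically with a short sketch. The self-loop probability 0.8 used below is an arbitrary illustrative value and the function name `duration_prob` is hypothetical.

```python
def duration_prob(a_ii, t):
    """Probability of taking the self-loop t times and then leaving, as in (3.15)."""
    return (a_ii ** t) * (1.0 - a_ii)


# The probabilities decay geometrically, so the shortest duration is always
# the most likely one, whatever the value of a_ii.
example = [duration_prob(0.8, t) for t in range(5)]
```

This monotonic decay is what makes the standard HMM a poor model of phone durations, which typically peak at some non-trivial length.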

Figure 3.8. A standard HMM (a) and the corresponding explicit-duration HMM (b), in which the self-transitions are replaced by an explicit duration probability distribution for each state

An improvement over the standard HMM is obtained by using HMMs with an explicit time-duration distribution for each state. To explain the principle of duration modeling, Figure 3.8 shows a conventional HMM with an exponential state-duration density and a duration HMM with specified state-duration densities. In (a), the state-duration probability has the exponential form of equation (3.15). In (b), the self-transition probabilities are replaced by an explicit duration probability distribution. At time t, the process enters state i for a duration $\tau$ with probability density $d_i(\tau)$, during which the observations $X_{t+1}, X_{t+2}, \ldots, X_{t+\tau}$ are generated. It then moves to state j with transition probability $a_{ij}$, but only after the appropriate $\tau$ observations have occurred in state i. Thus, by setting the duration density to the exponential density of equation (3.15), the duration HMM can be made equivalent to the standard HMM. The parameters $d_i(\tau)$ can be estimated from the observations together with the other HMM parameters. For practicality, the duration density is usually truncated at a maximum duration $T_d$. To re-estimate the parameters of an HMM with duration modeling, the forward recursion must be modified as follows:
$$\alpha_t(j) = \sum_{\tau} \sum_{i,\, i \ne j} \alpha_{t-\tau}(i)\, a_{ij}\, d_j(\tau) \prod_{l=1}^{\tau} b_j(X_{t-\tau+l}) \tag{3.16}$$
The transition from state i to state j depends not only on the transition probability $a_{ij}$ but also on all the possible durations $\tau$ that may occur in state j. Equation (3.16) shows that when state j is reached from a previous state i, the observations may stay in state j for a period $\tau$ with duration density $d_j(\tau)$, and each observation contributes its own output probability. All possible durations must be considered, which leads to the summation over $\tau$. The independence assumption for the observations gives the product term of output probabilities. Similarly, the backward recursion can be written as:
$$\beta_t(i) = \sum_{\tau} \sum_{j,\, j \ne i} a_{ij}\, d_j(\tau) \prod_{l=1}^{\tau} b_j(X_{t+l})\, \beta_{t+\tau}(j) \tag{3.17}$$
The modified Baum-Welch algorithm can then be based on equations (3.16) and (3.17).
One disadvantage of duration modeling is the large increase in computational complexity, by $O(D^2)$. Another problem is the large number of additional parameters D that must be estimated. One proposed remedy is to use a continuous density function instead of the discrete distribution $d_i(\tau)$.
In practice, duration models have given only modest improvements for speaker-independent continuous speech recognition. Many systems even drop the transition probabilities completely, because the output probabilities are so dominant. Nevertheless, duration information is very effective for pruning unlikely candidates during decoding in large-vocabulary speech recognition.


3.1.4.2. First-order assumption:
The duration of each segment captured by a single state is inadequately modeled. Another way to alleviate the duration problem is to drop the first-order transition assumption and make the underlying state sequence a second-order Markov chain. As a result, the transition probability between two states at time t depends on the states the process occupied at times t - 1 and t - 2. Given a state sequence $S = \{s_1, s_2, \ldots, s_T\}$, the probability of the state sequence is computed as:
$$P(S) = \prod_{t} a_{s_{t-2} s_{t-1} s_t} \tag{3.18}$$
where $a_{s_{t-2} s_{t-1} s_t} = P(s_t \mid s_{t-2} s_{t-1})$ is the transition probability at time t given the two previous states. The re-estimation procedure can readily be extended on the basis of (3.18).
In practice, the second-order model is computationally very expensive, because the enlarged state space must be considered; it can often be realized as an equivalent first-order hidden Markov model on the product state space. For most applications it has not provided an accuracy gain large enough to justify the increase in computational complexity.
3.1.4.3. Conditional independence assumption:
The third major weakness of HMMs is that every observation frame depends only on the state that generated it, not on neighboring observation frames. The conditional independence assumption makes it hard to handle frames that are strongly correlated. There are several ways to relax this assumption. For example, we can assume that the output probability distribution depends not only on the state but also on the previous frame. The probability of the observations given a state sequence can then be rewritten as:
$$P(X \mid S, \Phi) = \prod_{t=1}^{T} P(X_t \mid X_{t-1}, s_t, \Phi) \tag{3.19}$$
Because the parameter space becomes very large, $X_{t-1}$ usually has to be quantized into a smaller set of codewords so that the number of free parameters stays under control. Equation (3.19) can then be simplified to:
$$P(X \mid S, \Phi) = \prod_{t=1}^{T} P(X_t \mid \mathcal{R}(X_{t-1}), s_t, \Phi) \tag{3.20}$$
where $\mathcal{R}(\cdot)$ denotes the quantized vector, drawn from a codebook of small size L. Although this reduces the space of the free conditional output probability distributions, the total number of free parameters still increases by a factor of L.
Re-estimation for conditionally dependent HMMs can be derived from a modified Q-function, as discussed in the previous section. In practice, this approach has not shown a convincing accuracy improvement for large-vocabulary speech recognition.
3.2. ACOUSTIC MODELING:
The accuracy of automatic speech recognition remains one of the most important research problems. Acoustic modeling plays a decisive role in improving accuracy and can be regarded as the central component of any recognition system.
Given a sequence of acoustic observations $X = X_1, X_2, \ldots, X_n$, the goal of speech recognition is to find the corresponding word sequence $\hat{W} = w_1, w_2, \ldots, w_m$ that has the maximum posterior probability $P(W \mid X)$, expressed as:
$$\hat{W} = \arg\max_{W} P(W \mid X) = \arg\max_{W} \frac{P(W)\,P(X \mid W)}{P(X)} \tag{3.21}$$
With X fixed, the expression above is maximized when the following expression is maximized:
$$\hat{W} = \arg\max_{W} P(W)\,P(X \mid W) \tag{3.22}$$
The problem is how to build acoustic models, $P(X \mid W)$, and language models, $P(W)$, that truly reflect the spoken language to be recognized. For large-vocabulary recognition, a word has to be decomposed into a sequence of subword units. $P(X \mid W)$ is therefore closely related to phonetic modeling. $P(X \mid W)$ must take into account variations in speaker, pronunciation, and environment, as well as context-dependent phonetic coarticulation. No static acoustic or language model can meet the needs of real applications, so it is important to adapt both $P(W)$ and $P(X \mid W)$ dynamically in order to maximize $P(W \mid X)$ when the spoken language system is in use.
3.2.1. Choosing appropriate units for acoustic modeling:
For large-vocabulary speech recognition, building whole-word models is difficult because:
- Every new task contains new words for which no training data are available, such as proper nouns and newly coined terms.
- There are too many words, and these different words may have different acoustic characteristics.
Choosing the basic units that represent the acoustic and phonetic information of the language is a very important issue in designing a workable system. Several issues have to be considered when choosing appropriate modeling units:
- The unit must be accurate, so that it represents the acoustic realization that appears in different contexts.
- The unit must be trainable. There must be enough data to estimate the parameters of the unit. Although words are accurate and representative units, they are the least trainable choice when building a workable system, because it is almost impossible to obtain hundreds of repetitions of every word, unless the recognizer is built for a specific domain.
- The unit must be generalizable, so that any new word can be derived from a predefined unit inventory for task-independent speech recognition. With a fixed set of word models, there is almost no way to derive a model for a new word.
3.2.1.1. Comparison of different units:
In English, the word is usually regarded as the smallest unit that carries meaning and can be used independently. As the most natural unit of speech, whole-word models have been widely used in many speech recognition systems. One advantage of word models is that they capture the pronunciation variation inherent within the words. When the vocabulary is small, context-dependent word models can be created.
While words are suitable units for small-vocabulary speech recognition, they are not a good choice for large-vocabulary continuous speech recognition, for the following reasons:
- Each word has to be treated individually, and data cannot be shared across word models. This makes the amount of training data required very large.
- For some tasks, the recognition vocabulary may contain words that do not appear in the training set.
- It is very hard to adapt an existing word model to a new speaker, a new channel, or a new context.
By contrast, there are only about 50 phones in English, and they can be trained adequately with just a few hundred sentences. Unlike word models, phonetic models cause few training problems. Moreover, they are vocabulary independent and can be trained on one task and tested on another. Phones are therefore more trainable and more generalizable. However, the phonetic model is inadequate because it assumes that a phoneme is identical in every context. Although we may try to pronounce each word as a concatenated sequence of independent phonemes, the phonemes are not produced independently, because our articulators cannot move instantaneously from one position to another. The realization of a phoneme is therefore strongly affected by its neighboring phonemes. While word models do not generalize, phonetic models overgeneralize, which leads to less accurate models.
A compromise between the word model and the phonetic model is to use syllable units. These units contain phone clusters that absorb most of the contextual variation. However, while the central part of such a unit is context independent, its beginning and end are still affected by some contextual effects.
3.2.1.2. Choosing the training unit for Vietnamese:
In Vietnamese, the syllable is the most natural unit of speech. Although the number of Vietnamese syllables is limited to about 6,000-8,000, from the point of view of speech recognition this is a considerable number.
The phonemes of Vietnamese, on the other hand, comprise:
- 22 initial consonants: /b, m, f, v, t, tʰ, d, n, z, ʐ, s, ʂ, c, ʈ, ɲ, l, k, χ, ŋ, ɣ, h, ʔ/
- 1 medial /w/, whose function is to lower the timbre of the syllable.
- 16 main vowels, comprising 13 monophthongs and 3 diphthongs: /i, e, ε, ɤ, ɤ̆, a, ɯ, ă, u, o, ɔ, ɔ̆, ε̆, ie, ɯɤ, uo/
- 8 non-zero final sounds, comprising 6 consonants /m, n, ŋ, p, t, k/ and 2 semivowels /-w, -j/.
- 6 tones.
The number of phonemes is clearly small, so applying phonetic models to Vietnamese recognition is an attractive solution. The difficulty for Vietnamese, however, is tone.
Although tone affects the whole syllable, its strongest effect is on the vowels. Each vowel can therefore be split into 6 units, one for each of the 6 tones. The total number of units to be trained is then about 137, far fewer than when training on syllables.
3.2.2. Scoring acoustic features:
After feature extraction we have a sequence of feature vectors X, for example MFCC vectors, as the input data. We need to estimate the probability of these acoustic features given the word or phonetic model W, so that the input data can be recognized as the correct word. This probability is called the acoustic probability, $P(X \mid W)$.
3.2.2.1. Choice of HMM output distributions:
Discrete, continuous, or semicontinuous HMMs can be used. When there is enough training data, parameter tying becomes unnecessary. A continuous model with a large number of mixtures gives the best recognition accuracy, although its computational complexity also grows linearly with the number of mixtures. The discrete model, on the other hand, is computationally efficient but has the lowest performance of the three. The semicontinuous model offers a viable compromise between trainability and system robustness.
When either a discrete or a semicontinuous HMM is used, using multiple codebooks for several features improves performance significantly. Each codebook represents a different set of parameters. One way to combine the multiple output observations is to assume that they are independent and to compute the output probability as the product of the probabilities of the individual codebooks:
$$b_i(x) = \prod_{m} \sum_{k=1}^{L^m} f^m(x^m \mid o_k^m)\, b_i^m(o_k^m) \tag{3.23}$$

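Here the superscript m denotes the parameters associated with codebook m. Each codebook consists of $L^m$ mixture continuous density functions.
The re-estimation algorithm for the multiple-codebook-based HMM can be extended accordingly. Because the product of the output probabilities of the codebooks leads to independent terms in the Q-function, for codebook m, $\zeta_t(j, k^m)$ can be modified as follows:
$$\zeta_t(j, k^m) = \frac{\sum_i \alpha_{t-1}(i)\, a_{ij}\, b_j^m(o_k^m)\, f^m(x_t^m \mid o_k^m) \prod_{n \ne m} \sum_{k^n} b_j^n(o_k^n)\, f^n(x_t^n \mid o_k^n)\, \beta_t(j)}{\sum_k \alpha_T(k)} \tag{3.24}$$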

Using multiple codebooks can greatly increase the representational power of the VQ codebook and can substantially improve speech recognition accuracy. A separate codebook can typically be built for each of $c_k$, $\Delta c_k$, and $\Delta\Delta c_k$. Compared with building a single codebook for $x_k$, the multiple-codebook system can reduce the error rate by more than 10%.

Figure 3.9. Word error rates of the different models

The semicontinuous HMM gives an accuracy improvement that lies between the discrete HMM and the continuous HMM when the amount of training data is limited. As the amount of training data increases, the continuous mixture density HMM starts to outperform both the discrete and the semicontinuous HMM, because sharing model parameters becomes less important.
Performance also depends on the number of mixtures. With a small number of mixtures, the continuous HMM lacks modeling power and actually performs worse than the discrete HMM. Only after the number of mixtures is increased substantially does the continuous HMM start to improve recognition accuracy. The semicontinuous HMM typically reduces the error rate of the discrete HMM by 10-15%. The continuous HMM with 20 diagonal Gaussian density functions performs worse than both the discrete and the semicontinuous HMM when the training set is small. It outperforms both when enough training data are available. When the amount of training data is large, it can reduce the error rate of the semicontinuous HMM by 15-20%.
3.2.2.2. Isolated versus continuous speech training:
If a word HMM is built for each word in the vocabulary for isolated speech recognition, training and recognition can be carried out directly with the basic algorithms presented in the section on hidden Markov models. To estimate the model parameters, examples of each word in the vocabulary are collected. The model parameters are estimated from all the examples using the forward-backward algorithm and the re-estimation formulas. Precise end-point detection is not necessary, because the silence model determines the boundaries automatically if silence models are concatenated with the word model at both ends.
If phonetic models are used, they must be shared across different words for large-vocabulary speech recognition. The phonetic units are concatenated to form a word model, possibly with silence models added at the beginning and at the end.
To concatenate phones into a word model, a null transition can be added from the final state of the preceding phonetic HMM to the initial state of the following phonetic HMM. The parameters of the concatenated HMM can then be estimated. Note that the added null transition arc must satisfy the probability constraint together with the transition probabilities of each phonetic HMM. If the parameters are estimated with the concatenated model, the null-arc transition probability $a^{\varepsilon}_{ij}$ must satisfy the constraint:
$$\sum_{j} \left(a_{ij} + a^{\varepsilon}_{ij}\right) = 1 \tag{3.25}$$
The self-loop transition probability of the final state is therefore always less than 1. For inter-word connections, or for concatenations that involve multiple pronunciations, several null arcs can be used to connect the individual models.
In the example in the figure below, the vocabulary consists of the 10 English digits. An HMM is built for each English phone. The dictionary provides the pronunciation of each word. There is one special word, Silence, which maps to /sil/, an HMM with the same topology as the standard phonetic HMM. For each word in the vocabulary, we first derive its phone sequence from the dictionary. The phonetic models are then linked together to form a word HMM for each word in the vocabulary.

Figure 3.10. Structure of an isolated-word model

For the word two, for example, we first create a word model that starts with silence /sil/, followed by the phones /t/ and /uw/, and ends with silence /sil/. The concatenated word model is then treated as one large standard composed HMM. The standard Forward-Backward algorithm is used to estimate the parameters of the composed HMM from multiple sample utterances of the word two. After a few iterations, the HMM parameters for /sil/, /t/, and /uw/ are obtained automatically. Because a phone can be shared across different words, the phonetic parameters can be estimated from the acoustic data of different words.
The ability to align each individual HMM automatically with the corresponding unsegmented speech observation sequence is one of the most powerful features of the Forward-Backward algorithm. When the HMM concatenation method is used for continuous speech, multiple words have to be composed into a sentence HMM on the basis of the transcription of the utterance. The Forward-Backward algorithm automatically absorbs the range of possible word-boundary information of the models, so the continuous speech does not need to be segmented precisely.
To estimate the HMM parameters, each word is instantiated with its concatenated word model. The words in the sentence are concatenated with optional silence models between them.
In general, the concatenated sentence HMM can be trained with the forward-backward algorithm on the corresponding observation sequence. Because the whole sentence HMM is trained on the whole observation sequence of the corresponding sentence, most possible word boundaries are taken into account. The parameters of each model are based on the state-to-speech alignments. This training method gives complete freedom to align the sentence model against the observations, and no effort is needed to find word boundaries.
In speech decoding, a word may begin and end anywhere within the given speech signal. Because word boundaries cannot be detected accurately, all beginning and end points have to be considered.

Figure 3.11. A composed sentence HMM
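3.2.3. Measuring recognition errors:

Evaluating the performance of speech recognition systems is very important. The word recognition error rate is widely used. When different acoustic modeling algorithms are compared, it is important to compare their relative error reduction. A test set of more than 500 sentences (with 6 to 10 words per sentence) from 5 to 10 speakers is needed to estimate the recognition error rate reliably. Typically, a new algorithm is considered worthwhile when it reduces the errors by 10% or more.
There are three typical types of word recognition error in speech recognition:
- Substitution: an incorrect word replaces the correct word.
- Deletion: a correct word is omitted from the recognized sentence.
- Insertion: an extra word is added to the recognized sentence.
To determine the minimum error rate, the following formula is used:
$$\text{Word error rate} = 100\% \times \frac{Subs + Dels + Ins}{N} \tag{3.26}$$
where Subs, Dels, and Ins are the numbers of substitutions, deletions, and insertions, and N is the number of words in the correct sentence.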
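The minimum in (3.26) is usually found by aligning the recognized sentence with the correct sentence using dynamic programming. The sketch below is one common way to do this (a Levenshtein alignment over words); it assumes a non-empty reference sentence and whitespace-separated words, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate in percent, using the minimum number of edits."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j] = minimum substitutions + deletions + insertions needed
    # to turn the first i reference words into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i            # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j            # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)
    return 100.0 * dist[-1][-1] / len(ref)
```

For example, recognizing "mary loves person" for the reference "mary loves that person" is one deletion in four words, giving an error rate of 25%. Because insertions are counted, the rate can exceed 100%.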

3.3. LANGUAGE MODELING:

Acoustic pattern matching and knowledge about the language are equally important in recognizing and understanding natural speech. In practical speech recognition it may be impossible to separate the use of these different levels of knowledge, so they are usually tightly integrated. This section reviews formal language theory and probabilistic language models. A formal language has two basic parts: the grammar and the parsing algorithm. The grammar is a formal specification of the structures that are permitted in the language. The parsing technique is the method of analyzing a sentence to see whether its structure complies with the grammar. Large bodies of text whose structure has been annotated by hand are now available. The probabilistic relationships among sequences of words can be derived and modeled directly from such corpora with so-called stochastic language models, such as the n-gram. Stochastic language models play an essential role in building a working spoken language system.
3.3.1. Formal language theory:
The most common way to represent the sentence "Mary loves that person" is with a tree, as illustrated in Figure 3.12. The node S is the parent of the nodes NP and VP, for the noun phrase and the verb phrase respectively. The node VP is the parent of the nodes V and N, for the verb and the noun. Each leaf is associated with a word of the sentence being analyzed. To build a tree for a sentence, we have to know the structure of the language. A set of rewrite rules can therefore be used to describe which tree structures are permitted. These rules state that a given symbol may be expanded in the tree into a sequence of symbols. The grammatical structure helps to determine the meaning of the sentence. It tells us that "that" in the sentence refers to "person".

Figure 3.12. A tree representation of a sentence and its corresponding grammar

3.3.1.1. The Chomsky hierarchy:

In Chomsky's formal language theory, a grammar is defined as G = (V, T, P, S), where V and T are finite sets of non-terminal and terminal symbols, respectively. V contains all the non-terminal symbols, which are usually written in upper case. The terminal set T contains Mary, loves, that, and person, written in lower case. P is a finite set of rewrite (production) rules. S is a special symbol called the start symbol.
Table 3.1. The Chomsky hierarchy and the automata that accept each language

Type | Constraints | Automaton
--- | --- | ---
Phrase structure grammar | $\alpha \to \beta$. This is the most general grammar. | Turing machine
Context-sensitive grammar | A subset of the phrase structure grammar: $\lvert\alpha\rvert \le \lvert\beta\rvert$, where $\lvert\cdot\rvert$ denotes the length of the string. | Linear bounded automaton
Context-free grammar (CFG) | A subset of the context-sensitive grammar. The production rule is $A \to \beta$, where A is a non-terminal. Chomsky normal form: $A \to w$ and $A \to BC$, where w is a terminal and B, C are non-terminals. | Pushdown automaton
Regular grammar | A subset of the CFG. The production rules have the form $A \to w$ and $A \to wB$. | Finite-state automaton
A language to be analyzed is essentially a string of terminal symbols, such as "Mary loves that person". It is generated by applying production rules in sequence, starting from the start symbol. A production rule has the form $\alpha \to \beta$, where $\alpha$ and $\beta$ are arbitrary strings of the grammar symbols V and T, and $\alpha$ must not be empty. In formal language theory, the four main languages and their associated grammars are organized hierarchically. This is the Chomsky hierarchy defined in the table above. There are four kinds of automata that can accept the languages generated by these four types of grammar. Among these, the finite-state automaton is not only the mathematical device used to implement regular grammars but also one of the most significant tools in computational linguistics. Variants of automata such as finite-state transducers, hidden Markov models, and n-gram models are important components of spoken language processing.
3.3.1.2. Chart parsing for context-free grammars (CFG):
This algorithm is widely used in state-of-the-art spoken language understanding systems.
Top-down or bottom-up:
Parsing is a special case of the general search problem encountered in speech recognition. The search can start from the root of the tree with the symbol S, or it can start from the words of the input sentence and identify a word sequence that matches some non-terminal symbol. The bottom-up procedure can be repeated on partially parsed symbols until the root of the tree, the start symbol S, is identified.
A top-down approach starts with S and then searches through the different ways of rewriting the symbols until the input sentence is generated, or until all possibilities have been examined. A grammar is said to accept a sentence if there is a sequence of rules that allows the start symbol to be rewritten into the sentence. A sequence of rewrite rules can be illustrated as follows:
S
→ NP VP (rewriting S using S→NP VP)
→ NAME VP (rewriting NP using NP→NAME)
→ Mary VP (rewriting NAME using NAME→Mary)
...
→ Mary loves that person (rewriting N using N→person)
Alternatively, we can take a bottom-up approach that starts with the words of the input sentence and applies the rewrite rules backward to reduce the sequence of symbols until it becomes S. The left-hand side of each rule is used to rewrite the symbols on the right-hand side, as follows:
→ NAME loves that person (rewriting Mary using NAME→Mary)
→ NAME V that person (rewriting loves using V→loves)
...
→ NP VP (rewriting NP using S→NP VP)
→ S
A parsing algorithm must systematically explore every possible state that represents an intermediate node of the parse tree. If a mistake is made early in choosing the rule that rewrites S, the intermediate parsing results can be very wasteful when the number of rules is large.
Bottom-up chart parsing:
Algorithm:
Step 1: Initialization: define a list called the chart to store active arcs, and a list called the agenda to store constituents until they are added to the chart.
Step 2: Repeat steps 2 to 7 until no input is left.
Step 3: Push to and pop from the agenda: if the agenda is empty, look up the next word in the input and push its interpretations onto the agenda. Pop a constituent C from the agenda. If C corresponds to positions $w_i$ to $w_j$ of the input sentence, we denote it C[i, j].
Step 4: Add C[i, j] to the chart.
Step 5: Add key-marked active arcs to the chart: for each grammar rule of the form X→C Y, add to the chart an active arc of the form X[i, j]→°C Y, where ° marks the critical position, called the key, indicating that everything before ° has been seen but everything after ° is still to be matched.
Step 6: Move ° forward: for any active arc of the form X[1, j]→Y...°C...Z (everything before $w_i$) in the chart, add a new active arc of the form X[1, j]→Y...C°...Z to the chart.
Step 7: Add new constituents to the agenda: for each active arc of the form X[1, j]→Y...Z C°, add a new constituent X[1, j] to the agenda.
Step 8: Exit: if S[1, n] is in the chart, where n is the length of the input sentence, we can exit successfully, unless we want to find all possible interpretations of the sentence. The chart may contain many S structures covering the entire set of positions.
3.3.2. Stochastic language models:
A stochastic language model (SLM) takes a probabilistic view of language modeling. We need to estimate accurately the probability P(W) of a given word sequence $W = w_1 w_2 \ldots w_n$. In formal language theory, P(W) can be regarded as 1 or 0, depending on whether the word sequence is accepted or rejected by the grammar. This may be inappropriate for spoken language systems, because the grammar can never have complete coverage, not to mention that spoken language is often ungrammatical in real conversational applications.
The main goal of an SLM is to provide adequate probabilistic information so that likely word sequences receive a higher probability. This not only makes speech recognition more accurate but also helps to constrain the search space. Note that an SLM can have wide coverage of all possible word sequences, because probabilities are used to distinguish between them. The most widely used SLM is the n-gram model. A CFG can be augmented to act as a bridge between the n-gram and the formal grammar if probabilities are incorporated into the production rules.
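3.3.2.1. Probabilistic context-free grammars (CFG):
A CFG can be augmented with a probability for each production rule. The advantage of probabilistic CFGs lies in their ability to capture more accurately the embedded usage structure of spoken language and so to minimize syntactic ambiguity. The use of probabilities becomes increasingly important for discriminating between many competing choices when the number of rules is large.
The recognition problem is concerned with computing the probability that the start symbol S generates the word sequence $W = w_1, w_2, \ldots, w_T$, given the grammar G:
$$P(S \Rightarrow W \mid G) \tag{3.27}$$
The training problem is concerned with determining the set of rules in G from a training corpus and estimating the probability of each rule. If the set of rules is fixed, the simplest way to derive these probabilities is to count the number of times each rule is used in a corpus of parsed sentences. We denote the probability of a rule $A \to \alpha$ by $P(A \to \alpha \mid G)$. If there are m rules with the non-terminal A on the left-hand side, $A \to \alpha_1, A \to \alpha_2, \ldots, A \to \alpha_m$, the probabilities of these rules can be estimated as:
$$P(A \to \alpha_j \mid G) = \frac{C(A \to \alpha_j)}{\sum_{i=1}^{m} C(A \to \alpha_i)} \tag{3.28}$$
where $C(\cdot)$ denotes the number of times a rule is used.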
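The counting estimate in (3.28) can be sketched as follows. The representation of a rule as a `(left_hand_side, right_hand_side)` tuple and the function name `estimate_rule_probs` are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def estimate_rule_probs(rule_uses):
    """Relative-frequency estimate of rule probabilities, as in (3.28).

    `rule_uses` lists every rule application found in the parsed corpus,
    each written as (lhs, rhs) with rhs a tuple of symbols.
    """
    counts = Counter(rule_uses)
    lhs_totals = defaultdict(int)
    for (lhs, _rhs), c in counts.items():
        lhs_totals[lhs] += c
    return {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}
```

By construction, the probabilities of all rules that share a left-hand side sum to one.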

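Assume that the word sequence $W = w_1, w_2, \ldots, w_T$ is generated by a probabilistic CFG G whose rules are in Chomsky normal form:
$$A_i \to A_m A_n \quad \text{and} \quad A_i \to w_l \tag{3.29}$$
where $A_m$ and $A_n$ are possible non-terminals that expand $A_i$ at different positions. The probabilities of these rules must satisfy the following constraint:
$$\sum_{m,n} P(A_i \to A_m A_n \mid G) + \sum_{l} P(A_i \to w_l \mid G) = 1, \quad \forall i \tag{3.30}$$

Figure 3.13. The inside probability is computed recursively as the sum over all derivations

Inside probability:
$$inside(j, A_i, k) = P(A_i \Rightarrow w_j w_{j+1} \ldots w_k \mid G) \tag{3.31}$$
As the inside probability of a constituent, it assigns a probability to the word sequence inside that constituent.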
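The recursive computation of the inside probability in (3.31) can be sketched for a grammar in Chomsky normal form. The toy grammar below is hypothetical (it is not the grammar of Figure 3.12, which contains a unit rule), word positions are 0-based, and the function name `sentence_probability` is an assumption.

```python
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form.
BINARY_RULES = {("S", "NP", "VP"): 1.0,
                ("VP", "V", "NP"): 1.0,
                ("NP", "DET", "N"): 0.6}
LEXICAL_RULES = {("NP", "Mary"): 0.4, ("V", "loves"): 1.0,
                 ("DET", "that"): 1.0, ("N", "person"): 1.0}


def sentence_probability(words, binary_rules, lexical_rules, start="S"):
    """Inside algorithm: returns inside(0, start, n - 1)."""
    n = len(words)
    inside = defaultdict(float)          # (s, A, t) -> probability
    for s, word in enumerate(words):     # spans of length 1
        for (a, w), p in lexical_rules.items():
            if w == word:
                inside[(s, a, s)] += p
    for span in range(2, n + 1):         # longer spans, bottom-up
        for s in range(n - span + 1):
            t = s + span - 1
            for (a, b, c), p in binary_rules.items():
                for r in range(s, t):    # split point between B and C
                    inside[(s, a, t)] += (p * inside[(s, b, r)]
                                          * inside[(r + 1, c, t)])
    return inside[(0, start, n - 1)]
```

With this toy grammar, "Mary loves that person" has a single derivation with probability 1.0 × 0.4 × 1.0 × 1.0 × 0.6 × 1.0 × 1.0 = 0.24, and a word sequence the grammar cannot derive receives probability 0.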
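There is also the outside probability of a non-terminal node $A_i$ covering $w_s$ to $w_t$, which is the probability that it can be derived from the start symbol S together with the remaining words of the sentence:
$$outside(s, A_i, t) = P(S \Rightarrow w_1 \ldots w_{s-1}\, A_i\, w_{t+1} \ldots w_T) \tag{3.32}$$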


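Figure 3.14. Definition of the outside probability

The inside and outside probabilities are used to compute the sentence probability, for a given span (s, t):
$$P(S \Rightarrow w_1 \ldots w_T) = \sum_{i} inside(s, A_i, t)\; outside(s, A_i, t) \tag{3.33}$$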

One problem with the probabilistic CFG is that it assumes that the expansion of any non-terminal is independent of the expansion of the other non-terminals. The probabilities of the CFG rules are therefore multiplied together without regard to the position of the node in the parse tree. Another problem is its lack of sensitivity to words, even though lexical information plays an important role in selecting the correct parse of an ambiguous phrase. In a probabilistic CFG, lexical information can only be represented through the probabilities of the pre-terminal nodes, such as verb and noun, which are expanded lexically. Lexical dependencies can be added to the probabilistic CFG to make its probabilities more sensitive to the syntactic structure.
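3.3.2.2. The n-gram language model:
A language model can be formulated as a probability distribution P(W) over word strings W that reflects how likely a string W is to occur as a sentence.
$$P(W) = \prod_{i=1}^{n} P(w_i \mid w_1, w_2, \ldots, w_{i-1}) \tag{3.34}$$
where $P(w_i \mid w_1, w_2, \ldots, w_{i-1})$ is the probability that $w_i$ follows, given the preceding word sequence $w_1, w_2, \ldots, w_{i-1}$. The choice of $w_i$ thus depends on the entire input history. For a vocabulary of size v, $v^i$ values have to be estimated to specify $P(w_i \mid w_1, w_2, \ldots, w_{i-1})$ completely. In practice, $P(w_i \mid w_1, w_2, \ldots, w_{i-1})$ cannot be estimated even for moderate values of i, because most histories $w_1, w_2, \ldots, w_{i-1}$ are unique or occur only a few times. A practical solution is to assume that $P(w_i \mid w_1, w_2, \ldots, w_{i-1})$ depends only on some equivalence class of histories. The equivalence class can be based on the several preceding words $w_{i-N+1}, w_{i-N+2}, \ldots, w_{i-1}$. This leads to the n-gram language model. If a word depends on the two preceding words, we have the trigram language model: $P(w_i \mid w_{i-2}, w_{i-1})$. Similarly, we have the unigram model, $P(w_i)$, and the bigram model, $P(w_i \mid w_{i-1})$. The trigram model is very powerful and can be estimated reasonably well from an attainable corpus.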
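As an illustration of the bigram case described above, the sketch below estimates $P(w_i \mid w_{i-1})$ by relative frequency from a tiny corpus. No smoothing is applied, so unseen bigrams are simply absent from the result; the sentence-boundary markers `BEGIN` and `END` and the function name `train_bigram` are assumptions made for illustration.

```python
from collections import Counter


def train_bigram(sentences):
    """Maximum-likelihood bigram probabilities P(w_i | w_{i-1})."""
    history_counts, bigram_counts = Counter(), Counter()
    for sentence in sentences:
        words = ["BEGIN"] + sentence.split() + ["END"]
        history_counts.update(words[:-1])               # every word used as a history
        bigram_counts.update(zip(words[:-1], words[1:]))
    return {pair: c / history_counts[pair[0]]
            for pair, c in bigram_counts.items()}
```

For the two-sentence corpus "mary loves that person" and "mary loves music", the word "loves" is followed once by "that" and once by "music", so each of those bigrams receives probability 0.5.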
3.3.3. Perplexity of language models:
The cross-entropy H(W) of a model $P(w_i \mid w_{i-N+1}, w_{i-N+2}, \ldots, w_{i-1})$ on data W, for a sufficiently long word sequence, can be estimated as:
$$H(W) = -\frac{1}{N_W} \log_2 P(W) \tag{3.35}$$

where $N_W$ is the length of the text W measured in words.

The perplexity PP(W) of a language model P(W) is defined as the reciprocal of the average probability assigned by the model to each word in the test set W:
$$PP(W) = 2^{H(W)} \tag{3.36}$$
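Equations (3.35) and (3.36) can be combined in a short sketch. It assumes that the model has already assigned a non-zero conditional probability to each word of the test text; the function name `perplexity` is hypothetical.

```python
import math


def perplexity(word_probs):
    """Perplexity of a text given the model probability of each word.

    `word_probs[i]` is P(w_i | history) under the language model, so
    log2 P(W) is the sum of the individual log-probabilities.
    """
    n_w = len(word_probs)
    log2_p_w = sum(math.log2(p) for p in word_probs)
    entropy = -log2_p_w / n_w        # H(W), equation (3.35)
    return 2.0 ** entropy            # PP(W), equation (3.36)
```

If every word is one of six equally likely choices, the perplexity is 6, which matches the interpretation of perplexity as an average branching factor.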

Perplexity can be interpreted as the geometric mean of the branching factor of the text when it is presented to the language model. The perplexity defined in (3.36) has two main arguments: a language model and a word sequence. The test-set perplexity evaluates the generalization capability of the language model. The training-set perplexity measures how well the language model fits the training data, like the likelihood. It is generally true that lower perplexity correlates with better recognition performance. This is because perplexity is essentially a statistically weighted measure of word branching on the test set. The higher the perplexity, the more branches the speech recognizer has to consider statistically.
Because perplexity does not take acoustic confusability into account, recognition accuracy ultimately has to be measured. For example, if the vocabulary of a speech recognizer consists of the E-set of letters B, C, D, E, G, and T, we can define a CFG with a low perplexity of 6. Such a low perplexity does not guarantee good recognition performance, because of the intrinsic acoustic confusability of the E-set.


CHAPTER 4: SPEECH RECOGNITION TOOLS

4.1. INTRODUCTION TO SPHINX:
Two toolkits currently support Vietnamese speech recognition: HTK and Sphinx. No study has yet established which of the two is better. This thesis uses Sphinx as the tool for Vietnamese speech recognition.
Sphinx is a speech recognition system written in the Java language. It was created through a collaboration between the Sphinx group at CMU (Carnegie Mellon University), Sun Microsystems Laboratories, MERL (Mitsubishi Electric Research Labs), and HP (Hewlett Packard), with contributions from UCSC (University of California at Santa Cruz) and MIT (Massachusetts Institute of Technology).
Main features:
- Live-mode and batch-mode speech recognition, capable of recognizing isolated and continuous speech.
- A generalized, pluggable front-end architecture. It includes pluggable implementations of pre-emphasis, the Hamming window, the fast Fourier transform, the Mel frequency filter bank, the discrete cosine transform, cepstral normalization, and feature extraction of cepstra, delta cepstra, and double delta cepstra.
- A generalized, pluggable language model architecture. It includes support for ASCII and binary versions of unigram, bigram, and trigram language models, the Java Speech API Grammar Format (JSGF), and ARPA-format FST grammars.
- A generalized acoustic model architecture. It includes support for Sphinx3 acoustic models.
- A generalized search manager. It includes support for breadth-first and word-pruning searches.
- Utilities for post-processing recognition results, including computing confidence scores, generating lattices, and embedding ECMA scripts in JSGF tags. Standalone tools include tools for displaying waveforms and spectrograms and for extracting features from audio files.
Sphinx has become a powerful speech recognition framework and is used in many recognition systems, including telephony programs such as Cairo, Freeswitch, and jvoicexml, and control programs such as Gnome-VoiceControl, Voicekey, and SpeechLion.
Benefits of using Sphinx:
For research on speech recognition based on hidden Markov models:
- Sphinx assumes that Gaussian Mixture Model (GMM) computation and search are separate, so the two kinds of research can proceed without interfering with each other. For example, a new observation probability can be implemented without touching the search source code. Likewise, a new search algorithm can be built without having to think about GMM computation.
- For training, what one usually wants to change when studying modeling is the estimation algorithm. The Baum-Welch algorithm in SphinxTrain handles this in two stages: it first writes the posterior probability statistics to a separate file, which can then be read back easily with the SphinxTrain libraries. One can work with these statistics alone, without implementing Baum-Welch training oneself. This reduces research time.
- The Sphinx source code is clearly written and easy to read. Many researchers want not only to use Sphinx as a tool but also to modify its source code to suit their purposes.
4.2. SPHINX ARCHITECTURE:
The Sphinx framework is designed with a high degree of flexibility and modularity. The figure below gives an overview of the system architecture. Each labeled element represents a module that can easily be replaced, allowing researchers to experiment with a different module without having to modify the rest of the system.

Figure 4.1. General architecture of Sphinx

The Sphinx framework has three main modules: the FrontEnd, the Decoder, and the Linguist. The FrontEnd takes one or more input signals and parameterizes them into a sequence of Features. The Linguist translates any standard language model, together with pronunciation information from the Dictionary and structural information from one or more sets of AcousticModels, into a SearchGraph. The SearchManager in the Decoder uses the Features from the FrontEnd and the SearchGraph from the Linguist to perform the actual decoding and generate Results. At any time before or during recognition, the Application can issue controls to each of the modules, effectively becoming a partner in the recognition process.
The Sphinx4 system has a large number of configurable parameters for tuning system performance. The ConfigurationManager is used to configure these parameters. It also gives Sphinx4 the ability to load and configure modules dynamically at run time, which makes Sphinx4 flexible and pluggable. For example, Sphinx4 is typically configured with a FrontEnd that produces MFCCs (Mel-Frequency Cepstral Coefficients). Using the ConfigurationManager, Sphinx4 can be reconfigured to build a different FrontEnd that produces PLP (Perceptual Linear Prediction) coefficients without modifying the source code or recompiling the system.
Sphinx also provides a number of tools that let applications and developers track decoder statistics such as word error rate, runtime speed, and memory usage. Like the rest of the system, these tools are highly configurable, allowing users to analyze the system. They also provide an interactive runtime environment in which users can modify system parameters while the system is running, which allows rapid experimentation with many parameter settings.
Sphinx4 also provides utilities that support application-level processing of recognition results. For example, these utilities include support for obtaining result lattices, confidence scores, and natural language understanding.
4.2.1. The FrontEnd:

Figure 4.2. Feature extraction in an MFCC FrontEnd

The purpose of the FrontEnd is to parameterize an input signal (audio) into a sequence of output features. The FrontEnd consists of one or more parallel chains of replaceable, communicating signal processing modules called DataProcessors. Supporting multiple chains allows different types of parameters to be computed simultaneously from the same or from different input signals. This makes it possible to build systems that decode simultaneously using different parameter types, for example MFCC and PLP, and even parameter types derived from non-speech signals such as video.

Figure 4.3. A chain of DataProcessors

Each DataProcessor in the FrontEnd provides one input and one output that can be connected to another DataProcessor, so chains of arbitrary length can be built. Sphinx4 can generate parallel sequences of features and allows an arbitrary number of parallel streams.
Using the ConfigurationManager, users can chain the DataProcessors together in any manner and can also incorporate DataProcessor implementations of their own design.
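The chaining idea can be sketched in Python as follows. This is not the Sphinx4 API: the function names are hypothetical, only two of the stages (preemphasis and Hamming windowing) are shown, and the preemphasis factor 0.97 is an assumed typical value rather than one taken from the thesis.

```python
import math

def preemphasis(samples, factor=0.97):
    """y[n] = x[n] - factor * x[n-1]; the first sample is passed through unchanged."""
    if not samples:
        return []
    out = [samples[0]]
    for previous, current in zip(samples, samples[1:]):
        out.append(current - factor * previous)
    return out

def hamming(samples):
    """Multiply a frame by a Hamming window of the same length."""
    n = len(samples)
    if n < 2:
        return list(samples)
    return [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, s in enumerate(samples)]

def chain(*processors):
    """Connect processors so that the output of one becomes the input of the next."""
    def run(data):
        for processor in processors:
            data = processor(data)
        return data
    return run

# A two-stage front end; swapping or adding a stage does not affect the others.
front_end = chain(preemphasis, hamming)
frame = [1.0] * 8
features = front_end(frame)
```

Because every stage has the same shape (data in, data out), a stage can be replaced without touching the rest of the chain, which is the property the FrontEnd design relies on.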
4.2.2. The Linguist:
The Linguist generates the SearchGraph that the Decoder uses during the search, while hiding the complexities involved in generating this graph. In Sphinx4 the Linguist is a pluggable module, so users can dynamically configure the system with different Linguist implementations.
A typical Linguist implementation constructs the SearchGraph from the language structure described by a given LanguageModel and the topological structure of the AcousticModel (the HMMs for the basic sound units used by the system). The Linguist may also use a Dictionary (usually a pronunciation dictionary) to map words from the LanguageModel to sequences of AcousticModel elements. When generating the SearchGraph, the Linguist may also incorporate sub-word units with contexts of arbitrary length, if provided.
By allowing different Linguist implementations to be plugged in at run time, Sphinx4 lets individuals provide different configurations for different systems and recognition requirements. For example, a simple digit recognition application can use a simple Linguist that keeps the entire search space in memory. A dictation application with a 100,000-word vocabulary, on the other hand, can use a more sophisticated Linguist that keeps only a small portion of the potential search space in memory at any one time.
The Linguist itself consists of three components: the language model, the dictionary, and the acoustic model.
4.2.2.1. The language model:
The language model of the Linguist provides word-level language structure, which can be represented by any number of pluggable implementations. These implementations typically fall into one of two categories: graph-driven grammars and stochastic N-gram models. A graph-driven grammar represents a directed word graph in which each node represents a single word and each arc represents the probability of a transition to a word. Stochastic N-gram models provide the probability of a word given the n-1 preceding words.
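For the case n = 2, the idea can be sketched in Python as a maximum-likelihood bigram estimate. This is only an illustration: real toolkits also apply smoothing and backoff, and the three-sentence corpus (written in the Telex form used later in the thesis) is made up for the example.

```python
from collections import Counter

def bigram_probabilities(sentences):
    """Estimate P(w2 | w1) = count(w1 w2) / count(w1) from a list of sentences."""
    history_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        history_counts.update(words[:-1])          # every word that has a successor
        pair_counts.update(zip(words, words[1:]))  # adjacent word pairs
    return {pair: count / history_counts[pair[0]]
            for pair, count in pair_counts.items()}

# Hypothetical corpus, not the thesis training text.
corpus = ["NGHE NHAJC", "NGHE TIEESP", "TAWST NHAJC"]
probabilities = bigram_probabilities(corpus)
p_nhajc_after_nghe = probabilities[("NGHE", "NHAJC")]
```

Here `p_nhajc_after_nghe` is the estimated probability of NHAJC given that the previous word was NGHE.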
The Sphinx4 language model supports a variety of formats, including:
- SimpleWordListGrammar: defines a grammar based on a list of words. An optional parameter specifies whether the grammar loops. If the grammar does not loop, it is used for isolated word recognition. If the grammar loops, it is used to support trivial connected word recognition, equivalent to a unigram grammar with equal probabilities.
- JSGFGrammar: supports the Java Speech API Grammar Format (JSGF), which defines a BNF-style, platform-independent, Unicode representation of grammars.
- LMGrammar: defines a grammar based on a statistical language model. LMGrammar generates one grammar node per word and works well with unigrams and bigrams of up to approximately 1000 words.
- FSTGrammar: supports a finite-state transducer in the ARPA FST grammar format.
- SimpleNGramModel: provides support for ASCII N-gram models in the ARPA format. SimpleNGramModel makes no attempt to optimize memory usage, so it works best with small language models.
- LargeTrigramModel: provides support for N-gram models generated by the CMU-Cambridge Statistical Language Modeling Toolkit. LargeTrigramModel optimizes memory storage, which allows it to work with very large files of more than 100 MB.
4.2.2.2. The dictionary:
The dictionary provides pronunciations for the words found in the language model. The pronunciations break words into sequences of sub-word units found in the acoustic model. The dictionary also supports the classification of words and allows a single word to belong to several classes.
4.2.2.3. The acoustic model:
The acoustic model provides a mapping between a unit of speech and an HMM that can be scored against the features supplied by the FrontEnd. The mapping may also take word position and context into account. For example, in the case of triphones, the context describes the single phonemes to the left and right of the given phoneme, and the word position describes whether the triphone is at the beginning, middle, or end of a word (or is itself a word). The definition of context is not fixed by Sphinx4: Sphinx4 allows the definition of acoustic models that contain allophones as well as acoustic models whose contexts need not be adjacent to the unit.
Typically, the Linguist breaks each word in the active vocabulary into a sequence of context-dependent sub-word units. The Linguist then passes these units and their contexts to the acoustic model to retrieve the HMM graphs associated with those units. It then uses these HMM graphs together with the language model to build the SearchGraph.
Unlike most speech recognition systems, which represent HMM graphs as fixed structures in memory, an HMM in Sphinx4 is simply a directed graph of objects. In this graph, each node corresponds to an HMM state and each arc represents the probability of a transition from one state to another in the HMM. By representing the HMM as a directed graph of objects instead of a fixed structure, an acoustic model implementation can easily provide HMMs with different topologies. For example, the acoustic model interfaces do not restrict the number of states, the number of transitions out of a state, or the direction of the transitions of an HMM. Furthermore, Sphinx4 allows the number of states in an HMM to vary from one unit to another within the same acoustic model.
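A minimal Python sketch of the "HMM as a directed graph of objects" idea is shown below. It is not Sphinx4 code: the class and function names are hypothetical, the left-to-right topology is only one possible choice, and the self-loop probability 0.6 is an arbitrary illustrative value.

```python
import math

class HMMState:
    """One node of the HMM graph; arcs hold (target state, log probability)."""
    def __init__(self, name, emitting=True):
        self.name = name
        self.emitting = emitting
        self.arcs = []

    def connect(self, target, probability):
        self.arcs.append((target, math.log(probability)))

def left_to_right_hmm(unit, n_states, self_loop=0.6):
    """Build n_states emitting states in a row, followed by a non-emitting exit state."""
    states = [HMMState(f"{unit}_{i}") for i in range(n_states)]
    exit_state = HMMState(f"{unit}_exit", emitting=False)
    for i, state in enumerate(states):
        successor = states[i + 1] if i + 1 < n_states else exit_state
        state.connect(state, self_loop)            # stay in the same state
        state.connect(successor, 1.0 - self_loop)  # move on to the next state
    return states, exit_state

# Units in the same model may have different numbers of states.
vowel_states, vowel_exit = left_to_right_hmm("AA", 3)
silence_states, silence_exit = left_to_right_hmm("SIL", 1)
```

Since the topology lives entirely in the arcs, a different topology (for example one with skip transitions) only requires a different builder function.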
Each HMM state is capable of producing a score from an observed feature. The rule for computing the score is implemented by the HMM state itself, which hides its implementation from the rest of the system and even permits different probability density functions to be used for each HMM state. The acoustic model also allows components to be shared at all levels. That is, the components that make up an HMM state, such as Gaussian mixtures, transition matrices, and mixture weights, can be shared by any of the HMM states.
As with the rest of Sphinx4, users can configure Sphinx4 with different acoustic model implementations. Sphinx4 currently provides a single HMM state implementation that is capable of loading and using acoustic models generated by the Sphinx-3 trainer.
4.2.2.4. The SearchGraph:
Although Linguists can be implemented in many different ways, and the topologies of the search spaces they generate can vary greatly, every search space is represented as a SearchGraph. The SearchGraph is the primary data structure used throughout the decoding process.
It is a directed graph in which each node, called a SearchState, represents either an emitting state or a non-emitting state. Emitting states can be scored against incoming acoustic features, whereas non-emitting states are generally used to represent higher-level linguistic constructs, such as words and phonemes, that cannot be scored directly against the incoming features. The arcs between states represent the possible state transitions, and each arc has a probability that indicates how likely the transition along that arc is.
The SearchGraph interface is deliberately generic in order to allow a wide range of implementation choices. In fact, the Linguist places no inherent restrictions on:
- The overall topology of the search space.
- The size of the phonetic context.
- The type of grammar (stochastic or rule-based).
- The depth of the N-gram language model.
A key feature of the SearchGraph is that the implementation of a SearchState need not be fixed. Each Linguist implementation therefore typically provides its own concrete SearchState implementation, which can vary with the characteristics of that particular Linguist. For example, a simple Linguist may provide an in-memory SearchGraph in which each SearchState is simply a one-to-one mapping onto the nodes of the in-memory graph. A Linguist describing a very large and complex vocabulary, however, may build a compact internal representation of the SearchGraph. In that case, the Linguist generates the set of successor SearchStates by dynamically expanding its compact representation on demand.

Figure 4.4. An example SearchGraph

The modular design of Sphinx allows different SearchGraph compilation strategies to be used without changing other aspects of the system. The choice between building the language HMMs dynamically or statically depends mainly on the vocabulary size, the complexity of the language model, and the memory requirements of the system, and can be made by the application.
4.2.3. The Decoder:
The main role of the Decoder is to use the Features from the FrontEnd together with the SearchGraph from the Linguist to generate Results. The Decoder consists of a pluggable SearchManager and other supporting code that simplifies the decoding process for an application. The most interesting component of the Decoder is therefore the SearchManager.
The Decoder merely tells the SearchManager to recognize a set of feature frames. At each step of the process, the SearchManager creates a Result object that contains all the paths that have reached a final non-emitting state. To process the result, Sphinx provides utilities that can produce a lattice and confidence scores from the Result. Unlike other systems, applications can modify the search space and the Result object between steps, which allows the application to become a partner in the recognition process.
Like the Linguist, the SearchManager is not tied to any particular implementation. For example, SearchManager implementations may perform search algorithms such as frame-synchronous Viterbi, A*, bi-directional search, and so on.
Each SearchManager implementation uses the token-passing algorithm described by Young. A Sphinx token is an object that is associated with a SearchState and contains the overall acoustic and language scores of the path up to a given point, a reference to the SearchState, a reference to an input feature frame, and other relevant information. The SearchState reference allows the SearchManager to relate a token to its state output distribution, context-dependent phonetic unit, pronunciation, word, and grammar state. Every partial hypothesis ends in an active token.
SearchManager implementations may build a set of active tokens in the form of an ActiveList at each time step, although this is not required. Sphinx provides a sub-framework that supports SearchManagers composed of an ActiveList, a Pruner, and a Scorer.
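The interplay of active list, pruning, and scoring can be sketched in Python as a toy frame-synchronous Viterbi search with a relative beam. This is not the Sphinx4 implementation: tokens are reduced to (score, path) pairs, and the two-state graph, the transition probabilities, and the squared-distance emission score are all made up for the illustration.

```python
import math

def viterbi_beam(frames, arcs, emission_score, start, beam=10.0):
    """Return (best log score, best state path) after consuming all frames."""
    active = {start: (0.0, [])}  # active list: state -> (score, path so far)
    for frame in frames:
        expanded = {}
        for state, (score, path) in active.items():
            for successor, log_transition in arcs.get(state, []):
                candidate = score + log_transition + emission_score(successor, frame)
                # Viterbi recombination: keep only the best token per state.
                if successor not in expanded or candidate > expanded[successor][0]:
                    expanded[successor] = (candidate, path + [successor])
        if not expanded:
            raise ValueError("no active tokens left")
        best = max(score for score, _ in expanded.values())
        # Relative beam pruning: drop tokens that fall too far behind the best one.
        active = {s: t for s, t in expanded.items() if t[0] >= best - beam}
    return max(active.values(), key=lambda token: token[0])

# Hypothetical two-state model: state "a" expects values near 0, "b" near 1.
means = {"a": 0.0, "b": 1.0}
arcs = {
    "start": [("a", math.log(0.5)), ("b", math.log(0.5))],
    "a": [("a", math.log(0.5)), ("b", math.log(0.5))],
    "b": [("b", 0.0)],
}

def emission_score(state, x):
    return -(x - means[state]) ** 2

best_score, best_path = viterbi_beam([0.1, 0.0, 0.9, 1.1], arcs, emission_score, "start")
```

A narrower `beam` keeps fewer tokens per frame, which speeds up the search at the risk of pruning the path that would eventually have won.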
The SearchManager sub-framework also communicates with the Scorer, a state probability estimation module that provides state output density values on demand. When the SearchManager requests a score for a given state at a given time, the Scorer accesses the feature vector for that time and performs the mathematical operations needed to compute the score. In the case of parallel decoding with parallel acoustic models, the Scorer matches the acoustic model to be used against the feature type.
The Scorer retains all information pertaining to the state output densities. The SearchManager therefore does not need to know whether scoring is done with continuous, semi-continuous, or discrete HMMs. Furthermore, the probability density function of each state is isolated in the same way. Any heuristic algorithm incorporated into the scoring function to speed it up can also be performed locally within the Scorer. In addition, the Scorer can take advantage of multiple CPUs if they are available.
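As an illustration of the kind of computation a Scorer hides, the following Python sketch evaluates the log density of a Gaussian mixture for one feature vector. It assumes diagonal covariance matrices, which is a common choice but is not stated in the thesis, and the function names are hypothetical.

```python
import math

def log_gaussian_diag(x, mean, variance):
    """Log density of a Gaussian with diagonal covariance at point x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, variance))

def gmm_log_score(x, weights, means, variances):
    """log sum_k w_k N(x; mean_k, var_k), computed stably with log-sum-exp."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

# Hypothetical two-component mixture over 2-dimensional features.
score = gmm_log_score(
    x=[0.0, 0.0],
    weights=[0.5, 0.5],
    means=[[0.0, 0.0], [0.0, 0.0]],
    variances=[[1.0, 1.0], [1.0, 1.0]],
)
```

The search code only needs the returned number, so the density model behind it can be changed without modifying the search.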
Sphinx4 provides pluggable SearchManager implementations that support frame-synchronous Viterbi, Bushderby, and parallel decoding:
- SimpleBreadthFirstSearchManager: performs a simple frame-synchronous Viterbi search with a Pruner that is called on each frame. The default Pruner manages both absolute and relative beams. This search manager produces Results that contain pointers to the active paths at the last frame processed.
- WordPruningBreadthFirstSearchManager: performs a frame-synchronous Viterbi search with a pluggable Pruner that is called on each frame. Instead of managing a single ActiveList, it manages a set of ActiveLists, one for each state type defined by the Linguist.
- BushderbySearchManager: performs a generalized frame-synchronous breadth-first search using the Bushderby algorithm, which classifies on the basis of free energy rather than likelihood.
- ParallelSearchManager: performs a frame-synchronous Viterbi search on multiple feature streams using a factored language HMM approach, as opposed to the coupled HMM approach used by AVCSR. An advantage of the factored search is that it can be much faster and far more compact than a full search over a compound HMM.
4.3. SPHINX CONFIGURATION MANAGEMENT:
The Sphinx configuration manager system has two main purposes:
- To determine which components are used in the system: the Sphinx system is designed to be extremely flexible. At run time, any component can be replaced by another. In Sphinx, the FrontEnd component provides the acoustic features that are scored against the acoustic model. Sphinx is typically configured with a FrontEnd that produces Mel frequency cepstral coefficients (MFCCs), but it can be reconfigured to use a different FrontEnd that, for example, produces Perceptual Linear Prediction (PLP) coefficients. The Sphinx configuration manager is used to configure the system in this way.
- To determine the detailed configuration of each of these components: the Sphinx system has a large number of parameters that control how the system behaves. For example, a beam width is commonly used to control the number of search paths that are kept alive during the decoding of audio.
The configuration of a Sphinx system is defined by a configuration file in XML format. This configuration file defines:
- The names and types of all the components of the system.
- The connections between these components, that is, which components communicate with which other components.
- The detailed configuration of each of these components.
Table 4.1. Tags used in the configuration file

Tag | Attributes | Sub-elements | Description
<config> | none | <component>, <property>, <propertylist> | The top-level element. It has no attributes and may contain any number of component, property, and propertylist elements.
<component> | name - the name of the component; type - the type of the component | <property>, <propertylist> | Defines an instance of a component. This tag must always have the name and type attributes.
<property> | name - the name of the property; value - the value of the property | none | Defines a single property of a component or a global system property.
<propertylist> | name - the name of the property list | <item> | Defines a list of strings or components. This tag must always have the name attribute. It may contain any number of item sub-elements.
<item> | none | none | The content of this tag is a string or the name of a component.
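To make the tag structure concrete, the following Python sketch reads a small configuration with the standard `xml.etree.ElementTree` module. The XML fragment is made up for illustration: the component names and types (`frontEnd`, `example.FrontEnd`, and so on) are hypothetical, and it is assumed that each property carries a `name` and a `value` attribute. It is not a working Sphinx4 configuration.

```python
import xml.etree.ElementTree as ET

SAMPLE_CONFIG = """<config>
  <property name="beamWidth" value="1000"/>
  <component name="frontEnd" type="example.FrontEnd">
    <propertylist name="pipeline">
      <item>preemphasizer</item>
      <item>windower</item>
    </propertylist>
  </component>
  <component name="preemphasizer" type="example.Preemphasizer">
    <property name="factor" value="0.97"/>
  </component>
  <component name="windower" type="example.Windower"/>
</config>"""

def load_config(xml_text):
    """Return (global properties, components) parsed from a config document."""
    root = ET.fromstring(xml_text)
    global_properties = {p.get("name"): p.get("value") for p in root.findall("property")}
    components = {}
    for component in root.findall("component"):
        components[component.get("name")] = {
            "type": component.get("type"),
            "properties": {p.get("name"): p.get("value")
                           for p in component.findall("property")},
            "lists": {pl.get("name"): [item.text.strip() for item in pl.findall("item")]
                      for pl in component.findall("propertylist")},
        }
    return global_properties, components

global_properties, components = load_config(SAMPLE_CONFIG)
pipeline = components["frontEnd"]["lists"]["pipeline"]
```

In this toy file the `pipeline` list names other components, which is how the connections between components are expressed; replacing one name in the list swaps the corresponding module.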


CHAPTER 5: DEMO PROGRAM

5.1. INSTALLING THE PROGRAM
Sphinx runs well on Linux. It can also be installed on Windows by using Cygwin to provide a Linux environment, but the success rate is not high. The demo program was installed on Ubuntu.
5.1.1. Downloading the required Sphinx packages:
To install Sphinx on Ubuntu, download the following packages and save them in the same directory (created and named beforehand, located in the Home directory).
The packages are:
- Pocketsphinx - a recognition library written in C.
- Sphinxbase - the base library package, which provides the libraries required by the other packages.
- Sphinx4 - a recognition package written in Java.
- CMUclmtk - a toolkit for building language models.
- Sphinxtrain - a toolkit for training acoustic models.
5.1.2. Installation:
Create a directory named sphinx in the Home folder (in the Ubuntu virtual machine). Copy the archives just downloaded in the previous step (Sphinxbase, Sphinxtrain, Pocketsphinx, CMUclmtk) into it and extract them (note: remove the version number from each directory name after extracting).

Figure 5.1. Installing Sphinx

Open a Terminal window in Ubuntu: Ctrl+Alt+T.
Enter
sudo apt-get update
and then enter the password chosen at installation (the password is not displayed while typing; type carefully and press Enter). This command updates the package lists used by apt-get. Wait for the update to finish.
Enter: cd sphinx to change to the sphinx directory just created.
Install the packages required before installing SphinxBase.
Enter the following commands:
sudo apt-get install bison
(confirm when prompted to download and install bison)
sudo apt-get install autoconf
sudo apt-get install automake
sudo apt-get install libtool

5.1.2.1. Installing SphinxBase
Enter: cd sphinxbase to change into the sphinxbase directory.
Enter the following commands and wait for each one to finish:
./autogen.sh
./configure
make
sudo make install

5.1.2.2. Installing Sphinxtrain
From the sphinxbase directory, change to the sphinxtrain directory:
cd ../sphinxtrain
Enter the following commands and wait for each one to finish:
./configure
make
sudo make install

5.1.2.3. Installing PocketSphinx
From the sphinxtrain directory, change to the pocketsphinx directory:
cd ../pocketsphinx
Enter the following commands and wait for each one to finish:
./autogen.sh
./configure
make
sudo make install
Then enter the following command in the Terminal:
sudo ldconfig

5.2. BUILDING THE LINGUIST:
The Linguist for the speech recognition program consists of three main components: the dictionary, the language model, and the acoustic model. This section describes how these components are built with the Sphinx tools.
5.2.1. Building the dictionary:
The first task is to create a suitable dictionary. This dictionary contains the words that the program is expected to recognize.
Because the Sphinx training tools do not yet support Unicode well, characters outside the ASCII character set are handled as follows:
- Characters outside the ASCII character set are replaced by their Telex input sequences.
- A phoneme-level transcription table for Vietnamese is built in ASCII form (see the appendix).
The phoneme-level transcription table for Vietnamese was built according to the following criteria:
- It represents all the phonemes that can occur in Vietnamese in ASCII form.
- Every phoneme is composed of symbols available on the keyboard, which is convenient for data entry.
- Telex typing is convenient, easy to use, and easy to understand.
- The tones are denoted by the characters S, F, R, X, J, and blank. The tone is attached to the vowel.
The dictionary is organized as follows:
LEEN L EE NZ
NUWXA N UWX A
XUOOSNG X U OOS NGZ
TRASI TR AS IZ
PHARI F AR IZ
QUA KA
LAJI L AJ IZ
TIEESP T I EES PC
TRUWOWSC TR WAS KC
VEEF V EEF
DDAAFU DD AAF UZ
CUOOSI K UOS IZ
BAWST B AWS TZ
NGHE NG EZ
NHAJC NH AJ KC
WEB K ES PC
VAWN V AW NZ
BARN B AR NZ

5.2.2. Building the language model:

Figure 5.2. The language model creation process with the CMUclmtk toolkit

The tool used to create the language model is CMUclmtk. Creating a language model involves the following steps:
5.2.2.1. Preparing the text file:
This step provides the text from which the language model is generated. The language modeling tool takes as input a plain text file in which each sentence is enclosed by the tags <s> and </s>. For example, the file dkmt.txt:
<s> SOẠN THẢO VĂN BẢN </s>
<s> DI CHUYỂN TRỎ VỀ ĐẦU </s>
<s> NGHE NHẠC </s>
....
Because CMUclmtk does not yet support Unicode words, the characters outside ASCII are converted to Telex form to ensure that CMUclmtk can process the file correctly:
<S> SOAJN THARO VAWN BARN </S>
<S> DI CHUYEERN TROR VEEF DDAAFU </S>
<S> NGHE NHAJC </S>
...
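The conversion to Telex form can be automated. The Python sketch below is one possible way to do it and is not the tool used in the thesis: it relies on Unicode canonical decomposition (`unicodedata.normalize`) and writes each vowel as base letter, then the quality mark, then the tone letter, which is the ordering seen in the examples above. The function name `to_telex` is hypothetical.

```python
import unicodedata

# Marks that change the vowel quality. None means "repeat the base letter" (circumflex).
MODIFIERS = {"\u0302": None, "\u0306": "W", "\u031b": "W"}  # circumflex, breve, horn
# Tone marks and their Telex letters.
TONES = {"\u0301": "S", "\u0300": "F", "\u0309": "R", "\u0303": "X", "\u0323": "J"}

def to_telex(text):
    """Convert Vietnamese text to upper-case ASCII Telex form."""
    decomposed = unicodedata.normalize("NFD", text.upper())
    out = []
    i = 0
    while i < len(decomposed):
        base = decomposed[i]
        i += 1
        if base == "\u0110":  # the letter D with stroke has no canonical decomposition
            out.append("DD")
            continue
        modifier, tone = "", ""
        # Collect the combining marks that follow the base letter, in any order.
        while i < len(decomposed) and unicodedata.combining(decomposed[i]):
            mark = decomposed[i]
            if mark in MODIFIERS:
                modifier = MODIFIERS[mark] or base
            elif mark in TONES:
                tone = TONES[mark]
            i += 1  # any other combining mark is dropped
        out.append(base + modifier + tone)
    return "".join(out)

converted = [to_telex(line) for line in
             ["SOẠN THẢO VĂN BẢN", "DI CHUYỂN TRỎ VỀ ĐẦU", "NGHE NHẠC"]]
```

Marks are collected per letter before anything is written out because canonical decomposition does not always place the quality mark before the tone mark (the dot-below tone, for example, sorts first).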

5.2.2.2. Generating the vocabulary:
The vocabulary is a .vocab file that contains all the words (or syllables) in the text file. It is created by CMUclmtk and is used to build the language model.
Create the vocabulary with the commands:
text2wfreq < dkmt.txt > dkmt.wfreq
wfreq2vocab < dkmt.wfreq > dkmt.vocab
The result is the file dkmt.wfreq, which lists every word (syllable) together with the number of times it occurs in the text. The vocabulary file dkmt.vocab contains all the words in the text, sorted alphabetically.
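What these two steps compute can be approximated in a few lines of Python. This is only a rough stand-in for illustration, not a reimplementation of the CMUclmtk tools (which have their own options and output formats), and the third sentence of the sample corpus is made up.

```python
from collections import Counter

def word_frequencies(lines):
    """Count how often each whitespace-separated token occurs."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def vocabulary(counts):
    """Return the distinct tokens in sorted order."""
    return sorted(counts)

corpus = [
    "<s> SOAJN THARO VAWN BARN </s>",
    "<s> NGHE NHAJC </s>",
    "<s> TAWST NHAJC </s>",  # hypothetical extra sentence
]
counts = word_frequencies(corpus)
vocab = vocabulary(counts)
```

Here `counts` plays the role of the .wfreq file and `vocab` the role of the .vocab file.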
5.2.2.3. Generating the language model:
The language model is in .arpa format. To create it, use the following two commands:
text2idngram -vocab dkmt.vocab -idngram dkmt.idngram < dkmt.txt
idngram2lm -vocab_type 0 -idngram dkmt.idngram -vocab dkmt.vocab -arpa dkmt.arpa

The ARPA (or Doug Paul) format for N-gram backoff models is structured as follows:
\data\
ngram 1=n1
ngram 2=n2
...
ngram N=nN
\1-grams:
p w [bow]
...
\2-grams:
p w1 w2 [bow]
...
\N-grams:
p w1 ... wN
...
\end\
The file begins with a header introduced by the keyword \data\, which lists the number of N-grams of each length. The N-grams are then listed one per line, grouped into sections by length. Each section begins with the keyword \N-grams:, where N is the length (1, 2, ...). Each N-gram line begins with the base-10 logarithm of the conditional probability p of that N-gram, followed by the words w1, w2, ..., wN that make up the N-gram and, where present, the backoff weight bow. The keyword \end\ terminates the model description.
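A short Python sketch of a reader for this layout is given below to make the structure concrete. It is illustrative only: the sample model and its numbers are made up, and a production reader would also validate the counts in the \data\ header.

```python
def parse_arpa(text):
    """Return {order: {word tuple: (log10 probability, backoff weight or None)}}."""
    ngrams = {}
    order = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line == "\\data\\" or line.startswith("ngram "):
            continue
        if line == "\\end\\":
            break
        if line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:line.index("-")])  # "\2-grams:" -> 2
            ngrams[order] = {}
            continue
        if order is None:
            continue
        fields = line.split()
        log_probability = float(fields[0])
        words = tuple(fields[1:1 + order])
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else None
        ngrams[order][words] = (log_probability, backoff)
    return ngrams

# Hypothetical model with made-up values.
SAMPLE_ARPA = r"""\data\
ngram 1=3
ngram 2=2

\1-grams:
-0.4771 <s> -0.3010
-0.4771 NGHE -0.3010
-0.4771 NHAJC

\2-grams:
-0.3010 <s> NGHE
-0.3010 NGHE NHAJC

\end\
"""

model = parse_arpa(SAMPLE_ARPA)
log_p, _ = model[2][("NGHE", "NHAJC")]
probability = 10 ** log_p  # entries store log10 probabilities
```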
For Sphinx to be able to use this file, it has to be converted to binary form with the sphinxbase tools. The conversion command is:
sphinx_lm_convert -i dkmt.arpa -o dkmt.lm.DMP

5.2.3. Building the acoustic model:

The acoustic model consists of a statistical representation of the distinct sounds that make up each word in the language model or grammar. Each distinct sound corresponds to a phoneme. The acoustic model is trained with the sphinxtrain tools.
Preparing the data:
Create a training directory named dkmt.
Inside it, create two subdirectories, etc and wav.
Then create the files according to the following structure:
etc
|___ dkmt.dic - phoneme and syllable dictionary
|___ dkmt.phone - file containing the list of phonemes
|___ dkmt.lm.DMP - language model
|___ dkmt.filler - list of silences
|___ dkmt_train.fileids - list of training files
|___ dkmt_train.transcription - text transcripts of the training files
|___ dkmt_test.fileids - list of test files
|___ dkmt_test.transcription - text transcripts of the test files
wav
|___ train
|    |___ speaker_1
|         |___ file_1.wav - recording of one sentence spoken by the training speaker
|         |___ ...
|___ test
     |___ speaker_1
          |___ file_1.wav
          |___ ...
The file dkmt.dic
This is the dictionary file prepared at the start. It contains the pronunciation of each word in the training set.
Each line in the file defines the pronunciation of one word.
The file is case-sensitive. Building this file normally requires knowledge of how words are pronounced in the language concerned. For English, the pronunciation of each word can be taken from a dictionary. This is also an important step in building a successful training set.
In Vietnamese, the pronunciation and the spelling of a word are almost directly linked, so no pronunciation guide is needed when learning Vietnamese, whereas in English pronunciation and spelling are not tied to each other, for example lead (to guide) and head (the head). For example, to build this file for Vietnamese, a word can be defined in several ways:
BAN B A N
Here the word BAN is treated as a syllable formed by combining three phonemes: B, A, N.
BAN B AN
Here the word BAN is treated as a syllable formed by combining two phonemes: B, AN.
Sphinx does not support word-based definitions, that is, the pronunciation of a word may not be the word itself. For example, BAN BAN is not allowed. However, an equivalent workaround can be used if a word-based design is wanted. In that case the word has to be defined as a word with several pronunciations, for example:
BAN BAN BANG
This definition means that the word "ban" can be pronounced in two ways: as "ban" (the standard pronunciation) or as "bang" (the Southern pronunciation).
Only the characters a-z, A-Z, and 0-9 should be used, to avoid errors in this file.
Tones:
Phonemes that carry a tone are treated as independent phonemes. That is, instead of treating the tone as a separate phoneme, as in the following definition (for the word bản):
BARN B A R N
the toned vowel ả is treated as a distinct phoneme, independent of a, by defining:
BARN B AR N
The file dkmt.phone
This file contains all the phonemes (transcription symbols) used in the dictionary file above, one phoneme per line. The phonemes should be sorted so that Sphinx can manage them easily. Note that one special phoneme, SIL, which represents silence, must be added to this file.
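The phoneme list can be derived from the dictionary rather than typed by hand. The following Python sketch shows one way to do this, assuming each dictionary entry is a single line of the form "WORD PHONE PHONE ..."; the function name is hypothetical and the three entries are taken from the dictionary excerpt in section 5.2.1.

```python
def phone_set(dictionary_lines):
    """Collect every phoneme used in the dictionary, plus SIL, in sorted order."""
    phones = {"SIL"}  # the silence phoneme is always required
    for line in dictionary_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        phones.update(fields[1:])  # fields[0] is the word itself
    return sorted(phones)

entries = ["LEEN L EE NZ", "VEEF V EEF", "BARN B AR NZ"]
phones = phone_set(entries)
phone_file_content = "\n".join(phones) + "\n"  # one phoneme per line
```

Generating the list this way keeps dkmt.phone consistent with dkmt.dic whenever the dictionary changes.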
The file dkmt.lm.DMP
This file is the statistical language model built beforehand with CMUclmtk, in ARPA or DMP format.
The file dkmt.filler
This file contains the filler syllables, which are usually silences, defined as follows:
<s> SIL
</s> SIL
<sil> SIL
The file dkmt_train.fileids
This file lists, one per line, the paths to the recordings located in the wav directory shown in the directory structure above.
speaker_1/file_1
speaker_2/file_2
The .wav extension is not written. Each line refers to one file.
The file dkmt_train.transcription
This file holds the text of what was recorded in the wav files. To train Sphinx to understand what is being said, a text file must be provided from which Sphinx can learn. A .transcription file consists of several lines; each line holds the content of one wav file followed by the name of that wav file.
<S> DI CHUYEERN CHUOOJT </S> (file_1)
<S> MOWR TAAJP TIN </S> (file_2)
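Because the two files must stay aligned line by line, a small consistency check can save a failed training run. The Python sketch below is a hypothetical helper, not part of SphinxTrain; it assumes the utterance name in parentheses at the end of each transcription line equals the last path component of the corresponding fileids line.

```python
import re

UTTERANCE_ID = re.compile(r"\(([^()]+)\)\s*$")

def check_alignment(fileids, transcriptions):
    """Return a list of problems; an empty list means the files are aligned."""
    problems = []
    if len(fileids) != len(transcriptions):
        problems.append(f"{len(fileids)} file ids but {len(transcriptions)} transcriptions")
    for number, (file_id, line) in enumerate(zip(fileids, transcriptions), start=1):
        match = UTTERANCE_ID.search(line)
        utterance = match.group(1) if match else None
        if utterance != file_id.split("/")[-1]:
            problems.append(f"line {number}: {file_id!r} does not match {utterance!r}")
    return problems

fileids = ["speaker_1/file_1", "speaker_2/file_2"]
transcriptions = [
    "<S> DI CHUYEERN CHUOOJT </S> (file_1)",
    "<S> MOWR TAAJP TIN </S> (file_2)",
]
problems = check_alignment(fileids, transcriptions)
```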
Audio data:
Use a recording program to record spoken sentences that contain the words (syllables) to be trained. The audio is recorded with the following parameters [10]:
- Default Sample Rate Format: 16000 Hz
- Default Sample Format: 16-bit
- Channels: 1 (Mono)
- File Format: wav, raw, or sph
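For wav files, these parameters can be verified before training with Python's standard `wave` module. The sketch below is an illustrative helper, not part of the Sphinx tools; the two files it checks are short silent recordings generated on the spot so that the example is self-contained.

```python
import os
import tempfile
import wave

def matches_training_format(path, rate=16000, sample_width=2, channels=1):
    """True if the wav file is 16 kHz, 16-bit (2 bytes per sample), mono."""
    with wave.open(path, "rb") as wav:
        return (wav.getframerate() == rate
                and wav.getsampwidth() == sample_width
                and wav.getnchannels() == channels)

def write_silence(path, rate, seconds=0.1):
    """Write a short mono 16-bit file of silence (used here only as test data)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))

with tempfile.TemporaryDirectory() as folder:
    good = os.path.join(folder, "file_1.wav")
    bad = os.path.join(folder, "file_2.wav")
    write_silence(good, 16000)
    write_silence(bad, 8000)  # wrong sampling rate on purpose
    good_ok = matches_training_format(good)
    bad_ok = matches_training_format(bad)
```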
5.3. CONFIGURING SPHINX TRAINING:
5.3.1. Adjusting parameters:
5.3.1.1. Setting up the training directory:
After installing the required packages on Ubuntu, copy the directory just created in step 5 into the same directory as the sphinx directory created earlier [9].
To start the training process, use the sphinxtrain and pocketsphinx scripts to set up the training directory:
../SphinxTrain/scripts_pl/setup_SphinxTrain.pl -task dkmt
../pocketsphinx/scripts/setup_sphinx.pl -task dkmt
Here dkmt is the name of the training directory. The commands above copy the required parts into the training directory:
bin
bwaccumdir
etc
feat
logdir
model_parameters
model_architecture
python
scripts_pl
wav
5.3.1.2. Adjusting the parameters:
The configuration is stored in the file sphinx_train.cfg. Some important settings are:
- Training on audio files in wav format:
$CFG_WAVFILES_DIR = "$CFG_BASE_DIR/wav";
$CFG_WAVFILE_EXTENSION = 'wav';
$CFG_WAVFILE_TYPE = 'mswav';
- Selecting the model type (continuous or semi-continuous HMM training). Remove the # in front of the model type to be trained:
$CFG_HMM_TYPE = '.cont.'; # Sphinx 4, Pocketsphinx
#$CFG_HMM_TYPE = '.semi.'; # PocketSphinx
#$CFG_HMM_TYPE = '.ptm.'; # PocketSphinx (larger data sets)
- The density parameter, which can take the values 4, 8, 16, 32, or 64 depending on the amount of data:
$CFG_FINAL_NUM_DENSITIES = 8;
- The number of senones to train in a model. The more senones there are, the more precisely Sphinx discriminates between sounds. On the other hand, if there are too many senones, the model is not general enough to recognize unseen speech, which means that the word error rate rises on data that was not used for training. This is an important reason not to overtrain the models. If there are too many unseen senones, warnings are generated.
# Number of tied states (senones) to create in decision-tree clustering
$CFG_N_TIED_STATES = 200;
According to the CMUSphinx group, the configuration should follow the table below:
Table 5.1. Configuration parameters

Vocabulary size | Training hours | Senones | Densities | Example
20 | 5 | 200 | 8 | Digit recognition model
100 | 20 | 2000 | 8 | Command and control model
5000 | 30 | 4000 | 16 | 5000-word dictation model
20000 | 80 | 4000 | 32 | 20000-word dictation model
60000 | 200 | 6000 | 16 | HUB model
60000 | 2000 | 12000 | 64 | Fisher Rich Telephone Transcription model
5.3.2. Running the training:

5.3.2.1. Creating the feature vectors:
The system does not work directly on the audio signals. The signals are first converted into a sequence of feature vectors, which are used in place of the actual audio signals. To perform this conversion (or parameterization), run the following two commands in the training directory:
./scripts_pl/make_feats -ctl etc/dkmt_train.fileids
./scripts_pl/make_feats -ctl etc/dkmt_test.fileids

This script computes, for each utterance, a sequence of 13-dimensional vectors (the feature vectors) consisting of Mel-frequency cepstral coefficients (MFCCs). The .fileids control files give the paths to the audio files. The MFCCs are placed automatically in the ./feat directory.
5.3.2.2. Training:
Use the command:
./scripts_pl/RunAll.pl

This command runs through all the required stages. The training process prints messages of the following form:
Baum welch starting for 2 Gaussian(s), iteration: 3 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Normalization for iteration: 3
Current Overall Likelihood Per Frame = 30.6558644286942
Convergence Ratio = 0.633864444461992
Baum welch starting for 2 Gaussian(s), iteration: 4 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Normalization for iteration: 4
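To follow convergence over a long run, the likelihood and convergence ratio can be pulled out of such output with a short script. The Python sketch below is an illustrative helper that relies only on the message wording shown above; other versions of the training scripts may word their output differently.

```python
import re

HEADER = re.compile(r"Baum welch starting for (\d+) Gaussian\(s\), iteration: (\d+)")
VALUE = re.compile(r"(Current Overall Likelihood Per Frame|Convergence Ratio) = (\S+)")

def parse_training_log(text):
    """Return one record per Baum-Welch iteration found in the log text."""
    records = []
    for line in text.splitlines():
        header = HEADER.search(line)
        if header:
            records.append({"gaussians": int(header.group(1)),
                            "iteration": int(header.group(2))})
            continue
        value = VALUE.search(line)
        if value and records:
            key = "likelihood" if value.group(1).startswith("Current") else "convergence_ratio"
            records[-1][key] = float(value.group(2))
    return records

LOG = """Baum welch starting for 2 Gaussian(s), iteration: 3 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Normalization for iteration: 3
Current Overall Likelihood Per Frame = 30.6558644286942
Convergence Ratio = 0.633864444461992
Baum welch starting for 2 Gaussian(s), iteration: 4 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Normalization for iteration: 4
"""

records = parse_training_log(LOG)
```

Iterations whose values have not been printed yet simply lack the `likelihood` and `convergence_ratio` keys.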
5.4. EXPERIMENTAL RESULTS:
How the program works:

Figure 5.3. Operation of the demo program

First, the speech signal from the microphone is passed to the FrontEnd, where it is parameterized into a sequence of features and passed on to the Decoder. The Linguist converts the language model, the pronunciation information in the dictionary, and the sound structure information in the acoustic model into a SearchGraph in the Decoder. The Decoder determines the feature sequence in the SearchGraph that most closely matches the speech features supplied by the FrontEnd and generates the result.
System specifications:
The demo program was built on a system with the following specifications:
- Dell Inspiron N4030 laptop.
- Intel Core i3 processor, 2.5 GHz, 3 GB RAM.
- Linux and Windows 7 operating systems.
- Onboard sound card.
- The microphone used for recording and recognition is a headset microphone.
- Speech is recorded at a sampling rate of 16,000 Hz with 16 bits per sample.
So far, the thesis has built a demo program that recognizes Vietnamese with just over 100 basic computer control commands, including:
DI CHUYỂN CHUỘT
DI CHUYỂN CHUỘT LÊN
DI CHUYỂN CHUỘT LÊN TRÊN
LÊN
LÊN TRÊN
CHUỘT LÊN TRÊN
CHUỘT LÊN TRÊN NỮA
TRÊN NỮA
TRÊN
LÊN TRÊN NỮA
NỮA
DI CHUYỂN CHUỘT XUỐNG
DI CHUYỂN CHUỘT XUỐNG DƯỚI
XUỐNG
XUỐNG DƯỚI
CHUỘT XUỐNG DƯỚI
CHUỘT XUỐNG DƯỚI NỮA
DƯỚI NỮA
DƯỚI
XUỐNG DƯỚI NỮA
DI CHUYỂN CHUỘT QUA PHẢI
QUA PHẢI NỮA
CHUỘT QUA PHẢI
QUA
QUA PHẢI
PHẢI
DI CHUYỂN CHUỘT QUA TRÁI
QUA TRÁI NỮA
CHUỘT QUA TRÁI


QUA TRÁI
TRÁI
DI CHUYỂN CHUỘT VỀ ĐẦU
VỀ ĐẦU
CHUỘT VỀ ĐẦU
ĐẦU
BẮT ĐẦU
DI CHUYỂN CHUỘT VỀ CUỐI
VỀ CUỐI
CHUỘT VỀ CUỐI
CUỐI
CUỘN CHUỘT
CUỘN CHUỘT LÊN
CUỘN CHUỘT XUỐNG
NHẠC
DỪNG
NGHE TIẾP
TẮT NHẠC
CHƯƠNG TRÌNH NGHE NHẠC
ĐÓNG CHƯƠNG TRÌNH NHẠC
WEB
ĐÓNG WEB
VĂN BẢN
CHỌN TẬP TIN
MỞ TẬP TIN
TẬP TIN
HỦY
SOẠN THẢO VĂN BẢN
ĐÓNG CHƯƠNG TRÌNH SOẠN THẢO
ĐÓNG SOẠN THẢO
ĐÓNG VĂN BẢN
THOÁT
CON TRỎ
TRỎ
TRỎ VỀ ĐẦU
TRỎ VỀ CUỐI
CON TRỎ LÊN
CON TRỎ LÊN TRÊN
CON TRỎ LÊN ĐẦU
TRỎ LÊN
TRỎ LÊN TRÊN
TRỎ LÊN TRÊN NỮA
TRỎ LÊN ĐẦU
CON TRỎ XUỐNG
CON TRỎ XUỐNG DƯỚI
CON TRỎ XUỐNG CUỐI
TRỎ XUỐNG


TRỎ XUỐNG DƯỚI
TRỎ XUỐNG DƯỚI NỮA
TRỎ XUỐNG CUỐI
THÊM THẺ
THẺ MỚI
ĐÓNG THẺ
THÊM TRANG
TRANG MỚI
ĐÓNG TRANG
XUỐNG DÒNG
ĐẦU DÒNG
CUỐI DÒNG
DÒNG TRÊN
DÒNG DƯỚI
XUỐNG HÀNG
ĐẦU HÀNG
CUỐI HÀNG
HÀNG TRÊN
HÀNG DƯỚI
TỚI
LUI
TO
PHÓNG TO
NHỎ
THU NHỎ
- Total recording time: about 50 hours.
- The test data consists of 660 spoken sentences.
The experimental results are as follows:
- Correct sentences: 545/660. Accuracy: 82.57%.
- Correct words: 1557/1675. Accuracy: 92.95%.
Sentence-level recognition accuracy is therefore modest, at only about 82%. Word-level accuracy, however, is higher, at over 92%.
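The two percentages follow directly from the counts, as the short Python check below shows. The helper name is hypothetical; note that the reported figures correspond to the computed values truncated, rather than rounded, to two decimal places.

```python
import math

def accuracy_percent(correct, total):
    """Share of correctly recognized items, in percent."""
    return 100.0 * correct / total

def truncate(value, decimals=2):
    """Cut off (rather than round) after the given number of decimals."""
    factor = 10 ** decimals
    return math.floor(value * factor) / factor

sentence_accuracy = accuracy_percent(545, 660)   # sentences recognized entirely correctly
word_accuracy = accuracy_percent(1557, 1675)     # individual words recognized correctly

print(truncate(sentence_accuracy), truncate(word_accuracy))
```

Sentence accuracy is lower than word accuracy because a single wrong word makes the whole sentence count as incorrect.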

CONCLUSION
RESULTS ACHIEVED:
Through the study of Vietnamese speech recognition and its experimental application to computer control, the thesis has accomplished the following:
- Studied speech, speech processing methods, and feature extraction.
- Studied and carried out phoneme-based acoustic model training, applied to Vietnamese.
- Studied the architecture of a speech recognition system through the CMUSphinx tools.
- Built a demo program for continuous Vietnamese speech recognition.
Because of the author's limited knowledge of digital signal processing and speech processing, the thesis inevitably has many shortcomings. Nevertheless, it is hoped that the results obtained will make a small contribution to research on Vietnamese speech recognition.
FUTURE WORK:
Because the recorded and processed data is not yet rich enough, the results are not yet good. This can be remedied by recording more samples and recruiting more volunteers to record. Speech from radio and the internet could also be used to enrich the training data. In addition, the following areas need further development:
- Survey the phonetic characteristics of Vietnamese further and examine spectrograms to find the features that affect tone, in order to improve tone recognition.
- Improve the method for segmenting words in a sentence to obtain better recognition results.
- Study language models and search algorithms in speech recognition further in order to speed up recognition.

REFERENCES

Vietnamese:
[1]. ng Hoi Bc (2006), X l tn hiu s, Hc vin Cng ngh Bu
chnh Vin thng.
[2]. ng Ngc c, Nguyn Tin Dng, Lng Chi Mai (2011), M hnh
v phin m ting Vit mc m v, Institute of Information Technology, Vietnamese
Academy of Science and Technology.
[3]. Cao Xun Ho (1998), Ting Vit - my vn ng m, ng php, ng
ngha, NXB Gio dc.
[4]. Quch Tun Ngc, Mai Cng Nguyn (1998), Nhn dng li ni lin
tc vi b t vng ln, Tiu lun mn Nhn dng ting ni, i hc Bch khoa H
Ni.
[5]. Quch Tun Ngc, Phm Xun Trng (1998), Phng php phn tch
v x l nhn dng ting ni, Tiu lun mn X l ting ni, i hc Bch khoa H
Ni.
[6]. Phan Nguyn Phc Quc, H Thc Phng (2009), H thng nhn dng
ting ni, Lun vn i hc, i hc Bch khoa TP.HCM.
[7]. Thi Hng Vn, Xun t, V Vn Tun (2003), Nghin cu cc
c trng ca ting Vit p dng vo nhn dng ting ni ting Vit, Lun vn i
hc, i hc KHTN TP.HCM.
Ting Anh:
[8]. Xuedong Huang, Alex Acero, Hsiao-wuen Hon (2001), Spoken
language Processing, Carnegie Mellon University.
[9]. CMUSphinx Wiki: http://cmusphinx.sourceforge.net/wiki/
[10].

Record

your

Speech

http://www.voxforge.org/home/submitspeech/windows/step-2

with

Audacity:

APPENDIX
VIETNAMESE PHONETIC TRANSCRIPTION TABLE IN ASCII CODES

Initial consonants (âm đầu)
No. | Letter(s) | IPA | Example | Description | ASCII
1 | b | | ba | stop, bilabial, voiced, unaspirated; occurs only in syllables without a medial glide |
2 | đ | | | stop, apico-alveolar, voiced, unaspirated | dd
3 | t | | tng | stop, apico-dental, voiceless, unaspirated |
4 | th | | thch | stop, voiceless, aspirated, apico-dental | th
5 | tr | | trng | stop, apico-palatal, voiceless, unaspirated | tr
6 | ch | | ch | stop, voiceless, laminal, unaspirated | ch
7 | k (before i, e, ê); c (before u, ư, a, o, ...); q (before u) | | keo; cnh; quy | stop, voiceless, dorsal, unaspirated |
8 | m | | mm | nasal sonorant, bilabial; occurs in syllables without a medial glide |
9 | n | | nng | nasal sonorant, apico-alveolar |
10 | nh | | nh | nasal sonorant, laminal | nh
11 | ng (before u, ư, o, ô, ơ, a, ă, â); ngh (before i, e, ê) | | ng; ngh | nasal sonorant, dorsal | ng
12 | ph | | | fricative, voiceless, labiodental; occurs in syllables without a medial glide | ph
13 | v | | vi | fricative, voiced, labiodental; occurs in syllables without a medial glide |
14 | x | | xa | fricative, voiceless, apico-alveolar |
15 | gi; g (before i) | | gii | fricative, voiced, apico-alveolar |
16 | l | | lm | lateral sonorant, apico-dental |
17 | s | | sn | fricative, voiceless, apico-palatal, retroflex |
18 | r | | rm | fricative, voiced, apico-palatal, retroflex |
19 | kh | | kh | fricative, voiceless, dorsal | kh
20 | g (before u, ư, o, ô, ơ, a, ă, â); gh (before i, e, ê) | | gm; gh | fricative, back of the tongue, voiced |
21 | h | | ha | fricative, voiceless, glottal |
22 | p | | pi | stop, bilabial, ... |

Medial glide (âm đệm)
No. | Letter(s) | IPA | Example | Description | ASCII
23 | o (before the open vowels a, ă, e); u (elsewhere) | | hoa; hy | formed like the nuclear vowel /u/: close, very dark timbre, rounded, back |

Nuclear vowels (âm chính)
No. | Letter(s) | IPA | Example | Description | ASCII
24 | y (after u); i (elsewhere) | | suy; tnh | monophthong, long, front, close, unrounded, bright timbre; shortened before /k, ŋ/ |
25 | ê | | ch | monophthong, long, front, close-mid, unrounded, bright timbre; shortened before /k, ŋ/ | ee
26 | e | | ch | monophthong, long, front, open-mid, unrounded, bright timbre |
27 | a (before ch, nh) | | sch | monophthong, short; roughly the short counterpart of /ɛ/ | ea
28 | u | | sung | monophthong, long, back, close, rounded, dark timbre; shortened before /k, ŋ/ |
29 | ô | | | monophthong, long, back, close-mid, rounded, dark timbre; long when not before /k, ŋ/ | oo
30 | o | | con | monophthong, long, back, open-mid, rounded, dark timbre; long when not before /k, ŋ/ |
31 | o (before c, ng) | | cc | monophthong, short | oa
32 | ư | | | monophthong, long, back, close, unrounded, moderately dark timbre | uw
33 | ơ | | | monophthong, long, back, close-mid, unrounded, moderately dark timbre | ow
34 | â | | | monophthong, short, back, close-mid, unrounded, moderately dark timbre; occurs in all syllables except open syllables | aa
35 | a | | tan | monophthong, long, back, open, unrounded, moderately dark timbre; occurs in all syllables |
36 | ă; a (before u, y) | | chn; tay | monophthong, short, back, open, unrounded, moderately dark timbre; occurs in all syllables except open syllables | aw
37 | ia; ya (after a medial glide); iê (no medial glide before, final after); yê (medial glide before, or semivowel final after) | ie | bia; khuya; tin; yu | falling diphthong, front, unrounded; the second element is a front, close-mid, unrounded vowel | ie
38 | ua (no final); uô (with a final) | uo | chua; cun | falling diphthong, back, rounded; the first element is a back, close-mid, unrounded vowel | uo
39 | ưa (no final); ươ (with a final) | | tra; li | falling diphthong, back, unrounded; the first element is a back, close, unrounded vowel; the second element is a back, close-mid, unrounded vowel | wa
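
Finals (âm cuối)
No. | Letter(s) | IPA | Example | Description | ASCII
40 | p | | mp | final consonant, ... | pc
41 | t | | cht | final consonant, ... | tc
42 | m | | cm | final consonant, sonorant, nasal, labial | mz
43 | n | | nn | final consonant, sonorant, nasal, apical | nz
44 | ch (after i, e, ê, a); c (elsewhere) | | sch; cc | final consonant, obstruent, laminal | kc
45 | nh (after i, e, ê, a); ng (elsewhere) | | vnh; vng | final consonant, sonorant, nasal, laminal | ngz
46 | o (after e, a); u (elsewhere) | -w | leo; cu | final semivowel, sonorant, labial | uz
47 | y (after the short vowels ă, â); i (elsewhere) | -j | bay; ci | final semivowel, sonorant, lingual | iz

Tones
Tone name | Symbol
Ngang |
Sắc | S
Huyền | F
Hỏi | R
Ngã | X
Nặng | J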
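
As an illustration of how the tone symbols and final-consonant codes in this appendix might be applied to a written syllable, here is a minimal Python sketch. It is not the thesis's transcription tool: the function names are hypothetical, the empty string for the unmarked ngang tone is an assumption (its symbol is not legible in the table), and only consonant finals are handled, since telling a final semivowel (uz, iz) apart from a nuclear vowel needs a full syllable parser.

```python
import unicodedata
from typing import Optional, Tuple

# Unicode combining marks of the five marked tones -> symbols from the tone table.
TONE_MARKS = {
    "\u0301": "S",  # sắc (acute accent)
    "\u0300": "F",  # huyền (grave accent)
    "\u0309": "R",  # hỏi (hook above)
    "\u0303": "X",  # ngã (tilde)
    "\u0323": "J",  # nặng (dot below)
}

# Orthographic consonant finals -> ASCII codes from the finals table.
# Two-letter finals come first so that "ch" is not matched as plain "c".
FINAL_CODES = [
    ("ch", "kc"), ("nh", "ngz"), ("ng", "ngz"),
    ("c", "kc"), ("p", "pc"), ("t", "tc"), ("m", "mz"), ("n", "nz"),
]


def split_tone(syllable: str) -> Tuple[str, str]:
    """Return (syllable without its tone mark, tone symbol).

    The unmarked ngang tone yields "" (an assumption, see the text above).
    """
    tone = ""
    kept = []
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            kept.append(ch)  # keeps vowel-quality marks such as the circumflex or horn
    return unicodedata.normalize("NFC", "".join(kept)), tone


def final_consonant_code(toneless: str) -> Optional[str]:
    """ASCII code of the final consonant, or None if there is no consonant final."""
    word = toneless.lower()
    for suffix, code in FINAL_CODES:
        if word.endswith(suffix) and len(word) > len(suffix):
            return code
    return None


base, tone = split_tone("sách")
print(base, tone, final_consonant_code(base))
```

Working on the decomposed (NFD) form keeps tone marks separate from the marks that change vowel quality, so "việt" loses only its dot below and keeps the letter ê.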
