LAC HONG UNIVERSITY
***
Dong Nai, 2012
MINISTRY OF EDUCATION AND TRAINING
LAC HONG UNIVERSITY
***
HUYNH THANH GIAU
Dong Nai, 2012
ACKNOWLEDGEMENTS
THESIS ABSTRACT
TABLE OF CONTENTS
Acknowledgements
Thesis abstract
Table of contents
2.1.4. Digital filters and windows
Linguist
Decoder
Downloading the required Sphinx packages
5.1.2. Installation
LIST OF TABLES
Table 2.1. Properties of the Fourier transform
Table 2.2. Properties of the z-transform
Table 2.3. Properties of the DFT for periodic sequences of period N
Table 3.1. The Chomsky hierarchy and the corresponding machines that accept each language class
Table 4.1. Tags used in the configuration file
Table 5.1. Configuration parameters
LIST OF FIGURES
Figure 2.1. Analog signal and the corresponding digital signal
Figure 2.2. A sinusoid with a period of 25 samples
Figure 2.3. The sum of two sinusoids of the same frequency
Figure 2.4. Block diagram of a digital system
Figure 2.5. Plot of X(e^{jω})
Figure 2.6. Representation in terms of real and imaginary parts
Figure 2.7. Representation of z on the complex plane
Figure 2.8. The unit circle
Figure 2.9. Evaluating the z-transform on the unit circle
Figure 2.10. 8-point radix-2 decimation-in-frequency FFT
Figure 2.11. The sinc function
Figure 2.12. Plot of A_R(e^{jω})
Figure 2.13. Distribution function
Figure 2.14. Short-time spectrum of a male voice
Figure 2.15. Mapping from log-energy values (x-axis) to grey levels (y-axis)
Figure 2.16. General feature extraction scheme
Figure 2.17. Steps for computing MFCC features
Figure 2.18. Relationship between the mel scale and Hz
Figure 2.19. Block diagram of the LPC processor for speech feature extraction
Figure 3.1. Illustration of a Markov model
Figure 3.2. Direct comparison of two speech samples
Figure 3.3. Forward trellis computation for the Dow Jones Industrial HMM
Figure 3.4. Viterbi trellis computation for the Dow Jones Industrial HMM
Figure 3.5. Relationship between t-1 and t, and between t and t+1, in the forward-backward algorithm
Figure 3.6. Operations required to compute γt(i, j)
Figure 3.7. A typical hidden Markov model used to model a phone
Figure 3.8. A standard HMM
Figure 3.9. Word error rates of different models
Figure 3.10. Structure of an isolated-word model
Figure 3.11. Composite sentence HMM
Figure 3.12. A tree representation of a sentence and its corresponding grammar
INTRODUCTION
Speech is the most basic means of human communication, and speaking is the simplest and most effective way to express oneself. People have long dreamed of automatic machines that can communicate in natural human speech. Today, with the development of science and technology, especially computing, automated systems are gradually replacing people in many tasks, and communicating with machines by voice is highly desirable as the most natural and civilized mode of interaction.
Speech recognition is not a new problem. A great deal of research has been carried out around the world, using many different recognition methods, and some of it has been notably successful. Examples include IBM's ViaVoice English recognizer, the CSLU Toolkit of the Center for Spoken Language Understanding (CSLU), Microsoft's Speech Recognition Engine, the Hidden Markov Model Toolkit from the University of Cambridge and CMU Sphinx from Carnegie Mellon University; recognizers for French, German, Chinese and other languages are also well developed. For Vietnamese there is work by groups such as AILab, Vietvoice and Vspeech. In Vietnam, however, speech recognition is still a fairly new field. Although there has been considerable research on Vietnamese speech recognition, with some achievements, the results are generally not yet good enough to build products with wide practical use.
Aiming to understand how humans and computers can communicate, this thesis studies speech recognition methods and uses them to build a demo program for Vietnamese speech recognition, so that a computer can understand what a person says.
… software, available free of charge at www.ailab.hcmus.edu.vn (http://www.ailab.hcmus.edu.vn).
- Vietvoice: software written by a Vietnamese person living in Canada. It can read Vietnamese aloud from files. To run it, the Microsoft Visual C++ 2005 Redistributable Package (x86) must be installed. For visually impaired users, it supports keyboard shortcuts (Ctrl plus a letter) to select any function shown on screen. Users can update the dictionary of abbreviations and foreign words.
- Vspeech: software for controlling a computer by voice, written by a group of students at Ho Chi Minh City University of Technology. It uses the Microsoft Speech SDK, which recognizes English, and maps the results to Vietnamese. The idea worked quite well: because it reuses an existing recognition engine, design time was very short while recognition performance was fairly good. Vspeech offers simple system commands such as opening My Computer or the Start button. The latest version works with MS Word 2003 and browses the web with Internet Explorer. It has no functions for customizing commands or launching applications. It runs on Windows XP with a standard microphone and sound card.
However, voice control of computers is still quite limited. In Vietnam there is almost only Vspeech, by the students of Ho Chi Minh City University of Technology; other software has only been tested in laboratories, is not used in practice and does not exceed 100 words. Vspeech was developed on top of the Microsoft Speech SDK, an English recognizer; through intermediate data and mapping methods, its recognition results are converted so that Vspeech recognizes Vietnamese.
1.2. OBJECTIVES OF THE THESIS
The thesis studies the basic ideas and methods used in speech recognition and, based on them, builds a demo program that recognizes about 20 Vietnamese words for controlling a computer by voice.
The thesis consists of 5 chapters:
Chapter 1: Overview of domestic and international work on speech recognition, the objectives of the thesis and its scope.
Chapter 2: Basic digital signal processing, the spectrogram representation of speech, and speech feature extraction with MFCC (Mel-scale Frequency Cepstral Coefficients) and LPC (Linear Predictive Coding).
Chapter 3: The hidden Markov model approach to speech recognition: concepts, practical use and some limitations. It also covers the two important models that make up the linguistic part of a recognizer: the acoustic model and the language model.
Chapter 4: The CMUSphinx speech recognition toolkit from Carnegie Mellon University and the components of its architecture, which give an overview of a speech recognition system and support building the demo program.
Chapter 5: Building the Vietnamese speech recognition demo with Sphinx, describing how the language model is built and the acoustic model is trained.
Appendix: Table of Vietnamese phonetic transcription at the phoneme level in ASCII form, based on the International Phonetic Alphabet (IPA), as used in the program.
1.3. SCOPE OF THE THESIS
The thesis is limited to studying speech, speech processing methods and speech feature extraction; hidden Markov models, acoustic models and phonemes as applied to Vietnamese; and the architecture of a speech recognition system through the Sphinx toolkit. The demo program only recognizes about 100 basic commands for controlling a computer (listed in Chapter 5). When a person speaks a command, the computer recognizes it and displays it on the program's screen.
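$$x(n) = A\cos(\omega_0 n + \phi) \qquad (2.1)$$
$$A e^{j(\omega_0 n + \phi)} = A\cos(\omega_0 n + \phi) + jA\sin(\omega_0 n + \phi) \qquad (2.2)$$
where $j = \sqrt{-1}$.
Since the sum of two complex exponentials of the same frequency is:
$$A_0 e^{j(\omega_0 n + \phi_0)} + A_1 e^{j(\omega_0 n + \phi_1)} = e^{j\omega_0 n}\left(A_0 e^{j\phi_0} + A_1 e^{j\phi_1}\right) = e^{j\omega_0 n} A e^{j\phi} = A e^{j(\omega_0 n + \phi)} \qquad (2.3)$$
taking the real part of both sides gives:
$$A_0\cos(\omega_0 n + \phi_0) + A_1\cos(\omega_0 n + \phi_1) = A\cos(\omega_0 n + \phi) \qquad (2.4)$$
with
$$\tan\phi = \frac{A_0\sin\phi_0 + A_1\sin\phi_1}{A_0\cos\phi_0 + A_1\cos\phi_1} \qquad (2.5)$$
$$A = \sqrt{\left(A_0\cos\phi_0 + A_1\cos\phi_1\right)^2 + \left(A_0\sin\phi_0 + A_1\sin\phi_1\right)^2} \qquad (2.6)$$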
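2.1.1.2. Digital systems:
A digital system is a system that, given an input signal x(n), produces an output signal y(n):
$$y(n) = T\{x(n)\} \qquad (2.7)$$
For a linear time-invariant system with impulse response h(n), the output is:
$$y(n) = \sum_{k=-\infty}^{\infty} x(k)\,h(n-k) = \sum_{k=-\infty}^{\infty} h(k)\,x(n-k) = x(n) * h(n) \qquad (2.9)$$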
where * denotes convolution. Convolution is the most important operation in signal processing: it gives the output y(n) of a system from its input x(n) and its impulse response h(n). Convolution is commutative, associative and distributive over addition.
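A minimal sketch of Eq. (2.9) for finite sequences, assuming both x(n) and h(n) start at n = 0 (an assumption for illustration; the text defines the sum over all n). The second call checks the commutativity stated above.

```python
def convolve(x, h):
    """Linear convolution y(n) = sum_k x(k) h(n - k) for finite sequences.

    Both sequences are assumed to start at index 0, so the result has
    len(x) + len(h) - 1 samples.
    """
    y = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(x)):
            if 0 <= n - k < len(h):          # h(n - k) is zero outside its support
                y[n] += x[k] * h[n - k]
    return y

x = [1, 2, 3]
h = [0, 1, 0.5]
print(convolve(x, h))                        # [0.0, 1.0, 2.5, 4.0, 1.5]
print(convolve(h, x) == convolve(x, h))      # True: x * h = h * x
```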
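2.1.2. Continuous-frequency transforms:
2.1.2.1. Fourier transform:
The Fourier transform of a signal x(n) is defined as:
$$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x(n)\,e^{-j\omega n} \qquad (2.10)$$
and the signal is recovered with the inverse transform:
$$x(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(e^{j\omega})\,e^{j\omega n}\,d\omega \qquad (2.11)$$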
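Table 2.1. Properties of the Fourier transform
n domain | ω domain
$x(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(e^{j\omega})e^{j\omega n}d\omega$ | $X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x(n)e^{-j\omega n}$
$a x_1(n) + b x_2(n)$ | $a X_1(e^{j\omega}) + b X_2(e^{j\omega})$
$x(n - n_0)$ | $e^{-j\omega n_0}X(e^{j\omega})$
$x(n)$ real | $X^*(e^{j\omega}) = X(e^{-j\omega})$; $\mathrm{Re}[X(e^{j\omega})] = \mathrm{Re}[X(e^{-j\omega})]$; $\mathrm{Im}[X(e^{j\omega})] = -\mathrm{Im}[X(e^{-j\omega})]$; $|X(e^{j\omega})| = |X(e^{-j\omega})|$; $\arg[X(e^{j\omega})] = -\arg[X(e^{-j\omega})]$
$x^*(n)$ | $X^*(e^{-j\omega})$
$x(-n)$ | $X(e^{-j\omega})$
$x_1(n) * x_2(n)$ | $X_1(e^{j\omega})\,X_2(e^{j\omega})$
$x_1(n)\,x_2(n)$ | $\frac{1}{2\pi}\int_{-\pi}^{\pi} X_1(e^{j\theta})\,X_2(e^{j(\omega-\theta)})\,d\theta$
$n\,x(n)$ | $j\,\frac{dX(e^{j\omega})}{d\omega}$
$e^{j\omega_0 n}x(n)$ | $X(e^{j(\omega-\omega_0)})$
$x(n)\cos\omega_0 n$ | $\frac{1}{2}\left[X(e^{j(\omega-\omega_0)}) + X(e^{j(\omega+\omega_0)})\right]$
Parseval's relation: $\sum_{n=-\infty}^{\infty} x_1(n)\,x_2^*(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} X_1(e^{j\omega})\,X_2^*(e^{j\omega})\,d\omega$, and in particular $\sum_{n=-\infty}^{\infty} |x(n)|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} |X(e^{j\omega})|^2\,d\omega$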
2.1.2.2. Z-transform:
Definition: the z-transform of a sequence x(n) is defined as:
$$X(z) = \sum_{n=-\infty}^{\infty} x(n)\,z^{-n} \qquad (2.15)$$
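The Fourier transform is the z-transform evaluated on the unit circle $z = e^{j\omega}$:
$$X(e^{j\omega}) = X(z)\big|_{z=e^{j\omega}} = \sum_{n=-\infty}^{\infty} x(n)\,e^{-j\omega n} \qquad (2.19)$$
and the inverse z-transform is:
$$x(n) = \frac{1}{2\pi j}\oint_C X(z)\,z^{n-1}\,dz \qquad (2.20)$$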
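Table 2.2. Properties of the z-transform
n domain | z domain
$x(n) = \frac{1}{2\pi j}\oint_C X(z)\,z^{n-1}dz$ | $X(z) = \sum_{n=-\infty}^{\infty} x(n)z^{-n}$
$a x_1(n) + b x_2(n)$ | $a X_1(z) + b X_2(z)$
$x(n - n_0)$ | $z^{-n_0}X(z)$
$a^n x(n)$ | $X(a^{-1}z)$
$n\,x(n)$ | $-z\,\frac{dX(z)}{dz}$
$x^*(n)$ (*: complex conjugate) | $X^*(z^*)$
$x(-n)$ | $X(1/z)$
$x_1(n) * x_2(n)$ | $X_1(z)\,X_2(z)$
$x_1(n)\,x_2(n)$ | $\frac{1}{2\pi j}\oint_C X_1(v)\,X_2(z/v)\,v^{-1}dv$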
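Sampling the z-transform of a length-N sequence at N equally spaced points on the unit circle gives the discrete Fourier transform (DFT):
$$X(k) = X(z)\big|_{z=e^{j2\pi k/N}} = \sum_{n=0}^{N-1} x(n)\,e^{-j2\pi kn/N} = X(e^{j2\pi k/N}) \qquad (2.22)$$
with the inverse
$$x(n) = \frac{1}{N}\sum_{k=0}^{N-1} X(k)\,e^{j2\pi kn/N} \qquad (2.24)$$
where $0 \le n \le N-1$ and $0 \le k \le N-1$.
Let
$$W_N = e^{-j2\pi/N}, \qquad W_N^{-1} = e^{j2\pi/N}, \qquad W_N^{N} = W_N^{0} = 1$$
With this notation, the DFT of a sequence that is periodic with period N is rewritten as:
$$X(k) = \sum_{n=0}^{N-1} x(n)\,W_N^{kn} \qquad (2.25)$$
$$x(n) = \frac{1}{N}\sum_{k=0}^{N-1} X(k)\,W_N^{-kn} \qquad (2.26)$$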
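Table 2.3. Properties of the DFT for periodic sequences of period N (indices taken modulo N)
n domain | k domain
$x(n) = \frac{1}{N}\sum_{k=0}^{N-1} X(k)W_N^{-kn}$ | $X(k) = \sum_{n=0}^{N-1} x(n)W_N^{kn}$
$a x_1(n) + b x_2(n)$ | $a X_1(k) + b X_2(k)$
$x(n - n_0)$ | $W_N^{kn_0}X(k)$
$W_N^{ln}x(n)$ | $X(k + l)$
$x_1(n) \circledast x_2(n) = \sum_{m=0}^{N-1} x_1(m)\,x_2(n-m)$ | $X_1(k)\,X_2(k)$
$x_1(n)\,x_2(n)$ | $\frac{1}{N}\sum_{l=0}^{N-1} X_1(l)\,X_2(k-l) = \frac{1}{N}X_1(k) \circledast X_2(k)$
$x(n)$ real | $X(k) = X^*(-k)$; $\mathrm{Re}[X(k)] = \mathrm{Re}[X(-k)]$; $\mathrm{Im}[X(k)] = -\mathrm{Im}[X(-k)]$; $|X(k)| = |X(-k)|$; $\arg[X(k)] = -\arg[X(-k)]$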
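$$X(k) = \sum_{n=0}^{N-1} x(n)\,W_N^{nk} \qquad (2.28)$$
Splitting the sum into the even-indexed samples x(2n) and the odd-indexed samples x(2n+1), with N a power of 2:
$$X(k) = \sum_{n=0}^{N/2-1} x(2n)\,W_N^{2nk} + \sum_{n=0}^{N/2-1} x(2n+1)\,W_N^{(2n+1)k} \qquad (2.29)$$
Since $W_N^{2} = W_{N/2}$:
$$X(k) = \sum_{n=0}^{N/2-1} x(2n)\,W_{N/2}^{nk} + W_N^{k}\sum_{n=0}^{N/2-1} x(2n+1)\,W_{N/2}^{nk} = X_0(k) + W_N^{k}X_1(k) \qquad (2.30)$$
where $X_0(k)$ and $X_1(k)$ are the N/2-point DFTs of the even and odd samples. Because they are periodic with period N/2 and $W_N^{k+N/2} = -W_N^{k}$:
$$X(k) = \begin{cases} X_0(k) + W_N^{k}X_1(k), & 0 \le k < N/2 \\ X_0(k-N/2) - W_N^{k-N/2}X_1(k-N/2), & N/2 \le k < N \end{cases} \qquad (2.33)$$
Each pair of outputs in (2.33) is computed by one butterfly. Repeating the split over log2 N stages, each with N/2 butterflies, gives:
Number of butterflies = (N/2) log2 N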
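The split into even-indexed x(2n) and odd-indexed x(2n+1) samples above can be sketched as a recursive radix-2 FFT. This is an illustrative decimation-in-time implementation, assuming len(x) is a power of two; it is checked against the direct O(N²) DFT with W_N = e^{-j2π/N}.

```python
import cmath

def dft(x):
    """Direct DFT X(k) = sum_n x(n) W_N^{kn}, W_N = exp(-j 2 pi / N); O(N^2)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def fft(x):
    """Radix-2 FFT by splitting x(n) into even- and odd-indexed samples.

    Assumes len(x) is a power of two. For 0 <= k < N/2 one butterfly gives
    X(k) = X0(k) + W_N^k X1(k) and X(k + N/2) = X0(k) - W_N^k X1(k).
    """
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    X0 = fft(x[0::2])                       # N/2-point DFT of x(2n)
    X1 = fft(x[1::2])                       # N/2-point DFT of x(2n + 1)
    X = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * X1[k]   # twiddle factor W_N^k
        X[k] = X0[k] + t
        X[k + N // 2] = X0[k] - t
    return X

x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.0]   # N = 8
print(max(abs(a - b) for a, b in zip(fft(x), dft(x))) < 1e-9)   # True
```

The recursion performs log2 N levels of N/2 butterflies each, which is where the O(N log N) cost comes from.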
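The discrete cosine transform (DCT) of a sequence x(n), n = 0, ..., N-1, and its inverse are:
$$C(k) = \alpha(k)\sum_{n=0}^{N-1} x(n)\cos\left[\frac{\pi(2n+1)k}{2N}\right], \quad k = 0, 1, 2, \dots, N-1 \qquad (2.34)$$
$$x(n) = \sum_{k=0}^{N-1} \alpha(k)\,C(k)\cos\left[\frac{\pi(2n+1)k}{2N}\right], \quad n = 0, 1, 2, \dots, N-1 \qquad (2.35)$$
with $\alpha(0) = \sqrt{1/N}$ and $\alpha(k) = \sqrt{2/N}$ for $k \ge 1$.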
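A minimal sketch of the DCT pair, assuming the orthonormal scaling α(0) = √(1/N), α(k) = √(2/N) for k ≥ 1 (this normalization is an assumption; other scalings differ only by constant factors). The round trip checks that the two transforms invert each other.

```python
import math

def alpha(k, N):
    """Orthonormal DCT scaling factor (assumed normalization)."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def dct2(x):
    """C(k) = alpha(k) sum_n x(n) cos[pi (2n + 1) k / (2N)]."""
    N = len(x)
    return [alpha(k, N) * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                              for n in range(N))
            for k in range(N)]

def idct2(C):
    """x(n) = sum_k alpha(k) C(k) cos[pi (2n + 1) k / (2N)]."""
    N = len(C)
    return [sum(alpha(k, N) * C[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N))
            for n in range(N)]

x = [4.0, 2.0, -1.0, 3.0, 0.5]
print(all(abs(a - b) < 1e-12 for a, b in zip(idct2(dct2(x)), x)))   # True
print(dct2([1.0] * 4)[0])   # 2.0: a constant signal puts all its energy in C(0)
```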
2.1.4. Digital filters and windows:
A digital filter is a digital system that reshapes the frequency distribution of a signal's components according to given specifications; digital filtering is the processing operation that performs this reshaping.
This part describes the basic principles of digital filter design and studies finite impulse response (FIR) filters and infinite impulse response (IIR) filters. FIR filters can be designed to have exactly linear phase, whereas IIR filters in general cannot.
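2.1.4.1. Ideal low-pass filter:
$$H(e^{j\omega}) = \begin{cases} 1, & |\omega| < \omega_0 \\ 0, & \omega_0 < |\omega| < \pi \end{cases} \qquad (2.36)$$
Its impulse response follows from the inverse Fourier transform:
$$h(n) = \frac{1}{2\pi}\int_{-\omega_0}^{\omega_0} e^{j\omega n}\,d\omega = \frac{\sin(\omega_0 n)}{\pi n} = \frac{\omega_0}{\pi}\,\mathrm{sinc}(\omega_0 n) \qquad (2.37)$$
where
$$\mathrm{sinc}(x) = \frac{\sin x}{x} \qquad (2.38)$$
Because h(n) is infinitely long and non-causal, it is truncated with a window, the simplest being the rectangular window:
$$w_R(n) = \begin{cases} 1, & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases} \qquad (2.39)$$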
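The frequency response of the rectangular window is:
$$W_R(e^{j\omega}) = \sum_{n=0}^{N-1} e^{-j\omega n} = \frac{1 - e^{-j\omega N}}{1 - e^{-j\omega}} = e^{-j\omega(N-1)/2}\,\frac{\sin(\omega N/2)}{\sin(\omega/2)} = e^{-j\omega(N-1)/2}A_R(e^{j\omega}) \qquad (2.40)$$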
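A window is characterized by two parameters:
- The main-lobe width Δ.
- The ratio λ between the amplitude of the first side lobe and that of the main lobe:
$$\lambda = 20\lg\frac{|A(e^{j\omega_s})|}{|A(e^{j0})|}\ \text{[dB]}$$
where ω_s is the frequency of the first side-lobe peak.
A family of raised-cosine windows is defined by:
$$w(n) = \begin{cases} \alpha - (1-\alpha)\cos\dfrac{2\pi n}{N-1}, & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases} \qquad (2.41)$$
+ α = 0.5: Hanning window
$$w(n) = \begin{cases} 0.5 - 0.5\cos\dfrac{2\pi n}{N-1}, & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases} \qquad (2.42)$$
+ α = 0.54: Hamming window
$$w(n) = \begin{cases} 0.54 - 0.46\cos\dfrac{2\pi n}{N-1}, & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases} \qquad (2.43)$$
Parameters of the Hanning window:
+ Δ_Han = 8π/N
+ λ_Han ≈ -32 dB
Parameters of the Hamming window:
+ Δ_Ham = 8π/N
+ λ_Ham ≈ -43 dB
Thus Δ_T = Δ_Han = Δ_Ham = 8π/N and λ_T > λ_Han > λ_Ham: the three windows have the same main-lobe width, but the ripple in the passband and stopband is smallest when the filter is designed with the Hamming window.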
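To see the trade-off numerically, the sketch below builds the raised-cosine window of Eq. (2.41) and measures its highest side lobe from a zero-padded FFT. The helper names, N = 64 and the FFT size are illustrative choices; α = 1 gives the rectangular window. Expect roughly -13 dB (rectangular), -31 to -32 dB (Hanning) and about -43 dB (Hamming).

```python
import numpy as np

def raised_cosine(N, alpha):
    """w(n) = alpha - (1 - alpha) cos(2 pi n / (N - 1)), 0 <= n <= N - 1 (Eq. 2.41)."""
    n = np.arange(N)
    return alpha - (1 - alpha) * np.cos(2 * np.pi * n / (N - 1))

def peak_sidelobe_db(w, nfft=1 << 14):
    """Highest side-lobe level of w, in dB relative to the main-lobe peak at omega = 0."""
    mag = np.abs(np.fft.rfft(w, nfft))
    k = 1
    while k < len(mag) - 1 and mag[k + 1] < mag[k]:   # walk down the main lobe
        k += 1                                        # k stops at the first null
    return 20 * np.log10(mag[k:].max() / mag[0])

N = 64
for name, alpha in [("Rectangular", 1.0), ("Hanning", 0.5), ("Hamming", 0.54)]:
    print(name, round(float(peak_sidelobe_db(raised_cosine(N, alpha))), 1))
```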
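2.1.4.3. FIR and IIR filters:
Systems whose impulse response has finite length are called FIR systems:
$$h(n) = 0 \quad \text{for } -\infty < n < N_1 \text{ and } N_2 < n < \infty \qquad (2.44)$$
Suppose the FIR system is:
$$h(n) = \begin{cases} b_n, & 0 \le n \le M \\ 0, & \text{otherwise} \end{cases} \qquad (2.45)$$
Its output is then:
$$y(n) = \sum_{r=0}^{M} b_r\,x(n-r) \qquad (2.46)$$
An IIR system is described by a difference equation that also feeds back past outputs:
$$\sum_{k=0}^{N} a_k\,y(n-k) = \sum_{r=0}^{M} b_r\,x(n-r), \qquad a_0 = 1 \qquad (2.47)$$
$$y(n) = -\sum_{k=1}^{N} a_k\,y(n-k) + \sum_{r=0}^{M} b_r\,x(n-r) \qquad (2.48)$$
with the transfer function:
$$H(z) = \frac{Y(z)}{X(z)} = \frac{\sum_{r=0}^{M} b_r z^{-r}}{1 + \sum_{k=1}^{N} a_k z^{-k}} \qquad (2.49)$$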
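The difference equation (2.48) can be run directly. A minimal sketch, assuming the coefficient layout b = [b_0, ..., b_M] and a = [a_1, ..., a_N] with a_0 = 1 implicit (names chosen for illustration). With an empty a, the same function behaves as the FIR filter (2.46).

```python
def iir_filter(b, a, x):
    """Difference equation for H(z) = B(z) / (1 + sum_{k>=1} a_k z^-k):

        y(n) = sum_r b_r x(n - r) - sum_k a_k y(n - k)

    b = [b_0, ..., b_M]; a = [a_1, ..., a_N] (a_0 = 1 is implicit).
    Samples before n = 0 are taken as zero.
    """
    y = []
    for n in range(len(x)):
        acc = sum(b[r] * x[n - r] for r in range(len(b)) if n - r >= 0)
        acc -= sum(a[k - 1] * y[n - k] for k in range(1, len(a) + 1) if n - k >= 0)
        y.append(acc)
    return y

impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
# IIR: y(n) = x(n) + 0.5 y(n-1), i.e. a_1 = -0.5; the impulse response 0.5^n never ends
print(iir_filter([1.0], [-0.5], impulse))     # [1.0, 0.5, 0.25, 0.125, 0.0625]
# FIR: 3-tap moving average; the impulse response is zero after three samples
print(iir_filter([1 / 3, 1 / 3, 1 / 3], [], impulse))
```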
$$P(A|B) = \frac{P(AB)}{P(B)} \qquad (2.50)$$
Bayes' rule:
$$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)} \qquad (2.51)$$
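For a random variable X with cumulative distribution function F(x) and probability density function f(x):
$$F(x) = P(X \le x) \qquad (2.53)$$
$$P(a < X \le b) = F(b) - F(a) = \int_a^b f(x)\,dx \qquad (2.54)$$
$$f(x) = \frac{dF(x)}{dx} \qquad (2.55)$$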
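The short-time Fourier transform of x(n) with analysis window w(n) is:
$$X(n, \omega) = \sum_{m=-\infty}^{\infty} x(m)\,w(n-m)\,e^{-j\omega m} \qquad (2.57)$$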
Figure 2.15. Mapping from log-energy values (x-axis) to grey levels (y-axis)
2.3.1. MFCC (Mel-scale Frequency Cepstral Coefficients)
Pre-emphasis filter:
$$H(z) = \frac{Y(z)}{X(z)} = 1 - \alpha z^{-1} \qquad (2.64)$$
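2.3.1.2. Windowing:
First, the speech signal x(n) is divided into frames that partly overlap one another, giving T frames x_t(n). Each frame is then windowed by multiplying it by a window function w(n) (0 ≤ n ≤ N-1, where N is the number of samples in a frame); the windowed frame is:
$$\tilde{x}_t(n) = x_t(n)\,w(n)$$
The window most often used is the Hamming window:
$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \quad n = 0, \dots, N-1 \qquad (2.65)$$
Each windowed frame is then transformed with the N-point DFT, computed with the FFT:
$$X_t(k) = \sum_{n=0}^{N-1} \tilde{x}_t(n)\,e^{-j2\pi kn/N}, \quad k = 0, \dots, N-1 \qquad (2.66)$$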
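A minimal NumPy sketch of the framing and windowing steps, with a pre-emphasis stage in front and the power spectrum after the FFT. The frame length (25 ms), frame shift (10 ms), pre-emphasis coefficient (0.97) and FFT size (512) are common illustrative defaults, not values stated in the thesis; the signal is assumed to be at least one frame long.

```python
import numpy as np

def frames_power_spectrum(x, fs=16000, frame_ms=25, shift_ms=10, alpha=0.97, nfft=512):
    """Pre-emphasis, overlapping frames, Hamming window and |FFT|^2 per frame.

    frame_ms, shift_ms, alpha and nfft are illustrative defaults.
    """
    x = np.asarray(x, dtype=float)
    x = np.append(x[0], x[1:] - alpha * x[:-1])          # y(n) = x(n) - alpha x(n-1)
    N = int(fs * frame_ms / 1000)                         # samples per frame
    step = int(fs * shift_ms / 1000)                      # frame shift (< N, so frames overlap)
    T = 1 + (len(x) - N) // step                          # number of frames
    n = np.arange(N)
    w = 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))     # Hamming window, Eq. (2.65)
    frames = np.stack([x[t * step: t * step + N] for t in range(T)])
    windowed = frames * w                                 # windowed frames x_t(n) w(n)
    spec = np.fft.rfft(windowed, nfft)                    # X_t(k), k = 0 .. nfft/2
    return np.abs(spec) ** 2 / nfft                       # power spectrum of each frame

fs = 16000
t = np.arange(fs) / fs                                    # one second of a 440 Hz tone
P = frames_power_spectrum(np.sin(2 * np.pi * 440 * t), fs)
print(P.shape)                                            # (98, 257)
print(np.argmax(P[10]) * fs / 512)                        # 437.5 (bin spacing 31.25 Hz)
```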
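The mel scale is related to the frequency f in Hz by:
$$\mathrm{mel}(f) = 2595\log_{10}\left(1 + \frac{f}{700}\right) \qquad (2.67)$$
(2.68)
where:
f_m is the center frequency of the m-th filter,
f_{m-1} is the center frequency of the (m-1)-th filter,
Δf_m is the bandwidth of the m-th filter.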
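The mel warping (2.67) and a triangular filter bank can be sketched as follows, starting from a power spectrum such as the one produced by the framing step. The number of filters (26), the number of cepstral coefficients (13) and the triangular filter shape with edges at neighbouring center frequencies are typical choices assumed here for illustration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)

def mel_filterbank(M=26, nfft=512, fs=16000):
    """M triangular filters whose center frequencies are equally spaced in mel."""
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2), M + 2))   # f_0 .. f_{M+1}
    bins = np.floor((nfft + 1) * edges / fs).astype(int)
    H = np.zeros((M, nfft // 2 + 1))
    for m in range(1, M + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, c):
            H[m - 1, k] = (k - lo) / max(c - lo, 1)                  # rising edge
        for k in range(c, hi):
            H[m - 1, k] = (hi - k) / max(hi - c, 1)                  # falling edge
    return H

def mfcc_from_power(P, n_ceps=13, **fb_args):
    """Log mel filter-bank energies followed by a DCT-II over the M filters."""
    H = mel_filterbank(**fb_args)
    S = np.log(np.maximum(P @ H.T, 1e-10))                           # (frames, M)
    M = H.shape[0]
    m = np.arange(1, M + 1)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), m - 0.5) / M) # (n_ceps, M)
    return S @ basis.T                                               # (frames, n_ceps)

print(round(float(hz_to_mel(1000)), 1))                   # about 1000 mel at 1 kHz
print(mfcc_from_power(np.ones((5, 257))).shape)           # (5, 13)
```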
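$$s(n) = \sum_{k=1}^{p} a_k\,s(n-k) + G\,u(n) \qquad (2.72)$$
In the z-domain:
$$S(z) = \sum_{k=1}^{p} a_k z^{-k}S(z) + G\,U(z) \qquad (2.73)$$
which leads to the transfer function:
$$H(z) = \frac{S(z)}{G\,U(z)} = \frac{1}{1 - \sum_{k=1}^{p} a_k z^{-k}} = \frac{1}{A(z)} \qquad (2.74)$$
The linear predictor estimates s(n) from the p previous samples:
$$\tilde{s}(n) = \sum_{k=1}^{p} a_k\,s(n-k)$$
and the prediction error e(n) is then defined as:
$$e(n) = s(n) - \tilde{s}(n) = s(n) - \sum_{k=1}^{p} a_k\,s(n-k) = G\,u(n) \qquad (2.75)$$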
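The autocorrelation of the windowed frame x̃(n) is computed as:
$$r(m) = \sum_{n=0}^{N-1-m} \tilde{x}(n)\,\tilde{x}(n+m), \quad m = 0, 1, \dots, p \qquad (2.76)$$
The LPC coefficients are converted into cepstral coefficients:
$$c_m = a_m + \sum_{k=1}^{m-1}\frac{k}{m}\,c_k\,a_{m-k}, \quad 1 \le m \le p$$
$$c_m = \sum_{k=m-p}^{m-1}\frac{k}{m}\,c_k\,a_{m-k}, \quad p < m \le Q$$
which are then weighted by the lifter:
$$w_m = 1 + \frac{Q}{2}\sin\left(\frac{\pi m}{Q}\right), \quad 1 \le m \le Q \qquad (2.77)$$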
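The autocorrelation (2.76), the predictor coefficients of (2.72) and the cepstral recursion with the lifter (2.77) can be chained as in the sketch below. Solving for the a_k with the Levinson-Durbin recursion is a standard choice assumed here, not a step the text names. The test signal is a synthetic AR(2) process with a_1 = 1.3 and a_2 = -0.4, so the estimate should land close to those values.

```python
import numpy as np

def autocorr(x, p):
    """r(m) = sum_{n=0}^{N-1-m} x(n) x(n+m), m = 0..p (Eq. 2.76)."""
    N = len(x)
    return np.array([np.dot(x[:N - m], x[m:]) for m in range(p + 1)])

def levinson_durbin(r, p):
    """Predictor a_1..a_p for s(n) ~ sum_k a_k s(n-k), plus the final error energy."""
    a = np.zeros(p + 1)                                   # a[0] is unused
    E = r[0]
    for i in range(1, p + 1):
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / E    # reflection coefficient
        a_new = a.copy()
        a_new[i] = k
        a_new[1:i] = a[1:i] - k * a[i - 1:0:-1]
        a, E = a_new, E * (1.0 - k * k)
    return a[1:], E

def lpc_to_cepstrum(a, Q, lift=True):
    """Cepstral recursion from the LPC coefficients, optionally liftered (Eq. 2.77)."""
    p = len(a)
    a = np.concatenate(([0.0], a))                        # 1-based indexing: a[1..p]
    c = np.zeros(Q + 1)
    for m in range(1, Q + 1):
        acc = a[m] if m <= p else 0.0
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k] * a[m - k]
        c[m] = acc
    c = c[1:]
    if lift:
        m = np.arange(1, Q + 1)
        c = c * (1.0 + (Q / 2.0) * np.sin(np.pi * m / Q))
    return c

# Synthetic AR(2) signal: x(n) = 1.3 x(n-1) - 0.4 x(n-2) + e(n)
rng = np.random.default_rng(0)
e = rng.standard_normal(20000)
x = np.zeros_like(e)
for n in range(2, len(x)):
    x[n] = 1.3 * x[n - 1] - 0.4 * x[n - 2] + e[n]

a, E = levinson_durbin(autocorr(x, 2), 2)
print(a.round(2))                                         # close to [ 1.3 -0.4]
print(lpc_to_cepstrum(a, 12).shape)                       # (12,)
```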
CHAPTER 3:
In the figure, a_ij is the probability of moving from state i to state j, with:
$$a_{ij} \ge 0, \quad 1 \le i, j \le N \qquad (3.1)$$
$$\sum_{j=1}^{N} a_{ij} = 1, \quad 1 \le i \le N \qquad (3.2)$$
Example 1: Coin tossing.
- The probability of the sequence {rain, rain, rain, clouds, sun, clouds, rain} under the Markov model above is:
Observation {r, r, r, c, s, c, r}
Time {1, 2, 3, 4, 5, 6, 7} (days)
P = 0.5*0.7*0.7*0.25*0.1*0.7*0.4
= 0.001715
- The probability of the sequence {sun, sun, sun, rain, clouds, sun, sun} under the Markov model above (S1 = rain, S2 = clouds, S3 = sun) is:
Observation {s, s, s, r, c, s, s}
Time {1, 2, 3, 4, 5, 6, 7} (days)
P = P[S3]P[S3|S3]P[S3|S3]P[S1|S3]P[S2|S1]P[S3|S2]P[S3|S3]
= 0.1*0.1*0.1*0.2*0.25*0.1*0.1
= 5.0*10^-7
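The two weather calculations can be reproduced with a few lines of Python. The dictionaries hold only the initial and transition probabilities that appear in the products above (r = rain, c = clouds, s = sun); the remaining entries of the transition matrix are not given in this excerpt, so they are left out.

```python
# Only the probabilities used by the two worked examples (keys: (previous, next)).
initial = {"r": 0.5, "s": 0.1}
trans = {("r", "r"): 0.7, ("r", "c"): 0.25, ("c", "s"): 0.1, ("s", "c"): 0.7,
         ("c", "r"): 0.4, ("s", "s"): 0.1, ("s", "r"): 0.2}

def sequence_probability(seq):
    """P(q_1 .. q_T) = P(q_1) * prod_t P(q_t | q_{t-1}) for a first-order Markov chain."""
    p = initial[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= trans[(prev, cur)]
    return p

print(sequence_probability("rrrcscr"))   # ~0.001715
print(sequence_probability("sssrcss"))   # ~5e-07
```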
This model gives:
0.8*0.3*0.4*0.6*0.3*0.3 ≈ 5.2*10^-3
0.4*0.8*0.3*0.3*0.2*0.4*0.5*0.6*0.4*0.3*0.7*0.3 ≈ 1.74*10^-5
0.4*0.8*0.6*0.8*0.3*0.4*0.5*0.6*0.7*0.4*0.4*0.6 ≈ 3.71*10^-4
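Step 1: Initialization
$$\alpha_1(i) = \pi_i\,b_i(X_1), \quad 1 \le i \le N$$
Step 2: Induction
$$\alpha_t(j) = \left[\sum_{i=1}^{N}\alpha_{t-1}(i)\,a_{ij}\right] b_j(X_t), \quad 2 \le t \le T;\ 1 \le j \le N$$
Step 3: Termination
$$P(X|\Phi) = \sum_{i=1}^{N}\alpha_T(i)$$
or, if the model is required to end in the final state $s_F$: $P(X|\Phi) = \alpha_T(s_F)$.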
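The complexity of the forward algorithm is easily seen to be O(N²T), far better than the exponential cost of enumerating every state sequence. This is because each partial probability α_t(i) is computed once and reused for all paths that pass through it.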
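The three steps map directly onto array operations. The sketch below uses a small hypothetical two-state, two-symbol HMM (not the Dow Jones model of Figure 3.3) and checks the O(N²T) forward result against brute-force summation over all N^T state sequences.

```python
import itertools
import numpy as np

# Hypothetical HMM, for illustration only: 2 states, 2 output symbols.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],      # B[i, k] = b_i(k) = P(symbol k | state i)
              [0.2, 0.8]])

def forward(obs, pi, A, B):
    """Return the alpha trellis and P(X | model)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # step 1: initialization
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]  # step 2: induction, O(N^2) per frame
    return alpha, alpha[-1].sum()                     # step 3: termination

def brute_force(obs, pi, A, B):
    """Sum P(X, S) over all N^T state sequences (exponential cost)."""
    total = 0.0
    for S in itertools.product(range(len(pi)), repeat=len(obs)):
        p = pi[S[0]] * B[S[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[S[t - 1], S[t]] * B[S[t], obs[t]]
        total += p
    return total

obs = [0, 1, 1, 0]
print(np.isclose(forward(obs, pi, A, B)[1], brute_force(obs, pi, A, B)))   # True
```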
Figure 3.3. Forward trellis computation for the Dow Jones Industrial HMM
… the foundation of the search process in continuous speech recognition. Since the state sequence is hidden (not observed) in the HMM framework, the most widely used criterion is to find the state sequence that has the highest probability of being taken while generating the observation sequence. In other words, we look for the state sequence S = (s1, s2, …, sT) that maximizes P(S, X|Φ). This problem is very similar to the optimal-path problem in dynamic programming. Consequently, a formal technique based on dynamic programming, called the Viterbi algorithm, can be used to find the best state sequence for an HMM. In practice, the same method can also be used to evaluate HMMs, giving an approximation close to the result of the forward algorithm described above.
The Viterbi algorithm can be viewed as dynamic programming applied to HMMs, or as a modified forward algorithm. Instead of summing the probabilities of the different paths that reach the destination state, the Viterbi algorithm keeps and remembers only the best path. The best-path probability is defined as:
$$V_t(i) = P(X_1^t, S_1^{t-1}, s_t = i\,|\,\Phi) \qquad (3.4)$$
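Step 1: Initialization
$$V_1(i) = \pi_i\,b_i(X_1), \quad B_1(i) = 0, \quad 1 \le i \le N$$
Step 2: Induction
$$V_t(j) = \max_{1 \le i \le N}\left[V_{t-1}(i)\,a_{ij}\right] b_j(X_t), \quad 2 \le t \le T;\ 1 \le j \le N$$
$$B_t(j) = \arg\max_{1 \le i \le N}\left[V_{t-1}(i)\,a_{ij}\right], \quad 2 \le t \le T;\ 1 \le j \le N$$
Step 3: Termination
$$\text{Best score} = \max_{1 \le i \le N}\left[V_T(i)\right]$$
$$s_T^* = \arg\max_{1 \le i \le N}\left[V_T(i)\right]$$
Step 4: Backtracking
$$s_t^* = B_{t+1}(s_{t+1}^*), \quad t = T-1, T-2, \dots, 1$$
S* = (s1*, s2*, …, sT*) is the best state sequence.
The complexity of the Viterbi algorithm is O(N²T).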
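A matching sketch of the Viterbi recursion with backtracking, again on a small hypothetical two-state HMM; a brute-force search over all state sequences confirms that the returned path maximizes P(S, X|Φ).

```python
import itertools
import numpy as np

pi = np.array([0.6, 0.4])                       # hypothetical 2-state, 2-symbol HMM
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])

def viterbi(obs, pi, A, B):
    """Return the best state sequence S* and max_S P(S, X | model)."""
    T, N = len(obs), len(pi)
    V = np.zeros((T, N))
    back = np.zeros((T, N), dtype=int)          # back[t, j] = B_t(j)
    V[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = V[t - 1][:, None] * A          # scores[i, j] = V_{t-1}(i) a_ij
        back[t] = scores.argmax(axis=0)
        V[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(V[-1].argmax())]                # s_T*
    for t in range(T - 1, 0, -1):               # s_t* = B_{t+1}(s_{t+1}*)
        path.append(int(back[t, path[-1]]))
    return path[::-1], V[-1].max()

def brute_force_best(obs, pi, A, B):
    """Enumerate all N^T state sequences and return the most probable one."""
    def joint(S):
        p = pi[S[0]] * B[S[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[S[t - 1], S[t]] * B[S[t], obs[t]]
        return p
    best = max(itertools.product(range(len(pi)), repeat=len(obs)), key=joint)
    return list(best), joint(best)

print(viterbi([0, 1, 1, 0], pi, A, B)[0])       # [0, 1, 1, 0]
```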
Figure 3.4. Viterbi trellis computation for the Dow Jones Industrial HMM
$$\beta_t(i) = P(X_{t+1}^{T}\,|\,s_t = i, \Phi) \qquad (3.5)$$
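… given, with the HMM in state i at time t, β_t(i) can then be computed inductively:
Initialization:
$$\beta_T(i) = 1/N, \quad 1 \le i \le N$$
Induction:
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\,b_j(X_{t+1})\,\beta_{t+1}(j), \quad t = T-1, \dots, 1;\ 1 \le i \le N$$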
… in the figure below. α is computed recursively from left to right and β recursively from right to left.
Figure 3.5. Relationship between t-1 and t, and between t and t+1, in the forward-backward algorithm
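$$\gamma_t(i,j) = \frac{\alpha_{t-1}(i)\,a_{ij}\,b_j(X_t)\,\beta_t(j)}{\sum_{k=1}^{N}\alpha_T(k)}$$
$$Q(\Phi, \hat{\Phi}) = \sum_{S}\frac{P(X, S|\Phi)}{P(X|\Phi)}\log P(X, S|\hat{\Phi})$$
where:
$$P(X, S|\Phi) = \prod_{t=1}^{T} a_{s_{t-1}s_t}\,b_{s_t}(X_t)$$
$$\log P(X, S|\Phi) = \sum_{t=1}^{T}\log a_{s_{t-1}s_t} + \sum_{t=1}^{T}\log b_{s_t}(X_t)$$
and $a_{s_0 s_1}$ denotes the initial probability $\pi_{s_1}$.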
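Figure 3.6. Operations required to compute γt(i, j)
$$\hat{a}_{ij} = \frac{\sum_{t=1}^{T}\gamma_t(i,j)}{\sum_{t=1}^{T}\sum_{k=1}^{N}\gamma_t(i,k)} \qquad (3.6)$$
$$\hat{b}_j(k) = \frac{\sum_{t:\,X_t = o_k}\sum_{i}\gamma_t(i,j)}{\sum_{t=1}^{T}\sum_{i}\gamma_t(i,j)} \qquad (3.7)$$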
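Putting the forward and backward passes together gives the quantities used for re-estimation. This sketch uses a hypothetical two-state HMM and initializes β_T(i) = 1 rather than the 1/N used above (a constant factor that only rescales every β); it normalizes by P(X|Φ) and applies (3.6) and the state-occupancy form of (3.7). One iteration is shown; Baum-Welch repeats it until convergence.

```python
import numpy as np

pi = np.array([0.6, 0.4])                      # hypothetical 2-state, 2-symbol HMM
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
obs = [0, 1, 1, 0, 0, 1]

T, N = len(obs), len(pi)
alpha = np.zeros((T, N))
beta = np.zeros((T, N))
alpha[0] = pi * B[:, obs[0]]
for t in range(1, T):                                   # forward pass
    alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
beta[-1] = 1.0                                          # constant initialization of beta_T
for t in range(T - 2, -1, -1):                          # backward pass
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
P = alpha[-1].sum()                                     # P(X | model)

# gamma_t(i, j) = alpha_{t-1}(i) a_ij b_j(X_t) beta_t(j) / P(X | model), t = 2..T
gamma = np.array([alpha[t - 1][:, None] * A * (B[:, obs[t]] * beta[t])[None, :] / P
                  for t in range(1, T)])
occ = alpha * beta / P                                  # P(s_t = j | X, model)

a_hat = gamma.sum(axis=0) / gamma.sum(axis=(0, 2))[:, None]          # Eq. (3.6)
b_hat = np.stack([occ[np.array(obs) == k].sum(axis=0) for k in range(B.shape[1])],
                 axis=1) / occ.sum(axis=0)[:, None]                  # Eq. (3.7)
print(a_hat.round(3))
print(b_hat.round(3))
```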
… recognition accuracy with speaker-dependent training. However, we may not have data for a particular speaker, so we may want to use a speaker-independent model, which is more general but less accurate than a well-trained speaker-dependent model. An effective way to achieve robustness is to combine both models with a technique called deleted interpolation, in which the interpolation weights are estimated by cross-validation on the data. The objective is to maximize the probability that the model generates the data.
Now suppose we want to interpolate two sets of models [P_A(x) and P_B(x), which can be either discrete probability distributions or continuous density functions] to form an interpolated model P_DI(x). The interpolation procedure can be written as:
$$P_{DI}(x) = \lambda P_A(x) + (1-\lambda)P_B(x)$$
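The weight λ is usually estimated on held-out data with the EM update λ ← (1/T) Σ_t λP_A(x_t) / P_DI(x_t). A minimal sketch, assuming the held-out probabilities of both models are already computed (the numbers are hypothetical, and the outer loop that rotates which block of data is held out is omitted). The held-out log-likelihood never decreases from one iteration to the next.

```python
import numpy as np

def estimate_lambda(pa, pb, iters=50, lam=0.5):
    """EM estimate of lambda in P_DI(x) = lambda P_A(x) + (1 - lambda) P_B(x).

    pa[t], pb[t]: probabilities that models A and B assign to held-out sample x_t.
    """
    pa, pb = np.asarray(pa, float), np.asarray(pb, float)
    history = []
    for _ in range(iters):
        mix = lam * pa + (1 - lam) * pb
        history.append(np.log(mix).sum())       # held-out log-likelihood
        lam = np.mean(lam * pa / mix)           # average posterior share of model A
    return lam, history

# Hypothetical held-out probabilities: model A (speaker dependent) is sharper but
# occasionally assigns a very low probability; model B is flatter.
pa = [0.30, 0.25, 0.40, 0.001, 0.35]
pb = [0.10, 0.12, 0.15, 0.08, 0.10]
lam, hist = estimate_lambda(pa, pb)
print(round(lam, 3))
```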
3.1.3.5. Parameter smoothing:
A simple fact of probabilistic modelling is that the more observations there are, the better, because many observations are needed to estimate the model parameters reliably. In reality, however, only a limited amount of training data is available. If the training data are limited, some parameters will be inadequately trained, and classification based on poorly trained models leads to higher recognition error rates. There are several reasonable ways to deal with insufficient training data:
- Increase the size of the training data.
- Reduce the number of free parameters to be re-estimated. This has its own limits, because a certain number of parameters is always needed to model the events.
- Interpolate one set of parameter estimates with another set of estimates for which enough training data exist. Deleted interpolation, described above, can be used effectively. In discrete HMMs, a simple approach is to set a floor on both the transition probabilities and the output probabilities to eliminate possible zero estimates.
- Tie parameters together to reduce the number of free parameters.
For continuous mixture HMMs, particular attention must be paid to the covariance matrices. Several techniques can be used:
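$$\alpha_t(j) = \sum_{\tau}\sum_{i=1}^{N}\alpha_{t-\tau}(i)\,a_{ij}\,d_j(\tau)\prod_{l=1}^{\tau} b_j(X_{t-\tau+l}) \qquad (3.16)$$
The transition from state i to state j depends not only on the transition probability a_ij but also on all the possible durations that may occur in state j. Equation (3.16) shows that when state j is reached from a previous state i, the observations may stay in state j for a period τ with duration density d_j(τ), and each observation emits its own output probability. All possible durations must be considered, which leads to the summation over τ. The assumption that observations are independent yields the product of the output probabilities. Similarly, the backward recursion can be written as:
$$\beta_t(i) = \sum_{\tau, j} a_{ij}\,d_j(\tau)\prod_{l=1}^{\tau} b_j(X_{t+l})\,\beta_{t+\tau}(j) \qquad (3.17)$$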
3.1.4.2. The first-order assumption:
Modelling the duration of each segment by the state alone is not adequate. Another way to alleviate the duration problem is to remove the first-order transition assumption and make the underlying state sequence a second-order Markov chain. The transition probability between two states at time t then depends on the states occupied at times t-1 and t-2. Given a state sequence S = {s1, s2, …, sT}, the probability of the state sequence is computed as:
$$P(S) = \prod_{t=1}^{T} a_{s_{t-2}s_{t-1}s_t} \qquad (3.18)$$
- 22 initial consonants: /b, m, f, v, t, tʰ, d, n, z, ʐ, s, ʂ, c, ɲ, ʈ, l, k, χ, ŋ, ɣ, h, ʔ/
- 1 medial /w/, which lowers the timbre of the syllable.
- 16 nucleus vowels, comprising 13 monophthongs and 3 diphthongs: /i, e, ɛ, ɤ, ɤ̆, a, ă, ɯ, u, o, ɔ, ɛ̆, ɔ̆, ie, ɯɤ, uo/
- 8 finals, comprising 6 consonants /m, n, ŋ, p, t, k/ and 2 semivowels /-w, -j/.
- 6 tones.
The number of phonemes is small, so applying phone models to Vietnamese recognition is a promising approach. The main difficulty for Vietnamese, however, is tone.
Although tone affects the whole syllable, it affects the vowels most. Each vowel can therefore be split into 6 units, one for each of the 6 tones. The total number of units to train is then about 137, far fewer than when training whole syllables.
3.2.2. Evaluating acoustic features:
After feature extraction we have a set of feature vectors X, for example MFCC vectors, as input data. We need to estimate the probability of these acoustic features given the word or phone model W, so that the input can be recognized as the correct word. This probability is called the acoustic probability, P(X|W).
3.2.2.1. Choosing the HMM output distributions:
Discrete, continuous or semi-continuous HMMs can be used. When there is enough training data, parameter tying becomes unnecessary. A continuous model with a large number of mixtures gives the best recognition accuracy, although its computational cost also grows linearly with the number of mixtures. The discrete model, on the other hand, is computationally efficient but has the lowest performance of the three. The semi-continuous model offers a viable trade-off between trainability and robustness.
When either discrete or semi-continuous HMMs are used, using multiple codebooks for a number of features improves performance significantly. Each …
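$$b_j(\mathbf{x}) = \sum_{k=1}^{M} b_j(k)\,f(\mathbf{x}\,|\,o_k) \qquad (3.23)$$
where f(x|o_k) is the Gaussian density of the k-th codeword and b_j(k) is the weight of that codeword in state j.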
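Both continuous and semi-continuous HMMs evaluate the output probability as a weighted sum of Gaussian densities; in the semi-continuous case the Gaussians form a codebook shared by all states and only the weights are state specific. A minimal sketch in the log domain, assuming diagonal covariances (a common simplification, not one stated here) and hypothetical parameter values.

```python
import numpy as np

def log_gauss_diag(x, mu, var):
    """log N(x; mu_k, diag(var_k)) for each of K Gaussians (mu, var: K x D)."""
    return -0.5 * (np.sum(np.log(2 * np.pi * var), axis=1)
                   + np.sum((x - mu) ** 2 / var, axis=1))

def log_output_prob(x, weights, mu, var):
    """log b_j(x) = log sum_k w_jk N(x; mu_k, var_k), computed with log-sum-exp.

    Continuous HMM: mu/var belong to state j. Semi-continuous HMM: mu/var are
    the shared codebook and only `weights` is specific to state j.
    """
    lg = np.log(weights) + log_gauss_diag(np.asarray(x, float), mu, var)
    m = lg.max()
    return m + np.log(np.exp(lg - m).sum())      # stable log of the weighted sum

# Hypothetical shared codebook of two 2-D Gaussians and one state's weights
mu = np.array([[0.0, 0.0], [3.0, 1.0]])
var = np.array([[1.0, 1.0], [0.5, 2.0]])
print(log_output_prob([0.2, -0.1], np.array([0.7, 0.3]), mu, var))
```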
Therefore, the self-loop transition probability of the last state is always …
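Table 3.1. The Chomsky hierarchy and the corresponding machines that accept each language class
Type | Constraints | Automaton
Phrase structure grammar | α → β. This is the most general grammar. | Turing machine
Context-sensitive grammar | Subset of α → β with |α| ≤ |β| | Linear bounded automaton
Context-free grammar | A → β, where A is a single non-terminal | Push-down automaton
Regular grammar | A → w, A → wB | Finite-state automaton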
The language being analyzed is basically a string of terminal symbols, such as …
Figure 3.13. The inside probability is computed recursively as the sum over all derivations
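Inside probability:
$$\mathrm{inside}(j, A_i, k) = P(A_i \Rightarrow w_j w_{j+1} \cdots w_k\,|\,G) \qquad (3.31)$$
that is, the probability that the non-terminal A_i derives the words w_j … w_k of the sentence under grammar G.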
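For a grammar in Chomsky normal form, the inside probabilities can be filled in bottom-up over spans, summing over every rule A → B C and every split point. A minimal sketch with a tiny hypothetical grammar (S → A B, B → B B | b, A → a).

```python
from collections import defaultdict

# Hypothetical PCFG in Chomsky normal form.
binary = {("S", "A", "B"): 1.0, ("B", "B", "B"): 0.5}   # A -> B C rules
lexical = {("A", "a"): 1.0, ("B", "b"): 0.5}             # A -> w rules

def inside_probabilities(words):
    """inside[(j, A, k)] = P(A =>* w_j ... w_k | G), 1-based inclusive spans."""
    n = len(words)
    inside = defaultdict(float)
    for j, w in enumerate(words, start=1):               # spans of length 1
        for (A, word), p in lexical.items():
            if word == w:
                inside[(j, A, j)] += p
    for length in range(2, n + 1):                       # longer spans, bottom up
        for j in range(1, n - length + 2):
            k = j + length - 1
            for (A, B, C), p in binary.items():
                for m in range(j, k):                    # split w_j..w_m | w_{m+1}..w_k
                    inside[(j, A, k)] += p * inside[(j, B, m)] * inside[(m + 1, C, k)]
    return inside

print(inside_probabilities(["a", "b"])[(1, "S", 2)])        # 0.5
print(inside_probabilities(["a", "b", "b"])[(1, "S", 3)])   # 0.125
```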
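… performance … the entropy H(W) of a model, estimated on a test text W of N_W words:
$$H(W) = -\frac{1}{N_W}\log_2 P(W) \qquad (3.35)$$
$$PP(W) = 2^{H(W)} \qquad (3.36)$$
where PP(W) is the perplexity of the model on W.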
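A minimal sketch of the entropy and perplexity computation for a bigram model; the uniform model over a 20-word vocabulary and the token names are hypothetical, chosen because the perplexity of a uniform model must equal the vocabulary size.

```python
import math

def perplexity(words, prob):
    """H(W) = -(1/N_W) log2 P(W) and PP(W) = 2^H(W) for a bigram model.

    prob(prev, w) returns P(w | prev); '<s>' marks the start of the sentence.
    """
    log_p, prev = 0.0, "<s>"
    for w in words:
        log_p += math.log2(prob(prev, w))
        prev = w
    H = -log_p / len(words)
    return H, 2 ** H

def uniform(prev, w):
    """Hypothetical model: every word of a 20-word vocabulary is equally likely."""
    return 1 / 20

H, pp = perplexity(["w1", "w2", "w3"], uniform)
print(round(H, 4), round(pp, 6))   # 4.3219 20.0
```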
… generated by the CMU-Cambridge Statistical Language Modeling Toolkit.
… or the direction of state transitions in the HMMs. Moreover, Sphinx4 allows the number of states in an HMM to vary from one unit to another within the same acoustic model.
Each HMM state can produce a score from an observed feature. The rule for computing the score is implemented by the HMM state itself, which hides its implementation from the rest of the system and even allows different probability density functions to be used in different HMM states. The acoustic model also allows components to be shared at every level: the components that make up an HMM state, such as Gaussian mixtures, transformation matrices and mixture weights, can be shared by any HMM state.
Like the rest of Sphinx4, the acoustic model can be configured with other implementations. Sphinx4 currently provides one HMM state implementation that can load and use acoustic models produced by the Sphinx-3 trainer.
4.2.2.4. Search graph - SearchGraph:
Although Sphinx4 can be implemented in many different ways and the topologies of the search spaces produced by the linguists can vary widely, every search space is represented as a search graph. The search graph is the main data structure used throughout decoding.
It is a directed graph in which each node, called a search state (SearchState), represents either an emitting or a non-emitting state. Emitting states can be scored against incoming acoustic features, whereas non-emitting states are usually used to represent higher-level linguistic constructs, such as words and phones, that cannot be scored directly against the incoming features. The arcs between states represent possible state transitions, and each arc carries a probability giving the likelihood of transitioning along it.
The search graph interface is designed to allow a wide range of implementation choices. In fact, the Linguist places no inherent restrictions on the following:
… in parallel with parallel acoustic models, the Scorer matches the acoustic model to use according to the feature type.
The Scorer keeps all the information related to the state output densities. The SearchManager therefore does not need to know whether scoring is done with continuous, semi-continuous or discrete HMMs. Likewise, the probability density function of each state is encapsulated in the same way. Any heuristic added to the scoring procedure to speed it up can also be implemented locally inside the Scorer. In addition, the Scorer can take advantage of multiple CPUs when they are available.
Sphinx4 provides pluggable SearchManager implementations that support frame-synchronous Viterbi, Bushderby and parallel decoding:
- SimpleBreadthFirstSearchManager: performs a simple frame-synchronous Viterbi search with a Pruner invoked on every frame. The default Pruner manages both absolute and relative beams. The search manager produces Results containing pointers to the active paths at the last frame processed.
- WordPruningBreadthFirstSearchManager: performs a frame-synchronous Viterbi search with a pluggable Pruner invoked on every frame. Instead of managing a single ActiveList, it manages a set of ActiveLists, one for each state type defined by the Linguist.
- BushderbySearchManager: performs a generalized frame-synchronous breadth-first search using the Bushderby algorithm, which classifies on the basis of free energy rather than likelihood.
- ParallelSearchManager: performs a frame-synchronous Viterbi search over multiple feature streams using a factored language HMM approach, as opposed to the coupled HMM approach used by AVCSR. One advantage of the factored search is that it can be much faster and far more compact than a search over a full compound HMM.
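The absolute and relative beams mentioned for SimpleBreadthFirstSearchManager can be illustrated on a list of scored hypotheses. This is a conceptual sketch in Python, not the Sphinx4 Pruner API: the relative beam drops hypotheses whose log score falls more than a margin below the best one, and the absolute beam then keeps at most a fixed number of the survivors. The function name and the scores are hypothetical.

```python
def prune(active, absolute_beam, relative_beam):
    """Beam pruning of one frame's active list.

    active: list of (log_score, state) pairs; higher scores are better.
    relative_beam: keep hypotheses within this log-score margin of the best one.
    absolute_beam: then keep at most this many of the best survivors.
    """
    if not active:
        return []
    best = max(score for score, _ in active)
    survivors = [h for h in active if h[0] >= best - relative_beam]   # relative beam
    survivors.sort(key=lambda h: h[0], reverse=True)
    return survivors[:absolute_beam]                                  # absolute beam

active = [(-10.0, "s1"), (-12.5, "s2"), (-30.0, "s3"), (-11.0, "s4")]
print(prune(active, absolute_beam=2, relative_beam=5.0))   # [(-10.0, 's1'), (-11.0, 's4')]
```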
4.3. SPHINX CONFIGURATION MANAGEMENT:
The Sphinx configuration management system has two main purposes:
Table 4.1. Tags used in the configuration file
Tag | Attributes | Child elements | Description
<config> | none | <component>, <property> |
<component> | name: name of the component; type: type of the component | <property>, <propertylist> | Defines an instance of a component. This tag must always have the name and type attributes.
<property> | name, value | none | Defines a single property of a component, or a global system property.
<propertylist> | name: name of the property list | <item> | Defines a list of strings or components. This tag must always have the name attribute. It may contain any number of child items.
<item> | none | none | The content of this tag is a string or the name of a component.
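To make Table 4.1 concrete, the sketch below parses a small configuration written with these tags using Python's standard xml.etree.ElementTree. The component names and property values are illustrative; the two class names are Sphinx4 front-end classes, but the fragment is not a complete recognizer configuration (for example, the microphone component it lists is not defined).

```python
import xml.etree.ElementTree as ET

CONFIG = """
<config>
    <property name="logLevel" value="WARNING"/>
    <component name="frontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
        <propertylist name="pipeline">
            <item>microphone</item>
            <item>preemphasizer</item>
        </propertylist>
    </component>
    <component name="preemphasizer" type="edu.cmu.sphinx.frontend.filter.Preemphasizer">
        <property name="factor" value="0.97"/>
    </component>
</config>
"""

def load_config(text):
    """Return (global_properties, {name: (type, properties)}) from the XML text."""
    root = ET.fromstring(text)
    global_props = {p.get("name"): p.get("value") for p in root.findall("property")}
    components = {}
    for comp in root.findall("component"):
        props = {p.get("name"): p.get("value") for p in comp.findall("property")}
        for plist in comp.findall("propertylist"):
            props[plist.get("name")] = [item.text.strip() for item in plist.findall("item")]
        components[comp.get("name")] = (comp.get("type"), props)
    return global_props, components

g, comps = load_config(CONFIG)
print(comps["frontEnd"][1]["pipeline"])   # ['microphone', 'preemphasizer']
```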
… other packages.
5.1.2. Installation:
Create a folder named sphinx in the Home folder (in the Ubuntu virtual machine). Copy the files downloaded in the previous section (Sphinxbase, Sphinxtrain, Pocketsphinx, CMUclmtk) into it and extract them (remove the version number from the folder names after extracting).
… will not be shown on screen; type it carefully and press Enter). This command updates the package lists used by apt-get. Wait for the update to finish.
Type cd sphinx to move into the newly created sphinx folder.
Install the packages required before SphinxBase, and agree to download and install bison when prompted.
5.1.2.1. Installing SphinxBase
Type cd sphinxbase to enter the sphinxbase folder.
Type the following commands and wait for each to finish:
./autogen.sh
./configure
make
5.1.2.2. Installing SphinxTrain
From the sphinxbase folder, change to the sphinxtrain folder:
cd ../sphinxtrain
./configure
make
5.1.2.3. Installing PocketSphinx
From the sphinxtrain folder, change to the pocketsphinx folder:
cd ../pocketsphinx
./autogen.sh
./configure
make
… .vocab, …
This produces the file dkmt.wfreq.
The ARPA (or Doug Paul) format for back-off N-gram models has the following structure:
\data\
ngram 1=n1
ngram 2=n2
...
ngram N=nN
\1-grams:
p w [bow]
...
\2-grams:
p w1 w2 [bow]
...
\N-grams:
p w1 ... wN
...
\end\
The file begins with the keyword \data\, which lists the number of N-grams of each order. The N-grams are then listed one per line, grouped into sections by …
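A minimal reader for the layout above, assuming whitespace-separated fields and the ARPA convention that p is a log10 probability and bow an optional log10 back-off weight. The bigram lookup backs off as log10 P(w2|w1) = bow(w1) + log10 P(w2) when the bigram is absent; the tiny model and its words are hypothetical.

```python
ARPA = r"""
\data\
ngram 1=3
ngram 2=2

\1-grams:
-0.3010 <s> -0.2218
-0.4771 open
-0.4771 </s>

\2-grams:
-0.1761 <s> open
-0.0969 open </s>

\end\
"""

def read_arpa(text):
    """Parse an ARPA model into counts and {order: {ngram: (log10_p, log10_bow)}}."""
    counts, models, order = {}, {}, None
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or line == "\\data\\":
            continue
        if line == "\\end\\":
            break
        if line.startswith("ngram "):
            n, c = line[len("ngram "):].split("=")
            counts[int(n)] = int(c)
        elif line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:line.index("-")])          # "\2-grams:" -> 2
            models[order] = {}
        else:
            parts = line.split()
            words = tuple(parts[1:order + 1])
            bow = float(parts[order + 1]) if len(parts) > order + 1 else 0.0
            models[order][words] = (float(parts[0]), bow)
    return counts, models

def bigram_log10(models, w1, w2):
    """log10 P(w2 | w1), backing off to bow(w1) + log10 P(w2) if the bigram is missing."""
    if (w1, w2) in models[2]:
        return models[2][(w1, w2)][0]
    return models[1][(w1,)][1] + models[1][(w2,)][0]

counts, models = read_arpa(ARPA)
print(counts)                                  # {1: 3, 2: 2}
print(bigram_log10(models, "open", "</s>"))    # -0.0969 (stored bigram)
print(bigram_log10(models, "<s>", "</s>"))     # backed off: -0.2218 + -0.4771
```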
File dkmt_train.fileids
This file lists, one per line, the paths of the recorded audio files in the wav folder, which is part of the folder tree shown above:
speaker_1/file_1
speaker_2/file_2
Do not include the .wav extension. Each line corresponds to one file.
File dkmt_train.transcription
This file contains what was spoken in each recorded wav file. To train Sphinx to understand what we say, we must provide a text file from which Sphinx can learn. The .transcription file consists of many lines, each …
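The two files can be generated from a list of recordings. A minimal sketch, assuming the usual SphinxTrain transcription line layout "<s> text </s> (file_id)", with the file name (without folder or extension) in parentheses; the recordings and their texts are hypothetical examples.

```python
from pathlib import PurePosixPath

# Hypothetical recordings: (path under wav/ without extension, spoken text)
recordings = [("speaker_1/file_1", "MỞ TẬP TIN"),
              ("speaker_2/file_2", "ĐÓNG VĂN BẢN")]

def make_training_lists(recs):
    """Build the contents of <task>_train.fileids and <task>_train.transcription."""
    fileids = "".join(f"{path}\n" for path, _ in recs)
    transcription = "".join(
        f"<s> {text} </s> ({PurePosixPath(path).name})\n" for path, text in recs)
    return fileids, transcription

fileids, transcription = make_training_lists(recordings)
print(fileids, end="")          # speaker_1/file_1 ...
print(transcription, end="")    # <s> MỞ TẬP TIN </s> (file_1) ...
```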
../pocketsphinx/scripts/setup_sphinx.pl -task dkmt
… sphinx_train.cfg. Some important settings:
- Settings for training on audio files in wav format:
$CFG_WAVFILES_DIR = "$CFG_BASE_DIR/wav";
$CFG_WAVFILE_EXTENSION = 'wav';
$CFG_WAVFILE_TYPE = 'mswav';
- Model type (continuous or semi-continuous HMM training): remove the # in front of the model type to be trained:
$CFG_HMM_TYPE = '.cont.'; # Sphinx 4, Pocketsphinx
#$CFG_HMM_TYPE = '.semi.'; # PocketSphinx
#$CFG_HMM_TYPE = '.ptm.'; # PocketSphinx (larger data sets)
- Number of Gaussian densities, which can be 4, 8, 16, 32 or 64 depending on the amount of data:
$CFG_FINAL_NUM_DENSITIES = 8;
- Number of senones trained in the model. The more senones, the more precisely Sphinx distinguishes sounds. On the other hand, with too many senones the model will not generalize well enough to recognize unseen speech, so the word error rate on unseen data rises. That is why …
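Table 5.1. Configuration parameters
Vocabulary size (words) | Hours of training audio | Senones | Densities | Example
20 |  | 200 |  |
100 | 20 | 2000 |  | Command-and-control model
5000 | 30 | 4000 | 16 | Dictation model, 5000 words
20000 | 80 | 4000 | 32 | Dictation model, 20000 words
60000 | 200 | 6000 | 16 | HUB model
60000 | 2000 | 12000 | 64 |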
./scripts_pl/RunAll.pl
This command runs through all the required stages. During training it prints messages such as:
Baum welch starting for 2 Gaussian(s), iteration: 3 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Normalization for iteration: 3
Current Overall Likelihood Per Frame = 30.6558644286942
Convergence Ratio = 0.633864444461992
Baum welch starting for 2 Gaussian(s), iteration: 4 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Normalization for iteration: 4
5.4. EXPERIMENTAL RESULTS:
Operation:
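QUA TRÁI (move left)
TRÁI (left)
DI CHUYỂN CHUỘT VỀ ĐẦU (move the mouse to the start)
VỀ ĐẦU (to the start)
CHUỘT VỀ ĐẦU (mouse to the start)
ĐẦU (start)
BẮT ĐẦU (begin)
DI CHUYỂN CHUỘT VỀ CUỐI (move the mouse to the end)
VỀ CUỐI (to the end)
CHUỘT VỀ CUỐI (mouse to the end)
CUỐI (end)
CUỘN CHUỘT (scroll)
CUỘN CHUỘT LÊN (scroll up)
CUỘN CHUỘT XUỐNG (scroll down)
NHẠC (music)
DỪNG (stop)
NGHE TIẾP (resume playback)
TẮT NHẠC (turn off the music)
CHƯƠNG TRÌNH NGHE NHẠC (music player)
ĐÓNG CHƯƠNG TRÌNH NHẠC (close the music player)
WEB (web)
ĐÓNG WEB (close the web browser)
VĂN BẢN (document)
CHỌN TẬP TIN (select a file)
MỞ TẬP TIN (open a file)
TẬP TIN (file)
HỦY (cancel)
SOẠN THẢO VĂN BẢN (text editor)
ĐÓNG CHƯƠNG TRÌNH SOẠN THẢO (close the text editor)
ĐÓNG SOẠN THẢO (close the editor)
ĐÓNG VĂN BẢN (close the document)
THOÁT (exit)
CON TRỎ (cursor)
TRỎ (cursor)
TRỎ VỀ ĐẦU (cursor to the start)
TRỎ VỀ CUỐI (cursor to the end)
CON TRỎ LÊN (cursor up)
CON TRỎ LÊN TRÊN (cursor up)
CON TRỎ LÊN ĐẦU (cursor to the top)
TRỎ LÊN (cursor up)
TRỎ LÊN TRÊN (cursor up)
TRỎ LÊN TRÊN NỮA (cursor further up)
TRỎ LÊN ĐẦU (cursor to the top)
CON TRỎ XUỐNG (cursor down)
CON TRỎ XUỐNG DƯỚI (cursor down)
CON TRỎ XUỐNG CUỐI (cursor to the bottom)
TRỎ XUỐNG (cursor down)
TRỎ XUỐNG DƯỚI (cursor down)
TRỎ XUỐNG DƯỚI NỮA (cursor further down)
TRỎ XUỐNG CUỐI (cursor to the bottom)
THÊM THẺ (add a tab)
THẺ MỚI (new tab)
ĐÓNG THẺ (close the tab)
THÊM TRANG (add a page)
TRANG MỚI (new page)
ĐÓNG TRANG (close the page)
XUỐNG DÒNG (new line)
ĐẦU DÒNG (start of line)
CUỐI DÒNG (end of line)
DÒNG TRÊN (line above)
DÒNG DƯỚI (line below)
XUỐNG HÀNG (new line)
ĐẦU HÀNG (start of line)
CUỐI HÀNG (end of line)
HÀNG TRÊN (line above)
HÀNG DƯỚI (line below)
TỚI (forward)
LUI (back)
TO (larger)
PHÓNG TO (zoom in)
NHỎ (smaller)
THU NHỎ (zoom out)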
- Total recording time: about 50 hours.
- Test data: 660 utterances.
Test results:
- Correctly recognized sentences: 545/660. Accuracy: 82.58%.
- Correctly recognized words: 1557/1675. Accuracy: 92.96%.
Sentence-level accuracy is therefore not high, only about 82%, whereas word-level accuracy is higher, above 92%.
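The accuracy figures are simple ratios of correct items to total items; the sketch below computes both, rounded to two decimals.

```python
def accuracy(correct, total):
    """Percentage of correctly recognized items, rounded to two decimals."""
    return round(100 * correct / total, 2)

print(accuracy(545, 660))     # 82.58  (sentences)
print(accuracy(1557, 1675))   # 92.96  (words)
```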
CONCLUSION
RESULTS ACHIEVED:
Through studying Vietnamese speech recognition and applying it experimentally to computer control, the thesis has accomplished the following:
- Studied speech, speech processing methods and feature extraction.
- Studied and carried out phoneme-based acoustic model training for Vietnamese.
- Studied the architecture of a speech recognition system through the CMUSphinx tools.
- Built a demo program for continuous Vietnamese speech recognition.
Owing to limited knowledge of digital signal processing and speech processing, the thesis inevitably has many shortcomings. Nevertheless, it is hoped that the results achieved will make a small contribution to research on Vietnamese speech recognition.
FUTURE WORK:
Because the recorded and processed data are not yet varied enough, the results are not yet good. This can be addressed by recording more samples and recruiting more volunteers to record. Speech from radio and the internet could also be used to enrich the training data. In addition, the following could be developed further:
- Study the phonetic characteristics of Vietnamese further and examine spectrograms to find the features that affect tone, improving tone recognition.
- Improve the method for segmenting words in a sentence to obtain better recognition results.
- Study language models and search algorithms in speech recognition further to increase recognition speed.
Record your Speech with Audacity: http://www.voxforge.org/home/submitspeech/windows/step-2
APPENDIX
VIETNAMESE PHONETIC TRANSCRIPTION TABLE IN ASCII CODE
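No. | Spelling | IPA | Example | Description | ASCII
Initial consonants
1 | b |  | ba |  |
2 | đ |  |  | Stop, tongue tip against the alveolar ridge, voiced, unaspirated | dd
3 | t |  | tng | Stop, tongue tip against the teeth, voiceless, unaspirated |
4 | th |  | thch |  | th
5 | tr |  | trng | Stop, tongue tip against the palate, voiceless, unaspirated | tr
6 | ch |  | ch |  | ch
7 | k (before i, e, ê); c (before u, ư, a, o, ...); q (before u) |  | keo; cnh; quy |  |
8 | m |  | mm |  |
9 | n |  | nng | Nasal, tongue tip against the alveolar ridge |
10 | nh |  | nh | Nasal, tongue blade (palatal) | nh
11 | ng (before u, ư, o, ô, ơ, a, ă, â); ngh (before i, e, ê) |  | ng; ngh | Nasal, tongue root (velar) | ng
12 | ph |  |  | Fricative, voiceless, labiodental; occurs in syllables without a medial | ph
13 | v |  | vi | Fricative, voiced, labiodental; occurs in syllables without a medial |
14 | x |  | xa | Fricative, voiceless, tongue tip against the alveolar ridge |
15 | gi; g (before i) |  | gii | Fricative, voiced, tongue tip against the alveolar ridge |
16 | l |  | lm | Lateral, tongue tip against the teeth |
17 | s |  | sn | Fricative, voiceless, tongue tip against the palate, retroflex |
18 | r |  | rm | Fricative, voiced, tongue tip against the palate, retroflex |
19 | kh |  | kh | Fricative, voiceless, tongue root (velar) | kh
20 | g (before u, ư, o, ô, ơ, a, ă, â); gh (before i, e, ê) |  | gm; gh | Fricative, back of the tongue (velar), voiced |
21 | h |  | ha |  |
22 | p |  | pi |  |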
Medial
23 | o (before the open vowels a, ă, e); u (elsewhere) |  | hoa; hy | Structurally like the nucleus vowel /u/: close, very low timbre, rounded, back |
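Nucleus
24 | y (after u); i (elsewhere) |  | suy; tnh |  |
25 | ê |  | ch |  | ee
26 | e |  | ch |  |
27 |  |  | sch | Short monophthong; almost the short form of /ɛ/ | ea
28 | u |  | sung |  |
29 | ô |  |  |  | oo
30 | o |  | con |  |
31 | o (before c, ng) |  | cc | Short monophthong | oa
32 | ư |  |  |  | uw
33 | ơ |  |  |  | ow
34 | â |  |  |  | aa
35 | a |  | tan |  |
36 | ă; a (before u, y) |  | chn; tay |  | aw
37 | ia; ya (when preceded by a medial); iê (when there is no medial and a final follows); yê (when preceded by a medial, or when the final is a semivowel) | ie | bia; khuya; tin; yu |  | ie
38 | ua (when there is no final); uô (when a final follows) | uo | chua; cun |  | uo
39 | ưa (when there is no final); ươ (when there is a final) |  | tra; li |  | wa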
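Finals
40 | p |  | mp | Final consonant, ... | pc
41 | t |  | cht | Final consonant, ... | tc
42 | m |  | cm |  | mz
43 | n |  | nn |  | nz
44 | ch (after i, ê, a); c (elsewhere) |  | sch; cc | Obstruent final consonant, tongue blade | kc
45 | nh (after i, ê, a); ng (elsewhere) |  | vnh; vng |  | ngz
46 | o (after e, a); u (elsewhere) | -w | leo; cu |  | uz
47 | y (after the short vowels ă, â); i (elsewhere) | -j | bay; ci |  | iz

Tone name | Symbol
Ngang (level) |
Sắc (rising) | S
Huyền (falling) | F
Hỏi (dipping-rising) | R
Ngã (broken rising) | X
Nặng (heavy, low glottalized) | J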