ISSN 2278-3091
Volume 6, No. 3, May - June 2017

E. Fedorov et al., International Journal of Advanced Trends in Computer Science and Engineering, 6(3), May - June 2017, 35-39
Available online at http://www.warse.org/IJATCSE/static/pdf/file/ijatcse04632017.pdf

The distribution formation method of reference patterns of vocal speech sounds

E. Fedorov¹, H. Alrababah², A. Nehad³
¹ Professor, Ukraine, fedorovee75@donntu.edu.ua
² Dr., UAE, hamza@uoj.ac.ae
³ Mr., UAE, ali@uoj.ac.ae

ABSTRACT

The development of intelligent computer components is a widespread problem. In this paper, a method for the distributed transformation of training patterns of vocal sounds to a unified amplitude-time window and a method for the distributed clustering of training patterns of vocal sounds are proposed for the distributed formation of reference patterns of vocal speech sounds. These methods make it possible to quickly convert quasi-periodic sections of different lengths to a single amplitude-time window for subsequent comparison, and to determine the optimal number of clusters accurately and quickly, which increases the probability of correct clustering. The proposed methods can be used in speech recognition and synthesis systems.

Keywords: distributed transformation of training patterns of vocal sounds, unified amplitude-time window, distributed clustering of training patterns of vocal sounds.

1. INTRODUCTION

The general formulation of the problem. The development of intelligent software components intended for human speech recognition, speech synthesis, and related tasks, which are used in computer systems for communication, is relevant under current conditions. At the basis of this task lies the problem of building efficient methods that provide high-speed formation of the reference patterns of speech sounds and are used to train descriptive and generative models.

Analysis of research. Existing speech synthesis systems use approaches such as [1-4]: formant synthesis, synthesis based on linear prediction coefficients (LPC synthesis), and concatenative synthesis. Formant synthesis and LPC synthesis are based on a model of human speech production, in which the vocal tract is realized as an adaptive digital filter. For formant synthesis, the parameters of the adaptive digital filter are determined by the formant frequencies [5, 6]; for LPC synthesis, by the LPC coefficients [7]. The best results regarding the intelligibility and naturalness of the sound of speech are obtained by concatenative synthesis, which is carried out by gluing together the necessary sound units [1, 3, 8, 9]. In such systems, signal processing must be applied to bring the pitch frequency, energy, and duration of the sound units to those that should characterize the synthesized speech. In concatenative synthesis systems, three main algorithms are used: TD-PSOLA (scaling of the sound unit in time), FD-PSOLA (scaling of the sound unit in frequency), and LP-PSOLA (scaling of the prediction-error signal in time with subsequent application of the LPC filter coefficients). The disadvantage of concatenative synthesis is the need to store a large number of sound units; in this connection, the problem arises of representing them more economically [10]. Existing speech pattern recognition systems utilize approaches such as logical, metric, Bayesian, connectionist, and generative. Modern methods and models of speech pattern recognition are usually based on hidden Markov models [5], CDP (Composition + Dynamic Programming) [10], and artificial neural networks [11-15], and have the following disadvantages [16, 17]: unsatisfactory probability of recognition; the need for a large amount of training data; long training time; storage of a large number of reference sounds or words, as well as weight coefficients; long recognition time.

Formulation of the research problem. The aim is to develop a method for the distributed formation of reference patterns of vocal speech sounds.

Problem solving and research results. To achieve this aim it is necessary to:
1. Develop a method for the distributed conversion of training patterns of vocal sounds to a unified amplitude-time window.
2. Develop a method for the distributed clustering of training patterns of vocal sounds.
3. Conduct a numerical study of the clustering methods used.

2. METHOD OF DISTRIBUTED CONVERSION OF TRAINING PATTERNS OF VOCAL SOUNDS TO A UNIFIED AMPLITUDE-TIME WINDOW

Let a finite set of training patterns of a vocal sound be given, described by a set of bounded finite integer discrete functions $X = \{x_i \mid i \in \{1,\dots,I\}\}$, where $A_i^{\min}$, $A_i^{\max}$
are the minimum and maximum values of the function $x_i$ on the compact support $\{N_i^{\min},\dots,N_i^{\max}\}$. We introduce the following mean values:

$$N^{av} = \frac{1}{I}\sum_{i=1}^{I}\left(N_i^{\max} - N_i^{\min}\right), \qquad A^{av} = \frac{1}{I}\sum_{i=1}^{I}\left(A_i^{\max} - A_i^{\min}\right).$$

Let $I$ be the initial number of parallel threads, and let the thread number at first correspond to the number of the training pattern of a vocal sound. Then each $i$-th thread, by the transformation described in [18, 19], maps the function $x_i$ into an integer bounded finite discrete function $s_i$, where $s_i$ has the compact support $\{1,\dots,N^{av}+1\}$ and on it the minimal value $0$ and the maximal value $A^{av}$. As a result of the work of all the threads, the family $S = \{s_i\}$ is obtained. Thus, it is possible to quickly convert quasi-periodic signal portions of different lengths to a unified amplitude-time window for subsequent comparison.

3. METHOD OF DISTRIBUTED CLUSTERING OF TRAINING PATTERNS OF VOCAL SOUNDS

Suppose now that the thread number corresponds to the number of clusters into which the family $S$ is partitioned. Then each thread with number $K$ performs a clustering method with the number of clusters $K$. As a result of the work of each $K$-th thread, the pair $(J_K, \{m_{Kk} \mid k \in \{1,\dots,K\}\})$ is obtained, where $J_K$ is the value of the objective function and $m_{Kk}$ is the center of cluster $k$. The best pair among all threads is chosen as

$$K^* = \min\{K \mid J_K \le \varepsilon,\ K \in \{1,\dots,I\}\},$$

where $\varepsilon$ is a given admissible value of the objective function. As a result, the set of reference patterns

$$H = \{h_k \mid h_k = m_{K^*k},\ k \in \{1,\dots,K^*\}\}$$

is created. Thus, it becomes possible to determine the optimal number of clusters accurately and quickly. The iterative clustering methods studied in this article are presented below.

3.1. Clustering method based on the K-means algorithm

1. Initialization:

a) If $K = 1$, the initial partition is $\Phi(S) = \{S_1 \mid S_1 = S\}$ into one cluster, described by the vector of values of the indicator function $[\chi_{S_1}(s_i)]$ with $\chi_{S_1}(s_i) = 1$, $i \in \{1,\dots,I\}$. Iteration number $\tau = \tau^{\max}$, $\Phi^* = \Phi$.

b) If $K = I$, the initial partition is $\Phi(S) = \{S_k \mid S_k \subseteq S\}$ into $I$ clusters, described by the matrix of values of the indicator functions $[\chi_{S_k}(s_i)]$,

$$\chi_{S_k}(s_i) = \begin{cases} 1, & i = k, \\ 0, & i \ne k, \end{cases} \qquad i \in \{1,\dots,I\},\ k \in \{1,\dots,K\}.$$

Iteration number $\tau = \tau^{\max}$, $\Phi^* = \Phi$.

c) If $1 < K < I$, the initial partition $\Phi(S) = \{S_k \mid S_k \subseteq S\}$ into $K$ clusters is given at random and is described by a randomly initialized matrix of values of the indicator functions $[\chi_{S_k}(s_i)]$,

$$\chi_{S_k}(s_i) = \begin{cases} 1, & s_i \in S_k, \\ 0, & s_i \notin S_k, \end{cases} \qquad i \in \{1,\dots,I\},\ k \in \{1,\dots,K\}.$$

Iteration number $\tau = 1$, $\Phi^* = O$. In this case, the matrix must satisfy the following conditions [20]:

$$\sum_{k=1}^{K} \chi_{S_k}(s_i) = 1, \quad i \in \{1,\dots,I\}, \qquad \sum_{i=1}^{I} \chi_{S_k}(s_i) > 0, \quad k \in \{1,\dots,K\}, \qquad \chi_{S_k}(s_i) \in \{0,1\}.$$

2. Calculation of cluster centers:

$$m_{Kk}(l) = \frac{\sum_{i=1}^{I} \chi_{S_k}(s_i)\, s_i(l)}{\sum_{i=1}^{I} \chi_{S_k}(s_i)}, \qquad k \in \{1,\dots,K\},\ l \in \{1,\dots,N^{av}+1\}.$$

3. Distance calculation:

$$d_{ik} = \sum_{l=1}^{N^{av}+1} \left|s_i(l) - m_{Kk}(l)\right|^2, \qquad i \in \{1,\dots,I\},\ k \in \{1,\dots,K\}.$$

4. If $1 < K < I$, the matrix of values of the indicator functions is modified according to the following rule: if $k^* = \arg\min_k d_{ik}$, then $\chi_{S_{k^*}}(s_i) = 1$ and $\chi_{S_k}(s_i) = 0$ for all $k \in \{1,\dots,K\} \setminus \{k^*\}$.
5. Rule of the termination condition: if $\Phi \ne \Phi^*$ and $\tau < \tau^{\max}$, then $\tau = \tau + 1$, $\Phi^* = \Phi$, go to step 2.

6. Calculation of the objective function:

$$k_i^* = \arg\max_k \chi_{S_k}(s_i), \qquad J_K = \max_i \frac{d_{ik_i^*}}{A^{av}\,(N^{av}+1)}.$$

3.2. Clustering method based on the Fuzzy C-means algorithm

1. Initialization:

a) If $K = 1$, the initial partition is $\Phi(S) = \{S_1 \mid S_1 = S\}$ into one cluster, described by the vector of values of the membership function $M = [\mu_{\tilde S_1}(s_i)]$ with $\mu_{\tilde S_1}(s_i) = 1$, $i \in \{1,\dots,I\}$. Iteration number $\tau = \tau^{\max}$, $M^* = M$.

b) If $K = I$, the initial partition is $\Phi(S) = \{S_k \mid S_k \subseteq S\}$ into $I$ clusters, described by the matrix of values of the membership functions $M = [\mu_{\tilde S_k}(s_i)]$,

$$\mu_{\tilde S_k}(s_i) = \begin{cases} 1, & i = k, \\ 0, & i \ne k, \end{cases} \qquad i \in \{1,\dots,I\},\ k \in \{1,\dots,K\}.$$

Iteration number $\tau = \tau^{\max}$, $M^* = M$.

c) If $1 < K < I$, the initial partition $\tilde\Phi(\tilde S) = \{\tilde S_k \mid \tilde S_k \subseteq \tilde S\}$ into $K$ clusters is given at random and is described by a randomly initialized matrix of values of the membership functions $M = [\mu_{\tilde S_k}(s_i)]$, where $\mu_{\tilde S_k}(s_i)$ returns the degree of membership of the object $s_i$ in the cluster $\tilde S_k$, $i \in \{1,\dots,I\}$, $k \in \{1,\dots,K\}$. Iteration number $\tau = 1$, $M^* = O$. In this case, the matrix $M$ must satisfy the following conditions [21]:

$$\sum_{k=1}^{K} \mu_{\tilde S_k}(s_i) = 1, \quad i \in \{1,\dots,I\}, \qquad \sum_{i=1}^{I} \mu_{\tilde S_k}(s_i) > 0, \quad k \in \{1,\dots,K\}, \qquad \mu_{\tilde S_k}(s_i) \in [0,1].$$

The weight of fuzzy clustering $w$ is also set (in this article, $w = 2$).

2. Calculation of cluster centers:

$$m_{Kk}(l) = \frac{\sum_{i=1}^{I} \mu_{\tilde S_k}(s_i)^w\, s_i(l)}{\sum_{i=1}^{I} \mu_{\tilde S_k}(s_i)^w}, \qquad k \in \{1,\dots,K\},\ l \in \{1,\dots,N^{av}+1\}.$$

3. Distance calculation:

$$d_{ik} = \sum_{l=1}^{N^{av}+1} \left|s_i(l) - m_{Kk}(l)\right|^2, \qquad i \in \{1,\dots,I\},\ k \in \{1,\dots,K\}.$$

4. If $1 < K < I$, the matrix of values of the membership functions is modified according to the following rule:

if $d_{ik} \ne 0$, then $\mu_{\tilde S_k}(s_i) = \left( \displaystyle\sum_{l=1}^{K} \left( \frac{d_{ik}}{d_{il}} \right)^{1/(w-1)} \right)^{-1}$;

if $d_{ik} = 0$, then $\mu_{\tilde S_k}(s_i) = 1$ and $\mu_{\tilde S_l}(s_i) = 0$ for all $l \in \{1,\dots,K\} \setminus \{k\}$.

5. Rule of the termination condition: if $M \ne M^*$ and $\tau < \tau^{\max}$, then $\tau = \tau + 1$, $M^* = M$, go to step 2.

6. Calculation of the objective function:

$$k_i^* = \arg\max_k \mu_{\tilde S_k}(s_i), \qquad J_K = \max_i \frac{d_{ik_i^*}}{A^{av}\,(N^{av}+1)}.$$

3.3. Clustering method based on the EM algorithm

1. Initialization:

a) If $K = 1$, the initial partition is $\Phi(S) = \{S_1 \mid S_1 = S\}$ into one cluster, described by the vector of expected values of the hidden variables $G = [g_{i1}]$ with $g_{i1} = 1$, $i \in \{1,\dots,I\}$. Iteration number $\tau = \tau^{\max}$, $G^* = G$.

b) If $K = I$, the initial partition is $\Phi(S) = \{S_k \mid S_k \subseteq S\}$ into $I$ clusters, described by the matrix of expected values of the hidden variables $G = [g_{ik}]$,

$$g_{ik} = \begin{cases} 1, & i = k, \\ 0, & i \ne k, \end{cases} \qquad i \in \{1,\dots,I\},\ k \in \{1,\dots,K\}.$$

Iteration number $\tau = \tau^{\max}$, $G^* = G$.

c) If $1 < K < I$, the initial partition $\Phi(S) = \{S_k \mid S_k \subseteq S\}$ into $K$ clusters is given at random and is described by a randomly initialized matrix of expected values of the hidden variables $G = [g_{ik}]$, $i \in \{1,\dots,I\}$, $k \in \{1,\dots,K\}$. Iteration number $\tau = 1$, $G^* = O$. In this case, the matrix $G$ must satisfy the following conditions:

$$\sum_{k=1}^{K} g_{ik} = 1, \quad i \in \{1,\dots,I\}, \qquad \sum_{i=1}^{I} g_{ik} > 0, \quad k \in \{1,\dots,K\}, \qquad g_{ik} \in [0,1].$$

Each cluster is described by the likelihood function $f(s \mid (m_{Kk}, \sigma^2_{Kk}))$, for which

$$m_{Kk}(j) = \frac{\sum_{i=1}^{I} g_{ik}\, s_i(j)}{\sum_{i=1}^{I} g_{ik}}, \qquad \sigma^2_{Kk}(j) = \frac{\sum_{i=1}^{I} g_{ik}\,\left(s_i(j) - m_{Kk}(j)\right)^2}{\sum_{i=1}^{I} g_{ik}}, \qquad j \in \{1,\dots,N^{av}+1\}.$$

The weighting factor $w_k$ is also set, where $w_k$ corresponds to the a priori probability of the appearance of an object from the $k$-th cluster, i.e. $w_k = P((m_{Kk}, \sigma^2_{Kk}))$; initially $w_k = 1/K$.

2. Calculation of the likelihood functions:

$$f(s_i \mid (m_{Kk}, \sigma^2_{Kk})) = \frac{1}{(2\pi)^{(N^{av}+1)/2} \prod_{j=1}^{N^{av}+1} \sigma_{Kk}(j)} \exp\left( -\frac{1}{2} \sum_{j=1}^{N^{av}+1} \frac{\left(s_i(j) - m_{Kk}(j)\right)^2}{\sigma^2_{Kk}(j)} \right), \qquad k = \overline{1,K}.$$

3. E-step (calculation of the matrix of expected values of the hidden variables). If $1 < K < I$, then $g_{ik}$ is calculated, where the hidden variable $g_{ik}$ corresponds to the a posteriori probability, i.e. $g_{ik} = P((m_{Kk}, \sigma^2_{Kk}) \mid s_i)$:

$$g_{ik} = \frac{w_{Kk}\, f(s_i \mid (m_{Kk}, \sigma^2_{Kk}))}{\sum_{m=1}^{K} w_{Km}\, f(s_i \mid (m_{Km}, \sigma^2_{Km}))}.$$

4. M-step (calculation of the parameters $w_{Kk}$, $m_{Kk}$, $\sigma^2_{Kk}$):

$$w_{Kk} = \frac{1}{I} \sum_{i=1}^{I} g_{ik}, \qquad k = \overline{1,K},$$

$$m_{Kk}(j) = \frac{1}{I\, w_{Kk}} \sum_{i=1}^{I} g_{ik}\, s_i(j), \qquad k = \overline{1,K},\ j \in \{1,\dots,N^{av}+1\},$$

$$\sigma^2_{Kk}(j) = \frac{1}{I\, w_{Kk}} \sum_{i=1}^{I} g_{ik}\, \left(s_i(j) - m_{Kk}(j)\right)^2, \qquad k = \overline{1,K},\ j \in \{1,\dots,N^{av}+1\}.$$

5. Rule of the termination condition: if $G \ne G^*$ and $\tau < \tau^{\max}$, then $\tau = \tau + 1$, $G^* = G$, go to step 2.

6. Calculation of the objective function:

$$k_i^* = \arg\max_k g_{ik}, \qquad d_{ik} = \sum_{l=1}^{N^{av}+1} \left|s_i(l) - m_{Kk}(l)\right|^2, \qquad J_K = \max_i \frac{d_{ik_i^*}}{A^{av}\,(N^{av}+1)}.$$

4. NUMERICAL STUDY

A numerical study was conducted for all three clustering methods on 1000 training patterns of vocal speech sounds. The results are shown in Table 1. According to Table 1, the best results are obtained by the EM algorithm.

Table 1: Probabilities of clustering

Method name       Probability of clustering, %
K-means           90
Fuzzy C-means     95
EM                98

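The distributed selection of the number of clusters in Section 3 can likewise be sketched. In this sketch, a minimal K-means variant (NumPy only) plays the role of the per-thread clustering method, one worker per candidate $K$; the threshold `eps` on the normalized objective $J_K$ is an assumed stand-in for the paper's admissible value $\varepsilon$.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def kmeans(S, K, tau_max=100, rng=None):
    """Cluster the patterns S (array I x (N_av+1)) into K clusters.
    Returns (J_K, centers): normalized objective and cluster centers."""
    rng = rng or np.random.default_rng(0)
    I = len(S)
    # initial assignment: trivial for K = 1 or K = I, random otherwise
    labels = np.zeros(I, int) if K == 1 else (np.arange(I) if K == I
             else rng.integers(0, K, I))
    for _ in range(tau_max):
        # cluster centers: mean of the patterns assigned to each cluster
        # (an empty cluster is re-seeded with a random pattern)
        centers = np.array([S[labels == k].mean(axis=0) if np.any(labels == k)
                            else S[rng.integers(I)] for k in range(K)])
        # squared distances d_ik and reassignment to the nearest center
        d = ((S[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    a_av = S.max() - S.min()  # amplitude of the unified window (stand-in)
    J = d[np.arange(I), labels].max() / (a_av * S.shape[1])
    return J, centers

def reference_patterns(S, eps=0.5):
    """One worker per candidate K; K* is the smallest K with J_K <= eps."""
    I = len(S)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda K: kmeans(S, K), range(1, I + 1)))
    for K, (J, centers) in enumerate(results, start=1):
        if J <= eps:
            return K, centers
    return I, results[-1][1]
```

In the paper the per-thread method may equally be Fuzzy C-means or EM; only the body of `kmeans` would change, while the selection of $K^*$ stays the same.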
5. CONCLUSION

In this paper, a method for the distributed transformation of training patterns of vocal sounds to a unified amplitude-time window and a method for the distributed clustering of training patterns of vocal sounds have been proposed for the distributed formation of reference patterns of vocal speech sounds. These methods make it possible to quickly convert quasi-periodic sections of different lengths to a single amplitude-time window for subsequent comparison, and to determine the optimal number of clusters accurately and quickly, which increases the probability of correct clustering. The proposed methods can be used in speech recognition and synthesis systems.

REFERENCES

[1] V.N. Bondarev, F.G. Ade, Iskusstvenniy intellect, SevNTU, 2002.
[2] R.K. Potapova, Rech: kommunikatsia, informatsia, kibernetika, Radio i sviaz, 1997.
[3] T. Dutoit, An Introduction to Text-to-Speech Synthesis, Kluwer Academic Publishers, 1997.
[4] J. Allen, S. Hunnicut, D. Klatt, From Text to Speech: The MITalk System, Cambridge University Press, 1987.
[5] L.R. Rabiner, R.W. Schafer, Digital Processing of Speech Signals, Prentice-Hall Inc., 1978.
[6] G. Bailly, G. Murillo, O. Dakkak, B. Guerin, A text-to-speech system for French using formant synthesis, Proc. of SPEECH 88, 1988, pp. 255-260.
[7] L.R. Rabiner, B.H. Juang, Fundamentals of Speech Recognition, Prentice Hall PTR, 1993.
[8] A.J. Hunt, A. Black, Unit selection in a concatenative speech synthesis system using a large speech database, Proc. of ICASSP 96, pp. 11-14.
[9] C. Hamon, E. Moulines, F. Charpentier, A diphone synthesis system based on time-domain prosodic modifications of speech, Proc. of ICASSP 89, pp. 238-241.
[10] T.K. Vintsiuk, Analiz, raspoznavanie i interpretatsia rechevikh signalov, Naukova dumka, 1987.
[11] S.O. Haykin, Neural Networks and Learning Machines, Pearson Education, Inc., 2009.
[12] T. Kohonen, Self-Organizing Maps, Springer-Verlag, 1995.
[13] R. Callan, The Essence of Neural Networks, Prentice Hall Europe, 1998.
[14] S.N. Sivanandam, S. Sumathi, S.N. Deepa, Introduction to Neural Networks Using Matlab 6.0, The McGraw-Hill Comp., Inc., 2006.
[15] K.-L. Du, M.N.S. Swamy, Neural Networks and Statistical Learning, Springer-Verlag, 2014.
[16] E.E. Fedorov, Iskusstvennyie neyronnyie seti, DVNZ DonNTU, 2016.
[17] E.E. Fedorov, Metodologia sozdania multiagentnoi sistemy rechevogo upravlenia, Noulidzh, 2011.
[18] E.E. Fedorov, Metod klassifikatsii vokalnyih zvukov rechi na osnove saundletnoy bayesovskoy neyronnoy seti, Upravlyayuschie sistemyi i mashinyi, Vol. 6, pp. 78-83.
[19] E.E. Fedorov, Metod sinteza vokalnyih zvukov rechi po etalonnyim obraztsam na osnove saundletov, Naukovi pratsi Donetskogo natsionalnogo tehnichnogo universitetu, Vol. 2, pp. 127-137.
[20] K. Ahuja, A. Sain, Analyzing formation of K Mean clusters using similarity and dissimilarity measures, International Journal of Advanced Trends in Computer Science and Engineering, Vol. 2, No. 1, pp. 72-74.
[21] S. Baboo, S. Priya, Clustering based integration of personal information using Weighted Fuzzy Local Information C-Means Algorithm, International Journal of Advanced Trends in Computer Science and Engineering, Vol. 2, No. 2.
