The general formulation of the problem. The development of intelligent software components for human speech recognition, speech synthesis, and related tasks, which are used in computer systems for communication, is a topical problem. At its core lies the problem of building efficient methods that provide high-speed formation of reference patterns of speech sounds, which are then used to train descriptive and generative models.

Analysis of research. Existing speech synthesis systems use approaches such as [1-4]: formant synthesis, synthesis based on linear prediction coefficients (LPC synthesis), and concatenative synthesis. Formant synthesis and LPC synthesis are based on a model of human speech production. The model of the vocal tract is implemented as an adaptive digital filter. In formant synthesis the parameters of the adaptive digital filter are determined by the formant frequencies [5, 6], and in LPC synthesis by the LPC coefficients [7]. The best results in terms of intelligibility and naturalness of speech are obtained by concatenative synthesis, which is carried out by joining the required sound units [1, 3, 8, 9]. In such systems, it is

Formulation of the research problem. The aim is to develop a method for the distributed formation of reference patterns of vocal speech sounds.

Problem solving and research results. To achieve this aim, it is necessary to:
1. Develop a distributed method for converting training patterns of vocal sounds to a unified amplitude-time window.
2. Develop a method for distributed clustering of training patterns of vocal sounds.
3. Conduct a numerical study of the clustering methods used.

2. METHOD OF DISTRIBUTED CONVERSION OF TRAINING PATTERNS OF VOCAL SOUNDS TO A UNIFIED AMPLITUDE-TIME WINDOW

Let a finite set of training patterns of a vocal sound be given, described by a set of bounded finite integer-valued discrete functions $X = \{x_i \mid i \in \{1, \dots, I\}\}$, where $A_i^{\min}$ and $A_i^{\max}$ are the
E. Fedorov et al., International Journal of Advanced Trends in Computer Science and Engineering, 6(3), May - June 2017, 35-39
minimum and maximum values of the function $x_i$ on the compact set $\{N_i^{\min}, \dots, N_i^{\max}\}$. We introduce the following mean values:

$$N^{av} = \frac{1}{I} \sum_{i=1}^{I} \left(N_i^{\max} - N_i^{\min}\right),$$

$$A^{av} = \frac{1}{I} \sum_{i=1}^{I} \left(A_i^{\max} - A_i^{\min}\right).$$

Let $I$ be the initial number of parallel threads, and let the thread number correspond to the number of the training pattern of the vocal sound. Then each $i$-th thread, by the transformation described in [18, 19], maps the function $x_i$ into a bounded finite integer-valued discrete function $s_i$ that has compact support $\{1, \dots, N^{av}+1\}$, minimal value 0, and maximal value $A^{av}$ on it. Once all threads have finished, the family $S = \{s_i\}$ is obtained. Thus, quasi-periodic signal segments of different lengths can be quickly converted to a unified amplitude-time window for subsequent comparison.

$$H = \{h_k \mid h_k = m_{K^* k},\ k \in \{1, \dots, K^*\}\}.$$

Thus, it becomes possible to determine the optimal number of clusters accurately and quickly. The iterative clustering methods studied in this article are described below.

2.1. Clustering method based on the K-means algorithm

1. Initialization:

a) If $K = 1$, then the initial partition $\Omega(S) = \{S_1 \mid S_1 = S\}$ consists of one cluster and is described by a vector of indicator-function values $\Psi = [\chi_{S_1}(s_i)]$, $\chi_{S_1}(s_i) = 1$, $i \in \{1, \dots, I\}$. Iteration number $\tau = \tau^{\max}$, $\Psi^* = \Psi$.

b) If $K = I$, then the initial partition $\Omega(S) = \{S_k \mid S_k \subseteq S\}$ consists of $I$ clusters and is described by a matrix of indicator-function values $\Psi = [\chi_{S_k}(s_i)]$,

$$\chi_{S_k}(s_i) = \begin{cases} 1, & i = k \\ 0, & i \neq k \end{cases}, \quad i \in \{1, \dots, I\},\ k \in \{1, \dots, K\}.$$

Iteration number $\tau = \tau^{\max}$, $\Psi^* = \Psi$.

c) If $1 < K < I$, then the initial partition $\Omega(S) = \{S_k \mid S_k \subseteq S\}$ into $K$ clusters is chosen at random and is described by a randomly initialized matrix of indicator-function values $\Psi = [\chi_{S_k}(s_i)]$,

$$\chi_{S_k}(s_i) = \begin{cases} 1, & s_i \in S_k \\ 0, & s_i \notin S_k \end{cases}, \quad i \in \{1, \dots, I\},\ k \in \{1, \dots, K\}.$$

Iteration number $\tau = 1$, $\Psi^* = O$. In this case, the matrix $\Psi$ must satisfy the following conditions [20]:

$$\sum_{k=1}^{K} \chi_{S_k}(s_i) = 1, \quad i \in \{1, \dots, I\},$$

$$d_{ik} = \sum_{l=1}^{N^{av}+1} |s_i(l) - m_{Kk}(l)|^2, \quad i \in \{1, \dots, I\},\ k \in \{1, \dots, K\}.$$

4. If $1 < K < I$, then the matrix of indicator-function values is modified according to the following rule: if $k^* = \arg\min_k d_{ik}$, then $\chi_{S_{k^*}}(s_i) = 1$ and $\chi_{S_k}(s_i) = 0$ for all $k \in \{1, \dots, K\} \setminus \{k^*\}$.
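As a minimal sketch, the squared distance $d_{ik}$ and the assignment rule of step 4 could be written in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the cluster centers are taken as given input, the function names and the toy data are hypothetical, and ties are resolved in favor of the lowest cluster index, which the text does not specify.

```python
from typing import List, Sequence


def squared_distances(patterns: Sequence[Sequence[float]],
                      centers: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return d[i][k], the sum over l of |s_i(l) - m_k(l)|^2."""
    return [
        [sum((s - m) ** 2 for s, m in zip(pattern, center)) for center in centers]
        for pattern in patterns
    ]


def hard_assignment(distances: Sequence[Sequence[float]]) -> List[List[int]]:
    """Build the indicator matrix: row i holds a single 1 at k* = argmin_k d[i][k]."""
    indicator = []
    for row in distances:
        # min() returns the first minimal index, so ties go to the lowest k (assumption).
        k_star = min(range(len(row)), key=row.__getitem__)
        indicator.append([1 if k == k_star else 0 for k in range(len(row))])
    return indicator


# Toy data (hypothetical): three patterns already converted to a common window,
# and two cluster centers of the same length.
patterns = [[0, 1, 2, 1], [0, 2, 4, 2], [5, 5, 5, 5]]
centers = [[0, 1, 3, 1], [5, 5, 5, 4]]

d = squared_distances(patterns, centers)
chi = hard_assignment(d)
print(d)    # distances of every pattern to every center
print(chi)  # one-hot rows: each pattern belongs to exactly one cluster
```

Because every row of the resulting matrix contains exactly one 1, the condition that the indicator values of each pattern sum to 1 holds by construction.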
5. Rule of the termination condition

$$\mu_{\tilde{S}_k}(s_i) = \begin{cases} 1, & i = k \\ 0, & i \neq k \end{cases}, \quad i \in \{1, \dots, I\},\ k \in \{1, \dots, K\}.$$

Iteration number $\tau = \tau^{\max}$, $M^* = M$.

c) If $1 < K < I$, then the initial partition $\tilde{\Omega}(\tilde{S}) = \{\tilde{S}_k \mid \tilde{S}_k \subseteq \tilde{S}\}$ into $K$ clusters is chosen at random and is described by a randomly initialized matrix of membership-function values $M = [\mu_{\tilde{S}_k}(s_i)]$, where $\mu_{\tilde{S}_k}(s_i)$ returns the degree of membership of an object in a cluster, $i \in \{1, \dots, I\}$, $k \in \{1, \dots, K\}$. Iteration number $\tau = 1$, $M^* = O$. In this case, the matrix $M$ must satisfy the following conditions [21]:

$$\sum_{k=1}^{K} \mu_{\tilde{S}_k}(s_i) = 1, \quad i \in \{1, \dots, I\},$$

The fuzzy clustering weight $w$ is set (in this article, $w = 2$).

6. The calculation of the objective function

$$k_i^* = \arg\max_k \mu_{\tilde{S}_k}(s_i),$$

$$J_K = \max_i \frac{d_{i k_i^*}}{A^{av}\,(N^{av}+1)}.$$

2.3. Clustering method based on the EM algorithm

1. Initialization:

a) If $K = 1$, then the initial partition $\Omega(S) = \{S_1 \mid S_1 = S\}$ consists of one cluster and is described by the vector of expected values of hidden variables $G = [g_{i1}]$,
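$$g_{ik} = \begin{cases} 1, & i = k \\ 0, & i \neq k \end{cases}, \quad i \in \{1, \dots, I\},\ k \in \{1, \dots, K\}.$$

Iteration number $\tau = \tau^{\max}$, $G^* = G$.

$$\sum_{i=1}^{I} g_{ik} > 0, \quad k \in \{1, \dots, K\},$$

The weighting factor $w_k$ is set, where $w_k$ corresponds to

If $1 < K < I$, then $g_{ik}$ is calculated, where the hidden variable $g_{ik}$ corresponds to the a posteriori probability, i.e. $g_{ik} = P((m_{Kk}, \sigma^2_{Kk}) \mid s_i)$.

$$m_{Kk}(j) = \frac{1}{I\,w_{Kk}} \sum_{i=1}^{I} g_{ik}\, s_i(j), \quad k \in \overline{1, K},\ j \in \{1, \dots, N^{av}+1\},$$

$$d_{ik} = \sum_{l=1}^{N^{av}+1} |s_i(l) - m_{Kk}(l)|^2,$$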
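The mean update $m_{Kk}(j)$ can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the weights are assumed to be $w_k = \frac{1}{I}\sum_i g_{ik}$, which is the usual choice in the EM algorithm for mixture models but is not stated in the surviving text, and all function names and data are hypothetical.

```python
from typing import List, Sequence


def mixture_weights(g: Sequence[Sequence[float]]) -> List[float]:
    """Assumed weights: w_k = (1 / I) * sum_i g[i][k]."""
    n_patterns = len(g)
    n_clusters = len(g[0])
    return [sum(row[k] for row in g) / n_patterns for k in range(n_clusters)]


def cluster_means(patterns: Sequence[Sequence[float]],
                  g: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> List[List[float]]:
    """m_k(j) = 1 / (I * w_k) * sum_i g[i][k] * s_i(j)."""
    n_patterns = len(patterns)
    length = len(patterns[0])
    return [
        [
            sum(g[i][k] * patterns[i][j] for i in range(n_patterns))
            / (n_patterns * weights[k])
            for j in range(length)
        ]
        for k in range(len(weights))
    ]


# Toy data (hypothetical): three patterns of length 2 and posterior values g[i][k]
# that put the first two patterns in cluster 0 and the third in cluster 1.
patterns = [[0, 2], [4, 6], [10, 10]]
g = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

w = mixture_weights(g)
m = cluster_means(patterns, g, w)
print(w)  # cluster weights, summing to 1 when each row of g sums to 1
print(m)  # weighted mean pattern of each cluster
```

The division by $I\,w_k$ is only defined when the column sum of $g_{ik}$ is positive for every cluster, so an empty cluster has to be handled before this step.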
4. CONCLUSION

A method of distributed conversion of training patterns of vocal sounds to a unified amplitude-time window and a method of distributed clustering of training patterns of vocal sounds have been proposed in this paper for the distributed formation of reference patterns of vocal speech sounds. These methods make it possible to quickly convert quasi-periodic segments of different lengths to a single amplitude-time window for subsequent comparison and to determine the optimal number of clusters accurately and quickly, which increases the probability of clusterization. The proposed methods can be used in speech recognition and synthesis systems.

REFERENCES

[1] V.N. Bondarev, F.G. Ade, Iskusstvenniy intellect, SevNTU, 2002.
[2] R.K. Potapova, Rech: kommunikatsia, informatsia, kibernetika, Radio i sviaz, 1997.
[3] T. Dutoit, An introduction to text-to-speech synthesis, Kluwer Academic Publishers, 1997.
[4] J. Allen, S. Hunnicutt, D. Klatt, From text to speech, the MITALK system, Cambridge University Press, 1987.
[5] L.R. Rabiner, R.W. Schafer, Digital Processing of Speech Signals, Prentice-Hall Inc., 1978.
[6] G. Bailly, G. Murillo, O. Dakkak, B. Guerin, A text-to-speech system for French using formant synthesis, Proc. of SPEECH 88, 1988, pp. 255-260.
[7] L.R. Rabiner, B.-H. Juang, Fundamentals of speech recognition, Prentice Hall PTR, 1993.
[8] A.J. Hunt, A. Black, Unit selection in a concatenative speech synthesis system using a large speech database, ICASSP 96, pp. 11-14.
[9] C. Hamon, E. Moulines, F. Charpentier, A diphone system based on time-domain prosodic modifications of speech, Proc. of ICASSP 89, pp. 238-241.
[10] T.K. Vintsiuk, Analiz, raspoznavanie i interpretatsia rechevikh signalov, Naukova dumka, 1987.
[11] S.O. Haykin, Neural Networks and Learning Machines, Pearson Education, Inc., 2009.
[12] T. Kohonen, Self-organizing Maps, Springer-Verlag, 1995.
[13] R. Callan, The essence of neural networks, Prentice Hall Europe, 1998.
[14] S.N. Sivanandam, S. Sumathi, S.N. Deepa, Introduction to Neural Networks using Matlab 6.0, The McGraw-Hill Comp., Inc., 2006.
[15] K.-L. Du, M.N.S. Swamy, Neural Networks and Statistical Learning, Springer-Verlag, 2014.
[16] E.E. Fedorov, Iskusstvennyie neyronnyie seti, DVNZ DonNTU, 2016.
[17] E.E. Fedorov, Metodologia sozdania multiagentnoi sistemy rechevogo upravlenia, Noulidzh, 2011.
[18] E.E. Fedorov, Metod klassifikatsii vokalnyih zvukov rechi na osnove saundletnoy bayesovskoy neyronnoy seti, Upravlyayuschie sistemyi i mashinyi, Vol. 6, pp. 78-83.
[19] E.E. Fedorov, Metod sinteza vokalnyih zvukov rechi po etalonnyim obraztsam na osnove saundletov, Naukovi pratsi Donetskogo natsionalnogo tehnichnogo universitetu, Vol. 2, pp. 127-137.
[20] K. Ahuja, A. Sain, Analyzing formation of K Mean Clusters using similarity and dissimilarity measures, International Journal of Advanced Trends in Computer Science and Engineering, Vol. 2, No. 1, pp. 72-74.
[21] S. Baboo, S. Priya, Clustering based integration of personal information using Weighted Fuzzy Local Information C-Means Algorithm, International Journal of Advanced Trends in Computer Science and Engineering, Vol. 2, No. 2.