\[
\frac{\partial E_p}{\partial w_{ji}}
  = \frac{\partial E_p}{\partial o_{pj}^{L}} \cdot
    \frac{\partial o_{pj}^{L}}{\partial Net_{pj}^{L}} \cdot
    \frac{\partial Net_{pj}^{L}}{\partial w_{ji}}
  = \frac{\partial}{\partial o_{pj}^{L}}\Big[\tfrac{1}{2}\big(t_p - o_{pj}^{L}\big)^{2}\Big] \cdot
    \frac{\partial o_{pj}^{L}}{\partial Net_{pj}^{L}} \cdot o_{pi}^{L}
  = -\big(t_p - o_{pj}^{L}\big)\, o_{pj}^{L}\big(1 - o_{pj}^{L}\big)\, o_{pi}^{L}. \qquad (8)
\]
The third line in the derivation assumes that the neuronal activation function, f (Net), is the binary sigmoid function, and thus
substitutes the values accordingly. The other cases are evaluated with \(t_p = 1\) and \(w_{ji} < 0\); \(t_p = 0\) and \(w_{ji} > 0\); and \(t_p = 0\) and \(w_{ji} < 0\).
This results in four output layer and eight hidden layer equations. The four output layer equations are summarized as follows:
\[
\frac{\partial E_p}{\partial w_{ji}} =
\begin{cases}
-\delta_{pj}^{L}\, o_{pi}^{L}, & \text{for } t_p = 1 \text{ and } w_{ji} > 0,\\
-\delta_{pj}^{L}\, o_{pi}^{U}, & \text{for } t_p = 1 \text{ and } w_{ji} < 0,\\
-\delta_{pj}^{U}\, o_{pi}^{U}, & \text{for } t_p = 0 \text{ and } w_{ji} > 0,\\
-\delta_{pj}^{U}\, o_{pi}^{L}, & \text{for } t_p = 0 \text{ and } w_{ji} < 0,
\end{cases} \qquad (9)
\]
where \(\delta_{pj}^{L} = (t_p - o_{pj}^{L})\, o_{pj}^{L}(1 - o_{pj}^{L})\) and \(\delta_{pj}^{U} = (t_p - o_{pj}^{U})\, o_{pj}^{U}(1 - o_{pj}^{U})\).
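The four cases of Eq. (9), together with the two delta terms, can be read as a single C routine. This is an illustrative sketch of the equations, not the paper's actual code; all names and the signature are assumptions:

```c
#include <math.h>

/* delta = (t - o) * o * (1 - o), evaluated at either the lower (L) or
 * upper (U) bound of the interval output, per the definitions under Eq. (9). */
static double delta(double t, double o)
{
    return (t - o) * o * (1.0 - o);
}

/* Four-case output-layer gradient of Eq. (9).  o_pj_L/o_pj_U are the bounds
 * of output neuron j's activation, o_pi_L/o_pi_U the bounds of the input
 * from neuron i.  Targets t_p are exactly 0.0 or 1.0, so the float
 * comparison is safe. */
double output_gradient(double t_p, double w_ji,
                       double o_pj_L, double o_pj_U,
                       double o_pi_L, double o_pi_U)
{
    if (t_p == 1.0 && w_ji > 0.0) return -delta(t_p, o_pj_L) * o_pi_L;
    if (t_p == 1.0 && w_ji < 0.0) return -delta(t_p, o_pj_L) * o_pi_U;
    if (t_p == 0.0 && w_ji > 0.0) return -delta(t_p, o_pj_U) * o_pi_U;
    return -delta(t_p, o_pj_U) * o_pi_L;   /* t_p == 0, w_ji < 0 */
}
```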
The calculation of the partial derivative \(\partial E_p / \partial w_{ji}\) for the hidden layers is based on back-propagating the error as measured at the output layer. The following discussion assumes one hidden layer, although analogous terms could be derived in the same way for other hidden layers. Since this derivation involves a path of two neurons (output and hidden layer), there are eight cases. For the first case, \(t_p = 1\), \(w_{kj} > 0\) and \(w_{ji} > 0\), where \(w_{kj}\) is the weight on the path from hidden layer neuron j to output layer neuron k, k fixed:
\[
\frac{\partial E_p}{\partial w_{ji}}
  = \frac{\partial E_p}{\partial o_{pk}^{L}} \cdot
    \frac{\partial o_{pk}^{L}}{\partial Net_{pk}^{L}} \cdot
    \frac{\partial Net_{pk}^{L}}{\partial o_{pj}^{L}} \cdot
    \frac{\partial o_{pj}^{L}}{\partial Net_{pj}^{L}} \cdot
    \frac{\partial Net_{pj}^{L}}{\partial w_{ji}}
  = -\big(t_p - o_{pk}^{L}\big)\, o_{pk}^{L}\big(1 - o_{pk}^{L}\big)\, w_{kj}\, o_{pj}^{L}\big(1 - o_{pj}^{L}\big)\, o_{pi}^{L}
  = -\delta_{pk}^{L}\, w_{kj}\, o_{pj}^{L}\big(1 - o_{pj}^{L}\big)\, o_{pi}^{L}, \qquad (10)
\]
where \(\delta_{pk}^{L}\) and \(\delta_{pk}^{U}\) are defined as in Eq. (9), with the output-neuron index k in place of j.
The second case is a variation on the first, in which \(t_p = 1\), \(w_{kj} > 0\) and \(w_{ji} < 0\), so the partial derivative becomes:
Speech Recognition In Neuro-Fuzzy System
(IJSTE/ Volume 01/ Issue 01 / 002)
All rights reserved by www.ijste.org
\[
\frac{\partial E_p}{\partial w_{ji}} = -\delta_{pk}^{L}\, w_{kj}\, o_{pj}^{L}\big(1 - o_{pj}^{L}\big)\, o_{pi}^{U}. \qquad (11)
\]
The third case is characterized by \(t_p = 1\), \(w_{kj} < 0\) and \(w_{ji} > 0\):
\[
\frac{\partial E_p}{\partial w_{ji}} = -\delta_{pk}^{L}\, w_{kj}\, o_{pj}^{U}\big(1 - o_{pj}^{U}\big)\, o_{pi}^{U}. \qquad (12)
\]
For the fourth case, \(t_p = 1\), \(w_{kj} < 0\) and \(w_{ji} < 0\):
\[
\frac{\partial E_p}{\partial w_{ji}} = -\delta_{pk}^{L}\, w_{kj}\, o_{pj}^{U}\big(1 - o_{pj}^{U}\big)\, o_{pi}^{L}. \qquad (13)
\]
The fifth through eighth cases deal with a target value of 0. For the fifth case, \(t_p = 0\), \(w_{kj} > 0\) and \(w_{ji} > 0\):
\[
\frac{\partial E_p}{\partial w_{ji}}
  = \frac{\partial E_p}{\partial o_{pk}^{U}} \cdot
    \frac{\partial o_{pk}^{U}}{\partial Net_{pk}^{U}} \cdot
    \frac{\partial Net_{pk}^{U}}{\partial o_{pj}^{U}} \cdot
    \frac{\partial o_{pj}^{U}}{\partial Net_{pj}^{U}} \cdot
    \frac{\partial Net_{pj}^{U}}{\partial w_{ji}}
  = -\big(t_p - o_{pk}^{U}\big)\, o_{pk}^{U}\big(1 - o_{pk}^{U}\big)\, w_{kj}\, o_{pj}^{U}\big(1 - o_{pj}^{U}\big)\, o_{pi}^{U}
  = -\delta_{pk}^{U}\, w_{kj}\, o_{pj}^{U}\big(1 - o_{pj}^{U}\big)\, o_{pi}^{U}. \qquad (14)
\]
For the sixth case, \(t_p = 0\), \(w_{kj} > 0\) and \(w_{ji} < 0\):
\[
\frac{\partial E_p}{\partial w_{ji}} = -\delta_{pk}^{U}\, w_{kj}\, o_{pj}^{U}\big(1 - o_{pj}^{U}\big)\, o_{pi}^{L}. \qquad (15)
\]
In the seventh case, \(t_p = 0\), \(w_{kj} < 0\) and \(w_{ji} > 0\):
\[
\frac{\partial E_p}{\partial w_{ji}} = -\delta_{pk}^{U}\, w_{kj}\, o_{pj}^{L}\big(1 - o_{pj}^{L}\big)\, o_{pi}^{L}. \qquad (16)
\]
The eighth and final case has \(t_p = 0\), \(w_{kj} < 0\) and \(w_{ji} < 0\):
\[
\frac{\partial E_p}{\partial w_{ji}} = -\delta_{pk}^{U}\, w_{kj}\, o_{pj}^{L}\big(1 - o_{pj}^{L}\big)\, o_{pi}^{U}. \qquad (17)
\]
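The eight cases of Eqs. (10) through (17) follow a regular sign pattern: the delta term uses the L bound when the target is 1 and the U bound when it is 0, the hidden-neuron bound flips when \(w_{kj}\) is negative, and the input bound flips again when \(w_{ji}\) is negative. They can therefore be collapsed into one C routine. This is a sketch under assumed names, not the paper's actual code:

```c
#include <math.h>

/* delta at output neuron k: (t - o_pk) * o_pk * (1 - o_pk). */
static double delta_out(double t, double o_pk)
{
    return (t - o_pk) * o_pk * (1.0 - o_pk);
}

/* Eight-case hidden-layer gradient of Eqs. (10)-(17) for one output
 * neuron k.  The L/U bound chosen for each factor follows the sign
 * pattern of the equations; targets are exactly 0.0 or 1.0. */
double hidden_gradient(double t_p, double w_kj, double w_ji,
                       double o_pk_L, double o_pk_U,
                       double o_pj_L, double o_pj_U,
                       double o_pi_L, double o_pi_U)
{
    /* output-neuron bound: L when t_p = 1, U when t_p = 0 */
    double o_pk = (t_p == 1.0) ? o_pk_L : o_pk_U;
    double d_pk = delta_out(t_p, o_pk);

    /* hidden-neuron bound: L when the target and the sign of w_kj agree */
    int pj_is_L = ((t_p == 1.0) == (w_kj > 0.0));
    double o_pj = pj_is_L ? o_pj_L : o_pj_U;

    /* input bound: same side as o_pj when w_ji > 0, opposite otherwise */
    int pi_is_L = (w_ji > 0.0) ? pj_is_L : !pj_is_L;
    double o_pi = pi_is_L ? o_pi_L : o_pi_U;

    return -d_pk * w_kj * o_pj * (1.0 - o_pj) * o_pi;
}
```

Checking the routine against the first and eighth cases reproduces Eqs. (10) and (17) exactly.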
Equations (9) through (17) are used to code the training function in the simulation. Simulation results are presented in the next
section.
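The gradients of Eqs. (9) through (17) would feed an ordinary gradient-descent step. The text does not print its update rule, so the learning-rate form below is a conventional assumption rather than the paper's code:

```c
/* One weight update, w <- w - eta * dE/dw, where dE_dw is the
 * appropriate case of Eqs. (9)-(17) and eta is an assumed
 * learning-rate parameter. */
double update_weight(double w, double dE_dw, double eta)
{
    return w - eta * dE_dw;
}
```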
III. IMPLEMENTATION AND EXPERIMENTAL RESULTS
The implementation of the neuro-fuzzy network was carried out as a simulation. Using the equations derived in the previous section,
the simulation was coded in the C programming language and tested with several classic data sets. These results proved favorable,
with comparable or improved performance over a standard neural network [6].
A. Vowel Recognition Data
The application problem that will serve as a testbench for this study is Vowel, one of a collection of data sets used as neural network benchmarks [7]. It is used for speaker-independent recognition of the eleven vowel sounds from multiple speakers. The Vowel
data set used in this study was originally collected by Deterding [8], who recorded examples of the eleven steady state vowels of
English spoken by fifteen speakers for a non-connectionist [7] speaker normalization study. Four male and four female speakers
were used to create the training data, and the other four male and three female speakers were used to create the testing data. The
actual composition of the data set consists of 10 inputs, which are obtained by sampling, filtering and carrying out linear predictive
analysis.
Specifically, the speech signals were low pass filtered at 4.7 kHz and digitized to 12 bits with a 10 kHz sampling rate. Twelfth
order linear predictive analysis was carried out on six 512 sample Hamming windowed segments from the steady part of the vowel.
The reflection coefficients were used to calculate 10 log area parameters, giving a 10 dimensional input space [7]. Each speaker thus
yielded six frames of speech from eleven vowels. This results in 528 frames from the eight speakers used for the training set, and
462 frames from the seven speakers used to create the testing set.
Although 528 samples is a relatively small training set by standard neural network standards, the values are diverse, and thus Vowel is an excellent testbench for the neuro-fuzzy system.
B. Performance Results
As stated before, speaker-independent voice recognition is extremely difficult, even when confined to the eleven vowel sounds
that Vowel comprises. Robinson carried out a study comparing the performance of feed-forward networks with different structures on this data, which serves as a baseline for comparisons. The best recognition rate for a multi-layer perceptron reported in that study is
51% [9]. In spite of its difficulty, other studies have also used the Vowel data set. For various types of systems, the reported
recognition rates (best case) are 51% to 59% [9, 10, 11]. The recognition rate obtained with the neuro-fuzzy system is 89%. Results
of the simulation are summarized in Table 1, shown below.
A recognition rate of 89% surpassed expectations, especially with a data set as diverse as the speaker-independent speech (vowel
recognition) problem.
Table 1: Best Recognition Rates for Standard Neural Networks and the Neuro-Fuzzy System

Type of Network       | Number of Hidden Neurons | Best Recognition Rate (%)
Std. Neural Network   | 11                       | 59.3
Std. Neural Network   | 22                       | 58.6
Std. Neural Network   | 88                       | 51.1
Neuro-Fuzzy System    | 11                       | 88.7
IV. CONCLUSIONS
Fuzzy theory has been used successfully in many applications [12]. This study shows that it can also be used to improve neural network performance. Among the many advantages of fuzziness is the ability to handle imprecise data. Neural networks are known to be excellent classifiers, but their performance can be hampered by the size and quality of the training set. By combining fuzzy techniques with neural networks, a more efficient network results, one which is extremely effective for a class of problems characterized by troublesome training sets: the set is too small, not representative of the possibility space, or very diverse. A standard neural network then performs poorly, because the training sequence does not converge upon an effective set of weights. With the incorporation of fuzzy techniques, the training converges and the results are vastly improved.
One example of the problem class which benefits from neuro-fuzzy techniques is that of speaker-independent speech recognition.
The Vowel data set was used in simulation experiments, as reported in the previous section. This data set is well known and has been
used in many studies, and has yielded poor results. This is true to such a degree that it caused one researcher to claim that poor
results seem to be inherent to the data [10]. This difficulty with poor performance adds more credence to the effectiveness of the
fuzzy techniques used in the neuro-fuzzy system.
In summation, the neuro-fuzzy system outperforms standard neural networks. Specifically, the simulations presented show that the neuro-fuzzy system in this study exceeds the best recognition rate of the original multi-layer perceptron study (51%) by nearly 38 percentage points on the difficult task of vowel recognition.
REFERENCES
[1] Kosko, Bart, Neural Networks and Fuzzy Systems: A Dynamical Approach to Machine Intelligence, Prentice Hall, Englewood Cliffs, NJ, 1992.
[2] Nava, P. and J. Taylor, The Optimization of Neural Network Performance through Incorporation of Fuzzy Theory, Proceedings of The Eleventh International Conference on Systems Engineering, pp. 897-901.
[3] Alefeld, G. and J. Herzberger, Introduction to Interval Computation, Academic Press, New York, NY, 1983.
[4] Bojadziev, G. and M. Bojadziev, Fuzzy Sets, Fuzzy Logic, Applications, World Scientific, New Jersey, 1995.
[5] Ishibuchi, H., R. Fujioka, and H. Tanaka, Neural Networks That Learn from Fuzzy If-Then Rules, IEEE Transactions on Fuzzy Systems, Vol. 1, No. 2, pp. 85-97, 1993.
[6] Nava, P., New Paradigms for Fuzzy Neural Networks, Ph.D. Dissertation, New Mexico State University, Las Cruces, New Mexico, 1995.
[7] Carnegie Mellon University Neural Network Benchmark Collection, Pittsburgh, PA, 1994.
[8] Deterding, D.H., Speaker Normalization for Automatic Speech Recognition, Ph.D. Dissertation, Cambridge, England, 1989.
[9] Robinson, A.J., Dynamic Error Propagation Networks, Ph.D. Dissertation, Cambridge University, Cambridge, England, 1989.
[10] Graña, M., A. D'Anjou, F. Albizuri, J. Lozano, P. Larrañaga, M. Hernandez, F. Torrealdea, A. de la Hera, and I. Garcia, Experiments of Fast Learning with High Order Boltzmann Machines, Informatica y Automatica, 1995.
[11] Nava, P. and J. Taylor, Voice Recognition with a Fuzzy Neural Network, Proceedings of The Fifth IEEE International Conference on Fuzzy Systems, pp. 2049-2052.
[12] Zadeh, L.A., Fuzzy Sets and Applications: Selected Papers by L. A. Zadeh, John Wiley & Sons, New York, 1987.