Chemometrics and Intelligent Laboratory Systems (Elsevier)
Abstract
Artificial neural networks (NNs) with back-error propagation were used for classification with NIR spectra and applied to the classification of different strengths of drugs. Four training set selection methods were compared by applying each of them to three different data sets. The NN architecture was selected through a pruning method, and batch operation, an adaptive learning rate and momentum were used to train the NN. The presented results demonstrate that selection methods based on the Kennard-Stone and D-optimal designs are better than those based on the Kohonen self-organising map and on random selection, and allow 100% correct classification for both recognition and prediction. The Kennard-Stone design is more practical than the D-optimal design. The Kohonen self-organising map method is better than the random selection method.
Keywords:
1. Introduction
One observes an increasing interest in the application of neural networks (NNs) to chemical calibration and pattern recognition problems [1-13]. Although NNs do not require any assumptions about the data distribution, they can be successfully applied
2. Theory
2.1. Notation
m    number of variables
g    number of classes (nodes in the output layer)
n    number of objects
z
X    data matrix
N    number of objects in a set (training or test)
y    target output of the network
out  output computed by the network
the initial points selected by the Kennard-Stone design [21]. For the NIR data of each class, m is much larger than n. The D-optimal design cannot be directly applied because of the singularity of the information matrix. Therefore, the data are pretreated by PCA after centering. The number of variables is reduced to n - 1 latent variables. Now X becomes a score matrix for n objects and n - 1 latent variables. Then we apply the D-optimality method to select objects for the linear model $y = \sum_i \beta_i x_i + e$, with $x_i$ the latent variable i. It should be pointed out that this linear model is used only for the selection of objects, not for modelling. This procedure is carried out for each class separately. 3/4 of the objects are chosen from every class and put together as the training set. If 3/4 of the number of objects is not an integer, it is rounded in the same way as described under random selection.
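To make this selection step concrete, the sketch below implements the Kennard-Stone algorithm and a simple greedy exchange search for a D-optimal subset of the PCA scores. It assumes NumPy; the function names and the exchange heuristic are ours, not the authors' implementation, and the number of latent variables kept (p) must not exceed the subset size, otherwise the information matrix of the selected subset is singular.

```python
import numpy as np

def kennard_stone(X, k):
    """Pick k rows of X spread uniformly over the data space: start
    from the two most distant objects, then repeatedly add the object
    whose nearest selected neighbour is farthest away."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    sel = list(np.unravel_index(np.argmax(d), d.shape))
    while len(sel) < k:
        rest = [i for i in range(len(X)) if i not in sel]
        sel.append(rest[int(np.argmax([d[i, sel].min() for i in rest]))])
    return sorted(int(i) for i in sel)

def d_optimal(X, k):
    """Greedy exchange search for a k-row subset maximising
    det(Xs' Xs), initialised with the Kennard-Stone points,
    as described in the text."""
    sel = set(kennard_stone(X, k))
    best = np.linalg.det(X[list(sel)].T @ X[list(sel)])
    improved = True
    while improved:
        improved = False
        for i in list(sel):
            for j in range(len(X)):
                if j in sel:
                    continue
                cand = sorted((sel - {i}) | {j})
                det = np.linalg.det(X[cand].T @ X[cand])
                if det > best:
                    sel, best = set(cand), det
                    improved = True
                    break      # restart the scan after an accepted swap
            if improved:
                break
    return sorted(sel)

def select_training(Xclass, frac=0.75, p=5):
    """Per class: centre, project onto p principal components
    (p <= subset size), and select 3/4 of the objects."""
    Xc = Xclass - Xclass.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    T = (U * s)[:, :p]                      # score matrix
    return d_optimal(T, round(frac * len(Xclass)))
```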
3. Experimental data
The root mean square error used to monitor the training is

$$\mathrm{RMS} = \sqrt{\frac{\sum_{i=1}^{N}\sum_{j=1}^{g}\left(y_{ij}-\mathrm{out}_{ij}\right)^{2}}{Ng}}$$
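For concreteness, this criterion can be written as a short NumPy function (a sketch; the names follow the notation above):

```python
import numpy as np

def rms(y, out):
    """Root mean square error over N objects and g output nodes."""
    N, g = y.shape
    return np.sqrt(((y - out) ** 2).sum() / (N * g))
```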
The NN utilised in this study consisted of two active layers of nodes with a sigmoidal transfer function. The number of nodes in the output layer is determined by the number of classes. Normally the number of nodes in the input layer is also determined by the structure of the data. As already explained, for NIR data the number of variables is much larger than the number of objects and the variables are highly correlated. The data can be orthogonalized and reduced by principal component analysis, but then the number of input PCs should be optimised. According to Widrow's suggestion (see Section 1), the number of objects ought to be about 10 times the number of weights. However, in practical use the number of objects is limited and there are seldom so many available. Therefore, we relaxed this condition during the optimisation of the NN architecture: the ratio of the number of objects to the number of weights ought to be more than 1. If the numbers of input and output nodes are fixed, the maximum number of hidden nodes can be estimated using this rule. For instance, if there are 60 objects in the training set, we never train an NN having more than 60 weights. If we want to use 10 input nodes and 6 output nodes, then the number of hidden nodes cannot exceed 3. The NN with 3 hidden nodes, 10 input nodes and 6 output nodes has 57 (11 × 3 + 4 × 6) weights in total. For the net with 4 hidden nodes, the number of weights (11 × 4 + 5 × 6 = 74) is already larger than 60.
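This weight-counting rule is easy to mechanise. A minimal sketch (the helper names are ours), reproducing the worked example of 60 objects, 10 input and 6 output nodes:

```python
def n_weights(n_in, n_hidden, n_out):
    """Weights in a fully connected n_in x n_hidden x n_out network,
    counting one bias per hidden and per output node."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out

def max_hidden(n_in, n_out, n_objects):
    """Largest hidden layer for which the weight count stays within
    the number of training objects (the relaxed ratio > 1 rule)."""
    h = 0
    while n_weights(n_in, h + 1, n_out) <= n_objects:
        h += 1
    return h

assert n_weights(10, 3, 6) == 57    # fits the budget of 60 objects
assert n_weights(10, 4, 6) == 74    # already exceeds 60
assert max_hidden(10, 6, 60) == 3
```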
There is no standard way to optimise the architecture of an NN. The simplest way is to try systematically all combinations of nodes to find the optimal number of nodes in the input and the hidden layer.
Table 1
Data set 1: correctly classified rate of the training set (CCR) and test set (CCRt) of all combinations within 10 input nodes and 4 hidden nodes; maximum number of epochs 5000

Input nodes   1 hidden node    2 hidden nodes   3 hidden nodes   4 hidden nodes
              CCR    CCRt      CCR    CCRt      CCR    CCRt      CCR    CCRt
2             28.6   28.6      59.1   65.7      69.5   77.1      80.0   77.1
3             28.6   28.6      66.7   71.4      91.4   97.1      100    100
4             28.6   28.6      71.4   71.4      96.2   97.1      100    100
5             28.6   28.6      69.5   71.4      100    100       100    100
6             28.6   28.6      71.4   71.4      85.7   85.7      100    100
7             28.6   28.6      75.2   77.1      100    100       100    100
8             28.6   28.6      79.1   80.0      98.1   97.1      100    100
9             28.6   28.6      71.4   68.6      100    94.3      100    97.1
10            28.6   28.6      71.4   68.6      99.1   97.1      100    100
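A scan of this kind can be reproduced with today's tools. The sketch below uses scikit-learn, which is not the software used in the paper; its MLPClassifier differs in training details (solver and error criterion) from batch back-propagation with an adaptive learning rate and momentum, so the numbers will not match Table 1 exactly.

```python
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

def scan_architectures(X_tr, y_tr, X_te, y_te,
                       max_inputs=10, max_hidden=4, max_iter=5000):
    """Train one net per (input PCs, hidden nodes) pair and record
    CCR / CCRt as percentages, mirroring the scan of Table 1."""
    results = {}
    for n_in in range(2, max_inputs + 1):
        pca = PCA(n_components=n_in).fit(X_tr)
        T_tr, T_te = pca.transform(X_tr), pca.transform(X_te)
        for n_hid in range(1, max_hidden + 1):
            net = MLPClassifier(hidden_layer_sizes=(n_hid,),
                                activation='logistic',  # sigmoidal nodes
                                max_iter=max_iter, random_state=0)
            net.fit(T_tr, y_tr)
            results[(n_in, n_hid)] = (100 * net.score(T_tr, y_tr),
                                      100 * net.score(T_te, y_te))
    return results
```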
Fig. 1. (a) The root mean square error (RMS) as a function of the number of training epochs; (b) the percentage of correctly classified objects as a function of the number of training epochs; (...) training, (-) test; network architecture (10 × 4 × 7); data set 1.
Fig. 2. (a) Hinton diagram of the weights between the nodes of the input layer and the nodes of the hidden layer in the network (10 × 4 × 7); (b) sum of the absolute values of the weights of each node in the input layer; (c) sum of the absolute values of the weights of each node in the hidden layer; data set 1.
Fig. 3. (a) The root mean square error (RMS) as a function of the number of training epochs; (b) the percentage of correctly classified objects as a function of the number of training epochs; (...) training, (-) test; network architecture (3 × 4 × 7); data set 1.
The magnitude of the weights can be easily displayed in the Hinton diagram. This diagram displays the elements of the weight matrix as squares whose areas are proportional to their magnitude. The bias vector is separated from the other weights by a solid vertical line. The largest square corresponds to the weight with the largest magnitude and all others are drawn with sizes relative to the largest square [23]. The sum of the absolute values of the weights connected to a node can be used to estimate the importance of the role played by that node, and nodes with small sums are pruned. This pruning is repeated until the performance of the network degrades.
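A sketch of this pruning criterion (our code, assuming the input-to-hidden weight matrix is stored with one row per input node):

```python
import numpy as np

def input_importance(W_ih):
    """Sum of absolute weights leaving each input node; small sums
    flag nodes that contribute little and are pruning candidates."""
    return np.abs(W_ih).sum(axis=1)

def prune_inputs(W_ih, n_keep):
    """Indices of the n_keep most important input nodes. In the text
    this is repeated, retraining each time, until performance drops."""
    order = np.argsort(input_importance(W_ih))[::-1]
    return np.sort(order[:n_keep])
```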
Table 2 demonstrates the results of classification
for the sequence of steps in the optimisation of the net
architecture for data set 1. As one can see, a 100%
correct classification is observed for the NN with the
first 10 PCs as input variables and 4 nodes in the
hidden layer.
Fig. 1 demonstrates the performance of the network with 10 input and 4 hidden nodes during the
training. Fig. 2 shows the Hinton diagram. The
weights of input nodes 4 to 10 are much smaller than
those of the first three nodes.
Table 2
Data set 1: correctly classified rate of the training set (CCR) and test set (CCRt); training set selected by the Kennard-Stone procedure; maximum number of epochs 5000

Input nodes   Hidden nodes   CCR (%)   CCRt (%)   Time (s)
10            3              99.1      97.1       568
10            4              100       100        534
3             4              100       100        531
2             4              80        77.1       627
This suggests that PCs 4 to 10 do not contribute significantly to the network performance and that the first three PC factors play an important role in the classification. After pruning them, the network performance does not decrease (Fig. 3). However, recognition and prediction percentages become worse when the input nodes are reduced to 2 (PCs 3 to 10 are rejected). Therefore, the optimal structure of the network for data set 1 is 3 input nodes and 4 nodes in the hidden layer. The final weights of the optimal network are shown in the Hinton diagrams (Figs. 4 and 5). This indicates that
Fig. 4. (a) Hinton diagram of the weights between the nodes of the input layer and the nodes of the hidden layer in the optimal network (3 × 4 × 7); (b) sum of the absolute values of the weights of each node in the input layer; (c) sum of the absolute values of the weights of each node in the hidden layer; data set 1.
Fig. 5. (a) Hinton diagram of the weights between the nodes of the hidden layer and the nodes of the output layer in the optimal network (3 × 4 × 7); (b) sum of the absolute values of the weights of each node in the hidden layer; (c) sum of the absolute values of the weights of each node in the output layer; data set 1.
Fig. 6. The design of the training set by random selection, Kohonen self-organising map, Kennard-Stone algorithm and D-optimal design with a simulated data set; (*) objects of the training set; (-) objects of the test set.
Table 3
Data set 2: correctly classified rate of the training set (CCR) and test set (CCRt); training set selected by the Kennard-Stone procedure; maximum number of epochs 15000

Input nodes   Hidden nodes   CCR (%)   CCRt (%)   Time (s)
10            4              99.2      97.5       1670
10            5              100       100        2182
9             5              100       100        1781
8             5              99.2      100        1763
Table 4
Data set 3: correctly classified rate of the training set (CCR) and test set (CCRt); training set selected by the Kennard-Stone procedure

Input nodes   Hidden nodes   CCR (%)   CCRt (%)   Time (s)
7             2              76.8      69.4       1699
7             3              100       100        1521
6             3              100       100        1867
5             3              99.0      100        1845
Table 5
Data set 1: comparison of the four different techniques of training set selection; number of correctly classified objects divided by the total number of objects given between parentheses

Method          CCR (%)          CCRt (%)        Time (s)
Random          100 (105/105)    97.1 (34/35)    624
Random          100 (105/105)    100 (35/35)     634
Random          100 (105/105)    94.3 (33/35)    636
Kohonen         100 (113/113)    96.3 (26/27)    566
Kohonen         100 (116/116)    100 (24/24)     577
Kohonen         100 (116/116)    100 (24/24)     684
Kennard-Stone   100 (105/105)    100 (35/35)     531
D-optimal       100 (105/105)    100 (35/35)     516
Table 6
Data set 2: comparison of the four different techniques of training set selection; number of correctly classified objects divided by the total number of objects given between parentheses

Method          CCR (%)          CCRt (%)        Time (s)
Random          99.2 (119/120)   92.5 (37/40)    2218
Random          100 (120/120)    97.5 (39/40)    2213
Random          99.2 (119/120)   97.5 (39/40)    2212
Kohonen         100 (132/132)    96.4 (27/28)    1942
Kohonen         100 (130/130)    100 (30/30)     1956
Kohonen         99.2 (130/131)   96.6 (28/29)    2309
Kennard-Stone   100 (120/120)    100 (40/40)     1781
D-optimal       100 (120/120)    100 (40/40)     2246

Table 7
Data set 3: comparison of the four different techniques of training set selection; number of correctly classified objects divided by the total number of objects given between parentheses

Method          CCR (%)          CCRt (%)        Time (s)
Random          100 (99/99)      94.4 (34/36)    1863
Random          100 (99/99)      88.9 (32/36)    1868
Random          100 (99/99)      100 (36/36)     1884
Kohonen         97.2 (103/106)   96.6 (28/29)    1667
Kohonen         100 (107/107)    100 (28/28)     1685
Kohonen         100 (110/110)    100 (25/25)     1632
Kennard-Stone   100 (99/99)      100 (36/36)     1867
D-optimal       100 (99/99)      100 (36/36)     1881

Table 8
Data set 3: correctly classified rate of the training set (CCR) and test set (CCRt); training set selected by D-optimal design; maximum number of epochs 20000

Input nodes   Hidden nodes   CCR (%)   CCRt (%)   Time (s)
6             3              100       100        1881
5             3              100       100        1916
4             3              88.9      88.9       1915
6. Conclusion
Artificial NNs are shown to be useful pattern recognition tools for the classification of NIR spectral data of drugs when the training sets are correctly selected. Comparing the four training set selection methods, the Kennard-Stone and D-optimal procedures are better than the random selection and Kohonen methods. The results of the D-optimal design may be slightly better than those of the Kennard-Stone design. However, the computing time of the D-optimal design (using the Kennard-Stone design for the initial points) is larger than that of the Kennard-Stone
procedure. The random selection and Kohonen methods did not give good performance in our study. The number of data sets studied is not sufficiently large to prove that these conclusions are always valid for any data set. However, they allow us at least to state that the Kennard-Stone procedure will be a useful approach in certain instances and, in our opinion, in most instances.
References
[1] X.H. Song and R.Q. Yu, Chemom. Intell. Lab. Syst., 19 (1993) 101-109.
[2] C. Borggaard and H.H. Thodberg, Anal. Chem., 64 (1992) 545-551.
[3] Y.W. Li and P.V. Espen, Chemom. Intell. Lab. Syst., 25 (1994) 241-248.
[4] D. Wienke and G. Kateman, Chemom. Intell. Lab. Syst., 23 (1994) 309-329.
[5] T.B. Blank and S.D. Brown, J. Chemom., 8 (1994) 391-407.
[6] J. Zupan and J. Gasteiger, Neural Networks for Chemists: An Introduction, Weinheim, New York, 1993.
[7] T. Næs, K. Kvaal, T. Isaksson and C. Miller, J. Near Infrared Spectrosc., 1 (1993) 1-11.
[8] P. de B. Harrington, Chemom. Intell. Lab. Syst., 19 (1993) 143-154.
[9] B.J. Wythoff, Chemom. Intell. Lab. Syst., 20 (1993) 129-148.
[10] G. Kateman, Chemom. Intell. Lab. Syst., 19 (1993) 135-142.
[11] J.R.M. Smits, L.W. Breedveld, M.W.J. Derksen and G. Kateman, Anal. Chim. Acta, 258 (1992) 11-25.