www.elsevier.com/locate/jss

a Department of Automatic Control, Beijing University of Aeronautics and Astronautics, Beijing 100083, China
b Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong

Received 1 March 2000; received in revised form 1 June 2000; accepted 1 August 2000
Abstract
Previous studies have shown that the neural network approach can be applied to identify defect-prone modules and predict the cumulative number of observed software failures. In this study we examine the effectiveness of the neural network approach in handling dynamic software reliability data overall and present several new findings. Specifically, we find:
1. The neural network approach is more appropriate for handling datasets with `smooth' trends than for handling datasets with large fluctuations.
2. The training results are in general much better than the prediction results.
3. The empirical probability density distribution of the predicted data resembles that of the training data: a neural network can qualitatively predict what it has learned.
4. Due to the essential problems associated with the neural network approach and with software reliability data, more often than not the neural network approach fails to generate satisfactory quantitative results.
© 2001 Elsevier Science Inc. All rights reserved.
Keywords: Software reliability modeling; Neural network; Network architecture; Scaling function; Filtering; Empirical probability density distribution; Software operational profile
1. Introduction
Software reliability modeling is a subject of growing importance. It aims to quantify software reliability status and behavior, and helps to develop reliable software and check software reliability. Quite a few methods have been proposed in software reliability modeling, including regression modeling methods, capture-recapture modeling methods, times-between-failures modeling methods, NHPP modeling methods, operational-profile modeling methods, etc. (Cai, 1998; Lyu, 1996; Musa et al., 1987; Xie, 1991). Each of them has achieved success to some extent in particular cases, and conventional statistical theory plays an essential role. On the other hand, however, it has been realized that numerous factors may affect software reliability behavior, including software development methodology, software development environment, software complexity, software development organization and
☆ Supported by the national key project of China and the national outstanding youth foundation of China.
* Corresponding author. Tel./Fax: +86-10-8231-7328.
E-mail address: kyc@ns.dept3.buaa.edu.cn (K.-Y. Cai).
PII: S0164-1212(01)00027-9
K.-Y. Cai et al. / The Journal of Systems and Software 58 (2001) 47–62
3. Neural networks
An (artificial) neural network is a machine that is designed to model or mimic the way in which the brain performs a particular task or function of interest. It consists of neurons that are connected in a certain manner. A neuron serves as an information processing unit, as depicted in Fig. 3. It has p inputs x_1, x_2, ..., x_p and one output y_k, and consists of three basic elements: a set of synapses or connecting links in terms of weights w_{k1}, w_{k2}, ..., w_{kp}, an adder in terms of a summing junction, and an activation function.

Fig. 3. Non-linear model of a neuron.
Fig. 4. Multilayer perceptron with one hidden layer and one output.

u_k = \sum_{j=1}^{p} w_{kj} x_j, \qquad y_k = \varphi(u_k - \theta_k), \qquad \theta_k \ge 0.
A commonly used neural network is the so-called multilayer feedforward network or multilayer perceptron. In this study we mostly use a multilayer perceptron with only one hidden layer and one output, as shown in Fig. 4, where each circle denotes a neuron. However, the input layer (receiving inputs x_1, x_2, ..., x_p) is a special one. Each neuron of the input layer only receives an input signal and passes it as output without any computation in the neuron. In this way

u_i = \sum_{j=1}^{p} w_{ij} x_j, \qquad h_i = \varphi(u_i), \qquad i = 1, 2, \ldots, q,

v = \sum_{l=1}^{q} c_l h_l, \qquad \hat{y} = \varphi(v).
Here the activation function is taken as the sigmoid

\varphi(u) = \frac{1}{1 + \exp(-u)}.

1 MATLAB is a popular software package used for control systems design.
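The one-hidden-layer perceptron above can be sketched as a plain forward pass. This is an illustrative sketch in Python (not the authors' implementation), using the sigmoid activation and the weight/output equations as given:

```python
import math

def sigmoid(u):
    # Activation function: phi(u) = 1 / (1 + exp(-u))
    return 1.0 / (1.0 + math.exp(-u))

def mlp_forward(x, W, c):
    """Forward pass of a one-hidden-layer perceptron.

    x: inputs x_1..x_p
    W: q rows of hidden-layer weights w_ij
    c: output weights c_1..c_q
    """
    # Hidden layer: h_i = phi(sum_j w_ij * x_j)
    h = [sigmoid(sum(w * xj for w, xj in zip(row, x))) for row in W]
    # Output: y_hat = phi(sum_l c_l * h_l)
    return sigmoid(sum(cl * hl for cl, hl in zip(c, h)))
```

With all weights zero every hidden unit outputs sigmoid(0) = 0.5 and the network outputs 0.5, which is a convenient sanity check.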
with x_i being the actual value, and let x̂_i be the corresponding trained or predicted value.

From Figs. 5–12 and Table 1, we observe:
1. In the 50 training samples of DX1, only 52% of the trained outputs have relative error less than 20%; in the 50 training samples of DX2, 74% of the trained outputs have relative error less than 20%. This suggests that the training accuracy is not high.
2. In the 30 short-term predictions of DX1, only 7% of the predicted values have relative error less than 20%; things are similar for the long-term predictions and for DX2. This suggests that the prediction accuracy is low.
3. Short-term predictions are somewhat better than long-term predictions, but the improvements are minor; this can be explained by the fact that the short-term predictions are poor by themselves.
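The "proportion with relative error at most 20%" figures reported throughout can be reproduced mechanically. A minimal sketch (the function name and the tolerance default are ours, not the paper's):

```python
def proportion_within(actual, predicted, tol=0.20):
    # Fraction of points whose relative error |x_hat - x| / x is at most tol
    errors = [abs(p - a) / a for a, p in zip(actual, predicted)]
    return sum(e <= tol for e in errors) / len(errors)
```

For example, if one of two predictions is within 10% and the other is off by 50%, the proportion is 0.5.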
Table 1
Training and prediction results of DX1 and DX2

Proportion with RE ≤ 20%   Training dataset (%)   Short-term prediction dataset (%)   Long-term prediction dataset (%)
DX1                        52                     7                                   7
DX2                        74                     7                                   3
\hat{x} = \frac{1}{50} \sum_{j=i_0}^{i_0+49} x_j.
Fig. 14. Training and short-term prediction results of the first software reliability dataset DX1 by use of a four-layer network.
(Table fragment. Column headers: Training dataset (%); Short-term prediction dataset (%); Long-term prediction dataset (%). Entries: 10, 30, 16.67, 73, 10, 23.3, 16.7, 70; the row labels were lost in extraction.)
Table 3
Training and prediction results for DX1 with and without the use of a probabilistic statistic

Proportion of dataset with RE ≤ 20%   Training dataset (%)   Short-term prediction dataset (%)   Long-term prediction dataset (%)
p = 50                                53                     6                                   6
p = 51                                51                     6                                   4

Table 4
Training and prediction results for DX1 with and without using difference signals

Proportion of dataset with RE ≤ 20%   Training dataset (%)
Without using difference signals      52
With using difference signals         67

MX_i = \frac{x_i + x_{i+1} + \cdots + x_{i+49}}{50}.
In this way the neural network has 51 neurons in the input layer, and thus (x_i, x_{i+1}, ..., x_{i+49}, MX_i; x_{i+50}) makes up a training sample. Following the same procedure presented in Section 4, using the neural network shown in Fig. 4 with p = 50 and with p = 51, respectively, to process DX1, we arrive at Table 3; the error tolerance bound for the back-propagation algorithm is 0.005. The case of p = 50 means that MX_i is not used, whereas the case of p = 51 corresponds to using MX_i.4 We see that using a probabilistic statistic to mimic the `random' nature of {x_i} does not help.
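The statistic MX_i is a 50-point moving average of the observations. A sketch of the statistic and of assembling the 51-input training sample (0-based indexing; helper names are ours):

```python
def moving_stat(x, i, window=50):
    # MX_i = (x_i + x_{i+1} + ... + x_{i+window-1}) / window
    return sum(x[i:i + window]) / window

def training_sample(x, i, window=50):
    # 51 inputs (x_i, ..., x_{i+49}, MX_i) and one target value x_{i+50}
    inputs = list(x[i:i + window]) + [moving_stat(x, i, window)]
    target = x[i + window]
    return inputs, target
```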
5.4. Difference signals

A possible reason for a neural network not generating satisfactory results is that it is not smart or predictive enough. In order to improve the predictiveness of a neural network, we wonder if difference signals can help.5 We may employ difference signals as input signals to a neural network to `predict' the next failure. Specifically, let

\Delta x_i = x_{i+1} - x_i.

First, we use

\{\Delta x_i, \Delta x_{i+1}, \ldots, \Delta x_{i+48}, \Delta x_{i+49}\}, \qquad i = i_0, i_0+1, \ldots, i_0+49,

to train a multilayer perceptron with one hidden layer, 49 input neurons (for \Delta x_i, \Delta x_{i+1}, \ldots, \Delta x_{i+48}) and one output neuron (for \Delta x_{i+49}), and eventually to generate a \Delta\hat{x}_{i+49}. Then let \tilde{x}_{i+50} = x_{i+49} + \Delta\hat{x}_{i+49}. We may treat \tilde{x}_{i+50} as a predicted value of x_{i+50}. In turn, we use a second multilayer perceptron to process DX1. The procedure is the same as that presented in Section 5.3 except using \tilde{x}_{i+50} to replace MX_i. This leads to Table 4; the corresponding error tolerance bound for the back-propagation algorithm is 0.005. Obviously, using difference signals helps very little.
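The difference-signal construction can be sketched as follows (helper names ours): a trained network predicts the next difference, which is then added back onto the last observation to form the prediction.

```python
def differences(x):
    # Delta x_i = x_{i+1} - x_i
    return [b - a for a, b in zip(x, x[1:])]

def reconstruct_next(x, dx_hat):
    # x_tilde = last observation + predicted next difference
    return x[-1] + dx_hat
```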
4 The results corresponding to p = 50 are slightly different from those presented in Table 1. One cannot guarantee that a neural network will generate the same training and prediction results in two runs.
5 This is inspired by the idea of PID control in control engineering. In a PID controller, `D' denotes the difference and attempts to `predict' the future behavior of a controlled object.
The difference signals are scaled into the unit interval by

\frac{\Delta x_i}{2 \max_{j=i_0,\ldots,i_0+49} |\Delta x_j|} + \frac{1}{2}.
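Under one reading of the scaling formula for the difference signals (our assumption), each difference is shifted and scaled into [0, 1]:

```python
def scale_differences(dx):
    # Map each Delta x_i into [0, 1]: 1/2 + Delta x_i / (2 * max_j |Delta x_j|)
    m = max(abs(d) for d in dx)
    return [0.5 + d / (2.0 * m) for d in dx]
```

The largest positive difference maps to 1, the most negative to 0, and a zero difference to 0.5.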
x_{i+50}/a, where a is a parameter we need to further specify. Following the same procedure presented in Section 4 for processing DX1 and DX2, except adopting a different scaling function, we arrive at Tables 5 and 6.
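The exact functional forms of the logarithmic and exponential scalings are partly lost in this copy; one plausible family (our assumption, with the parameter a as used in Tables 5–7) is:

```python
import math

def scale_linear(x, xmax):
    # Linear scaling into [0, 1]; preserves the data structure
    return x / xmax

def scale_log(x, xmax):
    # Logarithmic scaling: compresses large fluctuations
    return math.log(1.0 + x) / math.log(1.0 + xmax)

def scale_exp(x, a):
    # Exponential scaling with parameter a (e.g. a = 2e6)
    return 1.0 - math.exp(-x / a)
```

All three map non-negative data into [0, 1]; the non-linear forms shrink large values relative to small ones, which is the effect discussed below.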
From Tables 5 and 6 we see that non-linear scaling functions improve the training results of DX1, but have little essential effect on the prediction results of DX1 and on the training and prediction results of DX2. Actually, a linear scaling function does not affect the data structures of DX1 and DX2, whereas non-linear scaling functions do the reverse. The non-linear scaling functions make the large fluctuations of DX1 smaller, or the transformed
Table 5
Training and prediction results of DX1 by use of different scaling functions

Scaling function           Training dataset (%)   Short-term prediction dataset (%)   Long-term prediction dataset (%)
Linear                     52                     7                                   5
Logarithmic                82                     7                                   3
Exponential (a = 2×10^6)   78                     7                                   3
Table 6
Training and prediction results of DX2 by use of different scaling functions

Scaling function           Training dataset (%)   Short-term prediction dataset (%)   Long-term prediction dataset (%)
Linear                     74                     7                                   3
Logarithmic                88                     13                                  7
Exponential (a = 2×10^6)   74                     7                                   7
Table 7
Training and prediction results of DX1 by use of the exponential scaling function with different values of a

a         Training dataset (%)   Short-term prediction dataset (%)   Long-term prediction dataset (%)
50 000    78                     10                                  6.7
2×10^5    92                     10                                  3
2×10^6    78                     7                                   3
t_i = \sum_{j=1}^{i} x_j.

Fig. 15. Cumulating patterns of the first and second software reliability datasets.
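Transforming the inter-failure data DX into the cumulative datasets DT is a running sum of the observations:

```python
from itertools import accumulate

def cumulate(x):
    # t_i = x_1 + x_2 + ... + x_i
    return list(accumulate(x))
```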
Table 8
Training and prediction results of DT1 and DT2

Dataset   Training dataset (%)   Short-term prediction dataset (%)   Long-term prediction dataset (%)
DT1       100                    93                                  40
DT2       100                    77                                  57
Fig. 16. Training and short-term prediction results of the first software failure dataset DX1 via its cumulating dataset DT1.
Table 9
Training and prediction results of DX1 and DX2 by use of DT1 and DT2

Dataset              Training dataset (%)   Short-term prediction dataset (%)
DX1 by use of DT1    42                     10
DX2 by use of DT2    47                     3
10 into Figs. 18 and 19, respectively, to make up histograms. That is, we generate histograms for the datasets used for training the neural networks and for the short-term results predicted by the trained neural networks, respectively. We see that the empirical probability density histograms of the training data are similar to those of the short-term prediction results. Actually, we realize this is also the case even if non-linear scaling functions are employed (refer to Section 6). In other words, the qualitative pattern of the predictions mimics that of the training data in terms of probability distributions. What a neural network can predict is what it has learned. Neural networks demonstrate satisfactory qualitative behavior. This coincides with our previous observation
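The empirical probability density histograms compared here can be computed with equal-width bins. A minimal sketch (the bin count and range handling are our choices):

```python
def density_histogram(data, bins, lo, hi):
    # Empirical probability mass per equal-width bin over [lo, hi]
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in data:
        k = min(int((v - lo) / width), bins - 1)  # clamp the upper endpoint
        counts[k] += 1
    return [c / len(data) for c in counts]
```

Comparing the histogram of the training window with that of the predicted values shows the qualitative resemblance discussed above.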
Fig. 21. Behavior of the first software failure count dataset DK1 with m = 80 with respect to failure time.
Table 10
Training and prediction results of DK1 (k̂_i = predicted value of k_i)

Proportion of data with k_i = k̂_i:
m     Training dataset (%)   Short-term prediction dataset (%)   Long-term prediction dataset (%)
40    93                     20                                  20
80    93                     52                                  64
100   89                     66                                  69
Fig. 22. Behavior of the second software failure count dataset DK2 with m = 80 with respect to failure time.

Table 11
Training and prediction results of DK2 (k̂_i = predicted value of k_i)

Proportion of data with k_i = k̂_i:
m     Training dataset (%)   Short-term prediction dataset (%)   Long-term prediction dataset (%)
40    67                     20                                  0
80    83                     16                                  8
100   77                     29                                  31
120   92                     32                                  38
Tables 10 and 11 suggest that with increasing m, the prediction results are improved. This can be interpreted as follows: with increasing m, the number of observed software failures in each time interval reduces and tends to be zero or one. Then the prediction task looks like a problem of classification, and a neural network can do this well. However, the different pattern of DK2 makes the short-term prediction results for DK2 with m = 80 no better than those for DK2 with m = 40. But in general, increasing m improves the prediction behavior.
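The count datasets DK are formed by partitioning the observation period into m equal intervals and counting the failures in each; with large m most counts are 0 or 1, which is what turns prediction into a near-classification task. A sketch (the function name is ours):

```python
def failure_counts(failure_times, m, horizon):
    # Split [0, horizon] into m equal intervals; k_i = failures in interval i
    width = horizon / m
    counts = [0] * m
    for t in failure_times:
        k = min(int(t / width), m - 1)  # clamp t == horizon into the last bin
        counts[k] += 1
    return counts
```

Doubling m halves the interval width, so each per-interval count shrinks toward 0 or 1.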
9.2. Effects of filtering
Fig. 23. Training and short-term prediction results of the first software failure count dataset DK1 with m = 40.
\hat{k}_i = \hat{kc}_i - \hat{kc}_{i-1}.
We see that DKC1 can be processed by the neural network approach very well. This coincides with the previous observations of Karunanithi et al. (1992a,b). However, things are somewhat different for DKC2: the prediction behavior varies with m and may be unsatisfactory. On the other hand, referring to Tables 10 and 11, using {k̂c_i} to determine {k̂_i} does not improve {k̂_i}.
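Recovering the per-interval counts from predicted cumulative counts is a first difference, k_i = kc_i − kc_{i−1} (with kc_0 = 0 assumed):

```python
def counts_from_cumulative(kc):
    # k_i = kc_i - kc_{i-1}, taking kc_0 = 0
    return [b - a for a, b in zip([0] + list(kc[:-1]), kc)]
```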
10. General discussions
Table 12
Basic results of processing DKC1 (last two columns: k̂_i derived from k̂c_i)

m     DKC1 Training (%)   DKC1 Short-term (%)   DKC1 Long-term (%)   k̂_i Training (%)   k̂_i Short-term (%)
40    100                 100                   100                  27                 20
80    100                 100                   100                  27                 56
100   100                 100                   100                  34                 51
Table 13
Basic results of processing DKC2 (last two columns: k̂_i derived from k̂c_i)

m     DKC2 Training (%)   DKC2 Short-term (%)   DKC2 Long-term (%)   k̂_i Training (%)   k̂_i Short-term (%)
40    100                 80                    20                   13                 0
80    100                 72                    20                   13                 12
100   100                 29                    26                   20                 26
120   100                 45                    57                   20                 15
Table 14

i    x_i
1    320.00
2    1439.00
3    9000.00
4    2880.00
5    5700.00
6    21800.00
7    26800.00
8    113540.00
9    112137.00
10   660.00
11   2700.00
12   28793.00
13   2173.00
14   7263.00
15   10865.00
16   4230.00
17   8460.00
18   14805.00
19   11844.00
20   5361.00
21   6553.00
22   6499.00
23   3124.00
24   51323.00
25   17017.00
26   1890.00
27   5400.00
28   62313.00
29   24826.00
30   26335.00
31   363.00
32   13989.00
33   15058.00
34   32377.00
35   41632.00
36   41632.00
37   4160.00
38   82040.00
39   13189.00
40   3426.00
41   5833.00
42   640.00
43   640.00
44   2880.00
45   110.00
46   22080.00
47   60654.00
48   52163.00
49   12546.00
50   784.00
51   10193.00
52   7841.00
53   31365.00
54   24313.00
55   298890.00
56   1280.00
57   22099.00
58   19150.00
i    x_i
83   178200.00
84   144000.00
85   639200.00
86   86400.00
87   288000.00
88   320.00
89   57600.00
90   28800.00
91   18000.00
92   88640.00
93   43200.00
94   4160.00
95   3200.00
96   42800.00
97   43600.00
98   10560.00
99   115200.00
100  86400.00
101  57699.00
102  28800.00
103  432000.00
104  345600.00
105  115200.00
106  44494.00
107  10506.00
108  177240.00
109  241487.00
110  143028.00
111  273564.00
112  189391.00
113  172800.00
114  21600.00
115  64800.00
116  302400.00
117  752188.00
118  86400.00
119  100800.00
120  194400.00
121  115200.00
122  64800.00
123  3600.00
124  230400.58
125  259200.00
126  183600.00
127  3600.00
128  144000.00
129  14400.00
130  86400.00
131  110100.00
132  28800.00
133  43200.00
134  57600.00
135  46800.00
136  950400.00
137  400400.00
138  883800.00
139  273600.00
140  432000.00
K.-Y. Cai et al. / The Journal of Systems and Software 58 (2001) 4762
61
Table 14 (continued)

i    x_i
59   2611.00
60   39170.00
61   55794.00
62   42632.00
63   267600.00
64   87074.00
65   149606.00
66   14400.00
67   34560.00
68   39600.00
69   334395.00
70   296015.00
71   177395.00
72   214622.00
73   156400.00
74   16800.00
75   10800.00
76   267000.00
77   34513.00
78   7680.00
79   37667.00
80   11100.00
81   187200.00
82   18000.00
141  864000.00
142  202600.00
143  203400.00
144  277680.00
145  105000.00
146  580080.00
147  4533960.00
148  432000.00
149  1411200.00
150  172800.00
151  86400.00
152  1123200.00
153  1555200.00
154  777600.00
155  1296000.00
156  1872000.00
157  335600.00
158  921600.00
159  1036800.00
160  1728000.00
161  777600.00
162  57600.00
163  17280.00
i    x_i
97   884537.00
98   893733.00
99   950561.00
100  951988.00
101  956058.00
102  969965.00
103  978392.00
104  2009086.00
105  2010192.00
106  2012919.00
107  2014680.00
108  2021405.00
109  2788848.00
110  3025774.00
111  3036425.00
112  3038077.00
113  3032470.00
114  522.00
115  193947.00
116  436124.00
117  438474.00
118  931130.00
119  932493.00
120  1142273.00
121  1142142.00
122  170.00
123  196.00
124  6037.00
125  248967.00
126  509844.00
127  849549.00
128  853489.00
i    x_i
33   372627.00
34   996308.00
35   5152.00
36   5838.00
37   5840.00
38   5868.00
39   5656.00
40   897.00
41   19872.00
42   109107.00
43   211412.00
44   1208852.00
45   1211097.00
46   1215349.00
47   5395.00
48   6697.00
49   11840.00
50   61291.00
51   63205.00
52   68671.00
53   69939.00
54   69998.00
55   182314.00
56   183681.00
57   169033.00
58   169635.00
59   172938.00
60   222835.00
61   223496.00
62   224238.00
63   229871.00
64   495696.00
65   852460.00
66   1008395.00
67   1009559.00
68   1021474.00
69   1022778.00
70   1029531.00
71   1060974.00
72   1095758.00
73   1098899.00
74   1103357.00
75   1105899.00
76   1133337.00
77   1135500.00
78   1167069.00
79   1194139.00
80   1197940.00
81   598.00
82   598.00
83   37.00
84   858084.00
85   1217027.00
86   1462761.00
87   1617029.00
88   1617030.00
89   1620637.00
90   1618435.00
91   293.00
92   2161.00
93   3026.00
94   3117.00
95   4428.00
96   5954.00
i    x_i
129  1114172.00
130  1965556.00
131  1745.00
132  64236.00
133  430794.00
134  455467.00
135  1336603.00
136  1895883.00
137  2128076.00
138  2239002.00
139  2239429.00
140  2252299.00
141  59076.00
142  2646.00
143  52920.00
144  69995.00
145  70489.00
146  1939.00
147  3329.00
148  6519.00
149  10341.00
150  14604.00
151  19913.00
152  363130.00
153  364654.00
154  365201.00
155  367205.00
156  368904.00
157  624031.00
158  1404616.00
159  1548712.00
160  1569953.00
161  318386.00
162  938843.00
163  1472080.00
164  2058785.00
165  2160478.00
166  175109.00
167  577275.00
168  587033.00
169  1814556.00
170  1823969.00
171  1874995.00
172  1993574.00
173  2056510.00
174  19299.00
175  266187.00
176  351597.00
177  90093.00
178  564758.00
179  589996.00
180  1139509.00
181  44230.00
182  49789.00
183  79269.00
184  139714.00
185  311915.00
186  322874.00
187  530403.00
188  530403.00
189  530403.00
190  530403.00
191  525654.00
Table 15
DX2: Iyer and Lee's data (1996)

i    x_i
1    21804.00
2    6358.00
3    27294.00
4    127631.00
5    189694.00
6    3518.00
7    12210.00
8    14502.00
9    6562.00
10   97212.00
11   22276.00
12   1078.00
13   1169.00
14   656.00
15   759.00
16   489.00
17   392.00
18   125.00
19   19919.00
20   390250.00
21   775.00
22   2777.00
23   3626.00
24   6400.00
25   10602.00
26   13974.00
27   19877.00
28   21377.00
29   24233.00
30   25312.00
31   261648.00
32   356934.00
References
Cai, K.Y., 1998. Software Defect and Operational Profile Modeling. Kluwer Academic Publishers, Dordrecht.
Iyer, R.K., Lee, I., 1996. Measurement-based analysis of software reliability. In: Lyu, M.R. (Ed.), Handbook of Software Reliability Engineering. McGraw-Hill, New York, pp. 303–358.
Karunanithi, N., 1993. A neural network approach for software reliability growth modeling in the presence of code churn. In: Proceedings of the Fourth International Symposium on Software Reliability Engineering, pp. 310–317.
Karunanithi, N., Malaiya, Y.K., 1992. The scaling problem in neural networks for software reliability prediction. In: Proceedings of the Third International Symposium on Software Reliability Engineering, pp. 76–82.
Karunanithi, N., Whitley, D., Malaiya, Y.K., 1992a. Prediction of software reliability using connectionist models. IEEE Transactions on Software Engineering 18 (7), 563–574.
Karunanithi, N., Whitley, D., Malaiya, Y.K., 1992b. Using neural networks in reliability prediction. IEEE Software 9 (4), 53–59.
Khoshgoftaar, T.M., Allen, E.B., Hudepohl, J.P., Aud, S.J., 1997. Application of neural networks to software quality modeling of a very large telecommunications system. IEEE Transactions on Neural Networks 8 (4), 902–909.
Khoshgoftaar, T.M., Lanning, D.L., 1995. A neural network approach for early detection of program modules having high risk in the maintenance phase. Journal of Systems and Software 29, 85–91.
Khoshgoftaar, T.M., Pandya, A.S., More, H.B., 1992. A neural network approach for predicting software development faults. In: Proceedings of the Third International Symposium on Software Reliability Engineering, pp. 83–89.
Khoshgoftaar, T.M., Lanning, D.L., Pandya, A.S., 1993. A neural network modeling for detection of high-risk programs. In: Proceedings of the Fourth International Symposium on Software Reliability Engineering, pp. 302–309.
Khoshgoftaar, T.M., Lanning, D.L., Pandya, A.S., 1994. A comparative study of pattern recognition techniques for quality evaluation of telecommunications software. IEEE Journal on Selected Areas in Communications 12 (2), 279–291.
Khoshgoftaar, T.M., Szabo, R.M., 1996. Using neural networks to predict software faults during testing. IEEE Transactions on Reliability 45 (3), 456–462.
Khoshgoftaar, T.M., Szabo, R.M., Guasti, P.J., 1995. Exploring the behavior of neural network software quality models. Software Engineering Journal, 89–96.
Lyu, M.R., 1996. Handbook of Software Reliability Engineering. McGraw-Hill, New York.
Musa, J.D., 1979. Software Reliability Data. Bell Telephone Laboratories, London.
Musa, J.D., Iannino, A., Okumoto, K., 1987. Software Reliability: Measurement, Prediction, Application. McGraw-Hill, New York.
Neufelder, A.M., 1993. Ensuring Software Reliability. Marcel Dekker, New York.
Schneidewind, N.F., 1993. Software reliability model with optimal selection of failure data. IEEE Transactions on Software Engineering 19 (11), 1095–1104.
Sherer, S.A., 1995. Software fault prediction. Journal of Systems and Software 29, 97–105.