Test is not able to identify the best input variables, but the best
result is included in the top Gamma value group. The standard
error provides valuable information for choosing the group
members. This demonstrates that the Gamma Test is still a
valuable tool for significantly reducing the modeling workload.
The reason for this phenomenon is discussed under the
I. INTRODUCTION
Developing nonlinear models is much more complicated than
developing linear ones. Conventional tools such as cross
correlation and Principal Component Analysis (PCA) are
usually not suitable for nonlinear systems. Natural systems,
such as hydrological processes, are usually complex and
nonlinear. A hydrological modeller has to use a trial-and-error
method to build mathematical models (such as Artificial Neural
Networks, ANN) for different input combinations [1-3]. This is
very time consuming since the modeller needs to calibrate and
test different model structures with all the likely input
combinations. In addition, there is no guidance about what
accuracy the best model is able to achieve. In this study, the
Gamma Test [4], developed by computer scientists at Cardiff
University, is explored for its suitability in reducing model
development workload and providing input data guidance before
specific models are developed (i.e., its result is independent
of the models to be developed). Theoretically, the Gamma Test
is able to provide the best mean square error that can possibly
be achieved using any nonlinear smooth model. Although the
Gamma Test has been applied by some researchers in identifying
data mining
Manuscript received February 4, 2010.
Dawei Han is with the Department of Civil Engineering, University of
Bristol, BS8 1TR, UK (phone: +44 117 3315 739; fax: +44 117 3315 719;
email: d.han@bristol.ac.uk)
Weizhong Yan is with GE Research, USA (email: yan@crd.ge.com)
Alireza Moghaddam Nia is with University of Zabol, Iran (email:
ali.moghaddamnia@gmail.com)
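input-output pairs

$$\{(x_i, y_i),\ 1 \le i \le M\} \tag{1}$$

where,

$$y = f(x) + r \tag{2}$$

$$\delta_M(k) = \frac{1}{M} \sum_{i=1}^{M} \left| x_{N[i,k]} - x_i \right|^2 \quad (1 \le k \le p) \tag{3}$$

where $|\cdot|$ denotes the Euclidean distance, and the

$(\delta_M(k), \gamma_M(k))$

$$\gamma = A\delta + \Gamma \tag{5}$$

$$\gamma_M(k) \to \mathrm{Var}(r) \ \text{in probability as} \ \delta_M(k) \to 0 \tag{6}$$

($\delta = 0$) is the $\Gamma$ value,

$$V_{ratio} = \frac{\Gamma}{\sigma^2(y)} \tag{7}$$

where $\sigma^2(y)$ is the variance of output $y$, which allows a

$$y(x) = \sum_{i=1} w_i\, y_i \tag{8}$$

where $w_i =$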
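As an illustration of how Eqs. (3), (5), (6) and (7) fit together, the following is a minimal sketch of a Gamma Test computation, not the authors' implementation. It assumes NumPy is available, uses a brute-force neighbour search, and assumes the usual Gamma Test definition $\gamma_M(k) = \frac{1}{2M}\sum_{i=1}^{M} |y_{N[i,k]} - y_i|^2$ for the output statistic. The function name `gamma_test`, the default `p=10` and the synthetic data are illustrative choices only.

```python
import numpy as np

def gamma_test(x, y, p=10):
    """Sketch of the Gamma Test: returns (Gamma, gradient A, V-ratio)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                      # treat a 1-D input as M points of dimension 1
    M = len(y)
    # Pairwise squared Euclidean distances between input vectors (brute force, O(M^2)).
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)            # a point is not its own neighbour
    # nn[i, k-1] is the index N[i, k] of the k-th nearest neighbour of x_i.
    nn = np.argsort(d2, axis=1)[:, :p]
    rows = np.arange(M)[:, None]
    delta = d2[rows, nn].mean(axis=0)                        # delta_M(k), Eq. (3)
    gamma = 0.5 * ((y[nn] - y[:, None]) ** 2).mean(axis=0)   # gamma_M(k), assumed definition
    A, Gamma = np.polyfit(delta, gamma, 1)                   # line gamma = A*delta + Gamma, Eq. (5)
    return Gamma, A, Gamma / y.var()                         # V-ratio, Eq. (7)

# Hypothetical usage: a smooth function plus noise of variance 0.01.
rng = np.random.default_rng(0)
x_demo = rng.uniform(0.0, 2.0 * np.pi, 1000)
y_demo = np.sin(x_demo) + rng.normal(0.0, 0.1, 1000)
Gamma, A, v_ratio = gamma_test(x_demo, y_demo)
print(Gamma, A, v_ratio)
```

The intercept of the fitted line is the estimate of $\mathrm{Var}(r)$, so on the synthetic data the first printed value should be close to the noise variance used to generate it.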
V. RESULTS
With four candidate input variables there are $2^4 - 1$ possible
combinations of inputs (i.e., 15 in this study), from which the
best one can be determined by observing the Gamma values, each
of which estimates the best MSE attainable by any modeling
method for unseen input data. Thus, we performed Gamma Tests on
different combinations, varying the number of inputs, as shown
in Table 1. In the table, the minimum value of $\Gamma$ was
observed when all the input variables (W, T, RH, Ed) were used.
In theory, the gradient is considered an indicator of model
complexity, and the V-ratio is a measure of the predictability
of the given outputs using the available inputs. An input data
set with a low Gamma value, in addition to a low gradient and
V-ratio, is considered the best scenario for modeling. Since
there is a lack of quantitative guidance on using the gradient
and V-ratio, only the Gamma values are used in the analysis. It
can be seen that the best result for a well calibrated model
should be an MSE of around 7.28, since this would be the innate
noise level embedded within the data.
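The 15 input combinations can be enumerated as 4-bit masks in the same counting order as the tables. The sketch below is illustrative only: the helper names are hypothetical, and the assumption that the mask bits map to W, T, RH, Ed from left to right is not confirmed by the text.

```python
# Order of the variables within a mask is an assumption, not stated in the text.
INPUTS = ["W", "T", "RH", "Ed"]

def input_masks(n):
    """All non-empty 0/1 masks of length n, in binary counting order."""
    return [format(i, f"0{n}b") for i in range(1, 2 ** n)]

def selected(mask, names=INPUTS):
    """Names of the inputs switched on by a mask such as '1011'."""
    return [name for bit, name in zip(mask, names) if bit == "1"]

masks = input_masks(len(INPUTS))
for no, mask in enumerate(masks, start=1):
    print(no, mask, selected(mask))
```

Each mask would then select the columns passed to a Gamma Test run, so that one Gamma value is obtained per combination.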
TABLE 1 Gamma Tests for 15 input variable combinations
No   Mask   Gamma   Gradient   V-ratio
1    0001   21.1    2723       0.44
2    0010   52.31   -53791     1.11
3    0011   17.73   -3         0.46
4    0100   18.16   637        0.61
5    0101   17.88   240        0.47
6    0110   17.88   -286       0.36
7    0111   17.7    -39        0.37
8    1000   39.6    -5502      0.48
9    1001   9.4     597        0.22
10   1010   25.89   1511       0.45
11   1011   7.3     509        0.20
12   1100   7.44    460        0.12
13   1101   7.37    127        0.15
14   1110   7.46    109        0.10
15   1111   7.28    142        0.14
                              Testing
No   Mask   R^2     MSE       R^2     MSE
1    0001   0.75    20.98     0.759   20.6
2    0010   0.391   51.19     0.412   50.34
3    0011   0.786   18.06     0.785   18.35
4    0100   0.775   18.89     0.779   18.91
5    0101   0.789   17.75     0.789   18.03
6    0110   0.788   17.89     0.788   18.19
7    0111   0.79    17.64     0.79    17.95
8    1000   0.544   38.37     0.542   39.26
9    1001   0.889   9.38      0.885   9.83
10   1010   0.699   25.38     0.687   26.73
11   1011   0.915   7.25      0.896   8.9
12   1100   0.91    7.6       0.904   8.27
13   1101   0.917   7.02      0.907   7.99
14   1110   0.921   6.7       0.905   8.16
15   1111   0.926   6.24      0.907   8.01
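The tabulated values make it possible to check how well the Gamma ranking agrees with the modeling results. The sketch below is a hedged illustration: it copies the Gamma values and the second MSE column (assumed here to be the testing MSE) from the tables, and it uses a top group of five masks, a size taken from Table 4 rather than from any stated rule.

```python
# Gamma value per mask, copied from the Gamma Test table.
gamma = {
    "0001": 21.1, "0010": 52.31, "0011": 17.73, "0100": 18.16, "0101": 17.88,
    "0110": 17.88, "0111": 17.7, "1000": 39.6, "1001": 9.4, "1010": 25.89,
    "1011": 7.3, "1100": 7.44, "1101": 7.37, "1110": 7.46, "1111": 7.28,
}
# Second MSE column of the model results table (assumed to be the testing MSE).
test_mse = {
    "0001": 20.6, "0010": 50.34, "0011": 18.35, "0100": 18.91, "0101": 18.03,
    "0110": 18.19, "0111": 17.95, "1000": 39.26, "1001": 9.83, "1010": 26.73,
    "1011": 8.9, "1100": 8.27, "1101": 7.99, "1110": 8.16, "1111": 8.01,
}

best_by_gamma = min(gamma, key=gamma.get)        # mask with the lowest Gamma value
best_by_test = min(test_mse, key=test_mse.get)   # mask with the lowest testing MSE
top_group = sorted(gamma, key=gamma.get)[:5]     # five lowest Gamma values

print(best_by_gamma, best_by_test, best_by_test in top_group)
# prints: 1111 1101 True
```

Under these assumptions, the mask with the lowest Gamma value is not the one with the lowest testing MSE, but the latter does fall inside the group of lowest Gamma values.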
No   Mask   Gamma   SE
1    0001   21.1    0.25
2    0010   52.31   0.62
3    0011   17.73   0.21
4    0100   18.16   0.21
5    0101   17.88   0.21
6    0110   17.88   0.21
7    0111   17.7    0.21
8    1000   39.6    0.47
9    1001   9.4     0.11
10   1010   25.89   0.31
11   1011   7.3     0.09
12   1100   7.44    0.09
13   1101   7.37    0.09
14   1110   7.46    0.09
15   1111   7.28    0.09
VI. DISCUSSION
TABLE 4 Gamma Tests with SE (Standard Error)
Mask   Gamma   SE     Gamma Range
1011   7.30    0.09   7.21 - 7.39
1100   7.44    0.09   7.35 - 7.53
1101   7.37    0.09   7.28 - 7.46
1110   7.46    0.09   7.37 - 7.55
1111   7.28    0.09   7.19 - 7.37
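The Gamma Range column in Table 4 is consistent with Gamma plus or minus one standard error. The following sketch reproduces that arithmetic and adds an overlap check against the lowest-Gamma mask; the overlap check and the helper names are illustrative assumptions, not a procedure taken from the text. Decimal arithmetic is used so that the two-decimal bounds are exact.

```python
from decimal import Decimal

# (Gamma, SE) per mask, copied from Table 4 as strings to keep two-decimal precision.
table4 = {
    "1011": ("7.30", "0.09"),
    "1100": ("7.44", "0.09"),
    "1101": ("7.37", "0.09"),
    "1110": ("7.46", "0.09"),
    "1111": ("7.28", "0.09"),
}

def gamma_range(gamma, se):
    """Interval Gamma - SE to Gamma + SE."""
    g, s = Decimal(gamma), Decimal(se)
    return g - s, g + s

def overlaps(a, b):
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

ranges = {mask: gamma_range(g, se) for mask, (g, se) in table4.items()}
best = min(table4, key=lambda m: Decimal(table4[m][0]))  # mask with the lowest Gamma

for mask, (lo, hi) in ranges.items():
    print(mask, f"{lo} - {hi}", overlaps(ranges[mask], ranges[best]))
```

A mask whose range overlaps that of the lowest-Gamma mask cannot be separated from it on Gamma values alone, which is one way the standard error can inform the choice of group members.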
VII. CONCLUSIONS
REFERENCES
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]