…discretization intervals for a rule-based genetic learning system
Introduction

The application of Genetic Algorithms (GA) [10, 8] to classification problems is usually known as Genetic Based Machine Learning (GBML), and traditionally it has been addressed from two different points of view: the Pittsburgh approach and the Michigan approach, early exemplified by LS-1 [20] and CS-1 [11], respectively.

The classical knowledge representation used in these systems is a set of rules where the antecedent is defined by a prefixed finite number of intervals to handle real-valued attributes. The performance of these systems is tied to the right choice of those intervals.
In this paper we use a rule representation with adaptive discrete intervals. These intervals are split and merged through the evolution process that drives the training stage. This approach avoids the higher computational cost of the approaches which work directly with real values, and finds a good discretization by expanding the search space with small intervals only when necessary. This representation was introduced in [1], and the work presented in this paper is its evolution, mainly focused on generalizing the approach and simplifying the tuning needed for each domain.
This rule representation is compared across different domains against the traditional discrete representation with fixed intervals. The number and size of the fixed intervals are obtained with two methods: (1) simple uniform-width intervals, and (2) the intervals produced by the Fayyad & Irani method [7], a well-known discretization algorithm. The aim of this comparison is two-fold: to measure both the accuracy performance and the computational cost.
The paper is structured as follows. Section 2 presents some related work. Then, we describe the framework of our classifier system in section 3. The adaptive intervals rule representation is explained in section 4. Next, section 5 describes the test suite used in the comparison. The results obtained are summarized in section 6. Finally, section 7 discusses the conclusions and some further work.
Related work
There are several approaches to handling real-valued attributes in the Genetic Based Machine Learning (GBML) field. Early approaches use discrete rules with a large number of prefixed uniform discretization intervals. However, this approach has the problem that the search space grows exponentially, slowing the evolutionary process without a clear accuracy improvement of the solution [2].

Lately, several alternatives to the discrete rules have been presented. There are rules composed of real-valued intervals (XCSR [22], [4], COGITO [18]). MOGUL [5] uses a fuzzy reasoning method that sequentially generates: (1) fuzzy rules, and then (2) fuzzy membership functions. Recently, GALE [15] proposed a knowledge-independent method for learning other knowledge representations such as instance sets or decision trees. All these alternatives present better performance, but usually they also have a higher computational cost [18].

A third approach is to use a heuristic discretization algorithm. Some of these methods work with information entropy [7], the χ2 statistic [14] or multi-dimensional non-uniform discretization [13]. These algorithms are usually more accurate and faster than the uniform discretization. However, they suffer from a lack of robustness across some domains [1].
Framework
In this section we describe the main features of our classifier system. GAssist (Genetic Algorithms based claSSIfier sySTem) [9] is a Pittsburgh-style classifier system based on GABIL [6]. Directly from GABIL we have borrowed the representation of the discrete rules (rules with conjunctive normal form (CNF) predicates), the semantically correct crossover operator and the fitness computation (squared accuracy).
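To make the borrowed GABIL elements concrete, the following C++ sketch shows a discrete rule whose antecedent is one bitstring per attribute acting as a CNF predicate, and the squared-accuracy fitness of an ordered rule set. It is an illustration only; the type and function names (Rule, Example, squaredAccuracyFitness) are ours and not taken from GAssist.

#include <vector>
#include <cstddef>

struct Rule {
    std::vector<std::vector<bool>> attr;  // attr[a][i]: interval i of attribute a is accepted
    int predictedClass;
};

struct Example {
    std::vector<int> interval;            // discretized value (interval index) of each attribute
    int cls;
};

// Conjunction over attributes, disjunction over the accepted intervals of each attribute.
bool matches(const Rule& r, const Example& e) {
    for (std::size_t a = 0; a < r.attr.size(); ++a)
        if (!r.attr[a][e.interval[a]]) return false;
    return true;
}

// An individual is an ordered rule set; fitness is the squared training accuracy.
double squaredAccuracyFitness(const std::vector<Rule>& ruleSet,
                              const std::vector<Example>& train) {
    std::size_t correct = 0;
    for (const Example& e : train)
        for (const Rule& r : ruleSet)
            if (matches(r, e)) {                    // first matching rule decides (decision list)
                if (r.predictedClass == e.cls) ++correct;
                break;
            }
    double acc = static_cast<double>(correct) / train.size();
    return acc * acc;                               // squared accuracy, as in GABIL
}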
Matching strategy: The matching process follows an "if ... then ... else if ... then ..." structure, usually called a Decision List [19].
Mutation operators: The system manipulates variable-length individuals, which makes it harder to tune the classic gene-based mutation probability. In order to simplify this tuning, we define pmut as the probability of mutating an individual. When an individual is selected for mutation (based on pmut), a random gene is chosen inside its chromosome for mutation.
– Rule deletion: This operator deletes the rules of an individual that do not match any training example. The deletion is done after the fitness computation and has two constraints: (a) the process is only activated after a predefined number of iterations, to prevent a massive diversity loss, and (b) the number of rules of an individual never goes below a lower threshold. This threshold is set to the number of classes of the domain.
– Selection bias using the individual size: Selection is guided as usual by the fitness (the accuracy). However, it also gives a certain degree of relevance to the size of the individuals, following a policy similar to multi-objective systems. We use tournament selection because its local behavior lets us implement this policy. The criterion of the tournament is given by an operator called "size-based comparison" [2]. This operator considers two individuals similar if their fitness difference is below a certain threshold (dcomp). In that case it selects the individual with the smaller number of rules.
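A minimal C++ sketch of this tournament criterion, assuming a plain pairwise comparison between two competitors (the names Individual and sizeBasedBetter are ours, not the system's):

#include <cstddef>

struct Individual {
    double fitness;
    std::size_t numRules;
};

// Returns true if 'a' should win the tournament against 'b'.
bool sizeBasedBetter(const Individual& a, const Individual& b, double dComp) {
    double diff = a.fitness - b.fitness;
    if (diff < 0) diff = -diff;
    if (diff < dComp)                       // similar fitness: prefer the smaller rule set
        return a.numRules < b.numRules;
    return a.fitness > b.fitness;           // otherwise: plain fitness comparison
}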
Adaptive intervals rule representation

This section describes the rule representation based on discrete rules with adaptive intervals. First we describe the problems that traditional discrete rules present. Then, we explain the proposed adaptive intervals rules and the changes introduced in order to enable the GA to use them.
4.1 Discrete rules and unnecessary search space growth
The traditional approach to solving problems with real-valued attributes using discrete rules relies on a discretization process. This discretization can be done with algorithms that determine the discretization intervals by analyzing the training information, or with a simpler alternative such as a uniform-width intervals discretization.
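As a point of reference for the simpler baseline, the following small function sketches uniform-width discretization: a value is mapped to the index of the equal-width interval that contains it. It is illustrative only and not taken from any of the systems compared here.

#include <cmath>
#include <algorithm>

// Map a real value to one of nIntervals equal-width intervals over [minVal, maxVal].
int uniformInterval(double value, double minVal, double maxVal, int nIntervals) {
    double width = (maxVal - minVal) / nIntervals;
    int idx = static_cast<int>(std::floor((value - minVal) / width));
    return std::clamp(idx, 0, nIntervals - 1);   // keep boundary values inside [0, n-1]
}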
In the latter method, the way to increase the accuracy of the solution is to increase the number of intervals. This solution brings a big problem, because the search space to explore grows exponentially as more intervals are added. The accuracy improvement expected from increasing the number of intervals sometimes does not appear, because the GA spends too much time exploring areas of the search space that do not need to be explored.
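To make this growth concrete (an illustrative calculation based on the GABIL bit-per-interval encoding described in section 3, not a result from the experiments): with n attributes discretized into d intervals each, every rule antecedent is one of 2^(dn) possible bitstrings, so doubling the number of intervals per attribute squares the size of the antecedent space, while the attainable accuracy may barely change.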
If we find a correct and minimal set of intervals, the solution accuracy will probably increase without a huge increase of the computational cost.
4.2 Adaptive intervals rules

To avoid this problem we propose a rule representation with discrete adaptive intervals, where the discretization intervals are not fixed. These intervals evolve through the iterations, merging and splitting.

To control the computational cost and the growth of the search space, we define the following constraints:
– A number of "low level" uniform and static intervals, called micro-intervals, is defined for each attribute.
– The adaptive intervals are built by joining together micro-intervals.
– When we split an interval, we select a random point in its micro-intervals to break it.
– When we merge two intervals, the value of the resulting interval is taken from the one which has more micro-intervals. If both have the same number of micro-intervals, the value is chosen randomly.
– The number and size of the initial intervals are selected randomly.
The adaptive intervals as well as the split and merge operators are shown in figure 1.
Fig. 1. The adaptive intervals representation and the split and merge operators (rule set, rules with their class, intervals built from micro-intervals, interval values, and the cut point used by the split).
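The following C++ sketch illustrates one possible encoding of an attribute term as a list of intervals measured in micro-intervals, together with split and merge operators that follow the constraints above. It is an assumption-laden illustration (for instance, it assumes both halves of a split keep the parent interval's value), not the authors' implementation.

#include <vector>
#include <random>

struct Interval {
    int microIntervals;   // how many contiguous micro-intervals it spans
    bool value;           // whether the interval belongs to the predicate (bit 1/0)
};

using AttributeTerm = std::vector<Interval>;   // ordered left to right, spans all micro-intervals

// Split a randomly chosen interval at a random micro-interval cut point.
void splitInterval(AttributeTerm& term, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, term.size() - 1);
    std::size_t i = pick(rng);
    if (term[i].microIntervals < 2) return;            // nothing to split
    std::uniform_int_distribution<int> cut(1, term[i].microIntervals - 1);
    int left = cut(rng);
    Interval right{term[i].microIntervals - left, term[i].value};
    term[i].microIntervals = left;                     // both halves keep the parent's value (assumption)
    term.insert(term.begin() + i + 1, right);
}

// Merge a randomly chosen interval with its right neighbour.
void mergeIntervals(AttributeTerm& term, std::mt19937& rng) {
    if (term.size() < 2) return;
    std::uniform_int_distribution<std::size_t> pick(0, term.size() - 2);
    std::size_t i = pick(rng);
    bool value;
    if (term[i].microIntervals != term[i + 1].microIntervals)
        value = (term[i].microIntervals > term[i + 1].microIntervals)
                    ? term[i].value : term[i + 1].value;   // value of the larger interval
    else
        value = (rng() & 1u) ? term[i].value : term[i + 1].value;  // tie: choose randomly
    term[i].microIntervals += term[i + 1].microIntervals;
    term[i].value = value;
    term.erase(term.begin() + i + 1);
}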
Test suite
This section summarizes the tests done in order to evaluate the accuracy and efficiency of the method presented in this paper. We also compare it with some alternative methods. The tests were conducted using several machine learning problems, which we also describe.
For the discrete representation with fixed intervals we use two discretization methods: the simple uniform-width intervals method and the Fayyad & Irani method [7].
We analyze the adaptive intervals approach with two types of runs. The first one assigns the same number of micro-intervals to all the attribute terms of the individuals. We call this type of run adaptive. In the second one, attributes with a different number of micro-intervals coexist in the same population. We call this type multi-adaptive.
The GA parameters are shown in table 2. The reader can appreciate that the sizing of psplit and pmerge is the same for all the problems except the tao problem. For that problem, giving the same value to pmerge and psplit produces solutions with too few rules and intervals, as well as less accurate results than those obtained with the configuration shown in table 2. This is an issue that needs further study.
Another important issue of the psplit and pmerge probabilities is that for some of the domains they are greater than 1. This means that for these domains at least one split and one merge operation will surely be applied to each individual of the population. Thus, psplit and pmerge become expected values instead of probabilities. This tuning reduces the number of iterations needed.
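One way to apply such a rate, shown here only as a hedged illustration because the paper does not specify the exact mechanism, is to perform floor(p) operations unconditionally plus one more with probability equal to the fractional part, so that the expected number of operations per individual equals p:

#include <cmath>
#include <random>

// Number of split (or merge) operations to apply to one individual,
// given a rate p that may be greater than 1.
int numOperations(double p, std::mt19937& rng) {
    int guaranteed = static_cast<int>(std::floor(p));
    double fractional = p - guaranteed;
    std::bernoulli_distribution extra(fractional);
    return guaranteed + (extra(rng) ? 1 : 0);
}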
Results
In this section we present the results obtained. The aim of the tests was to compare the methods in three aspects: the accuracy and the size of the solutions, as well as the computational cost. For each method and test problem we show the average and standard deviation values of: (1) the cross-validation accuracy, (2) the size of the best individual in number of rules and intervals per attribute, and (3) the execution time in seconds. The tests were executed on an AMD Athlon 1700+ running the Linux operating system, with the system implemented in C++.
The results were also analyzed using the two-sided t-test [23] to determine if the two adaptive methods outperform the other ones with a significance level of 1%. Finally, for each configuration, test and fold, 15 runs using different random seeds were done. Results are shown in table 3. The column titled t-test shows a • beside the Uniform or Fayyad & Irani method if it was outperformed by the adaptive methods.
Table 2. GA parameters: general settings, parameter codes and problem-specific values.

Parameter                                              Value
Crossover probability                                  0.6
Iter. of rule deletion activation                      30
Iter. of size comparison activation                    30
Sets of micro-intervals in the multi-adaptive test     5,6,7,8,10,15,20,25
Tournament size                                        3
Population size                                        300
Probability of mutating an individual                  0.6

Code      Parameter
#iter     Number of GA iterations
dinterv   Number of intervals in the uniform-width discrete rules
ainterv   Number of micro-intervals in the adaptive test
dcomp     Distance parameter in the "size-based comparison" operator
psplit    Probability of splitting an individual (one of its intervals)
pmerge    Probability of merging an individual (one of its intervals)

Problem   #iter   dinterv   ainterv   dcomp    pmerge   psplit
tao       600     12        48        0.001    1.3      2.6
pima      500     4         8         0.01     0.8      0.8
iris      400     10        10        0.02     0.5      0.5
glass     750     4         8         0.015    1.5      1.5
breast    325     5         10        0.01     3.2      3.2
bps       500     4         10        0.015    1.7      1.7
mamm      500     2         5         0.01     1        1
lrn       700     5         10        0.01     1.2      1.2
The adaptive methods were never outperformed in the tests done, showing a good robustness.

The results are summarized using the ranking in table 4. The ranking for each problem and method is based on the accuracy. The global rankings are computed by averaging the problem rankings.
Table 3 shows that in two of the tests the best performing method was the Fayyad & Irani interval discretization technique. However, in the rest of the tests its performance is lower, showing a lack of robustness across different domains. The two adaptive tests achieved the best results of the ranking. Nevertheless, the goal of improving the rule representation with the multi-adaptive configuration has not been achieved: it is only better than the original adaptive configuration in three of the eight test problems. The computational cost is clearly the main drawback of the adaptive intervals representation; the Fayyad & Irani method is on average 2.62 times faster.
Table 3. Mean and deviation of the accuracy (percentage of correctly classified examples), number of rules, intervals per attribute and execution time for each method tested. Bold entries show the method with the best results for each test problem. A • mark denotes a significant out-performance based on a t-test.
Problem  Configuration   Accuracy     Number of rules   Intervals per attribute   Time (s)
tao      Uniform         93.7±1.2     8.8±1.6           8.3±0.0                   36.0±3.5
         Fayyad          87.8±1.1     3.1±0.3           3.4±0.1                   24.2±1.4
         Adaptive        94.6±1.3     22.5±5.6          7.7±0.4                   96.6±14.7
         Multi-Adaptive  94.3±1.0     19.5±4.9          6.0±0.6                   94.5±13.9
pima     Uniform         73.8±4.1     6.3±2.2           3.7±0.0                   23.2±2.8
         Fayyad          73.6±3.1     6.6±2.6           2.3±0.2                   26.4±3.0
         Adaptive        74.8±3.5     6.2±2.6           2.0±0.4                   56.2±9.4
         Multi-Adaptive  74.4±3.1     5.8±2.2           1.9±0.4                   59.7±8.9
iris     Uniform         92.9±2.7     3.8±1.1           8.2±0.0                   5.2±0.7
         Fayyad          94.2±3.0     3.2±0.6           2.8±0.1                   5.5±0.1
         Adaptive        94.9±2.3     3.3±0.5           1.3±0.2                   9.2±0.4
         Multi-Adaptive  96.2±2.2     3.6±0.9           1.3±0.2                   9.0±0.8
glass    Uniform         60.5±8.9     8.7±1.8           3.7±0.0                   13.9±1.5
         Fayyad          65.7±6.1     8.1±1.4           2.4±0.1                   14.0±1.1
         Adaptive        64.6±4.7     5.9±1.7           1.7±0.2                   35.1±5.2
         Multi-Adaptive  65.2±4.1     6.7±2.0           1.8±0.2                   38.4±5.0
breast   Uniform         94.8±2.6     4.8±2.5           4.6±0.0                   6.5±1.0
         Fayyad          95.2±1.8     4.1±0.8           3.6±0.1                   5.8±0.4
         Adaptive        95.4±2.3     2.7±1.0           1.8±0.2                   15.7±2.1
         Multi-Adaptive  95.3±2.3     2.6±0.9           1.7±0.2                   17.4±1.5
bps      Uniform         77.6±3.3     15.0±7.0          3.9±0.0                   50.8±9.0
         Fayyad          80.0±3.1     7.1±3.8           2.4±0.1                   37.7±6.0
         Adaptive        80.3±3.5     4.7±3.0           2.1±0.4                   106.6±21.1
         Multi-Adaptive  80.1±3.3     5.1±2.0           2.0±0.3                   115.9±20.5
mamm     Uniform         63.2±9.9     2.6±0.5           2.0±0.0                   7.8±1.0
         Fayyad          65.3±11.1    2.3±0.5           2.0±0.1                   8.5±0.7
         Adaptive        65.8±5.3     4.4±1.7           1.8±0.2                   27.6±4.9
         Multi-Adaptive  65.0±6.1     4.4±1.9           1.9±0.2                   27.4±5.5
lrn      Uniform         64.7±4.9     17.8±5.1          4.9±0.0                   29.2±4.0
         Fayyad          67.5±5.1     14.3±5.0          4.4±0.1                   26.5±3.4
         Adaptive        66.1±4.6     14.0±4.6          3.6±0.3                   58.9±7.9
         Multi-Adaptive  66.7±4.1     11.6±4.1          3.4±0.2                   53.9±7.2
Table 4. Accuracy-based ranking of the methods for each test problem (tao, pima, iris, glass, breast, bps, mamm, lrn) and the global average ranking.
Conclusions and further work

In some of the test domains the fixed-intervals methods obtain slightly better accuracy than the adaptive intervals; nevertheless, the performance increase is not significant. On the other hand, when the adaptive intervals outperform the other methods, the performance increase is higher, showing a better degree of robustness.
The overhead of evolving discretization intervals and rules at the same time is quite significant, and it is the main drawback of the approach. Besides the cost of the representation itself (our implementation uses twice the memory of the discrete representation for the same number of intervals), the main difference is the significant reduction of the search space achieved by a heuristic discretization.
Further work should use the knowledge provided by the discretization techniques in order to reduce the computational cost of the adaptive intervals representation. This should be achieved without losing robustness. Another important point for further study is how the values of psplit and pmerge affect the behavior of the system, in order to simplify the tuning needed for each domain.
Finally, it would also be interesting to compare the adaptive intervals rule representation with representations that deal directly with real-valued attributes, like the ones described in the related work section. This comparison should follow the same criteria used here: comparing both the accuracy and the computational cost.
Acknowledgments

The authors acknowledge the support provided under grant numbers 2001FI 00514, CICYT/Tel08-0408-02 and FIS00/0033-02. The results of this work were partially obtained using equipment cofunded by the Direcció General de Recerca de la Generalitat de Catalunya (D.O.G.C. 30/12/1997). Finally, we would like to thank Enginyeria i Arquitectura La Salle for their support to our research group.
References

1. Jaume Bacardit and Josep M. Garrell. Evolution of adaptive discretization intervals for a rule-based genetic learning system. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2002) (to appear), 2002.
2. Jaume Bacardit and Josep M. Garrell. Métodos de generalización para sistemas clasificadores de Pittsburgh. In Proceedings of the "Primer Congreso Iberoamericano de Algoritmos Evolutivos y Bioinspirados (AEB'02)", pages 486-493, 2002.
3. C. Blake, E. Keogh, and C. Merz. UCI repository of machine learning databases, 1998. (www.ics.uci.edu/mlearn/MLRepository.html).
4. A. L. Corcoran and S. Sen. Using real-valued genetic algorithms to evolve rule sets for classification. In Proceedings of the IEEE Conference on Evolutionary Computation, pages 120-124, 1994.
5. O. Cordón, M. del Jesus, and F. Herrera. Genetic learning of fuzzy rule-based classification systems co-operating with fuzzy reasoning methods. International Journal of Intelligent Systems, 13(10/11):1025-1053, 1998.
6. Kenneth A. DeJong and William M. Spears. Learning concept classification rules using genetic algorithms. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 651-656, 1991.
7. Usama M. Fayyad and Keki B. Irani. Multi-interval discretization of continuous-valued attributes for classification learning. In IJCAI, pages 1022-1029, 1993.
8. David E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Publishing Company, Inc., 1989.
9. Elisabet Golobardes, Xavier Llorà, Josep Maria Garrell, David Vernet, and Jaume Bacardit. Genetic classifier system as a heuristic weighting method for a case-based classifier system. Butlletí de l'Associació Catalana d'Intel·ligència Artificial, 22:132-141, 2000.
10. John H. Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, 1975.
11. John H. Holland. Escaping Brittleness: The possibilities of General-Purpose Learning Algorithms Applied to Parallel Rule-Based Systems. In Machine Learning, an Artificial Intelligence Approach, Volume II, pages 593-623. 1986.
12. Ron Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In IJCAI, pages 1137-1145, 1995.
13. Alexander V. Kozlov and Daphne Koller. Nonuniform dynamic discretization in hybrid networks. In Proceedings of the 13th Annual Conference on Uncertainty in AI (UAI), pages 314-325, 1997.
14. H. Liu and R. Setiono. Chi2: Feature selection and discretization of numeric attributes. In Proceedings of the 7th IEEE International Conference on Tools with Artificial Intelligence, pages 388-391. IEEE Computer Society, 1995.
15. Xavier Llorà and Josep M. Garrell. Knowledge-independent data mining with fine-grained parallel evolutionary algorithms. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001), pages 461-468. Morgan Kaufmann, 2001.
16. J. Martí, X. Cufí, J. Regincós, et al. Shape-based feature selection for microcalcification evaluation. In Imaging Conference on Image Processing, 3338:1215-1224, 1998.
17. E. Martínez Marroquín, C. Vos, et al. Morphological analysis of mammary biopsy images. In Proceedings of the IEEE International Conference on Image Processing (ICIP'96), pages 943-947, 1996.
18. José C. Riquelme and Jesús S. Aguilar. Codificación indexada de atributos continuos para algoritmos evolutivos en aprendizaje supervisado. In Proceedings of the "Primer Congreso Iberoamericano de Algoritmos Evolutivos y Bioinspirados (AEB'02)", pages 161-167, 2002.
19. Ronald L. Rivest. Learning decision lists. Machine Learning, 2(3):229-246, 1987.
20. Stephen F. Smith. Flexible learning of problem solving heuristics through adaptive search. In Proceedings of the 8th International Joint Conference on Artificial Intelligence (IJCAI-83), pages 421-425, Los Altos, CA, 1983. Morgan Kaufmann.
21. Terence Soule and James A. Foster. Effects of code growth and parsimony pressure on populations in genetic programming. Evolutionary Computation, 6(4):293-309, Winter 1998.
22. Stewart W. Wilson. Get real! XCS with continuous-valued inputs. In L. Booker, Stephanie Forrest, M. Mitchell, and Rick L. Riolo, editors, Festschrift in Honor of John H. Holland, pages 111-121. Center for the Study of Complex Systems, 1999.
23. Ian H. Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, 2000.