
Evolution of multi-adaptive discretization intervals for a rule-based genetic learning system

Jaume Bacardit and Josep Maria Garrell

Intelligent Systems Research Group
Enginyeria i Arquitectura La Salle,
Universitat Ramon Llull,
Psg. Bonanova 8, 08022-Barcelona,
Catalonia, Spain, Europe. {jbacardit,josepmg}@salleURL.edu
Abstract. Genetic Based Machine Learning (GBML) systems traditionally have evolved rules that only deal with discrete attributes. Therefore, some discretization process is needed in order to deal with real-valued attributes. There are several methods to discretize real-valued attributes into a finite number of intervals; however, none of them can efficiently solve all the possible problems. The alternative of a high number of simple uniform-width intervals usually expands the size of the search space without a clear performance gain. This paper proposes a rule representation which uses adaptive discrete intervals that split or merge through the evolution process, finding the correct discretization intervals at the same time as the learning process is done.

1 Introduction

The application of Genetic Algorithms (GA) [10, 8] to classification problems is usually known as Genetic Based Machine Learning (GBML), and traditionally it has been addressed from two different points of view: the Pittsburgh approach and the Michigan approach, early exemplified by LS-1 [20] and CS-1 [11], respectively.
The classical knowledge representation used in these systems is a set of rules where the antecedent is defined by a prefixed finite number of intervals to handle real-valued attributes. The performance of these systems is tied to the correct choice of the intervals.
In this paper we use a rule representation with adaptive discrete intervals. These intervals are split and merged through the evolution process that drives the training stage. This approach avoids the higher computational cost of the approaches which work directly with real values, and finds a good discretization by expanding the search space with small intervals only when necessary. This representation was introduced in [1]; the work presented in this paper is its evolution, mainly focused on generalizing the approach and simplifying the tuning needed for each domain.
This rule representation is compared across different domains against the traditional discrete representation with fixed intervals. The number and size of the fixed intervals are obtained with two methods: (1) simple uniform-width intervals and (2) intervals obtained with the Fayyad & Irani method [7], a well-known discretization algorithm. The aim of this comparison is two-fold: to measure both the accuracy performance and the computational cost.
The paper is structured as follows. Section 2 presents some related work. Then, we describe the framework of our classifier system in section 3. The adaptive intervals rule representation is explained in section 4. Next, section 5 describes the test suite used in the comparison. The results obtained are summarized in section 6. Finally, section 7 discusses the conclusions and some further work.

2 Related work

There are several approaches to handle real-valued attributes in the Genetic Based Machine Learning (GBML) field. Early approaches use discrete rules with a large number of prefixed uniform discretization intervals. However, this approach has the problem that the search space grows exponentially, slowing the evolutionary process without a clear accuracy improvement of the solution [2].
Lately, several alternatives to the discrete rules have been presented. There are rules composed of real-valued intervals (XCSR [22], [4], COGITO [18]). MOGUL [5] uses a fuzzy reasoning method. This method generates sequentially: (1) fuzzy rules, and then (2) fuzzy membership functions. Recently, GALE [15] proposed a knowledge-independent method for learning other knowledge representations like instance sets or decision trees. All those alternatives present better performance, but usually they also have a higher computational cost [18].
A third approach is to use a heuristic discretization algorithm. Some of these methods work with information entropy [7], the χ² statistic [14] or multi-dimensional non-uniform discretization [13]. These algorithms are usually more accurate and faster than the uniform discretization. However, they suffer a lack of robustness across some domains [1].
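The entropy-based family of heuristics can be illustrated with a single cut-point search. This is a minimal sketch, not the full Fayyad & Irani algorithm (which recurses on each partition and stops via an MDL criterion); the function names are our own.

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def best_cut(values, labels):
    """Find the cut point minimizing the weighted class entropy of the
    two resulting partitions (one level of the recursive method)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_e, best_cut_point = float("inf"), None
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no cut between identical attribute values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        e = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if e < best_e:
            best_e = e
            best_cut_point = (pairs[i][0] + pairs[i - 1][0]) / 2
    return best_cut_point
```

On well-separated classes the chosen cut falls in the gap between them, which is why such heuristics tend to produce few, accurate intervals.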

3 Framework

In this section we describe the main features of our classifier system. GAssist (Genetic Algorithms based claSSIfier sySTem) [9] is a Pittsburgh-style classifier system based on GABIL [6]. Directly from GABIL we have borrowed the representation of the discrete rules (rules with conjunctive normal form (CNF) predicates), the semantically correct crossover operator and the fitness computation (squared accuracy).
Matching strategy: The matching process follows an "if ... then ... else if ... then ..." structure, usually called a Decision List [19].
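Decision-list matching and the GABIL-style squared-accuracy fitness can be sketched as follows. The rule encoding is abstracted into predicate functions, and all names are illustrative rather than GAssist's actual code.

```python
def predict(rule_set, example, default_class):
    """Decision-list matching: the first rule whose predicate
    accepts the example determines the predicted class."""
    for predicate, cls in rule_set:
        if predicate(example):
            return cls
    return default_class

def fitness(rule_set, examples, default_class):
    """GABIL-style squared accuracy: (correct / total) ** 2."""
    correct = sum(
        predict(rule_set, x, default_class) == y for x, y in examples
    )
    return (correct / len(examples)) ** 2
```

Squaring the accuracy sharpens the selection pressure between individuals of similar quality.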
Mutation operators: The system manipulates variable-length individuals, which makes tuning the classic gene-based mutation probability more difficult. In order to simplify this tuning, we define p_mut as the probability of mutating an individual. When an individual is selected for mutation (based on p_mut), a random gene is chosen inside its chromosome for mutation.
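This individual-wise mutation scheme might be sketched as follows; an illustrative reading of the description above, not the actual GAssist implementation.

```python
import random

def mutate_population(population, p_mut, mutate_gene, rng=random):
    """Individual-wise mutation: each individual is selected with
    probability p_mut; if selected, exactly one randomly chosen gene
    of its chromosome is mutated."""
    for individual in population:
        if rng.random() < p_mut:
            pos = rng.randrange(len(individual))
            individual[pos] = mutate_gene(individual[pos])
```

Note that the expected number of mutations per individual no longer depends on chromosome length, which is what makes the parameter easy to tune across domains.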

Control of the individuals' length: Dealing with variable-length individuals raises some serious considerations. One of the most important ones is the control of the size of the evolving individuals [21]. This control is achieved in GAssist using two different operators:

– Rule deletion: This operator deletes the rules of the individuals that do not match any training example. This rule deletion is done after the fitness computation and has two constraints: (a) the process is only activated after a predefined number of iterations, to prevent a massive diversity loss, and (b) the number of rules of an individual never goes below a lower threshold. This threshold is set to the number of classes of the domain.
– Selection bias using the individual size: Selection is guided as usual by the fitness (the accuracy). However, it also gives a certain degree of relevance to the size of the individuals, following a policy similar to multi-objective systems. We use tournament selection because its local behavior lets us implement this policy. The criterion of the tournament is given by an operator called "size-based comparison" [2]. This operator considers two individuals similar if their fitness difference is below a certain threshold (d_comp). Then, it selects the individual with the smaller number of rules.
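A sketch of the "size-based comparison" tournament criterion, assuming for illustration that individuals are reduced to (fitness, number of rules) pairs:

```python
def size_based_compare(ind_a, ind_b, d_comp):
    """Tournament criterion: if the fitness gap is below d_comp the
    individuals are considered similar and the one with fewer rules
    wins; otherwise the fitter individual wins."""
    fit_a, rules_a = ind_a
    fit_b, rules_b = ind_b
    if abs(fit_a - fit_b) < d_comp:
        return ind_a if rules_a <= rules_b else ind_b
    return ind_a if fit_a > fit_b else ind_b
```

This gives parsimony a vote only when accuracy cannot separate the contenders, which is the multi-objective flavor mentioned above.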

4 Discrete rules with adaptive intervals

This section describes the rule representation based on discrete rules with adaptive intervals. First we describe the problems that traditional discrete rules present. Then, we explain the adaptive intervals rules proposed and the changes introduced in order to enable the GA to use them.

4.1 Discrete rules and unnecessary search space growth

The traditional approach to solving problems with real-valued attributes using discrete rules relies on a discretization process. This discretization can be done using algorithms which determine the discretization intervals by analyzing the training information, or we can use a simple alternative such as a uniform-width intervals discretization.
In the latter method, the way to increase the accuracy of the solution is to increase the number of intervals. This solution brings a big problem because the search space to explore grows exponentially as more intervals are added. The improvement in accuracy expected from increasing the number of intervals sometimes does not materialize, because the GA spends too much time exploring areas of the search space which do not need to be explored.
If we find a correct and minimal set of intervals, the solution accuracy will probably increase without a huge increase of the computational cost.
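Uniform-width discretization itself is straightforward; a minimal sketch (our own helper, with the interval index clamped at the upper bound):

```python
def uniform_discretize(value, lower, upper, n_intervals):
    """Map a real value in [lower, upper] to one of n_intervals
    equal-width discrete intervals (0-based index)."""
    if value >= upper:
        return n_intervals - 1  # clamp the top edge into the last interval
    width = (upper - lower) / n_intervals
    return int((value - lower) // width)
```

Each extra interval per attribute multiplies the number of distinct predicates a rule can express, which is the exponential growth discussed above.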

4.2 Finding good and minimal intervals

Our aim is to find good discretization intervals without a great expansion of the search space. In order to achieve this goal we defined a rule representation [1] with discrete adaptive intervals where the discretization intervals are not fixed. These intervals are evolved through the iterations, merging and splitting among themselves.
To control the computational cost and the growth of the search space, we define the following constraints:

– A number of "low level" uniform and static intervals, called micro-intervals, is defined for each attribute.
– The adaptive intervals are built by joining together micro-intervals.
– When we split an interval, we select a random point in its micro-intervals to break it.
– When we merge two intervals, the value of the resulting interval is taken from the one which has more micro-intervals. If both have the same number of micro-intervals, the value is chosen randomly.
– The number and size of the initial intervals are selected randomly.
The adaptive intervals as well as the split and merge operators are shown in figure 1.

Fig. 1. Adaptive intervals representation and the split and merge operators.
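The split and merge operators constrained above can be sketched on a list representation where each adaptive interval is a (micro-interval count, state) pair; this encoding is our illustration, not the paper's actual chromosome layout.

```python
import random

def split_interval(intervals, idx, rng=random):
    """Split interval idx at a random internal micro-interval
    boundary; both halves inherit the interval's state."""
    count, value = intervals[idx]
    if count < 2:
        return  # a single micro-interval cannot be split
    cut = rng.randrange(1, count)
    intervals[idx:idx + 1] = [(cut, value), (count - cut, value)]

def merge_intervals(intervals, idx, rng=random):
    """Merge interval idx with its right neighbour; the merged state
    comes from the interval with more micro-intervals, ties broken
    randomly."""
    (c1, v1), (c2, v2) = intervals[idx], intervals[idx + 1]
    if c1 > c2:
        value = v1
    elif c2 > c1:
        value = v2
    else:
        value = rng.choice([v1, v2])
    intervals[idx:idx + 2] = [(c1 + c2, value)]
```

Because both operators only redistribute a fixed pool of micro-intervals, the search space per attribute stays bounded by the micro-interval count.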

To apply the split and merge operators we have added to the GA cycle two special phases applied to the offspring population after the mutation phase. For each phase (split and merge) we have a probability (p_split or p_merge) of applying a split or merge operation to an individual. If an individual is selected for splitting or merging, a random point inside its chromosome is chosen to apply the operation.
Finally, this representation requires some changes in some other parts of the GA:

– The crossover operator can only take place at the attribute boundaries.
– The "size-based comparison" operator uses the length (number of genes) of the individual instead of the number of rules, because now the size of a rule can change when the number of intervals that it contains changes. This change also makes the GA prefer the individuals with fewer intervals in addition to fewer rules, further simplifying them.

4.3 Changes to the adaptive intervals rule representation

One of the main drawbacks of the initial approach was the sizing of the number of micro-intervals assigned to each attribute term of the rules. This parameter is difficult to tune because it is domain-specific.
In this paper we test another approach (multi-adaptive) which consists in evolving attribute terms with different numbers of micro-intervals in the same population. This enables the evolutionary process to select the correct number of micro-intervals for each attribute term of the rules. The number of micro-intervals of each attribute term is selected from a predefined set in the initialization stage.
The initialization phase has also changed. In our previous work the number and size of the intervals were uniform. We have changed this policy to a totally random initialization in order to gain diversity in the initial population.
The last change introduced involves the split and merge operators. In the previous version these operators were integrated inside the mutation. This made the sizing of the probabilities very difficult because the three operators (split, merge and mutation) were coupled. Using an extra recombination stage in this version, we eliminate this tight linkage.

5 Test suite

This section summarizes the tests done in order to evaluate the accuracy and efficiency of the method presented in this paper. We also compare it with some alternative methods. The tests were conducted using several machine learning problems, which we also describe.

5.1 Test problems

The selected test problems for this paper present different characteristics in order to give us a broad overview of the performance of the methods being compared. The first problem is a synthetic problem (Tao [15]) that has non-orthogonal class boundaries. We also use several problems provided by the University of California at Irvine (UCI) repository [3]. The problems selected are: Pima-indians-diabetes (pima), iris, glass and breast-cancer-wisconsin (breast). Finally we use three problems from our own private repository. The first two deal with the diagnosis of breast cancer based on biopsies (bps [17]) and mammograms (mamm [16]), whereas the last one is related to the prediction of student qualifications (lrn [9]). The characteristics of the problems are listed in table 1. The partition of the examples into the train and test sets was done using the stratified ten-fold cross-validation method [12].
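Stratified ten-fold cross-validation keeps the class proportions of each fold close to those of the full data set; a minimal sketch of the fold assignment (illustrative, not the experimental harness used in the paper):

```python
from collections import defaultdict

def stratified_folds(labels, k=10):
    """Assign each example index to one of k folds so that every
    class is spread as evenly as possible across the folds."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # deal the indices of each class round-robin across the folds
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds
```

Each fold is used once as the test set while the remaining nine form the training set.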

5.2 Configurations of the GA to test

The main goal of the tests is to evaluate the performance of the adaptive intervals rule representation. In order to compare this method with the traditional

Table 1. Characteristics of the test problems.

Dataset  Number of examples  Real attributes  Discrete attributes  Classes
tao      1888                2                –                    2
pima     768                 8                –                    2
iris     150                 4                –                    3
glass    214                 9                –                    6
breast   699                 9                –                    2
bps      1027                24               –                    2
mamm     216                 21               –                    2
lrn      648                 4                2                    5

discrete representation, we use two discretization methods: the simple uniform-width intervals method and the Fayyad & Irani method [7].
We analyze the adaptive intervals approach with two types of runs. The first one assigns the same number of micro-intervals to all the attribute terms of the individuals. We call this type of run adaptive. In the second one, attributes with different numbers of micro-intervals coexist in the same population. We call this type multi-adaptive.
The GA parameters are shown in table 2. The reader can appreciate that the sizing of both p_split and p_merge is the same for all the problems except the tao problem. Giving the same value to p_merge and p_split produces solutions with too few rules and intervals, as well as less accurate results than those obtained with the configuration shown in table 2. This is an issue that needs further study.
Another important issue of the p_split and p_merge probabilities for some of the domains is that they are greater than 1. This means that for these domains at least one split and one merge operation will surely be applied to each individual of the population. Thus, p_split and p_merge become expected values instead of probabilities. The tuning done produces a reduction of the number of iterations needed.

6 Results

In this section we present the results obtained. The aim of the tests was to compare the methods presented in the paper in three aspects: accuracy and size of the solutions, as well as the computational cost. For each method and test problem we show the average and standard deviation values of: (1) the cross-validation accuracy, (2) the size of the best individual in number of rules and intervals per attribute and (3) the execution time in seconds. The tests were executed on an AMD Athlon 1700+ using the Linux operating system and the C++ language.
The results were also analyzed using the two-sided t-test [23] to determine if the two adaptive methods outperform the other ones with a significance level of 1%. Finally, for each configuration, test and fold, 15 runs using different random seeds were done. Results are shown in table 3. The column titled t-test shows a • beside the Uniform or Fayyad & Irani method if it was outperformed by the

Table 2. Common and problem-specific parameters of the GA.

Parameter                                           Value
Crossover probability                               0.6
Iter. of rule eliminating activation                30
Iter. of size comparison activation                 30
Sets of micro-intervals in the multi-adaptive test  5,6,7,8,10,15,20,25
Tournament size                                     3
Population size                                     300
Probability of mutating an individual               0.6

Code      Parameter
#iter     Number of GA iterations
d_interv  Number of intervals in the uniform-width discrete rules
a_interv  Number of micro-intervals in the adaptive test
d_comp    Distance parameter in the "size-based comparison" operator
p_split   Probability of splitting an individual (one of its intervals)
p_merge   Probability of merging an individual (one of its intervals)

Problem  #iter  d_interv  a_interv  d_comp  p_merge  p_split
tao      600    12        48        0.001   1.3      2.6
pima     500    4         8         0.01    0.8      0.8
iris     400    10        10        0.02    0.5      0.5
glass    750    4         8         0.015   1.5      1.5
breast   325    5         10        0.01    3.2      3.2
bps      500    4         10        0.015   1.7      1.7
mamm     500    2         5         0.01    1        1
lrn      700    5         10        0.01    1.2      1.2

adaptive methods. The adaptive methods were never outperformed in the tests done, showing a good robustness.
The results are summarized using the ranking in table 4. The ranking for each problem and method is based on the accuracy. The global rankings are computed by averaging the problem rankings.
Table 3 shows that in two of the tests the best performing method was the Fayyad & Irani interval discretization technique. However, in the rest of the tests its performance is lower, showing a lack of robustness across different domains. The two adaptive tests achieved the best results of the ranking. Nevertheless, the goal of improving the rule representation with the multi-adaptive configuration has not been achieved. It is only better than the original adaptive configuration in three of the eight test problems. The computational cost is clearly the main drawback of the adaptive intervals representation. The Fayyad & Irani method is on average 2.62 times faster than it.

7 Conclusions and further work

This paper focused on an adaptive rule representation as a robust method for finding a good discretization. The main contribution is the use of adaptive discrete intervals, which can split or merge through the evolution process, reducing the search space where it is possible.
The use of a heuristic discretization method (like the Fayyad & Irani one) outperforms the adaptive intervals representation in some test problems. Never-

Table 3. Mean and deviation of the accuracy (percentage of correctly classified examples), number of rules, intervals per attribute and execution time for each method tested. Bold entries show the method with best results for each test problem. A • marks a significant out-performance based on a t-test.

Problem  Configuration   Accuracy   Number of Rules  Intervals per Rule  Time
tao      Uniform         93.7±1.2   8.8±1.6          8.3±0.0             36.0±3.5
         Fayyad          87.8±1.1   3.1±0.3          3.4±0.1             24.2±1.4
         Adaptive        94.6±1.3   22.5±5.6         7.7±0.4             96.6±14.7
         Multi-Adaptive  94.3±1.0   19.5±4.9         6.0±0.6             94.5±13.9
pima     Uniform         73.8±4.1   6.3±2.2          3.7±0.0             23.2±2.8
         Fayyad          73.6±3.1   6.6±2.6          2.3±0.2             26.4±3.0
         Adaptive        74.8±3.5   6.2±2.6          2.0±0.4             56.2±9.4
         Multi-Adaptive  74.4±3.1   5.8±2.2          1.9±0.4             59.7±8.9
iris     Uniform         92.9±2.7   3.8±1.1          8.2±0.0             5.2±0.7
         Fayyad          94.2±3.0   3.2±0.6          2.8±0.1             5.5±0.1
         Adaptive        94.9±2.3   3.3±0.5          1.3±0.2             9.2±0.4
         Multi-Adaptive  96.2±2.2   3.6±0.9          1.3±0.2             9.0±0.8
glass    Uniform         60.5±8.9   8.7±1.8          3.7±0.0             13.9±1.5
         Fayyad          65.7±6.1   8.1±1.4          2.4±0.1             14.0±1.1
         Adaptive        64.6±4.7   5.9±1.7          1.7±0.2             35.1±5.2
         Multi-Adaptive  65.2±4.1   6.7±2.0          1.8±0.2             38.4±5.0
breast   Uniform         94.8±2.6   4.8±2.5          4.6±0.0             6.5±1.0
         Fayyad          95.2±1.8   4.1±0.8          3.6±0.1             5.8±0.4
         Adaptive        95.4±2.3   2.7±1.0          1.8±0.2             15.7±2.1
         Multi-Adaptive  95.3±2.3   2.6±0.9          1.7±0.2             17.4±1.5
bps      Uniform         77.6±3.3   15.0±7.0         3.9±0.0             50.8±9.0
         Fayyad          80.0±3.1   7.1±3.8          2.4±0.1             37.7±6.0
         Adaptive        80.3±3.5   4.7±3.0          2.1±0.4             106.6±21.1
         Multi-Adaptive  80.1±3.3   5.1±2.0          2.0±0.3             115.9±20.5
mamm     Uniform         63.2±9.9   2.6±0.5          2.0±0.0             7.8±1.0
         Fayyad          65.3±11.1  2.3±0.5          2.0±0.1             8.5±0.7
         Adaptive        65.8±5.3   4.4±1.7          1.8±0.2             27.6±4.9
         Multi-Adaptive  65.0±6.1   4.4±1.9          1.9±0.2             27.4±5.5
lrn      Uniform         64.7±4.9   17.8±5.1         4.9±0.0             29.2±4.0
         Fayyad          67.5±5.1   14.3±5.0         4.4±0.1             26.5±3.4
         Adaptive        66.1±4.6   14.0±4.6         3.6±0.3             58.9±7.9
         Multi-Adaptive  66.7±4.1   11.6±4.1         3.4±0.2             53.9±7.2

Table 4. Performance ranking of the tested methods. Lower number means better ranking.

Problem     Fixed  Fayyad  Adaptive  Multi-Adaptive
tao         3      4       1         2
pima        3      4       1         2
iris        4      3       2         1
glass       4      1       3         2
breast      4      3       1         2
bps         4      3       1         2
mamm        4      2       1         3
lrn         4      1       3         2
Average     3.75   2.625   1.625     2
Final rank  4      3       1         2

theless, the performance increase is not significant. On the other hand, when the adaptive intervals outperform the other methods, the performance increase is higher, showing a better degree of robustness.
The overhead of evolving discretization intervals and rules at the same time is quite significant, being its main drawback. Besides the cost of the representation itself (our implementation uses twice the memory of the discrete representation for the same number of intervals), the main difference is the significant reduction of the search space achieved by a heuristic discretization.
Some further work should use the knowledge provided by the discretization techniques in order to reduce the computational cost of the adaptive intervals representation. This process should be achieved without losing robustness. Another important point of further study is how the values of p_split and p_merge affect the behavior of the system, in order to simplify the tuning needed for each domain.
Finally, it would also be interesting to compare the adaptive intervals rule representation with some representations dealing directly with real-valued attributes, like the ones described in the related work section. This comparison should follow the same criteria used here: comparing both the accuracy and the computational cost.

Acknowledgments
The authors acknowledge the support provided under grant numbers 2001FI 00514, CICYT/Tel08-0408-02 and FIS00/0033-02. The results of this work were partially obtained using equipment cofunded by the Direcció General de Recerca de la Generalitat de Catalunya (D.O.G.C. 30/12/1997). Finally we would like to thank Enginyeria i Arquitectura La Salle for their support to our research group.

References
1. Jaume Bacardit and Josep M. Garrell. Evolution of adaptive discretization intervals for a rule-based genetic learning system. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2002) (to appear), 2002.
2. Jaume Bacardit and Josep M. Garrell. Métodos de generalización para sistemas clasificadores de Pittsburgh. In Proceedings of the "Primer Congreso Iberoamericano de Algoritmos Evolutivos y Bioinspirados (AEB'02)", pages 486-493, 2002.
3. C. Blake, E. Keogh, and C. Merz. UCI repository of machine learning databases (www.ics.uci.edu/~mlearn/MLRepository.html), 1998.
4. A. L. Corcoran and S. Sen. Using real-valued genetic algorithms to evolve rule sets for classification. In Proceedings of the IEEE Conference on Evolutionary Computation, pages 120-124, 1994.
5. O. Cordón, M. del Jesus, and F. Herrera. Genetic learning of fuzzy rule-based classification systems co-operating with fuzzy reasoning methods. In International Journal of Intelligent Systems, Vol. 13 (10/11), pages 1025-1053, 1998.
6. Kenneth A. DeJong and William M. Spears. Learning concept classification rules using genetic algorithms. Proceedings of the International Joint Conference on Artificial Intelligence, pages 651-656, 1991.
7. Usama M. Fayyad and Keki B. Irani. Multi-interval discretization of continuous-valued attributes for classification learning. In IJCAI, pages 1022-1029, 1993.
8. David E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Publishing Company, Inc., 1989.
9. Elisabet Golobardes, Xavier Llorà, Josep Maria Garrell, David Vernet, and Jaume Bacardit. Genetic classifier system as a heuristic weighting method for a case-based classifier system. Butlletí de l'Associació Catalana d'Intel·ligència Artificial, 22:132-141, 2000.
10. John H. Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, 1975.
11. John H. Holland. Escaping Brittleness: The possibilities of General-Purpose Learning Algorithms Applied to Parallel Rule-Based Systems. In Machine learning, an artificial intelligence approach. Volume II, pages 593-623. 1986.
12. Ron Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In IJCAI, pages 1137-1145, 1995.
13. Alexander V. Kozlov and Daphne Koller. Nonuniform dynamic discretization in hybrid networks. In Proceedings of the 13th Annual Conference on Uncertainty in AI (UAI), pages 314-325, 1997.
14. H. Liu and R. Setiono. Chi2: Feature selection and discretization of numeric attributes. In Proceedings of the 7th IEEE International Conference on Tools with Artificial Intelligence, pages 388-391. IEEE Computer Society, 1995.
15. Xavier Llorà and Josep M. Garrell. Knowledge-independent data mining with fine-grained parallel evolutionary algorithms. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001), pages 461-468. Morgan Kaufmann, 2001.
16. J. Martí, X. Cufí, J. Regincós, et al. Shape-based feature selection for microcalcification evaluation. In Imaging Conference on Image Processing, 3338:1215-1224, 1998.
17. E. Martínez Marroquín, C. Vos, et al. Morphological analysis of mammary biopsy images. In Proceedings of the IEEE International Conference on Image Processing (ICIP'96), pages 943-947, 1996.
18. José C. Riquelme and Jesús S. Aguilar. Codificación indexada de atributos continuos para algoritmos evolutivos en aprendizaje supervisado. In Proceedings of the "Primer Congreso Iberoamericano de Algoritmos Evolutivos y Bioinspirados (AEB'02)", pages 161-167, 2002.
19. Ronald L. Rivest. Learning decision lists. Machine Learning, 2(3):229-246, 1987.
20. Stephen F. Smith. Flexible learning of problem solving heuristics through adaptive search. In Proceedings of the 8th International Joint Conference on Artificial Intelligence (IJCAI-83), pages 421-425, Los Altos, CA, 1983. Morgan Kaufmann.
21. Terence Soule and James A. Foster. Effects of code growth and parsimony pressure on populations in genetic programming. Evolutionary Computation, 6(4):293-309, Winter 1998.
22. Stewart W. Wilson. Get real! XCS with continuous-valued inputs. In L. Booker, Stephanie Forrest, M. Mitchell, and Rick L. Riolo, editors, Festschrift in Honor of John H. Holland, pages 111-121. Center for the Study of Complex Systems, 1999.
23. Ian H. Witten and Eibe Frank. Data Mining: practical machine learning tools and techniques with Java implementations. Morgan Kaufmann, 2000.
