
GENETIC PROGRAMMING

AND
GENE EXPRESSION PROGRAMMING

Marek Ostaszewski
OUTLINE

Evolutionary Algorithms

Genetic Programming (GP)

Gene Expression Programming (GEP)

Applications to real-world problems


NP-HARDNESS

Some problems cannot be solved by exact methods in reasonable time


Example: Bandwidth allocation

10 bandwidth ranges, 30 antennas

Supercomputer ($10^9$ combinations in 1 s)

Result after roughly 7000 times the age of the Earth
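
The estimate above can be checked with a few lines of arithmetic. This is a rough sketch: it assumes every antenna can independently take any of the 10 bandwidth ranges (so $10^{30}$ combinations) and uses an age of the Earth of about 4.54 billion years, a value not given on the slide.

```python
# Back-of-the-envelope check of the exhaustive-search estimate.
BANDWIDTH_RANGES = 10
ANTENNAS = 30
CHECKS_PER_SECOND = 10**9            # supercomputer speed from the slide
AGE_OF_EARTH_YEARS = 4.54e9          # assumed value, not given on the slide
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Assumption: each antenna independently gets one of the ranges.
combinations = BANDWIDTH_RANGES ** ANTENNAS
seconds = combinations / CHECKS_PER_SECOND
earth_ages = seconds / SECONDS_PER_YEAR / AGE_OF_EARTH_YEARS

print(f"{combinations:.0e} combinations, about {earth_ages:.0f} Earth ages")
```

Under these assumptions the result lands close to the slide's figure of roughly 7000.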

Approximate results are required


EVOLUTIONARY
ALGORITHMS (EA)

Inspired by the process of natural evolution

Elements

Chromosome (representation)

Individual (solution)

Fitness

Population

Modifications depend on representation

Flowchart

Initialize population P, generation g = 0

Evaluate P

Stopping condition? Yes: Finish. No: continue

P' = selection(P)

Recombination (P')

Mutation (P')

P = P', g = g + 1, back to Evaluate P
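
The flowchart can be sketched as a generic loop. This is a minimal illustration, not a reference implementation: the function name `evolve`, its parameters, and the bit-counting toy problem used to exercise it are all assumptions made for the example.

```python
import random

def evolve(init, fitness, select, recombine, mutate, max_generations, target=None):
    """Generic EA loop: evaluate, test the stopping condition, select, recombine, mutate."""
    population = init()                                   # Initialize population P
    g = 0                                                 # Generation g = 0
    while True:
        scores = [fitness(ind) for ind in population]     # Evaluate P
        best = max(scores)
        if g >= max_generations or (target is not None and best >= target):
            return population[scores.index(best)], best, g   # Finish
        parents = select(population, scores)              # P' = selection(P)
        offspring = recombine(parents)                    # Recombination (P')
        population = [mutate(ind) for ind in offspring]   # Mutation (P'), P = P'
        g += 1                                            # g = g + 1

# Hypothetical toy problem: maximize the number of ones in a bit string.
rng = random.Random(0)
N_BITS, POP_SIZE = 20, 30

def init():
    return [[rng.randint(0, 1) for _ in range(N_BITS)] for _ in range(POP_SIZE)]

def select(population, scores):
    def pick():  # binary tournament: the better of two random individuals
        i, j = rng.randrange(len(population)), rng.randrange(len(population))
        return population[i] if scores[i] >= scores[j] else population[j]
    return [pick() for _ in population]

def recombine(parents):
    children = []
    for p1, p2 in zip(parents[::2], parents[1::2]):
        cut = rng.randrange(1, N_BITS)
        children += [p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]]
    return children

def mutate(individual):
    return [1 - bit if rng.random() < 1 / N_BITS else bit for bit in individual]

best, score, generations = evolve(init, sum, select, recombine, mutate,
                                  max_generations=200, target=N_BITS)
print(score, generations)
```

Only `init`, `recombine` and `mutate` know about the representation, which is why the modifications depend on it while the loop itself does not.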
GENETIC ALGORITHMS
(GA) (1)

Chromosome is a binary string

Solution is a value encoded by the chromosome

Fitness is the performance of the solution

Example: optimize $f(x) = x^2$

Solution - x value

Fitness - y value

[Figure: plot of the example with three candidate solutions s1, s2 and s3 marked]
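
A minimal sketch of how a binary chromosome could be decoded into an x value and scored. The linear mapping and the interval [-5, 3] are assumptions chosen for illustration; the slide does not specify the encoding.

```python
def decode(bits, lo, hi):
    """Map a binary chromosome linearly onto a real value x in [lo, hi]."""
    as_int = int("".join(map(str, bits)), 2)
    return lo + (hi - lo) * as_int / (2 ** len(bits) - 1)

def fitness(bits, lo=-5.0, hi=3.0):
    """Fitness is the y value of the decoded solution, here f(x) = x^2."""
    x = decode(bits, lo, hi)
    return x ** 2

chromosome = [1, 0, 1, 1, 0, 0, 1, 0]
print(decode(chromosome, -5.0, 3.0), fitness(chromosome))
```

Whether a high or a low y value counts as better depends on whether the problem is posed as maximization or minimization.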
GENETIC ALGORITHMS
(2)

Modifications are straightforward

Crossover

Mutation

Crossover example (crossover point after the 6th bit)

Parent 1:    0 0 1 1 1 0 | 0 1 1 0 1
Parent 2:    1 0 0 0 1 1 | 1 0 0 0 0
Offspring 1: 0 0 1 1 1 0 | 1 0 0 0 0
Offspring 2: 1 0 0 0 1 1 | 0 1 1 0 1
GENETIC ALGORITHMS
(2)

Mutation example (mutation point at the 8th bit)

Parent:    0 0 1 1 1 0 0 1 1 0 1
Offspring: 0 0 1 1 1 0 0 0 1 0 1
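
Both operators fit in a few lines. The sketch below reproduces the bit strings from the two examples; the function names are illustrative.

```python
def one_point_crossover(p1, p2, point):
    """Swap the tails of two parents after the given crossover point."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def flip_bit(chromosome, index):
    """Return a copy of the chromosome with one bit inverted."""
    child = list(chromosome)
    child[index] = 1 - child[index]
    return child

parent1 = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1]
parent2 = [1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0]

offspring1, offspring2 = one_point_crossover(parent1, parent2, 6)
mutant = flip_bit(parent1, 7)   # 8th bit, zero-based index 7

print(offspring1)
print(offspring2)
print(mutant)
```

Any result is again a valid chromosome, which is what makes these modifications straightforward.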
GA - SUMMARY

Solution is a point in the search space

Chromosome encodes the point as a sequence of symbols
GENETIC
PROGRAMMING (GP) (1)

Chromosome is a program

Program tree

Terminals - inputs (T)

Nodes - functions (F)

Solution is the program

Fitness is its performance

Example: $f = (a + b) \cdot \sin(a) - \frac{a}{b}$

[Figure: program tree of f, with functions as inner nodes and the terminals a and b as leaves]
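
A program tree can be sketched as nested tuples and evaluated recursively. The tuple layout and the names `FUNCTIONS` and `evaluate` are assumptions for this example; the tree encodes f = (a + b) * sin(a) - a/b.

```python
import math

# Function set F: symbol -> (arity, implementation)
FUNCTIONS = {
    "+": (2, lambda x, y: x + y),
    "-": (2, lambda x, y: x - y),
    "*": (2, lambda x, y: x * y),
    "/": (2, lambda x, y: x / y),
    "sin": (1, math.sin),
}

def evaluate(node, inputs):
    """Evaluate a tree: a string is a terminal, a tuple is (function, child, ...)."""
    if isinstance(node, str):
        return inputs[node]                      # terminal: look up the input value
    symbol, *children = node
    arity, implementation = FUNCTIONS[symbol]
    assert len(children) == arity, "wrong number of arguments"
    return implementation(*(evaluate(child, inputs) for child in children))

tree = ("-", ("*", ("+", "a", "b"), ("sin", "a")), ("/", "a", "b"))
value = evaluate(tree, {"a": 1.0, "b": 2.0})
print(value)
```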
GENETIC
PROGRAMMING (2)

Example: data modeling

Solution is a function describing a given data set

Fitness is the sum of relative errors of the solution

[Figure: data set with two candidate solutions, s1: y = ln(x) and s2: y = 4/x + 2.4]
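
A sketch of this fitness measure for the two candidate solutions. The data points are hypothetical (the slide's data set is not available), and lower values mean a better fit.

```python
import math

def sum_relative_errors(model, data):
    """Sum of |model(x) - y| / |y| over all data points (y must be non-zero)."""
    return sum(abs(model(x) - y) / abs(y) for x, y in data)

# Hypothetical data set: points lying slightly above ln(x).
data = [(x, math.log(x) + 0.1) for x in (2.0, 3.0, 4.0, 6.0, 8.0)]

def s1(x):
    return math.log(x)

def s2(x):
    return 4 / x + 2.4

print(sum_relative_errors(s1, data), sum_relative_errors(s2, data))
```

On this made-up data s1 scores much lower than s2, so it would be the fitter individual.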
GENETIC
PROGRAMMING (3)

Modifications are problematic

Crossover

Mutation

Problems

Syntactic correctness

Tree bloat

[Figure: crossover. Parent 1 and Parent 2 exchange the subtrees below their crossover points, giving Offspring 1 and Offspring 2]
GENETIC
PROGRAMMING (3)

[Figure: mutation. Parent is changed at the mutation point, giving Offspring]
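
A minimal sketch of subtree crossover on tuple-encoded trees, assuming whole subtrees are swapped. The parent trees and helper names are illustrative. Swapping complete subtrees keeps every function supplied with the right number of arguments, but nothing limits how large one offspring can get, which is one way tree bloat arises.

```python
import random

def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node of the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def get(tree, path):
    """Return the subtree found by following the path from the root."""
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, subtree):
    """Return a copy of the tree with the node at path replaced by subtree."""
    if not path:
        return subtree
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], subtree),) + tree[i + 1:]

def size(tree):
    return sum(1 for _ in paths(tree))

def subtree_crossover(t1, t2, rng):
    """Pick one random node in each parent and exchange the subtrees rooted there."""
    p1 = rng.choice(list(paths(t1)))
    p2 = rng.choice(list(paths(t2)))
    return replace(t1, p1, get(t2, p2)), replace(t2, p2, get(t1, p1))

# Hypothetical parents.
parent1 = ("-", ("*", ("+", "a", "b"), ("sin", "a")), ("/", "a", "b"))
parent2 = ("+", ("/", "a", "b"), ("/", "a", "a"))

offspring1, offspring2 = subtree_crossover(parent1, parent2, random.Random(3))
print(size(parent1), size(parent2), size(offspring1), size(offspring2))
```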
GP - SUMMARY

Solution is a program whose performance is evaluated

Chromosome represents solution as a program tree

Problems
Break (10 min)
GENE EXPRESSION
PROGRAMMING (GEP) (1)

Chromosome (k-expression) is a sequence of symbols from the set of terminals (T) and functions (F)

Solution (expression tree) is a program decoded from the k-expression

K-expression: - + / a b c d

[Figure: expression tree decoded from the k-expression. Root -; second level +, /; third level a, b, c, d]
GENE EXPRESSION
PROGRAMMING (2)

Modifications are straightforward


Fixed size sequence of symbols
Karva notation ensures syntactic correctness
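GENE EXPRESSION
PROGRAMMING (3)

Head - F and T

Tail - only T

A part of the tail may be unused

Example

Head: + ln - a * b /
Tail: c b d a b a c d
Unused part: a b a c d

[Figure: expression tree decoded from the example. Root +; second level ln, -; third level a, *, b; fourth level /, c; fifth level b, d]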
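
Decoding a k-expression amounts to filling the tree level by level, left to right. The sketch below uses a 15-symbol example with head length 7; the binary minus at the third position is an assumption, and the names `ARITY`, `decode` and `evaluate` are illustrative. With at most two arguments per function, the tail length follows the usual GEP rule t = h * (n - 1) + 1 = 8.

```python
import math
from collections import deque

ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "ln": 1}
OPS = {
    "+": lambda x, y: x + y,
    "-": lambda x, y: x - y,
    "*": lambda x, y: x * y,
    "/": lambda x, y: x / y,
    "ln": math.log,
}

def decode(kexpr):
    """Breadth-first decoding; returns the tree as nested lists and the symbols used."""
    symbols = iter(kexpr)
    root = [next(symbols)]
    queue = deque([root])
    used = 1
    while queue:
        node = queue.popleft()
        for _ in range(ARITY.get(node[0], 0)):   # terminals have arity 0
            child = [next(symbols)]
            node.append(child)
            queue.append(child)
            used += 1
    return root, used

def evaluate(node, inputs):
    symbol, *children = node
    if not children:
        return inputs[symbol]
    return OPS[symbol](*(evaluate(child, inputs) for child in children))

HEAD_LENGTH = 7
kexpr = "+ ln - a * b / c b d a b a c d".split()
tree, used = decode(kexpr)
value = evaluate(tree, {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0})

print("used:", kexpr[:used])
print("unused:", kexpr[used:])
print("value:", value)
```

The decoded program is ln(a) + ((b / d) * c - b); the last five tail symbols never enter the tree.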
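GENE EXPRESSION
PROGRAMMING (4)

Changing one head symbol (b becomes +) changes the decoded tree

Before

Head: + ln - a * b /
Tail: c b d a b a c d
Unused part: a b a c d

After

Head: + ln - a * + /
Tail: c b d a b a c d
Unused part: a c d

[Figure: expression trees before and after the change. After the change the new + node takes b and d as arguments, and / takes a and b from the previously unused part]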
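
A sketch of point mutation under the head/tail rule: head positions may receive any symbol, tail positions only terminals. The 15-symbol k-expression (head length 7, with an assumed binary minus at the third position) and all names are illustrative. Because the tail is long enough for the worst case, every mutant still decodes to a complete tree.

```python
import random

ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "ln": 1}
FUNCTIONS = list(ARITY)
TERMINALS = ["a", "b", "c", "d"]

def symbols_used(kexpr):
    """Length of the coding region: read symbols until no argument slot is open."""
    open_slots, used = 1, 0
    while open_slots > 0:
        open_slots += ARITY.get(kexpr[used], 0) - 1
        used += 1
    return used

def mutate(kexpr, head_length, rng):
    """Replace one random symbol, respecting the head/tail rule."""
    position = rng.randrange(len(kexpr))
    pool = FUNCTIONS + TERMINALS if position < head_length else TERMINALS
    child = list(kexpr)
    child[position] = rng.choice(pool)
    return child

HEAD_LENGTH = 7
before = "+ ln - a * b / c b d a b a c d".split()
after = "+ ln - a * + / c b d a b a c d".split()
print(symbols_used(before), symbols_used(after))

# Repeated random mutations never produce an undecodable chromosome.
rng = random.Random(1)
current = before
for _ in range(1000):
    current = mutate(current, HEAD_LENGTH, rng)
    assert symbols_used(current) <= len(current)
```

The length of the chromosome never changes, only the size of its coding region.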


GEP - SUMMARY

Combines properties of GA and GP

Fixed size sequence of symbols prevents tree bloat

Karva notation ensures solution correctness

Offers intensive exploration of the search space

Interesting for epistatic problems


INTRUSION DETECTION
WITH GEP

Problem definition

Fitness calculation

Results and comparison with regular approach


PROBLEM DEFINITION

Network traffic is described with a set of parameters, e.g. packets/s, bytes/s

We search for a classification function g(x)

g(x) > 1 during attack

g(x) < 1 during regular traffic


FITNESS CALCULATION
(1)

Input: time series of monitored parameters

E.g. two parameters p and b and time window w = 2 give $s = [p_{t_0}, p_{t_1}, p_{t_2}, b_{t_0}, b_{t_1}, b_{t_2}]$

Three parameters taken from traffic clustering

Size, Number and Ratio (Capacity/Number)

Seven values of w: 5, 10, 20, 30, 40, 50, 60
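
One way to build the input vector from the monitored time series. This sketch assumes that a window of w covers the w + 1 samples ending at the current time index, which matches the example where w = 2 yields three samples per parameter; the function name and the sample values are made up.

```python
def window_vector(series, t, w):
    """Concatenate, per parameter, the w + 1 samples ending at time index t."""
    if t < w:
        raise ValueError("not enough history for this window")
    vector = []
    for values in series.values():           # parameters in insertion order
        vector.extend(values[t - w : t + 1])
    return vector

# Hypothetical measurements of two parameters p and b.
series = {
    "p": [10, 12, 50, 55],
    "b": [100, 130, 900, 950],
}

s = window_vector(series, t=2, w=2)
print(s)
```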


FITNESS CALCULATION
(2)
Fitness is the classification performance

Sensitivity

Specificity

Learning set composed of time series samples

Training set

Testing set
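
Sensitivity and specificity of a candidate function g can be computed as below, using the rule that g(x) > 1 signals an attack. The candidate g, the samples and the labels are hypothetical; how the two measures are combined into a single fitness value is not stated on the slides and is left open here.

```python
def sensitivity_specificity(g, samples, labels, threshold=1.0):
    """Return (sensitivity, specificity) of g on labelled samples.

    labels[i] is True when samples[i] was recorded during an attack.
    """
    tp = fn = tn = fp = 0
    for x, is_attack in zip(samples, labels):
        predicted_attack = g(x) > threshold
        if is_attack and predicted_attack:
            tp += 1
        elif is_attack:
            fn += 1
        elif predicted_attack:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical candidate function and learning set.
def g(x):
    return max(x) / 40.0

samples = [[10, 12, 50], [10, 11, 12], [60, 70, 80], [5, 45, 6]]
labels = [True, False, True, False]

sensitivity, specificity = sensitivity_specificity(g, samples, labels)
print(sensitivity, specificity)
```

Here both attacks are detected, while one of the two regular samples is flagged by mistake.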
RESULTS

Comparison with regular approach (thresholding)

[Figure: average sensitivity and specificity of soGEP compared with thresholding sensitivity and specificity, plotted against the time window (5, 10, 20, 30, 40, 50, 60); value axis from 0.8 to 1]
FUNCTION FINDING
WITH GEP

Problem definition

Fitness calculation

Results
PROBLEM / FITNESS

Problem: given a set P of 3-D points, find a function fitting these points

Fitness: sum of relative errors of a given function over P
RESULTS

Mexican Hat

$f(x, y) = 1 - (a^2 + b^2) \cdot e^{-\frac{a^2 + b^2}{2}}$

Found

f (x, y) =
((1 + sin(sin(x − ln(y) + 2)) + sin(sin(2sin(−y )))) ∗
2

sin(−1)
(ey−2
∗ sin(1 − sin(x + y)))) + ey −x+1
e
SUMMARY

Evolution of programs is an interesting and useful approach to problem solving

Challenging issues

Search guidance

Solution structure
THANK YOU

Questions?
