GENERATING PROTOTYPES IN
CLASSIFICATION PROBLEMS
Presented by:
Tarundeep Dhot
Dept of ECE
Concordia University
Acknowledgements
Fig: Training Phase (labeled data sets of Type X and Type Y, e.g. samples X2 and Y1, are fed to the system / classifier)
Classification problems undergo a supervised training phase.
A set of labeled data samples, each indicating the class it belongs to, is
provided to the system.
Based on such data, unknown data is classified using rules, decision trees
or mathematical functions.
PROPOSED APPROACH
Provides a new GP-based (genetic programming) method for determining the
prototypes in a c-class problem, where c ≥ 2.
Fig: labels X1 X2 X3, Y1 Y2 Y3, Z1 Z2
The set of prototypes describing the classes makes up a single
individual of the evolving population.
Each prototype is encoded as a derivation tree; an individual
is thus a multi-tree (a list of trees).
Classification consists of attributing a sample to one of the classes
i.e. associating the sample to one of the prototypes.
The fitness value of an individual is the recognition rate it obtains
on the training set.
Selection is done based on the fitness values of individuals.
At the end of the process, the best individual obtained constitutes
the set of prototypes to be used for the considered application.
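The classification and fitness steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a prototype is modelled as a hypothetical (label, match function) pair, and a sample is attributed to the first prototype that matches it. The slides do not specify either choice, and all names are made up for the example.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Sample = Sequence[float]
# Assumption of this sketch: a prototype is a (class label, match function)
# pair. The slides encode prototypes as derivation trees instead.
Prototype = Tuple[str, Callable[[Sample], bool]]


def classify(individual: List[Prototype], sample: Sample) -> Optional[str]:
    """Return the label of the first prototype that matches, else None."""
    for label, matches in individual:
        if matches(sample):
            return label
    return None


def recognition_rate(individual, training_set) -> float:
    """Fraction of labeled training samples that are classified correctly."""
    correct = sum(
        1 for sample, label in training_set
        if classify(individual, sample) == label
    )
    return correct / len(training_set)


# One individual = the whole set of prototypes (toy values).
individual = [
    ("X", lambda s: s[0] < 0.5),
    ("Y", lambda s: s[0] >= 0.5 and s[1] > 1.0),
]
training_set = [
    ((0.2, 0.0), "X"),
    ((0.9, 2.0), "Y"),
    ((0.7, 0.5), "Y"),  # matched by no prototype, so counted as an error
    ((0.1, 3.0), "X"),
]
rate = recognition_rate(individual, training_set)
print(rate)  # 3 of the 4 samples are recognised: 0.75
```

Selection would then compare such rates across the individuals of the population.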
DESCRIPTION OF THE APPROACH
Set of prototypes
Each prototype characterizes a class or subclass
Each class or subclass consists of a set of logical expressions.
Each expression may contain a variable number of predicates.
Each predicate establishes a condition on the value of the particular feature it
represents. If all predicates of an expression are satisfied by the values in the feature
vector describing a sample, we say the expression matches the sample.
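The matching rule can be illustrated with a small Python sketch. The predicate form used here (feature index, comparison operator, threshold) is an assumption, since the slides do not give the exact predicate syntax. The rule that a prototype matches when any of its expressions matches is likewise an assumption of the sketch.

```python
import operator
from dataclasses import dataclass

# Comparison operators allowed in a predicate (hypothetical choice).
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


@dataclass(frozen=True)
class Predicate:
    feature: int      # index into the feature vector
    op: str           # one of the keys of OPS
    threshold: float

    def holds(self, sample) -> bool:
        return OPS[self.op](sample[self.feature], self.threshold)


def expression_matches(expression, sample) -> bool:
    """An expression matches when all of its predicates are satisfied."""
    return all(p.holds(sample) for p in expression)


def prototype_matches(prototype, sample) -> bool:
    # Assumption: a prototype matches when any of its expressions matches.
    return any(expression_matches(e, sample) for e in prototype)


expr = [Predicate(0, ">", 1.0), Predicate(2, "<=", 0.5)]
print(expression_matches(expr, (1.5, 9.0, 0.5)))  # both predicates hold
print(expression_matches(expr, (1.5, 9.0, 0.7)))  # second predicate fails
```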
DESCRIPTION OF THE APPROACH (cont:)
STRUCTURE DEFINITION:
The implementation requires a program generator providing syntactically correct
programs and an interpreter to execute them.
PROGRAM GENERATOR: Based on grammar written for S-expressions.
A grammar G is defined as a quadruple G = (T, N, S, P),
where T and N are disjoint finite alphabets (terminal and nonterminal symbols),
S is the starting symbol, and
P is the set of production rules used to define strings.
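A grammar-driven generator of this kind might look like the Python sketch below. The grammar shown is a hypothetical toy one, because the production rules actually used are not listed in the slides. The depth limit is also an assumption, added so that every derivation terminates.

```python
import random

# Hypothetical toy grammar G = (T, N, S, P) for prefix S-expressions.
T = {"(", ")", "and", "<", ">", "f0", "f1", "0.25", "0.5", "0.75"}
N = {"EXPR", "PRED", "OP", "FEAT", "CONST"}
S = "EXPR"
P = {
    # The first rule of each nonterminal is non-recursive.
    "EXPR": [["PRED"], ["(", "and", "PRED", "EXPR", ")"]],
    "PRED": [["(", "OP", "FEAT", "CONST", ")"]],
    "OP": [["<"], [">"]],
    "FEAT": [["f0"], ["f1"]],
    "CONST": [["0.25"], ["0.5"], ["0.75"]],
}


def derive(symbol, rng, depth=0, max_depth=4):
    """Expand `symbol` into a list of terminals using random productions."""
    if symbol in T:
        return [symbol]
    rules = P[symbol]
    if depth >= max_depth:
        # Past the depth limit, keep only the non-recursive first rule.
        rules = rules[:1]
    tokens = []
    for sym in rng.choice(rules):
        tokens.extend(derive(sym, rng, depth + 1, max_depth))
    return tokens


rng = random.Random(0)
program = " ".join(derive(S, rng))
print(program)
```

Because every string is produced by applying rules from P starting at S, the generated programs are syntactically correct by construction.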
INTERPRETER: Implemented by an automaton that computes Boolean
functions. Such an automaton accepts an expression as input and returns
TRUE or FALSE as output, depending on whether the expression matches the
sample or not.
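As a rough illustration of such an interpreter, the Python sketch below evaluates an expression against a sample and returns True or False. A small recursive evaluator stands in for the automaton, and the prefix syntax `(and (> f0 0.5) (< f1 0.25))` is a hypothetical one chosen for the example, not the syntax from the slides.

```python
def tokenize(text):
    """Split an S-expression string into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Turn a token list into nested Python lists (consumes the tokens)."""
    tok = tokens.pop(0)
    if tok != "(":
        return tok
    node = []
    while tokens[0] != ")":
        node.append(parse(tokens))
    tokens.pop(0)  # drop the closing ")"
    return node


def evaluate(node, sample):
    """Return True when the parsed expression matches the sample."""
    head = node[0]
    if head == "and":
        return all(evaluate(child, sample) for child in node[1:])
    feature, const = node[1], float(node[2])
    value = sample[int(feature[1:])]  # "f1" refers to sample[1]
    if head == "<":
        return value < const
    if head == ">":
        return value > const
    raise ValueError(f"unknown operator: {head}")


def matches(expression, sample):
    return evaluate(parse(tokenize(expression)), sample)


rule = "(and (> f0 0.5) (< f1 0.25))"
print(matches(rule, (0.9, 0.1)))  # both predicates hold
print(matches(rule, (0.9, 0.4)))  # second predicate fails
```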
LEARNING CLASSIFICATION RULES (cont:)
In order to favor individuals that obtain good performance with fewer expressions,
the fitness of each individual is increased by 0.1/Nc, where Nc is the
number of expressions in the individual.
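The effect of the 0.1/Nc bonus can be checked with a short Python sketch. The function name and the sample values are made up for illustration.

```python
def adjusted_fitness(recognition_rate, num_expressions):
    """Recognition rate plus the 0.1/Nc bonus for compact individuals."""
    if num_expressions < 1:
        raise ValueError("an individual needs at least one expression")
    return recognition_rate + 0.1 / num_expressions


# Two individuals with the same recognition rate on the training set.
compact = adjusted_fitness(0.90, num_expressions=2)   # bonus of 0.05
verbose = adjusted_fitness(0.90, num_expressions=5)   # bonus of 0.02
print(compact > verbose)  # the individual with fewer expressions ranks higher
```

Since Nc is at least 1, the bonus never exceeds 0.1, so it mainly separates individuals whose recognition rates are close.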
LEARNING CLASSIFICATION RULES (cont:)
GENETIC OPERATORS:
Crossover and mutation are used as the genetic operators.
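The slides do not say how the two operators act on a multi-tree. The Python sketch below shows one possibility, assuming whole trees are the unit of exchange: crossover swaps the tails of the two lists of trees, and mutation replaces one tree with a newly generated one. Both function names and the `new_tree` callback are hypothetical.

```python
import random


def crossover(parent_a, parent_b, rng):
    """One-point crossover over the lists of trees; returns two children."""
    cut_a = rng.randint(1, len(parent_a))  # each child keeps at least one tree
    cut_b = rng.randint(1, len(parent_b))
    child_1 = parent_a[:cut_a] + parent_b[cut_b:]
    child_2 = parent_b[:cut_b] + parent_a[cut_a:]
    return child_1, child_2


def mutate(individual, new_tree, rng):
    """Return a copy with one randomly chosen tree replaced by new_tree()."""
    child = list(individual)
    child[rng.randrange(len(child))] = new_tree()
    return child


rng = random.Random(1)
# Trees are shown as plain strings to keep the example short.
parent_a = ["a1", "a2", "a3"]
parent_b = ["b1", "b2"]
child_1, child_2 = crossover(parent_a, parent_b, rng)
mutant = mutate(parent_a, lambda: "fresh", rng)
print(child_1, child_2, mutant)
```

Exchanging whole trees keeps every child syntactically correct without further checks; a subtree-level operator would also have to respect the grammar.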