[23].
There are several issues with selecting this as the parent selection scheme, which according to Eiben and Smith are [23]:
1. Individuals that have a much higher fitness value than the rest of the population will be selected repeatedly, taking over the entire population very rapidly, which results in premature convergence.
2. Selection pressure is very low if the fitness values are close to each other, which results in a slow increase in fitness values.
3. Transposed versions of the same fitness function do not behave the same. So a function $y = f(x)$ would behave differently than a function $y = f(x) + c$, where $c$ is some constant.
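Issue 3 can be seen with a few numbers. The sketch below assumes the usual fitness proportional rule, where individual $i$ is selected with probability $f_i / \sum_j f_j$ and all fitness values are non-negative; the function name is illustrative only.

```python
def fitness_proportional_probs(fitnesses):
    """P(i) = f_i / sum_j f_j; assumes non-negative fitness values with a positive sum."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]


# Adding the constant c = 10 keeps the ordering but flattens the probabilities.
original = fitness_proportional_probs([1, 2, 3])     # best individual: 3/6 = 0.5
shifted = fitness_proportional_probs([11, 12, 13])   # best individual: 13/36, about 0.36
print(original, shifted)
```

The fittest individual loses most of its selection advantage after the shift, even though the relative order of the population has not changed.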
3.6.2 Ranking
Ranking selection is similar to fitness proportional selection, but instead of using the fitness value to do the selection, the rank of each parent is used. The mapping from rank number to selection probability can be done in various ways, such as linearly or exponentially decreasing.
For linear ranking schemes the selection probability is parameterized by a value $s$ between 1.0 and 2.0. The formula used by Eiben and Smith is as follows [23]:
$$P_{\text{lin-rank}}(i) = \frac{2 - s}{\mu} + \frac{2i(s - 1)}{\mu(\mu - 1)}$$
where $\mu$ is the population size and the rank $i$ runs from 0 for the least-fit individual to $\mu - 1$ for the fittest.
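A minimal Python sketch of the linear ranking formula, assuming ranks are numbered from 0 (least fit) to $\mu - 1$ (fittest); `linear_ranking_probs` is a hypothetical helper name, not part of the cited work.

```python
def linear_ranking_probs(mu, s):
    """Linear ranking probabilities for ranks i = 0 (least fit) .. mu - 1 (fittest)."""
    if mu < 2:
        raise ValueError("need at least two individuals")
    if not 1.0 <= s <= 2.0:
        raise ValueError("s must lie in [1.0, 2.0]")
    return [(2 - s) / mu + 2 * i * (s - 1) / (mu * (mu - 1)) for i in range(mu)]


probs = linear_ranking_probs(5, 2.0)   # s = 2.0: the least-fit individual gets probability 0
print(probs)
```

To use the probabilities, sort the population by fitness in ascending order so that list position equals rank. With $s = 1.0$ every individual receives $1/\mu$, and increasing $s$ shifts probability toward the fittest.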
3.6.3 Roulette Wheel
Roulette wheel selection can be thought of as spinning a one-armed roulette wheel where the size of each section reflects the selection probability. The outline of the algorithm is given in Figure 3.2 [23], and the following is a description of it [23]:
In general, the algorithm can be applied to select $\lambda$ members from the set of $\mu$ parents into a mating pool. Here we discuss the traditional case of $\lambda = \mu$ that is usually implemented as follows. Assuming some order over the population (ranking or random) from 1 to $\mu$, we calculate a list of values $[a_1, a_2, \ldots, a_\mu]$ such that $a_i = \sum_{j=1}^{i} P_{sel}(j)$, where $P_{sel}(i)$ is defined by the selection distribution (fitness proportionate or ranking). Note that this implies $a_\mu = 1.0$.
BEGIN
  set current_member = 1;
  WHILE (current_member ≤ μ) DO
    Pick a random value r uniformly from [0,1];
    set i = 1;
    WHILE (a_i < r) DO
      set i = i + 1;
    OD
    set mating_pool[current_member] = parents[i];
    set current_member = current_member + 1;
  OD
END
Figure 3.2. Pseudocode for roulette wheel algorithm.
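The pseudocode in Figure 3.2 translates almost line for line into Python. This is a sketch, assuming `probs` already holds $P_{sel}(i)$ for each parent in the chosen order; the linear scan for $a_i$ is replaced by a binary search over the cumulative list, which finds the same index.

```python
import bisect
import itertools
import random


def roulette_wheel(parents, probs, rng=random):
    """Select len(parents) members into a mating pool (the lambda = mu case)."""
    # Cumulative list a_1..a_mu; force the last entry to 1.0 to absorb rounding error.
    a = list(itertools.accumulate(probs))
    a[-1] = 1.0
    mating_pool = []
    for _ in range(len(parents)):
        r = rng.random()  # uniform in [0, 1)
        # First index with a_i > r, matching "WHILE (a_i < r) DO i = i + 1"
        # except in the measure-zero case where r equals some a_i exactly.
        i = bisect.bisect_right(a, r)
        mating_pool.append(parents[i])
    return mating_pool


pool = roulette_wheel(["a", "b", "c"], [0.2, 0.3, 0.5], rng=random.Random(0))
print(pool)
```

Passing a seeded `random.Random` instance makes runs reproducible, which is convenient when comparing selection schemes.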
3.6.4 Tournament Selection
Tournament selection can be used when the population size is very large, or when the population is distributed in some way (such as on a parallel system) and obtaining information about the whole population is time consuming or not possible [23]. Tournament selection is simple to implement, and the pseudocode taken from Eiben and Smith is given in Figure 3.3 [23]. There are four factors that determine whether an individual will be selected [23]:
1. Its rank in the population. Effectively this is estimated without the need for sorting
the whole population.
2. The tournament size k. The larger the tournament, the more chance that it will
contain members of above-average fitness, and the less that it will consist entirely of
low-fitness members.
3. The probability p that the most fit member of the tournament is selected. Usually
this is 1.0 (deterministic tournaments) but stochastic versions are also used with
p<1.0. Clearly in this case there is lower selection pressure.
BEGIN
  set current_member = 1;
  WHILE (current_member ≤ μ) DO
    Pick k individuals randomly, with or without replacement;
    Select the best of these k comparing their fitness values;
    Denote this individual as i;
    set mating_pool[current_member] = i;
    set current_member = current_member + 1;
  OD
END
Figure 3.3. Pseudocode for tournament selection algorithm.
4. Whether individuals are chosen with or without replacement. In the second case,
with deterministic tournaments, the k-1 least-fit members of the population can
never be selected, whereas if the tournament candidates are picked with replacement,
it is always possible for even the least-fit member of the population to be selected as
a result of a lucky draw.
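The four factors above map directly onto parameters of a small Python sketch of Figure 3.3: `k` is the tournament size, `p` the probability that the best contestant wins, and `replace` controls sampling with or without replacement. When a stochastic tournament (`p < 1.0`) does not pick the best contestant, the sketch picks one of the others uniformly at random; that rule is our own assumption, since the text does not specify it.

```python
import random


def tournament_selection(population, fitness, k, p=1.0, replace=True, rng=random):
    """Fill a mating pool of len(population) members by repeated k-tournaments."""
    mating_pool = []
    for _ in range(len(population)):
        if replace:
            contestants = [rng.choice(population) for _ in range(k)]
        else:
            contestants = rng.sample(population, k)
        contestants.sort(key=fitness, reverse=True)  # fittest first
        if p >= 1.0 or k == 1 or rng.random() < p:
            winner = contestants[0]  # deterministic tournament when p = 1.0
        else:
            # Assumption: otherwise pick one of the remaining contestants at random.
            winner = rng.choice(contestants[1:])
        mating_pool.append(winner)
    return mating_pool


pool = tournament_selection(list(range(1, 11)), lambda x: x, k=3, replace=False,
                            rng=random.Random(0))
print(pool)
```

With `replace=False` and `p=1.0`, the two least-fit individuals (1 and 2) can never win a 3-way tournament, which is exactly factor 4 above.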
3.6.5 Truncation Selection
Truncation selection works by selecting a fraction of the best individuals and then
randomly selecting from this pool the parents that will be used for crossover. The general
outline is given in Figure 3.4 [24].
Input: The population P(t), the truncation threshold T ∈ [0, 1]
Output: The population after selection P(t)'
truncation(T, J_1, ..., J_N):
  J̄ ← population J sorted according to fitness, with the worst individual at the first position
  for i ← 1 to N do
    r ← random{(1 - T)N, ..., N}
    J'_i ← J̄_r
  od
  return {J'_1, ..., J'_N}
Figure 3.4. Pseudocode for truncation selection algorithm.
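A Python sketch of Figure 3.4. It assumes that exactly ⌊T·N⌋ of the best individuals (at least one) are kept; this rounding choice is ours, since the figure's range {(1 - T)N, ..., N} leaves the boundary handling open.

```python
import random


def truncation_selection(population, fitness, T, rng=random):
    """Keep the best fraction T of the population, then draw N parents uniformly from it."""
    if not 0.0 < T <= 1.0:
        raise ValueError("T must lie in (0, 1]")
    N = len(population)
    ranked = sorted(population, key=fitness)  # worst individual first, as in Figure 3.4
    n_keep = max(1, int(T * N))               # assumption: round the kept fraction down
    best = ranked[N - n_keep:]                # the n_keep fittest individuals
    return [rng.choice(best) for _ in range(N)]


print(truncation_selection(list(range(10)), lambda x: x, 0.5, rng=random.Random(3)))
```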
3.6.6 N Selection
N selection works by selecting a fraction of the best individuals, which will be used as parents for the crossover operators. If N is set to half the population size, then half of the population will be used as parents for the crossover operators. If we want the entire population to be used as parents, then we set N to the size of the population.
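N selection reduces to sorting by fitness and keeping the first N individuals. The sketch below assumes higher fitness is better; `n_selection` is a hypothetical name.

```python
def n_selection(population, fitness, n):
    """Return the n fittest individuals to be used as parents for crossover."""
    if not 0 < n <= len(population):
        raise ValueError("n must be between 1 and the population size")
    return sorted(population, key=fitness, reverse=True)[:n]


population = [3, 1, 4, 1, 5, 9, 2, 6]
print(n_selection(population, lambda x: x, len(population) // 2))  # → [9, 6, 5, 4]
```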
3.7 SURVIVOR SELECTION
Survivor selection determines which genotypes will survive into the next generation. The survivors become the basis for parent selection in the following generation. Various survivor selection schemes exist, such as ParentOffspring, Offspring and Combined.
3.7.1 ParentOffspring
During ParentOffspring selection we select the best genotypes from the combined pool of the parents that were selected for recombination and mutation and the offspring that were created from them by crossover and mutation.
3.7.2 Offspring
Offspring selection uses only the genotypes that resulted from applying crossover and mutation to the selected parents.
3.7.3 Combined
Combined selection retrieves the best genotypes from the union of the offspring and the entire current population.
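The three schemes differ only in which candidates compete for survival. In this sketch, `size` (how many survivors to keep) and "keep the fittest" are assumptions, since the text does not fix the number of survivors; higher fitness is taken to be better.

```python
def _best(candidates, fitness, size):
    """Keep the `size` fittest candidates."""
    return sorted(candidates, key=fitness, reverse=True)[:size]


def parent_offspring(selected_parents, offspring, fitness, size):
    """ParentOffspring: best of the selected parents together with their offspring."""
    return _best(selected_parents + offspring, fitness, size)


def offspring_only(offspring, fitness, size):
    """Offspring: survivors come from the offspring alone."""
    return _best(offspring, fitness, size)


def combined(population, offspring, fitness, size):
    """Combined: best of the offspring together with the entire population."""
    return _best(population + offspring, fitness, size)
```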
3.8 RECOMBINATION (CROSSOVER)
Recombination is the process of taking two parent genotypes and mating them to produce offspring, resulting in two newly created genotypes. Recombination is applied with a probability called the crossover rate, which according to Eiben and Smith is typically in the range [0.5, 1.0] [23]. There are various recombination strategies, and the choice depends on the type of problem being solved. A few of the well-known crossover operators are discussed below.
3.8.1 One-Point Crossover
One-point crossover consists of randomly selecting a location at which to split the parent genotypes into two [16]. Given parent 1, $p_1$, and parent 2, $p_2$, of length $l$, a random number between 1 and $l-1$ is selected. Each parent is split at that point, resulting in $p_1^{left}$, $p_1^{right}$, $p_2^{left}$ and $p_2^{right}$. The offspring are the concatenations $p_1^{left} \,|\, p_2^{right}$ and $p_2^{left} \,|\, p_1^{right}$. Figure 3.5 provides an example of this process.
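Parent 1 (p1_left | p1_right):    0 0 1 1 | 0 1 1 0
Parent 2 (p2_left | p2_right):    1 1 0 1 | 1 1 0 1
Offspring 1 (p2_left | p1_right): 1 1 0 1 | 0 1 1 0
Offspring 2 (p1_left | p2_right): 0 0 1 1 | 1 1 0 1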
Figure 3.5 One-point crossover.
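A minimal Python sketch of one-point crossover; it works on strings or lists, and the optional `point` argument (our own addition) lets the split be fixed for testing instead of drawn at random.

```python
import random


def one_point_crossover(p1, p2, point=None, rng=random):
    """Return p1_left|p2_right and p2_left|p1_right, split at a point in [1, l-1]."""
    if len(p1) != len(p2) or len(p1) < 2:
        raise ValueError("parents must have the same length l >= 2")
    if point is None:
        point = rng.randint(1, len(p1) - 1)  # randint includes both endpoints
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


# Figure 3.5 splits after the fourth gene.
child_a, child_b = one_point_crossover("00110110", "11011101", point=4)
print(child_a, child_b)
```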
3.8.2 N-Point Crossover
N-point crossover consists of choosing n crossover points, which break the parents into n + 1 segments, and then building the offspring by taking alternate segments from each parent [23]. The points are selected in a similar fashion as in one-point crossover and have to be between 1 and l-1. Figure 3.6 provides an example of n-point crossover with n = 2.
Parent 1:    0 0 1 1 0 1 1 0
Parent 2:    1 1 0 1 1 1 0 1
Offspring 1: 1 1 0 1 0 1 0 1
Offspring 2: 0 0 1 1 1 1 1 0
Figure 3.6 n-point crossover for n=2.
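A Python sketch of n-point crossover. The example uses cut points after genes 4 and 6, which is one choice consistent with Figure 3.6 (gene 6 is identical in both parents, so the figure alone does not pin down the second point).

```python
import random


def n_point_crossover(p1, p2, n=2, points=None, rng=random):
    """Cut both parents at n distinct points in [1, l-1] and alternate the segments."""
    l = len(p1)
    if points is None:
        points = rng.sample(range(1, l), n)
    bounds = [0, *sorted(points), l]
    c1, c2 = [], []
    for seg, (start, end) in enumerate(zip(bounds, bounds[1:])):
        # Even segments keep their own parent's genes, odd segments swap sources.
        src1, src2 = (p1, p2) if seg % 2 == 0 else (p2, p1)
        c1.extend(src1[start:end])
        c2.extend(src2[start:end])
    return c1, c2


a, b = n_point_crossover(list("00110110"), list("11011101"), points=[4, 6])
print("".join(a), "".join(b))
```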
3.8.3 Uniform Crossover
Uniform crossover treats each gene independently and decides which parent to take it from using a random value drawn for that specific gene [23]. Given a genotype of length L, L random numbers are generated from a uniform distribution, where each random value is between 0 and 1. Each value is then checked: if it is below some parameter p (usually 0.5), the gene for the first offspring is taken from the first parent; otherwise it is taken from the second parent. The second offspring is created using the inverse mapping. Figure 3.7 provides an example of uniform crossover.
Parent 1:    0 0 1 1 0 1 1 0
Parent 2:    1 1 0 1 1 1 0 1
Offspring 1: 0 1 1 1 1 1 1 1
Offspring 2: 1 0 0 1 0 1 0 0
Figure 3.7. Given a genotype of length 8 we generate 8 random values
from a uniform distribution which resulted in
[0.36,0.65,0.24,0.46,0.89,0.63,0.12,0.55].
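The rule above fits in two list comprehensions. This sketch accepts the random draws as an optional argument (a hypothetical convenience) so the values of Figure 3.7 can be replayed with p = 0.5.

```python
import random


def uniform_crossover(p1, p2, p=0.5, draws=None, rng=random):
    """Gene-by-gene choice: a draw below p takes the gene from parent 1 for the first child."""
    if draws is None:
        draws = [rng.random() for _ in p1]
    c1 = [a if r < p else b for a, b, r in zip(p1, p2, draws)]
    c2 = [b if r < p else a for a, b, r in zip(p1, p2, draws)]  # inverse mapping
    return c1, c2


draws = [0.36, 0.65, 0.24, 0.46, 0.89, 0.63, 0.12, 0.55]  # values from Figure 3.7
a, b = uniform_crossover(list("00110110"), list("11011101"), draws=draws)
print("".join(a), "".join(b))
```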
3.9 MUTATION
Mutation is the process of taking a genotype and randomly changing parts of it to form a new genotype. In our algorithm (Figure 3.1), mutation is applied after recombination and occurs with a small probability. The difference between recombination and mutation is that recombination uses two parents to produce offspring, whereas mutation uses only one parent and modifies its genotype to form the new offspring. Various mutation operators exist, and several are discussed below.
3.9.1 Binary Representations
Given a binary representation (a sequence of 0s and 1s) of length L and a small probability $p_m$, we treat each gene separately and flip a bit if the random value generated for it falls below $p_m$. So, on average there will be $L \cdot p_m$ mutations for a genotype of length L. As Figure 3.8 shows, bits 3 and 8 were mutated.
Before: 0 0 1 1 0 1 1 0
After:  0 0 0 1 0 1 1 1
Figure 3.8. Bitwise mutation.
According to Eiben and Smith: Most binary coded GAs use mutation rates in the
range such that on average between one gene per generation and one gene per offspring are
mutated [23].
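Bitwise mutation is a one-line comprehension in Python. The example uses $p_m = 1/L$, which gives on average one flipped gene per offspring and so sits at one end of the range quoted above; the specific value is only an illustration.

```python
import random


def bitwise_mutation(bits, p_m, rng=random):
    """Flip each bit independently with probability p_m (on average L * p_m flips)."""
    return [1 - b if rng.random() < p_m else b for b in bits]


parent = [0, 0, 1, 1, 0, 1, 1, 0]
child = bitwise_mutation(parent, p_m=1 / len(parent), rng=random.Random(7))
print(child)
```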
3.9.2 Integer Representations
Random resetting and creep mutation are two forms of mutation used when the encoding scheme uses an integer representation. A user-defined probability $p_m$ specifies how much mutation will occur, and mutation is applied gene by gene, independently of the other genes.
In random resetting, each gene is, with probability $p_m$, changed to a value chosen at random from a list of permissible values. This mutation is usually selected when the encoded values are cardinal attributes. On the other hand, creep mutation is used for ordinal attributes and works by adding a small value to each gene with probability p. This small value can be either positive or negative. For more information on these types of mutation, see [23].
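A sketch of both integer operators. For creep mutation we assume the step is drawn uniformly from the non-zero integers in [-max_step, max_step] and that results are clamped to optional bounds `low` and `high`; these parameters are illustrative, not taken from [23].

```python
import random


def random_resetting(genes, allowed, p_m, rng=random):
    """With probability p_m, replace a gene by a value drawn from the permissible list."""
    return [rng.choice(allowed) if rng.random() < p_m else g for g in genes]


def creep_mutation(genes, p_m, max_step=1, low=None, high=None, rng=random):
    """With probability p_m, add a small positive or negative step to a gene."""
    steps = [s for s in range(-max_step, max_step + 1) if s != 0]
    mutated = []
    for g in genes:
        if rng.random() < p_m:
            g += rng.choice(steps)
            if low is not None:
                g = max(low, g)  # keep the value inside the permitted range
            if high is not None:
                g = min(high, g)
        mutated.append(g)
    return mutated


print(random_resetting([1, 2, 3], allowed=[1, 2, 3, 4], p_m=0.5, rng=random.Random(1)))
print(creep_mutation([5, 5, 5], p_m=0.5, rng=random.Random(1)))
```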
3.9.3 Permutation Representations
In permutation representations we can no longer deal with each gene independently, since changing a specific gene might introduce duplicates, so that the genotype is no longer a permutation. For example, given a city tour with a genotype of {5, 2, 3, 1, 4}, changing the second gene to 4 would result in {5, 4, 3, 1, 4}, which no longer contains city 2. The resulting effect is that we never visit city 2 and at the same time visit city 4 twice. When performing a permutation mutation we want to keep the same values, without introducing duplicates or deleting any value; the only change made is the order in which we traverse the values. There are various mutation operators that deal with permutation-based representations, and a few are discussed below.
3.9.3.1 SWAP MUTATION
Swap mutation works by randomly selecting two genes in the genotype and
swapping them. An example of swap mutation is given in Figure 3.9.
1 2 3 4 5 6 7 8
1 2 8 4 5 6 7 3
Figure 3.9. Swap mutation, genes 3 and 8 get swapped.
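A Python sketch of swap mutation. Positions are 0-based, so genes 3 and 8 of Figure 3.9 are positions 2 and 7; the optional explicit positions are a testing convenience.

```python
import random


def swap_mutation(perm, i=None, j=None, rng=random):
    """Exchange the genes at two positions (0-based); no value is duplicated or lost."""
    perm = list(perm)
    if i is None or j is None:
        i, j = rng.sample(range(len(perm)), 2)
    perm[i], perm[j] = perm[j], perm[i]
    return perm


print(swap_mutation(range(1, 9), 2, 7))  # → [1, 2, 8, 4, 5, 6, 7, 3]
```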
3.9.3.2 INSERT MUTATION
Insert mutation works by randomly selecting two genes in the genotype and moving the second one so that it directly follows the first, shifting the genes in between along to make room. An example of insert mutation is given in Figure 3.10.
1 2 3 4 5 6 7 8
1 2 3 7 4 5 6 8
Figure 3.10. Insert mutation, genes 3 and 7 get selected and gene 7 is placed directly after gene 3.
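Insert mutation can be sketched with `list.pop` and `list.insert`. With 0-based positions 2 and 6 (genes 3 and 7) this reproduces Figure 3.10.

```python
import random


def insert_mutation(perm, i=None, j=None, rng=random):
    """Move the later of two selected genes so it directly follows the earlier one."""
    perm = list(perm)
    if i is None or j is None:
        i, j = rng.sample(range(len(perm)), 2)
    i, j = min(i, j), max(i, j)
    gene = perm.pop(j)
    perm.insert(i + 1, gene)  # genes between i and j shift one place to the right
    return perm


print(insert_mutation(range(1, 9), 2, 6))  # → [1, 2, 3, 7, 4, 5, 6, 8]
```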
3.9.3.3 SCRAMBLE MUTATION
Scramble mutation works by selecting a subset of the genotype and randomly
scrambling the individual genes. An example of scramble mutation is given in Figure 3.11.
1 2 3 4 5 6 7 8
1 2 4 6 3 5 7 8
Figure 3.11. Scramble mutation, subset 3 to 6 gets selected then is
scrambled to generate the new genotype.
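A sketch of scramble mutation; positions are 0-based and inclusive, so the subset 3 to 6 of Figure 3.11 is positions 2 to 5. The shuffled order depends on the random generator, so the exact output will generally differ from the figure.

```python
import random


def scramble_mutation(perm, start=None, end=None, rng=random):
    """Randomly reorder the genes in positions start..end (0-based, inclusive)."""
    perm = list(perm)
    if start is None or end is None:
        start, end = sorted(rng.sample(range(len(perm)), 2))
    segment = perm[start:end + 1]
    rng.shuffle(segment)
    perm[start:end + 1] = segment
    return perm


print(scramble_mutation(range(1, 9), 2, 5, rng=random.Random(4)))
```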
3.9.3.4 INVERSE MUTATION
Inverse mutation works by selecting a subset of the genotype and reversing the order
of the genes. An example of inverse mutation is provided in Figure 3.12.
1 2 3 4 5 6 7 8
1 2 3 7 6 5 4 8
Figure 3.12. Inverse mutation, subset 4 to 7 gets selected then order is
reversed.
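A sketch of inverse (inversion) mutation using slice reversal; the subset 4 to 7 of Figure 3.12 corresponds to 0-based positions 3 to 6.

```python
import random


def inversion_mutation(perm, start=None, end=None, rng=random):
    """Reverse the order of the genes in positions start..end (0-based, inclusive)."""
    perm = list(perm)
    if start is None or end is None:
        start, end = sorted(rng.sample(range(len(perm)), 2))
    perm[start:end + 1] = perm[start:end + 1][::-1]
    return perm


print(inversion_mutation(range(1, 9), 3, 6))  # → [1, 2, 3, 7, 6, 5, 4, 8]
```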
3.10 HYBRID GA
A hybrid genetic algorithm follows the main loop of a standard genetic algorithm but incorporates other techniques within its framework, with the aim of performing better than the standard genetic algorithm. A standard genetic algorithm depends on balancing two conflicting objectives: exploiting the best solutions found so far and, at the same time, exploring the search space for promising solutions [26]. Using local search methods within a genetic algorithm can accelerate the search towards the global optimum by introducing new genes, and if local knowledge is used, the genetic algorithm is sped up further by finding the most promising search region [26].
Using local search methods within the genetic algorithm can have one or more of the following benefits: improving the quality of the solution, improving the efficiency, guaranteeing feasible solutions, the ability to use fitness function estimation, and the ability to substitute other operations for the standard ones [26]. Local search improves the quality of the solution through its ability to locate local optima with high accuracy. Efficiency is improved by reducing the time required to reach a solution, which can be important when dealing with real-world problems, since function evaluations are the most time-consuming part of the algorithm [26]. A standard genetic algorithm can introduce infeasible solutions, and resources are then consumed searching for a solution where none exists. By applying problem-specific knowledge, infeasible solutions can be prevented, or they can be repaired in order to provide a feasible solution. Fitness function estimation can be used if the fitness function is slow or complex to evaluate, without impacting the effectiveness of the search. Because the genetic algorithm's framework performs its operations in a stepwise fashion, it is very simple to replace operations such as parent selection, crossover or mutation with other, more suitable operators.
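As one possible illustration of embedding local search, the sketch below uses a greedy bit-flip hill climber on binary genotypes. The choice of hill climbing is our own assumption; [26] discusses local search in general terms, and a real hybrid would use a problem-specific method.

```python
def bit_flip_hill_climb(bits, fitness, max_passes=10):
    """Greedy local search: keep any single-bit flip that improves fitness (maximizing)."""
    best = list(bits)
    best_fit = fitness(best)
    for _ in range(max_passes):
        improved = False
        for i in range(len(best)):
            candidate = best.copy()
            candidate[i] = 1 - candidate[i]
            cand_fit = fitness(candidate)
            if cand_fit > best_fit:
                best, best_fit, improved = candidate, cand_fit, True
        if not improved:  # local optimum: no single flip helps
            break
    return best


# Hypothetical use inside a GA loop, refining each offspring after mutation:
# offspring = [bit_flip_hill_climb(child, fitness) for child in offspring]
print(bit_flip_hill_climb([0, 1, 0, 0], fitness=sum))  # "OneMax" toy fitness
```

Each candidate flip costs one fitness evaluation. This is why the efficiency argument above matters: local search pays off only if it reaches good regions with fewer total evaluations than the plain genetic algorithm would need.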
CHAPTER 4
GENETIC ALGORITHMS IN TIMETABLING
4.1 TIMETABLING
A simple timetabling problem can be described as having a set of $v$ events $E = \{e_1, e_2, \ldots, e_v\}$ and a set of $s$ time slots $T = \{t_1, t_2, \ldots, t_s\}$, where an assignment is defined as the ordered pair $(a, b)$ such that $a \in E$ and $b \in T$, with the interpretation that event $a$ occurs in time slot $b$ [2]. Given our example for student scheduling (Table 1.2), the events $E$ are all the student courses that we need to schedule and can be defined as follows:
$$E = \{s_{11}^{M}, s_{11}^{S}, s_{11}^{E}, s_{11}^{PE}, \ldots, s_{95}^{M}, s_{94}^{S}, s_{94}^{E}, s_{93}^{PE}\}.$$
$s_{ij}^{C}$ means that student $i$ must take course $j$ of subject $C$ (M for math, S for science, E for English and PE for physical education), so $s_{95}^{M}$ decodes into student 9 needing to take the pre-calculus course. Our time slots can be defined in the following fashion:
$$T = \{t_{8,1,1}^{M}, t_{8,5,1}^{M}, t_{8,3,1}^{S}, t_{8,4,1}^{S}, t_{8,2,1}^{E}, t_{8,4,1}^{E}, t_{8,1,1}^{PE}, \ldots, t_{11,2,1}^{M}, t_{11,3,2}^{M}, t_{11,3,2}^{S}, t_{11,1,2}^{S}, t_{11,2,2}^{E}, t_{11,3,1}^{E}, t_{11,2,1}^{PE}\}$$
where $t_{ijk}$ is defined as the time slot that begins at $i$ for course $j$, section $k$. For example, $t_{11,2,2}^{E}$ decodes as English 2, section 2, beginning at 11 am. The time may be omitted from the subscript, since each course-section is unique, and a lookup table can instead specify the time for that particular course-section.
4.2 CONSTRAINTS
In timetabling we are also given constraints and objectives that we must meet. Some
of these constraints/objectives render most of the solutions from the search space infeasible.
Various constraints will be discussed below but these are not the only ones and some
constraints are problem specific.
4.2.1 Edge Constraints
Edge constraints are constraints where two events can't be scheduled for the same time slot [3]. In our example, this is the same student being scheduled to take two courses at the same time. This is impossible, since the student can't be in two places at the same time. The term collision(s) is used throughout to mean a violation of one or more edge constraints.
4.2.2 Ordering Constraints
Ordering constraints are constraints where there needs to be some type of ordering [3]. For example, if a student is taking a science class that has a corresponding lab class, then there might be an ordering requirement that the student's class has to occur before the lab. In a high school setting, a student might have two math courses where one is a support class for the other, so an ordering constraint might be that the main math class has to come before the math support class.
4.2.3 Event-spread Constraints
Sometimes there is a requirement for certain events to be spread out in some fashion [3]. A classic example is exam timetabling, where you want to schedule exams such that no student has more than two exams on a particular day. Another example might be that you want professors in a university to teach on no more than three days, so that they can have the other two days free for research. In a high school setting one might decide that science and math courses should be spread out, with at least one course between them.
4.2.4 Capacity Constraints
Capacity constraints deal with a room not exceeding its capacity, or a teacher being limited to teaching only so many students due to contract arrangements [3]. In high school timetabling, enrollment in specific courses, regardless of room, might be limited to a given number. For example, math courses might be limited to 30 spaces, while art or computer courses might be limited to, say, 15 spaces due to the resources available for that particular course.
4.2.5 Hard/Soft Constraints
Hard constraints are strictly enforced and cannot be violated [3]. An example of a hard constraint is: no person can be allocated to more than one place at the same time. Soft constraints are constraints that we wish to satisfy but that are not absolutely necessary [3]. An example of a soft constraint is: a teacher teaches all of their courses in succession. That is, they don't teach some courses in the morning and then some in the afternoon with a significant break in between [1].
4.3 REPRESENTATION
Representation is an important part of a genetic algorithm. Representation allows us
to represent a real world timetable and map it into a genotype that will be used as the basis
to undergo crossover and mutation. There are various methods to represent a timetabling
problem and two are discussed below.
4.3.1 Simple Ordering
Simple ordering orders the events in a genotype, with each position holding the time slot that the event should belong to. For example, if we have 10 events that we need to place in time slots and the resulting genotype is $G = 3721774658$, then event 1 is in time slot 3, event 2 in time slot 7, event 3 in time slot 2 and so on. Simple ordering can be coded using a one-dimensional array where each cell contains a student's event and its corresponding location in the timetable, as long as the timetable is also ordered sequentially.
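Decoding a simple-ordering genotype is a direct index lookup. The sketch stores $G$ as a digit string only to mirror the example, which limits it to slots 0 to 9; a general implementation would use a list of integers.

```python
from collections import defaultdict


def decode_simple_ordering(genotype):
    """Position k (1-based) holds the time slot of event k."""
    return {event: int(slot) for event, slot in enumerate(genotype, start=1)}


def events_per_slot(assignment):
    """Invert the mapping to list which events share each time slot."""
    slots = defaultdict(list)
    for event, slot in assignment.items():
        slots[slot].append(event)
    return dict(slots)


G = "3721774658"
assignment = decode_simple_ordering(G)
print(assignment[1], assignment[2], assignment[3])  # 3 7 2
print(events_per_slot(assignment)[7])               # [2, 5, 6]
```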
4.3.2 Matrix Representation
Matrix representation can be used as an encoding scheme. A matrix representation of size $m \times n$ was used to solve a school timetabling problem [27], where $m$ is the number of classes and $n$ is the number of time periods per class in a week. Special operators had to be defined for crossover and mutation, since the standard operators discussed in Chapter 3 are designed for a one-dimensional array representation and would not work.
4.4 FITNESS
In order for the genetic algorithm to know whether our genotypes are improving, we have to define a fitness function that gives us information about the quality of a particular timetable. This fitness function has to take into consideration the various constraints that we have to meet, so that better timetables receive better fitness values. Various fitness functions have been defined for timetabling, and they are either functions that we minimize or functions that we maximize.
General guidelines for the behavior of a fitness function are defined with the
assumption that we are trying to maximize our fitness function [3]. The following
description is taken from [3]:
Given a space P of candidate solutions to a problem, there are three desirable properties for the fitness function $f(p)$ $(p \in P)$. We would like
1. $f(p)$ to be a normally increasing function of the quality of $p$ as a candidate solution of the problem, so that optimal solutions lie at the global maxima of $f$.
2. $f$ to be reasonably well-behaved function, so that its value conveys some information about the quality of $p$ as a solution in most parts of the space.
3. $f(p)$ to change in some way that reflects this as $p$ gets closer to being an optimal solution.
The fitness function is as follows [3]:
$$f(p) = \frac{1}{1 + \sum_{i=1}^{n} w_i c_i(p)}$$
where $w_i$ is the weight associated with constraint $i$ and $c_i(p)$ is the number of violations of constraint $i$ at solution $p$. This function has a range of $(0, 1]$, and an optimal solution occurs when we have 0 violations, so that $\sum_{i=1}^{n} w_i c_i(p) = 0$, which results in $f(p) = 1$.
An alternative fitness function, one that we try to minimize, was defined in [28]. The function in its general form is defined as
$$F(x) = \sum_{i} w_i P_i(x)$$
where $w_i$ is the weight for constraint $i$, which can be either a hard or a soft constraint, and $P_i(x)$ is the number of violations of constraint $i$. We get an optimal solution when we have 0 violations, resulting in $F(x) = 0$.
Regardless of which equation we use, we can define weights for the hard and soft constraints. Given that we want to avoid violations of the hard constraints, we can select higher weight values for the hard constraints than for our soft constraints. Furthermore, instead of hard-coding these values, we can make them available for modification by the end user, who then determines how much weight to assign to particular constraints.
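Both fitness functions share the weighted violation sum, so they can be sketched side by side. The weights below (10 for one hard constraint, 1 for one soft constraint) are hypothetical values chosen only to show a hard constraint weighted above a soft one.

```python
def weighted_violations(violations, weights):
    """sum_i w_i * c_i(p): the weighted number of constraint violations."""
    if len(violations) != len(weights):
        raise ValueError("one weight per constraint is required")
    return sum(w * c for w, c in zip(weights, violations))


def fitness_maximize(violations, weights):
    """f(p) = 1 / (1 + sum_i w_i c_i(p)); equals 1 for a timetable with no violations."""
    return 1.0 / (1.0 + weighted_violations(violations, weights))


def fitness_minimize(violations, weights):
    """F(x) = sum_i w_i P_i(x); equals 0 for a timetable with no violations."""
    return weighted_violations(violations, weights)


weights = [10, 1]  # hypothetical: hard constraint weighted above the soft constraint
print(fitness_maximize([1, 2], weights), fitness_minimize([1, 2], weights))
```

Making `weights` a parameter rather than a constant is one simple way to expose the weights to the end user, as suggested above.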
4.5 MUTATION
Mutating a genotype by using problem-specific information is referred to as smart mutation or directed mutation [3]. Smart mutation guides the evolutionary algorithm to more promising solutions by using information about the specific problem to aid the evolutionary cycle. There are four smart mutation operators that we use in our genetic algorithm [3]. These four operators are:
1. Violation-directed Mutation (VDM): choose an event with a maximal violation score, and randomly alter its assigned time.
2. Event-Freeing Mutation (EFM): choose an event with a maximal violation score. Then, give it a new time which will maximally reduce this score.
3. Stochastic Violation-directed Mutation (SVDM): stochastically select an event, bias toward those with higher violation scores, and randomly alter its assigned time.
4. Stochastic Event-freeing Mutation (SEFM): stochastically select an event, bias toward those with higher violation scores, then stochastically select a new time for this event, bias toward times which will maximally reduce the event's violation score.
A random mutation operator is also used as a baseline against which the above mutation operators are compared. The specifics of using these operators in our implementation are discussed with the experimental results in Chapter 5.
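A sketch of VDM and SVDM, assuming a timetable stored as a dict from event to slot index and a precomputed dict of per-event violation scores. The "bias toward higher scores" of SVDM is implemented here as selection proportional to score, which is our own assumption; EFM and SEFM would additionally need to re-score candidate times and are not shown.

```python
import random


def pick_max_violation(scores, rng=random):
    """VDM/EFM step: an event with a maximal violation score (ties broken at random)."""
    top = max(scores.values())
    return rng.choice([e for e, s in scores.items() if s == top])


def pick_biased_violation(scores, rng=random):
    """SVDM/SEFM step (assumed scheme): choose events with probability proportional to score."""
    events = list(scores)
    weights = [scores[e] for e in events]
    if sum(weights) == 0:
        return rng.choice(events)
    return rng.choices(events, weights=weights, k=1)[0]


def violation_directed_mutation(timetable, scores, n_slots, stochastic=False, rng=random):
    """Move the chosen event to a uniformly random time slot (VDM, or SVDM if stochastic)."""
    pick = pick_biased_violation if stochastic else pick_max_violation
    event = pick(scores, rng)
    mutated = dict(timetable)  # leave the parent timetable unchanged
    mutated[event] = rng.randrange(n_slots)
    return mutated


scores = {"e1": 1, "e2": 5, "e3": 2}
print(violation_directed_mutation({"e1": 0, "e2": 1, "e3": 2}, scores, 4, rng=random.Random(1)))
```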
CHAPTER 5
EXPERIMENTAL RESULTS
5.1 INTRODUCTION
Two high school timetabling problems are examined, and for each of these problem types we look at the following: representation, type of constraints, recombination operators, mutation operators and the fitness function. The results of the experiments are then listed to determine how well they perform.
For all of our tests, the assumption is that students have up to seven periods assigned, with the exception that some seniors might have fewer periods assigned, since they might have already completed most of their requirements for graduation. If a given school has fewer than seven periods, this can easily be handled by assigning students only the courses they need to take, one for each period that they are required to be in school. We chose to work with periods rather than specific times: each period follows the previous one, with some time allowed for students to go from class to class, and lunch can be defined between any two periods. So solving the timetable for one particular day solves the entire timetable for every day, since the periods are the same for every day of the week.
In some schools, the periods shift by one every day of the week, with the exception of period one. There is no need to modify the problem to fulfill this requirement, because once we have a timetable we just shift the periods the students go to by one for the following days. For example, if students go to P1, P2, P3, P4, P5, P6 on Mondays, then on Tuesdays they go to P1, P3, P4, P5, P6, P2, on Wednesdays they attend P1, P4, P5, P6, P2, P3, and so forth.
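The daily shift is a rotation of every period except the first. A small sketch, assuming days are numbered from 0 for Monday; `periods_for_day` is a hypothetical helper.

```python
def periods_for_day(periods, day):
    """Period order on a given day (0 = Monday): the first period stays fixed,
    and the remaining periods rotate left by one position per day."""
    first, rest = periods[0], list(periods[1:])
    k = day % len(rest)
    return [first] + rest[k:] + rest[:k]


week = ["P1", "P2", "P3", "P4", "P5", "P6"]
for day in range(3):
    print(periods_for_day(week, day))
```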
5.2 FIXED MASTER SCHEDULE
Given a set of students with the courses that they need to take, a master schedule with assigned courses, and a maximum enrollment for each course-section, assign the students to the master schedule such that the students are assigned to the courses they requested and over-enrollment is minimized.
For this problem, the teachers are already assigned to the courses that they are going to teach and to their timeslots. The only problem left is to fit students into these timeslots, making sure that students take the courses they requested and are not assigned to more than one course at the same time. Also, there might be some timeslots where the capacity is exceeded, and we wish to minimize this. A heuristic function can run during the evolutionary process to assist it if no progress has been made after a specific number of generations.
5.2.1 Test Data
The data used came from a small high school with 350 students, each taking approximately seven courses, and a master schedule/timetable with 124 timeslots (course-sections) broken up into seven periods. The master schedule was already predefined, with 24 teachers teaching the courses that made up the master schedule. The existing student schedule was taken, and for each student enrolled in a specific course-section we added that course to the student's list of courses that they needed to take; student capacities were set for each of the timeslots. So, given 350 students with approximately seven courses each, there are approximately $350 \times 7 = 2450$ events that have to be arranged on the timetable with 124 slots.
The test data was read from two XML files, one for the master schedule and one for the students' requests. A sample master schedule is shown in Figure 5.1, and the full master schedule that was used for the tests is given in [29]. Each course that is being taught is tied to the specific teacher who teaches it. For this case, there is no need to know the teacher, because the master schedule is fixed and the teachers have already been assigned to the courses they will teach.
The student requests XML data lists all the students with the courses that they need to take. A sample XML data set is given in Figure 5.2, and the full XML data is given in [30].
5.2.2 Representation
I chose to use simple ordering as discussed in section 4.3.1 and ordered the master
schedule into linear timeslots. A phenotype for Table 1.3 is given in Figure 5.3.
<MasterSchedule>
<Period id="1">
<course name="Algebra 1" section="1" capacity="40"/>
<course name="Earth Science" section="1" capacity="40"/>
<course name="World History" section="1" capacity="40"/>
</Period>
...
<Period id="7">
<course name="English 1" section="6" capacity="40"/>
<course name="Office Aide" section="5" capacity="2"/>
<course name="World History" section="4" capacity="40"/>
</Period>
</MasterSchedule>
Figure 5.1. Sample master schedule.
<StudentRequests>
<Student id="832316" name="Student 1">
<course name="AVID 2 and 3"/>
<course name="Algebra 2"/>
<course name="Beg PE"/>
<course name="Biology"/>
<course name="English 2"/>
<course name="Spanish 1"/>
<course name="World History"/>
</Student>
...
<Student id="832923" name="Student N">
<course name="Algebra 2"/>
<course name="Beg PE"/>
<course name="English 2"/>
<course name="Food Prep and Fas"/>
<course name="Foundations Bio"/>
<course name="Spanish 2 NN"/>
<course name="World History"/>
</Student>
</StudentRequests>
Figure 5.2. Sample student requests.
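The two XML formats of Figures 5.1 and 5.2 can be read with Python's standard `xml.etree.ElementTree`. This sketch uses a shortened copy of the samples (the `...` elisions are dropped so the strings are well-formed); flattening the periods in file order gives a linear list of timeslots that matches the simple ordering described in section 5.2.2. The function names and dictionary keys are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

MASTER_XML = """<MasterSchedule>
  <Period id="1">
    <course name="Algebra 1" section="1" capacity="40"/>
    <course name="Earth Science" section="1" capacity="40"/>
  </Period>
  <Period id="7">
    <course name="Office Aide" section="5" capacity="2"/>
  </Period>
</MasterSchedule>"""

REQUESTS_XML = """<StudentRequests>
  <Student id="832316" name="Student 1">
    <course name="Algebra 2"/>
    <course name="Biology"/>
  </Student>
</StudentRequests>"""


def parse_master_schedule(xml_text):
    """Flatten the master schedule into a list of timeslots (course-sections)."""
    root = ET.fromstring(xml_text)
    return [
        {
            "period": int(period.get("id")),
            "course": course.get("name"),
            "section": int(course.get("section")),
            "capacity": int(course.get("capacity")),
        }
        for period in root.findall("Period")
        for course in period.findall("course")
    ]


def parse_student_requests(xml_text):
    """Map each student id to the list of requested course names."""
    root = ET.fromstring(xml_text)
    return {s.get("id"): [c.get("name") for c in s.findall("course")]
            for s in root.findall("Student")}


slots = parse_master_schedule(MASTER_XML)
requests = parse_student_requests(REQUESTS_XML)
print(len(slots), slots[2]["course"], requests["832316"])
```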
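$$\text{phenotype} = \begin{bmatrix}
t_{1,1}^{M} & t_{5,1}^{M} & t_{3,1}^{S} & t_{4,1}^{S} & t_{2,1}^{E} & t_{4,1}^{E} & t_{1,1}^{PE} \\
t_{2,1}^{M} & t_{4,1}^{M} & t_{2,1}^{S} & t_{1,1}^{S} & t_{1,1}^{E} & t_{1,3}^{E} & t_{3,1}^{PE} \\
t_{1,2}^{M} & t_{3,1}^{M} & t_{2,2}^{S} & t_{4,2}^{S} & t_{1,2}^{E} & t_{2,4}^{E} & t_{3,2}^{PE} \\
t_{2,1}^{M} & t_{3,2}^{M} & t_{3,2}^{S} & t_{1,2}^{S} & t_{2,2}^{E} & t_{3,1}^{E} & t_{2,1}^{PE}
\end{bmatrix}$$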
Figure 5.3. Phenotype representation of Table 1.3 consisting of
timeslots (course-section).
The phenotype is an ordering of the master schedule. Looking at the master schedule
in Table 1.3 we can see that there are four periods with Math, Science, English and Physical
Education as the courses being taught. Each period is taught for an hour and our phenotype
would be represented as a one dimensional array.
The genotype is a one-dimensional array where each allele contains the student and an index representing the slot in the phenotype for the course-section the student would take. The alleles are grouped by student, so if student 1 has 4 courses, then the first four elements of the array correspond to the timeslots for student 1, and so forth. If you looked at two different genotypes, you would see that each allele corresponds to the same student but not necessarily the same timeslot. So, a specific genotype going through the evolutionary process of section 2.1, with the phenotype just defined in Figure 5.3, would be a specific instance of an array of size 36, as in Figure 5.4.
$$\text{genotype} = \big[(S_1, 1), (S_1, 10), (S_1, 10), (S_1, 18), \ldots, (S_9, 12), (S_9, 7), (S_9, 21), (S_9, 15)\big]$$
Figure 5.4. Specific instance of a genotype
representation for Table 1.2.
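Because alleles are stored student by student, a genotype can be regrouped with `itertools.groupby`, which merges only consecutive items and therefore relies on that grouping. The sketch uses only the alleles of Figure 5.4 that are shown explicitly; the student labels as strings are an assumption.

```python
from itertools import groupby


def slots_by_student(genotype):
    """Group (student, slot index) alleles; assumes each student's alleles are contiguous."""
    return {student: [slot for _, slot in alleles]
            for student, alleles in groupby(genotype, key=lambda allele: allele[0])}


genotype = [("S1", 1), ("S1", 10), ("S1", 10), ("S1", 18),
            ("S9", 12), ("S9", 7), ("S9", 21), ("S9", 15)]
print(slots_by_student(genotype))
```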
5.2.3 Constraints
There are three constraints that we have defined: collision, mismatch and course-overload. Collisions are edge constraints and occur when the same student is assigned to more than one course-section during a specific time. A mismatch occurs when a student is not assigned to the correct course that the student requested. Each specific instance of a course, or more precisely each course-section, is allowed to have up to a certain limit of students enrolled, and a course-overload occurs when that limit is exceeded.
Weights are assigned to these three constraints, and for our purposes collisions were assigned the largest weight, followed by mismatches and then course-overloads. Since our fitness function is one that we minimize, collisions contribute the most to the fitness value, followed by mismatches and then course-overloads. Therefore, our evolutionary algorithm gives the highest priority to reducing collisions, followed by mismatches and then course-overloads, during the mutation process.
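A sketch of counting the three violation types for one genotype, assuming timeslots described as in the XML sketch (with `period`, `course` and `capacity` keys) and a mapping from each student to their requested courses. Counting a mismatch as one unmet course request is our interpretation, and the weights (100, 10, 1) are hypothetical values that only respect the ordering described above.

```python
from collections import Counter, defaultdict


def count_violations(genotype, slots, requests):
    """Return (collisions, mismatches, overloads) for a genotype of (student, slot) alleles."""
    by_student = defaultdict(list)
    for student, k in genotype:
        by_student[student].append(k)

    collisions = mismatches = 0
    for student, ks in by_student.items():
        periods = Counter(slots[k]["period"] for k in ks)
        collisions += sum(n - 1 for n in periods.values())  # extra courses in one period
        assigned = Counter(slots[k]["course"] for k in ks)
        # Counter subtraction keeps only positive counts, i.e. unmet requests.
        mismatches += sum((Counter(requests[student]) - assigned).values())

    enrollment = Counter(k for _, k in genotype)
    overloads = sum(max(0, n - slots[k]["capacity"]) for k, n in enrollment.items())
    return collisions, mismatches, overloads


def penalty(violations, weights=(100, 10, 1)):
    """Weighted sum to minimize; hypothetical weights ordered collisions > mismatches > overloads."""
    return sum(w * v for w, v in zip(weights, violations))


slots = [
    {"period": 1, "course": "Algebra 1", "capacity": 1},
    {"period": 1, "course": "Biology", "capacity": 2},
    {"period": 2, "course": "Biology", "capacity": 2},
]
requests = {"A": ["Algebra 1", "Biology"], "B": ["Algebra 1", "Biology"]}
v = count_violations([("A", 0), ("A", 1), ("B", 0), ("B", 2)], slots, requests)
print(v, penalty(v))
```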
5.2.4 Initialization
Initializing the genotypes is simply done by randomly selecting timeslots for each
student for the number of courses that they are required to take.
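Random initialization can be sketched as one random slot index per requested course. As in the text, slots are drawn from all timeslots, not only those of the requested course; `n_slots` would be 124 for the test data, and the helper names are hypothetical.

```python
import random


def random_genotype(requests, n_slots, rng=random):
    """One allele per requested course: (student, random slot index in [0, n_slots))."""
    return [(student, rng.randrange(n_slots))
            for student, courses in requests.items()
            for _ in courses]


def initial_population(requests, n_slots, size, rng=random):
    """Build `size` independent random genotypes."""
    return [random_genotype(requests, n_slots, rng) for _ in range(size)]


requests = {"A": ["Algebra 2", "Biology"], "B": ["English 2"]}
print(random_genotype(requests, 124, rng=random.Random(0)))
```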
5.2.5 Recombination
Four crossover operators are used, and each is illustrated below on specific genotypes to give the reader a visual view of how the crossover operators work.
5.2.5.1 UNIFORM CROSSOVER
Uniform crossover was discussed in section 3.8.3 and works by looking at each
specific allele and performing the crossover. Given two genotypes and a probability array,
one for each allele, Figure 5.5 shows the results of a uniform crossover taking place.
geno_1     = (S1,1), (S1,10), (S1,20), (S1,18), ..., (S9,12), (S9,7), (S9,21), (S9,15)
geno_2     = (S1,12), (S1,8), (S1,15), (S1,23), ..., (S9,14), (S9,3), (S9,27), (S9,19)
new_geno_1 = (S1,1), (S1,10), (S1,15), (S1,18), ..., (S9,14), (S9,7), (S9,21), (S9,19)
new_geno_2 = (S1,12), (S1,8), (S1,20), (S1,23), ..., (S9,12), (S9,3), (S9,27), (S9,15)
Figure 5.5. Two genotypes undergoing uniform crossover with the following
probability distribution array [.37, .49, .81, .12, .98, .23, .50, .75].
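The Python sketch below reproduces the exchange in Figure 5.5. The rule that an allele is exchanged when its probability value is strictly greater than 0.5 is inferred from the figure: the value .50 at position 7 leaves that allele in place. The function name and the `threshold` parameter are illustrative.

```python
def uniform_crossover(geno1, geno2, probs, threshold=0.5):
    """Exchange allele i between the parents when probs[i] > threshold.

    The strict '>' comparison matches Figure 5.5, where .50 does not
    trigger an exchange.
    """
    child1, child2 = [], []
    for a, b, p in zip(geno1, geno2, probs):
        if p > threshold:      # exchange this allele
            child1.append(b)
            child2.append(a)
        else:                  # keep each parent's own allele
            child1.append(a)
            child2.append(b)
    return child1, child2


# The eight alleles displayed in Figure 5.5 (students S2..S8 omitted).
geno1 = [("S1", 1), ("S1", 10), ("S1", 20), ("S1", 18),
         ("S9", 12), ("S9", 7), ("S9", 21), ("S9", 15)]
geno2 = [("S1", 12), ("S1", 8), ("S1", 15), ("S1", 23),
         ("S9", 14), ("S9", 3), ("S9", 27), ("S9", 19)]
probs = [.37, .49, .81, .12, .98, .23, .50, .75]
new1, new2 = uniform_crossover(geno1, geno2, probs)
```

In a real run the array would be drawn fresh for every crossover, for example `[rng.random() for _ in geno1]`.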
5.2.5.2 ONE POINT CROSSOVER
One point crossover is the traditional crossover discussed in section 3.8.1. A random index is selected in [1, L-1], and crossover is performed by exchanging the timeslots that follow that index. An example of one point crossover is shown in Figure 5.6.
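Classic one point crossover can be sketched in Python as follows, assuming list-based genotypes of equal length. Called with index 3 on the eight alleles displayed in Figure 5.6, it yields the two new genotypes shown in that figure. The function name is illustrative.

```python
import random


def one_point_crossover(geno1, geno2, index=None, rng=random):
    """Keep the first `index` alleles of each parent and exchange the rest.

    When no index is given, one is drawn uniformly from [1, L-1] so that
    each child inherits at least one allele from each parent.
    """
    if index is None:
        index = rng.randint(1, len(geno1) - 1)
    child1 = geno1[:index] + geno2[index:]
    child2 = geno2[:index] + geno1[index:]
    return child1, child2


# The eight alleles displayed in Figure 5.6 (students S2..S8 omitted).
geno1 = [("S1", 1), ("S1", 10), ("S1", 20), ("S1", 18),
         ("S9", 12), ("S9", 7), ("S9", 21), ("S9", 15)]
geno2 = [("S1", 12), ("S1", 8), ("S1", 15), ("S1", 23),
         ("S9", 14), ("S9", 3), ("S9", 27), ("S9", 19)]
new1, new2 = one_point_crossover(geno1, geno2, index=3)
```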
geno_1     = (S1,1), (S1,10), (S1,20), (S1,18), ..., (S9,12), (S9,7), (S9,21), (S9,15)
geno_2     = (S1,12), (S1,8), (S1,15), (S1,23), ..., (S9,14), (S9,3), (S9,27), (S9,19)
new_geno_1 = (S1,1), (S1,10), (S1,20), (S1,23), ..., (S9,14), (S9,3), (S9,27), (S9,19)
new_geno_2 = (S1,12), (S1,8), (S1,15), (S1,18), ..., (S9,12), (S9,7), (S9,21), (S9,15)
Figure 5.6. Two genotypes undergoing one point crossover with index specified at 3.
5.2.5.3 MODIFIED ONE POINT CROSSOVER
The modified one point crossover operator works by first generating a random number in [1, L-1], where L is the length of the genotype. We then find the student that this index selects, along with the start and end indices of that student's alleles. Next, a crossover point is chosen at random between that student's start and end (for a student with four courses, an offset in [1, 3]), and a one point crossover is performed for that student only. In Figure 5.7, the first index is 3, which selects student S1, and the second random number is 2. Crossover is therefore performed on student S1 using the data from genotypes 1 and 2 at that index, which results in two newly created genotypes.
geno_1           = (S1,1), (S1,10), (S1,20), (S1,18), ..., (S9,12), (S9,7), (S9,21), (S9,15)
geno_2           = (S1,12), (S1,8), (S1,15), (S1,23), ..., (S9,14), (S9,3), (S9,27), (S9,19)
geno_1_student_1 = (S1,1), (S1,10), (S1,20), (S1,18)
geno_2_student_1 = (S1,12), (S1,8), (S1,15), (S1,23)
new_student_1    = (S1,1), (S1,10), (S1,15), (S1,23)
new_student_2    = (S1,12), (S1,8), (S1,20), (S1,18)
new_geno_1       = (S1,1), (S1,10), (S1,15), (S1,23), ..., (S9,12), (S9,7), (S9,21), (S9,15)
new_geno_2       = (S1,12), (S1,8), (S1,20), (S1,18), ..., (S9,14), (S9,3), (S9,27), (S9,19)
Figure 5.7. Two genotypes undergoing modified one point crossover with the first
index at 3 followed by an index of 2.
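The two-step procedure of the modified one point crossover can be sketched in Python as follows. The sketch assumes that both parents list the same students in the same contiguous order and that the first index is 1-based, as in the text. With a first index of 3 and a second index of 2 it reproduces Figure 5.7. The function and parameter names are illustrative.

```python
import random


def modified_one_point_crossover(geno1, geno2, first=None, second=None, rng=random):
    """One point crossover restricted to the alleles of a single student.

    first:  1-based index in [1, L-1]; the student owning that allele is used
    second: crossover offset inside that student's alleles, in [1, k-1],
            where k is the student's number of alleles
    """
    if first is None:
        first = rng.randint(1, len(geno1) - 1)
    student = geno1[first - 1][0]
    positions = [i for i, (s, _) in enumerate(geno1) if s == student]
    start, end = positions[0], positions[-1] + 1  # end is exclusive
    k = end - start
    if k < 2:  # a single-course student has no interior cut point
        return list(geno1), list(geno2)
    if second is None:
        second = rng.randint(1, k - 1)
    cut = start + second
    # Only the selected student's alleles after the cut are exchanged.
    child1 = geno1[:cut] + geno2[cut:end] + geno1[end:]
    child2 = geno2[:cut] + geno1[cut:end] + geno2[end:]
    return child1, child2


geno1 = [("S1", 1), ("S1", 10), ("S1", 20), ("S1", 18),
         ("S9", 12), ("S9", 7), ("S9", 21), ("S9", 15)]
geno2 = [("S1", 12), ("S1", 8), ("S1", 15), ("S1", 23),
         ("S9", 14), ("S9", 3), ("S9", 27), ("S9", 19)]
new1, new2 = modified_one_point_crossover(geno1, geno2, first=3, second=2)
```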
5.2.5.4 ONE POINT PER STUDENT CROSSOVER
One point per student crossover is similar to the modified one point crossover, but it is applied to every student: given a genotype with N students, all N students undergo crossover.
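One point per student crossover applies the within-student exchange to every student. The Python sketch below assumes, as before, that both parents list the same students in the same contiguous order. The function name is illustrative.

```python
import random


def one_point_per_student(geno1, geno2, rng=random):
    """Perform a within-student one point crossover for every student."""
    child1, child2 = list(geno1), list(geno2)
    start = 0
    while start < len(geno1):
        # Find the end (exclusive) of this student's contiguous block.
        end = start
        while end < len(geno1) and geno1[end][0] == geno1[start][0]:
            end += 1
        k = end - start
        if k >= 2:
            cut = start + rng.randint(1, k - 1)
            child1[cut:end] = geno2[cut:end]
            child2[cut:end] = geno1[cut:end]
        start = end
    return child1, child2


geno1 = [("S1", 1), ("S1", 10), ("S1", 20), ("S1", 18),
         ("S9", 12), ("S9", 7), ("S9", 21), ("S9", 15)]
geno2 = [("S1", 12), ("S1", 8), ("S1", 15), ("S1", 23),
         ("S9", 14), ("S9", 3), ("S9", 27), ("S9", 19)]
new1, new2 = one_point_per_student(geno1, geno2, random.Random(3))
```

Each student keeps at least its first allele and always receives at least its last allele from the other parent.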
5.2.6 Mutation
Five mutation operators are used. One is the classic random mutation operator while
the other four mutations are Violation Directed Mutation (VDM), Event-Freeing Mutation
(EFM), Stochastic Violation-directed Mutation (SVDM) and Stochastic Event-freeing
Mutation (SEFM) which are further discussed below.
5.2.6.1 RANDOM MUTATION
Random mutation selects a random index in [1, L] and randomly changes the timeslot of that allele for the corresponding student.
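A minimal Python sketch of random mutation follows. It reuses the flat (student, timeslot) layout assumed earlier. The 0-based `randrange` plays the role of the 1-based index in [1, L], and the function name is illustrative.

```python
import random


def random_mutation(genotype, timeslots, rng=random):
    """Pick one allele uniformly and give that student a random timeslot.

    Returns a new genotype; the input is left unchanged.
    """
    i = rng.randrange(len(genotype))
    student, _ = genotype[i]
    mutated = list(genotype)
    mutated[i] = (student, rng.choice(timeslots))
    return mutated
```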
5.2.6.2 SMART OPERATORS
The smart operators are Violation Directed Mutation (VDM), Event-Freeing Mutation (EFM), Stochastic Violation-directed Mutation (SVDM), and Stochastic Event-freeing Mutation (SEFM). All of them use one or more maximal violation scores to determine the section, or more specifically the student's alleles, where the mutation will occur. A violation score is assigned to each student using the following formula:
$$v(s_i) = c(s_i)\,w_c + m(s_i)\,w_m$$
where $v(s_i)$ is the violation score for student $i$, $c(s_i)$ is the number of collisions for student $s_i$, $w_c$ is the weight for collisions, $m(s_i)$ is the number of mismatches for student $s_i$, and $w_m$ is the weight for mismatches.
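The formula and the "student with max violation score" step used by the operators below can be sketched in Python. The per-student counts are assumed to be precomputed and stored in a dictionary; that layout and the function names are illustrative. The example weights are the ones reported in section 5.2.10.

```python
def violation_score(collisions, mismatches, w_c, w_m):
    """v(s_i) = c(s_i) * w_c + m(s_i) * w_m"""
    return collisions * w_c + mismatches * w_m


def max_violation_student(counts, w_c, w_m):
    """Return the student with the largest positive violation score,
    or None when no student has a violation.

    counts: dict mapping student id -> (collisions, mismatches)
    """
    best, best_score = None, 0
    for student, (c, m) in counts.items():
        score = violation_score(c, m, w_c, w_m)
        if score > best_score:
            best, best_score = student, score
    return best


counts = {"S1": (1, 0), "S2": (0, 2), "S3": (0, 0)}
print(max_violation_student(counts, 10, 9))  # S2 (score 18 beats S1's 10)
```

Returning None models the "if (student exists)" test in the mutation pseudocode.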
5.2.6.3 VIOLATION DIRECTED MUTATION
As previously discussed, violation directed mutation searches for the event with the maximal violation score and randomly assigns it another timeslot. We defined three weights for our problem: collision, mismatch, and course-overload. Because our tests gave collisions the highest weight, followed by mismatches and then course-overloads, violation directed mutation tries to resolve the constraints in that order, but the new timeslot it assigns to the offending event is chosen at random.
The process is described in Figure 5.8. It first determines whether there is a student with a violation. If there is, it creates two new genotypes, one by removing a collision and another by removing a mismatch, and assigns the one with the lower fitness value to the current minimum genotype. Next, it checks whether there are any over-capacities that it can remove. The final genotype is the minimum of the current minimum genotype and the genotype that resulted from removing an over-capacity.
Input: genotype
Output: Mutated genotype or original one
mutate(genotype)
BEGIN
1 student=student with max violation score
2 currentMinGeno=genotype;
3 if (student exists) DO
4 genotypeC=processCollisions(genotype,student);
5 genotypeM=processMismatches(genotype,student);
6 currentMinGeno=min{genotypeC,genotypeM}
7 OD;
8 genotypeOC=processOverCapacities(genotype);
9 minGenotype=min{currentMinGeno,genotypeOC};
10 return minGenotype;
END
Figure 5.8. Violation Directed Mutation Pseudocode for mutation.
Collisions, as described in Figure 5.9, are removed by first getting the list of the student's collisions that involve course-sections the student did not request. If this set is not empty, we randomly select one of them, replace the timeslot that contains that course-section with a random one, and return the newly created genotype. If the set is empty, we randomly select a course-section from the full list of collisions, replace it with a random timeslot, and return the newly created genotype.
Input: genotype,student
Output: Mutated Genotype or original one
processCollision(genotype,student)
BEGIN
1 if no collisions for student return genotype;
2 nonRequestedCollisions= set of non requested collisions;
3 if size(nonRequestedCollisions)>0 DO
4 cand= randomlySelectOne(nonRequestedCollisions);
5 newGeno=replace cand with random timeslot;
6 return newGeno;
7 ELSE
8 collisions= set of collisions;
9 cand=randomlySelectOne(collisions);
10 newGeno=replace cand with random timeslot;
11 return newGeno
12 OD;
END
Figure 5.9 Violation Directed Mutation Pseudocode for processing a collision.
Mismatches as described in Figure 5.10 are processed by first checking if there are
any mismatches for the student. If there are no mismatches then it returns the original
genotype. If there are mismatches then we select one of the mismatches and randomly
replace it with a timeslot (course-section) and generate a new genotype.
Input: genotype,student
Output: Mutated Genotype or original one
processMismatch(genotype,student)
BEGIN
1 if no mismatches for student return genotype;
2 mismatches= set of mismatch course-sections for student;
3 newGeno=select one mismatch and randomly
select a timeslot;
4 return newGeno;
END
Figure 5.10. Violation Directed Mutation Pseudocode for processing a mismatch.
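Figures 5.9 and 5.10 can be sketched in Python under a simplified model. The model is hypothetical. Each timeslot id maps to a (course, period) pair in a `sections` dictionary, and `requests` maps each student to the set of courses requested. A collision is two of a student's timeslots in the same period, and a mismatch is a timeslot whose course the student did not request. The helper names are illustrative.

```python
import random


def student_positions(genotype, student):
    return [i for i, (s, _) in enumerate(genotype) if s == student]


def colliding_positions(genotype, student, sections):
    """Positions of the student's alleles that share a period with another."""
    by_period = {}
    for i in student_positions(genotype, student):
        period = sections[genotype[i][1]][1]
        by_period.setdefault(period, []).append(i)
    return [i for group in by_period.values() if len(group) > 1 for i in group]


def reassign(genotype, i, sections, rng):
    """Copy the genotype and give allele i a random timeslot."""
    mutated = list(genotype)
    mutated[i] = (genotype[i][0], rng.choice(sorted(sections)))
    return mutated


def process_collision(genotype, student, sections, requests, rng=random):
    colliding = colliding_positions(genotype, student, sections)
    if not colliding:
        return genotype
    # Prefer collisions on course-sections the student did not request.
    non_requested = [i for i in colliding
                     if sections[genotype[i][1]][0] not in requests[student]]
    return reassign(genotype, rng.choice(non_requested or colliding), sections, rng)


def process_mismatch(genotype, student, sections, requests, rng=random):
    mismatches = [i for i in student_positions(genotype, student)
                  if sections[genotype[i][1]][0] not in requests[student]]
    if not mismatches:
        return genotype
    return reassign(genotype, rng.choice(mismatches), sections, rng)
```

As in the pseudocode, the replacement timeslot is random, so the mutation may or may not lower the fitness value.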
Processing over-capacities, as described in Figure 5.11, goes through all the timeslots, determines their current enrollment, and checks whether each has exceeded its capacity constraint. If no timeslot has exceeded its capacity limit, the original genotype is returned. Among the timeslots that have, it keeps the one that exceeded its limit by the most. It then retrieves all the students assigned to that timeslot and checks, for each student, whether the student requested the course. For the first student who did not request it, the student's course is changed to a randomly assigned timeslot and a new genotype is created with the updated information. If all the students requested the course-section, the original genotype is returned.
Input: genotype,phenotype
Output: Mutated Genotype or original one
processOvercapacity(genotype,phenotype)
BEGIN
1 maxViolationCS=retrieve the course-section with the most
over capacities.
2 if (maxViolationCS does not exist) return genotype;
3 students=students that contain the maxViolationCS;
4 for each student in students list DO
5 if (student contains the maxViolationCS AND
the course is not a requested one) DO
6 student[index maxViolationCS]=randomly assign
the student a timeslot
7 return new genotype with student's altered
timeslot;
8 OD
9 OD
10 return genotype;
END
Figure 5.11. Violation Directed Mutation Pseudocode for processing an overcapacity.
5.2.6.4 EVENT-FREEING MUTATION
Event-freeing mutation, like violation directed mutation and as previously discussed, finds the event with the maximal violation score. Instead of randomly assigning another timeslot, however, it finds the timeslot that will maximally reduce the score. Because collisions have the highest weight in our tests, event-freeing mutation takes a collision and tries to find an acceptable course for that student that does not introduce another collision or mismatch. If there are no collisions, it proceeds to the mismatches and then tries to deal with the over-capacities. If the weights were changed so that collisions had a lower weight than the other constraints, the maximal violation scores would no longer come from collisions, and a different constraint would be addressed first. Figure 5.12 contains the general pseudocode for this process.
Input: genotype
Output: Mutated genotype or original one
mutate(genotype)
BEGIN
1 student=student with max violation score
2 currentMinGeno=genotype;
3 if (student exists) DO
4 genotypeC=processCollisions(genotype,student);
5 genotypeM=processMismatches(genotype,student);
6 currentMinGeno=min{genotypeC,genotypeM}
7 OD;
8 genotypeOC=processOverCapacities(genotype);
9 minGenotype=min{currentMinGeno,genotypeOC};
10 return minGenotype;
END
Figure 5.12. Event-freeing Mutation Pseudocode for mutation.
Figure 5.13 contains the pseudocode for processing a mismatch. The algorithm starts by retrieving all the mismatches for the student. If it does not find any, it returns the original genotype that was passed in. Otherwise, it goes through each mismatch and retrieves the course-sections that are available for the period of the mismatch. It then goes through these course-sections and determines whether the course is one that the student has requested and has not already been assigned. If a course-section passes this test, the mismatch is replaced by it and a new genotype is created.
Input: genotype,student
Output: Mutated genotype or original one
processMismatches(genotype,student)
BEGIN
1 if no mismatches for student return genotype;
2 mismatches= set of mismatch course-sections for student;
3 for each mismatch in mismatches DO
4 courseSections=list of course sections for
the period of the mismatch
5 for (courseSection in courseSections) DO
6 if (the course in courseSection is a requested
one and course in courseSection has not
been assigned) DO
7 newGeno=create new geno replacing the
mismatch course with this one;
8 return newGeno;
9 OD
10 OD
11 OD
12 return genotype;
END
Figure 5.13. Event-freeing Mutation Pseudocode for processing a mismatch.
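Here is a deterministic Python sketch of Figure 5.13. It uses the same hypothetical model as the earlier violation-directed sketch: `sections` maps a timeslot id to (course, period), and `requests` maps a student to the set of requested courses. Because the replacement stays in the same period, it never adds a new collision. The function name is illustrative.

```python
def efm_process_mismatch(genotype, student, sections, requests):
    """For each mismatched allele, look for a course-section in the same
    period whose course the student requested but does not yet have."""
    positions = [i for i, (s, _) in enumerate(genotype) if s == student]
    assigned = {sections[genotype[i][1]][0] for i in positions}
    for i in positions:
        course, period = sections[genotype[i][1]]
        if course in requests[student]:
            continue  # requested course: not a mismatch
        for slot, (candidate, slot_period) in sections.items():
            if (slot_period == period and candidate in requests[student]
                    and candidate not in assigned):
                mutated = list(genotype)
                mutated[i] = (student, slot)
                return mutated
    return genotype
```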
The pseudocode for processing collisions in event-freeing mutation is given in Figure 5.14. The process first determines whether there are any collisions. If so, three steps are taken to see whether a collision can be removed for the student. In the first step, it tries to remove a collision involving a course the student did not request (lines 2-9). If the first step was not successful, it goes through all the collisions and tries to remove one of them (lines 10-16). The final step (line 17) is more destructive. If the first two steps failed, the algorithm tries to arrange the student's courses so that there are no collisions or mismatches. It does this either by a best first search that arranges the student's schedule into an acceptable one, or by examining all the possible combinations and taking the best one. This last step does not occur early in the run of the genetic algorithm. As the population converges to a solution, removing a collision becomes more difficult, and the only remaining option is to rearrange the student's schedule; the rearrangement is kept only if it lowers the fitness value.
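The Python sketch below illustrates only the exhaustive ("All") option of the rearrangement step at line 17 of Figure 5.14, not the best first search. It tries one section per requested course, discards combinations in which two sections share a period, and keeps the cheapest remaining combination. The `cost` callback is a hypothetical stand-in for evaluating the genotype's fitness, and the (course, period) model of `sections` is the same assumption used in the earlier sketches.

```python
from itertools import product


def best_arrangement(requested, sections, cost):
    """Return the collision-free list of section ids with the lowest cost,
    or None if no collision-free combination exists.

    requested: list of course names; sections: id -> (course, period)
    cost: function scoring a tuple of section ids (lower is better)
    """
    options = [[sid for sid, (course, _) in sections.items() if course == wanted]
               for wanted in requested]
    best, best_cost = None, float("inf")
    for combo in product(*options):
        periods = [sections[sid][1] for sid in combo]
        if len(set(periods)) != len(periods):
            continue  # two courses in the same period: a collision
        c = cost(combo)
        if c < best_cost:
            best, best_cost = list(combo), c
    return best
```

The number of combinations is the product of the section counts of the requested courses. This is one plausible reason the text reserves the step for when the cheaper repairs fail.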
Processing over-capacities is given in Figure 5.15. The process starts by determining which course-section has the highest over-capacity. Once this course-section is determined, we go through each student assigned to it. If a student did not request that course, we try to replace the student's timeslot with a course-section in the same period that has not been assigned to that student.
Input: genotype,student
Output: Mutated Genotype or original one
processCollision(genotype,student)
BEGIN
1 if no collisions for student return genotype;
2 nonRequestedCollisions = student collisions for courses
that were not requested;
3 if (nonRequestedCollisions exist) DO
4 selectedCourseSection = randomly select one from
nonRequestedCollisions
5 newGeno = remove collision by using a non assigned
course for the non assigned periods;
6 if (newGeno is not null) return newGeno;
7 newGeno = remove collision by using mismatched
periods for the non assigned periods;
8 if (newGeno is not null) return newGeno;
9 OD
10 allCollisions = all collisions for student;
11 for (courseSection in allCollisions) DO
12 newGeno = remove collision by using a non assigned
course for the non assigned periods;
13 if (newGeno is not null) return newGeno;
14 newGeno = remove collision by using mismatched
periods for the non assigned periods;
15 if (newGeno is not null) return newGeno;
16 OD
17 newGeno=bfsReorderCourses(genotype,student);
18 return min{newGeno,genotype};
END
Figure 5.14. Event-freeing Mutation Pseudocode for processing a collision.
Input: genotype,phenotype
Output: Mutated Genotype or original one
processOvercapacity(genotype,phenotype)
BEGIN
1 courseSectionHOC = get course section with highest over
capacity;
2 if (courseSectionHOC does not exist) return genotype;
3 for (each student that contains the courseSectionHOC) DO
4 if (student has not requested the course) DO
5 if (non assigned course exists for
the period of the courseSectionHOC) DO
6 genotype= replace the student's slot with
the non assigned courseSection;
7 return genotype;
8 OD
9 OD
10 OD
11 return genotype;
END
Figure 5.15. Event Freeing Mutation Pseudocode for processing an overcapacity.
5.2.6.5 STOCHASTIC VIOLATION-DIRECTED MUTATION
Stochastic violation-directed mutation is very similar to violation-directed mutation. However, instead of retrieving the student with the maximum violation score, it maintains a list of students and their violation scores and biases the selection toward the students with the highest scores.
5.2.6.6 STOCHASTIC EVENT-FREEING MUTATION
Stochastic event-freeing mutation is likewise similar to event-freeing mutation, but it also maintains a list of violation scores and biases the selection toward the highest ones.
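The text does not specify how the selection is biased. The Python sketch below assumes one common choice: sampling a student with probability proportional to the violation score, in the manner of roulette wheel selection (section 3.6.3). The function name is illustrative.

```python
import random


def stochastic_violation_student(scores, rng=random):
    """Pick a student with probability proportional to its violation score.

    Students with score 0 are never picked; returns None if nobody has a
    violation (the stochastic counterpart of the 'student exists' test).
    """
    students = [s for s, v in scores.items() if v > 0]
    if not students:
        return None
    weights = [scores[s] for s in students]
    return rng.choices(students, weights=weights, k=1)[0]
```

With scores {"S1": 10, "S2": 18}, for example, S2 is picked about 18/28 of the time rather than always.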
5.2.7 Fitness Function
The fitness function is the weighted sum of the constraint violations: each weight is multiplied by the number of violations of the corresponding type. It can be written as
$$F(g_i) = w_c\,t_c(g_i) + w_m\,t_m(g_i) + w_{oc}\,t_{oc}(g_i)$$
where $F(g_i)$ is the fitness value of the $i$th genotype, $w_c$ is the weight assigned to collisions, $t_c(g_i)$ is the total number of collisions for genotype $i$, $w_m$ is the weight assigned to mismatches, $t_m(g_i)$ is the number of mismatches for genotype $i$, $w_{oc}$ is the weight for over-capacity, and $t_{oc}(g_i)$ is the total number of over-capacities that have occurred for genotype $i$. The more violations a genotype has, the higher its fitness value. When no constraint is violated, the fitness value is 0, so our objective is to minimize this fitness function.
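The formula translates directly into Python. The default weights below are the ones reported for the experiments in section 5.2.10, and the function name is illustrative.

```python
def fitness(t_c, t_m, t_oc, w_c=10, w_m=9, w_oc=8):
    """F(g_i) = w_c*t_c + w_m*t_m + w_oc*t_oc; 0 means no constraint is violated."""
    return w_c * t_c + w_m * t_m + w_oc * t_oc


# Best VDM configuration in Table 5.1: 0 collisions, 0 mismatches, 37 over-capacities.
print(fitness(0, 0, 37))  # 296
```

Because the tables report averages over five runs, other rows agree with this formula only approximately. For example, Uniform/All in Table 5.1 lists 41 over-capacities and a fitness value of 330 rather than 328.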
5.2.8 Parent Selection Schemes
Various parent selection schemes exist; the ones I chose to use were tournament, truncation, and N-parent selection. For N-parent selection, I used all the parents when selecting the pair of parents to undergo crossover; this scheme appears as "all" in the results.
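Tournament and truncation selection can be sketched in Python as follows, for a fitness that is minimized. The tournament size of 25 mentioned in section 5.3.10 would be passed as `k`. The truncation fraction is a hypothetical parameter, since the text does not report one, and the function names are illustrative.

```python
import random


def tournament_select(population, fitness, k, rng=random):
    """Sample k distinct individuals and return the one with the lowest
    (best) fitness."""
    contestants = rng.sample(population, k)
    return min(contestants, key=fitness)


def truncation_select(population, fitness, fraction):
    """Keep the best `fraction` of the population (lowest fitness first)."""
    ranked = sorted(population, key=fitness)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]
```

A larger `k` increases selection pressure, because a weak individual is less likely to win a tournament against many contestants.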
5.2.9 Survivor Selection
As with parent selection, there are various survivor selection schemes that can be incorporated into the genetic algorithm. The scheme chosen for the tests was the combined scheme.
5.2.10 Results
Each configuration was run five times, and the tables below report the averages of these runs. All tests were run on Windows XP with an Intel Core 2 CPU running at 2.16 GHz and 3 GB of RAM. The population size was 400, with survivor selection set to combined. The parent selection schemes used were all, tournament, and truncation. The weights were set as follows: collisions 10, mismatches 9, and over-capacity 8.
Violation Directed Mutation (Table 5.1) performed well: averaged over the five test runs, every configuration produced zero collisions and zero mismatches. The best performer was VDM with 1 Point Classic crossover and parent selection set to all, with an average fitness value of 296.
Stochastic Violation Directed Mutation (Table 5.2) did not perform as well as VDM. All runs resulted in zero collisions, but there were more mismatches and over-capacities than with VDM. The best performer was SVDM with 1 Point Per Student crossover and parent selection set to all, which resulted in an average fitness value of 1,101. Two different setups were tried for the collision-processing step at line 17 of Figure 5.14 (the BFS column in Tables 5.3 and 5.4). First performs a best first search and returns the first complete schedule found for the student. All finds the student schedule that minimizes the fitness value of the genotype.
Event Freeing Mutation (Table 5.3) did not perform as well as VDM or SVDM. The best performer was EFM with 1 Point crossover, parent selection set to all, and BFS set to All, which resulted in an average fitness value of 3,807.
Table 5.1. Violation Directed Mutation (VDM)
Crossover            Parent Selection  Collisions  Mismatches  Over Capacity  #Courses OC  Fitness Value  Total Time (hrs)
Uniform              All               0           0           41             15           330            3.6
Uniform              Tournament        0           0           70             17           558            2.6
Uniform              Truncation        0           0           44             15           352            2.6
1 Point              All               0           0           51             15           405            6.3
1 Point              Tournament        0           0           82             18           656            3.1
1 Point              Truncation        0           0           56             16           446            3.7
1 Point Classic      All               0           0           37             13           296            4.4
1 Point Classic      Tournament        0           0           82             17           658            2.9
1 Point Classic      Truncation        0           0           48             16           382            2.8
1 Point Per Student  All               0           0           47             15           373            3.6
1 Point Per Student  Tournament        0           0           92             20           740            2.8
1 Point Per Student  Truncation        0           0           48             16           386            3.1
Stochastic Event Freeing Mutation (Table 5.4) performed better than EFM but not as well as VDM or SVDM. The best performer was SEFM with uniform crossover, parent selection set to all, and BFS set to First, which resulted in an average fitness value of 1,943.
Random mutation (Table 5.5) yielded the worst results of all the mutation operators. Its best fitness value, 12,843, was attained using 1 Point crossover with parent selection set to tournament.
5.3 MASTER SCHEDULE NOT DEFINED
Consider a set of teachers, each with a set of courses the teacher is able to teach and a specified number of periods the teacher is allowed to teach, and a set of students with the courses they need to take. How do we create a master schedule of courses in which each teacher teaches for the specified number of periods and each student is assigned the requested courses?
Table 5.2. Stochastic Violation Directed Mutation (SVDM)
Crossover            Parent Selection  Collisions  Mismatches  Over Capacity  #Courses OC  Fitness Value  Total Time (hrs)
Uniform              All               0           137         92             17           1966           2.8
Uniform              Tournament        0           120         93             18           1825           2.1
Uniform              Truncation        0           87          95             18           1544           2.3
1 Point              All               0           84          55             17           1201           4.8
1 Point              Tournament        0           75          104            17           1511           2.9
1 Point              Truncation        0           91          70             16           1384           4.1
1 Point Classic      All               0           73          70             17           1217           3.0
1 Point Classic      Tournament        0           81          97             18           1504           2.3
1 Point Classic      Truncation        0           108         78             16           1598           2.6
1 Point Per Student  All               0           59          71             16           1101           3.5
1 Point Per Student  Tournament        0           86          103            17           1593           2.2
1 Point Per Student  Truncation        0           72          74             16           1241           3.1
This case is similar to the first problem (section 5.2). The main difference is that there is no master schedule: the timeslots in which we wish to place students have not been created. What we know is which teachers can teach which specific courses and how many periods each teacher is allowed to teach. We also know the capacity of each specific course. Two things therefore occur at the same time: building a master schedule, and assigning students to timeslots within the different master schedules. The objective is to find the master schedule that will produce the best assignment of students into the given timeslots.
5.3.1 Test Data
The test data was the same as that used in section 5.2, except that the master schedule was used to determine which courses each teacher was allowed to teach. A sample list of courses and their capacities is given in Figure 5.16, with the full data file at [31]. A sample XML file with the courses that each teacher can teach is given in Figure 5.17, with the full data file at [32]. Each entry gives the teacher's name and the number of periods for which that teacher should be scheduled. Teacher N teaches only one course, for seven periods, so that teacher teaches the same course all day long. Teacher 2, by contrast, teaches for six periods, each of which can be any of the three courses listed in Figure 5.17.
Table 5.3. Event Freeing Mutation (EFM)
Crossover            Parent Selection  BFS    Collisions  Mismatches  Over Capacity  #Courses OC  Fitness Value  Total Time (hrs)
Uniform              All               First  40          350         123            14           4536           1.0
Uniform              All               All    13          324         137            16           4255           1.7
Uniform              Tournament        First  24          336         167            15           4600           0.4
Uniform              Tournament        All    13          317         168            15           4326           0.7
Uniform              Truncation        First  10          283         168            16           3987           0.8
Uniform              Truncation        All    9           298         153            14           3988           1.1
1 Point              All               First  25          430         159            16           5398           1.2
1 Point              All               All    3           220         224            17           3807           1.8
1 Point              Tournament        First  32          491         146            14           5903           0.4
1 Point              Tournament        All    24          466         166            17           5762           0.6
1 Point              Truncation        First  31          487         172            16           6060           0.8
1 Point              Truncation        All    14          379         175            16           4947           1.4
1 Point Classic      All               First  37          526         129            14           6132           1.1
1 Point Classic      All               All    10          326         186            15           4513           1.7
1 Point Classic      Tournament        First  20          380         162            16           4915           0.4
1 Point Classic      Tournament        All    28          459         164            17           5730           0.6
1 Point Classic      Truncation        First  27          429         166            18           5461           0.8
1 Point Classic      Truncation        All    31          455         135            14           5484           1.0
1 Point Per Student  All               First  12          266         193            16           4057           1.1
1 Point Per Student  All               All    3           288         194            16           4175           1.6
1 Point Per Student  Tournament        First  29          346         181            15           5038           0.4
1 Point Per Student  Tournament        All    7           264         202            16           4060           1.2
1 Point Per Student  Truncation        First  26          425         156            16           5334           0.8
1 Point Per Student  Truncation        All    5           311         171            16           4217           1.3
Table 5.4. Stochastic Event Freeing Mutation (SEFM)
Crossover            Parent Selection  BFS    Collisions  Mismatches  Over Capacity  #Courses OC  Fitness Value  Total Time (hrs)
Uniform              All               First  3           3           236            17           1943           0.6
Uniform              All               All    1           4           248            17           2034           1.3
Uniform              Tournament        First  3           4           303            18           2488           0.3
Uniform              Tournament        All    1           5           299            18           2445           0.9
Uniform              Truncation        First  3           5           273            19           2256           0.6
Uniform              Truncation        All    0           4           270            17           2201           0.9
1 Point              All               First  3           5           317            18           2609           1.7
1 Point              All               All    0           4           311            17           2526           2.8
1 Point              Tournament        First  2.4         4.2         326            19           2671           0.7
1 Point              Tournament        All    1           4           335            18           2724           1.1
1 Point              Truncation        First  2           5           334            19           2737           1.3
1 Point              Truncation        All    1           4           315            17           2564           1.8
1 Point Classic      All               First  2           4           304            17           2481           0.9
1 Point Classic      All               All    1           4           296            17           2413           1.8
1 Point Classic      Tournament        First  3           4           307            18           2519           0.4
1 Point Classic      Tournament        All    1           4           306            18           2488           0.9
1 Point Classic      Truncation        First  2           4           309            17           2535           0.7
1 Point Classic      Truncation        All    1           3           315            19           2557           1.2
1 Point Per Student  All               First  2           3           295            72           2410           1.5
1 Point Per Student  All               All    1           5           305            19           2496           1.8
1 Point Per Student  Tournament        First  3           4           313            17           2567           0.5
1 Point Per Student  Tournament        All    0           5           315            18           2567           1.1
1 Point Per Student  Truncation        First  3           4           304            18           2504           0.9
1 Point Per Student  Truncation        All    1           5           313            18           2557           1.4
Table 5.5. Random Mutation
Crossover            Parent Selection  Collisions  Mismatches  Over Capacity  #Courses OC  Fitness Value  Total Time (hrs)
Uniform              All               182         1284        4              3            13405          0.6
Uniform              Tournament        138         1383        6              3            13876          0.8
Uniform              Truncation        160         1324        6              3            13565          0.6
1 Point              All               197         1246        3              2            13205          1.5
1 Point              Tournament        197         1202        7              4            12843          1.2
1 Point              Truncation        194         1242        4              3            13154
1 Point Classic      All               152         1382        3              2            13984          0.8
1 Point Classic      Tournament        149         1401        7              4            14150          0.6
1 Point Classic      Truncation        146         1389        4              3            13994          0.8
1 Point Per Student  All               163         1310        5              3            13454          0.9
1 Point Per Student  Tournament        158         1370        6              3            13952          0.6
1 Point Per Student  Truncation        172         1360        4              2            13989          0.7
<Courses>
<course name="Food Tech" capacity="20"/>
<course name="English 2" capacity="30"/>
<course name="Sports Cond 1" capacity="35"/>
<course name="Welding" capacity="10"/>
<course name="Wood 1" capacity="15"/>
<course name="Spanish 1" capacity="30"/>
<course name="English 1" capacity="20"/>
<course name="Algebra 1" capacity="30"/>
<course name="Algebra 2" capacity="25"/>
<course name="Graph Design" capacity="20"/>
<course name="AP US History" capacity="25"/>
</Courses>
Figure 5.16. Sample XML data file for courses and their maximum capacity.
<TeacherCourses>
<teacher name="Teacher 1" periods="6">
<course name="Functl Life Skl"/>
<course name="Teachers Aide"/>
<course name="Math"/>
<course name="Soc Studies"/>
<course name="Science"/>
<course name="English"/>
</teacher>
<teacher name="Teacher 2" periods="6">
<course name="Algebra 1"/>
<course name="Algebra 2"/>
<course name="Math Suppt Alg"/>
</teacher>
...
<teacher name="Teacher N" periods="7">
<course name="Earth Science"/>
</teacher>
</TeacherCourses>
Figure 5.17. Sample XML data for the courses that a teacher is allowed to teach and for how many periods.
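Files in the formats of Figures 5.16 and 5.17 can be read with Python's standard `xml.etree.ElementTree`, as in the sketch below. The element and attribute names are taken from the figures. The returned dictionary layout and the function names are illustrative. The "..." line in Figure 5.17 is a placeholder and would not appear in a well-formed file.

```python
import xml.etree.ElementTree as ET


def read_capacities(xml_text):
    """Map course name -> maximum capacity from a <Courses> document."""
    root = ET.fromstring(xml_text)
    return {c.get("name"): int(c.get("capacity")) for c in root.iter("course")}


def read_teachers(xml_text):
    """Map teacher name -> (periods, [courses the teacher can teach])."""
    root = ET.fromstring(xml_text)
    return {t.get("name"): (int(t.get("periods")),
                            [c.get("name") for c in t.findall("course")])
            for t in root.findall("teacher")}


courses = read_capacities('<Courses><course name="Welding" capacity="10"/>'
                          '<course name="Algebra 1" capacity="30"/></Courses>')
teachers = read_teachers('<TeacherCourses><teacher name="Teacher 2" periods="6">'
                         '<course name="Algebra 1"/><course name="Algebra 2"/>'
                         '<course name="Math Suppt Alg"/></teacher></TeacherCourses>')
```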
5.3.2 Representation
The representation is a simple ordering similar to that of Problem 1 in section 5.2. However, the genotype must contain not only the student slots but also the master schedule, since the master schedule is specific to the student slots of that particular genotype. The master schedule can also go through the evolutionary process.
The phenotype is a one-dimensional array of the teachers that contains, for each teacher, the courses that the teacher is able to teach. This phenotype is used to seed the genotypes with various master schedules.
5.3.3 Constraints
The constraints are similar to the first problem as discussed in section 5.2.2.
5.3.4 Initialization
The genotypes can be initialized in two ways: (1) randomly and (2) intelligently. Random initialization creates a master schedule randomly from the teachers' availability and then randomly assigns students as in section 5.2.4. A randomly created master schedule will not necessarily be able to satisfy the students' requests, because it can contain only one section of a course when more than one section is required to fulfill those requests.
Intelligent initialization creates master schedules with the minimum number of sections required for each course, so that they have a chance of fulfilling the students' requests. Some of these schedules might still be invalid, because the timeslots of the courses are not ordered in a way that fulfills all of the students' requests. Even so, this is a much better way to create master schedules, and it improves the chance that one of them can fulfill all of the students' requests. During intelligent initialization, the algorithm determines the minimum number of sections for each course that is required to fulfill the students' requirements. This is done by looking at the students' requests and summing up the total number of students who need to take each course. Once this step is done, different master schedules are created randomly using this information.
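The section-count step can be sketched in Python. The text sums the requests per course; dividing that total by the course capacity (Figure 5.16) and rounding up to get a number of sections is an assumption of this sketch, as is the shape of the `requests` dictionary.

```python
import math
from collections import Counter


def minimum_sections(requests, capacity):
    """Minimum sections per course so every request can fit:
    ceil(total requests for the course / course capacity)."""
    totals = Counter(course for courses in requests.values() for course in courses)
    return {course: math.ceil(n / capacity[course]) for course, n in totals.items()}


# 25 hypothetical students all requesting Welding (capacity 10 in Figure 5.16).
requests = {f"S{i}": ["Welding"] for i in range(1, 26)}
print(minimum_sections(requests, {"Welding": 10}))  # {'Welding': 3}
```

Only the number of sections is fixed here; where those sections are placed in the master schedule is still chosen randomly.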
5.3.5 Recombination
The recombination operators were the same as in section 5.2.5. Initial results were not satisfactory when crossover was applied to the master schedule, so crossover was applied only to the student slots, as in section 5.2. The genotype still contained the master schedule, since there is no guarantee that different genotypes have matching master schedules. The crossover operators used were Uniform, 1 Point Classic, and 1 Point Per Student.
5.3.6 Mutation
Given that EFM and SEFM did poorly for the problem in section 5.2 we chose to
focus on solving this problem using only VDM and SVDM.
5.3.7 Fitness Function
The fitness function is very similar to the one in section 5.2.7: it is the sum of the individual weights times the number of violations of the corresponding type. The only difference is that there is now a master schedule, so the fitness function must also account for violations that occur when a teacher is scheduled to teach more than one course at the same time. It can be written as
$$F(g_i) = w_c\,t_c(g_i) + w_m\,t_m(g_i) + w_{oc}\,t_{oc}(g_i) + w_{tc}\,t_{tc}(g_i)$$
where $g_i$ is the $i$th genotype of the current generation, $w_c$ is the weight assigned to collisions, $t_c(g_i)$ is the total number of collisions for genotype $i$, $w_m$ is the weight assigned to mismatches, $t_m(g_i)$ is the number of mismatches for genotype $i$, $w_{oc}$ is the weight for over-capacity, $t_{oc}(g_i)$ is the total number of over-capacities that have occurred for genotype $i$, $w_{tc}$ is the weight assigned to teacher collisions, and $t_{tc}(g_i)$ is the total number of teacher collisions that have occurred for genotype $i$. Teacher collisions may or may not occur, depending on whether the master schedule is allowed to take part in the evolutionary process. If the master schedules are valid and are excluded from the evolutionary process, there will be no teacher collisions.
As with the fitness function in section 5.2.7, the more violations a genotype has, the higher its fitness value. When no constraint is violated, the fitness value is 0, so our objective is to minimize this fitness function. A high weight relative to the other weights was assigned to $w_{tc}$ to keep teachers from teaching more than one course at the same time, since a teacher cannot teach two different courses simultaneously.
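The teacher-collision term can be sketched in Python. This sketch assumes that a master schedule is a list of (teacher, course, period) triples and that each extra course a teacher has in the same period counts as one collision; the text defines neither. The value `w_tc=1000` is likewise a hypothetical "high" weight, because the text does not report the value it used.

```python
from collections import Counter


def teacher_collisions(master_schedule):
    """Count teacher double-bookings in (teacher, course, period) triples."""
    per_slot = Counter((teacher, period) for teacher, _, period in master_schedule)
    return sum(n - 1 for n in per_slot.values() if n > 1)


def fitness_with_teachers(t_c, t_m, t_oc, t_tc, w_c=10, w_m=9, w_oc=8, w_tc=1000):
    """F(g_i) = w_c*t_c + w_m*t_m + w_oc*t_oc + w_tc*t_tc."""
    return w_c * t_c + w_m * t_m + w_oc * t_oc + w_tc * t_tc


schedule = [("Teacher 2", "Algebra 1", 1), ("Teacher 2", "Algebra 2", 1),
            ("Teacher 1", "Math", 1)]
print(teacher_collisions(schedule))  # 1
```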
5.3.8 Parent Selection Schemes
The same parent selection scheme that was used in section 5.2.8 was used here.
5.3.9 Survivor Selection
The same survivor selection scheme that was used in section 5.2.9 was used here.
5.3.10 Results
Each configuration was run five times, and the tables below report the averages of these runs. All tests were run on Windows XP with an Intel Core 2 CPU running at 2.16 GHz and 3 GB of RAM. The population size was 5000, with survivor selection set to combined, in order to generate many different combinations of master schedules. The parent selection schemes used were all, tournament, and truncation. The weights were set as follows: collisions 10, mismatches 9, and over-capacity 8.
For Violation Directed Mutation (Table 5.6), the best performance was achieved with uniform crossover and parent selection set to all, which resulted in a fitness value of 990. Several collisions and mismatches remained, because the selected master schedule could not accommodate all of the students' courses in the available timeslots. Tournament selection did not perform well. This is probably because we reused the parameters from the problem in section 5.2 and did not adjust them as the population size increased. The tournament size was set at 25, which is too low for a population of 5000.
Table 5.6. Violation Directed Mutation (VDM)

Crossover            Parent      Collisions  Mismatches  Over      #Courses   Fitness  Total
                     Selection                           Capacity  Over Cap.  Value    Time (hrs)
Uniform              All                 48          44        14          5      990       3.2
                     Tournament         223         732         2          1     8828       0.5
                     Truncation          83          95        18          3     1825       1.7
1 Point Classic      All                358        1438       141         23    17651       4.5
                     Tournament         341        1611       284         27    20185       1.8
                     Truncation         334        1520       204         25    18648       3.6
1 Point Per Student  All                154        1035        24         14    11047       4.5
                     Tournament         316        1439        58         23    16578       0.8
                     Truncation         194        1116        33         14    12245       1.8
For Stochastic Violation Directed Mutation (Table 5.7), the best performance was again
achieved with uniform crossover and a parent selection of all, which resulted in a fitness
value of 866. Tournament selection did not perform well for the same reason discussed above.
Table 5.7. Stochastic Violation Directed Mutation (SVDM)

Crossover            Parent      Collisions  Mismatches  Over      #Courses   Fitness  Total
                     Selection                           Capacity  Over Cap.  Value    Time (hrs)
Uniform              All                  4          89         4          2      866       6.6
                     Tournament         122         656        91         17     7853       3.4
                     Truncation          25         149        15       1710     9484       2.6
1 Point Classic      All                 60        1265       476         24    15823       9.2
                     Tournament          35        1456       635         24    18543       4.8
                     Truncation          46        1368       540         24    17088       7.1
1 Point Per Student  All                 11         915       203         22     9970      10.4
                     Tournament        38.4        1234       365         24    14412       3.5
                     Truncation          16        1003       225         22    10983       7.1
CHAPTER 6
CONCLUSION
Solving timetabling problems is not a trivial task, and traditional techniques fall short
because of the complexity of these problems, which are known to be NP-complete. Several
techniques have been used to solve them, and the focus of my thesis was using genetic
algorithms to solve high school timetabling problems. I focused on two types of problems:
(1) problems in which the master schedule was fixed and (2) problems in which the master
schedule was not fixed. For both types, the issue was how to schedule students' course
requests into a master schedule while minimizing the number of collisions, mismatches and
courses over capacity.
Solving the first type of problem produced solutions with zero collisions and zero
mismatches, although several courses went over their maximum capacity. The best results
were attained using violation directed mutation with the 1 Point Classic crossover operator
(Table 5.1), which resulted in an average fitness value of 296, with 13 courses exceeding
their capacity by an average of 3 students per course. A close second, which also used
violation directed mutation, had an average fitness value of 330, with 15 courses exceeding
their capacity by an average of 3 students per course.
The second type of problem was more difficult for the genetic algorithm, and we had to
increase the population size from the 400 used in the first problem to 5,000. This gave the
genetic algorithm more master schedules to choose from. Because of time constraints, the
master schedules remained fixed after initialization, and the genetic algorithm converged to
one of these existing master schedules. Since event freeing mutation did not produce
acceptable results in the first problem, we focused only on violation directed mutation and
its stochastic variant. The best results were attained using stochastic violation directed
mutation with uniform crossover (Table 5.7), which resulted in an average fitness value of
866 with 4 collisions and 89 mismatches, and with 2 courses going over their maximum
capacity by an average of 2 students per course.
Several techniques could be used to further improve the solutions above:
1. Employ a heuristic at the end of the run to remove the over capacities without
increasing the number of collisions or mismatches.
2. For problem 2, allow the master schedule to also go through the evolutionary
process, so that various master schedules can be created rather than relying only on
the initialization process to supply them.
3. Skip crossover and perform only mutation on a given percentage of the population
during the evolutionary process.
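The first suggestion above can be illustrated with a simple greedy repair step. The sketch assumes a hypothetical data model in which a course may be offered as several sections in different timeslots, and `load[s]` counts how many classes student `s` has in each timeslot. It moves a student out of an over-full section only into another section of the same course that has room and whose timeslot is otherwise free for that student. Under this model, such a move should not introduce a collision or change which courses the student receives. `Section` and `repair_over_capacity` are illustrative names, not part of the thesis code.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Section:
    """Hypothetical section: one offering of a course in one timeslot."""
    course: str
    timeslot: int
    capacity: int
    students: list = field(default_factory=list)


def repair_over_capacity(sections, load):
    """Greedy post-run repair of over-full sections (illustrative assumption).

    load[s] is a Counter mapping timeslot -> number of classes student s has then.
    Returns the number of students moved.
    """
    moves = 0
    for src in sections:
        for student in list(src.students):
            if len(src.students) <= src.capacity:
                break  # src is no longer over capacity
            for dst in sections:
                if (dst is src or dst.course != src.course
                        or len(dst.students) >= dst.capacity
                        or student in dst.students):
                    continue
                # Classes the student would still have in dst's timeslot after leaving src.
                others = load[student][dst.timeslot] - (dst.timeslot == src.timeslot)
                if others == 0:  # no other class in that timeslot, so no new collision
                    src.students.remove(student)
                    dst.students.append(student)
                    load[student][src.timeslot] -= 1
                    load[student][dst.timeslot] += 1
                    moves += 1
                    break
    return moves


# Biology section a is over capacity by one student.
a = Section("Biology", timeslot=1, capacity=2, students=["ann", "bob", "cy"])
b = Section("Biology", timeslot=3, capacity=2, students=["dee"])
load = {
    "ann": Counter({1: 1, 3: 1}),  # ann already has another class in timeslot 3
    "bob": Counter({1: 1}),
    "cy": Counter({1: 1}),
    "dee": Counter({3: 1}),
}
print(repair_over_capacity([a, b], load))  # 1 (bob moves to timeslot 3)
print(a.students, b.students)  # ['ann', 'cy'] ['dee', 'bob']
```

Because it is greedy, this step only removes over capacity where a section of the same course has room in a timeslot the student has free. It does not change the master schedule itself.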
REFERENCES
[1] S. Petrovic and E. Burke. University timetabling. In Handbook of scheduling:
Algorithms, models, and performance analysis, pages 1-23. CRC Press, Boca Raton,
FL, 2004.
[2] M. L. Pinedo. Planning and scheduling in manufacturing and services. In Springer
Series in operations research, pages 3-8. Springer, New York, 2005.
[3] H. L. Fang. Genetic algorithms in timetabling scheduling. Ph.D. thesis, University of
Edinburgh, Edinburgh, UK, 1994.
[4] S. A. MirHassani. A computational approach to enhancing course timetabling with
integer programming. Applied Mathematics and Computation, 175: 814-822, 2006.
[5] J. Clausen. Branch and bound algorithms- principles and examples, 1999.
http://citeseerx.ist.psu.edu/
[6] T. E. Morton and D. W. Pentico. Heuristic scheduling systems. Wiley Series in
Engineering & Technology Management, New York, 1993.
[7] F. Busetti. Simulated annealing overview, 2009.
http://www.cs.ubbcluj.ro/~csatol/mestint/pdfs/Busetti_AnnealingIntro.pdf
[8] M. Friedrich, I. Hofs and S. Wekeck. Timetable-Based Transit Assignment Using
Branch & Bound, 2009.
http://cgi.ptv.de/download/traffic/library/2001%20TRB%20Timetable%20Transit%20Assignment.pdf
[9] R. Montemanni. Timetabling: Guided Simulated Annealing + Local Searches, 2003.
http://www.idsia.ch/Files/ttcomp2002/montemanni.pdf
[10] D. Abramson, M. Krishnamoorthy and H. Dang. Simulated Annealing Cooling
Schedules for the School Timetabling Problem, 1997.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.35.994
[11] F. Glover and M. Laguna. Tabu Search, 2009.
http://www.dei.unipd.it/~fisch/ricop/tabu_search_glover_laguna.pdf
[12] A. Hertz, E. Taillard and D. A. de Werra. Tutorial On Tabu Search, 2009.
http://www.cs.colostate.edu/~whitley/CS640/hertz92tutorial.pdf
[13] A. Schaerf. Tabu Search Techniques for Large High-School Timetabling Problems,
1996. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.20.9007
[14] W. Legierski. Constraint-Based Reasoning for Timetabling. AI-METH 2002 -
Artificial Intelligence Methods (November 13-15, 2002), 2002.
http://www.ai-forum.org/data/22-cons.pdf
[15] J. J. Blanco and L. Khatib. Course Scheduling as a Constraint Satisfaction Problem,
1998. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.44.8460
[16] H. Kanoh and Y. Sakamoto. Knowledge-Based Genetic Algorithm for University
Course Timetabling Problems. Inter. J. of Knowledge-based and Intelligent Eng. Sys.,
12: 283-294, 2008.
[17] A. Aamodt and E. Plaza. Case-Based Reasoning: Foundational Issues, Methodological
Variations, and System Approaches, AI Communications, 1994.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.15.9093
[18] E. K. Burke and S. Petrovic. Recent Research Directions in Automated Timetabling,
2002. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.63.5457
[19] J. Heaton. Introduction to neural networks with Java. Heaton Research, Chesterfield,
MO, 2nd edition, 2008.
[20] M. Carrasco and M. Pato. A Potts neural network heuristic for the class/teacher
timetabling problem. In Proc. 4th Metaheuristics International Conference, pages
139-142. Metaheuristics International, Porto, Portugal, 2001.
[21] M. Eley. Ant Algorithms for The Exam Timetabling Problem, 2010.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.61.6675
[22] H. S. Fen, S. Deris and S. Z. Mohd Hashim. Incorporating Of Constraint-Based
Reasoning Into Particle Swarm Optimization For University Timetabling Problem.
Computer Science Letters, 2009.
http://www.issres.net/journal/index.php/csl/article/view/55/12
[23] A. E. Eiben and J. E. Smith. Introduction to Evolutionary Computing, Springer, New
York, 2nd edition, 2003.
[24] T. Blickle and L. Thiele. A Comparison of Selection Schemes used in Genetic
Algorithms, 1995. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.11.509
[25] S. N. Sivanandam and S. N. Deepa. Introduction to genetic algorithms. Springer, New
York, 2008.
[26] T. A. El-Mihoub, A. A. Hopgood, L. Nolle and A. Battersby. Hybrid Genetic
Algorithms: A Review, 2006.
http://www.engineeringletters.com/issues_v13/issue_2/EL_13_2_11.pdf
[27] G. N. Beligiannis, C. N. Moschopoulos, G. P. Kaperonis and S. D. Likothanassis.
Applying evolutionary computation to the school timetable problem: The Greek case.
Computers & Operations Research, 35: 1265-1280, 2008.
[28] S. Kazarlis, V. Petridis and P. Fragkou. Solving University Timetabling Problems
Using Advanced Genetic Algorithms, 2010.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.130.7392
[29] E. R. Ramirez. Master Schedule Xml Data File, 2010.
https://docs.google.com/leaf?id=0B4LQhtZnAEcVY2ViNDBmM2MtZmI0Yi00OWViLWEzM2EtMDk0OTI3MTlhOWRi&hl=en
[30] E. R. Ramirez. Students Requests Xml Data File, 2010.
https://docs.google.com/leaf?id=0B4LQhtZnAEcVMWQzYjA4MDAtOTQ3YS00ZDEwLTk1MjQtMzk3YzE5ODYwODRj&hl=en
[31] E. R. Ramirez. Courses Xml Data File, 2010.
https://docs.google.com/leaf?id=0B4LQhtZnAEcVMDQyNzY2ODYtYTI2NS00MDk2LWFkOGUtNDgzOTNkMDcyMWRh&hl=en
[32] E. R. Ramirez. Teacher Courses Xml Data File, 2010.
https://docs.google.com/leaf?id=0B4LQhtZnAEcVOTFkOTkwNDktM2RlNC00YjA5LWE0YWYtZDNmNzc5N2FhMjE5&hl=en