The traveling salesman problem is a good example: the salesman is looking to visit a set of cities in the
order that minimizes the total number of miles he travels. As the number of cities gets large, it becomes
too computationally intensive to check every possible itinerary. At that point, you need an algorithm.
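To make that concrete, here's a minimal sketch of the quantity being minimized: the total mileage of one itinerary. The tour_miles() name, the list-of-cities representation, and the coordinate lookup are assumptions made just for illustration:

import math

def tour_miles(itinerary, coords):
    # Total mileage of an itinerary: straight-line distances between
    # consecutive cities, including the leg back to the starting city.
    total = 0.0
    for i in range(len(itinerary)):
        x1, y1 = coords[itinerary[i]]
        x2, y2 = coords[itinerary[(i + 1) % len(itinerary)]]  # wrap around
        total += math.hypot(x2 - x1, y2 - y1)
    return total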
You can visualize this by imagining a 2D graph like the one below. Each x-coordinate represents a
particular solution (e.g., a particular itinerary for the salesman). Each y-coordinate represents how good
that solution is (e.g., the inverse of that itinerary's mileage).
Broadly, an optimization algorithm searches for the best solution by generating a random initial solution
and "exploring" the area nearby. If a neighboring solution is better than the current one, then it moves to
it. If not, then the algorithm stays put.
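As a rough sketch, that simple strategy, often called hill climbing, might look like this in Python, where neighbor() and cost() stand in for problem-specific functions like the ones discussed later:

def hill_climb(solution, max_iterations=10000):
    # Greedy local search: only ever move to a better neighbor.
    current_cost = cost(solution)        # cost() is a placeholder
    for _ in range(max_iterations):
        candidate = neighbor(solution)   # random nearby solution
        candidate_cost = cost(candidate)
        if candidate_cost < current_cost:
            solution, current_cost = candidate, candidate_cost
        # otherwise, stay put
    return solution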
This is perfectly logical, but it can lead to situations where you're stuck at a sub-optimal place. In the
graph below, the best solution is at the yellow star on the left. But if a simple algorithm finds its way to
the green star on the right, it won't move away from it: all of the neighboring solutions are worse. The
green star is a local maximum.
Simulated annealing injects just the right amount of randomness into things to escape local maxima early
in the process without getting off course late in the game, when a solution is nearby. This makes it pretty
good at tracking down a decent answer, no matter its starting point.
On top of this, simulated annealing is not that difficult to implement, despite its somewhat scary name.
In practice, the algorithm works like this:
1. Generate a random initial solution.
2. Calculate its cost using some cost function you've defined.
3. Generate a random neighboring solution.
4. Calculate the new solution's cost.
5. Compare them:
- If the new cost is lower: move to the new solution.
- If the new cost is higher: maybe move to the new solution.
6. Repeat steps 3-5 above until an acceptable solution is found or you reach some maximum number of iterations.
Usually, the temperature is started at 1.0 and is decreased at the end of each iteration by multiplying it by a constant called alpha. You get to decide what value to use for alpha; typical choices are between 0.8 and 0.99.
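As a quick sanity check on what a given alpha implies, you can count how many temperature steps occur before T drops below the minimum; the starting values here match the ones in the example code below:

import math

T_start, T_min, alpha = 1.0, 0.00001, 0.9
steps = math.ceil(math.log(T_min / T_start) / math.log(alpha))
print(steps)  # 110 temperature steps for alpha = 0.9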
Furthermore, simulated annealing does better when the neighbor-cost-compare-move process is carried out many times (typically somewhere between 100 and 1,000) at each temperature. So the production-grade algorithm is somewhat more complicated than the one discussed above. It's implemented in the example Python code below.
Example Code
This code is for a very basic version of the simulated annealing algorithm. A useful additional
optimization is to always keep track of the best solution found so far so that it can be returned if the
algorithm terminates at a sub-optimal place.
from random import random

def anneal(solution):
    old_cost = cost(solution)                  # cost() is yours to define
    T = 1.0                                    # starting temperature
    T_min = 0.00001                            # stop once T gets this small
    alpha = 0.9                                # cooling rate
    while T > T_min:
        i = 1
        while i <= 100:                        # 100 tries at each temperature
            new_solution = neighbor(solution)  # neighbor() is yours to define
            new_cost = cost(new_solution)
            ap = acceptance_probability(old_cost, new_cost, T)
            if ap > random():                  # maybe move to the neighbor
                solution = new_solution
                old_cost = new_cost
            i += 1
        T = T * alpha                          # cool down
    return solution, old_cost
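As mentioned above, one worthwhile hardening step is to remember the best solution seen along the way, so that a late unlucky jump can't throw away a good answer. Here's a minimal sketch of that variation, using the same placeholder functions as the skeleton:

from random import random

def anneal_keep_best(solution):
    old_cost = cost(solution)
    best_solution, best_cost = solution, old_cost   # best seen so far
    T, T_min, alpha = 1.0, 0.00001, 0.9
    while T > T_min:
        for _ in range(100):
            new_solution = neighbor(solution)
            new_cost = cost(new_solution)
            if acceptance_probability(old_cost, new_cost, T) > random():
                solution, old_cost = new_solution, new_cost
                if new_cost < best_cost:            # record any improvement
                    best_solution, best_cost = new_solution, new_cost
        T = T * alpha
    return best_solution, best_cost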
This skeleton leaves a few gaps for you to fill in: neighbor(), in which you generate a random neighboring solution; cost(), in which you apply your cost function; and acceptance_probability(), which is basically defined for you.
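For the traveling salesman example, one common choice for neighbor() (an assumption here, not the only option) is to swap two randomly chosen cities in the itinerary:

from random import sample

def neighbor(itinerary):
    # Return a nearby solution: the same itinerary with two
    # randomly chosen cities swapped.
    new_itinerary = list(itinerary)  # don't mutate the original
    i, j = sample(range(len(new_itinerary)), 2)
    new_itinerary[i], new_itinerary[j] = new_itinerary[j], new_itinerary[i]
    return new_itinerary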
Once the acceptance probability is calculated, it's compared to a randomly generated number between 0 and 1. If the acceptance probability is larger than the random number, you're switching!
It looks like this:

a = e^((c_old - c_new) / T)

where a is the acceptance probability, (c_old - c_new) is the difference between the old cost and the new one, T is the temperature, and e is 2.71828, that mathematical constant that pops up in all sorts of unexpected places.
This equation is the part of simulated annealing that was inspired by metalworking. Throw in a constant
and it describes the embodied energy of metal particles as they are cooled slowly after being subjected to
high heat. This process allows the particles to move from a random configuration to one with a very low
embodied energy. Computer scientists borrow the annealing equation to help them move from a random
solution to one with a very low cost.
A few things to notice:
- a is always > 1 when the new solution is better (has a lower cost) than the old one. Since you can't have a probability greater than 100%, we use a = 1 in this case.
- a gets smaller as the new solution gets worse relative to the old one.
- a gets smaller as the temperature decreases (if the new solution is worse than the old one).
What this means is that the algorithm is more likely to accept sort-of-bad jumps than really-bad jumps,
and is more likely to accept them early on, when the temperature is high.
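Putting the formula and these observations together, the acceptance_probability() stub from the skeleton comes out to just a few lines:

import math

def acceptance_probability(old_cost, new_cost, T):
    # a = e^((c_old - c_new) / T), capped at 1 for improving moves
    if new_cost < old_cost:
        return 1.0  # always accept a better solution
    return math.exp((old_cost - new_cost) / T)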
Conclusion
If you ever have a combinatorial optimization problem to solve, simulated annealing should cross your
mind. Plenty of other strategies exist, but as algorithms expert Steven Skiena says, "[The] simulated
annealing solution works admirably. It is my heuristic method of choice."
References
This post relies heavily on these notes (http://www.cs.nott.ac.uk/~gxk/aim/notes/simulatedannealing.doc) from Graham Kendall (http://www.cs.nott.ac.uk/~gxk/aim/) at the University of Nottingham and on Steven Skiena's Algorithm Design Manual (http://www.algorist.com/). Another excellent source is the 1983 paper "Optimization by Simulated Annealing" (http://home.gwu.edu/~stroud/classics/KirkpatrickGelattVecchi83.pdf) by Kirkpatrick, Gelatt, and Vecchi. It's a bit dense, but relatively readable for an academic paper.
Update 1/29/16: Fixed a couple of errors - the most egregious of which claimed the acceptance probability function to be a = e^((c_new - c_old) / T) instead of a = e^((c_old - c_new) / T).