Steepest Descent Method
• The Gradient Descent method is an optimization algorithm for finding a local minimum of a function.
• The gradient is the slope of a function; the method repeatedly steps against the gradient, the direction of steepest descent.
[Figure: Illustration of Gradient Descent — f plotted over axes x1, x2, showing the move from the original point in weight space (xk) to a new point in weight space.]
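As a minimal sketch of the update xk+1 = xk − α·∇f(xk), in one dimension (the function, starting point, and fixed step size are illustrative choices, not from the slides):

```python
# A minimal one-dimensional gradient-descent sketch; the function, starting
# point, and fixed step size alpha are illustrative choices.
def gradient_descent(grad, x0, alpha=0.1, tol=1e-8, max_iter=1000):
    """Iterate x_{k+1} = x_k - alpha * grad(x_k) until the update is tiny."""
    x = x0
    for _ in range(max_iter):
        x_new = x - alpha * grad(x)
        if abs(x_new - x) < tol:
            break
        x = x_new
    return x_new

# f(x) = (x - 3)^2 has gradient f'(x) = 2(x - 3) and a minimum at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

Note that the iterate converges to a local minimum, which matches the slide's caveat that gradient descent is a local method.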
Line Search Parameter
• The parameter αk is the step length taken from the current point in the chosen search direction.
• Our interest is to find the rate of change of the function f along the direction characterised by this parameter.
• The objective is to determine the optimum line-search parameter (step length) αk* which minimizes f.
• This is determined by setting df/dαk = 0 and solving for the optimum αk* that minimizes f.
Steepest Descent Method
• Certain other properties:
– The rate of convergence depends on the starting point.
– The steepest descent direction is a "local" property, so the method is not effective in most problems.
Heuristics
A Heuristic is simply a rule of thumb that hopefully will find a good
answer.
Why use a Heuristic?
Heuristics are typically used to solve complex (large, nonlinear, non-convex, i.e. containing several local minima) multivariate optimization problems that are difficult to solve to optimality.
Heuristics are NOT guaranteed to find the true global optimal solution in a single-objective problem, but should find many good solutions (the mathematician's answer vs. the engineer's answer).
Heuristics are good at dealing with local optima without getting
stuck in them while searching for the global optimum.
Metaheuristics
• “A Metaheuristic is formally defined as an iterative
optimization process which guides a subordinate
heuristic by combining intelligently different concepts
for exploring and exploiting the search space, learning
strategies are used to structure information in order to
find efficiently near-optimal solutions.”
• "Metaheuristics are typically high-level strategies which guide an underlying, more problem-specific heuristic to increase its performance. The main goal is to avoid the disadvantages of iterative improvement and, in particular, multiple descent, by allowing the local search to escape from local optima."
Common Heuristics/Metaheuristics
• Neighbourhood search methods
– Simulated Annealing
– Tabu Search
– Etc.
• Population search methods
– Genetic Algorithms
– Particle swarm optimization
– Ant Colony optimization
– Artificial Bee colony algorithm
– Etc.
Simulated Annealing
Method proposed in 1983 by IBM researchers for solving VLSI layout problems (Kirkpatrick et al., Science, 220:671-680, 1983).

[Figure: ball-on-landscape illustration — from its initial position the ball can roll downhill, but Simulated Annealing explores more: it chooses an uphill move with a small probability (hill climbing).]

[Flowchart of Simulated Annealing: generate a neighbourhood solution xnew and find E(xnew); compute ΔE = E(xnew) − E(xcur). If ΔE < 0, set xcur ← xnew; otherwise set xcur ← xnew only if random(0,1) < exp(−ΔE/(B·T)). Increment n; while n < N repeat at the same temperature, then cool T ← T·α; when the minimum temperature is reached, print xcur.]
Simulated Annealing Algorithm
1. Select a point xcur and find objective function E(xcur);
2. Select an initial temperature T>0; //T=2000 (say)
3. Set number of iterations at each temperature N; //N=20 (say)
4. Set rate of change of temperature α; //say 0.99
5. Repeat
5.1 Set repetition counter n=0;
5.2 Repeat
5.2.1 Generate state xnew, a neighbor of xcur;
5.2.2 Calculate ΔE = E(xnew) − E(xcur);
5.2.3 If ΔE<0 (minimization problem) then xcur ← xnew;
5.2.4 else if random(0,1) < exp(−ΔE/(B·T)) then xcur ← xnew;
5.2.5 n=n+1;
5.3 Until n=N;
5.4 T ← T·α;
Repeat Step 5 until stopping criterion reached. // T=20 (say) or convergence
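The pseudocode above can be sketched as a short Python function. The objective E, the neighbourhood move, and the parameter values here are illustrative choices (smaller temperatures than the slide's example values, so the run freezes into a minimum within a quick demo); a few restarts are used, as the Weaknesses slide suggests:

```python
import math
import random

# A runnable sketch of the simulated-annealing pseudocode; objective, neighbourhood
# move, and parameter values are illustrative assumptions, not from the slides.
def simulated_annealing(E, x0, T=100.0, T_min=0.1, alpha=0.95, N=50, B=1.0, step=1.0):
    x_cur = x0
    e_cur = E(x_cur)
    best_x, best_e = x_cur, e_cur      # remember the best solution seen so far
    while T > T_min:
        for _ in range(N):             # N iterations at each temperature
            x_new = x_cur + random.uniform(-step, step)   # neighbour of x_cur
            dE = E(x_new) - e_cur
            # Accept downhill moves always; uphill moves with prob. exp(-dE/(B*T))
            if dE < 0 or random.random() < math.exp(-dE / (B * T)):
                x_cur, e_cur = x_new, e_cur + dE
                if e_cur < best_e:
                    best_x, best_e = x_cur, e_cur
        T *= alpha                     # cool: T <- T * alpha
    return best_x, best_e

# Example: minimise E(x) = x^2 + 10*sin(x), which has more than one local minimum.
random.seed(0)
f = lambda v: v * v + 10 * math.sin(v)
x_best, e_best = min((simulated_annealing(f, x0=5.0) for _ in range(3)),
                     key=lambda r: r[1])
```

Starting from x0 = 5.0, pure greedy descent would stop in the shallow local minimum near x ≈ 3.8; SA's probabilistic uphill moves let it reach the deeper minimum near x ≈ −1.3.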
Algorithm Parameters
• Kirkpatrick et al. (1983)
– Set initial value T to be large enough
– T(t+1) = α·T(t), α: 0.8~0.99 //rate of cooling
– N: a sufficient number of transitions
• corresponding to thermal equilibrium
• constant
• or proportional to the size of the neighborhood
– Stopping criterion
• when the solution obtained at each temperature change is
unaltered for a number of consecutive temperature changes.
or
• When the temperature reaches a specific minimum value
Algorithm parameters
• Constant B should be chosen depending on ΔE (i.e. on the function), the initial temperature T, and the final temperature.

ΔE/(B·T)    Probability P = exp(−ΔE/(B·T))
0.01        0.990    ← start of algorithm (initial temperature)
0.1         0.905
0.25        0.779
0.5         0.607
1.0         0.368
2.0         0.135
3.0         0.050
4.0         0.018
5.0         0.007    ← end of algorithm (final temperature)
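The acceptance probabilities in the table above can be checked directly from P = exp(−ΔE/(B·T)):

```python
import math

# Acceptance probability as a function of dE/(B*T): high near the start of the
# algorithm (hot), low near the end (cold). Note exp(-0.01) rounds to 0.990.
ratios = [0.01, 0.1, 0.25, 0.5, 1.0, 2.0, 3.0, 4.0, 5.0]
probs = [math.exp(-r) for r in ratios]
for r, p in zip(ratios, probs):
    print(f"dE/(B*T) = {r:5.2f}  ->  P = {p:.3f}")
```

This shows why almost every move is accepted early in the run, while late in the run only moves that barely increase E survive the test.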
Good features
• It is very easy to implement.
• It can be generally applied to a wide range of
problems.
• SA has provided high-quality solutions to many problems.
• Simulated Annealing algorithms are usually
better than greedy algorithms, when it comes
to problems that have numerous locally
optimum solutions.
Weaknesses
• Care is needed to devise an appropriate neighborhood structure and cooling schedule to obtain an efficient algorithm.
• Results are generally not reproducible: another run can give a different result.
• SA can leave an optimal solution and not find it again (so remember the best solution found so far).
• Proven to find good-quality solutions under certain conditions; one of these conditions is that you must run forever (several restarts).
• [Figure] An energy function like the one on the left would work with SA, while the one on the right may fail.
Example