Introduction
Simulated annealing was created when researchers noticed the analogy between their search algorithms
and metallurgists' annealing processes. The idea is to approach a goal state without converging on it too
quickly. In metallurgy, for example, hardening steel requires carefully timed heating and cooling to
make the iron and carbon atoms settle just right. In mathematical search algorithms, we want to focus on
promising solutions without ignoring better solutions we might find later. In other words, we want to
reduce error toward the global minimum without getting stuck in less successful local minima.
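The idea can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the bumpy objective function, the unit step size, and the geometric cooling schedule are all illustrative choices.

```python
import math
import random

def simulated_annealing(f, x0, t_start=10.0, t_end=1e-3, cooling=0.95, steps_per_t=100):
    """Minimize f over a real-valued x, starting from x0 (a minimal sketch)."""
    x, fx = x0, f(x0)
    best_x, best_fx = x, fx
    t = t_start
    while t > t_end:
        for _ in range(steps_per_t):
            # Propose a random neighbor of the current solution.
            x_new = x + random.uniform(-1.0, 1.0)
            fx_new = f(x_new)
            # Always accept improvements; accept worse moves with a
            # probability exp(-delta/t) that shrinks as t cools, which
            # lets the search escape local minima early on.
            if fx_new < fx or random.random() < math.exp((fx - fx_new) / t):
                x, fx = x_new, fx_new
                if fx < best_fx:
                    best_x, best_fx = x, fx
        t *= cooling  # geometric cooling schedule
    return best_x, best_fx

# A bumpy test function with many local minima; the global minimum is at x = 0.
random.seed(0)
bumpy = lambda x: x * x + 10.0 * (1.0 - math.cos(x))
x_best, f_best = simulated_annealing(bumpy, x0=20.0)
```

At high temperature almost any move is accepted, so the search roams widely; as the temperature falls, it settles into the most promising basin instead of the first local minimum it finds.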
The Algorithm
When a colony of ants is confronted with the choice of reaching their food via two different routes, one of
which is much shorter than the other, their initial choice is entirely random. However, those who use the
shorter route move faster and therefore go back and forth more often between the anthill and the food.[1]
In computer science and operations research, the ant colony optimization algorithm (ACO) is
a probabilistic technique for solving computational problems which can be reduced to finding good paths
through graphs. Artificial Ants stand for multi-agent methods inspired by the behavior of real ants. The
pheromone-based communication of biological ants is often the predominant paradigm
used.[2] Combinations of Artificial Ants and local search algorithms have become a method of choice for
numerous optimization tasks involving some sort of graph, e.g., vehicle routing and internet routing. The
burgeoning activity in this field has led to conferences dedicated solely to Artificial Ants, and to numerous
commercial applications by specialized companies such as AntOptima.
As an example, Ant colony optimization[3] is a class of optimization algorithms modeled on the actions of
an ant colony. Artificial 'ants' (e.g. simulation agents) locate optimal solutions by moving through
a parameter space representing all possible solutions. Real ants lay down pheromones directing each other
to resources while exploring their environment. The simulated 'ants' similarly record their positions and
the quality of their solutions, so that in later simulation iterations more ants locate better solutions.[4] One
variation on this approach is the bees algorithm, which is more analogous to the foraging patterns of
the honey bee, another social insect.
This algorithm is a member of the ant colony algorithms family, within swarm intelligence methods, and it
constitutes a family of metaheuristic optimizations. Initially proposed by Marco Dorigo in 1992 in his PhD
thesis,[5][6] the first algorithm aimed to search for an optimal path in a graph, based on the behavior
of ants seeking a path between their colony and a source of food. The original idea has since diversified to
solve a wider class of numerical problems, and as a result, several problems have emerged, drawing on
various aspects of the behavior of ants. From a broader perspective, ACO performs a model-based
search[7] and shares some similarities with estimation of distribution algorithms.
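The two-route ant experiment described above can be simulated directly. This is a hedged toy sketch of the pheromone mechanism only, not the full ACO algorithm: the path lengths, evaporation rate, and deposit rule are illustrative assumptions.

```python
import random

def ant_colony_two_paths(n_ants=20, n_iters=50, evaporation=0.5, seed=1):
    """Ants choose between a short path (length 1) and a long path (length 2).
    Each ant deposits pheromone inversely proportional to its path length,
    so the short path accumulates pheromone faster and attracts more ants."""
    random.seed(seed)
    lengths = [1.0, 2.0]      # short path, long path
    pheromone = [1.0, 1.0]    # equal pheromone: the first choice is random
    for _ in range(n_iters):
        deposits = [0.0, 0.0]
        for _ in range(n_ants):
            # Probability of each path is proportional to its pheromone level.
            total = pheromone[0] + pheromone[1]
            path = 0 if random.random() < pheromone[0] / total else 1
            deposits[path] += 1.0 / lengths[path]  # shorter path => more pheromone
        # Evaporate old pheromone, then add this iteration's deposits.
        pheromone = [(1 - evaporation) * p + d for p, d in zip(pheromone, deposits)]
    return pheromone

p = ant_colony_two_paths()
```

Evaporation is what keeps the system adaptive: without it, early random fluctuations would be locked in forever; with it, the reinforcement loop converges on the shorter route.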
Here, w1, w2 and w3 give the strengths of the input signals. As you can see from the above, an ANN is a very
simplistic representation of how a brain neuron works. To make things clearer, let's understand an ANN using
a simple example: a bank wants to decide whether to approve a customer's loan application, so it wants
to predict whether the customer is likely to default on the loan.
Key Points related to the architecture:
1. The network architecture has an input layer, one or more hidden layers, and an output layer.
It is also called an MLP (Multi-Layer Perceptron) because of the multiple layers.
2. The hidden layer can be seen as a “distillation layer” that distills some of the important patterns from the
inputs and passes them on to the next layer. It makes the network faster and more efficient by identifying only
the important information from the inputs and leaving out the redundant information.
3. The activation function serves two notable purposes:
- It captures non-linear relationships between the inputs.
- It helps convert the input into a more useful output.
In the above example, the activation function used is the sigmoid:
O1 = 1 / (1 + exp(-F)), where F = W1*X1 + W2*X2 + W3*X3
The sigmoid activation function produces output values between 0 and 1. Other activation
functions include tanh, softmax and ReLU.
4. Similarly, the hidden layer leads to the final prediction at the output layer:
O3 = 1 / (1 + exp(-F1)), where F1 = W7*H1 + W8*H2
Here, the output value (O3) is between 0 and 1. A value closer to 1 (e.g. 0.75) indicates a higher
likelihood of the customer defaulting.
5. The weights W represent the importance attached to the inputs. If W1 is 0.56 and W2 is 0.92, then
higher importance is attached to X2 (Debt Ratio) than to X1 (Age) in predicting H1.
6. The above network architecture is called a “feed-forward network”, because the input signals flow
in only one direction (from inputs to outputs). We can also create “feedback networks”, where signals
flow in both directions.
7. A good model with high accuracy gives predictions that are very close to the actual values. So, in the table
above, the values in column X should be very close to the values in column W; the error in prediction is the
difference between column W and column X.
8. The key to a good model with accurate predictions is to find “optimal values of W (the weights)” that
minimize the prediction error. This is achieved by the back-propagation algorithm, and it is what makes an
ANN a learning algorithm: by learning from its errors, the model improves.
9. The most common optimization algorithm is “gradient descent”, in which different values of W are tried
iteratively and the prediction errors assessed. The values of W are changed in small amounts and the impact
on the prediction error is assessed each time. Finally, those values of W are chosen as optimal for which
further changes in W no longer reduce the error.
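The network described in points 1-9 can be sketched end to end. This is a hedged illustration, not the article's exact model: the three scaled customer features (age, debt ratio, income), the initial weight values, the learning rate, and the squared-error loss are all assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Forward pass: three inputs -> two hidden units (H1, H2) -> one output O3,
    with sigmoid activations throughout, as in the text."""
    h = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
    o = sigmoid(sum(w * hi for w, hi in zip(w_out, h)))
    return h, o

def train_step(x, y, w_hidden, w_out, lr=1.0):
    """One gradient-descent / back-propagation step on squared error."""
    h, o = forward(x, w_hidden, w_out)
    err = o - y
    # Output-layer weight gradients: err * sigmoid'(F1) * hidden activation.
    new_out = [w - lr * err * o * (1 - o) * hi for w, hi in zip(w_out, h)]
    # Hidden-layer gradients, chaining the error back through the output weights.
    new_hidden = []
    for j, ws in enumerate(w_hidden):
        delta = err * o * (1 - o) * w_out[j] * h[j] * (1 - h[j])
        new_hidden.append([w - lr * delta * xi for w, xi in zip(ws, x)])
    return new_hidden, new_out

# Hypothetical, already-scaled customer features: age, debt ratio, income.
x, y = [0.3, 0.9, 0.2], 1.0                        # label 1 = customer defaulted
w_hidden = [[0.1, 0.4, -0.2], [-0.3, 0.2, 0.5]]    # W1..W6 (illustrative starts)
w_out = [0.7, -0.1]                                # W7, W8
for _ in range(500):
    w_hidden, w_out = train_step(x, y, w_hidden, w_out)
_, o3 = forward(x, w_hidden, w_out)   # prediction moves toward the label
```

Each training step nudges every weight a small amount against its error gradient, which is exactly the "change W in small amounts and assess the impact on error" loop of point 9, done analytically rather than by trial and error.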
Key advantages of neural networks:
ANNs have some key advantages that make them most suitable for certain problems and situations:
1. ANNs have the ability to learn and model non-linear and complex relationships, which is really important
because in real life many of the relationships between inputs and outputs are both non-linear and complex.
2. ANNs can generalize: after learning from the initial inputs and their relationships, they can infer
relationships on unseen data as well, allowing the model to generalize and predict on data it has not seen before.
3. Unlike many other prediction techniques, ANNs do not impose any restrictions on the input variables
(such as how they should be distributed). Additionally, many studies have shown that ANNs can better model
heteroskedasticity, i.e. data with high volatility and non-constant variance, given their ability to learn hidden
relationships in the data without imposing any fixed relationships. This is very useful
in financial time-series forecasting (e.g. stock prices), where data volatility is very high.
A few applications:
1. Image processing and character recognition: Given ANNs' ability to take in many inputs and process them
to infer hidden as well as complex, non-linear relationships, they play a big role in image and
character recognition. Character recognition, such as handwriting recognition, has many applications in fraud
detection (e.g. bank fraud) and even national security assessments. Image recognition is an ever-growing field
with widespread applications, from facial recognition in social media and cancer detection in medicine to
satellite imagery processing for agricultural and defense use. Research on ANNs has paved the way for the
deep neural networks that form the basis of “deep learning”, which has opened up all the exciting
and transformational innovations in computer vision, speech recognition and natural language processing;
a famous example is self-driving cars.
2. Forecasting: Forecasting is required extensively in everyday business decisions (e.g. sales, financial
allocation between products, capacity utilization), in economic and monetary policy, and in finance and the
stock market. Forecasting problems are often complex; for example, predicting stock prices is a complex
problem with many underlying factors (some known, some unseen). Traditional forecasting models have
limitations when it comes to accounting for these complex, non-linear relationships. ANNs, applied in the
right way, can provide a robust alternative, given their ability to model and extract unseen features and
relationships. Also, unlike traditional models, ANNs do not impose any restrictions on input and
residual distributions. More research is going on in this field, for example the recent advances in the use of
LSTMs and recurrent neural networks for forecasting.
ANNs are powerful models that have a wide range of applications. Above, I have listed a few prominent ones,
but they have far-reaching applications across many different fields in medicine, security, banking/finance
as well as government, agriculture and defense.
Mathematical model: A mathematical model is a description of
a system using mathematical concepts and language. The process of developing a mathematical model is
termed mathematical modeling. Mathematical models are used in the natural sciences (such
as physics, biology, earth science and chemistry) and in engineering disciplines. A model may help to explain a
system, to study the effects of its different components, and to make predictions about its behaviour.
Classifications: Linear vs. nonlinear: If all the operators in a mathematical model exhibit linearity, the
resulting mathematical model is defined as linear. A model is considered to be nonlinear otherwise. The
definition of linearity and nonlinearity is dependent on context, and linear models may have nonlinear
expressions in them. For example, in a statistical linear model, it is assumed that a relationship is linear in
the parameters, but it may be nonlinear in the predictor variables. Similarly, a differential equation is said
to be linear if it can be written with linear differential operators, but it can still have nonlinear expressions
in it. In a mathematical programming model, if the objective functions and constraints are represented
entirely by linear equations, then the model is regarded as a linear model. If one or more of the objective
functions or constraints are represented with a nonlinear equation, then the model is known as a nonlinear
model. Nonlinearity, even in fairly simple systems, is often associated with phenomena such
as chaos and irreversibility. Although there are exceptions, nonlinear systems and models tend to be more
difficult to study than linear ones. A common approach to nonlinear problems is linearization, but this can
be problematic if one is trying to study aspects such as irreversibility, which are strongly tied to
nonlinearity.
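The point about statistical linear models is worth a concrete sketch: a model can be nonlinear in the predictor yet still linear in the parameters, so ordinary least squares applies. The true coefficients and noise level below are made-up example values.

```python
import numpy as np

# Model y = b0 + b1*x + b2*x^2: nonlinear in the predictor x, but linear
# in the parameters b0, b1, b2 -- so it is still a "linear model".
rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 50)
y = 2.0 - 1.0 * x + 0.5 * x**2 + rng.normal(0.0, 0.1, x.size)  # noisy samples

# Design matrix with columns [1, x, x^2]; least squares solves for b directly,
# with no iterative nonlinear optimization needed.
X = np.column_stack([np.ones_like(x), x, x**2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
# b recovers approximately (2.0, -1.0, 0.5).
```

Linearity here is a property of how the parameters enter the model, not of the curve's shape; that is why the fitted curve can be a parabola while the fitting problem remains linear.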
Static vs. dynamic: A dynamic model accounts for time-dependent changes in the state of the system,
while a static (or steady-state) model calculates the system in equilibrium, and thus is time-invariant.
Dynamic models typically are represented by differential equations or difference equations.
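A tiny difference-equation example makes the static/dynamic distinction concrete. The cooling-of-an-object model and its coefficients below are illustrative assumptions.

```python
# A minimal dynamic model as a difference equation:
#   T[k+1] = T[k] + dt * k_cool * (T_env - T[k])
# The next state depends on the previous state, so the model is dynamic.
def simulate_cooling(t0=90.0, t_env=20.0, k_cool=0.1, dt=1.0, steps=100):
    t = t0
    history = [t]
    for _ in range(steps):
        t = t + dt * k_cool * (t_env - t)   # time-dependent state update
        history.append(t)
    return history

temps = simulate_cooling()
```

The corresponding static (steady-state) model is just the equilibrium condition T = T_env: it is the time-invariant value the dynamic trajectory converges to.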
Explicit vs. implicit: If all of the input parameters of the overall model are known, and the output
parameters can be calculated by a finite series of computations, the model is said to be explicit. But
sometimes it is the output parameters which are known, and the corresponding inputs must be solved for
by an iterative procedure, such as Newton's method or Broyden's method. In such a case the model is said
to be implicit. For example, a jet engine's physical properties such as turbine and nozzle throat areas can
be explicitly calculated given a design thermodynamic cycle (air and fuel flow rates, pressures, and
temperatures) at a specific flight condition and power setting, but the engine's operating cycles at other
flight conditions and power settings cannot be explicitly calculated from the constant physical properties.
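The explicit/implicit distinction can be shown with a toy model. The cubic model function and the target output below are invented for illustration; the Newton iteration uses a numerical derivative so it works for any smooth one-dimensional model.

```python
def newton_solve(g, x0, tol=1e-10, max_iter=50, h=1e-6):
    """Solve g(x) = 0 by Newton's method with a central-difference derivative."""
    x = x0
    for _ in range(max_iter):
        gx = g(x)
        if abs(gx) < tol:
            return x
        dg = (g(x + h) - g(x - h)) / (2 * h)  # numerical derivative of g
        x = x - gx / dg                       # Newton update
    return x

# Explicit direction: the output is a closed-form function of the input.
f = lambda x: x**3 + 2.0 * x   # monotonic, so the inverse is well defined

# Implicit direction: the output y is known, and the input x that produces
# it must be found iteratively, because no closed-form inverse is used.
y_target = 10.0
x_in = newton_solve(lambda x: f(x) - y_target, x0=1.0)
```

Computing f(x) for a given x is the explicit model; recovering x from a known output value is the implicit use of the same model, and it needs an iterative solver.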
Discrete vs. continuous: A discrete model treats objects as discrete, such as the particles in a molecular
model or the states in a statistical model; a continuous model represents the objects in a continuous
manner, such as the velocity field of fluid in pipe flow, the temperatures and stresses in a solid, or the
electric field that a point charge applies continuously over the entire model.
Deterministic vs. probabilistic (stochastic): A deterministic model is one in which every set of variable
states is uniquely determined by parameters in the model and by sets of previous states of these variables;
therefore, a deterministic model always performs the same way for a given set of initial conditions.
Conversely, in a stochastic model—usually called a "statistical model"—randomness is present, and
variable states are not described by unique values, but rather by probability distributions.
Deductive, inductive, or floating: A deductive model is a logical structure based on a theory. An inductive
model arises from empirical findings and generalization from them. The floating model rests on neither
theory nor observation, but is merely the invocation of expected structure. Application of mathematics in
social sciences outside of economics has been criticized for unfounded models.