Introduction
Simulated annealing was created when researchers noticed the analogy between their search algorithms
and metallurgists' annealing processes. The idea is to approach a goal state without converging on it too
quickly. In metallurgy, for example, hardening steel requires carefully timed heating and cooling to
make the iron and carbon atoms settle just right. In mathematical search algorithms, we want to focus on
promising solutions without ignoring better solutions we might find later. In other words, we want to
reduce error toward the global minimum without getting stuck in less successful local minima.
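The idea can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the bumpy objective function, the unit step size, and the geometric cooling schedule are all illustrative choices.

```python
import math
import random

def simulated_annealing(f, x0, t_start=10.0, t_end=1e-3, cooling=0.95, steps_per_t=100):
    """Minimize f over a real-valued x, starting from x0 (a minimal sketch)."""
    x, fx = x0, f(x0)
    best_x, best_fx = x, fx
    t = t_start
    while t > t_end:
        for _ in range(steps_per_t):
            # Propose a random neighbor of the current solution.
            x_new = x + random.uniform(-1.0, 1.0)
            fx_new = f(x_new)
            # Always accept improvements; accept worse moves with a
            # probability exp(-delta/t) that shrinks as t cools, which
            # lets the search escape local minima early on.
            if fx_new < fx or random.random() < math.exp((fx - fx_new) / t):
                x, fx = x_new, fx_new
                if fx < best_fx:
                    best_x, best_fx = x, fx
        t *= cooling  # geometric cooling schedule
    return best_x, best_fx

# A bumpy test function with many local minima; the global minimum is at x = 0.
random.seed(0)
bumpy = lambda x: x * x + 10.0 * (1.0 - math.cos(x))
x_best, f_best = simulated_annealing(bumpy, x0=20.0)
```

At high temperature almost any move is accepted, so the search roams widely; as the temperature falls, it settles into the most promising basin instead of the first local minimum it finds.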
The Algorithm
When a colony of ants is confronted with the choice of reaching their food via two different routes, one of
which is much shorter than the other, their initial choice is entirely random. However, those who use the
shorter route move faster and therefore go back and forth more often between the anthill and the food.[1]
In computer science and operations research, the ant colony optimization algorithm (ACO) is
a probabilistic technique for solving computational problems which can be reduced to finding good paths
through graphs. Artificial Ants stand for multi-agent methods inspired by the behavior of real ants. The
pheromone-based communication of biological ants is often the predominant paradigm
used.[2] Combinations of Artificial Ants and local search algorithms have become a method of choice for
numerous optimization tasks involving some sort of graph, e.g., vehicle routing and internet routing. The
burgeoning activity in this field has led to conferences dedicated solely to Artificial Ants, and to numerous
commercial applications by specialized companies such as AntOptima.
As an example, Ant colony optimization[3] is a class of optimization algorithms modeled on the actions of
an ant colony. Artificial 'ants' (e.g. simulation agents) locate optimal solutions by moving through
a parameter space representing all possible solutions. Real ants lay down pheromones directing each other
to resources while exploring their environment. The simulated 'ants' similarly record their positions and
the quality of their solutions, so that in later simulation iterations more ants locate better solutions.[4] One
variation on this approach is the bees algorithm, which is more analogous to the foraging patterns of
the honey bee, another social insect.
This algorithm is a member of the ant colony algorithms family, within swarm intelligence methods, and it
constitutes a family of metaheuristic optimizations. Initially proposed by Marco Dorigo in 1992 in his PhD
thesis,[5][6] the first algorithm aimed to search for an optimal path in a graph, based on the behavior
of ants seeking a path between their colony and a source of food. The original idea has since diversified to
solve a wider class of numerical problems, and as a result, several problems have emerged, drawing on
various aspects of the behavior of ants. From a broader perspective, ACO performs a model-based
search[7] and shares some similarities with estimation of distribution algorithms.
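The two-route ant experiment described above can be simulated directly. This is a hedged toy sketch of the pheromone mechanism only, not the full ACO algorithm: the path lengths, evaporation rate, and deposit rule are illustrative assumptions.

```python
import random

def ant_colony_two_paths(n_ants=20, n_iters=50, evaporation=0.5, seed=1):
    """Ants choose between a short path (length 1) and a long path (length 2).
    Each ant deposits pheromone inversely proportional to its path length,
    so the short path accumulates pheromone faster and attracts more ants."""
    random.seed(seed)
    lengths = [1.0, 2.0]      # short path, long path
    pheromone = [1.0, 1.0]    # equal pheromone: the first choice is random
    for _ in range(n_iters):
        deposits = [0.0, 0.0]
        for _ in range(n_ants):
            # Probability of each path is proportional to its pheromone level.
            total = pheromone[0] + pheromone[1]
            path = 0 if random.random() < pheromone[0] / total else 1
            deposits[path] += 1.0 / lengths[path]  # shorter path => more pheromone
        # Evaporate old pheromone, then add this iteration's deposits.
        pheromone = [(1 - evaporation) * p + d for p, d in zip(pheromone, deposits)]
    return pheromone

p = ant_colony_two_paths()
```

Evaporation is what keeps the system adaptive: without it, early random fluctuations would be locked in forever; with it, the reinforcement loop converges on the shorter route.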
Here, w1, w2 and w3 give the strengths of the input signals. As you can see from the above, an ANN is a very
simplistic representation of how a brain neuron works. To make things clearer, let's understand an ANN using
a simple example: a bank wants to decide whether to approve a customer's loan application, so it wants
to predict whether the customer is likely to default on the loan.
Key Points related to the architecture:
1. The network architecture has an input layer, one or more hidden layers, and an output layer.
It is also called an MLP (Multi-Layer Perceptron) because of the multiple layers.
2. The hidden layer can be seen as a “distillation layer” that distills some of the important patterns from the
inputs and passes them on to the next layer. It makes the network faster and more efficient by identifying only
the important information from the inputs and leaving out the redundant information.
3. The activation function serves two notable purposes:
- It captures non-linear relationships between the inputs.
- It helps convert the input into a more useful output.
In the above example, the activation function used is the sigmoid:
O1 = 1 / (1 + exp(-F)), where F = W1*X1 + W2*X2 + W3*X3
The sigmoid activation function produces output values between 0 and 1. Other activation
functions include tanh, softmax and ReLU.
4. Similarly, the hidden layer leads to the final prediction at the output layer:
O3 = 1 / (1 + exp(-F1)), where F1 = W7*H1 + W8*H2
Here, the output value (O3) is between 0 and 1. A value closer to 1 (e.g. 0.75) indicates a higher
likelihood of the customer defaulting.
5. The weights W represent the importance attached to the inputs. If W1 is 0.56 and W2 is 0.92, then
higher importance is attached to X2 (Debt Ratio) than to X1 (Age) in predicting H1.
6. The above network architecture is called a “feed-forward network”, because the input signals flow
in only one direction (from inputs to outputs). We can also create “feedback networks”, where signals
flow in both directions.
7. A good model with high accuracy gives predictions that are very close to the actual values. So, in the table
above, the values in column X should be very close to the values in column W; the error in prediction is the
difference between column W and column X.
8. The key to a good model with accurate predictions is to find “optimal values of W (the weights)” that
minimize the prediction error. This is achieved by the back-propagation algorithm, and it is what makes an
ANN a learning algorithm: by learning from its errors, the model improves.
9. The most common optimization algorithm is “gradient descent”, in which different values of W are tried
iteratively and the prediction errors assessed. The values of W are changed in small amounts and the impact
on the prediction error is assessed each time. Finally, those values of W are chosen as optimal for which
further changes in W no longer reduce the error.
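The network described in points 1-9 can be sketched end to end. This is a hedged illustration, not the article's exact model: the three scaled customer features (age, debt ratio, income), the initial weight values, the learning rate, and the squared-error loss are all assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Forward pass: three inputs -> two hidden units (H1, H2) -> one output O3,
    with sigmoid activations throughout, as in the text."""
    h = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
    o = sigmoid(sum(w * hi for w, hi in zip(w_out, h)))
    return h, o

def train_step(x, y, w_hidden, w_out, lr=1.0):
    """One gradient-descent / back-propagation step on squared error."""
    h, o = forward(x, w_hidden, w_out)
    err = o - y
    # Output-layer weight gradients: err * sigmoid'(F1) * hidden activation.
    new_out = [w - lr * err * o * (1 - o) * hi for w, hi in zip(w_out, h)]
    # Hidden-layer gradients, chaining the error back through the output weights.
    new_hidden = []
    for j, ws in enumerate(w_hidden):
        delta = err * o * (1 - o) * w_out[j] * h[j] * (1 - h[j])
        new_hidden.append([w - lr * delta * xi for w, xi in zip(ws, x)])
    return new_hidden, new_out

# Hypothetical, already-scaled customer features: age, debt ratio, income.
x, y = [0.3, 0.9, 0.2], 1.0                        # label 1 = customer defaulted
w_hidden = [[0.1, 0.4, -0.2], [-0.3, 0.2, 0.5]]    # W1..W6 (illustrative starts)
w_out = [0.7, -0.1]                                # W7, W8
for _ in range(500):
    w_hidden, w_out = train_step(x, y, w_hidden, w_out)
_, o3 = forward(x, w_hidden, w_out)   # prediction moves toward the label
```

Each training step nudges every weight a small amount against its error gradient, which is exactly the "change W in small amounts and assess the impact on error" loop of point 9, done analytically rather than by trial and error.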
Key advantages of neural networks:
ANNs have some key advantages that make them most suitable for certain problems and situations:
1. ANNs have the ability to learn and model non-linear and complex relationships, which is really important
because in real life many of the relationships between inputs and outputs are both non-linear and complex.
2. ANNs can generalize: after learning from the initial inputs and their relationships, they can infer
relationships on unseen data as well, allowing the model to generalize and predict on data it has not seen before.
3. Unlike many other prediction techniques, ANNs do not impose any restrictions on the input variables
(such as how they should be distributed). Additionally, many studies have shown that ANNs can better model
heteroskedasticity, i.e. data with high volatility and non-constant variance, given their ability to learn hidden
relationships in the data without imposing any fixed relationships. This is very useful
in financial time-series forecasting (e.g. stock prices), where data volatility is very high.
A few applications:
1. Image processing and character recognition: Given ANNs' ability to take in many inputs and process them
to infer hidden as well as complex, non-linear relationships, they play a big role in image and
character recognition. Character recognition, such as handwriting recognition, has many applications in fraud
detection (e.g. bank fraud) and even national security assessments. Image recognition is an ever-growing field
with widespread applications, from facial recognition in social media and cancer detection in medicine to
satellite imagery processing for agricultural and defense use. Research on ANNs has paved the way for the
deep neural networks that form the basis of “deep learning”, which has opened up all the exciting
and transformational innovations in computer vision, speech recognition and natural language processing;
a famous example is self-driving cars.
2. Forecasting: Forecasting is required extensively in everyday business decisions (e.g. sales, financial
allocation between products, capacity utilization), in economic and monetary policy, and in finance and the
stock market. Forecasting problems are often complex; for example, predicting stock prices is a complex
problem with many underlying factors (some known, some unseen). Traditional forecasting models have
limitations when it comes to accounting for these complex, non-linear relationships. ANNs, applied in the
right way, can provide a robust alternative, given their ability to model and extract unseen features and
relationships. Also, unlike traditional models, ANNs do not impose any restrictions on input and
residual distributions. More research is going on in this field, for example the recent advances in the use of
LSTMs and recurrent neural networks for forecasting.
ANNs are powerful models that have a wide range of applications. Above, I have listed a few prominent ones,
but they have far-reaching applications across many different fields in medicine, security, banking/finance
as well as government, agriculture and defense.
Mathematical model: A mathematical model is a description of
a system using mathematical concepts and language. The process of developing a mathematical model is
termed mathematical modeling. Mathematical models are used in the natural sciences (such
as physics, biology, earth science and chemistry) and in engineering disciplines. A model may help to explain a
system, to study the effects of its different components, and to make predictions about its behaviour.
Classifications: Linear vs. nonlinear: If all the operators in a mathematical model exhibit linearity, the
resulting mathematical model is defined as linear. A model is considered to be nonlinear otherwise. The
definition of linearity and nonlinearity is dependent on context, and linear models may have nonlinear
expressions in them. For example, in a statistical linear model, it is assumed that a relationship is linear in
the parameters, but it may be nonlinear in the predictor variables. Similarly, a differential equation is said
to be linear if it can be written with linear differential operators, but it can still have nonlinear expressions
in it. In a mathematical programming model, if the objective functions and constraints are represented
entirely by linear equations, then the model is regarded as a linear model. If one or more of the objective
functions or constraints are represented with a nonlinear equation, then the model is known as a nonlinear
model. Nonlinearity, even in fairly simple systems, is often associated with phenomena such
as chaos and irreversibility. Although there are exceptions, nonlinear systems and models tend to be more
difficult to study than linear ones. A common approach to nonlinear problems is linearization, but this can
be problematic if one is trying to study aspects such as irreversibility, which are strongly tied to
nonlinearity.
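The point about statistical linear models is worth a concrete sketch: a model can be nonlinear in the predictor yet still linear in the parameters, so ordinary least squares applies. The true coefficients and noise level below are made-up example values.

```python
import numpy as np

# Model y = b0 + b1*x + b2*x^2: nonlinear in the predictor x, but linear
# in the parameters b0, b1, b2 -- so it is still a "linear model".
rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 50)
y = 2.0 - 1.0 * x + 0.5 * x**2 + rng.normal(0.0, 0.1, x.size)  # noisy samples

# Design matrix with columns [1, x, x^2]; least squares solves for b directly,
# with no iterative nonlinear optimization needed.
X = np.column_stack([np.ones_like(x), x, x**2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
# b recovers approximately (2.0, -1.0, 0.5).
```

Linearity here is a property of how the parameters enter the model, not of the curve's shape; that is why the fitted curve can be a parabola while the fitting problem remains linear.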
Static vs. dynamic: A dynamic model accounts for time-dependent changes in the state of the system,
while a static (or steady-state) model calculates the system in equilibrium, and thus is time-invariant.
Dynamic models typically are represented by differential equations or difference equations.
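A tiny difference-equation example makes the static/dynamic distinction concrete. The cooling-of-an-object model and its coefficients below are illustrative assumptions.

```python
# A minimal dynamic model as a difference equation:
#   T[k+1] = T[k] + dt * k_cool * (T_env - T[k])
# The next state depends on the previous state, so the model is dynamic.
def simulate_cooling(t0=90.0, t_env=20.0, k_cool=0.1, dt=1.0, steps=100):
    t = t0
    history = [t]
    for _ in range(steps):
        t = t + dt * k_cool * (t_env - t)   # time-dependent state update
        history.append(t)
    return history

temps = simulate_cooling()
```

The corresponding static (steady-state) model is just the equilibrium condition T = T_env: it is the time-invariant value the dynamic trajectory converges to.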
Explicit vs. implicit: If all of the input parameters of the overall model are known, and the output
parameters can be calculated by a finite series of computations, the model is said to be explicit. But
sometimes it is the output parameters which are known, and the corresponding inputs must be solved for
by an iterative procedure, such as Newton's method or Broyden's method. In such a case the model is said
to be implicit. For example, a jet engine's physical properties such as turbine and nozzle throat areas can
be explicitly calculated given a design thermodynamic cycle (air and fuel flow rates, pressures, and
temperatures) at a specific flight condition and power setting, but the engine's operating cycles at other
flight conditions and power settings cannot be explicitly calculated from the constant physical properties.
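The explicit/implicit distinction can be shown with a toy model. The cubic model function and the target output below are invented for illustration; the Newton iteration uses a numerical derivative so it works for any smooth one-dimensional model.

```python
def newton_solve(g, x0, tol=1e-10, max_iter=50, h=1e-6):
    """Solve g(x) = 0 by Newton's method with a central-difference derivative."""
    x = x0
    for _ in range(max_iter):
        gx = g(x)
        if abs(gx) < tol:
            return x
        dg = (g(x + h) - g(x - h)) / (2 * h)  # numerical derivative of g
        x = x - gx / dg                       # Newton update
    return x

# Explicit direction: the output is a closed-form function of the input.
f = lambda x: x**3 + 2.0 * x   # monotonic, so the inverse is well defined

# Implicit direction: the output y is known, and the input x that produces
# it must be found iteratively, because no closed-form inverse is used.
y_target = 10.0
x_in = newton_solve(lambda x: f(x) - y_target, x0=1.0)
```

Computing f(x) for a given x is the explicit model; recovering x from a known output value is the implicit use of the same model, and it needs an iterative solver.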
Discrete vs. continuous: A discrete model treats objects as discrete, such as the particles in a molecular
model or the states in a statistical model; a continuous model represents the objects in a continuous
manner, such as the velocity field of fluid in pipe flow, the temperatures and stresses in a solid, or the
electric field that a point charge applies continuously over the entire model.
Deterministic vs. probabilistic (stochastic): A deterministic model is one in which every set of variable
states is uniquely determined by parameters in the model and by sets of previous states of these variables;
therefore, a deterministic model always performs the same way for a given set of initial conditions.
Conversely, in a stochastic model—usually called a "statistical model"—randomness is present, and
variable states are not described by unique values, but rather by probability distributions.
Deductive, inductive, or floating: A deductive model is a logical structure based on a theory. An inductive
model arises from empirical findings and generalization from them. The floating model rests on neither
theory nor observation, but is merely the invocation of expected structure. Application of mathematics in
social sciences outside of economics has been criticized for unfounded models.