Iris versicolor
Iris setosa
Iris virginica
Your first workshop task (using J48 in Weka) was to see whether the four attributes (petal length, petal width, sepal length and sepal width) contain enough information to distinguish and classify these similar-looking flowers. Notice that colour may not help here. Your next workshop will involve using ANNs to distinguish between these flowers.
Workshops and assignment
• Week 3: Workshop in ANNs (attend if you wish)
• Week 3: Assignment distributed
• Week 4: Workshop in ANNs (attend if you wish)
• Weeks 5-9: Workshops in WEKA/ANNs as you progress through the assignment (tailored to your requirements)
• Week 10: Hand-in on 1st June 2012. 13 pages max.
Lecture 4
Artificial Neural Networks
Images of the brain. Top left: photo; top right: what is currently known; left: close-up showing the brain consisting of layers of interconnected nerve cells (neurons) and tissue.
All images taken from www.idsia.ch/NNcourse/brain.html
From biology to computing
Neuron = nerve cell (in brain)
Neurons
Flow of information
http://faculty.washington.edu/chudler/color/pic1an.gif
http://www.frontiersin.org/neuromorphic_engineering/10.3389/fnins.2011.00026/full
Biological computing through action potential spikes:
A: Abstract physiology B: Biochemistry
Physiology and biochemistry lead to spikes. Can spikes be used for computing?
The postsynaptic neuron spikes more (or less) depending on presynaptic behaviour.
http://www.socialbehavior.uzh.ch/teaching/ComputationalNeuroeconomicsFS11/Chapter10.pptx
Action potential spiking can take place many times per second, depending on which part of the brain we look at.
The Structure of Neurons
A neuron has a cell body, a branching input structure (the dendrIte) and a branching output structure (the axOn).
• Axons connect to dendrites via synapses.
• Electrochemical signals are propagated from the dendritic input, through the cell body, and down the axon to other neurons.
Classical computing vs. Neural Net
[Figure: classical computing, with a CPU exchanging data and instructions with memory]
http://ilab.usc.edu/classes/2002cs561/notes/session28.ppt
Layers of interconnected neurons (as many layers as you like), with the connections being weighted to reflect strength of incoming signal
Feedforward
http://www.cs.umbc.edu/~ypeng/F04NN/lecturenotes/NNCh1.ppt
Introduction
• What is an (artificial) neural network?
– A set of nodes (units, neurons, processing elements)
• Each node has input and output
• Each node performs a simple computation by its node function
– Weighted connections between nodes
• Connectivity gives the structure/architecture of the net
• What can be computed by a NN is primarily determined by the connections and their weights
– A very much simplified version of networks of neurons in animal nervous systems
– Neuron is basic computational unit (primitive processor), not a program
http://www.cs.umbc.edu/~ypeng/F04NN/lecturenotes/NNCh1.ppt
Introduction
Von Neumann machine
• One or a few high-speed (ns) processors with considerable computing power
• One or a few shared high-speed buses for communication
• Sequential memory access by address
• Problem-solving knowledge is separated from the computing component
• Hard to be adaptive
Human brain
• Large number (10^11) of low-speed processors (ms) with limited computing power
• Large number (10^15) of low-speed connections
• Problem-solving knowledge resides in the connectivity of neurons
• Adaptation by changing the connectivity
• Easily adapts for learning
• Fault tolerant
Example of fault tolerance
Captchas are now frequently used by websites to check that a fault-tolerant human brain (rather than rigorous software) is interacting with the site. If there were a reliable algorithm for recognizing these ‘faulty’ characters, some other method for checking the ‘humanness’ of the user would need to be found.
http://www.cs.umbc.edu/~ypeng/F04NN/lecturenotes/NNCh1.ppt
Introduction
ANN                        Bio NN
• Nodes                    • Cell body
  – input                    – signal from other neurons
  – output                   – firing frequency
  – node function            – firing mechanism
• Connections              • Synapses
  – connection strength      – synaptic strength
• Highly parallel, simple local computation (at neuron level) achieves global results as an emerging property of the interaction (at network level)
• Pattern directed (individual nodes have meaning only in the context of a pattern)
• Fault-tolerant / graceful degradation
• Learning/adaptation plays an important role.
http://www.cs.umbc.edu/~ypeng/F04NN/lecturenotes/NNCh1.ppt
History of NN
• Pitts & McCulloch (1943)
– First mathematical model of biological neurons
– All Boolean operations can be implemented by these neuron-like nodes (with different thresholds and excitatory/inhibitory connections).
– Competitor to Von Neumann model for general purpose computing device
– Origin of automata theory.
• Hebb (1949)
– Hebbian rule of learning: increase the connection strength between neurons i and j whenever both i and j are activated.
– Or increase the connection strength between nodes i and j whenever both nodes are simultaneously ON or OFF.
http://www.cs.umbc.edu/~ypeng/F04NN/lecturenotes/NNCh1.ppt
History of NN
• Early booming (50’s – early 60’s)
– Rosenblatt (1958)
• Perceptron: network of threshold nodes for pattern classification (inputs $x_1, x_2, \ldots, x_n$)
• Perceptron convergence theorem: everything that can be represented by a perceptron can be learned
• A neuron only fires if its input signal exceeds a certain amount (the threshold) in a short time period.
• Synapses vary in strength
• Good connections allow a large signal
• Slight connections allow only a weak signal.
• Synapses can be either excitatory or inhibitory.
Perceptron
Perceptron Learning Rule
$w_i = w_i + \Delta w_i$ (a weight changes according to some difference)
$\Delta w_i = \eta (t - o) x_i$ (the difference is between the desired class value and the output value)
$t = c(x)$ is the target value (class value of the sample)
$o$ is the perceptron output (actual value representing the class)
$\eta$ (eta) is a small constant (e.g. 0.1) called the learning rate (more later)
• If the output is correct ($t = o$), the weights $w_i$ are not changed
• If the output is incorrect ($t \neq o$), the weights $w_i$ are changed such that the output of the perceptron for the adjusted weights is closer to $t$.
• The algorithm converges to the correct classification after repeated presentations of samples:
  • if the training data is linearly separable
  • and $\eta$ is sufficiently small
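The learning rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming a unit that outputs 1 when the weighted sum exceeds a threshold of 0; the function names `perceptron_output` and `update_weights` are hypothetical and not part of the lecture material.

```python
def perceptron_output(weights, inputs, threshold=0.0):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0


def update_weights(weights, inputs, target, eta=0.1, threshold=0.0):
    """Apply w_i = w_i + eta * (t - o) * x_i once and return the new weights."""
    output = perceptron_output(weights, inputs, threshold)
    return [w + eta * (target - output) * x for w, x in zip(weights, inputs)]


# The unit fires on [1, 0] but the target is 0, so only the weight on the
# active input is reduced; the weight on the zero input is untouched.
print(update_weights([1.0, 1.0], [1, 0], target=0, eta=0.5))  # [0.5, 1.0]
```

When the output already equals the target, the factor (t - o) is 0 and the weights are returned unchanged.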
ANN Supervised Learning Method - Basic
1. Samples fed in one at a time to the input units: repeat many times
2. Actual output compared with desired output
3. Actual output stored in a temporary file to allow calculation of error (difference between desired output and actual output) – error can be summed
4. Error used to adjust weights (not input or output)
5. Many repetitions of 1-4
Problem: How do we represent the class value if a threshold function only returns 0 and 1?
Training ANN (perceptron) for AND function
For AND:
A  B  Output
0  0  0
0  1  0
1  0  0
1  1  1
[Figure: perceptron with inputs A and B and an additional constant input; the output is +1 if the threshold t is exceeded]
Training Perceptrons
For AND:
A  B  Output
0  0  0
0  1  0
1  0  0
1  1  1
I1  I2  I3  Summation                                 Output
−1  0   0   (−1*0.3) + (0*0.5) + (0*−0.4) = −0.3     0
−1  0   1   (−1*0.3) + (0*0.5) + (1*−0.4) = −0.7     0
−1  1   0   (−1*0.3) + (1*0.5) + (0*−0.4) = 0.2      1
−1  1   1   (−1*0.3) + (1*0.5) + (1*−0.4) = −0.2     0
Given the current weights, this perceptron does not produce the correct results for two combinations of AND: ‘1 0’ and ‘1 1’.
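The summation table can be checked mechanically. The sketch below assumes the constant input is −1 with weight 0.3 and that the third weight is −0.4, which is the sign assignment that reproduces the summations and outputs in the table; `evaluate` and `AND_TARGETS` are illustrative names.

```python
# Target outputs of the AND function, keyed by (A, B).
AND_TARGETS = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}


def weighted_sum(weights, inputs):
    return sum(w * x for w, x in zip(weights, inputs))


def evaluate(weights, threshold=0.0, constant_input=-1):
    """Return (A, B, summation, output, target) for every AND input pair."""
    rows = []
    for (a, b), target in AND_TARGETS.items():
        total = weighted_sum(weights, [constant_input, a, b])
        output = 1 if total > threshold else 0
        rows.append((a, b, round(total, 2), output, target))
    return rows


rows = evaluate([0.3, 0.5, -0.4])
wrong = [(a, b) for a, b, _, output, target in rows if output != target]
print(wrong)  # [(1, 0), (1, 1)]
```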
Exercise: Fill in the values in the summation table to determine whether this perceptron correctly performs the AND function.
For AND:
A  B  Output
0  0  0
0  1  0
1  0  0
1  1  1
I1  I2  I3  Summation  Output
−1  0   0
−1  0   1
−1  1   0
−1  1   1
For AND:
A  B  Output
0  0  0
0  1  0
1  0  0
1  1  1
I1  I2  I3  Summation                                 Output
−1  0   0   (−1*0.4) + (0*0.7) + (0*−0.2) = −0.4     0
−1  0   1   (−1*0.4) + (0*0.7) + (1*−0.2) = −0.6     0
−1  1   0   (−1*0.4) + (1*0.7) + (0*−0.2) = 0.3      1
−1  1   1   (−1*0.4) + (1*0.7) + (1*−0.2) = 0.1      1
Weight adjustment
I1  I2  I3  Summation                                 Output
−1  0   0   (−1*0.3) + (0*0.5) + (0*−0.4) = −0.3     0
−1  0   1   (−1*0.3) + (0*0.5) + (1*−0.4) = −0.7     0
−1  1   0   (−1*0.3) + (1*0.5) + (0*−0.4) = 0.2      1
−1  1   1   (−1*0.3) + (1*0.5) + (1*−0.4) = −0.2     0

I1  I2  I3  Summation                                 Output
−1  0   0   (−1*0.4) + (0*0.7) + (0*−0.2) = −0.4     0
−1  0   1   (−1*0.4) + (0*0.7) + (1*−0.2) = −0.6     0
−1  1   0   (−1*0.4) + (1*0.7) + (0*−0.2) = 0.3      1
−1  1   1   (−1*0.4) + (1*0.7) + (1*−0.2) = 0.1      1
Note that the summation results are in the right direction for producing the correct output for ‘1 1’ but not for ‘1 0’ (given a threshold of t=0.0)
Learning algorithm
Epoch: Presentation of the entire training set to the neural network. In the case of the AND function, an epoch consists of four sets of inputs being presented to the network (i.e. [0,0], [0,1], [1,0], [1,1])
Error: The error value is the amount by which the value output by the network differs from the target value. For example, if we required the network to output 0 and it output a 1, then Error = (0 − 1) = −1
Learning question 1: Is there a set of weights that will produce the correct results for all input values of the AND function?
Learning question 2: If so, how do we find these weights automatically rather than manually adjusting the weights?
Learning algorithm (supervised)  recap
Target value, T: When we are training a network we not only present it with the input but also with a value that we require the network to produce. For example, if we present the network with [1,1] for the AND function, the target value will be 1
Output, O: The output value from the neuron
I_j: Inputs being presented to the neuron
W_j: Weight from input neuron (I_j) to the output neuron
LR: The learning rate. This dictates how quickly the network converges. It is set by experimentation. It is typically 0.1, for reasons to be described later
Feedback Learning Algorithm - recap
Until convergence (low error or other stopping criterion) do
– Present a training pattern
– Calculate the error of the output nodes
– Adjust the weights connecting the input nodes to the output node so that the next time the training pattern is presented the error of the output is reduced
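The loop above can be sketched for the AND function as follows. This is an illustrative sketch, not the lecture's own implementation: it assumes a constant input of −1, a threshold of 0, starting weights of 0.3, 0.5 and −0.4, and stops when a whole epoch passes without error; `train_perceptron` and `max_epochs` are hypothetical names.

```python
def train_perceptron(samples, weights, eta=0.1, threshold=0.0, max_epochs=100):
    """Present every sample each epoch and adjust weights until an epoch has no errors."""
    weights = list(weights)
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for inputs, target in samples:
            total = sum(w * x for w, x in zip(weights, inputs))
            output = 1 if total > threshold else 0
            error = target - output
            if error != 0:
                errors += 1
                # Perceptron rule: weight change = learning rate x error x input
                weights = [w + eta * error * x for w, x in zip(weights, inputs)]
        if errors == 0:
            return weights, epoch  # converged
    return weights, None  # stopping criterion not met within max_epochs


# Each sample is ([constant input, A, B], target) for the AND function.
AND_SAMPLES = [([-1, 0, 0], 0), ([-1, 0, 1], 0), ([-1, 1, 0], 0), ([-1, 1, 1], 1)]
trained, epochs = train_perceptron(AND_SAMPLES, [0.3, 0.5, -0.4])
print(trained, epochs)
```

Because AND is linearly separable, the loop is expected to stop after a small number of epochs; the exact final weights depend on the presentation order and on floating-point rounding.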
Automatic Perceptron algorithm
• weight change = some small constant x (target output – actual output) x input
• if we use error instead of the “target output – actual output”, we have:
• weight change = some small constant x error x input
Perceptron feedback rule
• weight change = some small constant (learning rate) x error x input (typically, learning rate is 0.1 or smaller)
• The error is:
Error = (0 − 1) = −1
Weight change for w1 = 0.1 x −1 x −1 = 0.1 (w1 = 0.4 + 0.1 = 0.5)
Weight change for w2 = 0.1 x −1 x 1 = −0.1 (w2 = 0.7 − 0.1 = 0.6)
Weight change for w3 = 0.1 x −1 x 0 = 0 (w3 = −0.2, no change)
Update Perceptron weights for ‘1 0’
For AND:
A  B  Output
0  0  0
0  1  0
1  0  0
1  1  1
That is, while the output is still wrong, the summation for input ‘1 0’ has been reduced from 0.3 to 0.1. At least one more presentation of ‘1 0’ is required to produce the desired output 0, given the threshold of t=0.0
Next presentation
• Error = (0 − 1) = −1
• Weight change for w1 = 0.1 x −1 x −1 = 0.1 (w1 = 0.5 + 0.1 = 0.6)
• Weight change for w2 = 0.1 x −1 x 1 = −0.1 (w2 = 0.6 − 0.1 = 0.5)
• Weight change for w3 = 0.1 x −1 x 0 = 0 (w3 = −0.2, no change)
• Next presentation:
Since the output activation for ‘1 0’ is now below 0, the correct output for AND(1,0) = 0 is now produced. We can now move to the next wrongly classified sample ‘1 1’.
One can change weights after processing a single pattern or accumulate weight error values over a batch of patterns before changing the weights. This allows all patterns to be presented to a perceptron’s existing weights before the weights are changed.
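The two presentations of ‘1 0’ worked through above can be replayed in code. The sketch assumes a constant input of −1 and starting weights w1 = 0.4, w2 = 0.7, w3 = −0.2 (the signs that make the slide arithmetic consistent); `present_until_correct` is a hypothetical helper name.

```python
def present_until_correct(weights, inputs, target, eta=0.1, threshold=0.0,
                          max_presentations=20):
    """Repeat the perceptron update on one pattern until its output matches the target."""
    weights = list(weights)
    presentations = 0
    while presentations < max_presentations:
        total = sum(w * x for w, x in zip(weights, inputs))
        output = 1 if total > threshold else 0
        if output == target:
            break
        weights = [w + eta * (target - output) * x for w, x in zip(weights, inputs)]
        presentations += 1
    # Round only for display, to hide floating-point noise such as 0.6000000000000001.
    return [round(w, 2) for w in weights], presentations


# Inputs are [constant input, A, B] for the pattern '1 0', whose AND target is 0.
final_weights, count = present_until_correct([0.4, 0.7, -0.2], [-1, 1, 0], target=0)
print(final_weights, count)  # [0.6, 0.5, -0.2] 2
```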
http://www.cs.umbc.edu/~ypeng/F04NN/lecturenotes/NNCh1.ppt
History of NN
• The setback (mid 60’s – late 70’s)
– Serious problems with the perceptron model (Minsky and Papert’s book, 1969)
• Single-layer perceptrons cannot represent (learn) simple functions such as XOR
• Multiple layers of nonlinear units may have greater power, but there is no learning rule for such nets
• Scaling problem: connection weights may grow infinitely
– The first two problems were overcome by later efforts in the 80’s, but the scaling problem persists
– Death of Rosenblatt (1971)
– Thriving of the Von Neumann machine and AI
http://www.cs.umbc.edu/~ypeng/F04NN/lecturenotes/NNCh1.ppt
History of NN
• Renewed enthusiasm and flourishing (80’s – present)
– New techniques
• Backpropagation learning for multilayer feedforward nets (with nonlinear, differentiable node functions)
• Thermodynamic models (Hopfield net, Boltzmann machine, etc.)
• Unsupervised learning
– Impressive applications (character recognition, speech recognition, text-to-speech transformation, process control, associative memory, etc.)
– ANNs are now the preferred computational method in many applications (e.g. pattern recognition)
Excitatory and Inhibitory Synapses - Recap
• We call a synapse/weight:
excitatory if $w_i > 0$, and
inhibitory if $w_i < 0$.
• We also associate a threshold $\theta$ with each neuron
• A neuron fires (i.e., has value 1 on its output line) if the weighted sum of its inputs at time t reaches or passes $\theta$:
output = 1 if and only if $\sum_i w_i x_i \ge \theta$
http://ilab.usc.edu/classes/2002cs561/notes/session28.ppt
Most common ANN architecture: Feed forward nets
Information flow is unidirectional
• Data is presented to the input layer
• Passed on to the hidden layer
• Passed on to the output layer
Information is distributed
Information processing is parallel
34
Your ANN for OCR
• feedforward network
• train using backpropagation
• 2D pixel matrix converted into linear input
Multilayer Perceptron (MLP) Topology
Note that the output layer can contain more than one unit. E.g. output classes 1/0 can be represented by one unit or by two units (‘1 0’ for class 1 and ‘0 1’ for class 2).
Input Layer i, Hidden Layer(s) j, Output Layer k
Fully connected, MLP is the most common (and simplest) ANN
Backpropagation Learning Algorithm
Until convergence (low error or other stopping criteria) do
– Present a training pattern
– Calculate the error of the output nodes
– Calculate the error of the hidden nodes (based on the error of the output nodes, which is propagated back to the hidden nodes)
– Continue propagating error back until the input layer is reached
– Update all weights based on the standard delta rule with the appropriate error function $\delta$:
$\Delta w_{ij} = \eta \, \delta_j \, Z_i$
where $Z$ is the output function of the ANN and $\eta$ is the learning rate
One can change weights after processing a single pattern or accumulate weight error values over a batch of patterns before changing the weights.
Backpropagation algorithm in rules
• weight change = some small constant (learning rate) x error x input activation
• For an output node, the error is:
error = (target activation − output activation) x output activation x (1 − output activation)
• For a hidden node, the error is:
error = weighted sum of to-node errors x hidden activation x (1 − hidden activation)
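The three rules above translate directly into code. The sketch below is an illustration under the assumption that units use the logistic activation, for which the factor activation x (1 − activation) is the derivative; the function names are hypothetical.

```python
def output_error(target, output):
    """Error term for an output node."""
    return (target - output) * output * (1.0 - output)


def hidden_error(to_node_errors, to_node_weights, hidden_activation):
    """Error term for a hidden node: errors of the nodes it feeds, weighted by the connections."""
    weighted = sum(e * w for e, w in zip(to_node_errors, to_node_weights))
    return weighted * hidden_activation * (1.0 - hidden_activation)


def weight_change(learning_rate, error, input_activation):
    """Weight change = learning rate x error x input activation."""
    return learning_rate * error * input_activation


# One output node with activation 0.5 and target 1, fed by one hidden node
# (activation 0.5) through a connection of weight 2.0.
err_out = output_error(target=1.0, output=0.5)
err_hid = hidden_error([err_out], [2.0], hidden_activation=0.5)
print(err_out, err_hid)  # 0.125 0.0625
```

Note how the hidden node's share of the blame depends on the weight of its connection to the output node.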
Why backpropagation?
• Each weight ‘shares the blame’ for prediction error with other weights.
• The backpropagation algorithm decides how to distribute the blame among all weights and adjusts the weights accordingly.
• A small portion of the blame leads to a small adjustment.
• A large portion of the blame leads to a large adjustment.
The role of η in
Assume three inputs, one output.
1 0 1 is the pattern at the input nodes, with 0 the target.
$w_1 = 0.5$, $w_2 = 0.2$, $w_3 = 0.8$
1*0.5 + 0*0.2 + 1*0.8 = 1.3 (actual output)
Assume $\theta = 1.0$ (threshold activation function). Then 1.3 > 1.0 and the perceptron outputs 1.
But the desired output is 0. Then:
$w_{new} = w_{old} + \eta \, (desired - actual) \cdot input$
Assume $\eta = 1$:
$w_{1,new} = 0.5 + 1 \cdot (0 - 1) \cdot 1 = -0.5$
$w_{2,new} = 0.2 + 1 \cdot (0 - 1) \cdot 0 = 0.2$
$w_{3,new} = 0.8 + 1 \cdot (0 - 1) \cdot 1 = -0.2$
Large $\eta$ can lead to ‘weight oscillation’.
Assume $\eta = 0.2$:
$w_{1,new} = 0.5 + 0.2 \cdot (0 - 1) \cdot 1 = 0.3$
$w_{2,new} = 0.2 + 0.2 \cdot (0 - 1) \cdot 0 = 0.2$
$w_{3,new} = 0.8 + 0.2 \cdot (0 - 1) \cdot 1 = 0.6$
Note how weights that are more to blame get a larger amount of change
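The two worked cases can be reproduced with a short Python sketch, using the weights, input pattern and learning rates from the example; `updated_weights` is an illustrative name and the rounding is only there to hide floating-point noise.

```python
def updated_weights(weights, inputs, desired, actual, eta):
    """Apply w_new = w_old + eta * (desired - actual) * input to every weight."""
    return [round(w + eta * (desired - actual) * x, 2) for w, x in zip(weights, inputs)]


weights = [0.5, 0.2, 0.8]
pattern = [1, 0, 1]

# A large learning rate flips the sign of the active weights in one step.
print(updated_weights(weights, pattern, desired=0, actual=1, eta=1.0))  # [-0.5, 0.2, -0.2]
# A small learning rate nudges them in the same direction.
print(updated_weights(weights, pattern, desired=0, actual=1, eta=0.2))  # [0.3, 0.2, 0.6]
```

In both cases the weight on the inactive input (the middle one) is left alone, since it carried no blame for the wrong output.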
Transfer Functions: The transfer function is usually the same for every unit in the same layer
There are various choices of transfer / activation function that determine what is output from a neuron:
Tanh: $f(x) = (e^{x} - e^{-x}) / (e^{x} + e^{-x})$
Logistic: $f(x) = e^{x} / (1 + e^{x})$
Threshold: $f(x) = 0$ if $x < 0$, $f(x) = 1$ if $x \ge 0$
Choice of transfer function for one output unit will depend on class information:
1. If 0 and 1, use Logistic or Threshold
2. If nonbinary, use Tanh (e.g. −1, 0, 1 for tripartite classification)
3. If N classes, a class is represented as (0, 0, 1, 0, ..., 0) at the output layer (i.e. as many output nodes as distinct class values)
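The three transfer functions and the N-class output encoding can be written out as a short Python sketch. It assumes the threshold function switches at 0; `one_hot` is a hypothetical helper name, with classes numbered from 0.

```python
import math


def tanh(x):
    """Tanh: output in the range (-1, 1)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


def logistic(x):
    """Logistic: output in the range (0, 1)."""
    return math.exp(x) / (1.0 + math.exp(x))


def threshold(x):
    """Threshold: 0 below zero, 1 otherwise (assumed switching point of 0)."""
    return 1 if x >= 0 else 0


def one_hot(class_index, n_classes):
    """Encode a class as (0, ..., 1, ..., 0), one output node per class."""
    return [1 if i == class_index else 0 for i in range(n_classes)]


print(logistic(0.0), tanh(0.0), threshold(-0.1))  # 0.5 0.0 0
print(one_hot(2, 5))  # [0, 0, 1, 0, 0]
```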
Why is back propagation important?
• Provides a procedure that allows networks to learn weights that can solve any deterministic input-output problem.
• Allows networks to learn how to represent information as well as how to use it.
• Raises questions about the nature of representations and of what must be specified in order to learn them.
• But back propagation pure and simple may be prone to the local minima problem
• This is because standard BP always seeks to reduce error through weight adjustment and gradient descent – error is not allowed to increase
Local Minima
• Advantages of back propagation
– Relatively simple implementation
– Standard method and generally works well
• Disadvantages of BP
– Slow and inefficient
– Can get stuck in local minima resulting in suboptimal solutions
[Figure: error surface with a local minimum. The learning rate specifies the step width of gradient descent. At the local minimum the weights are stuck through gradient descent (i.e. the error has reached a local minimum).]
Gradient descent must be amended to allow learning to leave flat spots
Enhancements To Back Propagation
• Momentum
– Adds a percentage of the last movement to the current movement
Without momentum, BP will fall back to the local minimum
With momentum and elimination of flat spots, BP will find the global minimum
The role of bias in FFBP ANNs
• If all inputs are 0 in a FFBP, then the output will be 0 irrespective of weights
• Bias unit lies in one layer and is connected to all neurons in next layer
The role of the bias units is to ensure that some value is input to the nodes at the next layer even if values are 0 from nodes in the previous layer. Bias units are usually set to output 1.
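A small Python sketch of this point, assuming a bias unit that always outputs 1 and is connected through its own weight; `net_input` and `bias_weight` are illustrative names.

```python
def net_input(weights, inputs, bias_weight=None):
    """Weighted sum arriving at a node, optionally including a bias unit that outputs 1."""
    total = sum(w * x for w, x in zip(weights, inputs))
    if bias_weight is not None:
        total += bias_weight * 1.0  # the bias unit's output is fixed at 1
    return total


# With all-zero inputs the weights have no influence on the node's input...
print(net_input([0.3, -0.8], [0, 0]))  # 0.0
print(net_input([5.0, -2.0], [0, 0]))  # 0.0
# ...unless a bias unit supplies a value that learning can adjust.
print(net_input([0.3, -0.8], [0, 0], bias_weight=0.7))  # 0.7
```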
Weight change and momentum
• The backpropagation algorithm often takes a long time to learn
• Momentum consists of adding a fraction of the old weight change – typically set at about 0.5
• The learning rule then looks like:
weight change = some small constant x error x input activation + momentum constant x old weight change
– $\Delta w(t) = \eta \cdot d + a \cdot \Delta w(t-1)$
• $\Delta w$ is the change in weight
• $\eta$ is the learning rate
• $d$ is the error x input activation
• $a$ is the momentum parameter
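A minimal Python sketch of the momentum rule, using the same symbols as the formula above. The values of d and the learning rate are made up for illustration; 0.5 is the typical momentum value mentioned above.

```python
def weight_change(eta, d, a, previous_change):
    """Momentum rule: new change = eta * d + a * previous change."""
    return eta * d + a * previous_change


# If the error term d keeps pointing the same way, each step grows,
# which is what lets learning speed up across flat regions.
change = 0.0
history = []
for _ in range(3):
    change = weight_change(eta=0.1, d=1.0, a=0.5, previous_change=change)
    history.append(round(change, 4))
print(history)  # [0.1, 0.15, 0.175]
```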
Batch Update
• With default BP update you update weights after every pattern
• With batch update you accumulate the changes for each weight, but do not update them until the end of each epoch
• Batch update gives a correct direction of the gradient for the entire data set, while default update could do some weight updates in directions quite different from the average gradient of the entire data set
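The difference between the two update schedules can be illustrated on a single perceptron rather than a full backpropagation network. This is a simplified sketch: the AND samples with a constant input of −1, the starting weights and the function names are all assumptions made for the example.

```python
def step(weights, inputs, threshold=0.0):
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0


def online_epoch(weights, samples, eta=0.1):
    """Default update: weights change immediately after every pattern."""
    weights = list(weights)
    for inputs, target in samples:
        error = target - step(weights, inputs)
        weights = [w + eta * error * x for w, x in zip(weights, inputs)]
    return [round(w, 2) for w in weights]


def batch_epoch(weights, samples, eta=0.1):
    """Batch update: accumulate changes against fixed weights, apply them at the end."""
    changes = [0.0] * len(weights)
    for inputs, target in samples:
        error = target - step(weights, inputs)  # always the start-of-epoch weights
        changes = [c + eta * error * x for c, x in zip(changes, inputs)]
    return [round(w + c, 2) for w, c in zip(weights, changes)]


AND_SAMPLES = [([-1, 0, 0], 0), ([-1, 0, 1], 0), ([-1, 1, 0], 0), ([-1, 1, 1], 1)]
start = [0.4, 0.7, -0.2]
print(online_epoch(start, AND_SAMPLES))  # [0.4, 0.7, -0.1]
print(batch_epoch(start, AND_SAMPLES))   # [0.5, 0.6, -0.2]
```

The results differ because, in the online case, the correction made for ‘1 0’ changes how ‘1 1’ is classified later in the same epoch.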
When to stop training the Network ?
• Ideally – when we reach the global minimum of the error surface:
– Stop if the decrease in total training error (since the last cycle) is small. Usually, sum of squared error (SSE) is used for this purpose (squared so that negative error values are converted to positive for summing)
– Stop if the overall changes in the weights (since the last cycle) are small.
• But the network thus obtained may have poor generalizing power on unseen data, i.e. the ANN has overfitted the data
• Overfitting means that the ANN has memorized the training data so that, when new data is presented, predictions are poor
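The SSE-based stopping test can be sketched as follows. The tolerance value and the function names are assumptions chosen for illustration, not values from the lecture.

```python
def sum_squared_error(targets, outputs):
    """SSE: squaring stops positive and negative errors from cancelling out."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs))


def should_stop(previous_sse, current_sse, tolerance=1e-4):
    """Stop when the change in total training error since the last cycle is small."""
    return abs(previous_sse - current_sse) < tolerance


# Errors of -1 and +1 would sum to 0 if they were not squared.
print(sum_squared_error([0, 1], [1, 0]))  # 2
print(should_stop(0.5, 0.49999))  # True
print(should_stop(0.5, 0.4))      # False
```

This test only looks at training error, so it cannot detect overfitting on its own.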
Example of non-training. Note how the error graph goes into a flat line.
An example of a neural network learning successfully.
Choice of Training Parameters
• Learning parameter and momentum: what should be the optimal values of these training parameters?
– No clear consensus on any fixed strategy.
– However, the effects of wrongly specifying them are well studied.
• Too big: large leaps in weight space, with a risk of missing the global minimum.
• Too small:
– Takes a long time to converge to the global minimum
– Once stuck in a local minimum, difficult to get out of
• Trial and error is still the only method known
Is Backprop biologically plausible?
• Neurons do not send error signals backward across their weights through a chain of neurons, as far as anyone can tell
• But does this matter?
• Some neurons appear to use error signals, and there are ways to use differences between activation signals to carry error information
Neural network models (summary)
Some of the most popular NN models are:
Perceptron, ADALINE, Multilayer Perceptron (MLP), Learning Vector Quantization (LVQ), Self-Organizing Map (SOM), Adaptive Resonance Theory (ART), Probabilistic Neural Network (PNN), General Regression Neural Network (GRNN), Bidirectional Associative Memory (BAM), Boltzmann Machine, Elman, Hamming, Support Vector Machines (SVM), Time Delay NN (TDNN), Recurrent Backpropagation, ARTMAP, Counterpropagation, Neocognitron (over 100 different types)
Connectionism versus strong AI
• Strong AI (and its use of the PSSH) is often called ‘computationalism’ – the human mind works purely through formal operations on symbols, like a computer program
• Connectionism is used to describe mental phenomena as the emergent processes of interconnected networks of simple units.
• There are many forms of connectionism, but the most common forms use neural network models.
• Hence, the debate is whether traditional programming or neural networks explain the mind better
• That is, does your mind work like a high-level computer program (or programs), or does your mind work like a neural network?
Practical Differences
• A traditional program need have no knowledge of the hardware on which it is run
• A neural network is totally dependent on the architecture and is therefore ‘hardware dependent’
• You can write a traditional program without paying attention to the hardware
• To solve a problem on a neural network, you have to experiment with different architectures and parameters (hardware)
Philosophical differences
• A computer program contains explicit rules for manipulating symbols, e.g.
If x>y then a=1 else a=2
• A neural network represents symbols (‘x’, ‘y’, etc.) as a distributed pattern (typically a feature vector), e.g. 001100 for a, 110011 for b
• A neural network doesn’t use explicit rules but weights on connections (and threshold functions in units) to perform tasks
Operation
• A traditional program takes input, transforms the input through rules and produces output
• A neural network uses spreading activation that represents a probability that a neuron generates an action potential spike (fires) that spreads to other units connected to it
Explaining the mind
• Computationalism explains the mind as a suite of software that takes incoming symbols, performs mathematical and logical operations on those symbols to produce other symbols, and outputs a desired result that is ‘correct’
• Connectionism explains the mind as an n-dimensional vector of numeric activation values over neural units in a network – no symbols exist
Explaining learning
• Computationalism explains learning as the application of symbolic formulae to data to extract important features or derive new conclusions
• Connectionism explains learning as modifying weights on connections
Where do you stand?
• Do you think that your mind works like a computer (computationalism, strong AI) or like a neural network?
• Remember, it is the mind we are dealing with, not the brain
• Even connectionists accept that neural networks are, at best, an approximation to real brains
• If you want to work at the brain level, you need to know about biochemistry
Summary
• You have been introduced to the two main views in AI concerning intelligence and how to build intelligence into a machine:
– Write increasingly sophisticated software
– Construct increasingly sophisticated neural networks
• The debate is not just about machine intelligence but about ourselves:
– Are we von Neumann computers (like your desktop, Apple iPad)?
– Are we neural networks (no real neural network computers have yet been built)?
Exercise: Update Perceptron weights
When ‘1 1’ is presented:
For AND:
A  B  Output
0  0  0
0  1  0
1  0  0
1  1  1
Using the perceptron learning rule and a learning rate of 0.1, show how the weights are changed for subsequent presentations of ‘1 1’. How many presentations does it take before the perceptron produces the correct output for ‘1 1’? Then test your weights on ‘0 1’, ‘0 0’ and ‘1 0’.
Solution to Exercise: ‘1 1’
Second presentation:
Error = (target output – actual output) = 1
Weight change = learning rate x error x input
Weight change for w1 = 0.1 x 1 x −1 = −0.1 (w1 = 0.4 − 0.1 = 0.3)
Weight change for w2 = 0.1 x 1 x 1 = 0.1 (w2 = 0.4 + 0.1 = 0.5)
Weight change for w3 = 0.1 x 1 x 1 = 0.1 (w3 = −0.1 + 0.1 = 0)
Test your weights on ‘0 1’, ‘0 0’ and ‘1 0’
I1  I2  I3  Summation                              Output
−1  0   1   (−1*0.3) + (0*0.5) + (1*0) = −0.3     0
That is, the weights work for ‘0 1’.
For ‘0 0’:
For ‘1 0’:
I1  I2  I3  Summation                              Output
−1  1   0   (−1*0.3) + (1*0.5) + (0*0) = 0.2      1