Sudeshna Sarkar
IIT Kharagpur
Single layer Perceptron
• Single-layer perceptrons learn linear decision boundaries.
[Figure: two scatter plots in the (x1, x2) plane. Left: points of class I (x, y = 1) and class II (o, y = -1) separated by a straight line. Right: an XOR-like arrangement of the two classes that no straight line can separate.]
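As an illustrative sketch (not from the slides), the perceptron learning rule trains a threshold unit to find such a linear boundary on a small separable set; on XOR-like data the same loop would never reach zero errors.

```python
# Perceptron learning rule on a linearly separable set (illustrative data).

def predict(w, x):
    # Threshold unit: w[0] is the bias weight (bias input fixed at 1).
    s = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if s > 0 else -1

def train_perceptron(data, eta=0.1, epochs=100):
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        errors = 0
        for x, y in data:
            if predict(w, x) != y:
                # Perceptron update: move the boundary toward the misclassified point.
                w[0] += eta * y
                w[1] += eta * y * x[0]
                w[2] += eta * y * x[1]
                errors += 1
        if errors == 0:   # converged: all examples classified correctly
            break
    return w

# Class I (y = 1) vs class II (y = -1), linearly separable.
data = [((2, 2), 1), ((3, 1), 1), ((-1, -2), -1), ((-2, 0), -1)]
w = train_perceptron(data)
assert all(predict(w, x) == y for x, y in data)
```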
Boolean OR
A threshold unit with bias weight w0 = -0.5 and input weights w1 = w2 = 1 computes OR (output 1 iff w0 + w1 x1 + w2 x2 > 0):

  x1  x2  output
   0   0    0
   0   1    1
   1   0    1
   1   1    1

[Figure: the OR points in the (x1, x2) plane; the single negative point at (0, 0) is cut off by a straight line.]
Boolean AND
A threshold unit with bias weight w0 = -1.5 and input weights w1 = w2 = 1 computes AND:

  x1  x2  output
   0   0    0
   0   1    0
   1   0    0
   1   1    1

[Figure: the AND points in the (x1, x2) plane; the single positive point at (1, 1) is cut off by a straight line.]
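The OR and AND units can be checked directly. The sketch below uses the slides' weights (w0 = -0.5 for OR, w0 = -1.5 for AND, w1 = w2 = 1) with a step activation:

```python
# Threshold units with the weights from the slides: output 1 iff
# w0 + w1*x1 + w2*x2 > 0.

def threshold_unit(w0, w1, w2):
    return lambda x1, x2: 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0

OR = threshold_unit(-0.5, 1, 1)
AND = threshold_unit(-1.5, 1, 1)

# Both units reproduce their truth tables.
assert [OR(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 1]
assert [AND(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 0, 0, 1]
```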
Boolean XOR
XOR is not linearly separable: no single threshold unit reproduces its truth table.

  x1  x2  output
   0   0    0
   0   1    1
   1   0    1
   1   1    0

[Figure: the XOR points in the (x1, x2) plane; the + points at (0, 1) and (1, 0) and the - points at (0, 0) and (1, 1) cannot be separated by one straight line.]
Boolean XOR
A two-layer network computes XOR: hidden unit h1 computes OR of x1, x2 (bias -0.5, weights 1, 1), hidden unit h2 computes AND (bias -1.5, weights 1, 1), and the output unit (bias -0.5) combines them with weight +1 from h1 and weight -1 from h2, i.e. XOR = OR AND NOT AND.

[Figure: the two-layer network with inputs x1, x2, hidden units h1 (OR) and h2 (AND), and the output unit with the weights above.]
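The two-layer XOR network can be sketched with the weights on the slide (hidden OR and AND units feeding an output unit with weights +1 and -1 and bias -0.5):

```python
# Two-layer network computing XOR with the weights from the slide.

def step(s):
    return 1 if s > 0 else 0

def xor_net(x1, x2):
    h1 = step(-0.5 + x1 + x2)      # hidden unit 1: OR
    h2 = step(-1.5 + x1 + x2)      # hidden unit 2: AND
    return step(-0.5 + h1 - h2)    # output: OR AND NOT AND

assert [xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 0]
```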
Representation Capability of NNs
• Single-layer nets have limited representational power (the linear separability problem). Multi-layer nets (or nets with non-linear hidden units) can overcome the linear inseparability problem.
• Every Boolean function can be represented by a network with a single hidden layer.
• Every bounded continuous function can be approximated with arbitrarily small error by a network with one hidden layer.
• Any function can be approximated to arbitrary accuracy by a network with two hidden layers.
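The Boolean-function claim can be made concrete with a standard construction (a sketch, not from the slides): one hidden threshold unit per input row where the function is 1, each firing only on its own row, ORed together at the output. The parity function below is an arbitrary example.

```python
# Any Boolean function as a single-hidden-layer threshold network:
# one hidden unit per positive row of the truth table, plus an OR output unit.
from itertools import product

def step(s):
    return 1 if s > 0 else 0

def boolean_net(truth_table):
    # truth_table: dict mapping input tuples of 0/1 to 0/1.
    positive_rows = [x for x, v in truth_table.items() if v == 1]

    def net(x):
        hidden = []
        for row in positive_rows:
            # Weight +1 where the row has a 1, -1 where it has a 0; the unit
            # reaches sum(row) only when x equals the row exactly.
            s = sum((1 if r else -1) * xi for r, xi in zip(row, x))
            hidden.append(step(s - (sum(row) - 0.5)))
        return step(sum(hidden) - 0.5)   # OR of all hidden units
    return net

# Example: 3-input parity (XOR of three bits), an arbitrary choice.
f = {x: (x[0] ^ x[1] ^ x[2]) for x in product((0, 1), repeat=3)}
net = boolean_net(f)
assert all(net(x) == f[x] for x in f)
```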
Multilayer Network
[Figure: a feedforward network with inputs, a first hidden layer, a second hidden layer, and an output layer.]
Two-layer back-propagation neural network
[Figure: input signals x1, ..., xn flow forward through the input layer, the hidden layer (weights wij), and the output layer (weights wjk) to outputs y1, ..., yn2; error signals propagate backward through the same connections.]
Derivation
• For one output neuron, the error function is
  $E = \frac{1}{2}(y - o)^2$
• For each unit $j$, the output $o_j$ is defined as
  $o_j = \varphi(net_j) = \varphi\left(\sum_{k=1}^{n} w_{kj} o_k\right)$
  The input $net_j$ to a neuron is the weighted sum of the outputs $o_k$ of the previous $n$ neurons.
• Finding the derivative of the error:
  $\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial o_j} \frac{\partial o_j}{\partial net_j} \frac{\partial net_j}{\partial w_{ij}}$
Expanding the chain rule for a hidden unit $j$ (summing over the units $l$ that receive input from $j$, and using $\varphi' = \varphi(1 - \varphi)$ for the logistic activation):

  $\frac{\partial E}{\partial w_{ij}} = \left( \sum_{l} \frac{\partial E}{\partial o_l} \frac{\partial o_l}{\partial net_l} w_{jl} \right) \varphi(net_j)\left(1 - \varphi(net_j)\right) o_i$

so that

  $\frac{\partial E}{\partial w_{ij}} = \delta_j o_i$

with

  $\delta_j = \frac{\partial E}{\partial o_j} \frac{\partial o_j}{\partial net_j} = \begin{cases} (o_j - y_j)\, o_j (1 - o_j) & \text{if } j \text{ is an output neuron} \\ \left( \sum_{l} \delta_l w_{jl} \right) o_j (1 - o_j) & \text{if } j \text{ is an inner neuron} \end{cases}$

To update the weight $w_{ij}$ using gradient descent, one must choose a learning rate $\eta$:

  $\Delta w_{ij} = -\eta \frac{\partial E}{\partial w_{ij}}$
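The derived rule ∂E/∂w_ij = δ_j o_i can be sanity-checked numerically for a single sigmoid output unit; the inputs, weights, and target below are made up for illustration.

```python
# Numerical check of the derived gradient dE/dw_i = delta * o_i for one
# sigmoid output neuron with E = 0.5 * (y - o)^2.
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def forward(w, x):
    # One sigmoid unit: net = sum_i w_i * x_i (x[0] = 1 acts as the bias input).
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

x = [1.0, 0.5, -0.3]     # bias input plus two feature values (made up)
w = [0.1, -0.2, 0.4]
y = 1.0                  # target

o = forward(w, x)
delta = (o - y) * o * (1 - o)           # delta for an output neuron
analytic = [delta * xi for xi in x]     # dE/dw_i = delta * o_i (here o_i = x_i)

def E(wv):
    return 0.5 * (y - forward(wv, x)) ** 2

eps = 1e-6
for i in range(len(w)):
    w_plus = list(w);  w_plus[i] += eps
    w_minus = list(w); w_minus[i] -= eps
    numeric = (E(w_plus) - E(w_minus)) / (2 * eps)
    # The central difference should match the analytic gradient closely.
    assert abs(numeric - analytic[i]) < 1e-6
```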
Backpropagation Algorithm
Initialize all weights to small random numbers.
Until satisfied, do
– For each training example, do
  • Input the training example to the network and compute the network outputs
  • For each output unit 𝑘:
    𝛿𝑘 ← 𝑜𝑘 (1 − 𝑜𝑘)(𝑦𝑘 − 𝑜𝑘)
  • For each hidden unit ℎ:
    𝛿ℎ ← 𝑜ℎ (1 − 𝑜ℎ) Σ𝑘∈outputs 𝑤ℎ𝑘 𝛿𝑘
  • Update each network weight 𝑤𝑖𝑗:
    𝑤𝑖𝑗 ← 𝑤𝑖𝑗 + 𝜂 𝛿𝑗 𝑥𝑖𝑗, where 𝑥𝑖𝑗 is the input from unit 𝑖 to unit 𝑗
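The loop above can be sketched for XOR with one hidden layer of sigmoid units; the layer sizes, learning rate, epoch count, and seed below are illustrative choices, not from the slides.

```python
# Stochastic-gradient backpropagation on XOR with one hidden sigmoid layer.
import math
import random

random.seed(0)

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

N_IN, N_HID = 2, 3
# Each weight vector stores the bias weight at index 0 (bias input fixed at 1).
w_hid = [[random.uniform(-0.5, 0.5) for _ in range(N_IN + 1)] for _ in range(N_HID)]
w_out = [random.uniform(-0.5, 0.5) for _ in range(N_HID + 1)]

def forward(x):
    h = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) for w in w_hid]
    o = sigmoid(w_out[0] + sum(wi * hi for wi, hi in zip(w_out[1:], h)))
    return h, o

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def total_error():
    return sum(0.5 * (y - forward(x)[1]) ** 2 for x, y in data)

e0 = total_error()
eta = 0.5
for _ in range(10000):
    for x, y in data:
        h, o = forward(x)
        delta_o = o * (1 - o) * (y - o)                      # output unit
        delta_h = [h[j] * (1 - h[j]) * w_out[j + 1] * delta_o
                   for j in range(N_HID)]                    # hidden units
        # Weight updates: w <- w + eta * delta * input.
        w_out[0] += eta * delta_o
        for j in range(N_HID):
            w_out[j + 1] += eta * delta_o * h[j]
            w_hid[j][0] += eta * delta_h[j]
            for i in range(N_IN):
                w_hid[j][i + 1] += eta * delta_h[j] * x[i]
e1 = total_error()
assert e1 < e0   # training error decreased from its initial value
```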
Stopping
1. Fixed maximum number of epochs: the most naïve criterion.
2. Keep track of the training and validation error curves; stop when the validation error starts to rise.
Overfitting in ANNs
Local Minima