
Programming in MATLAB

Chapter 3: Multi Layer Perceptron Gp.Capt.Thanapant Raicharoen, PhD

Outline

- Limitation of the single-layer perceptron
- Multilayer perceptron (MLP)
- Backpropagation algorithm
- MLP for non-linearly separable classification problems
- MLP for function approximation problems

Artificial Neural Network
Gp.Capt.Thanapant Raicharoen, PhD

Limitation of the Perceptron (XOR Function)

No.  P1  P2  Output/Target
1    0   0   0
2    0   1   1
3    1   0   1
4    1   1   0
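Why no single-layer perceptron can realize this table (a short argument added here for completeness): a hard-limit unit o = step(w1*x1 + w2*x2 + theta) would need weights satisfying all four rows at once:

```latex
\begin{aligned}
w_1\cdot 0 + w_2\cdot 0 + \theta &\le 0 &&\text{(row 1, output 0)}\\
w_1\cdot 0 + w_2\cdot 1 + \theta &> 0 &&\text{(row 2, output 1)}\\
w_1\cdot 1 + w_2\cdot 0 + \theta &> 0 &&\text{(row 3, output 1)}\\
w_1\cdot 1 + w_2\cdot 1 + \theta &\le 0 &&\text{(row 4, output 0)}
\end{aligned}
```

Adding rows 2 and 3 gives w1 + w2 + 2*theta > 0; row 1 forces theta <= 0, so w1 + w2 + theta >= w1 + w2 + 2*theta > 0, contradicting row 4. XOR is therefore not linearly separable.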


Multilayer Feedforward Network Structure

[Figure: a feedforward network with input nodes x1, x2, x3, a layer of hidden nodes, and an output node o1, fully connected by weights.]

Notation:
- y_i^(h) = output of node i in layer h (h = layer number)
- w_(i,j)^(h) = weight connecting node i of layer h to node j of layer h-1

Output of each node

y_i^(h) = f( w_(i,1)^(h) y_1^(h-1) + w_(i,2)^(h) y_2^(h-1) + ... + w_(i,m)^(h) y_m^(h-1) + theta_i^(h) )
        = f( sum_j w_(i,j)^(h) y_j^(h-1) + theta_i^(h) )

where y_j^(0) = x_j (input j) and y_i^(N) = o_i (output i).
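In code, the per-node formula is just a weighted sum plus bias followed by the activation. A minimal NumPy sketch (not from the slides; names and values are illustrative):

```python
import numpy as np

def layer_forward(W, theta, y_prev, f):
    """Outputs of one layer: y_i = f(sum_j W[i,j]*y_prev[j] + theta[i])."""
    return f(W @ y_prev + theta)

# One node with identity activation summing three inputs with unit weights:
W = np.array([[1.0, 1.0, 1.0]])
theta = np.array([0.5])
x = np.array([1.0, 2.0, 3.0])
y = layer_forward(W, theta, x, lambda h: h)
print(y)  # [6.5]
```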

Multilayer Perceptron: How it works

Function XOR:

x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   0

[Figure: a 2-2-1 network with inputs x1, x2, hidden nodes y1, y2, and output o.]

Layer 1:
y1 = f( w_(1,1)^(1) x1 + w_(1,2)^(1) x2 + theta_1^(1) )
y2 = f( w_(2,1)^(1) x1 + w_(2,2)^(1) x2 + theta_2^(1) )

Layer 2:
o = f( w_(1,1)^(2) y1 + w_(1,2)^(2) y2 + theta_1^(2) )

f( ) = activation (or transfer) function.
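With a hard-limit activation and one possible set of hand-picked weights (illustrative values, not taken from the slides), the 2-2-1 structure above reproduces XOR: the first hidden node acts like OR, the second like AND, and the output fires when OR is true but AND is not.

```python
import numpy as np

def step(h):
    return (h > 0).astype(float)  # hard-limit activation

# Hidden layer: y1 ~ OR(x1,x2), y2 ~ AND(x1,x2); output: y1 AND NOT y2
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
theta1 = np.array([-0.5, -1.5])
W2 = np.array([[1.0, -1.0]])
theta2 = np.array([-0.5])

outputs = []
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    y = step(W1 @ np.array([x1, x2], dtype=float) + theta1)
    o = step(W2 @ y + theta2)
    outputs.append(int(o[0]))
print(outputs)  # [0, 1, 1, 0] -> the XOR column of the truth table
```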

Multilayer Perceptron: How it works (cont.)

Outputs at layer 1:

x1  x2  y1  y2
0   0   0   0
0   1   1   0
1   0   1   0
1   1   1   1

[Figure: the four input points (0,0), (0,1), (1,0), (1,1) in the x1-x2 plane, cut by two decision lines.]

Line L1: w_(1,1)^(1) x1 + w_(1,2)^(1) x2 + theta_1^(1) = 0
Line L2: w_(2,1)^(1) x1 + w_(2,2)^(1) x2 + theta_2^(1) = 0

Multilayer Perceptron: How it works (cont.)

Inside layer 1:

[Figure, left: the x1-x2 space with class 0 at (0,0) and (1,1), class 1 at (0,1) and (1,0), and the two lines L1 and L2. Right: the y1-y2 space, where the mapped points (0,0), (1,0), and (1,1) are linearly separable!]

Line L1: w_(1,1)^(1) x1 + w_(1,2)^(1) x2 + theta_1^(1) = 0
Line L2: w_(2,1)^(1) x1 + w_(2,2)^(1) x2 + theta_2^(1) = 0

Multilayer Perceptron: How it works (cont.)

Inside the output layer:

[Figure: the y1-y2 space with class 1 at (1,0) and class 0 at (0,0) and (1,1), separated by line L3; the output node o combines y1 and y2 through weights w_(1,1)^(2) and w_(1,2)^(2).]

Line L3: w_(1,1)^(2) y1 + w_(1,2)^(2) y2 + theta_1^(2) = 0

In the y1-y2 space the classes are linearly separable. Therefore the line L3 can classify (separate) class 0 and class 1.

Multilayer Perceptron: How it works (cont.)

How hidden layers work:
- The hidden layers try to map the data into a linearly separable representation before transferring it to the output layer.
- After the last hidden layer, the data should be linearly separable.
- More than one hidden layer may be needed to map the data into a linearly separable form.
- In general, the activation function of a layer need not be a hard-limit (thresholding) function, and the layers need not all use the same function.

How can we adjust weights?

Assume we have the function y = x1 + 2*x2, and we want to use a single-layer perceptron to approximate it.

[Figure: inputs x1 and x2, with weights w1 and w2, feeding one output node producing y_hat.]

The output is:

y_hat = w1*x1 + w2*x2

In this case the activation function is the identity (linear) function f(x) = x. We need to adjust w1 and w2 so that y_hat becomes close (or equal) to y.

Delta Learning Rule (Widrow-Hoff Rule)

Consider the mean square error (MSE):

e^2 = <(y - y_hat)^2> = <(y - w1*x1 - w2*x2)^2>

where < > means average.

e^2 is a function of w1 and w2, as seen in the graph below. This graph is called the error surface (a parabola).

[Figure: MSE plotted as a surface over the w1-w2 plane.]

Delta Learning Rule (Widrow-Hoff Rule)

Mean square error e^2 as a function of w1 and w2:

[Figure: contour plot of the error surface over the w1-w2 plane.]

The minimum point is (w1, w2) = (1, 2), because there MSE = 0. Therefore, w1 and w2 must be adjusted in order to reach the minimum point of this error surface.

Delta Learning Rule (Widrow-Hoff Rule)

w1 and w2 are adjusted toward the minimum point like this:

[Figure: contour plot of the error surface showing successive adjustments No. 1, 2, 3, ..., k moving from the initial values of (w1, w2) toward the target minimum.]

Gradient Descent Method

What is the direction of steepest descent, i.e. in what direction will the function decrease most rapidly?

1. Calculate the gradient of the error surface at the current position (w1, w2); the gradient points in the direction of steepest ascent (uphill).
2. Step in the direction opposite to the gradient (adjust w1, w2).
3. Go to step 1 until the minimum point is reached.
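The three steps above can be sketched directly for the earlier example y = x1 + 2*x2 (a NumPy sketch; the data, learning rate, and iteration count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))          # training inputs (x1, x2)
y = X[:, 0] + 2 * X[:, 1]              # targets from y = x1 + 2*x2

w = np.zeros(2)                        # initial weights (w1, w2)
eta = 0.1                              # learning rate
for _ in range(200):
    e = y - X @ w                      # errors y - y_hat
    grad = -2 * X.T @ e / len(y)       # gradient of the MSE
    w = w - eta * grad                 # step opposite to the gradient
print(w)  # close to [1. 2.], the minimum of the error surface
```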

Backpropagation Algorithm

2-layer case:

Output layer:  o_k = f( sum_j w_(k,j)^(2) y_j + theta_k^(2) ) = f( h_k^(2) )

Hidden layer:  y_j = f( sum_i w_(j,i)^(1) x_i + theta_j^(1) ) = f( h_j^(1) )

Input layer:   x_i

where h_m^(n) = weighted sum of the inputs of node m in layer n.

Backpropagation Algorithm (cont.)

2-layer case. Writing d_k for the desired (target) output of node k:

e^2 = sum_k (d_k - o_k)^2                                                                        (2.1)
    = sum_k (d_k - f( sum_j w_(k,j)^(2) y_j + theta_k^(2) ))^2                                    (2.2)
    = sum_k (d_k - f( sum_j w_(k,j)^(2) f( sum_i w_(j,i)^(1) x_i + theta_j^(1) ) + theta_k^(2) ))^2   (2.3)

The derivative of e^2 with respect to w_(k,j)^(2):

d(e^2)/d(w_(k,j)^(2)) = -2 (d_k - o_k) f'(h_k^(2)) y_j

The derivative of e^2 with respect to theta_k^(2):

d(e^2)/d(theta_k^(2)) = -2 (d_k - o_k) f'(h_k^(2))

Backpropagation Algorithm (cont.)

2-layer case, continuing from (2.2) and (2.3). The derivative of e^2 with respect to w_(j,i)^(1):

d(e^2)/d(w_(j,i)^(1)) = -2 sum_k (d_k - o_k) f'( sum_j w_(k,j)^(2) y_j + theta_k^(2) ) w_(k,j)^(2) f'( sum_i w_(j,i)^(1) x_i + theta_j^(1) ) x_i
                      = -2 sum_k (d_k - o_k) f'(h_k^(2)) w_(k,j)^(2) f'(h_j^(1)) x_i

Backpropagation Algorithm (cont.)

Taking the derivative of e^2 with respect to w_(j,i)^(1) adjusts the weight connecting node j of the current layer (layer 1) to node i of the lower layer (layer 0):

d(e^2)/d(w_(j,i)^(1)) = -2 sum_k (d_k - o_k) f'(h_k^(2)) w_(k,j)^(2) f'(h_j^(1)) x_i

Term by term:
- (d_k - o_k): error from upper node k
- f'(h_k^(2)): derivative of upper node k
- w_(k,j)^(2): weight between upper node k and node j of the current layer
- f'(h_j^(1)): derivative of node j of the current layer
- x_i: input from lower node i

The sum over k is the back propagation of the error to node j of the current layer.

Backpropagation Algorithm (cont.)

Derivative of e^2 with respect to w_(k,j)^(2):

d(e^2)/d(w_(k,j)^(2)) = -2 (d_k - o_k) f'(h_k^(2)) y_j

(error at current node) x (derivative of current node) x (input from lower node)

Derivative of e^2 with respect to w_(j,i)^(1):

d(e^2)/d(w_(j,i)^(1)) = -2 sum_k (d_k - o_k) f'(h_k^(2)) w_(k,j)^(2) f'(h_j^(1)) x_i
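The two derivative formulas can be checked numerically. A NumPy sketch for a 2-layer net with tanh hidden units and a linear output node (illustrative names and values, not the slides' code):

```python
import numpy as np

def forward(W1, b1, W2, b2, x):
    h1 = W1 @ x + b1
    y = np.tanh(h1)           # hidden-layer outputs y_j = f(h_j)
    o = W2 @ y + b2           # linear output node, f'(h) = 1
    return h1, y, o

def loss(W1, b1, W2, b2, x, d):
    return float(np.sum((d - forward(W1, b1, W2, b2, x)[2]) ** 2))

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(3, 2)), rng.normal(size=3)
W2, b2 = rng.normal(size=(1, 3)), rng.normal(size=1)
x, d = np.array([0.5, -0.3]), np.array([0.7])

h1, y, o = forward(W1, b1, W2, b2, x)
delta2 = -2 * (d - o)                           # -2(d_k - o_k), output f' = 1
gW2 = np.outer(delta2, y)                       # d(e^2)/dW2 = -2(d-o) y_j
delta1 = (W2.T @ delta2).ravel() * (1 - y**2)   # back-propagated error times tanh'
gW1 = np.outer(delta1, x)                       # d(e^2)/dW1

# Numerical check of one entry of gW1 by central differences:
eps = 1e-6
Wp = W1.copy(); Wp[0, 0] += eps
Wm = W1.copy(); Wm[0, 0] -= eps
num = (loss(Wp, b1, W2, b2, x, d) - loss(Wm, b1, W2, b2, x, d)) / (2 * eps)
print(abs(num - gW1[0, 0]) < 1e-6)  # True
```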

Updating Weights: Gradient Descent Method

Delta w_(j,i)^(n) = -eta * d(e^2)/d(w_(j,i)^(n)) = eta * delta_j^(n) f'(h_j^(n)) x_i^(n-1)
Delta theta_j^(n) = -eta * d(e^2)/d(theta_j^(n)) = eta * delta_j^(n) f'(h_j^(n))

where eta is the learning rate and delta_j^(n) is the error back-propagated to node j of layer n.

Updating weights and biases:

w_(j,i)^(n)(new) = w_(j,i)^(n)(old) + Delta w_(j,i)^(n)
theta_j^(n)(new) = theta_j^(n)(old) + Delta theta_j^(n)

Adjusting Weights for a Nonlinear Function (Unit)

Calculating f' in the case of a nonlinear unit:

1. Sigmoid function

f(x) = 1 / (1 + e^(-2x))

We get f'(x) = 2 f(x) (1 - f(x)). This special form, expressing f' through f itself, makes f' easy to calculate.

2. Function tanh(x)

f(x) = tanh(x)

We get f'(x) = 1 - f(x)^2.
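Both identities are easy to verify numerically with central differences (a Python sketch, not from the slides):

```python
import numpy as np

def sigmoid2(x):
    return 1.0 / (1.0 + np.exp(-2.0 * x))   # f(x) = 1/(1 + e^(-2x))

x = np.linspace(-2, 2, 9)
eps = 1e-6

# f'(x) = 2 f(x) (1 - f(x)) for the sigmoid above
num_sig = (sigmoid2(x + eps) - sigmoid2(x - eps)) / (2 * eps)
ok_sig = np.allclose(num_sig, 2 * sigmoid2(x) * (1 - sigmoid2(x)), atol=1e-8)
print(ok_sig)  # True

# f'(x) = 1 - f(x)^2 for f(x) = tanh(x)
num_tanh = (np.tanh(x + eps) - np.tanh(x - eps)) / (2 * eps)
ok_tanh = np.allclose(num_tanh, 1 - np.tanh(x) ** 2, atol=1e-8)
print(ok_tanh)  # True
```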

Backpropagation Calculation Demonstration


Example: Application of MLP for classification

Example: Run_XOR_MLP_Newff.m

% Run_XOR_MLP_Newff.m
P = [0 0 1 1; 0 1 0 1];      % XOR function inputs
T = [0 1 1 0];               % targets
plotpv(P,T,[-1, 2, -1, 2]);  % plot data
PR = [min(P(1,:)) max(P(1,:)); min(P(2,:)) max(P(2,:))];
S1 = 2; S2 = 1;
TF1 = 'logsig'; TF2 = 'logsig';
PF = 'mse';
net = newff(PR,[S1 S2],{TF1 TF2});
net.trainParam.epochs = 100;
net.trainParam.goal = 0.001;
net = train(net,P,T);
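For readers without the Neural Network Toolbox, the same experiment can be sketched in plain NumPy: a 2-2-1 network with logistic activations trained by the backpropagation rule derived earlier (illustrative hyperparameters; final accuracy depends on the random initialization):

```python
import numpy as np

def logsig(h):
    return 1.0 / (1.0 + np.exp(-h))

P = np.array([[0, 0, 1, 1], [0, 1, 0, 1]], dtype=float)  # one column per pattern
T = np.array([[0, 1, 1, 0]], dtype=float)                # XOR targets

rng = np.random.default_rng(42)
W1, b1 = rng.normal(size=(2, 2)), np.zeros((2, 1))
W2, b2 = rng.normal(size=(1, 2)), np.zeros((1, 1))
eta = 0.5                                                # learning rate

def mse():
    y = logsig(W1 @ P + b1)
    o = logsig(W2 @ y + b2)
    return float(np.mean((T - o) ** 2))

loss0 = mse()
for _ in range(5000):
    y = logsig(W1 @ P + b1)                 # hidden layer
    o = logsig(W2 @ y + b2)                 # output layer
    d2 = (T - o) * o * (1 - o)              # output delta (error times logsig')
    d1 = (W2.T @ d2) * y * (1 - y)          # back-propagated hidden delta
    W2 += eta * d2 @ y.T; b2 += eta * d2.sum(axis=1, keepdims=True)
    W1 += eta * d1 @ P.T; b1 += eta * d1.sum(axis=1, keepdims=True)
print(mse() < loss0)  # True: training reduces the error
```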

Example: Application of MLP for classification

Example: Run_MLP_Random.m

Matlab commands: Create training data

x = randn([2 200]);
o = (x(1,:).^2 + x(2,:).^2) < 1;

The input patterns x1 and x2 are generated from random numbers. Desired output o: if (x1,x2) lies in the circle of radius 1 centered at the origin, then o = 1, else o = 0.

[Figure: scatter plot of the data in the x1-x2 plane (-3 to 3 on both axes), class 1 inside the circle and class 0 outside.]
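The MATLAB data generation above has a direct NumPy analogue (illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 200))          # analogue of randn([2 200])
o = (x[0, :]**2 + x[1, :]**2) < 1      # True inside the unit circle, else False
print(o.sum(), "of 200 points fall in class 1")
```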

Example: Application of MLP for classification (cont.)

Matlab commands: Create a 2-layer network

PR = [min(x(1,:)) max(x(1,:)); min(x(2,:)) max(x(2,:))];  % range of inputs
S1 = 10; S2 = 1;                 % no. of nodes in layers 1 and 2
TF1 = 'logsig'; TF2 = 'logsig';  % activation functions of layers 1 and 2
BTF = 'traingd';                 % training function
BLF = 'learngd';                 % learning function
PF = 'mse';                      % cost function
net = newff(PR,[S1 S2],{TF1 TF2},BTF,BLF,PF);  % create the network

Example: Application of MLP for classification (cont.)

Matlab commands: Train the network

net.trainParam.epochs = 2000;  % no. of training rounds
net.trainParam.goal = 0.002;   % maximum desired error
net = train(net,x,o);          % training command
y = sim(net,x);                % compute network outputs (continuous)
netout = y > 0.5;              % convert to binary outputs

Example: Application of MLP for classification (cont.)

Network structure:

[Figure: inputs x1 and x2 feed 10 hidden nodes (sigmoid), which feed one output node (sigmoid), followed by a threshold unit for binary output.]

Example: Application of MLP for classification (cont.)

[Figure: the training data with the initial weights of the 10 hidden-layer nodes displayed as lines w1*x1 + w2*x2 + theta = 0.]

Example: Application of MLP for classification (cont.)

Training algorithm: gradient descent method

[Figure: MSE vs. training epochs; after 20000 epochs the performance is 0.151511, far from the goal of 0.002.]

Example: Application of MLP for classification (cont.)

Results obtained using the gradient descent method:

[Figure: resulting decision boundary over the data; many points are misclassified.]

Classification error: 40/200

Example: Application of MLP for classification (cont.)

Training algorithm: Levenberg-Marquardt backpropagation

[Figure: MSE vs. training epochs; the performance reaches 0.00172594, meeting the 0.002 goal within only 10 epochs!]

Example: Application of MLP for classification (cont.)

Results obtained using Levenberg-Marquardt backpropagation:

[Figure: resulting decision boundary over the data; one hidden node is unused.]

Only 6 hidden nodes are adequate! Classification error: 0/200

Example: Application of MLP for classification (cont.)

Summary: MLP for classification problems
- Each node of the lower (hidden) layers creates a local decision boundary.
- The nodes of the upper layers combine all local decision boundaries into a global decision boundary.

Example: Application of MLP for function approximation

Example: Run_MLP_SinFunction.m

Matlab commands: Create a 2-layer network

PR = [min(x) max(x)];             % range of inputs
S1 = 6; S2 = 1;                   % no. of nodes in layers 1 and 2
TF1 = 'logsig'; TF2 = 'purelin';  % activation functions of layers 1 and 2
BTF = 'trainlm';                  % training function
BLF = 'learngd';                  % learning function
PF = 'mse';                       % cost function
net = newff(PR,[S1 S2],{TF1 TF2},BTF,BLF,PF);  % create the network

Example: Application of MLP for function approximation

Network structure:

[Figure: input node x feeds 6 hidden nodes (sigmoid), which feed one output node (linear) producing y.]

Example: Application of MLP for function approximation

Example: Run_MLP_SinFunction.m

% Run_MLP_SinFunction.m
p = 0:0.25:5;
t = sin(p);
figure; plot(p,t,'+b');
axis([-0.5 5.5 -1.5 1.5]);
net = newff([0 10],[6,1],{'logsig','purelin'},'trainlm');
net.trainParam.epochs = 50;
net.trainParam.goal = 0.01;
net = train(net,p,t);
a = sim(net,p);
hold on; plot(p,a,'.r');

Example: Application of MLP for function approximation

Matlab commands: Create a 2-layer network

PR = [min(x) max(x)];             % range of inputs
S1 = 3; S2 = 1;                   % no. of nodes in layers 1 and 2
TF1 = 'logsig'; TF2 = 'purelin';  % activation functions of layers 1 and 2
BTF = 'trainlm';                  % training function
BLF = 'learngd';                  % learning function
PF = 'mse';                       % cost function
net = newff(PR,[S1 S2],{TF1 TF2},BTF,BLF,PF);  % create the network

Example: Application of MLP for function approximation

Function to be approximated:

x = 0:0.01:4;
y = (sin(2*pi*x)+1).*exp(-x.^2);

[Figure: plot of the output y against the input x on [0, 4].]

Example: Application of MLP for function approximation

Function approximated using the network:

[Figure: desired output vs. network output; the network fails to follow the function.]

The number of hidden nodes is too small!

Example: Application of MLP for function approximation

Matlab commands: Create a 2-layer network

PR = [min(x) max(x)];             % range of inputs
S1 = 5; S2 = 1;                   % no. of nodes in layers 1 and 2
TF1 = 'radbas'; TF2 = 'purelin';  % activation functions of layers 1 and 2
BTF = 'trainlm';                  % training function
BLF = 'learngd';                  % learning function
PF = 'mse';                       % cost function
net = newff(PR,[S1 S2],{TF1 TF2},BTF,BLF,PF);  % create the network

Example: Application of MLP for function approximation

Function approximated using the network:

[Figure: desired output vs. network output; the network now follows the function closely.]

Example: Application of MLP for function approximation

Summary: MLP for function approximation problems
- Each node of the lower (hidden) layers creates a local (short) approximation of the function.
- The nodes of the upper layers combine all local approximations into a global approximation covering the whole input range.

Summary

Backpropagation can train multilayer feed-forward networks with differentiable transfer functions to perform function approximation, pattern association, and pattern classification. The term backpropagation refers to the process by which the derivatives of the network error, with respect to the network weights and biases, are computed. The number of inputs and outputs of the network is constrained by the problem. However, the number of layers between the network inputs and the output layer, and the sizes of those layers, are up to the designer. A two-layer sigmoid/linear network can represent any functional relationship between inputs and outputs if the sigmoid layer has enough neurons.

Programming in MATLAB Exercise

Exercises:
1. Write MATLAB code to solve question 1 in Exercise 4.
2. Write MATLAB code to solve question 2 in Exercise 4.