
# Programming in MATLAB

## Chapter 3: Multi-Layer Perceptron

Gp.Capt. Thanapant Raicharoen, PhD

## Outline

- Limitation of the single-layer perceptron
- Multi-layer perceptron (MLP)
- Backpropagation algorithm
- MLP for non-linearly separable classification problems
- MLP for function approximation problems

## Limitation of Perceptron (XOR Function)

| No. | P1 | P2 | Output/Target |
|-----|----|----|---------------|
| 1   | 0  | 0  | 0             |
| 2   | 0  | 1  | 1             |
| 3   | 1  | 0  | 1             |
| 4   | 1  | 1  | 0             |

[Figure: a two-layer network with input nodes $x_1, x_2, x_3$, hidden nodes $y_1^{(1)}, y_2^{(1)}, y_3^{(1)}$, and output node $o_1$; each connection is labeled with a weight $w_{i,j}^{(h)}$.]

Notation: in $w_{i,j}^{(h)}$, $h$ is the layer number and $i$ is the index of node $i$ in layer $h$; $j$ indexes the node in the layer below.

## Output of each node

$$y_i^{(h)} = f\left(w_{i,1}^{(h)} y_1^{(h-1)} + w_{i,2}^{(h)} y_2^{(h-1)} + w_{i,3}^{(h)} y_3^{(h-1)} + \cdots + w_{i,m}^{(h)} y_m^{(h-1)} + \theta_i^{(h)}\right)$$

which can be written compactly as

$$y_i^{(h)} = f\left(\sum_j w_{i,j}^{(h)} y_j^{(h-1)} + \theta_i^{(h)}\right)$$

where $y_j^{(h-1)}$ is the output of node $j$ in the layer below and $\theta_i^{(h)}$ is the bias of node $i$ in layer $h$.
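The per-node computation above can be sketched in plain Python (an illustrative sketch, not part of the chapter's MATLAB examples; the hard-limit default activation and the AND-gate weights below are hypothetical choices):

```python
def node_output(weights, inputs, bias, f=lambda s: 1.0 if s >= 0 else 0.0):
    """Output of one node: y_i = f(sum_j w_ij * y_j + theta_i).

    The default f is a hard-limit (threshold) function; any activation
    function can be passed in its place.
    """
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return f(s)

# Example: a node with weights (1, 1) and bias -1.5 computes logical AND.
print(node_output([1.0, 1.0], [1, 1], -1.5))  # -> 1.0
print(node_output([1.0, 1.0], [0, 1], -1.5))  # -> 0.0
```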


## Function XOR

| x1 | x2 | y |
|----|----|---|
| 0  | 0  | 0 |
| 0  | 1  | 1 |
| 1  | 0  | 1 |
| 1  | 1  | 0 |

[Figure: a 2-2-1 network with inputs $x_1, x_2$, hidden nodes $y_1, y_2$, and output $o$.]

Layer 1:

$$y_1 = f(w_{1,1}^{(1)} x_1 + w_{1,2}^{(1)} x_2 + \theta_1^{(1)})$$
$$y_2 = f(w_{2,1}^{(1)} x_1 + w_{2,2}^{(1)} x_2 + \theta_2^{(1)})$$

Layer 2:

$$o = f(w_{1,1}^{(2)} y_1 + w_{1,2}^{(2)} y_2 + \theta_1^{(2)})$$

where $f(\cdot)$ is the activation (or transfer) function.
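With hand-picked weights (one of many possible solutions, chosen here purely for illustration), the two-layer equations above do solve XOR. A Python sketch with a hard-limit activation:

```python
def step(s):
    """Hard-limit activation function."""
    return 1 if s >= 0 else 0

def xor_net(x1, x2):
    # Layer 1: y1 is an OR-like unit, y2 is an AND-like unit (hand-picked weights).
    y1 = step(1.0 * x1 + 1.0 * x2 - 0.5)   # fires if x1 OR x2
    y2 = step(1.0 * x1 + 1.0 * x2 - 1.5)   # fires if x1 AND x2
    # Layer 2: output fires when y1 = 1 and y2 = 0, i.e. "OR but not AND" = XOR.
    return step(1.0 * y1 - 1.0 * y2 - 0.5)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor_net(a, b))  # reproduces the XOR truth table
```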
Artificial Neural Network 3.5 Gp.Capt.Thanapant Raicharoen, PhD

## Multilayer Perceptron : How it works (cont.)

[Figure: the four XOR input points $(0,0), (0,1), (1,0), (1,1)$ in the $x_1$–$x_2$ plane, separated by two lines L1 and L2 drawn by the hidden nodes.]

Outputs at layer 1:

| x1 | x2 | y1 | y2 |
|----|----|----|----|
| 0  | 0  | 0  | 0  |
| 0  | 1  | 1  | 0  |
| 1  | 0  | 1  | 0  |
| 1  | 1  | 1  | 1  |

Line L2: $w_{2,1}^{(1)} x_1 + w_{2,2}^{(1)} x_2 + \theta_2^{(1)} = 0$

## Multilayer Perceptron : How it works (cont.)

Inside layer 1: the hidden layer maps the points from $x_1$–$x_2$ space into $y_1$–$y_2$ space, where the two classes become linearly separable.

[Figure: class 0 and class 1 points in $x_1$–$x_2$ space (not linearly separable) mapped to $y_1$–$y_2$ space, where they are linearly separable.]

Line L1: $w_{1,1}^{(1)} x_1 + w_{1,2}^{(1)} x_2 + \theta_1^{(1)} = 0$
Line L2: $w_{2,1}^{(1)} x_1 + w_{2,2}^{(1)} x_2 + \theta_2^{(1)} = 0$

## Multilayer Perceptron : How it works (cont.)

Inside the output layer: the output node draws a single line L3 in $y_1$–$y_2$ space.

[Figure: the layer-1 outputs $(0,0), (1,0), (1,1)$ in $y_1$–$y_2$ space, with line L3 separating class 1 from class 0.]

Line L3: $w_{1,1}^{(2)} y_1 + w_{1,2}^{(2)} y_2 + \theta_1^{(2)} = 0$

The $y_1$–$y_2$ space is linearly separable, so line L3 can classify (separate) class 0 and class 1.

## Multilayer Perceptron : How it works (cont.)

How the hidden layers work:
- The hidden layers try to map the data into a linearly separable representation before passing it to the output layer.
- After the last hidden layer, the data should be linearly separable.
- More than one hidden layer may be needed to make the data linearly separable.
- In general, the activation function need not be a hard-limit (thresholding) function, and it need not be the same function in every layer.

## How can we adjust weights?

Assume we have the function $y = x_1 + 2x_2$, and we want to use a single-layer perceptron to approximate it.

[Figure: a single linear unit with inputs $x_1, x_2$, weights $w_1, w_2$, and output $\hat{y}$.]

The output is:

$$\hat{y} = w_1 x_1 + w_2 x_2$$

In this case the activation function is the identity (linear) function $f(x) = x$. We need to adjust $w_1$ and $w_2$ so that $\hat{y}$ becomes close (or equal) to $y$.

## Delta Learning Rule (Widrow-Hoff Rule)

Consider the mean square error (MSE):

$$\varepsilon^2 = \langle (y - \hat{y})^2 \rangle = \langle (y - w_1 x_1 - w_2 x_2)^2 \rangle$$

where $\langle \cdot \rangle$ denotes the average. $\varepsilon^2$ is a function of $w_1$ and $w_2$, as shown in the graph below. This graph is called the error surface (a paraboloid).

[Figure: MSE plotted as a surface over the $w_1$–$w_2$ plane.]

## Delta Learning Rule (Widrow-Hoff Rule)

Mean square error $\varepsilon^2$ as a function of $w_1$ and $w_2$:

[Figure: contour plot of the error surface over the $w_1$–$w_2$ plane.]

The minimum point is $(w_1, w_2) = (1, 2)$, because MSE $= 0$ there. Therefore $w_1$ and $w_2$ must be adjusted to reach the minimum point of this error surface.

## Delta Learning Rule (Widrow-Hoff Rule)

$w_1$ and $w_2$ are adjusted step by step toward the minimum point, like this:

[Figure: contour plot of the error surface with the weight trajectory converging to the target $(1, 2)$.]

## Gradient Descent Method

What is the direction of steepest descent, i.e. in what direction will the function decrease most rapidly?

1. Compute the gradient of the error surface at the current position $(w_1, w_2)$; the gradient points in the steepest uphill direction.
2. Step in the direction opposite to the gradient (adjust $w_1, w_2$).
3. Go to step 1 until the minimum point is reached.
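The steps above, applied to the earlier example $y = x_1 + 2x_2$ with a linear unit, can be sketched in Python (an illustrative sketch; the learning rate, sample count, and random seed are arbitrary choices, not from the slides):

```python
import random

random.seed(0)
w1, w2, eta = 0.0, 0.0, 0.05   # initial weights and learning rate

for _ in range(2000):
    # Draw a random training point and compute its target y = x1 + 2*x2.
    x1, x2 = random.uniform(-1, 1), random.uniform(-1, 1)
    y = x1 + 2 * x2
    y_hat = w1 * x1 + w2 * x2   # linear unit: f(x) = x
    err = y - y_hat
    # Step opposite to the gradient of (y - y_hat)^2 (delta rule).
    w1 += eta * err * x1
    w2 += eta * err * x2

print(round(w1, 2), round(w2, 2))  # close to the minimum point (1, 2)
```

With a noise-free, exactly realizable target, the weights converge to the minimum of the error surface at $(1, 2)$.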


## Backpropagation Algorithm

2-layer case:

[Figure: input layer $x_i$, hidden layer $y_j$, and output layer $o_k$, connected by weights $w_{j,i}^{(1)}$ and $w_{k,j}^{(2)}$.]

Output layer:

$$o_k = f\left(\sum_j w_{k,j}^{(2)} y_j + \theta_k^{(2)}\right) = f(h_k^{(2)})$$

Hidden layer:

$$y_j = f\left(\sum_i w_{j,i}^{(1)} x_i + \theta_j^{(1)}\right) = f(h_j^{(1)})$$

where $h_m^{(n)}$ is the weighted sum of the inputs of node $m$ in layer $n$.

## Backpropagation Algorithm (cont.)

2-layer case:

$$\varepsilon^2 = \sum_k (d_k - o_k)^2 = \sum_k \left(d_k - f\left(\sum_j w_{k,j}^{(2)} y_j + \theta_k^{(2)}\right)\right)^2$$

where $d_k$ is the desired (target) output of output node $k$.

The derivative of $\varepsilon^2$ with respect to $w_{k,j}^{(2)}$:

$$\frac{\partial \varepsilon^2}{\partial w_{k,j}^{(2)}} = -2\,(d_k - o_k)\, f'(h_k^{(2)})\, y_j$$

The derivative of $\varepsilon^2$ with respect to $\theta_k^{(2)}$:

$$\frac{\partial \varepsilon^2}{\partial \theta_k^{(2)}} = -2\,(d_k - o_k)\, f'(h_k^{(2)})$$
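The output-layer derivative can be checked numerically against a finite difference. A Python sketch using the sigmoid $f(x) = 1/(1+e^{-2x})$ introduced later in this chapter (the weights, inputs, and target below are arbitrary test values):

```python
import math

def f(x):  return 1.0 / (1.0 + math.exp(-2 * x))   # sigmoid
def fp(x): return 2 * f(x) * (1 - f(x))            # its derivative

# One output node: o = f(w1*y1 + w2*y2 + theta), error e2 = (d - o)^2.
y, w, theta, d = [0.3, 0.8], [0.5, -0.4], 0.1, 1.0

def e2(w, theta):
    o = f(w[0] * y[0] + w[1] * y[1] + theta)
    return (d - o) ** 2

h = w[0] * y[0] + w[1] * y[1] + theta
o = f(h)
analytic = -2 * (d - o) * fp(h) * y[0]   # d(e2)/dw1 from the formula above

eps = 1e-6                                # central finite-difference check
numeric = (e2([w[0] + eps, w[1]], theta) - e2([w[0] - eps, w[1]], theta)) / (2 * eps)
print(abs(analytic - numeric) < 1e-6)     # the two derivatives agree
```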

## Backpropagation Algorithm (cont.)

2-layer case:

$$\varepsilon^2 = \sum_k \left(d_k - f\left(\sum_j w_{k,j}^{(2)} y_j + \theta_k^{(2)}\right)\right)^2 = \sum_k \left(d_k - f\left(\sum_j w_{k,j}^{(2)} f\left(\sum_i w_{j,i}^{(1)} x_i + \theta_j^{(1)}\right) + \theta_k^{(2)}\right)\right)^2$$

The derivative of $\varepsilon^2$ with respect to $w_{j,i}^{(1)}$:

$$\frac{\partial \varepsilon^2}{\partial w_{j,i}^{(1)}} = -2 \sum_k (d_k - o_k)\, f'(h_k^{(2)})\, w_{k,j}^{(2)}\, f'(h_j^{(1)})\, x_i$$

## Backpropagation Algorithm (cont.)

Taking the derivative of $\varepsilon^2$ with respect to $w_{j,i}^{(1)}$ adjusts the weight connecting node $j$ of the current layer (layer 1) to node $i$ of the lower layer (layer 0):

$$\frac{\partial \varepsilon^2}{\partial w_{j,i}^{(1)}} = -2 \sum_k \underbrace{(d_k - o_k)}_{\text{error from upper node } k}\; \underbrace{f'(h_k^{(2)})}_{\text{derivative of upper node } k}\; \underbrace{w_{k,j}^{(2)}}_{\substack{\text{weight between upper node } k \\ \text{and node } j \text{ of current layer}}}\; \underbrace{f'(h_j^{(1)})}_{\substack{\text{derivative of node } j \\ \text{of current layer}}}\; \underbrace{x_i}_{\text{input from lower node } i}$$

The sum over $k$ is the backpropagation of the error to node $j$ of the current layer.

## Derivative of $\varepsilon^2$ with respect to $w_{k,j}^{(2)}$

$$\frac{\partial \varepsilon^2}{\partial w_{k,j}^{(2)}} = -2\,(d_k - o_k)\; \underbrace{f'(h_k^{(2)})}_{\text{derivative of current node}}\; \underbrace{y_j}_{\text{input from lower node}}$$

## Updating weights and bias

$$w_{j,i}^{(n)}(\text{new}) = w_{j,i}^{(n)}(\text{old}) + \Delta w_{j,i}^{(n)}$$
$$\theta_j^{(n)}(\text{new}) = \theta_j^{(n)}(\text{old}) + \Delta \theta_j^{(n)}$$

where $\Delta w_{j,i}^{(n)}$ and $\Delta \theta_j^{(n)}$ are taken proportional to the negative of the corresponding derivatives (gradient descent).
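The update rules above, combined with the derivatives from the previous slides, give the complete training loop. A minimal Python sketch of 2-layer backpropagation on the XOR problem (network size, learning rate, epoch count, and seed are illustrative choices; convergence is not guaranteed from every initialization):

```python
import math, random

random.seed(1)
f  = lambda x: 1.0 / (1.0 + math.exp(-2 * x))   # sigmoid from this chapter
fp = lambda x: 2 * f(x) * (1 - f(x))            # its derivative

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
D = [0, 1, 1, 0]                                 # XOR targets

# 2-2-1 network: W1[j][i], b1[j] for the hidden layer; W2[j], b2 for the output.
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
b1 = [random.uniform(-1, 1) for _ in range(2)]
W2 = [random.uniform(-1, 1) for _ in range(2)]
b2 = random.uniform(-1, 1)
eta = 0.5

def forward(x):
    h1 = [W1[j][0] * x[0] + W1[j][1] * x[1] + b1[j] for j in range(2)]
    y  = [f(v) for v in h1]
    h2 = W2[0] * y[0] + W2[1] * y[1] + b2
    return h1, y, h2, f(h2)

def mse():
    return sum((d - forward(x)[3]) ** 2 for x, d in zip(X, D)) / len(X)

err0 = mse()
for _ in range(5000):
    for x, d in zip(X, D):
        h1, y, h2, o = forward(x)
        delta2 = (d - o) * fp(h2)                 # backpropagated output error
        for j in range(2):
            delta1 = delta2 * W2[j] * fp(h1[j])   # error reaching hidden node j
            W2[j] += eta * delta2 * y[j]          # w(new) = w(old) + dw
            for i in range(2):
                W1[j][i] += eta * delta1 * x[i]
            b1[j] += eta * delta1
        b2 += eta * delta2

print(err0, "->", mse())   # the MSE decreases during training
```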


## Adjusting Weights for a Nonlinear Unit

Calculation of $f'$ in the case of a nonlinear activation function:

1. Sigmoid function:

$$f(x) = \frac{1}{1 + e^{-2x}}$$

We get:

$$f'(x) = 2 f(x)\,(1 - f(x))$$

This special form of $f'$ makes it easy to calculate from $f$ itself.

2. Hyperbolic tangent:

$$f(x) = \tanh(x)$$

We get:

$$f'(x) = 1 - f(x)^2$$
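Both derivative identities can be verified numerically with a central finite difference. A short Python sketch (the test points are arbitrary):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-2 * x))

eps = 1e-6
for x in [-1.5, -0.2, 0.0, 0.7, 2.0]:
    # Sigmoid: f'(x) = 2 f(x) (1 - f(x))
    num = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
    ana = 2 * sigmoid(x) * (1 - sigmoid(x))
    assert abs(num - ana) < 1e-6
    # tanh: f'(x) = 1 - f(x)^2
    num = (math.tanh(x + eps) - math.tanh(x - eps)) / (2 * eps)
    ana = 1 - math.tanh(x) ** 2
    assert abs(num - ana) < 1e-6

print("derivative identities verified")
```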


## Example : Application of MLP for classification

Example: Run_XOR_MLP_Newff.m

```matlab
% Run_XOR_MLP_Newff.m
P = [0 0 1 1; 0 1 0 1];        % XOR function inputs
T = [0 1 1 0];                 % XOR targets
plotpv(P,T,[-1, 2, -1, 2]);    % plot data
PR = [min(P(1,:)) max(P(1,:)); min(P(2,:)) max(P(2,:))];
S1 = 2; S2 = 1;
TF1 = 'logsig'; TF2 = 'logsig';
PF = 'mse';
%
net = newff(PR,[S1 S2],{TF1 TF2});
%
net.trainParam.epochs = 100;
net.trainParam.goal = 0.001;
net = train(net,P,T);
```

## Example : Application of MLP for classification

Example: Run_MLP_Random.m. MATLAB commands to create the training data:

```matlab
x = randn([2 200]);                  % input patterns x1, x2 from random numbers
o = (x(1,:).^2 + x(2,:).^2) < 1;     % desired output
```

Desired output $o$: if $(x_1, x_2)$ lies inside the circle of radius 1 centered at the origin, then $o = 1$; otherwise $o = 0$.

[Figure: 200 random points in the $x_1$–$x_2$ plane; class 1 inside the unit circle, class 0 outside.]
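The same kind of training data can be generated outside MATLAB. A Python sketch mirroring the two commands above (the seed is an arbitrary choice; the sample count of 200 matches the example):

```python
import random

random.seed(0)
# 200 points drawn from a standard normal distribution, as with randn([2 200]).
x = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
# Label 1 if the point falls inside the unit circle, else 0.
o = [1 if x1 ** 2 + x2 ** 2 < 1 else 0 for x1, x2 in x]

print(sum(o), "of", len(o), "points inside the unit circle")
```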

## Matlab command : Create a 2-layer network

```matlab
PR = [min(x(1,:)) max(x(1,:)); min(x(2,:)) max(x(2,:))];  % range of inputs
S1 = 10; S2 = 1;                 % layer sizes
TF1 = 'logsig'; TF2 = 'logsig';  % transfer functions (as in the XOR example)
BTF = 'traingd';                 % training function (gradient descent)
BLF = 'learngd';                 % learning function
PF = 'mse';                      % cost function
net = newff(PR,[S1 S2],{TF1 TF2},BTF,BLF,PF);  % command for creating the network
```

## Matlab command : Train the network

```matlab
net.trainParam.epochs = 2000;   % number of training rounds
net.trainParam.goal = 0.002;    % maximum desired error
net = train(net,x,o);           % training command
y = sim(net,x);                 % compute network outputs (continuous)
netout = y > 0.5;               % convert to binary outputs
```

## Example : Application of MLP for classification (cont.)

[Figure: network structure with 2 input nodes, 10 hidden nodes, and 1 output node.]

## Example : Application of MLP for classification (cont.)

[Figure: the training data in the $x_1$–$x_2$ plane with the initial weights of the 10 hidden-layer nodes displayed as lines $w_1 x_1 + w_2 x_2 + \theta = 0$.]

## Example : Application of MLP for classification (cont.)

Training algorithm: gradient descent method.

[Figure: training curve (MSE vs. epochs, log scale). After 20000 epochs the performance is 0.151511; the goal of 0.002 is not reached.]

## Example : Application of MLP for classification (cont.)

Results obtained using the gradient descent method:

[Figure: classification result in the $x_1$–$x_2$ plane; many points are misclassified.]

Classification error: 40/200

## Example : Application of MLP for classification (cont.)

Training algorithm: Levenberg-Marquardt backpropagation.

[Figure: MSE vs. training epochs (log scale). After 10 epochs the performance is 0.00172594, below the goal of 0.002 (success within only 10 epochs!).]

## Example : Application of MLP for classification (cont.)

Results obtained using the Levenberg-Marquardt Backpropagation
2 Class 1 Class 0

Unused node
-1

-2

-3 -3 -2 -1 0 1 2 3

## Only 6 hidden nodes are adequate ! Classification Error : 0/200

Artificial Neural Network 3.32 Gp.Capt.Thanapant Raicharoen, PhD

## Example : Application of MLP for classification (cont.)

Summary: MLP for classification problems
- Each lower-layer (hidden) node of the neural network creates a local decision boundary.
- The upper-layer nodes combine all local decision boundaries into a global decision boundary.

## Example: Application of MLP for function approximation

Example: Run_MLP_SinFunction.m. MATLAB commands to create a 2-layer network:

```matlab
PR = [min(x) max(x)];             % range of inputs
S1 = 6; S2 = 1;                   % layer sizes
TF1 = 'logsig'; TF2 = 'purelin';  % transfer functions (as in the script below)
BTF = 'trainlm';                  % training function (as in the script below)
BLF = 'learngd';                  % learning function
PF = 'mse';                       % cost function
net = newff(PR,[S1 S2],{TF1 TF2},BTF,BLF,PF);  % command for creating the network
```

## Example: Application of MLP for function approximation

[Figure: network structure with 1 input node, 6 hidden nodes, and 1 output node.]

## Example: Application of MLP for function approximation

Example: Run_MLP_SinFunction.m

```matlab
% Run_MLP_SinFunction.m
p = 0:0.25:5;
t = sin(p);
figure; plot(p,t,'+b');
axis([-0.5 5.5 -1.5 1.5]);
%
net = newff([0 10],[6,1],{'logsig','purelin'},'trainlm');
%
net.trainParam.epochs = 50;
net.trainParam.goal = 0.01;
net = train(net,p,t);
%
a = sim(net,p);
hold on; plot(p,a,'.r');
```

## Matlab command : Create a 2-layer network

```matlab
PR = [min(x) max(x)];             % range of inputs
S1 = 3; S2 = 1;                   % layer sizes
TF1 = 'logsig'; TF2 = 'purelin';  % transfer functions (as in the previous example)
BTF = 'trainlm';                  % training function (as in the previous example)
BLF = 'learngd';                  % learning function
PF = 'mse';                       % cost function
net = newff(PR,[S1 S2],{TF1 TF2},BTF,BLF,PF);  % command for creating the network
```

## Example: Application of MLP for function approximation

[Figure: plot of the function to be approximated, output $y$ vs. input $x$ on $[0, 4]$.]

Function to be approximated:

```matlab
x = 0:0.01:4;
y = (sin(2*pi*x)+1).*exp(-x.^2);
```

## Example: Application of MLP for function approximation

[Figure: desired output vs. network output over the input range; the approximation follows the function closely.]

Function approximated using the network.

## Matlab command : Create a 2-layer network

```matlab
PR = [min(x) max(x)];             % range of inputs
S1 = 5; S2 = 1;                   % layer sizes
TF1 = 'logsig'; TF2 = 'purelin';  % transfer functions (as in the previous example)
BTF = 'trainlm';                  % training function (as in the previous example)
BLF = 'learngd';                  % learning function
PF = 'mse';                       % cost function
net = newff(PR,[S1 S2],{TF1 TF2},BTF,BLF,PF);  % command for creating the network
```

[Figure: desired output vs. network output with 5 hidden nodes.]

Function approximated using the network.

## Example: Application of MLP for function approximation (cont.)

Summary: MLP for function approximation problems
- Each lower-layer (hidden) node of the neural network creates a local (short) approximation of the function.
- The upper-layer nodes combine all local approximations into a global approximation covering the whole input range.

## Summary

Backpropagation can train multilayer feed-forward networks with differentiable transfer functions to perform function approximation, pattern association, and pattern classification. The term backpropagation refers to the process by which the derivatives of the network error, with respect to the network weights and biases, are computed.

The number of inputs and outputs of the network is constrained by the problem. However, the number of layers between the network inputs and the output layer, and the sizes of those layers, are up to the designer. A two-layer sigmoid/linear network can represent any functional relationship between inputs and outputs if the sigmoid layer has enough neurons.

## Programming in MATLAB Exercise

Exercise:
1. Write MATLAB code to solve question 1 in Exercise 4.
2. Write MATLAB code to solve question 2 in Exercise 4.