
A Brief Overview of Neural Networks

Overview
Relation to the Biological Brain: Biological Neural Networks
The Artificial Neuron
Types of Networks and Learning Techniques
Supervised Learning & Backpropagation Training Algorithm
Learning by Example
Applications

Biological Neuron

Artificial Neuron
[Diagram: inputs, each multiplied by a weight (W), feed into the neuron; the activation function f(n) produces the outputs.]

Transfer Functions
SIGMOID: f(n) = 1 / (1 + e^(-n))
LINEAR: f(n) = n
[Plot: output vs. input for each transfer function.]
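A minimal sketch of these two transfer functions in Python (the function names are my own):

```python
import math

def sigmoid(n):
    # SIGMOID: squashes any net input n into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-n))

def linear(n):
    # LINEAR: the output simply equals the net input.
    return n
```

For example, sigmoid(0) = 0.5, and sigmoid(0.5) = 0.6225, a value that reappears in the worked example later in these slides.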

Types of networks

Multiple Inputs and Single Layer

Multiple Inputs and Layers

Types of Networks (contd.)


Feedback

Recurrent Networks

Recurrent Networks
Feed-forward networks:
- Information flows only one way
- One input pattern produces one output
- No sense of time (or memory of previous states)
Recurrency:
- Nodes connect back to other nodes or to themselves
- Information flow is multidirectional
- Sense of time and memory of previous state(s)
Biological nervous systems show high levels of recurrency (but feed-forward structures exist too).
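The contrast can be sketched with two single-neuron update functions (a toy illustration; the names and weights are invented):

```python
import math

def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

def feedforward_step(x, w):
    # Feed-forward: one input pattern -> one output, no memory.
    return sigmoid(w * x)

def recurrent_step(x, prev_output, w_in, w_rec):
    # Recurrent: the neuron's previous output feeds back into it,
    # so the same input can yield different outputs over time.
    return sigmoid(w_in * x + w_rec * prev_output)
```

Feeding the same input twice gives the feed-forward node identical outputs, while the recurrent node's output changes because its state carries over.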

ANNs: The Basics


ANNs incorporate the two fundamental components of biological neural nets:
1. Neurons (nodes)
2. Synapses (weights)

Feed-forward nets


Information flow is unidirectional
Data is presented to the input layer
Passed on to the hidden layer
Passed on to the output layer

Information is distributed
Information processing is parallel

Internal representation (interpretation) of data

Neural networks are good for prediction problems when:
The inputs are well understood. You have a good idea of which features of the data are important, but not necessarily how to combine them.
The output is well understood. You know what you are trying to predict.
Experience is available. You have plenty of examples where both the inputs and the output are known. This experience will be used to train the network.

Feeding data through the net:

(1 × 0.25) + (0.5 × (-1.5)) = 0.25 + (-0.75) = -0.5
Squashing: 1 / (1 + e^0.5) = 0.3775
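The slide's arithmetic can be reproduced directly (a sketch, using the slide's inputs and weights):

```python
import math

inputs  = [1.0, 0.5]     # the two input values
weights = [0.25, -1.5]   # the corresponding weights

# Net input: (1 * 0.25) + (0.5 * -1.5) = -0.5
net = sum(x * w for x, w in zip(inputs, weights))

# Squashing through the sigmoid: 1 / (1 + e^0.5) = 0.3775
output = 1.0 / (1.0 + math.exp(-net))
```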

Learning Techniques
Supervised Learning:
[Diagram: inputs from the environment feed both the actual system and the neural network; the difference between the expected output and the actual output is the training error, which is fed back to the network.]

Multilayer Perceptron

Inputs

First Hidden layer

Second Hidden Layer

Output Layer

Signal Flow and Backpropagation of Errors

[Diagram: function signals flow forward through the network; error signals flow backward.]

Neural networks for Directed Data Mining: Building a model for classification and prediction
1. Identify the input and output features.
2. Normalize (scale) the inputs and outputs so their range is between 0 and 1.
3. Set up a network with an appropriate topology.
4. Train the network on a representative set of training examples.
5. Test the network on a test set strictly independent from the training examples. If necessary, repeat the training, adjusting the training set, network topology, and parameters. Evaluate the network using the evaluation set to see how well it performs.
6. Apply the model generated by the network to predict outcomes for unknown inputs.

Learning by Example
Hidden layer transfer function: sigmoid, F(n) = 1/(1 + exp(-n)), where n is the net input to the neuron.
Derivative: F'(n) = (output of the neuron)(1 - output of the neuron): the slope of the transfer function.
Output layer transfer function: linear, F(n) = n; output = input to the neuron.
Derivative: F'(n) = 1.
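The handy property that the sigmoid's slope can be computed from the neuron's output alone can be checked numerically (helper names are mine):

```python
import math

def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

def sigmoid_slope(output):
    # F'(n) = F(n) * (1 - F(n)): slope from the output alone.
    return output * (1.0 - output)

# Compare against a finite-difference estimate of the derivative.
n, h = 0.5, 1e-6
numeric = (sigmoid(n + h) - sigmoid(n - h)) / (2 * h)
```

The two agree to high precision, which is why backpropagation only needs each neuron's output to compute its slope.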

Purpose of the Activation Function


We want the unit to be active (near +1) when the right inputs are given.
We want the unit to be inactive (near 0) when the wrong inputs are given.
It is preferable for the activation function to be nonlinear. Otherwise, the entire neural network collapses into a simple linear function.

Possibilities for activation function

Step function

Sign function

Sigmoid (logistic) function


step(x) = 1 if x > threshold, 0 if x ≤ threshold (in the picture above, threshold = 0)
sign(x) = +1 if x > 0, -1 if x ≤ 0
sigmoid(x) = 1/(1 + e^(-x))

Adding an extra input with activation a0 = -1 and weight W0,j = t (called the bias weight) is equivalent to having a threshold at t. This way we can always assume a threshold of 0.
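A quick check of this equivalence (the function names are illustrative):

```python
def fires_with_threshold(x, weights, t):
    # Unit fires when the weighted sum exceeds the threshold t.
    return sum(w * xi for w, xi in zip(weights, x)) > t

def fires_with_bias(x, weights, t):
    # Prepend a fixed input a0 = -1 with bias weight W0 = t;
    # the unit can then always use a threshold of 0.
    xs = [-1.0] + list(x)
    ws = [t] + list(weights)
    return sum(w * xi for w, xi in zip(ws, xs)) > 0.0
```

Both formulations fire on exactly the same inputs, so the threshold can be learned like any other weight.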

Using a Bias Weight to Standardize the Threshold


[Diagram: inputs -1, x1, x2 with weights T, W1, W2.]
W1·x1 + W2·x2 < T  ⇔  W1·x1 + W2·x2 - T < 0

Learning by Example
Training algorithm: backpropagation of errors using gradient descent. Colors in the following diagrams:
Red: current weights
Orange: updated weights
Black boxes: inputs and outputs to a neuron
Blue: sensitivities at each layer

The perceptron learning rule performs gradient descent in weight space. Error surface: the surface that describes the error on each example as a function of all the weights in the network. A set of weights defines a point on this surface. (It could also be called a state in the state space of possible weights, i.e., weight space.) We look at the partial derivative of the surface with respect to each weight (i.e., the gradient: how much the error would change if we made a small change in that weight). Then each weight is altered by an amount proportional to the slope in its direction. Thus the network as a whole moves in the direction of steepest descent on the error surface.

Definition of Error: Sum of Squared Errors


E = (1/2) Σ_examples (t - o)^2 = (1/2) Σ Err^2

The factor of 1/2 is introduced to simplify the math on the next slide.

Here, t is the correct (desired) output and o is the actual output of the neural net.

Reduction of Squared Error


Gradient descent reduces the squared error by calculating the partial derivative of E with respect to each weight:
∂E/∂W_j = Err × ∂Err/∂W_j   (chain rule for derivatives)

Err = t - g(Σ_{k=0..n} W_k x_k)   (expanding Err; g is the activation function and in = Σ W_k x_k is the net input)

∂E/∂W_j = -Err × g'(in) × x_j   (chain rule again, since ∂in/∂W_j = x_j)

Weight update rule, with learning rate α:

W_j ← W_j + α × Err × g'(in) × x_j


The weight is updated by α times this gradient of the error E in weight space. That the weight moves in the correct direction (+/-) can be verified with examples. The learning rate, α, is typically set to a small value such as 0.1.
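The update rule for a single sigmoid unit can be sketched as follows (a minimal illustration of the rule above, not the full multilayer backpropagation algorithm):

```python
import math

def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

def gradient_descent_step(weights, x, target, alpha=0.1):
    # Implements W_j <- W_j + alpha * Err * g'(in) * x_j
    net = sum(w * xi for w, xi in zip(weights, x))
    out = sigmoid(net)
    err = target - out                 # Err = t - o
    slope = out * (1.0 - out)          # g'(in) for the sigmoid
    return [w + alpha * err * slope * xi
            for w, xi in zip(weights, x)]
```

One step moves the unit's output toward the target, so the squared error shrinks.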

First Pass
G1 = (0.6225)(1-0.6225)(0.0397)(0.5)(2) = 0.0093
G2 = (0.6508)(1-0.6508)(0.3492)(0.5) = 0.0397

[Network diagram: input 1, weights 0.5, into two first-layer sigmoid neurons (outputs 0.6225); weights 0.5 into two second-layer sigmoid neurons (outputs 0.6508); weights 0.5 into the linear output neuron (output 0.6508).]
G3 = (1)(0.3492) = 0.3492

Gradient of a hidden neuron: G = slope of the transfer function × Σ[(weight connecting the neuron to the next neuron) × (gradient of the next neuron)]

Gradient of the output neuron = slope of the transfer function × error

Error=1-0.6508=0.3492

Weight Update 1
New Weight=Old Weight + {(learning rate)(gradient)(prior output)}
New W1 = 0.5 + (0.5)(0.0093)(1) = 0.5047
New W2 = 0.5 + (0.5)(0.0397)(0.6225) = 0.5124
New W3 = 0.5 + (0.5)(0.3492)(0.6508) = 0.6136

Second Pass
G1 = (0.6236)(1-0.6236)(0.5124)(0.0273)(2) = 0.0066
G2 = (0.6545)(1-0.6545)(0.1967)(0.6136) = 0.0273

[Network diagram: input 1, weights 0.5047, into two first-layer neurons (outputs 0.6236); weights 0.5124 (net input 0.6391) into two second-layer neurons (outputs 0.6545); weights 0.6136 into the linear output neuron (output 0.8033).]
G3 = (1)(0.1967) = 0.1967

Error=1-0.8033=0.1967

Weight Update 2
New Weight=Old Weight + {(learning rate)(gradient)(prior output)}
New W1 = 0.5047 + (0.5)(0.0066)(1) = 0.508
New W2 = 0.5124 + (0.5)(0.0273)(0.6236) = 0.5209
New W3 = 0.6136 + (0.5)(0.1967)(0.6545) = 0.6779

Third Pass
[Network diagram: input 1, weights 0.508, into two first-layer neurons (outputs 0.6243); weights 0.5209 (net input 0.6504) into two second-layer neurons (outputs 0.6571); weights 0.6779 into the linear output neuron (output 0.8909).]

Weight Update Summary


Weights             w1      w2      w3      Output  Expected  Error
Initial conditions  0.5     0.5     0.5     0.6508  1         0.3492
Pass 1 update       0.5047  0.5124  0.6136  0.8033  1         0.1967
Pass 2 update       0.508   0.5209  0.6779  0.8909  1         0.1091
W1: weights from the input to the input layer
W2: weights from the input layer to the hidden layer
W3: weights from the hidden layer to the output layer
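The three passes can be reproduced in code. The sketch below hard-codes the slides' topology as I read it (one input, two sigmoid hidden layers of two neurons each with a shared weight per layer, one linear output neuron); the function name is mine.

```python
import math

def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

def train_pass(w1, w2, w3, x=1.0, target=1.0, rate=0.5):
    # Forward pass.
    o1 = sigmoid(w1 * x)          # each first-layer neuron
    o2 = sigmoid(2 * w2 * o1)     # each second-layer neuron (two equal inputs)
    out = 2 * w3 * o2             # linear output neuron
    err = target - out

    # Backward pass: gradients (sensitivities) at each layer.
    g3 = 1.0 * err                        # linear slope = 1
    g2 = o2 * (1 - o2) * g3 * w3
    g1 = o1 * (1 - o1) * g2 * w2 * 2      # two downstream neurons

    # New weight = old weight + rate * gradient * prior output.
    return (w1 + rate * g1 * x,
            w2 + rate * g2 * o1,
            w3 + rate * g3 * o2,
            out)
```

Running three passes from the initial weights (0.5, 0.5, 0.5) reproduces the table: outputs 0.6508, 0.8033, 0.8909.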

Training Algorithm
The process of feedforward and backpropagation continues until the required mean squared error has been reached. Typical MSE: 1e-5. Other, more sophisticated backpropagation training algorithms are also available.

Why Gradient?
[Diagram: inputs O1 and O2, with weights W1 and W2, feed an output neuron O3.]
O = output of a neuron, W = weight, N = net input to the neuron
N = (O1 × W1) + (O2 × W2)
O3 = 1/[1 + exp(-N)]
Error = expected output - O3
To reduce the error, the change in each weight is proportional to:
- the learning rate
- the rate of change of the error w.r.t. the rate of change of the weight, which factors into:
  - the gradient: the rate of change of the error w.r.t. the rate of change of N
  - the prior output (O1 and O2)

Gradient in Detail
Gradient: the rate of change of the error w.r.t. the rate of change of the net input to the neuron.
For output neurons: slope of the transfer function × error.
For hidden neurons (a bit more complicated: the error is fed back in terms of the gradients of the successive neurons): slope of the transfer function × Σ[(gradient of the next neuron) × (weight connecting the neuron to that next neuron)].
Why the summation? The neurons share the responsibility!

An Example
[Diagram: a small network of sigmoid neurons whose output neuron produces 0.66.]
Case 1: expected output = 1. Error = 1 - 0.66 = 0.34.
G1 = 0.66(1-0.66)(0.34) = 0.0763
The positive gradient pushes the weights up, increasing the output toward 1.
Case 2: expected output = 0. Error = 0 - 0.66 = -0.66.
G1 = 0.66(1-0.66)(-0.66) = -0.148
The negative gradient pushes the weights down, decreasing the output toward 0.

Improving performance
Changing the number of layers and the number of neurons in each layer.
Varying the transfer functions.
Changing the learning rate.
Training for longer times.
Type of pre-processing and post-processing.