
Problems with Backpropagation

1. The presence of local minima in the error surface. Solutions get trapped in local minima and may never reach the lowest point if the multidimensional error function has many local minima.

2. Backpropagation is an extremely slow process. Tens of thousands of learning trials are common even for simple problems.

3. The assumption behind backpropagation is that minimizing the error for a training set is the right thing to do. In fact, this depends very much on the definition of error, on the choice of training set, and on the function of the network.

4. If learning goes on too long, generalization often suffers.

5. Backpropagation is unbiological: there is no evidence of weight-error information running backwards in the brain.

6. Some backpropagation networks can exhibit what is known as catastrophic unlearning. After the network has learned a set of patterns, if a new pattern has to be learned, it may sometimes be necessary to undo all the old connections and change everything in order to accommodate the new information (which, of course, takes a very long time).

7. The biggest problem is neural network mysticism. A neural network may solve a practical problem, but it can be difficult to understand how it solved it. For many problems the hidden layer is not doing an obvious analysis. If you don't know what was done, it can be hard to improve it.

There is a strong tendency to say, "Who cares? The network works." This approach is rarely the road to either progress or wisdom.
XOR Problem
(Haykin, page 176; Touretzky and Pomerleau, 1989)

[Figure: a two-layer network solving XOR. Both hidden neurons receive the two inputs with weights +1; the top hidden neuron has bias -1.5 and the bottom hidden neuron has bias -0.5. The output neuron receives weight -2 from the top hidden neuron and weight +1 from the bottom hidden neuron, and has bias -0.5.]
When the top hidden neuron is off and the bottom hidden neuron is on, which occurs when the input pattern is (0,1) or (1,0), the output neuron is switched on due to the excitatory effect of the positive weight connected to the bottom hidden neuron.
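As a quick check, a minimal MATLAB sketch (the weights and biases are the ones read from the figure above; step is an assumed helper standing in for the hard-limit activation) reproduces the XOR truth table:

step = @(v) double(v >= 0);            % hard-limit activation
X = [0 0; 0 1; 1 0; 1 1];              % the four binary input patterns
for n = 1:4
    x  = X(n,:)';
    h1 = step([1 1]*x - 1.5);          % top hidden neuron (bias -1.5)
    h2 = step([1 1]*x - 0.5);          % bottom hidden neuron (bias -0.5)
    o  = step(-2*h1 + h2 - 0.5);       % output neuron (bias -0.5)
    fprintf('%d XOR %d = %d\n', x(1), x(2), o);
end

Running it should print 0, 1, 1, 0 for the four input patterns.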
Feed-forward network mappings

Feed-forward neural networks provide a general framework for representing non-linear functional mappings between a set of input variables and a set of output variables. This is achieved by representing the non-linear function of many variables in terms of compositions of non-linear functions of a single variable, called activation functions.
The units which are not treated as output units are called hidden units. In this network there are d inputs, M hidden units and c output units.

The output of the j-th hidden unit is obtained by first forming a weighted linear combination of the d input values, and adding a bias, to give
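Writing a_j for this weighted sum, with the weight notation defined just below, this can be written as

    a_j = Σ_{i=1}^{d} w_ji x_i + w_j0        for j = 1, ..., M.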

Here w_ji denotes a weight in the first layer, going from input i to hidden unit j, and w_j0 denotes the bias for hidden unit j.

Treating bias as a weight
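This is the usual device of introducing an extra input x_0 whose value is permanently fixed at x_0 = 1, so that the bias w_j0 becomes an ordinary weight and the sum becomes

    a_j = Σ_{i=0}^{d} w_ji x_i.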

The activation of hidden unit j is then obtained by transforming the linear sum above using an activation function g(.) to give
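That is, writing z_j for the output of hidden unit j,

    z_j = g(a_j).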

The outputs of the network are obtained by transforming the activations of the hidden units using a second layer of processing elements. Thus, for each output unit k, we construct a linear combination of the outputs of the hidden units of the form
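Writing a_k for this second-layer sum, where w_kj denotes the weight from hidden unit j to output unit k and w_k0 the bias for output unit k, this is

    a_k = Σ_{j=1}^{M} w_kj z_j + w_k0.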

Treating bias as a weight
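As before, an extra hidden-unit output z_0 fixed at z_0 = 1 absorbs the bias, giving

    a_k = Σ_{j=0}^{M} w_kj z_j.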

The activation of the k-th output unit is then obtained by transforming this linear combination using a non-linear activation function, to give
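Denoting this output-layer activation function by g~ (it need not be the same as the hidden-layer function g), the k-th network output is

    y_k = g~(a_k).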

An explicit expression for the complete function represented by the network diagram:
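Writing the biases out explicitly, the two layers compose to

    y_k = g~( Σ_{j=1}^{M} w_kj g( Σ_{i=1}^{d} w_ji x_i + w_j0 ) + w_k0 ).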

When the inputs are binary:

We can easily show that a two-layer network of the form shown in the earlier figure can generate any Boolean function, provided the number M of hidden units is sufficiently large (McCulloch and Pitts, 1943). Roughly, each hidden unit can be made to respond to exactly one of the input patterns that should give output 1, and the output unit then simply ORs the hidden units together.

When the inputs are continuous:

A two-layer network can separate regions whose boundaries are built from the hyperplanes defined by its hidden units, as the following example illustrates. (Lippmann, 1987; in the file section of the group - Reading Assignment)
Design a single-output, two-layer network which classifies the shaded region in the figure from the other region.

[Figure: a shaded region in the (x1, x2) plane; the marked points (1,1), (1,3) and (3,2) are the pairwise intersections of the three decision lines used below.]
The equations of the decision boundaries are:

h1:  x1 − 1 = 0
h2:  0.5 x1 − x2 + 0.5 = 0
h3:  −0.5 x1 − x2 + 3.5 = 0

So the hidden-layer weights are:
For the first neuron, the weights are 1 and 0, with bias −1.
For the second neuron, the weights are 0.5 and −1, with bias 0.5.
For the third neuron, the weights are −0.5 and −1, with bias 3.5.

Let the outputs of the hidden neurons be h1, h2 and h3. h1 = 1 means the neuron is 'on', which happens on the side of its decision line labelled h1 > 0 in the figure; h1 = 0 means it is 'off' (the side where h1 < 0), and similarly for h2 and h3. Now take the region numbered 1: there h1 = 0, h2 = 0 and h3 = 0. Similarly we can write down (h1, h2, h3) for the other regions (see the table). We want a network for which regions 1, 3 and 5 produce output (o) 1 and the other regions produce 0.
[Figure: the three decision lines, with their positive sides labelled h1 > 0, h2 > 0 and h3 > 0, divide the plane into seven numbered regions; the points (0,0), (2,0), (0,2), (2,2), (4,2), (2,3) and (0,4) are marked.]
For each region, the required output o gives one constraint on the output-neuron weights w1, w2, w3 and threshold θ:

Region   h1  h2  h3   o    Constraint
  1       0   0   0   1    −θ > 0
  2       0   0   1   0    w3 − θ < 0
  3       0   1   1   1    w2 + w3 − θ > 0
  4       1   1   1   0    w1 + w2 + w3 − θ < 0
  5       1   1   0   1    w1 + w2 − θ > 0
  6       1   0   0   0    w1 − θ < 0
  7       1   0   1   0    w1 + w3 − θ < 0

From the first constraint we know that θ is negative. From the second, w3 must be more negative than θ, and so on. I took θ = −1, w3 = −2, w2 = 3 and w1 = −3.
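A minimal MATLAB sketch to check that these values satisfy all seven constraints at once (H and o simply restate the table above):

H = [0 0 0; 0 0 1; 0 1 1; 1 1 1; 1 1 0; 1 0 0; 1 0 1];   % h1 h2 h3 for regions 1-7
o = [1; 0; 1; 0; 1; 0; 0];                                % desired output for each region
w = [-3 3 -2]; theta = -1;                                % the values chosen above
all((H*w' - theta > 0) == o)                              % returns 1 (true) if every row agrees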

p=[2;0];                                        % test point (x1, x2) = (2, 0)
net = newff([0 10;0 10],[3 1],{'hardlim' 'hardlim'});   % 2 inputs, 3 hidden units, 1 output, hard-limit units
net.b{1}=[-1;.5;3.5];                           % hidden biases
net.b{2}=[1];                                   % output bias (equals -theta)
net.lw{2,1}=[-3 3 -2];                          % output weights [w1 w2 w3]
net.iw{1,1}=[1 0;.5 -1;-.5 -1];                 % hidden weights, one row per neuron
y=sim(net,p)
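For p = [2;0] all three hidden units switch on (the row h1 = h2 = h3 = 1 in the table above), so the simulated output y should come out as 0.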
About the lab class

Find a neural network to fit the data generated by the humps function on the interval [0, 2]:

x=0:.05:2;
y=humps(x);
plot(x,y)

[Figure: plot of y = humps(x) for x between 0 and 2.]
p=x; t=y;                                      % training inputs and targets (humps data)
net=newff(p,t,[2,1],{'logsig','purelin'});
net=train(net,p,t);
y=sim(net,p);                                  % network output on the training inputs

Plot the output (a sketch is given below).
Change the number of hidden neurons.
Change the learning rate.
Change the number of epochs.

net.trainParam.lr = 0.05;
net.trainParam.epochs = 100;
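One way to plot the result, reusing p, t and y from the script above (a minimal sketch):

plot(p, t, 'o', p, y, '-')              % circles: humps data, line: network output
legend('target data', 'network output')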

load iris        % Classification
load housing     % Regression

Read about the housing problem in the following link:

http://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html

UC Irvine Machine Learning Repository:

http://archive.ics.uci.edu/ml/
