
Business School

Institute of
Business Informatics

Supervised Learning

Uwe Lämmel
www.wi.hs-wismar.de/~laemmel
U.laemmel@wi.hs-wismar.de


Neural Networks
Idea
Artificial Neuron & Network
Supervised Learning
Unsupervised Learning
Data Mining: other Techniques


Supervised Learning
Feed-Forward Networks

Perceptron, Adaline, TLU


Multi-Layer networks
Backpropagation Algorithm
Pattern recognition
Data preparation

Examples

Bank Customer
Customer Relationship


Connections
Feed-forward
Input layer
Hidden layer
Output layer

Feed-back / auto-associative
From the (output) layer back to a previous (hidden/input) layer
All neurons fully connected to each other: Hopfield network


Perceptron, Adaline, TLU

A class of neural networks with a special architecture:
one layer of trainable links only
Adaline: adaptive linear element
TLU: threshold linear unit

...


Papert, Minsky and Perceptron History


"Once upon a time two daughter sciences were born to the new science
of cybernetics.
One sister was natural, with features inherited from the study of the
brain, from the way nature does things.
The other was artificial, related from the beginning to the use of
computers.

But Snow White was not dead.


What Minsky and Papert had shown the world as proof was not the
heart of the princess; it was the heart of a pig."
Seymour Papert, 1988


Perception
first step of recognition:
becoming aware of something via the senses

[Figure: perceptron with picture, mapping layer and output layer; link labels: "fixed 1-1 links" and "trainable, fully connected"]

Perceptron
Input layer
binary input, passed through,
no trainable links
Propagation function
$net_j = \sum_i o_i w_{ij}$
Activation function
$o_j = a_j = 1$ if $net_j \geq \theta_j$, 0 otherwise
A perceptron can learn, in finite time, every function that it can represent.

(perceptron convergence theorem, F. Rosenblatt)
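
The propagation and activation functions above can be sketched in Python. This is only an illustrative sketch, not part of the original slides: the names `perceptron_output`, `weights` and `thresholds` are hypothetical, and the example weights are made up so that one output neuron behaves like AND and the other like OR.

```python
def perceptron_output(inputs, weights, thresholds):
    """Outputs o_j of a one-layer perceptron for one binary input pattern.

    weights[i][j] is the (assumed) weight of the link from input neuron i
    to output neuron j; thresholds[j] is the threshold theta_j.
    """
    outputs = []
    for j, theta in enumerate(thresholds):
        # propagation function: weighted sum of the inputs
        net_j = sum(o_i * weights[i][j] for i, o_i in enumerate(inputs))
        # activation function: binary threshold
        outputs.append(1 if net_j >= theta else 0)
    return outputs


# Hypothetical weights and thresholds: neuron 0 realises AND, neuron 1 realises OR.
weights = [[1, 1],
           [1, 1]]
thresholds = [2, 1]
print(perceptron_output([1, 0], weights, thresholds))  # [0, 1]
```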


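Linearly separable
Neuron j should output 0
iff both neurons 1 and 2 have the same
value ($o_1 = o_2$), otherwise 1:
$net_j = o_1 w_{1j} + o_2 w_{2j}$
$0 \cdot w_{1j} + 0 \cdot w_{2j} < \theta_j$
$0 \cdot w_{1j} + 1 \cdot w_{2j} \geq \theta_j$
$1 \cdot w_{1j} + 0 \cdot w_{2j} \geq \theta_j$
$1 \cdot w_{1j} + 1 \cdot w_{2j} < \theta_j$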

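Linearly separable

[Figure: neuron j with threshold $\theta_j$, connected to neurons 1 and 2 via the weights $w_{1j}$ and $w_{2j}$; plot of the input points from (0,0) to (1,1) in the $o_1$-$o_2$ plane]

$net_j = o_1 w_{1j} + o_2 w_{2j}$
$o_1 w_1 + o_2 w_2 = \theta$ is a line in a 2-dimensional space
no line divides the plane so that (0,1) and (1,0) lie in a different half-plane than (0,0) and (1,1)
the network cannot solve the problem
a perceptron can represent only some functions
a neural network representing the XOR function needs hidden neurons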
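
Why no choice of weights works can also be seen algebraically. The following short derivation is an added sketch, not taken from the slides; it combines the four XOR conditions on $net_j$ and writes $\theta_j$ for the threshold of neuron j.

```latex
\begin{align*}
0 \cdot w_{1j} + 0 \cdot w_{2j} < \theta_j
  &\;\Rightarrow\; \theta_j > 0 \\
w_{2j} \geq \theta_j \;\text{ and }\; w_{1j} \geq \theta_j
  &\;\Rightarrow\; w_{1j} + w_{2j} \geq 2\theta_j > \theta_j \\
\text{but the pattern } (1,1) \text{ requires }\;
  & w_{1j} + w_{2j} < \theta_j
\end{align*}
```

The last two lines contradict each other, so no pair of weights and threshold satisfies all four conditions.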


Learning is easy

repeat
  while there are input patterns do begin
    take the next input pattern
    calculate the output
    for each j in OutputNeurons do
      if oj <> tj then
        if oj = 0 then { output = 0, but 1 expected }
          for each i in InputNeurons do
            wij := wij + oi
        else if oj = 1 then { output = 1, but 0 expected }
          for each i in InputNeurons do
            wij := wij - oi;
  end
until desired behaviour
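
The learning rule above might be written in Python roughly as follows. This is a sketch under stated assumptions, not the author's implementation: the function name `train_perceptron` and its parameters are hypothetical, the threshold is fixed at 0.5, and a constant input of 1 is appended to every pattern (an addition that is not on the slide) so that the effective threshold can be learned like any other weight.

```python
def train_perceptron(patterns, n_inputs, n_outputs, theta=0.5, max_epochs=100):
    """Train a one-layer perceptron with the rule w_ij := w_ij +/- o_i.

    patterns: list of (inputs, targets) with binary values.
    Assumption (not on the slide): a constant input 1 is appended to every
    pattern so that the fixed threshold theta can be shifted by a weight.
    Returns (weights, converged).
    """
    weights = [[0] * n_outputs for _ in range(n_inputs + 1)]
    for _ in range(max_epochs):
        errors = 0
        for inputs, targets in patterns:
            o = list(inputs) + [1]  # appended always-on input
            for j in range(n_outputs):
                net_j = sum(o[i] * weights[i][j] for i in range(len(o)))
                o_j = 1 if net_j >= theta else 0
                if o_j != targets[j]:
                    errors += 1
                    # output 0 but 1 expected: add inputs; otherwise subtract
                    sign = 1 if o_j == 0 else -1
                    for i in range(len(o)):
                        weights[i][j] += sign * o[i]
        if errors == 0:  # desired behaviour reached
            return weights, True
    return weights, False


AND = [((0, 0), (0,)), ((0, 1), (0,)), ((1, 0), (0,)), ((1, 1), (1,))]
XOR = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (0,))]
print(train_perceptron(AND, 2, 1))  # converges after a few epochs
print(train_perceptron(XOR, 2, 1))  # never reaches an error-free epoch
```

For the linearly separable AND patterns the loop ends with an error-free epoch; for XOR it stops only because of the `max_epochs` limit.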

Exercise
Decoding
input: binary code of a digit
output: unary representation,
as many 1s as the value of the digit,
e.g. 5: 11111
architecture:


Exercise
Decoding
input: Binary code of a digit
output: classification:
0 ~ 1st neuron, 1 ~ 2nd neuron, ..., 5 ~ 6th neuron, ...
architecture:


Exercises
1. Look at the EXCEL-file of the decoding problem
2. Implement (in PASCAL/Java)
a 4-10-Perceptron which transforms a binary
representation of a digit (0..9) into a decimal
number.
Implement the learning algorithm and train the
network.
3. Which task can be learned faster?
(Unary representation or classification)


Exercises
5. Develop a perceptron for the
recognition of digits 0..9. (pixel
representation)
input layer: 3x7-input neurons
Use the SNNS or JavaNNS
6. Can we recognize numbers greater
than 9 as well?
7. Develop a perceptron for the
recognition of capital letters. (input
layer 5x7)


Multi-layer Perceptron
Overcomes the limits of a perceptron
several trainable layers
a two-layer perceptron can classify convex polygons
a three-layer perceptron can classify arbitrary sets

multi-layer perceptron = feed-forward network = backpropagation network


Multi-layer feed-forward network


Feed-Forward Network


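Evaluation of the net output in a feed-forward network

[Figure: the training pattern p is fed into the input layer; neuron $N_i$ (input layer) with $O_i = p_i$ feeds $net_j$ of neuron $N_j$ (hidden layer(s)) with $O_j = act_j$, which feeds $net_k$ of neuron $N_k$ (output layer) with $O_k = act_k$]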
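
The layer-by-layer evaluation can be sketched in Python as below. This is a minimal sketch under assumptions: the logistic function is assumed as activation function, the names `layer_output` and `feed_forward` are hypothetical, and the weights of the small 2-2-1 network are made up for illustration.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer_output(outputs_prev, weights):
    """Outputs of one layer: o_j = f_act(net_j), net_j = sum_i o_i * w_ij."""
    n_neurons = len(weights[0])
    return [
        logistic(sum(o_i * weights[i][j] for i, o_i in enumerate(outputs_prev)))
        for j in range(n_neurons)
    ]


def feed_forward(pattern, weight_matrices):
    """Propagate pattern p layer by layer; the input layer passes p through."""
    outputs = list(pattern)  # O_i = p_i
    for weights in weight_matrices:
        outputs = layer_output(outputs, weights)
    return outputs


# Hypothetical 2-2-1 network with made-up weights.
w_input_hidden = [[0.5, -0.5],
                  [0.5, -0.5]]
w_hidden_output = [[1.0],
                   [-1.0]]
print(feed_forward([1, 1], [w_input_hidden, w_hidden_output]))
```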

Backpropagation Learning Algorithm
supervised learning
error is a function of the weights $w_i$:
$E(W) = E(w_1, w_2, \ldots, w_n)$
we are looking for a minimal error
minimal error = hollow in the error surface
backpropagation uses the gradient for weight adaptation


[Figure: error curve over weight1 and weight2]

Problem

[Figure: network with input layer, hidden layer, output and teaching output]

error in the output layer:
difference between output and teaching output
error in a hidden layer?

Gradient descent
Gradient:
vector orthogonal to a surface, pointing in the direction of the steepest slope
the derivative of a function in a certain direction is the projection of the gradient onto this direction

[Figure: example of an error curve of a weight $w_i$]
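
For a single weight, gradient descent reduces to repeatedly stepping against the slope of the error curve. The following Python sketch illustrates this; the error curve $E(w) = (w - 0.2)^2$, the start value and the learning rate are made-up assumptions, not values read from the figure.

```python
def gradient_descent(slope, w_start, learning_rate=0.1, steps=100):
    """Follow the negative slope of an error curve E(w) for a single weight."""
    w = w_start
    for _ in range(steps):
        w -= learning_rate * slope(w)  # step against the gradient
    return w


# Hypothetical error curve E(w) = (w - 0.2)**2 with its minimum at w = 0.2.
def slope(w):
    return 2.0 * (w - 0.2)


print(gradient_descent(slope, w_start=-1.0))  # close to 0.2
```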

Example: Newton-Approximation
calculation of the root
$f(x) = x^2 - a$, e.g. $f(x) = x^2 - 5$
$\tan\beta = f'(x) = 2x$
$\tan\beta = f(x) / (x - x')$
$x' = \frac{1}{2}(x + a/x)$

$x = 2$
$x' = \frac{1}{2}(x + 5/x) = 2.25$
$x'' = \frac{1}{2}(x' + 5/x') = 2.2361$
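
The iteration can be checked with a few lines of Python. This is a small sketch; the function name `newton_sqrt` is hypothetical.

```python
def newton_sqrt(a, x, steps):
    """Approximate the root of f(x) = x**2 - a with x' = (x + a/x) / 2."""
    for _ in range(steps):
        x = 0.5 * (x + a / x)
    return x


print(newton_sqrt(5, 2.0, 1))            # 2.25
print(round(newton_sqrt(5, 2.0, 2), 4))  # 2.2361
```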

Backpropagation Learning
gradient-descent algorithm
supervised learning:
error signal used for weight adaptation
error signal:
teaching output minus calculated output, if output neuron
weighted sum of the error signals of the successors, if hidden neuron
weight adaptation:
$w'_{ij} = w_{ij} + \eta \, o_i \, \delta_j$
$\eta$: learning rate
$\delta$: error signal

Standard-Backpropagation Rule
gradient descent: derivative of a function
logistic function:

$f_{Logistic}(x) = \frac{1}{1 + e^{-x}}$

$f'_{act}(net_j) = f_{act}(net_j)\,(1 - f_{act}(net_j)) = o_j (1 - o_j)$

the error signal $\delta_j$ is therefore:

$\delta_j = o_j (1 - o_j) \sum_k \delta_k w_{jk}$ if j is a hidden neuron

$\delta_j = o_j (1 - o_j)(t_j - o_j)$ if j is an output neuron

$w'_{ij} = w_{ij} + \eta \, o_i \, \delta_j$
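
One online training step with these error signals might look as follows in Python. This is a sketch under assumptions rather than the author's implementation: it handles exactly one hidden layer, adds an always-on bias input and bias neuron (not mentioned on the slide), and uses made-up start weights; all function names are hypothetical.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(w_ih, w_ho, pattern):
    """Return (hidden outputs, output-layer outputs) for one pattern."""
    o_in = list(pattern) + [1.0]  # assumed always-on bias input
    o_hid = [logistic(sum(o_in[i] * w_ih[i][j] for i in range(len(o_in))))
             for j in range(len(w_ih[0]))]
    o_hid_b = o_hid + [1.0]  # assumed always-on bias neuron
    o_out = [logistic(sum(o_hid_b[j] * w_ho[j][k] for j in range(len(o_hid_b))))
             for k in range(len(w_ho[0]))]
    return o_hid, o_out


def error(w_ih, w_ho, pattern, target):
    """Squared error E = 1/2 * sum_k (t_k - o_k)^2 for one pattern."""
    _, o_out = forward(w_ih, w_ho, pattern)
    return 0.5 * sum((t - o) ** 2 for t, o in zip(target, o_out))


def backprop_step(w_ih, w_ho, pattern, target, eta):
    """One online weight update w'_ij = w_ij + eta * o_i * delta_j (in place)."""
    o_in = list(pattern) + [1.0]
    o_hid, o_out = forward(w_ih, w_ho, pattern)
    o_hid_b = o_hid + [1.0]
    # error signal of the output neurons: o_k (1 - o_k) (t_k - o_k)
    delta_out = [o * (1 - o) * (t - o) for o, t in zip(o_out, target)]
    # error signal of the hidden neurons: o_j (1 - o_j) sum_k delta_k w_jk
    delta_hid = [o_hid[j] * (1 - o_hid[j])
                 * sum(delta_out[k] * w_ho[j][k] for k in range(len(delta_out)))
                 for j in range(len(o_hid))]
    for j in range(len(o_hid_b)):
        for k in range(len(delta_out)):
            w_ho[j][k] += eta * o_hid_b[j] * delta_out[k]
    for i in range(len(o_in)):
        for j in range(len(delta_hid)):
            w_ih[i][j] += eta * o_in[i] * delta_hid[j]


# Hypothetical 2-2-1 network with made-up start weights (last row = bias).
w_ih = [[0.5, -0.5], [0.3, 0.8], [0.1, -0.2]]
w_ho = [[0.4], [-0.6], [0.2]]
before = error(w_ih, w_ho, (1, 0), [1])
backprop_step(w_ih, w_ho, (1, 0), [1], eta=0.1)
after = error(w_ih, w_ho, (1, 0), [1])
print(before, after)  # the error for this pattern should shrink slightly
```

The hidden error signals are computed before the hidden-to-output weights are changed, so that both layers are updated from the same forward pass.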

Backpropagation

Examples:
XOR (Excel)
Bank Customer


Backpropagation - Problems


Backpropagation-Problems
A: flat plateau
weight adaptation is slow
finding a minimum takes a lot of time

B: Oscillation in a narrow gorge


it jumps from one side to the other and back

C: leaving a minimum
if the modification in one training step is too high,
the minimum can be lost


Solutions: looking at the values

change the parameter of the logistic function in order to get other values
the modification of a weight depends on the output:
if $o_i = 0$, no modification takes place
with binary input we probably have a lot of zero values: change [0,1] into [-1/2, 1/2] or [-1,1]
use another activation function, e.g. tanh, and use values in [-1,1]

Solution: Quickprop
assumption: the error curve is a quadratic function (parabola)
calculate the vertex of the curve

$\Delta w_{ij}(t) = \frac{S(t)}{S(t-1) - S(t)} \, \Delta w_{ij}(t-1)$

slope of the error curve:

$S(t) = \frac{\partial E}{\partial w_{ij}(t)}$
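
On an exactly quadratic error curve, one Quickprop step should land on the vertex. The Python sketch below illustrates this for a single weight; the error curve $E(w) = (w - 3)^2 + 1$ and the two starting points are made-up assumptions, and `quickprop_step` is a hypothetical name.

```python
def quickprop_step(slope_prev, slope_now, delta_w_prev):
    """Weight change: delta_w(t) = S(t) / (S(t-1) - S(t)) * delta_w(t-1)."""
    return slope_now / (slope_prev - slope_now) * delta_w_prev


# Hypothetical quadratic error curve E(w) = (w - 3)**2 + 1 with vertex at w = 3.
def slope(w):
    return 2.0 * (w - 3.0)


w_prev, w_now = 0.0, 1.0  # two made-up starting points
delta_w = quickprop_step(slope(w_prev), slope(w_now), w_now - w_prev)
print(w_now + delta_w)  # 3.0
```

The formula divides by $S(t-1) - S(t)$, so a practical implementation would need to guard against equal slopes; that guard is omitted here.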

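Resilient Propagation (RPROP)

sign and size of the weight modification are calculated separately:
$b_{ij}(t)$: size of the modification

$b_{ij}(t) = \begin{cases} b_{ij}(t-1)\,\eta^{+} & \text{if } S(t-1)\,S(t) > 0 \\ b_{ij}(t-1)\,\eta^{-} & \text{if } S(t-1)\,S(t) < 0 \\ b_{ij}(t-1) & \text{otherwise} \end{cases}$

$\eta^{+} > 1$: both slopes have the same sign, bigger step
$0 < \eta^{-} < 1$: the slopes differ in sign, smaller step

$\Delta w_{ij}(t) = \begin{cases} -b_{ij}(t) & \text{if } S(t-1) > 0 \wedge S(t) > 0 \\ +b_{ij}(t) & \text{if } S(t-1) < 0 \wedge S(t) < 0 \\ -\Delta w_{ij}(t-1) & \text{if } S(t-1)\,S(t) < 0 \quad (*) \\ -\operatorname{sgn}(S(t))\,b_{ij}(t) & \text{otherwise} \end{cases}$

(*) S(t) is set to 0, S(t) := 0; at time (t+1) the 4th case will be applied.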
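
The two case distinctions can be written as a single update function for one weight. The Python sketch below is one reading of the rules above, not a reference implementation: the function name `rprop_step` is hypothetical, and the default values 1.2 and 0.5 for the two step factors are assumed example values, not taken from the slide.

```python
def rprop_step(slope_prev, slope_now, b_prev, delta_w_prev,
               eta_plus=1.2, eta_minus=0.5):
    """One RPROP update for a single weight.

    Returns (b, delta_w, slope_to_store); slope_to_store is what should be
    passed as slope_prev in the next step (0 after a sign change).
    eta_plus and eta_minus are assumed example values.
    """
    product = slope_prev * slope_now
    # size of the modification
    if product > 0:
        b = b_prev * eta_plus
    elif product < 0:
        b = b_prev * eta_minus
    else:
        b = b_prev
    # sign of the modification
    if slope_prev > 0 and slope_now > 0:
        return b, -b, slope_now
    if slope_prev < 0 and slope_now < 0:
        return b, b, slope_now
    if product < 0:
        return b, -delta_w_prev, 0.0  # undo the last step, store S(t) := 0
    sign = (slope_now > 0) - (slope_now < 0)
    return b, -sign * b, slope_now


print(rprop_step(-2.0, -1.0, b_prev=0.1, delta_w_prev=0.1))  # bigger step, same direction
print(rprop_step(-1.0, 2.0, b_prev=0.1, delta_w_prev=0.1))   # smaller step, last change undone
```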


Limits of the Learning Algorithm

it is not a model for biological learning


no teaching output in natural learning
no feedback in a natural neural network
(at least none has been discovered yet)
training of an ANN is rather time-consuming


Exercise - JavaNNS

Implement a feed-forward network consisting of
2 input neurons, 2 hidden neurons and one output neuron.
Train the network so that it simulates the XOR function.
Implement a 4-2-4 network which works like the
identity function (encoder-decoder network).
Try other versions: 4-3-4, 8-4-8, ...
What can you say about the training effort?

Pattern Recognition

[Figure: network with input layer, 1st hidden layer, 2nd hidden layer and output layer]

Example: Pattern Recognition

JavaNNS example: Font


Font Example
input = 24x24 pixel array
output layer: 75 neurons, one neuron for each character:
digits
letters (lower case, capital)
separators and operator characters

two hidden layers of 4x6 neurons each
all neurons of a row of the input layer are linked to
one neuron of the first hidden layer
all neurons of a column of the input layer are linked
to one neuron of the second hidden layer

Exercise
load the network font_untrained
train the network using various learning algorithms
(look at the SNNS documentation for the parameters and their meaning):

algorithm                        1st parameter   further parameters
Backpropagation                  2.0
Backpropagation with momentum    0.8             mu=0.6, c=0.1
Quickprop                        0.1             mg=2.0, n=0.0001
Rprop                            0.6

use various values for learning parameter, momentum, and noise:

learning parameter   0.2   0.3   0.5   1.0
momentum             0.9   0.7   0.5   0.0
noise                0.0   0.1   0.2

Example: Bank Customer


A1: Credit history
A2: debt
A3: collateral
A4: income

network architecture depends on the coding of input and output


How can we code values like good, bad, 1, 2, 3, ...?

Data Pre-processing

objectives
prospects of better results
adaptation to algorithms
data reduction
troubleshooting

methods
selection and integration
completion
transformation
normalization
coding
filter

Selection and Integration

unification of data (different origins)


selection of attributes/features
reduction
omit obviously non-relevant data
all values are equal
key values
meaning not relevant
data protection


Completion / Cleaning
Missing values
ignore / omit attribute
add values
manual
global constant ("missing value")
average
highly probable value
remove data set

noisy data
inconsistent data


Transformation

Normalization
Coding
Filter


Normalization of values
Normalization equally distributed
in the range [0,1]
e.g. for the logistic function
act = (x-minValue) / (maxValue - minValue)
in the range [-1,+1]
e.g. for activation function tanh
act = (x-minValue) / (maxValue - minValue)*2-1

logarithmic normalization
act = (ln(x) - ln(minValue)) / (ln(maxValue) - ln(minValue))
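
The three normalization formulas translate directly into Python. This is a small sketch; the function names and the example value range are hypothetical.

```python
import math


def normalize_01(x, min_value, max_value):
    """Linear normalization into [0, 1], e.g. for the logistic function."""
    return (x - min_value) / (max_value - min_value)


def normalize_pm1(x, min_value, max_value):
    """Linear normalization into [-1, +1], e.g. for tanh."""
    return (x - min_value) / (max_value - min_value) * 2 - 1


def normalize_log(x, min_value, max_value):
    """Logarithmic normalization into [0, 1]; requires positive values."""
    return (math.log(x) - math.log(min_value)) / (
        math.log(max_value) - math.log(min_value))


# Hypothetical attribute values.
print(normalize_01(25, 0, 100))   # 0.25
print(normalize_pm1(25, 0, 100))  # -0.5
print(normalize_log(10, 1, 100))  # about 0.5
```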


Binary Coding of nominal values I

no order relation, n-values


n neurons,
each neuron represents one and only one value:
example: red, blue, yellow, white, black
red      1,0,0,0,0
blue     0,1,0,0,0
yellow   0,0,1,0,0
...
disadvantage:
n neurons necessary, lots of zeros in the input
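
This coding with one neuron per value can be sketched in a few lines of Python; the function name `one_of_n` is hypothetical.

```python
def one_of_n(value, values):
    """Code a nominal value with n neurons: exactly one neuron is set to 1."""
    return [1 if v == value else 0 for v in values]


colours = ["red", "blue", "yellow", "white", "black"]
print(one_of_n("red", colours))     # [1, 0, 0, 0, 0]
print(one_of_n("yellow", colours))  # [0, 0, 1, 0, 0]
```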


Bank Customer

Are these customers good ones?

customer   credit history   debt   collateral   income
1          bad              high   adequate     3
2          good             low    adequate     2

Data Mining Cup 2002

The Problem: A Mailing Action

mailing action of a company:
special offer
estimated annual income per customer:

customer        will cancel   will not cancel
gets an offer   43.80         66.30
gets no offer   0.00          72.00

given:
10,000 sets of customer data
containing 1,000 cancellers (training)

problem:
test set containing 10,000 customer data
Who will cancel? Whom to send an offer?

Mailing Action: Aim?

no mailing action:
9,000 x 72.00 = 648,000

everybody gets an offer:
1,000 x 43.80 + 9,000 x 66.30 = 640,500

maximum (100% correct classification):
1,000 x 43.80 + 9,000 x 72.00 = 691,800

Goal Function: Lift

basis: no mailing action: 9,000 x 72.00
goal = extra income:
$lift_M = 43.80 \cdot c_M + 66.30 \cdot nk_M - 72.00 \cdot nk_M$
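
The lift formula can be checked against the income figures of the mailing action. In the Python sketch below, $c_M$ and $nk_M$ are read as the numbers of cancellers and non-cancellers among the customers M who get an offer; this reading and the function name `lift` are assumptions.

```python
def lift(cancellers_in_m, non_cancellers_in_m):
    """Extra income of mailing the group M compared with no mailing action."""
    return (43.80 * cancellers_in_m
            + 66.30 * non_cancellers_in_m
            - 72.00 * non_cancellers_in_m)


print(round(lift(1000, 9000), 2))  # everybody gets an offer: -7500.0
print(round(lift(1000, 0), 2))     # perfect classification: 43800.0
```

Both values agree with the totals for the mailing action: 640,500 - 648,000 = -7,500 and 691,800 - 648,000 = 43,800.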


Data

[Figure: data set with 32 input attributes; annotations: "important results", "missing values"]

Feed-Forward Network: What to do?

train the net with the training set (10,000)
test the net using the test set (another 10,000)
classify all 10,000 customers as canceller or loyal
evaluate the additional income

Results
data mining cup 2002
neural network project 2004

gain:
additional income by the mailing action
if the target group was chosen according to the analysis

Review Students Project

copy of the data mining cup
real data
known results
contest: motivation, enthusiasm, better results

wishes
engineering approach to data mining
real data for teaching purposes


Data Mining Cup 2007


started on April 10.


check-out couponing
Who will get a rebate coupon?
50,000 data sets for training

Data


DMC2007

~75% output = N(o),
i.e. the classification rate has to be > 75%!
first experiments: no success
deadline: May 31st


Optimization of Neural Networks


objectives
good results in an application:
better generalisation
(improve correctness)
faster processing of patterns
(improve efficiency)
good presentation of the results
(improve comprehension)


Ability to generalize
a trained net can classify data that it has never seen before
(out of the same class as the learning data)
aim of every ANN development

network too large:
all training patterns are learned by heart
no ability to generalize
network too small:
rules of pattern recognition cannot be learned
(simple example: Perceptron and XOR)


Development of an NN-application

[Flowchart:]
build a network architecture
input of training pattern
calculate network output
compare to teaching output
  error is too high: modify weights, continue with the training patterns
  quality is good enough: use test set data
evaluate output
compare to teaching output
  error is too high: change parameters
  quality is good enough: done


Possible Changes
Architecture of NN
size of a network
shortcut connections
partially connected layers
remove/add links
receptive areas

Find the right parameter values
learning parameter
size of layers
using genetic algorithms


Memory Capacity
number of patterns a network can store without generalisation
figuring out the memory capacity:
change the output layer: output layer = copy of the input layer
train the network with an increasing number of random patterns:
error becomes small: the network stores all patterns
error remains: the network cannot store all patterns
in between: memory capacity


Memory Capacity - Experiment

the output layer is a copy of the input layer
training set consisting of n random patterns
error:
error = 0: the network can store more than n patterns
error >> 0: the network cannot store n patterns
memory capacity n:
error > 0 for n patterns, error = 0 for n-1 patterns
and error >> 0 for n+1 patterns

Layers Not Fully Connected

[Figure legend: connections: new, removed, remaining]

partially connected (e.g. 75%)
remove links if the weight has been close to 0 for several training steps
build new connections (by chance)

Summary
Feed-forward network
Perceptron (has limits)
Learning is Math
Backpropagation is a Backpropagation-of-Error algorithm
works like gradient descent
Activation functions: logistic, tanh

Application in Data Mining, Pattern Recognition


data preparation is important
Finding an appropriate Architecture
