
Module 9

Feedforward Neural Networks


Estimating a Function - Neural Network Models

Applications

• Pattern recognition
• Function estimation
• Classification
• Nonlinear modeling
• Prediction / Forecast
• Time series analysis
• Visualization

Neural networks complement other existing tools.
Artificial Neural Networks
Short history of neural network development

1943 “McCulloch/Pitts Cells”, first model of a neuron


1958 “Perceptron” (Rosenblatt)
1960 “Adaline” (Widrow & Hoff)
1969 Book “Perceptrons” (Minsky & Papert)
1970s Associative memory systems
1982 Hopfield networks (energy functions, spin-glass theory)
1982 Self-organizing networks (Kohonen)
1983 Stochastic networks (Hinton & Sejnowski)
1986 “Back-Propagation of Errors” (Rumelhart, McClelland et al.)
Estimating a Linear Function - The Perceptron

Idea: find a linear function f(x) = ax + b

INPUT:  x1, x2, x3
        "weights" w_i
        "Neuron" with "bias" θ

OUTPUT: f(x) = Σ_{i=1}^{3} w_i x_i + θ
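The output computation above can be sketched in Python; the function name `perceptron_output` is illustrative, not from the slides:

```python
def perceptron_output(x, w, theta):
    """Weighted sum of the inputs plus bias: f(x) = sum_i w_i * x_i + theta."""
    return sum(wi * xi for wi, xi in zip(w, x)) + theta

# three inputs, as in the diagram above
y = perceptron_output([1.0, 2.0, 3.0], [0.5, -0.25, 0.1], 0.2)
```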
Perceptron Training (“Supervised Learning”)
Classification Task

1. Separate data into Training- and Test-Sets


• Typically an 80/20 split; better: a 50/50 split

2. Assign class labels (target values) to the data points


• Typically “1” for positive examples and
“0” or “-1” for negative examples

3. Determine the weights and bias values


• Optimization procedure, e.g. Gradient Descent

4. Test re-classification and classification accuracy


• Check separation of classes using training and test data
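Steps 1 and 4 can be sketched as follows (a minimal sketch; `split_data` and `accuracy` are hypothetical helper names, and the shuffle and seed are assumptions):

```python
import random

def split_data(data, frac_train=0.5, seed=0):
    """Step 1: shuffle, then separate the data into training and test sets
    (here a 50/50 split, as recommended above)."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * frac_train)
    return shuffled[:cut], shuffled[cut:]

def accuracy(classify, labelled):
    """Step 4: fraction of correctly classified (point, label) pairs."""
    return sum(classify(x) == y for x, y in labelled) / len(labelled)
```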
Gradient Descent Learning
The Perceptron Error Function

Minimize E[w]!   (sum over patterns µ)

E[w] = Σ_µ ( y^µ − ( Σ_i w_i x_i^µ + θ ) )²

y^µ: desired output (target value)
Σ_i w_i x_i^µ + θ: actual output (computed value)
Gradient Descent Learning
The Delta Rule (LMS rule, Widrow-Hoff rule)
w_new = w_old + Δw

Δw_i = −η ∂E/∂w_i        (η: learning rate)

Δw_i^µ = η δ^µ x_i^µ     with  δ^µ = y_target^µ − y_actual^µ

Note: θ is treated like a weight
(gradient descent moves w toward the optimum of E, where E_o = 0)
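The delta rule can be written as a small update function (a sketch; function and variable names are illustrative, and the convergence loop assumes a single repeated pattern):

```python
def delta_rule_step(w, theta, x, y_target, eta=0.1):
    """One delta-rule (LMS / Widrow-Hoff) update for a single pattern:
    delta = y_target - y_actual, then w_i += eta * delta * x_i.
    The bias theta is treated like a weight with constant input 1."""
    y_actual = sum(wi * xi for wi, xi in zip(w, x)) + theta
    delta = y_target - y_actual
    w = [wi + eta * delta * xi for wi, xi in zip(w, x)]
    return w, theta + eta * delta

# repeated presentation of a pattern drives its error toward zero
w, theta = [0.0, 0.0], 0.0
for _ in range(100):
    w, theta = delta_rule_step(w, theta, [1.0, 2.0], y_target=1.0)
```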
Gradient Descent Learning

Adaptive Learning Rate (Darken & Moody, 1991)

η_t = η_0 / (1 + t / r)

η_0: initial value
r: degree of adaptation (after r steps, 50% reduction of η)
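The schedule is a one-liner (the default values for η_0 and r are arbitrary examples, not from the slides):

```python
def adaptive_eta(t, eta0=0.5, r=100):
    """Darken & Moody schedule: eta_t = eta0 / (1 + t / r)."""
    return eta0 / (1.0 + t / r)
```

With r = 100, the rate has halved after 100 steps and quartered after 300.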
Gradient Descent Learning

Momentum Term (Rumelhart & McClelland, 1986)

Δw_i(t) = η δ x_i + α · Δw_i(t − 1)

α: constant
Δw_i(t − 1): weight change in the previous step
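The momentum update, sketched for one weight vector (the function name and the default α are assumptions):

```python
def momentum_step(eta, delta, x, prev_dw, alpha=0.9):
    """Delta-rule step plus alpha times the weight change of the previous step."""
    return [eta * delta * xi + alpha * dw for xi, dw in zip(x, prev_dw)]

# with alpha = 0 this reduces to the plain delta rule
dw = momentum_step(0.5, 1.0, [1.0, 1.0], [0.1, 0.2], alpha=0.5)
```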
Perceptron: Linear Classifier

[Figure: linear decision boundaries for separable and non-separable data]

• Separable data (optimum E_o = 0): many (infinitely many) solutions; the problem is "ill-posed"

• Non-separable data (optimum E_o > 0): one optimal solution (E = min.)
Perceptron: Linear Function Fitting

Input:   x1, x2
Weights: w1, w2
Neuron (linear)

[Figure: the plane f(x) over the (X1, X2) input space]

y = f(x) = Σ_{i=1}^{2} w_i x_i + θ

• The neuron can use other "activation" functions (e.g., sigmoidal)


• Perceptron (one neuron) classification is always linear in X
Estimating a Non-Linear Classifier Function
The Multi-Layer Network

INPUT:  x1, x2, x3
        input-to-hidden weights w_hi
Hidden Layer, with "activation function" act(ξ) = 1 / (1 + e^−ξ)
        hidden-to-output weights v_h
OUTPUT

y = f(x) = act( Σ_{h=1}^{HID} v_h · act( Σ_{i=1}^{IN} w_hi x_i + ϑ_h ) + θ )

(ϑ_h: hidden bias, θ: output bias)

Universal function approximators!
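A forward pass through such a network might look like this (a sketch; `mlp_forward` and its argument layout are assumptions, not from the slides):

```python
import math

def act(xi):
    """Sigmoidal activation function, act(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-xi))

def mlp_forward(x, W, hidden_bias, v, theta):
    """Forward pass of the two-layer network: W[h][i] is the input-to-hidden
    weight w_hi, hidden_bias[h] the hidden bias, v[h] the hidden-to-output
    weight, theta the output bias."""
    hidden = [act(sum(w_hi * xi for w_hi, xi in zip(W[h], x)) + hidden_bias[h])
              for h in range(len(W))]
    return act(sum(vh * Hh for vh, Hh in zip(v, hidden)) + theta)
```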
The Sigmoidal Activation Function

act(x) = 1 / (1 + e^−x)

[Figure: sigmoid curve rising from 0 to 1 over x = −6 … 6]

• compresses the output to the (0, 1) range

• its derivative is: act'(x) = act(x) · (1 − act(x))
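The derivative identity can be checked numerically (a small sketch; names are illustrative):

```python
import math

def act(x):
    return 1.0 / (1.0 + math.exp(-x))

def act_prime(x):
    """act'(x) = act(x) * (1 - act(x)), expressed through the function itself."""
    a = act(x)
    return a * (1.0 - a)

# compare against a central finite difference at an arbitrary point
h, x0 = 1e-6, 0.7
numeric = (act(x0 + h) - act(x0 - h)) / (2 * h)
```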
Gradient Descent Learning (Back-Propagation of Errors)
Multi-Layer Network: Function Approximation

Input:  x1, x2
Input-to-hidden weights: W_11, W_12, W_21, W_22
Hidden (sigmoidal), hidden-to-output weights V1, V2
Output (linear)

[Figure: the nonlinear surface f(x) over the (X1, X2) input space]

y = f(x) = Σ_{h=1}^{HID} v_h · act( Σ_{i=1}^{IN} w_hi x_i + ϑ_h ) + θ
Gradient Descent Learning
The Multi-Layer Network Error Function

Minimize E[w]!   (sum over patterns µ)

E[w] = Σ_µ ( y^µ − act( Σ_h v_h · act( Σ_i w_hi x_i^µ + ϑ_h ) + θ ) )²

y^µ: desired output (target value)
act(…): actual output (computed value); the inner act(…) is the hidden neuron output
Back-Propagation of Errors (“Backprop”) (1)

v_h^new = v_h^old + Δv_h

Δv_h = −η ∂E/∂v_h        (hidden-to-output weights)

After presentation of a pattern µ:

Δv_h = η δ H_h           (H_h: hidden output)

δ = (y_target − y_actual) · y_actual · (1 − y_actual)

(using act'(x) = act(x) · (1 − act(x)))
Back-Propagation of Errors (“Backprop”) (2)

w_hi^new = w_hi^old + Δw_hi

Δw_hi = −η ∂E/∂w_hi      (input-to-hidden weights)

After presentation of a pattern µ:

Δw_hi = η δ_h x_i        (x_i: input variables)
Back-Propagation of Errors (“Backprop”) (3)

Δw_ij = η · δ_j · inp_i   with  δ_j = ( Σ_k δ_k v_jk ) · sig(net_j) · (1 − sig(net_j))

Δv_jk = η · δ_k · o_j     with  δ_k = (t_k − o_k) · sig(net_k) · (1 − sig(net_k))
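Both update rules can be sketched for a network with a single sigmoidal output unit (the function name and argument layout are assumptions, not from the slides):

```python
def backprop_deltas(x, hidden, y, target, v, eta=0.1):
    """Weight changes for one pattern with one sigmoidal output unit.

    delta_out = (t - o) * sig'(net_out), using sig'(net) = o * (1 - o);
    the hidden deltas back-propagate delta_out through the weights v[h]."""
    delta_out = (target - y) * y * (1.0 - y)
    dv = [eta * delta_out * hidden[h] for h in range(len(v))]
    delta_hidden = [delta_out * v[h] * hidden[h] * (1.0 - hidden[h])
                    for h in range(len(v))]
    dW = [[eta * delta_hidden[h] * xi for xi in x] for h in range(len(v))]
    return dv, dW
```

When target and output agree, all weight changes vanish, as expected for a gradient step.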
Neural Network Training

[Figure: error vs. training time; the training error keeps decreasing,
while the validation and test errors pass through a minimum and rise again]

"Forced Stop": stop training at the minimum of the validation error
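The "forced stop" idea can be sketched as a generic loop (the callback API is hypothetical; in practice `train_epoch` would run one pass of backprop and `validation_error` would evaluate the validation set):

```python
def train_with_early_stop(train_epoch, validation_error, max_epochs=1000, patience=5):
    """Train while the validation error keeps improving; stop ("forced stop")
    after `patience` consecutive epochs without improvement."""
    best, stale = float("inf"), 0
    for _ in range(max_epochs):
        train_epoch()
        err = validation_error()
        if err < best:
            best, stale = err, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best

# simulated run: validation error falls, then rises again
errors = iter([5.0, 4.0, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5])
best = train_with_early_stop(lambda: None, lambda: next(errors), patience=3)
```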
Mapping Chemical Space: “Drugs” and “Nondrugs”

120-dimensional data, Ghose & Crippen parameters


5’000 drugs, 5’000 nondrugs (Sadowski & Kubinyi, 1998)
Application of Neural Networks: “Drug-Likeness”
120 Ghose & Crippen descriptors as input; the network computes a score y = f(x)

[Figure: score histograms, score axis from 0 to 1]

Drugs:     Σ = 24% of molecules score near 0, Σ = 76% near 1
Nondrugs:  Σ = 76% of molecules score near 0, Σ = 24% near 1
“Drug-Likeness” ?
[Chemical structures of three correctly classified drugs]

Rocephin™ (Ceftriaxone): Score = 0.98
Fortovase™ (Saquinavir): Score = 0.96
Viagra™ (Sildenafil):    Score = 0.94
[Chemical structure]

Xenical™ (Orlistat): Score = 0.54
The Jury Decision Approach
Encoder Network

Training Mode:  pattern vector → pattern vector

Mapping Mode:   pattern vector → Factor 1, Factor 2
Sequence Analysis by ANN
Residue encoding: “Unary” vectors

[Network: a "sliding window" over the sequence (M R N L L V I …) is
presented to the Input layer; Hidden layer; Output gives the Score]

A  10000000000000000000
C  01000000000000000000
D  00100000000000000000
E  00010000000000000000
F  00001000000000000000
G  00000100000000000000
H  00000010000000000000
I  00000001000000000000
K  00000000100000000000
L  00000000010000000000
M  00000000001000000000
N  00000000000100000000
P  00000000000010000000
Q  00000000000001000000
R  00000000000000100000
S  00000000000000010000
T  00000000000000001000
V  00000000000000000100
W  00000000000000000010
Y  00000000000000000001
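The unary encoding and the sliding window can be sketched as follows (function names are illustrative; the residue ordering follows the table above):

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # row order of the table above

def unary_encode(residue):
    """20-bit one-hot ("unary") vector for a single residue."""
    vec = [0] * 20
    vec[AMINO_ACIDS.index(residue)] = 1
    return vec

def sliding_windows(sequence, width):
    """Overlapping windows fed to the network one position at a time."""
    return [sequence[i:i + width] for i in range(len(sequence) - width + 1)]

# a window of 3 residues becomes a 60-dimensional input vector
window = sliding_windows("MRNLLVI", 3)[0]
inputs = [bit for r in window for bit in unary_encode(r)]
```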
SignalP Output
Neural network tutorial

http://diwww.epfl.ch/mantra/tutorial/english/
