Practically, a single hidden layer allows only global feature interactions,
often achieving a better approximation in one part of the input space at the
expense of another… so… 2 hidden layers?
Cross Validation
[Figure: cross-validation data partition. The universe of all possible inputs, with its probability distribution, determines the generalization error. The available examples (the training set) are partitioned into an estimation set, used to fit the model, and a validation set, used to test it, so as to limit model complexity / avoid overfitting.]
Use of cross-validation results in this choice:
As training progresses, the network learns mapping functions of increasing
complexity, from fairly simple to more complex.
Training (on the estimation data) can be stopped at regular intervals
(epochs) and the network then tested on the validation data, for early
stopping.
(In practice the validation-error curve is usually not this smooth; it
contains multiple minima.)
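A minimal sketch of this early-stopping loop, assuming numpy arrays and a hypothetical `model` object whose `train_one_epoch`, `error`, `get_weights`, and `set_weights` methods stand in for whatever training interface is actually used:

```python
import numpy as np

def train_with_early_stopping(model, est_x, est_y, val_x, val_y,
                              max_epochs=1000, check_every=5, patience=3):
    """Stop when the validation error has not improved for `patience` checks."""
    best_err, best_weights, strikes = np.inf, model.get_weights(), 0
    for epoch in range(max_epochs):
        model.train_one_epoch(est_x, est_y)   # fit on the estimation set only
        if (epoch + 1) % check_every == 0:    # test at regular intervals
            err = model.error(val_x, val_y)   # error on the validation set
            if err < best_err:
                best_err, best_weights, strikes = err, model.get_weights(), 0
            else:
                strikes += 1                  # validation error went up
                if strikes >= patience:       # tolerate a few rises, since the
                    break                     # curve has multiple minima
    model.set_weights(best_weights)           # roll back to the best epoch
    return best_err
```

The `patience` parameter is exactly the allowance for the note above: a single rise in validation error may be a local bump rather than the true minimum.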
Variants of Cross-Validation
Described earlier: a single random partition into estimation and validation sets.
[Figure: multifold cross-validation with K = 4. The data is randomly partitioned into 4 blocks; in each trial one block (blue: validation) is held out and the rest are used for training. The validation error is averaged over the 4 trials.]
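A sketch of the K = 4 scheme in the figure, assuming numpy arrays and a hypothetical `train_and_eval` callable that trains on the estimation split and returns the validation error:

```python
import numpy as np

def multifold_validation_error(train_and_eval, x, y, k=4, seed=0):
    """Randomly partition the data into k blocks, hold each block out once
    as the validation set, and average the k validation errors."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))        # random partition of the examples
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        val = folds[i]                   # the held-out ("blue") block
        est = np.concatenate([folds[j] for j in range(k) if j != i])
        errors.append(train_and_eval(x[est], y[est], x[val], y[val]))
    return np.mean(errors)               # averaged over the k trials
```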
Weight decay!
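For reference, the usual weight-decay penalty adds a term that shrinks unneeded weights toward zero (standard form; the $\lambda$ here is a regularization constant, not the Lagrange multiplier used below):

$E_{\text{total}}(\mathbf{w}) = E(\mathbf{w}) + \lambda \sum_i w_i^{2}$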
Taylor expansion of the error around the trained weights:
$\delta E = \left(\frac{\partial E}{\partial \mathbf{w}}\right)^{\top}\!\delta\mathbf{w} + \frac{1}{2}\,\delta\mathbf{w}^{\top}\mathbf{H}\,\delta\mathbf{w} + O\!\left(\lVert\delta\mathbf{w}\rVert^{3}\right)$
The first-order term is ignored! (assuming we are at a flat gradient, i.e. a local minimum)
Optimal Brain Surgeon (OBS)
Lagrangian for pruning weight $w_i$ (constraint $\mathbf{1}_i^{\top}\Delta\mathbf{w} + w_i = 0$):

$S = \tfrac{1}{2}\,\Delta\mathbf{w}^{\top}\mathbf{H}\,\Delta\mathbf{w} - \lambda\left(\mathbf{1}_i^{\top}\Delta\mathbf{w} + w_i\right)$

Setting $\partial S/\partial\Delta\mathbf{w} = \mathbf{0}$ gives $\Delta\mathbf{w} = \lambda\,\mathbf{H}^{-1}\mathbf{1}_i$.

Substituting here: $\lambda = -\,\dfrac{w_i}{[\mathbf{H}^{-1}]_{i,i}}$

Optimal update and saliency $\alpha_i$ of weight $i$:

$\Delta\mathbf{w} = -\,\frac{w_i}{[\mathbf{H}^{-1}]_{i,i}}\,\mathbf{H}^{-1}\mathbf{1}_i, \qquad \alpha_i = \frac{w_i^{2}}{2\,[\mathbf{H}^{-1}]_{i,i}} \quad \text{(increase in error)}$
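A sketch of one OBS pruning step, assuming the inverse Hessian `h_inv` has already been computed (how to obtain it cheaply is a separate question):

```python
import numpy as np

def obs_prune_step(w, h_inv):
    """One Optimal Brain Surgeon step: delete the weight with the smallest
    saliency alpha_i = w_i^2 / (2 [H^-1]_ii) and adjust ALL other weights."""
    diag = np.diag(h_inv)
    saliency = w ** 2 / (2.0 * diag)               # predicted increase in error
    i = int(np.argmin(saliency))                   # least salient weight
    delta_w = -(w[i] / h_inv[i, i]) * h_inv[:, i]  # optimal correction H^-1 1_i
    w_new = w + delta_w                            # w_new[i] is driven to zero
    return w_new, i, saliency[i]
```

Unlike simple magnitude pruning, the remaining weights are corrected in the same step, which is what the Lagrangian solution above buys.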
Some other aspects
• Replicator mapping: encoder - decoder (the network is trained to reproduce its input through a narrow hidden bottleneck; a sketch follows this list)
• Function approximation: the MLP is a universal approximator
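In the standard statement of the universal-approximation property (notation assumed here), a single hidden layer of $m$ units with a bounded, non-constant activation $\varphi$ can approximate any continuous function on a compact set arbitrarily well:

$F(\mathbf{x}) = \sum_{i=1}^{m} \alpha_i\,\varphi\!\left(\mathbf{w}_i^{\top}\mathbf{x} + b_i\right) \approx f(\mathbf{x})$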
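A minimal sketch of a replicator (encoder-decoder) mapping, kept linear for brevity; the function name and all hyperparameters are illustrative:

```python
import numpy as np

def train_replicator(x, hidden=2, lr=0.01, epochs=500, seed=0):
    """Learn w_enc, w_dec so that x -> code -> x_hat reproduces the
    input through a narrow bottleneck of `hidden` units."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w_enc = rng.normal(scale=0.1, size=(d, hidden))
    w_dec = rng.normal(scale=0.1, size=(hidden, d))
    for _ in range(epochs):
        code = x @ w_enc                 # encoder
        x_hat = code @ w_dec             # decoder
        err = x_hat - x                  # reconstruction error
        w_dec -= lr * code.T @ err / n   # gradient of the squared error
        w_enc -= lr * x.T @ (err @ w_dec.T) / n
    return w_enc, w_dec
```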
Supervised Learning is Numerical Optimization
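In symbols (a standard formulation, with $F(\mathbf{x};\mathbf{w})$ denoting the network map): training selects the weights by minimizing a cost over the examples,

$\mathbf{w}^{*} = \arg\min_{\mathbf{w}} E(\mathbf{w}), \qquad E(\mathbf{w}) = \frac{1}{2}\sum_{n}\lVert \mathbf{d}_{n} - F(\mathbf{x}_{n};\mathbf{w}) \rVert^{2}$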
Steepest descent: $\Delta\mathbf{w} = -\eta\,\mathbf{g}$.  Newton's method: $\Delta\mathbf{w} = -\mathbf{H}^{-1}\mathbf{g}$ (both sketched below).
Problems (with the Newton step):
• Inverse-Hessian calculation is expensive
• The Hessian may be singular (requiring a pseudo-inverse) or rank-deficient (ill-conditioned)
• For a non-quadratic cost, there is no guarantee of convergence.
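A sketch of the two update rules, assuming `grad` and `hessian` are callables returning the gradient vector and Hessian matrix of the cost at `w`:

```python
import numpy as np

def steepest_descent_step(w, grad, eta=0.1):
    """Steepest descent: step against the local gradient."""
    return w - eta * grad(w)

def newton_step(w, grad, hessian):
    """Newton's method: scale the gradient by the inverse Hessian.
    pinv is used because H may be singular or rank-deficient."""
    return w - np.linalg.pinv(hessian(w)) @ grad(w)
```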
If the cost is a sum of squares, then Gauss-Newton!
$\min_{\mathbf{w}}\; E(\mathbf{w}) = \tfrac{1}{2}\sum_i e_i(\mathbf{w})^{2} \quad\Rightarrow\quad \Delta\mathbf{w} = -\left(\mathbf{J}^{\top}\mathbf{J}\right)^{-1}\mathbf{J}^{\top}\mathbf{e}$
Only first derivatives are needed: $\mathbf{J}$ is the Jacobian of the error vector $\mathbf{e}$, and $\mathbf{J}^{\top}\mathbf{J}$ replaces the full Hessian.
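A sketch of the resulting iteration, with hypothetical `residuals` and `jacobian` callables returning $\mathbf{e}(\mathbf{w})$ and $\mathbf{J}(\mathbf{w})$; `pinv` again guards against a singular $\mathbf{J}^{\top}\mathbf{J}$:

```python
import numpy as np

def gauss_newton(w, residuals, jacobian, steps=20):
    """Gauss-Newton for a sum-of-squares cost E(w) = 0.5 * sum(e_i(w)^2)."""
    for _ in range(steps):
        e = residuals(w)    # error vector e(w)
        j = jacobian(w)     # Jacobian de/dw
        w = w - np.linalg.pinv(j.T @ j) @ (j.T @ e)
    return w
```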
After proceeding a bit, progress slows as the gradient falls off.