
Chapter 7:

Regularization for
Deep Learning

Deep Learning Textbook Study Group, SF

Safak Ozkan
April 15, 2017

Chapter 7: Regularization for Deep Learning

L2 Parameter Regularization
L1 Parameter Regularization
Norm Penalties and Constrained Optimization
Regularization and Under-Constrained Problems
Dataset Augmentation
Noise Robustness
Injecting Noise at Output Targets
Early Stopping
Semi-Supervised Learning
Multi-Task Learning
Parameter Tying and Parameter Sharing
Bagging and Other Ensemble Methods
Dropout
Adversarial Training
Tangent Distance, Manifold Tangent Classifier
Definition

Regularization is any modification we make to a learning algorithm that is intended to reduce its test error but NOT its training error.

$E_{\text{train}}$: training error
$E_{\text{test}}$: test error (or generalization error)

L2 Regularization
(a.k.a. Weight decay, Tikhonov regularization, Ridge regression)

Regularization increases bias and reduces variance.

Regularized cost function:

    $\tilde{J}(w; X, y) = J(w; X, y) + \frac{\alpha}{2} w^\top w$

where $\alpha$ is the regularization parameter, $J(w; X, y)$ is the unregularized cost function, and $\frac{\alpha}{2} w^\top w$ is the regularization term.

Gradient Descent update rule:

    $w \leftarrow w - \epsilon\,(\nabla_w J(w; X, y) + \alpha w) = (1 - \epsilon\alpha)\, w - \epsilon \nabla_w J(w; X, y)$

The additional term $-\epsilon\alpha w$ shrinks the weight vector by a constant factor on every step.
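As a concrete illustration (not from the slides), here is a minimal NumPy sketch of this weight-decay update applied to linear regression; the data, learning rate, and $\alpha$ are made up for the example.

import numpy as np

# Minimal sketch: gradient descent with L2 weight decay on linear regression
# (illustrative data and hyperparameters, not from the slides).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=100)

alpha = 0.1      # regularization parameter
epsilon = 0.01   # learning rate
w = np.zeros(5)

for _ in range(1000):
    grad_J = X.T @ (X @ w - y) / len(y)                # gradient of the unregularized cost
    w = (1 - epsilon * alpha) * w - epsilon * grad_J   # weight decay update

print(w)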

L2 Regularization

Lagrangian Constrained Optimization

Minimizing the regularized cost

    $\tilde{J}(w) = J(w) + \alpha\, \Omega(w)$

is equivalent to optimizing

    $\min_w J(w)$ such that $\Omega(w) \le k$,

where the regularization parameter $\alpha$ plays the role of the Lagrange multiplier for the constraint.

L2 Regularization

Lagrangian Constrained Optimization

We typically don't set the constraint size $k$ explicitly; we set $\alpha$ instead.

[Figure: contours of the unregularized cost around the unregularized solution $w^*$, together with the $L^2$ constraint region. The regularized solution $\tilde{w}$ lies where the contours meet the constraint region; a large $\alpha$ corresponds to a small constraint region.]

L2 Regularization

2nd-degree Taylor approximation of $J$ around $w^*$, the minimum of the unregularized problem (the gradient term vanishes at the minimum):

    $\hat{J}(w) = J(w^*) + \frac{1}{2}(w - w^*)^\top H\, (w - w^*)$

At the minimum of the regularized cost, $\tilde{w} = (H + \alpha I)^{-1} H\, w^*$.

Analysis through eigenvector decomposition ($H = Q \Lambda Q^\top$):

    $\tilde{w} = Q\, (\Lambda + \alpha I)^{-1} \Lambda\, Q^\top w^*$

Stretching in the $i$-th eigen-direction is by the factor $\frac{\lambda_i}{\lambda_i + \alpha}$: small eigen-directions will be affected (shrunk) more than larger eigen-directions.
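A short NumPy check of this shrinkage (illustrative only; the Hessian and $w^*$ below are made up): the regularized solution equals $w^*$ rescaled by $\lambda_i / (\lambda_i + \alpha)$ along each eigen-direction.

import numpy as np

# Illustrative diagonal Hessian with one large and one small eigenvalue, plus a made-up w*.
H = np.diag([10.0, 0.1])
w_star = np.array([1.0, 1.0])
alpha = 1.0

w_tilde = np.linalg.solve(H + alpha * np.eye(2), H @ w_star)
print(w_tilde)                             # approx [0.909, 0.091]
print(np.diag(H) / (np.diag(H) + alpha))   # same shrinkage factors per eigen-direction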

L2 Regularization
Normal Equations for Linear Regression

Normal equations:

    unregularized:    $w = (X^\top X)^{-1} X^\top y$
    L2-regularized:   $w = (X^\top X + \alpha I)^{-1} X^\top y$

$X^\top X$ is proportional to the covariance of the input features; $X^\top y$ is proportional to the covariance of the input features with the target values.

Assume $\lambda_1 \ll \lambda_2$ for the eigenvalues of $X^\top X$. Then the component of $w$ along the first eigen-direction would shrink more than the component along the second.

$L^2$ regularization adds $\alpha$ to the diagonal of $X^\top X$, i.e. it causes the learning algorithm to perceive the input with increased variance.
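A minimal NumPy sketch of the two normal-equation solutions (the data here is synthetic and only for illustration):

import numpy as np

# Synthetic data, purely illustrative.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + 0.5 * rng.normal(size=50)

alpha = 1.0
w_ols = np.linalg.solve(X.T @ X, X.T @ y)                        # unregularized solution
w_ridge = np.linalg.solve(X.T @ X + alpha * np.eye(3), X.T @ y)  # L2-regularized solution
print(w_ols, w_ridge)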

L1 Regularization
(a.k.a. LASSO)

Regularization term:

    $\Omega(w) = \|w\|_1 = \sum_i |w_i|$

2nd-degree Taylor approximation of $J$ around $w^*$ (with a diagonal Hessian) gives the solution

    $w_i = \operatorname{sign}(w_i^*)\, \max\!\left\{ |w_i^*| - \frac{\alpha}{H_{i,i}},\; 0 \right\}$

(Induces sparsity: components with $|w_i^*| \le \alpha / H_{i,i}$ are set exactly to zero.)
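An illustrative NumPy implementation of this soft-thresholding solution (the $w^*$, Hessian diagonal, and $\alpha$ below are made up):

import numpy as np

def l1_soft_threshold(w_star, h_diag, alpha):
    """Closed-form L1-regularized solution under a diagonal-Hessian approximation."""
    return np.sign(w_star) * np.maximum(np.abs(w_star) - alpha / h_diag, 0.0)

w_star = np.array([0.8, -0.05, 2.0])   # made-up unregularized optimum
h_diag = np.array([1.0, 1.0, 1.0])     # made-up Hessian diagonal
print(l1_soft_threshold(w_star, h_diag, alpha=0.1))  # small component -> exactly 0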

Under-Constrained Problems
E.g. Logistic Regression

Linearly non-separable data: a well-behaved problem.
Linearly separable data: an under-determined problem ($\|w\|$ will continue to increase in a GD algorithm).
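A small NumPy sketch (synthetic 1-D data, made up for illustration) showing that on linearly separable data the weight of an unregularized logistic regression keeps growing under gradient descent:

import numpy as np

# Linearly separable 1-D data: all negatives left of all positives (illustrative).
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0, 0, 1, 1])

w, b, lr = 0.0, 0.0, 0.5
for step in range(1, 5001):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))   # sigmoid predictions
    w -= lr * np.mean((p - y) * x)           # gradient of the logistic loss w.r.t. w
    b -= lr * np.mean(p - y)
    if step % 1000 == 0:
        print(step, abs(w))                  # |w| keeps increasing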

Data Augmentation
The best way to improve the generalization of a model is to train it on more data.
Data Augmentation works particularly well for Object Recognition tasks.
Injecting noise into the input works well for Speech Recognition.
[Figure: an original input image with example augmentations: affine distortion, noise, elastic deformation, horizontal flip, random translation, hue shift.]
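A minimal NumPy sketch of two of these augmentations, horizontal flip and random translation (the function names are my own, and the translation simply wraps around at the borders for brevity):

import numpy as np

def horizontal_flip(img):
    """Mirror an H x W x C image left-to-right."""
    return img[:, ::-1, :]

def random_translation(img, max_shift=4, rng=None):
    """Shift an image by a random offset (wraps around at the borders for simplicity)."""
    if rng is None:
        rng = np.random.default_rng()
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return np.roll(img, shift=(dy, dx), axis=(0, 1))

img = np.random.rand(32, 32, 3)            # stand-in for a real training image
augmented = random_translation(horizontal_flip(img))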

Noise Robustness
Addition of noise with a small variance to the inputs is, for some models, equivalent to imposing a norm penalty on the weights.
Noise on weights: a stochastic implementation of Bayesian inference (uncertainty in the weights is represented by a probability distribution).

For each input example, apply noise to the weights: $\epsilon_W \sim \mathcal{N}(0,\, \eta I)$.

Modified cost function:

    $\tilde{J}_W = \mathbb{E}_{p(x,y,\epsilon_W)}\!\left[ (\hat{y}_{\epsilon_W}(x) - y)^2 \right]$

For small $\eta$, this is equivalent to the original cost plus a regularization term $\eta\, \mathbb{E}_{p(x,y)}\!\left[ \|\nabla_W \hat{y}(x)\|^2 \right]$, which favors minima that are insensitive to small perturbations of the weights.
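A toy NumPy sketch of this idea (a linear model on synthetic data, with made-up noise scale and learning rate): for each example, the weights used in the forward pass are perturbed by Gaussian noise before the gradient step.

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.0, -1.0, 2.0, 0.5])

w = np.zeros(4)
eta, lr = 0.01, 0.05   # noise variance and learning rate (made up)

for epoch in range(100):
    for x_i, y_i in zip(X, y):
        eps_w = rng.normal(scale=np.sqrt(eta), size=4)   # sample noise on the weights
        err = (w + eps_w) @ x_i - y_i                    # forward pass with noisy weights
        w -= lr * err * x_i                              # squared-error gradient step
print(w)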

Early Stopping

Early stopping acts like $L^2$ regularization with an effective regularization parameter

    $\alpha \approx \frac{1}{\tau \epsilon}$

where $\tau$ is the number of parameter-update steps and $\epsilon$ is the learning rate: fewer steps (or a smaller learning rate) correspond to stronger regularization.
Early Stopping

Early stopping: terminate training while validation set performance is still better, i.e. before it starts to degrade.

[Figure 7.3 from the textbook: learning curves showing negative log-likelihood loss vs. training time (epochs). The training set loss decreases steadily, while the validation set loss reaches a minimum and then begins to increase.]
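A generic early-stopping loop (framework-agnostic Python sketch; train_one_epoch and validation_loss are hypothetical placeholders, and the patience value is made up):

import copy

def fit_with_early_stopping(model, train_one_epoch, validation_loss,
                            max_epochs=250, patience=10):
    """Stop training when the validation loss has not improved for `patience` epochs."""
    best_loss = float("inf")
    best_model = copy.deepcopy(model)
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        train_one_epoch(model)                 # hypothetical training step
        val_loss = validation_loss(model)      # hypothetical validation evaluation
        if val_loss < best_loss:
            best_loss = val_loss
            best_model = copy.deepcopy(model)  # keep the best parameters seen so far
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                          # validation performance stopped improving
    return best_model, best_loss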
