
Deep Neural Networks

Convolutional Neural Network

[Figure: the classic 1998 CNN pipeline (LeNet) — feature-learning layers feeding a final classification stage]

Focus on end-to-end learning!
The Operations in Detail

[Figure: the basic operation — a weighted sum (dot product), followed by an activation function]
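As a concrete sketch of this operation: each output value is a dot product of the weights with an input patch, plus a bias, passed through an activation function. The snippet below is illustrative only (NumPy and the function names are assumptions, not from the slides):

```python
import numpy as np

def weighted_sum(patch, filt, bias):
    """One output value: dot product of the filter with an input patch, plus bias."""
    return np.sum(patch * filt) + bias

def relu(x):
    """ReLU activation: max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

# Toy example: a 3x3x3 patch from an RGB image and one 3x3x3 filter.
patch = np.random.randn(3, 3, 3)
filt = np.random.randn(3, 3, 3)
bias = 0.1

pre_activation = weighted_sum(patch, filt, bias)   # the weighted sum / dot product
activation = relu(pre_activation)                  # the activation function
```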

Spatial / dimensional structures are preserved.

If we use 6 such filters, we get 6 activation maps stacked along the depth dimension.

In a ConvNet, neurons are arranged in 3D grids (width × height × depth).
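A minimal shape-level sketch of such a layer (the filter count, sizes, and NumPy usage are illustrative assumptions):

```python
import numpy as np

def conv_layer(image, filters, biases, stride=1):
    """Valid convolution: slide each filter over the image and stack the maps along depth."""
    H, W, _ = image.shape
    num_filters, F, _, _ = filters.shape          # filters: (num_filters, F, F, depth)
    out_h = (H - F) // stride + 1
    out_w = (W - F) // stride + 1
    out = np.zeros((out_h, out_w, num_filters))
    for k in range(num_filters):
        for i in range(out_h):
            for j in range(out_w):
                patch = image[i*stride:i*stride+F, j*stride:j*stride+F, :]
                out[i, j, k] = np.sum(patch * filters[k]) + biases[k]
    return out

image = np.random.randn(32, 32, 3)                # e.g. a 32x32 RGB image
filters = np.random.randn(6, 5, 5, 3)             # 6 filters of size 5x5x3
biases = np.zeros(6)
activation_maps = conv_layer(image, filters, biases)
print(activation_maps.shape)                      # (28, 28, 6): a 3D grid of neurons
```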

ReLU – Rectified Linear Unit (the activation function)
Pool – a sub-sampling-like operation

Strides and Zero Padding

• Zero padding by ⌊F/2⌋ [usually], where F is the filter size

• Padding increases the effective input size N and hence the output size (see the sketch below)
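For reference, the standard relation behind these bullets is output size = (N − F + 2P)/S + 1 for input size N, filter size F, padding P, and stride S; with P = ⌊F/2⌋ and S = 1 the spatial size is preserved (for odd F). A small sketch (function name is illustrative):

```python
def conv_output_size(N, F, padding, stride):
    """Spatial output size of a convolution: (N - F + 2P) / S + 1."""
    assert (N - F + 2 * padding) % stride == 0, "filter does not tile the input evenly"
    return (N - F + 2 * padding) // stride + 1

print(conv_output_size(N=32, F=5, padding=0, stride=1))  # 28: output shrinks without padding
print(conv_output_size(N=32, F=5, padding=2, stride=1))  # 32: padding by floor(F/2) preserves the size
print(conv_output_size(N=7,  F=3, padding=0, stride=2))  # 3: stride 2 halves the spatial resolution
```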

Pooling:

• Makes representations smaller and more “manageable”
• Introduces 0 (zero) parameters
• No zero padding

Max-pooling:
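A minimal sketch of max-pooling over each depth slice (window size, stride, and NumPy usage are illustrative assumptions):

```python
import numpy as np

def max_pool(x, pool_size=2, stride=2):
    """Max-pooling: take the maximum over each pool_size x pool_size window, per depth slice."""
    H, W, D = x.shape
    out_h = (H - pool_size) // stride + 1
    out_w = (W - pool_size) // stride + 1
    out = np.zeros((out_h, out_w, D))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i*stride:i*stride+pool_size, j*stride:j*stride+pool_size, :]
            out[i, j, :] = window.max(axis=(0, 1))
    return out

maps = np.random.randn(28, 28, 6)
pooled = max_pool(maps)   # shape (14, 14, 6): spatial size halved, depth unchanged, no parameters
```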

Training the CNN: More practical approaches

One-time setup


• Activation functions
• Preprocessing
• Weight initialization (Batch Normalization)
• Regularization
Training dynamics
• Babysitting the learning process
• Hyperparameter optimization
• Parameter updates (learning rate)
Evaluation & Bottlenecks
• Model ensembles
• Fine-tuning

Relook & more: Activation functions

[Figures: activation functions, including the logistic (sigmoid)]
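For a concrete view of the functions being compared, here is a small NumPy sketch; the selection beyond the logistic and ReLU is an assumption based on what the deck mentions elsewhere:

```python
import numpy as np

def logistic(x):
    """Logistic (sigmoid): squashes to (0, 1); saturates for large |x|."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Tanh: zero-centered relative of the logistic, squashes to (-1, 1)."""
    return np.tanh(x)

def relu(x):
    """ReLU: identity for positive inputs, zero otherwise."""
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: small fixed slope alpha for negative inputs instead of zero."""
    return np.where(x > 0, x, alpha * x)

x = np.linspace(-5, 5, 11)
for f in (logistic, tanh, relu, leaky_relu):
    print(f.__name__, np.round(f(x), 3))
```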
PReLU training
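PReLU makes the negative-side slope a learned parameter (typically one per channel). A minimal sketch of the forward pass and the gradient used to train that slope; the update step and names are illustrative assumptions:

```python
import numpy as np

def prelu(x, a):
    """PReLU: like Leaky ReLU, but the negative slope `a` is learned during training."""
    return np.where(x > 0, x, a * x)

def prelu_grad_a(x, grad_out):
    """Gradient of the loss w.r.t. `a`: the upstream gradient flows only through negative inputs."""
    return np.sum(grad_out * np.where(x > 0, 0.0, x))

x = np.random.randn(8)
a = 0.25                                  # initial slope, updated by gradient descent like any weight
grad_out = np.ones_like(x)                # stand-in upstream gradient
a -= 0.01 * prelu_grad_a(x, grad_out)     # one illustrative update step
```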

Maxout

Take the element-wise max over several linear functions of the input, e.g. max(w₁ᵀx + b₁, w₂ᵀx + b₂).
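A minimal sketch of a maxout unit with two linear pieces (dimensions and names are illustrative assumptions):

```python
import numpy as np

def maxout(x, W, b):
    """Maxout: compute several linear functions of x and keep the element-wise max.
    W has shape (pieces, out_dim, in_dim); b has shape (pieces, out_dim)."""
    z = np.einsum('poi,i->po', W, x) + b     # one linear map per piece
    return z.max(axis=0)                     # max over the pieces, per output unit

x = np.random.randn(10)
W = np.random.randn(2, 4, 10)                # 2 pieces, 4 output units, 10 inputs
b = np.zeros((2, 4))
y = maxout(x, W, b)                          # shape (4,)
```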

Advice from CNN gurus

Reminder: Preprocessing
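The usual preprocessing for images is zero-centering (subtracting a mean computed on the training set), sometimes followed by scaling to unit variance. A sketch under those assumptions (array names are illustrative):

```python
import numpy as np

# train_images: (num_images, H, W, channels) — hypothetical training set
train_images = np.random.rand(100, 32, 32, 3)

mean = train_images.mean(axis=0)            # per-pixel mean, computed on the training set only
std = train_images.std(axis=0) + 1e-8       # avoid division by zero

def preprocess(batch):
    """Zero-center and (optionally) normalize with training-set statistics."""
    return (batch - mean) / std

test_batch = np.random.rand(10, 32, 32, 3)
test_batch = preprocess(test_batch)          # the SAME mean/std are reused at test time
```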

More on: Weight initialization

Optimal initialization?
Scale the initial weights by the number of inputs (variance ∝ 1 / fan-in).

This derivation is based on specific properties of tanh!

It does not work for ReLU: “intuitively”, half of the variance is gone, since ReLU zeroes out the negative half of the activations.
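The fix usually paired with this observation is to double the variance to compensate for the half that ReLU removes (He initialization); attributing that fix to these slides is an assumption, but the sketch below shows both scalings:

```python
import numpy as np

def xavier_init(fan_in, fan_out):
    """Xavier/Glorot-style init: variance ~ 1/fan_in, derived assuming tanh-like (roughly linear) units."""
    return np.random.randn(fan_out, fan_in) / np.sqrt(fan_in)

def he_init(fan_in, fan_out):
    """He init: variance ~ 2/fan_in, compensating for ReLU zeroing half of the activations."""
    return np.random.randn(fan_out, fan_in) * np.sqrt(2.0 / fan_in)

W_tanh = xavier_init(fan_in=512, fan_out=256)
W_relu = he_init(fan_in=512, fan_out=256)
print(W_tanh.std(), W_relu.std())   # the ReLU weights start with ~sqrt(2) larger spread
```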
Batch Normalization

[Figure: BN is inserted between the weighted sum (dot product) and the activation function: x → weighted sum → BN → y → activation]

If you think normalizing (independently) at each neuron could be a problem: how? Allow these (scale and shift) parameters to be learned.
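A minimal training-time sketch of batch normalization over a mini-batch; the learned scale and shift are the standard gamma and beta, and the running statistics needed at test time are omitted here:

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then apply learned scale (gamma) and shift (beta).
    x: (batch_size, num_features)."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)    # zero mean, unit variance per feature
    return gamma * x_hat + beta              # learned parameters can undo the normalization if needed

x = np.random.randn(64, 128) * 3.0 + 5.0     # a pre-activation batch with arbitrary scale and offset
gamma = np.ones(128)
beta = np.zeros(128)
y = batch_norm_train(x, gamma, beta)
print(y.mean(), y.std())                     # approximately 0 and 1
```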
More on: Regularization
From the optimization literature: add a regularizing term to the objective,

J(w) = L(w) + λ · R(w),

where R(w) measures the cost / amount of the weights, λ is a parameter controlling how much regularization is applied, and the derivative of the regularizing term w.r.t. the weights should exist.

Tikhonov regularization (the L2 penalty; ridge regression in the linear least-squares case) smooths the loss topology, removing local minima.
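A minimal sketch of an L2-regularized loss and its gradient contribution (weight decay); the symbols follow the formula above, and the rest is illustrative:

```python
import numpy as np

def l2_regularized_loss(data_loss, weights, lam):
    """J(w) = L(w) + lambda * R(w), with R(w) = 0.5 * ||w||^2 (Tikhonov / L2)."""
    reg = 0.5 * sum(np.sum(W ** 2) for W in weights)
    return data_loss + lam * reg

def l2_grad_contribution(W, lam):
    """The regularizer adds lambda * W to the gradient of each weight matrix (weight decay)."""
    return lam * W

weights = [np.random.randn(4, 3), np.random.randn(3, 2)]
print(l2_regularized_loss(data_loss=1.7, weights=weights, lam=1e-3))
```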
For CNNs:

It also, in a sense, puts a cap on the aggregate size of the weights!

In practice it is mostly found not to help much, so other kinds of regularization are used.

Dropout: randomly set the output of a neuron to 0 during the forward computation at training time. This is done probabilistically, usually with a 50% chance.
Dropout is more common in fully connected networks, but it is also used in ConvNets.

The hunch is that this works because:

It is like an ENSEMBLE: learning multiple sub-networks (and putting them together), which reduces complexity and hence over-fitting.

So what is the final trained network? At test time no neurons are dropped; instead, each neuron's output is scaled using the probability of dropout (i.e. the probability that it was kept during training), so that expected activations match those seen at training time.
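A minimal sketch using “inverted” dropout, which does the scaling at training time so the test-time network can be used unchanged; the inverted variant is an assumption, since the slides only describe the basic idea:

```python
import numpy as np

def dropout_train(x, p_drop=0.5):
    """Training: randomly zero each activation with probability p_drop.
    Inverted dropout rescales the survivors by 1/(1 - p_drop), keeping expected activations unchanged."""
    mask = (np.random.rand(*x.shape) >= p_drop) / (1.0 - p_drop)
    return x * mask

def dropout_test(x):
    """Test time: with inverted dropout, nothing needs to change."""
    return x

h = np.random.randn(1000)
print(h.mean(), dropout_train(h).mean())   # the means agree in expectation
```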
