NEURAL NETWORKS
3x3 filter
• Depth: Depth corresponds to the number of filters we use for the convolution operation.
• Stride: Stride is the number of pixels by which we slide our filter matrix over the input matrix. Having
a larger stride will produce smaller feature maps.
• Zero-padding: Sometimes it is convenient to pad the input matrix with zeros around the border, so
that we can apply the filter to the bordering elements of the input image matrix. A useful property of
zero-padding is that it lets us control the size of the feature maps. Convolution with zero-padding is
also called wide convolution; convolution without zero-padding is called narrow convolution.
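The effect of stride and zero-padding on feature-map size can be checked with the standard output-size relation (W - F + 2P) / S + 1, for input width W, filter width F, padding P and stride S. The sketch below is illustrative only; `conv_output_size` is a hypothetical helper name, not a library function.

```python
def conv_output_size(input_size: int, filter_size: int,
                     stride: int = 1, padding: int = 0) -> int:
    """Spatial size of a feature map along one dimension."""
    span = input_size - filter_size + 2 * padding
    if span < 0:
        raise ValueError("filter is larger than the padded input")
    # Floor division: filter positions that would run past the border are dropped.
    return span // stride + 1


print(conv_output_size(28, 3))             # 26 (narrow convolution)
print(conv_output_size(28, 3, stride=2))   # 13 (larger stride, smaller map)
print(conv_output_size(28, 3, padding=1))  # 28 (padding preserves the size)
```

With a 3x3 filter, one ring of zero-padding keeps the feature map the same size as the input, while doubling the stride roughly halves it.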
Non-Linearity: Rectified Linear Unit (ReLU)
An additional operation called ReLU is applied after every convolution
operation.
ReLU is an element-wise operation (applied per pixel) that replaces all
negative pixel values in the feature map with zero. Its purpose is to
introduce non-linearity into the ConvNet, since most real-world data we
want the ConvNet to learn is non-linear. Convolution itself is a linear
operation (element-wise matrix multiplication and addition), so we account
for non-linearity by introducing a non-linear function such as ReLU.
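As a minimal sketch of the operation described above, ReLU on a small feature map can be written in plain Python. The values in `feature_map` are made up for illustration.

```python
def relu(feature_map):
    """Replace every negative value in a 2-D feature map with zero."""
    return [[max(0, value) for value in row] for row in feature_map]


# Hypothetical 3x3 feature map produced by a convolution.
feature_map = [
    [3, -1, 2],
    [-5, 0, 4],
    [1, -2, -3],
]

rectified = relu(feature_map)
print(rectified)  # [[3, 0, 2], [0, 0, 4], [1, 0, 0]]
```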
The Pooling Step
Spatial Pooling (also called subsampling or downsampling) reduces the
dimensionality of each feature map but retains the most important information.
Spatial Pooling can be of different types: Max, Average, Sum etc.
Example: max pooling on a rectified feature map (obtained after the convolution + ReLU
operation) with a 2×2 window keeps only the largest value in each window.
The function of pooling is to progressively reduce the
spatial size of the input representation.
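The 2×2 max pooling step can be sketched as follows. This assumes non-overlapping windows (stride 2) and an input with even height and width; the input values are invented for illustration.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2-D list with even height and width."""
    pooled = []
    for r in range(0, len(feature_map), 2):
        row = []
        for c in range(0, len(feature_map[0]), 2):
            window = (
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            row.append(max(window))  # keep only the strongest response
        pooled.append(row)
    return pooled


# Hypothetical 4x4 rectified feature map (no negative values after ReLU).
rectified_map = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]

print(max_pool_2x2(rectified_map))  # [[6, 8], [3, 4]]
```

The 4x4 map shrinks to 2x2, which is the size reduction the pooling step is meant to provide. Swapping `max` for an average or a sum gives the other pooling types mentioned above.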
The whole CNN
Convolution
-> Max Pooling
(Convolution and Max Pooling can repeat many times)
-> Flattened
-> Fully connected feedforward network
-> Output: cat, dog, ……
CNN in Keras
Only the network structure and the input format (vector -> 3-D array) are modified.
Input: 1 x 28 x 28
-> Convolution: 25 x 26 x 26 (How many parameters for each filter? 9)
-> Max Pooling: 25 x 13 x 13
-> Convolution: 50 x 11 x 11 (How many parameters for each filter? 225 = 25 x 9)
-> Max Pooling: 50 x 5 x 5
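The two per-filter counts follow from the fact that each filter spans every input channel. The sketch below assumes 3 x 3 filters (implied by the 28 -> 26 size change and the count of 9) and leaves out bias terms, as the slide does; `weights_per_filter` is a hypothetical helper name.

```python
def weights_per_filter(in_channels: int, kernel_h: int = 3, kernel_w: int = 3) -> int:
    """Number of weights in one convolution filter, bias excluded."""
    return in_channels * kernel_h * kernel_w


first_layer = weights_per_filter(in_channels=1)    # input has 1 channel
second_layer = weights_per_filter(in_channels=25)  # input has 25 feature maps

print(first_layer)   # 9
print(second_layer)  # 225
```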
The 50 x 5 x 5 output of the last max pooling is then flattened into a vector of 1250 values
(50 x 5 x 5 = 1250) and fed to a fully connected feedforward network, which produces the output.
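The shapes in this pipeline can be traced with a few lines of plain Python. This is a sketch under stated assumptions, not Keras code: 3 x 3 filters with stride 1 and no padding, 2 x 2 non-overlapping pooling, and hypothetical helper names `conv_shape` and `pool_shape`.

```python
def conv_shape(shape, filters, kernel=3):
    """Shape (channels, height, width) after a stride-1, unpadded convolution."""
    _, height, width = shape
    return (filters, height - kernel + 1, width - kernel + 1)


def pool_shape(shape, window=2):
    """Shape after non-overlapping pooling; leftover border rows/columns are dropped."""
    channels, height, width = shape
    return (channels, height // window, width // window)


x0 = (1, 28, 28)
x1 = conv_shape(x0, filters=25)  # (25, 26, 26)
x2 = pool_shape(x1)              # (25, 13, 13)
x3 = conv_shape(x2, filters=50)  # (50, 11, 11)
x4 = pool_shape(x3)              # (50, 5, 5)

flattened = x4[0] * x4[1] * x4[2]
print(x1, x2, x3, x4, flattened)  # ends with 1250
```

The 11 x 11 map pools down to 5 x 5 because the last row and column do not fill a complete 2 x 2 window.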