
AN ALARMING SYSTEM USING IMAGE RECOGNITION &

NEURAL NETWORKS

Akash Manoj Srivastava


Anam Khatoon
Shubham Sharma
Final-CSE
AITH Kanpur
IMAGE RECOGNITION ALONE IS NOT ENOUGH, THE CONTEXT MATTERS
Object recognition, but what about context?

 Let’s take a look at the above photo. If we were to feed a machine this photo, what does the machine see?
 5 young children (2 girls, 3 boys),
 the machine might be able to match the names of the children against other photos it has already learned (Awesome!),
 different colored balloons,
 different colored mugs,
 plate of food,
 hats on kids
 a background with a purple letter P, a yellow letter P, a green letter B, a yellow letter R, and a blue letter T.
Now let’s contrast that with how a human sees this photo.

 It’s a picture from a birthday party,


 The obscured sign in the back obviously says “Happy Birthday”,
 The plate of food, although blurry, is most likely a plate
of cookies,
 The kids seem to be having fun and are smiling,
 Although it is somewhat difficult to tell for sure, there appear to be ten colored balloons showing.
 The tiara on the red-headed girl is difficult to make out, but it says “Happy Birthday”.
 Perhaps the machine could fill in some of the blanks and guess that the sign in the back says “Happy Birthday”, but if the machine takes the image literally, it will only clearly register “P P” and “B R T”.
The Implied Invisible Elements

Wind, heat, and cold are camera shy.


A human looking at a picture like the one above sees more than a series of trees, bushes, a body of water, a road, and a building in the distance. The human sees a gusty windstorm.
Sr. No  Description                                               Status
1.      Project Title                                             Done
2.      Study of conceptual theory in image recognition           Ongoing
3.      Study of Convolutional Neural Networks                    Ongoing
4.      Procuring Required Software (Anaconda, Python, Spyder)    Done
5.      Analysis and Testing Blueprint                            Ongoing
6.      Gathering Training Data & Test Data                       Done
7.      Coding process in Spyder                                  Ongoing
          Preprocessing
          Installing Required Libraries & Dependencies
          Developing a Convolutional Neural Network
8.      Testing Completed Project                                 Pending


The Model: Convolutional Neural Network

 There are four main operations in a ConvNet, as listed below:
 Convolution
 Non Linearity (ReLU)
 Pooling or Sub Sampling
 Classification (Fully Connected Layer)
 These operations are the basic building blocks of every Convolutional Neural Network, so understanding how they work is an important step toward developing a sound understanding of ConvNets.
The Convolution Step – A Quick Look
 Every image can be considered as a matrix of pixel values. Consider a 5 x 5, single-channel image whose pixel values are only 0 and 1. (For a grayscale image, pixel values range from 0 to 255; this 5 x 5 matrix is a special case where pixel values are only 0 and 1.)
 Also, consider another 3 x 3 matrix: the filter.
 Sliding the 3 x 3 filter over the 5 x 5 image and, at every position, computing the sum of element-wise products produces an output matrix called the Convolved Feature or Feature Map.
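The sketch below illustrates this step, assuming Python with NumPy (available in the project’s Anaconda environment); the particular pixel and filter values are chosen only for illustration, not taken from the slides:

import numpy as np

# Illustrative 5 x 5 binary image (example values only)
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])

# Illustrative 3 x 3 filter
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

# Slide the filter over the image (stride 1, no padding); each output
# element is the sum of element-wise products over the covered region.
out_h = image.shape[0] - kernel.shape[0] + 1
out_w = image.shape[1] - kernel.shape[1] + 1
feature_map = np.zeros((out_h, out_w), dtype=int)
for i in range(out_h):
    for j in range(out_w):
        feature_map[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

print(feature_map)   # the 3 x 3 Convolved Feature / Feature Map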


Feature Map
• The size of the Feature Map (Convolved Feature) is controlled by three parameters that we need to decide before the convolution step is performed:

• Depth: Depth corresponds to the number of filters we use for the convolution operation.

• Stride: Stride is the number of pixels by which we slide our filter matrix over the input matrix. Having
a larger stride will produce smaller feature maps.

• Zero-padding: Sometimes, it is convenient to pad the input matrix with zeros around the border, so that we can apply the filter to the bordering elements of our input image matrix. A nice feature of zero-padding is that it allows us to control the size of the feature maps. Adding zero-padding is also called wide convolution, and not using zero-padding would be a narrow convolution. A short sketch of how these parameters determine the feature map size follows this list.
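Assuming an input of width W, a filter of size F, zero-padding P, and stride S, the standard output-size formula can be sketched in Python as follows (the numbers below are illustrative):

def feature_map_size(W, F, P, S):
    # Output width (or height) = (W - F + 2P) / S + 1
    return (W - F + 2 * P) // S + 1

# A 5 x 5 input with a 3 x 3 filter, no padding, stride 1 -> 3 x 3 ("narrow" convolution)
print(feature_map_size(5, 3, 0, 1))   # 3
# The same input with zero-padding of 1 keeps the output at 5 x 5 ("wide" convolution)
print(feature_map_size(5, 3, 1, 1))   # 5
# A larger stride of 2 produces a smaller feature map
print(feature_map_size(5, 3, 1, 2))   # 3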
Non-Linearity: Rectified Linear Unit (ReLU)
 An additional operation called ReLU is used after every convolution operation.

 ReLU is an element wise operation (applied per pixel) and replaces all
negative pixel values in the feature map by zero. The purpose of ReLU is to
introduce non-linearity in our ConvNet, since most of the real-world data we
would want our ConvNet to learn would be non-linear (Convolution is a linear
operation – element wise matrix multiplication and addition, so we account
for non-linearity by introducing a non-linear function like ReLU).
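A minimal sketch of this element-wise operation, again assuming NumPy (the example values are made up):

import numpy as np

def relu(feature_map):
    # Replace every negative value in the feature map with zero
    return np.maximum(feature_map, 0)

example = np.array([[ 3, -5,  2],
                    [-1,  0,  4],
                    [ 6, -2, -7]])
print(relu(example))
# [[3 0 2]
#  [0 0 4]
#  [6 0 0]]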
The Pooling Step
Spatial Pooling (also called subsampling or downsampling) reduces the
dimensionality of each feature map but retains the most important information.
Spatial Pooling can be of different types: Max, Average, Sum etc.
 Max Pooling takes the largest element within each window of the Rectified Feature Map (obtained after the convolution + ReLU operations), for example by using a 2×2 window.
The function of Pooling is to progressively reduce the spatial size of the input representation.
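A minimal sketch of 2×2 max pooling with stride 2, assuming NumPy (the 4 x 4 rectified feature map below is purely illustrative):

import numpy as np

rectified = np.array([[1, 1, 2, 4],
                      [5, 6, 7, 8],
                      [3, 2, 1, 0],
                      [1, 2, 3, 4]])

# Split the map into non-overlapping 2 x 2 windows and keep the
# largest value from each window.
pooled = rectified.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6 8]
#  [3 4]]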
The whole CNN
A complete CNN chains these building blocks together: the Convolution and Max Pooling stages can be repeated many times, the result is then Flattened, and a Fully Connected Feedforward Network produces the final classification (e.g. cat, dog, ...):

Input -> Convolution -> Max Pooling -> Convolution -> Max Pooling -> ... -> Flattened -> Fully Connected Feedforward Network -> Output (cat, dog, ...)
CNN in Keras
Compared with a plain fully connected network, only the network structure and the input format are modified (vector -> 3-D array). For a 1 x 28 x 28 input image, the layer shapes and parameter counts are:

Input: 1 x 28 x 28
Convolution (25 filters of 3 x 3): 25 x 26 x 26, with 9 parameters per filter
Max Pooling (2 x 2): 25 x 13 x 13
Convolution (50 filters of 3 x 3): 50 x 11 x 11, with 225 = 25 x 9 parameters per filter
Max Pooling (2 x 2): 50 x 5 x 5
Flattened: 50 x 5 x 5 = 1250 values
Fully connected feedforward network: final Output
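A sketch of this network in Keras could look like the following (assuming the TensorFlow backend with channels-last ordering, so the 1 x 28 x 28 input is written as 28 x 28 x 1; the hidden-layer size and the two output classes are illustrative assumptions, not values from the slides):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
# 25 filters of 3 x 3 on a 28 x 28 x 1 input -> 26 x 26 x 25 (9 weights per filter)
model.add(Conv2D(25, (3, 3), activation='relu', input_shape=(28, 28, 1)))
# 2 x 2 max pooling -> 13 x 13 x 25
model.add(MaxPooling2D(pool_size=(2, 2)))
# 50 filters of 3 x 3 -> 11 x 11 x 50 (25 x 9 = 225 weights per filter)
model.add(Conv2D(50, (3, 3), activation='relu'))
# 2 x 2 max pooling -> 5 x 5 x 50
model.add(MaxPooling2D(pool_size=(2, 2)))
# Flatten 5 x 5 x 50 = 1250 values for the fully connected part
model.add(Flatten())
model.add(Dense(100, activation='relu'))    # illustrative hidden-layer size
model.add(Dense(2, activation='softmax'))   # e.g. two classes such as cat vs. dog
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()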
References
 Russell, S., and Norvig, P. Artificial Intelligence: A Modern Approach (3rd Edition).
 Toshev, A., and Szegedy, C. 2014. DeepPose: Human pose estimation via deep neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, 1653–1660. IEEE.
 Wang, N., and Yeung, D.-Y. 2013. Learning a deep compact image representation for visual tracking. In Advances in Neural Information Processing Systems, 809–817.
 Medium.com
 TED Talk: “How we teach computers to understand pictures” by Fei-Fei Li, https://www.youtube.com/watch?v=40riCqvRoMs
 https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/
 https://medium.com/technologymadeeasy/the-best-explanation-of-convolutional-neural-networks-on-the-internet-fbb8b1ad5df8
 https://medium.com/emergent-future/image-recognition-is-not-enough-293cd7d58004
 https://imagga.com/blog/the-top-5-uses-of-image-recognition/
 http://cs231n.github.io/convolutional-networks/#overview
 https://medium.com/@tifa2up/image-classification-using-deep-neural-networks-a-beginner-friendly-approach-using-tensorflow-94b0a090ccd4
