
Application of Transfer Learning in RGB-D Object Recognition
(1570298162)

S. Nithin Shrivatsav
National Institute of Technology, Tiruchirappalli
nithinshrivatsav@gmail.com

Outline

Motivation
Background information
Proposed Method
Results
Summary
References

Motivation
Human-Machine Interaction is rapidly developing, and researchers are striving to develop a system that can accurately classify objects.
Our goal was to dive into the field of computer vision and deep learning and contribute towards this pursuit of developing a robust object classifier.

Motivation
Aim: achieve fast and robust object recognition.
To avoid time-consuming hand-crafted features, we made use of a deep learning algorithm.
For robust object recognition, we made use of both depth and RGB images of objects.
To train the network quickly, we made use of transfer learning instead of training the weights from scratch.

Background information
In the past five years, many researchers have worked on object detection using Convolutional Neural Networks (CNNs). However, CNNs require a large amount of training data.
In this paper, we tried to overcome this problem by using transfer learning on a multi-modal neural network.
Transfer learning also helped in decreasing the learning time.

Proposed Method
Our method performs object recognition using deep neural networks.
Neural network type: Convolutional Neural Network
Network structure: multi-modal network with two streams
RGB stream
Depth stream
The two streams are fused in a late fully connected layer.
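The late-fusion step can be sketched in NumPy (a minimal illustration, not the paper's implementation; the per-stream feature dimension of 18432 and the weight shapes are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Flattened feature vectors produced by the RGB and depth streams
# (dimension 18432 assumed for illustration).
rgb_features = rng.standard_normal(18432).astype(np.float32)
depth_features = rng.standard_normal(18432).astype(np.float32)

# Late fusion: concatenate the two streams, then apply one fully
# connected layer with 2048 neurons and a ReLU non-linearity.
fused = np.concatenate([rgb_features, depth_features])          # shape (36864,)
W = (rng.standard_normal((2048, fused.shape[0])) * 0.01).astype(np.float32)
b = np.zeros(2048, dtype=np.float32)
fc_out = np.maximum(0.0, W @ fused + b)                         # ReLU, shape (2048,)

print(fused.shape, fc_out.shape)
```

Fusing this late, rather than at the pixel level, lets each stream learn modality-specific features before they are combined.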

Proposed Method
A softmax classifier is used to classify the images.
Input image size: 60x60x3
Number of feature maps after conv layer 1: 48
Number of feature maps after conv layer 2: 128
The fully connected layer consists of 2048 neurons.
Non-linearity: Rectified Linear Unit (ReLU)
Regularization method: Dropout
A max-pooling layer follows every convolutional layer.
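The layer dimensions above can be checked with plain arithmetic. The slides do not state the kernel and pooling sizes, so 5x5 valid convolutions and 2x2 non-overlapping max-pooling are assumed here purely for illustration:

```python
def conv_out(size, kernel):
    """Spatial size after a 'valid' (no-padding) convolution."""
    return size - kernel + 1

def pool_out(size, pool):
    """Spatial size after non-overlapping max-pooling."""
    return size // pool

# One stream: 60x60x3 input, two conv+pool stages
# (kernel/pool sizes assumed, not given in the slides).
size = 60
size = pool_out(conv_out(size, 5), 2)   # after conv1 (48 maps) + pool: 28
size = pool_out(conv_out(size, 5), 2)   # after conv2 (128 maps) + pool: 12

flattened = size * size * 128           # features per stream before the FC layer
print(size, flattened)                  # prints: 12 18432
```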

Proposed Method
The network was implemented using the Keras library with a Theano backend. The network was trained on a Quadro K2000 GPU to speed up the training process.
The depth images are encoded into RGB images using the jet colormap before being fed to the network.
The images are first unrolled into vectors of size 60x60x3 and then reshaped into 4D tensors.
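The depth-encoding and reshaping steps can be sketched as follows. This uses a rough piecewise-linear approximation of the jet colormap written in NumPy; the actual implementation would presumably use a standard jet colormap routine:

```python
import numpy as np

def jet_encode(depth):
    """Map a single-channel depth image to a 3-channel RGB image
    using a rough piecewise-linear approximation of the jet colormap."""
    d = depth.astype(np.float32)
    d = (d - d.min()) / (d.max() - d.min() + 1e-8)   # normalize to [0, 1]
    r = np.clip(1.5 - np.abs(4 * d - 3), 0, 1)
    g = np.clip(1.5 - np.abs(4 * d - 2), 0, 1)
    b = np.clip(1.5 - np.abs(4 * d - 1), 0, 1)
    return np.stack([r, g, b], axis=-1)              # shape (H, W, 3)

depth = np.random.default_rng(0).random((60, 60))    # synthetic depth image
rgb = jet_encode(depth)

# Unroll into a vector of size 60*60*3 = 10800, then reshape a batch of
# such vectors into a 4D tensor (batch, height, width, channels).
vec = rgb.reshape(-1)
batch = np.stack([vec, vec]).reshape(2, 60, 60, 3)

print(rgb.shape, vec.shape, batch.shape)
```

Encoding depth as a three-channel color image lets the depth stream reuse convolutional weights that were pre-trained on ordinary RGB images.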

Proposed Method
Training examples are stored in a matrix and normalized.
Training examples are stored as shared variables of float32 type to enable efficient use of the GPU.
Training process:
Pre-trained weights are assigned to both the RGB and the depth stream.
The weights of the fully connected layer are randomly initialized.

Proposed Method
The images are passed through the network and the weights of the convolutional layers are fine-tuned.
The transferred parameters work efficiently because of the fundamental nature of convolutional neural networks: their early layers learn generic features that transfer well across datasets.
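The initialization and fine-tuning scheme described above can be sketched as follows. This is a minimal NumPy illustration; the layer names and weight shapes are assumptions, and in the actual implementation these would be Theano shared variables updated through Keras:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-trained convolutional filters (e.g. from a network
# trained on a large RGB dataset). Shapes: (maps_out, maps_in, kH, kW).
pretrained = {
    "conv1": rng.standard_normal((48, 3, 5, 5)).astype(np.float32),
    "conv2": rng.standard_normal((128, 48, 5, 5)).astype(np.float32),
}

# Both streams start from the same pre-trained convolutional weights...
rgb_stream = {name: w.copy() for name, w in pretrained.items()}
depth_stream = {name: w.copy() for name, w in pretrained.items()}

# ...while the fused fully connected layer is randomly initialized.
fc_weights = (rng.standard_normal((2048, 36864)) * 0.01).astype(np.float32)

# During training the transferred weights are fine-tuned by gradient
# descent rather than kept frozen (placeholder gradient shown here).
learning_rate = 0.001
grad = np.ones_like(rgb_stream["conv1"])
rgb_stream["conv1"] -= learning_rate * grad

print(rgb_stream["conv1"].shape, fc_weights.shape)
```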

Results

[Result figures from the original slides]
Results
The CNN has around 6 million parameters, so a few thousand training images are not enough to reach good accuracy when training from scratch.
The comparison shows the effectiveness of transfer learning.
With transfer learning, the time taken to achieve a given accuracy is also shorter.
Hence, fewer epochs are needed to reach a particular accuracy.

Summary
In our work we have presented a transfer learning approach for multi-modal deep networks. This enables faster training and more accurate results.
We have shown that using both depth and RGB information of an image increases accuracy.
By making use of deep learning we have avoided time-consuming hand-crafted features.
We have effectively encoded depth into a color image.

References
[1] Freeman, William T., et al. Computer vision for computer interaction. ACM SIGGRAPH Computer Graphics 33.4 (1999): 65-68.
[2] Achanta, Radhakrishna, et al. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence 34.11 (2012): 2274-2282.
[3] Kukenys, Ignas, and Brendan McCane. Classifier cascades for support vector machines. 2008 23rd International Conference Image and Vision Computing New Zealand. IEEE, 2008.
[4] Ren, Shaoqing, et al. Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems, 2015.
[5] Jang, Hyeok, et al. Object classification using CNN for video traffic detection system. Frontiers of Computer Vision (FCV), 2015 21st Korea-Japan Joint Workshop on. IEEE, 2015.
[6] Nguyen, Kien, Clinton Fookes, and Sridha Sridharan. Improving deep convolutional neural networks with unsupervised feature learning. Image Processing (ICIP), 2015 IEEE International Conference on. IEEE, 2015.
[7] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 2012.
[8] McCann, Shawn, and Jim Reesman. Object Detection using Convolutional Neural Networks. (2013).
[9] Eitel, Andreas, et al. Multimodal deep learning for robust RGB-D object recognition. Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on. IEEE, 2015.
[10] Pan, Sinno Jialin, Qiang Yang, and Wei Fan. Transfer Learning with Applications. (2012).
[11] LeCun, Yann, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86.11 (1998): 2278-2324.
[12] Liu, Tianyi, et al. Implementation of Training Convolutional Neural Networks. arXiv preprint arXiv:1506.01195 (2015).
[13] Srivastava, Nitish, et al. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15.1 (2014): 1929-1958.
[14] Oquab, Maxime, et al. Learning and transferring mid-level image representations using convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014.
[15] Weiss, Karl, Taghi M. Khoshgoftaar, and DingDing Wang. A survey of transfer learning. Journal of Big Data 3.1 (2016): 1-40.
[16] The Theano Development Team, et al. Theano: A Python framework for fast computation of mathematical expressions. arXiv preprint arXiv:1605.02688 (2016).
[17] Chollet, François. Keras: Theano-based deep learning library. Code: https://github.com/fchollet. Documentation: http://keras.io (2015).
