2 RELATED WORK

Research on gun detection has focused primarily on Concealed Weapon Detection (CWD) and knife detection. CWD is based on imaging techniques such as infrared imaging and millimeter-wave imaging, applied for example to luggage control at airports.

In our previous work [11], we implemented a visual gun detection system using SIFT (Scale Invariant Feature Transform) and the Harris interest point detector. The system used color-based segmentation with the k-means clustering algorithm to extract distinct objects from an image. The Harris interest point detector and Fast Retina Keypoint (FREAK) were then used to find the weapon in the segmented images. Object detection challenges such as scaling, rotation, affine transformation, and occlusion were addressed in this work. We assembled a dataset of sixty-five positive images (gun present) and twenty-four negative images (no gun present). The dataset comprises images of various kinds of handheld guns at various scales, rotations, and orientations. In a few images, part of the firearm is blocked by a hand or some other object, and some images contain multiple weapons. In a few images, other items are present besides the firearm, against different backgrounds. The overall accuracy achieved by this system was 84.26%.

Following [11], we implemented a gun detection system based on the Speeded Up Robust Features (SURF) interest point detector [12]. Color-based segmentation was used to remove unrelated colors and objects that are not of interest. SURF features were then used to measure the similarity of each segmented object to the weapon descriptor. An object is marked as a gun if half of the features of the weapon descriptor match the SURF features of the object. The accuracy achieved using the SURF descriptor was 88.67%.

N. B. Halima et al. [3] demonstrated that the BoWSS (Bag of Words Surveillance System) algorithm has a high potential to detect guns. They first extract features using SIFT, cluster the obtained features using k-means clustering, and use an SVM (Support Vector Machine) for training.

Sheen et al. [9] proposed a CWD method based on three-dimensional millimeter (mm) wave imaging to detect weapons concealed on the body at airports and other secure locations. Using 2-D millimeter-wave imaging, they modeled a 3-D image of the target. A 3-D image can be formed from the gathered 2-D image data using three-dimensional imaging systems or wideband imaging.

Z. Xue et al. [17] proposed a CWD method based on multi-scale-decomposition fusion. The method fuses a color visual image with an infrared (IR) image, and the fusion is performed so as to maintain the natural color of the original image.

R. Blum et al. [4] proposed a CWD method that integrates a visual image with an IR or mm-wave image using a multi-resolution mosaic technique. They used an image mosaic to highlight the concealed weapon in the target image. The image mosaic method, which resembles a cut-and-paste process, combines two or more images into a composite image with a microscopic seam. The multi-resolution algorithm proposed by Simoncelli et al. [10] is used to construct the steerable pyramid for the image.

E. M. Upadhyay et al. [13] proposed a CWD method that uses image fusion. They fused IR and visual images to detect concealed weapons in scenes that contain over-exposed and under-exposed areas. Their methodology consists of applying a homomorphic filter to visual and IR images captured under different exposure conditions.

Glowacz et al. [5] proposed a method for detecting knives in baggage scanning systems at airports and railway stations. Their method is based on active appearance models and the Harris corner detector.

3 METHODOLOGY

3.1 Database

We implemented and evaluated our system on the Internet Movie Firearms Database (IMFDb), a benchmark database of firearms [1].

Internet Movie Firearms Database (IMFDb)
The IMFDb is an online database of firearms used or featured in movies, television shows, video games, and anime. The firearm images are compiled from Hollywood movies, television shows, video games, and Japanese animation. The following firearms are included under the gun category: Assault Rifle, Battle Rifle, Bullpup, Carbine, Flamethrower, Flare Gun, Fictional Firearm, Grenade, Grenade Launcher, Machine Guns, Machine Pistol, Mine, Missile Launcher, Mortar, Pistol, Revolver, Rifle, Shotgun, Sniper Rifle, Submachine Gun, Underwater Firearm, etc.

Although many gun categories are available in IMFDb, we compiled only the Revolver, Rifle, and Shotgun categories. Figure 1 shows sample positive images from the IMFDb database. The negative images were collected randomly from the internet from different categories such as flowers, landscapes, and animals.

3.2 Deep Learning Model

We implemented the CNN using MatConvNet [14], a MATLAB toolbox implementing state-of-the-art Convolutional Neural Networks (CNNs) for computer vision applications, without using a Graphics Processing Unit (GPU). In this study we used a VGG-16 based classification model pre-trained on the ImageNet dataset (approximately 1.28 million images across 1,000 generic object classes). VGG Net comes in two versions, VGG-16 and VGG-19. The VGG-16 architecture has 16 weight layers (13 convolutional and 3 fully connected) with about 138 million parameters. The network ends with three fully connected layers, the last of which is followed by a softmax activation. VGG-16 applies dropout regularization in the fully connected layers and ReLU activation after every convolutional layer. Deep CNNs such as VGG-16 are generally trained by minimizing the prediction loss.

Let x and y be the input images and the corresponding output class labels. The objective of training is to iteratively minimize the average loss defined in Equation 1:

\[ J(w) = \frac{1}{N} \sum_{i=1}^{N} L\big(f(w, x_i), y_i\big) + \lambda R(w) \tag{1} \]
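As a rough numerical illustration of Equation 1, the sketch below evaluates J(w) for one mini-batch of N samples. It assumes that the loss L is softmax cross-entropy and that the regularizer R(w) is the squared L2 norm of the weights; neither choice is stated in the text, and all function names are hypothetical.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Assumed loss L for one sample: negative log-probability of the true class."""
    return -math.log(probs[label])

def objective(batch_scores, labels, weights, lam):
    """J(w) = (1/N) * sum_i L(f(w, x_i), y_i) + lam * R(w).

    batch_scores[i] stands in for the network output f(w, x_i).
    R(w) is assumed to be the squared L2 norm of the weights.
    """
    n = len(labels)
    data_loss = sum(
        cross_entropy(softmax(s), y) for s, y in zip(batch_scores, labels)
    ) / n
    penalty = sum(w * w for w in weights)
    return data_loss + lam * penalty
```

Setting `lam` to zero recovers the plain average loss, which makes it easy to see how much of J(w) comes from the weight-decay term.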
A Handheld Gun Detection using Faster R-CNN Deep Learning ICCCT-2017, November 24–26, 2017, Allahabad, India
Figure 1: Sample images from IMFDB database (a) positive and (b) negative samples

In Equation 1, N is the number of data instances (the mini-batch size) in every iteration, L is the loss function, f is the predicted output of the network for the current weights w, and R is the weight decay with the Lagrange multiplier λ.

We use Stochastic Gradient Descent (SGD), which is commonly used in deep CNNs, to update the weights as given in Equation 2:

\[ w_{t+1} = \mu w_t - \alpha \nabla J(w_t) \tag{2} \]

where µ is the momentum weight for the current weights w_t and α is the learning rate.

The network weights are initialized randomly if the network is trained from scratch, and are set to the weights of a pre-trained network if the deep model is fine-tuned. In this work we fine-tuned VGG-16, initializing it with the weights of the same architecture pre-trained on the ImageNet database. The pre-trained VGG-16 model used in this study was obtained through MatConvNet, a deep learning toolbox for MATLAB.

4.3 Classification

Training
Training is performed by optimizing the multinomial logistic regression objective using mini-batch gradient descent with momentum. Despite their larger number of parameters and greater depth, the nets require relatively few epochs to converge, for two reasons: (a) implicit regularization imposed by the smaller convolutional filter sizes and greater depth, and (b) pre-initialization of certain layers. During training, the input to the ConvNet is a fixed-size 224 × 224 RGB image.

To obtain this fixed-size image, the training images are rescaled. Let S be the smallest side of an isotropically resized training image. Two approaches for setting the training scale S are considered: 1) single-scale training, which uses a fixed S, and 2) multi-scale training, where every training image is resized independently by randomly sampling S from a given range [Smin, Smax].

To improve the overall training speed of each model, the researchers introduced parallelization into the mini-batch gradient descent process. Since the model is very deep, training on a single GPU would take months to finish. To speed up the process, the researchers trained separate batches of images on each GPU in parallel to calculate the gradients.

Testing
At test time, the input image is classified as follows. First, it is isotropically resized so that its shortest side equals a predefined value, denoted Q. The network is then applied densely over the resized test image: the fully connected layers are first converted to convolutional layers (the first fully connected layer to a 7 × 7 convolutional layer, and the last two fully connected layers to 1 × 1 convolutional layers).
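The rescaling used for training and testing can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the scale range 256 to 512 used in the usage note, the random 224 × 224 crop, and all function names are assumptions that do not appear in the text.

```python
import random

def sample_training_scale(s_min, s_max, rng=random):
    """Multi-scale training: draw the scale S uniformly from [s_min, s_max]."""
    return rng.randint(s_min, s_max)

def isotropic_size(width, height, shortest_side):
    """Return (new_width, new_height) such that the shorter side equals
    shortest_side while the aspect ratio is preserved."""
    scale = shortest_side / min(width, height)
    return round(width * scale), round(height * scale)

def random_crop_box(width, height, crop=224, rng=random):
    """Pick a crop x crop window (left, top, right, bottom) inside an image
    of the given size. Cropping is an assumed way to reach the fixed input size."""
    left = rng.randint(0, width - crop)
    top = rng.randint(0, height - crop)
    return left, top, left + crop, top + crop
```

For single-scale training, `isotropic_size` would be called with a fixed S; for multi-scale training, S would come from `sample_training_scale(256, 512)` for each image. At test time the same `isotropic_size` call applies with Q in place of S.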
\[ TPR = \frac{TP}{TP + FN} \tag{3} \]

where TP and FN denote the numbers of true positives and false negatives.

False Positive Rate: It is the percentage of negative images that are wrongly classified as positive by the system [17]. It equals 1 − specificity and is also known as the fall-out.

Accuracy: It is the proportion of all images that are correctly classified by the system.
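The measures defined above, together with the PPV and FDR columns reported in the result tables, can be computed from confusion-matrix counts as in the following sketch. The function name and the counts in the usage note are hypothetical and are not taken from the experiments.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute evaluation measures from confusion-matrix counts.

    tp/fn: gun images classified correctly/incorrectly.
    tn/fp: non-gun images classified correctly/incorrectly.
    All values are returned as fractions in [0, 1].
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "tpr": tp / (tp + fn),  # true positive rate (Equation 3)
        "fpr": fp / (fp + tn),  # false positive rate = 1 - specificity
        "ppv": tp / (tp + fp),  # positive predictive value (precision)
        "fdr": fp / (tp + fp),  # false discovery rate = 1 - PPV
    }
```

For example, `classification_metrics(tp=90, fp=6, tn=94, fn=10)` describes a classifier that finds 90 of 100 gun images and raises false alarms on 6 of 100 non-gun images.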
Figure 2: Performance graph in terms of ROC and AUC curve for (a) SVM, (b) KNN and (c) Ensemble Tree

Table 2: Classification accuracy for KNN

Algorithm        Accuracy (%)   TPR (%)   FPR (%)   PPV (%)   FDR (%)
Fine KNN         87.2           91        18        86        14
Medium KNN       89.9           88        7         94        6
Coarse KNN       89.9           89        10        92        8
Cosine KNN       91.5           88        4         97        3
Cubic KNN        91             88        6         95        5
Weighted KNN     91             88        5         96        4

Table 3: Classification accuracy for Ensemble Tree

Algorithm               Accuracy (%)   TPR (%)   FPR (%)   PPV (%)   FDR (%)
Boosted tree            93.1           90        6         99        1
Bagged tree             92.6           88        1         99        1
Subspace Discriminant   76.6           91        42        73        27
Subspace KNN            89.9           89        10        92        8
RUSBoosted tree         92             90        6         95        5

Table 4: Accuracy comparison with existing studies

Study                      Year   Methods        Accuracy (%)
Rohit Tiwari et al. [11]   2015   SIFT & FREAK   84.26
Rohit Tiwari et al. [12]   2015   SURF           88.67
This Study                 2017   CNN            93.1

6 CONCLUSION

In this paper we proposed and implemented a novel handheld gun detection approach for surveillance and alert systems. The system uses the CNN-based VGG-16 architecture as a feature extractor, followed by state-of-the-art classifiers, and is evaluated on a standard gun database. With 93% accuracy, the most promising results have been obtained. Our system can detect the presence of multiple guns in real time and is robust to variations in affine transformation, scale, rotation, and partial occlusion. Nevertheless, we presume that with newer methods the performance of our system can be improved and its real-time processing requirements, such as space and time complexity, can be reduced. The methods of [11] and [12] are used for accuracy comparison. The accuracy using [11] (SIFT and FREAK) is 84.26%, and using [12] (SURF) it is 88.67%. Table 4 shows that the accuracy of our system is 93.1%, which is higher than both methods.

REFERENCES

[1] [n. d.]. IMFDB: Internet Movie Firearms Database. http://www.imfdb.org/wiki/Main_Page. ([n. d.]). Accessed: 2016-10-30.
[2] Jorge P. Batista. 2004. Tracking Pedestrians Under Occlusion Using Multiple Cameras. Springer Berlin Heidelberg, Berlin, Heidelberg, 552–562. https://doi.org/10.1007/978-3-540-30126-4_68
[3] Nadhir Ben Halima and Osama Hosam. 2016. Bag of Words Based Surveillance System Using Support Vector Machines. 10 (04 2016), 331–346.
[4] R. Blum, Zhiyun Xue, Z. Liu, and D. S. Forsyth. 2004. Multisensor concealed weapon detection by using a multiresolution mosaic approach. In IEEE 60th Vehicular Technology Conference, 2004. VTC2004-Fall. 2004, Vol. 7. 4597–4601 Vol. 7. https://doi.org/10.1109/VETECF.2004.1404961
[5] Andrzej Glowacz, Marcin Kmieć, and Andrzej Dziech. 2015. Visual detection of knives in security applications using Active Appearance Models. Multimedia Tools and Applications 74, 12 (01 Jun 2015), 4253–4267. https://doi.org/10.1007/s11042-013-1537-2
[6] Yu-Ming Liang, Sheng-Wen Shih, and Arthur Chun-Chieh Shih. 2013. Human action segmentation and classification based on the Isomap algorithm. Multimedia Tools and Applications 62, 3 (01 Feb 2013), 561–580. https://doi.org/10.1007/s11042-011-0858-2
[7] Yingdong Ma and Qian Chen. 2010. Depth Assisted Occlusion Handling in Video Object Tracking. Springer Berlin Heidelberg, Berlin, Heidelberg, 449–460. https://doi.org/10.1007/978-3-642-17289-2_43
[8] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. 2015. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV) 115, 3 (2015), 211–252. https://doi.org/10.1007/s11263-015-0816-y
[9] D. M. Sheen, D. L. McMakin, and T. E. Hall. 2001. Three-dimensional millimeter-wave imaging for concealed weapon detection. IEEE Transactions on Microwave Theory and Techniques 49, 9 (Sep 2001), 1581–1592. https://doi.org/10.1109/22.942570
[10] E. P. Simoncelli and W. T. Freeman. 1995. The steerable pyramid: a flexible architecture for multi-scale derivative computation. In Proceedings., International Conference on Image Processing, Vol. 3. 444–447 vol.3. https://doi.org/10.1109/ICIP.1995.537667
[11] Rohit Kumar Tiwari and Gyanendra K. Verma. 2015. A Computer Vision based Framework for Visual Gun Detection Using Harris Interest Point Detector. Procedia Computer Science 54, Supplement C (2015), 703 – 712. https://doi.org/10.1016/j.procs.2015.06.083
[12] R. K. Tiwari and G. K. Verma. 2015. A computer vision based framework for visual gun detection using SURF. In 2015 International Conference on Electrical, Electronics, Signals, Communication and Optimization (EESCO). 1–5. https://doi.org/10.1109/EESCO.2015.7253863
[13] E. M. Upadhyay and N. K. Rana. 2014. Exposure fusion for concealed weapon detection. In 2014 2nd International Conference on Devices, Circuits and Systems (ICDCS). 1–6. https://doi.org/10.1109/ICDCSyst.2014.6926141
[14] Andrea Vedaldi and Karel Lenc. 2014. MatConvNet - Convolutional Neural Networks for MATLAB. CoRR abs/1412.4564 (2014). http://arxiv.org/abs/1412.4564
[15] A. Vedaldi and K. Lenc. 2015. MatConvNet – Convolutional Neural Networks for MATLAB. (2015).
[16] Sergio A. Velastin, Boghos A. Boghossian, and Maria Alicia Vicencio-Silva. 2006. A motion-based image processing system for detecting potentially dangerous situations in underground railway stations. Transportation Research Part C: Emerging Technologies 14, 2 (2006), 96 – 113. https://doi.org/10.1016/j.trc.2006.05.006
[17] Z. Xue, R. S. Blum, and Y. Li. 2002. Fusion of visual and IR images for concealed weapon detection. In Proceedings of the Fifth International Conference on Information Fusion. FUSION 2002. (IEEE Cat.No.02EX5997), Vol. 2. 1198–1205 vol.2. https://doi.org/10.1109/ICIF.2002.1020949
[18] V. Zeljkovic and M. Popovic. 2001. Detection of moving objects in video signal under fast changes of scene illumination. In 5th International Conference on Telecommunications in Modern Satellite, Cable and Broadcasting Service. TELSIKS 2001. Proceedings of Papers (Cat. No.01EX517), Vol. 2. 411–414 vol.2. https://doi.org/10.1109/TELSKS.2001.955808