
2018 3rd IEEE International Conference on Cloud Computing and Big Data Analysis

Real-Time Object Recognition Algorithm Based on Deep Convolutional Neural Network

Lihong Yang, Liewei Wang, Shuo Wu


The Thirty-eighth Research Institute of China Electronic Technology Group Corporation
Hefei Research Institute of Public Safety Technology
Hefei, China
e-mail: lih.yang@163.com, levidwang@163.com, perfectshuo@126.com

Abstract—Radar detection of moving objects is vulnerable to the external environment. By introducing an object confirmation algorithm, the false alarm rate of the intelligent radar perimeter security system can be reduced significantly. The object confirmation algorithm is essentially an object detection algorithm. Because of the poor generalization of hand-crafted feature extraction algorithms, we use a deep convolutional neural network to extract deep features automatically for object confirmation. To meet the real-time requirement of engineering practice, our algorithm uses the YOLOv2 system as its basis and selects anchor boxes that fit the object scales of our training data set by k-means++ clustering. To improve the YOLOv2 network structure, low-layer deep features, which carry texture information, and high-layer deep features, which carry semantic information, are combined layer by layer to make object detection more accurate. The experimental results show that the false alarm rate of the intelligent radar perimeter security system is further reduced by introducing the object confirmation algorithm. In extreme weather especially, false alarms of radar detection increase greatly, but most of them are eliminated by the object confirmation algorithm, so the warning accuracy of the entire system can be guaranteed. The detection speed of the object confirmation algorithm is 33 FPS, which meets the real-time requirement of engineering practice.

Keywords-YOLOv2; deep convolutional neural network; feature fusion; perimeter security

I. INTRODUCTION

Perimeter security systems are widely used in airports, nuclear power plants, oil fields, prisons, etc., to prevent illegal intrusion. The intelligent radar perimeter security system is a next-generation product developed to reduce the high false alarm rate of traditional perimeter security systems. Because of external interference signals caused by swaying trees and moving animals, radar detection still produces some false alarms. By introducing the object confirmation algorithm, false alarms detected by the radar can be eliminated and the warning accuracy can be further improved.

Object confirmation is essentially object detection on an image. Traditional object detection algorithms involve three steps: region selection, feature extraction and classification. Feature extraction is the key factor affecting system performance. Because of the poor generalization of widely used features such as HOG and SIFT, features applied in engineering practice must be designed for the specific task, so detection accuracy depends heavily on the experience of the developers. To obtain higher accuracy, multiple features are combined, and the growing feature dimension greatly reduces the real-time performance of the algorithms. As a result, it is difficult for traditional object detection algorithms to make a breakthrough in practice.

With the continuous construction of large-scale data sets and ever-increasing hardware computing power, the theory and practice of deep learning have developed rapidly in recent years, and object detection algorithms based on deep convolutional neural networks have achieved a qualitative improvement in performance. In 2014, deep features were applied to object detection for the first time in R-CNN [1], which pioneered deep-learning object detection based on region proposals. SPP-net [2], Fast R-CNN [3], Faster R-CNN [4] and R-FCN [5] are all improvements of this kind of method. With the optimization of deep convolutional neural networks, these methods achieved higher accuracy and faster detection, but none of them reached real-time speed. In engineering practice, real-time performance is essential, so object detection algorithms based on regression were proposed, with YOLO [6], SSD [7] and YOLOv2 [8] as typical representatives. Without extracting region proposals, such methods obtain the locations and categories of objects directly by regression and achieve real-time detection.

Compared with traditional feature extraction methods, a deep convolutional neural network simulates the human brain: through continuous learning, deep features are extracted from images automatically, which not only avoids the complicated feature design step but also provides powerful generalization. To satisfy the real-time requirement of the object confirmation module, our object confirmation algorithm is based on YOLOv2. We construct a training data set and select anchor boxes by clustering on this training set. For the network structure, the detailed texture information of deep features in low layers and the semantic information of deep features in high layers are combined to make object detection more accurate. The experimental results show that most false alarms captured by the radar are eliminated by our object confirmation algorithm. The false alarm rate of the system is reduced



while the accuracy of the system is improved. The average detection speed is 33 FPS, which satisfies the real-time requirement of engineering practice.

II. INTRODUCTION OF YOLOV2

YOLOv2 reframes object detection as a single regression problem, going straight from image pixels to bounding box coordinates and class probabilities by running a single deep convolutional neural network on the image. Because YOLOv2 unifies bounding box extraction, feature extraction, object classification and object localization into a single network, it not only enables end-to-end optimization during training but also achieves real-time speed during detection.

When predicting the location and class of each object, the input image is scaled to a uniform size and divided into M × M cells. For each cell, given B anchor boxes, B bounding boxes are predicted after running a single deep convolutional neural network. For each bounding box, the network predicts five coordinates and C class probabilities. By non-maximum suppression, the final objects are obtained from the M × M × B bounding boxes.

The deep convolutional neural network used by YOLOv2 is a fully convolutional network with 19 convolutional layers and 5 pooling layers. Inspired by the global average pooling in Network in Network (NIN) [9], 1 × 1 reduction layers are used between 3 × 3 convolutional layers to compress the deep features. When applied to the VOC (PASCAL Visual Object Classes) data set, M = 13, B = 5 and C = 20, so the final output of the network is 13 × 13 × (5 × (5 + 20)) = 13 × 13 × 125. The full network is shown in Fig. 1.

Figure 1. The deep convolutional neural network architecture of YOLOv2.
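To make this prediction scheme concrete, the following minimal NumPy sketch, written for this article rather than taken from the authors' code (the function names are ours), decodes an M × M × (B × (5 + C)) output tensor into bounding boxes using the box parameterization defined in YOLOv2 [8]: bx = sigma(tx) + cx, by = sigma(ty) + cy, bw = pw * exp(tw), bh = ph * exp(th), all in cell units (multiply by 32 pixels per cell for image coordinates).

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def decode_predictions(output, anchors, C):
        """Decode a YOLOv2-style output of shape (M, M, B*(5+C)).

        Each of the B boxes per cell carries tx, ty, tw, th, an
        objectness score and C class scores."""
        M = output.shape[0]
        B = len(anchors)
        preds = output.reshape(M, M, B, 5 + C)
        boxes = []
        for cy in range(M):
            for cx in range(M):
                for b, (pw, ph) in enumerate(anchors):
                    tx, ty, tw, th, to = preds[cy, cx, b, :5]
                    bx = sigmoid(tx) + cx       # box center, cell units
                    by = sigmoid(ty) + cy
                    bw = pw * np.exp(tw)        # anchor-relative size
                    bh = ph * np.exp(th)
                    conf = sigmoid(to)          # objectness confidence
                    cls = preds[cy, cx, b, 5:]
                    probs = np.exp(cls - cls.max())
                    probs /= probs.sum()        # softmax class scores
                    boxes.append((bx, by, bw, bh, conf, probs))
        return boxes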

III. OBJECT CONFIRMATION ALGORITHM


We divide the objects to be detected in our project into six classes, named people, car, motorbike, minibus, truck and bus. The entire algorithm is divided into a training step and a detecting step.

During the training step, we collect and label a large number of images containing objects of the above six classes. We then design a deep convolution neural network and initialize it with parameters obtained by pre-training on a large-scale data set. The labeled images are fed into the deep convolutional neural network to predict the locations and classes of the objects they contain. Through the back-propagation algorithm, the difference between the predicted results and the ground truths is continuously reduced. Finally, we obtain the deep model for object confirmation as the result of the training step; it is used to initialize the deep convolution neural network in the detecting step.
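A minimal PyTorch skeleton of this pre-train-then-fine-tune procedure might look as follows. The model, loss and data here are stand-ins labeled as such; the paper does not publish its training code, and its actual network and loss follow YOLOv2.

    import torch
    import torch.nn as nn

    model = nn.Sequential(                 # placeholder detection network
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 99, 1))              # 99 output channels, as in Sec. V-B
    # model.load_state_dict(torch.load("pretrained.pt"))  # pre-trained init

    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    loss_fn = nn.MSELoss()                 # stand-in for the YOLOv2 loss

    images = torch.randn(2, 3, 416, 416)   # stand-in labeled batch
    targets = torch.randn(2, 99, 416, 416) # stand-in ground truths

    for step in range(10):                 # predict, compare, update
        optimizer.zero_grad()
        loss = loss_fn(model(images), targets)
        loss.backward()                    # back-propagation
        optimizer.step()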
The diagram of the detecting step is shown in Fig. 2. First, the image is scaled to 416 × 416 and divided into 13 × 13 detection cells. Then 9 bounding boxes are predicted for each detection cell, giving a total of 13 × 13 × 9 = 1521 bounding boxes for the whole image. Finally, by non-maximum suppression, the objects in the image that belong to the six classes are obtained, like the black car and the pedestrian shown in Fig. 2.

Figure 2. The diagram of the detecting step.
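The non-maximum suppression step above can be sketched as follows; this is a minimal NumPy illustration written for this text, and the 0.45 IOU threshold is our assumption since the paper does not state the value it uses.

    import numpy as np

    def iou(a, b):
        """IOU of two boxes given as (x1, y1, x2, y2)."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter)

    def non_max_suppression(boxes, scores, iou_threshold=0.45):
        """Keep the highest-scoring box, drop boxes overlapping it, repeat."""
        order = np.argsort(scores)[::-1]
        keep = []
        while len(order) > 0:
            best = order[0]
            keep.append(best)
            order = np.array([i for i in order[1:]
                              if iou(boxes[best], boxes[i]) < iou_threshold])
        return keep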
Figure 3. Average IOU plot with different numbers of clustering centers.

IV. TRAINING DATA SET

A. Training Data Set Construction

The training data set is a decisive factor in whether deep learning can obtain good results, so constructing it is a key task of the training step. In our project, we collect a large number of images as training samples, containing objects of the six classes named people, car, motorbike, minibus, truck and bus. The location and class of each object in the images are labeled. To ensure detection accuracy, the training samples should be as diverse as possible. The images in our training data set not only include different outdoor scenes such as security monitoring and traffic, but also cover different lighting conditions such as sunny days, rain, snow and night. In addition, the scales and deformations of the objects should also be as diverse as possible.

B. Anchor Box Selection

It is easier to achieve stable training of the deep model by predicting the offset of each bounding box relative to its detection cell than by directly calculating the coordinates of each bounding box. It is therefore necessary to provide anchor boxes of specified scales as references for predicting bounding boxes.

YOLOv2 provides anchor boxes for training sets such as VOC and COCO (Common Objects in Context). Because the object classes and scenarios of our training data set differ greatly from theirs, we must calculate anchor boxes suitable for our own training data. We collect the width and height of every object in our training data set and run k-means++ clustering on them to obtain anchor boxes that best fit the aspect ratios of our training data. Since detection accuracy is measured by the IOU (Intersection over Union) between the bounding box and the ground truth, we use the distance metric

d(box, centroid) = 1 − IOU(box, centroid)

The number of anchor boxes is equal to the number of clustering centers. For various numbers of clustering centers, the average IOU between each ground truth and its closest centroid is shown in Fig. 3. With 9 clustering centers, the upward trend of the average IOU levels off, so we use 9 anchor boxes to balance detection accuracy against model complexity. Table I lists the width scale and height scale of each anchor box, i.e., the ratio of its width and height to those of a detection cell, which is 32 × 32 pixels.

TABLE I. ANCHOR BOXES WITH 9 CLUSTERING CENTERS

Anchor box    Width scale    Height scale
1             0.63           1.22
2             0.92           3.12
3             1.68           2.00
4             1.73           5.21
5             2.96           4.32
6             3.00           8.59
7             4.87           6.80
8             5.77           10.38
9             10.36          10.70
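As an illustration of this selection procedure, the following NumPy sketch (ours, not the authors' code) clusters ground-truth box sizes with the 1 − IOU distance above. Here wh_data is an assumed N × 2 float array of object widths and heights, and the k-means++-style seeding shown uses plain distances rather than squared distances for this metric.

    import numpy as np

    def iou_wh(wh, centroids):
        """IOU between a (w, h) box and each centroid, with both boxes
        anchored at the same corner, as in YOLOv2 anchor clustering."""
        inter = (np.minimum(wh[0], centroids[:, 0])
                 * np.minimum(wh[1], centroids[:, 1]))
        union = wh[0] * wh[1] + centroids[:, 0] * centroids[:, 1] - inter
        return inter / union

    def kmeans_anchors(wh_data, k=9, iters=100, seed=0):
        """Cluster (w, h) pairs using distance d = 1 - IOU."""
        rng = np.random.default_rng(seed)
        centroids = wh_data[rng.integers(len(wh_data))][None, :]
        while len(centroids) < k:          # seed remaining centroids
            d = np.array([1 - iou_wh(wh, centroids).max() for wh in wh_data])
            pick = rng.choice(len(wh_data), p=d / d.sum())
            centroids = np.vstack([centroids, wh_data[pick]])
        for _ in range(iters):             # standard assign/update loop
            assign = np.array([np.argmax(iou_wh(wh, centroids))
                               for wh in wh_data])
            for j in range(k):
                if np.any(assign == j):
                    centroids[j] = wh_data[assign == j].mean(axis=0)
        return centroids

A curve like Fig. 3 can be reproduced by running this for a range of k values and plotting the mean best IOU of each result.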
V. NETWORK STRUCTURE

The deep convolutional neural network is the soul of deep learning: a well-designed network can efficiently extract deep features of objects. Because low-layer filters extract the detailed texture information of objects while high-layer filters extract semantic information, multi-feature fusion has become a new trend in deep convolutional neural network design in recent years [10]. Combining semantic information with texture information for object detection achieves higher detection accuracy.

A. Feature Combined Module

YOLOv2 combines the 26 × 26 × 256 features before the last pooling layer with the subsequent 13 × 13 × 1024 features to obtain 13 × 13 × 3072 high-dimensional features for the final detection. To exploit more deep features, the feature combined module shown in Fig. 4 is used to merge the low-layer deep features one by one with the subsequent high-layer deep features. The fusion process is carried out layer by layer along the deep convolutional neural network.

As shown in Fig. 4, we first perform batch normalization on the previous deep feature, named feature_N, and feed the result to conv1. We then apply a second batch normalization to the output of conv1 and feed that result to conv2. Finally, we combine the output of conv2 with feature_N to obtain the combined feature, named feature_N+1.

Figure 4. Feature Combined module.
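Under the stated flow, the module could be sketched as follows in PyTorch. This is our reconstruction, not the authors' implementation: the 3 × 3 kernel sizes and the element-wise addition used for the final combination (in the spirit of the cited residual learning [10]) are assumptions, since the text does not specify them.

    import torch
    import torch.nn as nn

    class FeatureCombined(nn.Module):
        """Sketch of the feature combined module of Sec. V-A:
        BN -> conv1 -> BN -> conv2, then merge with the input."""

        def __init__(self, channels):
            super().__init__()
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.bn2 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

        def forward(self, feature_n):
            out = self.conv1(self.bn1(feature_n))
            out = self.conv2(self.bn2(out))
            return feature_n + out          # feature_N+1

If Fig. 4 instead concatenates channels, the last line would be torch.cat([feature_n, out], dim=1) and the channel count would grow at each fusion.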

B. Network Structure

The structure of our deep convolutional neural network is shown in Fig. 5. Before each pooling layer, the low-layer features are superposed on the high-layer deep features by multiple feature combined modules and then fed to a 1 × 1 convolutional layer whose number of filters is gradually doubled from layer to layer.

We detect objects of six classes. The input image is divided into 13 × 13 detection cells and 9 bounding boxes are predicted for each cell, so our final output is 13 × 13 × (9 × (5 + 6)) = 13 × 13 × 99.
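For reference, a hypothetical 1 × 1 detection head reproducing this output shape is shown below; the 1024 input channels are a placeholder, since the exact fused channel count depends on Fig. 5.

    import torch
    import torch.nn as nn

    B, C = 9, 6                            # anchors and classes
    head = nn.Conv2d(in_channels=1024,     # placeholder channel count
                     out_channels=B * (5 + C),  # 9 x (5 + 6) = 99
                     kernel_size=1)

    fused = torch.randn(1, 1024, 13, 13)   # stand-in fused feature map
    print(head(fused).shape)               # [1, 99, 13, 13], channel-first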

Figure 5. Our deep convolutional neural network structure.

VI. EXPERIMENTAL RESULTS


The running environment of our object confirmation algorithm is as follows: the CPU is an Intel Xeon E5-2623 v3 at 3.00 GHz with 64 GB of memory; the GPU is an NVIDIA GeForce GTX 1080 Ti; the operating system is 64-bit Windows 10 Enterprise Edition.
Under the above conditions, the average detection time per frame is 30 ms, which fully meets the real-time requirement of the intelligent radar perimeter security system.

Radar object detection is vulnerable to the natural environment. Fig. 6 shows false alarms detected by the radar: 6(a) is caused by leaves shaking in the wind; 6(b) by snow on the trees; 6(c) by drifting rain and snow in extreme weather. When these images are input to the object confirmation module, no objects of interest such as persons or vehicles are found in them, so the false alarms are removed from the warning results and the system's false alarm rate is reduced.

Table II shows the warning results of the intelligent radar perimeter security system for tests on January 5, 6 and 7, 2018.

Figure 6. False alarms detected by the radar: (a) shaking leaves in the wind; (b) shaking trees under the snow; (c) drift of rain and snow.

TABLE II. OBJECT CONFIRMATION RESULTS

Date        Radar warnings    Object confirmation warnings
20180105    2860              2722
20180106    1177              903
20180107    1768              1103

Because the weather on January 5, 2018 was stable, there were 148 false alarms among 2860 warnings. After object confirmation, 138 false alarms were removed, i.e., 93.3% of the false alarms were eliminated. Because trees rocked frequently as snow melted on January 6, 2018, false alarms increased significantly: there were 304 false alarms among 1177 warnings. After object confirmation, 274 false alarms were removed, i.e., 90.1% were eliminated. On January 7, 2018 there were 719 false alarms among 1768 warnings; after object confirmation, 651 false alarms were removed, i.e., 90.5% were eliminated. In conclusion, under stable weather conditions the intelligent radar perimeter security system achieves effective warning even in complex scenarios, and its false alarm rate can be further reduced by the object confirmation module. Under extreme weather conditions, the false alarms detected by the radar increase substantially, but with the object confirmation module most of them can be eliminated and the warning accuracy of the entire system can be guaranteed.
VII. CONCLUSION

By introducing the real-time object confirmation algorithm based on a deep convolutional neural network, the intelligent radar perimeter security system combines the high sensitivity of radar detection with the high accuracy of object confirmation. By breaking the limitations of a single technology, the system's false alarm rate is reduced while its accuracy is improved. With the object confirmation algorithm, the intelligent radar perimeter security system is significantly less dependent on the weather environment and can still give effective warnings under adverse weather conditions.

ACKNOWLEDGMENT

This work is financially supported by the National Key Research and Development Project of China through grant 2017YFC0804900 and by the National Natural Science Foundation of China through grant 61503352.

REFERENCES

[1] R. Girshick, J. Donahue, T. Darrell and J. Malik, “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation,” Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR 14), IEEE Press, Jun. 2014, pp. 580-587, doi: 10.1109/CVPR.2014.81.
[2] K. He, X. Zhang, S. Ren and J. Sun, “Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, Sept. 2015, pp. 1904-1916, doi: 10.1109/TPAMI.2015.2389824.
[3] R. Girshick, “Fast R-CNN,” Proc. IEEE Int. Conf. Computer Vision (ICCV 15), IEEE Press, Dec. 2015, pp. 1440-1448, doi: 10.1109/ICCV.2015.169.
[4] S. Ren, K. He, R. Girshick and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, Jun. 2017, pp. 1137-1149, doi: 10.1109/TPAMI.2016.2577031.
[5] J. Dai, Y. Li, K. He and J. Sun, “R-FCN: Object Detection via Region-based Fully Convolutional Networks,” arXiv:1605.06409, Jun. 2016, unpublished.
[6] J. Redmon, S. Divvala, R. Girshick and A. Farhadi, “You Only Look Once: Unified, Real-Time Object Detection,” Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR 16), IEEE Press, Jun. 2016, pp. 779-788, doi: 10.1109/CVPR.2016.91.
[7] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed et al., “SSD: Single Shot MultiBox Detector,” Proc. 14th European Conference on Computer Vision (ECCV 2016), Springer, Sept. 2016, pp. 21-37, doi: 10.1007/978-3-319-46448-0_2.
[8] J. Redmon and A. Farhadi, “YOLO9000: Better, Faster, Stronger,” Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR 17), IEEE Press, Jul. 2017, pp. 6517-6525, doi: 10.1109/CVPR.2017.690.
[9] M. Lin, Q. Chen and S. Yan, “Network In Network,” Proc. International Conference on Learning Representations (ICLR 2014), Apr. 2014.
[10] K. He, X. Zhang, S. Ren and J. Sun, “Deep Residual Learning for Image Recognition,” Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR 16), IEEE Press, Jun. 2016, pp. 770-787, doi: 10.1109/CVPR.2016.90.

