
2019 IEEE 3rd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC 2019)

Robot Vision Model Based on Multi-Neural Network Fusion

Hexi Li, Jihua Li, Xinle Han
Faculty of Intelligent Manufacturing, Wuyi University
Jiangmen, China
jmlihexi@163.com, 1073434853@qq.com, 981887295@qq.com

Abstract—In practical applications of robot vision, neural networks are widely used to recognize working targets, but their reliability decreases under the influence of environmental factors such as illumination, background and camera orientation. To solve this problem, this paper establishes two back-propagation neural networks corresponding to color and shape and a convolutional neural network corresponding to texture, all recognizing the same target in the robot's field of view, and then fuses the recognition results of the three neural networks with D-S evidence theory to obtain a better judgment. The experimental results show that the proposed model can improve the reliability of robot vision through the fusion of multiple neural networks of color, shape and texture, and that it can be used in automatic control systems such as feeding, assembly, sorting and tracking of industrial robots.

Keywords—robot vision; back-propagation neural network; convolution neural network; D-S evidence theory

I. INTRODUCTION

Vision is an important way for robots to perceive working targets. It has become one of the key technologies of intelligent robots and has made great progress in recent years. Robot vision for target recognition includes three steps: target image acquisition, feature extraction and target recognition. Target image acquisition is accomplished by visual sensors. Target feature extraction is a key step; there are three kinds of extraction methods for different target images: shape-based, color-based and texture-based. Target recognition requires a good classification model, and many applications have shown that neural networks are very effective target classifiers [1-4]. However, color-based, shape-based and texture-based models each have their own flaws when used for target recognition alone: when the illumination, background and orientation of the target change greatly, the recognition accuracy may decrease. For example, a color histogram loses the spatial information of the target, while a shape descriptor loses color and texture features. In this paper, single neural networks for color, shape and texture are combined to form a multi-neural-network robot vision model. Each neural network may give a different, even conflicting, classification result for the same target; D-S evidence theory is used to resolve these conflicts, so that more accurate results can be obtained for robot target recognition in complex environments.

II. BP NEURAL NETWORK MODEL FOR COLOR-BASED TARGET RECOGNITION

A. Standard color reference template

Generally speaking, the acquired target image is a true-color image, and the number of possible color values may reach 2^24, which is too large for direct image processing, so color reduction is needed. By color clustering, the colors of the image can be approximated by a small set of representative colors. This paper takes 16 standard reference colors as the classification of target color, as shown in Fig. 1.

Fig.1. Target color reference template

After the colors of the target image are converted into the colors of the standard reference template, the normalized color histogram of the target image is obtained as the expression of the target color features.

B. Neural network structure of color-based recognition

For each target image, a 16-color histogram is generated after color clustering. In the workspace of the robot, the proportion of background appearing in the target image changes greatly with the orientation of image acquisition, so the background bin of the histogram should be removed in image preprocessing; in fact, only 15 colors are used as the target feature. The classifier for color recognition is a back-propagation neural network (BPNN) with a three-layer structure, i.e., an input layer, a hidden layer and an output layer. The normalized color histogram is used as the input of the BPNN, forming a 15-dimensional input vector x_i, and the output dimension of the BPNN depends on the number of target categories to be classified. Here, five kinds of medicine packages are taken as examples to discuss the problem of robot target recognition, so the output of the neural network is a five-dimensional vector y_i, and the number of hidden-layer nodes is set to 20, as shown in Fig. 2.
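As a concrete illustration of this 15-20-5 structure, a minimal sketch in Python with Keras is given below; only the layer sizes come from the text, while the activations, optimizer and the random placeholder data are illustrative assumptions.

```python
# Minimal sketch of the 15-20-5 color BPNN (assumes TensorFlow/Keras).
# Layer sizes follow the text; activations and optimizer are illustrative choices.
import numpy as np
import tensorflow as tf

def build_color_bpnn(n_colors=15, n_hidden=20, n_classes=5):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_colors,)),                # normalized valid histogram x_i
        tf.keras.layers.Dense(n_hidden, activation="sigmoid"),   # hidden layer (20 nodes)
        tf.keras.layers.Dense(n_classes, activation="sigmoid"),  # five-dimensional output y_i
    ])
    # The paper trains against a sum-of-squared-errors goal, so a squared-error loss is used.
    model.compile(optimizer="adam", loss="mse")
    return model

# Hypothetical usage: one batch of 50 histograms with one-hot class labels.
model = build_color_bpnn()
x_batch = np.random.rand(50, 15)                                 # placeholder histograms
y_batch = tf.keras.utils.to_categorical(np.random.randint(0, 5, 50), 5)
model.fit(x_batch, y_batch, batch_size=50, epochs=10, verbose=0)
```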



Fig.2. Color-based BP neural network for target recognition

C. The training of the color-based neural network

Images of five kinds of medicine packages with different textures are used as the target training set of the neural network, as shown in Fig. 3.

Fig.3. Training samples for the color-based BPNN

Before being input into the BPNN, the color histogram of each target is calculated according to the standard 16-color reference. When calculating the color histogram, an index table can be obtained by establishing the index image of the standard template; using this index table, the target image with RGB true color can be converted into an index image with 16 colors. Five typical medicine package histograms are shown in Fig. 4(a)-(e).

Fig.4. Color histograms of the target images and the training curve: (a)-(e) color histograms of target classes 1-5; (f) training curve of the BPNN

The last bin of each histogram in Fig. 4(a)-(e) is the background pixel count; when normalization is performed, the pixels of this bin are removed. The normalized histogram with the background pixels removed is called the valid histogram. Following the batch training method of BPNN, 50 medicine packages are input into the color-based neural network for training. The performance of the BPNN is expressed by the sum of squared errors E; the goal of E, Eth, is set to 0.0001, and the training curve is shown in Fig. 4(f).
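As a sketch of this preprocessing, the NumPy code below maps each pixel to its nearest of 16 reference colors and then builds the normalized valid histogram; the palette values and the convention that the background is the 16th reference color are illustrative assumptions, since the paper does not list them.

```python
# Sketch of color reduction and valid-histogram extraction (NumPy only).
# PALETTE values and "background = last reference color" are illustrative assumptions.
import numpy as np

PALETTE = np.array([
    [0, 0, 0], [128, 128, 128], [255, 255, 255], [255, 0, 0],
    [0, 255, 0], [0, 0, 255], [255, 255, 0], [255, 0, 255],
    [0, 255, 255], [128, 0, 0], [0, 128, 0], [0, 0, 128],
    [128, 128, 0], [128, 0, 128], [0, 128, 128], [192, 192, 192],
], dtype=float)  # 16 standard reference colors (assumed values)

def to_index_image(rgb):
    """Map every RGB pixel to the index of its nearest reference color."""
    d = np.linalg.norm(rgb[..., None, :].astype(float) - PALETTE, axis=-1)
    return np.argmin(d, axis=-1)                 # H x W index image, values 0..15

def valid_histogram(index_img, background_idx=15):
    """16-bin histogram with the background bin dropped, normalized to sum to 1."""
    hist = np.bincount(index_img.ravel(), minlength=16).astype(float)
    hist = np.delete(hist, background_idx)       # remove the background pixel count
    return hist / hist.sum()                     # 15-dimensional BPNN input vector

# Hypothetical usage on a random image standing in for a medicine-package photo.
img = np.random.randint(0, 256, (120, 160, 3))
x = valid_histogram(to_index_image(img))         # shape (15,)
```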
III. CONVOLUTIONAL NEURAL NETWORK MODEL FOR TEXTURE-BASED TARGET RECOGNITION

Convolutional neural networks (CNN) have been widely used in the field of computer vision in recent years [5], and they have been proved to significantly improve the accuracy of target recognition and image classification [6-8]. The CNN was initially proposed by LeCun to solve handwritten postcode recognition problems [9], and it is now widely used in target image recognition [10]. Many results have shown that for targets with complex texture the recognition accuracy of a CNN is better than that of a traditional BPNN.


A. The structure of the CNN for robot texture recognition

A CNN is usually composed of alternately connected convolution layers C and pooling layers S, with the final output produced by a fully connected BPNN. The convolution layers C are used to extract target features. Each convolution layer is the result of convolving the previous layer with a convolution kernel; except for the first layer, which is the original image, all layers are feature maps, and each convolution layer generally contains multiple feature maps. Nodes (neurons) in a convolution layer are associated only with the nodes covered by the convolution kernel, so the connection is local, which greatly reduces the number of training parameters of the CNN. In order to further reduce the number of nodes in the feature maps while losing as few features as possible, a pooling layer S is introduced after each convolution layer to subsample the feature maps. Industrial robots require a highly real-time visual response, so target recognition must be fast enough. In addition, since the working targets change with the task, their sampling and training should be completed in a relatively short time on a single CPU, so the CNN cannot be too deep. In this paper, three pairs of C-S connections (C1-S1-C2-S2-C3-S3) form the locally connected part of the CNN for feature extraction of target texture. The last locally connected layer, S3, is stacked into a feature vector and input into a 3-layer fully connected (FC) BPNN. The fully connected BPNN consists of an input layer (FC1), a hidden layer (FC2) and an output layer (O), and serves as the classifier of target recognition for robot vision. The whole 10-layer CNN structure used in this paper is shown in Fig. 5.

Fig.5. 10-layer CNN structure for target texture recognition

B. Training of the CNN for texture recognition

The training data set of the whole CNN consists of 1200 image samples and their class labels. Gradient descent is used to train the CNN. The 1200 samples are divided into 24 batches of 50 samples each, and the network parameters are updated once per training iteration until the performance requirement is met. The performance measure is the sum of squared network output errors E; here we take the performance goal E <= 0.001. The training curve of the CNN is shown in Fig. 6.

Fig.6. The training curve of CNN
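The paper fixes only the layer sequence (the image, C1-S1-C2-S2-C3-S3, then FC1, FC2 and O); the filter counts, kernel sizes, input resolution and activations in the sketch below are assumptions for illustration. In Keras, such a texture CNN might be built as follows:

```python
# Sketch of the 10-layer texture CNN (image, C1-S1-C2-S2-C3-S3, FC1, FC2, O).
# Filter counts, kernel sizes, input size and activations are assumptions for illustration.
import tensorflow as tf

def build_texture_cnn(input_shape=(64, 64, 3), n_classes=5):
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),                           # original image
        tf.keras.layers.Conv2D(8, 5, activation="relu", padding="same"),    # C1
        tf.keras.layers.MaxPooling2D(2),                                    # S1
        tf.keras.layers.Conv2D(16, 5, activation="relu", padding="same"),   # C2
        tf.keras.layers.MaxPooling2D(2),                                    # S2
        tf.keras.layers.Conv2D(32, 3, activation="relu", padding="same"),   # C3
        tf.keras.layers.MaxPooling2D(2),                                    # S3
        tf.keras.layers.Flatten(),                      # stack S3 into a feature vector (FC1 input)
        tf.keras.layers.Dense(64, activation="relu"),   # FC2, hidden layer
        tf.keras.layers.Dense(n_classes, activation="softmax"),  # output layer O
    ])

model = build_texture_cnn()
# Gradient-descent training toward a squared-error goal, as in the paper; batch size 50.
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01), loss="mse")
```

Counting the input image, the three convolution layers, the three pooling layers, FC1 (the stacked feature vector), FC2 and O presumably gives the ten layers referred to in the text.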

IV. BP NEURAL NETWORK MODEL FOR SHAPE-BASED TARGET RECOGNITION

Shape is another important feature of a target and is often used in target recognition for robot vision [11]. Shape can be expressed by the external boundary of the target or by the region surrounded by that boundary; the region and boundary are usually converted into a binary image, as shown in Fig. 7.

Fig.7. Shapes of the target images: (a) sample images; (b) regions of the target binary images

In this paper, a three-layer BP neural network is used as the classifier of target shape. Its structure is similar to that of the color-based BPNN, but the numbers of nodes in the input layer and hidden layer are different. In order to keep the original shape of the target unchanged, the target is normalized to 30 pixels in height with the width scaled proportionally, and the normalized target image is stacked into a feature vector as the input of the BPNN. The dimension of this vector is variable, whereas the input of the BPNN requires a fixed dimension, so we extend the input vector to 2700 dimensions and keep it fixed. For feature vectors whose dimension is less than 2700, the remaining portion is filled with 0, as shown in Fig. 8.
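A sketch of this shape feature construction is given below; it assumes OpenCV for resizing, takes a binary target image as input, and the handling of targets wider than 90 pixels (2700 / 30) is our assumption, since the paper does not say.

```python
# Sketch of the shape feature vector: normalize height to 30 px, keep the aspect ratio,
# flatten row by row, and zero-pad to a fixed 2700-dimensional input (assumes OpenCV).
import cv2
import numpy as np

def shape_feature(binary_img, target_height=30, fixed_dim=2700):
    h, w = binary_img.shape[:2]
    new_w = max(1, round(w * target_height / h))     # scale the width with the same factor
    resized = cv2.resize(binary_img, (new_w, target_height),
                         interpolation=cv2.INTER_NEAREST)
    vec = (resized > 0).astype(np.float32).ravel()   # stack the image into a vector
    if vec.size > fixed_dim:                         # assumption: overly wide targets are truncated
        return vec[:fixed_dim]
    return np.pad(vec, (0, fixed_dim - vec.size))    # fill the remaining portion with 0
```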

Fig.8. Shape-based BP neural network for target recognition

V. FUSION OF MULTI-NEURAL NETWORK FOR TARGET RECOGNITION

From a biological point of view, human visual perception of targets depends on shape, color and texture features. Based on this idea, the visual perception of robots can also be modeled by combining shape, color and texture. As described above, a color-based BPNN, a shape-based BPNN and a texture-based CNN are designed respectively; after training with the same labeled samples, they are applied to recognize the working targets in the robot workspace, such as the medicine packages considered here. Of course, each neural network can be used independently for target recognition. However, the accuracy and robustness of a single neural network decrease when the illumination of the robot environment changes or the orientation of the camera differs. To solve this problem, the three neural networks of color, shape and texture are integrated to form a combined multi-neural-network vision model.

Fig.9. Combination of the color, shape and texture neural networks

For the combined neural networks, the first two are fully connected BP neural networks, while the third is a hybrid neural network, the CNN, with both local and full connections, as shown in Fig. 9.

The three models can be used to identify the same target at the same time; however, when their recognition results are inconsistent or conflicting, how does the robot give the right judgment? A better way to resolve evidence conflict is D-S evidence theory, which is used here for the fusion of the three neural networks.

D-S evidence theory, proposed by Dempster and Shafer, is an effective method for dealing with multiple sources of evidence, especially when the evidence conflicts. This paper uses it to fuse the outputs of the combined neural networks. Firstly, a finite and complete frame of discernment U is defined, in which all elements are mutually exclusive. For any proposition A in the power set of U there is a mapping m: 2^U -> [0,1] satisfying \sum_{A \subseteq U} m(A) = 1 and m(\emptyset) = 0 (\emptyset is the empty set); m is called a basic belief assignment. Let m_1, m_2 and m_3 be the basic belief assignment functions (also called mass functions) corresponding to the evidence of the three neural networks, and let A, B and C be the corresponding propositions. For proposition A, based on the color BPNN, the focal elements are {A_1, A_2, A_3, A_4, A_5} = {target1, target2, target3, target4, target5}; for proposition B, based on the shape BPNN, the focal elements are {B_1, B_2, B_3, B_4, B_5} = {target1, target2, target3, target4, target5}; for proposition C, based on the texture CNN, the focal elements are {C_1, C_2, C_3, C_4, C_5} = {target1, target2, target3, target4, target5}. The D-S synthesis rule can then be expressed as follows:

m(D) = \begin{cases} \dfrac{\sum_{A_i \cap B_j \cap C_k = D} m_1(A_i)\, m_2(B_j)\, m_3(C_k)}{1 - K}, & D \subseteq U \text{ and } D \neq \emptyset \\ 0, & D = \emptyset \end{cases}   (1)

K = \sum_{A_i \cap B_j \cap C_k = \emptyset} m_1(A_i)\, m_2(B_j)\, m_3(C_k) < 1   (2)

Here, K represents the level of conflict among the evidence m_1, m_2 and m_3. For the five target classification propositions, the outputs of the three neural networks are taken as the basic evidence sources m_1, m_2 and m_3. To fuse the three evidence sources, we need to determine the belief assignment functions m_1(A_i), m_2(B_j) and m_3(C_k). If the actual outputs of the three neural networks are t_a, t_b and t_c respectively, the mass functions can be calculated as follows:

m_1(A_i) = \dfrac{t_a(i)}{\sum_{p=1}^{5} t_a(p)}, \quad i = 1, 2, 3, 4, 5   (3)

m_2(B_j) = \dfrac{t_b(j)}{\sum_{p=1}^{5} t_b(p)}, \quad j = 1, 2, 3, 4, 5   (4)

m_3(C_k) = \dfrac{t_c(k)}{\sum_{p=1}^{5} t_c(p)}, \quad k = 1, 2, 3, 4, 5   (5)
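Because every focal element here is a singleton target, the intersection A_i ∩ B_j ∩ C_k in (1) is non-empty only when i = j = k, so the combination reduces to an element-wise product of the three mass vectors followed by renormalization with 1 - K. A minimal sketch of (1)-(5) for this special case (function names are ours, not the paper's) is:

```python
# Sketch of the mass-function normalization (3)-(5) and the singleton D-S combination (1)-(2).
import numpy as np

def to_mass(network_output):
    """Normalize a network's five output activations into a mass function, eq. (3)-(5)."""
    t = np.asarray(network_output, dtype=float)
    return t / t.sum()

def ds_fuse(m1, m2, m3):
    """Fuse three mass vectors defined over the same five singleton targets, eq. (1)-(2)."""
    joint = np.asarray(m1) * np.asarray(m2) * np.asarray(m3)  # non-empty intersections: i = j = k
    one_minus_k = joint.sum()                                  # 1 - K
    k = 1.0 - one_minus_k                                      # conflict level K of eq. (2)
    return joint / one_minus_k, k                              # m(D_1)..m(D_5) of eq. (1), and K
```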
symbol "T" is used to indicate the correct classification, and
tc ( k ) the symbol "F" is used to indicate the classification error.
m 3 (C k ) 5
k 1, 2, 3, 4, 5
¦t
p 1
c ( p) (5)
TABLE I. TEST RESULTS OF THREE NEURAL NETWORKS
it can be seen that the output of the above three neural Target 1# 2# 3# 4# 5#
networks after normalization obviously meets the
requirements of the mass function. 0.896 0.266 0.231 0.139 0.256
Color 0.301 0.898 0.187 0.097 0.097
­
° ¦ m ( A) 1, m (M ) 0 BPNN 0.097 0.092 0.685 0.649 0.688
° AŽU ta(i) 0.115 0.146 0.701 0.587 0.316
°
® ¦ m(B) 1, m (M ) 0  (6) 0.413 0.203 0.441 0.331 0.655
° B ŽU
° T/F T T F T F
°̄ C¦
m (C ) 1, m (M ) 0
ŽU 0.772 0.477 0.211 0.118 0.592
Shape 0.652 0.704 0.235 0.136 0.255
BPNN 0.220 0.199 0.793 0.501 0.106
VI. EXPERIMENTAL RESULTS AND DISCUSSIONS tb(j) 0.201 0.205 0.541 0.562 0.118
0.336 0.376 0.101 0.126 0.580
Taking the robot medicine package sorting system as an
example, the proposed multi-neural network fusion model is T/F T T T T F
validated and discussed. The experimental device consists of 0.907 0.396 0.219 0.205 0.321
EPSON Scara robot, vision sensor, pneumatic sucker and Texture 0.761 0.919 0.088 0.112 0.101
master computer, there are five kinds of medicine package to 0.224 0.203 0.745 0.432 0.685
CNN
be classified on a workbench, as shown in Fig.10.
tc(k) 0.087 0.234 0.522 0.786 0.211
0.366 0.195 0.492 0.221 0.671
T/F T T T T F

TABLE II. MASSES OF 5 SAMPLES RESPONDING TO TABLE1

Target 1# 2# 3# 4# 5#
0.4918 0.1657 0.1029 0.0771 0.1272
Color 0.1652 0.5595 0.0833 0.0538 0.0482
BPNN 0.0532 0.0573 0.3051 0.3600 0.3419
m1(Ai) 0.0631 0.0910 0.3122 0.3256 0.1571
0.2267 0.1265 0.1964 0.1836 0.3255
T/F T T F T F
0.3540 0.2432 0.1122 0.0818 0.3586
Fig.10. Robot vision system for medicine package sorting Shape 0.2989 0.3590 0.1249 0.0942 0.1545
BPNN 0.1009 0.1015 0.4216 0.3472 0.0642
When the vision sensor carried by the robot sweeps
through the workspace, the target image, medicine package, is m2(Bj) 0.0922 0.1045 0.2876 0.3895 0.0715
captured and input to the master computer, The input image is 0.1541 0.1917 0.0537 0.0873 0.3513
automatically converted into the outputs of trained color,
T/F T T T T F
shape and texture neural networks respectively, and mass
function m1, m2 and m3 can be calculated according to (3), (4) 0.3868 0.2034 0.1060 0.1167 0.1614
and (5). 0.3245 0.4720 0.0426 0.0638 0.0508
Texture
Table 1 gives a test results of three neural networks, in CNN 0.0955 0.1043 0.3606 0.2460 0.3444
which each column lists the output values of 5 nodes in three m3(Ck) 0.0371 0.1202 0.2527 0.4476 0.1061
kinds of neural networks for one test sample. Five test samples
are given here, which are distinguished by symbol #, and each 0.1561 0.1002 0.2381 0.1259 0.3374
sample represents a medicine packaging category. For each T/F T T T T F
neural network, the subscript corresponding to the maximum
value of the five nodes serves as the classification label. The
Table 2 gives the mass function of each neural network

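Equations (3)-(5) are a single normalization applied to each network's raw outputs; the short snippet below applies the to_mass sketch given after (5) to the color BPNN outputs for sample 5# from Table I and reproduces the corresponding column of Table II.

```python
# Numerical check of the mass-function normalization against Table I / Table II.
# Assumes the to_mass sketch defined earlier is in scope.
import numpy as np

ta_sample5 = [0.256, 0.097, 0.688, 0.316, 0.655]   # color BPNN outputs, sample 5# (Table I)
print(np.round(to_mass(ta_sample5), 4))            # ~[0.1272 0.0482 0.3419 0.1571 0.3255], as in Table II
```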
From the fifth column of Table II, we can see that none of the neural networks of color, shape or texture correctly identifies the working target; this is the worst result. We therefore take the fifth sample as an example to discuss evidence synthesis using D-S evidence theory. Let the frame of the evidence combination be D = {D_1, D_2, D_3, D_4, D_5} = {target1, target2, target3, target4, target5}; the evidence m_1, m_2 and m_3 given by the three neural networks of color, shape and texture is shown in Table III.

TABLE III. MASSES OF THE THREE NETWORKS FOR SAMPLE 5#

  m1            m2            m3
  A1  0.1272    B1  0.3586    C1  0.1614
  A2  0.0482    B2  0.1545    C2  0.0508
  A3  0.3419    B3  0.0642    C3  0.3444
  A4  0.1571    B4  0.0715    C4  0.1061
  A5  0.3255    B5  0.3513    C5  0.3374

From Table III, we can calculate the amount of conflict between the three pieces of evidence as follows:

K = \sum_{A_i \cap B_j \cap C_k = \emptyset} m_1(A_i)\, m_2(B_j)\, m_3(C_k) = 1 - \sum_{A_i \cap B_j \cap C_k \neq \emptyset} m_1(A_i)\, m_2(B_j)\, m_3(C_k) = 0.9449

According to the combination rule of D-S evidence theory, the mass of D_1 in the set D can be expressed as

m(D_1) = \dfrac{\sum_{A_i \cap B_j \cap C_k = D_1} m_1(A_1)\, m_2(B_1)\, m_3(C_1)}{1 - K} = \dfrac{0.1272 \times 0.3586 \times 0.1614}{1 - 0.9449} = 0.1337

In the same way, we can calculate m(D_2), m(D_3), m(D_4) and m(D_5):

m(D_2) = \dfrac{0.0482 \times 0.1545 \times 0.0508}{1 - 0.9449} = 0.0069

m(D_3) = \dfrac{0.3419 \times 0.0642 \times 0.3444}{1 - 0.9449} = 0.1373

m(D_4) = \dfrac{0.1571 \times 0.0715 \times 0.1061}{1 - 0.9449} = 0.0216

m(D_5) = \dfrac{0.3255 \times 0.3513 \times 0.3374}{1 - 0.9449} = 0.7005

The calculation of the masses of the remaining four samples is similar to that of the fifth sample; the results are listed in Table IV.

TABLE IV. MASSES OF THE FINAL OUTPUT AFTER FUSION (5 SAMPLES)

  D-S evidence theory fusion
  Target   1#       2#       3#       4#       5#
  D1       0.7520   0.0765   0.0167   0.0081   0.1337
  D2       0.1790   0.8845   0.0061   0.0036   0.0069
  D3       0.0057   0.0057   0.6332   0.3394   0.1373
  D4       0.0024   0.0107   0.3098   0.6266   0.0216
  D5       0.0609   0.0227   0.0343   0.0223   0.7005
  T/F      T        T        T        T        T

From the fifth column of Table I, we can see that neither the color-based, the shape-based nor the texture-based neural network gives a correct recognition result for sample 5#, but through the fusion of D-S evidence theory a correct judgment is obtained. Table IV shows the result of evidence fusion for all the samples in Table I; all entries in the last row are "T", which indicates that the recognition results of all samples are correct. This shows that in robot vision applications, multi-neural-network fusion can improve the reliability of working target recognition.

VII. CONCLUSIONS

In this paper, color-based, shape-based and texture-based neural networks are established for working target recognition in robot vision applications. The color-based neural network uses the color histogram of the target image as the input feature vector, the shape-based neural network uses the binary image of the target region as its input, and the texture recognition uses a 10-layer convolutional neural network. We discuss the problems of working target recognition when these three kinds of neural networks are used independently, and then combine the three networks and use D-S evidence theory to fuse their outputs to obtain a better judgment. The classification test on five kinds of medicine packages proves that the proposed model can improve the reliability of target recognition in robot vision, and that it can be used in automatic control systems such as feeding, assembly, sorting and tracking of industrial robots.
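Feeding the Table III masses into the ds_fuse sketch given after (5) reproduces the figures above (K of about 0.9449 and the 5# column of Table IV), which provides a convenient numerical check; the snippet assumes that sketch is in scope.

```python
# Numerical check of the sample 5# fusion using the Table III masses and the ds_fuse sketch.
import numpy as np

m1 = np.array([0.1272, 0.0482, 0.3419, 0.1571, 0.3255])  # color BPNN masses (Table III)
m2 = np.array([0.3586, 0.1545, 0.0642, 0.0715, 0.3513])  # shape BPNN masses
m3 = np.array([0.1614, 0.0508, 0.3444, 0.1061, 0.3374])  # texture CNN masses

fused, k = ds_fuse(m1, m2, m3)
print(np.round(k, 4))      # ~0.9449, the conflict level K
print(np.round(fused, 4))  # ~[0.1337 0.0069 0.1373 0.0216 0.7005]; target5 is correctly selected
```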

ACKNOWLEDGMENT

This work is supported by the Guangdong National Natural Science Foundation under Grant No. 2016A030313003 and the Jiangmen Science and Technology Bureau under Grant No. 20140060117111.

REFERENCES

[1] D. Ramachandram and M. Rajeswari, "Neural network-based robot visual positioning for intelligent assembly," Journal of Intelligent Manufacturing, vol. 15, no. 2, pp. 219-231, 2004.
[2] I. Lenz, H. Lee, and A. Saxena, "Deep learning for detecting robotic grasps," International Journal of Robotics Research, vol. 34, pp. 705-724, 2015.
[3] M. Z. Alom, M. Hasan, C. Yakopcic, et al., "Improved inception-residual convolutional neural network for object recognition," Neural Computing and Applications, pp. 1-15, 2018.
[4] J. Redmon and A. Angelova, "Real-time robotic grasp detection using convolutional neural networks," in Proc. 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 1316-1322, 2015.
[5] A. Anuse and V. Vyas, "A novel training algorithm for convolutional neural network," Complex & Intelligent Systems, vol. 2, issue 3, pp. 221-234, 2016.
[6] S. Levine, P. Pastor, A. Krizhevsky, et al., "Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection," International Journal of Robotics Research, vol. 37, pp. 421-436, 2018.
[7] J. Tompson, R. Goroshin, A. Jain, et al., "Efficient object localization using convolutional networks," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, pp. 648-656, 2015.
[8] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in Proc. International Conference on Learning Representations (ICLR), San Diego, CA, USA, pp. 1-14, 2015.
[9] Y. LeCun, Y. Bengio, and G. E. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436-444, 2015.
[10] A. Gangopadhyay, S. M. Tripathi, I. Jindal, et al., "Dynamic scene classification using convolutional neural networks," arXiv preprint arXiv:1502.05243, 2015.
[11] M. Marszalek and C. Schmid, "Accurate object recognition with shape masks," International Journal of Computer Vision, vol. 97, issue 2, pp. 191-209, 2012.
[12] N. R. Mani, D. M. Potukuchi, and Ch. Satyanarayana, "A novel approach for shape-based object recognition with curvelet transform," International Journal of Multimedia Information Retrieval, vol. 5, issue 4, pp. 219-228, 2016.

