CHAPTER 1
INTRODUCTION
With the development of deep learning and improvements in computer hardware
performance, object detection technology based on convolutional neural networks
has been widely adopted. At present, deep-learning-based object detection falls
into two main categories. The first is based on region proposals, with
representative networks such as R-CNN, Fast R-CNN, and Faster R-CNN. The second
frames object detection as a regression problem, i.e., a single neural network
predicts bounding boxes and class probabilities directly from full images in one
evaluation; representative networks include YOLO and SSD. These algorithms have
been applied to various image classification and detection tasks with varying
levels of speed and accuracy.
Such losses of efficiency within the assembly line may lead to the formation of
pipeline bubbles, which in turn reduce the throughput of the entire system.
Since most of these problems stem from identifying certain characteristics of
the product on the conveyor line, they can readily be remedied through the use
of machine vision systems.
Hence, as the technology continues to grow more sophisticated, the use cases for
machine vision will continue to expand. Detection and classification of conveyor
line objects is therefore an important task. The current method of naked-eye
inspection can be either supplemented or completely replaced by machine vision
systems using modern image processing and deep learning based methodologies.
The proposed system uses cameras placed along the conveyor line at specified
intervals to capture images of the footwear. Each image is processed and then
sent to a computing system containing a pre-trained SSD MobileNet classifier,
which identifies the make of the footwear and automatically generates a barcode
encoding its price and model number for use further along the assembly line
process. The classifier is trained on a large dataset of footwear images under
varying lighting conditions and orientations, so it is able to recognize the
objects in any orientation. Whenever a mismatched pair of footwear comes under
the camera, the system recognizes this and sends an alert.
CHAPTER 2
LITERATURE SURVEY
2.1 DESCRIPTION OF EXISTING SYSTEMS
The authors of the project compared the speed and accuracy of YOLOv3-based and
SSD-MobileNet-based architectures on an object detection problem. The
performance of the models was evaluated under varying lighting conditions and
complex, changing environments. The end result was that the SSD-based model
consistently provided better accuracy while sacrificing a negligible amount of
speed against the YOLO classifier.
2. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed,
Cheng-Yang Fu: “SSD: Single Shot MultiBox Detector”, 2016; arXiv:1512.02325.
This is the foundational paper on the SSD network. The authors presented a new
type of CNN that passes over the image only once, rather than having separate
region proposal and identification stages as in R-CNN based approaches. They
theorized that with sufficiently powerful hardware it was possible to perform
real-time object detection with this algorithm.
3. J. Jia, "A Machine Vision Application for Industrial Assembly Inspection," 2009
Second International Conference on Machine Vision, Dubai, 2009, pp. 172-176, doi:
10.1109/ICMV.2009.51.
The authors performed a comparison between various off-the-shelf embedded
camera based machine vision units. They used classical machine vision methods,
and hence the model tends to be rule based, i.e., it cannot perform accurately
when the input has too much noise. This is because machine learning systems
were not widely used at that time.
4. A. Pouramini and H. Varaee, "A machine vision system for defect detection of a
traveling grate conveyor," 2015 2nd International Conference on Knowledge-Based
Engineering and Innovation (KBEI), Tehran, 2015, pp. 1063-1066, doi:
10.1109/KBEI.2015.74361
5. Li, Yiting & Huang, Haisong & Xie, Qingsheng & Yao, Liguo & Chen, Qipeng. (2018).
Research on a Surface Defect Detection Algorithm Based on MobileNet-SSD. Applied
Sciences. 8. 1678. 10.3390/app8091678.
This paper aims to achieve real-time and accurate detection of surface defects
by using a deep learning method. For this purpose, the Single Shot MultiBox Detector
(SSD) network was adopted as the meta structure and combined with the base
convolution neural network (CNN) MobileNet into the MobileNet-SSD. Then, a
detection method for surface defects was proposed based on the MobileNet-SSD.
Specifically, the structure of the SSD was optimized without sacrificing its accuracy,
and the network structure and parameters were adjusted to streamline the detection
model. The proposed method was applied to the detection of typical defects like
breaches, dents, burrs and abrasions on the sealing surface of a container in the filling
line. The results show that our method can automatically detect surface defects more
accurately and rapidly than lightweight network methods and traditional machine
learning methods. The research results shed new light on defect detection in actual
industrial scenarios.
CHAPTER 3
BACKGROUND THEORY
The proposed solution is an image processing and decision-making system built
on a Single Shot MultiBox Detector (SSD) based architecture. The system performs
various image processing operations on the input image, such as greyscaling,
normalization, thresholding and segmentation, to prepare it for the image
classifier. The trained image classifier identifies the kind of footwear using a
pre-trained network. The results are then generated in the form of a barcode
representing the size and article number of the identified footwear. The entire
architecture was trained on a CUDA-enabled NVIDIA GPU using TensorFlow 1.4 as
the backend.
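The report does not name the barcode symbology, so as an illustration assume a UPC-A-style 12-digit code (11 payload digits plus a check digit). A minimal sketch of computing the check digit, with a hypothetical payload:

```python
def upc_check_digit(payload: str) -> int:
    """Compute the UPC-A check digit for an 11-digit payload.

    Digits in odd positions (1st, 3rd, ...) are weighted 3, digits in
    even positions are weighted 1; the check digit brings the weighted
    sum up to a multiple of 10.
    """
    assert len(payload) == 11 and payload.isdigit()
    odd = sum(int(d) for d in payload[0::2])   # 1st, 3rd, ... digits
    even = sum(int(d) for d in payload[1::2])  # 2nd, 4th, ... digits
    return (10 - (3 * odd + even) % 10) % 10

# Hypothetical 11-digit payload encoding price and article number.
payload = "03600029145"
print(payload + str(upc_check_digit(payload)))  # prints 036000291452
```

Any standard barcode library could then render the resulting 12-digit string as an image for the printer.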
3.1 Tensorflow
Before discussing the actual architecture of the classifier, it is important to
have a basic understanding of how the TensorFlow framework works. Researchers
have been implementing software packages to facilitate the construction of
neural network architectures for decades. Until the last few years, these
systems were mostly special purpose and only used within academic groups. The
lack of standardized, industrial-strength software made it difficult for
non-experts to use neural networks extensively.
The situation has changed dramatically over the last few years. The Google
Brain team released the beta version of the TensorFlow framework in November
2015. TensorFlow closely followed the principles of another well-known library,
Theano, and hence uses tensors and computation graphs as its fundamental
underlying primitives. This allows the framework to build sophisticated models.
Consider a simple example graph. The graph first passes two inputs, 5 and 3,
via an input layer to two nodes, mul and add, which perform multiplication and
addition respectively. Both results are then sent to another node which sums
them and produces the output. With the use of such hidden nodes, the precise
details of what is going on inside the graph are abstracted away; the client
only has to know to send information to the same two input nodes. It becomes
easier to visualize chaining together groups of computations instead of having
to worry about the specific details of each piece.
Hence the basic TensorFlow workflow consists of only two steps: define the
computation graph, then run the graph (with data).
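The define-then-run idea can be illustrated without TensorFlow itself; a minimal pure-Python sketch of the mul/add/sum graph described above (node and function names are our own, not the TensorFlow API):

```python
# Step 1: define the computation graph (nothing is computed yet).
graph = {
    "mul": lambda a, b: a * b,
    "add": lambda a, b: a + b,
    "sum": lambda x, y: x + y,   # output node combining both results
}

def run_graph(graph, a, b):
    """Step 2: run the graph with concrete input data."""
    m = graph["mul"](a, b)
    s = graph["add"](a, b)
    return graph["sum"](m, s)

print(run_graph(graph, 5, 3))  # 5*3 + (5+3) = 23
```

In TensorFlow 1.x the same separation appears as building a `tf.Graph` of ops first, then feeding data through a `tf.Session`.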
MobileNet uses the ReLU6 activation, y = min(max(z, 0), 6), where z is the
value of each pixel in the feature map.
The scale of the default boxes grows linearly across the feature maps:
s_k = s_min + ((s_max - s_min) / (m - 1)) * (k - 1),  k in [1, m]
where m is the number of feature maps and s_min, s_max are parameters that can
be set.
As SSD is an end-to-end training model, the overall training loss combines the
confidence loss Lconf(s, c) of the classification regression and the position
loss Lloc(r, l, g) of the bounding box regression. This function can be
depicted as
L(s, r, c, l, g) = (1/N) * (Lconf(s, c) + α * Lloc(r, l, g))
where N is the number of matched default boxes; α is the weight that balances
the confidence loss and the position loss; s and r are the eigenvectors of the
confidence loss and position loss, respectively; c is the confidence of
classification; l is the offset of the predicted box, including the translation
offset of the centre coordinates and the scaling offsets of height and width;
and g is the ground truth box.
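Numerically, the combined loss is just the α-weighted sum of the confidence and position terms, normalized by the number N of matched default boxes; a toy sketch with invented numbers:

```python
def ssd_loss(l_conf, l_loc, alpha, n_matched):
    """Total SSD loss: (Lconf + alpha * Lloc) / N over matched boxes."""
    if n_matched == 0:          # SSD sets the loss to 0 when nothing matches
        return 0.0
    return (l_conf + alpha * l_loc) / n_matched

print(ssd_loss(4.0, 2.0, 1.0, 8))  # (4 + 1*2) / 8 = 0.75
```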
The SSD-MobileNet is split into two parts: MobileNet performs feature
extraction, and the Single Shot MultiBox Detector (SSD) head produces the
classification results. The SSD-MobileNet model can substantially reduce the
number of parameters and achieve higher accuracy under restricted hardware
conditions. The entire model contains four sections: the input layer for
bringing in the target image; MobileNet, i.e. the base network, for image
feature extraction; the SSD head for classification and bounding box
regression; and finally the output layer for sending out the detection results.
This model performs quick and accurate object detection because the MobileNet
structure reduces the overall computational complexity.
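MobileNet's parameter savings come from replacing standard convolutions with depthwise-separable ones; a quick parameter-count comparison (the layer sizes below are chosen only for illustration):

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """One k x k depthwise filter per input channel, followed by a
    1x1 pointwise convolution mixing channels."""
    return k * k * c_in + c_in * c_out

std = conv_params(3, 32, 64)                 # 18432
sep = depthwise_separable_params(3, 32, 64)  # 2336
print(std, sep, round(std / sep, 1))         # roughly 7.9x fewer weights
```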
A typical CUDA program contains code intended for both the GPU and the CPU. By
default, a traditional C program is a CUDA program with only the host code. The
CPU is referred to as the host, and the GPU is referred to as the device.
Whereas the host code can be compiled by a traditional C compiler such as GCC,
the device code needs a special compiler that understands the API functions
being used. For NVIDIA GPUs, this compiler is NVCC (the NVIDIA CUDA Compiler).
The device code runs on the GPU, and the host code runs on the CPU. NVCC
processes a CUDA program and separates the host code from the device code by
looking for special CUDA keywords. The code intended to run on the GPU (device
code) is marked with CUDA keywords labelling data-parallel functions, called
'kernels'. The device code is then further compiled by NVCC and executed on
the GPU.
CHAPTER 4
FLOWCHARTS AND DIAGRAMS
CHAPTER 5
IMPLEMENTATION
The corresponding bounding box coordinates for the objects were generated using
a simple Python program. This process comes with a major downside: the training
images would always have a white background, which may negatively affect
real-world classification performance. To remedy this issue, we merged each
image with a random image from the internet as its background. The program then
automatically generates the corresponding CSV files for the label data. Each
tuple in the CSV file consists of a particular object with its class value and
its bounding box coordinates within the image. These coordinates are necessary
for the training process.
The final dataset was then split according to the 80-20 rule, where 80% of the
images went into training and the remaining 20% went into evaluation.
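The CSV generation and 80-20 split can be sketched as follows (the file names and CSV column layout here are assumptions, not the authors' exact script):

```python
import csv
import random

def write_labels(rows, path):
    """Write (filename, class, xmin, ymin, xmax, ymax) tuples to a CSV."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "class", "xmin", "ymin", "xmax", "ymax"])
        writer.writerows(rows)

def split_80_20(items, seed=0):
    """Shuffle and split the dataset: 80% training, 20% evaluation."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(0.8 * len(items))
    return items[:cut], items[cut:]

images = [f"shoe_{i:04d}.jpg" for i in range(100)]
train, test = split_80_20(images)
print(len(train), len(test))  # 80 20
```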
In the above images, the area containing the footwear was identified against
the background for labelling purposes. The red bounding box contains the
necessary coordinates for the labelling process.
During training, the dataset of images was seen 14,000 times by the network.
After the loss value stabilized at about 0.04, the training was stopped and the
inference graph up to that checkpoint was exported for further use.
The model was trained with Stochastic Gradient Descent with a learning rate of
0.0002. The training was done on a laptop with an NVIDIA MX-150 GPU to
accelerate the training process.
Fig 5.2.3: Loss
For the process of object classification, various test images under varying
lighting conditions were collected. Each image was first cropped, greyscaled
and thresholded to a preset value so as to improve the results of the
classification process. The image is then sent to the trained SSD classifier.
SSD divides the image using a grid and has each grid cell be responsible for
detecting objects in that region of the image. The grids in the figure below
are representative; the actual grid sizes vary based on the input image.
Fig 5.3.1: SSD Grid
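The crop/greyscale/threshold preprocessing described above can be sketched in plain Python (in practice a library such as OpenCV would be used; the 0.5 threshold here is an assumed value, not the project's preset):

```python
def to_grey(pixel):
    """Luminance of an (R, G, B) pixel, each channel in [0, 1]."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b

def threshold(image, t=0.5):
    """Binarize a 2D greyscale image against a preset threshold t."""
    return [[1 if v >= t else 0 for v in row] for row in image]

rgb = [[(0.9, 0.9, 0.9), (0.1, 0.1, 0.1)]]        # one light, one dark pixel
grey = [[to_grey(p) for p in row] for row in rgb]
print(threshold(grey))  # [[1, 0]]
```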
Detecting objects simply means predicting the class and location of an object
within that region. If no object is present, we consider it the background
class, and the location is ignored. Each grid cell in the SSD can be assigned
multiple anchor/prior boxes. These anchor boxes are predefined, and each one is
responsible for a particular size and shape within a grid cell.
SSD uses a matching phase while training, to match the appropriate anchor box with
the bounding boxes of each ground truth object within an image. Essentially, the
anchor box with the highest degree of overlap with an object is responsible for
predicting that object’s class and its location. This property is used for training the
network and for predicting the detected objects and their locations once the network
has been trained.
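The matching phase pairs each ground truth box with the anchor that overlaps it most, with overlap measured by intersection-over-union (IoU); a minimal sketch, with boxes as (xmin, ymin, xmax, ymax) tuples:

```python
def iou(a, b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def best_anchor(ground_truth, anchors):
    """Index of the anchor with the highest IoU against the ground truth."""
    return max(range(len(anchors)), key=lambda i: iou(ground_truth, anchors[i]))

anchors = [(0, 0, 10, 10), (5, 5, 15, 15)]
print(best_anchor((6, 6, 14, 14), anchors))  # 1: the second anchor overlaps most
```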
Each of the eight networks was run on the validation dataset with a confidence
threshold of 0.6 and a non-maxima suppression threshold of 0.2. The high
confidence threshold ensures that the object detection algorithm does not
output unlikely detections (incorrect detections could have major
consequences), and the low non-maxima suppression value allows the network to
detect objects separated by relatively small distances. A copy of the processed
images, complete with labelled bounding boxes, is saved for later processing,
and the inference time for each image in the validation set is also recorded.
The corresponding 12-digit barcode encoding the price and article number of the
detected footwear is also generated and stored in a separate folder.
Fig 5.3.4: Operation Pipeline
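The confidence filtering and non-maxima suppression described above can be sketched as follows, with detections as (score, box) pairs and the thresholds from the text (0.6 and 0.2):

```python
def iou(a, b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def nms(detections, conf_thresh=0.6, iou_thresh=0.2):
    """Drop low-confidence detections, then greedily keep the highest
    scoring box and suppress remaining boxes that overlap it too much."""
    boxes = sorted((d for d in detections if d[0] >= conf_thresh), reverse=True)
    kept = []
    for score, box in boxes:
        if all(iou(box, k[1]) < iou_thresh for k in kept):
            kept.append((score, box))
    return kept

dets = [(0.9, (0, 0, 10, 10)), (0.8, (1, 1, 11, 11)),   # heavily overlapping
        (0.7, (50, 50, 60, 60)), (0.5, (0, 0, 5, 5))]   # last is low confidence
print(nms(dets))  # keeps the 0.9 and 0.7 detections only
```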
5.4 ADVANTAGES
The benefits of this project are:
● Decreased human labour required for identification
● Fully monitored conveyor line
● Automatic generation of barcodes for the identified products
CHAPTER 6
CONCLUSION
The aim of this project is to develop a system for identifying the various
types of footwear present in a conveyor line. This was achieved through the use
of image processing and classification techniques made possible by the rapid
increase in processing power of consumer-grade devices. The technologies behind
this proposal are image processing and object detection using a convolutional
neural network trained on a large dataset. After identification, a barcode was
printed for each of the identified objects. Optimum results were obtained with
relatively little computational effort, which also shows the efficiency of the
algorithm in the recognition and classification of the footwear. Another
advantage of these methods is that production line issues can be identified at
an earlier stage in a quick and efficient manner. To improve the recognition
rate in the classification process, Artificial Neural Networks, Bayes
classifiers, fuzzy logic and hybrid algorithms can also be used.
As a next step towards improving the results obtained here, further training of
the networks with the given datasets should help alleviate some of the
sensitivity issues identified. Furthermore, an expanded training dataset
including more varied lighting conditions, more cases of partial occlusion,
motion blur, and general noise should increase the sensitivity of the networks,
allowing smaller and faster networks to be used. Additionally, an expanded
dataset such as this may allow the use of the MobileNet-SSD network
architecture, which would further decrease detection times. Additional
improvements include the use of optimised OpenCV libraries for the NVIDIA
Jetson Nano compute boards used, and closer integration of the API into the
other autonomous production systems, which should provide further speed
increases. The system can be connected to a robotic arm or other mechanical
devices to perform automatic package sorting and identification tasks.
Supplementary smartphone apps can be created to monitor the conveyor belt
process in real time so that the production engineer can make necessary
decisions based on the product flow.
REFERENCES
[1] Enzeng Dong, Yao Lu, Shengzhi Du, "An Improved SSD Algorithm and its
Mobile Terminal Implementation", 2019 International Conference on Mechatronics
and Automation.
[2] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified,
real-time object detection,” in Proceedings of the IEEE Computer Society
Conference on Computer Vision and Pattern Recognition, 2016.
[4] Roland Sczabo, Aurel Gontean, "Industrial robotic automation with Raspberry
Pi using image processing", 2018 International Conference on Applied
Electronics.
[5] Yong Li, Jinbing Xu, "Electronic Product Surface Defect Detection Based on
a MSSD Network", 2020 IEEE 4th Information Technology, Networking, Electronic
and Automation Control Conference (ITNEC).
[6] Shaoqing Ren, Kaiming He, Ross Girshick and Jian Sun, "Faster R-CNN:
Towards Real-Time Object Detection with Region Proposal Networks",
arXiv:1506.01497v3 [cs.CV], 6 Jan 2016.
[7] K. He, X. Zhang, S. Ren and J. Sun, "Deep Residual Learning for Image
Recognition", 2016 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), 2016, pp. 770-778, doi: 10.1109/CVPR.2016.90.
[9] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed,
Cheng-Yang Fu: “SSD: Single Shot MultiBox Detector”, 2016; arXiv:1512.02325.
[10] J. Jia, "A Machine Vision Application for Industrial Assembly Inspection," 2009
Second International Conference on Machine Vision, Dubai, 2009, pp. 172-176, doi:
10.1109/ICMV.2009.51.
[11] Li, Yiting & Huang, Haisong & Xie, Qingsheng & Yao, Liguo & Chen, Qipeng. (2018).
Research on a Surface Defect Detection Algorithm Based on MobileNet-SSD. Applied
Sciences. 8. 1678. 10.3390/app8091678.