CHAPTER 1
INTRODUCTION
With the development of deep learning and improvements in computer hardware
performance, object detection technology based on convolutional neural networks
has been widely adopted. At present, deep-learning-based object detection falls
into two main categories. The first is based on region proposals, with
representative networks such as R-CNN, Fast R-CNN, and Faster R-CNN. The second
frames object detection as a regression problem, i.e., a single neural network
predicts bounding boxes and class probabilities directly from full images in one
evaluation; representative networks include YOLO and SSD. These algorithms have
been applied to various image classification and detection tasks with varying
levels of speed and accuracy.
Such losses of efficiency within the assembly line may lead to the formation of
pipeline bubbles, which in turn reduce the throughput of the entire system.
Since most of these problems stem from identifying certain characteristics of
the product on the conveyor line, they can readily be remedied through the use
of machine vision systems.
Hence, as the technology continues to grow more sophisticated, the use cases for
machine vision will continue to expand. Detection and classification of conveyor
line objects is therefore an important task. The current method of naked-eye
inspection can be either supplemented or completely replaced by machine vision
systems using modern image processing and deep learning based methodologies.
The proposed system uses cameras placed along the conveyor line at specified
intervals to capture images of the footwear. Each image is processed and then
sent to a computing system containing a pre-trained SSD MobileNet classifier,
which identifies the make of the footwear and automatically generates a barcode
encoding its price and model number for use further along the assembly line
process. The classifier is trained on a large dataset of footwear images under
varying lighting conditions and orientations, so it is able to recognize the
objects in any orientation. Whenever a mismatched pair of footwear comes under
the camera, the system recognizes this and sends an alert.
CHAPTER 2
LITERATURE SURVEY
2.1 DESCRIPTION OF EXISTING SYSTEMS
The authors of the project compared the speed and accuracy of YOLOv3-based and
SSD-MobileNet-based architectures on an object detection problem. The
performance of the models was evaluated under varying lighting conditions and
complex, changing environments. The end result was that the SSD-based model
consistently provided better accuracy while sacrificing a negligible amount of
speed against the YOLO classifier.
2. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed,
Cheng-Yang Fu: “SSD: Single Shot MultiBox Detector”, 2016; arXiv:1512.02325.
This is the foundational paper on the SSD network. The authors presented a new
type of CNN that passes over the image only once, rather than having separate
region proposal and identification stages as in R-CNN based approaches. They
theorized that with sufficiently powerful hardware it was possible to perform
real-time object detection with this algorithm.
3. J. Jia, "A Machine Vision Application for Industrial Assembly Inspection," 2009
Second International Conference on Machine Vision, Dubai, 2009, pp. 172-176, doi:
10.1109/ICMV.2009.51.
The authors performed a comparison between various off-the-shelf embedded
camera based machine vision units. They used classical machine vision methods,
and hence the model tends to be rule based, i.e., it cannot perform accurately
when the input has too much noise. This is because machine learning systems
were not widely used at that time.
4. A. Pouramini and H. Varaee, "A machine vision system for defect detection of a
traveling grate conveyor," 2015 2nd International Conference on Knowledge-Based
Engineering and Innovation (KBEI), Tehran, 2015, pp. 1063-1066, doi:
10.1109/KBEI.2015.74361
5. Li, Yiting & Huang, Haisong & Xie, Qingsheng & Yao, Liguo & Chen, Qipeng. (2018).
Research on a Surface Defect Detection Algorithm Based on MobileNet-SSD. Applied
Sciences. 8. 1678. 10.3390/app8091678.
This paper aims to achieve real-time and accurate detection of surface defects
by using a deep learning method. For this purpose, the Single Shot MultiBox Detector
(SSD) network was adopted as the meta structure and combined with the base
convolution neural network (CNN) MobileNet into the MobileNet-SSD. Then, a
detection method for surface defects was proposed based on the MobileNet-SSD.
Specifically, the structure of the SSD was optimized without sacrificing its accuracy,
and the network structure and parameters were adjusted to streamline the detection
model. The proposed method was applied to the detection of typical defects like
breaches, dents, burrs and abrasions on the sealing surface of a container in the filling
line. The results show that our method can automatically detect surface defects more
accurately and rapidly than lightweight network methods and traditional machine
learning methods. The research results shed new light on defect detection in actual
industrial scenarios.
CHAPTER 3
BACKGROUND THEORY
The proposed solution is an image processing and decision-making system built
on a Single Shot MultiBox Detector (SSD) based architecture. The system performs
various image processing operations on the input image, such as greyscaling,
normalization, thresholding and segmentation, to prepare it for the image
classifier. The trained image classifier identifies the kind of footwear using a
pre-trained network. The results are then generated in the form of a barcode
representing the size and article number of the identified footwear. The entire
architecture was trained on a CUDA-enabled NVIDIA GPU using TensorFlow 1.4 as
the backend.
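The report does not name the barcode symbology, so as an illustration assume a UPC-A-style 12-digit code (11 payload digits plus a check digit). A minimal sketch of computing the check digit, with a hypothetical payload:

```python
def upc_check_digit(payload: str) -> int:
    """Compute the UPC-A check digit for an 11-digit payload.

    Digits in odd positions (1st, 3rd, ...) are weighted 3, digits in
    even positions are weighted 1; the check digit brings the weighted
    sum up to a multiple of 10.
    """
    assert len(payload) == 11 and payload.isdigit()
    odd = sum(int(d) for d in payload[0::2])   # 1st, 3rd, ... digits
    even = sum(int(d) for d in payload[1::2])  # 2nd, 4th, ... digits
    return (10 - (3 * odd + even) % 10) % 10

# Hypothetical 11-digit payload encoding price and article number.
payload = "03600029145"
print(payload + str(upc_check_digit(payload)))  # prints 036000291452
```

Any standard barcode library could then render the resulting 12-digit string as an image for the printer.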
3.1 Tensorflow
Before discussing the actual architecture of the classifier, it is important to
have a basic understanding of how the TensorFlow framework works. Researchers
have been implementing software packages to facilitate the construction of
neural network architectures for decades. Until the last few years, these
systems were mostly special purpose and only used within academic groups. The
lack of standardized, industrial-strength software made it difficult for
non-experts to use neural networks extensively.
The situation has changed dramatically over the last few years. The Google
Brain team released the beta version of the TensorFlow framework in November
2015. TensorFlow closely followed the principles of another well-known library,
Theano, and hence uses tensors and computation graphs as its fundamental
underlying primitives. This allows the framework to build sophisticated models.
Consider a simple example graph. The graph first passes two inputs, 5 and 3,
via an input layer to two nodes, mul and add, which perform multiplication and
addition respectively. Both results are then sent to another node which sums
them and produces the output. With the use of such hidden nodes, the precise
details of what is going on inside the graph are abstracted away; the client
only has to know to send information to the same two input nodes. It becomes
easier to visualize chaining together groups of computations instead of having
to worry about the specific details of each piece.
Hence the basic TensorFlow workflow consists of only two steps: define the
computation graph, then run the graph (with data).
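The define-then-run idea can be illustrated without TensorFlow itself; a minimal pure-Python sketch of the mul/add/sum graph described above (node and function names are our own, not the TensorFlow API):

```python
# Step 1: define the computation graph (nothing is computed yet).
graph = {
    "mul": lambda a, b: a * b,
    "add": lambda a, b: a + b,
    "sum": lambda x, y: x + y,   # output node combining both results
}

def run_graph(graph, a, b):
    """Step 2: run the graph with concrete input data."""
    m = graph["mul"](a, b)
    s = graph["add"](a, b)
    return graph["sum"](m, s)

print(run_graph(graph, 5, 3))  # 5*3 + (5+3) = 23
```

In TensorFlow 1.x the same separation appears as building a `tf.Graph` of ops first, then feeding data through a `tf.Session`.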
MobileNet uses the ReLU6 activation, y = min(max(z, 0), 6), where z is the
value of each pixel in the feature map.
The scale of the default boxes grows linearly across the feature maps:
s_k = s_min + ((s_max - s_min) / (m - 1)) * (k - 1),  k in [1, m]
where m is the number of feature maps and s_min, s_max are parameters that can
be set.
As SSD is an end-to-end training model, the overall training loss combines the
confidence loss Lconf(s, c) of the classification regression and the position
loss Lloc(r, l, g) of the bounding box regression. This function can be
depicted as
L(s, r, c, l, g) = (1/N) * (Lconf(s, c) + α * Lloc(r, l, g))
where N is the number of matched default boxes; α is the weight that balances
the confidence loss and the position loss; s and r are the eigenvectors of the
confidence loss and position loss, respectively; c is the confidence of
classification; l is the offset of the predicted box, including the translation
offset of the centre coordinates and the scaling offsets of height and width;
and g is the ground truth box.
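Numerically, the combined loss is just the α-weighted sum of the confidence and position terms, normalized by the number N of matched default boxes; a toy sketch with invented numbers:

```python
def ssd_loss(l_conf, l_loc, alpha, n_matched):
    """Total SSD loss: (Lconf + alpha * Lloc) / N over matched boxes."""
    if n_matched == 0:          # SSD sets the loss to 0 when nothing matches
        return 0.0
    return (l_conf + alpha * l_loc) / n_matched

print(ssd_loss(4.0, 2.0, 1.0, 8))  # (4 + 1*2) / 8 = 0.75
```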
The SSD-MobileNet is split into two parts: MobileNet performs feature
extraction, and the Single Shot MultiBox Detector (SSD) head produces the
classification results. The SSD-MobileNet model can substantially reduce the
number of parameters and achieve higher accuracy under restricted hardware
conditions. The entire model contains four sections: the input layer for
bringing in the target image; MobileNet, i.e. the base network, for image
feature extraction; the SSD head for classification and bounding box
regression; and finally the output layer for sending out the detection results.
This model performs quick and accurate object detection because the MobileNet
structure reduces the overall computational complexity.
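MobileNet's parameter savings come from replacing standard convolutions with depthwise-separable ones; a quick parameter-count comparison (the layer sizes below are chosen only for illustration):

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """One k x k depthwise filter per input channel, followed by a
    1x1 pointwise convolution mixing channels."""
    return k * k * c_in + c_in * c_out

std = conv_params(3, 32, 64)                 # 18432
sep = depthwise_separable_params(3, 32, 64)  # 2336
print(std, sep, round(std / sep, 1))         # roughly 7.9x fewer weights
```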
A typical CUDA program contains code intended for both the GPU and the CPU. By
default, a traditional C program is a CUDA program with only the host code. The
CPU is referred to as the host, and the GPU is referred to as the device.
Whereas the host code can be compiled by a traditional C compiler such as GCC,
the device code needs a special compiler that understands the API functions
being used. For NVIDIA GPUs, this compiler is NVCC (the NVIDIA CUDA Compiler).
The device code runs on the GPU, and the host code runs on the CPU. NVCC
processes a CUDA program and separates the host code from the device code by
looking for special CUDA keywords. The code intended to run on the GPU (device
code) is marked with CUDA keywords labelling data-parallel functions, called
'kernels'. The device code is then further compiled by NVCC and executed on
the GPU.
CHAPTER 4
FLOWCHARTS AND DIAGRAMS
CHAPTER 5
IMPLEMENTATION
The corresponding bounding box coordinates for the objects were generated using
a simple Python program. This process comes with a major downside: the training
images would always have a white background, which may negatively affect
real-world classification performance. To remedy this issue, we merged each
image with a random image from the internet as its background. The program then
automatically generates the corresponding CSV files for the label data. Each
tuple in the CSV file consists of a particular object with its class value and
its bounding box coordinates within the image. These coordinates are necessary
for the training process.
The final dataset was then split according to the 80-20 rule, where 80% of the
images went into training and the remaining 20% went into evaluation.
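The CSV generation and 80-20 split can be sketched as follows (the file names and CSV column layout here are assumptions, not the authors' exact script):

```python
import csv
import random

def write_labels(rows, path):
    """Write (filename, class, xmin, ymin, xmax, ymax) tuples to a CSV."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "class", "xmin", "ymin", "xmax", "ymax"])
        writer.writerows(rows)

def split_80_20(items, seed=0):
    """Shuffle and split the dataset: 80% training, 20% evaluation."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(0.8 * len(items))
    return items[:cut], items[cut:]

images = [f"shoe_{i:04d}.jpg" for i in range(100)]
train, test = split_80_20(images)
print(len(train), len(test))  # 80 20
```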
In the above images, the area containing the footwear was identified against
the background for labelling purposes. The red bounding box contains the
necessary coordinates for the labelling process.
During training, the dataset of images was seen 14,000 times by the network.
After the loss value stabilized at about 0.04, the training was stopped and the
inference graph up to that checkpoint was exported for further use.
The model was trained with Stochastic Gradient Descent with a learning rate of
0.0002. The training was done on a laptop with an NVIDIA MX-150 GPU to
accelerate the training process.
Fig 5.2.3: Loss
For the process of object classification, various test images under varying
lighting conditions were collected. Each image was first cropped, greyscaled
and thresholded to a preset value so as to improve the results of the
classification process. The image is then sent to the trained SSD classifier.
SSD divides the image using a grid and has each grid cell be responsible for
detecting objects in that region of the image. The grids in the figure below
are representative; the actual grid sizes vary based on the input image.
Fig 5.3.1: SSD Grid
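The crop/greyscale/threshold preprocessing described above can be sketched in plain Python (in practice a library such as OpenCV would be used; the 0.5 threshold here is an assumed value, not the project's preset):

```python
def to_grey(pixel):
    """Luminance of an (R, G, B) pixel, each channel in [0, 1]."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b

def threshold(image, t=0.5):
    """Binarize a 2D greyscale image against a preset threshold t."""
    return [[1 if v >= t else 0 for v in row] for row in image]

rgb = [[(0.9, 0.9, 0.9), (0.1, 0.1, 0.1)]]        # one light, one dark pixel
grey = [[to_grey(p) for p in row] for row in rgb]
print(threshold(grey))  # [[1, 0]]
```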
Detecting objects simply means predicting the class and location of an object
within that region. If no object is present, we consider it the background
class, and the location is ignored. Each grid cell in the SSD can be assigned
multiple anchor/prior boxes. These anchor boxes are predefined, and each one is
responsible for a particular size and shape within a grid cell.
SSD uses a matching phase while training, to match the appropriate anchor box with
the bounding boxes of each ground truth object within an image. Essentially, the
anchor box with the highest degree of overlap with an object is responsible for
predicting that object’s class and its location. This property is used for training the
network and for predicting the detected objects and their locations once the network
has been trained.
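The matching phase pairs each ground truth box with the anchor that overlaps it most, with overlap measured by intersection-over-union (IoU); a minimal sketch, with boxes as (xmin, ymin, xmax, ymax) tuples:

```python
def iou(a, b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def best_anchor(ground_truth, anchors):
    """Index of the anchor with the highest IoU against the ground truth."""
    return max(range(len(anchors)), key=lambda i: iou(ground_truth, anchors[i]))

anchors = [(0, 0, 10, 10), (5, 5, 15, 15)]
print(best_anchor((6, 6, 14, 14), anchors))  # 1: the second anchor overlaps most
```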
Each of the eight networks was run on the validation dataset with a confidence
threshold of 0.6 and a non-maxima suppression threshold of 0.2. The high
confidence threshold ensures that the object detection algorithm does not
output unlikely detections (incorrect detections could have major
consequences), and the low non-maxima suppression value allows the network to
detect objects separated by relatively small distances. A copy of the processed
images, complete with labelled bounding boxes, is saved for later processing,
and the inference time for each image in the validation set is also recorded.
The corresponding 12-digit barcode encoding the price and article number of the
detected footwear is also generated and stored in a separate folder.
Fig 5.3.4: Operation Pipeline
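The confidence filtering and non-maxima suppression described above can be sketched as follows, with detections as (score, box) pairs and the thresholds from the text (0.6 and 0.2):

```python
def iou(a, b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def nms(detections, conf_thresh=0.6, iou_thresh=0.2):
    """Drop low-confidence detections, then greedily keep the highest
    scoring box and suppress remaining boxes that overlap it too much."""
    boxes = sorted((d for d in detections if d[0] >= conf_thresh), reverse=True)
    kept = []
    for score, box in boxes:
        if all(iou(box, k[1]) < iou_thresh for k in kept):
            kept.append((score, box))
    return kept

dets = [(0.9, (0, 0, 10, 10)), (0.8, (1, 1, 11, 11)),   # heavily overlapping
        (0.7, (50, 50, 60, 60)), (0.5, (0, 0, 5, 5))]   # last is low confidence
print(nms(dets))  # keeps the 0.9 and 0.7 detections only
```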
5.4 ADVANTAGES
The benefits of this project are:
● Decreased human labour required for identification
● Fully monitored conveyor line
● Automatic generation of barcodes for the identified products
CHAPTER 6
CONCLUSION
The aim of this project is to develop a system for identifying the various
types of footwear present in a conveyor line. This was achieved through the use
of image processing and classification techniques made possible by the rapid
increase in processing power of consumer-grade devices. The technologies behind
this proposal are image processing and object detection using a convolutional
neural network trained on a large dataset. After identification, a barcode was
printed for each of the identified objects. Optimum results were obtained with
relatively little computational effort, which also shows the efficiency of the
algorithm in the recognition and classification of the footwear. Another
advantage of these methods is that production line issues can be identified at
an earlier stage in a quick and efficient manner. To improve the recognition
rate in the classification process, Artificial Neural Networks, Bayes
classifiers, fuzzy logic and hybrid algorithms can also be used.
As a next step towards improving the results obtained here, further training of
the networks with the given datasets should help alleviate some of the
sensitivity issues identified. Furthermore, an expanded training dataset
including more varied lighting conditions, more cases of partial occlusion,
motion blur, and general noise should increase the sensitivity of the networks,
allowing smaller and faster networks to be used. Additionally, an expanded
dataset such as this may allow the use of the MobileNet-SSD network
architecture, which would further decrease detection times. Additional
improvements include the use of optimised OpenCV libraries for the NVIDIA
Jetson Nano compute boards used, and closer integration of the API into the
other autonomous production systems, which should provide further speed
increases. The system can be connected to a robotic arm or other mechanical
devices to perform automatic package sorting and identification tasks.
Supplementary smartphone apps can be created to monitor the conveyor belt
process in real time so that the production engineer can make necessary
decisions based on the product flow.
REFERENCES
[1] Enzeng Dong, Yao Lu, Shengzhi Du, "An Improved SSD Algorithm and its
Mobile Terminal Implementation", 2019 International Conference on Mechatronics
and Automation.
[2] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified,
real-time object detection,” in Proceedings of the IEEE Computer Society
Conference on Computer Vision and Pattern Recognition, 2016.
[4] Roland Sczabo, Aurel Gontean, "Industrial robotic automation with Raspberry
Pi using image processing", 2018 International Conference on Applied
Electronics.
[5] Yong Li, Jinbing Xu, "Electronic Product Surface Defect Detection Based on
a MSSD Network", 2020 IEEE 4th Information Technology, Networking, Electronic
and Automation Control Conference (ITNEC).
[6] Shaoqing Ren, Kaiming He, Ross Girshick and Jian Sun, "Faster R-CNN:
Towards Real-Time Object Detection with Region Proposal Networks",
arXiv:1506.01497v3 [cs.CV], 6 Jan 2016.
[7] K. He, X. Zhang, S. Ren and J. Sun, "Deep Residual Learning for Image
Recognition", 2016 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), 2016, pp. 770-778, doi: 10.1109/CVPR.2016.90.
[9] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed,
Cheng-Yang Fu: “SSD: Single Shot MultiBox Detector”, 2016; arXiv:1512.02325.
[10] J. Jia, "A Machine Vision Application for Industrial Assembly Inspection," 2009
Second International Conference on Machine Vision, Dubai, 2009, pp. 172-176, doi:
10.1109/ICMV.2009.51.
[11] Li, Yiting & Huang, Haisong & Xie, Qingsheng & Yao, Liguo & Chen, Qipeng. (2018).
Research on a Surface Defect Detection Algorithm Based on MobileNet-SSD. Applied
Sciences. 8. 1678. 10.3390/app8091678.