farmer’s annual monetary revenue and minimizing crop loss caused by pest and disease attacks. The work carried out in this part of the research thus puts forth a solution consisting of three major steps: data acquisition, pre-processing and classification for plant disease detection. It further helps in resolving those issues and producing healthy crop growth.

II. LITERATURE SURVEY

The authors of [11] give a brief overview of the Support Vector Machine. They focus on proposing an automated system for the diagnosis of common paddy diseases using k-means clustering and feature extraction. The authors of [12] focus on plant diseases caused by climatic conditions in Thailand. Accordingly, they state that the system should have an application that can perform specific disease diagnosis using a rule-based model of data mining.

The authors of [13] propose a model that provides an automatic method to find leaf diseases by inspecting whether an image submitted to the system shows signs of disease. Results are generated using image segmentation with different cluster sizes, thereby obtaining optimized results. The work carried out by the authors of [14], however, emphasizes determining nitrogen deficiency and anticipating the right quantity of fertilizer required for the type of area using feature extraction and text extraction.

However, the authors of [14] regard digital image recognition of plant diseases as one of the thrust areas and hence came out with a model comprising back propagation networks and probabilistic neural networks. It depends on color features, shape features and text features extracted from the disease image.

Also, the work of the authors of [15] focuses on the occurrence of risk factors in apple. A beta regression model is used to predict the subsystem and the severity of the apple disease, helping farmers decide on pesticide spraying and reduce disease.

Thus, a huge amount of research is ongoing in the domain of agriculture in order to yield better and more satisfactory results.

III. SYSTEM DESIGN AND ARCHITECTURE

MATERIALS AND METHODS
In order to develop a model for plant disease recognition, the approach used is a deep CNN.

1. DATASET
Image-based identification, from the training phase through the evaluation phase in which the performance of the classification algorithms is evaluated, requires large datasets. Hence, the data is collected from the PlantVillage website. The collected images are labeled with four categories: bacterial spot, yellow leaf curl virus, late blight and healthy (in order to differentiate healthy leaves from affected ones). Subsequently, the dataset is enhanced by adding augmented images. The network is then trained to learn features that differentiate one class from another. Correspondingly, a database comprising more than 5000 images is used for training, and around 1000 images are used for validation.

2. PROCESS AND LABEL OF IMAGES
The images collected from PlantVillage come in several formats with varying resolutions and hence varying quality. Thus, to obtain reasonable feature extraction, the final images used as input data for the classifier are pre-processed to achieve consistency.

At the time of data collection, images with a smaller resolution, having a dimension of less than 500 px, are not considered valid images for the dataset. As such, images with higher resolution form the potential candidates for this investigation, which ensures that the images contain all the required information for feature learning. Accordingly, the images used for the dataset were resized to 50 x 50. This reduces the time required for training; the resizing is computed automatically by a script written in Python using the OpenCV framework. Pre-processing the images involves filtering out background noise, intensity normalization of individual image particles, removing reflections and masking portions of images.

This work is broadly classified into two algorithms:
Artificial Neural Networks algorithm:
ANN is used to detect plant swelling (moisture content), disease and pests, along with soil analysis. The dataset of plant leaf, disease, pest and soil images is trained in Python and classified into various clusters, which are assigned labels like “Disease X and Pest Y”. This phase is disease/pest detection. A Convolutional Neural Network is designed for accurate analysis. Unsupervised learning classification is used since the input image is unknown and new to the algorithm. Most real-time applications need unsupervised learning, since the input is always unknown to the algorithm.
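
The filtering and resizing rules stated under PROCESS AND LABEL OF IMAGES (discard images with a dimension below 500 px, then resize to 50 x 50) can be sketched as follows. This is only an illustration: the paper's own OpenCV script is not shown, so the sketch uses a plain-Python nearest-neighbour resize on nested lists instead of OpenCV, and it assumes that an image is rejected when either its height or its width is below 500. The names `is_valid`, `resize_nearest` and `preprocess` are hypothetical.

```python
MIN_DIMENSION = 500   # from the text: smaller images are not considered valid
TARGET_SIZE = 50      # from the text: images are resized to 50 x 50


def is_valid(image):
    """Return True if both height and width are at least MIN_DIMENSION.

    Assumption: "a dimension less than 500 px" means either side.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    return min(height, width) >= MIN_DIMENSION


def resize_nearest(image, size=TARGET_SIZE):
    """Nearest-neighbour resize of a 2-D list of pixels to size x size.

    Stand-in for the OpenCV resize mentioned in the text.
    """
    height, width = len(image), len(image[0])
    return [
        [image[r * height // size][c * width // size] for c in range(size)]
        for r in range(size)
    ]


def preprocess(images):
    """Drop low-resolution images, then resize the rest to 50 x 50."""
    return [resize_nearest(img) for img in images if is_valid(img)]
```

Shrinking every image to the same small size keeps the classifier input consistent and reduces training time, which is the motivation given in the text.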
The dataset is divided into 70% for training, 10% for validation and 20% for testing. Different models with different architectures and learning rates are tested. The network parameters, such as the kernel size, filter size and learning parameter, were selected by trial and error. Table 1 depicts the classification results from different models using different architectures.
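
A minimal sketch of the 70% / 10% / 20% split described above is given here. The function name `split_dataset`, the fixed random seed and the choice to shuffle before splitting are assumptions made for illustration; the paper does not state how the split was drawn.

```python
import random


def split_dataset(items, train=0.7, val=0.1, seed=0):
    """Shuffle items and split them into train, validation and test parts.

    The test part receives whatever remains (20% with the default ratios).
    """
    items = list(items)
    random.Random(seed).shuffle(items)   # seeded so the split is repeatable
    n_train = round(len(items) * train)
    n_val = round(len(items) * val)
    train_set = items[:n_train]
    val_set = items[n_train:n_train + n_val]
    test_set = items[n_train + n_val:]
    return train_set, val_set, test_set
```

Keeping the test part untouched until the end means the reported accuracy reflects images the network has never seen during training or model selection.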
Figure 4: Working of CNN layers

From the results, the classification accuracy on color images is better than on grayscale and segmented images. This shows that color is an important feature for classification. The model that provides good classification accuracy contains three convolutional layers, each followed by a max pooling layer. The graphs of training accuracy versus validation accuracy of the model are shown. It can be seen from the graphs that the model is overfitting. Overfitting happens when the model fits the training set too well. It then becomes difficult for the model to generalize to new examples that were not in the training set.
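
One simple way to read overfitting off the training and validation accuracy curves is to look at the gap between them. The sketch below is an illustration only: the 0.05 threshold, the function names and the idea of picking the epoch with the best validation accuracy are assumptions, not the procedure used in the paper.

```python
def accuracy_gap(train_acc, val_acc):
    """Per-epoch difference between training and validation accuracy."""
    return [t - v for t, v in zip(train_acc, val_acc)]


def is_overfitting(train_acc, val_acc, threshold=0.05):
    """Flag overfitting when the final-epoch gap exceeds the threshold.

    The threshold is a hypothetical value chosen for illustration.
    """
    return accuracy_gap(train_acc, val_acc)[-1] > threshold


def best_epoch(val_acc):
    """Index of the epoch with the highest validation accuracy."""
    return max(range(len(val_acc)), key=lambda i: val_acc[i])
```

A training accuracy that keeps rising while the validation accuracy flattens or drops gives a widening gap, which matches the description of overfitting above.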
Figure 7: Activation process of the feature maps

... is implemented with training data and classification of the given image dataset. The test input image is compared with the trained data for detection and prediction analysis. From the results, it is clear that the model provides reliable results.
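
To give a rough sense of how the feature maps shrink through the network, the sketch below traces the spatial size of a 50 x 50 input through three convolution and max-pooling stages, the layer arrangement described earlier. The 3 x 3 kernel, stride of 1, absence of padding and 2 x 2 pooling window are assumptions chosen for illustration; the paper does not state these values.

```python
def conv_output(size, kernel=3, stride=1, padding=0):
    """Spatial size after a convolution (standard output-size formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output(size, window=2):
    """Spatial size after non-overlapping max pooling."""
    return size // window


def trace_feature_map_sizes(input_size=50, stages=3):
    """Return the feature-map side length after each conv + pool stage."""
    sizes = []
    size = input_size
    for _ in range(stages):
        size = pool_output(conv_output(size))
        sizes.append(size)
    return sizes
```

Under these assumed settings the side length goes from 50 to 24, then 11, then 4, so the final feature maps are small enough to feed into a compact classifier.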
VII. CONCLUSION
A convolutional neural network is used to detect and classify plant diseases. The network is trained using images taken in the natural environment and achieved 99.32% classification accuracy. This shows the ability of the CNN to extract important features in the natural environment, which is required for plant disease classification.

Image classification, image categorization, feature extraction and data training are carried out. The whole development of the algorithm is done in Python. Using several toolboxes like the Statistics and Machine Learning Toolbox, Neural Network Toolbox and Image Processing Toolbox, the outputs as of now are the training data in the form of image categories, image classification using K-Means clustering and moisture content along with predicting of withstanding. The algorithm

VIII. REFERENCES
6. German, L., Ramisch, J.J. & Verma, R. (2010). Beyond the Biophysical: Knowledge, Culture, and Power in Agriculture and Natural Resource Management. Springer Publ.

7. Jun Wu, Anastasiya Olesnikova, Chi-Hwa Song, Won Don Lee (2009). The Development and Application of Decision Tree for Agriculture Data. IITSI: 16-20.

8. Leemans, V. & Destain, M.F. (2004). A real-time grading method of apples based on features extracted from defects. J. Food Eng. 61, 83-89.

9. Quinlan, J.R. (1985b). Decision trees and multi-valued attributes. In J.E. Hayes & D. Michie (Eds.), Machine Intelligence 11. Oxford University Press (in press).