
DATA MINING APPLICATIONS

IN AGRICULTURE

Prof. Navneet Goyal


Department of Computer Science & Information Systems,
BITS, Pilani.
Agricultural Applications
• Mushroom Grading
• Apple Pest Management (PICO)
• Apple Proliferation Disease
• Soil Salinity
• Integrated Production in Agriculture
• Pesticide Abuse
• Precision Agriculture
• Drought Risk Management

2/23/2007 Dr. Navneet Goyal, BITS, Pilani 2


Mushroom Grading
• Classification problem
• Develop a classification system for quality grading of mushrooms that achieves an accuracy similar to that of human inspectors
• Details can be found in Kusabs et al., 1998.



Mushroom Grading: Building the Model
• Data Preprocessing
– Cleansing of raw data
– Construction of test data set in collaboration with agricultural researchers
– Dataset contains descriptions of 282 mushrooms
– Objective & subjective measures



Mushroom Grading: Building the Model
• Objective Measures
– Weight
– Firmness
– Percentage of cap opening
• Subjective Measures: Likert scale estimates of the degree of dirt, stalk damage, bruising, shrivel, bacterial blotch and P. gingeri



Mushroom Grading: Building the Model
• Three inspectors independently graded the mushrooms using the three broad commercial grades (1st, 2nd, and 3rd grade)
• Digital images were captured for the 282 mushrooms
• 60 image-based attributes: frequency bin values (0-4) from the analysis of Red, Green and Blue (R, G, B) and Hue, Saturation and Value (H, S, V) histograms for top (t) and bottom (b) images of the sample mushrooms
• Dimensionality reduction: many of the 68 attributes were eliminated
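
The count of 60 image-based attributes is consistent with 6 colour channels × 5 bins × 2 views. The sketch below shows one way such attributes could be computed; it is an illustration only, not the procedure from the cited study. The pixel values, the attribute naming scheme (e.g. `R0t`) and the use of equal-width bins are all assumptions.

```python
import colorsys

N_BINS = 5  # bins numbered 0-4, as on the slide


def channel_histogram(values, n_bins=N_BINS):
    """Relative frequency of values in [0, 1] over n_bins equal-width bins."""
    counts = [0] * n_bins
    for v in values:
        # min() keeps v == 1.0 inside the last bin
        counts[min(int(v * n_bins), n_bins - 1)] += 1
    return [c / len(values) for c in counts]


def image_features(pixels, view):
    """pixels: non-empty list of (r, g, b) tuples in 0..255; view: 't' or 'b'."""
    rgb = [(r / 255, g / 255, b / 255) for r, g, b in pixels]
    hsv = [colorsys.rgb_to_hsv(*p) for p in rgb]
    channels = {
        "R": [p[0] for p in rgb],
        "G": [p[1] for p in rgb],
        "B": [p[2] for p in rgb],
        "H": [p[0] for p in hsv],
        "S": [p[1] for p in hsv],
        "V": [p[2] for p in hsv],
    }
    features = {}
    for name, values in channels.items():
        for i, freq in enumerate(channel_histogram(values)):
            features[f"{name}{i}{view}"] = freq  # hypothetical attribute name
    return features


# Hypothetical pixel samples standing in for top and bottom images
top = [(200, 180, 160), (210, 190, 170), (90, 60, 40)]
bottom = [(120, 100, 90), (130, 110, 95)]
features = {**image_features(top, "t"), **image_features(bottom, "b")}
print(len(features))  # 6 channels x 5 bins x 2 views = 60
```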



Mushroom Grading

Figure taken from “Developing innovative applications in agriculture using data mining” by Sally Jo Cunningham and Geoffrey Holmes


Mushroom Grading
• A separate model was developed for each of the three inspectors
• Models suggested that each inspector used different combinations of attributes when assigning grades to mushrooms
• All predictive models used attributes from top & bottom images
• Only inspector 2 used weight for the classification



Mushroom Grading
• Subjective measurements did not increase the accuracy of any of the prediction models
• They were therefore removed by the wrapper technique
• Each model finally used between 4 and 7 attributes
• Avg. accuracy of these models was comparable with that of human experts
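
A wrapper technique scores candidate attribute subsets by the accuracy of the classifier itself and keeps only attributes that improve it. The sketch below is a minimal greedy forward-selection wrapper, assuming a 1-nearest-neighbour classifier scored by leave-one-out accuracy. The classifier, the attribute names and the toy records are stand-ins chosen for illustration, not those used in the cited study.

```python
def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    if not features:
        return 0.0
    correct = 0
    for i, row in enumerate(rows):
        best_j, best_d = None, float("inf")
        for j, other in enumerate(rows):
            if i == j:
                continue
            d = sum((row[f] - other[f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(rows)


def forward_wrapper(rows, labels, candidates):
    """Greedily add the attribute that most improves accuracy; stop when none does."""
    selected, best = [], 0.0
    while True:
        scored = [
            (loo_accuracy(rows, labels, selected + [f]), f)
            for f in candidates
            if f not in selected
        ]
        if not scored:
            break
        score, feature = max(scored)
        if score <= best:
            break
        selected.append(feature)
        best = score
    return selected, best


# Hypothetical toy records: cap_opening separates the grades, the rest do not
rows = [
    {"cap_opening": 0.10, "weight": 0.50, "dirt": 0.90},
    {"cap_opening": 0.15, "weight": 0.70, "dirt": 0.10},
    {"cap_opening": 0.20, "weight": 0.60, "dirt": 0.50},
    {"cap_opening": 0.80, "weight": 0.52, "dirt": 0.88},
    {"cap_opening": 0.85, "weight": 0.72, "dirt": 0.12},
    {"cap_opening": 0.90, "weight": 0.62, "dirt": 0.52},
]
labels = [1, 1, 1, 3, 3, 3]  # commercial grade assigned by one inspector

selected, accuracy = forward_wrapper(rows, labels, ["cap_opening", "weight", "dirt"])
print(selected, accuracy)
```

On this toy data the uninformative `dirt` attribute is never selected, which mirrors how subjective measures could drop out of a model when they add no accuracy.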



Mushroom Grading
• Results indicate that visual attributes, which can be extracted from digital images, are sufficient for mushroom grading
• Subjective attributes, commonly believed to play a crucial role in grading, are apparently irrelevant to the task
• A surprising bit of ‘mined’ information



Mushroom Grading
• Mined information echoes the conclusions of a classic ML paper (Michalski & Chilausky, 1980)
• The paper induced a set of rules for diagnosing soybean disease
• The rules were strikingly dissimilar to expert opinions on the correct diagnosis procedure
• But the rules were so accurate that one expert adopted the ‘discovered’ rules in place of his own



Mushroom Grading
• Information mined from data can provide insights into the domain being studied that may run counter to the received wisdom of a field
• Locating these surprising or unusual portions of the model can be the focus for a data mining analysis, so that the results can be applied back in the domain from which the data was drawn
• In this case, the results indicate that the subjective attributes for mushroom grading may not be useful in practice, and so perhaps they need not be measured or recorded
• The classification model may prove useful in developing more objective standards for quality grading and market pricing of mushrooms



Pesticide Abuse*
• Recent studies by agriculture researchers in Pakistan have shown that attempts at crop yield maximization through pro-pesticide state policies have led to dangerously high pesticide usage.
• These studies have reported a negative correlation between pesticide usage and crop yield in Pakistan.
• Excessive use (or abuse) of pesticides is harming farmers, with adverse financial, environmental and social impacts.
* Based on the paper “Learning Dynamics of Pesticide Abuse through Data Mining” by Ahsan Abdullah, Stephen Brobst and Ijaz Pervaiz, at the Australasian Workshop on Data Mining and Web Intelligence (AWDM&WI2004), Dunedin, New Zealand. Conferences in Research and Practice in Information Technology, Vol. 32. Editors: James Hogan, Paul Montague, Martin Purvis and Chris Steketee.


Pesticide Abuse
• Data mining of integrated agricultural data, including pest scouting, pesticide usage and meteorological recordings, is useful for optimization (and reduction) of pesticide usage.
• This data is clustered using the Recursive Noise Removal (RNR) heuristic of Abdullah and Brobst (2003)
• The clusters reveal interesting patterns of farmer practices along with pesticide usage dynamics, and hence help identify the reasons for this pesticide abuse.



Pesticide Abuse



Data Collection
• Pest Scouting Data
– Past Pest Situation
– Pesticide Usage Data
– Farmer Demographics
• Pest scouting is a systematic field sampling process that provides field-specific information on pest pressure and crop injury
• This data was obtained from the Directorate of Pest Warning and Quality Control of Pesticides (DPWQCP), Government of Punjab
• Since 1984 the directorate has been collecting and recording pest scouting data on a weekly basis from mostly 1800 random locations.
• The province of Punjab was selected for this study because it is a major producer of the cotton crop (Federal Bureau of Statistics 2002).


Data Collection
• Pest scouting data by itself is a “Gold Mine” of data
• Coupling it with pesticide usage and meteorological data can provide excellent insight into the dynamics of past situations and their outcomes
• In Pakistan the pest scouting data had never been digitized, and until now it was impossible for any researcher to use it for an in-depth analysis.
• As a pilot project we implemented a data warehouse using two years of pest scouting, pesticide usage and meteorological data, consisting of 200 typed sheets, with each record consisting of 40 attributes.


Data Collection
• The data warehouse was implemented after digitization, cleansing and integration of data generated by multiple disparate sources.
• In the first phase of implementation we covered only the district of Multan, which is one of the thirty-four districts of the Province of Punjab.
• District Multan is the hub of cotton production and cotton-related activities in the Province (Federal Bureau of Statistics 2002)



Objective
• Finding the conditions in which pesticide usage will be optimal
• A typical question from a cotton grower would be “Which pesticide should be used? And when?”
• These questions were modeled by looking for patterns and relationships between pest population and meteorological data elements, and by trying to find (if possible) the temperature and humidity thresholds at which the population of a certain pest booms (or declines)



Approach
• Twenty records were chosen at random for the year 2001, and the populations of cotton pests such as Jassid (Amrasca), Thrips (Thrips tabaci) and Spotted Bollworm (Earias vittella) were noted
• For each record, the min and max temperature and % humidity were retrieved from the daily weather database
• This gave a 20×21 table
• A 20×20 similarity matrix was built by calculating pairwise Pearson’s correlation
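
The step from a 20×21 table to a 20×20 similarity matrix can be sketched as follows: each of the 20 records is treated as a vector of 21 values, and every pair of records is compared with Pearson's correlation. The table here is filled with random placeholder numbers, since the actual pest and weather values are not given on the slides.

```python
import math
import random


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / (sd_x * sd_y)


def similarity_matrix(table):
    """Pairwise record-to-record correlations: n records give an n x n matrix."""
    return [[pearson(r1, r2) for r2 in table] for r1 in table]


random.seed(0)
# Placeholder data: 20 records x 21 attributes (not the study's values)
table = [[random.random() for _ in range(21)] for _ in range(20)]
sim = similarity_matrix(table)
print(len(sim), len(sim[0]))  # 20 20
```

The resulting matrix is symmetric with ones on the diagonal, which is the form a clustering heuristic operating on pairwise similarities would take as input.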



Findings

[Figure: Clusters identified by RNR, with two clusters labeled C1 and C2]



Findings



Findings

IF Temp > 29 AND Humidity > 70 THEN high pest incidence


IF Temp < 27 AND Humidity < 67 THEN low pest incidence
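
The two rules above can be written as a small function and checked against records. This is a sketch only: the slides do not state units (degrees Celsius and percent relative humidity are assumed here), the sample records are invented, and the "undetermined" outcome for readings that match neither rule is an added convention.

```python
def pest_incidence(temp, humidity):
    """Apply the two mined rules; units are assumed to be deg C and % humidity."""
    if temp > 29 and humidity > 70:
        return "high"
    if temp < 27 and humidity < 67:
        return "low"
    return "undetermined"  # neither rule fires (added convention)


# Hypothetical (temperature, humidity) readings
records = [(31.0, 75.0), (25.0, 60.0), (28.0, 68.0), (30.0, 65.0)]
for temp, humidity in records:
    print(temp, humidity, pest_incidence(temp, humidity))
```

Readings that fall between the two thresholds are left unlabelled, because the rules as stated say nothing about them.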



Findings
• Checking these rules against the data (325 matching records retrieved out of 2,000+) shows some very exciting results, as shown in the figure
• This experiment presents a credible case that common farmer questions can be modeled through this data mining technique, and that answers can be given, based on evidence present in the data, before the pest attack occurs.

