Department of Computer Science & Information Systems, BITS, Pilani.

Agricultural Applications
– Mushroom Grading
– Apple Pest Management (PICO)
– Apple Proliferation Disease
– Soil Salinity
– Integrated Production in Agriculture
– Pesticide Abuse
– Precision Agriculture
– Drought Risk Management
2/23/2007 Dr. Navneet Goyal, BITS, Pilani 2
Mushroom Grading
– Classification problem
– Develop a classification system for quality grading of mushrooms that achieves an accuracy similar to that of human inspectors
– Details can be found in Kusabs et al., 1998.
Mushroom Grading: Building the Model
Data Preprocessing
– Cleansing of raw data
– Construction of test data set in collaboration with agricultural researchers
– Dataset contains descriptions of 282 mushrooms
– Objective & subjective measures
Mushroom Grading: Building the Model
Objective Measures
– Weight
– Firmness
– Percentage of cap opening
Subjective Measures
– Likert-scale estimates of the degree of dirt, stalk damage bruising, shrivel, bacterial blotch and P. gingeri
Mushroom Grading: Building the Model
– Three inspectors independently graded the mushrooms using the three broad commercial grades (1st, 2nd, and 3rd grade)
– Digital images were captured for the 282 mushrooms
– 60 image-based attributes: frequency bin values (0-4) from the analysis of Red, Green and Blue (R, G, B) and Hue, Saturation and Value (H, S, V) histograms for top (t) and bottom (b) images of the sample mushrooms
– Dimensionality reduction: many of the 68 attributes were eliminated
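The count of 60 image-based attributes follows from 6 colour channels × 5 frequency bins × 2 views. The sketch below enumerates that attribute space and shows one plausible way to turn channel values into bin frequencies. The attribute naming scheme, the equal-width binning and the 0-255 value range are assumptions made for illustration; the slides do not say how the bins were actually computed.

```python
from itertools import product

CHANNELS = ["R", "G", "B", "H", "S", "V"]  # colour channels named on the slide
VIEWS = ["t", "b"]                          # top and bottom images
BINS = range(5)                             # frequency bins 0-4


def attribute_names():
    """Hypothetical names such as 'tR0' (top image, red channel, bin 0)."""
    return [f"{view}{channel}{b}" for view, channel, b in product(VIEWS, CHANNELS, BINS)]


def bin_frequencies(values, n_bins=5, max_value=255):
    """Fraction of values falling in each of n_bins equal-width bins.

    Assumes channel values lie in the range 0..max_value.
    """
    counts = [0] * n_bins
    for v in values:
        idx = min(int(v * n_bins / (max_value + 1)), n_bins - 1)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts] if total else [0.0] * n_bins
```

Calling `len(attribute_names())` gives 60, matching the number of image-based attributes quoted above.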
Mushroom Grading
Figure taken from “Developing innovative applications in agriculture using data mining” by Sally Jo Cunningham and Geoffrey Holmes
Mushroom Grading
– A separate model was developed for each of the three inspectors
– Models suggested that each inspector used a different combination of attributes when assigning grades to mushrooms
– All predictive models used attributes from top & bottom images
– Only inspector 2 used weight for the classification
Mushroom Grading
– Subjective measurements did not increase the accuracy of any of the prediction models, so they were removed by the wrapper technique
– Each model finally used between 4 and 7 attributes
– Avg. accuracy of these models was comparable with that of human experts
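A wrapper technique scores candidate attribute subsets by the accuracy of a model trained on them, and keeps only attributes that improve that accuracy. The following is a minimal sketch of the idea, assuming greedy forward search and a 1-nearest-neighbour classifier evaluated by leave-one-out; the slides do not state which search strategy or learner was used in the mushroom study, so both choices (and the function names) are illustrative only.

```python
def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that looks only at the given feature indices."""
    if not features:
        return 0.0
    correct = 0
    for i, row in enumerate(rows):
        best_dist, best_label = None, None
        for j, other in enumerate(rows):
            if i == j:
                continue  # hold out the record being classified
            dist = sum((row[f] - other[f]) ** 2 for f in features)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[j]
        correct += best_label == labels[i]
    return correct / len(rows)


def forward_wrapper_selection(rows, labels, n_features):
    """Greedily add the attribute that most improves accuracy;
    stop when no remaining attribute gives an improvement."""
    selected, best_score = [], 0.0
    while True:
        candidates = [f for f in range(n_features) if f not in selected]
        scored = [(loo_accuracy(rows, labels, selected + [f]), f) for f in candidates]
        if not scored:
            break
        # Highest accuracy wins; ties go to the lower attribute index.
        score, feature = max(scored, key=lambda s: (s[0], -s[1]))
        if score <= best_score:
            break
        selected.append(feature)
        best_score = score
    return selected, best_score
```

An attribute that never raises the accuracy, as reported for the subjective measurements, is simply never added to `selected`.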
Mushroom Grading
– Results indicate that visual attributes, which can be extracted from digital images, are sufficient for mushroom grading
– Subjective attributes, commonly believed to play a crucial role in grading, are apparently irrelevant to the task
– Surprising bit of ‘mined’ information
Mushroom Grading
– Mined information echoes the conclusions of a classic ML paper (Michalski & Chilausky, 1980)
– The paper induced a set of rules for diagnosing soybean disease
– The rules were strikingly dissimilar to expert opinions on the correct diagnosis procedure
– But the rules were so accurate that one expert adopted the ‘discovered’ rules in place of his own
Mushroom Grading
– Information mined from data can provide insights into the domain being studied that may run counter to the received wisdom of a field
– Locating these surprising or unusual portions of the model can be the focus of a data mining analysis, so that the results can be applied back in the domain from which the data was drawn
– In this case, the results indicate that the subjective attributes for mushroom grading may not be useful in practice, and so perhaps they need not be measured or recorded
– The classification model may prove useful in developing more objective standards for quality grading and market pricing of mushrooms
Pesticide Abuse*
– Recent studies by agriculture researchers in Pakistan have shown that attempts at crop yield maximization through pro-pesticide state policies have led to dangerously high pesticide usage.
– These studies have reported a negative correlation between pesticide usage and crop yield in Pakistan.
– Excessive use (or abuse) of pesticides is harming farmers, with adverse financial, environmental and social impacts.
* Based on the paper “Learning Dynamics of Pesticide Abuse through Data Mining” by Ahsan Abdullah, Stephen Brobst and Ijaz Pervaiz, at the Australasian Workshop on Data Mining and Web Intelligence (AWDM&WI2004), Dunedin, New Zealand. Conferences in Research and Practice in Information Technology, Vol. 32. Editors: James Hogan, Paul Montague, Martin Purvis and Chris Steketee.
Pesticide Abuse
– Data mining of integrated agricultural data, including pest scouting, pesticide usage and meteorological recordings, is useful for optimization (and reduction) of pesticide usage.
– This data is clustered using the Recursive Noise Removal (RNR) heuristic of Abdullah and Brobst (2003).
– The clusters reveal interesting patterns of farmer practices along with pesticide usage dynamics, and hence help identify the reasons for this pesticide abuse.
Pesticide Abuse
Data Collection
Pest Scouting Data
– Past Pest Situation
– Pesticide Usage Data
– Farmer Demographics
Pest scouting is a systematic field sampling process that provides field-specific information on pest pressure and crop injury.
This data was obtained from the Directorate of Pest Warning and Quality Control of Pesticides (DPWQCP), Government of Punjab.
Since 1984 the said directorate has been collecting and recording pest scouting data on a weekly basis from mostly 1800 random locations.
For this study the province of Punjab was selected because it is a major producer of the cotton crop (Federal Bureau of Statistics 2002).
Data Collection
– Pest scouting data by itself is a “Gold Mine” of data.
– Coupling it with pesticide usage and meteorological data can provide excellent insight into the dynamics of past situations and their outcomes.
– In Pakistan the pest scouting data had never been digitized, and until now it was impossible for any researcher to use it for an in-depth analysis.
– As a pilot project we implemented a data warehouse using two years of pest scouting, pesticide usage and meteorological data, consisting of 200 typed sheets, with each record consisting of 40 attributes.
Data Collection
– The data warehouse was implemented after digitization, cleansing and integration of data generated by multiple disparate sources.
– In the first phase of implementation we covered district Multan only, which is one of the thirty-four districts of the Province of Punjab.
– District Multan is the hub of cotton production and cotton-related activities in the Province (Federal Bureau of Statistics 2002).
Objective
– Finding the conditions in which pesticide usage will be optimal
– A typical question from a cotton grower would be “Which pesticide should be used? And when?”
– These questions were modeled by looking for patterns and relationships between pest population and meteorological data elements, and by finding (if possible) the temperature and humidity thresholds at which the population of a certain pest booms (or declines)
Approach
– Twenty records were chosen at random for the year 2001, and the populations of cotton pests such as Jassid (Amrasca), Thrips (Thrips tabaci) and Spotted Bollworm (Earias vittella) were noted
– For each record the min and max temperature and % humidity were retrieved from the daily weather database
– The resulting 20×21 table was converted into a 20×20 similarity matrix by calculating pairwise Pearson’s correlation
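The step from a 20×21 table to a 20×20 similarity matrix can be sketched as follows: each of the 20 records is treated as a vector of 21 values, and entry (i, j) of the matrix is the Pearson correlation between record i and record j. This is a minimal illustration only; which 21 columns made up each record is not stated in the slides, and returning 0.0 for a constant record is an assumption made here to avoid division by zero.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant record is treated as uncorrelated
    return cov / sqrt(var_x * var_y)


def similarity_matrix(table):
    """Record-by-record matrix: a table with n rows yields an n x n matrix."""
    return [[pearson(r1, r2) for r2 in table] for r1 in table]
```

The resulting matrix is symmetric with ones on the diagonal, and it is this kind of matrix that a clustering heuristic such as RNR would then take as input.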
Findings
Figure: Clusters identified by RNR (C1 and C2)
Findings
Findings
IF Temp > 29 AND Humidity > 70 THEN high pest incidence
IF Temp < 27 AND Humidity < 67 THEN low pest incidence
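The two threshold rules can be applied directly to weather records. The sketch below is illustrative: the function names and the dictionary record layout are hypothetical, and temperature and humidity are assumed to be in the same units as the thresholds above, which the slides do not spell out.

```python
def pest_incidence(temp, humidity):
    """Apply the two mined rules; return None when neither rule fires."""
    if temp > 29 and humidity > 70:
        return "high"
    if temp < 27 and humidity < 67:
        return "low"
    return None


def matching_records(records):
    """Keep only records covered by one of the rules, tagged with the prediction.

    Each record is assumed to be a dict with 'temp' and 'humidity' keys.
    """
    matched = []
    for rec in records:
        label = pest_incidence(rec["temp"], rec["humidity"])
        if label is not None:
            matched.append({**rec, "predicted": label})
    return matched
```

Records falling between the two rules (for example a temperature of 28) are left unclassified rather than forced into either group.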
Findings
– Checking these rules against the data (325 matching records retrieved out of 2,000+) shows some very exciting results, as shown in the figure.
– This experimentation presents a very credible case that common farmer questions can be modeled through this data mining technique, and that answers can be given, based on evidence present in the data, before the pest attack occurs.