Computers "learn" about agricultural data



Press Release, June 30, 1994

by Dr Robert McQueen

A new computer software technique, called machine learning, is being investigated by a research group at the University of Waikato in Hamilton.

Machine learning uses a number of methods to extract relationships and rules from sets of data. The researchers are particularly interested in agriculture-related data, and in how these techniques might be used to increase the effectiveness and productivity of the agricultural industry.

Most people are familiar with the concept of a database, where a set of records containing various fields of data can be interrogated, and the records with a match in the specified field can be extracted. Complex multiple field extractions can be assembled, such as "give me the records of all employees making over $30,000 and with an age of less than 30, and who have been with the company for more than 5 years". Database extractions require you to have a good understanding of the structure, or fields, of the database records, and some idea of what you want to find. For example, you might use a query like this to investigate whether a relationship that you think is true of the data -- maybe that all such employees have university degrees -- really does hold.
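To make the contrast with machine learning concrete, the employee query above can be sketched in a few lines of Python. This is only an illustration: the `Employee` record, its field names, and the sample data are invented for the example and do not come from any real database.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    # Hypothetical record layout; a real database would define its own fields.
    name: str
    salary: float
    age: int
    years_with_company: float
    has_degree: bool


# Invented sample records, for illustration only.
EMPLOYEES = [
    Employee("Aroha", 42000, 28, 6, True),
    Employee("Ben", 28000, 25, 7, False),
    Employee("Carol", 35000, 29, 5.5, True),
    Employee("Dave", 50000, 45, 20, True),
]


def select_employees(records):
    """Return records matching all three conditions of the example query."""
    return [
        r for r in records
        if r.salary > 30000 and r.age < 30 and r.years_with_company > 5
    ]


matches = select_employees(EMPLOYEES)

# Checking a relationship the user already suspects: do all of them have degrees?
all_have_degrees = all(r.has_degree for r in matches)

print([r.name for r in matches], all_have_degrees)
```

Note that the person writing the query had to know the field names and had to think of the "university degree" relationship in advance; the program only confirms or refutes it.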

Machine learning is quite different, in that the machine learning program finds relationships for you automatically, rather than you having to guess what relationships might exist. Machine learning can also accommodate data that includes some erroneous instances, quite unlike normal database retrieval work, where precision and correctness are essential. The output of a machine learning program might be a set of rules that helps you understand the structure of the data, such as "if attribute 1 is greater than 7.5, and attribute 2 is equal to the word brown, then attribute 9 is likely to be greater than 50 days, in 95% of the cases".
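The example rule can be written down as a small Python function, together with a check of how often it holds on a set of records. This is a sketch under stated assumptions: the dictionary keys `attribute_1`, `attribute_2` and `attribute_9` and the records themselves are invented, so the resulting percentage is not the 95% quoted in the rule.

```python
def rule_applies(record):
    """Condition part of the example rule."""
    return record["attribute_1"] > 7.5 and record["attribute_2"] == "brown"


def rule_confidence(records):
    """Return (number of covered records, fraction of those where the
    conclusion 'attribute 9 is greater than 50 days' actually holds)."""
    covered = [r for r in records if rule_applies(r)]
    if not covered:
        return 0, None
    correct = sum(1 for r in covered if r["attribute_9"] > 50)
    return len(covered), correct / len(covered)


# Invented records; one covered record contradicts the rule, as a noisy
# or erroneous instance might.
RECORDS = [
    {"attribute_1": 8.0, "attribute_2": "brown", "attribute_9": 62},
    {"attribute_1": 9.1, "attribute_2": "brown", "attribute_9": 55},
    {"attribute_1": 7.9, "attribute_2": "brown", "attribute_9": 71},
    {"attribute_1": 8.4, "attribute_2": "brown", "attribute_9": 12},
    {"attribute_1": 6.0, "attribute_2": "brown", "attribute_9": 20},
    {"attribute_1": 8.8, "attribute_2": "white", "attribute_9": 90},
]

covered_count, confidence = rule_confidence(RECORDS)
print(covered_count, confidence)
```

A rule that holds in most but not all covered cases is still reported, which is how such a program can tolerate a few erroneous instances.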

The most common machine learning technique is called "supervised learning." For example, a set of diagnoses prepared by an expert for tomato diseases, based on attributes visible on the plant, can be run through a machine learning program to determine rules that non-experts can use in future to help diagnose other plants based on the same visible attributes. This is called similarity-based machine learning, where the outcome classes (in this case, diagnoses) are known, but the rules to get there, based on attributes, may not be so easy to determine.
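As a rough illustration of learning rules from expert-labelled cases, the sketch below implements a very simple one-attribute rule learner: for each attribute it predicts the most frequent diagnosis for each value, then keeps the attribute that makes the fewest mistakes. This is a generic textbook-style method chosen for brevity, not necessarily the technique used by the Waikato group, and the attribute names, values and diagnoses are all invented.

```python
from collections import Counter, defaultdict


def learn_one_attribute_rules(examples, attributes, target):
    """Return (best attribute, {value: predicted class}, training errors)."""
    best = None
    for attr in attributes:
        # Count how often each diagnosis occurs for each value of this attribute.
        counts = defaultdict(Counter)
        for ex in examples:
            counts[ex[attr]][ex[target]] += 1
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(
            sum(c.values()) - c.most_common(1)[0][1] for c in counts.values()
        )
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best


def diagnose(plant, attr, rules):
    """Apply the learned rules to a new plant; None if the value was never seen."""
    return rules.get(plant[attr])


# Invented expert-labelled cases; the last one is deliberately inconsistent.
CASES = [
    {"leaf_appearance": "spotted", "height": "tall", "diagnosis": "disease_a"},
    {"leaf_appearance": "spotted", "height": "short", "diagnosis": "disease_a"},
    {"leaf_appearance": "yellowed", "height": "short", "diagnosis": "disease_b"},
    {"leaf_appearance": "yellowed", "height": "tall", "diagnosis": "disease_b"},
    {"leaf_appearance": "normal", "height": "tall", "diagnosis": "healthy"},
    {"leaf_appearance": "normal", "height": "short", "diagnosis": "healthy"},
    {"leaf_appearance": "spotted", "height": "tall", "diagnosis": "disease_b"},
]

attr, rules, errors = learn_one_attribute_rules(
    CASES, ["leaf_appearance", "height"], "diagnosis"
)
new_plant = {"leaf_appearance": "yellowed", "height": "short"}
print(attr, rules, errors, diagnose(new_plant, attr, rules))
```

The learned rules can then be handed to a non-expert, who only needs to observe the chosen attribute on a new plant.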

Another type of machine learning is called "unsupervised learning," or clustering. This technique is useful where there is a mass of data giving the attributes of individual cases, but the human expert has no clear idea of which cases go together. Unsupervised machine learning can produce clusters of cases which seem to have something in common, which can then be further examined to find the exact rules and relationships.
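One widely used clustering method is k-means, sketched below as an example of the general idea; the press release does not say which clustering method the group uses. The two-attribute data points are invented, and the starting centres are simply the first k cases so that the example behaves the same on every run.

```python
import math


def k_means(points, k, iterations=10):
    """Group points into k clusters; returns (centres, clusters)."""
    # Assumption for reproducibility: start from the first k points.
    centres = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assign each case to its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Move each centre to the mean of the cases assigned to it.
        centres = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centres[i]
            for i, cluster in enumerate(clusters)
        ]
    return centres, clusters


# Invented cases, each described by two numeric attributes.
POINTS = [
    (1.0, 1.0), (5.0, 5.0), (1.2, 0.8),
    (0.9, 1.1), (5.2, 4.9), (4.8, 5.1),
]

centres, clusters = k_means(POINTS, k=2)
print(clusters)
```

The program is never told which cases belong together; the groups emerge from the attribute values alone and can then be examined by a human expert.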

Interactive machine learning matches a human domain expert with a machine learning program, and together, interactively, they peel back the noise from the data to reveal the underlying structure and relationships. And finally, sequence identification machine learning looks for relationships in sets of data that may have cyclic repetitions, such as multi-year pest cycles, or data that has a changing structure over time, such as the effect of fertiliser application programs over a number of years.
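For cyclic data, one simple and well-known way to look for a repeating period is autocorrelation: compare the series with a shifted copy of itself and see which shift matches best. The sketch below is offered only as an illustration of that idea, not as the group's method, and the yearly pest counts are invented.

```python
def autocorrelation(series, lag):
    """Correlation of a series with itself shifted by `lag` steps."""
    n = len(series)
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    variance = sum(d * d for d in deviations)
    if variance == 0:
        return 0.0
    shifted = sum(deviations[t] * deviations[t + lag] for t in range(n - lag))
    return shifted / variance


def dominant_period(series, max_lag):
    """Return the lag between 1 and max_lag with the highest autocorrelation."""
    return max(range(1, max_lag + 1), key=lambda lag: autocorrelation(series, lag))


# Invented yearly pest counts that repeat every four years.
PEST_COUNTS = [10, 30, 60, 20] * 5

period = dominant_period(PEST_COUNTS, max_lag=len(PEST_COUNTS) // 2)
print(period)
```

Real field data would be noisier than this, so the best-matching shift would be a hint to investigate rather than a firm conclusion.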

The University of Waikato researchers are applying these techniques to sample agricultural data to test how they can best be used to extract meaning from existing datasets. Datasets being investigated include cow culling data, possum and rabbit populations, diabetes data, and recumbent cow data. It is hoped that, as knowledge of this work becomes more widespread in the agricultural community, further databases will be made available to the research team to help refine the machine learning techniques being developed.

While a commercial software product that might be usable by a farmer on a home computer is still some way off, a prototype of a machine learning "workbench," or assembly of software programs, has been developed by the Waikato researchers, and is being made available to other researchers in this area. The prototype workbench is called WEKA, after the inquisitive native New Zealand bird; the name stands for Waikato Environment for Knowledge Acquisition.

Further information on the activities of the machine learning research group may be obtained from the author, or from Professor Ian Witten (ihw@cs.waikato.ac.nz), both at the University of Waikato, Private Bag 3105, Hamilton.


