Experiments on the zero frequency problem

J.G. Cleary and W.J. Teahan. Experiments on the zero frequency problem. In J.A. Storer and M. Cohn, editors, Proc Data Compression Conference, page 480, Los Alamitos, CA, 1995. IEEE Press.
[ bib | .ps.gz | .pdf ]
The best algorithms for lossless compression of text are those which adapt to the text being compressed [1]. Two classes of such adaptive techniques are commonly used. One class matches the text against a dictionary of strings seen and transforms the text into a list of indices into the dictionary. These techniques are usually formulated as a variant on Ziv-Lempel (LZ) compression. While LZ compressors do not give the best compression they are widely used because of their simplicity and low execution overhead.


K*: An Instance-Based Learner Using an Entropic Distance Measure

J.G. Cleary and L.E. Trigg. K*: An instance-based learner using an entropic distance measure. In Proc Machine Learning Conference, pages 108-114, Tahoe City, CA, USA, 1995.
[ bib | .ps.gz | .pdf ]
The use of entropy as a distance measure has several benefits. Amongst other things it provides a consistent approach to handling of symbolic attributes, real valued attributes and missing values. The approach of taking all possible transformation paths is discussed. We describe K*, an instance-based learner which uses such a measure, and results are presented which compare favourably with several machine learning algorithms.


Unbounded length contexts for PPM

J.G. Cleary, W.J. Teahan, and I.H. Witten. Unbounded length contexts for ppm. In J.A. Storer and M. Cohn, editors, Proc Data Compression Conference, pages 52-61, Los Alamitos, CA, 1995. IEEE Press.
[ bib | .ps.gz | .pdf ]
The PPM data compression scheme has set the performance standard in lossless compression of text throughout the past decade. The original algorithm was first published in 1984 by Cleary and Witten [3], and a series of improvements was described by Moffat [6], culminating in a careful implementation, called PPMC, which has become the benchmark version. This still achieves results superior to virtually all other compression methods, despite many attempts to better it. Other methods such as those based on Ziv-Lempel coding [9] are more commonly used in practice, but their attractiveness lies in their relative speed rather than any superiority in compression-indeed, their compression performance generally falls distinctly below that of PPM in practical benchmark tests [1].


Machine Learning and Statistics: a Matter of Perspective

S.J. Cunningham. Machine learning and statistics: a matter of perspective. New Zealand J Computing, 6(1a):69-73, August 1995.
[ bib | .ps.gz | .pdf ]
Information has become an important commercial commodity-indeed, possibly the most important product of the future. While we have well-developed technologies to store data, the analysis to extract information is time-consuming and requires skilled human intervention. Machine learning algorithms augment statistical analysis by providing mechanisms that automate the information discovery process. These algorithms also tend to be more accessible to end-users and domain experts. The two analysis methods are converging, and the fields have much to offer each other.


Analysis of Cow Culling with a Machine Learning Workbench

R.E. DeWar and R.J. McQueen. Analysis of cow culling with a machine learning workbench. Technical Report 95/1, University of Waikato, Computer Science Department, Hamilton, New Zealand, January 1995.
[ bib | .ps.gz | .pdf ]
This report discusses the use of machine learning tools to examine datasets from a database of dairy cows. The objective of the study was to investigate whether these machine learning tools could extract meaningful rules from the datasets, and in the process, understand more about how these tools manipulate data.


WEKA: The Waikato Environment for Knowledge Analysis

S.R. Garner. Weka: The waikato environment for knowledge analysis. In Proc New Zealand Computer Science Research Students Conference, pages 57-64, University of Waikato, Hamilton, New Zealand, 1995.
[ bib | .ps.gz | .pdf ]
WEKA is a workbench designed to aid in the application of machine learning technology to real world data sets, in particular, data sets from New Zealand's agricultural sector. In order to do this a range of machine learning techniques are presented to the user in such a way as to hide the idiosyncrasies of input and output formats, as well as allow an exploratory approach in applying the technology. The system presented is a component based one that also has application in machine learning research and education.


Applying a Machine Learning Workbench: Experience with Agricultural Databases

S.R. Garner, S.J. Cunningham, G. Holmes, C.G. Nevill-Manning, and Witten I.H. Applying a machine learning workbench: Experience with agricultural databases. In Proc Machine Learning in Practice Workshop, Machine Learning Conference, pages 14-21, Tahoe City, CA, USA, 1995.
[ bib | .ps.gz | .pdf ]
This paper reviews our experience with the application of machine learning techniques to agricultural databases. We have designed and implemented a machine learning workbench, WEKA, which permits rapid experimentation on a given dataset using a variety of machine learning schemes, and has several facilities for interactive investigation of the data: preprocessing attributes, evaluating and comparing the results of different schemes, and designing comparative experiments to be run off-line. We discuss the partnership between agricultural scientist and machine learning researcher that our experience has shown to be vital to success. We review in some detail a particular agricultural application concerned with the culling of dairy herds.


Machine learning from agricultural databases: practice and experience

S.R. Garner, G. Holmes, R.J. McQueen, and I.H. Witten. Machine learning from agricultural databases: practice and experience. New Zealand J Computing, 6(1a):69-73, August 1996.
[ bib | .ps.gz | .pdf ]
Per capita, New Zealand is reputed to be one of the world's leading collectors of information in databases. The country, quite rightly, places a substantial investment in stored knowledge. As databases grow, however, interest tends to shift from techniques for retrieving individual items to large-scale analysis of the data as a whole. Data can be analysed to view trends, pick out anomalies, discover relationships, check that policy is turned into practice, and so on. With large databases, such analyse scan be costly.

At Waikato University we are examining these issues in the context of the agricultural sector of the economy. Our project centers around techniques of data mining or machine learning, which automatically analyse large bodies of data to discover hidden dependencies. Other methods of exploratory data analysis-many of them interactive-are being investigated as an adjunct to the machine learning algorithms. This paper reviews the software that has been developed to support the application of machine learning techniques, and comments on our success with particular agricultural databases. We discuss those aspects of the databases that hinder the analysis, to assist with future data collection activities.


Selection of attributes for modeling Bach chorales by a genetic algorithm

M.A. Hall. Selection of attributes for modeling bach chorales by a genetic algorithm. In Proc Second New Zealand International Two-Stream Conference on Artificial Neural Networks and Expert Systems, pages 182-185, Dunedin, NZ, 1995. IEEE Computer Society Press.
[ bib | .ps | .pdf ]
A genetic algorithm selected combinations of attributes for a machine learning system. The algorithm used 90 Bach chorale melodies to train models and randomly selected sets of 10 chorales for evaluation. Compression of pitch was used as the fitness evaluation criterion. The best models were used to compress a different test set of chorales and their performance compared to human generated models. G.A. models outperformed the human models, improving compression by 10 percent.


Feature selection via the discovery of simple classification rules

G. Holmes and C.G. Nevill-Manning. Feature selection via the discovery of simple classification rules. In Proc Symposium on Intelligent Data Analysis (IDA-95), pages 75-79, Baden-Baden, Germany, 1995.
[ bib | .ps.gz | .pdf ]
It has been our experience that in order to obtain useful results using supervised learning of real-world datasets it is necessary to perform feature subset selection and to perform many experiments using computed aggregates from the most relevant features. It is, therefore, important to look for selection algorithms that work quickly and accurately so that these experiments can be performed in a reasonable length of time, preferably interactively. This paper suggests a method to achieve this using a very simple algorithm that gives good performance across different supervised learning schemes and when compared to one of the most common methods for feature subset selection.


Document zone classification using machine learning

S. Inglis and I.H. Witten. Document zone classification using machine learning. In Proc Digital Image Computing: Techniques and Applications, pages 631-636, Brisbane, Australia, 1995.
[ bib | .pdf ]
When processing document images, an important step is classifying the zones they contain into meaningful categories such as text, halftone pictures, line drawings, and mathematical formulae. A character recognition system, for example, may confine its attention to zones that are classified as text, while in an image compressor may employ specialized techniques and models for zones such as halftone pictures. The penalty for incorrect classification may range from incorrect interpretation to reduced efficiency. In any case, the higher the classification accuracy, the better the results.

This classification problem is not new. For example, Wang and Srihari [1] described a method for zone classification that gave 100% accuracy on the images they used for testing. But these images contained just 41 zones-and one of the categories occurred only once. Previous approaches to the problem generally describe a new set of features and assign classes using some linear weighted formula or nonlinear heuristic. In contrast, our work uses predefined features and investigates the application of standard machine learning methods, using a large publicly-available document database as a source of training and test data.


Applications of machine learning in information retrieval

J.N. Littin. Applications of machine learning in information retrieval. Technical Report 0657.555, University of Waikato, Computer Science Department, Hamilton, New Zealand, 1995. Directed Research Project.
[ bib | .pdf ]

Instance-Based Learning: Nearest Neighbour with Generalisation

B.I. Martin. Instance-based learning: Nearest neighbour with generalisation. Master's thesis, University of Waikato, Computer Science Department, Hamilton, New Zealand, 1995.
[ bib | .ps.gz | .pdf ]

Interactive concept learning for end-user applications

D. Maulsby and I.H. Witten. Interactive concept learning for end-user applications. Technical Report 95/4, University of Waikato, Computer Science Department, Hamilton, New Zealand, 1995.
[ bib | .pdf ]
Personalizable software agents will learn new tasks from their users. This implies being able to learn from instructions users might give: examples, yes/no responses, and ambiguous, incomplete hints. Agents should also exploit background knowledge customized for applications such as drawing, word processing and form-filling. The task models that agents learn describe data, actions and their context. Learning about data from examples and hints is the subject of this paper.

The Cima learning system combines evidence from examples, task knowledge and user hints to form Disjunctive Normal Form (DNF) rules for classifying, generating or modifying data. Cima's dynamic bias manager generates candidate features (attribute values, functions or relations), from which its DNF learning algorithm selects relevant features and forms the rules. The algorithm is based on a classic greedy method, with two enhancements. First, the standard learning criterion, correct classification, is augmented with a set of utility and instructional criteria. Utility criteria ensures that descriptions are properly formed for use in actions, whether to classify, search for, generate or modify data. Instructional criteria ensure that descriptions include features that users suggest and avoid those that users reject. The second enhancement is to augment the usual statistical metric for selecting relevant attributes with a set of heuristics, including beliefs based on user suggestions and application-specific background knowledge. Using multiple heuristics increases the justification for selecting features; more important, it helps the learner choose among alternative interpretations of hints.

When tested on dialogues observed in a prior user study on a simulated interface agent, the learning algorithm achieves 95% of the learning efficiency standard established in that study.


Learning to describe data in actions

D. Maulsby and I.H. Witten. Learning to describe data in actions. In Proc PBD Workshop, Machine Learning Conference, pages 65-73, Tahoe City, CA, July 1995.
[ bib | .pdf ]
Traditional machine learning algorithms have failed to serve the needs of systems for Programming by Demonstration (PBD), which require interaction with a user (a teacher) and a task environment. We argue that traditional learning algorithms fail for two reasons: they do not cope with the ambiguous instructions that users provide in addition to examples; and their learning criterion requires only that concepts classify examples to some degree of accuracy, ignoring the other ways in which an active agent might use concepts. We show how a classic concept learning algorithm can be adapted for use in PBD by replacing the learning criterion with a set of instructional and utility criteria, and by replacing a statistical preference bias with a set of heuristics that exploit user hints and background knowledge to focus attention.


Applying Machine Learning to Agricultural Data

R.J. McQueen, S.R. Garner, C.G. Nevill-Manning, and I.H. Witten. Applying machine learning to agricultural data. Computing and Electronics in Agriculture, 12:275-293, 1995.
[ bib | .ps.gz | .pdf ]
Many techniques have been developed for abstracting, or learning, rules and relationships from diverse data sets, in the hope that machines can help in the often tedious and error-prone process of acquiring knowledge from empirical data. While these techniques are plausible, theoretically well-founded, and perform well on more or less artificial test data sets, they stand or fall on their ability to make sense of real-world data. This paper describes a project that is applying a range of learning strategies to problems in primary industry, in particular agriculture and horticulture. We briefly survey some of the more readily applicable techniques that are emerging from the machine learning research community, describe a software workbench that allows users to experiment with a variety of techniques on real-world data sets, and detail the problems encountered and solutions developed in a case study of dairy herd management in which culling rules were inferred from a medium-sized database of herd information.


Application of Machine Learning Techniques to Time-Series Data

R.S. Mitchell. Application of machine learning techniques to time-series data. Technical Report 95/15, University of Waikato, Computer Science Department, Hamilton, New Zealand, April 1995.
[ bib | .ps.gz | .pdf ]
This report documents new methods for discovering knowledge in real world time-series data. Two complementary approaches were investigated: 1) manipulation of the original dataset into a form that is usable by conventional similarity-based learners; and 2) using sequence identification techniques to learn the concepts embedded in the database. Experimental results obtained from applying both techniques to a large agricultural database are presented and analysed.


Detecting sequential structure

C.G. Nevill-Manning and I.H. Witten. Detecting sequential structure. In Proc Programming by Demonstration Workshop, Machine Learning Conference, pages 49-56, Tahoe City, CA, USA, 1995.
[ bib | .ps.gz | .pdf ]
Programming by demonstration requires detection and analysis of sequential patterns in a user's input, and the synthesis of an appropriate structural model that can be used for prediction. This paper describes SEQUITUR, a scheme for inducing a structural description of a sequence from a single example. SEQUITUR integrates several different inference techniques: identification of lexical subsequences or vocabulary elements, hierarchical structuring of such subsequences, identification of elements that have equivalent usage patterns, inference of programming constructs such as looping and branching, generalisation by unifying grammar rules, and the detection of procedural substructure. Although SEQUITUR operates with abstract sequences, a number of concrete illustrations are provided.


Learning from experience

C.G. Nevill-Manning. Learning from experience. In Proc New Zealand Computer Science Research Students Conference, pages 193-201, University of Waikato, Hamilton, New Zealand, 1995.
[ bib | .ps | .pdf ]
Life is a sequence of events; some of them predictable, some unexpected. We instinctively look for structure in the stream of events we observe, and learning to anticipate these events is an important part of functioning successfully in the world. Machine learning emphasises learning from unordered facts, but for machines to be helpful in our day-to-day activities, they also need to be able to recognise patterns in the sequences they observe.

The sequence of events that constitute our lives is often monotonous; we end up repeating certain sequences of actions again and again. We have built machines to relieve some of this monotony, but they often fail to be flexible enough to free us of all but the most common repetitive tasks.

SEQUITUR is a system which infers structure from a sequence, and uses the structure to explain and extrapolate the sequence. It is capable of recognising structure in a variety of real-world sequences, but is of particular utility in automating repetitive computing tasks.


The Development of Holte's 1R Classifier

C.G. Nevill-Manning, G. Holmes, and I.H. Witten. The development of holte's 1r classifier. In Proc Artificial Neural Networks and Expert Systems, pages 239-242, Dunedin, NZ, 1995.
[ bib | .ps.gz | .pdf ]
The 1R procedure for machine learning is a very simple one that proves surprisingly effective on the standard datasets commonly used for evaluation. This paper describes the method and discusses two areas that can be improved: the way that intervals are formed when discretizing continuously-valued attributes, and the way that missing values are treated. Then we show how the algorithm can be extended to avoid a problem endemic to most practical machine learning algorithms-their frequent dismissal of an attribute as irrelevant when in fact it is highly relevant when combined with other attributes.


A genetic algorithm for the induction of natural language grammars

T.C. Smith and I.H. Witten. A genetic algorithm for the induction of natural language grammars. In Proc IJCAI-95 Workshop on New Approaches to Learning for Natural Language Processing, pages 17-24, Montreal, Canada, 1995.
[ bib | .ps.gz | .pdf ]
Strict pattern-based methods of grammar induction are often frustrated by the apparently inexhaustible variety of novel word combinations in large corpora. Statistical methods offer a possible solution by allowing frequent well-formed expressions to overwhelm the infrequent ungrammatical ones. They also have the desirable property of being able to construct robust grammars from positive instances alone. Unfortunately, the zero-frequency problem entails assigning a small probability to all possible word patterns, thus ungrammatical n-grams become as probable as unseen grammatical ones. Further, such grammars are unable to take advantage of inherent lexical properties that should allow infrequent words to inherit the syntactic properties of the class to which they belong.

This paper describes a genetic algorithm (GA) that adapts a population of hypothesis grammars towards a more effective model of language structure. The GA is statistically sensitive in that the utility of frequent patterns is reinforced by the persistence of efficient substructures. It also supports the view of language learning as a bootstrapping problem, a learning domain where it appears necessary to simultaneously discover a set of categories and a set of rules defined over them. Results from a number of tests indicate that the GA is a robust, fault-tolerant method for inferring grammars from positive examples of natural language.


Probability-driven lexical classification: a corpus-based approach

T.C. Smith and I.H. Witten. Probability-driven lexical classification: a corpus-based approach. In Proc Pacific Association for Computational Linguistics Conference, pages 271-283, Brisbane, Australia, 1995.
[ bib | .ps.gz | .pdf ]
Successful grammatical inference from a corpus of linguistic material rests largely on the ability to tag the words of the corpus with appropriate lexical categories. Static tagging methods, such as dictionary lookup, often misclassify words associated with multiple categories, and adaptive statistical taggers or context-based corrective taggers may still have errorrates of 3 or 4 percent. Even a small proportion of lexical misclassification may lead a syntax induction mechanism to produce an extraordinarily large number of special-case rules.

By treating grammar induction as a bootstrapping problem in which it is necessary to simultaneously discover a set of categories and a set of rules defined over them, lexical tagging is relieved of constraints imposed by traditional notions of syntactic categories. Categories are created based on salient features in the corpus under study.

This paper describes a statistical corpus-based approach to lexical inference that uses linguistic notions of syntactic function to guide its analysis of a given text. Unlike other probability-driven inference techniques, it is simple enough to apply to the complete vocabularies of very large texts. It has been successfully incorporated into a syntactic inference system whose resulting grammars have been shown superior to two standard stochastic CFGs.


Subset selection using rough numeric dependency

T.C. Smith and G. Holmes. Subset selection using rough numeric dependency. In Proc Symposium on Intelligent Data Analysis (IDA-95), pages 191-195, Baden-Baden, Germany, 1995.
[ bib | .ps.gz | .pdf ]
A large percentage of computer systems are presently dedicated to the collection of an incomprehensibly vast amount of information. The daily world-wide accrual of data has long since exceeded the ability for human analysis, and machine processing has been largely limited to collation, summation, sorting and a variety of basic statistical analyses.

Artificial intelligence research in the form of expert systems has attempted to relieve much of the burden of analysis from the shoulders of human beings by developing automated reasoning systems that can respond quickly to the flood of incoming data. Computer users have countered with the collection of an even broader range of information for which expert systems have either not yet been built or whose domains have proven as yet too difficult to formalise.


PBD systems: when will they ever learn?

I.H. Witten. Pbd systems: when will they ever learn? In Proc Programming by Demonstration Workshop, Machine Learning Conference, pages 1-9, Tahoe City, CA, USA, 1995.
[ bib | .ps.gz | .pdf ]
We trace the tragicomic courtship between programming by demonstration and machine learning, two fields that seem made for each other but have never quite got together. A long-term historical perspective illuminates the twin roles of interaction and sequence learning in programming by demonstration, and reveals the parallel growth of machine learning as a research area in its own right. A view of the present shows a few-but only a few-intimations that these two fields are beginning to develop a meaningful relationship. The long-term prognosis is not so much wedded bliss as assimilation-ultimately PBD and ML shall, in the words of Genesis, cleave unto each other: and they shall be one flesh.


Intelligent data analysis using the WEKA workbench

I.H. Witten, S.J. Cunningham, and G. Holmes. Intelligent data analysis using the weka workbench. In Conference on Artificial Neural Networks and Expert Systems, Dunedin, NZ, 1995. Tutorial Notes.
[ bib | .ps | .pdf ]