Complexity-based Induction

D. Conklin and I.H. Witten. Complexity-based induction. Machine Learning, 16(3):203-225, 1994.
[ bib | .ps.gz | .pdf ]
A central problem in inductive logic programming is theory evaluation. Without some sort of preference criterion, any two theories that explain the same set of examples are equally acceptable. This paper presents a scheme for evaluating alternative inductive theories based on an objective preference criterion. It strives to extract maximal redundancy from examples, transforming structure into randomness. A major strength of the method is its application to learning problems where negative examples of concepts are scarce or unavailable. A new measure called model complexity is introduced, and its use is illustrated and compared with a proof complexity measure on relational learning tasks. The complementarity of the model and proof complexity parallels that of model and proof-theoretic semantics. Model complexity, where applicable, seems to be an appropriate measure for evaluating inductive logic theories.


WEKA Machine Learning Project: Cow Culling

R.E. DeWar and D.L. Neal. WEKA machine learning project: Cow culling. Technical report, The University of Waikato, Computer Science Department, Hamilton, New Zealand, 1994.
[ bib | .ps.gz | .pdf ]
This document describes the results of applying the WEKA machine learning workbench to a large database of dairy cows. The aim of the project was to induce usable rules for culling decisions from the data, and hence find which attributes were most important to farmers in determining whether an animal should be culled.


WEKA: A Machine Learning Workbench

G. Holmes, A. Donkin, and I.H. Witten. WEKA: A machine learning workbench. In Proc Second Australia and New Zealand Conference on Intelligent Information Systems, Brisbane, Australia, 1994.
[ bib | .ps.gz | .pdf ]
WEKA is a workbench for machine learning that is intended to aid in the application of machine learning techniques to a variety of real-world problems, in particular, those arising from agricultural and horticultural domains. Unlike other machine learning projects, the emphasis is on providing a working environment for the domain specialist rather than the machine learning expert. Lessons learned include the necessity of providing a wealth of interactive tools for data manipulation, result visualization, database linkage, and cross-validation and comparison of rule sets, to complement the basic machine learning tools.


Knowledge-rich Induction of Classification Rules

B. Martin. Knowledge-rich induction of classification rules. In Proc Canadian Machine Learning Workshop, University of Calgary, Alberta, Canada, 1994.
[ bib | .ps.gz | .pdf ]
The purpose of this research was to produce a machine learning system that can take advantage of many forms of background knowledge to guide the induction of classification rules. This system will be used for knowledge discovery in databases (also known as database mining), as the use of background knowledge can considerably reduce the search space of a database knowledge search. A new system, MARVIN++, is introduced that attempts to satisfy this aim.


The WEKA Machine Learning Workbench: Its Application to a Real World Agricultural Database

R.J. McQueen, D.L. Neal, R.E. DeWar, S.R. Garner, and C.G. Nevill-Manning. The WEKA machine learning workbench: Its application to a real world agricultural database. In Proc Canadian Machine Learning Workshop, Banff, Alberta, Canada, 1994.
[ bib | .ps.gz | .pdf ]
Numerous techniques have been proposed for learning rules and relationships from diverse data sets, in the hope that machines can help in the often tedious and error-prone process of knowledge acquisition. While these techniques are plausible and theoretically well-founded, they stand or fall on their ability to make sense of real-world data. This paper describes a project that aims to apply a range of learning strategies to problems in primary industry, in particular agriculture and horticulture.


Geometric Comparison of Classifications and Rule Sets

T.J. Monk, R.S. Mitchell, L.A. Smith, and G. Holmes. Geometric comparison of classifications and rule sets. In Proc AAAI Workshop on Knowledge Discovery in Databases, pages 395-406, Seattle, Washington, USA, 1994.
[ bib | .ps.gz | .pdf ]
We present a technique for evaluating classifications by geometric comparison of rules. Rules are represented as objects in an n-dimensional hyperspace. The similarity of classes is computed from the overlap of the geometric class descriptions. The system produces a correlation matrix that indicates the degree of similarity between each pair of classes. The technique can be applied to classifications generated by different algorithms, with different numbers of classes and different attribute sets. Experimental results from a case study in a medical domain are included.
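
The overlap idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each rule is an axis-aligned box (one numeric interval per attribute) and that class similarity is overlap volume divided by the smaller class volume. The function names and the normalisation are hypothetical choices made for illustration.

```python
from itertools import product

# ASSUMPTION: a rule is an axis-aligned box, given as one (low, high)
# interval per attribute. The paper's actual representation may differ.

def box_volume(box):
    """Volume of a box; an empty or inverted interval contributes zero."""
    vol = 1.0
    for low, high in box:
        vol *= max(0.0, high - low)
    return vol

def box_overlap(a, b):
    """Volume of the intersection of two boxes (0.0 if they are disjoint)."""
    vol = 1.0
    for (alo, ahi), (blo, bhi) in zip(a, b):
        side = min(ahi, bhi) - max(alo, blo)
        if side <= 0:
            return 0.0
        vol *= side
    return vol

def class_similarity(rules_a, rules_b):
    """Summed pairwise overlap, normalised by the smaller class volume.

    Simplification: rules inside one class are assumed not to overlap
    each other, otherwise shared regions are counted more than once.
    """
    overlap = sum(box_overlap(a, b) for a, b in product(rules_a, rules_b))
    smaller = min(sum(map(box_volume, rules_a)), sum(map(box_volume, rules_b)))
    return overlap / smaller if smaller > 0 else 0.0

def similarity_matrix(classes):
    """Map each ordered pair of class names to its similarity score."""
    names = sorted(classes)
    return {(p, q): class_similarity(classes[p], classes[q])
            for p in names for q in names}

# Hypothetical example: three classes over two attributes.
example_classes = {
    "A": [[(0, 2), (0, 2)]],
    "B": [[(1, 3), (0, 2)]],
    "C": [[(5, 6), (5, 6)]],
}
example_matrix = similarity_matrix(example_classes)
```

Because the score depends only on the boxes, the two classifications being compared need not come from the same algorithm, which matches the abstract's claim. Comparing rule sets over different attribute sets would need an extra projection step that this sketch omits.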


Modelling Sequences Using Grammars and Automata

C.G. Nevill-Manning and D.L. Maulsby. Modelling sequences using grammars and automata. In Proc Canadian Machine Learning Workshop, University of Calgary, Alberta, Canada, 1994.
[ bib | .pdf ]
This paper presents two sequence modelling techniques. The first induces a context-free deterministic grammar from a sequence. It was motivated by a specific machine learning problem, that of modelling a sequence of actions performed by a computer user, but it can also be applied to other machine learning problems, and its performance as a data compression technique is the best in its class. The second technique induces a push-down finite-state automaton from a sequence. It was designed to derive an executable program from a program execution trace expressed in high-level language statements, and is capable of recognising branches and loops, as well as recursive and non-recursive procedure calls. The inductive capabilities of these two techniques are complementary, and following their description we examine how they can be combined into a more powerful system.


Compression by Induction of Hierarchical Grammars

C.G. Nevill-Manning, I.H. Witten, and D.L. Maulsby. Compression by induction of hierarchical grammars. In J.A. Storer and M. Cohn, editors, Proc Data Compression Conference, pages 244-253, Los Alamitos, CA, 1994. IEEE Press.
[ bib | .pdf ]
This paper describes a technique that constructs models of symbol sequences in the form of small, human-readable, hierarchical grammars. The grammars are both semantically plausible and compact. The technique can induce structure from a variety of different kinds of sequence, and examples are given of models derived from English text, C source code and a sequence of terminal control codes.
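
To give a feel for what a hierarchical grammar of a sequence looks like, here is a minimal sketch. It deliberately swaps in a simpler technique than the one in the paper: a greedy, offline pass that repeatedly replaces the most frequent adjacent pair of symbols with a new rule. The names `induce_grammar` and `expand` and the `R1`, `R2`, ... rule labels are hypothetical.

```python
from collections import Counter

def induce_grammar(sequence):
    """Greedy pair replacement; NOT the paper's algorithm, just an illustration.

    Returns (start, rules): `start` is the compressed top-level sequence and
    `rules` maps each rule name to the pair of symbols it stands for.
    ASSUMPTION: input symbols never collide with the generated rule names.
    """
    seq = list(sequence)
    rules = {}
    next_id = 1
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:  # no adjacent pair repeats, so nothing is left to factor out
            break
        name = f"R{next_id}"
        next_id += 1
        rules[name] = pair
        # Rewrite the sequence left to right, replacing non-overlapping matches.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

def expand(symbols, rules):
    """Recursively expand rule names back into the original terminal symbols."""
    out = []
    for s in symbols:
        if s in rules:
            out.extend(expand(rules[s], rules))
        else:
            out.append(s)
    return out

start, rules = induce_grammar("abcabcabc")
restored = "".join(expand(start, rules))
```

Rules that refer to other rules form the hierarchy: a rule for a frequent pair of letters can itself appear inside a rule for a longer repeated phrase. Expanding the start sequence recovers the input exactly, so the grammar is a lossless model of the sequence.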


Data Transformation: A Semantically-based Approach to Function Discovery

T.H. Phan and I.H. Witten. Data transformation: A semantically-based approach to function discovery. Technical report, The University of Waikato, Computer Science Department, Hamilton, New Zealand, 1994.
[ bib | .ps.gz | .pdf ]
This paper presents the method of data transformation for discovering numeric functions from their examples. Based on the idea of transformations between functions, this method can be viewed as a semantic counterpart to the more common approach of formula construction used in most previous discovery systems. Advantages of the new method include a flexible implementation through the design of transformation rules, and a sound basis for rigorous mathematical analysis to characterize what can be discovered. The method has been implemented in a discovery system called Linus, which can identify a wide range of functions: rational functions, quadratic relations, and many transcendental functions, as well as those that can be transformed to rational functions by combinations of differentiation, logarithm and function inverse operations.


Function Discovery using Data Transformation

T.H. Phan and I.H. Witten. Function discovery using data transformation. In Proc International Symposium on Artificial Intelligence and Mathematics, Florida, USA, 1994.
[ bib | .ps.gz | .pdf ]
Function discovery is the problem of finding a symbolic formula for an unknown function from examples of the function's value on certain arguments. In most previous discovery systems, the description language has been restricted to rational functions so that symbolic descriptions can be easily enumerated. This paper shows how data transformation can be used as the basis of a far more comprehensive description language that includes all functions that can be transformed to rational functions by differentiation and logarithm operations. The main contribution of this paper is to define a transformation-based description language and characterize its representational power. We also briefly sketch a practical implementation of a function induction system that uses this approach.
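
The transformation idea can be shown on a very small scale. The sketch below is an assumption-laden illustration, not the system described in the paper: it tries only two transformations (identity and logarithm) and accepts a transformed data set when it fits a straight line, the simplest rational function. The names `fit_line`, `TRANSFORMS` and `discover` are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least-squares line through the points; returns (slope, intercept, max residual)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    resid = max(abs(slope * x + intercept - y) for x, y in zip(xs, ys))
    return slope, intercept, resid

# ASSUMPTION: only these two transformations are tried, in this order.
TRANSFORMS = {
    "identity": lambda y: y,
    "log": math.log,
}

def discover(xs, ys, tol=1e-9):
    """Return (transform name, slope, intercept) for the first transform
    under which the data become linear in x, or None if none qualifies."""
    for name, transform in TRANSFORMS.items():
        try:
            transformed = [transform(y) for y in ys]
        except ValueError:  # e.g. logarithm of a non-positive value
            continue
        slope, intercept, resid = fit_line(xs, transformed)
        if resid <= tol:
            return name, slope, intercept
    return None

# Hypothetical data sampled from y = 3 * exp(0.5 * x).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0 * math.exp(0.5 * x) for x in xs]
result = discover(xs, ys)
```

Here the raw data are not linear, but their logarithm is, so the unknown function is identified as an exponential whose rate is the fitted slope and whose scale is the exponential of the fitted intercept. A fuller system would chain transformations such as differentiation and search over richer rational forms, as the abstract describes.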


The WEKA Machine Learning Workbench (Video)

WEKA Machine Learning Group. The WEKA machine learning workbench (video), 1994.
[ bib ]
This is a 6-minute video tape giving a brief introduction to the WEKA project and the WEKA workbench.


Trans-Pacific Machine Learning Research: the Calgary/Waikato Axis

I.H. Witten. Trans-Pacific machine learning research: the Calgary/Waikato axis. In Proc Canadian Machine Learning Workshop, Calgary, Alberta, Canada, 1994.
[ bib | .ps.gz | .pdf ]
The following five contributions summarize selected research projects on machine learning that are being undertaken in the Computer Science Departments at the Universities of Calgary and Waikato:
(a) The WEKA machine learning workbench (Bob McQueen et al.);
(b) Knowledge-rich induction of classification rules (Brent Martin);
(c) LINUS: a transformation-based system for function discovery (Thong Phan);
(d) Modeling sequences (Craig Nevill-Manning et al.);
(e) Instructible agents (Dave Maulsby).