1996

[1]

Active Learning of Soft Rules for System Modelling

Eibe Frank and Klaus-Peter Huber. Active learning of soft rules for system modelling. In Proc 2nd European Congress on Intelligent Techniques and Soft Computing, pages 1430-1434, Aachen, Germany, 1996. Verlag Mainz.
[ bib | .ps.gz | .pdf ]
[2]

A Computer Model of Blues Music and its Evaluation

Mark Andrew Hall and Lloyd Smith. A computer model of blues music and its evaluation. Journal of the Acoustical Society of America, 100(2):1163-1167, 1996.
[ bib ]
[3]

An MDL estimate of the significance of rules

J.G. Cleary, S. Legg, and I.H. Witten. An MDL estimate of the significance of rules. In Proc ISIS: Information, Statistics, and Induction in Science, pages 43-53, Melbourne, Australia, August 1996.
[ bib | .ps | .pdf ]
This paper proposes a new method for measuring the performance of models, whether decision trees or sets of rules, inferred by machine learning methods. Inspired by the minimum description length (MDL) philosophy and theoretically rooted in information theory, the new method measures the complexity of test data with respect to the model. It has been evaluated on rule sets produced by several different machine learning schemes on a large number of standard data sets. When compared with the usual percentage-correct measure, it is shown to agree with it in restricted cases. However, in other more general cases taken from real data sets, for example when rule sets make multiple or no predictions, it disagrees substantially. It is argued that the MDL measure is more reasonable in these cases and represents a better way of assessing the significance of a rule set's performance. The question of the complexity of the rule set itself is not addressed in the paper.
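
As a rough illustration of the information-theoretic idea, and not the paper's exact formulation, the following Python sketch measures the complexity of test data with respect to a model by summing -log2 p(true class) over the model's per-instance class distributions. The convention of falling back to a uniform distribution when the model makes no prediction is an assumption made here purely for concreteness.

import math

def description_length(predictions, true_labels, classes):
    # Code length, in bits, of the true test labels under the model's
    # per-instance class distributions: sum of -log2 p(true class).
    # `predictions` holds dicts mapping class -> probability, or None when
    # the model makes no prediction (a convention assumed here; we then
    # fall back to a uniform distribution over the known classes).
    uniform = {c: 1.0 / len(classes) for c in classes}
    bits = 0.0
    for dist, label in zip(predictions, true_labels):
        dist = dist if dist is not None else uniform
        p = dist.get(label, 1e-9)  # floor, so impossible labels cost ~30 bits
        bits += -math.log2(p)
    return bits

# A model that assigns high probability to the correct class encodes the
# test data cheaply; a vacuous or wrong model pays more bits.
preds = [{"yes": 0.9, "no": 0.1}, None, {"yes": 0.2, "no": 0.8}]
print(description_length(preds, ["yes", "yes", "no"], ["yes", "no"]))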

[4]

MetaData for database mining

J.G. Cleary, G. Holmes, S.J. Cunningham, and I.H. Witten. Metadata for database mining. In Proc IEEE Metadata Conference, Silver Spring, MD, April 1996.
[ bib | .ps | .pdf ]
At present, a machine learning application is accomplished by carefully crafting a single table from an often complex, multi-table database. The metadata necessary to create this table is rarely formally recorded, and is sometimes implicit in the structure of the database or the typing of the attributes. We categorize the types of metadata that we have encountered in our work with machine learning applications in agriculture, and describe a first-generation tool that we have built to aid in the recording and use of metadata in database mining.

[5]

Dataset cataloguing metadata for machine learning applications and research

S.J. Cunningham. Dataset cataloguing metadata for machine learning applications and research. Technical Report 96/26, University of Waikato, Computer Science Department, Hamilton, New Zealand, October 1996.
[ bib ]
As the field of machine learning (ML) matures, two types of data archives are developing: collections of benchmark data sets used to test the performance of new algorithms, and data stores to which machine learning/data mining algorithms are applied to create scientific or commercial applications. At present, the catalogs of these archives are ad hoc and not tailored to machine learning analysis. This paper considers the cataloging metadata required to support these two types of repositories, and discusses the organizational support necessary for archive catalog maintenance.

[6]

Understanding what Machine Learning produces Part I: Representations and their comprehensibility

S.J. Cunningham, M.C. Humphrey, and I.H. Witten. Understanding what machine learning produces Part I: Representations and their comprehensibility. Technical Report 96/21, University of Waikato, Computer Science Department, Hamilton, New Zealand, October 1996.
[ bib | .ps | .pdf ]
The aim of many machine learning users is to comprehend the structures that are inferred from a dataset, and such users may be far more interested in understanding the structure of their data than in predicting the outcome of new test data. Part I of this paper surveys representations based on decision trees, production rules and decision graphs that have been developed and used for machine learning. These representations have differing degrees of expressive power, and particular attention is paid to their comprehensibility for non-specialist users. The graphic form in which a structure is portrayed also has a strong effect on comprehensibility, and Part II of this paper develops knowledge visualization techniques that are particularly appropriate to help answer the questions that machine learning users typically ask about the structures produced.

[7]

Selecting multiway splits in decision trees

E. Frank and I.H. Witten. Selecting multiway splits in decision trees. Technical Report 96/31, University of Waikato, Department of Computer Science, Hamilton, New Zealand, December 1996.
[ bib | .ps | .pdf ]
Decision trees in which numeric attributes are split several ways are more comprehensible than the usual binary trees because attributes rarely appear more than once in any path from root to leaf. There are efficient algorithms for finding the optimal multiway split for a numeric attribute, given the number of intervals in which it is to be divided. The problem we tackle is how to choose this number in order to obtain small, accurate trees.

We view each multiway decision as a model and a decision tree as a recursive structure of such models. Standard methods of choosing between competing models include resampling techniques (such as cross-validation, holdout, or bootstrap) for estimating the classification error; and minimum description length techniques. However, the recursive situation differs from the usual one, and may call for new model selection methods.

This paper introduces a new criterion for model selection: a resampling estimate of the information gain. Empirical results are presented for building multiway decision trees using this new criterion, and compared with criteria adopted by previous authors. The new method generates multiway trees that are both smaller and more accurate than those produced previously, and their performance is comparable with standard binary decision trees.
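
A minimal Python sketch of the resampling idea, under two simplifying assumptions: equal-frequency binning stands in for the optimal k-way split, and the number of intervals is chosen by averaging the information gain measured on held-out folds. The paper's split-finding and selection criteria are more refined than this.

import math
import random
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def equal_frequency_cuts(values, k):
    # Stand-in for an optimal k-way split: equal-frequency cut points.
    s = sorted(values)
    return [s[i * len(s) // k] for i in range(1, k)]

def info_gain(values, labels, cuts):
    # Information gain of splitting the values at the given cut points.
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(sum(v > c for c in cuts), []).append(y)
    n = len(labels)
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - cond

def resampled_gain(values, labels, k, folds=5, seed=0):
    # Estimate the gain of a k-way split on data not used to place the cuts.
    idx = list(range(len(values)))
    random.Random(seed).shuffle(idx)
    total = 0.0
    for f in range(folds):
        test = set(idx[f::folds])
        train = [i for i in idx if i not in test]
        cuts = equal_frequency_cuts([values[i] for i in train], k)
        total += info_gain([values[i] for i in test],
                           [labels[i] for i in test], cuts)
    return total / folds

# Toy numeric attribute with three class-dependent clusters.
rng = random.Random(1)
vals, labs = [], []
for centre, lab in [(0.0, "a"), (5.0, "b"), (10.0, "c")]:
    for _ in range(40):
        vals.append(rng.gauss(centre, 1.0))
        labs.append(lab)

# Choose the number of intervals with the best held-out gain estimate.
best_k = max(range(2, 6), key=lambda k: resampled_gain(vals, labs, k))
print("chosen number of intervals:", best_k)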

[8]

Experimental comparison of optimisation techniques: computer optimisation of dairy farm management

R. Hart. Experimental comparison of optimisation techniques: computer optimisation of dairy farm management. Master's thesis, University of Waikato, Department of Computer Science, Hamilton, New Zealand, 1996.
[ bib ]
[9]

Understanding what Machine Learning produces Part II: Knowledge visualization techniques

M.C. Humphrey, S.J. Cunningham, and I.H. Witten. Understanding what machine learning produces Part II: Knowledge visualization techniques. Technical Report 96/22, University of Waikato, Computer Science Department, Hamilton, New Zealand, October 1996.
[ bib | .ps | .pdf ]
Researchers in machine learning use decision trees, production rules, and decision graphs for visualizing classification data. Part I of this paper surveyed these representations, paying particular attention to their comprehensibility for non-specialist users. Part II turns attention to knowledge visualization: the graphic form in which a structure is portrayed, and its strong influence on comprehensibility. We analyze the questions that, in our experience, end users of machine learning tend to ask of the structures inferred from their empirical data. By mapping these questions onto visualization tasks, we have created new graphical representations that show the flow of examples through a decision structure. These knowledge visualization techniques are particularly appropriate in helping to answer the questions that users typically ask, and we describe their use in discovering new properties of a data set. In the case of decision trees, an automated software tool has been developed to construct the visualizations.

[10]

Learning relational ripple-down rules

J.N. Littin. Learning relational ripple-down rules. Master's thesis, University of Waikato, Computer Science Department, Hamilton, New Zealand, 1996.
[ bib | .ps | .pdf ]
[11]

An investigation into the use of machine learning for determining oestrus in cows

R.S. Mitchell, R.A. Sherlock, and L.A. Smith. An investigation into the use of machine learning for determining oestrus in cows. Computers and Electronics in Agriculture, 15:195-215, 1996.
[ bib | .ps | .pdf ]
A preliminary investigation of the application of two well-known machine learning schemes, C4.5 and FOIL, to detection of oestrus in dairy cows has been made. This is a problem of practical economic significance, as each missed opportunity for artificial insemination results in 21 days' lost milk production. Classifications were made on normalised deviations of milk volume production and milking order time series data. The best learning scheme was C4.5, which was able to detect 69% of oestrus events, albeit with an unacceptably high rate of false positives (74%). Several directions for further work and improvements are identified.
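
For concreteness, a hypothetical reconstruction of the "normalised deviations" input encoding in Python: each observation is expressed as its deviation from the mean of a preceding window, in units of that window's standard deviation. The paper's exact normalisation may well differ.

import statistics

def normalised_deviations(series, window=10):
    # Deviation of each observation from the mean of the preceding window,
    # in units of that window's standard deviation. A hypothetical
    # reconstruction of the feature; zero when the window is constant.
    out = []
    for i in range(window, len(series)):
        hist = series[i - window:i]
        mu, sd = statistics.mean(hist), statistics.stdev(hist)
        out.append((series[i] - mu) / sd if sd else 0.0)
    return out

# Daily milk volumes: the final, anomalous day stands out as a large deviation.
milk = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 20.1, 19.7, 20.0, 20.2, 24.5]
print(normalised_deviations(milk))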

[12]

Machine learning in programming by demonstration: lessons learned from CIMA

D. Maulsby and I.H. Witten. Machine learning in programming by demonstration: lessons learned from CIMA. In Y. Gil, editor, Acquisition, learning and demonstration: Automating tasks for users (Proc AAAI Symposium, Stanford), pages 66-72, Menlo Park, CA, 1996. AAAI Press.
[ bib | .ps | .pdf ]
Programming-by-demonstration (PBD) systems learn tasks by watching the user perform them. CIMA is an interactive learning system for modeling the data selected and modified by a user as he or she undertakes a task. Part of a PBD system, CIMA is invoked when user actions are matched to find a common description of their operands. Although the system's interfaces to users and applications are still too rough-hewn to permit field trials, its performance on recorded dialogs between users and a simulated agent meets the design goals established prior to its implementation.

The contributions of this work lie in three areas:

(a) a design methodology for PBD systems;
(b) a framework for user actions in PBD;
(c) novel methods of interaction with the user.

These are discussed in separate sections below. First, however, it is necessary to convey the flavor of what it is like to use the CIMA system, and this is done in the following section.

[13]

Detecting sequential structure

C.G. Nevill-Manning. Detecting sequential structure. PhD thesis, University of Waikato, Department of Computer Science, Hamilton, New Zealand, 1996.
[ bib | .ps | .pdf ]
[14]

Compressing semi-structured text using hierarchical phrase identification

C.G. Nevill-Manning, I.H. Witten, and D.R. Olsen. Compressing semi-structured text using hierarchical phrase identification. In J.A. Storer and M. Cohn, editors, Proc Data Compression Conference, pages 63-72, Los Alamitos, CA, 1996. IEEE Press.
[ bib | .ps | .pdf ]
Many computer files contain highly-structured, predictable information interspersed with information which has less regularity and is therefore less predictable, such as free text. Examples range from word-processing source files, which contain precisely-expressed formatting specifications enclosing tracts of natural-language text, to files containing a sequence of filled-out forms which have a predefined skeleton clothed with relatively unpredictable entries. These represent extreme ends of a spectrum. Word-processing files are dominated by free text, and respond well to general-purpose compression techniques. Forms generally contain database-style information, and are most appropriately compressed by taking into account their special structure. But one frequently encounters intermediate cases. For example, in many email messages the formal header and the informal free-text content are equally voluminous. Short SGML files often contain comparable amounts of formal structure and informal text. Although such files may be compressed quite well by general-purpose adaptive text compression algorithms, which will soon pick up the regular structure during the course of normal adaptation, better compression can often be obtained by methods that are equipped to deal with both formal and informal structure.

[15]

Automatic oestrus detection from milking data - a preliminary investigation

R.A. Sherlock, L.A. Smith, and R.S. Mitchell. Automatic oestrus detection from milking data - a preliminary investigation. In Proc NZ Society for Animal Production, volume 65, pages 228-229, Hamilton, New Zealand, February 1996.
[ bib | .ps | .pdf ]
Efficient oestrus detection is important economically, particularly with seasonal-calving herds employing artificial insemination, as every missed oestrus effectively costs 21 days' milk production for that cow in that season, which is about 8% of the total. Traditionally, oestrus detection relies mainly on visual observation of animal behaviour: a cow in oestrus will stand and allow herself to be mounted by herdmates. Such events are readily noted when they take place amongst assembled cows, such as at milking times, but may be missed entirely in herds of free-grazing animals which are only brought in for milking twice a day, as is usual in New Zealand-style systems. A common low-cost aid to oestrus detection is the use of tail-paint (Macmillan and Curnow, 1977), and although this can be very effective when properly executed, the regular visual inspection and repainting is labour-intensive.

[16]

Learning language using genetic algorithms

T.C. Smith and I.H. Witten. Learning language using genetic algorithms, volume 1040 of LNCS: Connectionist, statistical and symbolic approaches to learning for natural language processing, pages 132-145. Springer-Verlag, New York, 1996.
[ bib | .ps | .pdf ]
Strict pattern-based methods of grammar induction are often frustrated by the apparently inexhaustible variety of novel word combinations in large corpora. Statistical methods offer a possible solution by allowing frequent well-formed expressions to overwhelm the infrequent ungrammatical ones. They also have the desirable property of being able to construct robust grammars from positive instances alone. Unfortunately, the zero-frequency problem entails assigning a small probability to all possible word patterns, thus ungrammatical n-grams become as probable as unseen grammatical ones. Further, such grammars are unable to take advantage of inherent lexical properties that should allow infrequent words to inherit the syntactic properties of the class to which they belong.

This paper describes a genetic algorithm (GA) that adapts a population of hypothesis grammars towards a more effective model of language structure. The GA is statistically sensitive in that the utility of frequent patterns is reinforced by the persistence of efficient substructures. It also supports the view of language learning as a bootstrapping problem, a learning domain where it appears necessary to simultaneously discover a set of categories and a set of rules defined over them. Results from a number of tests indicate that the GA is a robust, fault-tolerant method for inferring grammars from positive examples of natural language.
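
The general mechanism can be conveyed by a toy Python GA over a deliberately simplified hypothesis space, assumed here for illustration: a "grammar" is a set of allowed part-of-speech bigrams, and fitness rewards covering the positive examples while penalising overly permissive grammars. The paper's representation and operators are substantially richer.

import random

# Toy positive examples as part-of-speech sequences (an assumed encoding).
SENTENCES = [["DET", "N", "V"], ["DET", "ADJ", "N", "V"],
             ["N", "V", "DET", "N"], ["DET", "N", "V", "ADV"]]
TAGS = sorted({t for s in SENTENCES for t in s})
BIGRAMS = [(a, b) for a in TAGS for b in TAGS]  # hypothesis = subset of these

def fitness(mask):
    # Reward covering the positive examples; penalise permissive grammars.
    allowed = {bg for bg, bit in zip(BIGRAMS, mask) if bit}
    covered = sum(all(bg in allowed for bg in zip(s, s[1:])) for s in SENTENCES)
    return covered / len(SENTENCES) - 0.02 * sum(mask)

def evolve(pop_size=50, gens=200, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in BIGRAMS] for _ in range(pop_size)]
    for _ in range(gens):
        nxt = []
        for _ in range(pop_size):
            # Tournament selection of two parents.
            a, b = (max(rng.sample(pop, 3), key=fitness) for _ in range(2))
            # Uniform crossover followed by point mutation.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child = [1 - g if rng.random() < 0.01 else g for g in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

best = evolve()
print([bg for bg, bit in zip(BIGRAMS, best) if bit])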

[17]

The entropy of English using PPM-based models

W.J. Teahan and J.G. Cleary. The entropy of English using PPM-based models. In J.A. Storer and M. Cohn, editors, Proc Data Compression Conference, pages 53-62, Los Alamitos, CA, 1996. IEEE Press.
[ bib | .ps | .pdf ]
Over 45 years ago, Claude E. Shannon estimated the entropy of English to be about 1 bit per character [16]. He did this by having human subjects guess samples of text, letter by letter. From the number of guesses made by each subject, he estimated upper and lower bounds of 1.3 and 0.6 bits per character (bpc) for the entropy of English. Shannon's methodology was not improved upon until 1978, when Cover and King [6] used a gambling approach to estimate the upper bound to be 1.25 bpc from the same text. In the cryptographic community, n-gram analysis suggests 1.5 bpc as the asymptotic limit for 26-letter English (Tilbourg [19]).
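
The modelling side of such estimates can be illustrated with a crude Python stand-in for PPM: an adaptive order-1 character model whose average code length, in bits per character, upper-bounds the entropy of the source in the same way a compressor's output size does. This sketch is illustrative only; PPM proper blends several context orders using escape probabilities.

import math
from collections import defaultdict

def bpc_estimate(text, alphabet_size=256):
    # Average code length (bits per character) of `text` under a simple
    # adaptive order-1 model: each character receives a Laplace-smoothed
    # probability given its predecessor, with counts learned as we go.
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    bits, prev = 0.0, ""
    for ch in text:
        p = (counts[prev][ch] + 1) / (totals[prev] + alphabet_size)
        bits += -math.log2(p)
        counts[prev][ch] += 1
        totals[prev] += 1
        prev = ch
    return bits / len(text)

print(bpc_estimate("the quick brown fox jumps over the lazy dog " * 200))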

[18]

Machine Learning applied to fourteen agricultural datasets

K. Thomson and R.J. McQueen. Machine learning applied to fourteen agricultural datasets. Technical Report 96/18, University of Waikato, Computer Science Department, Hamilton, New Zealand, September 1996.
[ bib | .ps | .pdf ]
This document reports on an investigation conducted between November 1995 and March 1996 into the use of machine learning on 14 sets of data supplied by agricultural researchers in New Zealand. Our purpose here is to collect together short reports on trials with these datasets using the WEKA machine learning workbench, so that some understanding of the applicability and potential application of machine learning to similar datasets may result.

[19]

Decision combination based on the characterization of predictive accuracy

K.M. Ting. Decision combination based on the characterization of predictive accuracy. In R.S. Michalski and J. Wnek, editors, Proc 3rd International Workshop on Multistrategy Learning, pages 191-202, Fairfax, VA, 1996.
[ bib ]
[20]

The characterization of predictive accuracy and decision combination

K.M. Ting. The characterization of predictive accuracy and decision combination. In Proc International Conference on Machine Learning, pages 498-506, Bari, Italy, 1996.
[ bib | .ps | .pdf ]
In this paper, we first explore an intrinsic problem that exists in the theories induced by learning algorithms. Regardless of the selected algorithm, search methodology and hypothesis representation by which the theory is induced, one would expect the theory to make better predictions in some regions of the description space than others. We term the fact that an induced theory will have some regions of relatively poor performance the problem of locally low predictive accuracy. Having characterised the problem of locally low predictive accuracy in Instance-Based and Naive Bayesian classifiers, we propose to counter this problem using a composite learner that incorporates both classifiers. The strategy is to select the classifier estimated to perform better in order to make the final prediction during classification. Empirical results show that the strategy is capable of partially overcoming the problem and at the same time improving the overall performance of its constituent classifiers. We provide explanations of why the proposed composite learner performs better than the cross-validation method and the better of its constituent classifiers.
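
A sketch of per-instance classifier selection in Python with scikit-learn, assuming (as a stand-in for the paper's characterisation) that the cross-validated correctness of nearby training instances estimates each classifier's local predictive accuracy:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

learners = [GaussianNB(), KNeighborsClassifier(n_neighbors=5)]
# Cross-validated correctness of each learner on each training instance:
# a rough map of where in the description space each one performs well.
correct = np.array([cross_val_predict(l, X_tr, y_tr, cv=5) == y_tr
                    for l in learners])
for l in learners:
    l.fit(X_tr, y_tr)

nn = NearestNeighbors(n_neighbors=15).fit(X_tr)
_, idx = nn.kneighbors(X_te)
# For each test point, let the learner with the higher estimated local
# accuracy (mean correctness over its neighbours) make the final prediction.
local_acc = correct[:, idx].mean(axis=2)   # shape: (learner, test point)
choice = local_acc.argmax(axis=0)
preds = np.array([l.predict(X_te) for l in learners])
composite = preds[choice, np.arange(len(X_te))]
print("composite accuracy:", (composite == y_te).mean())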

[21]

Theory combination: an alternative to data combination

K.M. Ting and B.T. Low. Theory combination: an alternative to data combination. Technical Report 96/19, University of Waikato, Department of Computer Science, Hamilton, New Zealand, 1996.
[ bib | .ps | .pdf ]
The approach of combining theories learned from multiple batches of data provides an alternative to the common practice of learning one theory from all the available data (i.e., the data combination approach). This paper empirically examines the base-line behaviour of the theory combination approach in classification tasks. We find that theory combination can lead to better performance even if the disjoint batches of data are drawn randomly from a larger sample, and relate the relative performance of the two approaches to the learning curve of the classifier used.

The practical implication of our results is that one should consider using theory combination rather than data combination, especially when multiple batches of data for the same task are readily available.

Another interesting result is the empirical demonstration that the near-asymptotic performance of a single theory, in some classification tasks, can be significantly improved by combining multiple theories (of the same algorithm) if the constituent theories are substantially different and there is some regularity in the theories to be exploited by the combination method used. Comparisons with known theoretical results are also provided.
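
A minimal scikit-learn sketch of the base-line comparison, with decision trees and a standard dataset standing in for the paper's experimental setup: one theory is learned per disjoint random batch and combined by majority vote, against a single theory learned from all the data.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Data combination: one theory from all the available data.
single = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Theory combination: one theory per disjoint random batch, majority vote.
batches = np.array_split(np.random.RandomState(0).permutation(len(X_tr)), 5)
trees = [DecisionTreeClassifier(random_state=0).fit(X_tr[b], y_tr[b])
         for b in batches]
votes = np.array([t.predict(X_te) for t in trees])
majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

print("single theory:", (single.predict(X_te) == y_te).mean())
print("combined     :", (majority == y_te).mean())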

[22]

Interacting with learning agents: implications for ML from HCI

I.H. Witten, C.G. Nevill-Manning, and D.L. Maulsby. Interacting with learning agents: implications for ML from HCI. In Workshop on Machine Learning meets Human-Computer Interaction, ML'96, pages 51-58, Bari, Italy, 1996.
[ bib | .pdf ]
Computers excel at repetitive tasks. But automating them usually involves programming, which is beyond the reach of most non-specialist users. One solution is for machines to learn procedures by observing users at work, and if this enhanced users' productivity and sense of achievement, they might even be persuaded to help the system by supplying some additional information. In principle, combining machine learning with instructional interaction should increase the speed with which tasks are acquired, and enhance reliability too.

[23]

Applications of Machine Learning on two agricultural datasets

S. Yeates and K. Thomson. Applications of machine learning on two agricultural datasets. In Proc New Zealand Conference of Postgraduate Students in Engineering and Technology, pages 495-496, Christchurch, New Zealand, 1996.
[ bib | .ps | .pdf ]
The induction of decision trees from tabulated data is a field of machine learning which has been demonstrated successfully in several practical applications. This paper looks at the application of this technology to two datasets in the agricultural domain, and shows why it was not possible to achieve the success obtained in other domains.

[24]

Tutorial-Weka: the Waikato Environment for Knowledge Analysis

Waikato ML Group. Tutorial-Weka: the Waikato Environment for Knowledge Analysis, November 1996.
[ bib ]