[1]

Text Categorization Using Compression Models
Eibe Frank, Chang Chui, and Ian H. Witten.
Text categorization using compression models.
In Data Compression Conference, Snowbird, Utah, page 555. IEEE
Computer Society, 2000.
Note: abstract only. Full paper is available as
[10].
[ bib 
.ps.gz 
.pdf ]


[2]

Bottom-Up Propositionalization
Stefan Kramer and Eibe Frank.
Bottom-up propositionalization.
In J. Cussens and A. Frisch, editors, Proc Work-in-progress
reports of the 10th International Conference on Inductive Logic Programming,
pages 156-162. CEUR-WS.org, July 2000.
[ bib 
.ps 
.pdf ]
In this paper, we present a new method for propositionalization that works in a bottom-up, data-driven manner. It is tailored for biochemical databases, where the examples are 2D descriptions of chemical compounds. The method generates all frequent fragments (i.e., linearly connected atoms) up to a user-specified length. A preliminary experiment in the domain of carcinogenicity prediction showed that bottom-up propositionalization is a promising approach to feature construction from relational data.
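As a rough illustration of the fragment idea in this abstract, the following Python sketch enumerates linear atom chains in toy molecular graphs and keeps those that occur in enough molecules. It is not the authors' implementation. The function names, the `min_count` threshold, the choice to measure length in atoms, and the omission of bond types are all assumptions made for this example.

```python
from collections import defaultdict

def linear_fragments(atoms, bonds, max_len):
    """Return the set of element-label chains with at most max_len atoms.

    atoms is a list of element symbols; bonds is a list of index pairs.
    Assumption: length counts atoms, and bond types are ignored.
    """
    neighbours = defaultdict(list)
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    found = set()

    def extend(path):
        labels = tuple(atoms[i] for i in path)
        # A chain and its reverse are the same fragment; keep one form.
        found.add(min(labels, labels[::-1]))
        if len(path) == max_len:
            return
        for nxt in neighbours[path[-1]]:
            if nxt not in path:  # simple paths only, never revisit an atom
                extend(path + [nxt])

    for start in range(len(atoms)):
        extend([start])
    return found

def frequent_fragments(molecules, max_len, min_count):
    """Keep fragments that occur in at least min_count molecules."""
    counts = defaultdict(int)
    for atoms, bonds in molecules:
        for frag in linear_fragments(atoms, bonds, max_len):
            counts[frag] += 1  # each fragment counted once per molecule
    return {f: c for f, c in counts.items() if c >= min_count}

# Hypothetical toy compounds, not data from the paper.
molecules = [
    (["C", "C", "O"], [(0, 1), (1, 2)]),  # C-C-O
    (["C", "O"], [(0, 1)]),               # C-O
    (["C", "N"], [(0, 1)]),               # C-N
]
features = frequent_fragments(molecules, max_len=2, min_count=2)

# One binary attribute per frequent fragment: the propositional table.
columns = sorted(features)
table = [[int(f in linear_fragments(a, b, 2)) for f in columns]
         for a, b in molecules]
```

Each row of `table` describes one compound with fixed-length binary attributes, which is the form a standard attribute-value learner expects.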


[3]

Pruning Decision Trees and Lists
Eibe Frank.
Pruning Decision Trees and Lists.
PhD thesis, Department of Computer Science, University of Waikato,
2000.
[ bib 
.ps.gz 
.pdf ]


[4]

Meta-Learning by Landmarking Various Learning Algorithms
B. Pfahringer, H. Bensusan, and C. Giraud-Carrier.
Meta-learning by landmarking various learning algorithms.
In P. Langley, editor, Proceedings of the 17th International
Conference on Machine Learning (ICML-2000), July 2000.
[ bib 
.ps 
.pdf ]
Landmarking is a novel approach to describing tasks in meta-learning. Previous approaches to meta-learning mostly considered only statistics-inspired measures of the data as a source for the definition of meta-attributes. Contrary to such approaches, landmarking tries to determine the location of a specific learning problem in the space of all learning problems by directly measuring the performance of some simple and efficient learning algorithms themselves. In the experiments reported we show how such a use of landmark values can help to distinguish between areas of the learning space favouring different learners. Experiments, both with artificial and real-world databases, show that landmarking selects, with a moderate but reasonable level of success, the best performing of a set of learning algorithms.
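To make the notion of a landmark value concrete, here is a minimal Python sketch that describes a dataset by the leave-one-out accuracy of two cheap learners. The two landmarkers (a majority-class predictor and 1-nearest-neighbour), the evaluation scheme, and all names are illustrative assumptions, not necessarily the learners or protocol used in the paper.

```python
from collections import Counter

def loo_accuracy(predict, X, y):
    """Leave-one-out accuracy of a learner given as predict(train_X, train_y, x)."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += predict(train_X, train_y, X[i]) == y[i]
    return hits / len(X)

def majority_class(train_X, train_y, x):
    # Ignores the instance and predicts the most common training label.
    return Counter(train_y).most_common(1)[0][0]

def one_nearest_neighbour(train_X, train_y, x):
    # Predicts the label of the closest training instance (squared Euclidean).
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    best = min(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    return train_y[best]

def landmarks(X, y):
    """Describe a dataset by how well each simple learner performs on it."""
    return {
        "majority": loo_accuracy(majority_class, X, y),
        "1nn": loo_accuracy(one_nearest_neighbour, X, y),
    }

# Hypothetical toy dataset: two well-separated, balanced classes.
X = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
y = ["a", "a", "a", "b", "b", "b"]
meta = landmarks(X, y)
```

The resulting dictionary plays the role of a meta-attribute vector: a meta-learner could be trained on such vectors, collected over many datasets, to predict which full-scale algorithm is likely to perform best.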


[5]

Learning to Use Operational Advice
J. Fuernkranz, B. Pfahringer, H. Kaindl, and S. Kramer.
Learning to use operational advice.
In W. Horn, editor, Proceedings of the 14th European Conference
on Artificial Intelligence (ECAI 2000), pages 291-295, August 2000.
[ bib ]


[6]

A new approach to fitting linear models in high dimensional spaces
Yong Wang.
A new approach to fitting linear models in high dimensional
spaces.
PhD thesis, University of Waikato, Department of Computer Science,
Hamilton, New Zealand, 2000.
[ bib 
.ps 
.pdf ]
This thesis presents a new approach to fitting linear models, called 'pace regression', which also overcomes the dimensionality determination problem. Its optimality in minimizing the expected prediction loss is theoretically established, when the number of free parameters is infinitely large. In this sense, pace regression outperforms existing procedures for fitting linear models. Dimensionality determination, a special case of fitting linear models, turns out to be a natural byproduct. A range of simulation studies are conducted; the results support the theoretical analysis.
Through the thesis, a deeper understanding is gained of the problem of fitting linear models. Many key issues are discussed. Existing procedures, namely OLS, AIC, BIC, RIC, CIC, CV(d), BS(m), RIDGE, NN-GAROTTE and LASSO, are reviewed and compared, both theoretically and empirically, with the new methods.
Estimating a mixing distribution is an indispensable part of pace regression. A measure-based minimum distance approach, including probability measures and nonnegative measures, is proposed, and strongly consistent estimators are produced. Of all minimum distance methods for estimating a mixing distribution, only the nonnegative-measure-based one solves the minority cluster problem, which is vital for pace regression.
Pace regression has striking advantages over existing techniques for fitting linear models. It also has more general implications for empirical modeling, which are discussed in the thesis.


[7]

Experiences with a weighted decision tree learner
J. G. Cleary, L. E. Trigg, G. Holmes, and M. A. Hall.
Experiences with a weighted decision tree learner.
In Proc 20th SGES International Conference on Knowledge Based
Systems and Applied Artificial Intelligence, pages 35-47. Springer, 2000.
[ bib ]


[8]

Comparison of consumer and producer perceptions of mushroom quality
A.F. Bollen, N.J. Kusabs, G. Holmes, and M.A. Hall.
Comparison of consumer and producer perceptions of mushroom quality.
In W.J. Florkowski, S.E. Prussia, and R.L. Shewfelt, editors,
Proc Integrated View of Fruit and Vegetable Quality International
Multidisciplinary Conference, pages 303-311, Georgia, USA, 2000.
[ bib ]
The marketing of mushrooms in New Zealand is highly subjective. No detailed grading specifications exist, and mushrooms are graded based on the experience of the growers (experts). The requirements of consumers are three or four steps removed in the value chain. The objective of this research has been to develop a quantitative set of descriptors which describe the quality grading criteria actually used by the graders, and to develop a similar set of criteria for the consumer. These two sets of descriptors were then compared to determine the difference in the two interpretations of quality.
Generally the consumers are classifying solely on visual attributes. There was disagreement between the consumer and the grower that suggested that the grower has adopted a quality profile which considerably exceeds that expected by the consumer group studied here.
The Machine Learning technique has been shown to produce a useful set of quality characterisation models for consumers. These have easily interpretable decision trees which are built on the objective attributes measured. The techniques help to provide an insight into the complex decisions made by consumers considering the purchase of mushrooms.


[9]

Technical Note: Naive Bayes for regression
E. Frank, L. Trigg, G. Holmes, and I.H. Witten.
Technical note: Naive Bayes for regression.
Machine Learning, 41(1):5-26, 2000.
[ bib 
.ps 
.pdf ]
Despite its simplicity, the naive Bayes learning scheme performs well on most classification tasks, and is often significantly more accurate than more sophisticated methods. Although the probability estimates that it produces can be inaccurate, it often assigns maximum probability to the correct class. This suggests that its good performance might be restricted to situations where the output is categorical. It is therefore interesting to see how it performs in domains where the predicted value is numeric, because in this case, predictions are more sensitive to inaccurate probability estimates.
This paper shows how to apply the naive Bayes methodology to numeric prediction (i.e. regression) tasks, and compares it to linear regression, instance-based learning, and a method that produces model trees (decision trees with linear regression functions at the leaves). Although we exhibit an artificial dataset for which naive Bayes is the method of choice, on real-world datasets it is almost uniformly worse than model trees. The comparison with linear regression depends on the error measure: for one measure naive Bayes performs similarly, for another it is worse. Compared to instance-based learning, it performs similarly with respect to both measures. These results indicate that the simplistic statistical assumption that naive Bayes makes is indeed more restrictive for regression than for classification.


[10]

Text Categorization Using Compression Models
Eibe Frank, Chang Chui, and Ian H. Witten.
Text categorization using compression models.
Technical Report 00/02, Department of Computer Science, University of
Waikato, January 2000.
[ bib 
.ps 
.pdf ]
Text categorization, or the assignment of natural language texts to predefined categories based on their content, is of growing importance as the volume of information available on the internet continues to overwhelm us. The use of predefined categories implies a supervised learning approach to categorization, where already-classified articles, which effectively define the categories, are used as training data to build a model that can be used for classifying new articles that comprise the test data. This contrasts with unsupervised learning, where there is no training data and clusters of like documents are sought amongst the test articles. With supervised learning, meaningful labels (such as keyphrases) are attached to the training documents, and appropriate labels can be assigned automatically to test documents depending on which category they fall into.


[11]

Benchmarking attribute selection techniques for data mining
M.A. Hall.
Benchmarking attribute selection techniques for data mining.
Technical Report 00/10, University of Waikato, Department of Computer
Science, Hamilton, New Zealand, July 2000.
[ bib 
.ps 
.pdf ]
Data engineering is generally considered to be a central issue in the development of data mining applications. The success of many learning schemes, in their attempts to construct models of data, hinges on the reliable identification of a small set of highly predictive attributes. The inclusion of irrelevant, redundant and noisy attributes in the model building phase can result in poor predictive performance and increased computation.
Attribute selection generally involves a combination of search and attribute utility estimation plus evaluation with respect to specific learning schemes. This leads to a large number of possible permutations and has led to a situation where very few benchmark studies have been conducted.
This paper presents a benchmark comparison of several attribute selection methods. All the methods produce an attribute ranking, a useful device for isolating the individual merit of an attribute. Attribute selection is achieved by cross-validating the rankings with respect to a learning scheme to find the best attributes. Results are reported for a selection of standard data sets and two learning schemes, C4.5 and naive Bayes.


[12]

Correlation-based feature selection for discrete and numeric class machine learning
Mark Andrew Hall.
Correlation-based feature selection for discrete and numeric class
machine learning.
In Proc 17th International Conference on Machine Learning,
pages 359-366. Morgan Kaufmann, June 2000.
[ bib 
.ps 
.pdf ]
Algorithms for feature selection fall into two broad categories: wrappers, which use the learning algorithm itself to evaluate the usefulness of features, and filters, which evaluate features according to heuristics based on general characteristics of the data. For application to large databases, filters have proven to be more practical than wrappers because they are much faster. However, most existing filter algorithms only work with discrete classification problems. This paper describes a fast, correlation-based filter algorithm that can be applied to continuous and discrete problems. The algorithm often outperforms the well-known ReliefF attribute estimator when used as a preprocessing step for naive Bayes, instance-based learning, decision trees, locally weighted regression, and model trees. It performs more feature selection than ReliefF does, reducing the data dimensionality by fifty percent in most cases. Also, decision and model trees built from the preprocessed data are often significantly smaller.


[13]

Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations
Ian H. Witten and Eibe Frank.
Data Mining: Practical Machine Learning Tools and Techniques
with Java Implementations.
Morgan Kaufmann, San Francisco, 2000.
[ bib 
.html ]
This book complements the Weka software. It shows how to use Weka's Java algorithms to discern meaningful patterns in your data, how to adapt them for your specialized data mining applications, and how to develop your own machine learning schemes. It offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. Inside, you'll learn all you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. If you're involved at any level in the work of extracting usable knowledge from large collections of data, this book will be a valuable resource.

