2000

[1]

Text Categorization Using Compression Models

Eibe Frank, Chang Chui, and Ian H. Witten. Text categorization using compression models. In Data Compression Conference, Snowbird, Utah, page 555. IEEE Computer Society, 2000. Note: abstract only. Full paper is available as [10].
[ bib | .ps.gz | .pdf ]
[2]

Bottom-Up Propositionalization

Stefan Kramer and Eibe Frank. Bottom-up propositionalization. In J. Cussens and A. Frisch, editors, Proc Work-in-progress reports of the 10th International Conference on Inductive Logic Programming, pages 156-162. CEUR-WS.org, July 2000.
[ bib | .ps | .pdf ]
In this paper, we present a new method for propositionalization that works in a bottom-up, data-driven manner. It is tailored for biochemical databases, where the examples are 2-D descriptions of chemical compounds. The method generates all frequent fragments (i.e., linearly connected atoms) up to a user-specified length. A preliminary experiment in the domain of carcinogenicity prediction showed that bottom-up propositionalization is a promising approach to feature construction from relational data.
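
As a rough illustration of the fragment-generation step described above, the following Python sketch enumerates linearly connected atom chains up to a user-specified length and keeps those that occur in a minimum number of compounds. The input encoding and the names max_length and min_support are illustrative assumptions made for this sketch, not the representation used in the paper.

# Sketch of bottom-up propositionalization via frequent linear fragments.
# Assumed (illustrative) input: each compound is a labelled graph of the form
# {'atoms': {atom_id: element}, 'bonds': [(atom_id, atom_id), ...]}.
from collections import defaultdict

def linear_fragments(compound, max_length):
    """All linear atom chains (as element-label tuples) of up to max_length atoms."""
    atoms, bonds = compound['atoms'], compound['bonds']
    adj = defaultdict(set)
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    fragments = set()

    def extend(path):
        labels = tuple(atoms[i] for i in path)
        fragments.add(min(labels, labels[::-1]))   # canonical orientation: a chain and its reverse count once
        if len(path) < max_length:
            for nxt in adj[path[-1]]:
                if nxt not in path:                # keep the chain simple (no revisited atoms)
                    extend(path + [nxt])

    for start in atoms:
        extend([start])
    return fragments

def frequent_fragments(compounds, max_length, min_support):
    """Frequent fragments plus a boolean (propositional) table: one column per fragment."""
    support = defaultdict(int)
    per_compound = []
    for compound in compounds:
        frags = linear_fragments(compound, max_length)
        per_compound.append(frags)
        for f in frags:
            support[f] += 1
    keep = sorted(f for f, s in support.items() if s >= min_support)
    table = [[f in frags for f in keep] for frags in per_compound]
    return keep, table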

[3]

Pruning Decision Trees and Lists

Eibe Frank. Pruning Decision Trees and Lists. PhD thesis, Department of Computer Science, University of Waikato, 2000.
[ bib | .ps.gz | .pdf ]
[4]

Meta-Learning by Landmarking Various Learning Algorithms

B. Pfahringer, H. Bensusan, and C. Giraud-Carrier. Meta-learning by landmarking various learning algorithms. In P. Langley, editor, Proceedings of the 17th International Conference on Machine Learning (ICML-2000), July 2000.
[ bib | .ps | .pdf ]
Landmarking is a novel approach to describing tasks in meta-learning. Previous approaches to meta-learning mostly considered only statistics-inspired measures of the data as a source for the definition of meta-attributes. In contrast to such approaches, landmarking tries to determine the location of a specific learning problem in the space of all learning problems by directly measuring the performance of some simple and efficient learning algorithms themselves. In the experiments reported, we show how such landmark values can help to distinguish between areas of the learning space favouring different learners. Experiments with both artificial and real-world databases show that landmarking selects, with a moderate but reasonable level of success, the best-performing of a set of learning algorithms.
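
As an illustration of this idea, the sketch below describes a dataset by the cross-validated accuracies of a few cheap learners. The use of scikit-learn and the particular landmarkers chosen here (a decision stump, naive Bayes, 1-nearest-neighbour) are assumptions made for the sketch rather than the exact setup of the paper.

# Landmarking sketch: a dataset's meta-attributes are the cross-validated
# accuracies of a few simple, efficient learners. scikit-learn and the chosen
# landmarkers are illustrative stand-ins.
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

LANDMARKERS = {
    'decision_stump': DecisionTreeClassifier(max_depth=1),
    'naive_bayes': GaussianNB(),
    'one_nearest_neighbour': KNeighborsClassifier(n_neighbors=1),
}

def landmark_features(X, y, cv=10):
    """One mean cross-validated accuracy per landmarker for the dataset (X, y)."""
    return {name: cross_val_score(learner, X, y, cv=cv).mean()
            for name, learner in LANDMARKERS.items()}

One such vector is computed per dataset; together they form the meta-level training data from which a meta-learner can predict which full-strength algorithm to prefer for a new dataset.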

[5]

Learning to Use Operational Advice

J. Fuernkranz, B. Pfahringer, H. Kaindl, and S. Kramer. Learning to use operational advice. In W. Horn, editor, Proceedings of the 14th European Conference on Artificial Intelligence (ECAI 2000), pages 291-295, August 2000.
[ bib ]
[6]

A new approach to fitting linear models in high dimensional spaces

Yong Wang. A new approach to fitting linear models in high dimensional spaces. PhD thesis, University of Waikato, Department of Computer Science, Hamilton, New Zealand, 2000.
[ bib | .ps | .pdf ]
This thesis presents a new approach to fitting linear models, called 'pace regression', which also overcomes the dimensionality determination problem. Its optimality in minimizing the expected prediction loss is established theoretically for the case where the number of free parameters is infinitely large. In this sense, pace regression outperforms existing procedures for fitting linear models. Dimensionality determination, a special case of fitting linear models, turns out to be a natural by-product. A range of simulation studies is conducted; the results support the theoretical analysis.

Throughout the thesis, a deeper understanding is gained of the problem of fitting linear models. Many key issues are discussed. Existing procedures, namely OLS, AIC, BIC, RIC, CIC, CV(d), BS(m), RIDGE, NN-GAROTTE and LASSO, are reviewed and compared, both theoretically and empirically, with the new methods.

Estimating a mixing distribution is an indispensable part of pace regression. A measure-based minimum distance approach, including probability measures and nonnegative measures, is proposed, and strongly consistent estimators are produced. Of all minimum distance methods for estimating a mixing distribution, only the nonnegative-measure-based one solves the minority cluster problem, which is vital for pace regression.

Pace regression has striking advantages over existing techniques for fitting linear models. It also has more general implications for empirical modeling, which are discussed in the thesis.

[7]

Experiences with a weighted decision tree learner

J. G. Cleary, L. E. Trigg, G. Holmes, and M. A. Hall. Experiences with a weighted decision tree learner. In Proc 20th SGES International Conference on Knowledge Based Systems and Applied Artificial Intelligence, pages 35-47. Springer, 2000.
[ bib ]
[8]

Comparison of consumer and producer perceptions of mushroom quality

A.F. Bollen, N.J. Kusabs, G. Holmes, and M.A. Hall. Comparison of consumer and producer perceptions of mushroom quality. In W.J. Florkowski, S.E. Prussia, and R.L. Shewfelt, editors, Proc Integrated View of Fruit and Vegetable Quality International Multidisciplinary Conference, pages 303-311, Georgia, USA, 2000.
[ bib ]
The marketing of mushrooms in New Zealand is highly subjective. No detailed grading specifications exist, and mushrooms are graded based on the experience of the growers (experts). The requirements of consumers are three or four steps removed in the value chain. The objective of this research has been to develop a quantitative set of descriptors that describe the quality grading criteria actually used by the graders, and to develop a similar set of criteria for the consumer. These two sets of descriptors were then compared to determine the difference between the two interpretations of quality.

Generally, consumers classify solely on visual attributes. Disagreement between the consumers and the growers suggests that the growers have adopted a quality profile that considerably exceeds the one expected by the consumer group studied here.

The machine learning technique has been shown to produce a useful set of quality characterisation models for consumers. These are easily interpretable decision trees built on the objective attributes measured. The techniques help to provide insight into the complex decisions made by consumers considering the purchase of mushrooms.

[9]

Technical Note: Naive Bayes for regression

E. Frank, L. Trigg, G. Holmes, and I.H. Witten. Technical note: Naive Bayes for regression. Machine Learning, 41(1):5-26, 2000.
[ bib | .ps | .pdf ]
Despite its simplicity, the naive Bayes learning scheme performs well on most classification tasks, and is often significantly more accurate than more sophisticated methods. Although the probability estimates that it produces can be inaccurate, it often assigns maximum probability to the correct class. This suggests that its good performance might be restricted to situations where the output is categorical. It is therefore interesting to see how it performs in domains where the predicted value is numeric, because in this case, predictions are more sensitive to inaccurate probability estimates.

This paper shows how to apply the naive Bayes methodology to numeric prediction (i.e. regression) tasks, and compares it to linear regression, instance-based learning, and a method that produces model trees, that is, decision trees with linear regression functions at the leaves. Although we exhibit an artificial dataset for which naive Bayes is the method of choice, on real-world datasets it is almost uniformly worse than model trees. The comparison with linear regression depends on the error measure: for one measure naive Bayes performs similarly, for another it is worse. Compared to instance-based learning, it performs similarly with respect to both measures. These results indicate that the simplistic statistical assumption that naive Bayes makes is indeed more restrictive for regression than for classification.
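
To make the approach concrete, here is a deliberately simplified sketch: the numeric target is discretized into bins, Gaussian naive Bayes is applied over the bins, and the prediction is the posterior-weighted mean of the bin centres. The paper's method uses kernel density estimates rather than this discretization, so the code below (with its made-up parameter names such as n_bins) only illustrates the general idea.

# Simplified "naive Bayes for regression" sketch: discretize the target, model each
# bin with Gaussian naive Bayes, and predict the posterior-weighted mean of the
# bin centres. X and y are assumed to be numeric NumPy arrays.
import numpy as np

class NaiveBayesRegressor:
    def __init__(self, n_bins=10, eps=1e-9):
        self.n_bins, self.eps = n_bins, eps

    def fit(self, X, y):
        edges = np.quantile(y, np.linspace(0, 1, self.n_bins + 1))
        bins = np.clip(np.searchsorted(edges, y, side='right') - 1, 0, self.n_bins - 1)
        centres, priors, means, variances = [], [], [], []
        for b in range(self.n_bins):
            mask = bins == b
            if not mask.any():                       # skip empty bins (duplicate quantile edges)
                continue
            centres.append(y[mask].mean())           # representative target value for the bin
            priors.append(mask.mean())               # p(bin)
            means.append(X[mask].mean(axis=0))       # per-attribute Gaussian parameters
            variances.append(X[mask].var(axis=0) + self.eps)
        self.centres_, self.priors_ = np.array(centres), np.array(priors)
        self.means_, self.vars_ = np.array(means), np.array(variances)
        return self

    def predict(self, X):
        predictions = []
        for x in X:
            # log p(bin) + sum_i log p(x_i | bin), using the naive independence assumption
            log_post = (np.log(self.priors_)
                        - 0.5 * np.sum(np.log(2 * np.pi * self.vars_)
                                       + (x - self.means_) ** 2 / self.vars_, axis=1))
            weights = np.exp(log_post - log_post.max())
            predictions.append(np.sum(weights * self.centres_) / weights.sum())
        return np.array(predictions)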

[10]

Text Categorization Using Compression Models

Eibe Frank, Chang Chui, and Ian H. Witten. Text categorization using compression models. Technical Report 00/02, Department of Computer Science, University of Waikato, January 2000.
[ bib | .ps | .pdf ]
Text categorization, or the assignment of natural language texts to predefined categories based on their content, is of growing importance as the volume of information available on the internet continues to overwhelm us. The use of predefined categories implies a supervised learning approach to categorization, where already-classified articles, which effectively define the categories, are used as training data to build a model that can be used for classifying new articles that comprise the test data. This contrasts with unsupervised learning, where there is no training data and clusters of like documents are sought amongst the test articles. With supervised learning, meaningful labels (such as keyphrases) are attached to the training documents, and appropriate labels can be assigned automatically to test documents depending on which category they fall into.
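
The abstract above does not spell out the compression machinery itself; the usual idea behind compression-based categorization is to build one compression model per category from its training documents and to assign a new document to the category whose model encodes it most cheaply. The sketch below uses zlib as an off-the-shelf stand-in for the statistical compression models of the paper, purely to illustrate that decision rule.

# Compression-based categorization sketch: choose the category whose training text
# makes the new document cheapest to compress. zlib is an illustrative stand-in for
# the compression models used in the paper.
import zlib

def compressed_size(text):
    return len(zlib.compress(text.encode('utf-8'), 9))

def categorize(document, training_text_by_category):
    """training_text_by_category maps a category name to its concatenated training documents."""
    best_category, best_extra = None, None
    for category, train_text in training_text_by_category.items():
        # Extra bytes needed for the document once the category's text has been seen:
        # a small increase means the document resembles that category.
        extra = compressed_size(train_text + document) - compressed_size(train_text)
        if best_extra is None or extra < best_extra:
            best_category, best_extra = category, extra
    return best_category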

[11]

Benchmarking attribute selection techniques for data mining

M.A. Hall. Benchmarking attribute selection techniques for data mining. Technical Report 00/10, University of Waikato, Department of Computer Science, Hamilton, New Zealand, July 2000.
[ bib | .ps | .pdf ]
Data engineering is generally considered to be a central issue in the development of data mining applications. The success of many learning schemes, in their attempts to construct models of data, hinges on the reliable identification of a small set of highly predictive attributes. The inclusion of irrelevant, redundant and noisy attributes in the model building phase can result in poor predictive performance and increased computation.

Attribute selection generally involves a combination of search and attribute utility estimation plus evaluation with respect to specific learning schemes. This leads to a large number of possible permutations and has led to a situation where very few benchmark studies have been conducted.

This paper presents a benchmark comparison of several attribute selection methods. All the methods produce an attribute ranking, a useful device for isolating the individual merit of an attribute. Attribute selection is achieved by cross-validating the rankings with respect to a learning scheme to find the best attributes. Results are reported for a selection of standard data sets and two learning schemes: C4.5 and naive Bayes.
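
A minimal sketch of the ranking-plus-cross-validation procedure just described: rank the attributes, then score growing prefixes of the ranking with a learning scheme and keep the best-scoring prefix. The ranker (mutual information) and learner (Gaussian naive Bayes, standing in for the schemes named above) are assumptions made for the sketch, not the exact methods benchmarked in the report.

# Select attributes by cross-validating prefixes of an attribute ranking.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

def select_by_ranking(X, y, cv=10):
    """Return the best-scoring prefix of the attribute ranking and its CV accuracy."""
    ranking = np.argsort(mutual_info_classif(X, y))[::-1]   # best attribute first
    best_k, best_score = 1, -np.inf
    for k in range(1, X.shape[1] + 1):
        score = cross_val_score(GaussianNB(), X[:, ranking[:k]], y, cv=cv).mean()
        if score > best_score:
            best_k, best_score = k, score
    return ranking[:best_k], best_score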

[12]

Correlation-based feature selection for discrete and numeric class machine learning

Mark Andrew Hall. Correlation-based feature selection for discrete and numeric class machine learning. In Proc 17th International Conference on Machine Learning, pages 359-366. Morgan Kaufmann, June 2000.
[ bib | .ps | .pdf ]
Algorithms for feature selection fall into two broad categories: wrappers, which use the learning algorithm itself to evaluate the usefulness of features, and filters, which evaluate features according to heuristics based on general characteristics of the data. For application to large databases, filters have proven to be more practical than wrappers because they are much faster. However, most existing filter algorithms only work with discrete classification problems. This paper describes a fast, correlation-based filter algorithm that can be applied to continuous and discrete problems. The algorithm often outperforms the well-known ReliefF attribute estimator when used as a preprocessing step for naive Bayes, instance-based learning, decision trees, locally weighted regression, and model trees. It performs more feature selection than ReliefF does, reducing the data dimensionality by fifty percent in most cases. Also, decision and model trees built from the preprocessed data are often significantly smaller.
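
For illustration, the sketch below implements a correlation-based filter in the spirit of this work: a feature subset is grown greedily to maximise the merit k * r_cf / sqrt(k + k*(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation within the subset. Plain absolute Pearson correlation is used here as a simplification; the paper's correlation measures also handle discrete attributes and classes.

# Correlation-based feature selection sketch with greedy forward search.
# X and y are assumed to be numeric NumPy arrays; Pearson correlation is a
# simplification of the correlation measures used in the paper.
import numpy as np

def cfs_forward_selection(X, y):
    n_features = X.shape[1]

    def abs_corr(a, b):
        return abs(np.corrcoef(a, b)[0, 1])

    r_cf = np.array([abs_corr(X[:, j], y) for j in range(n_features)])
    r_ff = np.array([[abs_corr(X[:, i], X[:, j]) for j in range(n_features)]
                     for i in range(n_features)])

    def merit(subset):
        k = len(subset)
        mean_cf = r_cf[subset].mean()
        # mean pairwise feature-feature correlation, excluding the diagonal of ones
        mean_ff = (r_ff[np.ix_(subset, subset)].sum() - k) / (k * (k - 1)) if k > 1 else 0.0
        return k * mean_cf / np.sqrt(k + k * (k - 1) * mean_ff)

    selected, remaining, best_merit = [], list(range(n_features)), -np.inf
    while remaining:
        score, j = max((merit(selected + [j]), j) for j in remaining)
        if score <= best_merit:          # stop when no attribute improves the merit
            break
        selected.append(j)
        remaining.remove(j)
        best_merit = score
    return selected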

[13]

Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations

Ian H. Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco, 2000.
[ bib | .html ]
This book complements the Weka software. It shows how to use Weka's Java algorithms to discern meaningful patterns in your data, how to adapt them for your specialized data mining applications, and how to develop your own machine learning schemes. It offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. Inside, you'll learn all you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. If you're involved at any level in the work of extracting usable knowledge from large collections of data, this book will be a valuable resource.