[1]

Accuracy bounds for ensembles under 0-1 loss
R.R. Bouckaert.
Accuracy bounds for ensembles under 0-1 loss.
Technical Report 04/02, University of Waikato, Computer Science
Department, Hamilton, New Zealand, 2002.
[ bib 
.ps 
.pdf ]
This paper attempts to increase understanding of the behavior of ensembles for discrete variables in a quantitative way. A set of tight upper and lower bounds for the accuracy of an ensemble is presented for wide classes of ensemble algorithms, including bagging and boosting. The ensemble accuracy is expressed in terms of the accuracies of the members of the ensemble.
Since those bounds represent best and worst case behavior only, we study typical behavior as well, and discuss its properties. A parameterized bound is presented which describes ensemble behavior as a mixture of dependent base classifier and independent base classifier areas. Some empirical results are presented to support our conclusions.
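The paper's bounds themselves are not reproduced in the abstract, but the independent-base-classifier regime it contrasts against has a well-known closed form: the accuracy of a majority vote over k independent members of equal accuracy p is a binomial tail. A minimal illustrative sketch (general ensemble theory, not the paper's bounds):

```python
from math import comb

def independent_majority_accuracy(p, k):
    """Accuracy of a majority vote over k independent base
    classifiers, each correct with probability p (k odd)."""
    return sum(comb(k, i) * p ** i * (1 - p) ** (k - i)
               for i in range(k // 2 + 1, k + 1))

# Eleven independent 60%-accurate members already beat any single one.
print(independent_majority_accuracy(0.6, 11))
```

Dependent members can fall anywhere between this value and the worst-case bound, which is the gap the paper's parameterized bound interpolates across.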


[2]

Low level information extraction: a Bayesian network based approach
R.R. Bouckaert.
Low level information extraction: a Bayesian network based approach.
In Workshop on Text Learning (TextML2002), 2002.
[ bib 
.ps 
.pdf ]
In this article, a contribution is made to information extraction and Bayesian network learning, motivated by two practical information extraction tasks.
It is shown that some information extraction tasks can be approached as a classification problem where the text is split into tokens and each token is assigned a class. Hidden Markov models are a popular formalism for this task; however, they do not deal with tokens having a set of attributes instead of a single one. A new algorithm for this task is presented using various Bayesian network architectures that deal with multiple attributes per token. Experiments suggest most Bayesian network architectures perform better than Naive Bayes in our problem domain.
Hopefully, this article helps in making Bayesian networks more accessible for the information extraction community.
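The token-classification view can be made concrete with the Naive Bayes baseline the abstract compares against. A minimal sketch, assuming tokens are represented as attribute dicts; the attribute names and Laplace smoothing choice here are illustrative, not from the paper:

```python
import math
from collections import Counter, defaultdict

def train(tokens, labels):
    """tokens: one dict of attributes per token; labels: token classes."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (class, attr) -> value counts
    for attrs, y in zip(tokens, labels):
        for a, v in attrs.items():
            value_counts[(y, a)][v] += 1
    return class_counts, value_counts

def classify(attrs, class_counts, value_counts, alpha=1.0):
    """Pick the class maximising the Laplace-smoothed NB log-score."""
    n = sum(class_counts.values())
    def score(y):
        s = math.log(class_counts[y] / n)
        for a, v in attrs.items():
            c = value_counts[(y, a)]
            s += math.log((c[v] + alpha) /
                          (sum(c.values()) + alpha * (len(c) + 1)))
        return s
    return max(class_counts, key=score)

# Toy training data: each token carries several attributes, not one.
tokens = [{"word": "john", "capitalised": True},
          {"word": "rang", "capitalised": False},
          {"word": "mary", "capitalised": True},
          {"word": "up", "capitalised": False}]
labels = ["NAME", "OTHER", "NAME", "OTHER"]
model = train(tokens, labels)
```

Richer Bayesian network architectures relax the independence assumption between attributes that this baseline makes.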


[3]

Learning Structure from Sequences, with Applications in
a Digital Library
Ian H. Witten.
Learning structure from sequences, with applications in a digital
library.
In ALT, pages 42-56, 2002.
[ bib ]


[4]

Optimising tabling structures for bottom-up logic programming
R. J. Clayton, J. G. Cleary, B. Pfahringer, and B. M. Utting.
Optimising tabling structures for bottom-up logic programming.
In Proc International Workshop on Logic Based Program
Development and Transformation (LOPSTR'02), pages 57-75, Madrid, Spain, 2002.
Fundación General de la Universidad Politécnica de Madrid, Madrid.
[ bib ]


[5]

Racing Committees for Large Datasets
Eibe Frank, Geoffrey Holmes, Richard Kirkby, and Mark Hall.
Racing committees for large datasets.
In S. Lange, K. Satoh, and C. H. Smith, editors, Proc 5th
International Conference on Discovery Science, volume 2534 of LNCS,
pages 153-164. Springer, 2002.
Also published as Working Paper 03/02, Computer Science Department,
University of Waikato, Hamilton.
[ bib 
.ps 
.pdf ]
This paper proposes a method for generating classifiers from large datasets by building a committee of simple base classifiers using a standard boosting algorithm. It permits the processing of large datasets even if the underlying base learning algorithm cannot efficiently do so. The basic idea is to split incoming data into chunks and build a committee based on classifiers built from these individual chunks. Our method extends earlier work by introducing a method for adaptively pruning the committee. This is essential when applying the algorithm in practice because it dramatically reduces the algorithm's running time and memory consumption. It also makes it possible to efficiently race committees corresponding to different chunk sizes. This is important because our empirical results show that the accuracy of the resulting committee can vary significantly with the chunk size. They also show that pruning is indeed crucial to make the method practical for large datasets in terms of running time and memory requirements. Surprisingly, the results demonstrate that pruning can also improve accuracy.
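The chunk-and-prune scheme can be sketched schematically. The base learner below is a deliberately trivial stub (a majority-label predictor) standing in for whatever learner is raced, and the greedy backward pruning pass illustrates the idea of adaptive pruning rather than the paper's exact rule:

```python
from collections import Counter

class MajorityStub:
    """Stand-in base learner: predicts its chunk's majority label."""
    def fit(self, chunk):
        self.label = Counter(y for _, y in chunk).most_common(1)[0][0]
        return self
    def predict(self, x):
        return self.label

def vote(committee, x):
    return Counter(m.predict(x) for m in committee).most_common(1)[0][0]

def accuracy(committee, holdout):
    return sum(vote(committee, x) == y for x, y in holdout) / len(holdout)

def prune(committee, holdout):
    """Greedily drop members whose removal does not hurt holdout accuracy."""
    base = accuracy(committee, holdout)
    for i in range(len(committee) - 1, -1, -1):
        if len(committee) == 1:
            break
        trial = committee[:i] + committee[i + 1:]
        if accuracy(trial, holdout) >= base:
            committee, base = trial, accuracy(trial, holdout)
    return committee

def build_committee(stream, chunk_size, holdout):
    """Consume the stream chunk by chunk, adding one member per
    chunk and pruning after each addition, so memory stays bounded."""
    committee = []
    for start in range(0, len(stream), chunk_size):
        committee.append(MajorityStub().fit(stream[start:start + chunk_size]))
        committee = prune(committee, holdout)
    return committee
```

Racing then amounts to running `build_committee` for several chunk sizes in parallel and keeping the one with the best holdout accuracy.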


[6]

An experimental speech to graphics system
A. C. Golightly and T. C. Smith.
An experimental speech to graphics system.
In S. Jones and M. Masoodian, editors, Proc SIGCHI-NZ Symposium
on Computer-Human Interaction, pages 91-96, Hamilton, New Zealand, 2002.
[ bib ]


[7]

Benchmarking attribute selection techniques for discrete class data mining
M. Hall and G. Holmes.
Benchmarking attribute selection techniques for discrete class data
mining.
Technical Report 02/02, The University of Waikato, Department of
Computer Science, Hamilton, New Zealand, 2002.
[ bib ]
Data engineering is generally considered to be a central issue in the development of data mining applications. The success of many learning schemes, in their attempts to construct models of data, hinges on the reliable identification of a small set of highly predictive attributes. The inclusion of irrelevant, redundant and noisy attributes in the model building process can result in poor predictive performance and increased computation.
Attribute selection generally involves a combination of search and attribute utility estimation, plus evaluation with respect to specific learning schemes. This leads to a large number of possible permutations, and has led to a situation where very few benchmark studies have been conducted.
This paper presents a benchmark comparison of several attribute selection methods for supervised classification. All the methods produce an attribute ranking, a useful device for isolating the individual merit of an attribute. Attribute selection is achieved by cross-validating the attribute rankings with respect to a classification learner to find the best attributes. Results are reported for a selection of standard data sets and two diverse learning schemes, C4.5 and naïve Bayes.
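An attribute ranking of the kind benchmarked here can be produced by any per-attribute merit score; information gain is one of the standard choices. A minimal sketch of the ranking step (the cross-validation of top-k rankings against a learner, as the paper does, is omitted):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, a):
    """Reduction in class entropy from splitting on attribute index a.
    rows: list of attribute tuples, one per instance."""
    by_value = {}
    for r, y in zip(rows, labels):
        by_value.setdefault(r[a], []).append(y)
    remainder = sum(len(ys) / len(labels) * entropy(ys)
                    for ys in by_value.values())
    return entropy(labels) - remainder

def rank_attributes(rows, labels):
    """Attribute indices ordered from most to least informative."""
    scores = [(info_gain(rows, labels, a), a) for a in range(len(rows[0]))]
    return [a for _, a in sorted(scores, reverse=True)]
```

Cross-validated selection would then evaluate a learner on the top 1, 2, ... attributes of this ranking and keep the prefix with the best estimated accuracy.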


[8]

A development environment for predictive modelling in foods
G. Holmes and M. Hall.
A development environment for predictive modelling in foods.
International Journal of Food Microbiology, 73(2-3):351-362,
2002.
[ bib ]


[9]

A logistic boosting approach to inducing multiclass alternating decision trees
G. Holmes, B. Pfahringer, E. T. Frank, R. B. Kirkby, and M. A. Hall.
A logistic boosting approach to inducing multiclass alternating
decision trees.
Technical Report 01/02, The University of Waikato, Department of
Computer Science, Hamilton, New Zealand, 2002.
[ bib ]
The alternating decision tree (ADTree) is a successful classification technique that combines decision trees with the predictive accuracy of boosting into a set of interpretable classification rules. The original formulation of the tree induction algorithm restricted attention to binary classification problems. This paper empirically evaluates several methods for extending the algorithm to the multiclass case by splitting the problem into several two-class problems, and adapts the multiclass LogitBoost procedure to induce alternating decision trees directly. Experimental results confirm that this procedure is comparable with methods that are based on the original ADTree formulation in accuracy, while inducing much smaller trees.


[10]

Multiclass Alternating Decision Trees
Geoffrey Holmes, Bernhard Pfahringer, Richard Kirkby, Eibe Frank, and Mark
Hall.
Multiclass alternating decision trees.
In T. Elomaa, H. Mannila, and H. Toivonen, editors, Proc 13th
European Conference on Machine Learning, volume 2430 of LNCS, pages
161-172. Springer, 2002.
[ bib 
.ps 
.pdf ]
The alternating decision tree (ADTree) is a successful classification technique that combines decision trees with the predictive accuracy of boosting into a set of interpretable classification rules. The original formulation of the tree induction algorithm restricted attention to binary classification problems. This paper empirically evaluates several wrapper methods for extending the algorithm to the multiclass case by splitting the problem into several two-class problems. Seeking a more natural solution we then adapt the multiclass LogitBoost and AdaBoost.MH procedures to induce alternating decision trees directly. Experimental results confirm that these procedures are comparable with wrapper methods that are based on the original ADTree formulation in accuracy, while inducing much smaller trees.
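The wrapper route can be illustrated generically: any binary learner is trained once per class, and prediction goes to the most confident binary model. The toy prototype learner below stands in for the two-class ADTree inducer; this is a sketch of one-vs-rest only, one of several decompositions the paper evaluates:

```python
class PrototypeBinary:
    """Toy binary learner: scores an instance by closeness to the
    mean of the positive examples (stand-in for a two-class ADTree)."""
    def fit(self, X, y):
        pos = [x for x, yi in zip(X, y) if yi == 1]
        self.proto = [sum(c) / len(pos) for c in zip(*pos)]
        return self
    def score(self, x):
        return -sum((a - b) ** 2 for a, b in zip(x, self.proto))

class OneVsRest:
    """Split a k-class problem into k two-class problems and
    predict with the most confident binary model."""
    def __init__(self, make_learner):
        self.make = make_learner
    def fit(self, X, y):
        self.models = {label: self.make().fit(
                           X, [1 if yi == label else 0 for yi in y])
                       for label in set(y)}
        return self
    def predict(self, x):
        return max(self.models, key=lambda lb: self.models[lb].score(x))
```

The direct LogitBoost/AdaBoost.MH inductions avoid this decomposition entirely by boosting a single multiclass model, which is what yields the smaller trees reported.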


[11]

Fragment Generation and Support Vector Machines for Inducing SARs
Stefan Kramer, Eibe Frank, and Christoph Helma.
Fragment generation and support vector machines for inducing SARs.
SAR and QSAR in Environmental Research, 13(5):509-523, 2002.
[ bib 
http ]
We present a new approach to the induction of SARs based on the generation of structural fragments and support vector machines (SVMs). It is tailored for biochemical databases, where the examples are twodimensional descriptions of chemical compounds. The fragment generator finds all fragments (i.e. linearly connected atoms) that satisfy userspecified constraints regarding their frequency and generality. In this paper, we are querying for fragments within a minimum and a maximum frequency in the dataset. After fragment generation, we propose to apply SVMs to the problem of inducing SARs from these fragments. We conjecture that the SVMs are particularly useful in this context, as they can deal with a large number of features. Experiments in the domains of carcinogenicity and mutagenicity prediction show that the minimum and the maximum frequency queries for fragments can be answered within a reasonable time, and that the predictive accuracy obtained using these fragments is satisfactory. However, further experiments will have to confirm that this is a viable approach to inducing SARs.
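The frequency-constrained fragment query can be sketched with strings standing in for linear atom chains. This is a simplification: the paper's fragments are paths of linearly connected atoms in the molecular graph, and the SVM stage is left to any SVM library:

```python
from collections import Counter

def linear_fragments(seq, max_len):
    """All contiguous subsequences up to max_len — a string stand-in
    for linearly connected atom chains."""
    return {seq[i:i + l] for i in range(len(seq))
            for l in range(1, min(max_len, len(seq) - i) + 1)}

def frequent_fragments(compounds, min_freq, max_freq, max_len=3):
    """Fragments occurring in at least min_freq and at most max_freq
    compounds — the minimum/maximum frequency query."""
    counts = Counter()
    for c in compounds:
        for f in linear_fragments(c, max_len):
            counts[f] += 1
    return {f for f, n in counts.items() if min_freq <= n <= max_freq}

def featurise(compound, fragments):
    """Binary presence vector over the selected fragments, suitable
    as SVM input."""
    frags = sorted(fragments)
    longest = max(map(len, frags)) if frags else 1
    present = linear_fragments(compound, longest)
    return [1 if f in present else 0 for f in frags]
```

The maximum-frequency constraint discards near-universal fragments that carry no discriminative signal, while the minimum keeps the feature space from exploding with rare fragments.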


[12]

Modeling for optimal probability prediction
Y. Wang and I.H. Witten.
Modeling for optimal probability prediction.
In C. Sammut and A. Hoffmann, editors, Proceedings of the
Nineteenth International Conference on Machine Learning, San Francisco,
California, pages 650-657. Morgan Kaufmann, 2002.
[ bib 
.ps 
.pdf ]
We present a general modeling method for optimal probability prediction over future observations, in which model dimensionality is determined as a natural byproduct. This new method yields several estimators, and we establish theoretically that they are optimal (either overall or under stated restrictions) when the number of free parameters is infinite. As a case study, we investigate the problem of fitting logistic models in finite-sample situations. Simulation results on both artificial and practical datasets are supportive.
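The yardstick behind "optimal probability prediction" is the log-likelihood of future observations. As a minimal illustration (standard log-loss, not the paper's estimators), the empirical version for binary outcomes is:

```python
import math

def log_loss(probs, outcomes):
    """Average negative log-likelihood of predicted probabilities
    p_i for binary outcomes y_i — the quantity a probability
    predictor should minimise over future data."""
    return -sum(math.log(p if y else 1 - p)
                for p, y in zip(probs, outcomes)) / len(probs)

# Well-calibrated, sharper predictions score lower (better):
print(log_loss([0.9, 0.8], [1, 1]), log_loss([0.6, 0.6], [1, 1]))
```

Model dimensionality falling out "as a natural byproduct" means the fitted model size is chosen by this predictive criterion rather than by a separate selection step.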

