2006

[1]

Generalized Unified Decomposition of Ensemble Loss

Remco R. Bouckaert, Michael Goebel, and Patricia J. Riddle. Generalized unified decomposition of ensemble loss. In Proc 19th Australian Joint Conference on Artificial Intelligence, pages 1133-1139, 2006.
[ bib | .pdf ]
Goebel et al. [4] presented a unified decomposition of ensemble loss for explaining ensemble performance. They considered democratic voting schemes with uniform weights, where each base classifier casts a single vote for one class. In this article, we generalize their decomposition to cover weighted, probabilistic voting schemes and non-uniform (progressive) voting schemes. Empirical results suggest that democratic voting schemes can be outperformed by probabilistic and progressive voting schemes. This makes the generalization worth exploring, and we show how to use it to analyze ensemble loss.
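
The voting schemes contrasted in the abstract can be illustrated with a minimal sketch (not the paper's implementation): uniform democratic voting gives each classifier one hard vote, while weighted probabilistic voting combines full class-probability distributions with non-uniform weights. The classifier outputs and weight values below are hypothetical placeholders.

    import numpy as np

    def democratic_vote(predictions, n_classes):
        # Uniform democratic voting: each base classifier casts exactly one
        # vote for a single class.
        votes = np.zeros(n_classes)
        for c in predictions:
            votes[c] += 1
        return int(np.argmax(votes))

    def weighted_probabilistic_vote(prob_predictions, weights):
        # Weighted probabilistic voting: each classifier contributes its whole
        # class-probability distribution, scaled by a (possibly non-uniform) weight.
        combined = np.zeros_like(np.asarray(prob_predictions[0], dtype=float))
        for p, w in zip(prob_predictions, weights):
            combined += w * np.asarray(p, dtype=float)
        return int(np.argmax(combined))

    # e.g. three classifiers, two classes:
    # democratic_vote([0, 1, 1], n_classes=2)                       -> 1
    # weighted_probabilistic_vote([[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]],
    #                             weights=[0.5, 0.25, 0.25])        -> 0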

[2]

Efficient AUC Learning Curve Calculation

Remco R. Bouckaert. Efficient auc learning curve calculation. In Proc 19th Australian Joint Conference on Artificial Intelligence, pages 181-191, 2006.
[ bib | .pdf ]
A learning curve of a performance measure provides a graphical method with many benefits for judging classifier properties. The area under the ROC curve (AUC) is a useful and increasingly popular performance measure. In this paper, we consider the computational aspects of calculating AUC learning curves. A new method is provided for incrementally updating exact AUC curves and for calculating approximate AUC curves for datasets with millions of instances. Both theoretical and empirical justifications are given for the approximation. Variants for incremental exact and approximate AUC curves are provided as well.
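The exact AUC underlying such a curve can be computed as the Mann-Whitney statistic; the naive sketch below recomputes it from scratch at every point of the learning curve, which is precisely the cost the paper's incremental and approximate updates avoid. This is an illustrative baseline only, not the paper's algorithm.

    import numpy as np

    def auc(scores, labels):
        # Exact AUC via the Mann-Whitney statistic: the fraction of
        # (positive, negative) pairs ranked correctly, counting ties as 1/2.
        scores = np.asarray(scores, dtype=float)
        labels = np.asarray(labels, dtype=bool)
        pos, neg = scores[labels], scores[~labels]
        if len(pos) == 0 or len(neg) == 0:
            return float("nan")
        correct = (pos[:, None] > neg[None, :]).sum()
        ties = (pos[:, None] == neg[None, :]).sum()
        return (correct + 0.5 * ties) / (len(pos) * len(neg))

    def auc_learning_curve(scores_per_step, labels):
        # Learning curve: AUC of the classifier's test scores after each
        # training step. Recomputing from scratch like this is what the
        # incremental exact/approximate variants in the paper sidestep.
        return [auc(s, labels) for s in scores_per_step]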

[3]

Voting Massive Collections of Bayesian Network Classifiers for Data Streams

Remco R. Bouckaert. Voting massive collections of bayesian network classifiers for data streams. In Proc 19th Australian Joint Conference on Artificial Intelligence, pages 243-252, 2006.
[ bib | .pdf ]
We present a new method for voting exponentially large (in the number of attributes) sets of Bayesian classifiers in polynomial time with polynomial memory requirements. Training is linear in the number of instances in the dataset and can be performed incrementally. This allows the collection to learn from massive data streams. The method allows for flexibility in balancing computational complexity, memory requirements and classification performance. Unlike many other incremental Bayesian methods, all statistics kept in memory are directly used in classification. Experimental results show that the classifiers perform well on both small and very large data sets, and that classification performance can be weighed against computational and memory costs.

[4]

A tree-based algorithm for predicate-argument recognition

Chi-San Lin and Tony Smith. A tree-based algorithm for predicate-argument recognition. Association for Computing Machinery New Zealand Bulletin, 2(1):online journal, January 2006.
[ bib ]
[5]

Improving on Bagging with Input Smearing

Eibe Frank and Bernhard Pfahringer. Improving on bagging with input smearing. In Proc 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Singapore. Springer, 2006.
[ bib | .ps.gz | .pdf ]
Bagging is an ensemble learning method that has proved to be a useful tool in the arsenal of machine learning practitioners. Commonly applied in conjunction with decision tree learners to build an ensemble of decision trees, it often leads to reduced errors in the predictions when compared to using a single tree. A single tree is built from a training set of size N. Bagging is based on the idea that, ideally, we would like to eliminate the variance due to a particular training set by combining trees built from all training sets of size N. However, in practice, only one training set is available, and bagging simulates this platonic method by sampling with replacement from the original training data to form new training sets. In this paper we pursue the idea of sampling from a kernel density estimator of the underlying distribution to form new training sets, in addition to sampling from the data itself. This can be viewed as smearing out the resampled training data to generate new datasets, and the amount of smear is controlled by a parameter. We show that the resulting method, called input smearing, can lead to improved results when compared to bagging. We present results for both classification and regression problems.
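
A rough sketch of the idea for numeric attributes, assuming Gaussian smear scaled by each attribute's standard deviation (the base learner `build_tree`, the smear value 0.15 and ensemble size are placeholders, not the paper's settings):

    import numpy as np

    def smeared_sample(X, y, rng, smear=0.15):
        # Bootstrap sample (as in bagging) plus "smear": each numeric attribute
        # is perturbed by Gaussian noise proportional to its standard deviation.
        # smear=0 recovers plain bagging.
        n, d = X.shape
        idx = rng.integers(0, n, size=n)                  # sampling with replacement
        noise = rng.normal(size=(n, d)) * smear * X.std(axis=0)
        return X[idx] + noise, y[idx]

    def input_smearing_ensemble(X, y, build_tree, n_trees=50, smear=0.15, seed=0):
        # Train each tree on its own smeared bootstrap sample; build_tree is any
        # base learner taking (X, y) and returning a fitted model.
        rng = np.random.default_rng(seed)
        return [build_tree(*smeared_sample(X, y, rng, smear)) for _ in range(n_trees)]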

[6]

Using Weighted Nearest Neighbor to Benefit from Unlabeled Data

Kurt Driessens, Peter Reutemann, Bernhard Pfahringer, and Claire Leschi. Using weighted nearest neighbor to benefit from unlabeled data. In Wee Keong Ng, Masaru Kitsuregawa, Jianzhong Li, and Kuiyu Chang, editors, Advances in Knowledge Discovery and Data Mining, 10th Pacific-Asia Conference, PAKDD 2006, volume 3918 of LNCS, pages 60-69, 2006.
[ bib | .pdf ]
The development of data-mining applications such as text classification and molecular profiling has shown the need for machine learning algorithms that can benefit from both labeled and unlabeled data, where often the unlabeled examples greatly outnumber the labeled examples. In this paper we present a two-stage classifier that improves its predictive accuracy by making use of the available unlabeled data. It uses a weighted nearest neighbor classification algorithm with the combined example sets as its knowledge base. The examples from the unlabeled set are pre-labeled by an initial classifier that is built using the limited available training data. By choosing appropriate weights for this pre-labeled data, the nearest neighbor classifier consistently improves on the original classifier.
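
The two-stage idea can be sketched as follows; the distance metric, the `initial_clf` object and the weight 0.3 given to pre-labeled examples are assumptions for illustration, not the paper's choices.

    import numpy as np
    from collections import Counter

    def two_stage_knn_predict(x, X_lab, y_lab, X_unlab, initial_clf, k=5, unlab_weight=0.3):
        # Stage 1: pre-label the unlabeled examples with an initial classifier
        # trained on the limited labeled data.
        y_pre = initial_clf.predict(X_unlab)
        # Stage 2: weighted k-NN over labeled + pre-labeled examples, where
        # pre-labeled neighbours vote with a reduced weight.
        X_all = np.vstack([X_lab, X_unlab])
        y_all = np.concatenate([y_lab, y_pre])
        w_all = np.concatenate([np.ones(len(y_lab)), np.full(len(y_pre), unlab_weight)])
        dists = np.linalg.norm(X_all - x, axis=1)
        votes = Counter()
        for i in np.argsort(dists)[:k]:
            votes[y_all[i]] += w_all[i]
        return votes.most_common(1)[0][0]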

[7]

A Semi-Supervised Spam Mail Detector

Bernhard Pfahringer. A semi-supervised spam mail detector. In Steffen Bickel, editor, Proceedings of the ECML/PKDD 2006 Discovery Challenge Workshop, pages 48-53. Humboldt University Berlin, 2006.
[ bib | .pdf ]
This document describes a novel semi-supervised approach to spam classification, which was successful at the ECML/PKDD 2006 spam classification challenge. A local learning method based on lazy projections was successfully combined with a variant of a standard semi-supervised learning algorithm.

[8]

Naive Bayes for Text Classification with Unbalanced Classes

Eibe Frank and Remco R. Bouckaert. Naive bayes for text classification with unbalanced classes. In Proc 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, Berlin, Germany, pages 503-510. Springer, 2006.
[ bib | .pdf ]
Multinomial naive Bayes (MNB) is a popular method for document classification due to its computational efficiency and relatively good predictive performance. It has recently been established that predictive performance can be improved further by appropriate data transformations. In this paper we present another transformation that is designed to combat a potential problem with the application of MNB to unbalanced datasets. We propose an appropriate correction by adjusting attribute priors. This correction can be implemented as another data normalization step, and we show that it can significantly improve the area under the ROC curve. We also show that the modified version of MNB is very closely related to the simple centroid-based classifier and compare the two methods empirically.
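
A sketch of multinomial naive Bayes with an optional per-class count normalization step, included as a stand-in for the kind of attribute-prior correction the abstract describes; the exact normalization used here is an assumption, not necessarily the paper's transformation.

    import numpy as np

    def train_mnb(X, y, n_classes, alpha=1.0, normalize_class_counts=True):
        # X: word-count matrix (n_docs x n_words); y: class labels.
        # Assumes every class occurs in the training data.
        log_prior = np.log(np.bincount(y, minlength=n_classes) / len(y))
        counts = np.zeros((n_classes, X.shape[1]))
        for c in range(n_classes):
            counts[c] = X[y == c].sum(axis=0)
        if normalize_class_counts:
            # Rescale each class's aggregated count vector to a common total
            # before smoothing, so that class size does not skew the word
            # probabilities (illustrative correction for unbalanced classes).
            counts = counts / counts.sum(axis=1, keepdims=True)
        cond = (counts + alpha) / (counts + alpha).sum(axis=1, keepdims=True)
        return log_prior, np.log(cond)

    def predict_mnb(x, log_prior, log_cond):
        # Classify a single count vector x by maximum posterior log-probability.
        return int(np.argmax(log_prior + x @ log_cond.T))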

[9]

Automatic Species Identification of Live Moths

Michael Mayo. Automatic species identification of live moths. In Ellis et al., editors, Proc 26th SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence, pages 195-202, 2006.
[ bib | .pdf ]
A collection consisting of the images of 774 live moth individuals, each moth belonging to one of 35 different UK species, was analysed to determine if data mining techniques could be used effectively for automatic species identification. Feature vectors were extracted from each of the moth images and the machine learning toolkit WEKA was used to classify the moths by species using the feature vectors. Whereas a previous analysis of this image dataset reported in the literature [1] required that each moth's least worn wing region be highlighted manually for each image, WEKA was able to achieve a greater level of accuracy (85%) using support vector machines without manual specification of a region of interest at all. This paper describes the features that were extracted from the images, and the various experiments using different classifiers and datasets that were performed. The results show that data mining can be usefully applied to the problem of automatic species identification of live specimens in the field.

[10]

Random Relational Rules

Grant Anderson and Bernhard Pfahringer. Random relational rules. In Stephen Muggleton and Ramon Otero, editors, Extended Abstracts for 16th International Conference on Inductive Logic Programming, 2006.
[ bib | .pdf ]
Exhaustive search in relational learning is generally infeasible, therefore some form of heuristic search is usually employed, such as in FOIL [1]. On the other hand, so-called stochastic discrimination provides a framework for combining arbitrary numbers of weak classifiers (in this case randomly generated relational rules) in a way where accuracy improves with additional rules, even after maximal accuracy on the training data has been reached [2]. The weak classifiers must have a slightly higher probability of covering instances of their target class than of other classes. As the rules are also independent and identically distributed, the Central Limit Theorem applies and as the number of weak classifiers/rules grows, coverages for different classes resemble well-separated normal distributions. Stochastic discrimination is closely related to other ensemble methods like Bagging, Boosting, or Random Forests, all of which have been tried in relational learning [3, 4, 5].
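
As a rough illustration of the stochastic discrimination framework only, the sketch below uses random conjunctive rules over propositional data as a simplified stand-in for the randomly generated relational rules; rule generation, the enrichment test and all parameter values are assumptions.

    import numpy as np

    def random_rule(X, rng, n_tests=2):
        # A random conjunction of attribute tests; the rule "covers" an
        # instance if every test holds.
        attrs = rng.integers(0, X.shape[1], size=n_tests)
        thresh = rng.uniform(X[:, attrs].min(axis=0), X[:, attrs].max(axis=0))
        signs = rng.choice([-1, 1], size=n_tests)
        return lambda Z: np.all(signs * (Z[:, attrs] - thresh) >= 0, axis=1)

    def stochastic_discrimination(X, y, target, n_rules=500, max_tries=50000, seed=0):
        # Keep only rules that cover the target class slightly more often than
        # the other classes ("enriched" weak rules); predict by averaging the
        # coverage of all kept rules, which gives a score for the target class.
        rng = np.random.default_rng(seed)
        kept = []
        for _ in range(max_tries):
            if len(kept) == n_rules:
                break
            rule = random_rule(X, rng)
            cov = rule(X)
            if cov.any() and cov[y == target].mean() > cov[y != target].mean():
                kept.append(rule)
        return lambda Z: np.mean([r(Z) for r in kept], axis=0)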

[11]

A Comparison of Multi-instance Learning Algorithms

Lin Dong. A comparison of multi-instance learning algorithms. Master's thesis, Department of Computer Science, University of Waikato, 2006.
[ bib | http ]
Motivated by various challenging real-world applications, such as drug activity prediction and image retrieval, multi-instance (MI) learning has attracted considerable interest in recent years. Compared with standard supervised learning, the MI learning task is more difficult as the label information of each training example is incomplete. Many MI algorithms have been proposed. Some of them are specifically designed for MI problems whereas others have been upgraded or adapted from standard single-instance learning algorithms. Most algorithms have been evaluated on only one or two benchmark datasets, and there is a lack of systematic comparisons of MI learning algorithms. This thesis presents a comprehensive study of MI learning algorithms that aims to compare their performance and find a suitable way to properly address different MI problems. First, it briefly reviews the history of research on MI learning. Then it discusses five general classes of MI approaches that cover a total of 16 MI algorithms. After that, it presents empirical results for these algorithms that were obtained from 15 datasets which involve five different real-world application domains. Finally, some conclusions are drawn from these results: (1) applying suitable standard single-instance learners to MI problems can often generate the best result on the datasets that were tested, (2) algorithms exploiting the standard asymmetric MI assumption do not show significant advantages over approaches using the so-called collective assumption, and (3) different MI approaches are suitable for different application domains, and no MI algorithm works best on all MI problems.

[12]

Reinforcement Learning for Racecar Control

Benjamin George Cleland. Reinforcement learning for racecar control. Master's thesis, Department of Computer Science, University of Waikato, 2006.
[ bib | http ]
This thesis investigates the use of reinforcement learning to learn to drive a racecar in the simulated environment of the Robot Automobile Racing Simulator. Real-life race driving is known to be difficult for humans, and expert human drivers use complex sequences of actions. There are a large number of variables, some of which change stochastically and all of which may affect the outcome. This makes driving a promising domain for testing and developing Machine Learning techniques that have the potential to be robust enough to work in the real world. Therefore the principles of the algorithms from this work may be applicable to a range of problems. The investigation starts by finding a suitable data structure to represent the information learnt. This is tested using supervised learning. Reinforcement learning is added and roughly tuned, and the supervised learning is then removed. A simple tabular representation is found satisfactory, and this avoids difficulties with more complex methods and allows the investigation to concentrate on the essentials of learning. Various reward sources are tested and a combination of three are found to produce the best performance. Exploration of the problem space is investigated. Results show exploration is essential but controlling how much is done is also important. It turns out the learning episodes need to be very long and because of this the task needs to be treated as continuous by using discounting to limit the size of the variables stored. Eligibility traces are used with success to make the learning more efficient. The tabular representation is made more compact by hashing and more accurate by using smaller buckets. This slows the learning but produces better driving. The improvement given by a rough form of generalisation indicates the replacement of the tabular method by a function approximator is warranted. These results show reinforcement learning can work within the Robot Automobile Racing Simulator, and lay the foundations for building a more efficient and competitive agent.
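
The ingredients named in the abstract (a tabular value representation, discounting, epsilon-greedy exploration and eligibility traces) can be illustrated with a generic tabular Sarsa(lambda) episode; this is a textbook sketch under an assumed minimal `env` interface, not the agent built in the thesis.

    import numpy as np
    from collections import defaultdict

    def sarsa_lambda_episode(env, Q=None, alpha=0.1, gamma=0.99, lam=0.9,
                             epsilon=0.1, n_actions=3, max_steps=10000, rng=None):
        # Q is a dictionary-backed table of action values (cheap to hash, in the
        # spirit of the compact hashed table described above); E holds the
        # eligibility traces. env is assumed to expose reset() -> state and
        # step(a) -> (state, reward, done).
        rng = rng or np.random.default_rng()
        Q = Q if Q is not None else defaultdict(lambda: np.zeros(n_actions))
        E = defaultdict(lambda: np.zeros(n_actions))
        s = env.reset()
        a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
        for _ in range(max_steps):
            s2, r, done = env.step(a)
            a2 = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s2]))
            delta = r + (0.0 if done else gamma * Q[s2][a2]) - Q[s][a]
            E[s][a] += 1.0
            for state, traces in E.items():
                # Update every visited state-action pair in proportion to its
                # trace, then decay the trace.
                Q[state] += alpha * delta * traces
                traces *= gamma * lam
            if done:
                break
            s, a = s2, a2
        return Q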