Learning from the Past with Experiment Databases

Joaquin Vanschoren, Bernhard Pfahringer, and Geoffrey Holmes. Learning from the past with experiment databases. In Proc 10th Pacific Rim International Conference on Artificial Intelligence, Hanoi, Vietnam, pages 485-496. Springer, 2008.
[ bib ]
Thousands of Machine Learning research papers contain experimental comparisons that usually have been conducted with a single focus of interest, often losing detailed results after publication. Yet, when collecting all these past experiments in experiment databases, they can readily be reused for additional and possibly much broader investigation. In this paper, we make use of such a database to answer various interesting research questions about learning algorithms and to verify a number of recent studies. Alongside performing elaborate comparisons of algorithms, we also investigate the effects of algorithm parameters and data properties, and seek deeper insights into the behavior of learning algorithms by studying their learning curves and bias-variance profiles.


Multi-label Classification Using Ensembles of Pruned Sets

Jesse Read, Bernhard Pfahringer, and Geoffrey Holmes. Multi-label classification using ensembles of pruned sets. In Proc 8th IEEE International Conference on Data Mining, Pisa, Italy, pages 995-1000. IEEE Computer Society, 2008.
[ bib | http ]
This paper presents a pruned sets method (PS) for multi-label classification. It is centred on the concept of treating sets of labels as single labels. This allows the classification process to inherently take into account correlations between labels. By pruning these sets, PS focuses only on the most important correlations, which reduces complexity and improves accuracy. By combining pruned sets in an ensemble scheme (EPS), new label sets can be formed to adapt to irregular or complex data. The results from experimental evaluation on a variety of multi-label datasets show that [E]PS can achieve better performance and train much faster than other multi-label methods.


Practical Bias Variance Decomposition

Remco R. Bouckaert. Practical bias variance decomposition. In Proc 21st Australasian Joint Conference on Artificial Intelligence, Auckland, New Zealand. Springer, 2008.
[ bib | .pdf ]
Bias variance decomposition for classifiers is a useful tool in understanding classifier behavior. Unfortunately, the literature does not provide consistent guidelines on how to apply a bias variance decomposition. This paper examines the various parameters and variants of empirical bias variance decompositions through an extensive simulation study. Based on this study, we recommend to use ten fold cross validation as sampling method and take 100 samples within each fold with a test set size of at least 2000. Only if the learning algorithm is stable, fewer samples, a smaller test set size or lower number of folds may be justified.


Revisiting Multiple-Instance Learning via Embedded Instance Selection

James Foulds and Eibe Frank. Revisiting multiple-instance learning via embedded instance selection. In Proc 21st Australasian Joint Conference on Artificial Intelligence, Auckland, New Zealand. Springer, 2008.
[ bib | .pdf ]
Multiple-Instance Learning via Embedded Instance Selection (MILES) is a recently proposed multiple-instance (MI) classification al- gorithm that applies a single-instance base learner to a propositional- ized version of MI data. However, the original authors consider only one single-instance base learner for the algorithm - the 1-norm SVM. We present an empirical study investigating the efficacy of alternative base learners for MILES, and compare MILES to other MI algorithms. Our results show that boosted decision stumps can in some cases provide better classification accuracy than the 1-norm SVM as a base learner for MILES. Although MILES provides competitive performance when compared to other MI learners, we identify simpler propositionaliza- tion methods that require shorter training times while retaining MILES' strong classification performance on the datasets we tested.


Additive Regression Applied to a Large-Scale Collaborative Filtering Problem

Eibe Frank and Mark Hall. Additive regression applied to a large-scale collaborative filtering problem. In Proc 21set Australasian Joint Conference on Artificial Intelligence, Auckland, New Zealand. Springer, 2008.
[ bib | .pdf ]
The much-publicized Netflix competition has put the spot- light on the application domain of collaborative filtering and has sparked interest in machine learning algorithms that can be applied to this sort of problem. The demanding nature of the Netflix data has lead to some interesting and ingenious modifications to standard learning methods in the name of efficiency and speed. There are three basic methods that have been applied in most approaches to the Netflix problem so far: stand-alone neighborhood-based methods, latent factor models based on singular-value decomposition, and ensembles consisting of variations of these techniques. In this paper we investigate the application of forward stage-wise additive modeling to the Netflix problem, using two regression schemes as base learners: ensembles of weighted simple linear regressors and k-means clustering-the latter being interpreted as a tool for multi- variate regression in this context. Experimental results show that our methods produce competitive results.


Discriminating Against New Classes: One-Class versus Multi-Class Classification

Kathryn Hempstalk and Eibe Frank. Discriminating against new classes: One-class versus multi-class classification. In Proc 21set Australasian Joint Conference on Artificial Intelligence, Auckland, New Zealand. Springer, 2008.
[ bib | .pdf ]
Many applications require the ability to identify data that is anomalous with respect to a target group of observations, in the sense of belonging to a new, previously unseen 'attacker' class. One possible approach to this kind of verification problem is one-class classification, learning a description of the target class concerned based solely on data from this class. However, if known non-target classes are available at training time, it is also possible to use standard multi-class or two-class classification, exploiting the negative data to infer a description of the target class. In this paper we assume that this scenario holds and inves- tigate under what conditions multi-class and two-class Naive Bayes clas- sifiers are preferable to the corresponding one-class model when the aim is to identify examples from a new 'attacker' class. To this end we first identify a way of performing a fair comparison between the techniques concerned and present an adaptation of standard cross-validation. This is one of the main contributions of the paper. Based on the experimental results obtained, we then show under what conditions which group of techniques is likely to be preferable. Our main finding is that multi-class and two-class classification becomes preferable to one-class classification when a sufficiently large number of non-target classes is available.


Propositionalisation of Profile Hidden Markov Models for Biological Sequence Analysis

Stefan Mutter, Bernhard Pfahringer, and Geoffrey Holmes. Propositionalisation of profile hidden markov models for biological sequence analysis. In Proc 21st Australasian Joint Conference on Artificial Intelligence, Auckland, New Zealand. Springer, 2008.
[ bib | .pdf ]
Hidden Markov Models are a widely used generative model for analysing sequence data. A variant, Profile Hidden Markov Models are a special case used in Bioinformatics to represent, for example, protein families. In this paper we introduce a simple propositionalisation method for Profile Hidden Markov Models. The method allows the use of PHMMs discriminatively in a classification task. Previously, kernel approaches have been proposed to generate a discriminative description for an HMM, but require the explicit definition of a similarity measure for HMMs. Propositionalisation does not need such a measure and allows the use of any propositional learner including kernel-based approaches. We show empirically that using propositionalisation leads to higher accuracies in comparison with PHMMs on benchmark datasets.


Mining Arbitrarily Large Datasets Using Heuristic k-Nearest Neighbour Search

Xing Wu, Geoffrey Holmes, and Bernhard Pfahringer. Mining arbitrarily large datasets using heuristic k-nearest neighbour search. In Proc 21st Australasian Joint Conference on Artificial Intelligence, Auckland, New Zealand, pages 355-361. Springer, 2008.
[ bib | http ]
Nearest Neighbour Search (NNS) is one of the top ten data mining algorithms. It is simple and effective but has a time complexity that is the product of the number of instances and the number of dimensions. When the number of dimensions is greater than two there are no known solutions that can guarantee a sublinear retrieval time. This paper describes and evaluates two ways to make NNS efficient for datasets that are arbitrarily large in the number of instances and dimensions. The methods are best described as heuristic as they are neither exact nor approximate. Both stem from recent developments in the field of data stream classification. The first uses Hoeffding Trees, an extension of decision trees to streams and the second is a direct stream extension of NNS. The methods are evaluated in terms of their accuracy and the time taken to find the neighbours. Results show that the methods are competitive with NNS in terms of accuracy but significantly faster.


Organizing the World's Machine Learning Information

Joaquin Vanschoren, Hendrik Blockeel, Bernhard Pfahringer, and Geoffrey Holmes. Organizing the world's machine learning information. In Proc 3rd International Symposium on Leveraging Applications of Formal Methods, Verification and Validation, Porto Sani, Greece, pages 693-708. Springer, 2008.
[ bib | http ]
All around the globe, thousands of learning experiments are being executed on a daily basis, only to be discarded after interpretation. Yet, the information contained in these experiments might have uses beyond their original intent and, if properly stored, could be of great use to future research. In this paper, we hope to stimulate the development of such learning experiment repositories by providing a bird's-eye view of how they can be created and used in practice, bringing together existing approaches and new ideas. We draw parallels between how experiments are being curated in other sciences, and consecutively discuss how both the empirical and theoretical details of learning experiments can be expressed, organized and made universally accessible. Finally, we discuss a range of possible services such a resource can offer, either used directly or integrated into data mining tools.


Adaptive feature thresholding for off-line signature verification

Rober Larkins and Michael Mayo. Adaptive feature thresholding for off-line signature verification. In Proc 23rd International Conference Image and Vision Computing New Zealand, Christchurch, New Zealand, pages 1-6. IEEE, 2008.
[ bib | http ]
This paper introduces Adaptive Feature Thresholding (AFT) which is a novel method of person-dependent off-line signature verification. AFT enhances how a simple image feature of a signature is converted to a binary feature vector by significantly improving its representation in relation to the training signatures. The similarity between signatures is then easily computed from their corresponding binary feature vectors. AFT was tested on the CEDAR and GPDS benchmark datasets, with classification using either a manual or an automatic variant. On the CEDAR dataset we achieved a classification accuracy of 92% for manual and 90% for automatic, while on the GPDS dataset we achieved over 87% and 85% respectively. For both datasets AFT is less complex and requires fewer images features than the existing state of the art methods, while achieving competitive results.


Improving face gender classification by adding deliberately misaligned faces to the training data

Michael Mayo and Edmond Zhang. Improving face gender classification by adding deliberately misaligned faces to the training data. In Proc 23rd International Conference Image and Vision Computing New Zealand, Christchurch, New Zealand, pages 1-5. IEEE, 2008.
[ bib | http ]
A novel method of face gender classifier construction is proposed and evaluated. Previously, researchers have assumed that a computationally expensive face alignment step (in which the face image is transformed so that facial landmarks such as the eyes, nose, chin, etc, are in uniform locations in the image) is required in order to maximize the accuracy of predictions on new face images. We, however, argue that this step is not necessary, and that machine learning classifiers can be made robust to face misalignments by automatically expanding the training data with examples of faces that have been deliberately misaligned (for example, translated or rotated). To test our hypothesis, we evaluate this automatic training dataset expansion method with two types of image classifier, the first based on weak features such as Local Binary Pattern histograms, and the second based on SIFT keypoints. Using a benchmark face gender classification dataset recently proposed in the literature, we obtain a state-of-the-art accuracy of 92.5%, thus validating our approach.


Pattern discovery for object categorization

Edmond Zhang and Michael Mayo. Pattern discovery for object categorization. In Proc 23rd International Conference Image and Vision Computing New Zealand, Christchurch, New Zealand, pages 1 - 6. IEEE, 2008.
[ bib | http ]
This paper presents a new approach for the object categorization problem. Our model is based on the successful `bag of words' approach. However, unlike the original model, image features (keypoints) are not seen as independent and orderless. Instead, our model attempts to discover intermediate representations for each object class. This approach works by partitioning the image into smaller regions then computing the spatial relationships between all of the informative image keypoints in the region. The results show that the inclusion of spatial relationships leads to a measurable increase in performance for two of the most challenging datasets.


Machine Learning for Adaptive Computer Game Opponents

Jonathan David Miles. Machine learning for adaptive computer game opponents. Master's thesis, Department of Computer Science, University of Waikato, 2008.
[ bib | http ]
This thesis investigates the use of machine learning techniques in computer games to create a computer player that adapts to its opponent's game-play. This includes first confirming that machine learning algorithms can be integrated into a modern computer game without have a detrimental effect on game performance, then experimenting with different machine learning techniques to maximize the computer player's performance. Experiments use three machine learning techniques; static prediction models, continuous learning, and reinforcement learning. Static models show the highest initial performance but are not able to beat a simple opponent. Continuous learning is able to improve the performance achieved with static models but the rate of improvement drops over time and the computer player is still unable to beat the opponent. Reinforcement learning methods have the highest rate of improvement but the lowest initial performance. This limits the effectiveness of reinforcement learning because a large number of episodes are required before performance becomes sufficient to match the opponent.


Random Relational Rules

Grant Anderson. Random Relational Rules. PhD thesis, Department of Computer Science, University of Waikato, 2008.
[ bib | http ]
In the field of machine learning, methods for learning from single-table data have received much more attention than those for learning from multi-table, or relational data, which are generally more computationally complex. However, a significant amount of the world's data is relational. This indicates a need for algorithms that can operate efficiently on relational data and exploit the larger body of work produced in the area of single-table techniques. This thesis presents algorithms for learning from relational data that mitigate, to some extent, the complexity normally associated with such learning. All algorithms in this thesis are based on the generation of random relational rules. The assumption is that random rules enable efficient and effective relational learning, and this thesis presents evidence that this is indeed the case. To this end, a system for generating random relational rules is described, and algorithms using these rules are evaluated. These algorithms include direct classification, classification by propositionalisation, clustering, semi-supervised learning and generating random forests. The experimental results show that these algorithms perform competitively with previously published results for the datasets used, while often exhibiting lower runtime than other tested systems. This demonstrates that sufficient information for classification and clustering is retained in the rule generation process and that learning with random rules is efficient. Further applications of random rules are investigated. Propositionalisation allows single-table algorithms for classification and clustering to be applied to the resulting data, reducing the amount of relational processing required. Further results show that techniques for utilising additional unlabeled training data improve accuracy of classification in the semi-supervised setting. The thesis also develops a novel algorithm for building random forests by making efficient use of random rules to generate trees and leaves in parallel.


Experiment Databases: Creating a New Platform for Meta-Learning Research

Joaquin Vanschoren, Hendrik Blockeel, Bernhard Pfahringer, and Geoffrey Holmes. Experiment databases: Creating a new platform for meta-learning research. In Proc ICML/COLT/UAI 2008 Planning to Learn Workshop, Helsinki, Finland. University of Porto, 2008.
[ bib | http ]

One-class Classification by Combining Density and Class Probability Estimation

Kathryn Hempstalk, Eibe Frank, and Ian H. Witten. One-class classification by combining density and class probability estimation. In Proc 12th European Conference on Principles and Practice of Knowledge Discovery in Databases and 19th European Conference on Machine Learning, Antwerp, Belgium. Springer, 2008.
[ bib | .pdf ]
One-class classification has important applications such as outlier and novelty detection. It is commonly tackled using density es- timation techniques or by adapting a standard classification algorithm to the problem of carving out a decision boundary that describes the location of the target data. In this paper we investigate a simple method for one-class classification that combines the application of a density es- timator, used to form a reference distribution, with the induction of a standard model for class probability estimation. In this method, the ref- erence distribution is used to generate artificial data that is employed to form a second, artificial class. In conjunction with the target class, this artificial class is the basis for a standard two-class learning problem. We explain how the density function of the reference distribution can be combined with the class probability estimates obtained in this way to form an adjusted estimate of the density function of the target class. Us- ing UCI datasets, and data from a typist recognition problem, we show that the combined model, consisting of both a density estimator and a class probability estimator, can improve on using either component tech- nique alone when used for one-class classification. We also compare the method to one-class classification using support vector machines.


Combining Naive Bayes and Decision Tables

Mark Hall and Eibe Frank. Combining naive Bayes and decision tables. In Proc 21st Florida Artificial Intelligence Research Society Conference, Miami, Florida. AAAI Press, 2008.
[ bib | .pdf ]
We investigate a simple semi-naive Bayesian ranking method that combines naive Bayes with induction of decision tables. Naive Bayes and decision tables can both be trained effi- ciently, and the same holds true for the combined semi-naive model. We show that the resulting ranker, compared to ei- ther component technique, frequently significantly increases AUC. For some datasets it significantly improves on both techniques. This is also the case when attribute selection is performed in naive Bayes and its semi-naive variant.


Propositionalisation of multiple sequence alignments using probabilistic models

Stefan Mutter, Bernhard Pfahringer, and Geoffrey Holmes. Propositionalisation of multiple sequence alignments using probabilistic models. In New Zealand Computer Science Research Student Conference (NZCSRSC 2008), Christchurch, New Zealand, pages 234-237, April 2008.
[ bib | .pdf ]
Multiple sequence alignments play a central role in Bioinformatics. Most alignment representations are designed to facilitate knowledge extraction by human experts. Additionally statistical models like Profile Hidden Markov Models are used as representations. They offer the advantage to provide sound, probabilistic scores. The basic idea we present in this paper is to use the structure of a Profile Hidden Markov Model for propositionalisation. This way we get a simple, extendable representation of multiple sequence alignments which facilitates further analysis by Machine Learning algorithms.


Exploiting propositionalization based on random relational rules for semi-supervised learning

Bernhard Pfahringer and Grant Anderson. Exploiting propositionalization based on random relational rules for semi-supervised learning. In Proc 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Osaka, Japan. Springer, 2008.
[ bib | http ]
In this paper we investigate an approach to semi-supervised learning based on randomized propositionalization, which allows for applying standard propositional classification algorithms like support vector machines to multi-relational data. Randomization based on random relational rules can work both with and without a class attribute and can therefore be applied simultaneously to both the labeled and the unlabeled portion of the data present in semi-supervised learning. An empirical investigation compares semi-supervised propositionalization to standard propositionalization using just the labeled data portion, as well as to a variant that also just uses the labeled data portion but includes the label information in an attempt to improve the resulting propositionalization. Preliminary experimental results indicate that propositionalization generated on the full dataset, i.e. the semi- supervised approach, tends to outperform the other two more standard approaches.


Handling Numeric Attributes in Hoeffding Trees

Bernhard Pfahringer, Geoffrey Holmes, and Richard Kirkby. Handling numeric attributes in hoeffding trees. In Proc 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Osaka, Japan, pages 296-307. Springer, 2008.
[ bib | http ]
For conventional machine learning classification algorithms handling numeric attributes is relatively straightforward. Unsupervised and supervised solutions exist that either segment the data into pre-defined bins or sort the data and search for the best split points. Unfortunately, none of these solutions carry over particularly well to a data stream environment. Solutions for data streams have been proposed by several authors but as yet none have been compared empirically. In this paper we investigate a range of methods for multi-class tree-based classification where the handling of numeric attributes takes place as the tree is constructed. To this end, we extend an existing approximation approach, based on simple Gaussian approximation. We then compare this method with four approaches from the literature arriving at eight final algorithm configurations for testing. The solutions cover a range of options from perfectly accurate and memory intensive to highly approximate. All methods are tested using the Hoeffding tree classification algorithm. Surprisingly, the experimental comparison shows that the most approximate methods produce the most accurate trees by allowing for faster tree growth.


Learning Instance Weights in Multi-Instance Learning

James Foulds. Learning instance weights in multi-instance learning. Master's thesis, Department of Computer Science, University of Waikato, 2008.
[ bib | http ]
Multi-instance (MI) learning is a variant of supervised machine learning, where each learning example contains a bag of instances instead of just a single feature vector. MI learning has applications in areas such as drug activity prediction, fruit disease management and image classification. This thesis investigates the case where each instance has a weight value determining the level of influence that it has on its s class label. This is a more general assumption than most existing approaches use, and thus is more widely applicable. The challenge is to accurately estimate these weights in order to make predictions at the bag level. An existing approach known as MILES is retroactively identified as an algorithm that uses instance weights for MI learning, and is evaluated using a variety of base learners on benchmark problems. New algorithms for learning instance weights for MI learning are also proposed and rigorously evaluated on both artificial and real-world datasets. The new algorithms are shown to achieve better root mean squared error rates than existing approaches on artificial data generated according to the underlying assumptions. Experimental results also demonstrate that the new algorithms are competitive with existing approaches on real-world problems.