[1]

Learning from the Past with Experiment Databases
Joaquin Vanschoren, Bernhard Pfahringer, and Geoffrey Holmes.
Learning from the past with experiment databases.
In Proc 10th Pacific Rim International Conference on Artificial
Intelligence, Hanoi, Vietnam, pages 485-496. Springer, 2008.
[ bib ]
Thousands of Machine Learning research papers contain experimental comparisons that usually have been conducted with a single focus of interest, often losing detailed results after publication. Yet, when collecting all these past experiments in experiment databases, they can readily be reused for additional and possibly much broader investigation. In this paper, we make use of such a database to answer various interesting research questions about learning algorithms and to verify a number of recent studies. Alongside performing elaborate comparisons of algorithms, we also investigate the effects of algorithm parameters and data properties, and seek deeper insights into the behavior of learning algorithms by studying their learning curves and bias-variance profiles.


[2]

Multi-label Classification Using Ensembles of Pruned Sets
Jesse Read, Bernhard Pfahringer, and Geoffrey Holmes.
Multi-label classification using ensembles of pruned sets.
In Proc 8th IEEE International Conference on Data Mining, Pisa,
Italy, pages 995-1000. IEEE Computer Society, 2008.
[ bib 
http ]
This paper presents a pruned sets method (PS) for multi-label classification. It is centred on the concept of treating sets of labels as single labels. This allows the classification process to inherently take into account correlations between labels. By pruning these sets, PS focuses only on the most important correlations, which reduces complexity and improves accuracy. By combining pruned sets in an ensemble scheme (EPS), new label sets can be formed to adapt to irregular or complex data. The results from experimental evaluation on a variety of multi-label datasets show that [E]PS can achieve better performance and train much faster than other multi-label methods.
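The pruning step at the heart of PS can be sketched in a few lines: treat each example's full label set as one class value and discard sets that occur too rarely to model reliably. This is a minimal illustration, not the authors' implementation; the full method also reintroduces pruned examples via frequent subsets, which is omitted here, and the function name and `min_count` threshold are made up.

```python
from collections import Counter

def prune_label_sets(y_sets, min_count=2):
    """Keep only examples whose full label set occurs at least
    `min_count` times; each surviving set can then be treated as a
    single class value by an ordinary multi-class learner."""
    counts = Counter(y_sets)
    frequent = {s for s, c in counts.items() if c >= min_count}
    kept = [s for s in y_sets if s in frequent]
    return kept, frequent

# Toy data: the label set {a, b} is common, {a, c} occurs only once.
train = [frozenset("ab")] * 3 + [frozenset("ac")]
kept, frequent = prune_label_sets(train, min_count=2)
```

After pruning, a standard single-label classifier can be trained on the surviving examples, with each frozenset acting as one class symbol.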


[3]

Practical Bias Variance Decomposition
Remco R. Bouckaert.
Practical bias variance decomposition.
In Proc 21st Australasian Joint Conference on Artificial
Intelligence, Auckland, New Zealand. Springer, 2008.
[ bib 
.pdf ]
Bias-variance decomposition for classifiers is a useful tool in understanding classifier behavior. Unfortunately, the literature does not provide consistent guidelines on how to apply a bias-variance decomposition. This paper examines the various parameters and variants of empirical bias-variance decompositions through an extensive simulation study. Based on this study, we recommend using ten-fold cross-validation as the sampling method and taking 100 samples within each fold with a test set size of at least 2000. Only if the learning algorithm is stable may fewer samples, a smaller test set size, or a lower number of folds be justified.
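As a minimal illustration of what an empirical decomposition computes, bias and variance at a fixed test point can be estimated from repeated predictions of models retrained on fresh samples. This is a direct resampling sketch under squared loss, not the fold-based protocol the paper recommends, and all names are illustrative.

```python
import random
import statistics

def bias_variance(train_and_predict, make_sample, x_test, y_true, n_samples=100):
    """Estimate squared bias and variance at one test point by
    retraining on fresh samples and averaging the predictions."""
    preds = [train_and_predict(make_sample(), x_test) for _ in range(n_samples)]
    mean_pred = statistics.fmean(preds)
    bias_sq = (mean_pred - y_true) ** 2            # (E[f] - y)^2
    variance = statistics.fmean([(p - mean_pred) ** 2 for p in preds])
    return bias_sq, variance

# Toy learner: predict the mean of a noisy sample of the true value 3.0.
rng = random.Random(0)
b2, var = bias_variance(
    lambda sample, x: statistics.fmean(sample),
    lambda: [3.0 + rng.gauss(0, 1) for _ in range(20)],
    x_test=None, y_true=3.0)
```

For an unbiased learner like this sample mean, the squared bias should be near zero while the variance reflects the sampling noise.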


[4]

Revisiting Multiple-Instance Learning via Embedded Instance Selection
James Foulds and Eibe Frank.
Revisiting multiple-instance learning via embedded instance
selection.
In Proc 21st Australasian Joint Conference on Artificial
Intelligence, Auckland, New Zealand. Springer, 2008.
[ bib 
.pdf ]
Multiple-Instance Learning via Embedded Instance Selection (MILES) is a recently proposed multiple-instance (MI) classification algorithm that applies a single-instance base learner to a propositionalized version of MI data. However, the original authors consider only one single-instance base learner for the algorithm: the 1-norm SVM. We present an empirical study investigating the efficacy of alternative base learners for MILES, and compare MILES to other MI algorithms. Our results show that boosted decision stumps can in some cases provide better classification accuracy than the 1-norm SVM as a base learner for MILES. Although MILES provides competitive performance when compared to other MI learners, we identify simpler propositionalization methods that require shorter training times while retaining MILES' strong classification performance on the datasets we tested.


[5]

Additive Regression Applied to a Large-Scale Collaborative Filtering Problem
Eibe Frank and Mark Hall.
Additive regression applied to a large-scale collaborative filtering
problem.
In Proc 21st Australasian Joint Conference on Artificial
Intelligence, Auckland, New Zealand. Springer, 2008.
[ bib 
.pdf ]
The much-publicized Netflix competition has put the spotlight on the application domain of collaborative filtering and has sparked interest in machine learning algorithms that can be applied to this sort of problem. The demanding nature of the Netflix data has led to some interesting and ingenious modifications to standard learning methods in the name of efficiency and speed. There are three basic methods that have been applied in most approaches to the Netflix problem so far: standalone neighborhood-based methods, latent factor models based on singular-value decomposition, and ensembles consisting of variations of these techniques. In this paper we investigate the application of forward stagewise additive modeling to the Netflix problem, using two regression schemes as base learners: ensembles of weighted simple linear regressors and k-means clustering, the latter being interpreted as a tool for multivariate regression in this context. Experimental results show that our methods produce competitive results.


[6]

Discriminating Against New Classes: One-Class versus Multi-Class Classification
Kathryn Hempstalk and Eibe Frank.
Discriminating against new classes: One-class versus multi-class
classification.
In Proc 21st Australasian Joint Conference on Artificial
Intelligence, Auckland, New Zealand. Springer, 2008.
[ bib 
.pdf ]
Many applications require the ability to identify data that is anomalous with respect to a target group of observations, in the sense of belonging to a new, previously unseen 'attacker' class. One possible approach to this kind of verification problem is one-class classification, learning a description of the target class concerned based solely on data from this class. However, if known non-target classes are available at training time, it is also possible to use standard multi-class or two-class classification, exploiting the negative data to infer a description of the target class. In this paper we assume that this scenario holds and investigate under what conditions multi-class and two-class Naive Bayes classifiers are preferable to the corresponding one-class model when the aim is to identify examples from a new 'attacker' class. To this end we first identify a way of performing a fair comparison between the techniques concerned and present an adaptation of standard cross-validation. This is one of the main contributions of the paper. Based on the experimental results obtained, we then show under what conditions which group of techniques is likely to be preferable. Our main finding is that multi-class and two-class classification becomes preferable to one-class classification when a sufficiently large number of non-target classes is available.


[7]

Propositionalisation of Profile Hidden Markov Models for Biological Sequence Analysis
Stefan Mutter, Bernhard Pfahringer, and Geoffrey Holmes.
Propositionalisation of profile hidden Markov models for biological
sequence analysis.
In Proc 21st Australasian Joint Conference on Artificial
Intelligence, Auckland, New Zealand. Springer, 2008.
[ bib 
.pdf ]
Hidden Markov Models are a widely used generative model for analysing sequence data. A variant, Profile Hidden Markov Models, are a special case used in Bioinformatics to represent, for example, protein families. In this paper we introduce a simple propositionalisation method for Profile Hidden Markov Models. The method allows PHMMs to be used discriminatively in a classification task. Previously, kernel approaches have been proposed to generate a discriminative description for an HMM, but these require the explicit definition of a similarity measure for HMMs. Propositionalisation does not need such a measure and allows the use of any propositional learner, including kernel-based approaches. We show empirically that using propositionalisation leads to higher accuracies in comparison with PHMMs on benchmark datasets.


[8]

Mining Arbitrarily Large Datasets Using Heuristic k-Nearest Neighbour Search
Xing Wu, Geoffrey Holmes, and Bernhard Pfahringer.
Mining arbitrarily large datasets using heuristic k-nearest neighbour
search.
In Proc 21st Australasian Joint Conference on Artificial
Intelligence, Auckland, New Zealand, pages 355-361. Springer, 2008.
[ bib 
http ]
Nearest Neighbour Search (NNS) is one of the top ten data mining algorithms. It is simple and effective but has a time complexity that is the product of the number of instances and the number of dimensions. When the number of dimensions is greater than two there are no known solutions that can guarantee a sublinear retrieval time. This paper describes and evaluates two ways to make NNS efficient for datasets that are arbitrarily large in the number of instances and dimensions. The methods are best described as heuristic, as they are neither exact nor approximate. Both stem from recent developments in the field of data stream classification. The first uses Hoeffding Trees, an extension of decision trees to streams, and the second is a direct stream extension of NNS. The methods are evaluated in terms of their accuracy and the time taken to find the neighbours. Results show that the methods are competitive with NNS in terms of accuracy but significantly faster.
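One way to picture a stream-flavoured nearest-neighbour heuristic of the general kind the abstract alludes to is to bound query cost by searching only a fixed window of recent instances. This is a generic sliding-window sketch, not the authors' algorithm; the class name and parameters are made up.

```python
import math
from collections import deque

class WindowKNN:
    """k-nearest-neighbour classification over a bounded sliding
    window, so each query touches at most `window` stored instances."""

    def __init__(self, k=3, window=1000):
        self.k = k
        self.buffer = deque(maxlen=window)  # oldest instances fall off

    def add(self, point, label):
        self.buffer.append((point, label))

    def predict(self, point):
        # Majority vote among the k closest windowed instances.
        nearest = sorted(self.buffer,
                         key=lambda item: math.dist(point, item[0]))[:self.k]
        votes = {}
        for _, label in nearest:
            votes[label] = votes.get(label, 0) + 1
        return max(votes, key=votes.get)

knn = WindowKNN(k=3, window=100)
for p in [(0.0, 0.0), (0.1, 0.2), (-0.1, 0.1)]:
    knn.add(p, "A")
for p in [(5.0, 5.0), (5.1, 4.8), (4.9, 5.2)]:
    knn.add(p, "B")
```

The `maxlen` deque makes the per-query cost independent of the total stream length, which is the essential trade of exactness for bounded time.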


[9]

Organizing the World's Machine Learning Information
Joaquin Vanschoren, Hendrik Blockeel, Bernhard Pfahringer, and Geoffrey Holmes.
Organizing the world's machine learning information.
In Proc 3rd International Symposium on Leveraging Applications
of Formal Methods, Verification and Validation, Porto Sani, Greece, pages
693-708. Springer, 2008.
[ bib 
http ]
All around the globe, thousands of learning experiments are being executed on a daily basis, only to be discarded after interpretation. Yet, the information contained in these experiments might have uses beyond their original intent and, if properly stored, could be of great use to future research. In this paper, we hope to stimulate the development of such learning experiment repositories by providing a bird's-eye view of how they can be created and used in practice, bringing together existing approaches and new ideas. We draw parallels between how experiments are being curated in other sciences, and consecutively discuss how both the empirical and theoretical details of learning experiments can be expressed, organized and made universally accessible. Finally, we discuss a range of possible services such a resource can offer, either used directly or integrated into data mining tools.


[10]

Adaptive feature thresholding for offline signature verification
Robert Larkins and Michael Mayo.
Adaptive feature thresholding for offline signature verification.
In Proc 23rd International Conference Image and Vision Computing
New Zealand, Christchurch, New Zealand, pages 1-6. IEEE, 2008.
[ bib 
http ]
This paper introduces Adaptive Feature Thresholding (AFT), which is a novel method of person-dependent offline signature verification. AFT enhances how a simple image feature of a signature is converted to a binary feature vector by significantly improving its representation in relation to the training signatures. The similarity between signatures is then easily computed from their corresponding binary feature vectors. AFT was tested on the CEDAR and GPDS benchmark datasets, with classification using either a manual or an automatic variant. On the CEDAR dataset we achieved a classification accuracy of 92% for manual and 90% for automatic, while on the GPDS dataset we achieved over 87% and 85% respectively. For both datasets AFT is less complex and requires fewer image features than the existing state-of-the-art methods, while achieving competitive results.


[11]

Improving face gender classification by adding deliberately misaligned faces to the training data
Michael Mayo and Edmond Zhang.
Improving face gender classification by adding deliberately
misaligned faces to the training data.
In Proc 23rd International Conference Image and Vision Computing
New Zealand, Christchurch, New Zealand, pages 1-5. IEEE, 2008.
[ bib 
http ]
A novel method of face gender classifier construction is proposed and evaluated. Previously, researchers have assumed that a computationally expensive face alignment step (in which the face image is transformed so that facial landmarks such as the eyes, nose, chin, etc., are in uniform locations in the image) is required in order to maximize the accuracy of predictions on new face images. We, however, argue that this step is not necessary, and that machine learning classifiers can be made robust to face misalignments by automatically expanding the training data with examples of faces that have been deliberately misaligned (for example, translated or rotated). To test our hypothesis, we evaluate this automatic training dataset expansion method with two types of image classifier, the first based on weak features such as Local Binary Pattern histograms, and the second based on SIFT keypoints. Using a benchmark face gender classification dataset recently proposed in the literature, we obtain a state-of-the-art accuracy of 92.5%, thus validating our approach.
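The training-set expansion idea can be sketched without any image library by treating an image as a grid of pixel values and adding translated copies that keep the original label. The function names and shift set are illustrative; the paper also uses rotations, which this sketch omits.

```python
def translate(image, dx, dy, fill=0):
    """Shift a 2-D grid (list of rows) by (dx, dy), padding with `fill`."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = image[y][x]
    return out

def expand_with_misalignments(data, shifts=((1, 0), (-1, 0), (0, 1), (0, -1))):
    """Augment (image, label) pairs with deliberately translated copies,
    keeping the original label for each perturbed image."""
    out = list(data)
    for image, label in data:
        for dx, dy in shifts:
            out.append((translate(image, dx, dy), label))
    return out

img = [[1, 2],
       [3, 4]]
expanded = expand_with_misalignments([(img, "female")])
```

Training any standard classifier on `expanded` instead of the originals is the whole trick: the learner sees misaligned variants at training time, so it need not rely on a separate alignment step at prediction time.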


[12]

Pattern discovery for object categorization
Edmond Zhang and Michael Mayo.
Pattern discovery for object categorization.
In Proc 23rd International Conference Image and Vision Computing
New Zealand, Christchurch, New Zealand, pages 1-6. IEEE, 2008.
[ bib 
http ]
This paper presents a new approach for the object categorization problem. Our model is based on the successful `bag of words' approach. However, unlike the original model, image features (keypoints) are not seen as independent and orderless. Instead, our model attempts to discover intermediate representations for each object class. This approach works by partitioning the image into smaller regions then computing the spatial relationships between all of the informative image keypoints in the region. The results show that the inclusion of spatial relationships leads to a measurable increase in performance for two of the most challenging datasets.


[13]

Machine Learning for Adaptive Computer Game Opponents
Jonathan David Miles.
Machine learning for adaptive computer game opponents.
Master's thesis, Department of Computer Science, University of
Waikato, 2008.
[ bib 
http ]
This thesis investigates the use of machine learning techniques in computer games to create a computer player that adapts to its opponent's gameplay. This includes first confirming that machine learning algorithms can be integrated into a modern computer game without having a detrimental effect on game performance, then experimenting with different machine learning techniques to maximize the computer player's performance. Experiments use three machine learning techniques: static prediction models, continuous learning, and reinforcement learning. Static models show the highest initial performance but are not able to beat a simple opponent. Continuous learning is able to improve the performance achieved with static models but the rate of improvement drops over time and the computer player is still unable to beat the opponent. Reinforcement learning methods have the highest rate of improvement but the lowest initial performance. This limits the effectiveness of reinforcement learning because a large number of episodes are required before performance becomes sufficient to match the opponent.


[14]

Random Relational Rules
Grant Anderson.
Random Relational Rules.
PhD thesis, Department of Computer Science, University of Waikato,
2008.
[ bib 
http ]
In the field of machine learning, methods for learning from single-table data have received much more attention than those for learning from multi-table, or relational data, which are generally more computationally complex. However, a significant amount of the world's data is relational. This indicates a need for algorithms that can operate efficiently on relational data and exploit the larger body of work produced in the area of single-table techniques. This thesis presents algorithms for learning from relational data that mitigate, to some extent, the complexity normally associated with such learning. All algorithms in this thesis are based on the generation of random relational rules. The assumption is that random rules enable efficient and effective relational learning, and this thesis presents evidence that this is indeed the case. To this end, a system for generating random relational rules is described, and algorithms using these rules are evaluated. These algorithms include direct classification, classification by propositionalisation, clustering, semi-supervised learning and generating random forests. The experimental results show that these algorithms perform competitively with previously published results for the datasets used, while often exhibiting lower runtime than other tested systems. This demonstrates that sufficient information for classification and clustering is retained in the rule generation process and that learning with random rules is efficient. Further applications of random rules are investigated. Propositionalisation allows single-table algorithms for classification and clustering to be applied to the resulting data, reducing the amount of relational processing required. Further results show that techniques for utilising additional unlabeled training data improve accuracy of classification in the semi-supervised setting.
The thesis also develops a novel algorithm for building random forests by making efficient use of random rules to generate trees and leaves in parallel.


[15]

Experiment Databases: Creating a New Platform for Meta-Learning Research
Joaquin Vanschoren, Hendrik Blockeel, Bernhard Pfahringer, and Geoffrey Holmes.
Experiment databases: Creating a new platform for meta-learning
research.
In Proc ICML/COLT/UAI 2008 Planning to Learn Workshop,
Helsinki, Finland. University of Porto, 2008.
[ bib 
http ]


[16]

One-class Classification by Combining Density and Class Probability Estimation
Kathryn Hempstalk, Eibe Frank, and Ian H. Witten.
One-class classification by combining density and class probability
estimation.
In Proc 12th European Conference on Principles and Practice of
Knowledge Discovery in Databases and 19th European Conference on Machine
Learning, Antwerp, Belgium. Springer, 2008.
[ bib 
.pdf ]
One-class classification has important applications such as outlier and novelty detection. It is commonly tackled using density estimation techniques or by adapting a standard classification algorithm to the problem of carving out a decision boundary that describes the location of the target data. In this paper we investigate a simple method for one-class classification that combines the application of a density estimator, used to form a reference distribution, with the induction of a standard model for class probability estimation. In this method, the reference distribution is used to generate artificial data that is employed to form a second, artificial class. In conjunction with the target class, this artificial class is the basis for a standard two-class learning problem. We explain how the density function of the reference distribution can be combined with the class probability estimates obtained in this way to form an adjusted estimate of the density function of the target class. Using UCI datasets, and data from a typist recognition problem, we show that the combined model, consisting of both a density estimator and a class probability estimator, can improve on using either component technique alone when used for one-class classification. We also compare the method to one-class classification using support vector machines.
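A compact sketch of the combination the abstract describes, using a broadened Gaussian as the reference density and a crude histogram as the class probability estimator (both choices, and all names, are illustrative simplifications, not the paper's setup). The adjusted target density follows the usual Bayes rearrangement P(x|T) = (P(A)/P(T)) * P(T|x)/(1 - P(T|x)) * P(x|A), where A is the artificial class.

```python
import math
import random
import statistics

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def one_class_scorer(target, n_art=2000, bins=20, seed=1):
    """Build a target-density scorer: sample an artificial class from a
    broadened Gaussian reference fitted to the target data, estimate
    P(T|x) with a histogram, then fold the reference density back in."""
    rng = random.Random(seed)
    mu = statistics.fmean(target)
    ref_sigma = 2 * statistics.stdev(target)    # broaden so the reference covers the target
    art = [rng.gauss(mu, ref_sigma) for _ in range(n_art)]
    lo, hi = min(min(target), min(art)), max(max(target), max(art))
    width = (hi - lo) / bins
    t_hist, a_hist = [0] * bins, [0] * bins
    def cell(x):
        return min(bins - 1, max(0, int((x - lo) / width)))
    for x in target:
        t_hist[cell(x)] += 1
    for x in art:
        a_hist[cell(x)] += 1
    p_t = len(target) / (len(target) + n_art)   # class prior P(T)
    def score(x):
        i = cell(x)
        # Laplace-smoothed class probability estimate P(T|x)
        pt_x = (t_hist[i] + 1) / (t_hist[i] + a_hist[i] + 2)
        pt_x = min(pt_x, 0.999)                 # guard against division by zero
        return ((1 - p_t) / p_t) * (pt_x / (1 - pt_x)) * gauss_pdf(x, mu, ref_sigma)
    return score

rng = random.Random(0)
score = one_class_scorer([rng.gauss(0, 1) for _ in range(500)])
```

Points near the target cluster should score far higher than points the target data never visits, which is what makes the combined score usable for one-class decisions via a threshold.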


[17]

Combining Naive Bayes and Decision Tables
Mark Hall and Eibe Frank.
Combining naive Bayes and decision tables.
In Proc 21st Florida Artificial Intelligence Research Society
Conference, Miami, Florida. AAAI Press, 2008.
[ bib 
.pdf ]
We investigate a simple semi-naive Bayesian ranking method that combines naive Bayes with induction of decision tables. Naive Bayes and decision tables can both be trained efficiently, and the same holds true for the combined semi-naive model. We show that the resulting ranker, compared to either component technique, frequently significantly increases AUC. For some datasets it significantly improves on both techniques. This is also the case when attribute selection is performed in naive Bayes and its semi-naive variant.
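A toy version of such a combination on nominal data can be sketched as a decision table over a chosen attribute subset whose cell posterior is multiplied by naive Bayes likelihoods for the remaining attributes. This is a hypothetical simplification for illustration, not the paper's exact scheme; the attribute split, smoothing constants, and names are all made up.

```python
from collections import Counter, defaultdict

def train_dtnb(rows, labels, table_attrs):
    """rows: list of dicts of nominal attribute values.
    Returns a predict function giving class probability estimates."""
    classes = sorted(set(labels))
    prior = Counter(labels)
    table = defaultdict(Counter)   # decision-table cell -> class counts
    cond = defaultdict(Counter)    # (attr, value) -> class counts (for NB)
    nb_attrs = [a for a in rows[0] if a not in table_attrs]
    for row, y in zip(rows, labels):
        table[tuple(row[a] for a in table_attrs)][y] += 1
        for a in nb_attrs:
            cond[(a, row[a])][y] += 1

    def predict(row):
        cell = table[tuple(row[a] for a in table_attrs)]
        scores = {}
        for y in classes:
            # Decision-table posterior for this cell (Laplace-smoothed);
            # empty cells fall back to the class prior.
            if sum(cell.values()):
                s = (cell[y] + 1) / (sum(cell.values()) + len(classes))
            else:
                s = prior[y] / len(labels)
            # Naive Bayes likelihoods for the remaining attributes
            # (smoothing constant 2 assumes binary-valued attributes).
            for a in nb_attrs:
                s *= (cond[(a, row[a])][y] + 1) / (prior[y] + 2)
            scores[y] = s
        total = sum(scores.values())
        return {y: s / total for y, s in scores.items()}
    return predict

rows = [{"a": "x", "b": "p"}, {"a": "x", "b": "q"},
        {"a": "y", "b": "p"}, {"a": "y", "b": "q"}]
labels = ["0", "0", "1", "1"]
predict = train_dtnb(rows, labels, table_attrs=("a",))
probs = predict({"a": "x", "b": "p"})
```

Because the output is a normalized probability per class, the model can be used as a ranker, which is how the paper evaluates the combination (via AUC).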


[18]

Propositionalisation of multiple sequence alignments using probabilistic models
Stefan Mutter, Bernhard Pfahringer, and Geoffrey Holmes.
Propositionalisation of multiple sequence alignments using
probabilistic models.
In New Zealand Computer Science Research Student Conference
(NZCSRSC 2008), Christchurch, New Zealand, pages 234237, April 2008.
[ bib 
.pdf ]
Multiple sequence alignments play a central role in Bioinformatics. Most alignment representations are designed to facilitate knowledge extraction by human experts. Additionally, statistical models like Profile Hidden Markov Models are used as representations. They offer the advantage of providing sound, probabilistic scores. The basic idea we present in this paper is to use the structure of a Profile Hidden Markov Model for propositionalisation. This way we get a simple, extendable representation of multiple sequence alignments which facilitates further analysis by machine learning algorithms.


[19]

Exploiting propositionalization based on random relational rules for semi-supervised learning
Bernhard Pfahringer and Grant Anderson.
Exploiting propositionalization based on random relational rules for
semi-supervised learning.
In Proc 12th PacificAsia Conference on Knowledge Discovery and
Data Mining, Osaka, Japan. Springer, 2008.
[ bib 
http ]
In this paper we investigate an approach to semi-supervised learning based on randomized propositionalization, which allows for applying standard propositional classification algorithms like support vector machines to multi-relational data. Randomization based on random relational rules can work both with and without a class attribute and can therefore be applied simultaneously to both the labeled and the unlabeled portion of the data present in semi-supervised learning. An empirical investigation compares semi-supervised propositionalization to standard propositionalization using just the labeled data portion, as well as to a variant that also just uses the labeled data portion but includes the label information in an attempt to improve the resulting propositionalization. Preliminary experimental results indicate that propositionalization generated on the full dataset, i.e. the semi-supervised approach, tends to outperform the other two more standard approaches.


[20]

Handling Numeric Attributes in Hoeffding Trees
Bernhard Pfahringer, Geoffrey Holmes, and Richard Kirkby.
Handling numeric attributes in Hoeffding trees.
In Proc 12th PacificAsia Conference on Knowledge Discovery and
Data Mining, Osaka, Japan, pages 296307. Springer, 2008.
[ bib 
http ]
For conventional machine learning classification algorithms handling numeric attributes is relatively straightforward. Unsupervised and supervised solutions exist that either segment the data into predefined bins or sort the data and search for the best split points. Unfortunately, none of these solutions carry over particularly well to a data stream environment. Solutions for data streams have been proposed by several authors but as yet none have been compared empirically. In this paper we investigate a range of methods for multi-class tree-based classification where the handling of numeric attributes takes place as the tree is constructed. To this end, we extend an existing approximation approach, based on simple Gaussian approximation. We then compare this method with four approaches from the literature, arriving at eight final algorithm configurations for testing. The solutions cover a range of options from perfectly accurate and memory-intensive to highly approximate. All methods are tested using the Hoeffding tree classification algorithm. Surprisingly, the experimental comparison shows that the most approximate methods produce the most accurate trees by allowing for faster tree growth.
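The Gaussian approximation idea can be illustrated with the kind of incremental summary a stream learner keeps per class and attribute instead of storing values: running mean and variance (a generic Welford-style sketch, not the exact structure from the paper), from which the fraction of a class falling below any candidate split point is estimated via the normal CDF.

```python
import math

class GaussianStats:
    """Incremental mean/variance (Welford's algorithm): the constant-memory
    per-class, per-attribute summary a Hoeffding tree can keep instead of
    storing every streamed value."""

    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        return math.sqrt(self.m2 / self.n) if self.n > 1 else 0.0

    def frac_below(self, split):
        """Estimated fraction of this class's values below `split`,
        via the Gaussian CDF; this is what feeds the split evaluation."""
        s = self.std()
        if s == 0:
            return 1.0 if self.mean <= split else 0.0
        return 0.5 * (1 + math.erf((split - self.mean) / (s * math.sqrt(2))))

# Two well-separated classes observed from a stream:
low, high = GaussianStats(), GaussianStats()
for v in (0.0, 1.0, -1.0, 0.5, -0.5, 0.2):
    low.add(v)
for v in (10.0, 9.5, 10.5, 9.8, 10.2, 10.1):
    high.add(v)
```

Evaluating a candidate split then reduces to combining `frac_below` across classes to estimate the class distribution on each side, without ever revisiting the stream.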


[21]

Learning Instance Weights in Multi-Instance Learning
James Foulds.
Learning instance weights in multi-instance learning.
Master's thesis, Department of Computer Science, University of
Waikato, 2008.
[ bib 
http ]
Multi-instance (MI) learning is a variant of supervised machine learning, where each learning example contains a bag of instances instead of just a single feature vector. MI learning has applications in areas such as drug activity prediction, fruit disease management and image classification. This thesis investigates the case where each instance has a weight value determining the level of influence that it has on its bag's class label. This is a more general assumption than most existing approaches use, and thus is more widely applicable. The challenge is to accurately estimate these weights in order to make predictions at the bag level. An existing approach known as MILES is retroactively identified as an algorithm that uses instance weights for MI learning, and is evaluated using a variety of base learners on benchmark problems. New algorithms for learning instance weights for MI learning are also proposed and rigorously evaluated on both artificial and real-world datasets. The new algorithms are shown to achieve better root mean squared error rates than existing approaches on artificial data generated according to the underlying assumptions. Experimental results also demonstrate that the new algorithms are competitive with existing approaches on real-world problems.

