[1]

Integrated Instance and Class-based Generative Modeling for Text Classification
Antti Puurula and Sung-Hyon Myaeng.
Integrated instance and class-based generative modeling for text
classification.
In Proc 18th Australasian Document Computing Symposium, pages
66-73. ACM, 2013.
[ bib 
.pdf ]
Statistical methods for text classification are predominantly based on the paradigm of class-based learning that associates class variables with features, discarding the instances of data after model training. This results in efficient models, but neglects the fine-grained information present in individual documents. Instance-based learning uses this information, but suffers from data sparsity with text data. In this paper, we propose a generative model called Tied Document Mixture (TDM) for extending Multinomial Naive Bayes (MNB) with mixtures of hierarchically smoothed models for documents. Alternatively, TDM can be viewed as a Kernel Density Classifier using class-smoothed Multinomial kernels. TDM is evaluated for classification accuracy on 14 different datasets for multi-label, multi-class and binary-class text classification tasks and compared to instance- and class-based learning baselines. The comparisons to MNB demonstrate a substantial improvement in accuracy as a function of available training documents per class, ranging up to average error reductions of over 26% in sentiment classification and 65% in spam classification. On average TDM is as accurate as the best discriminative classifiers, but retains the linear time complexities of instance-based learning methods, with exact algorithms for both model estimation and inference.
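The contrast between the class-based and instance-based views can be sketched in a few lines. The snippet below is an illustrative simplification, not the paper's exact estimator: TDM's hierarchical smoothing is replaced by a single interpolation weight `lam`, and all function and variable names are hypothetical. It scores test documents under plain MNB and under a mixture of per-document multinomial kernels smoothed toward their class model.

```python
import numpy as np

def mnb_log_scores(x, class_counts, alpha=1.0):
    # Class-based model: one Laplace-smoothed multinomial per class.
    probs = (class_counts + alpha) / (class_counts + alpha).sum(axis=1, keepdims=True)
    return x @ np.log(probs).T

def tdm_log_scores(x, doc_counts, doc_labels, class_counts, lam=0.5, alpha=1.0):
    # Instance-based view: each training document is a multinomial kernel,
    # smoothed toward its class model (a stand-in for TDM's hierarchical
    # smoothing); the class score is a log-average over its documents.
    class_probs = (class_counts + alpha) / (class_counts + alpha).sum(axis=1, keepdims=True)
    n_classes = class_counts.shape[0]
    scores = np.full((x.shape[0], n_classes), -np.inf)
    for c in range(n_classes):
        docs = doc_counts[doc_labels == c]
        doc_probs = lam * docs / docs.sum(axis=1, keepdims=True) + (1 - lam) * class_probs[c]
        log_lik = x @ np.log(doc_probs).T        # shape: (n_test, n_docs_in_class)
        scores[:, c] = np.logaddexp.reduce(log_lik, axis=1) - np.log(len(docs))
    return scores
```

Smoothing toward the class model keeps every kernel's probabilities strictly positive, which is what lets the instance-based mixture survive the sparsity of individual documents.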


[2]

Cumulative Progress in Language Models for Information Retrieval
Antti Puurula.
Cumulative progress in language models for information retrieval.
In Proc 11th Australasian Language Technology Workshop,
Brisbane, Australia, pages 96-100. ACL, 2013.
[ bib 
.pdf ]
The improvements to ad-hoc IR systems over the last decades have been recently criticized as illusionary and based on incorrect baseline comparisons. In this paper several improvements to the LM approach to IR are combined and evaluated: Pitman-Yor Process smoothing, TF-IDF feature weighting and model-based feedback. The increases in ranking quality are significant and cumulative over the standard baselines of Dirichlet Prior and 2-stage Smoothing, when evaluated across 13 standard ad-hoc retrieval datasets. The combination of the improvements is shown to improve the Mean Average Precision over the datasets by 17.1% relative. Furthermore, the considered improvements can be easily implemented with little additional computation to existing LM retrieval systems. On the basis of the results it is suggested that LM research for IR should move towards using stronger baseline models.


[3]

Online Estimation of Discrete Densities
Michael Geilke, Eibe Frank, Andreas Karwath, and Stefan Kramer.
Online estimation of discrete densities.
In Proc 13th IEEE International Conference on Data Mining,
Dallas, Texas. IEEE, 2013.
[ bib 
.pdf ]
We address the problem of estimating a discrete joint density online, that is, the algorithm is only provided the current example and its current estimate. The proposed online estimator of discrete densities, EDDO (Estimation of Discrete Densities Online), uses classifier chains to model dependencies among features. Each classifier in the chain estimates the probability of one particular feature. Because a single chain may not provide a reliable estimate, we also consider ensembles of classifier chains and ensembles of weighted classifier chains. For all density estimators, we provide consistency proofs and propose algorithms to perform certain inference tasks. The empirical evaluation of the estimators is conducted in several experiments and on data sets of up to several million instances: We compare them to density estimates computed from Bayesian structure learners, evaluate them under the influence of noise, measure their ability to deal with concept drift, and measure the runtime performance. Our experiments demonstrate that, even though designed to work online, EDDO delivers estimators of competitive accuracy compared to batch Bayesian structure learners and batch variants of EDDO.
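The chain-rule decomposition that classifier chains exploit can be sketched directly: P(x) = Π_i P(x_i | x_1..x_{i-1}), with one predictor per feature. Below, each per-feature "classifier" is just a smoothed count table conditioned on the preceding features, a hypothetical stand-in for EDDO's online learners, but the online update and inference pattern is the same.

```python
from collections import defaultdict

class ChainDensity:
    """Chain-rule density estimate P(x) = prod_i P(x_i | x_1..x_{i-1}).

    Each conditional is a Laplace-smoothed count table keyed by the
    prefix of earlier feature values (an illustrative stand-in for the
    per-feature classifiers EDDO trains online)."""

    def __init__(self, n_features, n_values, alpha=1.0):
        self.n_values = n_values
        self.alpha = alpha
        self.tables = [defaultdict(lambda: [0.0] * n_values) for _ in range(n_features)]

    def update(self, x):
        # Online: process one example at a time, no data retained.
        for i, v in enumerate(x):
            self.tables[i][tuple(x[:i])][v] += 1

    def prob(self, x):
        p = 1.0
        for i, v in enumerate(x):
            counts = self.tables[i][tuple(x[:i])]
            p *= (counts[v] + self.alpha) / (sum(counts) + self.alpha * self.n_values)
        return p
```

Because every conditional normalizes over the feature's values, the estimate is a proper distribution at any point in the stream.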


[4]

Propositionalisation of Multi-instance Data using Random Forests
Eibe Frank and Bernhard Pfahringer.
Propositionalisation of multi-instance data using random forests.
In Proc 26th Australasian Conference on Artificial
Intelligence, Otago, New Zealand, pages 362-373. Springer, 2013.
[ bib 
.pdf ]
Multi-instance learning is a generalisation of attribute-value learning where examples for learning consist of labeled bags (i.e. multisets) of instances. This learning setting is more computationally challenging than attribute-value learning and a natural fit for important application areas of machine learning such as classification of molecules and image classification. One approach to solve multi-instance learning problems is to apply propositionalisation, where bags of data are converted into vectors of attribute-value pairs so that a standard propositional (i.e. attribute-value) learning algorithm can be applied. This approach is attractive because of the large number of propositional learning algorithms that have been developed and can thus be applied to the propositionalised data. In this paper, we empirically investigate a variant of an existing propositionalisation method called TLC. TLC uses a single decision tree to obtain propositionalised data. Our variant applies a random forest instead and is motivated by the potential increase in robustness that this may yield. We present results on synthetic and real-world data from the above two application domains showing that it indeed yields increased classification accuracy when applying boosting and support vector machines to classify the propositionalised data.


[5]

Random Projections as Regularizers: Learning a Linear Discriminant Ensemble from Fewer Observations than Dimensions
Robert J. Durrant and Ata Kabán.
Random projections as regularizers: Learning a linear discriminant
ensemble from fewer observations than dimensions.
In Proc 5th Asian Conference on Machine Learning, Canberra,
Australia. JMLR, 2013.
[ bib 
.pdf ]
We examine the performance of an ensemble of randomly-projected Fisher Linear Discriminant classifiers, focusing on the case when there are fewer training observations than data dimensions. Our ensemble is learned from a sequence of randomly-projected representations of the original high-dimensional data and therefore for this approach data can be collected, stored and processed in such a compressed form. The specific form and simplicity of this ensemble permits a direct and much more detailed analysis than existing generic tools in previous works. In particular, we are able to derive the exact form of the generalization error of our ensemble, conditional on the training set, and based on this we give theoretical guarantees which directly link the performance of the ensemble to that of the corresponding linear discriminant learned in the full data space. To the best of our knowledge these are the first theoretical results to prove such an explicit link for any classifier and classifier ensemble pair. Furthermore we show that the randomly-projected ensemble is equivalent to implementing a sophisticated regularization scheme to the linear discriminant learned in the original data space and this prevents overfitting in conditions of small sample size where pseudo-inverse FLD learned in the data space is provably poor. We confirm theoretical findings with experiments, and demonstrate the utility of our approach on several datasets from the bioinformatics domain where fewer observations than dimensions are the norm.
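A minimal sketch of the ensemble construction, illustrative only: equal class priors, a pooled covariance estimate, and majority voting are assumptions here, not the paper's exact aggregation.

```python
import numpy as np

def rp_fld_ensemble(X, y, n_members=20, k=5, seed=0):
    """Ensemble of Fisher linear discriminants, each trained on a
    Gaussian random projection of the data into k dimensions."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    mu0, mu1 = X[y == 0].mean(0), X[y == 1].mean(0)
    members = []
    for _ in range(n_members):
        R = rng.standard_normal((k, d)) / np.sqrt(k)   # random projection
        Z = X @ R.T
        m0, m1 = R @ mu0, R @ mu1
        # Pooled within-class covariance estimated in the projected space.
        S = np.cov(np.vstack([Z[y == 0] - m0, Z[y == 1] - m1]).T)
        w = np.linalg.pinv(S) @ (m1 - m0)              # FLD direction in k dims
        b = -w @ (m0 + m1) / 2.0
        members.append((R, w, b))

    def predict(Xt):
        votes = np.mean([np.sign((Xt @ R.T) @ w + b) for R, w, b in members], axis=0)
        return (votes > 0).astype(int)

    return predict
```

Each member only ever sees the k-dimensional projections, which is what allows the data to be stored in compressed form, and the k x k covariance is well-conditioned even when n < d.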


[6]

Dimension-Adaptive Bounds on Compressive FLD Classification
Ata Kabán and Robert J. Durrant.
Dimension-adaptive bounds on compressive FLD classification.
In Proc 24th International Conference on Algorithmic Learning
Theory, Singapore, pages 294-308, 2013.
[ bib 
http 
.pdf ]
Efficient dimensionality reduction by random projections (RP) is gaining popularity, hence the learning guarantees achievable in RP spaces are of great interest. In the finite-dimensional setting, it has been shown for the compressive Fisher Linear Discriminant (FLD) classifier that for good generalisation the required target dimension grows only as the log of the number of classes and is not adversely affected by the number of projected data points. However, these bounds depend on the dimensionality d of the original data space. In this paper we give further guarantees that remove d from the bounds under certain conditions of regularity on the data density structure. In particular, if the data density does not fill the ambient space then the error of compressive FLD is independent of the ambient dimension and depends only on a notion of ‘intrinsic dimension’.


[7]

Clustering Based Active Learning for Evolving Data Streams
Dino Ienco, Albert Bifet, Indre Zliobaite, and Bernhard Pfahringer.
Clustering based active learning for evolving data streams.
In Proc 16th International Conference on Discovery Science,
Singapore, pages 79-93, 2013.
[ bib 
http ]
Data labeling is an expensive and time-consuming task. Choosing which labels to use is becoming increasingly important. In the active learning setting, a classifier is trained by asking for labels for only a small fraction of all instances. While many works deal with this issue in non-streaming scenarios, few exist for the data stream setting. In this paper we propose a new active learning approach for evolving data streams based on a pre-clustering step that selects the most informative instances for labeling. We consider a batch-incremental setting: when a new batch arrives, we first cluster the examples and then select the best instances to train the learner. The clustering approach allows us to cover the whole data space and avoid oversampling examples from only a few areas. We compare our method with state-of-the-art active learning strategies over real datasets. The results highlight the improvement in performance of our proposal. Experiments on parameter sensitivity are also reported.


[8]

Pairwise meta-rules for better meta-learning-based algorithm
ranking
Quan Sun and Bernhard Pfahringer.
Pairwise meta-rules for better meta-learning-based algorithm ranking.
Machine Learning, 93(1):141-161, 2013.
[ bib 
http ]
In this paper, we present a novel meta-feature generation method in the context of meta-learning, which is based on rules that compare the performance of individual base learners in a one-against-one manner. In addition to these new meta-features, we also introduce a new meta-learner called Approximate Ranking Tree Forests (ART Forests) that performs very competitively when compared with several state-of-the-art meta-learners. Our experimental results are based on a large collection of datasets and show that the proposed new techniques can improve the overall performance of meta-learning for algorithm ranking significantly. A key point in our approach is that each performance figure of any base learner for any specific dataset is generated by optimising the parameters of the base learner separately for each dataset.
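The one-against-one comparison can be illustrated in a few lines: for each pair of base learners, emit a binary feature recording which one performed better on the dataset. The names and the simple win/loss encoding below are illustrative, not the paper's exact rule representation.

```python
def pairwise_metafeatures(perf):
    """Build pairwise meta-features from a dict mapping base-learner
    names to a performance score on one dataset: for every unordered
    pair (a, b), record whether a outperformed b."""
    names = sorted(perf)
    return {f"{a}>{b}": int(perf[a] > perf[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
```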


[9]

Pitfalls in Benchmarking Data Stream Classification and
How to Avoid Them
Albert Bifet, Jesse Read, Indre Zliobaite, Bernhard Pfahringer, and Geoff
Holmes.
Pitfalls in benchmarking data stream classification and how to avoid
them.
In Proc European Conference on Machine Learning and Knowledge
Discovery in Databases, Prague, Czech Republic, pages 465-479, 2013.
[ bib 
http ]
Data stream classification plays an important role in modern data analysis, where data arrives in a stream and needs to be mined in real time. In the data stream setting the underlying distribution from which this data comes may be changing and evolving, and so classifiers that can update themselves during operation are becoming the state-of-the-art. In this paper we show that data streams may have an important temporal component, which currently is not considered in the evaluation and benchmarking of data stream classifiers. We demonstrate how a naive classifier considering the temporal component only outperforms many current state-of-the-art classifiers on real data streams that have temporal dependence, i.e. data is autocorrelated. We propose to evaluate data stream classifiers taking into account temporal dependence, and introduce a new evaluation measure, which provides a more accurate gauge of data stream classifier performance. In response to the temporal dependence issue we propose a generic wrapper for data stream classifiers, which incorporates the temporal component into the attribute space.
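The naive classifier considering only the temporal component amounts to predicting that the next label repeats the current one. A quick sketch shows why it can look deceptively strong on autocorrelated streams:

```python
from collections import Counter

def no_change_accuracy(labels):
    """Prequential accuracy of the 'no-change' baseline: always predict
    that the next label equals the current one."""
    hits = sum(1 for prev, cur in zip(labels, labels[1:]) if prev == cur)
    return hits / (len(labels) - 1)

def majority_accuracy(labels):
    """Accuracy of always predicting the majority class."""
    return Counter(labels).most_common(1)[0][1] / len(labels)
```

On a stream whose labels rarely change, the no-change baseline scores near 100% while a class-balance baseline sits at 50%, so any classifier must be compared against the former to make a fair claim.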


[10]

SMOTE for Regression
Luís Torgo, Rita P. Ribeiro, Bernhard Pfahringer, and Paula Branco.
SMOTE for regression.
In Proc 16th Portuguese Conference on Artificial Intelligence,
Angra do Heroísmo, Azores, Portugal, pages 378-389, 2013.
[ bib 
http ]
Several real-world prediction problems involve forecasting rare values of a target variable. When this variable is nominal, we have a problem of class imbalance that has already been studied thoroughly within machine learning. For regression tasks, where the target variable is continuous, few works exist addressing this type of problem. Still, important application areas involve forecasting rare extreme values of a continuous target variable. This paper describes a contribution to this type of task. Namely, we propose to address such tasks by sampling approaches. These approaches change the distribution of the given training data set to decrease the problem of imbalance between the rare target cases and the most frequent ones. We present a modification of the well-known SMOTE algorithm that allows its use on these regression tasks. In an extensive set of experiments we provide empirical evidence for the superiority of our proposals for these particular regression tasks. The proposed SmoteR method can be used with any existing regression algorithm, turning it into a general tool for addressing problems of forecasting rare extreme values of a continuous target variable.
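The core SmoteR step, generating synthetic rare cases by interpolating both the features and the continuous target between a rare example and one of its rare-case neighbours, can be sketched as follows. This is a simplification: the actual SmoteR also under-samples frequent cases and weights the synthetic target by the distances to the two seed cases.

```python
import random

def smoter_synthetic(rare_X, rare_y, n_new, k=3, seed=0):
    """Generate n_new synthetic rare cases by linear interpolation
    between a rare case and one of its k nearest rare neighbours,
    interpolating the continuous target with the same coefficient."""
    rng = random.Random(seed)

    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    new_X, new_y = [], []
    for _ in range(n_new):
        i = rng.randrange(len(rare_X))
        neighbours = sorted((j for j in range(len(rare_X)) if j != i),
                            key=lambda j: dist(rare_X[i], rare_X[j]))[:k]
        j = rng.choice(neighbours)
        t = rng.random()   # interpolation coefficient in [0, 1)
        new_X.append([a + t * (b - a) for a, b in zip(rare_X[i], rare_X[j])])
        new_y.append(rare_y[i] + t * (rare_y[j] - rare_y[i]))
    return new_X, new_y
```

Because the target is interpolated along with the features, every synthetic case stays inside the range of the observed rare values rather than inventing new extremes.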


[11]

Applying additive logistic regression to data derived from sensors monitoring behavioral and physiological characteristics of dairy cows to detect lameness
Claudia Kamphuis, Eibe Frank, Jennie K. Burke, Gwyn Verkerk, and Jenny Jago.
Applying additive logistic regression to data derived from sensors
monitoring behavioral and physiological characteristics of dairy cows to
detect lameness.
Journal of Dairy Science, 2013.
[ bib 
http ]
The hypothesis was that sensors currently available on farm that monitor behavioral and physiological characteristics have potential for the detection of lameness in dairy cows. This was tested by applying additive logistic regression to variables derived from sensor data. Data were collected between November 2010 and June 2012 on 5 commercial pasture-based dairy farms. Sensor data from weigh scales (liveweight), pedometers (activity), and milk meters (milking order, unadjusted and adjusted milk yield in the first 2 min of milking, total milk yield, and milking duration) were collected at every milking from 4,904 cows. Lameness events were recorded by farmers who were trained in detecting lameness before the study commenced. A total of 318 lameness events affecting 292 cows were available for statistical analyses. For each lameness event, the lame cow's sensor data for a time period of 14 d before observation date were randomly matched by farm and date to 10 healthy cows (i.e., cows that were not lame and had no other health event recorded for the matched time period). Sensor data relating to the 14-d time periods were used for developing univariable (using one source of sensor data) and multivariable (using multiple sources of sensor data) models. Model development involved the use of additive logistic regression by applying the LogitBoost algorithm with a regression tree as base learner. The model's output was a probability estimate for lameness, given the sensor data collected during the 14-d time period. Models were validated using leave-one-farm-out cross-validation and, as a result of this validation, each cow in the data set (318 lame and 3,180 non-lame cows) received a probability estimate for lameness. Based on the area under the curve (AUC), results indicated that univariable models had low predictive potential, with the highest AUC values found for liveweight (AUC = 0.66), activity (AUC = 0.60), and milking order (AUC = 0.65). Combining these 3 sensors improved AUC to 0.74. Detection performance of this combined model varied between farms but it consistently and significantly outperformed univariable models across farms at a fixed specificity of 80%. Still, detection performance was not high enough to be implemented in practice on large, pasture-based dairy farms. Future research may improve performance by developing variables based on sensor data of liveweight, activity, and milking order, but that better describe changes in sensor data patterns when cows go lame.


[12]

Towards a Framework for Designing Full Model Selection and
Optimization Systems
Quan Sun, Bernhard Pfahringer, and Michael Mayo.
Towards a framework for designing full model selection and
optimization systems.
In Proc 11th International Workshop on Multiple Classifier
Systems, Nanjing, China, pages 259-270. Springer, 2013.
[ bib 
http ]
People from a variety of industrial domains are beginning to realise that appropriate use of machine learning techniques for their data mining projects could bring great benefits. End-users now face the new problem of how to choose a combination of data processing tools and algorithms for a given dataset. This problem is usually termed the Full Model Selection (FMS) problem. Extending our previous work [10], in this paper we introduce a framework for designing FMS algorithms. Under this framework, we propose a novel algorithm combining genetic algorithms (GA) and particle swarm optimization (PSO), named GPS (for GA-PSO-FMS), in which a GA is used to search for the optimal structure of a data mining solution, and PSO is used to search for optimal parameters for a particular structure instance. Given a classification dataset, GPS outputs an FMS solution as a directed acyclic graph consisting of diverse data mining operators that are available to the problem. Experimental results demonstrate the benefit of the algorithm. We also present, with detailed analysis, two model-tree-based variants for speeding up the GPS algorithm.


[13]

Predicting Regression Test Failures Using Genetic Algorithm-Selected
Dynamic Performance Analysis Metrics
Michael Mayo and Simon A. Spacey.
Predicting regression test failures using genetic algorithm-selected
dynamic performance analysis metrics.
In Proc 5th International Symposium on Search Based Software
Engineering, St. Petersburg, Russia, pages 158-171, 2013.
[ bib 
http ]
A novel framework for predicting regression test failures is proposed. The basic principle embodied in the framework is to use performance analysis tools to capture the runtime behaviour of a program as it executes each test in a regression suite. The performance information is then used to build a dynamically predictive model of test outcomes. Our framework is evaluated using a genetic algorithm for dynamic metric selection in combination with state-of-the-art machine learning classifiers. We show that if a program is modified and some tests subsequently fail, then it is possible to predict with considerable accuracy which of the remaining tests will also fail, which can be used to help prioritise tests in time-constrained testing environments.


[14]

Automatic construction of lexicons, taxonomies, ontologies,
and other knowledge structures
Olena Medelyan, Ian H. Witten, Anna Divoli, and Jeen Broekstra.
Automatic construction of lexicons, taxonomies, ontologies, and other
knowledge structures.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge
Discovery, 3(4):257-279, 2013.
[ bib 
http ]
Abstract, structured representations of knowledge such as lexicons, taxonomies, and ontologies have proven to be powerful resources, not only for the systematization of knowledge in general but also to support practical technologies of document organization, information retrieval, natural language understanding, and question-answering systems. These resources are extremely time-consuming for people to create and maintain, yet demand for them is growing, particularly in specialized areas ranging from legacy documents of large enterprises to rapidly changing domains such as current affairs and celebrity news. Consequently, researchers are investigating methods of creating such structures automatically from document collections, calling on the proliferation of interlinked resources already available on the web for background knowledge and general information about the world. This review surveys what is possible, and also outlines current research directions.


[15]

Towards large scale continuous EDA: a random matrix theory
perspective
Ata Kabán, Jakramate Bootkrajang, and Robert John Durrant.
Towards large scale continuous EDA: a random matrix theory
perspective.
In Proc Genetic and Evolutionary Computation Conference,
Amsterdam, The Netherlands, pages 383-390, 2013.
[ bib 
http 
.pdf ]
Estimation of distribution algorithms (EDA) are a major branch of evolutionary algorithms (EA) with some unique advantages in principle. They are able to take advantage of correlation structure to drive the search more efficiently, and they are able to provide insights about the structure of the search space. However, model building in high dimensions is extremely challenging, and as a result existing EDAs lose their strengths in large scale problems.
Large scale continuous global optimisation is key to many real-world problems of today. Scaling up EAs to large scale problems has become one of the biggest challenges of the field.
This paper pins down some fundamental roots of the problem and makes a start at developing a new and generic framework to yield effective EDA-type algorithms for large scale continuous global optimisation problems. Our concept is to introduce an ensemble of random projections of the set of fittest search points to low dimensions as a basis for developing a new and generic divide-and-conquer methodology. This is rooted in the theory of random projections developed in theoretical computer science, and will exploit recent advances of non-asymptotic random matrix theory.


[16]

Sharp Generalization Error Bounds for Randomly-projected Classifiers
Robert J. Durrant and Ata Kabán.
Sharp generalization error bounds for randomly-projected classifiers.
In Proc 30th International Conference on Machine Learning,
Atlanta, Georgia, pages 693-701. JMLR, 2013.
[ bib 
.pdf ]
We derive sharp bounds on the generalization error of a generic linear classifier trained by empirical risk minimization on randomly-projected data. We make no restrictive assumptions (such as sparsity or separability) on the data: Instead we use the fact that, in a classification setting, the question of interest is really ‘what is the effect of random projection on the predicted class labels?’ and we therefore derive the exact probability of ‘label flipping’ under Gaussian random projection in order to quantify this effect precisely in our bounds.
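The 'label flipping' event can be simulated directly: project both the weight vector and a test point with the same Gaussian matrix and check whether the sign of the decision value changes. The paper derives this probability exactly; the Monte Carlo estimate below is only an illustration, and the function name is hypothetical.

```python
import numpy as np

def flip_probability(w, x, k, n_trials=2000, seed=0):
    """Estimate the probability that a Gaussian random projection R into
    k dimensions flips the predicted label, i.e. that
    sign((Rw) . (Rx)) != sign(w . x). The true probability depends on
    the angle between w and x and on the target dimension k."""
    rng = np.random.default_rng(seed)
    d = len(w)
    flips = 0
    for _ in range(n_trials):
        R = rng.standard_normal((k, d)) / np.sqrt(k)
        if np.sign((R @ w) @ (R @ x)) != np.sign(w @ x):
            flips += 1
    return flips / n_trials
```

Nearly parallel vectors almost never flip, while nearly orthogonal ones flip close to half the time, which is the geometric quantity the bounds capture.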


[17]

Constructing a Focused Taxonomy from a Document Collection
Olena Medelyan, Steve Manion, Jeen Broekstra, Anna Divoli, Anna-Lan Huang, and
Ian H. Witten.
Constructing a focused taxonomy from a document collection.
In Proc 10th European Semantic Web Conference, Montpellier,
France, pages 367-381. Springer, 2013.
[ bib 
http ]
We describe a new method for constructing custom taxonomies from document collections. It involves identifying relevant concepts and entities in text; linking them to knowledge sources like Wikipedia, DBpedia, Freebase, and any supplied taxonomies from related domains; disambiguating conflicting concept mappings; and selecting semantic relations that best group them hierarchically. An RDF model supports interoperability of these steps, and also provides a flexible way of including existing NLP tools and further knowledge sources. From 2000 news articles we construct a custom taxonomy with 10,000 concepts and 12,700 relations, similar in structure to manually created counterparts. Evaluation by 15 human judges shows the precision to be 89% and 90% for concepts and relations respectively; recall was 75% with respect to a manually generated taxonomy for the same domain.


[18]

Artificial neural network is highly predictive of outcome in paediatric acute liver failure
Jeremy Rajanayagam, Eibe Frank, Ross W. Shepherd, and Peter J. Lewindon.
Artificial neural network is highly predictive of outcome in
paediatric acute liver failure.
Pediatric Transplantation, 2013.
[ bib 
http ]
Current prognostic models in PALF are unreliable, failing to account for complex, nonlinear relationships existing between multiple prognostic factors. A computational approach using ANN should provide modelling superior to PELD/MELD scores. We assessed the prognostic accuracy of PELD/MELD scores and ANN in PALF in children presenting to the QLTS, Australia. A comprehensive registry-based data set was evaluated in 54 children (32M, 22F, median age 17 months) with PALF. PELD/MELD scores were calculated at (i) meeting PALF criteria and (ii) peak. ANN was evaluated using stratified 10-fold cross-validation. Outcomes were classified as good (transplant-free survival) or poor (death or LT) and predictive accuracy compared using AUROC curves. Mean PELD/MELD scores were significantly higher in non-transplanted non-survivors ((i) 37 and (ii) 46) and transplant recipients ((i) 32 and (ii) 43) compared to transplant-free survivors ((i) 26 and (ii) 30). Threshold PELD/MELD scores ≥27 and ≥42, at meeting PALF criteria and peak, gave AUROC 0.71 and 0.86, respectively, for poor outcome. ANN showed superior prediction for poor outcome with AUROC 0.96, sensitivity 82.6%, specificity 96%, PPV 96.2% and NPV 85.7% (cutoff 0.5). ANN is superior to PELD/MELD for predicting poor outcome in PALF.


[19]

Identifying Market Price Levels Using Differential Evolution
Michael Mayo.
Identifying market price levels using differential evolution.
In Proc 16th European Conference on Applications of Evolutionary
Computation, Vienna, Austria, pages 203-212, 2013.
[ bib 
http ]
Evolutionary data mining is used in this paper to investigate the concept of support and resistance levels in financial markets. Specifically, Differential Evolution is used to learn support/resistance levels from price data. The presence of these levels is then tested in out-of-sample data. Our results from a set of experiments covering five years' worth of daily data across nine different US markets show that there is statistical evidence for price levels in certain markets, and that Differential Evolution can uncover them.


[20]

A direct policy-search algorithm for relational reinforcement learning
Samuel Sarjant.
A direct policy-search algorithm for relational reinforcement
learning.
PhD thesis, Department of Computer Science, University of Waikato,
2013.
[ bib 
http ]
Relational Reinforcement Learning (RRL) is a subfield of machine learning in which a learning agent seeks to maximise a numerical reward within an environment, represented as collections of objects and relations, by performing actions that interact with the environment. The relational representation allows more dynamic environment states than an attribute-based representation of reinforcement learning, but this flexibility also creates new problems such as a potentially infinite number of states.
This thesis describes an RRL algorithm named Cerrla that creates policies directly from a set of learned relational condition-action rules, using the Cross-Entropy Method (CEM) to control policy creation. The CEM assigns each rule a sampling probability and gradually modifies these probabilities such that the randomly sampled policies consist of ‘better’ rules, resulting in larger rewards received. Rule creation is guided by an inferred partial model of the environment that defines: the minimal conditions needed to take an action, the possible specialisation conditions per rule, and a set of simplification rules to remove redundant and illegal rule conditions, resulting in compact, efficient, and comprehensible policies.
Cerrla is evaluated on four separate environments, where each environment has several different goals. Results show that compared to existing RRL algorithms, Cerrla is able to learn equal or better behaviour in less time on the standard RRL environment. On other larger, more complex environments, it can learn behaviour that is competitive to specialised approaches. The simplified rules and CEM’s bias towards compact policies result in comprehensible and effective relational policies created in a relatively short amount of time.
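The CEM loop at the heart of this approach can be sketched generically: sample rule subsets according to per-rule probabilities, score them, and move the probabilities toward the best-scoring samples. Parameter names and the toy scoring interface below are illustrative, not Cerrla's actual policy evaluation.

```python
import random

def cem_policy_search(rules, evaluate, n_iters=30, n_samples=30,
                      elite_frac=0.2, lr=0.6, seed=0):
    """Cross-Entropy Method over rule-selection probabilities: each rule
    has an independent sampling probability; elite (highest-reward)
    sampled policies pull the probabilities toward the rules they use."""
    rng = random.Random(seed)
    probs = {r: 0.5 for r in rules}
    for _ in range(n_iters):
        samples = []
        for _ in range(n_samples):
            policy = [r for r in rules if rng.random() < probs[r]]
            samples.append((evaluate(policy), policy))
        samples.sort(key=lambda s: s[0], reverse=True)
        elites = samples[:max(1, int(elite_frac * n_samples))]
        for r in rules:
            freq = sum(r in p for _, p in elites) / len(elites)
            # Smoothed update: blend old probability with elite frequency.
            probs[r] = (1 - lr) * probs[r] + lr * freq
    return probs
```

Rules that consistently appear in high-reward policies converge toward probability 1, which is how the sampled policies come to consist of 'better' rules.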


[21]

Model selection based product kernel learning for regression
on graphs
Madeleine Seeland, Stefan Kramer, and Bernhard Pfahringer.
Model selection based product kernel learning for regression on
graphs.
In Proc 28th Annual ACM Symposium on Applied Computing,
Coimbra, Portugal, pages 136-143. ACM, 2013.
[ bib 
http ]
The choice of a suitable graph kernel is intrinsically hard and often cannot be made in an informed manner for a given dataset. Methods for multiple kernel learning offer a possible remedy, as they combine and weight kernels on the basis of a labeled training set of molecules to define a new kernel. Whereas most methods for multiple kernel learning focus on learning convex linear combinations of kernels, we propose to combine kernels in products, which theoretically enables higher expressiveness. In experiments on ten publicly available chemical QSAR datasets we show that product kernel learning is on no dataset significantly worse than any of the competing kernel methods and on average the best method available. A qualitative analysis of the resulting product kernels shows how the results vary from dataset to dataset.
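The validity of kernel products rests on the Schur product theorem: the elementwise product of positive semi-definite Gram matrices is again positive semi-definite. A minimal sketch, with subset selection only; the paper's model-selection procedure for choosing the combination is not reproduced:

```python
import numpy as np

def product_kernel(gram_matrices, selected):
    """Elementwise product of the selected base Gram matrices.
    Since the Schur product of PSD matrices is PSD, the result is a
    valid (and potentially more expressive) kernel."""
    K = np.ones_like(gram_matrices[0])
    for G, use in zip(gram_matrices, selected):
        if use:
            K = K * G
    return K
```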


[22]

Efficient data stream classification via probabilistic adaptive
windows
Albert Bifet, Bernhard Pfahringer, Jesse Read, and Geoff Holmes.
Efficient data stream classification via probabilistic adaptive
windows.
In Proc 28th Annual ACM Symposium on Applied Computing,
Coimbra, Portugal, pages 801-806. ACM, 2013.
[ bib 
http ]
In the context of a data stream, a classifier must be able to learn from a theoretically infinite stream of examples using limited time and memory, while being able to predict at any point. Many methods deal with this problem by basing their model on a window of examples. We introduce a probabilistic adaptive window (PAW) for data-stream learning, which improves this windowing technique with a mechanism to include older examples as well as the most recent ones, thus maintaining information on past concept drifts while being able to adapt quickly to new ones. We exemplify PAW with lazy learning methods in two variations: one to handle concept drift explicitly, and the other to add classifier diversity using an ensemble. Along with the standard measures of accuracy and time and memory use, we compare classifiers against state-of-the-art classifiers from the data-stream literature.
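A PAW-style window can be sketched with a single survival probability per arrival step; the parameterisation below is illustrative, not the paper's exact scheme. Recent examples are almost always present, while older ones survive with geometrically decreasing probability, keeping the expected window size bounded.

```python
import random

class ProbabilisticWindow:
    """Sketch of a probabilistic adaptive window: every new example is
    stored, and each stored example independently survives an arrival
    step with probability 2**(-1/w), so the expected window size stays
    near 1 / (1 - 2**(-1/w)) no matter how long the stream runs."""

    def __init__(self, w=20, seed=0):
        self.keep = 2.0 ** (-1.0 / w)
        self.rng = random.Random(seed)
        self.window = []

    def add(self, example):
        # Each old example survives this step with probability self.keep,
        # then the new example is always admitted.
        self.window = [e for e in self.window if self.rng.random() < self.keep]
        self.window.append(example)
```

A lazy learner querying this window sees mostly fresh examples plus a thinning tail of older ones, which is what preserves information about past concepts while adapting quickly.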


[23]

An open-source toolkit for mining Wikipedia
David N. Milne and Ian H. Witten.
An open-source toolkit for mining Wikipedia.
Artificial Intelligence, 194:222-239, 2013.
[ bib 
http ]
The online encyclopedia Wikipedia is a vast, constantly evolving tapestry of interlinked articles. For developers and researchers it represents a giant multilingual database of concepts and semantic relations, a potential resource for natural language processing and many other research areas. This paper introduces the Wikipedia Miner toolkit, an open-source software system that allows researchers and developers to integrate Wikipedia's rich semantics into their own applications. The toolkit creates databases that contain summarized versions of Wikipedia's content and structure, and includes a Java API to provide access to them. Wikipedia's articles, categories and redirects are represented as classes, and can be efficiently searched, browsed, and iterated over. Advanced features include parallelized processing of Wikipedia dumps, machine-learned semantic relatedness measures and annotation features, and XML-based web services. Wikipedia Miner is intended to be a platform for sharing data mining techniques.

