2016

[1]

Statistical Genomics: Methods and Protocols

Tony C. Smith and Eibe Frank. Statistical Genomics: Methods and Protocols, chapter Introducing Machine Learning Concepts with WEKA, pages 353-378. Springer, New York, NY, 2016.
This chapter presents an introduction to data mining with machine learning. It gives an overview of various types of machine learning, along with some examples. It explains how to download, install, and run the WEKA data mining toolkit on a simple data set, then proceeds to explain how one might approach a bioinformatics problem. Finally, it includes a brief summary of machine learning algorithms for other types of data mining problems, and provides suggestions about where to find additional information.

[2]

The WEKA Workbench

Eibe Frank, Mark A. Hall, and Ian H. Witten. The WEKA workbench. Online Appendix for Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, Fourth Edition, 2016.
[3]

Data Mining: Practical Machine Learning Tools and Techniques

Ian H. Witten, Eibe Frank, Mark A. Hall, and Christopher J. Pal. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, Burlington, MA, 4th edition, 2016.
[4]

WekaPyScript: Classification, Regression, and Filter Schemes for WEKA Implemented in Python

Christopher Beckham, Mark Hall, and Eibe Frank. WekaPyScript: Classification, regression, and filter schemes for WEKA implemented in Python. Journal of Open Research Software, 4(1):e33, 2016.
[5]

From opinion lexicons to sentiment classification of tweets and vice versa: a transfer learning approach

Felipe Bravo-Marquez, Eibe Frank, and Bernhard Pfahringer. From opinion lexicons to sentiment classification of tweets and vice versa: a transfer learning approach. In Proc 15th IEEE/WIC/ACM International Conference on Web Intelligence, Omaha, Nebraska. IEEE Computer Society, 2016.
[6]

Determining Word-Emotion Associations from Tweets by Multi-Label Classification

Felipe Bravo-Marquez, Eibe Frank, Saif M. Mohammad, and Bernhard Pfahringer. Determining word-emotion associations from tweets by multi-label classification. In Proc 15th IEEE/WIC/ACM International Conference on Web Intelligence, Omaha, Nebraska. IEEE Computer Society, 2016.
[7]

Building Ensembles of Adaptive Nested Dichotomies with Random-Pair Selection

Tim Leathart, Bernhard Pfahringer, and Eibe Frank. Building ensembles of adaptive nested dichotomies with random-pair selection. In Proc 20th European Conference on Principles and Practice of Knowledge Discovery in Databases and 27th European Conference on Machine Learning, Riva del Garda, Italy. Springer, 2016.
[8]

Annotate-Sample-Average (ASA): A New Distant Supervision Approach for Twitter Sentiment Analysis

Felipe Bravo-Marquez, Eibe Frank, and Bernhard Pfahringer. Annotate-sample-average (ASA): A new distant supervision approach for Twitter sentiment analysis. In Proc 22nd European Conference on Artificial Intelligence, The Hague, Netherlands. IOS Press, 2016.
[9]

Building a Twitter opinion lexicon from automatically-annotated tweets

Felipe Bravo-Marquez, Eibe Frank, and Bernhard Pfahringer. Building a Twitter opinion lexicon from automatically-annotated tweets. Knowledge-Based Systems, 108:65-78, 2016.
Opinion lexicons, which are lists of terms labeled by sentiment, are widely used resources to support automatic sentiment analysis of textual passages. However, existing resources of this type exhibit some limitations when applied to social media messages such as tweets (posts on Twitter), because they are unable to capture the diversity of informal expressions commonly found in this type of media. In this article, we present a method that combines information from automatically annotated tweets and existing hand-made opinion lexicons to expand an opinion lexicon in a supervised fashion. The expanded lexicon contains part-of-speech (POS) disambiguated entries with a probability distribution for positive, negative, and neutral polarity classes, similar to SentiWordNet. To obtain this distribution using machine learning, we propose word-level attributes based on (a) the morphological information conveyed by POS tags and (b) associations between words and the sentiment expressed in the tweets that contain them. We consider tweets with both hard and soft sentiment labels. The sentiment associations are modeled in two different ways: using point-wise-mutual-information semantic orientation (PMI-SO), and using stochastic gradient descent semantic orientation (SGD-SO), which learns a linear relationship between words and sentiment. The training dataset is labeled by a seed lexicon formed by combining multiple hand-annotated lexicons. Our experimental results show that our method outperforms the three-dimensional word-level polarity classification performance obtained by using PMI-SO alone. This is significant because PMI-SO is a state-of-the-art measure for establishing word-level sentiment. Additionally, we show that lexicons created with our method achieve significant improvements over SentiWordNet for classifying tweets into polarity classes, and also outperform SentiStrength in the majority of the experiments.

[10]

Learning Distance Metrics for Multi-Label Classification

Henry Gouk, Bernhard Pfahringer, and Michael Cree. Learning distance metrics for multi-label classification. In Proc 8th Asian Conference on Machine Learning, Hamilton, New Zealand. JMLR Workshop and Conference Proceedings, 2016.
[11]

Estimating heading direction from monocular video sequences using biologically-based sensors

Michael Cree, John Perrone, Gehan Anthonys, Aden Garnett, and Henry Gouk. Estimating heading direction from monocular video sequences using biologically-based sensors. In Image and Vision Computing New Zealand (IVCNZ), International Conference on, Palmerston North, New Zealand. IEEE, 2016.
[12]

Toward Large-Scale Continuous EDA: A Random Matrix Theory Perspective

Ata Kabán, Jakramate Bootkrajang, and Robert J. Durrant. Toward large-scale continuous EDA: A random matrix theory perspective. Evolutionary Computation, 24(2):255-291, 2016.
[13]

How effective is Cauchy-EDA in high dimensions?

Momodou L. Sanyang, Robert J. Durrant, and Ata Kabán. How effective is Cauchy-EDA in high dimensions? In IEEE Congress on Evolutionary Computation, CEC 2016, Vancouver, BC, Canada, July 24-29, 2016, pages 3409-3416, 2016.
[14]

BlockCopy-based operators for evolving efficient wind farm layouts

Michael Mayo and Chen Zheng. BlockCopy-based operators for evolving efficient wind farm layouts. In IEEE Congress on Evolutionary Computation, CEC 2016, Vancouver, BC, Canada, July 24-29, 2016, pages 1085-1092, 2016.
[15]

Towards a New Evolutionary Subsampling Technique for Heuristic Optimisation of Load Disaggregators

Michael Mayo and Sara Omranian. Towards a new evolutionary subsampling technique for heuristic optimisation of load disaggregators. In Trends and Applications in Knowledge Discovery and Data Mining - PAKDD 2016 Workshops, BDM, MLSDA, PACC, WDMBF, Auckland, New Zealand, April 19, 2016, Revised Selected Papers, pages 3-14, 2016.
[16]

Deferral classification of evolving temporal dependent data streams

Michael Mayo and Albert Bifet. Deferral classification of evolving temporal dependent data streams. In Proceedings of the 31st Annual ACM Symposium on Applied Computing, Pisa, Italy, April 4-8, 2016, pages 952-954, 2016.
[17]

DOCODE 3.0 (DOcument COpy DEtector): A system for plagiarism detection by applying an information fusion process from multiple documental data sources

Juan D. Velásquez, Yerko Covacevich, Francisco Molina, Edison Marrese-Taylor, Cristián Rodríguez, and Felipe Bravo-Marquez. DOCODE 3.0 (document copy detector): A system for plagiarism detection by applying an information fusion process from multiple documental data sources. Information Fusion, 27:64-75, 2016.
Plagiarism refers to the act of presenting external words, thoughts, or ideas as one’s own, without providing references to the sources from which they were taken. The exponential growth of digital document sources available on the Web has facilitated the spread of this practice, making its accurate detection a crucial task for educational institutions. In this article, we present DOCODE 3.0, a Web system for educational institutions that performs automatic analysis of large quantities of digital documents in relation to their degree of originality. Since plagiarism is a complex problem, frequently tackled at different levels, our system applies algorithms that perform an information fusion process from multiple data sources across all these levels. These algorithms have been successfully tested in the scientific community on tasks such as the identification of plagiarized passages and the retrieval of source candidates from the Web and from other data sources such as digital libraries, and have proven to be very effective. We integrate these algorithms into a multi-tier, robust and scalable JEE architecture, allowing many different types of clients with different requirements to consume our services. For users, DOCODE produces a number of visualizations and reports from the different outputs to let teachers and professors gain insight into the originality of the documents they review, allowing them to discover, understand and handle possible plagiarism cases and making it easier and much faster to analyze a vast number of documents. Our experience so far is focused on the Chilean situation and the Spanish language, offering solutions to Chilean educational institutions in any of their preferred Virtual Learning Environments. However, DOCODE can easily be adapted to increase language coverage.