Antti Puurula

Publications (bibtex)
M. Kurimo, A. Puurula, E. Arisoy, V. Siivola, T. Hirsimaki, J. Pylkkonen, T. Alumae, and M. Saraclar. Unlimited Vocabulary Speech Recognition for Agglutinative Languages. In HLT-NAACL, 2006.
M. Creutz, T. Hirsimaki, M. Kurimo, A. Puurula, J. Pylkkonen, V. Siivola, M. Varjokallio, E. Arisoy, M. Saraclar, and A. Stolcke. Analysis of Morph-based Speech Recognition and the Modeling of Out-of-Vocabulary Words Across Languages. In HLT-NAACL, pages 380-387, 2007.
M. Creutz, T. Hirsimaki, M. Kurimo, A. Puurula, J. Pylkkonen, V. Siivola, M. Varjokallio, E. Arisoy, M. Saraclar, and A. Stolcke. Morph-based Speech Recognition and Modeling of Out-of-Vocabulary Words Across Languages. TSLP, 5(1), 2007.
A. Puurula and M. Kurimo. Vocabulary Decomposition for Estonian Open Vocabulary Speech Recognition. In ACL, 2007.
K. Demuynck, A. Puurula, D. V. Compernolle, and P. Wambacq. The ESAT 2008 System for N-Best Dutch Speech Recognition Benchmark. In ASRU, pages 339-344, 2009.
A. Puurula and D. Compernolle. Dual Stream Speech Recognition Using Articulatory Syllable Models. Int. J. Speech Technol., 13(4):219-230, Dec. 2010.
A. Puurula. Mixture Models for Multi-label Text Classification. In 10th New Zealand Computer Science Research Student Conference, 2011.
A. Puurula. Large Scale Text Classification with Multi-label Naive Bayes. Journal of Measurement Science and Instrumentation, 2:35-45, 2011.
A. Puurula. Scalable Text Classification with Sparse Generative Modeling. In PRICAI2012, pages 458-469, 2012.
A. Puurula and A. Bifet. Ensembles of Sparse Multinomial Classifiers for Scalable Text Classification. In ECML/PKDD PASCAL Workshop on Large-Scale Hierarchical Classification, 2012.
A. Puurula. Combining Modifications to Multinomial Naive Bayes for Text Classification. Lecture Notes in Computer Science, volume 7675, pages 114-125. Springer Berlin Heidelberg, 2012.
A. Puurula and S. Myaeng. Integrated Instance- and Class-based Generative Modeling for Text Classification. In Australasian Document Computing Symposium, 2013.
A. Puurula. Cumulative Progress in Language Models for Information Retrieval. In Australasian Language Technology Workshop, 2013. Errata.
A. Puurula. Kaggle LSHTC4 Winning Solution. www.kaggle.com/c/lshtc, 2014.
G. Tsoumakas, A. Papadopoulos, W. Qian, S. Vologiannidis, A. D'yakonov, A. Puurula, J. Read, J. Svec, and S. Semenov. WISE 2014 Challenge: Multi-label Classification of Print Media Articles to Topics. In WISE, 2014.
A. Trotman, A. Puurula, and B. Burgess. Improvements to BM25 and Language Models Examined. In Australasian Document Computing Symposium, 2014. Best Paper Award.
J. Read, A. Puurula, and A. Bifet. Multi-label Classification with Meta Labels. To appear in Proc. of IEEE International Conference on Data Mining, 2014.
Slides
ALTA2013, Brisbane, Australia, 2013, Cumulative Progress in Language Models for Information Retrieval
ADCS2013, Brisbane, Australia, 2013, Integrated Instance- and Class-based Generative Modeling for Text Classification
WISE2014, Thessaloniki, Greece, 2014, Kaggle WISE2014. 2nd-place Solution
ADCS2014, Melbourne, Australia, 2014, Improvements to BM25 and Language Models Examined
Resources
For documentation on these, consult the wiki: http://sourceforge.net/p/sgmweka/wiki/SGMWeka Documentation v.1.4.4
SGM Toolkit: A tidy toolkit for generative models with sparse matrix representations
Text classification datasets in LIBSVM format: 14 preprocessed and split text classification dataset feature files in LIBSVM format. 3 spam classification, 3 sentiment analysis, 5 multi-class and 3 multi-label datasets
Text classification datasets in .arff format: Datasets for Weka, matching the 5 multi-class datasets in LIBSVM format
Preprocessing scripts: Scripts for processing raw text datasets into feature file formats
Metaopt.py: Script for Random Search optimization of program parameters
LSHTC4_winner_solution.zip: LSHTC4 winning solution code package, including precomputed base-classifier result files
LSHTC4_winner_solution_omit_resultsfiles.zip: LSHTC4 winning solution code package, without precomputed base-classifier result files
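For readers unfamiliar with the LIBSVM feature file format used by the datasets above, the following is a minimal reading sketch, not code from the SGM Toolkit or the preprocessing scripts. It assumes the common conventions of the format: each line is a label field followed by space-separated index:value pairs, and multi-label lines list their labels separated by commas. The function name parse_libsvm_line, the integer labels, and the example line are illustrative assumptions, not taken from the dataset files.

```python
def parse_libsvm_line(line):
    """Parse one LIBSVM-format line into (labels, features).

    Assumed layout (hypothetical example, not from the actual datasets):
        "3,17 1:2 5:1 42:0.5"
    i.e. comma-separated integer labels, then sparse index:value pairs.
    """
    fields = line.split()
    labels = []
    # A first field without ':' is treated as the label field.
    if fields and ":" not in fields[0]:
        labels = [int(label) for label in fields[0].split(",")]
        fields = fields[1:]
    # Store the sparse feature vector as {feature index: value}.
    features = {}
    for field in fields:
        index, value = field.split(":", 1)
        features[int(index)] = float(value)
    return labels, features


if __name__ == "__main__":
    labels, features = parse_libsvm_line("3,17 1:2 5:1 42:0.5")
    print(labels, features)
```

Single-label files work with the same function, since a lone label is just a one-element list. If the label field in a given file is not an integer, the int conversion would need to be adjusted.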
Competitions
2012 LSHTC3 Track 1 Medium. 5th/17. Description
2013 KDD Cup 2013. 43rd/554
2014 LSHTC4. 1st/119. Description
2014 WISE2014. 2nd/120. Overview
2014 Tradeshift. 34th/375
Contact
asp12 at students.waikato.ac.nz