Antti Puurula

Publications (bibtex)
M. Kurimo, A. Puurula, E. Arisoy, V. Siivola, T. Hirsimaki, J. Pylkkonen, T. Alumae, and M. Saraclar. Unlimited Vocabulary Speech Recognition for Agglutinative Languages. In HLT-NAACL, 2006.
M. Creutz, T. Hirsimaki, M. Kurimo, A. Puurula, J. Pylkkonen, V. Siivola, M. Varjokallio, E. Arisoy, M. Saraclar, and A. Stolcke. Analysis of Morph-based Speech Recognition and the Modeling of Out-of-Vocabulary Words Across Languages. In HLT-NAACL, pages 380-387, 2007.
M. Creutz, T. Hirsimaki, M. Kurimo, A. Puurula, J. Pylkkonen, V. Siivola, M. Varjokallio, E. Arisoy, M. Saraclar, and A. Stolcke. Morph-based Speech Recognition and Modeling of Out-of-Vocabulary Words Across Languages. TSLP, 5(1), 2007.
A. Puurula and M. Kurimo. Vocabulary Decomposition for Estonian Open Vocabulary Speech Recognition. In ACL, 2007.
K. Demuynck, A. Puurula, D. Van Compernolle, and P. Wambacq. The ESAT 2008 System for N-Best Dutch Speech Recognition Benchmark. In ASRU, pages 339-344, 2009.
A. Puurula and D. Van Compernolle. Dual Stream Speech Recognition Using Articulatory Syllable Models. Int. J. Speech Technol., 13(4):219-230, Dec. 2010.
A. Puurula. Mixture Models for Multi-label Text Classification. In 10th New Zealand Computer Science Research Student Conference, 2011.
A. Puurula. Large Scale Text Classification with Multi-label Naive Bayes. Journal of Measurement Science and Instrumentation, 2:35-45, 2011.
A. Puurula. Scalable Text Classification with Sparse Generative Modeling. In PRICAI2012, pages 458-469, 2012.
A. Puurula and A. Bifet. Ensembles of Sparse Multinomial Classifiers for Scalable Text Classification. In ECML/PKDD PASCAL Workshop on Large-Scale Hierarchical Classification, 2012.
A. Puurula. Combining Modifications to Multinomial Naive Bayes for Text Classification. Lecture Notes in Computer Science, volume 7675, pages 114-125. Springer Berlin Heidelberg, 2012.
A. Puurula and S. Myaeng. Integrated Instance- and Class-based Generative Modeling for Text Classification. In Australasian Document Computing Symposium, 2013.
A. Puurula. Cumulative Progress in Language Models for Information Retrieval. Australasian Language Technology Workshop, 2013. Errata.
A. Puurula. Kaggle LSHTC4 Winning Solution. www.kaggle.com/c/lshtc, 2014.
Slides
ALTA2013, Brisbane, Australia, 2013, Cumulative Progress in Language Models for Information Retrieval
ADCS2013, Brisbane, Australia, 2013, Integrated Instance- and Class-based Generative Modeling for Text Classification
Resources
For documentation on these, consult the wiki: http://sourceforge.net/p/sgmweka/wiki/SGMWeka Documentation v.1.4.4
SGM Toolkit: A tidy toolkit for generative models with sparse matrix representations.
Text classification datasets in LIBSVM format: 14 preprocessed and split text classification datasets, provided as feature files: 3 spam classification, 3 sentiment analysis, 5 multi-class, and 3 multi-label datasets.
Text classification datasets in .arff format: Datasets for Weka, matching the 5 multi-class datasets in LIBSVM format.
Preprocessing scripts: Scripts for processing raw text datasets into feature file formats.
Metaopt.py: Script for Random Search optimization of program parameters.
LSHTC4_winner_solution.zip: LSHTC4 winning solution code package, including precomputed base-classifier result files.
LSHTC4_winner_solution_omit_resultsfiles.zip: LSHTC4 winning solution code package, without precomputed base-classifier result files.
Competitions
2012 LSHTC3 Track 1 Medium: 5th/17. Description
2013 KDD Cup 2013: 43rd/554
2014 LSHTC4: 1st/119. Description
Contact
asp12 at students.waikato.ac.nz