Antti Puurula

Publications (bibtex)
M. Kurimo, A. Puurula, E. Arisoy, V. Siivola, T. Hirsimaki, J. Pylkkonen, T. Alumae, and M. Saraclar. Unlimited Vocabulary Speech Recognition for Agglutinative Languages. In HLT-NAACL, 2006.
M. Creutz, T. Hirsimaki, M. Kurimo, A. Puurula, J. Pylkkonen, V. Siivola, M. Varjokallio, E. Arisoy, M. Saraclar, and A. Stolcke. Analysis of Morph-based Speech Recognition and the Modeling of Out-of-Vocabulary Words Across Languages. In HLT-NAACL, pages 380-387, 2007.
M. Creutz, T. Hirsimaki, M. Kurimo, A. Puurula, J. Pylkkonen, V. Siivola, M. Varjokallio, E. Arisoy, M. Saraclar, and A. Stolcke. Morph-based Speech Recognition and Modeling of Out-of-Vocabulary Words Across Languages. TSLP, 5(1), 2007.
A. Puurula and M. Kurimo. Vocabulary Decomposition for Estonian Open Vocabulary Speech Recognition. In ACL, 2007.
K. Demuynck, A. Puurula, D. V. Compernolle, and P. Wambacq. The ESAT 2008 System for N-Best Dutch Speech Recognition Benchmark. In ASRU, pages 339-344, 2009.
A. Puurula and D. Compernolle. Dual Stream Speech Recognition Using Articulatory Syllable Models. Int. J. Speech Technol., 13(4):219-230, Dec. 2010.
A. Puurula. Mixture Models for Multi-label Text Classification. In 10th New Zealand Computer Science Research Student Conference, 2011.
A. Puurula. Large Scale Text Classification with Multi-label Naive Bayes. Journal of Measurement Science and Instrumentation, 2:35-45, 2011.
A. Puurula. Scalable Text Classification with Sparse Generative Modeling. In PRICAI2012, pages 458-469, 2012.
A. Puurula and A. Bifet. Ensembles of Sparse Multinomial Classifiers for Scalable Text Classification. In ECML/PKDD PASCAL Workshop on Large-Scale Hierarchical Classification, 2012.
A. Puurula. Combining Modifications to Multinomial Naive Bayes for Text Classification. Lecture Notes in Computer Science, volume 7675, pages 114-125. Springer Berlin Heidelberg, 2012.
A. Puurula and S. Myaeng. Integrated Instance- and Class-based Generative Modeling for Text Classification. In Australasian Document Computing Symposium, 2013.
A. Puurula. Cumulative Progress in Language Models for Information Retrieval. In Australasian Language Technology Workshop, 2013.
Slides
ALTA2013, Brisbane, Australia, 2013, Cumulative Progress in Language Models for Information Retrieval
ADCS2013, Brisbane, Australia, 2013, Integrated Instance- and Class-based Generative Modeling for Text Classification
Resources
For documentation on these, consult the wiki: http://sourceforge.net/p/sgmweka/wiki/SGMWeka Documentation v.1.4.4
SGM Toolkit: A tidy toolkit for generative models with sparse matrix representations.
Text classification datasets in LIBSVM format: 14 preprocessed and split text classification dataset feature files in LIBSVM format. 3 spam classification, 3 sentiment analysis, 5 multi-class and 3 multi-label datasets.
Text classification datasets in .arff format: Text classification datasets in .arff format for Weka. These match the 5 multi-class datasets in LIBSVM format.
Preprocessing scripts: Scripts for processing raw text datasets into feature file formats.
Metaopt.py: Script for Random Search optimization of program parameters.
Contact
asp12 at students.waikato.ac.nz