Lan (Anna) Huang

I am a PhD student in the Department of Computer Science at the University of Waikato. My supervisors are Ian H. Witten and Eibe Frank.

My PhD research is on concept-based text clustering, which leverages the rich semantic knowledge in knowledge bases such as WordNet and Wikipedia to improve text representation, enrich thematic text similarity measures, and benefit text clustering. My PhD project develops the Katoa toolkit, open-source software for concept-based text processing.

My research interests are text clustering, machine learning for text mining, digital libraries and information retrieval.

 

Resume

Contact

Email: lh92 @ cs.waikato.ac.nz
Phone: +64 7 856 2889 ext. 6038
Digital Library Lab. (G.2.01)
Department of Computer Science
University of Waikato
Private Bag 3105
Hamilton, New Zealand

Publications

Huang, A. (2011) Learning document similarity. In Proceedings of the New Zealand Computer Science Research Student Conference (NZCSRSC'11), Palmerston North, New Zealand.

Huang, A. (2010) Combining Global Semantic Relatedness and Local Analysis for Document Clustering. In Proceedings of the New Zealand Computer Science Research Student Conference (NZCSRSC'10), Wellington, New Zealand.

Huang, A., Milne, D., Frank, E. and Witten, I.H. (2009) Clustering documents using a Wikipedia-based concept representation. In Proceedings of the Thirteenth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'09), Bangkok, Thailand.

Huang, A., Milne, D., Frank, E. and Witten, I.H. (2008) Clustering documents with active learning using Wikipedia. In Proceedings of the Eighth IEEE International Conference on Data Mining (ICDM'08), Pisa, Italy.

Huang, A. (2008) Similarity Measures for Text Document Clustering. In Proceedings of the New Zealand Computer Science Research Student Conference (NZCSRSC'08), Christchurch, New Zealand.

Li, G. and Huang, L. (2004) Digital Object Identifiers: Handle System. Library Development, 2004 (3). (in Chinese)

Theses

Huang, L. (2011) Concept-based text clustering. PhD Thesis, University of Waikato, New Zealand.

Huang, L. (2006) Web resource integration using portal and portlets. Master's Thesis, Beijing Normal University, China. (in Chinese)

Current & Past Projects

I work part-time on Greenstone, open-source digital library software that supports 57 languages. If you are interested in maintaining languages currently supported by Greenstone or translating Greenstone into new languages, please let me know.

The Cross-database Search System is a meta search engine that I worked on part-time at the Information System Department of the Library of Chinese Academy of Sciences (Beijing). It searches multiple heterogeneous data sources simultaneously and provides a unified representation of the search results. I mainly worked on the search module, which supports both the HTTP and Z39.50 protocols, and on the customization of data sources.

I worked on translating Integrated Broadband Networks, a textbook on broadband networks, into Chinese (third translator). Although unrelated to my research, it was a helpful exercise for both my English and Chinese. The Chinese version was published by the Publishing House of Electronics Industry in 2005.

I also worked part-time as an English language tutor at freshman and sophomore levels at Beijing Normal University.