- Room No: G.2.11
- Telephone: +64 7 838 4466
- Extension: 8766
- Facsimile: +64 7 838 4155
BCMS Honours Topic: "Adaptive Lemmatization"
In natural language, words are often modified in regular ways for grammatical purposes; these modified forms of a base word are called inflections (for example, "walks" and "walked" are inflections of "walk"). In natural language processing, semantic analysis often needs the uninflected form of a word, the so-called “gloss” or “lemma” as it would appear in a dictionary.
My project seeks to develop an algorithm that can “learn by example” how to derive the lemma of any regularly inflected word. I want to discover how many examples of inflected words and their uninflected glosses a learning algorithm must see before it can accurately inflect or gloss novel words.
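As a rough illustration of what "learning by example" could mean here, the sketch below assumes that regular inflection can be undone by suffix rewriting (strip an ending, append a replacement). It is not the project's algorithm, and every name in it (`suffix_rule`, `SuffixLemmatizer`, `accuracy`, `learning_curve`) is hypothetical. Each training pair yields one rewrite rule, and unseen words borrow the rule from the longest word ending they share with a training example. `learning_curve` measures accuracy as the number of training examples grows, which is the quantity the project asks about.

```python
from collections import Counter, defaultdict


def suffix_rule(word, lemma):
    """Return (strip, add): drop `strip` from the end of `word`, then append `add`."""
    i = 0
    # Walk the longest common prefix of the two strings.
    while i < min(len(word), len(lemma)) and word[i] == lemma[i]:
        i += 1
    return word[i:], lemma[i:]


class SuffixLemmatizer:
    """Hypothetical learner that memorises suffix-rewrite rules from examples."""

    def __init__(self):
        # ending of a training word -> Counter of (strip, add) rules seen with it
        self.rules = defaultdict(Counter)

    def fit(self, pairs):
        for word, lemma in pairs:
            strip, add = suffix_rule(word, lemma)
            # File the rule under every ending of `word` long enough to contain `strip`,
            # so the stored rule is always applicable to any word with that ending.
            for k in range(len(strip), len(word) + 1):
                self.rules[word[len(word) - k:]][(strip, add)] += 1
        return self

    def lemmatize(self, word):
        # Back off from the longest ending of `word` to the shortest one seen in training.
        for k in range(len(word), -1, -1):
            counts = self.rules.get(word[len(word) - k:])
            if counts:
                strip, add = counts.most_common(1)[0][0]
                return word[:len(word) - len(strip)] + add
        return word  # no matching ending: assume the word is already a lemma


def accuracy(model, pairs):
    """Fraction of (word, lemma) pairs the model lemmatizes correctly."""
    if not pairs:
        return 0.0
    return sum(model.lemmatize(w) == lemma for w, lemma in pairs) / len(pairs)


def learning_curve(train, test, sizes):
    """Accuracy on `test` after training on the first n pairs of `train`, for each n."""
    return [(n, accuracy(SuffixLemmatizer().fit(train[:n]), test)) for n in sizes]
```

For example, after fitting on ("talked", "talk") and ("studies", "study") alone, this model maps the unseen "walked" to "walk" and "flies" to "fly". It also shows the limits of pure suffix rewriting: with the same two examples, "hopped" becomes "hopp", because nothing in the training data teaches it about consonant doubling.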