- Room No: G.1.31
- Telephone: +64 7 838 4021
- Extension: 4021
- Facsimile: +64 7 838 4155
Thesis Topic: "Morphological analysis for automatic Arabic text classification."
Text classification aims to automatically assign a text to a predefined category based on its linguistic features. Before classification, the text must be carefully preprocessed (e.g., document conversion, stop-word removal, stemming). After preprocessing, each text is typically represented as a vector of n elements, where n is the number of features, which are mostly the words of the text. My project aims to exploit the morpho-syntactic information that can be extracted from Arabic text and to investigate its effect on classifier accuracy in several aspects of automatic Arabic text classification. A new conflation method will be adopted, with machine learning approaches as the backbone of the project.
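To make the vector representation concrete, here is a minimal Python sketch of the pipeline described above: stop-word removal, a conflation step, and word-count vectors. It is an illustration under stated assumptions, not the method developed in the thesis. The stop-word list, the prefix list, and the function names (`conflate`, `preprocess`, `vectorize`) are hypothetical, and the conflation step is a deliberately naive stand-in that only strips the definite article.

```python
from collections import Counter

# Hypothetical toy resources, for illustration only.
STOP_WORDS = {"في", "من", "على"}   # "in", "from", "on"
PREFIXES = ("ال",)                  # the definite article "al-"


def conflate(token):
    """Naive conflation: strip a leading definite article,
    provided at least three characters remain."""
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= 3:
            return token[len(prefix):]
    return token


def preprocess(text):
    """Split on whitespace, drop stop words, conflate what is left."""
    return [conflate(t) for t in text.split() if t not in STOP_WORDS]


def vectorize(documents):
    """Return the sorted vocabulary and one count vector per document.
    Each vector has n elements, where n is the vocabulary size."""
    token_lists = [preprocess(d) for d in documents]
    vocabulary = sorted({t for tokens in token_lists for t in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


# "the book in the library" and "a book from a library"
docs = ["الكتاب في المكتبة", "كتاب من مكتبة"]
vocabulary, vectors = vectorize(docs)
print(len(vocabulary), vectors)
```

In this toy example both documents map to the same two features once the article is stripped, which is the kind of effect a conflation method is meant to have on the feature space. A real system would replace `conflate` with a proper morphological analyser or a learned model.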