
On my development Greenstone site, I am currently exploring automatic metadata extraction from raw text. Many features of this install are broken; if you want a system for general use, go to the Proper NZDL Site. My publications are on a separate Publications Page, as is some of the material linked to my research group, the Text Mining Research Group.
I also have a page on Advogato.
Metadata is used in libraries to build catalogues and manage their collections. Catalogues typically contain author, title, publication date and subject information for each document in the library's collections. There is, however, a great deal of other metadata not dealt with by these systems, including: metadata related to readers' accesses of the documents; subject area reviews and bibliographies; figures and tables within the documents; and domain-specific dictionaries of terms and acronyms. By far the greatest source of metadata available to libraries is the documents themselves (the ``full text''), with explicit metadata in bibliographies, tables of contents, tables of figures and indexes, and implicit metadata in the structure of the text, for instance the terminology introduced and used and the other documents referred to. It is this novel, non-traditional metadata that I intend to explore in my thesis.
I did my undergraduate study and my masters in Computer Science at the University of Canterbury in Christchurch. During much of that time I lived at College House. My masters thesis, for which I earned first class honours, was entitled ``Design Patterns in Garbage Collection'' and focused on several software engineering aspects of operating systems. My supervisor was Michel de Champlain, who is now at Concordia and at DeepObjectKnowledge.
At Canterbury I tutored introductory computer studies as well as third year operating systems.
After graduating I worked for Trimble Navigation Limited for approximately a year, doing application development for the land survey market on the Win/NT/Office98 platform, leveraging Trimble's GPS (Global Positioning System) technology.
I am currently working on a PhD in computer science within the digital library research group. David Bainbridge is my supervisor. Other people I work with include Ian H. Witten, Sally Jo Cunningham, Dr Malika Mahoui and the rest of the crew on the NZDL project. My thesis (or at least my thesis proposal, dated August 31 1999) is entitled ``Novel Indexes and Metadata Sources in Digital Libraries.''
I wrote and maintain the web page for the text mining group within the NZDL.
The software that I work on is released under the GNU licence.
Stuart Yeates