Wikipedia can be used as a controlled
vocabulary for identifying the main topics in a document, with article
titles serving as index terms and redirect titles as their synonyms. Wikipedia
contains over 4M such titles covering the terminology of nearly any document
collection. This permits controlled indexing in the absence of manually
created vocabularies. We combine state-of-the-art strategies for automatic
controlled indexing with Wikipedia's unique property: it is a richly hyperlinked
encyclopedia. We evaluate the scheme by comparing automatically assigned
topics with those chosen manually by human indexers. Analysis of indexing
consistency shows that our algorithm performs as well as the average
person.
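
As a rough illustration of the two ideas above, the sketch below maps a phrase to an article title through a redirect table and scores the agreement between two topic sets. The toy `TITLES` and `REDIRECTS` data and both function names are hypothetical, and the abstract does not say which consistency measure is used; Rolling's measure, 2C/(A+B), is assumed here as one common choice.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical toy vocabulary; a real system would load article titles
# and redirects from a Wikipedia dump.
TITLES: Set[str] = {"New York City", "Machine learning"}
REDIRECTS: Dict[str, str] = {
    "nyc": "New York City",
    "big apple": "New York City",
    "ml": "Machine learning",
}


def to_index_term(phrase: str) -> Optional[str]:
    """Map a phrase to an article title, treating redirects as synonyms."""
    key = phrase.strip().lower()
    for title in TITLES:
        if title.lower() == key:
            return title
    return REDIRECTS.get(key)


def rolling_consistency(a: Iterable[str], b: Iterable[str]) -> float:
    """Rolling's measure (assumed): 2C / (A + B), where C is the number of
    topics both indexers chose and A, B are the sizes of their topic sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty sets agree trivially
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


if __name__ == "__main__":
    print(to_index_term("Big Apple"))
    print(rolling_consistency({"New York City", "Machine learning"},
                              {"Machine learning", "Chicago"}))
```

Averaging this score over every pair of human indexers gives a baseline against which the algorithm's agreement with each person can be compared.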
Full paper (to appear in Proceedings of the WikiAI Workshop at AAAI-2008,
Chicago, US)