David Milne, Olena Medelyan and Ian Witten have a nice paper on mining domain-specific thesauri from Wikipedia at this conference I am attending in Hong Kong. As they say:
How can you obtain a thesaurus to support a library of documents in a particular domain? Manual construction is prohibitively expensive; automatic generation is woefully inaccurate. General thesauri do not incorporate the specialist terminology that pervades our professions, nor can they keep pace with the deluge of new topics and concepts that arrive each day. Yet a contemporary resource that incorporates expertise in all fields of human endeavour already exists: the widely known Wikipedia.
Basically, they mine the structure of Wikipedia (its redirects, its category hierarchy, and its hyperlinks) to infer the equivalence, hierarchical, and associative relations a thesaurus needs. A comparison with a professionally prepared agricultural thesaurus shows that the approach can be effective. It is another example of crowdsourcing, this time built on a rather nonobvious use of the work of Wikipedia's contributors.
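To make the mapping from Wikipedia structure to thesaurus relations concrete, here is a minimal Python sketch. It is not the authors' implementation, and it rests on assumptions of my own. The toy data and the names `REDIRECTS`, `CATEGORIES`, `LINKS` and `build_thesaurus` are hypothetical. Redirects become equivalence entries (UF, "used for"), parent categories become broader terms (BT), and, as a simplification, two articles that link to each other become a related-term (RT) pair. The paper's actual criteria, especially for associative relations, may well differ.

```python
from collections import defaultdict

# Hypothetical toy data standing in for Wikipedia structure (not real dump data).
REDIRECTS = {  # redirect title -> target article
    "Maize": "Corn",
    "Zea mays": "Corn",
}
CATEGORIES = {  # article -> parent categories
    "Corn": ["Cereals"],
    "Wheat": ["Cereals"],
}
LINKS = {  # article -> articles it links to
    "Corn": ["Wheat", "Tortilla"],
    "Wheat": ["Corn", "Bread"],
    "Tortilla": ["Corn"],
}


def build_thesaurus(redirects, categories, links):
    """Map each kind of Wikipedia structure to one thesaurus relation.

    UF (equivalence) comes from redirects, BT (broader term) from parent
    categories, and RT (related term) from reciprocal hyperlinks; the last
    rule is a simplifying assumption, not the paper's method.
    """
    thesaurus = defaultdict(lambda: {"UF": set(), "BT": set(), "RT": set()})
    # A redirect title is treated as a non-preferred synonym of its target.
    for alias, target in redirects.items():
        thesaurus[target]["UF"].add(alias)
    # Each parent category is treated as a broader term of the article.
    for article, parents in categories.items():
        thesaurus[article]["BT"].update(parents)
    # Articles that link to each other are treated as related terms.
    for source, targets in links.items():
        for target in targets:
            if source in links.get(target, []):
                thesaurus[source]["RT"].add(target)
    return dict(thesaurus)


if __name__ == "__main__":
    for term, relations in sorted(build_thesaurus(REDIRECTS, CATEGORIES, LINKS).items()):
        print(term, {rel: sorted(values) for rel, values in relations.items()})
```

On the toy data, the entry for Corn lists Maize and Zea mays as UF, Cereals as BT, and Wheat and Tortilla as RT, while Bread gets no RT link because it never links back. Real Wikipedia is much noisier (categories are not always true broader terms), which is why the comparison against a professionally built thesaurus matters.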