Disambiguating automatic semantic annotation based on a thesaurus structure

Véronique Malaisé, Luit Gazendam, Hennie Brugman


Abstract
The use/use for relationship in a thesaurus is usually more complex than the (para-)synonymy recommended in the ISO 2788 standard, which describes the content of these controlled vocabularies. Because a non-preferred term can refer to multiple preferred terms (only the latter are relevant in controlled indexing), this relationship is difficult to use in automatic annotation applications: it generates ambiguity. In this paper, we present the CARROT algorithm, designed to rank the output of our Information Extraction pipeline, and show how it can be used to select the relevant preferred term among several candidates. This selection, which is based on the structure of the annotators' thesaurus, is meant to suggest keywords to human annotators in order to ease and speed up their daily work. We achieve a 95% success rate, and discuss these results along with perspectives for this experiment.
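
The ambiguity described in the abstract can be illustrated with a minimal sketch. This is not the CARROT algorithm, whose details are not given here. It is a simple stand-in heuristic that ranks candidate preferred terms by how many direct thesaurus links they share with other preferred terms already found in the same document. The toy thesaurus, the term names, and the functions `neighbours` and `disambiguate` are all hypothetical and invented for illustration.

```python
# Hypothetical toy thesaurus, invented for illustration only.
# "use" links: one non-preferred term -> several candidate preferred terms.
USE = {
    "mining": ["coal mining", "data mining"],
}

# Links between preferred terms (broader/narrower/related), treated as undirected.
LINKS = {
    ("coal mining", "energy"),
    ("coal mining", "industry"),
    ("data mining", "computer science"),
    ("data mining", "statistics"),
}


def neighbours(term):
    """Return the preferred terms directly linked to `term`."""
    linked = set()
    for a, b in LINKS:
        if a == term:
            linked.add(b)
        elif b == term:
            linked.add(a)
    return linked


def disambiguate(non_preferred, context_terms):
    """Rank candidate preferred terms for an ambiguous non-preferred term.

    Assumed heuristic (not the paper's method): a candidate scores one point
    for each direct thesaurus link to a preferred term found in the document.
    """
    context = set(context_terms)
    candidates = USE.get(non_preferred, [])
    scored = [(len(neighbours(c) & context), c) for c in candidates]
    # Highest score first; alphabetical order breaks ties deterministically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [candidate for _, candidate in scored]


if __name__ == "__main__":
    found_in_document = ["statistics", "computer science", "industry"]
    print(disambiguate("mining", found_in_document))
```

With the context above, "data mining" shares two links with the document's terms and "coal mining" shares one, so "data mining" is ranked first and would be the keyword suggested to the annotator.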
Anthology ID:
2007.jeptalnrecital-long.18
Volume:
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs
Month:
June
Year:
2007
Address:
Toulouse, France
Venue:
JEP/TALN/RECITAL
Publisher:
ATALA
Pages:
185–194
URL:
https://aclanthology.org/2007.jeptalnrecital-long.18
Cite (ACL):
Véronique Malaisé, Luit Gazendam, and Hennie Brugman. 2007. Disambiguating automatic semantic annotation based on a thesaurus structure. In Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, pages 185–194, Toulouse, France. ATALA.
Cite (Informal):
Disambiguating automatic semantic annotation based on a thesaurus structure (Malaisé et al., JEP/TALN/RECITAL 2007)
PDF:
https://aclanthology.org/2007.jeptalnrecital-long.18.pdf