Automatically Selecting Domain Markers for Terminology Extraction

Jorge Vivaldi; Horacio Rodríguez

Abstract

Some approaches to automatic terminology extraction from corpora rely on existing semantic resources to guide the detection of terms. Most of these systems exploit specialised resources, such as UMLS in the medical domain, while a few try to take advantage of general-purpose semantic resources, such as EuroWordNet (EWN). Because term extraction is clearly domain-dependent, using a general-purpose resource that lacks specific domain information requires a way of attaching domain information to the units of the resource. For large resources, it is desirable that this semantic enrichment be carried out automatically. Given a specific domain, our proposal aims to detect in EWN those units that can be considered domain markers (DMs). We define a DM as an EWN entry whose attached strings (variants) belong to the domain, as do the variants of all its descendants through the hyponymy relation. The procedure we propose in this paper is fully automatic and, a priori, domain-independent. The only external knowledge it uses is a set of terms (an external vocabulary), each of which is assumed to have at least one sense belonging to the domain.
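
The definition of a DM given in the abstract can be illustrated with a small sketch. This is not the procedure from the paper, which the abstract does not detail; it is only a strict, literal reading of the definition applied to an invented toy hierarchy. The `Synset` class, the function names `is_domain_marker` and `find_domain_markers`, and the example vocabulary are all assumptions made for illustration, and a practical system would likely tolerate some variants missing from the vocabulary.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """Hypothetical stand-in for an EWN entry: an id, its variants, its hyponyms."""
    ident: str
    variants: list[str]
    hyponyms: list["Synset"] = field(default_factory=list)


def is_domain_marker(synset: Synset, vocabulary: set[str]) -> bool:
    """Strict reading of the definition: every variant of the entry, and of
    every descendant through hyponymy, must appear in the domain vocabulary."""
    if not all(v in vocabulary for v in synset.variants):
        return False
    return all(is_domain_marker(h, vocabulary) for h in synset.hyponyms)


def find_domain_markers(root: Synset, vocabulary: set[str]) -> list[str]:
    """Return the ids of the topmost entries that pass the strict check."""
    if is_domain_marker(root, vocabulary):
        return [root.ident]  # descendants are already covered by this marker
    found: list[str] = []
    for h in root.hyponyms:
        found.extend(find_domain_markers(h, vocabulary))
    return found


# Invented toy hierarchy and vocabulary, for illustration only.
hepatitis = Synset("hepatitis", ["hepatitis"])
flu = Synset("influenza", ["influenza", "flu"])
disease = Synset("disease", ["disease"], [hepatitis, flu])
chair = Synset("chair", ["chair"])
entity = Synset("entity", ["entity"], [disease, chair])

medical_vocabulary = {"disease", "hepatitis", "influenza", "flu"}
markers = find_domain_markers(entity, medical_vocabulary)
print(markers)
```

In this toy setting only the `disease` entry qualifies, because it and all of its hyponyms have every variant in the vocabulary, while `entity` and `chair` do not.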

Anthology ID: L04-1192
Volume: Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
Month: May
Year: 2004
Address: Lisbon, Portugal
Editors: Maria Teresa Lino, Maria Francisca Xavier, Fátima Ferreira, Rute Costa, Raquel Silva
Venue: LREC
Publisher: European Language Resources Association (ELRA)
URL: http://www.lrec-conf.org/proceedings/lrec2004/pdf/348.pdf
Cite (ACL): Jorge Vivaldi and Horacio Rodríguez. 2004. Automatically Selecting Domain Markers for Terminology Extraction. In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04), Lisbon, Portugal. European Language Resources Association (ELRA).
Cite (Informal): Automatically Selecting Domain Markers for Terminology Extraction (Vivaldi & Rodríguez, LREC 2004)
PDF: http://www.lrec-conf.org/proceedings/lrec2004/pdf/348.pdf
