Automatic Term Recognition Needs Multiple Evidence

Natalia Loukachevitch


Abstract
In this paper we argue that automatic term extraction is an inherently multifactor process and that term extraction models need to be based on multiple features, including the specific type of terminological resource under development. We propose three types of features for the extraction of two-word terms and show that all three types are useful for term extraction. The feature set includes new features, such as features extracted from an existing domain-specific thesaurus and features based on Internet search results. We study this feature set in two different domains and show that combining several types of features considerably improves the quality of term extraction. We also find that, when developing term extraction models for a specific domain, it is important to take into account some properties of that domain.
Anthology ID:
L12-1532
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2401–2407
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/893_Paper.pdf
Cite (ACL):
Natalia Loukachevitch. 2012. Automatic Term Recognition Needs Multiple Evidence. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 2401–2407, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Automatic Term Recognition Needs Multiple Evidence (Loukachevitch, LREC 2012)