Terminology extraction using co-occurrence patterns as predictors of semantic relevance

Rogelio Nazar, David Lindemann


Abstract
We propose a method for automatic term extraction based on a statistical measure that ranks term candidates according to their semantic relevance to a specialised domain. As a measure of relevance we use term co-occurrence, defined as the repeated occurrence of two terms in the same sentence, in any order and at variable distances. Term candidates are thus ranked higher if they tend to co-occur with a selected group of other units, and lower if their co-occurrences are spread more uniformly. The method requires no external resources, but its performance improves when it is provided with a pre-existing term list. We present results of applying the method to a Spanish-English Linguistics corpus; in our evaluation it compares favourably with a standard method based on reference corpora.
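The ranking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' measure, which the abstract does not specify. It assumes pre-tokenised sentences and scores each candidate by one minus the normalised entropy of its sentence-level co-occurrence partners, so that candidates with concentrated partners score higher than those with uniform ones. The function names `cooccurrence_counts`, `concentration`, and `rank_candidates` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(sentences, candidates):
    """Count how often each candidate shares a sentence with each other candidate.

    Order and distance inside the sentence are ignored.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        present = sorted(set(sentence) & candidates)
        for a, b in combinations(present, 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def concentration(partner_counts, n_candidates):
    """Return 1 minus the normalised entropy of the partner distribution.

    Assumed scoring choice: about 0 means co-occurrences are spread evenly
    over all other candidates, values near 1 mean they are concentrated.
    """
    total = sum(partner_counts.values())
    if total == 0 or n_candidates < 3:
        return 0.0
    entropy = -sum(
        (c / total) * math.log(c / total) for c in partner_counts.values()
    )
    return 1.0 - entropy / math.log(n_candidates - 1)


def rank_candidates(sentences, candidates):
    """Rank candidates from most to least concentrated co-occurrence."""
    candidates = set(candidates)
    counts = cooccurrence_counts(sentences, candidates)
    scores = {t: concentration(counts[t], len(candidates)) for t in candidates}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data, invented for illustration only.
sentences = [
    ["morpheme", "suffix", "example"],
    ["morpheme", "suffix"],
    ["morpheme", "suffix", "case"],
    ["example", "morpheme"],
    ["example", "suffix"],
    ["example", "case"],
]
candidates = {"morpheme", "suffix", "example", "case"}
ranking = rank_candidates(sentences, candidates)
for term, score in ranking:
    print(f"{term}\t{score:.3f}")
```

In this toy input, "case" co-occurs equally often with every other candidate and therefore ends up at the bottom of the ranking. The sketch applies no frequency threshold, so a candidate seen with a single partner only once would receive the maximum score; a real system would need to guard against that.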
Anthology ID:
2022.term-1.5
Volume:
Proceedings of the Workshop on Terminology in the 21st century: many faces, many places
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Rute Costa, Sara Carvalho, Ana Ostroški Anić, Anas Fahad Khan
Venue:
TERM
Publisher:
European Language Resources Association
Pages:
26–29
URL:
https://aclanthology.org/2022.term-1.5
Cite (ACL):
Rogelio Nazar and David Lindemann. 2022. Terminology extraction using co-occurrence patterns as predictors of semantic relevance. In Proceedings of the Workshop on Terminology in the 21st century: many faces, many places, pages 26–29, Marseille, France. European Language Resources Association.
Cite (Informal):
Terminology extraction using co-occurrence patterns as predictors of semantic relevance (Nazar & Lindemann, TERM 2022)
PDF:
https://aclanthology.org/2022.term-1.5.pdf