Analysing the Impact of Supervised Machine Learning on Automatic Term Extraction: HAMLET vs TermoStat

Ayla Rigouts Terryn, Patrick Drouin, Veronique Hoste, Els Lefever


Abstract
Traditional approaches to automatic term extraction do not rely on machine learning (ML); they select the top n ranked candidate terms, or the candidate terms above a predefined cut-off point, based on a limited number of linguistic and statistical clues. However, supervised ML approaches are gaining interest. Relatively little is known about the impact of these supervised methodologies: evaluations are often limited to precision, and sometimes recall and F1-scores, without information about the nature of the extracted candidate terms. Therefore, this paper presents a detailed analysis and comparison of a traditional, state-of-the-art system (TermoStat) and a new, supervised ML approach (HAMLET), using the results obtained for the same manually annotated Dutch corpus about dressage.
Anthology ID: R19-1117
Volume: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Month: September
Year: 2019
Address: Varna, Bulgaria
Venue: RANLP
Publisher: INCOMA Ltd.
Pages: 1012–1021
URL: https://aclanthology.org/R19-1117
DOI: 10.26615/978-954-452-056-4_117
PDF: https://aclanthology.org/R19-1117.pdf