MariATE: Automatic Term Extraction Using Large Language Models in the Maritime Domain

Shijie Liu; Els Lefever; Veronique Hoste

Abstract

This study presents a comprehensive evaluation of Large Language Models (LLMs) for automatic term extraction in the maritime safety domain. The research examines the zero-shot performance of seven state-of-the-art LLMs, including both open-source and closed-source models, and compares two terminology annotation strategies with respect to coverage. Nested annotation captures both complete technical expressions and their constituent components, while full-term annotation focuses exclusively on maximal-length terms. Experimental results show that Claude-3.5-Sonnet achieves the strongest performance (F1-score of 0.80) in maritime safety terminology extraction, particularly in term boundary detection. Error analysis reveals three primary challenges: distinguishing contextual descriptions from legitimate terminology, handling complex multi-word expressions, and identifying maritime safety operational and navigational terms. Analysis of the annotation strategies shows that the full-term annotation approach covers 95.24% of the unique terms captured by the nested annotation approach. The remaining 4.76% of terms, identified only through nested annotation, are subcomponents of larger technical expressions. These findings advance the understanding of LLMs’ capabilities in specialized terminology extraction and provide empirical evidence that full-term annotation is sufficient for comprehensive terminology coverage in domain-specific applications.
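
As an illustration of the coverage figure reported above, the following is a minimal sketch of how the share of unique nested-annotation terms that also appear under full-term annotation could be computed. It is not the authors' procedure: the function name `unique_term_coverage`, the lowercase and whitespace normalization, and the toy term lists are all assumptions made for this example.

```python
def unique_term_coverage(full_terms, nested_terms):
    """Fraction of unique nested-annotation terms also found in full-term annotation.

    Normalization (strip + lowercase) is an assumption for this sketch,
    not something stated in the abstract.
    """
    full = {t.strip().lower() for t in full_terms}
    nested = {t.strip().lower() for t in nested_terms}
    if not nested:
        raise ValueError("nested_terms must not be empty")
    return len(full & nested) / len(nested)


# Invented toy data: nested annotation also marks subcomponents
# ("jacket", "pump") of the longer expressions.
nested = ["life jacket", "jacket", "bilge pump", "pump", "mooring line"]
full = ["life jacket", "bilge pump", "mooring line"]

coverage = unique_term_coverage(full, nested)
print(f"{coverage:.2%}")
```

In this toy case the two terms missed by full-term annotation are exactly the subcomponents of longer expressions, which mirrors the kind of gap the abstract describes.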

Anthology ID: 2025.ranlp-1.77
Volume: Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Month: September
Year: 2025
Address: Varna, Bulgaria
Editors: Galia Angelova, Maria Kunilovskaya, Marie Escribe, Ruslan Mitkov
Venue: RANLP
Publisher: INCOMA Ltd., Shoumen, Bulgaria
Pages: 663–673
URL: https://aclanthology.org/2025.ranlp-1.77/
Cite (ACL): Shijie Liu, Els Lefever, and Veronique Hoste. 2025. MariATE: Automatic Term Extraction Using Large Language Models in the Maritime Domain. In Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era, pages 663–673, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal): MariATE: Automatic Term Extraction Using Large Language Models in the Maritime Domain (Liu et al., RANLP 2025)
PDF: https://aclanthology.org/2025.ranlp-1.77.pdf
