Leveraging Domain Corpora for Enhanced Terminology: The Case of Estonian-English Remote Sensing Termbase

Liisi Jakobson, Jelena Kallas, Erko Jakobson


Abstract
This article addresses methodological issues related to developing domain corpora and a terminological database from scratch. We present an ongoing project focused on creating an Estonian-English Remote Sensing Termbase. First, we describe the compilation process of the Estonian Remote Sensing Corpus 2022, which served as the primary data source for the termbase. The corpus was compiled by crawling the web and adding files using the Corpus Query System Sketch Engine (Kilgarriff et al., 2004). In the next step, we employed the Term Extraction module (Kilgarriff et al., 2014; Fišer et al., 2016; Blahuš et al., 2023) to identify terms, which were subsequently registered in the Estonian Remote Sensing Termbase using the Dictionary Writing System Ekilex (Tavast et al., 2018). For each term, we provided definitions, variants, and usage contexts. In the final stage, remote sensing experts reviewed and edited the terms, their variants, and usage contexts. Finally, we provide insights and outline directions for future work in this area.
Anthology ID:
2024.lrec-main.904
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
10347–10351
URL:
https://aclanthology.org/2024.lrec-main.904
Cite (ACL):
Liisi Jakobson, Jelena Kallas, and Erko Jakobson. 2024. Leveraging Domain Corpora for Enhanced Terminology: The Case of Estonian-English Remote Sensing Termbase. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 10347–10351, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Leveraging Domain Corpora for Enhanced Terminology: The Case of Estonian-English Remote Sensing Termbase (Jakobson et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.904.pdf