Bilingual Terminology Alignment Using Contextualized Embeddings

Imene Setha, Hassina Aliane


Abstract
Terminology alignment poses major challenges in NLP because of the dynamic nature of terms. Fortunately, in recent years, deep learning models have made strong progress on several NLP tasks such as multilingual data resourcing, glossary building, and terminology understanding. In this work, we propose a new method for terminology alignment from a comparable corpus (Arabic/French) in the field of Algerian culture. We aim to improve bilingual alignment based on the contextual information of a term and to create a substantial term bank, i.e., a bilingual Arabic-French dictionary. We propose to create word embeddings for both Arabic and French using the ELMo model, focusing on the contextual features of terms. Then, we map those embeddings using a Seq2seq model. We use multilingual BERT and All-MiniLM-L6 as baseline models to compare terminology alignment results. Lastly, we study the performance of these models by applying evaluation methods. Experiments showed quite satisfactory alignment results.
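As an illustration only: once each term has an embedding, one common way to read off alignments is to pair every source term with its most similar target term by cosine similarity. The abstract does not state that the authors or their baselines use this rule, and the sketch below does not reproduce the ELMo or Seq2seq steps. The function names (`cosine`, `align_terms`), the example terms, and the three-dimensional toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def align_terms(source_vectors, target_vectors):
    """Pair each source term with the target term whose vector is most similar.

    Assumption: both sets of vectors already live in a shared space,
    for example after some cross-lingual mapping step.
    """
    alignments = {}
    for src_term, src_vec in source_vectors.items():
        best = max(target_vectors, key=lambda t: cosine(src_vec, target_vectors[t]))
        alignments[src_term] = (best, cosine(src_vec, target_vectors[best]))
    return alignments


# Hypothetical toy vectors; real contextual embeddings have hundreds of dimensions.
arabic_vectors = {
    "كسكس": [0.9, 0.1, 0.0],
    "قصبة": [0.1, 0.8, 0.3],
}
french_vectors = {
    "couscous": [0.8, 0.2, 0.1],
    "casbah": [0.0, 0.9, 0.2],
    "burnous": [0.1, 0.1, 0.9],
}

alignments = align_terms(arabic_vectors, french_vectors)
for src_term, (tgt_term, score) in alignments.items():
    print(src_term, "->", tgt_term, round(score, 3))
```

Greedy nearest-neighbour matching like this can assign the same target term to several source terms; a one-to-one constraint or a similarity threshold would be a separate design choice.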
Anthology ID:
2023.contents-1.1
Volume:
Proceedings of the Workshop on Computational Terminology in NLP and Translation Studies (ConTeNTS) Incorporating the 16th Workshop on Building and Using Comparable Corpora (BUCC)
Month:
September
Year:
2023
Address:
Varna, Bulgaria
Editors:
Amal Haddad Haddad, Ayla Rigouts Terryn, Ruslan Mitkov, Reinhard Rapp, Pierre Zweigenbaum, Serge Sharoff
Venues:
ConTeNTS | WS
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
1–8
URL:
https://aclanthology.org/2023.contents-1.1
Cite (ACL):
Imene Setha and Hassina Aliane. 2023. Bilingual Terminology Alignment Using Contextualized Embeddings. In Proceedings of the Workshop on Computational Terminology in NLP and Translation Studies (ConTeNTS) Incorporating the 16th Workshop on Building and Using Comparable Corpora (BUCC), pages 1–8, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Bilingual Terminology Alignment Using Contextualized Embeddings (Setha & Aliane, ConTeNTS-WS 2023)
PDF:
https://aclanthology.org/2023.contents-1.1.pdf