Dialectal and Low Resource Machine Translation for Aromanian

Alexandru-Iulius Jerpelea, Alina Radoi, Sergiu Nisioi


Abstract
We present the first neural machine translation system that can translate between Romanian, English, and Aromanian, an endangered Eastern Romance language. BLEU scores range from 17 to 32, depending on the direction and genre of the text. We also release the largest known Aromanian-Romanian bilingual corpus, consisting of 80k cleaned sentence pairs. Additional tools, such as a language-agnostic sentence embedder (used for both text mining and automatic evaluation) and a diacritics converter, are also presented. Lastly, we describe the online deployment of our quantized model for a CPU-driven, limited-resource scenario.
Anthology ID: 2025.coling-main.481
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 7209–7228
URL: https://aclanthology.org/2025.coling-main.481/
Cite (ACL): Alexandru-Iulius Jerpelea, Alina Radoi, and Sergiu Nisioi. 2025. Dialectal and Low Resource Machine Translation for Aromanian. In Proceedings of the 31st International Conference on Computational Linguistics, pages 7209–7228, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): Dialectal and Low Resource Machine Translation for Aromanian (Jerpelea et al., COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.481.pdf