Alexandru-Iulius Jerpelea
2025
Dialectal and Low Resource Machine Translation for Aromanian
Alexandru-Iulius Jerpelea, Alina Radoi, Sergiu Nisioi
Proceedings of the 31st International Conference on Computational Linguistics
We present the first neural machine translation system that translates between Romanian, English, and Aromanian, an endangered Eastern Romance language. BLEU scores range from 17 to 32 depending on the translation direction and the genre of the text. Alongside the system, we release the largest known Aromanian-Romanian bilingual corpus, consisting of 80k cleaned sentence pairs. We also present additional tools: a language-agnostic sentence embedder (used for both text mining and automatic evaluation) and a diacritics converter. Lastly, we describe the online deployment of our quantized model in a CPU-driven, limited-resource scenario.