Laurentiu G. Ghetoiu
2025
Graph-based RAG for Low-Resource Aromanian–Romanian Translation
Laurentiu G. Ghetoiu
|
Sergiu Nisioi
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Aromanian, a linguistically and culturally significant yet low-resource Romance language, poses substantial challenges in computational linguistic research due to its limited NLP resources and non-standardized orthography. In this paper, we present an experimental study aimed at translating Aromanian texts into Romanian using a variety of modern NLP methodologies. We leverage two key resources: a parallel corpus consisting of approximately 3,000 sentence-aligned short stories and a dictionary of over 28,000 Aromanian-Romanian word pairs. Our approaches include Retrieval-Augmented Generation (RAG) supported by a graph-based alignment database, fine-tuning multilingual transformer models (specifically Meta’s NLLB), and parameter-efficient fine-tuning techniques such as LoRA applied to LLaMA-derived models. Evaluations using standard metrics (BLEU, chrF) demonstrate varied effectiveness across these methodologies, highlighting the strong performance of NLLB for general translation tasks, while RAG excels in translating familiar content. Our findings underline the complexities inherent in low-resource language translation and provide valuable insights into effective digital preservation and NLP adaptation strategies for underrepresented languages.