Graph-based RAG for Low-Resource Aromanian–Romanian Translation

Laurentiu G. Ghetoiu, Sergiu Nisioi


Abstract
Aromanian, a linguistically and culturally significant yet low-resource Romance language, poses substantial challenges in computational linguistic research due to its limited NLP resources and non-standardized orthography. In this paper, we present an experimental study aimed at translating Aromanian texts into Romanian using a variety of modern NLP methodologies. We leverage two key resources: a parallel corpus consisting of approximately 3,000 sentence-aligned short stories and a dictionary of over 28,000 Aromanian-Romanian word pairs. Our approaches include Retrieval-Augmented Generation (RAG) supported by a graph-based alignment database, fine-tuning multilingual transformer models (specifically Meta’s NLLB), and parameter-efficient fine-tuning techniques such as LoRA applied to LLaMA-derived models. Evaluations using standard metrics (BLEU, chrF) demonstrate varied effectiveness across these methodologies, highlighting the strong performance of NLLB for general translation tasks, while RAG excels in translating familiar content. Our findings underline the complexities inherent in low-resource language translation and provide valuable insights into effective digital preservation and NLP adaptation strategies for underrepresented languages.
Anthology ID:
2025.ranlp-1.47
Volume:
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Month:
September
Year:
2025
Address:
Varna, Bulgaria
Editors:
Galia Angelova, Maria Kunilovskaya, Marie Escribe, Ruslan Mitkov
Venue:
RANLP
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
388–394
URL:
https://aclanthology.org/2025.ranlp-1.47/
Cite (ACL):
Laurentiu G. Ghetoiu and Sergiu Nisioi. 2025. Graph-based RAG for Low-Resource Aromanian–Romanian Translation. In Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era, pages 388–394, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Graph-based RAG for Low-Resource Aromanian–Romanian Translation (Ghetoiu & Nisioi, RANLP 2025)
PDF:
https://aclanthology.org/2025.ranlp-1.47.pdf