Language verY Rare for All

Ibrahim Merad, Amos Wolf, Ziad Mazzawi, Yannick Léo


Abstract
In the quest to overcome language barriers, encoder-decoder models like NLLB have expanded machine translation to rare languages, with some models (e.g., NLLB 1.3B) even trainable on a single GPU. While general-purpose LLMs perform well in translation, open LLMs prove highly competitive when fine-tuned for specific tasks involving unknown corpora. We introduce LYRA (Language verY Rare for All), a novel approach that combines open LLM fine-tuning, retrieval-augmented generation (RAG), and transfer learning from related high-resource languages. To ease adoption, this study focuses exclusively on single-GPU training. We evaluate two-way translation between French and Monégasque, a rare language unsupported by existing translation tools due to limited corpus availability. Our results demonstrate LYRA’s effectiveness, consistently matching and frequently surpassing state-of-the-art encoder-decoder models in rare language translation.
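As a rough illustration of the ingredients the abstract names (open LLM fine-tuning on a single GPU plus retrieval-augmented prompts), the sketch below shows one common way to set this up with Hugging Face transformers and PEFT. It is not the authors' implementation; the model name, LoRA hyperparameters, prompt template, and retrieval source are all illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's code): single-GPU LoRA fine-tuning
# of an open LLM for French <-> Monégasque translation, with retrieved example
# pairs prepended to each prompt as the RAG step.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "mistralai/Mistral-7B-v0.1"  # placeholder open LLM, not necessarily the one used
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16).to("cuda")

# LoRA adapters keep the trainable parameter count small enough for one GPU.
lora_config = LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)

def build_prompt(source_fr: str, retrieved_pairs: list[tuple[str, str]]) -> str:
    """Prepend retrieved French/Monégasque sentence pairs as in-context examples
    (hypothetical prompt format) before the sentence to translate."""
    examples = "\n".join(f"French: {fr}\nMonégasque: {mc}" for fr, mc in retrieved_pairs)
    return f"{examples}\nFrench: {source_fr}\nMonégasque:"
```

A standard supervised fine-tuning loop over such prompts would then update only the LoRA parameters, which is what makes single-GPU training feasible in this kind of setup.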
Anthology ID: 2025.loreslm-1.12
Volume: Proceedings of the First Workshop on Language Models for Low-Resource Languages
Month: January
Year: 2025
Address: Abu Dhabi, United Arab Emirates
Editors: Hansi Hettiarachchi, Tharindu Ranasinghe, Paul Rayson, Ruslan Mitkov, Mohamed Gaber, Damith Premasiri, Fiona Anting Tan, Lasitha Uyangodage
Venues: LoResLM | WS
Publisher: Association for Computational Linguistics
Pages: 166–174
URL: https://aclanthology.org/2025.loreslm-1.12/
Cite (ACL): Ibrahim Merad, Amos Wolf, Ziad Mazzawi, and Yannick Léo. 2025. Language verY Rare for All. In Proceedings of the First Workshop on Language Models for Low-Resource Languages, pages 166–174, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Language verY Rare for All (Merad et al., LoResLM 2025)
PDF: https://aclanthology.org/2025.loreslm-1.12.pdf