Text Generation Models for Luxembourgish with Limited Data: A Balanced Multilingual Strategy

Alistair Plum, Tharindu Ranasinghe, Christoph Purschke


Abstract
This paper addresses the challenges of developing language models for less-represented languages, focusing on Luxembourgish. Despite its active development, Luxembourgish faces a scarcity of digital data, a problem exacerbated by Luxembourg’s multilingual context. We propose a novel text generation model based on the T5 architecture that combines the limited Luxembourgish data with equal amounts of German and French data, matched in size and type. We hypothesise that training on Luxembourgish, German, and French together improves cross-lingual transfer and that the resulting model outperforms both monolingual and large multilingual models. To test this hypothesis, we examine whether multilingual or monolingual training is more beneficial for Luxembourgish text generation. For the evaluation, we introduce LuxGen, the first text generation benchmark for Luxembourgish.
Anthology ID:
2025.vardial-1.7
Volume:
Proceedings of the 12th Workshop on NLP for Similar Languages, Varieties and Dialects
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Yves Scherrer, Tommi Jauhiainen, Nikola Ljubešić, Preslav Nakov, Jörg Tiedemann, Marcos Zampieri
Venues:
VarDial | WS
Publisher:
Association for Computational Linguistics
Pages:
93–104
URL:
https://aclanthology.org/2025.vardial-1.7/
Cite (ACL):
Alistair Plum, Tharindu Ranasinghe, and Christoph Purschke. 2025. Text Generation Models for Luxembourgish with Limited Data: A Balanced Multilingual Strategy. In Proceedings of the 12th Workshop on NLP for Similar Languages, Varieties and Dialects, pages 93–104, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Text Generation Models for Luxembourgish with Limited Data: A Balanced Multilingual Strategy (Plum et al., VarDial 2025)
PDF:
https://aclanthology.org/2025.vardial-1.7.pdf