Lahjawi: Arabic Cross-Dialect Translator

Mohamed Motasim Hamed, Muhammad Hreden, Khalil Hennara, Zeina Aldallal, Sara Chrouf, Safwan AlModhayan


Abstract
In this paper, we explore the rich diversity of Arabic dialects by introducing a suite of pioneering models called Lahjawi. The primary model, Lahjawi-D2D, is the first designed for cross-dialect translation among 15 Arabic dialects. Furthermore, we introduce Lahjawi-D2MSA, a model designed to convert any Arabic dialect into Modern Standard Arabic (MSA). Both models are fine-tuned versions of Kuwain-1.5B an in-house built small language model, tailored for Arabic linguistic characteristics. We provide a detailed overview of Lahjawi’s architecture and training methods, along with a comprehensive evaluation of its performance. The results demonstrate Lahjawi’s success in preserving meaning and style, with BLEU scores of 9.62 for dialect-to-MSA and 9.88 for dialect-to- dialect tasks. Additionally, human evaluation reveals an accuracy score of 58% and a fluency score of 78%, underscoring Lahjawi’s robust handling of diverse dialectal nuances. This research sets a foundation for future advancements in Arabic NLP and cross-dialect communication technologies.
Anthology ID:
2025.wacl-1.2
Volume:
Proceedings of the 4th Workshop on Arabic Corpus Linguistics (WACL-4)
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Saad Ezzini, Hamza Alami, Ismail Berrada, Abdessamad Benlahbib, Abdelkader El Mahdaouy, Salima Lamsiyah, Hatim Derrouz, Amal Haddad Haddad, Mustafa Jarrar, Mo El-Haj, Ruslan Mitkov, Paul Rayson
Venues:
WACL | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
12–24
Language:
URL:
https://aclanthology.org/2025.wacl-1.2/
DOI:
Bibkey:
Cite (ACL):
Mohamed Motasim Hamed, Muhammad Hreden, Khalil Hennara, Zeina Aldallal, Sara Chrouf, and Safwan AlModhayan. 2025. Lahjawi: Arabic Cross-Dialect Translator. In Proceedings of the 4th Workshop on Arabic Corpus Linguistics (WACL-4), pages 12–24, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Lahjawi: Arabic Cross-Dialect Translator (Hamed et al., WACL 2025)
Copy Citation:
PDF:
https://aclanthology.org/2025.wacl-1.2.pdf