AraSim: Optimizing Arabic Dialect Translation in Children’s Literature with LLMs and Similarity Scores

Alaa Hassan Bouomar, Noorhan Abbas


Abstract
This paper addresses the linguistic gap faced by young Egyptian Arabic speakers by translating children's stories from Modern Standard Arabic into the Egyptian Cairo dialect. Claude is used for the initial translation, and a fine-tuned AraT5 model is used for back-translation. Translation quality is assessed using semantic similarity and BLEU scores that compare the original texts with the translations. The resulting corpus contains 130 stories, which were revised by native Egyptian speakers who are professional translators. The paper has several strengths: it works on a less-resourced variety, addresses an important social issue, creates a dataset with potential real-life applications, and ensures the quality of that dataset through human validation.
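The round-trip evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes whitespace tokenization, a simplified sentence-level BLEU with add-one smoothing, and a bag-of-words cosine as a lightweight stand-in for an embedding-based semantic similarity score. The function names and the sample sentences are invented for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(reference, candidate, max_n=4):
    """Simplified sentence-level BLEU (assumption: add-one smoothing, uniform weights)."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        ref_counts = ngrams(reference, n)
        cand_counts = ngrams(candidate, n)
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        log_precisions.append(math.log((clipped + 1) / (total + 1)))
    # Brevity penalty discourages candidates shorter than the reference.
    r_len, c_len = len(reference), len(candidate)
    bp = 1.0 if c_len >= r_len else math.exp(1 - r_len / c_len)
    return bp * math.exp(sum(log_precisions) / max_n)


def cosine_similarity(a_tokens, b_tokens):
    """Bag-of-words cosine; a stand-in for an embedding-based similarity."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Invented toy sentences: an MSA source and a hypothetical back-translation.
original = "ذهب الولد إلى المدرسة".split()
back_translated = "ذهب الولد للمدرسة".split()

print("BLEU:", round(bleu(original, back_translated), 3))
print("Cosine:", round(cosine_similarity(original, back_translated), 3))
```

In a real pipeline the cosine would typically be computed over sentence embeddings rather than token counts, and BLEU would be computed at corpus level with a standard toolkit so that scores are comparable across studies.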
Anthology ID:
2025.wacl-1.11
Volume:
Proceedings of the 4th Workshop on Arabic Corpus Linguistics (WACL-4)
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Saad Ezzini, Hamza Alami, Ismail Berrada, Abdessamad Benlahbib, Abdelkader El Mahdaouy, Salima Lamsiyah, Hatim Derrouz, Amal Haddad Haddad, Mustafa Jarrar, Mo El-Haj, Ruslan Mitkov, Paul Rayson
Venues:
WACL | WS
Publisher:
Association for Computational Linguistics
Pages:
93–102
URL:
https://aclanthology.org/2025.wacl-1.11/
Cite (ACL):
Alaa Hassan Bouomar and Noorhan Abbas. 2025. AraSim: Optimizing Arabic Dialect Translation in Children’s Literature with LLMs and Similarity Scores. In Proceedings of the 4th Workshop on Arabic Corpus Linguistics (WACL-4), pages 93–102, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
AraSim: Optimizing Arabic Dialect Translation in Children’s Literature with LLMs and Similarity Scores (Bouomar & Abbas, WACL 2025)
PDF:
https://aclanthology.org/2025.wacl-1.11.pdf
Optional supplementary material:
2025.wacl-1.11.OptionalSupplementaryMaterial.zip