2025
Can LLMs Translate Cultural Nuance in Dialects? A Case Study on Lebanese Arabic
Silvana Yakhni | Ali Chehab
Proceedings of the 1st Workshop on NLP for Languages Using Arabic Script
Machine Translation (MT) of Arabic-script languages presents unique challenges due to their vast linguistic diversity and lack of standardization. This paper focuses on the Lebanese dialect, investigating the effectiveness of Large Language Models (LLMs) in producing culturally aware translations. We identify critical limitations in existing Lebanese-English parallel datasets, particularly their non-native origin and lack of cultural context. To address these gaps, we introduce a new culturally rich dataset derived from the Language Wave (LW) podcast. We evaluate LLMs (Jais, AceGPT, Cohere, and GPT-4) against Neural Machine Translation (NMT) systems (NLLB-200 and Google Translate). Our findings reveal that while both architectures perform similarly on non-native datasets, LLMs are better at preserving cultural nuance when handling authentic Lebanese content. Additionally, we validate xCOMET as a reliable metric for evaluating the quality of Arabic dialect translation, showing a strong correlation with human judgment. This work contributes to the growing field of Culturally-Aware Machine Translation and highlights the importance of authentic, culturally representative datasets in advancing low-resource translation systems.
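
For context, xCOMET-style scoring of the kind described in the abstract can be run with Unbabel's comet library. The snippet below is a minimal sketch, not the paper's actual pipeline: the checkpoint name (Unbabel/XCOMET-XL) and the placeholder segments are assumptions for illustration only.

# Minimal sketch: scoring MT hypotheses with an xCOMET checkpoint.
# Model name and segments are illustrative, not the paper's data.
from comet import download_model, load_from_checkpoint

# Download and load the checkpoint (the Hugging Face model may require accepting a license).
model_path = download_model("Unbabel/XCOMET-XL")
model = load_from_checkpoint(model_path)

# Each item pairs a Lebanese-Arabic source with a system output and a human reference.
data = [
    {
        "src": "...",   # Lebanese Arabic source segment (placeholder)
        "mt": "...",    # candidate translation from an LLM or NMT system (placeholder)
        "ref": "...",   # human reference translation (placeholder)
    },
]

# predict() returns segment-level scores and a corpus-level system score.
output = model.predict(data, batch_size=8, gpus=0)
print(output.scores)        # per-segment quality estimates
print(output.system_score)  # aggregate score for the evaluated system

Segment-level scores produced this way can then be correlated with human ratings (e.g., Spearman correlation) to check how well the metric tracks human judgment, which is the kind of validation the abstract reports.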