ALADAN at IWSLT24 Low-resource Arabic Dialectal Speech Translation Task

Waad Ben Kheder, Josef Jon, André Beyer, Abdel Messaoudi, Rabea Affan, Claude Barras, Maxim Tychonov, Jean-Luc Gauvain


Abstract
This paper presents ALADAN’s approach to the IWSLT 2024 Dialectal and Low-resource shared task, focusing on Levantine Arabic (apc) and Tunisian Arabic (aeb) to English speech translation (ST). To address challenges such as the lack of standardized orthography and limited training data, we propose a data normalization method for Dialectal Arabic that employs a modified Levenshtein distance and Word2vec models to find orthographic variants of the same word. Our system is a cascade ST system that integrates two ASR systems (TDNN-F and Zipformer) and two NMT modules derived from pre-trained models (the distilled NLLB-200 1.3B model and CohereAI’s Command-R). Additionally, we explore the integration of unsupervised textual and audio data, highlighting the importance of multi-dialectal datasets for both ASR and NMT tasks. Our system achieves a BLEU score of 31.5 for Levantine Arabic on the official validation set.
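The normalization idea can be illustrated with a minimal sketch. A Levenshtein distance is "modified" here by giving a reduced substitution cost to Arabic letters that dialectal writers often interchange. A candidate variant is then kept only if its word embedding is also close to the query word. Everything specific below is an assumption for illustration, not the authors' method: the confusable letter groups, the cost of 0.2, both thresholds, and the toy vectors that stand in for a trained Word2vec model.

```python
import math
from typing import Dict, List

# Hypothetical groups of letters treated as cheap substitutions;
# the actual cost scheme used by the authors is not specified here.
CONFUSABLE_GROUPS = [
    {"ا", "أ", "إ", "آ"},  # alef variants
    {"ة", "ه"},            # ta marbuta / ha
    {"ى", "ي"},            # alef maqsura / ya
]
LOW_COST = 0.2  # hypothetical reduced substitution cost


def sub_cost(a: str, b: str) -> float:
    """Substitution cost: 0 if equal, LOW_COST if confusable, else 1."""
    if a == b:
        return 0.0
    for group in CONFUSABLE_GROUPS:
        if a in group and b in group:
            return LOW_COST
    return 1.0


def weighted_levenshtein(s: str, t: str) -> float:
    """Edit distance with unit insert/delete and weighted substitutions."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [float(i)] + [0.0] * len(t)
        for j in range(1, len(t) + 1):
            curr[j] = min(
                prev[j] + 1.0,                                # deletion
                curr[j - 1] + 1.0,                            # insertion
                prev[j - 1] + sub_cost(s[i - 1], t[j - 1]),   # substitution
            )
        prev = curr
    return prev[len(t)]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0


def find_variants(word: str, vectors: Dict[str, List[float]],
                  max_dist: float = 1.0, min_sim: float = 0.5) -> List[str]:
    """Return words that are close both in spelling and in embedding space.

    `vectors` stands in for a trained Word2vec model (hypothetical data).
    """
    if word not in vectors:
        return []
    variants = []
    for other, vec in vectors.items():
        if other == word:
            continue
        if (weighted_levenshtein(word, other) <= max_dist
                and cosine(vectors[word], vec) >= min_sim):
            variants.append(other)
    return sorted(variants)


if __name__ == "__main__":
    toy_vectors = {
        "مدرسة": [1.0, 0.1],  # "school", spelled with ta marbuta
        "مدرسه": [0.9, 0.2],  # same word, spelled with ha
        "مدرس": [0.1, 1.0],   # "teacher": one edit away, different meaning
    }
    print(find_variants("مدرسة", toy_vectors))
```

The two filters complement each other. In the toy data, "مدرس" is within the edit-distance threshold of "مدرسة". Its dissimilar embedding keeps it out, so only the spelling variant "مدرسه" is returned.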
Anthology ID:
2024.iwslt-1.25
Volume:
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)
Month:
August
Year:
2024
Address:
Bangkok, Thailand (in-person and online)
Editors:
Elizabeth Salesky, Marcello Federico, Marine Carpuat
Venue:
IWSLT
Publisher:
Association for Computational Linguistics
Pages:
192–202
URL:
https://aclanthology.org/2024.iwslt-1.25
Cite (ACL):
Waad Ben Kheder, Josef Jon, André Beyer, Abdel Messaoudi, Rabea Affan, Claude Barras, Maxim Tychonov, and Jean-Luc Gauvain. 2024. ALADAN at IWSLT24 Low-resource Arabic Dialectal Speech Translation Task. In Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024), pages 192–202, Bangkok, Thailand (in-person and online). Association for Computational Linguistics.
Cite (Informal):
ALADAN at IWSLT24 Low-resource Arabic Dialectal Speech Translation Task (Ben Kheder et al., IWSLT 2024)
PDF:
https://aclanthology.org/2024.iwslt-1.25.pdf