Arabic-Adapted One-Step Speech-to-Diacritized ASR: Evaluation and Error Analysis

Osamah A. I. Abduljalil, Dalal Ali, Razan A. Bajaman, Abdullah I. Alharbi


Abstract
Arabic diacritics encode phonetic information essential for pronunciation, disambiguation, and downstream applications, yet most Arabic ASR systems generate undiacritized output. In this work, we study direct speech-to-diacritized-text recognition using a single-stage ASR pipeline that predicts diacritics jointly with Arabic letters, without text-based post-processing. We evaluate two Arabic-adapted ASR architectures—wav2vec 2.0 XLSR-53 and Whisper-base—under a unified experimental setup on the ClArTTS Classical Arabic dataset. Performance is assessed using surface and lexical WER/CER alongside diacritic error rate (DER) to disentangle base transcription accuracy from diacritic realization. Our results show that Arabic-adapted wav2vec 2.0 achieves substantially lower diacritic error rates than Whisper, indicating stronger exploitation of acoustic cues relevant to vowelization. We further analyze the effect of decoding strategy and provide a detailed breakdown of diacritic errors, highlighting challenges associated with short vowels and morphosyntactic markers. These findings underscore the importance of model architecture and Arabic-specific adaptation for accurate diacritized Arabic ASR.
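The split between surface metrics, lexical metrics, and DER can be illustrated with a minimal sketch. It is not the authors' evaluation code, and it rests on assumptions the abstract does not confirm: "surface" is taken to mean scoring the diacritized strings as-is, "lexical" to mean scoring after diacritics are stripped, and DER to mean the fraction of base letters whose attached diacritic marks differ from the reference. The diacritic set (U+064B to U+0652) and all function names are illustrative choices, and the DER function only handles the simple case where reference and hypothesis share the same base letters.

```python
# Assumed diacritic inventory: fathatan (U+064B) through sukun (U+0652).
DIACRITICS = {chr(code) for code in range(0x064B, 0x0653)}


def strip_diacritics(text):
    """Return the text with the assumed diacritic marks removed."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (characters or words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edit distance normalized by the reference length."""
    if not ref_units:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_units, hyp_units) / len(ref_units)


def surface_cer(ref, hyp):
    return error_rate(list(ref), list(hyp))


def lexical_cer(ref, hyp):
    return error_rate(list(strip_diacritics(ref)), list(strip_diacritics(hyp)))


def surface_wer(ref, hyp):
    return error_rate(ref.split(), hyp.split())


def lexical_wer(ref, hyp):
    return error_rate(strip_diacritics(ref).split(), strip_diacritics(hyp).split())


def split_letters(text):
    """Pair each non-space base character with the set of marks that follow it."""
    units = []
    for ch in text:
        if ch in DIACRITICS:
            if units:                 # a mark with no preceding base is ignored
                units[-1][1].add(ch)
        elif not ch.isspace():
            units.append((ch, set()))
    return [(base, frozenset(marks)) for base, marks in units]


def der(ref, hyp):
    """Assumed DER: share of base letters whose diacritic set is wrong.

    Simplification: requires identical base letters in ref and hyp.
    """
    ref_units, hyp_units = split_letters(ref), split_letters(hyp)
    if [b for b, _ in ref_units] != [b for b, _ in hyp_units]:
        raise ValueError("base letters differ; an alignment step would be needed")
    if not ref_units:
        raise ValueError("reference must not be empty")
    errors = sum(rm != hm for (_, rm), (_, hm) in zip(ref_units, hyp_units))
    return errors / len(ref_units)
```

Under these assumptions, a hypothesis that gets every letter right but places a kasra where the reference has a fatha scores zero lexical CER and WER while still incurring nonzero surface error and DER, which is the kind of separation between base transcription and diacritic realization the abstract describes.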
Anthology ID:
2026.abjadnlp-1.43
Volume:
Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script
Month:
March
Year:
2026
Address:
Rabat, Morocco
Venues:
AbjadNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
371–379
URL:
https://aclanthology.org/2026.abjadnlp-1.43/
Cite (ACL):
Osamah A. I. Abduljalil, Dalal Ali, Razan A. Bajaman, and Abdullah I. Alharbi. 2026. Arabic-Adapted One-Step Speech-to-Diacritized ASR: Evaluation and Error Analysis. In Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script, pages 371–379, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Arabic-Adapted One-Step Speech-to-Diacritized ASR: Evaluation and Error Analysis (Abduljalil et al., AbjadNLP 2026)
PDF:
https://aclanthology.org/2026.abjadnlp-1.43.pdf