Uncertainty-driven Partial Diacritization for Arabic Text

Humaid Ali Alblooshi, Artem Shelmanov, Hanan Aldarmaki


Abstract
We present an uncertainty-based approach to Partial Diacritization (PD) for Arabic text. We evaluate three uncertainty metrics for this task: Softmax Response, BALD via MC-dropout, and Mahalanobis Distance. We further introduce a lightweight Confident Error Regularizer to improve model calibration. Our preliminary exploration illustrates ways to use uncertainty estimation to selectively retain or discard diacritics in Arabic text, and analyzes how well uncertainty scores correlate with diacritic error rates. For instance, words with high diacritic error rates tend to have higher uncertainty scores at inference time, so the model can be used to detect them. On the Tashkeela dataset, the method maintains a low Diacritic Error Rate while reducing the number of visible diacritics in the text by up to 50% with thresholding-based retention.
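To make the idea of thresholding-based retention concrete, here is a minimal Python sketch. It is not the authors' implementation: it assumes Softmax Response is computed as one minus the maximum softmax probability, and it assumes a predicted diacritic is kept only when that uncertainty is at or below a threshold (the abstract does not state the retention direction). The function names, the toy logits, and the threshold value are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_response_uncertainty(logits):
    """Assumed definition: 1 - max softmax probability (higher = less certain)."""
    return 1.0 - max(softmax(logits))


def partially_diacritize(chars, diacritics, logits_per_char, threshold):
    """Keep a predicted diacritic only if its uncertainty is <= threshold.

    chars:           base characters of the text
    diacritics:      predicted diacritic per character ("" for none)
    logits_per_char: model logits over diacritic classes, one list per character
    threshold:       hypothetical cut-off; lower values hide more diacritics
    """
    out = []
    for ch, dia, logits in zip(chars, diacritics, logits_per_char):
        out.append(ch)
        if dia and softmax_response_uncertainty(logits) <= threshold:
            out.append(dia)
    return "".join(out)


# Toy example: three letters, each predicted to carry a fatha (U+064E).
FATHA = "\u064E"
chars = ["\u0643", "\u062A", "\u0628"]          # kaf, ta, ba
diacritics = [FATHA, FATHA, FATHA]
logits_per_char = [
    [5.0, 0.0, 0.0],   # confident prediction
    [1.0, 0.9, 0.8],   # near-uniform, i.e. uncertain prediction
    [4.0, 0.0, 0.0],   # confident prediction
]
result = partially_diacritize(chars, diacritics, logits_per_char, threshold=0.2)
```

In this toy input the middle letter has a near-uniform distribution, so under the assumed rule its diacritic is hidden while the other two are kept. Swapping in a different uncertainty score (for example one based on MC-dropout) would only require replacing `softmax_response_uncertainty`.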
Anthology ID:
2025.uncertainlp-main.1
Volume:
Proceedings of the 2nd Workshop on Uncertainty-Aware NLP (UncertaiNLP 2025)
Month:
November
Year:
2025
Address:
Suzhou, China
Editor:
Noidea Noidea
Venues:
UncertaiNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
1–10
URL:
https://aclanthology.org/2025.uncertainlp-main.1/
Cite (ACL):
Humaid Ali Alblooshi, Artem Shelmanov, and Hanan Aldarmaki. 2025. Uncertainty-driven Partial Diacritization for Arabic Text. In Proceedings of the 2nd Workshop on Uncertainty-Aware NLP (UncertaiNLP 2025), pages 1–10, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Uncertainty-driven Partial Diacritization for Arabic Text (Ali Alblooshi et al., UncertaiNLP 2025)
PDF:
https://aclanthology.org/2025.uncertainlp-main.1.pdf