Evaluation of Large Language Models on Arabic Punctuation Prediction

Asma Ali Al Wazrah, Afrah Altamimi, Hawra Aljasim, Waad Alshammari, Rawan Al-Matham, Omar Elnashar, Mohamed Amin, Abdulrahman AlOsaimy


Abstract
The linguistic inclusivity of Large Language Models (LLMs) such as ChatGPT, Gemini, JAIS, and AceGPT has not been sufficiently explored, particularly in how they handle low-resource languages such as Arabic compared with English. Although these models have shown impressive performance across a wide range of tasks, their effectiveness in Arabic remains under-examined. Punctuation is critical for sentence structure and comprehension in tasks such as speech analysis, speech synthesis, and machine translation, so it must be predicted precisely. This paper assesses seven LLMs for Arabic punctuation prediction: GPT-4o, Gemini 1.5, JAIS, AceGPT, SILMA, ALLaM, and Command R+. In addition, a fine-tuned AraBERT model is compared with these LLMs, which are evaluated in zero-shot and few-shot settings, using a proposed Arabic punctuation prediction corpus of 10,044 sentences. The experiments show that while AraBERT performs well on specific punctuation marks, the LLMs show significant promise in zero-shot learning and improve further in few-shot scenarios. These findings highlight the potential of LLMs to improve the automation and accuracy of Arabic text processing.
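The abstract does not say how predictions are scored. As an illustration only, punctuation prediction is commonly framed as labeling each word with the mark (if any) that follows it, which makes a per-mark comparison such as the AraBERT finding above straightforward. The sketch below assumes that framing; the label set `PUNCT_MARKS`, the helpers `to_labels` and `per_mark_scores`, and the example sentences are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical label set (assumption): the abstract does not list the marks covered.
PUNCT_MARKS = ("،", "؛", "؟", ".", "!", ":")
NO_PUNCT = "O"  # label for a word that is not followed by punctuation


def to_labels(text: str) -> tuple[list[str], list[str]]:
    """Split punctuated text into bare words and the mark following each word."""
    words: list[str] = []
    labels: list[str] = []
    for token in text.split():
        label = NO_PUNCT
        # Peel a single trailing mark off the token, if present.
        if token and token[-1] in PUNCT_MARKS:
            label = token[-1]
            token = token[:-1]
        if token:
            words.append(token)
            labels.append(label)
        elif labels:
            # A detached mark (e.g. "word ،") belongs to the previous word.
            labels[-1] = label
    return words, labels


def per_mark_scores(gold: list[str], pred: list[str]) -> dict[str, dict[str, float]]:
    """Precision, recall and F1 for each mark, over word-aligned label sequences."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be aligned word by word")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            if g != NO_PUNCT:
                tp[g] += 1
        else:
            if p != NO_PUNCT:
                fp[p] += 1  # predicted a mark that is not in the gold text
            if g != NO_PUNCT:
                fn[g] += 1  # missed (or substituted) a gold mark
    scores: dict[str, dict[str, float]] = {}
    for mark in sorted(set(tp) | set(fp) | set(fn)):
        prec = tp[mark] / (tp[mark] + fp[mark]) if tp[mark] + fp[mark] else 0.0
        rec = tp[mark] / (tp[mark] + fn[mark]) if tp[mark] + fn[mark] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[mark] = {"precision": prec, "recall": rec, "f1": f1}
    return scores


if __name__ == "__main__":
    # Hypothetical gold sentence and a hypothetical model output for it.
    _, gold = to_labels("ذهبت إلى السوق، واشتريت خبزا. هل أنت جائع؟")
    _, pred = to_labels("ذهبت إلى السوق واشتريت خبزا. هل أنت جائع.")
    for mark, s in per_mark_scores(gold, pred).items():
        print(f"{mark}: P={s['precision']:.2f} R={s['recall']:.2f} F1={s['f1']:.2f}")
```

Scoring each mark separately, rather than reporting a single accuracy over all words, keeps the many unpunctuated words from dominating the result and shows which marks a model handles well.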
Anthology ID: 2025.abjadnlp-1.15
Volume: Proceedings of the 1st Workshop on NLP for Languages Using Arabic Script
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editor: Mo El-Haj
Venues: AbjadNLP | WS
Publisher: Association for Computational Linguistics
Pages: 144–154
URL: https://aclanthology.org/2025.abjadnlp-1.15/
Cite (ACL): Asma Ali Al Wazrah, Afrah Altamimi, Hawra Aljasim, Waad Alshammari, Rawan Al-Matham, Omar Elnashar, Mohamed Amin, and Abdulrahman AlOsaimy. 2025. Evaluation of Large Language Models on Arabic Punctuation Prediction. In Proceedings of the 1st Workshop on NLP for Languages Using Arabic Script, pages 144–154, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): Evaluation of Large Language Models on Arabic Punctuation Prediction (Al Wazrah et al., AbjadNLP 2025)
PDF: https://aclanthology.org/2025.abjadnlp-1.15.pdf