Afrah Altamimi
2025
Evaluation of Large Language Models on Arabic Punctuation Prediction
Asma Ali Al Wazrah | Afrah Altamimi | Hawra Aljasim | Waad Alshammari | Rawan Al-Matham | Omar Elnashar | Mohamed Amin | Abdulrahman AlOsaimy
Proceedings of the 1st Workshop on NLP for Languages Using Arabic Script
The linguistic inclusivity of Large Language Models (LLMs) such as ChatGPT, Gemini, JAIS, and AceGPT has not been sufficiently explored, particularly in their handling of low-resource languages like Arabic compared to English. While these models have shown impressive performance across various tasks, their effectiveness in Arabic remains under-examined. Punctuation, which is critical for sentence structure and comprehension in tasks such as speech analysis, synthesis, and machine translation, requires precise prediction. This paper assesses seven LLMs, GPT-4o, Gemini 1.5, JAIS, AceGPT, SILMA, ALLaM, and Command R+, on Arabic punctuation prediction. Additionally, the performance of fine-tuned AraBERT is compared with these models in zero-shot and few-shot settings using a proposed Arabic punctuation prediction corpus of 10,044 sentences. The experiments demonstrate that while AraBERT performs well for specific punctuation marks, LLMs show significant promise in zero-shot learning, with further improvements in few-shot scenarios. These findings highlight the potential of LLMs to enhance the automation and accuracy of Arabic text processing.
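Punctuation prediction of this kind is conventionally scored per mark: the model's output is aligned with the gold text and precision, recall, and F1 are computed for each punctuation symbol. The sketch below is hypothetical and not the paper's evaluation code; the mark set and the assumption of pre-aligned token positions are illustrative.

```python
# Hypothetical per-mark scoring sketch for punctuation prediction
# (illustrative only; not the paper's actual evaluation pipeline).
from collections import Counter

MARKS = {"،", "؛", "؟", ".", ":"}  # assumed Arabic punctuation set

def per_mark_f1(gold: list[str], pred: list[str]) -> dict[str, float]:
    """Compute per-mark F1 over two aligned sequences, where each
    position holds a punctuation mark or "" (no punctuation)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if p in MARKS and p == g:
            tp[p] += 1
        else:
            if p in MARKS:
                fp[p] += 1  # predicted a mark that is not in the gold
            if g in MARKS:
                fn[g] += 1  # missed (or mislabeled) a gold mark
    f1 = {}
    for m in MARKS:
        prec = tp[m] / (tp[m] + fp[m]) if tp[m] + fp[m] else 0.0
        rec = tp[m] / (tp[m] + fn[m]) if tp[m] + fn[m] else 0.0
        f1[m] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return f1

print(per_mark_f1(["", "،", "", "."], ["", "،", "؟", "."]))
```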
2023
KSAA-RD Shared Task: Arabic Reverse Dictionary
Rawan Al-Matham | Waad Alshammari | Abdulrahman AlOsaimy | Sarah Alhumoud | Asma Wazrah | Afrah Altamimi | Halah Alharbi | Abdullah Alaifi
Proceedings of ArabicNLP 2023
This paper outlines the KSAA-RD shared task, which aims to develop a Reverse Dictionary (RD) system for the Arabic language. RDs allow users to find words based on their meanings or definitions. This shared task, KSAA-RD, includes two subtasks: Arabic RD and cross-lingual reverse dictionaries (CLRD). Given a definition (referred to as a “gloss”) in either Arabic or English, the teams compete to predict the word embedding closest to that of the corresponding word. The winning team achieved 24.20 and 12.70 for RD and CLRD, respectively, in terms of the rank metric. In this paper, we describe the methods employed by the participating teams and offer an outlook for KSAA-RD.
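In a reverse-dictionary setup, the gloss is encoded into the same vector space as the vocabulary, candidates are ranked by similarity to the gloss embedding, and the rank metric is the position of the target word in that ranking. A minimal sketch, using randomly initialized stand-in embeddings and a noisy vector in place of a trained gloss encoder:

```python
# Minimal reverse-dictionary ranking sketch (illustrative; the
# embeddings and "gloss encoder" are stand-ins, not task baselines).
import numpy as np

rng = np.random.default_rng(0)
vocab = ["kitab", "qalam", "bayt"]            # candidate words
word_vecs = rng.normal(size=(len(vocab), 8))  # stand-in word embeddings

def rank_of_target(gloss_vec: np.ndarray, target: str) -> int:
    """Rank candidates by cosine similarity to the gloss embedding
    and return the 1-based rank of the target word."""
    sims = word_vecs @ gloss_vec / (
        np.linalg.norm(word_vecs, axis=1) * np.linalg.norm(gloss_vec)
    )
    order = np.argsort(-sims)                 # most similar first
    return int(np.where(order == vocab.index(target))[0][0]) + 1

# A noisy vector near "kitab" stands in for an encoded gloss.
gloss_vec = word_vecs[0] + 0.1 * rng.normal(size=8)
print(rank_of_target(gloss_vec, "kitab"))     # ideally rank 1
```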