Enhancing Arabic NLP Tasks through Character-Level Models and Data Augmentation

Mohanad Mohamed; Sadam Al-Azani

Abstract

This study introduces a character-level approach designed for Arabic NLP tasks that addresses challenges specific to Arabic language processing. It presents a comparative study of character-level models, including Convolutional Neural Networks (CNNs), pre-trained transformers (CANINE), and Bidirectional Long Short-Term Memory networks (BiLSTMs), assessing their performance and examining how different data augmentation techniques affect it. It also introduces two Arabic-specific data augmentation methods, vowel deletion and style transfer, and evaluates their effectiveness. The approach was evaluated on an Arabic privacy policy classification task as a case study, where it achieved a micro-averaged F1-score of 93.8%, surpassing state-of-the-art models.
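
For readers unfamiliar with the headline metric, the following is a minimal sketch of how a micro-averaged F1-score is computed for a multi-label classification task: true positives, false positives, and false negatives are pooled across all labels before precision and recall are combined. The function name `micro_f1` and the label names in the toy data are hypothetical and chosen only for illustration; they are not taken from the paper or its dataset.

```python
from typing import Iterable, Set


def micro_f1(gold: Iterable[Set[str]], pred: Iterable[Set[str]]) -> float:
    """Micro-averaged F1 over multi-label predictions.

    Counts are pooled over every (example, label) pair, so frequent
    labels weigh more than rare ones.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and correct
        fp += len(p - g)   # labels predicted but not in the gold set
        fn += len(g - p)   # gold labels the model missed
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when nothing is labeled.
    return 2 * tp / denom if denom else 0.0


# Toy data with hypothetical label names (not the paper's categories).
gold = [{"data_collection"}, {"data_sharing", "security"}, {"retention"}]
pred = [{"data_collection"}, {"data_sharing"}, {"security"}]
score = micro_f1(gold, pred)
print(f"micro-F1 = {score:.4f}")
```

In this toy example the pooled counts are 2 true positives, 1 false positive, and 2 false negatives, giving 4/7. Whether the paper's evaluation is single-label or multi-label per segment is not stated in the abstract, so the set-based interface here is an assumption.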

Anthology ID: 2025.coling-main.186
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 2744–2757
URL: https://aclanthology.org/2025.coling-main.186/
Cite (ACL): Mohanad Mohamed and Sadam Al-Azani. 2025. Enhancing Arabic NLP Tasks through Character-Level Models and Data Augmentation. In Proceedings of the 31st International Conference on Computational Linguistics, pages 2744–2757, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): Enhancing Arabic NLP Tasks through Character-Level Models and Data Augmentation (Mohamed & Al-Azani, COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.186.pdf
