Enhancing Arabic Dialectal Sentiment Analysis through Advanced Data Augmentation Techniques

Md. Rafiul Biswas, Wajdi Zaghouani


Abstract
This work addresses the challenge of Arabic sentiment analysis across dialects in the hospitality domain by using data augmentation. We built a pipeline that combines three simple techniques: context-based paraphrasing, pattern-based sentence generation, and domain-specific word replacement. The method preserves the dialectal features, meaning, and key classification details of the original text while adding diversity to the training data. It also falls back automatically from one technique to another when a technique fails. We used the Fanar API to augment dialectal data in the hospitality domain. We then fine-tuned the AraBERT-Large-v02 model on the original and augmented data and observed improved performance. This study helps address the scarcity of dialectal data in Arabic NLP and offers a framework that can be applied to other Arabic text analysis tasks.
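The automatic fallback between the three techniques can be illustrated with a minimal Python sketch. This is not the authors' implementation: every function name, the toy lexicon, and the sentence pattern are hypothetical, English text stands in for dialectal Arabic for readability, and the paraphrasing step is a stub because the abstract does not describe how the Fanar API is called.

```python
from typing import Callable, Optional, Sequence

# An augmenter returns a new sentence, or None when it cannot produce one.
Augmenter = Callable[[str], Optional[str]]

# Hypothetical toy lexicon of hospitality terms (assumption, not from the paper).
DOMAIN_SYNONYMS = {"hotel": "resort", "room": "suite", "staff": "team"}


def paraphrase_with_llm(text: str) -> Optional[str]:
    """Placeholder for context-based paraphrasing through an external service.

    The real request format is not given in the source, so this stub always
    reports failure, which forces the fallback path below.
    """
    return None


def pattern_based(text: str) -> Optional[str]:
    """Rewrite 'the X was Y' into a fixed alternative pattern (illustrative)."""
    words = text.rstrip(".").split()
    if len(words) == 4 and words[0].lower() == "the" and words[2] == "was":
        return f"I found the {words[1]} {words[3]}."
    return None


def replace_domain_words(text: str) -> Optional[str]:
    """Swap domain terms using the toy lexicon; None if nothing changed."""
    tokens = text.split()
    swapped = [DOMAIN_SYNONYMS.get(t.lower(), t) for t in tokens]
    return " ".join(swapped) if swapped != tokens else None


def augment(text: str, methods: Sequence[Augmenter]) -> Optional[str]:
    """Try each method in order and return the first usable, changed output."""
    for method in methods:
        try:
            candidate = method(text)
        except Exception:
            candidate = None  # any failure means: fall through to the next method
        if candidate and candidate != text:
            return candidate
    return None


# Order of preference: paraphrasing, then patterns, then word replacement.
PIPELINE = [paraphrase_with_llm, pattern_based, replace_domain_words]
```

In a training setup, each augmented sentence would be stored with the sentiment label of the sentence it was derived from, and inputs for which `augment` returns `None` would simply contribute no extra example.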
Anthology ID:
2025.ranlp-ahasis.4
Volume:
Proceedings of the Shared Task on Sentiment Analysis for Arabic Dialects
Month:
September
Year:
2025
Address:
Varna, Bulgaria
Editors:
Maram Alharbi, Salmane Chafik, Saad Ezzini, Ruslan Mitkov, Tharindu Ranasinghe, Hansi Hettiarachchi
Venues:
RANLP | WS
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
24–28
URL:
https://aclanthology.org/2025.ranlp-ahasis.4/
Cite (ACL):
Md. Rafiul Biswas and Wajdi Zaghouani. 2025. Enhancing Arabic Dialectal Sentiment Analysis through Advanced Data Augmentation Techniques. In Proceedings of the Shared Task on Sentiment Analysis for Arabic Dialects, pages 24–28, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Enhancing Arabic Dialectal Sentiment Analysis through Advanced Data Augmentation Techniques (Biswas & Zaghouani, RANLP 2025)
PDF:
https://aclanthology.org/2025.ranlp-ahasis.4.pdf