ONE: Toward ONE model, ONE algorithm, ONE corpus dedicated to sentiment analysis of Arabic/Arabizi and its dialects

Imane Guellil, Faical Azouaou, Fodil Benali, Hachani Ala-Eddine


Abstract
Arabic is the official language of 22 countries and is spoken by more than 400 million people. Each of these countries uses at least one dialect for daily conversation, so Arabic has at least 22 dialects. Each dialect can be written in Arabic or Arabizi script. Most recent research focuses on constructing a language model and a training corpus for each dialect, in each script. Following this approach means constructing 46 different resources (including Modern Standard Arabic, MSA) to handle only one language. In this paper, we extract ONE corpus and propose ONE algorithm to automatically construct ONE training corpus, using ONE classification model architecture for sentiment analysis of MSA and different dialects. After manual review of the training corpus, the obtained results outperform all results reported in the research literature for the targeted test corpora.
Anthology ID:
2021.wassa-1.25
Volume:
Proceedings of the Eleventh Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis
Month:
April
Year:
2021
Address:
Online
Venues:
EACL | WASSA
Publisher:
Association for Computational Linguistics
Pages:
236–249
URL:
https://aclanthology.org/2021.wassa-1.25
PDF:
https://aclanthology.org/2021.wassa-1.25.pdf