As the number of social media users increases, they express their thoughts, needs, socialise and publish their opinions reviews. For good social media sentiment analysis, good quality resources are needed, and the lack of these resources is particularly evident for languages other than English, in particular Arabic. The available Arabic resources lack of from either the size of the corpus or the quality of the annotation. In this paper, we present an Arabic Sentiment Analysis Corpus collected from Twitter, which contains 36K tweets labelled into positive and negative. We employed distant supervision and self-training approaches into the corpus to annotate it. Besides, we release an 8K tweets manually annotated as a gold standard. We evaluated the corpus intrinsically by comparing it to human classification and pre-trained sentiment analysis models, Moreover, we apply extrinsic evaluation methods exploiting sentiment analysis task and achieve an accuracy of 86%.
The Arabic language is a morphologically rich language with relatively few resources and a less explored syntax compared to English. Given these limitations, Arabic Natural Language Processing (NLP) tasks like Sentiment Analysis (SA), Named Entity Recognition (NER), and Question Answering (QA), have proven to be very challenging to tackle. Recently, with the surge of transformers based models, language-specific BERT based models have proven to be very efficient at language understanding, provided they are pre-trained on a very large corpus. Such models were able to set new standards and achieve state-of-the-art results for most NLP tasks. In this paper, we pre-trained BERT specifically for the Arabic language in the pursuit of achieving the same success that BERT did for the English language. The performance of AraBERT is compared to multilingual BERT from Google and other state-of-the-art approaches. The results showed that the newly developed AraBERT achieved state-of-the-art performance on most tested Arabic NLP tasks. The pretrained araBERT models are publicly available on https://github.com/aub-mind/araBERT hoping to encourage research and applications for Arabic NLP.
We describe AraNet, a collection of deep learning Arabic social media processing tools. Namely, we exploit an extensive host of both publicly available and novel social media datasets to train bidirectional encoders from transformers (BERT) focused at social meaning extraction. AraNet models predict age, dialect, gender, emotion, irony, and sentiment. AraNet either delivers state-of-the-art performance on a number of these tasks and performs competitively on others. AraNet is exclusively based on a deep learning framework, giving it the advantage of being feature-engineering free. To the best of our knowledge, AraNet is the first to performs predictions across such a wide range of tasks for Arabic NLP. As such, AraNet has the potential to meet critical needs. We publicly release AraNet to accelerate research, and to facilitate model-based comparisons across the different tasks
The current Arabic natural language processing resources are mainly build to address the Modern Standard Arabic (MSA), while we witnessed some scattered efforts to build resources for various Arabic dialects such as the Levantine and the Egyptian dialects. We observed a lack of resources for Gulf Arabic and especially the Qatari variety. In this paper, we present the first Qatari idioms and expression corpus of 1000 entries. The corpus was created from on-line and printed sources in addition to transcribed recorded interviews. The corpus covers various Qatari traditional expressions and idioms. To this end, audio recordings were collected from interviews and an online survey questionnaire was conducted to validate our data. This corpus aims to help advance the dialectal Arabic Speech and Natural Language Processing tools and applications for the Qatari dialect.
Sarcasm is one of the main challenges for sentiment analysis systems. Its complexity comes from the expression of opinion using implicit indirect phrasing. In this paper, we present ArSarcasm, an Arabic sarcasm detection dataset, which was created through the reannotation of available Arabic sentiment analysis datasets. The dataset contains 10,547 tweets, 16% of which are sarcastic. In addition to sarcasm the data was annotated for sentiment and dialects. Our analysis shows the highly subjective nature of these tasks, which is demonstrated by the shift in sentiment labels based on annotators’ biases. Experiments show the degradation of state-of-the-art sentiment analysers when faced with sarcastic content. Finally, we train a deep learning model for sarcasm detection using BiLSTM. The model achieves an F1 score of 0.46, which shows the challenging nature of the task, and should act as a basic baseline for future research on our dataset.
Social media communication has become a significant part of daily activity in modern societies. For this reason, ensuring safety in social media platforms is a necessity. Use of dangerous language such as physical threats in online environments is a somewhat rare, yet remains highly important. Although several works have been performed on the related issue of detecting offensive and hateful language, dangerous speech has not previously been treated in any significant way. Motivated by these observations, we report our efforts to build a labeled dataset for dangerous speech. We also exploit our dataset to develop highly effective models to detect dangerous content. Our best model performs at 59.60% macro F1, significantly outperforming a competitive baseline.
This paper provides an overview of the offensive language detection shared task at the 4th workshop on Open-Source Arabic Corpora and Processing Tools (OSACT4). There were two subtasks, namely: Subtask A, involving the detection of offensive language, which contains unacceptable or vulgar content in addition to any kind of explicit or implicit insults or attacks against individuals or groups; and Subtask B, involving the detection of hate speech, which contains insults or threats targeting a group based on their nationality, ethnicity, race, gender, political or sport affiliation, religious belief, or other common characteristics. In total, 40 teams signed up to participate in Subtask A, and 14 of them submitted test runs. For Subtask B, 33 teams signed up to participate and 13 of them submitted runs. We present and analyze all submissions in this paper.
The preprocessing phase is one of the key phases within the text classification pipeline. This study aims at investigating the impact of the preprocessing phase on text classification, specifically on offensive language and hate speech classification for Arabic text. The Arabic language used in social media is informal and written using Arabic dialects, which makes the text classification task very complex. Preprocessing helps in dimensionality reduction and removing useless content. We apply intensive preprocessing techniques to the dataset before processing it further and feeding it into the classification model. An intensive preprocessing-based approach demonstrates its significant impact on offensive language detection and hate speech detection shared tasks of the fourth workshop on Open-Source Arabic Corpora and Corpora Processing Tools (OSACT). Our team wins the third place (3rd) in the Sub-Task A Offensive Language Detection division and wins the first place (1st) in the Sub-Task B Hate Speech Detection division, with an F1 score of 89% and 95%, respectively, by providing the state-of-the-art performance in terms of F1, accuracy, recall, and precision for Arabic hate speech detection.
In this paper, we describe our efforts at OSACT Shared Task on Offensive Language Detection. The shared task consists of two subtasks: offensive language detection (Subtask A) and hate speech detection (Subtask B). For offensive language detection, a system combination of Support Vector Machines (SVMs) and Deep Neural Networks (DNNs) achieved the best results on development set, which ranked 1st in the official results for Subtask A with F1-score of 90.51% on the test set. For hate speech detection, DNNs were less effective and a system combination of multiple SVMs with different parameters achieved the best results on development set, which ranked 4th in official results for Subtask B with F1-macro score of 80.63% on the test set.
In the past years, toxic comments and offensive speech are polluting the internet and manual inspection of these comments is becoming a tiresome task to manage. Having a machine learning based model that is able to filter offensive Arabic content is of high need nowadays. In this paper, we describe the model that was submitted to the Shared Task on Offensive Language Detection that is organized by (The 4th Workshop on Open-Source Arabic Corpora and Processing Tools). Our model makes use transformer based model (BERT) to detect offensive content. We came in the fourth place in subtask A (detecting Offensive Speech) and in the third place in subtask B (detecting Hate Speech).
In this paper, we describe our submission for the OCAST4 2020 shared tasks on offensive language and hate speech detection in the Arabic language. Our solution builds upon combining a number of deep learning models using pre-trained word vectors. To improve the word representation and increase word coverage, we compare a number of existing pre-trained word embeddings and finally concatenate the two empirically best among them. To avoid under- as well as over-fitting, we train each deep model multiple times, and we include the optimization of the decision threshold into the training process. The predictions of the resulting models are then combined into a tuned ensemble by stacking a classifier on top of the predictions by these base models. We name our approach “ESOTP” (Ensembled Stacking classifier over Optimized Thresholded Predictions of multiple deep models). The resulting ESOTP-based system ranked 6th out of 35 on the shared task of Offensive Language detection (sub-task A) and 5th out of 30 on Hate Speech Detection (sub-task B).
In this paper, we tackle the problem of offensive language and hate speech detection. We proposed our methods for data preprocessing and balancing, and then we presented our Convolutional Neural Network (CNN) and bidirectional Gated Recurrent Unit (GRU) models used. After that, we augmented these models with attention layer. The best results achieved was using the Bidirectional Gated Recurrent Unit augmented with attention layer (Bi-GRU_ATT). Keywords: Abusive Language, Text Mining, Arabic Language, Social Media Mining, Deep Learning, Convolutional Neural Network, Gated Recurrent Unit, Attention Mechanism, Machine Learning.
In this paper, we approach the shared task OffenseEval 2020 by Mubarak et al. (2020) using ULMFiT Howard and Ruder (2018) pre-trained on Arabic Wikipedia Khooli (2019) which we use as a starting point and use the target data-set to fine-tune it. The data set of the task is highly imbalanced. We train forward and backward models and ensemble the results. We report confusion matrix, accuracy, precision, recall and F1 of the development set and report summarized results of the test set. Transfer learning method using ULMFiT shows potential for Arabic text classification. Mubarak, K. Darwish,W. Magdy, T. Elsayed, and H. Al-Khalifa. Overview of osact4 arabic offensive language detection shared task. 4, 2020. Howard and S. Ruder. Universal language model fine-tuning for text classification. arXiv preprint arXiv:1801.06146, 2018. Khooli. Applied data science. https://github.com/abedkhooli/ds2, 2019.
Offensive language and hate-speech are phenomena that spread with the rising popularity of social media. Detecting such content is crucial for understanding and predicting conflicts, understanding polarisation among communities and providing means and tools to filter or block inappropriate content. This paper describes the SMASH team submission to OSACT4’s shared task on hate-speech and offensive language detection, where we explore different approaches to perform these tasks. The experiments cover a variety of approaches that include deep learning, transfer learning and multitask learning. We also explore the utilisation of sentiment information to perform the previous task. Our best model is a multitask learning architecture, based on CNN-BiLSTM, that was trained to detect hate-speech and offensive language and predict sentiment.
Twitter and other social media platforms offer users the chance to share their ideas via short posts. While the easy exchange of ideas has value, these microblogs can be leveraged by people who want to share hatred. and such individuals can share negative views about an individual, race, or group with millions of people at the click of a button. There is thus an urgent need to establish a method that can automatically identify hate speech and offensive language. To contribute to this development, during the OSACT4 workshop, a shared task was undertaken to detect offensive language in Arabic. A key challenge was the uniqueness of the language used on social media, prompting the out-of-vocabulary (OOV) problem. In addition, the use of different dialects in Arabic exacerbates this problem. To deal with the issues associated with OOV, we generated a character-level embeddings model, which was trained on a massive data collected carefully. This level of embeddings can work effectively in resolving the problem of OOV words through its ability to learn the vectors of character n-grams or parts of words. The proposed systems were ranked 7th and 8th for Subtasks A and B, respectively.
The use of social media platforms has become more prevalent, which has provided tremendous opportunities for people to connect but has also opened the door for misuse with the spread of hate speech and offensive language. This phenomenon has been driving more and more people to more extreme reactions and online aggression, sometimes causing physical harm to individuals or groups of people. There is a need to control and prevent such misuse of online social media through automatic detection of profane language. The shared task on Offensive Language Detection at the OSACT4 has aimed at achieving state of art profane language detection methods for Arabic social media. Our team “BERTologists” tackled this problem by leveraging state of the art pretrained Arabic language model, AraBERT, that we augment with the addition of Multi-task learning to enable our model to learn efficiently from little data. Our Multitask AraBERT approach achieved the second place in both subtasks A & B, which shows that the model performs consistently across different tasks.
Social media are pervasive in our life, making it necessary to ensure safe online experiences by detecting and removing offensive and hate speech. In this work, we report our submission to the Offensive Language and hate-speech Detection shared task organized with the 4th Workshop on Open-Source Arabic Corpora and Processing Tools Arabic (OSACT4). We focus on developing purely deep learning systems, without a need for feature engineering. For that purpose, we develop an effective method for automatic data augmentation and show the utility of training both offensive and hate speech models off (i.e., by fine-tuning) previously trained affective models (i.e., sentiment and emotion). Our best models are significantly better than a vanilla BERT model, with 89.60% acc (82.31% macro F1) for hate speech and 95.20% acc (70.51% macro F1) on official TEST data.
As the use of social media platforms increases extensively to freely communicate and share opinions, hate speech becomes an outstanding problem that requires urgent attention. This paper focuses on the problem of detecting hate speech in Arabic tweets. To tackle the problem efficiently, we adopt a “quick and simple” approach by which we investigate the effectiveness of 15 classical (e.g., SVM) and neural (e.g., CNN) learning models, while exploring two different term representations. Our experiments on 8k labelled dataset show that the best neural learning models outperform the classical ones, while distributed term representation is more effective than statistical bag-of-words representation. Overall, our best classifier (that combines both CNN and RNN in a joint architecture) achieved 0.73 macro-F1 score on the dev set, which significantly outperforms the majority-class baseline that achieves 0.49, proving the effectiveness of our “quick and simple” approach.