Chedi Bechikh Ali

Also published as: Chedi Bechikh

2019

Tw-StAR at SemEval-2019 Task 5: N-gram embeddings for Hate Speech Detection in Multilingual Tweets
Hala Mulki | Chedi Bechikh Ali | Hatem Haddad | Ismail Babaoğlu
Proceedings of the 13th International Workshop on Semantic Evaluation

In this paper, we describe our contribution to SemEval-2019: subtask A of task 5, “Multilingual detection of hate speech against immigrants and women in Twitter (HatEval)”. We developed two hate speech detection model variants through the Tw-StAR framework. While the first model used one-hot encoded n-grams to train an NB classifier, the second generated and learned n-gram embeddings within a feedforward neural network. For both models, specific terms, selected via MWT patterns, were tagged in the input data. With two feature types employed, we could investigate the ability of n-gram embeddings to rival one-hot n-grams. Our results showed that in English, n-gram embeddings outperformed one-hot n-grams. However, representing Spanish tweets by one-hot n-grams yielded slightly better performance than n-gram embeddings. The official ranking indicated that Tw-StAR ranked 9th for English and 20th for Spanish.

L-HSAB: A Levantine Twitter Dataset for Hate Speech and Abusive Language
Hala Mulki | Hatem Haddad | Chedi Bechikh Ali | Halima Alshabani
Proceedings of the Third Workshop on Abusive Language Online

Hate speech and abusive language have become a common phenomenon on Arabic social media. Automatic hate speech and abusive language detection systems can facilitate the prohibition of toxic textual content. The complexity, informality and ambiguity of the Arabic dialects have hindered the provision of the resources needed for Arabic abusive/hate speech detection research. In this paper, we introduce the first publicly available Levantine Hate Speech and Abusive (L-HSAB) Twitter dataset, intended as a benchmark dataset for automatic detection of online Levantine toxic content. We further provide a detailed review of the data collection steps and of how we designed the annotation guidelines to ensure reliable dataset annotation. This was confirmed by a comprehensive evaluation of the annotations: the agreement metrics Cohen’s Kappa (k) and Krippendorff’s alpha (α) indicated that the annotations were consistent.

2018

Tw-StAR at SemEval-2018 Task 1: Preprocessing Impact on Multi-label Emotion Classification
Hala Mulki | Chedi Bechikh Ali | Hatem Haddad | Ismail Babaoğlu
Proceedings of the 12th International Workshop on Semantic Evaluation

In this paper, we describe our contribution to the SemEval-2018 contest. We tackled task 1 “Affect in Tweets”, subtask E-c “Detecting Emotions (multi-label classification)”. A multi-label classification system, Tw-StAR, was developed to recognize the emotions embedded in Arabic, English and Spanish tweets. To handle the multi-label classification problem with traditional classifiers, we employed the binary relevance transformation strategy, while a TF-IDF scheme was used to generate the tweets’ features. We investigated single preprocessing tasks and combinations of them to further improve performance. The results showed that specific combinations of preprocessing tasks could significantly improve the evaluation measures. This was confirmed by the official results, as our system ranked 3rd for both the Arabic and Spanish datasets and 14th for the English dataset.

Impact du Prétraitement Linguistique sur l’Analyse de Sentiment du Dialecte Tunisien (Impact of Linguistic Preprocessing on Sentiment Analysis of the Tunisian Dialect)
Chedi Bechikh Ali | Hala Mulki | Hatem Haddad
Actes de la Conférence TALN. Volume 1 - Articles longs, articles courts de TALN