Abdessamad Benlahbib


2023

NLP-LISAC at SemEval-2023 Task 9: Multilingual Tweet Intimacy Analysis via a Transformer-based Approach and Data Augmentation
Abdessamad Benlahbib | Hamza Alami | Achraf Boumhidi | Omar Benslimane
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper presents our system and findings for SemEval-2023 Task 9: Multilingual Tweet Intimacy Analysis. The main objective of this task was to predict the intimacy of tweets in 10 languages. Our submitted model (ranked 28/45) consists of a transformer-based approach with data augmentation via machine translation.
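
One way to picture data augmentation via machine translation is sketched below. This is only an illustration under stated assumptions: the abstract does not say which translation system was used or how translated tweets were labeled. The sketch assumes each translated tweet inherits the intimacy score of its source tweet, and the `translate` callable and `fake_translate` stand-in are hypothetical placeholders for a real translation system.

```python
from typing import Callable, List, Sequence, Tuple

# (text, language code, intimacy score)
Sample = Tuple[str, str, float]


def augment_by_translation(
    samples: Sequence[Sample],
    target_langs: Sequence[str],
    translate: Callable[[str, str, str], str],
) -> List[Sample]:
    """Return the original samples plus one translated copy per other target language.

    Assumption: a translated tweet keeps the intimacy score of its source tweet.
    """
    augmented: List[Sample] = list(samples)
    for text, lang, score in samples:
        for target in target_langs:
            if target == lang:
                continue  # no need to translate a tweet into its own language
            augmented.append((translate(text, lang, target), target, score))
    return augmented


def fake_translate(text: str, source: str, target: str) -> str:
    """Hypothetical stand-in for a machine translation system."""
    return f"[{source}->{target}] {text}"


train = [("miss you so much", "en", 3.8)]
augmented_train = augment_by_translation(train, ["en", "fr", "es"], fake_translate)
```

In practice `fake_translate` would be replaced by a call to an actual translation model or service.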

NLP-LISAC at SemEval-2023 Task 12: Sentiment Analysis for Tweets expressed in African languages via Transformer-based Models
Abdessamad Benlahbib | Achraf Boumhidi
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper presents our systems and findings for SemEval-2023 Task 12: AfriSenti-SemEval: Sentiment Analysis for Low-resource African Languages. The main objective of this task was to determine the polarity of a tweet (positive, negative, or neutral). Our submitted models (ranked between 1st and 21st, depending on the target track) consist of various Transformer-based approaches.

UM6P at SemEval-2023 Task 3: News genre classification based on transformers, graph convolution networks and number of sentences
Hamza Alami | Abdessamad Benlahbib | Abdelkader El Mahdaouy | Ismail Berrada
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper presents our proposed method for English document genre classification in the context of SemEval-2023 Task 3, subtask 1. Our method uses an ensemble technique to combine the predictions of four distinct models: Longformer, RoBERTa, GCN, and a model based on the number of sentences. Each model is optimized on simple objectives and is easy to grasp. We provide code snippets that define each model to improve readability. Our method ranked 12th in document genre classification for English texts.
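
The abstract does not specify which ensemble technique combines the four models. As a minimal sketch, assuming simple soft voting (averaging per-class probabilities), the combination could look like the following. The label names, model keys, and probability values are illustrative assumptions, not figures from the paper.

```python
from typing import Dict, Sequence


def soft_vote(model_probs: Dict[str, Sequence[float]], labels: Sequence[str]) -> str:
    """Average per-class probabilities across models and return the top label.

    Assumption: every model outputs one probability per label, in the same order.
    """
    n_models = len(model_probs)
    averaged = [
        sum(probs[i] for probs in model_probs.values()) / n_models
        for i in range(len(labels))
    ]
    best_index = max(range(len(labels)), key=averaged.__getitem__)
    return labels[best_index]


# Illustrative labels and made-up probabilities for a single document.
labels = ["opinion", "reporting", "satire"]
model_probs = {
    "longformer": [0.6, 0.3, 0.1],
    "roberta": [0.5, 0.4, 0.1],
    "gcn": [0.2, 0.7, 0.1],
    "sentence_count": [0.4, 0.5, 0.1],
}
prediction = soft_vote(model_probs, labels)
```

Here the averaged scores are 0.425, 0.475, and 0.1, so the ensemble picks the second label even though two of the four models individually prefer the first.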

2022

High Tech team at SemEval-2022 Task 6: Intended Sarcasm Detection for Arabic texts
Hamza Alami | Abdessamad Benlahbib | Ahmed Alami
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper presents our proposed methods for the iSarcasmEval shared task. The shared task consists of three different subtasks. We participated in subtask A and subtask C. The purpose of subtask A was to predict whether a text is sarcastic, while the aim of subtask C was to determine which text is sarcastic given a sarcastic text and its non-sarcastic rephrase. Both of the developed solutions used pre-trained BERT models. The proposed models are optimized on simple objectives and are easy to grasp. Despite their simplicity, our methods ranked 4th and 2nd in iSarcasmEval subtask A and subtask C, respectively, for Arabic texts.

LISACTeam at SemEval-2022 Task 6: A Transformer based Approach for Intended Sarcasm Detection in English Tweets
Abdessamad Benlahbib | Hamza Alami | Ahmed Alami
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

In this paper, we present our system and findings for SemEval-2022 Task 6 - iSarcasmEval: Intended Sarcasm Detection in English. The main objective of this task was to identify sarcastic tweets. This task was challenging mainly due to (1) the small training dataset that contains only 3468 tweets and (2) the imbalanced class distribution (25% sarcastic and 75% non-sarcastic). Our submitted model (ranked eighth on Sub-Task A and fifth on Sub-Task C) consists of a Transformer-based approach (BERTweet model).

2021

LISAC FSDM USMBA at SemEval-2021 Task 5: Tackling Toxic Spans Detection Challenge with Supervised SpanBERT-based Model and Unsupervised LIME-based Model
Abdessamad Benlahbib | Ahmed Alami | Hamza Alami
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

Toxic spans detection is an emerging challenge that aims to find toxic spans within a toxic text. In this paper, we describe our solutions to tackle toxic spans detection. The first solution, which follows a supervised approach, is based on the SpanBERT model, which is designed to better embed and predict spans of text. The second solution, which adopts an unsupervised approach, combines a linear support vector machine with Local Interpretable Model-Agnostic Explanations (LIME), a technique used to interpret the predictions of learning-based models. Our supervised model outperformed the unsupervised model and achieved an F-score of 67.84% (ranked 22/85) in Task 5 at SemEval-2021: Toxic Spans Detection.
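
The abstract reports an F-score without defining it. As a hedged sketch, assuming the score is computed per text over the sets of predicted and gold toxic character offsets, it could be calculated as follows. The function name and the handling of empty sets are assumptions made for illustration.

```python
from typing import Iterable


def span_f1(predicted: Iterable[int], gold: Iterable[int]) -> float:
    """F1 between predicted and gold toxic character offsets for one text.

    Assumptions: two empty sets count as a perfect match (1.0), and exactly
    one empty set counts as a complete miss (0.0).
    """
    pred, ref = set(predicted), set(gold)
    if not pred and not ref:
        return 1.0
    if not pred or not ref:
        return 0.0
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Predicted offsets 0-3 against gold offsets 2-5: two offsets overlap.
score = span_f1([0, 1, 2, 3], [2, 3, 4, 5])
```

A system-level score would then be the mean of `span_f1` over all texts in the evaluation set.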

2020

LISAC FSDM-USMBA Team at SemEval-2020 Task 12: Overcoming AraBERT’s pretrain-finetune discrepancy for Arabic offensive language identification
Hamza Alami | Said Ouatik El Alaoui | Abdessamad Benlahbib | Noureddine En-nahnahi
Proceedings of the Fourteenth Workshop on Semantic Evaluation

AraBERT is an Arabic version of the state-of-the-art Bidirectional Encoder Representations from Transformers (BERT) model. The latter has achieved good performance in a variety of Natural Language Processing (NLP) tasks. In this paper, we propose an effective AraBERT embeddings-based method for dealing with offensive Arabic language on Twitter. First, we pre-process tweets by handling emojis and including their Arabic meanings. To overcome the pretrain-finetune discrepancy, we substitute each detected emoji with the special token [MASK] during both the fine-tuning and inference phases. Then, we represent tweet tokens by applying the AraBERT model. Finally, we feed the tweet representation into a sigmoid function to decide whether a tweet is offensive or not. The proposed method achieved the best results on the OffensEval 2020 Arabic task and reached a macro F1 score of 90.17%.
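
The emoji substitution and the final sigmoid decision can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the emoji pattern covers only a few common Unicode blocks, the 0.5 threshold is an assumption, the step that adds the Arabic meanings of emojis is omitted, and the AraBERT encoder that would produce the logit is left out entirely.

```python
import math
import re

# Assumption: an approximate emoji pattern covering common Unicode emoji blocks.
EMOJI_PATTERN = re.compile(
    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF]"
)


def mask_emojis(text: str, mask_token: str = "[MASK]") -> str:
    """Replace each matched emoji with the mask token and normalize whitespace."""
    masked = EMOJI_PATTERN.sub(f" {mask_token} ", text)
    return " ".join(masked.split())


def is_offensive(logit: float, threshold: float = 0.5) -> bool:
    """Apply a sigmoid to a model logit and compare it with a threshold (assumed 0.5)."""
    probability = 1.0 / (1.0 + math.exp(-logit))
    return probability >= threshold


masked_tweet = mask_emojis("hello \U0001F600 world")
```

The same `mask_emojis` call would be applied to tweets at both fine-tuning and inference time, so that the model sees the same token in both phases.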