Hamza Alami


pdf bib
NLP-LISAC at SemEval-2023 Task 9: Multilingual Tweet Intimacy Analysis via a Transformer-based Approach and Data Augmentation
Abdessamad Benlahbib | Hamza Alami | Achraf Boumhidi | Omar Benslimane
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper presents our system and findings for SemEval 2023 Task 9 Tweet Intimacy Analysis. The main objective of this task was to predict the intimacy of tweets in 10 languages. Our submitted model (ranked 28/45) consists of a transformer-based approach with data augmentation via machine translation.

pdf bib
UL & UM6P at SemEval-2023 Task 10: Semi-Supervised Multi-task Learning for Explainable Detection of Online Sexism
Salima Lamsiyah | Abdelkader El Mahdaouy | Hamza Alami | Ismail Berrada | Christoph Schommer
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper introduces our participating system to the Explainable Detection of Online Sexism (EDOS) SemEval-2023 - Task 10: Explainable Detection of Online Sexism. The EDOS shared task covers three hierarchical sub-tasks for sexism detection, coarse-grained and fine-grained categorization. We have investigated both single-task and multi-task learning based on RoBERTa transformer-based language models. For improving the results, we have performed further pre-training of RoBERTa on the provided unlabeled data. Besides, we have employed a small sample of the unlabeled data for semi-supervised learning using the minimum class-confusion loss. Our system has achieved macro F1 scores of 82.25\%, 67.35\%, and 49.8\% on Tasks A, B, and C, respectively.

pdf bib
UM6P at SemEval-2023 Task 3: News genre classification based on transformers, graph convolution networks and number of sentences
Hamza Alami | Abdessamad Benlahbib | Abdelkader El Mahdaouy | Ismail Berrada
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper presents our proposed method for english documents genre classification in the context of SemEval 2023 task 3, subtask 1. Our method use ensemble technique to combine four distinct models predictions: Longformer, RoBERTa, GCN, and a sentences number-based model. Each model is optimized on simple objectives and easy to grasp. We provide snippets of code that define each model to make the reading experience better. Our method ranked 12th in documents genre classification for english texts.

pdf bib
UM6P at SemEval-2023 Task 12: Out-Of-Distribution Generalization Method for African Languages Sentiment Analysis
Abdelkader El Mahdaouy | Hamza Alami | Salima Lamsiyah | Ismail Berrada
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper presents our submitted system to AfriSenti SemEval-2023 Task 12: Sentiment Analysis for African Languages. The AfriSenti consists of three different tasks, covering monolingual, multilingual, and zero-shot sentiment analysis scenarios for African languages. To improve model generalization, we have explored the following steps: 1) further pre-training of the AfroXLM Pre-trained Language Model (PLM), 2) combining AfroXLM and MARBERT PLMs using a residual layer, and 3) studying the impact of metric learning and two out-of-distribution generalization training objectives. The overall evaluation results show that our system has achieved promising results on several sub-tasks of Task A. For Tasks B and C, our system is ranked among the top six participating systems.


pdf bib
High Tech team at SemEval-2022 Task 6: Intended Sarcasm Detection for Arabic texts
Hamza Alami | Abdessamad Benlahbib | Ahmed Alami
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper presents our proposed methods for the iSarcasmEval shared task. The shared task consists of three different subtasks. We participate in both subtask A and subtask C. The purpose of subtask A was to predict if a text is sarcastic while the aim of subtask C is to determine which text is sarcastic given a sarcastic text and its non-sarcastic rephrase. Both of the developed solutions used BERT pre-trained models. The proposed models are optimized on simple objectives and are easy to grasp. However, despite their simplicity, our methods ranked 4 and 2 in iSarcasmEval subtask A and subtask C for Arabic texts.

pdf bib
LISACTeam at SemEval-2022 Task 6: A Transformer based Approach for Intended Sarcasm Detection in English Tweets
Abdessamad Benlahbib | Hamza Alami | Ahmed Alami
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

In this paper, we present our system and findings for SemEval-2022 Task 6 - iSarcasmEval: Intended Sarcasm Detection in English. The main objective of this task was to identify sarcastic tweets. This task was challenging mainly due to (1) the small training dataset that contains only 3468 tweets and (2) the imbalanced class distribution (25% sarcastic and 75% non-sarcastic). Our submitted model (ranked eighth on Sub-Task A and fifth on Sub-Task C) consists of a Transformer-based approach (BERTweet model).


pdf bib
LISAC FSDM USMBA at SemEval-2021 Task 5: Tackling Toxic Spans Detection Challenge with Supervised SpanBERT-based Model and Unsupervised LIME-based Model
Abdessamad Benlahbib | Ahmed Alami | Hamza Alami
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

Toxic spans detection is an emerging challenge that aims to find toxic spans within a toxic text. In this paper, we describe our solutions to tackle toxic spans detection. The first solution, which follows a supervised approach, is based on SpanBERT model. This latter is intended to better embed and predict spans of text. The second solution, which adopts an unsupervised approach, combines linear support vector machine with the Local Interpretable Model-Agnostic Explanations (LIME). This last is used to interpret predictions of learning-based models. Our supervised model outperformed the unsupervised model and achieved the f-score of 67,84% (ranked 22/85) in Task 5 at SemEval-2021: Toxic Spans Detection.


pdf bib
Weighted combination of BERT and N-GRAM features for Nuanced Arabic Dialect Identification
Abdellah El Mekki | Ahmed Alami | Hamza Alami | Ahmed Khoumsi | Ismail Berrada
Proceedings of the Fifth Arabic Natural Language Processing Workshop

Around the Arab world, different Arabic dialects are spoken by more than 300M persons, and are increasingly popular in social media texts. However, Arabic dialects are considered to be low-resource languages, limiting the development of machine-learning based systems for these dialects. In this paper, we investigate the Arabic dialect identification task, from two perspectives: country-level dialect identification from 21 Arab countries, and province-level dialect identification from 100 provinces. We introduce an unified pipeline of state-of-the-art models, that can handle the two subtasks. Our experimental studies applied to the NADI shared task, show promising results both at the country-level (F1-score of 25.99%) and the province-level (F1-score of 6.39%), and thus allow us to be ranked 2nd for the country-level subtask, and 1st in the province-level subtask.

pdf bib
LISAC FSDM-USMBA Team at SemEval-2020 Task 12: Overcoming AraBERT’s pretrain-finetune discrepancy for Arabic offensive language identification
Hamza Alami | Said Ouatik El Alaoui | Abdessamad Benlahbib | Noureddine En-nahnahi
Proceedings of the Fourteenth Workshop on Semantic Evaluation

AraBERT is an Arabic version of the state-of-the-art Bidirectional Encoder Representations from Transformers (BERT) model. The latter has achieved good performance in a variety of Natural Language Processing (NLP) tasks. In this paper, we propose an effective AraBERT embeddings-based method for dealing with offensive Arabic language in Twitter. First, we pre-process tweets by handling emojis and including their Arabic meanings. To overcome the pretrain-finetune discrepancy, we substitute each detected emojis by the special token [MASK] into both fine tuning and inference phases. Then, we represent tweets tokens by applying AraBERT model. Finally, we feed the tweet representation into a sigmoid function to decide whether a tweet is offensive or not. The proposed method achieved the best results on OffensEval 2020: Arabic task and reached a macro F1 score equal to 90.17%.