2023
pdf
bib
abs
UL & UM6P at SemEval-2023 Task 10: Semi-Supervised Multi-task Learning for Explainable Detection of Online Sexism
Salima Lamsiyah
|
Abdelkader El Mahdaouy
|
Hamza Alami
|
Ismail Berrada
|
Christoph Schommer
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
This paper introduces our participating system to the Explainable Detection of Online Sexism (EDOS) SemEval-2023 - Task 10: Explainable Detection of Online Sexism. The EDOS shared task covers three hierarchical sub-tasks for sexism detection, coarse-grained and fine-grained categorization. We have investigated both single-task and multi-task learning based on RoBERTa transformer-based language models. For improving the results, we have performed further pre-training of RoBERTa on the provided unlabeled data. Besides, we have employed a small sample of the unlabeled data for semi-supervised learning using the minimum class-confusion loss. Our system has achieved macro F1 scores of 82.25\%, 67.35\%, and 49.8\% on Tasks A, B, and C, respectively.
pdf
bib
abs
UM6P at SemEval-2023 Task 12: Out-Of-Distribution Generalization Method for African Languages Sentiment Analysis
Abdelkader El Mahdaouy
|
Hamza Alami
|
Salima Lamsiyah
|
Ismail Berrada
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
This paper presents our submitted system to AfriSenti SemEval-2023 Task 12: Sentiment Analysis for African Languages. The AfriSenti consists of three different tasks, covering monolingual, multilingual, and zero-shot sentiment analysis scenarios for African languages. To improve model generalization, we have explored the following steps: 1) further pre-training of the AfroXLM Pre-trained Language Model (PLM), 2) combining AfroXLM and MARBERT PLMs using a residual layer, and 3) studying the impact of metric learning and two out-of-distribution generalization training objectives. The overall evaluation results show that our system has achieved promising results on several sub-tasks of Task A. For Tasks B and C, our system is ranked among the top six participating systems.
pdf
bib
abs
UL & UM6P at ArAIEval Shared Task: Transformer-based model for Persuasion Techniques and Disinformation detection in Arabic
Salima Lamsiyah
|
Abdelkader El Mahdaouy
|
Hamza Alami
|
Ismail Berrada
|
Christoph Schommer
Proceedings of ArabicNLP 2023
In this paper, we introduce our participating system to the ArAIEval Shared Task, addressing both the detection of persuasion techniques and disinformation tasks. Our proposed system employs a pre-trained transformer-based language model for Arabic, alongside a classifier. We have assessed the performance of three Arabic Pre-trained Language Models (PLMs) for sentence encoding. Additionally, to enhance our model’s performance, we have explored various training objectives, including Cross-Entropy loss, regularized Mixup loss, asymmetric multi-label loss, and Focal Tversky loss. On the official test set, our system has achieved micro-F1 scores of 0.7515, 0.5666, 0.904, and 0.8333 for Sub-Task 1A, Sub-Task 1B, Sub-Task 2A, and Sub-Task 2B, respectively. Furthermore, our system has secured the 4th, 1st, 3rd, and 2nd positions, respectively, among all participating systems in sub-tasks 1A, 1B, 2A, and 2B of the ArAIEval shared task.
pdf
bib
abs
UM6P & UL at WojoodNER shared task: Improving Multi-Task Learning for Flat and Nested Arabic Named Entity Recognition
Abdelkader El Mahdaouy
|
Salima Lamsiyah
|
Hamza Alami
|
Christoph Schommer
|
Ismail Berrada
Proceedings of ArabicNLP 2023
In this paper, we present our submitted system for the WojoodNER Shared Task, addressing both flat and nested Arabic Named Entity Recognition (NER). Our system is based on a BERT-based multi-task learning model that leverages the existing Arabic Pretrained Language Models (PLMs) to encode the input sentences. To enhance the performance of our model, we have employed a multi-task loss variance penalty and combined several training objectives, including the Cross-Entropy loss, the Dice loss, the Tversky loss, and the Focal loss. Besides, we have studied the performance of three existing Arabic PLMs for sentence encoding. On the official test set, our system has obtained a micro-F1 score of 0.9113 and 0.9303 for Flat (Sub-Task 1) and Nested (Sub-Task 2) NER, respectively. It has been ranked in the 6th and the 2nd positions among all participating systems in Sub-Task 1 and Sub-Task 2, respectively.
2018
pdf
bib
abs
Résumé automatique guidé de textes: État de l’art et perspectives (Guided Summarization : State-of-the-art and perspectives )
Salima Lamsiyah
|
Said Ouatik El Alaoui
|
Bernard Espinasse
Actes de la Conférence TALN. Volume 2 - Démonstrations, articles des Rencontres Jeunes Chercheurs, ateliers DeFT
Les systèmes de résumé automatique de textes (SRAT) consistent à produire une représentation condensée et pertinente à partir d’un ou de plusieurs documents textuels. La majorité des SRAT sont basés sur des approches extractives. La tendance actuelle consiste à s’orienter vers les approches abstractives. Dans ce contexte, le résumé guidé défini par la campagne d’évaluation internationale TAC (Text Analysis Conference) en 2010, vise à encourager la recherche sur ce type d’approche, en se basant sur des techniques d’analyse en profondeur de textes. Dans ce papier, nous nous penchons sur le résumé automatique guidé de textes. Dans un premier temps, nous définissons les différentes caractéristiques et contraintes liées à cette tâche. Ensuite, nous dressons un état de l’art des principaux systèmes existants en mettant l’accent sur les travaux les plus récents, et en les classifiant selon les approches adoptées, les techniques utilisées, et leurs évaluations sur des corpus de références. Enfin, nous proposons les grandes étapes d’une méthode spécifique devant permettre le développement d’un nouveau type de systèmes de résumé guidé.