2023
FiRC at SemEval-2023 Task 10: Fine-grained Classification of Online Sexism Content Using DeBERTa
Fadi Hassan | Abdessalam Bouchekif | Walid Aransa
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
The SemEval 2023 shared task 10 “Explainable Detection of Online Sexism” focuses on detecting and identifying comments and tweets containing sexist expressions and on explaining why they are sexist. This paper describes the system we used to participate in this shared task. Our model is an ensemble of different variants of fine-tuned DeBERTa models that employs k-fold cross-validation. We participated in all three tasks, A, B and C. Our model ranked 2nd in task A, 7th in task B and 4th in task C.
2022
Arabic Dialect Identification and Sentiment Classification using Transformer-based Models
Joseph Attieh | Fadi Hassan
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)
In this paper, we present two deep learning approaches based on AraBERT, submitted to the Nuanced Arabic Dialect Identification (NADI) shared task of the Seventh Workshop for Arabic Natural Language Processing (WANLP 2022). NADI consists of two main sub-tasks, namely country-level dialect identification and sentiment identification for dialectal Arabic. We present one system per sub-task. The first system is a multi-task learning model that consists of a shared AraBERT encoder with three task-specific classification layers. This model is trained to jointly learn the country-level dialect of the tweet as well as the region-level and area-level dialects. The second system is a distilled model of an ensemble of models trained using K-fold cross-validation. Each model in the ensemble consists of an AraBERT model and a classifier, fine-tuned on (K-1) folds of the training set. Our team, Pythoneers, achieved rank 6 on the first test set of the first sub-task, rank 9 on the second test set of the first sub-task, and rank 4 on the test set of the second sub-task.
Pythoneers at WANLP 2022 Shared Task: Monolingual AraBERT for Arabic Propaganda Detection and Span Extraction
Joseph Attieh | Fadi Hassan
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)
In this paper, we present two deep learning approaches based on AraBERT, submitted to the Propaganda Detection shared task of the Seventh Workshop for Arabic Natural Language Processing (WANLP 2022). Propaganda detection consists of two main sub-tasks, namely propaganda identification and span extraction. We present one system per sub-task. The first system is a multi-task learning model that consists of a shared AraBERT encoder with task-specific binary classification layers. This model is trained to jointly learn one binary classification task per propaganda method. The second system is an AraBERT model with a Conditional Random Field (CRF) layer. We achieved rank 3 on the first sub-task and rank 1 on the second sub-task.
SeqL at SemEval-2022 Task 11: An Ensemble of Transformer Based Models for Complex Named Entity Recognition Task
Fadi Hassan | Wondimagegnhue Tufa | Guillem Collell | Piek Vossen | Lisa Beinborn | Adrian Flanagan | Kuan Eeik Tan
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
This paper presents the system we used to participate in task 11 (MultiCoNER) of the SemEval 2022 competition. Our system ranked fourth in track 12 (Multilingual) and fifth in track 13 (Code-Mixed). The goal of track 12 is to detect complex named entities in a multilingual setting, while track 13 is dedicated to detecting complex named entities in a code-mixed setting. Both systems were developed using transformer-based language models. We used an ensemble of XLM-RoBERTa-large and Microsoft/infoxlm-large with a Conditional Random Field (CRF) layer. In addition, we describe the algorithms employed to train our models and our hyper-parameter selection. We also study the impact of different methods for aggregating the outputs of the individual models that compose our ensemble. Finally, we present an extensive analysis of the results and errors.