HULAT at SemEval-2023 Task 10: Data Augmentation for Pre-trained Transformers Applied to the Detection of Sexism in Social Media

Isabel Segura-Bedmar


Abstract
This paper describes our participation in SemEval-2023 Task 10, whose goal is the detection of sexism in social media. We explore some of the most popular transformer models such as BERT, DistilBERT, RoBERTa, and XLNet. We also study different data augmentation techniques to increase the training dataset. During the development phase, our best results were obtained by using RoBERTa and data augmentation for tasks B and C. However, the use of synthetic data does not improve the results for task C. We participated in the three subtasks. Our approach still has much room for improvement, especially in the two fine-grained classifications. All our code is available in the repository https://github.com/isegura/hulat_edos.
Anthology ID:
2023.semeval-1.26
Volume:
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Atul Kr. Ojha, A. Seza Doğruöz, Giovanni Da San Martino, Harish Tayyar Madabushi, Ritesh Kumar, Elisa Sartori
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
184–192
URL:
https://aclanthology.org/2023.semeval-1.26
DOI:
10.18653/v1/2023.semeval-1.26
Cite (ACL):
Isabel Segura-Bedmar. 2023. HULAT at SemEval-2023 Task 10: Data Augmentation for Pre-trained Transformers Applied to the Detection of Sexism in Social Media. In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), pages 184–192, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
HULAT at SemEval-2023 Task 10: Data Augmentation for Pre-trained Transformers Applied to the Detection of Sexism in Social Media (Segura-Bedmar, SemEval 2023)
PDF:
https://aclanthology.org/2023.semeval-1.26.pdf