NLP-LTU at SemEval-2023 Task 10: The Impact of Data Augmentation and Semi-Supervised Learning Techniques on Text Classification Performance on an Imbalanced Dataset

Sana Al-Azzawi; György Kovács; Filip Nilsson; Tosin Adewumi; Marcus Liwicki

doi:10.18653/v1/2023.semeval-1.196

NLP-LTU at SemEval-2023 Task 10: The Impact of Data Augmentation and Semi-Supervised Learning Techniques on Text Classification Performance on an Imbalanced Dataset

Sana Al-Azzawi, György Kovács, Filip Nilsson, Tosin Adewumi, Marcus Liwicki

Abstract

In this paper, we propose a methodology fortask 10 of SemEval23, focusing on detectingand classifying online sexism in social me-dia posts. The task is tackling a serious is-sue, as detecting harmful content on socialmedia platforms is crucial for mitigating theharm of these posts on users. Our solutionfor this task is based on an ensemble of fine-tuned transformer-based models (BERTweet,RoBERTa, and DeBERTa). To alleviate prob-lems related to class imbalance, and to improvethe generalization capability of our model, wealso experiment with data augmentation andsemi-supervised learning. In particular, fordata augmentation, we use back-translation, ei-ther on all classes, or on the underrepresentedclasses only. We analyze the impact of thesestrategies on the overall performance of thepipeline through extensive experiments. whilefor semi-supervised learning, we found thatwith a substantial amount of unlabelled, in-domain data available, semi-supervised learn-ing can enhance the performance of certainmodels. Our proposed method (for which thesource code is available on Github12) attainsan F 1-score of 0.8613 for sub-taskA, whichranked us 10th in the competition.

Anthology ID:: 2023.semeval-1.196
Volume:: Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
Month:: July
Year:: 2023
Address:: Toronto, Canada
Editors:: Atul Kr. Ojha, A. Seza Doğruöz, Giovanni Da San Martino, Harish Tayyar Madabushi, Ritesh Kumar, Elisa Sartori
Venue:: SemEval
SIG:: SIGLEX
Publisher:: Association for Computational Linguistics
Note:
Pages:: 1421–1427
Language:
URL:: https://aclanthology.org/2023.semeval-1.196
DOI:: 10.18653/v1/2023.semeval-1.196
Bibkey:
Cite (ACL):: Sana Al-Azzawi, György Kovács, Filip Nilsson, Tosin Adewumi, and Marcus Liwicki. 2023. NLP-LTU at SemEval-2023 Task 10: The Impact of Data Augmentation and Semi-Supervised Learning Techniques on Text Classification Performance on an Imbalanced Dataset. In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), pages 1421–1427, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):: NLP-LTU at SemEval-2023 Task 10: The Impact of Data Augmentation and Semi-Supervised Learning Techniques on Text Classification Performance on an Imbalanced Dataset (Al-Azzawi et al., SemEval 2023)
Copy Citation:
PDF:: https://aclanthology.org/2023.semeval-1.196.pdf
Video:: https://aclanthology.org/2023.semeval-1.196.mp4

PDF Cite Search Video