Filip Nilsson


2023

pdf bib
NLP-LTU at SemEval-2023 Task 10: The Impact of Data Augmentation and Semi-Supervised Learning Techniques on Text Classification Performance on an Imbalanced Dataset
Sana Al-Azzawi | György Kovács | Filip Nilsson | Tosin Adewumi | Marcus Liwicki
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

In this paper, we propose a methodology fortask 10 of SemEval23, focusing on detectingand classifying online sexism in social me-dia posts. The task is tackling a serious is-sue, as detecting harmful content on socialmedia platforms is crucial for mitigating theharm of these posts on users. Our solutionfor this task is based on an ensemble of fine-tuned transformer-based models (BERTweet,RoBERTa, and DeBERTa). To alleviate prob-lems related to class imbalance, and to improvethe generalization capability of our model, wealso experiment with data augmentation andsemi-supervised learning. In particular, fordata augmentation, we use back-translation, ei-ther on all classes, or on the underrepresentedclasses only. We analyze the impact of thesestrategies on the overall performance of thepipeline through extensive experiments. whilefor semi-supervised learning, we found thatwith a substantial amount of unlabelled, in-domain data available, semi-supervised learn-ing can enhance the performance of certainmodels. Our proposed method (for which thesource code is available on Github12) attainsan F 1-score of 0.8613 for sub-taskA, whichranked us 10th in the competition.

2022

pdf bib
FilipN@LT-EDI-ACL2022-Detecting signs of Depression from Social Media: Examining the use of summarization methods as data augmentation for text classification
Filip Nilsson | György Kovács
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

Depression is a common mental disorder that severely affects the quality of life, and can lead to suicide. When diagnosed in time, mild, moderate, and even severe depression can be treated. This is why it is vital to detect signs of depression in time. One possibility for this is the use of text classification models on social media posts. Transformers have achieved state-of-the-art performance on a variety of similar text classification tasks. One drawback, however, is that when the dataset is imbalanced, the performance of these models may be negatively affected. Because of this, in this paper, we examine the effect of balancing a depression detection dataset using data augmentation. In particular, we use abstractive summarization techniques for data augmentation. We examine the effect of this method on the LT-EDI-ACL2022 task. Our results show that when increasing the multiplicity of the minority classes to the right degree, this data augmentation method can in fact improve classification scores on the task.