Tesla at SemEval-2022 Task 4: Patronizing and Condescending Language Detection using Transformer-based Models with Data Augmentation

Sahil Bhatt, Manish Shrivastava


Abstract
This paper describes our system for Task 4 of SemEval 2022: Patronizing and Condescending Language (PCL) Detection. For sub-task 1, where the objective is to classify a text as PCL or non-PCL, we use a T5 model fine-tuned on the dataset. For sub-task 2, a multi-label classification problem, we use a RoBERTa model fine-tuned on the dataset. Given that the key challenge in this task is classification on an imbalanced dataset, our models rely on an augmented dataset that we generate using paraphrasing. Of all the approaches we tried, these two models yield the best results.
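As a rough illustration of the sub-task 1 setup: T5 is a text-to-text model, so binary PCL detection is typically framed by prefixing the input and having the model generate a label word. The task prefix and label words below are assumptions for illustration, not the paper's actual prompt format:

```python
def to_t5_example(text: str, is_pcl: bool) -> dict:
    """Frame binary PCL detection as text-to-text:
    the model reads a prefixed input and generates a label word.
    (Prefix and label words are hypothetical, not from the paper.)"""
    return {
        "input_text": f"classify PCL: {text}",          # hypothetical task prefix
        "target_text": "pcl" if is_pcl else "not_pcl",  # hypothetical label words
    }

def decode_label(generated: str) -> int:
    """Map a generated label word back to a 0/1 class,
    tolerating whitespace and casing in the model output."""
    return 1 if generated.strip().lower() == "pcl" else 0

# Example: a positive (PCL) training instance
example = to_t5_example("They are so helpless; we must save them.", True)
```

During fine-tuning, each (input_text, target_text) pair would be fed to a seq2seq trainer; at inference, the generated string is mapped back to a class with `decode_label`.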
Anthology ID:
2022.semeval-1.52
Volume:
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
Month:
July
Year:
2022
Address:
Seattle, United States
Editors:
Guy Emerson, Natalie Schluter, Gabriel Stanovsky, Ritesh Kumar, Alexis Palmer, Nathan Schneider, Siddharth Singh, Shyam Ratan
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
394–399
URL:
https://aclanthology.org/2022.semeval-1.52
DOI:
10.18653/v1/2022.semeval-1.52
Cite (ACL):
Sahil Bhatt and Manish Shrivastava. 2022. Tesla at SemEval-2022 Task 4: Patronizing and Condescending Language Detection using Transformer-based Models with Data Augmentation. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), pages 394–399, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
Tesla at SemEval-2022 Task 4: Patronizing and Condescending Language Detection using Transformer-based Models with Data Augmentation (Bhatt & Shrivastava, SemEval 2022)
PDF:
https://aclanthology.org/2022.semeval-1.52.pdf
Video:
https://aclanthology.org/2022.semeval-1.52.mp4
Data
PAWS