MS@IW at SemEval-2022 Task 4: Patronising and Condescending Language Detection with Synthetically Generated Data

Selina Meyer, Maximilian Schmidhuber, Udo Kruschwitz


Abstract
In this description paper, we outline the system architecture submitted to Task 4, Subtask 1 at SemEval-2022. We leverage state-of-the-art generative pretrained transformer models to increase training set size and remedy class imbalance. Our best submitted system is trained on a synthetically enhanced dataset with 10.3 times as many positive samples as the original dataset and reaches an F1 score of 50.62%, which is 10 percentage points higher than our initial system trained on an undersampled version of the original dataset. We explore possible reasons for the comparatively low score in the overall task ranking and report on experiments conducted during the post-evaluation phase.
Anthology ID:
2022.semeval-1.47
Volume:
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
Month:
July
Year:
2022
Address:
Seattle, United States
Editors:
Guy Emerson, Natalie Schluter, Gabriel Stanovsky, Ritesh Kumar, Alexis Palmer, Nathan Schneider, Siddharth Singh, Shyam Ratan
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
363–368
URL:
https://aclanthology.org/2022.semeval-1.47
DOI:
10.18653/v1/2022.semeval-1.47
Cite (ACL):
Selina Meyer, Maximilian Schmidhuber, and Udo Kruschwitz. 2022. MS@IW at SemEval-2022 Task 4: Patronising and Condescending Language Detection with Synthetically Generated Data. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), pages 363–368, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
MS@IW at SemEval-2022 Task 4: Patronising and Condescending Language Detection with Synthetically Generated Data (Meyer et al., SemEval 2022)
PDF:
https://aclanthology.org/2022.semeval-1.47.pdf
Video:
https://aclanthology.org/2022.semeval-1.47.mp4
Data:
DPM