Utilizing Weak Supervision to Create S3D: A Sarcasm Annotated Dataset

Jordan Painter, Helen Treharne, Diptesh Kanojia


Abstract
Sarcasm is prevalent in all corners of social media, posing many challenges within Natural Language Processing (NLP), particularly for sentiment analysis. Sarcasm detection remains a largely unsolved problem in many NLP tasks because sarcasm, as a figurative language construct, is contradictory and typically derogatory in nature. With recent strides in NLP, many pre-trained language models exist that have been trained on data from specific social media platforms, e.g., Twitter. In this paper, we evaluate the efficacy of multiple sarcasm detection datasets using machine learning and deep learning models. We create two new datasets: a manually annotated gold-standard Sarcasm Annotated Dataset (SAD) and a Silver-Standard Sarcasm-annotated Dataset (S3D). Using a combination of existing sarcasm datasets with SAD, we train a sarcasm detection model over a social-media domain pre-trained language model, BERTweet, which yields an F1-score of 78.29%. Using an Ensemble model with an underlying majority technique, we further label S3D to produce a weakly supervised dataset containing over 100,000 tweets. We publicly release all the code, our manually annotated and weakly supervised datasets, and fine-tuned models for further research.
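The abstract describes labelling S3D with an ensemble that uses an "underlying majority technique". As an illustration only, the sketch below assumes this means plain majority voting over the binary predictions of several classifiers. The function name `majority_vote`, the label convention (1 = sarcastic, 0 = non-sarcastic), and the tie-breaking rule are all hypothetical and are not taken from the paper or its released code.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model labels into one weak label per item.

    predictions[m][i] is the label that model m assigned to item i.
    Assumed convention (not from the paper): 1 = sarcastic, 0 = non-sarcastic.
    """
    if not predictions:
        return []

    n_items = len(predictions[0])
    if any(len(model_preds) != n_items for model_preds in predictions):
        raise ValueError("every model must label the same number of items")

    weak_labels = []
    # zip(*predictions) yields, for each item, the tuple of votes from all models.
    for votes in zip(*predictions):
        counts = Counter(votes)
        top_count = max(counts.values())
        # Assumed tie-break: prefer the smallest label (0) when votes are tied.
        winners = sorted(label for label, c in counts.items() if c == top_count)
        weak_labels.append(winners[0])
    return weak_labels


# Three hypothetical models voting on three tweets.
model_a = [1, 0, 1]
model_b = [1, 1, 0]
model_c = [0, 1, 1]
weak = majority_vote([model_a, model_b, model_c])
```

With an odd number of binary classifiers no ties can occur, so the tie-breaking rule only matters for an even-sized ensemble.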
Anthology ID:
2022.nlpcss-1.22
Volume:
Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS)
Month:
November
Year:
2022
Address:
Abu Dhabi, UAE
Editors:
David Bamman, Dirk Hovy, David Jurgens, Katherine Keith, Brendan O'Connor, Svitlana Volkova
Venue:
NLP+CSS
Publisher:
Association for Computational Linguistics
Pages:
197–206
URL:
https://aclanthology.org/2022.nlpcss-1.22
DOI:
10.18653/v1/2022.nlpcss-1.22
Cite (ACL):
Jordan Painter, Helen Treharne, and Diptesh Kanojia. 2022. Utilizing Weak Supervision to Create S3D: A Sarcasm Annotated Dataset. In Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS), pages 197–206, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Utilizing Weak Supervision to Create S3D: A Sarcasm Annotated Dataset (Painter et al., NLP+CSS 2022)
PDF:
https://aclanthology.org/2022.nlpcss-1.22.pdf
Video:
https://aclanthology.org/2022.nlpcss-1.22.mp4