Rishik Lad
2022
Dartmouth at SemEval-2022 Task 6: Detection of Sarcasm
Rishik Lad | Weicheng Ma | Soroush Vosoughi
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
This paper presents the results of Team Dartmouth’s experiments on each of the five subtasks for detecting sarcasm in English and Arabic tweets. We frame detection as a classification problem, and our contributions are threefold: we develop an English binary classifier with RoBERTa, an Arabic binary classifier with XLM-RoBERTa, and an English multilabel classifier with BERT. Before tokenization, we preprocess the labeled input data, for example by extracting verbs/adjectives or representative/significant keywords and appending them to the end of an input tweet, to help the models better understand and generalize sarcasm detection. We also discuss the results of simple data augmentation techniques intended to improve the quality of the given training dataset, as well as an alternative approach to multilabel sequence classification. Ultimately, our systems place us in the top 14 participants for each of the five subtasks.
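The keyword-appending preprocessing step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `append_keywords`, the `" | "` separator, and the hand-picked keyword set are all assumptions made for illustration, and the abstract does not say how keywords were selected or how they were joined to the tweet.

```python
import re


def append_keywords(tweet, keywords, separator=" | "):
    """Append keywords found in the tweet to its end.

    Hypothetical helper: the separator and the keyword set are
    illustrative assumptions, not details taken from the paper.
    """
    vocabulary = {k.lower() for k in keywords}
    # Simple lowercase word tokenization; a real system would use a proper tokenizer.
    tokens = re.findall(r"[a-z']+", tweet.lower())
    found = []
    for token in tokens:
        # Keep each matching keyword once, in order of first appearance.
        if token in vocabulary and token not in found:
            found.append(token)
    if not found:
        return tweet  # Nothing to append; leave the tweet unchanged.
    return tweet + separator + " ".join(found)


if __name__ == "__main__":
    # Assumed keyword set, chosen by hand for this example.
    keywords = {"great", "love", "waiting"}
    print(append_keywords("Oh great, another Monday. I just love waiting in traffic.", keywords))
```

The augmented string would then be passed to the model's tokenizer in place of the raw tweet. Extracting verbs and adjectives instead of a fixed keyword set would require a part-of-speech tagger, which this sketch leaves out.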