@inproceedings{xiong-etal-2025-sarc7,
title = "Sarc7: Evaluating Sarcasm Detection and Generation with Seven Types and Emotion-Informed Techniques",
author = "Xiong, Lang and
Gao, Raina and
Jeong, Alyssa",
editor = "Zhang, Chen and
Allaway, Emily and
Shen, Hua and
Miculicich, Lesly and
Li, Yinqiao and
M'hamdi, Meryem and
Limkonchotiwat, Peerat and
Bai, Richard He and
T.y.s.s., Santosh and
Han, Sophia Simeng and
Thapa, Surendrabikram and
Rim, Wiem Ben",
booktitle = "Proceedings of the 9th Widening NLP Workshop",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.winlp-main.25/",
pages = "157--166",
ISBN = "979-8-89176-351-7",
abstract = "Sarcasm is a complex linguistic and pragmatic phenomenon where expressions convey meanings that contrast with their literal interpretations, requiring sensitivity to the speaker{'}s intent and context. Misinterpreting sarcasm in collaborative human{--}AI settings can lead to under- or overreliance on LLM outputs, with consequences ranging from breakdowns in communication to critical safety failures. We introduce Sarc7, a benchmark for fine-grained sarcasm evaluation based on the MUStARD dataset, annotated with seven pragmatically defined sarcasm types: self-deprecating, brooding, deadpan, polite, obnoxious, raging, and manic. These categories are adapted from prior linguistic work and used to create a structured dataset suitable for LLM evaluation. For classification, we evaluate multiple prompting strategies{---}zero-shot, few-shot, chain-of-thought (CoT), and a novel emotion-based technique{---}across five major LLMs. Emotion-based prompting yields the highest macro-averaged F1 score of 0.3664 (Gemini 2.5), outperforming CoT for several models and demonstrating its effectiveness in sarcasm type recognition. For sarcasm generation, we design structured prompts using fixed values across four sarcasm-relevant dimensions: incongruity, shock value, context dependency, and emotion. Using Claude 3.5 Sonnet, this approach produces more subtype-aligned outputs, with human evaluators preferring emotion-based generations 38.46{\%} more often than zero-shot baselines. Sarc7 offers a foundation for evaluating nuanced sarcasm understanding and controllable generation in LLMs, pushing beyond binary classification toward interpretable, emotion-informed language modeling."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="xiong-etal-2025-sarc7">
<titleInfo>
<title>Sarc7: Evaluating Sarcasm Detection and Generation with Seven Types and Emotion-Informed Techniques</title>
</titleInfo>
<name type="personal">
<namePart type="given">Lang</namePart>
<namePart type="family">Xiong</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Raina</namePart>
<namePart type="family">Gao</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Alyssa</namePart>
<namePart type="family">Jeong</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-11</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 9th Widening NLP Workshop</title>
</titleInfo>
<name type="personal">
<namePart type="given">Chen</namePart>
<namePart type="family">Zhang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Emily</namePart>
<namePart type="family">Allaway</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Hua</namePart>
<namePart type="family">Shen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Lesly</namePart>
<namePart type="family">Miculicich</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yinqiao</namePart>
<namePart type="family">Li</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Meryem</namePart>
<namePart type="family">M’hamdi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Peerat</namePart>
<namePart type="family">Limkonchotiwat</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Richard</namePart>
<namePart type="given">He</namePart>
<namePart type="family">Bai</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Santosh</namePart>
<namePart type="family">T.y.s.s.</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sophia</namePart>
<namePart type="given">Simeng</namePart>
<namePart type="family">Han</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Surendrabikram</namePart>
<namePart type="family">Thapa</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Wiem</namePart>
<namePart type="given">Ben</namePart>
<namePart type="family">Rim</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Suzhou, China</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-351-7</identifier>
</relatedItem>
<abstract>Sarcasm is a complex linguistic and pragmatic phenomenon where expressions convey meanings that contrast with their literal interpretations, requiring sensitivity to the speaker’s intent and context. Misinterpreting sarcasm in collaborative human–AI settings can lead to under- or overreliance on LLM outputs, with consequences ranging from breakdowns in communication to critical safety failures. We introduce Sarc7, a benchmark for fine-grained sarcasm evaluation based on the MUStARD dataset, annotated with seven pragmatically defined sarcasm types: self-deprecating, brooding, deadpan, polite, obnoxious, raging, and manic. These categories are adapted from prior linguistic work and used to create a structured dataset suitable for LLM evaluation. For classification, we evaluate multiple prompting strategies—zero-shot, few-shot, chain-of-thought (CoT), and a novel emotion-based technique—across five major LLMs. Emotion-based prompting yields the highest macro-averaged F1 score of 0.3664 (Gemini 2.5), outperforming CoT for several models and demonstrating its effectiveness in sarcasm type recognition. For sarcasm generation, we design structured prompts using fixed values across four sarcasm-relevant dimensions: incongruity, shock value, context dependency, and emotion. Using Claude 3.5 Sonnet, this approach produces more subtype-aligned outputs, with human evaluators preferring emotion-based generations 38.46% more often than zero-shot baselines. Sarc7 offers a foundation for evaluating nuanced sarcasm understanding and controllable generation in LLMs, pushing beyond binary classification toward interpretable, emotion-informed language modeling.</abstract>
<identifier type="citekey">xiong-etal-2025-sarc7</identifier>
<location>
<url>https://aclanthology.org/2025.winlp-main.25/</url>
</location>
<part>
<date>2025-11</date>
<extent unit="page">
<start>157</start>
<end>166</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Sarc7: Evaluating Sarcasm Detection and Generation with Seven Types and Emotion-Informed Techniques
%A Xiong, Lang
%A Gao, Raina
%A Jeong, Alyssa
%Y Zhang, Chen
%Y Allaway, Emily
%Y Shen, Hua
%Y Miculicich, Lesly
%Y Li, Yinqiao
%Y M’hamdi, Meryem
%Y Limkonchotiwat, Peerat
%Y Bai, Richard He
%Y T.y.s.s., Santosh
%Y Han, Sophia Simeng
%Y Thapa, Surendrabikram
%Y Rim, Wiem Ben
%S Proceedings of the 9th Widening NLP Workshop
%D 2025
%8 November
%I Association for Computational Linguistics
%C Suzhou, China
%@ 979-8-89176-351-7
%F xiong-etal-2025-sarc7
%X Sarcasm is a complex linguistic and pragmatic phenomenon where expressions convey meanings that contrast with their literal interpretations, requiring sensitivity to the speaker’s intent and context. Misinterpreting sarcasm in collaborative human–AI settings can lead to under- or overreliance on LLM outputs, with consequences ranging from breakdowns in communication to critical safety failures. We introduce Sarc7, a benchmark for fine-grained sarcasm evaluation based on the MUStARD dataset, annotated with seven pragmatically defined sarcasm types: self-deprecating, brooding, deadpan, polite, obnoxious, raging, and manic. These categories are adapted from prior linguistic work and used to create a structured dataset suitable for LLM evaluation. For classification, we evaluate multiple prompting strategies—zero-shot, few-shot, chain-of-thought (CoT), and a novel emotion-based technique—across five major LLMs. Emotion-based prompting yields the highest macro-averaged F1 score of 0.3664 (Gemini 2.5), outperforming CoT for several models and demonstrating its effectiveness in sarcasm type recognition. For sarcasm generation, we design structured prompts using fixed values across four sarcasm-relevant dimensions: incongruity, shock value, context dependency, and emotion. Using Claude 3.5 Sonnet, this approach produces more subtype-aligned outputs, with human evaluators preferring emotion-based generations 38.46% more often than zero-shot baselines. Sarc7 offers a foundation for evaluating nuanced sarcasm understanding and controllable generation in LLMs, pushing beyond binary classification toward interpretable, emotion-informed language modeling.
%U https://aclanthology.org/2025.winlp-main.25/
%P 157-166
Markdown (Informal)
[Sarc7: Evaluating Sarcasm Detection and Generation with Seven Types and Emotion-Informed Techniques](https://aclanthology.org/2025.winlp-main.25/) (Xiong et al., WiNLP 2025)
ACL
Lang Xiong, Raina Gao, and Alyssa Jeong. 2025. Sarc7: Evaluating Sarcasm Detection and Generation with Seven Types and Emotion-Informed Techniques. In Proceedings of the 9th Widening NLP Workshop, pages 157–166, Suzhou, China. Association for Computational Linguistics.