Investigating Counterfactual Unfairness in LLMs towards Identities through Humor

Shubin Kim; Yejin Son; Junyeong Park; Keummin Ka; Seungbeen Lee; Jaeyoung Lee; Hyeju Jang; Alice Oh; Youngjae Yu

Abstract

Humor holds up a mirror to social perception: what we find funny often reflects who we are and how we judge others. When language models engage with humor, their reactions expose the social assumptions they have internalized from training data. In this paper, we investigate counterfactual unfairness through humor by observing how the model’s responses change when we swap who speaks and who is addressed while holding other factors constant. Our framework spans three tasks: humor generation refusal, speaker intention inference, and relational/societal impact prediction, covering both identity-agnostic humor and identity-specific disparagement humor. We introduce interpretable bias metrics that capture asymmetric patterns under identity swaps. Experiments across state-of-the-art models reveal consistent relational disparities: jokes told by privileged speakers are refused up to 67.5% more often, judged as malicious 64.7% more frequently, and rated up to 1.5 points higher in social harm on a 5-point scale. These patterns highlight how sensitivity and stereotyping coexist in generative models, complicating efforts toward fairness and cultural alignment.
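The identity-swap comparison described in the abstract can be illustrated with a minimal sketch. It is not the paper's metric: the function name `refusal_rate_gap`, the record layout, and the group labels are hypothetical, and the data are toy values rather than reported results. The sketch takes the refusal rate for each (speaker, target) pairing and subtracts the rate for the swapped pairing, so a nonzero gap indicates asymmetric treatment under the swap.

```python
from collections import defaultdict


def refusal_rate_gap(records):
    """Compute swap asymmetry in refusal rates.

    records: iterable of (speaker, target, refused) tuples, where
    `refused` is a bool. Returns a dict mapping each ordered pair
    (a, b) with a < b to rate(speaker=a, target=b) minus
    rate(speaker=b, target=a). Pairs lacking a swapped counterpart
    are skipped.
    """
    # (speaker, target) -> [number of refusals, number of prompts]
    counts = defaultdict(lambda: [0, 0])
    for speaker, target, refused in records:
        cell = counts[(speaker, target)]
        cell[0] += int(refused)
        cell[1] += 1

    gaps = {}
    for (a, b), (ref_ab, n_ab) in list(counts.items()):
        # Report each unordered pair once, and only if both directions exist.
        if a < b and (b, a) in counts:
            ref_ba, n_ba = counts[(b, a)]
            gaps[(a, b)] = ref_ab / n_ab - ref_ba / n_ba
    return gaps


# Toy data with placeholder group labels (hypothetical, not from the paper).
toy = [
    ("group_x", "group_y", True),
    ("group_x", "group_y", True),
    ("group_y", "group_x", True),
    ("group_y", "group_x", False),
]
print(refusal_rate_gap(toy))
```

A gap of zero for a pair would mean the swap leaves the refusal rate unchanged. The same pattern could be applied to intention labels or harm ratings by replacing the boolean with the relevant score and averaging it per pairing.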

Anthology ID: 2026.acl-long.2041
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 44092–44138
URL: https://aclanthology.org/2026.acl-long.2041/
Cite (ACL): Shubin Kim, Yejin Son, Junyeong Park, Keummin Ka, Seungbeen Lee, Jaeyoung Lee, Hyeju Jang, Alice Oh, and Youngjae Yu. 2026. Investigating Counterfactual Unfairness in LLMs towards Identities through Humor. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 44092–44138, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Investigating Counterfactual Unfairness in LLMs towards Identities through Humor (Kim et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.2041.pdf
Checklist: 2026.acl-long.2041.checklist.pdf
