The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns

Bastian Bunzeck, Sina Zarrieß


Abstract
We introduce SlayQA, a novel benchmark data set designed to evaluate language models’ ability to handle gender-inclusive language, specifically the use of neopronouns, in a question-answering setting. Derived from the Social IQa data set, SlayQA modifies context-question-answer triples to include gender-neutral pronouns, creating a significant linguistic distribution shift in comparison to common pre-training corpora like C4 or Dolma. Our results show that state-of-the-art language models struggle with the challenge, exhibiting small but noticeable performance drops when answering questions containing neopronouns compared to those without.
Anthology ID:
2024.genbench-1.3
Volume:
Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Dieuwke Hupkes, Verna Dankers, Khuyagbaatar Batsuren, Amirhossein Kazemnejad, Christos Christodoulopoulos, Mario Giulianelli, Ryan Cotterell
Venue:
GenBench
Publisher:
Association for Computational Linguistics
Pages:
42–53
URL:
https://aclanthology.org/2024.genbench-1.3
Cite (ACL):
Bastian Bunzeck and Sina Zarrieß. 2024. The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns. In Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP, pages 42–53, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns (Bunzeck & Zarrieß, GenBench 2024)
PDF:
https://aclanthology.org/2024.genbench-1.3.pdf