Measuring Harmful Sentence Completion in Language Models for LGBTQIA+ Individuals

Debora Nozza, Federico Bianchi, Anne Lauscher, Dirk Hovy


Abstract
Current language technology is ubiquitous and directly influences individuals’ lives worldwide. Given the recent trend in AI of training and constantly releasing new and ever more powerful large language models (LLMs), there is a need to assess their biases and potential concrete consequences. While some studies have highlighted the shortcomings of these models, little work examines the negative impact of LLMs on LGBTQIA+ individuals. In this paper, we investigate a state-of-the-art template-based approach for measuring the harmfulness of English LLMs’ sentence completions when the subjects belong to the LGBTQIA+ community. Our findings show that, on average, the most likely LLM-generated completion is an identity attack 13% of the time. Our results raise serious concerns about the applicability of these models in production environments.
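The template-based measurement described above can be sketched in a few lines. This is a toy simplification, not the paper's implementation (the authors' code is in the milanlproc/honest repository linked below): the hurtful-word set here stands in for a full lexicon such as HurtLex, and the example completions are invented placeholders rather than actual model outputs.

```python
# HONEST-style harmfulness sketch: fill identity templates, collect an LLM's
# top completions, and score the fraction that match a hurtful-word lexicon.

HURTFUL = {"disgusting", "wrong", "abnormal"}  # toy stand-in for a real lexicon


def honest_score(completions):
    """Fraction of completions containing at least one hurtful term."""
    hits = sum(any(w in HURTFUL for w in c.lower().split()) for c in completions)
    return hits / len(completions)


# Hypothetical template; in practice {subject} is filled with identity terms
# (e.g. LGBTQIA+ descriptors) and the masked slot is completed by the LLM.
template = "The {subject} is known for being [MASK]."

# Invented top-1 completions for eight filled templates, for illustration only.
completions = ["kind", "disgusting", "smart", "funny",
               "brave", "wrong", "loud", "calm"]

print(f"{honest_score(completions):.2%} of completions are hurtful")
```

In practice the completions would come from a masked or causal language model (e.g. via a fill-mask pipeline), and the score would be averaged over many templates and identity terms per model.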
Anthology ID:
2022.ltedi-1.4
Volume:
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Bharathi Raja Chakravarthi, B Bharathi, John P McCrae, Manel Zarrouk, Kalika Bali, Paul Buitelaar
Venue:
LTEDI
Publisher:
Association for Computational Linguistics
Pages:
26–34
URL:
https://aclanthology.org/2022.ltedi-1.4
DOI:
10.18653/v1/2022.ltedi-1.4
Bibkey:
Cite (ACL):
Debora Nozza, Federico Bianchi, Anne Lauscher, and Dirk Hovy. 2022. Measuring Harmful Sentence Completion in Language Models for LGBTQIA+ Individuals. In Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion, pages 26–34, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Measuring Harmful Sentence Completion in Language Models for LGBTQIA+ Individuals (Nozza et al., LTEDI 2022)
PDF:
https://aclanthology.org/2022.ltedi-1.4.pdf
Video:
https://aclanthology.org/2022.ltedi-1.4.mp4
Code:
milanlproc/honest
Data:
HONEST