Prompting Fairness: Learning Prompts for Debiasing Large Language Models

Andrei-Victor Chisca, Andrei-Cristian Rad, Camelia Lemnaru


Abstract
Large language models are prone to internalizing social biases because of the characteristics of the data used in their self-supervised training. Given their recent emergence and wide availability to the general public, it is essential to identify and alleviate these biases to avoid perpetuating stereotypes towards underrepresented groups. We present a novel prompt-tuning method for reducing bias in encoder models such as BERT or RoBERTa. Unlike other methods, we train only a small set of additional, reusable token embeddings that can be concatenated to any input sequence to reduce bias in the outputs. We specialize this method to gender bias by providing a set of templates used to train the prompts. Evaluations on two benchmarks show that our method is on par with the state of the art while having a limited impact on language modeling ability.
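The central idea of the method, as stated in the abstract, is that the encoder stays frozen and only a few extra input embeddings are learned and prepended to every sequence. The toy sketch below illustrates that training setup in plain Python. It is not the authors' implementation: the vocabulary, the mean-pooled "encoder", the two-word output head, the templates, and the squared he/she logit-gap loss are all hypothetical stand-ins chosen only to show which parameters receive gradient updates.

```python
import random

DIM = 4                # hypothetical embedding size
NUM_PROMPT_TOKENS = 2  # hypothetical number of learned prompt vectors

# Hypothetical frozen token embeddings (a real encoder would supply these).
rng = random.Random(0)
VOCAB = ["the", "nurse", "engineer", "said", "that", "[MASK]", "is", "tired"]
EMBED = {w: [rng.uniform(-1, 1) for _ in range(DIM)] for w in VOCAB}

# Hypothetical frozen output head: one weight row per target word.
HEAD = {"he": [rng.uniform(-1, 1) for _ in range(DIM)],
        "she": [rng.uniform(-1, 1) for _ in range(DIM)]}

# Hypothetical gendered-context templates used to train the prompt.
TEMPLATES = [
    ["the", "nurse", "said", "that", "[MASK]", "is", "tired"],
    ["the", "engineer", "said", "that", "[MASK]", "is", "tired"],
]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pooled(prompt, tokens):
    """Mean-pool the prompt vectors concatenated in front of the token embeddings."""
    seq = prompt + [EMBED[t] for t in tokens]
    return [sum(v[i] for v in seq) / len(seq) for i in range(DIM)]

def gap(prompt, tokens):
    """Toy bias signal: logit('he') minus logit('she') for the pooled sequence."""
    h = pooled(prompt, tokens)
    return dot(HEAD["he"], h) - dot(HEAD["she"], h)

def loss(prompt, templates):
    """Mean squared he/she gap over all templates."""
    return sum(gap(prompt, t) ** 2 for t in templates) / len(templates)

def train_prompt(templates, steps=500, lr=0.5):
    # Only these prompt vectors are trainable; EMBED and HEAD are never modified.
    prompt = [[0.0] * DIM for _ in range(NUM_PROMPT_TOKENS)]
    w_diff = [a - b for a, b in zip(HEAD["he"], HEAD["she"])]
    for _ in range(steps):
        grad = [0.0] * DIM
        for t in templates:
            g = gap(prompt, t)
            seq_len = len(prompt) + len(t)
            # d(gap^2)/d(prompt_j) = 2 * gap * w_diff / seq_len, the same for every prompt vector
            for i in range(DIM):
                grad[i] += 2 * g * w_diff[i] / seq_len / len(templates)
        for vec in prompt:
            for i in range(DIM):
                vec[i] -= lr * grad[i]
    return prompt

if __name__ == "__main__":
    untrained = [[0.0] * DIM for _ in range(NUM_PROMPT_TOKENS)]
    learned = train_prompt(TEMPLATES)
    print("loss before:", loss(untrained, TEMPLATES))
    print("loss after: ", loss(learned, TEMPLATES))
```

Because gradients reach only the prompt vectors, the same learned prompt can be prepended to any input without touching the model weights. A realistic version would instead feed the concatenated embeddings to an actual masked language model and rely on an autograd framework rather than hand-derived gradients.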
Anthology ID:
2024.ltedi-1.6
Volume:
Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion
Month:
March
Year:
2024
Address:
St. Julian's, Malta
Editors:
Bharathi Raja Chakravarthi, Bharathi B, Paul Buitelaar, Thenmozhi Durairaj, György Kovács, Miguel Ángel García Cumbreras
Venues:
LTEDI | WS
Publisher:
Association for Computational Linguistics
Pages:
52–62
URL:
https://aclanthology.org/2024.ltedi-1.6
Cite (ACL):
Andrei-Victor Chisca, Andrei-Cristian Rad, and Camelia Lemnaru. 2024. Prompting Fairness: Learning Prompts for Debiasing Large Language Models. In Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion, pages 52–62, St. Julian's, Malta. Association for Computational Linguistics.
Cite (Informal):
Prompting Fairness: Learning Prompts for Debiasing Large Language Models (Chisca et al., LTEDI-WS 2024)
PDF:
https://aclanthology.org/2024.ltedi-1.6.pdf