From Prejudice to Parity: A New Approach to Debiasing Large Language Model Word Embeddings

Aishik Rakshit; Smriti Singh; Shuvam Keshari; Arijit Ghosh Chowdhury; Vinija Jain; Aman Chadha

Abstract

Embeddings play a pivotal role in the efficacy of large language models. They are the bedrock on which these models grasp contextual relationships, develop a more nuanced understanding of language, and consequently perform complex tasks that require a fundamental understanding of human language. Given that these embeddings often reflect or exhibit bias, it stands to reason that the models built on them may inadvertently learn this bias as well. In this work, we build on the seminal work of (CITATION) and (CITATION) and propose DeepSoftDebias, an algorithm that uses a neural network to perform ‘soft debiasing’. We exhaustively evaluate this algorithm across a variety of state-of-the-art datasets, accuracy metrics, and challenging NLP tasks. On a wide range of metrics, we find that DeepSoftDebias outperforms the current state-of-the-art methods at reducing bias across gender, race, and religion.

Anthology ID: 2025.coling-main.450
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 6718–6747
URL: https://aclanthology.org/2025.coling-main.450/
Cite (ACL): Aishik Rakshit, Smriti Singh, Shuvam Keshari, Arijit Ghosh Chowdhury, Vinija Jain, and Aman Chadha. 2025. From Prejudice to Parity: A New Approach to Debiasing Large Language Model Word Embeddings. In Proceedings of the 31st International Conference on Computational Linguistics, pages 6718–6747, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): From Prejudice to Parity: A New Approach to Debiasing Large Language Model Word Embeddings (Rakshit et al., COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.450.pdf
