Fine-tuning LLMs with Cross-Attention-based Weight Decay for Bias Mitigation

Farsheed Haque, Zhe Fu, Depeng Xu, Shuhan Yuan, Xi Niu

Abstract
Large Language Models (LLMs) excel in Natural Language Processing (NLP) tasks but often propagate societal biases from their training data, leading to discriminatory outputs. These biases are amplified by the models’ self-attention mechanisms, which disproportionately emphasize biased correlations with sensitive tokens, such as “he” or “she”, that reflect sensitive attributes like gender and race. To address this issue, we propose a novel fine-tuning method, called Cross-Attention-based Weight Decay (CrAWD), which modifies the LLM architecture to mitigate bias. CrAWD introduces a cross-attention mechanism between an input sequence and a sensitive token sequence, enabling the model to identify tokens associated with sensitive tokens and selectively decay their attention weights. This reduces the influence of biased associations on the model’s generations while maintaining task performance. Evaluations on real-world datasets demonstrate the effectiveness of our proposed CrAWD method. Notably, our method can handle multiple sensitive attributes by adjusting the sensitive token sequence, and it does not require full knowledge of the sensitive tokens present in the dataset, underscoring CrAWD’s versatility in promoting fair LLMs across various applications.
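
To illustrate the core idea, the following is a minimal PyTorch sketch of how a cross-attention-based weight decay could be wired into a transformer layer: input tokens attend to a sensitive token sequence, and tokens with high affinity to sensitive tokens have their self-attention weights scaled down. The module name SensitiveTokenDecay, the decay_rate parameter, and the max-affinity decay rule are illustrative assumptions, not the authors' released implementation.

import torch


class SensitiveTokenDecay(torch.nn.Module):
    """Sketch: decay self-attention weights of input tokens that attend
    strongly to sensitive-token embeddings via cross-attention."""

    def __init__(self, hidden_dim: int, decay_rate: float = 0.5):
        super().__init__()
        self.q_proj = torch.nn.Linear(hidden_dim, hidden_dim)
        self.k_proj = torch.nn.Linear(hidden_dim, hidden_dim)
        self.decay_rate = decay_rate  # assumed hyperparameter in [0, 1]

    def forward(self, hidden_states, sensitive_embeds, attn_weights):
        # hidden_states:    (batch, seq_len, hidden_dim)  input token states
        # sensitive_embeds: (batch, n_sens, hidden_dim)   sensitive token states
        # attn_weights:     (batch, heads, seq_len, seq_len) self-attention
        q = self.q_proj(hidden_states)       # queries from input tokens
        k = self.k_proj(sensitive_embeds)    # keys from sensitive tokens
        d = q.size(-1)
        # Cross-attention: affinity of each input token to each sensitive token.
        cross = torch.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
        # Association score per input token: max affinity to any sensitive token.
        assoc = cross.max(dim=-1).values     # (batch, seq_len), in [0, 1]
        # Strongly associated tokens receive smaller attention weights.
        decay = 1.0 - self.decay_rate * assoc
        # Apply decay along the key dimension, then renormalize rows.
        attn = attn_weights * decay[:, None, None, :]
        return attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-9)

In this sketch the decayed weights remain a valid attention distribution after renormalization, so the layer can drop into a standard fine-tuning loop; how CrAWD computes the association score and decay schedule is specified in the paper itself.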
Anthology ID:
2025.findings-emnlp.854
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
15785–15798
URL:
https://aclanthology.org/2025.findings-emnlp.854/
Cite (ACL):
Farsheed Haque, Zhe Fu, Depeng Xu, Shuhan Yuan, and Xi Niu. 2025. Fine-tuning LLMs with Cross-Attention-based Weight Decay for Bias Mitigation. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 15785–15798, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Fine-tuning LLMs with Cross-Attention-based Weight Decay for Bias Mitigation (Haque et al., Findings 2025)
PDF:
https://aclanthology.org/2025.findings-emnlp.854.pdf
Checklist:
https://aclanthology.org/2025.findings-emnlp.854.checklist.pdf