Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis

David F. Jenny, Yann Billeter, Bernhard Schölkopf, Zhijing Jin


Abstract
The rapid advancement of Large Language Models (LLMs) has sparked intense debate regarding the prevalence of bias in these models and its mitigation. Yet, as exemplified by both results on debiasing methods in the literature and reports of alignment-related defects from the wider community, bias remains a poorly understood topic despite its practical relevance. To enhance the understanding of the internal causes of bias, we analyse LLM bias through the lens of causal fairness analysis, which enables us to both comprehend the origins of bias and reason about its downstream consequences and mitigation. To operationalize this framework, we propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the LLM decision process. By applying Activity Dependency Networks (ADNs), we then analyse how these attributes influence an LLM’s decision process. We apply our method to LLM ratings of argument quality in political debates. We find that the observed disparate treatment can at least in part be attributed to confounding and mediating attributes and model misalignment, and discuss the consequences of our findings for human-AI alignment and bias mitigation.
Anthology ID:
2024.nlp4pi-1.15
Volume:
Proceedings of the Third Workshop on NLP for Positive Impact
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Daryna Dementieva, Oana Ignat, Zhijing Jin, Rada Mihalcea, Giorgio Piatti, Joel Tetreault, Steven Wilson, Jieyu Zhao
Venue:
NLP4PI
Publisher:
Association for Computational Linguistics
Pages:
152–178
URL:
https://aclanthology.org/2024.nlp4pi-1.15
Cite (ACL):
David F. Jenny, Yann Billeter, Bernhard Schölkopf, and Zhijing Jin. 2024. Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis. In Proceedings of the Third Workshop on NLP for Positive Impact, pages 152–178, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis (Jenny et al., NLP4PI 2024)
PDF:
https://aclanthology.org/2024.nlp4pi-1.15.pdf