Explicit vs. Implicit: Investigating Social Bias in Large Language Models through Self-Reflection

Yachao Zhao; Bo Wang; Yan Wang; Dongming Zhao; Ruifang He; Yuexian Hou

doi:10.18653/v1/2025.findings-acl.1

Abstract

Large Language Models (LLMs) have been shown to exhibit various biases and stereotypes in their generated content. While extensive research has investigated biases in LLMs, prior work has predominantly focused on explicit bias, with minimal attention to implicit bias and the relation between these two forms of bias. This paper presents a systematic framework grounded in social psychology theories to investigate and compare explicit and implicit biases in LLMs. We propose a novel self-reflection-based evaluation framework that operates in two phases: first measuring implicit bias through simulated psychological assessment methods, then evaluating explicit bias by prompting LLMs to analyze their own generated content. Through extensive experiments on advanced LLMs across multiple social dimensions, we demonstrate that LLMs exhibit a substantial inconsistency between explicit and implicit biases: while explicit bias manifests as mild stereotypes, implicit bias exhibits strong stereotypes. We further investigate the underlying factors contributing to this explicit-implicit bias inconsistency, examining the effects of training data scale, model size, and alignment techniques. Experimental results indicate that while explicit bias declines with increased training data and model size, implicit bias exhibits a contrasting upward trend. Moreover, contemporary alignment methods effectively suppress explicit bias but show limited efficacy in mitigating implicit bias.

Anthology ID: 2025.findings-acl.1
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1–12
URL: https://aclanthology.org/2025.findings-acl.1/
DOI: 10.18653/v1/2025.findings-acl.1
Cite (ACL): Yachao Zhao, Bo Wang, Yan Wang, Dongming Zhao, Ruifang He, and Yuexian Hou. 2025. Explicit vs. Implicit: Investigating Social Bias in Large Language Models through Self-Reflection. In Findings of the Association for Computational Linguistics: ACL 2025, pages 1–12, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Explicit vs. Implicit: Investigating Social Bias in Large Language Models through Self-Reflection (Zhao et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-acl.1.pdf
