@inproceedings{ok-etal-2025-synthetic,
title = "Synthetic Paths to Integral Truth: Mitigating Hallucinations Caused by Confirmation Bias with Synthetic Data",
author = "Ok, Changwon and
Lee, Eunkyeong and
Oh, Dongsuk",
editor = "Rambow, Owen and
Wanner, Leo and
Apidianaki, Marianna and
Al-Khalifa, Hend and
Eugenio, Barbara Di and
Schockaert, Steven",
booktitle = "Proceedings of the 31st International Conference on Computational Linguistics",
month = jan,
year = "2025",
address = "Abu Dhabi, UAE",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.coling-main.347/",
pages = "5168--5180",
abstract = "Recently, large language models (LLMs) have made significant progress through retrieval-augmented generation (RAG) and preference learning. However, they still exhibit issues such as confirmation bias, the tendency to favor information that confirms one`s beliefs, which remains largely unexplored in current research. In this paper, we propose a novel approach to mitigate confirmation bias-induced hallucination in LLMs through a synthetic data construction pipeline and Direct Preference Optimization (DPO) training. Our method enhances the integration of diverse and complementary information from multiple passages retrieved by RAG, enabling more balanced and accurate reasoning. Experimental results demonstrate significant improvements in response accuracy and reduced hallucination on benchmarks such as Natural Questions Open and HaluBench. These findings suggest that our approach effectively mitigates confirmation bias in long-context question answering, with potential applications to other NLP tasks. We release our data, and evaluation/train code for public access.3]\url{https://github.com/OccasionallyNLP/Synthetic-Paths-to-Integral-Truth.git}"
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="ok-etal-2025-synthetic">
<titleInfo>
<title>Synthetic Paths to Integral Truth: Mitigating Hallucinations Caused by Confirmation Bias with Synthetic Data</title>
</titleInfo>
<name type="personal">
<namePart type="given">Changwon</namePart>
<namePart type="family">Ok</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Eunkyeong</namePart>
<namePart type="family">Lee</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Dongsuk</namePart>
<namePart type="family">Oh</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-01</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 31st International Conference on Computational Linguistics</title>
</titleInfo>
<name type="personal">
<namePart type="given">Owen</namePart>
<namePart type="family">Rambow</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Leo</namePart>
<namePart type="family">Wanner</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Marianna</namePart>
<namePart type="family">Apidianaki</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Hend</namePart>
<namePart type="family">Al-Khalifa</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Barbara</namePart>
<namePart type="given">Di</namePart>
<namePart type="family">Eugenio</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Steven</namePart>
<namePart type="family">Schockaert</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Abu Dhabi, UAE</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>Recently, large language models (LLMs) have made significant progress through retrieval-augmented generation (RAG) and preference learning. However, they still exhibit issues such as confirmation bias, the tendency to favor information that confirms one's beliefs, which remains largely unexplored in current research. In this paper, we propose a novel approach to mitigate confirmation bias-induced hallucination in LLMs through a synthetic data construction pipeline and Direct Preference Optimization (DPO) training. Our method enhances the integration of diverse and complementary information from multiple passages retrieved by RAG, enabling more balanced and accurate reasoning. Experimental results demonstrate significant improvements in response accuracy and reduced hallucination on benchmarks such as Natural Questions Open and HaluBench. These findings suggest that our approach effectively mitigates confirmation bias in long-context question answering, with potential applications to other NLP tasks. We release our data and evaluation/training code for public access: https://github.com/OccasionallyNLP/Synthetic-Paths-to-Integral-Truth.git</abstract>
<identifier type="citekey">ok-etal-2025-synthetic</identifier>
<location>
<url>https://aclanthology.org/2025.coling-main.347/</url>
</location>
<part>
<date>2025-01</date>
<extent unit="page">
<start>5168</start>
<end>5180</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Synthetic Paths to Integral Truth: Mitigating Hallucinations Caused by Confirmation Bias with Synthetic Data
%A Ok, Changwon
%A Lee, Eunkyeong
%A Oh, Dongsuk
%Y Rambow, Owen
%Y Wanner, Leo
%Y Apidianaki, Marianna
%Y Al-Khalifa, Hend
%Y Eugenio, Barbara Di
%Y Schockaert, Steven
%S Proceedings of the 31st International Conference on Computational Linguistics
%D 2025
%8 January
%I Association for Computational Linguistics
%C Abu Dhabi, UAE
%F ok-etal-2025-synthetic
%X Recently, large language models (LLMs) have made significant progress through retrieval-augmented generation (RAG) and preference learning. However, they still exhibit issues such as confirmation bias, the tendency to favor information that confirms one's beliefs, which remains largely unexplored in current research. In this paper, we propose a novel approach to mitigate confirmation bias-induced hallucination in LLMs through a synthetic data construction pipeline and Direct Preference Optimization (DPO) training. Our method enhances the integration of diverse and complementary information from multiple passages retrieved by RAG, enabling more balanced and accurate reasoning. Experimental results demonstrate significant improvements in response accuracy and reduced hallucination on benchmarks such as Natural Questions Open and HaluBench. These findings suggest that our approach effectively mitigates confirmation bias in long-context question answering, with potential applications to other NLP tasks. We release our data and evaluation/training code for public access: https://github.com/OccasionallyNLP/Synthetic-Paths-to-Integral-Truth.git
%U https://aclanthology.org/2025.coling-main.347/
%P 5168-5180
Markdown (Informal)
[Synthetic Paths to Integral Truth: Mitigating Hallucinations Caused by Confirmation Bias with Synthetic Data](https://aclanthology.org/2025.coling-main.347/) (Ok et al., COLING 2025)
ACL
Changwon Ok, Eunkyeong Lee, and Dongsuk Oh. 2025. [Synthetic Paths to Integral Truth: Mitigating Hallucinations Caused by Confirmation Bias with Synthetic Data](https://aclanthology.org/2025.coling-main.347/). In *Proceedings of the 31st International Conference on Computational Linguistics*, pages 5168–5180, Abu Dhabi, UAE. Association for Computational Linguistics.