The Hallucination Tax of Reinforcement Finetuning

Linxin Song; Taiwei Shi; Jieyu Zhao

doi:10.18653/v1/2025.findings-emnlp.112

Abstract

Reinforcement finetuning (RFT) has become a standard approach for enhancing the reasoning capabilities of large language models (LLMs). However, its impact on model trustworthiness remains underexplored. In this work, we identify and systematically study a critical side effect of RFT, which we term the hallucination tax: a degradation in refusal behavior that causes models to confidently produce hallucinated answers to unanswerable questions. To investigate this, we introduce SUM (Synthetic Unanswerable Math), a high-quality dataset of unanswerable math problems designed to probe models’ ability to recognize an unanswerable question by reasoning from insufficient or ambiguous information. Our results show that standard RFT training can reduce model refusal rates by more than 80%, which significantly increases models’ tendency to hallucinate. We further demonstrate that incorporating just 10% SUM during RFT substantially restores appropriate refusal behavior, with minimal accuracy trade-offs on solvable tasks. Crucially, this approach enables LLMs to leverage inference-time compute to reason about their own uncertainty and knowledge boundaries, improving generalization not only to out-of-domain math problems but also to factual question answering tasks.
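The two quantities the abstract relies on, a training mix containing 10% unanswerable problems and a refusal rate over model responses, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the choice to replace (rather than add to) solvable items so the total size stays fixed, and the detection of a refusal by a literal "I don't know" marker.

```python
import random


def mix_training_set(solvable, unanswerable, unanswerable_fraction=0.10, seed=0):
    """Build a mix in which a given fraction of items is unanswerable.

    Assumption: the total size stays equal to len(solvable), so some solvable
    items are swapped out. The paper may construct its mix differently.
    """
    if not 0.0 <= unanswerable_fraction <= 1.0:
        raise ValueError("unanswerable_fraction must be in [0, 1]")
    rng = random.Random(seed)
    n_total = len(solvable)
    n_unanswerable = round(n_total * unanswerable_fraction)
    if n_unanswerable > len(unanswerable):
        raise ValueError("not enough unanswerable items for the requested fraction")
    mixed = rng.sample(solvable, n_total - n_unanswerable)
    mixed += rng.sample(unanswerable, n_unanswerable)
    rng.shuffle(mixed)
    return mixed


def refusal_rate(responses, refusal_marker="i don't know"):
    """Fraction of responses containing the (hypothetical) refusal marker."""
    if not responses:
        return 0.0
    refused = sum(refusal_marker in response.lower() for response in responses)
    return refused / len(responses)


if __name__ == "__main__":
    solvable = [{"id": i, "answerable": True} for i in range(100)]
    unanswerable = [{"id": i, "answerable": False} for i in range(20)]
    mixed = mix_training_set(solvable, unanswerable)
    print(sum(not item["answerable"] for item in mixed), "of", len(mixed))
    print(refusal_rate(["I don't know.", "The answer is 42."]))
```

On unanswerable questions a higher refusal rate is the desired outcome, so a drop in this number after RFT is what the abstract calls the hallucination tax. A string match is a crude proxy; a real evaluation would need a more robust way to decide whether a response is a refusal.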

Anthology ID: 2025.findings-emnlp.112
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 2105–2120
URL: https://aclanthology.org/2025.findings-emnlp.112/
DOI: 10.18653/v1/2025.findings-emnlp.112
Cite (ACL): Linxin Song, Taiwei Shi, and Jieyu Zhao. 2025. The Hallucination Tax of Reinforcement Finetuning. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 2105–2120, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): The Hallucination Tax of Reinforcement Finetuning (Song et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-emnlp.112.pdf
Checklist: 2025.findings-emnlp.112.checklist.pdf
