Monitoring Decoding: Mitigating Hallucination via Evaluating the Factuality of Partial Response during Generation

Yurui Chang; Bochuan Cao; Lu Lin

doi:10.18653/v1/2025.findings-acl.752

Abstract

While large language models have demonstrated exceptional performance across a wide range of tasks, they remain susceptible to hallucinations: generating plausible yet factually incorrect content. Existing methods for mitigating this risk often rely on sampling multiple full-length generations, which introduces significant response latency and becomes ineffective when the model consistently produces hallucinated outputs with high confidence. To address these limitations, we introduce Monitoring Decoding (MD), a novel framework that dynamically monitors the generation process and selectively applies in-process interventions, focusing on revising the crucial tokens responsible for hallucinations. Instead of waiting for multiple full-length generations to complete, we identify hallucination-prone tokens during generation using a monitor function, and then refine these tokens through a tree-based decoding strategy. This approach improves the factual accuracy and coherence of the generated output while maintaining efficiency. Experimental results demonstrate that MD consistently outperforms self-consistency-based approaches in both effectiveness and efficiency, achieving higher factual accuracy while significantly reducing computational overhead.
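The abstract does not specify the monitor function or the details of the tree-based search, so the following is only a toy sketch of the general control flow it describes: decode greedily, score each proposed token with a monitor, and expand a small tree of alternatives only when a token is flagged. Every name here (`monitored_decode`, `best_branch`, `toy_propose`, `toy_monitor`, the threshold, the mean-score ranking) is a hypothetical stand-in for illustration and not the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple

Token = str
# Hypothetical interfaces: a proposer returns (token, probability) candidates,
# and a monitor returns a factuality score in [0, 1] for a candidate token.
Proposer = Callable[[Sequence[Token]], List[Tuple[Token, float]]]
Monitor = Callable[[Sequence[Token], Token], float]

EOS = "<eos>"


def best_branch(prefix: Sequence[Token], propose: Proposer, monitor: Monitor,
                width: int, depth: int) -> List[Token]:
    """Expand a small tree below `prefix` and return the path with the best mean monitor score."""
    frontier: List[Tuple[List[Token], float]] = [([], 0.0)]
    for _ in range(depth):
        expanded = []
        for path, total in frontier:
            context = list(prefix) + path
            top = sorted(propose(context), key=lambda c: c[1], reverse=True)[:width]
            for token, _prob in top:
                expanded.append((path + [token], total + monitor(context, token)))
        if not expanded:
            break
        frontier = expanded
    return max(frontier, key=lambda item: item[1] / max(len(item[0]), 1))[0]


def monitored_decode(prompt: Sequence[Token], propose: Proposer, monitor: Monitor,
                     threshold: float = 0.5, width: int = 2, depth: int = 2,
                     max_new_tokens: int = 8) -> List[Token]:
    """Greedy decoding that intervenes only on tokens the monitor scores below `threshold`."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        token = max(propose(out), key=lambda c: c[1])[0]  # greedy choice
        if monitor(out, token) < threshold:
            # Flagged token: search alternatives and commit the first token of the best path.
            path = best_branch(out, propose, monitor, width, depth)
            if path:
                token = path[0]
        if token == EOS:
            break
        out.append(token)
    return out[len(prompt):]


# Toy stand-ins (assumptions): a lookup-table "model" and a monitor that
# distrusts one known-wrong token. A real monitor would be far richer.
TOY_TABLE = {
    "is": [("Sydney", 0.6), ("Canberra", 0.4)],
    "Sydney": [(".", 1.0)],
    "Canberra": [(".", 1.0)],
    ".": [(EOS, 1.0)],
}


def toy_propose(context: Sequence[Token]) -> List[Tuple[Token, float]]:
    return TOY_TABLE.get(context[-1], [(EOS, 1.0)])


def toy_monitor(context: Sequence[Token], token: Token) -> float:
    return 0.0 if token == "Sydney" else 1.0


prompt = ["The", "capital", "of", "Australia", "is"]
print(monitored_decode(prompt, toy_propose, toy_monitor))  # ['Canberra', '.']
```

In this toy setting the greedy token "Sydney" is flagged, so only that single step pays for a tree expansion; the remaining tokens are accepted directly, which is the efficiency argument the abstract makes against sampling several full-length generations.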

Anthology ID: 2025.findings-acl.752
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 14574–14587
URL: https://aclanthology.org/2025.findings-acl.752/
DOI: 10.18653/v1/2025.findings-acl.752
Cite (ACL): Yurui Chang, Bochuan Cao, and Lu Lin. 2025. Monitoring Decoding: Mitigating Hallucination via Evaluating the Factuality of Partial Response during Generation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 14574–14587, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Monitoring Decoding: Mitigating Hallucination via Evaluating the Factuality of Partial Response during Generation (Chang et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-acl.752.pdf
