FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning

Ruosen Li, Ziming Luo, Xinya Du


Abstract
Hallucinations in large language models (LLMs) pose significant challenges in tasks requiring complex multi-step reasoning, such as mathematical problem-solving. Existing approaches primarily detect the presence of hallucinations but lack a nuanced understanding of their types and manifestations. In this paper, we first introduce a comprehensive taxonomy that categorizes the common hallucinations in mathematical reasoning tasks into six types. We then propose FG-PRM (Fine-Grained Process Reward Model), an augmented model designed to detect and mitigate hallucinations in a fine-grained, step-level manner. To address the limitations of manually labeling training data, we propose an automated method for generating fine-grained hallucination data using LLMs. FG-PRM demonstrates superior performance across two key tasks: 1) fine-grained hallucination detection: classifying the hallucination type of each reasoning step; and 2) verification: ranking multiple LLM-generated outputs to select the most accurate solution. Our experiments show that FG-PRM excels at fine-grained hallucination detection and substantially boosts LLM performance on the GSM8K and MATH benchmarks. These results highlight the benefits of fine-grained supervision in enhancing the reliability and interpretability of LLM reasoning processes. Code and datasets are available at: https://github.com/du-nlp-lab/FG-PRM.
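As a concrete illustration of the verification task described in the abstract, the sketch below shows how step-level scores from a process reward model can be aggregated to rank candidate solutions (best-of-N selection). The `score_steps` interface, the min-over-steps aggregation rule, and all names here are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of best-of-N verification with a step-level
# process reward model (PRM). The PRM interface (`score_steps`) and
# the min-aggregation rule are assumptions for illustration only.
from typing import Callable, List

def rank_solutions(
    candidates: List[List[str]],  # N candidate solutions, each a list of reasoning steps
    score_steps: Callable[[List[str]], List[float]],  # PRM: per-step scores in [0, 1]
) -> List[int]:
    """Rank candidate solutions by their weakest reasoning step.

    One common way to turn step-level rewards into a solution-level
    score is to take the minimum over steps, so that a single
    hallucinated step penalizes the whole solution.
    """
    solution_scores = [min(score_steps(steps)) for steps in candidates]
    # Indices of candidates, best-scoring solution first.
    return sorted(range(len(candidates)), key=lambda i: -solution_scores[i])

# Toy usage with a stand-in PRM that flags steps marked "hallucinated".
if __name__ == "__main__":
    fake_prm = lambda steps: [0.1 if "hallucinated" in s else 0.9 for s in steps]
    cands = [
        ["Step 1: 2 + 3 = 5", "Step 2: 5 * 4 = 20"],
        ["Step 1: 2 + 3 = 6 (hallucinated)", "Step 2: 6 * 4 = 24"],
    ]
    print(rank_solutions(cands, fake_prm))  # [0, 1]: the correct solution ranks first
```

A fine-grained PRM as described in the paper would additionally emit a hallucination type per step rather than a single scalar; the scalar interface above is the simplest form that still supports ranking.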
Anthology ID:
2025.findings-emnlp.228
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
4247–4278
URL:
https://aclanthology.org/2025.findings-emnlp.228/
Cite (ACL):
Ruosen Li, Ziming Luo, and Xinya Du. 2025. FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 4247–4278, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning (Li et al., Findings 2025)
PDF:
https://aclanthology.org/2025.findings-emnlp.228.pdf
Checklist:
2025.findings-emnlp.228.checklist.pdf