Efficient Hallucination Detection in Automatic Code Generation

Georgii Andriushchenko; Roman Garaev; Lyudmila Rvanova; Artem Shelmanov; Vladimir V. Ivanov

Abstract

Large language models (LLMs) frequently produce source code that appears correct and well-formed, yet contains hallucinated elements that cause downstream test failures. In this study, we benchmark state-of-the-art uncertainty quantification methods and existing baselines on the task of hallucination detection in source code, and we introduce a diff-based pipeline for constructing a code dataset annotated with line-level hallucinations. Building on this dataset, we train a lightweight Transformer-based detector that uses LLM internal representations to identify hallucinations, substantially outperforming existing methods across several code generation domains. The detector also shows particular promise for enabling self-correction in LLM-based coding agents. We release the first publicly available dataset of line-level code hallucinations, along with the corresponding source code and trained hallucination detectors, at https://github.com/datapaf/CodeHallucinationDetection.
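The abstract does not spell out how the diff-based pipeline works, so the following is only a minimal sketch of one plausible reading: assuming each generated snippet can be paired with a corrected version that passes its tests, lines of the generated code that the diff marks as replaced or deleted are labeled as hallucinated. The function name `label_lines`, the 0/1 label scheme, and the example snippets are all hypothetical and are not taken from the paper or its repository.

```python
import difflib


def label_lines(generated: str, corrected: str) -> list[tuple[str, int]]:
    """Pair each generated line with 1 (changed by the fix) or 0 (kept).

    Assumption: a line the corrected version replaces or deletes is
    treated as hallucinated. The paper's actual rule may differ.
    """
    gen_lines = generated.splitlines()
    fix_lines = corrected.splitlines()
    labels = [0] * len(gen_lines)

    # Line-level diff; autojunk is disabled so short snippets diff exactly.
    matcher = difflib.SequenceMatcher(a=gen_lines, b=fix_lines, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            for i in range(i1, i2):
                labels[i] = 1
    return list(zip(gen_lines, labels))


# Hypothetical example: `math.circle_area` does not exist in Python's
# standard library, so the generated snippet would fail at run time.
generated = "import math\ndef area(r):\n    return math.circle_area(r)\n"
corrected = "import math\ndef area(r):\n    return math.pi * r ** 2\n"

for line, label in label_lines(generated, corrected):
    print(label, line)
```

In this sketch only the third line differs between the two versions, so it is the only line that receives label 1. Lines that the fix merely adds carry no label, since they have no counterpart in the generated code.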

Anthology ID: 2026.findings-acl.2143
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 43197–43220
URL: https://aclanthology.org/2026.findings-acl.2143/
Cite (ACL): Georgii Andriushchenko, Roman Garaev, Lyudmila Rvanova, Artem Shelmanov, and Vladimir V. Ivanov. 2026. Efficient Hallucination Detection in Automatic Code Generation. In Findings of the Association for Computational Linguistics: ACL 2026, pages 43197–43220, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Efficient Hallucination Detection in Automatic Code Generation (Andriushchenko et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.2143.pdf
Checklist: 2026.findings-acl.2143.checklist.pdf
