@inproceedings{wang-etal-2024-sanitizing,
    title = "Sanitizing Large Language Models in Bug Detection with Data-Flow",
    author = "Wang, Chengpeng and
      Zhang, Wuqi and
      Su, Zian and
      Xu, Xiangzhe and
      Zhang, Xiangyu",
    editor = "Al-Onaizan, Yaser and
      Bansal, Mohit and
      Chen, Yun-Nung",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-emnlp.217",
    pages = "3790--3805",
    abstract = "Large language models (LLMs) show potential in code reasoning tasks, facilitating the customization of bug detection in software development. However, the hallucination effect can significantly compromise the reliability of bug reports. This work formulates a new schema of bug detection and presents a novel sanitization technique that detects false positives to mitigate hallucinations. Our key idea is to require LLMs to emit data-flow paths in few-shot chain-of-thought prompting and to validate them via program-property decomposition. Specifically, we dissect data-flow paths into basic properties over concise code snippets and leverage parsing-based analysis and LLMs for validation. Our approach achieves 91.03{\%} precision and 74.00{\%} recall on average on synthetic benchmarks, with the sanitization boosting precision by 21.99{\%}. An evaluation on real-world Android malware applications also demonstrates superiority over an industrial analyzer, surpassing its precision and recall by 15.36{\%} and 3.61{\%}, respectively.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="wang-etal-2024-sanitizing">
  <titleInfo>
    <title>Sanitizing Large Language Models in Bug Detection with Data-Flow</title>
  </titleInfo>
  <name type="personal">
    <namePart type="given">Chengpeng</namePart>
    <namePart type="family">Wang</namePart>
    <role>
      <roleTerm authority="marcrelator" type="text">author</roleTerm>
    </role>
  </name>
  <name type="personal">
    <namePart type="given">Wuqi</namePart>
    <namePart type="family">Zhang</namePart>
    <role>
      <roleTerm authority="marcrelator" type="text">author</roleTerm>
    </role>
  </name>
  <name type="personal">
    <namePart type="given">Zian</namePart>
    <namePart type="family">Su</namePart>
    <role>
      <roleTerm authority="marcrelator" type="text">author</roleTerm>
    </role>
  </name>
  <name type="personal">
    <namePart type="given">Xiangzhe</namePart>
    <namePart type="family">Xu</namePart>
    <role>
      <roleTerm authority="marcrelator" type="text">author</roleTerm>
    </role>
  </name>
  <name type="personal">
    <namePart type="given">Xiangyu</namePart>
    <namePart type="family">Zhang</namePart>
    <role>
      <roleTerm authority="marcrelator" type="text">author</roleTerm>
    </role>
  </name>
  <originInfo>
    <dateIssued>2024-11</dateIssued>
  </originInfo>
  <typeOfResource>text</typeOfResource>
  <relatedItem type="host">
    <titleInfo>
      <title>Findings of the Association for Computational Linguistics: EMNLP 2024</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Yaser</namePart>
      <namePart type="family">Al-Onaizan</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">editor</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Mohit</namePart>
      <namePart type="family">Bansal</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">editor</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Yun-Nung</namePart>
      <namePart type="family">Chen</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">editor</roleTerm>
      </role>
    </name>
    <originInfo>
      <publisher>Association for Computational Linguistics</publisher>
      <place>
        <placeTerm type="text">Miami, Florida, USA</placeTerm>
      </place>
    </originInfo>
    <genre authority="marcgt">conference publication</genre>
  </relatedItem>
  <abstract>Large language models (LLMs) show potential in code reasoning tasks, facilitating the customization of bug detection in software development. However, the hallucination effect can significantly compromise the reliability of bug reports. This work formulates a new schema of bug detection and presents a novel sanitization technique that detects false positives to mitigate hallucinations. Our key idea is to require LLMs to emit data-flow paths in few-shot chain-of-thought prompting and to validate them via program-property decomposition. Specifically, we dissect data-flow paths into basic properties over concise code snippets and leverage parsing-based analysis and LLMs for validation. Our approach achieves 91.03% precision and 74.00% recall on average on synthetic benchmarks, with the sanitization boosting precision by 21.99%. An evaluation on real-world Android malware applications also demonstrates superiority over an industrial analyzer, surpassing its precision and recall by 15.36% and 3.61%, respectively.</abstract>
  <identifier type="citekey">wang-etal-2024-sanitizing</identifier>
  <location>
    <url>https://aclanthology.org/2024.findings-emnlp.217</url>
  </location>
  <part>
    <date>2024-11</date>
    <extent unit="page">
      <start>3790</start>
      <end>3805</end>
    </extent>
  </part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Sanitizing Large Language Models in Bug Detection with Data-Flow
%A Wang, Chengpeng
%A Zhang, Wuqi
%A Su, Zian
%A Xu, Xiangzhe
%A Zhang, Xiangyu
%Y Al-Onaizan, Yaser
%Y Bansal, Mohit
%Y Chen, Yun-Nung
%S Findings of the Association for Computational Linguistics: EMNLP 2024
%D 2024
%8 November
%I Association for Computational Linguistics
%C Miami, Florida, USA
%F wang-etal-2024-sanitizing
%X Large language models (LLMs) show potential in code reasoning tasks, facilitating the customization of bug detection in software development. However, the hallucination effect can significantly compromise the reliability of bug reports. This work formulates a new schema of bug detection and presents a novel sanitization technique that detects false positives to mitigate hallucinations. Our key idea is to require LLMs to emit data-flow paths in few-shot chain-of-thought prompting and to validate them via program-property decomposition. Specifically, we dissect data-flow paths into basic properties over concise code snippets and leverage parsing-based analysis and LLMs for validation. Our approach achieves 91.03% precision and 74.00% recall on average on synthetic benchmarks, with the sanitization boosting precision by 21.99%. An evaluation on real-world Android malware applications also demonstrates superiority over an industrial analyzer, surpassing its precision and recall by 15.36% and 3.61%, respectively.
%U https://aclanthology.org/2024.findings-emnlp.217
%P 3790-3805
Markdown (Informal)
[Sanitizing Large Language Models in Bug Detection with Data-Flow](https://aclanthology.org/2024.findings-emnlp.217) (Wang et al., Findings 2024)
ACL
Chengpeng Wang, Wuqi Zhang, Zian Su, Xiangzhe Xu, and Xiangyu Zhang. 2024. Sanitizing Large Language Models in Bug Detection with Data-Flow. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 3790–3805, Miami, Florida, USA. Association for Computational Linguistics.
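
Since the abstract only sketches the pipeline in prose, here is a minimal, self-contained Python sketch of the idea it describes. This is not the authors' implementation: the names (`DataFlowStep`, `parse_llm_path`, `step_is_plausible`, `sanitize`) and the `src -> dst @line` report syntax are invented for illustration, and the AST check stands in for only the parsing-based half of the paper's validation (the paper additionally uses LLMs to validate properties over concise code snippets).

```python
# Illustrative sketch only -- NOT the authors' implementation. It mimics the
# abstract's pipeline: (1) an LLM, prompted with few-shot chain-of-thought,
# emits a bug report whose data-flow path is spelled out hop by hop;
# (2) the path is decomposed into per-hop properties; (3) each property is
# checked with a cheap parsing-based analysis, and reports whose paths fail
# any check are discarded as likely hallucinations (false positives).
import ast
from dataclasses import dataclass
from typing import List


@dataclass
class DataFlowStep:
    """One hop of a claimed data-flow path: src flows into dst on a line."""
    src: str
    dst: str
    line: int


def parse_llm_path(report: str) -> List[DataFlowStep]:
    """Parse a path such as 'cmd -> query @2; query -> execute @3'.

    The syntax is a made-up stand-in for whatever structured path the
    few-shot prompt asks the model to emit.
    """
    steps = []
    for hop in filter(None, (h.strip() for h in report.split(";"))):
        flow, _, line = hop.partition("@")
        src, _, dst = flow.partition("->")
        steps.append(DataFlowStep(src.strip(), dst.strip(), int(line)))
    return steps


def step_is_plausible(step: DataFlowStep, source: str) -> bool:
    """Parsing-based check of one hop: on the claimed line, the source
    variable must be read and the destination assigned or called."""
    reads, sinks = set(), set()
    for node in ast.walk(ast.parse(source)):
        if getattr(node, "lineno", None) != step.line:
            continue
        if isinstance(node, ast.Name):
            (reads if isinstance(node.ctx, ast.Load) else sinks).add(node.id)
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            sinks.add(node.func.id)  # treat a called function as a sink
    return step.src in reads and step.dst in sinks


def sanitize(report: str, source: str) -> bool:
    """Keep a report only if every hop of its data-flow path validates."""
    return all(step_is_plausible(s, source) for s in parse_llm_path(report))


code = "cmd = input()\nquery = 'SELECT ' + cmd\nexecute(query)\n"
real = "cmd -> query @2; query -> execute @3"   # consistent with the code
fake = "cmd -> token @2; token -> execute @3"   # hallucinated variable
print(sanitize(real, code))  # True  -> report survives sanitization
print(sanitize(fake, code))  # False -> flagged as a false positive
```

In this toy version, a single failed hop discards the whole report, which mirrors the abstract's use of path validation as a false-positive filter; how the paper weighs parsing-based versus LLM-based property checks is described in the full text linked above.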