DocChecker: Bootstrapping Code Large Language Model for Detecting and Resolving Code-Comment Inconsistencies

Anh Dau, Jin L.C. Guo, Nghi Bui


Abstract
Comments in source code are crucial for developers to understand the purpose of the code and to use it correctly. However, keeping comments aligned with the evolving codebase poses a significant challenge. Despite increasing interest in automated solutions to identify and rectify discrepancies between code and its associated comments, most existing methods rely heavily on heuristic rules. This paper introduces DocChecker, a language model-based framework adept at detecting inconsistencies between code and comments and capable of generating synthetic comments. This functionality allows DocChecker to identify and rectify cases where comments do not accurately represent the code they describe. The efficacy of DocChecker is demonstrated using the Just-In-Time and CodeXGlue datasets in various scenarios. Notably, DocChecker sets a new benchmark in the Inconsistency Code-Comment Detection (ICCD) task, achieving 72.3% accuracy, and scores 33.64 in BLEU-4 on the code summarization task. These results surpass other Large Language Models (LLMs), including GPT 3.5 and CodeLlama. DocChecker is accessible for use and evaluation at https://github.com/FSoft-AI4Code/DocChecker and at http://4.193.50.237:5000/. For a more comprehensive understanding of its functionality, a demonstration video is available at https://youtu.be/FqnPmd531xw.
Anthology ID:
2024.eacl-demo.20
Volume:
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations
Month:
March
Year:
2024
Address:
St. Julians, Malta
Editors:
Nikolaos Aletras, Orphee De Clercq
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
187–194
URL:
https://aclanthology.org/2024.eacl-demo.20
Cite (ACL):
Anh Dau, Jin L.C. Guo, and Nghi Bui. 2024. DocChecker: Bootstrapping Code Large Language Model for Detecting and Resolving Code-Comment Inconsistencies. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, pages 187–194, St. Julians, Malta. Association for Computational Linguistics.
Cite (Informal):
DocChecker: Bootstrapping Code Large Language Model for Detecting and Resolving Code-Comment Inconsistencies (Dau et al., EACL 2024)
PDF:
https://aclanthology.org/2024.eacl-demo.20.pdf