ProConSuL: Project Context for Code Summarization with LLMs

Vadim Lomshakov; Andrey Podivilov; Sergey Savin; Oleg Baryshnikov; Alena Lisevych; Sergey Nikolenko

doi:10.18653/v1/2024.emnlp-industry.65

Abstract

We propose Project Context for Code Summarization with LLMs (ProConSuL), a new framework that provides a large language model (LLM) with precise information about code structure obtained from program analysis tools, such as a compiler or IDE language services, and that uses task decomposition derived from the code structure. ProConSuL builds a call graph to supply context from callees and uses a two-phase training method, supervised fine-tuning (SFT) plus preference alignment, to train the model to use the project context. We also provide a new evaluation benchmark for C/C++ functions and a set of proxy metrics. Experimental results demonstrate that ProConSuL significantly improves code summaries and reduces the number of hallucinations compared to the base model (CodeLlama-7B-instruct). We make our code and dataset available at https://github.com/TypingCat13/ProConSuL.
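The call-graph idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the traversal, so the callee-first ordering, the cycle handling, and the names `summarize_bottom_up` and `stub_summarize` are all assumptions made for illustration, and the stub stands in for an actual LLM call.

```python
from typing import Callable, Dict, List


def summarize_bottom_up(
    call_graph: Dict[str, List[str]],
    sources: Dict[str, str],
    summarize: Callable[[str, Dict[str, str]], str],
) -> Dict[str, str]:
    """Summarize callees first, so each caller sees its callees' summaries.

    Hypothetical helper: the traversal order is an assumption, not taken
    from the paper.
    """
    summaries: Dict[str, str] = {}
    in_progress = set()  # functions on the current DFS path (cycle guard)

    def visit(name: str) -> None:
        if name in summaries or name in in_progress:
            return
        in_progress.add(name)
        for callee in call_graph.get(name, []):
            if callee in sources:  # skip external functions with no source
                visit(callee)
        # Context: summaries of the callees that are already available.
        context = {
            c: summaries[c] for c in call_graph.get(name, []) if c in summaries
        }
        summaries[name] = summarize(sources[name], context)
        in_progress.discard(name)

    for name in sources:
        visit(name)
    return summaries


def stub_summarize(code: str, callee_summaries: Dict[str, str]) -> str:
    # Placeholder for an LLM call: only records which callee context it got.
    return f"{len(code)} chars; context from: {sorted(callee_summaries)}"


# Toy project: main calls parse and run, run calls parse.
call_graph = {"main": ["parse", "run"], "run": ["parse"], "parse": []}
sources = {
    "main": "int main() { parse(); run(); }",
    "run": "void run() { parse(); }",
    "parse": "void parse() {}",
}
result = summarize_bottom_up(call_graph, sources, stub_summarize)
```

In this toy project `parse` is summarized first with no context, `run` then receives the summary of `parse`, and `main` receives both. In a recursive cycle, the function reached last is summarized without the summary of the caller that is still in progress.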

Anthology ID: 2024.emnlp-industry.65
Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month: November
Year: 2024
Address: Miami, Florida, US
Editors: Franck Dernoncourt, Daniel Preoţiuc-Pietro, Anastasia Shimorina
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 866–880
URL: https://aclanthology.org/2024.emnlp-industry.65/
DOI: 10.18653/v1/2024.emnlp-industry.65
Cite (ACL): Vadim Lomshakov, Andrey Podivilov, Sergey Savin, Oleg Baryshnikov, Alena Lisevych, and Sergey Nikolenko. 2024. ProConSuL: Project Context for Code Summarization with LLMs. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 866–880, Miami, Florida, US. Association for Computational Linguistics.
Cite (Informal): ProConSuL: Project Context for Code Summarization with LLMs (Lomshakov et al., EMNLP 2024)
PDF: https://aclanthology.org/2024.emnlp-industry.65.pdf
