Andrey Podivilov
2024
ProConSuL: Project Context for Code Summarization with LLMs
Vadim Lomshakov | Andrey Podivilov | Sergey Savin | Oleg Baryshnikov | Alena Lisevych | Sergey Nikolenko
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
We propose Project Context for Code Summarization with LLMs (ProConSuL), a new framework that provides a large language model (LLM) with precise information about code structure obtained from program analysis tools, such as a compiler or IDE language services, and that uses a task decomposition derived from the code structure. ProConSuL builds a call graph to supply context from callees and uses a two-phase training method (supervised fine-tuning (SFT) followed by preference alignment) to train the model to use this project context. We also provide a new evaluation benchmark for C/C++ functions and a set of proxy metrics. Experimental results demonstrate that ProConSuL significantly improves code summaries and reduces the number of hallucinations compared to the base model (CodeLlama-7B-instruct). We make our code and dataset available at https://github.com/TypingCat13/ProConSuL.
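To make the idea of call-graph-derived task decomposition concrete, the following is a minimal sketch and not the authors' implementation. It assumes the call graph is already available as a plain dictionary (in practice it would come from a compiler or IDE language service), and every name in it (`callee_first_order`, `summarize_project`, the `summarize` callable standing in for the LLM, and the prompt layout) is hypothetical. Functions are visited callees first, so that each prompt can include the summaries of the functions it calls.

```python
from typing import Callable, Dict, List


def callee_first_order(call_graph: Dict[str, List[str]]) -> List[str]:
    """Return functions so that callees precede their callers (DFS post-order)."""
    order: List[str] = []
    state: Dict[str, int] = {}  # 1 = visit in progress, 2 = finished

    def visit(fn: str) -> None:
        if state.get(fn):
            # Already finished, or a back edge caused by (mutual) recursion.
            return
        state[fn] = 1
        for callee in call_graph.get(fn, []):
            visit(callee)
        state[fn] = 2
        order.append(fn)

    for fn in call_graph:
        visit(fn)
    return order


def summarize_project(
    call_graph: Dict[str, List[str]],
    sources: Dict[str, str],
    summarize: Callable[[str], str],
) -> Dict[str, str]:
    """Summarize every function, passing callee summaries as context.

    `summarize` is a placeholder for a model call (assumption); the prompt
    format below is illustrative only.
    """
    summaries: Dict[str, str] = {}
    for fn in callee_first_order(call_graph):
        # Only callees summarized earlier are included; recursive back
        # edges are skipped because their summaries do not exist yet.
        context = [
            f"- {callee}: {summaries[callee]}"
            for callee in call_graph.get(fn, [])
            if callee in summaries
        ]
        prompt = (
            "Callee summaries:\n" + "\n".join(context)
            + "\n\nFunction:\n" + sources.get(fn, "")
        )
        summaries[fn] = summarize(prompt)
    return summaries
```

In this sketch, recursive calls are handled by simply omitting the missing callee summary; a real system might treat cycles differently.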