Sergey Savin


2024

ProConSuL: Project Context for Code Summarization with LLMs
Vadim Lomshakov | Andrey Podivilov | Sergey Savin | Oleg Baryshnikov | Alena Lisevych | Sergey Nikolenko
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track

We propose Project Context for Code Summarization with LLMs (ProConSuL), a new framework that gives a large language model (LLM) precise information about the code structure, obtained from program analysis tools such as a compiler or IDE language services, and that applies a task decomposition derived from this structure. ProConSuL builds a call graph to supply context from callees and uses a two-phase training method (supervised fine-tuning (SFT) followed by preference alignment) to teach the model to use the project context. We also provide a new evaluation benchmark for C/C++ functions and a set of proxy metrics. Experimental results show that ProConSuL significantly improves code summaries and reduces the number of hallucinations compared to the base model (CodeLlama-7B-instruct). We make our code and dataset available at https://github.com/TypingCat13/ProConSuL.
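One way to picture "context from callees" is bottom-up summarization over the call graph: each function is summarized only after its callees, so their summaries can be placed in its prompt. The Python sketch below assumes a call graph given as a dict from function name to callee names and a hypothetical `llm` callable standing in for the model. The helper names (`callee_first_order`, `build_prompt`, `summarize_project`) and the prompt format are illustrative only, not the ProConSuL implementation, which may order, prompt, and handle recursion differently.

```python
from typing import Callable, Dict, List

CallGraph = Dict[str, List[str]]


def callee_first_order(call_graph: CallGraph) -> List[str]:
    """Post-order DFS: every callee is listed before its callers.

    A function is marked visited before its callees are explored, so
    recursive cycles are cut instead of looping forever.
    """
    order: List[str] = []
    visited = set()

    def visit(fn: str) -> None:
        if fn in visited:
            return
        visited.add(fn)
        for callee in call_graph.get(fn, []):
            visit(callee)
        order.append(fn)

    for fn in call_graph:
        visit(fn)
    return order


def build_prompt(fn: str, code: Dict[str, str], call_graph: CallGraph,
                 summaries: Dict[str, str]) -> str:
    """Hypothetical prompt: already-computed callee summaries, then the code."""
    context = "\n".join(f"{c}: {summaries[c]}"
                        for c in call_graph.get(fn, []) if c in summaries)
    return f"Callee summaries:\n{context}\n\nSummarize:\n{code[fn]}"


def summarize_project(code: Dict[str, str], call_graph: CallGraph,
                      llm: Callable[[str], str]) -> Dict[str, str]:
    """Summarize project functions bottom-up; `llm` is a stand-in for the model."""
    summaries: Dict[str, str] = {}
    for fn in callee_first_order(call_graph):
        if fn not in code:  # skip external callees with no source in the project
            continue
        summaries[fn] = llm(build_prompt(fn, code, call_graph, summaries))
    return summaries
```

For a graph `{"main": ["parse", "run"], "parse": ["helper"], "run": ["helper"], "helper": []}` the order is `helper`, `parse`, `run`, `main`, so the prompt for `main` already contains summaries of `parse` and `run`. Visiting callees first is what makes this a task decomposition: each call to the model sees only one function plus short descriptions of what it calls.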