Vadim Lomshakov
2024
ProConSuL: Project Context for Code Summarization with LLMs
Vadim Lomshakov | Andrey Podivilov | Sergey Savin | Oleg Baryshnikov | Alena Lisevych | Sergey Nikolenko
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
We propose Project Context for Code Summarization with LLMs (ProConSuL), a new framework that provides a large language model (LLM) with precise information about the code structure, obtained from program analysis methods such as a compiler or IDE language services, and uses task decomposition derived from the code structure. ProConSuL builds a call graph to provide context from callees and uses a two-phase training method (SFT + preference alignment) to teach the model to use the project context. We also provide a new evaluation benchmark for C/C++ functions and a set of proxy metrics. Experimental results demonstrate that ProConSuL significantly improves code summaries and reduces the number of hallucinations compared to the base model (CodeLlama-7B-instruct). We make our code and dataset available at https://github.com/TypingCat13/ProConSuL.
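The abstract describes providing an LLM with context from a function's callees via a call graph. A minimal sketch of that idea, assuming a simple name-based graph and pre-existing callee summaries (all names and structures here are illustrative, not from the paper's implementation):

```python
# Sketch: collect one-hop callee summaries from a call graph to serve
# as project context in a code-summarization prompt.
from collections import defaultdict

class CallGraph:
    def __init__(self):
        # caller function name -> list of callee function names
        self.callees = defaultdict(list)

    def add_call(self, caller, callee):
        self.callees[caller].append(callee)

def build_context(graph, function, summaries):
    """Concatenate summaries of the function's direct callees."""
    parts = []
    for callee in graph.callees[function]:
        if callee in summaries:
            parts.append(f"{callee}: {summaries[callee]}")
    return "\n".join(parts)

graph = CallGraph()
graph.add_call("parse_config", "read_file")
graph.add_call("parse_config", "tokenize")

summaries = {
    "read_file": "Reads a file into a buffer.",
    "tokenize": "Splits a string into tokens.",
}

print(build_context(graph, "parse_config", summaries))
```

In practice the call graph would come from a compiler or IDE language services rather than manual edges, and the collected context would be prepended to the summarization prompt for the target function.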
2022
Human perceiving behavior modeling in evaluation of code generation models
Sergey V. Kovalchuk | Vadim Lomshakov | Artem Aliev
Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)
In this study, we evaluated a series of code generation models based on CodeGen and GPT-Neo to compare metric-based performance with human evaluation. For a deeper analysis of human perception within the evaluation procedure, we implemented a 5-level Likert scale assessment of the model output using a perception model based on the Theory of Planned Behavior (TPB). Through this analysis, we extended model assessment and gained a deeper understanding of the quality and applicability of the generated code for practical question answering. The approach was evaluated with several model settings to assess diversity in the quality and style of answers. With the TPB-based model, we showed different levels of perceiving the model results, namely personal understanding, agreement level, and readiness to use the particular code. With this analysis, we investigate a series of issues in code generation, viewed as natural language generation (NLG) problems, observed in the practical context of programming question answering with code.
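The abstract mentions 5-level Likert ratings along three TPB-inspired dimensions (understanding, agreement, readiness to use). A hedged sketch of how such ratings might be aggregated per dimension; the rating values and field names are invented for illustration, not the study's data:

```python
# Sketch: average 5-level Likert ratings per TPB-inspired dimension.
from statistics import mean

# Hypothetical ratings from three annotators for one generated answer,
# each on a 1-5 Likert scale.
ratings = [
    {"understanding": 4, "agreement": 3, "readiness": 2},
    {"understanding": 5, "agreement": 4, "readiness": 3},
    {"understanding": 3, "agreement": 3, "readiness": 4},
]

def aggregate(ratings):
    """Mean score per dimension, rounded to two decimals."""
    dims = ratings[0].keys()
    return {d: round(mean(r[d] for r in ratings), 2) for d in dims}

print(aggregate(ratings))
# prints {'understanding': 4.0, 'agreement': 3.33, 'readiness': 3.0}
```

Separating the dimensions, rather than collapsing them into one score, is what lets an analysis distinguish, e.g., code that annotators understand but are not ready to use.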
Co-authors
- Andrey Podivilov 1
- Sergey Savin 1
- Oleg Baryshnikov 1
- Alena Lisevych 1
- Sergey Nikolenko 1