Human perceiving behavior modeling in evaluation of code generation models

Sergey Kovalchuk; Vadim Lomshakov; Artem Aliev

doi:10.18653/v1/2022.gem-1.24

Abstract

Within this study, we evaluated a series of code generation models based on CodeGen and GPTNeo to compare metric-based performance with human evaluation. For a deeper analysis of human perception within the evaluation procedure, we implemented a 5-level Likert scale assessment of the model output using a perception model based on the Theory of Planned Behavior (TPB). Through this analysis, we extended the model assessment and gained a deeper understanding of the quality and applicability of generated code for practical question answering. The approach was evaluated with several model settings in order to assess diversity in the quality and style of answers. With the TPB-based model, we distinguished different levels at which the model result is perceived, namely personal understanding, agreement level, and readiness to use the particular code. With this analysis, we investigate a series of issues in code generation as natural language generation (NLG) problems observed in the practical context of programming question answering with code.

Anthology ID: 2022.gem-1.24
Original: 2022.gem-1.24v1
Version 2: 2022.gem-1.24v2
Volume: Proceedings of the Second Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates (Hybrid)
Editors: Antoine Bosselut, Khyathi Chandu, Kaustubh Dhole, Varun Gangal, Sebastian Gehrmann, Yacine Jernite, Jekaterina Novikova, Laura Perez-Beltrachini
Venue: GEM
SIG: SIGGEN
Publisher: Association for Computational Linguistics
Pages: 287–294
URL: https://aclanthology.org/2022.gem-1.24/
DOI: 10.18653/v1/2022.gem-1.24
Cite (ACL): Sergey Kovalchuk, Vadim Lomshakov, and Artem Aliev. 2022. Human perceiving behavior modeling in evaluation of code generation models. In Proceedings of the Second Workshop on Natural Language Generation, Evaluation, and Metrics (GEM), pages 287–294, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal): Human perceiving behavior modeling in evaluation of code generation models (Kovalchuk et al., GEM 2022)
PDF: https://aclanthology.org/2022.gem-1.24.pdf
Video: https://aclanthology.org/2022.gem-1.24.mp4
