Modeling Hierarchical Syntax Structure with Triplet Position for Source Code Summarization

Juncai Guo, Jin Liu, Yao Wan, Li Li, Pingyi Zhou


Abstract
Automatic code summarization, which aims to describe source code in natural language, has become an essential task in software maintenance. Researchers have attempted this with various machine learning-based approaches. One key challenge keeping these approaches from being practical is their failure to retain the semantic structure of source code, which state-of-the-art methods have unfortunately overlooked. Existing approaches represent the syntax structure of code by modeling Abstract Syntax Trees (ASTs). However, the hierarchical structures of ASTs have not been well explored. In this paper, we propose CODESCRIBE, which models the hierarchical syntax structure of code by introducing a novel triplet position for code summarization. Specifically, CODESCRIBE leverages a graph neural network and a Transformer to preserve the structural and sequential information of code, respectively. In addition, we propose a pointer-generator network that attends to both the structure and the sequential tokens of code for better summary generation. Experiments on two real-world datasets in Java and Python demonstrate the effectiveness of our proposed approach compared with several state-of-the-art baselines.
Anthology ID: 2022.acl-long.37
Volume: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: May
Year: 2022
Address: Dublin, Ireland
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 486–500
URL: https://aclanthology.org/2022.acl-long.37
DOI: 10.18653/v1/2022.acl-long.37
Cite (ACL): Juncai Guo, Jin Liu, Yao Wan, Li Li, and Pingyi Zhou. 2022. Modeling Hierarchical Syntax Structure with Triplet Position for Source Code Summarization. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 486–500, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): Modeling Hierarchical Syntax Structure with Triplet Position for Source Code Summarization (Guo et al., ACL 2022)
PDF: https://aclanthology.org/2022.acl-long.37.pdf