HierarchyNet: Learning to Summarize Source Code with Heterogeneous Representations

Minh Nguyen, Nghi Bui, Truong Son Hy, Long Tran-Thanh, Tien Nguyen


Abstract
Code representation is important for machine learning models in code-related applications. Existing code summarization approaches primarily leverage Abstract Syntax Trees (ASTs) and sequential information from source code to generate code summaries, often overlooking the interplay of dependencies among code elements and the code hierarchy. Effective summarization, however, requires a holistic analysis of code snippets from three distinct aspects: lexical, syntactic, and semantic information. In this paper, we propose a novel code summarization approach utilizing Heterogeneous Code Representations (HCRs) and our specially designed HierarchyNet. HCRs capture essential code features at the lexical, syntactic, and semantic levels within a hierarchical structure. HierarchyNet processes each layer of the HCR separately, employing a Heterogeneous Graph Transformer, a Tree-based CNN, and a Transformer Encoder. HierarchyNet outperforms fine-tuned pre-trained models, including CodeT5 and CodeBERT, as well as large language models in zero/few-shot settings, such as CodeLlama, StarCoder, and CodeGen. Implementation details can be found at https://github.com/FSoft-AI4Code/HierarchyNet.
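As a rough illustration of the layered design the abstract describes, each HCR layer could be routed to a dedicated encoder and the resulting layer representations combined for the decoder. This is a minimal sketch, not the authors' implementation: the encoder bodies are stubs standing in for the Heterogeneous Graph Transformer, Tree-based CNN, and Transformer Encoder, and all function names and the input format are hypothetical.

```python
# Hypothetical sketch: route each HCR layer (lexical, syntactic, semantic)
# to its own encoder, then combine the per-layer representations.
# Encoder internals are stubs; the paper's model uses a Heterogeneous
# Graph Transformer, a Tree-based CNN, and a Transformer Encoder.

def lexical_encoder(tokens):
    # Stand-in for the Transformer Encoder over the token sequence.
    return [hash(t) % 100 for t in tokens]

def syntactic_encoder(ast_nodes):
    # Stand-in for the Tree-based CNN over AST node labels.
    return [len(n) for n in ast_nodes]

def semantic_encoder(dep_edges):
    # Stand-in for the Heterogeneous Graph Transformer over
    # dependency edges among code elements.
    return [len(e) for e in dep_edges]

def hierarchy_net(hcr):
    # Process each HCR layer separately, then concatenate the layer
    # representations before they would reach a summary decoder.
    lex = lexical_encoder(hcr["lexical"])
    syn = syntactic_encoder(hcr["syntactic"])
    sem = semantic_encoder(hcr["semantic"])
    return lex + syn + sem

# Toy HCR for a snippet like `def add(a, b): return a + b`.
hcr = {
    "lexical": ["def", "add", "(", "a", ",", "b", ")"],
    "syntactic": ["FunctionDef", "arguments", "Return"],
    "semantic": [("a", "add"), ("b", "add")],
}
rep = hierarchy_net(hcr)
```

The sketch only shows the routing: one encoder per representation level, with the hierarchy preserved by keeping each layer's output separate until the final combination step.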
Anthology ID:
2024.findings-eacl.156
Volume:
Findings of the Association for Computational Linguistics: EACL 2024
Month:
March
Year:
2024
Address:
St. Julian’s, Malta
Editors:
Yvette Graham, Matthew Purver
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2355–2367
URL:
https://aclanthology.org/2024.findings-eacl.156
Cite (ACL):
Minh Nguyen, Nghi Bui, Truong Son Hy, Long Tran-Thanh, and Tien Nguyen. 2024. HierarchyNet: Learning to Summarize Source Code with Heterogeneous Representations. In Findings of the Association for Computational Linguistics: EACL 2024, pages 2355–2367, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal):
HierarchyNet: Learning to Summarize Source Code with Heterogeneous Representations (Nguyen et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-eacl.156.pdf