2025
On the Impacts of Contexts on Repository-Level Code Generation
Nam Le Hai | Dung Manh Nguyen | Nghi D. Q. Bui
Findings of the Association for Computational Linguistics: NAACL 2025
CodeLLMs are widely used for code generation, yet their ability to handle repository-level dependencies remains underexplored. We introduce RepoExec, a benchmark for evaluating repository-level code generation, focusing on executability, functional correctness, and dependency utilization. Our study evaluates 18 models, revealing that retaining full dependency context yields the best performance, while smaller context sizes can be misleading. Pretrained LLMs excel in correctness but often reimplement dependencies, while instruction-tuned models better utilize dependencies but sometimes introduce unnecessary complexity. We propose an instruction-tuning dataset that improves dependency handling and introduce a new metric, Dependency Invocation Rate (DIR), to measure context utilization. Experiments show that instruction-tuned models improve DIR by over 10%, and multi-round debugging further enhances both correctness and dependency use. RepoExec provides a comprehensive framework to advance CodeLLMs for real-world applications. The dataset and source code are available at https://github.com/FSoft-AI4Code/RepoExec.
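As a rough illustration of what a Dependency Invocation Rate could look like in practice, the sketch below checks which of the provided repository-level dependencies a generated Python function actually calls. The function names, the `dependency_names` input format, and the AST-based matching are assumptions for illustration, not the RepoExec implementation.

```python
import ast

def called_names(source: str) -> set[str]:
    """Collect the names of all functions/methods invoked in a piece of Python code."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):
                names.add(func.id)
            elif isinstance(func, ast.Attribute):
                names.add(func.attr)
    return names

def dependency_invocation_rate(generated: str, dependency_names: list[str]) -> float:
    """Fraction of the provided repository-level dependencies the generated code actually calls."""
    if not dependency_names:
        return 0.0
    invoked = called_names(generated)
    used = sum(1 for name in dependency_names if name in invoked)
    return used / len(dependency_names)

# Example: the model was given two repo-level helpers but only calls one of them.
deps = ["load_config", "connect_db"]
candidate = "def setup():\n    cfg = load_config('app.yaml')\n    return cfg\n"
print(dependency_invocation_rate(candidate, deps))  # 0.5
```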
VisualCoder: Guiding Large Language Models in Code Execution with Fine-grained Multimodal Chain-of-Thought Reasoning
Cuong Le Chi | Chau Truong Vinh Hoang | Phan Nhật Huy | Dung D. Le | Tien N Nguyen | Nghi D. Q. Bui
Findings of the Association for Computational Linguistics: NAACL 2025
Predicting program behavior and reasoning about code execution remain significant challenges in software engineering, particularly for large language models (LLMs) designed for code analysis. While these models excel at understanding static syntax, they often struggle with dynamic reasoning tasks. We introduce VisualCoder, a simple yet effective approach that enhances code reasoning by integrating multimodal Chain-of-Thought (CoT) reasoning with a visual Control Flow Graph (CFG). By aligning code snippets with their corresponding CFGs, VisualCoder provides deeper insights into execution flows. We address challenges in multimodal CoT integration through a reference mechanism, ensuring consistency between code and its execution path, thereby improving performance in program behavior prediction, error detection, and output generation.
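A minimal sketch of the kind of pipeline the abstract describes: render a control flow graph as an image and prompt a vision-capable model with the numbered source code, with the prompt explicitly tying CFG nodes back to source lines (the "reference mechanism"). The hand-built CFG, the prompt wording, and the commented-out `query_vision_model` call are illustrative assumptions, not the authors' implementation.

```python
from graphviz import Digraph  # assumes the graphviz Python package and system binary are installed

def render_cfg(edges, path="cfg"):
    """Render a small control flow graph (given as label -> label edges) to a PNG image."""
    g = Digraph(format="png")
    for src, dst in edges:
        g.edge(src, dst)
    return g.render(path)  # returns the path to the generated image, e.g. "cfg.png"

code = """1: def absolute(x):
2:     if x < 0:
3:         return -x
4:     return x"""

# Hand-built CFG for the snippet above; a real system would derive this automatically.
cfg_edges = [("L1: def absolute", "L2: if x < 0"),
             ("L2: if x < 0", "L3: return -x"),
             ("L2: if x < 0", "L4: return x")]
image_path = render_cfg(cfg_edges)

# Reference mechanism: the prompt ties each CFG node back to its source line before
# asking the model to reason step by step about execution.
prompt = (
    "The attached image is the control flow graph of the code below. "
    "Each CFG node is labelled with the source line it corresponds to. "
    "Trace the execution for input x = -5, naming the CFG node visited at each step, "
    "then state the final output.\n\n" + code
)

# Placeholder: any vision-capable chat model could be substituted here.
# answer = query_vision_model(prompt, image_path)
```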
2024
HierarchyNet: Learning to Summarize Source Code with Heterogeneous Representations
Minh Huynh Nguyen | Nghi D. Q. Bui | Truong Son Hy | Long Tran Thanh | Tien N. Nguyen
Findings of the Association for Computational Linguistics: EACL 2024
Code representation is important to machine learning models in code-related applications. Existing code summarization approaches primarily leverage Abstract Syntax Trees (ASTs) and sequential information from source code to generate code summaries, while often overlooking the interplay of dependencies among code elements and the code hierarchy. However, effective summarization necessitates a holistic analysis of code snippets from three distinct aspects: lexical, syntactic, and semantic information. In this paper, we propose a novel code summarization approach utilizing Heterogeneous Code Representations (HCRs) and our specially designed HierarchyNet. HCRs adeptly capture essential code features at lexical, syntactic, and semantic levels within a hierarchical structure. HierarchyNet processes each layer of the HCR separately, employing a Heterogeneous Graph Transformer, a Tree-based CNN, and a Transformer Encoder. In addition, HierarchyNet demonstrates superior performance compared to fine-tuned pre-trained models, including CodeT5 and CodeBERT, as well as large language models that employ zero/few-shot settings, such as CodeLlama, StarCoder, and CodeGen. Implementation details can be found at https://github.com/FSoft-AI4Code/HierarchyNet.
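Purely as a schematic of the multi-branch idea sketched in the abstract, and assuming PyTorch, the toy module below encodes lexical, syntactic, and semantic views of a snippet with separate encoders and fuses them into one vector. The generic encoders standing in for the Heterogeneous Graph Transformer and Tree-based CNN, the tensor shapes, and the mean-pooled fusion are all illustrative assumptions rather than the HierarchyNet architecture.

```python
import torch
import torch.nn as nn

class ThreeBranchCodeEncoder(nn.Module):
    """Schematic: encode lexical tokens, a syntactic (tree-like) view, and a semantic
    (graph-like) view of a snippet with separate encoders, then fuse them."""

    def __init__(self, d_model: int = 256, vocab_size: int = 10000):
        super().__init__()
        self.token_embed = nn.Embedding(vocab_size, d_model)
        # Lexical branch: a vanilla Transformer encoder over the token sequence.
        self.lexical = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        # Syntactic branch: a 1-D CNN standing in for a tree-based CNN over AST node embeddings.
        self.syntactic = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)
        # Semantic branch: one linear message-passing step standing in for a graph transformer.
        self.semantic = nn.Linear(d_model, d_model)
        self.fuse = nn.Linear(3 * d_model, d_model)

    def forward(self, token_ids, ast_node_embeds, graph_node_embeds, adjacency):
        lex = self.lexical(self.token_embed(token_ids)).mean(dim=1)
        syn = self.syntactic(ast_node_embeds.transpose(1, 2)).mean(dim=2)
        sem = torch.relu(self.semantic(adjacency @ graph_node_embeds)).mean(dim=1)
        return self.fuse(torch.cat([lex, syn, sem], dim=-1))  # one vector per snippet

enc = ThreeBranchCodeEncoder()
vec = enc(torch.randint(0, 10000, (2, 16)),   # token ids
          torch.randn(2, 12, 256),            # AST node embeddings
          torch.randn(2, 10, 256),            # graph node embeddings
          torch.randn(2, 10, 10))             # adjacency / message-passing weights
print(vec.shape)  # torch.Size([2, 256])
```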
Functional Overlap Reranking for Neural Code Generation
Hung Quoc To | Minh Huynh Nguyen | Nghi D. Q. Bui
Findings of the Association for Computational Linguistics: ACL 2024
Code Large Language Models (CodeLLMs) have ushered in a new era in code generation advancements. However, selecting the best code solutions from all possible CodeLLM outputs remains a challenge. Previous methods often overlooked the intricate functional similarities and interactions between solution clusters. We introduce SRank, a novel reranking strategy for selecting the best solutions from code generation, focusing on modeling the relationships between clusters of solutions. By quantifying the functional overlap between solution clusters, our approach provides a better ranking strategy for code solutions. Empirical results show that our method achieves remarkable results on the pass@1 score. For instance, on the Human-Eval benchmark, we achieve 69.66% in pass@1 with Codex002, 75.31% with WizardCoder, 53.99% with StarCoder, and 60.55% with CodeGen, surpassing state-of-the-art code generation reranking methods such as CodeT and Coder-Reviewer on the same CodeLLMs by a significant margin of approximately 6.1% on average. Even in scenarios with a limited number of sampled solutions and test cases, our approach demonstrates robustness and superiority, setting a new standard for code generation reranking. Our implementation can be found at https://github.com/FSoft-AI4Code/SRank-CodeRanker.
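A hedged sketch of functional-overlap reranking in the spirit of the abstract: execute every sampled solution on shared test inputs, cluster solutions that behave identically, and prefer the cluster whose behaviour agrees most with the other clusters, weighted by their sizes. The clustering key, the scoring formula, and the `solve` entry-point convention are simplified assumptions, not the exact SRank definition.

```python
from collections import defaultdict

def run_on_inputs(solution_src: str, inputs: list) -> tuple:
    """Execute a candidate solution (assumed to define `solve`) and record its outputs."""
    namespace: dict = {}
    exec(solution_src, namespace)          # untrusted code: sandbox this in practice
    outputs = []
    for x in inputs:
        try:
            outputs.append(repr(namespace["solve"](x)))
        except Exception:
            outputs.append("<error>")
    return tuple(outputs)

def rerank(solutions: list[str], inputs: list) -> list[str]:
    """Cluster solutions by identical input/output behaviour, then rank clusters by
    size-weighted per-test agreement with the other clusters."""
    clusters: dict[tuple, list[str]] = defaultdict(list)
    for src in solutions:
        clusters[run_on_inputs(src, inputs)].append(src)

    keys = list(clusters)
    scores = {}
    for k in keys:
        overlap = 0.0
        for other in keys:
            if other is k:
                continue
            agree = sum(a == b for a, b in zip(k, other)) / len(inputs)
            overlap += agree * len(clusters[other])   # weight by the other cluster's size
        scores[k] = overlap + len(clusters[k])        # larger clusters get a mild bonus

    best = max(keys, key=lambda k: scores[k])
    return clusters[best]

candidates = [
    "def solve(x):\n    return x * 2\n",
    "def solve(x):\n    return x + x\n",
    "def solve(x):\n    return x ** 2\n",
]
print(rerank(candidates, [1, 2, 3])[0])   # picks one of the two behaviourally identical doublers
```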
2023
Better Language Models of Code through Self-Improvement
Hung Quoc To | Nghi D. Q. Bui | Jin Guo | Tien N. Nguyen
Findings of the Association for Computational Linguistics: ACL 2023
Pre-trained language models for code (PLMCs) have gained attention in recent research. These models are pre-trained on large-scale datasets using multi-modal objectives. However, fine-tuning them requires extensive supervision and is limited by the size of the dataset provided. We address this issue by proposing a data augmentation framework based on knowledge distillation. Our framework utilizes knowledge gained during the pre-training and fine-tuning stages to augment the training data, which is then used for the next fine-tuning step. We incorporate this framework into state-of-the-art language models such as CodeT5, CodeBERT, and UnixCoder. The results show that our framework significantly improves PLMCs’ performance in sequence-generation tasks, such as code summarization and code generation in the CodeXGLUE benchmark.
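A minimal sketch of the self-improvement loop described above, assuming a Hugging Face seq2seq checkpoint: the fine-tuned model generates pseudo-summaries for its own training inputs, and these are appended to the original pairs for a further fine-tuning step. The checkpoint name, the generation settings, and the simple append strategy are illustrative assumptions rather than the paper's exact recipe.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Stage 1: assume the model has already been fine-tuned on (code, summary) pairs.
model_name = "Salesforce/codet5-base"          # placeholder seq2seq code model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def pseudo_label(code_snippets: list[str]) -> list[str]:
    """Use the fine-tuned model itself to generate summaries for its own training inputs."""
    inputs = tokenizer(code_snippets, return_tensors="pt", padding=True, truncation=True)
    out = model.generate(**inputs, max_length=64, num_beams=4)
    return tokenizer.batch_decode(out, skip_special_tokens=True)

# Stage 2: augment the original training data with the model's own outputs,
# then continue fine-tuning on the combined set (training loop omitted).
train_code = ["def add(a, b):\n    return a + b"]
train_summaries = ["Add two numbers."]
augmented = list(zip(train_code, train_summaries)) + list(zip(train_code, pseudo_label(train_code)))
```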
The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation
Dung Nguyen Manh | Nam Le Hai | Anh T. V. Dau | Anh Minh Nguyen | Khanh Nghiem | Jin Guo | Nghi D. Q. Bui
Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)