Late Code Chunking: A Code Chunking Strategy for Repository-Level Code Completion

Seungmin Oh; Eunseok Lee

Abstract

This paper introduces Late Code Chunking (LC²), a chunking strategy designed to improve the semantic understanding of code segments for Large Language Models (LLMs). Repository-level code completion requires predicting the completion of unfinished code by leveraging cross-file context spread across a repository. However, when retrieved fragments have missing semantics—the loss of structural or behavioral information during chunking—LLMs struggle to interpret the target code. To address this, LC² refines retrieved chunks by constructing a dual context: a "Code Retrieval Context" optimized for similarity-based search, and a "Code Comprehension Context" that serves as a late enrichment step through context expansion and augmentation. This dual-context design reduces information loss due to chunking and enhances the ability of LLMs to utilize retrieved code. Additionally, we introduce an Asymmetric Query-Chunk Sizing strategy to further optimize retrieval quality by minimizing query noise. Our experiments demonstrate that LC² provides robust performance gains, achieving a statistically significant 19.7% improvement in Exact Match accuracy on the CrossCodeEval benchmark compared to the best existing chunking method.
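The dual-context idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the fixed-size line chunking, the Jaccard identifier overlap used in place of a real retriever, the `margin`-based expansion, the file-header augmentation, and every name below (`Chunk`, `make_chunks`, `build_query`, `retrieve`, `comprehension_context`) are assumptions made for illustration. The sketch also assumes that Asymmetric Query-Chunk Sizing means the query is shorter than the chunks, which the abstract does not spell out.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    path: str
    start: int              # first line index, 0-based, inclusive
    end: int                # last line index, exclusive
    file_lines: List[str]   # all lines of the file, kept for late expansion

    @property
    def text(self) -> str:
        # Retrieval context: only the chunk's own lines.
        return "\n".join(self.file_lines[self.start:self.end])


def make_chunks(path: str, source: str, size: int) -> List[Chunk]:
    """Split a file into fixed-size line chunks (hypothetical chunker)."""
    lines = source.splitlines()
    return [Chunk(path, i, min(i + size, len(lines)), lines)
            for i in range(0, len(lines), size)]


def identifiers(text: str) -> set:
    return set(re.findall(r"[A-Za-z_]\w*", text))


def jaccard(a: str, b: str) -> float:
    """Identifier-set overlap, standing in for a real similarity search."""
    ta, tb = identifiers(a), identifiers(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def build_query(unfinished_code: str, query_lines: int) -> str:
    """Assumed asymmetric sizing: keep only the last few lines as the query."""
    return "\n".join(unfinished_code.splitlines()[-query_lines:])


def retrieve(query: str, chunks: List[Chunk], top_k: int = 1) -> List[Chunk]:
    return sorted(chunks, key=lambda c: jaccard(query, c.text), reverse=True)[:top_k]


def comprehension_context(chunk: Chunk, margin: int) -> str:
    """Late enrichment: widen the retrieved chunk and prepend a file header."""
    lo = max(0, chunk.start - margin)
    hi = min(len(chunk.file_lines), chunk.end + margin)
    header = f"# file: {chunk.path} (lines {lo + 1}-{hi})"
    return header + "\n" + "\n".join(chunk.file_lines[lo:hi])


REPO_SOURCE = "\n".join([
    "import math",
    "",
    "class Circle:",
    "    def __init__(self, radius):",
    "        self.radius = radius",
    "",
    "    def area(self):",
    "        return math.pi * self.radius ** 2",
])

UNFINISHED = "\n".join([
    "def circumference(shape):",
    "    r = shape.radius",
    "    return 2 * math.pi *",
])

chunks = make_chunks("shapes.py", REPO_SOURCE, size=4)   # chunk size: 4 lines
query = build_query(UNFINISHED, query_lines=2)           # query size: 2 lines
hit = retrieve(query, chunks)[0]
context = comprehension_context(hit, margin=2)
print(context)
```

In this toy run the retrieved chunk contains only the body of `area`, so the enclosing `class Circle:` line is lost at retrieval time; the expanded context restores it before the text would be handed to an LLM.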

Anthology ID: 2026.acl-short.64
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 780–786
URL: https://aclanthology.org/2026.acl-short.64/
Cite (ACL): Seungmin Oh and Eunseok Lee. 2026. Late Code Chunking: A Code Chunking Strategy for Repository-Level Code Completion. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 780–786, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Late Code Chunking: A Code Chunking Strategy for Repository-Level Code Completion (Oh & Lee, ACL 2026)
PDF: https://aclanthology.org/2026.acl-short.64.pdf
Checklist: 2026.acl-short.64.checklist.pdf
