UCoder: Unsupervised Code Generation by Internal Probing of Large Language Models

Jiajun Wu; Jian Yang; Wei Zhang; Linzheng Chai; Yuchi Ma; Ensheng Shi; Yuqing Ma; Zhoujun Li; Xianglong Liu

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities in code generation tasks. However, their effectiveness relies heavily on supervised training with extensive labeled datasets (e.g., question-answer pairs) or unlabeled datasets (e.g., code snippets), which are often expensive and difficult to obtain at scale. To address this limitation, this paper introduces IPC, an unsupervised framework that leverages Internal Probing of LLMs for Code generation without any external corpus, not even unlabeled code snippets. IPC consists of problem space probing, test understanding probing, solution space probing, and knowledge consolidation and reinforcement, which together probe the internal knowledge and confidence patterns of LLMs. IPC then identifies reliable code candidates through self-consistency mechanisms and representation-based quality estimation, and uses them to train UCoder (a coder trained with unsupervised learning). We validate the proposed approach on multiple code benchmarks, demonstrating that unsupervised methods can achieve performance competitive with supervised approaches while significantly reducing the dependency on labeled data and computational resources. Analytic experiments reveal that internal model states contain rich signals about code quality and correctness, and that properly harnessing these signals enables effective unsupervised learning for code generation, opening new directions for training code LLMs in resource-constrained scenarios.
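The abstract names self-consistency as one signal for selecting reliable code candidates but does not describe the procedure. One common way to realize self-consistency for code is to run every sampled candidate on a shared set of probe inputs, group candidates whose observed behavior is identical, and keep the largest group. The sketch below illustrates that generic idea under our own assumptions: the function names `output_signature` and `select_by_self_consistency`, the use of Python callables as candidates, and the agreement ratio are hypothetical, and this is not IPC's actual selection rule.

```python
from collections import defaultdict
from typing import Callable, List, Sequence, Tuple

# Hypothetical helper (not from the paper): record how one candidate behaves
# on every probe input. repr() keeps list/dict outputs hashable.
def output_signature(fn: Callable, inputs: Sequence[tuple]) -> tuple:
    sig = []
    for args in inputs:
        try:
            sig.append(("ok", repr(fn(*args))))
        except Exception as exc:
            # A crash is also behavior; record only the exception type.
            sig.append(("error", type(exc).__name__))
    return tuple(sig)

# Hypothetical generic self-consistency vote: candidates with identical
# signatures form one cluster; the largest cluster is treated as most reliable.
def select_by_self_consistency(
    candidates: Sequence[Callable], inputs: Sequence[tuple]
) -> Tuple[List[int], float]:
    clusters = defaultdict(list)
    for idx, fn in enumerate(candidates):
        clusters[output_signature(fn, inputs)].append(idx)
    # max() returns the first largest cluster in insertion order on ties.
    best = max(clusters.values(), key=len)
    return best, len(best) / len(candidates)

if __name__ == "__main__":
    # Four sampled "solutions" to: return the largest element of a list.
    candidates = [
        lambda xs: sorted(xs)[-1],
        lambda xs: max(xs),
        lambda xs: xs[0],  # buggy candidate
        lambda xs: max(xs),
    ]
    probe_inputs = [([1, 3, 2],), ([5],), ([-1, -4],)]
    members, agreement = select_by_self_consistency(candidates, probe_inputs)
    print(members, agreement)  # [0, 1, 3] 0.75
```

The agreement ratio could serve as a rough confidence score for filtering training data; a real pipeline would execute generated source in a sandbox rather than calling trusted Python lambdas.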

Anthology ID: 2026.findings-acl.277
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 5642–5655
URL: https://aclanthology.org/2026.findings-acl.277/
Cite (ACL): Jiajun Wu, Jian Yang, Wei Zhang, Linzheng Chai, Yuchi Ma, Ensheng Shi, Yuqing Ma, Zhoujun Li, and Xianglong Liu. 2026. UCoder: Unsupervised Code Generation by Internal Probing of Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 5642–5655, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): UCoder: Unsupervised Code Generation by Internal Probing of Large Language Models (Wu et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.277.pdf
Checklist: 2026.findings-acl.277.checklist.pdf

PDF Cite Search Checklist Fix data