CodeRise: Bootstrapping LLMs for Ultra Low-Resource Programming Languages via Progressive Self-Refinement Curriculum

Tengfei Wen; Xuanang Chen; Ben He; Xiaoliang Cong; Le Sun

Abstract

Large Language Models (LLMs) struggle with code generation for Ultra Low-Resource Programming Languages (ULRPLs) due to the scarcity of training data. Existing synthetic data generation methods fail in this context, suffering from a severe cold-start problem and resulting in samples that lack diversity. To overcome these challenges, we propose CodeRise, a novel two-stage framework that autonomously generates a high-quality, diverse, and progressively complex curriculum for ULRPLs. The framework first tackles the cold-start and distribution issues by leveraging the full formal syntax of the target language as structural guidance and applying a biased sampling strategy over library modules. Building on this foundation, we fine-tune the model to generate increasingly complex code without explicit syntax input, using an adaptive curriculum and multi-turn self-debugging to progressively improve code quality. We evaluate on two ULRPLs, Tengo and Janet, using migrated HumanEval-Tengo and MBPP-Tengo, as well as our new benchmarks, TengoEval and JanetEval. Experiments show that CodeRise significantly outperforms both training-free and training-based baselines in ultra low-resource environments.
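The abstract mentions multi-turn self-debugging but gives no implementation details. As a rough illustration only, the general idea of such a loop might be sketched as follows. The function `self_debug`, its parameters, and the `generate` and `execute` callables are hypothetical names introduced here and are not taken from the paper; the actual CodeRise procedure may differ.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures (assumptions, not from the paper):
#   generate(task, previous_code, feedback) -> candidate code
#   execute(code) -> (passed, feedback_message)
Generate = Callable[[str, Optional[str], Optional[str]], str]
Execute = Callable[[str], Tuple[bool, str]]


def self_debug(
    task: str,
    generate: Generate,
    execute: Execute,
    max_turns: int = 3,
) -> Tuple[Optional[str], int]:
    """Generic generate-run-repair loop.

    Returns (passing_code, turns_used), or (None, max_turns) if no
    candidate passed within the turn budget.
    """
    code: Optional[str] = None
    feedback: Optional[str] = None
    for turn in range(1, max_turns + 1):
        # On the first turn there is no previous code or feedback;
        # later turns see the failed attempt and its error message.
        code = generate(task, code, feedback)
        passed, feedback = execute(code)
        if passed:
            return code, turn
    return None, max_turns
```

In practice `generate` would wrap a call to the model and `execute` would run the candidate with the target language's interpreter (for example Tengo or Janet) and return its error output; both are left abstract here because the abstract does not specify them.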

Anthology ID: 2026.findings-acl.1840
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 36929–36942
URL: https://aclanthology.org/2026.findings-acl.1840/
Cite (ACL): Tengfei Wen, Xuanang Chen, Ben He, Xiaoliang Cong, and Le Sun. 2026. CodeRise: Bootstrapping LLMs for Ultra Low-Resource Programming Languages via Progressive Self-Refinement Curriculum. In Findings of the Association for Computational Linguistics: ACL 2026, pages 36929–36942, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): CodeRise: Bootstrapping LLMs for Ultra Low-Resource Programming Languages via Progressive Self-Refinement Curriculum (Wen et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.1840.pdf
Checklist: 2026.findings-acl.1840.checklist.pdf