Siming Huang
2026
code-transformed: The Influence of Large Language Models on Code
Yuliang Xu | Siming Huang | Mingmeng Geng | Yao Wan | Xuanhua Shi | Dongping Chen
Findings of the Association for Computational Linguistics: EACL 2026
Coding remains one of the most fundamental modes of interaction between humans and machines. With the rapid advancement of Large Language Models (LLMs), code generation capabilities have begun to significantly reshape programming practices. This development prompts a central question: Have LLMs transformed code style, and how can such transformation be characterized? In this paper, we present a pioneering study that investigates the impact of LLMs on code style, with a focus on naming conventions, complexity, maintainability, and similarity. By analyzing code from over 20,000 GitHub repositories linked to arXiv papers published between 2020 and 2025, we identify measurable trends in the evolution of coding style that align with characteristics of LLM-generated code. For instance, the proportion of snake_case function names in Python code increased from 40.7% in Q1 2023 to 49.8% in Q3 2025. Furthermore, we investigate how LLMs approach algorithmic problems by examining their reasoning processes. Our experimental results may provide the first large-scale empirical evidence that LLMs affect real-world programming style.
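As an illustration of the kind of naming-convention statistic quoted in the abstract above, here is a minimal sketch of measuring the share of snake_case function names in a piece of Python source. The regular expression, the requirement of at least one underscore, and the helper name `snake_case_share` are assumptions made for this example; the paper's actual classification rules are not given here.

```python
import ast
import re

# Assumed definition: lowercase words joined by underscores,
# with at least one underscore (so "train" does not count).
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)+$")


def snake_case_share(source: str) -> float:
    """Return the fraction of function names in `source` that are snake_case."""
    tree = ast.parse(source)
    names = [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    if not names:
        return 0.0
    return sum(bool(SNAKE_CASE.match(name)) for name in names) / len(names)


# Hypothetical sample input, not taken from the paper's dataset.
sample = '''
def load_data(path): ...
def parseArgs(): ...
def train(): ...
def save_model_state(m): ...
'''

print(snake_case_share(sample))  # 2 of the 4 names match the assumed pattern
```

Under this assumed definition, single-word names such as `train` are not counted as snake_case; a different convention would shift the reported proportion.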
2025
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
Siming Huang | Tianhao Cheng | Jason Klein Liu | Weidi Xu | Jiaran Hao | Liuyihan Song | Yang Xu | Jian Yang | Jiaheng Liu | Chenchen Zhang | Linzheng Chai | Ruifeng Yuan | Xianzhen Luo | Qiufeng Wang | YuanTao Fan | Qingfu Zhu | Zhaoxiang Zhang | Yang Gao | Jie Fu | Qian Liu | Houyi Li | Ge Zhang | Yuan Qi | Xu Yinghui | Wei Chu | Zili Wang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Code LLMs have been widely used in various domains, including code generation, logical reasoning, and agent systems. However, open-access code LLMs mostly only release weights, lacking key features such as reproducible data pipelines and transparent training protocols, which are crucial for advancing deeper, more reliable investigations. To address the gap, we introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an “open cookbook” for the research community. Unlike most prior efforts, we release not only model weights and inference code, but also the reproducible training data, complete data processing pipeline, rigorous experimental ablation results, and detailed training protocols for open scientific research. Our work identifies the key ingredients for building a top-tier code LLM: optimized heuristic rules for data cleaning and deduplication, effective recall of code-related text corpus, and high-quality synthetic data for both annealing and supervised fine-tuning stages. By offering this level of openness, we aim to broaden access to all aspects of a top-tier code LLM, with OpenCoder serving as both a powerful model and an open foundation to accelerate research and enable reproducible advancements in code intelligence. The released resource is available at https://opencoder-llm.github.io.
Co-authors
- Linzheng Chai 1
- Dongping Chen 1
- Tianhao Cheng 1
- Wei Chu 1
- Yuantao Fan 1
- Jie Fu 1
- Yang Gao (扬 高) 1
- Mingmeng Geng 1
- Jiaran Hao 1
- Houyi Li 1
- Jason Klein Liu 1
- Jiaheng Liu 1
- Qian Liu 1
- Xianzhen Luo 1
- Yuan Qi 1
- Xuanhua Shi 1
- Liuyihan Song 1
- Yao Wan 1
- Qiufeng Wang 1
- Zili Wang 1
- Weidi Xu 1
- Yang Xu 1
- Yuliang Xu 1
- Jian Yang 1
- Xu Yinghui 1
- Ruifeng Yuan 1
- Chenchen Zhang 1
- Zhaoxiang Zhang 1
- Ge Zhang 1
- Qingfu Zhu 1