Yuheng Cheng
2026
EngiBench: A Benchmark for Evaluating Large Language Models on Engineering Problem Solving
Xiyuan Zhou | Xinlei Wang | Yirui He | Ruixi Zou | Yang Wu | Yuheng Cheng | Yulu Xie | Wenxuan Liu | Huan Zhao | Yan Xu | Jinjin Gu | Junhua Zhao
Findings of the Association for Computational Linguistics: ACL 2026
Large language models (LLMs) have shown strong performance on mathematical reasoning under well-defined conditions. However, real-world engineering problems involve uncertainty, context, and open-ended settings that extend beyond symbolic computation. Existing benchmarks largely focus on well-defined or abstract reasoning and therefore fail to capture these complexities. We introduce EngiBench, a hierarchical benchmark designed to evaluate LLMs on solving engineering problems. It spans three levels of increasing difficulty (foundational knowledge retrieval, contextual reasoning, and open-ended modeling) and covers diverse engineering subfields. To facilitate a deeper understanding of model performance, we systematically rewrite each problem into three controlled variants (perturbed, knowledge-enhanced, and math abstraction), enabling us to separately evaluate the model’s robustness, domain-specific knowledge, and mathematical reasoning abilities. Experimental results show clear performance stratification across difficulty levels: model accuracy declines with task complexity, degrades under minor perturbations, and remains substantially below human performance on high-level engineering tasks. These findings reveal that current LLMs still lack the high-level reasoning needed for real-world engineering, highlighting the need for future models with deeper and more reliable problem-solving capabilities. Our source code and data are available at https://github.com/AI4Engi/EngiBench.
2025
Data Interpreter: An LLM Agent for Data Science
Sirui Hong | Yizhang Lin | Bang Liu | Bangbang Liu | Binhao Wu | Ceyao Zhang | Danyang Li | Jiaqi Chen | Jiayi Zhang | Jinlin Wang | Li Zhang | Lingyao Zhang | Min Yang | Mingchen Zhuge | Taicheng Guo | Tuo Zhou | Wei Tao | Robert Tang | Xiangtao Lu | Xiawu Zheng | Xinbing Liang | Yaying Fei | Yuheng Cheng | Yongxin Ni | Zhibin Gou | Zongze Xu | Yuyu Luo | Chenglin Wu
Findings of the Association for Computational Linguistics: ACL 2025
Large Language Model (LLM)-based agents have excelled in various domains but face significant challenges when applied to data science workflows due to their complex, multi-stage nature. Current LLM-based agents struggle with non-linear relationships, recursive dependencies, implicit data- and logic-dependent reasoning, and managing extensive context. In this paper, we introduce Data Interpreter, an LLM-based agent that addresses these challenges through hierarchical graph-based modeling to represent the complexity and a progressive strategy for step-by-step verification, refinement, and consistent context management. Extensive experiments confirm the effectiveness of Data Interpreter. On InfiAgent-DABench, it boosts performance by 25% (from 75.9% to 94.9%), and on machine learning and open-ended tasks, it lifts accuracy from 88% to 95% and from 60% to 97%, respectively. Moreover, our method surpasses state-of-the-art baselines by 26% on the MATH dataset. We will release the code upon publication.
Co-authors
- Jiaqi Chen 1
- Yaying Fei 1
- Zhibin Gou 1
- Jinjin Gu 1
- Taicheng Guo 1
- Yirui He 1
- Sirui Hong 1
- Danyang Li 1
- Xinbing Liang 1
- Yizhang Lin 1
- Bang Liu 1
- Bangbang Liu 1
- Wenxuan Liu 1
- Xiangtao Lu 1
- Yuyu Luo 1
- Yongxin Ni 1
- Robert Tang 1
- Wei Tao 1
- Jinlin Wang 1
- Xinlei Wang 1
- Binhao Wu 1
- Chenglin Wu 1
- Yang Wu 1
- Yulu Xie 1
- Yan Xu 1
- Zongze Xu 1
- Min Yang 1
- Ceyao Zhang 1
- Jiayi Zhang 1
- Li Zhang 1
- Lingyao Zhang 1
- Huan Zhao 1
- Junhua Zhao 1
- Xiawu Zheng 1
- Tuo Zhou 1
- Xiyuan Zhou 1
- Mingchen Zhuge 1
- Ruixi Zou 1