MLAlgo-Bench: Can Machines Implement Machine Learning Algorithms?

Yunfei Wang, Yeqin Zhang, Yuyang Wu, Liang Lu, Phi Le Nguyen, Xiaoliang Wang, Nguyen Cam-Tu


Abstract
As machine learning (ML) applications continue to expand across diverse fields, demand for ML code generation is rising. In this paper, we address a critical research question: Can machines autonomously generate ML code for sophisticated, human-designed algorithms or solutions? To answer this question, we introduce a novel benchmark, MLAlgo-Bench, which includes two challenging tasks: 1) generating code for ML algorithms, covering both traditional ML and modern deep learning-based methods, and 2) given human-provided solution sketches, writing ML code to solve practical tasks from Kaggle competitions. The benchmark is distinctive in its focus on interpreting intricate human instructions and producing multi-step, high-complexity code, offering a rigorous test of current Large Language Model (LLM) capabilities. We also introduce an automatic evaluation framework with comprehensive metrics, including task pass rate, a relative performance metric, and time overhead. Currently, the top-performing model (Claude 3.5 Sonnet) achieves a 48.8% task completion rate on implementing ML algorithms and a 21.6% rate on completing Kaggle competitions. Further analysis suggests substantial room for improvement.
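The abstract names task pass rate and time overhead without defining them here. The following is a minimal sketch of how per-task results could be aggregated into such numbers, assuming (hypothetically, not from the paper) that pass rate is the fraction of tasks whose generated code passes its checks, and that time overhead is the mean runtime ratio of generated to reference code over passing tasks. The `TaskResult` record and both helper functions are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    # Hypothetical per-task record; field names are illustrative, not the benchmark's schema.
    task_id: str
    passed: bool        # did the generated code run and satisfy the task's checks?
    runtime_s: float    # wall-clock time of the generated solution, in seconds
    reference_s: float  # wall-clock time of a reference solution, in seconds


def pass_rate(results):
    """Fraction of tasks whose generated code passed (assumed definition)."""
    if not results:
        return 0.0
    return sum(r.passed for r in results) / len(results)


def mean_time_overhead(results):
    """Mean generated/reference runtime ratio over passing tasks (assumed definition).

    Returns NaN when no task passed, since the ratio is undefined in that case.
    """
    ratios = [r.runtime_s / r.reference_s
              for r in results if r.passed and r.reference_s > 0]
    return sum(ratios) / len(ratios) if ratios else float("nan")


if __name__ == "__main__":
    demo = [
        TaskResult("t1", True, 12.0, 10.0),
        TaskResult("t2", False, 0.0, 8.0),
        TaskResult("t3", True, 9.0, 9.0),
        TaskResult("t4", False, 0.0, 5.0),
    ]
    print(f"pass rate: {pass_rate(demo):.1%}")                  # pass rate: 50.0%
    print(f"time overhead: {mean_time_overhead(demo):.2f}x")    # time overhead: 1.10x
```

Restricting the overhead to passing tasks avoids averaging in runtimes of code that failed, which would otherwise make a model look fast simply by crashing early.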
Anthology ID: 2025.findings-emnlp.772
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 14298–14329
URL: https://aclanthology.org/2025.findings-emnlp.772/
Cite (ACL): Yunfei Wang, Yeqin Zhang, Yuyang Wu, Liang Lu, Phi Le Nguyen, Xiaoliang Wang, and Nguyen Cam-Tu. 2025. MLAlgo-Bench: Can Machines Implement Machine Learning Algorithms?. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 14298–14329, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): MLAlgo-Bench: Can Machines Implement Machine Learning Algorithms? (Wang et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-emnlp.772.pdf
Checklist: 2025.findings-emnlp.772.checklist.pdf