MdEval: Massively Multilingual Code Debugging

Shukai Liu; Linzheng Chai; Jian Yang; Jiajun Shi; He Zhu; Liran Wang; Jin Ke; Wei Zhang; Hualei Zhu; Shuyue Guo; Tao Sun; Jiaheng Liu; Yunlong Duan; Yu Hao; Liqun Yang; Guanglin Niu; Ge Zhang; Zhoujun Li

Abstract

Code large language models (LLMs) have made significant progress in code debugging by directly generating the correct code from a buggy code snippet. Programming benchmarks, typically consisting of buggy code snippets and their associated test cases, are used to assess the debugging capabilities of LLMs. However, many existing benchmarks focus primarily on Python and offer limited language diversity (e.g., DebugBench and DebugEval). To advance the field of multilingual debugging with LLMs, we propose the first massively multilingual debugging benchmark, which includes 3.9K test samples across 20 programming languages and covers the automated program repair (APR) task, the bug localization (BL) task, and the bug identification (BI) task. In addition, we introduce the debugging instruction corpora MdEval-Instruct, built by injecting bugs into correct multilingual queries and solutions (xDebugGen). Further, we train a multilingual debugger, xDebugCoder, on MdEval-Instruct as a strong baseline designed to handle bugs across a wide range of programming languages (e.g., “Missing Mut” in Rust and “Misused Macro Definition” in C). Our extensive experiments on MdEval reveal a notable performance gap between open-source and closed-source LLMs (e.g., the GPT and Claude series), highlighting substantial room for improvement in multilingual code debugging scenarios.

Anthology ID: 2026.findings-acl.1041
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 20780–20797
URL: https://aclanthology.org/2026.findings-acl.1041/
Cite (ACL): Shukai Liu, Linzheng Chai, Jian Yang, Jiajun Shi, He Zhu, Liran Wang, Jin Ke, Wei Zhang, Hualei Zhu, Shuyue Guo, Tao Sun, Jiaheng Liu, Yunlong Duan, Yu Hao, Liqun Yang, Guanglin Niu, Ge Zhang, and Zhoujun Li. 2026. MdEval: Massively Multilingual Code Debugging. In Findings of the Association for Computational Linguistics: ACL 2026, pages 20780–20797, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): MdEval: Massively Multilingual Code Debugging (Liu et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.1041.pdf
Checklist: 2026.findings-acl.1041.checklist.pdf
