Mis-prompt: Benchmarking Large Language Models for Proactive Error Handling

Jiayi Zeng; Yizhe Feng; Mengliang He; Wenhui Lei; Wei Zhang; Zeming Liu; Xiaoming Shi; Aimin Zhou

doi:10.18653/v1/2025.acl-long.833

Abstract

Large language models (LLMs) have made significant advances in error handling. However, existing error-handling work is passive: it relies on explicit error-handling instructions, which are usually unavailable in real-world scenarios. This paper frames the resulting challenge as proactive error handling, that is, handling errors without explicit error-handling instructions. To promote further research, this work introduces a new benchmark, Mis-prompt, consisting of four evaluation tasks, an error category taxonomy, and a new evaluation dataset. It also analyzes the performance of current LLMs on the benchmark. The experimental results show that current LLMs perform poorly on proactive error handling, and that supervised fine-tuning (SFT) on error-handling instances improves their proactive error-handling capabilities. The dataset will be publicly available.

Anthology ID: 2025.acl-long.833
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 17007–17034
URL: https://aclanthology.org/2025.acl-long.833/
DOI: 10.18653/v1/2025.acl-long.833
Cite (ACL): Jiayi Zeng, Yizhe Feng, Mengliang He, Wenhui Lei, Wei Zhang, Zeming Liu, Xiaoming Shi, and Aimin Zhou. 2025. Mis-prompt: Benchmarking Large Language Models for Proactive Error Handling. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 17007–17034, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Mis-prompt: Benchmarking Large Language Models for Proactive Error Handling (Zeng et al., ACL 2025)
PDF: https://aclanthology.org/2025.acl-long.833.pdf
