GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers

Authors: Qintong Li, Leyang Cui, Xueliang Zhao, Lingpeng Kong, Wei Bi
Published: 2024-08
Type: conference publication
Venue: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Publisher: Association for Computational Linguistics
Location: Bangkok, Thailand
Pages: 2961-2984
Citation key: li-etal-2024-gsm
DOI: 10.18653/v1/2024.acl-long.163
URL: https://aclanthology.org/2024.acl-long.163/