StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models

Authors: Zhicheng Guo, Sijie Cheng, Hao Wang, Shihao Liang, Yujia Qin, Peng Li, Zhiyuan Liu, Maosong Sun, Yang Liu
Published in: Findings of the Association for Computational Linguistics: ACL 2024
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Publisher: Association for Computational Linguistics
Location: Bangkok, Thailand
Date: 2024-08
Type: conference publication
Pages: 11143-11156
Citation key: guo-etal-2024-stabletoolbench
DOI: 10.18653/v1/2024.findings-acl.664
URL: https://aclanthology.org/2024.findings-acl.664/