NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models

Han Han; Tong Zhu (朱桐); Xiang Zhang; Mengsong Wu; Xiong Hao; Wenliang Chen (陈文亮)

Abstract

Large language models (LLMs) combined with tool learning have achieved impressive results in real-world applications. During tool learning, LLMs may call multiple tools in nested order, where a later tool call may take the response of an earlier call as its input parameters. However, nested tool learning capabilities remain under-explored, since existing benchmarks lack relevant data instances. To address this problem, we introduce NesTools to bridge the current gap in comprehensive nested tool learning evaluation. NesTools comprises a novel automatic data generation method to construct large-scale nested tool calls with different nesting structures. With manual review and refinement, the dataset is of high quality and closely aligned with real-world scenarios. Therefore, NesTools can serve as a new benchmark to evaluate the nested tool learning abilities of LLMs. We conduct extensive experiments on 22 LLMs and provide in-depth analyses with NesTools, which show that current LLMs still struggle with the complex nested tool learning task.
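
To make the notion of a nested tool call concrete, here is a minimal sketch in which the second call consumes the response of the first. Everything in it is an assumption made for illustration: the tool names (`get_city_of_user`, `get_weather`), the `"$0"` placeholder convention for referring to an earlier response, and the `run_calls` helper are hypothetical and are not taken from the NesTools data format.

```python
from typing import Any, Callable, Dict, List

# Hypothetical tools; names and behavior are illustrative only.
def get_city_of_user(user_id: str) -> str:
    return {"u1": "Suzhou"}.get(user_id, "unknown")

def get_weather(city: str) -> str:
    return f"sunny in {city}"

TOOLS: Dict[str, Callable[..., Any]] = {
    "get_city_of_user": get_city_of_user,
    "get_weather": get_weather,
}

def resolve(value: Any, results: List[Any]) -> Any:
    # Assumed convention: "$i" refers to the response of the call at index i.
    if isinstance(value, str) and value.startswith("$") and value[1:].isdigit():
        return results[int(value[1:])]
    return value

def run_calls(calls: List[Dict[str, Any]]) -> List[Any]:
    # Execute calls in order, substituting earlier responses into later arguments.
    results: List[Any] = []
    for call in calls:
        args = {k: resolve(v, results) for k, v in call["args"].items()}
        results.append(TOOLS[call["tool"]](**args))
    return results

calls = [
    {"tool": "get_city_of_user", "args": {"user_id": "u1"}},
    {"tool": "get_weather", "args": {"city": "$0"}},  # nested: uses call 0's response
]
print(run_calls(calls))  # ['Suzhou', 'sunny in Suzhou']
```

The second call cannot be filled in correctly without first knowing the result of the first call, which is the dependency that a nested tool learning benchmark would need to exercise.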

Anthology ID: 2025.coling-main.657
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 9824–9844
URL: https://aclanthology.org/2025.coling-main.657/
Cite (ACL): Han Han, Tong Zhu, Xiang Zhang, MengSong Wu, Xiong Hao, and Wenliang Chen. 2025. NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models. In Proceedings of the 31st International Conference on Computational Linguistics, pages 9824–9844, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models (Han et al., COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.657.pdf
