An Evaluation Mechanism of LLM-based Agents on Manipulating APIs
Bing Liu | Zhou Jianxiang | Dan Meng | Haonan Lu
Findings of the Association for Computational Linguistics: EMNLP 2024
LLM-based agents can greatly extend the abilities of LLMs and have thus attracted rapidly growing research interest. An ambitious vision, serving users by manipulating massive collections of API-based tools, has been proposed and explored. However, we find that a widely accepted evaluation mechanism for generic agents is still missing. This work aims to fill that gap. We decompose tool-use capability into seven aspects and form a thorough evaluation schema. In addition, we design and release an instruction dataset and a toolset, the two sides that agents bridge between, following the principle of reflecting real-world challenges. Furthermore, we evaluate multiple generic agents. Our findings can inspire future research on improving LLM-based agents and on rethinking the philosophy of API design.