An Evaluation Mechanism of LLM-based Agents on Manipulating APIs

Bing Liu, Zhou Jianxiang, Dan Meng, Haonan Lu


Abstract
LLM-based agents can greatly extend the abilities of LLMs and have thus attracted a rapidly growing body of research. An ambitious vision – serving users by manipulating massive API-based tools – has been proposed and explored. However, we find that a widely accepted evaluation mechanism for generic agents is still missing. This work aims to fill this gap. We decompose tool-use capability into seven aspects and form a thorough evaluation schema. In addition, we design and release an instruction dataset and a toolset – the two sides that agents bridge – following the principle of reflecting real-world challenges. Furthermore, we evaluate multiple generic agents. Our findings can inspire future research on improving LLM-based agents and prompt a rethinking of the philosophy of API design.
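The abstract describes the evaluation schema only at a high level. As a rough illustration of what scoring an agent's API call against a gold annotation can look like, here is a minimal, hypothetical Python sketch; the `APICall` structure, the `evaluate_call` helper, and the two aspect names are illustrative assumptions, not the paper's actual metrics or data format.

```python
from dataclasses import dataclass, field

# Hypothetical structures for illustration only; the paper's actual
# schema, dataset format, and seven capability aspects are defined in
# the full text, not reproduced here.

@dataclass
class APICall:
    """A single tool invocation: an API name plus keyword arguments."""
    name: str
    args: dict = field(default_factory=dict)

def evaluate_call(predicted: APICall, gold: APICall) -> dict:
    """Score one predicted call against a gold reference.

    Returns per-aspect booleans, e.g. whether the agent selected the
    right API and whether it filled the arguments correctly -- two of
    the aspects tool-use capability might be decomposed into.
    """
    api_selected = predicted.name == gold.name
    args_correct = api_selected and predicted.args == gold.args
    return {"api_selection": api_selected, "argument_filling": args_correct}

# Example: the agent picks the right API but mis-fills one argument.
gold = APICall("weather.query", {"city": "Miami", "unit": "celsius"})
pred = APICall("weather.query", {"city": "Miami", "unit": "fahrenheit"})
print(evaluate_call(pred, gold))
# -> {'api_selection': True, 'argument_filling': False}
```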
Anthology ID:
2024.findings-emnlp.267
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
4649–4662
URL:
https://aclanthology.org/2024.findings-emnlp.267
Cite (ACL):
Bing Liu, Zhou Jianxiang, Dan Meng, and Haonan Lu. 2024. An Evaluation Mechanism of LLM-based Agents on Manipulating APIs. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 4649–4662, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
An Evaluation Mechanism of LLM-based Agents on Manipulating APIs (Liu et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-emnlp.267.pdf