Yuanhao Shen
2024
SMARTCAL: An Approach to Self-Aware Tool-Use Evaluation and Calibration
Yuanhao Shen, Xiaodan Zhu, Lei Chen
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
The tool-use ability of Large Language Models (LLMs) has a profound impact on a wide range of applications. However, LLMs' self-awareness and self-control in using tools appropriately remain understudied. The problem is consequential because it signals a risk of degraded performance and threatens the trustworthiness of the models. In this paper, we study a family of state-of-the-art LLMs on three datasets using two mainstream tool-use frameworks. Our study reveals tool-abuse behavior in LLMs: a tendency to misuse tools, accompanied by frequent overconfidence in tool choice. We also find that this issue is common regardless of model capability. Accordingly, we propose a novel framework, SMARTCAL, to mitigate these issues. Our results show an average 8.6 percent increase in QA performance across the three test datasets and a 21.6 percent lower Expected Calibration Error (ECE) than existing methods.
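Calibration is reported via Expected Calibration Error (ECE), which compares a model's stated confidence with its actual accuracy. As a point of reference, here is a minimal sketch of the standard equal-width-bin ECE, ECE = sum over bins m of (|B_m| / n) * |acc(B_m) - conf(B_m)|. The helper name `expected_calibration_error`, the default of 10 bins, and the binning scheme are illustrative assumptions, not necessarily the configuration used in the SMARTCAL evaluation.

```python
from collections.abc import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Hypothetical helper: standard ECE with equal-width confidence bins.

    ECE = sum_m (|B_m| / n) * |acc(B_m) - conf(B_m)|
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Bin m covers [m / n_bins, (m + 1) / n_bins); a confidence of exactly 1.0
    # is placed in the last bin.
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"confidence {c} is outside [0, 1]")
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, y))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        acc = sum(1 for _, y in members if y) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of predictions.
        ece += (len(members) / n) * abs(acc - avg_conf)
    return ece


if __name__ == "__main__":
    # One confidently wrong answer (0.85) dominates the error.
    print(expected_calibration_error([0.92, 0.85, 0.65, 0.95],
                                     [True, False, True, True]))  # ≈ 0.3325
```

A lower ECE means the model's confidence tracks its accuracy more closely; an overconfident tool choice that turns out wrong, as in the 0.85 example above, raises it.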