SMARTCAL: An Approach to Self-Aware Tool-Use Evaluation and Calibration

Yuanhao Shen, Xiaodan Zhu, Lei Chen


Abstract
The tool-use ability of Large Language Models (LLMs) has a profound impact on a wide range of applications. However, LLMs' self-awareness and self-control in using tools appropriately remain understudied. The problem is consequential: it signals a potential risk of degraded performance and threatens the trustworthiness of the models. In this paper, we conduct a study of a family of state-of-the-art LLMs on three datasets with two mainstream tool-use frameworks. Our study reveals the tool-abuse behavior of LLMs: a tendency for models to misuse tools, coupled with frequent overconfidence in tool choice. We also find that this issue is common regardless of model capability. Accordingly, we propose a novel framework, SMARTCAL, to mitigate the observed issues; our results show an average 8.6 percent increase in QA performance across the three test datasets and a 21.6 percent lower Expected Calibration Error (ECE) than existing methods.
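For readers unfamiliar with the ECE metric reported above, the sketch below shows the standard equal-width-binning formulation: predictions are grouped into confidence bins, and ECE is the bin-size-weighted average gap between mean confidence and accuracy within each bin. The bin count and the toy data are illustrative assumptions, not values from the paper.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE with equal-width confidence bins:
    sum over bins of (bin fraction) * |accuracy - mean confidence|."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Half-open bins (lo, hi]; samples with confidence 0.0 are ignored.
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        acc = correct[mask].mean()       # empirical accuracy in this bin
        conf = confidences[mask].mean()  # average stated confidence
        ece += mask.mean() * abs(acc - conf)
    return ece

# Toy example of overconfident tool choices: high confidence, mixed accuracy.
ece = expected_calibration_error([0.95, 0.9, 0.85, 0.9], [1, 0, 0, 1])
```

A well-calibrated model drives this gap toward zero; the overconfidence the paper observes manifests as bins where stated confidence far exceeds accuracy.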
Anthology ID:
2024.emnlp-industry.59
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
November
Year:
2024
Address:
Miami, Florida, US
Editors:
Franck Dernoncourt, Daniel Preoţiuc-Pietro, Anastasia Shimorina
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
774–789
URL:
https://aclanthology.org/2024.emnlp-industry.59
Cite (ACL):
Yuanhao Shen, Xiaodan Zhu, and Lei Chen. 2024. SMARTCAL: An Approach to Self-Aware Tool-Use Evaluation and Calibration. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 774–789, Miami, Florida, US. Association for Computational Linguistics.
Cite (Informal):
SMARTCAL: An Approach to Self-Aware Tool-Use Evaluation and Calibration (Shen et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-industry.59.pdf