SMARTCAL: An Approach to Self-Aware Tool-Use Evaluation and Calibration
Shen, Yuanhao, Zhu, Xiaodan, Chen, Lei
arXiv.org Artificial Intelligence
The tool-use ability of Large Language Models (LLMs) has a profound impact on a wide range of industrial applications. However, LLMs' self-control and calibration capability in appropriately using tools remains understudied. The problem is consequential as it raises potential risks of degraded performance and poses a threat to the trustworthiness of the models. In this paper, we conduct a study on a family of state-of-the-art LLMs on three datasets with two mainstream tool-use frameworks. Our study reveals the tool-abuse behavior of LLMs, a tendency for models to misuse tools with overconfidence. We also find that this is a common issue regardless of model capability. Accordingly, we propose a novel approach, SMARTCAL, to mitigate the observed issues, and our results show an average of 8.6 percent increase in the QA performance and a 21.6 percent decrease in Expected Calibration Error (ECE) compared to baseline models.
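For readers unfamiliar with the metric named in the abstract, ECE is commonly computed by grouping predictions into confidence bins and averaging the gap between accuracy and mean confidence in each bin, weighted by bin size. The following is a minimal sketch of that common definition, assuming equal-width bins and one confidence score per answer. The function name, the default of 10 bins, and the sample values are illustrative assumptions; the abstract does not state how the paper bins predictions or elicits confidence.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # assumed default; not taken from the paper
) -> float:
    """Equal-width-bin ECE: sum over bins of (|B| / n) * |acc(B) - conf(B)|."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    conf_sum = [0.0] * n_bins  # total confidence per bin
    hits = [0] * n_bins        # number of correct answers per bin
    counts = [0] * n_bins      # number of answers per bin

    for c, ok in zip(confidences, correct):
        if not 0.0 <= c <= 1.0:
            raise ValueError("confidence values must lie in [0, 1]")
        # Bins are [k/n_bins, (k+1)/n_bins); a confidence of 1.0 goes in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        conf_sum[idx] += c
        hits[idx] += int(ok)
        counts[idx] += 1

    ece = 0.0
    for m in range(n_bins):
        if counts[m] == 0:
            continue  # empty bins contribute nothing
        accuracy = hits[m] / counts[m]
        mean_conf = conf_sum[m] / counts[m]
        ece += (counts[m] / n) * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Illustrative values only: four answers at 0.9 confidence, three correct.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False]))
```

Under this definition, a model that reports 0.9 confidence but is right only 75 percent of the time is overconfident by 0.15 in that bin, which is the kind of gap a lower ECE reflects.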
Dec-11-2024