ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages

Ye, Junjie; Li, Sixian; Li, Guanyu; Huang, Caishuang; Gao, Songyang; Wu, Yilong; Zhang, Qi; Gui, Tao; Huang, Xuanjing

Feb-16-2024–arXiv.org Artificial Intelligence

Tool learning is widely acknowledged as a foundational approach for deploying large language models (LLMs) in real-world scenarios. While current research primarily emphasizes leveraging tools to augment LLMs, it frequently neglects emerging safety considerations tied to their application. To fill this gap, we present ToolSword, a comprehensive framework dedicated to meticulously investigating safety issues linked to LLMs in tool learning. Specifically, ToolSword delineates six safety scenarios for LLMs in tool learning: malicious queries and jailbreak attacks in the input stage, noisy misdirection and risky cues in the execution stage, and harmful feedback and error conflicts in the output stage. Experiments conducted on 11 open-source and closed-source LLMs reveal enduring safety challenges in tool learning, such as handling harmful queries, employing risky tools, and delivering detrimental feedback, to which even GPT-4 is susceptible. Moreover, we conduct further studies with the aim of fostering research on tool learning safety. The data is released at https://github.com/Junjie-Ye/ToolSword.
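
The six scenarios form a simple stage-by-scenario taxonomy (three stages, two scenarios each). As a minimal sketch for readers organizing their own test cases, that grouping could be encoded as follows. The names `Stage`, `SCENARIOS`, and `stage_of` are hypothetical and illustrative only; they do not reflect the data format of the released ToolSword repository.

```python
from enum import Enum


class Stage(Enum):
    """The three tool-learning stages named in the abstract."""
    INPUT = "input"
    EXECUTION = "execution"
    OUTPUT = "output"


# Hypothetical encoding: each stage maps to its two safety scenarios.
SCENARIOS: dict[Stage, tuple[str, str]] = {
    Stage.INPUT: ("malicious queries", "jailbreak attacks"),
    Stage.EXECUTION: ("noisy misdirection", "risky cues"),
    Stage.OUTPUT: ("harmful feedback", "error conflicts"),
}


def stage_of(scenario: str) -> Stage:
    """Return the stage a scenario belongs to (case-insensitive lookup)."""
    key = scenario.strip().lower()
    for stage, names in SCENARIOS.items():
        if key in names:
            return stage
    raise KeyError(f"unknown scenario: {scenario!r}")


if __name__ == "__main__":
    total = sum(len(names) for names in SCENARIOS.values())
    print(total)                         # 6
    print(stage_of("Risky cues").value)  # execution
```

A plain dictionary keyed by an enum keeps the stage ordering explicit while making it easy to iterate over all six scenarios or look up the stage of a single one.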

Country:
- Indian Ocean (0.04)
- Africa (0.04)
- North America > United States
  - New York (0.04)
- Europe
  - Russia (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
- Asia
  - India (0.04)
  - Russia (0.04)
  - Myanmar > Tanintharyi Region
    - Dawei (0.04)

Genre:
- Research Report > New Finding (0.67)

Industry:
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
- Banking & Finance (1.00)
- Health & Medicine (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)