Self-Training Large Language Models for Tool-Use Without Demonstrations

Luo, Ne, Gema, Aryo Pradipta, He, Xuanli, van Krieken, Emile, Lesci, Pietro, Minervini, Pasquale

Feb-9-2025–arXiv.org Artificial Intelligence

Large language models (LLMs) remain prone to factual inaccuracies and computational errors, including hallucinations and mistakes in mathematical reasoning. Recent work augmented LLMs with tools to mitigate these shortcomings, but often requires curated gold tool-use demonstrations. In this paper, we investigate whether LLMs can learn to use tools without demonstrations. First, we analyse zero-shot prompting strategies to guide LLMs in tool utilisation. Second, we propose a self-training method to synthesise tool-use traces using the LLM itself. We compare supervised fine-tuning and preference fine-tuning techniques for fine-tuning the model on datasets constructed using existing Question Answering (QA) datasets, i.e., TriviaQA and GSM8K. Experiments show that tool-use enhances performance on a long-tail knowledge task: 3.7% on PopQA, which is used solely for evaluation, but leads to mixed results on other datasets, i.e., TriviaQA, GSM8K, and NQ-Open. Our findings highlight the potential and challenges of integrating external tools into LLMs without demonstrations.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

Feb-9-2025

arXiv.org PDF

Add feedback

Country:
- North America
  - United States
    - Hawaii (0.04)
    - Florida > Miami-Dade County
      - Miami (0.04)
  - Mexico > Mexico City
    - Mexico City (0.04)
  - Canada
    - Ontario > Toronto (0.04)
    - British Columbia > Metro Vancouver Regional District
      - Vancouver (0.04)
- Europe
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Italy > Tuscany
    - Florence (0.04)
- Asia
  - British Indian Ocean Territory > Diego Garcia (0.04)
  - Thailand > Bangkok
    - Bangkok (0.04)
  - Middle East
    - Jordan (0.04)
    - UAE > Abu Dhabi Emirate
      - Abu Dhabi (0.04)
    - Saudi Arabia > Asir Province
      - Abha (0.04)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Leisure & Entertainment (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found