DiaTool-DPO: Multi-Turn Direct Preference Optimization for Tool-Augmented Large Language Models

Jung, Sunghee; Lee, Donghun; Lee, Shinbok; Seo, Gaeun; Lee, Daniel; Ko, Byeongil; Cho, Junrae; Kim, Kihyun; Kim, Eunggyun; Shin, Myeongcheol

Jul-15-2025–arXiv.org Artificial Intelligence

Tool-Augmented Large Language Models (TA-LLMs) have shown promise in real-world applications, but face challenges in handling incomplete queries and out-of-scope requests. While existing approaches rely mainly on Supervised Fine-Tuning with expert trajectories, we propose DiaTool-DPO, a novel method that enhances TA-LLMs' dialogue capabilities through Direct Preference Optimization. We model TA-LLM interactions as a Markov Decision Process with 5 distinct dialogue states and categorize user queries into 3 types based on their state transition trajectories. We automatically construct paired trajectory datasets of correct and incorrect dialogue flows and introduce a specialized objective loss for dialogue control. Our comprehensive evaluation demonstrates that DiaTool-DPO approaches GPT-4o's performance (94.8% in information gathering, 91% in tool call rejection) with substantial improvements over the baseline (44% and 9.6%, respectively) while maintaining core functionality. Our approach opens new possibilities for developing TA-LLMs that can handle diverse real-world scenarios without requiring additional expert demonstrations or human labeling.
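As a rough illustration of preference optimization over paired dialogue trajectories, the sketch below computes the standard DPO loss for one correct/incorrect trajectory pair. It is not the specialized objective the abstract mentions, which is not described here. The function names, the use of summed per-turn log-probabilities as the trajectory score, and the value of `beta` are all assumptions made for illustration.

```python
import math


def trajectory_logprob(turn_logprobs):
    # Assumption: a trajectory's log-probability is the sum of the
    # log-probabilities of its assistant turns.
    return sum(turn_logprobs)


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) trajectory pair.

    Each argument is a list of per-turn log-probabilities under either the
    policy being trained or the frozen reference model. beta is a
    hypothetical default, not a value taken from the paper.
    """
    chosen_margin = trajectory_logprob(policy_chosen) - trajectory_logprob(ref_chosen)
    rejected_margin = trajectory_logprob(policy_rejected) - trajectory_logprob(ref_rejected)
    z = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) written in a numerically stable form.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Hypothetical two-turn trajectories: the policy already favors the chosen
# dialogue flow more than the reference does, so the loss is below log(2).
loss = dpo_loss([-0.5, -1.0], [-3.0, -4.0], [-1.0, -2.0], [-1.5, -2.5])
print(loss)
```

When the policy and the reference assign identical log-probabilities, the loss equals log 2. It decreases as the policy shifts probability toward the correct dialogue flow relative to the reference.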

large language model, machine learning, trajectory

Country:
- Asia (0.93)
- Europe > Austria (0.28)
- North America
  - United States (0.46)
  - Mexico (0.28)

Genre:
- Research Report
  - New Finding (0.46)
  - Promising Solution (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning
    - Neural Networks > Deep Learning (1.00)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.34)
