Debate, Reflect, and Distill: Multi-Agent Feedback with Tree-Structured Preference Optimization for Efficient Language Model Enhancement

Zhou, Xiaofeng, Huang, Heyan, Liao, Lizi

Jun-5-2025–arXiv.org Artificial Intelligence

Large Language Models (LLMs) continue to set new standards in knowledge-intensive and complex reasoning tasks, yet their high computational demands limit widespread adoption. While distilling large models into smaller ones offers a sustainable solution, current techniques--such as static knowledge distillation, resource-intensive reinforcement learning from human feedback, or limited self-reflection--struggle to yield substantial and lasting performance gains. In this paper, we present a novel Debate and Reflect (D&R) framework that orchestrates multi-turn debates between smaller models and stronger teacher models, eliciting actionable feedback (e.g., error analysis, corrective strategies) to guide student models. Further, we introduce Tree-structured Direct Preference Optimization (T-DPO) to efficiently leverage these debate logs, organizing interactions into a hierarchical format for effective training. Empirical evaluations across diverse NLP benchmarks demonstrate that our approach significantly improves smaller-model accuracy, robustness, and generalization, outperforming conventional baselines by a large margin.

large language model, machine learning, student model, (20 more...)

arXiv.org Artificial Intelligence

Jun-5-2025

arXiv.org PDF

Add feedback

Country:
- North America
  - United States > Florida
    - Miami-Dade County > Miami (0.05)
  - Mexico > Mexico City
    - Mexico City (0.04)
  - Canada > Ontario
    - Toronto (0.04)
- Asia
  - Singapore (0.05)
  - Thailand > Bangkok
    - Bangkok (0.04)
  - Myanmar > Tanintharyi Region
    - Dawei (0.04)
  - China > Beijing
    - Beijing (0.04)

Genre:
- Research Report (1.00)

Industry:
- Education (0.55)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Agents (1.00)
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.94)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found