Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning
Hao Ma

Nov-14-2025, 07:06:48 GMT–Neural Information Processing Systems

Reinforcement learning (RL) has emerged as a pivotal technique for fine-tuning large language models (LLMs) on specific tasks. However, prevailing RL fine-tuning methods predominantly rely on PPO and its variants. Though these algorithms are effective in general RL settings, they often exhibit suboptimal performance and vulnerability to distribution collapse when applied to the fine-tuning of LLMs.

KL divergence, large language model, machine learning

Country:
- Europe > United Kingdom
  - England > Oxfordshire > Oxford (0.04)
- Asia
  - Macao (0.04)
  - Myanmar > Tanintharyi Region
    - Dawei (0.04)
  - China > Beijing
    - Beijing (0.04)

Genre:
- Research Report
  - Experimental Study (0.93)
  - New Finding (0.67)

Industry:
- Education (0.93)
- Leisure & Entertainment (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Agents (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Neural Networks > Deep Learning (1.00)
