Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning
Hao Ma
Neural Information Processing Systems
Reinforcement learning (RL) has emerged as a pivotal technique for fine-tuning large language models (LLMs) on specific tasks. However, prevailing RL fine-tuning methods predominantly rely on PPO and its variants. Though these algorithms are effective in general RL settings, they often exhibit suboptimal performance and vulnerability to distribution collapse when applied to the fine-tuning of LLMs.
Nov-14-2025, 07:06:48 GMT