Reinforcement Learning

Melting Pot Contest: Charting the Future of Generalized Cooperative Intelligence

Neural Information Processing Systems

As AI systems become increasingly sophisticated and interconnected, it will be critical that they be competent at cooperating, both with other AI systems and with humans.



Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning

Hao Ma

Neural Information Processing Systems

Reinforcement learning (RL) has emerged as a pivotal technique for fine-tuning large language models (LLMs) on specific tasks. However, prevailing RL fine-tuning methods predominantly rely on PPO and its variants. Though these algorithms are effective in general RL settings, they often exhibit suboptimal performance and vulnerability to distribution collapse when applied to the fine-tuning of LLMs.


Improving Deep Reinforcement Learning by Reducing the Chain Effect of Value and Policy Churn

Neural Information Processing Systems

After any random batch update, network outputs for input data not included in the batch can change indirectly to unexpected values, a phenomenon called churn in this paper.
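
As a rough illustration of the churn definition above, the following toy sketch takes one gradient step on a small batch and measures how much the network's outputs move on inputs that were not in that batch. It is not the paper's method or experimental setup: the two-layer network, the regression loss, and the names `init_params`, `sgd_step`, and `measure_churn` are all assumptions made for illustration, and the data is random rather than drawn from an RL task.

```python
import numpy as np

def init_params(rng, in_dim=4, hidden=16):
    # Hypothetical tiny two-layer network; sizes are arbitrary.
    return {
        "W1": rng.normal(0, 0.5, (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0, 0.5, (hidden, 1)),
        "b2": np.zeros(1),
    }

def forward(params, x):
    # Returns hidden activations and scalar outputs for each row of x.
    h = np.tanh(x @ params["W1"] + params["b1"])
    return h, (h @ params["W2"] + params["b2"]).ravel()

def sgd_step(params, x, y, lr=0.1):
    # One gradient step on mean squared error for the batch (x, y).
    # Returns a new parameter dict; the input dict is left unchanged.
    h, pred = forward(params, x)
    n = x.shape[0]
    d_out = (2.0 / n) * (pred - y)[:, None]          # dL/d(output)
    d_h = (d_out @ params["W2"].T) * (1.0 - h ** 2)  # backprop through tanh
    grads = {
        "W2": h.T @ d_out,
        "b2": d_out.sum(axis=0),
        "W1": x.T @ d_h,
        "b1": d_h.sum(axis=0),
    }
    return {k: params[k] - lr * grads[k] for k in params}

def measure_churn(params, new_params, x_held_out):
    # Mean absolute output change on inputs that were NOT in the update batch.
    _, before = forward(params, x_held_out)
    _, after = forward(new_params, x_held_out)
    return float(np.mean(np.abs(after - before)))

rng = np.random.default_rng(0)
params = init_params(rng)
x_batch = rng.normal(size=(8, 4))      # inputs used for the update
y_batch = rng.normal(size=8)           # arbitrary regression targets
x_held_out = rng.normal(size=(32, 4))  # inputs excluded from the update

new_params = sgd_step(params, x_batch, y_batch)
churn = measure_churn(params, new_params, x_held_out)
print(f"mean |output change| on held-out inputs: {churn:.4f}")
```

Because the update changes shared weights, the held-out outputs shift even though none of those inputs contributed to the gradient; that indirect shift is what the sketch reports.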