Co-Reinforcement Learning for Unified Multimodal Understanding and Generation
Neural Information Processing Systems
This paper presents a pioneering exploration of reinforcement learning (RL) via group relative policy optimization for unified multimodal large language models (ULMs), aimed at simultaneously reinforcing generation and understanding capabilities. Through systematic pilot studies, we uncover the significant potential of ULMs to enable the synergistic co-evolution of dual capabilities within a shared policy optimization framework.
Jun-10-2026, 00:01:26 GMT