Omni-Thinker: Scaling Multi-Task RL in LLMs with Hybrid Reward and Task Scheduling
