SRPO: A Cross-Domain Implementation of Large-Scale Reinforcement Learning on LLM
