CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models
