Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following
