SeRL: Self-Play Reinforcement Learning for Large Language Models with Limited Data
