SeRL: Self-play Reinforcement Learning for Large Language Models with Limited Data
