Towards Understanding Self-play for LLM Reasoning
