Towards Understanding Self-play for LLM Reasoning