A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
