Reinforcement Learning vs. Distillation: Understanding Accuracy and Capability in LLM Reasoning
