Reward and Guidance through Rubrics: Promoting Exploration to Improve Multi-Domain Reasoning
