Planner-R1: Reward Shaping Enables Efficient Agentic RL with Smaller LLMs
