Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
