Learning to Reason in Structured In-context Environments with Reinforcement Learning
