Learning to Reason in Structured In-context Environments with Reinforcement Learning