Reinforcement Learning in hyperbolic space for multi-step reasoning