Integrating Large Language Models and Reinforcement Learning for Non-Linear Reasoning