Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL
