Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL