Bridging Offline and Online Reinforcement Learning for LLMs