Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias