Where is the Grass Greener? Revisiting Generalized Policy Iteration for Offline Reinforcement Learning