Decoupling Value and Policy for Generalization in Reinforcement Learning