Improving Deep Policy Gradients with Value Function Search