Search-Based Adversarial Estimates for Improving Sample Efficiency in Off-Policy Reinforcement Learning