Learning Efficient and Effective Exploration Policies with Counterfactual Meta Policy