Counterfactual experience augmented off-policy reinforcement learning