Counterfactually Safe Reinforcement Learning