Causal Policy Learning in Reinforcement Learning: Backdoor-Adjusted Soft Actor-Critic