OPIRL: Sample Efficient Off-Policy Inverse Reinforcement Learning via Distribution Matching
