Accelerating Self-Imitation Learning from Demonstrations via Policy Constraints and Q-Ensemble
