Reinforcement Learning


SafeDICE: Offline Safe Imitation Learning with Non-Preferred Demonstrations

Neural Information Processing Systems

In this paper, we present a hyperparameter-free offline safe imitation learning (IL) algorithm, SafeDICE, that learns a safe policy by leveraging non-preferred demonstrations in the space of stationary distributions. Our algorithm directly estimates the stationary distribution corrections of the policy that imitates the demonstrations while excluding the non-preferred behavior.


Replicability in Reinforcement Learning

Neural Information Processing Systems

We initiate the mathematical study of replicability as an algorithmic property in the context of reinforcement learning (RL). We focus on the fundamental setting of discounted tabular MDPs with access to a generative model.