SafeDICE: Offline Safe Imitation Learning with Non-Preferred Demonstrations

Neural Information Processing Systems 

In this paper, we present SafeDICE, a hyperparameter-free offline safe imitation learning (IL) algorithm that learns a safe policy by leveraging non-preferred demonstrations in the space of stationary distributions. Our algorithm directly estimates the stationary distribution corrections of the policy that imitates the demonstrations while excluding the non-preferred behavior.
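To make the notion of a stationary distribution correction concrete, the following is a minimal tabular sketch and not the SafeDICE algorithm itself. It assumes, purely for illustration, that an unlabeled dataset's state-action distribution is a mixture of non-preferred and target behavior with a known mixing ratio `alpha`; the abstract does not state this model, and the function name `corrected_weights` and all of its arguments are hypothetical.

```python
import numpy as np

def corrected_weights(d_mixed, d_nonpref, alpha, eps=1e-12):
    """Toy correction ratios w = d_target / d_mixed over a finite
    set of (state, action) pairs.

    Assumption (not from the paper's abstract):
        d_mixed = alpha * d_nonpref + (1 - alpha) * d_target
    with a known alpha in [0, 1).
    """
    d_mixed = np.asarray(d_mixed, dtype=float)
    d_nonpref = np.asarray(d_nonpref, dtype=float)
    # Remove the non-preferred share; clip tiny negatives caused by noise.
    d_target = np.clip(d_mixed - alpha * d_nonpref, 0.0, None)
    # Renormalize so the target is a valid distribution.
    d_target = d_target / d_target.sum()
    # Ratio of target to data distribution; eps guards against division by zero.
    return d_target / np.maximum(d_mixed, eps)

# Four (state, action) pairs; the last two are visited only by non-preferred behavior.
w = corrected_weights([0.3, 0.3, 0.2, 0.2], [0.0, 0.0, 0.5, 0.5], alpha=0.4)
```

In this toy case the weights on the last two pairs come out as zero, so a weighted imitation objective (for example, weighted behavior cloning over the mixed data) would ignore samples that only the non-preferred behavior produces. The explicit `alpha` is an artifact of the sketch and should not be read as a hyperparameter of the method described above.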
