SafeDICE: Offline Safe Imitation Learning with Non-Preferred Demonstrations
Neural Information Processing Systems
In this paper, we present SafeDICE, a hyperparameter-free offline safe imitation learning (IL) algorithm that learns a safe policy by leveraging non-preferred demonstrations in the space of stationary distributions. Our algorithm directly estimates the stationary distribution corrections of a policy that imitates the demonstrations while excluding the non-preferred behavior.
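To make the idea of a stationary distribution correction concrete, the following is a minimal tabular sketch and not the SafeDICE algorithm itself. It assumes a mixed demonstration set, a separate set of non-preferred demonstrations, and a hypothetical, assumed-known fraction `alpha` of non-preferred data in the mixed set (the paper's method is described as hyperparameter-free, so `alpha` exists only for this toy). All function names are illustrative.

```python
from collections import Counter


def empirical_distribution(pairs):
    """Normalize (state, action) counts into an empirical distribution."""
    counts = Counter(pairs)
    total = sum(counts.values())
    return {sa: c / total for sa, c in counts.items()}


def correction_weights(d_mixed, d_nonpreferred, alpha):
    """Toy ratio w(s, a) = max(d_mixed - alpha * d_nonpreferred, 0) / d_mixed.

    alpha is a hypothetical, assumed-known share of non-preferred data
    in the mixed dataset; it is not part of the source abstract.
    """
    weights = {}
    for sa, p in d_mixed.items():
        residual = max(p - alpha * d_nonpreferred.get(sa, 0.0), 0.0)
        weights[sa] = residual / p
    return weights


def weighted_policy(d_mixed, weights):
    """Tabular weighted behavior cloning: pi(a | s) is proportional to d_mixed(s, a) * w(s, a)."""
    mass = {}
    for (s, a), p in d_mixed.items():
        mass.setdefault(s, {})[a] = p * weights[(s, a)]
    policy = {}
    for s, actions in mass.items():
        z = sum(actions.values())
        if z > 0:  # skip states whose corrected mass is entirely removed
            policy[s] = {a: m / z for a, m in actions.items()}
    return policy


# Toy data: one state, where 40% of the mixed demonstrations are risky.
mixed = [("s0", "safe")] * 6 + [("s0", "risky")] * 4
nonpreferred = [("s0", "risky")] * 5

d_mixed = empirical_distribution(mixed)
d_non = empirical_distribution(nonpreferred)
weights = correction_weights(d_mixed, d_non, alpha=0.4)
policy = weighted_policy(d_mixed, weights)
```

In this toy, the correction removes the probability mass attributed to the non-preferred behavior, so the cloned policy places all of its weight on the remaining action in `s0`.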
Feb-17-2026, 19:41:33 GMT
- Country:
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.04)
- North America > United States
- California > Alameda County
- Berkeley (0.04)
- Illinois > Cook County
- Chicago (0.04)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning > Reinforcement Learning (0.94)
- Natural Language (1.00)
- Representation & Reasoning (1.00)
- Robots (1.00)