SafeDICE: Offline Safe Imitation Learning with Non-Preferred Demonstrations

Jan-20-2025, 01:53:19 GMT–Neural Information Processing Systems

We consider offline safe imitation learning (IL), where the agent aims to learn a safe policy that mimics preferred behavior while avoiding non-preferred behavior, using non-preferred demonstrations and unlabeled demonstrations. This problem setting corresponds to various real-world scenarios where satisfying safety constraints is more important than maximizing the expected return. However, it is very challenging to learn a policy that avoids constraint-violating (i.e., non-preferred) behavior. In this paper, we present a hyperparameter-free offline safe IL algorithm, SafeDICE, that learns a safe policy by leveraging the non-preferred demonstrations in the space of stationary distributions. Our algorithm directly estimates the stationary distribution corrections of the policy that imitates the demonstrations excluding the non-preferred behavior.
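To make the phrase "stationary distribution corrections" concrete, here is a minimal tabular sketch in Python. It is not the SafeDICE algorithm. It assumes, purely for illustration, that the unlabeled data is a mixture of preferred and non-preferred behavior with a known mixing weight `alpha`. That weight, the function names, and the toy numbers are all hypothetical. The sketch subtracts the non-preferred part from the unlabeled state-action distribution, forms the correction ratio w(s, a) between the resulting target distribution and the unlabeled one, and extracts a policy by correction-weighted behavior cloning.

```python
import numpy as np

def correction_weights(d_unlabeled, d_nonpreferred, alpha):
    """Ratio w(s, a) = d_target(s, a) / d_unlabeled(s, a) on a tabular grid.

    Assumption (illustrative only): d_unlabeled is a mixture that contains
    the non-preferred distribution with known weight `alpha`.
    """
    # Remove the non-preferred mass; clip guards against negative residue.
    residual = np.clip(d_unlabeled - alpha * d_nonpreferred, 0.0, None)
    d_target = residual / residual.sum()
    # Leave the ratio at zero wherever the unlabeled data has no support.
    return np.divide(d_target, d_unlabeled,
                     out=np.zeros_like(d_target), where=d_unlabeled > 0)

def weighted_bc_policy(d_unlabeled, w):
    """Tabular weighted behavior cloning: pi(a|s) proportional to w * d_unlabeled."""
    weighted = w * d_unlabeled
    state_mass = weighted.sum(axis=1, keepdims=True)
    # Fall back to a uniform policy in states with no weighted mass.
    uniform = np.full_like(weighted, 1.0 / weighted.shape[1])
    return np.divide(weighted, state_mass, out=uniform, where=state_mass > 0)

# Toy problem: 2 states x 2 actions; non-preferred behavior always picks action 1.
alpha = 0.4
d_nonpreferred = np.array([[0.0, 0.5], [0.0, 0.5]])
d_preferred = np.array([[0.5, 0.0], [0.5, 0.0]])
d_unlabeled = alpha * d_nonpreferred + (1 - alpha) * d_preferred

w = correction_weights(d_unlabeled, d_nonpreferred, alpha)
policy = weighted_bc_policy(d_unlabeled, w)
print(policy)  # action 1 receives zero probability in both states
```

In this toy setting the weights zero out the state-action pairs that only the non-preferred behavior visits, so the cloned policy keeps the preferred action. In a real offline setting the ratio would have to be estimated from samples rather than computed from known distributions.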

non-preferred demonstration, offline safe imitation learning, safedice

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (0.85)
  - Robots (0.70)