SafeDICE: Offline Safe Imitation Learning with Non-Preferred Demonstrations

Dec-27-2025, 03:25:34 GMT–Neural Information Processing Systems

We consider offline safe imitation learning (IL), where the agent aims to learn a safe policy that mimics preferred behavior while avoiding non-preferred behavior, using non-preferred demonstrations and unlabeled demonstrations. This problem setting corresponds to various real-world scenarios where satisfying safety constraints is more important than maximizing the expected return. However, it is very challenging to learn a policy that avoids constraint-violating (i.e.

non-preferred demonstration, offline safe imitation learning, safedice

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (0.47)