SafeDICE: Offline Safe Imitation Learning with Non-Preferred Demonstrations
Neural Information Processing Systems
We consider offline safe imitation learning (IL), where the agent aims to learn a safe policy that mimics preferred behavior while avoiding non-preferred behavior, given non-preferred demonstrations and unlabeled demonstrations. This problem setting corresponds to various real-world scenarios in which satisfying safety constraints is more important than maximizing the expected return. However, it is very challenging to learn a policy that avoids constraint-violating (i.e.
Dec-27-2025, 03:25:34 GMT