Don't Trade Off Safety: Diffusion Regularization for Constrained Offline RL

Jun-14-2026, 06:51:35 GMT–Neural Information Processing Systems

Constrained reinforcement learning (RL) seeks high-performance policies under safety constraints. We focus on the offline setting, where the agent learns only from a fixed dataset, a common requirement in realistic tasks to prevent unsafe exploration. For this setting, we propose Diffusion-Regularized Constrained Offline Reinforcement Learning (DRCORL), which first uses a diffusion model to capture the behavioral policy from the offline data and then extracts a simplified policy to enable efficient inference. We further apply gradient manipulation for safety adaptation, balancing the reward objective against constraint satisfaction.
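
The abstract does not spell out the gradient manipulation rule, so the following is only a minimal sketch of one common projection-style scheme, not necessarily the rule DRCORL uses. It assumes a reward gradient, a cost gradient, an estimated cost, and a cost limit are already available. The function name `manipulate_gradient`, the `margin` parameter, and the three-region switching logic are hypothetical choices made for illustration.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_out(g: Vector, d: Vector) -> Vector:
    """Remove from g its component along d (no-op if d is the zero vector)."""
    dd = dot(d, d)
    if dd == 0.0:
        return list(g)
    scale = dot(g, d) / dd
    return [gi - scale * di for gi, di in zip(g, d)]


def manipulate_gradient(
    g_reward: Vector,
    g_cost: Vector,
    cost_estimate: float,
    cost_limit: float,
    margin: float,
) -> Vector:
    """Return an ascent direction for the policy parameters.

    Assumed (illustrative) rule:
      - clearly safe   (cost <= limit - margin): follow the reward gradient;
      - clearly unsafe (cost >= limit + margin): follow the cost-descent direction;
      - near the limit: average the two, first removing conflicting components.
    """
    g_safe = [-c for c in g_cost]  # direction that decreases the cost
    if cost_estimate <= cost_limit - margin:
        return list(g_reward)
    if cost_estimate >= cost_limit + margin:
        return g_safe
    if dot(g_reward, g_safe) < 0.0:
        # The objectives conflict: project each gradient onto the
        # normal plane of the other before combining them.
        r = project_out(g_reward, g_safe)
        s = project_out(g_safe, g_reward)
    else:
        r, s = list(g_reward), g_safe
    return [0.5 * (a + b) for a, b in zip(r, s)]


if __name__ == "__main__":
    # Near the cost limit with conflicting gradients.
    step = manipulate_gradient([1.0, 0.0], [1.0, 1.0],
                               cost_estimate=1.0, cost_limit=1.0, margin=0.5)
    print(step)
```

In the conflicting case the combined direction has a non-negative inner product with both the reward gradient and the cost-descent direction, so a small step along it does not, to first order, worsen either objective.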

artificial intelligence, machine learning, reinforcement learning

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.51)