Provably Safe Reinforcement Learning with Step-wise Violation Constraints
Neural Information Processing Systems
We name this problem Safe-RL-SW. Our step-wise violation constraint differs from the expected violation constraints used in prior work (Wachi & Sui, 2020; Efroni et al., 2020b; Kalagarla et al., 2021) in two aspects: (i) minimizing the step-wise violation enables the agent to learn an optimal policy that avoids unsafe regions deterministically,
Feb-16-2026, 10:54:35 GMT