Provably Safe Reinforcement Learning with Step-wise Violation Constraints
Neural Information Processing Systems
We name this problem Safe-RL-SW. Our step-wise violation constraint differs from prior expected violation constraints (Wachi & Sui, 2020; Efroni et al., 2020b; Kalagarla et al., 2021) in two aspects: (i) Minimizing the step-wise violation enables the agent to learn an optimal policy that avoids unsafe regions deterministically,
Feb-16-2026, 10:54:39 GMT
- Country:
- Asia > China (0.04)
- North America > United States
- Illinois (0.04)
- Genre:
- Workflow (0.93)
- Technology: