safe action
Provably Safe Reinforcement Learning with Step-wise Violation Constraints
We name this problem Safe-RL-SW. Our step-wise violation constraint differs from the expected violation constraints of prior work (Wachi & Sui, 2020; Efroni et al., 2020b; Kalagarla et al., 2021) in two aspects: (i) minimizing the step-wise violation enables the agent to learn an optimal policy that avoids unsafe regions deterministically, …
- North America > United States > Illinois (0.04)
- Asia > China (0.04)
- North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
- North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
- North America > Canada (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Provably Safe Reinforcement Learning with Step-wise Violation Constraints
We investigate a novel safe reinforcement learning problem with step-wise violation constraints. Our problem differs from existing works in that we focus on stricter step-wise violation constraints and do not assume the existence of safe actions, making our formulation more suitable for safety-critical applications that need to ensure safety in all decision steps but may not always possess safe actions, e.g., robot control and autonomous driving. We propose an efficient algorithm, SUCBVI, which guarantees $\widetilde{\mathcal{O}}(\sqrt{ST})$ or gap-dependent $\widetilde{\mathcal{O}}(S/\mathcal{C}_{\mathrm{gap}} + S^2AH^2)$ step-wise violation and $\widetilde{\mathcal{O}}(\sqrt{H^3SAT})$ regret. Lower bounds are provided that validate the optimality of both the violation and the regret guarantees with respect to the number of states $S$ and the total number of steps $T$.
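To make the distinction between step-wise and episode-level constraints concrete, here is a minimal Python sketch. It assumes a per-step cost that equals 1.0 when the agent occupies an unsafe state and 0.0 otherwise, and it uses realized costs from logged episodes rather than expectations. The helper names `stepwise_violation` and `episode_budget_violation`, the budget `tau`, and the example costs are hypothetical illustrations, not the exact definitions used in the paper. The step-wise metric counts every unsafe step, whereas an episode-level budget of the kind used in expected-violation formulations can be met while unsafe steps still occur.

```python
from typing import Sequence


def stepwise_violation(costs: Sequence[Sequence[float]]) -> float:
    """Hypothetical helper: total cost over every step of every episode.

    Each unsafe step contributes its cost, so nothing can offset it.
    """
    return sum(c for episode in costs for c in episode)


def episode_budget_violation(costs: Sequence[Sequence[float]], tau: float) -> float:
    """Hypothetical helper: per-episode cost in excess of a budget tau.

    Simplification: uses each episode's realized cost in place of the
    expected cumulative cost used by expected-violation formulations.
    """
    return sum(max(sum(episode) - tau, 0.0) for episode in costs)


# Two episodes with horizon H = 4; a cost of 1.0 marks a step spent in an unsafe state.
costs = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

print(stepwise_violation(costs))             # 2.0: both unsafe steps are counted
print(episode_budget_violation(costs, 1.0))  # 0.0: each episode stays within the budget
```

Under this simplification, a learner could satisfy the budget in every episode while still entering unsafe states, which is the behavior a step-wise constraint is meant to rule out.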
Leveraging Analytic Gradients in Provably Safe Reinforcement Learning
Tim Walter, Hannah Markgraf, Jonathan Külz, Matthias Althoff
The deployment of autonomous robots in safety-critical applications requires safety guarantees. Provably safe reinforcement learning is an active field of research that aims to provide such guarantees using safeguards. These safeguards should be integrated during training to reduce the sim-to-real gap. While there are several approaches for safeguarding sampling-based reinforcement learning, analytic gradient-based reinforcement learning often achieves superior performance from fewer environment interactions. However, there is no safeguarding approach for this learning paradigm yet. Our work addresses this gap by developing the first effective safeguard for analytic gradient-based reinforcement learning. We analyse existing, differentiable safeguards, adapt them through modified mappings and gradient formulations, and integrate them into a state-of-the-art learning algorithm and a differentiable simulation. Using numerical experiments on three control tasks, we evaluate how different safeguards affect learning. The results demonstrate safeguarded training without compromising performance.
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
- Europe > Switzerland > Zürich > Zürich (0.04)
- North America > United States (0.04)
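As a rough illustration of why a safeguard's gradient matters when training with analytic gradients, the sketch below compares two one-dimensional mappings onto an assumed safe action interval `[lo, hi]`: a projection (clipping) and a smooth tanh rescaling. The interval, the function names `clip_safeguard` and `tanh_safeguard`, and both mappings are illustrative assumptions, not the safeguards or the modified gradient formulations studied in the paper. Each function returns the safe action together with its derivative with respect to the raw policy output `a`, which is the factor a backpropagated gradient would be multiplied by.

```python
import math


def clip_safeguard(a: float, lo: float, hi: float) -> tuple[float, float]:
    """Hypothetical safeguard: project a onto [lo, hi]; return (safe action, d safe / d a)."""
    if a < lo:
        return lo, 0.0  # clamped below: derivative is zero, so the gradient is blocked
    if a > hi:
        return hi, 0.0  # clamped above: same as above
    return a, 1.0  # already safe: identity mapping passes the gradient unchanged


def tanh_safeguard(a: float, lo: float, hi: float) -> tuple[float, float]:
    """Hypothetical safeguard: squash a into (lo, hi); return (safe action, d safe / d a)."""
    mid = 0.5 * (hi + lo)
    half = 0.5 * (hi - lo)
    t = math.tanh(a)
    # d/da [mid + half * tanh(a)] = half * (1 - tanh(a)^2), which is always positive
    return mid + half * t, half * (1.0 - t * t)


# A raw policy output that lies outside the assumed safe interval [-1, 1].
print(clip_safeguard(3.0, -1.0, 1.0))  # (1.0, 0.0)
print(tanh_safeguard(3.0, -1.0, 1.0))  # safe action inside (-1, 1), small positive derivative
```

The projection leaves already-safe actions untouched but passes no gradient once it clamps, whereas the tanh mapping always passes some gradient at the cost of altering actions that were already safe; a trade-off of this kind is one reason a safeguard's mapping and gradient may need adapting for gradient-based training.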
- North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Health & Medicine (0.68)
- Energy (0.67)
- North America > United States > California (0.28)
- Asia > China > Guangdong Province (0.14)
- Research Report (0.66)
- Instructional Material > Course Syllabus & Notes (0.46)
- Energy > Power Industry (1.00)
- Energy > Oil & Gas > Upstream (0.46)