Provably Safe Reinforcement Learning with Step-wise Violation Constraints

Jun-8-2023–arXiv.org Artificial Intelligence

In this paper, we investigate a novel safe reinforcement learning problem with step-wise violation constraints. Our problem differs from existing works in that we consider stricter step-wise violation constraints and do not assume the existence of safe actions, making our formulation more suitable for safety-critical applications which need to ensure safety in all decision steps and may not always possess safe actions, e.g., robot control and autonomous driving. We propose a novel algorithm SUCBVI, which guarantees $\widetilde{O}(\sqrt{ST})$ step-wise violation and $\widetilde{O}(\sqrt{H^3SAT})$ regret. Lower bounds are provided to validate the optimality in both violation and regret performance with respect to $S$ and $T$. Moreover, we further study a novel safe reward-free exploration problem with step-wise violation constraints. For this problem, we design an $(\varepsilon,\delta)$-PAC algorithm SRF-UCRL, which achieves nearly state-of-the-art sample complexity $\widetilde{O}((\frac{S^2AH^2}{\varepsilon}+\frac{H^4SA}{\varepsilon^2})(\log(\frac{1}{\delta})+S))$, and guarantees $\widetilde{O}(\sqrt{ST})$ violation during the exploration. The experimental results demonstrate the superiority of our algorithms in safety performance, and corroborate our theoretical results.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

Jun-8-2023

arXiv.org PDF

Add feedback

Genre:
- Workflow (0.93)
- Research Report > New Finding (0.34)

Industry:
- Information Technology > Robotics & Automation (0.34)
- Automobiles & Trucks (0.34)
- Transportation > Ground
  - Road (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Robots (1.00)
  - Machine Learning > Reinforcement Learning (0.85)
  - Representation & Reasoning > Constraint-Based Reasoning (0.67)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found