Guiding Safe Exploration with Weakest Preconditions
Greg Anderson, Swarat Chaudhuri, Isil Dillig
–arXiv.org Artificial Intelligence
In reinforcement learning for safety-critical settings, it is often desirable for the agent to obey safety constraints at all points in time, including during training. The paper presents an approach to this safe exploration problem based on symbolic weakest preconditions and evaluates it on a suite of continuous control benchmarks, showing that it achieves performance comparable to existing safe learning techniques while incurring fewer safety violations.

In many real-world applications of reinforcement learning (RL), it is crucial for the agent to behave safely during training. Over the years, a body of safe exploration techniques (García & Fernández, 2015) has emerged to address this challenge. Broadly, these methods aim to converge to high-performance policies while ensuring that every intermediate policy seen during learning satisfies a set of safety constraints. Recent work has developed neural versions of these methods (Achiam et al., 2017; Dalal et al., 2018; Bharadhwaj et al., 2021) that can handle continuous state spaces and complex policy classes. Any method for safe exploration needs a mechanism for deciding whether an action can be safely executed at a given state. Some existing approaches use prior knowledge about the system dynamics (Berkenkamp et al., 2017; Anderson et al., 2020) to make such judgments.
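To make the idea of such a per-state safety check concrete, the sketch below illustrates a minimal action shield under two assumptions that are not from the paper: the dynamics are known and linear (x' = Ax + Bu), and the safe region is a box. Before a proposed action is executed, the shield predicts the next state and substitutes a precomputed safe fallback action whenever the prediction would leave the safe set. The class name `LinearShield`, the box bounds, and the fallback action are all illustrative choices, not the paper's algorithm.

```python
import numpy as np

class LinearShield:
    """Toy action filter: approve an action only if the predicted next state,
    under an assumed known linear model x' = A x + B u, stays inside a box
    [safe_low, safe_high]. Illustrative sketch only, not the paper's method."""

    def __init__(self, A, B, safe_low, safe_high, fallback_action):
        self.A = np.asarray(A, dtype=float)
        self.B = np.asarray(B, dtype=float)
        self.safe_low = np.asarray(safe_low, dtype=float)
        self.safe_high = np.asarray(safe_high, dtype=float)
        self.fallback_action = np.asarray(fallback_action, dtype=float)

    def is_safe(self, state, action):
        # One-step precondition check: the action is acceptable iff the
        # predicted successor state satisfies the safety constraint.
        next_state = self.A @ state + self.B @ action
        return bool(np.all(next_state >= self.safe_low) and
                    np.all(next_state <= self.safe_high))

    def filter(self, state, proposed_action):
        # Pass the agent's action through when the model deems it safe;
        # otherwise substitute the known-safe fallback action.
        if self.is_safe(state, proposed_action):
            return proposed_action
        return self.fallback_action


if __name__ == "__main__":
    # 1-D double integrator: state = (position, velocity), action = acceleration.
    A = [[1.0, 0.1], [0.0, 1.0]]
    B = [[0.0], [0.1]]
    shield = LinearShield(A, B,
                          safe_low=[-1.0, -2.0], safe_high=[1.0, 2.0],
                          fallback_action=[0.0])
    state = np.array([0.8, 1.5])
    print(shield.filter(state, np.array([6.0])))   # unsafe: would push velocity past 2.0
    print(shield.filter(state, np.array([-3.0])))  # safe: action passes through
```

Weakest-precondition-based approaches generalize this one-step check: rather than simulating a single successor state, they compute symbolically the set of state-action pairs from which the safety constraint is guaranteed to hold, which allows a more precise analysis than a fixed fallback rule.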
Feb-27-2023