Guiding Safe Exploration with Weakest Preconditions
Greg Anderson, Swarat Chaudhuri, Isil Dillig
–arXiv.org Artificial Intelligence
In reinforcement learning for safety-critical settings, it is often desirable for the agent to obey safety constraints at all points in time, including during training. The paper presents an approach to this safe exploration problem based on symbolic weakest preconditions and evaluates it on a suite of continuous control benchmarks, showing that it achieves performance comparable to existing safe learning techniques while incurring fewer safety violations.

In many real-world applications of reinforcement learning (RL), it is crucial for the agent to behave safely during training. Over the years, a body of safe exploration techniques (García & Fernández, 2015) has emerged to address this challenge. Broadly, these methods aim to converge to high-performance policies while ensuring that every intermediate policy seen during learning satisfies a set of safety constraints. Recent work has developed neural versions of these methods (Achiam et al., 2017; Dalal et al., 2018; Bharadhwaj et al., 2021) that can handle continuous state spaces and complex policy classes. Any method for safe exploration needs a mechanism for deciding whether an action can be safely executed at a given state. Some existing approaches use prior knowledge about the system dynamics (Berkenkamp et al., 2017; Anderson et al., 2020) to make such judgments.
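To make the idea of such a per-state safety check concrete, the sketch below illustrates a minimal action shield under two assumptions that are not from the paper: the dynamics are known and linear (x' = Ax + Bu), and the safe region is a box. Before a proposed action is executed, the shield predicts the next state and substitutes a precomputed safe fallback action whenever the prediction would leave the safe set. The class name `LinearShield`, the box bounds, and the fallback action are all illustrative choices, not the paper's algorithm.

```python
import numpy as np

class LinearShield:
    """Toy action filter: approve an action only if the predicted next state,
    under an assumed known linear model x' = A x + B u, stays inside a box
    [safe_low, safe_high]. Illustrative sketch only, not the paper's method."""

    def __init__(self, A, B, safe_low, safe_high, fallback_action):
        self.A = np.asarray(A, dtype=float)
        self.B = np.asarray(B, dtype=float)
        self.safe_low = np.asarray(safe_low, dtype=float)
        self.safe_high = np.asarray(safe_high, dtype=float)
        self.fallback_action = np.asarray(fallback_action, dtype=float)

    def is_safe(self, state, action):
        # One-step precondition check: the action is acceptable iff the
        # predicted successor state satisfies the safety constraint.
        next_state = self.A @ state + self.B @ action
        return bool(np.all(next_state >= self.safe_low) and
                    np.all(next_state <= self.safe_high))

    def filter(self, state, proposed_action):
        # Pass the agent's action through when the model deems it safe;
        # otherwise substitute the known-safe fallback action.
        if self.is_safe(state, proposed_action):
            return proposed_action
        return self.fallback_action


if __name__ == "__main__":
    # 1-D double integrator: state = (position, velocity), action = acceleration.
    A = [[1.0, 0.1], [0.0, 1.0]]
    B = [[0.0], [0.1]]
    shield = LinearShield(A, B,
                          safe_low=[-1.0, -2.0], safe_high=[1.0, 2.0],
                          fallback_action=[0.0])
    state = np.array([0.8, 1.5])
    print(shield.filter(state, np.array([6.0])))   # unsafe: would push velocity past 2.0
    print(shield.filter(state, np.array([-3.0])))  # safe: action passes through
```

Weakest-precondition-based approaches generalize this one-step check: rather than simulating a single successor state, they compute symbolically the set of state-action pairs from which the safety constraint is guaranteed to hold, which allows a more precise analysis than a fixed fallback rule.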
Feb-27-2023