Provably Safe PAC-MDP Exploration Using Analogies
Melrose Roderick, Vaishnavh Nagarajan, J. Zico Kolter
A key challenge in applying reinforcement learning to safety-critical domains is understanding how to balance exploration (needed to attain good performance on the task) with safety (needed to avoid catastrophic failure). Although a growing line of work in reinforcement learning has investigated this area of "safe exploration," most existing techniques either 1) do not guarantee safety during the actual exploration process, or 2) limit the problem to a priori known and/or deterministic transition dynamics with strong smoothness assumptions. Addressing this gap, we propose Analogous Safe-state Exploration (ASE), an algorithm for provably safe exploration in MDPs with unknown, stochastic dynamics. Our method exploits analogies between state-action pairs to safely learn a near-optimal policy in a PAC-MDP sense. Additionally, ASE guides exploration towards the most task-relevant states, which empirically yields significant improvements in sample efficiency compared to existing methods. Source code for the experiments is available at https://github.com/locuslab/ase.
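The abstract's core idea, transferring safety knowledge across analogous state-action pairs so that exploration never leaves a region verified to be safe, can be illustrated with a minimal sketch. This is not the authors' ASE algorithm (see the paper and the linked repository for that); all names below, such as `SafeExplorer` and the `analogy` callback, are hypothetical and chosen only for illustration.

```python
# Minimal illustrative sketch (not the authors' ASE implementation): keep a set
# of state-action pairs verified as safe, and use an "analogy" map that relates
# unvisited state-actions to verified ones so their safety estimates can be
# transferred before the agent ever executes them.
import random
from collections import defaultdict

class SafeExplorer:
    def __init__(self, actions, analogy, danger_threshold=0.05, visits_needed=20):
        self.actions = actions                  # available actions
        self.analogy = analogy                  # analogy(s, a) -> analogous (s', a') or None
        self.danger_threshold = danger_threshold
        self.visits_needed = visits_needed
        self.counts = defaultdict(int)          # (s, a) -> times executed
        self.failures = defaultdict(int)        # (s, a) -> unsafe outcomes observed
        self.verified_safe = set()              # (s, a) pairs deemed safe with high confidence

    def empirical_risk(self, s, a):
        n = self.counts[(s, a)]
        return self.failures[(s, a)] / n if n else 1.0  # pessimistic when unknown

    def is_safe(self, s, a):
        # Safe if directly verified, or safe by analogy to a verified pair.
        if (s, a) in self.verified_safe:
            return True
        twin = self.analogy(s, a)
        return twin is not None and twin in self.verified_safe

    def record(self, s, a, unsafe_outcome):
        # Update statistics after executing (s, a); promote the pair to the
        # verified-safe set once enough low-risk evidence has accumulated.
        self.counts[(s, a)] += 1
        if unsafe_outcome:
            self.failures[(s, a)] += 1
        if (self.counts[(s, a)] >= self.visits_needed
                and self.empirical_risk(s, a) <= self.danger_threshold):
            self.verified_safe.add((s, a))

    def choose(self, s):
        # Explore only among actions currently believed safe; fall back to the
        # first action (e.g. a designated safe/return action) otherwise.
        safe_actions = [a for a in self.actions if self.is_safe(s, a)]
        return random.choice(safe_actions) if safe_actions else self.actions[0]
```

The sketch omits everything that makes ASE's guarantees provable (confidence intervals on the unknown stochastic dynamics, PAC-style sample-complexity bounds, and exploration directed toward task-relevant states); it only conveys the structural role that analogies play in extending the safe set.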
arXiv.org Artificial Intelligence
Jul-7-2020