Efficient Exploration for Constrained MDPs

Taleghan, Majid Alkaee (Oregon State University) | Dietterich, Thomas G. (Oregon State University)

AAAI Conferences 

Given a Markov Decision Process (MDP) defined by a simulator, a designated starting state $s_0$, and a downside risk constraint requiring that the probability of reaching catastrophic states not exceed a threshold $\nu$, our goal is to find a stationary deterministic policy $\pi$ that, with probability $1-\delta$, achieves a value $V^\pi(s_0)$ within $\epsilon$ of $V^*(s_0)$, the value of the optimal stationary deterministic $\nu$-feasible policy, while economizing on the number of calls to the simulator. This paper presents the first {\bf PAC-Safe-RL} algorithm for this purpose. The algorithm extends PAC-RL algorithms for efficient exploration while guaranteeing that the downside risk constraint is satisfied. Experiments comparing our {\sc ConstrainedDDV} algorithm to baselines show substantial reductions in the number of simulator calls required to find a feasible policy.
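In symbols, the stated objective can be sketched as the following constrained optimization, where $\mathcal{C}$ denotes the set of catastrophic states and $\Pi_{SD}$ the set of stationary deterministic policies (both pieces of notation are assumptions here, as the abstract does not name them):
$$\max_{\pi \in \Pi_{SD}} V^\pi(s_0) \quad \text{subject to} \quad \Pr\left(\exists\, t : s_t \in \mathcal{C} \;\middle|\; \pi, s_0\right) \le \nu.$$
The PAC-Safe-RL guarantee then reads: with probability at least $1-\delta$, the returned policy $\hat{\pi}$ is $\nu$-feasible and satisfies $V^{\hat{\pi}}(s_0) \ge V^*(s_0) - \epsilon$.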
