High-Confidence Policy Optimization: Reshaping Ambiguity Sets in Robust MDPs

Bahram Behzadian, Reazul Hasan Russel, Marek Petrik

Oct-25-2019, arXiv.org Artificial Intelligence

Robust MDPs are a promising framework for computing robust policies in reinforcement learning. Ambiguity sets, which represent the plausible errors in transition probabilities, determine the trade-off between robustness and average-case performance. The standard practice of defining ambiguity sets using the $L_1$ norm leads, unfortunately, to loose and impractical guarantees. This paper describes new methods for optimizing the shape of ambiguity sets beyond the $L_1$ norm. We derive new high-confidence sampling bounds for weighted $L_1$ and weighted $L_\infty$ ambiguity sets and describe how to compute near-optimal weights from rough value function estimates. Experimental results on a diverse set of benchmarks show that optimized ambiguity sets provide significantly tighter robustness guarantees.
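A robust Bellman update evaluates the worst-case expectation $\min_{p \in \mathcal{P}} p^\top v$ over an ambiguity set $\mathcal{P}$ centered on the empirical transition distribution $\bar{p}$. The Python below is a minimal sketch of that inner problem under stated assumptions. It is not the authors' implementation, and all function names are hypothetical. `l1_radius` uses the classical uniform $L_1$ concentration bound of Weissman et al. (2003), not the tighter weighted bounds the paper derives. `worst_case_l1` applies the well-known greedy solution for a plain $L_1$ ball. `worst_case_weighted_linf` handles a weighted $L_\infty$ set $|p_i - \bar{p}_i| \le w_i \psi$, which reduces to a box-constrained linear program over the simplex. The weights `w` are simply taken as inputs here; computing near-optimal weights from value estimates, as the paper does, is not shown.

```python
import math
from typing import List, Tuple


def l1_radius(n_samples: int, n_states: int, delta: float) -> float:
    """Radius psi with ||p - p_hat||_1 <= psi w.p. >= 1 - delta.

    Assumption: classical bound P(||p_hat - p||_1 >= eps) <= (2^S - 2) exp(-n eps^2 / 2)
    (Weissman et al., 2003); this is NOT the paper's weighted bound.
    """
    if n_states < 2:
        raise ValueError("the bound needs at least two states")
    return math.sqrt(2.0 / n_samples * (math.log(2 ** n_states - 2) - math.log(delta)))


def worst_case_l1(p_bar: List[float], v: List[float], psi: float) -> Tuple[List[float], float]:
    """Solve min_p p.v  s.t.  p in simplex, ||p - p_bar||_1 <= psi (greedy, O(S log S))."""
    p = list(p_bar)
    i_min = min(range(len(v)), key=lambda i: v[i])
    # Shifting mass eps changes the L1 distance by 2 * eps, so at most psi / 2 can move.
    eps = min(psi / 2.0, 1.0 - p[i_min])
    p[i_min] += eps
    # Remove the same mass from the highest-value states first.
    for i in sorted(range(len(v)), key=lambda i: v[i], reverse=True):
        if eps <= 0.0:
            break
        if i == i_min:
            continue
        take = min(eps, p[i])
        p[i] -= take
        eps -= take
    return p, sum(pi * vi for pi, vi in zip(p, v))


def worst_case_weighted_linf(
    p_bar: List[float], v: List[float], w: List[float], psi: float
) -> Tuple[List[float], float]:
    """Solve min_p p.v  s.t.  p in simplex, |p_i - p_bar_i| <= w_i * psi for all i.

    The constraints form a box intersected with the simplex, so a greedy fill
    (fractional-knapsack style) in increasing order of v is optimal.
    """
    lo = [max(0.0, pb - wi * psi) for pb, wi in zip(p_bar, w)]
    hi = [min(1.0, pb + wi * psi) for pb, wi in zip(p_bar, w)]
    p = list(lo)
    # p_bar is feasible, so 0 <= slack <= sum(hi - lo).
    slack = 1.0 - sum(lo)
    for i in sorted(range(len(v)), key=lambda i: v[i]):
        if slack <= 0.0:
            break
        add = min(slack, hi[i] - lo[i])
        p[i] += add
        slack -= add
    return p, sum(pi * vi for pi, vi in zip(p, v))
```

For example, with $\bar{p} = (0.5, 0.5)$ and $v = (0, 1)$, an $L_1$ radius of 0.2 and a uniformly weighted $L_\infty$ radius of 0.1 both yield the worst case $p = (0.6, 0.4)$ with value 0.4. Non-uniform weights let the $L_\infty$ set allow larger deviations on some states than on others, which is the kind of reshaping the abstract describes.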

Keywords: ambiguity, optimization problem, transition probability, ...
