Fast Global Convergence of Policy Optimization for Constrained MDPs

Liu, Tao, Zhou, Ruida, Kalathil, Dileep, Kumar, P. R., Tian, Chao

Oct-31-2021, arXiv.org Artificial Intelligence

We address the issue of safety in reinforcement learning. We pose the problem in a discounted infinite-horizon constrained Markov decision process framework. Existing results show that gradient-based methods achieve an $\mathcal{O}(1/\sqrt{T})$ global convergence rate for both the optimality gap and the constraint violation. We exhibit a natural policy gradient-based algorithm that achieves a faster convergence rate of $\mathcal{O}(\log(T)/T)$ for both the optimality gap and the constraint violation. When Slater's condition is satisfied and known a priori, zero constraint violation can further be guaranteed for sufficiently large $T$ while maintaining the same convergence rate.
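
For orientation, the quantities named in the abstract can be written out in the standard constrained-MDP form. This is a minimal sketch under assumed notation, none of which appears in the abstract itself: $\pi$ is a policy, $\rho$ an initial state distribution, $\gamma \in (0,1)$ the discount factor, $r$ and $g$ the reward and constraint (utility) functions, $b$ the constraint threshold, and $\pi_1, \dots, \pi_T$ the iterates of the algorithm. The paper's own definitions may differ, for example in the number of constraints or in whether the gap is measured on averaged iterates.

$$
\begin{aligned}
&\text{Value functions:} && V_r^{\pi}(\rho) = \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \,\Big|\, s_0 \sim \rho,\ \pi\Big], \quad V_g^{\pi}(\rho) \text{ defined likewise with } g \\
&\text{Problem:} && \max_{\pi}\ V_r^{\pi}(\rho) \quad \text{subject to} \quad V_g^{\pi}(\rho) \ge b \\
&\text{Optimality gap:} && V_r^{\pi^{*}}(\rho) - \frac{1}{T}\sum_{t=1}^{T} V_r^{\pi_t}(\rho) \\
&\text{Constraint violation:} && \Big[\, b - \frac{1}{T}\sum_{t=1}^{T} V_g^{\pi_t}(\rho) \Big]_{+} \\
&\text{Slater's condition:} && \exists\, \bar{\pi} \ \text{and} \ \xi > 0 \ \text{such that} \ V_g^{\bar{\pi}}(\rho) \ge b + \xi
\end{aligned}
$$

Here $\pi^{*}$ denotes an optimal feasible policy and $[x]_{+} = \max(x, 0)$. Under this reading, the rates in the abstract bound how quickly the last two quantities shrink as $T$ grows, and "known a priori" would refer to knowing a valid slack $\xi$ in advance.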

algorithm, constraint violation, convergence rate, (12 more...)

Country:
- North America > United States
  - Texas > Brazos County > College Station (0.04)
- Europe > United Kingdom
  - England > Oxfordshire > Oxford (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (0.81)

Industry:
- Energy > Renewable (0.67)
- Government > Regional Government
  - North America Government > United States Government (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning
    - Optimization (0.93)
    - Constraint-Based Reasoning (0.76)
  - Machine Learning
    - Statistical Learning > Gradient Descent (0.48)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.34)