Essentially Sharp Estimates on the Entropy Regularization Error in Discrete Discounted Markov Decision Processes

Jun-25-2024–arXiv.org Artificial Intelligence

We study the error introduced by entropy regularization in infinite-horizon discrete discounted Markov decision processes. We show that this error decreases exponentially in the inverse regularization strength both in a weighted KL-divergence and in value with a problemspecific exponent. We provide a lower bound matching our upper bound up to a polynomial term. Our proof relies on the correspondence of the solutions of entropy-regularized Markov decision processes with gradient flows of the unregularized reward with respect to a Riemannian metric common in natural policy gradient methods. This correspondence allows us to identify the limit of the gradient flow as the generalized maximum entropy optimal policy, thereby characterizing the implicit bias of this particular gradient flow which corresponds to a time-continuous version of the natural policy gradient method. We use our improved error estimates to show that for entropyregularized natural policy gradient methods the overall error decays exponentially in the square root of the number of iterations improving existing sublinear guarantees.

convergence, gradient flow, markov decision process, (12 more...)

arXiv.org Artificial Intelligence

Jun-25-2024

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - California > San Francisco County > San Francisco (0.14)
- Europe > Germany
  - Saxony > Leipzig (0.04)

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Optimization (0.93)
  - Machine Learning > Learning Graphical Models
    - Undirected Networks > Markov Models (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found