A Large Deviations Perspective on Policy Gradient Algorithms

Wouter Jongeneel, Mengmeng Li, Daniel Kuhn

Dec-14-2023–arXiv.org Machine Learning

Motivated by policy gradient methods in reinforcement learning, we derive the first large deviation rate function for the iterates generated by stochastic gradient descent for possibly non-convex objectives satisfying a Polyak-Łojasiewicz condition. Leveraging the contraction principle from large deviations theory, we illustrate the potential of this result by showing how convergence properties of policy gradient with a softmax parametrization and an entropy-regularized objective can be naturally extended to a wide spectrum of other policy parametrizations.
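
For orientation, the two ingredients named in the abstract can be written out in their standard textbook form. This is only a sketch under assumed notation: the symbols $f$, $f^\star$, $\mu$, $I$, $g$ and $J$ are illustrative and not taken from the paper, whose precise assumptions and statements may differ.

```latex
% Polyak-Lojasiewicz (PL) condition (standard form; notation assumed):
% a differentiable objective f with infimum f^\star satisfies PL with
% constant \mu > 0 if
\[
  \tfrac{1}{2}\,\|\nabla f(x)\|^{2} \;\ge\; \mu\,\bigl(f(x) - f^\star\bigr)
  \qquad \text{for all } x .
\]
% Convexity is not required for this inequality to hold.

% Contraction principle (standard form; notation assumed):
% if (X_n) satisfies a large deviation principle with good rate function I
% and g is continuous, then (g(X_n)) satisfies a large deviation principle
% with rate function
\[
  J(y) \;=\; \inf\bigl\{\, I(x) \;:\; g(x) = y \,\bigr\}.
\]
```

Read this way, a rate function obtained for the iterates under one parametrization can, roughly speaking, be pushed through a continuous map to give a rate function under another.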