Provably Efficient Convergence of Primal-Dual Actor-Critic with Nonlinear Function Approximation

Dong, Jing; Shen, Li; Xu, Yinggan; Wang, Baoxiang

Feb-28-2022–arXiv.org Machine Learning

We study the convergence of the actor-critic algorithm with nonlinear function approximation under a nonconvex-nonconcave primal-dual formulation. Stochastic gradient descent ascent is applied with an adaptive proximal term for robust learning rates. We establish the first efficient convergence result for primal-dual actor-critic, with a convergence rate of $\mathcal{O}\left(\sqrt{\frac{\ln \left(N d G^2 \right)}{N}}\right)$ under Markovian sampling, where $G$ is the element-wise maximum of the gradient, $N$ is the number of iterations, and $d$ is the dimension of the gradient. Our result requires only the Polyak-Łojasiewicz condition on the dual variables, which is easy to verify and applicable to a wide range of reinforcement learning (RL) scenarios. The algorithm and analysis are general enough to be applied to other RL settings, such as multi-agent RL. Empirical results on OpenAI Gym continuous control tasks corroborate our theoretical findings.
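To give a feel for how the stated rate scales, the following minimal sketch evaluates the expression inside the $\mathcal{O}(\cdot)$ for a few iteration counts. The constant hidden by the big-O notation is not given in the abstract and is simply dropped here, so the numbers indicate only the trend in $N$, not an actual error guarantee. The function name `rate_bound` and the values `dim=64` and `grad_max=1.0` are illustrative assumptions, not taken from the paper.

```python
import math


def rate_bound(n_iters: int, dim: int, grad_max: float) -> float:
    """Evaluate sqrt(ln(N * d * G^2) / N) with the big-O constant dropped.

    n_iters  -- N, the number of iterations
    dim      -- d, the dimension of the gradient
    grad_max -- G, the element-wise maximum of the gradient
    """
    if n_iters <= 0:
        raise ValueError("n_iters must be positive")
    arg = n_iters * dim * grad_max ** 2
    if arg <= 1.0:
        # ln(arg) would be non-positive, so the expression is not meaningful.
        raise ValueError("N * d * G^2 must exceed 1")
    return math.sqrt(math.log(arg) / n_iters)


if __name__ == "__main__":
    # Hypothetical problem size, chosen only for illustration.
    for n in (10**3, 10**4, 10**5, 10**6):
        print(f"N = {n:>8d}   bound (no constant) = {rate_bound(n, dim=64, grad_max=1.0):.5f}")
```

Because $d$ and $G$ enter only through the logarithm, increasing either one changes the value slowly, while the $1/\sqrt{N}$ factor dominates as $N$ grows.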

Country:
- Asia > China
  - Hong Kong (0.04)
  - Guangdong Province > Shenzhen (0.04)

Genre:
- Research Report > New Finding (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning
    - Agents (0.89)
    - Uncertainty > Fuzzy Logic (0.61)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Statistical Learning > Gradient Descent (0.54)
