On the Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost

Zhuoran Yang, Yongxin Chen, Mingyi Hong, Zhaoran Wang

arXiv.org Machine Learning 

Compared with the classical policy gradient algorithm (Williams, 1992), actor-critic tracks the action-value function (critic) in policy gradient in an online manner, and alternately updates the policy (actor) and the critic. On the one hand, the online update of the critic significantly reduces the variance of the policy gradient and hence leads to faster convergence. On the other hand, it also introduces algorithmic instability, which is often observed in practice (Islam et al., 2017) and parallels the notoriously unstable training of generative adversarial networks (Pfau and Vinyals, 2016). Such instability of actor-critic originates from several intertwining challenges, including (i) function approximation of the actor and critic, (ii) improper choice of stepsizes, (iii) the noise arising from stochastic approximation, (iv) the asynchrony between the actor and critic, and (v) possibly off-policy data used in the update of the critic. As a result, the convergence of actor-critic remains much less well understood than that of policy gradient, which itself remains open. Consequently, the practical use of actor-critic often lacks theoretical guidance. In this paper, we aim to theoretically understand the algorithmic instability of actor-critic.
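To make the alternating scheme concrete, here is a minimal actor-critic sketch on a scalar LQR instance: a Gaussian linear policy u = -k·x (actor) and a linear action-value estimate with quadratic features (critic), updated online in alternation. All specifics below (the dynamics a, b; cost weights q, r; discount factor; step sizes; the SARSA-style TD critic) are illustrative assumptions for a toy example, not the algorithm or analysis of the paper.

```python
import numpy as np

# Toy actor-critic on a scalar LQR problem, illustrating the alternating
# actor/critic updates described above. Every constant here is a
# hypothetical choice for readability, not taken from the paper.

rng = np.random.default_rng(0)

a, b = 0.9, 1.0            # dynamics: x' = a*x + b*u + process noise
q, r = 1.0, 1.0            # stage cost: c(x, u) = q*x^2 + r*u^2
gamma, sigma = 0.95, 0.3   # discount factor; exploration std of the policy

def features(x, u):
    """Quadratic features for the linear critic Q_w(x, u) = w . phi(x, u)."""
    return np.array([x * x, x * u, u * u])

k = 0.5                    # actor parameter: policy mean is u = -k*x
w = np.zeros(3)            # critic parameters of the action-value estimate

alpha_actor = 1e-3         # actor moves on a slower timescale than the
alpha_critic = 1e-2        # critic, a standard way to tame their asynchrony

x = rng.normal()
u = -k * x + sigma * rng.normal()
for t in range(50_000):
    cost = q * x * x + r * u * u
    x_next = a * x + b * u + 0.1 * rng.normal()
    u_next = -k * x_next + sigma * rng.normal()

    # Critic: online (SARSA-style) semi-gradient TD(0) update that tracks
    # the action-value function of the current policy.
    phi = features(x, u)
    delta = cost + gamma * w @ features(x_next, u_next) - w @ phi  # TD error
    w += alpha_critic * delta * phi

    # Actor: policy-gradient step using the critic's value Q_w(x, u) in
    # place of a Monte Carlo return (subtracting a baseline would further
    # reduce variance). We descend because we minimize cost.
    # grad_k log pi(u | x) = -(u + k*x) * x / sigma^2 for the Gaussian policy.
    grad_log_pi = -(u + k * x) * x / sigma**2
    k -= alpha_actor * (w @ phi) * grad_log_pi

    x, u = x_next, u_next

print(f"learned feedback gain k = {k:.3f}")
# For reference, the discounted-optimal gain for these constants is ~0.52
# (from the scalar discounted Riccati equation).
```

Note that the critic step size is an order of magnitude larger than the actor's; this two-timescale choice, which lets the critic roughly track the current policy's action-value function between actor updates, is a common way to mitigate the asynchrony issue in (iv) above.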
