Global Optimality of Single-Timescale Actor-Critic under Continuous State-Action Space: A Study on Linear Quadratic Regulator
Xuyang Chen, Jingliang Duan, Lin Zhao
arXiv.org Artificial Intelligence
In addition to the policy update, AC methods employ a parallel critic update that bootstraps the Q-value for policy gradient estimation, which often yields reduced variance and faster convergence in training. Despite this empirical success, theoretical analysis of AC in its most practical form remains challenging. Existing works mostly focus on either the double-loop or the two-timescale variants. In double-loop AC, the actor is updated in the outer loop only after the critic has taken sufficiently many inner-loop steps to estimate the Q-value accurately [Yang et al., 2019; Kumar et al., 2019; Wang et al., 2019]. Hence, the convergence of the critic is decoupled from that of the actor, and the analysis separates into a policy evaluation sub-problem in the inner loop and a perturbed gradient descent in the outer loop. In two-timescale AC, the actor and the critic are updated simultaneously in each iteration using stepsizes on different timescales: the actor stepsize (denoted by α_t in the sequel) is typically smaller than the critic stepsize (denoted by β_t in the sequel), with their ratio going to zero as the iteration number goes to infinity (i.e., lim_{t→∞} α_t/β_t = 0). This two-timescale separation allows the critic to track the correct Q-value asymptotically.
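To make the two-timescale update scheme concrete, the following is a minimal Python sketch on a scalar LQR instance. It is an illustration of the generic structure, not the paper's algorithm: the dynamics constants, the Gaussian exploration policy, and the stepsize exponents (α_t ∝ t^{-0.9} for the actor, β_t ∝ t^{-0.6} for the critic, so α_t/β_t → 0) are all assumptions chosen for the example.

```python
import numpy as np

# Scalar LQR: x' = a*x + b*u + noise, stage cost c(x, u) = q*x^2 + r*u^2.
# Linear policy u = -k*x with Gaussian exploration; the critic fits
# V(x) ≈ w * x^2. All constants below are illustrative, not from the paper.
a, b, q, r, gamma = 0.9, 0.5, 1.0, 0.1, 0.95

def lqr_step(x, u, rng):
    cost = q * x**2 + r * u**2
    x_next = a * x + b * u + 0.1 * rng.standard_normal()
    return cost, x_next

rng = np.random.default_rng(0)
k, w = 0.0, 0.0      # actor (gain) and critic (value) parameters
x = rng.standard_normal()
sigma = 0.3          # exploration noise scale of the Gaussian policy

for t in range(1, 50_001):
    # Two-timescale stepsizes: the actor moves on the slower timescale,
    # and alpha_t / beta_t -> 0 as t -> infinity.
    alpha_t = 0.5 / t**0.9   # actor stepsize
    beta_t = 0.5 / t**0.6    # critic stepsize

    u = -k * x + sigma * rng.standard_normal()
    cost, x_next = lqr_step(x, u, rng)

    # Critic: one TD(0) semi-gradient step on V(x) ≈ w * x^2,
    # taken simultaneously with the actor update (no inner loop).
    td_error = cost + gamma * w * x_next**2 - w * x**2
    w += beta_t * td_error * x**2

    # Actor: policy-gradient step using the TD error as the advantage
    # estimate; for the Gaussian policy, d(log pi)/dk = -(u + k*x)*x / sigma^2.
    score = -(u + k * x) * x / sigma**2
    k -= alpha_t * td_error * score  # descend, since we minimize cost

    x = x_next
```

The contrast with double-loop AC is visible in the loop body: here the critic takes a single TD step per actor step, and the accuracy of the Q-value estimate is driven by the stepsize separation rather than by running the critic to convergence in an inner loop.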
May-9-2025