On the Convergence of Single-Timescale Actor-Critic
–Neural Information Processing Systems
We analyze the global convergence of the single-timescale actor-critic (AC) algorithm for the infinite-horizon discounted Markov Decision Processes (MDPs) with finite state spaces. To this end, we introduce an elegant analytical framework for handling complex, coupled recursions inherent in the algorithm. Leveraging this framework, we establish that the algorithm converges to an ϵ-close globally optimal policy with a sample complexity of O(ϵ 3). This significantly improves upon the existing complexity of O(ϵ 2)to achieve ϵ-close stationary policy, which is equivalent to the complexity of O(ϵ 4)to achieve ϵ-close globally optimal policy using gradient domination lemma.
Neural Information Processing Systems
Jun-16-2026, 04:31:28 GMT