A Finite-Time Analysis of Two Time-Scale Actor-Critic Methods

Neural Information Processing Systems 

In this work, we provide a non-asymptotic analysis for two time-scale actor-critic methods under non-i.i.d.