
 markovian


Finite-Time Analysis of Single-Timescale Actor-Critic

Neural Information Processing Systems

Actor-critic methods have achieved significant success in many challenging applications. However, their finite-time convergence is still poorly understood in the most practical single-timescale form. Existing works on analyzing single-timescale actor-critic have been limited to i.i.d.





Markovian with Christian Columbia chr Columbia d

Neural Information Processing Systems

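Output: $\lambda_K \approx \lambda^\star$, $\theta_K \approx \theta^\star$.
for $k = 1, \ldots, K$ do
    Sample $z[k] \sim M(\cdot \mid z[k-1]; \theta_{k-1}, \lambda_{k-1})$
    Compute $s(z[k]; \lambda_{k-1}) = \nabla_\lambda \log q(z[k]; \lambda_{k-1})$
    Compute $\hat{g}_{\mathrm{ML}}(\theta_{k-1}) = \nabla_\theta \log p(z[k], x; \theta_{k-1})$
    Set $\lambda_k = \lambda_{k-1} + \varepsilon_k \, s(z[k]; \lambda_{k-1})$
    Set $\theta_k = \theta_{k-1} + \gamma_k \, \hat{g}_{\mathrm{ML}}(\theta_{k-1})$
end

We compare MSC with SMC-based [22] using [29].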
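The sample-then-step loop of the algorithm above can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not taken from the paper: the toy target `log_joint` (a Gaussian centred at 3), the unit-variance Gaussian family `log_q` with a single parameter `lam`, the conditional importance sampling kernel `cis_kernel` used as the Markov kernel, and the step size 1/k. The model-parameter update (the two lines involving the maximum-likelihood gradient) is omitted to keep the example short.

```python
import math
import random


def log_joint(z):
    # Hypothetical toy target: log p(z, x) up to a constant, Gaussian centred at 3.
    return -0.5 * (z - 3.0) ** 2


def log_q(z, lam):
    # Assumed variational family q(z; lam) = N(lam, 1), up to a constant.
    return -0.5 * (z - lam) ** 2


def cis_kernel(z_prev, lam, rng, num_proposals=10):
    # Conditional importance sampling: keep the previous state as one candidate,
    # draw the rest from q, then resample one candidate by importance weight.
    candidates = [z_prev] + [rng.gauss(lam, 1.0) for _ in range(num_proposals - 1)]
    log_w = [log_joint(z) - log_q(z, lam) for z in candidates]
    max_log_w = max(log_w)
    weights = [math.exp(lw - max_log_w) for lw in log_w]
    return rng.choices(candidates, weights=weights)[0]


def markovian_score_climbing(num_iters=2000, seed=0):
    rng = random.Random(seed)
    lam, z = 0.0, 0.0
    for k in range(1, num_iters + 1):
        z = cis_kernel(z, lam, rng)   # sample z[k] from the Markov kernel
        score = z - lam               # d/d(lam) of log q(z; lam) for N(lam, 1)
        lam += (1.0 / k) * score      # stochastic-approximation step
    return lam


lam_hat = markovian_score_climbing()
print(f"estimated variational mean: {lam_hat:.3f}")
```

In this toy setting the variational mean should drift toward the target mean of 3, since that is where the expected score under the target vanishes.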






Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms

Neural Information Processing Systems

The goal of reinforcement learning (RL) [39] is to maximize the expected total reward by taking actions according to a policy in a stochastic environment, which is modelled as a Markov decision process (MDP) [4]. To obtain an optimal policy, one popular method is the direct maximization of the expected total reward via gradient ascent, which is referred to as the policy gradient (PG) method [40,47].
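As a rough illustration of policy gradient as gradient ascent on expected reward, the sketch below runs a score-function (REINFORCE-style) update on a two-armed bandit, the simplest one-state special case of an MDP. The bandit, the softmax parameterization, the reward noise level, and the names `softmax` and `policy_gradient_bandit` are all assumptions chosen for the example; the paper's actor-critic algorithms are not reproduced here.

```python
import math
import random


def softmax(prefs):
    # Numerically stable softmax over action preferences.
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_bandit(mean_rewards, steps=5000, step_size=0.1, seed=0):
    rng = random.Random(seed)
    prefs = [0.0] * len(mean_rewards)  # policy parameters, one per action
    for _ in range(steps):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = mean_rewards[action] + rng.gauss(0.0, 0.1)
        for i in range(len(prefs)):
            # Gradient of log pi(action) with respect to prefs[i] for a softmax policy.
            grad_log_pi = (1.0 if i == action else 0.0) - probs[i]
            # Gradient ascent step on the sampled estimate of expected reward.
            prefs[i] += step_size * reward * grad_log_pi
    return softmax(prefs)


final_probs = policy_gradient_bandit([0.0, 1.0])
print("final action probabilities:", final_probs)
```

With these assumed settings the policy should concentrate on the second arm, which has the higher mean reward.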