Adaptive Step-Size for Policy Gradient Methods

Pirotta, Matteo, Restelli, Marcello, Bascetta, Luca

Neural Information Processing Systems 

In the last decade, policy gradient methods have significantly grown in popularity in the reinforcement--learning field. In particular, they have been largely employed in motor control and robotic applications, thanks to their ability to cope with continuous state and action domains and partial observable problems. Policy gradient researches have been mainly focused on the identification of effective gradient directions and the proposal of efficient estimation algorithms. Nonetheless, the performance of policy gradient methods is determined not only by the gradient direction, since convergence properties are strongly influenced by the choice of the step size: small values imply slow convergence rate, while large values may lead to oscillations or even divergence of the policy parameters. Step--size value is usually chosen by hand tuning and still little attention has been paid to its automatic selection.