Model-free Reinforcement Learning for Robust Locomotion Using Trajectory Optimization for Exploration

Miroslav Bogdanovic, Majid Khadiv, Ludovic Righetti

arXiv.org Artificial Intelligence 

However, exploration remains a serious challenge in RL, especially for legged locomotion control, mainly due to the sparse rewards in problems with contact as well as the inherent under-actuation and instability of legged robots. Furthermore, to successfully transfer learned control policies to real robots, there is still no consensus among researchers about the choice of the action space [4] and what (and how) to randomize [5] in the training procedure to generate robust policies.

Hence, losing the time dependence from the demonstration trajectories in the final feedback policy is the key in our approach to provide robustness with respect to contact timing uncertainties.

A. Related work

Demonstrations have long been used in dealing with exploration issues in reinforcement learning for robotic tasks [11], [12], [13]. Recently, demonstrations have been used as
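To make the role of time independence concrete, the following sketch (entirely hypothetical: the dynamics, the linear control law, and all names are invented for illustration and are not the paper's method) contrasts a time-indexed replay of a demonstration with a time-independent state-feedback policy when contact timing shifts:

```python
import numpy as np

# A "demonstration": reference states and the actions that produced them.
demo_states = np.linspace(0.0, 1.0, 11)
demo_actions = -2.0 * demo_states          # a toy linear control law

def time_indexed_action(t):
    """Replays the demonstrated action for time step t (open-loop in time)."""
    return demo_actions[min(t, len(demo_actions) - 1)]

def state_feedback_action(state):
    """Time-independent policy: the same law, but applied to the state
    actually observed, so a timing shift does not desynchronize it."""
    return -2.0 * state

# Suppose contact happens 3 steps late: at step t the robot is still at
# the state the demonstration had reached at step t - 3.
t, delay = 5, 3
actual_state = demo_states[t - delay]

a_time = time_indexed_action(t)                # action for the wrong phase
a_state = state_feedback_action(actual_state)  # correct for the current state
```

Under the delayed contact, the time-indexed controller emits the action scheduled for step 5 even though the robot is still at the step-2 state, while the state-conditioned policy stays consistent with the state it observes; this is the intuition behind dropping time dependence for robustness to contact timing uncertainty.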