Recurrent Deterministic Policy Gradient Method for Bipedal Locomotion on Rough Terrain Challenge

Song, Doo Re; Yang, Chuanyu; McGreavy, Christopher; Li, Zhibin

arXiv.org Artificial Intelligence 

This paper presents a deep learning framework capable of solving partially observable locomotion tasks, based on our novel Recurrent Deterministic Policy Gradient (RDPG) method. The RDPG-based learning framework applies three major improvements: asynchronized backup of interpolated temporal difference, initialisation of the hidden state using past trajectory scanning, and injection of external experiences learned by other agents. The framework was implemented to solve the Bipedal-Walker challenge in the OpenAI Gym simulation environment, where only partial state information is available. Our simulation study shows that the autonomous behaviors generated by the RDPG agent are highly adaptive to a variety of obstacles and enable the agent to traverse rugged terrain effectively.
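The abstract does not spell out how the hidden state is initialised from a past trajectory, so the following is only a minimal sketch of one plausible reading: before training on a segment sampled from the middle of an episode, a recurrent cell is rolled over the observations that precede it, and the resulting hidden state replaces a zero initial state. The plain tanh cell, the names `rnn_step` and `scan_hidden_state`, the dimensions, and the scan length are all illustrative assumptions, not the authors' implementation.

```python
import math
import random


def rnn_step(h, obs, W_h, W_x, b):
    """One step of a plain tanh recurrent cell (assumed stand-in for the policy's recurrent layer)."""
    return [
        math.tanh(
            sum(w * hj for w, hj in zip(W_h[i], h))
            + sum(w * xj for w, xj in zip(W_x[i], obs))
            + b[i]
        )
        for i in range(len(h))
    ]


def scan_hidden_state(past_obs, W_h, W_x, b):
    """Start from a zero hidden state and roll the cell over the past observations."""
    h = [0.0] * len(b)
    for obs in past_obs:
        h = rnn_step(h, obs, W_h, W_x, b)
    return h


# Hypothetical sizes and random weights, chosen only for illustration.
random.seed(0)
hidden_dim, obs_dim = 4, 6
W_h = [[random.gauss(0, 0.3) for _ in range(hidden_dim)] for _ in range(hidden_dim)]
W_x = [[random.gauss(0, 0.3) for _ in range(obs_dim)] for _ in range(hidden_dim)]
b = [0.0] * hidden_dim

# A stored episode of 50 observations; the training segment starts at step 30.
episode = [[random.gauss(0, 1) for _ in range(obs_dim)] for _ in range(50)]
start, scan_len = 30, 10

# Scan the 10 steps before the segment to obtain its initial hidden state.
h0 = scan_hidden_state(episode[start - scan_len:start], W_h, W_x, b)
```

In this sketch `h0` would be passed to the recurrent policy as the initial state for the segment beginning at step `start`. With an empty scan window the function falls back to the zero state.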
