Recurrent Deterministic Policy Gradient Method for Bipedal Locomotion on Rough Terrain Challenge
Song, Doo Re, Yang, Chuanyu, McGreavy, Christopher, Li, Zhibin
arXiv.org Artificial Intelligence
This paper presents a deep learning framework for solving partially observable locomotion tasks, based on our novel Recurrent Deterministic Policy Gradient (RDPG) method. The RDPG-based learning framework applies three major improvements: asynchronized backup of interpolated temporal difference, initialisation of the hidden state by scanning the past trajectory, and injection of external experiences learned by other agents. The framework was implemented to solve the Bipedal-Walker challenge in OpenAI's Gym simulation environment, where only partial state information is available. Our simulation study shows that the autonomous behaviors generated by the RDPG agent are highly adaptive to a variety of obstacles and enable the agent to traverse rugged terrain effectively.
May-6-2018
- Genre:
  - Research Report (1.00)
- Industry:
  - Leisure & Entertainment > Games > Computer Games (0.49)
- Technology: