Learning Memory-Dependent Continuous Control from Demonstrations
Hou, Siqing, Han, Dongqi, Tani, Jun
arXiv.org Artificial Intelligence
Efficient exploration is a long-standing challenge in reinforcement learning, especially when rewards are sparse. A developmental system can overcome this difficulty by learning from both demonstrations and self-exploration. However, existing methods are not applicable to most real-world robotic control problems because they assume that environments follow Markov decision processes (MDPs); thus, they do not extend to partially observable environments where historical observations are necessary for decision making. This paper builds on the idea of replaying demonstrations for memory-dependent continuous control by proposing a novel algorithm, Recurrent Actor-Critic with Demonstration and Experience Replay (READER). Experiments on several memory-crucial continuous control tasks show that our method significantly reduces interactions with the environment when given a reasonably small number of demonstration samples. The algorithm also shows better sample efficiency and learning capabilities than a baseline reinforcement learning algorithm for memory-based control from demonstrations.
Feb-18-2021
- Country:
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.04)
- Asia
- South Korea (0.04)
- Middle East > Jordan (0.04)
- Japan
- Kyūshū & Okinawa > Okinawa (0.04)
- Honshū > Kantō
- Tochigi Prefecture > Utsunomiya (0.04)
- Genre:
- Research Report > New Finding (0.46)
- Industry:
- Leisure & Entertainment (0.68)