Improving Learning from Demonstrations by Learning from Experience

Liu, Haofeng, Chen, Yiwen, Tan, Jiayi, Ang, Marcelo H Jr

Nov-15-2021–arXiv.org Artificial Intelligence

How to make imitation learning more general when demonstrations are relatively limited has been a persistent problem in reinforcement learning (RL). Poor demonstrations lead to narrow and biased date distribution, non-Markovian human expert demonstration makes it difficult for the agent to learn, and over-reliance on sub-optimal trajectories can make it hard for the agent to improve its performance. To solve these problems we propose a new algorithm named TD3fG that can smoothly transition from learning from experts to learning from experience. Our algorithm achieves good performance in the MUJOCO environment with limited and sub-optimal demonstrations. We use behavior cloning to train the network as a reference action generator and utilize it in terms of both loss function and exploration noise. This innovation can help agents extract a priori knowledge from demonstrations while reducing the detrimental effects of the poor Markovian properties of the demonstrations. It has a better performance compared to the BC+ fine-tuning and DDPGfD approach, especially when the demonstrations are relatively limited. We call our method TD3fG meaning TD3 from a generator.

ddpgfd, demonstration, experiment, (14 more...)

arXiv.org Artificial Intelligence

Nov-15-2021

arXiv.org PDF

Add feedback

Country:
- Asia > Middle East > Jordan (0.04)

Genre:
- Research Report (0.50)

Industry:
- Leisure & Entertainment > Games (0.46)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)