BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning

Chen, Xinyue; Zhou, Zijian; Wang, Zheng; Wang, Che; Wu, Yanqiu; Deng, Qing; Ross, Keith

Oct-27-2019–arXiv.org Artificial Intelligence

The field of Deep Reinforcement Learning (DRL) has recently seen a surge of research on batch reinforcement learning, which aims at sample-efficient learning from a fixed data set without any further interaction with the environment. In the batch DRL setting, commonly employed off-policy DRL algorithms can perform poorly and sometimes fail to learn altogether. In this paper, we propose a new algorithm, Best-Action Imitation Learning (BAIL), which, unlike many off-policy DRL algorithms, does not involve maximizing Q-functions over the action space. Striving for both simplicity and performance, BAIL first selects from the batch the actions it believes to be high-performing for their corresponding states; it then uses those state-action pairs to train a policy network by imitation learning. Despite its simplicity, we demonstrate that BAIL achieves state-of-the-art performance on the MuJoCo benchmark.
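The abstract describes BAIL's two steps only at a high level. It does not say how "high-performing" actions are identified, although the keyword list mentions an upper envelope. The Python sketch below is a toy illustration under assumptions that are not taken from the paper. It uses 1-D states and actions and scores each action by its Monte Carlo return. As a stand-in upper envelope it uses a least-squares line shifted upward until it lies on or above every return, which is not the authors' fitting method. The imitation step is behavioral cloning of a linear policy. All function names (`discounted_returns`, `upper_envelope`, `select_best_actions`, `behavioral_cloning`) are hypothetical.

```python
"""Toy, hypothetical sketch of a BAIL-style pipeline with 1-D states and actions.

Assumptions (not from the paper): Monte Carlo returns score each action, a
least-squares line shifted upward stands in for the upper envelope, and the
imitation step is behavioral cloning of a linear policy.
"""
import math
from typing import List, Sequence, Tuple


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Monte Carlo return G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit y ~ slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0  # flat line if all xs are equal
    return slope, my - slope * mx


def upper_envelope(states: Sequence[float], returns: Sequence[float]) -> Tuple[float, float]:
    """Stand-in envelope V(s) = slope * s + intercept: the OLS line shifted up
    just enough to lie on or above every observed return (assumed, not BAIL's fit)."""
    slope, intercept = fit_line(states, returns)
    shift = max(g - (slope * s + intercept) for s, g in zip(states, returns))
    return slope, intercept + shift


def select_best_actions(states: Sequence[float], actions: Sequence[float],
                        returns: Sequence[float], keep_fraction: float = 0.25):
    """Keep the fraction of (state, action) pairs whose returns lie closest
    below the envelope, i.e. the actions judged best for their states."""
    slope, intercept = upper_envelope(states, returns)
    gaps = [slope * s + intercept - g for s, g in zip(states, returns)]
    k = max(1, math.ceil(keep_fraction * len(states)))
    best = sorted(range(len(states)), key=lambda i: gaps[i])[:k]
    return [(states[i], actions[i]) for i in sorted(best)]


def behavioral_cloning(pairs) -> Tuple[float, float]:
    """Imitation step: least-squares fit of a linear policy a = w * s + b."""
    return fit_line([s for s, _ in pairs], [a for _, a in pairs])


if __name__ == "__main__":
    # Toy batch: "good" actions a = s earn return s + 1; "bad" actions a = s + 2 earn s.
    states = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
    actions = [0.0, 2.0, 1.0, 3.0, 2.0, 4.0, 3.0, 5.0]
    returns = [1.0, 0.0, 2.0, 1.0, 3.0, 2.0, 4.0, 3.0]
    pairs = select_best_actions(states, actions, returns, keep_fraction=0.5)
    print(behavioral_cloning(pairs))  # → (1.0, 0.0), i.e. the policy a = s
```

On this toy batch the selection keeps exactly the four higher-return pairs, and cloning them recovers the policy a = s. The sketch ranks pairs by their gap to the envelope instead of by a ratio, so it also works when returns are negative. This is a design choice made here and not necessarily the paper's.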

artificial intelligence, reinforcement learning, upper envelope