Sample-Efficient Policy Learning based on Completely Behavior Cloning

Zou, Qiming, Wang, Ling, Lu, Ke, Li, Yu

arXiv.org Artificial Intelligence 

Qiming Zou (a), Ling Wang (a), Ke Lu (b), Yu Li (b)

(a) Department of Computer Science and Technology, Harbin Institute of Technology, China
(b) Department of Management Science and Engineering, Anhui University of Technology, China

Abstract

Direct policy search is one of the most important algorithms in reinforcement learning. However, learning a policy from scratch requires a large amount of experience data and is easily prone to poor local optima. Moreover, a partially trained policy tends to take actions that are dangerous to the agent and the environment. To overcome these challenges, this paper proposes a policy initialization algorithm called Policy Learning based on Completely Behavior Cloning (PLCBC). PLCBC first transforms a Model Predictive Control (MPC) controller into a piecewise affine (PWA) function using multi-parametric programming, and then uses a neural network to express this function. In this way, PLCBC can completely clone the MPC controller without any performance loss, and the cloning step is entirely training-free. Experiments show that this initialization strategy helps the agent start learning in high-reward regions of the state space and converge faster to better solutions.

Keywords: Deep Reinforcement Learning, Model Predictive Control, Sample Efficiency

1. Introduction

Deep reinforcement learning is becoming increasingly popular for tackling challenging sequential decision-making problems, and has been shown to be successful in solving a range of difficult problems, such as games [1, 2], robotic control [3] and locomotion [4, 5]. One particularly appealing prospect is to use deep neural network parametrization to minimize the burden of manual policy engineering [6].
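The abstract's central step is the offline conversion of an MPC controller into an explicit piecewise affine law: multi-parametric programming partitions the state space into polyhedral regions, each with its own affine control law. The following minimal sketch (not the authors' code; the region matrices and gains below are hypothetical placeholders that a real explicit-MPC solver would compute offline) illustrates how such an explicit PWA policy is evaluated; this lookup-plus-affine-law object is what PLCBC then encodes with a neural network.

import numpy as np

class PWAPolicy:
    # regions: list of (A, b, K, k), where the polyhedral region is {x : A x <= b}
    # and the control law inside that region is u = K x + k.
    def __init__(self, regions):
        self.regions = regions

    def __call__(self, x):
        for A, b, K, k in self.regions:
            if np.all(A @ x <= b):      # find a polyhedral region containing x
                return K @ x + k        # apply that region's affine control law
        raise ValueError("state outside the explicit MPC feasible set")

# Toy 1-D example with two regions split at x = 0 (values are illustrative only).
policy = PWAPolicy([
    (np.array([[ 1.0]]), np.array([0.0]), np.array([[-0.5]]), np.array([0.0])),  # x <= 0
    (np.array([[-1.0]]), np.array([0.0]), np.array([[-1.0]]), np.array([0.0])),  # x >= 0
])
print(policy(np.array([-2.0])), policy(np.array([3.0])))

Since a continuous PWA function of this form can be represented exactly by a ReLU network, the region table can in principle be encoded in a network without approximation error, which is what the abstract means by cloning the MPC controller without any performance loss.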
