Sample-Efficient Policy Learning based on Completely Behavior Cloning

Zou, Qiming, Wang, Ling, Lu, Ke, Li, Yu

arXiv.org Artificial Intelligence 

Qiming Zou (a), Ling Wang (a), Ke Lu (b), Yu Li (b)

(a) Department of Computer Science and Technology, Harbin Institute of Technology, China
(b) Department of Management Science and Engineering, Anhui University of Technology, China

Abstract

Direct policy search is one of the most important algorithms in reinforcement learning. However, learning a policy from scratch requires a large amount of experience data and is easily prone to poor local optima. Moreover, a partially trained policy tends to take actions that are dangerous to the agent and the environment. To overcome these challenges, this paper proposes a policy initialization algorithm called Policy Learning based on Completely Behavior Cloning (PLCBC). PLCBC first transforms a Model Predictive Control (MPC) controller into a piecewise affine (PWA) function using multi-parametric programming, and then uses a neural network to express this function. In this way, PLCBC can completely clone the MPC controller without any performance loss, and the cloning step is entirely training-free. Experiments show that this initialization strategy helps the agent start learning in high-reward regions of the state space and converge faster to better solutions.

Keywords: Deep Reinforcement Learning, Model Predictive Control, Sample Efficiency

1. Introduction

Deep reinforcement learning is becoming increasingly popular for tackling challenging sequential decision-making problems, and has been shown to be successful in solving a range of difficult problems, such as games [1, 2], robotic control [3] and locomotion [4, 5]. One particularly appealing prospect is to use deep neural network parametrization to minimize the burden of manual policy engineering [6].
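The abstract's central step is the offline conversion of an MPC controller into an explicit piecewise affine law: multi-parametric programming partitions the state space into polyhedral regions, each with its own affine control law. The following minimal sketch (not the authors' code; the region matrices and gains below are hypothetical placeholders that a real explicit-MPC solver would compute offline) illustrates how such an explicit PWA policy is evaluated; this lookup-plus-affine-law object is what PLCBC then encodes with a neural network.

import numpy as np

class PWAPolicy:
    # regions: list of (A, b, K, k), where the polyhedral region is {x : A x <= b}
    # and the control law inside that region is u = K x + k.
    def __init__(self, regions):
        self.regions = regions

    def __call__(self, x):
        for A, b, K, k in self.regions:
            if np.all(A @ x <= b):      # find a polyhedral region containing x
                return K @ x + k        # apply that region's affine control law
        raise ValueError("state outside the explicit MPC feasible set")

# Toy 1-D example with two regions split at x = 0 (values are illustrative only).
policy = PWAPolicy([
    (np.array([[ 1.0]]), np.array([0.0]), np.array([[-0.5]]), np.array([0.0])),  # x <= 0
    (np.array([[-1.0]]), np.array([0.0]), np.array([[-1.0]]), np.array([0.0])),  # x >= 0
])
print(policy(np.array([-2.0])), policy(np.array([3.0])))

Since a continuous PWA function of this form can be represented exactly by a ReLU network, the region table can in principle be encoded in a network without approximation error, which is what the abstract means by cloning the MPC controller without any performance loss.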
