texplore
TEXPLORE: Real-Time Sample-Efficient Reinforcement Learning for Robots
Hester, Todd (The University of Texas at Austin) | Stone, Peter (The University of Texas at Austin)
Reinforcement Learning (RL) is a paradigm for learning decision-making tasks that could enable robots to learn and adapt to situations on-line. For an RL algorithm to be practical for robotic control tasks, it must learn in very few samples, while continually taking actions in real-time. In addition, the algorithm must learn efficiently in the face of noise, sensor/actuator delays and continuous state features. In this paper, we describe TEXPLORE, a model-based RL method that addresses these issues. It learns a random forest model of the domain which generalizes dynamics to unseen states. The agent targets exploration on states that are both promising for the final policy and uncertain in the model. With sample-based planning and a novel parallel architecture, TEXPLORE can select actions continually in real-time whenever necessary. We empirically evaluate TEXPLORE learning to control the velocity of an autonomous vehicle in real-time.