Continuous Value Iteration (CVI) Reinforcement Learning and Imaginary Experience Replay (IER) for learning multi-goal, continuous action and state space controllers
Gerken, Andreas, Spranger, Michael
–arXiv.org Artificial Intelligence
Continuous V alue Iteration (CVI) Reinforcement Learning and Imaginary Experience Replay (IER) for learning multi-goal, continuous action and state space controllers Andreas Gerken and Michael Spranger Sony Computer Science Laboratories Inc., Tokyo, Japan Abstract -- This paper presents a novel model-free Reinforcement Learning algorithm for learning behavior in continuous action, state, and goal spaces. The algorithm approximates optimal value functions using nonparametric estimators. It is able to efficiently learn to reach multiple arbitrary goals in deterministic and nondeterministic environments. T o improve generalization in the goal space, we propose a novel sample augmentation technique. Using these methods, robots learn faster and overall better controllers. We benchmark the proposed algorithms using simulation and a real-world voltage controlled robot that learns to maneuver in a non-observable Cartesian task space. I NTRODUCTION Learning to control one's body is a crucial skill for any embodied agent. A common way of framing the problem of learning to control an agent is Reinforcement Learning (RL). RL poses the problem in terms of actions that an agent can perform, observed states of the world and some reward function that pays out a treat or punishes the agent depending on its performance. The aim of an optimal RL controller is to maximize the collected rewards. Reinforcement Learning has been studied widely and applied to different domains of learning and control.
arXiv.org Artificial Intelligence
Aug-27-2019