Reinforcement Learning
Neural Dynamic Policies for End-to-End Sensorimotor Learning
The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces such as torque, joint angle, or end-effector position. This forces the agent to make decisions at each point in training, and hence limits scalability to continuous, high-dimensional, and long-horizon tasks. In contrast, research in classical robotics has, for a long time, exploited dynamical systems as a policy representation to learn robot behaviors via demonstrations.