Fast, Robust Adaptive Control by Learning only Forward Models

Neural Information Processing Systems 

A large class of motor control tasks requires that on each cycle the controller is told its current state and must choose an action to achieve a specified, state-dependent, goal behaviour. This paper argues that the optimization of learning rate (the number of experimental control decisions before adequate performance is obtained) and of robustness is of prime importance, if necessary at the expense of computation per control cycle and memory requirement. This is motivated by the observation that a robot which requires two thousand learning steps to achieve adequate performance, or a robot which occasionally gets stuck while learning, will always be undesirable, whereas moderate computational expense can be accommodated by increasingly powerful computer hardware. It is not unreasonable to assume the existence of inexpensive 100 Mflop controllers within a few years, and so even processes with control cycles in the low tens of milliseconds will have millions of machine instructions in which to make their decisions. This paper outlines a learning control scheme which aims to make effective use of such computational power.