Fast Learning with Predictive Forward Models
–Neural Information Processing Systems
A method for transforming performance evaluation signals distal both in space and time into proximal signals usable by supervised learning algorithms, presentedin [Jordan & Jacobs 90], is examined. A simple observation concerning differentiation through models trained with redundant inputs (as one of their networks is) explains a weakness in the original architecture and suggests a modification: an internal world model that encodes action-space exploration and, crucially, cancels input redundancy to the forward model is added. Learning time on an example task, cartpole balancing,is thereby reduced about 50 to 100 times. 1 INTRODUCTION In many learning control problems, the evaluation used to modify (and thus improve) controlmay not be available in terms of the controller's output: instead, it may be in terms of a spatial transformation of the controller's output variables (in which case we shall term it as being "distal in space"), or it may be available only several time steps into the future (termed as being "distal in time"). For example, control of a robot arm may be exerted in terms of joint angles, while evaluation may be in terms of the endpoint cartesian coordinates; furthermore, we may only wish to evaluate the endpoint coordinates reached after a certain period of time: the co- ·Current address: Computation and Neural Systems Program, California Institute of Technology, Pasadena CA. 563 564 Brody ordinatesreached at the end of some motion, for instance. In such cases, supervised learning methods are not directly applicable, and other techniques must be used. Here we study one such technique (proposed for cases where the evaluation is distal in both space and time by [Jordan & Jacobs 90)), analyse a source of its problems, and propose a simple solution for them which leads to fast, efficient learning. We first describe two methods, and then combine them into the "predictive forward modeling" technique with which we are concerned.
Neural Information Processing Systems
Dec-31-1992
- Country:
- Asia > Middle East
- Jordan (0.50)
- North America > United States
- California > Los Angeles County > Pasadena (0.24)
- Asia > Middle East
- Technology: