Reinforcement Learning
Learning to predict by the methods of temporal differences
This article introduces a class of incremental learning procedures specialized for prediction, that is, for using past experience with an incompletely known system to predict its future behavior. Whereas conventional prediction-learning methods assign credit by means of the difference between predicted and actual outcomes, the new methods assign credit by means of the difference between temporally successive predictions. Although such temporal-difference methods have been used in Samuel's checker player, Holland's bucket brigade, and the author's Adaptive Heuristic Critic, they have remained poorly understood. Here we prove their convergence and optimality for special cases and relate them to supervised-learning methods. For most real-world prediction problems, temporal-difference methods require less memory and less peak computation than conventional methods, and they produce more accurate predictions. We argue that most problems to which supervised learning is currently applied are really prediction problems.
Machine Learning 3: 9-44, erratum p. 377
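The contrast in this abstract, between learning from the final outcome and learning from the next prediction, can be illustrated with a minimal sketch. It assumes a hypothetical five-state bounded random walk (outcome 1 when the walk exits on the right, 0 on the left) and uses tabular TD(0), the simplest special case, rather than the general linear TD(λ) procedures of the paper. The function names, step size, and episode count are illustrative assumptions.

```python
import random

# Illustrative sketch (assumed test problem): a bounded random walk with
# nonterminal states 0..n-1, starting in the middle. Exiting on the left gives
# outcome 0; exiting on the right gives outcome 1. The true prediction for
# state i is (i + 1) / (n + 1).


def td0_random_walk(num_states=5, episodes=20000, alpha=0.002, seed=0):
    """Tabular TD(0): move each prediction toward the NEXT prediction."""
    rng = random.Random(seed)
    V = [0.5] * num_states  # initial predictions of the final outcome
    for _ in range(episodes):
        s = num_states // 2
        while True:
            s_next = s + rng.choice((-1, 1))
            if s_next < 0:
                target, done = 0.0, True  # exited left: outcome 0
            elif s_next >= num_states:
                target, done = 1.0, True  # exited right: outcome 1
            else:
                target, done = V[s_next], False  # temporally successive prediction
            V[s] += alpha * (target - V[s])  # temporal-difference error
            if done:
                break
            s = s_next
    return V


def mc_random_walk(num_states=5, episodes=20000, alpha=0.002, seed=0):
    """Outcome-based learning: move each prediction toward the ACTUAL outcome."""
    rng = random.Random(seed)
    V = [0.5] * num_states
    for _ in range(episodes):
        s = num_states // 2
        visited = []  # must be stored until the outcome is known
        while 0 <= s < num_states:
            visited.append(s)
            s += rng.choice((-1, 1))
        outcome = 1.0 if s >= num_states else 0.0
        for v in visited:
            V[v] += alpha * (outcome - V[v])
    return V


if __name__ == "__main__":
    true_values = [(i + 1) / 6 for i in range(5)]
    print("true:", [round(v, 3) for v in true_values])
    print("TD(0):", [round(v, 3) for v in td0_random_walk()])
    print("outcome-based:", [round(v, 3) for v in mc_random_walk()])
```

Both learners should end up near the true exit probabilities (i + 1)/6. The TD version updates each prediction as soon as the next state is observed, so it keeps no per-episode record of visited states, whereas the outcome-based version must hold that list until the episode ends.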
Associative search network: A reinforcement learning associative memory
Barto, A. G. | Sutton, R. S. | Brouwer, P. S.
An associative memory system is presented which does not require a "teacher" to provide the desired associations. For each input key it conducts a search for the output pattern which optimizes an external payoff or reinforcement signal. The associative search network (ASN) combines pattern recognition and function optimization capabilities in a simple and effective way. We define the associative search problem, discuss conditions under which the associative search network is capable of solving it, and present results from computer simulations. The synthesis of sensory-motor control surfaces is discussed as an example of the associative search problem.
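A toy version of the associative search problem described above can be sketched as follows. Assumptions: binary input keys, stochastic logistic output units, and a payoff that reports only the fraction of output bits matching a target pattern the learner never sees (the hypothetical HIDDEN_TARGETS). The weight update is a REINFORCE-style reward-modulated rule with a per-key payoff baseline; it is swapped in for illustration and is not the ASN learning rule from the paper.

```python
import math
import random

# Toy associative search sketch (illustrative assumption): stochastic logistic
# output units trained with a REINFORCE-style rule, NOT the original ASN rule.


def train_associative_search(keys, payoff, n_out, trials=20000, alpha=0.5, seed=0):
    """Search, for each input key, for the output pattern maximizing payoff.

    keys:   list of binary input vectors (tuples of 0/1)
    payoff: payoff(key_index, output_tuple) -> float; the learner sees only this
            scalar reinforcement, never the desired output ("no teacher").
    """
    rng = random.Random(seed)
    n_in = len(keys[0])
    W = [[0.0] * n_in for _ in range(n_out)]  # one weight vector per output unit
    baseline = [0.0] * len(keys)  # running average payoff for each key
    for _ in range(trials):
        k = rng.randrange(len(keys))
        x = keys[k]
        probs, y = [], []
        for row in W:
            s = sum(w * xi for w, xi in zip(row, x))
            p = 1.0 / (1.0 + math.exp(-s))  # firing probability of this unit
            probs.append(p)
            y.append(1 if rng.random() < p else 0)  # random exploration
        r = payoff(k, tuple(y))
        delta = r - baseline[k]  # was this output better than usual for key k?
        for j, row in enumerate(W):
            for i in range(n_in):
                row[i] += alpha * delta * (y[j] - probs[j]) * x[i]
        baseline[k] += 0.1 * (r - baseline[k])
    return W


def greedy_output(W, x):
    """Most probable output pattern for key x under the learned weights."""
    return tuple(1 if sum(w * xi for w, xi in zip(row, x)) > 0 else 0 for row in W)


# Hypothetical usage: one-hot keys and hidden target patterns.
KEYS = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
HIDDEN_TARGETS = [(1, 0, 1, 0), (0, 1, 1, 0), (1, 1, 0, 1)]


def match_payoff(k, y):
    """Fraction of output bits agreeing with the hidden target for key k."""
    return sum(a == b for a, b in zip(y, HIDDEN_TARGETS[k])) / len(y)


W = train_associative_search(KEYS, match_payoff, n_out=4)
for k, x in enumerate(KEYS):
    print(x, "->", greedy_output(W, x))
```

With one-hot keys each key drives its own weights, so the search for one key does not disturb another; overlapping keys would let what is learned for one key generalize to (or interfere with) similar keys.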