Learning to predict by the methods of temporal difference

Sutton, Richard S.

Classics 

This article introduces a class of incremental learning procedures specializedfor prediction that is, for using past experience with an incompletely knownsystem to predict its future behavior. Whereas conventional prediction-learningmethods assign credit by means of the difference between predicted and actual outcomes,tile new methods assign credit by means of the difference between temporallysuccessive predictions. Although such temporal-difference method~ have been used inSamuel's checker player, Holland's bucket brigade, and the author's Adaptive HeuristicCritic, they have remained poorly understood. Here we prove their convergenceand optimality for special cases and relate them to supervised-learning methods. Formost real-world prediction problems, telnporal-differenee methods require less memoryand less peak computation than conventional methods and they produce moreaccurate predictions. We argue that most problems to which supervised learningis currently applied are really prediction problemsMachine Learning 3: 9-44, erratum p. 377

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found