Learning to predict by the methods of temporal differences
This article introduces a class of incremental learning procedures specialized for prediction, that is, for using past experience with an incompletely known system to predict its future behavior. Whereas conventional prediction-learning methods assign credit by means of the difference between predicted and actual outcomes, the new methods assign credit by means of the difference between temporally successive predictions. Although such temporal-difference methods have been used in Samuel's checker player, Holland's bucket brigade, and the author's Adaptive Heuristic Critic, they have remained poorly understood. Here we prove their convergence and optimality for special cases and relate them to supervised-learning methods. For most real-world prediction problems, temporal-difference methods require less memory and less peak computation than conventional methods and they produce more accurate predictions. We argue that most problems to which supervised learning is currently applied are really prediction problems.
Machine Learning 3: 9-44, erratum p. 377
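The contrast drawn in the abstract between the two ways of assigning credit can be illustrated with a minimal sketch. It is not taken from the article: the table-lookup representation, the function names `supervised_update` and `td_update`, the step size `alpha`, and the three-state example are all assumptions chosen for illustration. The first function moves every prediction toward the actual final outcome; the second moves each prediction toward the prediction that follows it in time, and only the last prediction toward the outcome.

```python
def supervised_update(values, states, outcome, alpha=0.1):
    """Conventional credit assignment (illustrative): each visited state's
    prediction is moved toward the actual final outcome."""
    new_values = dict(values)
    for s in states:
        new_values[s] += alpha * (outcome - values[s])
    return new_values


def td_update(values, states, outcome, alpha=0.1):
    """Temporal-difference credit assignment (illustrative): each prediction
    is moved toward the temporally next prediction; the final prediction
    is moved toward the actual outcome."""
    new_values = dict(values)
    for t, s in enumerate(states):
        if t + 1 < len(states):
            target = values[states[t + 1]]  # next prediction in the sequence
        else:
            target = outcome                # end of sequence: actual outcome
        new_values[s] += alpha * (target - values[s])
    return new_values


# Hypothetical example: three states visited in order, final outcome 1.0.
initial = {"A": 0.5, "B": 0.5, "C": 0.5}
sequence = ["A", "B", "C"]

after_supervised = supervised_update(initial, sequence, outcome=1.0)
after_td = td_update(initial, sequence, outcome=1.0)
```

In this sketch the supervised update needs the final outcome before it can change any prediction, whereas the temporal-difference update for a state needs only the next prediction. On the first pass the temporal-difference update changes only the prediction for `C`, because the earlier predictions already agree with their successors; information about the outcome reaches `A` and `B` on later passes.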
Feb-1-1988
- Country:
- North America > United States
- California > Orange County
- Irvine (0.14)
- Massachusetts > Middlesex County (0.14)
- Genre:
- Workflow (0.46)
- Industry:
- Leisure & Entertainment > Games (1.00)