Generalization in Reinforcement Learning: Safely Approximating the Value Function
Boyan, Justin A., Moore, Andrew W.
Neural Information Processing Systems
Reinforcement learning, the problem of getting an agent to learn to act from sparse, delayed rewards, has been advanced by techniques based on dynamic programming (DP). These algorithms compute a value function which gives, for each state, the minimum possible long-term cost commencing in that state. For the high-dimensional and continuous state spaces characteristic of real-world control tasks, a discrete representation of the value function is intractable; some form of generalization is required. A natural way to incorporate generalization into DP is to use a function approximator, rather than a lookup table, to represent the value function. This approach, which dates back to uses of Legendre polynomials in DP [Bellman et al., 1963], has recently worked well on several dynamic control problems [Mahadevan and Connell, 1990, Lin, 1993] and succeeded spectacularly on the game of backgammon [Tesauro, 1992, Boyan, 1992].
Dec-31-1995
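To illustrate the general idea the abstract describes, here is a minimal sketch of value iteration with a function approximator standing in for a lookup table. The 1-D control task, dynamics, cost, and polynomial feature map below are hypothetical stand-ins, not taken from the paper; they simply show Bellman backups on sample states followed by a refit of the approximator.

```python
# Minimal sketch of fitted value iteration with a function approximator.
# The task, dynamics, cost, and feature map are hypothetical, chosen only
# to illustrate the technique described in the abstract.
import numpy as np

# Hypothetical 1-D control task: state x in [0, 1], actions move left/right,
# immediate cost is distance from the goal at x = 1, discount gamma < 1.
GAMMA = 0.9
ACTIONS = [-0.1, 0.1]

def step(x, a):
    """Deterministic dynamics: move by a, clipped to [0, 1]."""
    return float(np.clip(x + a, 0.0, 1.0))

def cost(x, a):
    """Immediate cost: distance from the goal state x = 1."""
    return 1.0 - x

def features(x):
    """Polynomial feature map standing in for e.g. Legendre polynomials."""
    return np.array([x**k for k in range(4)])

def fit(samples, targets):
    """Least-squares fit of the value function to backed-up targets."""
    Phi = np.array([features(x) for x in samples])
    w, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return w

def value(w, x):
    """Approximate value of state x under weights w."""
    return float(features(x) @ w)

# Fitted value iteration: apply the Bellman backup (min over actions of
# immediate cost plus discounted successor value) at sample states, then
# refit the approximator to the new targets.
samples = np.linspace(0.0, 1.0, 21)
w = np.zeros(4)
for _ in range(50):
    targets = np.array([
        min(cost(x, a) + GAMMA * value(w, step(x, a)) for a in ACTIONS)
        for x in samples
    ])
    w = fit(samples, targets)

print([round(value(w, x), 2) for x in (0.0, 0.5, 1.0)])
```

In this sketch the approximator generalizes the backed-up values across the continuous state space from a small set of sample states, which is exactly what a discrete lookup table cannot do in high dimensions.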