Generalization in Reinforcement Learning: Safely Approximating the Value Function

Boyan, Justin A., Moore, Andrew W.

Neural Information Processing Systems 

Reinforcement learning, the problem of getting an agent to learn to act from sparse, delayed rewards, has been advanced by techniques based on dynamic programming (DP). These algorithms compute a value function which gives, for each state, the minimum possible long-term cost commencing in that state. For the high-dimensional and continuous state spaces characteristic of real-world control tasks, a discrete representation of the value function is intractable; some form of generalization is required. A natural way to incorporate generalization into DP is to use a function approximator, rather than a lookup table, to represent the value function. This approach, which dates back to uses of Legendre polynomials in DP [Bellman et al., 1963], has recently worked well on several dynamic control problems [Mahadevan and Connell, 1990; Lin, 1993] and succeeded spectacularly on the game of backgammon [Tesauro, 1992; Boyan, 1992].
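To make the idea concrete, the following is a minimal sketch of fitted value iteration on a one-dimensional continuous state space, where a least-squares polynomial fit stands in for the lookup table. The toy dynamics, cost function, action set, and choice of polynomial regressor are illustrative assumptions, not details from the paper.

```python
# Sketch of DP with a function approximator in place of a lookup table.
# The dynamics, cost, and regressor below are hypothetical examples.
import numpy as np

GAMMA = 0.95                           # discount factor (assumed)
ACTIONS = [-0.1, 0.0, 0.1]             # small set of admissible controls (assumed)
STATES = np.linspace(-1.0, 1.0, 50)    # sampled states used for fitting

def step(s, a):
    """Toy deterministic dynamics: drift by the action, clipped to [-1, 1]."""
    return np.clip(s + a, -1.0, 1.0)

def cost(s, a):
    """Immediate cost: quadratic penalty on state plus a small control penalty."""
    return s ** 2 + 0.01 * a ** 2

def fit_value(states, targets, degree=5):
    """Fit a polynomial approximation of the value function (the 'generalizer')."""
    coeffs = np.polyfit(states, targets, degree)
    return lambda s: np.polyval(coeffs, s)

# Start from the zero value function and apply fitted Bellman backups.
V = lambda s: 0.0
for sweep in range(50):
    # Bellman backup at each sampled state: minimum over actions of the
    # immediate cost plus the discounted value of the successor state.
    targets = np.array([
        min(cost(s, a) + GAMMA * V(step(s, a)) for a in ACTIONS)
        for s in STATES
    ])
    V = fit_value(STATES, targets)

print("Approximate value near the goal state 0:", float(V(0.0)))
```

The key design choice the sketch illustrates is that each DP sweep produces backed-up target values only at sampled states, and the approximator then generalizes those targets across the rest of the continuous state space.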
