Generalization in Reinforcement Learning: Safely Approximating the Value Function
Boyan, Justin A., Moore, Andrew W.
Neural Information Processing Systems
Reinforcement learning, the problem of getting an agent to learn to act from sparse, delayed rewards, has been advanced by techniques based on dynamic programming (DP). These algorithms compute a value function which gives, for each state, the minimum possible long-term cost commencing in that state. For the high-dimensional and continuous state spaces characteristic of real-world control tasks, a discrete representation of the value function is intractable; some form of generalization is required. A natural way to incorporate generalization into DP is to use a function approximator, rather than a lookup table, to represent the value function. This approach, which dates back to uses of Legendre polynomials in DP [Bellman et al., 1963], has recently worked well on several dynamic control problems [Mahadevan and Connell, 1990, Lin, 1993] and succeeded spectacularly on the game of backgammon [Tesauro, 1992, Boyan, 1992].
Dec-31-1995
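To illustrate the general idea the abstract describes, here is a minimal sketch of value iteration with a function approximator standing in for a lookup table. The 1-D control task, dynamics, cost, and polynomial feature map below are hypothetical stand-ins, not taken from the paper; they simply show Bellman backups on sample states followed by a refit of the approximator.

```python
# Minimal sketch of fitted value iteration with a function approximator.
# The task, dynamics, cost, and feature map are hypothetical, chosen only
# to illustrate the technique described in the abstract.
import numpy as np

# Hypothetical 1-D control task: state x in [0, 1], actions move left/right,
# immediate cost is distance from the goal at x = 1, discount gamma < 1.
GAMMA = 0.9
ACTIONS = [-0.1, 0.1]

def step(x, a):
    """Deterministic dynamics: move by a, clipped to [0, 1]."""
    return float(np.clip(x + a, 0.0, 1.0))

def cost(x, a):
    """Immediate cost: distance from the goal state x = 1."""
    return 1.0 - x

def features(x):
    """Polynomial feature map standing in for e.g. Legendre polynomials."""
    return np.array([x**k for k in range(4)])

def fit(samples, targets):
    """Least-squares fit of the value function to backed-up targets."""
    Phi = np.array([features(x) for x in samples])
    w, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return w

def value(w, x):
    """Approximate value of state x under weights w."""
    return float(features(x) @ w)

# Fitted value iteration: apply the Bellman backup (min over actions of
# immediate cost plus discounted successor value) at sample states, then
# refit the approximator to the new targets.
samples = np.linspace(0.0, 1.0, 21)
w = np.zeros(4)
for _ in range(50):
    targets = np.array([
        min(cost(x, a) + GAMMA * value(w, step(x, a)) for a in ACTIONS)
        for x in samples
    ])
    w = fit(samples, targets)

print([round(value(w, x), 2) for x in (0.0, 0.5, 1.0)])
```

In this sketch the approximator generalizes the backed-up values across the continuous state space from a small set of sample states, which is exactly what a discrete lookup table cannot do in high dimensions.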