Reinforcement Learning with Function Approximation Converges to a Region
Many algorithms for approximate reinforcement learning are not known to converge. In fact, there are counterexamples showing that the adjustable weights in some algorithms may oscillate within a region rather than converging to a point. This paper shows that, for two popular algorithms, such oscillation is the worst that can happen: the weights cannot diverge, but instead must converge to a bounded region. The algorithms are SARSA(0) and V(0); the latter algorithm was used in the well-known TD-Gammon program.

1 Introduction

Although there are convergent online algorithms (such as TD(λ) [1]) for learning the parameters of a linear approximation to the value function of a Markov process, no way is known to extend these convergence proofs to the task of online approximation of either the state-value (V*) or the action-value (Q*) function of a general Markov decision process. In fact, there are known counterexamples to many proposed algorithms. For example, fitted value iteration can diverge even for Markov processes [2]; Q-learning with linear function approximators can diverge, even when the states are updated according to a fixed update policy [3]; and SARSA(0) can oscillate between multiple policies with different value functions [4].
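The "adjustable weights" discussed above are those of a linear action-value approximator updated by SARSA(0). The following is a minimal Python sketch of that kind of update, not taken from the paper: the environment interface (reset() and step(a) returning next state, reward, and a done flag), the feature map phi(s, a), and the step-size and exploration constants are illustrative assumptions chosen only to show the form of the weight update whose iterates the paper proves stay in a bounded region.

import numpy as np

def sarsa0_linear(env, phi, n_actions, n_features,
                  episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Sketch of SARSA(0) with a linear approximator Q(s, a) ~ w . phi(s, a)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(n_features)              # the adjustable weights

    def q(s, a):
        return w @ phi(s, a)              # linear approximation of the action value

    def policy(s):
        # epsilon-greedy with respect to the current weights
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax([q(s, a) for a in range(n_actions)]))

    for _ in range(episodes):
        s = env.reset()
        a = policy(s)
        done = False
        while not done:
            s_next, r, done = env.step(a)
            if done:
                target = r
            else:
                # SARSA(0) bootstraps on the action actually chosen next
                # (on-policy), unlike Q-learning's max over actions.
                a_next = policy(s_next)
                target = r + gamma * q(s_next, a_next)
            w += alpha * (target - q(s, a)) * phi(s, a)
            if not done:
                s, a = s_next, a_next
    return w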
Neural Information Processing Systems
Dec-31-2001