Reinforcement Learning with Function Approximation Converges to a Region

Apr-6-2023, 16:54:04 GMT–Neural Information Processing Systems

Many algorithms for approximate reinforcement learning are not known to converge. In fact, there are counterexamples showing that the adjustable weights in some algorithms may oscillate within a region rather than converging to a point. This paper shows that, for two popular algorithms, such oscillation is the worst that can happen: the weights cannot diverge, but instead must converge to a bounded region. The algorithms are SARSA(0) and V(0); the latter was used in the well-known TD-Gammon program.
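To make the object of study concrete, here is a minimal sketch of SARSA(0) with a linear action-value approximator. It assumes a small finite MDP described by hypothetical arrays `P` (transition probabilities), `R` (expected rewards), and `phi` (state-action features), epsilon-greedy exploration, and a constant step size. None of these choices are taken from the paper. The sketch records the largest weight norm seen during training, which is the quantity the boundedness result is about; it does not reproduce the paper's assumptions or proof.

```python
import numpy as np


def sarsa0_linear(P, R, phi, gamma=0.9, alpha=0.01, epsilon=0.1,
                  n_steps=20_000, seed=0):
    """SARSA(0) with a linear approximator Q(s, a) = phi[s, a] @ w.

    Assumed (hypothetical) inputs:
      P   -- transition probabilities, shape (S, A, S)
      R   -- expected immediate rewards, shape (S, A)
      phi -- feature vectors, shape (S, A, d)
    Returns the final weights and the largest weight norm seen.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions, d = phi.shape
    w = np.zeros(d)

    def act(s):
        # Epsilon-greedy action selection w.r.t. the current approximate Q.
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax(phi[s] @ w))

    s = int(rng.integers(n_states))
    a = act(s)
    max_norm = 0.0
    for _ in range(n_steps):
        s_next = int(rng.choice(n_states, p=P[s, a]))
        a_next = act(s_next)
        # On-policy TD error: bootstrap from the action actually chosen next.
        td_error = R[s, a] + gamma * (phi[s_next, a_next] @ w) - phi[s, a] @ w
        # Semi-gradient update of the linear weights.
        w = w + alpha * td_error * phi[s, a]
        max_norm = max(max_norm, float(np.linalg.norm(w)))
        s, a = s_next, a_next
    return w, max_norm


if __name__ == "__main__":
    # A small random MDP, purely illustrative.
    rng = np.random.default_rng(1)
    S, A, d = 5, 2, 3
    P = rng.random((S, A, S))
    P /= P.sum(axis=2, keepdims=True)  # normalize each row to a distribution
    R = rng.random((S, A))
    phi = rng.random((S, A, d))
    w, max_norm = sarsa0_linear(P, R, phi)
    print("final weights:", w, "max norm:", max_norm)
```

V(0) follows the same pattern but learns state values `V(s) = phi[s] @ w` and selects actions by looking ahead through the model, as TD-Gammon did; the sketch above covers only the SARSA(0) case.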

algorithm, function approximation converge, reinforcement learning
