Demystifying Approximate Value-based RL with $\epsilon$-greedy Exploration: A Differential Inclusion View

Aditya Gopalan, Gugan Thoppe

arXiv.org Artificial Intelligence 

Q-learning and SARSA with $\epsilon$-greedy exploration are leading reinforcement learning methods. Their tabular forms converge to the optimal Q-function under reasonable conditions. However, with function approximation, these methods exhibit strange behaviors such as policy oscillation, chattering, and convergence to different attractors (possibly even the worst policy) on different runs, apart from the usual instability. A theory to explain these phenomena has been a long-standing open problem, even for basic linear function approximation (Sutton, 1999). Our work uses differential inclusion to provide the first framework for resolving this problem. We also provide numerical examples to illustrate our framework's prowess in explaining these algorithms' behaviors.
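
To make the setting concrete, below is a minimal sketch (not taken from the paper) of one of the algorithms the abstract discusses: semi-gradient Q-learning with linear function approximation and $\epsilon$-greedy exploration, run on a small randomly generated MDP. All specifics here (the MDP, the feature map `Phi`, the step size, the horizon) are illustrative assumptions, not the authors' experimental setup; the SARSA variant would use the action actually selected by the $\epsilon$-greedy policy in the target instead of the max.

```python
# Minimal sketch: semi-gradient Q-learning with linear function approximation
# and epsilon-greedy exploration on a small, hypothetical MDP.
# The MDP, features, and hyperparameters are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, n_features = 5, 2, 3
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # transition kernel P[s, a, s']
R = rng.normal(size=(n_states, n_actions))                        # reward R[s, a]
Phi = rng.normal(size=(n_states, n_actions, n_features))          # feature map phi(s, a)
gamma, alpha, epsilon = 0.9, 0.05, 0.1

def q_values(w, s):
    """Approximate Q(s, .) = phi(s, .)^T w for all actions."""
    return Phi[s] @ w

def epsilon_greedy(w, s):
    """Pick a uniformly random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(q_values(w, s)))

w = np.zeros(n_features)
s = 0
for _ in range(20000):
    a = epsilon_greedy(w, s)
    s_next = rng.choice(n_states, p=P[s, a])
    # Q-learning bootstraps with the max over next actions; SARSA would instead
    # bootstrap with the epsilon-greedy action chosen at s_next.
    target = R[s, a] + gamma * np.max(q_values(w, s_next))
    td_error = target - Phi[s, a] @ w
    w += alpha * td_error * Phi[s, a]   # semi-gradient update of the weights
    s = s_next

print("learned weights:", w)
print("greedy policy:", [int(np.argmax(q_values(w, s))) for s in range(n_states)])
```

Because the greedy policy changes discontinuously with the weights, runs of such updates can oscillate or settle into different attractors depending on the random seed, which is the kind of behavior the paper's differential-inclusion framework is meant to explain.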
