A Unified Switching System Perspective and Convergence Analysis of Q-Learning Algorithms
Neural Information Processing Systems
However, the application of the ODE approach to Q-learning has been limited by the max-operator, which makes the associated ODE model a complex nonlinear system. In contrast, the ODE associated with TD learning for policy evaluation is a linear system, whose asymptotic stability is in general much easier to analyze.
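A minimal sketch can make this contrast concrete. It rests on assumptions of our own: a hypothetical 2-state, 2-action MDP with made-up rewards R, transition probabilities P, discount GAMMA and positive state-action weights D, none of which come from the paper. It uses the commonly cited ODE model of Q-learning, dQ/dt = D(R + γ P Π_{π_Q} Q − Q), where π_Q is the greedy policy of Q.

Holding the policy fixed gives an affine vector field, which is the policy-evaluation (TD) case. Letting the greedy policy change with Q turns the same field into a switching system whose active mode is selected by the max-operator. The helper names (affine_field, q_learning_field, simulate, value_iteration) are hypothetical, and forward-Euler integration is a convenience choice rather than anything taken from the paper.

```python
# Minimal sketch: Q-learning ODE as a switching system vs. a fixed-policy (TD) affine ODE.
# ASSUMPTION: the MDP below (P, R, D, GAMMA) is hypothetical and purely illustrative.
GAMMA = 0.9
STATES, ACTIONS = (0, 1), (0, 1)
PAIRS = [(s, a) for s in STATES for a in ACTIONS]

# P[(s, a)][s2]: probability of moving to s2; R[(s, a)]: expected reward.
P = {(0, 0): (0.9, 0.1), (0, 1): (0.2, 0.8),
     (1, 0): (0.7, 0.3), (1, 1): (0.05, 0.95)}
R = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.5, (1, 1): 2.0}
# Diagonal of D: positive state-action sampling weights d(s, a) (hypothetical).
D = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.25, (1, 1): 0.25}


def greedy_policy(q):
    """Switching signal: the greedy action in each state (the max-operator)."""
    return tuple(max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES)


def affine_field(q, policy):
    """D (R + GAMMA * P Pi_policy q - q) for a FIXED policy.

    With the policy held fixed (policy evaluation), this is affine in q.
    """
    out = {}
    for s, a in PAIRS:
        next_val = sum(P[(s, a)][s2] * q[(s2, policy[s2])] for s2 in STATES)
        out[(s, a)] = D[(s, a)] * (R[(s, a)] + GAMMA * next_val - q[(s, a)])
    return out


def q_learning_field(q):
    """Q-learning ODE: the same affine field, but the mode is the greedy policy of q."""
    return affine_field(q, greedy_policy(q))


def simulate(field, q0, dt=0.05, steps=20000):
    """Forward-Euler integration of dq/dt = field(q) (synchronous update)."""
    q = dict(q0)
    for _ in range(steps):
        dq = field(q)
        q = {k: q[k] + dt * dq[k] for k in q}
    return q


def value_iteration(iters=2000):
    """Reference Q* obtained by iterating the Bellman optimality operator."""
    q = {k: 0.0 for k in PAIRS}
    for _ in range(iters):
        q = {(s, a): R[(s, a)] + GAMMA * sum(
                 P[(s, a)][s2] * max(q[(s2, b)] for b in ACTIONS) for s2 in STATES)
             for s, a in PAIRS}
    return q


if __name__ == "__main__":
    zeros = {k: 0.0 for k in PAIRS}
    q_star = value_iteration()
    q_ode = simulate(q_learning_field, zeros)                   # switching (nonlinear) ODE
    q_td = simulate(lambda q: affine_field(q, (0, 0)), zeros)   # fixed-policy affine ODE
    print("max |Q_ode - Q*|:", max(abs(q_ode[k] - q_star[k]) for k in PAIRS))
    print("greedy mode at the end:", greedy_policy(q_ode))
    print("Q^pi for pi = (0, 0):", q_td)
```

In this toy setting, the Euler trajectory of the switching ODE should settle at the fixed point found by value iteration. Once the trajectory is close to that point, the active mode (the greedy policy) stops changing. This is an illustration only, not a substitute for the stability analysis the paper develops.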
Aug-15-2025, 21:30:11 GMT
- Country:
- Asia > Middle East
- Jordan (0.04)
- Europe
- Switzerland > Zürich
- Zürich (0.04)
- United Kingdom > England
- Oxfordshire > Oxford (0.04)
- North America
- Canada (0.04)
- United States > Massachusetts
- Middlesex County > Belmont (0.04)
- Genre:
- Overview (0.46)
- Technology: