Review for NeurIPS paper: A Unified Switching System Perspective and Convergence Analysis of Q-Learning Algorithms
Neural Information Processing Systems
Summary and Contributions: The paper aims to prove asymptotic convergence of asynchronous Q-learning and several of its variants within a single, simplified framework. The analysis relies on the commonly adopted ODE method and the results of Borkar and Meyn but, crucially, models the associated ODEs as switched linear systems with state-dependent switching policies. This lets the authors unify the treatment of different instances of Q-learning and, in particular, establish convergence of some interesting variants: "averaging Q-learning", which essentially amounts to target tracking with a Polyak average, and Q-learning with a linear function approximator. The established theory of switching systems is a key tool in the paper, and its use in the analysis of RL algorithms may be of interest in its own right. The novelty of the paper stems principally from the use of switching systems as an analysis tool; the authors also provide a condition for global asymptotic convergence of Q-learning with linear function approximators that appears to be weaker than those in previously published work.
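To make the object of study concrete, here is a minimal sketch of tabular asynchronous Q-learning in which the greedy policy is tracked explicitly, since it is the greedy policy that plays the role of the state-dependent switching signal in the switched-system view. Everything specific here is an assumption for illustration and is not taken from the paper: the two-state toy MDP in `step`, the uniform i.i.d. sampling of state-action pairs, the discount factor, and the constant step size (chosen only because the toy MDP is deterministic; the ODE analysis discussed above concerns diminishing step sizes).

```python
import random

GAMMA = 0.5   # discount factor (assumed for this sketch)
ALPHA = 0.5   # constant step size; adequate only because the toy MDP is deterministic
N_STATES, N_ACTIONS = 2, 2


def step(state, action):
    """Hypothetical toy MDP: action a moves to state a, and action 1 pays reward 1."""
    return action, float(action)


def greedy_policy(q):
    """Greedy action in each state; this acts as the switching signal."""
    return [max(range(N_ACTIONS), key=lambda a: q[s][a]) for s in range(N_STATES)]


def async_q_learning(num_updates=5000, seed=0):
    """Update one (state, action) entry per iteration and count greedy-policy changes."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    policy = greedy_policy(q)
    switches = 0
    for _ in range(num_updates):
        # Asynchronous update: a single uniformly sampled pair per iteration.
        s = rng.randrange(N_STATES)
        a = rng.randrange(N_ACTIONS)
        s_next, r = step(s, a)
        td_target = r + GAMMA * max(q[s_next])
        q[s][a] += ALPHA * (td_target - q[s][a])
        # A change in the greedy policy corresponds to a mode switch.
        new_policy = greedy_policy(q)
        if new_policy != policy:
            switches += 1
            policy = new_policy
    return q, policy, switches
```

For this toy MDP the optimal values can be worked out by hand from the Bellman equation: Q*(s, 1) = 1 + 0.5 * 2 = 2 and Q*(s, 0) = 0 + 0.5 * 2 = 1 for both states, so the iterates should settle near [[1, 2], [1, 2]] with greedy policy [1, 1] after at least one switch away from the initial policy.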
Jan-27-2025, 17:33:14 GMT