Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction

Dec-24-2025, 00:53:33 GMT–Neural Information Processing Systems

Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP), based on a single trajectory of Markovian samples induced by a behavior policy. Focusing on a $\gamma$-discounted MDP with state space S and action space A, we demonstrate that the $ \ell_{\infty} $-based sample complexity of classical asynchronous Q-learning --- namely, the number of samples needed to yield an entrywise $\epsilon$-accurate estimate of the Q-function --- is at most on the order of $ \frac{1}{ \mu_{\min}(1-\gamma)^5 \epsilon^2 }+ \frac{ t_{\mathsf{mix}} }{ \mu_{\min}(1-\gamma) } $ up to some logarithmic factor, provided that a proper constant learning rate is adopted.

asynchronous q-learning, sample complexity, sharper analysis and variance reduction, (8 more...)

Neural Information Processing Systems

Dec-24-2025, 00:53:33 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)