Reinforcement Learning



Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction

Neural Information Processing Systems

Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP), based on a single trajectory of Markovian samples induced by a behavior policy.
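The setting described above can be illustrated with a minimal tabular sketch. It is not the paper's analyzed algorithm or its variance-reduced variant. The function name `async_q_learning`, the uniform behavior policy, the constant learning rate, and the array layout of `P` and `R` are all assumptions made for illustration. The point it shows is the "asynchronous" part: only the single state-action pair visited along the trajectory is updated at each step.

```python
import numpy as np

def async_q_learning(P, R, gamma=0.9, steps=20_000, lr=0.1, seed=0):
    """Tabular asynchronous Q-learning along one Markovian trajectory.

    P: array of shape (S, A, S) with transition probabilities (assumed layout).
    R: array of shape (S, A) with deterministic rewards (assumed layout).
    """
    rng = np.random.default_rng(seed)
    S, A = R.shape
    Q = np.zeros((S, A))
    s = 0  # arbitrary initial state
    for _ in range(steps):
        a = rng.integers(A)                 # uniform behavior policy (assumption)
        s_next = rng.choice(S, p=P[s, a])   # sample the next state of the trajectory
        target = R[s, a] + gamma * Q[s_next].max()
        Q[s, a] += lr * (target - Q[s, a])  # update only the visited entry
        s = s_next                          # no resets: a single trajectory
    return Q
```

Calling `async_q_learning(P, R)` on a small MDP returns an `(S, A)` table of estimated optimal action values. A constant learning rate is used here only for simplicity.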




The Mean-Squared Error of Double Q-Learning

Neural Information Processing Systems

In this paper, we establish a theoretical comparison between the asymptotic mean-squared error of Double Q-learning and that of Q-learning. Our result builds upon an analysis of linear stochastic approximation based on Lyapunov equations and applies both to the tabular setting and to linear function approximation, provided that the optimal policy is unique and the algorithms converge.
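For readers unfamiliar with the algorithm being compared, the following is a rough tabular sketch of the standard Double Q-learning update, not the paper's analysis or experimental setup. The function name `double_q_learning`, the uniform behavior policy, the constant learning rate, and the layout of `P` and `R` are illustrative assumptions. The key difference from Q-learning is that one table selects the greedy action while the other evaluates it.

```python
import numpy as np

def double_q_learning(P, R, gamma=0.9, steps=20_000, lr=0.1, seed=0):
    """Tabular Double Q-learning along one trajectory.

    P: array of shape (S, A, S) with transition probabilities (assumed layout).
    R: array of shape (S, A) with deterministic rewards (assumed layout).
    """
    rng = np.random.default_rng(seed)
    S, A = R.shape
    QA = np.zeros((S, A))
    QB = np.zeros((S, A))
    s = 0
    for _ in range(steps):
        a = rng.integers(A)                # uniform behavior policy (assumption)
        s_next = rng.choice(S, p=P[s, a])
        if rng.random() < 0.5:
            # QA picks the greedy action, QB evaluates it.
            best = QA[s_next].argmax()
            QA[s, a] += lr * (R[s, a] + gamma * QB[s_next, best] - QA[s, a])
        else:
            # Roles swapped: QB picks, QA evaluates.
            best = QB[s_next].argmax()
            QB[s, a] += lr * (R[s, a] + gamma * QA[s_next, best] - QB[s, a])
        s = s_next
    return QA, QB
```

The function returns both tables. A common choice, assumed here rather than taken from the paper, is to average them when a single estimate is needed.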