Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction
Neural Information Processing Systems
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP), based on a single trajectory of Markovian samples induced by a behavior policy.
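The update described above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the randomly generated 3-state, 2-action MDP, the uniform behavior policy, the constant learning rate, and the hypothetical function name `async_q_learning`. The key point it shows is that only the single state-action pair visited at each step of the trajectory is updated.

```python
import numpy as np

def async_q_learning(P, R, gamma, num_steps, lr, seed=0):
    """Tabular asynchronous Q-learning along one Markovian trajectory.

    P: transition kernel, shape (S, A, S); R: rewards in [0, 1], shape (S, A).
    The uniform behavior policy and constant step size are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    num_states, num_actions = R.shape
    Q = np.zeros((num_states, num_actions))
    s = 0  # arbitrary initial state
    for _ in range(num_steps):
        a = rng.integers(num_actions)               # behavior policy: uniform over actions
        s_next = rng.choice(num_states, p=P[s, a])  # next state of the single trajectory
        target = R[s, a] + gamma * Q[s_next].max()  # empirical Bellman target
        Q[s, a] += lr * (target - Q[s, a])          # update only the visited (s, a) entry
        s = s_next
    return Q

# Toy MDP (assumed, for illustration only).
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(3), size=(3, 2))  # shape (3, 2, 3); each P[s, a] sums to 1
R = rng.uniform(size=(3, 2))                # rewards in [0, 1)
Q = async_q_learning(P, R, gamma=0.9, num_steps=20000, lr=0.05)
print(Q)
```

Because the samples come from one trajectory rather than from independent draws for every state-action pair, how quickly the estimate improves depends on how often the behavior policy visits each pair.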
May-29-2025, 07:21:57 GMT