Speedy Q-Learning
Ghavamzadeh, Mohammad, Kappen, Hilbert J., Azar, Mohammad G., Munos, Rémi
–Neural Information Processing Systems
We introduce a new convergent variant of Q-learning, called speedy Q-learning, to address the problem of slow convergence in the standard form of the Q-learning algorithm. We prove a PAC bound on the performance of SQL, which shows that for an MDP with n state-action pairs and the discount factor \gamma only T=O\big(\log(n)/(\epsilon^{2}(1-\gamma)^{4})\big) steps are required for the SQL algorithm to converge to an \epsilon-optimal action-value function with high probability. This bound has a better dependency on 1/\epsilon and 1/(1-\gamma), and thus, is tighter than the best available result for Q-learning. Our bound is also superior to the existing results for both model-free and model-based instances of batch Q-value iteration that are considered to be more efficient than the incremental methods like Q-learning.
Neural Information Processing Systems
Dec-31-2011
- Country:
- North America > United States
- New York > New York County
- New York City (0.04)
- Massachusetts > Middlesex County
- Colorado > Denver County
- Denver (0.04)
- New York > New York County
- Europe
- France (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.14)
- Netherlands > Gelderland
- Nijmegen (0.05)
- Asia > Middle East
- Jordan (0.04)
- North America > United States
- Technology: