Speedy Q-Learning

Ghavamzadeh, Mohammad, Kappen, Hilbert J., Azar, Mohammad G., Munos, Rémi

Dec-31-2011–Neural Information Processing Systems

We introduce a new convergent variant of Q-learning, called speedy Q-learning (SQL), to address the problem of slow convergence in the standard form of the Q-learning algorithm. We prove a PAC bound on the performance of SQL, which shows that, for an MDP with n state-action pairs and discount factor \gamma, only T=O\big(\log(n)/(\epsilon^{2}(1-\gamma)^{4})\big) steps are required for the SQL algorithm to converge to an \epsilon-optimal action-value function with high probability. This bound has a better dependency on 1/\epsilon and 1/(1-\gamma), and is thus tighter than the best available result for Q-learning. Our bound is also superior to the existing results for both model-free and model-based instances of batch Q-value iteration, which are considered to be more efficient than incremental methods like Q-learning.
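To get a feel for how the bound in the abstract scales, the following minimal sketch evaluates only the term inside the O(·), namely \log(n)/(\epsilon^{2}(1-\gamma)^{4}). The function name `sql_step_order` and the example values of n, \epsilon, and \gamma are illustrative assumptions, not taken from the paper, and the constants hidden by the O-notation are omitted, so only ratios between results are meaningful.

```python
import math


def sql_step_order(n: int, epsilon: float, gamma: float) -> float:
    """Return log(n) / (epsilon^2 * (1 - gamma)^4).

    This is the term inside the O(.) of the stated bound; hidden constants
    are not included, so the value is not an actual step count.
    The function name and argument names are hypothetical.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not 0 <= gamma < 1:
        raise ValueError("gamma must lie in [0, 1)")
    return math.log(n) / (epsilon ** 2 * (1.0 - gamma) ** 4)


# Illustrative values only (assumed, not from the paper).
base = sql_step_order(n=1000, epsilon=0.1, gamma=0.9)
half_eps = sql_step_order(n=1000, epsilon=0.05, gamma=0.9)
longer_horizon = sql_step_order(n=1000, epsilon=0.1, gamma=0.99)

# Halving epsilon multiplies the term by about 4 (quadratic in 1/epsilon).
print(half_eps / base)
# Shrinking 1 - gamma by a factor of 10 multiplies it by about 10^4.
print(longer_horizon / base)
```

The ratios make the two dependencies visible: the term grows quadratically in 1/\epsilon and with the fourth power of 1/(1-\gamma), while the number of state-action pairs n enters only logarithmically.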

action-value function, algorithm, Q-learning

Country:
- North America > United States
  - New York > New York County
    - New York City (0.04)
  - Massachusetts > Middlesex County
    - Cambridge (0.04)
    - Belmont (0.04)
  - Colorado > Denver County
    - Denver (0.04)
- Europe
  - France (0.04)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.14)
  - Netherlands > Gelderland
    - Nijmegen (0.05)
- Asia > Middle East
  - Jordan (0.04)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
